Array Functions
In Refine an Array is data structure representing an ordered set of values, displayed as a comma separated list enclosed by square brackets, e.g. ["example", "two", "three"]
.
However, cells can not directly store an array. Instead, you will have to split a value to create an array, use array functions to manipulate the data, and then join the array back into a string.
If you think of the patterns in your strings as potential chunks in an array, this becomes a surprisingly powerful approach for wrangling data! Puzzles like getting the last word from a the sentence, the stuff after the coma, the unique words, or the filename from a URL become much easier.
Split
Since cells do not contain arrays, we start by strategically splitting the value into parts. For example, create an array by splitting:
- a multi-valued cell on a separator or deliminator character, e.g. semicolon
value.split(";")
or pipevalue.split("|")
- a string based on a meaningful feature or pattern, e.g. space for basic word
value.split(" ")
, white spacevalue.split(/\s/)
, or new linevalue.split(/\n/)
- a string based on pattern of the numbers of characters, e.g.
value.splitByLengths(3,3,4)
(note: this will throw out characters beyond the total)
Bracket Notation
The individual values in the array can be accessed using an index number in bracket notation.
The index starts at 0, e.g. first item value.split(";")[0]
or third item value.split(";").sort()[2]
.
Alternatively, you can use negative numbers to count backward in the array, e.g. last item value.split(";")[-1]
or third to last item value.split(";").sort()[-3]
.
Bracket notation can also be used to select a range of items (a short cut for slice()
), e.g. value.split(";").sort()[2,4]
.
Join
Some expressions, such as using an index number or length, will result in a regular string value ready to go into the cell.
However, if the result of your expression is an array, you will need to reconstitute it into a string using join()
.
This could be the same pattern that you used to split, a new character to create a string, or unique deliminator useful for future splits, e.g. .join("|||")
or .join(";")
.
If you forget to add the join, the result of the transformation will be an error. Depending on what “On error” option you chose, it may not be clear that your expression didn’t work!
Examples of Common Array Functions
For example, if we had a column with multi-valued cells representing lists of items like “dogs;cats;muffins;cats”:
Description | Expression | Output |
---|---|---|
split() creates an array |
value.split(";") |
[ “dogs”, “cats”, “muffins”, “cats” ] |
length() count the number of items in the array |
value.split(";").length() |
4 |
sort() returns the array sorted, ascending order, case-sensitive with uppercase first and lowercase second |
value.split(";").sort().join(";") |
“cats;cats;dogs;muffins” |
reverse() returns the array in reversed order |
value.split(";").reverse().join(";") |
“cats;muffins;cats;dogs” |
uniques() returns only the unique values, removing duplicates |
value.split(";").uniques().join(";") |
“dogs;cats;muffins” |
slice() returns an array of items starting from the first argument (including it) to the second number (no including it). |
value.split(";").slice(n, n) or bracket value.split(";")[n, n] |
|
Remove first item | value.split(";").slice(1).join(";") |
“cats;muffins;cats” |
Remove last item | value.split(";").slice(0,-1).join(";") |
“dogs;cats;muffins” |
Remove first and last | value.split(";")[1,-1].join(";") |
“cats;muffins” |
Get the second and third items | value.split(";").slice(1,3).join(";") |
“cats;muffins” |
Exercises
Extract the identifier from the “Document” column.
One method using two splits: value.split(",")[0].split(" ")[-1]