arquero

Arquero API Reference

Top-Level Table Verbs Op Functions Expressions Extensibility


Core Verbs


# table.derive(values[, options]) · Source

Derive new column values based on the provided expressions.

Examples

table.derive({ sumXY: d => d.x + d.y })
table.derive({ z: d => d.x * d.y }, { before: 'x' })

# table.filter(criteria) · Source

Filter a table to a subset of rows based on the input criteria. The resulting table provides a filtered view over the original data; no data copy is made. To create a table that copies only filtered data to new data structures, call reify on the output table.

Examples

table.filter(d => op.abs(d.value) < 5)

# table.slice([start, end]) · Source

Extract rows with indices from start to end (end not included), where start and end represent per-group ordered row numbers in the table. The table row indices are determined by the current orderby settings. The start and end arguments are applied separately to each group, as determined by groupby.

Examples

// slice the table to include all rows except for the first and last
table.slice(1, -1)
// extract (up to) the first two rows of each group
table.groupby('colA').slice(0, 2)

# table.groupby(…keys) · Source

Group table rows based on a set of column values. Subsequent operations that are sensitive to grouping (such as aggregate functions) will operate over the grouped rows. To undo grouping, use ungroup.

Examples

table.groupby('colA', 'colB')
table.groupby({ key: d => d.colA + d.colB })

# table.ungroup() · Source

Ungroup a table, removing any grouping criteria. Undoes the effects of groupby.

Examples

table.ungroup()

# table.orderby(…keys) · Source

Order table rows based on a set of column values. Subsequent operations sensitive to ordering (such as window functions) will operate over sorted values. The resulting table provides an view over the original data, without any copying. To create a table with sorted data copied to new data strucures, call reify on the result of this method. To undo ordering, use unorder.

Examples

// order by column 'a' in ascending order, than 'b' in descending order
table.orderby('a', aq.desc('b'))
// same as above, but with object syntax
// key order is significant, but the key names are ignored
table.orderby({ a: 'a', b: aq.desc('b') )})
// order by column 'a' according to German locale settings
table.orderby(aq.collate('a', 'de'))
// orderby accepts table expressions as well as column names
table.orderby(d => d.a)
// the configurations above can be combined
table.orderby(aq.desc(aq.collate(d => d.a, 'de')))

# table.unorder() · Source

Unorder a table, removing any sorting criteria. Undoes the effects of orderby.

Examples

table.unorder()

# table.rollup(values) · Source

Rollup a table to produce an aggregate summary. Often used in conjunction with groupby. To produce counts only, count provides a convenient shortcut.

Examples

table.groupby('colA').rollup({ mean: d => op.mean(d.colB) })
table.groupby('colA').rollup({ mean: op.median('colB') })

# table.count([options]) · Source

Count the number of values in a group. This method is a shorthand for rollup with a count aggregate function.

Examples

table.groupby('colA').count()
table.groupby('colA').count({ as: 'num' })

# table.sample(size[, options]) · Source

Generate a table from a random sample of rows. If the table is grouped, perform stratified sampling by sampling separately from each group.

Examples

// sample 50 rows without replacement
table.sample(50)
// sample 100 rows with replacement
table.sample(100, { replace: true })
// stratified sampling with dynamic sample size
table.groupby('colA').sample(aq.frac(0.5))
// sample twice the number of records in each group, with replacement
table.groupby('colA').sample(aq.frac(2), { replace: true })

# table.select(…columns) · Source

Select a subset of columns into a new table, potentially renaming them.

Examples

table.select('colA', 'colB')
table.select(aq.not('colB', 'colC'))
table.select({ colA: 'newA', colB: 'newB' })

# table.relocate(columns, options) · Source

Relocate a subset of columns to change their positions, also potentially renaming them.

Examples

// place colY and colZ immediately after colX
table.relocate(['colY', 'colZ'], { after: 'colX' })
// place all columns but colB and colC immediately before
// the position of colA prior to relocation
table.relocate(not('colB', 'colC'), { before: 'colA' })
// place colA and colB immediately after colC, while also
// respectively renaming them as newA and newB
table.relocate({ colA: 'newA', colB: 'newB' }, { after: 'colC' })

# table.rename(columns) · Source

Rename one or more columns, preserving column order. The columns input should be an object or Map instance that maps existing column names to new column names. Use the names() helper function to create a rename map based on integer column indices.

Examples

// rename colA to colA2
table.rename({ colA: 'colA2' })
// rename 'old col' to 'new col'
table.rename({ 'old col': 'new col' })
// rename colA and colB
table.rename({ colA: 'colA2', colB: 'colB2' })
// rename colA and colB, alternate syntax
table.rename({ colA: 'colA2' }, { colB: 'colB2' })
// rename the first two columns (by index) to 'colA2' and 'colB2'
table.rename(aq.names('colA2', 'colB2'))

# table.reify([indices]) · Source

Create a new fully-materialized instance of this table. All filter and orderby settings are removed from the new table. Instead, the data itself is filtered and ordered as needed to produce new backing data columns.

Examples

// materialize any internal filtering and ordering
table.reify()


Join Verbs


# table.cross(other[, values, options]) · Source

Produce the Cartesian cross product of two tables. The output table has one row for every pair of input table rows. Beware that outputs may be quite large, as the number of output rows is the product of the input row counts. This method is a convenient shorthand for a join in which the join criteria is always true.

Examples

table.cross(other)
table.cross(other, [['leftKey', 'leftVal'], ['rightVal']])

# table.join(other[, on, values, options]) · Source

Join two tables, extending the columns of one table with values from the other table. The current table is considered the “left” table in the join, and the new table input is considered the “right” table in the join. By default an inner join is performed, removing all rows that do not match the join criteria. To perform left, right, or full outer joins, use the join_left, join_right, or join_full methods, or provide an options argument.

Examples

table.join(other, 'keyShared')
table.join(other, ['keyL', 'keyR'])
table.join(other, (a, b) => op.equal(a.keyL, b.keyR))

# table.join_left(other[, on, values, options]) · Source

Perform a left outer join on two tables. Rows in the left table that do not match a row in the right table will be preserved. This method is a convenient shorthand with fixed options {left: true, right: false} passed to join.

Examples

table.join_left(other, 'keyShared')
table.join_left(other, ['keyL', 'keyR'])
table.join_left(other, (a, b) => op.equal(a.keyL, b.keyR))

# table.join_right(other[, on, values, options]) · Source

Perform a right outer join on two tables. Rows in the right table that do not match a row in the left table will be preserved. This method is a convenient shorthand with fixed options {left: false, right: true} passed to join.

Examples

table.join_right(other, 'keyShared')
table.join_right(other, ['keyL', 'keyR'])
table.join_right(other, (a, b) => op.equal(a.keyL, b.keyR))

# table.join_full(other[, on, values, options]) · Source

Perform a full outer join on two tables. Rows in either the left or right table that do not match a row in the other will be preserved. This method is a convenient shorthand with fixed options {left: true, right: true} passed to join.

Examples

table.join_full(other, 'keyShared')
table.join_full(other, ['keyL', 'keyR'])
table.join_full(other, (a, b) => op.equal(a.keyL, b.keyR))

# table.lookup(other, on, …values) · Source

Lookup values from a secondary table and add them as new columns. A lookup occurs upon matching key values for rows in both tables. If the secondary table has multiple rows with the same key, only the last observed instance will be considered in the lookup. Lookup is similar to join_left, but with a streamlined syntax and the added constraint of allowing at most one match only.

Example

table.lookup(other, ['key1', 'key2'], 'value1', 'value2')

# table.semijoin(other[, on]) · Source

Perform a semi-join, filtering the left table to only rows that match a row in the right table.

Similar to the filter verb, the resulting table provides a filtered view over the original data; no data copy is made. To create a table that copies only semi-joined data to new data structures, call reify on the output table.

Examples

table.semijoin(other)
table.semijoin(other, 'keyShared')
table.semijoin(other, ['keyL', 'keyR'])
table.semijoin(other, (a, b) => op.equal(a.keyL, b.keyR))

# table.antijoin(other[, on]) · Source

Perform an anti-join, filtering the left table to only rows that do not match a row in the right table.

Similar to the filter verb, the resulting table provides a filtered view over the original data; no data copy is made. To create a table that copies only anti-joined data to new data structures, call reify on the output table.

Examples

table.antijoin(other)
table.antijoin(other, 'keyShared')
table.antijoin(other, ['keyL', 'keyR'])
table.antijoin(other, (a, b) => op.equal(a.keyL, b.keyR))


Cleaning Verbs


# table.dedupe(…keys) · Source

De-duplicate table rows by removing repeated row values.

Examples

// remove rows that duplicate all column values
table.dedupe()
// remove rows that duplicate the 'a' and 'b' columns
table.dedupe('a', 'b')
// remove rows that duplicate the absolute value of column 'a'
table.dedupe({ abs: d => op.abs(d.a) })

# table.impute(values[, options]) · Source

Impute missing values or rows. Any of null, undefined, or NaN are considered missing values.

The expand option additionally imputes new rows for missing combinations of values. All combinations of expand values (the full cross product) are considered for each group (if specified by groupby). New rows are added for any combination of expand and groupby values not already contained in the table; the additional columns are populated with imputed values (if specified in values) or are otherwise undefined.

The output table persists a groupby specification. If the expand option is specified, a reified table is returned without any filter or orderby settings.

Examples

// replace missing values in column 'v' with zeros
table.impute({ v: () => 0 })
// replace missing values in column 'v' with the mean of non-missing values
table.impute({ v: d => op.mean(d.v) })
// replace missing values in column 'v' with zeros
// impute rows based on all combinations of values in columns 'x' and 'y'
table.impute({ v: () => 0 }, { expand: ['x', 'y'] })


Reshape Verbs


# table.fold(values[, options]) · Source

Fold one or more columns into two key-value pair columns. The fold transform is an inverse of the pivot transform. The resulting table has two new columns, one containing the column names (named “key”) and the other the column values (named “value”). The number of output rows equals the original row count multiplied by the number of folded columns.

Examples

table.fold('colA')
table.fold(['colA', 'colB'])
table.fold(aq.range(5, 8))

# table.pivot(keys, values[, options]) · Source

Pivot columns into a cross-tabulation. The pivot transform is an inverse of the fold transform. The resulting table has new columns for each unique combination of the provided keys, populated with the provided values. The provided values must be aggregates, as a single set of keys may include more than one row. If string-valued, the any aggregate is used. If only one values column is defined, the new pivoted columns are named using key values directly. Otherwise, input value column names are included as a component of the output column names.

Examples

// pivot the values in the 'key' column to be new column names
// using the 'value' column as the new column values
// the any() aggregate combines multiple values with the same key
table.pivot('key', 'value')
// pivot lowercase values of the 'key' column to be new column names
// use the sum of corresponding 'value' entris as new column values
table.pivot(
  { key: d => op.lower(d.key) },
  { value: d => op.sum(d.value) }
)
// pivot on key column 'type' and value columns ['x', 'y']
// generates: { x_a: [1], x_b: [2], y_a: [3], y_b: [4] }
aq.table({ type: ['a', 'b'], x: [1, 2], y: [3, 4 ]})
  .pivot('type', ['x', 'y'])
// pivot on the combination of the keys 'foo' and 'bar' for the values of 'x' and 'y'
aq.table({ foo: ['a', 'b'], bar: ['u', 'v'], x: [1, 2], y: [3, 4 ]})
  .pivot(['foo', 'bar'], ['x', 'y'])

# table.spread(values[, options]) · Source

Spread array elements into a set of new columns. Output columns are named either according to the as option or using a combination of the input colum names and array index.

Examples

// generate new columns 'text_1', 'text_2', etc. by splitting on whitespace
// the input column 'text' is dropped from the output
table.spread({ text: d => op.split(d.text, ' ') })
// generate new columns 'text_1', 'text_2', etc. by splitting on whitespace
// the input column 'text' is retained in the output
table.spread({ text: d => op.split(d.text, ' ') }, { drop: false })
// spread the 'arrayCol' column across a maximum of 100 new columns
table.spread('arrayCol', { limit: 100 })
// extract the first two 'arrayCol' entries into 'value1', 'value2' columns
table.spread('arrayCol', { as: ['value1', 'value2'] })

# table.unroll(values[, options]) · Source

Unroll one or more array-valued columns into new rows. If more than one array value is used, the number of new rows is the smaller of the limit and the largest length. Values for all other columns are copied over.

Examples

table.unroll('colA', { limit: 1000 })
table.unroll('colA', { limit: 1000, index: 'idxnum' })


Set Verbs


# table.concat(…tables) · Source

Concatenate multiple tables into a single table, preserving all rows. This transformation mirrors the UNION_ALL operation in SQL. It is similar to union but retains all rows, without removing duplicates. Only named columns in this table are included in the output.

Examples

table.concat(other)
table.concat(other1, other2)
table.concat([other1, other2])

# table.union(…tables) · Source

Union multiple tables into a single table, deduplicating all rows. This transformation mirrors the UNION operation in SQL. It is similar to concat but suppresses duplicate rows with values identical to another row. Only named columns in this table are included in the output.

Examples

table.union(other)
table.union(other1, other2)
table.union([other1, other2])

# table.intersect(…tables) · Source

Intersect multiple tables, keeping only rows with matching values for all columns in all tables, and deduplicates the rows. This transformation is similar to a series of one or more semijoin calls, but additionally suppresses duplicate rows.

Examples

table.intersect(other)
table.intersect(other1, other2)
table.intersect([other1, other2])

# table.except(…tables) · Source

Compute the set difference with multiple tables, keeping only rows in this table whose values do not occur in the other tables. This transformation is similar to a series of one or more antijoin calls, but additionally suppresses duplicate rows.

Examples

table.except(other)
table.except(other1, other2)
table.except([other1, other2])