Top-Level | Table | Verbs | Op Functions | Expressions | Extensibility |
Methods for creating new table instances.
# aq.table(columns[, names]) · Source
Create a new table for a set of named columns, optionally including an array of ordered column names. The columns input can be an object or Map with names for keys and columns for values, or an entry array of [name, values]
tuples.
JavaScript objects have specific key ordering rules: keys are enumerated in the order they are assigned, except for integer keys, which are enumerated first in sorted order. As a result, when using a standard object any columns entries with integer keys are listed first regardless of their order in the object definition. Use the names argument to ensure proper column ordering is respected. Map and entry arrays will preserve name ordering, in which case the names argument is only needed if you wish to specify an ordering different from the columns input.
To bind together columns from multiple tables with the same number of rows, use the table assign method. To transform the table, use the various verb methods.
[[name, values], ...]
. Keys are column name strings; the enumeration order of the keys determines the column indices if the names argument is not provided. Column values should be arrays (or array-like values) of identical length.Examples
// create a new table with 2 columns and 3 rows
aq.table({ colA: ['a', 'b', 'c'], colB: [3, 4, 5] })
// create a new table, preserving column order for integer names
aq.table({ key: ['a', 'b'], 1: [9, 8], 2: [7, 6] }, ['key', '1', '2'])
// create a new table from a Map instance
const map = new Map()
.set('colA', ['a', 'b', 'c'])
.set('colB', [3, 4, 5]);
aq.table(map)
# aq.from(values[, names]) · Source
Create a new table from an existing object, such as an array of objects or a set of key-value pairs.
Examples
// from an array, create a new table with two columns and two rows
// akin to table({ colA: [1, 3], colB: [2, 4] })
aq.from([ { colA: 1, colB: 2 }, { colA: 3, colB: 4 } ])
// from an object, create a new table with 'key' and 'value columns
// akin to table({ key: ['a', 'b', 'c'], value: [1, 2, 3] })
aq.from({ a: 1, b: 2, c: 3 })
// from a Map, create a new table with 'key' and 'value' columns
// akin to table({ key: ['d', 'e', 'f'], value: [4, 5, 6] })
aq.from(new Map([ ['d', 4], ['e', 5], ['f', 6] ])
# aq.fromArrow(arrowTable[, options]) · Source
Create a new table backed by an Apache Arrow table instance. The input arrowTable can either be an instantiated Arrow table instance or a byte array in the Arrow IPC format.
For most data types, Arquero uses binary-encoded Arrow columns as-is with zero data copying. For columns containing string, list (array), or struct values, Arquero additionally memoizes value lookups to amortize access costs. For dictionary columns, Arquero unpacks columns with null
entries or containing multiple record batches to optimize query performance.
This method performs parsing only. To both load and parse an Arrow file, use loadArrow.
true
) to convert Arrow date values to JavaScript Date objects. If false, defaults to what the Arrow implementation provides, typically timestamps as number values.true
) to convert Arrow fixed point decimal values to JavaScript numbers. If false, defaults to what the Arrow implementation provides, typically byte arrays. The conversion will be lossy if the decimal can not be exactly represented as a double-precision floating point number.
convertTimestamp: Boolean flag (default true
) to convert Arrow timestamp values to JavaScript Date objects. If false, defaults to what the Arrow implementation provides, typically timestamps as number values.
convertBigInt: Boolean flag (default false
) to convert Arrow integers with bit widths of 64 bits or higher to JavaScript numbers. If false, defaults to what the Arrow implementation provides, typically BigInt
values. The conversion will be lossy if the integer is so large it can not be exactly represented as a double-precision floating point number.
memoize: Boolean hint (default true
) to enable memoization of expensive conversions. If true, memoization is applied for string and nested (list, struct) types, caching extracted values to enable faster access. Memoization is also applied to converted Date values, in part to ensure exact object equality. This hint is ignored for dictionary columns, whose values are always memoized.Examples
// encode input array-of-objects data as an Arrow table
const arrowTable = aq.toArrow([
{ x: 1, y: 3.4 },
{ x: 2, y: 1.6 },
{ x: 3, y: 5.4 },
{ x: 4, y: 7.1 },
{ x: 5, y: 2.9 }
]);
// now access the Arrow-encoded data as an Arquero table
const dt = aq.fromArrow(arrowTable);
# aq.fromCSV(text[, options]) · Source
Parse a comma-separated values (CSV) text string into a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the options argument. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false
, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs parsing only. To both load and parse a CSV file, use loadCSV.
','
).'.'
).true
) to specify the presence of a header row. If true
, indicates the CSV contains a header row with column names. If false
, indicates the CSV does not contain a header row and the columns are given the names 'col1'
, 'col2'
, etc unless the names option is specified.true
.0
) before reading data.true
) for automatic type inference.1000
) to use for type inference.Examples
// create table from an input CSV string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromCSV('a,b\n1,2\n3,4')
// skip commented lines
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { comment: '#' })
// skip the first line
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { skip: 1 })
// override autoType with custom parser for column 'a'
// akin to table({ a: ['00152', '30219'], b: [2, 4] })
aq.fromCSV('a,b\n00152,2\n30219,4', { parse: { a: String } })
// parse semi-colon delimited text with comma as decimal separator
aq.fromCSV('a;b\nu;-1,23\nv;3,45e5', { delimiter: ';', decimal: ',' })
// create table from an input CSV loaded from 'url'
// alternatively, use the loadCSV method
aq.fromCSV(await fetch(url).then(res => res.text()))
# aq.fromFixed(text[, options]) · Source
Parse a fixed-width file text string into a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false
, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs parsing only. To both load and parse a fixed-width file, use loadFixed.
'.'
).0
) before reading data.true
) for automatic type inference.1000
) to use for type inference.Examples
// create table from an input fixed-width string
// akin to table({ u: ['a', 'b'], v: [1, 2] })
aq.fromFixed('a1\nb2', { widths: [1, 1], names: ['u', 'v'] })
Parse JavaScript Object Notation (JSON) data into a table. String values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false
. To perform custom parsing of input column values, use options.parse. Auto-type Date parsing is not performed for columns with custom parse options.
This method performs parsing only. To both load and parse a JSON file, use loadJSON.
The expected JSON data format is an object with column names for keys and column value arrays for values, like so:
{
"colA": ["a", "b", "c"],
"colB": [1, 2, 3]
}
The data payload can also be provided as the data property of an enclosing object, with an optional schema property containing table metadata such as a fields array of ordered column information:
{
"schema": {
"fields": [
{ "name": "colA" },
{ "name": "colB" }
]
},
"data": {
"colA": ["a", "b", "c"],
"colB": [1, 2, 3]
}
}
true
) for automatic type inference. If false
, automatic date parsing for input JSON strings is disabled.Examples
// create table from an input JSON string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromJSON('{"a":[1,3],"b":[2,4]}')
// create table from an input JSON string
// akin to table({ a: [1, 3], b: [2, 4] }, ['a', 'b'])
aq.fromJSON(`{
"schema":{"fields":[{"name":"a"},{"name":"b"}]},
"data":{"a":[1,3],"b":[2,4]}
}`)
// create table from an input JSON string loaded from 'url'
aq.fromJSON(await fetch(url).then(res => res.text()))
// create table from an input JSON object loaded from 'url'
// disable autoType Date parsing
aq.fromJSON(await fetch(url).then(res => res.json()), { autoType: false })
Methods for loading files and creating new table instances.
# aq.load(url[, options]) · Source
Load data from a url or file and return a Promise for a table. A specific format parser can be provided with the using option, otherwise CSV format is assumed. This method’s options are also passed as the second argument to the format parser.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://'
, 'https://'
, etc.) it is treated as a URL and the node-fetch library is used. If the 'file://'
protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs
module.
This method provides a generic base for file loading and parsing, and can be used to customize table parsing. To load CSV data, use the more specific loadCSV method. Similarly,to load JSON data use loadJSON and to load Apache Arrow data use loadArrow.
'arrayBuffer'
, 'json'
, or 'text'
(the default).Examples
// load table from a CSV file
const dt = await aq.load('data/table.csv', { using: aq.fromCSV });
// load table from a JSON file with column format
const dt = await aq.load('data/table.json', { as: 'json', using: aq.fromJSON })
// load table from a JSON file with array-of-objects format
const dt = await aq.load('data/table.json', { as: 'json', using: aq.from })
# aq.loadArrow(url[, options]) · Source
Load a file in the Apache Arrow IPC format from a url and return a Promise for a table.
This method performs both loading and parsing, and is equivalent to aq.load(url, { as: 'arrayBuffer', using: aq.fromArrow })
. To instead create an Arquero table for an Apache Arrow dataset that has already been loaded, use fromArrow.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://'
, 'https://'
, etc.) it is treated as a URL and the node-fetch library is used. If the 'file://'
protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs
module.
Examples
// load table from an Apache Arrow file
const dt = await aq.loadArrow('data/table.arrow');
# aq.loadCSV(url[, options]) · Source
Load a comma-separated values (CSV) file from a url and return a Promise for a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the options argument. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false
, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs both loading and parsing, and is equivalent to aq.load(url, { using: aq.fromCSV })
. To instead parse a CSV string that has already been loaded, use fromCSV.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://'
, 'https://'
, etc.) it is treated as a URL and the node-fetch library is used. If the 'file://'
protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs
module.
Examples
// load table from a CSV file
const dt = await aq.loadCSV('data/table.csv');
// load table from a tab-delimited file
const dt = await aq.loadCSV('data/table.tsv', { delimiter: '\t' })
# aq.loadFixed(url[, options]) · Source
Load a fixed-width file from a url and return a Promise for a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior set options.autoType to false
, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs both loading and parsing, and is equivalent to aq.load(url, { using: aq.fromFixed })
. To instead parse a fixed width string that has already been loaded, use fromFixed.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://'
, 'https://'
, etc.) it is treated as a URL and the node-fetch library is used. If the 'file://'
protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs
module.
Examples
// load table from a fixed-width file
const dt = await aq.loadFixed('a1\nb2', { widths: [1, 1], names: ['u', 'v'] });
# aq.loadJSON(url[, options]) · Source
Load a JavaScript Object Notation (JSON) file from a url and return a Promise for a table. If the loaded JSON is array-valued, an array-of-objects format is assumed and the from method is used to construct the table. Otherwise, a column object format (as produced by toJSON) is assumed and the fromJSON method is applied.
This method performs both loading and parsing, similar to aq.load(url, { as: 'json', using: aq.fromJSON })
. To instead parse JSON data that has already been loaded, use either from or fromJSON.
When parsing using fromJSON, string values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false
. To perform custom parsing of input column values, use options.parse. Auto-type Date parsing is not performed for columns with custom parse options.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://'
, 'https://'
, etc.) it is treated as a URL and the node-fetch library is used. If the 'file://'
protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs
module.
Examples
// load table from a JSON file
const dt = await aq.loadJSON('data/table.json');
// load table from a JSON file, disable Date autoType
const dt = await aq.loadJSON('data/table.json', { autoType: false })
Methods for writing data to an output format. Most output methods are available as table methods, in addition to the top level namespace.
# aq.toArrow(data[, options]) · Source
Create an Apache Arrow table for the input data. The input data can be either an Arquero table or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths. For Arquero tables, this method can instead be invoked as table.toArrow().
Infinity
).0
).types: An optional object indicating the Arrow data type to use for named columns. If specified, the input should be an object with column names for keys and Arrow data types for values. If a column’s data type is not explicitly provided, type inference will be performed.
Type values can either be instantiated Arrow DataType instances (for example, new Float64()
,new DateMilliseconds()
, etc.) or type enum codes (Type.Float64
, Type.Date
, Type.Dictionary
). High-level types map to specific data type instances as follows:
Type.Date
→ new DateMilliseconds()
Type.Dictionary
→ new Dictionary(new Utf8(), new Int32())
Type.Float
→ new Float64()
Type.Int
→ new Int32()
Type.Interval
→ new IntervalYearMonth()
Type.Time
→ new TimeMillisecond()
Types that require additional parameters (including List
, Struct
, and Timestamp
) can not be specified using type codes. Instead, use data type constructors from apache-arrow, such as new List(new Int32())
.
Examples
Encode Arrow data from an input Arquero table:
import { table, toArrow } from 'arquero';
import { Type } from 'apache-arrow';
// create Arquero table
const dt = table({
x: [1, 2, 3, 4, 5],
y: [3.4, 1.6, 5.4, 7.1, 2.9]
});
// encode as an Arrow table (infer data types)
// here, infers Uint8 for 'x' and Float64 for 'y'
// equivalent to dt.toArrow()
const at1 = toArrow(dt);
// encode into Arrow table (set explicit data types)
// equivalent to dt.toArrow({ types: { ... } })
const at2 = toArrow(dt, {
types: {
x: Type.Uint16,
y: Type.Float32
}
});
// serialize Arrow table to a transferable byte array
const bytes = at1.serialize();
Encode Arrow data from an input object array:
import { toArrow } from 'arquero';
// encode object array as an Arrow table (infer data types)
const at = toArrow([
{ x: 1, y: 3.4 },
{ x: 2, y: 1.6 },
{ x: 3, y: 5.4 },
{ x: 4, y: 7.1 },
{ x: 5, y: 2.9 }
]);
# table.toArrowBuffer(data[, options]) · Source
Format input data in the binary Apache Arrow IPC format. The input data can be either an Arquero table or an array of standard JavaScript objects. This method will throw an error if type inference fails or if the generated columns have differing lengths. For Arquero tables, this method can instead be invoked as table.toArrowIPC().
The resulting binary data may be saved to disk or passed between processes or tools. For example, when using Web Workers, the output of this method can be passed directly between threads (no data copy) as a Transferable object. Additionally, Arrow binary data can be loaded in other language environments such as Python or R.
This method will throw an error if type inference fails or if the generated columns have differing lengths.
'stream'
(default) or 'file'
.Examples
Encode Arrow data from an input Arquero table:
import { table, toArrowIPC } from 'arquero';
const dt = table({
x: [1, 2, 3, 4, 5],
y: [3.4, 1.6, 5.4, 7.1, 2.9]
});
// encode table as a transferable Arrow byte buffer
// here, infers Uint8 for 'x' and Float64 for 'y'
const bytes = toArrowIPC(dt);
Methods for invoking or modifying table expressions.
All table expression operations, including standard functions, aggregate functions, and window functions. See the Operations API Reference for documentation of all available functions.
# aq.agg(table, expression) · Source
Compute a single aggregate value for a table. This method is a convenient shortcut for ungrouping a table, applying a rollup verb for a single aggregate expression, and extracting the resulting aggregate value.
Examples
aq.agg(aq.table({ a: [1, 2, 3] }), op.max('a')) // 3
aq.agg(aq.table({ a: [1, 3, 5] }), d => [op.min(d.a), op.max('a')]) // [1, 5]
Annotate a JavaScript function or value to bypass Arquero’s default table expression handling. Escaped values enable the direct use of JavaScript functions to process row data: no internal parsing or code generation is performed, and so closures and arbitrary function invocations are supported. Escaped values provide a lightweight alternative to table params and function registration to access variables in enclosing scopes.
An escaped value can be applied anywhere Arquero accepts single-table table expressions, including the derive, filter, and spread verbs. In addition, any of the standard op
functions can be used within an escaped function. However, aggregate and window op
functions are not supported. Also note that using escaped values will break serialization of Arquero queries to worker threads.
op
functions are not permitted.Examples
// filter based on a variable defined in the enclosing scope
const thresh = 5;
aq.table({ a: [1, 4, 9], b: [1, 2, 3] })
.filter(aq.escape(d => d.a < thresh))
// { a: [1, 4], b: [1, 2] }
// apply a parsing function defined in the enclosing scope
const parseMDY = d3.timeParse('%m/%d/%Y');
aq.table({ date: ['1/1/2000', '06/01/2010', '12/10/2020'] })
.derive({ date: aq.escape(d => parseMDY(d.date)) })
// { date: [new Date(2000,0,1), new Date(2010,5,1), new Date(2020,11,10)] }
// spread results from an escaped function that returns an array
const denom = 4;
aq.table({ a: [1, 4, 9] })
.spread(
{ a: aq.escape(d => [Math.floor(d.a / denom), d.a % denom]) },
{ as: ['div', 'mod'] }
)
// { div: [0, 1, 2], mod: [1, 0, 1] }
# aq.bin(name[, options]) · Source
Generate a table expression that performs uniform binning of number values. The resulting string can be used as part of the input to table transformation verbs.
true
) indicating if bins should snap to “nice” human-friendly values such as multiples of ten.0
) floors to the lower bin boundary. A value of 1
snaps one step higher to the upper bin boundary, and so on.Examples
aq.bin('colA', { maxbins: 20 })
Annotate a table expression (expr) to indicate descending sort order.
Examples
// sort colA in descending order
aq.desc('colA')
// sort colA in descending order of lower case values
aq.desc(d => op.lower(d.colA))
Generate a table expression that computes the number of rows corresponding to a given fraction for each group. The resulting string can be used as part of the input to the sample verb.
Examples
aq.frac(0.5)
# aq.rolling(expr[, frame, includePeers]) · Source
Annotate a table expression to compute rolling aggregate or window functions within a sliding window frame. For example, to specify a rolling 7-day average centered on the current day, call rolling with a frame value of [-3, 3].
null
, the default frame [-Infinity, 0]
includes the current values and all preceding values.false
(the default), the window frame boundaries are insensitive to peer values. If true
, the window frame expands to include all peers. This parameter only affects operations that depend on the window frame: namely aggregate functions and the first_value, last_value, and nth_value window functions.Examples
// cumulative sum, with an implicit frame of [-Infinity, 0]
aq.rolling(d => op.sum(d.colA))
// centered 7-day moving average, assuming one value per day
aq.rolling(d => op.mean(d.colA), [-3, 3])
// retrieve last value in window frame, including peers (ties)
aq.rolling(d => op.last_value(d.colA), [-3, 3], true)
Set a seed value for random number generation. If the seed is a valid number, a 32-bit linear congruential generator with the given seed will be used to generate random values. If the seed is null
, undefined
, or not a valid number, the random number generator will revert to Math.random
.
Examples
// set random seed as an integer
aq.seed(12345)
// set random seed as a fraction, maps to floor(fraction * (2 ** 32))
aq.seed(0.5)
// revert to using Math.random
aq.seed(null)
Methods for selecting columns. The result of these methods can be passed as arguments to select, groupby, join and other transformation verbs.
Select all columns in a table. Returns a function-valued selection compatible with select.
Examples
aq.all()
Negate a column selection, selecting all other columns in a table. Returns a function-valued selection compatible with select.
Examples
aq.not('colA', 'colB')
aq.not(aq.range(2, 5))
# aq.range(start, stop) · Source
Select a contiguous range of columns. Returns a function-valued selection compatible with select.
Examples
aq.range('colB', 'colE')
aq.range(2, 5)
# aq.matches(pattern) · Source
Select all columns whose names match a pattern. Returns a function-valued selection compatible with select.
Examples
// contains the string 'col'
aq.matches('col')
// has 'a', 'b', or 'c' as the first character (case-insensitve)
aq.matches(/^[abc]/i)
# aq.startswith(string) · Source
Select all columns whose names start with a string. Returns a function-valued selection compatible with select.
Examples
aq.startswith('prefix_')
# aq.endswith(string) · Source
Select all columns whose names end with a string. Returns a function-valued selection compatible with select.
Examples
aq.endswith('_suffix')
Select columns by index and rename them to the provided names. Returns a selection helper function that takes a table as input and produces a rename map as output. If the number of provided names is less than the number of table columns, the rename map will include entries for the provided names only. If the number of table columns is less than then number of provided names, the rename map will include only entries that cover the existing columns.
Examples
// helper to rename the first three columns to 'a', 'b', 'c'
aq.names('a', 'b', 'c')
// names can also be passed as arrays
aq.names(['a', 'b', 'c'])
// rename the first three columns, all other columns remain as-is
table.rename(aq.names(['a', 'b', 'c']))
// select and rename the first three columns, all other columns are dropped
table.select(aq.names(['a', 'b', 'c']))