Methods for creating new table instances.
# aq.table(columns[, names]) · Source
Create a new table for a set of named columns, optionally including an array of ordered column names. The columns input can be an object or Map with names for keys and columns for values, or an entry array of [[name, values], ...] tuples.
JavaScript objects have specific key ordering rules: keys are enumerated in the order they are assigned, except for integer keys, which are enumerated first in sorted order. As a result, when using a standard object any columns entries with integer keys are listed first regardless of their order in the object definition. Use the names argument to ensure proper column ordering is respected. Map and entry arrays will preserve name ordering, in which case the names argument is only needed if you wish to specify an ordering different from the columns input.
To bind together columns from multiple tables with the same number of rows, use the table assign method. To transform the table, use the various verb methods.
- columns (object | Map | [[name, values], ...]): The input column data. Keys are column name strings; the enumeration order of the keys determines the column indices if the names argument is not provided. Column values should be arrays (or array-like values) of identical length.
- names (string[]): An ordered array of column names.
Examples
// create a new table with 2 columns and 3 rows
aq.table({ colA: ['a', 'b', 'c'], colB: [3, 4, 5] })
// create a new table, preserving column order for integer names
aq.table({ key: ['a', 'b'], 1: [9, 8], 2: [7, 6] }, ['key', '1', '2'])
// create a new table from a Map instance
const map = new Map()
.set('colA', ['a', 'b', 'c'])
.set('colB', [3, 4, 5]);
aq.table(map)
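Entry arrays are also accepted and preserve name order; a minimal sketch, equivalent to the first example above:
// create a new table from an array of [name, values] entries
aq.table([ ['colA', ['a', 'b', 'c']], ['colB', [3, 4, 5]] ])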
# aq.from(values[, names]) · Source
Create a new table from an existing object, such as an array of objects or a set of key-value pairs. For varied JSON formats, see the fromJSON method.
Examples
// from an array, create a new table with two columns and two rows
// akin to table({ colA: [1, 3], colB: [2, 4] })
aq.from([ { colA: 1, colB: 2 }, { colA: 3, colB: 4 } ])
// from an object, create a new table with 'key' and 'value' columns
// akin to table({ key: ['a', 'b', 'c'], value: [1, 2, 3] })
aq.from({ a: 1, b: 2, c: 3 })
// from a Map, create a new table with 'key' and 'value' columns
// akin to table({ key: ['d', 'e', 'f'], value: [4, 5, 6] })
aq.from(new Map([ ['d', 4], ['e', 5], ['f', 6] ]))
Methods for loading files and parsing data formats to create new table instances.
# aq.loadArrow(url[, options]) · Source
Load a file in the Apache Arrow IPC binary format from a url and return a Promise for a table. Both the Arrow IPC stream and file formats are supported; the format type is determined automatically.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and fetch is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and opened using the node.js fs module.
- url (string): The url or local file path (node.js only) to load.
- options.fetch (RequestInit): Options to pass to the HTTP fetch method when loading a URL.
- options.decompress ('gzip' | 'deflate' | null): A decompression format to apply. If unspecified, the decompression type is inferred from the file extension (.gz for 'gzip', .zz for 'deflate'). If no matching extension is found, no decompression is performed.
- options.columns (Select): An ordered set of columns to import. The input may consist of: column name strings, column integer indices, objects with current column names as keys and new column names as values (for renaming), or a selection helper function such as all, not, or range.
- options.useBigInt (boolean): Boolean flag (default false) to extract 64-bit integer types as JavaScript BigInt values. For Flechette tables, the default is to coerce 64-bit integers to JavaScript numbers and raise an error if the number is out of range. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDate (boolean): Boolean flag (default true) to convert Arrow date and timestamp values to JavaScript Date objects. Otherwise, numeric timestamps are used. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDecimalBigInt (boolean): Boolean flag (default false) to extract Arrow decimal-type data as BigInt values, where fractional digits are scaled to integers. Otherwise, decimals are (sometimes lossily) converted to floating-point numbers. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useMap (boolean): Boolean flag (default false) to represent Arrow Map data as JavaScript Map values. For Flechette tables, the default is to produce an array of [key, value] arrays. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useProxy (boolean): Boolean flag (default false) to extract Arrow Struct values and table row objects using zero-copy proxy objects that extract data from underlying Arrow batches. The proxy objects can improve performance and reduce memory usage, but do not support property enumeration (Object.keys, Object.values, Object.entries) or spreading ({ ...object }). This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
Examples
// load table from an Apache Arrow file
const dt = await aq.loadArrow('data/table.arrow');
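The import options listed above can also be applied; a minimal sketch (the file path and column names are illustrative):
// load select columns, extracting 64-bit integers as BigInt values
const dt = await aq.loadArrow('data/table.arrow', {
  columns: ['colA', 'colB'],
  useBigInt: true
});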
# aq.loadCSV(url[, options]) · Source
Load a comma-separated values (CSV) file from a url and return a Promise for a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the options argument. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and fetch is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module. In either case, stream processing is used to load the data while minimizing memory usage.
- url (string): The url or local file (node.js only) to load.
- options.fetch (RequestInit): Options to pass to the HTTP fetch method when loading a URL.
- options.decompress ('gzip' | 'deflate' | null): A decompression format to apply. If unspecified, the decompression type is inferred from the file extension (.gz for 'gzip', .zz for 'deflate'). If no matching extension is found, no decompression is performed.
- options.delimiter (string): A single-character delimiter string between column values (default ',').
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.header (boolean): Boolean flag (default true) to specify the presence of a header row. If true, indicates the CSV contains a header row with column names. If false, indicates the CSV does not contain a header row and the columns are given the names 'col1', 'col2', etc. unless the names option is specified.
- options.names (string[]): An array of column names to use for header-less CSV files. This option is ignored if the header option is true.
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// load table from a CSV file
const dt = await aq.loadCSV('data/table.csv');
// load table from a gzip-compressed CSV file
// the { decompress: 'gzip' } option is inferred from the file extension
const dt = await aq.loadCSV('data/table.csv.gz');
// load table from a tab-delimited file
const dt = await aq.loadCSV('data/table.tsv', { delimiter: '\t' })
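For header-less files, the header and names options work together; a minimal sketch (the file path is illustrative):
// load a header-less CSV file, providing explicit column names
const dt = await aq.loadCSV('data/headerless.csv', { header: false, names: ['a', 'b'] });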
# aq.loadFixed(url[, options]) · Source
Load a fixed-width file from a url and return a Promise for a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use the parse option.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and fetch is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module. In either case, stream processing is used to load the data while minimizing memory usage.
- url (string): The url or local file (node.js only) to load.
- options.fetch (RequestInit): Options to pass to the HTTP fetch method when loading a URL.
- options.decompress ('gzip' | 'deflate' | null): A decompression format to apply. If unspecified, the decompression type is inferred from the file extension (.gz for 'gzip', .zz for 'deflate'). If no matching extension is found, no decompression is performed.
- options.positions ([number, number][]): Array of [start, end] indices for fixed-width columns.
- options.widths (number[]): Array of fixed column widths. This option is ignored if the positions property is specified.
- options.names (string[]): An array of column names. The array length should match the length of the positions or widths array. If not specified or shorter than the other array, default column names are generated.
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// load table from a fixed-width file
const dt = await aq.loadFixed(
'data/fixed-width.txt',
{ widths: [1, 1], names: ['u', 'v'] }
);
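Column extents may instead be given as [start, end] positions; a minimal sketch equivalent to the widths example above:
// specify [start, end] positions rather than widths
const dt = await aq.loadFixed(
  'data/fixed-width.txt',
  { positions: [[0, 1], [1, 2]], names: ['u', 'v'] }
);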
# aq.loadJSON(url[, options]) · Source
Load a JavaScript Object Notation (JSON) file from a url and return a Promise for a table. If the type option is unspecified and the loaded JSON is array-valued, an array-of-objects format is assumed. If object-valued, a column-oriented format is assumed. See the parseJSON method for format type examples.
By default, string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false. To perform custom parsing of input column values, use the parse option. Auto-type Date parsing is not performed for columns with custom parse options.
When invoked in the browser, the Fetch API is used to load the url. When invoked in node.js, the url argument can also be a local file path. If the input url string has a network protocol at the beginning (e.g., 'http://', 'https://', etc.) it is treated as a URL and fetch is used. If the 'file://' protocol is used, the rest of the string should be an absolute file path, from which a local file is loaded. Otherwise the input is treated as a path to a local file and loaded using the node.js fs module. For the 'ndjson' format type, stream processing is used to load the data while minimizing memory usage.
- url (string): The url or local file (node.js only) to load.
- options.fetch (RequestInit): Options to pass to the HTTP fetch method when loading a URL.
- options.decompress ('gzip' | 'deflate' | null): A decompression format to apply. If unspecified, the decompression type is inferred from the file extension (.gz for 'gzip', .zz for 'deflate'). If no matching extension is found, no decompression is performed.
- options.type ('columns' | 'rows' | 'ndjson' | null): The JSON format type. One of 'columns' (for an object with named column arrays), 'rows' (for an array of row objects), or 'ndjson' for newline-delimited JSON rows. For 'ndjson', each line of text must contain a JSON row object (with no trailing comma) and string properties must not contain any newline characters. If no format type is specified, one of 'rows' or 'columns' is inferred from the structure of the parsed JSON.
- options.columns (string[]): An array of column names to include. JSON properties missing from this list are not included in the table.
- options.skip (number): The number of lines to skip (default 0) before reading data. Applicable to the 'ndjson' type only.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped. Applicable to the 'ndjson' type only.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference. If false, automatic date parsing for input JSON strings is disabled.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// load table from a JSON file
const dt = await aq.loadJSON('data/table.json');
// load table from a JSON file, disable Date autoType
const dt = await aq.loadJSON('data/table.json', { autoType: false })
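The type option can force a specific format, such as newline-delimited JSON; a minimal sketch (the file path is illustrative):
// load a newline-delimited JSON file using stream processing
const dt = await aq.loadJSON('data/table.ndjson', { type: 'ndjson' });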
# aq.fromArrow(input[, options]) · Source
Returns a new table backed by Apache Arrow binary data. The input can be a byte array in the Arrow IPC format, or an instantiated Flechette or Apache Arrow JS table instance. Binary inputs are decoded using Flechette.
For many data types, Arquero uses binary-encoded Arrow columns as-is with zero data copying. For dictionary columns, Arquero unpacks columns with null entries or containing multiple record batches to optimize query performance.
Both the Arrow IPC stream and file formats are supported; the format type is determined automatically. This method performs parsing only. To specify a URL or file to load, use loadArrow.
- input: A byte array in the Arrow IPC format, or an instantiated Flechette or Apache Arrow JS table instance.
- options.columns (Select): An ordered set of columns to import. The input may consist of: column name strings, column integer indices, objects with current column names as keys and new column names as values (for renaming), or a selection helper function such as all, not, or range.
- options.useBigInt (boolean): Boolean flag (default false) to extract 64-bit integer types as JavaScript BigInt values. For Flechette tables, the default is to coerce 64-bit integers to JavaScript numbers and raise an error if the number is out of range. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDate (boolean): Boolean flag (default true) to convert Arrow date and timestamp values to JavaScript Date objects. Otherwise, numeric timestamps are used. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDecimalBigInt (boolean): Boolean flag (default false) to extract Arrow decimal-type data as BigInt values, where fractional digits are scaled to integers. Otherwise, decimals are (sometimes lossily) converted to floating-point numbers. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useMap (boolean): Boolean flag (default false) to represent Arrow Map data as JavaScript Map values. For Flechette tables, the default is to produce an array of [key, value] arrays. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useProxy (boolean): Boolean flag (default false) to extract Arrow Struct values and table row objects using zero-copy proxy objects that extract data from underlying Arrow batches. The proxy objects can improve performance and reduce memory usage, but do not support property enumeration (Object.keys, Object.values, Object.entries) or spreading ({ ...object }). This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
Examples
// encode input table as Arrow IPC bytes
const arrowBytes = aq.table({
x: [1, 2, 3, 4, 5],
y: [3.4, 1.6, 5.4, 7.1, 2.9]
})
.toArrowIPC();
// access the Arrow-encoded data as an Arquero table
const dt = aq.fromArrow(arrowBytes);
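The columns option can select and rename columns upon import; a minimal sketch reusing the arrowBytes value from above:
// import only column 'x', renaming it to 'u'
const dt2 = aq.fromArrow(arrowBytes, { columns: { x: 'u' } });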
# aq.fromCSV(input[, options]) · Source
Parse a comma-separated values (CSV) input and return a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the delimiter option. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use the parse option.
This method performs parsing only. To specify a URL or file to load, use loadCSV.
- input (string): A text string in a delimited-value format.
- options.delimiter (string): A single-character delimiter string between column values (default ',').
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.header (boolean): Boolean flag (default true) to specify the presence of a header row. If true, indicates the CSV contains a header row with column names. If false, indicates the CSV does not contain a header row and the columns are given the names 'col1', 'col2', etc. unless the names option is specified.
- options.names (string[]): An array of column names to use for header-less CSV files. This option is ignored if the header option is true.
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// create table from an input CSV string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromCSV('a,b\n1,2\n3,4')
// skip commented lines
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { comment: '#' })
// skip the first line
aq.fromCSV('# a comment\na,b\n1,2\n3,4', { skip: 1 })
// override autoType with custom parser for column 'a'
// akin to table({ a: ['00152', '30219'], b: [2, 4] })
aq.fromCSV('a,b\n00152,2\n30219,4', { parse: { a: String } })
// parse semi-colon delimited text with comma as decimal separator
aq.fromCSV('a;b\nu;-1,23\nv;3,45e5', { delimiter: ';', decimal: ',' })
// create table from an input CSV loaded from 'url'
// for performant stream-based parsing, use the loadCSV method
aq.fromCSV(await fetch(url).then(res => res.text()))
# aq.fromFixed(input[, options]) · Source
Parse a fixed-width file input and return a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs parsing only. To specify a URL or file to load, use loadFixed.
- input (string): A text string in a fixed-width format.
- options.positions ([number, number][]): Array of [start, end] indices for fixed-width columns.
- options.widths (number[]): Array of fixed column widths. This option is ignored if the positions property is specified.
- options.names (string[]): An array of column names. The array length should match the length of the positions or widths array. If not specified or shorter than the other array, default column names are generated.
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// create table from an input fixed-width string
// akin to table({ u: ['a', 'b'], v: [1, 2] })
aq.fromFixed('a1\nb2', { widths: [1, 1], names: ['u', 'v'] })
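Column extents may instead be given as [start, end] positions; a minimal sketch equivalent to the example above:
// akin to table({ u: ['a', 'b'], v: [1, 2] })
aq.fromFixed('a1\nb2', { positions: [[0, 1], [1, 2]], names: ['u', 'v'] })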
# aq.fromJSON(input[, options]) · Source
Parse JavaScript Object Notation (JSON) input and return a table. String values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false. To perform custom parsing of input column values, use the parse option. Auto-type Date parsing is not performed for columns with custom parse options.
This method performs parsing only. To specify a URL or file to load, use loadJSON. Additionally, the table method reads pre-parsed column-oriented JSON data into an Arquero table without type inference, while the from method similarly maps pre-parsed row-oriented JSON data into an Arquero table.
- input (string | object[] | object): A string in a supported JSON format, or pre-parsed JSON data.
- options.type ('columns' | 'rows' | 'ndjson' | null): The JSON format type. One of 'columns' (for an object with named column arrays), 'rows' (for an array of row objects), or 'ndjson' for newline-delimited JSON rows. For 'ndjson', each line of text must contain a JSON row object (with no trailing comma) and string properties must not contain any newline characters. If no format type is specified, one of 'rows' or 'columns' is inferred from the structure of the parsed JSON.
- options.columns (string[]): An array of column names to include. JSON properties missing from this list are not included in the table.
- options.skip (number): The number of lines to skip (default 0) before reading data. Applicable to the 'ndjson' type only.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped. Applicable to the 'ndjson' type only.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference. If false, automatic date parsing for input JSON strings is disabled.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
JSON Format Types
'columns': column-oriented JSON as an object-of-arrays.
{
"colA": ["a", "b", "c"],
"colB": [1, 2, 3]
}
'rows': row-oriented JSON as an array-of-objects.
[
{"colA": "a", "colB": 1},
{"colA": "b", "colB": 2},
{"colA": "c", "colB": 3}
]
'ndjson': newline-delimited JSON as individual objects separated by newline.
{"colA": "a", "colB": 1}
{"colA": "b", "colB": 2}
{"colA": "c", "colB": 3}
Examples
// create table from an input JSON string
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromJSON('{"a":[1,3],"b":[2,4]}')
// create table from an input JSON string loaded from 'url'
aq.fromJSON(await fetch(url).then(res => res.text()))
// create table from an input JSON object loaded from 'url'
// disable autoType Date parsing
aq.fromJSON(await fetch(url).then(res => res.json()), { autoType: false })
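The type option also applies when parsing strings directly; a minimal sketch using newline-delimited input:
// parse newline-delimited JSON rows
// akin to table({ a: [1, 3], b: [2, 4] })
aq.fromJSON('{"a":1,"b":2}\n{"a":3,"b":4}', { type: 'ndjson' })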
# aq.fromArrowStream(stream[, options]) · Source
Returns a Promise to a new table backed by Apache Arrow binary data. The stream must be a ReadableStream of bytes, which is then decoded using Flechette.
For many data types, Arquero uses binary-encoded Arrow columns as-is with zero data copying. For dictionary columns, Arquero unpacks columns with null entries or containing multiple record batches to optimize query performance.
Both the Arrow IPC stream and file formats are supported; the format type is determined automatically. This method performs parsing only. To specify a URL or file to load, use loadArrow.
- stream (ReadableStream): A readable stream of Apache Arrow IPC bytes.
- options.columns (Select): An ordered set of columns to import. The input may consist of: column name strings, column integer indices, objects with current column names as keys and new column names as values (for renaming), or a selection helper function such as all, not, or range.
- options.useBigInt (boolean): Boolean flag (default false) to extract 64-bit integer types as JavaScript BigInt values. For Flechette tables, the default is to coerce 64-bit integers to JavaScript numbers and raise an error if the number is out of range. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDate (boolean): Boolean flag (default true) to convert Arrow date and timestamp values to JavaScript Date objects. Otherwise, numeric timestamps are used. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useDecimalBigInt (boolean): Boolean flag (default false) to extract Arrow decimal-type data as BigInt values, where fractional digits are scaled to integers. Otherwise, decimals are (sometimes lossily) converted to floating-point numbers. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useMap (boolean): Boolean flag (default false) to represent Arrow Map data as JavaScript Map values. For Flechette tables, the default is to produce an array of [key, value] arrays. This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
- options.useProxy (boolean): Boolean flag (default false) to extract Arrow Struct values and table row objects using zero-copy proxy objects that extract data from underlying Arrow batches. The proxy objects can improve performance and reduce memory usage, but do not support property enumeration (Object.keys, Object.values, Object.entries) or spreading ({ ...object }). This option is only applied when parsing IPC binary data, otherwise the settings of the provided table instance are used.
Examples
// create a table from a stream of Apache Arrow IPC bytes
const dt = await aq.fromArrowStream(byteStream);
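A fetch response body can serve as the input byte stream; a minimal sketch (the url value is illustrative):
// decode Arrow IPC bytes directly from a fetch response stream
const dt = await aq.fromArrowStream((await fetch(url)).body);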
# aq.fromCSVStream(stream[, options]) · Source
Parse a comma-separated values (CSV) stream and return a Promise to a table. Delimiters other than commas, such as tabs or pipes (‘|’), can be specified using the delimiter option. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use the parse option.
This method performs parsing only. To specify a URL or file to load, use loadCSV.
- stream (ReadableStream<string>): A readable stream of text in a delimited-value format.
- options.delimiter (string): A single-character delimiter string between column values (default ',').
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.header (boolean): Boolean flag (default true) to specify the presence of a header row. If true, indicates the CSV contains a header row with column names. If false, indicates the CSV does not contain a header row and the columns are given the names 'col1', 'col2', etc. unless the names option is specified.
- options.names (string[]): An array of column names to use for header-less CSV files. This option is ignored if the header option is true.
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// parse CSV from a compressed input stream
// (these stream transforms are performed internally by loadCSV)
const stream = (await fetch(url).then(res => res.body))
.pipeThrough(new DecompressionStream('gzip')) // decompress bytes
.pipeThrough(new TextDecoderStream()); // map bytes to strings
await aq.fromCSVStream(stream);
# aq.fromFixedStream(stream[, options]) · Source
Parse a fixed-width file stream and return a Promise to a table. By default, automatic type inference is performed for input values; string values that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set options.autoType to false, which will cause all columns to be loaded as strings. To perform custom parsing of input column values, use options.parse.
This method performs parsing only. To specify a URL or file to load, use loadFixed.
- stream (ReadableStream<string>): A readable stream of text in a fixed-width format.
- options.positions ([number, number][]): Array of [start, end] indices for fixed-width columns.
- options.widths (number[]): Array of fixed column widths. This option is ignored if the positions property is specified.
- options.names (string[]): An array of column names. The array length should match the length of the positions or widths array. If not specified or shorter than the other array, default column names are generated.
- options.decimal (string): A single-character numeric decimal separator (default '.').
- options.skip (number): The number of lines to skip (default 0) before reading data.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference.
- options.autoMax (number): Maximum number of initial rows (default 1000) to use for type inference.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
Examples
// parse fixed width text from a compressed input stream
// (these stream transforms are performed internally by loadFixed)
const stream = (await fetch(url).then(res => res.body))
.pipeThrough(new DecompressionStream('gzip')) // decompress bytes
.pipeThrough(new TextDecoderStream()); // map bytes to strings
await aq.fromFixedStream(stream);
# aq.fromJSONStream(stream[, options]) · Source
Parse a JavaScript Object Notation (JSON) stream and return a Promise to a table. String values in JSON column arrays that match the ISO standard date format are parsed into JavaScript Date objects. To disable this behavior, set the autoType option to false. To perform custom parsing of input column values, use the parse option. Auto-type Date parsing is not performed for columns with custom parse options.
This method performs parsing only. To specify a URL or file to load, use loadJSON. Additionally, the table method reads pre-parsed column-oriented JSON data into an Arquero table without type inference, while the from method similarly maps pre-parsed row-oriented JSON data into an Arquero table.
- stream (ReadableStream<string>): A readable stream of text in a supported JSON format.
- options.type ('columns' | 'rows' | 'ndjson' | null): The JSON format type. One of 'columns' (for an object with named column arrays), 'rows' (for an array of row objects), or 'ndjson' for newline-delimited JSON rows. For 'ndjson', each line of text must contain a JSON row object (with no trailing comma) and string properties must not contain any newline characters. If no format type is specified, one of 'rows' or 'columns' is inferred from the structure of the parsed JSON.
- options.columns (string[]): An array of column names to include. JSON properties missing from this list are not included in the table.
- options.skip (number): The number of lines to skip (default 0) before reading data. Applicable to the 'ndjson' type only.
- options.comment (string): A string used to identify comment lines. Any lines that start with the comment pattern are skipped. Applicable to the 'ndjson' type only.
- options.autoType (boolean): Boolean flag (default true) for automatic type inference. If false, automatic date parsing for input JSON strings is disabled.
- options.parse (Record<string, function>): Object of column parsing options. The object keys should be column names. The object values should be parsing functions to invoke to transform values upon input.
JSON Format Types
'columns': column-oriented JSON as an object-of-arrays.
{
"colA": ["a", "b", "c"],
"colB": [1, 2, 3]
}
'rows': row-oriented JSON as an array-of-objects.
[
{"colA": "a", "colB": 1},
{"colA": "b", "colB": 2},
{"colA": "c", "colB": 3}
]
'ndjson': newline-delimited JSON as individual objects separated by newline.
{"colA": "a", "colB": 1}
{"colA": "b", "colB": 2}
{"colA": "c", "colB": 3}
Examples
// create table from an input text stream in NDJSON format
aq.fromJSONStream(textStream, { type: 'ndjson' })
Methods for invoking or modifying table expressions.
All table expression operations, including standard functions, aggregate functions, and window functions. See the Operations API Reference for documentation of all available functions.
# aq.agg(table, expression) · Source
Compute a single aggregate value for a table. This method is a convenient shortcut for ungrouping a table, applying a rollup verb for a single aggregate expression, and extracting the resulting aggregate value.
Examples
aq.agg(aq.table({ a: [1, 2, 3] }), op.max('a')) // 3
aq.agg(aq.table({ a: [1, 3, 5] }), d => [op.min(d.a), op.max(d.a)]) // [1, 5]
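Because agg ungroups its input before applying the rollup, any existing groupings are ignored; a minimal sketch:
aq.agg(aq.table({ a: [1, 2, 3] }).groupby('a'), op.sum('a')) // 6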
# aq.escape(value) · Source
Annotate a JavaScript function or value to bypass Arquero’s default table expression handling. Escaped values enable the direct use of JavaScript functions to process row data: no internal parsing or code generation is performed, and so closures and arbitrary function invocations are supported. Escaped values provide a lightweight alternative to table params and function registration to access variables in enclosing scopes.
An escaped value can be applied anywhere Arquero accepts single-table table expressions, including the derive, filter, and spread verbs. In addition, any of the standard op functions can be used within an escaped function. However, aggregate and window op functions are not supported. Also note that using escaped values will break serialization of Arquero queries to worker threads.
- value (any): A function or value to escape. Aggregate and window op functions are not permitted within escaped functions.
Examples
// filter based on a variable defined in the enclosing scope
const thresh = 5;
aq.table({ a: [1, 4, 9], b: [1, 2, 3] })
.filter(aq.escape(d => d.a < thresh))
// { a: [1, 4], b: [1, 2] }
// apply a parsing function defined in the enclosing scope
const parseMDY = d3.timeParse('%m/%d/%Y');
aq.table({ date: ['1/1/2000', '06/01/2010', '12/10/2020'] })
.derive({ date: aq.escape(d => parseMDY(d.date)) })
// { date: [new Date(2000,0,1), new Date(2010,5,1), new Date(2020,11,10)] }
// spread results from an escaped function that returns an array
const denom = 4;
aq.table({ a: [1, 4, 9] })
.spread(
{ a: aq.escape(d => [Math.floor(d.a / denom), d.a % denom]) },
{ as: ['div', 'mod'] }
)
// { div: [0, 1, 2], mod: [1, 0, 1] }
# aq.bin(name[, options]) · Source
Generate a table expression that performs uniform binning of number values. The resulting string can be used as part of the input to table transformation verbs.
- name (string): The name of the column to bin.
- options.maxbins (number): The maximum number of allowed bins.
- options.nice (boolean): Boolean flag (default true) indicating if bins should snap to “nice” human-friendly values such as multiples of ten.
- options.offset (number): Step offset (default 0) for the returned bin boundary: a value of 0 floors to the lower bin boundary, a value of 1 snaps one step higher to the upper bin boundary, and so on.
Examples
aq.bin('colA', { maxbins: 20 })
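The generated expression string is typically assigned within a derive verb; a minimal sketch:
// derive a new column of binned values
aq.table({ a: [1.2, 4.4, 8.9] }).derive({ abin: aq.bin('a', { maxbins: 4 }) })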
# aq.collate(expr, comparator[, options]) · Source
Annotate a table expression with collation metadata, indicating how expression values should be compared and sorted. The orderby verb uses collation metadata to determine sort order. The collate helper is particularly useful for locale-specific string comparisons. The collation information can either take the form of a standard two-argument comparator function, or locale and option arguments compatible with Intl.Collator.
- expr: The table expression to annotate.
- comparator: A two-argument comparator function, or a locale specification. Both locale strings ('de', 'tr', etc.) and Intl.Locale objects (or an array with either) are supported.
- options (object): Options to pass to Intl.Collator. This argument only applies if locales are provided as the second argument.
Examples
// order colA using a German locale
aq.collate('colA', 'de')
// order colA using a provided comparator function
aq.collate('colA', new Intl.Collator('de').compare)
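Collation annotations take effect when passed to orderby; a minimal sketch, where table stands for any Arquero table:
// sort a table by colA using German collation
table.orderby(aq.collate('colA', 'de'))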
# aq.desc(expr) · Source
Annotate a table expression (expr) to indicate descending sort order.
Examples
// sort colA in descending order
aq.desc('colA')
// sort colA in descending order of lower case values
aq.desc(d => op.lower(d.colA))
# aq.frac(fraction) · Source
Generate a table expression that computes the number of rows corresponding to a given fraction for each group. The resulting string can be used as part of the input to the sample verb.
Examples
aq.frac(0.5)
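The result is typically passed to the sample verb for per-group sampling; a minimal sketch, where table stands for any Arquero table:
// sample half the rows of each group
table.groupby('colA').sample(aq.frac(0.5))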
# aq.rolling(expr[, frame, includePeers]) · Source
Annotate a table expression to compute rolling aggregate or window functions within a sliding window frame. For example, to specify a rolling 7-day average centered on the current day, call rolling with a frame value of [-3, 3].
- expr: The table expression to annotate.
- frame ([number, number]): The sliding window frame offsets. If unspecified or null, the default frame [-Infinity, 0] includes the current value and all preceding values.
- includePeers (boolean): If false (the default), the window frame boundaries are insensitive to peer values. If true, the window frame expands to include all peers. This parameter only affects operations that depend on the window frame: namely aggregate functions and the first_value, last_value, and nth_value window functions.
Examples
// cumulative sum, with an implicit frame of [-Infinity, 0]
aq.rolling(d => op.sum(d.colA))
// centered 7-day moving average, assuming one value per day
aq.rolling(d => op.mean(d.colA), [-3, 3])
// retrieve last value in window frame, including peers (ties)
aq.rolling(d => op.last_value(d.colA), [-3, 3], true)
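Rolling annotations are applied within verbs such as derive; a minimal sketch, where the table and column names are illustrative:
// 7-day moving average over date-sorted rows, one value per day
table.orderby('date').derive({ avg: aq.rolling(d => op.mean(d.value), [-3, 3]) })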
# aq.seed(seed) · Source
Set a seed value for random number generation. If the seed is a valid number, a 32-bit linear congruential generator with the given seed will be used to generate random values. If the seed is null, undefined, or not a valid number, the random number generator will revert to Math.random.
Examples
// set random seed as an integer
aq.seed(12345)
// set random seed as a fraction, maps to floor(fraction * (2 ** 32))
aq.seed(0.5)
// revert to using Math.random
aq.seed(null)
Methods for selecting columns. The result of these methods can be passed as arguments to select, groupby, join and other transformation verbs.
# aq.all() · Source
Select all columns in a table. Returns a function-valued selection compatible with select.
Examples
aq.all()
# aq.not(...selection) · Source
Negate a column selection, selecting all other columns in a table. Returns a function-valued selection compatible with select.
Examples
aq.not('colA', 'colB')
aq.not(aq.range(2, 5))
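Selection helpers are passed directly to verbs; a minimal sketch, where table stands for any Arquero table:
// keep all columns except 'colA' and 'colB'
table.select(aq.not('colA', 'colB'))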
# aq.range(start, stop) · Source
Select a contiguous range of columns. Returns a function-valued selection compatible with select.
Examples
aq.range('colB', 'colE')
aq.range(2, 5)
# aq.matches(pattern) · Source
Select all columns whose names match a pattern. Returns a function-valued selection compatible with select.
Examples
// contains the string 'col'
aq.matches('col')
// has 'a', 'b', or 'c' as the first character (case-insensitive)
aq.matches(/^[abc]/i)
# aq.startswith(string) · Source
Select all columns whose names start with a string. Returns a function-valued selection compatible with select.
Examples
aq.startswith('prefix_')
# aq.endswith(string) · Source
Select all columns whose names end with a string. Returns a function-valued selection compatible with select.
Examples
aq.endswith('_suffix')
# aq.names(...names) · Source
Select columns by index and rename them to the provided names. Returns a selection helper function that takes a table as input and produces a rename map as output. If the number of provided names is less than the number of table columns, the rename map will include entries for the provided names only. If the number of table columns is less than the number of provided names, the rename map will include only entries that cover the existing columns.
Examples
// helper to rename the first three columns to 'a', 'b', 'c'
aq.names('a', 'b', 'c')
// names can also be passed as arrays
aq.names(['a', 'b', 'c'])
// rename the first three columns, all other columns remain as-is
table.rename(aq.names(['a', 'b', 'c']))
// select and rename the first three columns, all other columns are dropped
table.select(aq.names(['a', 'b', 'c']))