flechette

Flechette API Reference

Top-Level Data Types Schema Table Column

Data Type Overview

The table below provides an overview of all data types supported by the Apache Arrow format and how Flechette maps them to JavaScript types. The table indicates if Flechette can read the type (via tableFromIPC), write the type (via tableToIPC), and build the type from JavaScript values (via tableFromArrays or columnFromArray).

Id Data Type Read? Write? Build? JavaScript Type
-1 Dictionary depends on dictionary value type
1 Null null
2 Int number, or bigint for 64-bit values via the useBigInt flag
3 Float number
4 Binary Uint8Array
5 Utf8 string
6 Bool boolean
7 Decimal number, or bigint via the useDecimalBigInt flag
8 Date number, or Date via the useDate flag.
9 Time number, or bigint for 64-bit values via the useBigInt flag
10 Timestamp number, or Date via the useDate flag.
11 Interval depends on the interval unit
12 List Array or TypedArray of child type
13 Struct object, properties depend on child types
14 Union depends on child types
15 FixedSizeBinary Uint8Array
16 FixedSizeList Array or TypedArray of child type
17 Map [key, value][], or Map via the useMap flag
18 Duration number, or bigint via the useBigInt flag
19 LargeBinary Uint8Array
20 LargeUtf8 string
21 LargeList Array or TypedArray of child type
22 RunEndEncoded depends on child type
23 BinaryView Uint8Array
24 Utf8View string
25 ListView Array or TypedArray of child type
26 LargeListView Array or TypedArray of child type

Data Type Methods

Each of the methods below returns a DataType instance as a standard JavaScript object. Data types are used to define a Field within a table Schema. See the Schema documentation for more.

Dictionary


# dictionary(type[, indexType, ordered, id])

Create a Dictionary data type instance. A dictionary type consists of a dictionary of values (which may be of any type) and corresponding integer indices that reference those values. If values are repeated, a dictionary encoding can provide substantial space savings. In the IPC format, dictionary indices reside alongside other columns in a record batch, while dictionary values are written to special dictionary batches, linked by a unique dictionary id assigned at encoding time. Internally Flechette extracts dictionary values immediately upon decoding; while this incurs some initial overhead, it enables fast subsequent lookups.

import { dictionary, int16, utf8 } from '@uwdata/flechette';
// dictionary type with string values and int16 indices
// {
//   typeId: -1,
//   id: -1,
//   dictionary: { typeId: 5, ... },
//   indices: { typeId: 2, bitWidth: 16, signed: true, ... }
//   ordered: false
// }
dictionary(utf8(), int16())

Null


# nullType()

Create a Null data type instance. Null data requires no storage and all extracted values are null.

import { nullType } from '@uwdata/flechette';
// { typeId: 1 }
nullType()

Int


# int([bitWidth, signed])

Create an Int data type instance. Integer values are stored within typed arrays and extracted to JavaScript number values by default.

import { int } from '@uwdata/flechette';
// { typeId: 2, bitWidth: 32, signed: true, ... }
int()

# int8()

Create an Int data type instance for 8-bit signed integers. 8-bit signed integers are stored within an Int8Array and accessed directly.


# int16()

Create an Int data type instance for 16-bit signed integers. 16-bit signed integers are stored within an Int16Array and accessed directly.


# int32()

Create an Int data type instance for 32-bit signed integers. 32-bit signed integers are stored within an Int32Array and accessed directly.


# int64()

Create an Int data type instance for 64-bit signed integers. 64-bit signed integers are stored within a BigInt64Array and converted to JavaScript number values upon extraction. An error is raised if a value exceeds either Number.MIN_SAFE_INTEGER or Number.MAX_SAFE_INTEGER. Pass the useBigInt extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract 64-bit integers directly as BigInt values.


# uint8()

Create an Int data type instance for 8-bit unsigned integers. 8-bit unsigned integers are stored within an Uint8Array and accessed directly.


# uint16()

Create an Int data type instance for 16-bit unsigned integers. 16-bit unsigned integers are stored within an Uint16Array and accessed directly.


# uint32()

Create an Int data type instance for 32-bit unsigned integers. 32-bit unsigned integers are stored within an Uint32Array and accessed directly.


# uint64()

Create an Int data type instance for 64-bit unsigned integers. 64-bit unsigned integers are stored within a BigUint64Array and converted to JavaScript number values upon extraction. An error is raised if a value exceeds Number.MAX_SAFE_INTEGER. Pass the useBigInt extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract 64-bit integers directly as BigInt values.

Float


# float([precision])

Create a Float data type instance for floating point numbers. Floating point values are stored within typed arrays and extracted to JavaScript number values.

import { float } from '@uwdata/flechette';
// { typeId: 3, precision: 2, ... }
float()

# float16()

Create a Float data type instance for 16-bit (half precision) floating point numbers. 16-bit floats are stored within a Uint16Array and converted to/from number values. We intend to use Float16Array once it is widespread among JavaScript engines.


# float32()

Create a Float data type instance for 32-bit (single precision) floating point numbers. 32-bit floats are stored within a Float32Array and accessed directly.


# float64()

Create a Float data type instance for 64-bit (double precision) floating point numbers. 64-bit floats are stored within a Float64Array and accessed directly.

Binary


# binary()

Create a Binary data type instance for variably-sized opaque binary data with 32-bit offsets. Binary values are stored in a Uint8Array using a 32-bit offset array and extracted to JavaScript Uint8Array subarray values.

import { binary } from '@uwdata/flechette';
// { typeId: 4 }
binary()

Utf8


# utf8()

Create a Utf8 data type instance for Unicode string data of variable length with 32-bit offsets. UTF-8 code points are stored as binary data and extracted to JavaScript string values using TextDecoder. Due to decoding overhead, repeated access to string data can be costly. If making multiple passes over Utf8 data, we recommended converting the string upfront (e.g., via Column.toArray) and accessing the result.

import { utf8 } from '@uwdata/flechette';
// { typeId: 5 }
utf8()

Bool


# bool()

Create a Bool data type instance for boolean data. Bool values are stored compactly in Uint8Array bitmaps with eight values per byte, and extracted to JavaScript boolean values.

import { bool } from '@uwdata/flechette';
// { typeId: 6 }
bool()

Decimal


# decimal(precision, scale[, bitWidth])

Create an Decimal data type instance for exact decimal values, represented as a 128 or 256-bit integer value in two’s complement. Decimals are fixed point numbers with a set precision (total number of decimal digits) and scale (number of fractional digits). For example, the number 35.42 can be represented as 3542 with precision ≥ 4 and scale = 2.

By default, Flechette converts decimals to 64-bit floating point numbers upon extraction (e.g., mapping 3542 back to 35.42). While useful for many downstream applications, this conversion may be lossy and introduce inaccuracies. Pass the useDecimalBigInt extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract decimal data as BigInt values.

import { utf8 } from '@uwdata/flechette';
// decimal with 18 total digits, including 3 fractional digits
// { typeId: 7, precision: 18, scale: 3, bitWidth: 128, ... }
decimal(18, 3)

Date


# date(unit)

Create a Date data type instance. Date values are 32-bit or 64-bit signed integers representing an elapsed time since the UNIX epoch (Jan 1, 1970 UTC), either in units of days (32 bits) or milliseconds (64 bits, with values evenly divisible by 86400000). Dates are stored in either an Int32Array (days) or BigInt64Array (milliseconds).

By default, extracted date values are converted to JavaScript number values representing milliseconds since the UNIX epoch. Pass the useDate extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract date values as JavaScript Date objects.

import { DateUnit, date } from '@uwdata/flechette';
// { typeId: 8, unit: 0, ... }
date(DateUnit.DAY)

# dateDay()

Create a Date data type instance with units of DateUnit.DAY.

import { dateDay } from '@uwdata/flechette';
// { typeId: 8, unit: 0, ... }
dateDay()

# dateMillisecond()

Create a Date data type instance with units of DateUnit.MILLISECOND.

import { dateMillisecond } from '@uwdata/flechette';
// { typeId: 8, unit: 1, ... }
dateMillisecond()

Time


# time([unit, bitWidth])

Create a Time data type instance, stored in one of four units: seconds, milliseconds, microseconds or nanoseconds. The integer bitWidth depends on the unit and must be 32 bits for seconds and milliseconds or 64 bits for microseconds and nanoseconds. The allowed values are between 0 (inclusive) and 86400 (=246060) seconds (exclusive), adjusted for the time unit (for example, up to 86400000 exclusive for the DateUnit.MILLISECOND unit.

This definition doesn’t allow for leap seconds. Time values from measurements with leap seconds will need to be corrected when ingesting into Arrow (for example by replacing the value 86400 with 86399).

Time values are stored as integers in either an Int32Array (bitWidth = 32) or BigInt64Array (bitWidth = 64). By default, all time values are returned as untransformed number values. 64-bit values are stored within a BigInt64Array and converted to JavaScript number values upon extraction. An error is raised if a value exceeds either Number.MIN_SAFE_INTEGER or Number.MAX_SAFE_INTEGER. Pass the useBigInt extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract 64-bit time values directly as BigInt values.

import { TimeUnit, time } from '@uwdata/flechette';
// { typeId: 9, unit: 1, bitWidth: 32, ... }
time()
// { typeId: 9, unit: 2, bitWidth: 64, ... }
time(TimeUnit.MICROSECONDS, 64)

# timeSecond()

Create a Time data type instance with units of TimeUnit.SECOND.

import { timeSecond } from '@uwdata/flechette';
// { typeId: 9, unit: 0, bitWidth: 32, ... }
timeSecond()

# timeMillisecond()

Create a Time data type instance with units of TimeUnit.MILLISECOND.

import { timeMillisecond } from '@uwdata/flechette';
// { typeId: 9, unit: 1, bitWidth: 32, ... }
timeMillisecond()

# timeMicrosecond()

Create a Time data type instance with units of TimeUnit.MICROSECOND.

import { timeMicrosecond } from '@uwdata/flechette';
// { typeId: 9, unit: 2, bitWidth: 64, ... }
timeMicrosecond()

# timeNanosecond()

Create a Time data type instance with units of TimeUnit.NANOSECOND.

import { timeNanosecond } from '@uwdata/flechette';
// { typeId: 9, unit: 3, bitWidth: 64, ... }
timeNanosecond()

Timestamp


# timestamp([unit, timezone])

Create a Timestamp data type instance. Timestamp values are 64-bit signed integers representing an elapsed time since a fixed epoch, stored in either of four units: seconds, milliseconds, microseconds or nanoseconds, and are optionally annotated with a timezone. Timestamp values do not include any leap seconds (in other words, all days are considered 86400 seconds long).

Timestamp values are stored in a BigInt64Array and converted to millisecond-based JavaScript number values (potentially with fractional digits) upon extraction. An error is raised if a value exceeds either Number.MIN_SAFE_INTEGER or Number.MAX_SAFE_INTEGER. Pass the useDate extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract timestamp values as JavaScript Date objects.

import { timestamp } from '@uwdata/flechette';
// { typeId: 10, unit: 1, timezone: null, ... }
timestamp()
// { typeId: 10, unit: 2, timezone: 'Europe/Berlin', ... }
timestamp(TimeUnit.MICROSECOND, 'Europe/Berlin')

Interval


# interval([unit])

Create an Interval data type instance. Values represent calendar intervals stored using integers for each date part. The supported units are year/moth, day/time, and month/day/nanosecond intervals.

IntervalUnit.YEAR_MONTH indicates the number of elapsed whole months, stored as 32-bit signed integers. Flechette extracts these month count integers directly.

IntervalUnit.DAY_TIME indicates the number of elapsed days and milliseconds (no leap seconds), stored as 2 contiguous 32-bit signed integers (8-bytes in total). Flechette extracts these values to two-element [day, time] Int32Array instances.

IntervalUnit.MONTH_DAY_NANO is a triple of the number of elapsed months, days, and nanoseconds. The values are stored contiguously in 16-byte blocks. Months and days are encoded as 32-bit signed integers and nanoseconds is encoded as a 64-bit signed integer. Nanoseconds does not allow for leap seconds. Each field is independent (e.g. there is no constraint that nanoseconds have the same sign as days or that the quantity of nanoseconds represents less than a day’s worth of time). Flechette extracts these values to three-element [month, day, nano] Float64Array instances.

import { interval } from '@uwdata/flechette';
// { typeId: 11, unit: 0, ... }
interval(IntervalUnit.YEAR_MONTH)
// { typeId: 11, unit: 2, ... }
interval(IntervalUnit.MONTH_DAY_NANO)

List


# list(child)

Create a List type instance, representing variably-sized lists (arrays) with 32-bit offsets. A list has a single child data type for list entries. Lists are represented using integer offsets that indicate list extents within a single child array containing all list values. Lists are extracted to either Array or TypedArray instances, depending on the child type.

import { int32, list } from '@uwdata/flechette';
// {
//   typeId: 12,
//   children: [{
//     name: '',
//     type: type: { typeId: 2, bitWidth: 32, signed: true, ... },
//     ...
//   }],
//   ...
// }
list(int32())

Struct


# struct(children)

Create a Struct type instance. A struct consists of multiple named child data types. Struct values are stored as parallel child batches, one per child type.

By default, structs are fully extracted to standard JavaScript objects. Pass the useProxy extraction option (e.g., to tableFromIPC or tableFromArrays) to instead use zero-copy proxy row objects that extract data on-demand from underlying Arrow batches. Proxy objects can improve performance and reduce memory usage, but do not support convenient property enumeration (Object.keys, Object.values, Object.entries) or spreading ({ ...object }). A proxy object can be converted to a standard object by calling its toJSON() method.

import { bool, field, float32, int16, struct } from '@uwdata/flechette';
// using an object with property names and types
// {
//   typeId: 13,
//   children: [
//     { name: 'foo', type: { typeId: 2, bitWidth: 16, ... }, ... },
//     { name: 'bar', type: { typeId: 6 }, ... },
//     { name: 'baz', type: { typeId: 3, precision: 1, ... }, ... }
//   ]
// }
struct({ foo: int16(), bar: bool(), baz: float32() })

// using an array of Field instances
struct([
  field('foo', int16()),
  field('bar', bool()),
  field('baz', float32())
])

Union


# union(mode, children[, typeIds, typeIdForValue])

Create a Union type instance. A union is a complex type with parallel children data types. Union values are stored in either a sparse (UnionMode.Sparse) or dense (UnionMode.Dense) layout mode. In a sparse layout, child types are stored in parallel arrays with the same lengths, resulting in many unused, empty values. In a dense layout, child types have variable lengths and an offsets array is used to index the appropriate value.

By default, ids in the type vector refer to the index in the children array. Optionally, typeIds provide an indirection between the child index and the type id. For each child, typeIds[index] is the id used in the type vector. The typeIdForValue argument provides a lookup function for mapping input data to the proper child type id, and is required if using builder methods.

Extracted JavaScript values depend on the child types.

import { float64, utf8, union } from '@uwdata/flechette';
// {
//   typeId: 14,
//   mode: 1,
//   typeIds: [ 0, 1 ],
//   typeMap: { '0': 0, '1': 1 },
//   children: [
//     { name: '_0', type: { typeId: 3, precision: 2, ... }, ... },
//     { name: '_1', type: { typeId: 5 }, ... }
//   ],
//   typeIdForValue: <<function>>
// }
union(
  UnionMode.Dense,
  [float64(), utf8()],
  [0, 1],
  v => typeof v === 'string' ? 1 : 0
)

FixedSizeBinary


# fixedSizeBinary(stride)

Create a FixedSizeBinary data type instance for opaque binary data where each entry has the same fixed size. Fixed binary data are stored in a single Uint8Array, indexed using the known stride and extracted to JavaScript Uint8Array subarray values.

import { fixedSizeBinary } from '@uwdata/flechette';
// { typeId: 15, stride: 128 }
fixedSizeBinary(128)

FixedSizeList


# fixedSizeList(child, stride)

Create a FixedSizeList type instance for list (array) data where every list has the same fixed size. A list has a single child data type for list entries. Fixed size lists are represented as a single child array containing all list values, indexed using the known stride. Lists are extracted to either Array or TypedArray instances, depending on the child type.

import { fixedSizeList, float32 } from '@uwdata/flechette';
// {
//   typeId: 16,
//   stride: 8,
//   children: [ { name: '', type: { typeId: 3, precision: 1, ... }, ... } ]
// }
fixedSizeList(float32(), 8)

Map


# map(keyField, valueField[, keysSorted])

Create a Map type instance representing collections of key-value pairs. A Map is a logical nested type that is represented as a list of key-value structs. The key and value types are not constrained, so the application is responsible for ensuring that the keys are hashable and unique, and that keys are properly sorted if keysSorted is true.

By default, map data is extracted to arrays of [key, value] pairs, in the style of Object.entries. Pass the useMap extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract JavaScript Map instances.

import { int64, map, utf8 } from '@uwdata/flechette';
// {
//   typeId: 17,
//   keysSorted: false,
//   children: [{
//     name: 'entries',
//     type: {
//       typeId: 13,
//       children: [
//         { name: 'key', type: { typeId: 5 }, ... },
//         { name: 'value', type: { typeId: 2, bitWidth: 64, ... }, ... }
//       ]
//     }, ...
//   }
//  ]}, ...
// }
map(utf8(), int64())

Duration


# duration([unit])

Create a Duration data type instance. Durations represent an absolute length of time unrelated to any calendar artifacts. The resolution defaults to millisecond, but can be any of the other TimeUnit values. This type is always represented as a 64-bit integer.

Duration values are stored as integers in a BigInt64Array. By default, duration values are extracted as JavaScript number values. An error is raised if a value exceeds either Number.MIN_SAFE_INTEGER or Number.MAX_SAFE_INTEGER. Pass the useBigInt extraction option (e.g., to tableFromIPC or tableFromArrays) to instead extract duration values directly as BigInt values.

import { duration } from '@uwdata/flechette';
// { typeId: 18, unit: 1, ... }
duration()

LargeBinary


# largeBinary()

Create a LargeBinary data type instance for variably-sized opaque binary data with 64-bit offsets, allowing representation of extremely large data values. Large binary values are stored in a Uint8Array, indexed using a 64-bit offset array and extracted to JavaScript Uint8Array subarray values.

import { largeBinary } from '@uwdata/flechette';
// { typeId: 19, ... }
largeBinary()

LargeUtf8


# largeUtf8()

Create a LargeUtf8 data type instance for Unicode string data of variable length with 64-bit offsets, allowing representation of extremely large data values. UTF-8 code points are stored as binary data and extracted to JavaScript string values using TextDecoder. Due to decoding overhead, repeated access to string data can be costly. If making multiple passes over Utf8 data, we recommended converting the string upfront (e.g., via Column.toArray) and accessing the result.

import { largeUtf8 } from '@uwdata/flechette';
// { typeId: 20, ... }
largeUtf8()

LargeList


# largeList(child)

Create a LargeList type instance, representing variably-sized lists (arrays) with 64-bit offsets, allowing representation of extremely large data values. A list has a single child data type for list entries. Lists are represented using integer offsets that indicate list extents within a single child array containing all list values. Lists are extracted to either Array or TypedArray instances, depending on the child type.

import { largeList, utf8 } from '@uwdata/flechette';
// { typeId: 21, children: [ { name: '', type: { typeId: 5 }, ... } ], ... }
largeList(utf8())

RunEndEncoded


# runEndEncoded(runsField, valuesField)

Create a RunEndEncoded type instance, which compresses data by representing consecutive repeated values as a run. This data type uses two child arrays, run_ends and values. The run_ends child array must be a 16, 32, or 64 bit integer array which encodes the indices at which the run with the value in each corresponding index in the values child array ends. Like list and struct types, the values array can be of any type.

To extract values by index, binary search is performed over the run_ends to locate the correct value. The extracted value depends on the values data type.

import { int32, runEndEncoded, utf8 } from '@uwdata/flechette';
// 32-bit integer run ends and utf8 string values
// {
//   typeId: 22,
//   children: [
//     { name: 'run_ends', type: { typeId: 2, bitWidth: 32, ... }, ... },
//     { name: 'values', type: { typeId: 5 }, ... }
//   ]
// }
runEndEncoded(int32(), utf8())

BinaryView


# binaryView()

Create a BinaryView type instance. BinaryView data is logically the same as the Binary type, but the internal representation uses a view struct that contains the string length and either the string’s entire data inline (for small strings) or an inlined prefix, an index of another buffer, and an offset pointing to a slice in that buffer (for non-small strings). For more details, see the Apache Arrow format documentation.

Flechette can encode and decode BinaryView data, extracting Uint8Array values. However, Flechette does not currently support building BinaryView columns from JavaScript values.

import { binaryView } from '@uwdata/flechette';
// { typeId: 23 }
binaryView()

Utf8View


# utf8View()

Create a Utf8View type instance. Utf8View data is logically the same as the Utf8 type, but the internal representation uses a view struct that contains the string length and either the string’s entire data inline (for small strings) or an inlined prefix, an index of another buffer, and an offset pointing to a slice in that buffer (for non-small strings). For more details, see the Apache Arrow format documentation.

Flechette can encode and decode Utf8View data, extracting string values. However, Flechette does not currently support building Utf8View columns from JavaScript values.

import { utf8View } from '@uwdata/flechette';
// { typeId: 24 }
utf8View()

ListView


# listView(child)

Create a ListView type instance, representing variably-sized lists (arrays) with 32-bit offsets. ListView data represents the same logical types that List can, but contains both offsets and sizes allowing for writes in any order and sharing of child values among list values. For more details, see the Apache Arrow format documentation.

ListView data are extracted to either Array or TypedArray instances, depending on the child type. Flechette can encode and decode ListView data; however, Flechette does not currently support building ListView columns from JavaScript values.

import { float16, listView } from '@uwdata/flechette';
// {
//   typeId: 25,
//   children: [ { name: 'value', type: { typeId: 3, ... }, ... } ]
// }
listView(float16())

LargeListView


# largeListView(child)

Create a LargeListView type instance, representing variably-sized lists (arrays) with 64-bit offsets, allowing representation of extremely large data values. LargeListView data represents the same logical types that LargeList can, but contains both offsets and sizes allowing for writes in any order and sharing of child values among list values. For more details, see the Apache Arrow format documentation.

LargeListView data are extracted to either Array or TypedArray instances, depending on the child type. Flechette can encode and decode LargeListView data; however, Flechette does not currently support building LargeListView columns from JavaScript values.

import { float16, largeListView } from '@uwdata/flechette';
// {
//   typeId: 26,
//   children: [ { name: 'value', type: { typeId: 3, ... }, ... } ]
// }
largeListView(float16())