Why Mosaic?
Though many expressive visualization tools exist, scalability to large datasets and interoperability across tools remain challenging. The visualization community lacks open, standardized tools for integrating visualization specifications with scalable analytic databases. While libraries like D3 embrace Web standards for cross-tool interoperability, higher-level frameworks often make closed-world assumptions, complicating integration with other tools and environments.
Mosaic is scalable
Visualization tools such as ggplot2, Vega-Lite / Altair, and Observable Plot support an expressive range of visualizations with a concise syntax. However, these tools were not designed to handle millions of data points. Mosaic provides greater scalability by pushing data-heavy computation to a backing DuckDB database. Mosaic improves performance further by caching results and, when possible, performing automatic query optimization.
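To make the idea of pushing computation to the database concrete, here is a minimal sketch of a histogram query that bins and counts inside the database, so the browser receives at most one row per bin rather than every data point. The helper `binnedCountQuery`, the `flights` table, and the `delay` column are hypothetical names used only for illustration. Mosaic builds its queries with its own SQL helpers rather than with string templates like this one.

```javascript
// Hypothetical helper (not part of Mosaic): build a SQL query that bins a
// numeric column and counts rows per bin inside the database.
// Note: identifiers are quoted naively; this sketch does no escaping.
function binnedCountQuery(table, column, { min, max, steps }) {
  const width = (max - min) / steps;
  // Map each value to a bin index in [0, steps - 1]. LEAST clamps the
  // maximum value into the last bin instead of opening a new one.
  const bin = `LEAST(${steps - 1}, FLOOR(("${column}" - (${min})) / ${width}))`;
  return [
    `SELECT (${min}) + ${width} * ${bin} AS x0, COUNT(*) AS n`,
    `FROM "${table}"`,
    `WHERE "${column}" BETWEEN ${min} AND ${max}`,
    `GROUP BY x0`,
    `ORDER BY x0`
  ].join('\n');
}

// 24 bins of width 10 over the range [-60, 180].
const sql = binnedCountQuery('flights', 'delay', { min: -60, max: 180, steps: 24 });
console.log(sql);
```

However many rows the table holds, the result of this query has at most 24 rows, which is what keeps the data transfer and rendering cost small.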
The figure below shows render times for static plots over increasing dataset sizes. Mosaic provides faster results, often by one or more orders of magnitude. DuckDB-WASM in the browser fares well, though it is limited (compared to a DuckDB server) by WebAssembly's lack of parallel processing. VegaFusion performs server-side optimization for bars and 2D histograms, but otherwise provides results identical to Vega-Lite.
When it comes to interaction, Mosaic really shines! For many forms of aggregated data, the coordinator will automatically pre-aggregate data into smaller tables ("materialized views") to support real-time interaction with billion+ element databases. The figure below shows benchmark results for optimized interactive updates. Even with billions of rows, Mosaic with a server-side DuckDB instance maintains interactive response rates.
If not already present, Mosaic will build pre-aggregated data tables when the mouse cursor enters a view. For very large data sets with longer pre-aggregation times, precomputation and server-side caching are supported.
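The pre-aggregation idea can be illustrated with a small in-memory sketch. It assumes a brush over one binned dimension (the "active" dimension) that filters a count histogram over another. The functions `preaggregate` and `queryBrush` and the sample rows are hypothetical. In Mosaic the equivalent table would be built inside the database, not in JavaScript.

```javascript
// Build a small pre-aggregated table: one entry per (activeBin, groupBin)
// pair, holding the number of rows that fall into that pair.
function preaggregate(rows, activeBin, groupBin) {
  const table = new Map();
  for (const row of rows) {
    const key = `${activeBin(row)}|${groupBin(row)}`;
    table.set(key, (table.get(key) ?? 0) + 1);
  }
  return table;
}

// Answer a brush over active bins [lo, hi] using only the small table.
// The cost depends on the number of bin pairs, not on the number of rows.
function queryBrush(table, lo, hi) {
  const counts = new Map();
  for (const [key, count] of table) {
    const [active, group] = key.split('|').map(Number);
    if (active >= lo && active <= hi) {
      counts.set(group, (counts.get(group) ?? 0) + count);
    }
  }
  return counts;
}

// Made-up sample data for illustration only.
const rows = [
  { delay: 5, distance: 120 },
  { delay: 12, distance: 480 },
  { delay: 14, distance: 150 },
  { delay: 31, distance: 900 },
  { delay: 38, distance: 130 }
];

const table = preaggregate(
  rows,
  (d) => Math.floor(d.delay / 10),     // brushed dimension: bins of width 10
  (d) => Math.floor(d.distance / 500)  // displayed dimension: bins of width 500
);

// Brush covering delay bins 1 through 3, i.e. delays in [10, 40).
const result = queryBrush(table, 1, 3);
console.log([...result]);
```

Once the small table exists, every brush movement is answered by a scan over bin pairs, which is why interaction can stay responsive even when the underlying table is very large.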
Other tasks, like changing a color encoding or adjusting a smoothing parameter, can be carried out quickly in the browser alone, including over aggregated data. Mosaic clients have the flexibility of choosing what works best.
Mosaic is interoperable
Mosaic provides an open, "middle-tier" architecture that manages data access and linked selections between clients. With a shared architecture, a visualization framework can readily interoperate with other libraries, including input components and other visualization tools. We demonstrate this through the design of vgplot, a Mosaic-based grammar of interactive graphics that combines concepts from existing visualization tools.
To link across views, Mosaic provides a generalized selection abstraction inspired by Vega-Lite. Compared to Vega-Lite, Mosaic selections are decoupled from input event handling and support more complex resolution strategies, with computation offloaded to a scalable backing database.
Importantly, Mosaic selections are first-class entities, and not internal to a single visualization language or tool. Any component that implements the Mosaic client interface can both issue queries and be automatically filtered by a provided selection. Mosaic inputs and vgplot plots can freely interact, as can any other components or visualizations (such as custom D3 plots) that follow the Mosaic client interface.
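As a rough illustration of what a resolution strategy does, the sketch below models a selection as a list of clauses, each contributed by one component, and resolves them into a single SQL predicate for a given client. The class `SketchSelection`, its method names, and its strategy labels are assumptions made for this example and are not Mosaic's implementation.

```javascript
// Hypothetical stand-in for a selection: each clause records which
// component produced it and a SQL predicate string.
class SketchSelection {
  constructor(strategy = 'intersect') {
    // One of: 'single' | 'union' | 'intersect' | 'crossfilter'
    this.strategy = strategy;
    this.clauses = [];
  }

  // Add or replace the clause contributed by a source component.
  // A null predicate clears that source's clause.
  update(source, predicate) {
    this.clauses = this.clauses.filter((c) => c.source !== source);
    if (predicate != null) this.clauses.push({ source, predicate });
    return this;
  }

  // Resolve all clauses into one predicate for the given client.
  predicate(client) {
    let active = this.clauses;
    if (this.strategy === 'single') {
      active = active.slice(-1); // only the most recent clause applies
    }
    if (this.strategy === 'crossfilter') {
      // A component is not filtered by its own clause.
      active = active.filter((c) => c.source !== client);
    }
    if (active.length === 0) return 'TRUE';
    const op = this.strategy === 'union' ? ' OR ' : ' AND ';
    return active.map((c) => `(${c.predicate})`).join(op);
  }
}

const brush = new SketchSelection('crossfilter');
const delayPlot = { name: 'delay histogram' };
const distancePlot = { name: 'distance histogram' };

brush.update(delayPlot, 'delay BETWEEN 10 AND 40');
brush.update(distancePlot, 'distance < 500');

console.log(brush.predicate(delayPlot)); // (distance < 500)
console.log(brush.predicate(null));      // (delay BETWEEN 10 AND 40) AND (distance < 500)
```

Because the selection is a standalone object and not part of either plot, any component that can turn a predicate into a query filter can take part, which is the point of treating selections as first-class entities.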
Though written in JavaScript and deployable over the Web, Mosaic was designed to work well in data science environments such as Jupyter notebooks, too. A DuckDB server can run in a host environment such as a Python kernel and communicate with a Web-based output cell interface. See the Mosaic Jupyter Widget for more.
Mosaic is extensible
Mosaic can readily be extended with new clients, or, as in the case of vgplot, entire component libraries. Possible future additions include network visualization tools, WebGL/WebGPU enabled clients for more scalable rendering, and more!
As sketched in the code below, data-consuming elements (plot layers, widgets, etc.) can be realized as Mosaic clients that provide queries and accept resulting data.
import { MosaicClient } from '@uwdata/mosaic-core';
import { Query } from '@uwdata/mosaic-sql';

export class CustomClient extends MosaicClient {
  /**
   * Create a new client instance, with a backing table name
   * and an optional filterBy selection.
   */
  constructor(tableName, filterBy) {
    super(filterBy);
    this.tableName = tableName;
  }

  /**
   * Return a SQL query for the client's data needs,
   * ideally using @uwdata/mosaic-sql query helpers.
   * Be sure to incorporate the given filter criteria.
   */
  query(filter = []) {
    return Query
      .from(this.tableName)
      .select(/* desired columns here */)
      .where(filter);
  }

  /**
   * Process query result data. This method is called by the
   * coordinator to pass query results from the database.
   */
  queryResult(data) {
    // visualize, analyze, ...
  }
}
If you are interested in creating your own Mosaic clients, see the Mosaic GitHub repository. For concrete examples, start with the source code of Mosaic inputs. Once you've instantiated a custom client, register it using coordinator.connect(client).
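To show how the query and queryResult methods fit together once a client is connected, here is a self-contained sketch with a stand-in coordinator. `SketchCoordinator`, `RowCounter`, and `fakeDb` are hypothetical and far simpler than Mosaic's own coordinator. In particular, the sketch runs queries synchronously and uses plain SQL strings, whereas a real database call would be asynchronous.

```javascript
// Stand-in coordinator (hypothetical): on connect, it asks the client for
// a query, runs it, and hands the result back to the client.
class SketchCoordinator {
  constructor(runQuery) {
    this.runQuery = runQuery; // (sql) => array of result rows
    this.clients = new Set();
  }

  connect(client) {
    this.clients.add(client);
    this.update(client);
  }

  // Re-query a client, optionally with filter predicates from a selection.
  update(client, filter = []) {
    const sql = client.query(filter);
    client.queryResult(this.runQuery(sql));
  }
}

// A client exposing the same two methods as the CustomClient sketch above.
class RowCounter {
  constructor(tableName) {
    this.tableName = tableName;
    this.count = null;
  }

  query(filter = []) {
    const where = filter.length ? ` WHERE ${filter.join(' AND ')}` : '';
    return `SELECT COUNT(*) AS n FROM ${this.tableName}${where}`;
  }

  queryResult(data) {
    this.count = data[0].n;
  }
}

// Fake database for illustration: returns canned counts.
const fakeDb = (sql) => [{ n: sql.includes('WHERE') ? 2 : 5 }];

const coord = new SketchCoordinator(fakeDb);
const client = new RowCounter('flights');

coord.connect(client);
console.log(client.count); // 5

coord.update(client, ['delay > 10']);
console.log(client.count); // 2
```

The client never talks to the database directly. It only describes the data it needs and reacts to results, which leaves the coordinator free to cache, batch, or optimize queries on its behalf.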
Mosaic can also be extended with additional database connectors, and – though not for the faint of heart! – even the central coordinator can be replaced to experiment with alternative query management and optimization schemes.