
perceptual-kernels

Data and source code for the first phase of the perceptual kernels study.


Learning Perceptual Kernels for Visualization Design

This repo contains the results and source code from our crowdsourced experiments to estimate perceptual kernels for color, shape, size and combinations thereof. What is a perceptual kernel? It is a distance matrix derived from aggregate perceptual judgments. In its basic form, a perceptual kernel contains pairwise perceptual dissimilarity values for a specific set of perceptual stimuli---we refer to this set as a palette. In our study, we estimate perceptual kernels for the following six palettes.

Perceptual kernels can be constructed experimentally in several alternative ways. We construct them from subjective similarity judgments, and the psychology literature offers several task types for eliciting such judgments. How should one choose among them? Which judgment task is most effective in the context of perceptual kernels? Understanding the trade-offs between different judgment task designs is therefore important. We estimate five perceptual kernels for each of the palettes above, using the five different judgment tasks below---links show the task interfaces for the shape palette (refresh your page if you see a garbled image).

How to use the data and source code in this repo?

There are several ways to use the data and source code provided here. To start, get a local copy of the directory structure, either using git commands or by downloading and uncompressing the zipped repo.
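For example, a clone along these lines will create a local copy (we assume the repository's GitHub address here):

git clone https://github.com/uwdata/perceptual-kernels.git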

Accessing the data

You can directly access the final perceptual kernels and use them for your own purposes, research or otherwise. You will find thirty kernels in the data/kernels folder. These are symmetric, normalized matrices stored as comma-separated text files. File names denote the variable and judgment task types used. For example, color-sa.txt is a perceptual kernel for the color palette that was elicited using spatial arrangement (SA). The kernels under data/kernels are all filtered and aggregated as discussed in our draft.
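For example, here is a minimal Node.js sketch (not part of this repo) that loads one of these kernels and verifies that it is square and symmetric, assuming the comma-separated format just described:

```js
// Minimal sketch: read a kernel stored as comma-separated rows of numbers.
const fs = require('fs');

function loadKernel(path) {
  return fs.readFileSync(path, 'utf8')
    .trim()
    .split('\n')
    .map(line => line.split(',').map(Number));
}

const kernel = loadKernel('data/kernels/color-sa.txt');
console.log('size: ' + kernel.length + ' x ' + kernel[0].length);

// The kernels are symmetric and normalized, so every entry should lie in
// [0, 1] and kernel[i][j] should equal kernel[j][i].
for (let i = 0; i < kernel.length; i++) {
  for (let j = 0; j < i; j++) {
    if (Math.abs(kernel[i][j] - kernel[j][i]) > 1e-9) {
      console.log('asymmetric entry at (' + i + ', ' + j + ')');
    }
  }
}
```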

You can also access the raw datasets in data/raw, which include unprocessed per-subject measurements. You can use the raw data, e.g., to perform your own custom data processing and aggregation or, more interestingly, per-subject data analysis.

Viewing the data

First, you will need to install node.js and the node modules for express.js and d3.js (this is easy with npm, the package manager for node.js; for example, npm install d3 will install the d3 module for you). The kernels in data/kernels can then be viewed as interactive grayscale heatmaps and two-dimensional scatter plots by running view/showkernel.js from the command line. For example,

./showkernel.js color-tm

will draw the corresponding color kernel as a grayscale heatmap in your default browser along with a two-dimensional projection of the kernel, where in-plane distances between the colors approximate the perceptual distances of the kernel. We obtain the projection points with multidimensional scaling on the perceptual kernel.
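The projection itself is computed by view/showkernel.js; purely to illustrate the technique, here is a compact classical (Torgerson) MDS sketch in plain JavaScript that recovers two-dimensional coordinates from a distance matrix, using power iteration with deflation in place of a full eigensolver:

```js
// Illustrative classical (Torgerson) MDS: project an n x n distance matrix D
// to two dimensions. A sketch only; it assumes the top eigenvalues of the
// centered matrix are positive (true for Euclidean-embeddable distances).
function classicalMDS(D) {
  const n = D.length;

  // Double-center the squared distances: B = -1/2 * J * D^2 * J.
  const D2 = D.map(row => row.map(d => d * d));
  const rowMean = D2.map(row => row.reduce((s, v) => s + v, 0) / n);
  const grandMean = rowMean.reduce((s, v) => s + v, 0) / n;
  let B = D2.map((row, i) =>
    row.map((v, j) => -0.5 * (v - rowMean[i] - rowMean[j] + grandMean)));

  // Dominant eigenpair by power iteration.
  function topEigen(M) {
    let v = M.map((_, i) => Math.cos(i + 1)); // deterministic nonzero start
    let lambda = 0;
    for (let iter = 0; iter < 500; iter++) {
      const w = M.map(row => row.reduce((s, x, j) => s + x * v[j], 0));
      lambda = Math.sqrt(w.reduce((s, x) => s + x * x, 0));
      v = w.map(x => x / lambda);
    }
    return { lambda, v };
  }

  // Take the top two components; coordinates are sqrt(lambda) * eigenvector.
  const axes = [];
  for (let k = 0; k < 2; k++) {
    const { lambda, v } = topEigen(B);
    axes.push(v.map(x => x * Math.sqrt(lambda)));
    // Deflate: B <- B - lambda * v * v^T.
    B = B.map((row, i) => row.map((x, j) => x - lambda * v[i] * v[j]));
  }
  return axes[0].map((x, i) => [x, axes[1][i]]); // one [x, y] per stimulus
}
```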

Hovering over a cell of the heatmap shows the corresponding perceptual kernel distance in a tooltip. Similarly, hovering over a projection point isolates the corresponding row and column in the heatmap. The two bars under the projection scatter plot show the overall and per-row (or per-column) rank correlations between the perceptual kernel and the distance matrix directly derived from the planar projection. These give you an idea of the accuracy of the projection with respect to the kernel distances, since two-dimensional projections are lossy representations in general.
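For illustration, such a rank correlation can be computed as Spearman's rho between the upper triangles of the two distance matrices; the following sketch (not the repo's exact code, with ties broken arbitrarily) shows one way:

```js
// Illustrative Spearman rank correlation between two distance matrices,
// compared over their upper triangles.
function upperTriangle(M) {
  const vals = [];
  for (let i = 0; i < M.length; i++)
    for (let j = i + 1; j < M.length; j++) vals.push(M[i][j]);
  return vals;
}

function ranks(xs) {
  const order = xs.map((v, i) => [v, i]).sort((a, b) => a[0] - b[0]);
  const r = new Array(xs.length);
  order.forEach((pair, rank) => { r[pair[1]] = rank; });
  return r;
}

function spearman(a, b) {
  const ra = ranks(a), rb = ranks(b), n = a.length;
  const mean = (n - 1) / 2; // mean of the ranks 0..n-1
  let num = 0, va = 0, vb = 0;
  for (let i = 0; i < n; i++) {
    num += (ra[i] - mean) * (rb[i] - mean);
    va += (ra[i] - mean) * (ra[i] - mean);
    vb += (rb[i] - mean) * (rb[i] - mean);
  }
  return num / Math.sqrt(va * vb);
}

// e.g., spearman(upperTriangle(kernel), upperTriangle(projectionDistances))
```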

Reproducing the experiments

In addition to accessing the data, you can reproduce and extend our experiments using the source code provided. Each experiment is designed to be as self-contained as possible. For example, if you would like to see the experiment setup that produced color-sa.txt, go to the exp/color/sa/ directory. You can check out the task interface by opening color-sa.html in your browser; we recommend you go through and complete the task to understand what it entails.

To reproduce this experiment (or any other experiment in exp/, for that matter), first install the Amazon Mechanical Turk Command Line Tools. Then take a look at color-sa.properties, which contains the properties of the experiment, from its summary description to the number and qualifications of the subjects (Turkers) requested. Since the goal is to repeat the experiment, you don't need to edit this file, but make sure you understand its contents. You will, however, need to edit the files color-sa.html and color-sa.question, as described below. Finally, set two environment variables: MTURKCLT_HOME, which should point to the installation directory of Amazon's command line tools, and STUDY_HOME, which should point to your local perceptual-kernels directory.
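For example, in a bash-like shell (the paths are placeholders for your own locations):

export MTURKCLT_HOME=/path/to/aws-mturk-clt
export STUDY_HOME=/path/to/perceptual-kernels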

In order to run the experiment in a test mode on Amazon's Mechanical Turk sandbox, uncomment the following line in color-sa.html

<form id="form" autocomplete="off" method="POST" action="https://workersandbox.mturk.com/mturk/externalSubmit">

and make sure the next line

<form id="form" autocomplete="off" method="POST" action="https://www.mturk.com/mturk/externalSubmit">

is commented out. Of course, you shouldn't do this if you want to use the production site.

color-sa.html implements the task as a dynamic single-page web application. The next step is to make it publicly available so that Turkers can access it embedded in an iframe on Amazon's site. Copy color-sa.html (with its dependencies) somewhere on your web server and provide its URL within the <ExternalURL></ExternalURL> tags in color-sa.question. If you are using an http server (as opposed to https), remember to remove the http prefix from the URL so that it is protocol-relative---see color-sa.question for an example. Make sure that the .js and .css files color-sa.html uses are also publicly accessible, either on your web server or somewhere else on the web.

Assuming you have set up your Amazon Mechanical Turk account properly, you are now ready to upload the task by running the script runSandbox.sh. This will try to upload the task to Amazon Mechanical Turk's sandbox site and, if successful, will create a file called color-sa.success. Note that runSandbox.sh calls $MTURKCLT_HOME/bin/loadHITs.sh with the -sandbox argument. When you're ready to use the production site, run runProduction.sh instead, making sure that service_url in $MTURKCLT_HOME/bin/mturk.properties is set to Amazon Mechanical Turk's production site.
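For reference, an external question file of this kind has roughly the following shape; the server address below is a placeholder, and the color-sa.question file in the repo is the authoritative template (note the protocol-relative // address):

<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>//your-server.example.com/tasks/color-sa.html</ExternalURL>
  <FrameHeight>800</FrameHeight>
</ExternalQuestion>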

What is a perceptual kernel?

Perceptual kernels are distance matrices derived from aggregate perceptual similarity judgments. Here is an example of a perceptual kernel:

(Left) A crowd-estimated perceptual kernel for a shape palette. Darker entries indicate perceptually closer (similar) shapes. (Right) A two-dimensional projection of the palette shapes obtained via multidimensional scaling of the perceptual kernel.

What is it useful for?

Visualization design benefits from careful consideration of perception, as different assignments of visual encoding variables such as color, shape and size affect how viewers interpret data. Perceptual kernels represent perceptual differences between and within visual variables in a reusable form that is directly applicable to visualization evaluation and automated design. In other words, perceptual kernels provide a useful operational model for incorporating empirical perception data directly into visualization design tools. Please refer to our draft on perceptual kernels for further details.

Here are a few examples of how the kernels can be used.

Automatically designing new palettes

Given an estimated perceptual kernel, we can use it to revisit existing palettes. For example, we can choose a subset of stimuli that maximizes perceptual distance or, equivalently, minimizes perceptual similarity according to the kernel. The following shows the n most discriminable subsets of the shape, size, and color variables. (We include size for completeness, though in practice this palette is better suited to quantitative, rather than categorical, data.) To compute a subset with n elements, we first initialize the set with the variable pair that has the highest perceptual distance. We then add new elements to this set by finding the variable whose minimum distance to the existing subset is maximal (i.e., the Hausdorff distance between the two point sets).
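As a sketch of this greedy selection (an illustration, not the repo's exact implementation; kernel is a distance matrix as in data/kernels):

```js
// Illustrative greedy max-min selection, as described above: seed with the
// most distant pair, then repeatedly add the item whose minimum distance to
// the already-chosen set is largest.
function mostDiscriminableSubset(kernel, n) {
  const m = kernel.length;

  // Seed: the pair (i, j) with the largest perceptual distance.
  let seed = [0, 1];
  for (let i = 0; i < m; i++)
    for (let j = i + 1; j < m; j++)
      if (kernel[i][j] > kernel[seed[0]][seed[1]]) seed = [i, j];

  const chosen = seed.slice();
  while (chosen.length < n) {
    let bestItem = -1, bestScore = -Infinity;
    for (let i = 0; i < m; i++) {
      if (chosen.indexOf(i) >= 0) continue;
      // Minimum distance from candidate i to the chosen subset.
      const minDist = Math.min.apply(null, chosen.map(j => kernel[i][j]));
      if (minDist > bestScore) { bestScore = minDist; bestItem = i; }
    }
    chosen.push(bestItem);
  }
  return chosen;
}
```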

You may also want to check out this illustration---note that we cannot run JavaScript code from this page directly.

Visual embedding

Perceptual kernels can also guide visual embedding to choose encodings that preserve data-space distance metrics in terms of kernel-defined perceptual distances. To perform discrete embeddings, we find the optimal distance-preserving assignment of palette items to data points. The following scatter plot compares color distance measures.

The plotting symbols were chosen automatically using visual embedding. We use the correlation matrix of the color models below as the distances in the data domain, and the triplet matching (Tm) kernel for the shape palette as the distances in the perceptual range.

             Kernel (Tm)   CIELAB   CIEDE2000   Color Name
Kernel (Tm)      1.00        0.67      0.59        0.75
CIELAB           0.67        1.00      0.87        0.81
CIEDE2000        0.59        0.87      1.00        0.77
Color Name       0.75        0.81      0.77        1.00

Rank correlations between a crowd-estimated perceptual kernel of the color palette and kernels derived from existing color distance models for the same palette. Higher values indicate more similar kernels.

This automatic assignment reflects the correlations between the variables. The correlation between CIELAB and CIEDE2000 is higher than the correlation between the triplet-matching kernel and color names, and the assigned shapes reflect this relationship perceptually. For example, the perceptual distance between the upward- and downward-pointing triangles is smaller than the perceptual distance between the circle and the square.
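As an illustration of the discrete embedding step described above, the following sketch (not the repo's implementation) brute-forces the assignment of palette items to data points that best preserves the data-space distances under the kernel, minimizing the sum of squared differences; it assumes equally many palette items and data points, and is feasible only for small palettes:

```js
// Illustrative discrete embedding: search all n! assignments of palette
// items to data points and keep the one that best matches the data-space
// distance matrix against the kernel distances.
function permutations(items) {
  if (items.length <= 1) return [items];
  const out = [];
  for (let i = 0; i < items.length; i++) {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    for (const p of permutations(rest)) out.push([items[i]].concat(p));
  }
  return out;
}

function discreteEmbedding(dataDist, kernel) {
  const n = dataDist.length;
  const indices = Array.from({ length: n }, (_, i) => i);
  let best = null, bestCost = Infinity;
  for (const perm of permutations(indices)) {
    let cost = 0;
    for (let i = 0; i < n; i++)
      for (let j = i + 1; j < n; j++) {
        const d = dataDist[i][j] - kernel[perm[i]][perm[j]];
        cost += d * d;
      }
    if (cost < bestCost) { bestCost = cost; best = perm; }
  }
  return best; // best[i] = index of the palette item assigned to data point i
}
```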

In a second example, we use visual embedding to encode community clusters in a character co-occurrence graph derived from Victor Hugo's novel Les Misérables. Cluster memberships were computed using a standard modularity-based community-detection algorithm. For the data-space distances, we count all inter-cluster edges and then normalize by the theoretically maximal number of edges between the two groups. To provide more dynamic range, we rescale these normalized values to the range [0.2, 0.8]. Clusters that share no connecting edges are given the maximal distance of 1. We then perform separate visual embeddings using the univariate color and shape kernels (both estimated using triplet matching). As shown in the following figure, the assigned colors and shapes perceptually reflect the inter-cluster relations.
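As a sketch of this distance computation (an illustration under stated assumptions; in particular, mapping higher inter-cluster connectivity to smaller distance is our reading of the description):

```js
// Illustrative computation of the data-space distances described above.
// `edges` is a list of [u, v] node pairs; `cluster[u]` gives the cluster
// index of node u. Connected cluster pairs are rescaled into [0.2, 0.8]
// (most connected -> 0.2); disconnected pairs get the maximal distance 1.
function clusterDistances(edges, cluster, numClusters) {
  const sizes = new Array(numClusters).fill(0);
  cluster.forEach(c => sizes[c]++);

  // Count edges between every pair of clusters.
  const counts = Array.from({ length: numClusters },
                            () => new Array(numClusters).fill(0));
  edges.forEach(([u, v]) => {
    const a = cluster[u], b = cluster[v];
    if (a !== b) { counts[a][b]++; counts[b][a]++; }
  });

  // Normalize by the maximal possible edge count between the two clusters.
  const connected = [];
  let lo = Infinity, hi = -Infinity;
  for (let a = 0; a < numClusters; a++)
    for (let b = a + 1; b < numClusters; b++)
      if (counts[a][b] > 0) {
        const c = counts[a][b] / (sizes[a] * sizes[b]);
        connected.push([a, b, c]);
        lo = Math.min(lo, c); hi = Math.max(hi, c);
      }

  // Distance matrix: 0 on the diagonal, 1 for disconnected cluster pairs.
  const D = Array.from({ length: numClusters },
                       () => new Array(numClusters).fill(1));
  for (let a = 0; a < numClusters; a++) D[a][a] = 0;
  connected.forEach(([a, b, c]) => {
    const t = hi === lo ? 0.5 : (c - lo) / (hi - lo);
    const d = 0.8 - 0.6 * t; // rescale into [0.2, 0.8]
    D[a][b] = d; D[b][a] = d;
  });
  return D;
}
```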