Category Archives: Data Representation

Sculpted Ocean

Sculpted Ocean is a unique kind of globe. Instead of focusing on the features and topography of the world’s landmasses like most globes, it tries to shine a light upon the depth and composition of the world’s vast oceans.

To make the globe, Amanda Gelb and I used over 1.6 million points of worldwide oceanic depth data from NOAA, and chlorophyll-level imaging data from NASA's SeaWiFS satellite monitoring program.
Data was cleaned and parsed in Ocean Data View and Python, mapped and interpolated in ArcGIS (with help from the data specialist librarians at NYU Data Services), and prototyped and modeled in different phases in Processing, Rhino, and ZBrush. It was printed on the 3D Systems ZPrinter 650 powder printer at NYU’s Advanced Media Studio.



NYC Food Crawl – ITP Winter Show 2013

This post assumes knowledge of the concept behind this project. To view the project proposal, click here.
The Processing code that I wrote is posted on GitHub. Click here to view.


NYC Food Crawl is a physical data representation that uses diorama sets with live cockroaches, along with an accompanying screen-based visualization, to represent the frequency of vermin violations in New York City area restaurants. I took the restaurant food inspection results dataset from NYC OpenData and brought the main database and the violation codes database into Google Refine and Excel for cleaning. After cleaning this data and isolating the vermin-specific violation codes, I brought both datasets into Processing. From there, I was able to calculate how many restaurants were in each inspection grade level, how many of those restaurants had had a vermin-related violation recorded, and how many of those violations had occurred within the past year.
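A minimal Java sketch of the per-grade tallies described above follows. It assumes the cleaned data has already been loaded into simple rows (restaurant ID, grade, violation code, inspection date); the `Row` and `GradeStats` classes, the placeholder violation codes, and the sample rows are illustrative assumptions, not the code from the GitHub repository.

```java
import java.time.LocalDate;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class VerminStats {
    // One cleaned inspection row (hypothetical field names, not the OpenData column names).
    static final class Row {
        final String restaurantId;
        final String grade;
        final String violationCode;
        final LocalDate date;

        Row(String restaurantId, String grade, String violationCode, LocalDate date) {
            this.restaurantId = restaurantId;
            this.grade = grade;
            this.violationCode = violationCode;
            this.date = date;
        }
    }

    // Running tallies for one grade level.
    static final class GradeStats {
        final Set<String> restaurants = new HashSet<>();       // distinct restaurants in the grade
        final Set<String> verminRestaurants = new HashSet<>(); // distinct restaurants with a vermin violation
        int verminViolations = 0;                               // all vermin violations
        int recentVerminViolations = 0;                         // vermin violations in the past year
    }

    static Map<String, GradeStats> tally(List<Row> rows, Set<String> verminCodes, LocalDate today) {
        LocalDate oneYearAgo = today.minusYears(1);
        Map<String, GradeStats> byGrade = new TreeMap<>(); // sorted by grade letter
        for (Row r : rows) {
            GradeStats s = byGrade.computeIfAbsent(r.grade, g -> new GradeStats());
            s.restaurants.add(r.restaurantId);
            if (verminCodes.contains(r.violationCode)) {
                s.verminRestaurants.add(r.restaurantId);
                s.verminViolations++;
                if (!r.date.isBefore(oneYearAgo)) {
                    s.recentVerminViolations++;
                }
            }
        }
        return byGrade;
    }

    // Made-up sample rows for demonstration only.
    static List<Row> sampleRows() {
        return List.of(
            new Row("r1", "A", "04L", LocalDate.of(2013, 6, 1)),
            new Row("r1", "A", "10F", LocalDate.of(2013, 6, 1)),
            new Row("r2", "A", "10F", LocalDate.of(2012, 1, 1)),
            new Row("r3", "B", "04M", LocalDate.of(2011, 5, 1)),
            new Row("r3", "B", "04M", LocalDate.of(2013, 3, 1)));
    }

    public static void main(String[] args) {
        Set<String> verminCodes = Set.of("04K", "04L", "04M"); // placeholder codes
        Map<String, GradeStats> stats = tally(sampleRows(), verminCodes, LocalDate.of(2013, 12, 1));
        for (Map.Entry<String, GradeStats> e : stats.entrySet()) {
            GradeStats s = e.getValue();
            double pct = 100.0 * s.verminRestaurants.size() / s.restaurants.size();
            System.out.printf("Grade %s: %.1f%% with vermin, %d of %d violations recent%n",
                e.getKey(), pct, s.recentVerminViolations, s.verminViolations);
        }
    }
}
```

Counting distinct restaurant IDs in a `Set` avoids double-counting a restaurant that was cited for vermin on several inspections.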

I then used these figures to build the physical part of the project: restaurant dioramas, one for each grade, each with a representative number of live cockroaches inserted to reflect the data. I decided to keep the representation simple: the number of cockroaches in each grade’s diorama represents the percentage of restaurants in that grade that had had a vermin violation. Since my order of cockroaches (linked if you’re curious; most people at the show were) came in mixed sizes, I decided to make cockroach size indicative of recency: the ratio of large cockroaches to total cockroaches in each diorama is the ratio of recent violations to total violations. Put more simply, the more noticeable the large cockroaches were in a box, the higher the percentage of recent violations for that grade level.
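The mapping from these statistics to cockroaches can be written out as a small function. This is a sketch assuming one cockroach per percentage point (`roachesPerPercent = 1.0` is a hypothetical scale; the post does not say what scale was used), with the large-roach count taken as the same fraction of the diorama's roaches as recent violations are of all vermin violations.

```java
class RoachCounts {
    // Returns {totalRoaches, largeRoaches} for one grade's diorama.
    // roachesPerPercent is a hypothetical scale factor (roaches per percentage point).
    static int[] roachesFor(int restaurants, int verminRestaurants,
                            int verminViolations, int recentViolations,
                            double roachesPerPercent) {
        if (restaurants == 0) {
            return new int[] {0, 0}; // empty grade: no roaches
        }
        // Share of restaurants in this grade with any vermin violation, in percent.
        double verminPercent = 100.0 * verminRestaurants / restaurants;
        int total = (int) Math.round(verminPercent * roachesPerPercent);
        // Fraction of vermin violations that were recent; sets the share of large roaches.
        double recentRatio = verminViolations == 0 ? 0.0 : (double) recentViolations / verminViolations;
        int large = (int) Math.round(total * recentRatio);
        return new int[] {total, large};
    }

    public static void main(String[] args) {
        // Made-up grade: 40 of 200 restaurants had a vermin violation,
        // and 15 of their 60 vermin violations were recent.
        int[] counts = roachesFor(200, 40, 60, 15, 1.0);
        System.out.println(counts[0] + " roaches, " + counts[1] + " of them large"); // 20 roaches, 5 of them large
    }
}
```

Rounding means very small grades can end up with zero roaches, which is worth keeping in mind when choosing the scale.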

A concern I had from the viewer’s perspective was that, while memorable and attention-getting, the cockroach dioramas provide a very shallow representation of the data. Viewers might also come away with a skewed picture of the data, since the cockroaches like to hide out of sight and may not all be visible at once. To address this, I built an interactive graphical visualization in Processing that both emulates the physical display for each grade level and provides additional statistics. I also included statistics for the data sorted by borough instead of inspection grade level, in case viewers wanted to explore the data further. You can see a video of this below:


Here are some pictures of my setup for the show, with both the dioramas and screen visualization:




New York Times API Search Tool



Code for this project can be seen on GitHub.

The assignment for the week was to use the New York Times API in order to chart a cultural shift over time. I initially chose to track the terms “payphone” vs “cellphone” to try to see the moment in time in which cellphones made payphones obsolete.

Given how simple the result is, I had a surprising amount of trouble wrapping my head around the coding process. One big sticking point was the need to combine the date ranges for both sets of data into one complete range that would encompass both. The solution was to take one search term’s date range, add it to an ArrayList, comb the other term’s range for dates that weren’t duplicates and add those, use a little Java (the Collections class) to sort the combined results, and then export the result back out to a String array.
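Here is a minimal Java version of that merge, assuming each term's date range arrives as a `String[]` of four-digit years. The method name `mergeRanges` and the sample arrays are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class DateRangeMerge {
    // Merges two year ranges into one sorted String[] with no duplicates.
    static String[] mergeRanges(String[] first, String[] second) {
        // Start from a copy of the first term's range.
        List<String> combined = new ArrayList<>(Arrays.asList(first));
        // Add only the dates from the second range that aren't already present.
        for (String date : second) {
            if (!combined.contains(date)) { // contains() compares with equals(), not ==
                combined.add(date);
            }
        }
        // Lexicographic sort; chronological as long as every entry is a four-digit year.
        Collections.sort(combined);
        return combined.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] payphone = {"1985", "1990", "1995"};  // made-up ranges
        String[] cellphone = {"1990", "2000", "1987"};
        System.out.println(Arrays.toString(mergeRanges(payphone, cellphone)));
        // [1985, 1987, 1990, 1995, 2000]
    }
}
```

Because the years are all four characters wide, sorting them as Strings also puts them in chronological order; mixed-width dates would need to be parsed into numbers first. A `TreeSet<String>` would handle the de-duplication and the sorting in one step.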

At one point, I was getting a complete graph for “payphone” but an incomplete graph for “cellphone” (it would only draw the results for 2006 and 2007, even though my debugging showed that the data was still there for other years). After trying many, many things and wasting lots of time to no avail, I eventually realized that the culprit was using the “==” operator instead of “.equals()” to compare the Strings in my array. I ALWAYS forget about this, and my lazy side wishes that Processing would just let you get away with the former option. What I don’t understand is why the first result set still graphed correctly and only the second was affected, since I was using the same display code for both.
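The difference between the two operators is easy to demonstrate in plain Java (Processing sketches compile to Java, so the same rules apply). `==` checks whether two variables point to the same `String` object, while `.equals()` compares the characters. One possible explanation for the first result set still graphing, offered only as a guess since it depends on how each array was filled, is that its strings happened to be the very same objects, for example identical literals, which Java interns into a single shared instance. `indexOfYear` is a hypothetical helper.

```java
class StringCompare {
    // Hypothetical helper: index of target in years, comparing contents with equals().
    static int indexOfYear(String[] years, String target) {
        for (int i = 0; i < years.length; i++) {
            if (years[i].equals(target)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String literal = "2006";
        String copy = new String("2006"); // same characters, but a new object

        System.out.println(literal == copy);      // false: == compares object references
        System.out.println(literal.equals(copy)); // true: equals() compares characters
        System.out.println(literal == "2006");    // true: identical literals share one interned object

        String[] years = {"2005", copy, "2007"};
        System.out.println(indexOfYear(years, "2006")); // 1
    }
}
```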

Eventually, I got it working and took the screenshot above. I found the results kind of boring, so then I set about making the program modular so that you can define whatever two strings you want at the beginning of the code and the graph scale and date range will adjust itself accordingly. When I accomplished that, I took some screenshots of some more visually interesting results:
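The self-adjusting scale can be sketched by finding the largest count across both series and mapping every count into the plot area against that shared maximum. The `map` method below re-implements the formula of Processing's built-in `map()` so the example runs as plain Java; the sample counts and plot bounds are made up, and this is not necessarily how the original sketch did it.

```java
import java.util.Arrays;

class AutoScale {
    // Same formula as Processing's built-in map(), re-implemented for plain Java.
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Largest count across both series, so both lines share one vertical scale.
    static int sharedMax(int[] a, int[] b) {
        int max = 1; // floor of 1 avoids dividing by zero if every count is 0
        for (int v : a) max = Math.max(max, v);
        for (int v : b) max = Math.max(max, v);
        return max;
    }

    // Converts counts to y positions: a count of 0 sits at `bottom`, the max at `top`.
    static float[] toY(int[] counts, int max, float top, float bottom) {
        float[] ys = new float[counts.length];
        for (int i = 0; i < counts.length; i++) {
            ys[i] = map(counts[i], 0, max, bottom, top);
        }
        return ys;
    }

    public static void main(String[] args) {
        int[] termA = {120, 80, 40};  // made-up yearly article counts
        int[] termB = {10, 90, 200};
        int max = sharedMax(termA, termB);
        System.out.println("max = " + max);
        System.out.println(Arrays.toString(toY(termA, max, 50, 450)));
        System.out.println(Arrays.toString(toY(termB, max, 50, 450)));
    }
}
```

Swapping in any two search terms only changes the input arrays; the shared maximum keeps both lines on the same scale.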



Strange how “cassette” briefly resurfaces in 1998. I wonder if some alternate use for the term came up?



This one shows you a president sandwich. I thought it looked neat.

World’s Hotels – Manipulating A Large Dataset

First, I’ll start with the video (HD recommended):

And the Git repo with the code:

This visualization takes a random sample of one-tenth of all hotels in the world and displays them in a map view by default. The points on the map are color-coded by the hotel’s star rating, with 0/unrated being the darkest orange and 5 stars being the brightest yellow. If you press S, the visualization changes to star ranking mode, in which it divides the random 10% set into proportionally sized squares for each ranking. Pressing M returns you to map view.
Development Process:

The first challenge with this project was wrangling the huge dataset into some sort of usable shape. I downloaded OpenRefine, worked through some tutorials, and then loaded the dataset into it. After poking around, I noticed that some of the most complete fields were the hotel name, the star rating (after combing through and setting all unrated hotels to 0), the latitude and longitude, and the country name. To reduce the amount of data for Processing to parse, I kept only those fields and deleted the data I didn’t plan on using (later I also took out the hotel name field, which cut the file size by about another 50%).
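The same cleaning pass can also be scripted. This is a sketch assuming a hypothetical five-column layout (name, stars, latitude, longitude, country) with no quoted commas inside fields, which a real CSV parser would need to handle. Rows without coordinates are dropped, blank star ratings become 0, and the name column is left out.

```java
import java.util.ArrayList;
import java.util.List;

class HotelCleaner {
    // Assumed raw layout (hypothetical): name,stars,latitude,longitude,country
    // with no quoted commas inside fields.
    static List<String> clean(List<String> rawLines) {
        List<String> out = new ArrayList<>();
        out.add("stars,latitude,longitude,country"); // header for the slimmed-down file
        for (String line : rawLines) {
            String[] f = line.split(",", -1); // -1 keeps empty trailing fields
            if (f.length < 5) {
                continue; // malformed row
            }
            String lat = f[2].trim();
            String lon = f[3].trim();
            if (lat.isEmpty() || lon.isEmpty()) {
                continue; // can't place a hotel without coordinates
            }
            String stars = f[1].trim().isEmpty() ? "0" : f[1].trim(); // unrated -> 0
            out.add(stars + "," + lat + "," + lon + "," + f[4].trim()); // name is dropped
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = List.of(
            "Hotel A,4,40.7,-74.0,United States",
            "Hotel B,,48.85,2.35,France",
            "Hotel C,3,,,Spain");
        clean(raw).forEach(System.out::println);
    }
}
```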

Once the data was loaded into Processing, I set out to plot it in a map view. I looked around for map libraries, but the most prominent one only worked with Processing 1.5 or lower. Since I was organizing my data with the Table class (which requires Processing 2.0+), I couldn’t use it. After some experimenting, I found that mapping the latitude and longitude coordinates to the edges of the screen makes for a perfectly usable (and rather neat-looking) map. I color-coded the points by star rating, mapping each rating to the green channel of an RGB color with red fixed at 255. That way, 0-star hotels are red, and hotels get more yellow the higher they’re rated.
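Mapping coordinates straight to the screen edges amounts to an equirectangular projection. A minimal sketch, assuming the full longitude range (-180 to 180) spans the window width and latitude (90 to -90) spans the height; the post doesn't say whether the original used these fixed bounds or the data's own min/max. `map` re-implements Processing's `map()` formula, and the window size and sample hotel are hypothetical.

```java
class HotelMap {
    // Same formula as Processing's map(), re-implemented for plain Java.
    static float map(float value, float start1, float stop1, float start2, float stop2) {
        return start2 + (stop2 - start2) * ((value - start1) / (stop1 - start1));
    }

    // Longitude -180..180 spans the window from left to right.
    static float lonToX(float lon, float width) {
        return map(lon, -180, 180, 0, width);
    }

    // Latitude 90..-90 spans the window from top to bottom (north up).
    static float latToY(float lat, float height) {
        return map(lat, 90, -90, 0, height);
    }

    // Red fixed at 255, green rising with rating: 0 stars = red, 5 stars = yellow.
    static int[] starColor(int stars) {
        int green = Math.round(map(stars, 0, 5, 0, 255));
        return new int[] {255, green, 0};
    }

    public static void main(String[] args) {
        float width = 1280, height = 640; // hypothetical window size
        // Made-up hotel near New York City, rated 3 stars.
        float x = lonToX(-74.0f, width);
        float y = latToY(40.7f, height);
        int[] rgb = starColor(3);
        System.out.printf("x=%.1f y=%.1f rgb=(%d, %d, %d)%n", x, y, rgb[0], rgb[1], rgb[2]);
    }
}
```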

After that, I set about animating the map. It was around then that framerate became a major issue (the screen only refreshed once every 1-3 seconds or so). By putting in a random-selection filter, I found that I had to limit the data to 8% of the total to get a usable framerate, which was disappointing. (Later, I added a motion blur that let me get away with a slightly choppier framerate, so I bumped it up to 10%. It could probably go to 15%, but the choppiness would really start to show at that point.)
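A random-selection filter can be as simple as keeping each row with a fixed probability. This sketch uses `java.util.Random` with a fixed seed so the same subset is drawn on every run; in a Processing sketch, a `random(1) < fraction` check in `setup()` would behave similarly. The kept count is only approximately the target fraction, and the method name and sample size are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class RandomSample {
    // Keeps each row independently with probability `fraction` (0.10 keeps roughly 10%).
    static <T> List<T> sample(List<T> rows, double fraction, long seed) {
        Random rng = new Random(seed); // fixed seed: same subset on every run
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (rng.nextDouble() < fraction) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> hotels = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            hotels.add(i); // stand-ins for hotel rows
        }
        List<Integer> subset = sample(hotels, 0.10, 42L);
        System.out.println(subset.size() + " of " + hotels.size() + " rows kept");
    }
}
```

Filtering once, up front, means `draw()` only ever iterates over the smaller list, which is where the framerate gain comes from.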

The final step was to add the star ranking view. To make the squares proportionally sized relative to each other, I had Processing sort and count the number of hotels in each category in setup(). Unrated was the largest category, so I scaled all the other squares against it. I also ended up taking the square root of each count before sizing its square, so that each square’s area, rather than its side length, is proportional to its number of hotels; sizing the sides directly would have exaggerated the differences between categories.
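The square-root scaling can be sketched as follows: the largest category gets a fixed side length, and every other side is scaled by the square root of its count relative to the largest. Taking the square root makes each square's area proportional to its count. The counts and the `maxSide` value are made up.

```java
import java.util.Arrays;

class StarSquares {
    // Side length for each category's square, scaled so the largest category gets maxSide.
    // The square root makes each square's AREA proportional to its count.
    static float[] sides(int[] counts, float maxSide) {
        int largest = 0;
        for (int c : counts) {
            largest = Math.max(largest, c);
        }
        float[] out = new float[counts.length];
        if (largest == 0) {
            return out; // nothing to draw
        }
        for (int i = 0; i < counts.length; i++) {
            out[i] = (float) (maxSide * Math.sqrt((double) counts[i] / largest));
        }
        return out;
    }

    public static void main(String[] args) {
        int[] counts = {400, 100, 25}; // made-up hotel counts per rating
        System.out.println(Arrays.toString(sides(counts, 200))); // [200.0, 100.0, 50.0]
    }
}
```

With these counts, a category with a quarter of the hotels gets a square half as wide, and therefore a quarter of the area.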