First, I’ll start with the video (HD recommended):
And the Git repo with the code: http://github.com/andrewcerrito/Hotel-Data-Project
This visualization takes a random sampling of one-tenth of all hotels in the world and displays them in a map view by default. The points on the map are color-coded by the hotel’s star rating, with 0/unrated being the darkest orange and 5 stars being the brightest yellow. If you press S, the visualization changes to star ranking mode, in which it divides the random 10% set into proportionally-sized squares for each ranking. By pressing M, you can return to map view.
The first challenge with this project was wrangling the huge dataset into some sort of usable shape. I downloaded OpenRefine, went through some tutorials for it, and then loaded the set into it. After poking around, I noticed that some of the most complete categories included the hotel name, the star rating (after combing through and setting all unrated hotels to 0), the latitude and longitude, and the country name. To reduce the amount of data for Processing to parse, I cleaned up those categories and deleted the data I didn’t plan on using (later I ended up taking out the hotel name field, which further cut the file size by about 50%).
Once the data was loaded into Processing, I set out to plot it in a map view. I looked around for map libraries, but the most prominent one only worked with Processing 1.5 or lower. Since I was using the Table class (Processing 2.0+) to organize my data, I couldn’t use it. After some experimenting, I found that mapping the latitude and longitude coordinates straight onto the height and width of the screen makes for a perfectly usable (and rather neat-looking) map. I color-coded the points by star ranking by mapping each hotel’s rating to the green channel of an RGB color with red set to 255. That way, the 0-stars are red, and the hotels get more yellow the higher they’re ranked.
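Here is a rough sketch of that mapping in plain Java, not the code from the repo. The `remap` helper does the same linear remap as Processing's `map()`; the class and method names are made up for illustration. The flipped latitude (so north is at the top) and the full 0 to 255 green range are my assumptions about the details.

```java
// Plain-Java sketch of the coordinate and color mapping.
// All names here are hypothetical; they are not taken from the project.
class HotelMapSketch {
    // Same linear remap that Processing's map() performs.
    static float remap(float v, float inLo, float inHi, float outLo, float outHi) {
        return outLo + (outHi - outLo) * ((v - inLo) / (inHi - inLo));
    }

    // Longitude -180..180 spans the screen from left to right.
    static float lonToX(float lon, float screenW) {
        return remap(lon, -180f, 180f, 0f, screenW);
    }

    // Latitude 90..-90 spans the screen from top to bottom
    // (assumption: flipped because screen y grows downward).
    static float latToY(float lat, float screenH) {
        return remap(lat, 90f, -90f, 0f, screenH);
    }

    // Red fixed at 255, green scaled by the 0..5 rating, blue 0,
    // which runs from red (0 stars) to yellow (5 stars).
    static int[] starColor(float stars) {
        int green = Math.round(remap(stars, 0f, 5f, 0f, 255f));
        return new int[] {255, green, 0};
    }
}
```

In a Processing sketch the same idea would be a pair of `map()` calls per hotel followed by something like `stroke(255, green, 0)` before drawing the point.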
After that, I set about animating the map. It was around then that framerate became a major issue (the screen only refreshed once every 1-3 seconds or so). By putting in a random-selection filter, I found that limiting the data to 8% of the total produced a usable framerate, which was disappointing. (Later, I added a motion blur which let me get away with a slightly chunkier framerate, letting me bump it to 10% – it could probably go to 15 but it would start to really show at that point.)
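A random-selection filter like this can be sketched in a few lines. This is a plain-Java illustration with hypothetical names; I'm assuming a per-row coin flip, which keeps roughly (not exactly) the requested fraction of rows. The seed parameter is only there to make the sample repeatable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of a random-selection filter over table rows.
// Names are hypothetical; the per-row coin flip is an assumption.
class SampleFilterSketch {
    // Keep each row index independently with probability keepFraction.
    static List<Integer> sampleRows(int rowCount, double keepFraction, long seed) {
        Random rng = new Random(seed);
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rowCount; i++) {
            if (rng.nextDouble() < keepFraction) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

Calling `sampleRows(rowCount, 0.10, seed)` once at startup and only drawing the kept rows each frame means the per-frame cost scales with the sample rather than the full table.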
The final step was to add the star ranking view. In order to make the squares proportionally sized to each other, I had Processing sort/count the number of hotels in each category in setup(), which runs once at startup. I noticed unrated was the largest, so I mapped all the other squares against that. I also ended up square-rooting each category, so that the size difference of the squares would be exaggerated.
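As a sketch of the counting and sizing step (again plain Java with made-up names, not the repo code): tally the hotels per rating, then scale each square against the largest category using the square root of its count. I'm assuming whole-number ratings from 0 to 5 and that the square root feeds the side length; with that choice, each square's area ends up proportional to its hotel count.

```java
// Sketch of per-rating counts and square sizing.
// Names are hypothetical; integer ratings 0..5 are an assumption.
class StarSquaresSketch {
    // Count hotels per rating; index 0 holds unrated, 1..5 hold star levels.
    static int[] countByStars(int[] ratings) {
        int[] counts = new int[6];
        for (int r : ratings) {
            if (r >= 0 && r <= 5) {
                counts[r]++;
            }
        }
        return counts;
    }

    // Side length of each square relative to the largest category,
    // using the square root of the count ratio.
    static float[] squareSides(int[] counts, float maxSide) {
        int largest = 0;
        for (int c : counts) {
            largest = Math.max(largest, c);
        }
        float[] sides = new float[counts.length];
        if (largest == 0) {
            return sides; // no data: all sides stay 0
        }
        for (int i = 0; i < counts.length; i++) {
            sides[i] = maxSide * (float) Math.sqrt((double) counts[i] / largest);
        }
        return sides;
    }
}
```

Doing both steps once in `setup()` and reusing the resulting sizes in `draw()` avoids recounting the table on every frame.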