On November 18th, Gabriel Florit from the Boston Globe talked about data analysis and exploration and presented a wide variety of his interactive visualizations to an attentive audience at the CfA Phillips Auditorium.
Gabriel’s projects built with the Natural Language Toolkit (NLTK) included a text analysis of the words used by Mitt Romney and Barack Obama during the presidential campaign. He first used NLTK to filter out stop words, and later switched to term frequency-inverse document frequency (tf-idf) to rank words with a context-specific weighting. For an analysis of Mayor Menino’s speeches, Gabriel focused on ten words and showed how their frequency changed over time (‘Women’ a central theme in Menino’s speech). He also studied over 1.8 million words of past State of the Union speeches and created a State of the Union Address quiz, which increased reader engagement by letting readers share their scores on Facebook and Twitter.
A Year of Complaints mapped calls made to the Mayor’s 24-hour hotline over the course of a year. Boston mayoral election visualizations included a look at mayoral campaign donations by ZIP code, campaign donations block by block, and mayoral race turnout by precinct. Many more examples are available on his website, http://gabrielflor.it.
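To make the stop-word filtering and tf-idf ranking mentioned above more concrete, here is a minimal sketch in plain Python. It is not Gabriel’s code: the tiny stop-word list, the function names, and the sample sentences are all made up for illustration, and a real project would more likely use NLTK’s fuller stop-word list.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; NLTK ships a much fuller one).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "we", "our", "for", "will"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def tf_idf(docs):
    """Return one {word: score} dict per document.

    tf  = occurrences of the word in the document / tokens in the document
    idf = log(number of documents / number of documents containing the word)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Count in how many documents each word appears at least once.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append(
            {w: (c / total) * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return scores


def top_words(score, k=3):
    """Return the k highest-scoring words of one document."""
    return sorted(score, key=score.get, reverse=True)[:k]


# Made-up sample sentences, not quotes from any real speech.
speeches = [
    "We will create jobs and cut taxes for the middle class",
    "We will create jobs and invest in education and clean energy",
]
scores = tf_idf(speeches)
```

Words that appear in every document (here "create" and "jobs") get a score of zero, while words unique to one speaker rise to the top. That is the sense in which the weighting is context specific.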
Some of Gabriel’s best practices:
- showing all the data isn’t always the best approach
- it’s good to incorporate a “tacit tutorial” as a quick usage guide
- outlier data can be an interesting side note
- sometimes there is no substitute for intimate familiarity with the data
- responsive annotations engage viewers