• Assumptions, Visualizations, and the Quantified Self

Latest Blogs


Data Visualization Made Easy

Interactive Visualizations: Finding and Telling Stories with Data using Tableau

On February 5th, Tara Walker taught us how to use Tableau Public, a free service from Tableau, the company also known for its Tableau Desktop software. Tableau offers an Academic program for both students and teachers that universities can take advantage of. After learning best practices… Read More

Design for Interactive Data Visualization

On December 1st we were lucky to have Lynn Cherny join us as a guest speaker. Lynn’s talk, Design for Interactive Data Visualization (slides & video), started with Shneiderman’s Infovis mantra: Overview first, zoom and filter, then details-on-demand. For the remainder of her talk, she followed the outline below using a number of examples from the NYTimes: Overview &… Read More

Data Digging and Display with Python

Kidney cancer mortality rates by state.

“(11/8/2014) The Craftsman worked with speed and precision. He was architect, researcher, tester, and creator. From the translucent python coiled around his arm, he wrought the tools to build his dreams. By his own admission the quiet creature was still mostly mystery. As he worked, the anaconda’s gentle chatter aligned with the appearance of biblical… Read More

Why Git? Managing Version Control

Kyle Daigle, GitHub Trainer

A roomful of librarians interspersed with a few technology geeks gathered at the Harvard-Smithsonian Center for Astrophysics Phillips Auditorium last weekend to hear software developer and GitHub staffer Kyle Daigle speak passionately about distributed version control through the use of Git and GitHub. Part of the DST4L course initiated by John G. Wolbach’s Head Librarian,… Read More

OpenRefine: A Tool for Print Collection Management


When I enrolled in the Fall 2014 Data Scientist Training for Librarians (DST4L) course, I wasn’t sure how relevant the course’s content would be to my current projects. In prepping for the course’s first two sessions, which focused on OpenRefine (an open source tool for managing “messy” data), I quickly realized that here is a… Read More

OpenRefine and Metadata Freedom

“(9/22/2014) The spectacles were perpetrated by wizards of a sort. Magicians of Dada, or ta-da or someodd. As I first understood it, their power was mystical or perhaps equinoctial. All I knew was that their magic was more than illusion, and for them it was something other than magic. It could be studied, taught, and learned.… Read More

Data Savvy Librarians Meetup

Mark your calendars: the first Data Savvy Librarians Meetup is coming up soon!
Wednesday, August 27, 4 pm-6 pm
Phillips Auditorium, Harvard-Smithsonian Center for Astrophysics
60 Garden Street, Cambridge
We’ll be around to help you get Python, Google Refine, etc. set up on your laptops in advance of the meetup, from 3:30-4 pm. More and… Read More

Registration now open for Data Scientist Training for Librarians (DST4L)

Data Scientist Training for Librarians or DST4L is an experimental course being offered by the Harvard-Smithsonian Center for Astrophysics John G. Wolbach Library and the Harvard Library to train librarians to respond to the growing data needs of their communities. Data science techniques are becoming increasingly important to all fields of scholarship. In this hands-on… Read More

DST4L Feedback Session

The last regularly scheduled meeting of the DST4L class, on November 26th, 2013, was devoted to course feedback (the originally scheduled topic, “Putting it all Together: On the Web, PDF, Reporting”, was changed to account for a short holiday week and the hackathon planned in January). Chris Erdmann opened the session by reiterating the original… Read More

Topic Modeling and Gephi


“(11/17/2013) Guided once more by the wise and powerful Dr. Lynn Cherny, MD (Master of Data), we first attempted to discourage cardiac fibrillation by applying Topic Modeling to Grimm’s Fairy Tales, and we then performed an exploratory Gephi Procedure on the resulting corpus.  Both efforts were considered successful and the Gephi Procedure revealed that data… Read More

Gabriel Florit, Boston Globe Visualizations


On November 18th, Gabriel Florit from the Boston Globe talked about data analysis and exploration and presented a wide variety of his interactive visualizations to an attentive audience at the CfA Phillips Auditorium. Gabriel’s Natural Language Toolkit (NLTK) projects included text analysis of words used by Mitt Romney and Barack Obama during the Presidential campaign. NLTK… Read More

Data Vis 101 with Lynn Cherny


The DST4L class of November 4 featured the return of instructor Lynn Cherny, whom we last saw at the end of August when she taught a class on data manipulation and graphing in Excel. Lynn started her lecture by quoting a reporter she was talking to earlier this year, who said, “Data visuals are the… Read More

Naive Bayes and Friends


In our 9th week of #DST4L, Rahul Dave built on the previous week’s crash course in statistics with a deep dive into machine learning, classifiers, and information retrieval. First, we worked through further applications of scikit-learn, using a common pattern for classification:
  • Create a classifier
  • Fit the classifier with training values
  • Run classifier.predict… Read More
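For readers who want to see the create, fit, predict pattern before clicking through, here is a minimal sketch. It is not the code from the class: `TinyNaiveBayes` is a hypothetical, standard-library stand-in for a scikit-learn classifier, written so the example runs without any installs, and the four training sentences are invented for illustration. scikit-learn's own classifiers expose the same `fit` and `predict` methods.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing.

    A stand-in for a scikit-learn classifier, not the class's actual code.
    """

    def fit(self, documents, labels):
        # Count how many documents and which words belong to each label.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, documents):
        return [self._best_label(text.lower().split()) for text in documents]

    def _best_label(self, words):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denominator = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            for word in words:
                if word in self.vocabulary:
                    score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training data, for illustration only.
train_docs = [
    "telescope galaxy star",
    "star orbit telescope",
    "catalog shelf book",
    "book loan catalog",
]
train_labels = ["astronomy", "astronomy", "library", "library"]

classifier = TinyNaiveBayes()             # 1. Create a classifier
classifier.fit(train_docs, train_labels)  # 2. Fit it with training values
predictions = classifier.predict(["galaxy orbit", "shelf loan book"])  # 3. Predict
print(predictions)
```

Swapping in a real scikit-learn classifier would mean turning the text into numeric features first, but the three calls keep the same shape.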