
Latest Blogs


OpenRefine: A Tool for Print Collection Management


When I enrolled in the Fall 2014 Data Scientist Training for Librarians (DST4L) course, I wasn’t sure how relevant the course’s content would be to my current projects. In prepping for the course’s first two sessions, which focused on OpenRefine (an open source tool for managing “messy” data), I quickly realized that here is a… Read More

OpenRefine and Metadata Freedom

“(9/22/2014) The spectacles were perpetrated by wizards of a sort. Magicians of Dada, or ta-da or someodd. As I first understood it, their power was mystical or perhaps equinoctial. All I knew was that their magic was more than illusion, and for them it was something other than magic. It could be studied, taught, and learned.… Read More

Data Savvy Librarians Meetup

Mark your calendars: the first Data Savvy Librarians Meetup is coming up soon! Wednesday, August 27, 4 pm-6 pm, Phillips Auditorium, Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge. We’ll be around to help you get Python, Google Refine, etc. set up on your laptops in advance of the meetup, from 3:30-4 pm. More and… Read More

Registration now open for Data Scientist Training for Librarians (DST4L)

Data Scientist Training for Librarians or DST4L is an experimental course being offered by the Harvard-Smithsonian Center for Astrophysics John G. Wolbach Library and the Harvard Library to train librarians to respond to the growing data needs of their communities. Data science techniques are becoming increasingly important to all fields of scholarship. In this hands-on… Read More

DST4L Feedback Session

The last regularly scheduled meeting of the DST4L class, on November 26th, 2013, was devoted to course feedback (the originally scheduled topic, “Putting it all Together: On the Web, PDF, Reporting,” was changed to account for a short holiday week and the hackathon planned for January). Chris Erdmann opened the session by reiterating the original… Read More

Topic Modeling and Gephi


“(11/17/2013) Guided once more by the wise and powerful Dr. Lynn Cherny, MD (Master of Data), we first attempted to discourage cardiac fibrillation by applying Topic Modeling to Grimm’s Fairy Tales, and we then performed an exploratory Gephi Procedure on the resulting corpus.  Both efforts were considered successful and the Gephi Procedure revealed that data… Read More

Gabriel Florit, Boston Globe Visualizations


On November 18th, Gabriel Florit from the Boston Globe talked about data analysis and exploration and presented a wide variety of his interactive visualizations to an attentive audience at the CfA Phillips Auditorium. Gabriel’s Natural Language Toolkit (NLTK) projects included text analysis of words used by Mitt Romney and Barack Obama during the Presidential campaign. NLTK… Read More

Data Vis 101 with Lynn Cherny


The DST4L class of November 4 featured the return of instructor Lynn Cherny, whom we last saw at the end of August when she taught a class on data manipulation and graphing in Excel. Lynn started her lecture by quoting a reporter she was talking to earlier this year, who said, “Data visuals are the… Read More

Naive Bayes and Friends


In our 9th week of #DST4L, Rahul Dave built on the previous week’s crash course in statistics with a deep dive into machine learning, classifiers, and information retrieval. First, this saw us working through further applications of scikit-learn, using a common pattern for classification: (1) create a classifier, (2) fit the classifier with training values, (3) run classifier.predict… Read More
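For readers who have not met the create, fit, predict pattern mentioned in this teaser, here is a minimal sketch of it in plain Python. It is not scikit-learn code and not what was used in class: the `TinyNaiveBayes` class, its toy training sentences, and the "library"/"spam" labels are all hypothetical, made up only to show the shape of the three steps.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Toy multinomial Naive Bayes over whitespace-split words (hypothetical helper)."""

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, texts):
        return [self._predict_one(text) for text in texts]

    def _predict_one(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing so unseen words do not zero out a class.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data, for illustration only.
train_texts = [
    "overdue book renewal notice",
    "library catalog record update",
    "win a free prize now",
    "free prize claim now",
]
train_labels = ["library", "library", "spam", "spam"]

classifier = TinyNaiveBayes()                  # 1. create a classifier
classifier.fit(train_texts, train_labels)      # 2. fit it with training values
predictions = classifier.predict(["free prize", "catalog record"])  # 3. predict
```

The same three calls carry over to library classifiers that expose `fit` and `predict` methods; only the class being instantiated changes.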

Using Web APIs from Python

On October 1, we had our fifth and final session with our able instructor, Tom Morris, for this section of the DST4L course. Tom provided us with an overview of APIs that was succinct but detailed. We learned how web APIs are used to tease data from web sites. For those who are new to… Read More

Opening Government with Python

On October 4th, we were lucky to be joined by James Turk from the Sunlight Foundation, “a nonprofit, nonpartisan organization that uses the power of the Internet to catalyze greater government openness and transparency, and provides new tools and resources for media and citizens, alike. They are committed to improving access to government information by… Read More

Web Scraping 101

This week we continued on our Data Scientist journey by exploring web scraping basics. Our intrepid leader was again Tom Morris, who demonstrated two ways, out of many, to scrape web pages. While there are plenty of tools online to accomplish this task, the focus of the class was on OpenRefine and Python. Before starting… Read More