Latest Blogs

Data Savvy Librarians Meetup

Mark your calendars: the first Data Savvy Librarians Meetup is coming up soon! It takes place on Wednesday, August 27, 4-6 pm, in Phillips Auditorium, Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge. We’ll be around from 3:30 to 4 pm, in advance of the meetup, to help you get Python, Google Refine, etc. set up on your laptops. More and… Read More

Registration now open for Data Scientist Training for Librarians (DST4L)

Data Scientist Training for Librarians, or DST4L, is an experimental course being offered by the Harvard-Smithsonian Center for Astrophysics John G. Wolbach Library and the Harvard Library to train librarians to respond to the growing data needs of their communities. Data science techniques are becoming increasingly important to all fields of scholarship. In this hands-on… Read More

DST4L Feedback Session

The last regularly scheduled meeting of the DST4L class on November 26th, 2013 was devoted to course feedback (the originally scheduled topic “Putting it all Together: On the Web, PDF, Reporting” was changed to account for a short holiday week and the hackathon planned in January). Chris Erdmann opened the session by reiterating the original… Read More

Topic Modeling and Gephi

“(11/17/2013) Guided once more by the wise and powerful Dr. Lynn Cherny, MD (Master of Data), we first attempted to discourage cardiac fibrillation by applying Topic Modeling to Grimm’s Fairy Tales, and we then performed an exploratory Gephi Procedure on the resulting corpus. Both efforts were considered successful and the Gephi Procedure revealed that data… Read More

Gabriel Florit, Boston Globe Visualizations

On November 18th, Gabriel Florit from the Boston Globe talked about data analysis and exploration and presented a wide variety of his interactive visualizations to an attentive audience at the CfA Phillips Auditorium. Gabriel’s Natural Language Toolkit (NLTK) projects included text analysis of words used by Mitt Romney and Barack Obama during the Presidential campaign. NLTK… Read More

Data Vis 101 with Lynn Cherny

The DST4L class of November 4 featured the return of instructor Lynn Cherny, whom we last saw at the end of August when she taught a class on data manipulation and graphing in Excel. Lynn started her lecture by quoting a reporter she was talking to earlier this year, who said, “Data visuals are the… Read More

Naive Bayes and Friends

In our 9th week of #DST4L, Rahul Dave built on the previous week’s crash course in statistics with a deep dive into machine learning, classifiers, and information retrieval. First, we worked through further applications of scikit-learn, using a common pattern for classification: create a classifier; fit the classifier with training values; run classifier.predict… Read More
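
For readers who have not seen the create, fit, predict pattern before, here is a minimal sketch of it. It is not scikit-learn code and not what was shown in class: `TinyNaiveBayes` is a hypothetical stand-in written with the standard library only, and the training sentences are made up for illustration. It only aims to show the shape of the three steps with a small Naive Bayes word-count classifier.

```python
import math
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Hypothetical toy classifier with a fit/predict interface (not scikit-learn)."""

    def fit(self, documents, labels):
        # Count how many documents carry each label and which words they use.
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(documents, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, documents):
        return [self._best_label(text.lower().split()) for text in documents]

    def _best_label(self, words):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one smoothing so unseen words do not zero out a label.
            denominator = sum(counts.values()) + len(self.vocabulary)
            score = math.log(doc_count / total_docs)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up training data, for illustration only.
train_docs = [
    "the wolf ate the goat",
    "the princess married the prince",
    "the wolf chased the pig",
    "the prince kissed the princess",
]
train_labels = ["animal", "royal", "animal", "royal"]

classifier = TinyNaiveBayes()              # 1. create a classifier
classifier.fit(train_docs, train_labels)   # 2. fit it with training values
predictions = classifier.predict(          # 3. run predict on new text
    ["a wolf and a pig", "the prince and the princess"]
)
print(predictions)
```

With a real library the class would be imported rather than written by hand, but the three calls would keep the same order.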

Using Web APIs from Python

On October 1, we had our fifth and final session with our able instructor, Tom Morris, for this section of the DST4L course. Tom provided us with an overview of APIs that was succinct but detailed. We learned how web APIs are used to tease data from web sites. For those who are new to… Read More

Opening Government with Python

On October 4th, we were lucky to be joined by James Turk from the Sunlight Foundation, “a nonprofit, nonpartisan organization that uses the power of the Internet to catalyze greater government openness and transparency, and provides new tools and resources for media and citizens, alike. They are committed to improving access to government information by… Read More

Web Scraping 101

This week we continued on our Data Scientist journey by exploring web scraping basics. Our intrepid leader was again Tom Morris, who demonstrated two ways, out of many, to scrape web pages. While there are plenty of tools online to accomplish this task, the focus of the class was on OpenRefine and Python. Before starting… Read More
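
As a rough illustration of the Python side of scraping, the sketch below pulls link text and targets out of a page using only the standard library's `html.parser`. It is an assumption-laden example, not the code from the session: the `LinkCollector` class, the sample HTML, and the URLs in it are all hypothetical, and the page is an inline string so nothing is fetched from the network.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (link text, href) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current_href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        # Start recording text when an anchor tag opens.
        if tag == "a":
            self._current_href = dict(attrs).get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside an anchor with an href.
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._current_href))
            self._current_href = None


# Made-up page fragment; the paths are placeholders, not real URLs.
SAMPLE_HTML = """
<ul>
  <li><a href="/posts/web-scraping-101">Web Scraping 101</a></li>
  <li><a href="/posts/openrefine-strings">OpenRefine in Action</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(SAMPLE_HTML)
links = collector.links
print(links)
```

For a live page, the HTML string would come from an HTTP request instead of a constant; the parsing step stays the same.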

OpenRefine: Reconcile, Extend, & Publish

Training on OpenRefine continues with instruction from expert Tom Morris. Over two weeks we have learned how to import different kinds of file formats, and explored the tools and techniques for reconciling names and other entries in our data. Reconciling is the process of matching elements in the data set to remote linked data sources.… Read More

OpenRefine in Action: Working with Strings

OpenRefine has already proved a valuable tool in the workplace. Days after Tom’s first session I produced a set of CSV files with usage data from Harvard’s library reporting system. I wanted to upload and share the data in a Google spreadsheet, but first had to modify the date information to enable sorting and filtering.… Read More