Search Results for "openrefine"

OpenRefine: Reconcile, Extend, & Publish

Training on OpenRefine continues with instruction from expert Tom Morris. Over two weeks we have learned how to import files in different formats, and explored the tools and techniques for reconciling names and other entries in our data. Reconciling is the process of matching elements in the data set to remote linked data sources.… Read More

OpenRefine in Action: Working with Strings

OpenRefine has already proved a valuable tool in the workplace. Days after Tom’s first session I produced a set of CSV files with usage data from Harvard’s library reporting system. I wanted to upload and share the data in a Google spreadsheet, but first had to modify the date information to enable sorting and filtering.… Read More

Getting data into shape with OpenRefine

Tom Morris, one of the lead developers of the data refinement tool OpenRefine, visited us last Thursday. OpenRefine (also known simply as Refine and by its former name, Google Refine) is free, desktop-based software described by its developers as a “power tool for working with messy data.” Tom presented an overview of Refine’s… Read More

Current Course

Data Scientist Training for Librarians (DST4L) is an experimental course being offered by the Harvard-Smithsonian Center for Astrophysics John G. Wolbach Library and the Harvard Library to train librarians to respond to the growing data needs of their communities. Data science techniques are becoming increasingly important to all fields of scholarship. In this hands-on course,… Read More

Course Details

Recent studies suggest that organizations will soon face a shortfall of skilled talent able to help them take advantage of big data. Meanwhile, government initiatives have encouraged the research community to share their data more openly, raising new challenges for researchers. Librarians can assist in this new data-driven environment. Data… Read More

DST4L Feedback Session

The last regularly scheduled meeting of the DST4L class, on November 26, 2013, was devoted to course feedback (the originally scheduled topic, “Putting it all Together: On the Web, PDF, Reporting,” was changed to account for a short holiday week and the hackathon planned for January). Chris Erdmann opened the session by reiterating the original… Read More

Web Scraping 101

This week we continued on our Data Scientist journey by exploring web scraping basics. Our intrepid leader was again Tom Morris, who demonstrated two ways, out of many, to scrape web pages. While there are plenty of tools online to accomplish this task, the focus of the class was on OpenRefine and Python. Before starting… Read More

Course Details

The aim of the course is to upgrade the skill sets of librarians so that they can better serve the data needs of their communities, while also learning how to improve current library workflows and systems. Due to the experimental nature of the course, the teaching style will be less formal and flexible to the… Read More

Summing Up

After five months of hands-on tutorials, guest speakers, and group projects, we have reached the end of Data Scientist Training for Librarians knowing much more than we did before about data, where we can get it, and what we can do with it. This blog has been my vehicle for communicating what we were learning… Read More

What is Open Access, really?

I signed up for the Data Scientist Training for Librarians (DST4L) course because I wanted to learn more about Python; to learn to use emerging tools and technologies; to get involved with projects that are cultivating new roles, opening new possibilities, and identifying new challenges within the field of escience librarianship; and to explore my… Read More

DST4L: Coding 101 plus Data Science Conversations

This past semester I was one of about two dozen librarians and library students participating in the Data Scientist Training for Librarians (DST4L) course, hosted by the Harvard-Smithsonian Center for Astrophysics and taught by Chris Erdmann. I was interested in the course as a science librarian because big data is a growing theme in science… Read More

Data journalism at the Globe

Our final guest speaker for the class was Matt Carroll, a reporter for the Boston Globe who specializes in data. Matt is the go-to data guy in the newsroom: reporters can come to him with their datasets whether or not they have a story idea in mind, and he will help them identify… Read More