Search Results for "openrefine"

OpenRefine: A Tool for Print Collection Management

When I enrolled in the Fall 2014 Data Scientist Training for Librarians (DST4L) course, I wasn’t sure how relevant the course’s content would be to my current projects. In prepping for the course’s first two sessions, which focused on OpenRefine (an open source tool for managing “messy” data), I quickly realized that here is a… Read More

OpenRefine and Metadata Freedom

“(9/22/2014) The spectacles were perpetrated by wizards of a sort. Magicians of Dada, or ta-da or someodd. As I first understood it, their power was mystical or perhaps equinoctial. All I knew was that their magic was more than illusion, and for them it was something other than magic. It could be studied, taught, and learned.… Read More

OpenRefine: Reconcile, Extend, & Publish

Training on OpenRefine continues with instruction from expert Tom Morris. Over two weeks we have learned how to import different kinds of file formats, and explored the tools and techniques for reconciling names and other entries in our data. Reconciling is the process of matching elements in the data set to remote linked data sources.… Read More

OpenRefine in Action: Working with Strings

OpenRefine has already proved a valuable tool in the workplace. Days after Tom’s first session I produced a set of CSV files with usage data from Harvard’s library reporting system. I wanted to upload and share the data in a Google spreadsheet, but first had to modify the date information to enable sorting and filtering.… Read More

Getting data into shape with OpenRefine

We were visited last Thursday by Tom Morris, one of the lead developers on the data refinement tool OpenRefine.  OpenRefine (also known simply as Refine and by its former name, Google Refine) is free, desktop-based software described by its developers as a “power tool for working with messy data.”  Tom presented an overview of Refine’s… Read More

Data Digging and Display with Python

Kidney cancer mortality rates by state.

“(11/8/2014) The Craftsman worked with speed and precision. He was architect, researcher, tester, and creator. From the translucent python coiled around his arm, he wrought the tools to build his dreams. By his own admission the quiet creature was still mostly mystery. As he worked, the anaconda’s gentle chatter aligned with the appearance of biblical… Read More

Current Course

Data Scientist Training for Librarians (DST4L) is an experimental course being offered by the Harvard-Smithsonian Center for Astrophysics John G. Wolbach Library and the Harvard Library to train librarians to respond to the growing data needs of their communities. Data science techniques are becoming increasingly important to all fields of scholarship. In this hands-on course,… Read More

Course Details

Recent studies suggest that there will be a shortfall in the near future of skilled talent available to help take advantage of big data in organizations. Meanwhile, government initiatives have encouraged the research community to share their data more openly, raising new challenges for researchers. Librarians can assist in this new data-driven environment. Data… Read More

DST4L Feedback Session

The last regularly scheduled meeting of the DST4L class on November 26th, 2013 was devoted to course feedback (the originally scheduled topic “Putting it all Together: On the Web, PDF, Reporting” was changed to account for a short holiday week and the hackathon planned in January). Chris Erdmann opened the session by reiterating the original… Read More

Web Scraping 101

This week we continued on our Data Scientist journey by exploring web scraping basics. Our intrepid leader was again Tom Morris, who demonstrated two ways, out of many, to scrape web pages. While there are plenty of tools online to accomplish this task, the focus of the class was on OpenRefine and Python. Before starting… Read More

Course Details

The aim of the course is to upgrade the skill sets of librarians so that they can better serve the data needs of their communities, while also learning how to improve current library workflows and systems. Due to the experimental nature of the course, the teaching style will be less formal and flexible to the… Read More

Summing Up

After five months of hands-on tutorials, guest speakers, and group projects, we have reached the end of Data Scientist Training for Librarians knowing much more than we did before about data, where we can get it, and what we can do with it.  This blog has been my vehicle for communicating what we were learning… Read More

What is Open Access, really?

I signed up for the Data Scientist Training for Librarians (DST4L) course because I wanted to learn more about Python; to learn to use emerging tools and technologies; to get involved with projects that are cultivating new roles, opening new possibilities, and identifying new challenges within the field of escience librarianship; and to explore my… Read More