On August 23rd and 24th, we met at the Harvard-Smithsonian CfA for Session I of Data Science Training for Librarians (DST4L). The beautiful weather notwithstanding, we assembled on Friday and Saturday mornings in the Phillips Auditorium, ready to work up a sweat at this two-day Software Carpentry Bootcamp.
We covered key concepts like variables, string and list manipulation, conditional statements, and writing functions. With the able guidance of our instructors, we quickly progressed to the point where we could write Python scripts to perform useful data-mining tasks. We searched a dictionary file to find words matching patterns of interest (okay, we learned how to cheat at Scrabble, but what else are you supposed to do when you’re stuck with an ‘S’, a ‘Z’ and three ‘Y’s?). We parsed a large file, the text of Hamlet, using regular expressions and other Python tricks we’d learned to format the text, create word-frequency lists, and extract other kinds of useful information. We also saw some examples of more complex, “real world” research projects that applied some of these principles.
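For anyone curious what such a script might look like, here is a minimal sketch of the two exercises described above: finding words that match a pattern and building a word-frequency list. This is not the code from the bootcamp. The function names `words_matching` and `word_frequencies`, the example pattern, and the tiny sample data are all invented for illustration; in class the input came from a dictionary file and the full text of Hamlet.

```python
import re
from collections import Counter


def words_matching(words, pattern):
    """Return the words that match a regular expression from start to end."""
    regex = re.compile(pattern)
    return [word for word in words if regex.fullmatch(word)]


def word_frequencies(text):
    """Count how often each word occurs, ignoring case and punctuation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Stand-in data (an assumption): a real run would read these from files.
sample_words = ["syzygy", "fizzy", "sly", "zany", "hamlet"]
sample_text = "To be, or not to be, that is the question."

# Words containing a 'z' somewhere and ending in 'y'.
print(words_matching(sample_words, r".*z.*y"))

# The two most frequent words in the sample text.
print(word_frequencies(sample_text).most_common(2))
```

To try this on real files, one could read a word list with something like `open(path).read().split()` and pass the whole text of a play to `word_frequencies`; the file paths would depend on where the data lives.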
Version control, a vital part of any successful collaborative project, was introduced as well. There are different version control systems out there, but for this class we used the popular Git, together with GitHub (http://github.com). Systems like Git support project management tasks such as tracking revision history, submitting proposed changes, and undoing changes when necessary. Users can also ‘fork’ a copy of a project into their own repository so they can experiment with changes without affecting the main project.
This is just a quick summary of what was an information-packed inaugural session. Thank you to the folks from Software Carpentry! Thanks as well to Chris Erdmann for all he’s done to make this learning opportunity available, and for taking the time to show us the beautiful old Great Refractor telescope and amazing SDO solar images at the CfA.