Course Details

The aim of the course is to upgrade the skill sets of librarians so that they can better serve the data needs of their communities, while also learning how to improve current library workflows and systems. Because the course is experimental, the teaching style will be less formal and more flexible, adapting to the needs of instructors and students. Participants will get their hands dirty with data and work with some of the latest technologies used by data scientists. As in the previous course, participants will present their work to the library community at the end of the course.

Lead Instructors:
Tom Morris, Rahul Dave, Lynn Cherny

David Dietrich, Software Carpentry, James Turk, Gabriel Florit, Christopher Erdmann (Organizer)

Registration closed

Course Material:
Class Projects
Class Notes
Previous course

Harvard-Smithsonian CfA, 60 Garden Street, Cambridge, MA
Harvard Library, 90 Mt Auburn, Cambridge, MA

July 26 – Intro to DST4L
Aug 23-24 – Software Carpentry Bootcamp
Aug 28 – Nov 26 – Classes by T. Morris, R. Dave, L. Cherny
Dec 14 – Projects/Class Ends
Mainly Tuesdays, 10–1 and 11–2; also some Wednesdays and Thursdays (see syllabus)

General Course Outline:
Overview, Data Sources, Data Extraction, Data Cleansing, Sharing, Visualization, Presentation, Advanced Topics

Unix Shell, Git, Python, OpenRefine, Excel
* D3, R, Tableau
PDF extraction, JSON/XML wrangling, HTML page scraping, pandas and visualization, NLTK/scikit-learn text extraction, text tools/RegEx, recommender systems, basic statistics, simple ML/naive Bayes, static visualization, dynamic visualization, OpenRefine and tools, Excel spreadsheet usage, basic NumPy/matplotlib
