We were visited last Thursday by David Dietrich, an Advisory Technical Consultant in Data Science at EMC Education Services. David’s topic was big data: what it is, how it is being used in various industries, and what it takes to work with it effectively as a data scientist. His examples of data science in action got a good discussion going about privacy issues and the ethics of using big data to make decisions that affect people’s lives. It was a lively and interactive class session. Details can be found in David’s excellent slides (PDF, 4.8 MB) and in my notes (PDF, 121 KB) from his presentation.
One aspect of David’s presentation that interested me was the distinction he made between “data scientists” and “data-savvy professionals,” both of which will be in high demand in the coming years, according to a report on big data by the McKinsey Global Institute. Data scientists are “deep analytical talent”: people with solid backgrounds in applied math, economics, or life science, who not only have a quantitative and technical bent, but are also skeptical, communicative, curious, and creative. They are likely to “know more statistics than engineers and more engineering than statisticians.” Data-savvy professionals, on the other hand, may have many of these qualities but do not have the same level of technical training. They may have a single class in statistics under their belts and can frame problems as data-driven or analytical problems. They know how to think about data, how to ask the right kinds of questions, and how to challenge the answers they receive.
The name of our class notwithstanding, when it comes to thinking of ourselves as data scientists or data-savvy professionals, I think it would be accurate to say that most of us in this class are aiming for the latter designation. However, for those of us interested in pursuing more rigorous training in data science, David recommends the following formal training options:
- EMC Data Science & Big Data Analytics course
- STEM (science, technology, engineering, and math) graduate programs and certificates
- Conferences on analytics (which often post content online): Strata, PAW, ACM, ACL, INFORMS
- Massive Open Online Courses (MOOCs)
He also recommends a variety of informal options for building skills in this area, including joining meetups, volunteering with DataKind, entering contests on Kaggle and InnoCentive, and simply looking for “opportunities to drive new value” in our own work. Among the potential analyses he proposed within the library domain were
- mapping the influence of ideas in research literature,
- using citation networks to identify the most influential researchers, and
- using citation mapping and social network techniques to predict award-winning research papers.
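To make the second idea a little more concrete, here is a minimal Python sketch of how a citation network might be scored for influence. It is not something David showed us: the papers, authors, and citation links are made-up toy data, and the choice of a PageRank-style score (with each paper's score split evenly among its authors) is my own assumption about one reasonable way to do it.

```python
from collections import defaultdict

# Hypothetical toy data: each paper maps to the list of papers it cites.
citations = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_a", "paper_c"],
    "paper_c": [],
    "paper_d": ["paper_c", "paper_b"],
}

# Hypothetical authorship: each paper maps to its authors.
authors = {
    "paper_a": ["Author 1"],
    "paper_b": ["Author 2"],
    "paper_c": ["Author 3"],
    "paper_d": ["Author 1", "Author 2"],
}


def pagerank(graph, damping=0.85, iterations=50):
    """Score papers with a simple power-iteration PageRank."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Every citing paper passes an equal share of its score to each cited paper.
        for node in nodes:
            for cited in graph[node]:
                new_rank[cited] += damping * rank[node] / len(graph[node])
        rank = new_rank
    return rank


def author_scores(paper_rank, authorship):
    """Split each paper's score evenly among its authors and sum per author."""
    scores = defaultdict(float)
    for paper, score in paper_rank.items():
        for author in authorship[paper]:
            scores[author] += score / len(authorship[paper])
    return dict(scores)


paper_rank = pagerank(citations)
ranking = sorted(
    author_scores(paper_rank, authors).items(),
    key=lambda item: item[1],
    reverse=True,
)
for author, score in ranking:
    print(f"{author}: {score:.3f}")
```

With real data, the `citations` dictionary would come from a citation index, and a plain citation count would be a simpler alternative to the PageRank-style score used here.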
As a specific example of social graphs in the humanities, Chris suggested Mapping the Republic of Letters, a project centered at Stanford that visualizes the connections among early modern scholars.
Tonight we went further into the topic of visualizations with a preview of Tableau visualization software and a presentation of tips and best practices by Chris. Stay tuned for more in my next post, coming soon!