JupyterDays Day 1!

Welcome to Jupyter Days Boston 2016, Day 1! This post covers the highlights of the first day of the meeting. The event is taking place at Harvard Law School and is organized by Project Jupyter, O’Reilly Media, and the Harvard-Smithsonian Center for Astrophysics Library. Follow along on Twitter with #JupyterDays.

The day started with opening words of welcome from the organizers. Chris Erdmann of the CfA Library highlighted the contributions libraries can make in supporting projects and tools like Jupyter Notebooks. Jupyter Notebooks are an excellent example of an open educational resource that can have a profound impact on research communities. Chris pointed attendees to the GitHub Guide “Making your code citable,” which walks through using Zenodo to mint DOIs for GitHub repo releases. Andrew Odewahn of O’Reilly gave a rundown of the day’s schedule and explained how the Birds of a Feather sessions would be run. Ana Ruvalcaba, the Operations Manager for Project Jupyter, pointed out the Jupyter Community Code of Conduct that would be followed throughout the event.

The keynote speaker for Day 1 was Matthias Bussonnier. His talk materials are in this GitHub repo. Matthias traced the history of the project from the “Dark Ages” (before 2001, when scientific computing meant languages like C++ and Fortran) to IPython (~2001-2010), which helped scientists do their everyday work instead of fighting with Python all the time. From 2011-2012 the IPython team developed the QtConsole, which was really useful but still somewhat difficult to use. Now the project is called Jupyter and is currently on its sixth Notebook prototype, which integrates a number of improvements: WebSockets instead of Ajax, MathJax, JavaScript, JSON instead of XML, a more mature protocol, separation of concerns, and feedback from SageMath.

Matthias emphasized that community-built examples are important to Jupyter’s continued development, and that the project does not distinguish between developers and users on its mailing lists. The team relies on the community for new ideas (there are more ways to contribute than just writing code) and will help along the way. For example, they recently rolled out nbviewer and are showcasing a gallery of interesting notebooks, like the work done by LIGO. Matthias also noted some goals moving forward, including improvements to running code in-line in documents using Docker. Because it is difficult to keep up with everything happening in a fast-paced open source project, he suggested people try a more curated channel, like the project newsletter, to communicate with project developers.

Thorin Tabor gave the next talk of the first half of the day. Thorin’s work on GenePattern Notebooks was showcased as a concrete example of how Jupyter Notebook extensions can improve workflows for a specific discipline, in this case bioinformatics research (particularly gene expression data). Thorin originally worked on GenePattern, a tool for reproducible bioinformatics research, starting in 2004; by combining that work with Jupyter, the team has created an open platform containing a language-agnostic repository of analytical tools, along with tools to create custom modules and pipelines that can be integrated with Jupyter notebooks. The tool uses a web server architecture and includes REST APIs for Python, R, MATLAB, and Java. The GenePattern Notebook extension uses the Jupyter extension framework to encapsulate a full research narrative. Thorin noted that JupyterHub integration is on its way and that the GenePattern Notebook extension is available on PyPI (the Python Package Index) and Docker Hub. They are trying to make it useful for the whole research team, coders and non-coders alike.

Thirty-minute talks on a variety of topics began about halfway through the day. The first group to present focused on signal processing for wearable technologies using Dockerized notebook containers and Amazon Web Services. The presentation was given by Fasas Sadek, Yasha Iravantchi, Diana Zhang, and Demba Ba, and showed how these tools helped students in educational settings make full use of data from fitness trackers. You may purchase a tracker thinking you will have access to all the data you are gathering, but in reality you only have access to pre-processed data in proprietary formats. The signal processing group’s slides are available here.

Jeremy Freeman of Binder gave the next thirty-minute talk. He explained that with Binder, a GitHub repo can be deployed on demand in a Docker container running Jupyter notebooks. This combination of technologies could radically change how publishing is done. He shared an anonymous quote that illustrated just this point: “repos + notebooks = torpedo aimed at the academic publishing industry”. myBinder.org provides just that combination of tools to create an executable, reproducible environment at the touch of a button. Check out the Binder GitHub repo and look forward to the future of the project; some of their next steps include rewriting the project in Node.js.
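As a rough illustration (this sketch is mine, not from Jeremy’s talk): the “on demand” part is literally just a URL built from the repo’s coordinates, which is what the badge in a Binder-ready README points at. The exact path scheme is an assumption based on the 2016-era mybinder.org, and the user and repo names below are hypothetical.

```python
# Sketch: a Binder launch link is built directly from a GitHub repo's
# coordinates -- clicking it asks mybinder.org to spin up a Docker
# container with the repo's notebooks and dependencies inside.
def binder_url(user, repo):
    # Assumed 2016-era URL scheme: /repo/<user>/<repo>
    return "http://mybinder.org/repo/{user}/{repo}".format(user=user, repo=repo)

url = binder_url("some-user", "my-analysis")  # hypothetical repo
```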

Safia Abdalla took the stage after Jeremy with a presentation called “Kooking with Kernels”; the slides for her talk are here. Safia walked through what kernels are and why they’re important, using gifs along the way to illustrate her points. She then described the Jupyter messaging protocol and why it matters: separating the front end from the back end is key. She also highlighted resources like Try Jupyter for people who have never used the notebook before, and showed that you can write a Jupyter kernel in Python for any language that has bindings to Python. Safia also noted that she runs mentorship sessions for women in computing and is happy to support other women in the field.
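To make the protocol a bit more concrete, here is a minimal sketch of what a single message from a front end to a kernel looks like: a few nested dicts with a header, parent_header, metadata, and content. This is not from Safia’s slides; the field names follow version 5 of the Jupyter messaging spec, and the session and user values are made up.

```python
import uuid
from datetime import datetime, timezone

# Sketch of one Jupyter protocol message: an execute_request asking a
# kernel to run a snippet of code. Field names follow protocol v5.
def make_execute_request(code):
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,       # unique id for this message
            "session": "example-session",     # ties messages to one client
            "username": "example-user",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",    # tells the kernel what is wanted
            "version": "5.0",                 # protocol version
        },
        "parent_header": {},                  # empty: this is not a reply
        "metadata": {},
        "content": {
            "code": code,                     # the source to execute
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": True,
        },
    }

msg = make_execute_request("1 + 1")
```

Because the front end only ever sees JSON messages like this one, it never needs to know what language the kernel on the other end speaks, which is exactly why the front end/back end separation matters.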

Elaine Angelino and Sam Lau were up next. Elaine and Sam talked about a program through UC Berkeley called DS8. You can read more about it here. The DS8 course integrates JupyterHub and Jupyter Notebooks to help students tell a story about data, or as Elaine put it, build “an explicit computational narrative”. The tools DS8 is using are great for distributing course content and lowering the barrier to entry for students who may not have access to their own computers. JupyterHub and the notebooks provide a much-needed browser-based environment where students can learn to manipulate tabular data and start doing analytics and modeling without prior experience in the field. The course website for spring 2016 can be found here; it makes use of Binder infrastructure and is distributed across many GitHub repos using data8 infrastructure. The course textbook uses GitBook, and content (labs, homework, projects, etc.) is distributed using ds8-interact.

The final session of the day was a panel on Jupyter in education. The panelists were Paco Nathan, Rahul Dave, Elaine Angelino, Demba Ba, Eni Mustafaraj, and Allen Downey. They were asked a number of questions, but a few stood out, particularly “What’s the most compelling thing you saw today?” In response, the panelists described how wonderful the session on signal processing in an environment like Binder was, and how the tools presented seemed to lower barriers for students. Panelists were also excited about JupyterHub, dat-data, and teaching students to tell data stories using Jupyter notebooks. They reflected on feedback they have received from students using the notebooks, with a few notable sentiments: “Notebooks changed the pedagogy,” and one student shared that he/she “doesn’t know what we would be doing without them” in class. The panel seemed to agree that the notebook is a tool for students to record the process of building a script, while scripts themselves make nicer deliverables. Notebooks are great for beginners and people looking to experiment; software engineers will probably not use notebooks often, but that’s not really what they’re for.

By Daina Bouquin