Medical School Office Building (MSOB)
|DATE:||March 1, 2018|
|TIME:||1:30 - 2:50 pm|
|TITLE:||Open science and reproducible research on Jupyter|
Assistant Professor in Statistics, UC Berkeley
Faculty Scientist, Department of Data Science and Technology, Lawrence Berkeley National Laboratory
Project Jupyter, which evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing. It provides an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, iterating live with ideas and data, assisted by the computer.
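To make the idea of an open notebook file format concrete: a Jupyter notebook is stored on disk as a JSON document. The sketch below builds a minimal notebook in the version 4 layout using only Python's standard library. The helper name `make_notebook` and the cell contents are hypothetical, and in practice one would normally use the `nbformat` library rather than assembling the JSON by hand.

```python
import json


def make_notebook(cells):
    """Build a minimal nbformat-4 notebook dict from (cell_type, source) pairs.

    Hypothetical helper for illustration only; it hand-rolls the JSON layout
    and skips the schema validation that the nbformat library performs.
    """
    nb_cells = []
    for i, (cell_type, source) in enumerate(cells):
        cell = {
            "cell_type": cell_type,
            "id": f"cell-{i}",  # nbformat 4.5+ expects a unique id per cell
            "metadata": {},
            "source": source,
        }
        if cell_type == "code":
            # Code cells additionally record an execution counter and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []
        nb_cells.append(cell)
    return {
        "cells": nb_cells,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3"},
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }


nb = make_notebook([
    ("markdown", "# Analysis notes"),
    ("code", "x = 2 + 2\nprint(x)"),
])
# Serialize to the text that would be saved in an .ipynb file.
text = json.dumps(nb, indent=1)
```

Because the format is plain JSON, narrative, code and (once executed) outputs travel together in one file that other tools can read, diff, or re-execute, which is part of what makes notebooks usable as a vehicle for sharing reproducible work.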
From protocols that enable multiple programming languages to coexist in the same framework, to the container architecture of the Binder project for sharing live computational environments, Jupyter's tools support open science and reproducible research. I will focus on these areas of the project and discuss the challenges that remain ahead.
On reproducible research, see these two overview papers:
- Developing open source scientific practice: http://fperez.org/papers/millman-perez.pdf
- Jupyter Notebooks—a publishing format for reproducible computational workflows: http://fperez.org/papers/kluyver16-jupyter-nb-repro.pdf
For a high-visibility example of the real-world impact of reproducibility problems, the Reinhart-Rogoff case, see these three blog posts:
- Government debt, economic growth and a buggy Excel spreadsheet: the code behind the politics of fiscal austerity.
- Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems.
- More Bad Excel.