Workshop in Biostatistics

Medical School Office Building (MSOB)
Rm x303

DATE: March 1, 2018
TIME: 1:30 - 2:50 pm
TITLE: Open science and reproducible research on Jupyter
Fernando Perez
Assistant Professor in Statistics, UC Berkeley
Faculty Scientist, Department of Data Science and Technology, Lawrence Berkeley National Laboratory



Project Jupyter, which evolved from the IPython environment, provides a platform for interactive computing that is widely used today in research, education, journalism and industry. The core premise of the Jupyter architecture is to design tools around the experience of interactive computing. It provides an environment, protocol, file format and libraries optimized for the computational process when there is a human in the loop, iterating live on ideas and data with the computer's assistance.

From protocols that enable multiple programming languages to coexist in the same framework, to the container architecture of the Binder project for sharing live computational environments, Jupyter's tools support open science and reproducible research. I will focus on these areas of the project and discuss the challenges that remain ahead.

Suggested readings:

On reproducible research, these two overview papers:

  1. Developing open source scientific practice:
  2. Jupyter Notebooks—a publishing format for reproducible computational workflows:

On a high-visibility example of the real-world impact of reproducibility issues, the Reinhart-Rogoff case, these three blog posts: