News

New repository for algorithms trained on genomic data

Cambridge, 28.05.2019

A new repository, Kipoi, provides centralised access to standardised machine learning models trained on genomic data. 


Researchers from the groups of Alliance members Anshul Kundaje (Stanford) and Oliver Stegle (EMBL), together with Julien Gagneur from the Technical University of Munich and researchers at the University of Cambridge, have developed a repository providing free, centralised access to machine learning models trained on genomic data. Their paper, published in Nature Biotechnology, demonstrates how the Kipoi repository will accelerate knowledge exchange in the genomics community by providing free access to more than 2000 trained predictive machine learning models.


A unique resource

“Traditionally there are two key things in bioinformatics and genome science,” says Oliver Stegle, Group Leader at EMBL and Division Head at the German Cancer Research Center. “The first is big datasets; institutions like EMBL-EBI have always shared data and made it available. The second is software, including the methods, approaches and algorithms we can apply to data in order to gain insights.”

Anshul Kundaje, Assistant Professor at Stanford University, agrees. “What we are doing with Kipoi is not just sharing data and software, but sharing models and algorithms that are already trained on the most relevant data. These models are ready to use, because all the cumbersome work of applying them to data has already been done.”


Acceleration through standardisation

Kipoi will help tackle the rapid growth of genome sequencing and molecular profiling datasets by accelerating community exchange and encouraging reuse of the predictive models held in the repository.

“This repository is really targeted for models that link genotype to phenotype,” says Žiga Avsec, PhD student at the Technical University of Munich. “Because we focus on models of this type we can standardise them and improve accessibility to an extent that wasn’t previously possible.”

“By simplifying access to these tools, other researchers can then grab those models and do things like transfer learning, build on top of existing models and enhance the capabilities of models in a fairly easy way,” says Roman Kreuzhuber, a former joint PhD student at EMBL-EBI and the University of Cambridge. “This standardisation and simplification could in turn contribute to the acceleration of research in the field. It should really help in the reuse of models and in the development of new ones.”

As well as providing access to these models, Kipoi also simplifies the process of feeding data into them. By standardising file formats and software frameworks, it reduces the installation and execution of a model to three simple commands, making the repository easy to use for those who are not experts in machine learning.

“Kipoi puts the latest deep learning models trained on massive genomics data at the fingertips of clinical researchers. This provides very exciting opportunities to understand individual genomes, for instance to pinpoint genetic variants causing diseases or to interpret mutations occurring in tumours,” says Julien Gagneur, Assistant Professor at the Technical University of Munich.

It is hoped that more researchers will contribute their models to the repository in future, making genomics analysis more accessible and ergonomic and ultimately making a wider range of predictive machine learning tools available to the genomics community.