Meta-learning for biomedical data in oncology

Meta-learning can reduce the amount of data needed in the target domain. A typical meta-learning pipeline consists of two stages: (1) top panel: pre-training on abundantly available source data, potentially spanning several different tasks; (2) bottom panel: fine-tuning the model on scarce target data. Read more here.
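
The two stages can be illustrated with a toy sketch. It is a minimal example under stated assumptions, not the project's implementation: the Reptile-style first-order update, the one-dimensional linear model, the synthetic tasks, and all function names (`make_task`, `adapt`, `meta_train`) are hypothetical choices made only for illustration.

```python
import random

def mse(params, data):
    """Mean squared error of the model y = w * x + c."""
    w, c = params
    return sum((w * x + c - y) ** 2 for x, y in data) / len(data)

def mse_grad(params, data):
    """Gradient of the mean squared error with respect to (w, c)."""
    w, c = params
    n = len(data)
    gw = sum(2 * (w * x + c - y) * x for x, y in data) / n
    gc = sum(2 * (w * x + c - y) for x, y in data) / n
    return gw, gc

def adapt(params, data, lr=0.1, steps=5):
    """Inner loop: a few full-batch gradient steps on a single task."""
    w, c = params
    for _ in range(steps):
        gw, gc = mse_grad((w, c), data)
        w, c = w - lr * gw, c - lr * gc
    return w, c

def make_task(rng, slope, n):
    """Hypothetical toy task: noisy samples from y = slope * x + 1."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, slope * x + 1 + rng.gauss(0, 0.05)))
    return data

def meta_train(rng, meta_iters=200, meta_lr=0.2):
    """Stage 1: Reptile-style meta-training on abundant source tasks."""
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        task = make_task(rng, slope=rng.uniform(1.0, 3.0), n=50)
        phi = adapt(theta, task)
        # Move the shared initialization toward the task-adapted parameters.
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta

rng = random.Random(0)
theta = meta_train(rng)                  # stage 1: many source tasks
target = make_task(rng, slope=2.5, n=5)  # stage 2: only five target samples
tuned = adapt(theta, target)             # fine-tune from the meta-learned start
print("target MSE before fine-tuning:", mse(theta, target))
print("target MSE after fine-tuning: ", mse(tuned, target))
```

The design point is that stage 1 does not fit any single source task. It searches for an initialization from which a handful of gradient steps on a new task is enough, which is what makes the small target sample usable.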


RNA sequencing has emerged as a promising approach in cancer prognosis as it becomes easier and more affordable. However, building good predictive models remains challenging, especially when the sample size is limited, which is a common situation in biomedical studies. We developed a meta-learning framework based on neural networks for survival analysis in cancer research. We demonstrate that, compared to regular pre-training, meta-learning is a more efficient paradigm for learning from data that is relevant but not directly related to the problem of interest, thus alleviating the lack of a large sample size for the particular problem at hand. For predicting cancer survival outcomes, we show that, aided by efficient knowledge transfer, the meta-learning framework achieves performance similar to regular learning from a significantly larger number of samples. We also demonstrate that our model can prioritize genes based on their contribution to survival prediction and identify important pathways in cancer.
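
Neural networks for survival analysis are commonly trained with the negative Cox partial log-likelihood, which handles censored patients. The sketch below assumes that loss (the text above does not name the objective), ignores tied event times, and uses a hypothetical function name; `risks` stands for the network's output score per patient.

```python
import math

def cox_neg_log_partial_likelihood(times, events, risks):
    """Average negative Cox partial log-likelihood (ties not handled).

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted log-risk score per patient (higher = worse prognosis)
    """
    n = len(times)
    total = 0.0
    n_events = 0
    for i in range(n):
        if not events[i]:
            continue  # censored patients only appear in risk sets
        # Risk set: every patient still under observation at time times[i].
        denominator = sum(math.exp(risks[j]) for j in range(n)
                          if times[j] >= times[i])
        total += risks[i] - math.log(denominator)
        n_events += 1
    if n_events == 0:
        raise ValueError("at least one observed event is required")
    return -total / n_events

times = [1, 2, 3]
events = [1, 1, 1]
good = cox_neg_log_partial_likelihood(times, events, [2.0, 1.0, 0.0])
bad = cox_neg_log_partial_likelihood(times, events, [0.0, 1.0, 2.0])
print(good < bad)  # higher risk for earlier events gives the lower loss
```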


Comparison of meta-learning with regular transfer learning (pre-training) and combined learning. The figure shows results for lung cancer (top panel), glioma (middle panel) and head and neck cancer (bottom panel).


For all of the large target cancer sites, meta-learning achieves similar or better performance than regular pre-training or combined learning. Note that the variance of the meta-learning results across 25 random trials also tends to be the smallest, which is most visible for the lung cancer and glioma cohorts. In addition, each of these multi-layer neural networks performs better on average than a linear baseline model, which achieves a C-index of 0.61 for lung cancer, 0.77 for glioma, and 0.59 for head and neck cancer (HNSC).
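
The C-index (concordance index) reported above measures how often a model ranks pairs of patients correctly: 0.5 corresponds to random ranking and 1.0 to perfect ranking. The following is a minimal sketch of Harrell's definition; the function name and the toy data are illustrative and not taken from the project's code.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score per patient (higher = shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

times = [2, 4, 6, 8]
events = [1, 1, 0, 1]  # the third patient is censored
print(concordance_index(times, events, [0.9, 0.7, 0.4, 0.1]))  # 1.0, perfect ranking
print(concordance_index(times, events, [0.5, 0.5, 0.5, 0.5]))  # 0.5, uninformative
```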

The code used for this project is available here. You can find more about meta-learning here.



Qiu YL, Zheng H, Devos A, Selby H, Gevaert O. A meta-learning approach for genomic survival analysis. Nat Commun. 2020 Dec 11;11(1):6350. doi: 10.1038/s41467-020-20167-3. PMID: 33311484; PMCID: PMC7733508.