Innovative Method to Better Understand How Protein Isoforms are Disease-associated

by Amanda Chase, PhD
July 27, 2021

Proteins are the “workhorses” of the body; they are necessary for the structure, function, and regulation of the body’s tissues and organs, and are involved in all body functions, both healthy and diseased. Each protein has unique functions, and those are often related to the protein's shape, which, in turn, depends on the order of its amino acids. The process of creating a protein involves several steps and starts with DNA, the instructions for how to make protein. One can imagine DNA as an instruction manual written in a foreign language that must be translated to create the final product. The information in DNA is first copied, without yet being translated, into messenger RNA (mRNA), which acts as the messenger that brings the information to the translator.

However, extra pieces of information must be removed to create mature mRNA that can be used for translating the instructions into protein. In an mRNA, this extra information is referred to as introns. After intron removal, the remaining pieces of information (exons) are joined together to become the code that is translated into protein. The introns can be removed, and the exons joined, in different ways, leading to the creation of different proteins (isoforms) from the same gene. This process is referred to as alternative splicing. Alternative splicing lets a single gene give rise to more than one protein, and splicing patterns vary across tissue types, populations, and individuals. However, dysregulation of alternative splicing is known to be associated with disease, including cardiovascular disease. For example, mutations in a heart-specific alternative splicing regulator, RBM20, lead to dilated cardiomyopathy (DCM), a leading cause of heart failure.
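
To make the idea of alternative splicing concrete, here is a minimal toy sketch in Python. It assumes a made-up gene with four exons (E1 to E4), of which the first and last are always kept and the middle two may be skipped; the exon names and the function `splice_isoforms` are hypothetical illustrations, not part of the published work, and real splicing involves many more event types than exon skipping.

```python
from itertools import combinations


def splice_isoforms(exons, constitutive):
    """Enumerate toy isoforms by exon skipping.

    Every isoform keeps all constitutive exons plus any subset of the
    remaining (optional) exons, always in the original genomic order.
    """
    optional = [e for e in exons if e not in constitutive]
    isoforms = []
    for k in range(len(optional) + 1):
        for kept in combinations(optional, k):
            keep = set(constitutive) | set(kept)
            # Preserve the original exon order when joining.
            isoforms.append(tuple(e for e in exons if e in keep))
    return isoforms


# Hypothetical pre-mRNA with four exons; E2 and E3 may be skipped.
pre_mrna_exons = ["E1", "E2", "E3", "E4"]
isoforms = splice_isoforms(pre_mrna_exons, constitutive={"E1", "E4"})

for iso in isoforms:
    print("-".join(iso))
```

Even this tiny example yields several distinct transcripts from one gene, which hints at why the number of possible isoforms grows quickly for genes with many exons.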

Understanding how alternative splicing may be dysregulated during disease would be an important step in finding the molecular mechanisms behind diseases in which splicing is implicated. Long-read sequencing technologies allow for more complete analysis of alternative splicing than short-read sequencing technologies, although challenges remain. Current alternative splicing analysis methods do not provide sufficient understanding of the underlying disease mechanisms and are prone to influence by sequencing errors and artifacts. This need for improved methods of analyzing alternative splicing was addressed by a group of researchers led by co-first authors Chenchen Zhu, PhD, and Jingyan Wu, PhD, and senior author Lars Steinmetz, PhD, Professor of Genetics at Stanford and Group Leader at the European Molecular Biology Laboratory in Germany. Their work, recently published in Nature Communications, establishes a computational method to quantify full-length transcripts and identify disease-associated transcript isoforms.

The team leveraged the knowledge that a specific mutation in the heart-specific alternative splicing regulator RBM20 can lead to DCM to generate data, develop the method, and identify mis-spliced isoforms in RBM20 mutants. (For more on RBM20 mutations that cause DCM, see a previous paper published in Cell Reports and described here.) They generated a large dataset from human induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs), with and without the RBM20 mutations, and developed a workflow to measure and compare splicing isoforms more accurately and quantitatively. This method enables analysis of the impact of patient mutations via long-read sequencing. The transcriptome (the full range of mRNA) is complex, as the figure shows for one example region. The team created a computational method for transcript quantification, FulQuant, that allows de novo transcript annotation and better rules out artifacts and sequencing errors. Applying the method showed that the number and complexity of alternative splicing isoforms are not yet fully appreciated and that RBM20 mutations may generate novel transcripts that were previously missed. These findings could be used for future diagnostics and drug development.
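
The general idea of quantifying full-length isoforms and comparing them between conditions can be sketched in a few lines of Python. This is a toy illustration only, not FulQuant and not the authors' workflow: it assumes each long read has already been reduced to its chain of introns, the coordinates and read counts are invented, and the helper names (`filter_low_support`, `isoform_fractions`) and the `min_reads` threshold are hypothetical.

```python
from collections import Counter


def filter_low_support(reads, min_reads=2):
    """Drop reads whose intron chain is seen fewer than min_reads times.

    A crude stand-in for artifact filtering: chains supported by a single
    read are treated as possible sequencing errors.
    """
    counts = Counter(reads)
    return [r for r in reads if counts[r] >= min_reads]


def isoform_fractions(reads):
    """Return the fraction of reads assigned to each intron chain."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {chain: n / total for chain, n in counts.items()}


# Invented intron chains: each isoform is a tuple of (start, end) introns.
iso_a = ((100, 200), (300, 400))   # includes the middle exon
iso_b = ((100, 400),)              # skips the middle exon
iso_c = ((100, 210), (300, 400))   # one-off chain, likely an artifact

# Invented read sets for two conditions.
wild_type = [iso_a] * 8 + [iso_b] * 2
mutant = [iso_a] * 3 + [iso_b] * 6 + [iso_c] * 1

wt_fractions = isoform_fractions(filter_low_support(wild_type))
mut_fractions = isoform_fractions(filter_low_support(mutant))

for name, chain in [("A", iso_a), ("B", iso_b)]:
    shift = mut_fractions.get(chain, 0.0) - wt_fractions.get(chain, 0.0)
    print(f"isoform {name}: change in usage = {shift:+.2f}")
```

In this made-up data, the exon-skipping isoform makes up a larger share of reads in the mutant sample, which is the kind of isoform-level shift that full-length quantification is meant to expose.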

Figure. Example of genome-wide measurement of full-length splicing isoforms in iPSC-CMs with a region of chromosome 19, demonstrating the complexity. Known isoforms are shown in green; previously unidentified isoforms are in red.

Intriguingly, this new method also provides the opportunity to study differences in isoform expression that separate health from disease. Comparing iPSC-CMs with and without DCM-causing RBM20 mutations allowed the team to identify an example of a gene with differential expression of specific isoforms: IMMT. Knowing the full-length isoform is essential for understanding the functional product. Here, it can shed light on specific DCM-causing mutations and provide biological insights important for drug discovery.

Other authors include Han Sun and Francesca Briganti from Stanford; Benjamin Meder from the Institute for Cardiomyopathies Heidelberg, University of Heidelberg, and the German Center for Cardiovascular Research; and co-senior author Wu Wei from the CAS Key Laboratory of Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences.

Chenchen Zhu, PhD

Jingyan Wu, PhD

Lars Steinmetz, PhD