2020 Stanford scientists generate a global map of protein expression which helps explain the basis of many genetic diseases

Researchers demonstrate how understanding protein levels can provide insights into regulation, secretome, metabolism, and human disease.


Unraveling the genetic basis of many human diseases is a daunting task. However, knowing where the products of disease-related genes act can provide clues to how disease develops. At present, scientists examine RNA, the first product of a gene, to infer the tissues in which genes act. This method has a drawback: the level of protein, the active end product of a gene, often correlates poorly with the level of RNA. Generating a map of proteins may therefore be a more revealing approach to understanding the foundations of disease development.


In a paper published on Sept. 11 in Cell, researchers in the lab of Michael Snyder, Stanford W. Ascherman Professor and Chair of Genetics and Director of the Center for Genomics and Personalized Medicine at Stanford University School of Medicine, generated the most comprehensive protein map to date. The map shows where proteins are expressed throughout the human body, providing new insights into regulation, secretome, metabolism, and human diseases. The researchers measured relative protein levels from over 12,000 genes across 32 normal human tissues, then identified tissue-specific proteins and compared them to RNA data. Information from tissue-specific proteins could lead to novel explanations of disease phenotypes that could not have been deduced from RNA information alone.


“The tissue-specific distribution of proteins can provide an in-depth view of complex biological processes that require the interplay of multiple organs,” said lead author Lihua Jiang, a proteomics expert in the Snyder Laboratory who was responsible for the proteomics profiling of the project's samples. “Analysis of enzymes involved in amino acid metabolism revealed different roles of each organ as well as novel organs (heart, stomach, pancreas) that are important for metabolic control. We envision this kind of analysis can shed light on the understanding of many biological processes.”


Correlation between RNA and Protein Levels


Protein levels have been examined in previous studies. However, most of these studies focused on in-depth protein identification and analysis, and their protein measurements were either less accurate or less precise. Moreover, most samples in these studies lacked corresponding RNA information from the same tissue, making it difficult to compare RNA and protein levels. Although recent studies have greatly advanced tissue-protein identification, a broader study with accurate measurements of both protein and RNA levels within the same tissues is needed to understand how protein levels differ from RNA levels. Additionally, no previous study has used tissue-specific protein data to systematically examine human biological processes and diseases.


This Snyder Lab study offered a good opportunity to characterize the correlation between protein and RNA, because the data were generated from the same tissue specimens. For many genes, the Stanford team found that only the RNA, and not the corresponding protein, was present at a significant level; for other genes the opposite was true, with the protein detected but not the RNA. These results indicate that tissue-specific functions cannot be distinguished on the basis of RNA levels alone.

Insights into Disease and Drug Targets


“Lastly, for genetic diseases caused by mutations in protein-coding regions, the protein information across tissues can suggest the affected organs and explain specific disease symptoms that cannot be explained by genomic studies. As such, the protein data generated in this study is expected to provide valuable insights into human biology and disease,” said Dr. Michael Snyder.


Importantly, for many genes, enrichment occurs only at the protein level and not at the RNA level, so protein expression information may provide insights into underlying disease mechanisms that cannot be identified using RNA information alone. This demonstrates the importance of collecting protein expression information for understanding disease phenotypes. The researchers therefore systematically investigated the protein expression patterns of genetic diseases and found many cases where disease-associated proteins are present in tissues that manifest disease-related pathophysiology; many of these would not be evident from RNA analysis.


For example, Bardet-Biedl syndrome (BBS) is a genetic disorder that is caused by mutations in at least 14 different genes and affects many parts of the body. Some of the vision loss, polydactyly, obesity, and other abnormalities associated with BBS can be explained by specific gene mutations, but the causes of many are still largely unknown; tissue-specific protein expression information might explain some of the clinical symptoms. The researchers detected proteins from 11 of the 14 BBS genes, among which seven are enriched in the pituitary and five in the brain, muscle, heart, or liver. Abnormal proteins in the pituitary can broadly affect developmental processes and perhaps cause the obesity, diabetes, or hypogonadism observed in BBS patients. Abnormal proteins in the brain, muscle, heart, and liver might also contribute to defects such as intellectual disability, delayed motor skills, and conditions that involve the heart, liver, and digestive system.


Leigh syndrome is another genetic disease, associated with mutations in as many as 75 genes. Most of the affected proteins are involved in energy production in the mitochondria. Of the 75 proteins, 67 were observed, and 52 of those showed up in metabolically active tissues such as the heart, muscle, brain, and stomach. Some of these proteins were present in all affected tissues and some in only one or several tissues; their different distributions might cause different tissue-related clinical symptoms. For example, the characteristic progressive loss of mental and movement abilities in Leigh syndrome is most likely related to protein abnormalities in the brain and muscle. Some individuals develop hypertrophic cardiomyopathy, which could be caused by mutations in proteins present in the heart. The first signs of Leigh syndrome are vomiting, diarrhea, and difficulty swallowing, which could be explained by abnormalities in proteins found in the stomach.


Finally, the team identified 1,329 potential drug-targeted proteins, about half of which are FDA-approved drug targets. These drug-targeted proteins span 742 different tissues, and 368 are present in more than one tissue. When a drug-targeted protein is present outside the target organ, the drug may have unintended side effects in the off-target tissue. For example, valproic acid is an anticonvulsant drug that works by inhibiting a protein in the brain. Snyder’s team showed that this drug-targeted protein is also enriched in the liver and pancreas, suggesting a possible cause of the liver and pancreas toxicity reported as side effects.


“This study provides a valuable resource for us to understand human biology and diseases from proteins which are closer to phenotype. We envision some tissue specific proteins can be used as better biomarkers for diagnosis as well,” said Dr. Michael Snyder.