Bachelor of Science, University of Minnesota Twin Cities, Biology (2007)
Doctor of Philosophy, Stanford University, Genetics (2013)
Robert West, Postdoctoral Faculty Sponsor
In embryonic stem cells, extracellular signals are required to derepress developmental promoters to drive lineage specification, but the proteins involved in connecting extrinsic cues to relaxation of chromatin remain unknown. We demonstrate that the helix-loop-helix (HLH) protein, HEB, directly associates with the Polycomb repressive complex 2 (PRC2) at a subset of developmental promoters, including at genes involved in mesoderm and endoderm specification and at the Hox and Fox gene families. While we show that depletion of HEB does not affect mouse ESCs, it does cause premature differentiation after exposure to Activin. Further, we find that HEB deposition at developmental promoters is dependent upon PRC2 and independent of Nodal, whereas HEB association with SMAD2/3 elements is dependent on Nodal, but independent of PRC2. We suggest that HEB is a fundamental link between Nodal signalling, the derepression of a specific class of poised promoters during differentiation, and lineage specification in mouse ESCs.
View details for DOI 10.1038/ncomms7546
View details for PubMedID 25775035
High-occupancy target (HOT) regions are compact genome loci occupied by many different transcription factors (TFs). HOT regions were initially defined in invertebrate model organisms, and we here show that they are a ubiquitous feature of the human gene-regulation landscape. We identified HOT regions by a comprehensive analysis of ChIP-seq data from 96 DNA-associated proteins in 5 human cell lines. Most HOT regions co-localize with RNA polymerase II binding sites, but many are not near the promoters of annotated genes. At HOT promoters, TF occupancy is strongly predictive of transcription preinitiation complex recruitment and moderately predictive of initiating Pol II recruitment, but only weakly predictive of elongating Pol II and RNA transcript abundance. TF occupancy varies quantitatively within human HOT regions; we used this variation to discover novel associations between TFs. The sequence motif associated with any given TF's direct DNA binding is somewhat predictive of its empirical occupancy, but a great deal of occupancy occurs at sites without the TF's motif, implying indirect recruitment by another TF whose motif is present. Mammalian HOT regions are regulatory hubs that integrate the signals from diverse regulatory pathways to quantitatively tune the promoter for RNA polymerase II recruitment.
View details for DOI 10.1186/1471-2164-14-720
View details for Web of Science ID 000328633100002
View details for PubMedID 24138567
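The definition of a HOT region given in the abstract above (a compact locus occupied by many different TFs) can be illustrated with a small sketch. This is not the published pipeline, whose details the abstract does not give. It simply assumes ChIP-seq peaks are available as `(chrom, start, end, factor)` tuples, tiles the genome into fixed-size bins, and reports bins bound by at least a chosen number of distinct factors. The function name `find_hot_bins`, the bin size, and the `min_factors` cutoff are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def find_hot_bins(peaks, bin_size=500, min_factors=3):
    """Return bins occupied by at least `min_factors` distinct factors.

    `peaks` is an iterable of (chrom, start, end, factor) tuples with
    half-open coordinates [start, end). The bin size and cutoff are
    illustrative assumptions, not values taken from the paper.
    """
    factors_by_bin = defaultdict(set)
    for chrom, start, end, factor in peaks:
        first_bin = start // bin_size
        last_bin = (end - 1) // bin_size
        # Register the factor in every bin the peak overlaps.
        for bin_index in range(first_bin, last_bin + 1):
            factors_by_bin[(chrom, bin_index)].add(factor)

    # Keep only bins with enough distinct factors; a set ensures that
    # repeated peaks of the same factor are counted once.
    return {
        (chrom, bin_index * bin_size, (bin_index + 1) * bin_size): len(factors)
        for (chrom, bin_index), factors in factors_by_bin.items()
        if len(factors) >= min_factors
    }
```

Counting distinct factors per bin, rather than peaks, reflects the emphasis on "many different" TFs. A real analysis would also need to handle peak calling thresholds and cell-line differences, which this sketch leaves out.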
BACKGROUND: Molecular characterization of tumors has been critical for identifying important genes in cancer biology and for improving tumor classification and diagnosis. Long non-coding RNAs, as a new, relatively unstudied class of transcripts, provide a rich opportunity to identify both functional drivers and cancer-type-specific biomarkers. However, despite the potential importance of long non-coding RNAs to the cancer field, no comprehensive survey of long non-coding RNA expression across various cancers has been reported. RESULTS: We performed a sequencing-based transcriptional survey of both known long non-coding RNAs and novel intergenic transcripts across a panel of 64 archival tumor samples comprising 17 diagnostic subtypes of adenocarcinomas, squamous cell carcinomas and sarcomas. We identified hundreds of transcripts from among the known 1,065 long non-coding RNAs surveyed that showed variability in transcript levels between the tumor types and are therefore potential biomarker candidates. We discovered 1,071 novel intergenic transcribed regions and demonstrate that these show similar patterns of variability between tumor types. We found that many of these differentially expressed cancer transcripts are also expressed in normal tissues. One such novel transcript specifically expressed in breast tissue was further evaluated using RNA in situ hybridization on a panel of breast tumors. It was shown to correlate with low tumor grade and estrogen receptor expression, thereby representing a potentially important new breast cancer biomarker. CONCLUSIONS: This study provides the first large survey of long non-coding RNA expression within a panel of solid cancers and also identifies a number of novel transcribed regions differentially expressed across distinct cancer types that represent candidate biomarkers for future research.
View details for DOI 10.1186/gb-2012-13-8-r75
View details for PubMedID 22929540
Mutations at the APM1 and APM2 loci in the green alga Chlamydomonas reinhardtii confer resistance to phosphorothioamidate and dinitroaniline herbicides. Genetic interactions between apm1 and apm2 mutations suggest an interaction between the gene products. We identified the APM1 and APM2 genes using a map-based cloning strategy. Genomic DNA fragments containing only the DNJ1 gene encoding a type I Hsp40 protein rescue apm1 mutant phenotypes, conferring sensitivity to the herbicides and rescuing a temperature-sensitive growth defect. Lesions at five apm1 alleles include missense mutations and nucleotide insertions and deletions that result in altered proteins or very low levels of gene expression. The HSP70A gene, encoding a cytosolic Hsp70 protein known to interact with Hsp40 proteins, maps near the APM2 locus. Missense mutations found in three apm2 alleles predict altered Hsp70 proteins. Genomic fragments containing the HSP70A gene rescue apm2 mutant phenotypes. The results suggest that a client of the Hsp70-Hsp40 chaperone complex may function to increase microtubule dynamics in Chlamydomonas cells. Failure of the chaperone system to recognize or fold the client protein(s) results in increased microtubule stability and resistance to the microtubule-destabilizing effect of the herbicides. The lack of redundancy of genes encoding cytosolic Hsp70 and Hsp40 type I proteins in Chlamydomonas makes it a uniquely valuable system for genetic analysis of the function of the Hsp70 chaperone complex.
View details for DOI 10.1534/genetics.111.133587
View details for Web of Science ID 000298412100010
View details for PubMedID 21940683
Gene expression microarrays are the most widely used technique for genome-wide expression profiling. However, microarrays do not perform well on formalin fixed paraffin embedded tissue (FFPET). Consequently, microarrays cannot be effectively utilized to perform gene expression profiling on the vast majority of archival tumor samples. To address this limitation of gene expression microarrays, we designed a novel procedure (3'-end sequencing for expression quantification (3SEQ)) for gene expression profiling from FFPET using next-generation sequencing. We performed gene expression profiling by 3SEQ and microarray on both frozen tissue and FFPET from two soft tissue tumors (desmoid type fibromatosis (DTF) and solitary fibrous tumor (SFT)) (total n = 23 samples, which were each profiled by at least one of the four platform-tissue preparation combinations). Analysis of 3SEQ data revealed many genes differentially expressed between the tumor types (FDR<0.01) on both the frozen tissue (approximately 9.6K genes) and FFPET (approximately 8.1K genes). Analysis of microarray data from frozen tissue revealed fewer differentially expressed genes (approximately 4.64K), and analysis of microarray data on FFPET revealed very few (69) differentially expressed genes. Functional gene set analysis of 3SEQ data from both frozen tissue and FFPET identified biological pathways known to be important in DTF and SFT pathogenesis and suggested several additional candidate oncogenic pathways in these tumors. These findings demonstrate that 3SEQ is an effective technique for gene expression profiling from archival tumor samples and may facilitate significant advances in translational cancer research.
View details for DOI 10.1371/journal.pone.0008768
View details for Web of Science ID 000273778900012
View details for PubMedID 20098735
Large biological data sets, such as expression profiles, benefit from reduction of random noise. Principal component (PC) analysis has been used for this purpose, but it tends to remove small features as well as random noise. We interpreted the PCs as a mere signal-rich coordinate system and sorted the squared PC-coordinates of each row in descending order. The sorted squared PC-coordinates were compared with the distribution of the ordered squared random noise, and PC-coordinates for insignificant contributions were treated as random noise and nullified. The processed data were transformed back to the initial coordinates as noise-reduced data. To increase the sensitivity of signal capture and reduce the effects of stochastic noise, this procedure was applied to multiple small subsets of rows randomly sampled from a large data set, and the results corresponding to each row of the data set from multiple subsets were averaged. We call this procedure Row-specific, Sorted PRincipal component-guided Noise Reduction (RSPR-NR). Robust performance of RSPR-NR, measured by noise reduction and retention of small features, was demonstrated using simulated data sets. Furthermore, when applied to an actual expression profile data set, RSPR-NR preferentially increased the correlations between genes that share the same Gene Ontology terms, strongly suggesting reduction of random noise in the data set. RSPR-NR is a robust random noise reduction method that retains small features well. It should be useful in improving the quality of large biological data sets.
View details for DOI 10.1186/1471-2105-9-508
View details for Web of Science ID 000262159700001
View details for PubMedID 19040754
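The core steps of the noise-reduction procedure described in the abstract above can be sketched as follows. This is a simplified illustration and not the published RSPR-NR implementation. It assumes the noise is Gaussian with a known standard deviation (`noise_sd`), builds the null distribution of ordered squared noise by simulation, and uses a per-rank quantile as the significance cutoff. It also omits the random row-subset sampling and averaging step. The function name `pc_guided_denoise` and all parameters are hypothetical.

```python
import numpy as np


def pc_guided_denoise(data, noise_sd, quantile=0.95, n_null=2000, seed=0):
    """Row-wise nullification of insignificant PC coordinates (a sketch).

    Assumes Gaussian noise with known standard deviation `noise_sd`;
    the quantile cutoff and simulated null are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    col_means = data.mean(axis=0)
    centered = data - col_means

    # Rows of vt are the principal axes; scores are per-row PC coordinates.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    n_components = scores.shape[1]

    # Null distribution: squared noise values sorted in descending order,
    # summarized by one threshold per rank.
    null_sq = rng.normal(0.0, noise_sd, size=(n_null, n_components)) ** 2
    null_sorted = -np.sort(-null_sq, axis=1)
    thresholds = np.quantile(null_sorted, quantile, axis=0)

    # Sort each row's squared coordinates and compare rank by rank.
    squared = scores ** 2
    order = np.argsort(-squared, axis=1)
    sorted_sq = np.take_along_axis(squared, order, axis=1)
    keep_sorted = sorted_sq > thresholds

    # Map the keep/nullify decision back to the original PC positions.
    keep = np.zeros_like(keep_sorted)
    np.put_along_axis(keep, order, keep_sorted, axis=1)
    cleaned_scores = np.where(keep, scores, 0.0)

    # Transform back to the initial coordinates.
    return cleaned_scores @ vt + col_means
```

Because the decision is made separately for each row, a coordinate along a minor PC can survive in the rows where it is large, which is the property the abstract credits with retaining small features.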