PhD, Stanford, Statistics (2002)
MS, Stanford, Statistics (2002)
BS, Caltech, Math and Computer Science (1997)
My research develops theory and methodology to perform statistical inference about the latent structure of complex systems. I am collaborating with individuals in the Cancer Center on data from phospho flow cytometry and protein arrays.
Variation in human skin and eye color is substantial and especially apparent in admixed populations, yet the underlying genetic architecture is poorly understood because most genome-wide studies are based on individuals of European ancestry. We study pigmentary variation in 699 individuals from Cape Verde, where extensive West African/European admixture has given rise to a broad range in trait values and genomic ancestry proportions. We develop and apply a new approach for measuring eye color, and identify two major loci (HERC2[OCA2] P = 2.3 × 10^-62, SLC24A5 P = 9.6 × 10^-9) that account for both blue versus brown eye color and varying intensities of brown eye color. We identify four major loci (SLC24A5 P = 5.4 × 10^-27, TYR P = 1.1 × 10^-9, APBA2[OCA2] P = 1.5 × 10^-8, SLC45A2 P = 6 × 10^-9) for skin color that together account for 35% of the total variance, but the genetic component with the largest effect (~44%) is average genomic ancestry. Our results suggest that adjacent cis-acting regulatory loci for OCA2 explain the relationship between skin and eye color, and point to an underlying genetic architecture in which several genes of moderate effect act together with many genes of small effect to explain ~70% of the estimated heritability.
View details for DOI 10.1371/journal.pgen.1003372
View details for Web of Science ID 000316866700048
View details for PubMedID 23555287
Blood lipid concentrations are heritable risk factors associated with atherosclerosis and cardiovascular diseases. Lipid traits exhibit considerable variation among populations of distinct ancestral origin as well as between individuals within a population. We performed association analyses to identify genetic loci influencing lipid concentrations in African American and Hispanic American women in the Women's Health Initiative SNP Health Association Resource. We validated one African-specific high-density lipoprotein cholesterol locus at CD36 as well as 14 known lipid loci that have been previously implicated in studies of European populations. Moreover, we demonstrate striking similarities in genetic architecture (loci influencing the trait, direction and magnitude of genetic effects, and proportions of phenotypic variation explained) of lipid traits across populations. In particular, we found that a disproportionate fraction of lipid variation in African Americans and Hispanic Americans can be attributed to genomic loci exhibiting statistical evidence of association in Europeans, even though the precise genes and variants remain unknown. At the same time, we found substantial allelic heterogeneity within shared loci, characterized both by population-specific rare variants and variants shared among multiple populations that occur at disparate frequencies. The allelic heterogeneity emphasizes the importance of including diverse populations in future genetic association studies of complex traits such as lipids; furthermore, the overlap in lipid loci across populations of diverse ancestral origin argues that additional knowledge can be gleaned from multiple populations.
View details for PubMedID 23726366
AR-V7, a ligand independent splice variant of androgen receptor, may support the growth of castration resistant prostate cancer and have prognostic value. Another variant, AR-V1, interferes with AR-V7 activity. We investigated whether AR-V7 or V1 expression would predict biochemical recurrence in men at indeterminate (about 50%) risk for progression following radical prostatectomy. AR-V7 and V1 transcripts in a mixed grade cohort of 53 men in whom cancer contained 30% to 70% Gleason grade 4/5 and in a grade 3 only cohort of 52 were measured using a branched chain DNA assay. Spearman rank correlations of the transcripts, and histomorphological and clinical variables were determined. AR-V7 and V1 levels were assessed as determinants of recurrence in the mixed grade cohort by logistic regression and survival analysis. The impact of TMPRSS2-ERG gene fusion on prognosis was also evaluated. Neither AR-V7 nor V1 levels in grade 3 or 4/5 cancer in the mixed grade cohort were associated with recurrence or time to recurrence. However, AR-V7 and V1 inversely correlated with serum prostate specific antigen and positively correlated with age. The AR-V1 level in grade 3 cancer in the grade 3 only cohort was higher than in grade 3 or grade 4/5 components of mixed grade cancer. TMPRSS2-ERG fusion was not associated with AR-V7, AR-V1 or recurrence but it was associated with the percent of grade 4/5 cancer. The AR-V1 or V7 transcript level does not predict recurrence in patients with high grade prostate cancer at indeterminate risk for progression. Grade 3 cancer in mixed grade tumors may differ from 100% grade 3 cancer, at least in AR-V1 expression.
View details for DOI 10.1016/j.juro.2012.08.014
View details for Web of Science ID 000311581400035
View details for PubMedID 23088973
To improve cancer therapy, it is critical to target metastasizing cells. Circulating tumor cells (CTCs) are rare cells found in the blood of patients with solid tumors and may play a key role in cancer dissemination. Uncovering CTC phenotypes offers a potential avenue to inform treatment. However, CTC transcriptional profiling is limited by leukocyte contamination; an approach to surmount this problem is single cell analysis. Here we demonstrate feasibility of performing high dimensional single CTC profiling, providing early insight into CTC heterogeneity and allowing comparisons to breast cancer cell lines widely used for drug discovery. We purified CTCs using the MagSweeper, an immunomagnetic enrichment device that isolates live tumor cells from unfractionated blood. CTCs that met stringent criteria for further analysis were obtained from 70% (14/20) of primary and 70% (21/30) of metastatic breast cancer patients; none were captured from patients with non-epithelial cancer (n = 20) or healthy subjects (n = 25). Microfluidic-based single cell transcriptional profiling of 87 cancer-associated and reference genes showed heterogeneity among individual CTCs, separating them into two major subgroups, based on 31 highly expressed genes. In contrast, single cells from seven breast cancer cell lines were tightly clustered together by sample ID and ER status. CTC profiles were distinct from those of cancer cell lines, questioning the suitability of such lines for drug discovery efforts for late stage cancer therapy. For the first time, we directly measured high dimensional gene expression in individual CTCs without the common practice of pooling such cells. Elevated transcript levels of genes associated with metastasis (NPTN, S100A4, S100A9) and with epithelial-mesenchymal transition (VIM, TGFβ1, ZEB2, FOXC1, CXCR4) were striking compared to cell lines. Our findings demonstrate that profiling CTCs on a cell-by-cell basis is possible and may facilitate the application of 'liquid biopsies' to better model drug discovery.
View details for DOI 10.1371/journal.pone.0033788
View details for Web of Science ID 000305335000005
View details for PubMedID 22586443
For most of the world, human genome structure at a population level is shaped by interplay between ancient geographic isolation and more recent demographic shifts, factors that are captured by the concepts of biogeographic ancestry and admixture, respectively. The ancestry of non-admixed individuals can often be traced to a specific population in a precise region, but current approaches for studying admixed individuals generally yield coarse information in which genome ancestry proportions are identified according to continent of origin. Here we introduce a new analytic strategy for this problem that allows fine-grained characterization of admixed individuals with respect to both geographic and genomic coordinates. Ancestry segments from different continents, identified with a probabilistic model, are used to construct and study "virtual genomes" of admixed individuals. We apply this approach to a cohort of 492 parent-offspring trios from Mexico City. The relative contributions from the three continental-level ancestral populations (Africa, Europe, and America) vary substantially between individuals, and the distribution of haplotype block length suggests an admixing time of 10-15 generations. The European and Indigenous American virtual genomes of each Mexican individual can be traced to precise regions within each continent, and they reveal a gradient of Amerindian ancestry between indigenous people of southwestern Mexico and Mayans of the Yucatan Peninsula. This contrasts sharply with the African roots of African Americans, which have been characterized by a uniform mixing of multiple West African populations. We also use the virtual European and Indigenous American genomes to search for the signatures of selection in the ancestral populations, and we identify previously known targets of selection in other populations, as well as new candidate loci. The ability to infer precise ancestral components of admixed genomes will facilitate studies of disease-related phenotypes and will allow new insight into the adaptive and demographic history of indigenous people.
View details for DOI 10.1371/journal.pgen.1002410
View details for Web of Science ID 000299167900027
View details for PubMedID 22194699
Identifying the targets of immune response after allogeneic hematopoietic cell transplantation (HCT) promises to provide relevant immune therapy candidate proteins. We used protein microarrays to serologically identify nucleolar and spindle-associated protein 1 (NuSAP1) and chromatin assembly factor 1, subunit B (p60; CHAF1b) as targets of new antibody responses that developed after allogeneic HCT. Western blots and enzyme-linked immunosorbent assays (ELISA) validated their post-HCT recognition and enabled ELISA testing of 120 other patients with various malignancies who underwent allo-HCT. CHAF1b-specific antibodies were predominantly detected in patients with acute myeloid leukemia (AML), whereas NuSAP1-specific antibodies were exclusively detected in patients with AML 1 year after transplantation (P < .001). Complete genomic exon sequencing failed to identify a nonsynonymous single nucleotide polymorphism (SNP) for NuSAP1 and CHAF1b between the donor and recipient cells. Expression profiles and reverse transcriptase-polymerase chain reaction (RT-PCR) showed NuSAP1 was predominantly expressed in the bone marrow CD34+CD90+ hematopoietic stem cells, leukemic cell lines, and B lymphoblasts compared with other tissues or cells. Thus, NuSAP1 is recognized as an immunogenic antigen in 65% of patients with AML following allogeneic HCT, which suggests a tumor antigen role.
View details for DOI 10.1182/blood-2009-03-211375
View details for Web of Science ID 000275751300033
View details for PubMedID 20053754
Gleason grade 4/5 prostate cancer is a determinant for recurrence following radical prostatectomy. Monoamine oxidase-A is overexpressed in grade 4/5 compared to grade 3 cancer. Monoamine oxidase-A is also expressed by normal basal cells and in vitro studies suggest that its function is to repress secretory differentiation. Therefore, monoamine oxidase-A in grade 4/5 cancer might reflect dedifferentiation to a basal cell-like phenotype. We investigated whether monoamine oxidase-A expression correlates with another basal cell protein, CD44, in high grade cancer and whether either is associated with an aggressive phenotype. A total of 133 grade 4/5 archival cancers from a cohort previously used to evaluate the prognostic significance of histomorphological variables were scored for monoamine oxidase-A and CD44 immunohistochemical labeling. Spearman rank correlations of the proteins, and histomorphological and clinical variables were determined. The univariate and multivariate value of each variable as a determinant of biochemical recurrence was assessed by logistic regression. Monoamine oxidase-A expression correlated with CD44. Neither was prognostic for biochemical recurrence. However, monoamine oxidase-A expression positively correlated with preoperative serum prostate specific antigen and the percent of grade 4/5 cancer. Concurrent expression of monoamine oxidase-A and CD44 suggests that grade 4/5 cancer may be basal cell-like in nature, despite the absence of other classic basal cell biomarkers such as cytokeratins 5 and 14, and p63. The correlation of monoamine oxidase-A expression with prostate specific antigen and the percent of grade 4/5 cancer suggests that monoamine oxidase-A may contribute to growth of high grade cancer and that antidepressant drugs that target monoamine oxidase-A may have applications in treating prostate cancer.
View details for DOI 10.1016/j.juro.2008.07.019
View details for Web of Science ID 000260102000083
View details for PubMedID 18804811
Progress in understanding the molecular pathogenesis of human myeloproliferative disorders (MPDs) has led to guidelines incorporating genetic assays with histopathology during diagnosis. Advances in flow cytometry have made it possible to simultaneously measure cell type and signaling abnormalities arising as a consequence of genetic pathologies. Using flow cytometry, we observed a specific evoked STAT5 signaling signature in a subset of samples from patients suspected of having juvenile myelomonocytic leukemia (JMML), an aggressive MPD with a challenging clinical presentation during active disease. This signature was a specific feature involving JAK-STAT signaling, suggesting a critical role of this pathway in the biological mechanism of this disorder and indicating potential targets for future therapies.
View details for DOI 10.1016/j.ccr.2008.08.014
View details for Web of Science ID 000259896500008
View details for PubMedID 18835035
Human minor histocompatibility antigens (mHA) and clinically relevant immune responses to them have not been well defined in organ transplantation. We hypothesized that women with male kidney transplants would develop antibodies against H-Y, the mHA encoded on the Y-chromosome, in association with graft rejection. We tested sera from 118 consecutive transplant recipients with kidney biopsies. Antibodies that specifically recognized the recombinant H-Y antigens RPS4Y1 or DDX3Y were detected by IgG enzyme-linked immunosorbent assay and western blotting. Immunogenic epitopes were further identified using overlapping H-Y antigen peptides for both the H-Y proteins. In the 26 female recipients of male kidneys, H-Y antibody development posttransplant (1) was more frequent (46%) than in other gender combinations (P<0.001), (2) showed strong correlation with acute rejection (P=0.00048), (3) correlated with plasma cell infiltrates in biopsied kidneys (P=0.04), and (4) did not correlate with C4d deposition or donor-specific anti-human leukocyte antigen (HLA) antibodies. Of the two H-Y antigens, RPS4Y1 was more frequently recognized (P=0.005). This first demonstration of a strong association between H-Y antibody development and acute rejection in kidney transplant recipients shows that in solid organ allografts, humoral immune responses against well defined mHA have clear clinical correlates, can be easily monitored, and warrant study for possible effects on long-term graft function.
View details for DOI 10.1097/TP.0b013e31817352b9
View details for Web of Science ID 000257790400014
View details for PubMedID 18622281
Estimation of the allele frequency at genetic markers is a key ingredient in biological and biomedical research, such as studies of human genetic variation or of the genetic etiology of heritable traits. As genetic data become increasingly available, investigators face a dilemma: when should data from other studies and population subgroups be pooled with the primary data? Pooling additional samples will generally reduce the variance of the frequency estimates; however, used inappropriately, pooled estimates can be severely biased due to population stratification. Because of this potential bias, most investigators avoid pooling, even for samples with the same ethnic background and residing on the same continent. Here, we propose an empirical Bayes approach for estimating allele frequencies of single nucleotide polymorphisms. This procedure adaptively incorporates genotypes from related samples, so that more similar samples have a greater influence on the estimates. In every example we have considered, our estimator achieves a mean squared error (MSE) that is smaller than that of either pooling or not pooling, and sometimes substantially improves over both extremes. The bias introduced is small, as is shown by a simulation study that is carefully matched to a real data example. Our method is particularly useful when small groups of individuals are genotyped at a large number of markers, a situation we are likely to encounter in a genome-wide association study.
View details for DOI 10.1214/07-AOAS121
View details for Web of Science ID 000261057600010
View details for PubMedID 21451739
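The adaptive pooling idea in the allele-frequency abstract above can be illustrated with a small sketch. This is not the published estimator: it is a generic method-of-moments shrinkage rule, assumed here for illustration, that weights the primary and auxiliary sample frequencies by their estimated precision. The function name `shrink_allele_freqs` and its arguments are hypothetical.

```python
def shrink_allele_freqs(primary_counts, primary_n, aux_counts, aux_n):
    """Shrink primary-sample allele frequencies toward an auxiliary sample.

    primary_counts[i] / aux_counts[i]: copies of the reference allele seen at
    marker i among primary_n / aux_n sampled chromosomes.
    Returns (estimates, tau2), where tau2 is the estimated between-group
    variance in allele frequency (an illustrative similarity measure).
    """
    p = [c / primary_n for c in primary_counts]
    q = [c / aux_n for c in aux_counts]
    # Binomial sampling variance of each raw frequency estimate.
    var_p = [x * (1 - x) / primary_n for x in p]
    var_q = [x * (1 - x) / aux_n for x in q]

    # Method of moments: E[(p - q)^2] = tau2 + var_p + var_q, averaged over
    # markers and truncated at zero.
    m = len(p)
    tau2 = max(0.0, sum((a - b) ** 2 - va - vb
                        for a, b, va, vb in zip(p, q, var_p, var_q)) / m)

    estimates = []
    for a, b, va, vb in zip(p, q, var_p, var_q):
        # The auxiliary frequency estimates the primary one with error
        # variance tau2 + vb, so weight each source by its precision.
        denom = tau2 + vb + va
        w = (tau2 + vb) / denom if denom > 0 else 1.0
        estimates.append(w * a + (1 - w) * b)
    return estimates, tau2


if __name__ == "__main__":
    est, tau2 = shrink_allele_freqs([4, 12], 20, [50, 100], 200)
    print(est, tau2)
```

When the two groups look alike across markers, `tau2` is near zero and the estimate leans on the larger auxiliary sample; when they differ, the weight on the primary sample approaches one, which recovers the no-pooling estimate.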
Integrated liquid-chromatography mass-spectrometry (LC-MS) is becoming a widely used approach for quantifying the protein composition of complex samples. The output of the LC-MS system measures the intensity of a peptide with a specific mass-charge ratio and retention time. In the last few years, this technology has been used to compare complex biological samples across multiple conditions. One challenge for comparative proteomic profiling with LC-MS is to match corresponding peptide features from different experiments. In this paper, we propose a new method, Peptide Element Alignment (PETAL), that uses raw spectrum data and detected peaks to simultaneously align features from multiple LC-MS experiments. PETAL creates spectrum elements, each of which represents the mass spectrum of a single peptide in a single scan. Peptides detected in different LC-MS data are aligned if they can be represented by the same elements. By considering each peptide separately, PETAL enjoys greater flexibility than time warping methods. While most existing methods process multiple data sets by sequentially aligning each data set to an arbitrarily chosen template data set, PETAL treats all experiments symmetrically and can analyze all experiments simultaneously. We illustrate the performance of PETAL on example data sets.
View details for DOI 10.1093/biostatistics/kxl015
View details for Web of Science ID 000245512000015
View details for PubMedID 16880200
While high-throughput genotyping technologies are becoming readily available, the merit of using these technologies to perform genome-wide association studies has not been established. One major concern is that for studies of complex diseases and traits, the whole-genome approach requires such large sample sizes that both recruitment and genotyping pose considerable challenges. Here we propose a novel statistical method that boosts the effective sample size by combining data obtained from several studies. Specifically, we consider a situation in which various studies have genotyped non-overlapping subjects at largely non-overlapping sets of markers. Our approach, which exploits the local linkage disequilibrium structure without assuming an explicit population model, opens up the possibility of improving statistical power by incorporating existing data into future association studies.
View details for PubMedID 18466508
Isotopic labeling of cysteine residues with acrylamide was previously utilized for relative quantitation of proteins by MALDI-TOF. Here, we explored and compared the application of deuterated and 13C isotopes of acrylamide for quantitative proteomic analysis using LC-MS/MS and high-resolution FTICR mass spectrometry. The method was applied to human serum samples that were immunodepleted of abundant proteins. Our results show reliable quantitation of proteins across an abundance range that spans 5 orders of magnitude based on ion intensities and known protein concentration in plasma. The 13C isotope of acrylamide had a slight advantage over deuterated acrylamide, because of shifts in elution of deuterated acrylamide relative to its corresponding nondeuterated compound by reversed-phase chromatography. Overall, the use of acrylamide for differentially labeling intact proteins in complex mixtures, in combination with LC-MS/MS, provides a robust method for quantitative analysis of complex proteomes.
View details for DOI 10.1021/pr060102+
View details for Web of Science ID 000239493700022
View details for PubMedID 16889424
Comparing two or more complex protein mixtures using liquid chromatography mass spectrometry (LC-MS) requires multiple analysis steps to locate and quantitate natural peptides within a single experiment and to align and normalize findings across multiple experiments. We describe msInspect, an open-source application comprising algorithms and visualization tools for the analysis of multiple LC-MS experimental measurements. The platform integrates novel algorithms for detecting signatures of natural peptides within a single LC-MS measurement and combines multiple experimental measurements into a peptide array, which may then be mined using analysis tools traditionally applied to genomic array analysis. The platform supports quantitation by both label-free and isotopic labeling approaches. The software implementation has been designed so that many key components may be easily replaced, making it useful as a workbench for integrating other novel algorithms developed by a growing research community. The msInspect software is distributed freely under an Apache 2.0 license. The software as well as a Zip file with all peptide feature files and scripts needed to generate the tables and figures in this article are available at http://proteomics.fhcrc.org/.
View details for DOI 10.1093/bioinformatics/btl276
View details for Web of Science ID 000239899700013
View details for PubMedID 16766559
A chromosome in an individual of recently admixed ancestry resembles a mosaic of chromosomal segments, or ancestry blocks, each derived from a particular ancestral population. We consider the problem of inferring ancestry along the chromosomes in an admixed individual and thereby delineating the ancestry blocks. Using a simple population model, we infer gene-flow history in each individual. Compared with existing methods, which are based on a hidden Markov model, the Markov-hidden Markov model (MHMM) we propose has the advantage of accounting for the background linkage disequilibrium (LD) that exists in ancestral populations. When there are more than two ancestral groups, we allow each ancestral population to admix at a different time in history. We use simulations to illustrate the accuracy of the inferred ancestry as well as the importance of modeling the background LD; not accounting for background LD between markers may mislead us to false inferences about mixed ancestry in an indigenous population. The MHMM makes it possible to identify genomic blocks of a particular ancestry by use of any high-density single-nucleotide-polymorphism panel. One application of our method is to perform admixture mapping without genotyping special ancestry-informative-marker panels.
View details for Web of Science ID 000238341200001
View details for PubMedID 16773560
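To make the ancestry-block idea in the abstract above concrete, here is a minimal sketch of the plain hidden Markov baseline that the abstract contrasts with the MHMM. It is not the MHMM itself: it treats markers as independent given ancestry and so ignores background LD. The function name `viterbi_ancestry`, the uniform switch model, and the toy frequencies are assumptions made for illustration.

```python
import math


def viterbi_ancestry(alleles, freqs, switch_prob, prior):
    """Most likely ancestry label at each marker along one haplotype.

    alleles: list of 0/1 alleles, one per marker.
    freqs[k][i]: frequency of allele 1 at marker i in ancestral population k
                 (must lie strictly between 0 and 1).
    switch_prob: assumed probability that ancestry changes between adjacent
                 markers; a change picks one of the other populations
                 uniformly (requires at least two populations).
    prior[k]: assumed genome-wide ancestry proportion of population k.
    """
    K = len(freqs)

    def emit(k, i):
        # Log-probability of the observed allele given ancestry k.
        f = freqs[k][i]
        return math.log(f if alleles[i] == 1 else 1.0 - f)

    log_stay = math.log(1.0 - switch_prob)
    log_move = math.log(switch_prob / (K - 1))

    score = [math.log(prior[k]) + emit(k, 0) for k in range(K)]
    backpointers = []
    for i in range(1, len(alleles)):
        new_score, ptr = [], []
        for k in range(K):
            # Best previous state leading into state k at marker i.
            cands = [score[j] + (log_stay if j == k else log_move)
                     for j in range(K)]
            best = max(range(K), key=lambda j: cands[j])
            new_score.append(cands[best] + emit(k, i))
            ptr.append(best)
        score = new_score
        backpointers.append(ptr)

    # Trace back from the best final state to recover the ancestry blocks.
    state = max(range(K), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    freqs = [[0.9] * 8, [0.1] * 8]  # toy allele frequencies
    print(viterbi_ancestry([1, 1, 1, 1, 0, 0, 0, 0], freqs, 0.1, [0.5, 0.5]))
```

Runs of identical labels in the returned path correspond to ancestry blocks. Because this baseline ignores LD within each ancestral population, it can call spurious switches on dense marker panels, which is the failure mode the abstract says the MHMM is designed to address.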
View details for Web of Science ID 000182454900003