Dr. Xia is a hybrid computer scientist, statistician and bioinformatician with a broad interest in genome sciences and medicine. His career goal is to advance the field of precision health, in particular precision cancer medicine, by integrating novel technology, big data, and intelligent models. Dr. Xia has worked extensively on microbiome and human multiomics data modeling and analysis. His publications have addressed many methodological needs in commensal bacteria, structural variation, and cancer genomics analysis. He and his work have been recognized by career awards from the American Cancer Society and the Innovation in Cancer Informatics Fund. Find out more about his research at

Academic Appointments

Honors & Awards

  • Postdoc Fellowship, American Cancer Society (2019)
  • Scholar-In-Training Award, American Association for Cancer Research (2018)
  • Travel Fellowship, Alzheimer’s Association International Conference (2016)
  • Reviewer's Choice Best Abstract, The American Society of Human Genetics Annual Meeting (2015)
  • Travel Fellowship, Bayer International Computational Biology Workshop (2014)
  • Dissertation Year Fellowship, University of Southern California (2012)
  • Merit Fellowship, University of Southern California (2006-2007)

Boards, Advisory Committees, Professional Organizations

  • Program Committee Co-chair, COMMAND workshop of the IEEE Bioinformatics and Biomedicine Conference 2015 (2015)

Professional Education

  • Doctor of Philosophy, University of Southern California, Los Angeles, US, Bioinformatics and Computational Biology (2013)
  • Master of Science, University of Southern California, Los Angeles, US, Statistics (2012)
  • Master of Science, University of Southern California, Los Angeles, US, Computer Science (2008)
  • Master of Science, Fudan University, Shanghai, China, Physics (Theoretical Physics) (2006)
  • Bachelor of Science, Fudan University, Shanghai, China, Electronics Engineering (2003)


All Publications

  • Explore mediated co-varying dynamics in microbial community using integrated local similarity and liquid association analysis. BMC genomics Ai, D., Li, X., Pan, H., Chen, J., Cram, J. A., Xia, L. C. 2019; 20 (Suppl 2): 185


    BACKGROUND: Discovering the key microbial species and environmental factors of a microbial community and characterizing their relationships with other members are critical to ecosystem studies. The microbial co-occurrence patterns across a variety of environmental settings have been extensively characterized. However, previous studies were limited by their restriction to pairwise relationships, while there was ample evidence of third-party mediated co-occurrence in microbial communities. METHODS: We implemented and applied the triplet-based liquid association analysis in combination with the local similarity analysis procedure to microbial ecology data. We developed an intuitive scheme to visualize those complex triplet associations along with pairwise correlations. Using a time series from the marine microbial ecosystem as an example, we identified pairs of operational taxonomic units (OTUs) where the strength of their associations appeared to relate to the values of a third "mediator" variable. These "mediator" variables appear to modulate the associations between pairs of bacteria. RESULTS: Using this analysis, we were able to assess an OTU's ability to regulate its functional partners in the community, which is typically not manifested in the pairwise correlation patterns. For example, we identified Flavobacteria as a multifaceted player in the marine microbial ecosystem, and its clades were involved in mediating other OTU pairs. By contrast, SAR11 clades were not active mediators of the community, despite being abundant and highly correlated with other OTUs. Our results suggested that Flavobacteria are more likely to respond to situations where particles and unusual sources of dissolved organic material are prevalent, such as after a plankton bloom. On the other hand, SAR11s are oligotrophic chemoheterotrophs with inflexible metabolisms, and their relationships with other organisms may be less governed by environmental or biological factors. CONCLUSIONS: By integrating liquid association with local similarity analysis to explore the mediated co-varying dynamics, we presented a novel perspective and a useful toolkit to analyze and interpret time series data from microbial communities. Our augmented association network analysis is thus more representative of the true underlying dynamic structure of the microbial community. The analytic software in this study was implemented as new functionalities of the ELSA (Extended local similarity analysis) tool, which is available for free download.

    View details for PubMedID 30967122

  • Identifying Gut Microbiota Associated With Colorectal Cancer Using a Zero-Inflated Lognormal Model. Frontiers in microbiology Ai, D., Pan, H., Li, X., Gao, Y., Liu, G., Xia, L. C. 2019; 10: 826


    Colorectal cancer (CRC) is the third most common cancer worldwide. Its incidence is still increasing, and the mortality rate is high. New therapeutic and prognostic strategies are urgently needed. It has become increasingly recognized that the gut microbiota composition differs significantly between healthy people and CRC patients. Thus, identifying the differences between the gut microbiota of healthy people and CRC patients is fundamental to understanding these microbes' functional roles in the development of CRC. We studied the microbial community structure of a CRC metagenomic dataset of 156 patients and healthy controls, and analyzed the diversity, differentially abundant bacteria, and co-occurrence networks. We applied a modified zero-inflated lognormal (ZIL) model for estimating the relative abundance. We found that the abundance of the genera Anaerostipes, Bilophila, Catenibacterium, Coprococcus, Desulfovibrio, Flavonifractor, Porphyromonas, Pseudoflavonifractor, and Weissella was significantly different between the healthy and CRC groups. We also found that bacteria such as Streptococcus, Parvimonas, Collinsella, and Citrobacter were uniquely co-occurring within the CRC patients. In addition, we found that the microbial diversity of healthy controls is significantly higher than that of the CRC patients, which indicated a significant negative correlation between gut microbiota diversity and the stage of CRC. Collectively, our results strengthened the view that individual microbes as well as the overall structure of gut microbiota were co-evolving with CRC.

    View details for PubMedID 31068913

    View details for PubMedCentralID PMC6491826

  • SVEngine: an efficient and versatile simulator of genome structural variations with features of cancer clonal evolution. GigaScience Xia, L. C., Ai, D., Lee, H., Andor, N., Li, C., Zhang, N. R., Ji, H. P. 2018


    Background: Simulating genome sequence data with variant features facilitates the development and benchmarking of structural variant analysis programs. However, there are only a few data simulators that provide structural variants in silico and even fewer that provide variants with different allelic fractions and haplotypes. Findings: We developed SVEngine, an open source tool to address this need. SVEngine simulates next generation sequencing data with embedded structural variations. As input, SVEngine takes template haploid sequences (FASTA) and an external variant file, a variant distribution file and/or a clonal phylogeny tree file (NEWICK). Subsequently, it simulates and outputs sequence contigs (FASTAs), sequence reads (FASTQs) and/or post-alignment files (BAMs). All of the files contain the desired variants, along with BED files containing the ground truth. SVEngine's flexible design process enables one to specify size, position, and allelic fraction for deletions, insertions, duplications, inversions and translocations. Finally, SVEngine simulates sequence data that replicates the characteristics of a sequencing library with mixed sizes of DNA insert molecules. SVEngine is highly parallelized to reduce the simulation time. Conclusions: We demonstrated the versatile features of SVEngine and its improved runtime in comparisons with other available simulators. SVEngine's features include the simulation of locus-specific variant frequency designed to mimic the phylogeny of cancer clonal evolution. We validated SVEngine's accuracy by simulating genome-wide structural variants of NA12878 and a heterogeneous cancer genome. Our evaluation included checking various sequencing mapping features such as coverage change, read clipping, insert size shift and neighbouring hanging read pairs for representative variant types. Structural variant callers Lumpy and Manta and tumor heterogeneity estimator THetA2 were able to perform realistically on the simulated data. SVEngine is implemented as a standard Python package and is freely available for academic use at:

    View details for PubMedID 29982625

  • Identification of large rearrangements in cancer genomes with barcode linked reads. Nucleic acids research Xia, L. C., Bell, J. M., Wood-Bouwens, C., Chen, J. J., Zhang, N. R., Ji, H. P. 2018; 46 (4): e19


    Large genomic rearrangements involve inversions, deletions and other structural changes that span megabase segments of the human genome. This category of genetic aberration is the cause of many hereditary genetic disorders and contributes to the pathogenesis of diseases like cancer. We developed a new algorithm called ZoomX for analysing barcode-linked sequence reads; these sequences can be traced to individual high molecular weight DNA molecules (>50 kb). To generate barcode-linked sequence reads, we employ a library preparation technology (10X Genomics) that uses droplets to partition and barcode DNA molecules. Using linked read data from whole genome sequencing, we identify large genomic rearrangements, typically greater than 200 kb, even when they are only present in low allelic fractions. Our algorithm uses a Poisson scan statistic to identify genomic rearrangement junctions, determines counts of junction-spanning molecules and applies Fisher's exact test to determine statistical significance for somatic aberrations. Utilizing a well-characterized human genome, we benchmarked this approach and showed that it accurately identifies large rearrangements. Subsequently, we demonstrated that our algorithm identifies somatic rearrangements when present in lower allelic fractions, as occurs in tumors. We characterized a set of complex cancer rearrangements with multiple classes of structural aberrations and with possible roles in oncogenesis.

    View details for PubMedID 29186506

  • Integrated metagenomic data analysis demonstrates that a loss of diversity in oral microbiota is associated with periodontitis. BMC genomics Ai, D., Huang, R., Wen, J., Li, C., Zhu, J., Xia, L. C. 2017; 18: 1041-?


    Periodontitis is an inflammatory disease affecting the tissues supporting teeth (periodontium). Integrative analysis of metagenomic samples from multiple periodontitis studies is a powerful way to examine microbiota diversity and interactions within the host oral cavity. A total of 43 subjects were recruited to participate in two previous studies profiling the microbial community of human subgingival plaque samples using shotgun metagenomic sequencing. We integrated metagenomic sequence data from those two studies, including six healthy controls, 14 sites representative of stable periodontitis, 16 sites representative of progressing periodontitis, and seven periodontal sites of unknown status. We applied phylogenetic diversity, differential abundance, and network analyses, as well as clustering, to the integrated dataset to compare microbiological community profiles among the different disease states. We found alpha-diversity, i.e., mean species diversity in sites or habitats at a local scale, to be the single strongest predictor of subjects' periodontitis status (P < 0.011). More specifically, healthy subjects had the highest alpha-diversity, while subjects with stable sites had the lowest alpha-diversity. From these results, we developed an alpha-diversity logistic model-based naive classifier able to perfectly predict the disease status of the seven subjects with unknown periodontal status (not used in training). Phylogenetic profiling resulted in the discovery of nine marker microbes, and these species are able to differentiate between stable and progressing periodontitis, achieving an accuracy of 94.4%. Finally, we found that the reduction of negatively correlated species is a notable signature of disease progression. Our results consistently show a strong association between the loss of oral microbiota diversity and the progression of periodontitis, suggesting that metagenomics sequencing and phylogenetic profiling are predictive of early periodontitis, leading to potential therapeutic intervention. Our results also support a keystone pathogen-mediated polymicrobial synergy and dysbiosis (PSD) model to explain the etiology of periodontitis. Apart from P. gingivalis, we identified three additional keystone species potentially mediating the progression of periodontitis based on pathogenic characteristics similar to those of known keystone pathogens.

    View details for DOI 10.1186/s12864-016-3254-5

    View details for PubMedID 28198672

    View details for PubMedCentralID PMC5310281

  • A genome-wide approach for detecting novel insertion-deletion variants of mid-range size. Nucleic acids research Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R. 2016; 44 (15)


    We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data, and an analysis of mid-range size insertions and deletions (<10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one-end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a novel set of mid-range insertions and deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

    View details for DOI 10.1093/nar/gkw481

    View details for PubMedID 27325742

    View details for PubMedCentralID PMC5009736

  • Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains BMC BIOINFORMATICS Xia, L. C., Ai, D., Cram, J. A., Liang, X., Fuhrman, J. A., Sun, F. 2015; 16


    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from next generation sequencing based studies. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large scale all-versus-all comparisons possible. We also propose a hybrid approach that integrates the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website:

    View details for DOI 10.1186/s12859-015-0732-8

    View details for Web of Science ID 000361431300001

    View details for PubMedID 26390921

    View details for PubMedCentralID PMC4578688

  • Efficient statistical significance approximation for local similarity analysis of high-throughput time series data BIOINFORMATICS Xia, L. C., Ai, D., Cram, J., Fuhrman, J. A., Sun, F. 2013; 29 (2): 230-237


    Local similarity analysis of biological time series data helps elucidate the varying dynamics of biological systems. However, its applications to large scale high-throughput data are limited by slow permutation procedures for statistical significance evaluation. We developed a theoretical approach to approximate the statistical significance of local similarity analysis based on the approximate tail distribution of the maximum partial sum of independent identically distributed (i.i.d.) random variables. Simulations show that the derived formula approximates the tail distribution reasonably well (starting at time points > 10 with no delay and > 20 with delay) and provides P-values comparable with those from permutations. The new approach enables efficient calculation of statistical significance for pairwise local similarity analysis, making possible all-to-all local association studies that would otherwise be prohibitive. As a demonstration, local similarity analysis of human microbiome time series shows that core operational taxonomic units (OTUs) are highly synergetic and some of the associations are body-site specific across samples. The new approach is implemented in our eLSA package, which now provides pipelines for faster local similarity analysis of time series data. The tool is freely available from eLSA's website. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/bts668

    View details for Web of Science ID 000313722800011

    View details for PubMedID 23178636
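
The local similarity score described in the abstract above can be illustrated with a small dynamic program. This is a minimal sketch and not the eLSA implementation: the function names `zscore` and `local_similarity`, the z-score normalization, and the `max_delay` parameter are assumptions made for illustration, and the significance approximation from the paper is not reproduced here.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize a series to mean 0 and (population) variance 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def local_similarity(x, y, max_delay=0):
    """Signed local similarity score of two equal-length series.

    Finds the contiguous, possibly time-shifted, pair of subintervals whose
    summed products of standardized values is largest in absolute value.
    (Illustrative sketch; normalization choice is an assumption.)
    """
    n = len(x)
    x, y = zscore(x), zscore(y)
    # pos/neg hold the best positive/negative partial sums ending at (i, j).
    pos = [[0.0] * (n + 1) for _ in range(n + 1)]
    neg = [[0.0] * (n + 1) for _ in range(n + 1)]
    best_pos = best_neg = 0.0
    for i in range(1, n + 1):
        # Only visit cells whose time shift |i - j| is within the delay limit.
        for j in range(max(1, i - max_delay), min(n, i + max_delay) + 1):
            prod = x[i - 1] * y[j - 1]
            pos[i][j] = max(0.0, pos[i - 1][j - 1] + prod)
            neg[i][j] = max(0.0, neg[i - 1][j - 1] - prod)
            best_pos = max(best_pos, pos[i][j])
            best_neg = max(best_neg, neg[i][j])
    sign = 1.0 if best_pos >= best_neg else -1.0
    return sign * max(best_pos, best_neg) / n
```

Each call fills an n-by-n table restricted to a band of width `2 * max_delay + 1`, so scoring all pairs is cheap; the expensive step that the paper targets is the permutation test that would otherwise be repeated for every pair.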

  • Extended local similarity analysis (eLSA) of microbial community and other time series data with replicates BMC SYSTEMS BIOLOGY Xia, L. C., Steele, J. A., Cram, J. A., Cardon, Z. G., Simmons, S. L., Vallino, J. J., Fuhrman, J. A., Sun, F. 2011; 5


    The increasing availability of time series microbial community data from metagenomics and other molecular biological studies has enabled the analysis of large-scale microbial co-occurrence and association networks. Among the many analytical techniques available, the Local Similarity Analysis (LSA) method is unique in that it captures local and potentially time-delayed co-occurrence and association patterns in time series data that cannot otherwise be identified by ordinary correlation analysis. However, LSA, as originally developed, does not consider time series data with replicates, which hinders the full exploitation of available information. With replicates, it is possible to understand the variability of local similarity (LS) score and to obtain its confidence interval. We extended our LSA technique to time series data with replicates and termed it extended LSA, or eLSA. Simulations showed the capability of eLSA to capture subinterval and time-delayed associations. We implemented the eLSA technique into an easy-to-use analytic software package. The software pipeline integrates data normalization, statistical correlation calculation, statistical significance evaluation, and association network construction steps. We applied the eLSA technique to microbial community and gene expression datasets, where unique time-dependent associations were identified. The extended LSA analysis technique was demonstrated to reveal statistically significant local and potentially time-delayed association patterns in replicated time series data beyond that of ordinary correlation analysis. These statistically significant associations can provide insights to the real dynamics of biological systems. The newly designed eLSA software efficiently streamlines the analysis and is freely available from the eLSA homepage, which can be accessed at

    View details for DOI 10.1186/1752-0509-5-S2-S15

    View details for Web of Science ID 000301987000015

    View details for PubMedID 22784572

  • Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads PLOS ONE Xia, L. C., Cram, J. A., Chen, T., Fuhrman, J. A., Sun, F. 2011; 6 (12)


    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or their variants, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the data-sets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.

    View details for DOI 10.1371/journal.pone.0027992

    View details for Web of Science ID 000298173500008

    View details for PubMedID 22162995

  • The statistical power of k-mer based aggregative statistics for alignment-free detection of horizontal gene transfer. Synthetic and systems biotechnology Huang, G., Liu, X., Huang, T., Xia, L. 2019; 4 (3): 150–56


    Alignment-based database search and sequence comparison are commonly used to detect horizontal gene transfer (HGT). However, with the rapid increase of sequencing depth, hundreds of thousands of contigs are routinely assembled from metagenomics studies, which challenges alignment-based HGT analysis by overwhelming the known reference sequences. Detecting HGT by k-mer statistics thus becomes an attractive alternative. These alignment-free statistics have demonstrated high performance and efficiency in whole-genome and transcriptome comparisons. To adapt k-mer statistics for HGT detection, we developed two aggregative statistics $T_{\mathrm{sum}}^{S}$ and $T_{\mathrm{sum}}^{*}$, which subsample metagenome contigs by their representative regions, and summarize the regional $D_2^{S}$ and $D_2^{*}$ metrics by their upper bounds. We systematically studied the aggregative statistics' power at different k-mer sizes using simulations. Our analysis showed that, in general, the power of $T_{\mathrm{sum}}^{S}$ and $T_{\mathrm{sum}}^{*}$ increases with sequencing coverage, and reaches a maximum power >80% at k = 6, with 5% Type-I error and the coverage ratio >0.2x. The statistical power of $T_{\mathrm{sum}}^{S}$ and $T_{\mathrm{sum}}^{*}$ was evaluated with realistic simulations of HGT mechanism, sequencing depth, read length, and base error. We expect these statistics to be useful distance metrics for identifying HGT in metagenomic studies.

    View details for DOI 10.1016/j.synbio.2019.08.001

    View details for PubMedID 31508512
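
For readers unfamiliar with the regional $D_2^{S}$ and $D_2^{*}$ metrics mentioned in the abstract above, the sketch below computes them for a single pair of sequences using their standard definitions from the alignment-free comparison literature. It is an illustration only: the function names `kmer_counts` and `d2_statistics` are hypothetical, the background model is assumed to be i.i.d. with uniform base probabilities unless `base_probs` is supplied, and the paper's subsampling and aggregation into $T_{\mathrm{sum}}^{S}$ and $T_{\mathrm{sum}}^{*}$ are not reconstructed here.

```python
from collections import Counter
from itertools import product
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistics(seq_x, seq_y, k, base_probs=None):
    """Return (D2S, D2*) for two DNA sequences under an i.i.d. background.

    base_probs maps each base to its background probability; a uniform
    background is assumed when it is omitted (an illustrative default).
    """
    if base_probs is None:
        base_probs = {b: 0.25 for b in "ACGT"}
    cx, cy = kmer_counts(seq_x, k), kmer_counts(seq_y, k)
    nx, ny = len(seq_x) - k + 1, len(seq_y) - k + 1  # number of k-mer positions
    d2s = d2star = 0.0
    for word in map("".join, product("ACGT", repeat=k)):
        pw = 1.0
        for b in word:
            pw *= base_probs[b]
        # Center each count by its expectation under the background model.
        xt = cx[word] - nx * pw
        yt = cy[word] - ny * pw
        denom = sqrt(xt * xt + yt * yt)
        if denom > 0:
            d2s += xt * yt / denom
        d2star += xt * yt / (sqrt(nx * ny) * pw)
    return d2s, d2star
```

Both statistics sum over all 4^k possible words, which stays small for the k values discussed in the abstract (4096 words at k = 6).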

  • Association network analysis identifies enzymatic components of gut microbiota that significantly differ between colorectal cancer patients and healthy controls PEERJ Ai, D., Pan, H., Li, X., Wu, M., Xia, L. C. 2019; 7

    View details for DOI 10.7717/peerj.7315

    View details for Web of Science ID 000477696400008

  • Constructing the Microbial Association Network from Large-Scale Time Series Data Using Granger Causality. Genes Ai, D., Li, X., Liu, G., Liang, X., Xia, L. C. 2019; 10 (3)


    The increasing availability of large-scale time series data allows the inference of microbial community dynamics by association network analysis. However, correlation-based association network analyses are noninformative of causal, mediating and time-dependent relationships between microbial community functional factors. To address this insufficiency, we introduced the Granger causality model to the analysis of a recent marine microbial time series dataset. We systematically constructed a directed acyclic network, representing both internal and external causal relationships among the microbial and environmental factors. We further optimized the network by removing false causal associations using the conditional Granger causality. The final network was visualized as a Granger graph, which was analyzed to identify causal relationships driven by key functional operators in the environment, such as Gammaproteobacteria, which was Granger caused by total organic nitrogen and primary production (p < 0.05 and Q < 0.05).

    View details for PubMedID 30875820

  • Association of Rare Coding Mutations With Alzheimer Disease and Other Dementias Among Adults of European Ancestry JAMA NETWORK OPEN Patel, D., Mez, J., Vardarajan, B. N., Staley, L., Chung, J., Zhang, X., Farrell, J. J., Rynkiewicz, M. J., Cannon-Albright, L. A., Teerlink, C. C., Stevens, J., Corcoran, C., Murcia, J., Lopez, O. L., Mayeux, R., Haines, J. L., Pericak-Vance, M. A., Schellenberg, G., Kauwe, J. K., Lunetta, K. L., Farrer, L. A., Bellair, M., Dinh, H., Doddapeneni, H., Dugan-Perez, S., English, A., Gibbs, R. A., Han, Y., Hu, J., Jayaseelan, J., Kalra, D., Khan, Z., Korchina, V., Lee, S., Liu, Y., Liu, X., Muzny, D., Nasser, W., Salerno, W., Santibanez, J., Skinner, E., White, S., Worley, K., Zhu, Y., Beiser, A., Chen, Y., Chung, J., Cupples, A., DeStefano, A., Dupuis, J., Farrell, J., Farrer, L., Lancour, D., Lin, H., Liu, C., Lunetta, K., Ma, Y., Patel, D., Sarnowski, C., Satizabal, C., Seshadri, S., Sun, F., Zhang, X., Choi, S., Banks, E., Gabriel, S., Gupta, N., Bush, W., Butkiewicz, M., Haines, J., Smieszek, S., Song, Y., Barral, S., De Jager, P. L., Mayeux, R., Reitz, C., Reyes, D., Tosto, G., Vardarajan, B., Amad, S., Amin, N., Ikram, M., van der Lee, S., van Duijn, C., Vanderspek, A., Schmidt, H., Schmidt, R., Goate, A., Kapoor, M., Marcora, E., Renton, A., Faber, K., Foroud, T., Feolo, M., Stine, A., Launer, L. J., Bennett, D. A., Xia, L., Beecham, G., Hamilton-Nelson, K., Jaworski, J., Kunkle, B., Martin, E., Pericak-Vance, M., Rajabli, F., Schmidt, M., Mosley, T. H., Cantwell, L., Childress, M., Chou, Y., Cweibel, R., Gangadharan, P., Kuzma, A., Leung, Y., Lin, H., Malamon, J., Mlynarski, E., Naj, A., Qu, L., Schellenberg, G., Valladares, O., Wang, L., Wang, W., Zhang, N., Below, J. E., Boerwinkle, E., Bressler, J., Fornage, M., Jian, X., Liu, X., Bis, J. C., Blue, E., Brown, L., Day, T., Dorschner, M., Horimoto, A. R., Nafikov, R., Nato, A. 
Q., Navas, P., Nguyen, H., Psaty, B., Rice, K., Saad, M., Sohi, H., Thornton, T., Tsuang, D., Wang, B., Wijsman, E., Witten, D., Antonacci-Fulton, L., Appelbaum, E., Cruchaga, C., Fulton, R. S., Koboldt, D. C., Larson, D. E., Waligorski, J., Wilson, R. K., Alzheimers Dis Sequencing Project 2019; 2 (3): e191350


    Some of the unexplained heritability of Alzheimer disease (AD) may be due to rare variants whose effects are not captured in genome-wide association studies because very large samples are needed to observe statistically significant associations. To identify genetic variants associated with AD risk using a nonstatistical approach. Genetic association study in which rare variants were identified by whole-exome sequencing in unrelated individuals of European ancestry from the Alzheimer's Disease Sequencing Project (ADSP). Data were analyzed between March 2017 and September 2018. Minor alleles genome-wide and in 95 genes previously associated with AD, AD-related traits, or other dementias were tabulated and filtered for predicted functional impact and occurrence in participants with AD but not controls. Support for several findings was sought in a whole-exome sequencing data set comprising 19 affected relative pairs from Utah high-risk pedigrees and whole-genome sequencing data sets from the ADSP and Alzheimer's Disease Neuroimaging Initiative. Among 5617 participants with AD (3202 [57.0%] women; mean [SD] age, 76.4 [9.3] years) and 4594 controls (2719 [59.0%] women; mean [SD] age, 86.5 [4.5] years), a total of 24 variants with moderate or high functional impact from 19 genes were observed in 10 or more participants with AD but not in controls. These variants included a missense mutation (rs149307620 [p.A284T], n = 10) in NOTCH3, a gene in which coding mutations are associated with cerebral autosomal-dominant arteriopathy with subcortical infarcts and leukoencephalopathy (CADASIL), that was also identified in 1 participant with AD and 1 participant with mild cognitive impairment in the whole genome sequencing data sets. Four participants with AD carried the TREM2 rs104894002 (p.Q33X) high-impact mutation that, in homozygous form, causes Nasu-Hakola disease, a rare disorder characterized by early-onset dementia and multifocal bone cysts, suggesting an intermediate inheritance model for the mutation. Compared with controls, participants with AD had a significantly higher burden of deleterious rare coding variants in dementia-associated genes (2314 vs 3354 cumulative variants, respectively; P = .006). Different mutations in the same gene or variable dose of a mutation may be associated with distinct dementias. These findings suggest that minor differences in the structure or amount of protein may be associated with different clinical outcomes. Understanding these genotype-phenotype associations may provide further insight into the pathogenic nature of the mutations, as well as offer clues for developing new therapeutic targets.

    View details for DOI 10.1001/jamanetworkopen.2019.1350

    View details for Web of Science ID 000465424000082

    View details for PubMedID 30924900

    View details for PubMedCentralID PMC6450321

  • Using Decision Tree Aggregation with Random Forest Model to Identify Gut Microbes Associated with Colorectal Cancer. Genes Ai, D., Pan, H., Han, R., Li, X., Liu, G., Xia, L. C. 2019; 10 (2)


    The imbalance of human gut microbiota has been associated with colorectal cancer. In recent years, metagenomics research has provided a large amount of scientific data enabling us to study the dedicated roles of gut microbes in the onset and progression of cancer. We removed unrelated and redundant features during feature selection by mutual information. We then trained a random forest classifier on a large metagenomics dataset of colorectal cancer patients and healthy people assembled from published reports and extracted and analysed the information from the learned decision trees. We identified key microbial species associated with colorectal cancers. These microbes included Porphyromonas asaccharolytica, Peptostreptococcus stomatis, Fusobacterium, Parvimonas sp., Streptococcus vestibularis and Flavonifractor plautii. We obtained the optimal splitting abundance thresholds for these species to distinguish between healthy and colorectal cancer samples. This extracted consensus decision tree may be applied to the diagnosis of colorectal cancers.

    View details for PubMedID 30717284

  • Targeted short read sequencing and assembly of re-arrangements and candidate gene loci provide megabase diplotypes. Nucleic acids research Shin, G., Greer, S. U., Xia, L. C., Lee, H., Zhou, J., Boles, T. C., Ji, H. P. 2019


    The human genome is composed of two haplotypes, otherwise called diplotypes, which denote phased polymorphisms and structural variations (SVs) that are derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, context of SV, regulation of gene expression and other features which are informative for understanding human genetics. Successful diplotyping with short read whole genome sequencing generally requires either a large population or parent-child trio samples. To overcome these limitations, we developed a targeted sequencing method for generating megabase (Mb)-scale haplotypes with short reads. One selects specific 0.1-0.2 Mb high molecular weight DNA targets with custom-designed Cas9-guide RNA complexes followed by sequencing with barcoded linked reads. To test this approach, we designed three assays, targeting the BRCA1 gene, the entire 4-Mb major histocompatibility complex locus and 18 well-characterized SVs, respectively. Using an integrated alignment- and assembly-based approach, we generated comprehensive variant diplotypes spanning the entirety of the targeted loci and characterized SVs with exact breakpoints. Our results were comparable in quality to long read sequencing.

    View details for DOI 10.1093/nar/gkz661

    View details for PubMedID 31350896

  • Association network analysis identifies enzymatic components of gut microbiota that significantly differ between colorectal cancer patients and healthy controls. PeerJ Ai, D., Pan, H., Li, X., Wu, M., Xia, L. C. 2019; 7: e7315


    The human gut microbiota plays a major role in maintaining human health and was recently recognized as a promising target for disease prevention and treatment. Many diseases are traceable to microbiota dysbiosis, implicating altered gut microbial ecosystems, or, in many cases, disrupted microbial enzymes carrying out essential physio-biochemical reactions. Thus, changes in essential microbial enzyme levels may predict human disorders. With the rapid development of high-throughput sequencing technologies, metagenomics analysis has emerged as an important method to explore the microbial communities in the human body, as well as their functionalities. In this study, we analyzed 156 gut metagenomics samples from patients with colorectal cancer (CRC) and adenoma, as well as from healthy controls. We estimated the abundance of microbial enzymes using the HMP Unified Metabolic Analysis Network method and identified the differentially abundant enzymes between CRCs and controls. We constructed enzymatic association networks using the extended local similarity analysis algorithm. We identified CRC-associated enzymatic changes by analyzing the topological features of the enzymatic association networks, including the clustering coefficient, the betweenness centrality, and the closeness centrality of network nodes. The topology of the enzymatic association networks differed between the healthy and the CRC environments. The ABC (ATP binding cassette) transporter and small subunit ribosomal protein S19 enzymes had the highest clustering coefficient in the healthy enzymatic networks. In contrast, the adenosylhomocysteinase enzyme had the highest clustering coefficient in the CRC enzymatic networks. These enzymatic and metabolic differences may serve as risk predictors for CRCs and are worthy of further research.

    View details for DOI 10.7717/peerj.7315

    View details for PubMedID 31392094

    View details for PubMedCentralID PMC6673421

  • High-quality CNV segments from low-coverage whole genome sequencing from FFPE cancer biopsies based on an evaluation of multiple CNV tools Lee, H., Xia, L., Greer, S., Bell, J., Grimes, S. M., Bouwens, C., Shin, G., Lau, B. C., Johnson, L., Andor, N., Day, K., Miller, M., Escobar, H., Nadauld, L., Ji, H. P., Van Hummelen, P. AMER ASSOC CANCER RESEARCH. 2018
  • Linked read whole genome sequencing reveals pervasive chromosomal level instability and novel rearrangements in brain metastases from colorectal cancer Xia, L. C., Bell, J. M., Wood-Bouwens, C., King, D. A., Shin, G., Greer, S., Connolly, I. D., Gephart, M. H., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2018
  • CoreProbe: A Novel Algorithm for Estimating Relative Abundance Based on Metagenomic Reads. Genes Ai, D., Pan, H., Huang, R., Xia, L. C. 2018; 9 (6)


    With the rapid development of high-throughput sequencing technology, the analysis of metagenomic sequencing data and the accurate and efficient estimation of relative microbial abundance have become important ways to explore the microbial composition and function of microbes. In addition, the accuracy and efficiency of the relative microbial abundance estimation are closely related to the algorithm and the selection of the reference sequence for sequence alignment. We introduced the microbial core genome as the reference sequence for potential microbes in a metagenomic sample, and we constructed a finite mixture and latent Dirichlet models and used the Gibbs sampling algorithm to estimate the relative abundance of microorganisms. The simulation results showed that our approach can improve the efficiency while maintaining high accuracy and is more suitable for high-throughput metagenomic data. The new approach was implemented in our CoreProbe package which provides a pipeline for an accurate and efficient estimation of the relative abundance of microbes in a community. This tool is available free of charge from the CoreProbe's website: Access the Docker image with the following instruction: sudo docker pull panhongfei/coreprobe:1.0.

    View details for PubMedID 29925824

  • Chromosome-scale mega-haplotypes enable digital karyotyping of cancer aneuploidy NUCLEIC ACIDS RESEARCH Bell, J. M., Lau, B. T., Greer, S. U., Wood-Bouwens, C., Xia, L. C., Connolly, I. D., Gephart, M. H., Ji, H. P. 2017; 45 (19): e162


    Genomic instability is a frequently occurring feature of cancer that involves large-scale structural alterations. These somatic changes in chromosome structure include duplication of entire chromosome arms and aneuploidy where chromosomes are duplicated beyond normal diploid content. However, the accurate determination of aneuploidy events in cancer genomes is a challenge. Recent advances in sequencing technology allow the characterization of haplotypes that extend megabases along the human genome using high molecular weight (HMW) DNA. For this study, we employed a library preparation method in which sequence reads have barcodes linked to single HMW DNA molecules. Barcode-linked reads are used to generate extended haplotypes on the order of megabases. We developed a method that leverages haplotypes to identify chromosomal segmental alterations in cancer and uses this information to join haplotypes together, thus extending the range of phased variants. With this approach, we identified mega-haplotypes that encompass entire chromosome arms. We characterized the chromosomal arm changes and aneuploidy events in a manner that offers similar information as a traditional karyotype but with the benefit of DNA sequence resolution. We applied this approach to characterize aneuploidy and chromosomal alterations from a series of primary colorectal cancers.

    View details for PubMedID 28977555

    View details for PubMedCentralID PMC5737808

  • CRISPR-Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis NATURE COMMUNICATIONS Shin, G., Grimes, S. M., Lee, H., Lau, B. T., Xia, L. C., Ji, H. P. 2017; 8


    Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR-Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples.

    View details for DOI 10.1038/ncomms14291

    View details for PubMedID 28169275

  • Correlation detection strategies in microbial data sets vary widely in sensitivity and precision ISME JOURNAL Weiss, S., Van Treuren, W., Lozupone, C., Faust, K., Friedman, J., Deng, Y., Xia, L. C., Xu, Z. Z., Ursell, L., Alm, E. J., Birmingham, A., Cram, J. A., Fuhrman, J. A., Raes, J., Sun, F., Zhou, J., Knight, R. 2016; 10 (7): 1669-1681


    Disruption of healthy microbial communities has been linked to numerous diseases, yet microbial interactions are little understood. This is due in part to the large number of bacteria, and the much larger number of interactions (easily in the millions), making experimental investigation very difficult at best and necessitating the nascent field of computational exploration through microbial correlation networks. We benchmark the performance of eight correlation techniques on simulated and real data in response to challenges specific to microbiome studies: fractional sampling of ribosomal RNA sequences, uneven sampling depths, rare microbes and a high proportion of zero counts. Also tested is the ability to distinguish signals from noise, and detect a range of ecological and time-series relationships. Finally, we provide specific recommendations for correlation technique usage. Although some methods perform better than others, there is still considerable need for improvement in current techniques.

    View details for DOI 10.1038/ismej.2015.235

    View details for Web of Science ID 000378292100011

    View details for PubMedID 26905627

    View details for PubMedCentralID PMC4918442


    View details for DOI 10.1214/15-AOAS892

    View details for Web of Science ID 000385029700008

  • Pan-cancer analysis of the extent and consequences of intratumor heterogeneity. Nature medicine Andor, N., Graham, T. A., Jansen, M., Xia, L. C., Aktipis, C. A., Petritsch, C., Ji, H. P., Maley, C. C. 2016; 22 (1): 105-113


    Intratumor heterogeneity (ITH) drives neoplastic progression and therapeutic resistance. We used the bioinformatics tools 'expanding ploidy and allele frequency on nested subpopulations' (EXPANDS) and PyClone to detect clones that are present at a ≥10% frequency in 1,165 exome sequences from tumors in The Cancer Genome Atlas. 86% of tumors across 12 cancer types had at least two clones. ITH in the morphology of nuclei was associated with genetic ITH (Spearman's correlation coefficient, ρ = 0.24-0.41; P < 0.001). Mutation of a driver gene that typically appears in smaller clones was a survival risk factor (hazard ratio (HR) = 2.15, 95% confidence interval (CI): 1.71-2.69). The risk of mortality also increased when >2 clones coexisted in the same tumor sample (HR = 1.49, 95% CI: 1.20-1.87). In two independent data sets, copy-number alterations affecting either <25% or >75% of a tumor's genome predicted reduced risk (HR = 0.15, 95% CI: 0.08-0.29). Mortality risk also declined when >4 clones coexisted in the sample, suggesting a trade-off between the costs and benefits of genomic instability. ITH and genomic instability thus have the potential to be useful measures that can universally be applied to all cancers.

    View details for DOI 10.1038/nm.3984

    View details for PubMedID 26618723

    View details for Web of Science ID 000367590700022

  • Cross-depth analysis of marine bacterial networks suggests downward propagation of temporal changes ISME JOURNAL Cram, J. A., Xia, L. C., Needham, D. M., Sachdeva, R., Sun, F., Fuhrman, J. A. 2015; 9 (12): 2573-2586


    Interactions among microbes and stratification across depths are both believed to be important drivers of microbial communities, though little is known about how microbial associations differ between and across depths. We have monitored the free-living microbial community at the San Pedro Ocean Time-series station, monthly, for a decade, at five different depths: 5 m, the deep chlorophyll maximum layer, 150 m, 500 m and 890 m (just above the sea floor). Here, we introduce microbial association networks that combine data from multiple ocean depths to investigate both within- and between-depth relationships, sometimes time-lagged, among microbes and environmental parameters. The euphotic zone, deep chlorophyll maximum and 890 m depth each contain two negatively correlated 'modules' (groups of many inter-correlated bacteria and environmental conditions) suggesting regular transitions between two contrasting environmental states. Two-thirds of pairwise correlations of bacterial taxa between depths lagged such that changes in the abundance of deeper organisms followed changes in shallower organisms. Taken in conjunction with previous observations of seasonality at 890 m, these trends suggest that planktonic microbial communities throughout the water column are linked to environmental conditions and/or microbial communities in overlying waters. Poorly understood groups including Marine Group A, Nitrospina and AEGEAN-169 clades contained taxa that showed diverse association patterns, suggesting these groups contain multiple ecological species, each shaped by different factors, which we have started to delineate. These observations build upon previous work at this location, lending further credence to the hypothesis that sinking particles and vertically migrating animals transport materials that significantly shape the time-varying patterns of microbial community composition.

    View details for DOI 10.1038/ismej.2015.76

    View details for Web of Science ID 000365094400004

    View details for PubMedID 25989373

  • A new multiple feature approach for rapid and highly accurate somatic structural variation discovery from whole cancer genome sequencing Xia, L. C., Bell, J., Chen, J., Zhang, N. R., Ji, H. P. AMER ASSOC CANCER RESEARCH. 2015
  • Emergence of Hemagglutinin Mutations During the Course of Influenza Infection. Scientific reports Cushing, A., Kamali, A., Winters, M., Hopmans, E. S., Bell, J. M., Grimes, S. M., Xia, L. C., Zhang, N. R., Moss, R. B., Holodniy, M., Ji, H. P. 2015; 5: 16178


    Influenza remains a significant cause of disease mortality. The ongoing threat of influenza infection is partly attributable to the emergence of new mutations in the influenza genome. Among the influenza viral gene products, the hemagglutinin (HA) glycoprotein plays a critical role in influenza pathogenesis, is the target for vaccines and accumulates new mutations that may alter the efficacy of immunization. To study the emergence of HA mutations during the course of infection, we employed a deep-targeted sequencing method. We used samples from 17 patients with active H1N1 or H3N2 influenza infections. These patients were not treated with antivirals. In addition, we had samples from five patients who were analyzed longitudinally. Thus, we determined the quantitative changes in the fractional representation of HA mutations during the course of infection. Across individuals in the study, a series of novel HA mutations that directly altered the HA coding sequence were identified. Serial viral sampling revealed HA mutations that either were stable, expanded or were reduced in representation during the course of the infection. Overall, we demonstrated the emergence of unique mutations specific to an infected individual and temporal genetic variation during infection.

    View details for DOI 10.1038/srep16178

    View details for PubMedID 26538451

  • Extended Local Similarity Analysis (eLSA) of Biological Data Encyclopedia of Metagenomics: Genes, Genomes and Metagenomes. Basics, Methods, Databases and Tools Sun, F., Xia, L. edited by Nelson, K. Springer. 2014
  • Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads Encyclopedia of Metagenomics: Genes, Genomes and Metagenomes. Basics, Methods, Databases and Tools Sun, F., Xia, L. C. edited by Nelson, K. Springer. 2014
  • A Quantitative Evaluation of Health Care System in US, China, and Sweden Health Med Wang, Q., Li, M., Zu, H., Gao, M., Cao, C., Xia, L. C. 2013; 7 (4)
  • Genetic analysis of differentiation of T-helper lymphocytes GENETICS AND MOLECULAR RESEARCH Wang, Q., Li, M., Xia, L. C., Wen, G., Zu, H., Gao, M. 2013; 12 (2): 972-987


    In the human immune system, T-helper cells are able to differentiate into two lymphocyte subsets: Th1 and Th2. The intracellular signaling pathways of differentiation form a dynamic regulation network by secreting distinctive types of cytokines, while differentiation is regulated by two major gene loci: T-bet and GATA-3. We developed a system dynamics model to simulate the differentiation and re-differentiation process of T-helper cells, based on gene expression levels of T-bet and GATA-3 during differentiation of these cells. We arrived at three ultimate states of the model and came to the conclusion that cell differentiation potential exists as long as the system dynamics is at an unstable equilibrium point; the T-helper cells will no longer have the potential of differentiation when the model reaches a stable equilibrium point. In addition, the time lag caused by expression of transcription factors can lead to oscillations in the secretion of cytokines during differentiation.

    View details for DOI 10.4238/2013.April.2.13

    View details for Web of Science ID 000320030100011

    View details for PubMedID 23613243

  • Marine bacterial, archaeal and protistan association networks reveal ecological linkages ISME JOURNAL Steele, J. A., Countway, P. D., Xia, L., Vigil, P. D., Beman, J. M., Kim, D. Y., Chow, C. T., Sachdeva, R., Jones, A. C., Schwalbach, M. S., Rose, J. M., Hewson, I., Patel, A., Sun, F., Caron, D. A., Fuhrman, J. A. 2011; 5 (9): 1414-1425


    Microbes have central roles in ocean food webs and global biogeochemical processes, yet specific ecological relationships among these taxa are largely unknown. This is in part due to the dilute, microscopic nature of the planktonic microbial community, which prevents direct observation of their interactions. Here, we use a holistic (that is, microbial system-wide) approach to investigate time-dependent variations among taxa from all three domains of life in a marine microbial community. We investigated the community composition of bacteria, archaea and protists through cultivation-independent methods, along with total bacterial and viral abundance, and physico-chemical observations. Samples and observations were collected monthly over 3 years at a well-described ocean time-series site of southern California. To find associations among these organisms, we calculated time-dependent rank correlations (that is, local similarity correlations) among relative abundances of bacteria, archaea, protists, total abundance of bacteria and viruses and physico-chemical parameters. We used a network generated from these statistical correlations to visualize and identify time-dependent associations among ecologically important taxa, for example, the SAR11 cluster, stramenopiles, alveolates, cyanobacteria and ammonia-oxidizing archaea. Negative correlations, perhaps suggesting competition or predation, were also common. The analysis revealed a progression of microbial communities through time, and also a group of unknown eukaryotes that were highly correlated with dinoflagellates, indicating possible symbioses or parasitism. Possible 'keystone' species were evident. The network has statistical features similar to previously described ecological networks, and in network parlance has non-random, small world properties (that is, highly interconnected nodes). This approach provides new insights into the natural history of microbes.

    View details for DOI 10.1038/ismej.2011.24

    View details for Web of Science ID 000295782900003

    View details for PubMedID 21430787

  • PPLook: an automated data mining tool for protein-protein interaction BMC BIOINFORMATICS Zhang, S., Li, Y., Xia, L., Pan, Q. 2010; 11


    Extracting and visualizing protein-protein interactions (PPI) from the text literature is a meaningful topic in protein science. It assists the identification of interactions among proteins. There is a lack of tools to extract PPI, visualize and classify the results. We developed a PPI search system, termed PPLook, which automatically extracts and visualizes PPI from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keywords dictionary pattern-matching algorithm, and display the topological parameters, such as the number of nodes, edges, and connected components. The visualization component of PPLook enables us to view the interaction relationship among the proteins in a three-dimensional space based on the OpenGL graphics interface technology. PPLook can also select a protein semantic class, count the proteins of that semantic class which interact with the query protein, and count the articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface. PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at

    View details for DOI 10.1186/1471-2105-11-326

    View details for Web of Science ID 000280331700002

    View details for PubMedID 20550717

  • Oligonucleotide profiling for discriminating bacteria in bacterial communities COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING He, P., Xia, L. 2007; 10 (4): 247-255


    Based on the relative ratios of di- and tri-nucleotides in the DNA sequences, the profiles of 164 genome sequences from 152 representative microbial organisms were computed. By comparing the profiles of the genomes and their substrings with length 500 bps, the fluctuations of the relative abundances of di- and tri-nucleotides of these genomic sequences were analyzed. A new method to discriminate the origins of orphan DNA sequences was proposed, and the origins of 17 uncultured bacterium sequences from a bacterial community in the human gut were postulated and discussed.

    View details for Web of Science ID 000247022400002

    View details for PubMedID 17506707

  • Phase transition in sequence unique reconstruction JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY Xia, L., Zhou, C. 2007; 20 (1): 18-29