As a mathematician, I build stochastic models from first principles to solve biology problems. Fields of interest include population genetics and hematopoiesis. As a computer scientist, my primary interest lies in predicting time series, data-mining medical records, and building recommender systems.
At Stanford, I'm developing a scalable supervised-learning pipeline for multiple hospitals and leading work on predicting whether inpatients' future lab test results will be normal (a classification task) by mining historical electronic health records.


All Publications

  • Immigration-induced phase transition in a regulated multispecies birth-death process JOURNAL OF PHYSICS A-MATHEMATICAL AND THEORETICAL Xu, S., Chou, T. 2018; 51 (42)
  • Modeling large fluctuations of thousands of clones during hematopoiesis: The role of stem cell self-renewal and bursty progenitor dynamics in rhesus macaque. PLoS computational biology Xu, S., Kim, S., Chen, I. S., Chou, T. 2018; 14 (10): e1006489


    In a recent clone-tracking experiment, millions of uniquely tagged hematopoietic stem cells (HSCs) were autologously transplanted into rhesus macaques and peripheral blood containing thousands of tags were sampled and sequenced over 14 years to quantify the abundance of hundreds to thousands of tags or "clones." Two major puzzles of the data have been observed: consistent differences and massive temporal fluctuations of clone populations. The large sample-to-sample variability can lead clones to occasionally go "extinct" but "resurrect" themselves in subsequent samples. Although heterogeneity in HSC differentiation rates, potentially due to tagging, and random sampling of the animals' blood and cellular demographic stochasticity might be invoked to explain these features, we show that random sampling cannot explain the magnitude of the temporal fluctuations. Moreover, we show through simpler neutral mechanistic and statistical models of hematopoiesis of tagged cells that a broad distribution in clone sizes can arise from stochastic HSC self-renewal instead of tag-induced heterogeneity. The very large clone population fluctuations that often lead to extinctions and resurrections can be naturally explained by a generation-limited proliferation constraint on the progenitor cells. This constraint leads to bursty cell population dynamics underlying the large temporal fluctuations. We analyzed experimental clone abundance data using a new statistic that counts clonal disappearances and provided least-squares estimates of two key model parameters in our model, the total HSC differentiation rate and the maximum number of progenitor-cell divisions.

    View details for PubMedID 30335762

  • Escape rate for nonequilibrium processes dominated by strong non-detailed balance force JOURNAL OF CHEMICAL PHYSICS Tang, Y., Xu, S., Ao, P. 2018; 148 (6): 064102


    Quantifying the escape rate from a meta-stable state is essential to understand a wide range of dynamical processes. Kramers' classical rate formula is the product of an exponential function of the potential barrier height and a pre-factor related to the friction coefficient. Although many applications of the rate formula focused on the exponential term, the prefactor can have a significant effect on the escape rate in certain parameter regions, such as the overdamped limit and the underdamped limit. There have been continuous interests to understand the effect of non-detailed balance on the escape rate; however, how the prefactor behaves under strong non-detailed balance force remains elusive. In this work, we find that the escape rate formula has a vanishing prefactor with decreasing friction strength under the strong non-detailed balance limit. We both obtain analytical solutions in specific examples and provide a derivation for more general cases. We further verify the result by simulations and propose a testable experimental system of a charged Brownian particle in electromagnetic field. Our study demonstrates that a special care is required to estimate the effect of prefactor on the escape rate when non-detailed balance force dominates.

    View details for DOI 10.1063/1.5008524

    View details for Web of Science ID 000425299800002

    View details for PubMedID 29448766

  • Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow JOVE-JOURNAL OF VISUALIZED EXPERIMENTS Suryawanshi, G. W., Xu, S., Xie, Y., Chou, T., Kim, N., Chen, I. Y., Kim, S. 2017


    Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.

    View details for DOI 10.3791/55812

    View details for Web of Science ID 000415751100016

    View details for PubMedID 28654067

    View details for PubMedCentralID PMC5608418

  • Two-time-scale population evolution on a singular landscape PHYSICAL REVIEW E Xu, S., Jiao, S., Jiang, P., Ao, P. 2014; 89 (1): 012724


    Under the effect of strong genetic drift, it is highly probable to observe gene fixation or gene loss in a population, shown by singular peaks on a potential landscape. The genetic drift-induced noise gives rise to two-time-scale diffusion dynamics on the bipeaked landscape. We find that the logarithmically divergent (singular) peaks do not necessarily imply infinite escape times or biological fixations by iterating the Wright-Fisher model and approximating the average escape time. Our analytical results under weak mutation and weak selection extend Kramers's escape time formula to models with B (Beta) function-like equilibrium distributions and overcome constraints in previous methods. The constructed landscape provides a coherent description for the bistable system, supports the quantitative analysis of bipeaked dynamics, and generates mathematical insights for understanding the boundary behaviors of the diffusion model.

    View details for DOI 10.1103/PhysRevE.89.012724

    View details for Web of Science ID 000332169900007

    View details for PubMedID 24580274

  • Wright-Fisher dynamics on adaptive landscape IET SYSTEMS BIOLOGY Jiao, S., Xu, S., Jiang, P., Yuan, B., Ao, P. 2013; 7 (5): 153-64


    Adaptive landscape, proposed by Sewall Wright, has provided a conceptual framework to describe dynamical behaviours. However, it is still a challenge to explicitly construct such a landscape, and apply it to quantify interesting evolutionary processes. This is particularly true for neutral evolution. In this work, the authors study one-dimensional Wright Fisher process, and analytically obtain an adaptive landscape as a potential function. They provide the complete characterisation for dynamical behaviours of all possible mutation rates under the influence of mutation and random drift. This same analysis has been applied to situations with additive selection and random drift for all possible selection rates. The critical state dividing the basins of two stable states is directly obtained by the landscape. In addition, the landscape is able to handle situations with pure random drift, which would be non-normalisable for its stationary distribution. The nature of non-normalisation is from the singularity of adaptive landscape. In addition, they propose a new type of neutral evolution. It has the same probability for all possible states. The new type of neutral evolution describes the non-neutral alleles with 0%. They take the equal effect of mutation and random drift as an example.

    View details for DOI 10.1049/iet-syb.2012.0058

    View details for Web of Science ID 000326458200005

    View details for PubMedID 24067415
