Dr. Palacios seeks to provide statistically rigorous answers to concrete, data-driven questions in evolutionary genetics and public health. This research involves probabilistic modeling of evolutionary forces and the development of computationally tractable methods that are applicable to big data problems. Past and current research relies heavily on the theory of stochastic processes, Bayesian nonparametrics, and recent developments in machine learning and statistical theory for big data.

Academic Appointments

  • Assistant Professor, Statistics
  • Assistant Professor, Biomedical Data Science
  • Member, Bio-X


All Publications

  • Quantifying and Mitigating the Effect of Preferential Sampling on Phylodynamic Inference. PLoS computational biology Karcher, M. D., Palacios, J. A., Bedford, T., Suchard, M. A., Minin, V. N. 2016; 12 (3)


    Phylodynamics seeks to estimate effective population size fluctuations from molecular sequences of individuals sampled from a population of interest. One way to accomplish this task formulates an observed sequence data likelihood exploiting a coalescent model for the sampled individuals' genealogy and then integrating over all possible genealogies via Monte Carlo or, less efficiently, by conditioning on one genealogy estimated from the sequence data. However, when analyzing sequences sampled serially through time, current methods implicitly assume either that sampling times are fixed deterministically by the data collection protocol or that their distribution does not depend on the size of the population. Through simulation, we first show that, when sampling times do probabilistically depend on effective population size, estimation methods may be systematically biased. To correct for this deficiency, we propose a new model that explicitly accounts for preferential sampling by modeling the sampling times as an inhomogeneous Poisson process dependent on effective population size. We demonstrate that in the presence of preferential sampling our new model not only reduces bias, but also improves estimation precision. Finally, we compare the performance of the currently used phylodynamic methods with our proposed model through clinically-relevant, seasonal human influenza examples.

    View details for DOI 10.1371/journal.pcbi.1004789

    View details for PubMedID 26938243

  • An efficient Bayesian inference framework for coalescent-based nonparametric phylodynamics BIOINFORMATICS Lan, S., Palacios, J. A., Karcher, M., Minin, V. N., Shahbaba, B. 2015; 31 (20): 3282-3289


    The field of phylodynamics focuses on the problem of reconstructing population size dynamics over time using current genetic samples taken from the population of interest. This technique has been extensively used in many areas of biology but is particularly useful for studying the spread of quickly evolving infectious disease agents, e.g. influenza virus. Phylodynamic inference uses a coalescent model that defines a probability density for the genealogy of randomly sampled individuals from the population. When we assume that such a genealogy is known, the coalescent model, equipped with a Gaussian process prior on population size trajectory, allows for nonparametric Bayesian estimation of population size dynamics. Although this approach is quite powerful, large datasets collected during infectious disease surveillance challenge the state of the art of Bayesian phylodynamics and demand inferential methods with relatively low computational cost. To satisfy this demand, we provide a computationally efficient Bayesian inference framework based on Hamiltonian Monte Carlo for coalescent process models. Moreover, we show that by splitting the Hamiltonian function, we can further improve the efficiency of this approach. Using several simulated and real datasets, we show that our method provides accurate estimates of population size dynamics and is substantially faster than alternative methods based on the elliptical slice sampler and the Metropolis-adjusted Langevin algorithm. The R code for all simulation studies and real data analysis conducted in this article is publicly available at ∼slan/lanzi/CODES.html and in the R package phylodyn. Contact: babaks@uci.edu. Supplementary data are available at Bioinformatics online.

    View details for DOI 10.1093/bioinformatics/btv378

    View details for Web of Science ID 000362846600007

    View details for PubMedID 26093147

  • Bayesian Nonparametric Inference of Population Size Changes from Sequential Genealogies GENETICS Palacios, J. A., Wakeley, J., Ramachandran, S. 2015; 201 (1): 281-?


    Sophisticated inferential tools coupled with the coalescent model have recently emerged for estimating past population sizes from genomic data. Recent methods that model recombination require small sample sizes, make constraining assumptions about population size changes, and do not report measures of uncertainty for estimates. Here, we develop a Gaussian process-based Bayesian nonparametric method coupled with a sequentially Markov coalescent model that allows accurate inference of population sizes over time from a set of genealogies. In contrast to current methods, our approach considers a broad class of recombination events, including those that do not change local genealogies. We show that our method outperforms recent likelihood-based methods that rely on discretization of the parameter space. We illustrate the application of our method to multiple demographic histories, including population bottlenecks and exponential growth. In simulation, our Bayesian approach produces point estimates four times more accurate than maximum-likelihood estimation (based on the sum of absolute differences between the truth and the estimated values). Further, our method's credible intervals for population size as a function of time cover 90% of true values across multiple demographic scenarios, enabling formal hypothesis testing about population size differences over time. Using genealogies estimated with ARGweaver, we apply our method to European and Yoruban samples from the 1000 Genomes Project and confirm key known aspects of population size history over the past 150,000 years.

    View details for DOI 10.1534/genetics.115.177980

    View details for Web of Science ID 000361206400021

    View details for PubMedID 26224734

  • Phylogeography of the Trans-Volcanic bunchgrass lizard (Sceloporus bicanthalis) across the highlands of south-eastern Mexico BIOLOGICAL JOURNAL OF THE LINNEAN SOCIETY Leache, A. D., Palacios, J. A., Minin, V. N., Bryson, R. W. 2013; 110 (4): 852-865

    View details for DOI 10.1111/bij.12172

    View details for Web of Science ID 000330183200012

  • Gaussian Process-Based Bayesian Nonparametric Inference of Population Size Trajectories from Gene Genealogies BIOMETRICS Palacios, J. A., Minin, V. N. 2013; 69 (1): 8-18


    Changes in population size influence genetic diversity of the population and, as a result, leave a signature of these changes in individual genomes in the population. We are interested in the inverse problem of reconstructing past population dynamics from genomic data. We start with a standard framework based on the coalescent, a stochastic process that generates genealogies connecting randomly sampled individuals from the population of interest. These genealogies serve as a glue between the population demographic history and genomic sequences. It turns out that only the times of genealogical lineage coalescences contain information about population size dynamics. Viewing these coalescent times as a point process, estimating population size trajectories is equivalent to estimating a conditional intensity of this point process. Therefore, our inverse problem is similar to estimating an inhomogeneous Poisson process intensity function. We demonstrate how recent advances in Gaussian process-based nonparametric inference for Poisson processes can be extended to Bayesian nonparametric estimation of population size dynamics under the coalescent. We compare our Gaussian process (GP) approach to one of the state-of-the-art Gaussian Markov random field (GMRF) methods for estimating population trajectories. Using simulated data, we demonstrate that our method has better accuracy and precision. Next, we analyze two genealogies reconstructed from real sequences of hepatitis C and human Influenza A viruses. In both cases, we recover more believed aspects of the viral demographic histories than the GMRF approach. We also find that our GP method produces more reasonable uncertainty estimates than the GMRF method.

    View details for DOI 10.1111/biom.12003

    View details for Web of Science ID 000317303500003

    View details for PubMedID 23409705
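
The abstracts above repeatedly rely on one fact: under the coalescent, the times between lineage coalescences carry the information about population size. The sketch below illustrates that idea in the simplest possible setting and is not the Gaussian process method of the papers. It assumes a constant effective population size, all individuals sampled at the same time, and time measured in units where k lineages coalesce at rate C(k,2)/N_e. The function names are hypothetical and chosen only for this illustration.

```python
import math
import random


def simulate_coalescent_intervals(n, ne, rng):
    """Draw the n-1 waiting times between coalescent events.

    Assumes n lineages sampled at time 0 and a constant effective
    population size `ne`. While k lineages remain, the waiting time
    to the next coalescence is exponential with rate C(k, 2) / ne.
    """
    intervals = []
    for k in range(n, 1, -1):
        rate = math.comb(k, 2) / ne
        intervals.append(rng.expovariate(rate))
    return intervals


def log_likelihood(intervals, ne):
    """Log-likelihood of the waiting times under a constant size `ne`."""
    n = len(intervals) + 1
    total = 0.0
    for k, w in zip(range(n, 1, -1), intervals):
        rate = math.comb(k, 2) / ne
        total += math.log(rate) - rate * w
    return total


def mle_constant_ne(intervals):
    """Closed-form maximum-likelihood estimate of a constant size."""
    n = len(intervals) + 1
    weighted = sum(
        math.comb(k, 2) * w for k, w in zip(range(n, 1, -1), intervals)
    )
    return weighted / (n - 1)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    waits = simulate_coalescent_intervals(n=50, ne=1000.0, rng=rng)
    print("estimated constant size:", mle_constant_ne(waits))
```

Replacing the constant `ne` with a function of time turns the per-interval rate into an integral of C(k,2)/N_e(t), which is the point-process view that the nonparametric methods listed above build on.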