# Courses

### Statistics PhD Core Courses

Students need to take the following classes:

**STATS 310ABC: Theory of Probability** (3 units)
Mathematical tools: asymptotics, metric spaces; measure and integration; Lp spaces; some Hilbert space theory. Probability: independence, Borel–Cantelli lemmas, almost sure and Lp convergence, weak and strong laws of large numbers. Weak convergence and characteristic functions; central limit theorems; local limit theorems; Poisson convergence. Stopping times, 0-1 laws, Kolmogorov consistency theorem. Uniform integrability. Radon–Nikodym theorem, branching processes, conditional expectation, discrete time martingales. Exchangeability. Large deviations. Laws of the iterated logarithm. Birkhoff’s and Kingman’s ergodic theorems. Recurrence, entropy. Infinitely divisible laws. Continuous time martingales, random walks and Brownian motion. Invariance principle. Markov and strong Markov property. Processes with stationary independent increments.

**STATS 300ABC: Theory of Statistics** (3 units)
Elementary decision theory; loss and risk functions, Bayes estimation; UMVU estimator, minimax estimators, shrinkage estimators. Hypothesis testing and confidence intervals: Neyman–Pearson theory; UMP tests and uniformly most accurate confidence intervals; use of unbiasedness and invariance to eliminate nuisance parameters. Large sample theory: basic convergence concepts; robustness; efficiency; contiguity, locally asymptotically normal experiments; convolution theorem; asymptotically UMP and maximin tests. Asymptotic theory of likelihood ratio and score tests. Rank permutation and randomization tests; jackknife, bootstrap, subsampling and other resampling methods. Further topics: sequential analysis, optimal experimental design, empirical processes with applications to statistics, Edgeworth expansions, density estimation, time series. Minimax, admissible procedures. Complete class theorems (“all” minimax or admissible procedures are “Bayes”), Bayes procedures, conjugate priors, hierarchical models. Bayesian nonparametrics: Dirichlet, tail free, Polya trees, Bayesian sieves. Inconsistency of Bayes rules.

**STATS 305: Introduction to Statistical Modeling** (3 units)
The linear model: simple linear regression, polynomial regression, multiple regression, ANOVA models; and, with some extensions, orthogonal series regression, wavelets, radial basis functions, and MARS. Topics: normal theory inference (tests, confidence intervals, power), related distributions (t, chi-square, F), numerical methods (QR, SVD), model selection/regularization (Cp, AIC, BIC), diagnostics of model inadequacy, and remedies including bootstrap inference and cross-validation. Emphasis is on problem sets involving substantial computations with data sets, including developing extensions of existing methods.

**STATS 306AB: Methods for Applied Statistics** (3 units)
Extension of the modeling techniques of STATS 305: binary and discrete response data and nonlinear least squares. Topics include regression, Poisson loglinear models, classification methods, and clustering. Unsupervised learning techniques in statistics, machine learning, and data mining.

### Coursework in Biomedical Sciences

Students need to take a total of at least 7 units from the following classes:

**GENE 203: Advanced Genetics** (4 units)
For graduate students in Bioscience programs; may be appropriate for graduate students in other programs. The genetic toolbox. Examples of analytic methods, genetic manipulation, genome analysis, and human genetics. Emphasis is on use of genetic tools in dissecting complex biological pathways, developmental processes, and regulatory systems. Faculty-led discussion sections with evaluation of papers.

**BIOC 205: Molecular Foundations of Medicine** (3 units)
Topics include DNA structure, replication, repair, and recombination; chromosome structure and function; gene expression, including mechanisms for regulating transcription and translation; and methods for manipulating DNA, RNA, and proteins. Patient presentations illustrate how molecular biology affects the practice of medicine.

**INDE 220: Human Health and Disease I** (3 units)
This course establishes the foundation for the Human Health and Disease block, which spans the spring quarter of the first-year medical curriculum through the winter quarter of year two.

**GENE 211: Genomics** (3 units)
Genome evolution, organization, and function; technical, computational, and experimental approaches; hands-on experience with representative computational tools used in genome science; and a beginning working knowledge of Perl.

**GENE 210: Genomics and Personalized Medicine** (2 units)
Student-initiated course. Principles of genetics underlying associations between genetic variants and disease susceptibility and drug response. Topics include: genetic and environmental risk factors for complex genetic disorders; design and interpretation of genome-wide association studies; pharmacogenetics; full genome sequencing for disease gene discovery; population structure and genetic ancestry; use of personal genetic information in clinical medicine; and ethical, legal, and social issues with personal genetic testing. Hands-on workshop making use of personal or publicly available genetic data.

Three of these suggested classes are required by the Masters of Medicine (MOM) program, which the Medical School opens to all University PhD students (with competitive admission). We encourage trainees to consider this opportunity. The Spectrum program for translational medicine may contribute funds to cover the extra tuition costs (relative to the second year of the MOM).

### Coursework in Biostatistics/Epidemiology

Students need to take a total of at least 6 units from the following classes:

**HRP 226: Advanced Epidemiologic and Clinical Research Methods** (3-4 units)
The principles of measurement, measures of effect, confounding, effect modification, and strategies for minimizing bias in clinical and epidemiologic studies.

**STATS 338: Topics in Biostatistics** (3 units)
Data monitoring and interim analysis of clinical trials. Design of Phase I, II, and III trials. Survival analysis. Longitudinal data analysis.

**STATS 315A: Modern Applied Statistics: Learning** (2-3 units)
Overview of supervised learning. Linear regression and related methods. Least angle regression and the lasso. Classification. Support vector machines (SVMs). Kernels and string kernels. Basis expansions and regularization. Generalized additive models. Kernel smoothing. Gaussian mixtures and the EM algorithm. Model assessment and selection: cross-validation and the bootstrap. Pathwise coordinate descent and the fused lasso. Sparse graphical models. Discrete graphical models.

### Coursework in Computational Biology/Bioinformatics/Statistical Genetics

Students need to take a total of at least 6 units from the following classes. The reader will note that this group contains a large number of courses with substantial overlap in content. Rather than making a selection a priori, we include all of these possibilities to maximize flexibility (taking into account that not all classes will be offered each year and that new classes may be developed). Advisors and program directors will help students decide which specific classes to take so that repetition is avoided.

**BIOC 218: Computational Molecular Biology** (3 units)
For molecular biologists and computer scientists. Representation and analysis of genomes, sequences, and proteins. Strengths and limitations of existing methods. Coursework performed on the web or using downloadable applications.

**BIOC 228: Computational Genomic Biology** (3 units)
Application of computational genomics methods to biological problems. Topics include: assembly of genomic sequences; genome databases; comparative genomics; gene discovery; gene expression analyses, including gene clustering by expression, transcription factor binding site discovery, metabolic pathway discovery, functional genomics, and gene and genome ontologies; and medical diagnostics using SNPs and gene expression. Recent papers from the literature and hands-on use of the methods.

**BIOMEDIN 214: Representations and Algorithms for Computational Molecular Biology** (3-4 units)
Topics: introduction to bioinformatics and computational biology, algorithms for alignment of biological sequences and structures, computing with strings, phylogenetic tree construction, hidden Markov models, Gibbs sampling, basic structural computations on proteins, protein structure prediction, protein threading techniques, homology modeling, molecular dynamics and energy minimization, statistical analysis of 3D biological data, integration of data sources, knowledge representation and controlled terminologies for molecular biology, microarray analysis, machine learning (clustering and classification), and natural language text processing.

**BIOMEDIN 217: Translational Bioinformatics** (4 units)
Analytic, storage, and interpretive methods to optimize the transformation of genetic, genomic, and biological data into diagnostics and therapeutics for medicine. Topics: access and utility of publicly available data sources; types of genome-scale measurements in molecular biology and genomic medicine; analysis of microarray data; analysis of polymorphisms, proteomics, and protein interactions; linking genome-scale data to clinical data and phenotypes; and new questions in biomedicine using bioinformatics. Case studies.

**CS 262: Computational Genomics** (3 units)
Applications of computer science to genomics, and concepts in genomics from a computer science point of view. Topics: dynamic programming, sequence alignments, hidden Markov models, Gibbs sampling, and probabilistic context-free grammars. Applications of these tools to sequence analysis: comparative genomics; DNA sequencing and assembly; genomic annotation of repeats, genes, and regulatory sequences; microarrays and gene expression; phylogeny and molecular evolution; and RNA structure.

**CS 273A: A Computational Tour of the Human Genome** (3 units)
Introduction to computational biology through an informatics exploration of the human genome. Topics include: genome sequencing (technologies, assembly, personalized sequencing); functional landscape (genes, gene regulation, repeats, RNA genes, epigenetics); and genome evolution (comparative genomics, ultraconservation, co-option). Additional topics may include population genetics, personalized genomics, and ancient DNA. Course includes primers on molecular biology, the UCSC Genome Browser, and text processing languages. Guest lectures from genomic researchers.

**CS 278: Systems Biology** (3 units)
Complex biological behaviors through the integration of computational modeling and molecular biology. Topics: reconstructing biological networks from high-throughput data and knowledge bases. Network properties. Computational modeling of network behaviors at the small and large scale. Using model predictions to guide an experimental program. Robustness, noise, and cellular variation.

**CS 279: Computational Methods for Analysis and Reconstruction of Biological Networks** (3 units)
Types of interactions, including regulatory (such as transcriptional, signaling, and chromatin modification), protein-protein, and genetic interactions. Biological network structure at scales such as single interactions, small subgraphs, and global organization. Methods for analyzing properties of biological networks. Techniques for reconstructing networks from biological data, including DNA/protein sequence motifs and sequence conservation; gene expression data; and physical binding data such as protein-DNA, protein-RNA, and protein-protein. Network dynamics and evolution.

**CS 374: Algorithms in Biology** (2-3 units)
Algorithms and computational models applied to molecular biology and genetics. Topics vary annually. Possible topics include biological sequence comparison, annotation of genes and other functional elements, molecular evolution, genome rearrangements, microarrays and gene regulation, protein folding and classification, molecular docking, RNA secondary structure, DNA computing, and self-assembly. May be repeated for credit.

**GENE 244: Introduction to Statistical Genetics** (3 units)
Statistical methods for analyzing human genetics studies of Mendelian disorders and common complex traits. Probable topics include: principles of population genetics; epidemiologic designs; familial aggregation; segregation analysis; linkage analysis; linkage-disequilibrium-based association mapping approaches; and genome-wide analysis based on high-throughput genotyping platforms.

**STATS 345: Computational Algorithms for Statistical Genetics** (2-3 units)
Computational algorithms for human genetics research. Topics include: permutation, bootstrap, expectation maximization, hidden Markov models, and Markov chain Monte Carlo. Rationales and techniques illustrated with existing implementations commonly used in population genetics research, disease association studies, and genomics analysis.

**STATS 366: Computational Biology** (2-3 units)
Methods to understand sequence alignments and phylogenetic trees built from molecular data, and general genetic data. Phylogenetic trees, median networks, microarray analysis, Bayesian statistics. Binary labeled trees as combinatorial objects, graphs, and networks. Distances between trees. Multivariate methods (PCA, CA, multidimensional scaling). Combining data, nonparametric inference. Algorithms used: branch and bound, dynamic programming, Markov chain approach to combinatorial optimization (simulated annealing, Markov chain Monte Carlo, approximate counting, exact tests). Software such as MATLAB, Phylip, Seq-gen, Arlequin, Puzzle, Splitstree, and XGobi.

**STATS 367: Statistical Models in Genetics** (2-3 units)
Stochastic models and related statistical problems in linkage analysis of qualitative and quantitative traits in humans and experimental populations; sequence alignment and analysis; and population genetics/evolution, both classical (Wright–Fisher–Kimura) and modern (Kingman coalescent). Computational algorithms as applications of dynamic programming, Markov chain Monte Carlo, and hidden Markov models. Prerequisites: knowledge of probability through elementary stochastic processes and statistics through likelihood theory.