Publications

Journal Articles


  • Breast cancer treatment across health care systems: linking electronic medical records and state registry data to enable outcomes research. Cancer Kurian, A. W., Mitani, A., Desai, M., Yu, P. P., Seto, T., Weber, S. C., Olson, C., Kenkare, P., Gomez, S. L., de Bruin, M. A., Horst, K., Belkora, J., May, S. G., Frosch, D. L., Blayney, D. W., Luft, H. S., Das, A. K. 2014; 120 (1): 103-111

    Abstract

    Understanding of cancer outcomes is limited by data fragmentation. In the current study, the authors analyzed the information yielded by integrating breast cancer data from 3 sources: electronic medical records (EMRs) from 2 health care systems and the state registry. Diagnostic test and treatment data were extracted from the EMRs of all patients with breast cancer treated between 2000 and 2010 in 2 independent California institutions: a community-based practice (Palo Alto Medical Foundation; "Community") and an academic medical center (Stanford University; "University"). The authors incorporated records from the population-based California Cancer Registry and then linked EMR-California Cancer Registry data sets of Community and University patients. The authors initially identified 8210 University patients and 5770 Community patients; linked data sets revealed a 16% patient overlap, yielding 12,109 unique patients. The percentage of all Community patients, but not University patients, treated at both institutions increased with worsening cancer prognostic factors. Before linking the data sets, Community patients appeared to receive less intervention than University patients (mastectomy: 37.6% vs 43.2%; chemotherapy: 35% vs 41.7%; magnetic resonance imaging: 10% vs 29.3%; and genetic testing: 2.5% vs 9.2%). Linked Community and University data sets revealed that patients treated at both institutions received substantially more interventions (mastectomy: 55.8%; chemotherapy: 47.2%; magnetic resonance imaging: 38.9%; and genetic testing: 10.9% [P < .001 for each 3-way institutional comparison]). Data linkage identified 16% of patients who were treated in 2 health care systems and who, despite comparable prognostic factors, received far more intensive treatment than others. By integrating complementary data from EMRs and population-based registries, a more comprehensive understanding of breast cancer care and factors that drive treatment use was obtained.

    View details for DOI 10.1002/cncr.28395

    View details for PubMedID 24101577

  • Systematic identification of risk factors for Alzheimer's disease through shared genetic architecture and electronic medical records. Pacific Symposium on Biocomputing Li, L., Ruau, D., Chen, R., Weber, S., Butte, A. J. 2013: 224-235

    Abstract

    Alzheimer's disease (AD) is one of the leading causes of death for older people in the US, with rapidly increasing incidence. AD irreversibly and progressively damages the brain, but there are treatments in clinical trials to potentially slow the development of AD. We hypothesize that the presence of clinical traits, sharing common genetic variants with AD, could be used as a non-invasive means to predict AD or trigger for administration of preventative therapeutics. We developed a method to compare the genetic architecture between AD and traits from prior GWAS studies. Six clinical traits were significantly associated with AD, capturing 5 known risk factors and 1 novel association: erythrocyte sedimentation rate (ESR). The association of ESR with AD was then validated using Electronic Medical Records (EMR) collected from Stanford Hospital and Clinics. We found that female patients and with abnormally elevated ESR were significantly associated with higher risk of AD diagnosis (OR: 1.85 [1.32-2.61], p=0.003), within 1 year prior to AD diagnosis (OR: 2.31 [1.06-5.01], p=0.032), and within 1 year after AD diagnosis (OR: 3.49 [1.93-6.31], p<0.0001). Additionally, significantly higher ESR values persist for all time courses analyzed. Our results suggest that ESR should be tested in a specific longitudinal study for association with AD diagnosis, and if positive, could be used as a prognostic marker.

    View details for PubMedID 23424127

  • A simple heuristic for blindfolded record linkage JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION Weber, S. C., Lowe, H., Das, A., Ferris, T. 2012; 19 (E1): E157-E161

    Abstract

    To address the challenge of balancing privacy with the need to create cross-site research registry records on individual patients, while matching the data for a given patient as he or she moves between participating sites. To evaluate the strategy of generating anonymous identifiers based on real identifiers in such a way that the chances of a shared patient being accurately identified were maximized, and the chances of incorrectly joining two records belonging to different people were minimized. Our hypothesis was that most variation in names occurs after the first two letters, and that date of birth is highly reliable, so a single match variable consisting of a hashed string built from the first two letters of the patient's first and last names plus their date of birth would have the desired characteristics. We compared and contrasted the match algorithm characteristics (rate of false positive v. rate of false negative) for our chosen variable against both Social Security Numbers and full names. In a data set of 19 000 records, a derived match variable consisting of a 2-character prefix from both first and last names combined with date of birth has a 97% sensitivity; by contrast, an anonymized identifier based on the patient's full names and date of birth has a sensitivity of only 87% and SSN has sensitivity 86%. The approach we describe is most useful in situations where privacy policies preclude the full exchange of the identifiers required by more sophisticated and sensitive linkage algorithms. For data sets of sufficiently high quality this effective approach, while producing a lower rate of matching than more complex algorithms, has the merit of being easy to explain to institutional review boards, adheres to the minimum necessary rule of the HIPAA privacy rule, and is faster and less cumbersome to implement than a full probabilistic linkage.

    View details for DOI 10.1136/amiajnl-2011-000329

    View details for Web of Science ID 000314151400026

    View details for PubMedID 22298567

  • Clinical Research Alerting for Early Septic Shock Detection 2012 Summit on Clinical Research Informatics S. Weber, H. Lowe, S. Malunjkar, V. Ojha, R. Pearl, J. Quinn 2012
  • Oncoshare: Lessons learned from building an integrated multi-institutional database for comparative effectiveness research Proceedings of the AMIA 2012 Annual Symposium SC Weber, T Seto, C Olson, P Kenkare, A Kurian, A Das 2012
  • Use of RxNorm and SNOMED-CT® to Support the Use of Medication Information in Research Patient Cohort Searching AMIA Annu Symp Proc Hernandez P, Podchiyska T, Ferris TA, Weber S, Lowe HJ 2011: 106
  • Hash-Based Algorithmic Linkage of Patient Records in De-identified Multi-site Patient Research Registries AMIA Annu Symp Proc Weber S, Lowe HJ, Olson G, Seto T, Ferris TA, Das A, Kurian A, Olson C, Kenkare P 2011; CRI: 81
  • A Model for Efficient Review of Clinical Data by Researchers within the STRIDE Clinical Data Warehouse AMIA Annu Symp Proc Lowe HJ, Weber S, Ramamoorthy N, Ferris TA, Hernandez P 2011; CRI: 34
  • Implementing a Real-time Complex Event Stream Processing System to Help Identify Potential Participants in Clinical and Translational Research Studies AMIA Annu Symp Proc Weber S, Lowe HJ, Malunjkar S, Quinn J 2010: 472
  • Managing Medical Vocabulary Updates in a Clinical Data Warehouse: An RxNorm Case Study. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Podchiyska, T., Hernandez, P., Ferris, T., Weber, S., Lowe, H. J. 2010; 2010: 477-481

    Abstract

    Use of terminology standards facilitates aggregating data from multiple sources for information retrieval, exchange and analysis. However, medical vocabularies are continuously updated and incorporating those changes consistently into clinical data warehouses requires rigorous methodology. To integrate pharmacy data from two hospital pharmacy information systems the Stanford Translational Research Integrated Database Environment (STRIDE) project mapped medication orders to RxNorm content using the RxNorm drug model. In order to keep the data relevant and up-to-date, we developed a strategy for updating to RxNorm, while preserving the original meaning and mapping of the legacy data. This case study discusses managing the vocabulary update by following the RxNorm content maintenance strategy and supplementing it with operations to retain access to its drug model information.

    View details for PubMedID 21347024

  • Self-Service Support for Research Patient Cohort Identification and Review of Clinical Data in the STRIDE Clinical Data Warehouse AMIA Annu Symp Proc Lowe HJ, Weber S, Ferris T, Hernandez P 2010; CRI
  • Automated mapping of pharmacy orders from two electronic health record systems to RxNorm within the STRIDE clinical data warehouse. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Hernandez, P., Podchiyska, T., Weber, S., Ferris, T., Lowe, H. 2009; 2009: 244-248

    Abstract

    The Stanford Translational Research Integrated Database Environment (STRIDE) clinical data warehouse integrates medication information from two Stanford hospitals that use different drug representation systems. To merge this pharmacy data into a single, standards-based model supporting research we developed an algorithm to map HL7 pharmacy orders to RxNorm concepts. A formal evaluation of this algorithm on 1.5 million pharmacy orders showed that the system could accurately assign pharmacy orders in over 96% of cases. This paper describes the algorithm and discusses some of the causes of failures in mapping to RxNorm.

    View details for PubMedID 20351858

  • STRIDE--An integrated standards-based translational research informatics platform. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Lowe, H. J., Ferris, T. A., Hernandez, P. M., Weber, S. C. 2009; 2009: 391-395

    Abstract

    STRIDE (Stanford Translational Research Integrated Database Environment) is a research and development project at Stanford University to create a standards-based informatics platform supporting clinical and translational research. STRIDE consists of three integrated components: a clinical data warehouse, based on the HL7 Reference Information Model (RIM), containing clinical information on over 1.3 million pediatric and adult patients cared for at Stanford University Medical Center since 1995; an application development framework for building research data management applications on the STRIDE platform and a biospecimen data management system. STRIDE's semantic model uses standardized terminologies, such as SNOMED, RxNorm, ICD and CPT, to represent important biomedical concepts and their relationships. The system is in daily use at Stanford and is an important component of Stanford University's CTSA (Clinical and Translational Science Award) Informatics Program.

    View details for PubMedID 20351886

  • Novel integration of hospital electronic medical records and gene expression measurements to identify genetic markers of maturation. Pacific Symposium on Biocomputing Chen, D. P., Weber, S. C., Constantinou, P. S., Ferris, T. A., Lowe, H. J., Butte, A. J. 2008: 243-254

    Abstract

    Traditionally, the elucidation of genes involved in maturation and aging has been studied in a temporal fashion by examining gene expression at different time points in an organism's life as well as by knocking out, knocking in, and mutating genes thought to be involved. Here, we propose an in silico method to combine clinical electronic medical record (EMR) data and gene expression measurements in the context of disease to identify genes that may be involved in the process of human maturation and aging. First we show that absolute lymphocyte count may serve as a biomarker for maturation by using statistical methods to compare trends among different clinical laboratory tests in response to an increase in age. We then propose using the rate of decay for absolute lymphocyte count across 12 diseases as a proxy for differences in aging. We correlate the differing rates with gene expression across the same diseases to find maturation/aging related genes. Among the 53 genes with strongest correlations between expression profile and change in rate of decay, we found genes previously implicated in the process of aging, including MGMT (DNA repair), TERF2 (telomere stability), POLD1 (DNA replication and repair), and POLG (mtDNA replication).

    View details for PubMedID 18229690

  • Clinical arrays of laboratory measures, or "clinarrays", built from an electronic health record enable disease subtyping by severity. AMIA ... Annual Symposium proceedings / AMIA Symposium. AMIA Symposium Chen, D. P., Weber, S. C., Constantinou, P. S., Ferris, T. A., Lowe, H. J., Butte, A. J. 2007: 115-119

    Abstract

    The severity of diseases has often been assigned by direct observation of a patient and by pathological examination after symptoms have appeared. As we move into the genomic era, the ability to predict disease severity prior to manifestation has improved dramatically due to genomic sequencing and analysis of gene expression microarrays. However, as the severity of diseases can be exacerbated by non genetic factors, the ability to predict disease severity by examining gene expression alone may be inadequate. We propose the creation of a "clinarray" to examine phenotypic expression in the form of clinical laboratory measurements. We demonstrate that the clinarray can be used to distinguish between the severities of patients with cystic fibrosis and those with Crohn's disease by applying unsupervised clustering methods that have been previously applied to microarrays.

    View details for PubMedID 18693809
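
The match variable described in the abstract of "A simple heuristic for blindfolded record linkage" above (a hashed string built from the first two letters of the first and last names plus the date of birth) might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the function name `match_key`, the choice of SHA-256, the upper-casing and whitespace trimming, the ISO date format, and the optional `salt` parameter are all assumptions that the abstract does not specify.

```python
import hashlib
from datetime import date


def match_key(first_name: str, last_name: str, dob: date, salt: str = "") -> str:
    """Return a hashed match variable (hypothetical helper, not from the paper).

    Assumptions: names are trimmed and upper-cased, the date of birth is
    rendered as YYYY-MM-DD, and SHA-256 is used as the hash function.
    """
    # Keep only the first two letters of each name, as the abstract describes.
    prefix = (first_name.strip()[:2] + last_name.strip()[:2]).upper()
    raw = f"{salt}{prefix}{dob.isoformat()}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Spelling variants after the second letter yield the same key.
a = match_key("Katherine", "Smith", date(1970, 3, 14))
b = match_key("Kathy", "Smyth", date(1970, 3, 14))
# A different first-name prefix with the same date of birth yields another key.
c = match_key("Robert", "Smith", date(1970, 3, 14))

print(a == b)
print(a == c)
```

In a sketch like this, sites would exchange only the hashed keys rather than the underlying identifiers. The example also makes the trade-off named in the abstract visible: two different people who share both name prefixes and a date of birth would receive the same key.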

Conference Proceedings


  • Stanford-NIH Pain Registry: Open source platform for large-scale longitudinal assessment of clinical data and patient-reported outcomes American Academy of Pain Medicine’s 30th Annual Meeting Kao, M., Cook, K., Olson, G., Pacht, T., Darnall, B., Weber, S., Mackey, S. 2014
  • Stanford-NIH Pain Registry: Catalyzing the rate-limiting step of big data psychometrics with item-response theory and advanced computerized adaptive testing American Academy of Pain Medicine’s 30th Annual Meeting Kao, M., Cook, K., Olson, G., Pacht, T., Darnall, B., Weber, S. C., Mackey, S. 2014
  • Row-Filtering with Dynamic SQL in Support of Compliant EHR Data Marts AMIA CRI Summit on Clinical Research Informatics Weber, S., Srinivas, R., Zhou, V., Ferris, T., Lowe, H. 2013: 278
