Bio

Professional Education


  • Doctor of Philosophy, Tsinghua University (2017)
  • Bachelor of Engineering, Tsinghua University (2011)

Publications

All Publications


  • Assessing Statewide All-Cause Future One-Year Mortality: Prospective Study With Implications for Quality of Life, Resource Utilization, and Medical Futility. Journal of medical Internet research Guo, Y., Zheng, G., Fu, T., Hao, S., Ye, C., Zheng, L., Liu, M., Xia, M., Jin, B., Zhu, C., Wang, O., Wu, Q., Culver, D. S., Alfreds, S. T., Stearns, F., Kanov, L., Bhatia, A., Sylvester, K. G., Widen, E., McElhinney, D. B., Ling, X. B. 2018; 20 (6): e10311

    Abstract

    For many elderly patients, a disproportionate amount of health care resources and expenditures is spent during the last year of life, despite the discomfort and reduced quality of life associated with many aggressive medical approaches. However, few prognostic tools have focused on predicting all-cause 1-year mortality among elderly patients at a statewide level, an issue that has implications for improving quality of life while distributing scarce resources fairly. Using data from a statewide elderly population (aged ≥65 years), we sought to prospectively validate an algorithm to identify patients at risk for dying in the next year for the purpose of minimizing decision uncertainty, improving quality of life, and reducing futile treatment. Analysis was performed using electronic medical records from the Health Information Exchange in the state of Maine, which covered records of nearly 95% of the statewide population. The model was developed from 125,896 patients aged at least 65 years who were discharged from any care facility in the Health Information Exchange network from September 5, 2013, to September 4, 2015. Validation was conducted using 153,199 patients with the same inclusion and exclusion criteria from September 5, 2014, to September 4, 2016. Patients were stratified into risk groups. The association between all-cause 1-year mortality and risk factors was screened by chi-squared test and manually reviewed by 2 clinicians. We calculated risk scores for individual patients using a gradient tree-based boost algorithm, which measured the probability of mortality within the next year based on the preceding 1-year clinical profile. The development sample included 125,896 patients (72,572 women, 57.64%; mean 74.2 [SD 7.7] years). The final validation cohort included 153,199 patients (88,177 women, 57.56%; mean 74.3 [SD 7.8] years). The c-statistic for discrimination was 0.96 (95% CI 0.93-0.98) in the development group and 0.91 (95% CI 0.90-0.94) in the validation cohort. The mortality was 0.99% in the low-risk group, 16.75% in the intermediate-risk group, and 72.12% in the high-risk group. A total of 99 independent risk factors for mortality were identified (reported as odds ratios; 95% CI). Age was at the top of the list (1.41; 1.06-1.48); congestive heart failure (20.90; 15.41-28.08) and different tumor sites were also recognized as driving risk factors, such as cancer of the ovaries (14.42; 2.24-53.04), colon (14.07; 10.08-19.08), and stomach (13.64; 3.26-86.57). Disparities were also found in patients' social determinants like respiratory hazard index (1.24; 0.92-1.40) and unemployment rate (1.18; 0.98-1.24). Among high-risk patients who expired in our dataset, cerebrovascular accident, amputation, and type 1 diabetes were the top 3 diseases in terms of average cost in the last year of life. Our study prospectively validated an accurate 1-year risk prediction model and stratification for the elderly population (≥65 years) at risk of mortality with statewide electronic medical record datasets. It should be a valuable adjunct for helping patients to make better quality-of-life choices and alerting caregivers to target high-risk elderly for appropriate care and discussions, thus cutting back on futile treatment.

    View details for PubMedID 29866643

  • Web-based Real-Time Case Finding for the Population Health Management of Patients With Diabetes Mellitus: A Prospective Validation of the Natural Language Processing-Based Algorithm With Statewide Electronic Medical Records. JMIR medical informatics Zheng, L., Wang, Y., Hao, S., Shin, A. Y., Jin, B., Ngo, A. D., Jackson-Browne, M. S., Feller, D. J., Fu, T., Zhang, K., Zhou, X., Zhu, C., Dai, D., Yu, Y., Zheng, G., Li, Y., McElhinney, D. B., Culver, D. S., Alfreds, S. T., Stearns, F., Sylvester, K. G., Widen, E., Ling, X. B. 2016; 4 (4)

    Abstract

    Diabetes case finding based on structured medical records does not fully identify diabetic patients whose medical histories related to diabetes are available in the form of free text. Manual chart reviews have been used but involve high labor costs and long latency. This study developed and tested a Web-based diabetes case finding algorithm using both structured and unstructured electronic medical records (EMRs). This study was based on the health information exchange (HIE) EMR database that covers almost all health facilities in the state of Maine, United States. Using narrative clinical notes, a Web-based natural language processing (NLP) case finding algorithm was retrospectively (July 1, 2012, to June 30, 2013) developed with a random subset of HIE-associated facilities, which was then blind tested with the remaining facilities. The NLP-based algorithm was subsequently integrated into the HIE database and validated prospectively (July 1, 2013, to June 30, 2014). Of the 935,891 patients in the prospective cohort, 64,168 diabetes cases were identified using diagnosis codes alone. Our NLP-based case finding algorithm prospectively found an additional 5756 uncodified cases (5756/64,168, 8.97% increase) with a positive predictive value of .90. Of the 21,720 diabetic patients identified by both methods, 6616 patients (6616/21,720, 30.46%) were identified by the NLP-based algorithm before a diabetes diagnosis was noted in the structured EMR (mean time difference = 48 days). The online NLP algorithm was effective in identifying uncodified diabetes cases in real time, leading to a significant improvement in diabetes case finding. The successful integration of the NLP-based case finding algorithm into the Maine HIE database indicates a strong potential for application of this novel method to achieve a more complete ascertainment of diagnoses of diabetes mellitus.

    View details for PubMedID 27836816

  • NLP based congestive heart failure case finding: A prospective analysis on statewide electronic medical records. International journal of medical informatics Wang, Y., Luo, J., Hao, S., Xu, H., Shin, A. Y., Jin, B., Liu, R., Deng, X., Wang, L., Zheng, L., Zhao, Y., Zhu, C., Hu, Z., Fu, C., Hao, Y., Zhao, Y., Jiang, Y., Dai, D., Culver, D. S., Alfreds, S. T., Todd, R., Stearns, F., Sylvester, K. G., Widen, E., Ling, X. B. 2015; 84 (12): 1039-1047

    Abstract

    In order to proactively manage congestive heart failure (CHF) patients, an effective CHF case finding algorithm is required to process both structured and unstructured electronic medical records (EMR) to allow complementary and cost-efficient identification of CHF patients. We set out to identify CHF cases from both EMR codified and natural language processing (NLP) found cases. Using narrative clinical notes from all Maine Health Information Exchange (HIE) patients, the NLP case finding algorithm was retrospectively (July 1, 2012, to June 30, 2013) developed with a random subset of HIE-associated facilities, and blind-tested with the remaining facilities. The NLP-based method was integrated into a live HIE population exploration system and validated prospectively (July 1, 2013, to June 30, 2014). A total of 18,295 codified CHF patients were included in the Maine HIE. Among the 253,803 subjects without CHF codings, our case finding algorithm prospectively identified 2411 uncodified CHF cases. The positive predictive value (PPV) is 0.914, and 70.1% of these 2411 cases were found to have CHF histories in the clinical notes. A CHF case finding algorithm was developed, tested and prospectively validated. The successful integration of the CHF case finding algorithm into the Maine HIE live system is expected to improve CHF care in Maine.

    View details for DOI 10.1016/j.ijmedinf.2015.06.007

    View details for PubMedID 26254876

  • Development, Validation and Deployment of a Real Time 30 Day Hospital Readmission Risk Assessment Tool in the Maine Healthcare Information Exchange. PLOS ONE Hao, S., Wang, Y., Jin, B., Shin, A. Y., Zhu, C., Huang, M., Zheng, L., Luo, J., Hu, Z., Fu, C., Dai, D., Wang, Y., Culver, D. S., Alfreds, S. T., Rogow, T., Stearns, F., Sylvester, K. G., Widen, E., Ling, X. B. 2015; 10 (10)

    Abstract

    Identifying patients at risk of a 30-day readmission can help providers design interventions and provide targeted care to improve clinical effectiveness. This study developed a risk model to predict a 30-day inpatient hospital readmission for patients in Maine, across all payers, all diseases and all demographic groups. Our objective was to develop a model to determine the risk for inpatient hospital readmission within 30 days post discharge. All patients within the Maine Health Information Exchange (HIE) system were included. The model was retrospectively developed on inpatient encounters between January 1, 2012 and December 31, 2012 from 24 randomly chosen hospitals, and then prospectively validated on inpatient encounters from January 1, 2013 to December 31, 2013 using all HIE patients. A risk assessment tool partitioned the entire HIE population into subgroups that corresponded to probability of hospital readmission as determined by a corresponding positive predictive value (PPV). An overall model c-statistic of 0.72 was achieved. The total 30-day readmission rates in low (score of 0-30), intermediate (score of 30-70) and high (score of 70-100) risk groupings were 8.67%, 24.10% and 74.10%, respectively. A time to event analysis revealed that the higher risk groups were readmitted to a hospital earlier than the lower risk groups. Six high-risk patient subgroup patterns were revealed through unsupervised clustering. Our model was successfully integrated into the statewide HIE to identify patient readmission risk upon admission and daily during hospitalization or for 30 days subsequently, providing daily risk score updates. The risk model was validated as an effective tool for predicting 30-day readmissions for patients across all payer, disease and demographic groups within the Maine HIE. Exposing the key clinical, demographic and utilization profiles driving each patient's risk of readmission score may be useful to providers in developing individualized post discharge care plans.

    View details for DOI 10.1371/journal.pone.0140271

    View details for Web of Science ID 000362511000113

    View details for PubMedID 26448562

  • Online Prediction of Health Care Utilization in the Next Six Months Based on Electronic Health Record Information: A Cohort and Validation Study. Journal of medical Internet research Hu, Z., Hao, S., Jin, B., Shin, A. Y., Zhu, C., Huang, M., Wang, Y., Zheng, L., Dai, D., Culver, D. S., Alfreds, S. T., Rogow, T., Stearns, F., Sylvester, K. G., Widen, E., Ling, X. 2015; 17 (9)

    Abstract

    The increasing rate of health care expenditures in the United States has placed a significant burden on the nation's economy. Predicting future health care utilization of patients can provide useful information to better understand and manage overall health care deliveries and clinical resource allocation. This study developed an electronic medical record (EMR)-based online risk model predictive of resource utilization for patients in Maine in the next 6 months across all payers, all diseases, and all demographic groups. In HealthInfoNet, Maine's health information exchange (HIE), a retrospective cohort of 1,273,114 patients was constructed with the preceding 12-month EMR. Each patient's next 6-month (between January 1, 2013 and June 30, 2013) health care resource utilization was retrospectively scored ranging from 0 to 100 and a decision tree-based predictive model was developed. Our model was later integrated in the Maine HIE population exploration system to allow a prospective validation analysis of 1,358,153 patients by forecasting their next 6-month risk of resource utilization between July 1, 2013 and December 31, 2013. Prospectively predicted risks, on either an individual level or a population (per 1000 patients) level, were consistent with the next 6-month resource utilization distributions and the clinical patterns at the population level. Results demonstrated a strong correlation between care resource utilization and our risk scores, supporting the effectiveness of our model. With the online population risk monitoring enterprise dashboards, the effectiveness of the predictive algorithm has been validated by clinicians and caregivers in the State of Maine. The model and associated online applications were designed for tracking the evolving nature of total population risk, in a longitudinal manner, for health care resource utilization. It will enable more effective care management strategies driving improved patient outcomes.

    View details for DOI 10.2196/jmir.4976

    View details for Web of Science ID 000361809800005

    View details for PubMedID 26395541

  • Cerebrospinal fluid protein dynamic driver network: At the crossroads of brain tumorigenesis. Methods Tan, Z., Liu, R., Zheng, L., Hao, S., Fu, C., Li, Z., Deng, X., Jang, T., Merchant, M., Whitin, J. C., Guo, M., Cohen, H. J., Recht, L., Ling, X. B. 2015; 83: 36-43

    Abstract

    To get a better understanding of the ongoing in situ environmental changes preceding brain tumorigenesis, we assessed cerebrospinal fluid (CSF) proteome profile changes in a glioma rat model in which brain tumor invariably developed after a single in utero exposure to the neurocarcinogen ethylnitrosourea (ENU). Computationally, the CSF proteome profile dynamics during tumorigenesis can be modeled as non-smooth or even abrupt state changes. Such brain tumor environment transition analysis, correlating the CSF composition changes with the development of early cellular hyperplasia, can reveal the pathogenesis process at the network level during a time before the image detection of the tumors. In our controlled rat model study, matched ENU- and saline-exposed rats' CSF proteomics changes were quantified at approximately 30, 60, 90, 120, and 150 days of age (P30, P60, P90, P120, P150). We applied our transition-based network entropy (TNE) method to compute the CSF proteome changes in the ENU rat model and test the hypothesis of the critical transition state prior to impending hyperplasia. Our analysis identified a dynamic driver network (DDN) of CSF proteins related to the emerging tumorigenesis progressing from the non-hyperplasia state. The DDN-associated leading network CSF proteins can allow the early detection of such dynamics before the catastrophic shift to the clear clinical landmarks in gliomas. Future characterization of the critical transition state (P60) during brain tumor progression may reveal the underlying pathophysiology to devise novel therapeutics preventing tumor formation. More detailed method and information are accessible through our website at http://translationalmedicine.stanford.edu.

    View details for DOI 10.1016/j.ymeth.2015.05.004

    View details for Web of Science ID 000358755100005