Selen Bozkurt is a biomedical informatician and biostatistician at the Stanford University Center for Biomedical Informatics Research. She was previously a postdoctoral scholar in the Stanford Department of Biomedical Data Science. Her research focuses on health informatics using electronic health records, machine learning, and natural language processing. She has also worked as a biostatistician on several projects. She has been a member of the RSNA Radiology Reporting Committee since 2009. In her PhD dissertation, "A Real Time Decision Support System for Mammography Interpretations," she developed an automated system for deep information extraction from mammography reports and an approach for real-time decision support driven by analysis of dictated radiology reports.

Education & Certifications

  • PhD, Akdeniz University, Faculty of Medicine, Biostatistics and Medical Informatics
  • Visiting PhD Student, Stanford University, Biomedical Informatics
  • MSc, Akdeniz University, Faculty of Medicine, Biostatistics and Medical Informatics
  • BSc, Dokuz Eylul University, Statistics


All Publications

  • Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case LEARNING HEALTH SYSTEMS Bozkurt, S., Paul, R., Coquet, J., Sun, R., Banerjee, I., Brooks, J. D., Hernandez-Boussard, T. 2020

    View details for DOI 10.1002/lrh2.10237

    View details for Web of Science ID 000548944700001

  • MINIMAR (MINimum Information for Medical AI Reporting): Developing reporting standards for artificial intelligence in health care. Journal of the American Medical Informatics Association : JAMIA Hernandez-Boussard, T., Bozkurt, S., Ioannidis, J. P., Shah, N. H. 2020


    The rise of digital data and computing power has contributed to significant advancements in artificial intelligence (AI), leading to the use of classification and prediction models in health care to enhance clinical decision-making for diagnosis, treatment and prognosis. However, such advances are limited by the lack of reporting standards for the data used to develop those models, the model architecture, and the model evaluation and validation processes. Here, we present MINIMAR (MINimum Information for Medical AI Reporting), a proposal describing the minimum information necessary to understand intended predictions, target populations, and hidden biases, and the ability to generalize these emerging technologies. We call for a standard to accurately and responsibly report on AI in health care. This will facilitate the design and implementation of these models and promote the development and use of associated clinical decision support tools, as well as manage concerns regarding accuracy and bias.

    View details for DOI 10.1093/jamia/ocaa088

    View details for PubMedID 32594179

  • Four distinct patient-reported outcome (PRO) trajectories in longitudinal responses collected before, during, and after chemotherapy. Blayney, D. W., Azad, A., Yilmaz, M., Bozkurt, S., Brooks, J. D., Hernandez-Boussard, T. AMER SOC CLINICAL ONCOLOGY. 2020
  • Acute pain after breast surgery and reconstruction: A two-institution study of surgical factors influencing short-term pain outcomes. Journal of surgical oncology Azad, A. D., Bozkurt, S., Wheeler, A. J., Curtin, C., Wagner, T. H., Hernandez-Boussard, T. 2020


    Acute postoperative pain following surgery is known to be associated with chronic pain development and lower quality of life. We sought to analyze the relationship between differing breast cancer excisional procedures, reconstruction, and short-term pain outcomes. Women undergoing breast cancer excisional procedures with or without reconstruction at two systems: an academic hospital (AH) and Veterans Health Administration (VHA) were included. Average pain scores at the time of discharge and at 30-day follow-up were analyzed across demographic and clinical characteristics. Linear mixed effects modeling was used to assess the relationship between patient/clinical characteristics and interval pain scores with a random slope to account for differences in baseline pain. Our study included 1402 patients at AH and 1435 at VHA, of which 426 AH and 165 patients with VHA underwent reconstruction. Pain scores improved over time and were found to be highest at discharge. Time at discharge, 30-day follow-up, and preoperative opioid use were the strongest predictors of high pain scores. Younger age and longer length of stay were independently associated with worse pain scores. Younger age, preoperative opioid use, and longer length of stay were associated with higher levels of postoperative pain across both sites.

    View details for DOI 10.1002/jso.26070

    View details for PubMedID 32563208

  • Reporting of demographic data and representativeness in machine learning models using electronic health records. Journal of the American Medical Informatics Association : JAMIA Bozkurt, S., Cahan, E. M., Seneviratne, M. G., Sun, R., Lossio-Ventura, J. A., Ioannidis, J. P., Hernandez-Boussard, T. 2020


    The development of machine learning (ML) algorithms to address a variety of issues faced in clinical practice has increased rapidly. However, questions have arisen regarding biases in their development that can affect their applicability in specific populations. We sought to evaluate whether studies developing ML models from electronic health record (EHR) data report sufficient demographic data on the study populations to demonstrate representativeness and reproducibility. We searched PubMed for articles applying ML models to improve clinical decision-making using EHR data. We limited our search to papers published between 2015 and 2019. Across the 164 studies reviewed, demographic variables were inconsistently reported and/or included as model inputs. Race/ethnicity was not reported in 64%; gender and age were not reported in 24% and 21% of studies, respectively. Socioeconomic status of the population was not reported in 92% of studies. Studies that mentioned these variables often did not report if they were included as model inputs. Few models (12%) were validated using external populations. Few studies (17%) open-sourced their code. Populations in the ML studies include higher proportions of White and Black yet fewer Hispanic subjects compared to the general US population. The demographic characteristics of study populations are poorly reported in the ML literature based on EHR data. Demographic representativeness in training data and model transparency is necessary to ensure that ML models are deployed in an equitable and reproducible manner. Wider adoption of reporting guidelines is warranted to improve representativeness and reproducibility.

    View details for DOI 10.1093/jamia/ocaa164

    View details for PubMedID 32935131

  • Phenotyping severity of patient-centered outcomes using clinical notes: A prostate cancer use case. Learning health systems Bozkurt, S., Paul, R., Coquet, J., Sun, R., Banerjee, I., Brooks, J. D., Hernandez-Boussard, T. 2020; 4 (4): e10237


    A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient-centered outcomes (PCOs) into core infrastructure. PCOs are common following cancer treatment, such as urinary incontinence (UI) following prostatectomy. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective and captured as unstructured text in the electronic health record (EHR). Therefore, PCOs pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI to classify their disease into severity subtypes, which can increase opportunities to provide precision-based therapy and promote a value-based delivery system. Patients undergoing prostate cancer treatment from 2008 to 2018 were identified at an academic medical center. Using a hybrid NLP pipeline that combines rule-based and deep learning methodologies, we classified positive UI cases as mild, moderate, and severe by mining clinical notes. The rule-based model accurately classified UI into disease severity categories (accuracy: 0.86), which outperformed the deep learning model (accuracy: 0.73). In the deep learning model, the recall rates for mild and moderate group were higher than the precision rate (0.78 and 0.79, respectively). A hybrid model that combined both methods did not improve the accuracy of the rule-based model but did outperform the deep learning model (accuracy: 0.75). Phenotyping patients based on indication and severity of PCOs is essential to advance a patient centered LHS. EHRs contain valuable information on PCOs and by using NLP methods, it is feasible to accurately and efficiently phenotype PCO severity. Phenotyping must extend beyond the identification of disease to provide classification of disease severity that can be used to guide treatment and inform shared decision-making. Our methods demonstrate a path to a patient centered LHS that could advance precision medicine.

    View details for DOI 10.1002/lrh2.10237

    View details for PubMedID 33083539

    View details for PubMedCentralID PMC7556418

  • Automated Detection of Measurements and Their Descriptors in Radiology Reports Using a Hybrid Natural Language Processing Algorithm. Journal of digital imaging Bozkurt, S., Alkim, E., Banerjee, I., Rubin, D. L. 2019


    Radiological measurements are reported in free text reports, and it is challenging to extract such measures for treatment planning such as lesion summarization and cancer response assessment. The purpose of this work is to develop and evaluate a natural language processing (NLP) pipeline that can extract measurements and their core descriptors, such as temporality, anatomical entity, imaging observation, RadLex descriptors, series number, image number, and segment from a wide variety of radiology reports (MR, CT, and mammogram). We created a hybrid NLP pipeline that integrates rule-based feature extraction modules and conditional random field (CRF) model for extraction of the measurements from the radiology reports and links them with clinically relevant features such as anatomical entities or imaging observations. The pipeline was trained on 1117 CT/MR reports, and performance of the system was evaluated on an independent set of 100 expert-annotated CT/MR reports and also tested on 25 mammography reports. The system detected 813 out of 806 measurements in the CT/MR reports; 784 were true positives, 29 were false positives, and 0 were false negatives. Similarly, from the mammography reports, 96% of the measurements with their modifiers were extracted correctly. Our approach could enable the development of computerized applications that can utilize summarized lesion measurements from radiology report of varying modalities and improve practice by tracking the same lesions along multiple radiologic encounters.

    View details for DOI 10.1007/s10278-019-00237-9

    View details for PubMedID 31222557

  • Comparison of orthogonal NLP methods for clinical phenotyping and assessment of bone scan utilization among prostate cancer patients JOURNAL OF BIOMEDICAL INFORMATICS Coquet, J., Bozkurt, S., Kan, K. M., Ferrari, M. K., Blayney, D. W., Brooks, J. D., Hernandez-Boussard, T. 2019; 94
  • A nomogram for decision-making of completion surgery in endometrial cancer diagnosed after hysterectomy. Archives of gynecology and obstetrics Bozkurt, S., Toptas, T., Aydin, H. A., Simsek, T., Yavuz, Y. 2019


    Extrauterine tumor spread is one of the essential determinants of disease outcome in endometrial cancer. However, more than 30% of patients still undergo incomplete surgery at the initial attempt. Strategies regarding the management of patients with incompletely staged early-stage disease or patients with undebulked advanced-stage disease remain controversial. Depending on postoperative uterine features and findings on imaging, patients may be put on observation or receive adjuvant therapy or undergo re-staging or debulking surgery followed by adjuvant therapy. To identify patients who would most benefit from a completion surgery, either for restaging or for cytoreduction, we developed a nomogram for estimation of extrauterine disease based on findings of final hysterectomy specimen. Data of 336 patients whose extrauterine disease status was known were analyzed. A nomogram was constructed using patient characteristics including age, grade, myometrial invasion, lymphovascular space involvement, cervical involvement, and peritoneal cytology. The nomogram was internally validated in terms of discrimination, calibration and overall performance. The nomogram showed good performance accuracy with an area under the receiver operating characteristic curve of 0.870, a specificity of 95.5%, and a positive predictive value of 73.9%. Decision curve analysis revealed that the use of the nomogram in decision-making for completion surgery leads to the equivalent of a net 18 true-positive results per 100 patients without an increase in the number of false-positive results. Estimation of extrauterine disease from final hysterectomy specimen is possible with high predictive performance using the nomogram developed. The nomogram may help clinicians in decision-making for management of incomplete surgeries.

    View details for DOI 10.1007/s00404-019-05223-8

    View details for PubMedID 31250198

  • Machine Learning Approaches for Extracting Stage from Pathology Reports in Prostate Cancer. Studies in health technology and informatics Lenain, R., Seneviratne, M. G., Bozkurt, S., Blayney, D. W., Brooks, J. D., Hernandez-Boussard, T. 2019; 264: 1522–23


    Clinical and pathological stage are defining parameters in oncology, which direct a patient's treatment options and prognosis. Pathology reports contain a wealth of staging information that is not stored in structured form in most electronic health records (EHRs). Therefore, we evaluated three supervised machine learning methods (Support Vector Machine, Decision Trees, Gradient Boosting) to classify free-text pathology reports for prostate cancer into T, N and M stage groups.

    View details for DOI 10.3233/SHTI190515

    View details for PubMedID 31438212

  • Natural Language Processing Approaches to Detect the Timeline of Metastatic Recurrence of Breast Cancer. JCO clinical cancer informatics Banerjee, I., Bozkurt, S., Caswell-Jin, J. L., Kurian, A. W., Rubin, D. L. 2019; 3: 1–12


    Electronic medical records (EMRs) and population-based cancer registries contain information on cancer outcomes and treatment, yet rarely capture information on the timing of metastatic cancer recurrence, which is essential to understand cancer survival outcomes. We developed a natural language processing (NLP) system to identify patient-specific timelines of metastatic breast cancer recurrence. We used the OncoSHARE database, which includes merged data from the California Cancer Registry and EMRs of 8,956 women diagnosed with breast cancer in 2000 to 2018. We curated a comprehensive vocabulary by interviewing expert clinicians and processing radiology and pathology reports and progress notes. We developed and evaluated the following two distinct NLP approaches to analyze free-text notes: a traditional rule-based model, using rules for metastatic detection from the literature and curated by domain experts; and a contemporary neural network model. For each 3-month period (quarter) from 2000 to 2018, we applied both models to infer recurrence status for that quarter. We trained the NLP models using 894 randomly selected patient records that were manually reviewed by clinical experts and evaluated model performance using 179 hold-out patients (20%) as a test set. The median follow-up time was 19 quarters (5 years) for the training set and 15 quarters (4 years) for the test set. The neural network model predicted the timing of distant metastatic recurrence with a sensitivity of 0.83 and specificity of 0.73, outperforming the rule-based model, which had a specificity of 0.35 and sensitivity of 0.88 (P < .001). We developed an NLP method that enables identification of the occurrence and timing of metastatic breast cancer recurrence from EMRs. This approach may be adaptable to other cancer sites and could help to unlock the potential of EMRs for research on real-world cancer outcomes.

    View details for DOI 10.1200/CCI.19.00034

    View details for PubMedID 31584836

  • Comparison of Orthogonal NLP Methods for Clinical Phenotyping and Assessment of Bone Scan Utilization among Prostate Cancer Patients. Journal of biomedical informatics Coquet, J., Bozkurt, S., Kan, K. M., Ferrari, M. K., Blayney, D. W., Brooks, J. D., Hernandez-Boussard, T. 2019: 103184


    Clinical care guidelines recommend that newly diagnosed prostate cancer patients at high risk for metastatic spread receive a bone scan prior to treatment and that low risk patients not receive it. The objective was to develop an automated pipeline to interrogate heterogeneous data to evaluate the use of bone scans using two different Natural Language Processing (NLP) approaches. Our cohort was divided into risk groups based on Electronic Health Records (EHR). Information on bone scan utilization was identified in both structured data and free text from clinical notes. Our pipeline annotated sentences with a combination of a rule-based method using the ConText algorithm (a generalization of NegEx) and a Convolutional Neural Network (CNN) method using word2vec to produce word embeddings. A total of 5,500 patients and 369,764 notes were included in the study. A total of 39% of patients were high-risk and 73% of these received a bone scan; of the 18% low risk patients, 10% received one. The CNN model outperformed the rule-based model (F-measure = 0.918 and 0.897, respectively). We demonstrate a combination of both models could maximize precision or recall, based on the study question. Using structured data, we accurately classified patients' cancer risk group, identified bone scan documentation with two NLP methods, and evaluated guideline adherence. Our pipeline can be used to provide concrete feedback to clinicians and guide treatment decisions.

    View details for PubMedID 31014980

  • Automatic Inference of BI-RADS Final Assessment Categories from Narrative Mammography Report Findings. Journal of biomedical informatics Banerjee, I., Bozkurt, S., Alkim, E., Sagreiya, H., Kurian, A. W., Rubin, D. L. 2019: 103137


    We propose an efficient natural language processing approach for inferring the BI-RADS final assessment categories by analyzing only the mammogram findings reported by the mammographer in narrative form. The proposed hybrid method integrates semantic term embedding with distributional semantics, producing a context-aware vector representation of unstructured mammography reports. A large corpus of unannotated mammography reports (300,000) was used to learn the context of the key-terms using a distributional semantics approach, and the trained model was applied to generate context-aware vector representations of the reports annotated with BI-RADS category (22,091). The vectorized reports were utilized to train a supervised classifier to derive the BI-RADS assessment class. Even though the majority of the proposed embedding pipeline is unsupervised, the classifier was able to recognize substantial semantic information for deriving the BI-RADS categorization not only on a holdout internal test set but also on an external validation set (1,900 reports). Our proposed method outperforms a recently published domain-specific rule-based system and could be relevant for evaluating concordance between radiologists. With minimal requirement for task-specific customization, the proposed method can be easily transferred to a different domain to support large scale text mining or derivation of patient phenotype.

    View details for PubMedID 30807833

  • Knowledge, attitudes and medical practice regarding hepatitis B prevention and management among healthcare workers in Northern Vietnam. PloS one Hang Pham, T. T., Le, T. X., Nguyen, D. T., Luu, C. M., Truong, B. D., Tran, P. D., Toy, M., Bozkurt, S., So, S. 2019; 14 (10): e0223733


    BACKGROUND AND AIM: Vietnam's burden of liver cancer is largely due to its high prevalence of chronic hepatitis B virus (HBV) infection. This study aimed to examine healthcare workers' (HCWs) knowledge, attitude and practices regarding HBV prevention and management. METHODS: A cross-sectional survey among health care workers working at primary and tertiary facilities in two Northern provinces in Vietnam in 2017. A standardized questionnaire was administered to randomly selected HCWs. Multivariate regression was used to identify predictors of the HBV knowledge score. RESULTS: Among the 314 participants, 75.5% did not know HBV infection at birth carries the highest risk of developing chronic infection. The median knowledge score was 25 out of 42 (59.5%). About one third (30.2%) wrongly believed that HBV can be transmitted through eating or sharing food with chronic hepatitis B patients. About 38.8% did not feel confident that the hepatitis B vaccine is safe. Only 30.1% provided correct answers to all the questions on injection safety. Up to 48.2% reported they consistently recap needles with two hands after injection, a practice that would put them at greater risk of needle stick injury. About 24.2% reported having been pricked by a needle at work within the past 12 months. More than 40% were concerned about having casual contact or sharing food with a person with chronic hepatitis B infection (CHB). In multivariate analysis, physicians scored significantly higher compared to other healthcare professionals. Having received training regarding hepatitis B within the last two years was also significantly associated with a better HBV knowledge score. CONCLUSIONS: Findings from the survey indicated an immediate need to implement an effective hepatitis B education and training program to build capacity among Vietnam's healthcare workers in hepatitis B prevention and control and to dispel hepatitis B stigma.

    View details for DOI 10.1371/journal.pone.0223733

    View details for PubMedID 31609983

  • Automatic inference of BI-RADS final assessment categories from narrative mammography report findings Journal of Biomedical Informatics Banerjee, I., Bozkurt, S., Alkim, E., Sagreiya, H., Kurian, A. W., Rubin, D. L. 2019
  • Is it possible to automatically assess pretreatment digital rectal examination documentation using natural language processing? A single-centre retrospective study. BMJ open Bozkurt, S., Kan, K. M., Ferrari, M. K., Rubin, D. L., Blayney, D. W., Hernandez-Boussard, T., Brooks, J. D. 2019; 9 (7): e027182


    To develop and test a method for automatic assessment of a quality metric, provider-documented pretreatment digital rectal examination (DRE), using the outputs of a natural language processing (NLP) framework. An electronic health records (EHR)-based prostate cancer data warehouse was used to identify patients and associated clinical notes from 1 January 2005 to 31 December 2017. Using a previously developed natural language processing pipeline, we classified DRE assessment as documented (currently or historically performed), deferred (or suggested as a future examination) and refused. We investigated the quality metric performance, documentation 6 months before treatment and identified patient and clinical factors associated with metric performance. The cohort included 7215 patients with prostate cancer and 426 227 unique clinical notes associated with pretreatment encounters. DREs of 5958 (82.6%) patients were documented and 1257 (17.4%) of patients did not have a DRE documented in the EHR. A total of 3742 (51.9%) patient DREs were documented within 6 months prior to treatment, meeting the quality metric. Patients with private insurance had a higher rate of DRE 6 months prior to starting treatment as compared with Medicaid-based or Medicare-based payors (77.3% vs 69.5%, p=0.001). Patients undergoing chemotherapy, radiation therapy or surgery as the first line of treatment were more likely to have a documented DRE 6 months prior to treatment. EHRs contain valuable unstructured information and with NLP, it is feasible to accurately and efficiently identify quality metrics with current documentation clinician workflow.

    View details for DOI 10.1136/bmjopen-2018-027182

    View details for PubMedID 31324681

  • Impact of age on intermittent hypoxia in obstructive sleep apnea: a propensity-matched analysis SLEEP AND BREATHING Bostanci, A., Bozkurt, S., Turhan, M. 2018; 22 (2): 317–22


    To determine independent relationship of aging with chronic intermittent hypoxia, we compared hypoxia-related polysomnographic variables of geriatric patients (aged ≥ 65 years) with an apnea-hypopnea index (AHI)-, gender-, body mass index (BMI)-, and neck circumference-matched cohort of non-geriatric patients. The study was conducted using clinical and polysomnographic data of 1280 consecutive patients who underwent complete polysomnographic evaluation for suspected sleep-disordered breathing (SDB) at a single sleep disorder center. A propensity score-matched analysis was performed to obtain matched cohorts of geriatric and non-geriatric patients, which resulted in successful matching of 168 patients from each group. Study groups were comparable for gender (P = 0.999), BMI (P = 0.940), neck circumference (P = 0.969), AHI (P = 0.935), and severity of SDB (P = 0.089). The oximetric variables representing the duration of chronic intermittent hypoxia such as mean (P = 0.001), the longest (P = 0.001) and total apnea durations (P = 0.003), mean (P = 0.001) and the longest hypopnea durations (P = 0.001), and total sleep time with oxygen saturation below 90% (P = 0.008) were significantly higher in the geriatric patients as compared with younger adults. Geriatric patients had significantly lower minimum (P = 0.013) and mean oxygen saturation (P = 0.001) than non-geriatric patients. The study provides evidence that elderly patients exhibit more severe and deeper nocturnal intermittent hypoxia than the younger adults, independent of severity of obstructive sleep apnea, BMI, gender, and neck circumference. Hypoxia-related polysomnographic variables in geriatric patients may in fact reflect a physiological aging process rather than the severity of a SDB.

    View details for DOI 10.1007/s11325-017-1560-z

    View details for Web of Science ID 000430993000006

    View details for PubMedID 28849299

  • Expanding a radiology lexicon using contextual patterns in radiology reports. Journal of the American Medical Informatics Association : JAMIA Percha, B., Zhang, Y., Bozkurt, S., Rubin, D., Altman, R. B., Langlotz, C. P. 2018


    Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus. We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building. Ranking candidates based on distributional similarity to a target term results in high curation efficiency: on a ranked list of 775 249 terms, >50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100× in the corpus, and if its vector magnitude is between 4 and 5. Some RadLex categories, such as anatomical substances, are easier to identify synonyms for than others. The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes often suffer due to variations in how similar concepts are described in the text. Biomedical lexicons address this challenge, but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.

    View details for PubMedID 29329435

  • Distribution of global health measures from routinely collected PROMIS surveys in patients with breast cancer or prostate cancer. Cancer Seneviratne, M. G., Bozkurt, S., Patel, M. I., Seto, T., Brooks, J. D., Blayney, D. W., Kurian, A. W., Hernandez-Boussard, T. 2018


    The collection of patient-reported outcomes (PROs) is an emerging priority internationally, guiding clinical care, quality improvement projects and research studies. After the deployment of Patient-Reported Outcomes Measurement Information System (PROMIS) surveys in routine outpatient workflows at an academic cancer center, electronic health record data were used to evaluate survey completion rates and self-reported global health measures across 2 tumor types: breast and prostate cancer. This study retrospectively analyzed 11,657 PROMIS surveys from patients with breast cancer and 4411 surveys from patients with prostate cancer, and it calculated survey completion rates and global physical health (GPH) and global mental health (GMH) scores between 2013 and 2018. A total of 36.6% of eligible patients with breast cancer and 23.7% of patients with prostate cancer completed at least 1 survey, with completion rates lower among black patients for both tumor types (P < .05). The mean T scores (calibrated to a general population mean of 50) for GPH were 48.4 ± 9 for breast cancer and 50.6 ± 9 for prostate cancer, and the GMH scores were 52.7 ± 8 and 52.1 ± 9, respectively. GPH and GMH were frequently lower among ethnic minorities, patients without private health insurance, and those with advanced disease. This analysis provides important baseline data on patient-reported global health in breast and prostate cancer. Demonstrating that PROs can be integrated into clinical workflows, this study shows that supportive efforts may be needed to improve PRO collection and global health endpoints in vulnerable populations.

    View details for PubMedID 30512191

  • An Automated Feature Engineering for Digital Rectal Examination Documentation using Natural Language Processing. AMIA ... Annual Symposium proceedings. AMIA Symposium Bozkurt, S., Park, J. I., Kan, K. M., Ferrari, M., Rubin, D. L., Brooks, J. D., Hernandez-Boussard, T. 2018; 2018: 288–94


    Digital rectal examination (DRE) is considered a quality metric for prostate cancer care. However, much of the DRE related rich information is documented as free-text in clinical narratives. Therefore, we aimed to develop a natural language processing (NLP) pipeline for automatic documentation of DRE in clinical notes using a domain-specific dictionary created by clinical experts and an extended version of the same dictionary learned by clinical notes using distributional semantics algorithms. The proposed pipeline was compared to a baseline NLP algorithm and the results of the proposed pipeline were found superior in terms of precision (0.95) and recall (0.90) for documentation of DRE. We believe the rule-based NLP pipeline enriched with terms learned from the whole corpus can provide accurate and efficient identification of this quality metric.

    View details for PubMedID 30815067

  • Impact of coexistent adenomyosis on outcomes of patients with endometrioid endometrial cancer: a propensity score-matched analysis TUMORI J Aydin, H., Toptas, T., Bozkurt, S., Pestereli, E., Simsek, T. 2018; 104 (1): 60–65


    Despite the common occurrence of adenomyosis in endometrial cancer (EC), there is a paucity and conflict in the literature regarding its impact on outcomes of patients. We sought to compare outcomes of patients with endometrioid type EC with or without adenomyosis. A total of 314 patients were included in the analysis. Patients were divided into 2 groups according to the presence or absence of adenomyosis. Adenomyosis was identified in 79 patients (25.1%). A propensity score-matched comparison (1:1) was carried out to minimize selection biases. The propensity score was developed through multivariable logistic regression model including age, stage, and tumor grade as covariates. After performing propensity score matching, 70 patients from each group were successfully matched. Primary outcome of the study was disease-free survival (DFS), and the secondary outcomes were overall survival (OS) and disease-specific survival (DSS). Median follow-up time was 61 months for the adenomyosis positive group and 76 months for the adenomyosis negative group. There were no statistically significant differences in 3- and 5-year DFS, OS, and DSS rates between the 2 groups. Five-year DFS was 92% vs 88% (hazard ratio [HR] 1.54 [0.56-4.27]; p = 0.404), 5-year OS was 94% vs 92% (HR 1.60 [0.49-5.26]; p = 0.441), and 5-year DSS was 94% vs 96% (HR 2.51 [0.46-13.71]; p = 0.290) for patients with and without adenomyosis, respectively. Coexistent adenomyosis in EC is not a prognostic factor and does not impact survival outcomes.

    View details for DOI 10.5301/tj.5000698

    View details for Web of Science ID 000434682400009

    View details for PubMedID 29192745

  • Can Statistical Machine Learning Algorithms Help for Classification of Obstructive Sleep Apnea Severity to Optimal Utilization of Polysomnography Resources? Methods of information in medicine Bozkurt, S., Bostanci, A., Turhan, M. 2017; 56 (4)


    The goal of this study is to evaluate the results of machine learning methods for the classification of OSA severity of patients with suspected sleep disorder breathing as normal, mild, moderate and severe based on non-polysomnographic variables: 1) clinical data, 2) symptoms and 3) physical examination. In order to produce classification models for OSA severity, five different machine learning methods (Bayesian network, Decision Tree, Random Forest, Neural Networks and Logistic Regression) were trained while relevant variables and their relationships were derived empirically from observed data. Each model was trained and evaluated using 10-fold cross-validation and to evaluate classification performances of all methods, true positive rate (TPR), false positive rate (FPR), Positive Predictive Value (PPV), F measure and Area Under Receiver Operating Characteristics curve (ROC-AUC) were used. Results of 10-fold cross validated tests with different variable settings promisingly indicated that the OSA severity of suspected OSA patients can be classified, using non-polysomnographic features, with 0.71 true positive rate as the highest and, 0.15 false positive rate as the lowest, respectively. Moreover, the test results of different variables settings revealed that the accuracy of the classification models was significantly improved when physical examination variables were added to the model. Study results showed that machine learning methods can be used to estimate the probabilities of no, mild, moderate, and severe obstructive sleep apnea and such approaches may improve accurate initial OSA screening and help referring only the suspected moderate or severe OSA patients to sleep laboratories for the expensive tests.

    View details for DOI 10.3414/ME16-01-0084

    View details for PubMedID 28590499

  • Usability Study of RSNA Radiology Reporting Template Library. Studies in health technology and informatics Hong, Y., Zhu, Y., Bozkurt, S., Zhang, J., Kahn, C. E. 2017; 245: 1325


    This study provides insights that could help to improve the Radiological Society of North America (RSNA) Reporting Template Digital Library, based on a usability evaluation. The results show that most users have been satisfied with the website. The general comments for the library are positive, although the participants suggested quite a few areas to improve. About 40% are returning visitors which means people often come back to the website.

    View details for PubMedID 29295406

  • Estimation of cardiovascular disease from polysomnographic parameters in sleep-disordered breathing EUROPEAN ARCHIVES OF OTO-RHINO-LARYNGOLOGY Turhan, M., Bostanci, A., Bozkurt, S. 2016; 273 (12): 4585-4593


    We aimed to illustrate the causal relationships between cardiovascular diseases (CVDs) and various polysomnographic variables, and to develop a CVD estimation model from these variables in a population referred for assessment of possible sleep-disordered breathing (SDB). Clinical and polysomnographic data of 1162 consecutive patients with suspected SDB whose comorbidity status was known, were reviewed, retrospectively. Variable selection was performed in two steps using univariate analysis and tenfold cross validation information gain analysis. The resulting set of variables with an average merit value (m) of >0.005 was considered to be causal factors contributing to the CVDs, and used in Bayesian network models for providing estimations. Of the 1162 patients, 234 had CVDs (20.1 %). In total, 28 parameters were evaluated for variable selection. Of those, 19 were found to be associated with CVDs. Age was the most effective attribute in estimating CVD (m = 0.051), followed by total sleep time with oxygen saturation <90 % (m = 0.021). Some other important variables were apnea-hypopnea index during non-rapid eye movement (m = 0.018), lowest oxygen saturation (m = 0.018), body mass index (m = 0.016), total apnea duration (m = 0.014), mean apnea duration (m = 0.014), longest apnea duration (m = 0.013), and severity of SDB (m = 0.012). The modeling process resulted in a final model, with 76.9 % sensitivity, 96.2 % specificity, and 92.6 % negative predictive value, consisting of all selected variables. The study provides evidence that the estimation of CVDs from polysomnographic parameters is possible with high predictive performance using Bayesian network analysis.

    View details for DOI 10.1007/s00405-016-4176-1

    View details for Web of Science ID 000387700400066

    View details for PubMedID 27363409

  • Using automatically extracted information from mammography reports for decision-support. Journal of biomedical informatics Bozkurt, S., Gimenez, F., Burnside, E. S., Gulkesen, K. H., Rubin, D. L. 2016; 62: 224-231


    To evaluate a system we developed that connects natural language processing (NLP) for information extraction from narrative text mammography reports with a Bayesian network for decision-support about breast cancer diagnosis. The ultimate goal of this system is to provide decision support as part of the workflow of producing the radiology report. We built a system that uses an NLP information extraction system (which extracts BI-RADS descriptors and clinical information from mammography reports) to provide the necessary inputs to a Bayesian network (BN) decision support system (DSS) that estimates lesion malignancy from BI-RADS descriptors. We used this integrated system to predict diagnosis of breast cancer from radiology text reports and evaluated it with a reference standard of 300 mammography reports. We collected two different outputs from the DSS: (1) the probability of malignancy and (2) the BI-RADS final assessment category. Since NLP may produce imperfect inputs to the DSS, we compared the difference between using perfect ("reference standard") structured inputs to the DSS ("RS-DSS") vs NLP-derived inputs ("NLP-DSS") on the output of the DSS using the concordance correlation coefficient. We measured the classification accuracy of the BI-RADS final assessment category when using NLP-DSS, compared with the ground truth category established by the radiologist. The NLP-DSS and RS-DSS had closely matched probabilities, with a mean paired difference of 0.004±0.025. The concordance correlation of these paired measures was 0.95. The accuracy of the NLP-DSS to predict the correct BI-RADS final assessment category was 97.58%. The accuracy of the information extracted from mammography reports using the NLP system was sufficient to provide accurate DSS results. We believe our system could ultimately reduce the variation in practice in mammography related to assessment of malignant lesions and improve management decisions.

    View details for DOI 10.1016/j.jbi.2016.07.001

    View details for PubMedID 27388877

  • Automatic abstraction of imaging observations with their characteristics from mammography reports. Journal of the American Medical Informatics Association Bozkurt, S., Lipson, J. A., Senol, U., Rubin, D. L., Bulu, H. 2015; 22 (e1): e81-92


    Radiology reports are usually narrative, unstructured text, a format which hinders the ability to input report contents into decision support systems. In addition, reports often describe multiple lesions, and it is challenging to automatically extract information on each lesion and its relationships to characteristics, anatomic locations, and other information that describes it. The goal of our work is to develop natural language processing (NLP) methods to recognize each lesion in free-text mammography reports and to extract its corresponding relationships, producing a complete information frame for each lesion. We built an NLP information extraction pipeline in the General Architecture for Text Engineering (GATE) NLP toolkit. Sequential processing modules are executed, producing an output information frame required for a mammography decision support system. Each lesion described in the report is identified by linking it with its anatomic location in the breast. In order to evaluate our system, we selected 300 mammography reports from a hospital report database. The gold standard contained 797 lesions, and our system detected 815 lesions (780 true positives, 35 false positives, and 17 false negatives). The precision of detecting all the imaging observations with their modifiers was 94.9, recall was 90.9, and the F measure was 92.8. Our NLP system extracts each imaging observation and its characteristics from mammography reports. Although our application focuses on the domain of mammography, we believe our approach can generalize to other domains and may narrow the gap between unstructured clinical report text and structured information extraction needed for data mining and decision support.

    View details for DOI 10.1136/amiajnl-2014-003009

    View details for PubMedID 25352567

  • Automated detection of ambiguity in BI-RADS assessment categories in mammography reports. Studies in health technology and informatics Bozkurt, S., Rubin, D. 2014; 197: 35-39


    An unsolved challenge in biomedical natural language processing (NLP) is detecting ambiguities in the reports that can help physicians to improve report clarity. Our goal was to develop NLP methods to tackle the challenges of identifying ambiguous descriptions of the laterality of BI-RADS Final Assessment Categories in mammography radiology reports. We developed a text processing system that uses a BI-RADS ontology we built as a knowledge source for automatic annotation of the entities in mammography reports relevant to this problem. We used the GATE NLP toolkit and developed customized processing resources for report segmentation, named entity recognition, and detection of mismatches between BI-RADS Final Assessment Categories and mammogram laterality. Our system detected 55 mismatched cases in 190 reports and the accuracy rate was 81%. We conclude that such NLP techniques can detect ambiguities in mammography reports and may reduce discrepancy and variability in reporting.

    View details for PubMedID 24743074

  • Annotation for Information Extraction from Mammography Reports INFORMATICS, MANAGEMENT AND TECHNOLOGY IN HEALTHCARE Bozkurt, S., Gulkesen, K. H., Rubin, D. 2013; 190: 183-185


    Inter- and intra-observer variability in mammographic interpretation is a challenging problem, and decision support systems (DSS) may be helpful to reduce variation in practice. Since radiology reports are created as unstructured text reports, natural language processing (NLP) techniques are needed to extract structured information from reports in order to provide the inputs to DSS. Before creating NLP systems, producing a high-quality annotated data set is essential. The goal of this project is to develop an annotation schema to guide the information extraction tasks needed from free-text mammography reports.

    View details for DOI 10.3233/978-1-61499-276-9-183

    View details for Web of Science ID 000341032900053

    View details for PubMedID 23823416

  • An Open-Standards Grammar for Outline-Style Radiology Report Templates JOURNAL OF DIGITAL IMAGING Bozkurt, S., Kahn, C. E. 2012; 25 (3): 359-364


    Structured reporting uses consistent ordering of results and standardized terminology to improve the quality and reduce the complexity of radiology reports. We sought to define a generalized approach for radiology reporting that produces flexible outline-style reports, accommodates structured information and named reporting elements, allows reporting terms to be linked to controlled vocabularies, uses existing informatics standards, and allows structured report data to be extracted readily. We applied the Regular Language for XML-Next Generation (RELAX NG) schema language to create templates for 110 reporting templates created as part of the Radiological Society of North America reporting initiative. We evaluated how well this approach addressed the project's goals. The RELAX NG schema language expressed the cardinality and hierarchical relationships of reporting concepts, and allowed reporting elements to be mapped to terms in controlled medical vocabularies, such as RadLex®, Systematized Nomenclature of Medicine Clinical Terms®, and Logical Observation Identifiers Names and Codes®. The approach provided extensibility and accommodated the addition of new features. Overall, the approach has proven to be useful and will form the basis for a supplement to the Digital Imaging and Communication in Medicine Standard.

    View details for DOI 10.1007/s10278-012-9456-8

    View details for Web of Science ID 000304109700007

    View details for PubMedID 22258732

    View details for PubMedCentralID PMC3348985
