Empirical assessment of bias in machine learning diagnostic test accuracy studies.
Journal of the American Medical Informatics Association: JAMIA
Machine learning (ML) diagnostic tools have significant potential to improve health care. However, methodological pitfalls may affect the diagnostic test accuracy studies used to appraise such tools. We aimed to evaluate the prevalence and reporting of design characteristics within this literature. Further, we sought to empirically assess whether design features may be associated with different estimates of diagnostic accuracy.

We systematically retrieved 2 × 2 tables (n = 281) describing the performance of ML diagnostic tools, derived from 114 publications in 38 meta-analyses, from PubMed. Data extracted included test performance, sample sizes, and design features. A mixed-effects meta-regression was run to quantify the association between design features and diagnostic accuracy.

Participant ethnicity and blinding in test interpretation were unreported in 90% and 60% of studies, respectively. Reporting was occasionally lacking even for rudimentary characteristics such as study design (28% unreported). Internal validation without appropriate safeguards was used in 44% of studies. Several design features were associated with larger estimates of accuracy, including having unreported (relative diagnostic odds ratio [RDOR], 2.11; 95% confidence interval [CI], 1.43-3.1) or case-control study designs (RDOR, 1.27; 95% CI, 0.97-1.66), and recruiting participants for the index test (RDOR, 1.67; 95% CI, 1.08-2.59).

Significant underreporting of experimental details was present, and study design features may affect estimates of diagnostic performance in the ML diagnostic test accuracy literature. The present study identifies pitfalls that threaten the validity, generalizability, and clinical value of ML diagnostic tools and provides recommendations for improvement.
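The diagnostic odds ratio (DOR) underlying the RDOR comparisons above can be computed directly from a 2 × 2 table. A minimal sketch in Python; the function names and example counts are illustrative, not taken from the study:

```python
def diagnostic_odds_ratio(tp, fn, fp, tn):
    """DOR = (TP * TN) / (FP * FN): the odds of a positive test in the
    diseased group divided by the odds in the non-diseased group."""
    return (tp * tn) / (fp * fn)

def relative_dor(dor_a, dor_b):
    """Relative DOR comparing two groups of studies; a value > 1 means
    group A yields larger accuracy estimates than group B."""
    return dor_a / dor_b

# Example: a tool with sensitivity 0.90 and specificity 0.80,
# evaluated on 100 diseased and 100 non-diseased participants.
dor = diagnostic_odds_ratio(tp=90, fn=10, fp=20, tn=80)  # -> 36.0
```

A higher DOR indicates better discriminative performance; the meta-regression in the study compares DORs across design features on this relative scale.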
DOI: 10.1093/jamia/ocaa075
PubMedID: 32548642
An empirical assessment of research practices across 163 clinical trials of tumor-bearing companion dogs.
2019; 9 (1): 11877
Comparative clinical trials of domestic dogs with spontaneously occurring cancers are increasingly common. Canine cancers are likely more representative of human cancers than induced murine tumors; such trials could therefore bridge murine models and human trials and better prioritize drug candidates. These investigations also benefit veterinary patients. We aimed to evaluate the design and reporting practices of clinical trials with ≥2 arms involving tumor-bearing dogs. A total of 163 trials containing 8552 animals were systematically retrieved from PubMed (searched January 18, 2018). Data extracted included sample sizes, response criteria, study design, and outcome reporting. Low sample sizes were prevalent (median n = 33): the median detectable hazard ratio was 0.3 for overall survival and 0.06 for disease progression. Progressive disease thresholds varied in stringency among studies that did not adopt VCOG-RECIST guidelines. Additionally, there was significant underreporting across all Cochrane risk of bias categories, with the proportion of studies with unclear reporting ranging from 44% (randomization) to 94% (selective reporting). 72% of studies also failed to define a primary outcome. The present study confirms previous findings that clinical trials in dogs need to be improved, particularly regarding low statistical power and underreporting of design and outcomes.
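The detectable hazard ratios quoted above can be illustrated with Schoenfeld's approximation for the log-rank test, which ties the minimum detectable effect to the number of observed events. A hedged sketch, assuming two equal arms and (optimistically) that every enrolled animal contributes an event; this is an illustration of the power problem, not the authors' actual calculation:

```python
from math import exp, sqrt
from statistics import NormalDist

def detectable_hr(events, alpha=0.05, power=0.80, alloc=0.5):
    """Smallest hazard ratio (< 1) detectable with the given number of
    events, via Schoenfeld's formula for the two-arm log-rank test:
    ln(HR) = -(z_{1-alpha/2} + z_{power}) / sqrt(events * p * (1 - p)),
    where p is the allocation fraction."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    return exp(-z / sqrt(events * alloc * (1 - alloc)))

# With the median trial size of n = 33, even if every dog reaches the
# endpoint, only very large treatment effects are detectable:
hr = detectable_hr(33)  # roughly 0.38
```

In other words, a trial of this size can reliably detect only interventions that cut the hazard by more than half, consistent with the underpowering the review describes.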
DOI: 10.1038/s41598-019-48425-5
PubMedID: 31417164