STARR-Radiology manuscript on arXiv

High performance on-demand de-identification of a petabyte-scale medical imaging data lake

Research IT publishes its radiology DICOM de-identification best practices on arXiv 

Aug 9, 2020: With the rise of artificial intelligence-driven approaches, researchers are requesting unprecedented volumes of medical imaging data, far exceeding the capacity of traditional on-premises client-server approaches to make the data analysis-ready for research. We are making available a flexible solution for on-demand de-identification that combines mature software technologies with modern cloud-based distributed computing techniques to enable faster turnaround in medical imaging research. The solution is part of a broader effort to support a secure, high-performance clinical data science platform.

For example, 5000 CT images result in ~5 million de-identified DICOMs; the processing takes 45 minutes and costs less than $6.
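
To put those figures in perspective, the short sketch below derives rough per-unit numbers from the example above. The inputs are the approximate values quoted in the text ("~5 million", "less than $6"), so the results are back-of-the-envelope estimates, and the cost figures are upper bounds rather than measured values. The variable names are illustrative only.

```python
# Approximate figures quoted in the example above.
ct_images = 5_000          # CT images requested
dicoms = 5_000_000         # ~5 million de-identified DICOMs produced
minutes = 45               # processing time
cost_usd_upper = 6.0       # "less than $6", treated here as an upper bound

# Derived estimates (rough, since the inputs are rounded).
dicoms_per_image = dicoms / ct_images
dicoms_per_second = dicoms / (minutes * 60)
cost_per_million_upper = cost_usd_upper / (dicoms / 1_000_000)
cost_per_image_upper = cost_usd_upper / ct_images

print(f"DICOMs per CT image:          ~{dicoms_per_image:,.0f}")
print(f"Throughput:                   ~{dicoms_per_second:,.0f} DICOMs/second")
print(f"Cost per million DICOMs:      < ${cost_per_million_upper:.2f}")
print(f"Cost per CT image:            < ${cost_per_image_upper:.4f}")
```

Under these assumptions, the example works out to roughly 1,000 DICOMs per CT image, a throughput on the order of 1,850 DICOMs per second, and a cost below about $1.20 per million de-identified DICOMs.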