Imaging Data for Research

There is growing interest in research use of radiology images and reports. In past years, small imaging requests (e.g. 10-100 studies) have been supported by the STARR (formerly STRIDE) database. Now, with growing interest in applying machine learning to imaging data, researchers are starting to inquire about very large data volumes, from thousands to tens of thousands to millions of images. Providing access to such large numbers of images for research poses a number of challenges. To address them, Research IT is building a solution (project codename STARR-Radio) that will support large-scale imaging data extracts, provide tools for de-identification and image analysis, and make it possible to link images with other modalities of clinical data. We expect an alpha version of the STARR-Radio solution to be available later in the year. We will provide a roadmap for beta and general availability at the same time.

FAQ for imaging data extract

When will researchers get access to imaging data?

The STARR-Radio prototype is under development. We will provide an ETA for the alpha, beta and general availability releases of the pipeline in the summer of 2018, and will update this page at that time.

I am writing an image-based grant now. How can I get help with budgeting?

We will not be able to provide any help with budgeting at this time.

How can I get included in the alpha phase?

Submit a consultation request when the STARR-Radio pipeline is available in alpha.

I need access to 1000 studies now for preliminary research. How do I get it?

At this time there is no mechanism to get access to imaging data. When STARR-Radio is available, you will be able to request a consultation and obtain access through a process similar to the current STARR data access process.

I will need access to 100,000 studies. How long will it take to get the data?

Once the solution is available for general use, data access following IRB and (necessary) DRA approval should take no more than a few weeks.

Will I be able to request access to anonymized radiology reports as well?

If the IRB approves, you can access anonymized radiology reports and other structured data in STARR. Note that algorithmic text anonymization does not result in 100% de-identification. Therefore, algorithmically anonymized data is treated as high-risk PHI by Stanford.

I have some existing image data. Can I get help with de-identification of my image data?

No, we will only offer help in de-identifying data that we extract for researchers. For your own images, please refer to the NIH Wiki on the Clinical Trials Processor (CTP).

I am planning to de-identify my data. Do you have tools that will help me evaluate whether my de-identification is running correctly or not?

No, there is currently no algorithmic mechanism to assure 100% de-identification, so you will need a manual review. Note that algorithmic image anonymization does not result in 100% de-identification. Therefore, algorithmically anonymized image data is treated as high-risk PHI by Stanford.