
Free Datasets for Radiology AI

  • Dr. Candace Makeda Moore
  • 20 Apr 2019

The ACR has released a directory of datasets for AI. I found it significantly smaller than the number of datasets I have encountered. I'm sending them updates, but who knows how long it will take them to put them up. So I've decided to post my fruitful methods for finding radiology datasets in 2019.

Obviously, you could just go googling around. However, many of the datasets you will find links to were open-source or academic projects, and they either died, moved to new websites, or migrated behind a paywall. Here are some links that are not dead as of today.
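Since link rot is the whole problem here, this is a minimal sketch (Python, standard library only) for checking which links in a list like this one still answer. It is my own illustration, not an official tool. The function name `is_link_alive` and the User-Agent string are made up. "Alive" here only means the server returned a non-error HTTP status after redirects, not that the dataset behind it is still downloadable.

```python
import urllib.error
import urllib.request


def is_link_alive(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status (after redirects).

    Tries a cheap HEAD request first. Some servers refuse HEAD (403/405),
    so in that case the same URL is retried with GET.
    """
    for method in ("HEAD", "GET"):
        try:
            # Hypothetical User-Agent; some sites reject Python's default one.
            request = urllib.request.Request(
                url, method=method, headers={"User-Agent": "dataset-link-check/0.1"}
            )
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return 200 <= response.status < 400
        except urllib.error.HTTPError as err:
            # HTTPError must be caught before OSError because it is a subclass.
            if method == "HEAD" and err.code in (403, 405):
                continue  # retry the same URL with GET
            return False
        except (OSError, ValueError):
            # DNS failure, refused connection, timeout, or a malformed URL.
            return False
    return False
```

To use it, put the URLs from this post in a list and keep the ones where `is_link_alive(url)` is False. Some sites block scripted requests, so treat a False as "check by hand" rather than proof that the link is dead.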

The ACR list of datasets is here:

https://www.acrdsi.org/DSI-Services/Dataset-Directory

My additional recommendations to find datasets are:

#1: The Medical Image Bank of Valencia: not just one but MANY datasets!

http://bimcv.cipf.es/

Spine (MIDAS), brain (multiple datasets such as GLIOHABITATS and NEUROBIM-MS), chest X-rays (PADCHEST), and even methodological frameworks and a GIS. Just off-the-hook cool, with people who actually write you back!

#2: MD.ai

https://public.md.ai/hub/projects/public

Columbia, Harvard and Duke have put some great datasets in one place here, along with others such as the Qure.ai head CT dataset. It's not the largest list of datasets, but it takes #2 in my heart for leading me to the publicly available algorithms and code nearby at https://public.md.ai/hub/models/public

#3: OpenI: The Open Access Biomedical Image Search Engine

https://openi.nlm.nih.gov/

Home to the Indiana University Chest X-ray dataset. The IU dataset, while smaller than either CheXNet set, includes the full reports in XML. So there is a CXR REPORT dataset here, not just images.
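Because the reports come as XML, pulling out the findings and impressions takes only a few lines. This sketch assumes each report section is an `AbstractText` element with a `Label` attribute such as FINDINGS or IMPRESSION, which is how I remember the OpenI report files being laid out; check it against the files you actually download. The function name and the `SAMPLE` report are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Union

# Made-up report, shaped the way the real files are ASSUMED to be.
SAMPLE = """<eCitation>
  <MedlineCitation>
    <Article>
      <Abstract>
        <AbstractText Label="FINDINGS">The lungs are clear.</AbstractText>
        <AbstractText Label="IMPRESSION">No acute disease.</AbstractText>
      </Abstract>
    </Article>
  </MedlineCitation>
</eCitation>"""


def parse_report_sections(xml_text: Union[str, bytes]) -> Dict[str, str]:
    """Map each labelled AbstractText section (e.g. FINDINGS) to its text."""
    root = ET.fromstring(xml_text)
    sections = {}
    for node in root.iter("AbstractText"):
        label = node.get("Label")
        if label:
            # Empty sections (e.g. <AbstractText Label="X"/>) become "".
            sections[label.upper()] = (node.text or "").strip()
    return sections
```

For a real file, pass `Path(report_path).read_bytes()` so ElementTree honours whatever encoding the file declares.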

#4: Kaggle

www.kaggle.com

A general dataset website that includes non-radiology datasets, but also many radiology ones. It is often a better way to get a dataset than the dataset's own official website, because you don't have to buy special software: you just download a zip or two. You'll find a subset of the DeepLesion dataset, ChestXray8 (because before 14, there was 8), and so on.
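Once you have the zip or two, a short standard-library script can unpack them and collect the image files. This is a hypothetical helper: `unpack_images` and the list of suffixes are my own choices, so adjust the suffixes to whatever the dataset actually ships.

```python
import zipfile
from pathlib import Path
from typing import Iterable, List

# Assumed image formats; extend as needed for a given dataset.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".dcm")


def unpack_images(zip_paths: Iterable[Path], dest: Path) -> List[Path]:
    """Extract every zip into `dest` and return the image files found under it."""
    dest.mkdir(parents=True, exist_ok=True)
    for zip_path in zip_paths:
        with zipfile.ZipFile(zip_path) as archive:
            # Like extract(), extractall() strips absolute paths and ".."
            # components from member names, so files stay inside `dest`.
            archive.extractall(dest)
    return sorted(
        path
        for path in dest.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    )
```

The suffix check is case-insensitive, since some datasets ship `.PNG` or `.DCM` files.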

#5+6: OpenNeuro and OASIS

https://openneuro.org/ and http://oasis-brains.org/

Neuro, neuro: brain MRIs and more

#7: Spineweb's datasets

http://spineweb.digitalimaginggroup.ca/spineweb/index.php?n=Main.Datasets

Over a dozen datasets about the spine

#8: Zenodo

https://zenodo.org/

A few simple clicks or queries, and you can grab plenty of datasets, such as this one (UCLH Stroke EIT Dataset - Radiology Data): https://zenodo.org/record/1199398#.XL2uNOgzZPY
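The "queries" part can be scripted, because Zenodo serves record metadata as JSON through a public REST API. This sketch assumes the endpoint `https://zenodo.org/api/records/<id>` and a `files` list whose entries carry `key`, `size` and `links.self`. That matches Zenodo's API as far as I know, but check their API documentation before relying on it. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional, Tuple

# ASSUMED endpoint for a single Zenodo record's JSON metadata.
ZENODO_RECORD_API = "https://zenodo.org/api/records/{record_id}"


def record_id_from_url(url: str) -> int:
    """Pull the numeric ID out of a URL like https://zenodo.org/record/1199398."""
    path = urllib.parse.urlparse(url).path.rstrip("/")  # the #fragment is dropped
    return int(path.rsplit("/", 1)[-1])


def record_api_url(record_id: int) -> str:
    return ZENODO_RECORD_API.format(record_id=record_id)


def fetch_record(record_id: int, timeout: float = 30.0) -> Dict[str, Any]:
    """Download a record's JSON metadata (needs network access)."""
    with urllib.request.urlopen(record_api_url(record_id), timeout=timeout) as resp:
        return json.load(resp)


def list_files(record: Dict[str, Any]) -> List[Tuple[Optional[str], Optional[str], Optional[int]]]:
    """Return (filename, download_link, size_in_bytes) for each file in a record.

    ASSUMES a "files" list whose entries have "key", "size" and
    "links" -> "self"; any missing field comes back as None.
    """
    return [
        (entry.get("key"), entry.get("links", {}).get("self"), entry.get("size"))
        for entry in record.get("files", [])
    ]
```

For the stroke dataset above, `list_files(fetch_record(record_id_from_url(url)))` should list its files and download links, assuming the JSON layout holds.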

#9: The Cancer Imaging Archive

https://wiki.cancerimagingarchive.net/

The ACR posted some, but not all or even most, of the datasets available from this site. The site has cancer imaging DICOMs by the terabyte, in so many collections that I'm too lazy to describe them all. Just look:

  • 4D-Lung

  • ACRIN-FLT-Breast

  • ACRIN-FMISO-Brain

  • ACRIN-NSCLC-FDG-PET

  • Anti-PD-1 Immunotherapy Lung (Anti-PD-1_Lung)

  • Anti-PD-1 Immunotherapy Melanoma (Anti-PD-1_MELANOMA)

  • APOLLO-1-VA

  • APOLLO2

  • Brain-Tumor-Progression

  • BREAST-DIAGNOSIS

  • Breast-MRI-NACT-Pilot

  • CBIS-DDSM

  • CPTAC-CCRCC

  • CPTAC-CM

  • CPTAC-GBM

  • CPTAC-HNSCC

  • CPTAC-LSCC

  • CPTAC-LUAD

  • CPTAC-PDA

  • CPTAC-SAR

  • CPTAC-UCEC

  • Credence Cartridge Radiomics Phantom CT Scans

  • Credence Cartridge Radiomics Phantom CT Scans with Controlled Scanning Approach (CC-Radiomics-Phantom-2)

  • CT COLONOGRAPHY

  • CT Lymph Nodes

  • Head-and-neck squamous cell carcinoma patients with CT taken during pre-treatment, mid-treatment, and post-treatment (HNSCC-3DCT-RT)

  • Head-Neck Cetuximab

  • Head-Neck-PET-CT

  • ISPY1

  • Ivy GAP

  • LGG-1p19qDeletion

  • LIDC-IDRI

  • LungCT-Diagnosis

  • Lung CT Segmentation Challenge 2017

  • Lung Phantom

  • Mouse-Astrocytoma

  • Mouse-Mammary

  • NaF Prostate

  • NRG-1308

  • NSCLC-Cetuximab

  • NSCLC Radiogenomics

  • NSCLC-Radiomics

  • NSCLC-Radiomics-Genomics

  • Osteosarcoma data from UT Southwestern/UT Dallas for Viable and Necrotic Tumor Assessment

  • Pancreas-CT

  • Phantom FDA

  • Prostate-3T

  • PROSTATE-DIAGNOSIS

  • Prostate Fused-MRI-Pathology

  • PROSTATE-MRI

  • QIBA CT-1C

  • QIN-BRAIN-DSC-MRI

  • QIN-Breast

  • QIN Breast DCE-MRI

  • QIN GBM Treatment Response

  • QIN-HEADNECK

  • QIN LUNG CT

  • QIN PET Phantom

  • QIN PROSTATE

  • QIN-PROSTATE-Repeatability

  • QIN-SARCOMA

  • Quantitative Imaging Network Collections

  • REMBRANDT

  • RIDER Breast MRI

  • RIDER Collections

  • RIDER Lung CT

  • RIDER Lung PET-CT

  • RIDER NEURO MRI

  • RIDER PHANTOM MRI

  • RIDER Phantom PET-CT

  • Soft-tissue-Sarcoma

  • SPIE-AAPM Lung CT Challenge

  • SPIE-AAPM-NCI PROSTATEx Challenges

  • Synthetic and Phantom MR Images for Determining Deformable Image Registration Accuracy (MRI-DIR)

  • TCGA-BLCA

  • TCGA-BRCA

  • TCGA-CESC

  • TCGA-COAD

  • TCGA-ESCA

  • TCGA-GBM

  • TCGA-HNSC

  • TCGA-KICH

  • TCGA-KIRC

  • TCGA-KIRP

  • TCGA-LGG

  • TCGA-LIHC

  • TCGA-LUAD

  • TCGA-LUSC

  • TCGA-OV

  • TCGA-PRAD

  • TCGA-READ

  • TCGA-SARC

  • TCGA-STAD

  • TCGA-THCA

  • TCGA-UCEC

  • The VICTRE Trial: Open-Source, In-Silico Clinical Trial For Evaluating Digital Breast Tomosynthesis
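With that many collections, downloads tend to arrive as deep folder trees of DICOM files, and not every DICOM source uses a `.dcm` extension. A cheap way to find them is to check for the DICOM Part 10 signature: a 128-byte preamble followed by the four bytes `DICM`. This is a minimal sketch with hypothetical helper names. Older or non-conforming files without the preamble will be missed, and a full DICOM library (for example pydicom) is the better tool for actually reading headers.

```python
from pathlib import Path
from typing import Iterator

PREAMBLE_LENGTH = 128  # DICOM Part 10 files start with a 128-byte preamble
MAGIC = b"DICM"        # ...followed by these four bytes


def is_dicom_part10(path: Path) -> bool:
    """True if the file starts with the Part 10 preamble and 'DICM' marker."""
    with open(path, "rb") as fh:
        header = fh.read(PREAMBLE_LENGTH + len(MAGIC))
    return len(header) == PREAMBLE_LENGTH + len(MAGIC) and header[PREAMBLE_LENGTH:] == MAGIC


def find_dicoms(root: Path) -> Iterator[Path]:
    """Yield every regular file under `root` that looks like a Part 10 DICOM."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and is_dicom_part10(path):
            yield path
```

Only 132 bytes are read per file, so this stays fast even on a multi-terabyte download.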

#10: The NIH

I can't tell you how irritated I have been every time I try to access an NIH dataset and find out that, theoretically, I need to pay. The N is for National, and I'm a sucker who pays taxes for this institution, yet somehow I need to pay a private company called Box to get datasets? Because I need to subsidize rich people in tech, not the other way around, according to government logic. Apparently there is even a National Biomedical Imaging Archive (NBIA), complete with an NBIA Data Retriever, which is always down for maintenance. Since I can't get the thing, I must presume the cart icon means they are working on some way for me to pay (beyond my taxes) for that as well. Nonetheless, DeepLesion (https://nihcc.app.box.com/v/DeepLesion), ChestXray 8 and ChestXray 14 have made their way to various dataset groupie websites and gone open: if you look, you can find and torrent them. The question is: when will someone set the MIMIC Chest Xray dataset free? (https://physionet.nlm.nih.gov/physiobank/database/mimiccxr/)
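If you do grab one of these datasets from a mirror or torrent instead of the official source, it is worth verifying the files against checksums from the original distributor, when they publish any (I can't promise the NIH does for every set). Here is a minimal standard-library sketch. The function names are hypothetical, and SHA-256 is just the default; pass whatever algorithm the published checksums use.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte zips never sit fully in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare against a published checksum, ignoring case and stray whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

A mismatch means a truncated or altered download, so fetch that file again before training on it.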

Clearly the NIH did not get the memo. I mean literally: THE MEMO from the US government:

https://project-open-data.cio.gov/policy-memo/

Which includes such nuggets of hope as:

"this Memorandum requires agencies to collect or create information in a way that supports downstream information processing and dissemination activities" and "Making information resources accessible, discoverable, and usable by the public can help fuel entrepreneurship, innovation, and scientific discovery – all of which improve Americans’ lives and contribute significantly to job creation"///

The NSF seems to have gotten the memo, so here's to hoping that someone besides the one tiny branch that published some serious CXR data gets the memo too. (Here is that CXR gold: https://ceb.nlm.nih.gov/repositories/tuberculosis-chest-x-ray-image-data-sets/)


 
 
 



©2018 by Dr. Candace Makeda Moore.
