RADCURE: An open-source head and neck cancer CT dataset for clinical radiation therapy insights
PMID: 38362943 · DOI: 10.1002/mp.16971 · Journal: Medical Physics (2024)
TL;DR
Welch et al. describe RADCURE, one of the largest publicly available head and neck cancer (HNC) CT imaging datasets, comprising 3346 patients treated at Princess Margaret Cancer Centre. Each case includes a radiation therapy (RT) simulation CT, manually contoured target volumes (gross primary tumor, gross nodal volumes) and 19 organs-at-risk in DICOM RT-STRUCT format, plus linked demographic, clinical, and outcomes data. The dataset is distributed via The Cancer Imaging Archive and is intended to support radiomics, machine learning, and prognostic-model research in head and neck radiation oncology.
Cohort & data
- Patients: 3346 head and neck cancer (HNC) patients treated at Princess Margaret Cancer Centre, Toronto.
- Modality: RT simulation computed tomography scans acquired on systems from three different manufacturers under standard clinical protocols.
- Annotations: Manually generated target and organ-at-risk contours, reviewed at weekly RT quality assurance rounds; standardized nomenclature for gross primary tumor, gross nodal volumes, and 19 organs-at-risk.
- Clinical metadata: Demographic, clinical, and treatment information, including staging based on the 7th edition TNM classification.
- Demographics: Median age 63; 80% male.
- Disease subsites: Oropharyngeal 50%, laryngeal 25%, nasopharyngeal 12%, hypopharyngeal 5%; the remaining share (roughly 8%) is not itemized in the abstract.
- Follow-up: Median 5 years; 60% surviving at last follow-up.
- Distribution: Images and contours released as DICOM CT and RT-STRUCT; clinical data as CSV. Publicly accessible via The Cancer Imaging Archive.
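As a rough illustration of working with the clinical CSV, the sketch below tallies patients by disease subsite using only the Python standard library. The column names (`patient_id`, `subsite`) and the sample rows are hypothetical placeholders, not the actual RADCURE schema; check the header of the file distributed through The Cancer Imaging Archive before adapting it.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the clinical CSV; the real column names may differ.
SAMPLE_CSV = """patient_id,subsite
P001,Oropharynx
P002,Larynx
P003,Oropharynx
P004,Nasopharynx
"""

def subsite_fractions(csv_file, column="subsite"):
    """Return {subsite: fraction of rows} for a clinical CSV file object."""
    counts = Counter(row[column].strip() for row in csv.DictReader(csv_file))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}

# Print subsites from most to least frequent.
fractions = subsite_fractions(io.StringIO(SAMPLE_CSV))
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {frac:.0%}")
```

To run this against a real file, replace `io.StringIO(SAMPLE_CSV)` with an opened file handle and set `column` to whichever field holds the disease site.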
Key findings
This is a dataset descriptor rather than a hypothesis-testing study; the abstract reports cohort and acquisition characteristics rather than statistical results.
- Assembled and curated 3346 HNC RT planning CTs at a single institution into a research-grade public release PMID:38362943.
- Built a custom data-mining and processing system to extract imaging and structure-set data from the institution’s RT planning and oncology information systems and link each scan to longitudinal clinical outcomes PMID:38362943.
- Standardized RT-STRUCT nomenclature across the cohort to improve interoperability for downstream analyses PMID:38362943.
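To make the idea of standardized contour nomenclature concrete, here is a minimal sketch of mapping free-text structure labels to a single canonical name. The canonical names and alias sets are invented for illustration; the abstract does not list RADCURE's actual naming scheme, and this is not the authors' processing system.

```python
import re

# Hypothetical alias table; RADCURE's actual standardized names are not given in the abstract.
CANONICAL_ALIASES = {
    "GTV_Primary": {"gtv", "gtvp", "gtvprimary"},
    "GTV_Nodal": {"gtvn", "gtvnode", "gtvnodal"},
    "SpinalCord": {"cord", "spinalcord", "spcord"},
}

def normalize_roi_name(raw_name):
    """Map a free-text contour label to a canonical name, or None if unknown."""
    key = re.sub(r"[^a-z0-9]", "", raw_name.lower())  # drop case, spaces, punctuation
    for canonical, aliases in CANONICAL_ALIASES.items():
        if key in aliases:
            return canonical
    return None

for label in ["GTV-p", "Spinal Cord", "gtv node", "Unknown ROI"]:
    print(label, "->", normalize_roi_name(label))
```

Returning `None` for unrecognized labels, rather than guessing, keeps unmapped structures visible so they can be reviewed by hand.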
Genes & alterations
- None reported. RADCURE is an imaging/clinical dataset; no molecular or genomic data are included in the release described.
Clinical implications
- Provides a large, publicly available resource for developing and benchmarking radiomics and machine-learning models for head and neck cancer treatment planning, organ-at-risk segmentation, and prognostic modeling PMID:38362943.
- Authors highlight applications in non-invasive biomarker discovery and prognostic-model development for HNC radiotherapy PMID:38362943.
Limitations & open questions
- Source depth. This wiki page is compiled from the PubMed abstract only; the full Medical Physics manuscript was not available to the compiler. Detailed acquisition parameters, contour quality-assurance metrics, train/validation splits, and any benchmark results are not captured here.
- Single-institution cohort. All scans originate from Princess Margaret Cancer Centre, with three CT manufacturers represented. Generalizability of models trained on RADCURE to other institutions, scanner vendors, or contouring conventions is not addressed in the abstract.
- Staging vintage. Clinical metadata uses the 7th edition TNM system; users who need current 8th-edition staging (notably the HPV-status-aware oropharyngeal restaging) must remap the stages themselves, which for oropharyngeal cases requires HPV or p16 status.
- No molecular data. HPV status, p16 IHC, and genomic profiling are not described as part of the released package in the abstract; downstream radiogenomic studies would require linkage to external molecular cohorts.
- Class imbalance across subsites. Oropharyngeal cases dominate (50%); hypopharyngeal cases are sparse (5%), which may limit subsite-specific model development.
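To put the subsite imbalance in absolute terms, the reported percentages can be converted to approximate case counts. This is a back-of-the-envelope sketch: the abstract gives rounded percentages only, so the resulting counts are estimates, not figures from the paper.

```python
TOTAL_PATIENTS = 3346  # cohort size reported in the abstract

# Rounded subsite shares reported in the abstract (they sum to 92%).
SUBSITE_SHARE = {
    "oropharyngeal": 0.50,
    "laryngeal": 0.25,
    "nasopharyngeal": 0.12,
    "hypopharyngeal": 0.05,
}

def approximate_counts(total, shares):
    """Estimate case counts per subsite from rounded percentage shares."""
    return {name: round(total * share) for name, share in shares.items()}

counts = approximate_counts(TOTAL_PATIENTS, SUBSITE_SHARE)
unlisted = TOTAL_PATIENTS - sum(counts.values())  # share not itemized in the abstract
for name, n in counts.items():
    print(f"{name}: ~{n}")
print(f"not itemized: ~{unlisted}")
```

Under these assumptions the hypopharyngeal subset is about one tenth the size of the oropharyngeal subset, which is worth keeping in mind when planning stratified splits or subsite-specific evaluation.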
Citations from this paper used in the wiki
- “RADCURE encompasses data from 3346 patients, featuring computed tomography (CT) RT simulation images with corresponding target and organ-at-risk contours.” — Abstract, Acquisition and validation methods.
- “Half of the cohort is diagnosed with oropharyngeal cancer, while laryngeal, nasopharyngeal, and hypopharyngeal cancers account for 25%, 12%, and 5% of cases, respectively.” — Abstract, Acquisition and validation methods.
- “The median patient age is 63, with the final dataset including 80% males… The median duration of follow-up is five years, with 60% of the cohort surviving until the last follow-up point.” — Abstract, Acquisition and validation methods.
- “We have standardized the nomenclature for individual contours—such as the gross primary tumor, gross nodal volumes, and 19 organs-at-risk—to enhance the RT-STRUCT files’ utility.” — Abstract, Data format and usage notes.
- “This comprehensive dataset is publicly accessible via The Cancer Imaging Archive.” — Abstract, Data format and usage notes.
This page was processed by crosslinker on 2026-05-04.