Imaging and clinical data archive for head and neck squamous cell carcinoma patients treated with radiotherapy
PMID: 30179230 · DOI: 10.1038/sdata.2018.173 · Journal: Scientific Data (2018)
TL;DR
Grossberg et al. release a curated, single-institution imaging and clinical data archive of 215 head and neck squamous cell carcinoma (HNSCC) patients treated with curative-intent radiotherapy at MD Anderson Cancer Center between 2003 and 2013. The collection, published on The Cancer Imaging Archive (TCIA) as the HNSCC collection, pairs 433,384 DICOM files (diagnostic CT, PET-CT, MRI; CT simulation; RTSTRUCT, RTPLAN, RTDOSE) with date-matched demographic, risk-factor, staging, recurrence, and survival data, plus pre- and post-treatment CT-derived skeletal muscle and adipose body-composition measurements at the L3 vertebra. It is the first publicly shared HNSCC RT dataset that supports body composition as either a risk factor or endpoint (PMID:30179230).
Cohort & data
- Patients (n=215) selected from 2,840 consecutive HNSCC patients treated with curative-intent radiotherapy at The University of Texas MD Anderson Cancer Center between October 1, 2003 and August 31, 2013. Eligibility required whole-body PET-CT or abdominal CT both before and after RT.
- Subsite distribution: 156/215 (73%) oropharyngeal primaries; among oropharynx cases the most common sites were base of tongue (51%) and tonsil (43%). 85.5% male.
- Treatment: 127/215 (59%) received concurrent chemotherapy, of whom 98% received platinum-based regimens (e.g., cisplatin). 27/215 (13%) received postoperative RT. Mean RT dose was 68.66 Gy (range 56–72 Gy) in 28–40 fractions. Techniques included 2D RT, IMRT matched to 2D using half-beam block, whole-field IMRT, and volumetric arc therapy.
- Imaging: whole-body PET-CT in 212/215 (98.6%) at baseline; follow-up PET-CT in 213/215 (99.1%) and abdominal CT in 2 patients. The median interval between RT completion and follow-up imaging was 2.77 months (IQR 1.93–6.57). PET-CT scanners: GE Discovery RX, Discovery ST, Discovery STE; CT scanners: GE LightSpeed, Discovery CT750HD; RT simulation scanners: Picker PQ 5000, Marconi MX8000, Philips Brilliance Big Bore; MRI scanners: GE Genesis Signa, Signa Excite, Signa HDxt.
- Dataset: published as the TCIA HNSCC collection (DOI 10.7937/K9/TCIA.2017.umz8dv6s); 433,384 DICOM files across 3,225 series and 765 studies, plus a single XLSX with demographics, treatment, outcomes, and body composition.
- HPV status was assessed by in situ hybridization for high-risk HPV subtypes; staging used AJCC 7th edition TNM. Stage IVC patients (distant metastases at diagnosis) were excluded.
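The proportions quoted in this section can be rechecked from the reported counts. The following is a minimal Python sketch: the counts are taken from the bullets above, while the variable names and the one-decimal rounding are choices made here for illustration only.

```python
# Counts reported in the descriptor; every numerator is out of the 215-patient cohort.
COHORT = 215
SCREENED = 2840

counts = {
    "oropharyngeal primary": 156,
    "concurrent chemotherapy": 127,
    "postoperative RT": 27,
    "baseline PET-CT": 212,
    "follow-up PET-CT": 213,
}


def pct(numerator: int, denominator: int) -> float:
    """Return a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


# Share of the included cohort for each reported count.
cohort_pct = {label: pct(n, COHORT) for label, n in counts.items()}

# Share of the screened population that met the imaging eligibility criteria.
screening_yield = pct(COHORT, SCREENED)

for label, value in cohort_pct.items():
    print(f"{label}: {value}%")
print(f"included of screened: {screening_yield}%")
```

The one-decimal values agree with the figures on this page once rounded the same way (for example, 156/215 is shown here as 73%).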
Key findings
This is a data descriptor; the paper’s contributions are infrastructural rather than hypothesis-testing.
- Scale and completeness: 215 patients, 433,384 DICOM files, 3,225 series, 765 studies. Each patient contributes a pre-treatment diagnostic CT or PET-CT, a simulation CT, RTPLAN, RTSTRUCT, RTDOSE, and a post-treatment diagnostic CT or PET-CT; recurrence imaging is included where applicable.
- Body composition annotations (unique to this archive): pre- and post-RT skeletal muscle and adipose cross-sectional area at the third lumbar vertebra (L3), measured on Pinnacle v9.6 using Hounsfield-unit thresholds of −29 to 150 for skeletal muscle and −190 to −30 for adipose tissue, normalized to height² as the lumbar skeletal muscle index (SMI) and adipose index (ADI) in cm²/m². Patients are dichotomized as skeletal-muscle “depleted” (SMI <52.4 cm²/m² men; <38.5 cm²/m² women) or “not depleted” at each timepoint. Total lean body mass and fat mass are derived from L3 cross-sectional areas via published formulae: LBM(kg) = 0.3 × [skeletal muscle CSA at L3 (cm²)] + 6.06; FM(kg) = 0.042 × [total adipose CSA at L3 (cm²)] + 11.2.
- Curation pipeline: data transmission and de-identification used the RSNA MIRC Clinical Trial Processor (CTP) following DICOM PS 3.15 Appendix E (Basic Attribute Confidentiality Profile), with Tag Sniffer scans pre- and post-script execution and curator review. RT planning DICOMs were re-exported from MD Anderson’s Pinnacle archive (Philips Radiation Oncology Systems, v9) after primary clinical use; RTDOSE and RTPLAN were recalculated during export.
- Technical validation: performed using Posda Tools (open-source Perl-based DICOM curation toolkit, ~62,817 lines + 38,569 lines of programs; 165 include files, 328 programs). 195/215 (90.7%) subjects had series/study inconsistency problems on initial intake; the curated archive required 2–6 revisions per subject (mean 3.17) to resolve duplicates, mis-linked RTSTRUCT/RTDOSE/RTPLAN, and Frame of Reference UID mismatches between RT structure sets and source CT.
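The body-composition indices described in the list above follow directly from the L3 cross-sectional areas and patient height. The sketch below is illustrative only: the formulae and cut-offs are those quoted on this page, but the function name, the `"male"`/`"female"` coding, and the example measurements are assumptions, not the archive's spreadsheet schema or real patient values.

```python
from dataclasses import dataclass

# Sex-specific skeletal muscle depletion cut-offs quoted in the descriptor (cm^2/m^2).
SMI_CUTOFF = {"male": 52.4, "female": 38.5}


@dataclass
class BodyComposition:
    smi: float                # lumbar skeletal muscle index, cm^2/m^2
    adi: float                # adipose index, cm^2/m^2
    depleted: bool            # True when SMI is below the sex-specific cut-off
    lean_body_mass_kg: float  # 0.3 * muscle CSA + 6.06
    fat_mass_kg: float        # 0.042 * adipose CSA + 11.2


def body_composition(muscle_csa_cm2: float, adipose_csa_cm2: float,
                     height_m: float, sex: str) -> BodyComposition:
    """Derive the indices from L3 cross-sectional areas (hypothetical helper)."""
    if height_m <= 0:
        raise ValueError("height_m must be positive")
    if sex not in SMI_CUTOFF:
        raise ValueError("sex must be 'male' or 'female'")
    height_sq = height_m ** 2
    smi = muscle_csa_cm2 / height_sq
    adi = adipose_csa_cm2 / height_sq
    return BodyComposition(
        smi=smi,
        adi=adi,
        depleted=smi < SMI_CUTOFF[sex],
        lean_body_mass_kg=0.3 * muscle_csa_cm2 + 6.06,
        fat_mass_kg=0.042 * adipose_csa_cm2 + 11.2,
    )


# Made-up measurements for illustration, not taken from the archive.
example = body_composition(muscle_csa_cm2=150.0, adipose_csa_cm2=300.0,
                           height_m=1.75, sex="male")
print(example)
```

With these invented inputs the SMI is about 49 cm²/m², which falls below the 52.4 cm²/m² cut-off for men, so the hypothetical patient would be labelled "depleted".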
Genes & alterations
- No germline or somatic gene-level analyses are reported. HPV status (high-risk subtypes by in situ hybridization) is captured as a clinical variable in the metadata table but not analyzed in this descriptor.
Clinical implications
- The paper makes no new clinical claims of its own; it enables downstream work. The authors explicitly note prior use of these data to evaluate the association between body composition and oncologic outcomes in radiotherapy-treated HNSCC (Grossberg et al., JAMA Oncol 2016, PMID:26891703; not in this corpus).
- The combination of full RT planning DICOMs (RTPLAN/RTSTRUCT/RTDOSE) with date-matched outcomes supports inter-institutional benchmarking of dose-volume relationships and enables modeling of skeletal muscle depletion as a prognostic factor or treatment endpoint in HNSCC RT cohorts.
Limitations & open questions
- Single-institution, single-era: all 215 patients were treated at MD Anderson between 2003 and 2013 using equipment and planning practices of that period (Pinnacle v9; AJCC 7th edition staging). Generalizability to other centers, modern equipment, and AJCC 8th edition (which separates HPV-positive oropharyngeal cancer) is not addressed.
- Selection bias: the 215 patients required both pre- and post-RT whole-body PET-CT or abdominal CT at MD Anderson — a 7.6% subset of the 2,840-patient screening population. Stage IVC patients were excluded by design.
- Cohort skew: heavily oropharyngeal (73%) and male (85.5%); subsite-specific or sex-stratified body-composition analyses may be underpowered for non-oropharyngeal sites or female patients.
- Body composition methodology: contours reviewed by a single radiation oncologist with 5 years of post-residency experience; inter-rater reliability is not reported in this descriptor. SMI thresholds (52.4/38.5 cm²/m²) are inherited from prior cancer-cohort literature and were not re-derived for HNSCC.
- No molecular data: the archive contains no sequencing, expression, or mutation data — only imaging plus clinical metadata. HPV status is a categorical clinical field, not a molecular profile.
- DICOM curation residuals: although Posda Tools resolved series/study inconsistencies, the descriptor does not report a residual-error rate for RT structure-to-CT linkage after curation.
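A downstream user who wants a residual-error figure for structure-to-CT linkage could compute one from series-level metadata. The sketch below is an assumption-laden illustration, not the Posda Tools method: it presumes that each series has already been reduced to a small record holding its Series Instance UID, modality, Frame of Reference UID, and (for RTSTRUCT) the CT series it references. The record type, function name, and placeholder UIDs are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class SeriesRecord(NamedTuple):
    series_uid: str
    modality: str                    # "CT" or "RTSTRUCT" in this sketch
    frame_of_reference_uid: str      # declared (CT) or referenced (RTSTRUCT)
    referenced_series_uid: str = ""  # RTSTRUCT only: the CT series it was drawn on


def find_linkage_problems(records: List[SeriesRecord]) -> List[Tuple[str, str]]:
    """Return (RTSTRUCT series UID, problem description) pairs."""
    ct_by_uid = {r.series_uid: r for r in records if r.modality == "CT"}
    problems = []
    for rec in records:
        if rec.modality != "RTSTRUCT":
            continue
        ct = ct_by_uid.get(rec.referenced_series_uid)
        if ct is None:
            problems.append((rec.series_uid, "referenced CT series not found"))
        elif ct.frame_of_reference_uid != rec.frame_of_reference_uid:
            problems.append((rec.series_uid, "Frame of Reference UID mismatch"))
    return problems


# Placeholder UIDs for illustration only.
records = [
    SeriesRecord("ct-1", "CT", "for-A"),
    SeriesRecord("rs-1", "RTSTRUCT", "for-A", "ct-1"),    # consistent
    SeriesRecord("rs-2", "RTSTRUCT", "for-B", "ct-1"),    # mismatched frame
    SeriesRecord("rs-3", "RTSTRUCT", "for-A", "ct-404"),  # dangling reference
]

problems = find_linkage_problems(records)
n_rtstruct = sum(1 for r in records if r.modality == "RTSTRUCT")
residual_rate = len(problems) / n_rtstruct
print(problems, residual_rate)
```

Keeping the check on plain records rather than on DICOM objects leaves the choice of DICOM reader open; the records would need to be populated from the structure set's referenced Frame of Reference and series identifiers and from the CT headers.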
Citations from this paper used in the wiki
- “we detail the collection and processing of computed tomography based imaging in 215 patients with head and neck squamous cell carcinoma that were treated with radiotherapy” (Abstract).
- “the records of 2840 consecutive patients with HNSCC treated with curative-intent RT at The University of Texas MD Anderson Cancer Center from October 1, 2003, to August 31, 2013, were screened. Those patients with whole-body PET-CT or abdominal CT scans performed within the parent institution both before and after RT were included (n=215)” (Methods, Patient selection).
- “One hundred twenty seven patients (59%) received concurrent chemotherapy, with 98% of these patients receiving platinum-based systemic treatment” (Methods, Patient selection).
- “Mean radiation dose was 68.66 Gy (range, 56–72 Gy) delivered in 28–40 daily fractions” (Methods, Patient selection).
- “The HNSCC collection is a dataset consisting of 433,384 DICOM files from 3,225 series and 765 studies collected from 215 patients” (Data Records).
- “Skeletal muscle was defined by a Hounsfield unit range of −29 to 150, and adipose tissue by a range of −190 to −30 Hounsfield units” (Methods, Body Composition Analysis).
- “Skeletal muscle depletion was defined as an SMI<52.4 cm²/m² for men and <38.5 cm²/m² for women” (Methods, Body Composition Analysis).
- “initial analysis showed that 195 of the 215 subjects had series and study inconsistency problems. At completion of curation, the minimum number of revisions for a subject was 2, the maximum was 6, and the average number of revisions was 3.17” (Technical Validation).
This page was processed by paper-compiler on 2026-04-15.