Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features
PMID: 28872634 · DOI: 10.1038/sdata.2017.117 · Journal: Scientific Data (2017)
TL;DR
Bakas et al. release expert tumor segmentation labels and a panel of >700 radiomic features for the pre-operative multimodal MRI (mMRI) scans of the TCGA-GBM (n=135) and TCGA-LGG (n=108) glioma collections hosted in The Cancer Imaging Archive (TCIA). Sub-region labels (enhancing tumor [ET], non-enhancing tumor core [NET], peritumoral edema [ED]) were generated by the BraTS’15-winning GLISTRboost method and then manually corrected by a board-certified neuroradiologist. The resource is intended to enable reproducible imaging analyses and radiogenomic studies linking imaging phenotypes to TCGA molecular profiles for GB and DIFG patients.
Cohort & data
- 243 pre-operative mMRI scans total: 135 from TCGA-GBM (originating set n=262, 8 institutions) and 108 from TCGA-LGG (originating set n=199, 5 institutions).
- Cancer types: glioblastoma (GB) and lower-grade diffuse glioma (DIFG).
- Required modalities per case: T1, T1-Gd (post-contrast), T2, T2-FLAIR.
- Imaging hosted at TCIA; molecular/clinical counterparts are the cBioPortal studies gbm_tcga and lgg_tcga.
- Scans were retrospective standard-of-care acquisitions with no uniform imaging protocol; scanners spanned GE, Siemens, Philips, and Hitachi at field strengths 0.5–3 T.
- Pre-processing pipeline: re-orientation to LPS, affine co-registration to a common T1 template via FSL FLIRT, resampling to 1 mm³ voxel resolution, skull-stripping with BET (with MASS/MUSE as fallback), SUSAN smoothing for the segmentation pipeline, and ITK histogram matching across subjects PMID:28872634.
- Methods released alongside the data: GLISTRboost for segmentation and CaPTk for seed-point initialization and feature extraction; segmentations were evaluated under BraTS challenge protocols.
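The four-modality requirement listed above can be sketched as a simple completeness check. This is a minimal illustration, not the authors' selection procedure (which also relied on radiological assessment of pre-operative status); the manifest format, the case IDs, and the function name `eligible_cases` are hypothetical.

```python
# Modalities required per case, as listed in the cohort description.
REQUIRED_MODALITIES = {"T1", "T1-Gd", "T2", "T2-FLAIR"}


def eligible_cases(manifest):
    """Return the sorted IDs of cases that have every required modality.

    `manifest` maps a case ID to the modality names available for it
    (a hypothetical format chosen for this sketch).
    """
    return sorted(
        case_id
        for case_id, modalities in manifest.items()
        if REQUIRED_MODALITIES <= set(modalities)
    )


# Toy manifest with made-up case IDs.
manifest = {
    "case-A": ["T1", "T1-Gd", "T2", "T2-FLAIR"],
    "case-B": ["T1", "T2", "T2-FLAIR"],  # missing T1-Gd, so excluded
    "case-C": ["T1", "T1-Gd", "T2", "T2-FLAIR", "DWI"],  # extras are fine
}
print(eligible_cases(manifest))
```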
Key findings
- On the subset of BraTS’15 GBM training cases identified as pre-operative, computer-aided segmentation by GLISTRboost achieved a median DICE of 0.92 (IQR 0.88–0.94) for whole tumor (WT), 0.88 (IQR 0.81–0.93) for tumor core (TC), and 0.88 (IQR 0.81–0.91) for enhancing tumor (ET) PMID:28872634.
- 95th-percentile Hausdorff distances (in mm) were 3.61 (IQR 2.39–8.15) for WT, 4.06 (IQR 2.39–7.29) for TC, and 2.00 (IQR 1.41–2.83) for ET PMID:28872634.
- Median Jaccard agreement between automated GLISTRboost labels and the released manually-revised labels across all 243 cases was 0.96 (mean 0.93 ± 0.10) for WT, 0.87 (0.78 ± 0.23) for TC, and 0.86 (0.73 ± 0.29) for ET; manual revision therefore changed the core and enhancing labels much more than the whole-tumor extent PMID:28872634.
- Manual revision rules treated DIFG cases without an apparent ET region as NET-only (or NET+ED), reflecting the lower blood-brain-barrier disruption typical of low-grade glioma biology PMID:28872634.
- A panel of >700 radiomic features was extracted volumetrically from the manually-revised labels, spanning intensity, volumetric, morphologic, histogram-based, textural (GLCM, GLRLM, GLSZM, NGTDM), wavelet-based, and spatial features, plus glioma growth-model parameters PMID:28872634.
- N=200 GBM and N=44 LGG cases overlap with the BraTS’15 training set, and N=23 GBM / N=15 LGG with the BraTS’15 testing set, allowing direct cross-comparison with the BraTS leaderboard PMID:28872634.
- The manually-revised labels provided here became the reference labels used in the BraTS’17 challenge PMID:28872634.
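The overlap and distance metrics quoted in these findings can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the BraTS evaluation code: the integer label codes (1, 2, 4), the composition of WT and TC from the sub-region labels (the usual BraTS convention), the empty-set convention, and the nearest-rank percentile are all choices made here, and the toy label sequences are invented.

```python
import math

# Assumed integer codes for the sub-region labels; the summary does not state them.
NET, ED, ET = 1, 2, 4


def evaluated_regions(labels):
    """Map a flat sequence of voxel labels to WT/TC/ET sets of voxel indices."""
    def voxels(codes):
        return {i for i, value in enumerate(labels) if value in codes}
    return {
        "WT": voxels({NET, ED, ET}),  # whole tumor: every tumor label
        "TC": voxels({NET, ET}),      # tumor core: excludes edema
        "ET": voxels({ET}),           # enhancing tumor only
    }


def dice(a, b):
    """Dice = 2|A & B| / (|A| + |B|); two empty sets count as agreement here."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard = |A & B| / |A | B|; two empty sets count as agreement here."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def hd95(points_a, points_b):
    """Larger of the two directed 95th-percentile nearest-neighbor distances.

    Inputs are non-empty lists of coordinates (e.g. boundary voxels in mm).
    """
    def directed(src, dst):
        nearest = sorted(min(math.dist(p, q) for q in dst) for p in src)
        rank = max(1, math.ceil(0.95 * len(nearest)))  # nearest-rank percentile
        return nearest[rank - 1]
    return max(directed(points_a, points_b), directed(points_b, points_a))


# Toy 1-D "volumes": one edema voxel is relabeled as enhancing tumor.
automated = [0, 1, 1, 2, 2, 4, 4, 0]
revised = [0, 1, 1, 2, 4, 4, 4, 0]
auto_r, rev_r = evaluated_regions(automated), evaluated_regions(revised)
for name in ("WT", "TC", "ET"):
    print(name, round(dice(auto_r[name], rev_r[name]), 3),
          round(jaccard(auto_r[name], rev_r[name]), 3))
print(hd95([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (3, 0, 0)]))
```

In the toy case the whole-tumor sets agree perfectly while TC and ET do not, the same qualitative pattern as the Jaccard finding above. For any pair of sets the two overlap scores are related by J = D / (2 - D), so a Dice value is always at least as large as the corresponding Jaccard value.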
Genes & alterations
This paper does not report gene-level findings. It is an imaging data-resource paper that intentionally enables downstream radiogenomic studies in which the released MRI segmentations and radiomic features can be correlated with TCGA molecular profiles for GB and DIFG (e.g., the cBioPortal studies gbm_tcga and lgg_tcga). The authors cite their own prior work using analogous radiomic features to non-invasively predict mutation status (e.g., EGFRvIII), but no new gene-level analyses are performed in this manuscript PMID:28872634.
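A downstream radiogenomic analysis would first need to link imaging-derived features to molecular records for the same patient. The sketch below assumes that both tables are keyed by TCGA barcodes, with the molecular table using sample-level barcodes that extend the patient-level prefix; the feature name `et_volume_mm3`, the field `alteration_present`, the barcodes, and all values are invented for illustration and do not come from the released data.

```python
def patient_id(barcode):
    """Keep the first three dash-separated fields of a TCGA barcode (patient level)."""
    return "-".join(barcode.split("-")[:3])


def link_by_patient(radiomics, molecular):
    """Merge per-patient radiomic features with molecular records.

    `radiomics` is keyed by patient-level barcode and `molecular` by
    sample-level barcode. Patients missing from either side are dropped;
    if a patient has several samples, the last one seen overwrites earlier ones.
    """
    linked = {}
    for barcode, profile in molecular.items():
        pid = patient_id(barcode)
        if pid in radiomics:
            linked[pid] = {**radiomics[pid], **profile}
    return linked


# Toy tables with placeholder barcodes and made-up values.
radiomics = {
    "TCGA-00-0001": {"et_volume_mm3": 1200.0},
    "TCGA-00-0002": {"et_volume_mm3": 0.0},
}
molecular = {
    "TCGA-00-0001-01": {"alteration_present": True},
    "TCGA-00-0003-01": {"alteration_present": False},
}
print(link_by_patient(radiomics, molecular))
```

Only the patient present in both tables survives the join, which is the behavior one would usually want before any correlation analysis; a real study would also need to decide how to handle patients with multiple samples.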
Clinical implications
- Provides a reproducible, expert-validated imaging substrate enabling downstream predictive, prognostic, and diagnostic biomarker studies in glioma without requiring image-analysis expertise from end users PMID:28872634.
- Standardized pre-processed mMRI volumes plus sub-region labels (ET, NET, ED) support both clinical correlative work (e.g., associating imaging phenotypes with outcome) and benchmarking of automated segmentation methods for surgical/treatment planning PMID:28872634.
- The authors do not themselves claim a new clinical biomarker; the contribution is infrastructural.
Limitations & open questions
- Cohorts are retrospective, multi-institutional standard-of-care scans with no harmonized acquisition protocol; scanner vendor, field strength, and sequence parameters vary substantially across cases PMID:28872634.
- Pre- vs post-operative classification was based on radiological assessment (absence of skull defect or operative cavity) rather than reliable surgical metadata; radiology reports were not available PMID:28872634.
- Non-uniform intensity bias correction (N3/N4-style) was deliberately not applied because it obliterated the T2-FLAIR signal, which may affect downstream feature reproducibility PMID:28872634.
- Boundaries between NET and ED are intrinsically uncertain; ambiguous regions were left as segmented by GLISTRboost rather than corrected, introducing a known noise floor in fine-grained labels PMID:28872634.
- The biological significance of the >700 individual radiomic features is explicitly stated as unknown; features are released as-is for hypothesis generation PMID:28872634.
- Quantitative BraTS’15 test-set performance for GLISTRboost was withheld by the challenge organizers at time of publication, so reported DICE/Hausdorff numbers come from cross-validated training-set evaluation only PMID:28872634.
Citations from this paper used in the wiki
- “we release segmentation labels and radiomic features for all pre-operative multimodal magnetic resonance imaging (MRI) (n=243) of the multi-institutional glioma collections of The Cancer Genome Atlas (TCGA)” — Abstract.
- “Pre-operative scans were identified in both glioblastoma (TCGA-GBM, n=135) and low-grade-glioma (TCGA-LGG, n=108) collections via radiological assessment.” — Abstract.
- “The median DICE values with their corresponding inter-quartile ranges (IQR) for the three evaluated regions, i.e., WT, TC, ET, were equal to 0.92 (IQR: 0.88–0.94), 0.88 (IQR: 0.81–0.93) and 0.88 (IQR: 0.81–0.91), respectively.” — Technical Validation.
- “The median (mean±std.dev) Jaccard values for the three regions of interest i.e., WT, TC, ET, were equal to 0.96 (0.93±0.1), 0.87 (0.78±0.23), and 0.86 (0.73±0.29), respectively.” — Technical Validation.
- “An extensive panel of more than 700 radiomic features is extracted volumetrically (in 3D), based on the manually-revised labels of each tumor sub-region” — Radiomic features panel.
- “the manually-revised segmentation labels provided in [Data Citation 3 and Data Citation 4] are included in the datasets of the BraTS’17 challenge” — Data Records.
This page was processed by crosslinker on 2026-05-04.