NSCLC Radiogenomics (Stanford / AMC)

Overview

A multi-modal radiogenomics dataset of 211 NSCLC patients from two cohorts: 162 in the R01 cohort (Stanford / Stanford-affiliated institutions) and 49 in the AMC cohort (Amsterdam Medical Center). The dataset pairs pretreatment CT and FDG PET/CT imaging with clinical mutational testing (EGFR, KRAS, ALK / EML4 fusion), bulk RNA-seq (n=130), Illumina HT-12 microarray (n=26), semantic CT annotations (n=190), and longitudinal clinical outcomes. Published by Bakr et al. 2018 and hosted on TCIA (DOI 10.7937/K9/TCIA.2017.7hs46erv); gene expression data also deposited at GEO (GSE28827 microarray; GSE103584 RNA-seq). PMID:30325352

Composition

  • Cancer types: NSCLC — 172 LUAD (adenocarcinoma), 35 LUSC (squamous cell carcinoma), 4 NOS.
  • R01 cohort: n=162 (38F/124M, mean age 68, range 42–86); AMC cohort: n=49 (33F/16M, mean age 67, range 24–80).
  • Imaging: CT for all 211; FDG PET/CT for 201; CT tumour segmentations for 144; semantic CT annotations for 190.
  • Molecular: EGFR mutational testing (n=206), KRAS (n=205), ALK by FISH (n=196); RNA-seq (n=130, Illumina HiSeq 2500, aligned to hg19 with STAR, quantified with Cufflinks FPKM); microarray (n=26 Illumina HT-12, 17 overlapping with RNA-seq).
  • Clinical: survival, recurrence, smoking history, pathological TNM stage (n=161), histopathological grade (n=162), adjuvant and systemic therapy.
  • Semantic annotations: 28 nodule analysis features + parenchymal features encoded in AIM format using ePAD; annotated by a single radiologist with >20 years experience. PMID:30325352

Assays / panels (linked)

  • CT radiomics and FDG-PET/CT imaging.
  • RNA-seq (Illumina HiSeq 2500, TruSeq Total Stranded RNA + Ribo-Zero, STAR v2.3 / Cufflinks v2.0.2, hg19).
  • Illumina HT-12 gene expression microarray (GEO: GSE28827).
  • SNaPshot multiplex PCR for EGFR / KRAS; FISH for EML4-ALK.

Papers using this cohort

  • PMID:30325352 — Bakr et al. 2018, Scientific Data: primary dataset descriptor; 211 NSCLC patients with paired CT, PET/CT, RNA-seq, and mutational data; TCIA DOI 10.7937/K9/TCIA.2017.7hs46erv.

Notable findings derived from this cohort

  • 130 of 211 subjects have the complete quartet of clinical, CT, PET/CT, and RNA-seq data, enabling multi-modal radiogenomic analysis. PMID:30325352
  • EGFR and KRAS mutation status were clinically tested in 206/211 and 205/211 subjects respectively using SNaPshot PCR covering EGFR exons 18–21 and KRAS exon 2; prior published work used subsets to build radiomic–genomic association maps. PMID:30325352

Sources

  • TCIA collection: NSCLC-Radiogenomics — TCIA DOI 10.7937/K9/TCIA.2017.7hs46erv
  • GEO microarray: GSE28827; GEO RNA-seq: GSE103584
  • PMID:30325352 — Bakr et al. 2018, Scientific Data, DOI 10.1038/sdata.2018.202.

This page was processed by crosslinker on 2026-05-04.