The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

Authors

Curtis C

Shah SP

Chin SF

Turashvili G

Rueda OM

Dunning MJ

Speed D

Lynch AG

Samarajiwa S

Yuan Y

Graf S

Ha G

Haffari G

Bashashati A

Russell R

McKinney S

METABRIC Group

Langerod A

Green A

Provenzano E

Wishart G

Pinder S

Watson P

Markowetz F

Murphy L

Ellis I

Purushotham A

Borresen-Dale AL

Brenton JD

Tavare S

Caldas C

Aparicio S

DOI

PMID: 22522925 · DOI: 10.1038/nature10983 · Journal: Nature (2012)

TL;DR

This landmark METABRIC study performed an integrated analysis of copy number aberrations (CNAs) and gene expression in approximately 2,000 primary breast tumours (997 discovery, 995 validation; 1,992 in total) with long-term clinical follow-up. By jointly clustering DNA copy number and RNA expression data, the authors identified 10 novel integrative clusters (IntClust 1–10) with distinct genomic drivers and clinical outcomes, refining the prior PAM50 intrinsic subtypes. The study found that cis- and trans-acting CNAs dominate the breast cancer expression landscape, identified putative tumour suppressors (PPP2R2A, MTAP, MAP2K4), and uncovered subtype-specific trans-acting modules, including an adaptive immune response module associated with TCR locus deletions in a CNA-devoid subgroup with favourable prognosis.

Cohort & data

  • Discovery cohort: 997 primary breast tumours; validation cohort: 995 primary breast tumours (total n = 1,992).
  • Cancer type: breast cancer (BRCA), sourced from tumour banks in the UK and Canada.
  • Dataset: brca_metabric (METABRIC – Molecular Taxonomy of Breast Cancer International Consortium).
  • Genomic profiling: Affymetrix SNP 6.0 (copy number and SNP genotyping).
  • Transcriptomic profiling: Illumina HT-12 v3 expression arrays.
  • TP53 mutational profiling was performed on a subset.
  • Treatment: nearly all oestrogen receptor (ER)-positive and/or lymph node (LN)-negative patients did not receive chemotherapy, whereas ER-negative and LN-positive patients did. No HER2+ patients received trastuzumab.
  • Data deposited at the European Genome-Phenome Archive (EGAS00000000083).

Key findings

  • Germline variants (CNVs, SNPs) and somatic CNAs influenced expression of >39% (11,198/28,609) of expression probes genome-wide, with CNAs associated with the greatest number of expression profiles.
  • 5,401 CNAs were significantly associated with gene expression in cis and 5,462 in trans (Sidak adjusted P < 0.0001).
  • Joint integrative clustering of CNA and expression data identified 10 integrative clusters (IntClust 1–10) with distinct copy number profiles and clinical outcomes, validated in the independent 995-tumour cohort.
  • IntClust 2 (11q13/14 cis-acting, ER-positive, n = 45): high-risk subgroup with elevated hazard ratios (discovery HR 3.620, 95% CI 1.905–6.878; validation HR 3.353, 95% CI 1.381–8.141), driven by CCND1, EMSY, PAK1, and RSF1 amplifications.
  • IntClust 4 (CNA-devoid, n = 167): favourable prognosis, mixed ER status, enriched for lymphocytic infiltration and an adaptive immune response signature driven by TCR locus deletions (TRG and TRA).
  • IntClust 5 (ERBB2-amplified, n = 94): worst disease-specific survival at 5 and 15 years (discovery HR 3.899, 95% CI 2.234–6.804; validation HR 4.447, 95% CI 2.284–8.661).
  • IntClust 10 (basal-like, n = 96): high genomic instability but relatively good long-term outcomes (after 5 years), with chromosome 5q deletions modulating a mitotic/cell-cycle trans-acting network.
  • Expression outlier analysis identified 45 regions with putative driver genes, including known drivers (ZNF703, PTEN, MYC, CCND1, MDM2, ERBB2, CCNE1) and novel candidates (MDM4, CDK4, NCOR1).
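
The joint clustering idea in the findings above can be illustrated with a toy example. This page does not describe the clustering algorithm the authors used, so the sketch below is not their method: as a simple stand-in, it standardises each data type, concatenates the copy number and expression features for each tumour, and runs plain k-means. The data values, function names, and the choice of k = 2 are all hypothetical.

```python
import math

def zscore_columns(rows):
    """Standardise each column to mean 0 and unit variance."""
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col)) or 1.0
        out_cols.append([(v - mean) / sd for v in col])
    return [list(r) for r in zip(*out_cols)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each tumour to its nearest centre, then recompute the centres.
        labels = [min(range(k), key=lambda j: sq_dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy data: one row per tumour, values invented for illustration.
cna = [[0.9, 1.1], [1.0, 0.8], [1.2, 1.0], [0.0, 0.1], [-0.1, 0.0], [0.1, -0.1]]  # log2 copy number ratios
expr = [[8.5, 3.0], [8.9, 3.2], [9.1, 2.8], [5.0, 6.1], [5.2, 5.9], [4.8, 6.3]]    # log2 expression

# Joint feature matrix: standardised CNA features followed by expression features.
joint = [c + e for c, e in zip(zscore_columns(cna), zscore_columns(expr))]
labels = kmeans(joint, k=2)
print(labels)
```

Standardising each data type before concatenation keeps one platform's measurement scale from dominating the distance; a real analysis would need a principled way to weight the two data types and to choose the number of clusters.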

Genes & alterations

  • PPP2R2A (8p21): heterozygous and homozygous deletions driving loss of expression, enriched in mitotic ER-positive (luminal B) cancers. PPP2R2A encodes a regulatory subunit of the PP2A mitotic exit holoenzyme complex.
  • MTAP (9p21): deletions, which frequently co-occur with CDKN2A and CDKN2B loss in multiple cancer types, were confirmed in breast cancer.
  • MAP2K4 (17p11): recurrent deletions with outlying expression in predominantly ER-positive cases, including confirmed homozygous deletions. Evidence supports MAP2K4 as a tumour suppressor in breast cancer.
  • CCND1 (11q13.3): amplification in IntClust 2 (39/45 cases), part of the 11q13/14 amplicon cassette.
  • EMSY (11q13.5): amplification in IntClust 2 (34/45 cases), linking the BRCA2 pathway to sporadic breast cancer.
  • PAK1 and RSF1 (11q14.1): amplified in IntClust 2 as part of the 11q13/14 cis-acting cassette.
  • ERBB2: amplification defines IntClust 5; includes both HER2-enriched (ER-negative) and luminal (ER-positive) cases.
  • TP53: mutational profiling performed; somatic mutations characterised across subtypes.
  • IGF1R, KRAS, EGFR: rare amplification events (<1% of patients) identified through the CNA-expression landscape.
  • CDKN2B, BRCA2, RB1, ATM, SMAD4, NCOR1, UTX: homozygous deletions identified as rare but potentially significant events.
  • AURKB, BUB1, CHEK1, FOXM1, TTK: upregulated in the basal-like IntClust 10 subgroup as part of a chromosome 5q deletion-associated trans-acting mitotic network.

Clinical implications

  • The 10 integrative clusters provide a refined molecular stratification beyond PAM50 subtypes, with distinct survival trajectories that reproduced in an independent validation cohort.
  • IntClust 2 (11q13/14 amplified, ER-positive) represents a high-risk subgroup that might be missed by standard ER-positive classification; these patients had steep mortality trajectories.
  • IntClust 5 (ERBB2-amplified) identifies additional patients beyond the intrinsic HER2 subtype who might benefit from anti-HER2 targeted therapy (e.g., trastuzumab). No patients in this study received trastuzumab, which likely contributes to this subgroup having the worst survival.
  • IntClust 4 (CNA-devoid) patients have favourable prognosis associated with an adaptive immune response and lymphocytic infiltration, suggesting potential relevance for immunotherapy approaches.
  • Rare events (IGF1R, KRAS, EGFR amplifications; CDKN2B, RB1, ATM deletions) may have implications for targeted agents, particularly tyrosine kinase inhibitors.
  • Approximately 17% of tumours in this cohort were CNA-devoid (IntClust 4), making them candidates for mutational profiling to identify alternative drivers.
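
The hazard ratios quoted for IntClust 2 and IntClust 5 under Key findings can be sanity-checked from their confidence intervals. The sketch below assumes the intervals are Wald intervals computed on the log scale, exp(log(HR) ± 1.96 × SE), which is the usual form for Cox models but is not stated on this page. Under that assumption, the geometric midpoint of each interval should reproduce the reported hazard ratio, and the standard error of log(HR) can be backed out from the interval width. The function and variable names are illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal

# (label, hazard ratio, CI lower, CI upper) as reported on this page
reported = [
    ("IntClust 2, discovery", 3.620, 1.905, 6.878),
    ("IntClust 2, validation", 3.353, 1.381, 8.141),
    ("IntClust 5, discovery", 3.899, 2.234, 6.804),
    ("IntClust 5, validation", 4.447, 2.284, 8.661),
]

def summarise(hr, lower, upper):
    """Back out log-scale quantities from a hazard ratio and its 95% CI.

    Assumes a Wald interval: exp(log(HR) +/- Z_95 * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z_95)  # SE of log(HR)
    midpoint = math.sqrt(lower * upper)                    # geometric midpoint of the CI
    z = math.log(hr) / se                                  # Wald z-statistic against HR = 1
    return se, midpoint, z

results = {label: summarise(hr, lo, hi) for label, hr, lo, hi in reported}
for label, (se, midpoint, z) in results.items():
    print(f"{label}: SE(log HR)={se:.3f}, CI midpoint={midpoint:.3f}, z={z:.2f}")
```

For all four entries the geometric midpoint agrees with the reported hazard ratio to within rounding, which is consistent with the log-scale Wald assumption. The validation intervals are wider than the discovery intervals, corresponding to larger standard errors on the log scale.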

Limitations & open questions

  • No whole-exome or whole-genome sequencing was performed; the study relied on array-based copy number and expression profiling, so point mutations (beyond TP53) were not systematically captured.
  • Patients were enrolled before widespread availability of trastuzumab and modern targeted therapies, so treatment effects on the IntClust subtypes cannot be assessed.
  • The CNA-devoid subgroup (IntClust 4) lacks identifiable copy number drivers; its mutational landscape remains to be characterised by sequencing.
  • The 11q13/14 amplicon (IntClust 2) contains multiple candidate drivers; the relative contribution of CCND1, EMSY, PAK1, RSF1, and other genes in the cassette is not resolved.
  • Trans-acting associations are correlative; causal mechanisms linking TCR locus deletions to the adaptive immune response require functional validation.
  • The integrative clusters were derived from fresh-frozen tissue; applicability to FFPE clinical specimens and prospective clinical utility remain to be demonstrated.

Citations from this paper used in the wiki

  • “We present an integrated analysis of copy number and gene expression in a discovery and validation set of 997 and 995 primary breast tumours, respectively, with long-term clinical follow-up.” (Abstract)
  • “By delineating expression outlier genes driven in cis by CNAs, we identified putative cancer genes, including deletions in PPP2R2A, MTAP and MAP2K4.” (Abstract)
  • “This subgroup [IntClust 2] exhibited a steep mortality trajectory with elevated hazard ratios (discovery set: 3.620, 95% confidence interval (1.905–6.878); validation set: 3.353, 95% confidence interval (1.381–8.141)).” (p. 5)
  • “The ERBB2-amplified cancers composed of HER2-enriched (ER-negative) cases and luminal (ER-positive) cases appear as IntClust 5 (n = 94), thus refining the ERBB2 intrinsic subtype by grouping additional patients that might benefit from targeted therapy.” (p. 6)
  • “We conclude that genomic copy number loss at the TCR loci drives a trans-acting immune response module that associates with lymphocytic infiltration, and characterizes an otherwise genomically quiescent subgroup of ER-positive and ER-negative patients with good prognosis.” (p. 6)
  • “The CNA-expression landscape also illuminates rare but potentially significant events, including IGF1R, KRAS and EGFR amplifications and CDKN2B, BRCA2, RB1, ATM, SMAD4, NCOR1 and UTX homozygous deletions.” (p. 7)

This page was processed by paper-compiler on 2026-05-06.