The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes

Authors

Bernard Pereira

Suet-Feung Chin

Oscar M. Rueda

Hans-Kristian Moen Vollan

Elena Provenzano

Helen A. Bardwell

Michelle Pugh

Linda Jones

Roslin Russell

Stephen-John Sammut

Dana W.Y. Tsui

Bin Liu

Sarah-Jane Dawson

Jean Abraham

Helen Northen

John F. Peden

Abhik Mukherjee

Gulisa Turashvili

Andrew R. Green

Steve McKinney

Arusha Oloumi

Sohrab Shah

Nitzan Rosenfeld

Leigh Murphy

David R. Bentley

Ian O. Ellis

Arnie Purushotham

Sarah E. Pinder

Anne-Lise Børresen-Dale

Helena M. Earl

Paul D. Pharoah

Mark T. Ross

Samuel Aparicio

Carlos Caldas

DOI

PMID: 27161491 · DOI: 10.1038/ncomms11479 · Journal: Nature Communications (2016)

TL;DR

Pereira, Chin, Rueda et al. performed targeted sequencing of 173 breast-cancer genes in 2,433 primary tumours from the METABRIC cohort that already had matched copy-number, expression, and long-term clinical follow-up data (median 115 months). Using a ratiometric driver-discovery scheme, they identified 40 mutation-driver (Mut-driver) genes (22 in ER+ only, 3 in ER- only, 15 shared), characterised co-mutation and mutual-exclusivity patterns, mapped Mut-driver mutations onto the 10 Integrative Clusters (IntClusts), estimated cancer-cell fractions to score clonality, and computed MATH intra-tumour heterogeneity scores. The headline clinical result is that the prognostic value of PIK3CA mutations depends on the IntClust background: in ER+ tumours they are associated with shorter breast-cancer-specific survival in IntClust1, IntClust2, and IntClust9 (driven by amplification of 17q23, 11q13–14, and 8q24, respectively) but not in the other ER+ IntClusts. High MATH scores were also associated with worse outcome in ER+ disease, with the exception of the highly aggressive IntClust2 (CCND1/PAK1 11q13–14 co-amplification), which paradoxically showed low intra-tumour heterogeneity.

Cohort & data

  • 2,433 primary breast tumours (BRCA), with 2,319 patients (95%) carrying long-term follow-up data (median 115 months). Source cohorts pooled in METABRIC: METABRIC, NeoTango, Nottingham, and DETECT (PMID:27161491).
  • 650 normal samples (523 normal-adjacent breast, 127 peripheral blood); 548 matched tumour/normal pairs; 221 primary tumours sequenced in replicate.
  • Targeted capture of 173 genes (~1.2 Mbp) selected from 5 large 2012 breast-cancer sequencing studies plus genes recurrently homozygously deleted in the prior METABRIC CNA analysis. Method: Illumina Nextera Custom Target Enrichment + paired-end 100 bp HiSeq 2000 sequencing — see metabric-targeted-sequencing. Mean depth ≥112× in 80% of samples (median 152×).
  • ER status assigned by IHC where available, reconciled with bimodal ESR1 gaussian-mixture expression calls; ERBB2 expression similarly used to corroborate HER2 IHC.
  • ER+ n=1,780; ER- n=630 (status available for 2,410 tumours).
  • Driver discovery: Vogelstein 20/20-style ratiometric scheme requiring ≥5 recurrent or inactivating mutations, with ER-stratified ONC ≥20% (oncogene) and TSG ≥20% with ONC ≥5% (tumour suppressor) cut-offs.
  • LOH calls from ASCAT; recurrent CNAs from GISTIC2; MATH scores per Mroz & Rocco; panel-of-normals filtering analogous to MuTect.
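
The ratiometric driver-discovery step can be illustrated with a minimal Python sketch. It assumes the ONC score is the fraction of a gene's mutations that are recurrent missense (or in-frame) events and the TSG score is the fraction that are inactivating, following the Vogelstein 20/20 convention. The `GeneCounts` class, the function names, the reading of the ≥5 threshold as a combined count, and the example counts are illustrative assumptions; the paper's exact combination of ONC and TSG cut-offs is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical per-gene mutation tallies within one ER stratum."""
    recurrent: int      # missense/in-frame mutations at recurrent positions
    inactivating: int   # nonsense, frameshift, and splice-site mutations
    total: int          # all coding mutations observed in the gene


def ratiometric_scores(c: GeneCounts):
    """Return (ONC, TSG) as fractions of all mutations in the gene."""
    if c.total == 0:
        return 0.0, 0.0
    return c.recurrent / c.total, c.inactivating / c.total


def driver_flags(c: GeneCounts, min_events: int = 5, cutoff: float = 0.20):
    """Flag which score clears the 20% cut-off.

    Assumption: a gene is eligible when recurrent + inactivating >= min_events.
    How the two flags are combined into a final call is left to the caller.
    """
    onc, tsg = ratiometric_scores(c)
    eligible = (c.recurrent + c.inactivating) >= min_events
    return {
        "eligible": eligible,
        "onc_pass": eligible and onc >= cutoff,
        "tsg_pass": eligible and tsg >= cutoff,
    }


# Made-up counts for a hotspot-dominated gene (not taken from the paper).
example = GeneCounts(recurrent=47, inactivating=1, total=50)
print(ratiometric_scores(example), driver_flags(example))
```

Keeping the score calculation separate from the thresholding makes it easy to apply the cut-offs per ER stratum, as the summary describes.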

Key findings

  • Mutation landscape. 32,476 somatic mutations called; 13,084 protein-affecting (11,006 SNVs — 10,193 missense, 808 nonsense, 5 read-through; 1,635 indels — 1,315 frameshift, 320 in-frame; plus 443 splice-site variants). Mean 13 mutations per tumour (5 coding); 131 tumours had ≥30 mutations and 38 had no mutations at all (76 had no coding mutations).
  • Top recurrently mutated genes. Coding mutation frequencies: PIK3CA 40.1%, TP53 35.4%, MUC16 16.8%, AHNAK2 16.2%, SYNE1 12.0%, KMT2C 11.4%, GATA3 11.1% (PMID:27161491).
  • Pathogenic germline mutations. BRCA1 1.36%, BRCA2 1.64%, CHEK2 2.22%, TP53 0.82% of the cohort.
  • 40 Mut-driver genes identified (22 ER+ only, 3 ER- only, 15 shared; only 6 oncogenes); list spans Akt-signalling (PIK3CA, AKT1, PTEN, PIK3R1, FOXO3), cell-cycle (RB1, CDKN1B, CDKN2A), chromatin function (KMT2C, ARID1A, NCOR1, CTCF, KDM6A, PBRM1, TBL1XR1), DNA damage & apoptosis (TP53, BRCA1, BRCA2, CHEK2, BAP1), MAPK signalling (NF1, MAP3K1, MAP2K4, KRAS, ERBB2), tissue organisation (CDH1, CTNNA1, AFDN/MLLT4), transcription (GATA3, TBX3, CBFB, RUNX1, FOXP1, FOXO3), ubiquitination (USP9X), splicing (SF3B1), and others (SMAD4, GPS2, ZFP36L1, MEN1, AGTR2) (PMID:27161491, Fig. 1a).
  • 22.6% of all tumours harboured a coding mutation in one of seven chromatin-function Mut-drivers (KMT2C, ARID1A, NCOR1, CTCF, KDM6A, PBRM1, TBL1XR1).
  • Ras-pathway events. KRAS codon 12 mutations in 11 samples (ER+ ONC=89%, ER- ONC=60%); two HRAS codon-61 and one codon-12 mutations; one BRAF V600E in an ER- tumour. KRAS, PIK3CA, and AKT1 amplification more common in ER- tumours (3.9%, 2.7%, 1.2%).
  • ER-stratified differences. SMAD4 had a high TSG score only in ER+ tumours (TSG=35% vs 0% in ER-). ERBB2 mutation frequencies were similar by ER status (ER+ 2.8%, ER- 3.2%), but codon 755 mutations showed borderline enrichment in ER+ tumours (13/53 ER+ vs 1/22 ER-, P=0.053). PIK3CA helical-domain (codons 542/545) and codon 345 mutations were enriched in ER+ tumours, while kinase-domain codon 1047 mutations were enriched in ER-.
  • Homozygous-deletion-derived tumour suppressors. Of 40 HD-targeted genes resequenced, 8 were independently confirmed as Mut-driver TSGs (FOXO3, CTNNA1, FOXP1, MEN1, CHEK2 in ER+; CDKN2A, KDM6A, AFDN/MLLT4 in both). CDKN2A HDs in 53/2,087 tumours (most common HD target). JAK1 had 4 HDs + 4 inactivating + 4 missense/LOH events; NT5E (CD73) had 3 HDs + 1 inactivating + 4 missense/LOH events.
  • Clinical-pathology associations. ER+ functional mutations in PIK3CA (OR=0.58, 95%CI 0.49–0.69), GATA3 (OR=0.77), MAP3K1 (OR=0.52), KMT2C (OR=0.69), and CBFB (OR=0.56) associated with lower grade. TP53 mutations associated with higher grade in both ER+ (OR=3.3, P<0.001) and ER- (OR=3.6, P<0.001). Younger age at diagnosis associated with GATA3 (OR=0.63) and CBFB (OR=0.48) inactivating mutations; older age with CDH1, KMT2C, and SF3B1 (OR=4.5).
  • Histology. CDH1 inactivating mutations in 52.6% of lobular carcinomas vs 3.4% of ductal/NST. Mucinous (8.3%) and medullary (8.8%) carcinomas had significantly fewer functional PIK3CA mutations than ductal (36.9%), mixed (50.0%), or lobular (46.9%).
  • Co-mutation / mutual exclusivity. Mutual exclusivity between PIK3CA and each of AKT1 (OR=0.017), PIK3R1 (OR=0.092), and FOXO3 (OR=0.10), reflecting Akt-pathway redundancy. 45.2% of all tumours had a functional mutation in at least one Akt-pathway member (PIK3CA, AKT1, PIK3R1, PTEN, FOXO3). Mutual exclusivity between TP53 and CDH1 (OR=0.23), GATA3 (OR=0.13), and SF3B1 (OR=0.049). Co-mutation of TP53+RB1 (OR=5.3) characteristic of triple-negative cancers; co-mutation of CDH1 with PIK3CA, TBX3, RUNX1, and ERBB2 (OR=5.7) characteristic of lobular biology.
  • Mutation–CNA interactions. Classic mutation/LOH coupling for tumour suppressors (PTEN + 10q23.1 deletion OR=3.4; GPS2 + 17p13.1 deletion OR=7.1). CDH1 (OR=2) and CBFB (OR=5) (both at 16q22) associated with the t(1q;16p)-driven 1q-gain/16q-loss pattern. Mutual exclusivity of AKT1 mutations with ERBB2 17q12 amplification (OR=0.091).
  • IntClust-specific Mut-driver enrichment. TP53 mutated in 84.6% of IntClust10, 64.2% of IntClust5, 50.5% of IntClust4-, 44.7% of IntClust9, 40.7% of IntClust6, but only 4.4% of IntClust8, 10.0% of IntClust3, and 14.0% of IntClust7. GATA3 enriched in IntClust1 (20.0%) and IntClust8 (19.5%). CBFB enriched in IntClust3 (7.8%) and IntClust8 (9.7%). 27/2,021 ER+ tumours mapped to IntClust10 (basal-like-mimicking ER+) had 59.3% TP53 mutation rate vs 18.7% in all ER+.
  • Clonality. Most Mut-driver mutation CCFs centred near 1 (early/clonal). PIK3CA mutations were near-clonal in IntClust9 (median 1.0) and IntClust10 (median 1.0) but more subclonal in IntClust3 (median 0.96, IQR 0.75–1, n=215). 199 tumours (10.4%) carried >1 functional mutation in a single Mut-driver gene; MAP3K1 most often multi-hit (53/152 mutants).
  • Survival in ER+ disease. MAP3K1 (HR=0.56, CI 0.38–0.82) and GATA3 (HR=0.58, CI 0.4–0.82) associated with longer breast-cancer-specific survival (BCSS). Inactivating SMAD4 (HR=3.4, CI 1.4–8.3) and USP9X (HR=3.0, CI 1.2–7.2) mutations associated with worse BCSS. TP53 mutations associated with worse outcome in ER+ (HR=1.6, P=0.0001) but not ER- (HR=1.1).
  • Survival in ER- disease. PIK3CA mutations were prognostic (HR=1.4, CI 1.1–1.9) in both helical and kinase domains; NF1 inactivating mutations associated with shorter BCSS (HR=2.7, CI 1.3–5.5).
  • PIK3CA × IntClust interaction (key novel finding). PIK3CA mutations were associated with poor outcome in ER+ tumours specifically within IntClust1 (P=0.02), IntClust2 (P=0.05), and IntClust9 (P=0.09 trend) — i.e. IntClusts defined by 17q23, 11q13–14, and 8q24 amplifications respectively — but not in IntClusts 3, 4+, 7, or 8.
  • Intra-tumour heterogeneity (MATH). ER+ tumours had lower MATH scores (median 0.29) than ER- (median 0.41). Higher MATH scores associated with worse BCSS in ER+ (P=0.003) but not ER- (P=0.302). IntClust10 had the highest MATH (median 0.47); IntClust2 had paradoxically low MATH (median 0.25) despite poor outcome — likely because the CCND1/PAK1 11q13–14 co-amplification driving IntClust2 is a single early clonal event.
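
As a rough illustration of the MATH score discussed above: Mroz & Rocco define it as 100 × MAD / median of the mutant-allele fractions in a tumour, with the MAD scaled by 1.4826. The medians quoted here (0.25 to 0.47) look like the same ratio without the factor of 100, which is an assumption of this sketch and not something the summary states. The function name, the `percent` switch, and the example allele fractions are hypothetical.

```python
from statistics import median

MAD_SCALE = 1.4826  # makes the MAD comparable to a standard deviation for normal data


def math_score(vafs, min_mutations=5, percent=False):
    """Return scaled MAD / median of variant allele fractions, or None.

    None is returned when there are too few mutations (the summary notes a
    minimum of 5 per sample) or when the median allele fraction is zero.
    Set percent=True for the original 100x scaling of Mroz & Rocco.
    """
    if len(vafs) < min_mutations:
        return None
    centre = median(vafs)
    if centre == 0:
        return None
    mad = MAD_SCALE * median(abs(v - centre) for v in vafs)
    ratio = mad / centre
    return 100 * ratio if percent else ratio


# Hypothetical tumours: one with spread-out allele fractions, one fully clonal.
heterogeneous = [0.10, 0.20, 0.30, 0.40, 0.50]
clonal = [0.40] * 6
print(math_score(heterogeneous), math_score(clonal))
```

A tumour dominated by one early clonal event has tightly clustered allele fractions and therefore a low score, which matches the explanation offered for IntClust2.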

Genes & alterations

  • PIK3CA — 40.1% coding-mutation frequency; ER+ ONC=94%, ER- ONC=81%. Helical-domain (codons 542/545) and codon 345 hotspots enriched in ER+; kinase-domain codon 1047 enriched in ER-. Mutually exclusive with AKT1, PIK3R1, FOXO3. Prognostic in ER- across both domains (HR=1.4); prognostic in ER+ only within IntClusts 1, 2, and 9. Frequently co-occurs with PTEN inactivating mutations (15/57 PTEN-mutant tumours also have recurrent PIK3CA mutations).
  • TP53 — 35.4% coding-mutation frequency; ER+ ONC=42% TSG=35%, ER- ONC=45% TSG=40%. 85.4% of TP53-mutant tumours show LOH. Mutated in 84.6% of IntClust10. Mutations in the DNA-binding domain associated with worst outcomes; prognostic in ER+ (HR=1.6) but not ER-.
  • MUC16, AHNAK2, SYNE1 — high coding-mutation frequencies (16.8%, 16.2%, 12.0%) but flagged as having high background mutation rates; uncertain breast-cancer driver status.
  • KMT2C (MLL3) — 11.4% mutated; ER+ TSG-driver. Often subclonal in IntClust1, more clonal in IntClust8. Lower-grade-associated.
  • GATA3 — 11.1% mutated; enriched in HER2+/ER+ (8.2%) vs HER2+/ER- (0.5%). Mutations associated with younger age, lower grade, longer survival (HR=0.58 ER+).
  • CDH1 — Inactivating mutations in 52.6% of lobular carcinomas (hallmark of invasive lobular); 96.0% of mutant tumours show LOH. Co-mutated with PIK3CA, TBX3, RUNX1, ERBB2. Strongly mutually exclusive with TP53. HD in 18 ductal/NST and 4 lobular cases.
  • ERBB2 — 2.8% ER+, 3.2% ER- somatic mutation frequency; codon 755 hotspot enriched in ER+; mutations observed in relapsed CDH1-mutant lobular carcinomas; ERBB2 17q12 amplification is mutually exclusive with AKT1 mutations (OR=0.091).
  • MAP3K1 — Most common multi-hit gene (53/152 mutants with >1 functional mutation), suggesting biallelic inactivation. ER+ TSG. Mutations associated with longer BCSS in ER+ tumours (HR=0.56).
  • SF3B1 — Recurrent K700E in 3.5% of ER+ tumours (ONC=52%); linked to differential splicing in breast tumours.
  • SWI/SNF members ARID1A and PBRM1 — Inactivating mutations identified as Mut-drivers; raises possibility of synthetic-lethal vulnerabilities (e.g. ARID1B dependency in ARID1A-deficient context).
  • HD-defined TSGs FOXO3, FOXP1, CTNNA1, AFDN/MLLT4, MEN1 — newly nominated breast-cancer Mut-drivers via combined HD + inactivating-mutation evidence.
  • NF1 — Inactivating mutations associated with shorter BCSS in ER- tumours (HR=2.7).
  • SMAD4, USP9X — Inactivating mutations associated with worse BCSS in ER+ (HR=3.4 and 3.0 respectively).
  • KRAS, HRAS, BRAF — classical hotspot activating mutations observed at low frequencies (KRAS codon 12 in 11 samples; HRAS at codons 12/61 in 3 samples; one BRAF V600E in ER-). KRAS is among the 40 Mut-drivers (MAPK signalling group); HRAS and BRAF are not in the Mut-driver list.
  • CCND1 + PAK1 — co-amplification at 11q13–14 defines IntClust2; single early clonal driver event explains the paradoxically low MATH heterogeneity despite poor outcomes.
  • ZNF703, MYC — driver CNAs at 8p11 (IntClust6) and 8q24 (IntClust9) respectively, used in the IntClust copy-number taxonomy referenced here.
  • JAK1, NT5E (CD73) — newly identified HD-targeted candidates linked to immune evasion / immune modulation.
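
The co-mutation and mutual-exclusivity statements in the entries above are odds-ratio claims about 2×2 tables of tumours. The sketch below shows one generic way to compute a sample odds ratio and a two-sided Fisher exact p-value using only the Python standard library. It is not the paper's statistical procedure, and the function names and example counts are hypothetical.

```python
from math import comb


def odds_ratio(both, a_only, b_only, neither):
    """Sample odds ratio; values below 1 point towards mutual exclusivity."""
    if a_only == 0 or b_only == 0:
        return float("inf")  # denominator is zero, reported here as infinity
    return (both * neither) / (a_only * b_only)


def fisher_exact_two_sided(both, a_only, b_only, neither):
    """Two-sided Fisher exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table (with the same
    margins) that is no more likely than the observed one.
    """
    row1 = both + a_only            # tumours mutated in gene A
    col1 = both + b_only            # tumours mutated in gene B
    n = row1 + b_only + neither     # all tumours

    def prob(k):
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    p_obs = prob(both)
    # Small relative tolerance so ties with the observed table are included.
    return sum(p for p in (prob(k) for k in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Hypothetical counts: gene A and gene B rarely mutated in the same tumour.
table = dict(both=2, a_only=98, b_only=900, neither=1000)
print(odds_ratio(**table), fisher_exact_two_sided(**table))
```

An odds ratio well below 1 with a small p-value corresponds to the kind of exclusivity reported for PIK3CA and AKT1, while a value well above 1 corresponds to co-mutation such as TP53 with RB1.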

Clinical implications

  • PIK3CA prognostic context-dependence in ER+ disease. The paper explicitly argues that the contradictory literature on PIK3CA’s prognostic value in ER+ breast cancer can be resolved by stratifying on IntClust: PIK3CA mutations are associated with shorter BCSS specifically in IntClust1 (17q23-amp), IntClust2 (11q13–14-amp), and IntClust9 (8q24-amp) ER+ tumours. The authors call this out as a finding that should inform stratification in future PI3K-inhibitor trials (PMID:27161491, Discussion).
  • PI3K-inhibitor trial design. No specific PI3K inhibitor is named or tested, but the authors flag that clinical-trial interpretation should account for IntClust background in addition to PIK3CA mutation status.
  • Cross-cancer drug repurposing hypothesis. Mut-drivers such as KRAS, ARID1A, CDKN2A, PBRM1, KDM6A, MEN1, FOXP1, USP9X, BAP1, and SMAD4, which are well-known drivers in other cancer types, define detectable subsets of breast cancers in which therapies developed elsewhere may be applicable. Specifically suggested: targeting ARID1B in ARID1A-deficient SWI/SNF tumours.
  • AGTR2 (angiotensin II receptor type 2). Recurrent P271L mutations in 6 ER+ tumours; flagged as a possible therapeutic target.
  • HER2 + ER subgroup heterogeneity. HER2+/ER- tumours had higher TP53 functional mutation rates (67.5%) than HER2+/ER+ (42.6%); HER2+/ER+ had higher GATA3 mutation rates (8.2% vs 0.5%). Implications for resistance to anti-HER2 therapy via PIK3CA mutations are noted (HER2-/ER+ had 46.5% PIK3CA mutation rate vs HER2+/ER+ 29.5%).
  • MATH heterogeneity as a survival biomarker in ER+. Higher MATH score (upper quartile) associated with significantly worse BCSS in ER+ (P=0.003) but not ER-. The authors caution this should be interpreted in the IntClust context: IntClust2 (CCND1/PAK1-amp) has low heterogeneity but very poor outcome, so heterogeneity-based biomarkers must be IntClust-aware.
  • Resistance to neo-adjuvant chemotherapy in IntClust2. Tumours co-amplifying CCND1 and PAK1 at 11q13–14 are highlighted as a small but highly aggressive subgroup that has previously been shown to be resistant to neo-adjuvant cytotoxic chemotherapy and warrants development of better strategies.

Limitations & open questions

  • Pan-cancer-frequent genes MUC16, AHNAK2, and SYNE1 appear at high coding-mutation rates (16.8%, 16.2%, 12.0%) but their tumorigenic roles in breast cancer remain uncertain — high background mutation rate confounds interpretation.
  • The 173-gene targeted panel excludes whole-genome / whole-exome events; novel drivers outside the panel would not be detected. The panel was assembled from 5 prior 2012 sequencing studies plus HD-targeted genes — recently nominated drivers are necessarily under-represented.
  • Several IntClust × PIK3CA interactions reach P=0.02–0.09 in fairly small per-IntClust subgroups (e.g. IntClust1 21/117 PIK3CA-mutant; IntClust2 28/74); the authors explicitly flag the need for external validation.
  • MATH score depends on having ≥5 mutations per sample; tumours with very few mutations were excluded from the heterogeneity analysis, limiting analysis of mutation-quiet (e.g. IntClust4-CNA-devoid) tumours.
  • Cancer-cell-fraction estimates depend on copy number and purity calls from ASCAT; subclonal architecture is only approximated, not directly reconstructed.
  • No specific PI3K inhibitor (e.g. alpelisib, buparlisib) is tested — the trial-design implications are correlative/observational only.
  • Pathogenic germline classification was performed in-cohort; concordance with established germline-variant classifications is not formally benchmarked.
  • The “tissue-organisation” cluster of CDH1, CTNNA1, AFDN/MLLT4 is biologically compelling but functional confirmation of CTNNA1 and AFDN/MLLT4 as breast-cancer TSGs is left to future work.
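
The dependence of cancer-cell-fraction estimates on purity and copy number can be made concrete with a commonly used point estimate, CCF = VAF × (p·CN_t + (1 − p)·CN_n) / (p·m), where p is tumour purity, CN_t and CN_n are the total copy number in tumour and normal cells, and m is the number of mutated copies. This is a sketch under those assumptions and may differ from the estimator used in the paper; the function name, parameter names, and example values are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Point estimate of the fraction of cancer cells carrying a mutation.

    vaf          observed variant allele fraction (0..1)
    purity       fraction of cells in the sample that are tumour cells
    tumour_cn    total copy number at the locus in tumour cells
    multiplicity number of mutated copies per mutated cell (assumed 1)
    normal_cn    total copy number in normal cells (2 for autosomes)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell in the mixed sample.
    avg_copies = purity * tumour_cn + (1 - purity) * normal_cn
    ccf = vaf * avg_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # estimates above 1 are capped as fully clonal


# Hypothetical diploid locus in a sample with 60% purity.
print(cancer_cell_fraction(vaf=0.30, purity=0.6, tumour_cn=2))
print(cancer_cell_fraction(vaf=0.15, purity=0.6, tumour_cn=2))
```

Because purity and copy number both enter the formula directly, an error in either call shifts the estimate, which is why the clonality results are best read as approximations.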

Citations from this paper used in the wiki

  • “We sequence 173 genes in 2,433 primary breast tumours that have copy number aberration (CNA), gene expression and long-term clinical follow-up data. We identify 40 mutation-driver (Mut-driver) genes…” (Abstract).
  • “PIK3CA (coding mutations in 40.1% of the samples) and TP53 (35.4%) dominated the mutation landscape. Only five other genes harboured coding mutations in at least 10% of the samples: MUC16 (16.8%); AHNAK2 (16.2%); SYNE1 (12.0%); KMT2C (also known as MLL3; 11.4%) and GATA3 (11.1%)” (Results, p.2).
  • “Predicted pathogenic germline mutations… in BRCA1 and BRCA2 were identified in 1.36% and 1.64% of the cohort, respectively, and 2.22% of tumours harboured pathogenic CHEK2 germline mutations. TP53 pathogenic germline mutations were found in 0.82% of the tumours.” (Results, p.2).
  • “After stratifying by ER status, we identified 40 genes (22 in ER+ only, 3 in ER- only, 15 shared) that are here on referred to as Mut-drivers genes.” (Results, p.3).
  • “Mutual exclusivity between mutations in PIK3CA and AKT1 (OR=0.017, CI=0.00044–0.1), between PIK3CA and PIK3R1 (OR=0.092…), and between PIK3CA and FOXO3 (OR=0.1…) reflect functional redundancy within the Akt signalling pathway.” (Results, p.7).
  • “TP53 has functional mutations in 84.6% of IntClust10, 64.2% of IntClust5, 50.5% of IntClust4-, and 44.7% and 40.7% of IntClusts 9 and 6, respectively… In contrast, TP53 mutations occurred in only 10.0% of IntClust3, 14.0% of IntClust7 and 4.4% of IntClust8…” (Results, p.7–8).
  • “In ER+ tumours, mutations in both MAP3K1 (HR=0.56, CI=0.38–0.82) and GATA3 (HR=0.58, CI=0.4–0.82) were associated with longer survival… inactivating mutations in SMAD4 (HR=3.4) and USP9X (HR=3) were associated with worse BCSS.” (Results, p.8).
  • “Significant interactions were identified in IntClusts 1+, 2+ and 9+, suggesting that PIK3CA mutations in these specific groups were associated with poor outcome.” (Results, p.10).
  • “Tumours within IntClust10 had the highest MATH scores (median=0.47, IQR=0.31–0.61)… Surprisingly, tumours in IntClust2 had low MATH scores (median=0.25, IQR=0.16–0.37) despite patients in this subgroup having poor outcomes. The 11q13–14 amplicon (two gene cassettes centred around CCND1 and PAK1, respectively) is a key driver CNA in IntClust2.” (Results, p.10).
