The somatic mutation profiles of 2,433 breast cancers refines their genomic and transcriptomic landscapes
PMID: 27161491 · DOI: 10.1038/ncomms11479 · Journal: Nature Communications (2016)
TL;DR
Pereira, Chin, Rueda et al. performed targeted sequencing of 173 breast-cancer genes in 2,433 primary tumours from the METABRIC cohort, which already had matched copy-number, expression, and long-term clinical follow-up data (median 115 months). Using a ratiometric driver-discovery scheme, they identified 40 mutation-driver (Mut-driver) genes (22 ER+ only, 3 ER- only, 15 shared), characterised co-mutation/mutual-exclusivity patterns, mapped Mut-driver mutations onto the 10 Integrative Clusters (IntClusts), estimated cancer-cell fractions to score clonality, and computed MATH intra-tumour heterogeneity scores. The headline clinical result is that the prognostic value of PIK3CA mutations depends on the IntClust background: they are associated with shorter breast-cancer-specific survival in ER+ tumours belonging to IntClust1, IntClust2, and IntClust9 (driven by amplification of 17q23, 11q13–14, and 8q24, respectively) but not in other ER+ IntClusts. High MATH scores were also associated with worse outcome in ER+ disease, although the highly aggressive IntClust2 (CCND1/PAK1 11q13–14 co-amplification) paradoxically showed low intra-tumour heterogeneity.
Cohort & data
- 2,433 primary breast tumours (BRCA), with 2,319 patients (95%) carrying long-term follow-up data (median 115 months). Source cohorts pooled in METABRIC: METABRIC, NeoTango, Nottingham, and DETECT (PMID:27161491).
- 650 normal samples (523 normal-adjacent breast, 127 peripheral blood); 548 matched tumour/normal pairs; 221 primary tumours sequenced in replicate.
- Targeted capture of 173 genes (~1.2 Mbp) selected from 5 large 2012 breast-cancer sequencing studies plus genes recurrently homozygously deleted in the prior METABRIC CNA analysis. Method: Illumina Nextera Custom Target Enrichment + paired-end 100 bp HiSeq 2000 sequencing — see metabric-targeted-sequencing. Mean depth ≥112× in 80% of samples (median 152×).
- ER status assigned by IHC where available, reconciled with bimodal ESR1 gaussian-mixture expression calls; ERBB2 expression similarly used to corroborate HER2 IHC.
- ER+ n=1,780; ER- n=630 (status available for 2,410 tumours).
- Driver discovery: Vogelstein 20/20-style ratiometric scheme requiring ≥5 recurrent or inactivating mutations, with ER-stratified ONC ≥20% (oncogene) and TSG ≥20% with ONC ≥5% (tumour suppressor) cut-offs.
- LOH calls from ASCAT; recurrent CNAs from GISTIC2; MATH scores per Mroz & Rocco; panel-of-normals filtering analogous to MuTect.
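The ratiometric driver-discovery step listed above can be illustrated with a minimal Python sketch. It assumes the usual 20/20-style definitions (ONC score = share of a gene's mutations that are missense or in-frame changes at recurrently hit codons; TSG score = share that are inactivating), which this page does not spell out. The effect labels, function names, and toy data are hypothetical, and the sketch applies only the ≥5-event minimum and a single 20% cut-off, not the secondary 5% condition, so it should not be read as the authors' exact procedure.

```python
from collections import Counter

# Effect labels are hypothetical; map them from whatever annotator is in use.
INACTIVATING = {"nonsense", "frameshift", "splice_site"}
RECURRENCE_ELIGIBLE = {"missense", "inframe_indel"}


def ratiometric_scores(mutations, min_recurrence=2):
    """Return (onc_score, tsg_score, n_events) for one gene in one ER stratum.

    mutations: list of (codon, effect) tuples, one per observed mutation.
    """
    total = len(mutations)
    if total == 0:
        return 0.0, 0.0, 0
    # Count eligible mutations per codon; a codon hit >= min_recurrence times is recurrent.
    per_codon = Counter(codon for codon, effect in mutations
                        if effect in RECURRENCE_ELIGIBLE)
    recurrent = sum(n for n in per_codon.values() if n >= min_recurrence)
    inactivating = sum(1 for _, effect in mutations if effect in INACTIVATING)
    return recurrent / total, inactivating / total, recurrent + inactivating


def classify(onc, tsg, n_events, cutoff=0.20, min_events=5):
    """Simplified call (assumption): TSG evidence takes priority over ONC evidence."""
    if n_events < min_events:
        return "unclassified"
    if tsg >= cutoff:
        return "tumour suppressor"
    if onc >= cutoff:
        return "oncogene"
    return "unclassified"


# Toy data, invented for illustration (not taken from the paper).
hotspot_gene = [(1047, "missense")] * 8 + [(10, "missense"), (50, "nonsense")]
truncated_gene = [(12, "frameshift")] * 3 + [(80, "nonsense")] * 2 + [(5, "missense")]

for name, muts in [("hotspot_gene", hotspot_gene), ("truncated_gene", truncated_gene)]:
    onc, tsg, n = ratiometric_scores(muts)
    print(name, round(onc, 2), round(tsg, 2), classify(onc, tsg, n))
```

Giving the TSG score priority mirrors how a gene such as TP53, with both scores above 20%, is treated as a tumour suppressor on this page. Scores would be computed separately within the ER+ and ER- strata.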
Key findings
- Mutation landscape. 32,476 somatic mutations called; 13,084 protein-affecting (11,006 SNVs — 10,193 missense, 808 nonsense, 5 read-through; 1,635 indels — 1,315 frameshift, 320 in-frame; plus 443 splice-site variants). Mean 13 mutations per tumour (5 coding); 131 tumours had ≥30 mutations and 38 had no mutations at all (76 had no coding mutations).
- Top recurrently mutated genes. Coding mutation frequencies: PIK3CA 40.1%, TP53 35.4%, MUC16 16.8%, AHNAK2 16.2%, SYNE1 12.0%, KMT2C 11.4%, GATA3 11.1% (PMID:27161491).
- Pathogenic germline mutations. BRCA1 1.36%, BRCA2 1.64%, CHEK2 2.22%, TP53 0.82% of the cohort.
- 40 Mut-driver genes identified (22 ER+ only, 3 ER- only, 15 shared; only 6 oncogenes); list spans Akt-signalling (PIK3CA, AKT1, PTEN, PIK3R1, FOXO3), cell-cycle (RB1, CDKN1B, CDKN2A), chromatin function (KMT2C, ARID1A, NCOR1, CTCF, KDM6A, PBRM1, TBL1XR1), DNA damage & apoptosis (TP53, BRCA1, BRCA2, CHEK2, BAP1), MAPK signalling (NF1, MAP3K1, MAP2K4, KRAS, ERBB2), tissue organisation (CDH1, CTNNA1, AFDN/MLLT4), transcription (GATA3, TBX3, CBFB, RUNX1, FOXP1, plus FOXO3, already counted under Akt-signalling), ubiquitination (USP9X), splicing (SF3B1), and others (SMAD4, GPS2, ZFP36L1, MEN1, AGTR2) (PMID:27161491, Fig. 1a).
- 22.6% of all tumours harboured a coding mutation in one of seven chromatin-function Mut-drivers (KMT2C, ARID1A, NCOR1, CTCF, KDM6A, PBRM1, TBL1XR1).
- Ras-pathway events. KRAS codon 12 mutations in 11 samples (ER+ ONC=89%, ER- ONC=60%); two HRAS codon-61 and one codon-12 mutations; one BRAF V600E in an ER- tumour. KRAS, PIK3CA, and AKT1 amplification more common in ER- tumours (3.9%, 2.7%, 1.2%).
- ER-stratified differences. SMAD4 had a high TSG score only in ER+ tumours (TSG=35% vs 0% in ER-). ERBB2 mutation frequencies were similar by ER status (ER+ 2.8%, ER- 3.2%), but codon 755 mutations tended to be more frequent in ER+ tumours (13/53 ER+ vs 1/22 ER-; borderline at P=0.053). PIK3CA helical-domain (codons 542/545) and codon 345 mutations were enriched in ER+ tumours, while kinase-domain codon 1047 mutations were enriched in ER-.
- Homozygous-deletion-derived tumour suppressors. Of 40 HD-targeted genes resequenced, 8 were independently confirmed as Mut-driver TSGs (FOXO3, CTNNA1, FOXP1, MEN1, CHEK2 in ER+; CDKN2A, KDM6A, AFDN/MLLT4 in both). CDKN2A HDs in 53/2,087 tumours (most common HD target). JAK1 had 4 HDs + 4 inactivating + 4 missense/LOH events; NT5E (CD73) had 3 HDs + 1 inactivating + 4 missense/LOH events.
- Clinical-pathology associations. ER+ functional mutations in PIK3CA (OR=0.58, 95%CI 0.49–0.69), GATA3 (OR=0.77), MAP3K1 (OR=0.52), KMT2C (OR=0.69), and CBFB (OR=0.56) associated with lower grade. TP53 mutations associated with higher grade in both ER+ (OR=3.3, P<0.001) and ER- (OR=3.6, P<0.001). Younger age at diagnosis associated with GATA3 (OR=0.63) and CBFB (OR=0.48) inactivating mutations; older age with CDH1, KMT2C, and SF3B1 (OR=4.5).
- Histology. CDH1 inactivating mutations in 52.6% of lobular carcinomas vs 3.4% of ductal/NST. Mucinous (8.3%) and medullary (8.8%) carcinomas had significantly fewer functional PIK3CA mutations than ductal (36.9%), mixed (50.0%), or lobular (46.9%).
- Co-mutation / mutual exclusivity. Mutual exclusivity between PIK3CA and each of AKT1 (OR=0.017), PIK3R1 (OR=0.092), and FOXO3 (OR=0.10), reflecting Akt-pathway redundancy. 45.2% of all tumours had a functional mutation in at least one Akt-pathway member (PIK3CA, AKT1, PIK3R1, PTEN, FOXO3). Mutual exclusivity between TP53 and CDH1 (OR=0.23), GATA3 (OR=0.13), and SF3B1 (OR=0.049). Co-mutation of TP53+RB1 (OR=5.3) characteristic of triple-negative cancers; co-mutation of CDH1 with PIK3CA, TBX3, RUNX1, and ERBB2 (OR=5.7) characteristic of lobular biology.
- Mutation–CNA interactions. Classic mutation/LOH coupling for tumour suppressors (PTEN + 10q23.1 deletion OR=3.4; GPS2 + 17p13.1 deletion OR=7.1). CDH1 (OR=2) and CBFB (OR=5) (both at 16q22) associated with the t(1q;16p)-driven 1q-gain/16q-loss pattern. Mutual exclusivity of AKT1 mutations with ERBB2 17q12 amplification (OR=0.091).
- IntClust-specific Mut-driver enrichment. TP53 mutated in 84.6% of IntClust10, 64.2% of IntClust5, 50.5% of IntClust4-, 44.7% of IntClust9, 40.7% of IntClust6, but only 4.4% of IntClust8, 10.0% of IntClust3, and 14.0% of IntClust7. GATA3 enriched in IntClust1 (20.0%) and IntClust8 (19.5%). CBFB enriched in IntClust3 (7.8%) and IntClust8 (9.7%). The 27 of 2,021 ER+ tumours that mapped to IntClust10 (basal-like-mimicking ER+) had a 59.3% TP53 mutation rate, vs 18.7% across all ER+ tumours.
- Clonality. Most Mut-driver mutation CCFs centred near 1 (early/clonal). PIK3CA mutations were near-clonal in IntClust9 (median 1.0) and IntClust10 (median 1.0) but more subclonal in IntClust3 (median 0.96, IQR 0.75–1, n=215). 199 tumours (10.4%) carried >1 functional mutation in a single Mut-driver gene; MAP3K1 most often multi-hit (53/152 mutants).
- Survival in ER+ disease. MAP3K1 (HR=0.56, CI 0.38–0.82) and GATA3 (HR=0.58, CI 0.4–0.82) associated with longer breast-cancer-specific survival (BCSS). Inactivating SMAD4 (HR=3.4, CI 1.4–8.3) and USP9X (HR=3.0, CI 1.2–7.2) mutations associated with worse BCSS. TP53 mutations associated with worse outcome in ER+ (HR=1.6, P=0.0001) but not ER- (HR=1.1).
- Survival in ER- disease. PIK3CA mutations were prognostic (HR=1.4, CI 1.1–1.9) in both helical and kinase domains; NF1 inactivating mutations associated with shorter BCSS (HR=2.7, CI 1.3–5.5).
- PIK3CA × IntClust interaction (key novel finding). PIK3CA mutations were associated with poor outcome in ER+ tumours specifically within IntClust1 (P=0.02), IntClust2 (P=0.05), and IntClust9 (P=0.09 trend) — i.e. IntClusts defined by 17q23, 11q13–14, and 8q24 amplifications respectively — but not in IntClusts 3, 4+, 7, or 8.
- Intra-tumour heterogeneity (MATH). ER+ tumours had lower MATH scores (median 0.29) than ER- (median 0.41). Higher MATH scores associated with worse BCSS in ER+ (P=0.003) but not ER- (P=0.302). IntClust10 had the highest MATH (median 0.47); IntClust2 had paradoxically low MATH (median 0.25) despite poor outcome — likely because the CCND1/PAK1 11q13–14 co-amplification driving IntClust2 is a single early clonal event.
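For readers unfamiliar with MATH, the score is commonly defined as the scaled median absolute deviation of a tumour's variant allele fractions (VAFs) divided by their median. The sketch below assumes that definition, assumes the medians quoted on this page (e.g. 0.29, 0.47) are on the fraction scale rather than multiplied by 100, and uses a hypothetical function name; the ≥5-mutation minimum follows the requirement noted elsewhere on this page.

```python
from statistics import median


def math_score(vafs, min_mutations=5, as_percent=False):
    """MATH = 1.4826 * MAD(VAF) / median(VAF); returns None when not computable."""
    if len(vafs) < min_mutations:
        return None  # too few mutations for a stable estimate
    centre = median(vafs)
    if centre == 0:
        return None  # ratio undefined
    # 1.4826 scales the MAD so that it estimates the SD under normality.
    mad = 1.4826 * median(abs(v - centre) for v in vafs)
    score = mad / centre
    return 100 * score if as_percent else score


# Toy VAFs, invented for illustration.
spread_out = [0.10, 0.20, 0.30, 0.40, 0.50]   # broad VAF distribution
homogeneous = [0.40, 0.40, 0.40, 0.40, 0.40]  # single clonal peak

print(round(math_score(spread_out), 3))
print(math_score(homogeneous))
print(math_score([0.2, 0.3]))
```

A tumour dominated by one clonal peak scores near zero, which is the intuition behind the low MATH values reported for IntClust2.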
Genes & alterations
- PIK3CA — 40.1% coding-mutation frequency; ER+ ONC=94%, ER- ONC=81%. Helical-domain (codons 542/545) and codon 345 hotspots enriched in ER+; kinase-domain codon 1047 enriched in ER-. Mutually exclusive with AKT1, PIK3R1, FOXO3. Prognostic in ER- across both domains (HR=1.4); prognostic in ER+ only within IntClusts 1, 2, and 9. Frequently co-occurs with PTEN inactivating mutations (15/57 PTEN-mutant tumours also have recurrent PIK3CA mutations).
- TP53 — 35.4% coding-mutation frequency; ER+ ONC=42% TSG=35%, ER- ONC=45% TSG=40%. 85.4% of TP53-mutant tumours show LOH. Mutated in 84.6% of IntClust10. Mutations in the DNA-binding domain associated with worst outcomes; prognostic in ER+ (HR=1.6) but not ER-.
- MUC16, AHNAK2, SYNE1 — high coding-mutation frequencies (16.8%, 16.2%, 12.0%) but flagged as having high background mutation rates; uncertain breast-cancer driver status.
- KMT2C (MLL3) — 11.4% mutated; ER+ TSG-driver. Often subclonal in IntClust1, more clonal in IntClust8. Lower-grade-associated.
- GATA3 — 11.1% mutated; enriched in HER2+/ER+ (8.2%) vs HER2+/ER- (0.5%). Mutations associated with younger age, lower grade, longer survival (HR=0.58 ER+).
- CDH1 — Inactivating mutations in 52.6% of lobular carcinomas (hallmark of invasive lobular); 96.0% of mutant tumours show LOH. Co-mutated with PIK3CA, TBX3, RUNX1, ERBB2. Strongly mutually exclusive with TP53. HD in 18 ductal/NST and 4 lobular cases.
- ERBB2 — 2.8% ER+, 3.2% ER- somatic mutation frequency; codon 755 hotspot enriched in ER+; mutations observed in relapsed CDH1-mutant lobular carcinomas; ERBB2 (17q12) amplification is mutually exclusive with AKT1 mutations (OR=0.091).
- MAP3K1 — Most common multi-hit gene (53/152 mutants with >1 functional mutation), suggesting biallelic inactivation. ER+ TSG. Mutations associated with longer BCSS in ER+ disease (HR=0.56).
- SF3B1 — Recurrent K700E in 3.5% of ER+ tumours (ONC=52%); linked to differential splicing in breast tumours.
- SWI/SNF members ARID1A and PBRM1 — Inactivating mutations identified as Mut-drivers; raises possibility of synthetic-lethal vulnerabilities (e.g. ARID1B dependency in ARID1A-deficient context).
- HD-defined TSGs FOXO3, FOXP1, CTNNA1, AFDN/MLLT4, MEN1 — newly nominated breast-cancer Mut-drivers via combined HD + inactivating-mutation evidence.
- NF1 — Inactivating mutations associated with shorter BCSS in ER- tumours (HR=2.7).
- SMAD4, USP9X — Inactivating mutations associated with worse BCSS in ER+ (HR=3.4 and 3.0 respectively).
- KRAS, HRAS, BRAF — classical hotspot activating mutations observed at low frequencies (KRAS codon 12 in 11 samples; HRAS at codons 12/61 in 3 samples; one BRAF V600E in ER-). KRAS is included among the 40 Mut-drivers (MAPK signalling); HRAS and BRAF are not on the Mut-driver list.
- CCND1 + PAK1 — co-amplification at 11q13–14 defines IntClust2; a single early clonal driver event is the likely explanation for the paradoxically low MATH heterogeneity despite poor outcomes.
- ZNF703, MYC — driver CNAs at 8p11 (IntClust6) and 8q24 (IntClust9) respectively, used in the IntClust copy-number taxonomy referenced here.
- JAK1, NT5E (CD73) — newly identified HD-targeted candidates linked to immune evasion / immune modulation.
Clinical implications
- PIK3CA prognostic context-dependence in ER+ disease. The paper explicitly argues that the contradictory literature on PIK3CA’s prognostic value in ER+ breast cancer can be resolved by stratifying on IntClust: PIK3CA mutations are associated with shorter BCSS specifically in IntClust1 (17q23-amp), IntClust2 (11q13–14-amp), and IntClust9 (8q24-amp) ER+ tumours. The authors call this out as a finding that should inform stratification in future PI3K-inhibitor trials (PMID:27161491, Discussion).
- PI3K-inhibitor trial design. No specific PI3K inhibitor is named or tested, but the authors flag that clinical-trial interpretation should account for IntClust background in addition to PIK3CA mutation status.
- Cross-cancer drug repurposing hypothesis. Mut-drivers such as KRAS, ARID1A, CDKN2A, PBRM1, KDM6A, MEN1, FOXP1, USP9X, BAP1, and SMAD4, all well-known drivers in other cancer types, define detectable subsets of breast cancer in which therapies developed elsewhere may be applicable. Specifically suggested: ARID1B-targeting in ARID1A-deficient SWI/SNF tumours.
- AGTR2 (angiotensin II receptor type 2). Recurrent P271L mutations in 6 ER+ tumours; flagged as a possible therapeutic target.
- HER2 + ER subgroup heterogeneity. HER2+/ER- tumours had higher TP53 functional mutation rates (67.5%) than HER2+/ER+ (42.6%); HER2+/ER+ had higher GATA3 mutation rates (8.2% vs 0.5%). Implications for resistance to anti-HER2 therapy via PIK3CA mutations are noted (HER2-/ER+ had 46.5% PIK3CA mutation rate vs HER2+/ER+ 29.5%).
- MATH heterogeneity as a survival biomarker in ER+. Higher MATH score (upper quartile) associated with significantly worse BCSS in ER+ (P=0.003) but not ER-. The authors caution this should be interpreted in the IntClust context: IntClust2 (CCND1/PAK1-amp) has low heterogeneity but very poor outcome, so heterogeneity-based biomarkers must be IntClust-aware.
- Resistance to neo-adjuvant chemotherapy in IntClust2. Tumours co-amplifying CCND1 and PAK1 at 11q13–14 are highlighted as a small but highly aggressive subgroup that has previously been shown to be resistant to neo-adjuvant cytotoxic chemotherapy and warrants development of better strategies.
Limitations & open questions
- Pan-cancer-frequent genes MUC16, AHNAK2, and SYNE1 appear at high coding-mutation rates (16.8%, 16.2%, 12.0%) but their tumorigenic roles in breast cancer remain uncertain — high background mutation rate confounds interpretation.
- The 173-gene targeted panel excludes whole-genome / whole-exome events; novel drivers outside the panel would not be detected. The panel was assembled from 5 prior 2012 sequencing studies plus HD-targeted genes — recently nominated drivers are necessarily under-represented.
- Several IntClust × PIK3CA interactions reach P=0.02–0.09 in fairly small per-IntClust subgroups (e.g. IntClust1 21/117 PIK3CA-mutant; IntClust2 28/74); the authors explicitly flag the need for external validation.
- MATH score depends on having ≥5 mutations per sample; tumours with very few mutations were excluded from the heterogeneity analysis, limiting analysis of mutation-quiet tumours (e.g. the CNA-devoid IntClust4).
- Cancer-cell-fraction estimates depend on copy number and purity calls from ASCAT; subclonal architecture is only approximated, not directly reconstructed.
- No specific PI3K inhibitor (e.g. alpelisib, buparlisib) is tested — the trial-design implications are correlative/observational only.
- Pathogenic germline classification was performed in-cohort; concordance with established germline-variant classifications is not formally benchmarked.
- The “tissue-organisation” cluster of CDH1, CTNNA1, AFDN/MLLT4 is biologically compelling but functional confirmation of CTNNA1 and AFDN/MLLT4 as breast-cancer TSGs is left to future work.
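To make the dependence of cancer-cell-fraction (CCF) estimates on purity and copy number concrete, here is a sketch of a widely used approximation: the observed VAF is rescaled by the average number of copies of the locus per cell in the sample and by the number of mutated copies. This is an assumed textbook formula, not necessarily the one the authors applied; the function and parameter names are hypothetical.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, multiplicity=1, normal_cn=2):
    """Approximate CCF from VAF, tumour purity, and local tumour copy number.

    multiplicity: number of chromosome copies carrying the mutation (assumed known).
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average copies of the locus per cell across tumour and normal cells.
    avg_copies = purity * tumour_cn + (1 - purity) * normal_cn
    ccf = vaf * avg_copies / (purity * multiplicity)
    return min(ccf, 1.0)  # sampling noise can push the raw estimate above 1


# Toy values: 60% pure sample, diploid locus, heterozygous mutation.
print(cancer_cell_fraction(vaf=0.30, purity=0.6, tumour_cn=2))  # clonal-like
print(cancer_cell_fraction(vaf=0.15, purity=0.6, tumour_cn=2))  # subclonal-like
```

Because purity and copy number both enter the denominator or the scaling term, errors in either call propagate directly into the CCF, which is why the estimates only approximate subclonal architecture.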
Citations from this paper used in the wiki
- “We sequence 173 genes in 2,433 primary breast tumours that have copy number aberration (CNA), gene expression and long-term clinical follow-up data. We identify 40 mutation-driver (Mut-driver) genes…” (Abstract).
- “PIK3CA (coding mutations in 40.1% of the samples) and TP53 (35.4%) dominated the mutation landscape. Only five other genes harboured coding mutations in at least 10% of the samples: MUC16 (16.8%); AHNAK2 (16.2%); SYNE1 (12.0%); KMT2C (also known as MLL3; 11.4%) and GATA3 (11.1%)” (Results, p.2).
- “Predicted pathogenic germline mutations… in BRCA1 and BRCA2 were identified in 1.36% and 1.64% of the cohort, respectively, and 2.22% of tumours harboured pathogenic CHEK2 germline mutations. TP53 pathogenic germline mutations were found in 0.82% of the tumours.” (Results, p.2).
- “After stratifying by ER status, we identified 40 genes (22 in ER+ only, 3 in ER- only, 15 shared) that are here on referred to as Mut-drivers genes.” (Results, p.3).
- “Mutual exclusivity between mutations in PIK3CA and AKT1 (OR=0.017, CI=0.00044–0.1), between PIK3CA and PIK3R1 (OR=0.092…), and between PIK3CA and FOXO3 (OR=0.1…) reflect functional redundancy within the Akt signalling pathway.” (Results, p.7).
- “TP53 has functional mutations in 84.6% of IntClust10, 64.2% of IntClust5, 50.5% of IntClust4-, and 44.7% and 40.7% of IntClusts 9 and 6, respectively… In contrast, TP53 mutations occurred in only 10.0% of IntClust3, 14.0% of IntClust7 and 4.4% of IntClust8…” (Results, p.7–8).
- “In ER+ tumours, mutations in both MAP3K1 (HR=0.56, CI=0.38–0.82) and GATA3 (HR=0.58, CI=0.4–0.82) were associated with longer survival… inactivating mutations in SMAD4 (HR=3.4) and USP9X (HR=3) were associated with worse BCSS.” (Results, p.8).
- “Significant interactions were identified in IntClusts 1+, 2+ and 9+, suggesting that PIK3CA mutations in these specific groups were associated with poor outcome.” (Results, p.10).
- “Tumours within IntClust10 had the highest MATH scores (median=0.47, IQR=0.31–0.61)… Surprisingly, tumours in IntClust2 had low MATH scores (median=0.25, IQR=0.16–0.37) despite patients in this subgroup having poor outcomes. The 11q13–14 amplicon (two gene cassettes centred around CCND1 and PAK1, respectively) is a key driver CNA in IntClust2.” (Results, p.10).