Whole-Genome and Epigenomic Landscapes of Etiologically Distinct Subtypes of Cholangiocarcinoma
PMID: 28667006 · DOI: 10.1158/2159-8290.CD-17-0368 · Journal: Cancer Discovery (2017)
TL;DR
Jusakul et al., on behalf of the International Cancer Genome Consortium, profiled 489 cholangiocarcinomas (CCAs) from 10 countries — combining whole-genome (71), exome (200), targeted (188), copy-number (175), DNA methylation (138), and gene-expression (118) platforms — to define four etiology-driven molecular subtypes. Fluke-Positive Clusters 1 and 2 are enriched in ERBB2 amplifications and TP53 mutations; Fluke-Negative Clusters 3 and 4 either exhibit high copy-number alterations with PD-1/PD-L2 immune-checkpoint expression (Cluster 3) or carry chromatin-modifier mutations (IDH1/IDH2, BAP1) with FGFR/PRKA-related rearrangements (Cluster 4). The study nominates four new CCA driver genes (RASA1, STK11, MAP2K4, SF3B1), reports the first FGFR3-TACC3 fusion in CCA, identifies FGFR2 3′UTR deletions as a new mechanism of FGFR2 upregulation, and reveals two epigenetically distinct hypermethylation subtypes (CpG-island vs CpG-shore) reflecting carcinogen-driven vs genetically-driven oncogenesis PMID:28667006.
Cohort & data
- Total cohort: 489 cholangiocarcinomas (133 Fluke-Positive: 132 Opisthorchis viverrini, 1 Clonorchis sinensis; 356 Fluke-Negative; 39 HBV/HCV-positive; 5 PSC-positive) from 10 countries (Singapore, Romania, Thailand, Italy, France, South Korea, Brazil, Taiwan, China, Japan), all staged AJCC 7th edition, none pretreated PMID:28667006.
- Anatomical breakdown: intrahepatic (IHCH), perihilar (PHCH), and distal extrahepatic (EHCH) tumors, with anatomical and survival data on 459 samples PMID:28667006.
- Genomic platforms: whole-genome sequencing on 71 tumor/normal pairs at an average depth of 64.2× (Illumina HiSeq X10/2500/2000); whole-exome sequencing on 200 cases (previously published); targeted DNA sequencing of 404 genes via SureSelect XT2 capture on 188 cases (HiSeq 4000, 99.6% coding coverage); Illumina HumanOmniExpress SNP arrays on 175 cases (copy-number profiling); Illumina 450K methylation BeadChip on 138 cases; Illumina HumanHT-12 Expression BeadChip (microarray gene expression) on 118 cases PMID:28667006.
- Integrative clustering: iCluster (iClusterPlus) on the 94 samples with all four data types (sSNVs/indels, sCNAs, mRNA, methylation); validated by randomized subsampling and an expanded 121-sample reanalysis with 90% cluster-prediction accuracy PMID:28667006.
- Validation cohort: newly classified samples plus a published 38-sample US TCGA CCA series for survival reproducibility PMID:28667006.
- Cell lines: H69 (immortalized cholangiocyte), HEK293T, EGI-1 (DSMZ ACC 385), HUCCT1 (JCRB0425), and M213 (JCRB1557, from the Liver Fluke and Cholangiocarcinoma Research Center) were used for luciferase reporter and shRNA functional assays PMID:28667006.
- Dataset: chol_icgc_2017. Comparison study: chol_jhu_2013.
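The cohort figures above can be cross-checked with simple arithmetic. The sketch below is illustrative only: the variable and function names are assumptions made for this example, and the counts are the ones quoted in this section.

```python
# Counts quoted in the "Cohort & data" section; names are illustrative.
TOTAL = 489
fluke_pos = {"Opisthorchis viverrini": 132, "Clonorchis sinensis": 1}
fluke_neg = 356

platform_cases = {
    "whole-genome sequencing": 71,
    "whole-exome sequencing": 200,
    "targeted sequencing": 188,
    "SNP array": 175,
    "450K methylation": 138,
    "expression array": 118,
}


def fraction_of_cohort(n_cases: int, total: int = TOTAL) -> float:
    """Return the share of the cohort profiled, as a percentage (1 decimal)."""
    return round(100.0 * n_cases / total, 1)


# Fluke-Pos and Fluke-Neg counts should add up to the full cohort.
fluke_pos_total = sum(fluke_pos.values())
cohort_check = fluke_pos_total + fluke_neg

# Share of the 489-tumor cohort covered by each platform.
coverage = {name: fraction_of_cohort(n) for name, n in platform_cases.items()}

print(f"Fluke-Pos total: {fluke_pos_total}; cohort total: {cohort_check}")
for name, pct in coverage.items():
    print(f"{name}: {pct}% of cohort")
```

No platform covers the whole cohort, which is why the integrative clustering is restricted to the subset of samples with all four data types.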
Key findings
- Four molecular CCA clusters from integrative multi-omic clustering. Cluster 1 (mostly Fluke-Pos): CpG-island promoter hypermethylation, enriched in ARID1A (p < 0.01) and BRCA1/BRCA2 (p < 0.05) mutations, high non-synonymous mutation burden, enrichment of promoter mutations in H3K27me3-marked genes. Cluster 2 (mixed): upregulated CTNNB1, WNT5B, AKT1 expression. Clusters 1 and 2 are enriched in TP53 mutations (p < 0.001) and ERBB2 amplifications (p < 0.01; both Fisher’s exact test). Cluster 3 (Fluke-Neg): highest sCNA burden, immune-checkpoint upregulation (PDCD1, PDCD1LG2, BTLA). Cluster 4 (Fluke-Neg): BAP1, IDH1/IDH2 mutations, FGFR alterations (all p < 0.01), CpG-shore hypermethylation PMID:28667006.
- Clusters correlate with anatomy and prognosis. Clusters 1 and 2 are enriched in extrahepatic tumors; Clusters 3 and 4 are almost entirely intrahepatic (p < 0.001). Patients in Clusters 3 and 4 have significantly better overall survival (log-rank p < 0.001), independent of fluke status, anatomical location, and stage (Cox p < 0.05); the survival difference was reproduced in an independent validation cohort PMID:28667006.
- Mutation burden: average 82 non-silent somatic mutations/tumor (median 47), comprising 64 sSNVs (median 41) and 18 indels (median 6). Three CCAs were hypermutated (5.91 sSNVs/Mb and 24.17 indels/Mb); they showed MSI-associated signatures, and two of them carried POLE mutations. Excluding hypermutators, Fluke-Pos CCAs carried significantly more somatic mutations than Fluke-Neg CCAs (median 4,700 vs 3,143/tumor, p < 0.05) PMID:28667006.
- Four new CCA driver genes (32 SMGs total by MutSigCV and IntOGen, q < 0.1 by both): RASA1 (4.1%, mostly frame-shift/nonsense; shRNA knockdown enhances migration/invasion in CCA cell lines), STK11 (5%, mostly inactivating), MAP2K4 (homozygous deletions in 2 Fluke-Pos cases plus 2.2% mutations, half inactivating), and SF3B1 (4.6%, hotspots at codons 625 and 700, previously seen in uveal melanoma and breast cancer) PMID:28667006.
- ERBB2 amplification is enriched in Fluke-Pos CCAs (10.4% vs 2.7% in Fluke-Neg, p < 0.01). ERBB2-amplified samples averaged 14 copies (by ASCAT on SNP array or Quandico on sequencing data) and were independently validated by FISH. Activating ERBB2 mutations (S310F/Y, G292R, T862A, D769H, L869R, V842I, G660D) were detected in 9 cases (2%) PMID:28667006.
- Other recurrent CN events: MYC amplification (n=12), MDM2 (n=9), EGFR (n=11), CCND1 (n=7) amplifications; CDKN2A (n=17), UTY (n=17), KDM5D (n=16) deletions PMID:28667006.
- Structural variants: CREST called an average of ~93 somatic SVs/tumor (median 69, range 0–395); 91% were PCR-validated and most (65%) were intra-chromosomal. SVs were associated with ARID1A, CDKN2A/B, TTC28, and fragile site 1q21.3. SV burden varied across clusters (Kruskal–Wallis p < 0.05); TP53, FBXW7, and SMAD4 mutation status was associated with increased SV burden (q < 0.1) PMID:28667006.
- FGFR rearrangement landscape. Five in-frame fusions with intact tyrosine-kinase domains: FGFR2-STK26, FGFR2-TBC1D1, FGFR2-WAC, FGFR2-BICC1, and FGFR3-TACC3. The FGFR3-TACC3 fusion is the first reported in CCA. All FGFR2 rearrangements occurred exclusively in Cluster 4 (p < 0.001) PMID:28667006.
- FGFR2 3′UTR loss as a new activation mechanism. Recurrent truncating SVs translocated FGFR2 without its 3′UTR to intergenic regions; FGFR2-truncated CCAs had significantly higher FGFR2 transcript levels (p < 0.01), and luciferase reporter assays in HEK293T and H69 cells confirmed that the intact FGFR2 3′UTR represses expression PMID:28667006.
- PRKACB (PKA catalytic subunit B) rearrangements. ATP1B1-PRKACB and LINC00261-PRKACB fusions retain the PKA pseudokinase domain and may activate downstream MAPK signaling PMID:28667006.
- L1 (LINE-1) retrotransposition is recurrent and Fluke-Pos-associated. 52 events in 20/71 (28.2%) of WGS tumors, 98% PCR-validated, predominantly originating from a TTC28 intron-1 L1 element. L1 retrotransposition was enriched in Fluke-Pos tumors (p < 0.01) and correlated with increased SV burden (p < 0.05) PMID:28667006.
- TERT promoter mutations are rare in CCA. Only 2/71 WGS cases (2.8%) carried TERT promoter mutations (chr5:1295228); no other recurrent non-coding promoter point mutations were identified PMID:28667006.
- FIREFLY (a new method) identifies non-coding regulatory dysregulation at the gene-set level. Applied to 70 WGS samples and 6,639 mutated promoters, FIREFLY — which integrates protein-binding microarray data for 486 TFs — identified four gene sets enriched for promoter mutations that alter TF binding and produce concordant transcriptional dysregulation. Two of the four sets (MIKKELSEN_MCV6_HCP_WITH_H3K27ME3 and MIKKELSEN_MEF_ICP_WITH_H3K27ME3) are PRC2 targets bearing the H3K27me3 mark, and 3 of 4 sets were preferentially mutated in Cluster 1, supporting a role for PRC2/H3K27me3 dysregulation in that subtype PMID:28667006.
- Two distinct DNA-methylation subgroups. Unsupervised methylation clustering on 138 CCAs reproduced Cluster 1 (CpG-island hypermethylation, Fluke-Pos) and Cluster 4 (CpG-shore hypermethylation, Fluke-Neg) at 96.3% and 86.1% concordance with the integrative clusters. GSEA showed that both hypermethylation patterns target PRC2 pathways, but at different genomic features (CpG islands vs CpG shores). Promoter methylation was inversely correlated with transcript level in both clusters (q < 0.05) PMID:28667006.
- Distinct mechanisms drive Cluster 1 vs Cluster 4 hypermethylation. Cluster 1 shows downregulation of the demethylase TET1 and upregulation of histone methyltransferase EZH2. Cluster 4 is enriched in IDH1/IDH2 mutations (31.6% vs 1.0% in other clusters, q < 0.001) and, among IDH-WT Cluster 4 tumors, enriched in BAP1 inactivating point mutations and regional deletions (q < 0.001 and q < 0.05 respectively) PMID:28667006.
- Mutational signatures: ten established signatures detected — COSMIC Signatures 1, 5, 8, 16, 17; APOBEC (Signatures 2, 13); MMR-deficient (Signatures 6, 20); aristolochic acid (Signature 22). Fluke-Pos CCAs were enriched for APOBEC mutation burden (p < 0.001). Signature 1 (CpG>TpG) was elevated in Cluster 1 even after age adjustment (p < 0.001), and CpG>TpG mutations were preferentially located near hypermethylated regions in Cluster 1 (p < 0.001) but not Cluster 4 — consistent with deamination of methylated cytosines as a Cluster 1 mutational driver PMID:28667006.
- Cluster 1 is subclonally heterogeneous, Cluster 4 is clonal. VAF-distribution analysis of point mutations (in copy-neutral regions, purity-adjusted) showed wide spread in Cluster 1 indicating heterogeneous subclones, vs tightly clonal structure in Cluster 4 PMID:28667006.
- Driver-gene–anatomy associations: BAP1 and KRAS were more frequently mutated in intrahepatic CCAs (q < 0.1, Fisher’s exact test), persisting after fluke-status adjustment in multivariate regression PMID:28667006.
- Immune signal in Cluster 3. ESTIMATE showed immune infiltration in both Clusters 2 and 3, but only Cluster 3 specifically upregulated immune-checkpoint genes (PDCD1, PDCD1LG2, BTLA) and antigen cross-presentation, CD28 co-stimulation, and T-cell signaling pathways PMID:28667006.
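The clonality finding above rests on variant allele frequencies (VAFs) in copy-neutral regions after purity adjustment. This page does not describe the exact adjustment the authors used, so the following is only a minimal sketch of the standard relationship for a heterozygous mutation at a diploid locus: expected VAF = purity × CCF / 2, where CCF is the fraction of tumor cells carrying the mutation. The function names, the clonality threshold of 0.8, and the example values are all hypothetical.

```python
def cancer_cell_fraction(vaf: float, purity: float) -> float:
    """Estimate the fraction of tumor cells carrying a heterozygous mutation.

    Assumes a diploid, copy-neutral locus with one mutated copy per cell,
    so expected VAF = purity * CCF / 2 and therefore CCF = 2 * VAF / purity.
    The estimate is capped at 1.0 because sampling noise can exceed it.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must be in [0, 1]")
    return min(1.0, 2.0 * vaf / purity)


def is_clonal(vaf: float, purity: float, threshold: float = 0.8) -> bool:
    """Call a mutation clonal when its CCF reaches a (hypothetical) threshold."""
    return cancer_cell_fraction(vaf, purity) >= threshold


# Hypothetical tumor: 60% purity and four mutations with these VAFs.
example_purity = 0.6
example_vafs = [0.30, 0.29, 0.12, 0.06]
ccfs = [round(cancer_cell_fraction(v, example_purity), 2) for v in example_vafs]
labels = [is_clonal(v, example_purity) for v in example_vafs]
print(list(zip(example_vafs, ccfs, labels)))
```

Under these assumptions, a tumor whose mutations cluster near CCF 1.0 looks clonal, while a wide spread of lower CCF values points to subclones, which is the contrast drawn between Cluster 4 and Cluster 1.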
Genes & alterations
- ERBB2 — Amplification in 3.9–8.5% of CCAs, enriched in Fluke-Pos (10.4% vs 2.7%, p < 0.01); average 14 copies; FISH-validated. Activating point mutations (S310F/Y, G292R, T862A, D769H, L869R, V842I, G660D) in 2% of cases. Defining co-feature of Clusters 1 and 2; nominated as anti-HER2 therapeutic target PMID:28667006.
- TP53 — Significantly enriched in Clusters 1 and 2 (p < 0.001). TP53 mutation associated with increased SV burden (q < 0.1) PMID:28667006.
- ARID1A — Enriched in Cluster 1 (p < 0.01); also a recurrent SV target PMID:28667006.
- BRCA1/BRCA2 — Enriched in Cluster 1 (p < 0.05) PMID:28667006.
- IDH1/IDH2 — Enriched in Cluster 4 (31.6% vs 1.0% in other clusters, q < 0.001); proposed driver of CpG-shore DNA hypermethylation via 2-hydroxyglutarate oncometabolite production. Suggested as candidates for IDH inhibitors (e.g. ivosidenib, then in trial NCT02073994) PMID:28667006.
- BAP1 — Enriched in IDH-WT Cluster 4 (q < 0.001 for inactivating point mutations, q < 0.05 for regional deletions); BAP1-mutant CCAs show increased CpG hypermethylation. More frequent in intrahepatic CCAs (q < 0.1) PMID:28667006.
- FGFR2 — In-frame fusions (FGFR2-STK26, -TBC1D1, -WAC, -BICC1) and a new class of recurrent truncating rearrangements removing the 3′UTR — both classes elevate FGFR2 expression. Indels (n=3), SNVs (n=10), and copy-gain (n=1) also observed. Rearrangements are exclusive to Cluster 4 (p < 0.001); aggregated FGFR2 alterations enriched in Cluster 4 (p < 0.01) PMID:28667006.
- FGFR3–TACC3 — First reported in CCA; previously characterized as oncogenic in bladder cancer, glioblastoma, and lung cancer PMID:28667006.
- PRKACB — ATP1B1-PRKACB and LINC00261-PRKACB fusions retain the pseudokinase domain; proposed to activate MAPK signaling PMID:28667006.
- RASA1 — Newly nominated CCA driver; inactivating mutations in 4.1% (10 frame-shift, 4 nonsense) plus focal CN losses, both correlated with reduced expression. shRNA knockdown in CCA cell lines (M213, HUCCT1) enhanced migration and invasion in Transwell assays — supporting tumor-suppressor function PMID:28667006.
- STK11 — Newly nominated CCA driver; mutated in 5%, mostly inactivating (7 nonsense, 9 frame-shift) PMID:28667006.
- MAP2K4 — Newly nominated CCA driver; homozygous deletions in 2 Fluke-Pos cases plus 2.2% mutations (half inactivating) — consistent with tumor-suppressor role PMID:28667006.
- SF3B1 — Newly nominated CCA driver; mutated in 4.6% at hotspots codon 625 (23%) and codon 700 (14%) — implicating splicing dysregulation in CCA, paralleling uveal melanoma and breast cancer hotspots PMID:28667006.
- KRAS — Significantly more frequent in intrahepatic CCAs (q < 0.1) PMID:28667006.
- TERT — Promoter mutations rare in CCA (2/71 WGS cases, 2.8%) at chr5:1295228 PMID:28667006.
- FBXW7, SMAD4 — Mutations associated with elevated SV burden (q < 0.1) PMID:28667006.
- CTNNB1, WNT5B, AKT1 — Upregulated expression in Cluster 2 (p < 0.05) PMID:28667006.
- MYC, MDM2, EGFR, CCND1 — Recurrent oncogene amplifications (n=12, 9, 11, 7 respectively) PMID:28667006.
- CDKN2A, UTY, KDM5D — Recurrent deletions (n=17, 17, 16 respectively) PMID:28667006.
- TTC28 — Source of recurrent somatic L1 retrotransposition in 28.2% of WGS tumors PMID:28667006.
- POLE — Mutated in 2 of the 3 hypermutator CCAs (alongside MSI mutational signatures) PMID:28667006.
- PDCD1 (PD-1), PDCD1LG2 (PD-L2), BTLA — Specifically upregulated in Cluster 3, motivating immune-checkpoint blockade as a candidate therapeutic strategy in that subtype PMID:28667006.
- TET1, EZH2 — Cluster 1 shows downregulated TET1 and upregulated EZH2 expression, suggesting loss-of-demethylation plus gain-of-repressive-methylation as the epigenetic mechanism for Cluster 1 hypermethylation PMID:28667006.
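Several entries above report enrichment p-values, and the quoted passages attribute the ERBB2 and FGFR2 comparisons to Fisher's exact test. As a rough illustration of how such a test works on a 2×2 table (altered vs unaltered, in one group vs another), here is a minimal standard-library sketch. The function name is an assumption, and the example tables are toy counts, not the paper's data, because the underlying counts are not given on this page.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    With the margins held fixed, the top-left cell follows a hypergeometric
    distribution. The p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # A small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Toy tables only (not from the paper).
p_weak = fisher_exact_two_sided(3, 1, 1, 3)    # 34/70, about 0.486
p_strong = fisher_exact_two_sided(4, 0, 0, 4)  # 2/70, about 0.029
print(round(p_weak, 3), round(p_strong, 3))
```

For real analyses a library routine such as `scipy.stats.fisher_exact` would normally be preferred; the hand-rolled version is shown only to make the calculation explicit.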
Clinical implications
- Molecular subtype provides prognostic information beyond anatomy. Clusters 3 and 4 carry significantly better overall survival than Clusters 1 and 2 (log-rank p < 0.001), independent of fluke status, anatomical site, and stage on multivariate Cox regression (p < 0.05); reproduced in an independent validation cohort PMID:28667006.
- Cluster-specific therapeutic hypotheses (require clinical validation):
  - Clusters 1 and 2: anti-HER2. ERBB2-amplified CCAs (predominantly Fluke-Pos) are candidates for HER2-targeted agents; cell-line data referenced by the authors suggest high-ERBB2 CCAs are more sensitive to ERBB2 inhibition than low-ERBB2 cases PMID:28667006.
  - Cluster 3: immune checkpoint blockade. Upregulation of PDCD1 (PD-1), PDCD1LG2 (PD-L2), and BTLA, combined with antigen-presentation and T-cell signaling pathway upregulation, motivates immunotherapy trials in this subtype, though sample size is small PMID:28667006.
  - Cluster 4: IDH inhibitors and FGFR-targeted agents. IDH1/IDH2 mutations and FGFR2/FGFR3 rearrangements suggest opportunities for IDH inhibitors (e.g. ivosidenib, then in NCT02073994) and FGFR inhibitors. FGFR2 3′UTR-loss truncating rearrangements expand the FGFR-targetable population beyond canonical in-frame fusions and activating mutations PMID:28667006.
- Anatomical classification alone is insufficient for treatment decisions. Tumors in different anatomical sites can be molecularly similar, and tumors in the same anatomical site span all four molecular clusters; current oncology guidelines do not discriminate CCA treatment by anatomical site PMID:28667006.
- Liver fluke status is not equivalent to molecular subtype. Cluster-associated survival differences persist after fluke-status adjustment, and Cluster 2 contains both Fluke-Pos and Fluke-Neg tumors — cautioning against using fluke status as a proxy for molecular classification PMID:28667006.
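The survival comparison between clusters is reported as a log-rank p-value. For readers unfamiliar with the test, the sketch below implements a basic two-group log-rank statistic with the standard library. It is a simplified illustration, not the authors' analysis: the function name and the follow-up times are hypothetical, and the multivariate Cox adjustment mentioned above is not covered.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    times_*  : follow-up time for each patient
    events_* : 1 if death was observed at that time, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in data if u >= t)                # at risk overall
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)          # deaths at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of group A deaths at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical follow-up times (months); every patient has an observed event.
poor_times, poor_events = [2, 3, 4, 5, 6], [1, 1, 1, 1, 1]
good_times, good_events = [10, 12, 14, 16, 18], [1, 1, 1, 1, 1]
chi2_stat, p_value = logrank_test(poor_times, poor_events, good_times, good_events)
print(f"chi2 = {chi2_stat:.2f}, p = {p_value:.4f}")
```

A small p-value here only says the two survival curves differ; showing that cluster membership is prognostic independently of fluke status, site, and stage requires a multivariate model, as in the Cox analysis the paper reports.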
Limitations & open questions
- Platform heterogeneity. Sample-resource constraints (DNA, RNA, FFPE) prevented all platforms from being applied to all samples; the integrative clustering rests on 94–121 samples with multi-platform data while the full driver/SV/CN analyses use larger but partially-overlapping subsets. Authors used statistical models and overlap concordance to argue platform bias is minimal, but cross-platform merging restricts analysis to common genomic regions PMID:28667006.
- Cluster 3 is small. Immunotherapy nominations for Cluster 3 rest on a small sample size; the mechanistic basis for immune-checkpoint upregulation in this subtype is unresolved (the authors note that aneuploidy-associated immunogenicity is one possible explanation) PMID:28667006.
- Center-specific pre-processing differences (collection, biopsy site, processing protocols) may introduce sequencing biases; partially mitigated by standardized AJCC 7th-edition histology review and tumor-cell-content estimation PMID:28667006.
- Functional validation is restricted. Only three findings were experimentally tested: RASA1 (shRNA knockdown with migration/invasion assays), the FGFR2 3′UTR (luciferase reporter), and 2 of 3 selected promoter mutations (validation of FIREFLY predictions). The other newly nominated drivers (STK11, MAP2K4, SF3B1) and the proposed therapeutic vulnerabilities require further functional and clinical validation PMID:28667006.
- FIREFLY does not predict directionality of expression change for individual TF binding-change mutations, since TFs can be activators or repressors; gene-set-level effects are inferred without per-mutation directionality PMID:28667006.
- Cell-of-origin remains a competing explanation for the Cluster 1 vs Cluster 4 hypermethylation differences. The biliary system contains multipotent stem/progenitor cells; liver fluke infection primarily affects large bile ducts while parenchymal liver diseases affect canals of Hering and bile ductules — distinct vulnerable populations could partly explain the molecular differences alongside the proposed extrinsic-vs-intrinsic carcinogenesis model PMID:28667006.
- Comparison with the contemporary US TCGA CCA series is only approximate. That series (38 samples, exclusively fluke-negative, mostly intrahepatic, North American; overlaps with the chol_jhu_2013 lineage of CCA studies) does not map one-to-one onto the four clusters: the TCGA “IDH” and “METH3” groups map approximately to Cluster 4 and “ECC” to Cluster 2, while the TCGA “METH2” group (CCND1 amplifications) and Cluster 3 are not obviously matched PMID:28667006.
- Generalizability of FIREFLY. Whether the gene-set-level promoter-mutation framework recovers analogous H3K27me3/PRC2 signals in other cancer types remains an open empirical question raised by the authors PMID:28667006.
Citations from this paper used in the wiki
- “We analysed 489 CCAs from 10 countries, combining whole-genome (71 cases), targeted/exome, copy-number, gene expression, and DNA methylation information” (Abstract).
- “Integrative clustering defined four CCA clusters – Fluke-Positive CCAs (Clusters 1/2) are enriched in ERBB2 amplifications and TP53 mutations, conversely Fluke-Negative CCAs (Clusters 3/4) exhibit high copy-number alterations and PD-1/PD-L2 expression, or epigenetic mutations (IDH1/2, BAP1) and FGFR/PRKA-related gene rearrangements” (Abstract).
- “Whole-genome analysis highlighted FGFR2 3′UTR deletion as a mechanism of FGFR2 upregulation” (Abstract).
- “Our analysis revealed four potentially new CCA driver genes not highlighted in previous CCA publications – RASA1, STK11, MAP2K4, and SF3B1” (Results, p. 5).
- “RASA1 … was predicted to be inactivated in 4.1% of cases (10 frame shift, 4 nonsense) … shRNA-mediated knockdown of RASA1 resulted in significantly enhanced migration and invasion” (Results, p. 5).
- “ERBB2 amplifications were more frequent in Fluke-Pos cases (10.4% in Fluke-Pos vs 2.7% in Fluke-Neg CCA, p < 0.01, Fisher’s exact test)” (Results, p. 5).
- “we identified 5 in-frame gene fusions with intact tyrosine kinase domains – four involving FGFR2 (FGFR2-STK26, FGFR2-TBC1D1, FGFR2-WAC, and FGFR2-BICC1) and one involving FGFR3 (FGFR3-TACC3) … this is the first report of FGFR3 fusions in CCA” (Results, p. 6).
- “FGFR2 3′UTR loss may thus represent a new and additional mechanism for enhancing FGFR2 expression in CCA” (Results, p. 6).
- “FGFR2 rearrangements were observed exclusively in Cluster 4 (p < 0.001, Fisher’s exact test)” (Results, p. 6).
- “we observed frequent somatic L1 retrotranspositions, particularly originating from an L1 element in intron 1 of the TTC28 gene (52 events in 20/71 tumors, 28.2%)” (Results, p. 7).
- “only two CCAs (2.8%) harboured TERT-promoter mutations (chr5:1295228)” (Results, p. 7).
- “FIREFLY (FInding Regulatory mutations in gEne sets with FunctionaL dYsregulation) … uses experimentally determined high-throughput TF-DNA binding data for 486 TFs” (Results, p. 7).
- “two of these (MIKKELSEN_MCV6_HCP_WITH_H3K27ME3 and MIKKELSEN_MEF_ICP_WITH_H3K27ME3) are subsets of PRC2 target genes” (Results, p. 8).
- “Cluster 1 … was dominated by hypermethylation in promoter CpG islands, while Cluster 4, enriched in Fluke-Neg CCAs, was dominated by hypermethylation in promoter CpG island shores” (Results, p. 9).
- “Cluster 4 CCAs were significantly enriched in IDH1/2 mutations, which are known to be associated with CCA hypermethylation (31.6% in Cluster 4 versus 1.0% in other clusters, q < 0.001)” (Results, p. 9).
- “patients in Clusters 3 and 4 had significantly better overall survival relative to the other 2 clusters (p < 0.001, log-rank test)” (Results, p. 4).
- “Cluster 4 CCAs, which are associated with IDH1/2 mutations and FGFR2 and PRKA-related gene rearrangements, might also be tested with recently described IDH inhibitors (ClinicalTrials.gov identifier: NCT02073994) or FGFR-targeting agents” (Discussion, p. 11).