Driver Fusions and Their Implications in the Development and Treatment of Human Cancers

Authors

Qingsong Gao

Wen-Wei Liang

Steven M. Foltz

Gnanavel Mutharasu

Reyka G. Jayasinghe

Song Cao

Wen-Wei Liao

Sheila M. Reynolds

Matthew A. Wyczalkowski

Lijun Yao

Lihua Yu

Sam Q. Sun

Ken Chen

Alexander J. Lazar

Ryan C. Fields

Michael C. Wendl

Brian A. Van Tine

Ravi Vij

Feng Chen

Matti Nykter

Ilya Shmulevich

Li Ding

Doi

10.1016/j.celrep.2018.03.050

PMID: 29617662 · DOI: 10.1016/j.celrep.2018.03.050 · Journal: Cell Reports (2018)

TL;DR

Gao et al. systematically called gene fusions across 9,624 tumor samples spanning 33 TCGA cancer types using a multi-tool RNA-seq pipeline (STAR-Fusion, EricScript, and Breakfast) followed by stringent panel-of-normals filtering. They identified 25,664 fusions with a 63.3% WGS validation rate (measured on the 18.2% of fusions with available WGS data) and reproduced 95.5% of fusions reported in TCGA marker papers. Integrating expression, copy number, and driver-gene annotations, they argue that fusions drive ~16.5% of cancer cases and act as the sole driver in >1%, while 6.0% of samples harbor at least one druggable fusion. The work catalogs 1,275 kinase fusions with intact catalytic domains, highlights cancer-type-specific kinase-fusion enrichments (e.g., 35.6% of THCA samples), shows mutual exclusivity between fusions and mutations in the same driver gene, and predicts neoantigens from fusion junctions for exploratory immunotherapy purposes PMID:29617662.

Cohort & data

9,624 tumor samples + 713 normal samples from TCGA spanning 33 cancer types: ACC, BLCA, LGG, BRCA, CESC, CHOL, COAD, ESCA, GBM, HNSC, KICH, KIRC, KIRP, LAML, LIHC, LUAD, LUSC, DLBC, MESO, OV, PAAD, PCPG, PRAD, READ, SARC, SKCM, STAD, TGCT, THYM, THCA, UCS, UCEC, and UVM (PANCAN scope).
Datasets: TCGA RNA-seq corpus from CGHub processed on the ISB Cancer Genomics Cloud; mutation calls from MC3 (Public MAF; Ellrott et al., 2018); Level-3 RSEM expression and segment-based copy number from Broad GDAC Firehose (2016_01_28). Frontmatter-anchored to gbm_tcga_pan_can_atlas_2018; the analysis spans the equivalent pan_can_atlas studies for all 33 disease types.
Assays / pipelines: RNA-seq fusion calling with STAR-Fusion, EricScript, and Breakfast (5 kb and 100 kb min-distance cutoffs); kinase-domain integrity inference via AGFusion against UniProt/PFAM; WGS-based fusion validation; HLA-typed neoantigen prediction with NetMHC4; driver-gene mutation overlay from MC3; druggability annotation from the curated DEPO (Database of Evidence for Precision Oncology).
Reference genome: GRCh38 (build 38).
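
The multi-caller merge and panel-of-normals filtering described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout, the gene placeholders (GENE_A and so on), and the function name `filter_fusions` are hypothetical, and the consensus rule (keep a fusion reported by at least two callers, or by STAR-Fusion alone with FFPM > 0.1) is an assumption built around the FFPM threshold mentioned under Limitations.

```python
from collections import defaultdict

# Hypothetical call records: (sample, 5' gene, 3' gene, caller, FFPM or None).
calls = [
    ("S1", "TMPRSS2", "ERG", "star-fusion", 0.8),
    ("S1", "TMPRSS2", "ERG", "ericscript", None),
    ("S2", "GENE_A", "GENE_B", "star-fusion", 0.05),
    ("S3", "GENE_C", "GENE_D", "star-fusion", 0.4),
    ("S4", "GENE_E", "GENE_F", "breakfast", None),
    ("S4", "GENE_E", "GENE_F", "ericscript", None),
]

# Hypothetical panel of normals: gene pairs also observed in non-tumor samples.
panel_of_normals = {("GENE_E", "GENE_F")}


def filter_fusions(calls, panel_of_normals, min_callers=2, ffpm_cutoff=0.1):
    """Merge per-caller records and apply the assumed filtering rules."""
    callers = defaultdict(set)      # fusion key -> set of callers reporting it
    best_ffpm = defaultdict(float)  # fusion key -> highest STAR-Fusion FFPM
    for sample, gene5, gene3, caller, ffpm in calls:
        key = (sample, gene5, gene3)
        callers[key].add(caller)
        if caller == "star-fusion" and ffpm is not None:
            best_ffpm[key] = max(best_ffpm[key], ffpm)

    kept = []
    for key, tools in callers.items():
        _, gene5, gene3 = key
        if (gene5, gene3) in panel_of_normals:
            continue  # seen in normals: treated as artifact or germline event
        multi_caller = len(tools) >= min_callers
        star_only = tools == {"star-fusion"} and best_ffpm[key] > ffpm_cutoff
        if multi_caller or star_only:
            kept.append(key)
    return sorted(kept)


kept = filter_fusions(calls, panel_of_normals)
```

In this toy input the S2 call is dropped for low support and the S4 call is dropped because its gene pair appears in the panel of normals.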

Key findings

Fusion landscape (n = 25,664): Filtering across multiple panels-of-normals (TCGA normals, GTEx tissues, non-cancer cells from Babiceanu et al., 2016) yielded 25,664 fusions; the pipeline recovered 405 of 424 (95.5%) curated TCGA marker-paper fusions. Of the 4,675 fusions (18.2% of 25,664) testable with WGS data, 63.3% were validated by ≥3 discordant read pairs PMID:29617662.
Breakpoint distribution: Most breakpoints fall in CDS of both partners; significantly more breakpoints occur in 5′UTRs than 3′UTRs (Mann-Whitney U, p < 2.2e-16) for both partners — consistent with more open 5′ chromatin and a larger 5′-UTR exon count PMID:29617662.
Per-sample fusion burden: Median 1 fusion/sample overall (range 0–60); cancer types with median 0 are KICH, KIRC, KIRP, LGG, PCPG, TGCT, THCA, THYM, and UVM PMID:29617662.
Top per-cancer recurrent fusions: TMPRSS2–ERG in PRAD (38.2%); FGFR3–TACC3 in BLCA (2.0%), CESC (1.7%), and LUSC (1.2%); EML4–ALK in LUAD (1.0%); CCDC6–RET in THCA (4.2%); and FGFR2–BICC1 in CHOL (5.6%) PMID:29617662.
Expression follows oncogene/TSG identity: 6% (MESO) to 28% (KIRP) of kinase-partner fusions were over-expression outliers; oncogene partners over-express, TSG partners under-express. TSG under-expression ranged from 3% (BRCA) to 38% (PCPG) of TSG-partner fusions PMID:29617662.
Kinase fusions (n = 2,892): 1,172 with the kinase at 3′, 1,603 with the kinase at 5′, and 117 with kinases at both ends. 1,275 (44.1%) retained an intact kinase domain. THCA is strongly enriched (35.6% of samples; Fisher p < 2.2e-16), with 94.0% being 3′-kinase fusions. The top recurrent 3′-kinase partners enriched in THCA are RET, BRAF, NTRK1, NTRK3, and ALK; all are receptor tyrosine kinases except BRAF, a serine/threonine kinase. FGFR2 and FGFR3 dominate the 5′-kinase recurrent set PMID:29617662.
Promoter-swap mechanism: 70.5% of 3′-kinase fusions (239 of 339) show higher expression of the 5′ partner than of the kinase, whereas 66.7% of 5′-kinase fusions (293 of 439) show lower partner expression. In other words, 3′ kinases tend to borrow their partner’s stronger promoter. Concrete example: a TRABD–DDR2 fusion in one HNSC sample drives DDR2 overexpression via the TRABD promoter PMID:29617662.
WNK-family fusions: 23 fusions involving WNK1 or WNK2 detected across cancer types, most associated with elevated WNK mRNA without matching copy-number amplification (e.g., neither WNK1 nor WNK2 amplified in ESCA or LIHC); ERC1–WNK1 was independently reported in a Chinese ESCC cohort (Chang et al., 2017) PMID:29617662.
Fusion-only drivers: Across 8,955 patients with both MC3 mutations and fusion calls, 8.3% had both driver mutations and driver fusions, 6.4% had both mutations and fusions in driver genes, and 1.8% had driver fusions only (mean 1.1 fusions, no driver mutations); these three groups sum to the ~16.5% of cases attributed to fusions. The “driver fusion only” group has a significantly lower mutational burden than the groups carrying driver mutations (Mann-Whitney U, p < 2.2e-16), at a level comparable to the no-driver-alteration group PMID:29617662.
Mutual exclusivity: When fusion events are present in a driver gene, point mutations in that same gene are rarely observed — strict in ESR1 (0 overlapping samples). TP53 is mutation-dominated except in SARC, where both fusions and mutations occur. In LAML, CBFB is fused but rarely mutated PMID:29617662.
LAML “fusion-only” cases: 14.0% of LAML tumors had fusions but no driver-gene mutations; recovered fusions include CBFB–MYH11 (n=3), BCR–ABL1 (n=2), PML–RARA (n=2), and the leukemia-initiating NUP98–NSD1 (n=2) PMID:29617662.
Druggable fusions cover 6.0% of samples (574/9,624) by DEPO annotation across 29 cancer types. Major recurrent druggable targets: TMPRSS2 in PRAD (205 samples), RET in THCA (33 samples), PML–RARA in LAML (16 samples). FGFR3 is a potential target in 15 cancer types PMID:29617662.
Smoking-stratified LUAD druggability: Of 500 LUAD with known smoking status, never-smokers had a significantly higher rate of druggable fusions (15/75, 20%) than smokers (9/425, 2.1%) (chi-square p < 1e-6) PMID:29617662.
Neoantigen prediction: Mean 1.5 predicted neoantigens per fusion across cancer types (range 0.33 in KICH to 2.88 in THYM). Frameshift fusions yielded more epitopes than inframe fusions (mean 2.2 vs. 1.0). TMPRSS2–ERG, CCDC6–RET, and FGFR3–TACC3 had the most samples with at least one predicted neoantigen PMID:29617662.
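
As a rough illustration of the neoantigen step, the sketch below enumerates the candidate peptides that span a fusion junction. It covers only this enumeration: the paper scores candidates with NetMHC4 against each patient's HLA types, which is not reproduced here. The function name `junction_peptides`, the toy sequences, and the 9-residue window are assumptions made for the example.

```python
def junction_peptides(upstream, downstream, k=9):
    """Return every k-mer that contains residues from both fusion partners.

    `upstream` is the protein sequence contributed by the 5' gene and
    `downstream` the sequence contributed by the 3' gene (in-frame case).
    """
    fused = upstream + downstream
    junction = len(upstream)  # index of the first downstream residue
    # A window [start, start + k) spans the junction when it includes the
    # last upstream residue side (start <= junction - 1) and at least one
    # downstream residue (start + k - 1 >= junction).
    first_start = max(0, junction - k + 1)
    last_start = min(junction - 1, len(fused) - k)
    return [fused[s:s + k] for s in range(first_start, last_start + 1)]


# Toy sequences, not real protein fragments.
peptides = junction_peptides("ACDEFGHIKL", "MNPQRSTVWY", k=9)
```

For an in-frame fusion only these junction-spanning windows are novel, at most k - 1 per junction. A frameshift fusion also produces novel sequence downstream of the junction until the first stop codon, which is one plausible reason for the higher epitope counts reported for frameshift fusions.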

Genes & alterations

TMPRSS2–ERG: Most-recurrent intra-cancer fusion overall (38.2% of PRAD, 205 samples flagged druggable). Modest neoantigen yield in only a small subset.
FGFR3–TACC3: Recurrent across BLCA (2.0%), CESC (1.7%), LUSC (1.2%); inframe activating kinase fusion. FGFR3 named druggable target in 15 cancer types.
EML4–ALK and other ALK fusions: ALK fusions in 20 samples across 8 cancer types (5 in LUAD); EML4 is the most frequent 5′ partner (7/17). Fusion status corresponds to copy-number-neutral ALK overexpression — the rationale for crizotinib and other approved ALK inhibitors.
CCDC6–RET and other RET fusions: Recurrent in THCA (4.2%); 33 THCA samples flagged as druggable on RET. Inframe protein kinase fusions with overexpression of the 3′ RET oncogene; also seen in LUAD.
FGFR2–BICC1: Most recurrent fusion in CHOL (5.6%).
ERBB2 fusions: 4 samples; 3 of 4 had HPV integration within 1 Mb of ERBB2. Partner genes PPP1R1B and IKZF3 are genomic neighbors of ERBB2, suggesting fusions arise from local instability potentially induced by viral integration. Compares with trastuzumab-targetable HER2+ amplification in BRCA.
CBFB–MYH11: Recurrent in LAML; strongly associated with decreased CBFB (TSG/transcriptional regulator) expression. CBFB is fused but rarely mutated in LAML — alternative inactivation mechanism.
DNAJB1–PRKACA: Liver-only; specific to fibrolamellar carcinoma (FLC) subtype of LIHC, corroborating Dinh et al. 2017.
ESR1 fusions: 16 samples across 5 cancer types (9 in BRCA, 8 of which are luminal A/B). Strict mutual exclusivity with ESR1 point mutations. When ESR1 is the 5′ partner, the AF1 transactivation domain is preserved; when 3′, the AF2 domain is preserved. ESR1 expression is elevated in fusion-positive samples, especially in the 9 BRCA cases.
BCR–ABL1, PML–RARA, NUP98–NSD1: Classic leukemic fusions recovered in LAML “fusion-only” tumors.
WNK1/WNK2: 23 WNK-family fusions; mostly higher WNK mRNA without copy-number amplification. ERC1–WNK1 independently reported in a Chinese ESCC cohort.
TRABD–DDR2: HNSC sample with promoter-swap-driven DDR2 overexpression — proposed as candidate for dasatinib. DDR2 fusions seen in nine additional samples across five cancer types.
BRAF, NTRK1, NTRK3: Among the top recurrent 3′-kinase partners enriched in THCA; NTRK1 and NTRK3 are tyrosine kinases, whereas BRAF is a serine/threonine kinase.
TP53: Predominantly mutated, not fused, across cancer types — except in SARC, where both fusion and mutation events were observed.
MERTK (TMEM87B partner) and FGR–WASF2: Recurrent singleton-pattern kinase fusions; FGR–WASF2 in seven samples uses the WASF2 5′UTR promoter to drive FGR overexpression in five of seven. (Wiki pages do not yet exist for MERTK, TMEM87B, FGR, or WASF2.)
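
The mutual-exclusivity pattern noted for ESR1 and TP53 above can be examined with a one-sided hypergeometric test, sketched below. This is an illustrative calculation under an independence null, not the paper's stated method; the function name `prob_overlap_at_most` and the cohort numbers are hypothetical.

```python
from math import comb


def prob_overlap_at_most(k, n_total, n_fusion, n_mutated):
    """P(overlap <= k) if fusion and mutation status were independent.

    Hypergeometric model: n_mutated samples are drawn from n_total samples,
    of which n_fusion carry a fusion in the gene of interest.
    """
    denom = comb(n_total, n_mutated)
    numer = sum(
        comb(n_fusion, i) * comb(n_total - n_fusion, n_mutated - i)
        for i in range(k + 1)
    )
    return numer / denom


# Hypothetical cohort: 1,000 samples, 16 with a fusion in the gene,
# 60 with a point mutation in the same gene, and no sample with both.
p_zero_overlap = prob_overlap_at_most(0, 1000, 16, 60)
```

A small probability would indicate fewer co-occurrences than expected by chance. When a gene has only a handful of fusion carriers, the test has limited power, so zero overlap is better read as descriptive than as conclusive.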

Clinical implications

Druggable fusions in 6.0% of pan-cancer samples suggest a substantial untapped opportunity for fusion-directed targeted therapy; DEPO was used as the druggability oracle (off-label use allowed).
Crizotinib and other FDA-approved ALK inhibitors are flagged for the 20 ALK-fusion samples spanning 8 cancer types, with EML4 the dominant 5′ partner and ALK overexpression occurring without copy-number change.
Dasatinib is proposed for DDR2-overexpressing tumors with promoter-swap fusions (e.g., TRABD–DDR2 in HNSC) following von Massenhausen et al., 2016.
Trastuzumab is contextualized as the canonical ERBB2-amplification therapy; the four ERBB2 fusions identified here (with PPP1R1B and IKZF3 partners) point to a different but potentially actionable mechanism of HER2 dysregulation.
The enrichment of druggable fusions in never-smoker LUAD (20% vs. 2.1% in smokers, p < 1e-6) suggests that fusion screening should be prioritized in never-smoker NSCLC.
Immunotherapy caution: Patients with driver-fusion-only tumors have low overall mutational burden and may be poor candidates for checkpoint immunotherapy despite fusion peptides themselves being potentially immunogenic. Predicted fusion-derived neoantigens were highest for TMPRSS2–ERG, CCDC6–RET, and FGFR3–TACC3, but only in a small subset of the carriers.
Diagnostic implication: Robust pan-cancer fusion diagnostics (RNA-seq based) are necessary to identify the 1.8% of patients whose tumor is driven only by fusion events and who would otherwise look “driver-less” on mutation panels.
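
The never-smoker comparison above can be checked from the reported counts (15 of 75 never smokers vs. 9 of 425 smokers). The sketch below computes a Pearson chi-square statistic for the 2×2 table using only the standard library. Whether the authors applied a continuity correction is not stated, so none is applied here; the function name `chi_square_2x2` is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Rows: never smokers, smokers. Columns: druggable fusion, no druggable fusion.
chi2, p_value = chi_square_2x2(15, 75 - 15, 9, 425 - 9)
```

The statistic comes out near 44.6, which puts the p-value well below the reported 1e-6 threshold.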

Limitations & open questions

WGS validation was possible for only 18.2% of fusions (4,675 of 25,664), because WGS data were available for 1,725 of the 9,624 samples. The 63.3% validation rate is therefore a partial estimate; the untested 81.8% of fusions may retain some false-positive load even after stringent filtering.
RNA-seq-based fusion calling cannot directly measure fusion allele frequency or clonality, limiting interpretation of clonal evolution and intratumor heterogeneity for fusions.
Neoantigen prediction is “exploratory and speculative”: no immune-cell-infiltration or response data are integrated. Frameshift fusions may be down-regulated by nonsense-mediated decay, dampening the apparent advantage over inframe fusions in epitope yield.
Druggability annotation depends on DEPO, which permits off-label inclusion; it may therefore overstate true clinical benefit, and it does not consider co-occurring resistance alterations.
STAR-Fusion is the dominant caller; results may inherit STAR-Fusion biases despite multi-tool integration. Disagreements among callers were filtered using FFPM > 0.1 thresholds for STAR-Fusion-only calls — borderline real fusions could be lost.
Cancer-type-specific fusion biology remains under-explained: e.g., the dramatic 3′-kinase enrichment in THCA (94% of THCA kinase fusions) lacks a mechanistic explanation here.
Several reported fusion partners are not yet wiki entities (MERTK, TMEM87B, FGR, WASF2, REF1) and need either confirmation against canonical HUGO symbols (REF1 may be an alias) or new gene pages.
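
The headline percentages on this page, including the WGS coverage figures discussed in the limitations above, can be recomputed from the counts quoted in the text. The short sketch below does only that arithmetic; the dictionary labels are descriptive names chosen for the example.

```python
# (numerator, denominator) pairs as quoted on this page.
counts = {
    "marker-paper fusions recovered": (405, 424),
    "fusions testable with WGS": (4675, 25664),
    "samples with WGS data": (1725, 9624),
    "kinase fusions with intact kinase domain": (1275, 2892),
    "samples with a druggable fusion": (574, 9624),
}

percentages = {
    label: round(100 * num / den, 1) for label, (num, den) in counts.items()
}

# Shares of the 8,955 patients in the three fusion-related driver groups.
driver_groups = [8.3, 6.4, 1.8]
fusion_driven_share = round(sum(driver_groups), 1)

for label, pct in percentages.items():
    print(f"{label}: {pct}%")
print(f"cases attributed to fusions: {fusion_driven_share}%")
```

Note that the 18.2% figure is a share of fusions, whereas the share of samples with WGS data is slightly lower at 17.9%.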

Citations from this paper used in the wiki

“Our pipeline detected 405 of 424 events curated from individual TCGA marker papers (Table S1) (95.5% sensitivity).” (p. 4)
“Of that subset, WGS validated 63.3% of RNA-seq-based fusions by requiring at least three supporting discordant read pairs from the WGS data.” (pp. 4–5)
“The most recurrent example within any cancer type was TMPRSS2–ERG in prostate adenocarcinoma (PRAD; 38.2%).” (p. 5)
“We further divided these fusions into eight categories on the basis of different kinase groups, including AGC, CAMK, CK1, CMGC, STE, TK, and TKL.” (p. 7)
“Comparison of kinase fusions across different cancer types indicated that kinase fusions are significantly enriched in THCA (35.6%, Fisher’s exact test, p < 2.2e-16).” (p. 7)
“Most (66.7% [293 of 439]) 5′-kinase fusions showed lower expression in the partner gene compared with the kinase. In contrast, 70.5% of 3′-kinase fusions (239 of 339) showed higher partner expression.” (pp. 7–8)
“We found potentially druggable fusions across 29 cancer types … Overall, we found 6.0% of samples (574 of 9,624 samples) to be potentially druggable by one or more fusion targeted treatments.” (p. 9)
“15% of LUAD samples (75 of 500 samples with known smoking status) were from never smokers, while a significantly higher percentage of never smokers (15 of 75 samples) versus smokers (9 of 425 samples) were found to have druggable fusion (chi-square test, p < 1e-6).” (p. 9)
“We observed strict mutual exclusivity between ESR1 mutations and fusions … When ESR1 is the 5′ gene in the fusion, the transactivation (AF1) domain is always included … When ESR1 is the 3′ gene, the transactivation (AF2) domain is always included.” (p. 9)
“On average, there were 1.5 predicted neoantigens per fusion across different cancer types … frameshift fusions can generate more immunogenic epitopes than inframe fusions (mean value 2.2 versus 1.0).” (pp. 9–10)
