Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer

Authors

Kevin M. Boehm

Emily A. Aherne

Lora Ellenson

Ines Nikolovski

Mohammed Alghamdi

Ignacio Vázquez-García

Dmitriy Zamarin

Kara Long Roche

Ying Liu

Druv Patel

Andrew Aukerman

Arfath Pasha

Doori Rose

Pier Selenica

Pamela I. Causa Andrieu

Chris Fong

Marinela Capanu

Jorge S. Reis-Filho

Rami Vanguri

Harini Veeraraghavan

Natalie Gangai

Ramon Sosa

Samantha Leung

Andrew McPherson

JianJiong Gao

MSK MIND Consortium

Yulia Lakhman

Sohrab P. Shah

DOI

PMID: 35764743 · DOI: 10.1038/s43018-022-00388-9 · Journal: Nature Cancer (2022; 3(6):723–733) · Code: github.com/kmboehm/onco-fusion · Data: Synapse syn25946117

TL;DR

Boehm et al. assembled a multimodal cohort of 444 patients with predominantly late-stage high-grade serous ovarian cancer (HGSOC) (296 from MSKCC, 148 from TCGA-OV) and trained separate Cox prognostic submodels on (i) pre-treatment contrast-enhanced CT (omental-implant radiomics), (ii) pre-treatment H&E whole-slide images (tumor nuclear morphometry) and (iii) clinicogenomic features (HRD status, residual disease, PARP-inhibitor receipt). They found that omental-implant autocorrelation on wavelet-filtered CT and mean tumor nuclear area on H&E are independently prognostic, and that integrating the modalities by late fusion yields test-set concordance indices of 0.62 (radiomic+histopathological, RH) and 0.61 (genomic+radiomic+histopathological, GRH), significantly outperforming HRD status alone (c=0.52), the clinical model (c=0.51) and each individual imaging model. In the held-out test set of 40 patients, the GRH risk groups separated overall survival (median 30 months for high-risk vs 50 months for low-risk, P=0.023) and progression-free survival (P=0.040) PMID:35764743.
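
Since every headline number above is a concordance index, a minimal sketch of Harrell's c may help in reading them. This is an illustration only, not the authors' code: the function name `concordance_index` and the handling of tied survival times (tied pairs are simply skipped) are assumptions made for brevity.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's c: share of comparable pairs in which the patient with the
    shorter observed survival also has the higher predicted risk.

    times  -- observed follow-up times
    events -- 1 if death was observed, 0 if censored
    risks  -- model risk scores (higher means worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Put the patient with the shorter observed time first.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is also not
        # comparable when the shorter time is censored.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why test values of 0.51 to 0.52 for the clinical and HRD-only models indicate little stratification, while 0.61 to 0.62 indicates a modest but real signal.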

Cohort & data

  • Disease: HGSOC, predominantly stage III/IV. Training cohort stage distribution: 160 stage IV, 225 stage III, 10 stage II, 8 stage I, 1 unknown. Test cohort: 31 stage IV, 9 stage III PMID:35764743.
  • Total N: 444 patients — 296 from Memorial Sloan Kettering Cancer Center (including 36 from the prospective MSK-SPECTRUM project) and 148 from The Cancer Genome Atlas Ovarian Cancer (TCGA-OV). Train: 404; held-out test: 40 (randomly sampled from patients with all four modalities available) PMID:35764743.
  • Modalities per training patient: 243 with H&E WSIs, 245 with adnexal CT lesions, 251 with omental CT implants. All 40 test patients had complete H&E + omental CT + sequencing by construction PMID:35764743.
  • Imaging: Pre-treatment contrast-enhanced abdominal/pelvic CT in DICOM (median 120 kVp, 5 mm slice thickness). MSKCC scans acquired on GE Medical Systems scanners; TCGA-OV scans pulled from The Cancer Imaging Archive (TCIA-TCGA-OV). Three fellowship-trained gynecologic radiologists manually segmented adnexal masses and representative omental implants on every tumor-containing axial slice using ITK-SNAP v3.8.0 PMID:35764743.
  • Histopathology: Pre-treatment H&E-stained whole-slide images of diagnostic biopsies, primarily of peritoneal/omental lesions. 60 WSIs were partially annotated by two expert pathologists for tissue-type classifier training, yielding >1.4M 128×128 px (64×64 µm) tiles. Macenko stain normalization applied PMID:35764743.
  • Sequencing: For MSKCC cases, HRD status was inferred from MSK-IMPACT targeted clinical sequencing (with variant significance by OncoKB and Hotspot annotations) plus COSMIC SBS3 detection via SigMA (n=130 with research consent; high-confidence SBS3 in 48, low-confidence in 30). For TCGA cases, CNA and SNV calls were downloaded from cBioPortal for the TCGA-OV project, plus SBS3 frequencies from Synapse (syn11801889). Training HRD/HRP/ambiguous split: 119 HRD / 218 HRP / 67 ambiguous PMID:35764743.
  • Outcomes: Median overall survival (OS) was 38.7 months (IQR 25–55) in training and 37.6 months (IQR 26–49) in test; OS was censored for 132 training and 17 test patients. The start date for OS and progression-free survival (PFS) was the date of CT when available, otherwise the date of pathologic diagnosis PMID:35764743.
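
The counts in this list can be cross-checked with a few lines of arithmetic. The sketch below only re-adds numbers already stated above; the variable and function names are hypothetical, and the "observed deaths" figures are derived here (split size minus censored OS outcomes) rather than quoted from the paper.

```python
# Counts copied from the cohort bullets above; nothing here is new data.
SITES = {"MSKCC": 296, "TCGA-OV": 148}
SPLIT = {"train": 404, "test": 40}
TRAIN_STAGE = {"IV": 160, "III": 225, "II": 10, "I": 8, "unknown": 1}
TEST_STAGE = {"IV": 31, "III": 9}
TRAIN_HRD = {"HRD": 119, "HRP": 218, "ambiguous": 67}
CENSORED_OS = {"train": 132, "test": 17}


def cohort_summary():
    """Re-add the reported counts and derive a few convenience figures."""
    total = sum(SITES.values())
    checks = {
        "sites_match_split": total == sum(SPLIT.values()),
        "train_stage_sums": sum(TRAIN_STAGE.values()) == SPLIT["train"],
        "test_stage_sums": sum(TEST_STAGE.values()) == SPLIT["test"],
        "train_hrd_sums": sum(TRAIN_HRD.values()) == SPLIT["train"],
    }
    # Derived: patients whose OS outcome was not censored.
    observed_deaths = {k: SPLIT[k] - CENSORED_OS[k] for k in SPLIT}
    # Derived: a 128 px tile spanning 64 um implies this pixel size.
    tile_um_per_px = 64 / 128
    return {
        "total": total,
        "checks": checks,
        "observed_deaths": observed_deaths,
        "tile_um_per_px": tile_um_per_px,
    }
```

All four consistency checks hold for the figures as reported on this page.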

Key findings

  • Omental (but not adnexal) radiomics carry prognostic signal. Of 600 Coif-wavelet-transformed radiomic features extracted per lesion via PyRadiomics, nine omental features survived Benjamini–Hochberg correction on univariate Cox regression; none of the adnexal features did. The final omental Cox submodel reduced to a single feature — autocorrelation of the gray level co-occurrence matrix on the HLL Coif wavelet — with log(HR) = 1.68 (corrected P < 0.01), invariant to CT vendor and segmenting radiologist. Training c-index 0.55 (95% CI 0.549–0.554), test c-index 0.53 (95% CI 0.517–0.547) PMID:35764743.
  • Tumor nuclear size is the dominant histopathological prognostic factor. Of 216 histopathological features (cell-type + tissue-type, via a weakly supervised ResNet-18 tissue classifier trained on 60 annotated WSIs with 4-fold slide-wise cross-validated accuracy 0.88 and nuclei segmented by StarDist in QuPath), 24 had univariate log(HR) significantly different from 0 and 20 of those described tumor nuclear diameter/size (larger → shorter OS). The final two-feature histopathological Cox submodel used mean tumor nuclear area and the major axis length of stroma, and was not confounded by specimen size. Training c-index 0.56 (95% CI 0.559–0.564), test c-index 0.54 (95% CI 0.527–0.560) PMID:35764743.
  • HRD status alone is weakly prognostic in this late-stage cohort. Stratifying OS by HRD status yielded c = 0.55 (training) and 0.52 (test) without fitting any parameters. Among patients with explicit evidence of either status, survival differed between HRP and HRD (P = 7 × 10⁻³) PMID:35764743.
  • Late fusion beats every unimodal model on held-out test. Using a two-stage Cox late-fusion scheme (unimodal submodels fit on all available data for that modality; final Cox integrator fit only on the patient intersection), the radiomic+histopathological (RH) model reached test c = 0.62 (95% CI 0.604–0.638) and the genomic+radiomic+histopathological (GRH) model reached c = 0.61 (95% CI 0.594–0.625) — both significantly better than HRD alone (c=0.52), the clinical model (c=0.51) and each individual imaging model by 1000-fold permutation test PMID:35764743.
  • GRH risk groups stratify OS and PFS in the test set. GRH high- vs low-risk median OS = 30 vs 50 months (P = 0.023, log-rank); 36-month OS = 34% vs 68%. The same groups separated PFS (P = 0.040) PMID:35764743.
  • The modalities carry orthogonal prognostic information. Absolute Kendall rank correlation between any pair of unimodal risk scores was < 0.14; Pearson and Spearman correlations between modality feature spaces peaked at 0.191 and 0.192 respectively. Radiological and histopathological submodels flagged largely non-overlapping subsets of poor-prognosis patients PMID:35764743.
  • Learning from partial-information cases is essential. Restricting training to the 114 patients with complete information across all four modalities yielded markedly worse test performance than the full late-fusion model trained on all 404 training cases with any subset of modalities — motivating late fusion specifically for missing-data tolerance PMID:35764743.
  • Model risk scores correlate with pathological chemotherapy response. In the training set, inferred risk from every model except the pure genomic and genomic+histopathological models was significantly associated with pathological chemotherapy response score (CRS); this includes the GRH model (one-sided Mann–Whitney U, P = 0.0044) PMID:35764743.
  • Adding clinical features did not improve performance. The full GHRC (genomic+histopathological+radiological+clinical) model underperformed the RH and GRH models, which is attributed to the small test cohort and to the fact that residual disease (RD) status and PARP-inhibitor annotation were unavailable for the 148 TCGA-OV cases PMID:35764743.
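
The two-stage late-fusion scheme described above (unimodal submodels fit on every patient who has that modality, then an integrator fit only on the patient intersection) can be sketched as follows. This is a toy illustration under stated assumptions, not the released onco-fusion implementation: `cox_fit` is a hypothetical plain gradient-ascent fit of the Breslow partial likelihood without regularization or tie handling, and all function names and data layouts are invented for the example.

```python
import math


def cox_fit(X, times, events, steps=200, lr=0.05):
    """Toy Cox fit: gradient ascent on the Breslow partial log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for i in range(n):
            if not events[i]:
                continue  # censored patients contribute only via risk sets
            risk_set = [j for j in range(n) if times[j] >= times[i]]
            lin = [sum(b * x for b, x in zip(beta, X[j])) for j in risk_set]
            top = max(lin)  # subtract the max for numerical stability
            weights = [math.exp(v - top) for v in lin]
            total = sum(weights)
            for k in range(p):
                mean_k = sum(w * X[j][k] for w, j in zip(weights, risk_set)) / total
                grad[k] += X[i][k] - mean_k
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def linear_risk(beta, x):
    return sum(b * v for b, v in zip(beta, x))


def fit_late_fusion(modalities, times, events):
    """modalities: {name: {patient_id: feature list}}, missing patients omitted.
    times, events: {patient_id: value}."""
    names = sorted(modalities)
    submodels, scores = {}, {}
    # Stage 1: each submodel uses every patient who has that modality.
    for name in names:
        feats = modalities[name]
        ids = sorted(feats)
        beta = cox_fit([feats[i] for i in ids],
                       [times[i] for i in ids],
                       [events[i] for i in ids])
        submodels[name] = beta
        scores[name] = {i: linear_risk(beta, feats[i]) for i in ids}
    # Stage 2: the integrator sees only patients with all modalities,
    # and its covariates are the unimodal risk scores.
    common = sorted(set.intersection(*(set(modalities[m]) for m in names)))
    Z = [[scores[m][i] for m in names] for i in common]
    gamma = cox_fit(Z, [times[i] for i in common], [events[i] for i in common])
    return submodels, dict(zip(names, gamma)), common


def fused_risk(submodels, gamma, patient_feats):
    """Fused log-risk for one patient who has every modality."""
    return sum(gamma[m] * linear_risk(submodels[m], patient_feats[m]) for m in gamma)
```

The design point this illustrates is the one the findings emphasize: stage 1 can learn from partially observed patients, so only the small integrator depends on the complete-case subset.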

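The Benjamini–Hochberg correction applied to the univariate Cox P values during radiomic feature screening can be sketched as a step-up procedure. The function name and the default `alpha=0.05` are assumptions for illustration; this page does not state the false-discovery threshold the authors used.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return one boolean per P value: True if rejected at FDR level alpha.

    Step-up rule: find the largest rank k (1-based, ascending P values)
    with p_(k) <= k / m * alpha, then reject the k smallest P values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Because the rule is step-up, a P value that misses its own rank threshold is still rejected when a larger P value passes a later threshold. Screening hundreds of features this way is what leaves only a handful of omental features, and no adnexal ones, as significant.
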
Genes & alterations

  • BRCA1, BRCA2 — pathogenic germline/somatic variants in these and other HRD-DDR genes were the primary driver of HRD-subtype assignment. HRD status on its own provided only modest OS stratification (test c = 0.52), consistent with HRD being a necessary but insufficient prognostic variable in late-stage HGSOC PMID:35764743.
  • CDK12 — SNVs in CDK12 were used to assign patients to the tandem-duplicator-enriched subtype (following Wang et al.), even in the presence of HRD-DDR variants, per the MSKCC subtype-assignment rules PMID:35764743.
  • CCNE1 amplification was used to assign patients to the foldback-inversion-enriched subtype, overriding HRD-DDR variant evidence when present. CCNE1 copy number was analyzed via the standard MSK-IMPACT clinical pipeline for MSKCC cases and downloaded from cBioPortal for the TCGA-OV cases PMID:35764743.
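
The precedence among these assignment rules can be written as a small decision function. This is a sketch of the rules as summarized on this page, not the authors' pipeline: the function and argument names are hypothetical, checking CCNE1 before CDK12 when both are altered is an assumption, and the criteria separating HRP from ambiguous cases are not covered here, so those patients are returned as "unassigned".

```python
def assign_subtype(*, high_conf_sbs3, hrd_ddr_hit, ccne1_amplified, cdk12_snv):
    """Apply the subtype rules in order of precedence.

    high_conf_sbs3  -- high-confidence dominant COSMIC SBS3 signature
    hrd_ddr_hit     -- significant variant or deep deletion in an HRD-DDR gene
    ccne1_amplified -- CCNE1 amplification present
    cdk12_snv       -- CDK12 SNV present
    """
    # CCNE1 and CDK12 evidence overrides HRD evidence.
    # Assumption: CCNE1 is checked first if both are present.
    if ccne1_amplified:
        return "foldback-inversion-enriched"
    if cdk12_snv:
        return "tandem-duplicator-enriched"
    if high_conf_sbs3 or hrd_ddr_hit:
        return "HRD"
    # HRP versus ambiguous is decided by rules not described on this page.
    return "unassigned"
```

For example, a patient with a pathogenic BRCA1 variant and a CCNE1 amplification is placed in the foldback-inversion-enriched group rather than the HRD group.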

Clinical implications

  • The authors position the GRH multimodal model as a proof-of-principle for combining routine diagnostic data streams (pre-treatment CT + pre-treatment H&E + MSK-IMPACT-style targeted sequencing) to refine HGSOC risk stratification beyond HRD status, residual disease and stage. The intended downstream uses are selecting primary treatment (PDS vs NACT-IDS), planning surveillance frequency, making maintenance-therapy decisions and counseling patients about investigative trials PMID:35764743.
  • Practical advantage of an omental-implant radiomic model: omental disease is ubiquitous in advanced HGSOC (including primary peritoneal cases without adnexal mass), omental implants are segmentable by less-experienced radiologists, and segmentation is substantially faster than whole-burden delineation. The authors argue this lowers adoption barriers versus published adnexal-only or whole-tumor radiomic models PMID:35764743.
  • Pathology workflow: the two-feature histopathological signature (mean tumor nuclear area + stromal major axis) is interpretable and can be inspected by pathologists, distinguishing this approach from deep-feature “black-box” WSI survival models. Trained weights and source code are released to enable extension to other cancer types PMID:35764743.
  • Biomarker hypothesis: larger tumor nuclei on H&E may reflect whole-genome doubling or cellular fusion events and warrant direct co-registration of matched genomes to histology in future cohorts. Omental autocorrelation may reflect lesion density and intratumoral heterogeneity rather than texture coarseness per se PMID:35764743.

Limitations & open questions

  • Small test set (n=40). The authors call out that the clinical submodel (RD status + PARP-inhibitor receipt) did not stratify the test cohort at all (c = 0.51), likely because of test-set size and because the TCGA-OV subset lacks RD and PARP annotation altogether PMID:35764743.
  • Late fusion cannot gate noisy modalities. Because only 114 patients had all four modalities, the authors could not fit attention or gating mechanisms that would down-weight unreliable modalities per patient; they anticipate that larger cohorts would enable such architectures PMID:35764743.
  • Imperfect HRD calls. For MSKCC patients sequenced only with germline HRD-DDR panels (not MSK-IMPACT), HRD status is assigned more loosely and each risk group is “enriched for — but not exclusively composed of” the genomic subtype of interest. The authors anticipate clinical WGS will tighten this PMID:35764743.
  • Incomplete TCGA treatment annotation. Treatment regimens are unannotated for the 148 TCGA-OV patients, so drug-level effects (e.g., PARP inhibitor receipt, platinum sensitivity) could not be modeled across the full cohort PMID:35764743.
  • No prospective validation. The authors explicitly flag prospective randomized validation as the necessary next step before clinical deployment PMID:35764743.
  • Open-corpus question (not in paper): the same multimodal-late-fusion recipe would be worth testing across other peritoneal-spread cancers (pancreatic, gastric, appendiceal) where omental metastasis is similarly dominant; whether the specific omental-autocorrelation feature transfers is unknown.

Citations from this paper used in the wiki

  • “We analyzed 444 patients with HGSOC, including 296 patients treated at the Memorial Sloan Kettering Cancer Center (MSKCC) and 148 patients from The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data. The 40 test cases were randomly sampled from the entire pool of patients with all data modalities available” — Results/Cohort.
  • “In the test set, the model combining both imaging modalities (radiomic–histopathological (RH) model) significantly outperformed the HRD status-based model, clinical model and individual imaging models, with a test concordance index of 0.62 (95% CI 0.604–0.638)… The model with genomic, radiomic and histopathological (GRH) modalities performed comparably, with a test concordance index of 0.61 (95% CI 0.594–0.625)” — Results/Multimodal prognostication.
  • “In the test set, the GRH risk groups also showed significantly different OS, with median survival of 30 months for the high-risk group and 50 months for the low-risk group (P = 0.023)… At 36 months, 68% and 34% survived for low- and high-risk groups, respectively” — Results/Multimodal prognostication.
  • “The same two risk groups identified by the model in the test set also showed significantly different progression-free survival (PFS) (P = 0.040)” — Results/Multimodal prognostication.
  • “Absolute Kendall rank correlation coefficient values were low between individual modalities (<0.14)… The maximal magnitude of the Pearson correlation between individual modalities is 0.191” — Results + Extended Data Fig. 9.
  • “Patients with high-confidence dominant signature 3 or at least one significant variant or deep deletion in the HRD-DDR genes were assigned to the HRD subtype, except when there was evidence that patients belonged to the foldback inversion- or tandem duplicator-enriched subgroups (via CCNE1 amplification or CDK12 SNVs, specifically)” — Methods/Inferring HRD status.
  • “Analysis of only training cases with full information (n = 114) resulted in poor performance… reinforcing the ability of late-fusion models to learn in the setting of missing data” — Results/Multimodal prognostication.
