Multimodal data integration using machine learning improves risk stratification of high-grade serous ovarian cancer

Authors

Kevin M. Boehm

Emily A. Aherne

Lora Ellenson

Ines Nikolovski

Mohammed Alghamdi

Ignacio Vázquez-García

Dmitriy Zamarin

Kara Long Roche

Ying Liu

Druv Patel

Andrew Aukerman

Arfath Pasha

Doori Rose

Pier Selenica

Pamela I. Causa Andrieu

Chris Fong

Marinela Capanu

Jorge S. Reis-Filho

Rami Vanguri

Harini Veeraraghavan

Natalie Gangai

Ramon Sosa

Samantha Leung

Andrew McPherson

JianJiong Gao

MSK MIND Consortium

Yulia Lakhman

Sohrab P. Shah

DOI

10.1038/s43018-022-00388-9

PMID: 35764743 · DOI: 10.1038/s43018-022-00388-9 · Journal: Nature Cancer (2022; 3(6):723–733) · Code: github.com/kmboehm/onco-fusion · Data: Synapse syn25946117

TL;DR

Boehm et al. assembled a multimodal cohort of 444 predominantly late-stage high-grade serous ovarian cancer (HGSOC) patients (296 from MSKCC, 148 from TCGA-OV) and trained separate Cox prognostic submodels from (i) pre-treatment contrast-enhanced CT (omental-implant radiomics), (ii) pre-treatment H&E whole-slide images (tumor nuclear morphometry) and (iii) clinicogenomic features (HRD status, residual disease, PARP-inhibitor receipt). They discovered that omental-implant autocorrelation on wavelet-filtered CT and mean tumor nuclear area on H&E are independently prognostic, and that integrating these modalities by late fusion yields test-set concordance indices of 0.62 (radiomic+histopathological, RH) and 0.61 (genomic+radiomic+histopathological, GRH) — significantly outperforming HRD status alone (c=0.52), the clinical model (c=0.51) and every unimodal model. The GRH risk groups separated overall survival (30 vs 50 months median, P=0.023) and progression-free survival (P=0.040) in the held-out test set of 40 patients PMID:35764743.
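Every performance figure on this page is a concordance index (c-index). As a rough guide to what those numbers measure, the sketch below computes Harrell's c in plain Python using the textbook definition. It is not necessarily the implementation the authors used, and the function name `harrell_c_index` is illustrative. Pairs with tied follow-up times are simply skipped here, a simplification that survival libraries handle more carefully.

```python
from itertools import combinations
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up had an
    observed event. It is concordant when that patient also has the higher
    predicted risk; tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times (simplification) or censored earlier patient
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A constant risk score scores exactly 0.5. A two-level score such as HRD vs HRP produces many tied pairs that each count 0.5, which tends to pull its c-index toward 0.5.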

Cohort & data

Disease: HGSOC, predominantly stage III/IV. Training cohort stage distribution: 160 stage IV, 225 stage III, 10 stage II, 8 stage I, 1 unknown. Test cohort: 31 stage IV, 9 stage III PMID:35764743.
Total N: 444 patients — 296 from Memorial Sloan Kettering Cancer Center (including 36 from the prospective MSK-SPECTRUM project) and 148 from The Cancer Genome Atlas Ovarian Cancer (TCGA-OV). Train: 404; held-out test: 40 (randomly sampled from patients with all four modalities available) PMID:35764743.
Modality availability (training set): 243 patients with H&E WSIs, 245 with adnexal CT lesions, 251 with omental CT implants. All 40 test patients had complete H&E + omental CT + sequencing by construction PMID:35764743.
Imaging: Pre-treatment contrast-enhanced abdominal/pelvic CT in DICOM (median 120 kVp, 5 mm slice thickness). MSKCC scans acquired on GE Medical Systems scanners; TCGA-OV scans pulled from The Cancer Imaging Archive (TCIA-TCGA-OV). Three fellowship-trained gynecologic radiologists manually segmented adnexal masses and representative omental implants on every tumor-containing axial slice using ITK-SNAP v3.8.0 PMID:35764743.
Histopathology: Pre-treatment H&E-stained whole-slide images of diagnostic biopsies, primarily of peritoneal/omental lesions. Sixty WSIs were partially annotated by two expert pathologists for tissue-type classifier training, yielding >1.4M 128×128 px (64×64 µm) tiles. Macenko stain normalization was applied PMID:35764743.
Sequencing: For MSKCC cases, HRD status was inferred from MSK-IMPACT targeted clinical sequencing (variant significance assessed with OncoKB and Hotspot annotations) plus COSMIC SBS3 detection via SigMA (n=130 with research consent; high-confidence SBS3 in 48, low-confidence in 30). For TCGA cases, CNA and SNV calls were downloaded from cBioPortal for the TCGA-OV project, plus SBS3 frequencies from Synapse (syn11801889). Training HRD/HRP/ambiguous split: 119 HRD / 218 HRP / 67 ambiguous PMID:35764743.
Outcomes: Median overall survival 38.7 (IQR 25–55) months in training and 37.6 (IQR 26–49) months in test; OS was censored for 132 training and 17 test patients. The start date for OS/PFS was the date of CT when available, otherwise the date of pathologic diagnosis PMID:35764743.
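The start-date rule for OS/PFS (CT date if available, otherwise the date of pathologic diagnosis) can be expressed as a small helper. This is a hypothetical sketch: the month length of 365.25/12 days and the name `survival_months` are assumptions, not details from the paper.

```python
from datetime import date
from typing import Optional

DAYS_PER_MONTH = 365.25 / 12  # assumption: mean Gregorian month length


def survival_months(ct_date: Optional[date], diagnosis_date: date,
                    end_date: date) -> float:
    """Follow-up in months from the CT date, falling back to the pathologic
    diagnosis date when no CT date is available. `end_date` is the event
    (death or progression) date or the censoring date."""
    start = ct_date if ct_date is not None else diagnosis_date
    if end_date < start:
        raise ValueError("end date precedes start date")
    return (end_date - start).days / DAYS_PER_MONTH
```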

Key findings

Omental (but not adnexal) radiomics carry prognostic signal. Of 600 Coiflet-wavelet-transformed radiomic features extracted per lesion with PyRadiomics, nine omental features survived Benjamini–Hochberg correction in univariate Cox regression; none of the adnexal features did. The final omental Cox submodel reduced to a single feature, the gray-level co-occurrence matrix (GLCM) autocorrelation on the HLL Coiflet wavelet sub-band, with log(HR) = 1.68 (corrected P < 0.01); this feature was invariant to CT vendor and segmenting radiologist. Training c-index 0.55 (95% CI 0.549–0.554), test c-index 0.53 (95% CI 0.517–0.547) PMID:35764743.
Tumor nuclear size is the dominant histopathological prognostic factor. Features were derived from a weakly supervised ResNet-18 tissue-type classifier (trained on 60 annotated WSIs; 4-fold slide-wise cross-validated accuracy 0.88) and from nuclei segmented with StarDist in QuPath. Of the resulting 216 cell-type and tissue-type features, 24 had a univariate log(HR) significantly different from 0, and 20 of those described tumor nuclear diameter or size (larger nuclei, shorter OS). The final two-feature histopathological Cox submodel used mean tumor nuclear area and stromal major axis length, and was not confounded by specimen size. Training c-index 0.56 (95% CI 0.559–0.564), test c-index 0.54 (95% CI 0.527–0.560) PMID:35764743.
HRD status alone is weakly prognostic in this late-stage cohort. Stratifying patients by HRD status yielded an OS c-index of 0.55 (training) and 0.52 (test) without fitting any parameters. Among patients with explicit evidence of either status, OS differed between HRP and HRD patients (P = 7 × 10⁻³) PMID:35764743.
Late fusion beats every unimodal model on the held-out test set. Using a two-stage Cox late-fusion scheme (unimodal submodels fit on all available data for that modality; final Cox integrator fit only on the patient intersection), the radiomic+histopathological (RH) model reached test c = 0.62 (95% CI 0.604–0.638) and the genomic+radiomic+histopathological (GRH) model reached c = 0.61 (95% CI 0.594–0.625). Both significantly outperformed HRD alone (c=0.52), the clinical model (c=0.51) and each individual imaging model in a permutation test with 1,000 permutations PMID:35764743.
GRH risk groups stratify OS and PFS in the test set. GRH high- vs low-risk median OS = 30 vs 50 months (P = 0.023, log-rank); 36-month OS = 34% vs 68%. The same groups separated PFS (P = 0.040) PMID:35764743.
The modalities carry orthogonal prognostic information. Absolute Kendall rank correlation between any pair of unimodal risk scores was < 0.14; Pearson and Spearman correlations between modality feature spaces peaked at 0.191 and 0.192 respectively. Radiological and histopathological submodels flagged largely non-overlapping subsets of poor-prognosis patients PMID:35764743.
Learning from partial-information cases is essential. Restricting training to the 114 patients with complete information across all four modalities yielded markedly worse test performance than the full late-fusion model trained on all 404 training cases with any subset of modalities — motivating late fusion specifically for missing-data tolerance PMID:35764743.
Model risk scores correlate with pathological chemotherapy response. In the training set, inferred risk from every model except the genomic-only and genomic+histopathological models was significantly associated with the pathological chemotherapy response score (CRS); this includes the GRH model (one-sided Mann–Whitney U, P = 0.0044) PMID:35764743.
Adding clinical features did not improve performance. The full GHRC (genomic+histopathological+radiological+clinical) model underperformed the RH and GRH models, attributed to the small test cohort and to the fact that RD-status and PARP-inhibitor annotation were unavailable for the 148 TCGA-OV cases PMID:35764743.
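For the omental radiomics finding, each of the 600 per-lesion features was tested with a univariate Cox model and the resulting p-values were corrected with Benjamini–Hochberg. A minimal sketch of the BH step-up procedure follows; the FDR level of 0.05 is an assumed default, and the univariate Cox p-values are taken as given inputs.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return a reject flag per hypothesis under Benjamini-Hochberg FDR control."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    # Step-up: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

For example, p-values of 0.02, 0.02, 0.02 and 0.04 are all rejected at FDR 0.05 even though the smallest exceeds its rank-1 threshold of 0.0125, because BH rejects everything up to the largest passing rank.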
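The two-stage late-fusion scheme from the late-fusion finding can be illustrated with a toy, pure-Python Cox fit. This is an assumption-laden sketch, not the authors' code (their released code is at github.com/kmboehm/onco-fusion): coefficients are fit by plain gradient ascent on the Breslow partial log-likelihood with a small L2 penalty, and the helper names `fit_cox` and `late_fusion` are hypothetical. What it does show is the data flow: each unimodal submodel sees every patient who has that modality, while the integrator sees only the intersection.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def linear_predictor(beta: Sequence[float], row: Sequence[float]) -> float:
    """Cox linear predictor (log relative hazard) for one patient."""
    return sum(b * v for b, v in zip(beta, row))


def fit_cox(x: Sequence[Sequence[float]], times: Sequence[float],
            events: Sequence[bool], lr: float = 0.1, steps: int = 500,
            l2: float = 0.01) -> Vector:
    """Toy Cox PH fit: gradient ascent on the Breslow partial log-likelihood
    with a small L2 penalty so coefficients stay finite on separable data."""
    n, p = len(x), len(x[0])
    beta = [0.0] * p
    for _ in range(steps):
        w = [math.exp(linear_predictor(beta, row)) for row in x]
        grad = [-l2 * b for b in beta]
        for i in range(n):
            if not events[i]:
                continue  # censored patients only contribute via risk sets
            # Risk set: everyone still under follow-up at patient i's event time.
            at_risk = [j for j in range(n) if times[j] >= times[i]]
            denom = sum(w[j] for j in at_risk)
            for k in range(p):
                expected = sum(w[j] * x[j][k] for j in at_risk) / denom
                grad[k] += x[i][k] - expected
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def late_fusion(modalities: Dict[str, Dict[int, Vector]],
                times: Dict[int, float],
                events: Dict[int, bool]) -> Tuple[Dict[str, Vector], Vector]:
    """Two-stage late fusion. Returns (unimodal coefficients, integrator
    weights), with weights ordered by sorted modality name."""
    # Stage 1: one submodel per modality, fit on every patient who has it.
    submodels: Dict[str, Vector] = {}
    for name, feats in modalities.items():
        ids = sorted(feats)
        submodels[name] = fit_cox([feats[i] for i in ids],
                                  [times[i] for i in ids],
                                  [events[i] for i in ids])
    # Stage 2: integrator fit only on patients with every modality, using the
    # unimodal risk scores (linear predictors) as its covariates.
    names = sorted(modalities)
    shared = sorted(set.intersection(*(set(modalities[m]) for m in names)))
    z = [[linear_predictor(submodels[m], modalities[m][i]) for m in names]
         for i in shared]
    weights = fit_cox(z, [times[i] for i in shared], [events[i] for i in shared])
    return submodels, weights
```

Patients who have only one modality still shape that modality's submodel in stage 1, which is the missing-data tolerance that the complete-case comparison (114 patients) highlights.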
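The GRH risk-group comparison (median OS, 36-month OS, log-rank P) rests on the Kaplan–Meier estimator and the log-rank test. A compact pure-Python version of both is sketched below; a survival library would normally be used in practice, and the helper names are illustrative. The p-value uses the chi-square distribution with one degree of freedom, whose upper tail equals `erfc(sqrt(chi2 / 2))`.

```python
import math
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[bool]) -> List[Tuple[float, float]]:
    """(event time, survival probability) steps of the Kaplan-Meier curve."""
    surv, steps = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if e and u == t)
        surv *= 1.0 - deaths / at_risk
        steps.append((t, surv))
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Read the KM step function at time t, e.g. 36-month OS."""
    value = 1.0
    for time, s in steps:
        if time <= t:
            value = s
    return value


def logrank_p(t1: Sequence[float], e1: Sequence[bool],
              t2: Sequence[float], e2: Sequence[bool]) -> float:
    """Two-sided log-rank p-value for two groups (chi-square, 1 df)."""
    times = list(t1) + list(t2)
    events = list(e1) + list(e2)
    in_g1 = [True] * len(t1) + [False] * len(t2)
    o_minus_e = var = 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, in_g1) if g and u >= t)
        d = sum(1 for u, e in zip(times, events) if e and u == t)
        d1 = sum(1 for u, e, g in zip(times, events, in_g1) if g and e and u == t)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths, group 1
        if n > 1:
            # Hypergeometric variance of group-1 deaths at this time.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = o_minus_e ** 2 / var
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

Reading `survival_at(kaplan_meier(times, events), 36)` for each risk group gives a 36-month OS figure of the kind quoted above, assuming follow-up is measured in months.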
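The orthogonality finding is stated in terms of Kendall rank correlation between unimodal risk scores. The sketch below computes the tie-corrected tau-b variant; whether the authors used tau-a or tau-b is not stated on this page, so that choice is an assumption. `scipy.stats.kendalltau` computes tau-b by default if a library route is preferred.

```python
import math
from itertools import combinations
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between two score vectors (e.g. two modalities' risks)."""
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied on both: drops out of the tau-b denominator entirely
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b undefined: one input is constant")
    return (concordant - discordant) / denom
```

An absolute tau near 0 means the two modalities rank patients almost independently.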
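The chemotherapy-response finding used a one-sided Mann–Whitney U test. Below is a minimal sketch using the normal approximation with tie-corrected variance and no continuity correction. The grouping is hypothetical: `a` and `b` would hold model risk scores for two CRS groups, and the exact CRS split the authors compared is not restated here. `scipy.stats.mannwhitneyu(a, b, alternative='greater')` is the usual library equivalent, although its defaults (continuity correction, exact p-values for small samples) can give slightly different numbers.

```python
import math
from collections import Counter
from typing import Sequence


def mann_whitney_greater(a: Sequence[float], b: Sequence[float]) -> float:
    """One-sided Mann-Whitney U p-value for H1: values in `a` tend to exceed `b`.

    Normal approximation with tie-corrected variance, no continuity correction.
    """
    n1, n2 = len(a), len(b)
    # U counts pairs where a beats b; ties count one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(list(a) + list(b)).values())
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2.0) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
```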

Genes & alterations

BRCA1, BRCA2 — pathogenic germline/somatic variants in these and other HRD-DDR genes were the primary driver of HRD-subtype assignment. HRD status on its own provided only modest OS stratification (test c = 0.52), consistent with HRD being a necessary but insufficient prognostic variable in late-stage HGSOC PMID:35764743.
CDK12 — SNVs in CDK12 were used to assign patients to the tandem-duplicator-enriched subtype (following Wang et al.), even in the presence of HRD-DDR variants, per the MSKCC subtype-assignment rules PMID:35764743.
CCNE1 — CCNE1 amplification was used to assign patients to the foldback-inversion-enriched subtype, overriding HRD-DDR variant evidence when present. CCNE1 copy number was analyzed via the standard MSK-IMPACT clinical pipeline for MSKCC cases and downloaded from cBioPortal for the TCGA-OV cases PMID:35764743.
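The subtype-assignment rules in this section (quoted under the citations below) can be written as a short decision function. This is a hypothetical sketch. The field names are invented, the HRD-DDR gene list is left to the caller, and CCNE1 is arbitrarily checked before CDK12 when both are present. The criteria separating HRP from ambiguous cases are not restated on this page, so everything else is returned as "not HRD".

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class GenomicProfile:
    high_conf_sbs3: bool = False  # high-confidence dominant COSMIC SBS3
    # HRD-DDR genes with a significant variant or deep deletion (e.g. BRCA1)
    hrd_ddr_hits: Set[str] = field(default_factory=set)
    ccne1_amplified: bool = False
    cdk12_snv: bool = False


def assign_subtype(p: GenomicProfile) -> str:
    # Override rules come first: they take precedence over any HRD evidence.
    # Assumption: CCNE1 wins over CDK12 when both are present.
    if p.ccne1_amplified:
        return "foldback-inversion-enriched"
    if p.cdk12_snv:
        return "tandem-duplicator-enriched"
    if p.high_conf_sbs3 or p.hrd_ddr_hits:
        return "HRD"
    return "not HRD"  # HRP vs ambiguous criteria are not modeled here
```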

Clinical implications

The authors position the GRH multimodal model as a proof-of-principle for combining routine diagnostic data streams (pre-treatment CT + pre-treatment H&E + MSK-IMPACT-style targeted sequencing) to refine HGSOC risk stratification beyond HRD status, residual disease and stage. The intended downstream uses are selecting primary treatment (PDS vs NACT-IDS), planning surveillance frequency, making maintenance-therapy decisions and counseling patients about investigative trials PMID:35764743.
Practical advantage of an omental-implant radiomic model: omental disease is ubiquitous in advanced HGSOC (including primary peritoneal cases without adnexal mass), omental implants are segmentable by less-experienced radiologists, and segmentation is substantially faster than whole-burden delineation. The authors argue this lowers adoption barriers versus published adnexal-only or whole-tumor radiomic models PMID:35764743.
Pathology workflow: the two-feature histopathological signature (mean tumor nuclear area + stromal major axis) is interpretable and can be inspected by pathologists, distinguishing this approach from deep-feature “black-box” WSI survival models. Trained weights and source code are released to enable extension to other cancer types PMID:35764743.
Biomarker hypothesis: larger tumor nuclei on H&E may reflect whole-genome doubling or cellular fusion events and warrant direct co-registration of matched genomes to histology in future cohorts. Omental autocorrelation may reflect lesion density and intratumoral heterogeneity rather than texture coarseness per se PMID:35764743.
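To make the autocorrelation feature concrete: PyRadiomics defines GLCM autocorrelation as the sum over gray-level pairs of p(i, j)·i·j, where p is the normalized co-occurrence matrix. The sketch below computes it for a single 2D offset on an already-discretized image. It deliberately omits the HLL Coiflet wavelet filtering, 3D neighborhoods and direction averaging that PyRadiomics applies, and `glcm_autocorrelation` is a hypothetical name.

```python
from collections import Counter
from typing import Sequence


def glcm_autocorrelation(img: Sequence[Sequence[int]], dx: int = 1, dy: int = 0) -> float:
    """GLCM autocorrelation, sum_{i,j} p(i, j) * i * j, for one pixel offset.

    `img` holds already-discretized gray levels (1-based bins). The GLCM is
    made symmetric by counting each neighbor pair in both directions.
    """
    counts: Counter = Counter()
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = img[r][c], img[r2][c2]
                counts[(i, j)] += 1
                counts[(j, i)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("offset leaves no neighbor pairs inside the image")
    # Dividing by the total normalizes counts to p(i, j); each pair is weighted by i * j.
    return sum(n * i * j for (i, j), n in counts.items()) / total
```

Because each pair is weighted by the product of its gray levels, a uniformly brighter (denser) region scores higher than a uniformly darker one: a flat image at level 4 gives 16, and at level 3 gives 9. This fits the authors' reading of autocorrelation as partly a density signal.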

Limitations & open questions

Small test set (n=40). The authors note that the clinical submodel (RD status + PARP-inhibitor receipt) did not stratify the test cohort (c = 0.51), likely because of the test-set size and because the TCGA-OV subset lacks RD and PARP annotation altogether PMID:35764743.
Late fusion cannot gate noisy modalities. With only 114 patients having all four modalities, the authors could not fit attention or gating mechanisms that would down-weight unreliable modalities per patient; they anticipate that larger cohorts would enable such architectures PMID:35764743.
Imperfect HRD calls. For MSKCC patients sequenced only with germline HRD-DDR panels (not MSK-IMPACT), HRD status is assigned more loosely and each risk group is “enriched for — but not exclusively composed of” the genomic subtype of interest. The authors anticipate clinical WGS will tighten this PMID:35764743.
Incomplete TCGA treatment annotation. Treatment regimens are unannotated for the 148 TCGA-OV patients, so drug-level effects (e.g., PARP inhibitor receipt, platinum sensitivity) could not be modeled across the full cohort PMID:35764743.
No prospective validation. The authors explicitly flag prospective randomized validation as the necessary next step before clinical deployment PMID:35764743.
Open-corpus question (not in paper): the same multimodal-late-fusion recipe would be worth testing across other peritoneal-spread cancers (pancreatic, gastric, appendiceal) where omental metastasis is similarly dominant; whether the specific omental-autocorrelation feature transfers is unknown.

Citations from this paper used in the wiki

“We analyzed 444 patients with HGSOC, including 296 patients treated at the Memorial Sloan Kettering Cancer Center (MSKCC) and 148 patients from The Cancer Genome Atlas Ovarian Cancer (TCGA-OV) data. The 40 test cases were randomly sampled from the entire pool of patients with all data modalities available” — Results/Cohort.
“In the test set, the model combining both imaging modalities (radiomic–histopathological (RH) model) significantly outperformed the HRD status-based model, clinical model and individual imaging models, with a test concordance index of 0.62 (95% CI 0.604–0.638)… The model with genomic, radiomic and histopathological (GRH) modalities performed comparably, with a test concordance index of 0.61 (95% CI 0.594–0.625)” — Results/Multimodal prognostication.
“In the test set, the GRH risk groups also showed significantly different OS, with median survival of 30 months for the high-risk group and 50 months for the low-risk group (P = 0.023)… At 36 months, 68% and 34% survived for low- and high-risk groups, respectively” — Results/Multimodal prognostication.
“The same two risk groups identified by the model in the test set also showed significantly different progression-free survival (PFS) (P = 0.040)” — Results/Multimodal prognostication.
“Absolute Kendall rank correlation coefficient values were low between individual modalities (<0.14)… The maximal magnitude of the Pearson correlation between individual modalities is 0.191” — Results + Extended Data Fig. 9.
“Patients with high-confidence dominant signature 3 or at least one significant variant or deep deletion in the HRD-DDR genes were assigned to the HRD subtype, except when there was evidence that patients belonged to the foldback inversion- or tandem duplicator-enriched subgroups (via CCNE1 amplification or CDK12 SNVs, specifically)” — Methods/Inferring HRD status.
“Analysis of only training cases with full information (n = 114) resulted in poor performance… reinforcing the ability of late-fusion models to learn in the setting of missing data” — Results/Multimodal prognostication.

This page was processed by crosslinker on 2026-05-04.