Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Deep learning can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer.
  • A DL test for the diagnosis of microsatellite instability (MSI) in CRC was approved in 2022.
  • Current approaches rely on convolutional neural networks (CNNs).
  • Transformer networks are outperforming CNNs and are replacing them in many applications.
  • A fully transformer-based pipeline was developed for end-to-end biomarker prediction from pathology slides.
  • The pipeline was trained on over 9,000 patients from 10 colorectal cancer cohorts.
  • The fully transformer-based approach improved performance, generalizability, data efficiency, and interpretability.
  • After training on a large multicenter cohort, a sensitivity of 0.97 and a negative predictive value of 0.99 for MSI prediction were achieved.
  • Clinical-grade performance was reached on endoscopic biopsy tissue.
  • The new methods are freely available under an open source license.

Paper Content

Introduction

  • Precision oncology in colorectal cancer requires evaluation of genetic biomarkers
  • Common biomarkers are measured by PCR, sequencing, or immunohistochemical assays
  • Biomarker identification is important for providing treatment as recommended by medical guidelines
  • Genetic biomarkers are increasingly used in earlier tumor stages of CRC
  • Genetic diagnostic assays have several disadvantages
  • Diagnosis of CRC requires pathologist’s histopathological evaluation of tissue sections
  • Deep Learning can predict genetic biomarkers from digitized H&E-stained CRC tissue sections
  • Commercial DL algorithm for biomarker detection from H&E images approved for routine clinical use in Europe in 2022
  • State-of-the-art approaches have reached a sensitivity and specificity of 0.95 and 0.46, respectively
  • Poor performance on endoscopic biopsy tissue
  • Technology underlying current approaches is based on weakly-supervised learning
  • Most common approach uses a small two-layer network to learn the patch-level weighting of the embeddings
  • Transformers have been proposed as potentially superior feature extractors or aggregation models
  • Aim to enhance performance of DL-based biomarker detection from pathology slides
  • Evaluate use of fully transformer-based workflow in CRC in 10 cohorts with resection specimen slides and one large cohort of CRC biopsies

Model description

  • Pipeline consists of 3 steps: data pre-processing, feature extraction, and aggregation
  • Pre-processing includes tissue segmentation and stain-normalization
  • WSI tessellated into tiles of 512 × 512 pixels
  • Feature representations of dimension 768 extracted using CTransPath model
  • Model architecture based on Swin Transformer
  • Embeddings for each tile stored for subsequent training
  • Final part of model takes all patches of WSI as input and predicts one biomarker
  • Attention-based MIL approach uses small neural network to compute patch importance
  • Transformer network uses multi-headed self-attention to relate each element to every other element
  • Model architecture compared to TransMIL
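
The attention-based MIL aggregation mentioned above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the tanh scoring form, the function name `attention_mil_pool`, the parameters `V` and `w`, and the toy sizes are all assumptions made for the example (the source states only that a small network computes patch importance over 768-dimensional tile features).

```python
import math
import random


def attention_mil_pool(embeddings, V, w):
    """Pool N tile embeddings into one slide-level vector.

    embeddings: list of N feature vectors, each of length D
    V: hidden-layer weights, H rows of length D (hypothetical parameters)
    w: output-layer weights, length H (hypothetical parameters)
    """
    # Two-layer scoring network: score_i = w . tanh(V h_i)
    scores = []
    for h in embeddings:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wk * zk for wk, zk in zip(w, hidden)))

    # Softmax over tiles turns scores into importance weights that sum to 1
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide-level representation: importance-weighted sum of tile embeddings
    dim = len(embeddings[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, embeddings)) for d in range(dim)]
    return pooled, weights


# Toy sizes for illustration only; real tile features would have 768 dimensions.
rng = random.Random(0)
D, H, N = 8, 4, 5
tiles = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
V = [[rng.gauss(0, 0.5) for _ in range(D)] for _ in range(H)]
w = [rng.gauss(0, 0.5) for _ in range(H)]

pooled, weights = attention_mil_pool(tiles, V, w)
print("tile weights:", [round(a, 3) for a in weights])
```

Each tile is scored independently here, which is the key contrast with the transformer aggregator, where self-attention relates every tile to every other tile before the prediction is made.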

Ethics statement

  • Analyzed anonymized patient samples from multiple academic institutions
  • Ethics board gave consent to analysis at DACHS, Epi700, MECC, MUNICH, NLCS, and QUASAR
  • Specific ethics approval not required for retrospective analysis of anonymized samples at CPTAC, DUSSEL, TCGA, and YCR-BCIP
  • Study adheres to STARD

Cohort description

  • 9,048 patients with CRC from 10 patient cohorts were collected by the MSIDETECT consortium
  • Two public databases were included
  • Detailed clinicopathological variables are available
  • Tissue samples were formalin-fixed paraffin-embedded (FFPE)
  • MSI and dMMR status are available for each patient
  • KRAS and BRAF mutational status are available for some cohorts

Experimental setup and implementation details

  • Performed experiments using five-fold cross-validation
  • Split data set into in-domain validation and test set
  • Used validation set to determine best model
  • Evaluated models on external cohorts
  • Trained models with AdamW optimizer
  • Trained for 8 epochs with batch size of 1
  • Evaluated models every 500/1000 iterations
  • Implemented AttentionMIL with Adam optimizer
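
A patient-wise five-fold split of the kind described above might look like the following sketch. The rotation scheme (fold k as test set, the next fold as validation set, the rest for training) is an assumption for illustration, as are the function name `patient_wise_folds` and the placeholder patient IDs; the source does not specify how the folds were assigned.

```python
import random


def patient_wise_folds(patient_ids, n_folds=5, seed=0):
    """Return one train/val/test split per fold, with no patient shared across sets."""
    ids = sorted(set(patient_ids))          # de-duplicate so a patient is assigned once
    random.Random(seed).shuffle(ids)        # reproducible shuffle
    folds = [ids[i::n_folds] for i in range(n_folds)]

    splits = []
    for k in range(n_folds):
        val_k = (k + 1) % n_folds           # assumed rotation: next fold is validation
        train = [p for j, fold in enumerate(folds) if j not in (k, val_k) for p in fold]
        splits.append({"train": train, "val": folds[val_k], "test": folds[k]})
    return splits


# Placeholder IDs, purely illustrative
patients = [f"patient_{i:03d}" for i in range(23)]
splits = patient_wise_folds(patients)
for k, s in enumerate(splits):
    print(k, len(s["train"]), len(s["val"]), len(s["test"]))
```

Splitting on patient IDs rather than on slides or tiles keeps all material from one patient in a single set, which avoids leakage between training and evaluation.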

Statistics and endpoints

  • Used AUROC and AUPRC as evaluation metrics
  • Data is highly imbalanced with respect to target variables
  • Reported mean and standard deviation of 5-fold cross-validations
  • Split dataset into patient-wise training, validation and internal test sets
  • External test sets consisted of different cohorts
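
For reference, AUROC can be computed directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses that definition and then aggregates per-fold values into a mean and standard deviation. All numbers are toy values, not results from the paper, and the use of the sample standard deviation is an assumption.

```python
import statistics


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P * N) time, fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 4 negatives, 2 positives
labels = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
example_auroc = auroc(labels, scores)
print(example_auroc)  # 0.875

# Aggregating over five folds (made-up values for illustration)
fold_aurocs = [0.5, 0.6, 0.7, 0.8, 0.9]
mean_auroc = statistics.mean(fold_aurocs)
std_auroc = statistics.stdev(fold_aurocs)   # sample standard deviation (assumed choice)
print(round(mean_auroc, 3), round(std_auroc, 3))
```

Because AUROC is insensitive to class prevalence, it is usually reported together with AUPRC when the positive class is rare, as is the case here.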

Visualization and explainability

  • Final prediction is retrieved via class token attached to input sequence
  • Attention rollout used to visualize contribution of each input patch
  • Attention scores for each head in transformer visualized by taking class token’s self-attention
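
Attention rollout can be illustrated with a small pure-Python sketch. It follows the commonly described recipe (average the heads in each layer, mix in the identity to account for residual connections, then multiply the layer matrices together); whether the authors used exactly this variant is an assumption, and the attention matrices below are made-up toy values with the class token at index 0.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def attention_rollout(layers):
    """layers[l][h] is an n x n row-stochastic attention matrix for layer l, head h."""
    n = len(layers[0][0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for heads in layers:
        # Average attention over heads
        avg = [[sum(h[i][j] for h in heads) / len(heads) for j in range(n)]
               for i in range(n)]
        # Mix with the identity to model the residual connection; rows still sum to 1
        mixed = [[0.5 * avg[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        # Later layers multiply from the left
        rollout = matmul(mixed, rollout)
    return rollout


# Toy input: 2 layers, 2 heads, 3 tokens (class token + 2 patches)
layers = [
    [[[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.3, 0.3, 0.4]],
     [[0.4, 0.4, 0.2], [0.1, 0.8, 0.1], [0.5, 0.2, 0.3]]],
    [[[0.2, 0.5, 0.3], [0.3, 0.4, 0.3], [0.1, 0.1, 0.8]],
     [[0.3, 0.6, 0.1], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]],
]
rollout = attention_rollout(layers)

# The class-token row, minus its own entry, scores each input patch
patch_contributions = rollout[0][1:]
print([round(c, 3) for c in patch_contributions])
```

Mapping `patch_contributions` back to tile coordinates would give a slide-level heatmap of the kind used for visualization.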

Results

Fully transformer-based MSI prediction outperforms the state of the art

  • Tested pipeline on MSI prediction in 10 large cohorts of CRC patients
  • Trained model on single cohort and tested on held-out test set and other cohorts
  • Achieved in-domain AUROCs close to 0.95
  • Achieved high performance close to 0.9 AUROC for early-onset CRC
  • Outperformed CNN-based approach on all four cohorts
  • Evaluated AttentionMIL with CTransPath as feature extractor
  • Transformer-based model performed slightly better with an AUROC of 0.97
  • Obtained a sensitivity of 0.97 with a negative predictive value of 0.99
  • Transformer model reduced performance loss for external testing to a maximum of 0.08
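
Sensitivity and negative predictive value, the two figures quoted above, both follow from the confusion matrix: sensitivity = TP / (TP + FN) and NPV = TN / (TN + FN). The sketch below computes them for binarized predictions; the labels and predictions are toy values, not data from the paper, and the function name is hypothetical.

```python
def sensitivity_and_npv(labels, predictions):
    """Compute sensitivity and NPV from binary labels and binary predictions.

    Assumes at least one positive case and at least one negative prediction.
    """
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)   # true positives
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)   # missed positives
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)   # true negatives
    sensitivity = tp / (tp + fn)   # share of true positives that were detected
    npv = tn / (tn + fn)           # share of negative calls that are truly negative
    return sensitivity, npv


# Toy example: 3 positive and 5 negative cases
labels      = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 0, 0, 0, 1, 0]
sens, npv = sensitivity_and_npv(labels, predictions)
print(round(sens, 3), round(npv, 3))  # 0.667 0.8
```

In a rule-out setting, a high NPV means that a negative call can be trusted, so only cases the model does not call negative would need confirmatory molecular testing.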

The fully transformer-based model predicts multiple biomarkers in CRC

  • Investigated whether fully transformer-based model yields high performance in other biomarker prediction tasks
  • Trained model on single cohorts and one fully merged multi-center cohort
  • Tested BRAF and KRAS prediction on DACHS, QUASAR, NLCS, TCGA, and Epi700 cohorts
  • Single cohort training achieved good results with AUROCs of 0.86, 0.84, and 0.88
  • Smaller cohorts achieved poorer results with wider standard deviations in AUROC
  • In-domain test using TCGA outperformed previous approaches
  • Multi-centric cohort yielded an AUROC of 0.86
  • Generalization gap from internal test set to external cohorts was consistently small
  • Performance increases with number of patients in training cohort

Fully transformer-based workflows are explainable

  • DL-based biomarker predictions should be explainable to domain experts
  • Visualized how much each patch contributed to the final classification
  • Used same WSIs from external cohort YCR-BCIP
  • Majority of highly-contributing patches originate from tumor regions
  • Model attributes high level of attention to tissue regions and larger blood vessels

Fully transformer-based workflows are more data efficient

  • Problem in computational pathology is determining sample size for prediction tasks
  • Unclear what minimum sample size is and if adding more samples improves performance
  • Experiments conducted with 8,181 patients from nine cohorts
  • Transformer-based model architecture achieved AUROC value of 0.92 with 250 patients
  • AttentionMIL model exceeded AUROC of 0.9 with 4,000 patients
  • Transformer-based model surpassed 0.95 mean testing AUROC with 1,500 patients

Fully transformer-based workflows result in clinical-grade performance on biopsies

  • Previous studies used surgical resection slides for biomarker prediction in CRC
  • Commercially available MSI detection algorithms are intended for resection slides
  • Recent clinical evidence indicates that MSI status of CRC patients needs to be determined on biopsy material
  • Model trained on resections from DACHS, QUASAR, NLCS, and TCGA and evaluated on biopsies from 1,502 patients with CRC
  • Model yielded mean AUROC score of 0.91 when externally validated on biopsies
  • Model outperformed existing approaches and achieved clinical-grade performance on biopsies
  • Intended clinical use of workflow is to speed up step between taking biopsy and molecular determination of MSI-high status

Discussion

  • Precision oncology biomarkers are complex, costly and require intricate instrumentation and expertise.
  • DL can extract biomarker information directly from routinely available material, potentially providing cost savings.
  • Since 2018, DL-based analysis of histopathology slides to extract biomarkers for oncology has become a common approach in the research setting.
  • Multiple algorithms have been approved for clinical use.
  • Existing DL biomarkers have some key limitations.
  • A new class of neural networks, transformers, is replacing CNNs.
  • Transformers are more robust to distortions in the input data and provide more detailed explainability.
  • A transformer-based approach was developed to predict MSI on WSI from CRC with an AUROC of 0.97 on resections and 0.91 on biopsies.
  • The transformer-based approach generalized better to unseen cohorts and was more data-efficient compared to existing state-of-the-art MIL or CNN approaches.
  • The model was published to enable researchers and clinicians to apply the automated MSI prediction tool in clinical practice.
  • Further optimization of the architecture and collecting biopsy samples from different hospitals could potentially improve the performance of the model.