Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

Deep learning can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer.
A DL test for the diagnosis of microsatellite instability (MSI) in CRC has been approved in 2022.
Current approaches rely on convolutional neural networks (CNNs).
Transformer networks are outperforming CNNs and are replacing them in many applications.
A fully transformer-based pipeline was developed for end-to-end biomarker prediction from pathology slides.
The pipeline was trained on over 9,000 patients from 10 colorectal cancer cohorts.
The fully transformer-based approach improved performance, generalizability, data efficiency, and interpretability.
After training on a large multicenter cohort, a sensitivity of 0.97 and a negative predictive value of 0.99 for MSI prediction was achieved.
Clinical-grade performance was reached on endoscopic biopsy tissue.
The new methods are freely available under an open source license.

Paper Content

Introduction

Precision oncology in colorectal cancer requires evaluation of genetic biomarkers
Common biomarkers are measured by PCR, sequencing, or immunohistochemical assays
Biomarker identification is important for providing treatment as recommended by medical guidelines
Genetic biomarkers are increasingly used in earlier tumor stages of CRC
Genetic diagnostic assays have several disadvantages
Diagnosis of CRC requires pathologist’s histopathological evaluation of tissue sections
Deep Learning can predict genetic biomarkers from digitized H&E-stained CRC tissue sections
Commercial DL algorithm for biomarker detection from H&E images approved for routine clinical use in Europe in 2022
State-of-the-art approaches have reached a sensitivity and specificity of 0.95 and 0.46, respectively
Poor performance on endoscopic biopsy tissue
Technology underlying current approaches is based on weakly-supervised learning
Most common approach uses a small two-layer network to learn the patch-level weighting of the embeddings
Transformers have been proposed as potentially superior feature extractors or aggregation models
Aim to enhance performance of DL-based biomarker detection from pathology slides
Evaluate use of fully transformer-based workflow in CRC in 10 cohorts with resection specimen slides and one large cohort of CRC biopsies

Model description

Pipeline consists of 3 steps: data pre-processing, feature extraction, and aggregation
Pre-processing includes tissue segmentation and stain-normalization
WSI tessellated into tiles of 512x512 pixels
Feature representations of dimension 768 extracted using CTransPath model
Model architecture based on Swin Transformer
Embeddings for each tile stored for subsequent training
Final part of model takes all patches of WSI as input and predicts one biomarker
Attention-based MIL approach uses small neural network to compute patch importance
Transformer network uses multi-headed self-attention to relate each element to every other element
Model architecture compared to TransMIL

Ethics statement

Analyzed anonymized patient samples from multiple academic institutions
Ethics board gave consent to analysis at DACHS, Epi700, MECC, MUNICH, NLCS, and QUASAR
Specific ethics approval not required for retrospective analysis of anonymized samples at CPTAC, DUSSEL, TCGA, and YCR-BCIP
Study adheres to STARD

Cohort description

9,048 patients with CRC from 10 patient cohorts were collected by the MSIDETECT consortium
Two public databases were included
Detailed clinicopathological variables are available
Tissue samples were formalin-fixed paraffin-embedded (FFPE)
MSI and dMMR status are available for each patient
KRAS and BRAF mutational status are available for some cohorts

Experimental setup and implementation details

Performed experiments using five-fold cross-validation
Split data set into in-domain validation and test set
Used validation set to determine best model
Evaluated models on external cohorts
Trained models with AdamW optimizer
Trained for 8 epochs with batch size of 1
Evaluated models every 500/1000 iterations
Implemented AttentionMIL with Adam optimizer

Statistics and endpoints

Used AUROC and AUPRC as evaluation metrics
Data is highly imbalanced with respect to target variables
Reported mean and standard deviation of 5-fold cross-validations
Split dataset into patient-wise training, validation and internal test sets
External test sets consisted of different cohorts

Visualization and explainability

Final prediction is retrieved via class token attached to input sequence
Attention rollout used to visualize contribution of each input patch
Attention scores for each head in transformer visualized by taking class token’s self-attention

Results

A fully transformer-based msi prediction outperforms the state-of-the-art

Tested pipeline on MSI prediction in 10 large cohorts of CRC patients
Trained model on single cohort and tested on held-out test set and other cohorts
Achieved in-domain AUROCs close to 0.95
Achieved high performance close to 0.9 AUROC for early-onset CRC
Outperformed CNN-based approach on all four cohorts
Evaluated AttentionMIL with CTransPath as feature extractor
Transformer-based model performed slightly better with an AUROC of 0.97
Obtained a sensitivity of 0.97 with a negative predictive value of 0.99
Transformer model reduced performance loss for external testing to a maximum of 0.08

The fully transformer-based model predicts multiple biomarkers in crc

Investigated whether fully transformer-based model yields high performance in other biomarker prediction tasks
Trained model on single cohorts and one fully merged multi-center cohort
Tested BRAF and KRAS prediction on DACHS, QUASAR, NLCS, TCGA, and Epi700 cohorts
Single cohort training achieved good results with AUROCs of 0.86, 0.84, and 0.88
Smaller cohorts achieved poorer results with wider standard deviations in AUROC
In-domain test using TCGA outperformed previous approaches
Multi-centric cohort yielded an AUROC of 0.86
Generalization gap from internal test set to external cohorts was consistently small
Performance increases with number of patients in training cohort

Fully transformer-based workflows are explainable

DL-based biomarker predictions should be explainable to domain experts
Visualized how much each patch contributed to the final classification
Used same WSIs from external cohort YCR-BCIP
Majority of highly-contributing patches originate from tumor regions
Model attributes high level of attention to tissue regions and larger blood vessels

Fully transformer-based workflows are more data efficient

Problem in computational pathology is determining sample size for prediction tasks
Unclear what minimum sample size is and if adding more samples improves performance
Experiments conducted with 8181 patients from nine cohorts
Transformer-based model architecture achieved AUROC value of 0.92 with 250 patients
AttentionMIL model exceeded AUROC of 0.9 with 4000 patients
Transformer-based model surpassed 0.95 mean testing AUROC with 1500 patients

Fully transformer-based workflows result in clinical-grade performance on biopsies

Previous studies used surgical resection slides for biomarker prediction in CRC
Commercially available MSI detection algorithms are intended for resection slides
Recent clinical evidence shows MSI-positive CRC patients need to be tested on biopsy material
Model trained on resections from DACHS, QUASAR, NLCS, and TCGA and evaluated on biopsies from 1,502 patients with CRC
Model yielded mean AUROC score of 0.91 when externally validated on biopsies
Model outperformed existing approaches and achieved clinical-grade performance on biopsies
Intended clinical use of workflow is to speed up step between taking biopsy and molecular determination of MSI-high status

Discussion

Precision oncology biomarkers are complex, costly and require intricate instrumentation and expertise.
DL can extract biomarker information directly from routinely available material, potentially providing cost savings.
DL-based analysis of histopathology slides to extract biomarkers for oncology has become a common approach in the research setting in 2018.
Multiple algorithms have been approved for clinical use.
Existing DL biomarkers have some key limitations.
A new class of neural networks, transformers, is replacing CNNs.
Transformers are more robust to distortions in the input data and provide more detailed explainability.
A transformer-based approach was developed to predict MSI on WSI from CRC with an AUROC of 0.97 on resections and 0.91 on biopsies.
The transformer-based approach generalized better to unseen cohorts and was more data-efficient compared to existing state-of-the-art MIL or CNN approaches.
The model was published to enable researchers and clinicians to apply the automated MSI prediction tool in clinical practice.
Further optimization of the architecture and collecting biopsy samples from different hospitals could potentially improve the performance of the model.

Link to paper#

Abstract#

Paper Content#

Introduction#

Model description#

Ethics statement#

Cohort description#

Experimental setup and implementation details#

Statistics and endpoints#

Visualization and explainability#

Results#

A fully transformer-based msi prediction outperforms the state-of-the-art#

The fully transformer-based model predicts multiple biomarkers in crc#

Fully transformer-based workflows are explainable#

Fully transformer-based workflows are more data efficient#

Fully transformer-based workflows result in clinical-grade performance on biopsies#

Discussion#

Link to paper

Abstract

Paper Content

Introduction

Model description

Ethics statement

Cohort description

Experimental setup and implementation details

Statistics and endpoints

Visualization and explainability

Results

A fully transformer-based msi prediction outperforms the state-of-the-art

The fully transformer-based model predicts multiple biomarkers in crc

Fully transformer-based workflows are explainable

Fully transformer-based workflows are more data efficient

Fully transformer-based workflows result in clinical-grade performance on biopsies

Discussion