Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Deep learning can extract predictive and prognostic biomarkers from routine pathology slides in colorectal cancer.
- A DL test for the diagnosis of microsatellite instability (MSI) in CRC has been approved in 2022.
- Current approaches rely on convolutional neural networks (CNNs).
- Transformer networks are outperforming CNNs and are replacing them in many applications.
- A fully transformer-based pipeline was developed for end-to-end biomarker prediction from pathology slides.
- The pipeline was trained on over 9,000 patients from 10 colorectal cancer cohorts.
- The fully transformer-based approach improved performance, generalizability, data efficiency, and interpretability.
- After training on a large multicenter cohort, a sensitivity of 0.97 and a negative predictive value of 0.99 for MSI prediction was achieved.
- Clinical-grade performance was reached on endoscopic biopsy tissue.
- The new methods are freely available under an open source license.
Paper Content
Introduction
- Precision oncology in colorectal cancer requires evaluation of genetic biomarkers
- Common biomarkers are measured by PCR, sequencing, or immunohistochemical assays
- Biomarker identification is important for providing treatment as recommended by medical guidelines
- Genetic biomarkers are increasingly used in earlier tumor stages of CRC
- Genetic diagnostic assays have several disadvantages
- Diagnosis of CRC requires pathologist’s histopathological evaluation of tissue sections
- Deep Learning can predict genetic biomarkers from digitized H&E-stained CRC tissue sections
- Commercial DL algorithm for biomarker detection from H&E images approved for routine clinical use in Europe in 2022
- State-of-the-art approaches have reached a sensitivity and specificity of 0.95 and 0.46, respectively
- Poor performance on endoscopic biopsy tissue
- Technology underlying current approaches is based on weakly-supervised learning
- Most common approach uses a small two-layer network to learn the patch-level weighting of the embeddings
- Transformers have been proposed as potentially superior feature extractors or aggregation models
- Aim to enhance performance of DL-based biomarker detection from pathology slides
- Evaluate use of fully transformer-based workflow in CRC in 10 cohorts with resection specimen slides and one large cohort of CRC biopsies
Model description
- Pipeline consists of 3 steps: data pre-processing, feature extraction, and aggregation
- Pre-processing includes tissue segmentation and stain-normalization
- WSI tessellated into tiles of 512x512 pixels
- Feature representations of dimension 768 extracted using CTransPath model
- Model architecture based on Swin Transformer
- Embeddings for each tile stored for subsequent training
- Final part of model takes all patches of WSI as input and predicts one biomarker
- Attention-based MIL approach uses small neural network to compute patch importance
- Transformer network uses multi-headed self-attention to relate each element to every other element
- Model architecture compared to TransMIL
Ethics statement
- Analyzed anonymized patient samples from multiple academic institutions
- Ethics board gave consent to analysis at DACHS, Epi700, MECC, MUNICH, NLCS, and QUASAR
- Specific ethics approval not required for retrospective analysis of anonymized samples at CPTAC, DUSSEL, TCGA, and YCR-BCIP
- Study adheres to STARD
Cohort description
- 9,048 patients with CRC from 10 patient cohorts were collected by the MSIDETECT consortium
- Two public databases were included
- Detailed clinicopathological variables are available
- Tissue samples were formalin-fixed paraffin-embedded (FFPE)
- MSI and dMMR status are available for each patient
- KRAS and BRAF mutational status are available for some cohorts
Experimental setup and implementation details
- Performed experiments using five-fold cross-validation
- Split data set into in-domain validation and test set
- Used validation set to determine best model
- Evaluated models on external cohorts
- Trained models with AdamW optimizer
- Trained for 8 epochs with batch size of 1
- Evaluated models every 500/1000 iterations
- Implemented AttentionMIL with Adam optimizer
Statistics and endpoints
- Used AUROC and AUPRC as evaluation metrics
- Data is highly imbalanced with respect to target variables
- Reported mean and standard deviation of 5-fold cross-validations
- Split dataset into patient-wise training, validation and internal test sets
- External test sets consisted of different cohorts
Visualization and explainability
- Final prediction is retrieved via class token attached to input sequence
- Attention rollout used to visualize contribution of each input patch
- Attention scores for each head in transformer visualized by taking class token’s self-attention
Results
A fully transformer-based msi prediction outperforms the state-of-the-art
- Tested pipeline on MSI prediction in 10 large cohorts of CRC patients
- Trained model on single cohort and tested on held-out test set and other cohorts
- Achieved in-domain AUROCs close to 0.95
- Achieved high performance close to 0.9 AUROC for early-onset CRC
- Outperformed CNN-based approach on all four cohorts
- Evaluated AttentionMIL with CTransPath as feature extractor
- Transformer-based model performed slightly better with an AUROC of 0.97
- Obtained a sensitivity of 0.97 with a negative predictive value of 0.99
- Transformer model reduced performance loss for external testing to a maximum of 0.08
The fully transformer-based model predicts multiple biomarkers in crc
- Investigated whether fully transformer-based model yields high performance in other biomarker prediction tasks
- Trained model on single cohorts and one fully merged multi-center cohort
- Tested BRAF and KRAS prediction on DACHS, QUASAR, NLCS, TCGA, and Epi700 cohorts
- Single cohort training achieved good results with AUROCs of 0.86, 0.84, and 0.88
- Smaller cohorts achieved poorer results with wider standard deviations in AUROC
- In-domain test using TCGA outperformed previous approaches
- Multi-centric cohort yielded an AUROC of 0.86
- Generalization gap from internal test set to external cohorts was consistently small
- Performance increases with number of patients in training cohort
Fully transformer-based workflows are explainable
- DL-based biomarker predictions should be explainable to domain experts
- Visualized how much each patch contributed to the final classification
- Used same WSIs from external cohort YCR-BCIP
- Majority of highly-contributing patches originate from tumor regions
- Model attributes high level of attention to tissue regions and larger blood vessels
Fully transformer-based workflows are more data efficient
- Problem in computational pathology is determining sample size for prediction tasks
- Unclear what minimum sample size is and if adding more samples improves performance
- Experiments conducted with 8181 patients from nine cohorts
- Transformer-based model architecture achieved AUROC value of 0.92 with 250 patients
- AttentionMIL model exceeded AUROC of 0.9 with 4000 patients
- Transformer-based model surpassed 0.95 mean testing AUROC with 1500 patients
Fully transformer-based workflows result in clinical-grade performance on biopsies
- Previous studies used surgical resection slides for biomarker prediction in CRC
- Commercially available MSI detection algorithms are intended for resection slides
- Recent clinical evidence shows MSI-positive CRC patients need to be tested on biopsy material
- Model trained on resections from DACHS, QUASAR, NLCS, and TCGA and evaluated on biopsies from 1,502 patients with CRC
- Model yielded mean AUROC score of 0.91 when externally validated on biopsies
- Model outperformed existing approaches and achieved clinical-grade performance on biopsies
- Intended clinical use of workflow is to speed up step between taking biopsy and molecular determination of MSI-high status
Discussion
- Precision oncology biomarkers are complex, costly and require intricate instrumentation and expertise.
- DL can extract biomarker information directly from routinely available material, potentially providing cost savings.
- DL-based analysis of histopathology slides to extract biomarkers for oncology has become a common approach in the research setting in 2018.
- Multiple algorithms have been approved for clinical use.
- Existing DL biomarkers have some key limitations.
- A new class of neural networks, transformers, is replacing CNNs.
- Transformers are more robust to distortions in the input data and provide more detailed explainability.
- A transformer-based approach was developed to predict MSI on WSI from CRC with an AUROC of 0.97 on resections and 0.91 on biopsies.
- The transformer-based approach generalized better to unseen cohorts and was more data-efficient compared to existing state-of-the-art MIL or CNN approaches.
- The model was published to enable researchers and clinicians to apply the automated MSI prediction tool in clinical practice.
- Further optimization of the architecture and collecting biopsy samples from different hospitals could potentially improve the performance of the model.