Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Diffusion probabilistic models (DPMs) are a hot topic in computer vision
- They are used for image generation, deblurring, super-resolution and anomaly detection
- MedSegDiff is the first DPM-based model for general medical image segmentation
- Dynamic conditional encoding is proposed to enhance step-wise regional attention, and FF-Parser to constrain high-frequency noise components
- MedSegDiff outperforms SOTA methods on three medical segmentation tasks
Paper Content
I. Introduction
- Propose a DPM-based segmentation model, MedSegDiff, for medical image segmentation.
- Introduce dynamic conditional encoding and a Feature Frequency Parser (FF-Parser) to improve segmentation accuracy.
- Demonstrate the effectiveness of MedSegDiff on three different medical segmentation tasks.
- Medical image segmentation is the process of partitioning a medical image into meaningful regions.
- It allows medical professionals to better understand what they’re looking at and compare images over time.
- Automatic medical image segmentation methods have been developed to reduce time and effort.
- Deep learning techniques have been used to improve accuracy.
- A new DPM-based segmentation model, MedSegDiff, has been proposed and outperforms previous SOTA on three different medical segmentation tasks.
II. Method
- Diffusion models are generative models composed of two stages: forward diffusion and reverse diffusion.
- The reverse process uses a neural network to recover the original data by reversing the noising process.
- A UNet serves as the learned network, and its step estimation function is conditioned on the raw image as a prior.
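The forward (noising) stage can be sketched in a few lines. This is a minimal NumPy illustration of the standard closed-form DDPM forward process, not the paper's code: the linear beta schedule, the 1000-step horizon, and the names `linear_beta_schedule` and `forward_diffuse` are assumptions made for the example.

```python
import numpy as np

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_diffuse(x0, t, alphas_cumprod, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form; t is a 0-based step index."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps

rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)           # assumed number of training steps
alphas_cumprod = np.cumprod(1.0 - betas)     # cumulative signal-retention factor

x0 = rng.uniform(-1.0, 1.0, size=(1, 64, 64))   # toy segmentation map scaled to [-1, 1]
x_early, _ = forward_diffuse(x0, 10, alphas_cumprod, rng)    # still close to x0
x_late, _ = forward_diffuse(x0, 999, alphas_cumprod, rng)    # almost pure noise
```

The reverse stage is what the network learns: given a noisy map and the raw image, it estimates the noise so that the map can be denoised step by step.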
A. Dynamic Conditional Encoding
- Most conditional DPMs use a single, fixed piece of given information as the conditional prior.
- Medical image segmentation is difficult due to ambiguous objects and low-contrast images.
- A dynamic conditional encoding is proposed to integrate the current-step segmentation information into the raw image encoding.
- An attention-like mechanism is used to emphasize the relevant regions when fusing the two encodings.
- FF-Parser is proposed to constrain the high-frequency components in the features.
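One plausible form of such an attention-like fusion is sketched below. It is an assumption for illustration, not the paper's exact formulation: both feature maps are layer-normalized, multiplied element-wise into an affinity map, and that map gates the raw-image features. The names `layer_norm` and `attentive_fusion` and the tensor shapes are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each sample over channel and spatial axes (no learned affine)."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def attentive_fusion(image_feat, seg_feat):
    """Gate image features with an affinity map built from both feature maps."""
    affinity = layer_norm(image_feat) * layer_norm(seg_feat)
    return affinity * image_feat

rng = np.random.default_rng(0)
image_feat = rng.standard_normal((2, 8, 16, 16))  # hypothetical raw-image encoder features
seg_feat = rng.standard_normal((2, 8, 16, 16))    # hypothetical current-step segmentation features
fused = attentive_fusion(image_feat, seg_feat)    # same shape as image_feat
```

Because the segmentation features change at every diffusion step, the fused conditioning changes with them, which is what makes the encoding dynamic.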
B. FF-Parser
- FF-Parser is used to constrain noise-related components in features.
- FF-Parser is a parameterized attentive map that is applied to Fourier space features.
- FF-Parser is a learnable version of frequency filters used in digital image processing.
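The idea of filtering features in Fourier space can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: in the model the frequency map would be a learnable parameter, whereas here it is a fixed array, and the names `ff_parser` and `low_pass_map` are hypothetical.

```python
import numpy as np

def ff_parser(features, freq_map):
    """Transform to Fourier space, reweight each frequency, transform back."""
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    filtered = spectrum * freq_map               # element-wise frequency weighting
    return np.fft.ifft2(filtered, axes=(-2, -1)).real

def low_pass_map(height, width, cutoff):
    """Fixed stand-in for a learned map: keep only frequencies below the cutoff."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    return (np.sqrt(fx ** 2 + fy ** 2) <= cutoff).astype(float)

rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 32, 32))                 # toy (channels, H, W) features
identity = ff_parser(feats, np.ones((32, 32)))           # all-ones map changes nothing
smoothed = ff_parser(feats, low_pass_map(32, 32, 0.1))   # high frequencies suppressed
```

An all-ones map leaves the features unchanged, so a learnable map initialized that way can start as an identity and learn which frequencies to attenuate.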
C. Training and Architecture
- MedSegDiff is trained using the standard DPM training procedure.
- A random pair of raw image and segmentation label is sampled for each iteration.
- The diffusion step is sampled from a uniform distribution, and the noise from a Gaussian distribution.
- MedSegDiff is a modified ResUNet with a ResNet encoder and a UNet decoder.
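One training iteration, as described above, might look like the sketch below. It assumes the standard DDPM noise-prediction objective (mean squared error between true and predicted noise); the toy dataset, the `zero_predictor` stand-in for the network, and the 1000-step schedule are hypothetical and only illustrate the sampling logic.

```python
import numpy as np

def training_step(image, mask, predict_noise, alphas_cumprod, rng):
    """One iteration: sample step and noise, corrupt the mask, score the prediction."""
    t = int(rng.integers(0, len(alphas_cumprod)))   # step index ~ Uniform{0, ..., T-1}
    eps = rng.standard_normal(mask.shape)           # noise ~ N(0, I)
    a_bar = alphas_cumprod[t]
    noisy_mask = np.sqrt(a_bar) * mask + np.sqrt(1.0 - a_bar) * eps
    eps_hat = predict_noise(noisy_mask, image, t)   # network conditioned on the raw image
    return float(np.mean((eps - eps_hat) ** 2)), t

def zero_predictor(noisy_mask, image, t):
    """Hypothetical stand-in for the ResUNet: always predicts zero noise."""
    return np.zeros_like(noisy_mask)

rng = np.random.default_rng(0)
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule

# Toy dataset of (raw image, segmentation label) pairs; labels scaled to {-1, 1}.
dataset = [
    (rng.uniform(size=(3, 64, 64)), (rng.uniform(size=(1, 64, 64)) > 0.5) * 2.0 - 1.0)
    for _ in range(4)
]
image, mask = dataset[int(rng.integers(len(dataset)))]   # random pair for this iteration
loss, t = training_step(image, mask, zero_predictor, alphas_cumprod, rng)
```

In a real setup the stand-in predictor would be replaced by the network, and the loss would be backpropagated through it.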
III. Experiments
A. Dataset
- Experiments are conducted on three medical tasks with different image modalities
- The REFUGE-2, BraTS-2021 and DDTI datasets are used
- The datasets are publicly available with segmentation and diagnosis labels
- Train/validation/test splits follow the default settings of each dataset
B. Implementation Details
- Four variants of the model are evaluated: MedSegDiff++, MedSegDiff-L, MedSegDiff-B, and MedSegDiff-S
- The variants use UNets with 4x, 5x, 6x, and 6x downsampling
- 100 diffusion steps are used for inference
- Experiments are implemented in PyTorch and trained/tested on 4 Tesla P40 GPUs
- Images are resized to 256x256 pixels
- Networks are trained end-to-end using the AdamW optimizer
- MedSegDiff-B and MedSegDiff-S are trained with a batch size of 32; MedSegDiff-L and MedSegDiff++ are trained with a batch size of 64
- The learning rate is initially set to 1 × 10⁻⁴
C. Main Results
- MedSegDiff is compared with SOTA segmentation methods designed for the three specific tasks and with general medical image segmentation methods
- Segmentation performance is evaluated by Dice score and IoU
- More advanced network architectures generally achieve better results
- MedSegDiff outperforms all other methods on the three tasks
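The two evaluation metrics can be computed as in the sketch below. This is a generic NumPy illustration for binary masks, not the paper's evaluation script; the function names and the small `eps` guard against empty masks are assumptions.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |A and B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    """IoU = |A and B| / |A or B| for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

pred = np.zeros((4, 4), dtype=int)
pred[0:2, 0:2] = 1        # predicted region: 4 pixels
target = np.zeros((4, 4), dtype=int)
target[0:2, 1:3] = 1      # ground-truth region: 4 pixels, 2 of them overlapping

dice = dice_score(pred, target)   # 2 * 2 / (4 + 4) = 0.5
iou = iou_score(pred, target)     # 2 / 6, about 0.333
```

For any pair of masks, Dice is at least as large as IoU, which is why the same model usually reports a higher Dice than IoU.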
D. Ablation Study
- Dynamic conditioning (Dy-Cond) improves performance on all three tasks.
- Dy-Cond improves performance by 2.1% on optic-cup segmentation.
- Dy-Cond improves performance by 1.6% and 1.8% on brain tumor and thyroid nodule segmentation, respectively.
- FF-Parser further optimizes segmentation results.
IV. Conclusion
- MedSegDiff, a DPM-based scheme for general medical image segmentation, is proposed
- Two novel techniques, dynamic conditional encoding and FF-Parser, are proposed to improve performance
- Experiments show that MedSegDiff outperforms previous SOTA methods