Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

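The diffusion probabilistic model (DPM) has recently become one of the most active topics in computer vision.
It is used for image generation, deblurring, super-resolution, and anomaly detection.
MedSegDiff is the first DPM-based model for general medical image segmentation.
Dynamic conditional encoding is proposed to enhance step-wise regional attention, and the Feature Frequency Parser (FF-Parser) is proposed to suppress high-frequency noise in the features.
MedSegDiff outperforms state-of-the-art (SOTA) methods on three medical segmentation tasks.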

Paper Content

I. Introduction

Propose a DPM-based segmentation model, MedSegDiff, for medical image segmentation.
Introduce dynamic conditional encoding and feature frequency parser to improve the segmentation accuracy.
Demonstrate the effectiveness of MedSegDiff on three different medical segmentation tasks.
Medical image segmentation is the process of partitioning a medical image into meaningful regions.
It allows medical professionals to better understand what they’re looking at and compare images over time.
Automatic medical image segmentation methods have been developed to reduce time and effort.
Deep learning techniques have been used to improve accuracy.
A new DPM-based segmentation model, MedSegDiff, has been proposed and outperforms previous SOTA on three different medical segmentation tasks.

II. Method

Diffusion models are generative models composed of two stages: forward diffusion and reverse diffusion.
The reverse process uses a neural network to recover the original data by reversing the noising process.
A UNet serves as the learned network, and its step estimation function is conditioned on the raw image as a prior.
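
The forward (noising) stage described above can be sketched in a few lines. This is a minimal illustration of the standard closed-form DDPM forward process, not the authors' code: the linear beta schedule, the 1000-step horizon, and the names `make_alpha_bar` and `forward_diffuse` are assumptions made for the example.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed schedule)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) in closed form and return (x_t, noise)."""
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar(num_steps=1000)

# Toy 8x8 segmentation mask with a square foreground region.
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0

x_early, _ = forward_diffuse(mask, 10, alpha_bar, rng)   # still close to the mask
x_late, _ = forward_diffuse(mask, 999, alpha_bar, rng)   # almost pure Gaussian noise
```

The reverse stage would train a network to undo these steps one at a time; in a segmentation setting the noised variable is the mask, and the raw image enters only as conditioning.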

A. Dynamic Conditional Encoding

Most conditional DPMs use a single, fixed piece of given information as the conditional prior.
Medical image segmentation is difficult due to ambiguous objects and low-contrast images.
A dynamic conditional encoding is proposed to integrate the current-step segmentation information into the raw image encoding.
An attention-like mechanism is used to emphasize the region of interest.
FF-Parser is proposed to constrain the high-frequency components in the features.
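
To make the attention-like step concrete, here is a minimal sketch of one plausible form: normalize the raw-image features and the current-step segmentation features, multiply them element-wise into an affinity map, and use that map to gate the image features. The exact operator in the paper may differ; the function names, the normalization over the whole feature tensor, and the feature shapes are assumptions for illustration.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize a feature tensor to zero mean and unit variance over all entries."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

def attentive_fusion(image_feat, seg_feat):
    """Gate image features by their element-wise agreement with segmentation features."""
    affinity = layer_norm(image_feat) * layer_norm(seg_feat)  # attention-like map
    return affinity * image_feat

rng = np.random.default_rng(0)
image_feat = rng.standard_normal((4, 8, 8))  # (channels, height, width), hypothetical
seg_feat = rng.standard_normal((4, 8, 8))    # current-step segmentation features
fused = attentive_fusion(image_feat, seg_feat)
```

Because the segmentation features change at every diffusion step, the gate changes with them, which is what makes the conditioning dynamic rather than fixed.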

B. FF-Parser

FF-Parser is used to constrain noise-related components in features.
FF-Parser is a parameterized attentive map that is applied to Fourier space features.
FF-Parser is a learnable version of frequency filters used in digital image processing.
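
The idea of a frequency-domain filter applied to features can be sketched as follows: transform to Fourier space, multiply element-wise by a weight map, and transform back. In the model that map would be learned; here a fixed low-pass map stands in for it. The function name `ff_parser`, the real-valued map, and the cutoff of 0.25 cycles per pixel are assumptions for illustration.

```python
import numpy as np

def ff_parser(feature, attentive_map):
    """Filter a (C, H, W) feature tensor with a per-frequency (H, W) weight map."""
    spectrum = np.fft.fft2(feature, axes=(-2, -1))      # 2D FFT over spatial axes
    filtered = spectrum * attentive_map                  # element-wise modulation
    return np.fft.ifft2(filtered, axes=(-2, -1)).real    # back to the spatial domain

rng = np.random.default_rng(0)
features = rng.standard_normal((3, 8, 8))

# Stand-in for the learned map: keep frequencies within a radius, zero the rest.
fy = np.fft.fftfreq(8)[:, None]
fx = np.fft.fftfreq(8)[None, :]
low_pass = (np.sqrt(fx ** 2 + fy ** 2) <= 0.25).astype(float)

smoothed = ff_parser(features, low_pass)
```

A map of all ones leaves the features unchanged, and a map that keeps only the zero frequency reduces each channel to its mean, which brackets the range of behavior a learned map can express.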

C. Training and Architecture

MedSegDiff is trained using the standard process of DPM.
In each iteration, a random pair of raw image and segmentation label is sampled.
The diffusion step is sampled from a uniform distribution, and the noise is sampled from a Gaussian distribution.
MedSegDiff is a modified ResUNet with a ResNet encoder followed by a UNet decoder.
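
One training iteration as described above can be sketched like this. It assumes the usual DDPM objective (mean squared error between predicted and true noise), a linear beta schedule, and a hypothetical `predict_noise(noisy_label, image, t)` callable standing in for the network; none of these details are confirmed by the summary beyond "the standard process of DPM".

```python
import numpy as np

def training_loss(image, label, predict_noise, alpha_bar, rng):
    """Compute the noise-prediction loss for one (image, label) pair."""
    t = int(rng.integers(0, len(alpha_bar)))          # step drawn uniformly
    noise = rng.standard_normal(label.shape)          # Gaussian noise
    noisy_label = (np.sqrt(alpha_bar[t]) * label
                   + np.sqrt(1.0 - alpha_bar[t]) * noise)
    predicted = predict_noise(noisy_label, image, t)  # network conditioned on the image
    return float(np.mean((predicted - noise) ** 2))

def zero_predictor(noisy_label, image, t):
    """Placeholder for the network: always predicts zero noise."""
    return np.zeros_like(noisy_label)

rng = np.random.default_rng(0)
alpha_bar = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed schedule
image = rng.random((3, 16, 16))                               # toy raw image
label = (rng.random((1, 16, 16)) > 0.5).astype(float)         # toy binary mask
loss = training_loss(image, label, zero_predictor, alpha_bar, rng)
```

With the zero placeholder the loss sits near 1, the variance of the noise; a trained network should push it well below that.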

III. Experiments

A. Dataset

Experiments cover three medical segmentation tasks with different image modalities.
The datasets used are REFUGE-2, BraTS-2021, and DDTI.
All three datasets are publicly available and include both segmentation and diagnosis labels.
Train/validation/test splits follow the default settings of each dataset.

B. Implementation Details

Four model variants are evaluated: MedSegDiff-S, MedSegDiff-B, MedSegDiff-L, and MedSegDiff++.
The variants use UNets with 4×, 5×, 6×, and 6× downsampling stages, respectively.
Inference uses 100 diffusion steps.
Experiments are implemented in PyTorch and trained and tested on 4 Tesla P40 GPUs.
Images are resized to 256×256 pixels.
Networks are trained end to end with the AdamW optimizer.
MedSegDiff-B and MedSegDiff-S are trained with a batch size of 32; MedSegDiff-L and MedSegDiff++ are trained with a batch size of 64.
The initial learning rate is 1×10⁻⁴.

C. Main Results

MedSegDiff is compared against SOTA methods designed for each of the three specific tasks and against general medical image segmentation methods.
Segmentation performance is evaluated with the Dice score and IoU.
More advanced network architectures generally achieve better results.
MedSegDiff outperforms all other methods on all three tasks.
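
For reference, the two evaluation metrics can be computed on binary masks as below. This is a generic sketch of the standard definitions, not the paper's evaluation script; the small `eps` term is an assumption added so that two empty masks score 1 instead of dividing by zero.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def iou_score(pred, target, eps=1e-7):
    """IoU = |P ∩ T| / |P ∪ T| for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (intersection + eps) / (union + eps)

pred = [[1, 1], [0, 0]]
target = [[1, 0], [0, 0]]
dice = dice_score(pred, target)  # about 0.667 (2 * 1 / (2 + 1))
iou = iou_score(pred, target)    # about 0.5   (1 / 2)
```

Dice weights the overlap twice, so it is never lower than IoU for the same prediction; the two metrics rank methods similarly but are not interchangeable as numbers.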

D. Ablation Study

Dynamic conditioning (Dy-Cond) improves performance on all three tasks.
Dy-Cond improves optic-cup segmentation by 2.1%.
Dy-Cond improves brain tumor and thyroid nodule segmentation by 1.6% and 1.8%, respectively.
FF-Parser further optimizes segmentation results.

IV. Conclusion

MedSegDiff is proposed as a DPM-based scheme for general medical image segmentation.
Two novel techniques, dynamic conditional encoding and FF-Parser, are proposed to improve its performance.
Experiments show that MedSegDiff outperforms previous SOTA methods.
