Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • DPM is a hot topic in computer vision
  • It is used for image generation, deblurring, super-resolution and anomaly detection
  • MedSegDiff is the first DPM based model for medical image segmentation
  • Dynamic conditional encoding and FF-Parser are proposed to enhance the step-wise regional attention
  • MedSegDiff outperforms SOTA methods on three medical segmentation tasks

Paper Content

I. introduction

  • Propose a DPM-based segmentation model, MedSegDiff, for medical image segmentation.
  • Introduce dynamic conditional encoding and feature frequency parser to improve the segmentation accuracy.
  • Demonstrate the effectiveness of MedSegDiff on three different medical segmentation tasks.
  • Medical image segmentation is the process of partitioning a medical image into meaningful regions.
  • It allows medical professionals to better understand what they’re looking at and compare images over time.
  • Automatic medical image segmentation methods have been developed to reduce time and effort.
  • Deep learning techniques have been used to improve accuracy.
  • A new DPM-based segmentation model, MedSegDiff, has been proposed and outperforms previous SOTA on three different medical segmentation tasks.

Ii. method

  • Diffusion models are generative models composed of two stages: forward diffusion and reverse diffusion.
  • The reverse process uses a neural network to recover the original data by reversing the noising process.
  • A UNet is used as the network for the learning, with the step estimation function conditioned by raw image prior.

A. dynamic conditional encoding

  • Conditional DPMs use a unique given information as the conditional prior.
  • Medical image segmentation is difficult due to ambiguous objects and low-contrast images.
  • A dynamic conditional encoding is proposed to integrate the current-step segmentation information into the raw image encoding.
  • An attentive-like mechanism is used to enhance the attentive region.
  • FF-Parser is proposed to constrain the high-frequency components in the features.

B. ff-parser

  • FF-Parser is used to constrain noise-related components in features.
  • FF-Parser is a parameterized attentive map that is applied to Fourier space features.
  • FF-Parser is a learnable version of frequency filters used in digital image processing.

C. training and architecture

  • MedSegDiff is trained using the standard process of DPM.
  • A random couple of raw image and segmentation label is sampled for each iteration.
  • Iteration number is sampled from a uniform and Gaussian distribution.
  • MedSegDiff is a modified Re-sUNet with a ResNet encoder and UNet decoder.

Iii. experiments a. dataset

  • Conducted experiments on 3 medical tasks with different image modalities
  • Experiments conducted on REFUGE-2, BraTs-2021 and DDTI datasets
  • Datasets publicly available with segmentation and diagnosis labels
  • Train/validation/test sets split following default settings of dataset

B. implementation details

  • 4 variants of the model MedSegDiff++, MedSegDiff-L, MedSegDiff-B, and MedSegDiff-S are experimented with
  • UNet with 4x, 5x, 6x, 6x downsamples are used in the variants
  • 100 diffusion steps are used for inference
  • Experiments are implemented with PyTorch and trained/tested on 4 Tesla P40 GPU
  • Images are resized to 256x256 pixels
  • Networks are trained in an end-to-end manner using AdamW optimizer
  • MedSegDiff-B and MedSegDiff-S are trained with 32 batch size, MedSegDiff-L and MedSegDiff++ are trained with 64 batch size
  • Learning rate is initially set to 1x10-4

C. main results

  • Comparing SOTA segmentation methods for 3 specific tasks and general medical image segmentation
  • Evaluating segmentation performance by Dice score and IoU
  • Advanced network architectures commonly gain better results
  • MedSegDiff outperforms all other methods on 3 different tasks

D. ablation study

  • Dynamic conditioning (Dy-Cond) improves performance on all three tasks.
  • Dy-Cond improves 2.1% on optic-cup segmentation.
  • Dy-Cond improves 1.6% and 1.8% on brain tumor and thyroid nodule segmentation respectively.
  • FF-Parser further optimizes segmentation results.

Iv. conclusion

  • Proposed MedSegDiff scheme for DPM-based general medical image segmentation
  • Proposed two novel techniques to improve performance
  • Experiments show MedSegDiff outperforms previous SOTA