Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Human organs change due to short-term and long-term factors
  • Medical image generation tasks rely on single images, ignoring sequential dependency
  • Sequence-aware deep generative models are underexplored in medical imaging
  • Sequence-aware diffusion model (SADM) proposed to generate longitudinal medical images
  • SADM uses sequence-aware transformer to learn longitudinal dependency with missing data

Paper Content

Introduction

  • Generative models used in medical domain due to hardware and data availability
  • Two tasks: generation of longitudinal brain image and multi-frame cardiac image
  • Generative adversarial networks and diffusion models used
  • Sequence-aware deep generative models to learn temporal dependency
  • Proposed sequence-aware diffusion model (SADM) to address issues of longitudinal medical data
  • SADM works with single image input, longitudinal data with missing frames, and high-dimensional images
  • State-of-the-art results in longitudinal image generation and missing data imputation

Preliminary

Diffusion models

  • Diffusion models consist of a forward process and a reverse process.
  • The forward process adds noise to data to create a noisy version.
  • The reverse process predicts and subtracts the noise in the reverse direction.
  • The reverse process is parameterized by a generative model.

Classifier-free guidance

  • Two branches of conditioning methods for diffusion models: classifier guided and classifier-free guidance
  • Classifier-free guidance used for conditioning the diffusion model
  • Guidance strength controls tradeoff between sample quality and diversity
  • Sequence-aware diffusion model (SADM) proposed for longitudinal medical image generation
  • Transformer-based attention module used to generate conditioning signals for diffusion model
  • Problem setting: X is a longitudinal 3D medical image with temporal length L
  • Token embedding: 4D convolution with l = 1
  • Temporal encoder: self-attention along temporal dimensions
  • Spatial decoder: self-attention to spatial dimensions between all tokens in same temporal index

Conditional diffusion model

  • Proposed SADM follows formulation of classifier-free diffusion model
  • Uses sequence-aware conditioning signal
  • Autoregressive sampling scheme captures long-distance temporal dependency
  • Training pipeline defined in Algorithm 1
  • Autoregressive sampling scheme imputes missing images

Experiments

  • Proposed SADM effective for medical image generation
  • Compared to GAN-based and diffusion-based baselines
  • Ablation study of model components with various input sequence settings

Dataset and implementation

  • Used the multi-frame cardiac MRI curated by ACDC for a common task of synthesizing the final frame of a cardiac cycle
  • Resized the dataset to X โˆˆ R 12ร—128ร—128ร—32 and Min-Max normalized it subject-wise
  • Followed the classifier-free diffusion model architecture and hyperparameters and modified it into a 3D model
  • Trained the model for 3 million iterations, which took 150 GPU hours

Comparison with baseline methods

  • Two state-of-the-art baseline models used for comparison
  • Augmented baselines with intermediate frames and ES frames
  • SADM shows better depiction of blood pool and other areas
  • Ventricular regions synthesized by SADM are more crisp
  • Quantitative comparison of SSIM, PSNR, and normalized RMSE
  • SADM outperforms GAN-based method by 3-13% in each metric

Ablation study

  • Synthesis using full sequence and missing data settings show higher SSIM than single input setting.
  • Model is learning which frames of the input sequence are important in generating future frames.
  • Removing either the attention module or the diffusion model results in worse performance.

Conclusion

  • Diffusion models are computationally expensive
  • SADM uses an autoregressive sampling scheme
  • Alternative sampling methods are not suitable for medical domain
  • SADM uses a transformer-based attention module to learn temporal dependence
  • Tested SADM on longitudinal cardiac and brain MRI
  • First to explore temporal dependency in diffusion models for medical image generation
  • Examples of longitudinal medical images synthesized by SADM shown in Fig. 1
  • SADM tested with single image, missing data, and full sequence settings
  • In-house synthesis of longitudinal brain MRI done in two steps
  • GAN-based registration model used to register subject scans to age-specific templates
  • Ablation study of SADM components conducted