Link to paper

The full paper is available here.

You can also find the paper on PapersWithCode here.

Abstract

  • Focuses on automatic segmentation of multiple anatomical structures in CT images.
  • Problems with existing algorithms: difficult to use, don’t generalize, can only segment one structure.
  • New dataset and segmentation toolkit solves these problems.

Paper Content

Introduction

  • Need for computer-based evaluation methods for radiological images
  • Segmentation of major anatomical structures
  • Work on segmenting specific anatomical structures (e.g. pancreas, spleen, colon, lung)
  • Work on segmenting several anatomical structures in one dataset and model
  • First work trying to segment over 100 classes
  • Closest work is from Chen et al. (50 classes)
  • Model and training data is publicly available and easy to use

Methods

Data selection

  • Collected CT images from PACS over 10 years
  • Only images from patients with general research consent
  • Excluded CT images of legs and hands
  • Sampled CT series of each examination randomly
  • Dataset contains CT images with different sequences, contrast agent, bulb voltages, slice thicknesses, resolution and kernels
  • Excluded examinations with high ambiguity (40 subjects)
  • Final dataset contains 1204 images

Data annotation

  • 104 anatomical structures identified for segmentation
  • Manual segmentation of 104 classes in 1204 subjects would take 10 years
  • Used existing models to create first segmentation for 68 classes
  • Manual segmentation for remaining 36 classes
  • Used inhouse dataset with ground truth segmentations of heart subparts
  • Trained U-Net on images with and without contrast agent
  • Used Nora imaging platform to speed up manual segmentation
  • Used active learning approach to further speed up process
  • Used 3D renderings to spot errors
  • Finished segmentation in 8 weeks with 1 person working on it
  • 65x speedup compared to manual segmentation
  • Quality of segmentations better than manual segmentation

Model

  • Uses nnU-Net model for computer science paper
  • Automatically configures hyperparameters based on dataset characteristics
  • Delivers state of the art results and is easy to apply
  • “-fast” option can be used for larger studies with many thousand subjects

Evaluation

  • Dataset of 1204 subjects split into 1082 training, 57 validation and 65 test subjects
  • Metric used: Dice score and normalized surface distance (NSD)
  • NSD threshold set to 3 mm
  • Dice score biased towards big classes, NSD biased towards small classes
  • Evaluation done by calculating mean across all classes and subjects (micro-average)
  • Compared results to nnU-Net trained on dataset from “Multi-Atlas Labeling Beyond the Cranial Vault Challenge”

Results

Overall results

  • Model trained on CTs with 1.5 mm resolution has high accuracy (Dice and NSD scores over 0.96)
  • Model trained on CTs with 3 mm resolution has lower accuracy (Dice score 0.85, NSD score over 0.96)

Results per class

  • Results of all classes are shown in Figure 5
  • Similar classes were grouped together for better readability
  • Results range from 0.87 to 1.0

Comparison to other model

  • Our model achieved a higher dice score than the nnU-Net
  • Our model achieved a higher NSD score than the nnU-Net

Runtime

  • Runtime, RAM and GPU memory requirements were measured on a local workstation with a NVidia GeForce RTX 3090 GPU.
  • Three different CT images were used: small abdomen, medium thorax and abdomen, and large head to knee.

Typical failure cases

  • Model produces highly accurate results
  • Model is robust
  • 15% of subjects have typical failure cases
  • Figure 9 shows failure cases
  • Awareness of failure cases is helpful when using segmentations

Application example

  • Dataset of 4102 whole body CTs collected from patients aged 18-100
  • Liver volume increases up to age 50, density decreases up to age 60
  • Left atrium volume and density increase over lifespan
  • Hip volume increases up to age 60, density decreases over lifespan

Conclusion

  • Model produces accurate segmentation
  • Model available on Github
  • NVidia GPU with CUDA support required
  • Online tool to upload images for segmentation
  • Dataset of 1204 subjects with 104 ground truth segmentations publicly available
  • Planning to extend number of classes
  • Active learning approach for manual segmentation
  • Results of high resolution model per class
  • Adaptations to default nnU-Net settings
  • Model trained on 1.5 mm isotropic resolution
  • Model trained on 3 mm isotropic resolution
  • Comparison of proposed model with publicly available multi organ segmentation model
  • Typical failure cases of the proposed model
  • Mean density and volume of three exemplary classes over lifespan of polytrauma cohort