Abstract
- Focuses on automatic segmentation of multiple anatomical structures in CT images.
- Problems with existing algorithms: difficult to use, don’t generalize, can only segment one structure.
- A new dataset and segmentation toolkit address these problems.
Paper Content
Introduction
- Need for computer-based evaluation methods for radiological images
- Segmentation of major anatomical structures
- Work on segmenting specific anatomical structures (e.g. pancreas, spleen, colon, lung)
- Work on segmenting several anatomical structures in one dataset and model
- First work trying to segment over 100 classes
- Closest work is from Chen et al. (50 classes)
- Model and training data are publicly available and easy to use
Methods
Data selection
- Collected CT images from PACS over 10 years
- Only images from patients with general research consent
- Excluded CT images of legs and hands
- Sampled CT series of each examination randomly
- Dataset contains CT images with different sequences, contrast agents, tube voltages, slice thicknesses, resolutions and reconstruction kernels
- Excluded examinations with high ambiguity (40 subjects)
- Final dataset contains 1204 images
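The selection steps above can be sketched as a small filter-and-sample routine. This is a minimal illustration only: the examination and series identifiers, the exclusion set, and the fixed seed are hypothetical, and the summary does not say how the random sampling was implemented.

```python
import random

# Hypothetical examinations, each with one or more CT series.
examinations = {
    "exam_001": ["series_a", "series_b", "series_c"],
    "exam_002": ["series_a"],
    "exam_003": ["series_a", "series_b"],
}

# Hypothetical exclusions (e.g. leg/hand-only scans or highly ambiguous cases).
excluded = {"exam_002"}

# Fixed seed so the sketch is reproducible; an assumption, not from the paper.
rng = random.Random(0)

# Keep one randomly chosen series per examination that was not excluded.
selected = {
    exam: rng.choice(series)
    for exam, series in sorted(examinations.items())
    if exam not in excluded
}

for exam, series in selected.items():
    print(exam, series)
```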
Data annotation
- 104 anatomical structures identified for segmentation
- Manual segmentation of 104 classes in 1204 subjects would take 10 years
- Used existing models to create first segmentation for 68 classes
- Manual segmentation for remaining 36 classes
- Used in-house dataset with ground truth segmentations of heart subparts
- Trained U-Net on images with and without contrast agent
- Used Nora imaging platform to speed up manual segmentation
- Used active learning approach to further speed up process
- Used 3D renderings to spot errors
- Finished segmentation in 8 weeks with 1 person working on it
- 65x speedup compared to manual segmentation
- Quality of segmentations better than manual segmentation
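The figures in this list are consistent with each other, which a few lines of arithmetic make explicit. The only assumption is a 52-week year for converting the 10-year estimate into weeks.

```python
# Class counts from the annotation bullets.
pre_segmented_classes = 68   # first segmentation from existing models
manual_classes = 36          # segmented manually
total_classes = pre_segmented_classes + manual_classes

# Time estimates; 52 weeks per year is an assumption for the conversion.
weeks_per_year = 52
manual_weeks = 10 * weeks_per_year   # estimated fully manual effort
assisted_weeks = 8                   # reported effort with the assisted workflow
speedup = manual_weeks / assisted_weeks

print(total_classes)  # 104
print(speedup)        # 65.0
```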
Model
- Uses nnU-Net, a U-Net-based segmentation framework
- Automatically configures hyperparameters based on dataset characteristics
- Delivers state of the art results and is easy to apply
- “-fast” option can be used for larger studies with many thousands of subjects
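As a rough sketch of how the toolkit might be invoked from a script, the helper below only builds the command line and does not execute anything. The executable name `TotalSegmentator` and the flags `-i`, `-o` and `--fast` are assumptions based on the publicly released command-line tool, not on this summary (which writes the option as “-fast”), and should be checked against the installed version. The file names are hypothetical.

```python
import shlex


def build_totalsegmentator_command(ct_path, out_dir, fast=False):
    """Return the argument list for one segmentation run (nothing is executed)."""
    # Assumed CLI shape: TotalSegmentator -i <input> -o <output dir> [--fast]
    cmd = ["TotalSegmentator", "-i", ct_path, "-o", out_dir]
    if fast:
        # Assumed spelling of the faster, lower-resolution option.
        cmd.append("--fast")
    return cmd


cmd = build_totalsegmentator_command("ct.nii.gz", "segmentations", fast=True)
print(shlex.join(cmd))

# To actually run it, with the tool installed:
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the argument list separately from running it keeps the sketch testable on machines without a GPU or the tool installed.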
Evaluation
- Dataset of 1204 subjects split into 1082 training, 57 validation and 65 test subjects
- Metrics used: Dice score and normalized surface distance (NSD)
- NSD threshold set to 3 mm
- Dice score biased towards big classes, NSD biased towards small classes
- Evaluation done by calculating mean across all classes and subjects (micro-average)
- Compared results to nnU-Net trained on dataset from “Multi-Atlas Labeling Beyond the Cranial Vault Challenge”
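The two metrics above can be illustrated with a small pure-Python sketch. It assumes segmentations are given as sets of voxel indices (for Dice) and as lists of surface point coordinates in millimetres (for NSD). These input formats, the brute-force distance search, and the empty-input conventions are choices made for this illustration, not the evaluation code used in the paper. Production implementations typically work on arrays with distance transforms.

```python
from math import dist


def dice_score(pred, truth):
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two sets of voxel indices."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def normalized_surface_distance(surf_pred, surf_truth, tolerance_mm=3.0):
    """Fraction of surface points within tolerance_mm of the other surface."""
    if not surf_pred or not surf_truth:
        return 1.0 if not surf_pred and not surf_truth else 0.0

    def count_close(points, other):
        # Brute-force nearest-neighbour search; fine for a toy example.
        return sum(
            1 for p in points
            if min(dist(p, q) for q in other) <= tolerance_mm
        )

    hits = count_close(surf_pred, surf_truth) + count_close(surf_truth, surf_pred)
    return hits / (len(surf_pred) + len(surf_truth))


def mean_score(scores):
    """Mean over all (subject, class) entries pooled together."""
    values = list(scores.values())
    return sum(values) / len(values)


# Toy Dice example: 2 shared voxels, 4 voxels in each mask.
truth = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
pred = {(0, 0, 1), (0, 1, 1), (0, 0, 2), (0, 1, 2)}
dice = dice_score(pred, truth)

# Toy NSD example with the 3 mm threshold: one point of each surface is close.
surf_truth = [(0.0, 0.0, 0.0), (0.0, 0.0, 10.0)]
surf_pred = [(0.0, 0.0, 2.0), (0.0, 0.0, 20.0)]
nsd = normalized_surface_distance(surf_pred, surf_truth, tolerance_mm=3.0)

# Hypothetical per-subject, per-class scores pooled into one mean.
scores = {("s1", "liver"): 0.5, ("s1", "spleen"): 1.0, ("s2", "liver"): 0.75}
overall = mean_score(scores)

print(dice, nsd, overall)  # 0.5 0.5 0.75
```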
Results
Overall results
- Model trained on CTs with 1.5 mm resolution has high accuracy (Dice and NSD scores over 0.96)
- Model trained on CTs with 3 mm resolution has lower accuracy (Dice score 0.85, NSD score over 0.96)
Results per class
- Results of all classes are shown in Figure 5
- Similar classes were grouped together for better readability
- Results range from 0.87 to 1.0
Comparison to other model
- The proposed model achieved higher Dice and NSD scores than the nnU-Net trained on the “Multi-Atlas Labeling Beyond the Cranial Vault Challenge” dataset
Runtime
- Runtime, RAM and GPU memory requirements were measured on a local workstation with an NVIDIA GeForce RTX 3090 GPU.
- Three different CT images were used: small abdomen, medium thorax and abdomen, and large head to knee.
Typical failure cases
- Model produces highly accurate results
- Model is robust
- 15% of subjects have typical failure cases
- Figure 9 shows failure cases
- Awareness of failure cases is helpful when using segmentations
Application example
- Dataset of 4102 whole body CTs collected from patients aged 18-100
- Liver volume increases up to age 50, density decreases up to age 60
- Left atrium volume and density increase over lifespan
- Hip volume increases up to age 60, density decreases over lifespan
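Volume and density per structure can be derived from a segmentation mask and the CT intensities. The sketch below is one plausible way to do this, assuming volume is the voxel count times the voxel size and density is the mean Hounsfield unit (HU) value inside the mask. The summary does not state how the paper computed these quantities, and the flat lists and values are hypothetical toy data.

```python
def volume_ml(mask, spacing_mm):
    """Volume in millilitres: number of mask voxels times voxel volume."""
    voxel_mm3 = spacing_mm[0] * spacing_mm[1] * spacing_mm[2]
    return sum(mask) * voxel_mm3 / 1000.0  # 1 ml = 1000 mm^3


def mean_density_hu(mask, hu_values):
    """Mean CT intensity (HU) over voxels where the mask is 1."""
    inside = [v for m, v in zip(mask, hu_values) if m]
    return sum(inside) / len(inside) if inside else float("nan")


# Toy data: flattened binary mask and matching HU values.
mask = [0, 1, 1, 0, 1, 1]
hu_values = [-1000, 50, 60, -900, 70, 60]
spacing_mm = (1.5, 1.5, 1.5)  # isotropic voxels, as in the 1.5 mm model

organ_volume = volume_ml(mask, spacing_mm)
organ_density = mean_density_hu(mask, hu_values)

print(organ_volume, organ_density)
```

Repeating this per subject and grouping the results by age would give curves of the kind described in the bullets above.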
Conclusion
- Model produces accurate segmentation
- Model available on GitHub
- NVIDIA GPU with CUDA support required
- Online tool to upload images for segmentation
- Dataset of 1204 subjects with 104 ground truth segmentations publicly available
- Planning to extend number of classes
Figures and tables
- Active learning approach for manual segmentation
- Results of high resolution model per class
- Adaptations to default nnU-Net settings
- Model trained on 1.5 mm isotropic resolution
- Model trained on 3 mm isotropic resolution
- Comparison of proposed model with publicly available multi organ segmentation model
- Typical failure cases of the proposed model
- Mean density and volume of three exemplary classes over lifespan of polytrauma cohort