Master’s Thesis
Calibration and Uncertainty Visualisation in Medical Image Segmentation
Abstract
Deep learning models for medical image segmentation must provide not only accurate predictions but also calibrated confidence and clinically meaningful uncertainty estimates. In full-body CT segmentation of small and low-contrast structures such as the lymphatic system, predictive probabilities are often miscalibrated and voxel-wise uncertainty maps are dominated by trivial boundary effects.
This thesis investigates calibration and uncertainty estimation within the nnU-Net v2 framework using 45 annotated CT volumes. Three configurations are compared: a single baseline model, a deep ensemble based on 5-fold cross-validation, and a checkpoint ensemble obtained with cyclical learning rates. Confidence calibration is performed post hoc via temperature scaling, with the temperature parameter estimated by minimizing the negative log-likelihood (NLL) on held-out data within a region of interest (ROI).
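The temperature-scaling step can be illustrated with a minimal sketch. It assumes that the held-out voxel logits have already been flattened to an (N, C) array restricted to the ROI, with integer reference labels. The helper names `scaled_nll` and `fit_temperature` are hypothetical, and a one-dimensional grid search stands in for whatever optimizer the thesis actually uses.

```python
import numpy as np

def scaled_nll(logits, labels, temperature):
    """Mean negative log-likelihood of the reference labels after dividing logits by T."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def fit_temperature(logits, labels, grid=None):
    """Return the temperature on a grid that minimizes NLL on held-out voxels.

    logits: (N, C) voxel logits, assumed already restricted to the ROI.
    labels: (N,) integer reference labels.
    A 1-D grid search is used here (an assumption) instead of a gradient-based optimizer.
    """
    if grid is None:
        grid = np.linspace(0.05, 10.0, 2000)
    losses = [scaled_nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])
```

Because a single positive scalar divides all logits, the arg-max class of every voxel is unchanged. This is why temperature scaling can reduce miscalibration without altering segmentation decisions.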
On the test set, temperature scaling consistently reduces miscalibration for all models, with relative improvements of up to 41% in NLL and 56% in Expected Calibration Error (ECE), without altering segmentation decisions. Voxel-wise uncertainty maps derived from single-model or ensemble predictions are evaluated quantitatively as error-localization tools by calibrating thresholds based on their Dice overlap with segmentation errors.
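The two evaluation steps named above, ECE and Dice-based threshold calibration, can be sketched as follows. This is an illustrative sketch under stated assumptions: ECE uses equal-width confidence bins (15 by default, an assumed choice), the error mask marks voxels where the prediction disagrees with the reference, and the threshold is assumed to be chosen as the candidate that maximizes Dice. The function names are hypothetical.

```python
import numpy as np

def expected_calibration_error(confidence, correct, n_bins=15):
    """Equal-width binned ECE: bin-size-weighted mean of |accuracy - confidence|."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=float)
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # bin index in 0..n_bins-1; right=True gives bins of the form (lo, hi]
    bin_idx = np.digitize(confidence, interior_edges, right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of voxels in the bin
    return float(ece)

def dice(a, b):
    """Dice overlap of two binary masks (defined as 1.0 if both are empty)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)

def calibrate_threshold(uncertainty, error_mask, thresholds):
    """Pick the threshold whose binarized uncertainty map best overlaps the error mask."""
    uncertainty = np.asarray(uncertainty, dtype=float)
    scores = [dice(uncertainty >= t, error_mask) for t in thresholds]
    best = int(np.argmax(scores))
    return float(thresholds[best]), scores[best]
```

In practice the threshold would be calibrated on held-out volumes and then applied unchanged to the test set, so that the reported error recall is not optimistically biased.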
Within a 15 mm ROI, ensemble-based measures achieve the highest overall error recall, while distance-aware entropy and Exceedance-Based Contextual Uncertainty improve the detection of non-boundary uncertainty patterns. An interactive visualization tool integrates calibrated predictions and thresholded uncertainty maps to support structured qualitative inspection. Overall, the results show that post-hoc calibration improves the alignment between uncertainty and actual segmentation errors, and that voxel-wise uncertainty maps can support error detection and clinical interpretability without architectural modifications or substantial computational overhead.
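The abstract does not define the ensemble-based measures. Predictive entropy of the averaged softmax and mutual information between prediction and ensemble member are two common choices, shown here as a hedged sketch. The thesis's exact definitions may differ, and distance-aware entropy and Exceedance-Based Contextual Uncertainty are not reproduced. The input layout (M, C, ...) and the function name are assumptions.

```python
import numpy as np

def ensemble_uncertainty(member_probs, eps=1e-12):
    """Voxel-wise uncertainty from an ensemble of softmax outputs.

    member_probs: array of shape (M, C, ...) holding the class probabilities of
    M ensemble members (e.g. cross-validation folds or saved checkpoints).
    Returns (predictive_entropy, mutual_information), each of shape (...).
    """
    p = np.asarray(member_probs, dtype=float)
    mean_p = p.mean(axis=0)                                       # (C, ...) ensemble mean
    predictive_entropy = -(mean_p * np.log(mean_p + eps)).sum(axis=0)
    member_entropy = -(p * np.log(p + eps)).sum(axis=1)           # (M, ...) per member
    mutual_information = predictive_entropy - member_entropy.mean(axis=0)
    return predictive_entropy, mutual_information
```

Predictive entropy is high wherever the averaged prediction is uncertain, including at boundaries. Mutual information is high only where members disagree with each other, which is one reason ensemble-based maps can highlight errors that a single model's entropy misses.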