Master’s Thesis

Calibration and Uncertainty Visualisation in Medical Image Segmentation

Author: Lorenzo Reitani
Degree programme: MSc in Computer Science and Engineering
University: Politecnico di Milano
Advisor: Prof. Loiacono Daniele
Co-advisor: Brioso Ricardo
Academic Year: 2025-2026

Abstract

Deep learning models for medical image segmentation must provide not only accurate predictions but also calibrated confidence and clinically meaningful uncertainty estimates. In full-body CT segmentation of small and low-contrast structures such as the lymphatic system, predictive probabilities are often miscalibrated and voxel-wise uncertainty maps are dominated by trivial boundary effects.

This thesis investigates calibration and uncertainty estimation within the nnU-Net v2 framework, using 45 annotated CT volumes. Three configurations are compared: a single baseline model, a deep ensemble based on 5-fold cross-validation, and a checkpoint ensemble obtained with cyclical learning rates. Confidence calibration is performed via post-hoc temperature scaling, with the temperature parameter estimated by minimizing the negative log-likelihood (NLL) on a held-out region of interest.
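The temperature-scaling step can be illustrated with a minimal NumPy sketch. It assumes that the voxel logits inside the held-out region have been flattened to an (N, C) array with matching integer labels. The function names (`softmax`, `nll`, `fit_temperature`), the search bounds and the golden-section optimizer are illustrative choices, not the nnU-Net v2 API or the exact procedure used in this work. The search runs over β = 1/T because the NLL of softmax(βz) is convex in β, so a one-dimensional golden-section search reliably finds its minimum.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def nll(logits: np.ndarray, labels: np.ndarray, temperature: float) -> float:
    """Mean negative log-likelihood of `labels` after dividing logits by T.

    logits: (N, C) voxel logits, flattened and restricted to the held-out region.
    labels: (N,) integer ground-truth classes.
    """
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def fit_temperature(logits, labels, t_min=0.05, t_max=20.0, iters=60):
    """Find the T minimizing the NLL via golden-section search on beta = 1/T."""
    lo, hi = 1.0 / t_max, 1.0 / t_min
    g = (np.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    fa, fb = nll(logits, labels, 1.0 / a), nll(logits, labels, 1.0 / b)
    for _ in range(iters):
        if fa < fb:  # the minimum lies in [lo, b]
            hi, b, fb = b, a, fa
            a = hi - g * (hi - lo)
            fa = nll(logits, labels, 1.0 / a)
        else:  # the minimum lies in [a, hi]
            lo, a, fa = a, b, fb
            b = lo + g * (hi - lo)
            fb = nll(logits, labels, 1.0 / b)
    return 1.0 / ((lo + hi) / 2.0)


# Usage: calibrated = softmax(logits / fit_temperature(logits, labels), axis=1)
```

Because a positive T rescales every logit of a voxel by the same factor, the argmax of each voxel, and hence the segmentation, is unchanged; only the confidence attached to it moves.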

On the test set, temperature scaling consistently reduces miscalibration across all models, with relative improvements of up to 41% in NLL and 56% in Expected Calibration Error (ECE), without altering segmentation decisions. Voxel-wise uncertainty maps derived from single-model or ensemble predictions are evaluated quantitatively as error-localization tools, with their thresholds calibrated on the Dice overlap between the thresholded map and the segmentation errors.
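The two evaluation steps above, ECE and threshold calibration against the error mask, might be sketched as follows. The equal-width binning with 15 bins, the use of top-label confidence, the quantile grid of candidate thresholds, and the choice of the threshold that maximizes Dice are all assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def expected_calibration_error(probs, labels, n_bins=15):
    """ECE over voxels using equal-width bins on top-label confidence.

    probs: (N, C) class probabilities (e.g. after temperature scaling).
    labels: (N,) integer ground-truth classes.
    """
    confidence = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    inner_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # Right-closed bins: the first is [0, 1/n_bins], the last is ((n_bins-1)/n_bins, 1].
    bin_ids = np.digitize(confidence, inner_edges, right=True)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bin_ids == b
        if in_bin.any():
            # Weight each bin's |accuracy - confidence| gap by its share of voxels.
            ece += in_bin.mean() * abs(correct[in_bin].mean() - confidence[in_bin].mean())
    return float(ece)


def dice(a, b):
    """Dice overlap of two boolean masks (defined as 1.0 when both are empty)."""
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else float(2.0 * np.logical_and(a, b).sum() / denom)


def calibrate_uncertainty_threshold(uncertainty, prediction, ground_truth, n_candidates=100):
    """Return (threshold, dice) for the candidate whose binarized map best matches the errors.

    Assumption: the threshold is chosen to maximize Dice with the error mask,
    and candidate thresholds are quantiles of the uncertainty values.
    """
    errors = prediction != ground_truth  # voxels where the segmentation is wrong
    candidates = np.unique(np.quantile(uncertainty, np.linspace(0.0, 1.0, n_candidates)))
    scores = [dice(uncertainty > t, errors) for t in candidates]
    best = int(np.argmax(scores))
    return float(candidates[best]), scores[best]
```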

Within a 15 mm region of interest (ROI), ensemble-based measures achieve the highest overall error recall, while distance-aware entropy and Exceedance-Based Contextual Uncertainty improve the detection of non-boundary uncertainty patterns. An interactive visualization tool integrates calibrated predictions and thresholded uncertainty maps to support structured qualitative inspection. Overall, the results show that post-hoc calibration improves the alignment between uncertainty and actual segmentation errors, and that voxel-wise uncertainty maps can support error detection and clinical interpretability without architectural modifications or substantial computational overhead.
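One ensemble-based measure, the entropy of the mean prediction, and an error-recall score restricted to the ROI can be sketched as below. The sketch assumes member softmax outputs stacked along a leading axis and a precomputed boolean ROI mask; if the ROI is taken to be the voxels within 15 mm of the predicted structures, it could be built with `scipy.ndimage.distance_transform_edt` using its `sampling` argument set to the voxel spacing. Error recall is taken here as the fraction of error voxels inside the ROI whose uncertainty exceeds the threshold, which is an assumed definition. The distance-aware entropy and Exceedance-Based Contextual Uncertainty measures are not reproduced, and the function names are hypothetical.

```python
import numpy as np


def ensemble_entropy(member_probs, eps=1e-12):
    """Voxel-wise entropy (in nats) of the ensemble mean prediction.

    member_probs: (M, C, ...) softmax outputs of M ensemble members,
    e.g. the cross-validation folds or the saved checkpoints.
    """
    mean_probs = member_probs.mean(axis=0)
    # Clipping avoids log(0); classes with zero probability still contribute 0.
    return -(mean_probs * np.log(np.clip(mean_probs, eps, 1.0))).sum(axis=0)


def error_recall(uncertainty, threshold, prediction, ground_truth, roi):
    """Fraction of segmentation errors inside the ROI flagged as uncertain.

    roi: boolean mask, assumed precomputed (hypothetical 15 mm margin).
    Returns NaN when the ROI contains no errors.
    """
    errors = (prediction != ground_truth) & roi
    n_errors = errors.sum()
    if n_errors == 0:
        return float("nan")
    flagged = (uncertainty > threshold) & errors
    return float(flagged.sum() / n_errors)
```

Averaging the member probabilities before taking the entropy means the map reflects both the uncertainty of the individual members and their disagreement.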

Keywords: uncertainty estimation, confidence calibration, medical image segmentation, nnU-Net, deep ensemble, temperature scaling

Contact

LinkedIn — Lorenzo Reitani