3.2.2. Segmentation

Nautilus' segmentation pipeline was evaluated on four datasets: one clinical and three cadaveric. The clinical dataset consisted of 58 pre-operative images with voxel resolutions ranging from 0.1 to 0.4 mm in the x-y plane and slice thicknesses ranging from 0.1 to 1 mm. The images were uploaded to Nautilus, and the union of the ST and SV segmentation masks was obtained and compared with the manually labelled cochlea annotations. All images were successfully processed, with a mean Dice similarity coefficient of 86 ± 3% and an average surface error [71] of 0.14 ± 0.03 mm. The cadaver datasets comprised 23 temporal bone (TB) μCT images in total. Owing to computational limitations, these scans were resampled to an isotropic resolution of 0.1 mm. The images were uploaded to Nautilus, and the segmentation masks were obtained and compared with the manually labelled ST and SV annotations. All images were successfully processed, with a mean Dice similarity coefficient of 80 ± 3% and an average surface error of 0.19 ± 0.04 mm.
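The two metrics reported above can be illustrated with a minimal NumPy sketch. It is not the Nautilus implementation: the function names are hypothetical, and the surface metric shown is one common definition (symmetric mean distance between the surface voxels of the two masks), which may differ from the definition used in [71]. The brute-force distance computation is only suitable for small toy volumes.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / denom

def surface_voxels(mask):
    """Indices of mask voxels that touch background (6-connectivity in 3D)."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    for axis in range(mask.ndim):
        # A voxel is interior only if both neighbours along every axis are set.
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    return np.argwhere(mask & ~interior[core])

def average_surface_distance(pred, truth, spacing):
    """Symmetric mean surface distance, in the units of `spacing` (e.g. mm)."""
    sp = np.asarray(spacing, dtype=float)
    a = surface_voxels(pred) * sp
    b = surface_voxels(truth) * sp
    # Brute-force pairwise distances: fine for toy data, too slow for real scans.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return (d.min(axis=1).sum() + d.min(axis=0).sum()) / (len(a) + len(b))

# Toy example on a 0.1 mm isotropic grid (synthetic data, not from the study).
truth = np.zeros((10, 10, 10), dtype=bool)
truth[2:6, 2:6, 2:6] = True

# Hypothetical ST and SV masks whose union is compared with the cochlea label.
st = np.zeros_like(truth)
st[3:7, 2:6, 2:4] = True
sv = np.zeros_like(truth)
sv[3:7, 2:6, 4:6] = True
pred = st | sv  # union of the two masks, shifted by one voxel from truth

print(dice_coefficient(pred, truth))  # 0.75
print(average_surface_distance(pred, truth, (0.1, 0.1, 0.1)))
```

For full-resolution volumes, a distance-transform approach would normally replace the pairwise distance matrix, since the latter grows with the product of the two surface sizes.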

Figure 7 depicts segmentation results for each dataset. For a more thorough analysis, the cochlea was sectioned along its centerline at 18° angular intervals, and Dice similarity coefficients were computed for each segment (see Figure S9). The Dice scores appear to decrease towards the apical region.
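The per-segment analysis can be sketched as follows. This is an illustration under stated assumptions, not the method used in the study: it assumes that each voxel already carries a cumulative angular coordinate along the cochlear centerline (the hypothetical array `theta_deg`, measured in degrees from the base), and it simply bins voxels into 18° segments before computing Dice within each bin. How that angular coordinate is derived from the centerline is not covered here.

```python
import numpy as np

def dice_per_angular_segment(pred, truth, theta_deg, step_deg=18.0):
    """Dice score per angular segment.

    theta_deg is assumed to hold, for every voxel, its cumulative angle
    along the centerline in degrees (hypothetical, precomputed input).
    Returns {segment_index: dice}, where segment k covers
    [k * step_deg, (k + 1) * step_deg).
    """
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    bins = np.floor(np.asarray(theta_deg, dtype=float) / step_deg).astype(int)

    labelled = pred | truth  # only segments containing some label are scored
    scores = {}
    for b in np.unique(bins[labelled]):
        sel = bins == b
        p, t = pred[sel], truth[sel]
        # Denominator is non-zero because the segment holds labelled voxels.
        scores[int(b)] = float(2.0 * np.logical_and(p, t).sum() / (p.sum() + t.sum()))
    return scores

# Toy 1D example (synthetic values): agreement worsens with increasing angle.
theta_deg = np.array([0.0, 5.0, 17.0, 18.0, 20.0, 35.0, 36.0, 40.0])
truth = np.array([1, 1, 1, 1, 1, 1, 1, 1], dtype=bool)
pred = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)

print(dice_per_angular_segment(pred, truth, theta_deg))
# {0: 1.0, 1: 0.5, 2: 0.0}
```

Binning on a cumulative angle rather than on the raw polar angle around the modiolar axis keeps successive cochlear turns in separate segments, which is what allows basal and apical scores to be compared.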

**Figure 7.** Segmentation output for different patients. (**A**) Clinical dataset, (**B**) cadaver dataset 1, (**C**) cadaver dataset 2, (**D**) cadaver dataset 3. Blue: Nautilus estimation; orange: ground truth; green: overlap between the two.
