#### *3.4. Failure Detection*

The outputs of Nautilus' flagging system were compared against the qualitative assessments of the two reviewers, as detailed in the previous section. Figure S10 presents a performance summary of each flagging mechanism. An overall pre-operative failure detection sensitivity and specificity of 100% and 97.4%, respectively, was achieved, with a corresponding post-operative failure detection sensitivity and specificity of 97.3% and 59.7%, respectively.
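These rates follow the standard confusion-matrix definitions. The sketch below shows how they are computed; the counts are purely illustrative (the text reports only the rates, not the underlying counts):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: failures correctly flagged      fn: failures missed by the flags
    tn: successes left unflagged        fp: successes falsely flagged
    """
    sensitivity = tp / (tp + fn)  # true-positive rate over all actual failures
    specificity = tn / (tn + fp)  # true-negative rate over all actual successes
    return sensitivity, specificity

# Hypothetical counts only: 37 failures all flagged,
# 77 successes of which 2 were falsely flagged.
sens, spec = sensitivity_specificity(tp=37, fn=0, tn=75, fp=2)
```

With these illustrative counts, the sensitivity is 100% and the specificity is 75/77 ≈ 97.4%, matching the form of the pre-operative figures above.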

#### *3.5. Computational Performances*

Average computation times for each process are listed in Table 2. Computation times were measured on a standard Azure cloud VM (Standard DS3 v2). On average, a complete pre- and post-operative analysis took around 10–12 min, with data storage and shape-model adaptation for the segmentation taking the most time. All other processes combined took less than two minutes. Nautilus is orchestrated with Azure Kubernetes Service with scalability in mind, and throughput can be scaled up trivially by increasing the number of worker nodes.

#### **4. Discussion**

We present a web-based imaging research platform enabling the segmentation of cochlear structures and the reconstruction of a cochlear implant electrode from conventional pre- and post-operative CT scans, respectively. Detailed analyses of accuracy, robustness, and failure detection provide legitimate grounds for using Nautilus to explore clinically relevant questions on cochlear implantation and point towards further developments in image-guided CI therapy.

Nautilus demonstrates segmentation performance in the range of previously published academic results. Recent works have reported average cochlear Dice scores and average surface errors in the range of 72–91% and 0.11–0.27 mm, respectively [8–10,20,72]. Some of these groups have achieved higher Dice scores on limited datasets with high-resolution CT and μCT images [8,72]. A direct comparison between these works is not possible since our dataset and analysis focused on clinical and downsampled μCT images. Moreover, no publicly available benchmark exists for a fair comparison between the different approaches. Nevertheless, our results on a varied dataset support our claim of high accuracy and usability with conventional clinical CTs.

Many prior works have focused on inferring cochlear shape from μCT or high-resolution CT scans, as they offer better contrast and resolution than routine clinical CTs [8,72]. Our segmentation approach relies on JASMIN-inspired shape analysis [20], which yields more interpretable model parameters and thus enables further statistical studies. However, this same process is the computational bottleneck of our pipeline. It could be adapted to benefit from learned shape models and anatomically inspired post-processing [73,74]. Our analysis also suggests that Nautilus performs better on clinical CT scans than on cadaver head scans, which might be inherent to the cadaver head preparation process, which often leaves random air pockets and thus a different intensity profile [75]. Additionally, our training dataset comprises mainly clinical scans. In the future, a cadaver-specific pipeline could be developed to support cadaver-based research. Regardless, this is not a limiting factor for the applicability of Nautilus, whose main foreseen applications are in clinical research. Furthermore, our discretized analysis of the segmentation revealed that performance decreases beyond two turns of the cochlea because of the small diameter of the cochlear ducts relative to image resolution. This, too, is not a limiting factor, as most CI electrode arrays only reach around 450–600° of insertion coverage.

Post-operatively, our electrode detection process outperforms previous works, which have reported localization errors in the range of 0.1–0.35 mm [58,61,62]. The electrode contact-to-BM distances could serve to infer insertion trauma according to the Eshraghi trauma scale [55]. This would require evaluating the distance-trauma relationship against either cadaveric histology samples or high-resolution μCT scans in which the various grades of BM trauma are resolvable. We must note that metallic artifacts emanating from the electrodes do not permit direct segmentation of cochlear structures, which necessitates a pre-operative CT scan to infer information about the cochlear structures. Post-operative images can be converted into pseudo-pre-operative images suitable for segmentation using artifact-reduction techniques [76], or an atlas can be adapted to the post-operative image to segment it directly [77]. The metallic artifacts might affect pre- to post-operative registration as well. However, the challenge of post- to pre-operative image conversion can be circumvented by simply using a mirrored version of the contralateral cochlea in the post-operative scan, provided that the contralateral ear is not implanted [24].
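As a sketch of how contact-to-BM distances could be computed once the contact positions and a point-sampled BM surface are available (the function name, inputs, and toy geometry below are hypothetical illustrations, not Nautilus' actual implementation):

```python
import numpy as np

def contact_to_bm_distances(contacts, bm_points):
    """Minimum Euclidean distance from each electrode contact to a
    point-sampled basilar-membrane (BM) surface, in the same units (mm).

    contacts:  (N, 3) array of detected contact centres
    bm_points: (M, 3) array of points sampled on the BM surface
    """
    # Pairwise differences via broadcasting: (N, 1, 3) - (1, M, 3) -> (N, M, 3)
    diff = contacts[:, None, :] - bm_points[None, :, :]
    # Norm over xyz gives an (N, M) distance matrix; take the closest BM point.
    return np.linalg.norm(diff, axis=-1).min(axis=1)

# Toy example: two contacts above/below a flat 3x3 patch of BM samples at z = 0.
bm = np.array([[x, y, 0.0] for x in range(3) for y in range(3)], dtype=float)
d = contact_to_bm_distances(np.array([[0.0, 0.0, 0.5], [1.0, 1.0, -0.2]]), bm)
```

In practice the BM surface would come from the adapted shape model rather than a toy grid, and a signed distance (which side of the membrane a contact lies on) would be needed to grade trauma.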

Although accuracy is an elementary performance metric for any segmentation pipeline, robustness is key to the usefulness of a tool such as Nautilus, especially given the heterogeneity of image quality expected as input. Our subjective quality assessment indicates that Nautilus can be used with confidence on images of various resolutions, contrasts, and signal-to-noise ratios. To the best of our knowledge, no other work in this domain has assessed robustness on a comprehensive multi-centric dataset with varying image quality. Recently, Fan et al. achieved 85% robustness for cochlea segmentation on their 177-image dataset [44]. In contrast, our qualitative analysis indicates a robustness of around 97% with clinically reasonable performance. Our analysis also enabled us to identify a resolution cutoff beyond which robustness drops: processing images with voxel sizes larger than 0.3 mm results in a significantly greater number of failures or inadequate outputs. This assessment therefore sets the recommended input image resolution.
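A minimal sketch of how such a resolution specification could be enforced at intake (the function name and cutoff handling are hypothetical, not Nautilus' actual interface; only the 0.3 mm figure comes from our analysis):

```python
def within_resolution_spec(spacing_mm, cutoff_mm=0.3):
    """Check an input image's voxel spacing against the recommended cutoff.

    spacing_mm: (sx, sy, sz) voxel spacing in millimetres.
    Returns True when every axis is at or below the cutoff, i.e. the image
    falls within the resolution range where robustness was observed to hold.
    """
    return max(spacing_mm) <= cutoff_mm

ok = within_resolution_spec((0.2, 0.2, 0.2))        # isotropic 0.2 mm: in spec
too_coarse = within_resolution_spec((0.25, 0.25, 0.4))  # 0.4 mm slices: out of spec
```

An out-of-spec image need not be rejected outright; it could instead trigger the cautionary flagging described below, leaving the decision to the user.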

Because the probability of failure of our pipelines is non-zero, especially if out-of-specification images are input to the tool, Nautilus provides cautionary flagging mechanisms that embody our guiding design principle of transparency. Our current set of flags has been 100% sensitive and about 60% specific, meaning that processing failures are very unlikely to go unnoticed, while false alarms (flagged non-failures) occur in less than half of the non-failure cases, which we deemed an acceptable threshold for usability, especially as Nautilus is robust. A further observation is that failures of electrode detection in particular are hard failures, easily noticed by the user. All in all, our flagging mechanisms should be useful for calling for manual verification and potentially discarding faulty analyses.

The set of features proposed by Nautilus provides legitimate grounds for exploring many relevant clinical and basic questions related to cochlear anatomy. Nautilus' statistical model of the electrode insertion trajectory from pre-operative images, for instance, could be used prospectively to aim at a specific angular insertion coverage, and the accuracy of these predictions could then be validated with Nautilus on the post-operative images. Post-operatively, Nautilus makes it possible to explore anatomo-physiologically tuned fitting [78,79] or the relationship between the geometrical configuration of the electrode within the cochlea and clinical outcomes, perhaps including residual hearing. For all its utility, Nautilus could in the future be extended with additional features to address a broader spectrum of investigations, such as those related to the prediction of insertion difficulties during surgical planning, including for abnormal anatomies [80,81]. This would require the delineation of other structures, such as the facial nerve, chorda tympani, or RW. Other imaging modalities (e.g., MRI) and electrode arrays could also be the subject of future developments. Bridging pre- and post-operative use cases, an augmented-reality setup inspired by [82] could be envisaged for intraoperative guidance.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jcm11226640/s1, Figure S1: Landmark prediction model architecture; Figure S2: Pre-operative U-Net used for cochlear segmentation; Figure S3: Post-operative U-Net for cochlear implant detection; Figure S4: An example of a failure flag being triggered and shown to caution the user about possible processing failure; Figure S5: ROC curve for failure detection process; Figure S6: Pre-operative statistics from the qualitative assessment cohort automatically computed from the segmentations; Figure S7: Qualitative segmentation performance with respect to image quality criteria; Figure S8: Qualitative registration performance with respect to image quality criteria; Figure S9: Dice scores per cochlear angle for the cadaver bone dataset (n = 23); Figure S10: Quantitative evaluation of the failure detection pipeline with respect to reviewer's grading.

**Author Contributions:** Conceptualization, J.M., T.D., D.G. and F.P.; methodology, J.M., R.H., P.L.D., T.D., Z.W. and H.D.; software, J.M., R.H., P.L.D., T.D., Z.W., O.M.M. and H.D.; validation, R.H., A.M., C.V. and N.G.; formal analysis, J.M., R.H., T.D., Z.W. and D.G.; investigation, J.M., R.H., P.L.D., A.M., T.D. and Z.W.; resources, R.H., A.M., T.D., A.B., T.L., F.P. and N.G.; data curation, J.M., R.H., P.L.D., A.M., T.D., C.V., A.B. and N.G.; writing—original draft preparation, J.M., R.H., P.L.D. and F.P.; writing—review and editing, J.M., R.H., P.L.D., A.M., T.D., Z.W., D.G., O.M.M., C.V., H.D., A.B., T.L., F.P. and N.G.; visualization, J.M., R.H. and P.L.D.; supervision, J.M., R.H., D.G., H.D., A.B., T.L., F.P. and N.G.; project administration, F.P.; funding acquisition, D.G. and F.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** All clinical CT images used for the development of Nautilus were anonymized. These clinical scans are part of the clinical routine at the Hannover Medical School to pre-operatively evaluate the condition of the cochlea and post-operatively confirm correct intracochlear array placement. The institutional ethics committee at Hannover Medical School approved the use of anonymized imaging data obtained within the clinical routine.

**Informed Consent Statement:** Informed consent was obtained from all patients, and all experiments were performed in accordance with relevant guidelines and regulations and in accordance with the Declaration of Helsinki.

**Data Availability Statement:** The datasets analysed within the scope of the current study cannot be made publicly available as they have been made available to the authors under the specific authorization of the Hannover Medical School. The Hannover Medical School has collected the authorization of their patients to share their data anonymously for third-party analyses in the context of clinical research. This authorization does not extend to the public publication and distribution of the data. Access to the tool is, however, available upon reasonable request at nautilus_info@oticonmedical.com.

**Acknowledgments:** We would like to thank all beta-testers and early users for critical feedback on the platform. We are also grateful to the developers of the many software tools and packages used for this project, including, but not limited to, PyTorch [83], MONAI [37], TorchIO [39], ITK [26], ITK-SNAP [64], Elastix [53], VTK [84], NumPy [85], SciPy [86], scikit-learn [87], Django, Django REST framework, Celery, Kubernetes, Docker, PostgreSQL, Redis, React, react-vtkjs-viewport [88], Chart.js, Plotly.js, Bulma, and PyVista [89].

**Conflicts of Interest:** J.M. is a consultant for, and at the time of this study, R.H., T.D., O.M.M., D.G. and F.P. worked in the Research & Technology Department at Oticon Medical, manufacturer of the Neuro Zti cochlear implant system. The remaining authors declare no conflict of interest.
