**Joint Optic Disc and Cup Segmentation Using Self-Supervised Multimodal Reconstruction Pre-Training**

#### **Álvaro S. Hervella 1,2,\*, Lucía Ramos 1,2, José Rouco 1,2, Jorge Novo 1,2 and Marcos Ortega 1,2**


#### Published: 20 August 2020

**Abstract:** The analysis of the optic disc and cup in retinal images is important for the early diagnosis of glaucoma. In order to improve the joint segmentation of these relevant retinal structures, we propose a novel approach applying the self-supervised multimodal reconstruction of retinal images as pre-training for deep neural networks. The proposed approach is evaluated on different public datasets. The results indicate that the self-supervised multimodal reconstruction pre-training improves the performance of the segmentation. Thus, the proposed approach shows great potential for also improving the interpretable diagnosis of glaucoma.

**Keywords:** deep learning; self-supervised learning; segmentation; eye fundus; glaucoma

#### **1. Introduction**

The detailed analysis of the optic disc and optic cup in retinal images is a key step for the diagnosis of glaucoma. In this regard, several biomarkers derived from the morphological analysis of these two structures have proven useful for the diagnosis and screening of the disease. This has motivated the development of automated methods for the segmentation of the optic disc and cup in retinography. Nowadays, the most successful segmentation approaches are those based on deep learning techniques. However, the training of deep neural networks requires large amounts of annotated data, which can be difficult to obtain.

In this work, we present a novel approach for the joint segmentation of the optic disc and cup in retinography using deep learning [1]. Given the limited size of common datasets, we propose a self-supervised pre-training consisting of a multimodal reconstruction between complementary retinal image modalities. To validate this proposal, we perform experiments on different public datasets.

#### **2. Methodology**

The proposed pre-training consists of the self-supervised multimodal reconstruction of fluorescein angiography from retinography. This multimodal reconstruction leverages the availability of unlabeled multimodal image pairs for learning about the retinal anatomy [2]. In particular, the self-supervised multimodal reconstruction is trained as proposed in Reference [3] using a public dataset of retinography-angiography pairs. After the pre-training phase, the neural network is fine-tuned on the target task, that is, the joint segmentation of the optic disc and cup in retinography. This joint segmentation is approached as a pixel-wise multi-class classification where the neural network learns to predict the likelihood of the different classes [1].
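The two-phase scheme can be sketched in a minimal numpy form. This is an illustrative toy only: the per-pixel features, the surrogate angiography targets, the synthetic labels, and all layer sizes below are assumptions for demonstration, not the authors' actual network or data. Phase 1 trains a shared encoder with an L2 reconstruction loss (retinography features → angiography intensities); phase 2 reuses the pre-trained encoder and trains a pixel-wise softmax head with cross-entropy over three classes (background, disc, cup).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel setup: each "pixel" carries a feature vector
# extracted from the retinography. Sizes are arbitrary toy choices.
n_pix, n_feat, n_hidden, n_cls = 400, 8, 16, 3  # classes: bg / disc / cup

X = rng.normal(size=(n_pix, n_feat))             # retinography features
y_angio = np.tanh(X @ rng.normal(size=n_feat))   # surrogate angiography target

# --- Phase 1: self-supervised multimodal reconstruction pre-training ---
W_enc = rng.normal(size=(n_feat, n_hidden)) * 0.1  # shared encoder weights
w_rec = np.zeros(n_hidden)                         # reconstruction head
lr = 0.1
for _ in range(300):
    h = np.tanh(X @ W_enc)                 # encoder features
    err = (h @ w_rec - y_angio) / n_pix    # gradient of mean L2 loss
    dh = np.outer(err, w_rec)              # backprop into the features
    w_rec -= lr * (h.T @ err)
    W_enc -= lr * (X.T @ (dh * (1.0 - h * h)))  # tanh derivative

# --- Phase 2: fine-tune on joint disc/cup segmentation ---
# Pixel-wise 3-class labels (synthetic here); a softmax head is trained
# with cross-entropy on top of the pre-trained encoder features.
y_cls = rng.integers(0, n_cls, size=n_pix)
W_seg = np.zeros((n_hidden, n_cls))
loss_history = []
for _ in range(300):
    h = np.tanh(X @ W_enc)
    logits = h @ W_seg
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)      # per-pixel class likelihoods
    loss_history.append(-np.mean(np.log(p[np.arange(n_pix), y_cls])))
    grad = p.copy()
    grad[np.arange(n_pix), y_cls] -= 1.0   # softmax cross-entropy gradient
    W_seg -= lr * (h.T @ grad) / n_pix
    # In end-to-end fine-tuning, W_enc would also be updated here.
```

With the head initialized at zero, the initial prediction is uniform over the three classes, so the first cross-entropy value equals ln 3; training then decreases the loss.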

#### **3. Results and Conclusions**

The proposed methodology is evaluated on two different public datasets, which present a representative variety of glaucomatous and healthy retinas. We also perform a comparison against training the network from scratch on the segmentation task, which is the most common approach in the literature. The quantitative evaluation shows that the proposed methodology significantly improves the segmentation of the optic disc and optic cup. In particular, our proposal achieves a Jaccard index of 82.29 and 92.43 for the optic cup and disc, respectively, whereas training from scratch achieves a Jaccard index of 75.36 and 88.19 for the optic cup and disc, respectively. Additionally, in comparison to other alternatives in the literature, our proposal achieves competitive state-of-the-art performance while using less annotated data. Figure 1 depicts a representative example of predicted segmentations, where it is observed that the proposed approach produces more consistent segmentations than the alternative trained from scratch. In conclusion, the self-supervised multimodal reconstruction pre-training proves useful for improving the joint segmentation of the optic disc and cup in retinography.

**Figure 1.** Representative example of predicted segmentations for a glaucomatous retina. (**a**) Retinography, (**b**,**c**) predicted segmentations, and (**d**) ground truth segmentation. The predicted segmentations correspond to (**b**) a neural network trained from scratch and (**c**) the proposed approach.
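The Jaccard index used in the evaluation above is the per-class intersection over union between the predicted and ground-truth segmentation masks. A minimal sketch, with a hypothetical `jaccard_index` helper on small synthetic label maps (0 = background, 1 = disc, 2 = cup):

```python
import numpy as np

def jaccard_index(pred, target, cls):
    """Jaccard index (IoU) for one class: |A ∩ B| / |A ∪ B|."""
    a = pred == cls
    b = target == cls
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # class absent from both masks: perfect agreement
    return np.logical_and(a, b).sum() / union

# Toy 3x3 label maps, not real segmentation outputs.
pred = np.array([[0, 1, 1],
                 [0, 2, 2],
                 [0, 0, 2]])
gt   = np.array([[0, 1, 1],
                 [0, 1, 2],
                 [0, 0, 2]])

iou_disc = jaccard_index(pred, gt, 1)  # intersection 2, union 3 -> 2/3
iou_cup  = jaccard_index(pred, gt, 2)  # intersection 2, union 3 -> 2/3
```

Reported as percentages, an index of 92.43 for the disc means the predicted and reference disc regions overlap on 92.43% of their union.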

**Author Contributions:** A.S.H., J.R. and J.N. contributed to the analysis and design of the computer methods and the experimental evaluation methods, whereas A.S.H. also developed the software and performed the experiments. L.R. and M.O. contributed with domain-specific knowledge. All the authors performed the result analysis. A.S.H. was in charge of writing the manuscript, and all the authors participated in its critical revision and final approval. All authors have read and agreed to the published version of the manuscript.

**Acknowledgments:** This work is supported by Instituto de Salud Carlos III, Government of Spain, and the European Regional Development Fund (ERDF) of the European Union (EU) through the DTS18/00136 research project, and by Ministerio de Ciencia, Innovación y Universidades, Government of Spain, through the RTI2018-095894-B-I00 research project. The authors of this work also receive financial support from the ERDF, the European Social Fund (ESF) of the EU, and Xunta de Galicia through Centro de Investigación de Galicia ref. ED431G 2019/01 and the predoctoral grant contract ref. ED481A-2017/328.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
