**3. Results**

The digital images of the patients' eyes were captured by the device reported in Figure 2, assembled on a Samsung S6 smartphone. A total of 94 patients were involved, aged 19–75 (average 34), 46 female and 48 male, with Hb concentrations in the range of 7.6–17.1 g/dL (average 11.45 g/dL).

Each picture underwent a manual selection process, isolating and cropping regions of palpebral and forniceal conjunctiva, as shown in Figure 10. This step is needed to compare the manually segmented images, considered as the ground truth, with the automatic segmentation output of the proposed model. We evaluated both spatial and color properties of the regions of interest, assessing the metrics most suitable for this specific medical image segmentation problem [49]. The F1 score (F-measure), also known as the Sørensen–Dice coefficient, is the harmonic mean of precision and recall, defined as follows for binary segmentation applications:

$$F\_1 = 2 \cdot (\frac{Precision \cdot Recall}{Precision + Recall}) = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{8}$$
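As a concrete illustration, Eq. (8) can be computed directly from two binary masks by counting TP, FP, and FN pixel-wise. The sketch below (NumPy; the function name is ours, not the paper's) assumes both masks share the same shape:

```python
import numpy as np

def f1_dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Sørensen–Dice / F1 score for two binary masks, as in Eq. (8)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()    # pixels in both masks
    fp = np.logical_and(pred, ~truth).sum()   # predicted but not in ground truth
    fn = np.logical_and(~pred, truth).sum()   # in ground truth but missed
    return 2 * tp / (2 * tp + fp + fn)
```

A perfect overlap yields 1.0, no overlap yields 0.0; note that true negatives never enter the formula.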

**Figure 10.** (**a**) Manually segmented conjunctiva used as ground truth. (**b**) Automatically segmented conjunctiva obtained by the proposed approach. (**c**) Visualization of the overlapping between green ground truth image and white automatically segmented image (*F*1 = 0.904, *accuracy* = 96.41%).

The Dice coefficient, being an overlap measure ranging from 0 to 1, gives a useful perspective on the quality of the segmentation. We are also interested in accounting for the pixels correctly classified as non-relevant (true negatives), which are considered neither by the Dice coefficient nor by the Jaccard similarity. The accuracy metric is helpful in this case, as it outlines the rate of correctly classified pixels over the full image.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{9}$$

With the aim of averaging the overlap metrics, we computed a binary confusion matrix for each image. The entries of this matrix count the pixels belonging to the set intersections and set differences between the ground truth image and the proposed segmentation, visually described in Figure 10c.
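A minimal sketch of this procedure, assuming each image pair is available as NumPy binary arrays (the helper names are ours, not the paper's): confusion counts are accumulated over all pairs, then the dataset-level metrics are derived from the summed matrix.

```python
import numpy as np

def confusion_counts(pred, truth):
    """Per-image binary confusion counts (TP, TN, FP, FN), in pixels."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    return tp, tn, fp, fn

def averaged_metrics(pairs):
    """Sum the per-image confusion matrices, then derive Eq. (9) metrics."""
    tp = tn = fp = fn = 0
    for pred, truth in pairs:
        a, b, c, d = confusion_counts(pred, truth)
        tp, tn, fp, fn = tp + a, tn + b, fp + c, fn + d
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```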

The sums of the per-image confusion matrices, averaged over the dataset, are summarized in Table 2. To give the reader the opportunity to observe the indicators for each sample included in the dataset, the complete per-sample values of the above metrics are reported in Table A1. The high specificity achieved on this segmentation task shows that non-conjunctival regions can be disregarded with proper confidence. On the other hand, sensitivity and F1, being overlap measures, can reasonably fluctuate with higher variance, meaning in most cases that a finer, still meaningful subset of the conjunctival region has been selected.

**Table 2.** Metrics of averaged results of the comparison between manually and automatically segmented images of the conjunctiva.


The good results indicated by the above metrics support the effectiveness of our segmentation algorithm. However, since we are dealing with a rigorous diagnostic procedure, although the precision of the overlap between the proposed and ground-truth ROIs is acceptable, a further investigation of the color properties of the left-out or added regions is worthwhile.

CIELAB is one of the most useful color spaces for erythema analysis and diagnostic computer vision, composed of an approximately uniform three-dimensional space: L\*, a\*, b\*. A widely used dimension of this space, a\*, has a well-known correlation with hemoglobin values in this domain [36–38]. Our purpose is to examine the strength of the linear correlation between the mean a\* values extracted from digital images of conjunctivas and the corresponding Hb concentrations (g/dL) from blood samples taken at almost the same time as the picture-capturing phase (Figure 11). Generalizing the idea of the Pearson correlation coefficient (PCC) from two random variables to two standardized vectors, we can estimate the weight of their linear correlation, ranging from −1 to 1 and defined by the following equation:

$$\rho(a,b) = \frac{1}{N-1} \sum\_{i=1}^{N} (\frac{a\_i - \mu\_a}{\sigma\_a}) \cdot (\frac{b\_i - \mu\_b}{\sigma\_b}) \tag{10}$$
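Eq. (10) can be checked against a library implementation. The sketch below standardizes the two vectors with the sample standard deviation (ddof = 1), so the result matches NumPy's `corrcoef`:

```python
import numpy as np

def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """PCC via standardized vectors, as in Eq. (10) (sample std, ddof=1)."""
    n = len(a)
    za = (a - a.mean()) / a.std(ddof=1)  # standardized vector a
    zb = (b - b.mean()) / b.std(ddof=1)  # standardized vector b
    return float(np.sum(za * zb) / (n - 1))
```

Using the population standard deviation (ddof = 0) with the 1/(N−1) factor instead would bias the estimate; the two choices must be kept consistent.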

**Figure 11.** (**a**) Linear regression and strength of correlation between a\* from manual segmentation and Hb g/dL standardized vectors. (**b**) Linear regression and strength of correlation between a\* from automatic segmentation and Hb g/dL standardized vectors.

We computed the PCC between the mean a\* values of the manually and automatically segmented images and the Hb g/dL values across the entire dataset of 94 samples, obtaining 0.59 and 0.53, respectively. These results confirm not only the moderate linear correlation between a\* and Hb, but also the close agreement between the human-based manual segmentation and the proposed fully automated approach.
