#### *5.2. Real-time Performance*

A Softmax layer was appended to the CNN for real-time gaze estimation. Because this is a four-class classification problem, the Softmax layer mapped the 4-D vector of class scores output by the CNN to a 4-D vector of class probabilities. Each probability quantifies the model's confidence that the input frame belongs to the corresponding class, and the class with the highest probability was taken as the prediction. A sample of the results is shown in Figure 18.
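The Softmax-and-argmax step can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation; the example score vector and the mapping from index to gaze direction are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Map raw class scores (logits) to probabilities that are positive and sum to 1."""
    m = max(scores)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: List[float]) -> Tuple[int, List[float]]:
    """Return the index of the most probable class together with all class probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best, probs


# Hypothetical 4-D score vector for one frame; which gaze direction each index
# denotes is an assumption, not taken from the paper.
label, probs = predict([2.0, 0.5, -1.0, 0.1])
print(label, [round(p, 3) for p in probs])
```

Because Softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw scores; the probabilities are useful mainly as a confidence measure.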



**Figure 18.** Real-time prediction results.

#### *5.3. Classification Results*

Five-fold cross-validation was performed to obtain a reliable estimate of the classification accuracy (i.e., the ratio of the number of correctly classified patterns to the total number of patterns classified). This estimates the model's ability to generalize to new, unseen data. In each fold, 20% of the benchmark dataset, which contains 3200 images from eight users, was held out for testing. With 5-fold cross-validation, five CNNs were trained, and therefore five 4×4 confusion matrices (CMs) were computed per user. These five CMs were accumulated into a cross-validation CM per user, from which that user's cross-validation classification accuracy was computed; these accuracies are shown in Figure 19. The eight per-user cross-validation CMs were then accumulated into an overall CM that represents the model's performance over the whole dataset, as illustrated in Table 2, and the overall CNN classification accuracy was computed from it. Figure 19 shows that the lowest cross-validation classification accuracy among all subjects is 96.875%, which is satisfactory for accurate gaze estimation. Moreover, five out of eight users achieved 100% cross-validation accuracy.
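The cross-validation bookkeeping described above can be sketched in plain Python. This sketch assumes contiguous, unshuffled folds and a hypothetical `train_and_predict` callback standing in for CNN training and inference; each CM has predicted classes as rows and ground-truth classes as columns, following the column convention stated for Table 2.

```python
from typing import Callable, List

Matrix = List[List[int]]


def confusion_matrix(y_true: List[int], y_pred: List[int], n_classes: int = 4) -> Matrix:
    """Count predictions; rows = predicted class, columns = ground-truth class."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[p][t] += 1
    return cm


def add_cm(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two confusion matrices (accumulation across folds or users)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def accuracy(cm: Matrix) -> float:
    """Correctly classified patterns (the diagonal) divided by all classified patterns."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def kfold_indices(n_samples: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n_samples-1 into k contiguous folds (about 20% each for k=5)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_cm(
    y: List[int],
    train_and_predict: Callable[[List[int], List[int]], List[int]],
    k: int = 5,
    n_classes: int = 4,
) -> Matrix:
    """Train/test k times and accumulate the k per-fold confusion matrices."""
    total = [[0] * n_classes for _ in range(n_classes)]
    for test_idx in kfold_indices(len(y), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Hypothetical callback: fit a model on train_idx, return predictions for test_idx.
        preds = train_and_predict(train_idx, test_idx)
        total = add_cm(total, confusion_matrix([y[i] for i in test_idx], preds, n_classes))
    return total


# Toy usage with a dummy "model" that always predicts class 0.
labels = [i % 4 for i in range(40)]
cm = cross_validated_cm(labels, lambda train_idx, test_idx: [0] * len(test_idx))
print(cm, accuracy(cm))
```

Calling `cross_validated_cm` once per user and summing the eight resulting matrices with `add_cm` gives an overall matrix of the kind reported in Table 2.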


**Table 2.** Normalized confusion matrix of the classification results (%).

According to the normalized CM in Table 2, the designed CNN correctly classifies more than 98.4% of the samples in every class. As the ground-truth columns indicate, misclassifications are rare: the right class is confused with the left class in only 1.25% of cases, and the forward class is misclassified in only 1.56% of cases. The overall CNN classification accuracy is 99.3%, indicating that the designed system estimates gaze very reliably. The performance of the implemented system is summarized in Table 3.
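The column normalization behind a table such as Table 2 can be sketched as dividing each ground-truth column by its total. The counts below are made up for illustration and are not the data behind Table 2.

```python
from typing import List


def normalize_columns(cm: List[List[int]]) -> List[List[float]]:
    """Express each entry as a % of its ground-truth column total (columns = ground truth)."""
    n_rows, n_cols = len(cm), len(cm[0])
    col_totals = [sum(cm[r][c] for r in range(n_rows)) for c in range(n_cols)]
    return [
        [100.0 * cm[r][c] / col_totals[c] if col_totals[c] else 0.0 for c in range(n_cols)]
        for r in range(n_rows)
    ]


def overall_accuracy(cm: List[List[int]]) -> float:
    """Pooled accuracy from raw counts: diagonal sum over grand total."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


# Made-up counts (rows = predicted, columns = ground truth); not the data behind Table 2.
counts = [
    [50, 0, 1, 0],
    [0, 50, 0, 0],
    [0, 0, 49, 0],
    [0, 0, 0, 50],
]
pct = normalize_columns(counts)
print([[round(v, 2) for v in row] for row in pct])
print(overall_accuracy(counts))
```

Computing the overall accuracy from pooled counts weights every test image equally; averaging the per-class percentages on the diagonal gives the same value only when every class has the same number of samples.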

**Table 3.** Summary of the implemented system's evaluation metrics.

