**4. Validation**

#### *4.1. Face Detection*

Training of the Single Shot Detector (SSD) neural network began using only the Face Detection in Images dataset [54] for 100,000 epochs; later, with the aim of substantially reducing the error, 30,000 additional epochs were trained with both datasets. Figure 14 shows the training of the neural network from epoch 1 to 130,000.

**Figure 14.** Training error of the Single Shot Detector neural network; note that from epoch 100,000 the error decreases significantly.

While the error metric is a valid indicator, a confusion matrix shows more clearly how the network is performing, since it allows the following scenarios to be considered:

- True positive: a detection that matches a ground-truth face.
- False positive: a detection where there is no face.
- False negative: a face that the network fails to detect.

To evaluate the three previous cases, *Intersection over Union* (IoU) [62] is used as the metric. Table 9 shows the results of the aforementioned confusion matrix.
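As a minimal sketch of the IoU metric mentioned above, the following function computes it for two axis-aligned boxes; the `(x1, y1, x2, y2)` corner convention is an assumption here and may differ from the paper's implementation:

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Intersection is zero when the boxes do not overlap.
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as a true positive when its IoU with a ground-truth box exceeds a threshold (commonly 0.5).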

**Table 9.** Confusion matrix for face detection.


#### *4.2. Obtaining Age and Gender*

The age-estimation part is measured by the error obtained in the training and testing phases, since it is configured as a regressor. In training, an average error of 0.11 years is achieved at the end of the 2600 epochs, while on the test set the average error is 10.44 years. Figures 15 and 16 show the evolution of the error over the 2600 epochs for training and testing, respectively.

**Figure 15.** Evolution of the error in the training phase of the age estimation.

**Figure 16.** Evolution of the error in the test phase of the age estimation.

The gender-classification part is measured by its accuracy, yielding 83.73% in the training phase and 80.09% in the testing phase. The evolution of these values is shown in Figures 17 and 18, respectively.
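Accuracy and the per-class counts behind a binary confusion matrix like the one in Table 10 can be computed from paired label lists; the helper below is a hypothetical sketch, not the authors' evaluation code:

```python
def gender_metrics(y_true, y_pred, labels=("female", "male")):
    """Confusion matrix (as nested dict) and accuracy for a binary classifier."""
    matrix = {t: {p: 0 for p in labels} for t in labels}
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    # Correct predictions lie on the matrix diagonal.
    correct = sum(matrix[l][l] for l in labels)
    return matrix, correct / len(y_true)
```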

**Figure 17.** Evolution of accuracy in the training phase of gender classification.

**Figure 18.** Evolution of accuracy in the test phase of gender classification.

Finally, the confusion matrix of the gender classification is presented in Table 10.

**Table 10.** Gender classification confusion matrix.


#### *4.3. Augmented Reality*

Figure 19 shows the image displayed on the screen in a real case. The company logo (Bubble Town® [63]) appears at the upper left of the face, while the image of the drink, together with the drink's name and modality, is shown in the lower central part. However, even though the NVIDIA Jetson Xavier is a powerful device, it proved insufficient for the project to run smoothly, since there is a delay between the video captured by the camera and the image displayed on the screen (with augmented reality).
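A camera-to-screen delay like the one described can be diagnosed by timing the per-frame processing stage; the sketch below uses a hypothetical `process_frame` stand-in for the real capture/augmentation pipeline:

```python
import time

def average_latency(process_frame, frames):
    """Average per-frame processing time, in seconds."""
    timings = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

If the average exceeds the camera's frame interval (e.g., 1/30 s for 30 fps), frames queue up and the on-screen image lags behind the live video.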

Another important aspect of this work is the system's capacity to generate recommendations for more than one person at a time (see Figure 20): if there are several faces in the camera image, the system recognizes them and generates recommendations independently. So that each user knows their recommended drink, the thought balloons for each person and the banner change from time to time; to identify who the banner belongs to, it has the same color as the corresponding thought balloon.

The advertising totem was in operation for several days in the Laboratory of Computational Cognitive Sciences of the Center for Computing Research [64], showing that it worked properly at all times. To replicate these results, the reader can visit https://github.com/vicleo14/PublicidadBT (accessed on 20 December 2021). A short video demo can be found at https://tinyurl.com/2p8bf68s (accessed on 20 December 2021). Regarding working time, once the system detected a face, it started generating the augmented reality scene in 2.03 s on average; lastly, in a small poll of 30 users, 86.66% of them liked our beverage recommendation; see all data in Table A1 and the logfile at https://tinyurl.com/59ev279p (accessed on 18 December 2021).
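The pairing of each banner with a matching thought-balloon color can be sketched as a simple palette assignment per detected face; the palette values and `face_ids` argument below are illustrative assumptions, not the project's actual implementation:

```python
# Hypothetical BGR palette; cycles when more faces appear than colors.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)]

def assign_colors(face_ids):
    """Map each detected face to a color shared by its balloon and banner."""
    return {fid: PALETTE[i % len(PALETTE)] for i, fid in enumerate(face_ids)}
```

Because the same color is drawn on both the thought balloon and the rotating banner, a viewer can tell which recommendation is theirs even with several faces on screen.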

**Figure 19.** Final result of augmented reality.

**Figure 20.** Final result of augmented reality with two people using our advertising totem.
