In this section, the model performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). We also computed confusion matrices to analyze misclassifications. In addition, we performed an in-depth analysis of each technique on our four custom datasets, highlighting each technique’s precision and recall and comparing the results with those of our EmotionNET technique. Finally, we compared and discussed EmotionNET-based QoE on the two networks.
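To make the evaluation protocol concrete, the following is a minimal sketch of how these metrics can be computed with scikit-learn; the variable names (y_true, y_pred, y_score) are placeholders, and this illustrates the standard API rather than the exact code used in the study.

```python
# Minimal sketch of the evaluation metrics used in this section, assuming
# scikit-learn; y_true, y_pred, and y_score are placeholder names.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

def evaluate(y_true, y_pred, y_score):
    """y_true/y_pred: integer emotion labels; y_score: per-class probabilities."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1": f1_score(y_true, y_pred, average="macro", zero_division=0),
        # One-vs-rest ROC-AUC averaged over the seven emotion classes.
        "roc_auc": roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```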
3.1. Training and Validation Accuracy and Loss of the Four Techniques
In this section, we analyze each technique’s training and validation accuracy and loss, and we assess overfitting.
Figure 3 shows the accuracy and loss curves for each technique and a comparison focusing on why EmotionNET might be considered the best technique for emotion-based QoE analysis.
EmotionNET: The training accuracy steadily increases, reaching close to 0.99 by the end of 50 epochs, while the validation accuracy also improves consistently, stabilizing around 0.88. This indicates that the model is learning well from the training data and generalizing effectively to unseen data. The training loss decreases sharply, settling around 0.26, while the validation loss also decreases, but at a slower rate, reaching about 0.34. The gap between the training and validation loss is relatively small, suggesting that the model is not significantly overfitting.
ConvoNEXT: The training accuracy quickly reaches 0.99 within the first few epochs, while the validation accuracy stands at around 0.92, indicating a strong performance but possibly a minor generalization gap. The training loss drops significantly, nearing 0, which is typical of a model that fits the training data very well. However, while the validation loss decreases at first, it starts to increase after about 10 epochs, showing that the model is overfitting.
EfficientNET: The training accuracy also climbs quickly, as with ConvoNEXT, and the validation accuracy reaches 0.92. This model exhibits the same overfitting problem: the validation loss decreases at the beginning, while the model is still learning well, but starts increasing after about 10 epochs. This is a clear indication that the model is overfitting and no longer improving in general terms.
ViT: With this technique, the training accuracy improves rapidly and achieves almost perfect accuracy. The validation accuracy reaches 0.91, showing that the model learns well. However, the training loss decreases quickly, as with the other techniques, and, while the validation loss decreases at the beginning, it starts increasing after about 10 epochs, which means the model is overfitting.
Comparative analysis of training and validation accuracy and loss: We tested the four model techniques on our custom-created dataset. The training and validation losses show that the ConvoNEXT, EfficientNET, and ViT model techniques overfit our custom emotion recognition dataset. EmotionNET performs better, with a balanced relationship between training and validation. Although its validation accuracy (about 0.88) is slightly lower than that of ConvoNEXT and EfficientNET (about 0.92), it has the smallest gap between the training and validation loss. This means that the EmotionNET technique performs better on emotion recognition datasets and overfits less than the other model techniques. The other techniques achieve high training accuracy but suffer from increasing validation loss, which means they are overfitting; this limits their effectiveness on unseen data. From the graphs and data, the EmotionNET model technique shows the best and most balanced performance among the four model techniques because it balances learning and validation on the emotion recognition data. A simple check for this overfitting signature is sketched below.
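As an illustration, here is a minimal sketch of that check, assuming a Keras-style history dictionary with "loss" and "val_loss" lists; the names and the five-epoch window are our own assumptions, not part of the original training code.

```python
# Minimal sketch of the overfitting check described above, assuming a
# Keras-style history dict; the names and window size are placeholders.
import numpy as np

def overfitting_report(history, window=5):
    """Flag overfitting when validation loss bottoms out well before training ends."""
    train_loss = np.asarray(history["loss"])
    val_loss = np.asarray(history["val_loss"])
    final_gap = float(val_loss[-1] - train_loss[-1])  # train/val loss gap at the end
    best_epoch = int(np.argmin(val_loss)) + 1         # epoch with the lowest val loss
    # Overfitting signature: the val-loss minimum occurs long before the final epoch
    # (e.g., around epoch 10 of 50 for ConvoNEXT, EfficientNET, and ViT).
    overfitting = best_epoch < len(val_loss) - window
    return {"final_gap": final_gap, "best_epoch": best_epoch, "overfitting": overfitting}
```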
3.2. Four Model Techniques’ Performance Analysis and Comparison
In this section, we evaluate the performance of the four model techniques as binary (one-vs-rest) classifiers for each emotion. The ROC curves help us to understand each technique’s performance in depth with respect to its positive and negative classes.
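For reference, each curve in the following figures is built from the standard one-vs-rest quantities; the definitions below (with TP, FP, TN, and FN denoting the per-emotion true/false positives and negatives) are the usual ones and are stated here only for clarity:

```latex
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad
\mathrm{FPR} = \frac{FP}{FP + TN}, \qquad
\mathrm{AUC} = \int_0^1 \mathrm{TPR}(f)\,\mathrm{d}f
```

Here, TPR(f) is the true positive rate at false positive rate f; an AUC of 1.0 indicates perfect separation, while 0.5 corresponds to random guessing.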
EmotionNET Model Technique ROC: The ROC curves in Figure 4 show the classification performance of the EmotionNET model technique across seven emotion categories (Angry, Disgusted, Fearful, Happy, Neutral, Sad, and Surprised) for four datasets: the WiFi (Boosteroid and NVIDIA) datasets and the mobile data (Boosteroid and NVIDIA) datasets. Each emotion category shows varying levels of classification accuracy, measured by the AUC (area under the curve), where a higher AUC value indicates better performance. For the “Angry” emotion, the EmotionNET technique performs excellently across three datasets, with AUC values close to 0.99, but the mobile data NVIDIA dataset shows much lower performance, with an AUC of 0.78.
Similarly, for the “Disgusted” emotion, this technique achieves AUC values above 0.96 for most datasets, while the mobile data NVIDIA dataset lags with an AUC of 0.84. In the case of the “Fearful” emotion, three datasets again perform strongly, achieving near-perfect AUC values around 0.99 to 1.00, while the mobile data NVIDIA dataset records a significantly lower AUC of 0.70. The pattern is similar for the “Happy” emotion, where AUC values for three datasets are as high as 0.99 to 1.00, while the mobile data NVIDIA dataset lags behind at 0.76. The “Neutral” emotion shows similarly high AUC values across most datasets, ranging from 0.98 to 0.99, except for the mobile data NVIDIA dataset, which achieves a moderate AUC of 0.81. For the “Sad” emotion, the model performs well on the WiFi (Boosteroid and NVIDIA) and mobile data Boosteroid datasets (AUC = 0.98), but the mobile data NVIDIA dataset records its lowest performance overall, with an AUC of 0.65. Finally, the “Surprised” emotion follows the general trend, with high AUC values around 0.99 for most datasets, while the mobile data NVIDIA dataset performs comparatively lower, with an AUC of 0.85. The EmotionNET model technique demonstrates a strong classification performance for most datasets, with AUC values close to 1.0 for many emotions. However, the mobile data NVIDIA dataset consistently underperforms across all emotion categories, suggesting that its data may be more challenging or of lower quality, making it harder for the model to classify emotions accurately. The EmotionNET model demonstrates superior performance in a low-latency (WiFi) environment, achieving high precision and recall across most emotional categories. However, under high-latency conditions (mobile data, especially the NVIDIA dataset), the classification performance dropped significantly, particularly for emotions such as “Disgusted” and “Surprised”. The increased latency likely caused a delay in facial expression responses, leading to misclassifications. While the model performed better than other deep learning techniques in real-time gaming scenarios, future enhancements are required to adapt to dynamic network conditions.
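To make the per-emotion analysis concrete, the sketch below shows one way to derive such curves with scikit-learn; it is an illustrative example under our own assumptions (placeholder variable names), not the study’s exact code.

```python
# Minimal sketch of the per-emotion (one-vs-rest) ROC analysis, assuming
# scikit-learn; the variable names are placeholders.
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize

EMOTIONS = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]

def per_emotion_roc(y_true, y_score):
    """y_true: integer labels in [0, 7); y_score: (n_samples, 7) probabilities."""
    y_bin = label_binarize(y_true, classes=list(range(len(EMOTIONS))))
    curves = {}
    for i, name in enumerate(EMOTIONS):
        # Treat emotion i as the positive class and all others as negative.
        fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
        curves[name] = (fpr, tpr, auc(fpr, tpr))  # per-emotion AUC
    return curves
```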
ConvoNEXT Model Technique ROC: Figure 5 highlights the ROC curves for the different emotion categories across the four datasets. Each ROC curve represents the performance of a classification model for a specific emotion category: Angry, Disgusted, Fearful, Happy, Neutral, Sad, and Surprised. The WiFi Boosteroid dataset consistently demonstrates excellent performance across all emotion categories. For Angry, Fearful, Sad, and Surprised, the AUC values are approximately 0.98 to 0.99, indicating that the model can effectively classify these emotions with minimal false positives. The model performs exceptionally well for Neutral and Happy, achieving an AUC of 1.00, which suggests near-perfect classification for these emotions. Even for the more challenging category of Disgusted, the model maintains a high AUC of 0.96, further emphasizing the reliability of this dataset in producing accurate emotion detection results. Similar to the WiFi Boosteroid dataset, the mobile data Boosteroid dataset shows a strong classification performance across all emotions. For emotions such as Angry, Happy, Sad, and Neutral, the AUC values range from 0.97 to 0.99, which indicates excellent model performance. The model handles challenging emotions like Fearful and Disgusted well, with AUC values around 0.98. While the AUC is slightly lower for Surprised, at 0.90, the dataset still performs effectively, showing the model’s ability to generalize well across different emotion categories.
The WiFi NVIDIA dataset exhibits a significant drop in performance compared to the Boosteroid datasets. For emotions such as Angry, Fearful, and Sad, the AUC values fall between 0.67 and 0.83, reflecting moderate-to-poor classification accuracy. The ROC curves for these emotions are much closer to the diagonal line, suggesting that the model struggles to distinguish between different emotional states in this dataset. In particular, for the Disgusted category, the performance is notably weak, with the model unable to effectively differentiate this emotion from others. While it performs slightly better for Happy and Neutral, with AUC values close to 0.83, the overall performance remains suboptimal across all emotion categories. The mobile data NVIDIA dataset, like its Boosteroid counterpart, shows a high classification performance for most emotions. Emotions like Angry, Fearful, Sad, Happy, and Neutral have AUC values between 0.97 and 0.99, signifying excellent classification accuracy with few false positives. However, the performance for Surprised is slightly lower, with an AUC of 0.90, indicating a small drop in accuracy. The model performs similarly well for Disgusted, maintaining an AUC of around 0.98, suggesting the model can handle complex emotions in this dataset effectively. The WiFi Boosteroid dataset outperforms the others, especially for emotions like Neutral and Happy, where it achieves an AUC of 1.00. The mobile data (Boosteroid and NVIDIA) datasets also show a strong performance across most emotions. In contrast, the WiFi NVIDIA dataset exhibits a significantly weaker performance, particularly for emotions like Angry, Fearful, and Disgusted, where the model struggles to classify emotions accurately.
EfficientNET Model Technique ROC: Similarly, Figure 6 contains ROC curves for the seven emotional categories: Angry, Disgusted, Fearful, Happy, Sad, Neutral, and Surprised. The performance on the WiFi Boosteroid dataset is moderate. For emotions such as Angry and Fearful, the AUC values are 0.78 and 0.76, indicating a reasonable classification accuracy but with room for improvement. For Happy and Neutral, the AUC scores are 0.96 and 0.81, representing a much better performance, especially for the Happy emotion, where the model exhibits excellent classification capability. However, for Disgusted, the model’s performance is weak, with an AUC of 0.43, showing considerable difficulty in distinguishing this emotion. The Sad and Surprised emotions show a reasonable performance, with AUC values of 0.64 and 0.70, respectively, highlighting that the model handles these categories moderately well but could improve. The WiFi NVIDIA dataset shows a notable drop in performance compared to the others. For the Angry and Sad emotions, the AUC values are 0.74 and 0.64, showing a moderate classification accuracy but still far from ideal. For Fearful, the AUC is 0.84, indicating the model performs relatively better, with fewer false positives and improved sensitivity. The Happy emotion is one of the strongest in this dataset, with an AUC of 0.94, but Disgusted remains poorly classified, with a low AUC of 0.48, showing considerable difficulty in distinguishing this emotion from others. For Neutral and Surprised, the AUC values are 0.64 and 0.65, respectively, suggesting the model has trouble accurately predicting these emotions in this dataset.
The model performs quite well on the mobile data Boosteroid dataset for most emotions. For Angry, the AUC is 0.67, reflecting some challenges in classification, but it improves significantly for Fearful, with an AUC of 0.81. The Happy and Neutral emotions exhibit a strong performance, with AUC values of 0.97 and 0.81, indicating that the model can classify these emotions accurately, with few false positives. For Disgusted, the model continues to struggle, achieving an AUC of 0.49, similar to the other datasets. The AUC for Sad is 0.70, showing moderate classification, while Surprised achieves a score of 0.64, indicating room for improvement. The mobile data NVIDIA dataset shows the highest performance for many emotions. For Angry, the AUC is 0.90, indicating a strong classification accuracy with minimal false positives. For Fearful, the model also performs well, with an AUC of 0.90, and, for Happy, it achieves an AUC of 0.98, showing almost perfect classification for this emotion. The model handles Disgusted better in this dataset than in the others, with an AUC of 0.81, which is a noticeable improvement. For Neutral and Sad, the AUC values are 0.90 and 0.97, respectively, indicating a high performance, particularly for Sad. Surprised also achieves a relatively good score, with an AUC of 0.90, showing that the model performs strongly across most emotions in this dataset.
ViT Model Technique ROC: Furthermore, Figure 7 presents the ROC curves for the seven emotion categories (“Angry”, “Disgusted”, “Fearful”, “Happy”, “Neutral”, “Sad”, and “Surprised”) across the four datasets. The ROC curve is a graphical representation of a model’s ability to distinguish between classes, with the True Positive Rate plotted against the False Positive Rate. The AUC is also provided as a metric to summarize the overall performance of each model, with values closer to 1.0 indicating better performance. The WiFi Boosteroid dataset demonstrates a consistently strong performance across all emotions. For Angry, Fearful, Sad, Happy, Neutral, and Surprised, the AUC values range from 0.96 to 0.99, indicating a very high classification accuracy with minimal false positives. For Disgusted, the AUC is slightly lower at 0.97 but still suggests an excellent performance, especially compared to the other datasets. The ROC curve is close to the ideal top-left corner for all emotions, suggesting a reliable model on this dataset. The WiFi NVIDIA dataset shows a more varied performance. For Angry, the AUC is 0.94, indicating a good but slightly lower classification accuracy than on the WiFi Boosteroid dataset. For Fearful, Neutral, and Happy, the AUC values range between 0.95 and 0.98, reflecting an excellent classification performance. For Disgusted, the model struggles, with an AUC of 0.60, highlighting the difficulty in distinguishing this emotion in this dataset. The ROC curve for Disgusted is far from the top-left corner, indicating a higher rate of false positives. The Sad and Surprised emotions have strong AUC values of 0.96 and 0.94, showing high reliability for these categories.
The mobile data Boosteroid dataset displays a solid performance across all emotions. For Angry, Fearful, Sad, Neutral, and Happy, the AUC values range between 0.97 and 1.00, demonstrating near-perfect classification, especially for Happy, where the AUC reaches 1.00. The performance for Disgusted is strong, with an AUC of 0.98, showcasing the model’s ability to handle this challenging emotion well in this dataset. Surprised has a slightly lower AUC of 0.97 but still indicates highly accurate classification with minimal false positives. The mobile data NVIDIA dataset shows mixed results, with comparatively stronger classification for Angry (AUC = 0.85) and Happy (AUC = 0.90). The model performs moderately for Fearful and Neutral, with AUC values of 0.85 and 0.81, respectively, indicating reasonable classification but with some room for improvement. For Disgusted, the model struggles significantly, achieving the lowest AUC at 0.40, suggesting a high rate of misclassification. For Sad and Surprised, the AUC values are 0.76 and 0.86, respectively, showing a moderate performance but significantly lower than in the other datasets.
ROC-based model technique comparison: EmotionNET stands out as the top-performing model across all emotions and datasets, demonstrating a consistently high performance, particularly in the “Happy”, “Sad”, and “Neutral” categories, where it achieves near-perfect AUC scores. Although it excels overall, EmotionNET, like the other models, encounters challenges in correctly classifying the “Disgusted” emotion, indicating that this emotion is inherently more difficult to predict across different datasets. In comparison, ConvoNEXT also performs well but exhibits more variation in AUC scores across different emotions. It competes closely with EmotionNET in categories like “Happy” and “Fearful” but generally falls short in others, such as “Angry” and “Neutral”. EfficientNET shows significant variability across emotions, with its performance being less consistent than that of both EmotionNET and ConvoNEXT. While it performs adequately for “Angry” and “Sad”, it struggles notably with the “Disgusted” emotion. Lastly, ViT shows a strong performance for several emotions, particularly “Happy” and “Sad”, sometimes matching or even surpassing EmotionNET. However, ViT also exhibits more pronounced fluctuations in performance across different emotions, similar to ConvoNEXT and EfficientNET. EmotionNET is the best choice for deployment, given its superior and more consistent performance across all emotions, though all models show a common weakness in classifying the “Disgusted” emotion, highlighting an area for potential improvement. EmotionNET demonstrated superior accuracy and generalizability with minimal overfitting. ConvoNEXT and EfficientNET, while achieving higher training accuracy, suffered from overfitting, making them less effective in real-world scenarios. ViT exhibited an inconsistent classification performance across different network conditions, suggesting that transformer-based architectures may require additional fine-tuning for emotion recognition in QoE assessment.
3.3. Precision and Recall of Four Model Techniques on Four Custom Datasets
In this section, we analyze the precision and recall performance of the four model techniques, showing each technique’s performance on our four custom datasets.
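The per-class tables discussed below (Tables 3–6) have the shape of a standard classification report; the following is a minimal sketch of how such a report can be generated with scikit-learn, where y_true and y_pred are placeholder label arrays.

```python
# Minimal sketch of a per-class precision/recall report in the style of
# Tables 3-6, assuming scikit-learn; y_true and y_pred are placeholders.
from sklearn.metrics import classification_report

EMOTIONS = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]

def per_class_report(y_true, y_pred):
    """Per-emotion precision/recall plus accuracy, macro avg, and weighted avg."""
    return classification_report(
        y_true, y_pred,
        target_names=EMOTIONS,
        digits=6,          # the tables report six decimal places
        zero_division=0,   # undetected classes score 0 instead of raising a warning
    )
```

With output_dict=True, the same call returns a dictionary, which is convenient for producing the precision/recall plots shown in Figures 8–11.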
EmotionNET model technique prediction performance: In Table 3, for the WiFi Boosteroid dataset, the model technique demonstrates high precision and recall across most emotion categories, with “Angry” having a precision of 0.996451 and a recall of 0.943507. The “Disgusted” category shows a precision of 0.998307 and a recall of 0.894303, suggesting strong but slightly lower recall. For “Fearful”, the model achieves a precision of 0.8966 and a recall of 0.958045, indicating high recall but some false positives. “Happy” is detected with a precision of 0.980498 and a recall of 0.946979, showing a strong overall performance. The “Neutral” category, while having a high recall of 0.992221, shows a lower precision of 0.792283, possibly due to misclassification with other emotions. “Sad” is well balanced, with a precision of 0.948759 and a recall of 0.933745. However, the “Surprised” category has a lower precision of 0.509869 but a high recall of 0.932985, indicating frequent detection but some confusion with other emotions. Overall, the model achieves an accuracy of 0.94432, with a macro average precision of 0.874681 and recall of 0.943112, and a weighted average precision of 0.954899 and recall of 0.94432. For the WiFi NVIDIA dataset, the model maintains an excellent performance, especially in the “Angry” category, with a precision of 0.992839 and a recall of 0.940402. The “Disgusted” category shows near-perfect precision at 0.999209, though with a slightly lower recall of 0.845461. “Fearful” is detected with a precision of 0.903822 and a recall of 0.95817, reflecting good detection with some false positives. The “Happy” category exhibits a strong performance, with a precision of 0.996376 and a recall of 0.941228. The “Neutral” category has a slightly lower precision of 0.837867 but a high recall of 0.99181. The “Sad” category shows good detection, with a precision of 0.79293 and a recall of 0.932435. However, “Surprised” has lower precision at 0.684749 but high recall at 0.947592. The accuracy for this dataset is 0.94901, with a macro average precision of 0.886828 and recall of 0.936854, and a weighted average precision of 0.949317 and recall of 0.94901.
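For clarity, the per-class metrics and the macro and weighted averages quoted in this and the following tables are assumed to follow the standard definitions (written here for precision P; recall R is averaged the same way):

```latex
P_c = \frac{TP_c}{TP_c + FP_c}, \qquad R_c = \frac{TP_c}{TP_c + FN_c},
\qquad
\text{macro avg} = \frac{1}{C}\sum_{c=1}^{C} P_c, \qquad
\text{weighted avg} = \frac{1}{N}\sum_{c=1}^{C} n_c\, P_c
```

where C = 7 is the number of emotion classes, n_c is the support of class c, and N is the total number of samples.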
In the mobile data Boosteroid dataset, this technique performs well, particularly in the “Angry” category, with a precision of 0.997275 and a recall of 0.912834. The “Disgusted” category also shows high precision at 0.997842 but slightly lower recall at 0.8358. For “Fearful”, the model achieves a precision of 0.939096 and a recall of 0.952018, indicating strong detection. “Happy” is detected with a near-perfect precision of 0.993108 and a recall of 0.97137. The “Neutral” category, while having a very high recall of 0.992797, shows a lower precision of 0.797852. The “Sad” category exhibits good detection, with a precision of 0.837156 and a recall of 0.967115. “Surprised” is detected with both a high precision of 0.93585 and a high recall of 0.927648. Overall, the model achieves an accuracy of 0.937588, with a macro average precision of 0.930802 and recall of 0.937088, and a weighted average precision of 0.945943 and recall of 0.937588. However, in the mobile data NVIDIA dataset, the model’s performance declines significantly. The “Angry” category shows a precision of 0.585788 and a recall of 0.234193, indicating poor performance. The “Disgusted” category has both precision and recall at 0, indicating a complete failure in detection. “Fearful” is detected with a precision of 0.237341 and a recall of 0.417283, showing low performance. The “Happy” category has better precision at 0.893458, though recall is still low at 0.490776. The “Neutral” category shows good precision at 0.949068 but lower recall at 0.83284. The “Sad” category is detected with a precision of 0.120353 and a recall of 0.25301, indicating low performance. The “Surprised” category also performs poorly, with a precision of 0.490042 and a recall of 0.044269. The overall accuracy for this dataset is 0.407133, with a macro average precision of 0.374598 and recall of 0.320782, and a weighted average precision of 0.543385 and recall of 0.407133, reflecting weak detection capability across categories.
Figure 8 gives a graphical representation of the EmotionNET model’s precision and recall performance. The classification results show that EmotionNET’s performance varies with network stability. The model maintained high AUC scores (0.99) on WiFi networks, indicating accurate emotion recognition. However, under mobile data conditions, the AUC scores dropped (e.g., 0.70 for “Fearful” and 0.76 for “Happy” in the NVIDIA dataset), confirming that unstable latency impacts facial emotion-based QoE estimation. The results suggest that latency-aware adaptation mechanisms should be integrated into future model versions.
ConvoNEXT model technique prediction performance: In Table 4, the ConvoNEXT model technique’s performance varies across the different datasets, as demonstrated by the evaluation metrics provided. In the WiFi Boosteroid dataset, the model shows high precision for the “Angry” category at 0.975075, with a recall of 0.76526, indicating good accuracy in predicting “Angry” emotions but with some false negatives. The “Disgusted” category achieves both extremely high precision and recall, at 0.999612 and 0.930386, respectively, reflecting the model’s excellent performance in this category. For “Fearful” emotions, the precision is lower at 0.652562, but the recall is higher at 0.807772, meaning the model detects most instances of this emotion, albeit with some classification errors. The “Happy” category also shows a strong performance, with a precision of 0.892505 and a recall of 0.827109. However, the model’s performance in the “Neutral” category is mixed, with a low precision of 0.405371 but an exceptionally high recall of 0.977298, suggesting that while many instances are classified as “Neutral”, not all are correct. The “Sad” category has moderate precision and recall values of 0.672179 and 0.733344, respectively. The model struggles most with the “Surprised” category, where precision and recall are the lowest, at 0.443454 and 0.569191, respectively. Overall, the model’s accuracy for this dataset is 0.752011, with a macro average precision of 0.72012 and recall of 0.723694, and a weighted average precision of 0.848551. In the WiFi NVIDIA dataset, the model shows slightly lower precision for the “Angry” category, at 0.931996, and a lower recall of 0.7507. For the “Disgusted” category, while precision remains high at 0.99824, recall drops significantly to 0.241543, indicating that many instances go undetected. The “Fearful” category sees an improvement in precision to 0.678374, with recall remaining high at 0.800021. The “Happy” category maintains very high precision at 0.979006 and recall at 0.819233, demonstrating a strong performance. In the “Neutral” category, precision drops to 0.467218, though recall remains high at 0.972754. The “Sad” category shows a moderate performance, with a precision of 0.502351 and a recall of 0.761708, while the “Surprised” category has a precision of 0.618405 and a recall of 0.549401. The accuracy for this dataset is 0.75211, with a macro average precision of 0.73937 and recall of 0.705863, and a weighted average precision of 0.834592.
In the mobile data Boosteroid dataset, this technique maintains high precision for the “Angry” category at 0.981519, but recall drops to 0.464809. For the “Disgusted” category, precision remains high at 0.997433, but recall is low at 0.26333, indicating accurate but insensitive detection. The “Fearful” category has moderate precision at 0.696016 and higher recall at 0.756588. Precision and recall are high in the “Happy” category, at 0.985466 and 0.807506, respectively. The “Neutral” category shows low precision at 0.321687 but high recall at 0.973237, suggesting the model is more sensitive but less accurate in this category. The “Sad” category shows moderate precision of 0.540343 with a higher recall of 0.863147. However, the model struggles with the “Surprised” category, where both precision and recall are low, at 0.349073 and 0.341246, respectively. The overall accuracy for this dataset drops to 0.622849, with a macro average precision of 0.775363 and recall of 0.641776, and a weighted average precision of 0.826255. Finally, in the mobile data NVIDIA dataset, the model’s performance declines further, with precision dropping to 0.645119 and recall to 0.221551 for the “Angry” category. The “Disgusted” category maintains high precision at 0.997433 but very low recall at 0.26333. Precision in the “Fearful” category drops significantly to 0.214528, with a recall of 0.484776. The “Happy” category has high precision at 0.97986 but lower recall at 0.508122. For the “Neutral” category, precision increases to 0.913174, and recall is very high at 0.879558. Both precision and recall are low in the “Sad” category, at 0.214528 and 0.246687, respectively. The “Surprised” category also shows low precision and recall, at 0.154091 and 0.119015, respectively. The overall accuracy for this dataset decreases further to 0.609476, with a macro average precision of 0.436237 and recall of 0.374643, and a weighted average precision of 0.609476. This indicates a significantly lower performance on the mobile data NVIDIA dataset, particularly for certain emotion categories.
Furthermore, Figure 9 shows a graphical representation of the ConvoNEXT model technique’s precision and recall.
EfficientNET model technique prediction performance: Table 5 presents the performance of the EfficientNET model technique evaluated across the four datasets: WiFi (Boosteroid and NVIDIA) and mobile data (Boosteroid and NVIDIA). For the WiFi Boosteroid dataset, the model shows high precision in the “Angry” category (0.900671) but low recall (0.319717). The “Fearful” category has a lower precision (0.231446) but a relatively high recall (0.582723), indicating better detection ability at the cost of precision. The “Happy” category shows a balanced performance, with precision and recall values of 0.691593 and 0.637561, respectively. However, categories like “Disgusted” and “Surprised” perform poorly, with both precision and recall at 0 for “Disgusted” and low values for “Surprised” (precision: 0.081096, recall: 0.121697). The accuracy for this dataset is 0.373277, with a macro average precision of 0.34118 and recall of 0.392742. The weighted average precision is 0.598176, with a recall of 0.373277. For the WiFi NVIDIA dataset, the “Angry” category achieves a precision of 0.864387 and a recall of 0.293137. The “Fearful” category shows a precision of 0.253531 and a recall of 0.573052, while the “Happy” category performs well, with a precision of 0.907626 and a recall of 0.63115. The “Neutral” category also shows high recall (0.850652) but lower precision (0.241137). Again, “Disgusted” has no detection, and “Surprised” has limited detection ability (precision: 0.114437, recall: 0.117328). The accuracy for this dataset is 0.411646, with a macro average precision of 0.362561 and recall of 0.37906, and a weighted average precision of 0.617448 with a recall of 0.411646.
In the mobile data Boosteroid dataset, the “Angry” category has a precision of 0.828677 but a very low recall of 0.054913. The “Fearful” category shows a moderate performance, with precision and recall values of 0.360429 and 0.546636, respectively. The “Happy” category performs well, with a precision of 0.930001 and a recall of 0.670511. The “Neutral” category has a high recall (0.909071) but low precision (0.19181). The “Surprised” category performs poorly, with a precision of 0.502004 and a recall of 0.038783. The overall accuracy for this dataset is the lowest among the four at 0.339924, with a macro average precision of 0.452749 and recall of 0.368804. The weighted average precision is 0.570115, with a recall of 0.339924. Lastly, the mobile data NVIDIA dataset demonstrates the best overall performance. The “Angry” category achieves a precision of 0.983136 and a recall of 0.715405. The “Fearful” category shows high precision (0.65981) and recall (0.955705), while the “Happy” category has a very strong performance, with precision and recall values of 0.983295 and 0.944172, respectively. The “Neutral” category also performs well, with a precision of 0.727481 and a recall of 0.96838. The “Surprised” category has a precision of 0.830032 and a recall of 0.775642. The overall accuracy for this dataset is the highest at 0.867067, with a macro average precision of 0.849962 and recall of 0.818839, and a weighted average precision of 0.899558 with a recall of 0.867067.
Moreover, Figure 10 shows a graphical representation of the EfficientNET model technique’s precision and recall performance.
ViT model technique prediction performance: In Table 6, for the WiFi Boosteroid dataset, the model technique performed exceptionally well, with high precision and recall across most categories. Specifically, this technique achieved a precision of 0.9957 and a recall of 0.9133 for the “Angry” category, a precision of 0.9999 and a recall of 0.8943 for “Disgusted”, and a precision of 0.9859 with a recall of 0.9551 for “Happy”. The “Neutral” category had a precision of 0.7187 but a high recall of 0.9888. The overall accuracy was strong at 0.9249, with macro averages of 0.8611 in precision and 0.9340 in recall, and weighted averages of 0.9411 in precision and 0.9249 in recall. The WiFi NVIDIA dataset also demonstrated good performance, with “Happy” showing a precision of 0.9659 and a recall of 0.9524. The “Angry” category had a precision of 0.9722 and a recall of 0.9089. However, the performance dropped for “Fearful”, with a precision of 0.8554 and a recall of 0.5058, and for “Disgusted”, where precision was 0.9993 but recall was only 0.6832. The model achieved an accuracy of 0.9141, with macro averages of 0.8756 in precision and 0.9065 in recall, and weighted averages of 0.9277 in precision and 0.9141 in recall.
In the mobile data Boosteroid dataset, the ViT model technique maintained a high performance, particularly in the “Happy” category, with a precision of 0.9975 and a recall of 0.9190. The “Angry” category also showed strong results, with a precision of 0.9959 and a recall of 0.8688. The overall accuracy was 0.9167, with macro averages of 0.9152 in precision and 0.9223 in recall, and weighted averages of 0.9340 in precision and 0.9167 in recall. However, in the mobile data NVIDIA dataset, the model’s performance declined significantly. The “Angry” category had a precision of 0.6257 and a recall of 0.1249, while the “Disgusted” category failed entirely, registering 0 for both precision and recall. The “Fearful” category also showed a low performance, with a precision of 0.2118 and a recall of 0.4059. Despite this, the “Happy” category remained strong, with a precision of 0.9757 and a recall of 0.9587. The dataset’s overall accuracy was the lowest at 0.4731, with macro averages of 0.4197 in precision and 0.3686 in recall, and weighted averages of 0.5979 in precision and 0.4731 in recall.
Finally, Figure 11 shows a graphical representation of the ViT model’s precision and recall.
Four model techniques’ comparison with EmotionNET prediction performance: Upon comparing the precision and recall performance across the EmotionNET, ConvoNEXT, EfficientNET, and ViT model techniques, it is evident that EmotionNET consistently outperforms the others. EmotionNET demonstrates exceptionally high precision across a broad range of emotions, particularly excelling in categories like “Happy”, “Neutral”, and “Surprised”, where it nearly achieves perfect scores. Even in more challenging categories, such as “Disgusted”, where other models tend to struggle, EmotionNET still maintains competitive precision, highlighting its robustness. Conversely, ConvoNEXT, while performing well overall, shows a noticeable dip in precision for emotions like “Disgusted”, “Sad”, and “Surprised”, making it slightly less reliable than EmotionNET. Similarly, EfficientNET and ViT exhibit solid precision for some emotions, but they also suffer from significant drops, particularly for “Disgusted” and “Fearful”, where their precision is markedly lower than that of EmotionNET.
In terms of recall, EmotionNET continues to lead, displaying high recall across all emotions, which underscores its effectiveness in accurately identifying and capturing the relevant instances of each emotion. Although there is a minor drop in recall for the “Disgusted” emotion, EmotionNET still outperforms the other models, which exhibit more pronounced declines in recall for this and other emotions. ConvoNEXT, while balanced in its recall performance, struggles with emotions like “Disgusted” and “Fearful”, mirroring its precision shortcomings. EfficientNET also falters in recall for these emotions, particularly “Sad”, where it falls behind EmotionNET. ViT, though competitive in certain aspects, similarly suffers from inconsistent recall, particularly for “Disgusted”, “Sad”, “Fearful”, and “Surprised”. EmotionNET emerges as the best-performing model in this comparison. Its consistent and high performance in both precision and recall across various emotions makes it the most reliable model technique among those evaluated, especially in handling more challenging emotional categories where others fall short. This demonstrates the robustness and superior accuracy of EmotionNET in emotion detection tasks. We observed a significant drop in model performance under mobile data conditions, particularly for the NVIDIA dataset. The results indicate that real-time latency fluctuations impact facial expression recognition, leading to lower QoE prediction accuracy. This suggests that deep learning models should be optimized for network variability to maintain robust emotion-based QoE assessment.
3.4. Analysis of Emotions and QoE Comparison
During the experiment, we analyzed the players’ emotions while they played the online game from these cloud platforms. After assessing the players’ emotions, we realized that network performance was affecting their QoE: the same players played the “Fortnite” game on both networks, yet we collected different emotions while they played.
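As a simple illustration of how the per-dataset emotion distributions shown in Figure 12 and discussed below can be derived from the recognized frame labels, the following is a minimal sketch under our own assumptions; the names are placeholders, not the study’s exact code.

```python
# Illustrative sketch of the per-dataset emotion distributions in Figure 12,
# computed from predicted frame labels; names are placeholders.
from collections import Counter

EMOTIONS = ["Angry", "Disgusted", "Fearful", "Happy", "Neutral", "Sad", "Surprised"]

def emotion_distribution(predicted_labels):
    """Return the detection rate (%) of each emotion over all frames."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {e: 100.0 * counts.get(e, 0) / total for e in EMOTIONS}
```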
Figure 12 provides a comprehensive comparison of detected emotions across the four datasets: WiFi (Boosteroid and NVIDIA) and mobile data (Boosteroid and NVIDIA). In the mobile data Boosteroid dataset, the “Angry” emotion is the most dominant, with a detection rate of 52.1%, indicating a significant presence of negative emotions. The “Disgusted” emotion is moderately detected at 8.9%, while “Fearful” is also prominent at 10.2%. Positive emotions like “Happy” are under-represented, with only 4.4% detection. The “Neutral” emotion is detected at a moderate rate of 13.1%, reflecting a balanced emotional state, while “Sad” shows a slightly higher detection at 9.2%. The “Surprised” emotion is minimally detected at 2.1%, the lowest rate in this dataset, indicating it plays a minor role.
In the mobile data NVIDIA dataset, the “Angry” emotion remains highly detected at 43.5%, though slightly less than in the mobile data Boosteroid dataset. The “Disgusted” emotion is consistent, with a detection rate of 8.8%, while “Fearful” is slightly less prominent at 9.9%. Notably, the “Happy” emotion shows a significant increase to 16.3%, indicating a better representation of positive emotions. The “Neutral” emotion is moderately present at 14.2%, similar to the first dataset. The “Sad” emotion shows a slightly higher detection at 9.5%, suggesting more varied emotions in this dataset. The “Surprised” emotion is detected slightly more than in the first dataset, at 2.7%, though it still remains under-represented.
The WiFi Boosteroid dataset shows a significant drop in the detection of “Angry” emotions, with a rate of 27.4%, indicating a shift in emotion dominance. The “Disgusted” emotion is detected less frequently at 6.8%, showing a reduction in negative emotions. Interestingly, the “Fearful” emotion shows a slight increase to 11.1%, indicating a stronger presence of fear-related emotions. The “Happy” emotion continues to increase, reaching 17.5%, making it one of the more dominant emotions in this dataset. The “Neutral” emotion shows a slight increase to 16.4%, suggesting a balanced emotional state. The “Sad” emotion is consistent at 9.1%, while the “Surprised” emotion shows a significant increase to 13.3%, making it a notable emotion in this dataset.
The WiFi NVIDIA dataset presents a dramatic shift in emotion detection, with the “Angry” emotion almost disappearing at a mere 0.3%. The “Disgusted” emotion is slightly less prominent at 6.5%, maintaining a minor role. The “Fearful” emotion is consistently present at 9.2%, though not dominant. The “Happy” emotion reaches its highest detection rate in this dataset at 18.1%, marking a shift towards more positive emotions. The “Neutral” emotion becomes overwhelmingly dominant, with a detection rate of 51.7%, indicating a strong neutral state in this dataset. The “Sad” emotion remains consistent, with a detection rate of 9.1%, similar to the other datasets. However, the “Surprised” emotion experiences a sharp drop to 0.7%, making it almost negligible in this dataset.
When we applied the DL-based techniques to emotion recognition, they gave remarkable results for emotion-based QoE. We then compared the EmotionNET technique with the other DL-based techniques on our custom-created datasets. During the training process, we found that the other models, ConvoNEXT, EfficientNET, and ViT, achieve quite good training accuracy but suffer from a significant overfitting problem. We found that the EmotionNET technique is the best technique for analyzing emotion-based QoE. After detailed observation of the EmotionNET technique on the custom emotion-based datasets, it is clearly visible that the network affects the player’s QoE, because the WiFi network has more stable connectivity than mobile data. This type of emotion-based measurement helps cloud service providers obtain an accurate picture of QoE. These findings have significant implications for cloud gaming services. Emotion-based QoE assessment provides a more objective measure of user experience compared to traditional surveys. Cloud gaming providers can use this technology to monitor real-time user satisfaction and dynamically adjust video quality, latency compensation, or server allocation to optimize the gaming experience. Moreover, integrating this approach into cloud gaming platforms can lead to enhanced user engagement and reduced churn rates.