1. Introduction
Deep learning has greatly influenced a range of fields, including computer vision, signal processing, and natural language processing. However, training deep convolutional architectures from scratch requires substantially more data than traditional machine learning algorithms. This requirement becomes even more pronounced for the Transformer architecture: Transformers are widely recognized as data-intensive [1,2,3,4] relative to traditional CNN architectures. The advantage of this architecture lies in its greater expressivity and its ability to learn complex tasks effectively when ample data are available. With small datasets, however, it often overfits and generalizes poorly to unseen data.
Moreover, collecting annotated data in the medical domain poses additional challenges, especially when human involvement is required, due to the sensitive nature of the information gathered. Additionally, obtaining an adequate number of individuals with specific target pathologies can be complex. To overcome the issue of limited dataset size, data augmentation techniques are commonly used to artificially increase the number of training samples. These techniques involve sampling new data by applying various transformations [5,6] or interpolating new samples based on existing ones [7,8,9].
However, for medical datasets, particularly those related to pathologies with distinct morphologies and structural characteristics, using such augmentation methods may not be appropriate as they can introduce artifacts and alter pathological features. Mixing-based algorithms like CutMix and MixUp, which assume a linear relationship between input and label, may also have drawbacks in this context.
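For reference, a minimal sketch of MixUp [8] applied to multivariate gaze time series is given below; the function name and array shapes are illustrative and not taken from the training code used in this work. The last line is precisely the linear input-label relationship questioned above.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """MixUp sketch: convex combinations of randomly paired samples and labels.

    x: array of shape (batch, time, channels), e.g. multivariate gaze recordings
    y: array of shape (batch, num_classes), one-hot or multi-label targets
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)               # mixing coefficient shared by the batch
    perm = rng.permutation(len(x))             # random pairing within the batch
    x_mixed = lam * x + (1.0 - lam) * x[perm]
    y_mixed = lam * y + (1.0 - lam) * y[perm]  # the label is interpolated linearly as well
    return x_mixed, y_mixed
```

A convex blend of two recordings exhibiting different pathological signatures is not guaranteed to express a correspondingly blended diagnosis.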
As a result, pathologists and researchers often turn to specialized augmentation methods tailored to the characteristics of pathological patterns. These techniques learn the data distribution with generative approaches and then sample new examples from it. However, generative methods are considered less effective than transformation-based techniques at producing diverse, high-resolution samples that capture complex or rare patterns. When trained on small datasets of high-resolution samples, they may fail to capture all the patterns present in the data; this is particularly limiting for eye movement gaze classification, where a two-minute recording corresponds to a multivariate time series of approximately 24,000 points.
To address the challenges of limited annotated medical data, one potential solution is to enhance sample diversity by incorporating realistic physiological variations instead of directly learning the distribution. This approach leverages domain-specific knowledge and generates samples that align with the inherent characteristics of physiological data, contributing to robustness and authenticity. In this study, we introduce a physiologically based gaze data augmentation library that emulates head movements during data collection, capturing natural variability and intricacy in eye movement patterns.
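To make the idea concrete, the sketch below illustrates the general principle under a simplified additive model. It is not the EMULATE implementation (the geometric model is described in the Methods and Figure 1); the sampling ranges follow Table 2, and the 200 Hz sampling rate is an assumption.

```python
import numpy as np

def simulate_head_offset(gaze_deg, rate_hz=200.0, dynamic=True, rng=None):
    """Conceptual sketch (not the EMULATE API): perturb a gaze-angle recording
    with a simulated head orientation that is either fixed (static) or slowly
    oscillating (dynamic) over the duration of the recording."""
    rng = rng or np.random.default_rng()
    n = gaze_deg.shape[0]
    t = np.arange(n) / rate_hz                        # time stamps in seconds
    if dynamic:
        theta0 = rng.uniform(-15.0, 15.0)             # initial head angle (deg), cf. Table 2
        amplitude = rng.uniform(-15.0, 15.0)          # maximum rotation amplitude (deg)
        period = rng.uniform(4.0, 40.0)               # oscillation period (s)
        head_angle = theta0 + amplitude * np.sin(2.0 * np.pi * t / period)
    else:
        theta0 = rng.uniform(-10.0, 10.0)             # static-setting range, cf. Table 2
        head_angle = np.full(n, theta0)               # head held at a fixed orientation
    # Simplification: the same head angle is added to every gaze channel.
    return gaze_deg + head_angle[:, None]
```

For a recording `x` of shape (24,000, channels), `simulate_head_offset(x)` would yield one augmented copy with a different simulated head trajectory on every call.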
The contributions are as follows:
We introduce EMULATE (Eye Movement data Augmentation by Emulating Head Position and Movement), a novel library for eye movement gaze data augmentation and the first of its kind to emulate physiological aspects of recording. The library generates augmented eye movement data by simulating natural head movements, both prior to recording and in real time during gaze data collection.
We evaluate the data augmentation technique on three distinct architectures (two CNN-based and one hybrid), using two separate datasets.
We explore various augmentation settings, demonstrating the effectiveness of the proposed library in regularizing the training process of the proposed architecture and improving its generalization ability.
We examine the complementarity between the proposed method and additional standard baseline approaches.
This paper is structured as follows. It begins with a summary of the state of the art in data augmentation, followed by an overview of the studies introducing the architectures used. Detailed information is then provided on the materials used in this study, including the eye movement recording setup and the resulting dataset, introduced in [10]. We assess the relevance of the proposed data augmentation method by integrating it into the existing training framework [10]: the three models are first trained with and without the proposed method, training with EMULATE is then compared against training with other baseline methods, and finally the complementarity between EMULATE and these baselines is analyzed. The Methods section presents the proposed data augmentation library, along with the experimental settings used to evaluate it, including the architectures used for training and the augmentation and regularization methods used for comparison. Implementation details such as model training and evaluation procedures are also provided, together with the hyperparameters used for the architectures, EMULATE, and the training pipeline.
Furthermore, the experimental results are discussed in the results section and elaborated upon in the discussion section. Finally, the limitations and future directions of the proposed method are reviewed.
Author Contributions
Supervision, Z.K.; methodology, A.E.E.H.; software, A.E.E.H.; validation, A.E.E.H. and Z.K.; formal analysis, A.E.E.H.; investigation, A.E.E.H.; resources, Z.K.; data curation, A.E.E.H.; conceptualization, A.E.E.H.; writing—original draft, A.E.E.H.; writing—review and editing, Z.K.; visualization, A.E.E.H.; project administration, Z.K.; funding acquisition, Z.K. All authors have read and agreed to the published version of the manuscript.
Funding
A.E.E.H. is funded by Orasis-Ear, ANRT, and CIFRE.
Informed Consent Statement
This study drew upon data sourced from Orasis Ear, in collaboration with clinical centers employing REMOBI and AiDEAL technology. Participating centers agreed to store their data anonymously for further analysis.
Data Availability Statement
The datasets generated and/or analyzed during the current study are not publicly available. The data were sourced from Orasis Ear, in collaboration with clinical centers employing REMOBI and AiDEAL technology; participating centers agreed to store their data anonymously for further analysis. The datasets are, however, available from the corresponding author upon reasonable request.
Acknowledgments
This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011014231 made by GENCI.
Conflicts of Interest
Zoï Kapoula is the founder of Orasis-EAR.
Appendix A
Table A1.
The presentation of the global F1 (Macro F1), global positive F1 (Pos. F1), and global negative F1 (Neg. F1) scores when trained on the vergence visual task. For each model, the best global F1 score is highlighted in bold.
| Method | HTCE-MAX Macro F1 | HTCE-MAX Neg. F1 | HTCE-MAX Pos. F1 | HTCE-MEAN Macro F1 | HTCE-MEAN Neg. F1 | HTCE-MEAN Pos. F1 | HTCSE Macro F1 | HTCSE Neg. F1 | HTCSE Pos. F1 |
|---|---|---|---|---|---|---|---|---|---|
| Cutout | 67.2 | 86.9 | 47.5 | 68.2 | 88.6 | 47.8 | 68.5 | 88.2 | 48.8 |
| Dropout | 68.3 | 88.0 | 48.6 | 66.8 | 86.0 | 47.6 | 68.2 | 87.4 | 49.1 |
| CutMix | 70.1 | 89.1 | 51.0 | 70.2 | 89.2 | 51.1 | 70.1 | 88.9 | 51.2 |
| MixUp | 69.2 | 89.5 | 48.9 | 69.5 | 89.6 | 49.4 | 69.3 | 88.3 | 50.4 |
| No Aug. | 68.6 | 89.1 | 48.0 | 66.9 | 88.2 | 45.7 | 68.3 | 88.0 | 48.5 |
| Dynamic | 67.6 | 87.0 | 48.1 | 68.9 | 88.6 | 49.3 | 69.1 | 88.4 | 49.8 |
| Dynamic High | 66.1 | 86.4 | 45.9 | 68.3 | 87.5 | 49.1 | 69.1 | 88.3 | 49.9 |
| Static | 68.6 | 88.4 | 48.9 | 67.8 | 86.9 | 48.6 | 68.8 | 88.3 | 49.3 |
Table A2.
Presentation of the global F1 (Macro F1), global positive F1 (Pos. F1), and global negative F1 (Neg. F1) scores when trained on the saccade visual task. For each model, the best global F1 score is highlighted in bold.
| Method | HTCE-MAX Macro F1 | HTCE-MAX Neg. F1 | HTCE-MAX Pos. F1 | HTCE-MEAN Macro F1 | HTCE-MEAN Neg. F1 | HTCE-MEAN Pos. F1 | HTCSE Macro F1 | HTCSE Neg. F1 | HTCSE Pos. F1 |
|---|---|---|---|---|---|---|---|---|---|
| Cutout | 69.7 | 89.1 | 50.2 | 68.9 | 88.1 | 49.8 | 69.5 | 88.1 | 50.9 |
| Dropout | 69.8 | 87.9 | 51.6 | 69.0 | 87.2 | 50.7 | 70.4 | 88.4 | 52.5 |
| CutMix | 71.3 | 89.4 | 53.3 | 71.6 | 89.6 | 53.5 | 71.5 | 89.1 | 53.9 |
| MixUp | 69.8 | 89.4 | 50.3 | 70.8 | 89.2 | 52.4 | 70.7 | 88.6 | 52.8 |
| No Aug. | 67.0 | 86.8 | 47.3 | 64.8 | 86.0 | 43.6 | 69.2 | 88.0 | 50.4 |
| Dynamic | 69.7 | 88.9 | 50.6 | 70.1 | 88.4 | 51.8 | 70.3 | 88.6 | 52.0 |
| Dynamic High | 69.8 | 88.6 | 51.0 | 69.9 | 89.2 | 50.5 | 70.0 | 88.7 | 51.3 |
| Static | 69.0 | 88.6 | 49.5 | 69.6 | 88.4 | 50.8 | 69.7 | 88.2 | 51.3 |
Table A3.
Per-class macro F1 scores for each augmentation and regularization method when separately training the three different architectures on the vergence dataset.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 67.4 | 71.3 | 63.9 | 63.5 | 78.0 | 70.2 | 74.0 | 73.1 |
| | Cutout | 64.5 | 68.2 | 63.3 | 62.2 | 73.8 | 66.0 | 70.8 | 69.9 |
| | Dropout | 65.9 | 69.4 | 62.4 | 62.7 | 75.7 | 67.5 | 71.7 | 72.3 |
| | MixUp | 66.3 | 70.4 | 62.9 | 63.2 | 76.9 | 68.7 | 73.7 | 72.4 |
| | No Aug. | 65.7 | 70.1 | 62.1 | 62.8 | 76.7 | 69.0 | 73.1 | 71.9 |
| | Dynamic | 63.4 | 69.9 | 62.0 | 60.1 | 74.9 | 67.7 | 71.9 | 71.6 |
| | Dynamic High | 64.3 | 68.0 | 62.5 | 62.5 | 72.4 | 65.7 | 70.2 | 64.3 |
| | Static | 65.8 | 69.6 | 63.4 | 61.2 | 76.4 | 68.9 | 73.2 | 71.6 |
| HTCE-MEAN | CutMix | 68.1 | 71.7 | 63.9 | 63.9 | 78.2 | 70.4 | 73.1 | 73.0 |
| | Cutout | 65.7 | 69.5 | 63.7 | 63.7 | 75.7 | 68.5 | 71.8 | 68.0 |
| | Dropout | 63.9 | 67.7 | 62.1 | 61.3 | 73.0 | 67.3 | 69.8 | 70.2 |
| | MixUp | 66.8 | 70.9 | 62.7 | 63.3 | 77.1 | 70.1 | 73.6 | 72.5 |
| | No Aug. | 64.9 | 68.4 | 63.3 | 60.6 | 72.9 | 68.3 | 71.7 | 66.4 |
| | Dynamic | 66.8 | 70.7 | 63.5 | 62.7 | 76.7 | 69.1 | 71.8 | 71.1 |
| | Dynamic High | 66.1 | 70.5 | 62.2 | 61.9 | 74.4 | 68.6 | 71.7 | 71.8 |
| | Static | 65.5 | 68.0 | 63.8 | 63.2 | 75.1 | 67.7 | 71.8 | 68.3 |
| HTCSE | CutMix | 67.6 | 71.5 | 64.6 | 63.4 | 77.2 | 69.5 | 74.1 | 73.7 |
| | Cutout | 66.5 | 69.7 | 62.7 | 62.9 | 75.2 | 68.4 | 72.0 | 71.7 |
| | Dropout | 66.2 | 69.3 | 63.3 | 63.1 | 74.5 | 67.5 | 71.0 | 71.8 |
| | MixUp | 66.8 | 70.1 | 63.4 | 64.0 | 76.6 | 68.3 | 73.1 | 73.3 |
| | No Aug. | 65.9 | 68.8 | 62.9 | 62.5 | 75.5 | 67.7 | 72.5 | 71.2 |
| | Dynamic | 67.3 | 69.9 | 63.7 | 62.8 | 76.1 | 68.1 | 72.3 | 73.7 |
| | Dynamic High | 67.0 | 69.9 | 63.7 | 62.5 | 76.4 | 68.0 | 73.4 | 72.9 |
| | Static | 67.5 | 70.2 | 63.0 | 62.6 | 76.1 | 68.2 | 72.0 | 71.7 |
Table A4.
Per-class macro F1 scores for each augmentation and regularization method when separately training the three different architectures on the saccade dataset.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 70.5 | 74.4 | 65.5 | 66.7 | 79.0 | 72.8 | 73.0 | 69.8 |
| | Cutout | 68.7 | 71.8 | 64.5 | 64.0 | 78.5 | 70.8 | 72.7 | 67.6 |
| | Dropout | 68.1 | 71.6 | 64.5 | 66.1 | 77.1 | 71.4 | 72.3 | 68.0 |
| | MixUp | 68.9 | 73.0 | 63.1 | 65.5 | 78.6 | 70.8 | 72.2 | 67.7 |
| | No Aug. | 67.3 | 70.6 | 63.2 | 61.4 | 76.6 | 65.6 | 66.9 | 65.5 |
| | Dynamic | 68.5 | 72.7 | 64.3 | 64.9 | 78.5 | 70.0 | 72.3 | 67.8 |
| | Dynamic High | 68.1 | 71.7 | 64.5 | 64.9 | 78.6 | 71.4 | 72.5 | 68.2 |
| | Static | 68.4 | 71.7 | 62.8 | 64.5 | 78.6 | 70.6 | 72.0 | 64.7 |
| HTCE-MEAN | CutMix | 70.5 | 74.4 | 65.5 | 67.2 | 79.8 | 73.4 | 74.1 | 68.9 |
| | Cutout | 69.0 | 72.2 | 65.6 | 64.8 | 77.7 | 71.3 | 68.4 | 63.5 |
| | Dropout | 69.4 | 72.4 | 66.1 | 65.7 | 74.0 | 70.1 | 68.4 | 66.8 |
| | MixUp | 69.7 | 73.1 | 66.2 | 65.5 | 79.5 | 72.1 | 73.8 | 67.5 |
| | No Aug. | 64.3 | 66.8 | 63.2 | 61.0 | 73.8 | 66.3 | 67.8 | 56.2 |
| | Dynamic | 69.7 | 72.3 | 66.0 | 65.4 | 78.9 | 71.4 | 72.0 | 66.1 |
| | Dynamic High | 68.9 | 72.3 | 64.4 | 65.0 | 79.0 | 72.0 | 71.1 | 67.2 |
| | Static | 69.1 | 71.8 | 64.9 | 65.8 | 77.4 | 71.4 | 72.0 | 65.5 |
| HTCSE | CutMix | 70.3 | 73.9 | 65.5 | 67.1 | 79.8 | 72.9 | 73.7 | 70.2 |
| | Cutout | 67.9 | 71.1 | 64.8 | 65.1 | 78.0 | 69.8 | 72.3 | 68.3 |
| | Dropout | 69.3 | 73.1 | 65.8 | 66.5 | 77.0 | 71.8 | 72.1 | 69.0 |
| | MixUp | 69.7 | 72.1 | 65.7 | 66.9 | 78.5 | 72.3 | 73.0 | 68.3 |
| | No Aug. | 68.2 | 71.6 | 64.4 | 64.5 | 76.9 | 69.6 | 71.6 | 67.8 |
| | Dynamic | 68.8 | 73.0 | 63.4 | 66.2 | 78.6 | 71.8 | 73.2 | 68.3 |
| | Dynamic High | 69.2 | 72.6 | 64.0 | 65.3 | 78.4 | 71.5 | 72.4 | 67.8 |
| | Static | 68.4 | 71.9 | 64.6 | 65.5 | 77.2 | 70.1 | 72.3 | 68.8 |
Table A5.
Per-class macro F1 scores for each augmentation and regularization baseline method when training the three different architectures on the vergence dataset in combination with the dynamic variant of EMULATE.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 67.7 | 71.9 | 63.6 | 62.5 | 77.6 | 69.1 | 74.0 | 72.8 |
| | Cutout | 66.6 | 69.7 | 64.3 | 63.3 | 76.5 | 69.2 | 73.0 | 72.9 |
| | Dropout | 67.4 | 70.8 | 64.0 | 62.8 | 76.7 | 69.7 | 72.8 | 73.5 |
| | MixUp | 67.3 | 70.9 | 62.6 | 62.5 | 76.3 | 68.7 | 73.7 | 73.8 |
| HTCE-MEAN | CutMix | 68.0 | 71.1 | 63.0 | 62.9 | 76.9 | 70.1 | 73.1 | 74.6 |
| | Cutout | 67.5 | 70.6 | 64.6 | 62.4 | 77.2 | 68.7 | 72.8 | 72.4 |
| | Dropout | 67.3 | 70.8 | 63.9 | 63.5 | 75.6 | 69.7 | 70.2 | 73.1 |
| | MixUp | 67.3 | 71.2 | 63.1 | 63.7 | 77.1 | 69.4 | 73.6 | 73.8 |
| HTCSE | CutMix | 67.5 | 71.0 | 62.8 | 63.4 | 77.0 | 68.9 | 74.1 | 73.3 |
| | Cutout | 67.5 | 70.3 | 63.3 | 63.2 | 75.9 | 68.6 | 72.9 | 73.1 |
| | Dropout | 68.0 | 70.8 | 63.5 | 63.2 | 75.6 | 68.5 | 72.9 | 72.7 |
| | MixUp | 68.2 | 70.8 | 64.3 | 64.3 | 76.4 | 67.5 | 73.5 | 73.5 |
Table A6.
Per-class macro F1 scores for each augmentation and regularization baseline method when training the three different architectures on the vergence dataset in combination with the dynamic high variant of EMULATE.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 67.7 | 70.7 | 64.4 | 62.2 | 77.2 | 69.6 | 73.3 | 73.3 |
| | Cutout | 66.9 | 70.4 | 63.1 | 63.0 | 76.4 | 68.3 | 73.4 | 72.8 |
| | Dropout | 67.1 | 71.4 | 63.3 | 64.5 | 75.9 | 69.2 | 72.8 | 73.8 |
| | MixUp | 67.4 | 70.5 | 62.9 | 62.1 | 76.9 | 68.5 | 72.9 | 73.5 |
| HTCE-MEAN | CutMix | 68.0 | 71.4 | 64.0 | 63.3 | 77.2 | 69.7 | 72.9 | 73.6 |
| | Cutout | 67.2 | 71.3 | 63.5 | 62.4 | 77.4 | 69.2 | 73.6 | 73.7 |
| | Dropout | 67.3 | 71.1 | 64.6 | 63.8 | 75.3 | 69.6 | 70.7 | 72.9 |
| | MixUp | 68.1 | 71.3 | 64.1 | 62.4 | 77.3 | 69.5 | 71.8 | 74.1 |
| HTCSE | CutMix | 66.8 | 70.8 | 63.4 | 62.9 | 76.8 | 68.1 | 73.4 | 73.9 |
| | Cutout | 67.3 | 69.9 | 63.6 | 63.5 | 76.5 | 67.9 | 74.1 | 73.3 |
| | Dropout | 68.6 | 71.2 | 64.3 | 64.1 | 74.9 | 68.7 | 73.2 | 72.3 |
| | MixUp | 67.5 | 70.3 | 64.0 | 63.8 | 76.8 | 67.7 | 73.3 | 73.8 |
Table A7.
Per-class macro F1 scores for each augmentation and regularization baseline method when training the three different architectures on the saccade dataset in combination with the dynamic variant of EMULATE.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 69.4 | 73.4 | 65.0 | 65.6 | 79.6 | 71.6 | 72.8 | 67.7 |
| | Cutout | 68.9 | 72.6 | 63.5 | 64.0 | 79.5 | 72.0 | 72.3 | 67.6 |
| | Dropout | 69.5 | 73.0 | 64.9 | 65.4 | 79.4 | 73.3 | 72.2 | 70.9 |
| | MixUp | 68.1 | 72.7 | 63.5 | 64.3 | 79.8 | 71.1 | 72.6 | 69.0 |
| HTCE-MEAN | CutMix | 70.4 | 73.5 | 65.0 | 66.4 | 79.6 | 73.0 | 73.1 | 68.4 |
| | Cutout | 68.9 | 72.1 | 64.4 | 65.4 | 79.3 | 73.2 | 71.3 | 67.6 |
| | Dropout | 69.6 | 72.6 | 64.9 | 64.4 | 77.5 | 71.8 | 70.1 | 67.6 |
| | MixUp | 69.5 | 73.0 | 65.0 | 63.9 | 79.5 | 72.4 | 73.4 | 69.5 |
| HTCSE | CutMix | 69.6 | 73.1 | 64.1 | 66.3 | 79.6 | 72.4 | 73.1 | 68.7 |
| | Cutout | 69.3 | 73.1 | 64.9 | 66.2 | 78.4 | 71.8 | 73.4 | 68.5 |
| | Dropout | 69.4 | 72.3 | 65.4 | 67.1 | 77.8 | 73.7 | 74.0 | 67.8 |
| | MixUp | 69.5 | 73.1 | 64.2 | 67.0 | 79.6 | 72.3 | 73.5 | 68.0 |
Table A8.
Per-class macro F1 scores for each augmentation and regularization baseline method when training the three different architectures on the saccade dataset in combination with the dynamic high variant of EMULATE.
| Model | Augmentation Method | Class 0 | Class 1 | Class 2 | Class 3 | Class 4 | Class 5 | Class 6 | Class 7 |
|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | CutMix | 69.0 | 72.9 | 63.8 | 65.5 | 79.2 | 71.7 | 72.2 | 67.7 |
| | Cutout | 68.2 | 72.2 | 63.5 | 64.1 | 79.1 | 71.0 | 72.3 | 68.6 |
| | Dropout | 68.7 | 72.5 | 64.9 | 65.1 | 79.2 | 73.2 | 72.2 | 69.8 |
| | MixUp | 67.7 | 72.6 | 63.1 | 64.2 | 79.6 | 71.3 | 72.4 | 68.4 |
| HTCE-MEAN | CutMix | 70.0 | 73.4 | 64.6 | 65.4 | 79.3 | 73.4 | 73.3 | 68.0 |
| | Cutout | 69.5 | 72.4 | 64.2 | 65.6 | 79.1 | 73.0 | 70.9 | 66.7 |
| | Dropout | 69.7 | 73.5 | 65.5 | 65.6 | 77.7 | 72.0 | 71.6 | 67.2 |
| | MixUp | 69.1 | 73.3 | 64.9 | 65.9 | 78.7 | 72.1 | 73.0 | 68.6 |
| HTCSE | CutMix | 69.8 | 73.1 | 63.7 | 65.3 | 79.3 | 73.1 | 73.1 | 68.1 |
| | Cutout | 69.9 | 73.0 | 64.7 | 66.3 | 78.9 | 72.6 | 74.2 | 68.8 |
| | Dropout | 70.1 | 73.2 | 64.9 | 67.1 | 77.9 | 73.6 | 73.2 | 68.5 |
| | MixUp | 69.1 | 72.8 | 63.7 | 66.2 | 79.2 | 72.6 | 73.1 | 68.9 |
Table A9.
HTCE feature extractor hyperparameters.
| Stage | Filter Size | Pooling | Kernel Size | Activation |
|---|---|---|---|---|
| 1 | 128-128-128 | 0-0-2 | 5-5-5 | relu |
| 2 | 128-128-128 | 0-0-2 | 5-5-5 | relu |
| 3 | 256-256-256 | 0-2-2 | 5-5-5 | relu |
| 4 | 512-512-512 | 0-2-2 | 3-3-3 | relu |
Table A10.
Lightweight HTCE hyperparameters.
| Stage | Filter Size | Pooling | Kernel Size | Activation |
|---|---|---|---|---|
| 1 | 64-64 | 0-2 | 5-5 | relu |
| 2 | 128-128 | 0-2 | 5-5 | relu |
| 3 | 256-256 | 2-2 | 5-5 | relu |
| 4 | 512-512 | 2-2 | 3-3 | relu |
Table A11.
Model Training hyperparameters.
| Hyperparameter | Value |
|---|---|
| Optimizer | |
| Name | AdamW |
| Learning rate | 0.0001 |
| Beta1 | 0.9 |
| Beta2 | 0.999 |
| Weight decay | 0.00001 |
| Loss | |
| Name | Focal loss |
| Alpha class 0 | 0.73 |
| Alpha class 1 | 0.61 |
| Alpha class 2 | 0.90 |
| Alpha class 3 | 0.88 |
| Alpha class 4 | 0.67 |
| Alpha class 5 | 0.83 |
| Alpha class 6 | 0.81 |
| Alpha class 7 | 0.32 |
| Gamma | 5 |
| Training | |
| Batch size (HTCE-MAX) | 128 |
| Batch size (HTCE-MEAN) | 128 |
| Batch size (Baselines) | 128 |
| Batch size (HTCSE) | 256 |
| Epochs | 100 |
| Number of folds | 3 |
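As a rough illustration of how the optimizer and loss hyperparameters of Table A11 fit together, a sketch in TensorFlow/Keras is given below. The framework choice and the exact focal loss formulation (a multi-label binary focal loss with per-class alpha over sigmoid outputs) are assumptions, not a reproduction of the original training code.

```python
import tensorflow as tf

# Per-class alpha weights and focusing parameter gamma from Table A11.
ALPHA = tf.constant([0.73, 0.61, 0.90, 0.88, 0.67, 0.83, 0.81, 0.32])
GAMMA = 5.0

def focal_loss(y_true, y_pred):
    """Multi-label binary focal loss with per-class alpha (assumed formulation).
    y_pred is expected to hold sigmoid probabilities of shape (batch, 8)."""
    y_true = tf.cast(y_true, y_pred.dtype)
    p_t = y_true * y_pred + (1.0 - y_true) * (1.0 - y_pred)      # probability of the true label
    alpha_t = y_true * ALPHA + (1.0 - y_true) * (1.0 - ALPHA)    # class-balancing weight
    ce = -tf.math.log(tf.clip_by_value(p_t, 1e-7, 1.0))
    return tf.reduce_mean(alpha_t * tf.pow(1.0 - p_t, GAMMA) * ce)

optimizer = tf.keras.optimizers.AdamW(
    learning_rate=1e-4, beta_1=0.9, beta_2=0.999, weight_decay=1e-5)

# model.compile(optimizer=optimizer, loss=focal_loss)  # batch sizes and epochs per Table A11
```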
References
- Tagnamas, J.; Ramadan, H.; Yahyaouy, A.; Tairi, H. Multi-task approach based on combined CNN-transformer for efficient segmentation and classification of breast tumors in ultrasound images. Vis. Comput. Ind. Biomed. Art 2024, 7, 2. [Google Scholar]
- Pan, X.; Xiong, J. DCTNet: A Hybrid Model of CNN and Dilated Contextual Transformer for Medical Image Segmentation. In Proceedings of the 2023 IEEE 6th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 24–26 February 2023; IEEE: New York, NY, USA, 2023; Volume 6, pp. 1316–1320. [Google Scholar]
- Lin, X.; Yan, Z.; Deng, X.; Zheng, C.; Yu, L. ConvFormer: Plug-and-Play CNN-Style Transformers for Improving Medical Image Segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Vancouver, BC, Canada, 8–12 October 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 642–651. [Google Scholar]
- Abibullaev, B.; Keutayeva, A.; Zollanvari, A. Deep Learning in EEG-Based BCIs: A Comprehensive Review of Transformer Models, Advantages, Challenges, and Applications. IEEE Access 2023, 11, 127271–127301. [Google Scholar] [CrossRef]
- Cubuk, E.D.; Zoph, B.; Shlens, J.; Le, Q.V. Randaugment: Practical automated data augmentation with a reduced search space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 702–703. [Google Scholar]
- Fons, E.; Dawson, P.; Zeng, X.j.; Keane, J.; Iosifidis, A. Adaptive weighting scheme for automatic time-series data augmentation. arXiv 2021, arXiv:2102.08310. [Google Scholar]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. Cutmix: Regularization strategy to train strong classifiers with localizable features. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6023–6032. [Google Scholar]
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond empirical risk minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar]
- Alex, A.; Wang, L.; Gastaldo, P.; Cavallaro, A. Mixup augmentation for generalizable speech separation. In Proceedings of the 2021 IEEE 23rd International Workshop on Multimedia Signal Processing (MMSP), Tampere, Finland, 6–8 October 2021; IEEE: New York, NY, USA, 2021; pp. 1–6. [Google Scholar]
- El Hmimdi, A.E.; Palpanas, T.; Kapoula, Z. Efficient Diagnostic Classification of Diverse Pathologies through Contextual Eye Movement Data Analysis with a Novel Hybrid Architecture. Sci. Rep.
- Zemblys, R.; Niehorster, D.C.; Holmqvist, K. gazeNet: End-to-end eye-movement event detection with deep neural networks. Behav. Res. Methods 2019, 51, 840–864. [Google Scholar] [CrossRef] [PubMed]
- Elbattah, M.; Loughnane, C.; Guérin, J.L.; Carette, R.; Cilia, F.; Dequen, G. Variational autoencoder for image-based augmentation of eye-tracking data. J. Imaging 2021, 7, 83. [Google Scholar] [CrossRef] [PubMed]
- Fuhl, W.; Rong, Y.; Kasneci, E. Fully convolutional neural networks for raw eye tracking data segmentation, generation, and reconstruction. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: New York, NY, USA, 2021; pp. 142–149. [Google Scholar]
- Luo, Y.; Zhu, L.Z.; Wan, Z.Y.; Lu, B.L. Data augmentation for enhancing EEG-based emotion recognition with deep generative models. J. Neural Eng. 2020, 17, 056021. [Google Scholar] [CrossRef] [PubMed]
- Özdenizci, O.; Erdoğmuş, D. On the use of generative deep neural networks to synthesize artificial multichannel EEG signals. In Proceedings of the 2021 10th International IEEE/EMBS Conference on Neural Engineering (NER), Virtual, 4–6 May 2021; IEEE: New York, NY, USA, 2021; pp. 427–430. [Google Scholar]
- Luo, Y.; Zhu, L.Z.; Lu, B.L. A GAN-based data augmentation method for multimodal emotion recognition. In Proceedings of the Advances in Neural Networks—ISNN 2019: 16th International Symposium on Neural Networks, ISNN 2019, Moscow, Russia, 10–12 July 2019; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2019; pp. 141–150. [Google Scholar]
- Cubuk, E.D.; Zoph, B.; Mane, D.; Vasudevan, V.; Le, Q.V. Autoaugment: Learning augmentation strategies from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 113–123. [Google Scholar]
- DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
- El Hmimdi, A.E.; Kapoula, Z.; Sainte Fare Garnot, V. Deep Learning-Based Detection of Learning Disorders on a Large Scale Dataset of Eye Movement Records. BioMedInformatics 2024, 4, 519–541. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Singh, P.; Thoke, A.; Verma, K. A Novel Approach to Face Detection Algorithm. Int. J. Comput. Appl. 2011, 975, 8887. [Google Scholar] [CrossRef]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Cutmix Algorithm. Available online: https://keras.io/api/keras_cv/layers/augmentation/cut_mix (accessed on 2 February 2024).
- Cutout Algorithm. Available online: https://keras.io/api/keras_cv/layers/augmentation/random_cutout/ (accessed on 2 February 2024).
- Mixup Algorithm. Available online: https://keras.io/api/keras_cv/layers/augmentation/mix_up/ (accessed on 2 February 2024).
- Iterative Stratification. Available online: https://scikit.ml/api/skmultilearn.model_selection.iterative_stratification.html (accessed on 2 February 2024).
- Loshchilov, I.; Hutter, F. Decoupled weight decay regularization. arXiv 2017, arXiv:1711.05101. [Google Scholar]
- André-Deshays, C.; Berthoz, A.; Revel, M. Eye-head coupling in humans: I. Simultaneous recording of isolated motor units in dorsal neck muscles and horizontal eye movements. Exp. Brain Res. 1988, 69, 399–406. [Google Scholar] [CrossRef] [PubMed]
- Baur, C.; Albarqouni, S.; Navab, N. MelanoGANs: High resolution skin lesion synthesis with GANs. arXiv 2018, arXiv:1804.04338. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Hayat, K. Super-resolution via deep learning. arXiv 2017, arXiv:1706.09077. [Google Scholar]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
Figure 1.
Illustration of the physical model used to build the proposed data augmentation method. Point R corresponds to the position of the right eye pupil center, point L to the position of the left eye pupil center, and point O to the origin of the reference frame as well as the position of the head center. The figure illustrates the (OY, OX) plane, in which the pupil centers and the head center lie.
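As a purely illustrative companion to the physical model of Figure 1 (not the derivation used by EMULATE), rotating the pupil centers R and L about the head center O in this plane is a standard 2-D rotation; the inter-pupillary distance below is an assumed example value.

```python
import numpy as np

def rotate_about_head_center(points_xy, angle_deg):
    """Rotate 2-D points (e.g., the pupil centers R and L) about the head
    center O, taken as the origin of the reference frame in Figure 1."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a)],
                    [np.sin(a),  np.cos(a)]])
    return points_xy @ rot.T

# Example: pupil centers placed symmetrically about O (inter-pupillary
# distance of ~64 mm, an illustrative value), rotated by a 10-degree head yaw.
R = np.array([0.032, 0.0])
L = np.array([-0.032, 0.0])
print(rotate_about_head_center(np.stack([R, L]), angle_deg=10.0))
```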
Figure 2.
A comparison of the global F1 scores obtained with the different methods for the three architectures, when trained on the vergence dataset (left subfigure) and the saccade dataset (right subfigure).
Figure 3.
A barplot comparing the different baseline performances when combined with the dynamic and dynamic high EMULATE settings, and trained with the HTCE-MAX (left subfigure), the HTCE-MEAN (middle subfigure), and the HTCSE (right subfigure) on the vergence dataset.
Figure 4.
A barplot comparing the different baseline performances when combined with the dynamic and dynamic high EMULATE settings, and trained with the HTCE-MAX (left subfigure), the HTCE-MEAN (middle subfigure), and the HTCSE (right subfigure) on the saccade dataset.
Table 1.
Presentation of the different groups of pathologies and the patient count for the saccade and the vergence datasets.
| Class Identifier | Corresponding Disorder | Saccade Dataset | Vergence Dataset |
|---|---|---|---|
| 0 | Dyslexia | 873 | 854 |
| 1 | Reading disorder | 1264 | 1265 |
| 2 | Listening and expressing | 331 | 321 |
| 3 | Vertigo and postural | 396 | 372 |
| 4 | Attention and neurological | 1016 | 975 |
| 5 | Neuro-strabismus | 455 | 511 |
| 6 | Visual fatigue | 678 | 567 |
| 7 | Other pathologies | 195 | 279 |
Table 2.
An overview of the parameters defining the configuration of each of the three EMULATE augmentation strategies. Note that U(a,b) denotes the uniform distribution on the interval [a,b].
| Parameter | Dynamic | Dynamic High | Static |
|---|---|---|---|
| Initial angular position | U(−15,15) | U(−20,20) | U(−10,10) |
| Maximum angular position | U(−15,15) | U(−20,20) | - |
| Period | U(4,40) | U(4,40) | - |
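A minimal sketch of how the ranges in Table 2 could be sampled for the three settings is shown below; the dictionary keys and the per-axis handling (three values per angular parameter, as listed in Table 5) are illustrative rather than the EMULATE API.

```python
import numpy as np

# Uniform sampling ranges from Table 2 (degrees for angles, seconds for the period).
CONFIGS = {
    "dynamic":      {"init": (-15, 15), "max_pos": (-15, 15), "period": (4, 40)},
    "dynamic_high": {"init": (-20, 20), "max_pos": (-20, 20), "period": (4, 40)},
    "static":       {"init": (-10, 10), "max_pos": None,      "period": None},
}

def sample_head_parameters(setting, rng=None):
    """Sample one head-movement configuration; angular parameters are drawn
    per axis (three values each, cf. Table 5)."""
    rng = rng or np.random.default_rng()
    cfg = CONFIGS[setting]
    params = {"initial_angular_position": rng.uniform(*cfg["init"], size=3)}
    if cfg["max_pos"] is not None:       # only the dynamic variants move during recording
        params["maximum_angular_position"] = rng.uniform(*cfg["max_pos"], size=3)
        params["period"] = rng.uniform(*cfg["period"], size=3)
    return params
```

For example, `sample_head_parameters("dynamic_high")` draws wider initial and maximum angles than the dynamic setting, matching the ranges in the table.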
Table 3.
A comparison of the three global F1 scores across different architectures during training with baseline augmentations, both with and without the integration of the proposed methods (dynamic and dynamic high variants), on the saccade visual task. The best global F1 score for each method within the three setups is highlighted in bold.
| Model | Technique | EMULATE Disabled Macro F1 | EMULATE Disabled Neg. F1 | EMULATE Disabled Pos. F1 | Dynamic Macro F1 | Dynamic Neg. F1 | Dynamic Pos. F1 | Dynamic High Macro F1 | Dynamic High Neg. F1 | Dynamic High Pos. F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | Cutout | 69.7 | 89.1 | 50.2 | 69.9 | 89.0 | 50.8 | 69.7 | 89.1 | 50.4 |
| | Dropout | 69.8 | 87.9 | 51.6 | 71.0 | 89.1 | 52.8 | 70.5 | 88.8 | 52.3 |
| | CutMix | 71.3 | 89.4 | 53.3 | 70.5 | 89.0 | 52.0 | 70.1 | 89.2 | 51.1 |
| | MixUp | 69.8 | 89.4 | 50.3 | 70.0 | 89.3 | 50.7 | 69.8 | 89.3 | 50.2 |
| HTCE-MEAN | Cutout | 68.9 | 88.1 | 49.8 | 70.1 | 88.8 | 51.5 | 70.1 | 88.6 | 51.5 |
| | Dropout | 69.0 | 87.2 | 50.7 | 69.7 | 88.1 | 51.3 | 70.2 | 88.6 | 51.9 |
| | CutMix | 71.6 | 89.6 | 53.5 | 71.0 | 89.4 | 52.7 | 70.8 | 89.1 | 52.4 |
| | MixUp | 70.8 | 89.2 | 52.4 | 70.7 | 89.4 | 51.9 | 70.6 | 88.9 | 52.2 |
| HTCSE | Cutout | 69.5 | 88.1 | 50.9 | 70.6 | 88.9 | 52.2 | 70.9 | 89.1 | 52.7 |
| | Dropout | 70.4 | 88.4 | 52.5 | 70.8 | 88.5 | 53.1 | 70.9 | 88.5 | 53.3 |
| | CutMix | 71.5 | 89.1 | 53.9 | 70.7 | 89.0 | 52.4 | 70.6 | 88.9 | 52.3 |
| | MixUp | 70.7 | 88.6 | 52.8 | 70.8 | 89.2 | 52.4 | 70.6 | 88.9 | 52.2 |
Table 4.
A comparison of the three global F1 scores across different architectures during training with baseline augmentations, both with and without the integration of the proposed methods (dynamic and dynamic high variants) on the vergence visual task. The best global F1 score for each method within the three setups is highlighted in bold.
| Model | Technique | EMULATE Disabled Macro F1 | EMULATE Disabled Neg. F1 | EMULATE Disabled Pos. F1 | Dynamic Macro F1 | Dynamic Neg. F1 | Dynamic Pos. F1 | Dynamic High Macro F1 | Dynamic High Neg. F1 | Dynamic High Pos. F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| HTCE-MAX | Cutout | 67.2 | 86.9 | 47.5 | 69.3 | 89.0 | 49.6 | 69.2 | 88.6 | 49.7 |
| | Dropout | 68.3 | 88.0 | 48.6 | 69.6 | 88.6 | 50.6 | 69.6 | 88.6 | 50.6 |
| | CutMix | 70.1 | 89.1 | 51.0 | 69.8 | 89.3 | 50.3 | 69.7 | 89.3 | 50.1 |
| | MixUp | 69.2 | 89.5 | 48.9 | 69.4 | 89.1 | 49.6 | 69.2 | 89.3 | 49.1 |
| HTCE-MEAN | Cutout | 68.2 | 88.6 | 47.8 | 69.4 | 88.3 | 50.5 | 69.6 | 89.5 | 49.8 |
| | Dropout | 66.8 | 86.0 | 47.6 | 69.1 | 88.2 | 50.0 | 69.3 | 88.0 | 50.6 |
| | CutMix | 70.2 | 89.2 | 51.1 | 69.9 | 89.2 | 50.5 | 69.9 | 89.0 | 50.8 |
| | MixUp | 69.5 | 89.6 | 49.4 | 69.8 | 89.4 | 50.1 | 69.7 | 88.8 | 50.6 |
| HTCSE | Cutout | 68.5 | 88.2 | 48.8 | 69.3 | 88.2 | 50.3 | 69.4 | 88.5 | 50.3 |
| | Dropout | 68.2 | 87.4 | 49.1 | 69.3 | 88.0 | 50.6 | 69.5 | 88.0 | 51.1 |
| | CutMix | 70.1 | 88.9 | 51.2 | 69.6 | 88.7 | 50.6 | 69.4 | 88.6 | 50.2 |
| | MixUp | 69.3 | 88.3 | 50.4 | 69.7 | 88.5 | 50.9 | 69.5 | 88.6 | 50.4 |
Table 5.
Presentation of the sampling parameters.
| Parameter | Dynamic | Static |
|---|---|---|
| Coordinates interpolation parameter (1 parameter) | X | X |
| Initial angular position (3 parameters) | X | X |
| Maximum angular rotation amplitude (3 parameters) | X | |
| Sinusoidal period (3 parameters) | X | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).