Three-Stage Recursive Learning Technique for Face Mask Detection on Imbalanced Datasets

Tsai, Chi-Yi; Shih, Wei-Hsuan; Nisar, Humaira

doi:10.3390/math12193104


by Chi-Yi Tsai ¹,*, Wei-Hsuan Shih ¹ and Humaira Nisar ²

¹ Department of Electrical and Computer Engineering, Tamkang University, No. 151, Yingzhuan Road, Tamsui District, New Taipei City 251, Taiwan

² Department of Electronic Engineering, Faculty of Engineering and Green Technology, Universiti Tunku Abdul Rahman, Kampar 31900, Malaysia

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(19), 3104; https://doi.org/10.3390/math12193104 (registering DOI)

Submission received: 23 August 2024 / Revised: 26 September 2024 / Accepted: 30 September 2024 / Published: 4 October 2024

(This article belongs to the Special Issue Advances in Algorithm Design and Machine Learning)


Abstract:

In response to the COVID-19 pandemic, governments worldwide have implemented mandatory face mask regulations in crowded public spaces, making the development of automatic face mask detection systems critical. To achieve robust face mask detection performance, a high-quality and comprehensive face mask dataset is required. However, due to the difficulty in obtaining face samples with masks in the real world, public face mask datasets are often imbalanced, leading to the data imbalance problem in model training and negatively impacting detection performance. To address this problem, this paper proposes a novel recursive model-training technique designed to improve detection accuracy on imbalanced datasets. The proposed method recursively splits and merges the dataset based on the attribute characteristics of different classes, enabling more balanced and effective model training. Our approach demonstrates that the carefully designed splitting and merging of datasets can significantly enhance model-training performance. This method was evaluated using two imbalanced datasets. The experimental results show that the proposed recursive learning technique achieves a percentage increase (PI) of 84.5% in mean average precision (mAP@0.5) on the Kaggle dataset and of 186.3% on the Eden dataset compared to traditional supervised learning. Additionally, when combined with existing oversampling techniques, the PI on the Kaggle dataset further increases to 88.9%, highlighting the potential of the proposed method for improving detection accuracy in highly imbalanced datasets.

Keywords:

imbalanced data; recursive learning; face mask detection; object detection

MSC:

68T10; 68T45

1. Introduction

The primary mode of virus transmission between individuals is through respiratory droplets expelled from the mouth. Scientists have demonstrated that wearing face masks can effectively prevent the spread of COVID-19 [1]. Mask-wearing reduces infection risk and complements proper social distancing and hygiene practices. In recent years, deep learning methods have increasingly been used to detect face mask-wearing in public spaces: deep neural network models learn from large amounts of annotated data, and the trained model can then accurately identify the wearing status of face masks.

It is challenging to detect faces in public spaces [2]. Deep learning models require a sufficient and diverse set of effective samples, encompassing variations in lighting, angles, and facial expressions to adapt to complex scenarios. Traditional face detection techniques often rely on understanding the entire face. However, the difficulty of face detection is further heightened due to the partial occlusion caused by the face mask. Other factors to consider, such as shape, color, and coverage, add to the complexity of face detection.

In such application scenarios, the model must learn mask-specific features well enough to detect reliably whether a face is wearing a mask correctly. This task involves determining both whether a mask is being worn and whether it covers critical areas such as the nose and mouth. Such fine-grained detection capability makes the model more effective in practical applications.

Figure 1 illustrates the three conditions of face mask-wearing. Most studies on mask detection consider only whether a mask is worn or not and ignore whether it is worn correctly. However, having any coverage of a mask does not guarantee effective virus prevention; the mask must cover both the nose and mouth. It is therefore essential to detect incorrect mask-wearing as a separate class. Because incorrect mask-wearing provides no preventive effect against viruses in practical applications, mask detection without such labeling is almost futile.

Selecting a high-quality and comprehensive face mask dataset is crucial to achieving superior performance in face mask detection. However, samples of incorrect face mask-wearing are difficult to obtain in real-life scenarios, so most datasets contain too few of them. This imbalance in the number of annotated samples between classes can negatively impact feature extraction and detection. It is well known that when the classes with more annotations significantly outweigh those with fewer annotations, the model tends to predict the class with more data. This bias adversely affects model-training performance, potentially causing overfitting, errors, and biases, and ultimately leads to a significant decrease in accuracy.

In order to solve the above data imbalance problem, this paper aims to improve the effectiveness of model training in the face mask detection task by combining recursive learning with a split-and-merge process on the imbalanced dataset. Specifically, a three-stage recursive learning method is proposed for the imbalanced face mask dataset, which improves the training result of the detection model by splitting and merging the imbalanced classes in the dataset. Note that the purpose of this study is not to introduce a new face mask detection model. Instead, our focus is on developing a novel model-training method specifically designed to address the challenge of imbalanced face mask datasets. The main contributions of this study include the following:

(1): The proposed method effectively overcomes the challenge of detecting incorrect mask-wearing in the face mask detection task. The proposed operations of splitting and merging datasets can effectively improve the detector’s detection performance for incorrect mask-wearing classes.
(2): We introduce the recursive learning method of the detection model to solve the model-training problems caused by data imbalance and conduct experimental comparison and evaluation with other methods.
(3): For detector selection, we evaluate the detection performance of the latest YOLO series detectors in the face mask detection task, including YOLOv6 [3], YOLOv7 [4], YOLOv8 [5], and YOLOX [6]. The experimental results show that applying the proposed model-training method to the YOLOv7 detector yields the best training performance. Although there are numerous object detection networks, such as Faster R-CNN [7], SSD [8], and CenterNet [9], YOLO detection networks have a relatively long and diverse history of development. Therefore, we selected the YOLO detectors for performance evaluation in this study.

We use two imbalanced mask datasets in our experiments for performance evaluation. The experimental results show that compared with the traditional supervised learning method, the proposed recursive learning method improves the mAP@0.5 metric by approximately 84.5% on the Kaggle Face Mask Dataset (Kaggle dataset for short) [10]. On the other hand, the proposed recursive learning method improves the mAP@0.5 metric by about 186.3% on the Eden dataset [11].

The rest of this paper is organized as follows. Section 2 discusses related work concerning model training on imbalanced datasets. Section 3 introduces the recursive learning method proposed in this study. Section 4 provides the pseudocode of our method. Section 5 describes the experimental setup and reports the results obtained on public face mask datasets. Finally, Section 6 concludes this study and proposes potential directions for future research.

2. Related Work

Researchers have proposed various methods for training models on imbalanced data while preserving performance on the minority classes. These methods are described in detail below.

2.1. Training Methods for Imbalanced Data

One way to address the uneven number of samples across object classes is resampling, which includes oversampling and undersampling. Chawla et al. [12] proposed the synthetic minority oversampling technique (SMOTE), which takes a minority-class sample, selects one of its K nearest minority-class neighbors, and inserts a new synthetic sample on the line segment between the two. Repeating this process increases the number of minority-class samples, which helps improve the model’s generalization ability. He et al. [13] proposed the adaptive synthetic sampling technique (ADASYN), which calculates weights based on the density of samples of each minority object class. Classes with low sample density receive more synthetic samples, and classes with high sample density receive fewer, so the number of new samples to synthesize is determined according to need. Using generative adversarial network (GAN) technology to increase the data of specific classes is another way to adjust the original data to balance object classes [14]. However, this technique focuses on data augmentation, which differs from the model-training approach discussed in this paper. Therefore, techniques for handling data imbalance using GANs are not reviewed in this study.
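To make the interpolation step of SMOTE concrete, the following is a minimal sketch in plain Python. It assumes the minority class is given as a list of numeric feature vectors; the function name `smote` and the parameters `k`, `n_new`, and `seed` are illustrative choices, not part of the cited work, and the code is not applied to detection images in this paper.

```python
import random

def smote(minority, k=3, n_new=4, seed=0):
    """Create n_new synthetic points from minority-class feature vectors.

    Illustrative sketch only: names and defaults are assumptions.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        # Pick one of the k nearest minority neighbours.
        neighbour = rng.choice(others[:k])
        # Place the new sample at a random position on the segment x -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(x, neighbour)])
    return synthetic

minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority)
print(len(new_points))
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies inside the region already spanned by the minority class.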

Oksuz et al. [15] conducted a comprehensive discussion on the imbalance problem in object detection, including the imbalance between foreground and background objects and the imbalance between foreground objects. They reviewed strategies such as resampling, transfer learning, adjusting loss functions, and GAN-based data generation to augment datasets. Mursalim and Kurniawan [16] integrated the architecture of ResNet-34 and multi-core CNN blocks to deal with the imbalance of chest X-ray datasets and improve COVID-19 diagnosis. Seo and Kim [17] proposed using the SMOTE resampling method to solve class imbalance, together with an optimization strategy for class-imbalanced datasets. Ahmed and Saini [18] proposed using undersampling or oversampling to detect fraudulent credit card transactions in highly class-imbalanced datasets. Islam et al. [19] divided the dataset into minority-class and majority-class datasets. The majority-class dataset was then divided into multiple sub-datasets, each containing as many samples as the minority-class dataset. The minority-class dataset was then combined with each sub-dataset of the majority class to create balanced datasets, solving the class imbalance problem. Rustogi et al. [20] used SMOTE and an extreme learning machine to classify binary imbalanced data and obtained higher F-measure, G-mean, and ROC scores. Ali et al. [21] proposed using a multi-layer perceptron to identify faults in rotating machines and used SMOTE to deal with the classification problem of imbalanced classes in the mechanical fault database.

Chavda et al. [22] proposed a two-stage CNN architecture for the class-imbalanced mask detection dataset and used two datasets, RMFD [23] and Kaggle, for performance verification. However, their study classified the incorrect-mask-wearing class in the Kaggle dataset into the no-mask-wearing class. Loey et al. [24] combined the medical mask dataset and the Kaggle dataset to regenerate a new dataset for mask detection. Liu and Ren [25] considered the small size and extreme imbalance of the Kaggle dataset and used simple CNAPS to improve classification performance.

2.2. Related Work on Recursive Learning

Recursive learning requires utilizing the output of previous iterations as an input to subsequent iterations, thereby facilitating the extraction of higher-level features, feature enhancement, or the refinement of predictions. This iterative process helps improve performance, especially in tasks with complex structures or hierarchies. Recursive learning has applications in various fields, such as image super-resolution, photo selection, image segmentation, and other tasks that present recursive structures. For example, Liu et al. [26] introduced an information-augmented recursive learning network architecture customized for automatic pancreas segmentation. Li et al. [27] proposed a multi-stage object detection method based on group recursive learning, which utilizes recursively learned segmentation features to enhance end-to-end object detection. Hung et al. [28] used recursive learning to enhance the learning effect of deep neural networks to achieve enhanced image super-resolution quality without increasing model parameters. Wu et al. [29] designed a recursive multi-relational graph convolutional network for automatic photo selection operations, using recursive learning to enhance the effectiveness of the graph convolutional neural network.

In addition, Yue et al. [30] designed a recursive triple-path learning network to use cross-modal information to reduce information loss, proving that the recursive learning method can further improve the accuracy of scene analysis. Kang et al. [31] proposed a recursive learning method combined with a stage-dependent loss function, where recursive learning enhances the model’s ability to capture complex image details. These studies highlight the effectiveness of recursive learning methods in different applications and demonstrate their potential to improve model performance and accuracy.

2.3. Existing Face Mask Datasets Used in the Literature

In this study, we utilized the following two publicly available face mask datasets for model training and performance evaluation:

(1): Kaggle Dataset [10]: This publicly available dataset contains 853 images in three classes, as shown in Figure 1, and their bounding boxes in PASCAL VOC format. Note that there is a clear imbalance in the incorrectly worn mask class.
(2): Eden Dataset [11]: Eden Digital Shelter established this dataset during the early stages of the pandemic. It aims to fulfill public health needs by offering relevant training and testing data. The dataset’s detailed annotations also include three classes: without a mask (marked as Bad), correct mask-wearing (marked as Good), and incorrect mask-wearing (marked as None). Similarly, it also has a clear imbalance in the None class.

Table 1 compares the number of annotations in the Kaggle and Eden datasets. Both datasets contain images with different characteristics, such as gender, race, presence or absence of crowds, and images of different sharpness, making them suitable for challenging the robustness of mask detectors. However, both datasets exhibit a notable imbalance, with relatively few annotations for incorrect mask-wearing, representing only 2.88% and 2.5% of the total samples in the Kaggle and Eden datasets, respectively. This data imbalance makes detecting and identifying this class more challenging.

2.4. Related Work on Face Mask Detection Methods

Currently, several studies have conducted experiments using the Kaggle dataset. One of these studies [24] considers the detection of three classes: correct mask-wearing, without a mask, and incorrect mask-wearing. This study aims to comprehensively learn and identify different mask-wearing states. However, the proposed detection model tested on the Kaggle dataset achieved a precision of only 81%.

In [22], the authors treated incorrect mask-wearing as equivalent to not wearing a mask and merged the two into the same class. Although this processing method can achieve an accuracy of 99%, it may ignore subtle differences in incorrect wearing. The detection model only needs to identify the characteristics of the mask and thus never has to identify the mask-wearing state.

On the other hand, some studies ignore the detection of incorrect mask-wearing and focus only on detecting correct mask-wearing and the absence of a mask [32,33]. For example, the study in [32] combines a set of face mask datasets and ignores the class of incorrect mask-wearing to achieve better model-training results. Das et al. [33] addressed the variation in face mask types and face skin colors in the dataset and divided the annotation of training samples into two classes: correctly wearing and not wearing masks. Although this method achieved good training results, reaching 99% accuracy, it cannot detect incorrectly worn masks in practical applications, which limits its practical prevention effect. Benifa et al. [34] proposed a lightweight deep learning model, FMDNet, to detect face mask violations in public areas, achieving 99.0% accuracy and outperforming other lightweight models. However, this method only considers two types of face images: with and without masks. Table 2 summarizes the performance of the above literature on the Kaggle dataset. Since the Eden dataset was released in 2020, there are currently no relevant experimental results on this dataset for reference.

3. The Proposed Recursive Learning Method

In the imbalanced face mask dataset, we observed a relatively small number of samples with incorrectly worn masks, which poses a challenge to the model’s identification accuracy. Upon further inspection, we found that incorrectly worn masks often appear very similar to correctly worn masks. This situation makes it difficult for detection models to distinguish between the two classes in traditional supervised learning methods. This problem motivates us to develop a new model-training method to improve the discrimination performance of the detection model for these two classes.

Figure 2 shows the difference between the proposed recursive and traditional supervised learning methods. Consider a multiclass imbalanced dataset containing N different classes. Traditional model training is usually based on supervised learning, where the model is trained directly on this dataset, and its weights are updated. In contrast, the proposed recursive learning method incorporates dataset manipulation into the model-training process. This manipulation involves splitting and merging the original dataset to generate multiple new datasets in different combinations. The detection model is recursively trained at different stages using these combined datasets. Note that the dimension of the head layer must be adjusted according to the number of classes in each dataset when training the model. The model weights obtained after the final stage are taken as the final training result.

3.1. Dataset Split-and-Merge Processing

Figure 3 shows the concept of the dataset split-and-merge processing proposed in this study. One possible way to enhance the model’s identification ability is to adjust the dataset’s labels by reassigning the labels of the data samples to merge smaller classes with similar attributes. This strategy helps reduce the negative effects of data imbalance in the original dataset. Specifically, merging multiple classes with similar attributes helps the model learn key features across similar classes, thereby improving its ability to differentiate between distinct classes. Taking the face mask detection task as an example, we can combine the correct mask-wearing (Class 1) and incorrect mask-wearing (Class 3) samples with similar attributes into the same class (Class 1 + Class 3) and the samples of no mask-wearing (Class 2) into another class to generate a combination dataset. This combination helps the detection model focus on learning key features between classes with significant attribute differences during the first stage of model training.

Then, the two classes (Class 1 and Class 3) with similar attributes are used to generate a binary dataset, which is used to train the detection model in the second stage. Training at this stage enables the detection model to learn detailed features that can effectively distinguish similar classes based on the shared key features among them. Therefore, this training stage helps improve the model’s accuracy in identifying incorrect mask-wearing situations. In the final stage of model training, the original dataset is used to further refine and optimize the detection model. The purpose of the final training stage is to improve the overall detection accuracy of the model on the original dataset, primarily when the detection model has obtained the critical features needed to identify similar classes through the previous two model-training stages. Note that the proposed dataset split-and-merge strategy could be broadly applied to various imbalanced classification problems, especially when dealing with classes that share similar attributes, leading to challenges in distinguishing between them.
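The relabeling described above can be sketched as a small Python function. This is a minimal illustration under stated assumptions: annotations are assumed to be `(box, class_id)` pairs, the class ids 1, 2, and 3 follow the numbering used in the text, and the function name `split_merge` and the toy boxes are hypothetical.

```python
# Assumed class ids, following the text:
# 1 = correct mask-wearing, 2 = no mask-wearing, 3 = incorrect mask-wearing.
CORRECT, NO_MASK, INCORRECT = 1, 2, 3

def split_merge(annotations, stage):
    """Return the (box, label) pairs used in a given stage, labels re-indexed from 0."""
    if stage == 1:
        # Merge the two similar classes into one label; no-mask is the other label.
        remap = {CORRECT: 0, INCORRECT: 0, NO_MASK: 1}
    elif stage == 2:
        # Keep only the two similar classes and separate them.
        remap = {CORRECT: 0, INCORRECT: 1}
    elif stage == 3:
        # Original three-class dataset.
        remap = {CORRECT: 0, NO_MASK: 1, INCORRECT: 2}
    else:
        raise ValueError("stage must be 1, 2, or 3")
    # Annotations whose class is absent from the mapping are dropped (stage 2 only).
    return [(box, remap[label]) for box, label in annotations if label in remap]

toy = [((10, 10, 50, 50), CORRECT),
       ((60, 20, 90, 70), NO_MASK),
       ((5, 5, 30, 40), INCORRECT)]
for stage in (1, 2, 3):
    print(stage, [label for _, label in split_merge(toy, stage)])
```

In this sketch the stage-2 dataset simply drops the no-mask annotations; how images that contain only no-mask faces are handled in practice is not specified here and would be a design decision of the implementation.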

3.2. Three-Stage Recursive Learning

After the above dataset split-and-merge process, two additional binary classification datasets can be generated. Therefore, the proposed recursive learning method contains three stages of model training, as shown in Figure 4. Each model-training stage includes three steps: dataset split-and-merge, head model adjustment, and model training and weight updating. The dataset split-and-merge step can generate the training dataset required for each stage. The head model adjustment step adjusts the output dimension according to the number of classes in the training dataset at each stage. The last step is to optimize the model and update the model weights. Since the training datasets generated in the first and second stages are binary, the head model architecture of these two stages needs to be changed to a binary classification branch, and the binary cross-entropy loss function is used for binary classification training. The head model architecture in the last stage is restored to the multiclass classification branch, and the multiclass cross-entropy loss function is used for multiclass classification training.

Note that we can extend the proposed method to handle imbalanced datasets with multiple classes in a similar way. First, we generate an initial combination dataset for the first stage of model training, enabling the model to learn critical features required to distinguish classes with different attributes. Subsequently, we select classes with similar attributes and pair and merge them for model training, enabling a more accurate discrimination of their feature differences. Finally, we reintroduce the original dataset to enhance the model’s ability to identify features of all classes. This multi-stage recursive learning method ensures that the model can more accurately identify imbalanced classes when faced with datasets containing multiple imbalanced classes, thus improving its overall recognition performance.

4. Implementation of the Proposed Method

This section presents the technical details of the proposed recursive learning method. Figure 5 shows the flowchart of the proposed recursive learning method. Algorithm 1 is the core algorithm of model recursive learning, which executes the three steps of each training stage described previously. Algorithm 2 is the first training step, focusing on splitting and merging multiclass imbalanced datasets. Algorithm 3 describes the second training step, in which the head network layer of the model is adjusted. Algorithm 4 explains the third training step, where the model weights are updated based on the loss function. If the training has not yet completed three cycles, the process will repeat recursively until the final model weights are generated after the third stage, and then the training process ends. Note that the detector model in each stage shares parameters and features learned in the previous stage, as shown in Figure 5.

Algorithm 1 Three-Stage Recursive Learning

1.: Input: D_in, W₀

2.: Output: W_f

3.: W = W₀

4.: For stage = 1:3 do

5.: D_m = Split_Merge(D_in, stage)

6.: M_a = Head_Adjust(W, stage)

7.: W = Model_Training(M_a, D_m, stage)

8.: End for

9.: W_f = W

10.: Return W_f

Algorithm 2 Split_Merge

1.: Input: D_in, stage

2.: Output: D_m

3.: $[C_{1 (L = 0)}, C_{2 (L = 1)}, C_{3 (L = 2)}]$ ← Split(D_in)

4.: If stage == 1 then

5.: $C_{13 (L = 0)}$ = Merge($C_{1 (L = 0)}$, $C_{3 (L = 2)}$)

6.: $D_{m}$ = Combine($C_{13 (L = 0)}$, $C_{2 (L = 1)}$)

7.: else if stage == 2 then

8.: $D_{m}$ = Combine($C_{1 (L = 0)}$, $C_{3 (L = 1)}$)

9.: else if stage == 3 then

10.: D_m = D_in

11.: End if

12.: Return D_m

Algorithm 3 Head_Adjust

1.: Input: M, stage

2.: Output: M_a

3.: If stage == 1 or stage == 2 then

4.: M_a = CreateModel(M, 2)

5.: else if stage == 3 then

6.: M_a = CreateModel(M, 3)

7.: End if

8.: Return M_a

Algorithm 4 Model_Training

1.: Input: M_a, D_m, stage

2.: Output: W

3.: If stage == 1 or stage == 2 then

4.: Loss_class = Loss_BCE

5.: else if stage == 3 then

6.: Loss_class = Loss_MCE

7.: End if

8.: Loss_tol = Loss_BR + Loss_conf + Loss_class

9.: W = TrainModel(D_m, M_a, Loss_tol)

10.: Return W

Table 3 and Table 4 list the symbol and function definitions used in this study, respectively. Algorithm 1 presents the main framework of the proposed recursive learning. Given an input dataset D_in and the detection model’s initial weight W₀, Algorithm 1 performs model training on the input model in three stages. Each stage executes the Split_Merge (Algorithm 2), Head_Adjust (Algorithm 3), and Model_Training (Algorithm 4) steps sequentially.

Algorithm 2 performs the dataset splitting and merging required by the proposed recursive learning method. In Algorithm 2, the Split function splits all classes in the input dataset D_in, while the Merge function merges two classes into a single class with the same label. Additionally, the Combine function combines samples from different classes to form a new dataset. The initial step of Algorithm 2 involves dividing the input dataset into samples belonging to each class through the Split function. In the first stage of model training, classes C₁ and C₃ with similar attributes are merged into one class, C₁₃, using the Merge function. Subsequently, the C₁₃ and C₂ classes are combined through the Combine function to obtain the dataset required for the first model-training stage. In the subsequent stage, the Combine function combines the C₁ and C₃ classes to generate the required dataset for the second model-training stage. Finally, in the third stage, the original dataset serves as the training dataset required for model training.

Algorithm 3 describes the head layer adjustments corresponding to different model-training stages. In Algorithm 3, the CreateModel function adjusts the head layer of the input model to the specified output dimension while keeping the backbone network layer and neck network layer unchanged. In both the first and second training stages, the output dimension of the head layer of the input model is 2. In the third training stage, the output dimension of the head layer of the input model is adjusted to 3. Finally, the model, after adjusting the output dimension, is returned.
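The control flow of Algorithms 1 and 3 can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: model weights are represented as a dictionary of lists, `create_model` mimics CreateModel by rebuilding only the `"head"` entry, and `train_model` is a stand-in that performs no real optimization. The key names, `feat_dim`, and the random initialization are assumptions made for illustration.

```python
import random

def create_model(weights, num_classes, feat_dim=4, seed=0):
    """Keep backbone/neck entries unchanged and rebuild the head for num_classes outputs."""
    rng = random.Random(seed)
    adjusted = {name: value for name, value in weights.items() if name != "head"}
    # New head: one weight row per output class (randomly initialised, an assumption).
    adjusted["head"] = [[rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
                        for _ in range(num_classes)]
    return adjusted

def train_model(model):
    """Stand-in for training: a real version would optimise the model on the stage dataset."""
    return dict(model)

def recursive_learning(initial_weights):
    """Three-stage loop: adjust the head, train, and pass the weights to the next stage."""
    weights, head_sizes = initial_weights, []
    for stage in (1, 2, 3):
        num_classes = 2 if stage in (1, 2) else 3
        model = create_model(weights, num_classes)
        weights = train_model(model)
        head_sizes.append(len(weights["head"]))
    return weights, head_sizes

start = {"backbone": [1.0, 2.0], "neck": [3.0], "head": [[0.0] * 4] * 80}
final_weights, sizes = recursive_learning(start)
print(sizes)
```

The point of the sketch is that the backbone and neck entries flow from one stage to the next, while only the head is resized to match the number of classes in the current stage's dataset.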

Algorithm 4 represents the model-training procedure based on the merged dataset D_m, the adjusted model M_a, and a total loss function, Loss_tol. Here, the total loss function is defined as follows:

$Loss_{tol} = Loss_{BR} + Loss_{conf} + Loss_{class},$

(1)

where the first, second, and third terms represent the box regression loss, the confidence loss, and the classification loss, respectively. The box regression loss Loss_BR evaluates the difference between the ground truth and the predicted bounding boxes. The box regression loss is given by

$Loss_{BR} = 1 - IOU + \frac{D^{2}}{C^{2}} + V,$

(2)

where IOU is the intersection-over-union index and is defined as follows:

$IOU = \frac{A \cap B}{A \cup B},$

(3)

where A is the ground truth bounding box and B is the predicted bounding box. The operators ∩ and ∪ represent the intersection and union areas of two bounding boxes, respectively. As shown in Figure 6, D is the Euclidean distance between the center points of the ground truth box and the predicted box. C is the diagonal distance of the minimum enclosing region containing both the ground truth box and the predicted box. V is a metric function, defined as follows:

$V = \begin{cases} 0, & \text{if } IOU < 0.5, \\ \frac{v}{1 - IOU + v}, & \text{otherwise}, \end{cases}$

(4)

where v measures the consistency between the aspect ratios (width to height) of the two bounding boxes and is expressed by

$v = \frac{4}{\pi^{2}} \left( \tan^{-1} \frac{W^{real}}{H^{real}} - \tan^{-1} \frac{W}{H} \right)^{2},$

(5)

where W^real and H^real, respectively, are the width and height of the ground truth bounding box. W and H, respectively, are the width and height of the predicted bounding box.
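Equations (2)–(5) can be checked numerically with a short Python sketch. It assumes boxes are given as `(x1, y1, x2, y2)` corner coordinates with positive width and height; the function names are illustrative, and the guard for `v == 0` is an added safeguard against division by zero for identical boxes rather than part of the equations.

```python
import math

def iou(a, b):
    """Eq. (3): intersection area over union area of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def box_regression_loss(gt, pred):
    """Eq. (2): 1 - IOU + D^2 / C^2 + V."""
    i = iou(gt, pred)
    # D^2: squared distance between the two box centres.
    d2 = (((gt[0] + gt[2]) - (pred[0] + pred[2])) / 2) ** 2 \
       + (((gt[1] + gt[3]) - (pred[1] + pred[3])) / 2) ** 2
    # C^2: squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(gt[2], pred[2]) - min(gt[0], pred[0])) ** 2 \
       + (max(gt[3], pred[3]) - min(gt[1], pred[1])) ** 2
    # Eq. (5): aspect-ratio term v.
    v = (4 / math.pi ** 2) * (
        math.atan((gt[2] - gt[0]) / (gt[3] - gt[1]))
        - math.atan((pred[2] - pred[0]) / (pred[3] - pred[1]))
    ) ** 2
    # Eq. (4): V is zero below IOU 0.5; the v == 0 check avoids 0/0 (added safeguard).
    big_v = 0.0 if (i < 0.5 or v == 0.0) else v / (1 - i + v)
    return 1 - i + d2 / c2 + big_v

print(box_regression_loss((0, 0, 2, 2), (0, 0, 2, 2)))
print(box_regression_loss((0, 0, 2, 2), (3, 3, 5, 5)))
```

For identical boxes every term vanishes, while for disjoint boxes the IOU term saturates at 1 and the centre-distance term still provides a signal that depends on how far apart the boxes are.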

The confidence loss Loss_conf evaluates the model’s confidence in whether the predicted bounding box contains an object of interest. Here, we use binary cross-entropy (BCE) loss as the Loss_conf function, and the formula is as follows:

$Loss_{conf} = - [s \log (y) + (1 - s) \log (1 - y)],$

(6)

where y is the predicted probability of the object’s existence. s is a binary variable used to represent the target probability of the object’s existence. s = 1 means the object exists, and s = 0 means the object does not exist.

The classification loss Loss_class measures the loss between the predicted and actual class probability. Here, we use the BCE loss in the first and second stages and the multiclass cross-entropy loss in the final stage. Therefore, the formula for this loss is given by

$Loss_{class} = \begin{cases} - \sum t \log (z), & \text{if } stage = 3, \\ - [t \log (z) + (1 - t) \log (1 - z)], & \text{otherwise}, \end{cases}$

(7)

where z is the predicted probability of the class, and t is the corresponding actual probability.
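The stage-dependent classification loss of Equation (7) can be sketched in plain Python as follows. The function names and the clamping constant `EPS` are assumptions added for numerical safety; in stages 1 and 2 the target t and prediction z are treated as scalars, while in stage 3 they are treated as per-class sequences. The same `bce` helper also corresponds to the confidence loss in Equation (6), with s and y in place of t and z.

```python
import math

EPS = 1e-12  # numerical safeguard against log(0); not part of the equations

def bce(target, prob):
    """Binary cross-entropy for one scalar target and one predicted probability."""
    prob = min(max(prob, EPS), 1 - EPS)
    return -(target * math.log(prob) + (1 - target) * math.log(1 - prob))

def multiclass_ce(targets, probs):
    """Multiclass cross-entropy: minus the sum of t * log(z) over the classes."""
    return -sum(t * math.log(max(z, EPS)) for t, z in zip(targets, probs))

def classification_loss(t, z, stage):
    """Eq. (7): multiclass cross-entropy in stage 3, binary cross-entropy otherwise."""
    if stage == 3:
        return multiclass_ce(t, z)
    return bce(t, z)

print(classification_loss(1, 0.5, stage=1))
print(classification_loss([0, 0, 1], [0.2, 0.3, 0.5], stage=3))
```

Both calls evaluate a prediction of 0.5 for the true class, so the two losses coincide in this particular example; they differ in general because the binary form also penalizes the complementary probability.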

5. Experimental Results

In the experiment, we used the RTX 3070 graphics processing unit in a notebook computer to perform model-training operations and verify the proposed model-training method. Regarding the software environment, we set up the experimental environment using the Windows 10 operating system and the Anaconda development kit (Anaconda Inc., Austin, TX, USA), and we developed programs using Python 3.9 and PyTorch 1.12.1. Detailed computer hardware and system software specifications are shown in Table 5.

We used some commonly used performance metrics for performance evaluation, including mean average precision (mAP), precision, recall, F1-measure, and percentage increase (PI) metrics. The definitions and descriptions of these metrics are as follows:

mAP@0.5 (↑): The mean average precision calculated for detected target objects with an IOU threshold of 0.5.

Precision (↑): The ratio of target objects correctly predicted by the model to the total predicted target objects; its formula is as follows:

$Precision = \frac{TP}{TP + FP}.$

(8)

Recall (↑): The ratio of target objects correctly predicted by the model to the total actual target objects; its formula is as follows:

$Recall = \frac{TP}{TP + FN}.$

(9)

F1-measure (↑): The harmonic mean of precision and recall, as follows:

$F1\text{-}measure = \frac{2 \times Recall \times Precision}{Recall + Precision}.$

(10)

PI (↑): The measurement of percent change between the final value and the initial value, as follows:

$PI\,(\%) = \frac{\text{Final Value} - \text{Initial Value}}{\text{Initial Value}} \times 100\%.$

(11)

In Equations (8) and (9), TP, FP, and FN represent the number of True Positives, False Positives, and False Negatives, respectively. Here, the symbol ↑ (↓) indicates that the higher (lower) the metric, the better the detection performance.
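The metric definitions above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation code used in the experiments: boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, the function names are hypothetical, and the full mAP computation (which also requires ranking detections by confidence) is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """Equation (8); returns 0.0 when nothing was predicted."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """Equation (9); returns 0.0 when there are no actual objects."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_measure(p, r):
    """Equation (10): harmonic mean of precision and recall."""
    return 2 * r * p / (r + p) if r + p else 0.0


def percentage_increase(initial, final):
    """Equation (11): percent change from the initial to the final value."""
    return (final - initial) / initial * 100.0


# Two boxes that overlap by half of their width: IoU = 50 / 150, below the
# 0.5 threshold, so this detection would not count as a true positive.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))

# F1 from the YOLOv7 precision and recall reported in Table 7.
f1_yolov7 = f1_measure(0.934, 0.947)

# PI of mAP@0.5 on the Kaggle dataset, using the values reported in Table 8.
pi_kaggle = percentage_increase(0.496, 0.915)
```

The last two values reproduce the reported F1-measure of 0.940 and PI of 84.5% after rounding.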

Regarding detection model selection, we used different versions of YOLO detectors for the experiments, including YOLOv6, YOLOv7, YOLOv8, and YOLOX. The input image size for all detectors is set to 640 × 640 pixels. Furthermore, we used two imbalanced face mask datasets (Kaggle and Eden datasets) for model training and testing. In the experiments, in addition to identifying the best-performing YOLO detector in this detection task, we also tried to verify the effectiveness of the proposed three-stage recursive learning method on imbalanced datasets.

5.1. Ablation Study of the Proposed Method Using Different YOLO Detectors

Using different YOLO detectors, we conducted an ablation study of the proposed recursive learning method on the Kaggle test set. Based on the combination of training stages, we classify the variants of the proposed method as Recursive Stage1 + Stage3 (R-S1S3), Recursive Stage2 + Stage3 (R-S2S3), and Recursive Stage1 + Stage2 + Stage3 (R-S1S2S3). Table 6 records the results of the ablation study, with bold fonts indicating the best results. From Table 6, we make the following observations:

(1): Compared with supervised learning methods, the proposed recursive learning method significantly enhances the performance of the YOLOX and YOLOv6 detectors. Furthermore, the R-S1S2S3 learning method outperforms all other methods regarding model-training results for all YOLO detectors. These experimental results highlight the effectiveness of the three-stage recursive learning technique in enhancing detection model-training results on imbalanced datasets.
(2): Based on the F1-measure and mAP@0.5 metrics, the YOLOv7_X detector achieves the best training results, obtaining scores of 0.885 and 0.905, respectively. Compared with the supervised learning method, the proposed R-S1S2S3 learning method significantly improves these two metrics by 0.349 and 0.447, respectively. Therefore, we further evaluate the performance of the proposed recursive learning method with different YOLOv7 series detectors, and the results are shown in Table 7.
(3): The results in Table 7 show that among different YOLOv7 series detectors, YOLOv7 shows the best performance, with F1-measure and mAP@0.5 scores of 0.940 and 0.915, respectively. Furthermore, YOLOv7 exhibits the lowest computational load in terms of FLOPs and model parameters (Params). Therefore, this study selects YOLOv7 as the default detector of the system.
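To make the ablation comparison concrete, the short script below recomputes the gain of R-S1S2S3 over supervised learning for each detector. The numbers are copied from Table 6; the variable names are illustrative only.

```python
# (F1-measure, mAP@0.5) pairs copied from Table 6.
supervised = {
    "YOLOX_X": (0.304, 0.252),
    "YOLOv6_L": (0.602, 0.711),
    "YOLOv7_X": (0.536, 0.458),
    "YOLOv8_X": (0.817, 0.828),
}
r_s1s2s3 = {
    "YOLOX_X": (0.682, 0.656),
    "YOLOv6_L": (0.652, 0.832),
    "YOLOv7_X": (0.885, 0.905),
    "YOLOv8_X": (0.823, 0.884),
}

# Absolute gain of the three-stage method over the supervised baseline.
gains = {
    name: tuple(round(new - old, 3)
                for new, old in zip(r_s1s2s3[name], supervised[name]))
    for name in supervised
}

for name, (d_f1, d_map) in gains.items():
    print(f"{name}: F1 +{d_f1:.3f}, mAP@0.5 +{d_map:.3f}")
```

The gain for YOLOv7_X matches the 0.349 and 0.447 quoted in observation (2). The gain is much smaller for YOLOv8_X, whose supervised baseline is already the strongest of the four detectors.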

5.2. Performance Evaluation on Two Imbalanced Datasets

To compare supervised learning with the proposed recursive learning method, we used the YOLOv7 detector to conduct tests on two imbalanced datasets: the Kaggle dataset and the Eden dataset. We evaluated the different model-training methods under consistent detector conditions, ensuring a fair comparison across methods. The experimental results are presented in Table 8 and lead to the following observations:

(1): On the Kaggle dataset, the F1-measure metric of recursive learning reached 0.940, and the mAP@0.5 metric reached 0.915. Compared with supervised learning, the two metrics were significantly improved by 0.184 and 0.419, respectively.
(2): On the Eden dataset, the F1-measure metric of recursive learning reached 0.900, and the mAP@0.5 metric reached 0.710. Compared with supervised learning, the performance improvements in these two metrics reached 0.207 and 0.462, respectively.
(3): The PI between the proposed method and the supervised method on the mAP@0.5 metric reached 84.5% and 186.3% for the Kaggle dataset and Eden dataset, respectively.
(4): On both datasets, recursive learning consistently outperformed supervised learning on precision, recall, and F1-measure. These results show that the proposed method helps overcome the data imbalance problem and significantly improves the detection accuracy for classes with few samples, yielding superior overall detection performance.
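As a worked check of observation (3), substituting the mAP@0.5 values from Table 8 into Equation (11) gives the following (the subscripted names are notation introduced here for illustration):

```latex
\begin{align*}
PI_{\text{Kaggle}} &= \frac{0.915 - 0.496}{0.496} \times 100\% \approx 84.5\%, \\
PI_{\text{Eden}}   &= \frac{0.710 - 0.248}{0.248} \times 100\% \approx 186.3\%.
\end{align*}
```

The larger PI on the Eden dataset mainly reflects its much lower supervised baseline (0.248 versus 0.496), since the absolute gains on the two datasets are similar (0.462 versus 0.419).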

5.3. Comparison of Results between the Proposed and Existing Methods

This section compares the proposed method with two existing methods for handling data imbalance: image weighting and oversampling. The image weighting method balances the class distribution by oversampling classes with fewer samples and undersampling classes with more samples, thereby improving the detection accuracy of imbalanced classes. In contrast, the oversampling method only increases the number of samples in classes with fewer samples, effectively balancing the classes. Since the oversampling method can be integrated into the recursive learning process, we also combine it with the proposed method and call the result the oversampling–recursive (Over-R) learning method. Specifically, during the model-training step, imbalanced classes are sampled more frequently in each batch to balance the class distribution within the batch, thereby enhancing the model’s ability to learn from imbalanced datasets. Table 9 presents the mAP@0.5 and F1-measure results of the proposed method and the compared methods on the two imbalanced datasets. Both image weighting and oversampling improve mAP@0.5 over the supervised learning method on both datasets, although the F1-measure does not always improve (for example, oversampling on the Eden dataset scores 0.623 versus 0.693 for supervised learning). On the Kaggle dataset, the oversampling method significantly outperforms the image weighting method. On the Eden dataset, however, the image weighting method shows a better F1-measure score, while the oversampling method shows a better mAP@0.5 score.
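The per-batch oversampling idea can be illustrated with a small sampler. This is only a sketch under assumptions, not the batch sampler used in the experiments: each image is weighted by the summed inverse frequency of its annotation labels, and the names `image_sampling_weights` and `sample_batch` as well as the toy data are hypothetical.

```python
import random
from collections import Counter


def image_sampling_weights(image_labels):
    """Weight each image by the summed inverse frequency of its labels."""
    counts = Counter(label for labels in image_labels for label in labels)
    return [sum(1.0 / counts[label] for label in labels)
            for labels in image_labels]


def sample_batch(image_labels, batch_size, rng):
    """Draw image indices with replacement, favoring rare classes."""
    weights = image_sampling_weights(image_labels)
    return rng.choices(range(len(image_labels)), weights=weights, k=batch_size)


# Toy data: one annotation per image, heavily imbalanced across three classes.
toy_labels = ([["with_mask"]] * 90
              + [["without_mask"]] * 8
              + [["mask_wear_incorrect"]] * 2)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
batch = sample_batch(toy_labels, batch_size=3000, rng=rng)
drawn = Counter(toy_labels[i][0] for i in batch)
```

In this toy setting every class carries the same total weight, so each class appears in roughly one third of the draws even though the rarest class makes up only 2% of the images. With several annotations per image, as in real detection data, the balance is only approximate.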

On the other hand, the model learning results achieved by the proposed recursive learning method are significantly better than those of these two existing methods on both the Kaggle and Eden datasets. Furthermore, the results improve further when the proposed method is combined with the oversampling method. On the Kaggle dataset, the mAP@0.5 and F1-measure metrics reach 0.937 and 0.959, respectively, and the PI over the supervised method on these metrics reaches 88.9% and 26.8%, respectively. Similarly, on the Eden dataset, the two metrics reach 0.899 and 0.906, with corresponding PI values of 262.5% and 30.7%. These results are significantly better than those obtained by the supervised learning method and therefore validate the effectiveness of the proposed recursive learning method.

5.4. Visual Comparison Results

Figure 7 and Figure 8 show the experimental results comparing the supervised learning method with the proposed Over-R-S1S2S3 learning method on the Kaggle test set. Figure 7a,c show the detection and corresponding local zoom-in results using the supervised learning method. In the test image, it can be observed that the pedestrian on the left boundary is wearing their mask incorrectly. However, the detector incorrectly detected them as wearing their mask correctly. In contrast, Figure 7b,d show the detection and corresponding local zoom-in results, respectively, using the proposed recursive learning method. The experimental results show that the same pedestrian on the left boundary is correctly detected as wearing the mask incorrectly. Figure 8 also shows similar experimental results. These experimental results confirm that the proposed recursive learning method can effectively improve the training performance of the detection model in imbalanced datasets, thereby improving the detection accuracy of imbalanced classes in the dataset. Interested readers can refer to the online video in [35] for more test results of the proposed recursive learning method on private test sequences.

6. Conclusions and Future Work

In this study, we propose a novel recursive learning method to address the data imbalance problem and apply it to the face mask detection task. The proposed method recursively splits and merges imbalanced datasets based on attribute features of imbalanced classes, thereby improving the training efficiency of detection models. After training and testing on the Kaggle dataset, we found that the proposed method achieved the best detection performance with the YOLOv7 detector. To further validate the training performance of the proposed method, we conducted experiments on two imbalanced datasets. Experimental results show that the proposed recursive learning method significantly improves the mAP@0.5 and F1-measure compared to the supervised learning method.

Furthermore, our recursive learning method also produces superior results compared to other methods dealing with data imbalance. To further improve training results, we combine oversampling with recursive learning methods. The test results confirm the effectiveness of our method in accurately detecting incorrect mask-wearing, solving a critical issue that is often overlooked in existing face mask detectors. Therefore, the proposed recursive learning method can effectively enhance the training results of the detection model on the imbalanced dataset, providing crucial practical value in real-life scenarios where only imbalanced datasets are available.

In future work, we aim to extend the application of the proposed recursive learning method to address other data imbalance challenges. By doing so, we can reduce the potential negative impact of imbalanced data on model-training performance in real-world scenarios, thereby enhancing the practical applicability of the proposed method. Additionally, we plan to explore extensions of our approach to datasets with more than three imbalanced object classes. These extensions require the development of a multi-stage recursive learning method that can effectively handle complex datasets with multiple imbalanced classes. Through these research efforts, we expect to improve the robustness and effectiveness of the proposed method further for solving the data imbalance problem in real-life scenarios.

Author Contributions

C.-Y.T.: Conceptualization; methodology; resources; supervision; project administration; funding acquisition; writing—original draft; writing—review and editing; W.-H.S.: methodology; software development; verification; investigation; data curation; visualization; H.N.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Science and Technology Council of Taiwan under Grant NSTC 112-2221-E-032-036-MY2.

Institutional Review Board Statement

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Chi-Yi Tsai reports financial support was provided by the National Science and Technology Council of Taiwan. Chi-Yi Tsai reports a relationship with the National Science and Technology Council that includes funding grants.

References

1. Howard, J.; Huang, A.; Li, Z.; Tufekci, Z.; Zdimal, V.; van der Westhuizen, H.-M.; von Delft, A.; Price, A.; Fridman, L.; Tang, L.-H.; et al. An Evidence Review of Face Masks against COVID-19. Proc. Natl. Acad. Sci. USA 2021, 118, e2014564118.
2. Mohammed, O.A.; Al-Tuwaijari, J.M. Analysis of Challenges and Methods for Face Detection Systems: A Survey. Int. J. Nonlinear Anal. Appl. 2022, 13, 3997–4015.
3. YOLOv6 v3.0: A Full-Scale Reloading. Available online: https://github.com/meituan/YOLOv6 (accessed on 29 September 2024).
4. YOLOv7: Trainable Bag-of-Freebies Sets New State-of-the-Art for Real-Time Object Detectors. Available online: https://github.com/WongKinYiu/yolov7 (accessed on 29 September 2024).
5. YOLO by Ultralytics (Version 8.0.0). Available online: https://github.com/ultralytics/ultralytics (accessed on 29 September 2024).
6. YOLOX: Exceeding YOLO Series in 2021. Available online: https://github.com/Megvii-BaseDetection/YOLOX (accessed on 29 September 2024).
7. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
8. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.; Berg, A. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
9. Duan, K.; Bai, S.; Xie, L.; Qi, H.; Huang, Q.; Tian, Q. CenterNet: Keypoint Triplets for Object Detection. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6568–6577.
10. Kaggle Face Mask Dataset. Available online: https://www.kaggle.com/datasets/andrewmvd/face-mask-detection (accessed on 8 March 2024).
11. Eden Dataset for Mask Wearing. Available online: https://github.com/ch-tseng/Dataset_for_Mask_Wearing (accessed on 7 July 2022).
12. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357.
13. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, 1–8 June 2008; pp. 1322–1328.
14. Lee, J.; Park, K. GAN-based Imbalanced Data Intrusion Detection System. Pers. Ubiquitous Comput. 2021, 25, 121–128.
15. Oksuz, K.; Cam, B.C.; Kalkan, S.; Akbas, E. Imbalance Problems in Object Detection: A Review. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 3388–3415.
16. Mursalim, M.K.N.; Kurniawan, A. Multi-kernel CNN Block-based Detection for COVID-19 with Imbalance Dataset. Int. J. Electr. Comput. Eng. 2021, 11, 2467–2476.
17. Seo, J.-H.; Kim, Y.-H. Machine-Learning Approach to Optimize SMOTE Ratio in Class Imbalance Dataset for Intrusion Detection. Comput. Intell. Neurosci. 2018, 2018, 20189704672.
18. Ahmed, A.N.; Saini, R. A Survey on Detection of Fraudulent Credit Card Transactions Using Machine Learning Algorithms. In Proceedings of the 3rd International Conference on Intelligent Communication and Computational Techniques, Jaipur, India, 19–20 January 2023; pp. 1–5.
19. Islam, S.; Sara, U.; Kawsar, A.; Rahman, A.; Kundu, D.; Dipta, D.D.; Karim, A.N.M.R.; Hasan, M. SGBBA: An Efficient Method for Prediction System in Machine Learning Using Imbalance Dataset. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 430–441.
20. Rustogi, R.; Prasad, A. Swift Imbalance Data Classification Using SMOTE and Extreme Learning Machine. In Proceedings of the International Conference on Computational Intelligence in Data Science, Chennai, India, 21–23 February 2019; pp. 1–6.
21. Ali, M.A.; Bingamil, A.A.; Jarndal, A.; Alsyouf, I. The Influence of Handling Imbalance Classes on the Classification of Mechanical Faults Using Neural Networks. In Proceedings of the 8th International Conference on Modeling, Simulation and Applied Optimization, Manama, Bahrain, 15–17 April 2019; pp. 1–5.
22. Chavda, A.; Dsouza, J.; Badgujar, S.; Damani, A. Multi-Stage CNN Architecture for Face Mask Detection. In Proceedings of the 6th International Conference for Convergence in Technology, Maharashtra, India, 2–4 April 2021; pp. 1–8.
23. Huang, B.-J. Real-World Masked Face Dataset (RMFD). Available online: https://github.com/X-zhangyang/Real-World-Masked-Face-Dataset (accessed on 6 March 2024).
24. Loey, M.; Manogaran, G.; Taha, M.H.N.; Khalifa, N.E.M. Fighting against COVID-19: A Novel Deep Learning Model Based on YOLO-v2 with ResNet-50 for Medical Face Mask Detection. Sustain. Cities Soc. 2021, 65, 102600.
25. Liu, R.; Ren, Z. Application of YOLO on Mask Detection Task. In Proceedings of the IEEE 13th International Conference on Computer Research and Development, Beijing, China, 5–7 January 2021; pp. 130–136.
26. Liu, Y.; Huang, Y.; Guo, R. Information Enhancement and Recursive Learning Network in a Coarse-Refine Manner for Pancreas Segmentation. In Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, 18–22 July 2022; pp. 1–6.
27. Li, J.; Liang, X.; Li, J.; Wei, Y.; Xu, T.; Feng, J.; Yan, S. Multistage Object Detection with Group Recursive Learning. IEEE Trans. Multimed. 2017, 20, 1645–1655.
28. Hung, K.-W.; Zhang, Z.; Jiang, J. Real-time Image Super-resolution Using Recursive Depthwise Separable Convolution Network. IEEE Access 2019, 7, 99804–99816.
29. Xu, W.; Xu, Y.; Sang, G.; Li, L.; Wang, A.; Wei, P.; Zhu, L. Recursive Multi-Relational Graph Convolutional Network for Automatic Photo Selection. IEEE Trans. Multimed. 2023, 25, 3825–3840.
30. Yue, Y.; Zhou, W.; Lei, J.; Yu, L. RTLNet: Recursive Triple-Path Learning Network for Scene Parsing of RGB-D Images. IEEE Signal Process. Lett. 2021, 29, 429–433.
31. Kang, C.; Kang, S.-U. Self-Supervised Denoising Image Filter Based on Recursive Deep Neural Network Structure. Sensors 2021, 21, 7827.
32. Rahmani, M.K.I.; Taranum, F.; Nikhat, R.; Farooqi, M.R.; Khan, M.A. Automatic Real-time Medical Mask Detection Using Deep Learning to Fight COVID-19. Comput. Syst. Sci. Eng. 2022, 42, 1181–1198.
33. Das, A.; Ansari, M.W.; Basak, R. COVID-19 Face Mask Detection Using TensorFlow, Keras and OpenCV. In Proceedings of the IEEE 17th India Council International Conference, New Delhi, India, 10–13 December 2020; pp. 1–5.
34. Benifa, J.V.B.; Chola, C.; Muaad, A.Y.; Hayat, M.A.B.; Heyat, M.B.B.; Mehrotra, R.; Akhtar, F.; Hussein, H.S.; Vargas, D.L.R.; Castilla, Á.K.; et al. FMDNet: An Efficient System for Face Mask Detection Based on Lightweight Model during COVID-19 Pandemic in Public Areas. Sensors 2023, 23, 6090.
35. Results on Private Test Sequences. Results of Three-Stage Recursive Learning Technique on Private Test Sequences. Available online: https://youtu.be/ZuA3AOkcvQE (accessed on 29 September 2024).

Figure 1. Three conditions of face mask-wearing: (a) correct mask-wearing, (b) no mask-wearing, and (c) incorrect mask-wearing.

Figure 2. Comparison of (a) the traditional supervised learning and (b) the proposed recursive learning method. The proposed recursive learning method incorporates dataset manipulation processing into the model-training process to train the model recursively.

Figure 3. Concept of the proposed dataset split-and-merge processing for recursive learning.

Figure 4. Illustration of the proposed three-stage recursive learning method combined with dataset split-and-merge processing.

Figure 5. Flowchart of the proposed recursive learning method.

Figure 6. Illustration of the distances C and D between the ground truth A and the predicted B bounding boxes.

Figure 7. Experimental results of (a) the supervised learning and (b) the proposed Over-R-S1S2S3 learning method on the Kaggle test set, along with (c) and (d), the corresponding zoom-in results.

Figure 8. Experimental results of (a) the supervised learning and (b) the proposed Over-R-S1S2S3 learning method on the Kaggle test set, along with (c) and (d), the corresponding zoom-in results.

Table 1. Number of images and annotations in the Kaggle and Eden datasets.

Dataset			Training	Validation	Testing	Total	Percent
Kaggle	Number of Images		480	120	253	853	-
	Number of Annotations	With_mask	1191	203	568	1962	78.45
		Without_mask	311	59	97	467	18.67
		Mask_wear_incorrect	43	10	19	72	2.88
Eden	Number of Images		435	109	135	679	-
	Number of Annotations	Good	1371	342	460	2173	74.52
		Bad	453	130	87	670	22.98
		None	46	16	11	73	2.50

Table 2. Performance of the existing literature on the Kaggle dataset.

Literature	Precision	Recall	F1-Measure	Note
[24]	0.81	N/A	N/A	Preserve the three labels defined in the annotations.
[22]	0.99	0.99	0.99	Treat the label of Mask_wear_incorrect as the label of Without_mask.
[32]	0.99	0.99	0.99	Ignore the label of Mask_wear_incorrect and only preserve the labels of With_mask and Without_mask.
[33]	0.94	N/A	N/A
[34]	0.99	0.98	0.99

Table 3. Symbol definitions used in this study.

Symbol	Definition
D_in	Input multiclass imbalanced dataset
D_m	Merged training dataset
C_i(L=j)	Samples of the class i with label j
W₀	Pre-training model weights
W_f	Final model weights
M	Current detection model
M_a	Adjusted detection model for training

Table 4. Function definitions used in this study.

Function	Definition
$Split (D_{i n}$ )	Split all classes in the input dataset D_in
$Merge (a$ $, b$ )	Merge samples of class a and class b as the same label
$Combine (a$ $, b$ )	Combine samples of class a and class b to produce a new dataset
CreateModel (m, dim)	Adjust the input model m to create an adjusted model with output dimension dim
TrainModel (m, D_in, Loss)	Train the input model m based on the input dataset D_in and the loss function Loss to optimize the weights of the input model m
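As a rough illustration of the dataset functions in Table 4, the sketch below implements Split, Merge, and Combine on a toy dataset of `(sample, label)` pairs. The data representation, the `new_label` argument, and the choice of which classes to merge are assumptions made for this example. They are not the stage definitions of the proposed method, which are given in Section 3.

```python
from collections import defaultdict


def split(dataset):
    """Split(D_in): group (sample, label) pairs by class label."""
    groups = defaultdict(list)
    for sample, label in dataset:
        groups[label].append((sample, label))
    return dict(groups)


def merge(class_a, class_b, new_label):
    """Merge(a, b): relabel the samples of both classes with one shared label."""
    return [(sample, new_label) for sample, _ in class_a + class_b]


def combine(class_a, class_b):
    """Combine(a, b): concatenate two labeled subsets into a new dataset."""
    return list(class_a) + list(class_b)


# Toy input dataset D_in using the three Kaggle label names from Table 1.
d_in = [
    ("img0", "with_mask"),
    ("img1", "with_mask"),
    ("img2", "without_mask"),
    ("img3", "mask_wear_incorrect"),
]

classes = split(d_in)

# Hypothetical merge choice, for illustration only: treat the two
# less frequent classes as a single label named "merged".
merged = merge(classes["without_mask"], classes["mask_wear_incorrect"], "merged")

# Merged training dataset D_m with two labels instead of three.
d_m = combine(classes["with_mask"], merged)
```

The merged dataset keeps every sample of the input but exposes fewer, better-balanced labels to the detector, which is the general purpose of the split-and-merge processing.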

Table 5. Hardware and software specifications of the computer used in the experiment.

Hardware/Software	Item	Version
Notebook Computer	CPU	AMD R9 5900
	GPU	NVIDIA RTX 3070
	RAM	16 GB
Software Version	Operating System	Windows 10
	Python	3.9
	CUDA	11.2
	PyTorch	1.12.1

Table 6. Ablation study results of the proposed recursive learning method using different YOLO detectors on the Kaggle dataset.

Method	Supervised		R-S1S3 ¹		R-S2S3 ²		R-S1S2S3 ³
Model	F1 ⁴	mAP ⁵	F1	mAP	F1	mAP	F1	mAP
YOLOX_X	0.304	0.252	0.359	0.314	0.331	0.288	0.682	0.656
YOLOv6_L	0.602	0.711	0.610	0.713	0.615	0.755	0.652	0.832
YOLOv7_X	0.536	0.458	0.516	0.392	0.486	0.409	0.885	0.905
YOLOv8_X	0.817	0.828	0.804	0.839	0.787	0.829	0.823	0.884

¹: Recursive Stage1 + Stage3; ²: Recursive Stage2 + Stage3; ³: Recursive Stage1 + Stage2 + Stage3; ⁴: F1-measure; ⁵: mAP@0.5. The bold font indicates the best results in the table.

Table 7. Performance evaluation using different YOLOv7 series detectors on the Kaggle dataset.

Model	Precision	Recall	F1-Measure	mAP@0.5	FLOPs (G)	Params (M)
YOLOv7	0.934	0.947	0.940	0.915	104.7	36.9
YOLOv7_X	0.899	0.871	0.885	0.905	189.9	71.3
YOLOv7_D6	0.842	0.789	0.815	0.853	806.8	154.7
YOLOv7_E6	0.850	0.813	0.831	0.867	515.2	97.2
YOLOv7_E6E	0.812	0.746	0.778	0.814	843.2	151.7
YOLOv7_W6	0.831	0.813	0.822	0.852	360.0	70.4

The bold font indicates the best results in the table.

Table 8. Performance evaluation of the proposed method on two imbalanced datasets.

Dataset	Method	Precision	Recall	F1-Measure	mAP@0.5	PI ¹ (%)
Kaggle	Supervised	0.800	0.718	0.756	0.496	-
	[24]	0.810	-	-	-	-
	Proposed	0.934	0.947	0.940	0.915	84.5
Eden	Supervised	0.798	0.613	0.693	0.248	-
	Proposed	0.915	0.887	0.900	0.710	186.3

¹: Initial value = mAP of the supervised method; Final value = mAP of the proposed method. The bold font indicates the best results in the table.

Table 9. Comparison of mAP@0.5 and F1-measure between the proposed and existing methods.

Dataset	Supervised	Image Weighting	Oversampling	R-S1S2S3	Over-R-S1S2S3 ¹	PI ² (%)
Kaggle	0.496/0.756 ³	0.554/0.754	0.896/0.879	0.915/0.940	0.937/0.959	88.9/26.8
Eden	0.248/0.693	0.409/0.751	0.511/0.623	0.710/0.900	0.899/0.906	262.5/30.7

¹: Oversampling + Recursive Stage1 + Stage2 + Stage3; ²: Initial value = metrics of the supervised method; Final value = metrics of the Over-R-S1S2S3 method; ³: mAP@0.5/F1-measure. The bold font indicates the best results in the table.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
