Article

Ensemble Learning for Skeleton-Based Body Mass Index Classification

1
Division of Network Business, Samsung Electronics Company, Ltd., Suwon 16677, Korea
2
Department of Electrical and Electronic Engineering, Yonsei University, Seoul 03722, Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2020, 10(21), 7812; https://doi.org/10.3390/app10217812
Submission received: 12 September 2020 / Revised: 26 October 2020 / Accepted: 2 November 2020 / Published: 4 November 2020
(This article belongs to the Special Issue Machine Learning in Medical Applications)

Abstract

In this study, we performed skeleton-based body mass index (BMI) classification by developing a unique ensemble learning method for human healthcare. Traditionally, anthropometric features, including the average length of each body part and average height, have been utilized for this kind of classification. Average values are generally calculated for all frames because the length of body parts and the subject height vary over time, as a result of the inaccuracy in pose estimation. Thus, traditionally, anthropometric features are measured over a long period. In contrast, we controlled the window used to measure anthropometric features over short/mid/long-term periods. This approach enables our proposed ensemble model to obtain robust and accurate BMI classification results. To produce final results, the proposed ensemble model utilizes multiple k-nearest neighbor classifiers trained using anthropometric features measured over several different time periods. To verify the effectiveness of the proposed model, we evaluated it using a public dataset. The simulation results demonstrate that the proposed model achieves state-of-the-art performance when compared with benchmark methods.

1. Introduction

Over the past several decades, the percentage of the population that is obese has steadily increased. Since obesity can cause various diseases, its prevalence among people has become a major social issue. To identify its adverse effects, many studies have investigated the relationship between obesity and various diseases. According to the results presented in the literature, obesity is related to several diseases such as diabetes [1], high blood pressure [2], hyperlipidemia [3], cholelithiasis [4], hypopnea [5], arthritis [6] and mental disorders [7], meaning that the death rate for the obese is relatively high.
The World Health Organization (WHO) defines obesity as excessive fat accumulation that may threaten life. Additionally, the WHO uses the body mass index (BMI) to identify obesity in people. The BMI is one of the factors used to estimate obesity levels and is defined as the ratio of a person’s body weight in kilograms to the square of their height in meters (i.e., the unit of the BMI is $\mathrm{kg/m^2}$). In general, a person is classified as “overweight” when their BMI is greater than 25.
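As a minimal sketch of the definition above, the BMI and a three-way label can be computed as follows. The function names are hypothetical, the 18.5 cutoff for “underweight” is the commonly used WHO threshold and is an assumption here, and treating a BMI of exactly 25 as overweight is likewise an assumption.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2: weight divided by squared height."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_class(value: float) -> str:
    """Three-way label. The 18.5 cutoff and the handling of exactly 25
    are assumptions, not taken from the text above."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "normal weight"
    return "overweight"


print(bmi_class(bmi(70.0, 1.75)))
```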
Traditionally, if a person wants to know their BMI, they must first measure their body weight and height using a weight scale and extensometer, respectively. After this, their BMI can be calculated according to the definition of the BMI above. If no device is used to measure body weight or height, it is impossible to calculate the BMI. To make the acquisition of BMIs simpler, i.e., possible without using such devices, many researchers in the field of computer vision have explored various methods of BMI prediction/classification. Most have focused on calculating the BMI of an individual from a facial image [8,9,10]. In the literature, face image databases built under a controlled environment, such as frontal view, clean background, alignment according to interpupillary distance, etc., have been used. However, in a real-world situation, facial images are generally captured in uncontrolled conditions. For example, the facial image of a person can be taken from the side view as well as the frontal view and can have complex backgrounds. In addition, in some cases, a person can wear a mask which conceals parts of their face. In particular, since most of the features used in [8,9,10] can be extracted from only front-view face images without any occlusions, the feasibility and effectiveness of these methods for real-world application still need to be demonstrated.
Some studies on BMI classification have been conducted using human full-body images [11,12,13]. In these studies, each individual was asked to stand up straight while their full-body image was taken. In addition, the image was captured from a frontal view. Most of the features proposed in [11,12,13] can be obtained from only frontal full-body images. Therefore, in a real-world situation, if the methods proposed in [11,12,13] are used for BMI classification, it is necessary to have the cooperation of an individual to enable the extraction of features from their frontal full-body images.
Recently, based on advancements in human pose estimation techniques, the acquisition of three-dimensional (3D) human motion data has become quicker, simpler and more precise [14,15,16,17]. Therefore, many studies have exploited human motion data for various applications. Human action recognition [18,19,20,21], animation generation [22,23,24,25] and gait recognition [26,27] are prime examples of such research. Compared with conventional facial and full-body image-based BMI classifications, skeleton-based BMI classification has the advantage of being performed without the user’s cooperation, with the user only being asked to walk in a usual way. Because the user’s gait data can be obtained from video filmed from various viewpoints as well as a frontal view, the BMI classification process is performed in such a way that the user is unaware of it and does not feel uncomfortable. However, to the best of our knowledge, there has only been one study [28] to date in which a method for classifying BMIs using 3D human motion sequences has been investigated. According to the results presented in [28], the classification accuracy reached only 40%. Motivated by these results, in this study, we investigate a method for improving BMI classification accuracy. A block diagram of the proposed BMI classification method is presented in Figure 1.
The main contributions of this study can be summarized as follows.
  • We determine why the BMI classification accuracy in [28] was poor. Because the dataset considered had a class imbalance problem, the authors of [28] used the undersampling technique to balance classes. During our investigation, it was determined that the reason for performance degradation stemmed from the use of this technique. Therefore, we found that it is better to use an oversampling technique rather than an undersampling technique in terms of performance.
  • We demonstrate the usefulness of anthropometric features in the BMI classification problem. During this study, we observe that feature values calculated for each frame are not consistent over time, as a result of the inaccuracy in pose estimation. Furthermore, for some motion sequences, a large variance in feature values was observed over specific periods. Therefore, instead of using traditional anthropometric features, we employ anthropometric features measured over several different time periods (i.e., long/mid/short-term periods). The use of these features contributes to our ensemble model being able to obtain robust and accurate classification results.
  • We propose an ensemble model capable of classifying BMIs from 3D human motion sequences. In the proposed model, multiple k-nearest neighbor (k-NN) classifiers are trained separately using anthropometric features calculated from different time windows. Generalized results are then obtained by integrating the classification results obtained from the k-NN classifiers. The evaluation results for the dataset considered in [28] demonstrate that the proposed model achieves state-of-the-art performance.
The remainder of this paper is organized as follows. Section 2 presents a review of related work. The motivation for our study is discussed in Section 3. Our methodology for classifying BMIs from 3D human motion sequences is described in Section 4. The experimental setup is described in Section 5. The results and discussion are presented in Section 6 and Section 7, respectively. Conclusions are presented in Section 8.

2. Related Work

Previous studies on BMI classification have mainly focused on calculating human weight or BMI from facial images [8,9,10,29]. For example, Coetzee et al. [8] demonstrated that three facial features, namely the width-to-height ratio, perimeter-to-area ratio and cheek-to-jaw-width ratio, are highly related to both human weight and BMI. Additionally, they proposed the use of these three features to judge the body weight and BMI of an individual. Motivated by this study, Pham et al. [9] assessed the relationship between these facial features and BMIs for facial images of 911 subjects. They demonstrated that the eye size and average eyebrow height are also related to the BMI. Wen and Guo [10] investigated a method for predicting the BMIs of people from face images. To achieve this goal, they extracted the facial features proposed in [8,9] from passport-style frontal face images with clean backgrounds. They then trained a support vector regression (SVR) model using these features and demonstrated the feasibility of using SVR to predict BMIs. Kocabey et al. [29] explored a method for predicting the BMI using low-quality facial images collected from social media. To achieve this goal, they used the convolutional neural network (CNN) proposed in [30]. They then used the outputs from the sixth fully connected layer of the CNN as features for the BMI prediction. They trained an SVR model using these features and evaluated their model using a publicly available social media face dataset. The evaluation results demonstrated the feasibility of BMI prediction using low-quality facial images.
BMI prediction/classification based on full-body person images has also been studied [11,12,13]. For example, Bipembi et al. [11] proposed a method for obtaining the BMI of an individual from their full-body silhouette image. To achieve this goal, the authors captured frontal full-body images of standing individuals. They then converted these images into silhouette images. Based on a silhouette image, they calculated the area and height of an individual. The area was defined as the number of pixels occupied by a silhouette, and the height was defined as the distance between the highest and lowest pixels in a silhouette. Finally, the BMI was calculated using the extracted values of the area and height. Amador et al. [12] also used human silhouette images for BMI classification. A total of 57 shape features introduced in [31] were extracted from images. Four machine learning algorithms, namely logistic regression, a Bayesian classifier, an artificial neural network and support vector machine (SVM), were tested on their own dataset consisting of 122 subjects. According to their results, the SVM outperformed the other algorithms and achieved an accuracy of approximately 72.16%. Madariaga et al. [13] proposed the calculation of human height from full-body images using the vanishing point and camera height information. For body weight calculation, a sensor called the Wheatstone bridge load cell was used. Using the estimated height and measured body weight information, the BMI was calculated.
Nahavandi et al. [32] investigated a method for estimating BMIs from depth images of people. To achieve this goal, the deep residual network (ResNet) proposed in [33] was used. To train the ResNet, 3D human polygon models were generated using the MakeHuman open-source software. The body surface area (BSA) of each model was calculated. By using the formula $\text{BSA} = \frac{1}{6}(\text{body weight} \times \text{height})^{0.5}$ [34], the body weight of each model was estimated. For each polygon model, a BMI was then obtained using the height and estimated body weight. These were then used as ground truth BMI values to train the ResNet. Additionally, 3D depth images were obtained from 3D human polygon models using a rendering software and encoded using red–green–blue (RGB) colorization. These colorized depth images were then used as training data for the ResNet. The authors tested their proposed method on their own dataset. According to their results, the ResNet achieved an accuracy of 95%.
The concept of using anthropometric features for BMI classification was first proposed by Andersson et al. [28]. For BMI classification, they used 20 anthropometric features, including the average length of each limb and the average height of the 3D human skeleton. To test the performance of their model, they constructed a dataset consisting of 112 subjects. Three classes of “underweight”, “normal weight” and “overweight” were used to classify BMIs. However, in their dataset, the number of people categorized as underweight was too small. Most people were categorized as normal weight. It is well known that if a classifier is trained using a dataset with an unequal distribution of classes, that classifier will exhibit poor classification performance for minority classes [35,36,37]. Therefore, in their experiments, Andersson et al. used an undersampling technique that randomly removed some samples from the majority classes to alleviate the imbalance in the dataset. They applied three machine learning algorithms, namely the SVM, k-NN model and multi-layer perceptron (MLP), to the undersampled dataset. According to their results, the BMI classification accuracies of the SVM, k-NN model and MLP were 40%, 25% and 35%, respectively. Because the accuracies of all algorithms were poor, the authors concluded that the anthropometric features were unhelpful for classifying BMIs.

3. Motivation

According to the results of some studies [38,39,40], undersampling techniques do not provide a good solution to the class imbalance problem because they may remove some important and useful data from majority classes. This loss of data can result in the performance degradation of a classifier for both majority and minority classes. Therefore, it can be concluded that the performance drop in terms of classification accuracy [28] may stem from the use of undersampling, rather than the unreliability of anthropometric features. To solve this issue, we generated a training dataset by employing the synthetic minority oversampling technique (SMOTE) proposed in [41] to overcome the lack of minority samples. Additionally, for comparison, we generated a training dataset by employing the undersampling technique used in [28].
Figure 2 illustrates how the number of samples in each class changes with the sampling technique used. As shown in Figure 2, the number of samples belonging to the underweight class is significantly lower than the numbers of samples belonging to the other two classes. In this figure, one can see that the original dataset has a class imbalance problem. To overcome this problem, the undersampling technique was employed. Accordingly, some samples were removed from the majority classes. As a result, as shown in Figure 2, the normal weight and overweight classes have the same number of samples as the underweight class. Although the class imbalance problem is alleviated by the undersampling technique, the total number of samples in the undersampled training dataset is significantly reduced compared to that in the original dataset. Instead of removing some samples from majority classes, SMOTE augments the samples of minority classes by generating new synthetic samples. As a result, as shown in Figure 2, the number of samples in each minority class is the same as that in the majority class.
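The oversampling idea can be illustrated with a simplified sketch. The code below is not the reference SMOTE implementation from [41]; it only shows the core step of interpolating between a minority sample and one of its nearest minority neighbors. The function name, the default k = 5 and the fixed random seed are assumptions made for illustration.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each lying on the line segment between
    a randomly chosen minority sample and one of its k nearest minority
    neighbors (simplified illustration of the SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(len(smote_sketch(minority, n_new=10, k=2)))
```

Because every synthetic point is a convex combination of two real minority samples, the augmented class stays within the region already spanned by the original data, whereas undersampling discards real majority samples.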
Using the training dataset generated by SMOTE, we trained three machine learning algorithms: C-SVM, nu-SVM and k-NN. We then tested these algorithms on the testing dataset. It is noteworthy that the performance of average BMI classification for all three schemes improved significantly. k-NN achieved an accuracy of 92.97%, outperforming the other algorithms.
Therefore, we can confirm that anthropometric features hold validity for the classification of BMIs. Next, we investigated a method for improving BMI classification accuracy. To this end, we carefully reviewed the generation process for anthropometric features and measured the effects of each feature on classification. Traditionally, anthropometric features have been calculated using the length of each limb and the height of an individual averaged over the given frames. Figure 3 presents the body height of an individual over multiple frames (the height of the fourth subject in the dataset proposed by Andersson et al. [28]). As shown in this figure, the values are not consistent between frames as a result of the inaccuracy in pose estimation. Furthermore, there is a specific period during which a large variance among the values can be observed.
Therefore, if anthropometric features are simply extracted, ignoring such large variance over time, BMI classification accuracy could be significantly reduced. To overcome this problem, it is desirable to use anthropometric features localized over several different time periods rather than using features averaged over all frames. Therefore, in this paper, we propose an ensemble model for BMI classification. In the next section, we discuss our methodology in detail.

4. Ensemble for BMI Classification

Figure 4 presents a schematic overview of the proposed method for skeleton-based BMI classification. One can see that a 3D human skeleton sequence is used as an input and is divided into several segments of equal length. In this study, we divide an original sequence using $T$ different segment lengths. Specifically, we let $W$ be the total length of the input motion sequence. In the frame division process, $T \cdot (T+1)/2$ segments are generated in total (one of length $W$, two of length $W/2$, three of length $W/3$, four of length $W/4$, five of length $W/5$, $\ldots$, and $T$ of length $W/T$).
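The frame division can be sketched as follows, using the start and end index rule given in Algorithm 1. The function name is hypothetical, indices are 1-based and inclusive, and rounding half up is an assumption chosen to mimic the `round` used in the pseudo-code.

```python
def segment_ranges(length, T):
    """Return (div_idx, start, end) triples, 1-based and inclusive, for all
    T*(T+1)/2 segments: one of length W, two of W/2, ..., T of W/T."""
    ranges = []
    for div_idx in range(1, T + 1):
        for seg_idx in range(1, div_idx + 1):
            # int(x + 0.5) rounds half up for non-negative x.
            start = int(length * (seg_idx - 1) / div_idx + 0.5) + 1
            end = int(length * seg_idx / div_idx + 0.5)
            ranges.append((div_idx, start, end))
    return ranges


for div_idx, start, end in segment_ranges(100, 3):
    print(div_idx, start, end)
```

For a sequence of 100 frames and T = 3, this yields six segments: the full sequence, two halves and three thirds.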
For each segment described above, anthropometric features are calculated. For clarification, suppose that a 3D human skeleton model consists of $N$ joints and $M$ limbs. Let $J = \{1, \ldots, N\}$ be the set of joint indices and $L = \{(i, j) \mid i \neq j,\ i \in J,\ j \in J\}$ be the set of joint pairs for constructing limbs. The values of $N$ and $M$ depend on the product specifications of motion capture sensors. For the Microsoft Kinect sensor [42], $N$ and $M$ are 20 and 19, respectively, as shown in Figure 5.
Let $P_j[k] = (x_j[k], y_j[k], z_j[k])$ be the position of the $j$th joint at the $k$th frame in the world coordinate system, where $x_j[k]$, $y_j[k]$, and $z_j[k]$ are the coordinates of the $j$th joint at the $k$th frame on the X, Y, and Z axes, respectively. The Euclidean distance between the $i$th and $j$th joints at the $k$th frame is then calculated as
$$d(P_i[k], P_j[k]) = \sqrt{(x_i[k] - x_j[k])^2 + (y_i[k] - y_j[k])^2 + (z_i[k] - z_j[k])^2}. \tag{1}$$
In [43,44,45,46,47,48,49,50], the average length of each limb and average height of the target individual were used as anthropometric features. In this study, we used these features for BMI classification. To obtain anthropometric features for each segment, the length of each limb and height of the individual are measured in each frame. Average values are then used as the anthropometric features for the corresponding segment.
For clarity, let $F \in \{W, W/2, W/3, \ldots, W/(T-1), W/T\}$ be the frame length of a segmented sequence. The average length of each limb over $F$ is then calculated as
$$A_F^{i,j} = \frac{1}{F} \sum_{k=1}^{F} d(P_i[k], P_j[k]), \quad (i, j) \in L. \tag{2}$$
For simplicity, we define a feature vector $A_F^1$ as
$$A_F^1 = \left[ A_F^{i,j} \ \text{for} \ (i, j) \ \text{in} \ L \right]. \tag{3}$$
Because we consider the 19 limbs presented in Figure 5, the dimension of $A_F^1$ in (3) is 19.
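A minimal sketch of the per-segment limb averaging is given below. The frame representation (a dictionary from joint index to a 3D coordinate tuple) and the function name are assumptions; the actual joint pairs that form the 19 limbs are defined in Figure 5 and are therefore passed in as a parameter rather than hard-coded.

```python
import math


def average_limb_lengths(frames, limbs):
    """frames: list of frames, each a dict {joint_index: (x, y, z)}.
    limbs: list of (i, j) joint-index pairs.
    Returns, for each limb, its Euclidean length averaged over the frames."""
    if not frames:
        raise ValueError("segment has no frames")
    return [
        sum(math.dist(frame[i], frame[j]) for frame in frames) / len(frames)
        for i, j in limbs
    ]


frames = [
    {1: (0.0, 0.0, 0.0), 2: (0.0, 3.0, 4.0)},
    {1: (0.0, 0.0, 0.0), 2: (0.0, 6.0, 8.0)},
]
print(average_limb_lengths(frames, [(1, 2)]))
```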
In general, human height is measured as the distance from the bottom of the feet to the top of the head in a human body. According to this definition of human height, the height of a 3D human skeleton also can be calculated as the distance from the joint of feet to the joint of head. However, to measure the height using this traditional method, it is necessary for the user to stand straight up during height measurement. If the user moves during height measurement, the measured height may be inaccurate. In [43], Araujo et al. proposed a method that is capable of estimating the height of the person even if the person moves. This method is widely used to estimate the height of human skeletons [44,45,46,47,48,49,50]. In this study, we used the method to estimate the skeleton’s height. According to the method proposed in [43], the target individual’s height is calculated as the sum of their neck length, upper and lower spine lengths and average lengths of the right and left hips, thighs and shanks. For clarity, let $H[k]$ be the subject’s height at the $k$th frame. Then, $H[k]$ is defined as
$$\begin{aligned} H[k] ={} & \underbrace{d(P_1[k], P_2[k])}_{\text{length of neck}} + \underbrace{d(P_2[k], P_{11}[k])}_{\text{length of upper spine}} + \underbrace{d(P_{11}[k], P_{12}[k])}_{\text{length of lower spine}} \\ & + \underbrace{\{ d(P_{12}[k], P_{13}[k]) + d(P_{12}[k], P_{14}[k]) \} / 2}_{\text{average length of the right and left hips}} \\ & + \underbrace{\{ d(P_{13}[k], P_{15}[k]) + d(P_{14}[k], P_{16}[k]) \} / 2}_{\text{average length of the right and left thighs}} \\ & + \underbrace{\{ d(P_{15}[k], P_{17}[k]) + d(P_{16}[k], P_{18}[k]) \} / 2}_{\text{average length of the right and left shanks}}. \end{aligned} \tag{4}$$
By using $H[k]$ from (4), the average height over $F$ is obtained as
$$A_F^2 = \frac{1}{F} \sum_{k=1}^{F} H[k]. \tag{5}$$
After the anthropometric features are obtained using (2) and (5), these features are concatenated into a feature vector as
$$A_F = \operatorname{concat}\left( A_F^1, A_F^2 \right), \tag{6}$$
where $\operatorname{concat}(A_F^1, A_F^2)$ is an operator that concatenates $A_F^2$ onto the end of $A_F^1$. As a result, the dimension of $A_F$ in (6) is 20.
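The height estimate in (4) and the concatenation in (6) can be sketched as follows. The joint-index pairs are copied from (4); the frame representation (a dictionary from joint index to a 3D coordinate tuple) and all function names are assumptions made for illustration.

```python
import math

# Joint-index pairs copied from Equation (4).
NECK = (1, 2)
UPPER_SPINE = (2, 11)
LOWER_SPINE = (11, 12)
HIPS = ((12, 13), (12, 14))
THIGHS = ((13, 15), (14, 16))
SHANKS = ((15, 17), (16, 18))


def _length(frame, pair):
    i, j = pair
    return math.dist(frame[i], frame[j])


def _mean_pair(frame, pairs):
    # Average of the right-side and left-side lengths.
    return (_length(frame, pairs[0]) + _length(frame, pairs[1])) / 2


def frame_height(frame):
    """H[k] for one frame given as {joint_index: (x, y, z)}."""
    return (
        _length(frame, NECK)
        + _length(frame, UPPER_SPINE)
        + _length(frame, LOWER_SPINE)
        + _mean_pair(frame, HIPS)
        + _mean_pair(frame, THIGHS)
        + _mean_pair(frame, SHANKS)
    )


def feature_vector(limb_means, frames):
    """Append the average height (A_F^2) to the limb means (A_F^1)."""
    avg_height = sum(frame_height(f) for f in frames) / len(frames)
    return list(limb_means) + [avg_height]
```

Because the estimate sums segment lengths rather than measuring a head-to-foot distance, it does not require the subject to stand upright in any particular frame.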
We present the pseudo-code for the proposed ensemble learning algorithm in Algorithm 1. As shown in Figure 4, in the proposed method, the ensemble model consists of multiple (i.e., $T \cdot (T+1)/2$) k-NNs. For simplicity, hereafter, we refer to the $T \cdot (T+1)/2$ k-NNs as $k\text{-NN}_1$, $k\text{-NN}_2$, $k\text{-NN}_3$, $\ldots$, $k\text{-NN}_{T \cdot (T+1)/2 - 1}$, and $k\text{-NN}_{T \cdot (T+1)/2}$. In the training (testing) phase, each $A_F$ extracted from the $T \cdot (T+1)/2$ sequence segments is inputted into the corresponding k-NN. This can be explained as follows. As shown in Figure 3, the lengths of body parts and the height of the target individual vary over time as a result of the inaccuracy in pose estimation. Since, traditionally, the average values of anthropometric features over all frames (i.e., $W$) are calculated, the performance of classifiers trained using them can be affected by the variance of features.
In contrast, in the proposed ensemble learning method, anthropometric feature vectors are calculated as averages over long/mid/short-term periods (i.e., $W$, $W/2$, $W/3$, $\ldots$, $W/T$). Through the frame division process, the periods in which the variance exists are divided into sub-periods. As a result, the variance for each of the $T \cdot (T+1)/2$ segments is reduced. Additionally, the $A_F$ vectors calculated for the $T \cdot (T+1)/2$ segments are used to train the k-NNs in the ensemble model. The use of such $A_F$ vectors helps each k-NN to obtain robust and accurate classification results (i.e., BMI classes).
Let $Q = \{1, 2, 3, \ldots, T \cdot (T+1)/2 - 1, T \cdot (T+1)/2\}$ be the set of indices of k-NNs and $q$ be the index of a k-NN (i.e., $q \in Q$). Additionally, let $C_q$ be the classification result of $k\text{-NN}_q$. For simplicity, we define a classification result vector $C_Q$ as
$$C_Q = \left[ C_q \ \text{for} \ q \ \text{in} \ Q \right] = \left[ C_1, C_2, \ldots, C_{T \cdot (T+1)/2 - 1}, C_{T \cdot (T+1)/2} \right]. \tag{7}$$
To derive a final classification result from $C_Q$ in (7), in this study, the majority voting algorithm was adopted. The pseudo-code for the majority voting algorithm is presented in Algorithm 2.
Algorithm 1 Pseudo-code for the proposed ensemble learning algorithm (training phase).
Input: Training data D_train consisting of N_train human motion sequences
Output: Trained ensemble model E
Set the frame division parameter T;
Create a 3D matrix of zeros M_3d;
for seq_idx = 1:1:N_train do
    seq = D_train[seq_idx];
    len = size(seq);
    Initialize the index of k-NN q = 0;
    for div_idx = 1:1:T do
        for seg_idx = 1:1:div_idx do
            q = q + 1;
            start_idx = round(len*(seg_idx-1)/div_idx)+1;
            end_idx = round(len*(seg_idx)/div_idx);
            seg_range = start_idx:1:end_idx;
            segmented_data = seq[seg_range];
            Extract A_F from segmented_data using (6);
            M_3d[seq_idx, 1:20, q] = A_F;
            M_3d[seq_idx, 21, q] = corresponding_BMI_class;
        end for
    end for
end for
model = build_model(T);
# In build_model, an ensemble model consisting of T·(T+1)/2 k-NNs is constructed.
E = ensemble_learning(model, M_3d);
return E;
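The training phase can be illustrated with a small sketch in which one k-NN is fitted per segment index. This is not the MATLAB implementation used in the study: the `KNN` class, the `train_ensemble` function, the fixed value of k, the Euclidean metric and unweighted voting are all assumptions, whereas the study tunes these hyperparameters per model.

```python
import math
from collections import Counter


class KNN:
    """Plain k-NN with Euclidean distance and unweighted neighbor voting."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        # k-NN "training" only stores the labeled feature vectors.
        self.X, self.y = list(X), list(y)
        return self

    def predict_one(self, x):
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], x))
        nearest = order[: self.k]
        return Counter(self.y[i] for i in nearest).most_common(1)[0][0]


def train_ensemble(features, labels, k=1):
    """features[s][q] is the feature vector of sequence s for segment q.
    Returns a list with one fitted k-NN per segment index q."""
    n_models = len(features[0])
    return [
        KNN(k).fit([seq[q] for seq in features], labels)
        for q in range(n_models)
    ]


# Two toy sequences, each described by two one-dimensional segment features.
features = [[[0.0], [0.1]], [[1.0], [1.1]]]
models = train_ensemble(features, labels=[1, 2])
print([m.predict_one([0.2]) for m in models])
```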
Algorithm 2 Pseudo-code for the majority voting algorithm.
Input: C_Q in (7)
Output: Majority class C_M
Set the class set S = {1, 2, ..., N_C};
# Here, elements in S indicate BMI classes.
Create a 1-by-N_C vector of zeros V = [0, 0, ..., 0];
Initialize the majority class C_M to 0;
# Count the frequency of each class in C_Q.
for q = 1:1:size(C_Q) do
    V[C_Q[q]] = V[C_Q[q]] + 1;
end for
Set C_M to the index that corresponds to the maximum value in V;
return C_M;
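Algorithm 2 translates almost directly into code. In the sketch below, the function name is hypothetical, and resolving ties in favor of the smallest class label is an assumption, since the pseudo-code does not state a tie-breaking rule.

```python
def majority_vote(votes, n_classes):
    """votes: class labels in 1..n_classes, one per k-NN in the ensemble.
    Returns the most frequent label; ties go to the smallest label."""
    counts = [0] * (n_classes + 1)  # index 0 is unused
    for label in votes:
        counts[label] += 1
    # max() keeps the first maximal element, i.e., the smallest label on ties.
    return max(range(1, n_classes + 1), key=lambda c: counts[c])


print(majority_vote([2, 2, 3, 1, 2], n_classes=3))
```

With 15 voters and three classes, ties are possible (for example, 7 votes each for two classes and 1 for the third), so the tie-breaking rule can affect the final label.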

5. Experimental Setup

We evaluated the proposed method on an existing publicly available BMI classification dataset [28]. This dataset is a unique BMI classification dataset that contains 3D human skeleton sequences as well as the body weights and heights of subjects. To assess the performance of the proposed ensemble model, we used five-fold cross-validation and leave-one-person-out cross-validation.

5.1. Five-Fold Cross Validation

The dataset consists of 112 people with five skeleton sequences for each person. However, for one individual, denoted as “Person158”, only four sequences exist. To implement five-fold cross-validation, we decided to eliminate this individual from the dataset. Additionally, there are six sequences for the following four individuals: “Person034”, “Person036”, “Person053” and “Person096”. After examining the six sequences for each of these individuals, we discarded the noisiest sequence for each individual. Specifically, the first sequence for “Person034”, the third sequence for “Person036”, the fifth sequence for “Person053” and the sixth sequence for “Person096” were excluded from the dataset. As a result, the final dataset contained a total of 555 sequences (equaling 111 people × 5 sequences per person).
By using the body weights and heights of the 111 people mentioned above, we calculated the BMI for each individual and categorized the people according to their nutritional statuses. As shown in Table 1, only one person was categorized as “obesity class II”, and there were no people categorized as “obesity class III”. We decided to group “pre-obesity”, “obesity class I”, “obesity class II” and “obesity class III” into one class, called “overweight”. As a result, the number of people included in the overweight class was 32 (= 25 + 6 + 1). In our experiments, the people included in the class of “underweight” were labeled as “1”. The people included in the classes of “normal weight” and overweight were labeled as “2” and “3”, respectively.
In each cross-validation fold, for each person, four sequences were selected from the five available sequences. The selected 444 sequences (i.e., 111 people × 4 sequences per person) were used to train the proposed model. The remaining 111 (i.e., 555 − 444) sequences were used to test the model.
Although the number of subjects in class 3 (overweight) was increased by consolidation, the three classes were still imbalanced. As shown in Table 1, the number of subjects in class 1 (underweight) was too low. The number of subjects in class 2 (normal weight) was greater than the numbers of subjects in the other classes. During the training phase of the proposed model, we used the SMOTE algorithm proposed in [41] to balance the three classes. For class 2, the number of anthropometric feature vectors was 292 (i.e., 73 people × 4 sequences per person). Therefore, according to the application of SMOTE, the numbers of anthropometric feature vectors for classes 1 and 3 were also 292. As a result, in each cross-validation fold, the model was trained using 876 anthropometric feature vectors (i.e., 292 vectors per class × 3 classes).

5.2. Leave-One-Person-Out Cross-Validation

Unlike five-fold cross-validation, all the sequences for the 112 people were used in leave-one-person-out cross-validation. As a result, the dataset contained a total of 563 sequences (i.e., 107 people × 5 sequences per person + 1 person × 4 sequences per person + 4 people × 6 sequences per person). In addition, since the BMI of Person158 was greater than 25, Person158 was categorized as class 3 (overweight). As a result, the numbers of subjects included in classes 1, 2 and 3 were 6, 73 and 33, respectively. In each validation round, the skeleton sequences for one person were used to test the model, while the sequences for the remaining 111 people were used for training. In addition, since the three classes were imbalanced, we used the SMOTE algorithm to balance the three classes during the training phase of the proposed model.

5.3. Performance Measurement

In each cross-validation fold (round), to evaluate the performance of the proposed ensemble model, we used the following measures: the true positive rate (TPR), positive predictive value (PPV) and $F_1$ score. Here, for each class, the TPR, which is also called recall, is defined as
$$\text{TPR} = \frac{\text{True positive}}{\text{Condition positive}}. \tag{8}$$
For each class, PPV, which is also called precision, is defined as
$$\text{PPV} = \frac{\text{True positive}}{\text{Predicted condition positive}}. \tag{9}$$
Based on (8) and (9), the $F_1$ score for each class is calculated as
$$F_1 = 2 \cdot \frac{\text{TPR} \cdot \text{PPV}}{\text{TPR} + \text{PPV}}. \tag{10}$$
Additionally, the accuracy becomes
$$\text{Accuracy} = \frac{\text{True positive} + \text{True negative}}{\text{Total population}}. \tag{11}$$
Furthermore, we computed the macro-average values for TPR, PPV and $F_1$ for each cross-validation fold as follows:
$$\text{Macro-average of TPR} = \frac{\sum_{b=1}^{B} \text{TPR}(\text{class } b)}{B}, \tag{12}$$
$$\text{Macro-average of PPV} = \frac{\sum_{b=1}^{B} \text{PPV}(\text{class } b)}{B}, \tag{13}$$
$$\text{Macro-average of } F_1 = \frac{\sum_{b=1}^{B} F_1(\text{class } b)}{B}, \tag{14}$$
where $B$ is the number of classes. Here, $B = 3$ because there are three classes: underweight (class 1), normal weight (class 2) and overweight (class 3).
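The per-class measures and their macro-averages can be computed from true and predicted labels as sketched below. The function names are hypothetical, and returning 0 when a denominator is zero is an assumed convention that the text does not specify.

```python
def per_class_metrics(y_true, y_pred, classes):
    """Return {class: (TPR, PPV, F1)} using one-vs-rest counts."""
    result = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        cond_pos = sum(t == c for t in y_true)   # condition positive
        pred_pos = sum(p == c for p in y_pred)   # predicted condition positive
        tpr = tp / cond_pos if cond_pos else 0.0
        ppv = tp / pred_pos if pred_pos else 0.0
        f1 = 2 * tpr * ppv / (tpr + ppv) if tpr + ppv else 0.0
        result[c] = (tpr, ppv, f1)
    return result


def macro_average(result):
    """Unweighted mean of TPR, PPV and F1 over the classes."""
    n = len(result)
    return tuple(sum(m[i] for m in result.values()) / n for i in range(3))


y_true = [1, 2, 2, 3, 3, 3]
y_pred = [1, 2, 3, 3, 3, 2]
print(macro_average(per_class_metrics(y_true, y_pred, classes=[1, 2, 3])))
```

Because the macro-average weights every class equally, the small underweight class influences the reported score as much as the large normal weight class, which is relevant for an imbalanced dataset such as this one.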

6. Results

We investigated the main factors for the low accuracy of BMI classification reported in [28]. The main reason for this low accuracy was that Andersson et al. [28] used an undersampling technique to overcome the class imbalance problem in their dataset. For benchmarking, we tested five machine learning algorithms, namely C-SVM, nu-SVM, k-NN, Naive Bayes (NB) and decision tree (DT), in conjunction with the undersampling technique. C-SVM and nu-SVM were implemented using LIBSVM (version 3.21), which is an open-source library for SVMs [51]. Additionally, according to the recommendations in [52], the radial basis function (RBF) kernel was used for the C-SVM and nu-SVM models. C-SVM has a cost parameter denoted as $c$ whose value ranges from zero to infinity. nu-SVM has a regularization parameter denoted as $g$ whose value lies between 0 and 1. The RBF kernel has a gamma parameter denoted as $\gamma$. The grid search method was used to find the best parameter combination for the C-SVM and nu-SVM models. The remaining k-NN, NB and DT models were implemented using the MATLAB functions “fitcknn”, “fitcnb” and “fitctree”, respectively, which are provided by the Statistics and Machine Learning Toolbox. Additionally, to determine the best hyperparameter configuration for each model, we executed the hyperparameter optimization processes supported by these functions. The models were implemented and tested in MATLAB R2018a (9.4.0.813654).

6.1. Results of Five-Fold Cross-Validation

During the development of the proposed ensemble model, determining the optimal number of k-NN models was a significant challenge. We therefore constructed several ensemble models with different numbers of k-NN models and compared the resulting performance metrics. Each ensemble model was implemented using the MATLAB function “fitcknn” from the Statistics and Machine Learning Toolbox. To determine the optimal hyperparameter configuration for each k-NN model, we performed hyperparameter optimization using the same function during the training phase. There were five optimizable parameters: the value of k, the distance metric, the distance weighting function, the Minkowski distance exponent and a flag to standardize predictors. The proposed ensemble model was implemented and tested in MATLAB R2018a (9.4.0.813654). The MATLAB code is available at https://sites.google.com/view/beomkwon/bmi-classification.
Table 2 lists the performance metrics for the ensemble models with different numbers of k-NN models. A detailed description of the experimental setup is given in Table 3. In Table 2, the proposed ensemble model consists of 15 $k$-NN$_q$ models ($q \in \{1, 2, \ldots, 14, 15\}$). Comparison models #1, #2 and #3 consist of the first 3, 6 and 10 $k$-NN$_q$ models ($q \in \{1, 2, 3\}$, $q \in \{1, 2, \ldots, 6\}$ and $q \in \{1, 2, \ldots, 10\}$), respectively. Comparison models #4 to #9 consist of the first 16, 17, 18, 19, 20 and 21 $k$-NN$_q$ models, respectively (i.e., $q \in \{1, 2, \ldots, 16\}$ up to $q \in \{1, 2, \ldots, 21\}$). One can see that the performance of the ensemble model increased as the number of k-NN models increased up to 15. However, when the number of k-NN models was greater than 15, the performance did not improve further. Based on these results, we selected an ensemble model consisting of 15 k-NN models for additional testing. However, as shown in Table 2, the average running time of the ensemble model increased with the number of k-NN models, so the trade-off between classification accuracy and running time needs to be considered in real-world applications.
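The trade-off between accuracy and running time can be made concrete with a short Python sketch that uses the accuracy and running-time values reported in Table 2. The helper names and the tolerance of 0.01 are assumptions chosen only for illustration.

```python
# number of k-NN models -> (accuracy, running time in seconds), copied from Table 2
table2 = {
    3: (0.9441, 0.5565), 6: (0.9640, 0.9856), 10: (0.9748, 1.5512),
    15: (0.9820, 2.2610), 16: (0.9730, 2.3994), 17: (0.9730, 2.5386),
    18: (0.9766, 2.6790), 19: (0.9730, 2.8217), 20: (0.9766, 2.9629),
    21: (0.9784, 3.1046),
}


def most_accurate(table):
    """Number of k-NN models with the highest accuracy."""
    return max(table, key=lambda n: table[n][0])


def fastest_within(table, tolerance):
    """Fastest configuration whose accuracy is within `tolerance` of the best one."""
    best_acc = max(acc for acc, _ in table.values())
    candidates = [n for n, (acc, _) in table.items() if best_acc - acc <= tolerance]
    return min(candidates, key=lambda n: table[n][1])


best_n = most_accurate(table2)           # configuration chosen in this study
budget_n = fastest_within(table2, 0.01)  # hypothetical choice under a 0.01 accuracy tolerance
```

With these numbers, accepting an accuracy loss of at most 0.01 would favor the 10-model ensemble, which runs in about 1.55 s instead of 2.26 s.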
Table 4 lists the optimized parameter settings for the ensemble model consisting of 15 k-NN models in each cross-validation fold.
Table 5 lists the five-fold cross-validation results of the five benchmark methods discussed above. Here, the undersampling technique was used instead of SMOTE. Therefore, during the training phase of each cross-validation fold, 268 (i.e., $73 \times 4 - 6 \times 4$) anthropometric feature vectors of class 2 were randomly selected among the 292 total vectors and removed from the training dataset. For class 3, 104 (i.e., $32 \times 4 - 6 \times 4$) vectors were randomly selected among the 128 total vectors and removed from the training dataset. The remaining 72 vectors (i.e., 24 anthropometric feature vectors per class $\times$ 3 classes) were used to train each method. As shown in the table, for each method, there were significant variations in the TPR, PPV and $F_1$ values over the three classes. These results demonstrate that the undersampling technique used in [28] is not effective in overcoming the class imbalance problem.
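The undersampling arithmetic above can be reproduced with a small Python sketch. The function name, the fixed random seed and the use of integer placeholders instead of real 20-dimensional feature vectors are assumptions made for illustration only.

```python
import random


def undersample(vectors_by_class, seed=0):
    """Randomly remove vectors until every class matches the smallest class."""
    rng = random.Random(seed)
    target = min(len(v) for v in vectors_by_class.values())
    kept, removed = {}, {}
    for label, vectors in vectors_by_class.items():
        n_remove = len(vectors) - target
        drop = set(rng.sample(range(len(vectors)), n_remove))
        kept[label] = [v for i, v in enumerate(vectors) if i not in drop]
        removed[label] = n_remove
    return kept, removed


# Training-fold sizes from the text: 6 x 4, 73 x 4 and 32 x 4 vectors.
# Integers stand in for the anthropometric feature vectors.
train = {1: list(range(6 * 4)), 2: list(range(73 * 4)), 3: list(range(32 * 4))}
kept, removed = undersample(train)
```

After balancing, only 72 of the 444 training vectors remain, which helps explain why classifiers trained this way generalize poorly.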
Table 6 lists the five-fold cross-validation results of the five benchmark methods when SMOTE is used to alleviate the class imbalance problem. The results in this table demonstrate that SMOTE improves the TPR, PPV and $F_1$ score values of the five methods compared to the results in Table 5. By using (12) to (14), we computed the average metric values over five-fold cross-validation, as shown in Table 7. The results in this table demonstrate that it is better to use SMOTE instead of the undersampling technique to alleviate the class imbalance problem. For all benchmark methods, average performance improvements are achieved when SMOTE is applied. In particular, the k-NN model outperforms the other methods, achieving results of TPR = 0.9276, PPV = 0.8512, $F_1$ = 0.8798 and accuracy = 0.9279. Based on these results, we decided to use k-NN models to construct the proposed ensemble model.
Figure 6 presents the confusion matrices for the proposed ensemble model for each cross-validation fold. The diagonal elements in each confusion matrix indicate the numbers of correctly classified samples. The other elements indicate the numbers of incorrectly classified samples. One can see that the proposed ensemble model classifies class 1, the minority class among the three classes, without any misclassification. Additionally, for class 3 (the second minority class), the ensemble model also exhibits a low misclassification rate.
Based on the results in the confusion matrices in Figure 6, we computed the TPR, PPV and $F_1$ score values of each class for five-fold cross-validation. Additionally, we computed the macro-average values of these metrics and the classification accuracy. The performance evaluation results for the proposed ensemble model are summarized in Table 8. As shown in this table, the proposed model exhibits a robust and accurate classification performance for both the minority and majority classes, achieving approximately 98.2% average classification accuracy.
Table 9 lists the macro-average values of TPR, PPV and $F_1$ scores, as well as the accuracy of each method over five-fold cross-validation. One can see that the proposed ensemble model performs best in terms of all evaluation metrics. This is because the ensemble model is trained using anthropometric features calculated over long/mid/short-term periods. In other words, the use of such features enables the ensemble model to be trained effectively by minimizing the adverse effects of variance in extracted features. As a result, the proposed model can achieve robust and accurate BMI classification performance. Among the considered benchmark methods, the k-NN model achieves the best performance, whereas DT achieves the worst performance. The classification accuracy of the proposed model is approximately 5.23% greater than that of a single k-NN model.
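For readers unfamiliar with SMOTE [41], its core idea is to create synthetic minority-class vectors by interpolating between a minority sample and one of its nearest minority neighbors. The following Python sketch is a simplified illustration of that idea, not the implementation used in this work: the function name, the random choice of base samples, the value of k and the toy data are all assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic vectors by interpolating between minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.choice(minority)
        # k nearest minority-class neighbors of x, excluding x itself
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position of the new point on the segment from x to nn
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


# Toy two-dimensional minority class (illustrative only).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=10, k=2)
```

Unlike undersampling, this approach keeps every original training vector and adds new ones to the smaller classes, so no information from the majority class is discarded.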
To verify the benefits of using anthropometric features calculated over various periods in the BMI classification task, we analyzed the standard deviations of the anthropometric features. Let $AF^{(r)}$ denote $AF$ for the $r$th skeleton sequence in the dataset. According to the definition of $AF$ in (6), the dimension of $AF^{(r)}$ is 20. In addition, let $AF^{(r)}[u]$, $u \in \{1, 2, \ldots, 19, 20\}$, be the $u$th element of $AF^{(r)}$. Then, the average value of $AF^{(r)}[u]$ over all sequences can be obtained as
$$AVG[u] = \frac{\sum_{r=1}^{R} AF^{(r)}[u]}{R}, \quad u \in \{1, 2, \ldots, 19, 20\}, \tag{15}$$
where $R$ is the total number of skeleton sequences.
Based on (15), the standard deviation of each of the 20 anthropometric features is calculated as
$$STD[u] = \sqrt{\frac{\sum_{r=1}^{R} \left\{ AF^{(r)}[u] - AVG[u] \right\}^{2}}{R - 1}}, \quad u \in \{1, 2, \ldots, 19, 20\}. \tag{16}$$
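As a minimal sketch of the computations in (15) and (16), the following Python code calculates the per-feature mean and the sample standard deviation (with $R - 1$ in the denominator) over a set of feature vectors. The function name and the toy two-dimensional vectors are assumptions for illustration; in this work, each vector has 20 elements.

```python
import math


def feature_avg_std(AF):
    """Per-feature average and sample standard deviation over R feature vectors."""
    R = len(AF)
    dim = len(AF[0])
    avg = [sum(AF[r][u] for r in range(R)) / R for u in range(dim)]
    std = [
        math.sqrt(sum((AF[r][u] - avg[u]) ** 2 for r in range(R)) / (R - 1))
        for u in range(dim)
    ]
    return avg, std


# Toy example with R = 3 sequences and 2 features (illustrative values only).
AF = [[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]]
avg, std = feature_avg_std(AF)
```

A feature that takes the same value in every sequence, such as the second one in the toy data, has zero standard deviation.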
Figure 7 presents the standard deviations of the 20 anthropometric features for five different periods (i.e., $W$, $W/2$, $W/3$, $W/4$ and $W/5$). In this figure, for the cases of $W/2$, $W/3$, $W/4$ and $W/5$, we report the average of the standard deviations over the segments. For example, for $W/5$, there are five equal-length segments according to the frame division process of the proposed method. We calculated the standard deviations in (16) for each of the segments and then averaged them over the five segments. As shown in Figure 7a, the features have high standard deviations when they are calculated for all frames (i.e., $W$). In contrast, the features calculated for $W/2$, $W/3$, $W/4$ and $W/5$ have relatively low standard deviations. In particular, the features calculated for $W/5$ have the lowest standard deviation values. A low standard deviation indicates that the features are clustered around the average, whereas a high standard deviation means that the features are spread out over a wide range. Because the anthropometric features are calculated as average lengths over a given segment sequence, the high standard deviations shown in Figure 7a may adversely affect the performance of machine learning algorithms. In contrast, in our ensemble learning method, anthropometric features with low standard deviations are used to train/test multiple k-NN models. Based on the use of these features, the proposed ensemble model can be trained effectively by minimizing the adverse effects of variance in extracted features, resulting in state-of-the-art performance.
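The frame division described above can be sketched in Python as follows. The function names, the handling of leftover frames (dropped here) and the one-dimensional toy sequence are assumptions for illustration, since this section does not specify those details.

```python
def split_into_segments(frames, n_segments):
    """Split per-frame feature vectors into n equal-length segments.

    Assumption: trailing frames that do not fill a whole segment are dropped.
    """
    seg_len = len(frames) // n_segments
    return [frames[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]


def mean_vector(frames):
    """Average each feature over the frames of one segment."""
    n = len(frames)
    return [sum(f[u] for f in frames) / n for u in range(len(frames[0]))]


def multi_period_features(frames, max_division=5):
    """Return {n: [one averaged feature vector per segment]} for W, W/2, ..., W/max_division."""
    return {
        n: [mean_vector(seg) for seg in split_into_segments(frames, n)]
        for n in range(1, max_division + 1)
    }


# Toy sequence of W = 10 frames with a single feature per frame (illustrative only).
frames = [[float(t)] for t in range(10)]
features = multi_period_features(frames)
n_vectors = sum(len(v) for v in features.values())
```

Dividing the window into 1, 2, 3, 4 and 5 segments yields 15 feature vectors in total, which coincides with the 15 k-NN models of the proposed ensemble; the exact assignment of segments to models is not stated in this section and is assumed here.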

6.2. Results of Leave-One-Person-Out Cross-Validation

Figure 8 shows the leave-one-person-out cross-validation process used in this work. As shown in the figure, in each validation round, the skeleton sequences for one person were used to test the classifier, and the sequences for the remaining 111 people were used for training. In each validation round, the predicted results (i.e., predicted BMI classes) for the testing skeleton sequences were obtained. After all 112 validation rounds were completed, we calculated a confusion matrix using all predicted results. Based on the confusion matrix, we calculated the TPR, PPV and $F_1$ values over the three classes in order to evaluate the performance of the classifier.
Table 10 shows the leave-one-person-out cross-validation results of the five benchmark methods when the undersampling technique was used, whereas Table 11 shows the results when SMOTE was used. By using (12) to (14), we computed the macro-average values of these metrics over the three classes. In addition, we calculated the classification accuracy of each benchmark method. The performance evaluation results for the five benchmark methods are summarized in Table 12. The table shows that SMOTE improves the macro-average of the TPR, PPV and $F_1$ of each benchmark method compared with the results obtained with the undersampling technique. In addition, the BMI classification accuracy of each method also improved when SMOTE was used. These results demonstrate that SMOTE is more effective than the undersampling technique at overcoming the class imbalance problem in the dataset used in [28].
To find the optimal number of k-NN classifiers in the proposed ensemble model, we evaluated the performance of ensemble models with different numbers of k-NN classifiers. Table 13 lists the performance metrics for each model. From this table, it can be seen that the best performance was achieved when the ensemble model consisted of 15 $k$-NN$_q$ models ($q \in \{1, 2, \ldots, 14, 15\}$). Based on these results, we selected an ensemble model consisting of 15 k-NN models for additional testing.
Table 14 lists the macro-average values of TPR, PPV and $F_1$ scores, as well as the accuracy of each method over leave-one-person-out cross-validation. As shown in the table, for all evaluation metrics, the proposed method outperforms the other methods, achieving a BMI classification accuracy of 73%. In the proposed method, anthropometric features calculated over long/mid/short-term periods were used to train the ensemble model. The use of such features in the training phase could reduce the adverse effects of variance in the extracted features. As a result, the model could be trained effectively and achieved the best performance among the methods.
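The leave-one-person-out procedure can be sketched in Python as follows. The function names, the 0-based class labels, the toy 1-NN classifier and the toy data are assumptions for illustration and do not reproduce the classifiers evaluated in this work.

```python
from collections import defaultdict


def leave_one_person_out(samples, train_fn, predict_fn, n_classes=3):
    """Accumulate one confusion matrix over all validation rounds.

    samples: list of (person_id, feature_vector, label), labels in 0..n_classes-1.
    """
    by_person = defaultdict(list)
    for pid, x, y in samples:
        by_person[pid].append((x, y))
    cm = [[0] * n_classes for _ in range(n_classes)]
    for held_out in by_person:
        # Train on every sequence that does not belong to the held-out person.
        train = [(x, y) for pid, x, y in samples if pid != held_out]
        model = train_fn(train)
        for x, y in by_person[held_out]:
            cm[y][predict_fn(model, x)] += 1
    return cm


def train_1nn(train):
    """A 1-NN classifier simply memorizes the training set."""
    return train


def predict_1nn(model, x):
    """Return the label of the closest training vector (squared Euclidean distance)."""
    nearest = min(model, key=lambda xy: sum((a - b) ** 2 for a, b in zip(x, xy[0])))
    return nearest[1]


# Toy data: four people, two classes, one feature (illustrative only).
samples = [("a", [0.0], 0), ("a", [0.1], 0), ("b", [0.2], 0),
           ("c", [10.0], 1), ("d", [10.1], 1)]
cm = leave_one_person_out(samples, train_1nn, predict_1nn, n_classes=2)
```

Holding out all sequences of one person at a time prevents sequences of the same person from appearing in both the training and the test set, which makes this protocol stricter than sequence-level five-fold cross-validation.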

7. Discussion

In this study, we used human body joints estimated from depth images captured by the Kinect (v1) sensor [42]. These estimated joints generally contain noise and error introduced by uncertainty, most of which is caused by occlusion and depth ambiguity [17]. Due to this uncertainty, the values of anthropometric features vary with measurement time, even if they are extracted from the same parts of the human body. In conventional methods, the average value of each anthropometric feature is calculated over the whole period, and these average values are then used as inputs to classifiers. The performance of classifiers trained using these values can be affected by the variance of the features. To minimize this adverse effect, we propose the use of anthropometric features calculated over several different time periods. However, newer sensors have since become available. The Kinect v1 sensor was released in 2010, and the Kinect v2 sensor followed in 2014. Wang et al. [53] evaluated the human body joint estimation accuracies of the Kinect v1 and v2 sensors; according to their results, the Kinect v2 sensor estimated joints more accurately than the Kinect v1 sensor and was more robust to occlusion and body rotation. More recently, a new version of the Kinect sensor, called Azure Kinect, was released in 2019. Albert et al. [54] reviewed the improved hardware implementation and motion tracking algorithm of the Azure Kinect sensor. The authors also evaluated the motion tracking performances of the Kinect v2 and Azure Kinect sensors. Their results demonstrated that the Azure Kinect sensor achieved higher accuracy in human motion tracking than the Kinect v2 sensor. Furthermore, with the advancements in camera techniques, new sensors with better hardware and motion tracking algorithms will continue to be released.
If state-of-the-art sensors reduce the noise and error in the estimated joints to the point where the anthropometric features extracted at each time instant become exactly equal, the feature vectors calculated over different time periods by our method will also become equal. In this case, the usefulness and effectiveness of our proposed method could be limited.

8. Conclusions

In general, there are variances in anthropometric features calculated over all frames of a sequence as a result of the inaccuracy in pose estimation. These variances adversely affect the classification performance of machine learning algorithms. Therefore, to minimize these adverse effects, we proposed the use of anthropometric features that are calculated over long/mid/short-term periods. Additionally, we proposed a novel ensemble model consisting of 15 k-NN models trained using features calculated over several different time periods. Experimental results demonstrated that the proposed ensemble model outperforms five benchmark methods and achieves state-of-the-art performance on a publicly available BMI classification dataset.
In practical situations, some joints can be invisible due to occlusions. If the joints are invisible over the whole time interval, the corresponding anthropometric features cannot be obtained. In this case, the anthropometric feature vector may contain “not a number” (NaN) values. As the occlusions become more severe, the number of NaN values in the anthropometric feature vector increases. The use of these vectors in BMI classification can adversely affect the classification accuracy of the ensemble model. However, in this study, we considered only the condition in which there was no occlusion and all 20 joints were visible during the whole time interval. Therefore, as part of our future work, we plan to extend our method to classify the BMIs of people for whom some body parts are occluded.

Author Contributions

Conceptualization, methodology, software, validation, formal analysis, investigation, resources, data curation, and writing—original draft preparation, B.K.; writing—review and editing, B.K. and S.L.; supervision and funding acquisition, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) funded by the Korea Government (Ministry of Science and ICT, MSIT) under Grant NRF-2020R1A2C3011697.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Nguyen, N.T.; Nguyen, X.M.T.; Lane, J.; Wang, P. Relationship between obesity and diabetes in a US adult population: Findings from the national health and nutrition examination survey, 1999–2006. Obes. Surg. 2011, 21, 351–355.
  2. Chen, Y.; Rennie, D.C.; Lockinger, L.A.; Dosman, J.A. Association between obesity and high blood pressure: Reporting bias related to gender and age. Int. J. Obes. 1998, 22, 771–777.
  3. Lai, S.W.; Ng, K.C.; Lin, H.F.; Chen, H.L. Association between obesity and hyperlipidemia among children. Yale J. Biol. Med. 2001, 74, 205–210.
  4. Thiet, M.D.; Mittelstaedt, C.A.; Herbst, C.A.; Buckwalter, J.A. Cholelithiasis in morbid obesity. South. Med. J. 1984, 77, 415–417.
  5. De Sousa, A.G.P.; Cercato, C.; Mancini, M.C.; Halpern, A. Obesity and obstructive sleep apnea-hypopnea syndrome. Obes. Rev. 2008, 9, 340–354.
  6. Magliano, M. Obesity and arthritis. Menopause Int. 2008, 14, 149–154.
  7. Scott, K.M.; McGee, M.A.; Wells, J.E.; Browne, M.A.O. Obesity and mental disorders in the adult general population. J. Psychosom. Res. 2008, 64, 97–105.
  8. Coetzee, V.; Chen, J.; Perrett, D.I.; Stephen, I.D. Deciphering faces: Quantifiable visual cues to weight. Perception 2010, 39, 51–61.
  9. Pham, D.D.; Do, J.H.; Ku, B.; Lee, H.J.; Kim, H.; Kim, J.Y. Body mass index and facial cues in Sasang typology for young and elderly persons. Evid. Based Complement. Altern. Med. 2011, 2011, 749209.
  10. Wen, L.; Guo, G. A computational approach to body mass index prediction from face images. Image Vis. Comput. 2013, 31, 392–400.
  11. Bipembi, H.; Panford, J.K.; Appiah, O. Calculation of body mass index using image processing techniques. Int. J. Artif. Intell. Mechatron. 2015, 4, 1–7.
  12. Amador, J.D.; Cabrera, J.E.; Cervantes, J.; Jalili, L.D.; Castilla, J.S.R. Automatic calculation of body mass index using digital image processing. In Proceedings of the Workshop on Engineering Applications (WEA), Medellín, Colombia, 17–19 October 2018; pp. 309–319.
  13. Madariaga, N.E.Q.; Noel, B.L. Application of artificial neural network and background subtraction for determining body mass index (BMI) in android devices using bluetooth. Int. J. Eng. Technol. 2016, 8, 366–370.
  14. Pons-Moll, G.; Baak, A.; Helten, T.; Müller, M.; Seidel, H.P.; Rosenhahn, B. Multisensor-fusion for 3d full-body human motion capture. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 663–670.
  15. Ganapathi, V.; Plagemann, C.; Koller, D.; Thrun, S. Real time motion capture using a single time-of-flight camera. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; pp. 755–762.
  16. Du, Y.; Wong, Y.; Liu, Y.; Han, F.; Gui, Y.; Wang, Z.; Kankanhalli, M.; Geng, W. Marker-less 3d human motion capture with monocular image sequence and height-maps. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 20–36.
  17. Lee, K.; Lee, I.; Lee, S. Propagating LSTM: 3d pose estimation based on joint interdependency. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 119–135.
  18. Kwon, B.; Kim, D.; Kim, J.; Lee, I.; Kim, J.; Oh, H.; Kim, H.; Lee, S. Implementation of human action recognition system using multiple Kinect sensors. In Proceedings of the 16th Pacific Rim Conference on Multimedia (PCM), Gwangju, Korea, 16–18 September 2015; pp. 334–343.
  19. Kwon, B.; Kim, J.; Lee, S. An enhanced multi-view human action recognition system for virtual training simulator. In Proceedings of the Asia–Pacific Signal and Information Processing Association Annual Summit Conference (APSIPA ASC), Jeju, Korea, 13–16 December 2016; pp. 1–4.
  20. Kwon, B.; Kim, J.; Lee, K.; Lee, Y.K.; Park, S.; Lee, S. Implementation of a virtual training simulator based on 360 multi-view human action recognition. IEEE Access 2017, 5, 12496–12511.
  21. Lee, I.; Kim, D.; Lee, S. 3D human behavior understanding using generalized TS-LSTM networks. IEEE Trans. Multimed. 2020.
  22. Wen, G.; Wang, Z.; Xia, S.; Zhu, D. From motion capture data to character animation. In Proceedings of the ACM Symposium on Virtual Reality Software and Technology (VRST), Limassol, Cyprus, 1–3 November 2006; pp. 165–168.
  23. Zhang, X.; Biswas, D.S.; Fan, G. A software pipeline for 3D animation generation using mocap data and commercial shape models. In Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR), Xi’an, China, 5–7 July 2010; pp. 350–357.
  24. Zhuang, Y.; Xiao, J.; Wu, Y.; Yang, T.; Wu, F. Automatic generation of human animation based on motion programming. Comput. Animat. Virtual Worlds 2005, 16, 305–318.
  25. Kwon, B.; Huh, J.; Lee, K.; Lee, S. Optimal camera point selection toward the most preferable view of 3D human pose. IEEE Trans. Syst. Man Cybern. Syst. 2020.
  26. Gu, J.; Ding, X.; Wang, S.; Wu, Y. Action and gait recognition from recovered 3-D human joints. IEEE Trans. Syst. Man Cybern. Part B 2010, 40, 1021–1033.
  27. Choi, S.; Kim, J.; Kim, W.; Kim, C. Skeleton-based gait recognition via robust frame-level matching. IEEE Trans. Inf. Forensics Secur. 2019, 14, 2577–2592.
  28. Andersson, V.O.; Amaral, L.S.; Tonini, A.R.; Araujo, R.M. Gender and body mass index classification using a Microsoft Kinect sensor. In Proceedings of the 28th International Florida Artificial Intelligence Research Society (FLAIRS) Conference, Hollywood, FL, USA, 18–20 May 2015; pp. 103–106.
  29. Kocabey, E.; Camurcu, M.; Ofli, F.; Aytar, Y.; Marin, J.; Torralba, A.; Weber, I. Face-to-BMI: Using computer vision to infer body mass index on social media. In Proceedings of the International AAAI Conference on Web and Social Media (ICWSM), Montreal, Canada, 15–18 May 2017; pp. 572–575.
  30. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the British Machine Vision Conference (BMVC) 2015, Swansea, UK, 7–10 September 2015; pp. 1–12.
  31. Mingqiang, Y.; Kidiyo, K.; Joseph, R. A survey of shape feature extraction techniques. Pattern Recognit. 2008, 15, 43–90.
  32. Nahavandi, D.; Abobakr, A.; Haggag, H.; Hossny, M.; Nahavandi, S.; Filippidis, D. A skeleton-free Kinect system for body mass index assessment using deep neural networks. In Proceedings of the IEEE International Systems Engineering Symposium (ISSE), Vienna, Austria, 11–13 October 2017; pp. 1–6.
  33. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  34. Reading, B.D.; Freeman, B. Simple formula for the surface area of the body and a simple model for anthropometry. Clin. Anat. 2005, 18, 126–130.
  35. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  36. Elrahman, S.M.A.; Abraham, A. A review of class imbalance problem. J. Netw. Innov. Comput. 2013, 1, 332–340.
  37. Ali, A.; Shamsuddin, S.M.; Ralescu, A.L. Classification with class imbalance problem: A review. Int. J. Adv. Soft Comput. Appl. 2015, 7, 176–204.
  38. Batista, G.E.; Prati, R.C.; Monard, M.C. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor. Newsl. 2004, 6, 20–29.
  39. Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A. An empirical comparison of repetitive undersampling techniques. In Proceedings of the IEEE International Conference on Information Reuse & Integration, Las Vegas, NV, USA, 10–12 August 2009; pp. 29–34.
  40. Ganganwar, V. An overview of classification algorithms for imbalanced datasets. Int. J. Emerg. Technol. Adv. Eng. 2012, 2, 42–47.
  41. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357.
  42. Shotton, J.; Fitzgibbon, A.; Cook, M.; Sharp, T.; Finocchio, M.; Moore, R.; Kipman, A.; Blake, A. Real-time human pose recognition in parts from single depth images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 20–25 June 2011; pp. 1297–1304.
  43. Araujo, R.; Graña, G.; Andersson, V. Towards skeleton biometric identification using the Microsoft Kinect sensor. In Proceedings of the 28th Symposium on Applied Computing (SAC), Coimbra, Portugal, 18–22 March 2013; pp. 21–26.
  44. Andersson, V.; Dutra, R.; Araujo, R. Anthropometric and human gait identification using skeleton data from Kinect sensor. In Proceedings of the 29th Symposium on Applied Computing (SAC), Gyeongju, Korea, 24–28 March 2014; pp. 60–61.
  45. Andersson, V.; Araujo, R. Full body person identification using the Kinect sensor. In Proceedings of the 26th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), Limassol, Cyprus, 10–12 November 2014; pp. 627–633.
  46. Andersson, V.; Araujo, R. Person identification using anthropometric and gait data from Kinect sensor. In Proceedings of the 29th Association for the Advancement of Artificial Intelligence (AAAI) Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 425–431.
  47. Yang, K.; Dou, Y.; Lv, S.; Zhang, F.; Lv, Q. Relative distance features for gait recognition with Kinect. J. Vis. Commun. Image Represent. 2016, 39, 209–217.
  48. Huitzil, I.; Dranca, L.; Bernad, J.; Bobillo, F. Gait recognition using fuzzy ontologies and Kinect sensor data. Int. J. Approx. Reason. 2019, 113, 354–371.
  49. Sun, J.; Wang, Y.; Li, J.; Wan, W.; Cheng, D.; Zhang, H. View-invariant gait recognition based on Kinect skeleton feature. Multimed. Tools Appl. 2018, 77, 24909–24935.
  50. Kwon, B.; Lee, S. Human skeleton data augmentation for person identification over deep neural network. Appl. Sci. 2020, 10, 4849.
  51. Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
  52. Hsu, C.W.; Chang, C.C.; Lin, C.J. A Practical Guide to Support Vector Classification; National Taiwan University: Taipei, Taiwan, 2003.
  53. Wang, Q.; Kurillo, G.; Ofli, F.; Bajcsy, R. Evaluation of pose tracking accuracy in the first and second generations of Microsoft Kinect. In Proceedings of the IEEE International Conference on Healthcare Informatics, Dallas, TX, USA, 21–23 October 2015; pp. 380–389.
  54. Albert, J.A.; Owolabi, V.; Gebel, A.; Brahms, C.M.; Granacher, U.; Arnrich, B. Evaluation of the pose tracking performance of the Azure Kinect and Kinect v2 for gait analysis in comparison with a gold standard: A pilot study. Sensors 2020, 20, 5104.
Figure 1. Block diagram for classifying the body mass index (BMI) from 3D human motion sequence.
Figure 2. Change of the number of datasets according to sampling technique. Here, the underweight, normal weight and overweight classes are denoted as Classes I, II and III, respectively.
Figure 3. Body heights calculated at each frame.
Figure 4. Schematic overview of the proposed method for skeleton-based BMI classification.
Figure 5. Three-dimensional human skeleton model consisting of 20 joints and 19 limbs.
Figure 6. Confusion matrices for the proposed ensemble model for different cross-validation folds.
Figure 7. Standard deviations of 20 anthropometric features for long/mid/short-term periods.
Figure 8. Leave-one-person-out cross-validation process to evaluate the performance of each method.
Table 1. Number of people for each nutritional status.
| BMI | Nutritional Status | # of Persons |
|---|---|---|
| Below 18.5 | Underweight | 6 |
| 18.5–24.9 | Normal weight | 73 |
| 25.0–29.9 | Obesity: Pre-obesity | 25 |
| 30.0–34.9 | Obesity: Obesity class I | 6 |
| 35.0–39.9 | Obesity: Obesity class II | 1 |
| Above 40.0 | Obesity: Obesity class III | 0 |
Table 2. Five-fold cross-validation performance comparisons between ensemble models with different numbers of k-NN models. TPR: true positive rate; PPV: positive predictive value.
| Method | # of k-NNs | Macro-Average TPR | Macro-Average PPV | Macro-Average $F_1$ | Accuracy | Running Time (s) |
|---|---|---|---|---|---|---|
| Comparison model #1 | 3 | 0.9588 | 0.8676 | 0.9001 | 0.9441 | 0.5565 |
| Comparison model #2 | 6 | 0.9677 | 0.9092 | 0.9315 | 0.9640 | 0.9856 |
| Comparison model #3 | 10 | 0.9790 | 0.9428 | 0.9585 | 0.9748 | 1.5512 |
| Proposed ensemble model | 15 | 0.9862 | 0.9603 | 0.9721 | 0.9820 | 2.2610 |
| Comparison model #4 | 16 | 0.9758 | 0.9566 | 0.9646 | 0.9730 | 2.3994 |
| Comparison model #5 | 17 | 0.9746 | 0.9575 | 0.9643 | 0.9730 | 2.5386 |
| Comparison model #6 | 18 | 0.9776 | 0.9595 | 0.9669 | 0.9766 | 2.6790 |
| Comparison model #7 | 19 | 0.9746 | 0.9575 | 0.9643 | 0.9730 | 2.8217 |
| Comparison model #8 | 20 | 0.9686 | 0.9585 | 0.9627 | 0.9766 | 2.9629 |
| Comparison model #9 | 21 | 0.9809 | 0.9594 | 0.9689 | 0.9784 | 3.1046 |
Table 3. Experimental setup.
| Item | Explanation |
|---|---|
| Operating System | Windows 10 Pro |
| Processor | Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz |
| Memory | 8.00 GB |
| GPU | Intel(R) UHD Graphics 620 |
Table 4. Parameter settings for the proposed ensemble model in five-fold cross-validation.

| Classifier | Fold #1 | Fold #2 | Fold #3 | Fold #4 | Fold #5 |
| --- | --- | --- | --- | --- | --- |
| k-NN 1 | (1, m, in, 0.7140, 0) | (1, c3, sq, -, 1) | (1, s, sq, -, 0) | (1, c3, sq, -, 1) | (1, c1, sq, -, 0) |
| k-NN 2 | (1, m, eq, 0.7075, 1) | (1, c2, sq, -, 1) | (1, c3, sq, -, 1) | (1, c3, eq, -, 1) | (1, e, sq, -, 1) |
| k-NN 3 | (2, c3, in, -, 1) | (1, c2, sq, -, 1) | (1, m, sq, 0.6965, 1) | (1, e, sq, -, 1) | (1, e, in, -, 1) |
| k-NN 4 | (1, c3, in, -, 1) | (1, m, eq, 0.5485, 1) | (1, c3, in, -, 1) | (1, c3, sq, -, 1) | (1, m, sq, 0.6046, 0) |
| k-NN 5 | (1, c3, sq, -, 1) | (1, c3, sq, -, 1) | (2, c3, in, -, 1) | (1, m, sq, 1.2213, 1) | (3, c3, sq, -, 1) |
| k-NN 6 | (1, m, sq, 0.7156, 1) | (1, c3, in, -, 1) | (1, m, eq, 0.7499, 1) | (1, m, sq, 1.1226, 1) | (1, c1, sq, -, 1) |
| k-NN 7 | (1, m, sq, 0.7317, 0) | (1, s, in, -, 0) | (1, e, sq, -, 1) | (1, s, sq, -, 0) | (1, m, eq, 0.8746, 1) |
| k-NN 8 | (1, e, sq, -, 1) | (1, c3, sq, -, 1) | (1, m, sq, 0.7044, 1) | (1, c1, sq, -, 1) | (1, m, in, 0.5390, 1) |
| k-NN 9 | (1, e, sq, -, 1) | (1, c1, sq, -, 1) | (1, e, sq, -, 1) | (1, m, eq, 1.5315, 1) | (1, c2, in, -, 1) |
| k-NN 10 | (1, m, in, 0.9299, 0) | (1, c3, in, -, 1) | (1, c3, sq, -, 1) | (1, m, in, 0.6912, 0) | (1, m, eq, 1.3160, 0) |
| k-NN 11 | (1, m, in, 0.7098, 1) | (1, m, eq, 1.7090, 1) | (1, c2, sq, -, 1) | (1, m, eq, 0.8312, 0) | (1, s, sq, -, 0) |
| k-NN 12 | (1, e, sq, -, 1) | (1, m, sq, 1.0923, 1) | (1, e, sq, -, 1) | (1, m, sq, 1.1597, 1) | (1, e, sq, -, 1) |
| k-NN 13 | (1, c3, sq, -, 1) | (1, m, in, 1.4446, 0) | (2, c3, in, -, 1) | (1, e, sq, -, 1) | (2, c1, in, -, 0) |
| k-NN 14 | (1, c1, eq, -, 0) | (1, e, sq, -, 1) | (1, m, sq, 1.0183, 1) | (1, c1, sq, -, 1) | (1, m, sq, 1.3201, 1) |
| k-NN 15 | (1, e, sq, -, 1) | (1, s, eq, -, 0) | (1, c3, sq, -, 1) | (1, s, sq, -, 0) | (1, c3, sq, -, 1) |

Note: Elements in brackets indicate k, distance metric, distance weighting function, Minkowski distance exponent, and flag to standardize predictors, respectively. In this table, the distance metrics “cityblock”, “correlation”, “cosine”, “euclidean”, “minkowski” and “seuclidean” are represented as “c1”, “c2”, “c3”, “e”, “m”, and “s”, respectively. The distance weighting functions “equal”, “inverse” and “squaredinverse” are denoted as “eq”, “in” and “sq”, respectively. k-NN: k-nearest neighbor.
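To make the parameter tuples in Table 4 more concrete, the following Python sketch shows how k, the Minkowski exponent, and the distance weighting function could interact in a single weighted k-NN vote. It is a simplified illustration under stated assumptions, not the authors' code: only the Minkowski metric ("m") is covered, the standardization flag is omitted, and "in" and "sq" are assumed to mean weights of 1/d and 1/d^2. The names `minkowski`, `WEIGHTS`, `knn_predict`, and the toy data are hypothetical.

```python
from collections import defaultdict


def minkowski(a, b, p):
    """Minkowski distance with exponent p (p = 1: city block, p = 2: Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Distance weighting functions, keyed by the abbreviations used in Table 4.
# Assumption: "in" weights a neighbour by 1/d and "sq" by 1/d^2.
WEIGHTS = {
    "eq": lambda d: 1.0,
    "in": lambda d: 1.0 / d,
    "sq": lambda d: 1.0 / d ** 2,
}


def knn_predict(train_x, train_y, query, k, p, weighting):
    """Return the label with the largest weighted vote among the k nearest points."""
    dists = sorted(
        ((minkowski(x, query, p), y) for x, y in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )
    neighbours = dists[:k]
    # An exact match would receive an infinite weight, so return its label directly.
    if neighbours[0][0] == 0.0:
        return neighbours[0][1]
    votes = defaultdict(float)
    for d, label in neighbours:
        votes[label] += WEIGHTS[weighting](d)
    return max(votes, key=votes.get)


# Toy data: one close point of class "A" and two slightly farther points of class "B".
train_x = [(0.0, 0.0), (1.0, 0.0), (1.1, 0.0), (5.0, 5.0)]
train_y = ["A", "B", "B", "A"]
query = (0.4, 0.0)

print(knn_predict(train_x, train_y, query, k=3, p=2.0, weighting="eq"))
print(knn_predict(train_x, train_y, query, k=3, p=2.0, weighting="sq"))
# The tuple (1, m, in, 0.7140, 0) from Fold #1 corresponds to a call such as:
print(knn_predict(train_x, train_y, query, k=1, p=0.7140, weighting="in"))
```

In this toy example, equal weighting lets the two "B" neighbours outvote the single "A" neighbour, whereas squared-inverse weighting favours the closer "A" point. This shows why the weighting function is tuned per classifier alongside k and the metric.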
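Table 5. Five-fold cross-validation results of the five benchmark methods when the undersampling technique is used. SVM: support vector machine; NB: naive Bayes; DT: decision tree.

| Method | Class | Fold #1 TPR | Fold #1 PPV | Fold #1 F1 | Fold #2 TPR | Fold #2 PPV | Fold #2 F1 | Fold #3 TPR | Fold #3 PPV | Fold #3 F1 | Fold #4 TPR | Fold #4 PPV | Fold #4 F1 | Fold #5 TPR | Fold #5 PPV | Fold #5 F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-SVM | 1 | 1.00 | 0.17 | 0.29 | 0.67 | 0.80 | 0.73 | 0.67 | 0.67 | 0.67 | 0.67 | 1.00 | 0.80 | 0.83 | 0.71 | 0.77 |
| | 2 | 0.41 | 0.86 | 0.56 | 0.92 | 0.77 | 0.84 | 0.86 | 0.79 | 0.82 | 0.92 | 0.79 | 0.85 | 0.95 | 0.79 | 0.86 |
| | 3 | 0.56 | 0.44 | 0.49 | 0.41 | 0.68 | 0.51 | 0.53 | 0.68 | 0.60 | 0.50 | 0.73 | 0.59 | 0.41 | 0.76 | 0.53 |
| nu-SVM | 1 | 0.67 | 0.36 | 0.47 | 0.67 | 0.57 | 0.62 | 0.67 | 0.67 | 0.67 | 0.67 | 1.00 | 0.80 | 0.83 | 0.71 | 0.77 |
| | 2 | 0.75 | 0.80 | 0.77 | 0.86 | 0.78 | 0.82 | 0.84 | 0.80 | 0.82 | 0.96 | 0.76 | 0.85 | 0.93 | 0.80 | 0.86 |
| | 3 | 0.53 | 0.55 | 0.54 | 0.44 | 0.61 | 0.51 | 0.59 | 0.66 | 0.62 | 0.38 | 0.80 | 0.51 | 0.44 | 0.74 | 0.55 |
| k-NN [28] | 1 | 1.00 | 0.27 | 0.43 | 1.00 | 0.25 | 0.40 | 0.83 | 0.17 | 0.28 | 1.00 | 0.23 | 0.38 | 1.00 | 0.25 | 0.40 |
| | 2 | 0.45 | 0.77 | 0.57 | 0.52 | 0.90 | 0.66 | 0.47 | 0.89 | 0.61 | 0.41 | 0.75 | 0.53 | 0.52 | 0.83 | 0.64 |
| | 3 | 0.59 | 0.41 | 0.49 | 0.75 | 0.53 | 0.62 | 0.66 | 0.49 | 0.56 | 0.56 | 0.40 | 0.47 | 0.59 | 0.46 | 0.52 |
| NB | 1 | 0.33 | 0.10 | 0.15 | 0.33 | 0.08 | 0.13 | 0.50 | 0.11 | 0.18 | 0.83 | 0.08 | 0.14 | 0.83 | 0.37 | 0.59 |
| | 2 | 0.41 | 0.81 | 0.55 | 0.12 | 0.56 | 0.20 | 0.52 | 0.75 | 0.61 | 0.29 | 0.81 | 0.42 | 0.19 | 0.77 | 0.38 |
| | 3 | 0.63 | 0.38 | 0.47 | 0.69 | 0.32 | 0.44 | 0.41 | 0.41 | 0.41 | 0.28 | 0.43 | 0.34 | 0.31 | 0.50 | 0.46 |
| DT | 1 | 0.33 | 0.06 | 0.10 | 0.67 | 0.14 | 0.23 | 0.50 | 0.10 | 0.17 | 0.50 | 0.17 | 0.25 | 0.83 | 0.13 | 0.23 |
| | 2 | 0.47 | 0.71 | 0.56 | 0.38 | 0.82 | 0.52 | 0.44 | 0.74 | 0.55 | 0.42 | 0.63 | 0.51 | 0.29 | 0.64 | 0.40 |
| | 3 | 0.38 | 0.41 | 0.39 | 0.53 | 0.35 | 0.43 | 0.38 | 0.32 | 0.34 | 0.44 | 0.32 | 0.37 | 0.47 | 0.38 | 0.42 |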
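Table 6. Five-fold cross-validation results of the five benchmark methods when the synthetic minority oversampling technique (SMOTE) is used.

| Method | Class | Fold #1 TPR | Fold #1 PPV | Fold #1 F1 | Fold #2 TPR | Fold #2 PPV | Fold #2 F1 | Fold #3 TPR | Fold #3 PPV | Fold #3 F1 | Fold #4 TPR | Fold #4 PPV | Fold #4 F1 | Fold #5 TPR | Fold #5 PPV | Fold #5 F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-SVM | 1 | 0.50 | 0.75 | 0.60 | 0.67 | 1.00 | 0.80 | 0.67 | 1.00 | 0.80 | 0.83 | 1.00 | 0.91 | 1.00 | 0.75 | 0.86 |
| | 2 | 0.97 | 0.85 | 0.90 | 0.96 | 0.93 | 0.95 | 1.00 | 0.95 | 0.97 | 0.97 | 0.99 | 0.98 | 0.99 | 0.92 | 0.95 |
| | 3 | 0.66 | 0.91 | 0.76 | 0.91 | 0.91 | 0.91 | 0.91 | 0.97 | 0.94 | 1.00 | 0.94 | 0.97 | 0.75 | 0.96 | 0.84 |
| nu-SVM | 1 | 0.50 | 0.75 | 0.60 | 0.67 | 1.00 | 0.80 | 0.67 | 1.00 | 0.80 | 0.83 | 1.00 | 0.91 | 1.00 | 0.75 | 0.86 |
| | 2 | 0.96 | 0.85 | 0.90 | 0.95 | 0.92 | 0.93 | 1.00 | 0.95 | 0.97 | 0.97 | 0.99 | 0.98 | 0.99 | 0.94 | 0.96 |
| | 3 | 0.69 | 0.88 | 0.77 | 0.88 | 0.88 | 0.88 | 0.91 | 0.97 | 0.94 | 1.00 | 0.94 | 0.97 | 0.78 | 0.96 | 0.86 |
| k-NN [28] | 1 | 0.83 | 0.50 | 0.63 | 1.00 | 0.86 | 0.92 | 0.83 | 0.71 | 0.77 | 1.00 | 0.75 | 0.86 | 1.00 | 0.55 | 0.76 |
| | 2 | 0.90 | 0.93 | 0.92 | 0.97 | 0.97 | 0.97 | 0.96 | 1.00 | 0.98 | 0.93 | 1.00 | 0.96 | 0.92 | 0.97 | 0.94 |
| | 3 | 0.78 | 0.83 | 0.81 | 0.91 | 0.94 | 0.92 | 1.00 | 0.94 | 0.97 | 1.00 | 0.91 | 0.96 | 0.88 | 0.90 | 0.89 |
| NB | 1 | 0.33 | 0.13 | 0.19 | 0.17 | 0.13 | 0.14 | 0.50 | 0.38 | 0.43 | 0.33 | 0.29 | 0.31 | 0.67 | 0.27 | 0.38 |
| | 2 | 0.68 | 0.79 | 0.74 | 0.78 | 0.79 | 0.79 | 0.78 | 0.83 | 0.80 | 0.73 | 0.87 | 0.79 | 0.64 | 0.87 | 0.74 |
| | 3 | 0.53 | 0.52 | 0.52 | 0.63 | 0.65 | 0.63 | 0.63 | 0.59 | 0.61 | 0.78 | 0.58 | 0.67 | 0.66 | 0.50 | 0.57 |
| DT | 1 | 0.17 | 0.10 | 0.13 | 0.50 | 0.20 | 0.29 | 0.50 | 0.43 | 0.46 | 0.50 | 0.43 | 0.46 | 0.67 | 0.31 | 0.42 |
| | 2 | 0.71 | 0.71 | 0.71 | 0.73 | 0.79 | 0.76 | 0.77 | 0.77 | 0.77 | 0.82 | 0.90 | 0.86 | 0.70 | 0.78 | 0.74 |
| | 3 | 0.38 | 0.43 | 0.40 | 0.53 | 0.59 | 0.56 | 0.47 | 0.48 | 0.48 | 0.75 | 0.65 | 0.70 | 0.53 | 0.52 | 0.52 |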
Table 7. Average performance comparisons of five benchmark methods over five-fold cross-validation.

| Method | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- |
| C-SVM + undersampling | 0.6863 | 0.7096 | 0.6604 | 0.7135 |
| C-SVM + SMOTE | 0.8517 | 0.9215 | 0.8760 | 0.9261 |
| nu-SVM + undersampling | 0.6812 | 0.7069 | 0.6783 | 0.7459 |
| nu-SVM + SMOTE | 0.8520 | 0.9178 | 0.8752 | 0.9243 |
| k-NN + undersampling [28] | 0.6906 | 0.5074 | 0.5035 | 0.5459 |
| k-NN + SMOTE | 0.9276 | 0.8512 | 0.8798 | 0.9297 |
| NB + undersampling | 0.4760 | 0.4106 | 0.3537 | 0.4054 |
| NB + SMOTE | 0.5890 | 0.5444 | 0.5536 | 0.6829 |
| DT + undersampling | 0.4681 | 0.3945 | 0.3640 | 0.4198 |
| DT + SMOTE | 0.5810 | 0.5385 | 0.5493 | 0.6685 |
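Table 8. Results of five-fold cross-validation for the proposed ensemble model.

| Fold # | Class | TPR | PPV | F1 | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Underweight | 1.0000 | 0.8571 | 0.9231 | 0.9505 | 0.9072 | 0.9270 | 0.9369 |
| | Normal weight | 0.9452 | 0.9583 | 0.9517 | | | | |
| | Overweight | 0.9063 | 0.9063 | 0.9063 | | | | |
| 2 | Underweight | 1.0000 | 0.8571 | 0.9231 | 0.9850 | 0.9420 | 0.9616 | 0.9820 |
| | Normal weight | 0.9863 | 1.0000 | 0.9931 | | | | |
| | Overweight | 0.9688 | 0.9688 | 0.9688 | | | | |
| 3 | Underweight | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | Normal weight | 1.0000 | 1.0000 | 1.0000 | | | | |
| | Overweight | 1.0000 | 1.0000 | 1.0000 | | | | |
| 4 | Underweight | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
| | Normal weight | 1.0000 | 1.0000 | 1.0000 | | | | |
| | Overweight | 1.0000 | 1.0000 | 1.0000 | | | | |
| 5 | Underweight | 1.0000 | 0.8571 | 0.9231 | 0.9954 | 0.9524 | 0.9721 | 0.9910 |
| | Normal weight | 0.9863 | 1.0000 | 0.9931 | | | | |
| | Overweight | 1.0000 | 1.0000 | 1.0000 | | | | |
| Average | | | | | 0.9862 | 0.9603 | 0.9721 | 0.9820 |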
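The macro-averaged columns in Table 8 appear to be unweighted means of the per-class scores, and the "Average" row the mean over the five folds. The short Python sketch below recomputes the Fold #1 macro-averages and the five-fold average TPR from the tabulated values. The helper names `f1_score` and `mean` are hypothetical, and results are expected to match the table only up to rounding, since the inputs are already rounded to four decimals.

```python
def f1_score(tpr, ppv):
    """F1 as the harmonic mean of recall (TPR) and precision (PPV)."""
    return 2 * tpr * ppv / (tpr + ppv)


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Per-class (TPR, PPV) pairs for Fold #1, copied from Table 8.
fold1 = {
    "Underweight": (1.0000, 0.8571),
    "Normal weight": (0.9452, 0.9583),
    "Overweight": (0.9063, 0.9063),
}

# Macro-averaging gives every class the same weight, regardless of class size.
macro_tpr = mean(t for t, _ in fold1.values())
macro_ppv = mean(p for _, p in fold1.values())
macro_f1 = mean(f1_score(t, p) for t, p in fold1.values())
print(round(macro_tpr, 4), round(macro_ppv, 4), round(macro_f1, 4))

# Macro-average TPR of each fold from Table 8, averaged over the five folds.
fold_macro_tpr = [0.9505, 0.9850, 1.0000, 1.0000, 0.9954]
print(round(mean(fold_macro_tpr), 4))
```

Because each class contributes equally, the six-person underweight class influences the macro-averaged scores as much as the 73-person normal-weight class, which is why these scores can differ noticeably from accuracy on imbalanced data.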
Table 9. Five-fold cross-validation performance comparisons between the proposed ensemble model and five benchmark methods.

| Method | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- |
| C-SVM | 0.8517 | 0.9215 | 0.8760 | 0.9261 |
| nu-SVM | 0.8520 | 0.9178 | 0.8752 | 0.9243 |
| k-NN [28] | 0.9276 | 0.8512 | 0.8798 | 0.9297 |
| NB | 0.5890 | 0.5444 | 0.5536 | 0.6829 |
| DT | 0.5810 | 0.5385 | 0.5493 | 0.6685 |
| Proposed | 0.9862 | 0.9603 | 0.9721 | 0.9820 |
Table 10. Leave-one-person-out cross-validation results of the five benchmark methods when the undersampling technique is used.

| Method | Class | TPR | PPV | F1 | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-SVM | Underweight | 0.2667 | 0.0563 | 0.0930 | 0.3397 | 0.3493 | 0.3057 | 0.3837 |
| | Normal weight | 0.4130 | 0.7273 | 0.5269 | | | | |
| | Overweight | 0.3394 | 0.2642 | 0.2971 | | | | |
| nu-SVM | Underweight | 0.2667 | 0.0533 | 0.0889 | 0.3361 | 0.3474 | 0.3019 | 0.3766 |
| | Normal weight | 0.4022 | 0.7184 | 0.5157 | | | | |
| | Overweight | 0.3394 | 0.2705 | 0.3011 | | | | |
| k-NN [28] | Underweight | 0.2000 | 0.0405 | 0.0674 | 0.3054 | 0.3162 | 0.2774 | 0.3410 |
| | Normal weight | 0.3342 | 0.6150 | 0.4331 | | | | |
| | Overweight | 0.3818 | 0.2930 | 0.3316 | | | | |
| NB | Underweight | 0.1000 | 0.0211 | 0.0349 | 0.2616 | 0.2949 | 0.2606 | 0.3393 |
| | Normal weight | 0.3696 | 0.5913 | 0.4548 | | | | |
| | Overweight | 0.3152 | 0.2723 | 0.2921 | | | | |
| DT | Underweight | 0.2000 | 0.0405 | 0.0674 | 0.3054 | 0.3162 | 0.2774 | 0.3410 |
| | Normal weight | 0.3342 | 0.6150 | 0.4331 | | | | |
| | Overweight | 0.3818 | 0.2930 | 0.3316 | | | | |
Table 11. Leave-one-person-out cross-validation results of the five benchmark methods when SMOTE is used.

| Method | Class | TPR | PPV | F1 | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C-SVM | Underweight | 0.1667 | 0.1163 | 0.1370 | 0.3981 | 0.3904 | 0.3926 | 0.5560 |
| | Normal weight | 0.6821 | 0.6896 | 0.6858 | | | | |
| | Overweight | 0.3455 | 0.3654 | 0.3551 | | | | |
| nu-SVM | Underweight | 0.0333 | 0.0303 | 0.0317 | 0.3620 | 0.3662 | 0.3633 | 0.5719 |
| | Normal weight | 0.7255 | 0.6881 | 0.7063 | | | | |
| | Overweight | 0.3273 | 0.3803 | 0.3518 | | | | |
| k-NN [28] | Underweight | 0.3000 | 0.2000 | 0.2400 | 0.4852 | 0.4536 | 0.4627 | 0.5790 |
| | Normal weight | 0.6223 | 0.7316 | 0.6725 | | | | |
| | Overweight | 0.5333 | 0.4293 | 0.4757 | | | | |
| NB | Underweight | 0.0333 | 0.0227 | 0.0270 | 0.3404 | 0.3450 | 0.3422 | 0.5187 |
| | Normal weight | 0.6304 | 0.6610 | 0.6453 | | | | |
| | Overweight | 0.3576 | 0.3512 | 0.3544 | | | | |
| DT | Underweight | 0.0769 | 0.1000 | 0.0870 | 0.3387 | 0.3411 | 0.3394 | 0.5062 |
| | Normal weight | 0.6474 | 0.6386 | 0.6430 | | | | |
| | Overweight | 0.2919 | 0.2848 | 0.2883 | | | | |
Table 12. Performance comparisons of the five benchmark methods over leave-one-person-out cross-validation.

| Method | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- |
| C-SVM + undersampling | 0.3397 | 0.3493 | 0.3057 | 0.3837 |
| C-SVM + SMOTE | 0.3981 | 0.3904 | 0.3926 | 0.5560 |
| nu-SVM + undersampling | 0.3361 | 0.3474 | 0.3019 | 0.3766 |
| nu-SVM + SMOTE | 0.3620 | 0.3662 | 0.3633 | 0.5719 |
| k-NN + undersampling [28] | 0.3054 | 0.3162 | 0.2774 | 0.3410 |
| k-NN + SMOTE | 0.4852 | 0.4536 | 0.4627 | 0.5790 |
| NB + undersampling | 0.2616 | 0.2949 | 0.2606 | 0.3393 |
| NB + SMOTE | 0.3404 | 0.3450 | 0.3422 | 0.5187 |
| DT + undersampling | 0.3054 | 0.3162 | 0.2774 | 0.3410 |
| DT + SMOTE | 0.3387 | 0.3411 | 0.3394 | 0.5062 |
Table 13. Leave-one-person-out cross-validation performance comparisons between ensemble models with different numbers of k-NN classifiers.

| Method | # of k-NNs | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- | --- |
| Comparison model #1 | 3 | 0.4889 | 0.4559 | 0.4646 | 0.5950 |
| Comparison model #2 | 6 | 0.5300 | 0.4997 | 0.5030 | 0.6448 |
| Comparison model #3 | 10 | 0.5825 | 0.5450 | 0.5579 | 0.6927 |
| Proposed ensemble model | 15 | 0.6422 | 0.5878 | 0.6058 | 0.7300 |
| Comparison model #4 | 21 | 0.5864 | 0.5564 | 0.5676 | 0.7069 |
Table 14. Leave-one-person-out cross-validation performance comparisons between the proposed ensemble model and five benchmark methods.

| Method | Macro-Average TPR | Macro-Average PPV | Macro-Average F1 | Accuracy |
| --- | --- | --- | --- | --- |
| C-SVM | 0.3981 | 0.3904 | 0.3926 | 0.5560 |
| nu-SVM | 0.3620 | 0.3662 | 0.3633 | 0.5719 |
| k-NN [28] | 0.4852 | 0.4536 | 0.4627 | 0.5690 |
| NB | 0.3404 | 0.3450 | 0.3422 | 0.5187 |
| DT | 0.3387 | 0.3411 | 0.3394 | 0.5062 |
| Proposed | 0.6422 | 0.5878 | 0.6058 | 0.7300 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Kwon, B.; Lee, S. Ensemble Learning for Skeleton-Based Body Mass Index Classification. Appl. Sci. 2020, 10, 7812. https://doi.org/10.3390/app10217812