3.1.6. Classification

AU intensity classification was performed using SVM, RF, and kNN classifiers after reducing the feature vectors. SVMs are widely used for pattern recognition: they construct a hyperplane, or a set of hyperplanes, that can then be used for classification and regression problems. SVM classifiers find discriminative hyperplanes with the maximum margin separating the data belonging to different classes. The choice of kernel affects the efficiency of the SVM classifier; in our study, the Gaussian RBF kernel was used. Each AU had *C* = 6 intensity levels, and the one-against-one strategy was adopted, in which *C*(*C* − 1)/2 binary discriminant functions are defined, one for each possible pair of classes. Since the approach is appearance-based, each AU and each frame were considered individually, and the results are also strongly affected by the quality of the face region alignment.
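The one-against-one SVM setup above can be sketched as follows. This is a minimal illustration with synthetic stand-ins for the reduced feature vectors, not the authors' actual pipeline; the feature dimensionality and sample counts are assumptions.

```python
# Sketch of the classification step: a one-against-one SVM with an RBF
# kernel over C = 6 AU intensity levels. Features and labels below are
# synthetic placeholders for the reduced appearance features in the text.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
C_LEVELS = 6                      # AU intensity levels (hypothetical labels 0..5)
X = rng.normal(size=(300, 40))    # 300 frames, 40 reduced features (assumed)
y = rng.integers(0, C_LEVELS, size=300)

# decision_function_shape='ovo' trains C*(C-1)/2 = 15 binary discriminant
# functions, one for each possible pair of intensity levels.
clf = SVC(kernel="rbf", decision_function_shape="ovo", gamma="scale")
clf.fit(X, y)

scores = clf.decision_function(X[:1])
print(scores.shape)               # one score per class pair: (1, 15)
```

Each frame is classified independently, matching the per-frame, per-AU treatment described above.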

#### *3.2. Databases Considered*

Training of ML algorithms depends on the type and size of the database used. A database with too few training images leads to poor generalization, since the model cannot capture the variability of the problem. To counter this issue, a large database is needed for training the available ML algorithms. Estimation results are markedly better with larger databases than with smaller ones; hence, for accurate results an abundant database is required. Emotion classification and intensity estimation require a large and varied dataset for validation and testing. The images used for emotion recognition and intensity estimation are both spontaneous and posed.


Hence, a close relationship exists between the models used for facial emotion intensity estimation and the databases used. Five databases were considered: CK [62,63], JAFFE [64], DISF, B-DFE, and an in-house dataset of 200 images (20 images of each of the basic emotions considered) captured in an uncontrolled environment through a web camera. A brief description of each database follows:


It should be noted that the AUs labeled in each database were different; hence, a comparison between posed and spontaneous images was not possible across all databases. The only candidate comparison would have been between DISF and B-DFE; however, B-DFE is a 3D image database whereas DISF contains 2D images, so features extracted from the two would differ and such a comparison would not be fair. The feature vector of a 2D image has the form [width, height, 3], while that of a 3D image has the form [width, height, depth, 3]; the two feature representations are therefore different and not directly comparable. A possible workaround would be to extract features by passing the 3D volumes through a pre-trained 3D network/algorithm, or to perform 2D feature extraction on each slice of the volume and then combine the per-slice features, using PCA to reduce the dimensionality. However, this would impact the accuracy, and therefore such a comparison was not presented.
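The slice-wise workaround mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions: the volume shapes are invented, and a flattened raw-pixel slice is used as a stand-in for a real 2D appearance descriptor.

```python
# Sketch of the slice-wise workaround: treat a 3D volume of shape
# (width, height, depth, 3) as a stack of 2D slices, extract a feature
# vector per slice (raw pixels here, as a placeholder for a real 2D
# descriptor), concatenate, and compress with PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
volumes = rng.random(size=(20, 16, 16, 8, 3))  # 20 volumes, depth 8 (assumed)

def slice_features(volume):
    # One feature vector per depth slice, concatenated; a real system
    # would apply an appearance descriptor to each slice instead.
    w, h, d, c = volume.shape
    return np.concatenate([volume[:, :, k, :].ravel() for k in range(d)])

features = np.stack([slice_features(v) for v in volumes])

# PCA reduces the concatenated per-slice features to a common, lower
# dimensionality comparable to a 2D feature vector.
pca = PCA(n_components=10)
reduced = pca.fit_transform(features)
print(reduced.shape)  # (20, 10)
```

As the text notes, this compression discards information, which is why the comparison was not pursued.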
