**3. Results and Discussion**

#### *3.1. Datasets*

Four datasets—Microsoft Hand Gesture (MHG) [66], Padua Hand Gesture (PHG) [67], Padua FaceDec (PFD) [10], and Padua FaceDec2 (PFD2) [9]—were used to experimentally develop the system proposed in this work. The faces in these datasets were captured in unconstrained environments. All four datasets contain color images and their corresponding depth maps. All faces are upright and frontal, each with a limited degree of rotation. Two of the datasets were originally collected for gesture recognition rather than face detection. In addition, a separate set of images, extracted from the Padua FaceDec dataset [10], was used for preliminary experiments and parameter tuning. As in [9], these datasets were merged to form a challenging dataset for face detection.

In addition to the merged dataset, experiments are reported on the BioID dataset [56] so that the system proposed here can be compared with other face detection systems. Each of these five datasets is discussed below, with important information about each one summarized in Table 1.

MHG [66] was collected for the purpose of gesture recognition. This dataset contains images of 10 different people performing a set of gestures, which means that not only does each image in the dataset include a single face, but the images also exhibit a high degree of similarity. As in [9], a subset of 42 MHG images was selected, with each image manually labeled with the face position.

PHG [67] is a dataset for gesture recognition. It contains images of 10 different people displaying a set of hand gestures, and each image contains only one face. A subset of 59 PHG images was manually labeled.

PFD [10] was acquired specifically for face detection. PFD contains 132 labeled images that were collected outdoors and indoors with the Kinect 1 sensor. The images in this dataset contain zero, one, or more faces. Images containing people show them performing many different daily activities in the wild. Images were captured at different times of the day in varying lighting conditions. Some faces also exhibit various degrees of occlusion.

PFD2 [9] contains 316 images captured indoors and outdoors in different settings with the Kinect 2 sensor. For each scene, a 512 × 424 depth map and a 1920 × 1080 color image were obtained. Images contain zero, one, or more faces. Images of people show them in various positions with their heads tilted or next to objects. The outdoor depth data collected by Kinect 2 are considerably noisier than the data collected with Kinect 1, which makes PFD2 an even more challenging dataset. The depth data were retroprojected over the color frame and interpolated to the same resolution to obtain aligned depth and color fields.
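The retroprojection step above can be sketched as follows. This is a minimal illustration of how a depth map is back-projected to 3-D points and reprojected into the color camera's image plane; the camera intrinsics (`K_d`, `K_c`) and the depth-to-color extrinsics (`R`, `t`) are placeholders, since the actual Kinect 2 calibration values are not given in the text.

```python
import numpy as np

def reproject_depth_to_color(depth, K_d, K_c, R, t):
    """Map each depth pixel into the color camera's image plane.

    depth : (H, W) depth map in the depth camera frame
    K_d, K_c : 3x3 intrinsic matrices (depth and color cameras)
    R, t : rotation and translation from depth frame to color frame
    Returns the color-frame pixel coordinates, the depth in the color
    frame, and a validity mask (points in front of the camera).
    """
    h, w = depth.shape
    # Back-project every depth pixel to a 3-D point (pinhole model).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - K_d[0, 2]) * z / K_d[0, 0]
    y = (v - K_d[1, 2]) * z / K_d[1, 1]
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Transform the points into the color camera frame.
    pts_c = pts @ R.T + t
    valid = pts_c[:, 2] > 0
    # Project onto the color image plane.
    uc = K_c[0, 0] * pts_c[:, 0] / pts_c[:, 2] + K_c[0, 2]
    vc = K_c[1, 1] * pts_c[:, 1] / pts_c[:, 2] + K_c[1, 2]
    return uc, vc, pts_c[:, 2], valid
```

In practice, the scattered `(uc, vc)` samples would then be interpolated onto the 1920 × 1080 color grid (and holes filled) to obtain the aligned depth field described above.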

**Table 1.** Characteristics of the five datasets. MHG: Microsoft Hand Gesture, PHG: Padua Hand Gesture, PFD: Padua FaceDec, and PFD2: Padua FaceDec2.


The MHG, PHG, PFD, and PFD2 datasets were merged, as in [9], to form a larger, more challenging dataset, called MERGED, containing 549 images with 614 total faces. Only upright frontal faces with a maximum rotation of ±30° were included. Parameter optimization of the face detectors was performed manually, and the resulting parameters were fixed for all images even though they came from four datasets with different characteristics.
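The merge-and-filter step could be sketched as below. The annotation format here (a per-face in-plane rotation angle alongside the bounding box) is a hypothetical simplification; the actual label files for the four datasets are not described in the text.

```python
def build_merged_dataset(datasets, max_rotation=30.0):
    """Merge per-dataset face annotations, keeping only upright frontal
    faces whose in-plane rotation is within +/- max_rotation degrees.

    datasets : dict mapping dataset name -> list of
               (image_id, bbox, roll_degrees) tuples (assumed format)
    Returns a flat list of (dataset, image_id, bbox, roll) tuples.
    """
    merged = []
    for name, faces in datasets.items():
        for image_id, bbox, roll in faces:
            if abs(roll) <= max_rotation:  # rotation criterion from the text
                merged.append((name, image_id, bbox, roll))
    return merged
```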

As a final dataset for validating the approach proposed in this work, we chose one of the leading benchmark datasets for upright frontal face detection: the BioID dataset [56]. It contains 1521 images of 23 people collected during several identification sessions. The images in BioID are gray-scale and do not include depth map information. Moreover, the degree of rotation in the facial images is small. As a consequence, most of the filters applied to the ensembles were not transferable to the BioID dataset. Despite this shortcoming, this dataset is useful in demonstrating the effectiveness of the ensembles developed in this work.
