## 2.3.3. Segmentation-Based Filtering (SEG and ELL)

SEG and ELL, proposed in [9], operate on a segmented version of the candidate depth image, comparing the largest segmented region either to the candidate's bounding box (SEG) or to an elliptical shape model, which a face should approximate (ELL). From this information, two simple but useful tests can be made. In SEG, the area of the largest region is compared to that of the entire candidate image: candidates in which the largest region covers less than 40% of the total area are rejected. In ELL, the largest region is assigned a fitness score, computed with the least-squares criterion, that measures its closeness to an elliptical model; here this score is obtained with the MATLAB function fit_ellipse [61]. Candidates with a score higher than 100 are rejected.
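The SEG test above can be sketched as follows. This is a minimal illustration, not the implementation of [9]: it assumes the segmentation is given as an integer label image with 0 marking background, both of which are assumptions made here.

```python
import numpy as np

def seg_filter(segmented_region: np.ndarray, min_area_ratio: float = 0.4) -> bool:
    """SEG sketch: keep a candidate only if its largest connected segment
    covers at least min_area_ratio of the candidate's total area.
    Assumes integer segment labels with 0 as background."""
    # Count pixels per non-background segment label
    labels, counts = np.unique(segmented_region[segmented_region > 0],
                               return_counts=True)
    if counts.size == 0:
        return False  # no foreground segment at all
    # Keep the candidate only if the largest segment is large enough
    return bool(counts.max() / segmented_region.size >= min_area_ratio)
```

The ELL score would be computed analogously by fitting an ellipse to the boundary of the largest segment and thresholding the least-squares residual, as done with fit_ellipse in the original work.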

## 2.3.4. Eye-Based Filtering (EYE)

EYE, as proposed in [9], uses the presence of eyes in a region to detect a face. In EYE, two robust eye detectors are applied to candidate face regions [62,63]. Regions with a low probability of containing two eyes are rejected.

One of the eye detectors [62] used in EYE is a variant of the Pictorial Structures (PS) model. PS is a computationally efficient framework that represents a face as an undirected graph *G* = (*V*, *E*), where the vertices *V* correspond to facial features and the edges *E* describe the local pairwise spatial relationships between those features. PS is extended in [62] to handle variations in appearance as well as many of the structural changes that eyes undergo in different settings.

The second eye detector, presented in [63], uses color information to build an eye map that highlights the iris. Once the iris area is identified, a radial symmetry transform is applied to both the eye map and the original image. The cumulative result of this enhancement process yields the positions of the eyes. Face candidates are rejected when the eye detection score falls outside a threshold of 1 for the first approach [62] or of 750 for the second approach [63].
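The rejection rule above can be sketched as a simple check on the two detector scores. The score semantics of the detectors in [62,63], and the direction of each comparison, are assumptions made for illustration only.

```python
def eye_filter(ps_score: float, rs_score: float,
               ps_threshold: float = 1.0, rs_threshold: float = 750.0) -> bool:
    """EYE sketch: keep a candidate only when both detector scores stay
    within their thresholds (1 for the PS detector, 750 for the
    radial-symmetry detector). The comparison direction is an assumption."""
    return ps_score <= ps_threshold and rs_score <= rs_threshold
```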

## 2.3.5. Filtering Based on the Analysis of the Depth Values (SEC)

SEC, as proposed in [9], takes advantage of the fact that most faces, except those of people lying flat, sit on top of the body, while the remaining surrounding volume is often empty. With SEC, candidate faces are rejected when the neighborhood exhibits a pattern different from the expected one.

The deviation from the expected pattern is measured as follows. First, the rectangular region defining a candidate face is enlarged so that the neighborhood of the face in the depth map can be analyzed.

Second, the enlarged region is partitioned into radial sectors (eight in this work; see Figure 4), each emanating from the center of the candidate face. For each sector *Sec*<sub>i</sub>, the number of pixels *n*<sub>i</sub> is counted whose depth value *d*<sub>p</sub> is close to the average depth value of the face *d̄*, thus:

$$n_i = \left| \left\{ p : \left| d_p - \overline{d} \right| < t_d \land p \in \text{Sec}_i \right\} \right| \tag{7}$$

where *t*<sub>d</sub> is a measure of closeness (*t*<sub>d</sub> = 50 cm here).
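Equation (7) can be sketched as follows. This is a minimal illustration under assumptions made here: depths are in millimetres (so *t*<sub>d</sub> = 50 cm becomes 500), a depth of 0 marks missing data, and sectors are assigned by uniformly binning the angle around the face center.

```python
import numpy as np

def sector_pixel_counts(depth: np.ndarray, center: tuple[float, float],
                        face_mean_depth: float, t_d: float = 500.0,
                        n_sectors: int = 8) -> np.ndarray:
    """Sketch of Equation (7): for each radial sector around `center`
    (row, col), count the pixels whose depth lies within t_d of the
    mean face depth. Units and the zero-as-missing convention are
    assumptions for illustration."""
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Angle of each pixel around the face center, in [-pi, pi]
    angles = np.arctan2(ys - center[0], xs - center[1])
    # Uniformly bin angles into n_sectors radial sectors
    sector = ((angles + np.pi) / (2 * np.pi) * n_sectors).astype(int) % n_sectors
    # Pixels whose depth is close to the average face depth (Eq. 7)
    close = (depth > 0) & (np.abs(depth - face_mean_depth) < t_d)
    return np.array([np.count_nonzero(close & (sector == s))
                     for s in range(n_sectors)])
```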

**Figure 4.** Examples of partitioning of a neighborhood of the candidate face region into 8 sectors (gray area). The lower sectors *Sec*<sub>4</sub> and *Sec*<sub>5</sub> that should contain the body are depicted in dark gray [9].

Finally, the pixel counts are averaged over the two lower sectors (*Sec*<sub>4</sub> and *Sec*<sub>5</sub>), yielding *n*<sub>l</sub>, and over the six remaining sectors, yielding *n*<sub>u</sub>. The ratio between *n*<sub>l</sub> and *n*<sub>u</sub> is then computed as:

$$\frac{n_l}{n_u} = \frac{\frac{1}{2}(n_4 + n_5)}{\frac{1}{6}(n_1 + n_2 + n_3 + n_6 + n_7 + n_8)}. \tag{8}$$

If the ratio drops below a certain threshold *t*<sub>r</sub> (*t*<sub>r</sub> = 0.8 here), then the candidate face is removed.
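Equation (8) and the rejection rule can be sketched as below, given the eight per-sector counts. The handling of an all-empty neighborhood (*n*<sub>u</sub> = 0) is not specified in the text and is an assumption made here.

```python
def sec_filter(n: list[float], t_r: float = 0.8) -> bool:
    """Sketch of Equation (8): n holds the per-sector counts n_1..n_8
    (index 0 = Sec_1). Keep the candidate when n_l / n_u >= t_r."""
    n_l = 0.5 * (n[3] + n[4])                            # lower sectors Sec_4, Sec_5
    n_u = sum(n[i] for i in (0, 1, 2, 5, 6, 7)) / 6.0    # remaining six sectors
    if n_u == 0:
        # Empty surroundings: keep only if the body sectors are populated
        # (assumption; the division in Eq. 8 is undefined in this case)
        return n_l > 0
    return (n_l / n_u) >= t_r
```

With the counts from Figure 4's expected pattern, the body fills the lower sectors while the upper ones stay sparse, so the ratio stays high and the candidate is kept.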
