*2.3. Verification*

To verify the hypothesis that the trained model can detect rotated structures, the following experiment was conducted.

First, fragments of binary masks with precise ear localization (100 samples) were extracted from the UBEAR dataset [32]. Next, their graph representations were created using the SLIC algorithm [26]. The nodes of those graphs corresponded to the generated superpixels and were described by their average intensity normalized to the [0, 1] interval (Figure 5a). Directed edges connected the superpixels' centroids (Figure 5b) and had Cartesian pseudo-coordinates assigned.
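The graph construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the SLIC step itself is omitted (a precomputed label map is assumed as input), and the function name `build_graph` is hypothetical.

```python
import numpy as np

def build_graph(labels, image):
    """Build a region-adjacency graph from a SLIC-like label map.

    Nodes are superpixels described by their mean intensity (assumed
    already normalized to [0, 1]); directed edges connect centroids of
    adjacent superpixels and carry the centroid offset as Cartesian
    pseudo-coordinates.
    """
    ids = [int(i) for i in np.unique(labels)]
    feats = {i: float(image[labels == i].mean()) for i in ids}       # node features
    cents = {i: np.argwhere(labels == i).mean(axis=0) for i in ids}  # centroids
    # Superpixels are adjacent if two 4-neighbouring pixels differ in label.
    edges = set()
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        mask = a != b
        edges |= set(zip(a[mask].tolist(), b[mask].tolist()))
    edges |= {(t, s) for (s, t) in edges}   # both directions (directed graph)
    pseudo = {(s, t): cents[t] - cents[s] for (s, t) in edges}
    return feats, cents, pseudo
```

On a toy 4 × 4 label map with four 2 × 2 superpixels, the function returns one node per superpixel and directed edges between the horizontally and vertically adjacent regions, each labeled with the offset between the two centroids.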

Those graphs allowed generating a family of training *D<sub>TR</sub><sup>θ</sup>*, validation *D<sub>VA</sub><sup>θ</sup>*, and test *D<sub>TE</sub><sup>θ</sup>* sets with 60, 20, and 20 samples, respectively, where for every input graph the expected output graph was prepared. The values assigned to the nodes of the output graphs indicated whether a node approximately corresponded to an outer edge at angle *θ*. They were calculated using the following formula:

$$h\_{\theta}(s) = \max\left(\frac{1}{\Omega\_{\theta}(s)} \sum\_{t \in \mathcal{N}(s): \omega\_{\theta}(s,t) > 0} \omega\_{\theta}^{2}(s,t)(f(t) - f(s)), 0\right) \tag{4}$$

where:

$$\Omega\_{\theta}(s) = \sum\_{t \in \mathcal{N}(s): \omega\_{\theta}(s,t) > 0} \omega\_{\theta}^{2}(s,t) \tag{5}$$

and:

$$\omega\_{\theta}(s,t) = \sin(\alpha(s,t) - \theta) \tag{6}$$

As before, *f* and *h* represent the feature vectors of the input and output graphs' nodes, and *α*(*s*, *t*) is the angle between the horizontal axis and the edge connecting nodes *s* and *t*. A sample horizontal edge found in this way for *θ* = 0 is depicted in Figure 5c.
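The target values defined by Equations (4)–(6) can be computed per node as below. This is a direct transcription of the formulas, with an assumed dictionary-based graph representation (the function name `edge_response` and the data layout are illustrative, not from the paper):

```python
import numpy as np

def edge_response(theta, f, centroids, neighbors, s):
    """Target value h_theta(s) from Equations (4)-(6): a normalized,
    rectified sum of weighted intensity differences over neighbours t
    with omega_theta(s, t) = sin(alpha(s, t) - theta) > 0, where
    alpha(s, t) is the angle of edge (s, t) against the horizontal axis.
    """
    num, norm = 0.0, 0.0
    for t in neighbors[s]:
        dy, dx = centroids[t] - centroids[s]
        alpha = np.arctan2(dy, dx)          # edge angle vs. horizontal axis
        w = np.sin(alpha - theta)           # Equation (6)
        if w > 0:
            num += w**2 * (f[t] - f[s])     # numerator of Equation (4)
            norm += w**2                    # Omega_theta(s), Equation (5)
    return max(num / norm, 0.0) if norm > 0 else 0.0
```

For a node with a single brighter neighbour directly "above" it (edge angle *π*/2), the response is the full intensity difference when *θ* = 0 and drops to 0 when *θ* = *π*/2, since the weight sin(*α* − *θ*) is then non-positive.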

**Figure 5.** Images and graph used for initial verification of the proposed approach: (**a**) original image with superpixels detected; (**b**) its graph representation; (**c**) the expected output. In this experiment, the expected average size of a superpixel was equal to *A* = 400, and graph nodes were connected only if the corresponding superpixels were adjacent. It is worth noting that irregularities of the superpixels, and consequently of the graph structure, are present only if the image colors are not uniform. This is typical when the SLIC algorithm is used for superpixel generation.

Having these data prepared, the CNN with GMM filters was trained using only *D<sub>TR</sub><sup>0</sup>*. The MSE loss and the Adam optimizer with a learning rate of 10<sup>−3</sup> were applied. The validation samples were used to check that the model was not overfitted and to select the optimal one. The network contained *L* = 2 convolutional layers: 10 groups with 1 filter in the first layer *φ*<sub>1</sub> and 1 group with 10 filters in the second one *φ*<sub>2</sub>. Every filter contained *J* = 4 Gaussian functions. The ReLU activation function was used in the first layer and the identity in the second. In Figure 6, a sample trained GMM filter *ϕ* from the first layer is presented together with its rotated version *ϕ<sub>θ</sub>*. The results depicted in Figure 7 show that the proposed concepts behaved correctly not only for the training samples but were also able to generalize and give reasonable results for unseen graphs. In Table 1, a systematic evaluation is presented. It is evident that if the filters in the original network Φ<sup>0</sup> were rotated by an angle *θ*, the resulting network Φ<sup>*θ*</sup> was able to effectively detect structures rotated by the same angle (significantly smaller MSE error).
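The filter rotation used to obtain Φ<sup>*θ*</sup> from Φ<sup>0</sup> can be sketched for a single GMM filter over 2D pseudo-coordinates: each Gaussian component's mean is rotated about the origin and its covariance is conjugated by the same rotation matrix, while the mixture weights stay unchanged. This is an illustrative sketch under those assumptions, not the paper's code; the function name `rotate_gmm_filter` is hypothetical.

```python
import numpy as np

def rotate_gmm_filter(mus, sigmas, theta):
    """Rotate a GMM filter by angle theta.

    mus:    (J, 2) array of component means over pseudo-coordinates.
    sigmas: (J, 2, 2) array of component covariance matrices.
    Means are rotated about the origin; covariances become R @ S @ R.T.
    """
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])                       # 2D rotation matrix
    mus_r = mus @ R.T                                     # rotate each mean
    sigmas_r = np.einsum('ij,njk,lk->nil', R, sigmas, R)  # R @ Sigma @ R.T
    return mus_r, sigmas_r
```

Rotating by *π*/2 moves a component centered at (1, 0) to (0, 1), while an isotropic covariance is left unchanged, as expected for a rotation-equivariant reparameterization.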

## **3. Results and Evaluation**

This section contains the results of the experiments conducted on the UBEAR dataset [32]. First, the convolutional network was trained to detect ears in images with a normal head orientation. Next, it was applied to detect ears for other, selected head poses. At the end of this section, a discussion of the experiments' outcomes is presented.

**Figure 6.** Sample GMM filter in layer *φ*<sup>1</sup> of the CNN trained in the described experiment: (**a**,**b**) 2D and 3D filter visualization, respectively; (**c**) GMM filter rotated by an angle *θ*. In all cases, red color represents a negative value, and green color represents a positive one. Black color corresponds to values close to 0.

**Figure 7.** Outputs of the trained CNN for different filter rotation angles *θ*: (**a**,**b**) results for the training image shown in Figure 5; (**c**) result for an image from the test set. It should be noted that network Φ<sup>0</sup>, trained to detect structures in their basic orientation, is able to give a reasonable answer when its rotated version Φ<sup>*θ*</sup> is used.

**Table 1.** MSE errors of network Φ<sup>0</sup> (trained using *D<sub>TR</sub><sup>0</sup>* to detect horizontal edges) and its rotated versions Φ<sup>*θ*</sup> calculated for datasets *D<sup>θ</sup>* with different expected orientations of edges. As expected, when the edge rotation corresponds to the network (filter) rotation, the MSE errors are significantly smaller than those obtained for the original network Φ<sup>0</sup>. The presented errors in fact involve only a small number of superpixels, as in both the network outputs and the expected graphs, most of the values are equal to 0.

