Fusion Network

After features were extracted from the three regions (eyes, nose, and mouth), three stream features, denoted T1, T2, and T3, were obtained. To improve recognition, the three stream features were then concatenated as

$$T = T_1 \oplus T_2 \oplus T_3 \tag{4}$$

where T is the fused feature and ⊕ represents the concatenation operation. The concatenated features T were used as inputs to the next operation of the network.
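The fusion step can be illustrated with a minimal NumPy sketch. The function name `fuse_streams`, the batch size, and the per-stream feature dimensions below are hypothetical and chosen only for illustration; the sketch also assumes each stream feature is a flattened vector per sample and that concatenation is along the feature axis, which the text does not specify.

```python
import numpy as np

def fuse_streams(t1, t2, t3):
    """Concatenate three stream features along the last (feature) axis.

    Assumes all inputs share the same leading (batch) dimensions.
    """
    return np.concatenate([t1, t2, t3], axis=-1)

# Hypothetical shapes: a batch of 2 samples with 4-, 3-, and 5-dimensional
# features for the eyes, nose, and mouth streams, respectively.
t1 = np.ones((2, 4))        # eyes stream
t2 = np.zeros((2, 3))       # nose stream
t3 = np.full((2, 5), 2.0)   # mouth stream

fused = fuse_streams(t1, t2, t3)
print(fused.shape)  # (2, 12)
```

Under these assumptions, the fused feature's dimension is the sum of the three stream dimensions, and the order of the streams in T is preserved.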
