**3. Methods**

In this section, we first describe our deep transfer-learning framework and then propose the weighted-cluster loss. The framework of the proposed method is shown in Figure 1.

**Figure 1.** Framework of the proposed method. An SE-ResNet-50 model [21], pre-trained on VGGFace2 data [22] for face identification, is fine-tuned on AffectNet data [5] for facial expression recognition using the weighted-cluster loss. Before the fine-tuning phase, we add one more fully connected layer to the model and freeze the first three stages of the pre-trained model to save computing power. The weighted-cluster loss is used at the output layer to update the model parameters. Best viewed in color.

#### *3.1. Base Model and Pre-Training*

#### 3.1.1. Base Models

Convolutional neural networks (CNNs) have achieved great success in pattern recognition and computer vision. This motivated us to base our FER models on a recent, representative CNN architecture. In this work, we employed the SE-ResNet-50 model [21], which is the ResNet-50 model [47] integrated with squeeze-and-excitation (SE) modules, as our base model. Integrating SE modules into ResNet architectures improves the feature learning capacity of the resulting model. A wide range of experiments has shown that SENets achieve state-of-the-art performance across multiple datasets and tasks, including object and scene classification, and a squeeze-and-excitation network won the ILSVRC 2017 competition.
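To make the role of an SE module concrete, the following is a minimal sketch of the squeeze, excitation, and scale steps in plain Python. It is not the implementation used in this work: the function name `se_block`, the toy input, and the weight values are hypothetical, biases are omitted, and nested lists stand in for tensors.

```python
import math

def se_block(feature_map, w1, w2):
    """Apply a squeeze-and-excitation gate to a C x H x W feature map (nested lists)."""
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Excitation, step 1: bottleneck fully connected layer (C -> C/r) with ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    # Excitation, step 2: fully connected layer (C/r -> C) with a sigmoid gate in (0, 1).
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel of the input by its gate.
    scaled = [[[g * x for x in row] for row in ch]
              for ch, g in zip(feature_map, gates)]
    return scaled, gates

# Toy input (assumed values): 2 channels of size 2 x 2, reduction ratio r = 2.
x = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[0.5, 0.5]]      # shape (C/r, C) = (1, 2)
w2 = [[1.0], [-1.0]]   # shape (C, C/r) = (2, 1)
scaled, gates = se_block(x, w1, w2)
```

In SE-ResNet-50, a gate of this kind is applied to the output of each residual block before the skip connection is added, so that informative channels are emphasized and less useful ones are suppressed.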
