#### 2.2.1. Popular Models of DCNN

In 1998, LeCun et al. [24] constructed the first convolutional neural network model, LeNet-5, which achieved excellent performance in handwritten character recognition tasks. In 2012, Krizhevsky and Hinton [25] proposed AlexNet, which won first place in that year's ImageNet visual recognition challenge. Since then, various artificial intelligence applications based on DCNNs have emerged, and DCNNs have developed rapidly. The newer models can be divided into branchless models, such as VGG, and modular stacked models, such as GoogLeNet, ResNet, and DenseNet.

VGGNet [26] builds a deeper network structure on top of AlexNet to improve the learning of image features. Meanwhile, VGGNet stacks smaller convolution kernels to reduce the number of network parameters and iterations. VGGNets are still widely used for image feature extraction due to their excellent performance. In the same year, Google launched GoogLeNet [27], which uses far fewer network parameters than VGGNet and therefore occupies less memory and fewer computing resources. It also adopts a modular network structure in which parallel convolution kernels are merged. After years of optimization and improvement, several versions have been derived.
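The parameter saving from stacking small kernels can be checked with simple arithmetic. The sketch below compares one 5×5 convolution with two stacked 3×3 convolutions covering the same receptive field; the channel count (256) is chosen purely for illustration and is not taken from the VGG paper.

```python
# Parameter count of a convolutional layer (bias terms omitted for simplicity):
# params = kernel_h * kernel_w * in_channels * out_channels
def conv_params(kernel: int, in_ch: int, out_ch: int) -> int:
    return kernel * kernel * in_ch * out_ch

# Two stacked 3x3 convolutions cover the same 5x5 receptive field as a single
# 5x5 convolution, but use fewer parameters (256 channels assumed for illustration).
ch = 256
single_5x5 = conv_params(5, ch, ch)       # 5 * 5 * 256 * 256 = 1,638,400
stacked_3x3 = 2 * conv_params(3, ch, ch)  # 2 * 3 * 3 * 256 * 256 = 1,179,648

print(single_5x5, stacked_3x3)
```

The stacked variant needs about 28% fewer parameters here, and inserts an extra non-linearity between the two layers as a side benefit.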

ResNet [28] adopts the same modular stacked structure as VGGNet and introduces a novel residual structure that greatly improves fitting ability and overcomes the degradation problem of deep neural networks. Subsequently, DenseNet [29] introduced a dense block structure that reuses the feature maps of each layer, thereby improving feature propagation through the network, improving the identification efficiency of the network, and reducing the number of network parameters.

Although deep convolutional network models have excellent performance, their computational efficiency is low due to their complex network structures, which makes DCNNs difficult to apply widely in practical engineering.

#### 2.2.2. Lightweight Convolutional Neural Networks

Lightweight convolutional neural networks aim to reduce computation and storage requirements and to increase recognition speed; they mainly include the ShuffleNet series and the MobileNet series. Howard et al. [30] proposed MobileNetV1, which uses a straight (branchless) network structure and replaces traditional convolutional layers with depthwise separable convolutions. With this improvement, the number of model parameters can be greatly reduced while the computational accuracy of the network is maintained. Sandler et al. then proposed MobileNetV2, which also adopts depthwise separable convolutions in place of standard convolutions and adds inverted residual blocks and linear bottleneck structures. MobileNetV2 therefore reduces the number of model parameters while maintaining accuracy. The convolution process of MobileNetV2 is shown in Figure 2. Owing to their excellent performance and lightweight size, MobileNet models are widely used in various recognition fields [31–34]. Because moving load identification must respond rapidly to vehicle information while the vehicle is in motion, this paper introduces MobileNetV2 into moving load identification.

**Figure 2.** Convolution process of MobileNetV2.
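The parameter saving of a depthwise separable convolution over a standard convolution can be made concrete with a short calculation. A standard k×k convolution mixes all input channels for every output channel, while the separable version factorizes this into a per-channel k×k depthwise step plus a 1×1 pointwise step. The channel counts below are illustrative assumptions, not values from the MobileNet papers.

```python
def standard_conv_params(k, in_ch, out_ch):
    # Standard convolution: every output channel mixes all input channels.
    return k * k * in_ch * out_ch

def depthwise_separable_params(k, in_ch, out_ch):
    # Depthwise step: one k x k filter per input channel;
    # pointwise step: a 1 x 1 convolution that mixes the channels.
    return k * k * in_ch + in_ch * out_ch

k, in_ch, out_ch = 3, 128, 128
std = standard_conv_params(k, in_ch, out_ch)        # 3*3*128*128 = 147,456
sep = depthwise_separable_params(k, in_ch, out_ch)  # 3*3*128 + 128*128 = 17,536

print(std, sep, round(std / sep, 1))
```

The reduction factor is roughly 1/out_ch + 1/k², so with a 3×3 kernel the separable form needs about 8 to 9 times fewer parameters, which is the main source of MobileNet's lightweight size.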

## *2.3. Transfer Learning*

The main concept of transfer learning is to use data from related fields to overcome the shortage of data in a target field. Its goal is to use knowledge learned in an original environment to help learn tasks in a new environment. Depending on what is transferred to the target domain, transfer learning can be classified into the following forms [35]: (1) instance transfer; (2) feature representation transfer; (3) parameter transfer; (4) relational knowledge transfer.

The strong transferability of neural network models greatly improves the applicability of transfer learning in deep learning. Compared with general transfer learning methods, the transfer learning strategy for DCNNs transfers the shallow feature extraction ability (i.e., texture, edge features, etc.) from the source domain to the target domain. Fine-tuning is therefore commonly used for transfer learning in DCNNs. Specifically, when the new dataset is small and differs significantly from the dataset used to pre-train the model, the first several layers of the pre-trained model are frozen, the remaining layers are retrained, and the task classifier is replaced to match the new learning task. An example of transfer learning is shown in Figure 3.

**Figure 3.** Illustration of transfer learning strategy process.
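The fine-tuning steps described above can be sketched in a framework-agnostic way: freeze the first layers (generic texture/edge features), keep the deeper layers trainable, and swap the classifier for the target task. The layer names, layer count, and freeze depth below are all hypothetical, chosen only to illustrate the strategy.

```python
# Toy model: each layer is a dict with a trainability flag.
# Names and counts are illustrative, not from any specific library or network.
pretrained = [
    {"name": "conv1", "trainable": True},
    {"name": "conv2", "trainable": True},
    {"name": "conv3", "trainable": True},
    {"name": "conv4", "trainable": True},
    {"name": "classifier_source", "trainable": True},
]

def fine_tune(model, n_frozen, new_classifier):
    # Freeze the first n_frozen feature-extraction layers ...
    for layer in model[:n_frozen]:
        layer["trainable"] = False
    # ... and replace the source-task classifier with one for the target task.
    model[-1] = {"name": new_classifier, "trainable": True}
    return model

model = fine_tune(pretrained, n_frozen=2, new_classifier="classifier_target")
print([(layer["name"], layer["trainable"]) for layer in model])
```

In an actual deep learning framework, the same idea is realized by disabling gradient updates for the frozen parameters and attaching a new output layer sized for the target classes; only the unfrozen parameters are then updated during retraining.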
