*2.2. VGG16 and Transfer Learning*

The Visual Geometry Group introduced VGG16 in 2014 [34]. The network placed first in object localization and second in image classification at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014. Trained on the ImageNet dataset, the model achieves a top-1 accuracy of 71.5% and a top-5 accuracy of 90.1% in image classification.

The network contains 13 convolutional layers with 3 × 3 filters (i.e., convolution kernels) to extract features. Five max-pooling layers progressively reduce the spatial dimensions of the feature maps, and three fully connected layers map the flattened features to the Softmax layer where target class probabilities are calculated. In addition, the Rectified Linear Unit (ReLU) activation function is used to increase the non-linearity of the model. The network takes 224 × 224 RGB images as inputs and has more than 138 million learnable parameters. Figure 3 presents the architecture of the VGG16 model.

**Figure 3.** Architecture of the VGG16 model.
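As a sanity check on the 138-million figure, the layer-by-layer parameter count can be reproduced in a few lines of Python. The configuration list below follows the standard VGG16 layer layout (the script itself is illustrative and is not part of the original model code):

```python
# Reproduce the VGG16 parameter count layer by layer.
# 'M' marks a max-pooling layer (no learnable parameters).
conv_cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
            512, 512, 512, 'M', 512, 512, 512, 'M']

params = 0
in_ch = 3                                # RGB input
for v in conv_cfg:
    if v == 'M':
        continue                         # pooling has no weights or biases
    params += (3 * 3 * in_ch) * v + v    # 3x3 kernel weights + biases
    in_ch = v

# Five poolings shrink 224x224 inputs to 7x7, so the flattened feature
# vector feeding the fully connected layers has 7*7*512 entries.
fc_dims = [7 * 7 * 512, 4096, 4096, 1000]
for d_in, d_out in zip(fc_dims, fc_dims[1:]):
    params += d_in * d_out + d_out       # weights + biases

print(params)                            # 138,357,544 learnable parameters
```

Note that the three fully connected layers account for the large majority of the parameters, which is one reason the convolutional portion is often reused unchanged in Transfer Learning.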

The training of the VGG16 network, and of DCNNs in general, is based on minimizing a loss function (e.g., Binary or Multi-Class Cross-Entropy loss) that measures the discrepancy between the predicted outputs and the ground truth, with gradients computed through back-propagation.
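As a minimal numeric illustration (not taken from the paper), the Multi-Class Cross-Entropy loss for a single example reduces to the negative log of the probability the Softmax layer assigns to the true class:

```python
import math

def softmax(logits):
    """Convert raw network outputs (logits) into class probabilities."""
    m = max(logits)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, true_class):
    """Multi-class cross-entropy: -log of the true class's probability."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

# Hypothetical 3-class example: the network is fairly confident in
# class 0, and class 0 is indeed the ground truth, so the loss is small.
logits = [2.0, 0.5, -1.0]
loss = cross_entropy(logits, true_class=0)
print(round(loss, 4))                     # roughly 0.24
```

A confident wrong prediction would yield a large loss, which is exactly the discrepancy signal back-propagation pushes through the network.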

The optimization scheme generally relies on gradient-descent optimizers (e.g., Stochastic Gradient Descent or adaptive optimizers such as Adam) to iteratively update the learnable parameters of the network.
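Each gradient-descent step moves every parameter opposite to the loss gradient, w ← w − η ∂L/∂w. The toy example below (a one-parameter quadratic rather than a real network, purely for illustration) shows that rule converging to the minimizer:

```python
# Minimize L(w) = (w - 3)^2 with plain gradient descent.
# The gradient is dL/dw = 2 * (w - 3).
w = 0.0           # initial parameter value
lr = 0.1          # learning rate (eta)

for _ in range(100):
    grad = 2 * (w - 3)
    w -= lr * grad            # update rule: w <- w - lr * grad

print(round(w, 6))            # converges to the minimum at w = 3
```

Stochastic variants apply the same update using gradients estimated on mini-batches of training images rather than on the full dataset.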

For image classification, VGG16 and other state-of-the-art DCNNs are usually trained on the ImageNet dataset that contains millions of images belonging to thousands of classes. However, since the size of domain-specific datasets (e.g., concrete defects datasets in the case of this study) is limited, Transfer Learning techniques are applied to overcome the scarcity of labeled data.

In a Transfer Learning approach, models pre-trained on large datasets (e.g., ImageNet) are fine-tuned and partially retrained on the small target dataset. In this learning framework, the weights of the lower-level layers are generally frozen, since they capture generic features. In contrast, the high-level layers are more sensitive to the target dataset and must be retrained to update their learning parameters [23]. The Transfer Learning settings examined in this paper are detailed in the experimental setup section.
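The freeze-and-retrain pattern can be sketched in PyTorch. The model below is a small VGG-style stand-in (illustrative only, not the full VGG16 or the paper's exact setup): the convolutional feature extractor is frozen and only a new classification head, sized for a hypothetical 2-class target task, remains trainable:

```python
import torch.nn as nn

# Small VGG-style stand-in: two conv blocks followed by a linear head.
# Assumes 224x224 RGB inputs; two 2x2 poolings leave 56x56 feature maps.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 56 * 56, 2),   # new head for a 2-class target task
)

# Freeze the lower-level (generic) convolutional layers ...
for layer in model[:6]:
    for p in layer.parameters():
        p.requires_grad = False

# ... so the optimizer only updates the new classification head.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
print(trainable, frozen)
```

With the real pre-trained network, the same pattern applies via `torchvision.models.vgg16`: freeze the parameters of `model.features` and replace the last layer of `model.classifier` with a linear layer matching the number of target classes.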
