*2.1. Region-Based Convolutional Neural Network*

In this paper, to detect and localize multiple cracks on the surface of a tested concrete element a region-based convolutional neural network or regions with CNN features (R-CNN) architecture is applied. The R-CNN model works by performing computations in four steps [25]:


**Figure 1.** Flowchart for assessment of concrete surface cracks using Faster R-CNN and 2D DIC.

This approach was proposed by Girshick et al. in 2013 [26]. The schematic architecture of this model is presented in Figure 2.

**Figure 2.** Diagram of the R-CNN model architecture.

The main bottleneck of the R-CNN model is the need to extract features for each proposed region. The R-CNN model was improved by performing CNN forward computation on the whole image. This improved model is known as a Fast R-CNN model [27]. For obtaining precise object detection, Fast R-CNN requires, in general, many proposed regions in selective search. The Fast R-CNN model was improved by the Faster R-CNN model [28] which replaces selective search with a region proposal network (RPN) and reduces the number of generated proposed regions [25]. The schematic architecture of this model is presented in Figure 3.

The main part of these two architectures is a convolutional neural network (CNN). CNN is a special class of a layered feed-forward artificial neural network. CNN was designed for processing images and audio signals [29]. The typical CNN has an input layer, several hidden layers with nonlinear units, and an output layer with linear units (for regression) or nonlinear units (for classification). Each unit computes a weighted sum of its inputs (activation of the unit). The activation is sent to a transfer function, an S-shaped function such as sigmoid function or rectified linear unit (ReLU) function [29].

The training process use the backpropagation algorithm for efficiently computing the gradient of the loss function and stochastic gradient descent (SGD) algorithm for learning the weights of the CNN model. During training of the convolutional neural network with several millions of parameters, the main issue is the overfitting of the neural model to the dataset. Fortunately, there are several techniques to cope with the overfitting. For example, on can apply transfer learning to a pre-trained

model [29]. Transfer learning is a technique used for adaptation of the pre-trained model to another dataset. It is performed by additional training of selected convolutional layers of CNN while the rest of the layers are preserved [29].

**Figure 3.** Diagram of the Faster-R-CNN model architecture.

In this paper, as a CNN base the Inception V2 architecture is applied. Inception V2 is a convolutional neural network proposed by Szegedy et al. from Google Research in 2016 [30]. These architectures are available through the TensorFlow object detection API [31] which is an open-source library developed by Google Research and built upon TensorFlow [32]. It allows easy development and deployment of CNN-based models for object detection and other computer vision problems.
