**5. Results**

This section presents the training process, the experimental results, and the final performance of the convolutional neural network (CNN) for image-based detection of flaws in concrete elements.

Figure 12a shows the training and validation losses for the network used in the experiments. The plots show that the minimal losses achieved for the network with image augmentation and fine-tuning are 0.04 and 0.06. Table 3 lists the lowest training and validation losses and the training time for one epoch.

**Figure 12.** Performance of convolutional neural network (CNN): (**a**) training and validation losses, (**b**) training and validation accuracies.

**Table 3.** Training and validation losses and training time for one epoch.


Figure 12b shows the training and validation accuracies over 100 epochs of training. The best results in terms of accuracy are consistent with the corresponding results in terms of loss. The training accuracy after 100 epochs is 98%, while the maximal validation accuracy is 97%. Table 4 lists the highest training and validation accuracies together with the time needed for one epoch of training. The training time for the large pre-trained CNN with image augmentation and fine-tuning is 18 s per epoch.
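The best values reported for each curve can be read from a per-epoch training log. The following is a minimal sketch, assuming the log is a dictionary of per-epoch lists; the key names and the numbers are hypothetical placeholders, not the data behind Figure 12.

```python
import numpy as np

def summarize_history(history):
    """Return (best value, 1-based epoch) for each tracked metric."""
    summary = {}
    for name, values in history.items():
        values = np.asarray(values, dtype=float)
        # Losses are best at their minimum, accuracies at their maximum.
        best = np.argmin(values) if "loss" in name else np.argmax(values)
        summary[name] = (float(values[best]), int(best) + 1)
    return summary

# Placeholder curves for illustration only; these are not the paper's results.
history = {
    "loss": [0.60, 0.30, 0.12, 0.15],
    "val_loss": [0.65, 0.35, 0.20, 0.18],
    "accuracy": [0.70, 0.88, 0.95, 0.94],
    "val_accuracy": [0.68, 0.85, 0.90, 0.92],
}
summary = summarize_history(history)
```

Reporting the epoch alongside the value makes it visible whether the best validation result occurs before the end of training.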

**Table 4.** Training and validation accuracies and training time for one epoch.


After training, it is possible to visualize the filter outputs of the convolutional layers and the corresponding outputs of the pooling layers. This helps to understand how the CNN model represents the visual information from the training dataset. Figure 13 shows the outputs of the 32 filters of the first convolutional layer, merged into a single image of sub-images arranged in 2 rows and 16 columns. Each sub-image is 150 × 150 px.

**Figure 13.** Visualization of 2 × 16 filters from the first convolutional layer.
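A merged image of this kind can be assembled from a stack of 32 sub-images with a reshape and a transpose. The sketch below is only an illustration: the function name `tile_feature_maps` is hypothetical, and random values stand in for the actual layer outputs, which are not available here.

```python
import numpy as np

def tile_feature_maps(maps, rows=2, cols=16):
    """Merge an (H, W, C) stack of sub-images into one (rows*H, cols*W) image."""
    h, w, c = maps.shape
    if c != rows * cols:
        raise ValueError("number of channels must equal rows * cols")
    # Put channels first and split them into a rows x cols grid.
    grid = maps.transpose(2, 0, 1).reshape(rows, cols, h, w)
    # Interleave grid axes with pixel axes, then flatten to a 2D mosaic.
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)

# Random stand-in for the 32 outputs of size 150 x 150 px (assumption).
rng = np.random.default_rng(0)
activations = rng.random((150, 150, 32))
mosaic = tile_feature_maps(activations)
```

With 150 × 150 px sub-images, the mosaic is 300 × 2400 px, and channel `r * 16 + c` lands in row `r`, column `c` of the grid.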

Figure 14 shows the outputs of the corresponding 32 filters after the first max pooling layer. Each output is 75 × 75 px.

**Figure 14.** Visualization of 2 × 16 filters after the first max pooling layer.
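The reduction from 150 × 150 px to 75 × 75 px is consistent with 2 × 2 max pooling at stride 2. The pooling window is an assumption here, since it is not stated in this section. A minimal NumPy sketch of that operation, with random values standing in for the layer outputs:

```python
import numpy as np

def max_pool_2x2(maps):
    """Apply 2 x 2 max pooling with stride 2 to an (H, W, C) array."""
    h, w, c = maps.shape
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    # Group pixels into non-overlapping 2 x 2 blocks, then take each block's maximum.
    blocks = maps.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))

# Random stand-in for the first convolutional layer outputs (assumption).
rng = np.random.default_rng(0)
activations = rng.random((150, 150, 32))
pooled = max_pool_2x2(activations)
```

Each pooled pixel keeps only the strongest response in its block, which halves both spatial dimensions while leaving the number of channels unchanged.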

Lastly, the generalization capacity of the trained convolutional neural network was checked using the testing dataset. When the fine-tuned network analysed the new B-scans, the generalization accuracy of the final network was close to 99%.
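Generalization accuracy is the fraction of testing images classified correctly. The sketch below assumes a binary flaw/no-flaw output with a 0.5 decision threshold; both are assumptions, and the labels and scores are made-up placeholders rather than the B-scan results.

```python
import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded score matches the true label."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return float(np.mean(y_pred == np.asarray(y_true)))

# Placeholder labels and network scores for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]
test_accuracy = accuracy(y_true, y_prob)
```

In this toy example six of the eight placeholder samples are classified correctly, so `test_accuracy` is 0.75.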
