#### *5.3. Metrics*

Three criteria are used to evaluate segmentation performance. The first is the intersection over union (IU), defined in Equation (11), where *L* represents the ground truth and *R* represents the segmentation result. The second is the pixel accuracy (PA), defined in Equation (12), which measures the portion of the area requiring surveillance that is actually segmented. The third is the extra pixel (EP) ratio, defined in Equation (13), which measures the portion of the segmented area that does not need to be surveilled, relative to the ground truth. A low PA means that part of the track area is missing from the segmentation, which can lead to missed alarms; a high EP means that the segmentation includes area outside the track, which can lead to false alarms.

$$IU = \frac{L \cap R}{L \cup R} \tag{11}$$

$$PA = \frac{L \cap R}{L} \tag{12}$$

$$EP = \frac{R - L \cap R}{L} \tag{13}$$
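As an illustration of Equations (11)–(13), the three criteria can be computed by counting pixels in two binary masks. The sketch below is only a minimal example under stated assumptions: the masks are given as nested lists of 0/1 values with identical shape, and the names `Mask` and `segmentation_metrics` are hypothetical and not taken from the original implementation.

```python
from typing import Sequence, Tuple

# Hypothetical mask type: rows of 0/1 pixel labels (1 = track area).
Mask = Sequence[Sequence[int]]


def segmentation_metrics(label: Mask, result: Mask) -> Tuple[float, float, float]:
    """Return (IU, PA, EP) for ground truth `label` (L) and segmentation `result` (R)."""
    inter = 0     # |L ∩ R|
    n_label = 0   # |L|
    n_result = 0  # |R|
    for row_l, row_r in zip(label, result):
        for l, r in zip(row_l, row_r):
            l, r = bool(l), bool(r)
            n_label += int(l)
            n_result += int(r)
            inter += int(l and r)

    if n_label == 0:
        # PA and EP divide by |L|, so an empty ground truth is rejected here
        # (an assumption of this sketch, not a rule from the paper).
        raise ValueError("ground-truth mask is empty")

    union = n_label + n_result - inter       # |L ∪ R|
    iu = inter / union                       # Equation (11)
    pa = inter / n_label                     # Equation (12)
    ep = (n_result - inter) / n_label        # Equation (13)
    return iu, pa, ep


if __name__ == "__main__":
    L = [[1, 1, 0],
         [1, 1, 0]]
    R = [[0, 1, 1],
         [1, 1, 0]]
    print(segmentation_metrics(L, R))
```

In this toy example, *L* and *R* each contain four pixels and share three, so IU = 3/5, PA = 3/4, and EP = 1/4.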

#### *5.4. Performance of the Proposed Segmentation Algorithm*

The proposed algorithm is compared with MCG and FCN on images from the railway dataset; some examples are shown in Figure 14. The experiments were run in MATLAB 2012 on a platform with an Intel i5-6500 CPU and 8 GB of DDR3 memory, without a GPU, and the images in the dataset were resized to 90 × 150. The MCG method is the pre-trained demo from [17]. The FCN network uses a standard VGG16 structure trained on the VOC2012 dataset for feature extraction, and upsamples the outputs of the third, fourth, and seventh convolution layers.

**Figure 14.** Track-area detection using different algorithms. (**a**) The original railway scenes. (**b**) Ground truth of track areas. (**c**) Results of the MCG algorithm. (**d**) Results of the FCN algorithm. (**e**) Results of our algorithm.

The missing and extra parts of the segmented track area are shown in Figure 15. The MCG algorithm uses CRFs to combine fragmented regions into one unified area based on texture; because the texture of the nearby track differs from that of the distant track, part of the track is missed (as shown in Figure 15e). The performance of the FCN algorithm improved slightly over its original results in [19] because the railway scene is monotonous and the number of categories is small, but the improvement was not significant because the shapes and color textures of scene images sampled under different illumination, weather, and seasons were still complex. As shown in Figure 15f,i, the smooth boundary line produced by the FCN algorithm was not suitable for our railway scene parsing: it introduces concave and convex shapes along the straight, sharp edges of the region, especially near areas with acute angles and straight lines. These concave and convex shapes caused both missing parts and extra parts of the track area when compared with the ground truth, which would result in both missed alarms and false alarms. For the engineering application, our system would rather raise a false alarm than miss a true alarm.

**Figure 15.** Missing and extra areas of different methods compared with the ground truth. (**a**) Manual label of track areas. (**b**) Results of the MCG. (**c**) Results of the FCN. (**d**) Results of our method. (**e**) Missing part of MCG. (**f**) Missing part of FCN. (**g**) Missing part of our method. (**h**) Extra part of MCG. (**i**) Extra part of FCN. (**j**) Extra part of our method.
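The missing and extra areas visualized in Figure 15 can be read as set differences between the ground truth *L* and the segmentation result *R*: the missing part is the ground-truth area not covered by *R*, and the extra part is the segmented area outside *L*. The following sketch shows one way such masks might be derived; it assumes binary masks stored as nested lists of 0/1 values, and the function name `missing_and_extra` is hypothetical rather than part of the original implementation.

```python
from typing import List, Sequence, Tuple

# Hypothetical mask type: rows of 0/1 pixel labels (1 = track area).
Mask = Sequence[Sequence[int]]


def missing_and_extra(label: Mask, result: Mask) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (missing, extra) masks for ground truth `label` and segmentation `result`."""
    # Missing part: pixels in the ground truth that the segmentation did not cover.
    missing = [
        [int(bool(l) and not bool(r)) for l, r in zip(row_l, row_r)]
        for row_l, row_r in zip(label, result)
    ]
    # Extra part: segmented pixels that lie outside the ground truth.
    extra = [
        [int(bool(r) and not bool(l)) for l, r in zip(row_l, row_r)]
        for row_l, row_r in zip(label, result)
    ]
    return missing, extra


if __name__ == "__main__":
    L = [[1, 1, 0],
         [1, 1, 0]]
    R = [[0, 1, 1],
         [1, 1, 0]]
    missing, extra = missing_and_extra(L, R)
    print(missing)
    print(extra)
```

Under these assumptions, the pixel count of the missing mask divided by the ground-truth area equals 1 − PA, and the pixel count of the extra mask divided by the ground-truth area equals EP.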

The performance of the three algorithms is shown in Table 4. The proposed algorithm with four optimal Gaussian kernels achieves the highest PA score, which means that it recovers the greatest portion of the surveillance area, and it is therefore preferred for applications.


**Table 4.** Experimental results of different algorithms.
