**2. YOLO V4 Locates the Pantograph Region**

You Only Look Once (YOLO) V4, proposed by Alexey Bochkovskiy et al., is a major upgrade to the one-stage detector in the field of object detection [19]. Compared with the previous version of YOLO, YOLO V4 replaces the backbone network, changing the original Darknet53 of YOLO V3 to CSPDarknet53, which effectively reduces the amount of computation and improves the learning ability. Meanwhile, YOLO V4 introduces spatial pyramid pooling (SPP), which splices feature maps at different scales and increases the receptive field of the model, enabling YOLO V4 to extract more detail.

Average Precision (AP) and Mean Average Precision (mAP) are important metrics for measuring the performance of object detection algorithms; AP-50 and AP-75 are the AP values when the Intersection over Union (IoU) threshold is set to 0.5 and 0.75, respectively. The performance of YOLO V4 and current mainstream object detection algorithms on two datasets, Visual Object Classes (VOC) and Common Objects in Context (COCO), is shown in Figure 3.

Figure 3 shows that YOLO V4 has clear advantages in all aspects. Bochkovskiy et al. pointed out that YOLO V4 was the most advanced detector of its time, and it still retains great advantages in performance [19]. Therefore, YOLO V4 is used to locate the pantograph region in this study, and the located pantograph region is passed to the subsequent algorithms. The overall algorithm flow for locating the pantograph region using YOLO V4 is shown in Figure 4.


**Figure 3.** Comparison of YOLO V4 with other mainstream neural networks [20–32]. (**a**) Test results on VOC2007 + VOC2012. (**b**) Test results on the COCO dataset.

**Figure 4.** YOLO V4 overall algorithm process.

#### **3. HSC Blur and Dirt Detection Algorithm**

#### *3.1. Blurry HSC Screen and Dirty HSC Screen*

During the operation of HSR, the HSC is always exposed to the outside of the car, which makes it extremely vulnerable to external interference. This interference falls mainly into two kinds: the influence of rain on pantograph imaging during rainy days, and the influence of dirt attached to the HSC lens.

#### 3.1.1. Rainwater

HSR operation must cope with very complicated weather conditions; in rainy days in particular, rainwater directly affects the imaging of the HSC. Figure 5 illustrates the different degrees of impact of rain on HSR-A and HSR-B. When HSR runs at high speed, rainwater tends to blur the HSC imaging, making the captured pantograph unclear and thus causing YOLO V4 to assess the pantograph incorrectly.

**Figure 5.** Blurred HSC imaging caused by rainwater. (**a**) HSR-A. (**b**) HSR-B.

#### 3.1.2. Dirt

Lens dirt attached to the HSC can generally only be removed by manual cleaning. As shown in Figure 6, from the moment the lens becomes dirty until the dirt is manually cleaned, the dirty lens continuously affects the overall evaluation of the pantograph by YOLO V4.

**Figure 6.** The HSC lens has a lot of dirt attached to it. (**a**) HSR-A. (**b**) HSR-B.

#### *3.2. External Factors Cause YOLO V4 to Fail to Locate the Pantograph*

When YOLO V4 cannot locate the pantograph due to external interference, the approximate position of the pantograph in the current frame can be inferred from the pantograph position determined in the previous normal frame. When YOLO V4 locates the pantograph region, it only needs four parameters of the bounding box in Figure 2 to achieve accurate positioning: the horizontal coordinate (*x*<sub>left</sub>) and vertical coordinate (*y*<sub>top</sub>) of the upper-left corner point (*P*<sub>top-left</sub>) of the bounding box, and the width and height of the pantograph. The variation of these four parameters of the bounding box positioned by YOLO V4 during normal operation of two different HSR models is shown in Figure 7.

**Figure 7.** Changes of the four parameters of the bounding box when YOLO V4 is positioned normally without external interference.

As can be seen from Figure 7, for both HSR-A and HSR-B, when normal operation is not disturbed by external scenes, the pantograph region positioned by YOLO V4 remains relatively fixed, apart from a small range of jitter. This small-scale jitter is caused by a combination of factors such as bumps during HSR operation and force changes between the pantograph and the catenary. Since the jitter does not affect the approximate position of the pantograph in the image, when YOLO V4 is unable to locate the pantograph region due to external interference, the approximate position of the pantograph in the current frame can be inferred from the coordinate information obtained from the previous frame, and subsequent analysis can proceed.
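The fallback described above can be sketched as a small helper that caches the last bounding box confirmed in a normal frame; the class and its interface are illustrative, not the authors' implementation.

```python
class PantographTracker:
    """Falls back to the last confirmed bounding box when detection fails.

    A bounding box is (x_left, y_top, width, height) -- the four parameters
    YOLO V4 returns for the pantograph region.
    """

    def __init__(self):
        self.last_box = None

    def update(self, detected_box):
        if detected_box is not None:
            # Normal frame: remember this position for future fallback.
            self.last_box = detected_box
            return detected_box
        # Interference: reuse the position from the previous normal frame.
        return self.last_box
```

For example, after a frame yielding `(10, 20, 100, 50)`, a subsequent failed detection (`update(None)`) still returns `(10, 20, 100, 50)` so that the blur and dirt checks can run on that region.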

#### *3.3. Improved Image Sharpness Evaluation Algorithm*

The Brenner algorithm is a classical blur detection algorithm [33], which evaluates image sharpness by accumulating the squared grayscale difference between pixel pairs two pixels apart. Since the gray values of an in-focus image change significantly compared with a defocused image, and the in-focus image contains more edge information, this method can judge image sharpness fairly accurately. However, the traditional Brenner algorithm cannot cope with the complex scene changes and variable external disturbances faced during high-speed rail operation, so this paper proposes the emphasize-object-region Brenner (EOR-Brenner) algorithm, combined with the pantograph region localized by YOLO V4. The principle of EOR-Brenner is shown in Equation (1).

$$\begin{aligned} F &= k\_1 F\_{IMG} + k\_2 F\_{ROI} \\ &= k\_1 \sum\_{x=0}^{\text{img.cols}-3} \sum\_{y=0}^{\text{img.rows}-1} [f(x+2, y) - f(x, y)]^2 \\ &\quad + k\_2 \sum\_{x=x\_{left}}^{x\_{left}+\text{width}-3} \sum\_{y=y\_{top}}^{y\_{top}+\text{height}-1} [f(x+2, y) - f(x, y)]^2 \end{aligned} \tag{1}$$

where *x* is the horizontal coordinate of a pixel, *y* is the vertical coordinate of a pixel, *f*(*x*, *y*) is the gray value of the pixel, *F*<sub>IMG</sub> and *F*<sub>ROI</sub> are the sharpness results of the corresponding regions, *k*<sub>1</sub> and *k*<sub>2</sub> are the weights of the corresponding regions, and *F* is the final result of the improved Brenner algorithm.

Although the ROI occupies a relatively small area of the whole image, the pantograph, as the key research object, means the region where it is located should be given a higher weight. In this study, we recommend that *k*<sub>2</sub> be two to four times *k*<sub>1</sub>, with the specific choice made flexibly according to the actual operating line of the HSR. After the values of *k*<sub>1</sub> and *k*<sub>2</sub> are determined, an appropriate threshold (*λ*) is selected based on the calculated EOR-Brenner score to distinguish clear images from blurred ones.
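Equation (1) can be illustrated with a short NumPy sketch; the function name and the default weight values are placeholders, not the authors' implementation.

```python
import numpy as np


def eor_brenner(gray, roi, k1=1.0, k2=2.0):
    """EOR-Brenner sharpness score: F = k1 * F_IMG + k2 * F_ROI.

    gray : 2-D grayscale array indexed [y, x].
    roi  : (x_left, y_top, width, height), the pantograph bounding box
           obtained from YOLO V4.
    k1, k2 : region weights (placeholder values here).
    """
    g = gray.astype(np.float64)

    def brenner(img):
        # Squared gray-level difference two pixels apart along x:
        # [f(x+2, y) - f(x, y)]^2, summed over the region.
        d = img[:, 2:] - img[:, :-2]
        return float(np.sum(d * d))

    x, y, w, h = roi
    return k1 * brenner(g) + k2 * brenner(g[y:y + h, x:x + w])
```

A uniform (fully blurred) patch scores 0, while any textured, in-focus patch scores strictly higher, which is what the threshold *λ* exploits.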

As shown in Equation (2), when the final EOR-Brenner result *F* is higher than the set threshold (*λ*), the image captured by the current HSC is considered clear. If the pantograph cannot be detected, or is detected as abnormal, it can be assumed that the current detection result is not affected by blurring of the HSC screen. Two situations remain: (1) the pantograph is actually in a normal state, but other external interference, such as a complex background, prevents the normal pantograph from being detected or causes it to be incorrectly detected as abnormal; (2) the pantograph is genuinely abnormal. In either case, the real state of the pantograph must be further evaluated by the subsequent algorithm to finally achieve accurate detection of its true state.

$$\begin{cases} \text{Clear image}, & F \geq \lambda \\ \text{Blurred image}, & F < \lambda \end{cases} \tag{2}$$

#### *3.4. Blob Detection Algorithm Detects Screen Dirt*

When dirt attaches to an HSC, it readily forms blobs. Blobs caused by dirt differ in area, convexity, circularity and inertia ratio, so these attributes can be used to detect and filter the blobs [34–37], and the number of blobs ultimately determines whether the HSC is dirty.

The area of the blob (*S*) reflects the size of the detected blob, while the circularity, derived from the area (*S*) and the corresponding perimeter (*C*), reflects how close the detected blob is to a circle. The calculation of circularity is shown in Equation (3):

$$Value\_{circularity} = \frac{4\pi S}{C^2} \tag{3}$$

The convexity reflects the degree of concavity of the blob. The convexity of the blob can be obtained from the area of the blob (*S*) and the area of the convex hull (*H*) of the blob, which is calculated as shown in Equation (4):

$$Value\_{convexity} = \frac{S}{H} \tag{4}$$
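Equations (3) and (4) translate directly into code; these helper names are illustrative, and in practice *S*, *C* and *H* would come from contour analysis of the detected blobs.

```python
import math


def circularity(area, perimeter):
    # Equation (3): 4*pi*S / C^2 -- equals 1 for a perfect circle,
    # and decreases as the blob's outline deviates from circular.
    return 4.0 * math.pi * area / (perimeter ** 2)


def convexity(area, hull_area):
    # Equation (4): S / H -- equals 1 when the blob fills its convex hull,
    # and decreases as the blob becomes more concave.
    return area / hull_area
```

For instance, a circle of radius *r* (area π*r*², perimeter 2π*r*) gives circularity exactly 1, while a square of side 4 (area 16, perimeter 16) gives π/4 ≈ 0.785.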

The inertia ratio also reflects the shape of the blob. If an image is represented by *f*(*x*, *y*), then the moments of the image can be expressed by Equation (5):

$$M\_{ij} = \sum\_{x} \sum\_{y} x^i y^j f(x, y) \tag{5}$$

For a binary image, the zero-order moment *M*<sup>00</sup> is equal to its area, so its center of mass is as shown in Equation (6):

$$\{\bar{x},\bar{y}\} = \left\{\frac{M\_{10}}{M\_{00}}, \frac{M\_{01}}{M\_{00}}\right\} \tag{6}$$

The central moment of the image is defined as shown in Equation (7):

$$\mu\_{pq} = \sum\_{x} \sum\_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y) \tag{7}$$

If only second-order central moments are considered, the image is exactly equivalent to an ellipse with a defined size, orientation and eccentricity, centered at the image center of mass and with constant radiality. The covariance moments of the image are shown in Equation (8):

$$\text{cov}[f(x, y)] = \begin{bmatrix} \mu\_{20}' & \mu\_{11}' \\ \mu\_{11}' & \mu\_{02}' \end{bmatrix} = \begin{bmatrix} \frac{\mu\_{20}}{\mu\_{00}} & \frac{\mu\_{11}}{\mu\_{00}} \\ \frac{\mu\_{11}}{\mu\_{00}} & \frac{\mu\_{02}}{\mu\_{00}} \end{bmatrix} \tag{8}$$

The two eigenvalues *λ*<sub>1</sub> and *λ*<sub>2</sub> of this matrix correspond to the long and short axes of the image intensity ellipse. *λ*<sub>1</sub> and *λ*<sub>2</sub> can be expressed by Equation (9):

$$\begin{aligned} \lambda\_1 &= \frac{\mu\_{20}' + \mu\_{02}'}{2} + \frac{\sqrt{4\mu\_{11}'^2 + \left(\mu\_{20}' - \mu\_{02}'\right)^2}}{2} \\ \lambda\_2 &= \frac{\mu\_{20}' + \mu\_{02}'}{2} - \frac{\sqrt{4\mu\_{11}'^2 + \left(\mu\_{20}' - \mu\_{02}'\right)^2}}{2} \end{aligned} \tag{9}$$

The final inertia ratio is obtained as shown in Equation (10):

$$\begin{split} Value\_{\text{inertia}} &= \frac{\lambda\_2}{\lambda\_1} = \frac{\mu\_{20}' + \mu\_{02}' - \sqrt{4\mu\_{11}'^2 + \left(\mu\_{20}' - \mu\_{02}'\right)^2}}{\mu\_{20}' + \mu\_{02}' + \sqrt{4\mu\_{11}'^2 + \left(\mu\_{20}' - \mu\_{02}'\right)^2}} \\ &= \frac{\mu\_{20} + \mu\_{02} - \sqrt{4\mu\_{11}^2 + \left(\mu\_{20} - \mu\_{02}\right)^2}}{\mu\_{20} + \mu\_{02} + \sqrt{4\mu\_{11}^2 + \left(\mu\_{20} - \mu\_{02}\right)^2}} \end{split} \tag{10}$$
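As a concrete instance of Equations (5)–(10), the inertia ratio of a binary blob can be computed directly from its unnormalized central moments, since the common factor *μ*<sub>00</sub> cancels in Equation (10); the function below is an illustrative sketch.

```python
import numpy as np


def inertia_ratio(binary):
    """Inertia ratio lambda_2 / lambda_1 of a binary blob image,
    computed from second-order central moments (Equations (5)-(10)).
    The normalization by mu_00 cancels, so raw central moments suffice.
    """
    ys, xs = np.nonzero(binary)        # pixel coordinates of the blob
    xbar, ybar = xs.mean(), ys.mean()  # center of mass, Equation (6)
    # Second-order central moments, Equation (7) with f(x, y) in {0, 1}.
    mu20 = float(np.sum((xs - xbar) ** 2))
    mu02 = float(np.sum((ys - ybar) ** 2))
    mu11 = float(np.sum((xs - xbar) * (ys - ybar)))
    # Equation (10): ratio of the two eigenvalues of the covariance matrix.
    root = np.sqrt(4.0 * mu11 ** 2 + (mu20 - mu02) ** 2)
    return (mu20 + mu02 - root) / (mu20 + mu02 + root)
```

A filled square (isotropic blob) yields a ratio of 1, while a one-pixel-wide strip yields 0, so the ratio discriminates round dirt spots from elongated artifacts.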

The final count of blobs is obtained by filtering on the area, convexity, circularity and inertia ratio of the blobs; when the number of detected blobs is greater than the set threshold, it can be inferred that dirt is attached to the HSC surface, thereby achieving detection of a dirty HSC. For the cases shown in Figure 6, the final detection results are shown in Figure 8.

**Figure 8.** The HSC Screen dirty detection results. (**a**) HSR-A. (**b**) HSR-B.
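The filtering-and-counting step can be sketched as follows; OpenCV's `SimpleBlobDetector` offers the same four filters, but here a pure-Python version is shown, and every threshold value is a placeholder to be tuned per line, not a value from this study.

```python
def is_screen_dirty(blobs, n_threshold=5,
                    area_range=(30.0, 5000.0),
                    min_circularity=0.5,
                    min_convexity=0.6,
                    min_inertia=0.1):
    """Counts blobs whose area, circularity, convexity and inertia ratio
    fall within the accepted ranges, and flags the HSC as dirty when the
    count exceeds n_threshold.

    Each blob is a dict with keys 'area', 'circularity', 'convexity'
    and 'inertia' (all thresholds here are illustrative placeholders).
    """
    kept = [b for b in blobs
            if area_range[0] <= b['area'] <= area_range[1]
            and b['circularity'] >= min_circularity
            and b['convexity'] >= min_convexity
            and b['inertia'] >= min_inertia]
    return len(kept) > n_threshold
```

Filtering before counting matters: tiny noise specks or elongated streaks (e.g., scratches) are rejected by the area and inertia filters, so only dirt-like blobs contribute to the count.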

#### *3.5. Overall Process of HSC Blur and Dirt Detection Algorithm*

As shown in Figure 9, the number of blobs in the current frame is first detected by the blob detection algorithm. When this number is greater than the set threshold, it is determined that YOLO V4 failed to locate the pantograph in the current frame because of dirt; if the number of detected blobs is below the threshold, EOR-Brenner is used to evaluate whether the current frame is blurred. In this way, the algorithm correctly determines whether an abnormal or undetectable pantograph in the current frame is caused by a dirty or blurred HSC.

**Figure 9.** HSC blur and dirt detection algorithm process flow chart.
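The decision flow of Figure 9 reduces to a small dispatcher; the function below is a schematic sketch with hypothetical inputs (a precomputed blob count and EOR-Brenner score), not the authors' implementation.

```python
def diagnose_frame(n_blobs, sharpness, blob_threshold, sharp_threshold):
    """Decision flow run when YOLO V4 fails to locate the pantograph:
    check for dirt first, then for blur; otherwise the failure is
    attributed to another cause and passed to the subsequent algorithm.

    n_blobs   : blob count from the blob detection step.
    sharpness : EOR-Brenner score F of the current frame.
    """
    if n_blobs > blob_threshold:
        return "dirty"        # dirt on the HSC lens explains the failure
    if sharpness < sharp_threshold:
        return "blurred"      # rain-induced blur explains the failure
    return "clear"            # screen is fine; evaluate pantograph further
```

Checking dirt before blur mirrors Figure 9: a heavily soiled lens can also depress the sharpness score, so testing blob count first avoids misattributing dirt to rain blur.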
