Article

CPDet: Circle-Permutation-Aware Object Detection for Heat Exchanger Cleaning

1 School of Future Technology, Shanghai University, Shanghai 200444, China
2 Research Institute of USV Engineering, School of Mechatronic Engineering and Automation, Shanghai University, Shanghai 200444, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 9115; https://doi.org/10.3390/app14199115
Submission received: 22 August 2024 / Revised: 26 September 2024 / Accepted: 30 September 2024 / Published: 9 October 2024

Abstract:
Shell–tube heat exchangers are common equipment in large-scale industrial wastewater heat-exchange systems, reclaiming the thermal energy generated during industrial processes. However, the internal surfaces of the heat exchanger tubes often accumulate fouling, which reduces their heat transfer efficiency, so regular cleaning is essential. We aim to detect circle holes on the end surface of the heat exchange tubes to enable automated tube positioning and cleaning. Notably, these holes exhibit a regular distribution. To this end, we propose a circle-permutation-aware object detector for heat exchanger cleaning that sufficiently exploits prior information in the original inputs. Specifically, the interval prior extraction module extracts interval information among circle holes based on prior statistics, yielding a prior interval context. The following interval prior fusion module slices original images into circle domain and background domain maps according to the prior interval context. For the circle domain map, prior-guided sparse attention uses the prior circle–hole diameter as the step size to divide the map into patches and performs patch-wise self-attention. The background domain map is multiplied by a hyperparametric weakening coefficient matrix. In this way, our method fully leverages prior information to selectively weight the original inputs and achieve more effective hole detection. In addition, to match the hole shape, we adopt a circle representation instead of the rectangular one. Extensive experiments demonstrate that our method achieves state-of-the-art performance and significantly boosts the YOLOv8 baseline by 5.24% mAP50 and 5.25% mAP50:95.

1. Introduction

Heat exchangers, known for their necessity in industry, are widely used in almost all large industrial scenarios. Among various heat exchangers, the shell–tube type has reached the largest market share at 49.8% [1]. As a long-standing problem, fouling affects heat exchange efficiency and can even disrupt the entire circulation system, resulting in huge energy waste and economic losses. Therefore, regular cleaning of heat exchangers is crucial. Current exchanger cleaning is mainly done manually, hole by hole, which is inefficient and time-consuming for large-scale industrial scenarios. Automated cleaning has become an industry trend, and the core of automated cleaning is object detection. Current object detection methods are mainly divided into two-stage and one-stage detection. Two-stage methods [2,3] involve two separate regressions to generate the final output, resulting in low inference speed. One-stage methods [4,5] employ a single regression to produce results. This simple and fast pipeline is more suitable for deployment on various hardware platforms. Therefore, one-stage object detection methods are commonly used in industrial detection scenarios, such as weld spot location in industrial welding [6,7] and the localization of leak positions [8,9]. However, when applying one-stage methods to specific industrial scenarios, the lack of model tuning and customization for these special applications can lead to poor detection performance. For example, in defect detection, the contrast between the defect area and the background is low, and the defect scale varies greatly, resulting in poor localization and classification results. To address this issue, MD-YOLO [10] adopts extra data augmentation and multi-channel fusion modules, significantly enhancing the feature extraction capabilities. Similarly, the variation in object scales captured by drones at different altitudes leads to inaccurate localization. Tph-yolov5 [11] solves this problem by adding custom prediction heads and integrating attention modules. Therefore, enhancing detector adaptability in specific scenarios is critical.
Our analysis reveals exploitable prior information about the holes: as shown in Figure 1, the circle holes are regularly arranged along the horizontal and vertical directions, and the intervals between these regularly distributed holes remain consistent. General methods often mistake reflective spots in the background for circle holes, and some edge circle holes have weakened features due to aging and corrosion. However, the arrangement and distribution of these circle holes are strongly regular. We can apply this prior information to weaken background features and strengthen foreground features, thereby reducing missed and false detections. Based on this idea, this work designed an object detector for detecting the holes of shell–tube heat exchangers, the Circle-Permutation-Aware Detector (CPDet), which uses a one-stage detector as the backbone, innovatively extracts the prior information of objects, and uses this prior information to preliminarily distinguish foreground from background areas; a weighted approach then enhances foreground features and weakens background features to achieve guided feature extraction. Specifically, CPDet contains three novel modules: interval prior extraction (IPE), interval prior fusion (IPF), and prior-guided sparse attention (PSA). IPE extracts interval information among holes and normalizes it to produce the prior interval information used by the subsequent IPF module. IPF applies the prior interval information with a slicing mechanism to divide the original image into a circle domain map and a background domain map. For the circle domain map, PSA leverages prior information on the diameter of the circle holes to divide the map into multiple circle patches, enhancing the circle-wise focus of the network. The background domain map is multiplied by a hyperparametric weakening coefficient matrix (WCM) to reduce its impact on the network. Finally, the circle domain and background domain maps undergo post-processing to restore their dimensions and are then fed into the YOLO backbone for further processing. To achieve high localization accuracy, CPDet replaces the traditional rectangular representation with a circle representation, which is particularly suitable for detecting circle holes. The architecture of CPDet is illustrated in Figure 2. In summary, the contributions of CPDet can be summarized as follows:
  • CPDet is the first detector using permutation prior information, and it visually represents the detection results with circles for circle–hole detection in the heat exchanger industry.
  • IPE, IPF, and PSA are proposed to extract and fuse the permutation priors of the holes, integrate them into the feature extraction process, and enhance the feature interaction among holes, thereby achieving precise detection in the exchanger cleaning industry.
  • CPDet achieves state-of-the-art performance in heat exchanger circle–hole detection scenarios and significantly boosts the baseline detector by 3.29% mAP50 and 3.31% mAP50:95.
This paper is organized as follows: Section 1 is the background and introduction. Section 2 introduces the mainstream object detection methods and their applications in industrial scenarios. Section 3 introduces our proposed method for extracting and fusing the prior information of circular hole targets in the heat exchanger hole detection scenario, as well as the specific approach to adding a sparse attention module. Section 4 describes the experimental procedure and the analysis and discussion of the results. Finally, Section 5 concludes the paper.

2. Related Work

2.1. Two-Stage Detector

Two-stage object detection represents an early and highly significant application of deep learning in the field of object detection. The advent of R-CNN [12] marked a deep learning milestone in object detection, featuring a two-stage process in which a selective search generates candidate regions, followed by CNN-based classification and bounding box refinement. Fast R-CNN [13] notably accelerates the feature extraction process by sharing computations, while Faster R-CNN [2] further introduces a Region Proposal Network (RPN), enabling end-to-end learning for candidate region generation and greatly enhancing detection efficiency. For detecting objects of different scales, the Feature Pyramid Network (FPN) [14] constructs multi-level feature maps to achieve efficient detection across scales. Moreover, PANet [15] strengthens the utilization of contextual information through feature path aggregation, further enhancing detection performance. Despite achieving high accuracy, the R-CNN series detectors suffer from low inference speed and a heavy computational burden. Because two-stage methods almost always contain an RPN [2] to produce proposals, their detection speed is low, making two-stage object detection unsuitable for real-time detection scenarios. As a result, one-stage object detection methods have sprung up.

2.2. One-Stage Detector

One-stage detection methods prioritize improvements in inference speed. YOLO [16] pioneered one-stage detection by treating the object detection task as a single regression problem, greatly improving detection speed. SSD [5] innovatively performs detection on feature maps at multiple levels, leveraging features of different scales to address objects of varying sizes. YOLOv2 [17], inspired by SSD, proposed multi-scale training, which dynamically alters the input image size so the network can adapt to objects of varying scales, and introduced batch normalization in every network layer to simplify subsequent computations. YOLOv3 [18] and YOLOv4 [19] introduce Darknet-53, a deeper CNN than YOLOv2's backbone with enhanced feature extraction capabilities, while also incorporating the popular data augmentation techniques of the time to increase training-set diversity and improve the generalization of the detector without sacrificing speed. YOLOv5 [20] proposes an autoanchor module to adjust anchor boxes if they are ill-fitted to the dataset and training settings. Moreover, it further improves detection speed by converting the intermediate hidden-layer activation function from Mish to LeakyReLU, reducing computational complexity and improving training and inference speed. These improvements significantly enhance the generalization capability of detectors across datasets of various scales and types, facilitating their widespread adoption in application domains while maintaining high detection speeds. However, in industrial environments, lighting conditions vary widely from strong light to shadow, and sensitivity to lighting changes may affect the detection performance of the above one-stage methods, especially without targeted optimization. Our method further optimizes the one-stage object detector for specific industrial scenarios, greatly improving detection performance without affecting detection speed.

2.3. Specific Industrial Scenarios Detector

Current deep learning-based object detection methods are widely applied across various fields [21,22], and the application of object detection technology to industries such as manufacturing, medicine, and artificial materials has further propelled the development of these fields. It is worth noting that the ease of deployment and extendability of YOLOv5 [20] facilitate its wide application in industrial scenarios. To adapt models to the detection requirements of specific industrial scenarios, refs. [23,24] introduce dedicated modules to meet industrial detection needs. Ref. [25] introduced the transformer mechanism, using the Swin Transformer for feature extraction to capture comprehensive image features, thereby enhancing the detection capability for incomplete objects. Ref. [26] used an improved Canny–Devernay sub-pixel edge detection algorithm in combination with the YOLOv6 deep learning network for rapid and accurate detection of surface defects on metal workpieces. Based on YOLOv7, ref. [27] introduced the GhostModule to enhance real-time performance, while SE blocks and the EIoU loss function were incorporated to highlight important information in the image and further accelerate the model's convergence, with the aim of detecting whether on-site construction workers are wearing safety helmets. Ref. [28] added small-object detection layers and streamlined large-object detection layers to enhance the effectiveness of data manipulation while integrating the EMA mechanism into the C2f module to strengthen feature perception capabilities; this approach better achieved helmet-wearing detection. Tph-yolov5 [11] addresses small-object omission by adding a prediction head for handling large scale variance and integrating Transformer Prediction Heads (TPHs) into YOLOv5. UTD-Yolov5 [29] improves object detection accuracy by replacing the original backbone with a two-stage cascaded CSP (CSP2) and introducing the visual channel attention mechanism module SE [30] to solve feature extraction issues in blurry images. Light-YOLOv5 [31] enhances the interaction between the backbone network and global information, as well as the extraction of flame and smoke features, by incorporating separable transformation convolution modules. In addition, the algorithm utilizes the Mish activation function and SIoU loss to improve convergence speed and accuracy. Swin-Transformer-YOLOv5 [32] proposes an integrated grape detection model that combines Swin Transformer and YOLOv5 for real-time object detection of grape clusters. MSFT-YOLO [33] achieves the organic integration of features and global information by incorporating Transformer-based modules in the backbone and detection head. It also improves the detector's ability to dynamically adjust to objects of different scales by combining multi-scale feature fusion structures. AFF-YOLO [34] integrates an effective channel attention network (ECA-Net) in the neck of the network to enhance attention and filtering. It focuses on relevant steel surface defects and introduces bidirectional information flow through a cascaded BiFPN [35]. The Adaptive Spatial Feature Fusion (ASFF) module has been added to the prediction head to enhance feature fusion at different scales, allowing the model to better learn and recognize steel defects.
The above methods prove that the performance of the detector can be enhanced by analyzing the specific demands of industrial scenarios and the characteristics of the dataset. Inspired by these methods, this study analyzed data regarding the heat exchanger and designed corresponding modules to enhance the detection performance of the base model for circle detection.

2.4. Circle Object Detection

The bounding box representation of objects is the mainstream; however, in some medical cell detection scenarios, as well as in the detection of certain specific shapes, a circular representation is more suitable than a rectangular one. Ref. [36] integrated the YOLOv3 and Mask R-CNN methods and collected a large dataset of crop circles based on Google Maps, improving the model to target the characteristics of crop circles. Ref. [37] proposed a circular representation approach specifically designed for detecting circular cells in medical imagery and contrasted it with conventional box representation; the circular representation significantly reduced the number of degrees of freedom while enhancing rotational invariance, offering a more robust and efficient means of identifying circular structures in complex biomedical contexts. Ref. [38] proposed a novel network that uses a circular bounding box to accurately classify and detect the degree of chrysanthemum blooming, showing better results than a network with a traditional rectangular bounding box; moreover, it can be applied to recognize general objects in circular form. These approaches illustrate that, in specific scenarios, improvements to the detection representation can significantly enhance the overall performance of the detection method. Similarly, in our application scenarios, modifying the object detection representation from rectangular boxes to circular ones can increase detection visibility to better satisfy the specified detection requirements.
In summary, we provide a summary of current object detection and its application in the industrial field, as shown in Table 1.

3. Materials and Methods

3.1. Heat Exchanger Dataset

This work takes end-face images of the heat exchanger under different lighting scenarios at the industrial site and proposes a novel heat exchanger dataset (HE dataset) that lays the groundwork for heat exchanger cleaning. A total of 662 heat exchanger end-face images were collected, mainly taken by a three-dimensional matrix camera with the exposure value as the variable; each image has a size of 906 × 906 pixels. To expand the dataset and prevent overfitting, data augmentation methods such as adjusting brightness, color jittering, and changing image transparency were employed. Before model training, Hough circle detection was first used for preannotation; the preliminary detection results were saved in an annotation file, and we manually corrected the preannotation results. After annotation, the original dataset was divided into training, validation, and test sets at a ratio of 8:1:1 using random sampling. The training and validation sets were used for model training and evaluation during a single run, and the test set was used to evaluate the detection performance of the final model. Some examples from the original dataset are shown in Figure 3. After data augmentation, a total of 2647 images were obtained as the final training set.

3.2. The Interval Prior Extract Module

Based on the dataset analysis in Figure 1, the circle objects exhibit a regular arrangement pattern along the vertical and horizontal directions: the horizontal interval is maintained at 25 to 35 pixels and the vertical interval at 8 to 12 pixels, which means the circles can be extracted row by row and column by column, with an interval prior between the circles in each row and column. Specifically, we randomly selected $k$ annotation files from the heat exchanger dataset; each annotation file contains the coordinates of the circle center point $(x, y)$ and the width and height $(w, h)$ of the bounding box. We extracted the interval information of circles within the same row or column by checking whether the difference in the $y$ or $x$ coordinates between the center points of two circles was within the range $r$, as shown in Figure 4. $d_h = |x_{i+1} - x_i|$ represents the vertical deviation between two adjacent circles in the same column, while $d_w = |y_{i+1} - y_i|$ signifies the horizontal deviation between two adjacent circles in the same row. We determined whether circles belonged to the same row by examining whether both $d_w$ and $d_h$ fell within the radius $r$, and $w_{inte}^i$ and $h_{inte}^i$ were used to represent the horizontal and vertical intervals, respectively. Subsequently, we sorted the central coordinates of circles in the same row or column by $y$ or $x$ and calculated the horizontal intervals $w_{inte}^i$ and vertical intervals $h_{inte}^i$ between each pair of circles within each row and column:
$w_{inte}^{i} = x_{i+1} - x_{i} - (r_{i+1} + r_{i}), \quad h_{inte}^{i} = y_{i+1} - y_{i} - (r_{i+1} + r_{i})$, (1)
where $i = 0, 1, 2, \ldots$, and $r_i = \frac{1}{2}\min(w_i, h_i)$. $w_{inte}^i$ and $h_{inte}^i$ denote a single interval along the horizontal and vertical, respectively, so this process was repeated to compute the entire interval set of one annotation file, written as $w^k = [w_{inte}^1, w_{inte}^2, w_{inte}^3, \ldots, w_{inte}^n]$ and $h^k = [h_{inte}^1, h_{inte}^2, h_{inte}^3, \ldots, h_{inte}^n]$; we then further calculated the entire intervals of the $k$ annotation files, as shown in Equation (2). Furthermore, to ensure the robustness of the interval priors, the interval sets $w^k$ and $h^k$ were standardized using mathematical–statistical methods to obtain the final interval priors $\Delta w$ and $\Delta h$.
$\Delta w = F(w^{index}), \quad \Delta h = F(h^{index})$, (2)
where $index = 1, 2, \ldots, k$ indicates the index of the randomly selected $k$ annotation files, and $F(\cdot)$ is an interval standardization function. Extensive experiments show that the trimmed mean (Equation (3)) yields the most accurate statistical prior information, effectively controlling the impact of outliers while maintaining the representativeness of the data.
$F(x) = \frac{1}{n - 2\lceil pn \rceil} \sum_{i = \lceil pn \rceil + 1}^{n - \lceil pn \rceil} x_i$, (3)
where $p$ represents the proportion of data to be trimmed, typically ranging within $0 < p < 0.5$; $n$ denotes the size of the interval set $x_i = w^k_i$ or $x_i = h^k_i$, which is the interval set along the horizontal or vertical sorted in ascending order; and $\lceil pn \rceil$ signifies the result of rounding up the product of the interval set size $n$ and the trimming proportion $p$, which determines the number of intervals to be removed from each end of the set. Figure 5 shows the obtained circle domain and background domain. Further, this work also experimented with other classical mathematical–statistical approaches; the detection effects of the different statistical methods are shown in Section 4.
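To make the computation concrete, the following is a minimal Python sketch of the IPE statistics under our reading of Equations (1)–(3); the (x, y, w, h) annotation format, the row-grouping heuristic, and all function names are our own assumptions, not the authors' released code.

```python
import math

def trimmed_mean(values, p=0.1):
    # F(x) in Equation (3): drop the lowest and highest ceil(p*n)
    # intervals and average the remainder.
    values = sorted(values)
    n = len(values)
    cut = math.ceil(p * n)
    kept = values[cut:n - cut] if n > 2 * cut else values
    return sum(kept) / len(kept)

def horizontal_interval_prior(boxes, p=0.1):
    # boxes: (x, y, w, h) circle annotations from the k sampled files.
    # Adjacent centers whose y-coordinates differ by less than the
    # radius r are treated as the same row (cf. Figure 4), and the gap
    # between their rims gives one interval w_inte (Equation (1)).
    boxes = sorted(boxes, key=lambda b: (b[1], b[0]))
    w_inte = []
    for (x1, y1, w1, h1), (x2, y2, w2, h2) in zip(boxes, boxes[1:]):
        r1, r2 = min(w1, h1) / 2, min(w2, h2) / 2
        if abs(y2 - y1) < r1:                  # same row
            w_inte.append(x2 - x1 - (r1 + r2))
    return round(trimmed_mean(w_inte, p))      # Δw; Δh is analogous
```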

3.3. The Interval Prior Fusion Module

After acquiring the prior interval of circles, this information was integrated into the feature extraction process shown in Figure 6. In the study of extracting image features, ref. [39] applied a slicing technique where input features were segmented and reassembled, achieving downsampling without discarding information. Drawing inspiration from [39], a special slicing operation was conducted; it is similar to random cropping, but instead of cropping the image randomly, our method utilized the extracted prior information as the step size to guide the slicing. The input image was sliced into two feature maps: a circle domain map ($circle_{domain}$), which contains a large amount of circle information, and a background domain map ($background_{domain}$), which contains mostly background information; the specific slicing formula is given in Equation (4). Along the vertical, the slicing starts from 0 with a stride of $\Delta h$ pixels for the circle domain, and from $\Delta h$ with the same stride for the background domain; along the horizontal, it starts from 0 and $\Delta w$, respectively, with a stride of $\Delta w$ pixels. Finally, all selected pixels were concatenated to form the two sub-feature maps $circle_{domain}$ and $background_{domain}$. By emphasizing the features containing abundant circle information and reducing the focus on parts that are almost entirely background, this work aimed to enhance valid information and suppress irrelevant information.
To enhance the focus on important information, the extracted circle domain map was further input into a prior-guided sparse attention module (PSA), which uses the prior circle–hole diameter as the step size to divide the circle domain map into patches, performs patch-wise self-attention instead of pixel-wise attention, and further calculates the similarity between different positions and channels in the circle domain map to determine their importance in subsequent feature extraction, thereby enabling the network to better focus on local information. To reduce the focus on background information, the extracted background domain map was multiplied by a weakening coefficient matrix (WCM), a matrix whose internal elements are predefined as hyperparameters and whose dimensions match those of the background domain map. Finally, padding was used to restore the original input size in the spatial dimension, the two feature maps were concatenated along the channel dimension, and a 1 × 1 convolution was used to restore the original channel count, producing an output with the same dimensions as the original input feature.
$circle_{domain} = X[0:H:\Delta h,\ 0:W:\Delta w], \quad background_{domain} = X[\Delta h:H:\Delta h,\ \Delta w:W:\Delta w]$, (4)
where $X$, $H$, and $W$ represent the original image and its height and width, respectively, and $\Delta h$ and $\Delta w$ represent the prior intervals in the vertical and horizontal directions, respectively.
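For illustration, below is a minimal PyTorch sketch of the IPF step under our reading of Equation (4); the module name, the zero-padding choice, and the single 1 × 1 convolution are our assumptions, and the PSA branch on the circle domain is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntervalPriorFusion(nn.Module):
    """Sketch of the IPF slicing and weakening steps (Equation (4))."""

    def __init__(self, channels, dh, dw, wcm_weight=0.6):
        super().__init__()
        self.dh, self.dw = dh, dw        # interval priors Δh and Δw
        self.wcm_weight = wcm_weight     # weakening coefficient (best: 0.6)
        # 1x1 conv restores the original channel count after concatenation.
        self.restore = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x):                # x: (B, C, H, W)
        B, C, H, W = x.shape
        circle = x[:, :, 0::self.dh, 0::self.dw]              # circle domain
        background = x[:, :, self.dh::self.dh, self.dw::self.dw]
        background = background * self.wcm_weight             # apply WCM

        # Zero-pad both maps back to (H, W) before concatenation.
        def pad_to(t):
            return F.pad(t, (0, W - t.shape[-1], 0, H - t.shape[-2]))
        fused = torch.cat([pad_to(circle), pad_to(background)], dim=1)
        return self.restore(fused)
```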

3.4. Prior-Guided Sparse Attention Module

In traditional neural network structures, information loss or poor information transmission is prone to occur when information is transferred between network layers. The attention mechanism can make the model pay more attention to locally important information. Therefore, this work fed the circle domain map into an attention module for refined feature extraction. Integrating prior knowledge about the circle objects, we introduced patch-wise attention to enhance the feature interaction among circle holes, with each patch sized $Patch_{size}$. This change not only captures local context more efficiently but also aligns with the inherent structure of the data, as shown in Figure 7. Specifically, we divided the input features into multiple patches based on $\mu$ times the diameter of the circles (Equation (5)) and projected the patches to $Q^P, K^P, V^P$ vectors, shown in Equation (6). Then, we calculated the correlation between $K^P$ and $Q^P$ in each patch, sorted the correlation values, and took the subscripts corresponding to the top $k$ highest correlations, shown in Equations (7) and (8), which further sparsifies the attention mechanism.
$Patch_{size} = \mu \times d$, (5)
$Q^P = X^P W_q, \quad K^P = X^P W_k, \quad V^P = X^P W_v$, (6)
where $\mu$ (default $\mu = 5$) defines the multiple of the circle–hole diameter $d$, $P$ represents $Patch_{size}$, $X^P$ represents the patch features obtained by dividing the original image, and $W_q, W_k, W_v \in \mathbb{R}^{C \times C}$ are three trainable projection weight matrices.
$Attention^P = Q^P (K^P)^T$, (7)
$I^P = \mathrm{topkIndex}(Attention^P)$, (8)
where $Q^P, K^P \in \mathbb{R}^{P^2 \times C}$ and $Attention^P \in \mathbb{R}^{P^2 \times P^2}$; the values in $Attention^P$ measure the degree of correlation between $K^P$ and $Q^P$ in each area. $I^P$ holds the subscripts of the top $k$ most relevant $K^P$–$Q^P$ pairs in each region, used to extract the corresponding $K^P$ and $V^P$ values. However, the top $k$ $K^P$ and $V^P$ values filtered out according to the correlation matrix are stored discontiguously, and GPUs process data contiguously during matrix operations, so the values need to be aggregated into contiguous storage. This study used the gather function integrated into the torch library, which extracts data at specified indices from a batch of tensors, to make the discretely stored vectors contiguous, as shown in Equation (9).
$K_g = \mathrm{gather}(K^P, I^P), \quad V_g = \mathrm{gather}(V^P, I^P)$, (9)
$output = \mathrm{softmax}\left(\frac{Q^P (K_g)^T}{\sqrt{d_k}}\right) V_g$, (10)
where $K_g, V_g \in \mathbb{R}^{P^2 \times C}$ are the gathered key and value tensors, and $d_k$ is the dimension of the $K_g$ tensor. Finally, the top $k$ most relevant coarse-grained areas for each token are used as keys and values in the Equation (10) operation, which is normal self-attention.
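The following is a minimal PyTorch sketch in the spirit of Equations (5)–(10): patch-wise top-k sparse attention with a gather step. The routing over mean-pooled patch queries and keys, the channels-last layout with H and W divisible by the patch size, the default top-k, and all names are our assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class PriorGuidedSparseAttention(nn.Module):
    """Sketch of PSA: patch-wise top-k sparse attention."""

    def __init__(self, dim, patch_size, topk=4):
        super().__init__()
        self.p = patch_size                # μ × circle-hole diameter (Eq. (5))
        self.topk = topk
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)   # W_q, W_k, W_v
        self.scale = dim ** -0.5

    def forward(self, x):                  # x: (B, H, W, C)
        B, H, W, C = x.shape
        p, nh, nw = self.p, H // self.p, W // self.p
        # Split into patches: (B, nh*nw, p*p, C).
        x = x.view(B, nh, p, nw, p, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, nh * nw, p * p, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)           # Eq. (6)

        # Patch-level routing: correlate mean patch queries and keys,
        # then keep the indices of the top-k most relevant patches.
        qp, kp = q.mean(dim=2), k.mean(dim=2)            # (B, N, C)
        attn = qp @ kp.transpose(-1, -2)                 # Eq. (7)
        idx = attn.topk(self.topk, dim=-1).indices       # Eq. (8)

        # gather() pulls the selected patches' keys and values into
        # contiguous tensors (Eq. (9)).
        idx_exp = idx[..., None, None].expand(-1, -1, -1, p * p, C)
        k_all = k.unsqueeze(1).expand(-1, nh * nw, -1, -1, -1)
        v_all = v.unsqueeze(1).expand(-1, nh * nw, -1, -1, -1)
        kg = torch.gather(k_all, 2, idx_exp).flatten(2, 3)
        vg = torch.gather(v_all, 2, idx_exp).flatten(2, 3)

        # Token-level attention over the gathered keys/values (Eq. (10)).
        out = (q @ kg.transpose(-1, -2) * self.scale).softmax(dim=-1) @ vg
        out = out.view(B, nh, nw, p, p, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)
```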

3.5. Improvements in Circle Representation

In object detection, objects are typically enclosed within rectangular bounding boxes, with their positions defined by the coordinates of opposite corners. However, for specialized datasets of heat exchangers, rectangular bounding boxes can be suboptimal, because rectangular bounding boxes introduce excess margins around circular objects, leading to imprecise localization and potentially false detections. This limitation was highlighted by [37], which studied medical cell detection, and their findings underscored the need for bounding shapes that match the geometry of the objects being detected. Inspired by these observations, this work proposed the representation of circle boxes for the detection of circle holes in heat exchanger datasets. This approach not only aligns more intuitively with the actual shape of the objects but also provides a more precise description of their location. Defining the circle box through its center and radius ensures that every point on the circumference of the object is accurately captured, minimizing the area of false inclusion or exclusion. This leads to a more accurate assessment of the position and size of circle holes, which is crucial for applications requiring high precision.
Specifically, we calculated the center coordinates and the width and height of the bounding box based on the annotated top-left and bottom-right corner coordinates and chose the smaller value between the width and height to calculate the circle radius, which is shown in Equation (11).
$(c_x, c_y) = \left(\frac{x_1 + x_2}{2},\ \frac{y_1 + y_2}{2}\right), \quad r = \frac{\min(w, h)}{2}$, (11)
where $(x_1, y_1)$, $(x_2, y_2)$, and $w$, $h$ denote the top-left and bottom-right coordinates and the width and height of the bounding box, respectively. In accordance with the computed center coordinates and radius, circle boxes were delineated on the original image canvas. The Bresenham algorithm was employed to render each circle box; this algorithm identifies the points closest to the circumference on a lattice of discrete pixels. Leveraging the eight-fold symmetry intrinsic to circles, it suffices to calculate the points confined to the segment beneath the 45-degree line (the line $y = x$) within the first quadrant of the coordinate system; the symmetry property is then used to complete the rendering of the entire circle. To this end, a decision variable, denoted $D$, is introduced to determine which pixel should be selected as the next point at each iteration. The initial value of $D$ and the initial point are set as in Equation (12); this choice of initial value is critical, as it allows the algorithm to use integer arithmetic only during its iterations, enhancing both the speed and accuracy of computation.
$D = 3 - 2r, \quad (x, y) = (0, r)$, (12)
At each iterative step, the algorithm decides whether pixel $p$ should move one pixel in the $x$ direction alone (to point $p_e$) or shift simultaneously in both the $x$ and $y$ directions (to point $p_s$), as shown in Equation (13), contingent upon the current value of the decision variable $D$.
$D = \begin{cases} D + 2x + 3 & \text{if } D < 0 \\ D + 2(x - y) + 5 & \text{if } D \geq 0 \end{cases}$ (13)
The values of $x$ and $y$ are then updated as shown in Equation (14) until $x > y$.
$x_{new} = x + 1, \quad y_{new} = \begin{cases} y & \text{if } D < 0 \\ y - 1 & \text{if } D \geq 0 \end{cases}$ (14)
Finally, taking advantage of the symmetry of the circle, the corresponding points in the other seven octants are found and pixels are plotted at these locations to complete the drawing of the entire circle. The entire drawing process is illustrated in Algorithm 1.
Algorithm 1 Circle Drawing
procedure DrawCircle(x_center, y_center, radius)
    // Initialize variables
    x ← 0
    y ← radius
    D ← 3 − 2 × radius
    // Loop while the current y value is at least the current x value
    while y ≥ x do
        // Draw one point of the eighth-circle region; symmetry extends
        // it to the other seven octants.
        Plot(x_center + x, y_center + y)
        if D < 0 then
            // Update decision parameter D
            D ← D + 2 × x + 3
        else
            D ← D + 2 × (x − y) + 5
            y ← y − 1
        end if
        // Update the value of x
        x ← x + 1
    end while
end procedure
procedure Plot(x, y)
    draw pixel at (x, y)
end procedure
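For reference, the following is a directly runnable Python sketch of Algorithm 1 together with the box-to-circle conversion of Equation (11); the function names and the plot callback are our own conventions, and the eight-way symmetric plotting makes the "other seven octants" step explicit.

```python
def box_to_circle(x1, y1, x2, y2):
    # Equation (11): bounding-box corners -> circle center and radius.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    r = min(x2 - x1, y2 - y1) / 2
    return cx, cy, r

def draw_circle(x_center, y_center, radius, plot):
    # Bresenham circle rasterization (Algorithm 1). `plot` is any
    # pixel-setting callback supplied by the caller.
    x, y = 0, radius
    d = 3 - 2 * radius                      # decision variable, Equation (12)
    while y >= x:
        # One computed point covers all eight octants by symmetry.
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            plot(x_center + px, y_center + py)
        if d < 0:
            d += 2 * x + 3                  # Equation (13), D < 0 branch
        else:
            d += 2 * (x - y) + 5            # Equation (13), D >= 0 branch
            y -= 1                          # Equation (14)
        x += 1                              # Equation (14)

# Example usage: rasterize a circle box converted from corner coordinates.
pixels = set()
cx, cy, r = box_to_circle(10, 10, 30, 30)
draw_circle(int(cx), int(cy), int(r), lambda x, y: pixels.add((x, y)))
```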
Figure 8 intuitively shows how the circular detection representation better reflects the localization of the circular holes. To reduce resource waste, we also plan to control the water output of the cleaning machine according to the area of the detected representation, further achieving efficient and economical heat exchanger tube cleaning. Compared to the square representation, the circular representation more accurately reflects the true area of the heat exchanger tube mouth, avoiding the resource waste caused by an artificially inflated detection area.

4. Results

To verify the robustness and universality of this method, relevant experiments were conducted on our self-collected heat exchanger dataset, which is referred to as the HE dataset.

4.1. Parameter Settings

Experimental equipment: In the experiments, the CPU is an Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20 GHz (Intel, Santa Clara, CA, USA), the GPU is an NVIDIA GeForce GTX 1080Ti (NVIDIA, Santa Clara, CA, USA), and the GPU memory is 10.95 GB. The algorithm was implemented in PyTorch v1.11 and used CUDA 11.2 for computing acceleration. Network parameter settings: This study set the batch size to 4 and the number of training epochs to 50. An early stopping strategy was used to avoid overfitting, with the patience set to 15. This work used Adam as the optimizer with an initial learning rate of 0.001. At the same time, the size of the anchors is very important for the detection and localization of circle holes. In this experiment, Table 2 lists three groups of anchors for the three prediction heads applicable to this dataset, with each group applied to feature maps of a different size.
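These settings can be expressed as a short configuration sketch; everything below (the stand-in model and the stubbed train/validate step) is illustrative only, not the authors' training script.

```python
import torch

def train_one_epoch_and_validate(model, optimizer):
    # Stub standing in for one epoch of training plus validation;
    # it would return the validation mAP in a real run.
    return 0.0

model = torch.nn.Conv2d(3, 16, 3)   # stand-in for the CPDet network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # Adam, lr 0.001

best_map, patience, stall = 0.0, 15, 0
for epoch in range(50):              # 50 epochs (batch size 4 in the paper)
    val_map = train_one_epoch_and_validate(model, optimizer)
    if val_map > best_map:
        best_map, stall = val_map, 0
    else:
        stall += 1
        if stall >= patience:        # early stopping with patience 15
            break
```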

4.2. Ablation Experiments

We used the mAP as the evaluation metric for the ablation experiments, where mAP50, mAP90, and mAP50:95 represent the average precision calculated at IoU thresholds of 0.5 and 0.9 and over the range from 0.5 to 0.95, respectively. In analyzing the interval prior information, this work applied the trimmed mean, arithmetic mean, mode, and median to generate representative interval priors. The trimmed mean is an average that excludes a small fraction of the highest and lowest values to reduce the impact of outliers; the arithmetic mean is the sum of all values divided by the number of values; the mode is the value that occurs most frequently in a dataset; and the median is the middle value when a dataset is ordered from least to greatest. When utilizing the trimmed mean as the interval statistical approach, we set a trimming ratio of 10% based on practical considerations: a rate of 10% effectively removes a portion of the outliers while ensuring that a substantial portion of the data remains incorporated into the final mean, thereby securing the representativeness of the outcome. The choice of statistical method for the prior information has a certain impact on the training results. According to Table 3, the trimmed mean (10%) method achieved the best training results across all indicators. Compared to the arithmetic mean, the trimmed mean enhanced the robustness of the analysis and made the results more reliable by eliminating potential outliers before averaging.
The statistical accuracy of the interval prior information is particularly important: the better the prior interval represents the intervals between circular targets across the entire dataset, the more accurate the subsequent prior fusion of the circle domain and background domain features will be. Table 3 shows the statistical interval prior values obtained at different trimming rates (for the convenience of subsequent calculations, the interval priors are rounded to integer values), as well as the average precision obtained when applying these prior values in subsequent model training. It can be clearly seen that when the trimming rate was 10%, the mAP50:95 was maximized, indicating that the interval prior obtained at a 10% trimming rate is the most representative of the intervals between circular targets in the entire dataset.
Table 4 shows the circle domain and background domain extracted under different interval information; it can be intuitively seen that the pixel interval value obtained with a trimming rate of 10% separates the circle domain and background domain well.
Figure 9 shows that the interval prior significantly affects the segmentation outcome. At a trimming rate of 10%, the intervals obtained are the most precise. Inaccuracies in the interval prior can result in the conflation of the circle domain with the background domain, which adversely impacts feature extraction in subsequent networks. Simultaneously, a comparative experiment was conducted on weakening the background domain, varying the weakening coefficient matrix across a range from 0 to 1 in increments of 0.2, as shown in Table 5. The experimental outcomes reveal that the optimal performance was achieved when the weakening factor was set to 0.6.
Table 6 shows that the interval prior extraction and interval prior fusion modules combined with the prior-guided sparse attention module made the greatest contribution to model performance, increasing the mAP50 and mAP50:95 by 5.24% and 5.52%, respectively. It should be noted that the improvement from adding only the IPE and IPF modules was smaller than the improvement from combining the PSA module with the IPE and IPF modules, with the mAP50 and mAP50:95 improving by only 1.29% and 2.2%, respectively. Similarly, without the help of the interval prior extraction and fusion modules, the improvement from adding the prior-guided sparse attention module to the original image information was relatively small, with the mAP50 and mAP50:95 improving by only 2.7% and 3.21%, respectively. This work speculates that this is because all the objects to be detected in this dataset are small objects: targeted extraction of object features and suppression of invalid features maximizes detection efficiency.

4.3. Experimental Results

In this experiment, this work used the mAP50, mAP90, and mAP50:95 as the evaluation metrics for heat exchanger circle detection; the mAP values were calculated using Equation (22). Comparing the experimental results of the YOLO series and the improved method shows that our method outperforms the YOLO series in detecting circles in the HE dataset, and the modules proposed in this paper are plug-and-play within the YOLO series, as shown in Table 7; note that CPDet(v3) denotes the test results with YOLOv3 as the backbone, and the same applies to CPDet(v5) and CPDet(v8). On our self-collected heat exchanger dataset, a visual comparative analysis of detection results from different algorithms, shown in Figure 10, demonstrates that our approach avoids most of the erroneous detections made by other methods and also significantly reduces the rate of false negatives. In the HE dataset, the heat exchangers are of the same type, so the number of circle holes on the end face is constant. We therefore tallied the total number of objects in the dataset to further calculate the True Positive Rate (TPR), False Positive Rate (FPR), and False Negative Rate (FNR), shown in Equations (19)–(21), which intuitively display the performance gains of the model in actual scenarios.
$Precision = \frac{TP}{TP + FP}$,
$Recall = \frac{TP}{TP + FN}$,
$AP = \sum_{n=1}^{N} Precision_n \times \Delta Recall_n$,
$mAP = \frac{1}{k} \sum_{i=1}^{k} AP_i$,
$F1\ Score = \frac{2 \times Precision \times Recall}{Precision + Recall}$,
$TPR = \frac{TP}{TNO}$,
$FNR = \frac{FN}{TNO}$,
$FPR = \frac{TNO - TP}{TNO}$,
where $TP$ (True Positive) represents correctly detected positive samples, $FP$ (False Positive) represents negative samples incorrectly classified as positive, and $FN$ (False Negative) represents positive samples incorrectly predicted as negative. $TNO$ (Total Number of Objects) represents the total number of objects to be detected in each image of the dataset. $Precision_n$ represents the precision at the $n$th recall threshold, and $\Delta Recall_n$ denotes the interval between the recall threshold at the $n$th point and that at the preceding point.
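As a small, self-contained illustration of how these metrics relate, the Python sketch below computes them from per-image counts, assuming the fixed per-image object count TNO described above; the function name and the example numbers are ours.

```python
def detection_rates(tp, fp, fn, tno):
    # Precision, recall, and F1 from the standard definitions above.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    # Rate metrics normalized by the fixed object count per image (TNO).
    tpr = tp / tno            # True Positive Rate
    fnr = fn / tno            # False Negative Rate
    fpr = (tno - tp) / tno    # False Positive Rate, as defined above
    return precision, recall, f1, tpr, fnr, fpr

# Example: 95 of 100 holes found, 3 spurious detections, 5 misses.
print(detection_rates(tp=95, fp=3, fn=5, tno=100))
```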
In the conducted experiments, the performance of the improved model was evaluated using a test set from the heat exchanger dataset, as shown in Table 8. The areas highlighted in blue represent the outcomes achieved by augmenting the baseline approach with the proposed method. It can be seen from the ablation experiment that our method can better focus on the detection of circles and greatly improved the YOLO method detection performance.

4.4. Discussion

We applied the proposed modules to the recently introduced YOLOv8 model and to methods commonly used in the industrial field, such as YOLOv3 and YOLOv5. Comparing the results in Table 6 shows that the methods incorporating our proposed modules achieved a significant improvement in model training accuracy: the mAP50 increased by 5.76 percentage points for CPDet(v3), the F1 Score increased by 6.5 for CPDet(v5), and the False Negative Rate (FNR) and False Positive Rate (FPR) decreased by 0.11 and 0.07, respectively, for CPDet(v10), which means that on each image to be detected, there was about a 10% reduction in missed detections and false positives. Furthermore, examining the model results on the test set in Table 7, it can be observed that for CPDet(v3), CPDet(v5), and CPDet(v10), models of different sizes saw an improvement of two to four percentage points in mAP50. This intuitively reflects the effectiveness and plug-and-play practicality of our proposed modules. However, compared to the general methods, our method incurs notable costs in FPS and FLOPs: the inference speed of CPDet(v8s) and CPDet(v10s) decreased by approximately 30 frames per second, and the FLOPs increased by nearly 2.2 M; the same holds for the other CPDet variants. Therefore, our method has limitations in terms of speed and computational complexity. Inevitably, compared to general methods, our method is tailored to the current heat exchanger dataset, so there are also limitations in terms of application generality.
In Figure 10, we have selected examples from overly dark and overly exposed scenes for detection, and it can be more intuitively seen that the instances of False Negatives and False Positives significantly decreased after the incorporation of our method. According to the visual examples in Figure 10 and Figure 11, it was observed that the False Negatives were primarily concentrated in the circle domain area, while the False Positives were mainly concentrated in the background domain area. Our method initially locates the circle domain and background domain by extracting the prior information of the object arrangement and enhances the focus of the network on the circle domain area by attention weighting while reducing the focus on the background domain area. This effectively avoids the occurrence of False Negatives and False Positives. Figure 11 intuitively demonstrates the significant performance of our method in avoiding False Negatives and False Positives compared to other mainstream methods, where our approach greatly reduced False Positives and minimized the False Negative issue to an acceptable range. However, our method mainly leveraged the regularity of object distribution in the HE dataset to achieve performance gains, which means that the method is not applicable to other datasets. Therefore, our method still has limitations in terms of generalizability.

5. Conclusions

This paper proposed a circle-permutation-aware object detector (CPDet) for heat exchanger cleaning, aiming to sufficiently exploit prior information to further improve detection. This paper fully utilized the prior arrangement information in the dataset to enhance the detection capability of the model for foreground objects, and the experimental results show a significant improvement in detection accuracy. The core of CPDet is the interval prior extraction (IPE) module, which extracts interval statistics from circle holes, followed by the interval prior fusion (IPF) module, which distinguishes the circle domain from the background domain. Our prior-guided sparse attention (PSA) refines circle domain analysis, while the background domain is tactically weakened by a weakening coefficient matrix (WCM). Finally, we opted for a circle representation, which closely matches the object shape. Our approach effectively capitalizes on the prior interval information of circle holes and incorporates attention mechanisms to pinpoint the locations of circle holes with greater precision and to heighten the network's focus on relevant areas. Empirical results demonstrate that the CPDet algorithm excels in detection accuracy for circular hole detection on heat exchanger end faces, substantiating its efficacy in this domain. In the future, we plan to abstract a prior information processing module for various datasets with regular arrangement distributions, thereby enhancing the generalizability of this work and applying it to all object detection tasks with a certain regularity of arrangement. We also plan to use parallel computing to improve the inference speed of the model.

Author Contributions

Conceptualization, J.L. and X.L.; methodology, J.L. and Y.W.; software, J.L. and Y.W.; validation, J.L., Y.W. and Y.Q.; formal analysis, J.L. and Y.W.; investigation, J.L.; resources, J.L.; data curation, J.L. and X.L.; writing—original draft preparation, J.L., Y.W. and Y.Q.; writing—review and editing, J.L., Y.W., Y.Q., X.L. and H.W.; visualization, J.L.; supervision, X.L., X.X. and Y.P.; project administration, J.L. and X.L.; funding acquisition, X.X. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Silaipillayarputhur, K.; Khurshid, H. The design of shell and tube heat exchangers—A review. Int. J. Mech. Prod. Eng. Res. Dev. 2019, 9, 87–102. [Google Scholar]
  2. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28. [Google Scholar]
  3. Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
  4. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  5. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision—ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  6. Zuo, Y.; Wang, J.; Song, J. Application of YOLO object detection network in weld surface defect detection. In Proceedings of the 2021 IEEE 11th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER), Jiaxing, China, 27–31 July 2021; pp. 704–710. [Google Scholar]
  7. Yang, L.; Fan, J.; Liu, Y.; Li, E.; Peng, J.; Liang, Z. Automatic detection and location of weld beads with deep convolutional neural networks. IEEE Trans. Instrum. Meas. 2020, 70, 1–12. [Google Scholar] [CrossRef]
  8. Fahimipirehgalin, M.; Trunzer, E.; Odenweller, M.; Vogel-Heuser, B. Automatic visual leakage detection and localization from pipelines in chemical process plants using machine vision techniques. Engineering 2021, 7, 758–776. [Google Scholar] [CrossRef]
  9. Obaid, M.H.; Hamad, A.H. Deep Learning Approach for Oil Pipeline Leakage Detection Using Image-Based Edge Detection Techniques. J. Eur. Syst. Autom. 2023, 56, 663–673. [Google Scholar] [CrossRef]
  10. Zheng, H.; Chen, X.; Cheng, H.; Du, Y.; Jiang, Z. MD-YOLO: Surface Defect Detector for Industrial Complex Environments. Opt. Lasers Eng. 2024, 178, 108170. [Google Scholar] [CrossRef]
  11. Zhu, X.; Lyu, S.; Wang, X.; Zhao, Q. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 2778–2788. [Google Scholar]
  12. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  13. Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  14. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  15. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768. [Google Scholar]
  16. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  17. Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271. [Google Scholar]
  18. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  19. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  20. Couturier, R.; Noura, H.N.; Salman, O.; Sider, A. A deep learning object detection method for an efficient clusters initialization. arXiv 2021, arXiv:2104.13634. [Google Scholar]
  21. Choudhari, S.J.; BJ, S.; Singh, S.A.; Desai, K. Utilizing Vision-Based Object Detection Algorithms in Recognizing Uncommon Operating Conditions for CNC Milling Machine. In Proceedings of the International Manufacturing Science and Engineering Conference, Greenville, SC, USA, 23–27 June 2023; American Society of Mechanical Engineers: New York, NY, USA, 2023; Volume 87240, p. V002T05A007. [Google Scholar]
  22. Zhao, Y.; Yang, Z.; Xu, C. Bolt Loosening Detection for a Steel Frame Multi-Story Structure Based on Deep Learning and Digital Image Processing. In Proceedings of the ASME International Mechanical Engineering Congress and Exposition, Columbus, OH, USA, 29 October–2 November 2022; American Society of Mechanical Engineers: New York, NY, USA, 2022; Volume 86656, p. V003T04A017. [Google Scholar]
  23. Fang, Y.; Guo, X.; Chen, K.; Zhou, Z.; Ye, Q. Accurate and automated detection of surface knots on sawn timbers using YOLO-V5 model. BioResources 2021, 16, 5390. [Google Scholar] [CrossRef]
24. Mathew, M.P.; Mahesh, T.Y. Leaf-based disease detection in bell pepper plant using YOLO v5. Signal Image Video Process. 2022, 16, 841–847.
25. Ling, H.; Zhao, T.; Zhang, Y.; Lei, M. Engineering Vehicle Detection Based on Improved YOLOv6. Appl. Sci. 2024, 14, 8054.
26. Wang, H.; Xu, X.; Liu, Y.; Lu, D.; Liang, B.; Tang, Y. Real-time defect detection for metal components: A fusion of enhanced Canny–Devernay and YOLOv6 algorithms. Appl. Sci. 2023, 13, 6898.
27. Han, J.; Li, Z.; Cui, G.; Zhao, J. EGS-YOLO: A Fast and Reliable Safety Helmet Detection Method Modified Based on YOLOv7. Appl. Sci. 2024, 14, 7923.
28. Yang, X.; Wang, Z.; Dong, M. PRE-YOLO: A Lightweight Model for Detecting Helmet-Wearing of Electric Vehicle Riders on Complex Traffic Roads. Appl. Sci. 2024, 14, 7703.
29. Wang, J.; Yu, N. UTD-YOLOv5: A real-time underwater targets detection method based on attention improved YOLOv5. arXiv 2022, arXiv:2207.00837.
30. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
31. Xu, H.; Li, B.; Zhong, F. Light-YOLOv5: A lightweight algorithm for improved YOLOv5 in complex fire scenarios. Appl. Sci. 2022, 12, 12312.
32. Lu, S.; Liu, X.; He, Z.; Zhang, X.; Liu, W.; Karkee, M. Swin-Transformer-YOLOv5 for real-time wine grape bunch detection. Remote Sens. 2022, 14, 5853.
33. Guo, Z.; Wang, C.; Yang, G.; Huang, Z.; Li, G. MSFT-YOLO: Improved YOLOv5 based on transformer for detecting defects of steel surface. Sensors 2022, 22, 3467.
34. Mehta, M. AFF-YOLO: A Real-Time Industrial Defect Detection Method Based on Attention Mechanism and Feature Fusion. 2023. Available online: https://www.researchsquare.com/article/rs-3449230/v1 (accessed on 21 August 2024).
35. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
36. Mekhalfi, M.L.; Nicolò, C.; Bazi, Y.; Al Rahhal, M.M.; Al Maghayreh, E. Detecting crop circles in Google Earth images with Mask R-CNN and YOLOv3. Appl. Sci. 2021, 11, 2238.
37. Yang, H.; Deng, R.; Lu, Y.; Zhu, Z.; Chen, Y.; Roland, J.T.; Lu, L.; Landman, B.A.; Fogo, A.B.; Huo, Y. CircleNet: Anchor-free glomerulus detection with circle representation. In Proceedings of the Medical Image Computing and Computer Assisted Intervention—MICCAI 2020: 23rd International Conference, Lima, Peru, 4–8 October 2020; Proceedings, Part IV. Springer: Berlin/Heidelberg, Germany, 2020; pp. 35–44.
38. Park, H.M.; Park, J.H. YOLO network with a circular bounding box to classify the flowering degree of chrysanthemum. AgriEngineering 2023, 5, 1530–1543.
39. Sunkara, R.; Luo, T. No more strided convolutions or pooling: A new CNN building block for low-resolution images and small objects. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases; Springer Nature: Cham, Switzerland, 2022; pp. 443–459.
40. Jocher, G.; Chaurasia, A.; Qiu, J. Ultralytics YOLO. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 21 August 2024).
41. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
42. Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. In Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Volume 36.
43. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-time end-to-end object detection. arXiv 2024, arXiv:2405.14458.
Figure 1. The arrangement of the circle holes is regular: interval information is extracted from the annotation files of the dataset and applied to the original image, yielding features that contain a large amount of foreground information.
Figure 2. Overall network architecture. The IPE module first identifies the intervals between circles; the IPF module then combines this interval information with the extracted features to separate circular features from background features. The PSA module weights the circular features according to the prior hole diameter and selects key–query–value pairs for attention, while a weak coefficient matrix down-weights the background features. The combined features are refined with padding and a 1 × 1 convolution to produce the final detection result.
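For readers who prefer code, the following is a minimal NumPy sketch of the domain splitting that precedes the attention stage. The grid-mask construction, the interval and diameter values, and the scalar weak coefficient are illustrative assumptions for this sketch, not the authors' released implementation.

```python
# A hedged sketch of interval-prior-guided domain splitting:
# a stripe mask built from the prior interval separates the image into a
# circle-domain map and a down-weighted background-domain map.
import numpy as np

def split_domains(image, interval, diameter, weak_coeff=0.6):
    """Split an image into a circle-domain map and a weakened background map."""
    h, w = image.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    # Mark diameter-sized cells spaced every `interval` pixels, approximating
    # the regular hole layout described by the interval prior.
    for y in range(0, h, interval):
        for x in range(0, w, interval):
            mask[y:y + diameter, x:x + diameter] = True
    circle_domain = np.where(mask[..., None], image, 0.0)
    # Background features are retained but scaled by the weak coefficient
    # (0.6 gave the best results in Table 5).
    background_domain = np.where(mask[..., None], 0.0, image) * weak_coeff
    return circle_domain, background_domain

img = np.random.rand(256, 256, 3).astype(np.float32)
# interval=26 follows the 10% trimmed-mean prior in Table 4; diameter=20 is assumed.
circle_map, bg_map = split_domains(img, interval=26, diameter=20)
print(circle_map.shape, bg_map.shape)
```

In the full model, the circle-domain map is further divided into diameter-sized patches for patch-wise self-attention, while the weakened background map bypasses attention entirely.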
Figure 3. Heat exchanger dataset.
Figure 4. Evaluating whether circles align in the same row or column.
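As a concrete illustration of this alignment test, the helper below treats two circle centers as belonging to the same row (or column) when their y (or x) coordinates differ by less than a tolerance. The tolerance rule and the example values are assumptions made for illustration only.

```python
# A hedged sketch of the row/column alignment check from Figure 4.
def same_row(c1, c2, tol):
    """c1, c2 are (x, y) centers; tol could be a fraction of the hole diameter."""
    return abs(c1[1] - c2[1]) < tol

def same_column(c1, c2, tol):
    return abs(c1[0] - c2[0]) < tol

print(same_row((40, 101), (66, 103), tol=5))     # True: y-coordinates nearly equal
print(same_column((40, 101), (66, 103), tol=5))  # False: x-coordinates differ by 26
```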
Figure 5. The circle domain and background domain obtained after interval prior fusion for different input images.
Figure 6. Details of the interval prior fusion module.
Figure 7. Details of the prior-guided sparse attention module.
Figure 8. The circle representation provides more intuitive localization of the circle holes than the square representation. It also fits the holes more closely, leading to more accurate area calculations; using the circular area to control the water output of the cleaning machine is therefore more precise.
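The area advantage of the circular representation can be checked with simple arithmetic: for a hole of radius r, the tight bounding square has area (2r)², overestimating the true open area πr² by a factor of 4/π ≈ 1.27. A short sketch (the radius value is illustrative):

```python
# Why the circle representation yields tighter area estimates than a square box.
import math

r = 10.0                        # hole radius in pixels (illustrative value)
circle_area = math.pi * r ** 2  # area from the circle representation
square_area = (2 * r) ** 2      # area from the tight bounding square
print(f"overestimate: {square_area / circle_area:.2f}x")  # ~1.27x
```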
Figure 9. The circle domain and background domain under varying trimming rates.
Figure 10. Comparative gains of our method over the baseline: red boxes mark successfully eliminated False Positives, blue boxes mark successfully recovered False Negatives, and gray boxes mark False Negatives that were not recovered.
Figure 11. Comparative detection results of different detection models on the heat exchanger dataset: blue boxes highlight incorrect detections, and red boxes indicate objects missed during detection.
Table 1. Comparison of different detectors.

| Detector Type | Advantages | Disadvantages |
|---|---|---|
| One-Stage Detector | Fast detection speed; easy to deploy | Relatively lower detection accuracy than two-stage detectors |
| Two-Stage Detector | Relatively higher detection accuracy than one-stage detectors | Slower detection speed than one-stage detectors; difficult to deploy |
| Specific Industrial Scenario Detector | Tailored to special scenarios, with high detection accuracy in those scenarios | Limited to its specific scenario; poor generalizability |
| Circle Object Detection | Circular representation better suits the detection of circle objects | Built on two-stage detectors, with slower detection speed and poor portability |
| Our Detector | Fully exploits the prior information in the dataset, achieving the best performance among the above methods for detecting circle holes in heat exchangers | Depends on the prior arrangement information in the dataset; poor generalizability |
Table 2. Anchor box size.

| Detection Head | Anchor Size |
|---|---|
| Small | [10, 13], [16, 30], [33, 23] |
| Medium | [30, 61], [62, 45], [59, 119] |
| Large | [116, 90], [156, 198], [373, 326] |
Table 3. Ablation experiment with different statistical methods of the interval prior.

| Statistical Method | mAP50 | mAP90 | mAP50:95 |
|---|---|---|---|
| Arithmetic Mean | 83.75% | 67.10% | 76.09% |
| Trimmed Mean (10%) | **84.23%** | **68.26%** | **76.70%** |
| Mode | 79.11% | 61.73% | 68.54% |
| Median | 65.96% | 62.56% | 64.71% |

Note: bold indicates the highest value.
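The four statistics compared above can be reproduced with standard library calls. The sketch below uses a made-up interval list with one outlier (e.g., from a missed annotation) to show why the 10% trimmed mean is the most robust choice; `scipy.stats.trim_mean` is assumed as the trimmed-mean implementation.

```python
# Comparing candidate statistics for the interval prior on synthetic data.
from statistics import mean, median, mode
from scipy.stats import trim_mean

intervals = [25, 26, 26, 27, 25, 26, 58, 26, 25, 26]  # one outlier (58)

print(mean(intervals))             # 29.0   — arithmetic mean, pulled up by the outlier
print(trim_mean(intervals, 0.10))  # 25.875 — drops 10% from each tail, robust
print(mode(intervals))             # 26     — most frequent interval
print(median(intervals))           # 26.0   — middle value
```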
Table 4. Ablation experiment with different trimming rates of the interval prior.

| Trimming Rate | h Interval | mAP50 | mAP50:95 |
|---|---|---|---|
| 0% | 22 | 75.13% | 72.09% |
| 10% | 26 | **78.69%** | **76.70%** |
| 20% | 23 | 76.13% | 75.16% |
| 30% | 24 | 76.37% | 74.18% |
| 40% | 25 | 73.89% | 74.37% |

Note: bold indicates the highest value.
Table 5. Ablation experiment with different values of the weakening coefficient matrix (WCM).

| WCM | mAP50 | mAP90 | mAP50:95 |
|---|---|---|---|
| 0.2 | 81.91% | 68.20% | 75.38% |
| 0.4 | 80.54% | 68.43% | 74.85% |
| 0.6 | **83.49%** | **69.30%** | **76.26%** |
| 0.8 | 81.23% | 68.74% | 75.23% |

Note: bold indicates the highest value.
Table 6. Ablation experiments on different modules.

| YOLOv8s Backbone | IPE + IPF | PSA | mAP50 | mAP90 | mAP50:95 |
|---|---|---|---|---|---|
| ✔ | | | 81.63% | 77.49% | 77.13% |
| ✔ | ✔ | | 82.92% | 81.94% | 79.33% |
| ✔ | | ✔ | 84.33% | 84.12% | 80.34% |
| ✔ | ✔ | ✔ | **86.87%** | **85.23%** | **82.65%** |

Note: ✔ marks the modules used in each configuration, and bold indicates the highest value in each column.
Table 7. Experimental results of different detection algorithms on the HE dataset.

| Methods | mAP50 | mAP90 | mAP50:95 | F1 Score | TPR | FNR | FPR |
|---|---|---|---|---|---|---|---|
| YOLOv3 [18] | 76.41% | 63.16% | 71.69% | 77.82 | 0.71 | 0.13 | 0.17 |
| YOLOv5 [20] | 78.31% | 65.64% | 71.96% | 74.11 | 0.75 | 0.11 | 0.14 |
| YOLOv8 [40] | 81.64% | 67.91% | 73.30% | 75.95 | 0.73 | 0.08 | 0.13 |
| YOLOX-S [41] | 79.94% | 63.01% | 71.75% | 73.62 | 0.76 | 0.09 | 0.13 |
| Gold-YOLO [42] | 80.80% | 64.81% | 72.80% | 76.22 | 0.76 | 0.07 | 0.15 |
| TPH-YOLOv5 [11] | 80.98% | 64.34% | 71.08% | 77.72 | 0.63 | 0.19 | 0.21 |
| YOLOv10 [43] | 82.15% | **69.40%** | 73.84% | 79.72 | 0.78 | 0.12 | 0.11 |
| CPDet(v3) | 82.17% | 63.34% | 74.13% | 79.68 (+1.86) | 0.84 (+0.13) | 0.03 (−0.10) | 0.05 (−0.12) |
| CPDet(v5) | 83.54% | 66.57% | 75.40% | 80.61 (+6.50) | 0.84 (+0.09) | 0.03 (−0.08) | **0.04** (−0.10) |
| CPDet(v8) | **84.93%** | 68.25% | 76.61% | 80.30 (+4.35) | **0.88** (+0.15) | 0.02 (−0.06) | **0.04** (−0.09) |
| CPDet(v10) | 84.59% | 68.37% | **76.74%** | **81.23** (+1.51) | **0.88** (+0.10) | **0.01** (−0.11) | **0.04** (−0.07) |
Note: CPDet(v3), CPDet(v5), CPDet(v8), and CPDet(v10) denote our proposed modules added to the YOLOv3, YOLOv5, YOLOv8, and YOLOv10 baselines, respectively; values in parentheses give the increment of each metric over the corresponding baseline, and bold indicates the best value in each column.
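For reference, the F1 score, TPR, and FNR reported above follow directly from raw detection counts. The sketch below uses made-up TP/FP/FN counts; the FPR definition shown (false detections over all detections) is an assumption for illustration, since the table does not spell it out.

```python
# A hedged sketch relating the reported metrics to raw counts.
def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                       # TPR
    f1 = 2 * precision * recall / (precision + recall)
    fnr = fn / (tp + fn)                          # miss rate
    fpr = fp / (tp + fp)                          # assumed: fraction of detections that are false
    return f1, recall, fnr, fpr

f1, tpr, fnr, fpr = detection_metrics(tp=880, fp=40, fn=20)  # illustrative counts
print(f"F1={100 * f1:.2f}, TPR={tpr:.2f}, FNR={fnr:.2f}, FPR={fpr:.2f}")
```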
Table 8. The performance of the YOLO series and our method on the test set.

| Methods | mAP50 | mAP90 | mAP50:95 | F1 Score | FPS | FLOPs | Memory Usage |
|---|---|---|---|---|---|---|---|
| YOLOv8s | 80.55% | 71.75% | 73.63% | 74.96 | 114 | 43.9M | 13.51G |
| YOLOv8m | 82.21% | 72.56% | 74.91% | 76.19 | 77 | 114.3M | 14.92G |
| YOLOv8l | 83.13% | 72.84% | 75.87% | 78.88 | 53 | 229.1M | 16.72G |
| CPDet(v8s) | 82.46% (+1.91%) | 73.35% (+1.60%) | 76.22% (+2.59%) | 77.99 (+3.03) | 83 | 46.1M | 13.94G |
| CPDet(v8m) | 83.78% (+1.57%) | 74.16% (+1.60%) | 77.97% (+3.06%) | 78.15 (+1.96) | 69 | 116.5M | 15.83G |
| CPDet(v8l) | 84.26% (+1.13%) | 75.85% (+3.01%) | 78.50% (+2.63%) | 79.58 (+0.75) | 62 | 231.3M | 17.26G |
| YOLOv5s | 77.95% | 72.96% | 74.28% | 74.28 | 125 | 32.1M | 13.27G |
| YOLOv5m | 78.42% | 73.75% | 74.56% | 74.65 | 94 | 98.2M | 15.85G |
| YOLOv5l | 79.93% | 74.62% | 74.34% | 75.51 | 82 | 218.1M | 17.37G |
| CPDet(v5s) | 80.83% (+2.88%) | 73.43% (+0.47%) | 74.60% (+2.26%) | 75.78 (+1.50) | 91 | 34.2M | 13.98G |
| CPDet(v5m) | 81.71% (+3.29%) | 74.50% (+0.75%) | 75.24% (+1.73%) | 77.13 (+2.48) | 85 | 100.3M | 16.45G |
| CPDet(v5l) | 82.95% (+3.02%) | 75.24% (+0.62%) | 76.13% (+1.79%) | 78.64 (+3.13) | 80 | 220.2M | 18.87G |
| YOLOv10s | 80.94% | 72.80% | 72.98% | 77.96 | 146 | 27.0M | 12.20G |
| YOLOv10m | 78.12% | 73.98% | 73.64% | 77.62 | 125 | 79.3M | 14.00G |
| YOLOv10l | 79.65% | 74.47% | 74.71% | 78.32 | 114 | 192.8M | 15.21G |
| CPDet(v10s) | 83.49% (+2.55%) | 73.89% (+1.09%) | 74.75% (+1.77%) | 78.37 (+0.41) | 114 | 28.9M | 13.11G |
| CPDet(v10m) | 85.00% (+3.67%) | 75.23% (+1.27%) | 75.67% (+1.79%) | 78.91 (+1.29) | 94 | 81.2M | 14.93G |
| CPDet(v10l) | 85.87% (+2.58%) | 75.09% (+0.21%) | 76.71% (+2.41%) | 79.02 (+0.70) | 82 | 194.7M | 16.28G |
Note: “s”, “m”, and “l” in (v8s, v8m, v8l) denote the small, medium, and large variants of each YOLO benchmark; values in parentheses give the increment of each metric over the corresponding baseline.
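An FPS figure like those in Table 8 is typically obtained by averaging per-image latency after a warm-up period. The sketch below illustrates one such measurement; the dummy `model` callable and the timing methodology are assumptions, not the authors' benchmarking setup.

```python
# A hedged sketch of throughput (FPS) measurement with warm-up runs excluded.
import time

def measure_fps(model, images, warmup=10):
    for img in images[:warmup]:          # warm-up runs are not timed
        model(img)
    start = time.perf_counter()
    for img in images[warmup:]:
        model(img)
    elapsed = time.perf_counter() - start
    return (len(images) - warmup) / elapsed

# Dummy workload standing in for a detector and a batch of images.
fps = measure_fps(lambda x: sum(x), [[1.0] * 1000 for _ in range(110)])
print(f"{fps:.0f} FPS")
```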