Given an image of corn pests, our model aims to detect the polygonal pest-infected regions on the corn leaves accurately and quickly. We annotated a dataset of polygonal pest-infected regions on corn images. To extract features effectively, we chose YOLOv8 as the backbone network. Specifically, we designed a new polygon detection head to detect pest-infected regions on corn leaves efficiently.
3.3. Improved Polygon Detection Head
To efficiently detect pest-infected regions, we designed a new polygon detection head. As shown in Figure 2, the detection head uses a decoupled structure, which gives it better expressive ability. It consists of three main branches: the pointness branch predicts whether each point belongs to a pest-infected region, the bbox branch predicts the bounding box coordinates of the pest-infected region, and the cls branch classifies the type of each region. Additionally, we designed an order-insensitive module to compute the coordinate regression loss.
We used the decoupled head to predict polygon object information from the feature maps at three scales output by the network. In the bbox branch, we used a convolution module with
kernel to process the feature maps at the three scales and predict the coordinate information
, where
represents the predicted
coordinate point. As shown in part (a) of Figure 3, traditional polygon methods predict the coordinate points and compute the regression loss in a fixed point order, which places excessive weight on fitting the order of the predicted points. In part (b) of Figure 3, the predicted box is already close to the ground truth box in practical terms. However, a shift in the order of the predicted points still produces a large regression loss, which ignores how well the detected shape matches the object and disturbs model convergence. Overfitting to point order also reduced the overall robustness of the model; we referred to this phenomenon as coordinate order sensitivity loss. To solve this problem, we proposed an order-insensitive regression loss calculation algorithm. To support it, we introduced the pointness branch, which extracts the existence information of coordinate points and adopts a convolution module with
kernel. The extraction formula is defined as follows:
where
is the output of the pointness branch. The pointness branch learns whether the corresponding coordinate point truly exists (a ground truth value of 1 indicates existence, and 0 indicates non-existence). Therefore, to bring the output as close as possible to the binary values {0, 1} while keeping the function continuous for back-propagation, we designed an approximation function f, defined as
, where k is the approximation coefficient, set to 256. We also adopted a gradient regularization truncation mechanism to prevent gradient explosion during approximation. During inference, we kept the points in
whose confidence is greater than
, and used them as the predicted coordinate points for the polygon detection box.
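As a minimal sketch of this step, the Python snippet below assumes that f behaves like a steep sigmoid with slope k. The exact form of f and the confidence threshold are not stated above, so `steep_sigmoid`, `keep_points`, and the default `threshold=0.5` are hypothetical choices for illustration only, and the gradient truncation mechanism is not modeled.

```python
import math


def steep_sigmoid(x, k=256.0):
    """Assumed form of f: a sigmoid with slope k, which pushes outputs toward 0 or 1.

    The two branches avoid overflow in math.exp for large |k * x|.
    """
    t = k * x
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def keep_points(points, pointness_outputs, threshold=0.5, k=256.0):
    """Keep the predicted points whose confidence exceeds the (hypothetical) threshold."""
    kept = []
    for point, raw in zip(points, pointness_outputs):
        confidence = steep_sigmoid(raw, k)
        if confidence > threshold:
            kept.append((point, confidence))
    return kept


if __name__ == "__main__":
    candidates = [(10.0, 12.0), (40.0, 8.0), (25.0, 30.0)]
    raw_outputs = [0.2, -0.2, 0.05]
    for point, confidence in keep_points(candidates, raw_outputs):
        print(point, round(confidence, 6))
```

With a slope this large, even small positive or negative raw outputs are mapped almost exactly to 1 or 0, which is why some form of gradient truncation is needed during training.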
Afterward, we used Algorithm 1 to select, from the subset of predicted coordinate points, the points closest to the ground truth coordinate points. The algorithm iterated through each ground truth point
and calculated its distance from each predicted point
using the Euclidean distance. It selected the predicted point with the minimum distance from the ground truth point and then updated the index of the selected predicted point z and its probability of existence v. We repeated the process until all ground truth points had been traversed. For each ground truth point, the algorithm constructed a
tuple, which contains the ground truth point and the selected predicted point with its existence probability. These tuples were added to a set E. After each selection, we removed the selected predicted point from the set of
to avoid duplicate calculations. Subsequently, we calculated the coordinate regression loss
of each prediction box as follows:
where
is the number of elements in set E; e is a
element in set E;
represents the existence probability of the prediction box in this
;
and
, respectively, represent the x-coordinate and y-coordinate of the ground truth box in this
; and
and
, respectively, represent the x-coordinate and y-coordinate of the predicted box in this
. To improve the detection of small pest-infected regions, we introduced
scaling factor a, based on the perimeter of the polygon, to harmonize the MSE (mean squared error), where
, C and
, respectively, represent the perimeters of the ground truth box and the predicted box. Afterward, we calculated the coordinate existence loss
as follows:
We defined the coordinate regression loss
.
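To illustrate why the loss is computed over the matched tuples in E rather than index by index, the following Python sketch compares the two on a square whose predicted vertices are shifted by one position. The function names, the `scale` argument (standing in for the perimeter-based scaling factor, whose formula is not reproduced here), and the weighting of each squared error by the existence probability are assumptions, not the exact loss used in this work.

```python
def index_wise_mse(gt_points, pred_points):
    """Order-sensitive baseline: compare the i-th ground truth point with the i-th prediction."""
    total = sum((gx - px) ** 2 + (gy - py) ** 2
                for (gx, gy), (px, py) in zip(gt_points, pred_points))
    return total / len(gt_points)


def matched_regression_loss(E, scale=1.0):
    """Assumed form: scale / |E| * sum over tuples of v * squared Euclidean distance.

    Each element of E is ((gx, gy), (px, py), v), where v is the existence
    probability of the selected predicted point.
    """
    if not E:
        return 0.0
    total = sum(v * ((gx - px) ** 2 + (gy - py) ** 2)
                for (gx, gy), (px, py), v in E)
    return scale * total / len(E)


if __name__ == "__main__":
    gt = [(0, 0), (1, 0), (1, 1), (0, 1)]
    pred = [(1, 0), (1, 1), (0, 1), (0, 0)]  # same square, vertex order shifted by one
    # Tuples as an order-insensitive matching would build them (written by hand here).
    E = [((0, 0), (0, 0), 1.0), ((1, 0), (1, 0), 1.0),
         ((1, 1), (1, 1), 1.0), ((0, 1), (0, 1), 1.0)]
    print(index_wise_mse(gt, pred))    # large, although the shapes coincide
    print(matched_regression_loss(E))  # zero, because matching ignores vertex order
```

The index-wise loss penalizes a prediction that describes exactly the same shape, while the matched loss does not.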
Additionally, we used a polygon IoU loss to constrain the shape of the object box, as shown in part (c) of Figure 3. Since this involves non-convex polygon IoU computation, we obtained the pixel areas covered by the ground truth object box and the predicted object box, respectively. Then, we calculated the IoU from the intersection and union pixels of the two regions. The IoU loss is formulated as follows:
where A is the pixel area covered by the ground truth box, and
is the pixel area covered by the predicted box.
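The pixel-based computation can be sketched in plain Python as follows. This is an illustrative version only: it rasterizes each polygon by testing pixel centers with the even-odd rule (which also handles non-convex polygons), and it assumes the common loss form 1 − IoU. The helper names and the grid size arguments are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count how many polygon edges a ray to the right of (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, width, height):
    """Return the set of pixels (col, row) whose centers lie inside the polygon."""
    return {(c, r) for r in range(height) for c in range(width)
            if point_in_polygon(c + 0.5, r + 0.5, polygon)}


def pixel_iou(gt_polygon, pred_polygon, width, height):
    """IoU computed from the intersection and union of the covered pixels."""
    a = rasterize(gt_polygon, width, height)
    b = rasterize(pred_polygon, width, height)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def iou_loss(gt_polygon, pred_polygon, width, height):
    """Assumed loss form: 1 - IoU."""
    return 1.0 - pixel_iou(gt_polygon, pred_polygon, width, height)


if __name__ == "__main__":
    gt = [(0, 0), (4, 0), (4, 4), (0, 4)]
    pred = [(2, 0), (6, 0), (6, 4), (2, 4)]
    print(pixel_iou(gt, pred, 8, 8))
    print(iou_loss(gt, pred, 8, 8))
```

Because the polygons are converted to pixel sets before comparison, the result does not depend on the order in which the vertices are listed, as long as they trace the same outline.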
Algorithm 1 Order-insensitive algorithm
Input: Set of ground truth coordinate points , probability of the existence of ground truth coordinate points , set of predicted coordinate points , probability of the existence of predicted coordinate points
Output: E
1:
2: for to n do
3:   if then
4:     break
5:   end if
6:   Initialize distance , index of the predicted coordinate point , probability of the existence of the predicted coordinate point
7:   for to do
8:     Calculate the Euclidean distance between the true point and the predicted point
9:     if then
10:
11:    else
12:      if then
13:
14:      else
15:        Keep the current values of , z, and v unchanged
16:      end if
17:    end if
18:  end for
19:
20:
21: end for
22: return E
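The matching procedure described in the text can be sketched in Python as below. This is a simplified reading rather than a line-by-line transcription of Algorithm 1: the function name, the list-based data layout, and the early exit when no predicted points remain are assumptions.

```python
import math


def order_insensitive_match(gt_points, pred_points, pred_probs):
    """Greedily pair each ground truth point with its nearest unused predicted point.

    Returns a list of tuples (gt_point, pred_point, existence_probability).
    """
    remaining = list(range(len(pred_points)))  # indices of predictions not yet selected
    E = []
    for g in gt_points:
        if not remaining:
            break  # assumed stopping condition: no predicted points left
        # z is the index of the nearest remaining predicted point (Euclidean distance)
        z = min(remaining, key=lambda i: math.dist(g, pred_points[i]))
        v = pred_probs[z]
        E.append((g, pred_points[z], v))
        remaining.remove(z)  # avoid selecting the same predicted point twice
    return E


if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
    pred = [(1.1, 1.0), (0.1, 0.0), (0.9, 0.1)]
    probs = [0.9, 0.8, 0.7]
    for item in order_insensitive_match(gt, pred, probs):
        print(item)
```

The greedy selection depends on the order in which ground truth points are visited, but it never depends on the order in which the predicted points are emitted, which is the property the loss relies on.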
In the cls branch, we adopted a convolution module with
kernels to predict the probability
of the object categories. The prediction process is formulated as follows:
We used non-maximum suppression (NMS) to filter out redundant boxes and the binary cross-entropy loss function to calculate the object classification loss. The classification loss is defined as follows:
where y and
represent the ground truth label probability and the predicted box probability, respectively. Combining the three types of losses, we calculated the overall loss L as follows:
where N is the number of samples in the training set and j represents the sample index.
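For reference, the standard binary cross-entropy and an average over N samples can be sketched in Python as follows. The clamping constant `eps`, the function names, and the equal weighting of the per-sample loss terms are assumptions; the weights used in the overall loss are not stated above.

```python
import math


def binary_cross_entropy(y, y_hat, eps=1e-7):
    """Standard BCE for one label y in [0, 1] and one predicted probability y_hat."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(y_hat) + (1.0 - y) * math.log(1.0 - y_hat))


def overall_loss(per_sample_terms):
    """Average over N samples of the sum of each sample's loss terms (equal weights assumed).

    per_sample_terms is a list with one entry per sample j; each entry is a
    sequence holding that sample's individual loss values.
    """
    n = len(per_sample_terms)
    return sum(sum(terms) for terms in per_sample_terms) / n


if __name__ == "__main__":
    cls_loss = binary_cross_entropy(1.0, 0.9)
    print(cls_loss)
    # Two samples, each with three hypothetical loss terms.
    print(overall_loss([(0.2, 0.1, cls_loss), (0.4, 0.3, 0.05)]))
```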
We developed Poly-YOLOv8 using the PyTorch framework, which automatically computes the gradients of the loss with respect to the model parameters and updates the parameters via gradient descent. Our model employed an improved polygon detection head that effectively detects pest-infected regions with varying shapes. Additionally, we designed an order-insensitive loss calculation method to prevent the model from overfitting to vertex order. This enabled the model to focus on learning the intrinsic shape features of pest-infected areas.