Technical Note

Storage Tank Target Detection for Large-Scale Remote Sensing Images Based on YOLOv7-OT

1 Department of Oceanography and Space Informatics, China University of Petroleum, Qingdao 266580, China
2 Department of Technology, Sinopec Shengli Oilfield Company Technology Inspection Center, Dongying 257000, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(23), 4510; https://doi.org/10.3390/rs16234510
Submission received: 8 October 2024 / Revised: 20 November 2024 / Accepted: 29 November 2024 / Published: 1 December 2024

Abstract

Since industrialization, global greenhouse gas emissions have risen steadily. Storage tanks, as industrial facilities for storing fossil energy, are one of the main sources of greenhouse gas emissions. Detecting and locating storage tank targets in remote sensing images over a large area can provide data support for regional air pollution prevention, control, and monitoring. Because circular landforms and circular traces of human activity closely resemble tanks, target detection models suffer a high false detection rate on large-scale remote sensing images. To address these problems, the YOLOv7-OT model for tank target detection in large-scale remote sensing images is proposed. The model introduces an edge re-stitching pre-processing method for large-scale remote sensing images, which reduces the target loss caused by image edges without discarding target information. In addition, to improve small target detection, the CBAM is added to the YOLOv7 backbone network, raising detection accuracy under complex backgrounds. Finally, to counter target misjudgments during detection, a post-processing method based on the spatial distribution characteristics of tanks is proposed to eliminate misdetected targets. Evaluated on a self-built large-scale remote sensing dataset, the model reached a detection accuracy (mAP@0.5) of 0.90 and a precision rate of 95.9%. Its precision rate and detection accuracy exceed those of three classic target detection models.

1. Introduction

With the acceleration of industrialization and the increasing frequency of human activities, global greenhouse gas emissions have grown steadily, profoundly affecting the global climate system. The main greenhouse gases, including carbon dioxide, methane, and nitrogen oxides, have risen in atmospheric concentration year by year, intensifying the Earth's greenhouse effect and leading to serious consequences such as rising global average temperatures, frequent extreme weather events, and rising sea levels [1]. Climate change is a severe global challenge and poses a serious threat to the development and survival of human society; addressing it has become a global consensus [2]. Since 2019, prompted by the Intergovernmental Panel on Climate Change's Special Report on Global Warming of 1.5 °C, the world's major economies have proposed zero-emission or carbon neutrality targets suited to their national conditions [3].
Faced with this severe challenge, achieving the goals of carbon peak and carbon neutrality and strengthening the monitoring, reporting, and verification of greenhouse gases are particularly important [4]. Methane is an important climate driver: its annual average atmospheric concentration reached 1895.32 ppb in 2021, about three times the pre-industrial level, and its greenhouse effect can reach 28 times that of carbon dioxide on a century time scale [5]. Compared with chemically stable, long-lived carbon dioxide, controlling methane emissions has a more immediate effect on achieving carbon neutrality goals and mitigating global warming.
Methane sources include both anthropogenic and natural emissions, with anthropogenic emissions accounting for about 60%; among these, fossil fuel use and related industries account for about 35% [6,7]. Storage tanks are among the most common facilities for storing oil and are widely used in oil and gas extraction, refining, transportation, storage, and other stages. Within petroleum and gas processing stations, storage tanks are the primary source of methane emissions, contributing 94.1% of total emissions [8]. Therefore, identifying and locating storage tanks over a large area will help strengthen the management and monitoring of methane emission sources and provide data support for reducing methane emissions.
Remote sensing image target detection refers to the process of using remote sensing technology to obtain surface image data and automatically or semi-automatically detect and classify ground targets [9,10]. Compared with traditional target detection technology, remote sensing images have their own unique characteristics, which are specifically manifested in the following points [11,12,13]:
  • Scale diversity: The size of similar ground targets varies greatly; for example, the diameter of a small industrial tank is only a few meters, while that of a large tank can reach tens of meters.
  • Perspective specificity: Compared with natural images, remote sensing images have a single imaging perspective, less available information, and uncertain target direction.
  • Small target problem: Remote sensing images cover a large area, so the target to be detected is small relative to the image as a whole, often occupying only a few dozen pixels.
  • High background complexity: Since remote sensing images cover a large area and have a single perspective, the field of view will contain a lot of background information, which will strongly interfere with target detection.
In view of the particularity of remote sensing images, target detection accuracy can be improved both by processing the remote sensing images and by improving the detection models. Y. Wang et al. [14] used a super-resolution generative adversarial network to dehaze and super-resolve low-resolution remote sensing images acquired under foggy conditions, obtaining higher quality imagery. Shi et al. [15] integrated a shallow feature fusion module, a shallow feature enhancement module, and a deep feature enhancement module into the original multi-scale single-shot network model, improving its average detection accuracy on remote sensing images. E. Basaeed et al. [16] designed an integrated image fusion framework to synthesize multi-source remote sensing satellite images and identify and locate targets in them. Shu et al. [17] proposed an automatic detection method for remote sensing image targets based on multi-temporal detection theory; the method comprises four modules (automatic segmentation, extraction, processing, and detection), achieves good experimental results, and has high application value.
Storage tanks are among the common remote sensing image targets. They are widely distributed in suburban areas and energy-industry clusters, and their sizes vary significantly with their uses. Using remote sensing images to identify and locate storage tanks makes it possible to quickly survey their distribution over a large area, providing a basis for the prevention and control of potential safety hazards such as gas leaks and data support for investigating gas pollutant emission sources in the region. Storage tanks appear as approximate circles in remote sensing images, so tank information can be obtained by constructing a detection model for circular buildings; however, this approach requires relatively simple backgrounds and cannot rule out interference from other circular buildings [18,19]. Therefore, building tank detection models with deep learning methods is one of the current mainstream approaches. Li et al. [20] compared the performance of different deep learning-based object detection models on oil tank remote sensing images and concluded that R-FCN achieved the best average accuracy. Zhu et al. [21] combined the SSD model with the Hough transform to detect industrial storage tanks in cities, achieving good detection accuracy. Yu et al. [22] analyzed the YOLOv4 algorithm, added a new feature scale layer to the network structure, and embedded the SE module, improving the detection performance of YOLOv4 for typical targets in remote sensing images. Li et al. [23] proposed the TCS-YOLO model, which optimizes the YOLOv5 structure with a C3TR layer, adds an attention mechanism to the network, and replaces CIoU with SIoU; the model performed well in parameter count, average detection accuracy, and inference speed. Sun et al. [24] introduced a coordinate attention mechanism into YOLOv5 and added a small target detection head to improve the detection accuracy of multi-scale tank targets. These studies show that most scholars focus on improving the detection rate of small targets: by enhancing the network model, detection of multi-scale image samples improves. However, few address the detection of storage tank targets under complex backgrounds in large-scale remote sensing images.
In storage tank detection for large-scale remote sensing images, numerous circular terrains and traces of human activity often cause the model to misidentify them as storage tanks, leading to significant deviations in tank statistics within a given range. Combining the characteristics of large-scale remote sensing images with the inherent distribution characteristics of tanks, this paper proposes the YOLOv7-OT model, which addresses the misidentification of non-tank targets in this scenario. Experiments demonstrate that the model effectively improves the detection accuracy of storage tanks in such images.
The proposed model improves detection of a specific remote sensing target, storage tanks, in three stages: pre-detection, mid-detection, and post-detection. First, when a target appears at the edge of an image, the image is re-spliced so that the target sits completely in the middle, eliminating the target loss caused by edge cutting. Second, the CBAM is added to the network structure, and the model is trained on a self-built dataset to improve detection accuracy. Finally, exploiting the clustering behaviour of the targets themselves, outlier targets are excluded to screen out false detections. These improvements raise the model's detection accuracy for storage tanks and its precision rate when observing large-scale remote sensing images.
The rest of this paper is organized as follows: Section 2 briefly introduces the selected data. Section 3 describes the YOLOv7-OT model in detail. Section 4 gives the experimental results and discussions. Finally, Section 5 concludes this paper.

2. Data and Model

2.1. Dataset

In order to obtain a good target detection model that can detect a certain type or several types of targets, it is necessary to train the target detection model with a dataset containing detection information to obtain targeted detection results. The dataset usually includes images and the location information of objects in the images, which is used to train the model to detect the feature points of the target and thus complete the target detection task.
Existing public remote sensing datasets usually include multiple typical remote sensing targets, such as storage tanks, airplanes, ships, and sports fields. Although such general datasets are large, the amount of tank target data they contain is relatively small, so using any single dataset may provide too little data to train a good model. In addition, because a single dataset tends to contain similar targets against similar backgrounds, a model trained on it may generalize poorly.
Therefore, this study employs a hybrid remote sensing dataset to train the model. We screen and extract the label files containing tank information and obtain the corresponding images, ultimately forming the dataset used in this research. This hybrid dataset comprises tank targets from several classic remote sensing datasets, namely the DIOR, NWPU-RESISC45, NWPU-VHR-10, and TGRS-HRRSD datasets, together with some self-built data. The basic details of the hybrid dataset are shown in Table 1.
The DIOR dataset [25] is a large-scale benchmark for object detection in optical remote sensing images, consisting of 23,463 images covering 20 object classes. The NWPU-RESISC45 dataset [26], created by Northwestern Polytechnical University, is a commonly used remote sensing image dataset comprising 31,500 images across 45 scene categories, with 700 images per category. The NWPU-VHR-10 dataset [27] contains 800 very-high-resolution optical remote sensing images, including 715 color images obtained from Google Earth with spatial resolutions from 0.5 to 2 m. The TGRS-HRRSD dataset [28], released by the University of the Chinese Academy of Sciences in 2019, includes 21,761 images obtained from Google Earth and Baidu Maps, containing 55,740 target instances in 13 categories with approximately 4000 targets per category. The images in the self-built dataset were obtained from Google Earth and cover storage tanks in multiple industrial areas worldwide; labeling software was used to annotate the objects and generate the corresponding label files.
This study collected and sorted the five mentioned datasets, screening out images containing storage tanks. We formed a remote sensing image dataset of storage tank targets under complex backgrounds. Using a hybrid dataset composed of multiple sources can improve the robustness of model training results while ensuring data diversity.
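As an illustration of this screening step, the sketch below filters YOLO-format label files down to the storage tank class for one source dataset. It is a minimal example under assumed conventions, not the authors' actual tooling: the directory layout, image extension, and the tank class index are illustrative and must be adapted to each source dataset's annotation format.

```python
import shutil
from pathlib import Path

# Hypothetical paths and class index: adjust for each source dataset.
SRC_LABELS = Path("DIOR/labels")   # one YOLO-format .txt per image
SRC_IMAGES = Path("DIOR/images")
DST = Path("hybrid_tank_dataset")
TANK_CLASS = 11                    # illustrative index of "storage tank"

(DST / "images").mkdir(parents=True, exist_ok=True)
(DST / "labels").mkdir(parents=True, exist_ok=True)

for label_file in SRC_LABELS.glob("*.txt"):
    lines = label_file.read_text().splitlines()
    # Keep only storage-tank boxes; remap their class index to 0.
    tanks = [" ".join(["0"] + line.split()[1:]) for line in lines
             if line.split() and int(line.split()[0]) == TANK_CLASS]
    if not tanks:
        continue  # image contains no tanks; skip it
    (DST / "labels" / label_file.name).write_text("\n".join(tanks) + "\n")
    img = SRC_IMAGES / (label_file.stem + ".jpg")
    if img.exists():
        shutil.copy(img, DST / "images" / img.name)
```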

2.2. Model

With the development of deep learning in recent years, more and more deep learning models have been applied across fields. In remote sensing image detection, the traditional approach screens the geometric features of the image [29,30]; it has low detection accuracy and high requirements on data quality. Applying deep learning-based target detection models to remote sensing imagery lets the model learn from large amounts of sample data and extract target features, reducing background interference and improving the speed and accuracy of target detection.
The YOLOv7 model [31,32,33] used in this paper is a high-performance single-stage target detection model characterized by fast detection, high accuracy, and ease of training and deployment, which makes it well suited to large-scale remote sensing image detection. The input image size of the YOLOv7 model is 640 × 640. It uses mosaic data augmentation, adaptive anchor box calculation, and adaptive image scaling to process the input, enriching the dataset and making the network more robust. Its backbone innovatively adopts the ELAN structure, which controls the shortest and longest gradient paths and continuously strengthens the network's learning ability without destroying the original gradient path, so that deeper networks can learn and converge effectively. The model also introduces an auxiliary training head, trading additional training time for higher detection accuracy.

3. Method

Traditional target detection research primarily focuses on improving the model [34,35,36]. Targeted enhancements can increase detection accuracy and efficiency for specific targets. However, in the scenario of large-scale remote sensing images, most images contain areas without targets or with very few targets. Improving target detection accuracy in such areas and reducing the likelihood of misidentifying other objects as targets are the focus of this study.
To address this, we propose a three-stage target detection method for storage tank remote sensing images, enhancing the detection process before, during, and after detection. Given that remote sensing images cover large areas and have scattered targets, pre-processing the images before feeding them into the detection model helps in better target detection. Furthermore, due to the single perspective and complex backgrounds of remote sensing images, there is a tendency to miss targets and misjudge them. Incorporating post-detection processing improves accuracy and minimizes background interference. The structure of the YOLOv7-OT model is shown in Figure 1.

3.1. Pre-Detection Stage

When a large-scale high-resolution remote sensing image is fed directly into a target detection model, the input is distorted, and the target information contained in the image is compressed and lost.
To solve this problem, the model adopts an image pre-processing method of edge re-stitching for the remote sensing images input into the model. By re-stitching the target image, the edge error caused by cropping is eliminated to a certain extent. A high-resolution remote sensing image is first cropped into multiple 256 × 256 images; to distinguish them from stitched images, this study calls these cropped 256 × 256 images "meta-images". Four meta-images are then spliced into a group, yielding a 512 × 512 image that is input into the target detection model. Since YOLOv7 resizes its input to 640 × 640, the spliced image ensures that the coverage area is not too small and avoids the loss of small targets caused by image compression.
As shown in Figure 2a, the red box represents a 512 × 512 image composed of four meta-images. Performing target detection on this original stitched image yields the position of any storage tank targets it contains. When a target appears at the edge of the image, the edge re-stitching method re-stitches using the meta-images adjacent to the original stitched image, ensuring that the edge target is complete in the new image and unaffected by the edge error. The image in the dotted box in Figure 2b shows the result after re-stitching. The detection results of the new image are compared with those of the original image: if they are the same, the target is considered accurately identified; if they differ, the result with the more complete detection frame prevails.
This method cuts a larger area into multiple 512 × 512 grids, with no duplication between grids. If an object appears on the edge of a grid, the edge re-stitching method redraws the grid and places the edge object in the middle of the grid to avoid edge errors caused by image cutting.
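To make the procedure concrete, the sketch below tiles a scene into non-overlapping 512 × 512 grids of four meta-images and, whenever a detection touches a grid border, re-stitches a window of neighbouring meta-images centred on that target and detects again. This is a simplified reading of the method, not the authors' code: the `detect` function is a placeholder for the trained model, and merging of duplicate boxes (e.g., by non-maximum suppression) is left to the caller.

```python
import numpy as np

META = 256                    # meta-image side length (pixels)
GRID = 2 * META               # one grid = 2 x 2 meta-images = 512 x 512

def detect(tile):
    """Placeholder for the trained detector: returns a list of
    (x1, y1, x2, y2, conf) boxes in tile-local pixel coordinates."""
    raise NotImplementedError

def touches_border(box, size=GRID, margin=4):
    x1, y1, x2, y2, _ = box
    return min(x1, y1) < margin or max(x2, y2) > size - margin

def detect_with_restitch(scene):
    """scene: H x W x 3 array whose sides are multiples of GRID."""
    h, w = scene.shape[:2]
    detections = []
    for gy in range(0, h, GRID):                      # non-overlapping grids
        for gx in range(0, w, GRID):
            for box in detect(scene[gy:gy + GRID, gx:gx + GRID]):
                x1, y1, x2, y2, conf = box
                if not touches_border(box):
                    detections.append((gx + x1, gy + y1, gx + x2, gy + y2, conf))
                    continue
                # Edge target: re-stitch neighbouring meta-images so the
                # target lands near the middle of a new 512 x 512 window.
                cx, cy = gx + (x1 + x2) / 2, gy + (y1 + y2) / 2
                nx = int(np.clip(round((cx - GRID / 2) / META), 0, w // META - 2)) * META
                ny = int(np.clip(round((cy - GRID / 2) / META), 0, h // META - 2)) * META
                for rx1, ry1, rx2, ry2, rconf in detect(scene[ny:ny + GRID, nx:nx + GRID]):
                    detections.append((nx + rx1, ny + ry1, nx + rx2, ny + ry2, rconf))
    return detections  # overlapping duplicates can be merged afterwards with NMS
```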

3.2. Mid-Detection Stage

Due to the particularity of remote sensing images, small targets contain little information, often covering only a dozen or so pixels, which requires improving the model's detection and extraction of effective features.
The Convolutional Block Attention Module (CBAM) [37] is a lightweight and high-performance module with small parameters and computational complexity, significantly enhancing the target feature extraction capability of the backbone network. The CBAM consists of two modules: the Channel Attention Module (CAM) and the Spatial Attention Module (SAM), as shown in Figure 3a. The CAM allows the model to focus on more meaningful features, while the SAM helps the CAM focus on the location of these features. The CAM uses two types of pooling to concentrate on the important information of each channel and ignore the secondary information. The SAM then performs pooling on each channel to capture the crucial information. The combination of these two modules effectively extracts target features while reducing the weight of secondary features. Consequently, adding the attention mechanism module to the YOLOv7 model enhances the network’s ability to extract target features, thereby improving its ability to detect small targets.
In this study, we incorporated the CBAM after the ELAN module in the backbone network. The ELAN module learns numerous features in different dimensions by controlling the shortest and longest gradient paths. When these features are input into the CBAM, it quickly focuses on the important features of different channels and reduces the weight of secondary features, thereby filtering out more effective feature information and inputting this new information into the subsequent target detection head. By adding the CBAM, the backbone network gains better control over the feature information of the storage tank target, improving the precision and recall rate of target detection.
This paper couples the CBAM with the YOLOv7 backbone network, adding four CBAMs to the backbone as shown in Figure 3b. The CBAM lets the model focus on the deep features of the target and extract usable feature information from the limited pixels of small targets, enhancing the model's accuracy on small targets. The module also accelerates convergence during training and improves the average detection accuracy and recall rate, achieving good results in target detection for remote sensing images.
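For reference, the following is a minimal PyTorch sketch of the CBAM, following the channel-then-spatial design of Woo et al. [37]. The reduction ratio and spatial kernel size are the defaults from [37]; the paper does not report the exact hyperparameters used in YOLOv7-OT, so these values are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))      # average-pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))       # max-pooling branch
        return torch.sigmoid(avg + mx).view(b, c, 1, 1) * x

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)       # per-pixel channel mean
        mx = x.amax(dim=1, keepdim=True)        # per-pixel channel max
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1))) * x

class CBAM(nn.Module):
    """Channel attention first, then spatial attention, as in [37]."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))
```

In YOLOv7-OT, one such block is placed after each of the four ELAN outputs in the backbone, as shown in Figure 3b.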

3.3. Post-Detection Stage

In the detection of storage tank targets in large-scale remote sensing images, the image background contains a large number of circular buildings and circular traces of human activity. Their characteristics are extremely similar to those of storage tanks, causing the target detection model to misjudge and produce many false positive targets. In industrial production, it is uncommon to build just one storage tank; typically, multiple tanks are clustered in a designated area, concentrating many tanks in one location. Conversely, suspicious targets are usually scattered over large areas, such as mountainous regions or densely built urban zones, far from the actual storage tank targets in industrial areas. These suspicious targets are not as spatially concentrated as storage tanks but are dispersed over a larger range.
In view of the spatial distribution characteristics of storage tanks, this paper proposes a target screening method for the post-processing of detection results. This method used the spatial distribution characteristics of the storage tanks themselves to screen and eliminate targets that were incorrectly detected by the model, thereby improving the precision of target detection in large-scale remote sensing images. The method is shown in Figure 4.
Due to the unique spatial distribution characteristics of storage tanks, they are rarely found in isolation; rather, multiple tanks of similar size are concentrated in the same area. The misjudged targets are mostly circular-like buildings and traces of human activity, which are typically more dispersed and occur independently in mountainous or urban areas, far from the industrial zones where real storage tanks are located. Therefore, this paper counts the targets detected in adjacent areas, records their spatial distribution, and establishes a target aggregation index S to characterize the degree of aggregation between a target and the other targets. By evaluating the concentration of identified targets within the current area, we can determine whether a target is a misjudgment, i.e., a false positive. The formulas are shown below:
$$S = \frac{T_k}{k} \sum_{i=1}^{k} \frac{1}{D(x_i, y_i)}, \quad k \neq 0 \qquad (1)$$
$$T_k = \mathrm{sigmoid}(k - C) \qquad (2)$$
$$D(x, y) = \sqrt{x^2 + y^2} \qquad (3)$$
In the formulas, $x_i$ and $y_i$ express the displacement from the i-th other target in the area to the current target; k is the total number of other targets in the area; and C is a quantitative constant used to scale the number of targets in the area.
The model processed the image containing location and category information through the spatial clustering screening method. This method used the identified storage tank target as the center, counted other tank targets within a certain range, and calculated the clustering value of the target based on the location information of the other targets. Each target was assigned a clustering value. By setting the threshold S′, the targets in the area were screened, the targets with obvious clustering were judged as correct targets, and the dispersed targets were judged as misjudgments and excluded.
By filtering the target detection results through the above method, the probability of model misjudgment was greatly reduced, thereby improving the precision rate in large-scale remote sensing image detection. However, it should be noted that the spatial clustering screening method utilized the clustering information of the tanks to screen the targets, only recognizing tanks that were clustered together as correct tanks. If there are single or a few tanks that are relatively independent in space, meaning that there are no other targets around them within a certain range, these tanks will be judged as misidentifications and excluded. The spatial clustering screening method sacrificed part of the recall rate to enhance the precision rate in large-scale remote sensing image application scenarios.
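The sketch below applies this screening, assuming the reconstructed forms of Formulas (1)–(3) above. The neighbourhood radius, the constant C, and the threshold S′ are illustrative values; the paper does not report the values actually used.

```python
import math

def aggregation_scores(targets, radius=2000.0, C=3.0):
    """targets: list of (x, y) tank-centre coordinates (e.g., in metres).
    Returns one aggregation index S per target, following Formulas (1)-(3)."""
    scores = []
    for i, (xi, yi) in enumerate(targets):
        # Distances to every other detected target within the neighbourhood.
        dists = [math.hypot(xj - xi, yj - yi)
                 for j, (xj, yj) in enumerate(targets) if j != i]
        dists = [d for d in dists if 0 < d <= radius]
        k = len(dists)
        if k == 0:
            scores.append(0.0)      # isolated target: lowest aggregation
            continue
        t_k = 1.0 / (1.0 + math.exp(-(k - C)))   # sigmoid weighting by count
        scores.append(t_k / k * sum(1.0 / d for d in dists))
    return scores

def screen(targets, threshold=1e-3):
    """Keep only targets whose aggregation index exceeds the threshold S'."""
    s = aggregation_scores(targets)
    return [t for t, si in zip(targets, s) if si > threshold]
```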

4. Result and Discussion

4.1. Model Evaluation Indicators

After completing the model improvements, we evaluated the target detection network using four indicators: the precision rate (P), the recall rate (R), the P-R curve, and the mean Average Precision (mAP). We designed experiments to verify the performance of the YOLOv7-OT model on storage tank targets in remote sensing images.
Mean Average Precision (mAP) is an indicator of target detection model performance. It is the average of the Average Precision (AP) values calculated at different Intersection over Union (IoU) thresholds, where AP is the area under the Precision-Recall curve. Precision indicates the probability that the model's positive predictions are correct, and Recall indicates the proportion of real targets the model covers. The higher the mAP, the stronger the model's detection ability. The calculation formula of mAP is shown in Formula (4), where n is the number of target categories and $AP_i$ is the AP of the i-th category.
$$mAP = \frac{1}{n} \sum_{i=1}^{n} AP_i \qquad (4)$$
Following the standard formulation of the binary classification problem, each prediction can be categorized by comparing the model's predicted label against the actual ground truth, and model performance can be calculated from the counts of each category. The classification criteria are shown in Table 2.
The four parameters in the table are represented as follows:
  • True Positive (TP): The model predicts positive, and the actual situation is also positive, indicating that the algorithm prediction result is correct.
  • False Positive (FP): The model predicts positive, and the actual situation is negative, indicating that the algorithm prediction result is incorrect.
  • True Negative (TN): The model predicts negative, and the actual situation is also negative, indicating that the algorithm prediction result is correct.
  • False Negative (FN): The model predicts negative, and the actual situation is positive, indicating that the algorithm prediction result is incorrect.
Based on the definitions of these four parameters, we can calculate the expressions for the three indicators: precision, recall, and F1-score.
$$Precision = \frac{TP}{TP + FP} = \frac{TP}{\text{All predicted true samples}} \qquad (5)$$
$$Recall = \frac{TP}{TP + FN} = \frac{TP}{\text{All actual true samples}} \qquad (6)$$
$$F1\text{-}score = \frac{2 \times Precision \times Recall}{Precision + Recall} \qquad (7)$$
The F1-score is the harmonic mean of precision and recall, used to measure the accuracy of both precision and recall in binary classification models and other classification models. It takes into account the accuracy of model detection as reflected by the precision and recall values, providing a comprehensive evaluation of model performance, especially when there is a disparity between the two indicators.
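To make the computation concrete, a minimal Python sketch of Formulas (5)–(7) is given below; the confusion counts are illustrative, not results from this study.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1-score from confusion counts (Formulas (5)-(7))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts: 89 correct tanks, 11 false alarms, 13 misses.
p, r, f1 = precision_recall_f1(tp=89, fp=11, fn=13)
print(f"P={p:.3f}  R={r:.3f}  F1={f1:.3f}")   # P=0.890  R=0.873  F1=0.881
```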
The P-R curve is a method to evaluate the performance of a binary classification model, which reflects the relationship between precision and recall. The precision rate indicates how many samples predicted by the model as positive are actually positive, and the recall rate indicates the proportion of samples predicted by the model as positive to all true positives. The horizontal axis of the P-R curve is the recall rate, and the vertical axis is the precision rate. Each point on the curve corresponds to a different classification threshold. The closer the P-R curve is to the upper right corner, the higher the precision and recall rate of the model, and the better the performance.

4.2. Experimental Results

In the experiment, we selected 3568 remote sensing images and split the data into three parts: the training set, test set, and validation set in a ratio of 7:2:1. The training set contains 2498 images, the test set contains 714 images, and the validation set contains 356 images.
The comparison with the YOLOv7 model is shown in Figure 5. The left panel shows the result of training the YOLOv7 model on the fused remote sensing dataset, and the right panel shows the result of training the YOLOv7-OT model on the same dataset. The YOLOv7 model converges in about 400 epochs, with an mAP@0.5 of about 0.87 and precision and recall rates of 0.85 and 0.78, respectively; the YOLOv7-OT model with the CBAM converges in 100 epochs, with an mAP@0.5 of 0.90 and precision and recall rates of 0.89 and 0.87, respectively. Thus the improved YOLOv7-OT model raises the precision, recall, and mAP@0.5 by 0.04, 0.09, and 0.03, respectively, while reducing the number of epochs to convergence by 75%. The model therefore shows a significant improvement in the detection of storage tanks in remote sensing images.
As shown in Figure 6, in terms of the performance of the P-R curve, the mAP@0.5 value of the YOLOv7-OT model is higher than that of the unimproved YOLOv7 model, reaching 0.90, indicating that the improved YOLOv7-OT model has improved the accuracy of tank detection compared with the original model.
To further analyze the detection performance of the YOLOv7-OT model in storage tank remote sensing images, we conducted a comparative experiment evaluating the YOLOv7-OT model, Faster R-CNN, YOLOv7, and YOLOv10 models based on precision, recall, mAP@0.5, F1-score, and average detection speed. The experimental results are shown in Table 3.
The data in Table 3 show that the YOLOv7-OT model improves on the original YOLOv7 model in all four accuracy indicators, confirming the effectiveness of the enhancements for identifying storage tanks. The improved YOLOv7-OT model matches the YOLOv10 model in precision and mAP@0.5, trails it slightly in recall and F1-score, and achieves the highest average detection speed of the four models. The experiments demonstrate that the YOLOv7-OT model's performance in remote sensing image storage tank detection is on par with, if not superior to, the other target detection models, with good performance across all indicators.

4.3. Model Application

To explore the performance of the YOLOv7-OT model on large-scale high-resolution remote sensing images, this study selected high-resolution remote sensing image data within the range of 118.169°–119.102° east longitude and 36.905°–38.145° north latitude. This area covers approximately 8257 square kilometers, with a resolution of 0.5 m, sourced from Google Earth. The area is a significant oil and gas industry cluster near the ocean, encompassing large areas of farmland, wetlands, plains, a few hills, and diverse surface types. Additionally, the area includes a major city, numerous man-made facilities, and signs of human activity. The remote sensing images in this region are extensive, with complex backgrounds and numerous targets, making them highly representative. This study segments the remote sensing images within this range into six areas based roughly on surface features, resulting in a total of 575,675 meta-images, each sized 256 × 256 pixels. The coverage area is shown in Figure 7.
This paper applies YOLOv7-OT and the other target detection models to the meta-images of the six regions. For each region, we count the detected targets and the grids containing them, and manually count the true targets and their grids. The precision of each region is then calculated from these statistics using Formula (8).
$$P = \frac{TP}{TP + FP} \qquad (8)$$
In the formula, P represents the precision rate, TP the number of correctly detected tank targets, and FP the number of incorrectly detected tank targets. The statistical results and precision rates are shown in Table 4.
Analyzing the data in the table, we find that all target detection models perform poorly in Area II and that, compared with the other models, the YOLOv7-OT model performs better on the target precision index.
After inspecting the incorrectly predicted samples, we believe the low precision may stem from the large number of wetlands and traces of human activity in the area and its complex background. An example of an incorrect prediction is shown in Figure 8. The incorrect targets are mostly circular terrain and traces of human activity, which bear a certain similarity to the correct targets. Through the improved pre-detection data processing, the YOLOv7-OT model can reduce target misjudgments caused by edge errors to a certain extent, thereby improving accuracy.
To explore the behaviour of tank targets under the target aggregation index S, that is, whether there is a degree of aggregation between a target and the other targets, this paper counts the grids containing the targets returned by model detection and calculates the grid-level precision rate according to Formula (9).
$$P_{Grid} = \frac{TP_{Grid}}{TP_{Grid} + FP_{Grid}} \qquad (9)$$
where $P_{Grid}$ represents the grid precision rate, $TP_{Grid}$ the number of grids in which tank targets are correctly detected, and $FP_{Grid}$ the number of grids in which tank targets are incorrectly detected. A grid is an area formed by splicing four meta-images of size 256 × 256, with no duplication between grids (see the pre-detection stage in Section 3.1). The statistical results and precision rates are shown in Table 5.
Analyzing the results in Table 5, Area II again scores lower than the other areas in grid precision. Comparing the models on this index, the YOLOv7-OT model achieves a higher overall grid precision than the other target detection models.
In large-scale detection, many isolated false targets appear. False targets are generally circular natural landforms or traces of human activity. Because such targets are formed by chance, they exhibit no clustering effect, so they tend to lie far from other targets and have low target aggregation. The target detection model nevertheless reports them as correct during detection, and the grids containing them enter the statistics, producing a large number of false grids.
The YOLOv7-OT model uses an improved post-processing method to detect isolated targets in the grid and eliminate targets with low aggregation. It can effectively exclude the grids where isolated targets are located, thereby reducing the number of erroneous grids.
Combining the data from Table 4 and Table 5, the YOLOv7-OT model demonstrates superior performance in recognizing storage tank targets in large-scale remote sensing images compared to other target detection models. Its precision is also higher than that of the newer YOLOv10 model. According to Formula (8), precision is primarily determined by the TP and FP parameters.
In the experiments, the YOLOv7-OT and YOLOv10 models were found to have the same detection rate for small targets: both detect the correctly identified small targets and differ only in confidence, so their TP counts are nearly identical. In terms of misidentification, however, the YOLOv10 model's stronger feature extraction capability allows it to extract more secondary features; this does not enhance the importance of the main features, but it makes YOLOv10 more likely to misidentify suspected storage tank targets as actual tanks, increasing FP and decreasing precision.
At the same time, this paper conducts an ablation experiment on the YOLOv7-OT model to verify the rationality of the improvements in the three stages: pre-detection, mid-detection, and post-detection. The experimental results are shown in Table 6, where "√" denotes that the improvement is used and "--" that it is not.
From the data in Table 6, adding the mid-detection improvement raises the model's mAP@0.5 by 0.03. In terms of the precision rate of large-scale detection, the pre-detection and mid-detection improvements raise the original model by 3.1% and 3.8%, respectively, while the post-detection improvement raises it by a large margin, 12.9%. Using all three methods together improves the precision rate of large-scale detection by 16.1%, demonstrating that the three methods are effective in improving model performance.
In summary, when YOLOv7 is used to observe large-scale high-resolution remote sensing images, the precision rate is about 79.8% and the detection accuracy (mAP@0.5) is 0.87; with the YOLOv7-OT model, the precision rate rises to 95.9% and the detection accuracy reaches 0.90. As shown in Figure 9, comparing the two upper images, the YOLOv7-OT model detects small targets that the YOLOv7 model misses, and its confidence in correct targets is generally higher; comparing the two lower images, some circular buildings or circular terrains are excluded by the YOLOv7-OT model. The experiments show that the model effectively improves the accuracy of tank detection in high-resolution remote sensing images, improves the precision rate when observing large-scale remote sensing images, and provides scientific support for locating tank targets and mapping the spatial distribution of major methane emission sources.

5. Conclusions

This paper proposes the YOLOv7-OT target detection model for detecting tank information in high-resolution remote sensing images. Based on the YOLOv7 network structure, the model is improved in three stages: pre-detection, mid-detection, and post-detection. In the pre-detection stage, background interference from image edges is reduced by cropping and re-stitching the large-scale high-resolution remote sensing images. In the mid-detection stage, the CBAM is added to the YOLOv7 network to improve convergence speed and detection accuracy. In the post-detection stage, the aggregation degree of each detected target is evaluated, and a threshold is set to eliminate outliers. Finally, a comparative experiment was carried out on a large-scale high-resolution remote sensing image. The results show that the YOLOv7-OT model effectively improves detection accuracy and precision.
This study still has some shortcomings; in particular, small target detection has not been fully solved. Future work can proceed in two directions: first, improving other target detection models specifically for complex backgrounds and small target detection; second, extending the use of the detection results by collecting statistics on the size and geographical location of detected targets and analyzing them jointly with other environmental factors.

Author Contributions

Conceptualization, Y.W. and Z.Z.; methodology, Y.W.; software, Z.Z., L.F. and Y.L.; validation, Z.Z. and Y.L.; formal analysis, P.R.; investigation, Y.W. and L.L.; resources, Y.W.; data curation, Z.Z.; writing—original draft preparation, Z.Z.; writing—review and editing, Y.W., Z.Z. and P.R.; visualization, Z.Z. and Y.L.; supervision, L.L. and Y.D.; project administration, Y.D.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

Author Lu Fan is employed by Sinopec Shengli Oilfield Company Technology Inspection Center. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Guo, Q.; Xi, X.; Yang, S.; Cai, M. Technology strategies to achieve carbon peak and carbon neutrality for China’s metal mines. Int. J. Miner. Metall. Mater. 2022, 29, 626–634. [Google Scholar] [CrossRef]
  2. Zeng, Y.; Wang, X.M.; Tang, H. The Scientific Connotation, Realization Path and Challenges of Carbon Neutral Strategy of Carbon Dafeng. Mod. Chem. 2022, 42, 1–4. [Google Scholar]
  3. Xiao, L.L. China’s Summit Diplomacy and National Green Strategy Capacity Building in the Context of Carbon Neutrality. J. Nanjing Univ. Sci. Technol. 2023, 36, 7–15. [Google Scholar]
  4. Shi, T.; Han, G.; Ma, X.; Mao, H.; Chen, C.; Han, Z.; Pei, Z.; Zhang, H.; Li, S.; Gong, W. Quantifying factory-scale CO2/CH4 emission based on mobile measurements and EMISSION-PARTITION model: Cases in China. Environ. Res. Lett. 2023, 18, 034028. [Google Scholar] [CrossRef]
  5. Pei, Z.; Han, G.; Mao, H.; Chen, C.; Shi, T.; Yang, K.; Ma, X.; Gong, W. Improving quantification of methane point source emissions from imaging spectroscopy. Remote Sens. Environ. 2023, 295, 113652. [Google Scholar] [CrossRef]
  6. Ramsden, A.E.; Ganesan, A.L.; Western, L.M.; Rigby, M.; Manning, A.J.; Foulds, A.; France, J.L.; Barker, P.; Levy, P.; Say, D.; et al. Quantifying fossil fuel methane emissions using observations of atmospheric ethane and an uncertain emission ratio. Atmos. Chem. Phys. 2022, 22, 3911–3929. [Google Scholar] [CrossRef]
  7. Han, G.; Huang, Y.; Shi, T.; Zhang, H.; Li, S.; Zhang, H.; Chen, W.; Liu, J.; Gong, W. Quantifying CO2 emissions of power plants with Aerosols and Carbon Dioxide Lidar onboard DQ-1. Remote Sens. Environ. 2024, 313, 114368. [Google Scholar] [CrossRef]
  8. Cao, D.; Xue, M.; Bai, D.; Wei, L.; Yang, S.; Sun, J.; Wang, Q.; Mao, Y. Characteristics and Quantification of Methane Emissions from Petroleum and Gas Processing Stations. Chin. J. Environ. Eng. 2023, 17, 4088–4095. [Google Scholar]
  9. Qu, J.S.; Qu, S.B.; Wang, Z.J. Feature-based fuzzy-neural network approach for target classification and recognition in remote sensing images. J. Remote Sens. 2009, 13, 67–74. [Google Scholar]
  10. Cheng, G.; Han, J. A Survey on Object Detection in Optical Remote Sensing Images. ISPRS J. Photogramm. Remote Sens. 2016, 117, 11–28. [Google Scholar] [CrossRef]
  11. Chen, S.; Kang, Q.; Wang, Z.; Shen, Z.; Pu, H.; Han, H.; Gu, Z. Target detection method by airborne and spaceborne images fusion based on past images. In LIDAR Imaging Detection and Target Recognition; SPIE: Bellingham, WA, USA, 2017; Volume 10605. [Google Scholar]
  12. Wang, X.; Ban, Y.; Guo, H.; Hong, L. Deep learning model for target detection in remote sensing images fusing multilevel features. In Proceedings of the IGARSS 2019—2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 250–253. [Google Scholar]
  13. Wang, Y.; Sun, G.; Guo, S. Target detection method for low-resolution remote sensing image based on ESRGAN and ReDet. Photonics 2021, 8, 431. [Google Scholar] [CrossRef]
  14. Fan, L.; Wang, Y.; Hu, G.; Li, F.; Dong, Y.; Zheng, H.; Ling, C.; Huang, Y.; Ding, X. Diffusion-Based Continuous Feature Representation for Infrared Small-Dim Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5003617. [Google Scholar] [CrossRef]
  15. Shi, W.; Bao, J.; Yao, Y. Remote sensing image target detection and identification based on deep learning. Comput. Appl. 2020, 40, 3558–3562. [Google Scholar]
  16. Basaeed, E.; Łoza, A.; Al-Mualla, M. Integrated remote sensing image fusion framework for target detection. In Proceedings of the 2013 IEEE 20th International Conference on Electronics, Circuits, and Systems (ICECS), Abu Dhabi, United Arab Emirates, 8–11 December 2013; pp. 86–87. [Google Scholar]
  17. Shu, C.; Sun, L. Automatic target recognition method for multitemporal remote sensing image. Open Phys. 2020, 18, 170–181. [Google Scholar] [CrossRef]
  18. Li, X.; Liu, Y. Oil tank detection in optical remote sensing imagery based on quasi-circular shadow. J. Electron. Inf. Technol. 2016, 38, 1489–1495. [Google Scholar]
  19. Wang, T.; Li, Y.; Yu, S.; Liu, Y. Estimating the volume of oil tanks based on high-resolution remote sensing images. Remote Sens. 2019, 11, 793. [Google Scholar] [CrossRef]
  20. Li, C.; Guo, H.; Ma, D.; Yu, D.; Huang, C. Comparative analysis of the accuracy of deep learning algorithms for oil tank detection in remote sensing imagery. Mar. Surv. Charting 2021, 2, 52–56. [Google Scholar]
  21. Zhu, M.; Wang, Z.; Bai, L.; Zhang, J.; Tao, J.; Chen, L. Detection of industrial storage tanks at the city-level from optical satellite remote sensing images. Image Signal Process. Remote Sens. XXVII 2021, 11862, 266–272. [Google Scholar]
  22. Yu, P.; Wang, X.; Jiang, G.; Liu, J.; Xu, B. An Improved YOLOv4 Algorithm for Detecting Typical Targets in Remote Sensing Images. J. Surv. Mapp. Sci. 2021, 38, 280–286. [Google Scholar]
  23. Li, X.; Te, R.; Yi, F.; Xu, G. TCS-YOLO model for global oil storage tank inspection. Opt. Precis. Eng. 2023, 31, 246–262. [Google Scholar] [CrossRef]
  24. Sun, W.; Hu, C.; Luo, N.; Zhao, Q. An optimization method of multiscale storage tank target detection introducing an attention mechanism. Geocarto Int. 2024, 39, 2339304. [Google Scholar] [CrossRef]
  25. Li, K.; Wan, G.; Cheng, G.; Meng, L.; Han, J. Object detection in optical remote sensing images: A survey and a new benchmark. ISPRS J. Photogramm. Remote Sens. 2020, 159, 296–307. [Google Scholar] [CrossRef]
  26. Cheng, G.; Han, J.W.; Lu, X.Q. Remote Sensing Image Scene Classification: Benchmark and State of the Art. Proc. IEEE 2017, 105, 1865–1883. [Google Scholar] [CrossRef]
  27. Cheng, G.; Zhou, P.; Han, J. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 2016, 54, 7405–7415. [Google Scholar] [CrossRef]
  28. Zhang, Y.L.; Yuan, Y.; Feng, Y.C. Hierarchical and Robust Convolutional Neural Network for Very High-Resolution Remote Sensing Object Detection. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5535–5548. [Google Scholar] [CrossRef]
  29. Yao, Y.; Jiang, Z.; Zhang, H. Oil tank detection based on salient region and geometric features. In Optoelectronic Imaging and Multimedia Technology III, Beijing, China; SPIE: Bellingham, WA, USA, 2014; pp. 276–281. [Google Scholar]
  30. Cai, X.; Sui, H.; Lv, R.; Song, Z. Automatic circular oil tank detection in high-resolution optical image based on visual saliency and Hough transform. In Proceedings of the 2014 IEEE Workshop on Electronics, Computer and Applications, Ottawa, ON, Canada, 8–9 May 2014; pp. 408–411. [Google Scholar]
  31. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  32. Li, G.; Suo, R.; Zhao, G.; Gao, C.; Fu, L.; Shi, F.; Dhupia, J.; Li, R.; Cui, Y. Real-time detection of kiwifruit flower and bud simultaneously in orchard using YOLOv4 for robotic pollination. Comput. Electron. Agric. 2022, 193, 106641. [Google Scholar] [CrossRef]
  33. Wu, D.; Jiang, S.; Zhao, E.; Liu, Y.; Zhu, H.; Wang, W.; Wang, R. Detection of Camellia oleifera fruit in complex scenes by using YOLOv7 and data augmentation. Appl. Sci. 2022, 12, 11318. [Google Scholar] [CrossRef]
  34. Cui, W.; Li, Z.; Duanmu, A.; Xue, S.; Guo, Y.; Ni, C.; Zhu, T.; Zhang, Y. CCG-YOLOv7: A Wood Defect Detection Model for Small Targets Using Improved YOLOv7. IEEE Access 2024, 12, 10575–10585. [Google Scholar] [CrossRef]
  35. Zou, H.; He, G.; Yao, Y.; Zhu, F.; Zhou, Y.; Chen, X. YOLOv7-EAS: A Small Target Detection of Camera Module Surface Based on Improved YOLOv7. Adv. Theory Simul. 2023, 6, 2300397. [Google Scholar] [CrossRef]
  36. Zhang, Y.; Ye, M.; Zhu, G.; Liu, Y.; Guo, P.; Yan, J. FFCA-YOLO for Small Object Detection in Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5611215. [Google Scholar] [CrossRef]
  37. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European conference on computer vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
Figure 1. YOLOv7-OT framework.
Figure 2. (a) Original stitched image, and (b) edge re-stitching stitched image.
Figure 3. (a) Schematic diagram of CBAM; and (b) YOLOv7 backbone network with CBAM.
Figure 4. Spatial aggregation screening method.
Figure 5. (a) YOLOv7 model results; and (b) YOLOv7-OT model results.
Figure 6. (a) YOLOv7 model P-R curve; and (b) YOLOv7-OT model P-R curve. (The curves shown in the legend completely overlap).
Figure 7. Large-scale remote sensing image coverage and zoning.
Figure 8. (a) Circular terrain, (b) man-made round building, (c) round traces of human activity, and (d) correct storage tank target.
Figure 9. Comparison of detection results.
Table 1. Basic information about the hybrid dataset.

| Dataset | Resolution | Image Size/Pixel | Number of Images Containing Storage Tanks | Number of Storage Tanks |
|---|---|---|---|---|
| DIOR | 0.5 m–30 m | 800 × 800 | 1244 | 20,361 |
| NWPU-RESISC45 | 0.2 m–30 m | 256 × 256 | 688 | 12,405 |
| NWPU-VHR-10 | 0.5 m–2 m | (500–1100) × (500–1000) | 165 | 1698 |
| TGRS-HRRSD | 0.15 m–1.2 m | (152–10,569) × (152–10,569) | 897 | 4406 |
| Self-built dataset | 0.5 m–3 m | 512 × 512 | 574 | 7205 |
| Total | — | — | 3568 | 46,075 |
Table 2. Target entity binary classification criteria.

| Confusion Matrix | Predicted Positive | Predicted Negative |
|---|---|---|
| Real Positive | TP | FN |
| Real Negative | FP | TN |
Table 3. Comparison of training results of different models.

| Model | Precision | Recall | mAP@0.5 | F1-Score | FPS |
|---|---|---|---|---|---|
| YOLOv7-OT | 0.89 | 0.87 | 0.90 | 0.88 | 67.4 |
| Faster R-CNN | 0.83 | 0.74 | 0.84 | 0.78 | 6.3 |
| YOLOv7 | 0.85 | 0.78 | 0.87 | 0.81 | 60.3 |
| YOLOv10 | 0.89 | 0.89 | 0.90 | 0.89 | 63.4 |
Table 4. Comparison of target precision of YOLOv7-OT and other target detection models.

| Area | True Targets | YOLOv7-OT Precision | YOLOv7 Precision | YOLOv10 Precision | Faster R-CNN Precision |
|---|---|---|---|---|---|
| Area I | 3392 | 96.0% | 82.5% | 78.1% | 80.3% |
| Area II | 206 | 76.2% | 41.5% | 44.9% | 39.7% |
| Area III | 1593 | 92.0% | 80.1% | 74.7% | 72.5% |
| Area IV | 3330 | 96.9% | 75.8% | 73.5% | 76.6% |
| Area V | 692 | 92.2% | 70.3% | 70.1% | 73.5% |
| Area VI | 3679 | 98.9% | 88.0% | 82.5% | 80.1% |
| Total | 12,892 | 95.9% | 79.8% | 76.2% | 76.6% |
Table 5. Comparison of grid precision of YOLOv7-OT and other target detection models.

| Area | True Grids | YOLOv7-OT Precision | YOLOv7 Precision | YOLOv10 Precision | Faster R-CNN Precision |
|---|---|---|---|---|---|
| Area I | 443 | 88.0% | 46.4% | 94.7% | 50.2% |
| Area II | 27 | 64.2% | 11.4% | 32.9% | 10.7% |
| Area III | 150 | 90.9% | 38.9% | 73.2% | 35.7% |
| Area IV | 388 | 94.4% | 22.5% | 72.1% | 20.4% |
| Area V | 83 | 88.2% | 29.6% | 76.9% | 35.7% |
| Area VI | 412 | 96.7% | 53.0% | 85.1% | 61.9% |
| Total | 1503 | 91.8% | 39.3% | 79.7% | 34.5% |
Table 6. YOLOv7-OT model ablation experiment.

| Model | Pre-Detection | Mid-Detection | Post-Detection | Precision | mAP@0.5 |
|---|---|---|---|---|---|
| YOLOv7 | -- | -- | -- | 79.8% | 0.87 |
| YOLOv7 + Pre-detection | √ | -- | -- | 82.9% | 0.87 |
| YOLOv7 + Mid-detection | -- | √ | -- | 83.6% | 0.90 |
| YOLOv7 + Post-detection | -- | -- | √ | 92.7% | 0.87 |
| YOLOv7-OT | √ | √ | √ | 95.9% | 0.90 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
