#### **1. Introduction**

In recent decades, rapid economic development has led to a significant increase in energy consumption. According to the BP Statistical Review of World Energy, fossil fuels still accounted for more than 85% of China's primary energy consumption in 2019. Burning fossil fuels releases large amounts of pollutants into the atmosphere, which causes serious environmental problems and endangers the health of nearby residents. Among the different pollutant sources, industrial emissions contribute the most. The waste gas produced by industrial fossil fuel consumption is mainly discharged into the atmosphere through chimneys. The distribution of working chimneys therefore serves as an important indicator of local air pollution. Detecting the number of chimneys and their working status is of great significance to urban environmental monitoring and environmental governance.

Target detection on high-resolution remote sensing images provides an efficient and accurate way to determine the position and status of chimneys. There are two types of target detection algorithms: traditional algorithms and algorithms based on deep learning. Traditional algorithms, such as the Local Binary Pattern (LBP) [1] algorithm, the scale-invariant feature transform (SIFT) [2] algorithm, and the Support Vector Machine (SVM) [3] algorithm, do not perform well in accuracy or robustness on complex recognition problems [4]. To increase detection accuracy, a deep learning algorithm, the convolutional neural network (CNN) [5], was proposed; it imitates the way neurons in the human brain connect and pass messages. CNN-based detectors can be divided into two categories: one-step algorithms and two-step algorithms. One-step algorithms, such as the Single Shot MultiBox Detector (SSD) [6] and You Only Look Once (YOLO) [7], have lower accuracy but also lower computational cost. Two-step algorithms, such as region-based convolutional neural networks (R-CNNs) [8], Fast R-CNN [9], and Faster R-CNN [10], are characterized by high accuracy and high time cost.

At present, deep learning has been successfully applied to remote sensing images for aircraft detection [11–13], ship detection [14–16], and oil tank detection [17–19], with good performance. Several experiments on chimney detection have also been reported. Yao et al. [20] used Faster R-CNN to detect chimneys and condensing towers. Zhang et al. [21] established the BUAA-FFPP60 dataset, which can be used not only to detect the targets but also to confirm their working status. Different deep learning algorithms [6,10,22–27] have also been compared on performance indicators such as accuracy, model memory size, and running time, and the results show that no single algorithm performs well in all aspects. Deng et al. [28] increased the number and scale of feature pyramids, based on the original Feature Pyramid Network (FPN), to improve detection accuracy.

In practical applications, an image always contains various artificial targets. Some of them, such as roads, building edges, and oil tanks, are very close to chimneys in texture and geometric features. The Faster R-CNN chimney detectors in the aforementioned references are based on specific datasets that contain only manually selected chimneys. When Faster R-CNN is applied to a large-scale scene, a large number of chimney-like targets are misclassified as chimneys, leading to a significant decrease in precision. To improve the precision, we use two spatial analysis methods. The first uses the digital terrain model (DTM). The DTM reflects the height fluctuation of ground objects. A chimney is a vertical object and appears elongated in the image, so the DTM changes dramatically where there is a chimney. The severity of this change can therefore be used as a condition for deciding whether a chimney is present. The second method uses the pointing direction of the detected objects. In a high-resolution image, the field of view is relatively small, so the observing angle changes little within one image. Consequently, the chimneys in one image show the same pointing direction. In this paper, we call this direction the main direction of the image. Detected objects that do not agree with the main direction can be considered false detections.
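The two checks described above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the box convention `(row0, col0, row1, col1)`, the elevation-range criterion, the use of a circular mean to estimate the main direction, and the thresholds `min_range` and `tol_deg` are all assumptions chosen for illustration.

```python
import numpy as np


def elevation_range(dtm, box):
    """Max minus min elevation inside a box (row0, col0, row1, col1).

    The box convention is hypothetical; the box must cover at least one cell.
    """
    r0, c0, r1, c1 = box
    patch = dtm[r0:r1, c0:c1]
    return float(patch.max() - patch.min())


def passes_elevation_filter(dtm, box, min_range=20.0):
    """Keep a detection only if the elevation changes strongly inside it.

    min_range (in DTM height units) is an illustrative threshold.
    """
    return elevation_range(dtm, box) >= min_range


def main_direction(angles_deg):
    """Estimate the main direction as the circular mean, in [0, 360).

    Assumption: a circular mean is only one possible estimator and is
    sensitive to many false detections; a voting scheme could be used instead.
    """
    a = np.deg2rad(np.asarray(angles_deg, dtype=float))
    mean = np.arctan2(np.sin(a).mean(), np.cos(a).mean())
    return float(np.rad2deg(mean) % 360.0)


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two directions, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def passes_direction_test(angle_deg, main_deg, tol_deg=15.0):
    """Keep a detection whose pointing direction agrees with the main direction.

    tol_deg is an illustrative tolerance.
    """
    return angular_difference(angle_deg, main_deg) <= tol_deg
```

In this sketch a detection would be kept only if it passes both checks. Handling the wrap-around at 0°/360° matters, because two directions such as 350° and 10° differ by only 20°.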

In this paper, we use the BUAA-FFPP60 dataset [21] and the Faster R-CNN algorithm to train the preliminary detection model. Then, two spatial analysis methods, DTM (elevation) filtering and the main direction test, are introduced to remove the false chimneys. The method is described in detail in Section 2, and the results are discussed in Section 3. The results show that elevation filtering and the main direction test are both very effective in reducing the false detection rate. Furthermore, the combination of the two methods performs extremely well in increasing detection precision.
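For reference, precision is taken here in its standard sense. The text above does not spell out the definition, so the following is stated as an assumption about the metric being used:

$$
\mathrm{Precision} = \frac{TP}{TP + FP}
$$

where $TP$ is the number of detections that are true chimneys and $FP$ is the number of non-chimney objects reported as chimneys. A filter that removes false detections while keeping the true ones lowers $FP$ and leaves $TP$ unchanged, so precision rises.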

#### **2. Methodology**

The method proposed in this paper consists of three parts: (1) preliminary detection on enhanced images by Faster R-CNN, (2) elevation filtering using the local DTM, and (3) the main direction test. The overall process is shown in Figure 1. Because condensing towers were also detected in previous studies, the experimental results for them are retained as comparative references. Furthermore, although thermal infrared data are helpful for detecting working chimneys, the resolution of commonly accessible data is too low, so they are not used in this paper.
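The three-part structure can be read as a detector followed by successive filters. The sketch below shows only that composition and is an assumption about how the stages could be chained: `run_filter_stages` is a hypothetical helper, and each stage is a hypothetical predicate standing in for the elevation filter or the main direction test.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # hypothetical (row0, col0, row1, col1) box


def run_filter_stages(
    candidates: Sequence[Box],
    stages: Sequence[Callable[[Box], bool]],
) -> Tuple[List[Box], List[int]]:
    """Apply filter stages in order to the preliminary detections.

    Returns the surviving boxes and the number of boxes remaining before
    the first stage and after each stage.
    """
    kept = list(candidates)
    counts = [len(kept)]
    for stage in stages:
        kept = [box for box in kept if stage(box)]
        counts.append(len(kept))
    return kept, counts
```

Recording the count after each stage makes it possible to see how many candidates each filter removes.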

**Figure 1.** Process diagram.
