Article

Aircraft Position Estimation Using Deep Convolutional Neural Networks for Low SNR (Signal-to-Noise Ratio) Values

by
Przemyslaw Mazurek
*,† and
Wojciech Chlewicki
*,†
Department of Signal Processing and Multimedia Engineering, West Pomeranian University of Technology in Szczecin, al. Piastow 17, 70-310 Szczecin, Poland
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Sensors 2025, 25(1), 97; https://doi.org/10.3390/s25010097
Submission received: 25 November 2024 / Revised: 25 December 2024 / Accepted: 26 December 2024 / Published: 27 December 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

The safety of the airspace could be improved by the use of visual methods for the detection and tracking of aircraft. However, in the case of the small angular size of airplanes and a high noise level in the image, the effective use of such methods might be difficult. By using a ConvNN (Convolutional Neural Network), it is possible to obtain a detector that performs the segmentation task for aircraft images that are very small and lost in the background noise. In the learning process, a database of real aircraft images was used. Using the Monte Carlo method, three simple detection algorithms, i.e., Max. Pixel Value, Min. Pixel Value, and Max. Abs. Pixel Value, were compared with the ConvNN's forward architecture. The obtained results showed superior detection with the ConvNN; for example, for a noise standard deviation of 0.1, its detection rate was twice as large as that of the simple algorithms. A deep dream analysis of the network layers is presented, which shows a preference for images with horizontal contrast lines. The proposed solution provides processed image values that can feed a tracking process on the raw data using the Track-Before-Detect method.

1. Introduction

Airspace safety is an important aspect of protection systems. These systems employ specialized air traffic control measures that utilize primary and secondary radars, operating effectively regardless of weather conditions. With secondary radars, such as the ADS-B system, it is possible to obtain information about the position of an object in space, its identifier, and additional details [1]. Data from the ADS-B system can be verified using information from the primary radar; however, this verification does not always allow a comprehensive assessment of many important parameters. For example, a different aircraft or UAV (Unmanned Aerial Vehicle) might transmit inaccurate data.
Weather radars have demonstrated their effectiveness in detecting airborne targets. The information gathered from this method could be valuable for sharing with air traffic control systems, as highlighted in [2]. Additionally, recent research has focused on shape modeling, which offers useful functionalities for target detection in air traffic control and defense systems [3]. Analyzing weather change characteristics offers important information for air traffic control systems, and this information can also be utilized for aircraft detection [4].
However, data verification can also be achieved by optical means, using position recognition systems, and by identifying the type of aircraft [5]. Monitoring airspace in the optical range can be challenging due to varying atmospheric conditions [6]. However, these systems can operate nearly continuously in certain regions due to favorable weather. Providing this additional information to airspace security systems is valuable as it helps address various potential threats, including terrestrial attacks and asymmetric aggression from other countries.
There are many other methods for the optical detection of airplanes. Most of these methods focus on detecting airplanes present at airports based on satellite images [7,8]. There are also works related to tracking systems, including vision systems. In certain studies, airplanes are represented as dot images or images made up of only a few pixels, meaning they are very small compared to the overall size of the image. This situation necessitates the development of complex methods, which often involve the use of neural networks at various stages. In some instances, the application of neural networks is combined with a multiscale approach [9,10]. In other cases, advanced edge detection techniques are employed [11]. On the other hand, visual classification was used in other works related to airspace safety. These works concern the detection and identification of UAVs [12,13,14,15] and birds [16]. There are also classification methods based on Jet Engine Modulation (JEM) of the primary radar signal [6,17].
Multimodal data fusion could be utilized in tracking systems, with ADS-B, for example, serving as an additional data source or for controlling an ALT-AZ camera head. However, this capability is often disabled or limited, particularly for non-civil or intruder aircraft.
This study aimed to develop and evaluate a new methodology for estimating aircraft positions using deep convolutional neural networks. Specifically, it presents a comparative analysis of four algorithms for position estimation, focusing on the proposed solution based on a deep neural network with a forward architecture. A Monte Carlo simulation was performed to evaluate the efficacy of the algorithms on real images subjected to synthesized noise. The following sections describe the database and the method of extracting a training set from it, and propose a ConvNN (Convolutional Neural Network) combined with data augmentation. The subsequent sections present the experimental results together with an in-depth discussion. Section 6 summarizes the conclusions and outlines further planned work.

2. Data

In this study, a subset of the 'FGVC-Aircraft Benchmark' image database [18], which contains various photos of airplanes, was used. The airplanes were photographed both in flight and at the airport, in different view variants. Atypical and incomplete photos, as well as photos with other background objects, two airplanes, etc., were removed from this database. This was done algorithmically and then verified by an expert. The original database consists of color images, as shown in Figure 1; the rejected images are presented in the top row, and the bottom row shows the images accepted for further processing. The issue with the FGVC database lies in the quality of the images: they do not depict real detection scenarios, in which the aircraft is viewed from a considerable distance, leading to a smaller representation in the camera's frame and a loss of visible details. In addition, factors such as weather conditions and lighting must be taken into account in these situations.
This database serves as an appropriate tool for evaluating algorithms in general. However, its applicability to practical scenarios, specifically detecting aircraft in flight, is limited. The FGVC database has been used for many years as a benchmark [19], with the best solutions achieving over 96% accuracy. However, this high accuracy applies only to very high-resolution, noise-free color images; therefore, these results do not reflect real-world aircraft detection under actual conditions.
The subset prepared in this way contained 1536 photos (available on GitHub [20]), which were scaled to a resolution of 64 × 64 so that the airplane had a horizontal or vertical size of 64 pixels. The photos were processed so as to remove clouds from the background. The size of the airplanes is between 3 and 16 pixels (length), so they start from the small object class (about 10 pixels occupied) and extend into the extended object class (fewer than 100 pixels). The small object class is very difficult to detect and requires tracking over a video sequence [6]. In the case of the extended object class, detection is possible from a single frame under good visibility conditions, because multiple pixels carry information about the object and knowledge about the shape of, and light reflected from, airplanes can be used. Camera or object motion blur is less important because a distant object moves with a small angular velocity. Atmospheric scintillation is important for long-distance detection [21]. In the domain of object tracking, image distortion models that account for various atmospheric phenomena can be utilized. Although techniques such as active camera cooling can mitigate image sensor noise, such noise remains an inherent characteristic of image acquisition systems [22].
In the process of augmenting the images, the following operations were performed in real time: left–right flip, rotation within ±5 degrees, and contrast scaling so that the background was 0 and the maximum luminance value was +1 or −1, respectively; Gaussian noise was added with a zero mean value and a variable standard deviation (0–0.1). The noise level ranged from 0 up to a value so large that the airplane was no longer visible to a human. The size of the airplane was varied from 16 pixels (the base image was 64 pixels) down to 4 pixels.
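For illustration, the augmentation pipeline described above can be sketched in Python as follows (the original processing was performed in Matlab). The median-based background estimate, the use of scipy.ndimage.rotate, and the exact normalization order are assumptions, not the authors' exact implementation.

```python
import numpy as np
from scipy.ndimage import rotate  # small-angle rotation for augmentation

rng = np.random.default_rng(0)

def augment(img, noise_std):
    """Augment one grayscale frame: random flip, small rotation, contrast
    normalization (background -> 0, peak |value| -> 1), additive Gaussian noise."""
    out = img.astype(np.float64)
    if rng.random() < 0.5:                      # random left-right flip
        out = np.fliplr(out)
    angle = rng.uniform(-5.0, 5.0)              # rotation within +/- 5 degrees
    out = rotate(out, angle, reshape=False, mode='nearest')
    out = out - np.median(out)                  # background (sky) roughly to 0 (assumed estimate)
    peak = np.max(np.abs(out))
    if peak > 0:
        out = out / peak                        # peak |value| scaled to 1
    return out + rng.normal(0.0, noise_std, out.shape)

# Example: a 64x64 frame with a noise std drawn from the 0-0.1 range
frame = rng.normal(0.0, 0.01, (64, 64))
augmented = augment(frame, noise_std=rng.uniform(0.0, 0.1))
```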
This study did not test the ability to classify airplanes, only the ability to detect them based on a single image frame. Most of the airplanes were viewed from the side or from below. Samples with condensation trails were removed from the database. The image of an airplane taken during the day depends on the orientation and position of the airplane in 3D space relative to the observer and the sun.
In this investigation, grayscale images were used. The database consisted of images with various lighting configurations. Because the reference point is the background, normalized to the value 0, the darker values (usually shaded or dark-painted areas) are negative, and the rest are positive. Detection of the airplane in the image was not based solely on areas with pixel values above level 0 (bright parts of the airplane) but on both light and dark areas and the spatial relations between them. Many civil aircraft are painted in bright colors such as red; in contrast, military airplanes typically use camouflage paint, usually gray. The database contains a mask specifying the location of the airplane's pixels and the background, which was used to compare the methods.

3. Method

There are many different methods for detecting objects in an image [23,24,25]. In the case of strong noise (a small signal-to-noise ratio), the detection of objects is not possible based on a single pixel, and it is necessary to use spatial information. This could in theory be done with a 2D matched filter bank, but such a bank would have to be very large due to the different shapes, orientations, scales, and lighting conditions. For this reason, the use of neural networks seemed more effective, owing to the much smaller scale of the filters and the transition from a shallow to a deep architecture. In this work, a ConvNN with a typical forward deep architecture was used. An exemplary configuration of the ConvNN architecture is shown in Figure 2.
Different configurations were tested, as presented in Table 1. Two convolution sizes were considered, i.e., 3 × 3 and 5 × 5, with various numbers of neurons. Four convolutional layers were applied in two main configurations. The first configuration assumed a successive reduction of the number of neurons in successive layers. The second configuration assumed a fixed number of neurons, with only the last layer using two neurons.
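A minimal sketch of such a forward ConvNN is given below in PyTorch for the 3 × 3 128-64-32 configuration of Figure 2 (row 1 of Table 1). The activation function (ReLU) and 'same' padding are assumptions, since the paper reports only the kernel sizes and neuron counts, and the original networks were built in Matlab's Deep Learning Toolbox.

```python
import torch
import torch.nn as nn

class AircraftSegNet(nn.Module):
    """Sketch of the forward ConvNN: four convolutional layers, the last one
    producing two output maps (background/sky and airplane)."""
    def __init__(self, k=3, channels=(128, 64, 32)):
        super().__init__()
        c1, c2, c3 = channels
        p = k // 2  # keep spatial resolution so the output matches the mask
        self.body = nn.Sequential(
            nn.Conv2d(1, c1, k, padding=p), nn.ReLU(inplace=True),
            nn.Conv2d(c1, c2, k, padding=p), nn.ReLU(inplace=True),
            nn.Conv2d(c2, c3, k, padding=p), nn.ReLU(inplace=True),
            nn.Conv2d(c3, 2, k, padding=p),  # two classes per pixel
        )

    def forward(self, x):
        return self.body(x)  # (N, 2, H, W) per-pixel class scores

net = AircraftSegNet()
scores = net(torch.randn(1, 1, 64, 64))   # e.g., one 64x64 grayscale frame
print(scores.shape)                        # torch.Size([1, 2, 64, 64])
```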
The neural network underwent training in multiple configurations. The network trained over 5000 epochs was chosen. The images were grouped 64 into a single frame to take advantage of the features of the convolutional network. The resulting network operated as a semantic network with two output classes (background/sky and airplane). The training process used NVIDIA GeForce GTX Titan X (Santa Clara, CA, USA) and NVIDIA Quadro RTX 8000 graphics processing units in conjunction with Matlab R2024b, specifically the Deep Learning Toolbox [26]. The real-time augmentation process described in Section 2 was used.
As shown in Table 2, the learning algorithms used were Stochastic Gradient Descent with Momentum (SGDM) [27], RMSprop, and ADAM [28].
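A training-loop sketch mirroring the SGDM column of Table 2 is shown below (PyTorch, reusing the AircraftSegNet sketch above). The weight-decay coefficient and the synthetic mini-batch are placeholders: the paper specifies L2 regularization but not its strength, and the real training data come from the augmented database of Section 2.

```python
import torch
from torch import nn, optim

net = AircraftSegNet()                       # architecture sketch from above
criterion = nn.CrossEntropyLoss()            # per-pixel two-class loss

# SGDM column of Table 2; weight_decay stands in for L2 regularization
# (its coefficient is not reported in the paper, so 1e-4 is a placeholder).
optimizer = optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Piecewise schedule: multiply the learning rate by 0.98 every 50 epochs.
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.98)

images = torch.randn(150, 1, 32, 32)         # stand-in mini-batch (size 150)
masks = torch.randint(0, 2, (150, 32, 32))   # stand-in per-pixel labels

for epoch in range(5000):                    # Max. Epochs in Table 2
    optimizer.zero_grad()
    loss = criterion(net(images), masks)
    loss.backward()
    # Gradient threshold 0.05 with the L2-norm method (Table 2).
    nn.utils.clip_grad_norm_(net.parameters(), max_norm=0.05)
    optimizer.step()
    scheduler.step()
```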

4. Results

In testing, image blocks with a size of 32 × 32 pixels and a corresponding mask for the airplane's pixels were used. The robustness of the detection algorithms was tested with the Monte Carlo method. Gaussian noise was used, with the standard deviation ranging from 0 to 0.5 in steps of 0.01. A total of 153,600 aircraft images were used for each noise level. Sample ConvNN image responses are shown in Figure 3 and Figure 4.
The basic criteria are the following measures: Max. Pixel Value, Min. Pixel Value, and Max. Abs. Pixel Value, computed for pixel values in the airplane area (mask) relative to the background. Since the airplane can have both brighter and darker pixels than the background, it can be detected in three ways by a local algorithm (moving window) using one of these criteria. The maximum-absolute-value criterion is more universal than the first two. Exemplary reference images and detections based on the maximal value response of the ConvNN are presented in Figure 5.
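The three baseline criteria can be summarized by the following sketch, which checks whether the pixel selected by each criterion falls inside the airplane mask. The synthetic 32 × 32 block and rectangular target used here are for illustration only and are not drawn from the benchmark database.

```python
import numpy as np

def hit_by_criterion(image, mask, criterion):
    """Return True if the pixel selected by the given criterion lies inside
    the airplane mask. `criterion` is one of 'max', 'min', 'max_abs'."""
    if criterion == 'max':
        idx = np.argmax(image)
    elif criterion == 'min':
        idx = np.argmin(image)
    else:  # 'max_abs'
        idx = np.argmax(np.abs(image))
    return bool(mask.flat[idx])

# Monte Carlo style check on synthetic 32x32 blocks (illustration only)
rng = np.random.default_rng(1)
mask = np.zeros((32, 32), dtype=bool)
mask[14:18, 10:22] = True                      # a small airplane-like region
clean = np.where(mask, 0.5, 0.0)               # bright target on zero background
hits = {c: 0 for c in ('max', 'min', 'max_abs')}
trials = 1000
for _ in range(trials):
    noisy = clean + rng.normal(0.0, 0.1, clean.shape)
    for c in hits:
        hits[c] += hit_by_criterion(noisy, mask, c)
print({c: hits[c] / trials for c in hits})     # detection rate per criterion
```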
The results for the three basic detection methods and the ConvNNs are shown as a function of the Gaussian noise level in Figure 6 (SGDM training algorithm), Figure 7 (RMSprop training algorithm), and Figure 8 (ADAM training algorithm). For simplicity of visualization, the results are plotted for noise levels in steps of 0.03. The results for the basic detection algorithms using pixel values are shown in black; red denotes ConvNNs whose first convolutional layer contains 128 neurons, green 64 neurons, and blue 32 neurons. All ConvNNs achieved a mini-batch accuracy above 97% after 5000 steps.
Two distinct visualization methods can be employed to ascertain what a neural network has learned in individual cases. The first approach involves visualizing the weights of the first layer of the neural network [29]. These weights are commonly expected to function as shape detectors (see Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13), as noted in the existing literature [30,31,32]. The second approach is a technique that enables the visualization of individual neuron activations within the network, commonly called deep dream [33,34].
A detector in the form of a ConvNN produces maximal values for the airplane pixels (they are larger than for the background) because the expected output in the learning process is the mask. This allows the estimation of a very approximate silhouette region of the aircraft. Using a maximum-value search on this output is more efficient because the result is not related to a single pixel but follows from the relationships within a larger group of pixels. The semantic network returns two numerical results for each pixel, corresponding to the degree of belonging to each class; a value comparison was used to obtain a detection with a binary output.
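A sketch of this post-processing step is given below. The assumption that channel 0 corresponds to the background/sky class and channel 1 to the airplane class is illustrative, as the paper does not specify the channel order.

```python
import torch

def detect(scores):
    """scores: (N, 2, H, W) ConvNN output. Returns a binary mask from a
    per-pixel class comparison and the location of the strongest
    airplane-class response in each frame."""
    binary = scores[:, 1] > scores[:, 0]           # value comparison per pixel
    airplane_map = scores[:, 1]
    flat_idx = airplane_map.flatten(1).argmax(dim=1)
    rows = flat_idx // airplane_map.shape[-1]
    cols = flat_idx % airplane_map.shape[-1]
    return binary, torch.stack([rows, cols], dim=1)

scores = torch.randn(1, 2, 32, 32)                 # stand-in network output
mask, positions = detect(scores)
print(positions)                                    # estimated (row, col) per frame
```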
The network training time is shown in Table 3 for the NVIDIA Quadro RTX 8000 graphics processing unit.

5. Discussion

The Monte Carlo method, using real images and controlled disturbances, allowed the potential of the algorithms to be estimated. The advantage of a detector learned from various luminance configurations is its independence from sudden changes in lighting. For example, the use of a sliding-window correlation method on successive frames is very difficult in such cases.
Location detection with the proposed ConvNN is very effective (especially in Figure 6). In the absence of noise (which is, in fact, very difficult to obtain due to the noise associated with the acquisition process and atmospheric fluctuations), a trivial algorithm could be used that subtracts the estimated background from the camera signal. Detection of very small airplanes, even of the size of one pixel, is then possible. This is not possible in a realistic case with image noise, and a machine learning approach is required.
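A sketch of such a trivial background-subtraction detector is shown below; the threshold value is an illustrative assumption, and the approach is only meaningful in the (practically unattainable) noise-free case.

```python
import numpy as np

def trivial_detect(frame, background, threshold=0.05):
    """Noise-free case: subtract the estimated background from the camera
    frame and flag pixels whose residual magnitude exceeds a small threshold
    (the threshold value here is an illustrative assumption)."""
    return np.abs(frame - background) > threshold

background = np.zeros((64, 64))
frame = background.copy()
frame[20, 30] = 0.2                                      # a one-pixel airplane
print(np.argwhere(trivial_detect(frame, background)))    # -> [[20 30]]
```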
In the 32 × 32 blocks, the largest pixel value was related to the airplane area (Figure 5); therefore, the curve for the ConvNN differs significantly from those of the other, simple algorithms (Figure 6, Figure 7 and Figure 8). This does not mean, however, that high values do not occur in the background area, but the local value for the airplane is dominant. It is not appropriate to perform detection by comparing both responses of the semantic network (which is binary), as this leads to a large number of false detections in the background area; therefore, this type of analysis was abandoned. This problem is well known in the literature on tracking objects whose signal is similar to the background noise [35].
The correct approach is based on rejecting the binary response of the aircraft detector in favor of tracking the signal (Figure 5) along the given trajectories. This method is known as TBD (Track-Before-Detect) [36,37,38,39,40] and requires very high computing power because all possible trajectories have to be tracked over the measurement sequence (in this case, a video sequence) [36,41,42,43,44]. This tracking increases the SNR (Signal-to-Noise Ratio), thanks to which effective detection becomes possible only afterwards.
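The following simplified sketch illustrates the principle: detector responses are accumulated along candidate constant-velocity trajectories over a short video sequence, which raises the effective SNR before any detection decision is made. Real TBD implementations use richer motion models and recursive formulations [36,37,38,39,40]; the constant-velocity grid, frame count, and target strength used here are illustrative assumptions.

```python
import numpy as np

def tbd_accumulate(frames, velocities):
    """Minimal Track-Before-Detect sketch: for each candidate constant
    velocity (vy, vx), cyclically shift successive detector-response frames
    so that a target moving with that velocity stays aligned, and average
    them. The per-pixel score is maximized over the candidate trajectories."""
    T, H, W = frames.shape
    best = np.full((H, W), -np.inf)
    for vy, vx in velocities:
        acc = np.zeros((H, W))
        for t, frame in enumerate(frames):
            acc += np.roll(frame, shift=(-t * vy, -t * vx), axis=(0, 1))
        best = np.maximum(best, acc / T)
    return best

# A weak target (amplitude equal to the noise std) moving at (1, 2) px/frame:
rng = np.random.default_rng(2)
frames = rng.normal(0.0, 1.0, (32, 32, 32))
for t in range(32):
    frames[t, (5 + t) % 32, (3 + 2 * t) % 32] += 1.0
score = tbd_accumulate(frames, velocities=[(0, 0), (1, 1), (1, 2), (2, 1)])
# Should typically recover the seeded start position (5, 3):
print(np.unravel_index(np.argmax(score), score.shape))
```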
The best result was achieved by a network with a 5 × 5 convolution window and an architecture with a successively decreasing number of neurons in successive hidden layers (128-64-32). In comparison, changing the convolution window to 3 × 3 produced a much worse result. Increasing the number of neurons in subsequent hidden layers (128-128-128) also gave a worse result, which can be interpreted as insufficient convergence, despite the local accuracy value being above 97%. This shows that it is necessary to analyze this type of network more thoroughly under strong noise conditions during training. The advantage of a network with a larger convolution window is that the tracked objects are larger than individual pixels and can be detected more easily by 2D filtering.
Based on the analysis of the figures (see Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14), it can be concluded that all examined networks tend to detect fine details within the images. A notable characteristic shared among these networks is the identification of both bright and dark individual pixels. This phenomenon can be attributed to the fact that such pixel values enhance the likelihood of accurately detecting airplanes against a predominantly gray background. Furthermore, another salient feature observed is the detection of horizontal lines and edges. This can be explained by the composition of the dataset, which contains a significant number of aircraft displayed in horizontal or nearly horizontal orientations. This is typical for observing aircraft from a long distance. However, when an aircraft flies close to the camera, different angles can be captured. From a security perspective, detecting aircraft from greater distances is more important, as it allows for shorter reaction times when an aircraft appears, which is crucial for both civil and military applications.
Networks that have a larger number of neurons in the first layer offer a greater variety of feature detectors for images. Likewise, a network that uses a larger mask size ( 5 × 5 ) emphasizes edge features more effectively than one that uses a smaller mask size ( 3 × 3 ).
Larger masks enable the detection of larger features within an image, in line with the principle of matched filters used for feature detection [45]. Conversely, when a smaller mask is used, the detection of larger features is deferred to subsequent layers. Using very large masks can result in two scenarios: either mask reduction, where some weights are zeroed out, limiting the effective size of the mask, or the storage of the entire image of the object, such as an airplane. In the latter scenario, if the first layer contains a very large number of neurons, the network may function as a bank of matched filters that potentially perform interpolations. This is disadvantageous if the learning process, specifically the dataset and its augmentation, is not sufficiently diverse. Additionally, very large masks with a large number of neurons can result in slower network learning and increased processing costs, especially for real-time applications.
From a military standpoint, the detection of features associated with camouflage seems to be interesting. Implementing a strategy to minimize contrast in aircraft paint is advisable, with the optimal goal being to achieve a brightness that closely aligns with the surrounding environment. Nevertheless, it is essential to acknowledge that varying lighting conditions can produce shadows, which may lead to the emergence of darker areas that can be effectively detected by neural networks.
The deep dream study [33,34] demonstrates which input images cause significant activation of the neurons. The method is not straightforward, however, and only one sample image is presented per case. The conv1 layer represents the output of the first convolutional mask. For the networks shown in Figure 15, Figure 16 and Figure 17, the images exhibit lower contrast and feature horizontal lines; in contrast, in Figure 18, Figure 19 and Figure 20, shape detectors appear from the very beginning. Regarding the analysis of the second layer (again referring to Figure 15, Figure 16 and Figure 17), the strong activations correspond to raster-like images that also include line rasters. This means that the high-frequency components have a greater share in detection over the entire image area. This includes identifying edges that are either brighter or darker than the background, as well as edges that contain a mix of both bright and dark lines or pixels with respect to the background. These characteristics are also observed in the conv3 layer of these networks. The other networks exhibit similar features in their conv3 layers, although less prominently.
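A minimal sketch of such a visualization by gradient ascent on the input image is given below, applied to the PyTorch architecture sketch from Section 3. The optimizer, step count, and random initialization are assumptions; the paper's figures were generated with Matlab's Deep Learning Toolbox.

```python
import torch

def deep_dream(net, layer_end, channel, steps=200, lr=0.1, size=64):
    """Gradient-ascent visualization in the deep dream style [33,34]: find an
    input image that strongly activates one channel of a chosen convolutional
    layer. `layer_end` indexes the Sequential body of the AircraftSegNet
    sketch (0 = conv1, 2 = conv2, 4 = conv3)."""
    sub = net.body[:layer_end + 1]                  # truncate after the layer
    img = (0.1 * torch.randn(1, 1, size, size)).requires_grad_(True)
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        activation = sub(img)[0, channel]           # one feature map
        (-activation.mean()).backward()             # ascend the mean activation
        opt.step()
    return img.detach()[0, 0]

pattern = deep_dream(AircraftSegNet(), layer_end=0, channel=0)
print(pattern.shape)    # a 64 x 64 input pattern preferred by one conv1 filter
```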
In summary, deep dream analysis offers valuable insights into how networks operate and how they are trained. This suggests the possibility of designing filters intentionally rather than creating detectors solely through training the network. Such an approach could lead to the development of more effective solutions that account for the specificities of signal processing.
This analysis offers critical information regarding both optimal and suboptimal camouflage techniques. In the context of military applications, it further elucidates strategies for flight maneuvers designed to minimize optical detection within the visible spectrum, taking into consideration variables such as solar orientation and cloud cover.
Figure 21 illustrates the convergence of the training process for the three algorithms and various network configurations. The graph demonstrates that a high accuracy is reached quickly, within 100 iterations, indicating the strong performance of the deep network for object detection when using the SGDM and ADAM algorithms. In the case of the RMSprop algorithm, however, the accuracy values for subsequent batches fluctuate significantly, necessitating a much larger number of iterations (over 4000) to stabilize the accuracy. Additionally, the mini-batch loss graphs highlight the impact of the chosen convolution mask size: smaller masks (3 × 3) correspond to higher error values, while larger masks (5 × 5) and more complex networks yield lower error values.
The training time values presented in Table 3 are relatively short due to the use of images with a larger number of planes for training. This approach also promotes better data balance. As a result, the computational capabilities of the convolutional network on the GPU are utilized more effectively during the training process. This indicates that the proposed network architectures are well-suited for larger systems.

6. Conclusions and Further Work

In this work, detection based on a ConvNN was presented, with promising results. The advantage of this approach is its scalability to vision systems due to its simple convolutional structure. It can also be used in cameras that employ intelligent image processing.
Further work will focus on improving the quality of location estimation in cloudy conditions (changing backgrounds) and on supplementing the approach with tracking algorithms to further improve the quality of position estimation. The use of tracking algorithms [6] might have a positive impact on the entire process, especially at high noise levels, particularly when the signal is lost in the background noise and cannot be detected from a single observation but can be detected from a series of observations (in this case, video sequences).
Another problem, and at the same time another method of position estimation, is the use of information about condensation trails. Depending on the weather conditions, they may or may not be visible, and they can be a very good indicator of the presence of an aircraft, even at a very long distance.

Author Contributions

Conceptualization, P.M. and W.C.; methodology, P.M.; software, P.M.; validation, W.C. and P.M.; formal analysis, P.M.; investigation, W.C.; resources, P.M.; data curation, W.C.; writing—original draft preparation, P.M. and W.C.; visualization, P.M.; supervision, P.M.; project administration, P.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the authors.

Acknowledgments

This work is supported by the UE EFRR ZPORR project Z/2.32/I/1.3.1/267/05 “Szczecin University of Technology–Research and Education Center of Modern Multimedia Technologies” (Poland). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X GPU used also for this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ADAM      ADAptive Moment Estimation
ALT-AZ    Altitude-Azimuth
ConvNN    Convolutional Neural Network
UAV       Unmanned Aerial Vehicle
JEM       Jet Engine Modulation
SGDM      Stochastic Gradient Descent with Momentum
TBD       Track-Before-Detect
SNR       Signal-to-Noise Ratio

References

  1. Mayer, C.; Tzanos, P. Comparison of ASR-11 and ASR-9 surveillance radar azimuth error. In Proceedings of the 2011 IEEE/AIAA 30th Digital Avionics Systems Conference, Seattle, WA, USA, 16–20 October 2011; pp. 4E2-1–4E2-6. [Google Scholar] [CrossRef]
  2. Rzewuski, S.; Kulpa, K.; Gromek, A. Airborne targets detection using weather radar. In Proceedings of the 2015 Signal Processing Symposium (SPSympo), Debe, Poland, 10–12 June 2015; pp. 1–4. [Google Scholar] [CrossRef]
  3. Liu, Y.; Ji, H.; Zhang, Y. Gaussian-like measurement likelihood based particle filter for extended target tracking. IET Radar Sonar Navig. 2023, 17, 579–593. [Google Scholar] [CrossRef]
  4. Villa, G.; Ruiz, J.A.; Da Costa, C.; Corrales, J.L.; Pacho, A.; Ferres, I. Enhanced Weather Detection and Tracking Algorithms in Primary Surveillance Radar. In Proceedings of the 2024 Integrated Communications, Navigation and Surveillance Conference (ICNS), Herndon, VA, USA, 23–25 April 2024; pp. 1–7. [Google Scholar] [CrossRef]
  5. Chen, Y.; Zhou, L. Vulnerabilities in ADS-B and Verification Method. In Proceedings of the 2020 IEEE 2nd International Conference on Civil Aviation Safety and Information Technology (ICCASIT), Weihai, China, 14–16 October 2020; pp. 90–94. [Google Scholar] [CrossRef]
  6. Blackman, S.; Popoli, R. Design and Analysis of Modern Tracking Systems; Artech House: Norwood, MA, USA, 1999. [Google Scholar]
  7. Polat, M.; Mohammed, H.M.A.; Oral, E.A.; Ozbek, I.Y. Aircraft Detection from Satellite Images Using ATA-Plane Data Set. In Proceedings of the 2019 27th Signal Processing and Communications Applications Conference (SIU), Sivas, Turkey, 24–26 April 2019; pp. 1–4. [Google Scholar] [CrossRef]
  8. Li, W.; Xiang, S.; Wang, H.; Pan, C. Robust airplane detection in satellite images. In Proceedings of the 2011 18th IEEE International Conference on Image Processing, Brussels, Belgium, 11–14 September 2011; pp. 2821–2824. [Google Scholar] [CrossRef]
  9. Han, W.; Kuerban, A.; Yang, Y.; Huang, Z.; Liu, B.; Gao, J. Multi-Vision Network for Accurate and Real-Time Small Object Detection in Optical Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  10. Xu, Z.F.; Jia, R.S.; Yu, J.T.; Yu, J.Z.; Sun, H.M. Fast aircraft detection method in optical remote sensing images based on deep learning. J. Appl. Remote Sens. 2021, 15, 014502. [Google Scholar] [CrossRef]
  11. Bai, J.; Yu, W.; Yuan, A.; Xiao, Z. Airplane Detection in Optical Remote Sensing Video Using Spatial and Temporal Features. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar] [CrossRef]
  12. Magoulianitis, V.; Ataloglou, D.; Dimou, A.; Zarpalas, D.; Daras, P. Does Deep Super-Resolution Enhance UAV Detection? In Proceedings of the 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Taipei, Taiwan, 18–21 September 2019; pp. 1–6. [Google Scholar] [CrossRef]
  13. Seidaliyeva, U.; Akhmetov, D.; Ilipbayeva, L.; Matson, E. Real-Time and Accurate Drone Detection in a Video with a Static Background. Sensors 2020, 20, 3856. [Google Scholar] [CrossRef] [PubMed]
  14. Koksal, A.; Ince, K.; Alatan, A.A. Effect of Annotation Errors on Drone Detection with YOLOv3. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Los Alamitos, CA, USA, 14–19 June 2020; pp. 4439–4447. [Google Scholar] [CrossRef]
  15. Matczak, G.; Mazurek, P. Comparative Monte Carlo Analysis of Background Estimation Algorithms for Unmanned Aerial Vehicle Detection. Remote Sens. 2021, 13, 870. [Google Scholar] [CrossRef]
  16. Oh, H.M.; Lee, H.; Kim, M.Y. Comparing Convolutional Neural Network(CNN) models for machine learning-based drone and bird classification of anti-drone system. In Proceedings of the 2019 19th International Conference on Control, Automation and Systems (ICCAS), Jeju, Republic of Korea, 15–18 October 2019; pp. 87–90. [Google Scholar] [CrossRef]
  17. Yang, W.Y.; Park, J.H.; Bae, J.W.; Kang, S.C.; Myung, N.H. Automatic extraction of jet engine blade number based on joint time-frequency analysis of jet engine modulation signals. In Proceedings of the 2014 Asia-Pacific Microwave Conference, Sendai, Japan, 4–7 November 2014; pp. 1333–1335. [Google Scholar]
  18. Maji, S.; Kannala, J.; Rahtu, E.; Blaschko, M.; Vedaldi, A. Fine-Grained Visual Classification of Aircraft. Technical Report. arXiv 2013, arXiv:1306.5151. Available online: http://arxiv.org/abs/1306.5151 (accessed on 25 November 2024).
  19. Papers with Code. Fine Grained Image Classification on FGVC. Meta AI Research. 2024. Available online: https://paperswithcode.com/sota/fine-grained-image-classification-on-fgvc (accessed on 25 November 2024).
  20. Mazurek, P. Subset of FGVC-Aircraft Benchmark. 2024. Available online: https://github.com/orinocopl/aircraftdetection2024 (accessed on 25 November 2024).
  21. Andrews, L. Field Guide to Atmospheric Optics. 2004. Available online: https://doi.org/10.1117/3.549260 (accessed on 25 November 2024).
  22. Stefanov, K.D. CMOS Image Sensors; IOP Publishing: Bristol, UK, 2022; pp. 2053–2563. [Google Scholar] [CrossRef]
  23. Ouchra, H.; Belangour, A. Object detection approaches in images: A survey. In Proceedings of the Thirteenth International Conference on Digital Image Processing (ICDIP 2021), Singapore, 20–23 May 2021; Volume 11878, pp. 132–141. [Google Scholar]
  24. Fasana, C.; Pasini, S.; Milani, F.; Fraternali, P. Weakly Supervised Object Detection for Remote Sensing Images: A Survey. Remote Sens. 2022, 14, 5362. [Google Scholar] [CrossRef]
  25. Bai, C.; Bai, X.; Wu, K. A Review: Remote Sensing Image Object Detection Algorithm Based on Deep Learning. Electronics 2023, 12, 4902. [Google Scholar] [CrossRef]
  26. Beale, M.H.; Hagan, M.T.; Demuth, H.B. Deep Learning Toolbox. User’s Guide; Mathworks: Natick, MA, USA, 2020. [Google Scholar]
  27. Yuan, W.; Hu, F.; Lu, L. A new non-adaptive optimization method: Stochastic gradient descent with momentum and difference. Appl. Intell. 2022, 52, 3939–3953. [Google Scholar] [CrossRef]
  28. Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2017, arXiv:1609.04747. Available online: http://arxiv.org/abs/1609.04747 (accessed on 25 November 2024).
  29. Hmidi, A.; Jihene, M. A CONVblock for Convolutional Neural Networks; IGI Global Scientific Publishing: Hershey, PA, USA, 2021; pp. 100–113. [Google Scholar] [CrossRef]
  30. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems; Pereira, F., Burges, C., Bottou, L., Weinberger, K., Eds.; Curran Associates, Inc.: Newry, UK, 2012; Volume 25. [Google Scholar]
  31. Li, Y.; Yosinski, J.; Clune, J.; Lipson, H.; Hopcroft, J. Convergent Learning: Do different neural networks learn the same representations? In Proceedings of the 1st International Workshop on Feature Extraction: Modern Questions and Challenges at NIPS 2015, Montreal, QC, Canada, 11–12 December 2015; Volume 44, pp. 196–212. [Google Scholar]
  32. Chowers, R.; Weiss, Y. What do CNNs learn in the first layer and why? a linear systems perspective. In Proceedings of the 40th International Conference on Machine Learning, JMLR.org, ICML’23, Honolulu, HI, USA, 23–29 July 2023. [Google Scholar]
  33. Al-Khazraji, L.R.; Abbas, A.R.; Jamil, A.S. The Effect of Changing Targeted Layers of the Deep Dream Technique Using VGG-16 Model. Int. J. Online Biomed. Eng. 2023, 19, 34–47. [Google Scholar] [CrossRef]
  34. Sahu, P.; Chug, A.; Singh, A.P.; Singh, D. Classification of crop leaf diseases using image to image translation with deep-dream. Multimed. Tools Appl. 2023, 82, 35585–35619. [Google Scholar] [CrossRef]
  35. Scott, D. Performance analysis framework for embedded video-tracking systems. In Proceedings of the Acquisition, Tracking, Pointing, and Laser Systems Technologies XXV, Orlando, FL, USA, 25–26 April 2011; Volume 8052, pp. 63–70. [Google Scholar] [CrossRef]
  36. Stone, L.; Barlow, C.; Corwin, T. Bayesian Multiple Target Tracking; Artech House: Norwood, MA, USA, 1999. [Google Scholar]
  37. Gural, P.; Larsen, J.; Gleason, A. Matched Filter Processing for Asteroid Detection. Astron. J. 2005, 130, 1951–1960. [Google Scholar] [CrossRef]
  38. Zhang, T.; Li, M.; Zuo, Z.; Yang, W.; Sun, X. Moving dim point target detection with three–dimensional wide–to–exact search directional filtering. Pattern Recognit. Lett. 2007, 28, 246–253. [Google Scholar] [CrossRef]
  39. Mazurek, P. Track–Before–Detect Filter Banks for Noise Object Tracking. Int. J. Electron. Telecommun. 2013, 59, 325–330. [Google Scholar] [CrossRef]
  40. Mazurek, P.; Krupinski, R. Monte Carlo Analysis of Local Cross–Correlation ST–TBD Algorithm. In Proceedings of the Computational Science—ICCS 2019, Faro, Portugal, 12–14 June 2019; Rodrigues, J.M.F., Cardoso, P.J.S., Monteiro, J., Lam, R., Krzhizhanovskaya, V.V., Lees, M.H., Dongarra, J.J., Sloot, P.M., Eds.; Springer: Cham, Switzerland, 2019; pp. 60–70. [Google Scholar]
  41. Lai, J.; Ford, J.; O’Shea, P.; Walker, R.; Bosse, M. A Study of Morphological Pre–Processing Approaches for Track–Before–Detect Dim Target. In Proceedings of the Australasian Conference on Robotics & Automation, Canberra, Australia, 3–5 December 2008. [Google Scholar]
  42. Mazurek, P. Optimization of Bayesian Track-Before-Detect Algorithms for GPGPUs Implementations. Electr. Rev. 2010, 86, 187–189. [Google Scholar]
  43. Mazurek, P. Code reordering using local random extraction and insertion (LREI) operator for GPGPU-based Track-Before-Detect systems. Soft Comput. 2013, 18, 1095–1106. [Google Scholar] [CrossRef]
  44. Mazurek, P. Application of dot product for Track-Before-Detect tracking of noise objects. Poznań Univ. Technol. Acad. J.-Electr. Eng. 2013, 76, 101–107. [Google Scholar]
  45. Stanković, L.; Mandic, D. Convolutional Neural Networks Demystified: A Matched Filtering Perspective-Based Tutorial. IEEE Trans. Syst. Man Cybern. Syst. 2023, 53, 3614–3628. [Google Scholar] [CrossRef]
Figure 1. Sample images from the 'FGVC–Aircraft Benchmark' database. The images in the upper row represent the rejected images, whereas those in the lower row represent the accepted images in this study.
Figure 2. Example configuration of the ConvNN architecture 3 × 3 128-64-32.
Figure 3. Exemplary results for small airplanes. Noiseless reference (left–top), noised image (std. dev. = 0.02) (right–top), binary output decision (left–bottom), and detection values (right–bottom).
Figure 4. Exemplary results for large airplanes. Noiseless reference (left–top), noised image (std. dev. = 0.10) (right–top), binary output decision (left–bottom), and detection values (right–bottom).
Figure 5. Exemplary reference images and detections (red markers) based on the maximal value response of ConvNN.
Figure 6. Monte Carlo results for four detection algorithms: trivial (Max, Min, and Max Abs) and convolutional neural networks (SGDM).
Figure 7. Monte Carlo results for four detection algorithms: trivial (Max, Min, and Max Abs) and convolutional neural networks (RMSprop).
Figure 8. Monte Carlo results for four detection algorithms: trivial (Max, Min, and Max Abs) and convolutional neural networks (ADAM).
Figure 9. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 2 (3 × 3 64-32-16) of Table 1.
Figure 10. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 3 (3 × 3 32-16-8) of Table 1.
Figure 11. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 4 (5 × 5 128-64-32) of Table 1.
Figure 12. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 5 (5 × 5 64-32-16) of Table 1.
Figure 13. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 6 (5 × 5 32-16-8) of Table 1.
Figure 14. Image of the weight mask associated with the first convolutional layer of the neural network specified in row 1 (3 × 3 128-64-32) of Table 1.
Figure 15. Image of the deep dream feature for convolutional layers of the neural network specified in row 4 (5 × 5 128-64-32) of Table 1.
Figure 16. Image of the deep dream feature for convolutional layers of the neural network specified in row 5 (5 × 5 64-32-16) of Table 1.
Figure 17. Image of the deep dream feature for convolutional layers of the neural network specified in row 6 (5 × 5 32-16-8) of Table 1.
Figure 18. Image of the deep dream feature for convolutional layers of the neural network specified in row 1 (3 × 3 128-64-32) of Table 1.
Figure 19. Image of the deep dream feature for convolutional layers of the neural network specified in row 2 (3 × 3 64-32-16) of Table 1.
Figure 20. Image of the deep dream feature for convolutional layers of the neural network specified in row 3 (3 × 3 32-16-8) of Table 1.
Figure 21. Convergence of the algorithms for different ConvNN configurations.
Table 1. Configurations of the tested ConvNNs.

No.  Convolution  Conv_1 [No. Neurons]  Conv_2 [No. Neurons]  Conv_3 [No. Neurons]  Conv_4 [No. Neurons]
1    3 × 3        128                   64                    32                    2
2    3 × 3        64                    32                    16                    2
3    3 × 3        32                    16                    8                     2
4    5 × 5        128                   64                    32                    2
5    5 × 5        64                    32                    16                    2
6    5 × 5        32                    16                    8                     2
7    3 × 3        128                   128                   128                   2
8    3 × 3        64                    64                    64                    2
9    3 × 3        32                    32                    32                    2
10   5 × 5        128                   128                   128                   2
11   5 × 5        64                    64                    64                    2
12   5 × 5        32                    32                    32                    2
Table 2. Training parameters of the tested learning algorithms.

Parameter                   SGDM         RMSprop      ADAM
Initial Learn Rate          0.1          0.1          0.1
Regularization              L2           L2           L2
Momentum                    0.9          -            -
Learn Rate Drop Factor      0.98         0.98         0.98
Learn Rate Drop Period      50           50           50
Max. Epochs                 5000         5000         5000
Mini-Batch Size             150          150          150
Learn Rate Schedule         piecewise    piecewise    piecewise
Shuffle                     every-epoch  every-epoch  every-epoch
Gradient Threshold Method   L2 norm      L2 norm      L2 norm
Gradient Threshold          0.05         0.05         0.05
Table 3. Learning time (hh:mm:ss) for various configurations of the evaluated methods.

No.  ConvNN Configuration  SGDM      RMSprop   ADAM
1    3 × 3 128-64-32       02:24:45  02:24:40  02:23:57
2    3 × 3 64-32-16        02:38:54  02:00:59  02:00:59
3    3 × 3 32-16-8         02:29:15  01:50:33  01:50:31
4    5 × 5 128-64-32       03:17:58  02:35:43  02:35:08
5    5 × 5 64-32-16        02:46:17  02:08:37  02:07:32
6    5 × 5 32-16-8         02:33:34  01:53:19  01:52:45
7    3 × 3 128-128-128     04:57:54  05:07:36  05:06:19
8    3 × 3 64-64-64        03:01:41  02:16:43  02:16:20
9    3 × 3 32-32-32        02:36:08  01:56:17  01:55:04
10   5 × 5 128-128-128     05:13:55  06:07:06  06:06:12
11   5 × 5 64-64-64        03:05:48  02:28:50  02:27:06
12   5 × 5 32-32-32        02:41:30  02:01:17  02:01:01
