As the results of the detection rely heavily on the performance of the cameras during the inspections, the specifications of the cameras and some image parameters are worth mentioning. One such parameter is camera sensitivity, commonly known as ISO speed, which reflects the camera’s sensitivity to light. Lower ISO settings generally result in images with less noise or grain, particularly in well-lit conditions, producing more detailed images with smoother transitions between tones. However, higher ISO values can introduce image noise or graininess, especially in low-light conditions [
18]. For instance, the performance of neural networks in detecting defects is highly influenced by the lighting conditions under which the inspection is conducted. Research indicates that optimal defect recognition is achieved at an illumination level of approximately 200 lux. In contrast, insufficient lighting (below 150 lux) can obscure critical features of the defects, leading to missed detections or incomplete analysis. Similarly, excessive lighting (above 250 lux) can cause glare or overexposure, which may distort the visual features of the defects and result in false positives or incorrect predictions [
19]. The ISO range of the detecting devices can partially mitigate the problems posed by varying lighting conditions. By adjusting the ISO settings, the sensor’s sensitivity to light can be optimized to capture clear and detailed images, even in suboptimal lighting environments [
20]. Resolution describes how finely a camera captures and records detail and is measured by the number of horizontal and vertical pixels in each frame. Higher video resolution provides more detail and clarity, which is essential for playback on large screens and for professional video production [
21]. The aperture, on the other hand, controls the amount of light that reaches the sensor; the larger the aperture, the more light enters. In this respect, the iPhone 13 has a larger aperture (f/1.6) than the UAV camera (f/2.8).
Table 1 briefly compares the specifications of the three cameras used in this study.
2.2. Algorithm Training
In this study, the primary approach was to utilize an image detection algorithm, namely “You Only Look Once” (YOLO), to detect trained objects [
27]. In this case, the model was specifically trained to detect damage types, including cracks, scratches, dents, paint-off, and missing head nails on aircraft surfaces. The equipment used for algorithm training in this study included a computer running Windows 11, equipped with an AMD Ryzen 9 7950X 16-core processor (AMD, Santa Clara, CA, USA), an NVIDIA GeForce RTX 4070 Ti GPU (NVIDIA, Santa Clara, CA, USA) with 6 GB of VRAM, and 128 GB of RAM. Hardware from ASUS and MSI supported all model training tasks. Training large-scale models such as YOLO requires substantial computational power. CPUs, while powerful, are not optimized for parallel processing, making GPUs a necessity for efficient deep-learning tasks. The introduction of CUDA and CUDNN into the training process allowed the YOLO model to fully utilize GPU resources, significantly reducing training time and enhancing performance. These technologies, combined with PyTorch’s neural network operations, enabled the efficient handling of complex computations necessary for YOLO’s architecture.
2.2.1. Training Environment
In this study, the programming for algorithm training was conducted primarily using Visual Studio Code software, version 1.96 [
28]. Python was installed together with the Visual Studio Code Python extension and served as the interpreter used to run the “YAML” training files provided by the official YOLO sites [
29,
30]. YAML, a human-readable data serialization format, is widely used for configuration files and data interchange. It represents nested data structures with indentation, offering a concise and user-friendly alternative to formats such as JSON [
31].
In the context of deep learning and machine learning, particularly with frameworks such as YOLO, a data.yaml file serves as a configuration file written in YAML format. This file specifies dataset paths, class names, and other necessary parameters required for training the model [
31,
32]. The data.yaml file used in this study defined the location of the data for training, validation, and testing as well as the classes included in the dataset.
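For illustration, a data.yaml file of this kind, with hypothetical dataset paths and the five classes used in this study, could look as follows (a sketch, not the exact file used by the authors):

# Dataset locations (illustrative paths)
train: ../dataset/images/train
val: ../dataset/images/val
test: ../dataset/images/test

# Number of classes and class names
nc: 5
names: ["crack", "dent", "missing-head", "paint-off", "scratch"]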
The training process relied heavily on YAML files to define how the algorithm would learn from the dataset. Ultralytics (La Jolla, CA, USA) provides pre-configured training materials for various YOLO versions, including YOLOv5 Nano (yolov5n.yaml), YOLOv8 Nano (yolov8n.yaml), YOLOv9 Compact (yolov9c.yaml), and YOLOv9 Enhanced (yolov9e.yaml). These configuration files were adapted to meet the specific requirements of this study [
29].
2.2.2. Code and Computation
The code for training was originally provided by Ultralytics [
29]; it enabled custom training of the YOLO model, either from scratch or using pre-trained weights. Weights represent the parameters in a neural network that transform input data through the network’s layers. During training, the model adjusts these weights to encode the knowledge learned from the dataset [
33,
34]. However, training with a large-scale database is time-intensive and demands significant computational resources [
35,
36]. To address this, GPUs and CUDA were employed to leverage parallel processing capabilities, reducing training time and computational load compared to CPUs [
31,
37,
38]. After installing Python 3.10, essential libraries including PyTorch [
39], TorchVision, OpenCV [
40], NVIDIA CUDA, and CUDNN were configured to enable GPU acceleration during training [
36,
41]. These tools significantly enhanced model performance by optimizing computations on the GPU [
42].
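Before launching training, a short check of this kind, assuming a standard PyTorch installation with CUDA support (a sketch, not the authors’ code), can confirm that GPU acceleration is actually available:

import torch

# Verify that PyTorch can see the CUDA-capable GPU before launching training.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("CUDA version:", torch.version.cuda)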
Figure 4 displays the YOLOv9 training code used for the training session in this study.
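As a point of reference, a minimal sketch of such a training call, assuming the standard Ultralytics Python API and illustrative file names and parameter values, is shown below; the code actually used in this study is the one displayed in Figure 4.

from ultralytics import YOLO

# Build a YOLOv9 Compact model from its configuration file (illustrative path).
model = YOLO("yolov9c.yaml")

# Train on the custom dataset described by data.yaml; parameter values are examples only.
model.train(
    data="data.yaml",   # dataset paths and class names
    epochs=500,         # number of training epochs
    batch=16,           # training samples per iteration
    imgsz=640,          # input resolution
    device=0,           # index of the CUDA GPU to use
)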
2.2.3. Training Parameters and Loss Functions
Several parameters were crucial in defining the training process. The “epochs” parameter determined the number of complete passes over the training data, allowing the model to improve with each cycle, provided adequate resources were available. The “batch size” parameter controlled the number of training samples processed per iteration. Smaller batch sizes can result in noisy gradient updates but may improve generalization, while larger batch sizes make training more computationally efficient but require more memory and may generalize less well [
32,
34,
39,
43].
Loss functions quantified the model’s performance during training. Three loss components—box loss, class loss, and DFL loss—were employed. Box loss measured the difference between predicted and ground truth bounding boxes using metrics such as Intersection over Union (IoU), defined in Equation (1) [
44,
45]:
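For reference, the commonly cited form of IoU, consistent with the description above, is

$$\mathrm{IoU} = \frac{|B_{p} \cap B_{gt}|}{|B_{p} \cup B_{gt}|},$$

where $B_{p}$ is the predicted bounding box, $B_{gt}$ is the ground-truth bounding box, and $|\cdot|$ denotes area.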
Class loss quantified errors in object classification within bounding boxes. Cross-entropy loss, the most commonly used metric, is expressed in Equation (2) [
33]:
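For reference, the common multi-class form of cross-entropy loss is

$$L_{cls} = -\sum_{i=1}^{C} y_{i} \log(\hat{p}_{i}),$$

where $C$ is the number of classes, $y_{i}$ is the ground-truth indicator for class $i$, and $\hat{p}_{i}$ is the predicted probability for that class.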
DFL loss (Distribution Focal Loss) emphasized hard-to-classify samples by assigning them higher weights, improving the model’s learning on challenging cases. Its formula is given in Equation (3) [
46]:
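For reference, the Distribution Focal Loss is commonly written as

$$\mathrm{DFL}(S_{i}, S_{i+1}) = -\big((y_{i+1} - y)\log(S_{i}) + (y - y_{i})\log(S_{i+1})\big),$$

where $y$ is the continuous regression target, $y_{i}$ and $y_{i+1}$ are the two nearest discrete values bracketing $y$, and $S_{i}$ and $S_{i+1}$ are the probabilities the model assigns to them.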
The objective during training was to minimize these loss values, as lower losses indicate better model performance. A lower loss value indicates that the model’s predictions are closer to the actual values, signifying better performance.
2.2.4. Training Outputs
Upon completing training, two output files in PyTorch format (“.pt”)—“best.pt” and “last.pt”—were generated. These files store the model’s architecture, parameters, and additional information required to recreate the model’s state [
46]. The “best.pt” file, which represents the model with the highest performance, was selected for subsequent processes.
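As an example of how such a file can be reused, a minimal sketch, assuming the Ultralytics Python API and an illustrative output path and image name, is the following:

from ultralytics import YOLO

# Load the best-performing weights produced by the training run (illustrative path).
model = YOLO("runs/detect/train/weights/best.pt")

# Run inference on a sample image; the result contains predicted boxes and classes.
results = model("sample_aircraft_surface.jpg")
results[0].show()  # display the annotated detection result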
Figure 5 illustrates the terminal output during model training in Visual Studio Code, which provides information on the loss values, GPU memory usage, and the calculated mAP values.
2.3. YOLO Model
Ultralytics provided several YOLO model options, each designed to address specific detection needs. For instance, YOLOv8 Nano is optimized for lightweight hardware environments, offering reduced computational requirements with only 3.2 M parameters and 8.7 FLOPs while achieving an mAPval50–95 of 37.3. YOLOv9 Compact (YOLOv9c) balances performance and efficiency, with 25.5 M parameters and 102.8 FLOPs at an mAPval50–95 of 53.0, making it suitable for constrained resources. The YOLOv9 Enhanced model provides the highest accuracy (mAPval50–95 of 55.6) but at the cost of significantly increased parameters (58.1 M) and FLOPs (192.5), making it ideal for high-performance tasks. Meanwhile, YOLOv5 Nano, the least demanding model, achieves an mAPval50–95 of 28.0 with only 1.9 M parameters and 4.5 FLOPs, suitable for resource-limited real-time applications.
Table 3 summarizes these trade-offs, allowing users to select a model based on hardware constraints and performance needs [
29].
The mean average precision (mAP) value is a metric used to evaluate the performance of object detection models, along with metrics such as precision, recall, and average precision [47,48]. The calculations used to evaluate the model’s performance are shown in Equations (4)–(7) [48].
- (1) Precision (P): The number of true positive detections divided by the total number of detections,
$$P = \frac{TP}{TP + FP},$$
where TP is true positives and FP is false positives.
- (2) Recall (R): The number of true positive detections divided by the total number of ground-truth instances,
$$R = \frac{TP}{TP + FN},$$
where FN is false negatives.
- (3) Average Precision (AP): The area under the precision–recall curve for a single class,
$$AP = \sum_{n} (R_{n} - R_{n-1})\, P_{n},$$
where $R_{n}$ and $P_{n}$ are the recall and precision at the nth threshold.
- (4) Mean Average Precision (mAP): The mAP is calculated as the average of the average precision (AP) values across all classes,
$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_{i},$$
where N is the number of classes and $AP_{i}$ is the average precision for the ith class.
The mAP@0.5 metric evaluates the model’s performance at a fixed IoU threshold of 0.5. A detection is considered correct if the IoU between the predicted bounding box and the ground truth bounding box exceeds this threshold. This metric reflects the average precision across all classes under the specified IoU constraint [
48].
The models underwent an experimental session using a custom database comprising five classes: crack, dent, missing-head, paint-off, and scratch. This experiment aimed to identify the most suitable models for the training material in this study, using the test devices listed in
Table 1 and the training equipment described in
Section 2.2. Given our limited GPU resources, the experiment also served to verify the practical performance and compatibility of each model within the constraints of our hardware.
Figure 6 and
Figure 7 demonstrate the performance scores, in terms of mAP@0.5 and recall, of the different YOLO models tested under the described environment for 500 epochs.
As shown in
Figure 6, the training results demonstrate the differences in mAP@0.5 performance among various YOLO models. YOLOv9 Compact (Red Line) scored the highest mAP@0.5, achieving over 0.7 and remaining relatively stable after 200 epochs, suggesting the best and most reliable performance among the tested models. YOLOv8 Nano and YOLOv5 Nano also remained very stable but at lower mAP values compared to the YOLOv9 variants. The lowest score was from YOLOv5 Nano (Orange Line), which had the lowest mAP@0.5 among the compared models, reaching around 0.55, suggesting it is less accurate than the others.
In
Figure 7, the chart illustrates the recall performance of different YOLO models. While all models show significant growth in the early epochs, YOLOv9 Compact (red line) consistently outperforms other models in recall throughout the training period, resulting in approximately a 0.7 recall value. YOLOv9 Enhanced (blue line) and YOLOv8 Nano (green line) show similar performance, with YOLOv9 Enhanced slightly edging out YOLOv8 Nano in the later epochs, though they still scored somewhat lower in this custom training.
Table 4 summarizes the evaluation metrics across all the models.
Table 4 presents data collected at epoch 300. This decision was made because the training process for YOLOv9 Enhanced was set to conclude at epoch 500, while the others were set to finish at epoch 1000. Collecting data at epoch 300 ensured that the early completion of the training did not affect the metric results, as shown in
Figure 6 and
Figure 7. According to the official data regarding the performance of the models [
49], YOLOv9 Enhanced was expected to outperform YOLOv9 Compact. However, several factors can affect performance, with the processing unit used during training being crucial, especially since YOLOv9 Enhanced requires extremely high FLOPs [
49].
As shown in
Figure 6 and
Figure 7 and
Table 4, the YOLOv9 Compact (red line) performed the best among the tested models. In the next phase of the experiment, YOLOv9 Compact will undergo further training with custom data.
This experiment was conducted using yolov9-c.yaml, the most suitable model for this study, provided by Ultralytics [
50]. The model introduces powerful new capabilities and efficiencies through techniques such as Programmable Gradient Information (PGI), which optimizes gradient flow during the training of deep neural networks to improve convergence, prevent vanishing or exploding gradients, and enhance training efficiency. It also incorporates the Generalized Efficient Layer Aggregation Network (GELAN), which increases efficiency and effectiveness through efficient layer aggregation, generalization across tasks, and a reduction in computational complexity [
49].
According to yolov9.yaml, the YOLOv9 architecture is divided into two main parts: the “backbone” and the combination of the traditional “neck” and “head” of the neural network, simply defined as “head” [
49]. Several modules are found in the source code, including “Conv Module”, “RepConvN Module”, “RepNBottleneck Module”, “RepNCSP Module”, “RepNCSpELAN4 Module”, “CBLinear Module”, and “CBFuse Module.” Each module in this architecture plays a specific role in processing and transforming the input data, enhancing the model’s performance in tasks such as object detection and recognition. The functions of these modules are as follows:
Conv Module: This module comprises three components: nn.Conv2d, nn.BatchNorm2d, and an activation function (Act). The nn.Conv2d layer applies convolutional operations to the input data, nn.BatchNorm2d normalizes the output to speed up training and improve stability, and the activation function introduces non-linearity to the model [51,52], as shown in Figure 8a.
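A minimal PyTorch sketch of such a block, written here for illustration rather than taken from the YOLOv9 source code, and assuming SiLU as the activation (as is common in YOLO implementations), is the following:

import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Illustrative Conv module: Conv2d -> BatchNorm2d -> activation."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)  # normalizes activations for stable training
        self.act = nn.SiLU()  # non-linearity; assumed here for illustration

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: halve a 640x640 input, as the early backbone layers do.
x = torch.randn(1, 3, 640, 640)
y = ConvBlock(3, 64, kernel_size=3, stride=2)(x)  # -> (1, 64, 320, 320)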
RepConvN Module: The module includes two convolutional layers and one batch normalization layer (bn), followed by an activation function (Act). The structure improves the efficiency and performance of the model by combining multiple convolutional operations [
53,
54], as shown in
Figure 8b.
RepNBottleneck Module: This module makes use of a bottleneck architecture with a RepConvN block followed by a convolutional layer (Conv). This design helps to reduce the number of parameters and computational complexity while maintaining or improving the model’s accuracy [
55], as shown in
Figure 9a.
RepNCSP Module: This module consists of multiple RepNBottleneck blocks and a concatenation operation (Concat). The structure is designed to merge feature maps from different layers, enhancing the model’s ability to capture complex patterns [
56], as shown in
Figure 9b.
RepNCSpELAN4 Module: This module integrates several components: a convolutional layer (Conv), a chunk operation, two RepNCSP + Conv blocks, and a concatenation operation (Concat). This module outputs the processed data to subsequent layers. It is designed for efficient layer aggregation and improved feature learning [
57], as shown in
Figure 10a.
CBLinear Module: This module includes a convolutional layer (Conv), a split operation, and an element named A0 that outputs tensors. The structure is tailored to split the input data into multiple tensors for further processing [
58], as shown in
Figure 10b.
CBfuse Module: The CBfuse Module combines multiple interpolated inputs. Each input is first interpolated to a common size, then stacked together and finally summed. This approach helps in fusing features from different scales or resolutions, improving the overall feature representation [
59], as shown in
Figure 11.
Figure 12 represents the architecture of a neural network backbone for YOLOv9 object detection, explicitly showing how different layers process the input data and pass the information through the network. The input to the network has a resolution of 640 × 640. In the first stage of the process, Layer 0 processes the input and directly outputs to Layer 1 and Layer 26, the “head” section, without altering the resolution. Layers in pink are convolutional layers that process and reduce the resolution by half, resulting in 320 × 320 pixels in Layer 1, which then outputs to Layer 2. Similarly, Layer 2 further reduces the resolution to 160 × 160 pixels and outputs to Layer 3 [
10,
60]. Layers in blue utilize the RepNCSpELAN4 block, maintaining the resolution while merging feature maps from different layers, enhancing the model’s ability [
56]. In Layer 5, another RepNCSpELAN4 block with parameters [512, 256, 128, 1] keeps the resolution unchanged and outputs to Layer 6 and directly to Layer 24. It also concatenates with the upsampled Layer 13 at a further level. Resolution reduction and up-scaling procedures are fundamental to object detection models in neural networks, allowing the models to balance detail and semantics and ensuring robustness and efficiency [
51,
52].
Figure 13 and
Figure 14 show that the input undergoes several stages with different modules to further enhance the model’s ability. At the end of the architecture, the final layer, “DualDDetect”, takes inputs from previous layers: 31, 34, 37, 16, 19, and 22. It combines features extracted at different resolutions and stages of the network, improving detection performance.
2.4. RTMP Server Construction
There are many options available for servers that allow for real-time image transfer. However, since the DJI application compatible with the UAV model used in this study only supports RTMP and RTSP transmission, the focus was on developing an RTMP server.
The RTMP server was constructed using NGINX (1.22.0) software, which allows users to create a virtual server on a device, running on a Raspberry Pi 4 [61] on which Ubuntu 20.04, a Debian-based operating system [62], was installed. The installation procedure was as follows:
Install Ubuntu 20.04 on Raspberry Pi 4:
- Download the Ubuntu 20.04 image from the official Ubuntu website [62].
- Use Balena Etcher to flash the Ubuntu image onto a microSD card.
- Insert the microSD card into the Raspberry Pi 4 and power it on.
- Follow the on-screen instructions and complete the Ubuntu installation.
After successfully installing Ubuntu, the system was ready for the NGINX installation. Since the updated version of NGINX no longer supports RTMP servers as a built-in function, additional modules were used to create the server.
Install and configure the RTMP module for NGINX:

sudo apt update
sudo apt install libnginx-mod-rtmp
The following configuration was added:
rtmp {
    server {
        listen 1935;
        chunk_size 4096;
        application live {
            live on;
        }
    }
}
The listen 1935 directive allows the RTMP server to be accessed via listening port 1935. The NGINX service was then reloaded to apply the configuration:
sudo systemctl reload nginx.service
Figure 15 shows the terminal output confirming that the server was successfully activated.
After the configuration, the server was ready for internal use. The server URL is “rtmp://IP:1935/live/streamkey” (accessed on 18 May 2024), where the device’s IP address can be checked using the command “ifconfig”. The stream key can be configured according to the user’s needs. The next step required allowing external access to port 1935 by enabling port forwarding on the router [
63].
Port Forwarding:
Access your router’s web interface.
Navigate to the port forwarding section.
Create a new port forwarding rule:
- Service Name: RTMP Server;
- Protocol: TCP;
- External Port: 1935;
- Internal IP Address: (IP address of your Raspberry Pi);
- Internal Port: 1935.
Save the settings.
At this point, the RTMP server can be accessed without needing to be on the same network.
The flowchart in Figure 16 explains how the connection network of the system works.
In the setup shown in
Figure 16, the images captured by the UAV were transferred to the controller and then pushed to the RTMP server (1) with the pre-written Python code on the ground station that utilizes OpenCV’s video capture capability [
29,
64] using the following command:
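A minimal sketch of what such a capture call can look like, assuming OpenCV’s standard VideoCapture interface and a placeholder server address and stream key (not the authors’ exact code), is:

import cv2

# Placeholder address of RTMP server (1); "streamkey" is the user-defined stream key.
SOURCE_URL = "rtmp://IP:1935/live/streamkey"

cap = cv2.VideoCapture(SOURCE_URL)  # open the incoming RTMP stream
while cap.isOpened():
    ret, frame = cap.read()  # grab one frame from the stream
    if not ret:
        break
    # frame is then passed to the YOLO model for detection
cap.release()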
The image can be streamed and used in the detection process in real time. The processed image is then pushed to the RTMP server (2), from which it can be streamed to multiple IoT devices. The code requires the installation of FFmpeg (version 0.4.9-pre1) software [
65], and the code is given by the following:
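A sketch of what such an FFmpeg push can look like, assuming annotated frames are piped from Python to an ffmpeg subprocess and using placeholder resolution, frame rate, and output URL (not the authors’ exact code), is:

import subprocess

# Hypothetical output URL for RTMP server (2); adjust to the actual stream key.
OUTPUT_URL = "rtmp://IP:1935/live/detected"
WIDTH, HEIGHT, FPS = 1280, 720, 30

# FFmpeg process that reads raw BGR frames from stdin and pushes them as an FLV/RTMP stream.
ffmpeg_cmd = [
    "ffmpeg",
    "-f", "rawvideo",
    "-pix_fmt", "bgr24",
    "-s", f"{WIDTH}x{HEIGHT}",
    "-r", str(FPS),
    "-i", "-",                 # read frames from stdin
    "-c:v", "libx264",
    "-preset", "ultrafast",
    "-tune", "zerolatency",
    "-f", "flv",
    OUTPUT_URL,
]
ffmpeg_proc = subprocess.Popen(ffmpeg_cmd, stdin=subprocess.PIPE)

# Inside the detection loop, each annotated BGR frame is written to FFmpeg's stdin:
# ffmpeg_proc.stdin.write(frame.tobytes())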
The ffmpeg command in
Figure 17 enables the Python code to push the detection results to the RTMP server (2) within the same session in which the incoming images are received and processed [
65].