Applied Sciences
  • Article
  • Open Access

3 July 2023

Vision-Based Automatic Collection of Nodes of In/Off Block and Docking/Undocking in Aircraft Turnaround

1 College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2 China Academy of Civil Aviation Science and Technology, Beijing 100028, China
3 Capital Airports Holding Limited of China, Beijing 100020, China
* Author to whom correspondence should be addressed.
This article belongs to the Section Aerospace Science and Engineering

Abstract

The automatic collection of key milestone nodes in the process of aircraft turnaround plays an important role in meeting the development needs of airport collaborative decision-making. This article develops a computer vision-based framework to automatically recognize the activities of flight in/off-block and docking/undocking and to record the corresponding key milestone nodes. The proposed framework, which seamlessly integrates state-of-the-art algorithms and techniques in the field of computer vision, comprises two modules for the preprocessing and collection of key milestone nodes. The preprocessing module extracts the spatiotemporal information of the executors of key milestone nodes from the complex background of the airport ground. In the second module, for two categories of key milestone nodes, namely, single-target-based nodes represented by in-block and off-block and two-target-interaction-based nodes represented by docking and undocking stairs, two methods for the collection of key milestone nodes are designed. Two datasets are constructed for the training, testing, and evaluation of the proposed framework. Results of field experiments demonstrate how the proposed framework can contribute to the automatic collection of these key milestone nodes by replacing the manual recording method routinely used today.

1. Introduction

Aircraft turnaround can be defined as the process that unfolds from unloading an aircraft after its arrival until the departure of that aircraft as a new flight [1]. Depending on the regulations of Airport Collaborative Decision Making (A-CDM), the turnaround process can be quantified into a series of key milestone nodes (KMNs) [2,3]. The first KMN of an aircraft’s turn, referred to as the in-block node, occurs when the aircraft arrives at the parking position and the chocks are set. The turnaround process ends when the aircraft is ready to depart; this end-time node is called the off-block node. In the period from the in-block time to the off-block time, there are several KMNs (shown in Figure 1) corresponding to different flight ground handling processes, such as catering, fueling, cabin opening and closing, and passenger deplaning and boarding. In recent years, the International Civil Aviation Organization (ICAO) and civil aviation authorities around the world have been vigorously improving the efficiency and resilience of airport operations by optimizing the use of resources and improving the predictability of air traffic and airport ground activities [4]. KMNs play a considerable role in airport planning, including schedule planning, fleet planning, and operation planning [5]. Therefore, collecting and recording node information by adopting automatic approaches can also improve the efficiency and quality of aircraft turnarounds. In this paper, a computer vision-based framework is established for the automatic collection of KMNs during the process of aircraft turnaround. The proposed framework automatically identifies the flight in/off-block and docking/undocking activities and records the corresponding KMNs, replacing the manual recording method used daily.
Figure 1. KMNs in aircraft turnaround. The nodes in the dashed rectangle are parallel processes.
Existing systems of airport ground surveillance enable the monitoring of moving and parked targets mainly through target-positioning systems [6,7], such as Automatic Dependent Surveillance-Broadcast (ADS-B), Surface Movement Radar (SMR), and Multilateration (MLAT). The aforementioned systems, which mainly identify and label airport ground targets as point signals, are unable to effectively capture and extract semantic features and detailed information about moving and parked targets, and it is therefore difficult to use these systems to identify flight ground handling processes and collect corresponding KMNs. In recent years, computer vision-based surveillance systems have gained substantial popularity [8,9]. Two computer vision-based systems for airport ground surveillance, namely, INTERVUSE [10] and AVITRACK [11], were developed in the European Union at the turn of the century. With the development of the digital tower concept over the last 20 years, new computer vision-based systems for airport ground surveillance have been developed [12]. In brief, computer vision-based systems have three advantages in airport ground surveillance. First, computer vision-based systems have lower equipment investment costs than ADS-B, SMR, and MLAT. Second, unlike ADS-B and MLAT for cooperative target surveillance and SMR for noncooperative target surveillance, computer vision-based systems can monitor both cooperative and noncooperative ground targets. Third, computer vision-based surveillance uses vision sensors to capture images with rich semantic and detailed information, which is required for the identification of ground handling and corresponding KMNs.
From a methodological perspective of KMN collection by using computer vision, we roughly divide the KMNs listed in Figure 1 into two categories. The first category is the KMNs based on a single target, such as In-block, Off-block, Cabin doors open/close, and Cargo doors open/close. To address the problems of collecting KMNs of this category, the core task is to determine the moments when the position or state of the target changes through computer vision techniques. The other category is KMNs based on the interaction of two targets, such as Docking/Undocking stairs, Catering start/completion, and Refueling start/completion. The key to solving the problem of collecting KMNs of the second category is to determine the moment of shift in the relative position between the subject and the object of a particular action. In addition, several KMNs in Figure 1 are outside the scope of the proposed framework for the following reasons. First, several KMNs cannot be captured comprehensively and entirely by ground surveillance cameras because the activities corresponding to these KMNs do not take place, or only partially take place, on the airport ground, e.g., Cabin cleaning and Maintenance inspection confirmation started/completed. Second, the boundaries of several KMNs are difficult to determine by computer vision approaches, e.g., passenger deplaning/boarding. To this end, this paper aims to develop a framework for the automatic collection of KMNs in the process of aircraft turnaround based on apron surveillance videos. The proposed framework can provide a general methodological architecture for the automatic collection of both categories of KMNs. The main work and contributions of this paper are summarized as follows.
(1) A computer vision-based framework for the automatic collection of KMNs is proposed. The framework consists mainly of two modules: a preprocessing module and a general methodological architecture for automatic collection.
(2) To extract the spatiotemporal information of the executors of KMNs from the complex background of the airport ground, the preprocessing module is established by improving the YOLOv5 detector with a different bounding box regression loss function and by integrating methods for the prediction and association of each executor’s positions in consecutive frames.
(3) Based on the different characteristics of the KMNs, this paper divides them into two categories, i.e., nodes based on a single target and nodes based on the interaction of two targets, and two architectures are proposed for collecting nodes of in/off-block and docking/undocking, respectively.
Moreover, for the training, testing, and evaluation of the proposed framework and related algorithms, related datasets captured at Beijing Capital International Airport were constructed. To the best of our knowledge, these datasets are the first dedicated to the automatic collection of KMNs.
The remainder of the paper is organized as follows. Section 2 introduces related works. The dataset and the main components of the proposed framework are described in Section 3. Section 4 and Section 5 detail the proposed framework. Section 6 reports on the related experiments. The conclusions and future work are summarized in Section 7.

3. Dataset Collection

To complete the training and testing of the algorithms and the proposed framework, two datasets are constructed and presented. The first dataset consists of a large number of single-frame images, with the GSE and aircraft in each image manually labeled using bounding boxes, and is used to train and evaluate the detection and recognition algorithm. The second dataset consists of video sequences of ground surveillance recorded at different times on two fixed stands at Beijing Capital International Airport and is used to evaluate the object prediction and association algorithm as well as the collection of KMNs. In the second dataset, the bounding boxes and distinct IDs of each GSE and aircraft in each frame are also manually labeled for the evaluation of object prediction and association. Moreover, in the top left corner of each frame, the time (to the second) at which the frame was captured is displayed as the label used to evaluate the collection of KMNs. The relationship between the datasets and each component of the proposed framework is shown in Figure 2.
Figure 2. Relationship between components of the proposed framework and our datasets.

3.1. Dataset Comprising Single Images

To detect and identify executors of KMNs during the turnaround process, a dataset consisting of 3175 images was first constructed. The dataset mainly contains civil aircraft and mobile aircraft landing stairs. Their numbers are presented in Figure 3a. To reflect a realistic airfield operational environment, images of the dataset cover several special cases, such as multi-scale changes in the targets, different visibilities, and occlusions. All targets are labeled manually, and each ground truth includes the location of the bounding boxes and their class names. The location and scale distribution of the objects in the dataset are shown in Figure 3b. This figure shows that the objects to be detected are mainly distributed in the central region of the image, and the scale distribution of the instances is relatively uniform.
Figure 3. Sample number of different categories and the location and scale distributions of the sample in the dataset. (a) Sample numbers of different categories. (b) Location and scale distributions of the samples in the dataset. For the location distribution (left), xc and yc are the horizontal and vertical coordinates of the center point of the bounding box of each object, respectively. W and H are the width and height of the image, respectively. For the scale distribution (right), w and h are the width and height of the bounding box of the object, respectively.

3.2. Dataset Comprising Video Sequences

Unlike the aforementioned dataset, which is an annotation of a single image, the second dataset is an annotation of a video sequence. Thus, in addition to annotating the bounding boxes and categories of the objects, this dataset also requires an ID number to be assigned to each target, with the same target having a constant ID in the sequence. This ID number is used to determine whether the same target has been successfully associated in different frames. Moreover, the dataset is used for the collection of KMNs and contains four KMN types, namely, aircraft in-block and off-block and the docking and undocking of mobile aircraft landing stairs.
Original videos of this dataset were captured by the surveillance cameras of two remote stands (No. 734 and No. 939) at Beijing Capital International Airport, shown in Figure 4. The field of view of the camera at No. 734 is much larger than that of the camera at No. 939. The dataset contains 42 individual video sequences; Table 2 presents their distribution. The frame index corresponding to the time of occurrence of each KMN is manually recorded as the ground truth to evaluate the proposed framework. The frame rate of the original video is 30 frames per second (fps). Because the aircraft and the various working vehicles move relatively slowly in the surveillance video, lowering the frame rate does not affect the performance of the proposed framework, while it reduces memory consumption and improves operational efficiency to a certain extent. To this end, the original surveillance video is sampled at 5 fps, and each sequence of the dataset contains an average of 300 frames with a resolution of 1920 × 1080 pixels.
Figure 4. Two remote stands at Beijing Capital International Airport (IATA: PEK, ICAO: ZBAA) where original videos of the dataset were captured.
Table 2. Distribution of video sequences in the dataset.
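For reference, a minimal OpenCV sketch (not the tooling used by the authors) of the 30 fps to 5 fps downsampling step described above might look as follows; the file names are hypothetical.

```python
import cv2

def downsample_video(src_path: str, dst_path: str, keep_every: int = 6) -> None:
    """Keep every 6th frame to turn a 30 fps surveillance video into a 5 fps one."""
    cap = cv2.VideoCapture(src_path)
    fps_out = cap.get(cv2.CAP_PROP_FPS) / keep_every          # 30 / 6 = 5 fps
    size = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))           # e.g., 1920 x 1080
    writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps_out, size)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % keep_every == 0:                              # keep 1 of every 6 frames
            writer.write(frame)
        idx += 1
    cap.release()
    writer.release()

# Hypothetical usage:
# downsample_video("stand_734_raw.mp4", "stand_734_5fps.mp4")
```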

4. Preprocessing Module

The purpose of the preprocessing module is to extract spatiotemporal information, including the identity, position, and movement states of executors of KMNs from the surveillance video, thus providing the necessary inputs for subsequent tasks of the collection of KMNs. The preprocessing module consists of three main parts: detection and recognition, prediction based on the filter, and association. Figure 5 is a flowchart of the preprocessing module.
Figure 5. Flowchart of the preprocessing module.

4.1. Detection and Recognition

The identity and location information of the executors and participants of the KMNs are prerequisites for the collection of KMNs. A sub-module is thus required to accurately detect and identify targets in specific areas of the airfield in real time. The proposed framework utilizes YOLOv5 as the basic framework for GSE and aircraft detection and classification. YOLOv5 uses the architecture of Darknet53 with the Focus structure and Cross Stage Partial Network as the backbone, PANet and an SPP block as the neck, and a head containing three branches corresponding to outputs at three different scales [31]. The loss function of YOLOv5 consists of three parts and can be expressed as follows:
$$ L = L_{cls} + L_{obj} + L_{reg}, $$
where the classification loss $L_{cls}$ is a cross-entropy loss used for classification and recognition, the confidence loss $L_{obj}$ is a binary cross-entropy loss used to determine whether an object exists within a predicted bounding box, and the regression loss $L_{reg}$ is used for bounding box regression. Considering that $L_{reg}$ has a large impact on the positioning accuracy of the target, we compared four different methods for computing $L_{reg}$: IoU-loss $L_{IoU}$, Distance IoU-loss (DIoU-loss) $L_{DIoU}$, Generalized IoU-loss (GIoU-loss) $L_{GIoU}$, and Complete IoU-loss (CIoU-loss) $L_{CIoU}$:
$$ IoU = \frac{|B \cap B^{gt}|}{|B \cup B^{gt}|}, \quad L_{GIoU} = 1 - IoU + \frac{|E \setminus (B \cup B^{gt})|}{|E|}, \quad L_{DIoU} = 1 - IoU + \frac{(x_c - x_c^{gt})^2 + (y_c - y_c^{gt})^2}{L^2}, \quad L_{CIoU} = L_{DIoU} + \frac{v^2}{(1 - IoU) + v}, $$
where $(x_c, y_c)$ and $(x_c^{gt}, y_c^{gt})$ are the coordinates of the center points of the detected bounding box $B$ and the corresponding ground truth $B^{gt}$, respectively, $E$ is the smallest enclosing box covering the detected bounding box and the ground truth, $L$ is the diagonal length of $E$, and $v$ is defined as [29]
$$ v = \frac{4}{\pi^2} \left( \arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h} \right)^2, $$
where $w^{gt}$ and $h^{gt}$ are the width and height of $B^{gt}$, respectively, and $w$ and $h$ are the width and height of $B$, respectively. Through the comparative experiment (related results are shown in Table 4), CIoU-loss is used in the proposed framework to improve the positioning accuracy of the object. CIoU-loss is chosen as the bounding box regression loss because it takes into account two geometrical factors: the distance between the central points of the bounding boxes and the aspect ratio of the bounding box.
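As an illustration of the four regression losses compared above, the following Python sketch evaluates $L_{IoU}$, $L_{GIoU}$, $L_{DIoU}$, and $L_{CIoU}$ for a single pair of axis-aligned boxes in $(x_c, y_c, w, h)$ form. It follows the equations above and is not the authors' training code; in practice these losses are computed on tensors inside the YOLOv5 training loop.

```python
import math

def _corners(b):
    """(xc, yc, w, h) -> (x1, y1, x2, y2)."""
    xc, yc, w, h = b
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

def iou_losses(box, box_gt):
    """Return (L_IoU, L_GIoU, L_DIoU, L_CIoU) for two boxes in (xc, yc, w, h) form."""
    x1, y1, x2, y2 = _corners(box)
    g1, g2, g3, g4 = _corners(box_gt)
    # intersection and union areas
    iw = max(0.0, min(x2, g3) - max(x1, g1))
    ih = max(0.0, min(y2, g4) - max(y1, g2))
    inter = iw * ih
    union = box[2] * box[3] + box_gt[2] * box_gt[3] - inter
    iou = inter / union
    # smallest enclosing box E, its area, and its diagonal length squared
    ex1, ey1 = min(x1, g1), min(y1, g2)
    ex2, ey2 = max(x2, g3), max(y2, g4)
    e_area = (ex2 - ex1) * (ey2 - ey1)
    diag_sq = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # squared distance between the two box centers
    center_dist_sq = (box[0] - box_gt[0]) ** 2 + (box[1] - box_gt[1]) ** 2
    # aspect-ratio consistency term v from the equation above
    v = (4 / math.pi ** 2) * (math.atan(box_gt[2] / box_gt[3]) - math.atan(box[2] / box[3])) ** 2
    l_iou = 1 - iou
    l_giou = 1 - iou + (e_area - union) / e_area
    l_diou = 1 - iou + center_dist_sq / diag_sq
    l_ciou = l_diou + v ** 2 / ((1 - iou) + v)
    return l_iou, l_giou, l_diou, l_ciou

# Hypothetical usage with a detection slightly offset from its ground truth:
# print(iou_losses((105, 102, 60, 40), (100, 100, 60, 40)))
```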

4.2. Position Prediction

Detection obtains the locations of each GSE and aircraft in a single frame, and associating the positions of the same target in consecutive frames is an indispensable precondition for extracting the spatiotemporal information of each target. Predicting the target position in the following frame based on the target position in the current frame can both reduce the search range and improve the accuracy of the match during association. To predict the target position, a state vector based on the detection results for a single frame is initialized [33]: $x = [x_c, y_c, \delta, h, \Delta x_c, \Delta y_c, \Delta\delta, \Delta h]^T$, where $\delta = w/h$, $[x_c, y_c, \delta, h]$ denotes the position and shape of the bounding box, and $[\Delta x_c, \Delta y_c, \Delta\delta, \Delta h]$ represents the change of state of the object between adjacent frames. The state-transition matrix $A \in \mathbb{R}^{8 \times 8}$ is defined as
$$ A = \begin{bmatrix} I_{4\times4} & I_{4\times4} \\ 0_{4\times4} & I_{4\times4} \end{bmatrix}, $$
where $I_{4\times4}$ and $0_{4\times4}$ are the $4 \times 4$ identity and zero matrices, respectively. The matrix of the transition from the state space to the observation space is $C = [I_{4\times4} \;\; 0_{4\times4}]$. According to the state model in the (t−1)th frame, the estimated state model of the object in the tth frame is $\hat{x}_t = A x_{t-1} + Q$, where $Q \in \mathbb{R}^8$ is a noise vector. According to the theory of Kalman filtering [34], the predicted result in the current frame is
$$ x_t = \hat{x}_t + K_t (y_t - C \hat{x}_t), $$
where $y_t = [x_c, y_c, \delta, h]^T$ is the final result in the current frame obtained by detection and association, and $K_t \in \mathbb{R}^{8 \times 4}$ is the gain of the Kalman filter, computed as
$$ K_t = \hat{P}_t C^T (C \hat{P}_t C^T + R)^{-1}, $$
where $R \in \mathbb{R}^{4 \times 4}$ is the covariance matrix of the observation noise, and $\hat{P}_t$ is the predicted covariance matrix of the state vector in frame t, calculated as
$$ \hat{P}_t = A P_{t-1} A^T + W, $$
where $W \in \mathbb{R}^{8 \times 8}$ is the covariance matrix of the system noise, $P_{t-1}$ is the covariance matrix in the previous frame, and the covariance matrix $P_t$ in the current frame is updated according to $P_t = (I - K_t C) \hat{P}_t$. The initializations of the covariance matrices $P_0$, $R$, and $W$ for the surveillance videos of No. 734 and No. 939 are defined as
$$ P_0 = \mathrm{diag}(10, 10, 10, 10, 10000, 10000, 10000, 10000), \quad R = \mathrm{diag}(1, 1, 10, 10), \quad W = \mathrm{diag}(1, 1, 1, 1, 0.01, 0.01, 0.01, 0.0001). $$
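A minimal NumPy sketch of the constant-velocity Kalman filter described above is given below. It uses the matrices A, C, P0, R, and W from this subsection, illustrates only the predict/update cycle, and is not the authors' implementation; the state values in the usage comment are hypothetical.

```python
import numpy as np

# State x = [xc, yc, delta, h, dxc, dyc, ddelta, dh]; observation y = [xc, yc, delta, h].
I4, Z4 = np.eye(4), np.zeros((4, 4))
A = np.block([[I4, I4], [Z4, I4]])                     # state-transition matrix
C = np.hstack([I4, Z4])                                # state-to-observation matrix
P0 = np.diag([10, 10, 10, 10, 10000, 10000, 10000, 10000]).astype(float)
R = np.diag([1, 1, 10, 10]).astype(float)              # observation noise covariance
W = np.diag([1, 1, 1, 1, 0.01, 0.01, 0.01, 0.0001])    # system (process) noise covariance

def predict(x, P):
    """Predict the state and covariance for the next frame."""
    x_hat = A @ x
    P_hat = A @ P @ A.T + W
    return x_hat, P_hat

def update(x_hat, P_hat, y):
    """Correct the prediction with the associated detection y = [xc, yc, delta, h]."""
    K = P_hat @ C.T @ np.linalg.inv(C @ P_hat @ C.T + R)   # Kalman gain
    x = x_hat + K @ (y - C @ x_hat)
    P = (np.eye(8) - K @ C) @ P_hat
    return x, P

# Hypothetical usage: initialize from a detection, then predict/update once per frame.
# x0 = np.array([960, 540, 1.8, 200, 0, 0, 0, 0], dtype=float)
# x_hat, P_hat = predict(x0, P0)
# x1, P1 = update(x_hat, P_hat, np.array([965, 541, 1.8, 201], dtype=float))
```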

4.3. Association

In the current frame, the sub-module of detection and recognition obtains n bounding boxes $\tilde{y}_t^i = [\tilde{x}_c^i, \tilde{y}_c^i, \tilde{\delta}^i, \tilde{h}^i]^T$, i = 1, 2, …, n. Moreover, according to the prediction based on the detection results of the previous frame, the m predicted bounding boxes are $\hat{y}_t^j = C \hat{x}_t^j$, j = 1, 2, …, m. We construct a matrix $U \in \mathbb{R}^{m \times n}$ whose element $u_{i,j}$ denotes the IoU between the two bounding boxes $\tilde{y}_t^i$ and $\hat{y}_t^j$; a higher value of $u_{i,j}$ therefore indicates a better match between $\tilde{y}_t^i$ and $\hat{y}_t^j$. Considering that the aircraft moves slowly during the parking process, the aircraft has a larger IoU value between two adjacent frames than the GSE. We thus set a larger threshold $T_a$ for the aircraft and a smaller threshold $T_v$ for the vehicles when computing $u_{i,j}$. Then, $u_{i,j}$ based on IoU is further constrained by the conditions
$$ u_{i,j} = \begin{cases} IoU_{i,j} & \text{if } u_{i,j} > T_a \text{ and } class(\tilde{y}_t^i) = class(\hat{y}_t^j) = \text{aircraft} \\ IoU_{i,j} & \text{if } u_{i,j} > T_v \text{ and } class(\tilde{y}_t^i) = class(\hat{y}_t^j) \\ 0 & \text{else}, \end{cases} $$
where class(·) denotes the category of the bounding box. Finally, the matrix U is used as the input of the Hungarian algorithm [35] to obtain the association results $Y_t = [y_t^1, y_t^2, \ldots, y_t^s]$, $s \le \min(m, n)$, where $y_t^i = \tilde{y}_t^i$ if $\tilde{y}_t^i$ and $\hat{y}_t^j$ are associated.
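The following hedged sketch illustrates this association step: it builds the constrained IoU matrix U and solves the assignment with the Hungarian algorithm via SciPy's linear_sum_assignment. The default thresholds 0.95 and 0.9 are the values reported later in Section 6.2; the dictionary-based box format is an assumption made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(detections, predictions, T_a=0.95, T_v=0.9):
    """detections/predictions: lists of dicts {"box": (x1, y1, x2, y2), "cls": str}.
    Returns (detection_index, prediction_index) pairs of associated boxes."""
    m, n = len(predictions), len(detections)
    U = np.zeros((m, n))
    for j, pred in enumerate(predictions):
        for i, det in enumerate(detections):
            iou = box_iou(det["box"], pred["box"])
            same_cls = det["cls"] == pred["cls"]
            thr = T_a if det["cls"] == "aircraft" else T_v   # larger threshold for aircraft
            U[j, i] = iou if (same_cls and iou > thr) else 0.0
    # The Hungarian algorithm minimizes cost, so negate U to maximize total IoU.
    rows, cols = linear_sum_assignment(-U)
    return [(i, j) for j, i in zip(rows, cols) if U[j, i] > 0]
```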

5. Collection of KMNs

As mentioned above, the KMNs defined in A-CDM can be divided into two categories: nodes based on the actions of a single target and nodes based on the interactions of two targets. In this paper, we use four KMNs as examples to develop the framework for the automatic collection of KMNs. Among these four KMNs, in-block and off-block are representative of the action of a single target, whereas docking and undocking stairs are representative of the interaction between the parked aircraft and the moving aircraft landing stairs. Figure 6 shows the flowchart of the automatic collection of the four KMNs. In the flowchart, there are four key issues to address: (1) aircraft confirmation at the current position; (2) motion state estimation for a single target (including aircraft and GSE); (3) collection of KMNs based on a single target; (4) collection of KMNs based on multi-object interaction. In this section, we discuss these issues using surveillance videos captured by the fixed cameras at stands 939 and 734.
Figure 6. Flowchart of the automatic collection of four KMNs.

5.1. Aircraft Confirmation at the Current Position

Existing cameras for airport parking surveillance are installed at fixed locations and are used to monitor activities at individual aircraft stands. However, because the fields of view of the surveillance cameras differ, aircraft at adjacent parking positions may appear in one surveillance image. Since all ground service work revolves around the aircraft at its current parking position, it is necessary to first identify and confirm the aircraft at the current position among all aircraft detected in the surveillance video. By analyzing the size of the aircraft in the image plane at positions No. 734 and No. 939, we find that the width or height of the bounding box of the aircraft at the current position is at least half of the width or height of the image plane. According to this prior rule, Algorithm 1 is designed to confirm the aircraft at the current position: the aircraft at the current position is determined by searching for the aircraft bounding box with the largest area that satisfies the above rule. Figure 7 shows the results of aircraft confirmation.
Algorithm 1. Aircraft confirmation at the current position
Inputs: bounding boxes of aircraft $[x_c^i, y_c^i, w^i, h^i]$, i = 1, 2, …, n. W and H are the width and height of the input image, respectively.
for i = 1:n
   if $w^i$ > W/2 or $h^i$ > H/2
      $S^i$ = $w^i$ × $h^i$
   else
      $S^i$ = 0
   end if
end for
$j = \arg\max_{i \in \{1, 2, \ldots, n\}} S^i$
Outputs: the bounding box of the aircraft at the current position is $[x_c^j, y_c^j, w^j, h^j]$.
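A straightforward Python rendering of Algorithm 1 (a sketch for illustration, not the authors' code) could be:

```python
def confirm_current_aircraft(aircraft_boxes, W, H):
    """Algorithm 1 sketch.
    aircraft_boxes: list of (xc, yc, w, h) for all detected aircraft.
    W, H: width and height of the image.
    Returns the box of the aircraft at the current stand, or None if no box qualifies."""
    best, best_area = None, 0.0
    for (xc, yc, w, h) in aircraft_boxes:
        if w > W / 2 or h > H / 2:        # prior rule: box spans at least half the image
            area = w * h
            if area > best_area:
                best, best_area = (xc, yc, w, h), area
    return best
```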
Figure 7. Results of aircraft confirmation at the current position. Blue rectangles are the bounding boxes of the aircraft confirmed by Algorithm 1.

5.2. Motion State Estimation for a Single Target

Determining whether an aircraft or a GSE is stationary is key to the automatic collection of KMNs. Motion-state estimation for a single target is employed to address this issue. The position and size of the bounding box change constantly while an airport ground target moves. To determine whether the target is in motion or at rest, we use the standard deviations of the center point coordinates of the bounding box and of the $IoU_{pc}$ values between the previous and current bounding boxes, computed over a window of consecutive frames of a certain length. We define the precondition for the target to be in a stationary state at the tth frame as
$$ \text{Condition 1: } C_1(t) = \begin{cases} 1 & \text{if } \sigma_c(t) < Th_c \text{ and } \sigma_{IoU_{pc}}(t) < Th_{IoU} \\ 0 & \text{else}, \end{cases} $$
where $C_1(t) = 1$ indicates that the target is in a stationary state. $Th_c$ and $Th_{IoU}$ are the thresholds of $\sigma_c$ and $\sigma_{IoU_{pc}}$, respectively. $\sigma_c(t)$ and $\sigma_{IoU_{pc}}(t)$ are calculated as
$$ \sigma_c(t) = \sqrt{\sigma_x^2(t) + \sigma_y^2(t)} \quad \text{and} \quad \sigma_{IoU_{pc}}(t) = \sqrt{\frac{\sum_{i=t-r+1}^{t+r} \left( IoU_{pc}^i - IoU_{pc}^{mean}(t) \right)^2}{2r}}, $$
where r is the radius of the video sequence with a length of 2r + 1, and the tth frame is the center of this video sequence. Moreover,
$$ \sigma_x(t) = \sqrt{\frac{\sum_{i=t-r}^{t+r} \left( x_c^i - x_c^{mean}(t) \right)^2}{2r+1}}, \quad \sigma_y(t) = \sqrt{\frac{\sum_{i=t-r}^{t+r} \left( y_c^i - y_c^{mean}(t) \right)^2}{2r+1}}, \quad x_c^{mean}(t) = \frac{1}{2r+1} \sum_{i=t-r}^{t+r} x_c^i, \quad y_c^{mean}(t) = \frac{1}{2r+1} \sum_{i=t-r}^{t+r} y_c^i, \quad IoU_{pc}^{mean}(t) = \frac{1}{2r} \sum_{i=t-r+1}^{t+r} IoU_{pc}^i, \quad \text{and} \quad IoU_{pc}^i = \frac{|B_{i-1} \cap B_i|}{|B_{i-1} \cup B_i|}. $$
Here, $\sigma_x(t)$ and $\sigma_y(t)$ are the standard deviations of the center point coordinates of the bounding box along the x-axis and y-axis of the image plane, $\sigma_{IoU_{pc}}(t)$ is the standard deviation of $IoU_{pc}$, and $B_{i-1}$ and $B_i$ are the bounding boxes of the target in the previous and current frames, respectively.
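To make the window-based test concrete, the sketch below computes $\sigma_c(t)$, $\sigma_{IoU_{pc}}(t)$, and Condition 1 from per-frame box centers and corner-format boxes; the list-based inputs are an assumption made for illustration, not the authors' data structures.

```python
import numpy as np

def iou(b1, b2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def condition1(centers, boxes, t, r, th_c, th_iou):
    """Condition 1: return 1 if the target is stationary at frame t, else 0.
    centers: per-frame (xc, yc); boxes: per-frame (x1, y1, x2, y2)."""
    xs = np.array([centers[i][0] for i in range(t - r, t + r + 1)])
    ys = np.array([centers[i][1] for i in range(t - r, t + r + 1)])
    sigma_c = np.sqrt(np.var(xs) + np.var(ys))            # sqrt(sigma_x^2 + sigma_y^2)
    ious = np.array([iou(boxes[i - 1], boxes[i]) for i in range(t - r + 1, t + r + 1)])
    sigma_iou = np.sqrt(np.sum((ious - ious.mean()) ** 2) / (2 * r))
    return int(sigma_c < th_c and sigma_iou < th_iou)
```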

5.3. Collection of KMNs Based on a Single Target

We illustrate the collection of KMNs based on a single target by using the examples of in-block and off-block of an aircraft. In-block means that an aircraft has arrived in the parking position and parking brakes are activated. Thus, in-block of the aircraft is the transition of the aircraft’s state from motion to rest. In contrast, off-block is the moment when an aircraft starts to move from the parking position and prepares to taxi and take off. This implies that off-block is the transition from the stationary to the moving state of the aircraft. Consequently, nodes of in-block and off-block can be detected based on the motion state estimation of a single target. The algorithm for in-block and off-block node detection is described in Algorithm 2. Without loss of generality, this algorithm can be used to collect other KMNs based on a single target.
Algorithm 2. In-block and off-block node collection
Inputs: For a sequence with num frames, the bounding box of the aircraft in each frame is $[x_c^i, y_c^i, w^i, h^i]$, i = 1, 2, …, num. $Th_c$ and $Th_{IoU}$ of Equation (10) are initialized (manually determined).
for t = r + 2 : num − r − 1
   Compute $\sigma_c(t-1)$ and $\sigma_{IoU_{pc}}(t-1)$, $\sigma_c(t)$ and $\sigma_{IoU_{pc}}(t)$, and $\sigma_c(t+1)$ and $\sigma_{IoU_{pc}}(t+1)$.
   Compute $C_1(t-1)$, $C_1(t)$, and $C_1(t+1)$ using Condition 1 of Equation (10).
   if C1(t − 1) = 0 and C1(t) = 1 and C1(t + 1) = 1
      In-block node is t.
   else if C1(t − 1) = 1 and C1(t) = 0 and C1(t + 1) = 0
      Off-block node is t.
   end if
end for
Outputs: In-block and off-block nodes.
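A compact Python sketch of Algorithm 2, reusing the condition1() helper from the sketch in Section 5.2, might read:

```python
def collect_in_off_block(centers, boxes, num, r, th_c, th_iou):
    """Algorithm 2 sketch: detect in-block and off-block frame indices for the
    aircraft at the current position. Returns (in_block, off_block); either may be None."""
    in_block, off_block = None, None
    c1 = lambda t: condition1(centers, boxes, t, r, th_c, th_iou)
    for t in range(r + 2, num - r - 1):
        prev, cur, nxt = c1(t - 1), c1(t), c1(t + 1)
        if prev == 0 and cur == 1 and nxt == 1:      # motion -> rest: in-block
            in_block = t
        elif prev == 1 and cur == 0 and nxt == 0:    # rest -> motion: off-block
            off_block = t
    return in_block, off_block
```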

5.4. Collection of KMNs Based on Multi-Object Interaction

Unlike in-block and off-block node detection, which only needs to focus on the movement of the aircraft at the current position, docking and undocking stairs node collection needs to determine the movement of the mobile aircraft landing stairs while also analyzing the positional relationship between the mobile aircraft landing stairs and the parked aircraft. Condition 1 of Equation (10) can be used to determine the movement of the mobile aircraft landing stairs. Figure 8 reveals that the bounding boxes of the mobile aircraft landing stairs are contained in the bounding box of the parked aircraft. Therefore, to represent the positional relationship between the mobile aircraft landing stairs and the parked aircraft, the index $IoU_{av}$ is proposed and expressed as
$$ IoU_{av}^i = \frac{|B_a^i \cap B_v^i|}{|B_v^i|}, $$
where $B_a$ and $B_v$ are the bounding boxes of the parked aircraft and the mobile aircraft landing stairs, respectively. When $B_v$ is completely contained within the boundaries of $B_a$, $IoU_{av} = 1$.
Figure 8. Positional relationship between the mobile aircraft landing stairs and parked aircraft.
The above observations indicate that there are two conditions for the docking of the parked aircraft and ground vehicles in the tth frame. The first condition is that the ground vehicle is in a stationary state that can be confirmed by Condition 1. The second condition reflects the positional relationship between the mobile aircraft landing stairs and the parked aircraft and is defined as
$$ \text{Condition 2: } C_2(t) = \begin{cases} 1 & \text{if } IoU_{av}^t > 0.9 \\ 0 & \text{else}. \end{cases} $$
On the basis of the two preconditions, the docking and undocking stairs node collection is demonstrated in Algorithm 3. Without loss of generality, this algorithm can be used to collect other KMNs based on multi-object interaction.
Algorithm 3. Docking and undocking stairs node detection
Inputs: For a sequence with num frames, $[x_{ac}^i, y_{ac}^i, w_a^i, h_a^i]$ and $[x_{vc}^i, y_{vc}^i, w_v^i, h_v^i]$ are the bounding boxes of the aircraft and the mobile aircraft landing stairs in each frame, respectively, i = 1, 2, …, num. $Th_c$ and $Th_{IoU}$ are the thresholds of $\sigma_c$ and $\sigma_{IoU}$ of Condition 1, respectively.
for t = r + 2 : num − r − 1
   Compute Condition 2 $C_2(t)$ using $[x_{ac}^t, y_{ac}^t, w_a^t, h_a^t]$ and $[x_{vc}^t, y_{vc}^t, w_v^t, h_v^t]$
   if C2(t) = 1
   Using the bounding boxes of mobile aircraft landing stairs and Condition 1 to compute C1(t − 1), C1(t), and C1(t + 1).
      if C1(t − 1) = 0 and C1(t) = 1 and C1(t + 1) = 1
         Docking node is t.
      else if  C1(t − 1) = 1 and C1(t) = 0 and C1(t + 1) = 0
         Undocking node is t.
      end if
   end if
end for
Outputs: Docking and undocking stairs nodes.
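Analogously, a sketch of Algorithm 3 combines Condition 2 (the containment ratio $IoU_{av}$) with Condition 1 applied to the stairs. It again reuses condition1() from the Section 5.2 sketch and is illustrative only, not the authors' implementation.

```python
def iou_av(box_a, box_v):
    """IoU_av: fraction of the stairs box B_v covered by the aircraft box B_a
    (boxes given as (x1, y1, x2, y2))."""
    ix1, iy1 = max(box_a[0], box_v[0]), max(box_a[1], box_v[1])
    ix2, iy2 = min(box_a[2], box_v[2]), min(box_a[3], box_v[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_v = (box_v[2] - box_v[0]) * (box_v[3] - box_v[1])
    return inter / area_v

def collect_docking_undocking(aircraft_boxes, stair_boxes, stair_centers,
                              num, r, th_c, th_iou):
    """Algorithm 3 sketch: detect docking/undocking frame indices for the stairs."""
    docking, undocking = None, None
    c1 = lambda t: condition1(stair_centers, stair_boxes, t, r, th_c, th_iou)
    for t in range(r + 2, num - r - 1):
        if iou_av(aircraft_boxes[t], stair_boxes[t]) > 0.9:       # Condition 2
            prev, cur, nxt = c1(t - 1), c1(t), c1(t + 1)
            if prev == 0 and cur == 1 and nxt == 1:               # stairs stop at aircraft
                docking = t
            elif prev == 1 and cur == 0 and nxt == 0:             # stairs start moving away
                undocking = t
    return docking, undocking
```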

6. Experimental Results and Analysis

Experiments were conducted to evaluate the performance of the proposed framework and its modules. The related algorithms were executed on a GPU server equipped with an Intel(R) Xeon E5-1620 v3 CPU @ 3.50 GHz × 8, 64.0 GB of RAM, and an NVIDIA GTX 1080Ti GPU with 12 GB of memory. The proposed framework was trained and implemented in Python 3.6.12 with CUDA 10.1.

6.1. Experimental Results of Detection and Recognition

The parameter settings of the algorithm used for detection and recognition training are summarized as follows. The dataset is split into training, validation, and test sets at 60%, 20%, and 20%, respectively. The number of epochs is 30, and the batch size is 16. The weight decay rate was set to 0.0005, the momentum to 0.937, and the initial and final learning rates to 0.1 and 0.01, respectively. The following widely used metrics are adopted for the quantitative assessment of the performance of the detection and recognition algorithm [36].
(1) $Precision_C$:
$$ Precision_C = \frac{TP_C}{TP_C + FP_C}, $$
(2) $Recall_C$:
$$ Recall_C = \frac{TP_C}{TP_C + FN_C}, $$
(3) Mean Average Precision (mAP):
The precision values corresponding to all recall points under the precision–recall curve are averaged to calculate the average precision (AP). Furthermore, the average of AP across all classes is mAP.
Here, a true positive (TP) refers to the scenario that the actual class is positive, and the class predicted by the algorithm is also positive. A false positive (FP) and false negative (FN) refer to falsely predicted positive and negative values, respectively. C denotes the different categories.
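For concreteness, the metrics above reduce to the following small helpers (a sketch for illustration; the authors' evaluation tooling is not specified):

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int):
    """Per-class precision and recall from TP/FP/FN counts (equations above)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(precisions):
    """AP: the precision values at all recall points of the PR curve, averaged."""
    return float(np.mean(precisions))

def mean_average_precision(ap_per_class):
    """mAP: the average of the per-class AP values."""
    return float(np.mean(list(ap_per_class.values())))
```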
Table 3 gives the evaluation results on our dataset for five classes of targets. From this table, we can observe that the mAP values are above 90% for both target categories. Compared with aircraft detection, the detection of mobile aircraft landing stairs does not perform as well. In our opinion, this result is due to the shape diversity of mobile aircraft landing stairs (as shown in Figure 9). Table 4 lists the results of comparative experiments using different IoU losses, namely, IoU-loss, GIoU-loss, DIoU-loss, and CIoU-loss, in our detection and recognition sub-module. In Table 4, although the recall value for CIoU-loss is slightly lower than that for GIoU-loss, CIoU-loss outperforms the other losses in terms of precision and mAP. Therefore, the detection module of the proposed framework uses CIoU-loss as the regression loss.
Table 3. Performance metrics of the sub-module of detection and recognition.
Figure 9. Detection and recognition of targets of the same class with different appearances.
Table 4. Comparative experiments using different Lreg.

6.2. Experimental Results of the Prediction and Association

In this experiment, the following metrics provided in the MOT Challenge Benchmark are used to evaluate the performance of prediction and association. (1) Multi-object tracking accuracy (MOTA):
$$ MOTA = 1 - \frac{\sum_{t=1}^{num} (FN_t + FP_t + IDSW_t)}{\sum_{t=1}^{num} GT_t}, $$
where num is the number of frames in the sequence, and $GT_t$ and $IDSW_t$ are the number of ground-truth objects and the number of ID switches for all objects in the tth frame of the sequence, respectively.
(2) Multiple object tracking precision (MOTP)
$$ MOTP = \frac{\sum_{t=1}^{num} \sum_{i=1}^{c_t} IoU_{i,t}}{\sum_{t=1}^{num} c_t}, $$
where $c_t$ is the number of objects associated successfully in the tth frame, and $IoU_{i,t}$ is the bounding box overlap of the ith successfully associated object with the ground-truth object in the tth frame.
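A minimal sketch of how these two metrics are computed from per-frame counts (illustrative only, not the MOT Challenge reference code):

```python
def mota_motp(fn, fp, idsw, gt, ious):
    """MOTA and MOTP from per-frame evaluation results.
    fn, fp, idsw, gt: per-frame FN, FP, ID-switch, and ground-truth object counts.
    ious: IoU values of all successfully associated objects over the whole sequence."""
    mota = 1.0 - (sum(fn) + sum(fp) + sum(idsw)) / sum(gt)
    motp = sum(ious) / len(ious)
    return mota, motp
```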
In Equation (9), the thresholds $T_a$ for the aircraft and $T_v$ for the vehicles are set to 0.95 and 0.9, respectively. We select eight sets of video sequences from the dataset for the experiments. These sequences cover the four types of KMN: in-block, off-block, docking stairs, and undocking stairs. The results of the quantitative assessment for the eight test samples are given in Table 5. The experimental results show that the prediction and association sub-module used in our framework produces only 33 ID switches due to target association errors out of a total of 8397 detected objects. Specifically, the association success rate is high, with an average MOTA of 95.09%, and the localization accuracy is high, with an average MOTP of 92.63%.
Table 5. Association results of the quantitative assessment for eight video sequences.
Figure 10 shows several association results for qualitative analysis. The results for in-block reveal that the appearance of the aircraft varies considerably because of the turns required during in-block, but the aircraft ID number remains constant throughout this process. Similar results for off-block, docking stairs, and undocking stairs indicate that our method maintains association robustness in response to deformation. In Figure 10b, there are multiple targets, such as catering vehicles, tractors, and aircraft. However, the ID of each target remains the same, indicating that the prediction and association sub-module remains robust against the complex background of the airport ground.
Figure 10. Association results of four video sequences.

6.3. Experimental Results of the Collection of KMNs

KMNs are collected using the outputs of the preprocessing module, and the collection is evaluated on the basis of the results of the above experiments. We tested all 42 video sequences from the dataset comprising video sequences. The ground truth of each KMN in the testing videos was manually annotated by a professional from the airport surface service department. The frame error (FE) and the corresponding time error (TE) are used to quantitatively evaluate the performance:
$$ FE = \frac{\sum_{i=1}^{L} |N_i - N_i^{gt}|}{L}, \quad TE = 0.2 \times FE, $$
where $N_i$ is the frame index obtained by the proposed algorithm, $N_i^{gt}$ is the corresponding ground truth, and L is the number of test sequences. The video sequences in the dataset are sampled at 5 fps, so the time interval between consecutive frames is 0.2 s. The (empirical) parameter settings are r = 5, $Th_c$ = 5, and $Th_{IoU}$ = 0.1 for the aircraft, and $Th_c$ = 3 and $Th_{IoU}$ = 0.05 for the mobile aircraft landing stairs.
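These two error measures reduce to a few lines of Python (a sketch under the 5 fps assumption):

```python
def frame_and_time_error(pred_frames, gt_frames, seconds_per_frame=0.2):
    """FE: mean absolute frame error over the L test sequences; TE = 0.2 x FE at 5 fps."""
    fe = sum(abs(n - n_gt) for n, n_gt in zip(pred_frames, gt_frames)) / len(pred_frames)
    return fe, seconds_per_frame * fe
```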
Results of the quantitative evaluation are listed in Table 6. Figure 11a,b show the results of the collection of the four types of KMN at aircraft stands 939 and 734, respectively. In the figure, each time axis represents a time series in terms of frame number. The time axis is labeled with the origin and the total number of frames; the predicted nodes of activity occurrence are shown as red dots, the actual nodes of activity occurrence are shown as blue dots, and the white dots are the process nodes of activity occurrence. The following observations are made from these results. First, the error is largest for the in-block node. This is because the aircraft has a larger bounding box than the vehicles, making it more difficult to determine changes in aircraft movement. Second, the error at position 734 is slightly higher than that at position 939. This is due to more interference around position 734, which is farther away from the camera, and less interference around position 939, where the positioning information is more accurate. Third, the timing error in the automatic collection of KMNs is well below the 60 s required by the A-CDM system. In summary, since all of these test samples were from stands 734 and 939 at Beijing Capital Airport, we believe that the proposed framework can be applied directly to KMN collection at these two aircraft stands.
Table 6. Quantitative evaluation of the automatic collection of KMNs.
Figure 11. Results of the automatic collection of four KMNs.
The results of the collection of KMNs using the proposed framework depend on the changes in the position and movement state of the executors of the KMNs. Therefore, the timing error in the automatic collection of KMNs mainly comes from the extraction accuracy of the spatiotemporal information of the executor. Owing to the complex background of the airport ground, it is difficult to improve this accuracy, and thus reduce the timing error of KMN collection, by using only the spatiotemporal information from the surveillance cameras. Consequently, in future work, it will be essential to introduce heterogeneous information from other sensors (e.g., LIDAR) to improve the extraction accuracy of the spatiotemporal information of the executor while ensuring the operational safety of the airport ground. Furthermore, it must also be pointed out that several thresholds in the KMN collection module presented in this paper are preset according to the situation at aircraft stands 939 and 734 of Beijing Capital Airport. Since these thresholds are closely related to the distance between the aircraft and the camera, the image resolution, and other factors, they are not universal and have to be recalibrated manually when the distance between the surveillance camera and the target or the camera parameters change.

7. Conclusions and Future Work

The automatic collection of key milestone nodes in the process of aircraft turnaround plays an important role in upgrading the intelligence of airport ground operations. The main purpose of this study was to develop a framework for the automatic collection of four KMNs in the process of aircraft turnaround based on apron surveillance videos. To achieve this goal, two datasets of imagery captured at Beijing Capital International Airport were first established to train, test, and assess the proposed framework and related algorithms. Second, to extract the spatiotemporal information of the executors of KMNs from the complex background of the airport ground, a preprocessing module seamlessly integrating state-of-the-art detection and tracking algorithms was proposed. Specifically, a detector based on YOLOv5 that uses the CIoU-loss function to improve the positioning accuracy of the object was designed. Moreover, the sub-modules of prediction and association were combined with the detector to improve both the accuracy and real-time performance of the spatiotemporal feature extraction. Third, for the nodes based on the action of a single target (represented by in-block and off-block) and the interaction of two targets (represented by docking and undocking stairs), two approaches for the automatic collection of KMNs were proposed. Experimental results on the two remote stands (No. 734 and No. 939) of Beijing Capital International Airport showed that the timing error of the proposed framework was well under 60 s, meeting the requirements of the A-CDM system.
Although we initially assessed the proposed framework and found it to be promising, there is still considerable room for further development. First, instance segmentation-based methods will be developed to obtain more accurate target boundaries. Second, as the datasets are further expanded, deep learning-based action localization methods will be investigated to further improve the robustness and accuracy of the automatic collection of KMNs against the complex background of the airport ground. Moreover, the autonomous selection of the thresholds of Algorithms 2 and 3 is also one of the important tasks for future work.

Author Contributions

Conceptualization, J.X. and M.D.; methodology, M.D.; software, Z.-Z.Z.; validation, J.X. and F.Z.; formal analysis, Y.-B.X.; writing—original draft preparation, J.X.; writing—review and editing, M.D.; visualization, M.D. and Z.-Z.Z.; supervision, M.D. and X.-H.W.; project administration, M.D. All authors have read and agreed to the published version of the manuscript.

Funding

This work is co-supported by the National Natural Science Foundation of China (No. U2033201). It is also supported by the Opening Project of Civil Aviation Satellite Application Engineering Technology Research Center (RCCASA-2022003) and Innovation Fund of COMAC (GCZX-2022-03).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, B.; Wang, L.; Xing, Z.; Luo, Q. Performance Evaluation of Multiflight Ground Handling Process. Aerospace 2022, 9, 273.
  2. A-CDM Milestones, Mainly the Target off Block Time (TOBT). Available online: http://www.eurocontrol.int/articles/air-portcollaborative-decision-making-cdm (accessed on 3 March 2023).
  3. More, D.; Sharma, R. The turnaround time of an aircraft: A competitive weapon for an airline company. Decision 2014, 41, 489–497.
  4. Airport-Collaborative Decision Making (A-CDM): IATA Recommendations. Available online: https://www.iata.org/contentassets/5c1a116a6120415f87f3dadfa38859d2/iata-acdm-recommendations-v1.pdf (accessed on 5 March 2023).
  5. Wei, K.J.; Vikrant, V.; Alexandre, J. Airline timetable development and fleet assignment incorporating passenger choice. Transp. Sci. 2020, 54, 139–163.
  6. Tian, Y.; Liu, H.; Feng, H.; Wu, B.; Wu, G. Virtual simulation-based evaluation of ground handling for future aircraft concepts. J. Aerosp. Inf. Syst. 2013, 10, 218–228.
  7. Perl, E. Review of Airport Surface Movement Radar Technology. IEEE Aerosp. Electron. Syst. Mag. 2006, 21, 24–27.
  8. Xiong, Z.; Li, M.; Ma, Y.; Wu, X. Vehicle Re-Identification with Image Processing and Car-Following Model Using Multiple Surveillance Cameras from Urban Arterials. IEEE Trans. Intell. Transp. Syst. 2020, 22, 7619–7630.
  9. Zhang, C.; Li, F.; Ou, J.; Xie, P.; Sheng, W. A New Cellular Vehicle-to-Everything Application: Daytime Visibility Detection and Prewarning on Expressways. IEEE Intell. Transp. Syst. Mag. 2022, 15, 85–98.
  10. Besada, J.A.; Garcia, J.; Portillo, J.; Molina, J.M.; Varona, A.; Gonzalez, G. Airport Surface Surveillance Based on Video Images. IEEE Trans. Intell. Transp. Syst. 2005, 41, 1075–1082.
  11. Thirde, D.; Borg, M.; Ferryman, J. A real-time scene understanding system for airport apron monitoring. In Proceedings of the IEEE International Conference on Computer Vision System, New York, NY, USA, 4–7 January 2006.
  12. Zhang, X.; Qiao, Y. A video surveillance network for airport ground moving targets. In Proceedings of the International Conference on Mobile Networks and Management, Chiba, Japan, 10–12 November 2020; pp. 229–237.
  13. Netto, O.; Silva, J.; Baltazar, M. The airport A-CDM operational implementation description and challenges. J. Airl. Airpt. Manag. 2020, 10, 14–30.
  14. Simaiakis, I.; Balakrishnan, H. A queuing model of the airport departure process. Transp. Sci. 2016, 50, 94–109.
  15. Voulgarellis, P.G.; Christodoulou, M.A.; Boutalis, Y.S. A MATLAB based simulation language for aircraft ground handling operations at hub airports (SLAGOM). In Proceedings of the 2005 IEEE International Symposium on Mediterrean Conference on Control and Automation Intelligent Control, Limassol, Cyprus, 27–29 June 2005; pp. 334–339.
  16. Wu, C.L. Monitoring aircraft turnaround operations–framework development, application and implications for airline operations. Transp. Plan. Technol. 2008, 31, 215–228.
  17. Lu, H.L.; Vaddi, S.; Cheng, V.V.; Tsai, J. Airport Gate Operation Monitoring Using Computer Vision Techniques. In Proceedings of the 16th AIAA Aviation Technology, Integration, Operations Conference, Washington, DC, USA, 13–17 June 2016.
  18. Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D.; Ramanan, D. Object detection with discriminatively trained part based models. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1627–1645.
  19. Thai, P.; Alam, S.; Lilith, N.; Phu, T.N.; Nguyen, B.T. Aircraft Push-back Prediction and Turnaround Monitoring by Vision-based Object Detection and Activity Identification. In Proceedings of the 10th SESAR Innovation Days, Online, 7–10 December 2020.
  20. Thai, P.; Alam, S.; Lilith, N.; Nguyen, B.T. A computer vision framework using Convolutional Neural Networks for airport-airside surveillance. Transp. Res. Part C Emerg. Technol. 2022, 137, 103590.
  21. Yıldız, S.; Aydemir, O.; Memiş, A.; Varlı, S. A turnaround control system to automatically detect and monitor the time stamps of ground service actions in airports: A deep learning and computer vision based approach. Eng. Appl. Artif. Intell. 2022, 114, 105032.
  22. Available online: https://medium.com/@michaelgorkow/aircraft-turnaround-management-using-computer-vision-4bec29838c08 (accessed on 6 March 2023).
  23. Zaidi, S.S.A.; Ansari, M.S.; Aslam, A.; Kanwal, N.; Asghar, M.; Lee, B. A survey of modern deep learning based object detection models. Digit. Signal Process. 2022, 26, 103514.
  24. Dalal, N.; Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005; pp. 886–893.
  25. Lowe, D.G. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  26. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2012, 60, 84–90.
  27. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
  28. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 26 June–1 July 2015; pp. 779–788.
  29. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  30. Bochkovskiy, A.; Wang, C.Y.; Mark Liao, H.Y. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
  31. Kasper-Eulaers, M.; Hahn, N.; Berger, S.; Sebulonsen, T.; Myrland, Ø.; Kummervold, P.E. Detecting heavy goods vehicles in rest areas in winter conditions using YOLOv5. Algorithms 2021, 14, 114.
  32. Ge, Z.; Liu, S.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430.
  33. Wu, H.; Du, C.; Ji, Z.; Gao, M.; He, Z. SORT-YM: An Algorithm of Multi-Object Tracking with YOLOv4-Tiny and Motion Prediction. Electronics 2021, 10, 2319.
  34. Kalman, R.E. A New Approach to Linear Filtering and Prediction Problems. J. Basic Eng. 1960, 82, 35–45.
  35. Kuhn, H.W. The Hungarian method for the assignment problem. Nav. Res. Logist. Q. 1955, 2, 83–97.
  36. Davis, J.; Goadrich, M. The relationship between Precision-Recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA, 25–29 June 2006; pp. 233–240.
