In this part of the experiments, we conducted flight delay prediction. We used CTGAN to generate tabular data, following the original configuration of the authors' code. After meta-training to obtain the global meta-model, each client performs five epochs of meta-adaptation, yielding a personalized model for each client. An ANN is defined as the flight delay prediction model in the experiment; specifically, the ANN is derived from a multilayer perceptron (MLP).
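As an illustration, the following Python sketch shows how a client-side query set could be generated with the open-source ctgan package and how a minimal MLP-based delay classifier could be defined in PyTorch. The layer sizes and the names client_df and discrete_cols are placeholders, not the exact configuration used in our code.

```python
import torch.nn as nn
from ctgan import CTGAN  # named CTGANSynthesizer in older ctgan releases

# client_df: the client's local flight table (pandas DataFrame, placeholder);
# discrete_cols: list of its categorical column names (placeholder).
synth = CTGAN()                           # default configuration
synth.fit(client_df, discrete_cols)
query_set = synth.sample(len(client_df))  # synthetic query data for meta-training

# Minimal MLP-based ANN for binary delay prediction; layer sizes are assumptions.
class DelayMLP(nn.Module):
    def __init__(self, n_features, hidden=(64, 32)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```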
To validate the effectiveness of FedMeta-CTGAN, we first introduce and analyze the dataset and then present two main parts of experimental evaluation: (1) performance evaluation, which assesses the model's prediction effectiveness under an imbalanced number of samples across clients and compares it with other prediction methods; (2) privacy-preserving effectiveness evaluation, which validates the prediction accuracy and the privacy of the proposed privacy-preserving method under a property inference attack, compares it with differential privacy methods under different privacy budgets, and verifies its robustness against different property inference attack classifiers. The evaluation metrics mainly include classification accuracy and the inference AUC score. Classification accuracy measures the effectiveness of the trained model on the target classification task; the inference AUC score measures the adversary's ability to infer the sensitive property from the collected gradient information.
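For concreteness, a minimal sketch of how the two metrics can be computed with scikit-learn is given below; the toy arrays are placeholders for the real model predictions and the adversary's attack-classifier scores.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy values standing in for real experiment outputs.
y_true = np.array([0, 1, 0, 0, 1])                    # true delay labels
y_pred = np.array([0, 1, 0, 1, 1])                    # personalized model predictions
prop_labels = np.array([1, 0, 1, 1, 0])               # sensitive-property labels of updates
attack_scores = np.array([0.8, 0.3, 0.6, 0.7, 0.4])   # attack classifier scores

accuracy = accuracy_score(y_true, y_pred)                   # target-task accuracy
inference_auc = roc_auc_score(prop_labels, attack_scores)   # adversary's inference AUC
```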
5.1. Dataset Setup
Considering the problem of insufficient data, the public dataset “Flight Dynamics and Landing Dataset” [44] was used in the experiment. The dataset contains four parts: historical flight dynamic takeoff and landing data, a historical city weather table, an airport city correspondence table, and a historical airport special case table. For departure delay prediction, the city weather table, the airport city correspondence table, and the historical airport special case table are matched and joined with the historical flight dynamic takeoff and landing data to obtain the data table required for training the prediction model. After data preprocessing, a flight dynamics table containing all the feature information is obtained.
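The join step can be sketched with pandas as follows; the file names and join keys ("airport", "city", "date") are assumptions about the dataset schema, not the exact columns.

```python
import pandas as pd

# File and column names below are hypothetical; they depend on the dataset schema.
flights = pd.read_csv("flight_dynamics.csv")        # historical takeoff/landing data
weather = pd.read_csv("city_weather.csv")           # historical city weather table
airport_city = pd.read_csv("airport_city.csv")      # airport city correspondence table
special = pd.read_csv("airport_special_cases.csv")  # historical airport special cases

# Map each departure airport to its city, then join weather and special-case
# records on the flight date to build the training table.
df = flights.merge(airport_city, on="airport", how="left")
df = df.merge(weather, on=["city", "date"], how="left")
df = df.merge(special, on=["airport", "date"], how="left")
```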
By analyzing the obtained flight dynamics table containing all the features, it can be seen that the initial dataset is extremely unbalanced.
Figure 6 shows the number of delayed flights versus non-delayed flights in the dataset, and we find that delayed flights account for only 4.4%.
Table 2 reports the summary statistics obtained with the describe function, and it can be seen that some of the feature columns are also unbalanced. For example, for the departure special case and arrival special case features, the minimum and the first, second, and third quartiles are all 0, while the maximum is 1. The feature correlation heatmap in Figure 7 shows the correlation coefficients among all the extracted features, and it can be seen that the prior delay has the strongest correlation with flight departure delay.
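The imbalance check, summary statistics, and correlation heatmap can be reproduced along the following lines; df and the column name "departure_delay" are placeholders, and a recent pandas version is assumed for the numeric_only argument.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# `df` is the preprocessed flight dynamics table; "departure_delay" is a
# placeholder name for the binary delay label.
print(df["departure_delay"].value_counts(normalize=True))  # class imbalance (cf. Figure 6)
print(df.describe())                                        # per-column statistics (cf. Table 2)

sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm")    # correlation heatmap (cf. Figure 7)
plt.show()
```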
5.2. Performance Evaluation
For the proposed approach based on federated meta-learning and CTGAN, we evaluate the prediction accuracy when the number of samples is unbalanced across clients to verify the effectiveness of the federated meta-learning framework in adapting to unbalanced data. The training data are divided by the number of entries into two parts, a small dataset D1 and a large dataset D2, which serve as the datasets of two participating clients. We vary the percentage of data owned by each client across multiple experiments and evaluate all splits: the 'small dataset' client holds D1 = {10%, 20%, …, 50%} of the data, and the 'large dataset' client holds the complementary D2 = {90%, 80%, …, 50%}.
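A minimal sketch of this split procedure is shown below; random shuffling, the seed, and the variable name train_df are assumptions about details the text does not specify.

```python
import numpy as np

def split_clients(table, small_frac, seed=0):
    """Split the training table into a small client dataset D1 (small_frac of
    the rows) and a large client dataset D2 (the remaining rows)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(table))
    cut = int(small_frac * len(table))
    return table.iloc[order[:cut]], table.iloc[order[cut:]]

# Evaluate every imbalance ratio used in the experiment; `train_df` is the
# preprocessed training table (placeholder).
for frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    d1, d2 = split_clients(train_df, frac)
    # ... run federated meta-training with clients holding d1 and d2 ...
```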
Figure 8 shows the performance evaluation on the obtained flight dynamics dataset. In general, the party with the larger amount of data tends to achieve higher accuracy, because enough samples allow the model to fully learn the task, which explains the upward and downward trends of the two curves. At the same time, we notice that the accuracy of the client with the smaller dataset improves as its sample size increases, suggesting that the meta-learning model learns new knowledge quickly and rapidly adapts to the effect of unbalanced data on model accuracy.
In addition, for the flight delay prediction problem, we compared the proposed FedMeta-CTGAN (ANN) with five methods: artificial neural network (ANN), gradient-boosted decision tree (GBDT), logistic regression (LR), extreme gradient boosting (XGBoost), and federated meta-learning using an ANN (FedMeta (ANN)), as shown in
Table 3. It is worth noting that the GBDT, LR, and XGBoost models all follow the default settings of their library implementations (scikit-learn for GBDT and LR, and the XGBoost library for XGBoost). Both centralized and distributed models use the same amount of data; in distributed learning, we distribute the data evenly across the clients and record the accuracy of the model after meta-adaptation training. We found that the distributed method FedMeta-CTGAN (ANN), which uses data generated by CTGAN as the meta-training query dataset, did not reduce model accuracy compared with FedMeta (ANN) without CTGAN. Even compared with the centralized learning methods (ANN, LR, GBDT, and XGBoost), our method achieved high accuracy, less than 1% below GBDT, which has the highest prediction accuracy, while providing the privacy preservation that the centralized methods lack.
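The centralized baselines can be reproduced along the following lines using the default constructors; the synthetic make_classification data below is only a stand-in for the preprocessed flight table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for the preprocessed flight features and delay labels.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baselines = {
    "LR": LogisticRegression(),            # scikit-learn defaults
    "GBDT": GradientBoostingClassifier(),  # scikit-learn defaults
    "XGBoost": XGBClassifier(),            # xgboost defaults
    "ANN": MLPClassifier(),                # MLP baseline
}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```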
5.3. Privacy-Preserving Effectiveness Evaluation
In the processed flight dynamics dataset, we set up three scenarios, each consisting of a target classification task, a sensitive property, and the correlation coefficient between the sensitive property and the target property feature. All three scenarios share the same target classification task, i.e., departure delay prediction. The sensitive property in the first scenario is the departure special case, with a correlation coefficient of 0.08 with the target property feature (denoted as E1); the sensitive property in the second scenario is the prior delay, with a correlation coefficient of 0.45 (denoted as E2); and the sensitive property in the third scenario is the arrival special case, with a correlation coefficient of 0.07 (denoted as E3). An overview of all experimental scenarios is presented in Table 4.
For the property inference attack, we assume that an adversary can launch an active or a passive attack. At the same time, the adversary can either steal the update information of the victims only (only victims, henceforth referred to as OV) or reverse the loss of multiple participants (not only victims, henceforth referred to as NOV). Combining the attack behavior initiated by the attacker with the target of the property inference attack, we obtain four property inference attack cases: OV_active, OV_passive, NOV_active, and NOV_passive; detailed information is shown in Table 5.
For each experimental scenario, we validate the classification accuracy of the model and the adversary's inference AUC score before using FedMeta-CTGAN (hereafter written as W/O Protection) and after using FedMeta-CTGAN (hereafter written as W/Protection) under the four property inference attack cases. We use the model's prediction results under the property inference attack and the adversary's inference AUC score to quantify the effectiveness and the privacy of the privacy-preserving approach.
As shown in Figure 9, in the experimental scenario E1, the sensitive property is the departure special case and the target task is delay prediction. In Figure 9a, before using FedMeta-CTGAN (W/O Protection), the adversary achieves high inference AUC scores in OV_passive and NOV_passive, which proves that the property inference attack is successful, whereas after using FedMeta-CTGAN (W/Protection) the inference scores all drop to around 0.5, which proves that FedMeta-CTGAN is able to withstand the property inference attack. In Figure 9b, it can be observed that, before and after using FedMeta-CTGAN, the effectiveness of the personalized models obtained by the participants after local adaptation is almost unaffected, and all of them remain above 91%. As shown in Figure 10, in the experimental scenario E2, the sensitive property is the prior delay, which is more strongly correlated with the target task. In Figure 10a, we clearly see that the inference AUC score reaches 0.9 and above in the OV_active, OV_passive, and NOV_active cases before using FedMeta-CTGAN (W/O Protection), showing that a sensitive property more strongly related to the target prediction task is more likely to be inferred, whereas with FedMeta-CTGAN (W/Protection) the inference AUC scores are all around 0.5, which proves that FedMeta-CTGAN is able to protect even those properties that are more likely to be inferred because of their relation to the target classification task. Meanwhile, Figure 10b shows that the locally adapted model still achieves the prediction accuracy obtained before using FedMeta-CTGAN. Similarly, in the experimental scenario E3, as shown in Figure 11, we obtain similar conclusions: our approach is able to withstand the property inference attack while the accuracy of the model after meta-adaptation training is not affected.
Next, in the E2 scenario, we compare W/O Protection with differential privacy (hereafter DP) methods under different privacy budgets, as shown in Figure 12. We apply the DP method during the training of the flight delay prediction model. We find that the model accuracy decreases to some extent after adding different amounts of noise to the data; meanwhile, as the privacy budget increases, less noise is added and the inference AUC score rises rapidly, implying that the adversary successfully launches the property inference attack. Comparing these accuracy and inference scores with those of W/O Protection indicates that the system is vulnerable to the property inference attack.
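As a rough illustration of this trade-off, the sketch below applies a generic Gaussian mechanism to a client update: smaller privacy budgets inject more noise. The clipping norm, delta, and the exact noise calibration are assumptions and may differ from the DP baseline evaluated here.

```python
import numpy as np

def dp_noisy_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5):
    """Gaussian-mechanism sketch: clip a client update to `clip_norm` and add
    noise calibrated to the privacy budget; smaller epsilon means more noise."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=update.shape)

# Example: perturb a toy gradient vector under several privacy budgets.
grad = np.random.randn(100)
for eps in (0.1, 1.0, 10.0):
    noisy = dp_noisy_update(grad, epsilon=eps)
```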
In the E3 scenario, we used different inference attack classifiers to verify the robustness of FedMeta-CTGAN. We chose four classification algorithms to train the attack classifiers, namely random forest (RF), k-nearest neighbor (KNN), gradient boosting decision tree (GBDT), and support vector machine (SVM), and validated the inference AUC scores under the four property inference attacks OV_passive, OV_active, NOV_passive, and NOV_active as a way of quantifying the robustness of the privacy-preserving approach. The RF, KNN, GBDT, and SVM classifiers used here all follow the default settings in the scikit-learn library. The results are given in Table 6; except for an inference AUC score of 0.56 in the NOV_active case with the GBDT attack classifier, the remaining inference AUC scores are around 0.50, which means that the property inference attacks all fail.
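This robustness check can be sketched as follows: each attack classifier is instantiated with scikit-learn defaults and scored with the inference AUC. The make_classification data is only a stand-in for the gradient-derived attack features labelled with the sensitive property, and probability=True for the SVM is needed only to obtain scores for the AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for gradient-derived attack features labelled with the sensitive property.
X, y = make_classification(n_samples=2000, n_features=32, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

attackers = {
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "GBDT": GradientBoostingClassifier(),
    "SVM": SVC(probability=True),  # probability=True to obtain scores for AUC
}
for name, clf in attackers.items():
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print(name, "inference AUC:", roc_auc_score(y_te, scores))
```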