In this part of the experiments, we conducted flight delay prediction. We used CTGAN to generate tabular data, following the original configuration of the authors' code. After meta-training to obtain the global meta-model, each client performs five epochs of meta-adaptation, yielding a personalized model for each client. An ANN is defined as the flight delay prediction model in the experiment; specifically, the ANN is derived from a multilayer perceptron (MLP).
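As an illustration, the following Python sketch shows how a client-side query set could be generated with the open-source ctgan package and how a minimal MLP-based delay classifier could be defined in PyTorch. The layer sizes and the names client_df and discrete_cols are placeholders, not the exact configuration used in our code.

```python
import torch.nn as nn
from ctgan import CTGAN  # named CTGANSynthesizer in older ctgan releases

# client_df: the client's local flight table (pandas DataFrame, placeholder);
# discrete_cols: list of its categorical column names (placeholder).
synth = CTGAN()                           # default configuration
synth.fit(client_df, discrete_cols)
query_set = synth.sample(len(client_df))  # synthetic query data for meta-training

# Minimal MLP-based ANN for binary delay prediction; layer sizes are assumptions.
class DelayMLP(nn.Module):
    def __init__(self, n_features, hidden=(64, 32)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden[0]), nn.ReLU(),
            nn.Linear(hidden[0], hidden[1]), nn.ReLU(),
            nn.Linear(hidden[1], 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)
```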
To validate the effectiveness of FedMeta-CTGAN, we first introduce and analyze the dataset and then present two main parts of experimental evaluation: (1) performance evaluation, which assesses the model's prediction effectiveness under an imbalanced number of samples across clients and compares it with other prediction methods; (2) privacy-preserving effectiveness evaluation, which validates the prediction accuracy and the privacy of the proposed privacy-preserving method under a property inference attack, compares it with differential privacy methods under different privacy budgets, and verifies its robustness against different property inference attack classifiers. The evaluation metrics mainly include classification accuracy and the inference AUC score. Classification accuracy measures the effectiveness of the trained model on the target classification task; the inference AUC score measures the adversary's ability to infer the sensitive property from the collected gradient information.
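For concreteness, a minimal sketch of how the two metrics can be computed with scikit-learn is given below; the toy arrays are placeholders for the real model predictions and the adversary's attack-classifier scores.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Toy values standing in for real experiment outputs.
y_true = np.array([0, 1, 0, 0, 1])                    # true delay labels
y_pred = np.array([0, 1, 0, 1, 1])                    # personalized model predictions
prop_labels = np.array([1, 0, 1, 1, 0])               # sensitive-property labels of updates
attack_scores = np.array([0.8, 0.3, 0.6, 0.7, 0.4])   # attack classifier scores

accuracy = accuracy_score(y_true, y_pred)                   # target-task accuracy
inference_auc = roc_auc_score(prop_labels, attack_scores)   # adversary's inference AUC
```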
5.1. Dataset Setup
Considering the problem of insufficient data, the public dataset “Flight Dynamics and Landing Dataset” [44] was used in the experiment. The dataset contains four parts: historical flight dynamic takeoff and landing data, a historical city weather table, an airport city correspondence table, and a historical airport special case table. For departure delay prediction, the city weather table, the airport city correspondence table, and the historical airport special case table are matched and joined with the historical flight dynamic takeoff and landing data to obtain the data table required for training the prediction model. After data preprocessing, a flight dynamics table containing all the feature information is obtained.
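The join step can be sketched with pandas as follows; the file names and join keys ("airport", "city", "date") are assumptions about the dataset schema, not the exact columns.

```python
import pandas as pd

# File and column names below are hypothetical; they depend on the dataset schema.
flights = pd.read_csv("flight_dynamics.csv")        # historical takeoff/landing data
weather = pd.read_csv("city_weather.csv")           # historical city weather table
airport_city = pd.read_csv("airport_city.csv")      # airport city correspondence table
special = pd.read_csv("airport_special_cases.csv")  # historical airport special cases

# Map each departure airport to its city, then join weather and special-case
# records on the flight date to build the training table.
df = flights.merge(airport_city, on="airport", how="left")
df = df.merge(weather, on=["city", "date"], how="left")
df = df.merge(special, on=["airport", "date"], how="left")
```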
By analyzing the obtained flight dynamics table containing all the features, it can be seen that the initial dataset is extremely unbalanced.
Figure 6 shows the number of delayed flights versus non-delayed flights in the dataset, and we find that delayed flights account for only 4.4%.
Table 2 reports the summary statistics obtained with the describe function, and it can be seen that some of the feature columns are also unbalanced. For example, for the departure special case and arrival special case features, the minimum and the first, second, and third quartiles are all 0, while the maximum is 1. The feature correlation heatmap in Figure 7 shows the correlation coefficients among all the extracted features, and it can be seen that the prior delay has the strongest correlation with flight departure delay.
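The imbalance check, summary statistics, and correlation heatmap can be reproduced along the following lines; df and the column name "departure_delay" are placeholders, and a recent pandas version is assumed for the numeric_only argument.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# `df` is the preprocessed flight dynamics table; "departure_delay" is a
# placeholder name for the binary delay label.
print(df["departure_delay"].value_counts(normalize=True))  # class imbalance (cf. Figure 6)
print(df.describe())                                        # per-column statistics (cf. Table 2)

sns.heatmap(df.corr(numeric_only=True), cmap="coolwarm")    # correlation heatmap (cf. Figure 7)
plt.show()
```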
5.2. Performance Evaluation
For the proposed approach based on federated meta-learning and CTGAN, we evaluate the prediction accuracy when the number of samples is unbalanced across clients to verify the effectiveness of the federated meta-learning framework in adapting to unbalanced data. The training data are divided by the number of entries into two parts, a small dataset D1 and a large dataset D2, which serve as the datasets of two participating clients. We vary the percentage of data owned by each client across multiple experiments and evaluate all splits: the 'small dataset' client holds D1 = {10%, 20%, …, 50%} of the data, and the 'large dataset' client holds the complementary D2 = {90%, 80%, …, 50%}.
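A minimal sketch of this split procedure is shown below; random shuffling, the seed, and the variable name train_df are assumptions about details the text does not specify.

```python
import numpy as np

def split_clients(table, small_frac, seed=0):
    """Split the training table into a small client dataset D1 (small_frac of
    the rows) and a large client dataset D2 (the remaining rows)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(table))
    cut = int(small_frac * len(table))
    return table.iloc[order[:cut]], table.iloc[order[cut:]]

# Evaluate every imbalance ratio used in the experiment; `train_df` is the
# preprocessed training table (placeholder).
for frac in (0.1, 0.2, 0.3, 0.4, 0.5):
    d1, d2 = split_clients(train_df, frac)
    # ... run federated meta-training with clients holding d1 and d2 ...
```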
Figure 8 shows the performance evaluation on the obtained flight dynamics dataset. In general, the party with the larger amount of data tends to achieve higher accuracy, because enough samples allow the model to fully learn the task, which explains the upward and downward trends of the two curves. At the same time, we notice that the accuracy of the client with the smaller dataset improves as its sample size increases, suggesting that the meta-learning model learns new knowledge quickly and rapidly adapts to the effect of unbalanced data on model accuracy.
In addition, for the flight delay prediction problem, we compared the proposed FedMeta-CTGAN (ANN) with five methods: artificial neural network (ANN), gradient-boosted decision tree (GBDT), logistic regression (LR), extreme gradient boosting (XGBoost), and federated meta-learning using an ANN (FedMeta (ANN)), as shown in
Table 3. It is worth noting that the GBDT, LR, and XGBoost models all follow the default settings of their library implementations (scikit-learn for GBDT and LR, and the XGBoost library for XGBoost). Both centralized and distributed models use the same amount of data; in distributed learning, we distribute the data evenly across the clients and record the accuracy of the model after meta-adaptation training. We found that the distributed method FedMeta-CTGAN (ANN), which uses data generated by CTGAN as the meta-training query dataset, did not reduce model accuracy compared with FedMeta (ANN) without CTGAN. Even compared with the centralized learning methods (ANN, LR, GBDT, and XGBoost), our method achieved high accuracy, less than 1% below GBDT, which has the highest prediction accuracy, while providing the privacy preservation that the centralized methods lack.
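The centralized baselines can be reproduced along the following lines using the default constructors; the synthetic make_classification data below is only a stand-in for the preprocessed flight table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for the preprocessed flight features and delay labels.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baselines = {
    "LR": LogisticRegression(),            # scikit-learn defaults
    "GBDT": GradientBoostingClassifier(),  # scikit-learn defaults
    "XGBoost": XGBClassifier(),            # xgboost defaults
    "ANN": MLPClassifier(),                # MLP baseline
}
for name, clf in baselines.items():
    clf.fit(X_tr, y_tr)
    print(name, accuracy_score(y_te, clf.predict(X_te)))
```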
5.3. Privacy-Preserving Effectiveness Evaluation
In the processed flight dynamics dataset, we set up three scenarios, each consisting of a target classification task, a sensitive property, and the correlation coefficient between the sensitive property and the target property feature. All three scenarios share the same target classification task, i.e., departure delay prediction. The sensitive property in the first scenario is the departure special case, with a correlation coefficient of 0.08 with the target property feature (denoted as E1); the sensitive property in the second scenario is the prior delay, with a correlation coefficient of 0.45 (denoted as E2); and the sensitive property in the third scenario is the arrival special case, with a correlation coefficient of 0.07 (denoted as E3). An overview of all experimental scenarios is presented in Table 4.
For the property inference attack, we assume that an adversary can launch an active or a passive attack. At the same time, the adversary can either steal the update information of the victims only (only victims, henceforth referred to as OV) or reverse the loss of multiple participants (not only victims, henceforth referred to as NOV). Combining the attack behavior initiated by the attacker with the target of the property inference attack, we obtain four property inference attack cases: OV_active, OV_passive, NOV_active, and NOV_passive; detailed information is shown in Table 5.
For each experimental scenario, we validate the classification accuracy of the model and the adversary's inference AUC score before using FedMeta-CTGAN (hereafter written as W/O Protection) and after using FedMeta-CTGAN (hereafter written as W/Protection) under the four property inference attack cases. We use the model's prediction results under the property inference attack and the adversary's inference AUC score to quantify the effectiveness and the privacy of the privacy-preserving approach.
As shown in Figure 9, in the experimental scenario E1, the sensitive property is the departure special case and the target task is delay prediction. In Figure 9a, before using FedMeta-CTGAN (W/O Protection), the adversary achieves high inference AUC scores in OV_passive and NOV_passive, which proves that the property inference attack is successful, whereas after using FedMeta-CTGAN (W/Protection) the inference scores all drop to around 0.5, which proves that FedMeta-CTGAN is able to withstand the property inference attack. In Figure 9b, it can be observed that, before and after using FedMeta-CTGAN, the effectiveness of the personalized models obtained by the participants after local adaptation is almost unaffected, and all of them remain above 91%. As shown in Figure 10, in the experimental scenario E2, the sensitive property is the prior delay, which is more strongly correlated with the target task. In Figure 10a, we clearly see that the inference AUC score reaches 0.9 and above in the OV_active, OV_passive, and NOV_active cases before using FedMeta-CTGAN (W/O Protection), showing that a sensitive property more strongly related to the target prediction task is more likely to be inferred, whereas with FedMeta-CTGAN (W/Protection) the inference AUC scores are all around 0.5, which proves that FedMeta-CTGAN is able to protect even those properties that are more likely to be inferred because of their relation to the target classification task. Meanwhile, Figure 10b shows that the locally adapted model still achieves the prediction accuracy obtained before using FedMeta-CTGAN. Similarly, in the experimental scenario E3, as shown in Figure 11, we obtain similar conclusions: our approach is able to withstand the property inference attack while the accuracy of the model after meta-adaptation training is not affected.
Next, in the E2 scenario, we compare W/O Protection with differential privacy (hereafter DP) methods under different privacy budgets, as shown in Figure 12. We apply the DP method during the training of the flight delay prediction model. We find that the model accuracy decreases to some extent after adding different amounts of noise to the data; meanwhile, as the privacy budget increases, less noise is added and the inference AUC score rises rapidly, implying that the adversary successfully launches the property inference attack. Comparing these accuracy and inference scores with those of W/O Protection indicates that the system is vulnerable to the property inference attack.
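As a rough illustration of this trade-off, the sketch below applies a generic Gaussian mechanism to a client update: smaller privacy budgets inject more noise. The clipping norm, delta, and the exact noise calibration are assumptions and may differ from the DP baseline evaluated here.

```python
import numpy as np

def dp_noisy_update(update, clip_norm=1.0, epsilon=1.0, delta=1e-5):
    """Gaussian-mechanism sketch: clip a client update to `clip_norm` and add
    noise calibrated to the privacy budget; smaller epsilon means more noise."""
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    sigma = clip_norm * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped + np.random.normal(0.0, sigma, size=update.shape)

# Example: perturb a toy gradient vector under several privacy budgets.
grad = np.random.randn(100)
for eps in (0.1, 1.0, 10.0):
    noisy = dp_noisy_update(grad, epsilon=eps)
```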
In the E3 scenario, we used different inference attack classifiers to verify the robustness of FedMeta-CTGAN. We chose four classification algorithms to train the attack classifiers, namely random forest (RF), k-nearest neighbor (KNN), gradient boosting decision tree (GBDT), and support vector machine (SVM), and validated the inference AUC scores under the four property inference attacks OV_passive, OV_active, NOV_passive, and NOV_active as a way of quantifying the robustness of the privacy-preserving approach. The RF, KNN, GBDT, and SVM classifiers used here all follow the default settings in the scikit-learn library. The results are given in Table 6; except for an inference AUC score of 0.56 in the NOV_active case with the GBDT attack classifier, the remaining inference AUC scores are around 0.50, which means that the property inference attacks all fail.
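This robustness check can be sketched as follows: each attack classifier is instantiated with scikit-learn defaults and scored with the inference AUC. The make_classification data is only a stand-in for the gradient-derived attack features labelled with the sensitive property, and probability=True for the SVM is needed only to obtain scores for the AUC.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in for gradient-derived attack features labelled with the sensitive property.
X, y = make_classification(n_samples=2000, n_features=32, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

attackers = {
    "RF": RandomForestClassifier(),
    "KNN": KNeighborsClassifier(),
    "GBDT": GradientBoostingClassifier(),
    "SVM": SVC(probability=True),  # probability=True to obtain scores for AUC
}
for name, clf in attackers.items():
    clf.fit(X_tr, y_tr)
    scores = clf.predict_proba(X_te)[:, 1]
    print(name, "inference AUC:", roc_auc_score(y_te, scores))
```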