Article

Cloud–Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction

1 Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
2 Shanghai Aerospace Electronic Technology Institute, Shanghai 201109, China
3 Shanghai Key Laboratory of Collaborative Computing in Spatial Heterogeneous Networks (CCSN), Shanghai 201109, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8335; https://doi.org/10.3390/app15158335
Submission received: 27 June 2025 / Revised: 19 July 2025 / Accepted: 21 July 2025 / Published: 26 July 2025

Featured Application

Cloud-Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction

Abstract

With the rapid development of smart devices and the Internet of Things (IoT), the explosive growth of data has placed increasingly higher demands on real-time processing and intelligent decision making. Cloud-edge collaborative computing has emerged as a mainstream architecture to address these challenges. However, in sky-ground integrated systems, the limited computing capacity of edge devices and the inconsistency between cloud-side fusion results and edge-side detection outputs significantly undermine the reliability of edge inference. To overcome these issues, this paper proposes a cloud-edge collaborative model adaptation framework that integrates deep reinforcement learning via Deep Q-Networks (DQN) with local feature transfer. The framework enables category-level dynamic decision making, allowing for selective migration of classification head parameters to achieve on-demand adaptive optimization of the edge model and enhance consistency between cloud and edge results. Extensive experiments conducted on a large-scale multi-view remote sensing aircraft detection dataset demonstrate that the proposed method significantly improves cloud-edge consistency. The detection consistency rate reaches 90%, with some scenarios approaching 100%. Ablation studies further validate the necessity of the DQN-based decision strategy, which clearly outperforms static heuristics. In the model adaptation comparison, the proposed method improves the detection precision of the A321 category from 70.30% to 71.00% and the average precision (AP) from 53.66% to 53.71%. For the A330 category, the precision increases from 32.26% to 39.62%, indicating strong adaptability across different target types. This study offers a novel and effective solution for cloud-edge model adaptation under resource-constrained conditions, enhancing both the consistency of cloud-edge fusion and the robustness of edge-side intelligent inference.

1. Background

With the development of smart devices and Internet of Things (IoT) technologies, the massive generation of data has placed higher demands on real-time processing and intelligent analysis. The collaboration between cloud computing and edge computing has become the mainstream architecture: edge computing is responsible for real-time data processing at the device level near the data source, while cloud computing handles large-scale data analysis, storage, and processing in the resource-rich cloud environment [1]. Reasonably allocating tasks between the cloud and the edge to achieve cloud-edge collaborative decision making has therefore become a mainstream research focus in recent years. Cloud-edge collaborative decision making features distributed processing and low latency [2], making it widely applicable in fields like intelligent security, industrial quality inspection, autonomous driving, and smart transportation [3].
Although the cloud-edge collaborative architecture provides strong support for various intelligent application scenarios, the strict constraints on computing power, energy consumption, and storage of edge devices lead to reduced accuracy and robustness of models in tasks like object detection or classification [4]. To this end, a growing number of studies have focused on the fine-tuning of edge-side models (see [5,6]) by introducing an optimization mechanism equipped with intelligent perception and decision making capabilities. This enables the system to adaptively perform localized updates of the classification head based on the real-time state of edge devices, thereby improving the long-term stability and adaptability of edge-side models.
To address the need for local adaptation of edge models in classification tasks within object detection, reinforcement learning (RL) offers a feasible strategy-driven framework capable of learning optimal action policies through interaction with environmental feedback [7]. In particular, the Deep Q-Network (DQN) models the decision making process through a state-action-reward framework and can be employed to intelligently determine whether model transfer should be performed for a specific target class [8]. In addition, the feature extraction mechanism in transfer learning [9] can adjust the feature parameters of a specific category in the edge model based on discrepancies between detection results from the cloud and the edge. This enables precise and cost-effective model enhancement without compromising the stability of the edge-side backbone network. The integration of both approaches gives rise to a collaborative optimization framework that unifies cloud-edge policy decision making with parameter transfer: the DQN agent makes transfer decisions for specific categories based on the observed performance discrepancies between the cloud and edge environments, while the transfer mechanism achieves rapid adaptation to edge-side data by replacing the parameters of the classification head. This design not only enhances the robustness of the edge model in complex and dynamic scenarios but also facilitates efficient coordination of computational resources between the cloud and the edge.
In recent years, various approaches have been proposed to address model deployment and optimization in edge environments, including lightweight techniques such as model pruning, quantization, distillation, and parameter sharing. Pruning (see [10,11]) and quantization (see [12,13]) reduce model storage and improve inference efficiency by removing unimportant connections or neurons in neural networks or by converting high-precision floating-point values into low-precision integers. Distillation (see [14,15]) reduces model size while preserving performance by transferring knowledge from a large model to a smaller one. Parameter sharing (see [16,17]) reduces the number of parameters and improves computational efficiency by reusing identical weights across the network. Although these methods effectively reduce model size and computational overhead, they still require a trade-off between accuracy and efficiency in practical applications. Moreover, existing studies often focus on compressing or distilling the entire model, while few approaches address localized transfer and adaptive control specifically targeting the classification head. Meanwhile, in edge environments characterized by significant data heterogeneity, there remains a lack of suitable transfer methods equipped with intelligent decision making mechanisms. While reinforcement learning has been explored in areas like task scheduling and inference path selection [18], research on class-wise decision making for local model transfer using DQN is still in its early stages [19]. Ref. [20] employed a Deep Q-Network (DQN) to perform collaborative scheduling among edge devices within a Multi-Access Edge Computing (MEC) architecture, aiming to solve the job shop scheduling problem in smart factory processes. Ref. [21] proposed a Prioritized Action Sampling-based Dueling DQN (PASD) algorithm to determine task offloading and resource allocation strategies that minimize average task delay and total system energy consumption, thereby addressing the joint allocation of network and computing resources in cloud-edge collaborative Industrial Internet of Things (IIoT) environments. Such research remains particularly scarce within the context of cloud-edge collaboration. Therefore, there is an urgent need for a systematic approach that integrates reinforcement-learning-based decision making mechanisms with local transfer optimization strategies to enable adaptive edge model deployment.
This paper addresses two challenges of sky-ground integrated systems: the limited computing resources of edge devices and the inconsistency between cloud-side fusion results and edge-side detection results. It proposes a cloud-edge collaborative edge model adaptation method that integrates DQN-based decision making with transfer feature extraction, aiming to enhance the consistency between the edge model and the cloud fusion results and thereby ensure the robustness of the edge model. The main contributions of this paper include the following.
Model Adaptive Transfer Mechanism: A DQN agent is employed to perform class-level decision making for transfer, enabling model updates for specific target categories.
Dynamic Decision Making Mechanism: Unlike traditional static strategies, the proposed method can dynamically determine the scope of patch transfer in real time based on the current operational state of the edge device.
Cloud-Edge Collaborative Enhancement: A general transfer strategy for multi-class object detection is developed to improve the consistency between edge-side models and cloud-side fusion results.
The overall structure of the paper is organized as follows.
Section 2 (Preliminaries) introduces the problem setting and core assumptions, which lay the foundation for the subsequent model design; Section 3 (Model Framework) presents the proposed cloud-edge collaborative model adaptation framework, detailing both the decision making mechanism and the transfer strategy; Section 4 (Experiment) provides extensive experimental validation to evaluate the effectiveness and practicality of the proposed method; and Section 5 (Conclusion and Future Work) summarizes the research findings and outlines potential directions for future work.

2. Preliminaries

For edge object detection in multi-view satellite imagery scenarios, a cloud-edge collaborative model adaptation method driven by reinforcement learning is proposed. This method integrates a Faster Region-based Convolutional Neural Network (Faster R-CNN) [22], transfer learning-based feature extraction strategies, and a DQN-based decision mechanism to achieve dynamic optimization of the classification head in edge-side models.

2.1. Definition

At a given observation time $t$, static images of aircraft parked at an airport are captured from four perspectives $\theta \in \{0°, 90°, 180°, 270°\}$, and each image $I_t^{\theta}$ may contain multiple aircraft targets. Each image is fed into the edge-side Faster R-CNN (Faster Region-Based Convolutional Neural Network) object detection model $f_{edge}$ to obtain the category prediction set:
$$\tilde{Y}_t^{\theta} = f_{edge}(I_t^{\theta})$$
The prediction results from all perspectives are aggregated in the cloud, and the global prediction result is obtained through the cloud fusion algorithm $f_{fusion}$:
$$\tilde{Y}_t^{cloud} = f_{fusion}\left(\{\tilde{Y}_t^{\theta}\}_{\theta \in \{0°, 90°, 180°, 270°\}}\right)$$
The model detects 11 types of aircraft, defined as the category set:
$$C = \{\mathrm{A220}, \mathrm{A321}, \mathrm{A330}, \mathrm{A350}, \mathrm{ARJ21}, \mathrm{Boeing737}, \mathrm{Boeing747}, \mathrm{Boeing777}, \mathrm{Boeing787}, \mathrm{C919}, \mathrm{other\text{-}airplane}\}$$
The cloud-side and edge-side detection results at this moment are recorded in the result.xlsx file and serve as the state input for reinforcement learning.

2.2. Problem Statement

In collaborative decision making for cloud-edge object detection, edge-side detection results are expected to closely approximate cloud-side fusion outcomes under limited computational resources. Given the high cost and time-consuming nature of adjusting the entire model on the edge side, frequent full-model updates are impractical. Therefore, this approach introduces reinforcement-learning-based decision and policy mechanisms along with a localized classification head transfer strategy. It dynamically determines whether to execute transfer based on the state of each category and selectively replaces corresponding classification weights, thereby enhancing edge model performance at a low cost.

3. Model Framework

Figure 1 illustrates the workflow of cloud-edge collaboration. The edge-side model first performs object detection on the target image using a pre-trained edge model and uploads the detection results to the cloud. The cloud then fuses multi-view detection results from the edge. In response to inconsistencies between cloud-side and edge-side detection results, the edge employs a reinforcement-learning-based decision mechanism to determine whether to conduct localized classification head transfer training for specific categories. This enables low-overhead, highly adaptive model self-updating, ultimately forming a closed-loop cloud-edge collaborative system that integrates edge detection, cloud fusion, and policy optimization.
To address the inconsistency between edge-side detection results and cloud-side fusion outcomes, a cloud-edge collaborative model adaptation method is proposed, which integrates DQN-based decision making with transfer feature extraction. The detailed model architecture is illustrated in Figure 2.

3.1. Reinforcement Learning Decision and Strategy Mechanism

To achieve adaptive optimization of edge detection models in multi-view target detection tasks, a category migration decision mechanism based on reinforcement learning is designed, with a Deep Q-Network (DQN) as the core decision model. A state-driven approach determines whether local migration is performed for a given category, thereby achieving adaptation of the edge detection model under limited resources.

3.1.1. State Vector Construction

For each target category $c \in C$, a state vector $S_c$ is constructed from the following three components:
$$S_c = (acc_c^{edge},\ acc_c^{cloud},\ \Delta_c)$$
$acc_c^{edge}$: the detection accuracy of the edge-side model for category $c$, calculated as $acc_c^{edge} = N_{total,c}^{edge} / N_{sensors}$.
$acc_c^{cloud}$: the detection accuracy of the cloud-side fusion model for category $c$, calculated as $acc_c^{cloud} = N_{total,c}^{cloud} / N_{sensors}$.
$\Delta_c = acc_c^{cloud} - acc_c^{edge}$: the difference between the cloud-side and edge-side accuracies, used as the immediate reward signal in reinforcement learning. The reward $r_c = \Delta_c$ reflects the accuracy improvement brought by the migration operation at the edge and serves as the core feedback signal for Q-network optimization.
$N_{sensors}$: the total number of sensors (satellites).
$N_{total,c}^{edge}$: the number of sensors (satellites) for which the edge model classifies a real aircraft as category $c$.
$N_{total,c}^{cloud}$: the number of sensors (satellites) for which the cloud fusion result classifies a real aircraft as category $c$.
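As a minimal illustration of how this state vector can be assembled from the per-category detection counts (the function and variable names below are ours, not from the paper):

```python
import numpy as np

def build_state(n_edge_c: int, n_cloud_c: int, n_sensors: int) -> np.ndarray:
    """Assemble the per-category state S_c = (acc_edge, acc_cloud, delta)."""
    acc_edge = n_edge_c / n_sensors    # edge-side accuracy for category c
    acc_cloud = n_cloud_c / n_sensors  # cloud-side fusion accuracy for category c
    delta = acc_cloud - acc_edge       # accuracy gap, also the immediate reward
    return np.array([acc_edge, acc_cloud, delta], dtype=np.float32)
```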

3.1.2. Reinforcement Learning Strategy

A Deep Q-Network (DQN) is used as the migration strategy agent. For the state $S_c$ of each category, the DQN outputs an action $a_c \in \{0, 1\}$, where 1 means executing migration and 0 means skipping it. Action selection adopts an ε-greedy policy: the agent selects a random action (exploration) with probability ε and the action with the highest Q-value under the current Q-network (exploitation) with probability 1 − ε. The value of ε gradually decays during training, allowing the strategy to shift from early-stage exploration to convergence towards the optimal policy. Experience tuples $(s_c, a_c, r_c, s_c')$ are stored in a replay buffer and replayed during training, with the goal of minimizing the Bellman residual:
$$L_Q = \left(Q(s_c, a_c) - \left[r_c + \gamma \max_{a'} Q(s_c', a')\right]\right)^2$$
where $\gamma$ is the discount factor used to balance long-term and short-term benefits. The training goal is to learn the optimal Q-function and the policy it induces:
$$\pi^*(s) = \arg\max_a Q(s, a)$$
This mechanism can dynamically adjust the migration strategy of the edge model according to the performance feedback of the cloud and the edge in each iteration.
The structure of the Q-network is shown in Table 1.
The reinforcement learning algorithm parameters are shown in Table 2.
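A minimal PyTorch sketch consistent with Tables 1 and 2 is given below; the class and function names are ours, and the network is the 3-input, two-64-unit-hidden-layer, 2-output MLP of Table 1:

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q-network from Table 1: 3 state inputs -> 64 -> 64 (ReLU) -> 2 action values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def select_action(q_net: QNet, state: torch.Tensor, eps: float) -> int:
    """Epsilon-greedy: explore with probability eps, otherwise exploit argmax Q."""
    if random.random() < eps:
        return random.randint(0, 1)  # 1 = execute migration, 0 = skip
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def bellman_loss(q_net: QNet, target_net: QNet, batch, gamma: float = 0.99) -> torch.Tensor:
    """Squared Bellman residual L_Q over a replayed mini-batch (s, a, r, s')."""
    s, a, r, s_next = batch  # tensors of shape (B, 3), (B,), (B,), (B, 3)
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    return ((q_sa - target) ** 2).mean()
```

The separate target network reflects the target-network update frequency listed in Table 2.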

3.1.3. Reward Design and Decision-Making Mechanism

The reward function is used to measure the edge-side accuracy improvement after the migration operation, and it is defined as
$$r_c = acc_c^{cloud} - acc_c^{edge}$$
When a category $c$ is selected for migration (action = 1), transfer training is performed on that category, and the updated edge-side and cloud-side detection accuracies are calculated. If the accuracy difference is less than the set threshold θ = 0.1, the migration of this category is considered effective and recorded as a positive experience. The category then enters a "frozen" state, and migration is not repeated in subsequent training, avoiding wasted resources. If action = 0, migration of this category is skipped, and the reward value of the previous round is retained in the current round to maintain the continuity of the strategy.
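A sketch of this reward-and-freeze logic under the rules above (the names FREEZE_THRESHOLD, frozen, and step_for_category are illustrative, not from the paper):

```python
FREEZE_THRESHOLD = 0.1    # theta in the text
frozen: set[str] = set()  # categories whose migration has converged

def step_for_category(c: str, action: int, acc_cloud: float, acc_edge: float,
                      prev_reward: float) -> float:
    """Apply the reward and freezing rules of Section 3.1.3 for one category."""
    if c in frozen:
        return 0.0                     # converged classes are never migrated again
    if action == 1:
        reward = acc_cloud - acc_edge  # accuracy gap measured after transfer training
        if abs(reward) < FREEZE_THRESHOLD:
            frozen.add(c)              # migration judged effective; freeze this class
        return reward
    return prev_reward                 # action == 0: carry over last round's reward
```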

3.2. Local Transfer Mechanism of Edge Model

3.2.1. Local Classification Head Feature Transfer

In order to improve the detection capability for specific categories while preserving the integrity and stability of the edge detection model's backbone, this paper designs a migration mechanism based on local classification head replacement. Specifically, for a given category $c \in C$, only the weight vector $W_c$ and the bias term $b_c$ are trained; the rest of the network is frozen. The training goal is to minimize the classification loss and bounding box regression loss of the model on the current category. The optimization objective function is
$$L_c = L_{cls} + \beta L_{reg}$$
where $L_{cls} = CE(f_{\theta_c}(x), y_c^{cloud})$ [23] is the cross-entropy loss between the classification head's prediction and the cloud-side fusion label, $L_{reg}$ [24] is the smooth L1 loss on the predicted bounding box position, used to constrain the position offset, and $\beta$ is the loss weight coefficient.
The training process uses mini-batch stochastic gradient descent (SGD) [25], and training continues only while the classification loss decreases significantly. If the loss of multiple consecutive batches falls below the threshold of 0.001, the system automatically triggers early termination to avoid overfitting. After training is completed, the optimized parameters are extracted and saved as a category patch in a .pth file: $patch_c = (W_c^{new}, b_c^{new})$. The patch is integrated into the base edge model once it is later determined that migration is needed, improving the detection performance of that category.
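The sketch below illustrates this localized update, assuming a Faster R-CNN implementation whose class-score layer is exposed as model.head.score (as in Table 3); the training-loop details, the (cls_loss, reg_loss) return convention of the forward pass, and all names other than head.score are our assumptions:

```python
import torch

def train_class_patch(model, loader, class_idx: int, lr: float = 1e-3, beta: float = 1.0,
                      loss_floor: float = 1e-3, patch_path: str = "patch.pth") -> None:
    """Train only row `class_idx` of the score head; everything else stays frozen."""
    for p in model.parameters():
        p.requires_grad_(False)
    score = model.head.score  # per Table 3: a (12, 2048) linear layer plus bias
    score.weight.requires_grad_(True)
    score.bias.requires_grad_(True)
    opt = torch.optim.SGD([score.weight, score.bias], lr=lr)

    for images, targets in loader:
        cls_loss, reg_loss = model(images, targets)  # assumed loss-returning forward
        loss = cls_loss + beta * reg_loss            # L_c = L_cls + beta * L_reg
        opt.zero_grad()
        loss.backward()
        # Zero the gradients of every class row except the one being adapted.
        mask = torch.zeros_like(score.weight.grad)
        mask[class_idx] = 1.0
        score.weight.grad *= mask
        score.bias.grad *= mask[:, 0]
        opt.step()
        if loss.item() < loss_floor:  # simplified form of the early-termination rule
            break

    torch.save({"weight": score.weight[class_idx].detach().clone(),
                "bias": score.bias[class_idx].detach().clone()}, patch_path)
```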

3.2.2. Patch Fusion and Model Update

After the local classification head feature migration is completed, the classification head holds score weights for $C + 1$ indices (the extra index accounting for the background). The trained score weight $W_c \in \mathbb{R}^{2048}$ and bias $b_c \in \mathbb{R}$ are fused into the original model to achieve a local model update: $W_c^{base}$ is replaced by $W_c^{patch}$ and $b_c^{base}$ is replaced by $b_c^{patch}$, where $W_c^{base}$ and $b_c^{base}$ are the corresponding parameters of the model before replacement. The edge model adopts the Faster R-CNN framework with a ResNet backbone for feature extraction. The overall architecture is shown in Table 3.
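A corresponding sketch of the patch fusion step, under the same model.head.score assumption (the function name and file path are illustrative):

```python
import torch

def apply_patch(model, class_idx: int, patch_path: str = "patch.pth") -> None:
    """Overwrite one row of the score head with the trained category patch."""
    patch = torch.load(patch_path)
    with torch.no_grad():
        model.head.score.weight[class_idx] = patch["weight"]  # W_c_base <- W_c_patch
        model.head.score.bias[class_idx] = patch["bias"]      # b_c_base <- b_c_patch
```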

3.2.3. Edge Inference and State Update

After the local migration patch is fused, the updated model performs inference on the edge-side four-view satellite images (0°, 90°, 180°, 270°), and the results are sent to the cloud again for fusion. The post-migration $acc_c^{edge}$ and $acc_c^{cloud}$ are then updated as the state input for the next round of the reinforcement learning module.

3.3. The Algorithm Flowchart and Its Corresponding Pseudocode

The algorithm flowchart is shown in Figure 3, and the pseudocode of the algorithm is shown in Algorithm 1.
Algorithm 1. Adaptive Cloud-Edge Collaborative Model Optimization Algorithm
Initialize DQN agent;
Load initial detection results from result.xlsx;
For each class c in the category set do:
    Step 1: Construct the state vector for reinforcement learning
        Compute edge-side detection accuracy acc_edge_c;
        Compute cloud-side fusion accuracy acc_cloud_c;
        Compute accuracy gap Δc = acc_cloud_c − acc_edge_c;
        Form state vector S_c = (acc_edge_c, acc_cloud_c, Δc);
    Step 2: Input the state into the DQN agent and obtain an action
        action_c = DQN.predict(S_c);
        If action_c == 1:
            Load class-specific training samples (labeled XML);
            Freeze all parameters of Faster R-CNN except the classification head of class c;
            Compute L_c and perform gradient updates only on:
                head.score.weight[c];
                head.score.bias[c];
            Save the updated parameters as a class patch (.pth file);
            Merge the patch into the base edge model;
            Step 3: Perform edge inference and update the state
                Run inference on the multi-view (4-angle) satellite images;
                Update result.xlsx with the new edge-side predictions;
                Recalculate acc_edge_c;
        Else:
            Skip the parameter update and retain the previous model state;
    Step 4: Reinforcement learning update
        Compute reward r_c = |acc_cloud_c − acc_edge_c|;
        Store the experience tuple (S_c, action_c, r_c, S′_c) in the replay buffer;
        Update the DQN by minimizing the Bellman loss L_Q on sampled experiences;
    Step 5: Check the termination condition
        If |acc_cloud_c − acc_edge_c| < threshold:
            Mark class c as "converged" and skip further transfer for this class;
        Else:
            Continue iterative transfer learning for class c;
End For
Output: Updated edge model and final result.xlsx with optimized edge-side detection results

4. Experiment

4.1. Dataset Description

This study utilizes a publicly available aircraft dataset tailored for large-scale and complex scenarios, comprising 1479 original images. To simulate satellite imagery from multiple viewing angles, each image is rotated to four orientations (0°, 90°, 180°, and 270°), resulting in a total of 5916 images. Specifically, the 0° images are from the publicly available dataset, while the other angles are generated by rotating the 0° images. The dataset has a spatial resolution of 1 m and includes 11 aircraft categories: A220, A321, A330, A350, ARJ21, Boeing 737, Boeing 747, Boeing 777, Boeing 787, C919, and other-airplane.
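For reference, the four-view simulation can be reproduced with a few lines of Python (the function name and file handling are illustrative; PIL's rotate with expand=True keeps the full rotated frame):

```python
from PIL import Image

def make_views(path: str) -> dict[int, Image.Image]:
    """Simulate the four viewing angles by rotating the original 0-degree image."""
    base = Image.open(path)
    return {angle: base.rotate(angle, expand=True) for angle in (0, 90, 180, 270)}
```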
The images were primarily collected from three major civilian airports: Shanghai Hongqiao International Airport, Beijing Capital International Airport, and Taiwan Taoyuan International Airport. The dataset features a variety of image sizes to reflect diverse real-world conditions.
For edge-side model training, the dataset is split into 81% for training, 9% for validation, and 10% for testing. The number of samples per aircraft category is detailed in Table 4.

4.2. Experimental Environment

The model proposed in this study is implemented in Python (version 3.12.7, packaged by Anaconda, Inc., released on 4 October 2024). The fusion module is based on the Faster R-CNN object detection framework, and training and inference are carried out in PyTorch, using the PyCharm 2024.3 IDE (Community Edition, Build #PC-243.21565.199, built on 13 November 2024). Edge deployment inference runs on an NVIDIA GeForce RTX 4070 GPU and an SM7 AI acceleration chip. The backbone network of the edge object detection model is ResNet-50. The edge model freezes the backbone and trains only the parameters of the category score output head (score.weight and score.bias) to achieve local migration optimization.

4.3. Consistency Verification of Cloud-Edge Collaboration Implemented Through Reinforcement Learning

For the detection results of different scenes, the consistency between the edge-side detection results for a given aircraft category and the cloud-side fusion results is updated iteratively during DQN training. Figures 4 and 5 show randomly selected 0° images from six scenes containing 6, 8, 12, 14, 22, and 33 aircraft, respectively: Scenes 1, 2, and 3 correspond to the images in Figure 4, while Scenes 4, 5, and 6 correspond to the images in Figure 5.
As the number of DQN iterations increases for each type of scenario, the reward function changes, as shown in Figure 6.
The trends of the curves indicate that the DQN agent is able to quickly learn effective strategies, leading to the optimization of model performance. A detailed analysis is provided below.
(1) Overall Characteristics of the Convergence Trend
The reward curves in all six scenarios show a declining trend followed by stabilization, indicating that the DQN is able to converge effectively in these environments. In the initial stage, the reward values are relatively high, reflecting significant discrepancies between the edge model and the cloud fusion results before optimization. After three to five training iterations, the rewards in most scenarios approach zero, indicating that the edge model, under the guidance of the DQN, has closely approached the detection performance of the cloud model.
(2) Observations of Inter-Scenario Variability
In Scenarios 1 and 2 (with fewer aircraft), the reward function drops rapidly with slight fluctuations, and policy learning is nearly complete by the second iteration. This indicates that in simpler scenarios with less variability, the DQN policy converges faster, and the transfer strategy is easier to determine. In Scenarios 3 and 4 (with medium target density), the reward curves exhibit larger fluctuations but still converge after the fifth iteration. This suggests that in scenarios with moderate target density, the stability of policy learning remains satisfactory. In Scenarios 5 and 6 (high-density, crowded airports), the initial reward values are the highest, and the convergence process is slightly slower. This indicates that in complex scenarios, the discrepancy between the edge model and the cloud model is more pronounced, requiring the DQN to perform more interactions to learn an appropriate transfer strategy. However, convergence is eventually achieved, demonstrating the method’s adaptability in complex environments.
(3) Final Converged Value
According to the figure, in most scenarios, after 10 episodes, the consistency between the cloud-side and edge-side results reaches 100%, while in some scenarios it approaches 100%. This phenomenon may be attributed to the exploration-exploitation trade-off in the DQN learning strategy. In reinforcement learning, the ε-greedy strategy typically retains a certain probability of exploration, which introduces slight fluctuations or non-zero rewards during the training process. This phenomenon reflects the robustness of the strategy—it maintains the effectiveness of the migration while preserving the generalization ability of the policy.

4.4. Ablation and Baseline Comparison Experiments

Edge-side object detection is evaluated per category using precision, recall, and the average precision (AP), i.e., the integral of the precision-recall curve [26]:
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
where TP denotes true positives, FP false positives, and FN false negatives. A detection counts as a true positive when the IoU (Intersection over Union) between the detection box and the ground-truth box exceeds the MINOVERLAP threshold of 0.5.
$$AP = \int_0^1 precision(r)\, dr$$
AP interpolates and averages the precision of each class at different recall values. The closer precision is to 1, the fewer the false positives; the closer recall is to 1, the fewer the missed detections; and the closer AP is to 1, the better the model's overall prediction performance [27].
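As a concrete reference for the metric computation, the following is a standard all-point interpolated AP (written by us, not taken from the paper's code):

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point interpolated AP: area under the precision-recall curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):  # make precision monotonically non-increasing
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]   # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```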
The ablation study and baseline comparison experiments evaluate each aircraft type using precision, recall, and average precision (AP). Additionally, three aircraft types—A321, A330, and other-plane—are randomly selected for inference result recording under Scenario 2.

4.4.1. Analysis of the Impact on Performance After Removing the Reinforcement Learning Mechanism

In the absence of the reinforcement learning mechanism, the triggering of category transfer relies on manual configuration. This experiment aims to verify the effectiveness of introducing a reinforcement learning strategy in improving the model’s adaptability. Table 5 presents the inference performance comparison of different types of aircraft targets with and without the reinforcement learning mechanism.
The experimental results validate the effectiveness of introducing the reinforcement learning mechanism. As a migration trigger strategy, reinforcement learning plays a positive role in the detection of different target categories, improving the model’s adaptability to uncertain targets and enhancing detection accuracy to a certain extent. After removing this mechanism, the model loses its ability to automatically adjust inter-class migration strategies, resulting in a slight decline in inference performance.

4.4.2. Comparison of Different Feature Migration Mechanisms

For the edge-side migrated model, different feature migration configurations were designed by adjusting head.score.bias, head.score.bias+head.score.weight, and head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias. The performance of each configuration was compared with that of the original model. The experimental results are shown in Table 6.
As shown in the table, transferring the edge-side model’s classification head components head.score.bias+head.score.weight can improve the original model’s precision, recall, and average precision (AP) to varying degrees. For the A321 category, this transfer strategy achieves stable performance, with precision reaching 71.00% and AP reaching 53.71%, both exceeding the original model. For the A330 category, the precision reaches 39.62%, while in the other-plane category, the recall under this configuration reaches 59.40%, outperforming both the original model and single-parameter transfer. This indicates that it effectively balances recall and false positives. In contrast, transferring the entire classification and regression heads leads to performance degradation, suggesting that excessive transfer may introduce noise and compromise the model’s generalization ability.

4.4.3. Comparison Between the Proposed Migration Mechanism and Mainstream Model Adaptation Methods

Mainstream model adaptation methods mainly include strategies like model pruning, knowledge distillation, and lightweight structural design. To verify the effectiveness of the proposed transfer feature extraction mechanism, two representative baseline methods were selected for comparison: structured pruning and soft label distillation. The inference performance of these methods was evaluated against the proposed method on the edge model.
In the pruning part, a structured channel pruning strategy was adopted. Specifically, pruning operations were performed on the conv1 and conv2 layers within the head.classifier module of the edge model to reduce redundant network parameters and improve the model’s inference efficiency on edge devices.
In the distillation part, the classical soft label distillation method was adopted. This strategy fixes the teacher model (the pre-trained edge model weights) and compares the logits bias outputs of the student model under the same input, achieving parameter-level consistency learning. By aligning the outputs of head.score.bias[class_id] between the teacher and student models and minimizing the Mean Squared Error (MSE), local parameter knowledge transfer and learning for the target category are achieved.
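A minimal sketch of this baseline, assuming the same head.score layout as before (the function name, step count, and learning rate are illustrative):

```python
import torch
import torch.nn.functional as F

def distill_bias(teacher, student, class_idx: int, lr: float = 1e-3, steps: int = 100) -> None:
    """Align the student's per-class score bias with the frozen teacher's via MSE."""
    target = teacher.head.score.bias[class_idx].detach()  # teacher stays fixed
    student.head.score.bias.requires_grad_(True)
    opt = torch.optim.SGD([student.head.score.bias], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(student.head.score.bias[class_idx], target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```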
The inference results are shown in Table 7.
As shown in the table, compared with the original model, the precision, recall, and AP metrics for each category exhibit slight fluctuations after applying the pruning strategy. However, the overall differences are not significant, indicating that the model still maintains strong representation capability even with reduced parameter volume. In addition, the soft label distillation strategy demonstrates performance that is almost identical to, or slightly better than, the original model across all categories. This indicates that effective knowledge transfer can be achieved by aligning and optimizing head.score.bias without modifying the model’s structure. In summary, the proposed transfer feature extraction mechanism maintains good robustness and stability compared with mainstream adaptation strategies, such as structured pruning and soft label distillation, further validating its feasibility and practical value in edge scenarios.

5. Conclusions and Future Work

This study proposes a cloud-edge collaborative model adaptation method that combines DQN-based decision making with transfer feature extraction, aiming to enhance the consistency between edge-side inference results and cloud-side fusion results while ensuring the robustness of the edge model. The experiments validate the feasibility and effectiveness of the DQN-based local migration strategy in achieving this consistency, while also maintaining the robustness of the edge detection model to a certain extent.
However, due to the inherent limitations of the edge model’s detection accuracy, the current feature migration strategy still struggles to significantly overcome performance bottlenecks in category classification tasks. In addition, the proposed method still has certain limitations in handling edge intelligence tasks, such as few-shot learning, new category adaptation, and long-term online updates. On the one hand, the local migration strategy relies on sufficient training samples of existing categories, making it less adaptable to new categories or scenarios with scarce data. On the other hand, as edge devices operate over extended periods, the number of target categories will continuously expand. Therefore, the model needs to possess continuous learning capabilities to prevent previously learned knowledge from being overwritten by new tasks, thus avoiding catastrophic forgetting.
Based on the above issues, future research will focus on the following three directions to further improve the cloud-edge collaborative optimization framework.
(1) Enhancing the Detection Accuracy of the Edge Model Itself
In the future, the detection capability of the edge model will be comprehensively improved by enhancing the diversity of training data, such as expanding data sampling and applying data augmentation strategies, and by optimizing the model’s architecture, for example, by introducing lightweight and efficient feature extraction modules and multi-scale detection heads. These enhancements will strengthen the model’s perception accuracy from the source, providing a more solid performance foundation for subsequent migration and updates.
(2) Introducing Meta-Learning Mechanisms to Enhance Few-Shot Adaptation Capability
The current strategy mainly focuses on migration optimization for known categories, lacking the ability to quickly adapt to new classes. Future research will introduce a meta-learning framework, such as MAML or Prototypical Networks, for pre-training on the cloud side. This will enable the edge model to rapidly learn from a small number of samples, effectively addressing few-shot scenarios, such as the emergence of new aircraft types in real-world applications, and enhancing the model’s generalization capability.
(3) Integrating Incremental Learning Strategies to Achieve Continuous Learning and Knowledge Retention
To address the practical need for continuously expanding target categories in edge scenarios, it is necessary to construct an update mechanism with knowledge retention. Future work will explore class-incremental learning methods, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and replay buffer strategies, to enable the model to retain the recognition ability of previously learned categories when incorporating new ones. This will help prevent catastrophic forgetting and support the long-term stable operation and knowledge-compatible learning of edge intelligence systems.

Author Contributions

Conceptualization, S.T.; Methodology, J.C., X.C. and Y.J.; Validation, J.C.; Formal analysis, S.T.; Investigation, Y.J.; Writing—original draft, J.C.; Visualization, Y.J.; Supervision, X.C. and S.T.; Funding acquisition, X.C. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research is sponsored by the National Natural Science Foundation of China (62273147) and the Foundation of Shanghai Key Laboratory of Collaborative Computing in Spatial Heterogeneous Networks (CCSN-2025-07).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shan, C.; Gao, R.; Han, Q.; Liu, T.; Yang, Z.; Zhang, J.; Xia, Y. KCES: A Workflow Containerization Scheduling Scheme Under Cloud-Edge Collaboration Framework. IEEE Internet Things J. 2024, 12, 2026–2042.
  2. Lin, W.; Zhu, M.; Zhou, X.; Zhang, R.; Zhao, X.; Shen, S.; Sun, L. A Deep Neural Collaborative Filtering Based Service Recommendation Method with Multi-Source Data for Smart Cloud-Edge Collaboration Applications. Tsinghua Sci. Technol. 2024, 29, 897–910.
  3. Guo, L.; He, Y.; Wan, C.; Li, Y.; Luo, L. From cloud manufacturing to cloud-edge collaborative manufacturing. Robot. Comput. Manuf. 2024, 90, 102790.
  4. Xu, P.; Wang, K.; Hassan, M.M.; Chen, C.-M.; Lin, W.; Hassan, R.; Fortino, G. Adversarial Robustness in Graph-Based Neural Architecture Search for Edge AI Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2022, 24, 8465–8474.
  5. Feng, L.; Yang, Y.; Tan, M.; Zeng, T.; Tang, H.; Li, Z.; Niu, Z.; Feng, F. Adaptive multi-source domain collaborative fine-tuning for transfer learning. PeerJ Comput. Sci. 2024, 10, e2107.
  6. Cao, Z.; Kwon, M.; Sadigh, D. Transfer Reinforcement Learning Across Homotopy Classes. IEEE Robot. Autom. Lett. 2021, 6, 2706–2713.
  7. Wang, Y.; Liu, H.; Zheng, W.; Xia, Y.; Li, Y.; Chen, P.; Guo, K.; Xie, H. Multi-Objective Workflow Scheduling with Deep-Q-Network-Based Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 39974–39982.
  8. Zhong, H.; Yu, S.; Trinh, H.; Lv, Y.; Yuan, R.; Wang, Y. Fine-tuning transfer learning based on DCGAN integrated with self-attention and spectral normalization for bearing fault diagnosis. Measurement 2023, 210, 112421.
  9. Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.-H.; Leung, K.K.; Tassiulas, L. Model Pruning Enables Efficient Federated Learning on Edge Devices. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10374–10386.
  10. Wang, C.-H.; Huang, K.-Y.; Yao, Y.; Chen, J.-C.; Shuai, H.-H.; Cheng, W.-H. Lightweight Deep Learning: An Overview. IEEE Consum. Electron. Mag. 2022, 13, 51–64.
  11. Li, Y.; Zhang, S.; Wang, W.-Q. A Lightweight Faster R-CNN for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4006105.
  12. Mei, S.; Chen, X.; Zhang, Y.; Li, J.; Plaza, A. Accelerating Convolutional Neural Network-Based Hyperspectral Image Classification by Step Activation Quantization. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5502012.
  13. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
  14. Gou, J.; Sun, L.; Yu, B.; Wan, S.; Tao, D. Hierarchical Multi-Attention Transfer for Knowledge Distillation. ACM Trans. Multimedia Comput. Commun. Appl. 2023, 20, 1–20.
  15. Zhao, C.; Wang, S.; Li, D.; Liu, X.; Yang, X.; Liu, J. Cross-domain sentiment classification via parameter transferring and attention sharing mechanism. Inf. Sci. 2021, 578, 281–296.
  16. Xuan, S.; Zheng, L.; Chung, I.; Wang, W.; Man, D.; Du, X.; Yang, W.; Guizani, M. An incentive mechanism for data sharing based on blockchain with smart contracts. Comput. Electr. Eng. 2020, 83, 106587.
  17. Miao, Q.; Lin, H.; Wang, X.; Hassan, M.M. Federated deep reinforcement learning based secure data sharing for Internet of Things. Comput. Netw. 2021, 197, 108327.
  18. Taghipour, S.; Namoura, H.A.; Sharifi, M.; Ghaleb, M. Real-time production scheduling using a deep reinforcement learning-based multi-agent approach. INFOR Inf. Syst. Oper. Res. 2024, 62, 186–210.
  19. Liang, Z.; Yang, R.; Wang, J.; Liu, L.; Ma, X.; Zhu, Z. Dynamic constrained evolutionary optimization based on deep Q-network. Expert Syst. Appl. 2024, 249, 123592.
  20. Lee, S.; Choo, H.; Ismail, R. Smart Manufacturing Scheduling System: DQN based on Cooperative Edge Computing. In Proceedings of the 15th International Conference on Ubiquitous Information Management and Communication (IMCOM 2021), Seoul, Republic of Korea, 4–6 January 2021.
  21. Qin, W.; Chen, H.; Wang, L. PASD: A Prioritized Action Sampling-Based Dueling DQN for Cloud-Edge Collaborative Computation Offloading in Industrial IoT. In China Conference on Wireless Sensor Networks; Springer Nature: Singapore, 2022; pp. 19–30.
  22. Moustafa, M.S.; Metwalli, M.R.; Samshitha, R.; Mohamed, S.A.; Shovan, B. Cyclone detection with end-to-end super resolution and faster R-CNN. Earth Sci. Inform. 2024, 17, 1837–1850.
  23. Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A review of object detection based on deep learning. Multimedia Tools Appl. 2020, 79, 23729–23791.
  24. Nawaz, S.A.; Li, J.; Bhatti, U.A.; Shoukat, M.U.; Ahmad, R.M. AI-based object detection latest trends in remote sensing, multimedia and agriculture applications. Front. Plant Sci. 2022, 13, 1041514.
  25. Fosić, I.; Žagar, D.; Grgić, K.; Križanović, V. Anomaly detection in NetFlow network traffic using supervised machine learning algorithms. J. Ind. Inf. Integr. 2023, 33, 100466.
  26. Zhu, H.; Wei, H.; Li, B.; Yuan, X.; Kehtarnavaz, N. A Review of Video Object Detection: Datasets, Metrics and Methods. Appl. Sci. 2020, 10, 7834.
  27. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2022, 132, 103812.
Figure 1. The workflow of cloud-edge collaboration.
Figure 2. The framework of cloud-edge collaborative model adaptation.
Figure 3. The algorithm flowchart.
Figure 4. Scenes 1, 2, and 3.
Figure 5. Scenes 4, 5, and 6.
Figure 6. The evolution of the reward function over episodes across the six experimental scenarios.
Table 1. Q-Network architecture.

Parameter | Description
Input Dimension | 3 ($S_c = (acc_c^{edge}, acc_c^{cloud}, \Delta_c)$)
Output Dimension | 2 ($a_c \in \{0, 1\}$)
Number of Hidden Layers | 2
Neurons Per Layer | 64
Activation Function | ReLU
Table 2. Reinforcement learning algorithm parameters.

Parameter | Value | Description
Learning Rate (lr) | 1 × 10⁻³ | Initial learning rate for the Adam optimizer
Discount Factor (γ) | 0.99 | Degree of consideration for future rewards
Initial ε Value | 1.0 | Initial exploration rate (fully random)
Minimum ε Value | 0.05 | Lower bound of ε to maintain some exploration
ε Decay Factor | 0.995 | ε is multiplied by this factor after each update
Replay Buffer Size | 10,000 | Stores past state-action transitions for replay
Batch Size | 64 | Number of samples per training iteration
Target Network Update Frequency | 50 | Synchronize target network parameters every 50 steps
Table 3. The overall architecture of the edge model (Faster R-CNN).

Module Name | Effect | Representative Shape
extractor | Feature extraction (ResNet) | Multi-level convolution and residual blocks
rpn | Region proposal | Candidate box classification and position regression
head.classifier | Feature encoding | Three bottleneck blocks
head.cls_loc.weight | Position regression head weight | (4 × 12, 2048)
head.cls_loc.bias | Position regression head bias | (4 × 12, 1)
head.score.weight | Class score output head weights | (12, 2048)
head.score.bias | Category score bias | (12, 1)
Table 4. Aircraft categories and quantities in the dataset.

Categories | Quantities
A220 | 10,420
A321 | 4159
A330 | 2502
A350 | 1613
ARJ21 | 319
Boeing737 | 6040
Boeing747 | 2658
Boeing777 | 2005
Boeing787 | 2523
C919 | 260
Other-airplane | 16,930
Table 5. Inference results of the reinforcement learning ablation experiment.

Category | Model Used | Precision | Recall | AP
A321 | Original Model | 70.30% | 60.68% | 53.66%
A321 | Without Reinforcement Learning | 67.44% | 55.77% | 47.17%
A330 | Original Model | 32.26% | 55.56% | 52.38%
A330 | Without Reinforcement Learning | 28.57% | 50.00% | 45.83%
Other-plane | Original Model | 44.12% | 57.69% | 37.85%
Other-plane | Without Reinforcement Learning | 41.43% | 55.77% | 34.63%
Table 6. Comparison of accuracy results for different fine-tuned components of the aircraft model on selected categories.

Migration Category | Migration Model Site | Precision | Recall | AP
A321 | Original model | 70.30% | 60.68% | 53.66%
A321 | head.score.bias | 70.30% | 60.68% | 52.71%
A321 | head.score.bias+head.score.weight | 71.00% | 60.68% | 53.71%
A321 | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 31.28% | 60.68% | 34.87%
A330 | Original model | 32.26% | 55.56% | 52.38%
A330 | head.score.bias | 32.22% | 55.56% | 52.38%
A330 | head.score.bias+head.score.weight | 39.62% | 55.56% | 22.02%
A330 | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 36.85% | 55.56% | 20.47%
Other-plane | Original model | 44.12% | 57.69% | 37.85%
Other-plane | head.score.bias | 44.11% | 57.69% | 37.85%
Other-plane | head.score.bias+head.score.weight | 32.94% | 59.40% | 30.26%
Other-plane | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 33.33% | 59.83% | 30.26%
Table 7. Inference results of different model adaptation methods.

Category | Model Used | Precision | Recall | AP
A321 | Original Model | 70.30% | 60.68% | 53.66%
A321 | Pruning | 67.44% | 55.77% | 48.43%
A321 | Distillation | 70.28% | 60.68% | 53.71%
A330 | Original Model | 32.26% | 55.56% | 52.38%
A330 | Pruning | 33.33% | 50.00% | 42.71%
A330 | Distillation | 32.22% | 55.56% | 52.38%
Other-plane | Original Model | 44.12% | 57.69% | 37.85%
Other-plane | Pruning | 43.70% | 56.73% | 37.61%
Other-plane | Distillation | 44.11% | 57.69% | 37.85%