Article

Cloud–Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction

1 Key Laboratory of Smart Manufacturing in Energy Chemical Process, Ministry of Education, East China University of Science and Technology, Shanghai 200237, China
2 Shanghai Aerospace Electronic Technology Institute, Shanghai 201109, China
3 Shanghai Key Laboratory of Collaborative Computing in Spatial Heterogeneous Networks (CCSN), Shanghai 201109, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2025, 15(15), 8335; https://doi.org/10.3390/app15158335
Submission received: 27 June 2025 / Revised: 19 July 2025 / Accepted: 21 July 2025 / Published: 26 July 2025

Featured Application

Cloud-Edge Collaborative Model Adaptation Based on Deep Q-Network and Transfer Feature Extraction

Abstract

With the rapid development of smart devices and the Internet of Things (IoT), the explosive growth of data has placed increasingly higher demands on real-time processing and intelligent decision making. Cloud-edge collaborative computing has emerged as a mainstream architecture to address these challenges. However, in sky-ground integrated systems, the limited computing capacity of edge devices and the inconsistency between cloud-side fusion results and edge-side detection outputs significantly undermine the reliability of edge inference. To overcome these issues, this paper proposes a cloud-edge collaborative model adaptation framework that integrates deep reinforcement learning via Deep Q-Networks (DQN) with local feature transfer. The framework enables category-level dynamic decision making, allowing for selective migration of classification head parameters to achieve on-demand adaptive optimization of the edge model and enhance consistency between cloud and edge results. Extensive experiments conducted on a large-scale multi-view remote sensing aircraft detection dataset demonstrate that the proposed method significantly improves cloud-edge consistency. The detection consistency rate reaches 90%, with some scenarios approaching 100%. Ablation studies further validate the necessity of the DQN-based decision strategy, which clearly outperforms static heuristics. In the model adaptation comparison, the proposed method improves the detection precision of the A321 category from 70.30% to 71.00% and the average precision (AP) from 53.66% to 53.71%. For the A330 category, the precision increases from 32.26% to 39.62%, indicating strong adaptability across different target types. This study offers a novel and effective solution for cloud-edge model adaptation under resource-constrained conditions, enhancing both the consistency of cloud-edge fusion and the robustness of edge-side intelligent inference.

1. Background

With the development of smart devices and Internet of Things (IoT) technologies, the massive generation of data has placed higher demands on real-time processing and intelligent analysis. The collaboration between cloud computing and edge computing has become the mainstream architecture: edge computing is responsible for real-time data processing at the device level near the data source, while cloud computing handles large-scale data analysis, storage, and processing in the resource-rich cloud environment [1]. Reasonably allocating tasks between the cloud and the edge to achieve cloud-edge collaborative decision making has therefore become a mainstream research focus in recent years. Cloud-edge collaborative decision making features distributed processing and low latency [2], making it widely applicable in fields like intelligent security, industrial quality inspection, autonomous driving, and smart transportation [3].
Although the cloud-edge collaborative architecture provides strong support for various intelligent application scenarios, the strict constraints on computing power, energy consumption, and storage of edge devices lead to reduced accuracy and robustness of models in tasks like object detection or classification [4]. To this end, a growing number of studies have focused on the fine-tuning of edge-side models (see [5,6]) by introducing an optimization mechanism equipped with intelligent perception and decision making capabilities. This enables the system to adaptively perform localized updates of the classification head based on the real-time state of edge devices, thereby improving the long-term stability and adaptability of edge-side models.
To address the need for local adaptation of edge models in classification tasks within object detection, reinforcement learning (RL) offers a feasible strategy-driven framework capable of learning optimal action policies through interaction with environmental feedback [7]. In particular, the Deep Q-Network (DQN) models the decision making process through a state-action-reward framework and can be employed to intelligently determine whether model transfer should be performed for a specific target class [8]. In addition, the feature extraction mechanism in transfer learning [9] can adjust the feature parameters of a specific category in the edge model based on discrepancies between detection results from the cloud and the edge. This enables precise and cost-effective model enhancement without compromising the stability of the edge-side backbone network. The integration of both approaches gives rise to a collaborative optimization framework that unifies cloud-edge policy decision making with parameter transfer: the DQN agent makes transfer decisions for specific categories based on the observed performance discrepancies between the cloud and edge environments, while the transfer mechanism achieves rapid adaptation to edge-side data by replacing the parameters of the classification head. This design not only enhances the robustness of the edge model in complex and dynamic scenarios but also facilitates efficient coordination of computational resources between the cloud and the edge.
In recent years, various approaches have been proposed to address model deployment and optimization in edge environments, including lightweight techniques such as model pruning, quantization, distillation, and parameter sharing. Pruning (see [10,11]) and quantization (see [12,13]) reduce model storage and improve inference efficiency by removing unimportant connections or neurons in neural networks or by converting high-precision floating-point values into low-precision integers. Distillation (see [14,15]) reduces model size while preserving performance by transferring knowledge from a large model to a smaller one. Parameter sharing (see [16,17]) reduces the number of parameters and improves computational efficiency by reusing identical weights across the network. Although these methods effectively reduce model size and computational overhead, they still require a trade-off between accuracy and efficiency in practical applications. Moreover, existing studies often focus on compressing or distilling the entire model, while few approaches address localized transfer and adaptive control specifically targeting the classification head. Meanwhile, in edge environments characterized by significant data heterogeneity, there remains a lack of suitable transfer methods equipped with intelligent decision making mechanisms. While reinforcement learning has been explored in areas like task scheduling and inference path selection [18], research on class-wise decision making for local model transfer using DQN is still in its early stages [19]. Ref. [20] employed a Deep Q-Network (DQN) to perform collaborative scheduling among edge devices within a Multi-Access Edge Computing (MEC) architecture, aiming to solve the job shop scheduling problem in smart factory processes. Ref. [21] proposed a Prioritized Action Sampling-based Dueling DQN (PASD) algorithm to determine task offloading and resource allocation strategies that minimize average task delay and total system energy consumption, thereby addressing the joint allocation of network and computing resources in cloud-edge collaborative Industrial Internet of Things (IIoT) environments. Such research remains particularly scarce within the context of cloud-edge collaboration. Therefore, there is an urgent need for a systematic approach that integrates reinforcement-learning-based decision making mechanisms with local transfer optimization strategies to enable adaptive edge model deployment.
This paper addresses two challenges of sky-ground integrated systems: the limited computing resources of edge devices and the inconsistency between cloud-side fusion results and edge-side detection results. It proposes a cloud-edge collaborative edge model adaptation method that integrates DQN-based decision making with transfer feature extraction, aiming to enhance the consistency between the edge model and the cloud fusion results and thereby ensure the robustness of the edge model. The main contributions of this paper include the following.
Model Adaptive Transfer Mechanism: A DQN agent is employed to perform class-level decision making for transfer, enabling model updates for specific target categories.
Dynamic Decision Making Mechanism: Unlike traditional static strategies, the proposed method can dynamically determine the scope of patch transfer in real time based on the current operational state of the edge device.
Cloud-Edge Collaborative Enhancement: A general transfer strategy for multi-class object detection is developed to improve the consistency between edge-side models and cloud-side fusion results.
The overall structure of the paper is organized as follows.
Section 2 (Preliminaries) introduces the problem setting and core assumptions, which lay the foundation for the subsequent model design; Section 3 (Model Framework) presents the proposed cloud-edge collaborative model adaptation framework, detailing both the decision making mechanism and the transfer strategy; Section 4 (Experiment) provides extensive experimental validation to evaluate the effectiveness and practicality of the proposed method; and Section 5 (Conclusion and Future Work) summarizes the research findings and outlines potential directions for future work.

2. Preliminaries

For edge object detection in multi-view satellite imagery scenarios, a cloud-edge collaborative model adaptation method driven by reinforcement learning is proposed. This method integrates a Faster Region-based Convolutional Neural Network (Faster R-CNN) [22], transfer learning-based feature extraction strategies, and a DQN-based decision mechanism to achieve dynamic optimization of the classification head in edge-side models.

2.1. Definition

At a given observation time $t$, static images of aircraft parked at an airport are captured from four perspectives $\theta \in \{0°, 90°, 180°, 270°\}$, and each image $I_t^{\theta}$ may contain multiple aircraft targets. Each image is fed into the edge-side Faster R-CNN (Faster Region-Based Convolutional Neural Network) object detection model $f_{edge}$ to obtain the category prediction set:
$$\tilde{Y}_t^{\theta} = f_{edge}(I_t^{\theta})$$
The prediction results from all perspectives are aggregated in the cloud, and the global prediction result is obtained through the cloud fusion algorithm $f_{fusion}$:
$$\tilde{Y}_t^{cloud} = f_{fusion}\left(\{\tilde{Y}_t^{\theta}\}_{\theta \in \{0°, 90°, 180°, 270°\}}\right)$$
The model detects 11 types of aircraft, defined as the category set:
$$C = \{\mathrm{A220}, \mathrm{A321}, \mathrm{A330}, \mathrm{A350}, \mathrm{ARJ21}, \mathrm{Boeing737}, \mathrm{Boeing747}, \mathrm{Boeing777}, \mathrm{Boeing787}, \mathrm{C919}, \mathrm{other\text{-}airplane}\}$$
The cloud-side and edge-side detection results at this moment are recorded in the result.xlsx file and serve as the state input for reinforcement learning.

2.2. Problem Statement

In collaborative decision making for cloud-edge object detection, edge-side detection results are expected to closely approximate cloud-side fusion outcomes under limited computational resources. Given the high cost and time-consuming nature of adjusting the entire model on the edge side, frequent full-model updates are impractical. Therefore, this approach introduces reinforcement-learning-based decision and policy mechanisms along with a localized classification head transfer strategy. It dynamically determines whether to execute transfer based on the state of each category and selectively replaces corresponding classification weights, thereby enhancing edge model performance at a low cost.

3. Model Framework

Figure 1 illustrates the workflow of cloud-edge collaboration. The edge-side model first performs object detection on the target image using a pre-trained edge model and uploads the detection results to the cloud. The cloud then fuses multi-view detection results from the edge. In response to inconsistencies between cloud-side and edge-side detection results, the edge employs a reinforcement-learning-based decision mechanism to determine whether to conduct localized classification head transfer training for specific categories. This enables low-overhead, highly adaptive model self-updating, ultimately forming a closed-loop cloud-edge collaborative system that integrates edge detection, cloud fusion, and policy optimization.
To address the inconsistency between edge-side detection results and cloud-side fusion outcomes, a cloud-edge collaborative model adaptation method is proposed, which integrates DQN-based decision making with transfer feature extraction. The detailed model architecture is illustrated in Figure 2.

3.1. Reinforcement Learning Decision and Strategy Mechanism

To achieve adaptive optimization of edge detection models in multi-view target detection tasks, a category migration decision mechanism based on reinforcement learning is designed, with a Deep Q-Network (DQN) as the core decision model. A state-driven approach determines whether local migration is performed for a given category, thereby achieving adaptation of the edge detection model under limited resources.

3.1.1. State Vector Construction

For each target category $c \in C$, a state vector $S_c$ is constructed from the following three components:
$$S_c = (acc_c^{edge},\ acc_c^{cloud},\ \Delta_c)$$
$acc_c^{edge}$: the detection accuracy of the edge-side model for category $c$, calculated as $acc_c^{edge} = N_{total,c}^{edge} / N_{sensors}$.
$acc_c^{cloud}$: the detection accuracy of the cloud-side fusion model for category $c$, calculated as $acc_c^{cloud} = N_{total,c}^{cloud} / N_{sensors}$.
$\Delta_c = acc_c^{cloud} - acc_c^{edge}$: the difference between the cloud-side and edge-side accuracies, used as the immediate reward signal in reinforcement learning. The reward $r_c = \Delta_c$ reflects the accuracy improvement brought by the migration operation at the edge and serves as the core feedback signal for Q-network optimization.
$N_{sensors}$: the total number of sensors (satellites).
$N_{total,c}^{edge}$: the number of sensors (satellites) for which the edge model classifies a real aircraft as category $c$.
$N_{total,c}^{cloud}$: the number of sensors (satellites) for which the cloud fusion result classifies a real aircraft as category $c$.
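As a minimal illustration of how this state vector can be assembled from the per-category detection counts (the function and variable names below are ours, not from the paper):

```python
import numpy as np

def build_state(n_edge_c: int, n_cloud_c: int, n_sensors: int) -> np.ndarray:
    """Assemble the per-category state S_c = (acc_edge, acc_cloud, delta)."""
    acc_edge = n_edge_c / n_sensors    # edge-side accuracy for category c
    acc_cloud = n_cloud_c / n_sensors  # cloud-side fusion accuracy for category c
    delta = acc_cloud - acc_edge       # accuracy gap, also the immediate reward
    return np.array([acc_edge, acc_cloud, delta], dtype=np.float32)
```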

3.1.2. Reinforcement Learning Strategy

A Deep Q-Network (DQN) is used as the migration strategy agent. For the state $S_c$ of each category, the DQN outputs an action $a_c \in \{0, 1\}$, where 1 means executing migration and 0 means skipping it. Action selection adopts an ε-greedy policy: the agent selects a random action (exploration) with probability ε and the action with the highest Q-value under the current Q-network (exploitation) with probability 1 − ε. The value of ε gradually decays during training, allowing the strategy to shift from early-stage exploration to convergence towards the optimal policy. Experience tuples $(s_c, a_c, r_c, s_c')$ are stored in a replay buffer and replayed during training, with the goal of minimizing the Bellman residual:
$$L_Q = \left(Q(s_c, a_c) - \left[r_c + \gamma \max_{a'} Q(s_c', a')\right]\right)^2$$
where $\gamma$ is the discount factor used to balance long-term and short-term benefits. The training goal is to learn the optimal Q-function and the policy it induces:
$$\pi^*(s) = \arg\max_a Q(s, a)$$
This mechanism can dynamically adjust the migration strategy of the edge model according to the performance feedback of the cloud and the edge in each iteration.
The structure of the Q-network is shown in Table 1.
The reinforcement learning algorithm parameters are shown in Table 2.
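A minimal PyTorch sketch consistent with Tables 1 and 2 is given below; the class and function names are ours, and the network is the 3-input, two-64-unit-hidden-layer, 2-output MLP of Table 1:

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q-network from Table 1: 3 state inputs -> 64 -> 64 (ReLU) -> 2 action values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 2),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def select_action(q_net: QNet, state: torch.Tensor, eps: float) -> int:
    """Epsilon-greedy: explore with probability eps, otherwise exploit argmax Q."""
    if random.random() < eps:
        return random.randint(0, 1)  # 1 = execute migration, 0 = skip
    with torch.no_grad():
        return int(q_net(state).argmax().item())

def bellman_loss(q_net: QNet, target_net: QNet, batch, gamma: float = 0.99) -> torch.Tensor:
    """Squared Bellman residual L_Q over a replayed mini-batch (s, a, r, s')."""
    s, a, r, s_next = batch  # tensors of shape (B, 3), (B,), (B,), (B, 3)
    q_sa = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values
    return ((q_sa - target) ** 2).mean()
```

The separate target network reflects the target-network update frequency listed in Table 2.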

3.1.3. Reward Design and Decision-Making Mechanism

The reward function is used to measure the edge-side accuracy improvement after the migration operation, and it is defined as
$$r_c = acc_c^{cloud} - acc_c^{edge}$$
When a category $c$ is selected for migration (action = 1), transfer training is performed on that category, and the updated edge-side and cloud-side detection accuracies are calculated. If the accuracy difference is less than the set threshold θ = 0.1, the migration of this category is considered effective and recorded as a positive experience. The category then enters a "frozen" state, and migration is not repeated in subsequent training, avoiding wasted resources. If action = 0, migration of this category is skipped, and the reward value of the previous round is retained in the current round to maintain the continuity of the strategy.
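A sketch of this reward-and-freeze logic under the rules above (the names FREEZE_THRESHOLD, frozen, and step_for_category are illustrative, not from the paper):

```python
FREEZE_THRESHOLD = 0.1    # theta in the text
frozen: set[str] = set()  # categories whose migration has converged

def step_for_category(c: str, action: int, acc_cloud: float, acc_edge: float,
                      prev_reward: float) -> float:
    """Apply the reward and freezing rules of Section 3.1.3 for one category."""
    if c in frozen:
        return 0.0                     # converged classes are never migrated again
    if action == 1:
        reward = acc_cloud - acc_edge  # accuracy gap measured after transfer training
        if abs(reward) < FREEZE_THRESHOLD:
            frozen.add(c)              # migration judged effective; freeze this class
        return reward
    return prev_reward                 # action == 0: carry over last round's reward
```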

3.2. Local Transfer Mechanism of Edge Model

3.2.1. Local Classification Head Feature Transfer

In order to improve the detection capability for specific categories while preserving the integrity and stability of the edge detection model's backbone, this paper designs a migration mechanism based on local classification head replacement. Specifically, for a given category $c \in C$, only the weight vector $W_c$ and the bias term $b_c$ are trained; the rest of the network is frozen. The training goal is to minimize the classification loss and bounding box regression loss of the model on the current category. The optimization objective function is
$$L_c = L_{cls} + \beta L_{reg}$$
where $L_{cls} = CE(f_{\theta_c}(x), y_c^{cloud})$ [23] is the cross-entropy loss between the classification head's prediction and the cloud-side fusion label, $L_{reg}$ [24] is the smooth L1 loss on the predicted bounding box position, used to constrain the position offset, and $\beta$ is the loss weight coefficient.
The training process uses mini-batch stochastic gradient descent (SGD) [25], and training continues only while the classification loss decreases significantly. If the loss of multiple consecutive batches falls below the threshold of 0.001, the system automatically triggers early termination to avoid overfitting. After training is completed, the optimized parameters are extracted and saved as a category patch in a .pth file: $patch_c = (W_c^{new}, b_c^{new})$. The patch is integrated into the base edge model once it is later determined that migration is needed, improving the detection performance of that category.
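The sketch below illustrates this localized update, assuming a Faster R-CNN implementation whose class-score layer is exposed as model.head.score (as in Table 3); the training-loop details, the (cls_loss, reg_loss) return convention of the forward pass, and all names other than head.score are our assumptions:

```python
import torch

def train_class_patch(model, loader, class_idx: int, lr: float = 1e-3, beta: float = 1.0,
                      loss_floor: float = 1e-3, patch_path: str = "patch.pth") -> None:
    """Train only row `class_idx` of the score head; everything else stays frozen."""
    for p in model.parameters():
        p.requires_grad_(False)
    score = model.head.score  # per Table 3: a (12, 2048) linear layer plus bias
    score.weight.requires_grad_(True)
    score.bias.requires_grad_(True)
    opt = torch.optim.SGD([score.weight, score.bias], lr=lr)

    for images, targets in loader:
        cls_loss, reg_loss = model(images, targets)  # assumed loss-returning forward
        loss = cls_loss + beta * reg_loss            # L_c = L_cls + beta * L_reg
        opt.zero_grad()
        loss.backward()
        # Zero the gradients of every class row except the one being adapted.
        mask = torch.zeros_like(score.weight.grad)
        mask[class_idx] = 1.0
        score.weight.grad *= mask
        score.bias.grad *= mask[:, 0]
        opt.step()
        if loss.item() < loss_floor:  # simplified form of the early-termination rule
            break

    torch.save({"weight": score.weight[class_idx].detach().clone(),
                "bias": score.bias[class_idx].detach().clone()}, patch_path)
```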

3.2.2. Patch Fusion and Model Update

After the local classification head feature migration is completed, the classification head holds score weights for $C + 1$ indices (the extra index accounting for the background). The trained score weight $W_c \in \mathbb{R}^{2048}$ and bias $b_c \in \mathbb{R}$ are fused into the original model to achieve a local model update: $W_c^{base}$ is replaced by $W_c^{patch}$ and $b_c^{base}$ is replaced by $b_c^{patch}$, where $W_c^{base}$ and $b_c^{base}$ are the corresponding parameters of the model before replacement. The edge model adopts the Faster R-CNN framework with a ResNet backbone for feature extraction. The overall architecture is shown in Table 3.
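A corresponding sketch of the patch fusion step, under the same model.head.score assumption (the function name and file path are illustrative):

```python
import torch

def apply_patch(model, class_idx: int, patch_path: str = "patch.pth") -> None:
    """Overwrite one row of the score head with the trained category patch."""
    patch = torch.load(patch_path)
    with torch.no_grad():
        model.head.score.weight[class_idx] = patch["weight"]  # W_c_base <- W_c_patch
        model.head.score.bias[class_idx] = patch["bias"]      # b_c_base <- b_c_patch
```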

3.2.3. Edge Inference and State Update

After the local migration patch is fused, the updated model performs inference on the edge-side four-view satellite images (0°, 90°, 180°, 270°), and the results are sent to the cloud again for fusion. The post-migration $acc_c^{edge}$ and $acc_c^{cloud}$ are then updated as the state input for the next round of the reinforcement learning module.

3.3. The Algorithm Flowchart and Its Corresponding Pseudocode

The algorithm flowchart is shown in Figure 3, and the pseudocode of the algorithm is shown in Algorithm 1.
Algorithm 1. Adaptive Cloud-Edge Collaborative Model Optimization Algorithm
Initialize DQN agent;
Load initial detection results from result.xlsx;
For each class c in the category set do:
    Step 1: Construct the state vector for reinforcement learning
        Compute edge-side detection accuracy acc_edge_c;
        Compute cloud-side fusion accuracy acc_cloud_c;
        Compute accuracy gap Δc = acc_cloud_c − acc_edge_c;
        Form state vector S_c = (acc_edge_c, acc_cloud_c, Δc);
    Step 2: Input the state into the DQN agent and obtain an action
        action_c = DQN.predict(S_c);
        If action_c == 1:
            Load class-specific training samples (labeled XML);
            Freeze all parameters of Faster R-CNN except the classification head of class c;
            Compute L_c and perform gradient updates only on:
                head.score.weight[c];
                head.score.bias[c];
            Save the updated parameters as a class patch (.pth file);
            Merge the patch into the base edge model;
            Step 3: Perform edge inference and update the state
                Run inference on the multi-view (4-angle) satellite images;
                Update result.xlsx with the new edge-side predictions;
                Recalculate acc_edge_c;
        Else:
            Skip the parameter update and retain the previous model state;
    Step 4: Reinforcement learning update
        Compute reward r_c = |acc_cloud_c − acc_edge_c|;
        Store the experience tuple (S_c, action_c, r_c, S′_c) in the replay buffer;
        Update the DQN by minimizing the Bellman loss L_Q on sampled experiences;
    Step 5: Check the termination condition
        If |acc_cloud_c − acc_edge_c| < threshold:
            Mark class c as "converged" and skip further transfer for this class;
        Else:
            Continue iterative transfer learning for class c;
End For
Output: Updated edge model and final result.xlsx with optimized edge-side detection results

4. Experiment

4.1. Dataset Description

This study utilizes a publicly available aircraft dataset tailored for large-scale and complex scenarios, comprising 1479 original images. To simulate satellite imagery from multiple viewing angles, each image is rotated to four orientations (0°, 90°, 180°, and 270°), resulting in a total of 5916 images. Specifically, the 0° images are from the publicly available dataset, while the other angles are generated by rotating the 0° images. The dataset has a spatial resolution of 1 m and includes 11 aircraft categories: A220, A321, A330, A350, ARJ21, Boeing 737, Boeing 747, Boeing 777, Boeing 787, C919, and other-airplane.
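For reference, the four-view simulation can be reproduced with a few lines of Python (the function name and file handling are illustrative; PIL's rotate with expand=True keeps the full rotated frame):

```python
from PIL import Image

def make_views(path: str) -> dict[int, Image.Image]:
    """Simulate the four viewing angles by rotating the original 0-degree image."""
    base = Image.open(path)
    return {angle: base.rotate(angle, expand=True) for angle in (0, 90, 180, 270)}
```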
The images were primarily collected from three major civilian airports: Shanghai Hongqiao International Airport, Beijing Capital International Airport, and Taiwan Taoyuan International Airport. The dataset features a variety of image sizes to reflect diverse real-world conditions.
For edge-side model training, the dataset is split into 81% for training, 9% for validation, and 10% for testing. The number of samples per aircraft category is detailed in Table 4.

4.2. Experimental Environment

The model proposed in this study is implemented in Python (version 3.12.7, packaged by Anaconda, Inc., released on 4 October 2024). The fusion module is based on the Faster R-CNN object detection framework, and training and inference are carried out in PyTorch, using the PyCharm 2024.3 IDE (Community Edition, Build #PC-243.21565.199, built on 13 November 2024). Edge deployment inference runs on an NVIDIA GeForce RTX 4070 GPU and an SM7 AI acceleration chip. The backbone network of the edge object detection model is ResNet-50. The edge model freezes the backbone and trains only the parameters of the category score output head (score.weight and score.bias) to achieve local migration optimization.

4.3. Consistency Verification of Cloud-Edge Collaboration Implemented Through Reinforcement Learning

For the detection results of different scenes, the consistency between the edge-side detection results for a given aircraft category and the cloud-side fusion results is updated iteratively during DQN training. Figures 4 and 5 show randomly selected 0° images from six scenes containing 6, 8, 12, 14, 22, and 33 aircraft, respectively: Scenes 1, 2, and 3 correspond to the images in Figure 4, while Scenes 4, 5, and 6 correspond to the images in Figure 5.
As the number of DQN iterations increases for each type of scenario, the reward function changes, as shown in Figure 6.
The trends of the curves indicate that the DQN agent is able to quickly learn effective strategies, leading to the optimization of model performance. A detailed analysis is provided below.
(1) Overall Characteristics of the Convergence Trend
The reward curves in all six scenarios show a declining trend followed by stabilization, indicating that the DQN is able to converge effectively in these environments. In the initial stage, the reward values are relatively high, reflecting significant discrepancies between the edge model and the cloud fusion results before optimization. After three to five training iterations, the rewards in most scenarios approach zero, indicating that the edge model, under the guidance of the DQN, has closely approached the detection performance of the cloud model.
(2) Observations of Inter-Scenario Variability
In Scenarios 1 and 2 (with fewer aircraft), the reward function drops rapidly with slight fluctuations, and policy learning is nearly complete by the second iteration. This indicates that in simpler scenarios with less variability, the DQN policy converges faster, and the transfer strategy is easier to determine. In Scenarios 3 and 4 (with medium target density), the reward curves exhibit larger fluctuations but still converge after the fifth iteration. This suggests that in scenarios with moderate target density, the stability of policy learning remains satisfactory. In Scenarios 5 and 6 (high-density, crowded airports), the initial reward values are the highest, and the convergence process is slightly slower. This indicates that in complex scenarios, the discrepancy between the edge model and the cloud model is more pronounced, requiring the DQN to perform more interactions to learn an appropriate transfer strategy. However, convergence is eventually achieved, demonstrating the method’s adaptability in complex environments.
(3) Final Converged Value
According to the figure, in most scenarios, after 10 episodes, the consistency between the cloud-side and edge-side results reaches 100%, while in some scenarios it approaches 100%. This phenomenon may be attributed to the exploration-exploitation trade-off in the DQN learning strategy. In reinforcement learning, the ε-greedy strategy typically retains a certain probability of exploration, which introduces slight fluctuations or non-zero rewards during the training process. This phenomenon reflects the robustness of the strategy—it maintains the effectiveness of the migration while preserving the generalization ability of the policy.

4.4. Ablation and Baseline Comparison Experiments

Edge-side object detection is evaluated per category using precision, recall, and the average precision (AP), i.e., the integral of the precision-recall curve [26]:
$$Precision = \frac{TP}{TP + FP}$$
$$Recall = \frac{TP}{TP + FN}$$
where TP denotes true positives, FP false positives, and FN false negatives. A detection counts as a true positive when the IoU (Intersection over Union) between the detection box and the ground-truth box exceeds the MINOVERLAP threshold of 0.5.
$$AP = \int_0^1 precision(r)\, dr$$
AP interpolates and averages the precision of each class at different recall values. The closer precision is to 1, the fewer the false positives; the closer recall is to 1, the fewer the missed detections; and the closer AP is to 1, the better the model's overall prediction performance [27].
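As a concrete reference for the metric computation, the following is a standard all-point interpolated AP (written by us, not taken from the paper's code):

```python
import numpy as np

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """All-point interpolated AP: area under the precision-recall curve."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(p) - 2, -1, -1):  # make precision monotonically non-increasing
        p[i] = max(p[i], p[i + 1])
    idx = np.where(r[1:] != r[:-1])[0]   # points where recall changes
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))
```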
The ablation study and baseline comparison experiments evaluate each aircraft type using precision, recall, and average precision (AP). Additionally, three aircraft types—A321, A330, and other-plane—are randomly selected for inference result recording under Scenario 2.

4.4.1. Analysis of the Impact on Performance After Removing the Reinforcement Learning Mechanism

In the absence of the reinforcement learning mechanism, the triggering of category transfer relies on manual configuration. This experiment aims to verify the effectiveness of introducing a reinforcement learning strategy in improving the model’s adaptability. Table 5 presents the inference performance comparison of different types of aircraft targets with and without the reinforcement learning mechanism.
The experimental results validate the effectiveness of introducing the reinforcement learning mechanism. As a migration trigger strategy, reinforcement learning plays a positive role in the detection of different target categories, improving the model’s adaptability to uncertain targets and enhancing detection accuracy to a certain extent. After removing this mechanism, the model loses its ability to automatically adjust inter-class migration strategies, resulting in a slight decline in inference performance.

4.4.2. Comparison of Different Feature Migration Mechanisms

For the edge-side migrated model, different feature migration configurations were designed by adjusting head.score.bias, head.score.bias+head.score.weight, and head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias. The performance of each configuration was compared with that of the original model. The experimental results are shown in Table 6.
As shown in the table, transferring the edge-side model’s classification head components head.score.bias+head.score.weight can improve the original model’s precision, recall, and average precision (AP) to varying degrees. For the A321 category, this transfer strategy achieves stable performance, with precision reaching 71.00% and AP reaching 53.71%, both exceeding the original model. For the A330 category, the precision reaches 39.62%, while in the other-plane category, the recall under this configuration reaches 59.40%, outperforming both the original model and single-parameter transfer. This indicates that it effectively balances recall and false positives. In contrast, transferring the entire classification and regression heads leads to performance degradation, suggesting that excessive transfer may introduce noise and compromise the model’s generalization ability.

4.4.3. Comparison Between the Proposed Migration Mechanism and Mainstream Model Adaptation Methods

Mainstream model adaptation methods mainly include strategies like model pruning, knowledge distillation, and lightweight structural design. To verify the effectiveness of the proposed transfer feature extraction mechanism, two representative baseline methods were selected for comparison: structured pruning and soft label distillation. The inference performance of these methods was evaluated against the proposed method on the edge model.
In the pruning part, a structured channel pruning strategy was adopted. Specifically, pruning operations were performed on the conv1 and conv2 layers within the head.classifier module of the edge model to reduce redundant network parameters and improve the model’s inference efficiency on edge devices.
In the distillation part, the classical soft label distillation method was adopted. This strategy fixes the teacher model (the pre-trained edge model weights) and compares the logits bias outputs of the student model under the same input, achieving parameter-level consistency learning. By aligning the outputs of head.score.bias[class_id] between the teacher and student models and minimizing the Mean Squared Error (MSE), local parameter knowledge transfer and learning for the target category are achieved.
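A minimal sketch of this baseline, assuming the same head.score layout as before (the function name, step count, and learning rate are illustrative):

```python
import torch
import torch.nn.functional as F

def distill_bias(teacher, student, class_idx: int, lr: float = 1e-3, steps: int = 100) -> None:
    """Align the student's per-class score bias with the frozen teacher's via MSE."""
    target = teacher.head.score.bias[class_idx].detach()  # teacher stays fixed
    student.head.score.bias.requires_grad_(True)
    opt = torch.optim.SGD([student.head.score.bias], lr=lr)
    for _ in range(steps):
        loss = F.mse_loss(student.head.score.bias[class_idx], target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```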
The inference results are shown in Table 7.
As shown in the table, compared with the original model, the precision, recall, and AP metrics for each category exhibit slight fluctuations after applying the pruning strategy. However, the overall differences are not significant, indicating that the model still maintains strong representation capability even with reduced parameter volume. In addition, the soft label distillation strategy demonstrates performance that is almost identical to, or slightly better than, the original model across all categories. This indicates that effective knowledge transfer can be achieved by aligning and optimizing head.score.bias without modifying the model’s structure. In summary, the proposed transfer feature extraction mechanism maintains good robustness and stability compared with mainstream adaptation strategies, such as structured pruning and soft label distillation, further validating its feasibility and practical value in edge scenarios.

5. Conclusions and Future Work

This study proposes a cloud-edge collaborative model adaptation method that combines DQN-based decision making with transfer feature extraction, aiming to enhance the consistency between edge-side inference results and cloud-side fusion results while ensuring the robustness of the edge model. The experiments validate the feasibility and effectiveness of the DQN-based local migration strategy in achieving this consistency, while also maintaining the robustness of the edge detection model to a certain extent.
However, due to the inherent limitations of the edge model’s detection accuracy, the current feature migration strategy still struggles to significantly overcome performance bottlenecks in category classification tasks. In addition, the proposed method still has certain limitations in handling edge intelligence tasks, such as few-shot learning, new category adaptation, and long-term online updates. On the one hand, the local migration strategy relies on sufficient training samples of existing categories, making it less adaptable to new categories or scenarios with scarce data. On the other hand, as edge devices operate over extended periods, the number of target categories will continuously expand. Therefore, the model needs to possess continuous learning capabilities to prevent previously learned knowledge from being overwritten by new tasks, thus avoiding catastrophic forgetting.
Based on the above issues, future research will focus on the following three directions to further improve the cloud-edge collaborative optimization framework.
(1) Enhancing the Detection Accuracy of the Edge Model Itself
In the future, the detection capability of the edge model will be comprehensively improved by enhancing the diversity of training data, such as expanding data sampling and applying data augmentation strategies, and by optimizing the model’s architecture, for example, by introducing lightweight and efficient feature extraction modules and multi-scale detection heads. These enhancements will strengthen the model’s perception accuracy from the source, providing a more solid performance foundation for subsequent migration and updates.
(2) Introducing Meta-Learning Mechanisms to Enhance Few-Shot Adaptation Capability
The current strategy mainly focuses on migration optimization for known categories, lacking the ability to quickly adapt to new classes. Future research will introduce a meta-learning framework, such as MAML or Prototypical Networks, for pre-training on the cloud side. This will enable the edge model to rapidly learn from a small number of samples, effectively addressing few-shot scenarios, such as the emergence of new aircraft types in real-world applications, and enhancing the model’s generalization capability.
(3) Integrating Incremental Learning Strategies to Achieve Continuous Learning and Knowledge Retention
To address the practical need for continuously expanding target categories in edge scenarios, it is necessary to construct an update mechanism with knowledge retention. Future work will explore class-incremental learning methods, such as Elastic Weight Consolidation (EWC), Learning without Forgetting (LwF), and replay buffer strategies, to enable the model to retain the recognition ability of previously learned categories when incorporating new ones. This will help prevent catastrophic forgetting and support the long-term stable operation and knowledge-compatible learning of edge intelligence systems.

Author Contributions

Conceptualization, S.T.; Methodology, J.C., X.C. and Y.J.; Validation, J.C.; Formal analysis, S.T.; Investigation, Y.J.; Writing—original draft, J.C.; Visualization, Y.J.; Supervision, X.C. and S.T.; Funding acquisition, X.C. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research is sponsored by the National Natural Science Foundation of China (62273147) and the Foundation of Shanghai Key Laboratory of Collaborative Computing in Spatial Heterogeneous Networks (CCSN-2025-07).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shan, C.; Gao, R.; Han, Q.; Liu, T.; Yang, Z.; Zhang, J.; Xia, Y. KCES: A Workflow Containerization Scheduling Scheme Under Cloud-Edge Collaboration Framework. IEEE Internet Things J. 2024, 12, 2026–2042.
  2. Lin, W.; Zhu, M.; Zhou, X.; Zhang, R.; Zhao, X.; Shen, S.; Sun, L. A Deep Neural Collaborative Filtering Based Service Recommendation Method with Multi-Source Data for Smart Cloud-Edge Collaboration Applications. Tsinghua Sci. Technol. 2024, 29, 897–910.
  3. Guo, L.; He, Y.; Wan, C.; Li, Y.; Luo, L. From cloud manufacturing to cloud-edge collaborative manufacturing. Robot. Comput. Manuf. 2024, 90, 102790.
  4. Xu, P.; Wang, K.; Hassan, M.M.; Chen, C.-M.; Lin, W.; Hassan, R.; Fortino, G. Adversarial Robustness in Graph-Based Neural Architecture Search for Edge AI Transportation Systems. IEEE Trans. Intell. Transp. Syst. 2022, 24, 8465–8474.
  5. Feng, L.; Yang, Y.; Tan, M.; Zeng, T.; Tang, H.; Li, Z.; Niu, Z.; Feng, F. Adaptive multi-source domain collaborative fine-tuning for transfer learning. PeerJ Comput. Sci. 2024, 10, e2107.
  6. Cao, Z.; Kwon, M.; Sadigh, D. Transfer Reinforcement Learning Across Homotopy Classes. IEEE Robot. Autom. Lett. 2021, 6, 2706–2713.
  7. Wang, Y.; Liu, H.; Zheng, W.; Xia, Y.; Li, Y.; Chen, P.; Guo, K.; Xie, H. Multi-Objective Workflow Scheduling with Deep-Q-Network-Based Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 39974–39982.
  8. Zhong, H.; Yu, S.; Trinh, H.; Lv, Y.; Yuan, R.; Wang, Y. Fine-tuning transfer learning based on DCGAN integrated with self-attention and spectral normalization for bearing fault diagnosis. Measurement 2023, 210, 112421.
  9. Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.-H.; Leung, K.K.; Tassiulas, L. Model Pruning Enables Efficient Federated Learning on Edge Devices. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10374–10386.
  10. Wang, C.-H.; Huang, K.-Y.; Yao, Y.; Chen, J.-C.; Shuai, H.-H.; Cheng, W.-H. Lightweight Deep Learning: An Overview. IEEE Consum. Electron. Mag. 2022, 13, 51–64.
  11. Li, Y.; Zhang, S.; Wang, W.-Q. A Lightweight Faster R-CNN for Ship Detection in SAR Images. IEEE Geosci. Remote Sens. Lett. 2020, 19, 4006105.
  12. Mei, S.; Chen, X.; Zhang, Y.; Li, J.; Plaza, A. Accelerating Convolutional Neural Network-Based Hyperspectral Image Classification by Step Activation Quantization. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5502012.
  13. Gou, J.; Yu, B.; Maybank, S.J.; Tao, D. Knowledge Distillation: A Survey. Int. J. Comput. Vis. 2021, 129, 1789–1819.
  14. Gou, J.; Sun, L.; Yu, B.; Wan, S.; Tao, D. Hierarchical Multi-Attention Transfer for Knowledge Distillation. ACM Trans. Multimedia Comput. Commun. Appl. 2023, 20, 1–20.
  15. Zhao, C.; Wang, S.; Li, D.; Liu, X.; Yang, X.; Liu, J. Cross-domain sentiment classification via parameter transferring and attention sharing mechanism. Inf. Sci. 2021, 578, 281–296.
  16. Xuan, S.; Zheng, L.; Chung, I.; Wang, W.; Man, D.; Du, X.; Yang, W.; Guizani, M. An incentive mechanism for data sharing based on blockchain with smart contracts. Comput. Electr. Eng. 2020, 83, 106587.
  17. Miao, Q.; Lin, H.; Wang, X.; Hassan, M.M. Federated deep reinforcement learning based secure data sharing for Internet of Things. Comput. Netw. 2021, 197, 108327.
  18. Taghipour, S.; Namoura, H.A.; Sharifi, M.; Ghaleb, M. Real-time production scheduling using a deep reinforcement learning-based multi-agent approach. INFOR Inf. Syst. Oper. Res. 2024, 62, 186–210.
  19. Liang, Z.; Yang, R.; Wang, J.; Liu, L.; Ma, X.; Zhu, Z. Dynamic constrained evolutionary optimization based on deep Q-network. Expert Syst. Appl. 2024, 249, 123592.
  20. Lee, S.; Choo, H.; Ismail, R. Smart Manufacturing Scheduling System: DQN based on Cooperative Edge Computing. In Proceedings of the 15th International Conference on Ubiquitous Information Management and Communication (IMCOM 2021), Seoul, Republic of Korea, 4–6 January 2021.
  21. Qin, W.; Chen, H.; Wang, L. PASD: A Prioritized Action Sampling-Based Dueling DQN for Cloud-Edge Collaborative Computation Offloading in Industrial IoT. In China Conference on Wireless Sensor Networks; Springer Nature: Singapore, 2022; pp. 19–30.
  22. Moustafa, M.S.; Metwalli, M.R.; Samshitha, R.; Mohamed, S.A.; Shovan, B. Cyclone detection with end-to-end super resolution and faster R-CNN. Earth Sci. Inform. 2024, 17, 1837–1850.
  23. Xiao, Y.; Tian, Z.; Yu, J.; Zhang, Y.; Liu, S.; Du, S.; Lan, X. A review of object detection based on deep learning. Multimedia Tools Appl. 2020, 79, 23729–23791.
  24. Nawaz, S.A.; Li, J.; Bhatti, U.A.; Shoukat, M.U.; Ahmad, R.M. AI-based object detection latest trends in remote sensing, multimedia and agriculture applications. Front. Plant Sci. 2022, 13, 1041514.
  25. Fosić, I.; Žagar, D.; Grgić, K.; Križanović, V. Anomaly detection in NetFlow network traffic using supervised machine learning algorithms. J. Ind. Inf. Integr. 2023, 33, 100466.
  26. Zhu, H.; Wei, H.; Li, B.; Yuan, X.; Kehtarnavaz, N. A Review of Video Object Detection: Datasets, Metrics and Methods. Appl. Sci. 2020, 10, 7834.
  27. Kaur, R.; Singh, S. A comprehensive review of object detection with deep learning. Digit. Signal Process. 2022, 132, 103812.
Figure 1. The workflow of cloud-edge collaboration.
Figure 2. The framework of cloud-edge collaborative model adaptation.
Figure 3. The algorithm flowchart.
Figure 4. Scenes 1, 2, and 3.
Figure 5. Scenes 4, 5, and 6.
Figure 6. The evolution of the reward function over episodes across the six experimental scenarios.
Table 1. Q-Network architecture.

Parameter | Description
Input Dimension | 3 ($S_c = (acc_c^{edge}, acc_c^{cloud}, \Delta_c)$)
Output Dimension | 2 ($a_c \in \{0, 1\}$)
Number of Hidden Layers | 2
Neurons Per Layer | 64
Activation Function | ReLU
Table 2. Reinforcement learning algorithm parameters.

Parameter | Value | Description
Learning Rate (lr) | 1 × 10⁻³ | Initial learning rate for the Adam optimizer
Discount Factor (γ) | 0.99 | Degree of consideration for future rewards
Initial ε Value | 1.0 | Initial exploration rate (fully random)
Minimum ε Value | 0.05 | Lower bound of ε to maintain some exploration
ε Decay Factor | 0.995 | ε is multiplied by this factor after each update
Replay Buffer Size | 10,000 | Stores past state-action transitions for replay
Batch Size | 64 | Number of samples per training iteration
Target Network Update Frequency | 50 | Synchronize target network parameters every 50 steps
Table 3. The overall architecture of the edge model (Faster R-CNN).

Module Name | Effect | Representative Shape
extractor | Feature extraction (ResNet) | Multi-level convolution and residual blocks
rpn | Region proposal | Candidate box classification and position regression
head.classifier | Feature encoding | Three bottleneck blocks
head.cls_loc.weight | Position regression head weight | (4 × 12, 2048)
head.cls_loc.bias | Position regression head bias | (4 × 12, 1)
head.score.weight | Class score output head weights | (12, 2048)
head.score.bias | Category score bias | (12, 1)
Table 4. Aircraft categories and quantities in the dataset.

Categories | Quantities
A220 | 10,420
A321 | 4159
A330 | 2502
A350 | 1613
ARJ21 | 319
Boeing737 | 6040
Boeing747 | 2658
Boeing777 | 2005
Boeing787 | 2523
C919 | 260
Other-airplane | 16,930
Table 5. Inference results of the reinforcement learning ablation experiment.

Category | Model Used | Precision | Recall | AP
A321 | Original Model | 70.30% | 60.68% | 53.66%
A321 | Without Reinforcement Learning | 67.44% | 55.77% | 47.17%
A330 | Original Model | 32.26% | 55.56% | 52.38%
A330 | Without Reinforcement Learning | 28.57% | 50.00% | 45.83%
Other-plane | Original Model | 44.12% | 57.69% | 37.85%
Other-plane | Without Reinforcement Learning | 41.43% | 55.77% | 34.63%
Table 6. Comparison of accuracy results for different fine-tuned components of the aircraft model on selected categories.

Migration Category | Migration Model Site | Precision | Recall | AP
A321 | Original model | 70.30% | 60.68% | 53.66%
A321 | head.score.bias | 70.30% | 60.68% | 52.71%
A321 | head.score.bias+head.score.weight | 71.00% | 60.68% | 53.71%
A321 | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 31.28% | 60.68% | 34.87%
A330 | Original model | 32.26% | 55.56% | 52.38%
A330 | head.score.bias | 32.22% | 55.56% | 52.38%
A330 | head.score.bias+head.score.weight | 39.62% | 55.56% | 22.02%
A330 | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 36.85% | 55.56% | 20.47%
Other-plane | Original model | 44.12% | 57.69% | 37.85%
Other-plane | head.score.bias | 44.11% | 57.69% | 37.85%
Other-plane | head.score.bias+head.score.weight | 32.94% | 59.40% | 30.26%
Other-plane | head.score.bias+head.score.weight+head.cls_loc.weight+head.cls_loc.bias | 33.33% | 59.83% | 30.26%
Table 7. Inference results of different model adaptation methods.

Category | Model Used | Precision | Recall | AP
A321 | Original Model | 70.30% | 60.68% | 53.66%
A321 | Pruning | 67.44% | 55.77% | 48.43%
A321 | Distillation | 70.28% | 60.68% | 53.71%
A330 | Original Model | 32.26% | 55.56% | 52.38%
A330 | Pruning | 33.33% | 50.00% | 42.71%
A330 | Distillation | 32.22% | 55.56% | 52.38%
Other-plane | Original Model | 44.12% | 57.69% | 37.85%
Other-plane | Pruning | 43.70% | 56.73% | 37.61%
Other-plane | Distillation | 44.11% | 57.69% | 37.85%