The optimization of offloading decisions is a widely researched topic in edge computing, with the goal of reducing latency and energy consumption in MEC systems. Reinforcement learning has been widely studied as a tool for generating offloading decisions, as it allows an agent to interact with the MEC environment and adjust its offloading decisions in response to changes in the environment's state and reward signals. The authors of [
12] studied the joint optimization of offloading decisions and computing resource allocation in a time-varying environment with a single MEC server and multiple users, and proposed a Q-learning-based computation offloading method to minimize the delay and energy consumption cost for all user equipments (UEs). The authors in [
13] studied the task offloading problem in an edge computing environment with multiple computational access points (CAPs) and proposed a DQN-based offloading strategy that dynamically adjusts the offloading ratio of tasks based on the states of the CAPs and tasks, in order to balance the overall delay and energy consumption of the system. The authors in [
14] researched the task offloading problem in a heterogeneous vehicular network, considering the dynamic channel changes caused by vehicle movement and random task arrivals in an edge computing environment. They proposed a computation offloading method based on DDPG, with the aim of minimizing the overall energy consumption and task delay of the system. The authors in [
15] studied the task offloading and resource allocation problem in a multi-site MEC railroad IoT (RIoT) environment, and proposed a hybrid hierarchical reinforcement learning method combining DDQN and DDPG: DDQN generates the subcarrier allocation decisions, while DDPG generates the offloading ratio, power allocation, and computing resource allocation decisions. The proposed method effectively reduces the weighted sum of energy consumption and delay. The authors in [
16] studied a computation offloading problem with multiple users competing for resources, aiming to minimize delay and energy consumption, and proposed a DDPG-based computation offloading method that determines the offloading location and ratio for each task. The authors in [
17] proposed an actor–critic-based computation offloading algorithm that generates offloading and resource allocation decisions in an energy-harvesting environment, with the aim of maximizing the number of offloaded tasks. The authors in [
18] proposed a DDQN algorithm based on an attention mechanism to generate task offloading strategies composed of computation resource allocation and power allocation, with the goal of minimizing long-term task completion delay and energy consumption. The authors in [
19] proposed a DQN-based action decomposition algorithm that recursively decomposes the action space into multiple actions and generates server selection, offloading, and collaboration decisions with multiple agents, in order to minimize the delay cost. The authors in [
20] studied the problem of computation offloading with task dependencies and proposed a sequence-to-sequence based deep reinforcement learning method for generating offloading decisions, with the aim of minimizing latency and energy consumption. As the number of users grows, the system's state and action spaces expand, sharply increasing the complexity of single-agent reinforcement learning methods. To alleviate this complexity, this paper proposes a task offloading approach based on multi-agent reinforcement learning, in which each user is trained as an independent agent.
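The complexity argument above can be made concrete with a quick back-of-the-envelope calculation (the numbers below are illustrative assumptions, not values from any cited system): if each of N users chooses among K discrete offloading actions, a single centralized agent faces a joint action space of size K^N, whereas N independent agents together face only N·K actions.

```python
# Illustrative only: K and the user counts below are assumed values.
K = 10  # discrete offloading actions available to each user

for n_users in (2, 4, 8):
    joint = K ** n_users       # joint action space of one centralized agent
    per_agent = K * n_users    # total actions across n_users independent agents
    print(f"{n_users} users: joint={joint}, per-agent total={per_agent}")
```

The exponential growth of the joint space is what motivates training each user as a separate agent in the multi-agent formulation.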
Many studies use the MARL framework to optimize offloading decisions for multiple users: MARL can solve optimization problems in complex environments through cooperation among multiple agents. The authors in [
21] investigated the resource management problem in a vehicular network aided by MEC and UAVs. They proposed a resource allocation algorithm based on multi-agent deep deterministic policy gradient (MADDPG), which treats each MEC server as an agent and enables the agents to collaboratively allocate spectrum, computing, and storage resources to meet the requirements of latency-sensitive tasks. The authors in [
22] studied the task offloading problem of an energy-harvesting multi-user MEC system and proposed a multi-agent actor–critic (AC) algorithm in which each user is an agent and the agents collaborate to generate offloading decisions, with the objective of minimizing task execution time. The authors in [
23] studied the problem of computation offloading and resource allocation in a MEC system with multiple users and multiple MEC servers. Given the large number of users, the random arrival of tasks, and the time-varying nature of the environment, the authors proposed a MADDQN-based offloading method for determining task offloading ratios and resource allocation decisions, with the aim of minimizing the weighted sum of delay and bandwidth. The authors in [
24] proposed a hierarchical multi-agent reinforcement learning framework that decomposes the computation offloading problem into two subproblems, beamforming strategy and task allocation ratio, and solves them using MADDPG and a single-agent DDPG algorithm, respectively, with the aim of maximizing energy efficiency. The authors in [
25] studied a joint optimization problem of computation offloading and interference coordination for smart small-cell networks and proposed a MADDPG-based offloading method that incorporates the idea of federated learning into MADDPG so that model parameters can be reused by multiple agents, reducing computational complexity. The objective is to reduce energy consumption effectively while satisfying delay requirements. The authors in [
26] studied the task offloading problem for NOMA multi-user MEC systems, aiming to minimize the weighted sum of long-term power consumption and latency, and proposed a MADDPG-based multi-agent reinforcement learning computation offloading method in which the individual agents share the same policy network to reduce training complexity. The authors of [
27] studied a multi-UAV and multi-MEC-server cooperative edge computing system, jointly optimizing UAV trajectories, task allocation decisions, and resource management to minimize the weighted sum of delay and energy consumption. Considering the high-dimensional continuous action space, the authors proposed a computation offloading method based on multi-agent TD3 (MATD3). The authors of [
28] proposed a computation offloading framework based on MADDPG for solving resource allocation problems in an integrated MEC network for terrestrial applications. The authors in [
29] proposed a computation offloading decision-making method based on QMIX multi-agent reinforcement learning to solve the problem of offloading server selection and task offloading ratio allocation in computation offloading. In this method, the agents make decisions based on both local observation information and global state, effectively reducing latency and energy consumption costs. The authors in [
30] studied computation offloading in single-MEC-server and multi-server scenarios and proposed a MADDPG-based computation offloading approach that generates task scheduling, transmit power, and CPU cycle decisions to minimize energy consumption and latency. In this paper, we investigate the problem of offloading location selection and offloading ratio selection in computation offloading, and propose a hierarchical multi-agent reinforcement learning based computation offloading framework to solve it. We summarize the advantages and disadvantages of the proposed approach compared to existing work in Table 1.
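As a rough illustration of the two decision variables named above, the sketch below samples an (offloading location, offloading ratio) pair for each user-agent. It is a minimal toy, not the proposed framework: the number of servers, the user count, and the uniform sampling are all assumptions made for illustration.

```python
# Minimal illustrative sketch of the hierarchical decision structure:
# each user-agent first picks an offloading location (discrete), then
# an offloading ratio (continuous in [0, 1)). All values are assumed.
import random

N_SERVERS = 3  # assumed number of candidate MEC servers (location 0 = local)

def sample_decision(rng):
    location = rng.randrange(N_SERVERS + 1)         # 0 keeps the task local
    ratio = 0.0 if location == 0 else rng.random()  # fraction offloaded
    return location, ratio

rng = random.Random(0)
decisions = [sample_decision(rng) for _ in range(4)]  # one per user-agent
print(decisions)
```

In the actual framework, the discrete location choice and the continuous ratio choice would be produced by learned policies rather than sampled uniformly; the sketch only shows the shape of the per-agent action.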