1. Introduction
Vigorously developing manufacturing and the real economy increases social wealth and better meets people’s material and spiritual needs. It also provides more job opportunities, maintains social stability, and increases government fiscal revenue, thereby supporting public services, social welfare, and public safety. Manufacturing companies need to constantly acquire new projects and orders, and they ensure sustainable development by undertaking multiple projects in parallel. A large manufacturing enterprise often undertakes dozens or even hundreds of different projects, and managing the production of all of these projects is a huge challenge for the enterprise.
Every production project can generate more income, win more markets, or assume more social responsibility. Therefore, from the perspective of the project management department, there are almost no unimportant projects. However, production resources are always limited. When the same resource is allocated to multiple project plans and the necessary production resources cannot be obtained as expected, a resource conflict occurs. When there are disputes and conflicts among multiple project departments over the same critical resource, coordination can be realized only by higher-level managers. Under normal circumstances, coordinating multiple projects is extremely difficult. On the one hand, it is difficult for managers to evaluate the impact on the overall project plan after some project plans have been compromised. On the other hand, only the project management department knows the time margin of a project. However, to obtain more resources, the actual margin is generally not disclosed to personnel outside the project team.
Some new manufacturing enterprises or production workshops try their best to avoid mixed-line production across projects when planning their industrial layout. By constructing independent production units and lines, the abovementioned problems can be effectively avoided. However, most manufacturing enterprises have already accumulated enormous fixed assets. To undertake more production projects and respond to diversified market demands, mixed-line production must be carried out, and the products of different projects must be processed on the same processing equipment. Moreover, considering the cost of fixed asset investment, enterprises generally find it difficult to carry out large-scale transformation and upgrading of existing production resources. In such a situation, enterprise managers must organize the overall arrangement of project production plans and pursue the overall interests of the enterprise. Objective management is needed for project time margins, project schedule impacts, and project priorities. In reality, multiproject management relies on the experience of managers, resulting in a lack of lean management and a loss of overall benefits.
Based on the current situation of multiproject and multilevel planning management in manufacturing firms, this paper proposes a planning model that combines PERT and reinforcement learning algorithms, namely, PERT-RP-DDPGAO. PERT (program evaluation and review technique) uses network analysis to develop and quantitatively evaluate plans. This paper decomposes the project plan through the PERT model and incorporates a feedback mechanism, which gives the PERT model dynamic optimization capabilities, completes the decomposition of the project plan, and distributes the decomposed enterprise project-level plan to the production units. After these plans arrive at a production unit, they are preliminarily decomposed onto individual pieces of processing equipment based on process documents, equipment resources, and working hours. This paper uses a manufacturing execution system (MES) to extract resource demand plans from the processing equipment plans at 7-day intervals. The resource demand plans are used as intelligent agents, and matrix calculations are used to realize the agents’ actions. In the name PERT-RP-DDPGAO, RP refers to resource planning. Finally, based on the results of the PERT model, the deep deterministic policy gradient (DDPG) learning model is applied to achieve automatic optimization of the resource demand plan, which is called DDPGAO in this paper. In addition, through the total time difference parameter, the calculation results of the PERT model are used as inputs for optimizing the resource demand plan, and the optimization results of the resource demand plan are fed back to the PERT model, which achieves multilevel planning management from the enterprise to the workshop and from the workshop to the section. Moreover, the optimization results and response times of the DDPG model are compared with those of traditional reinforcement learning models and greedy algorithms, and the advantages of the DDPG model in dealing with discrete production problems are discussed. The response time of the DDPG algorithm is 3.0% shorter than that of the deep Q-network (DQN) algorithm, 8.4% shorter than that of the greedy search algorithm, and 19.7% shorter than that of the random search algorithm.
2. Related Work
With the ever-changing market, planning and scheduling management technology is constantly evolving, from economic order quantity models, material requirements planning, and manufacturing resource planning to just-in-time production, enterprise resource planning, and load-based production control theory. The development of these theories has greatly advanced planning and scheduling. Because the plan is the carrier of resources and costs, its management receives great attention; it is the main line of enterprise operations. As the accuracy of the plan improves, the operational capability of the enterprise improves accordingly.
The research content of this paper is the multiproject and multilevel planning management of manufacturing enterprises. Mixed production is the greatest production feature and the most prominent problem for such enterprises. Based on production practices, the main mixed production problems can be divided into two categories: mixed production problems in a specific workshop and mixed production problems in multiple workshops.
In terms of solving mixed-line production problems in a specific workshop, Pereira studied the issue of mixed-line production in assembly workshops, optimized the fluctuation in the product output rate, and successfully developed a precise branch definition algorithm [
1]. Abdul Nazar and Madhusudanan Pillai also studied the mixed-line production problem in assembly workshops [
2]. Their research subjects were larger in size; therefore, they developed optimization solutions based on mutation algorithms. Siala conducted classification research on heuristic models and found that heuristic algorithms that branch and select classes have better feedback mechanisms [
3]. Sun and Fan proposed a scheduling model based on the ant colony algorithm to address the problem of mixed assembly of multiple orders in automotive assembly workshops, considering the impact of switching between orders [
4]. The ant colony algorithm was used to optimize the minimization of rule breaking times and target switching situations. An integrated model based on balanced production scheduling and buffer allocation was proposed by Lopes [
5]. An iterative decomposition method was used to solve the assembly mixed-line production model. A multiobjective algorithm based on free time, total duration, and idle time was proposed by Rauf to overcome multiobjective production scheduling problems [
6]. A new mixed-line scheduling model based on a simulated annealing algorithm combined with total duration minimization and idle time weighting was proposed by Mosadegh et al. [
7,
8]. The authors used the Q-learning algorithm to optimize heuristic rules. A mixed-line planning model considering preparation time was proposed by Nazar [
9]. This model focuses on the operation of the equipment. A multiobjective optimization planning algorithm was proposed by Wang [
10], focusing on maximizing net profit and reducing preparation time and turnover. Based on an improved particle swarm optimization algorithm, a multiobjective optimization algorithm focusing on the plan completion rate and plan change rate was proposed by Zhong [
11]. Zhang used a genetic algorithm based on a cellular strategy to optimize the energy consumption and adjustment rate of production systems [
12]. Manavizadeh innovatively focused on the scheduling problem of mixed linear and U-shaped assembly lines and proposed a new heuristic algorithm [
13]. A new algorithm based on an integer linear programming algorithm and a hybrid genetic model considering assembly line length and the number of terminals was proposed by Defersha [
14].
In terms of solving mixed production problems in multiple workshops, an accelerated dynamic programming algorithm was used by Hong to minimize switching costs for solving the painting workshop scheduling problem [
15]. Leng transformed the color model of a surface treatment workshop into a Markov decision process and solved it [
16]. A taboo search algorithm that considers work and cache costs was proposed by Kampker by considering both the final assembly workshop and the assembly workshop together [
17]. A multiobjective integer linear programming model based on color batching load balancing and raw material balancing was proposed by Taube [
18]. A hybrid weighted model and integer programming algorithm scheduling model was proposed by Wu for the multistage planning problem of a surface treatment workshop, turnover workshop, and final assembly workshop [
19].
Building on the research above, we introduce the specific issue considered in this paper: another type of mixed-production situation that concerns not a single workshop or a few workshops but all production units of the enterprise. For a large enterprise, there may be more than ten or even dozens of production units. For the top management of the enterprise, the goal is to overcome the mixed production issue across all production units. At present, few researchers are working in this field.
The research above reveals that scholars studying planning and scheduling algorithms mainly use heuristic algorithms, artificial intelligence algorithms, and hybrid algorithms. Heuristic algorithms include genetic algorithms [
20], taboo search algorithms [
21,
22,
23,
24], particle swarm optimization algorithms [
25,
26], and ant colony algorithms [
27].
Chen used a genetic algorithm to solve the fuzzy assembly line workshop planning problem considering resource occupancy in mixed flow shop scheduling [
28]. A genetic composite algorithm was proposed by Liu to minimize energy consumption and delay [
29]. A solution based on a genetic algorithm was proposed by Yu to solve the mixed-line scheduling problem of unrelated parallel machines in a workshop [
30]. A two-stage hybrid scheduling model considering energy conservation was proposed by Wang [
31]. Jamrus proposed combining two different heuristic models [
32]. They improved the particle swarm optimization algorithm based on the Cauchy distribution and incorporated the concept of a genetic algorithm, yielding significant improvements on mixed-line problems. Robotic equipment is crucial for flexible and hybrid production, and the ant colony optimization algorithm was used by Elmi to solve the scheduling issue of multi-robot hybrid production lines [
33].
An increasing number of scholars are applying artificial intelligence algorithms to solve planning and scheduling problems. Sun et al. used machine learning methods to schedule robot resources [
34]. Asghari et al. combined artificial intelligence computing models with genetic algorithms for scheduling cloud computing resources [
35]. Luo considered the impact of plan insertion and implemented dynamic scheduling in the workshop through machine learning [
36]. Zhang et al. used graph neural networks for workshop planning and control [
37]. Swarup et al. achieved results in saving computational costs by dynamically arranging cloud computing resources through machine learning [
38].
Among the numerous artificial intelligence algorithms, reinforcement learning models are highly favored. Reinforcement learning (RL) can enable intelligent agents to interact with the environment and achieve automatic scheduling of plans or resources through reward and punishment mechanisms [
39]. In recent years, some scholars have begun to pay attention to the management of multi-level planning systems. Zhao et al. regarded the workshop and logistics as two levels and used priority algorithms for planning optimization [
40]. Wan et al. divided cloud computing resource scheduling into user-level scheduling and sub-level scheduling [
41]. Manna and Bhunia treated inventory as an additional level of scheduling [
42]. Meanwhile, we have also noticed that no scholars have conducted multilevel planning and scheduling research that links project-management-level and workshop-resource-level planning in manufacturing enterprises.
In summary, solving the multilevel planned mixed-line production problem of enterprise production unit equipment resources is highly important, but little related research has been conducted. Additionally, we note that the main users of scheduling methods are managers, who have their own set of research ideas. Tripathi and Jha have focused on the management role of performance tools [
43]. Kadri and Boctor ingeniously combined time parameter calculation methods with genetic models [
44]. Olivieri et al. improved workflow and resource utilization through location management methods [
45]. Tripathi and Jha used success factors to model management models [
46]. Habibi established a mathematical model for supply chain management [
47]. These inspire us to combine management tools and models with artificial intelligence technology. This composite approach makes the new algorithm more in line with management activities, allowing artificial intelligence technology to leverage its advantages and assist managers in making decisions.
Based on the above discussion, this article combines advanced artificial intelligence technology with management tools (resource planning and project management models) to transform the behavior of managers using management tools into machines repeatedly making decisions with the same tools. The results show that this interdisciplinary integration of computer algorithms and management tools is a significant innovation in scheduling algorithm research. The following sections present the methods, experiments, discussion, and conclusions.
3. Method
The PERT-RP-DDPGAO algorithm includes a module framework and data acquisition, a PERT optimization model, a resource plan processing method, and an automatic optimization model based on DDPG.
3.1. Module Framework and Data Acquisition
The research object of this article is the most common machining processes in manufacturing enterprises. Mechanical processing is generally divided into small product mechanical processing and large product mechanical processing. The ordinary mechanical processing of small products has a short time cycle and can be performed using multiple pieces of equipment, generally without causing resource conflicts. Large-scale product machining generally involves medium-to-large-scale machining centers, which have high difficulty in product processing, high equipment value, and long production cycles and are prone to resource conflicts during mixed-line production. To solve practical production problems, this paper focuses on the mechanical processing of medium- and large-sized products and explains the content of the PERT-RP-DDPGAO algorithm model.
Figure 1 shows the PERT-RP-DDPGAO algorithm framework, which is divided into an enterprise planning layer, a production unit planning layer, and an equipment planning layer based on the application scenarios. At the enterprise planning level, the model obtains product structure tree information and standard operating time information from the manufacturing execution system. The PERT model takes the structure tree model and the standard operating time information as inputs to form a product plan and the total time difference for each product plan. Through the PERT model, project plan decomposition is achieved, resulting in a planned dispatch from the enterprise to the production unit. After receiving the plan, the production unit forms a resource plan for equipment or workstations based on process information and work hour quota information; this forms a planned dispatch from the production unit to the equipment. The last layer is the equipment planning layer, where the DDPGAO model takes the resource plan on the mechanical processing equipment as an intelligent agent and optimizes it autonomously through reinforcement learning. During this optimization, the process-level processing plan is adjusted; the adjusted plan is fed back to the product plan, which affects the total time difference. This paper adds a feedback algorithm to the PERT model so that the results of the equipment planning layer can be fed back to the enterprise planning layer, enhancing the robustness of the entire planning system.
According to the theory of project management, manufacturing enterprises determine the composition structure of products based on the product structure tree when designing the product project organization. The composition structure of a product represents the logical sequence of processing from parts to components and from components to the final finished product. In addition, most manufacturing enterprises have established information systems that can conveniently determine the processing cycle of each product. Therefore, with knowledge of the product processing logic and processing cycle, the project can be decomposed into work. Usually, companies add standard operating times to the product structure tree. The product structure tree information is represented as shown in Equation (1), which contains the product name, the product-level code (from which the product processing logic relationship can be obtained), the standard operating time, the loose operating time, and the emergency operating time. The loose coefficient is usually taken as 1.3, and the emergency coefficient as 0.8. The loose operating time and the emergency operating time are calculated from the standard operating time using these coefficients, as shown in Equations (2) and (3), respectively.
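As a simple illustration of Equations (2) and (3), assuming the loose and emergency operating times are obtained by scaling the standard operating time by the coefficients above, the following sketch shows the calculation; the coefficient values follow the text, and the 10 h example is illustrative.

```python
# Hypothetical illustration of the loose and emergency operating times
# (Equations (2) and (3)); coefficient values follow the text (1.3 and 0.8).
LOOSE_COEFF = 1.3
EMERGENCY_COEFF = 0.8

def loose_time(standard_hours):
    """Loose operating time: standard time relaxed by the loose coefficient."""
    return LOOSE_COEFF * standard_hours

def emergency_time(standard_hours):
    """Emergency operating time: standard time compressed by the emergency coefficient."""
    return EMERGENCY_COEFF * standard_hours

# Example: a task with a 10-hour standard operating time
print(loose_time(10.0))      # 13.0
print(emergency_time(10.0))  # 8.0
```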
3.2. PERT Optimization Model
Current product projects are very complex. If the final product is unfolded in the form of a product tree, a very complex tree-like structure is obtained, and the number of branches can reach thousands or tens of thousands. To cope with complex project management, manufacturing enterprises have developed the PERT model based on operations research theory. This model calculates key time parameters based on the project task decomposition and the operation time of each task. By analyzing these time parameters, elements such as the planned time, critical work, critical path, and total duration are obtained to support managers in better project management. The setting of the model needs to consider the field requirements of the information system: the task data in Equation (4) must include the logical relationship expression of the task and the related time parameters, namely, the nodes before and after the task, the logical relationship between related tasks, the task duration (represented by m), and the pre-task and post-task work nodes. The total float and the task completion status are key control parameters, and the completion status can be determined through the task handover procedure.
The total float is the difference between the earliest start time and the latest start time. For a one-to-one link, the earliest start time of a node is obtained by adding the duration between node i and node j to the earliest start time of the node on the left side of node i; for many-to-one links, the maximum value over all incoming links is taken. The calculation methods are shown in Equations (5) and (6).
Compared with Equation (6), the main change for the latest start time in Equation (7) is that the maximum value is replaced by the minimum value: the latest start time of node j is obtained from the latest start time of the node on the right side of node j minus the duration between node j and node k, taking the minimum over all outgoing links.
Using the principle of PERT technology, Equations (8) and (9) can be derived through Equations (5)–(7).
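To make the time-parameter calculation concrete, the following is a minimal sketch of the forward and backward passes behind Equations (5)–(9), written with NetworkX, which is also used in the implementation described next. The graph, node names, and durations are illustrative assumptions, not the paper's actual project network.

```python
# A minimal sketch of the PERT forward/backward passes behind Equations (5)-(9),
# using NetworkX. Edge attribute "duration" is the operating time between nodes.
import networkx as nx

def pert_time_parameters(G):
    order = list(nx.topological_sort(G))

    # Forward pass: earliest start time (many-to-one links take the maximum).
    es = {n: 0.0 for n in order}
    for j in order:
        for i in G.predecessors(j):
            es[j] = max(es[j], es[i] + G[i][j]["duration"])

    # Backward pass: latest start time (one-to-many links take the minimum).
    project_end = max(es.values())
    ls = {n: project_end for n in order}
    for j in reversed(order):
        for k in G.successors(j):
            ls[j] = min(ls[j], ls[k] - G[j][k]["duration"])

    # Total float (total time difference) of each node.
    tf = {n: ls[n] - es[n] for n in order}
    return es, ls, tf

# Example: a small network with one parallel branch.
G = nx.DiGraph()
G.add_edge("start", "a", duration=5)
G.add_edge("start", "b", duration=2)
G.add_edge("a", "end", duration=3)
G.add_edge("b", "end", duration=4)
es, ls, tf = pert_time_parameters(G)
print(tf)  # nodes with zero total float lie on the critical path
```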
The backend of the algorithm program uses Python 3.5 and the NetworkX 1.11 module. The front end is built with Spring Cloud, which leverages Spring Boot to simplify the development of distributed system infrastructure.
To increase the robustness of the PERT model, two time quantities are subtracted on the basis of Equation (3), as shown in Equation (10); this difference is used below to determine how much a task can be compressed.
This paper incorporates a feedback mechanism into the PERT model, as shown in
Figure 2.
The focus of the feedback mechanism is to record and monitor the total time difference. In addition, this paper introduces the emergency time in Equation (3). By using the emergency time, the compressible time of the task can be calculated. By compressing the time of key tasks, the total project duration can be shortened, ultimately achieving the goal of controlling the total project duration.
3.3. Resource Plan Processing Method
This paper achieves project plan decomposition through PERT technology. These decomposed plans can be called enterprise-level plans or production unit plans. After receiving the plan, the production unit can derive the process-level plan based on the working hour quota of each process of the product. The working hour quota of a product is expressed as shown in Equation (11). Although the working hour quota cannot fully represent the actual processing time of the product, it can accurately identify which processes have longer or shorter processing times and determine the proportion of time needed for each process. Therefore, through Equation (12), this paper can obtain the planned time of each process from the project plan time calculated by PERT (usually the end time) and the processing cycle of the product in the production unit. Through Equation (12), this paper can obtain the production plan of the product process. After associating production resources or equipment with the product process production plan, the resource demand plan for a certain resource or piece of equipment can be obtained.
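Equation (12) is not reproduced here; the sketch below illustrates one plausible reading of the allocation, in which the planned finish time of each process is back-scheduled from the PERT plan time in proportion to its work-hour quota. The function name and numbers are illustrative assumptions, not the paper's exact formula.

```python
# A hedged sketch of the process-level plan allocation described around
# Equations (11) and (12): planned finish times are back-scheduled from the
# PERT plan time in proportion to each process's work-hour quota (assumed form).
def process_plan_times(pert_end_day, cycle_days, quotas_hours):
    """Return the planned finish day of each process (illustrative assumption)."""
    total = sum(quotas_hours)
    start_day = pert_end_day - cycle_days
    finish_days, elapsed = [], 0.0
    for q in quotas_hours:
        elapsed += q / total * cycle_days        # share of the processing cycle
        finish_days.append(start_day + elapsed)  # planned finish of this process
    return finish_days

# Example: a product due on day 30 with a 10-day cycle and three processes.
print(process_plan_times(30, 10, [8.0, 24.0, 8.0]))  # [22.0, 28.0, 30.0]
```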
This paper presents the resource plan in the form of a matrix, as shown in Equation (13). Based on the actual work situation, the model retrieves equipment resource plans on a weekly basis. The horizontal direction of the matrix represents the daily processing quantity of a project on this equipment within the week, and the vertical direction represents the types of projects undertaken within the week.
Through this matrix representation of resource planning, the resource plan can act as an intelligent agent for reinforcement learning, achieving automatic coordination and arrangement of resource planning. To carry out resource planning and coordination, the manager needs to calculate the total number of tasks undertaken on the equipment every day and compare it with the processing capacity of the equipment. The current matrix still lacks these elements, so it cannot be used for the subsequent calculations. To achieve the subsequent calculation goals, it is necessary to extend the resource plan matrix, as shown in Equation (14). In Equation (14), A represents the total planned quantity across projects, B represents the task-carrying capacity of the equipment (the maximum quantity that can be processed on the same day), and C represents the difference between the task-carrying capacity and the total number of assigned tasks.
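As a concrete illustration of Equations (13) and (14) under the assumed layout (rows are projects, columns are days), the following sketch builds the extended matrix with the daily total A, the capacity B, and the remaining capacity C. The plan values and the capacity of two tasks per day are illustrative.

```python
# A minimal sketch of the resource plan matrix of Equation (13) and its
# extension in Equation (14); rows are projects, columns are the seven days.
import numpy as np

# Daily processing quantities of three projects on one machine over one week.
R = np.array([
    [1, 1, 0, 0, 1, 0, 0],   # project 1
    [1, 0, 1, 1, 0, 1, 1],   # project 2
    [0, 1, 2, 2, 1, 2, 2],   # project 3
])

capacity = 2                              # max tasks the machine can process per day
A = R.sum(axis=0)                         # total planned quantity per day
B = np.full(R.shape[1], capacity)         # task-carrying capacity per day
C = B - A                                 # remaining capacity (negative = overloaded)

R_ext = np.vstack([R, A, B, C])           # extended matrix of Equation (14)
print(R_ext)
print("overloaded days:", np.where(C < 0)[0] + 1)
```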
3.4. Automatic Optimization Model Based on DDPG
Through matrix processing, resource planning can serve as an intelligent agent for reinforcement learning. Another advantage of the resource planning matrix is that it can achieve overall planning actions through matrix operations.
Therefore, this paper improves the extended resource matrix by adding a new column initialized to 0 to form the final resource matrix, as shown in Equation (15). Additionally, a new action matrix is established. The first few rows of the action matrix correspond to the planned quantity rows of the resource matrix, and each row contains only one (+1/−1) pair, as shown in Equation (16).
The action matrix enables the resource planning agent to take actions through paired (+1/−1) operations. The eighth column of the matrix is used to simulate the situation in which part of this week’s plan is deferred. This paper incorporates a plan adjustment constraint, which means that the plan for this week should be completed as much as possible; if it cannot be completed during the week, it must be arranged on the first day of the next week. The action matrix is added to the resource matrix to obtain the adjusted resource plan.
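For illustration, the sketch below shows one such paired (+1/−1) action under the assumed layout of Equations (15) and (16): rows are projects, columns are the seven days plus the deferral column. The numbers continue the illustrative example above and are not the paper's data.

```python
# A hedged sketch of the paired (+1/-1) action of Equation (16): one unit of
# project 3 is moved from day 3 to day 1 by adding an action matrix to the
# resource matrix. The eighth column (index 7) holds work pushed out of the week.
import numpy as np

R_final = np.array([
    [1, 1, 0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 2, 2, 1, 2, 2, 0],
])

action = np.zeros_like(R_final)
action[2, 0], action[2, 2] = +1, -1      # the single (+1/-1) pair for project 3

R_adjusted = R_final + action            # adjusted resource plan
assert (R_adjusted >= 0).all()           # negative entries would signal an invalid move
print(R_adjusted)
```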
Our goal is to train a strategy that automatically coordinates and balances resource plans. However, the action space of the resource planning matrix is complex and diverse: the (+1/−1) pair can appear at any position in the matrix, so it is difficult to obtain the optimal strategy using only basic deep reinforcement learning models. This paper therefore adopts the DDPG (deep deterministic policy gradient) algorithm, with the actor–critic algorithm as its basic framework, deep neural networks as approximations of the policy network and the action value function, and the stochastic gradient method to train the parameters of the policy and value networks.
The DDPG algorithm framework is displayed in Figure 3. The critic network accurately evaluates the value of the agent’s actions. The agent interacts continuously with the environment in an iterative, trial-and-error process. At each step, the reward, the observed state, the selected action, and the resulting new state are saved in the replay buffer. The agent trains the critic network on mini-batches sampled from the replay buffer; this training method reduces the difference between the output of the training network and that of the target network.
The critic loss is the expected value of the difference between the Q-value of the target critic network and the Q-value of the training critic network, where the expectation is taken over mini-batch data sampled from the replay buffer, and γ is the discount factor. The training critic network and the target critic network each have their own parameters (weights and biases); likewise, the actor network has its policy parameters, and the target actor network has its own parameters.
The actor network takes the state of the environment as input and outputs actions, thereby computing the policy. The policy of the actor network is evaluated using the Q-value output by the critic network: the objective is the expected Q-value of the actions selected according to the policy, taken over states sampled from the replay buffer, and the actor network is trained to increase this objective. In fact, even if the model learns a deterministic actor network policy, the accuracy of the results can still be questioned, so an exploration process needs to be added to the model to find an appropriate strategy. The Gaussian noise method is used for exploration in this paper: during training, Gaussian noise is added to the actions generated by the actor network, allowing various actions to be explored. The reward function mainly considers four elements. First, if the daily totals of the resource plan matrix do not exceed the capacity, the tasks can be completed. Second, there is a collision when negative elements appear or the total time difference of a task becomes negative. Third, the more nonzero values there are in the eighth column, the greater the penalty. Fourth, the greater the total time difference of the priority tasks, the greater the reward.
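As a concrete illustration, the following sketch encodes these four criteria in one possible reward function. The weights, the helper inputs (per-project total floats and priority flags), and the exact functional form are assumptions for illustration, not the paper's actual reward.

```python
# A hedged sketch of a reward function implementing the four criteria above.
# Weights and helper inputs (total floats, priority flags) are illustrative.
import numpy as np

def reward(R_adj, capacity, total_float, priority, w=(1.0, 5.0, 1.0, 0.1)):
    """R_adj: adjusted resource plan (projects x 8); last column = deferred work."""
    w_cap, w_collision, w_defer, w_float = w
    r = 0.0

    # 1) daily totals within capacity -> tasks can be completed
    daily_total = R_adj[:, :7].sum(axis=0)
    r += w_cap * float((daily_total <= capacity).all())

    # 2) collision: negative plan entries or negative total float
    collisions = (R_adj < 0).sum() + (total_float < 0).sum()
    r -= w_collision * collisions

    # 3) penalty grows with the number of nonzero entries in the eighth column
    r -= w_defer * np.count_nonzero(R_adj[:, 7])

    # 4) reward grows with the total float of priority tasks
    r += w_float * float(total_float[priority].sum())
    return r
```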
The parameters of the target network for critics and actors are updated according to a certain cycle. This update is based on the level of network training, and when the training level is determined, the new target value is also determined. In the subsequent calculation process, this value can be fixed.
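To make the training loop concrete, the following is a minimal, self-contained sketch of one DDPG update step in PyTorch. The layer sizes, optimizer settings, noise scale, and the soft target update are illustrative assumptions; the paper updates the target networks on a fixed cycle, which would correspond to periodically copying the online weights instead of the soft update shown here.

```python
# A minimal sketch of one DDPG update step (illustrative, not the paper's code).
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                 nn.Linear(128, action_dim), nn.Tanh())
    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=1))

def soft_update(target, online, tau=0.005):
    for t, o in zip(target.parameters(), online.parameters()):
        t.data.copy_(tau * o.data + (1.0 - tau) * t.data)

def ddpg_update(batch, actor, critic, actor_t, critic_t, opt_a, opt_c, gamma=0.99):
    state, action, reward, next_state = batch    # mini-batch from the replay buffer
    with torch.no_grad():                        # target Q-value from target networks
        target_q = reward + gamma * critic_t(next_state, actor_t(next_state))
    critic_loss = nn.functional.mse_loss(critic(state, action), target_q)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    actor_loss = -critic(state, actor(state)).mean()   # maximize expected Q-value
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    soft_update(critic_t, critic)
    soft_update(actor_t, actor)

# Exploration: at action-selection time, Gaussian noise is added to the actor
# output, e.g. action = actor(state) + 0.1 * torch.randn_like(actor(state)).
```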
4. Experimental Evaluation and Discussion
4.1. Experimental Environment Design
The multilevel planning system includes an enterprise-level plan, a workshop-level plan, and an equipment-level plan. The enterprise-level plan mainly revolves around project management. The PERT method is used in project management to decompose plans. The project management environment is mainly based on the enterprise resource planning (ERP) system. The ERP system distributes the plan to the production workshop. The workshop receives tasks through the MES and dispatches them to equipment to form a resource plan. The management of resource plans is mainly based on MES. ERP systems and MES collect, transmit, and control equipment data through industrial control networks. This paper officially runs the PERT-RP-DDPGAO algorithm through the industrial information implementation framework shown in
Figure 4.
Mechanical processing is the most common processing method used in manufacturing enterprises. The main types of mechanical processing equipment used are small mechanical processing equipment, medium mechanical processing equipment, and large mechanical processing equipment. Among them, large-scale mechanical processing equipment is expensive, with a small number of pieces of equipment, and most of the time, it undertakes the processing of key and difficult products. In the actual production process, resource conflicts often occur.
Therefore, this paper focuses on the mechanical processing tasks of large-scale products in manufacturing enterprises, collects resource plans on large-scale mechanical processing equipment through production information systems, and conducts experimental analysis. The collection frequency of the resource plans is 7 days.
Through the experimental environment, the paper extracted the resource plan of a certain device and formed
Table 1 using a visual approach. The outcomes of the resource planning matrix are presented in
Table 1. In Table 1, three different projects on the equipment are shown in different colors, the number of tasks for each day is calculated according to the previous formula, and the difference between the number of tasks and the equipment capacity gives the remaining carrying capacity. If the total number of tasks is greater than the equipment’s capacity, the cell is shown in red; if it is less than the capacity, the cell is shown in green. Large-scale mechanical processing equipment generally performs single-piece processing, and the equipment’s capacity is based on a 12 h working system with a maximum of two product tasks undertaken on the same day.
This paper adopts visualization processing to contrast the actual performance of the model after reinforcement learning training. The preliminary extracted resource plan visualization graph is illustrated in
Figure 5, which indicates the task distribution of three large-scale machining tasks over a period of 7 consecutive days. The green and red columns show that there is no conflict issue with respect to the tasks on Days 1, 2, and 5, while there is a conflict issue with respect to the tasks on Days 3, 4, 6, and 7; these issues need to be automatically adjusted through the model.
4.2. Experiment and Discussion on the Automatic Coordination of Resource Planning
This paper substitutes the data in
Table 1 into the new model for calculation and obtains the optimized results of the model. Then, the optimized results are visualized to form
Figure 6.
Figure 6 shows that the model has successfully coordinated the resource plan through continuous attempts. All tasks do not exceed the maximum capacity of the device. Comparing
Figure 6 and
Figure 5, this paper finds that the model moved Project 3 from Day 3 to Day 1 and from Day 4 to Day 2. The main reason for this result is that our reward function stipulates that the fewer tasks exceed the capacity and the greater the total time difference of the tasks, the greater the reward. Similarly, Project 3 on Day 6 was moved to Day 5. In subsequent calculations, the paper finds that the model converged poorly. Therefore, we add a control function for the total time difference to the reward function. The control function calculates the impact of each adjustment on the total time difference of the corresponding project task; if the impact can be tolerated, a reward is given, and if it cannot, a punishment is given. We regard this punishment as “hitting a wall”. Therefore, the results in
Figure 6 are closely related to the setting of the reward function. The feasibility of the model is verified through the experiments in this paper.
In the process of continuous model calculation, this paper discovers another advantage of intelligent algorithms. Managers often have great confidence in their own judgments, with the main goal of solving practical problems. Moreover, intelligent algorithms may achieve leaner results. Therefore, this paper analyzes and processes resource plans with single-point conflicts for experienced managers and intelligent models, forming
Figure 7;
Figure 7A shows the target resource plan;
Figure 7B shows the results of the manager’s analysis and processing, and
Figure 7C shows the results of the intelligent algorithm’s analysis and processing. This paper finds that managers achieve a balance in resource planning by cutting peaks and filling valleys. Before seeing the results of the intelligent algorithm, managers believe that their experience plays the decisive role. However, the intelligent algorithm yields even better results, because the reward function in this paper requires the sum of the total time differences of the project tasks to be optimal. Therefore, the intelligent algorithm obtains the results in
Figure 7C. These results not only achieve a balance in resource planning but also improve the total time difference between project tasks compared to
Figure 7B. The greater the total time difference, the stronger and more stable the anti-interference ability of the project plan.
From the above, it can be seen that managers may be influenced by various factors, such as personal ability, work environment, and work status, when balancing resources, resulting in inadequate consideration. In other words, most of the decisions made by managers are feasible solutions rather than optimal solutions. To achieve optimal solutions, the paper proposes the PERT-RP-DDPGAO algorithm, which allows machines to find the optimal planning and coordination solution through self-learning. The PERT-RP-DDPGAO algorithm converts the impact of plan adjustments and the managers’ matching of resources and plans into time parameters, resource planning matrices, and reward functions. With this method, the computer can obtain the optimal solution through rigorous calculation. Of course, computers are also affected by the accuracy of input data, the operating environment, system interfaces, and other factors, which can bias the results and introduce risk. Therefore, the output results require manual verification by managers, and only verified results are used in actual production. This computer-aided production scheduling method enables managers to make more unified and scientific decisions, gradually reducing human uncertainty. This is also a requirement for the development of modern enterprises.
The final core of the PERT-RP-DDPGAO algorithm lies in the DDPG part, which is the key component implementing the planning and coordination function. To further validate the engineering application value of DDPG, we conduct comparative experiments between the DDPG algorithm, the DQN algorithm, the greedy search algorithm, and the random search algorithm. This paper first chooses the DQN algorithm because DDPG shares some principles with DQN but adds the actor–critic algorithm as its basic framework; we hope to further verify the superiority of DDPG in continuous control problems on a new experimental object and to confirm whether the algorithm suits resource planning and coordination scenarios, for example, whether its response speed is better. Heuristic algorithms are then included in the comparison to verify the advantages of reinforcement learning in handling such control problems.
This paper compares the convergence and response speed of the four algorithms mentioned above, forming
Figure 8 and
Figure 9.
The comparison results show that the DDPG algorithm outperforms the other algorithms in terms of convergence and response, and the DQN algorithm is superior to the heuristic algorithms. First, this paper briefly explains the DQN (deep Q-network) algorithm, which can be used to solve continuous state space problems. The uniqueness of the DQN algorithm lies in experience replay and the target network. When training the Q-network, experience replay breaks the correlation between data and makes the samples approximately independently distributed, thereby reducing the variance of parameter updates and improving the convergence speed. The use of the target network alleviates the overestimation problem to a certain extent and increases the stability of learning. In the application scenario of resource planning, DDPG has more advantages than DQN; in other words, DDPG has its own strengths in continuous control problems.
The response time is an important indicator for measuring algorithm efficiency. This paper extracts the average response times of the DDPG algorithm, the DQN algorithm, the greedy search algorithm, and the random search algorithm, shown in
Table 2. This table shows that the average response time of the DDPG algorithm is shorter than that of the other algorithms: 3.0% shorter than that of the DQN algorithm, 8.4% shorter than that of the greedy search algorithm, and 19.7% shorter than that of the random search algorithm.
The deep deterministic policy gradient algorithm is an optimization of DQN that combines the idea of the deterministic policy gradient algorithm and adopts a model-free deep reinforcement learning approach. The DDPG architecture uses dual neural networks: both the policy function and the value function have an online network and a target network. This dual structure makes the learning process more stable and accelerates convergence. Moreover, the DDPG algorithm introduces an experience replay mechanism: the experience samples generated by the interaction between the actor and the environment are stored in the experience pool, and batches of samples are extracted for training, similar to the experience replay mechanism of DQN. This mechanism eliminates the correlation and dependency of samples, facilitating convergence. This is also why the DDPG algorithm achieves good results in the comparative analysis.
5. Conclusions
This paper proposes a new intelligent scheduling model, PERT-RP-DDPGAO. This model decomposes project plans using PERT technology. After the project plan is decomposed, a resource plan is formed, and the resource plan is trained as an intelligent agent in the DDPG model to achieve automatic coordination of multiple projects and multilevel plans in an enterprise.
The PERT-RP-DDPGAO model adds a feedback environment to traditional PERT techniques, improving the robustness of traditional algorithms. This paper studies resource planning, which has received little attention in scheduling algorithm research. For the first time, the resource plan is transformed into a resource plan matrix through the matrix formula, and the control action of the resource plan is simulated through matrix operation. After the resource plan is matrixed, the DDPG algorithm is used to achieve automatic coordination of the resource plan. After analysis, the results of automatic coordination have practical managerial implications and potential for engineering applications.
Finally, this paper conducts comparative experiments on the DDPG part of the new algorithm against the DQN algorithm, the random search algorithm, and the greedy search algorithm, verifying that the DDPG algorithm is superior to the others in terms of convergence and response speed: its response time is 3.0% shorter than that of the DQN algorithm, 8.4% shorter than that of the greedy search algorithm, and 19.7% shorter than that of the random search algorithm.
The PERT-RP-DDPGAO algorithm considers engineering applications, simplifies the complexity of the production process, and focuses mainly on the core planning parameters of project management. The core input parameters of the algorithm are the planned time, the time parameters, and the resource capacity. The planned time and time parameters are derived from the PERT algorithm, while the resource capacity is obtained through manual entry. Therefore, applying the algorithm requires enterprises to be able to track product plans; in practice, enterprises with ERP and MES systems can use this algorithm. However, the algorithm still has the following limitations. First, it requires enterprises to have basic information technology capabilities, such as ERP or MES systems that can provide real-time feedback on product progress. Second, the algorithm has only been applied to machining equipment, and the generalization of the resource planning matrix still needs to be improved. Third, the application of the algorithm’s results needs to be combined with management processes. Fourth, the algorithm does not address risk management.