1. Introduction
The cutting stock problem (CSP) appears in many practical application scenarios, such as cutting steel sheets, wood, electric wires, and paper rolls [
1]. There is a demand for cutting stock in many industries, such as aerospace, automobile, shipbuilding, energy, construction, and machinery manufacturing [
2,
3]. According to the relevant literature [
4,
5], cutting stock optimization can save some companies 30% of their costs and reduce greenhouse gas emissions, which makes it a method of green manufacturing. According to the dimensions of raw materials and pieces [
6], the CSP can be divided into the one-dimensional cutting stock problem (1DCSP), two-dimensional cutting stock problem (2DCSP), and three-dimensional cutting stock problem (3DCSP). Among them, as the research basis of the CSP, the 1DCSP refers to cutting raw materials of known length into pieces of different lengths according to the required quantity. Previous research showed that the 1DCSP belongs to the class of NP-hard problems [
7], for which no polynomial-time exact algorithm is known. As the scale of the problem increases, the calculation time increases dramatically, which poses great challenges for algorithm design. Dyckhoff et al. [
8] elaborated on the similarity of the 1DCSP and the one-dimensional bin packing problem (1DBPP) by comparing and classifying them. Some scholars solved the 1DCSP by using a one-dimensional bin packing algorithm model. Brandão et al. [
9] proposed a general formulation on the basis of the bin packing problem (BPP), which solved a large number of CSP and BPP instances. Chang Yang et al. [
10] drew on the idea of the best-fit decreasing (BFD) algorithm for solving the BPP and proposed a heuristic cutting algorithm based on multi-branch tree traversal, which achieved good results.
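To make the BFD idea mentioned above concrete, the following is a minimal sketch of the plain best-fit decreasing rule applied to a single-specification 1DCSP. It is not the multi-branch tree algorithm of Chang Yang et al.; the function name and the toy data are assumptions introduced here for illustration only.

```python
def best_fit_decreasing(piece_lengths, stock_length):
    """Assign pieces to stocks with the best-fit decreasing (BFD) rule.

    Returns a list of cutting patterns, one list of piece lengths per stock.
    """
    remaining = []  # leftover length of each opened stock
    patterns = []   # pieces cut from each opened stock
    for piece in sorted(piece_lengths, reverse=True):
        if piece > stock_length:
            raise ValueError(f"piece {piece} is longer than the stock")
        # Best fit: the opened stock with the smallest leftover that still fits.
        best = None
        for i, left in enumerate(remaining):
            if left >= piece and (best is None or left < remaining[best]):
                best = i
        if best is None:
            # No opened stock can hold the piece, so a new stock is opened.
            remaining.append(stock_length - piece)
            patterns.append([piece])
        else:
            remaining[best] -= piece
            patterns[best].append(piece)
    return patterns


# Toy data (hypothetical): six pieces cut from stocks of length 10.
demo_patterns = best_fit_decreasing([5, 4, 4, 3, 2, 2], stock_length=10)
```

On this toy instance the rule opens three stocks although two would suffice (5 + 3 + 2 and 4 + 4 + 2), which illustrates why such constructive heuristics do not always reach a high-quality solution.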
Currently, scholars have proposed various optimization algorithms for the 1DCSP, such as linear programming, the simplex method, dynamic programming, heuristic algorithms, genetic algorithms, simulated annealing algorithms, and evolutionary algorithms [
11,
12,
13,
14,
15,
16]. Among them, the genetic algorithm performs comparatively well, which indicates that the balance between global search and local convergence is important for optimizing results. Shenglan Zhu [
17] not only designed a genetic algorithm with fixed-length real coding to solve the 1DCSP with a single specification and with multiple specifications, but also set penalties for infeasible solutions with individual fitness greater than 1. This method achieved good results. For the 1DCSP under a random environment, Cui Y et al. [
18] and Junyan Ma et al. [
19] proposed a heuristic algorithm based on column generation to solve the problem of available surplus materials in the 1DBPP. Belov G et al. [
20] and Jingjing Cao et al. [
21] studied the 1DCSP using the framework of sequential value correction (SVC). In recent research on the 1DBPP [
22,
23,
24], some researchers mainly improved meta-heuristics and heuristic algorithms based on the utilization of usable leftover raw materials, which consists of building a cutting pattern by sorting the items of even or odd length in descending order. However, for the large-scale 1DCSP, there are generally millions of feasible cutting patterns, so traditional algorithms solve the problem inefficiently. Moreover, heuristic and improved heuristic algorithms need rules to be designed for specific cutting stock problems, and designing such rules requires highly skilled designers. Although the heuristic algorithm can obtain a wide range of solutions, it cannot always obtain a high-quality solution. In addition, in actual production, cutting for the 1DCSP still relies mostly on experience because there are few simple, practical, and versatile methods to solve the large-scale, multi-batch 1DBPP [
25].
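The growth in the number of cutting patterns noted above can be illustrated with a short counting sketch. It counts the non-empty combinations of piece lengths that fit into one raw material, under the simplifying assumptions that lengths are positive integers, that each length may be repeated without a demand limit, and that the order of cuts is irrelevant. The function name and the example lengths (including the stock length of 12000) are hypothetical.

```python
def count_cutting_patterns(piece_lengths, stock_length):
    """Count non-empty combinations of piece lengths that fit into one stock.

    Each distinct length may be used any number of times (demand limits are
    ignored), and the order of cuts within a stock does not matter.
    """
    ways = [0] * (stock_length + 1)  # ways[c]: combinations with total length c
    ways[0] = 1                      # the empty pattern
    for length in sorted(set(piece_lengths)):
        for total in range(length, stock_length + 1):
            ways[total] += ways[total - length]
    return sum(ways) - 1             # exclude the empty pattern


# Hypothetical comparison: coarser versus finer sets of piece lengths.
few_types = count_cutting_patterns(range(1000, 5001, 1000), 12000)
many_types = count_cutting_patterns(range(1000, 5001, 500), 12000)
```

Comparing `few_types` with `many_types` shows how quickly the pattern count rises as more piece specifications are added, which is the difficulty that pattern-enumerating methods face on large instances.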
Machine learning, which has emerged in recent years, has been widely used in many fields. Amazing achievements in scheduling [
26,
27] and path optimization [
28,
29,
30,
31] reveal an interesting combination between artificial intelligence and operations research. Most machine learning methods for solving combinatorial optimization problems focus on learning to construct heuristic algorithms. Solutions can be generated under a given set of input nodes. These methods are often represented by sequences [
32]. The graph neural network [
33] and attention mechanism [
34] produce high-quality solutions. However, the construction method requires additional algorithms to obtain advanced performance, such as beam search, classical improvement heuristics, and random sampling. The solution to some cutting stock problems based on reinforcement learning depends on its learning rules and heuristic algorithms. Anselmo R et al. [
35] not only proposed a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process, but also proposed a heuristic method based on reinforcement learning. The results showed that the average cost could be up to 80% lower than the cost obtained by a myopic policy. The previous work [
36] also shows that the combination of reinforcement learning search and heuristic algorithms can achieve certain effects in solving the cutting stock problem, but there are certain limitations, such as the randomness of the search. As a mainstream method of machine learning, deep reinforcement learning (DRL) has been successfully applied to image analysis [
37], video classification [
38], intelligent translation [
39], and other fields. By evaluating and comparing the results of the algorithm, it is found that DRL has the potential for good scalability, generalization, and versatility. The pointer network model proposed by Vinyals O et al. [
40] was the first modern deep model for combinatorial optimization problems and solved the traveling salesman problem (TSP) and knapsack problem. Based on the pointer network, Bello I et al. [
41] used reinforcement learning to train the model on sequence problems, which not only saved computational costs, but also obtained near-optimal solutions for the large-scale TSP. Hao Jie [
42] used DRL to study the dynamic TSP and obtained good results. Lombardi M et al. [
43] used machine learning and deep learning to learn heuristic algorithms and successfully solved combinatorial optimization problems. Some researchers [
44,
45] solved the TSP with supervised learning and reinforcement learning, which achieved certain results. Hang Zhao et al. [
46] solved the online 3D bin packing problem by describing this variant of the 3D bin packing problem as a constrained Markov decision process and proposed an effective and easy-to-implement constrained deep reinforcement learning method under the actor-critic framework.
At present, there is no research using the DRL method to solve the 1DCSP, but some researchers are trying to solve the problem using neural networks aimed at CSP characteristics. Kang M et al. [
47] tried to solve the 1DCSP using an artificial neural network and achieved certain results. AR De et al. [
48] solved the 1DBPP by comparing the augmented neural network (AugNN) metaheuristic with the minimum bin slack (MBS) heuristic. These algorithms ignore the advantage of using deep networks to construct solutions autoregressively. They are limited to manually designed decision-making networks and heuristic algorithms, need solution rules designed separately for each problem, and ignore the shared underlying patterns and self-learning optimization available in cutting stock instances. Research on the 1DCSP based on DRL is not only an early attempt to solve the cutting stock problem with deep networks, but also an extension and continuation of existing research achievements. Further exploration of DRL-based cutting stock algorithms can not only provide new ideas for online solutions to cutting stock problems, but also provide a new reference for solving more combinatorial optimization problems, and therefore has great theoretical significance and application potential.
In this paper, a one-dimensional wire (that is, a unified name for raw materials such as pipes and profiles in cutting stock) is taken as the research object, and a DRL-based solution algorithm is proposed for the single-specification cutting stock problem (that is, the raw materials available for cutting have only one length and are unlimited in number, while the lengths and numbers of the wire pieces to be cut are given). A pointer network with an encoder-decoder structure is taken as the policy network, and the process of selecting the cutting sequence of the pieces to be cut is modeled as a Markov decision process. Then, the network parameters are trained using a reinforcement learning algorithm. During training, the policy network is continuously optimized with respect to the reward value to obtain a satisfactory solution. Finally, the trained model is tested on cutting stock instances to demonstrate the effectiveness of the algorithm model.
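The sequence-selection view described above can be sketched as a small decision process in plain Python. This is only an illustration under stated assumptions: the pointer network is replaced by an arbitrary `policy` callable, pieces are placed with a next-fit rule (a new raw material is opened whenever the current one cannot hold the chosen piece), and the terminal reward is the raw material utilization rate. The names `step`, `rollout`, and `first_piece_policy` are hypothetical and do not come from the paper.

```python
def step(state, action):
    """Cut the piece at index `action` of state['todo'] and return the next state."""
    todo = list(state["todo"])
    piece = todo.pop(action)
    if piece > state["stock_length"]:
        raise ValueError(f"piece {piece} is longer than the stock")
    used = list(state["used"])
    # Assumed next-fit rule: open a new stock when the current one is too full.
    if not used or used[-1] + piece > state["stock_length"]:
        used.append(0)
    used[-1] += piece
    return {"stock_length": state["stock_length"], "todo": todo, "used": used}


def rollout(piece_lengths, stock_length, policy):
    """Run one episode; return the used length per stock and the final reward."""
    if not piece_lengths:
        raise ValueError("at least one piece is required")
    state = {"stock_length": stock_length, "todo": list(piece_lengths), "used": []}
    while state["todo"]:
        state = step(state, policy(state))
    # Terminal reward: total piece length divided by total stock length consumed.
    reward = sum(state["used"]) / (stock_length * len(state["used"]))
    return state["used"], reward


def first_piece_policy(state):
    """Stand-in policy that always selects the first remaining piece."""
    return 0


good_used, good_reward = rollout([6, 4, 5, 5], 10, first_piece_policy)
poor_used, poor_reward = rollout([6, 5, 4, 5], 10, first_piece_policy)
```

The two rollouts contain the same pieces in a different order and end with different rewards, which is the signal a learned policy would exploit when choosing the cutting sequence.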
4. Calculation Experiment and Analysis
The Python programming language is used in this study, and the neural network construction method based on PyTorch [
56] is adopted. The DRL-CSP algorithm is tested on a computer with a 2.30 GHz AMD Ryzen 7 3750H CPU with 4 cores and 16 GB of RAM. The training set in this paper is obtained by the cutting generation method, and the test set is composed of instance data and randomly generated data. The performance of the DRL-CSP algorithm is tested on 1DCSP instances from Peiyong Li et al. [
57] and Xianjun Shen et al. [
58]. These instances are real steel cutting stock data from enterprises, as shown in
Table 1 and
Table 2. Furthermore, randomly generated large-scale instances of more than 1000 pieces to be cut are also used to test the performance of the algorithm model, as given in
Table 3. In the experiment, the utilization rate of raw materials is taken as a reward. That is, the length consumption and residual length of raw materials are within the scope of optimization. Each piece
L with a fixed specification can be cut into 500 small pieces at most. The minimum length of piece
lmin is set to 1000, and the maximum length
lmax is set to 5000. The number of samples is set to 500, and the number of training steps is set to 50. Our batch size
B is set to 32, and the hidden dimension of the LSTM cell is set to 32. The initial learning rate of the model is set to 1e-3, and the discount factor of the reward is set to 1. The training takes approximately 90 min. During training, the loss value of the network changes with the training epoch, as shown in
Figure 4.
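For reference, the settings quoted above can be collected in one configuration object, together with a generator for random test instances. This is a sketch only: the field names are hypothetical, the uniform integer distribution of piece lengths is an assumption (the text does not state how the random data are drawn), and the cutting generation method used for the training set is not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hyperparameters quoted in the text; the field names are hypothetical."""
    min_piece_length: int = 1000   # l_min
    max_piece_length: int = 5000   # l_max
    num_samples: int = 500
    training_steps: int = 50
    batch_size: int = 32           # B
    hidden_dim: int = 32           # hidden dimension of the LSTM cell
    learning_rate: float = 1e-3    # initial learning rate
    discount_factor: float = 1.0   # discount factor of the reward


def random_instance(config, num_pieces, seed=None):
    """Draw integer piece lengths uniformly from [l_min, l_max] (assumed)."""
    rng = random.Random(seed)
    return [
        rng.randint(config.min_piece_length, config.max_piece_length)
        for _ in range(num_pieces)
    ]


config = ExperimentConfig()
# A large-scale instance with more than 1000 pieces, as in the S3 setting.
large_instance = random_instance(config, num_pieces=1200, seed=0)
```

Fixing the seed makes a generated instance reproducible, which helps when the same random data are reused to compare several algorithms.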
4.1. Experimental Results
Table 4 shows the comparison between our experimental results and the algorithm results in related literature [
57,
58,
59]. The specific cutting stock results of instances S1 and S2, based on the DRL-CSP algorithm, can be found in
Table 5 and
Table 6. Through the analysis of
Table 4, the DRL-CSP algorithm proposed in this paper achieves results similar to those of the current leading cutting stock algorithms on instance S1, with 28 specifications of pieces, and instance S2, with 35 specifications of pieces. In instance S1, the DRL-CSP algorithm consumes 24 raw materials, which is one more than the classical algorithms (HGA [
57], AGAPSO [
58], DEPES [
59], PES [
59]) and one more than the theoretical lower bound. In addition, the average utilization rate of raw materials reaches 94.56%, and the average utilization rate after removing the maximum remaining material increases to 96.43%. It is approximately 3% less than the theoretical optimal solution, which belongs to the category of a satisfactory solution. The length of the maximum remaining material obtained by the DRL-CSP algorithm is 7587 mm, which is 849 mm longer than that obtained by the HGA-based algorithm. This result is more conducive to the reuse of residual material. In instance S2, the DRL-CSP algorithm consumes 15 raw materials, the same number as the AGAPSO algorithm. The average utilization rate and the average utilization rate after removing the maximum remaining material are 90.99% and 95.68%, respectively, which follow a pattern similar to the AGAPSO results. In addition, the length of the maximum remaining material reaches 74.6% of the length of the raw material, which benefits the reuse of the residual material. In instance S3, an average utilization rate of 92.41% after removing the maximum remaining material is obtained for 1497 pieces to be cut, which exceeds the 90% utilization threshold for practical applications of the 1DCSP [
17].
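The quantities compared above can be computed with a few helper functions. The sketch below assumes that the utilization rate "with the removal of the maximum remaining material" is the average utilization over all raw materials except the one with the longest leftover. That reading is an assumption, although it appears roughly consistent with the S2 figures: with 15 raw materials at 90.99% and one raw material 74.6% unused, it gives about 95.7%. The function names and toy data are hypothetical.

```python
def material_lower_bound(piece_lengths, stock_length):
    """Smallest number of stocks that could hold the total piece length."""
    total = sum(piece_lengths)
    return -(-total // stock_length)  # integer ceiling division


def average_utilization(used_lengths, stock_length):
    """Total length cut into pieces divided by total stock length consumed."""
    return sum(used_lengths) / (stock_length * len(used_lengths))


def utilization_without_max_leftover(used_lengths, stock_length):
    """Average utilization after dropping the stock with the longest leftover."""
    if len(used_lengths) < 2:
        raise ValueError("at least two stocks are required")
    kept = sorted(used_lengths)[1:]  # the least-used stock has the longest leftover
    return average_utilization(kept, stock_length)


# Toy data (hypothetical): used length on each of four stocks of length 10.
used = [10, 9, 9, 4]
overall = average_utilization(used, 10)
without_max = utilization_without_max_leftover(used, 10)
```

Reporting both values separates the material that is actually wasted from a single long leftover that can be returned to inventory and reused.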
By analyzing
Table 5 and
Table 6, a 100% utilization rate of raw materials appears in two data sets based on the DRL-CSP. In addition, the length of the maximum remaining material of instance S1 is 7587, and the residual material of most raw materials is less than 100, with almost no waste of residual material. This indicates that the cutting stock scheme of this instance has been effectively optimized. It is worth mentioning that in all instance calculations based on the DRL-CSP algorithm, the operation time is less than 1 s, which indicates that the algorithm model has high efficiency. Through the comparison of test results, it can be seen that the algorithm model of the DRL-CSP proposed in this paper has good generalization performance and solving efficiency, which can realize the cutting stock of wire pieces of various sizes in a very short time and obtain satisfactory solutions, indicating great practical application potential.
4.2. Analysis and Discussion
The above experimental results show that the DRL-CSP algorithm can not only efficiently solve the 1DCSP for 82 instances from 3 data sets, but also obtain satisfactory solutions comparable with those of the classical algorithms. The algorithm not only meets the requirement of high raw material utilization, but also solves large-scale cutting stock instances robustly, which is related to the learning and optimization mode of the DRL model. The DRL model can automatically learn cutting rules from the shared underlying patterns and data updates of wire piece cutting schemes, avoiding the solution defects caused by the manual intervention typical of heuristic methods. Therefore, the DRL-CSP algorithm has stronger generalization performance and versatility.
Figure 4 shows how the loss function changes as the number of epochs increases during reinforcement learning training. As the number of epochs increases, the loss curve decreases and its fluctuation tends to converge. A possible reason is that, as training proceeds, the model is repeatedly adjusted through backpropagation and gradient-based optimization. Meanwhile, the updates of the network fluctuate in the direction of more accurate prediction.
In the 1DCSP solution of instance S1, the result of the DRL-CSP algorithm falls into the category of a satisfactory solution. However, the HGA, AGAPSO, DEPES, and PES algorithms achieve an average utilization rate of 97.67%, and the AGAPSO, DEPES, and PES algorithms leave more remaining material. This means that the DRL-CSP algorithm does not perform as well as certain classical algorithms in some instances, which is related to the different calculation principles of the algorithms. The DRL-CSP algorithm uses the principles of deep learning and reinforcement learning and obtains a stable algorithm model by training on a large number of data sets. During training, the settings of the training set and network parameters have a great impact on model performance. The classical heuristic algorithm can approximate the solution of specific instances through manually set solution rules and has a certain solving efficiency. Therefore, for some specific instances, the classical heuristic algorithm can obtain a better solution. The two approaches differ in solving principles, algorithm design, and solving effects on different problem instances. Constructing a more efficient network structure and optimizing the network parameters and training sets may further improve the performance of the DRL model. In general, the DRL-CSP algorithm achieves results similar to those of the classical cutting stock algorithms on instance S1, with 28 specifications of pieces, and instance S2, with 35 specifications of pieces, which represents a certain advance. Research on DRL-based cutting stock algorithms can not only provide new ideas for online solutions to cutting stock problems, but also provide a new reference for solving more combinatorial optimization problems, and therefore has great theoretical significance and application potential.
In the 1DCSP solution of instance S3, an average utilization rate of 92.41% after removing the maximum remaining material is achieved for the 1497 pieces to be cut, and the remaining material is long. However, the utilization rate of more than 95% obtained for instances S1 and S2 after removing the maximum remaining material is not reached. A possible reason for this result is that the cutting distribution of the raw material is relatively uniform, and the gradient of residual materials is not obvious. The result further shows that the selection of sample data, the establishment of the mathematical model, and the distribution of case data have a great impact on the performance of the DRL model in solving large-scale CSPs.