Article

Solving One-Dimensional Cutting Stock Problems with the Deep Reinforcement Learning

1
School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2
State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
*
Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 1028; https://doi.org/10.3390/math11041028
Submission received: 10 January 2023 / Revised: 16 February 2023 / Accepted: 16 February 2023 / Published: 17 February 2023
(This article belongs to the Section Mathematics and Computer Science)

Abstract
It is well known that the one-dimensional cutting stock problem (1DCSP) is a combinatorial optimization problem with nondeterministic polynomial (NP-hard) complexity. Heuristic and genetic algorithms are the two main approaches used to solve the cutting stock problem (CSP), but they are limited to small problem scales and often yield solutions inefficiently. To improve the stability and versatility of the solution, a mathematical model is established with the optimization objectives of minimum raw material consumption and maximum remaining material length. Meanwhile, a novel algorithm based on deep reinforcement learning (DRL) is proposed in this paper. The algorithm consists of two modules, each designed for a different function. First, a pointer network with an encoder-decoder structure is used as the policy network to exploit the underlying patterns shared by 1DCSP instances. Second, a model-free reinforcement learning algorithm is used to train the network parameters and optimize the cutting sequence. The experimental data show that the one-dimensional cutting stock algorithm model based on deep reinforcement learning (DRL-CSP) can obtain approximately satisfactory solutions on 82 instances of 3 data sets in a very short time, and shows good generalization performance and practical application potential.

1. Introduction

CSP appears in many practical application scenarios, such as cutting steel sheets, wood, electric wires, and paper rolls [1]. There is a demand for cutting stock in many industries, such as aerospace, automobile, shipbuilding, energy, construction, and machinery manufacturing [2,3]. According to the relevant literature [4,5], cutting stock optimization can save 30% of the cost for some companies and reduce greenhouse gas emissions, which makes it a method of green manufacturing. According to the dimensions of raw materials and pieces [6], CSP can be divided into the one-dimensional cutting stock problem (1DCSP), two-dimensional cutting stock problem (2DCSP), and three-dimensional cutting stock problem (3DCSP). Among them, as the research basis of the CSP, the 1DCSP refers to cutting raw materials of known length into pieces of different lengths according to the required quantities. Previous research showed that the 1DCSP belongs to the class of NP-hard problems [7], for which an exact solution cannot be obtained in polynomial time. As the scale of the problem increases, the calculation time increases dramatically, which brings great challenges to the design of the algorithm. Dyckhoff et al. [8] elaborated on the similarity of the 1DCSP and the one-dimensional bin packing problem (1DBPP) by comparing and classifying them. Some scholars have solved the 1DCSP by using one-dimensional packing algorithm models. Brando et al. [9] proposed a general formulation on the basis of the bin packing problem (BPP), which solved a large number of case-based CSPs and BPPs. Chang Yang et al. [10] drew on the idea of the best-fit decreasing (BFD) algorithm for solving the BPP and proposed a heuristic cutting algorithm based on multi-branch tree traversal, which achieved good results.
Currently, scholars have proposed various optimization algorithms for the 1DCSP, such as linear programming, the simplex method, dynamic programming, heuristic algorithms, genetic algorithms, simulated annealing algorithms, and evolutionary algorithms [11,12,13,14,15,16]. Among them, the genetic algorithm performs better, which indicates that the balance between global search and local convergence is important for optimizing results. Shenglan Zhu [17] not only designed a genetic algorithm with fixed-length real coding to solve the 1DCSP of single and multiple specifications, but also set penalties for infeasible solutions with individual fitness greater than 1. This method achieved better results. For the 1DCSP in a random environment, Cui Y et al. [18] and Junyan Ma et al. [19] proposed a heuristic algorithm based on column generation to solve the problem of available surplus materials in the 1DBPP. Belov G et al. [20] and Jingjing Cao et al. [21] studied the 1DCSP using the framework of sequential value correction (SVC). In recent research on the 1DBPP [22,23,24], some researchers mainly improved metaheuristic and heuristic algorithms based on the utilization of usable leftover raw materials, which consists of building a cutting pattern by sorting the items of even or odd length in descending order. However, for the large-scale 1DCSP, there are generally millions of cutting patterns, so traditional algorithms solve the problem with low efficiency. Moreover, heuristic and improved heuristic algorithms require rules designed for specific cutting stock problems, and designing such rules demands highly professional skills. Although a heuristic algorithm can obtain a wide range of solutions, it cannot always obtain a high-quality solution.
In addition, in actual production, the 1DCSP is mostly solved by experience because there are few simple, practical, and versatile methods for large-scale, multi-batch cutting stock [25].
Machine learning, which has emerged in recent years, has been widely used in many fields. Impressive achievements in scheduling [26,27] and path optimization [28,29,30,31] reveal an interesting combination of artificial intelligence and operations research. Most machine learning methods for solving combinatorial optimization problems focus on learning to construct heuristic algorithms. Solutions can be generated for a given set of input nodes, and these methods are often represented by sequences [32]. Graph neural networks [33] and attention mechanisms [34] produce high-quality solutions. However, the construction method requires additional algorithms to obtain advanced performance, such as beam search, classical improvement heuristics, and random sampling. Some solutions to cutting stock problems based on reinforcement learning depend on learned rules and heuristic algorithms. Anselmo R et al. [35] not only proposed a formulation of the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process, but also proposed a heuristic method based on reinforcement learning. The results showed that the average cost could be up to 80% lower than the cost obtained by a myopic policy. Previous work [36] also shows that the combination of reinforcement learning search and heuristic algorithms can be effective in solving the cutting stock problem, but there are certain limitations, such as the randomness of the search. As the mainstream method of machine learning, DRL has been successfully applied to image analysis [37], video classification [38], intelligent translation [39], and other fields. By evaluating and comparing the results of these algorithms, it is found that DRL has the potential for good scalability, generalization, and versatility. The pointer network model proposed by Vinyals O et al. [40] was the first modern deep model for combinatorial optimization problems and solved the traveling salesman problem (TSP) and knapsack problem. Based on the pointer network, Bello I et al. [41] used reinforcement learning training to solve the sequence problem, which not only saved computational costs, but also obtained the optimal solution for the large-scale TSP. Hao Jie [42] used DRL to study the dynamic TSP and obtained good results. Lombardi M et al. [43] used machine learning and deep learning to learn heuristic algorithms and successfully solved combinatorial optimization problems. Some researchers [44,45] solved the TSP with supervised learning and reinforcement learning, achieving certain results. Hang Zhao et al. [46] solved the online 3D bin packing problem by describing a variant of the 3D bin packing problem as a constrained Markov decision process and proposed an effective and easy-to-implement constrained deep reinforcement learning method under the actor-critic framework.
At present, there is no research using the DRL method to solve the 1DCSP, but some researchers have tried to solve the problem using neural networks tailored to CSP characteristics. Kang M et al. [47] tried to solve the 1DCSP using an artificial neural network and achieved certain results. AR De et al. [48] solved the 1DBPP by comparing the augmented neural network (AugNN) metaheuristic with the minimum bin slack (MBS) heuristic. These algorithms ignore the advantage of using deep networks to construct solutions autoregressively. They are limited to artificial decision-making networks and heuristic algorithms and need solution rules designed separately for each problem, while ignoring the exploitation of shared underlying patterns and self-learning optimization across cutting stock instances. The research on the 1DCSP based on DRL is not only an early attempt to solve the cutting stock problem using a deep network, but also builds on and extends existing research achievements. Furthermore, exploring cutting stock algorithms based on DRL can not only provide new ideas for online solutions to cutting stock problems, but also provide a new reference for solving more combinatorial optimization problems; the solution has great theoretical significance and application potential.
In this paper, a one-dimensional wire (that is, a unified name for raw materials of pipes and profiles to be cut) is taken as the research object, and a solution algorithm based on DRL is proposed to solve the single-specification cutting stock problem (that is, the raw materials available for cutting have only one length and unlimited quantity, while the lengths and quantities of the wire pieces to be cut are fixed). A pointer network with an encoder-decoder structure is taken as the policy network, and the process of selecting the cutting sequence of pieces to be cut is modeled as a Markov decision process. Then, the network parameters are trained using a reinforcement learning algorithm. In the training process, the policy network and reward value are continuously optimized to obtain a satisfactory solution. Finally, the trained model is tested on cutting stock instances to prove the effectiveness of the algorithm model.

2. Mathematical Model

2.1. Problem Statement

The 1DCSP can be divided into a single-specification cutting stock problem and a multi-specification cutting stock problem according to raw material specifications. According to the different objective functions of optimization, the 1DCSP can be divided into three different types: the problem with the least remaining materials as the optimization objective, the problem with the minimum number of raw materials consumed as the optimization objective, and the problem with the least remaining materials and the minimum number of raw materials consumed as the optimization objective. Common mathematical models for the 1DCSP of a single specification include the Kantorovich model [49], the Gilmore–Gomory model [50], and the Haessler model [51,52]. These models played an important role in solving a single optimization objective and achieved good results.
In this paper, the wire cutting stock problem is mainly aimed at the cutting optimization of enterprises that cut wires from standard-length raw materials. The single-specification CSP is mainly adopted, with the utilization of surplus materials taken into account. Therefore, our research takes the minimum raw material consumption and the maximum remaining material length of standard raw materials as the optimization objectives to establish a mathematical model. The objective function and constraints are shown in Formulae (1) and (2), respectively.
$$ U = \min \left( \sum_{i=1}^{k} t_i - t_{\max} \right) \quad (1) $$

$$ \text{s.t.} \quad
\begin{cases}
L_i = L, & i = 1, 2, \dots, k \\
t_i = L_i - \sum_{j=1}^{n} l_j a_{ij} \ge 0, & i = 1, 2, \dots, k \\
d_j = \sum_{i=1}^{k} a_{ij}, & j = 1, 2, \dots, n \\
t_{\max} = \max \{ t_i \mid i = 1, 2, \dots, k \}
\end{cases} \quad (2) $$
where U is the objective function, s.t. denotes the constraints, n is the number of wire piece categories to be cut, the required quantity of each category is dj (j = 1, 2, …, n), lj represents the length of piece category j, and aij represents the number of pieces of category j cut from raw material i (i = 1, 2, …, k). L is the standard length of raw materials, k is the number of raw materials consumed, ti is the remaining length of raw material i, and Li represents the length of raw material i. To maximize the utilization of surplus material, the maximum surplus material tmax should be as long as possible. Therefore, the purpose of the objective function is to minimize the sum of all remaining material lengths ti after removing the maximum remaining material tmax.
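To make the objective concrete, the following sketch evaluates Formula (1) for a candidate cutting plan given the matrix aij; the toy plan and piece lengths below are hypothetical illustrations, not data from the paper.

```python
def evaluate_plan(L, piece_lengths, plan):
    """Evaluate a cutting plan against the objective of Formula (1).

    L             -- standard raw-material length (single specification)
    piece_lengths -- l_j, length of each piece category j
    plan          -- a_ij as a list of rows, one per raw material used;
                     row i gives the number of pieces of category j cut
                     from raw material i

    Returns (U, leftovers, t_max): the objective value, the leftover
    t_i of every raw material, and the maximum leftover.
    """
    leftovers = []
    for row in plan:
        used = sum(a * l for a, l in zip(row, piece_lengths))
        t_i = L - used
        assert t_i >= 0, "cutting pattern exceeds raw-material length"
        leftovers.append(t_i)
    t_max = max(leftovers)
    # Minimize total leftover after excluding the single longest leftover.
    U = sum(leftovers) - t_max
    return U, leftovers, t_max

# Hypothetical example: two raw materials of length 1000,
# piece categories of length 300 and 100.
U, leftovers, t_max = evaluate_plan(1000, [300, 100], [[3, 1], [0, 9]])
```

Here the first raw material is cut without waste and the second keeps a reusable leftover of 100, so the objective U is 0.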

2.2. Problem Instance

Different solution definitions lead to different cutting stock schemes, directly affecting the cutting results. To better illustrate the mathematical model proposed in this paper, we distinguish the advantages and disadvantages of cutting stock schemes, as shown in Figure 1. Assume that the specifications of wire pieces are 300, 100, and 99, and the required quantities are 9, 903, and 67, respectively. All raw materials have a standard length of 1000 with an unlimited quantity. In Figure 1a, the number of raw materials consumed by cutting stock is 100, and the total length of the remaining materials is 367 (2 × 20 + 109 × 3), of which the length of the maximum remaining material is 109, which is represented by different color intervals. The total length of the consumed raw material is 99,633 (100 × 1000 − 367), and the sum of the length of the remaining material after the removal of the maximum remaining material is 40 (2 × 20). In Figure 1b, the number of raw materials consumed by cutting stock is also 100, and the total length of the remaining materials is 367 (5 × 10 + 105 × 2 + 107 × 1), of which, the length of the maximum remaining material is 107. The total length of the raw material consumed is also 99,633 (100 × 1000 − 367), and the sum of the length of the remaining material after the removal of the maximum remaining material is 260 (5 × 10 + 105 × 2). The values in Figure 1 represent the current quantity of wire raw materials, and the total quantity is 100. The symbol × indicates a multiplier, which is consistent with the symbol in the previous numerical calculation.
Comparing the two schemes, with the same raw material consumption, the cutting stock scheme in Figure 1a yields a longer maximum remaining material and a smaller total length of remaining materials after removing the maximum remaining material. It can provide better support for reusing surplus materials, which makes it the more desirable cutting scheme.
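The leftover bookkeeping for the scheme of Figure 1b can be verified in a few lines of Python:

```python
# Leftover lengths of the 100 raw materials in the scheme of Figure 1b:
# five leftovers of 10, two of 105, and one of 107 (the rest are 0).
leftovers_b = [10] * 5 + [105] * 2 + [107] + [0] * 92

total_leftover = sum(leftovers_b)                       # 367
consumed = 100 * 1000 - total_leftover                  # 99,633
after_removing_max = total_leftover - max(leftovers_b)  # 260
```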

3. Algorithm Based on the DRL

In this section, an algorithm (DRL-CSP) based on DRL is described for solving the 1DCSP. The algorithm model is shown in Figure 2. The length sequence of the 1DCSP is used as input, and the vector representation of wire pieces is obtained after processing by the embedding layer of the neural network. Subsequently, the pointer network with an encoder and decoder structure (also known as the recurrent neural network module, RNN) is utilized to process the representation vector at each time point; then, the output is input into the probability function to obtain the cutting stock sequence. Furthermore, the network parameters are trained by reinforcement learning, and the loss of the model is adjusted by batch training to continuously optimize the reward function and obtain the optimal solution.

3.1. Policy Network Based on Pointer Network

Considering the time sequence relationship between the cutting sequences of wire pieces in the 1DCSP, we use the classical pointer network to process the prediction features of the cutting movement trajectory of wire pieces to simulate the behavior sequence of the cutting nodes of pieces. As described in the related literature [38], the output dictionary of pointer networks is variable in size, which has unique advantages for dealing with sequence problems. The structure of the pointer network is presented in Figure 3. In each time step ti, the length information of wire pieces is processed into a latent memory state sequence by the embedding layer and input to the long short-term memory (LSTM) unit in the encoder [53]. LSTM is a special RNN that can learn long-range dependencies and has a memory function for earlier sequences. That is, the input of the encoder network at time step ti is a one-dimensional embedding of li acquired through a linear transformation shared by all input pieces. In the last step of the encoder network, the unit state h_{m−1} and the output of the LSTM are provided to the decoder network. At each time step in the decoder network, the pointing mechanism is used to select the next piece to be cut. Once the next piece is selected, it is used as the input to the next decoder step. For example, in Figure 3, the output of step 2 in the decoder network is 5; then, the output l5 of step 3 in the encoder network is selected as the input of step 3 of the decoder network. The pointer to the input element is represented by u_n^m, and the hidden states of the encoder and decoder are represented by e_n and h_m, respectively. Moreover, the softmax function is used for normalization to adjust the distribution probability of important elements. The probability of output nodes can be calculated by Formulae (3) and (4).
$$ u_n^m = v^{T} \tanh \left( W [e_n ; h_m] \right) \quad (3) $$
$$ p = \mathrm{softmax}(u^m) \quad (4) $$
where v and W are the parameters to be trained. Taking Cm as the piece selected at decoding step m and L = {l1, l2, …, lj} as the set of wire pieces, the chain rule is used to estimate the conditional probability p, as shown in Formula (5).
$$ p(C_m \mid C_1, \dots, C_{m-1}, L) = \mathrm{softmax}(u^m) \quad (5) $$
In addition, as shown in Formula (6), the unselected piece node with the maximum probability is used as the target output of the prediction network when predicting the sequence of pieces to be cut. The output node is then marked so that it cannot be selected again.
$$ l = \arg\max p \quad (6) $$
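A minimal NumPy sketch of one decoding step of the pointing mechanism, Formulae (3) through (6), is given below. W and v are random placeholders rather than trained parameters, and the masking of already-cut pieces is an assumption consistent with the marking of output nodes described above:

```python
import numpy as np

def pointer_step(W, v, encoder_states, decoder_state, mask):
    """One decoding step of the pointing mechanism.

    W, v           -- trainable parameters (random placeholders here)
    encoder_states -- e_n, one hidden vector per input piece, shape (n, d)
    decoder_state  -- h_m, hidden vector of the current decoder step, shape (d,)
    mask           -- boolean array, True for pieces already cut

    Returns the probability distribution over pieces and the index of
    the next piece to cut (greedy arg-max over unselected pieces).
    """
    n = encoder_states.shape[0]
    u = np.empty(n)
    for j in range(n):
        # u_n^m = v^T tanh(W [e_n; h_m]), Formula (3)
        u[j] = v @ np.tanh(W @ np.concatenate([encoder_states[j], decoder_state]))
    u[mask] = -np.inf                  # marked pieces get zero probability
    p = np.exp(u - u.max())
    p /= p.sum()                       # softmax normalization, Formula (4)
    return p, int(np.argmax(p))        # greedy selection, Formula (6)

rng = np.random.default_rng(0)
d = 4
enc = rng.normal(size=(5, d))          # 5 pieces, hidden dimension 4
dec = rng.normal(size=d)
W = rng.normal(size=(d, 2 * d))
v = rng.normal(size=d)
mask = np.array([False, True, False, False, False])  # piece 1 already cut
p, nxt = pointer_step(W, v, enc, dec, mask)
```

The masked piece receives probability exactly zero, so the greedy choice is always an unselected piece.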

3.2. Network Training Based on Reinforcement Learning

Supervised learning training was applied in the research of Vinyals O et al. [40] and achieved certain results. However, in large-scale problem instances, it is usually difficult and costly to obtain original labels. Bello I et al. [41] obtained the optimal solution on 200 items by using reinforcement learning training, which shows the feasibility of reinforcement learning for solving combinatorial optimization problems under unsupervised conditions. In this paper, we adopt a strategy based on the reinforcement learning algorithm to train the network model. The theoretical model of the algorithm is based on the Markov decision process. In this case, the agent learns actions autonomously through the feedback of the reward value in the given environment. After a series of state and action updates, the cutting sequence corresponding to the optimal objective function can be obtained. For the wire cutting problem with j pieces, the episode information based on a specific policy can be modeled as Formula (7).
$$ s_0, a_1, s_1, r_2, a_2, s_2, r_3, \dots, a_j, s_j, r_{j+1} \sim \pi(a \mid s) \quad (7) $$
where ai and si represent the action and state of stage i, respectively. s0 is the state before any raw material has been cut, and π(a|s) is the policy for selecting the next piece to be cut. We use a reinforcement learning algorithm based on Monte Carlo updating to train the network parameters, following the rules of this algorithm in the research of Williams et al. [54]. That is, the learning of state values and the updating of rewards can only be carried out after a complete traversal of the sequence of wire pieces to be cut. Rt is the total reward value returned by the agent's exploration of the environment up to time t. The attenuation factor is defined as γ, which governs the weighting of current and future rewards. After a series of complete episodes, the accumulated discounted reward value can be obtained, as shown in Formula (8). Every time the state updates, the average reward of the state can be calculated to guide the adjustment of the sequence of pieces to be cut. The theory of Monte Carlo reinforcement learning is elaborated in the relevant literature [55].
$$ R_t = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{j-1} r_{t+j} \quad (8) $$
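Formula (8) is the standard accumulated discounted return; a small sketch:

```python
def discounted_return(rewards, gamma):
    """Accumulated discounted reward of Formula (8) for one episode.

    rewards -- [r_{t+1}, r_{t+2}, ..., r_{t+j}], rewards observed after time t
    gamma   -- attenuation (discount) factor
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards))

# With gamma = 1, as in the experiments of Section 4, the return is
# simply the sum of the episode rewards.
R = discounted_return([1.0, 0.5, 0.25], 0.5)  # 1.0 + 0.25 + 0.0625
```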
The neural network policy parameterized by θ can be defined as π(o|l, θ), which gives the probability of adopting cutting sequence o given a series of piece information l to be cut. The purpose of training is to maximize the expected utilization rate ρ of raw materials in Formula (9). The unit state ht at time t acquired from the RNN in the prediction network is fed to the reinforcement learning training, and the policy function πt at time t is obtained to represent the state-behavior value at time t. The quality of taking action at as the solution can then be assessed under the piece cutting probability p and the current state st.
$$ J(\theta \mid l) = \mathbb{E}_{o \sim \pi_\theta(\cdot \mid l)} \, \rho(o \mid l) \quad (9) $$
It can be seen from the above that the reward update based on Monte Carlo adopts a round system, and the average reward is calculated to replace the value function and guide the changes of state and action. In the training process, the input is the sequence of wire pieces and the corresponding piece information, and the output is the updated parameters of the pointer network. Under the guidance of the pointer network, the wire pieces are cut from raw materials of a given specification, and the cutting results are used as rewards. The number of complete trajectories sampled for each gradient update is defined as M, and the parameter optimization based on the policy gradient is shown in Formula (10). The cutting utilization ρ is calculated according to the consumption of raw materials after obtaining the cutting sequence o. In the network training process, the loss function over the model parameters is optimized by randomly sampling batches of wire piece data. The adaptive moment estimation (Adam) optimizer is used to perform stochastic gradient descent on the loss function. The network parameters are then updated so that the predicted value of the model gradually approaches the real action value. In testing, a greedy strategy is utilized: at each step, the prediction with the highest probability is selected as the output. The network training process based on DRL is given in Algorithm 1.
$$ \nabla_\theta J(\theta \mid l) \approx \frac{1}{M} \sum_{i=1}^{M} \rho(o_i \mid l_i) \, \nabla_\theta \log \pi_\theta(o_i \mid l_i) \quad (10) $$
At present, there is no large-scale training data set for the 1DCSP. Due to their simple structure, one-dimensional wire pieces in practical applications can be obtained by cutting wire rods of different specifications or raw materials. Therefore, the piece data set used for reinforcement learning training in this study is generated by cutting. The specific generation process of the data set is as follows: a piece L with a fixed specification is randomly cut into small pieces of variable length, up to a maximum number n, and the piece set is initialized as {l1, l2, …, ln}. In the cutting process, the minimum piece length is set to lmin, and the maximum is set to lmax. The same operation is carried out on all sample pieces to obtain the training set S.
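A possible implementation of this cutting-based sample generation is sketched below; the exact sampling distribution is an assumption, since the paper only specifies the bounds lmin and lmax and the maximum piece count n:

```python
import random

def generate_sample(L, n_max, l_min, l_max, rng):
    """Randomly cut one stock of length L into at most n_max pieces
    with lengths in [l_min, l_max]. A final offcut shorter than l_min
    is discarded. Hedged sketch: the paper does not give the exact
    sampling procedure, only the bounds."""
    pieces, remaining = [], L
    while remaining >= l_min and len(pieces) < n_max:
        length = rng.randint(l_min, min(l_max, remaining))
        pieces.append(length)
        remaining -= length
    return pieces

rng = random.Random(42)
# Settings from Section 4: l_min = 1000, l_max = 5000, up to 500 pieces.
sample = generate_sample(100_000, 500, 1000, 5000, rng)
```

Repeating this for every stock sample yields the training set S.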
Algorithm 1 Network training based on RL
procedure Train(training set S, number of training steps T, batch size B)
 Initialize pointer network parameters θ
 for t = 1 to T do
  Select a batch of samples s_i for i ∈ {1, 2, …, B}
  Feed s_i to the pointer network and sample cutting solution o_i from π_θ(·|l_i) for i ∈ {1, 2, …, B}
  Obtain cutting utilization rates ρ(o_i|l_i)
  Let g_θ = (1/B) Σ_{i=1}^{B} ρ(o_i|l_i) ∇_θ log π_θ(o_i|l_i)
  Update θ ← Adam(θ, g_θ)
 end for
 return pointer network parameters θ
end procedure
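The update of Algorithm 1 can be illustrated with a drastically simplified stand-in for the pointer network: one learnable score per piece instead of an LSTM encoder and decoder, plain gradient ascent instead of Adam, and a greedy sequential packing environment to compute the utilization reward ρ. All names, piece lengths, and settings here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def pack_utilization(order, lengths, L):
    """Reward: cut pieces in the given order, reusing any open raw
    material that still fits the piece, otherwise opening a new one.
    Returns total piece length / total raw material length consumed."""
    leftovers = []
    for idx in order:
        l = lengths[idx]
        for i, t in enumerate(leftovers):
            if t >= l:
                leftovers[i] -= l
                break
        else:
            leftovers.append(L - l)
    return float(sum(lengths)) / (len(leftovers) * L)

def sample_order(theta, rng):
    """Sample a cutting sequence from a softmax over per-piece scores
    (a stand-in for the pointer network) and accumulate the
    score-function gradient of log pi_theta(o)."""
    remaining = list(range(len(theta)))
    order, grad = [], np.zeros_like(theta)
    while remaining:
        logits = theta[remaining]
        p = np.exp(logits - logits.max())
        p /= p.sum()
        k = rng.choice(len(remaining), p=p)
        grad[remaining] -= p            # d log p / d theta over remaining pieces
        grad[remaining[k]] += 1.0
        order.append(remaining.pop(k))
    return order, grad

rng = np.random.default_rng(0)
lengths = np.array([300, 300, 100, 100, 100, 99, 99, 400, 400, 200], dtype=float)
L, B, steps, lr = 1000, 8, 50, 0.1
theta = np.zeros(len(lengths))
for _ in range(steps):
    g = np.zeros_like(theta)
    for _ in range(B):
        order, grad = sample_order(theta, rng)
        g += pack_utilization(order, lengths, L) * grad
    theta += lr * g / B                 # ascent step on Formula (10); the paper uses Adam
```

The loop mirrors Algorithm 1 step for step: sample B solutions, score each with the utilization reward, average the weighted log-probability gradients, and update θ.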

4. Calculation Experiment and Analysis

The Python programming language is used in this study, and the neural network is built with PyTorch [56]. The calculation test of the DRL-CSP algorithm is carried out on a computer with a 2.30 GHz AMD Ryzen 7 3750H CPU with 4 cores and 16 GB of RAM. The training set in this paper is obtained by the cutting generation method, and the test set is composed of instance data and randomly generated data. The performance of the DRL-CSP algorithm is tested using instances of the 1DCSP from the literature of Peiyong Li et al. [57] and Xianjun Shen et al. [58]. These instances are also real steel cutting stock data from enterprises, as shown in Table 1 and Table 2. Furthermore, randomly generated large-scale instances of more than 1000 pieces to be cut are also used to test the performance of the algorithm model, as given in Table 3. In the experiment, the utilization rate of raw materials is taken as the reward. That is, the length consumption and residual length of raw materials are within the scope of optimization. Each piece L with a fixed specification can be cut into at most 500 small pieces. The minimum piece length lmin is set to 1000, and the maximum length lmax is set to 5000. The number of samples is set to 500, and the number of training steps is set to 50. Our batch size B is set to 32, and the hidden dimension of the LSTM cell is set to 32. The initial learning rate of the model is set to 1e-3, and the discount factor of the reward is set to 1. The training takes approximately 90 min. During the training process, as the training epoch increases, the loss value of the network changes, as shown in Figure 4.

4.1. Experimental Results

Table 4 shows the comparison between our experimental results and the algorithm results in the related literature [57,58,59]. The specific cutting stock results of instances S1 and S2, based on the DRL-CSP algorithm, can be found in Table 5 and Table 6. As the analysis of Table 4 shows, the DRL-CSP algorithm proposed in this paper achieves results similar to the current best cutting stock algorithms on instance S1, with 28 piece specifications, and instance S2, with 35 piece specifications. In instance S1, 24 raw materials are consumed under the DRL-CSP algorithm, which is one more than the results of the classical algorithms (HGA [57], AGAPSO [58], DEPES [59], PES [59]) and one more than the theoretical lower bound. In addition, the average utilization rate of raw materials reaches 94.56%, while the average utilization rate after removing the maximum remaining material increases to 96.43%. This is approximately 3% less than the theoretical optimal solution and still qualifies as a satisfactory solution. The length of the maximum remaining material obtained by the DRL-CSP algorithm is 7587, which is 849 mm longer than that obtained by the HGA-based algorithm. This result is more conducive to the reuse of residual material. In instance S2, 15 raw materials are consumed under the DRL-CSP algorithm, the same quantity as that of the AGAPSO algorithm. The average utilization rate and the average utilization rate after removing the maximum remaining material based on the DRL-CSP algorithm are 90.99% and 95.68%, respectively. These results follow a pattern similar to those of the AGAPSO algorithm.
In addition, the length of the maximum remaining material reaches 74.6% of the raw material length, which is beneficial for the utilization of the residual material. In instance S3, a 92.41% average utilization rate after removing the maximum remaining material is obtained for 1497 pieces to be cut, which exceeds the 90% utilization threshold in practical applications of the 1DCSP [17].
Analysis of Table 5 and Table 6 shows that a 100% utilization rate of raw materials appears in both data sets under the DRL-CSP. In addition, the length of the maximum remaining material of instance S1 is 7587, and the residual material of most raw materials is less than 100, with almost no waste of residual material. This indicates that the cutting stock scheme of this instance has been effectively optimized. It is worth mentioning that in all instance calculations based on the DRL-CSP algorithm, the running time is less than 1 s, which indicates that the algorithm model is highly efficient. The comparison of test results shows that the DRL-CSP algorithm model proposed in this paper has good generalization performance and solving efficiency; it can cut wire pieces of various sizes in a very short time and obtain satisfactory solutions, indicating great practical application potential.

4.2. Analysis and Discussion

The above experimental results show that the DRL-CSP algorithm can not only efficiently solve the 1DCSP on 82 instances from 3 data sets, but also obtain satisfactory solutions comparable with those of classical algorithms. The algorithm can not only meet the requirements of high raw material utilization, but also robustly compute large-scale cutting stock instances, which is related to the learning and optimization mode of the DRL model. The DRL model can automatically learn cutting rules by exploiting the shared underlying patterns and data update mode of the wire piece cutting scheme, avoiding the solution defects caused by manual intervention in heuristic approaches. Therefore, the DRL-CSP algorithm has stronger generalization performance and versatility. Figure 4 shows how the loss function changes with the number of epochs during reinforcement learning training. As the epoch increases, the loss curve decreases and its fluctuation tends to converge. The likely reason is that, as reinforcement learning training proceeds, the model carries out a series of learning optimizations and adjustments after backpropagation and gradient optimization. Meanwhile, the learning amplitude of the network fluctuates toward more accurate predictions.
In the 1DCSP solution of instance S1, the calculation result based on the DRL-CSP algorithm reaches the category of a satisfactory solution. However, the calculations based on the HGA, AGAPSO, DEPES, and PES algorithms achieve an average utilization rate of 97.67%, and those based on AGAPSO, DEPES, and PES leave more remaining material. This means that the calculation performance of the DRL-CSP algorithm is not as good as certain classical algorithms on some instances, which is related to the calculation principles of the algorithms. The DRL-CSP algorithm utilizes the principles of deep learning and reinforcement learning and obtains a stable algorithm model by training on a large number of data sets. In the training process, the choice of training set and network parameters has a great impact on model performance. The classical heuristic algorithm can approximate the solution of specific instances by manually setting solution rules and has a certain solving efficiency. Therefore, for some specific instances, a better solution can be obtained by the classical heuristic algorithm. The approaches differ in solving principles, algorithm design, and solving effect on different problem instances. Constructing a more efficient network solution structure and optimizing the network parameters and training sets may further improve the performance of the DRL model. In general, the DRL-CSP algorithm achieves results similar to the classical cutting stock algorithms on instance S1, with 28 piece specifications, and instance S2, with 35 piece specifications, which represents a meaningful advance. Research on cutting stock algorithms based on DRL can not only provide new ideas for online solutions to cutting stock problems, but also provide a new reference for solving more combinatorial optimization problems, which has great theoretical significance and application potential.
In the 1DCSP solution of instance S3, an average utilization rate of 92.41% is achieved after removing the raw material containing the maximum remaining material. A total of 1497 pieces must be cut, and the remaining material is long. However, the utilization rate of more than 95% reached on instances S1 and S2 after removing the maximum remaining material is not achieved here. A possible reason is that the cutting distribution over the raw materials is relatively uniform, so the gradient of the residual materials is not obvious. This result further shows that the selection of sample data, the formulation of the mathematical model, and the distribution of instance data have a large impact on the performance of the DRL model when solving large-scale CSP problems.
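The utilization figures reported for S1 in Table 4 can be recovered from the per-bar remaining lengths of Table 5. The short check below assumes the reported average utilization rate is the mean of the per-bar rates; the remaining-length values are taken directly from Table 5.

```python
STOCK_LEN = 18000  # raw material length for instance S1 (mm)

# Remaining-material lengths of the 24 bars in Table 5 (instance S1)
remaining = [0, 0, 4, 5, 6, 7, 9, 12, 21, 29, 36, 58, 79, 123, 148,
             279, 569, 920, 2068, 2229, 2460, 2839, 4010, 7587]

utilization = [(STOCK_LEN - r) / STOCK_LEN for r in remaining]
avg_util = 100 * sum(utilization) / len(utilization)

print(f"bars used: {len(remaining)}")                  # 24, as in Table 4
print(f"average utilization: {avg_util:.2f}%")         # 94.56%, as in Table 4
print(f"max remaining material: {max(remaining)} mm")  # 7587 mm, as in Table 4
```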

5. Conclusions and Future Work

In this study, a mathematical model is established with the optimization objectives of minimum raw material consumption and maximum remaining material length, tailored to the characteristics of the 1DCSP. An algorithm based on DRL is proposed to solve the 1DCSP: the underlying shared pattern of the 1DCSP is captured by a deep learning network framework, and reinforcement learning is used to optimize the network parameters and automatically learn the cutting stock rules. To the best of our knowledge, this is the first time the 1DCSP has been solved with the help of a deep neural network. The experimental results show that the DRL-CSP algorithm can not only efficiently solve 1DCSP instances with different specifications, but also obtain satisfactory solutions comparable with some classical algorithms. Moreover, a raw material utilization rate higher than 90% is obtained on large-scale data sets, which shows the great potential of the DRL-CSP algorithm for solving practical cutting stock problems.
In future research, we will explore more efficient network structures and transfer-solution mechanisms, improve deep network optimization, and consider applying deep networks to two-dimensional nesting problems.

Author Contributions

J.F.: conceptualization, methodology, validation, formal analysis, writing—original draft, writing—review and editing; Y.R.: supervision, project administration, writing—review and editing; Q.L.: writing—review and editing; J.X.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 51975231) and the Fundamental Research Funds for the Central Universities (grant number 2019kfyXKJC043).

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the referees for their constructive comments that improved the presentation as well as the content of the paper.

Conflicts of Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

References

  1. Stadtler, H. A one-dimensional cutting stock problem in the aluminium industry and its solution. Eur. J. Oper. Res. 1990, 44, 209–223.
  2. Johnson, M.; Rennick, C.; Zak, E. Skiving Addition to the Cutting Stock Problem in the Paper Industry. SIAM Rev. 1997, 39, 472–483.
  3. Cui, Y. A cutting stock problem and its solution in the manufacturing industry of large electric generators. Comput. Oper. Res. 2005, 32, 1709–1721.
  4. Ogunranti, G.; Oluleye, A. Minimizing waste (off-cuts) using cutting stock model: The case of one-dimensional cutting stock problem in wood working industry. J. Ind. Eng. Manag. 2016, 9, 834–859.
  5. Wattanasiriseth, P.; Krairit, A. An Application of Cutting-Stock Problem in Green Manufacturing: A Case Study of Wooden Pallet Industry. IOP Conf. Ser. Mater. Sci. Eng. 2019, 530, 12005.
  6. Wäscher, G.; Haußner, H.; Schumann, H. An improved typology of cutting and packing problems. Eur. J. Oper. Res. 2007, 183, 1109–1130.
  7. Lima, V.; Alves, C.; Clautiaux, F.; Iori, M.; Valério de Carvalho, J.M. Arc flow formulations based on dynamic programming: Theoretical foundations and applications. Eur. J. Oper. Res. 2022, 296, 3–21.
  8. Dyckhoff, H. A New Linear Programming Approach to the Cutting Stock Problem. Oper. Res. 1981, 29, 1092–1104.
  9. Brandão, F.; Pedroso, J. Bin packing and related problems: General arc-flow formulation with graph compression. Comput. Oper. Res. 2016, 69, 56–67.
  10. Yang, C.; Yang, L.; Sheng, Z. Research on Multi-Branches Tree Traversal Algorithm of One-Dimensional Cutting Stock Problem. Mech. Eng. Autom. 2018, 15, 11–12.
  11. Kang, M.; Yoon, K. An improved best-first branch-and-bound algorithm for unconstrained two-dimensional cutting problems. Int. J. Prod. Res. 2011, 49, 4437–4455.
  12. Lu, H.; Huang, Y. An efficient genetic algorithm with a corner space algorithm for a cutting stock problem in the TFT-LCD industry. Eur. J. Oper. Res. 2015, 246, 51–66.
  13. Haessler, R.; Sweeney, P. Cutting stock problems and solution procedures. Eur. J. Oper. Res. 1991, 54, 141–150.
  14. Wäscher, G.; Gau, T. Heuristics for the integer one-dimensional cutting stock problem: A computational study. Oper. Res. Spektrum 1996, 18, 131–144.
  15. Wu, Z.; Zhang, L.; Wang, K. An Ant Colony Algorithm for One-dimensional Cutting-stock Problem. Mech. Sci. Technol. Aerosp. Eng. 2008, 27, 1681–1684.
  16. Guan, W.; Gong, J.; Xue, H. A Hybrid Heuristic Algorithm for the One-Dimensional Cutting Stock Problem. Mach. Des. Manuf. 2018, 8, 237–239.
  17. Zhu, S. The Research on Optimization Algorithms for One-Dimensional Cutting Stock Problems. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, 2013.
  18. Cui, Y.; Song, X.; Chen, Y. New model and heuristic solution approach for one-dimensional cutting stock problem with usable leftovers. J. Oper. Res. Soc. 2017, 68, 269–280.
  19. Ma, J.; Han, Z.; Luo, D.; Xiao, H. Research on One-Dimensional Cutting Stock Problem Based on Recursive Matrix Column Generation Algorithm. Mach. Des. Manuf. 2022, 117–119.
  20. Belov, G.; Scheithauer, G. Setup and Open-Stacks Minimization in One-Dimensional Stock Cutting. INFORMS J. Comput. 2007, 19, 27–35.
  21. Cao, J.; Cui, Y.; Li, D. Study on the solution of one-dimensional cutting stock for multiple stock lengths with variable cross-section. Forg. Stamp. Technol. 2017, 42, 161–165.
  22. Cerqueira, G.; Aguiar, S.; Marques, M. Modified Greedy Heuristic for the one-dimensional cutting stock problem. J. Comb. Optim. 2021, 42, 657–674.
  23. Ravelo, S.; Meneses, C.; Santos, M. Meta-heuristics for the one-dimensional cutting stock problem with usable leftover. J. Heuristics 2020, 26, 585–618.
  24. Pimenta, Z.; Sakuray, F.; Hoto, R. A heuristic for the problem of one-dimensional steel coil cutting. Comput. Appl. Math. 2021, 40, 39.
  25. Tian, S.; Lv, L.; Cai, Y. Design and implementation of a simple algorithm for solving one-dimensional cutting stock problem based on Lingo. Ind. Sci. Trib. 2021, 20, 45–47.
  26. Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P.; Xu, C. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1621–1632.
  27. Park, J.; Chun, J.; Kim, S.; Kim, Y. Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning. Int. J. Prod. Res. 2021, 59, 3360–3377.
  28. Li, J.; Ma, Y.; Gao, R.; Cao, Z.; Lim, A.; Song, W.; Zhang, J. Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem. IEEE Trans. Cybern. 2022, 52, 13572–13585.
  29. Xin, L.; Song, W.; Cao, Z.; Zhang, J. Step-Wise Deep Learning Models for Solving Routing Problems. IEEE Trans. Ind. Inform. 2021, 17, 4861–4871.
  30. Kool, W.; van Hoof, H.; Welling, M. Attention, learn to solve routing problems. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
  31. Xin, L.; Song, W.; Cao, Z.; Zhang, J. NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem. Adv. Neural Inf. Process. Syst. 2021, 34, 7472–7483.
  32. Ivanov, D.; Kiselev, M.; Larionov, D. Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations. arXiv 2022, arXiv:2201.02571.
  33. Zhou, R.; Tian, Y.; Wu, Y.; Du, S. Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems. arXiv 2022, arXiv:2202.05423.
  34. Peng, B.; Wang, J.; Zhang, Z. A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems. Commun. Comput. Inf. Sci. 2020, 1205, 636–650.
  35. Pitombeira-Neto, A.R.; Murta, A.H.F. A reinforcement learning approach to the stochastic cutting stock problem. Eur. J. Comput. Optim. 2022, 10, 100027.
  36. Fang, J.; Rao, Y.; Zhao, X.; Du, B. A Hybrid Reinforcement Learning Algorithm for 2D Irregular Packing Problems. Mathematics 2023, 11, 327.
  37. Zhang, W.; Tang, S.; Su, J.; Xiao, J.; Zhuang, Y. Tell and guess: Cooperative learning for natural image caption generation with hierarchical refined attention. Multimed. Tools Appl. 2021, 80, 16267–16282.
  38. Xia, B.; Wong, C.; Peng, Q.; Yuan, W.; You, X. CSCNet: Contextual semantic consistency network for trajectory prediction in crowded spaces. Pattern Recognit. 2022, 126, 108552.
  39. Song, J.; Kim, S.; Yoon, S. AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7–11 November 2021; pp. 1–14.
  40. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700.
  41. Bello, I.; Pham, H.; Le, Q.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017 Workshop Track Proceedings, Toulon, France, 24–26 April 2017.
  42. Chen, H.; Fan, J.; Liu, Y. Solving dynamic traveling salesman problem by deep reinforcement learning. J. Comput. Appl. 2022, 42, 1194–1200.
  43. Lombardi, M.; Milano, M. Boosting combinatorial problem modeling with machine learning. In Proceedings of the 27th IJCAI International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 5472–5478.
  44. Joshi, C.; Thomas, L.; Bresson, X. An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem. arXiv 2019, arXiv:1906.01227.
  45. Bogyrbayeva, A.; Yoon, T.; Ko, H.; Lim, S.; Yun, H.; Kwon, C. A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone. arXiv 2021, arXiv:2112.12545.
  46. Zhao, H.; She, Q.; Zhu, C.; Yang, Y.; Xu, K. Online 3D bin packing with constrained deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February 2021; Volume 35, pp. 741–749.
  47. Kang, M.; Oh, J.; Lee, Y.; Park, K.; Park, S. Selecting Heuristic Method for One-dimensional Cutting Stock Problems Using Artificial Neural Networks. Korean J. Comput. Des. Eng. 2020, 25, 67–76.
  48. Almeida, R.; Steiner, M. Resolution of one-dimensional bin packing problems using augmented neural networks and minimum bin slack. Int. J. Innov. Comput. Appl. 2016, 7, 214–224.
  49. Kantorovich, L. Mathematical Methods of Organizing and Planning Production. Manag. Sci. 1960, 6, 366–422.
  50. Gilmore, P.; Gomory, R. A Linear Programming Approach to the Cutting Stock Problem–Part II. Oper. Res. 1963, 11, 863–888.
  51. Haessler, R. A Heuristic Programming Solution to a Nonlinear Cutting Stock Problem. Manag. Sci. 1971, 17.
  52. Haessler, R. Controlling Cutting Pattern Changes in One-Dimensional Trim Problems. Oper. Res. 1975, 23, 483–493.
  53. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
  54. Williams, R. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
  55. Fang, J.; Rao, Y.; Guo, X.; Zhao, X. A reinforcement learning algorithm for two-dimensional irregular packing problems. In Proceedings of the ACAI’21: 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 22–24 December 2021.
  56. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Chintala, S.; Killeen, T.; Gimelshein, N.; Lin, Z.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
  57. Li, P.; Wang, Q.; Qiu, Y. Optimization for One-Dimensional Cutting Using Hybrid Genetic Algorithm. J. Shanghai Jiaotong Univ. 2001, 35, 1557–1560.
  58. Shen, X.; Yang, J.; Ying, W. Adaptive General Particle Swarm Optimization for One-Dimension Cutting Stock Problem. J. South China Univ. Technol. (Nat. Sci. Ed.) 2007, 35, 113–117.
  59. Hou, G. Research of One-dimensional Cutting Stock Problem Based on Improved Pyramid Evolution Strategy. Master’s Thesis, Wuhan University of Technology, Wuhan, China, 2020.
Figure 1. Diagram of the cutting stock scheme. (a) Ideal cutting stock scheme (b) Undesirable cutting stock scheme.
Figure 2. The DRL-CSP model.
Figure 3. Pointer network structure.
Figure 4. Loss curve of algorithm model training.
Table 1. Instance S1 based on steel cutting stock.

Instance S1; raw material length L = 18,000 mm. Length (mm) × quantity of steel pieces:

3280 × 4 | 2786 × 6 | 1908 × 8
3275 × 4 | 2757 × 5 | 1849 × 9
3085 × 4 | 2680 × 4 | 1812 × 5
3005 × 5 | 2543 × 3 | 1770 × 12
2952 × 4 | 2304 × 10 | 1712 × 16
2920 × 4 | 2167 × 4 | 1689 × 8
2868 × 2 | 2162 × 16 | 1404 × 8
2859 × 2 | 2006 × 8 | 1352 × 8
2830 × 6 | 1975 × 8 | 1315 × 8
1308 × 8
Table 2. Instance S2 based on steel cutting stock.

Instance S2; raw material length L = 25,800 mm. Length (mm) × quantity of steel pieces:

1615 × 8 | 1850 × 1 | 2686 × 1
1622 × 8 | 1955 × 8 | 2773 × 1
1634 × 8 | 1968 × 8 | 2777 × 4
1655 × 8 | 2068 × 8 | 2788 × 1
1709 × 8 | 2162 × 16 | 2799 × 1
1722 × 8 | 2213 × 2 | 2832 × 4
1770 × 8 | 2334 × 8 | 2838 × 2
1812 × 1 | 2591 × 3 | 2843 × 4
1849 × 8 | 2674 × 1 | 2851 × 1
2891 × 1 | 2987 × 4 | 3082 × 1
3352 × 2 | 3373 × 2 | 3388 × 4
4273 × 4 | 4288 × 4
Table 3. Instance S3 based on large-scale wire pieces.

Instance S3; raw material length L = 18,000 mm. Length (mm) × quantity of large-scale wire pieces:

4600 × 20 | 4300 × 140 | 3000 × 160
4000 × 140 | 4150 × 30 | 3800 × 60
3205 × 99 | 2838 × 34 | 2240 × 28
2838 × 34 | 2860 × 10 | 2500 × 80
2400 × 120 | 2334 × 67 | 2162 × 156
1968 × 29 | 1955 × 100 | 1800 × 50
1600 × 140
Table 4. Results information of the 1DCSP.

Instance | Algorithm | Quantity of raw materials consumed | Average utilization rate | Average utilization rate (removal of the raw material containing the maximum remaining material) | Length of the maximum remaining material (mm)
S1 | DRL-CSP | 24 | 94.56% | 96.43% | 7587
S1 | HGA | 23 | 97.67% | 99.27% | 6738
S1 | AGAPSO | 23 | 97.67% | 99.58% | 8529
S1 | DEPES | 23 | 97.67% | 99.82% | 8909
S1 | PES | 23 | 97.67% | 99.74% | 8621
S2 | DRL-CSP | 15 | 90.99% | 95.68% | 19,255
S2 | AGAPSO | 15 | 93.25% | 99.47% | 24,185
S3 | DRL-CSP | 250 | 91.99% | 92.41% | 11,600
Table 5. The specific cutting stock results of instance S1.

Label of the raw material | Piece length/mm (quantity of pieces) | Length of the remaining material/mm | Utilization rate/%
1 | 1315, 1308, 2006, 3280, 2868, 2757, 2304, 2162 | 0 | 100
2 | 1404, 2786(2), 1352, 3275, 2859, 1849, 1689 | 0 | 100
3 | 1352, 1975(2), 3280, 2786, 2162(2), 2304 | 4 | 99.98
4 | 1689, 1712, 1770(2), 2920, 2868, 2006, 1908, 1352 | 5 | 99.97
5 | 1975, 1849, 1712(2), 3005, 2757, 2680, 2304 | 6 | 99.97
6 | 1689, 1849, 1352, 3085, 2859, 2162, 2830, 2167 | 7 | 99.96
7 | 1812, 1770, 1712(2), 1849(2), 2006(2), 3275 | 9 | 99.95
8 | 1315, 1352(2), 1712(2), 2952, 2304, 1308, 2006, 1975 | 12 | 99.93
9 | 1812, 1849, 2167(2), 3280, 2830, 1712, 2162 | 21 | 99.88
10 | 1689, 1712, 1770(2), 2830, 2680, 1308, 2304, 1908 | 29 | 99.84
11 | 1712, 1308(2), 1770, 1689, 1908(2), 1812, 2006, 2543 | 36 | 99.80
12 | 1352, 1712, 1908(3), 2920, 2757, 1315, 2162 | 58 | 99.68
13 | 1308, 1712(3), 1770, 1812, 1308, 1908, 1849, 2830 | 79 | 99.56
14 | 2757, 2304, 2162(5), 2006 | 123 | 99.32
15 | 2952, 2830, 2757, 2543, 2304(2), 2162 | 148 | 99.18
16 | 2859, 1404(3), 2786, 2162(2), 1770(2) | 279 | 98.45
17 | 2162, 2167, 2304, 3085, 2952, 1975, 2786 | 569 | 96.84
18 | 3005, 2920, 2859, 2830, 2786, 2680 | 920 | 94.89
19 | 2006, 1812, 3005(2), 1712(2), 2680 | 2068 | 88.51
20 | 1308, 1404(2), 1315(2), 3275, 2920, 2830 | 2229 | 87.62
21 | 3005, 1770(2), 3280, 3085, 1315(2) | 2460 | 86.33
22 | 3275, 1352, 1689, 1404(2), 3085, 2952 | 2839 | 84.23
23 | 2543, 2304, 2162, 1975, 1849(2), 1308 | 4010 | 77.72
24 | 1315, 1770, 1975(2), 1689(2) | 7587 | 57.85
Table 6. The specific cutting stock results of instance S2.

Label of the raw material | Piece length/mm (quantity of pieces) | Length of the remaining material/mm | Utilization rate/%
1 | 3352, 1655, 1722(2), 1955(2), 2334, 2162, 2838, 2068(2), 1968 | 1 | 100
2 | 4273, 1615, 2334(2), 3388, 3373, 2851, 2832, 2799 | 1 | 100
3 | 4288, 1655, 1812, 3082, 2891, 2832, 2162, 2068, 2674, 2334 | 2 | 99.99
4 | 4273, 1615, 1849(3), 3388, 2162, 1850, 1634, 1770(3) | 21 | 99.92
5 | 4273, 1615(3), 4288(2), 3388, 1722, 2773 | 223 | 99.14
6 | 1968, 2162(2), 2838, 2068(2), 1709(2), 2686, 2334, 1722, 1955 | 419 | 98.38
7 | 2777, 1968, 1634(2), 1722, 2334, 2843, 1709, 2591, 2987(2) | 614 | 97.62
8 | 2068, 1849(2), 1655(2), 2777, 2162, 2334, 1770, 1955, 1968, 2832 | 926 | 96.41
9 | 4288, 1615, 1622(2), 4273, 3388, 2987, 2777, 2162 | 1066 | 95.87
10 | 1849, 1955(2), 2213, 2162, 1968, 2068, 1722, 2591, 1655, 1770, 2334 | 1558 | 93.96
11 | 2832, 1968(2), 2162(3), 1955(2), 1709(2), 1770(2) | 1678 | 93.50
12 | 1622, 2843(2), 1634(2), 2777, 2591, 1849, 2213, 2068, 1968 | 1758 | 93.19
13 | 1849, 2162(4), 2788, 1709(2), 1655(2), 1770, 1722 | 2295 | 91.10
14 | 2843, 1722, 2987, 3352, 1634, 3373, 1622, 1615(2) | 5037 | 80.48
15 | 1655, 1634(2), 1622 | 19,255 | 25.37