Solving One-Dimensional Cutting Stock Problems with the Deep Reinforcement Learning

Fang, Jie; Rao, Yunqing; Luo, Qiang; Xu, Jiatai

doi:10.3390/math11041028


by Jie Fang ^1,2,*, Yunqing Rao ^1,2, Qiang Luo ^1,2 and Jiatai Xu ^1,2

¹ School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan 430074, China

² State Key Laboratory of Digital Manufacturing Equipment and Technology, Huazhong University of Science and Technology, Wuhan 430074, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(4), 1028; https://doi.org/10.3390/math11041028

Submission received: 10 January 2023 / Revised: 16 February 2023 / Accepted: 16 February 2023 / Published: 17 February 2023

(This article belongs to the Section E1: Mathematics and Computer Science)


Abstract:

The one-dimensional cutting stock problem (1DCSP) is a well-known NP-hard combinatorial optimization problem. Heuristic and genetic algorithms are the two main approaches used to solve the cutting stock problem (CSP), but they are limited to small-scale instances and solve them inefficiently. To improve the stability and versatility of the solution, a mathematical model is established whose optimization objective is the minimum raw material consumption and the maximum remaining material length. A novel algorithm based on deep reinforcement learning (DRL) is then proposed. The algorithm consists of two modules, each designed for a different function. First, a pointer network with an encoder and decoder structure is used as the policy network to exploit the underlying pattern shared by 1DCSP instances. Second, a model-free reinforcement learning algorithm is used to train the network parameters and optimize the cutting sequence. The experimental data show that the one-dimensional cutting stock algorithm model based on deep reinforcement learning (DRL-CSP) obtains approximately satisfactory solutions on 82 instances from 3 data sets in a very short time, and shows good generalization performance and practical application potential.

Keywords:

cutting stock problem; one-dimensional; combination optimization; deep reinforcement learning; mathematical model; pointer network

MSC:

90C27; 68U07

1. Introduction

The CSP appears in many practical application scenarios, such as cutting steel sheets, wood, electric wires, and paper rolls [1]. Cutting stock is in demand in many industries, including aerospace, automobile, shipbuilding, energy, construction, and machinery manufacturing [2,3]. According to the relevant literature [4,5], cutting stock optimization can save 30% of the cost for some companies and reduce greenhouse gas emissions, which makes it a method of green manufacturing. According to the dimensions of raw materials and pieces [6], the CSP can be divided into the one-dimensional cutting stock problem (1DCSP), the two-dimensional cutting stock problem (2DCSP), and the three-dimensional cutting stock problem (3DCSP). Among them, the 1DCSP is the research basis of the CSP and refers to cutting raw materials of known length into pieces of different lengths according to the required quantities. Previous research showed that the 1DCSP belongs to the class of NP-hard problems [7], for which no algorithm is known that obtains an exact solution in polynomial time. As the scale of the problem increases, the calculation time increases dramatically, which poses great challenges for algorithm design. Dyckhoff et al. [8] elaborated on the similarity of the 1DCSP and the one-dimensional bin packing problem (1DBPP) by comparing and classifying them. Some scholars solved the 1DCSP by using a one-dimensional packing algorithm model. Brando et al. [9] proposed a general formulation on the basis of the bin packing problem (BPP), which solved a large number of CSP and BPP instances. Chang Yang et al. [10] drew on the idea of the best-fit decreasing (BFD) algorithm for solving the BPP and proposed a heuristic cutting algorithm based on multi-branch tree traversal, which achieved good results.

Scholars have proposed various optimization algorithms for the 1DCSP, such as linear programming, the simplex method, dynamic programming, heuristic algorithms, genetic algorithms, simulated annealing algorithms, and evolutionary algorithms [11,12,13,14,15,16]. Among them, the genetic algorithm performs better, which indicates that the balance between global search and local convergence is important for optimizing results. Shenglan Zhu [17] designed a genetic algorithm with fixed-length real coding to solve the 1DCSP with a single specification and with multiple specifications, and set penalties for infeasible solutions whose individual fitness is greater than 1. This method achieved better results. For the 1DCSP under a random environment, Cui Y et al. [18] and Junyan Ma et al. [19] proposed a heuristic algorithm based on column generation to solve the problem of available surplus materials in the 1DBPP. Belov G et al. [20] and Jingjing Cao et al. [21] studied the 1DCSP using the framework of sequential value correction (SVC). In recent research on the 1DBPP [22,23,24], researchers mainly improved meta-heuristic and heuristic algorithms based on the utilization of usable leftover raw materials, building a cutting pattern by sorting the items of even or odd length in descending order. However, for the large-scale 1DCSP, there are generally millions of cutting patterns, so traditional algorithms solve the problem with low efficiency. Moreover, heuristic algorithms and improved heuristic algorithms need rules to be designed for specific cutting stock problems, and designing such rules demands a high level of professional skill. Although a heuristic algorithm can obtain a wide range of solutions, it cannot always obtain a high-quality solution. In addition, in actual production, the 1DCSP is mostly solved by experience-based cutting because there are few simple, practical, and versatile methods to solve the large-scale, multi-batch 1DBPP [25].

Machine learning, which has emerged in recent years, has been widely used in many fields. Notable achievements in scheduling [26,27] and path optimization [28,29,30,31] reveal an interesting combination of artificial intelligence and operations research. Most machine learning methods for solving combinatorial optimization problems focus on learning to construct heuristics that generate a solution from a given set of input nodes. These methods usually represent solutions as sequences [32]. The graph neural network [33] and attention mechanism [34] produce high-quality solutions. However, the construction method requires additional procedures to reach advanced performance, such as beam search, classical improvement heuristics, and random sampling. Solutions to some cutting stock problems based on reinforcement learning depend on learned rules and heuristic algorithms. Anselmo R et al. [35] formulated the stochastic cutting stock problem as a discounted infinite-horizon Markov decision process and proposed a heuristic method based on reinforcement learning. The results showed that the average cost could be up to 80% lower than the cost obtained by a myopic policy. The previous work [36] also shows that combining reinforcement learning search with a heuristic algorithm can achieve certain effects in solving the cutting stock problem, but there are limitations, such as the randomness of the search. As a mainstream method of machine learning, DRL has been successfully applied to image analysis [37], video classification [38], intelligent translation [39], and other fields. Evaluation and comparison of algorithm results indicate that DRL has the potential for good scalability, generalization, and versatility. The pointer network model proposed by Vinyals O et al. [40] was the first modern deep model for combinatorial optimization problems and solved the traveling salesman problem (TSP) and knapsack problem. Based on the pointer network, Bello I et al. [41] used reinforcement learning training to solve the sequence problem, which not only saved computational costs but also obtained near-optimal solutions for the large-scale TSP. Hao Jie [42] used DRL to study the dynamic TSP and obtained good results. Lombardi M et al. [43] used machine learning and deep learning to learn heuristic algorithms and successfully solved combinatorial optimization problems. Some researchers [44,45] solved the TSP with supervised learning and reinforcement learning, which achieved certain results. Hang Zhao et al. [46] solved the online 3D bin packing problem by formulating a variant of it as a constrained Markov decision process and proposed an effective and easy-to-implement constrained deep reinforcement learning method under the actor-critic framework.

At present, there is no research using the DRL method to solve the 1DCSP, but some researchers have tried to solve the problem using neural networks designed around CSP characteristics. Kang M et al. [47] tried to solve the 1DCSP using an artificial neural network and achieved certain results. AR De et al. [48] solved the 1DBPP by comparing the augmented neural network (AugNN) meta-heuristic with the minimum bin slack (MBS) heuristic. These algorithms ignore the advantage of using deep networks to construct solutions autoregressively. They are limited to hand-designed decision networks and heuristic algorithms whose solution rules must be designed separately for each problem, and they do not exploit the underlying patterns shared by cutting stock instances or self-learning optimization. Research on the 1DCSP based on DRL is therefore both an early attempt to solve the cutting stock problem with a deep network and a continuation of existing research achievements. Further exploration of cutting stock algorithms based on DRL can provide new ideas for online solutions to cutting stock problems as well as a new reference for solving other combinatorial optimization problems, so it has great theoretical significance and application potential.

In this paper, a one-dimensional wire (that is, a unified name for raw materials of pipes and profiles by cutting stock) is taken as the research object, and a solution algorithm based on the DRL is proposed to solve the problem of a single specification cutting stock (that is, the length of raw materials available for cutting is only one size and the number is unlimited, while the length and number of wire pieces to be cut are certain). A pointer network with an encoder and decoder structure is taken as the strategy network, and the cutting sequence selection process of pieces to be cut is modeled as a Markov decision-making process. Then, the network parameters are trained using the reinforcement learning algorithm. In the training process, the strategy network and reward value are continuously optimized to obtain a satisfactory solution. Finally, the trained model is tested based on cutting stock instances to prove the effectiveness of the algorithm model.

2. Mathematical Model

2.1. Problem Statement

The 1DCSP can be divided into a single-specification cutting stock problem and a multi-specification cutting stock problem according to raw material specifications. According to the different objective functions of optimization, the 1DCSP can be divided into three different types: the problem with the least remaining materials as the optimization objective, the problem with the minimum number of raw materials consumed as the optimization objective, and the problem with the least remaining materials and the minimum number of raw materials consumed as the optimization objective. Common mathematical models for the 1DCSP of a single specification include the Kantorovich model [49], the Gilmore–Gomory model [50], and the Haessler model [51,52]. These models played an important role in solving a single optimization objective and achieved good results.

In this paper, the problem of wire cutting stocks is mainly aimed at the cutting optimization of enterprises that use standard-length raw materials for wire cutting stock. The single specification CSP is mainly adopted when considering the utilization of surplus materials. Therefore, our research takes minimum raw material consumption and the maximum remaining material length of standard raw materials as the optimization objective to establish a mathematical model. The objective function and constraints are shown in Formulae (1) and (2), respectively.

U = \min \left( \sum_{i = 1}^{k} t_{i} - t_{\max} \right)

(1)

\text{s.t.} \quad \begin{cases} L_{i} \in L, & i = 1, 2, \dots, k \\ t_{i} = L_{i} - \sum_{j = 1}^{n} l_{j} a_{i j} \geq 0, & i = 1, 2, \dots, k \\ d_{j} = \sum_{i = 1}^{k} a_{i j}, & j = 1, 2, \dots, n \\ t_{\max} = \max \{ t_{i} \mid i = 1, 2, \dots, k \} \end{cases}

(2)

where U is the objective function and s.t. introduces the constraints; n is the number of types of wire pieces to be cut, d_j (j = 1, 2, …, n) is the required quantity of piece type j, l_j is the length of piece type j, and a_ij is the number of pieces of type j cut from raw material i (i = 1, 2, …, k). L is the standard length of the raw materials (so the constraint L_i ∈ L fixes every L_i to this length in the single-specification case), k is the number of raw materials consumed, L_i is the length of raw material i, and t_i is its remaining length after cutting. To maximize the utilization of surplus material, the maximum surplus material t_max should be as long as possible. Therefore, the objective function minimizes the sum of all remaining material lengths t_i after removing the maximum remaining material t_max.
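As a minimal sketch of how Formulae (1) and (2) evaluate a given cutting plan, the following Python function computes the leftover t_i of each raw material and the objective value U. The function names and the nested-list layout of a_ij are illustrative assumptions, not part of the paper.

```python
def leftover_objective(bar_lengths, piece_lengths, cuts):
    """Evaluate a cutting plan.

    bar_lengths[i]   -- L_i, length of raw material i
    piece_lengths[j] -- l_j, length of piece type j
    cuts[i][j]       -- a_ij, pieces of type j cut from raw material i
    Returns (U, leftovers) with U = sum(t_i) - t_max.
    """
    leftovers = []
    for bar_length, row in zip(bar_lengths, cuts):
        used = sum(l * a for l, a in zip(piece_lengths, row))
        t = bar_length - used
        if t < 0:  # violates t_i >= 0 in Formula (2)
            raise ValueError("cutting pattern exceeds the raw material length")
        leftovers.append(t)
    return sum(leftovers) - max(leftovers), leftovers


def demand_met(cuts, demand):
    """Check d_j = sum_i a_ij for every piece type j."""
    produced = [sum(column) for column in zip(*cuts)]
    return produced == list(demand)


# Two bars of length 1000; piece types of length 300 and 100.
plan = [[3, 1], [0, 5]]
U, leftovers = leftover_objective([1000, 1000], [300, 100], plan)
print(U, leftovers, demand_met(plan, [3, 6]))
```

In this toy plan all leftover material sits on one bar, so U is zero: the single long offcut is treated as reusable rather than as waste.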

2.2. Problem Instance

Different solution definitions lead to different cutting stock schemes, which directly affects the cutting results. To illustrate the mathematical model proposed in this paper, Figure 1 contrasts a better and a worse cutting stock scheme. Assume that the specifications of wire pieces are 300, 100, and 99, and the required quantities are 9, 903, and 67, respectively. All raw materials have a standard length of 1000 and are available in unlimited quantity. In Figure 1a, the number of raw materials consumed by cutting stock is 100, and the total length of the remaining materials is 367 (2 × 20 + 109 × 3), of which the length of the maximum remaining material is 109, which is represented by different color intervals. The total length of the consumed raw material is 99,633 (100 × 1000 − 367), and the sum of the lengths of the remaining material after the removal of the maximum remaining material is 40 (2 × 20). In Figure 1b, the number of raw materials consumed by cutting stock is also 100, and the total length of the remaining materials is 367 (5 × 10 + 105 × 2 + 107 × 1), of which the length of the maximum remaining material is 107. The total length of the raw material consumed is also 99,633 (100 × 1000 − 367), and the sum of the lengths of the remaining material after the removal of the maximum remaining material is 260 (5 × 10 + 105 × 2). The values in Figure 1 represent the current quantity of wire raw materials, and the total quantity is 100. The symbol × indicates a multiplier, consistent with its use in the calculations above.

Comparing the two schemes, for the same raw material consumption the cutting stock scheme in Figure 1a has a longer maximum remaining material and a smaller total length of remaining materials after the maximum remaining material is removed. It therefore better supports the reuse of surplus materials and is the preferable cutting scheme.
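The shared totals of the two schemes follow directly from the demand, which the short check below reproduces. It only restates the numbers given in the instance; the variable names are illustrative.

```python
import math

piece_lengths = [300, 100, 99]
demand = [9, 903, 67]
bar_length = 1000

# Total length of all required pieces.
total_demand = sum(l * d for l, d in zip(piece_lengths, demand))
# No plan can use fewer bars than this bound.
min_bars = math.ceil(total_demand / bar_length)
# Leftover shared by every plan that uses exactly min_bars bars.
total_leftover = min_bars * bar_length - total_demand

print(total_demand, min_bars, total_leftover)  # 99633 100 367
```

Because both schemes use 100 raw materials, their total leftover is necessarily 367; they can differ only in how that leftover is distributed, which is exactly what the objective in Formula (1) measures.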

3. Algorithm Based on the DRL

In this section, an algorithm (DRL-CSP) based on DRL is described for solving the 1DCSP. The algorithm model is shown in Figure 2. The sequence of piece lengths of the 1DCSP is used as input, and the vector representation of the wire pieces is obtained from the embedding layer of the neural network. Subsequently, the pointer network with an encoder and decoder structure (also known as the recurrent neural network module, RNN) processes the representation vector at each time step, and its output is passed to a probability function to obtain the cutting stock sequence. The network parameters are trained by reinforcement learning, and the loss of the model is adjusted by batch training to continuously optimize the reward function and obtain the optimal solution.

3.1. Policy Network Based on Pointer Network

Considering the temporal relationship between the cuts in a cutting sequence of the 1DCSP, we use the classical pointer network to predict the cutting trajectory of wire pieces, that is, to model the order in which the pieces are selected for cutting. As described in the related literature [40], the output dictionary of a pointer network is variable in size, which is a distinct advantage for sequence problems. The structure of the pointer network is presented in Figure 3. At each time step t_i, the length information of the wire pieces is processed into a latent memory state sequence by the embedding layer and input to the long short-term memory (LSTM) unit in the encoder [53]. LSTM is a special RNN that can learn long-range dependencies and retains a memory of earlier elements of the sequence. That is, the input of the encoder network at time step t_i is a one-dimensional embedding of l_i, obtained through a linear transformation shared by all input pieces. In the last step of the encoder network, the unit state h^{m−1} and the output of the LSTM are provided to the decoder network. At each time step in the decoder network, the pointing mechanism is used to select the next piece to be cut. Once the next piece is selected, it is used as the input to the next decoder step. For example, in Figure 3, the output of step 2 in the decoder network is 5. Then, the element l₅ of the encoder network is selected as the input of step 3 of the decoder network. The pointer to the input element is represented by u_{n}^{m}, and the hidden states of the encoder and decoder are represented by e_n and h_m, respectively. Moreover, the softmax function is used to normalize the pointer scores into a probability distribution over the input elements. The probability of output nodes can be calculated by Formulae (3) and (4).

u_{n}^{m} = v^{T} \tanh (W [e_{n}, h_{m}])

(3)

p = \operatorname{softmax} (u^{m})

(4)

where v and W are the parameters to be trained. Taking C_i as the sequence of pieces to be cut and L= {l₁, l₂…, l_j} as the set of wire pieces, the chain rule is used to estimate the conditional probability p, as shown in Formula (5).

p (C_{m} | C_{1}, \dots, C_{m - 1}, L) = \operatorname{softmax} (u^{m})

(5)

In addition, when predicting the sequence of pieces to be cut, the unselected piece with the maximum probability is taken as the target output of the prediction network, as shown in Formula (6). The output node is then marked as selected.

l = \arg \max (p)

(6)
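A single decoding step built from Formulae (3), (4), and (6) can be sketched in plain Python as follows. This is an illustration under stated assumptions: the hidden states are given as lists of floats, W is a list of rows, and pieces that were already selected are masked with a score of minus infinity so that only unselected pieces can be chosen. The function name and argument layout are hypothetical.

```python
import math


def pointer_step(encoder_states, decoder_state, W, v, selected):
    """Return (probabilities, chosen index) for one decoder step.

    Scores follow u_n = v^T tanh(W [e_n, h_m]); already selected
    pieces are masked (an assumption matching "unselected piece").
    """
    if len(selected) >= len(encoder_states):
        raise ValueError("all pieces have already been selected")
    scores = []
    for n, e_n in enumerate(encoder_states):
        if n in selected:
            scores.append(-math.inf)
            continue
        concat = list(e_n) + list(decoder_state)  # [e_n, h_m]
        hidden = [math.tanh(sum(w * x for w, x in zip(row, concat)))
                  for row in W]
        scores.append(sum(v_k * h_k for v_k, h_k in zip(v, hidden)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # masked entries become 0.0
    total = sum(exps)
    probs = [x / total for x in exps]           # softmax, Formula (4)
    choice = max(range(len(probs)), key=probs.__getitem__)  # Formula (6)
    return probs, choice


# Three pieces with 1-dimensional encoder states; piece 2 is already cut.
probs, choice = pointer_step(
    encoder_states=[[1.0], [2.0], [3.0]],
    decoder_state=[0.0],
    W=[[1.0, 0.0]],
    v=[1.0],
    selected={2},
)
print(choice, probs)
```

With these toy weights the score of piece n is tanh(e_n), so the unmasked piece with the larger state (index 1) is selected.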

3.2. Network Training Based on Reinforcement Learning

Supervised learning was applied in the research of Vinyals O et al. [40] and achieved certain results. However, for large-scale instances, it is usually difficult and costly to obtain ground-truth labels. Bello I et al. [41] obtained the optimal solution on 200 items by using reinforcement learning training, which shows the feasibility of reinforcement learning for solving combinatorial optimization problems without supervision. In this paper, we adopt a strategy based on the reinforcement learning algorithm to train the network model. The theoretical model of the algorithm is the Markov decision process. The agent learns its actions autonomously through the feedback of the reward value in the given environment. After a series of state and action updates, the cutting stock sequence corresponding to the optimal objective function can be obtained. For the wire cutting problem with j pieces, the episode information based on a specific strategy can be modeled as Formula (7).

s_{0}, a_{1}, s_{1}, r_{2}, a_{2}, s_{2}, r_{3}, \dots, a_{j}, s_{j}, r_{j + 1} \sim π (a | s)

(7)

where a_i and s_i represent the action and state of stage i, respectively. s₀ is the state when raw materials have not been cut, and strategy π (a|s) is the strategy for selecting the next piece to be cut. We use the reinforcement learning algorithm based on Monte Carlo updating to train the network parameters and follow the rules of this algorithm in the research of Williams et al. [54]. That is, the learning of the state value and the updating of the reward can only be carried out after a complete traversal of the sequence of wire pieces to be cut. R_t is the total discounted reward that the agent collects from the environment from time t onward. The discount factor γ weights future rewards against current ones. After a complete episode, the accumulated discounted reward can be obtained, as shown in Formula (8). Every time the state updates, the average reward of the state can be calculated to guide the adjustment of the sequence of pieces to be cut. The theory of Monte Carlo reinforcement learning is elaborated in the relevant literature [55].

R_{t} = r_{t + 1} + γ r_{t + 2} + \dots + γ^{j - t} r_{j + 1}

(8)
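The accumulated discounted reward of Formula (8) can be computed for every time step with one backward pass over an episode. The sketch below assumes the rewards are stored in time order in a Python list; the function name is illustrative.

```python
def discounted_returns(rewards, gamma):
    """Return the discounted return for every step of one episode.

    rewards[t] is the reward received after step t (list order = time
    order); returns[t] = rewards[t] + gamma * returns[t + 1].
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A reward that arrives only at the end of the episode.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
print(discounted_returns([0.0, 0.0, 1.0], gamma=1.0))  # [1.0, 1.0, 1.0]
```

With a discount factor of 1, as used in the experiments of Section 4, every step of the episode receives the full final reward.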

The neural network policy parameterized by θ is defined as π (o|l, θ), which is the probability that cutting sequence o is adopted given the information l of the pieces to be cut. The purpose of training is to maximize the expected utilization rate ρ of raw materials in Formula (9). The unit state h^t at time t acquired from the RNN in the prediction network is fed to the reinforcement learning training, and the strategy function π_t at time t is obtained to represent the state behavior value at time t. It indicates how good or bad taking action a_t is, given the piece cutting probability P and the current state s_t.

J (θ | l) = \mathbb{E}_{o \sim π_{θ} (\cdot | l)} [ρ (o | l)]

(9)

As described above, the Monte Carlo reward update is episodic: the average reward is calculated to replace the value function and guide the change of state and action. In the training process, the input is the sequence of wire pieces and the corresponding piece information, and the output is the updated parameters of the pointer network. Under the guidance of the pointer network, the wire pieces are cut from raw materials of the given specification, and the cutting results are used as rewards. The number of complete trajectories sampled for each gradient update is defined as M, and the parameter optimization based on the policy gradient is shown in Formula (10). The cutting utilization ρ is calculated from the consumption of raw materials after the cutting sequence o is obtained. In the network training process, the loss function of the model parameters is optimized on randomly drawn batches of wire piece data. The adaptive moment estimation (Adam) optimizer is used to minimize the loss function by stochastic gradient descent. The network parameters are then updated so that the predicted value of the model gradually approaches the real action value. During testing, a greedy strategy is used: in each step, the prediction with the highest probability is selected as the output. The network training process based on DRL is given in Algorithm 1.

\nabla_{θ} J (θ | l) \approx \frac{1}{M} \sum_{i = 1}^{M} ρ (o_{i} | l_{i}) \nabla_{θ} \log π_{θ} (o_{i} | l_{i})

(10)
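The reward ρ(o | l) in Formula (10) depends on how a cutting sequence is turned into a cutting plan, which the text does not spell out. The sketch below assumes a simple next-fit rule: pieces are cut in the order given by o, and a new raw material is opened whenever the next piece does not fit on the current one. Both the rule and the function names are assumptions made for illustration.

```python
def cut_in_sequence(piece_lengths, order, bar_length):
    """Cut pieces in the given order and return the leftover of each bar.

    Assumed next-fit rule: a new bar is opened when the next piece is
    longer than what remains on the current bar.
    """
    leftovers = []
    for idx in order:
        length = piece_lengths[idx]
        if length > bar_length:
            raise ValueError("piece is longer than the raw material")
        if not leftovers or length > leftovers[-1]:
            leftovers.append(bar_length)  # open a new raw material
        leftovers[-1] -= length
    return leftovers


def utilization(piece_lengths, order, bar_length):
    """Utilization rate: total piece length / total raw material length."""
    leftovers = cut_in_sequence(piece_lengths, order, bar_length)
    used = sum(piece_lengths[i] for i in order)
    return used / (len(leftovers) * bar_length)


pieces = [600, 500, 400, 500]
print(utilization(pieces, [0, 1, 2, 3], 1000))  # three bars are needed
print(utilization(pieces, [0, 2, 1, 3], 1000))  # two bars, no waste
```

The two orders use the same pieces but consume three and two raw materials respectively, which is why the cutting sequence alone is enough to serve as the decision variable of the policy.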

At present, there is no large-scale training data set for the 1DCSP. Because of their simple structure, one-dimensional wire pieces in practical applications can be obtained by cutting wire rods or raw materials of different specifications. Therefore, the piece data set used for reinforcement learning training in this study is generated by cutting. The specific generation process of the data set is as follows: a piece L with a fixed specification is randomly cut into at most n small pieces of variable length, and the piece set is initialized as {l₁, l₂, ..., l_n}. In the cutting process, the minimum length of a piece is set to l_min, and the maximum length is set to l_max. The same operation is carried out on all sample pieces, and the piece training set S is obtained.
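The generation procedure described above can be sketched as follows. The details the text leaves open are filled with assumptions: lengths are drawn uniformly as integers, a remainder shorter than l_min is discarded, and the raw length of 100,000 in the usage line is a placeholder, since the text does not state the value used.

```python
import random


def generate_sample(bar_length, n_max, l_min, l_max, rng):
    """Randomly cut one raw piece into at most n_max small pieces
    whose lengths lie in [l_min, l_max]."""
    pieces = []
    remaining = bar_length
    while len(pieces) < n_max and remaining >= l_min:
        length = rng.randint(l_min, min(l_max, remaining))
        pieces.append(length)
        remaining -= length
    return pieces  # any remainder shorter than l_min is dropped


def generate_training_set(num_samples, bar_length, n_max, l_min, l_max, seed=0):
    """Apply the same cutting operation to every sample piece."""
    rng = random.Random(seed)
    return [generate_sample(bar_length, n_max, l_min, l_max, rng)
            for _ in range(num_samples)]


# l_min, l_max, n_max and the sample count follow Section 4;
# the raw length is a hypothetical placeholder.
training_set = generate_training_set(500, 100_000, 500, 1000, 5000)
print(len(training_set), len(training_set[0]))
```

Generating instances by cutting guarantees that every training sample has a known waste-free solution, although the training described in this paper uses only the reward and not that solution.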

Algorithm 1 Network training based on the RL

procedure TRAIN(training set S, number of training steps T, batch size B)
    Initialize pointer network parameters θ
    for t = 1 to T do
        Select a batch of samples s_i for i \in {1, 2, …, B}
        Send s_i to the pointer network and sample a cutting solution o_i based on π_{θ} (\cdot | l_{i}) for i \in {1, 2, …, B}
        Obtain the cutting utilization rate ρ (o_{i} | l_{i})
        Let g_{θ} = \frac{1}{B} \sum_{i = 1}^{B} [ρ (o_{i} | l_{i}) \nabla_{θ} \log π_{θ} (o_{i} | l_{i})]
        Update θ = ADAM (θ, g_{θ})
    end for
    return pointer network parameters θ
end procedure
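The gradient estimate g_θ in Algorithm 1 can be illustrated with a deliberately simplified policy. In the sketch below the pointer network is replaced by one trainable logit per piece, and an order is sampled by repeatedly drawing from a softmax over the pieces not yet selected. This stand-in policy and all function names are assumptions made to keep the example self-contained; it is not the network used in this paper.

```python
import math
import random


def sample_order(logits, rng):
    """Sample a cutting order: softmax draws over the unselected pieces."""
    remaining = list(range(len(logits)))
    order = []
    while remaining:
        weights = [math.exp(logits[i]) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        order.append(pick)
        remaining.remove(pick)
    return order


def grad_log_prob(logits, order):
    """Gradient of log pi_theta(order) with respect to the logits."""
    grad = [0.0] * len(logits)
    remaining = list(range(len(logits)))
    for pick in order:
        weights = [math.exp(logits[i]) for i in remaining]
        total = sum(weights)
        for i, w in zip(remaining, weights):
            grad[i] -= w / total  # minus the softmax probability
        grad[pick] += 1.0         # plus one for the chosen piece
        remaining.remove(pick)
    return grad


def batch_gradient(logits, orders, rewards):
    """g_theta = (1/B) * sum_i rho_i * grad log pi_theta(o_i)."""
    batch_size = len(orders)
    g = [0.0] * len(logits)
    for order, rho in zip(orders, rewards):
        for k, d in enumerate(grad_log_prob(logits, order)):
            g[k] += rho * d / batch_size
    return g


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
orders = [sample_order(logits, rng) for _ in range(4)]
rewards = [1.0, 0.5, 0.5, 1.0]  # placeholder utilization rates
print(batch_gradient(logits, orders, rewards))
```

In Algorithm 1 this gradient is handed to the Adam optimizer; a plain ascent step such as θ ← θ + α g_θ would be the simplest substitute in this toy setting.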

4. Calculation Experiment and Analysis

This study uses the Python programming language and builds the neural network with PyTorch [56]. The DRL-CSP algorithm is tested on a computer with a 2.30 GHz AMD Ryzen 7 3750H CPU with 4 cores and 16 GB of RAM. The training set is obtained by the cutting generation method, and the test set is composed of instance data and randomly generated data. The performance of the DRL-CSP algorithm is tested on 1DCSP instances from the literature of Peiyong Li et al. [57] and Xianjun Shen et al. [58]. These instances are real steel cutting stock data from enterprises, as shown in Table 1 and Table 2. Furthermore, randomly generated large-scale instances with more than 1000 pieces to be cut are also used to test the performance of the algorithm model, as given in Table 3. In the experiment, the utilization rate of raw materials is taken as the reward, so both the length consumption and the residual length of raw materials are within the scope of optimization. Each piece L with a fixed specification can be cut into at most 500 small pieces. The minimum piece length l_min is set to 1000, and the maximum length l_max is set to 5000. The number of samples is set to 500, and the number of training steps is set to 50. The batch size B is set to 32, and the hidden dimension of the LSTM cell is set to 32. The initial learning rate of the model is set to 1e-3, and the discount factor of the reward is set to 1. The training takes approximately 90 min. Figure 4 shows how the loss value of the network changes as the number of training epochs increases.

4.1. Experimental Results

Table 4 compares our experimental results with the algorithm results in the related literature [57,58,59]. The specific cutting stock results of instances S1 and S2 based on the DRL-CSP algorithm can be found in Table 5 and Table 6. Table 4 shows that the DRL-CSP algorithm proposed in this paper achieves results similar to those of current excellent cutting stock algorithms on instance S1, with 28 specifications of pieces, and instance S2, with 35 specifications of pieces. In instance S1, the DRL-CSP algorithm consumes 24 raw materials, which is one more than the classical algorithms (HGA [57], AGAPSO [58], DEPES [59], PES [59]) and one more than the theoretical lower bound. In addition, the average utilization rate of raw materials reaches 94.56%, and the average utilization rate with the maximum remaining material removed rises to 96.43%. It is approximately 3% less than the theoretical optimal solution, which belongs to the category of a satisfactory solution. The length of the maximum remaining material obtained by the DRL-CSP algorithm is 7587, which is 849 mm longer than that obtained by the algorithm based on the HGA. This result is more conducive to the reuse of residual material. In instance S2, the DRL-CSP algorithm consumes 15 raw materials, the same quantity as the AGAPSO algorithm. The average utilization rate and the average utilization rate with the maximum remaining material removed are 90.99% and 95.68%, respectively. These results follow a pattern similar to that of the AGAPSO results. In addition, the length of the maximum remaining material reaches 74.6% of the length of the raw material, which benefits the utilization of the residual material. In instance S3, an average utilization rate of 92.41% with the maximum remaining material removed is obtained for 1497 pieces to be cut, which exceeds the 90% utilization rate threshold for practical application of the 1DCSP [17].
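The two utilization figures can be related by a short calculation. This section does not define the "average utilization rate with the removal of the maximum remaining material", so the sketch below rests on an assumption: it is read as the mean per-bar utilization after excluding the single raw material that carries the largest leftover, with all raw materials of equal length. Under that reading, the rounded S2 figures quoted above are mutually consistent. The function name is illustrative.

```python
def utilization_rates(leftovers, bar_length):
    """Return (average utilization, average utilization without the bar
    that holds the largest leftover). Needs at least two bars."""
    per_bar = [1 - t / bar_length for t in leftovers]
    average = sum(per_bar) / len(per_bar)
    worst = leftovers.index(max(leftovers))
    rest = per_bar[:worst] + per_bar[worst + 1:]
    return average, sum(rest) / len(rest)


# Toy plan: leftovers of 0, 100 and 500 on bars of length 1000.
print(utilization_rates([0, 100, 500], 1000))

# Consistency check with the rounded S2 figures quoted in the text:
# 15 bars, 90.99% average utilization, largest leftover = 74.6% of a bar.
bars, average, max_leftover_share = 15, 0.9099, 0.746
s2_without_max = (bars * average - (1 - max_leftover_share)) / (bars - 1)
print(round(s2_without_max, 4))  # close to the reported 95.68%
```

The same check cannot be repeated for S1 here because the raw material length of that instance is not stated in this section.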

By analyzing Table 5 and Table 6, a 100% utilization rate of raw materials appears in two data sets based on the DRL-CSP. In addition, the length of the maximum remaining material of instance S1 is 7587, and the residual material of most raw materials is less than 100, with almost no waste of residual material. This indicates that the cutting stock scheme of this instance has been effectively optimized. It is worth mentioning that in all instance calculations based on the DRL-CSP algorithm, the operation time is less than 1 s, which indicates that the algorithm model has high efficiency. Through the comparison of test results, it can be seen that the algorithm model of the DRL-CSP proposed in this paper has good generalization performance and solving efficiency, which can realize the cutting stock of wire pieces of various sizes in a very short time and obtain satisfactory solutions, indicating great practical application potential.

4.2. Analysis and Discussion

The above experimental results show that the DRL-CSP algorithm can efficiently solve the 1DCSP on 82 instances from 3 data sets and can obtain some satisfactory solutions comparable with those of the classical algorithms. The algorithm meets the requirement of high raw material utilization and also handles large-scale cutting stock instances robustly, which is related to the learning and optimization mode of the DRL model. The DRL model can automatically learn the cutting rules by exploiting the shared underlying pattern and the data update mode of the wire piece cutting scheme, avoiding the solution defects that manual intervention causes in heuristic methods. Therefore, the DRL-CSP algorithm has stronger generalization performance and versatility. Figure 4 shows how the loss function changes with the number of epochs during reinforcement learning training. As the number of epochs increases, the loss curve decreases and its fluctuation tends to converge. A possible reason is that, as training proceeds, the model carries out a series of learning optimizations and adjustments through backpropagation and gradient optimization, so that the learning amplitude of the network fluctuates toward more accurate prediction.

In the 1DCSP solution of instance S1, the calculation result based on the DRL-CSP algorithm reaches the category of a satisfactory solution. However, the calculations based on the HGA, AGAPSO, DEPES, and PES algorithms achieve an average utilization rate of 97.67%, and the calculations based on the AGAPSO, DEPES, and PES algorithms have more remaining material. This result means that the DRL-CSP algorithm does not perform as well as certain classical algorithms in some instances, which is related to the different calculation principles of the algorithms. The DRL-CSP algorithm uses the principles of deep learning and reinforcement learning and obtains a stable algorithm model by training on a large number of data sets. In the process of training, the setting of the training set and network parameters has a great impact on the model performance. The classical heuristic algorithm can approximate the solution of specific instances by manually setting the solution rules and has a certain solving efficiency. Therefore, for some specific instances, a better solution can be obtained by the classical heuristic algorithm. The two approaches differ in solving principles, algorithm designs, and solving effects on different problem instances. Constructing a more efficient network solution structure and optimizing network parameters and training sets may further improve the performance of the DRL model. In general, the DRL-CSP algorithm achieves results similar to those of the classical cutting stock algorithms on instance S1, with 28 specifications of pieces, and instance S2, with 35 specifications of pieces, which represents a certain advance. Research on cutting stock algorithms based on DRL can provide new ideas for online solutions to cutting stock problems as well as a new reference for solving other combinatorial optimization problems, so it has great theoretical significance and application potential.

In the 1DCSP solution of instance S3, in which 1497 pieces must be cut, an average utilization rate of 92.41% is achieved after removing the raw material that contains the maximum remaining material, and the remaining material is long. However, unlike instances S1 and S2, the utilization rate after this removal does not exceed 95%. A possible reason is that the cuts are distributed relatively uniformly over the raw materials, so the gradient of the residual materials is not obvious. The result further shows that the selection of sample data, the formulation of the mathematical model, and the distribution of the instance data have a considerable impact on the performance of the DRL model in solving large-scale CSPs.

5. Conclusions and Future Work

In this study, a mathematical model is established for the 1DCSP with the optimization objective of minimum raw material consumption and maximum remaining material length. An algorithm based on DRL is proposed to solve the 1DCSP, and the underlying mode shared by 1DCSP instances is captured using a deep learning network framework. Furthermore, reinforcement learning training is used to optimize the network parameters and automatically learn the cutting stock rules. To the best of our knowledge, this is the first time that the 1DCSP has been solved with the help of a deep neural network. The experimental results show that the DRL-CSP algorithm not only solves the 1DCSP efficiently for different specifications, but also obtains satisfactory solutions comparable with those of some classical algorithms. Moreover, a raw material utilization rate higher than 90% is obtained on large-scale data sets, which shows the great potential of the DRL-CSP algorithm for solving practical cutting stock problems.

In future research, we will explore a more efficient network structure and transfer mechanism for solving, improve the optimization of the deep network, and consider applying deep networks to two-dimensional nesting problems.

Author Contributions

J.F.: conceptualization, methodology, validation, formal analysis, writing—original draft, writing—review and editing; Y.R.: supervision, project administration, writing—review and editing; Q.L.: writing—review and editing; J.X.: writing—review and editing; All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number 51975231) and the Fundamental Research Funds for the Central Universities (grant number 2019kfyXKJC043).

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the referees for their constructive comments that improved the presentation as well as the content of the paper.

Conflicts of Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and that there is no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

References

Stadtler, H. A one-dimensional cutting stock problem in the aluminium industry and its solution. Eur. J. Oper. Res. 1990, 44, 209–223.
Johnson, M.; Rennick, C.; Zak, E. Skiving Addition to the Cutting Stock Problem in the Paper Industry. SIAM Rev. 1997, 39, 472–483.
Cui, Y. A cutting stock problem and its solution in the manufacturing industry of large electric generators. Comput. Oper. Res. 2005, 32, 1709–1721.
Ogunranti, G.; Oluleye, A. Minimizing waste (off-cuts) using cutting stock model: The case of one-dimensional cutting stock problem in wood working industry. J. Ind. Eng. Manag. 2016, 9, 834–859.
Wattanasiriseth, P.; Krairit, A. An Application of Cutting-Stock Problem in Green Manufacturing: A Case Study of Wooden Pallet Industry. IOP Conf. Ser. Mater. Sci. Eng. 2019, 530, 12005.
Wäscher, G.; Haußner, H.; Schumann, H. An improved typology of cutting and packing problems. Eur. J. Oper. Res. 2007, 183, 1109–1130.
Lima, V.; Alves, C.; Clautiaux, F.; Iori, M.; Valério, D.; José, M. Arc flow formulations based on dynamic programming: Theoretical foundations and applications. Eur. J. Oper. Res. 2022, 296, 3–21.
Dyckhoff, H. A New Linear Programming Approach to the Cutting Stock Problem. Oper. Res. 1981, 29, 1092–1104.
Brandão, F.; Pedroso, J. Bin packing and related problems: General arc-flow formulation with graph compression. Comput. Oper. Res. 2016, 69, 56–67.
Yang, C.; Yang, L.; Sheng, Z. Research on Multi-Branches Tree Traversal Algorithm of One-Dimensional Cutting Stock Problem. Mech. Eng. Autom. 2018, 15, 11–12.
Kang, M.; Yoon, K. An improved best-first branch-and-bound algorithm for unconstrained two-dimensional cutting problems. Int. J. Prod. Res. 2011, 49, 4437–4455.
Lu, H.; Huang, Y. An efficient genetic algorithm with a corner space algorithm for a cutting stock problem in the TFT-LCD industry. Eur. J. Oper. Res. 2015, 246, 51–66.
Haessler, R.; Sweeney, P. Cutting stock problems and solution procedures. Eur. J. Oper. Res. 1991, 54, 141–150.
Wäscher, G.; Gau, T. Heuristics for the integer one-dimensional cutting stock problem: A computational study. Oper. Res. Spektrum 1996, 18, 131–144.
Wu, Z.; Zhang, L.; Wang, K. An Ant Colony Algorithm for One-dimensional Cutting-stock Problem. Mech. Sci. Technol. Aerosp. Eng. 2008, 27, 1681–1684.
Guan, W.; Gong, J.; Xue, H. A Hybrid Heuristic Algorithm for the One-Dimensional Cutting Stock Problem. Mach. Des. Manuf. 2018, 8, 237–239.
Zhu, S. The Research on Optimization Algorithms for one-Dimensional Cutting Stock Problems. Master’s Thesis, Huazhong University of Science and Technology, Wuhan, China, 2013.
Cui, Y.; Song, X.; Chen, Y. New model and heuristic solution approach for one-dimensional cutting stock problem with usable leftovers. J. Oper. Res. Soc. 2017, 68, 269–280.
Ma, J.; Han, Z.; Luo, D.; Xiao, H. Research on One-Dimensional Cutting Stock Problem Based on Recursive Matrix Column Generation Algorithm. Mach. Des. Manuf. 2022, 117–119.
Belov, G.; Scheithauer, G. Setup and Open-Stacks Minimization in One-Dimensional Stock Cutting. INFORMS J. Comput. 2007, 19, 27–35.
Cao, J.; Cui, Y.; Li, D. Study on the solution of one-dimensional cutting stock for multiple stock lengths with variable cross-section. Forg. Stamp. Technol. 2017, 42, 161–165.
Cerqueira, G.; Aguiar, S.; Marques, M. Modified Greedy Heuristic for the one-dimensional cutting stock problem. J. Comb. Optim. 2021, 42, 657–674.
Ravelo, S.; Meneses, C.; Santos, M. Meta-heuristics for the one-dimensional cutting stock problem with usable leftover. J. Heuristics 2020, 26, 585–618.
Pimenta, Z.; Sakuray, F.; Hoto, R. A heuristic for the problem of one-dimensional steel coil cutting. Comput. Appl. Math. 2021, 40, 39.
Tian, S.; Lv, L.; Cai, Y. Design and implementation of a simple algorithm for solving one dimensional cuttking block problem based on Lingo. Ind. Sci. Trib. 2021, 20, 45–47.
Zhang, C.; Song, W.; Cao, Z.; Zhang, J.; Tan, P.; Xu, C. Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 1621–1632.
Park, J.; Chun, J.; Kim, S.; Kim, Y. Learning to schedule job-shop problems: Representation and policy learning using graph neural network and reinforcement learning. Int. J. Prod. Res. 2021, 59, 3360–3377.
Li, J.; Ma, Y.; Gao, R.; Cao, Z.; Lim, A.; Song, W.; Zhang, J. Deep Reinforcement Learning for Solving the Heterogeneous Capacitated Vehicle Routing Problem. IEEE Trans. Cybern. 2022, 52, 13572–13585.
Xin, L.; Song, W.; Cao, Z.; Zhang, J. Step-Wise Deep Learning Models for Solving Routing Problems. IEEE Trans. Ind. Inform. 2021, 17, 4861–4871.
Kool, W.; Van, H.; Welling, M. Attention, learn to solve routing problems. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
Xin, L.; Song, W.; Cao, Z.; Zhang, J. NeuroLKH: Combining Deep Learning Model with Lin-Kernighan-Helsgaun Heuristic for Solving the Traveling Salesman Problem. Adv. Neural Inf. Process. Syst. 2021, 34, 7472–7483.
Ivanov, D.; Kiselev, M.; Larionov, D. Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations. arXiv 2022, arXiv:2201.02571.
Zhou, R.; Tian, Y.; Wu, Y.; Du, S. Understanding Curriculum Learning in Policy Optimization for Solving Combinatorial Optimization Problems. arXiv 2022, arXiv:2202.05423.
Peng, B.; Wang, J.; Zhang, Z. A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems. Commun. Comput. Inf. Sci. 2020, 1205, 636–650.
Pitombeira-Neto, A.R.; Murta, A.H.F. A reinforcement learning approach to the stochastic cutting stock problem. Eur. J. Comput. Optim. 2022, 10, 100027.
Fang, J.; Rao, Y.; Zhao, X.; Du, B. A Hybrid Reinforcement Learning Algorithm for 2D Irregular Packing Problems. Mathematics 2023, 11, 327.
Zhang, W.; Tang, S.; Su, J.; Xiao, J.; Zhuang, Y. Tell and guess: Cooperative learning for natural image caption generation with hierarchical refined attention. Multimed. Tools Appl. 2021, 80, 16267–16282.
Xia, B.; Wong, C.; Peng, Q.; Yuan, W.; You, X. CSCNet: Contextual semantic consistency network for trajectory prediction in crowded spaces. Pattern Recognit. 2022, 126, 108552.
Song, J.; Kim, S.; Yoon, S. AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate. In Proceedings of the 2021 Conference On Empirical Methods in Natural Language Processing (EMNLP 2021), Punta Cana, Dominican Republic, 7–11 November 2021; pp. 1–14.
Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2692–2700.
Bello, I.; Pham, H.; Le, Q.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. In Proceedings of the 5th International Conference on Learning Representations, ICLR 2017 Workshop Track Proceedings, Toulon, France, 24–26 April 2017.
Chen, H.; Fan, J.; Liu, Y. Solving dynamic traveling salesman problem by deep reinforcement learning. J. Comput. Appl. 2022, 42, 1194–1200.
Lombardi, M.; Milano, M. Boosting combinatorial problem modeling with machine learning. In Proceedings of the 27th IJCAI International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 5472–5478.
Joshi, C.; Thomas, L.; Bresson, X. An Efficient Graph Convolutional Network Technique for the Travelling Salesman Problem. arXiv 2019, arXiv:1906.01227.
Bogyrbayeva, A.; Yoon, T.; Ko, H.; Lim, S.; Yun, H.; Kwon, C. A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone. arXiv 2021, arXiv:2112.12545.
Zhao, H.; She, Q.; Zhu, C.; Yang, Y.; Xu, K. Online 3D bin packing with constrained deep reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Online, 2–9 February; 35, pp. 741–749.
Kang, M.; Oh, J.; Lee, Y.; Park, K.; Park, S. Selecting Heuristic Method for One-dimensional Cutting Stock Problems Using Artificial Neural Networks. Korean J. Comput. Des. Eng. 2020, 25, 67–76.
Almeida, R.; Steiner, M. Resolution of one-dimensional bin packing problems using augmented neural networks and minimum bin slack. Int. J. Innov. Comput. Appl. 2016, 7, 214–224.
Kantorovich, L. Mathematical Methods of Organizing and Planning Production. Manag. Sci. 1960, 6, 366–422.
Gilmore, P.; Gomory, R. A Linear Programming Approach to the Cutting Stock Problem–Part II. Oper. Res. 1963, 11, 863–888.
Haessler, R. A Heuristic Programming Solution to a Nonlinear Cutting Stock Problem. Manag. Sci. 1971, 17.
Haessler, R. Controlling Cutting Pattern Changes in One-Dimensional Trim Problems. Oper. Res. 1975, 23, 483–493.
Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
Williams, R. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 1992, 8, 229–256.
Fang, J.; Rao, Y.; Guo, X.; Zhao, X. A reinforcement learning algorithm for two-dimensional irregular packing problems. In Proceedings of the ACAI’21: 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence, Sanya, China, 22–24 December 2021.
Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Chintala, S.; Killeen, T.; Gimelshein, N.; Lin, Z.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems 32 (NIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; pp. 8024–8035.
Li, P.; Wang, Q.; Qiu, Y. Optimization for One-Dimensional Cutting Using Hybrid Genetic Algorithm. J. Shanghai Jiaotong Univ. 2001, 35, 1557–1560.
Shen, X.; Yang, J.; Ying, W. Adaptive General Particle Swarm Optimization for One-Dimension Cutting Stock Problem. J. South China Univ. Technol. (Nat. Sci. Ed.) 2007, 35, 113–117.
Hou, G. Research of One-dimensional Cutting Stock Problem Based on Improved Pyramid Evolution Strategy. Master’s Thesis, Wuhan University of Technology, Wuhan, China, 2020.

Figure 1. Diagram of the cutting stock scheme. (a) Ideal cutting stock scheme (b) Undesirable cutting stock scheme.

Figure 2. The DRL-CSP model.

Figure 3. Pointer network structure.

Figure 4. Loss curve of algorithm model training.

Table 1. Instance S1 based on steel cutting stock.

Instance	Raw Material Length L (mm)	Length (mm) and Quantity of Steel Pieces
		Length	Quantity	Length	Quantity	Length	Quantity
S1	18,000	3280	4	2786	6	1908	8
		3275	4	2757	5	1849	9
		3085	4	2680	4	1812	5
		3005	5	2543	3	1770	12
		2952	4	2304	10	1712	16
		2950	4	2167	4	1689	8
		2868	2	2162	16	1404	8
		2859	4	2006	8	1352	8
		2830	6	1975	8	1315	8
		1308	8	-	-	-	-

Table 2. Instance S2 based on steel cutting stock.

Instance	Raw Material Length L (mm)	Length (mm) and Quantity of Steel Pieces
		Length	Quantity	Length	Quantity	Length	Quantity
S2	25,800	1615	8	1850	1	2686	1
		1622	8	1955	8	2773	1
		1634	8	1968	8	2777	4
		1655	8	2068	8	2788	1
		1709	8	2162	16	2799	1
		1722	8	2213	2	2832	4
		1770	8	2334	8	2838	2
		1812	1	2591	3	2843	4
		1849	8	2674	1	2851	1
		2891	1	2987	4	3082	1
		3352	2	3373	2	3388	4
		4273	4	4288	4	-	-
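The S2 data in Table 2 allow a quick consistency check against Table 4. The sketch below assumes that a utilization rate is the total demanded piece length divided by the total length of the raw materials consumed, and it uses the standard material lower bound (total demand divided by the raw material length, rounded up). The variable names are ours.

```python
import math

STOCK_LENGTH = 25_800  # raw material length L of instance S2, in mm

# (piece length in mm, quantity) pairs copied row by row from Table 2.
S2_DEMAND = [
    (1615, 8), (1850, 1), (2686, 1),
    (1622, 8), (1955, 8), (2773, 1),
    (1634, 8), (1968, 8), (2777, 4),
    (1655, 8), (2068, 8), (2788, 1),
    (1709, 8), (2162, 16), (2799, 1),
    (1722, 8), (2213, 2), (2832, 4),
    (1770, 8), (2334, 8), (2838, 2),
    (1812, 1), (2591, 3), (2843, 4),
    (1849, 8), (2674, 1), (2851, 1),
    (2891, 1), (2987, 4), (3082, 1),
    (3352, 2), (3373, 2), (3388, 4),
    (4273, 4), (4288, 4),
]

total_demand = sum(length * quantity for length, quantity in S2_DEMAND)

# No cutting plan can use fewer raw materials than this bound.
lower_bound = math.ceil(total_demand / STOCK_LENGTH)

# Utilization if the whole demand is cut from 15 raw materials.
utilization_15 = total_demand / (15 * STOCK_LENGTH)

print(len(S2_DEMAND))            # 35
print(total_demand)              # 360883
print(lower_bound)               # 14
print(f"{utilization_15:.2%}")   # 93.25%
```

Under these assumptions, the 35 specifications match the count given in the discussion, the 15 raw materials reported for S2 are one more than the material lower bound, and 93.25% agrees with the AGAPSO row of Table 4.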

Table 3. Instance S3 based on large-scale wire pieces.

Instance	Raw Material Length L (mm)	Length (mm) and Quantity of Large-Scale Wire Pieces
		Length	Quantity	Length	Quantity	Length	Quantity
S3	18,000	4600	20	4300	140	3000	160
		4000	140	4150	30	3800	60
		3205	99	2838	34	2240	28
		2838	34	2860	10	2500	80
		2400	120	2334	67	2162	156
		1968	29	1955	100	1800	50
		1600	140	-	-	-	-

Table 4. Results information of the 1DCSP.

Instance	Algorithm	Quantity of Raw Materials Consumed	Average Utilization Rate	Average Utilization Rate (Removal of the Raw Material Containing the Maximum Remaining Material)	Length of the Maximum Remaining Material
S1	DRL-CSP	24	94.56%	96.43%	7587
	HGA	23	97.67%	99.27%	6738
	AGAPSO	23	97.67%	99.58%	8529
	DEPES	23	97.67%	99.82%	8909
	PES	23	97.67%	99.74%	8621
S2	DRL-CSP	15	90.99%	95.68%	19,255
S2	AGAPSO	15	93.25%	99.47%	24,185
S3	DRL-CSP	250	91.99%	92.41%	11,600

Table 5. The specific cutting stock results of instance S1.

Label of the Raw Material	Piece Length/mm (Quantity of Pieces)						Length of the Remaining Material/mm	Utilization Rate/%
1	1315	1308	2006	3280	2868	2757	0	100
1	2304	2162	-	-	-	-	0	100
2	1404	2786(2)	1352	3275	2859	1849	0	100
2	1689	-	-	-	-	-	0	100
3	1352	1975(2)	3280	2786	2162(2)	2304	4	99.98
4	1689	1712	1770(2)	2920	2868	2006	5	99.97
4	1908	1352	-	-	-	-	5	99.97
5	1975	1849	1712(2)	3005	2757	2680	6	99.97
5	2304	-	-	-	-	-	6	99.97
6	1689	1849	1352	3085	2859	2162	7	99.96
6	2830	2167	-	-	-	-	7	99.96
7	1812	1770	1712(2)	1849(2)	2006(2)	3275	9	99.95
8	1315	1352(2)	1712(2)	2952	2304	1308	12	99.93
8	2006	1975	-	-	-	-	12	99.93
9	1812	1849	2167(2)	3280	2830	1712	21	99.88
9	2162	-	-	-	-	-	21	99.88
10	1689	1712	1770(2)	2830	2680	1308	29	99.84
10	2304	1908	-	-	-	-	29	99.84
11	1712	1308(2)	1770	1689	1908(2)	1812	36	99.80
11	2006	2543	-	-	-	-	36	99.80
12	1352	1712	1908(3)	2920	2757	1315	58	99.68
12	2162	-	-	-	-	-	58	99.68
13	1308	1712(3)	1770	1812	1308	1908	79	99.56
13	1849	2830	-	-	-	-	79	99.56
14	2757	2304	2162(5)	2006	-	-	123	99.32
15	2952	2830	2757	2543	2304(2)	2162	148	99.18
16	2859	1404(3)	2786	2162(2)	1770(2)	-	279	98.45
17	2162	2167	2304	3085	2952	1975	569	96.84
17	2786	-	-	-	-	-	569	96.84
18	3005	2920	2859	2830	2786	2680	920	94.89
19	2006	1812	3005(2)	1712(2)	2680	-	2068	88.51
20	1308	1404(2)	1315(2)	3275	2920	2830	2229	87.62
21	3005	1770(2)	3280	3085	1315(2)	-	2460	86.33
22	3275	1352	1689	1404(2)	3085	2952	2839	84.23
23	2543	2304	2162	1975	1849(2)	1308	4010	77.72
24	1315	1770	1975(2)	1689(2)	-	-	7587	57.85
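Each raw material in Table 5 can be checked by summing its pieces. The sketch below assumes that an entry such as 1975(2) means two pieces of length 1975 mm and that rows sharing a label belong to the same raw material; the helper name is hypothetical.

```python
import re

STOCK_LENGTH = 18_000  # raw material length L of instance S1, in mm

# Piece entries of three raw materials, copied from Table 5
# (continuation rows with the same label are joined).
ROWS = {
    1: "1315 1308 2006 3280 2868 2757 2304 2162",
    3: "1352 1975(2) 3280 2786 2162(2) 2304",
    24: "1315 1770 1975(2) 1689(2)",
}


def used_length(entries):
    """Sum a row written as 'length' or 'length(quantity)' tokens."""
    total = 0
    for token in entries.split():
        match = re.fullmatch(r"(\d+)(?:\((\d+)\))?", token)
        if match is None:
            raise ValueError(f"unrecognized entry: {token!r}")
        length = int(match.group(1))
        quantity = int(match.group(2) or 1)  # a bare length means one piece
        total += length * quantity
    return total


for label, entries in ROWS.items():
    used = used_length(entries)
    remainder = STOCK_LENGTH - used
    rate = 100 * used / STOCK_LENGTH
    print(label, remainder, f"{rate:.2f}")
# 1 0 100.00
# 3 4 99.98
# 24 7587 57.85
```

The three remainders and utilization rates agree with the corresponding rows of Table 5.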

Table 6. The specific cutting stock results of instance S2.

Label of the Raw Material	Piece Length/mm (Quantity of Pieces)						Length of the Remaining Material/mm	Utilization Rate/%
1	3352	1655	1722(2)	1955(2)	2334	2162	1	100
1	2838	2068(2)	1968	-	-	-	1	100
2	4273	1615	2334(2)	3388	3373	2851	1	100
2	2832	2799	-	-	-	-	1	100
3	4288	1655	1812	3082	2891	2832	2	99.99
3	2162	2068	2674	2334	-	-	2	99.99
4	4273	1615	1849(3)	3388	2162	1850	21	99.92
4	1634	1770(3)	-	-	-	-	21	99.92
5	4273	1615(3)	4288(2)	3388	1722	2773	223	99.14
6	1968	2162(2)	2838	2068(2)	1709(2)	2686	419	98.38
6	2334	1722	1955	-	-	-	419	98.38
7	2777	1968	1634(2)	1722	2334	2843	614	97.62
7	1709	2591	2987(2)	-	-	-	614	97.62
8	2068	1849(2)	1655(2)	2777	2162	2334	926	96.41
8	1770	1955	1968	2832	-	-	926	96.41
9	4288	1615	1622(2)	4273	3388	2987	1066	95.87
9	2777	2162	-	-	-	-	1066	95.87
10	1849	1955(2)	2213	2162	1968	2068	1558	93.96
10	1722	2591	1655	1770	2334	-	1558	93.96
11	2832	1968(2)	2162(3)	1955(2)	1709(2)	1770(2)	1678	93.50
12	1622	2843(2)	1634(2)	2777	2591	1849	1758	93.19
12	2213	2068	1968	-	-	-	1758	93.19
13	1849	2162(4)	2788	1709(2)	1655(2)	1700	2295	91.10
13	1722	-	-	-	-	-	2295	91.10
14	2843	1722	2987	3352	1634	3373	5037	80.48
14	1622	1615(2)	-	-	-	-	5037	80.48
15	1655	1634(2)	1622	-	-	-	19,255	25.37
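The DRL-CSP figures for S2 in Table 4 can be reproduced from the remaining lengths in Table 6. The sketch assumes that the average utilization rate is one minus the total remaining length divided by the total raw material length, and that the second rate simply drops the raw material with the largest remainder. These definitions are inferred from the tabulated values, and the function name is ours.

```python
STOCK_LENGTH = 25_800  # raw material length L of instance S2, in mm

# Remaining length (mm) of raw materials 1 to 15, copied from Table 6.
REMAINDERS = [1, 1, 2, 21, 223, 419, 614, 926, 1066,
              1558, 1678, 1758, 2295, 5037, 19255]


def average_utilization(remainders, stock_length):
    """Fraction of the consumed raw material that ends up in pieces."""
    return 1 - sum(remainders) / (len(remainders) * stock_length)


overall = average_utilization(REMAINDERS, STOCK_LENGTH)

# Drop the single raw material with the largest remainder, then recompute.
without_max = sorted(REMAINDERS)[:-1]
trimmed = average_utilization(without_max, STOCK_LENGTH)

print(f"{overall:.2%}")   # 90.99%
print(f"{trimmed:.2%}")   # 95.68%
print(max(REMAINDERS))    # 19255
```

The three printed values correspond to the last three columns of the S2 DRL-CSP row in Table 4.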

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

