Article

Research on the Multiobjective and Efficient Ore-Blending Scheduling of Open-Pit Mines Based on Multiagent Deep Reinforcement Learning

1 School of Resources Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
2 School of Information Engineering, Yulin University, Yulin 719000, China
3 Xi’an Key Laboratory for Intelligent Industrial Perception, Calculation and Decision, Xi’an University of Architecture and Technology, Xi’an 710055, China
4 CMOC Group Limited, Luoyang 417500, China
5 School of Management, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(6), 5279; https://doi.org/10.3390/su15065279
Submission received: 7 February 2023 / Revised: 10 March 2023 / Accepted: 13 March 2023 / Published: 16 March 2023

Abstract

In order to address the slow solution speed of the ore-blending process model of polymetallic, multiobjective open-pit mines and its tendency to fall into local optima, an efficient ore-blending scheduling optimization method based on multiagent deep reinforcement learning is proposed. Firstly, according to the actual production situation of the mine, an optimal control model for ore blending was established with the goal of minimizing deviations in ore grade and lithology. Secondly, the open-pit ore-matching problem was transformed into a partially observable Markov decision process, and the ore supply strategy was continuously optimized according to the feedback of the environmental indicators to obtain the optimal decision-making sequence. Thirdly, a multiagent deep reinforcement learning algorithm was introduced, which was trained continuously and modeled the environment to obtain the optimal strategy. Finally, taking a large open-pit metal mine as an example, the trained multiagent deep reinforcement learning model was verified via experiments, with the optimal training model displayed on a graphical interface. The experimental results show that the constructed ore-blending optimization model is more in line with the actual production requirements of a mine. When compared with traditional multiobjective optimization algorithms, the efficiency and accuracy of the solution are greatly improved, and the calculation results can be obtained in real time.

1. Introduction

1.1. Background and Motivation

Metal mineral resources are nonrenewable strategic materials and an important guarantee of national strategic security. China has many low-grade ores and abundant associated ores, so mineral processing is difficult and the comprehensive utilization rate is low. With increasingly complex mine production and operation requirements, previous single-objective plans can no longer meet the fine management of multiple minerals in mining enterprises. Therefore, preparing a scientific and effective production plan for multimetal, multiobjective open-pit mines plays a vital role in improving the comprehensive utilization rate of ores, reducing mine production costs, and promoting the economic development of enterprises.
The preparation of an ore-blending plan mainly focuses on modeling and solutions. The modeling mainly takes ore grade, transportation work, production capacity, etc., as the optimization objectives. Single-objective planning mostly adopts 0–1 integer programming, goal programming, and general linear programming methods [1] for modeling. Among them, Wu Lichun et al. [2] adopted a 0–1 integer programming model and embedded it in DIMINE software to study the optimization problem of open-pit mine ore blending with the goal of a balanced quantity of ore. The solving speed and work efficiency of ore blending were significantly improved. Chen Pengneng et al. [3] established a linear programming model with the goal of maximizing the sales value of phosphate rock and solved it with Matlab software. The digital ore-blending model effectively reduced the cost and harmful impurity content. Ke Lihua et al. [4] conducted ore blending for low-grade dolomite and dolomite-grade foreign products and used the ore blending mathematical model to formulate planned ore-blending schemes, which improved the utilization rate of the mineral resources in Wulongquan. Wang Liguan et al. [5] set up an optimization model for open-pit ore-blending with the goal of minimizing grade deviation by using a programming language and created a solver to solve the problem. The algorithm was applied to a copper-molybdenum mine in Inner Mongolia, with strong practical significance. Li Guoqing et al. [6] established a mining operation planning model for underground metal mines based on 0–1 integer programming with the goal of minimizing grade deviation. A computer model and an integer-programming method were used to obtain the optimal scheme for mining operation planning. The ore could be reasonably matched, and the grade could be balanced.
With the demand for increasingly refined production plans, it is necessary to comprehensively consider multiple production indicators, which are often conflicting, so single-objective planning is gradually transitioning to multiobjective planning. In view of the complexity of multiobjective programming models and the difficulty of solving them manually, modern optimization techniques can be used. Among them, Yao Xudong et al. [7] used an immune clonal selection algorithm to solve a short-term multiobjective ore-matching model for underground metal mines, which could overcome local optima, and the model converged stably. Li Zhiguo et al. [8] built a multiobjective ore-blending model for a phosphate ore storage yard with the goals of meeting the composition index of the grinding and flotation feed, stabilizing the quality of the raw ore, and maximizing raw ore utilization, and solved it using a multiobjective genetic algorithm. Li Ning et al. [9] established a mine production ore-blending model with ore output and grade fluctuation as the variables and solved it using a hybrid particle swarm optimization algorithm. Gu Qinghua et al. [10] used a genetic algorithm to solve a limestone open-pit ore-blending model with the objectives of minimizing grade deviation and the energy consumption of production. Huang Linqi et al. [11] established a metal ore-blending model with grade fluctuation and mining and transportation costs as the output variables and solved it using an adaptive genetic algorithm. Feng Qian et al. [12] used a particle swarm optimization algorithm to solve a sintering-ore-blending model for iron and steel production with the goals of minimizing the cost of blending the materials and maximizing the iron content. Although many studies apply modern optimization methods, such as genetic algorithms, particle swarm optimization, and evolutionary algorithms, to multiobjective ore-blending optimization problems, the following challenges remain. Firstly, most of these algorithms converge slowly and easily fall into local optima. Secondly, once the environment changes, such algorithms cannot be controlled online or respond quickly and effectively to sudden situations. Finally, most of these algorithms convert multiple objectives into a single objective and therefore do not fundamentally solve the multiobjective programming problem.

1.2. Research Status

In recent years, with the development of artificial intelligence, machine learning, and other technologies, deep reinforcement learning has gradually become favored by researchers. Both reinforcement learning and deep learning are branches of machine learning. Reinforcement learning is an algorithm that constantly interacts with the environment to obtain the optimal strategy; it takes the state as the input and outputs the action strategy. Agents accumulate experience through constant trial and error and become more intelligent, making better decisions. From early single-agent reinforcement learning algorithms, such as Sarsa [13], Q-learning [14], and DQN [15], to today’s multiagent reinforcement learning, reinforcement learning has shown great potential and, in recent years, has gained prominence in areas such as autonomous driving [16], finance [17], and gaming [18]. Deep learning, in turn, has attracted much attention in target detection [19], speech recognition [20], image classification [21], and other fields. Reinforcement learning focuses on decision-making, but its perception of massive data is weak; deep learning focuses on perception but lacks decision-making ability. Deep reinforcement learning combines the perception of deep learning with the decision-making ability of reinforcement learning. In the face of high-dimensional, complex optimization problems, it can search efficiently, quickly extract features to perceive abstract concepts, and optimize strategies through rewards and punishments. Some studies have applied deep reinforcement learning to resource optimization and control problems. Deng Zhilong [22] used a deep reinforcement learning algorithm to solve a resource scheduling problem: the deep learning network was trained using prior data obtained from the network nodes, reinforcement learning was used to allocate the network resources, and the effectiveness of the algorithm was demonstrated by simulation experiments. He Fengkai [23] used a deep deterministic policy gradient algorithm to solve a continuous action control optimization problem, which realized the fast, automatic positioning of a robot and achieved good results. Xiaomin Liao [24] used a deep reinforcement learning algorithm to solve the resource allocation problem of cellular networks; the proposed algorithm converges quickly and is clearly superior to other algorithms in optimizing the transmission rate and system energy consumption. Liu Guannan [25] applied a deep reinforcement learning algorithm to the problem of allocating ambulance resources, and it was superior to existing algorithms in terms of global average response time and initial response rate. Kui Hanbing [26] used deep reinforcement learning to solve the multiobjective control optimization problem of plug-in diesel hybrid electric vehicles, and the proposed algorithm achieved better control effects. Hu Daner et al. [27] used a deep reinforcement learning algorithm to solve the reactive power optimization control problem of distribution networks; in the face of an unknown environment, it could coordinate a variety of reactive power compensation equipment, and the simulation results show that the method has low overhead and low delay.
The above literature shows that deep reinforcement learning has advantages that traditional optimization control methods do not have when facing highly uncertain optimization decision problems. On the one hand, the algorithm can be trained offline, learning the optimal strategy through continuous trial and error, and then be controlled online, obtaining calculation results from local state information alone. On the other hand, the algorithm does not rely on an accurate environment model: in the face of unknown environments, agents can achieve real-time decision-making control through local state information. Therefore, following these ideas, we can apply a deep reinforcement learning algorithm to the multiobjective ore-matching optimization problem of polymetallic open-pit mines. It is unnecessary to establish a complex ore-matching optimization model during training; the agent continuously optimizes the network parameters through the reward and punishment function and finally obtains an optimal ore-matching optimization model. During testing, dynamic ore blending is realized: the system calculates the action strategy that each agent will take in the next step in real time according to the manually entered initial state, the ore-blending indexes, and the ore-blending optimization model obtained from training. The agents can respond in real time even if unexpected conditions are encountered during ore-blending scheduling.
In conclusion, this research proposes a method to solve the multimetal, multiobjective ore-matching optimization problem of open-pit mines based on the actor-critic multiagent deep deterministic policy gradient (MADDPG) algorithm [28]. Considering all the indexes of ore grade, lithology, oxidation rate, and ore quantity in the ore-matching scene of an open-pit mine, the ore production sites are treated as multiple agents that coordinate, control, and optimize the ore-matching problem. The time series is discretized, and the interaction process is transformed into a partially observable Markov decision process (POMDP). Each agent constantly interacts with the ore-blending environment, learns new strategies through trial and error, and updates the network parameters according to the feedback of the environment so that the agent obtains the maximum return. Each agent has two neural networks, namely, the strategic (actor) neural network and the evaluation (critic) neural network. The strategic neural network obtains control decisions based on local information from interaction with the environment, and the evaluation neural network optimizes the strategic neural network based on the global information collected. The MADDPG algorithm is used to train the model; the training process needs neither an accurate optimization model of open-pit ore blending nor prediction data. Whereas traditional multiobjective optimization algorithms, such as evolutionary algorithms and particle swarm optimization, take 3–5 min for each calculation, the online coordinated optimization of deep reinforcement learning can give real-time control strategies based on local state information within 100 milliseconds, which is an obvious advantage.
The organizational structure of the paper is as follows: in the first part, the production process of multimetal and multiobjective open-pit ore blending is introduced, and the problem is modeled as a multiobjective optimization problem with the minimum deviation of ore grade and lithology. In the second part, a multimetal and multiobjective ore-matching scheduling algorithm based on multiagent deep reinforcement learning is proposed. In the third part, the simulation experiment is carried out: the simulation scene, experimental data, parameters, and experimental steps are introduced, the simulation results are obtained, and, on this basis, the results are analyzed. The fourth part summarizes the work carried out in this paper.

2. Definition and Modeling of Open-Pit Polymetallic Multiobjective Ore-Matching Problem

2.1. Problem Definition

This paper studies the open-pit multimetal and multitarget ore-matching scene shown in Figure 1. In the scene, there are m ore production sites and n ore-receiving sites; the ore production sites are expressed as $M_i \in \{M_1, M_2, \ldots, M_m\}$, and the ore-receiving sites are expressed as $C_j \in \{C_1, C_2, \ldots, C_n\}$. Due to the different ore grades, oxidation rates, lithologies, and other parameters of the different ore production sites, the prepared ore-blending plans vary greatly. Each ore production site contains b metals, and the grade of each metal is different. The ore supply grades at the ore production sites are expressed as $g_{ik} \in \{g_{11}, g_{12}, \ldots, g_{mb}\}$, and the target grades at the ore-receiving sites are expressed as $G_{jk} \in \{G_{11}, G_{12}, \ldots, G_{nb}\}$. The ore grade needs to be as close to the target grade as possible under the constraint of minimum grade deviation. The oxidation rate of the ore at each ore production site is different and is expressed as $h_i \in \{h_1, h_2, \ldots, h_m\}$; the oxidation rate of the ore at the ore-receiving sites cannot exceed the allowable oxidation rate. The ore lithology of each ore production site is also different; grouping and arranging the ore production sites by lithology can be expressed as $\alpha, \beta, \gamma, \ldots$. After ore blending, various lithologies are present at each ore-receiving site, and the target lithology ratios are expressed as $L_{\alpha j}, L_{\beta j}, L_{\gamma j}, \ldots$. The lithology of the ore should be as close to the target lithology as possible under the constraint of minimum lithology deviation. The ore yield of the ore production sites and ore-receiving sites shall meet the requirements of the production tasks. See Table 1 for the specific parameters and variable definitions.
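To make the notation above concrete, the following minimal Python sketch defines containers for the scene parameters listed in Table 1. The class and field names are illustrative assumptions for exposition and are not taken from the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class OreProductionSite:              # M_i
    grades: List[float]               # g_i1 ... g_ib: grade of each of the b metals, in %
    oxidation_rate: float             # h_i, in %
    lithology: str                    # e.g., "skarn", "diopside feldspar hornstone"
    min_task: float                   # lower bound of the production task (trucks or tonnes)
    max_task: float                   # upper bound of the production task

@dataclass
class OreReceivingSite:               # C_j
    target_grades: List[float]        # G_j1 ... G_jb: target grade of each metal, in %
    max_oxidation_rate: float         # allowable oxidation rate, in %
    target_lithology: Dict[str, float]  # L_alpha_j, L_beta_j, ...: target lithology ratios
    min_task: float
    max_task: float

# x[i][j]: tonnage of ore sent from production site i to receiving site j
OreAllocation = List[List[float]]
```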

2.2. Problem Modeling

2.2.1. Objective Function

The objective function of the multimetal, multiobjective ore-blending model of open-pit mines needs to meet the requirements of minimum ore grade deviation and minimum lithology deviation. If the ore-blending task is completed within the planned shift according to the ore-blending plan, the implementation requirements of the planned task are met; otherwise, the implementation rate of the ore-blending plan is affected. A short illustrative sketch of both objectives follows the two expressions below.
(1) The functional expression of ore grade deviation is
$$\min f_{1}(x)=\frac{\sum_{k=1}^{b}\sum_{j=1}^{n}\left|\sum_{i=1}^{m}\left(g_{ik}-G_{jk}\right)x_{ij}\right|}{\sum_{j=1}^{n}\sum_{i=1}^{m}x_{ij}} \quad (1)$$
(2) The functional expression of ore lithology deviation is
$$\min f_{2}(x)=\sum_{j=1}^{n}\left(\left|\frac{\sum_{i=1}^{\alpha}x_{ij}}{\sum_{i=1}^{m}x_{ij}}-L_{\alpha j}\right|+\left|\frac{\sum_{i=\alpha}^{\beta}x_{ij}}{\sum_{i=1}^{m}x_{ij}}-L_{\beta j}\right|+\left|\frac{\sum_{i=\beta}^{\gamma}x_{ij}}{\sum_{i=1}^{m}x_{ij}}-L_{\gamma j}\right|+\cdots\right) \quad (2)$$
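As a reading of Equations (1) and (2), the short Python sketch below evaluates the two objectives for a candidate allocation matrix x. It is an illustrative interpretation of the reconstructed formulas (in particular of where the absolute values are taken), not the authors' code.

```python
from typing import Dict, Sequence

def grade_deviation(x, g, G):
    """f1(x): tonnage-weighted absolute grade deviation (Equation (1)).
    x[i][j] is the tonnage from production site i to receiving site j,
    g[i][k] the supply grade of metal k at site i, G[j][k] the target grade at site j."""
    m, n, b = len(x), len(x[0]), len(g[0])
    total = sum(x[i][j] for i in range(m) for j in range(n)) or 1.0
    dev = 0.0
    for k in range(b):
        for j in range(n):
            dev += abs(sum((g[i][k] - G[j][k]) * x[i][j] for i in range(m)))
    return dev / total

def lithology_deviation(x, litho_of_site: Sequence[str], L: Sequence[Dict[str, float]]):
    """f2(x): deviation between the blended lithology ratios and the targets L[j] (Equation (2))."""
    m, n = len(x), len(x[0])
    dev = 0.0
    for j in range(n):
        column_total = sum(x[i][j] for i in range(m)) or 1.0
        for litho, target in L[j].items():
            share = sum(x[i][j] for i in range(m) if litho_of_site[i] == litho) / column_total
            dev += abs(share - target)
    return dev
```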

2.2.2. Constraints

The constraint conditions need to comprehensively consider the production capacity of the ore production sites and ore-receiving sites, the oxidation rate limit, and the deviation bounds of each target quantity. A small feasibility-check sketch covering all of these constraints follows this list.
(1) Constraints on the production capacity of the ore production sites and ore-receiving sites.
When considering the ore-matching optimization problem of polymetallic open-pit mines, the production capacity of the excavator at each ore production site must reflect the actual production requirements. Too large a task will cause a single operation shift to fail, whereas too small a task will leave other equipment waiting for a long time; in addition, the crushing capacity of the crushing station at the mining point directly determines the upper and lower limits of the total amount of dynamically circulating ore in the whole environment. To ensure that the workload calculated by the ore-blending model is within the capacity range of the ore production and ore-receiving sites, a reasonable production capacity constraint range must be set for them.
$$p_{mi}^{\min}\le\sum_{j=1}^{n}x_{ij}\le p_{mi}^{\max},\quad i=1,2,\ldots,m \quad (3)$$
$$p_{cj}^{\min}\le\sum_{i=1}^{m}x_{ij}\le p_{cj}^{\max},\quad j=1,2,\ldots,n \quad (4)$$
where Formula (3) represents the task quantity constraint of the ore production sites, and Formula (4) represents the task quantity constraint of the ore-receiving sites. $p_{mi}^{\min}$ and $p_{cj}^{\min}$ represent the lower limits of the production capacity of the ore production site and ore-receiving site, respectively, and $p_{mi}^{\max}$ and $p_{cj}^{\max}$ represent the corresponding upper limits.
(2) Oxidation rate constraint.
According to the oxidation rate limit index given, the oxidation rate from the ore production site to the ore receiving site is limited.
$$\frac{\sum_{i=1}^{m}h_{i}x_{ij}}{\sum_{i=1}^{m}x_{ij}}\le p_{oj}^{\max},\quad j=1,2,\ldots,n \quad (5)$$
where $p_{oj}^{\max}$ represents the upper limit of the oxidation rate of the ore at the ore-receiving site.
(3) Constraints on ore grade and lithology deviation.
Considering that the ore grade and lithology need to be as close to the target value as possible, the grade deviation and lithology deviation need to have upper and lower limit constraints:
$$p_{g}^{\min}\le f_{1}(x)\le p_{g}^{\max} \quad (6)$$
$$p_{l}^{\min}\le f_{2}(x)\le p_{l}^{\max} \quad (7)$$
where Formula (6) represents the constraint on ore grade deviation, and Formula (7) represents the constraint on ore lithology deviation. $p_{g}^{\min}$ and $p_{l}^{\min}$ represent the lower limits of the ore grade and lithology deviations at the ore-receiving site, respectively, and $p_{g}^{\max}$ and $p_{l}^{\max}$ represent the corresponding upper limits.
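The constraints (3)–(7) amount to a feasibility check on an allocation matrix. A minimal Python sketch is given below; the parameter names mirror the symbols above, and passing in the precomputed objective values f1 and f2 is an assumption about how such a check might be organized.

```python
def allocation_is_feasible(x, pm_min, pm_max, pc_min, pc_max,
                           h, po_max, f1, f2,
                           pg_min, pg_max, pl_min, pl_max):
    """Check Equations (3)-(7) for an m x n allocation matrix x."""
    m, n = len(x), len(x[0])
    # (3) production capacity of each ore production site
    if any(not (pm_min[i] <= sum(x[i]) <= pm_max[i]) for i in range(m)):
        return False
    # (4) production capacity of each ore-receiving site
    column_totals = [sum(x[i][j] for i in range(m)) for j in range(n)]
    if any(not (pc_min[j] <= column_totals[j] <= pc_max[j]) for j in range(n)):
        return False
    # (5) oxidation-rate limit at each ore-receiving site
    for j in range(n):
        if column_totals[j] > 0:
            blended_oxidation = sum(h[i] * x[i][j] for i in range(m)) / column_totals[j]
            if blended_oxidation > po_max[j]:
                return False
    # (6)-(7) grade and lithology deviation bounds
    return pg_min <= f1 <= pg_max and pl_min <= f2 <= pl_max
```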

3. Multimetal Multiobjective Ore Allocation Algorithm Design for Open-Pit Mine Based on Multiagent Deep Reinforcement Learning

In the multiobjective ore-blending optimization problem of multimetal open-pit mines, the environmental information of the ore-receiving site, such as ore grade, lithology, oxidation rate, and ore amount, changes at every moment. This is a high-dimensional and complex problem with dynamic scheduling over time. Deep reinforcement learning is very good at solving such problems, with its powerful perception and computing ability. Therefore, this study uses a multiagent deep reinforcement learning algorithm to solve this problem.

3.1. Establishment of the Multimetal and Multiobjective Ore-Blending Model in Open-Pit Mines Based on Multiagent Deep Reinforcement Learning

For the specific ore-blending scenario of open-pit mines in this study, ore is blended from multiple ore production sites to multiple ore-receiving sites in a reasonable way so as to maximize resource utilization. As shown in Figure 2, all ore production sites in the scene are selected as agents, all ore-receiving sites are treated as the environment, and the ore allocation optimization problem is transformed into a partially observable Markov decision process. In each time interval t, each agent takes actions according to the observed state. The environment, with the ore-receiving sites as the observation objects, scores the actions taken by the agents and feeds the scores back to the agents along with the observations while updating itself. The agents take corresponding actions each time according to the observed values and continuously repeat the above operations, successively deducing the next decision to be made by each agent. According to the objective function and constraint conditions of the multiobjective optimization problem in this scenario, the actions, observations, and rewards of the agents are constructed as follows.
(1) Action (A): The action of the agent determines the strategy for ore-blending optimization. When considering the goal of minimizing the deviation of ore grade and lithology by rationally allocating ore resources from the ore production site to the ore-receiving site, at each time, the resource allocation operation from the ore production site to the ore-receiving site in the environment is defined as the action of the agent. The action set of all agents in time period t can be expressed as follows:
$$A^{t}=\left\{A_{1}^{t},A_{2}^{t},\ldots,A_{m}^{t}\right\} \quad (8)$$
(2) Observation (O): The ore allocation optimization problem is abstracted into a sequential decision-making problem; that is, the agent needs to make decisions continuously to achieve the final goal. To facilitate marking the state of an agent, the constraints defined in the previous section are used as bounds, and the observation set $O_{i}^{t}$ of agent i within time period t can be represented as
$$O_{i}^{t}=\left\{o_{im}^{t},o_{ic}^{t},o_{io}^{t},o_{ig}^{t},o_{il}^{t}\right\} \quad (9)$$
In the formula, $o_{im}^{t}$ represents the execution of the task quantity of the ore production site of agent i in period t: if the executed task satisfies the production capacity constraint of the ore production site defined in Equation (3), $o_{im}^{t}$ is denoted as 1; otherwise, it is denoted as 0. $o_{ic}^{t}$ represents the execution of the task quantity of the ore-receiving site of agent i in period t: if the executed task satisfies the production capacity constraint of the ore-receiving site defined in Equation (4), $o_{ic}^{t}$ is denoted as 1; otherwise, it is denoted as 0. $o_{io}^{t}$ represents the optimization of the ore oxidation rate of agent i in period t: if the current ore oxidation rate of the ore-receiving site is less than the upper limit of the oxidation rate defined by Equation (5), $o_{io}^{t}$ is denoted as 1; otherwise, it is denoted as 0. $o_{ig}^{t}$ represents the ore grade optimization of agent i in period t: if the difference between the target ore grade and the current ore grade at the ore-receiving site satisfies the grade deviation range limited by Equation (6), $o_{ig}^{t}$ is denoted as 1; otherwise, it is denoted as 0. $o_{il}^{t}$ represents the lithology optimization of agent i in period t: if the difference between the target lithology and the current lithology of the ore at the ore-receiving site satisfies the lithology deviation range limited by Equation (7), $o_{il}^{t}$ is denoted as 1; otherwise, it is denoted as 0.
(3) Reward (R): The quality of the reward design plays a crucial role in the success or failure of the algorithm. The reward in reinforcement learning corresponds to the constraint conditions in the multiobjective optimization problem. For the multimetal, multiobjective ore allocation optimization problem of open-pit mines, it is necessary to consider the minimum ore grade deviation, minimum oxidation rate, minimum lithology deviation, and an ore quantity that meets the production capacity limits of the ore production sites and the ore-receiving sites. The reward set $R_{i}^{t}$ for agent i can be expressed as follows (a short sketch after this list illustrates how the indicator observations and rewards might be composed):
$$R_{i}^{t}=\left\{r_{im}^{t},r_{ic}^{t},r_{io}^{t},r_{ig}^{t},r_{il}^{t}\right\} \quad (10)$$
In the formula, $r_{im}^{t}$ represents the reward obtained by agent i in period t when it meets the production task limit of the ore production site; $r_{ic}^{t}$ represents the reward obtained by agent i in period t when it meets the production task limit of the ore-receiving site; $r_{io}^{t}$ represents the reward obtained by agent i in period t when it meets the minimum oxidation rate; $r_{ig}^{t}$ represents the reward obtained by agent i corresponding to the minimum ore grade deviation in period t; and $r_{il}^{t}$ represents the reward obtained by agent i corresponding to the minimum lithology deviation in period t. Therefore, the reward obtained by agent i in period t is
$$R_{i}^{t}=r_{im}^{t}+r_{ic}^{t}+r_{io}^{t}+r_{ig}^{t}+r_{il}^{t} \quad (11)$$
The total reward for all agents is
$$R^{t}=\sum_{i=1}^{m}R_{i}^{t} \quad (12)$$
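The sketch below illustrates one way the indicator observations of Equation (9) and the rewards of Equations (10)–(12) could be composed. The 0/1 encoding follows the text above, while the per-indicator reward weights are an assumption: the paper does not report the exact reward values used.

```python
def agent_observation(meets_pm, meets_pc, meets_oxidation, meets_grade, meets_lithology):
    """O_i^t: the five 0/1 indicators of Equation (9)."""
    return [int(meets_pm), int(meets_pc), int(meets_oxidation),
            int(meets_grade), int(meets_lithology)]

def agent_reward(obs, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """R_i^t: sum of the per-indicator rewards (Equation (11)); the weights are illustrative."""
    return sum(w * o for w, o in zip(weights, obs))

def total_reward(per_agent_rewards):
    """R^t: total reward over all agents (Equation (12))."""
    return sum(per_agent_rewards)
```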

3.2. Multiagent Deep Reinforcement Learning Algorithm for Multimetal and Multiobjective Ore Allocation in Open-Pit Mines

Traditional reinforcement learning has great advantages in solving single-agent problems, but its effectiveness is greatly reduced in the multiagent environment of multimetal, multiobjective open-pit mining. In the same environment, multiple agents need to cooperate with each other to achieve the final goal, and the stability of the environment needs to be considered: agents form their own resource allocation strategies during learning, and when other agents change their strategies, the environment appears nonstationary. Therefore, we adopted the multiagent deep deterministic policy gradient (MADDPG) algorithm to solve this problem. The main idea of the algorithm is an architecture of centralized training with decentralized execution. Each agent makes decisions independently and maintains its own networks while obtaining information from other agents to better optimize its local value network.
(1) Actor-Critic Network architecture
The Actor-critic architecture of centralized training with decentralized execution includes two kinds of networks, namely a policy-based actor network and a value-based critic network. The actor network can guide the agent on how to choose actions. The critic network is used to evaluate the payoff of the decisions made by the actor network. The actor network updates its parameters in a gradient-descending fashion based on the rewards given by the critic network.
(2) Multiagent deep deterministic policy gradient algorithm
The MADDPG algorithm adopts a dual actor-critic network structure; that is, an evaluation network and a target network. As shown in Figure 3, there are m agents; the policies of all agents are parameterized as $\theta=\{\theta_{1},\theta_{2},\ldots,\theta_{m}\}$, and the set of policies is denoted as $\pi=\{\pi_{1},\pi_{2},\ldots,\pi_{m}\}$. The parameters of the actor (policy) network in the evaluation network are $\theta_{i}^{\mu}$, and the parameters of the critic (value) network are $\theta_{i}^{Q}$; the corresponding parameters in the target network are $\theta_{i}^{\mu'}$ and $\theta_{i}^{Q'}$. The action-value function $Q$ of the ith agent is given by the critic network: it is the expected return under policy $\pi$, starting from state O after performing the joint action $A_{1},\ldots,A_{m}$, and can be expressed as follows:
$$Q_{i}^{\pi}\left(O,A_{1},\ldots,A_{m}\right)=\mathbb{E}_{\pi}\left[G_{i}^{t}\,\middle|\,O^{t}=O,\,A^{t}=\left(A_{1}^{t},\ldots,A_{m}^{t}\right)\right] \quad (13)$$
The input of the Q function of the ith agent includes the global information of the states and actions of the other agents in addition to its own state and action, and the obtained Q value evaluates the action taken by the policy network. Substituting the return $G_{i}^{t}=\sum_{k=t+1}^{\infty}\gamma^{\,k-t-1}R_{i}^{k}$, i.e., the sum of the future rewards of agent i weighted by the discount coefficient γ, into Equation (13), the Bellman equation for the Q function can be obtained:
$$Q_{i}^{\pi}\left(O,A_{1},\ldots,A_{m}\right)=\mathbb{E}_{\pi}\left[R_{i}^{t+1}+\gamma Q_{i}^{\pi}\left(O^{t+1},A_{1}^{t+1},\ldots,A_{m}^{t+1}\right)\,\middle|\,O^{t}=O,\,A^{t}=\left(A_{1}^{t},\ldots,A_{m}^{t}\right)\right] \quad (14)$$
Equation (14) describes the iterative relationship between the current state and the future state. Here, $R_{i}^{t+1}$ represents the immediate reward of agent i, and $\gamma Q_{i}^{\pi}\left(O^{t+1},A_{1}^{t+1},\ldots,A_{m}^{t+1}\right)$ represents the discounted sum of the future rewards of agent i. The value (critic) network updates its parameters by minimizing the loss, and the loss function of the ith agent at time t can be expressed as follows:
$$L\left(\theta_{i}\right)=\frac{1}{H}\sum_{t}\left(r_{i}^{t}+\gamma Q_{i}^{\mu'}\left(O^{t+1},A_{1}^{t+1},\ldots,A_{m}^{t+1}\right)\Big|_{A_{j}^{t+1}=\mu_{j}'\left(o_{j}^{t+1}\right)}-Q_{i}^{\mu}\left(O^{t},A_{1}^{t},\ldots,A_{m}^{t}\right)\Big|_{A_{i}=\mu_{i}\left(o_{i}^{t}\right)}\right)^{2} \quad (15)$$
where H is the number of experiences randomly sampled from the experience pool, and γ is the discount factor, which takes a value between 0 and 1. A smaller γ means that the agent attaches more importance to immediate benefits, and a larger γ means that the agent attaches more importance to future benefits. $\mu_{i}$ denotes the policy of agent i.
By accessing only the local information of its own state, the actor network outputs the action that the agent is about to perform and updates its parameters by ascending the policy gradient. The policy gradient of the ith agent at time t can be calculated as follows:
$$\nabla_{\theta_{i}}J\left(\mu_{i}\right)\approx\frac{1}{H}\sum_{t}\nabla_{\theta_{i}}\mu_{i}\left(o_{i}^{t}\right)\,\nabla_{A_{i}}Q_{i}^{\mu}\left(O^{t},A_{1}^{t},A_{2}^{t},\ldots,A_{m}^{t}\right)\Big|_{A_{i}=\mu_{i}\left(o_{i}^{t}\right)} \quad (16)$$
As the parameters of the evaluation networks are continuously updated, the parameters of the target policy and value networks, $\theta_{i}^{\mu'}$ and $\theta_{i}^{Q'}$, are updated in a soft-update manner:
$$\theta_{i}'\leftarrow\tau\theta_{i}+\left(1-\tau\right)\theta_{i}' \quad (17)$$
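A minimal TensorFlow sketch of the per-agent updates in Equations (15)–(17) is given below, assuming the actors and critics are tf.keras models and that joint observations and actions are concatenated into flat tensors. The function names, tensor layout, and the value τ = 0.01 are assumptions for illustration, not the authors' implementation.

```python
import tensorflow as tf

def soft_update(target_net, eval_net, tau=0.01):
    # Equation (17): theta' <- tau * theta + (1 - tau) * theta'
    new_weights = [tau * w + (1.0 - tau) * w_t
                   for w, w_t in zip(eval_net.get_weights(), target_net.get_weights())]
    target_net.set_weights(new_weights)

def critic_step(critic, critic_target, target_actors, optimizer,
                obs_all, act_all, rew_i, next_obs_all, next_obs_each, gamma=0.95):
    # Target value from Equations (14)-(15): immediate reward plus the discounted
    # target-critic estimate, with every agent's next action given by its target actor.
    next_act_all = tf.concat([mu(o) for mu, o in zip(target_actors, next_obs_each)], axis=-1)
    y = rew_i + gamma * critic_target(tf.concat([next_obs_all, next_act_all], axis=-1))
    with tf.GradientTape() as tape:
        q = critic(tf.concat([obs_all, act_all], axis=-1))
        loss = tf.reduce_mean(tf.square(y - q))          # mean-squared TD error
    grads = tape.gradient(loss, critic.trainable_variables)
    optimizer.apply_gradients(zip(grads, critic.trainable_variables))
    return loss

def actor_step(actor_i, critic_i, optimizer, obs_i, obs_all, other_actions):
    # Equation (16): ascend the critic's Q value with agent i's action given by mu_i(o_i)
    # and the other agents' actions held fixed; maximizing Q equals minimizing -Q.
    with tf.GradientTape() as tape:
        a_i = actor_i(obs_i)
        joint = tf.concat([obs_all, a_i] + list(other_actions), axis=-1)
        loss = -tf.reduce_mean(critic_i(joint))
    grads = tape.gradient(loss, actor_i.trainable_variables)
    optimizer.apply_gradients(zip(grads, actor_i.trainable_variables))
    return loss
```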
In the multiagent deep deterministic policy gradient algorithm, the value network, with its long-term view, guides the policy network to make better decisions. Once fully trained, the value network is removed at test time, and each agent makes decisions based on the immediate environment using only its policy network. This architecture addresses two major problems in a multiagent environment: the nonstationarity of the environment caused by changes in other agents' strategies and the cooperation between agents. Based on the above analysis, the steps of the open-pit multimetal, multiobjective efficient ore-matching scheduling algorithm based on multiagent deep reinforcement learning can be described as follows (Algorithm 1); a compact Python skeleton of this loop is also sketched after the listing:
Algorithm 1: Multiobjective efficient ore-blending scheduling algorithm for open-pit mines based on multiagent deep reinforcement learning
Initialize MaxEpisode, MaxTime, MaxAgent, and other parameters
for i = 0 to MaxEpisode
 Initialize the open pit multimetal and multiobjective ore-blending environment
  for j = 0 to MaxTime
     Based on the environmental input, the ore-blending scheduling instruction $A^{t}$ is selected for each agent
     The agents execute the instruction $A^{t}$, and the reward is returned along with the new state of the ore-blending environment
     The experience $\left(O^{t},A_{1}^{t},A_{2}^{t},\ldots,A_{k}^{t},R^{t},O^{t+1}\right)$ is stored in the experience pool
    for k = 0 to MaxAgent
      Mini-batch samples are randomly drawn from the experience pool
      The Q value of the Critic network is calculated according to Equation (14)
      The network parameters of Critic are updated by minimizing Equation (15)
      The Actor network parameters are updated by maximizing Equation (16)
      The Actor target network is updated in a soft update manner according to Equation (17)
      The target network of Critic is updated in a soft update manner according to Equation (17)
    end for
  end for
end for
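A compact Python skeleton of the loop in Algorithm 1 is sketched below. The env.reset()/env.step() interface, the agent methods, and the buffer size are stand-in assumptions used only to show the control flow of centralized training with a shared experience pool.

```python
import random
from collections import deque

def train(env, agents, max_episode=90_000, max_time=30, batch_size=64, update_every=100):
    """Skeleton of Algorithm 1; env and the agent objects are assumed interfaces."""
    buffer = deque(maxlen=100_000)        # shared experience pool
    step = 0
    for episode in range(max_episode):
        obs = env.reset()                 # initialize the ore-blending environment
        for t in range(max_time):
            # each agent selects an ore-blending scheduling instruction A^t
            actions = [agent.act(o) for agent, o in zip(agents, obs)]
            next_obs, rewards, done = env.step(actions)          # environment feedback
            buffer.append((obs, actions, rewards, next_obs))     # store the experience
            obs = next_obs
            step += 1
            if len(buffer) >= batch_size and step % update_every == 0:
                batch = random.sample(list(buffer), batch_size)  # mini-batch from the pool
                for agent in agents:
                    agent.update_critic(batch)   # minimize Equation (15)
                    agent.update_actor(batch)    # ascend Equation (16)
                    agent.soft_update()          # Equation (17)
            if done:
                break
```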

4. Engineering Application and Result Analysis

4.1. Experimental Data and Parameters

A large open-pit mine in Henan Province, rich in molybdenum, tungsten, copper, and other metals, was taken as the research object. The ore-blending process is often difficult because of the large variety of metals and the grade differences among the blast piles. Therefore, this study aimed to improve the ore-blending optimization process and carried out simulation experiments with the goal of minimizing the deviation of ore grade and lithology, producing a dynamic ore-blending schedule with low deviation and high efficiency in a very short time.
In this study, nine ore production sites and three ore-receiving sites from the mine were selected. As shown in Table 2, the ore-blending parameters of each ore production site include the raw ore grade, oxidation rate, lithology, minimum task quantity, maximum task quantity, and other indicators. As shown in Table 3, the ore-blending parameters of each ore-receiving site include the target grade, oxidation rate, lithology, minimum and maximum task quantity, and other indicators. The Python 3.6 programming language was adopted, with PyCharm 2019.1.1 as the IDE and the TensorFlow machine learning platform; the computer was configured with an Intel Core i5 CPU and 16 GB of memory.

4.2. Experimental Steps

The multiagent deep reinforcement learning algorithm described in Algorithm 1 was used for training in the multiobjective, multimetal open-pit mine ore-blending optimization process. The network parameters were updated continuously until the final ore-blending optimization model was obtained. In the experiment, the learning rate was 0.001, the discount factor was 0.95, and the network parameters were updated every 100 training steps, for a total of 90,000 training iterations.
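For reference, the hyperparameters reported above could be collected in a configuration dictionary such as the following; the key names are illustrative, and only the numeric values come from the paper.

```python
# Hyperparameters reported in the paper; the key names are illustrative.
CONFIG = {
    "learning_rate": 0.001,
    "discount_factor": 0.95,        # gamma
    "update_interval": 100,         # update the network parameters every 100 training steps
    "total_training_steps": 90_000,
    "n_production_sites": 9,        # one agent per ore production site
    "n_receiving_sites": 3,
}
```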
In this section, the simulation experiment method was used to test the above ore-blending optimization model. Given known information such as the time granularity, termination time, planning scheme, and initial state, the artificial system deduces, in an orderly way, the behavior and state information of the objects in the application scene as time progresses, thereby updating the global environment. In the multimetal, multiobjective ore-blending scene of an open-pit mine, the total experimental period is divided into time granularities of equal length, and in each calculation cycle, the ore-blending planning model (calculated based on deep reinforcement learning) is simulated. The overall flow chart of the simulation experiment is shown in Figure 4, the overall architecture is shown in Figure 5, and the specific steps are as follows:
Step 1: In the initial state, the main program of the ore allocation scheduling experiment extracts the initial state of the global environment from the manually set experimental parameters and pushes it to the UI interface layer. The UI layer renders different objects according to the obtained global environment information and displays them on the window.
Step 2: Each agent in the main program uploads the joint environment information to the multiagent deep reinforcement learning algorithm. The joint environment information consists of two parts: the state information of the agent itself and other agents and the state information of each ore-receiving site in the environment.
Step 3: The algorithm calculates the next joint action of each agent in real-time based on the pushed joint environment information and pushes the joint action command to the main program.
Step 4: First, the main program sends the joint action instructions to each agent and calculates the current state of each ore production site and ore-receiving site in the environment based on the action and the environment state of the previous step. Secondly, the main program scores the degree of influence of the agent action calculated by the algorithm on the environment, where the evaluation index mainly refers to the ore distribution constraints set based on the open-pit polymetallic scene. Finally, the main program pushes the obtained joint environment state to the UI layer.
Step 5: After the UI layer receives the global state information of the environment, the global state of the system is extrapolated forward and displayed on the interface. At this point, one calculation step is completed, and the process repeats until the end of the experiment. Throughout this process, the multiagent deep reinforcement learning algorithm continuously makes sequential decisions to push the overall environment in a better direction.

4.3. Result Analysis

4.3.1. Analysis of Ore Blending and Scheduling Results

Through the simulation experiments, the trained ore-blending optimization model was tested, and multiple sets of solutions were obtained that satisfy the open-pit ore-blending optimization model established in Section 2 and achieve the goal of minimizing ore grade deviation and lithology deviation. One group of solutions is presented for analysis; the obtained ore-blending optimization results are shown in Table 4, and the error analysis is shown in Table 5. In the simulation calculations, the change of each index over time is shown in Figure 6a–i, where the horizontal axis represents time. This experiment recorded the mineral supply strategy of the agents at 30 moments; that is, each agent made 30 decisions, and the vertical axis represents the change in each index. The error values in Table 5 were obtained by calculating the difference between the target value and the instantaneous value of each indicator. The closer the grade and lithology deviations are to zero, and the larger the positive deviation of the oxidation rate (i.e., the further the oxidation rate lies below its limit), the better the ore-blending effect of the algorithm.
According to the molybdenum metal grade change line chart in Figure 6a, the molybdenum metal grades of C 1 , C 2 , and C 3 at the ore-receiving sites finally stay around 0.095%, 0.101%, and 0.092%, respectively, and the deviation in molybdenum metal grade is controlled within 0.022%. As can be seen from the line chart of tungsten metal grade change in Figure 6b, the tungsten metal grades of C 1 , C 2 , and C 3 at the ore-receiving sites eventually stay at around 0.096%, 0.069%, and 0.105%, respectively, and the deviation in the tungsten metal grades is controlled within 0.008%. As can be seen from the copper grade change line chart in Figure 6c, the copper grades of C 1 , C 2 , and C 3 at the ore-receiving sites eventually stay at around 0.018%, 0.009%, and 0.016%, respectively, and the deviation in copper grade is controlled within 0.004%. As can be seen from the oxidation rate variation line chart in Figure 6d, the oxidation rates of C 1 , C 2 , and C 3 at the ore-receiving sites eventually stay at around 11.891%, 7.866%, and 13.426%, respectively, and the oxidation rates could reach the minimum value within the allowable error range. As can be seen from the skarn lithology variation line chart in Figure 6e, the skarn lithology of C 1 , C 2 , and C 3 of the ore-receiving sites eventually stays around 0.589, 0.483, and 0.537, respectively, and the deviation in skarn lithology is controlled within 0.124. As can be seen from the lithology variation in the diopside feldspar line chart in Figure 6f, diopside feldspar at the ore-receiving sites C 1 , C 2 , and C 3 finally stays around 0.411, 0.000, and 0.463, respectively, and the deviation in diopside feldspar is controlled within 0.036. As can be seen from the wollastonite lithology variation line chart in Figure 6g, the wollastonite lithology of C 1 , C 2 , and C 3 at the ore-receiving sites eventually stays around 0.000, 0.517, and 0.000, respectively, and the deviation in wollastonite lithology is controlled within 0.088. As can be seen from the ore quantity change line chart in Figure 6h, the number of trucks in C 1 , C 2 , and C 3 of ore receiving sites is 158, 58, and 54, respectively, and each truckload is 50 tons, so the ore quantity is 7900, 2900, and 2700 tons, respectively, and the ore quantity at the ore-receiving sites meets the production capacity constraint of the ore-receiving sites. According to the ore quantity change line chart in Figure 6i, it can be seen that the number of cars at each ore production site is finally 30, and the ore quantity is 1500 tons, which meets the production capacity constraint of the ore production site. It can be seen that it is feasible to use the multiagent deep reinforcement learning algorithm to solve the multimetal and multiobjective ore-blending model of open-pit mines, and this has guiding significance for production planning within the allowable range of error.

4.3.2. Comparison of Optimization Methods

(1) Error analysis
In order to verify the solution accuracy of the multiagent deep reinforcement learning algorithm, the deep deterministic policy gradient algorithm was used for comparison. By applying the multiagent deep reinforcement learning algorithm and the deep deterministic policy gradient algorithm to the ore-blending scheduling scene, the performance in solving the optimization problem was analyzed, and the algorithm deviation analysis shown in Table 6 was obtained. The table shows the deviation of each ore index at the receiving sites under the optimization of the two algorithms. It can be seen from the table that, using the multiagent deep reinforcement learning (MADDPG) algorithm, the molybdenum grade deviation is controlled within 0.022%, the tungsten grade deviation within 0.008%, and the copper grade deviation within 0.004%; the oxidation rate meets the production requirements within the error tolerance range; and the skarn lithology deviation is controlled within 0.124, the diopside feldspar hornstone lithology deviation within 0.036, and the wollastonite hornstone lithology deviation within 0.088. Using the deep deterministic policy gradient (DDPG) algorithm, the molybdenum grade deviation is controlled within 0.01%, the tungsten grade deviation within 0.019%, and the copper grade deviation within 0.006%; the oxidation rate is slightly higher than the production requirement; and the skarn lithology deviation is controlled within 0.202, the diopside feldspar hornstone lithology deviation within 0.296, and the wollastonite hornstone lithology deviation within 0.159.
Through comparative analysis, it can be found that the multiagent deep reinforcement learning algorithm takes 115.708 ms per calculation and the deep deterministic policy gradient algorithm takes 106.988 ms. Compared with the deep deterministic policy gradient algorithm, which uses a single agent to optimize the solution, the multiagent deep reinforcement learning algorithm has a comparable solving efficiency, but its accuracy is significantly improved.
(2) Performance analysis of the algorithm
The average cumulative reward distributions during the training of the MADDPG and DDPG algorithms were analyzed [29], as shown in Figure 7. The horizontal axis represents the number of training steps, and the vertical axis represents the total reward obtained. As the number of training steps increases, the reward values of the two algorithms first show a zigzag trend and then gradually stabilize. However, the reward value of the DDPG algorithm remains lower and only becomes relatively stable after about 9000 steps, while the MADDPG algorithm gradually stabilizes after around 7000 steps. Thus, the MADDPG algorithm converges faster than the DDPG algorithm.
Two reinforcement learning methods were used to solve the scheduling problem of multimetal and multiobjective ore-blending in open-pit mines. The distribution of the rewards obtained by each agent was analyzed. As shown in Figure 8, the horizontal axis represents the number of training sessions, and the vertical axis represents the rewards received by each agent [30]. Figure 8a shows the distribution of rewards using the MADDPG algorithm. From a global perspective, the growth trend of rewards for each agent is similar, with rewards rising first and then gradually flattening. Figure 8b shows the distribution of the rewards using the DDPG algorithm. Globally, there are great differences in the rewards obtained by each agent. By comparing the agent reward distribution of the two algorithms, it can be understood that each agent in the MADDPG algorithm can sense the other agent and co-operate with each other to make efforts toward the goal of reducing the grade and lithology deviation of a polymetallic open-pit mine.
Figure 9 and Figure 10 show the distribution of the loss function of the policy network and the value network using the MADDPG algorithm and the DDPG algorithm, respectively, representing the difference between a single training sample and the real value [31]. The horizontal axis represents the number of training sessions, and the vertical axis represents the average loss of the actor network loss function. From the overall trend, the loss of the MADDPG algorithm shows a zigzag downward trend, which indicates that its network fitting effect is good. The loss function of the DDPG algorithm fluctuates up and down, and the final loss is larger than that of the MADDPG algorithm, indicating that its network fitting effect is not good.

5. Conclusions

(1)
Aiming at the multimetal and multiobjective ore-blending optimization problem of open-pit mines, we comprehensively considered the influencing factors that have not been considered in the traditional ore-blending model. The ore grade and lithology index are taken as the objective function, and the oxidation rate, ore quantity, grade, and lithology deviation are taken as constraints. The multimetal and multiobjective ore-blending optimization model of open-pit mines was established, which is more in line with the actual production and operation of a mine, effectively improving the comprehensive utilization rate of open-pit mineral resources;
(2)
In this study, a deep reinforcement learning algorithm based on the actor-critic framework was used to optimize the multimetal and multiobjective ore-blending process in an open-pit mine. The ore allocation optimization problem was transformed into a partially observable Markov decision process, and the action taken by each agent was scored by the environment at each time granularity. The agent optimizes its strategy according to the score, and the strategy sequence of each agent is inferred sequentially. At the same time, the multimetal, multiobjective ore-blending optimization method for open-pit mining based on multiagent deep reinforcement learning can be trained offline and make decisions online, obtaining the current ore-blending strategy in real time so that it can be fed into production operations;
(3)
A simulation experiment method was used to test the ore-blending optimization model, liberating the ore-blending process from the traditional static ore-blending method. The changes in ore grade, lithology, oxidation rate, and ore quantity within each time granularity of the ore-blending optimization process can be observed in real time, which makes the ore-blending process dynamic and helps researchers to clarify the logic behind the ore-blending optimization. The experimental results show that the algorithm can significantly reduce the waiting time while ensuring calculation accuracy, and this has strong practical value.

Author Contributions

Formal analysis, Q.G.; Data curation, L.W.; Writing—original draft, Z.F. and G.L.; Writing—review & editing, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 52264011; the National Natural Science Foundation of China, grant number 51864046; the China Postdoctoral Science Foundation, grant number 2019M662505.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank Dianbo Liu (China Molybdenum Co., Ltd. (China)) for his help in origin data collection.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

References

  1. Ke, L.H.; He, X.Z.; Ye, Y.C.; He, Y.Y. Situations and development trends of ore blending optimization technology. China Min. Mag. 2017, 26, 77–82. [Google Scholar]
  2. Wu, L.C.; Wang, L.G.; Peng, P.A.; Wang, Z.; Chen, Z.Q. Optimization methods for ore blending in open-pit mine. Min. Metall. Eng. 2012, 32, 8–12. [Google Scholar]
  3. Chen, P.N.; Zhang, D.J.; Li, Z.G.; Zhang, P.; Zhou, Y.S. Digital optimization model & technical procedure of ore blending of Kun Yang phosphate. J. Wuhan Inst. Technol. 2012, 34, 37–40. [Google Scholar]
  4. Ke, L.H.; He, Y.Y.; Ye, Y.C.; He, X.Z. Research on ore blending scheme of Wulongquan mine of Wugang. Min. Metall. Eng. 2016, 36, 22–25. [Google Scholar]
  5. Wang, L.G.; Song, H.Q.; Bi, L.; Chen, X. Optimization of open pit multielement ore blending based on goal programming. J. Northeast. Univ. 2017, 38, 1031–1036. [Google Scholar]
  6. Li, G.Q.; Li, B.; Hu, N.L.; Hou, J.; Xiu, G.L. Optimization model of mining operation scheduling for underground metal mines. Chin. J. Eng. 2017, 39, 342–348. [Google Scholar]
  7. Yao, X.L.; Hu, N.L.; Zhou, L.H.; Li, Y. Ore blending of underground mines based on an immune clone selection optimization algorithm. Chin. J. Eng. 2011, 33, 526–531. [Google Scholar]
  8. Li, Z.G.; Cui, Z.Q. Multi-objective optimization of mine ore blending based on genetic algorithm. J. Guangxi Univ. Nat. Sci. Ed. 2013, 38, 1230–1238. [Google Scholar]
  9. Li, N.; Ye, H.W.; Wu, H.; Wang, L.G.; Lei, T.; Wang, Q.Z. Ore blending for mine production based on hybrid particle swarm optimization algorithm. Min. Metall. Eng. 2017, 37, 126–130. [Google Scholar]
  10. Gu, Q.H.; Zhang, W.; Cheng, P.; Zhang, Y.H. Optimization model of ore blending of limestone open-pit mine based on fuzzy multi-objective. Min. Res. Dev. 2021, 41, 181–187. [Google Scholar]
  11. Huang, L.Q.; He, B.Q.; Wu, Y.C.; Shen, H.M.; Lan, X.P.; Shang, X.Y.; Li, S.; Lin, W.X. Optimization research on the polymetallic multi-objective ore blending for unbalanced grade in Shizhuyuan mine. Min. Res. Dev. 2021, 41, 193–198. [Google Scholar]
  12. Feng, Q.; Li, Q.; Wang, Y.Z.; Quan, W. Application of constrained multi-objective particle swarm optimization to sinter proportioning optimization. Control Theory Appl. 2022, 39, 923–932. [Google Scholar]
  13. Rummery, G.A.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; University of Cambridge, Department of Engineering: Cambridge, UK, 1994. [Google Scholar]
  14. Watkins, C.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  15. Mnih, V.; Kavukcuoglu, K.; Silver, D. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef] [PubMed]
  16. Pan, F.; Bao, H. Research progress of automatic driving control technology based on reinforcement learning. J. Image Graph. 2021, 26, 28–35. [Google Scholar]
  17. Xu, J.; Zhu, Y.K.; Xing, C.X. Research on financial trading algorithm based on deep reinforcement learning. Comput. Eng. Appl. 2022, 58, 276–285. [Google Scholar]
  18. Shen, Y.; Han, J.P.; Li, L.X.; Wang, F.Y. AI in game intelligence—from multi-role game to parallel game. Chin. J. Intell. Sci. Technol. 2020, 2, 205–213. [Google Scholar]
  19. Qu, Z.X.; Fang, H.N.; Xiao, H.C.; Zhang, J.P.; Yuan, Y.; Zhang, C. An on-orbit object detection method based on deep learning for optical remote sensing image. Aerosp. Control Appl. 2022, 48, 105–115. [Google Scholar]
  20. Ma, P.C. Research on speech recognition based on deep learning. China Comput. Commun. 2021, 33, 178–180. [Google Scholar]
  21. Zhang, K.; Feng, X.H.; Guo, Y.R.; Su, Y.K.; Zhao, K.; Zhao, Z.B.; Ma, Z.Y.; Ding, Q.L. Overview of deep convolutional neural networks for image classification. J. Image Graph. 2021, 26, 2305–2325. [Google Scholar]
  22. Deng, Z.L.; Zhang, Q.W.; Cao, H.; Gu, Z.Y. A scheduling optimization method based on depth intensive study. J. Northwestern Polytech. Univ. 2017, 35, 1047–1053. [Google Scholar]
  23. He, F.K.; Zhou, W.T.; Zhao, D.X. Optimized deep deterministic policy gradient algorithm. Comput. Eng. Appl. 2019, 55, 151–156. [Google Scholar]
  24. Liao, X.M.; Yan, S.H.; Shi, J.; Tan, Z.Y.; Zhao, Z.L.; Li, Z. Deep reinforcement learning based resource allocation algorithm in cellular networks. J. Commun. 2019, 40, 11–18. [Google Scholar]
  25. Liu, G.N.; Qu, J.M.; Li, X.L.; Wu, J.J. Dynamic ambulance redeployment based on deep reinforcement learning. J. Manag. Sci. China 2020, 23, 39–53. [Google Scholar] [CrossRef]
  26. Kui, H.B.; He, S.C. Multi-objective optimal control strategy for plug-in diesel electric hybrid vehicles based on deep reinforcement learning. J. Chongqing Jiaotong Univ. Nat. Sci. 2021, 40, 44–52. [Google Scholar]
  27. Hu, D.R.; Peng, Y.G.; Wei, W.; Xiao, T.T.; Cai, T.T.; Xi, W. Multi-timescale deep reinforcement learning for reactive power optimization of distribution network. In Proceedings of the 7th World Congress on Civil, Structural, and Environmental Engineering (CSEE’22), Virtual, 10–12 April 2022; Volume 42, pp. 5034–5045. [Google Scholar]
  28. Lowe, R.; Wu, Y.; Tamar, A. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the 31st International Conference on Neural Information Processing System, Long Beach, CA, USA, 4–9 December 2017; pp. 6382–6393. [Google Scholar]
  29. Li, J.N.; Yuan, L.; Chai, T.Y.; Lewis, F.L. Consensus of nonlinear multiagent systems with uncertainties using reinforcement learning based sliding mode control. IEEE Trans. Circuits Syst. I Regul. Pap. 2023, 70, 424–434. [Google Scholar] [CrossRef]
  30. Dao, P.N.; Liu, Y.C. Adaptive reinforcement learning in control design for cooperating manipulator systems. Asian J. Control 2022, 24, 1088–1103. [Google Scholar] [CrossRef]
  31. Vu, V.T.; Pham, T.L.; Dao, P.N. Disturbance observer-based adaptive reinforcement learning for perturbed uncertain surface vessels. ISA Trans. 2022, 130, 277–292. [Google Scholar] [CrossRef]
Figure 1. Schematic Diagram of Ore-Blending Scenarios.
Figure 2. Agent Real-Time Policy.
Figure 3. Architecture of multiagent deep deterministic policy gradient algorithms.
Figure 4. Flowchart of simulation experiment.
Figure 5. Overall architecture of simulation experiment.
Figure 6. (a–i) Changes in the indicators of the simulation experiment.
Figure 7. Different algorithm reward comparison curve.
Figure 8. Rewards for each agent with different algorithms. (a) MADDPG; (b) DDPG.
Figure 9. Loss value of policy network loss function.
Figure 10. Evaluating the loss value of the network loss function.
Table 1. Symbol Description.

Symbol | Explanation
$m$ ($i = 1, 2, \ldots, m$) | The number of ore production sites
$n$ ($j = 1, 2, \ldots, n$) | The number of ore-receiving sites
$b$ ($k = 1, 2, \ldots, b$) | The number of metal types in the ore
$\alpha, \beta, \gamma, \ldots$ | Lithology of the ore
$M_i$ | The i-th ore production site
$C_j$ | The j-th ore-receiving site
$x_{ij}$ | The amount of ore transported from the i-th production site to the j-th receiving site
$g_{ik}$ | Ore grade of metal k at the i-th ore production site
$G_{jk}$ | Target grade of metal k at the j-th ore-receiving site
$h_i$ | Oxidation rate of the ore at the i-th ore production site
$L_{\alpha j}, L_{\beta j}, L_{\gamma j}, \ldots$ | Target lithology ratios at the j-th ore-receiving site
Table 2. Relevant parameters of ore blending at the ore production sites.

Name | Mo Grade/% | WO3 Grade/% | Cu Grade/% | Oxidation Rate/% | Lithology | Minimum Task/Truck | Maximum Task/Truck
M1 | 0.159 | 0.075 | 0.004 | 6.8 | skarn | 2 | 60
M2 | 0.079 | 0.096 | 0.006 | 19.9 | skarn | 2 | 60
M3 | 0.064 | 0.075 | 0.025 | 11.2 | diopside feldspar hornstone | 2 | 60
M4 | 0.06 | 0.055 | 0.023 | 8.6 | skarn | 2 | 60
M5 | 0.127 | 0.1 | 0.01 | 4 | diopside feldspar hornstone | 2 | 60
M6 | 0.134 | 0.123 | 0.04 | 2.7 | skarn | 2 | 60
M7 | 0.06 | 0.095 | 0.016 | 16.4 | skarn | 2 | 60
M8 | 0.077 | 0.134 | 0.012 | 24.5 | diopside feldspar hornstone | 2 | 60
M9 | 0.099 | 0.074 | 0.004 | 7.9 | wollastonite hornstone | 2 | 60
Table 3. Parameters related to ore blending at the ore-receiving sites.

Name | Mo Grade/% | WO3 Grade/% | Cu Grade/% | Oxidation Rate/% | Wollastonite Hornstone | Skarn | Diopside Feldspar Hornstone | Minimum Task/Truck | Maximum Task/Truck
C1 | 0.094 | 0.098 | 0.018 | 12.2 | 0 | 0.569 | 0.431 | 50 | 160
C2 | 0.091 | 0.066 | 0.012 | 8 | 0.429 | 0.571 | 0 | 50 | 160
C3 | 0.103 | 0.108 | 0.015 | 12.7 | 0 | 0.521 | 0.479 | 50 | 160
Table 4. Ore-blending results of the multiagent deep reinforcement learning algorithm (ore-blending quantity in tonnes).

Ore Production Site | C1/t | C2/t | C3/t
M1 | 900 | 600 | 0
M2 | 950 | 0 | 550
M3 | 1250 | 0 | 250
M4 | 700 | 800 | 0
M5 | 950 | 0 | 550
M6 | 1100 | 0 | 400
M7 | 1000 | 0 | 500
M8 | 1050 | 0 | 450
M9 | 0 | 1500 | 0
Table 5. Error analysis of the multiagent deep reinforcement learning algorithm.

Ore-Receiving Site | Mo Grade/% | WO3 Grade/% | Cu Grade/% | Oxidation Rate/% | Skarn | Diopside Feldspar Hornstone | Wollastonite Hornstone
C1 | −0.001 | 0.002 | 0.000 | 0.309 | −0.020 | 0.020 | 0.000
C2 | −0.010 | −0.003 | 0.003 | 0.134 | 0.088 | 0.000 | −0.088
C3 | 0.011 | 0.003 | −0.001 | −0.726 | −0.016 | −0.016 | 0.000
Table 6. Algorithm deviation.

Algorithm | Mo Grade/% | WO3 Grade/% | Cu Grade/% | Skarn | Diopside Feldspar Hornstone | Wollastonite Hornstone
MADDPG | 0.022 | 0.01 | 0.005 | 0.11 | 0.031 | 0.079
DDPG | 0.01 | 0.019 | 0.006 | 0.202 | 0.296 | 0.159
