*Article* **Modelling Large Heaped Fill Stockpiles Using FMS Data**

**Aaron Young \* and William Pratt Rogers**

Department of Mining Engineering, University of Utah, Salt Lake City, UT 84112, USA; pratt.rogers@utah.edu **\*** Correspondence: Aaron.S.Young@utah.edu

**Abstract:** Common practice for managing large low-grade run-of-mine (ROM) stockpiles is to average the entire stockpile to a single grade. Modern ore control and mineral processing procedures need better precision. Low-precision models hinder the ability to create a digital mine-to-mill model and optimize the holistic mining process. Prior to processing, poorly characterized stockpiles are often drilled and sampled, despite there being no geological reason for relationships between samples to exist. Stockpile management is also influenced by reserve accounting and lacks a common operational workflow. This paper provides a review of base and precious metal ROM pre-crusher stockpiles in the mining industry and demonstrates how to build a spatial model of a large long-term stockpile using fleet management system (FMS) data and geostatistical code in Python and R Studio. We demonstrate a stockpile modelling framework believed to be readily workable for most modern mines, based on established geostatistical modelling techniques applied to the type of data generated in an FMS. In the method presented, each bench of the stockpile is modelled as its own geological domain. Each dump load is assumed to contain the same volume of material, with grade values matching the grade data tracked in the FMS. Despite the limitations of these inputs, existing interpolation techniques can lead to increased understanding of the grade distribution within stockpiles. Using the framework demonstrated in this paper, engineers and stockpile managers will be able to leverage operational data into valuable insight for empowered decision making and smoother operations.

## **1. Introduction**

#### *1.1. Objective*

As mines are depleted and average global ore grades decline, the desire for an improved working model for stockpile management is likely to grow [1]. Fortunately, digitization efforts have increased throughout the mining industry in recent years, and there is already pressure for mining companies to make use of their new data [2]. Stockpiles are composed of not just valuable material, but also valuable and often misunderstood data, which makes them prime targets for the benefits of digital innovation [3]. This paper provides a review of base and precious metal run-of-mine (ROM) pre-crusher stockpiles in the mining industry, and demonstrates how to build a spatial model of a large long-term stockpile using fleet management system (FMS) data and geostatistical code in Python and R Studio.

## *1.2. Background*

Early base and precious metal mining endeavored to extract the richest ores available. In this scenario, miners would send rich ore directly to milling facilities. With the advent of industrial mining during the 19th century, driven in part by Watt's steam engine [4], mines scaled up into operations capable of processing large tonnages of ore. For the first time, low-grade ores could be mined profitably [5]. As these larger mines developed, the need for stockpiling gradually grew until stockpiles were justified as economically advantageous over the course of a mine life [6–8].

**Citation:** Young, A.; Rogers, W.P. Modelling Large Heaped Fill Stockpiles Using FMS Data. *Minerals* **2021**, *11*, 636. https://doi.org/ 10.3390/min11060636

Academic Editor: Simon Dominy

Received: 21 April 2021 Accepted: 8 June 2021 Published: 15 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

According to Yi [7], the initial work around cut-off grades began in the 1960s and incorporated stockpiles in the 1980s [9]. These works determined that a stockpile will be required for any case in which a mine plans to change cut-off grade over the course of its life. Presently, the stockpiling of ore has become an integral part of mine planning [10,11]. Effective mine planning requires a detailed understanding of the ore deposit. This understanding is then leveraged into a production strategy that considers both the current mining environment as well as predictions of future demand, commodity prices, and other conditions.

Much work has been carried out to determine when base and precious metal mines should stockpile (cut-off grades, net present value (NPV) calculations, rehandle costs, milling costs, production contracts, mine planning, etc.) [1,11–17]. The discipline of managing stockpiles throughout mining operations to maximize the profit gained from their eventual processing is still emerging [18]. Often, stockpiles are created as an afterthought or a side effect of increased production. Occasionally, they are inherited from previous operators upon the purchase of a mine [19]. In some operations, stockpiles must fit within a permit area that was never originally intended for them.

The use of stockpiles has become paramount to the economic viability of production strategies for several reasons, which include the following:


#### *1.3. Classification of Stockpiles*

As is the case with many processes, blending or mixing is a necessary step to ensure optimal product quality. Typically, these blending processes are performed after the primary crusher step. Within mining, pre-crusher stockpiling is often used for its operational simplicity, but it typically lowers the confidence of the ore grade and reduces certainty in feed quality [15]. Pre-crusher stockpiling takes on five forms which are illustrated in Figure 1 [15–17]:

Just as there are operational differences between iron, coal and base and precious metal mining, there are also differences in how these operations stockpile. For example, iron ore is often shipped directly from the mine to the customer. Since the tonnages involved are immense, extensive work has already been carried out to model the way that iron ore should be stockpiled for re-handle [16,17,24,25]. Likewise, due to hazards around coal (loss of energy through oxidation, spontaneous combustion, etc.) and since it is typically directly shipped, there has been much work dedicated to understanding how it behaves during stockpiling [26].

For large-scale base and precious metal mine stockpiles, however, little scientific literature exists to document how to spatially model the multivariate distribution of characteristics such as grade, hardness, rock type, and other geochemical properties. Kasmaee et al. [27] demonstrated a method using sampling from the base of stockpiles in conjunction with monthly surveying to create a kriged model of an iron ore stockpile in Iran. Morley and Arvidson [28] mention that stockpile modelling typically involves arithmetic weighted averages of grade and tonnage values determined from blast hole samples and survey volume records. They also state that spatial modelling of stockpiles is performed manually using engineering software tools. The disadvantage of these manual models is that they require constant engineering resources which could be spent elsewhere. They are also subject to human error, risk of employee turnover and inconsistencies in their design method. Furthermore, base and precious metal stockpiles are classified as temporary storage [29] and their contents shift over time. Dozers and other heavy equipment frequently displace stockpiled material from its original dump location, making it hard to know where material is located within the stockpile.

**Figure 1.** Five Types of Pre-Crusher Stockpiles (adapted from [15–17]).

Another way of classifying stockpiles is by their construction design. Figure 2 shows the different types of stockpile constructions which are common in mining operations.

**Figure 2.** Types of Stockpile Fill (adapted from [29]).

According to Carter, solutions for stockpile management are built into roughly a dozen comprehensive mine scheduling software platforms [18]. These software systems integrate with short-term and small-scale stockpiles, but they do not serve well for large, long-term, truck-dumped stockpiles. Generally, the stockpile modelling software that exists is tailored to conveyor systems for intermediate stockpiles [24,25], but it may be possible to apply some of their concepts to larger stockpiles. Discrete element modelling (DEM) [30] has demonstrated the ability to model material flow of intermediate stockpiles, but remains unproven for large stockpiles where particles are exceedingly numerous and variable.

In addition to modelling, documented technical methods for tracking ROM ore from haul trucks into large stockpiles and onward through downstream processes are perfunctory. Surface base and precious metal mines typically use global positioning systems (GPS) to track haul truck movement and fleet management systems (FMS) to optimize productivity. Afrapoli and Askari-Nasab [31] provide a detailed and current review of FMS technology. FMS systems perform best for tracking material directly dumped from pit to crusher, but lack the capability to track ore once it has been dumped into a stockpile.

Material tracking technology exists, but is rarely used in large stockpiles due to the long duration of time from the placement of the ore to processing. Jansen et al. [32] demonstrated proof of concept case studies involving SmartTag™ radio-frequency identification (RFID) tracer technology for both source-to-product as well as process hold-up ore tracking for the Northparkes mines in Australia. Jurdziak et al. [33] discuss the challenges, propose a workflow and estimate savings of 2.5 million euros for implementing RFID ore tracking at the KGHM Lubin mine in Poland. In addition to RFID, unmanned aerial vehicle (UAV) technology, while promising for its ability to determine stockpile volumes [34,35], is still incipient for the purposes of predicting grade distribution within the stockpile or tracking material movement.

Failing to adequately track and model ore stockpiles results in a loss of data already gained from geological exploration and mine production as illustrated in Figure 3. Such a failure represents not only data and money wasted, but also an opportunity lost in saving additional money during the processing of the stockpiled materials. This is a double lost opportunity for the operator who is probably already operating on relatively thin margins.

**Figure 3.** Digital Gaps in Different Phases of Mining Operation.

Figure 3 illustrates a conceptual loss of data associated with stockpiled material that was previously available from exploration models and operational logs. Moreover, the figure illustrates how information about the material is added at each processing step, including production drill sample analyses, ore control modelling and fleet management system information, until the time of stockpiling. Data from each of these operational steps potentially add value to the material if appropriately used, but they are wasted if the material is not tracked into the stockpile.

If a modern mine can be considered as a digital system of production, stockpiles may be thought of as digital bottlenecks, since they output less information into downstream processes than was input into them [36]. According to Kahraman et al. [37], identifying, tracking, and managing bottlenecks will enable significant improvement in mining operations. However, data gaps such as stockpiles short-circuit the bottleneck tracking capabilities of sites, not to mention automation, mine-to-mill, and other operational improvement initiatives. While missing data can be thrown out to little detriment for estimation purposes (grade, tonnage, amount of explosive needed, etc.), they need to be included for predictive purposes (capital asset performance, maintenance, throughput, mine-to-mill, etc.) [38].

#### *1.4. Reconciliation and Metallurgical Accounting*

Despite little documentation on technical modelling methods, the literature contains robust discussion on stockpiles as they pertain to reconciliation and metal accounting [32]. Since the Sarbanes-Oxley Act of 2002 [39], the requirement for real-time disclosure of financial conditions to stakeholders in the mining business has inspired more systematic approaches to reconciliation, and therefore more scrutiny of stockpiles. Macfarlane [40] contends that a full understanding of metal flow is necessary for a systematic approach to reconciliation. Random stockpiling without a clearly defined process map can make accounting for tonnage discrepancies futile. Misunderstanding of stockpile characteristics reduces conversion of mineral resource inventory into saleable product. Motivations for stockpile misunderstanding, underappreciation and neglect are numerous, but generally include issues from the following categories [40,41]:


Ghorbani and Nwalia [41] affirm that mass flow measurement, stream sampling, mass balancing, and data handling and reporting are the four components of metallurgical accounting. Large, long-term stockpiles pose challenges to each of these components. Stockpiles are difficult to measure and sample [28]. They also risk being manipulated for the purposes of balancing the overall deposit metal value, may become orphaned by multiple operational departments and offer low incentives to be accurately recorded or reported on a frequent basis [41,42]. Improvements in technical capacity to model and manage stockpiles will therefore benefit reconciliation and metallurgical accounting. However, these technical improvements must be incorporated into mining workflows for such benefits to be reaped. Fortunately, existing reserve and operational models provide a foundation for the incorporation of a stockpile working model.

#### *1.5. Reserve and Operational Models*

The primary method currently used to model stockpile characteristics in mining is averaging the materials stockpiled and reconciling the values against reserve and operational models. Figure 4 illustrates key aspects of reserve and operational models. A reserve model accounts for the entire life of mining operations and is based primarily on exploration drill holes that are widely spaced apart compared to production drilling. This model is updated biannually and contains more risk and uncertainty than the other models. Short-term operational models are a hybrid between ore control models and the reserve model and are used for planning on a one-to-three-year time horizon. Ore control models are made from additional data obtained by production drilling and contain only the information pertinent to areas of immediate mining within the next week to month for operational purposes. While some details vary in the drillhole spacing and the amount of hybridization between the short-term model, the reserve model and the ore control model, most mining operations follow some version of this established working model.

**Figure 4.** Reserve and Operational Models.

Unlike the reserve and operational models described previously, to the knowledge of the authors, there is no documented working model for the distribution of characteristics within large long-term stockpiles, which poses some issues. First, since stockpiles are not income-generating assets, they typically suffer neglect without an easily implemented working model. Moreover, since many short-term discrepancies in ore reserve accounting can be written off over time under the guise of environmental degradation [42] of stockpiles at the end of the mine life, the absence of a working model may be a disincentive to dedicate engineering resources to their concurrent management. Ultimately, these practices reduce confidence in the characteristics of stockpiled material and lead to a wasteful duplication of time and energy in drilling and sampling when the stockpile is finally set to be processed. Many stockpile working models have been developed manually by mining engineers to meet industry needs. However, there is a knowledge gap between the practical methods developed within industry and scientific documentation.

## *1.6. Sampling*

Many sampling issues have been identified for both stockpiles and ore control operational models. In the case of ore control, blast hole sampling from drill collars has undergone intense scrutiny from the sampling community. Engström [43] provides a recent literature review on this scrutiny and states that blast hole sampling problems include loss of fines (or inaccurate particle size distribution representation), upward/downward contamination, influx of sub-drill material, pile segregation, pile shape irregularities, operator-dependent sampling, overly small sample sizes, frozen (or weathered) blast hole cones and non-equiprobabilistic sampling equipment. Further complications to ore control understanding exist due to material movement after blasting. Thornton [44] states that material moves more than 4 m on average and movements of 10 m are common. Depending on the drill pattern, this movement could equate to a displacement of up to three sample locations away from the initial sample location.

The probability of a misclassification of ore to waste or waste to ore from sampling errors alone is commonly between 5% and 20% for base and precious metal mines [45,46]. In addition, ore loss of 9% to 19% due to blast movement and dilution can be expected [44]. The negative impacts of these issues may become more critical when handling precious metals and FMS data do not typically contain adjustments for them. However, the problem of data accuracy is separate from that of data utilization. If FMS data are used more frequently for modelling, then improvements in data accuracy will be encouraged through data feedback loops. Improvements in sensor technology could lead to increased sampling of mining operations and eventually better account for blast movement, dilution and sampling errors. It is probable that incorporating these more accurate measuring methods may help to improve the precision of the entire approach.

#### *1.7. Current Practice and Scope*

Industry sources have revealed that current practice is to take a running average of the material characteristics that are fed into the stockpile over time. These running averages are based on the reconciled survey volumes and grade values of materials and are updated concurrent with reserve and operating models. This process is designed to fulfill legal reporting requirements. Despite there being a large amount of data that are tracked in the FMS, the resulting stockpile block model currently in place is merely one large, homogenized block value containing the rolling average grade.

At this time, it is the understanding of the authors that no documented methodology exists in academic literature for determining the grade distribution of large, low-grade, randomly truck-dumped, pre-crusher stockpiles via FMS data. In such cases, drilling and grab sampling of the stockpile are often performed in order to determine the grade distribution and model the stockpile. These techniques have proven to be erroneous and biased [28]. We therefore present and explore an initial approach believed to be readily workable for most modern mines through use of established geostatistical modelling techniques applied to the type of data generated by FMS. This method is part of an ongoing study into developing an engineering methodology for greater understanding of large ROM truck-dumped stockpiles.

The working model for stockpile management conceived in this paper can attenuate the negative impacts of the present stockpiling scenario. It requires minimal engineering resources, is easy to set up and maintain over a long period of time, makes use of readily available data already existing in most modern mines, and yields a high level of detail. Our method could be used as a stand-alone model, or it could be used to enhance or verify other models. This method is software agnostic and can be integrated with mining software packages, as we will demonstrate.

## **2. Materials and Methods**

#### *2.1. Approach*

A data-driven approach was used to develop a working model of the characteristics of the stockpile. This approach combined the knowledge of the mine planners with that of the researchers in a process similar to the one demonstrated by Subramaniyan et al. [47]. For the purposes of this paper, as it is an example of the modelling technique, the exact data presented are not real data but exemplify real data seen by the researchers and demonstrate a hypothetical stockpile created to show the design method. Figure 5 outlines the methodology used.

**Figure 5.** Research Approach Following a Data-Driven [47] Technique.

As demonstrated in Figure 5, the inputs of the mine planners were used during each step of model development. In Step 1, mine planners shared relevant data with the researchers, clarified questions about the data, explained any outliers and defined the project boundary, while the researchers explored and cleaned the data. During Step 2, the researchers presented some early visualizations of the raw data and prepared the modelling concepts, and the mine planners provided feedback and context for the data visualization. Step 3 involved preparing and refining the model in accordance with mine planner requirements. Step 4 consisted of a final analysis and review. Each step provided the opportunity for feedback to prior and subsequent steps in the process for the purposes of tuning the final model.

#### *2.2. Data*

For the actual case study, a variety of survey, FMS, geological look-up tables, design files, and drone photogrammetry data were made available to the research team. These data have been kept proprietary and this paper contains only a hypothetical stockpile created in likeness of the real data. The flow of data used for creating the model for the actual case study is shown in Figure 6.

**Figure 6.** Data Flow Overview Figure.

FMS data came from the mine's dispatching system, which contained information about trucks, grade IDs (ore control patterns), dumping locations, dump tonnage, and dumping coordinates. Assay table data contained grade values along with other details about the material, such as hardness, rock type, concentration of deleterious elements, etc. Assay table data were matched to corresponding grade IDs in the FMS data. These values were interpolated into a block model by the same method demonstrated subsequently in this paper. Survey data consisted of .dxf files from survey pickups. Survey data were used to create the topographical extents of the block model and ensure that blocks were sequenced correctly. Historical surveys were used to verify that interpolated block values matched the correct time frame.
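As a minimal sketch of this matching step, the hypothetical R snippet below joins FMS dump records to the assay look-up table on a shared grade ID; the file names and column names (grade_id, au_gpt, etc.) are illustrative assumptions, not the site's actual schema.

```r
# Minimal sketch (hypothetical column names): join FMS dump records to the
# assay look-up table on the ore-control grade ID so every dump carries a grade.
fms   <- read.csv("fms_dumps.csv")     # truck, grade_id, x, y, bench, tonnes
assay <- read.csv("assay_table.csv")   # grade_id, au_gpt, hardness, rock_type

dumps <- merge(fms, assay, by = "grade_id", all.x = TRUE)  # left join keeps all dumps

# Flag dumps whose grade ID has no assay entry, for follow-up cleaning
missing_assay <- dumps[is.na(dumps$au_gpt), ]
```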

A hypothetical dataset was made to illustrate the design method in this paper without revealing the proprietary data of the site. Tables 1 and 2 describe this hypothetical dataset, which represents the results of the combined FMS and assay table data shown in Figure 6. These data are statistically similar to the data used at the case study location and are similar to operational data available to most modern mines. The hypothetical example is that of an open pit gold mine, but the methodology could easily be applied to any other base or precious metal mine.


**Table 1.** Example FMS and Assay Data Categories.

**Table 2.** Statistical Characteristics of the Modelling Dataset.


As this model is a hypothetical example, survey volumes were designed in Maptek Vulcan for the creation of the stockpile survey data. Maptek Vulcan was chosen only because it was familiar and accessible to both the research team and the mine involved in the study. The model is software agnostic and compatible with any engineering software which uses .dxf file format. Other file formats may also prove compatible with the model in the future. The dimensions of the hypothetical heaped fill stockpile are 150 m × 150 m × 5 m. Figure 7 shows a screenshot of the designed hypothetical stockpile.

**Figure 7.** Screenshot of Hypothetical Heaped Fill Stockpile Designed in Maptek Vulcan.

Since the accuracy of each elevation value was limited to the value of the bench level, each bench was modeled as a two-dimensional plane. In the actual case study, the benches were aggregated to create the final block model. While there is some difference in the tonnage values of each truck load in the data, for the purposes of this model, all dump locations were assumed to contain the same amount of material. Only one bench is demonstrated in this paper, but by this method multiple benches can be modelled as independent domains and combined into a block model for an entire stockpile.

#### *2.3. Model*

The block model was computed using R scripts running on a Python Jupyter Notebook kernel. This notebook was developed using concepts from Pyrcz [48]. First, the combined values given by the FMS and assay data were imported into a dataframe. Then, the bench height given by the dump location was converted into an elevation (Z value) for each of the dump coordinates. A dataframe consisting of the centroid locations for each block in the desired block model extents was then created. Finally, values were interpolated from the initial dataframe using the inverse distance weighting (IDW) function from the gstat package in the R programming language and assigned to the centroids of the dataframe representing the block model. These centroids were then imported into Maptek Vulcan to demonstrate the capability of the model to be made useable by an engineering team.
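A condensed sketch of this pipeline is given below, assuming a single bench held in a data frame named dumps with x, y and au_gpt columns (as in the earlier snippet); the 15 m centroid spacing follows the model description, while the bench mid-height and output file name are illustrative assumptions.

```r
# Sketch of the interpolation step for one bench treated as its own 2D domain.
library(sp)
library(gstat)

bench <- dumps                            # hypothetical data frame with x, y, au_gpt
coordinates(bench) <- ~x + y              # promote to SpatialPointsDataFrame

# Regular grid of block centroids covering the 150 m x 150 m bench at 15 m spacing
grid <- expand.grid(x = seq(7.5, 142.5, by = 15),
                    y = seq(7.5, 142.5, by = 15))
coordinates(grid) <- ~x + y

# Inverse distance weighting with power beta = 2 and a global search neighbourhood
est <- idw(au_gpt ~ 1, bench, newdata = grid, idp = 2)

# Attach the bench mid-height as the Z value and export centroids for Vulcan import
blocks <- data.frame(coordinates(grid),
                     z = 2.5,                     # assumed mid-height of a 5 m bench
                     au_gpt = est$var1.pred)
write.csv(blocks, "stockpile_bench_blocks.csv", row.names = FALSE)
```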

The gstat package in the R programming language, developed by Edzer Pebesma [49], performs a number of common geostatistical functions. Pebesma outlines the modelling approach that should be used with gstat, which is shown in Figure 8.

**Figure 8.** Decision Tree for Default gstat Program Action (from Pebesma [49]).

Inverse distance weighting (IDW) was selected as the modelling method in accordance with Figure 8, in that prediction locations were specified while variograms and base functions were not. Within gstat, the IDW function works the same way as the ordinary kriging function, except that no variogram model is passed and the inverse distance weighting power is specified directly by the user (β = 2) (see Equation (1)). A global search neighborhood (the default parameter) was used for the model, meaning that all data points were used for estimating the value at each location.

IDW is a form of interpolation. Interpolation means to predict an attribute value *ẑ* at unsampled sites (*x*<sub>0</sub>) from sampled locations (*x<sub>i</sub>*) within a given neighborhood [50]. In this case, each of the dump locations, along with its corresponding grade value (*z*), was considered a "sampled location" (*x<sub>i</sub>*), and the unsampled sites (*x*<sub>0</sub>) were an array of block model centroid values (x and y coordinates within the model space at 15 m × 15 m spacings). While the general IDW equation varies slightly [51], for the purposes of the model shown in this paper, the equation used for interpolation is

$$Z_{x,y} = \frac{\sum_{i=1}^{n} z_i\, d_{x,y,i}^{-\beta}}{\sum_{i=1}^{n} d_{x,y,i}^{-\beta}} \tag{1}$$

where *Z<sub>x,y</sub>* is the value estimated at the block centroid, *n* is the number of samples, *z<sub>i</sub>* is the value of the *i*th sample, *d<sub>x,y,i</sub>* is the distance between the centroid and the *i*th sample, and *β* is the user-specified weighting power.
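For illustration, the short function below evaluates Equation (1) directly for a single centroid; the coordinates and grades in the example call are invented values, and the small distance floor is an added guard not discussed in the text.

```r
# Direct sketch of Equation (1): grade at an unsampled centroid (x, y) as the
# inverse-distance-weighted mean of all sampled dump grades, with beta = 2.
idw_point <- function(x, y, dump_x, dump_y, dump_z, beta = 2) {
  d <- sqrt((dump_x - x)^2 + (dump_y - y)^2)
  d[d == 0] <- 1e-6                 # guard against division by zero at a dump location
  w <- d^(-beta)
  sum(dump_z * w) / sum(w)
}

# Example: estimate one centroid from three hypothetical dumps
idw_point(30, 45, dump_x = c(10, 40, 60), dump_y = c(20, 50, 40),
          dump_z = c(0.35, 0.80, 0.55))
```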

This paper does not address which geostatistical method is best suited for modelling stockpiles. It is meant to demonstrate that FMS data can be used along with geostatistical methods for greater operational understanding of stockpile characteristics. IDW was selected to show that FMS data are amenable to the methods of interpolation commonly employed in geostatistics. It was also chosen because it runs quickly without requiring additional model parameters as input, whereas variogram modelling is more time consuming and does not create a block model as a final product. The discussion section of this paper includes more information on how to further refine the initial model.

#### **3. Results**

In a heaped fill stockpiling scenario, dumping occurs in two phases. The first phase is a series of paddock dumps to form the base layer of the stockpile. The second phase involves building an upper layer above an area of the paddock dumps from which a campaign of edge dumps occurs until the bench is completed. Our results are broken down into these scenarios: first paddock dumping, second edge dumping, and third the two in combination.

#### *3.1. Data Visualization*

Figure 9 shows an initial visualization of the hypothetical dataset. In Figure 9 each dot represents a single dump of similar volume. The positions of the dots represent the FMS tracked dumping position of the truck at the time of dumping. The dot colors represent the interval of their grade values in accordance with the legend shown.

**Figure 9.** Visualization of FMS Data.

Visualizing FMS data in this manner allows for the identification of some areas of homogeneity within the stockpile. One area would be the bottom left part of the figure, where many black dots are near each other. It also reveals that most of the stockpile contains areas of mixed values and it is not easily visually interpreted into a workable model.

The data from Figure 9 can be broken down into two types of data for greater understanding and better modelling. Each data type refers directly to the dumping process used to create it. The first type of data is termed base layer or paddock dumping data and refers to instances in the data where the material was dumped as a heap on a roughly flat surface. The second type of data is called upper layer or edge dumping data and refers to situations where material was dumped down an existing stockpile face. Each of these data types was explored and modeled.

#### *3.2. Base Layer/Paddock Dumping Model*

Preliminary data analysis isolated the FMS data corresponding to paddock dumps, which form the base layer of the hypothetical stockpile; these data are shown in Figure 10. In Figure 10 each dot represents a single dump of similar volume. The positions of the dots represent the FMS tracked dumping position of the truck at the time of dumping. The dot colors represent the interval of their grade values in accordance with the legend shown.

**Figure 10.** Base Layer Grade Dumps of Example Stockpile Colored by Grade.

As is typical of paddock dumping for the base layer of a stockpile, the rows are evenly spaced and maintain homogeneity along their respective row or column, since the material originated from different low-grade areas of the pit with differing grades. This scenario occurs at the beginning of each bench of the stockpile or in paddock dumping areas where no upper layer is added. Material dumped this way is only handled by dozers or other heavy equipment in areas around the perimeter of the stockpile. Areas where dumps occurred too close to one another tend to settle into more vacant areas over time.

In the paddock dumping scenario, the area of influence of each truck load is closely related to the dimensions of the haul truck used. Figure 11 illustrates how truck dimensions influence the size and shape of the dumped material.

**Figure 11.** Volume of Influence of Haul Truck in Paddock Dumping Case.

As shown in Figure 11, the resulting heap of material takes on a geometric form similar to that of an elliptical frustum. The height, width and length (H, W, L in Figure 11) of the heap depend on the respective height, width and length of the haul truck used. These dimensions may extend in the horizontal directions if the haul truck moves excessively while dumping its material, which coincides with a reduction in the height of the corresponding heap. The width may also expand beyond the original width of the truck if the truck does not move forward. A movement of less than two meters during dumping is common, and this movement typically affects only the length of the heap, not the width. Movements occur most frequently when a truck has backed up too close to the neighboring heap before beginning its dumping phase, which acts to smooth out the overall average area of influence of multiple paddock dumps for a given area. At the time of dumping, heaps maintain an approximate 2:1 slope, consistent with that of heaped loaded material. Over time, the slope decreases to that of the natural angle of repose of the material.
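As an illustrative aside (not part of the original method), the heap volume implied by this geometry can be approximated with the standard frustum formula; the truck-body dimensions, the reading of the 2:1 slope as horizontal:vertical, and the example numbers below are all hypothetical assumptions.

```r
# Rough sketch: treat a paddock-dumped heap as an elliptical frustum whose top
# footprint matches the truck body and whose base spreads out along the dump slope.
heap_volume <- function(truck_width, truck_length, heap_height, run_per_height = 2) {
  run    <- heap_height * run_per_height        # horizontal spread (assumed 2:1 = 2 H : 1 V)
  a_top  <- truck_length / 2; b_top <- truck_width / 2   # top semi-axes
  a_base <- a_top + run;      b_base <- b_top + run      # base semi-axes
  A_top  <- pi * a_top * b_top
  A_base <- pi * a_base * b_base
  (heap_height / 3) * (A_base + A_top + sqrt(A_base * A_top))  # approx. volume, m^3
}

heap_volume(truck_width = 8, truck_length = 10, heap_height = 3)  # hypothetical values
```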

Figure 10 alone is sufficient to create a working model for a paddock-dumped stockpile without the need of additional modelling. If operators have an approximate understanding of where heaps of given ore values are located, they can easily match those locations to the corresponding heaps during operation. Blending the example shown in Figure 10 can be intuitively performed by processing the stockpile in parallel vertical approaches as needed.

#### *3.3. Upper Layer/Edge Dumping Model*

Figure 12 shows hypothetical FMS data for the second phase of heaped fill stockpile construction. Like Figure 10, each dot in Figure 12 represents a single dump of similar volume. The positions of the dots represent their FMS tracked position. The dot colors represent the interval of their grade values in accordance with the legend shown. During this phase, material is dumped on top of the material of the base layer. However, data from the previous layer have been removed to facilitate visualization.

**Figure 12.** Upper Layer Dumps of an Example Stockpile Colored by Grade.

Figure 12 demonstrates that many dumping locations are clustered together in the initial area where the upper layer of the stockpile is first made (top left of Figure 12). Afterwards, the stockpile is built by edge dumping material from the initial dumping area until it fills the designed volume for the given bench. Dozers and other heavy equipment help to ensure that material is cascaded without clumping, ensure compaction and maintain safety berms during operation. These actions mix the material from its initial dumping location in intractable ways.

While the FMS data in the previous phase could be defined by orderly row and column behavior, within this upper layer, the dumping process is defined by radial progression from an initial cluster point. Variation in the grade distribution is therefore defined by the tendency of the material to be added in sweeping radial movements which correlate with low-grade patterns of various grades in the pit.

Generally, it is assumed that the volume of influence of the haul truck in the upper layer/edge dump case has a width equal to the width of the haul truck, a length equal to the horizontal component of the bench slope, and a variable height influenced by the width, length and material characteristics. This volume runs perpendicular to the tangent of the dump location and is illustrated in Figure 13.

**Figure 13.** Assumed Area of Influence of Haul Truck in Edge Dumping Scenarios.

While the total volume of material is influenced by the capacity of the haul truck, during this phase the shape of the dumped material is mostly influenced by bench height and material characteristics. When edge dumping occurs normally, that is, without rehandling by dozers or other equipment, the resulting shape is a streak of material cascaded along the entire face of the stockpile. This cascade of material typically aggregates more at the bottom of the dumping area under normal conditions and less near the top crest of the dump. Depending on the condition of the stockpile face, larger amounts of material may clump at various places along the face. Large, rounded material may also roll beyond the floor of the bench, especially at higher bench heights. The authors recognize that these are initial assumptions, and more discussion on improvement of the model may be found in the discussion section of this paper.

It should be noted that paddock dumping still occurs on top of the dump area during the edge dumping phase. These dumps are usually made to patch and level the floor of the dump area, create safety berms or protect light stations. Occasionally, paddock dumping occurs near the edge and the material is subsequently pushed over by a dozer. Operator error may cause trucks to misalign with the edge of the dump, creating a more turbulent cascade. Such dumping scenarios are extremely difficult to identify from the data alone.

While it does not offer an operational model on its own, Figure 12 still serves as a starting point. Most homogeneity occurs in radial/diagonal left-to-right directions and at the corners. As with Figure 10, optimizing stockpile processing can intuitively follow the observable pattern of homogeneity (bottom left to top right or vice versa). Unlike Figure 10, operators will have a harder time identifying which area corresponds to which heap shown in the data. This scenario also contains more natural mixing of material due to settling and dozer handling than does the previous scenario.

#### *3.4. Interpolation Model*

Unlike Figures 10 and 12, Figure 9 offers no visual foundation for an operational model. The approach considered for such a situation is that of interpolation. Figure 14 shows an example of an interpolated model for the hypothetical stockpile using the inverse distance weighting method described in the method section. In this scenario, the area of influence of each truck load is difficult to determine since material that was dumped first affects the form that material dumped later takes.

**Figure 14.** Grade Distribution Map of Combination of Layers.

From the map shown in Figure 14, a block model may be created; since each coordinate lies on the same elevation, the block centroid Z-value is given by the elevation midway through the bench. Figure 15 shows the resulting block model in Maptek Vulcan, created from the interpolation shown in Figure 14 and bounded by the dimensions of the stockpile shown in Figure 7. To be clear, this block model contains only one block level. In the event that a stockpile contains multiple benches or levels, this same modelling process may be repeated for each level. Because the model was an interpolation, it populated values for every position within the sample space, including the corners where no data were shown. This interpolation is trimmed by the survey volume to create the block model shown in Figure 15.

**Figure 15.** Screenshot in Maptek Vulcan of Resulting Block Model from FMS Data Interpolation.

Figure 15 shows grade values for each 5 m × 5 m × 5 m block in the example stockpile volume. Grade values are colored in accordance with the legend shown in the top left of the black viewing window of the screenshot. The red box represents the model extent area.

## *3.5. Analysis*

To showcase the variance of the model by region and see how it compares with the data, confidence intervals on the mean values were computed for each quadrant of the model area. A summary of the confidence intervals is shown in Table 3. Confidence intervals measure the degree of uncertainty in a dataset. They can take any number of probability limits, but Table 3 shows only the 90%, 95% and 99% confidence levels. The wider the confidence interval, the greater the spread of the data around its mean.


**Table 3.** Confidence Intervals of Data and Model Areas.

Table 3 shows that, for each quadrant and confidence level, the model has a smaller confidence interval value than the example data. Confidence interval values for the entire data area are lower than only two of the confidence intervals for the IDW model; these occur in the bottom left and bottom right areas of the model, and only when the 90% confidence level for the data is compared with the 99% confidence level for the model. Overall, these lower values demonstrate that the model has less variance than the dataset. The top half of the model also has less variance than the bottom half.
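A hedged sketch of how such quadrant confidence intervals could be reproduced is shown below, reusing the hypothetical dumps and blocks data frames from the earlier snippets; the quadrant split at 75 m and the t-based interval are illustrative choices, not the paper's exact procedure.

```r
# Illustrative sketch of the quadrant analysis: t-based confidence interval half-widths
# on the mean grade of the dump data and of the interpolated blocks in each quadrant.
ci_halfwidth <- function(v, level = 0.95) {
  qt(1 - (1 - level) / 2, df = length(v) - 1) * sd(v) / sqrt(length(v))
}

quadrant <- function(df, xmid = 75, ymid = 75) {
  ifelse(df$x < xmid,
         ifelse(df$y < ymid, "bottom left", "top left"),
         ifelse(df$y < ymid, "bottom right", "top right"))
}

for (q in unique(quadrant(dumps))) {
  d <- dumps$au_gpt[quadrant(dumps) == q]    # dump grades in this quadrant
  b <- blocks$au_gpt[quadrant(blocks) == q]  # interpolated block grades in this quadrant
  cat(q, ": data ±", round(ci_halfwidth(d), 3),
      " model ±", round(ci_halfwidth(b), 3), "\n")
}
```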

#### **4. Discussion**

#### *4.1. Workflow*

The controversy in using a method that incorporates FMS data is that it frequently disagrees with the reconciled grade values required for financial reporting. FMS data also contain errors and need to be cleaned before modelling can proceed smoothly. However, FMS data can be integrated into an automatic modelling process that frees up engineering resources. Using a data-driven approach also acts as a step towards digitization, and the model will improve with each iteration.

Figure 15 demonstrates that this modelling method may be passed into conventional engineering software and used by mine operations to optimize the processing of the stockpile. Effectively, this makes stockpile processing similar to mining of a large muck pile and subject to the same methods of ore control and mine planning previously established. Engineering software can create an optimized sequence for processing the stockpile which reduces the variability of the feed entering the mill. Block information can be uploaded into modern shovels with tracking technology so that operators can more tightly control processing along expected grade boundaries.

Having knowledge of zones within the stockpile that are trending to higher or lower values could potentially lead the operation to plan the dumping locations of each blast pattern more thoroughly. The added cost of organizing the stockpile in a real-time optimization would need to be weighed against the expected savings in rehandle cost at the time of the processing of the stockpile years in the future. This cost analysis would be tricky to perform since there are many variables to consider and most mines do not know the exact values and tonnages of their low-grade ore to be mined over the course of the entire mine life.

#### *4.2. Data*

Some dump coordinates fall outside of survey areas. This may be due to errors in the FMS tracking capability, or some material may be dumped outside of the boundary initially and later moved into it by dozers or other heavy equipment. Since FMS data alone make it difficult to determine the extents of the topography, survey data will always need to be used to create a final trim. There is also variability in the physical location of the GPS coordinate of the haul truck and in how that exact position relates to the centroid of the corresponding dump location. This difference has an even greater impact in the upper layer or edge dumping case, where the area of influence of the truck is more one-dimensional.

Future data inputs from real-time surveying of stockpiles via UAVs can improve the Z values substantially. Drone data can also be used to model the dumping by truck and the movement of material by dozers. Furthermore, drones and additional sensors, along with improved modelling of the spreading behavior, can improve the basic assumptions of the model, thus improving understanding of the relationships between the data and the model.

Despite including drone data, the ability to track material flow over the entirety of its life in a heap fill stockpile and through processing will likely remain out of reach. However, a generalized model of key components of truck-dumped stockpile material behavior (material flow during dumping and movement by dozers for example) will improve the ability to estimate characteristic distribution within stockpiles. This modelling could further reduce processing downtime and increase throughput during the eventual processing of the stockpile.

#### *4.3. Modelling*

Sampling can be used to verify or calibrate the model. If each truck is sampled individually, the average sampling density for the stockpile modelled would be equal to the haulage capacity of the truck used (100–400 t), an improvement in sampling density of orders of magnitude over conventional sampling campaigns for heaped fill stockpiles and leach pads (175,000 tons in the use case of Winterton) [52]. While modelling will never replace sampling, the two problems can be worked on interdependently.

In the event that the grade and tonnage values differ substantially from the official values given for financial reporting, the block model may be adjusted so that it represents the relative grade distribution of the stockpile, with each block weighted in comparison to all others, rather than the absolute block values given by interpolation of the FMS data. This adjustment to the model could also be completed in an autonomous manner and would involve data which are readily available at the mine.
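One simple way such an adjustment might look, sketched under the assumption of equal-sized blocks, is to rescale the interpolated grades so that the tonnage-weighted model average matches the officially reported stockpile grade; the reported grade value below is hypothetical.

```r
# Hedged sketch: rescale block grades so the tonnage-weighted model average matches
# an officially reported stockpile grade while preserving relative grade differences.
reported_grade <- 0.52                     # hypothetical reconciled grade, g/t
block_tonnes   <- rep(1, nrow(blocks))     # equal blocks here; substitute real tonnages if known

model_avg <- sum(blocks$au_gpt * block_tonnes) / sum(block_tonnes)
blocks$au_gpt_adj <- blocks$au_gpt * reported_grade / model_avg
```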

The demonstration of FMS data incorporated into a stockpile block model in this paper opens the door to additional discussion around optimal stockpile modelling from FMS data. While the characteristic distribution of the material may not follow the inverse distance weighting method presented in this paper, it may be adequately modelled via other geostatistical methods, such as triangulation or nearest neighbors. Variograms, ordinary least squares (OLS) prediction methods, as well as kriging may also prove useful for stockpile modelling. These methods follow the modelling flowchart created by Pebesma [49] and shown as Figure 8.
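Since the same gstat workflow supports these alternatives, a brief hedged sketch of swapping IDW for ordinary kriging is shown below, reusing the bench and grid objects from the earlier snippet; the spherical variogram model is only an illustrative choice.

```r
# Sketch of an alternative estimator with the same data: fit an experimental
# variogram and run ordinary kriging instead of IDW.
library(gstat)

v_emp <- variogram(au_gpt ~ 1, bench)                  # experimental variogram of dump grades
v_fit <- fit.variogram(v_emp, model = vgm("Sph"))      # fit a spherical variogram model
ok    <- krige(au_gpt ~ 1, bench, newdata = grid, model = v_fit)

blocks$au_gpt_ok <- ok$var1.pred                       # kriged block grades for comparison
```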

Further studies and sampling could fine-tune the model, for example by using discrete element modelling to track the flow of material under various settling scenarios. Once a model that accurately accounts for the stockpile characteristics "as is" is deemed sufficiently precise, more advanced models may account for metal degradation over time, natural leaching, and other environmental factors. These models may apply directly to heap leach operations by indicating target areas for hydrological work that could increase leaching capacity.

#### *4.4. Other Factors*

While the stockpile used in this study acts as a strategic form of storage, many of the stockpiles used in mining act as a buffer to ensure continuity of the processing stage. Stockpiles dedicated to maintaining a steady flow of material into the plant are generally beneficial. However, bottlenecks and data loss may undermine the intention of keeping control of the flow of material from mine to plant. The short timeframe between stockpiling and processing for these types of stockpiles means that a different modelling approach should be used for them than for long-term stockpiles.

Exposure of stockpiles to dilution, weathering and other elements causes significant changes to their characteristics. These changes amount to degradation, which is more prevalent in long-term stockpiles. Rezakhah and Newman [42] quantify degradation to be between 5 and 10% annually and recognize that literature on the problem of degradation is often seen as isolated from stockpiling or mine planning problems. More robust models of stockpiles from historical FMS data could provide a more granular look into the distribution and rate of degradation which occurs within stockpiles over time.

Base and precious metal stockpiles have much in common, especially if they are made using haul trucks and dozers of the same size. Where base and precious metal stockpiles differ is due to geochemical characteristics of the ore as well as grade distribution differences (nugget effect). Base metals also oxidize quickly, which can affect the geotechnical stability of their stockpiles. These differences should be considered when developing a modelling approach for each of these types of stockpiled materials.

The model demonstrated in this paper takes a simplistic approach to the material in each truck load. In reality, each truck load will likely have undergone some degree of pre-crusher blending. Shovels and loaders constantly mix the material, and blast movement and dilution cause it to differ from what is recorded in the FMS data. Stockpiles that act as process buffers may also undergo additional blending both before and after crushing. In either type of stockpile, the effects of blending must be considered when modelling.

#### **5. Conclusions**

In conclusion, the type of working method demonstrated in this paper shows how leveraging FMS data and existing interpolation techniques can lead to increased understanding of the grade distribution within stockpiles. Figures 14 and 15 indicate interpolated grade values at each location within the stockpile in a way that is directly incorporable into mining operations. Knowledge of the types of zones illustrated by Figures 14 and 15 could enable miners to optimize the processing of their stockpiles. While the method is merely illustrative, it shows that, through a systematic process of validation and modelling improvements, a given mine can use it to come to a better understanding of the grade distribution within its long-term, low-grade stockpile. Better modelling of ROM stockpiles will enhance overall mine optimization efforts such as mine-to-mill and other continuous improvement initiatives.

**Author Contributions:** Conceptualization, A.Y.; Data curation, A.Y.; Formal analysis, A.Y.; Funding acquisition, W.P.R.; Investigation, A.Y.; Methodology, A.Y.; Project administration, W.P.R.; Resources, W.P.R.; Software, A.Y.; Supervision, W.P.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** The dataset used was made by the authors and is available upon request.

**Acknowledgments:** The authors would like to thank Newmont Mining Corporation for the collaboration and support of this project.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


*Article* **Use of Decision Trees for the Development of Decision Support Systems for the Control of Grinding Circuits**

**Jacques Olivier <sup>1</sup> and Chris Aldrich <sup>1,2,</sup>\***


**Abstract:** Grinding circuits can exhibit strong nonlinear behaviour, which may make automatic supervisory control difficult and, as a result, operators still play an important role in the control of many of these circuits. Since the experience among operators may be highly variable, control of grinding circuits may not be optimal and could benefit from automated decision support. This could be based on heuristics from process experts, but increasingly could also be derived from plant data. In this paper, the latter approach, based on the use of decision trees to develop rule-based decision support systems, is considered. The focus is on compact, easy to understand rules that are well supported by the data. The approach is demonstrated by means of an industrial case study. In the case study, the decision trees were not only able to capture operational heuristics in a compact intelligible format, but were also able to identify the most influential variables as reliably as more sophisticated models, such as random forests.


**Citation:** Olivier, J.; Aldrich, C. Use of Decision Trees for the Development of Decision Support Systems for the Control of Grinding Circuits. *Minerals* **2021**, *11*, 595. https://doi.org/10.3390/min11060595

Academic Editor: Yosoon Choi

Received: 12 April 2021 Accepted: 28 May 2021 Published: 31 May 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Keywords:** grinding circuits; minerals processing; random forest; decision trees; machine learning; knowledge discovery; variable importance

#### **1. Introduction**

Grinding circuits are well-known for their inefficiency and disproportionate contributions to the energy expenditure of mineral processing plants [1]. Studies have estimated that comminution circuits account for 2–3% of global energy consumption [1,2] and up to 50% of the energy consumption on mineral processing plants [3], with the cost rising exponentially the finer the grind. Moreover, the pronounced effect of grinding on downstream separation processes has long been a driving force for more efficient operation of these circuits through process control. However, advanced control is often hindered by the complexity of grinding operations, characterised by strongly nonlinear relationships between coupled circuit variables and long time delays. In addition, frequent changes in feed ore characteristics and other disturbances affect the operational state of the circuit, requiring frequent adjustment of set points.

This is partly the reason why in the mineral processing industry, the majority of regulatory grinding circuit control systems is realised through PID control loops [4,5]. In contrast, supervisory control functions designed to maintain set points for regulatory control and adherence to process constraints are entrusted to either process operators or advanced process control (APC) systems. The latter includes expert control systems (ECS) [6,7], fuzzy controllers [8,9], and model predictive control systems [10,11].

Despite the advantages of APC, a recent survey [12] has indicated that it is still not well-established on most mineral processing plants. As a consequence, grinding circuit performance is still largely dependent on operator decision-making processes.

Through trial and error, operators accumulate experience and heuristic knowledge to perform these supervisory functions. However, application of this knowledge is dependent

on a subjective assessment of the state of the circuit and appropriate corrective action. Naturally, this can lead to inconsistent operator decision-making during similar operational states, as well as inconsistent operation between different individuals.

This work explores a methodology to support operator decision-making for the control of grinding circuits by extracting knowledge from plant data. Generally speaking, this may require feature extraction from raw process signals to be interpretable, which could be visualised as sets of univariate time series signals or structured data by the operator, or via more sophisticated infographic plots or displays, as indicated in Figure 1. This information may also be captured by diagnostic process models designed for anomaly or change point detection or fault diagnosis, as well as models that could be used in automatic control systems, as indicated in Figure 1.

**Figure 1.** Data-driven modelling and decision support framework in support of human plant operator and automatic control systems.

It may also be possible to build diagnostic models that could include *if–then* rules that could be used for higher level interpretation of the data, and it is this aspect that is considered in this paper. More specifically, operator behaviour embedded in process data is extracted using decision trees to provide explicit and actionable rules, such as those found in ECS, presented in an *if–then* format. These rules are used to analyse current operator behaviour and guide future operator decisions. Application of the methodology is demonstrated in a case study using data from an industrial grinding circuit.

Section 2 provides a brief overview of knowledge discovery from data relating to grinding circuits, while Section 3 describes the specific methodology followed in the investigation, based on the use of decision trees. Section 4 demonstrates the methodology on an industrial grinding circuit, with a discussion of the results and general conclusions presented in Section 5.

#### **2. Knowledge Discovery for Grinding Circuit Control**

Control of grinding circuits requires knowledge of the fundamental principles governing circuit operation. This knowledge allows the transformation of observed data and information into sets of instructions [13]. The knowledge acquisition process is often referred to as knowledge discovery, or knowledge engineering in ECS literature [14,15].

During traditional knowledge engineering, such as that used during the crafting of an ECS, rules are extracted directly from the heuristic knowledge of operators or circuit experts. This is generally facilitated through interviews, questionnaires, or observation of circuit operation by the knowledge engineer [15,16]. However, situations often arise where human experts have difficulty articulating their own decision-making rationale, or where the human expert knowledge is inadequate for a comprehensive knowledge base. Alternatively, data mining tools can be applied to extract knowledge from process data through inductive learning.

Data mining usually involves fitting models to or extracting patterns from systems by learning from examples. To facilitate decision support, the specific representation of knowledge in these models is of great importance. For knowledge discovery purposes, data mining models can be categorised according to the manner in which this knowledge is represented, being either implicit or explicit [17]. Implicit representations lack the formal mathematical notations of explicit representations, thus requiring experience to understand and possibly enabling subjective interpretations.

Black box models with implicit knowledge representations have been applied successfully for grinding circuit control using neural networks [18–20], support vector machines [21,22], reinforcement learning [23,24], or hybrid model systems [25], among others. However, the difficulty associated with interpreting and transferring such implicit knowledge makes these models undesirable for operator decision support.

In contrast, rule-based classifiers are more suitable for the development of decision support systems, as they generate explicit *if–then* rules that can in principle be interpreted easily by human operators. In mineral processing, rule-based classifiers include evolutionary algorithms, such as genetic algorithms [26] and genetic programming [27,28], rough sets [29], as well as decision trees [30]. In addition to this, some efforts have also been made to extract rules indirectly from the data, via black box models such as neural networks [31–34].

Of these methods, decision trees are by far the most popular, as the rules generated by these models are easily interpretable by operators and can provide actionable steps to move from one operational state to another. These rules consist of a sequence of conditional statements combined by logical operators; an example is given below.

$$\text{IF } (\mathbf{X}\_1 < \mathbf{C}\_1) \text{ and } (\mathbf{X}\_2 > \mathbf{C}\_2) \text{ and } (\mathbf{X}\_3 = \mathbf{C}\_3) \text{ and } \dots \text{ THEN } (\text{Class} = k) \tag{1}$$
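As a simple illustration (not taken from the original study), the hypothetical rule below shows how a condition of the form in Equation (1) maps directly onto executable logic; the feature names and thresholds are placeholders.

```python
# Minimal sketch: one if-then classification rule in the form of Equation (1).
# Feature names and thresholds (c1, c2) are hypothetical placeholders.
def rule_predict(x1: float, x2: float, c1: float = 0.5, c2: float = 0.3) -> int:
    """Return class 1 if the rule conditions hold, otherwise class 0."""
    if (x1 < c1) and (x2 > c2):
        return 1  # THEN Class = 1
    return 0      # rule does not fire


print(rule_predict(x1=0.4, x2=0.6))  # -> 1
print(rule_predict(x1=0.7, x2=0.6))  # -> 0
```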

While this method for rule induction has been applied to numerous problems, it has found sparse application in the chemical and mineral processing industries. Saraiva and Stephanopoulos [35] demonstrated the use of decision rules extracted from decision trees to develop operating strategies for process improvement in case studies from the chemical processing industry. Leech [36] developed a knowledge base from rules induced from decision trees to predict the pellet quality of uranium oxide powders for nuclear fuels. Reuter et al. [37] generated a rule base to predict the manganese grade in an alloy from slag characteristics in a ferromanganese submerged-arc furnace.

The applications of rule-based classifiers mentioned above mostly focus on the generation of rule sets for automatic control systems. Accordingly, these systems are often not suitable for interpretation by humans. In this study, the use of decision trees to extract rules from grinding circuit data that can be used to support control operators is considered. Decision trees have received some focus in the development of decision support systems, but applications to the processing industries are rarely encountered [38,39].

#### **3. Methodology**

#### *3.1. Classification and Regression Trees*

Decision trees are a class of machine learning algorithms that split the data space into hyper-rectangular subspaces. Each subspace is associated with a single class label, for categorical data, or numerical value for continuous data. These subspaces are identified by recursively searching for partitions based on a single input variable that cause the largest reduction in impurity of the output variable in the associated hyperrectangle.

Tree induction algorithms, such as CART (Breiman, et al., 1984) and C4.5 (Quinlan, 1993), utilise different concepts for this notion of impurity. Different impurity measures are also used depending on whether the tree is used for classification or regression. For classification purposes, CART, the algorithm used during this investigation, calculates the Gini Index at each split point. Consider a classification task where the proportion of class *k*, at node *η*, is given by *p*(*k*|*η*). The Gini Index, *i*(*η*), at node *η* is given by:


$$i(\eta) = 1 - \sum\_{k=1}^{C} p(k|\eta)^2 \tag{2}$$

The Gini Index increases as the impurity or mixture of classes increases at the node. For example, in a binary classification problem the Gini Index reaches a maximum value of 0.5 when both classes are of equal proportion in the node. A node containing examples from a single class will have a Gini Index of 0. The reduction in impurity for a proposed split position, *ξ*, depends on the impurity of the current node, the impurity of the proposed left and right child nodes (*η<sub>L</sub>* and *η<sub>R</sub>*), as well as the proportion of samples reporting to each child node (*p<sub>L</sub>* and *p<sub>R</sub>*):

$$
\Delta i(\xi, \eta) = i(\eta) - p\_R \times i(\eta\_R) - p\_L \times i(\eta\_L) \tag{3}
$$

The split position resulting in the largest decrease in impurity is selected. In regression trees, splits are selected to minimise the mean squared error from predictions of the child nodes.
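As a concrete illustration of Equations (2) and (3), the short Python sketch below computes the Gini Index of a node and the impurity reduction for candidate split positions on a small invented dataset; the data and thresholds are placeholders, not plant values.

```python
import numpy as np

def gini(labels: np.ndarray) -> float:
    """Gini Index i(eta) = 1 - sum_k p(k|eta)^2 for the samples in a node (Equation (2))."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_reduction(x: np.ndarray, y: np.ndarray, threshold: float) -> float:
    """Delta i(xi, eta) for the candidate split x < threshold (Equation (3))."""
    left, right = y[x < threshold], y[x >= threshold]
    p_left, p_right = len(left) / len(y), len(right) / len(y)
    return gini(y) - p_left * gini(left) - p_right * gini(right)

# Illustrative data: one feature (already sorted), binary labels.
x = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([0, 0, 0, 1, 1, 1])

# Evaluate candidate thresholds (midpoints) and keep the largest impurity reduction.
candidates = (x[:-1] + x[1:]) / 2
best = max(candidates, key=lambda t: impurity_reduction(x, y, t))
print(best, impurity_reduction(x, y, best))  # 0.45 separates the classes perfectly (reduction = 0.5)
```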

Without a stopping criterion specified, this procedure is repeated until all examples in a node belong to the same class, have the same response value, or the node contains only a single training example. In classification models, these terminal (leaf) nodes predict the label of the class present in the largest proportion. For regression models, leaf nodes predict the average value of all samples belonging to the node.

The recursive nature of tree induction algorithms allows decision trees to be represented as tree diagrams, as shown for the classification tree in Figure 2.

**Figure 2.** General classification tree diagram.

The trees are readily converted to disjunctive normal form (DNF) [40] as sets of "*if–then*" decision rules by following the decision paths from the root node to each leaf node. Table 1 presents the results of this procedure, illustrated for the tree in Figure 2.

**Table 1.** Decision rules extracted from the tree in Figure 2.


Without any restrictions, decision tree models can grow to fit almost any data distribution. However, this generally results in the tree overfitting the training data with reduced generalisation performance. Stopping criteria can be imposed to reduce the complexity of the tree and the probability of overfitting the dataset.

These are enforced by requiring a minimum number of samples belonging to a proposed node for the node to be formed, or by placing a hard limit on the number of allowable splits or the overall tree depth.

#### *3.2. Evaluating the Utility of Decision Rules*

In this investigation, the focus was on using decision tree algorithms to extract decision rules which have significant support in the dataset and are sufficiently accurate, while remaining compact and interpretable for decision support. The utility of a rule could be considered a function of these factors, as indicated in Equation (4) below.

$$\text{utility}(\text{rule}) = f(\text{support}, \text{accuracy}, \text{complexity}) \tag{4}$$

Conceptually, there should exist some optimal configuration of the parameters wherein the utility of a rule is maximised. This concept is explored qualitatively in the case study. Each of the three requirements is briefly discussed below.

#### 3.2.1. Supporting Samples in the Dataset

For a rule to be of any utility, it needs to be applicable in the modelled system for significant periods of time. The higher the number of samples in the dataset belonging to a specific rule, the higher the support is for the rule, and the larger the fraction of time for which the rule is valid for decision support. This metric is analogous to the support metric used to quantify the strength of association rules [41]. An acceptable level of support for a rule is dependent on the modelling problem. When modelling common operational practices, a high number of supporting samples would be required. However, if fault or rare events are investigated, the level of support could be considerably less.

#### 3.2.2. Rule Accuracy

The accuracy of the rule refers to the dispersion of target values of the data samples belonging to the rule. For classification trees, the accuracy of a rule is represented by the proportion of training samples belonging to the class predicted by the node. For regression trees, the accuracy is well represented by the standard deviation of the target values of samples belonging to the rule. Low standard deviation of the target values of samples indicates a relatively accurate prediction by the regression tree, with sample target values close to the predicted mean.

Both accuracy measures are closely related to the impurity measures used during construction of the trees. Ideally, emphasis is placed upon rules with high accuracy.

#### 3.2.3. Complexity and Rule Interpretability

For a rule to be of utility for decision support, the rule must remain interpretable by humans. While this notion of interpretability is naturally subjective, longer rules with a large number of splits are difficult to interpret and tie to physical phenomena. The shorter the rule, the easier it is to interpret and possibly act upon.

Here, the decision tree algorithm is forced to generate shorter rules, and fewer rules, by specifying the maximum number of splits allowed in the tree. These restrictions come at a cost, possibly decreasing the accuracy of the rules, since the capacity of the model has been decreased. However, as will be shown, restricting the number of splits does not always have a significant effect on the model generalisation ability, while it significantly increases the interpretability of the rules.
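As a rough illustration of Equation (4), the sketch below scores a candidate rule, represented as a boolean mask over the dataset, by its support (fraction of samples it covers), its accuracy (fraction of covered samples with the predicted class), and its complexity (number of conditions). The data and the rule itself are invented, and combining the three components into a single number is left open here, since the paper evaluates the trade-off qualitatively.

```python
import numpy as np

def rule_utility(covered: np.ndarray, y: np.ndarray, predicted_class: int,
                 n_conditions: int) -> dict:
    """Support, accuracy and complexity of one rule (components of Equation (4)).

    covered: boolean mask marking samples that satisfy the rule's conditions.
    """
    support = covered.mean()                                 # fraction of samples the rule applies to
    accuracy = (y[covered] == predicted_class).mean() if covered.any() else 0.0
    complexity = n_conditions                                # shorter rules are easier to interpret
    return {"support": support, "accuracy": accuracy, "complexity": complexity}

# Illustrative example with invented data: 10 samples, a rule covering 4 of them.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])
covered = np.array([False] * 6 + [True] * 4)
print(rule_utility(covered, y, predicted_class=1, n_conditions=2))
# {'support': 0.4, 'accuracy': 0.75, 'complexity': 2}
```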

#### *3.3. Decision Rule Extraction Procedure*

This section describes the methodology used to extract useful rules from decision trees to support operator decision-making. The overall procedure is displayed in Figure 3. The process is naturally iterative, and in practice, a practitioner would repeat the process until a rule set of sufficient utility is discovered. A short discussion of each step is presented below.


**Figure 3.** Procedure for extracting rules from decision trees.


#### 3.3.1. Data Acquisition and Exploration

Operational data is collected from a mineral processing plant. Ideally, this dataset would span a period of operation capturing some variation or drift in the process. To successfully evaluate the utility of identified rules, the practitioner has to be very familiar with the intricacies of the operation, or a circuit expert needs to be consulted. Next, an exploratory analysis of the collected data can be conducted. Frequently recurring operating states are identified and the conditions around these states inspected. Tying decision rules to specific operational states could provide guidance to move from less to more favourable states.

#### 3.3.2. Model Specification and Tree Induction

The modelling problem needs to be carefully formulated to ensure rules are extracted to address a specific variable that can solve an existing problem, or address specific operational patterns. This leads to the identification of candidate input, **X**, and target variables, **y**, for the decision tree algorithm.

Both controlled and manipulated variables are suitable targets for knowledge discovery. Decision tree models constructed with a manipulated variable as the target lead to rules with a direct control action as the prediction. Modelling a controlled variable does not have this feature, but serves to discover common operational patterns leading to different operational states.

In addition to traditional operational variables, the role of the operator is embedded as a latent variable in the dataset. The operator's contribution will usually be revealed as a set point change in the manipulated variables. Thus, in many situations the rules are actually describing common decision-making patterns by operators.

Once the input and output variables are designated, decision trees are induced on a training partition of the data, and a test set is used to measure the generalisation ability of the tree. If the accuracy of the tree proves to be too low, the previous steps are repeated. Variable importance measures are used to quantify the relative contributions of different variables to the model. The results should be evaluated for consistency with heuristic circuit knowledge.

#### 3.3.3. Rule Extraction and Evaluation

Decision trees are readily converted into a DNF rule set. In a decision tree, each path from the root node to a terminal node can be represented as a rule consisting of the conjunction of tests on the internal nodes along the path. The outcome of the rule is the class label or numerical value in the leaf node. Such a rule is extracted for each terminal node in the decision tree.
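As a concrete, simplified sketch of this conversion, the code below walks the internal structure of a fitted scikit-learn classification tree and prints one *if–then* rule per leaf. The feature names and the toy data are placeholders, not the case-study variables.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def tree_to_rules(tree, feature_names):
    """Convert a fitted sklearn DecisionTreeClassifier into a list of if-then rule strings."""
    t = tree.tree_
    rules = []

    def recurse(node, conditions):
        if t.children_left[node] == -1:            # leaf node reached
            predicted = int(np.argmax(t.value[node]))
            rules.append("IF " + " AND ".join(conditions) + f" THEN class = {predicted}")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        recurse(t.children_left[node], conditions + [f"{name} <= {thr:.3f}"])
        recurse(t.children_right[node], conditions + [f"{name} > {thr:.3f}"])

    recurse(0, [])
    return rules

# Toy example with invented data and placeholder feature names.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = ((X[:, 0] < 0.5) & (X[:, 1] > 0.3)).astype(int)
clf = DecisionTreeClassifier(max_leaf_nodes=4, random_state=0).fit(X, y)
for rule in tree_to_rules(clf, ["X1", "X2"]):
    print(rule)
```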

The utility of each rule was evaluated using the measures proposed above. Rules with high utility are considered valuable for operator decision support and were further analysed for the knowledge the rule contains and its practical usage.


#### **4. Case Study**

In this section, the methodology is applied to a dataset from an industrial semi-autogenous grinding (SAG) circuit. The circuit is operated under human-supervisory control, with set points primarily determined by process operators based on production targets from management. Classification trees were used to analyse the operational patterns surrounding periods wherein the mill overloaded, requiring drastic action from process operators. Identifying and addressing the circuit operating patterns during such events could reduce the frequency of similar events in future operation.


#### *4.1. SAG Circuit Description*

A schematic of the SAG circuit is shown in Figure 4. Crushed ore and water are fed to the SAG mill. Fine SAG mill product leaves the mill through a trommel screen and enters a sump before being pumped to a reverse configuration ball milling circuit. Pebbles, consisting of the mid-size fraction material building up in the mill, exit through pebble ports and are crushed in a cone crusher before being fed back to the mill.

**Figure 4.** SAG circuit diagram, with variable descriptions in Table 2.


The SAG circuit is operated to achieve maximum throughput, by maximising the dry feed rate, while operating within the power draw limits imposed by the SAG mill drive system. Operators continuously monitor the mill power draw and respond by changing the feed rate accordingly. A distinction is made between the pebble discharge rate, **X**3, and the pebble returns rate, **X**4, since operators have the option to drop the whole pebble stream to a stockpile, allowing near instantaneous mill load control.

This action was observed most often when the mill power draw reached a very high level and risked tripping the mill. It can be considered as another binary "on-or-off" manipulated variable. A description of the measured circuit variables, as indicated in Figure 4, is given in Table 2.

**Table 2.** Description of the measured SAG circuit variables.

| Name | Description | Unit |
|------|-------------|------|
| **X**1 | Mill power draw | kW |
| **X**2 | Dry feed rate | Tonnes/hour |
| **X**3 | Pebble discharge rate | Tonnes/hour |
| **X**4 | Pebble returns rate | Tonnes/hour |
| **X**5 | Water addition rate | m<sup>3</sup>/hour |
| **X**6 | Cyclone pressure | kPa |
| **y** | Pebble circuit bypass | Binary control variable |

#### *4.2. Modelling Problem Description*

In this case study, the focus was on gaining an understanding of the sequence of events leading to operators deciding to bypass the pebble circuit. It was generally understood that these events occurred in reaction to impending mill overloads, by removing the midsize fraction from the mill charge. However, metallurgists wanted to discover common operating patterns leading to these overload and subsequent pebble circuit bypass events.

While dropping the pebbles to a stockpile can dramatically reduce the mill load, and subsequently power draw, this action essentially just postpones the problem. Generation of the pebble material has consumed significant amounts of energy, without resulting in product sent for downstream concentration. The pebbles contain a significant amount of valuable material that will require regrinding in the future. Additionally, the drastic change in mill load results in a coarser overall grind and forces the mill into subsequent cycles of instability.

Accordingly, the modelling problem was formulated to predict the status of the pebble circuit as a function of the remaining operational variables. The model specification is summarised in Table 3.

**Table 3.** Classification model specification to predict the status of the SAG pebble circuit.


Since the status of the pebble circuit can be represented as a binary "on/off" variable, the problem was suitable for a classification model. Alternatively, modelling the variable **X**4 as a regression target should lead to similar results.

Notably missing from the inputs in Table 3 is **X**4, the pebble returns rate. Pebble circuit bypass events correspond to normal tonnages on **X**3, but no pebble returns to the feed conveyor (zero on **X**4). Thus, the status of the pebble circuit can be perfectly predicted from knowledge of **X**3 and **X**4 alone. Combining these two variables in a model will result in perfect predictions, but no meaningful insights will be obtained from decision rules.

Since it is a manually triggered event, modelling the status of the pebble circuit essentially attempts to model the operators' decision-making processes. Decision rules induced during the analysis should identify the most common operational patterns leading to bypass events. Once these patterns are identified, the behaviours can be addressed in an attempt to decrease the frequency of these occurrences.

#### *4.3. Raw SAG Circuit Data Exploration*

Data samples were collected at 5 min intervals from the plant, spanning a period of approximately six weeks of operation. The normalised data samples are shown in Figure 5. Regarding the binary variable **y**, a bypass of the pebble circuit is designated with the value 1, while normal operation of the circuit is denoted with the value 0. Of the 9500 samples present in the dataset, 435 samples corresponded to periods of bypassing the pebble circuit, from 141 unique bypass events. The data were cleaned to remove downtime and any equipment maintenance periods.

Figure 6 shows the effect of the pebble circuit bypass on the overall variability in the circuit. The figure shows the principal component scores of the dataset on the first three principal axes, with the pebble returns superimposed as a colour map. Bypassing the pebble circuit corresponds to no pebbles returned to the mill. From the figure, it can be seen that periods of bypass lie outside the edges of the central cluster of normal operation. This suggests that reducing the frequency of these events would decrease the overall variability in the SAG circuit.


**Figure 5.** Normalised SAG circuit operational data spanning six weeks of operation.

**Figure 6.** Principal component scores of SAG circuit operational data projected on the first three principal axes. Percentage of variance explained by each principal component shown in brackets. Pebble return rates superimposed as a colour map.
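A minimal sketch of the kind of exploratory projection shown in Figure 6 is given below, assuming the cleaned operational data are available as a NumPy array; it standardises the variables, projects them onto the first three principal components, and colours the scores by the pebble returns rate. The data-loading step and array contents are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder: operational data matrix (rows = 5 min samples, columns = X1..X6)
# and the pebble returns rate (X4) used for the colour map.
rng = np.random.default_rng(1)
X = rng.random((500, 6))          # stand-in for the cleaned plant data
pebble_returns = X[:, 3]          # stand-in for X4, the pebble returns rate

scores = PCA(n_components=3).fit_transform(StandardScaler().fit_transform(X))

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
p = ax.scatter(scores[:, 0], scores[:, 1], scores[:, 2], c=pebble_returns, cmap="viridis")
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_zlabel("PC 3")
fig.colorbar(p, ax=ax, label="Pebble returns rate")
plt.show()
```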

#### *4.4. Random Forest Classification Model*

To gain a baseline indication of the predictability of the pebble circuit status from the other circuit variables, a random forest model [42] was trained for the classification task. The random forest model, a bagged ensemble of trees with a bootstrap sample used to train each tree in the forest, provides an upper limit against which the predictive performance of individual tree models can be compared. Variable importance estimates from random forest models are also generally more reliable than those from single decision trees because of the bootstrap aggregating procedure.

A random forest model was trained for the classification task specified in Table 3, using the parameters summarised in Table 4. The number of trees in the forest was selected to be large enough that a further increase in the number of trees does not improve the model generalisation. The number of predictors to sample at each split in the tree, from the total number of variables *M*, was maintained at the default value suggested by Liaw and Wiener [43].


**Table 4.** Random forest model parameters for classification of SAG circuit data.

**Table 5.** Custom cost matrix for random forest models to reduce false negatives.


The dataset was split into training and test sets in an 80/20 ratio. Since bypass events are heavily outnumbered by periods of normal operation, the target dataset was highly imbalanced. The imbalance was negated by imposing a higher misclassification cost on bypass samples misclassified as normal pebble circuit operation. The higher misclassification cost was set equal to the proportion of normal samples to pebble circuit bypass samples, to reduce the number of predictions resulting in false negatives. The custom cost matrix is shown in Table 5.

To quantify the model accuracy on the imbalanced dataset, the F1-score was used, as defined below:

$$F1\text{ Score} = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{5}$$

In this context, the precision designates the fraction of samples correctly classified as bypass events (true positives) against the total number of samples classified as bypass events (true positives and false positives). The recall designates the fraction of samples correctly classified as bypass events (true positives) against the actual number of bypass event samples (true positives and false negatives). Ideally, a model should obtain high precision and recall. Since the F1-score is simply the harmonic mean of these two measures, a high F1 score is also desired.
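The sketch below illustrates one way this setup could be reproduced with scikit-learn, assuming the cleaned plant data are available as arrays X (inputs) and y (pebble circuit status). Scikit-learn exposes class weights rather than an explicit cost matrix, so weighting the bypass class by the normal-to-bypass ratio is used here as a stand-in for the custom cost matrix described above; the placeholder data, and all settings other than the 80/20 split and the 50-tree forest size mentioned in the text, are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Placeholder data: replace with the cleaned 5 min plant samples.
rng = np.random.default_rng(0)
X = rng.random((9500, 5))                       # inputs such as X1, X2, X3, X5, X6
y = (rng.random(9500) < 0.046).astype(int)      # roughly 435 bypass samples out of 9500

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Weight the rare bypass class by the normal-to-bypass ratio, as a stand-in
# for the custom misclassification cost matrix of Table 5.
ratio = (y_train == 0).sum() / (y_train == 1).sum()
rf = RandomForestClassifier(n_estimators=50,
                            class_weight={0: 1.0, 1: ratio},
                            random_state=0).fit(X_train, y_train)

y_pred = rf.predict(X_test)
print("F1 score:", f1_score(y_test, y_pred))    # Equation (5)
print(confusion_matrix(y_test, y_pred))
```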

The F1 score as a function of the number of trees in the random forest model, as calculated on the held-out test set, is shown in Figure 7. The figure demonstrates that the F1-score improves sharply until ten trees are added to the model, after which the score plateaus around 0.7 and further increases in generalisation performance are less significant. Notably, a single decision tree achieves an F1-score of only 0.47, indicating a significant number of misclassifications.

The misclassifications are shown in the confusion matrix in Table 6 for a random forest with 50 trees. The confusion matrix shows that the model predicts a low number of false positives, corresponding to a precision of 0.77. However, a significant number of false negatives, corresponding to a recall of 0.6, arises because of misclassification of actual bypass events.


**Figure 7.** Random forest classification accuracy as a function of the number of trees in the model.

**Table 6.** Confusion matrix of a random forest model with 50 trees on an independent test set.

|              | Predicted Class 0 | Predicted Class 1 |
|--------------|-------------------|-------------------|
| True Class 0 | 1805              | 15                |
| True Class 1 | 35                | 52                |


The results demonstrate the difficulty of classifying the pebble circuit bypass events. This is likely a consequence of the fact that the model is attempting to describe operator decision-making. While it is thought that there is a general pattern leading to these events, the decisions made ultimately rely on subjective assessments of conditions and inconsistent choices between different individuals. The complexity of the modelling task is further increased by the general uncertainty present in the circuit, related to the disturbances of feed ore characteristics. However, the majority of the events are correctly classified, and the rule extraction procedure can be used to identify the most prominent behavioural patterns leading to these events.

The permutation importance and Gini importance measures [42,44] were calculated to quantify the importance of each variable in the random forest model. A random or dummy variable was added to the set of predictor variables, as was proposed by [45]. This random variable had no relationship with the target variable and serves as an absolute benchmark against which to measure the contributions of the variables. Both measures were calculated for 30 instances of the model, with each instance trained on a different subset of the data. The distributions of the importance measures calculated based on the permutation importance and Gini importance criteria are shown in Figures 8 and 9, respectively. In these figures, the red horizontal bars in the centres of the boxes show the median values of the importance measures, while the upper and lower edges of the boxes correspond with the 25th and 75th percentiles of the measures. The whiskers extend to the most extreme points not considered outliers, which are indicated by '+' markers. The notches in the boxes can be used to compare the median values of the importance measures, i.e., non-overlapping notches indicate a difference between the median values of the importance measures with 95% certainty.
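A rough sketch of this dummy-variable benchmark is given below using scikit-learn's permutation importance, with the Gini importances taken from the forest's feature_importances_ attribute. The data are placeholders; repeating the calculation over 30 resampled model instances, as done in the paper, would simply wrap this in a loop.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder inputs and target standing in for the plant data.
rng = np.random.default_rng(0)
X = rng.random((2000, 5))
y = (X[:, 1] + X[:, 4] + 0.2 * rng.standard_normal(2000) > 1.1).astype(int)

# Append a random dummy variable (R) as an absolute benchmark.
X_aug = np.hstack([X, rng.random((2000, 1))])
names = ["X1", "X2", "X3", "X5", "X6", "R"]

X_tr, X_te, y_tr, y_te = train_test_split(X_aug, y, test_size=0.2, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Permutation importance on the test set; Gini importance from the fitted forest.
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
for name, p_imp, g_imp in zip(names, perm.importances_mean, rf.feature_importances_):
    print(f"{name}: permutation={p_imp:.3f}, Gini={g_imp:.3f}")
```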


**Figure 8.** Box plots of permutation variable importance measures of a random forest model with 50 trees for the predictor set and a dummy variable (R), showing the median values (red bar), 25% and 75% percentiles (upper and lower box edges), extreme points (whiskers), as well as outliers (red '+' markers).


**Figure 9.** Box plots of Gini variable importance measures of a random forest model with 50 trees for the predictor set and a dummy variable (R), showing the median values (red bar), 25% and 75% percentiles (upper and lower box edges), extreme points (whiskers), as well as outliers (red '+' markers).

In both figures, all variables contributed significantly more to the target variable than the random variable. Both measures show markedly similar variable importance distributions. Both measures identify significant model contributions from **X**5, the water addition rate, and **X**2, the dry feed rate. A lesser contribution from **X**3, the pebble discharge rate, is noted. Although the pebble circuit bypass is generally thought to be a response to rapid increases in the power draw, this variable was deemed less significant.

Ideally, a decision tree analysed for decision support should prioritise the same variables as the random forest model. Apart from the F1-score, a similarity in the variable importance distributions serves as an additional indication that the structure of a decision tree is sufficiently representative of the more accurate and robust random forest model. Accordingly, these measures were compared with the variable importance measures obtained from a single decision tree, as demonstrated in the next section.

#### *4.5. Decision Tree Induction and Simplification*


The previous section demonstrated that the status of the pebble circuit is to an extent predictable from the set of input variables. The random forest analysis demonstrated that an F1-score of 0.47 could be obtained using a single decision tree. While this constitutes a considerable drop in accuracy from the unrestricted random forest model, simpler decision tree models should still be able to extract simple rules describing the most common patterns leading to bypass events.


The trees in the random forest model are constructed without any restrictions on tree or branch growth. This impedes the extraction of short, interpretable decision rules from the tree.

A decision tree model was trained for the classification problem using the parameters in Table 7 below. A restriction on the minimum parent (branch) node size is usually imposed as a default setting in software packages to prevent the tree from growing separate branches for each training example. However, the minimum leaf size of one member still allows the tree to overfit the training data.

**Table 7.** CART model parameters for decision tree induction on the SAG circuit data.


A decision tree trained using the parameters in Table 7 is shown in Figure 10. With no restrictions placed on the branch growth, the tree contains 173 branch nodes and 174 leaf nodes. The tree achieved a classification accuracy, in terms of the F1 score, of 0.422.

**Figure 10.** Decision tree predicting SAG pebble circuit status from operational data, with no restrictions on tree growth capabilities.

While the large number of splits and leaf nodes allows the tree to more closely fit the training data, the interpretability of the decision tree and individual tree branches is lost. The absence of restrictions on tree splitting parameters, such as the number of splits or tree depth, also reduces the generalisation ability of the tree by overfitting the training dataset. This is demonstrated in Figure 11, where the training and test set accuracy of a decision tree is plotted as a function of the maximum number of splits allowed.


**Figure 11.** Classification tree prediction accuracy as a function of the maximum number of split nodes allowed.


Figure 11 shows a slight increase in the F1-score on the test set with an increasing number of splits. There is a sharp increase in the F1-score up until 20 splits, after which the increases become less significant. However, even at 20 splits, the F1-score is only 0.4, indicating a significant number of misclassifications. This stresses the importance of the accuracy of rules extracted from such a tree.

Further, Figure 11 shows that above 20 split nodes the tree is starting to overfit the training data with only marginal increases to the generalisation ability. This result indicates that we can significantly reduce the number of splits for construction of the decision trees, while maintaining acceptable accuracy and generalisation ability. Restricting the number of splits and leaf nodes will somewhat decrease the reliability of the tree, but will also simplify the tree branches greatly to allow the extraction of interpretable decision rules. This simplification is demonstrated in Figure 12.
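The sketch below illustrates how such a sweep could be reproduced with scikit-learn, where the maximum number of splits is controlled indirectly through max_leaf_nodes (for a binary tree, the number of splits is one less than the number of leaves). The data are placeholders, and the class weighting mirrors the cost-sensitive setup described earlier as an assumption.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder data standing in for the SAG circuit samples.
rng = np.random.default_rng(0)
X = rng.random((9500, 5))
y = ((X[:, 1] > 0.65) & (X[:, 4] < 0.5) & (rng.random(9500) < 0.4)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
ratio = (y_tr == 0).sum() / (y_tr == 1).sum()

# Sweep the restriction on tree size and record train/test F1-scores.
for max_splits in [2, 5, 10, 20, 50, 100]:
    tree = DecisionTreeClassifier(max_leaf_nodes=max_splits + 1,
                                  class_weight={0: 1.0, 1: ratio},
                                  random_state=0).fit(X_tr, y_tr)
    f1_train = f1_score(y_tr, tree.predict(X_tr))
    f1_test = f1_score(y_te, tree.predict(X_te))
    print(f"splits={max_splits:3d}  F1 train={f1_train:.2f}  F1 test={f1_test:.2f}")
```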

The trees in Figure 12 were constructed with the parameters indicated in Table 7, as well as an additional parameter restricting the maximum number of splits allowed in the tree. The figure demonstrates that small sets of short, interpretable rules can be generated with 20 or fewer splits in the tree, while maintaining acceptable accuracy on the test dataset.


**Figure 12.** Decision trees generated on the SAG circuit data set with a maximum number of splits imposed. The F1-score of each tree on an independent test set is indicated.


For illustrative purposes, the rest of the analysis considers a tree with ten split nodes. Table 8 shows the confusion matrix for such a tree, which achieved an F1-score of 0.331. Because of the increased misclassification cost, the majority of bypass events are correctly classified. However, this also leads to an increased number of false positives.



**Table 8.** Confusion matrix of the decision tree with a maximum of ten splits on an independent test set.

|              | Predicted Class 0 | Predicted Class 1 |
|--------------|-------------------|-------------------|
| True Class 0 | 1616              | 204               |
| True Class 1 | 29                | 58                |

The above tree induction was simulated 30 times and the variable importance measures were calculated at each iteration. The permutation and Gini variable importance measures are shown in Figures 13 and 14, respectively. Both measures rank the importance of the input variables similarly to that of the random forest model in Figures 8 and 9. However, the importance of the hydrocyclone has diminished in the underfit decision trees. All inputs are again at least as important as the random variable.

The average F1-score over the 30 model instances is notably lower than the above result. This demonstrates the sensitivity of the generated models to the specific partition of data used during training.

The variable importance measures were calculated to directly compare with the results in Figures 8 and 9. This analysis is required to analyse the importance of variables in a random forest, since the forest of trees obstructs simple interpretation of the model. However, a single decision tree is more interpretable, and the most significant variables should be recognisable from the top branches in the tree.

The tree corresponding to the results in Table 8 is presented in Figure 15. The variables close to the root node in the tree correspond to those identified as most significant by the variable importance measures. In the following section, decision rules are extracted from this tree and analysed for their utility in decision support.


**Figure 13.** Box plots of permutation variable importance measures of a single decision tree with a maximum of ten splits, showing the median values (red bar), 25% and 75% percentiles (upper and lower box edges), extreme points (whiskers), as well as outliers (red '+' markers). Distributions were calculated over 30 model realisations.


**Figure 14.** Box plots of Gini variable importance measures of a single decision tree with a maximum of ten splits, showing the median values (red bar), 25% and 75% percentiles (upper and lower box edges), extreme points (whiskers), as well as outliers (red '+' markers). Distributions were calculated over 30 model realisations.

**Figure 15.** Classification decision tree with a maximum of 10 node splits. Leaf nodes resulting in pebble circuit bypass events are circled in red.

#### *4.6. Extracting and Evaluating Decision Rules*

The pathways from the root node to leaf nodes in the tree in Figure 15 are represented as decision rules in Table 9. Where a pathway contains multiple partitions on the same variable, the rule was simplified to contain a single expression for each unique variable. The support, accuracy and number of splits for each rule are summarised in Table 10, with which the utility of each rule can be evaluated.


**Table 9.** Simplified decision rules extracted from the tree in Figure 15.

| Rule | Condition | Predicted Class |
|------|-----------|-----------------|
| 5 | IF Water addition rate < 0.507 AND Dry feed rate ≥ 0.646 | 1 (bypass pebble circuit) |
| 8 | IF Water addition rate ≥ 0.507 AND Mill power draw ≥ 0.428 AND Pebble discharge rate < 0.736 | 0 (normal operation) |
| 9 | IF Water addition rate ≥ 0.507 AND Mill power draw ≥ 0.428 AND Pebble discharge rate ≥ 0.736 | 1 (bypass pebble circuit) |

**Table 10.** Support, accuracy, and number of splits per rule in Table 9.
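As a rough illustration of how the pathways in Figure 15 translate into the rules of Table 9, the sketch below assumes a fitted scikit-learn tree (not the authors' code) and walks each root-to-leaf path, keeping only the tightest bound per variable so that each variable appears at most once per rule.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def extract_rules(tree: DecisionTreeClassifier, feature_names):
    """Read each root-to-leaf path as an IF-THEN rule, one bound per variable."""
    t = tree.tree_
    rules = []

    def walk(node, bounds):
        if t.children_left[node] == -1:             # leaf node reached
            cls = int(np.argmax(t.value[node][0]))  # (cost-weighted) majority class
            conds = []
            for f, (lo, hi) in sorted(bounds.items()):
                # scikit-learn thresholds lie between observed values, so the
                # >= / < convention of Table 9 is preserved on the data.
                if lo is not None:
                    conds.append(f"{feature_names[f]} >= {lo:.3f}")
                if hi is not None:
                    conds.append(f"{feature_names[f]} < {hi:.3f}")
            rules.append("IF " + " AND ".join(conds) + f"; THEN class = {cls}")
            return
        f, thr = t.feature[node], t.threshold[node]
        lo, hi = bounds.get(f, (None, None))
        walk(t.children_left[node],                 # left branch: feature <= threshold
             {**bounds, f: (lo, thr if hi is None else min(hi, thr))})
        walk(t.children_right[node],                # right branch: feature > threshold
             {**bounds, f: (thr if lo is None else max(lo, thr), hi)})

    walk(0, {})
    return rules
```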


The rules in Table 9 leading to bypass events naturally have relatively small numbers of supporting samples, because these events are rare and essentially represent fault conditions. In Table 10, the rules predicting bypass events all have an accuracy below 0.5. If the prediction were a simple majority vote of all the samples belonging to the rule, these rules would naturally predict normal operation of the pebble circuit. However, the higher cost imposed on misclassifying actual bypass events outweighs the cost of misclassifying normal operation samples, which allows the identification of the operational states in which bypass events are most likely to occur. The accuracy is thus better interpreted as an indication of the probability of a bypass event occurring in the operational state specified by the rule. The number of splits is low and interpretable because of the maximum number of splits imposed on the decision tree.
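The support and the empirical bypass probability reported for each rule can be recovered directly from the data set. The helper below is a hypothetical illustration of that calculation, assuming binary labels with 1 denoting a bypass event and normalised input features.

```python
import numpy as np

def evaluate_rule(X, y, conditions):
    """Support and empirical bypass probability of a rule.

    conditions: list of (column_index, operator, threshold), operator in {">=", "<"}.
    """
    mask = np.ones(len(X), dtype=bool)
    for col, op, thr in conditions:
        mask &= (X[:, col] >= thr) if op == ">=" else (X[:, col] < thr)
    support = int(mask.sum())                        # samples satisfying the rule
    p_bypass = float(y[mask].mean()) if support else float("nan")
    return support, p_bypass

# Rule 9 of Table 9, assuming the column order: water addition rate,
# dry feed rate, mill power draw, pebble discharge rate.
# support, p = evaluate_rule(X, y, [(0, ">=", 0.507), (2, ">=", 0.428), (3, ">=", 0.736)])
```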

Three of the extracted rules are critically analysed below. Consider first rule 5:

IF Water addition rate < 0.507 AND Dry feed rate ≥ 0.646; THEN Bypass pebble circuit (Probability = 0.36) (6)

The rule states that at a higher dry feed rate coupled with a lower water addition rate, corresponding to an increased solids density in the SAG mill, there is a 36% chance that the circuit would be bypassed. Depending on the ore characteristics at the time, inadequate water addition could cause the mill to retain more fines than usual, leading to an increase in the mill load and power draw. To decrease the probability of this event in the future, metallurgists could reconsider the SAG discharge density targets given to operators based on different ore sources.

Rule 9 states the following:

IF Water addition rate ≥ 0.507 AND Mill power draw ≥ 0.428 AND Pebble discharge rate ≥ 0.736; THEN Bypass pebble circuit (Probability = 0.102) (7)

Rule 9 states that when the water addition rate and power draw are at medium levels or higher, while the pebble discharge rate is high, there is a 10% chance that the pebble circuit would be bypassed. This situation might arise when the mill feed suddenly changes to a more competent ore source, or a larger portion of mid-size fraction material is being fed. The mill load and power draw are not necessarily high, but the fraction of mid-size material being discharged from the mill is increasing, possibly to a level where the pebble crusher and circuit conveyors are unable to deal with the increased load. Depending on the mill fill level and the amount of power available, operators might choose to draw a higher portion of large rocks from the stockpile to attempt to break down some of the mid-size material. Alternatively, operators may choose to draw an increased fraction of finer material from the stockpile to maintain the mill throughput while not further contributing to the generation of pebbles. Metallurgists could further investigate the particle size distributions received from the preceding crusher section to deal with these occurrences.

Rule number 9 is directly contrasted by rule 8, which received the highest amount of support in the dataset:

IF Water addition rate ≥ 0.507 AND Mill power draw ≥ 0.428 AND Pebble discharge rate < 0.736; THEN Normal pebble circuit operation (Probability = 0.989) (8)

As seen in the tree in Figure 15, rules 8 and 9 split the data space according to the specific value of the pebble discharge rate. In contrast to rule 9, rule 8 predicts that at lower pebble discharge rates the circuit was bypassed only 1.1% of the time. Thus, the combination of the two rules discovers the explicit value of the pebble discharge rate such that, when this value is exceeded, the operator is roughly ten times more likely (0.102 versus 0.011) to bypass the pebble circuit. The rule can alert an operator approaching this specific operational state, hopefully triggering faster control action and avoiding the bypass event.

#### **5. Discussion and Conclusions**

In the case study, classification trees were used to model operator decision-making when deciding whether to remove critical size material from the circuit to prevent the mill from overloading. It was demonstrated how the model specification can be exploited to identify the causes of rare events, and how rules can be extracted to understand why and when operators were making this particular decision.

This type of knowledge can be utilised by metallurgists to aid in determining circuit operational parameters, or provided to the operator as decision support on a human-machine interface (HMI). Decision support on an HMI could nudge a more cautious operator to increase throughput, or restrain aggressive operators when their ambitions might push equipment towards its limits and require drastic action. This decision support could take the form of explicitly displaying the extracted rules on the HMI, or of process alarms alerting the operator when entering a state governed by a specific rule. Depending on the specific problem investigated, such decision support systems could increase overall throughput, increase the energy efficiency of the grinding task, or reduce wear to mill consumables and liners.

The merit of this type of rule induction lies in its simplicity. Site experts or metallurgists can identify a problem and formulate a model to answer questions regarding that problem. Rules are then easily induced using pre-packaged CART implementations. There is no guarantee that the rules will contain valid or insightful knowledge, so the expert is required to analyse them critically to ensure they are reasonable. The greatest inhibitors of extracting rules for successful decision support would be the unavailability of quality data sets, or a lack of site-specific knowledge to interpret and critically evaluate the patterns such rules discover. Neither of these should be of any concern to a plant metallurgist.

While it is unlikely that this type of rule induction would be used to generate a complete ECS, it can certainly augment heuristic knowledge from experts in such systems. Experts often have difficulty explaining the procedures they follow to arrive at decisions [15]. Rule induction could aid in formalising some of these procedures.

As noted by Li et al. [46], the integration of human operators and technology in the control room is lacking in the minerals processing industry. The successful implementation of any process control or decision support system is reliant on effective HMI visualisations, and training operators to effectively utilise such tools.

In summary, decision trees can be used as an effective approach to extract intelligible rules from data that can be used to support operators controlling grinding circuits. Three criteria are considered in the process, i.e., the accuracy of the rule, the support of the rule, and the complexity of the rule. While the case study described the application of classification trees, the methodology is easily extended to regression problems.

In addition, in some instances, as was the case in this investigation, decision trees can identify the most influential variables as reliably as more complex models, such as random forests.

Future work will focus on the industrial operationalization of the approach, making use of various online sensors in grinding circuits.

**Author Contributions:** Conceptualization, C.A. and J.O.; methodology, C.A. and J.O.; software, J.O.; validation, J.O.; formal analysis, J.O.; investigation, J.O.; resources, C.A.; data curation, C.A. and J.O.; writing—original draft preparation, J.O.; writing—review and editing, C.A. and J.O.; supervision, C.A.; funding acquisition, C.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** Financial support for the project by Molycop (https://molycop.com) is gratefully acknowledged.

**Data Availability Statement:** The data are not publicly available for proprietary reasons.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

