#### *2.1. Operational Relative-Hardness Criteria*

From the several operational parameters that can be captured and associated with SAG mill operations, we consider the energy consumption (EC) and the feed tonnage (FT) to build our operational relative-hardness criteria.

Let us assume that data $\{EC, FT\}\_t$ are collected over a period of time *T* using a Δ*t* discretization. By considering the one-step forward time differences of energy consumption $\Delta EC\_t = EC\_{t+1} - EC\_t$ and feed tonnage $\Delta FT\_t = FT\_{t+1} - FT\_t$, a qualitative assessment of the operational relative-hardness can be made. For instance, if the energy consumption is increasing and the feed tonnage is constant, it can be interpreted as an increase in ore hardness relative to the previous period. Similarly, if the feed tonnage is constant and the energy consumption decreases, a decrease in ore hardness relative to the previous period can be assumed. In particular, when $\Delta EC\_t$ and $\Delta FT\_t$ show the same behaviour, the SAG mill can either be processing ore of medium operational relative-hardness or be being filled up or emptied. To avoid misclassification in this last case, the operational relative-hardness is labelled as undefined. Table 1 summarizes the nine combinations of states and the associated operational relative-hardness.

The qualitative labelling of $\Delta EC\_t$ and $\Delta FT\_t$ as increasing, constant or decreasing can be established based on their global distribution over the period *T* as:

$$\Delta EC\_{t} = \begin{cases} \text{Increasing} & \text{if } \Delta EC\_{t} > \lambda \cdot \sigma\_{\Delta EC} \\ \text{Constant} & \text{if } |\Delta EC\_{t}| \le \lambda \cdot \sigma\_{\Delta EC} \\ \text{Decreasing} & \text{if } \Delta EC\_{t} < -\lambda \cdot \sigma\_{\Delta EC} \end{cases} \quad \Delta FT\_{t} = \begin{cases} \text{Increasing} & \text{if } \Delta FT\_{t} > \lambda \cdot \sigma\_{\Delta FT} \\ \text{Constant} & \text{if } |\Delta FT\_{t}| \le \lambda \cdot \sigma\_{\Delta FT} \\ \text{Decreasing} & \text{if } \Delta FT\_{t} < -\lambda \cdot \sigma\_{\Delta FT} \end{cases} \tag{1}$$

where $\sigma\_{\Delta EC}$ and $\sigma\_{\Delta FT}$ represent the standard deviations of $\Delta EC$ and $\Delta FT$ over the period *T*, respectively, and *λ* is a scalar value that modulates the labelling distribution. Note that (i) a *λ* value above 1.5 would render the definition meaningless, since most values would be labelled as constant, and (ii) the *λ* value is an external model parameter whose choice can be guided either subjectively or via statistical reasoning.
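The thresholding in Equation (1) can be sketched in Python as follows (function name and toy data are illustrative, not from the original work):

```python
import numpy as np

def label_deltas(series, lam):
    """Label one-step forward differences of a signal as Increasing /
    Constant / Decreasing following Eq. (1): the threshold is lam times
    the standard deviation of the differences over the whole period T."""
    deltas = np.diff(series)            # one-step forward differences
    sigma = deltas.std()                # standard deviation over the period T
    return np.where(deltas > lam * sigma, "Increasing",
           np.where(deltas < -lam * sigma, "Decreasing", "Constant"))

# Toy energy-consumption series (kWh) sampled at a constant interval
ec = np.array([10.0, 10.1, 12.0, 12.1, 9.0, 9.1])
print(label_deltas(ec, lam=1.0))
```

Increasing *λ* widens the "Constant" band symmetrically around zero, which is why values above ∼1.5 leave almost every difference labelled as constant.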


**Table 1.** Operational relative-hardness criteria based on one time-step difference of energy consumption and feed tonnage.
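A plausible encoding of the Table 1 criteria is sketched below. Only the cases stated explicitly in the text are certain: increasing EC with constant FT maps to harder ore, decreasing EC with constant FT maps to softer ore, and any pair with the same behaviour maps to undefined. The remaining off-diagonal assignments follow the same logic (a higher energy trend relative to the feed trend suggests harder ore) but are our reading and should be checked against the original Table 1.

```python
def orh_label(dec_label, dft_label):
    """Map the qualitative labels of (delta EC, delta FT) to an operational
    relative-hardness category. Same behaviour -> Undefined (mill possibly
    being filled up or emptied); otherwise compare the two trends."""
    if dec_label == dft_label:          # both Increasing, Constant or Decreasing
        return "Undefined"
    rank = {"Decreasing": -1, "Constant": 0, "Increasing": 1}
    # Energy rising relative to feed -> harder ore, and vice versa
    return "Hard" if rank[dec_label] > rank[dft_label] else "Soft"

print(orh_label("Increasing", "Constant"))   # energy up, feed flat
```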

#### *2.2. Long Short-Term Memory*

The Long Short-Term Memory (LSTM) [15] neural network architecture belongs to the family of recurrent neural networks in Deep Learning [16]. It is suitable for capturing short- and long-term relationships in temporal datasets. Internally, the LSTM applies several combinations of affine transformations, element-wise multiplications and non-linear transfer functions, for which the building blocks are:

$$\mathbf{W}\_{f}, \mathbf{W}\_{i}, \mathbf{W}\_{c}, \mathbf{W}\_{o} \in \mathbb{R}^{n\_{H} \times m}; \quad \mathbf{U}\_{f}, \mathbf{U}\_{i}, \mathbf{U}\_{c}, \mathbf{U}\_{o} \in \mathbb{R}^{n\_{H} \times n\_{H}}; \quad \mathbf{b}\_{f}, \mathbf{b}\_{i}, \mathbf{b}\_{c}, \mathbf{b}\_{o} \in \mathbb{R}^{n\_{H}}; \quad \mathbf{V} \in \mathbb{R}^{K \times n\_{H}}; \quad \mathbf{c} \in \mathbb{R}^{K}$$
where *m* is the number of input variables, *K* is the number of output variables, and $n\_H$ is the number of hidden units. Let $\tau \in \mathbb{N}$ be a temporal window. At each time $t \in \{1, ..., \tau\}$, the LSTM receives the input $\mathbf{x}\_t$, the previous hidden state $h\_{t-1}$ and the previous memory cell $c\_{t-1}$. The forget gate $f\_t = \sigma(\mathbf{W}\_f \mathbf{x}\_t + \mathbf{U}\_f h\_{t-1} + \mathbf{b}\_f)$ is the permissive barrier for the information carried by $\mathbf{x}\_t$. The input gate $i\_t = \sigma(\mathbf{W}\_i \mathbf{x}\_t + \mathbf{U}\_i h\_{t-1} + \mathbf{b}\_i)$ decides the relevance of the information carried by $\mathbf{x}\_t$. Note that both $f\_t$ and $i\_t$ use the sigmoid $\sigma(x) = (1 + e^{-x})^{-1}$ as the activation function over a linear combination of $\mathbf{x}\_t$ and $h\_{t-1}$.

By passing the combination of $\mathbf{x}\_t$ and $h\_{t-1}$ through a Tanh function, a candidate memory cell $\tilde{c}\_t = \mathrm{Tanh}(\mathbf{W}\_c \mathbf{x}\_t + \mathbf{U}\_c h\_{t-1} + \mathbf{b}\_c)$ is computed. The final memory cell $c\_t = f\_t \odot c\_{t-1} + i\_t \odot \tilde{c}\_t$ is computed as the sum of (i) what to forget from the past memory cell, as an element-wise multiplication ($\odot$) between $f\_t$ and $c\_{t-1}$, and (ii) what to learn from the candidate memory cell, as an element-wise multiplication ($\odot$) between $i\_t$ and $\tilde{c}\_t$.

Similar to $i\_t$ and $f\_t$, the output gate $o\_t = \sigma(\mathbf{W}\_o \mathbf{x}\_t + \mathbf{U}\_o h\_{t-1} + \mathbf{b}\_o)$ passes a linear combination of $\mathbf{x}\_t$ and $h\_{t-1}$ through a sigmoid function. It controls the information passing from the current memory cell $c\_t$ to the final hidden state $h\_t = \mathrm{Tanh}(c\_t) \odot o\_t$, computed as an element-wise multiplication between $o\_t$ and $\mathrm{Tanh}(c\_t)$. At the final step $\tau$, the output is computed as $\mathbf{y}\_{\tau} = \mathbf{V} h\_{\tau} + \mathbf{c}$. When dealing with more than one categorical prediction ($K > 1$), as in the present work for ORH forecasting, a softmax function is applied over $\mathbf{y}\_{\tau}$ to obtain the normalized probability distribution, and category $k$ has a probability of

$$\hat{p}(k) = \frac{\exp(y\_{\tau,k})}{\sum\_{k'=1}^{K} \exp(y\_{\tau,k'})}.$$
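A minimal NumPy sketch of this forward pass is given below; the parameter names and dictionary layout are our own, not from the original work:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM recurrence following the gate equations above. p holds
    Wf, Wi, Wc, Wo (nH x m), Uf, Ui, Uc, Uo (nH x nH), bf, bi, bc, bo (nH)."""
    f_t = sigmoid(p["Wf"] @ x_t + p["Uf"] @ h_prev + p["bf"])      # forget gate
    i_t = sigmoid(p["Wi"] @ x_t + p["Ui"] @ h_prev + p["bi"])      # input gate
    c_tilde = np.tanh(p["Wc"] @ x_t + p["Uc"] @ h_prev + p["bc"])  # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde                             # element-wise sums
    o_t = sigmoid(p["Wo"] @ x_t + p["Uo"] @ h_prev + p["bo"])      # output gate
    h_t = np.tanh(c_t) * o_t                                       # hidden state
    return h_t, c_t

def softmax(y):
    e = np.exp(y - y.max())          # shift for numerical stability
    return e / e.sum()

def lstm_forward(X, p, V, c_out):
    """Unroll over a window of length tau (X has shape (tau, m)); the class
    probabilities are computed only at the last step, as in Figure 1 (right)."""
    h = np.zeros(p["bf"].shape)
    c = np.zeros(p["bf"].shape)
    for x_t in X:
        h, c = lstm_step(x_t, h, c, p)
    return softmax(V @ h + c_out)    # \hat{p}(k) over the K categories
```

With *K* = 3, the returned vector corresponds to the probabilities of the three ORH categories at the last recurrence.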

An illustrative scheme of the internal connection at time step *t* inside an LSTM is shown in Figure 1 (left). The ORH prediction has three categories (hard, soft and undefined) and the probability is computed at the last unit, at time step *τ*, as shown in the unrolled LSTM in Figure 1 (right).

**Figure 1.** Schemes. Information flow inside Long Short-Term Memory (LSTM) (**left**) and unrolled LSTM where the output is computed at the last recurrence (**right**).

#### **3. Experiment**

#### *3.1. Dataset*

We used two datasets containing operational data for two independent SAG mills, recorded every half hour over a total time of 340 and 331 days, respectively. Each SAG mill receives fresh feed and is connected in an open-circuit configuration (SABC-B) where the pebble-crusher product is sent to the ball mills. At each time *t*, the dataset contains feed tonnage (FT) (ton/h), energy consumption (EC) (kWh), bearing pressure (BPr) (psi) and spindle speed (SSp) (rpm). Each dataset is split into two subsets, training and testing (Table 2); a validation dataset is not considered since the optimal LSTM architecture to train is drawn from previous work [14]. This division is arbitrary, and we seek a proportion of ∼50/50, respectively.


**Table 2.** Summary statistics over the training and testing datasets for the semi-autogenous grinding (SAG) mills.

As can be seen in Table 2, the predictive models are trained with the first 50% of each dataset and tested on the subsequent 50%, without being fed the previous 50% of historical data at test time.
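The 50/50 chronological split can be sketched as follows (the record count follows from the half-hour sampling over 340 days; the function name is illustrative):

```python
import numpy as np

def chronological_split(data, train_fraction=0.5):
    """Split a time-ordered array into training and testing parts without
    shuffling, so every test record lies strictly after the training period."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

# 340 days at half-hour resolution -> 340 * 48 = 16,320 records for one SAG mill
records = np.arange(340 * 48)
train, test = chronological_split(records)
```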

Note that the comminution properties of the ore, such as *a* × *b* or BWi, are not included in the datasets; therefore, the relationship between forecasted ORH and comminution properties is not explored in this work. The results herein presented, however, serve as a basis to examine such a relationship if those properties were known.

#### *3.2. Assumptions*

SAG mills are fundamental pieces of comminution circuits. As no information regarding downstream/upstream processes is available, recognizing bottlenecks in the dataset becomes subjective. We assume that the SAG mills may shift from steady state to under-capacity operation, and vice versa, along the dataset. Thus, stationarity of all operational variable distributions, including the ore grindability, is assumed throughout this work. This means that the entire dataset belongs to a known and planned combination of ore characteristics (geometallurgical units). Consequently, the applicability of the present models beyond this temporal dataset is limited unless a proper retraining process is carried out.

As explained in the problem statement section, we use the temporal average of energy consumption and feed tonnage as input for operational hardness prediction. Since their units are kWh and ton/h, respectively, and the temporal discretization is constant, we assume an additivity property over these variables, so averaging adjacent data points is mathematically consistent.
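Under the constant half-hour discretization, these temporal averages reduce to a simple resample. A sketch with toy data (the column names and random values are illustrative), using non-overlapping eight-hour windows of 16 half-hour records each, matching the ORH evaluation window used in this work:

```python
import numpy as np
import pandas as pd

# Two days of half-hour records standing in for EC (kWh) and FT (ton/h)
idx = pd.date_range("2020-01-01", periods=96, freq="30min")
df = pd.DataFrame(
    {"EC": np.random.default_rng(0).normal(1000, 50, 96),
     "FT": np.random.default_rng(1).normal(2000, 100, 96)},
    index=idx,
)

# Non-overlapping 8 h averages (16 half-hour records per window)
averaged = df.resample("8h").mean()
```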

In the operation from which the datasets were obtained, the SAG mill liners are replaced every 5–7 months. Since each dataset covers almost a year, the liners must have been replaced in each SAG mill at least once during the tested period, which may alter the relationship between energy consumption and the other operational variables, inducing a discontinuity in the temporal plots. However, since the temporal window for ORH evaluation in this work is eight hours, this local discontinuity is not expected to affect the forecast at that time frame: the ORH relates to what was happening in the corresponding mill within the last few hours, not to the mill behaviour prior to the last liner replacement.
