*3.1. Data Pre-Processing Algorithm*

The field data acquisition system continuously collects real-time machine-state data. To prevent invalid data from interfering with the power-feeding model, only the data collected during the silage harvesting period are selected as the raw input of the model.

#### 3.1.1. Effective Data Filtering

The shredding roller and throwing blower are the key working components of the silage harvester. Together they account for more than 60% of the power consumption of the whole machine, and their data show more pronounced variation. Therefore, the monitoring data of these two components serve as the screening basis and, combined with a time-lag analysis model based on material flow, are used to screen all measurement data. The main procedure is as follows.

(1) Define the set of data categories *D* = {*d*<sub>1</sub>, *d*<sub>2</sub>, *d*<sub>3</sub>, *d*<sub>4</sub>} = {1, 2, 3, 0}, where *d*<sub>1</sub>, *d*<sub>2</sub>, *d*<sub>3</sub>, and *d*<sub>4</sub> represent pre-acceleration data, field harvesting data, harvesting-stopped data, and non-experimental suspended data, respectively.

(2) A preliminary classification based on the rotation speed of the shredding roller separates the experimental data segments from the non-experimental ones, as given in Formula (5).

$$\begin{cases} D\_i = 0, & n\_c < 0.5\,n\_{ce} \\ D\_i \in \{1, 2, 3\}, & n\_c \ge 0.5\,n\_{ce} \end{cases} \tag{5}$$

where *D*<sub>*i*</sub> is the data category of the *i*-th sample, and *n*<sub>c</sub> and *n*<sub>ce</sub> are the real-time and rated shredding speeds, respectively.
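The preliminary split of Formula (5) amounts to thresholding the shredding speed at half its rated value. The sketch below is a minimal illustration, not the authors' code: it assumes NumPy, a uniformly sampled speed array, and a hypothetical rated speed of 1000 r/min, and the names `experimental_mask` and `segments` are illustrative only. Samples flagged `True` still have to be refined into categories 1, 2, and 3 by the Mann-Kendall step.

```python
import numpy as np

def experimental_mask(n_c, n_ce):
    """Preliminary split of Formula (5).

    True  -> n_c >= 0.5 * n_ce, i.e. D_i in {1, 2, 3} (refined later).
    False -> n_c <  0.5 * n_ce, i.e. D_i = 0 (non-experimental data).
    """
    n_c = np.asarray(n_c, dtype=float)
    return n_c >= 0.5 * n_ce

def segments(mask):
    """Return (start, stop) index pairs of contiguous True runs; stop is exclusive."""
    padded = np.concatenate(([False], mask, [False])).astype(int)
    edges = np.flatnonzero(np.diff(padded))  # +1 at run starts, -1 at run ends
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))

# Hypothetical shredding speeds (r/min) with an assumed rated speed of 1000 r/min.
speeds = [0, 300, 499, 500, 980, 1010, 200]
mask = experimental_mask(speeds, n_ce=1000)
print(mask.astype(int).tolist())  # [0, 0, 0, 1, 1, 1, 0]
print(segments(mask))             # [(3, 6)]
```

Returning index ranges rather than copies keeps the experimental segments aligned with the other channels recorded at the same sampling instants.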

(3) The Mann-Kendall algorithm is used to detect the boundary points of the field harvesting data [21]. Mann-Kendall is a non-parametric rank-based test for time series that is well suited to field data with anomalous interference and an unknown sample distribution. However, it is not suitable for detecting multiple mutation points in a single sequence. Therefore, the experimental samples in the original torque sequence are split into two subsets, which are used independently to detect the loading mutation boundary and the unloading mutation boundary. For a subset *X* = {*x*<sub>1</sub>, *x*<sub>2</sub>, *x*<sub>3</sub>, ..., *x*<sub>*n*</sub>} containing *n* torque samples, a forward rank sequence *S*<sub>*k*</sub> is first constructed by traversing the sequence forward and accumulating the number of times the torque value at the *i*-th time exceeds that at an earlier *j*-th time, as shown in Formula (6).

$$\begin{aligned} S\_k &= \sum\_{i=1}^{k} p\_i, \quad k = 2, 3, \dots, n \\ p\_i &= \sum\_{j=1}^{i} m\_{ij}, \quad j = 1, 2, \dots, i \\ m\_{ij} &= \begin{cases} 1, & x\_i > x\_j \\ 0, & x\_i \le x\_j \end{cases} \end{aligned} \tag{6}$$

Second, the forward statistic sequence *UF*<sub>*k*</sub> is calculated from the mean *E*(*S*<sub>*k*</sub>) and variance var(*S*<sub>*k*</sub>) of the rank sequence using Formula (7).

$$\begin{aligned} UF\_k &= \begin{cases} 0, & k = 1 \\ \dfrac{S\_k - E(S\_k)}{\sqrt{\operatorname{var}(S\_k)}}, & k = 2, 3, \dots, n \end{cases} \\ E(S\_k) &= \frac{k(k-1)}{4} \\ \operatorname{var}(S\_k) &= \frac{k(k-1)(2k+5)}{72} \end{aligned} \tag{7}$$

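Third, the backward statistic sequence *UB*<sub>*k*</sub> is obtained by constructing the reversed sequence *X*<sub>B</sub> = {*x*<sub>*n*</sub>, *x*<sub>*n*-1</sub>, ..., *x*<sub>1</sub>}, repeating the traversal above, negating the result, and restoring it to the original time order. Then, assuming that the statistic follows the standard normal distribution [22], the solution of Equation (8) that satisfies the *U*<sub>0.05</sub> significance constraint, that is, the intersection of the *UF*<sub>*k*</sub> and *UB*<sub>*k*</sub> curves lying within the critical interval at the 0.05 significance level, is the rising or falling boundary of the effective data segment.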

$$\begin{cases} UF\_k(i) - UB\_k(i) = 0 \\ \text{s.t. } UF\_k(i) \in U\_{0.05}, \quad i = 2, 3, \dots, n \end{cases} \tag{8}$$
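Step (3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, reads the *U*<sub>0.05</sub> constraint as the two-sided standard-normal critical value 1.96, and approximates the intersection in Equation (8) by an exact zero or a sign change of *UF*<sub>*k*</sub> - *UB*<sub>*k*</sub> between consecutive samples. Following the text, it would be applied separately to the loading subset and to the unloading subset; `mk_forward` and `mk_boundaries` are hypothetical names.

```python
import numpy as np

def mk_forward(x):
    """Forward statistic UF_k of Formulas (6)-(7), with UF_1 = 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n + 1)
    # p_i: number of earlier samples x_j (j < i) that are smaller than x_i
    p = np.array([np.sum(x[i] > x[:i]) for i in range(n)])
    s = np.cumsum(p)                        # S_k
    e = k * (k - 1) / 4.0                   # E(S_k)
    var = k * (k - 1) * (2 * k + 5) / 72.0  # var(S_k)
    uf = np.zeros(n)
    uf[1:] = (s[1:] - e[1:]) / np.sqrt(var[1:])
    return uf

def mk_boundaries(x, u_crit=1.96):
    """Indices where UF and UB intersect with |UF| <= u_crit (Equation (8))."""
    x = np.asarray(x, dtype=float)
    uf = mk_forward(x)
    # UB: same statistic on the reversed series, negated, then reversed back
    # so that ub[i] is aligned with uf[i] in the original time order.
    ub = -mk_forward(x[::-1])[::-1]
    d = uf - ub
    points = []
    for i in range(1, len(x)):
        # Discrete intersection: exact zero, or a sign change since sample i - 1.
        crosses = d[i] == 0 or d[i - 1] * d[i] < 0
        if crosses and abs(uf[i]) <= u_crit:
            points.append(i)
    return points

print(np.round(mk_forward([1, 2, 3]), 4).tolist())  # [0.0, 1.0, 1.5667]
print(mk_boundaries([1, 2, 3]))                     # [1]
```

On a real torque subset, the returned indices are candidate rising or falling edges; because the test is run once per subset, each call is expected to yield the single loading or unloading boundary of that subset.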

(4) The steps above accomplish the data screening for the shredding rollers and throwing blower. For the header and feeding unit, however, the data vary too little to allow precise boundary extraction. Therefore, the harvesting data segments of these two parts are located using the material time-lag model of the silage harvester, as shown in Figure 4.

**Figure 4.** Time-lag analysis of silage flow in the harvester. 1: header; 2: conveying roller; 3: feeding rollers; 4: shredding rollers; 5: self-sharpening unit; 6: self-sharpening unit; 7: kernel-processing unit.

When the silage is fed into the harvester at time *t*<sub>h</sub>, the header torque changes almost without delay. After being gathered by the header, the silage reaches the feeding conveying rollers at time *t*<sub>f</sub>, so the feeding power lags by Δ*t*<sub>1</sub>. After being compacted and conveyed by the feeding rollers, it arrives at the shredding rollers at time *t*<sub>c</sub>, so the shredding torque lags by (Δ*t*<sub>1</sub> + Δ*t*<sub>2</sub>). Finally, the shredded crop reaches the blower at time *t*<sub>b</sub>, so the blower load data lag by (Δ*t*<sub>1</sub> + Δ*t*<sub>2</sub> + Δ*t*<sub>3</sub>). Δ*t*<sub>1</sub> and Δ*t*<sub>2</sub> are calculated by Formula (9).

$$\begin{aligned} \Delta t\_1 + \Delta t\_2 &= \frac{S\_1}{120\pi n\_h R\_h} + \frac{S\_2}{120\pi n\_f R\_f} = \frac{S\_1 n\_f R\_f + S\_2 n\_h R\_h}{120\pi n\_f n\_h R\_f R\_h} \\ \Delta t\_2 &= \frac{S\_2}{120\pi n\_f R\_f} \end{aligned} \tag{9}$$

Thus, the effective data segments of the header, feeding unit, shredding rollers, and throwing blower, denoted *X*<sub>h</sub>, *X*<sub>f</sub>, *X*<sub>c</sub>, and *X*<sub>b</sub>, respectively, are given by Formula (10).

$$\begin{aligned} \mathbf{X}\_{h} &= \left[ \mathbf{x}(t\_{\rm cs} - \Delta t\_{1} - \Delta t\_{2}), \mathbf{x}(t\_{\rm cs} - \Delta t\_{1} - \Delta t\_{2} + T), \dots, \mathbf{x}(t\_{\rm cs} - \Delta t\_{1} - \Delta t\_{2} + iT), \dots, \mathbf{x}(t\_{\rm ce} - \Delta t\_{1} - \Delta t\_{2}) \right] \\ \mathbf{X}\_{f} &= \left[ \mathbf{x}(t\_{\rm cs} - \Delta t\_{2}), \mathbf{x}(t\_{\rm cs} - \Delta t\_{2} + T), \dots, \mathbf{x}(t\_{\rm cs} - \Delta t\_{2} + iT), \dots, \mathbf{x}(t\_{\rm ce} - \Delta t\_{2}) \right] \\ \mathbf{X}\_{c} &= \left[ \mathbf{x}(t\_{\rm cs}), \mathbf{x}(t\_{\rm cs} + T), \dots, \mathbf{x}(t\_{\rm cs} + iT), \dots, \mathbf{x}(t\_{\rm ce}) \right] \\ \mathbf{X}\_{b} &= \left[ \mathbf{x}(t\_{\rm bs}), \mathbf{x}(t\_{\rm bs} + T), \dots, \mathbf{x}(t\_{\rm bs} + iT), \dots, \mathbf{x}(t\_{\rm be}) \right] \end{aligned} \tag{10}$$

where *S*<sub>1</sub> and *S*<sub>2</sub> are the distances the crop travels in the header and the feeder, respectively; *R*<sub>h</sub> and *R*<sub>f</sub> are the radii of the header cutting disk and the upper feed roller, respectively; *n*<sub>h</sub> and *n*<sub>f</sub> are the rotational speeds of the header and the upper feed roller, respectively; *t*<sub>cs</sub> and *t*<sub>ce</sub> denote the rising and falling edges of the filtered shredder data; *t*<sub>bs</sub> and *t*<sub>be</sub> denote the rising and falling edges of the filtered blower data; and *T* is the sampling period.
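Once the edges and the lags from Formula (9) are known, Formula (10) reduces to shifting the shredder window by Δ*t*<sub>1</sub> + Δ*t*<sub>2</sub> for the header and by Δ*t*<sub>2</sub> for the feeding unit, while the blower uses its own detected edges. The sketch below assumes NumPy, assumes each component's series starts at *t* = 0 with sampling period *T*, and rounds times to the nearest sample index; the function name `effective_segments` and its argument layout are hypothetical, and the lags are taken as inputs rather than recomputed.

```python
import numpy as np

def effective_segments(x_h, x_f, x_c, x_b, T, t_cs, t_ce, t_bs, t_be, dt1, dt2):
    """Slice the Formula (10) windows from uniformly sampled series.

    Assumes sample i of every series corresponds to time i * T.
    Window bounds are rounded to the nearest sample; both ends are inclusive.
    """
    def window(x, t_start, t_end):
        i0 = max(int(round(t_start / T)), 0)
        i1 = int(round(t_end / T))
        return np.asarray(x)[i0:i1 + 1]

    lag_h = dt1 + dt2  # header leads the shredder by dt1 + dt2
    return {
        "header": window(x_h, t_cs - lag_h, t_ce - lag_h),
        "feeding": window(x_f, t_cs - dt2, t_ce - dt2),
        "shredder": window(x_c, t_cs, t_ce),
        "blower": window(x_b, t_bs, t_be),
    }

# Hypothetical example: every signal's value equals its sample index, T = 1 s.
x = np.arange(20)
seg = effective_segments(x, x, x, x, T=1.0, t_cs=10, t_ce=14,
                         t_bs=12, t_be=16, dt1=3, dt2=2)
print(seg["header"].tolist())  # [5, 6, 7, 8, 9]
```

Working in sample indices rather than timestamps keeps the four windows the same length for the header, feeding unit, and shredder, which simplifies aligning them as model inputs.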
