# 3.2. MCRNN Architecture

## 3.2.1. Input Representation

Building an accurate and reliable time series model calls for multiscale time series. The long-term temporal pattern shows general trend changes, while the short-term temporal pattern reflects fine-grained fluctuations. Both patterns are critical to the performance of TSC. Inspired by Cui et al. [40], we transform the original input space to obtain representations at different time scales and frequencies. The transformation has two stages: a downsampling transformation in the time domain and a smoothing transformation in the frequency domain. In the first stage, we downsample the sequence *X* = [*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>T</sub>*] of mold level fluctuation with a downsampling rate *r*. A new time series *X<sup>r</sup>* is then generated from the original sequence by retaining every *r*-th data point.

$$X^r = \{x_{1+r \ast i}\}, \quad i = 0, 1, \ldots, \left\lfloor \frac{T-1}{r} \right\rfloor \tag{1}$$
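The downsampling of Equation (1) can be sketched as follows. This is a minimal illustration in 0-based Python indexing (so *x*<sub>1+*r*·*i*</sub> becomes `x[r * i]`); the function name `downsample` is ours, not from the paper.

```python
def downsample(x, r):
    """Eq. (1): retain every r-th point of x, i.e. x_{1+r*i}
    for i = 0, 1, ..., floor((T-1)/r), with T = len(x)."""
    return [x[r * i] for i in range((len(x) - 1) // r + 1)]

# For x = [x_1, ..., x_8] and r = 4 this keeps x_1 and x_5.
print(downsample([1, 2, 3, 4, 5, 6, 7, 8], 4))  # → [1, 5]
```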

To suppress the influence of high-frequency disturbances and random noise, the second stage applies a moving average to the time series. Given an original sequence *X* = [*x*<sub>1</sub>, *x*<sub>2</sub>, ..., *x<sub>T</sub>*] of mold level fluctuation, a new time series *X<sup>w</sup>* is defined for different degrees of smoothness.

$$X^w = \left\{ \frac{1}{w} \sum_{i=(j-1)w+1}^{jw} x_i \right\}, \quad j = 1, 2, \ldots, \left\lfloor \frac{T}{w} \right\rfloor \tag{2}$$

where *w* is the window size.
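The smoothing of Equation (2) averages over non-overlapping windows of length *w*. A minimal sketch, again in 0-based indexing (the name `smooth` is ours):

```python
def smooth(x, w):
    """Eq. (2): mean over non-overlapping windows of size w,
    yielding one value per block j = 1, ..., floor(T/w)."""
    return [sum(x[(j - 1) * w : j * w]) / w
            for j in range(1, len(x) // w + 1)]

# For x = [x_1, ..., x_8] and w = 4 the two block means are 2.5 and 6.5.
print(smooth([1, 2, 3, 4, 5, 6, 7, 8], 4))  # → [2.5, 6.5]
```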

As shown in Figure 4, a sequence of mold level fluctuation values recorded during the production of one slab is transformed in the time and frequency dimensions. With different downsampling rates and degrees of smoothness, we obtain multiple time sequences, each corresponding to a different scale representation of the original input sequence. With this multiscale transformation of the input, both long-term and short-term temporal patterns can be exploited to build a robust model. At the same time, the new time series based on moving averages over different windows reduce the noise of the original sequence. After the two transformation stages, the input is divided into two modules and fed into the neural network. The choice of *r* and *w* is related to the sampling size, i.e., the number of sample points per slab. We compared different sampling sizes at a sampling ratio of 1:2; as shown in Table 2, the model trained well when the sampling size was 256, so we use 256 in our model.
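The two-stage transformation described above can be sketched as a single function that builds the full set of multiscale views from one sequence. This is a hypothetical helper (`multiscale_inputs` and the default rates/windows are ours, chosen only for illustration):

```python
def multiscale_inputs(x, rates=(2, 4), windows=(2, 4)):
    """Build multiscale views of a sequence x:
    - one downsampled branch per rate r (time domain, Eq. (1)),
    - one moving-average branch per window w (smoothing, Eq. (2))."""
    down = {r: [x[r * i] for i in range((len(x) - 1) // r + 1)]
            for r in rates}
    avg = {w: [sum(x[(j - 1) * w : j * w]) / w
               for j in range(1, len(x) // w + 1)]
           for w in windows}
    return down, avg

down, avg = multiscale_inputs([1, 2, 3, 4, 5, 6, 7, 8])
print(down[4])  # → [1, 5]
print(avg[4])   # → [2.5, 6.5]
```

Each branch in the two returned dictionaries would then be fed to its own input module of the network.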

**Figure 4.** Illustration of the input transformations when *r* = 4 and *w* = 4.


**Table 2.** Comparison of sampling sizes at a sampling ratio of 1:2.
