3.1.2. Data Cleaning

While detecting and transmitting the status of a cow, the sensor also sends a large amount of invalid data. Using these data directly would considerably reduce the accuracy of the results. Therefore, the first step is to remove the corrupted data.

Because the data returned by the sensor represent the cattle's state at a specific point in time, quantifying that state is critical for further design. In this work, the time spent in each state per hour, in minutes, is taken as the research object. Because corrupted or invalid data usually occur in clusters, recording the hours in which they arrive as 0 is not accurate. For example, if there is a large amount of corrupted data in an hour, the rest time of the cattle for that hour would be marked as 0 min, which would distort the calculation of the average single period. Therefore, corrupted data and the corresponding time serial numbers are deleted to ensure that they are not included in the calculation of the average period.

The flow chart of data cleaning is shown in Figure 1. Data cleaning operates on the segmented data and produces the cleaned data together with its corresponding time series. This step primarily calculates the resting time of the cattle in each hour. If any corrupted data exist within an hour, that hour's data are discarded.
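
The cleaning rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: it assumes one sample per minute, and the state labels (`"rest"`, `"eat"`), the `None` marker for a corrupted sample, and the function name `clean_hourly_rest` are all hypothetical.

```python
from collections import defaultdict

REST = "rest"     # hypothetical label for a resting sample
CORRUPT = None    # hypothetical marker for an invalid or corrupted sample


def clean_hourly_rest(samples):
    """Return {hour_serial: rest_minutes}, skipping hours with corrupted samples.

    `samples` is an iterable of (minute_index, state) pairs, one per minute,
    where minute_index counts minutes from the start of the recording.
    """
    rest_minutes = defaultdict(int)
    seen_hours = set()
    corrupted_hours = set()
    for minute_index, state in samples:
        hour = minute_index // 60 + 1  # hour serial numbers start at 1
        seen_hours.add(hour)
        if state is CORRUPT:
            corrupted_hours.add(hour)
        elif state == REST:
            rest_minutes[hour] += 1
    # Drop every hour that contained at least one corrupted sample,
    # together with its serial number.
    return {h: rest_minutes[h] for h in sorted(seen_hours - corrupted_hours)}


# Made-up data: hour 1 fully resting, hour 2 half resting, hour 3 corrupted.
samples = [(m, "rest") for m in range(60)]
samples += [(60 + m, "rest" if m < 30 else "eat") for m in range(60)]
samples += [(120 + m, None if m == 5 else "rest") for m in range(60)]
cleaned = clean_hourly_rest(samples)
print(cleaned)  # {1: 60, 2: 30}
```

Hour 3 is removed entirely rather than recorded as 0 min, so it cannot bias the later averages. Hours with missing (rather than corrupted) samples are not handled in this sketch.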

#### *3.2. The State of Cattle throughout the Sampling Period*

Obtaining the cattle's state changes across the sample period requires averaging the data of a group of cows, because the data from a single cow are highly scattered and contain many gaps. For example, averaging the hourly resting time of 14 Brahman females reveals the variation in the resting state of Brahman cattle during the sample period. After data cleaning, the time series differ between the data sets of individual cattle, since the invalid data collected by sensors in the farm's IoT system occur essentially at random. Therefore, the data processing in this step averages the state data of the cattle that share the same time serial number to obtain the state curve of the cattle over the whole cycle. The process flow chart for averaging the state time of several cattle is shown in Figure 2.

**Figure 2.** Process flow chart of average state times for multiple cattle.
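
As a rough sketch of this averaging step, the snippet below averages the rest minutes of several animals for each time serial number, using only the animals that still have a valid value for that hour. The data layout (one dictionary per animal) and the name `average_by_hour_serial` are assumptions made for illustration, and the numbers are made up.

```python
from collections import defaultdict


def average_by_hour_serial(per_animal):
    """Average rest minutes over animals for each hour serial number.

    `per_animal` is a list of {hour_serial: rest_minutes} dicts, one per animal,
    each already cleaned so that corrupted hours are simply absent.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for series in per_animal:
        for hour, minutes in series.items():
            totals[hour] += minutes
            counts[hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(totals)}


herd = [
    {1: 60, 2: 30, 3: 10},
    {1: 40, 3: 20},   # hour 2 was removed during cleaning
    {2: 50, 3: 30},   # hour 1 was removed during cleaning
]
group_curve = average_by_hour_serial(herd)
print(group_curve)  # {1: 50.0, 2: 40.0, 3: 20.0}
```

Dividing by the number of animals that actually contributed to each hour, rather than by the group size, keeps a deleted hour from being counted as zero rest.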

The state diagram of the cattle over the entire cycle is obtained once the program has been executed. Figure 3 shows the calculated hourly rest time of the cattle over the whole cycle (Brahman Female). The numbers on the abscissa mark the days, each of which spans 24 h. The ordinate gives the rest time, in minutes, for each hour.

**Figure 3.** The resting time of Brahman Female during the whole sample period.

Figure 4 is a zoomed-in view of Figure 3 covering days 16 to 20. The rest time of the cattle clearly varies periodically with a cycle of one day. The daily peaks occur in the early morning and late at night, while the valleys usually occur in the forenoon and afternoon.

**Figure 4.** Enlarged view showing 4 days.

#### *3.3. The Average 24 h State of Cattle*

The averaged single rest cycle (24 h) of a single animal is plotted in Figure 5. The entire sampling cycle is approximately 52 days, as shown in Figure 3. The abscissa is the time of day, from 0:00 to 23:59, and the ordinate is the rest time in minutes during that hour (Brahman Female). The plot of the average period is smoother than that of a single period. However, the trend and structure of the two are nearly identical; a single cycle simply has more isolated points and noise.

**Figure 5.** The average of one resting period for cattle (i.e., 24 h a day).
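
The average 24 h period can be obtained by folding the hourly series onto the clock. The sketch below assumes that hour serial 1 corresponds to 0:00 on the first day, so that the clock hour is `(serial - 1) % 24`; the function name `average_daily_profile` and the sample values are illustrative only.

```python
def average_daily_profile(hourly):
    """Fold an {hour_serial: rest_minutes} series into a 24-value daily mean.

    Hour serial 1 is assumed to be 0:00 on the first day, so the clock hour
    of a sample is (serial - 1) % 24.
    """
    sums = [0.0] * 24
    counts = [0] * 24
    for serial, minutes in hourly.items():
        clock_hour = (serial - 1) % 24
        sums[clock_hour] += minutes
        counts[clock_hour] += 1
    # None marks a clock hour for which no valid sample exists.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Serial 1 and serial 25 are both 0:00, on day 1 and day 2 respectively.
profile = average_daily_profile({1: 50, 2: 20, 25: 30})
print(profile[0], profile[1], profile[2])  # 40.0 20.0 None
```

Averaging many days in this way suppresses day-specific outliers, which is consistent with the average period appearing smoother than any single day.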

#### *3.4. Fitting Curve for the Average State Period (24 h)*

Curve fitting is commonly used to obtain the data relationship for such irregular curves. Typical fitting methods include least-squares fitting, exponential function fitting, power function fitting, and hyperbola fitting. Different fitting approaches are compared in this section to obtain the most suitable mathematical model [24,25].


**Table 3.** The results of different fitting methods.

Four fitting approaches are used to fit the 24-h average rest duration of cattle: Gaussian fitting, Sum of Sine fitting, Polynomial fitting, and Fourier fitting. In each fit, the independent variable is the time of day, and the dependent variable is the rest time of the cattle at that time. This establishes the relationship between the time of day and the associated rest time and yields the curve of the cattle's rest time throughout the day. As indicated in Table 3, Gaussian fitting with eight terms is the most accurate model among all candidates in terms of the fitting variance. Its error variance is only 3.0037, which is much smaller than that of the other fitting methods. The fitted curve is depicted in Figure 6; its shape is basically consistent with that of the average period in Figure 5.

The formula of the fitted curve (eight-term Gaussian) is:

$$\begin{aligned} f(x) = {} & 51.29\,e^{-\left(\frac{x-2.823}{2.997}\right)^2} + 44.42\,e^{-\left(\frac{x-24.19}{3.996}\right)^2} + 1.378 \times 10^{14}\,e^{-\left(\frac{x+40.24}{7.546}\right)^2} + 19.29\,e^{-\left(\frac{x-13.55}{3.22}\right)^2} \\ & + 16.18\,e^{-\left(\frac{x-19.06}{10.597}\right)^2} + 19.25\,e^{-\left(\frac{x-4.589}{0.5932}\right)^2} + 29.29\,e^{-\left(\frac{x-20.29}{1.5932}\right)^2} + 20.45\,e^{-\left(\frac{x-9.02}{2.834}\right)^2} \end{aligned} \tag{1}$$

In Equation (1), *x* is the time of day in hours, while *f*(*x*) denotes the rest time, in minutes, within the hour starting at that time. Given the low standard deviation and variance of this fit, the model is considered a suitable candidate for describing the daily resting time of Brahman Females. The models for other breeds, genders and states can be obtained in the same way.
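
For readers who wish to evaluate Equation (1) numerically, a minimal Python sketch follows. It assumes the usual Gaussian term form *a*·exp(−((*x* − *b*)/*c*)²), with the negative sign outside the square, and takes the coefficients exactly as printed in Equation (1). The names `GAUSS8_TERMS` and `gauss8` are illustrative.

```python
import math

# (amplitude a, centre b, width c) for each of the eight terms of Equation (1),
# transcribed as printed. The third term is written with (x + 40.24), so b = -40.24.
GAUSS8_TERMS = [
    (51.29, 2.823, 2.997),
    (44.42, 24.19, 3.996),
    (1.378e14, -40.24, 7.546),
    (19.29, 13.55, 3.22),
    (16.18, 19.06, 10.597),
    (19.25, 4.589, 0.5932),
    (29.29, 20.29, 1.5932),
    (20.45, 9.02, 2.834),
]


def gauss8(x, terms=GAUSS8_TERMS):
    """Sum of Gaussian terms a * exp(-((x - b) / c) ** 2) at clock hour x."""
    return sum(a * math.exp(-((x - b) / c) ** 2) for a, b, c in terms)


# Evaluate the model at a few clock hours.
for hour in range(0, 24, 6):
    print(f"{hour:02d}:00 -> {gauss8(hour):.2f} min")
```

Because the coefficients are copied from the printed equation, the output is only as reliable as that transcription and should be checked against Figure 6 before being used further.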

**Figure 6.** The Gaussian Fitting for one average period.

#### *3.5. Noise Reduction Using Low-Pass Finite Impulse Response (FIR) Filter*

Throughout the sample period, the cattle's condition varies on a daily basis. The plot of the entire activity cycle in Figure 3 contains noise and outliers. Therefore, the sampled data must be denoised.

FIR and Infinite Impulse Response (IIR) filters are two extensively used types of digital filter. In theory, the filtering effect of an IIR filter is superior to that of an FIR filter of the same order, but an IIR filter can diverge. The IIR digital filter offers high precision in its amplitude-frequency characteristics but has a non-linear phase, so it is suited to signals such as audio that are insensitive to phase information. FIR digital filters have lower amplitude-frequency precision than IIR digital filters. However, their phase is linear, meaning that the time difference between signal components of different frequencies remains unaltered after passing through the filter. In addition, the computational delay is relatively small, which makes FIR filters suited to real-time signal processing [26]. Because the state of the cattle is time-series data, it is critical that the filter does not distort the timing of the signal. Therefore, in this work, we use an FIR low-pass filter for denoising.

Cattle monitoring data are sampled once every 60 s in this study, giving a sampling frequency of about 0.0167 Hz (1/60 Hz). This is a low-frequency sampled signal, and noise is present between samples. The noise occupies higher frequencies than the sampling frequency, so the signal between 0 and 0.0167 Hz is kept while the signal above 0.0167 Hz is eliminated. The filter length is set to 5, and Figure 7 shows the shape of the filter and its frequency response. The filtered result is depicted in Figure 8, which uses the resting time of a Brahman Female as an example.

**Figure 7.** The shape of the FIR filter and the frequency response.

**Figure 8.** The resting time of cattle during the whole period after using the FIR filter. (**a**) The whole sample period. (**b**) Enlarged view showing 4 days.
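
A length-5 low-pass FIR filter of the kind described here can be sketched as follows. The design method is not stated in the text, so this example assumes a Hamming-windowed sinc design; the normalised cutoff of 0.1 cycles per sample is a placeholder, and the function names are hypothetical. Only the filter length of 5 is taken from the text.

```python
import math


def design_lowpass_fir(num_taps=5, cutoff=0.1):
    """Windowed-sinc low-pass taps; `cutoff` is in cycles per sample (0 to 0.5)."""
    centre = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - centre
        # Ideal low-pass impulse response (a sinc), using its limit at k == 0.
        if k == 0:
            ideal = 2 * cutoff
        else:
            ideal = math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))  # Hamming
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalise to unity gain at 0 Hz


def fir_filter_centered(signal, taps):
    """Apply the taps centred on each sample so the output stays time-aligned."""
    half = len(taps) // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [
        sum(t * padded[i + j] for j, t in enumerate(taps))
        for i in range(len(signal))
    ]


taps = design_lowpass_fir(num_taps=5, cutoff=0.1)
noisy = [30, 30, 30, 60, 30, 30, 30]  # made-up rest minutes with one outlier
smoothed = fir_filter_centered(noisy, taps)
```

The taps are symmetric, which is what gives the filter its linear phase. Centring the taps on each sample compensates for the constant delay of (N − 1)/2 = 2 samples, so peaks and valleys stay at the same hour after filtering.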

Figure 8b is a detailed view of part of Figure 8a, comparing the data before and after FIR filtering from the 16th to the 20th day. The FIR filter smooths the signal, and the trend of the data can be clearly identified.

Figure 9 shows a single rest period (one day, Day 17) after FIR filtering. It exhibits the same trend as Figure 5, i.e., the filtered rest time of a single day is nearly the same as the average daily rest time. This demonstrates that the cattle's condition changes in a regular pattern. It also indicates that the FIR-filtered signal is accurate. The FIR filter effectively reduces noise and removes outliers and gross errors. As a result, the signal filtered by the FIR filter can be used for subsequent modelling and prediction.

**Figure 9.** The single resting cycle after the FIR filter.

#### **4. Prediction Based on LSTM Model**

In DL, the LSTM network is a special kind of RNN model. Its structural design allows it to avoid the long-term dependency problem: remembering information over long periods is the default behaviour of an LSTM [12,17,27,28]. In this section, we employ the LSTM model to forecast the state of cattle based on the work above. Specifically, the structure and properties of the LSTM and the construction of an LSTM model are discussed first. Second, the cattle state is modelled and forecast using the LSTM model. Finally, the model is optimized to improve its accuracy.

#### *4.1. Build the LSTM Model of the Cattle State*

The program flow chart for establishing the LSTM model is shown in Figure 10. First, the data previously filtered by the FIR filter are imported and divided into a training set and a test set. Second, the LSTM model is created and its parameters are set: the number of input neurons, output neurons and hidden neurons, the learning rate, the batch size, the number of epochs (i.e., the number of training cycles) and the number of LSTM layers [29,30]. The mean square error is chosen as the loss, and the LSTM neural network is trained using the Adam optimisation technique [31]. Training ends when the specified number of epochs is reached, and the lowest loss is output.

**Figure 10.** The code process of building the LSTM model.
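
To make the role of the hidden neurons and the mean square error loss more concrete, the following plain-Python sketch runs the forward pass of a single LSTM layer over one day of inputs. It is an illustration of the standard LSTM cell equations only, not the authors' implementation: the weights are random and untrained, the Adam training loop is omitted (a deep learning framework would normally supply it), and all names and sizes are hypothetical.

```python
import math
import random

GATES = ("forget", "input", "output", "candidate")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_params(hidden_size, seed=0):
    """Small random weights per gate: input weights, recurrent weights, biases."""
    rng = random.Random(seed)
    return {
        gate: {
            "w_x": [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)],
            "w_h": [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
                    for _ in range(hidden_size)],
            "b": [0.0] * hidden_size,
        }
        for gate in GATES
    }


def lstm_step(x, h_prev, c_prev, params):
    """One time step of a single-layer LSTM with a scalar input x."""
    def pre_activation(gate, j):
        p = params[gate]
        recurrent = sum(w * h for w, h in zip(p["w_h"][j], h_prev))
        return p["w_x"][j] * x + recurrent + p["b"][j]

    h_new, c_new = [], []
    for j in range(len(h_prev)):
        f = sigmoid(pre_activation("forget", j))        # how much old memory to keep
        i = sigmoid(pre_activation("input", j))         # how much new content to write
        o = sigmoid(pre_activation("output", j))        # how much memory to expose
        g = math.tanh(pre_activation("candidate", j))   # proposed new content
        c = f * c_prev[j] + i * g                       # updated cell state
        c_new.append(c)
        h_new.append(o * math.tanh(c))                  # updated hidden state
    return h_new, c_new


def mean_squared_error(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


hidden_size = 4  # illustrative number of hidden neurons
params = init_params(hidden_size)
h, c = [0.0] * hidden_size, [0.0] * hidden_size
for clock_hour in range(24):
    # The clock hour is scaled to [0, 1] before being fed to the cell.
    h, c = lstm_step(clock_hour / 23.0, h, c, params)
```

The cell state `c` is carried from one hour to the next and is only modified through the forget and input gates, which is the mechanism that lets an LSTM retain information over long sequences.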

#### *4.2. Using the LSTM Model to Predict the State of Cattle*

It is critical to determine the input, output, and time series before using the constructed LSTM model for cattle state prediction. Given the characteristics of the data sets, the cattle's state must be the output, and the hour serial number of the independent variable must serve as the time series. Determining the input variable is therefore the challenging part of this approach. Because the output varies periodically, the input should be a known function with a fixed period, and the time of day is such a function. Specifically, given that the state cycle of cattle is one day, it is appropriate to take the clock hour of each day as the input variable. The input and output variables, as well as the time series, for the resting time of Brahman Females are as follows:

Input: The number of hours on the clock each day (24 h).

Output: The resting time during this hour (e.g., the resting time at 7:00 is the resting time during the hour from 7:00 to 7:59).

Time series *t*: The sequence number of this hour (e.g., 0:00 a.m. on the first day is the first hour, so *t* is 1; likewise, 0:00 a.m. on the second day is the 25th hour, so *t* is 25) [30,32].
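
The three quantities defined above can be derived from a cleaned hourly series as in the sketch below. The function name `build_lstm_series` and the rest-time values are illustrative; the mapping from *t* to the clock hour follows the convention that *t* = 1 is 0:00 on the first day.

```python
def build_lstm_series(hourly_rest):
    """Turn {t: rest_minutes} into aligned (t, input, output) lists.

    t is the hour serial number (t = 1 is 0:00 on the first day), the input is
    the clock hour 0..23, and the output is the rest time during that hour.
    """
    t_values = sorted(hourly_rest)
    inputs = [(t - 1) % 24 for t in t_values]
    outputs = [hourly_rest[t] for t in t_values]
    return t_values, inputs, outputs


# Made-up rest times; t = 25 is 0:00 on day 2 and t = 32 is 7:00 on day 2.
t_values, inputs, outputs = build_lstm_series({1: 55, 2: 48, 25: 52, 32: 10})
print(inputs)  # [0, 1, 0, 7]
```

Here *t* = 1 and *t* = 25 share the input 0 but have different outputs, which is the situation the time series is meant to disambiguate.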

• Training:

> Both the input and the output data are periodic. The distinction is that the input follows the same fixed values and trend in every cycle, whereas the output takes different values in each cycle. For example, the input is 0 at 0:00 a.m. on Day 17 and at 0:00 a.m. on Day 24, as shown by the two red lines in Figure 11, but the outputs differ. In other words, the same input can produce different outputs at different times. Although the input is the same, its position in the time series is not. The LSTM model can therefore handle the case in which a single input corresponds to several outputs across a time series.

**Figure 11.** The input and output based on the LSTM model.

• Testing and prediction:

> In total, 90% of the data are used for training, and 10% for prediction and testing. For example, the input data sets for training are $input_{t_1}$ through $input_{t_{90}}$, while the data sets for testing are $input_{t_{91}}$ through $input_{t_{100}}$. The training outcomes are depicted in Figure 12.

**Figure 12.** The predictive results after training and testing.
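
The 90%/10% division can be sketched as a chronological split, in which the earliest samples are used for training and the most recent ones for testing. That the split is chronological rather than shuffled is an assumption based on the example of the first 90 and last 10 inputs; the function name is hypothetical.

```python
def chronological_split(sequence, train_percent=90):
    """Split a time-ordered sequence into leading train and trailing test parts."""
    cut = len(sequence) * train_percent // 100
    return sequence[:cut], sequence[cut:]


samples = list(range(1, 101))  # stand-ins for the inputs at t1 .. t100
train, test = chronological_split(samples)
print(train[0], train[-1], test[0], test[-1])  # 1 90 91 100
```

Keeping the order intact means the model is always evaluated on hours that come after everything it was trained on, which mirrors how a forecast would be used in practice.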

The predicted and actual results are both shown in Figure 12. This means that the digital twin model for individual cattle is basically established. As shown in Figure 13, the training loss decreases during training, indicating that the model converges and is practical. However, the prediction error is relatively large, which indicates that the model parameters require further optimization.
