### *5.2. Traffic Analysis*

As shown in Tables 5 and 6, the Sensors Calibration Test was executed with 4% data transmission loss, while Test 1 (Real-condition) was executed with 12% data transmission loss. The results indicate that a higher number of measurements is associated with a higher data transmission loss.
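As a minimal sketch of how such a loss figure could be derived, the snippet below computes the percentage of expected measurements that never arrived. The function name and the counts are hypothetical and chosen only to illustrate the arithmetic; the section does not state how the loss was measured, so the definition used here is an assumption.

```python
def transmission_loss_percent(expected: int, received: int) -> float:
    """Return the percentage of expected measurements that were not received.

    Assumed definition: loss = (expected - received) / expected * 100.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    if not 0 <= received <= expected:
        raise ValueError("received must lie between 0 and expected")
    return 100.0 * (expected - received) / expected


# Hypothetical counts, not taken from Tables 5 or 6.
loss = transmission_loss_percent(expected=50, received=47)
print(f"{loss:.1f}% data transmission loss")
```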


**Table 5.** Sensors Calibration Test Traffic Analysis.


**Table 6.** Test 1 (Real-condition) Traffic Analysis.

### *5.3. Data Visualization*

Figure 7 visualizes the sensor readings of temperature (legend: Temp), humidity (legend: Hum), light intensity (legend: LightInt) and soil moisture (legend: SoilMoist) collected in Test 1 (Real-condition), a total of 1776 observations. For decision-making purposes, three different watering events were tested and can be observed: (A) strawberry plant not watered, (B) strawberry plant in humid soil and (C) strawberry plant watered.

**Figure 7.** Visualization of Test 1 (Real-condition) data. The x axis is time (number of measurements). The y axis represents the sensor readings. (**a**) Temp unit is °C, (**b**) Hum unit is % RH, (**c**) LightInt unit is Volts and (**d**) SoilMoist unit is Volts. For decision-making purposes, three different watering events were tested and can be observed: (A) strawberry plant not watered, (B) strawberry plant in humid soil and (C) strawberry plant watered.

#### Correlation Coefficients

Correlation coefficients are used to measure the dependence of the readings between two sensors *X* and *Y*. The Pearson correlation coefficient is defined as:

$$
\rho(X,Y) = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y},
$$

where *cov*(*X*,*Y*) is the covariance of *X* and *Y*, and *σ<sub>X</sub>* and *σ<sub>Y</sub>* are the standard deviations of *X* and *Y*, respectively. The coefficient ranges from −1 to 1. A value of −1 represents a perfect negative linear correlation, 0 represents no linear correlation, and 1 represents a perfect positive linear correlation.
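The definition above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the authors' code; the function name `pearson` and the sample readings are hypothetical and are not taken from the Test 1 data set.

```python
import math


def pearson(x, y):
    """Pearson correlation: cov(X, Y) / (sigma_X * sigma_Y).

    The 1/n factors in the covariance and the standard deviations cancel,
    so plain sums of deviations are used.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical readings: temperature rises while relative humidity falls.
temp = [18.0, 20.5, 23.0, 26.5, 29.0]
hum = [78.0, 71.0, 66.0, 55.0, 49.0]
print(f"rho(Temp, Hum) = {pearson(temp, hum):.3f}")
```

For these made-up values the coefficient is close to −1, which is the pattern expected when one series falls almost linearly as the other rises.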

Table 7 lists the *ρ*(*X*,*Y*) values for each pairwise combination of temperature, humidity, light intensity and soil moisture, abbreviated as Temp, Hum, LightInt and SoilMoist, respectively. It shows that Temp and Hum have a strong negative linear relationship, Temp and SoilMoist have a moderate positive linear relationship, and Hum and SoilMoist have a moderate negative linear relationship.

These findings are consistent with domain knowledge in agriculture: relative humidity depends on both pressure and temperature. At a lower temperature, less water vapor is needed to reach a high level of relative humidity. At a higher temperature, however, more water vapor is needed to reach the same level of relative humidity.


**Table 7.** Correlation coefficients *ρ* values.

### *5.4. Feature Selection and Evaluation*

The data generated in this IoT system come from four sensors, which measure temperature, humidity, light intensity and soil moisture. In the dataset, each feature contains the readings from one sensor and is generated by the corresponding sensor node. Laplacian scores are calculated to measure the importance of the features.

Laplacian scores are used here as an unsupervised learning method. To further evaluate the resulting feature ranking, we test it on the following example, as an application in future decision support.
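To make the scoring step concrete, the sketch below follows the commonly used Laplacian score formulation of He, Cai and Niyogi: build a nearest-neighbour graph over the observations, weight its edges with a heat kernel, and score each feature by how well it preserves that local structure. This section does not state the graph construction or parameters the authors used, so the neighbourhood size `k`, the kernel width `t`, the function name and the toy data are all assumptions. In this formulation a lower score indicates a more important feature.

```python
import math


def laplacian_scores(rows, k=2, t=1.0):
    """Return one Laplacian score per feature (lower = more important).

    rows: list of observations, each a list of feature values that are
    assumed to be on comparable scales already.
    k, t: assumed neighbourhood size and heat-kernel width.
    """
    n, m = len(rows), len(rows[0])
    # Squared Euclidean distance between every pair of observations.
    dist = [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
             for j in range(n)] for i in range(n)]
    # Symmetric k-nearest-neighbour graph with heat-kernel weights.
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: dist[i][j])[:k]
        for j in nearest:
            S[i][j] = S[j][i] = math.exp(-dist[i][j] / t)
    degree = [sum(row) for row in S]
    total_degree = sum(degree)

    scores = []
    for r in range(m):
        f = [row[r] for row in rows]
        # Remove the degree-weighted mean of the feature.
        mean = sum(fi * di for fi, di in zip(f, degree)) / total_degree
        g = [fi - mean for fi in f]
        # g^T L g equals half the weighted sum of squared differences.
        local = 0.5 * sum(S[i][j] * (g[i] - g[j]) ** 2
                          for i in range(n) for j in range(n))
        # g^T D g is the degree-weighted variance of the feature.
        spread = sum(gi * gi * di for gi, di in zip(g, degree))
        scores.append(local / spread if spread > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates two groups, feature 1 varies within groups.
toy = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1],
       [5.0, 0.8], [5.1, 0.2], [5.2, 0.6]]
scores = laplacian_scores(toy)
ranking = sorted(range(len(scores)), key=lambda r: scores[r])
print("scores:", scores)
print("features ranked from most to least important:", ranking)
```

On this toy input, feature 0 receives the lower score because neighbouring observations agree on it while it still varies strongly across the data set. No class labels are involved at any point, which is what makes the ranking usable when labels are unavailable.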

#### Example in Decision Support

The outputs of the unsupervised Laplacian score method can be used for decision-making. For example, an expert labeled the collected data and decided when watering is needed. We compare the classification outcomes obtained with the features selected by Laplacian scores against those obtained with all sensor data. Note that the class label covers only one action here, whereas the Laplacian scores are generated without class labels and serve a general purpose.

The classifiers' accuracy and performance were measured using two kinds of data inputs: readings at the 5 min transmission rate and the last 2 h average. In both cases, classification was conducted using the 4 and the 3 most important features based on their scores.
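One way to derive a "last 2 h average" input from readings that arrive every 5 min is a trailing mean over the most recent 24 readings (2 h / 5 min = 24). The sketch below assumes that interpretation; the function name, the handling of the first incomplete windows and the sample values are illustrative assumptions rather than details taken from the experiments.

```python
from collections import deque


def trailing_mean(readings, window=24):
    """Mean of the most recent `window` readings at each time step.

    With one reading every 5 min, window=24 covers the last 2 h.
    Early steps average over however many readings exist so far.
    """
    buffer = deque(maxlen=window)
    running_sum = 0.0
    averaged = []
    for value in readings:
        if len(buffer) == window:
            # The oldest reading is about to be pushed out of the window.
            running_sum -= buffer[0]
        buffer.append(value)
        running_sum += value
        averaged.append(running_sum / len(buffer))
    return averaged


# Hypothetical soil-moisture voltages, shortened window for readability.
print(trailing_mean([1, 2, 3, 4], window=2))
```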

The accuracy and performance of the resulting classifiers using data inputs with the 5 min transmission rate are shown in Table 8 for the 4 and 3 most important features.

**Table 8.** The accuracy and performance of resulting classifiers using data inputs with 5 min transmission rate for the 4 and 3 most important features.


The accuracy and performance of the resulting classifiers using data inputs with the last 2 h average are shown in Table 9 for the 4 and 3 most important features.

Overall, the classification results show that the decision of whether or not to water a plant can be made using a reduced number of sensors. With the 5 min transmission rate, the decision-making accuracy reached 95% when the least important feature was removed. With the last 2 h average data set, the accuracy reached 97% when the features were reduced to 3.


**Table 9.** The accuracy and performance of resulting classifiers using data inputs with last 2 h average for the 4 and 3 most important features.

The acceptable level of accuracy is often user defined, depending on the nature of the subject or scenario [41]. In this case, the accuracy drops from nearly 100% to 97% and 95%, which means the error is within 5%. In statistics, a Type I error rate of 5%, that is, a 5% probability of incorrectly rejecting a true null hypothesis, is commonly considered acceptable [41].

This approach is promising for large-scale deployments. The large amount of data generated by the least important sensor(s) could be reduced by using appropriate data-mining methods to select the sensors that are most important to the chosen decision-making task.

### **6. Conclusions**

This paper addresses the open challenge of feature reduction in IoT systems for agricultural plant-monitoring and decision-making support. Our data reduction approach is based on unsupervised learning using Laplacian scores, which is especially useful when class labels are unavailable. Features are ranked using similarity and difference, so that users can select the most important features rather than the whole feature set. Given the high resolution of some features in real-world IoT applications, this helps reduce the volume of data to be transmitted. To evaluate our proposal, a real-world strawberry-plant monitoring IoT system has been implemented, calibrated and tested, measuring real-condition parameters such as temperature, relative humidity, soil moisture and light intensity. Our research has demonstrated that the proposed feature reduction can significantly reduce the volume of data that must be transferred from the LoRa Node (edge device) to the network, while keeping the IoT system functioning at high accuracy levels. Moreover, the proposed IoT system has been tested on a specific decision-making support task (to water or not to water). The experimental results show that the accuracy of decision-making on the reduced data decreases by an acceptable amount (only 3–5%). The proposed research can potentially be used for, and provide insights into, a rich range of decision-making tasks related to agricultural monitoring, relieving IoT systems of the burden of data volume.

In the future, this work can be extended to decision-making tasks other than watering a plant. For instance, if a greenhouse includes cooling fans, turning them on and off could be controlled through an IoT system, similarly to what is proposed above. Strawberry plants, like many other plants, are very sensitive to very high or very low levels of temperature or relative humidity, so this could prevent them from being destroyed. Moreover, farmers can take advantage of this decision-making support to use cooling fans more efficiently and avoid high electricity bills. This decision-making scenario is planned for future work, once a greenhouse with such cooling fans is identified.

**Author Contributions:** Conceptualization, G.T., X.D., G.F. and N.J.; Data curation, G.T. and N.J.; Formal analysis, G.T., X.D. and N.J.; Funding acquisition, G.F. and N.J.; Investigation, G.T., X.D. and N.J.; Methodology, G.T., X.D., G.F. and N.J.; Project administration, X.D., G.F. and N.J.; Resources, G.F. and N.J.; Software, G.T. and N.J.; Supervision, X.D., G.F. and N.J.; Validation, G.T., X.D., G.F. and N.J.; Visualization, G.T. and N.J.; Writing–original draft, G.T., X.D., G.F. and N.J.; Writing–review & editing, G.T., X.D., G.F. and N.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** Nanlin Jin is partly funded by Landslide Mitigation Informatics (LIMIT): Effective decision-making for complex landslide geo-hazards provided by NERC (NE/T005653/1) for this research.

**Acknowledgments:** The authors would like to thank Alun Moon and David Kendall of Northumbria University at Newcastle for their support throughout the project. The authors would also like to thank K & M Yiannoukkou Strawberry Production for the provision of their greenhouse premises for real-world plant-monitoring, data gathering and testing.

**Conflicts of Interest:** The authors declare no conflict of interest.
