5.2. Experimental Results
Reader–antenna positions within the simulated ward were recorded based upon their coordinates. The 16 arbitrarily chosen reader–antenna positions resulted in 120 combinations of the paired readers. General descriptive statistics of the data collected are presented in
Table 3. This includes the number of records collected from the positions selected for analysis, mean and standard deviation of the variables, minimum and maximum values together with quartile ranges. The range of each feature differs as the measuring unit for each one is different. All features have the same number of records with no missing values.
Table 4 presents the top 5 records of the data set features after data preparation was completed.
Prior to data preparation, the RSSI values of all reader–antenna positions in the simulated ward were plotted on a line graph with respect to a bar graph of the static tag position and are presented in
Figure 5. The alphabetic positions of the reader–antennas are denoted on the
x axis and the static tag distance in meters is on the
axis. The RSSI percentage value is denoted on the
axis, which ranges from 0 to 100. The maximum RSSI value was approximately 86% and steadily decreased to 66% as the readers were brought further apart. The closest antenna position on the bar graph is the j position and the farthest is the e position.
Data collected from the paired reader positions were analysed.
Table 5 shows the matrix formed. The maximum RSSI value gathered from the two reader–antennas is presented. Each of the 16 reader–antennas was combined with the remaining reader–antenna positions. There were no repetitions as both reader–antennas were identical in both configuration and features. Close observation of
Table 5 showed that the j antenna position had the highest RSSI value and the e position had the lowest.
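The pairing described above can be sketched in a few lines. This is only an illustration: it assumes the 16 positions are labelled a to p, and the RSSI values in `example_rssi` are hypothetical placeholders rather than measurements from Table 5.

```python
from itertools import combinations
from string import ascii_lowercase

# Assumption: the 16 reader-antenna positions are labelled 'a' to 'p'.
positions = list(ascii_lowercase[:16])

# Unordered pairs without repetition: (j, e) and (e, j) count once,
# because both reader-antennas are identical.
pairs = list(combinations(positions, 2))


def pair_rssi(rssi_by_position, pair):
    """Return the maximum RSSI of the two positions in a pair."""
    first, second = pair
    return max(rssi_by_position[first], rssi_by_position[second])


# Hypothetical RSSI percentages, for illustration only.
example_rssi = {"j": 86.0, "e": 66.0}

print(len(pairs))                            # number of paired combinations
print(pair_rssi(example_rssi, ("j", "e")))   # value stored for the (j, e) pair
```

Choosing 2 positions out of 16 without regard to order gives 16 × 15 / 2 = 120 pairs, which matches the number of combinations reported.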
Using the ordinary least squares (OLS) linear regression method, a linear relationship between independent and dependent variables was determined. The OLS results are presented in
Table 6. It shows that the two independent variables, Distance_1 and Distance_2, have negative coefficients, while the Antennas_Distance variable has a coefficient close to zero. The constant is the intercept of the linear equation. The standard error measures the uncertainty in each estimated coefficient. The standard error of the constant was high compared with those of the three variables. The t-statistic measures how statistically significant a coefficient is and was calculated by dividing the coefficient by its standard error. The constant had a high t-statistic value despite its high standard error, because its coefficient is large relative to that error.
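The relation between coefficient, standard error and t-statistic can be illustrated with a minimal sketch. The numbers below are hypothetical and are not taken from Table 6; the function name `t_statistic` is introduced here for illustration only.

```python
def t_statistic(coefficient, standard_error):
    """t-statistic of a regression coefficient: coefficient / standard error."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return coefficient / standard_error


# Hypothetical values, for illustration only.
t_small_error = t_statistic(-2.0, 0.5)   # small standard error
t_large_error = t_statistic(-2.0, 1.0)   # same coefficient, larger standard error

print(t_small_error, t_large_error)
```

For a fixed coefficient, a larger standard error gives a t-statistic of smaller magnitude, so a high t-statistic alongside a high standard error implies a coefficient that is large in absolute terms.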
The statistical significance of each variable that affects the output variable, the RSSI value, is tested in this OLS linear regression analysis. The value shown in Table 6 that defines the significance of each variable is the p value for the null hypothesis that the coefficient is equal to zero (no effect). By convention, the significance level was set to 0.05. If the p value of a variable was less than this level, the variable was considered to be statistically significant. In this study, the Distance_1 and Distance_2 variables were highly significant, as their p values were much less than the significance level, so the null hypothesis, i.e., that the distance of the static tag from each reader–antenna would not affect the RSSI value, was rejected. This showed that the Distance_1 and Distance_2 variables were significantly related to the dependent variable RSSI and, given their negative coefficients, that a decrease in these variables would increase the output variable. In contrast, the p value of the Antennas_Distance variable was greater than the significance level of 0.05, so the null hypothesis was not rejected for this variable. The r-squared value is the fraction of variation in the output variable explained by the input variables [26]. Here, the value is 0.626.
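As a reminder of what the r-squared value measures, the following sketch computes it from observed and fitted values using the standard definition (one minus the ratio of residual to total sum of squares). The data are hypothetical and unrelated to the measurements in this study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Hypothetical RSSI values and fitted values, for illustration only.
observed = [80.0, 75.0, 70.0, 65.0]
predicted = [78.0, 76.0, 71.0, 65.0]

print(r_squared(observed, predicted))
```

A value of 1 means the fitted values reproduce the observations exactly; a value of 0.626 means that roughly 63% of the variation in RSSI is accounted for by the regression.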
From Table 6, a linear Equation (7) can be formed from the coefficient of each variable and the constant value. In this equation, the output variable is the RSSI value, the input variables are Distance_1, Distance_2 and Distance_3, and the constant is the intercept.
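For orientation, a linear equation with three input variables and an intercept has the general form below. The symbol names are assumptions made here for illustration: y stands for the RSSI output, x_1, x_2 and x_3 for the three input variables, \beta_0 for the constant and \beta_1 to \beta_3 for the coefficients, whose numerical values are those reported in Table 6.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
```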
As part of the data modelling tier of this study's architecture, multiple machine learning models, such as Decision Tree, Random Forest and XGBoost, were trained and evaluated with the performance metrics discussed in Section 5.1. The Decision Tree model achieved MAE 0.01 and MSE 0.003. One Decision Tree hyperparameter was tuned over the range 1 to 10, which reduced the error rates up to a value of 8. Random Forest performed better than XGBoost, with MAE 0.16 and MSE 0.11. Two hyperparameters of the Random Forest model were tuned to enhance its prediction capability and reduce the error rates. The XGBoost model had high error rates compared with the other two machine learning models, and the k-fold cross-validation implemented for it did not seem to improve performance. Ensemble Learning was then implemented by combining the individual models with weights and taking the average of their outputs as the final result. Decision Tree had the lowest error rate on all three performance metrics, with MAE 0.01 and MSE 0.003, outperforming the other individual models and even the Ensemble Learning model. The performance metrics of the individual models and Ensemble Learning are presented in Table 7.
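The two error metrics quoted above can be computed as in the following sketch, which assumes the usual definitions of MAE and MSE. The observed and predicted values are hypothetical and serve only to show the calculation.

```python
def mean_absolute_error(observed, predicted):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def mean_squared_error(observed, predicted):
    """Average of the squared differences between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical test values and model predictions, for illustration only.
observed = [80.0, 75.0, 70.0, 66.0]
predicted = [81.0, 75.0, 68.0, 66.0]

print(mean_absolute_error(observed, predicted))
print(mean_squared_error(observed, predicted))
```

Because MSE squares each difference, it penalises a few large errors more heavily than MAE does, which is why the two metrics are usually reported together.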
Ideally, Ensemble Learning would enhance prediction accuracy compared with the individual models it combines. In this study, the weighted average method did not improve prediction accuracy or reduce error rates, at least when compared with the Decision Tree model. After tuning the model by testing weight coefficients in the range 0.1 to 1, the best performance that the Ensemble Learning model achieved was MAE 0.04 and MSE 0.006 (as shown in Table 7), using the weighted function defined by Equation (4) with one weight each for the Decision Tree, Random Forest and XGBoost models.
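The weight search described above can be sketched as a small grid search. This is an illustration under stated assumptions, not the implementation used in the study: the normalised weighted average below stands in for Equation (4), whose exact form is not reproduced here, the grid step of 0.1 is assumed, and all prediction values are hypothetical.

```python
from itertools import product


def weighted_average(predictions, weights):
    """Combine per-model predictions sample by sample using normalised weights."""
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample)) / total
        for sample in zip(*predictions)
    ]


def mean_absolute_error(observed, predicted):
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def best_weights(observed, predictions, grid):
    """Try every weight combination on the grid and keep the lowest MAE."""
    best = None
    for weights in product(grid, repeat=len(predictions)):
        mae = mean_absolute_error(observed, weighted_average(predictions, weights))
        if best is None or mae < best[0]:
            best = (mae, weights)
    return best


# Assumed grid: 0.1, 0.2, ..., 1.0.
grid = [round(0.1 * k, 1) for k in range(1, 11)]

# Hypothetical predictions from three models, for illustration only.
observed = [80.0, 75.0, 70.0]
tree = [80.0, 75.0, 70.0]      # a model that happens to be exact
forest = [82.0, 74.0, 71.0]
xgb = [78.0, 77.0, 68.0]

best_mae, chosen = best_weights(observed, [tree, forest, xgb], grid)
print(best_mae, chosen)
```

In this toy example one model is already exact, and every weight on the grid is positive, so the combined prediction always carries some error from the other two models. This mirrors the situation reported here, where the ensemble does not improve on the best individual model.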
The predicted values of the test data were compared with the original test data to evaluate each individual model's predictive performance visually. Figure 6 shows three plots, one each for the Decision Tree, Random Forest and XGBoost models. In the plots, dots refer to original data and lines refer to predicted data. If a line overlaps a dot, the model has predicted the value precisely; otherwise the model has predicted incorrectly. The fluctuations in the original and predicted data appear similar across the three model plots. The Decision Tree model was able to predict almost all data points precisely. The Random Forest plot shows that four data points were incorrectly predicted, whereas the XGBoost model predicted five data points incorrectly. This is consistent with the performance metrics shown in Table 7.
5.3. Discussion
The main contribution of this study is to the understanding of the groundwork required for designing a remote patient monitoring system using RFID sensor technology. This work has identified the considerations needed for using RFID reader–antennas to identify vital signs of hospitalised patients who may be able to move freely about the ward. The work implies that this type of approach would be best deployed in patient rooms that are designed to accommodate possibly up to four in-patients rather than in an open ward layout. This is because the range for detecting passive RFID signals using the techniques described in this study has a direct bearing on the RSSI. However, this conforms to the goal of the initial scenario for this research, which is to identify early deterioration of suicidal and self-harm behaviours in circumstances where nursing observation and supervision of patient safety are reduced because of lower staffing levels during the known high-risk periods in the evening and night shifts. The study has shown how the optimum positions for reader–antennas can be chosen for deployment in a psychiatric ward of a given hospital. This was achieved by understanding the relationship between the independent and dependent variables that contribute to detecting the maximum signal strength required for detecting vital signs and patient movement. The results of this case study also offer a better understanding of the implications for choosing a suitable machine learning algorithm to analyse signal data.
The optimum position for the first reader–antenna placed in the simulated ward used for this study produced the highest RSSI tag readability. The second reader position was determined from the next highest signal strength obtained. A benefit of this for eventual system design is that the location of at least one of the reader–antennas could logically be associated with an entrance or doorway and could indicate, from the signal strength associated with this event, whether a person has left the room. The impact of this relationship on tag readability has important practical implications for deployment in a real ward and could affect decisions on how notifications and alarms will be configured so as not to disrupt the routine clinical business of a hospital ward.
Figure 5 suggests that an increase in the distance between tag and antenna decreases the RSSI value. This inference is supported by the OLS regression results in Table 6 and the metrics in Table 5. In the OLS regression results, the Distance_1 and Distance_2 variables have negative coefficients, which indicate an inverse relation with the output variable RSSI. The metrics in Table 5 show that the j position, which is the closest reader–antenna position to the tag, has the highest RSSI value (86.98), and that the e position, which is the farthest reader–antenna position from the tag, has the lowest RSSI value. All three sets of results, presented in Figure 5, Table 5 and Table 6, confirm that closer positions have higher RSSI values. Considering the dimensions of the laboratory in this research, position j could be the first preference for a reader–antenna. With position j considered best, the choice of a pair of positions for signal reception is narrowed down to 15 combinations. The second position was decided based on the spread of reader–antenna positions in Figure 3 and each individual antenna's RSSI value. On this basis, the combination of j and e would be a good pair of positions in the laboratory at which to fix two UHF RFID reader–antennas. Other RSSI values presented in Figure 7 show that the RSSI values at positions k and c differ by 4.47 although their distances from the static tag are almost the same, and the same holds for positions b and d. This could be useful in identifying multiple tags that are associated with patients, where their initial reference could be associated with their hospital bed located in the room.
In Table 6, the p value shows the significance of each independent variable in predicting the output variable; a variable is significant when its p value is less than the significance level (0.05). Based on this, the Distance_1 and Distance_2 variables are significant and the Antennas_Distance variable is not, as its p value is far greater than the significance level (0.05). The r-squared value is 0.626, which is relatively low. However, the OLS regression model is still able to identify the relationship between the independent and dependent variables. An increase in the r-squared value would enhance the prediction accuracy for the dependent variable of the linear Equation (7) introduced previously.