1. Introduction
New data technologies have evolved in recent years due to the expansion of internet networks and the introduction of digital communication techniques. The internet of things (IoT) stack has been applied for a wide range of innovative purposes across many sectors. Experimenting with novel ways for human supervisors and robots to interact has increased human–machine communication and achieved the aim of improved collaboration [1]. The use of wirelessly connected sensors and devices has enabled more flexible monitoring of complex operations [2]. Micro- and nanosensors are just two examples of the many types and styles of sensors developed for a wide range of detection tasks; such sensors can be deployed in places inaccessible to humans. Small sensors can withstand harsh environments and capture crucial data, which can then be transmitted over a wireless sensor network (WSN) [3]. The data gathered by sensor arrays and networks are stored on supercomputers and later retrieved using mining techniques.
Wearable sensors are being actively explored and used to detect people's health, activities, and habits with increasing precision. As a result, their widespread use has the potential to improve our daily lives, much as computers, cell phones, and similar technologies already do. The primary monitoring device is responsible for extracting anxiety-related features and generating assessments from data collected by sensors placed in or worn on a person's body, clothes, home, and other settings; these sensors can gauge the degree of anxiety a person is experiencing. Once the data reach the target servers, they are mined using machine learning and deep learning algorithms.
This study aims to establish techniques for classifying human activities, with the ultimate objective of recognizing and preventing mobility problems. Detecting movement abnormalities is crucial for avoiding deadly crashes or slips, because older adults are more likely to experience unexpected mobility limitations due to health conditions. Human behavior is difficult to observe, since it is complex and occasionally surprising; as a result, it is difficult to establish a stable model that can anticipate the conduct of others. Given the tremendous volume and diversity of data available for human tracking and monitoring, predicting how people will perceive and respond to any given set of circumstances is exceedingly challenging. The quest for data-driven alternatives to present practice drives the advancement of artificial intelligence (AI) in this sector. This article describes an intelligent monitoring system that uses AI-powered applications to monitor the movement of elderly or senior citizens, pregnant women, severely injured patients, and others. Previous research has shown that such strategies are effective in treating several conditions.
Wearable sensor systems, such as smartwatches, smartphones, and other mobile computing devices, are now used in several settings, including hospitals, smart homes, libraries, and other public and private areas. Communication interference, such as shadowing, noise, and interactions with other communication systems, as well as noise originating from human organs, can delay information transfer (e.g., when heartbeats are monitored, muscular motion or other organ activity can interfere with the electrocardiogram signal). As a result, systems that recognize actions or behaviors may not function properly [4].
The second difficulty is the presence of unlabeled data. Data collectors typically acquire whatever information they need without first classifying it, which raises processing costs [5].
Wearable smart devices designed to prevent falls present a distinct set of challenges. A variety of falls can occur, and some are more likely than others depending on the presence or absence of specific physical imbalance factors (detectable, for instance, in accelerometer data). Because the data generated represent the entirety of a single class, they cannot be handled by the currently known supervised machine learning or deep learning tools and methods [6].
2. The Literature Review
Human habits and routines are used in many different industries, including economics, internet commerce, health applications, and even security systems. Many approaches have been presented in the research for gathering behavioral data and for analyzing and classifying the acquired information.
Perez-Vega et al. [7] created a deep neural network (DNN). Building on the analogy of AI systems as biological organisms and adopting a stimulus–organism–response perspective, they developed a theoretical model describing how organizations and customers might employ AI-enabled data processing technology to improve the outcomes of both solicited and unsolicited online customer interactions. The model distinguishes between firm-solicited and firm-unsolicited online consumer engagement practices, which act as stimuli for AI organisms to analyze customer-related information; this in turn motivates responses from both AI organisms and humans, shaping the settings for future online consumer engagement.
For commercial artificial intelligence applications in the banking industry, Konigstorfer et al. [8] used support vector machine (SVM), logistic regression (LR), convolutional neural network (CNN), and artificial neural network (ANN) models. The study suggests that commercial banks may employ AI to enhance automation, reduce loan losses, secure payment processing, and improve consumer targeting.
A recurrent CNN integrated with traffic management systems has been proposed to help reduce the number of collisions on roadways [9]. That study also gives insight into how crowd sensing and artificial intelligence can be used to improve emergency situational awareness and response times.
Li et al. [10] proposed implementing an ANN on a field-programmable gate array (FPGA) board and using it to assess biodiversity. Data analysis and model evaluation were both used to establish a distinct pattern. The AI and neural network algorithms were consistent with model standards that promote openness and reproducibility, and the authors indicated that incorporating biodiversity assessments into practice will yield high-quality models and reviews.
The authors of [11,12] proposed a pilot study to determine whether AI can be used in an academic setting. They proposed a method for combining artificial autonomy with humanlike perceptions of non-human agents' competence and warmth, relying on the theory of mind perceptions to support this claim. Because they applied AI, these studies enhance our theoretical knowledge of artificial autonomy in information systems research.
In [13], the performance of the multi-layer perceptron (MLP), k-nearest neighbors (KNN), growing self-organizing map (GSOM), and random forest (RF) in identifying emotional states was investigated, with the goal of constructing profiles based on emotional states. Improving AI's capabilities has the potential to improve disease modeling, protein structure prediction, therapeutic repurposing, and vaccine development.
Chimamiwa et al. [14] devised a system for recognizing household tasks using a variety of in-home sensors that capture residents' behaviors. Between 26 February and 26 August 2020, millions of sensor data samples were collected at 1 Hz. This dataset can be used to test several methods, including data-driven algorithms for routine recognition. Through long-term use of such data, AI systems can reveal a user's actions and discover changes in their habits.
In [15], the development of the Earthquake Emergency Micro Reaction Device was described, and a new device was proposed to improve existing emergency response methods in the aftermath of a devastating earthquake. The device monitors post-earthquake conditions by combining data from smartwatches worn by the general population with a geographic information system (GIS). The system locates trapped individuals and identifies important rescue areas by utilizing data generated by the smartwatches of probable victims and exposed elements. By monitoring victims' heart rates and computing their locations, it can rapidly determine the most critical rescue zone, increasing the likelihood of successful rescue missions.
In [4], SVM, KNN, RF, and the hidden Markov model (HMM) were used to detect sleep disorders. Classification results were obtained by applying various permutations of data, training, and scoring procedures to five distinct machine learning algorithms. The system's efficiency was tested in two ways: first, identification of participants by their respiratory issues had one misclassification for every seventeen individuals; second, the accuracy in recognizing abnormal respiratory episodes was 85.95%.
A study [16] demonstrated a one-dimensional CNN-based technique for detecting human activity using triaxial accelerometer data acquired from users' smartphones. The technology was capable of detecting many forms of human activity; the data collected by a smartphone's accelerometer relate to three main types of human movement and inactivity. The one-dimensional CNN-based system achieved 92.71% accuracy, higher than the standard random forest approach (89.10%).
In [17], a one-dimensional CNN model for distinguishing between sedentary and active behavior in public datasets was proposed and experimentally validated. The CNN consisted of four convolutional layers, each using the rectified linear unit (ReLU) as its activation function. The model ultimately achieved an accuracy of 95.9%.
The authors of [18] present the results of an activity identification experiment performed using a Kinect RGB and depth sensor camera. The investigators had to identify seven distinct human activities (seven classes). The feature vectors for the eight limbs used in the experiment were the joint angles obtained from the Kinect depth sensor, each with three axes. Three state-of-the-art recurrent neural network (RNN) models were employed for training and testing. A comparison of the three models revealed that the long short-term memory (LSTM) model, with a 96% success rate, had the highest accuracy in identifying human activities.
3. Methodology
The first step is to gather data from a wearable sensing device, including accelerometer and gyroscope readings. Once the data are obtained, they are preprocessed for analysis: they are normalized, labeled to indicate whether or not a person is falling, and concatenated. The labeled data are then partitioned into training and testing sets, where the training set is used to train the model. Next, a classification model is constructed and optimized to produce a binary classification (fall or no fall); the training data are used to train the model and adjust its weights so as to decrease the classification error. Once the model is trained, the testing data are used to assess its performance. Finally, the model's ability to correctly detect falls and non-falls is verified by computing metrics such as accuracy, precision, recall, and F1-score.
The remainder of this section explains the methodology in detail.
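As a rough illustration of the pipeline described above, the following sketch trains a small feed-forward classifier on synthetic accelerometer/gyroscope readings. This is not the paper's implementation: the data, label rule, and layer sizes (the 40-30-10-10 layout is borrowed from the configuration reported later in the text) are illustrative assumptions.

```python
# Minimal sketch of the fall-detection pipeline: normalize, split,
# train a feed-forward network, and evaluate accuracy.
# All data here are synthetic stand-ins for the wearable-sensor dataset.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# 6 channels: (x, y, z) accelerometer + (roll, pitch, yaw) gyroscope
X = rng.normal(size=(1000, 6))
# Binary label: 1 = fall, 0 = no fall (synthetic, linearly separable)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Normalize, then split into training and testing sets
X = MinMaxScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(40, 30, 10, 10),
                    max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
print(round(acc, 3))
```

On real sensor windows, precision, recall, and F1-score would be computed alongside accuracy, as discussed in Section 4.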
3.1. Dataset
The dataset includes four falling tasks completed by 11 subjects, with three separate trials each [11]. The 11 subjects performed four different forms of falling (falling forward using hands, falling forward using knees, falling backward, and falling sideways). Each subject wore a sensing device with accelerometer, gyroscope, and orientation sensors that tracked every movement at a sampling rate of 100 Hz. The collection contains 637,127 samples in total.
The data set is structured as follows: there are 11 subject-specific folders; each contains four task folders, one per falling type; each task folder contains three trial subfolders; and each trial subfolder contains one CSV file with the sensor data from that attempt.
Every record has:
Features:
X and Y: characteristics computed from the data. The Y value denotes each feature's window length (in seconds), while the X value denotes the time interval (in seconds) over which the features were collected.
Camera X: a zipped folder containing the photographs captured with Camera X for each experiment.
Camera X OF: a compressed folder containing Camera X's optical flow (OF) output.
Shrunk camera OF: a CSV file containing the OF from both cameras, scaled down to a 20 × 20 matrix.
CCTV OF features X and Y: a file containing the OF from both cameras, scaled down to a 20 × 20 matrix. The mean was extracted as the lone feature from these files, with X denoting the time gap in seconds over which the feature was collected and Y denoting the window length in seconds.
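For concreteness, a trial's sensor CSV might be read as below. The column names and file layout here are assumptions for illustration, not the dataset's exact schema, and an in-memory string stands in for the file on disk.

```python
# Hypothetical example of reading one trial's sensor CSV into a DataFrame.
# Column names are illustrative assumptions, not the dataset's real schema.
import io
import pandas as pd

csv_text = """time,acc_x,acc_y,acc_z,gyro_roll,gyro_pitch,gyro_yaw
0.00,0.01,-0.98,0.05,0.2,0.1,-0.3
0.01,0.02,-0.97,0.04,0.1,0.0,-0.2
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 7): two 100 Hz samples, seven columns
```

In practice, `pd.read_csv` would be pointed at each trial's file path while walking the subject/task/trial folder tree.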
3.2. Prediction Model
3.2.1. Preprocessing: Missing Data Elimination
Because motion sensors generate a large amount of data, it is vital to store them on servers. Some values went missing during data storage and retrieval, and the resulting gaps were filled with hashes or question marks. Leaving these symbols in place would decrease classification accuracy, so we scanned all of the data for such symbols (see Figure 1). When such an error is discovered, the line with the missing value (represented by symbols) is deleted from the file. Because of the massive amount of data, with lines of Cartesian coordinates generated continuously, deleting a single line results in no meaningful information loss.
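The cleanup step above can be sketched as follows; the tiny DataFrame and column names are illustrative stand-ins for the real sensor files.

```python
# Sketch of the missing-value cleanup: rows whose sensor readings were
# stored as '#' or '?' placeholders are simply dropped.
import pandas as pd

df = pd.DataFrame({
    "acc_x": [0.01, "#", 0.03],
    "acc_y": [-0.98, -0.97, "?"],
    "acc_z": [0.05, 0.04, 0.06],
})

# Keep only rows containing no placeholder symbols
clean = df[~df.isin(["#", "?"]).any(axis=1)].reset_index(drop=True)
print(len(clean))  # 1 — the two rows with placeholders are removed
```

At 100 Hz, dropping the handful of corrupted rows discards a negligible fraction of the recording.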
The detection of falls should always serve as the basis for motion classification. This effort is driven by the rising need for accurate fall prevention technology, which is especially crucial given the limitations of currently available solutions. The quality of the classifier must therefore be improved to make fall detection more reliable.
First, the data are transformed, labeled, and concatenated to fit the format required by the model and to convert them into coded form for analysis. Finally, all data are divided into training and testing sets (see Table 1).
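The labeling and concatenation step might look like the sketch below. The class abbreviations (BSC, FKL, FOL, SDL) follow the fall types used later in the paper; the per-trial label lists and the integer codes themselves are illustrative assumptions.

```python
# Illustrative sketch of labeling and concatenation: fall-class names
# from each trial are merged and encoded as integers for the model.
import numpy as np

label_map = {"BSC": 0, "FKL": 1, "FOL": 2, "SDL": 3}

# Hypothetical per-trial label lists, one list per recording
trial_1 = ["BSC", "BSC", "FKL"]
trial_2 = ["SDL", "FOL"]

# Concatenate all trials and encode the class names as integer codes
labels = np.array([label_map[c] for c in trial_1 + trial_2])
print(labels.tolist())  # [0, 0, 1, 3, 2]
```

The encoded array would then be split into training and testing partitions as described above.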
3.2.2. Auto Encoder (AEC)
An auto encoder (AEC) is a form of ANN used to learn efficient encodings of unlabeled input (unsupervised learning). The encoding is evaluated and improved by attempting to reconstruct the original input, so auto encoders can be trained to learn a representation (encoding) of a collection of inputs while filtering out irrelevant information (noise). However, one-class categorization and the handling of unlabeled data provided by wearable sensing devices remain largely beyond the capability of AECs: with a mean absolute error (MAE) of 37.3 and an accuracy of only 52.8%, the AEC method gives a weak prediction of when someone will fall.
Figure 2a,b illustrate these results. As shown in Figure 2a, the ten-fold accuracy ranges between 49% and 51.6%, with a floor of 49%. The root mean square error (RMSE) was 86.488, and the mean square error (MSE) was 7.4803 × 10³. These findings were produced by a single-stage classifier with 80% training and 20% testing data, with the AEC classifying the unlabeled data through unsupervised deep learning.
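The idea behind this one-class use of an auto encoder can be sketched with a linear auto encoder (equivalent to PCA): reconstruct only "normal" motion well and flag windows with high reconstruction error. This is a simplified, numpy-only illustration, not the paper's AEC; the data, bottleneck size, and threshold are assumptions.

```python
# Linear auto encoder sketch for one-class detection: encode to a
# low-dimensional subspace learned from normal motion, decode back,
# and flag samples whose reconstruction error exceeds a threshold.
import numpy as np

rng = np.random.default_rng(2)
# Synthetic "normal" windows lie near a 2-D subspace of the 6 channels;
# synthetic fall-like windows do not.
basis = rng.normal(size=(2, 6))
normal = rng.normal(size=(300, 2)) @ basis + rng.normal(0, 0.05, size=(300, 6))
falls = rng.normal(0, 1.0, size=(20, 6))

mean = normal.mean(axis=0)
_, _, Vt = np.linalg.svd(normal - mean, full_matrices=False)
components = Vt[:2]                      # 2-dimensional bottleneck

def recon_error(X):
    Z = (X - mean) @ components.T        # encode
    Xhat = Z @ components + mean         # decode
    return np.mean((X - Xhat) ** 2, axis=1)

# Threshold at the 99th percentile of normal reconstruction errors
threshold = np.percentile(recon_error(normal), 99)
flags = recon_error(falls) > threshold
print(flags.mean())
```

A nonlinear AEC replaces the linear encode/decode maps with neural network layers, but the detection logic (threshold on reconstruction error) is the same.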
3.2.3. Classifiers
To categorize the labeled fall data, a neural network with four feed-forward classification stages is utilized. This supervised learning approach, the 4-stage forward neural network (4SFNN), was implemented to increase the accuracy of data predictions, and each classification stage is fine-tuned using particle swarm optimization (PSO). Table 1 lists the configuration parameters of the proposed model. PSO is used to improve training quality by approximately modeling errors and thereby maximizing the performance of each stage of the classification process. This is achieved through weight/bias estimation: PSO assigns weight/bias values depending on the changing value of the training MSE observed during the investigation.
Using this approach, the information is divided into four groups based on the characteristics of the fall: BSC, FKL, FOL, and SDL (each representing a type of falling listed in Table 1). Every incoming class is therefore subjected to a four-stage categorization procedure (Figure 3 shows the underlying structure of the proposed classifier). Because of the labeling challenge, it is difficult to apply off-the-shelf classifiers, so extensive performance evaluation is performed to strengthen the reliability of the proposed method. To ensure that the findings are valid, each proposed classifier is evaluated using metrics such as the proportion of correct classifications and the MAE.
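The role PSO plays here, searching for weight values that minimize training MSE, can be sketched on a toy problem. This is a generic PSO loop, not the paper's exact optimizer; the swarm size, inertia, and acceleration coefficients are illustrative choices, and a tiny linear model stands in for a classifier stage.

```python
# Numpy-only PSO sketch: particles are candidate weight vectors, and
# each particle is scored by the training MSE it produces.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w

def mse(w):
    return np.mean((X @ w - y) ** 2)

n_particles, dim = 20, 3
pos = rng.normal(size=(n_particles, dim))      # candidate weights
vel = np.zeros((n_particles, dim))
pbest = pos.copy()                              # per-particle best
pbest_err = np.array([mse(p) for p in pos])
gbest = pbest[pbest_err.argmin()].copy()        # swarm-wide best

for _ in range(100):
    r1 = rng.random((n_particles, dim))
    r2 = rng.random((n_particles, dim))
    # Velocity update: inertia + pull toward personal and global bests
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = pos + vel
    err = np.array([mse(p) for p in pos])
    improved = err < pbest_err
    pbest[improved], pbest_err[improved] = pos[improved], err[improved]
    gbest = pbest[pbest_err.argmin()].copy()

print(round(mse(gbest), 4))
```

In the 4SFNN, each stage's weight/bias vector would play the role of `pos`, with the stage's training MSE as the fitness function.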
During training, an epoch is one full pass through the training data. Too few epochs can lead to under-fitting, where the model has not learned enough from the training set to make reliable predictions on incoming data. Too many epochs, on the other hand, may lead to overfitting, which prevents the model from generalizing well to new data because it has learned the training set too closely. It is therefore vital to monitor the model's performance on a validation dataset throughout training and to stop training when the validation loss starts to rise, which signals overfitting.
The common procedure for choosing the ideal number of epochs is early stopping, which automatically determines the optimal number of epochs based on the model's performance on a validation dataset. The dataset is divided into three sets: training, validation, and test. The model is trained on the training set for a large number of epochs, significantly more than anticipated, and after each epoch its performance on the validation set is evaluated. When the validation performance stops improving, training is halted, and the epoch with the best validation performance is used. Accuracy here refers to the percentage of correctly classified fall events out of all fall events in the dataset. During training, the model predicts the type of fall from the accelerometer and gyroscope sensor data; the predicted fall type is then compared to the actual fall type in the dataset to compute the accuracy. The quality of the classifier must be increased to improve the accuracy of fall detection. Accuracy is calculated by comparing the model's predictions to the true labels in the training or validation set; during training, these accuracy values can be used to monitor the model's performance and, if needed, to adjust the hyperparameters or architecture to improve accuracy on the validation set.
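The early-stopping rule described above can be sketched as a patience loop. The validation losses here are a made-up sequence; in practice they come from evaluating the model after each epoch.

```python
# Early stopping with patience: stop once the validation loss has not
# improved for `patience` consecutive epochs, and keep the best epoch.
val_losses = [0.9, 0.7, 0.55, 0.5, 0.48, 0.49, 0.50, 0.52, 0.55]

patience = 3
best_loss, best_epoch, wait = float("inf"), 0, 0
for epoch, loss in enumerate(val_losses):
    if loss < best_loss:
        best_loss, best_epoch, wait = loss, epoch, 0
    else:
        wait += 1
        if wait >= patience:
            break  # validation loss has risen for `patience` epochs

print(best_epoch, best_loss)  # 4 0.48
```

The weights saved at `best_epoch` would then be used for final testing, rather than the weights at the epoch where training actually halted.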
4. Results and Discussion
4.1. Nonoptimized
After preprocessing the data, identifying the missing values in the three-coordinate sensor data to compensate for them, and obtaining the relevant features by pairing the coordinate values with their corresponding times to make them accessible for classification, we partition the data into training (80%), validation (10%), and testing (10%) sets (see Table 2). The outcomes of classifying the labeled falling data using the four-stage feed-forward neural network are depicted below. The initial findings of the nonoptimized classification system show an accuracy of 97.78%, an MAE of 0.0275, an MSE of 0.031, and an RMSE of 0.176. The third-stage classifier achieved an accuracy of 98.16%, an MAE of 0.0256, an MSE of 0.0319, and an RMSE of 0.177. In the fourth stage, the classifier achieved 98% accuracy with an MAE of 0.0287, an MSE of 0.0365, and an RMSE of 0.191 (Figure 4).
4.2. Optimized
PSO improves the weights of an FFNN by repeatedly adjusting particle positions to classify features. This technique improves the FFNN's convergence rate and learning process, creating a more accurate and efficient neural network model. The FFNN-PSO has four layers with 40, 30, 10, and 10 neurons, with 70%, 20%, and 10% of the data used for training, validation, and testing, respectively. In stage 1, the proposed classifier predicts the BSC class with 98.95% accuracy, an MAE of 0.016, an MSE of 0.0168, and an RMSE of 0.129. In stage 3, the proposed classifier predicts the SDL class with 97.94% accuracy, an MAE of 0.028, an MSE of 0.0266, and an RMSE of 0.137. Figure 5 shows that the proposed classifier predicts FKL with 98.45% accuracy, an MAE of 0.018, an MSE of 0.0216, and an RMSE of 0.147. For the FOL class, the proposed classifier achieves a prediction accuracy of 98.76%, with an MAE of 0.0176, an MSE of 0.0363, and an RMSE of 0.190.
Six male and five female participants, aged 22 to 36, with a mean height of 176.09 cm and a mean weight of 77.63 kg, were asked to perform different falling activities (falling backward, falling forward, sitting in an empty chair, and falling sideways) to collect raw data from wearable (x, y, z) axis accelerometer sensors and (roll, pitch, yaw) gyroscope sensors at 100 Hz.
Consistent missing-value characteristics are selected. The dataset has four subgroups, and the frequency of each character is found in each chosen subgroup. The classification techniques and accuracy measures are then employed to progressively examine the features' predictive ability.
This study uses the dataset to evaluate three fall-prediction models. The PSO technique gave the FFNN classifier a final prediction accuracy of 98.615%, compared to 52.8% for the AEC in the first model (see Table 3). The mean absolute error rate also contrasts the AEC with the suggested methods: applying PSO across the FFNN stages decreases the MAE from 0.0262 to 0.0193, improving performance and reducing errors.
Similarly, there is an obvious increase in training and validation accuracy as the number of epochs rises, which has a significant impact on improving the system and reducing the loss. As shown in Figure 6, increasing the number of epochs to 30 leads to a significant improvement, reaching 92.3% training accuracy and 88.8% validation accuracy; the increase over the range of 60 to 80 epochs corresponds to a further large gain in both training and validation accuracy, reaching 98.1% and 95.3%, respectively, before stabilizing between 80 and 100 epochs. This shows how an appropriate choice of the number of epochs achieves the best classification performance.
With such performance percentages (see Table 4), robust human fall detection systems increase safety and wellbeing, especially for the elderly and those with medical conditions. Fall detection systems can also give caregivers and family members peace of mind, and they can reduce the number of falls that require emergency medical services and hospitalization, lowering healthcare costs.
Finally, we compare the performance of this study with that of the studies in the literature. Comparing our best method with the others, its accuracy outperforms every single method (see Table 5).
4.3. Confusion Matrix
A confusion matrix is a table frequently used to assess whether a classification model is working well. It displays the proportions of correct and incorrect predictions generated by the model in contrast to the actual results, and it is commonly depicted as a square matrix whose rows and columns correspond to the predicted and actual classes, respectively. The confusion matrix is used to evaluate a model's performance, notably its accuracy, precision, recall, and F1 score (Figure 7).
The accuracy of the model is determined by dividing the sum of true positives and true negatives by the total number of predictions. Precision is determined by dividing the number of true positives by the total number of positive predictions, whereas recall is determined by dividing the number of true positives by the total number of actual positives. The F1 score is the harmonic mean of precision and recall.
The four outcomes of a binary classification model are:
True positive (TP): The model accurately predicted the positive class.
False positive (FP): The model predicted the positive class, but the actual class was negative.
True negative (TN): The model accurately predicted the negative class.
False negative (FN): The model predicted the negative class, but the actual class was positive.
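The metric definitions above follow directly from these four counts; the sketch below works through them on illustrative numbers (not the paper's results).

```python
# Computing accuracy, precision, recall, and F1 from binary confusion
# counts. TP/FP/TN/FN are illustrative values, not the paper's results.
TP, FP, TN, FN = 90, 10, 85, 15

accuracy = (TP + TN) / (TP + FP + TN + FN)   # correct / all predictions
precision = TP / (TP + FP)                   # correct among predicted positives
recall = TP / (TP + FN)                      # correct among actual positives
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3))
# 0.875 0.9 0.857 0.878
```

For the four-class matrix in Figure 7, the same formulas apply per class by treating that class as "positive" and the rest as "negative", then averaging.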
The confusion matrix for human fall detection on the dataset is evaluated with four labels on the actual and predicted axes; processed with the FFNN-PSO algorithm, the results reach an accuracy of 98.615%. The matrix shows that falling forward using the knees has the highest number of correct predictions (16,008), while falling forward using the hands receives the lowest count (eight).
Precision and recall are important for evaluating system performance. Precision measures the system's ability to correctly identify positive results, while recall measures its ability to find all positive results. Their values depend on the numbers of true positives, false positives, and false negatives. The F1 score is derived from precision and recall and is used as a measure of a system's accuracy. Regarding the matrix results, the FFNN-PSO model achieved precise performance across activities, with a precision of 98.9%, a recall of 95.2%, and an F1 score of 97.1%; the prediction error is about 0.0224.
For BSC falls, 15,925 actual samples were correctly classified as true positives (TP), while only eight samples were incorrectly classified as false positives (FP). For FKL falls, 16,008 samples were correctly classified (TP), and 388 samples were incorrectly classified as FKL or SDL (FP). For FOL, 15,628 actual samples were correctly classified (TP), while 632 samples were incorrectly classified as BSC or FKL (FN), and 459 samples were incorrectly classified as SDL (FP). For SDL, 15,457 actual samples were correctly classified (TP), 247 samples were incorrectly classified as FKL or FOL (FN), and 347 samples were incorrectly classified as BSC (FP).