Article

Smartphone User Identification/Authentication Using Accelerometer and Gyroscope Data

by Eyhab Al-Mahadeen 1, Mansoor Alghamdi 2, Ahmad S. Tarawneh 3, Majed Abdullah Alrowaily 4, Malek Alrashidi 2, Ibrahim S. Alkhazi 5, Almoutaz Mbaidin 1,6, Anas Ali Alkasasbeh 1, Mohammad Ali Abbadi 1 and Ahmad B. Hassanat 1,*

1 Computer Science Department, Mutah University, Karak 61711, Jordan
2 Applied College, University of Tabuk, Tabuk 47512, Saudi Arabia
3 Department of Data Science and Artificial Intelligence, University of Petra, Amman 11196, Jordan
4 Department of Computer Science, College of Computer and Information Sciences, Jouf University, Sakaka 72341, Saudi Arabia
5 College of Computers & Information Technology, University of Tabuk, Tabuk 47512, Saudi Arabia
6 CiTIUS, University Santiago, 15782 Santiago de Compostela, Spain
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(13), 10456; https://doi.org/10.3390/su151310456
Submission received: 25 May 2023 / Revised: 27 June 2023 / Accepted: 28 June 2023 / Published: 3 July 2023

Abstract

With the increasing popularity of smartphones, user identification has become a critical component to ensure security and privacy. This study looked into how smartphone sensors’ data can be utilized to identify/authenticate users and gives suggestions for the best application components. A public smartphone dataset was used to train deep learning algorithms, conventional classifiers, and voting classifiers, which were then used to identify the users. Feature-selection and preprocessing techniques were investigated to improve the performance. According to the results, Recursive Feature Elimination beat the other feature-selection approaches, and Long Short-Term Memory (LSTM) had the best identification performance, as evidenced by a relatively large number of machine learning performance metrics. Even with a larger number of users, the proposed identification system performed well and outperformed existing approaches, which were primarily designed and tested on the same public smartphone dataset. In terms of user authentication, this study compared the effectiveness of accelerometer data against gyroscope data. According to the findings, the accelerometer data surpassed the gyroscope data in the authentication process. Notably, the study revealed that employing LSTM to combine the accelerometer and gyroscope data resulted in near-perfect user authentication. The insights gained from this study help to develop user identification and authentication approaches that employ smartphone accelerometer data.

1. Introduction

Smartphones have become an indispensable component of people’s lives, fulfilling a variety of functions other than communication. One notable application is in the Internet of Things (IoT), where smartphones use their data-sensing, visualization, and edge-computing capabilities to enable IoT-based solutions [1,2,3,4]. Likewise, smartphones serve as an entertainment platform, allowing users to engage in mobile games, as well as watch video and listen to audio content [5,6]. They also function as dependable Global Positioning System (GPS) devices and navigation systems [7] and include services such as document scanning and imaging, allowing users to scan Quick Response (QR) codes and copy documents with ease [8]. Moreover, smartphones are equipped with built-in sensors that facilitate health and fitness monitoring, enabling users to track their physical activity and manage their health [9]. Another useful feature of smartphones is their capacity to support electronic payments, making mobile and financial transactions more accessible [10]. Smartphones incorporate biometric authentication methods such as facial recognition [11], fingerprint identification [12,13], and palm-print identification [14,15], which are used not only for user recognition, but also for specialized applications such as identifying potential threats or terrorists [16,17].
Contemporary mobile phones are outfitted with a variety of sensors, which improve their operation and offer users a rich interactive experience. The accelerometer, gyroscope, and magnetometer are three important sensors found in modern mobile phones. The accelerometer detects changes in orientation and movement by measuring the phone’s linear acceleration in three dimensions. This sensor is in charge of counting steps, screen rotation, and recognizing gestures. The gyroscope supports the accelerometer by monitoring the rotational motion of the phone, allowing for more-exact tracking of movements and improving augmented reality apps. Finally, the magnetometer functions as a digital compass, sensing magnetic fields and assisting users in properly navigating their surroundings. Together, these sensors enable a wide range of applications and contribute to mobile phones’ general adaptability and utility [18,19,20,21].
With the growing popularity of smartphones, user identification has become an essential component of maintaining security and privacy. This research focused on the use of the accelerometer data obtained by a smartphone for user identification. The accelerometer sensor in smartphones records the device’s movements and orientation, which can be used to develop personalized profiles for various users.
Less research has been performed on identifying users from mobile phones’ accelerometer data. User authentication using accelerometer data has been studied extensively, as can be seen in the next section, but that work presumes that each mobile phone has a single unique owner. This is not always the case: occasionally, several persons will use the same mobile device. Such scenarios include, but are not limited to, use in education, sharing with friends and family, public use, and business use. For each of these scenarios, it is crucial to implement the necessary security precautions to safeguard each user’s privacy and personal information.
This paper attempted to investigate the potential applications of smartphone sensors’ data in depth, with an emphasis on user identification using accelerometer data. To begin, a variety of machine learning approaches, including deep learning, traditional classifiers, and voting classifiers, was used to learn from each user’s accelerometer data for precise identification. The key training resource for these methods was the Hand Movement, Orientation, and Grasp (HMOG) public dataset [22]. To improve the performance of machine learning, several preprocessing approaches and feature-selection strategies were investigated to clean and optimize the data.
Building on the initial scope, the objectives of this study were expanded to encompass the process of authentication. The use of gyroscope data in conjunction with accelerometer data was investigated to determine their impact on authentication Accuracy. By leveraging both sensors, this study aimed to assess their combined effectiveness and potential synergies to improve user authentication.
This paper is organized into sections to give an in-depth investigation of smartphone user identification/authentication using sensor data. The “Related Work” Section 2 establishes the context for the study by providing an overview of previous literature and research pertinent to the scope of this paper.
The “Materials and Methods” Section 3 is broken into numerous subsections. The dataset utilized in the study is explained in Section 3.1, including information on the HMOG public dataset. Section 3.2 focuses on the strategies used to preprocess the accelerometer data to ensure their quality and suitability for analysis. In Section 3.3, the feature extraction procedure is described, along with the specific features extracted from the accelerometer signals. The feature-selection strategies used to determine the most-useful features are discussed in Section 3.4. The class imbalance issues are addressed in Section 3.5, along with techniques for dealing with this obstacle. Finally, in Section 3.6, the numerous machine learning classifiers utilized in the study are introduced, with specifics on how they were used in the user identification/authentication process.
The “Results and Discussion” section (Section 4) is divided into subsections to present the findings thoroughly. The comparison of filtered and unfiltered signals is examined in Section 4.1, along with their impact on the identification process. Section 4.2 provides the outcomes of the feature selection procedure. In Section 4.3, the classifiers are compared and analyzed to determine their Accuracy in identifying users. Section 4.4 focuses on authentication specifically, demonstrating the results obtained by merging the accelerometer and gyroscope data using LSTM. The findings of the study are compared to state-of-the-art approaches in Section 4.5, indicating the advances made in user identification and authentication. Finally, in Section 4.6, the study’s scientific contributions are summarized and stressed.
The paper concludes with the “Conclusions” Section 5, which summarizes the main findings and implications. This section emphasizes the study’s significance, provides insights into future research directions, and reaffirms the paper’s impact on the field of smartphone user identification and authentication.

2. Related Work

Modern smartphones frequently contain inertial sensors such as accelerometers, gyroscopes, and magnetometers, which has motivated researchers interested in authentication and identification techniques to take into account secure, dynamic behavioral biometrics based on these sensors. This section presents a number of works that dealt with behavioral biometric keys based on the data obtained from the aforementioned sensors.
An identification framework called ActID was presented by Sudhakar and his coworkers to identify a user based on hand movements made while the user walks for 60 s on a flat surface. Activity sensors consisting of an accelerometer and a gyroscope were placed on the wrist to collect data from 30 subjects over the course of two sessions. The sensors’ readings were subsequently sent through Bluetooth to a mobile phone [23]. The transmitted signals were then filtered and resampled using a linear interpolation method to replace missing data with the nearest value. Next, they extracted physical features such as the peak value, as well as statistical features such as the mean, median, and variance. Then, they used Optimal Feature Evaluation and Selection (OFES) [24] to choose features of high quality and Correlation-based Feature Subset Selection (CFSS) [25] to choose the features most closely related to the labeled class. Various standard classifiers, including K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF), were then applied to a subset of features, and their performance was compared with that of a sliding window voting classifier based on SVM, which achieved 100% Accuracy, outperforming the 97.98% Accuracy of the standard SVM classifier. However, the research was hindered by the fact that the available data did not encompass all user behaviors. Additionally, both the training and testing data were obtained from identical users, in identical circumstances, and over the same duration, which limited the study’s scope.
Cherifi and her coworkers [26] proposed an effective continuous authentication technique based on users’ prehensile movements and modeled these movements through the Hidden Markov Model-Universal Background Model (HMM-UBM) with continuous observations based on the Gaussian Mixture Model (GMM). Two datasets were used: the first was the HMOG public dataset, and the second was a proprietary database. The dataset was then filtered using the obfuscation method, which multiplied the original signal of each of the three employed sensors by an offset value and added new noise to the original signals. The sensor readings for each axis were then normalized by dividing their values by their magnitude, and the magnitude was added as a fourth dimension. Multiple tests were conducted to determine the best window size, and a fixed sliding window was ultimately selected. The mean and standard deviation for each sensor axis and magnitude were the extracted features, resulting in a vector of twenty-four features. The proposed method achieved a 19.2% Equal Error Rate (EER) on the HMOG dataset and a 14.8% EER on their second dataset. This study neglected to account for the class imbalance of the HMOG dataset and lacked other performance metrics such as the Area Under the Curve (AUC), Specificity, etc.
Sitova and coworkers [27] introduced a Biometric-Key-Generation (BKG) function on HMOG paired with taps and keystrokes, using data captured by an accelerometer, gyroscope, and magnetometer from 100 users while they were seated and moving about. They extracted two types of features. The first type is known as grasp resistance, and it measures how quickly the phone moves and changes orientation in reaction to a tap gesture. The second type, known as grasp stability, measures how fast tap forces cause changes in movement and orientation. The extracted features for both types included sensor readings on the x, y, and z axes, the magnitude, the start and end times of the taps, the stability after the tap ends, and the average sensor reading in 100 ms. Using combined accelerometer and gyroscope data, they were able to obtain a 13.62% EER on HMOG features. The EER dropped further to 7.16% when they integrated HMOG, taps, and keystrokes. However, their model was evaluated using only one metric (EER); since HMOG is a class-imbalanced dataset, other metrics, such as the Precision and Recall of the minority class, should be reported as well.
A study by Yoneda and coworkers [28] provided a thorough investigation of mobile biometrics by evaluating nine sensors and 18 physical activities. The data were collected at 20 Hz by the accelerometer and gyroscope incorporated into smartwatches and smartphones. Each of the 51 users in the study’s sample spent about three minutes performing each of the 18 activities. Every activity was split into eighteen non-overlapping instances with a fixed window size of ten seconds. For each instance and for both sensors, the statistical features of the Average, Standard Deviation, Average Absolute Difference, Time between Peaks, Average Resultant Acceleration, and Binned Distribution were retrieved, together with the Activity label and Subject ID. The KNN, Decision Tree, and Random Forest were investigated. The accelerometer sensor combinations for both the watches and the phones, together with all other sensor combinations utilizing Random Forest with voting, yielded the lowest authentication EER of 9.3%. However, the authentication Accuracy increased to 99.7% by combining Random Forest with voting based only on the accelerometers of both the smartphones and smartwatches used.
An authentication approach based on a fingertip sensor device that collects motion data such as angular velocity and acceleration, as well as physiological data such as a Photoplethysmography (PPG) signal, was presented by Wu and coworkers [29]. The compact and portable fingertip device had two sensor chips: the Flora chip detects movement, and the Pulse Sensor Amped chip checks blood oxygen levels. As soon as it is worn, it starts capturing acceleration, angular velocity, and PPG signals. These data are then transmitted through Bluetooth to a computer. The data were collected from 40 subjects, who agreed to wear the fingertip device while performing three actions (slow walking, sitting, and performing rather hard movements) for 20 repeats (each repetition lasting 12 s) over the course of 30 days. They used a fourth-order Chebyshev low-pass filter with a cutoff frequency of 5 Hz to decrease the noise in the data. Then, a peak detection method was used to extract the periodicity of PPG signals, and a Fixed-size Sliding Window (FSW) ranging from 2 s to 12 s with 20% overlap was applied to extract the statistical features, which were divided into three categories: time-domain, frequency-domain, and wavelet-domain features. Then, they used the ReliefF algorithm [30] to determine the importance of the features, keeping the 90% of them that were most crucial for authentication. For activity recognition, the Decision Tree performed worst, with an Overall Accuracy of 94.3%, while the KNN achieved the best performance with 100% Accuracy. With a False Acceptance Rate (FAR) of 4.69%, a False Rejection Rate (FRR) of 4.95%, and 98.74% Accuracy, the walking state learned by the Support Vector Machine (SVM) provided the best authentication performance.
The works of [31,32,33,34,35,36,37,38,39,40,41,42,43,44] are other examples of research that investigated the use of smartphone accelerometer data for biometrics.
The bulk of previous research, as can be seen, only used one evaluation metric, leaving out crucial metrics such as the Area Under the Curve (AUC), Index of Balanced Accuracy (IBA) [45], Sensitivity, Specificity, Geometric Mean, and F-Score. Furthermore, the majority of these studies relied on person authentication rather than identification, and when they addressed the issue of class imbalance, they employed the oversampling approach, which creates new examples based on their resemblance to the minority examples in question and assumes that they belong to the same class, even though this is not always the case according to [46]. Table 1 lists some of the studies with information about their experiments’ setup.
This paper focused on these drawbacks and worked on person identification using a variety of widely used evaluation metrics, including the AUC, IBA, GEO, and F-Score. Additionally, it looked into the use of Extreme Gradient Boosting (XGBoost) [48], which has been recommended for the class imbalance problem [49], as well as other approaches, such as [50].

3. Materials and Methods

Data preparation, feature extraction, feature selection, and training/testing using a machine learning classifier were the four main components of the proposed identification system, as illustrated in Figure 1.
The proposed system leverages data obtained from the mobile accelerometer sensor, which is made up of a set of slender structures and is used for motion detection. These structures produce readings, which are then transmitted to the main circuit. Tri-axial accelerometers are sensors that approximate acceleration in the x, y, and z axes numerically. This information can be used to calculate velocity and displacement [51].

3.1. Dataset Description

Hand Movement, Orientation, and Grasp (HMOG) [22,27] is a dataset of features captured from human interaction with smartphones such as holds, taps, and grasps. Accelerometer, gyroscope, and magnetometer sensors were used to capture the subtle hand movements and orientation while users interacted with their smartphones. The HMOG dataset was captured from 100 subjects (53 male and 47 female) over the course of eight sessions, with four sessions involving the participants writing a text of at least 250 characters while they were walking and the remaining four with the volunteers writing a text while sitting.
The data were collected using a Samsung Galaxy S4 Android smartphone for all volunteers to record two types of readings:
  • Accelerometer, gyroscope, and magnetometer readings with a sampling rate of 100 Hz.
  • Touch screen data such as touch gestures and key press.
This study utilized the data obtained from the accelerometer sensor only to identify users. Figure 1 shows a sample of the accelerometer sensor reading in one second, and Table 2 summarizes the dataset characteristics.

3.2. Preprocessing

The goal of this step was to clean and prepare the sensor’s raw data so that the feature extraction process could make efficient use of them. The preprocessing steps included calculating the magnitude of the three axes (x, y, and z) for each signal, bandpass filtering the data, and utilizing an overlapping sliding window to extract the signal’s internal features. Working with the single magnitude of the x, y, and z vectors of the accelerometer sensor, computed using Equation (1), is easier and more efficient than processing each axis separately.
$$M = \sqrt{x^{2} + y^{2} + z^{2}}, \qquad (1)$$
where M is the magnitude and x, y, and z are the acceleration at a specific timestamp on the three axes of the accelerometer signal.
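As a minimal sketch of Equation (1), the magnitude of each tri-axial sample could be computed as follows. The function names and the sample readings are illustrative assumptions and are not taken from the study.

```python
import math

def magnitude(x, y, z):
    """Euclidean magnitude of one tri-axial accelerometer sample, as in Equation (1)."""
    return math.sqrt(x * x + y * y + z * z)

def magnitude_signal(samples):
    """Convert a sequence of (x, y, z) samples into a 1-D magnitude signal."""
    return [magnitude(x, y, z) for x, y, z in samples]

# Hypothetical readings, one (x, y, z) tuple per timestamp
readings = [(3.0, 4.0, 0.0), (0.0, 0.0, 9.81), (1.0, 2.0, 2.0)]
signal = magnitude_signal(readings)
```

The resulting one-dimensional signal is what the later windowing and feature-extraction steps operate on.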
A bandpass filter is usually recommended to reduce noise in a signal [52], although its benefit largely depends on the amount of noise present in the HMOG dataset. This study tested two scenarios. The first applied a bandpass filter with cutoff frequencies of 3 and 8 Hz, which corresponds to the accepted range of human hand tremors [53]. The second omitted this step, on the assumption that the data were noise-free.
Utilizing the overlapping sliding window technique for preprocessing low-dimensional time series data results in reduced latency, improved Accuracy, and lower processing costs, particularly when employing a smaller window size [54]. In order to improve the Accuracy, reduce the feature extraction time, and give more comprehensive user data so the model can learn from them, each user signal was split into small segments in this study using the overlapping sliding window technique. The window size was fixed at 10 s (10 s × 100 samples per second = 1000 samples), as illustrated in Figure 2.
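The segmentation step can be sketched as below. The window size of 1000 samples follows the text (10 s at 100 Hz); the step of 500 samples (50% overlap) is an assumption, since the amount of overlap is not stated here, and the function name is hypothetical.

```python
def sliding_windows(signal, window_size=1000, step=500):
    """Split a 1-D signal into fixed-size windows.

    A step smaller than window_size makes consecutive windows overlap.
    Trailing samples that do not fill a whole window are dropped.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(signal) - window_size
    return [signal[start:start + window_size]
            for start in range(0, last_start + 1, step)]

# Hypothetical 25 s recording at 100 Hz (2500 samples)
recording = list(range(2500))
windows = sliding_windows(recording)
```

Each window then yields one feature vector, so a shorter step produces more training examples per user from the same recording.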

3.3. Feature Extraction

One of the most-important processes in building a trained model is feature extraction since choosing the best features enables the model to train more smoothly and accurately differentiate between numerous classes. The Fast Fourier Transform (FFT) was used to transform the raw magnitude signal from the time domain to the frequency domain for each sliding window. The statistical features, which were extracted from the resultant FFT signal and the raw magnitude signal, are listed in Table 3 and Table 4.
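The following sketch illustrates the idea of extracting statistics from both the time-domain window and its frequency-domain counterpart. It uses a naive discrete Fourier transform from the standard library in place of an FFT routine, and the four statistics (mean, standard deviation, minimum, maximum) are assumed examples; the actual feature set is the one listed in Table 3 and Table 4.

```python
import cmath
import math
import statistics

def dft_magnitudes(window):
    """Amplitude spectrum of a real signal (non-negative frequency bins only).

    Naive O(n^2) DFT, used here only to keep the sketch dependency-free.
    """
    n = len(window)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(window)))
            for k in range(n // 2 + 1)]

def summarize(values, prefix):
    """A few illustrative statistics; not the full feature list of the study."""
    return {
        f"{prefix}_mean": statistics.fmean(values),
        f"{prefix}_std": statistics.pstdev(values),
        f"{prefix}_min": min(values),
        f"{prefix}_max": max(values),
    }

def extract_features(window):
    """Concatenate time-domain and frequency-domain statistics for one window."""
    features = summarize(window, "time")
    features.update(summarize(dft_magnitudes(window), "freq"))
    return features

# Hypothetical 1 s window at 100 Hz: gravity offset plus a 5 Hz oscillation
window = [9.81 + math.sin(2 * math.pi * 5 * t / 100) for t in range(100)]
features = extract_features(window)
spectrum = dft_magnitudes(window)
```

With a 1 s window sampled at 100 Hz, bin k of the spectrum corresponds to k Hz, so the 5 Hz component appears as the largest non-DC bin.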

3.4. Feature Selection

A subset of features that were relevant from a vast number of features was chosen via a feature-selection technique. The primary goal of this step was to improve the machine learning model’s assessment metrics by deleting unnecessary or redundant data. The experiments were conducted with a number of feature-selection methods to assess and determine the most-prevailing features. These methods included the Correlation Coefficient, Fisher Score, Information Gain, and Recursive Feature Elimination (RFE). The chosen subsets of features from different techniques were tested using different classifiers. The outcomes of these experiments will be discussed in the next section.
The features that had a Correlation Coefficient greater than 0, as shown in Figure 3, were chosen for testing with various classifiers for identifying users. The outcomes of these tests were then compared to the results obtained from other feature-selection methods and with the results obtained from all features under identical testing and classifier settings.
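A correlation-based filter of this kind might look like the sketch below. It assumes that the Pearson correlation is computed between each feature column and a numerically encoded class label, which the text does not spell out; the feature names, data, and function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def select_by_correlation(columns, target, threshold=0.0):
    """Keep the names of features whose correlation with the target exceeds the threshold."""
    return [name for name, values in columns.items()
            if pearson(values, target) > threshold]

# Hypothetical feature columns and encoded labels
columns = {
    "f_up": [1.0, 2.0, 3.0, 4.0],
    "f_down": [4.0, 3.0, 2.0, 1.0],
    "f_flat": [5.0, 5.0, 5.0, 5.0],
}
target = [0, 0, 1, 1]
selected = select_by_correlation(columns, target)
```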
The Fisher Score can also be utilized to determine the importance of features. A different subset of features of top importance (10, 20, and 30 features) was chosen and subjected to the same classifiers for identifying users. The results were compared to those produced by classifiers that utilized all features to determine which approach yielded better results. Figure 4 shows the features with the Fisher Score ranks.
The Information Gain was also investigated as a feature-selection method in this study. Specifically, two thresholds were tested: features with an Information Gain value greater than 0.1 were selected in one experiment, and those with a value greater than 0.15 in another. The Information Gain of each feature is shown in Figure 5.
The ranking of features based on their importance is shown in Figure 6, as determined by the RFE, which eliminates the least-important features. Logistic Regression (LR) and Random Forest (RF) were the two models used to determine the importance of the features. The effectiveness of this method was assessed by multiple classifiers, both with 10 and 20 features removed.

3.5. Class Imbalance

As can be seen from Table 2, HMOG is a class-imbalanced dataset. Usually, researchers opt for oversampling techniques that synthesize new examples for the minority classes/subjects (those having fewer accelerometer signals) based on their similarity to the other examples belonging to the same class/subject. However, a recent study [46] showed that this assumption might be violated and recommended avoiding such an approach in favor of other resampling approaches such as the ensemble approach [50,55].
This work investigated the use of XGBoost [48] because it provides the ability to adjust the training process to focus more on the misclassification of the minority class in class-imbalanced datasets [49,56].
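The text does not state how the training process was adjusted. One generic way to make a learner pay more attention to minority classes is to weight each example inversely to its class frequency, and the sketch below computes such "balanced" weights. The weighting scheme, the function name, and the class counts are assumptions for illustration, not the study's settings.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count)
            for cls, count in counts.items()}

# Hypothetical imbalanced label list: 90 windows of one user, 10 of another
labels = ["user_a"] * 90 + ["user_b"] * 10
weights = balanced_class_weights(labels)

# Per-example weights that could be passed to a learner accepting sample weights
sample_weights = [weights[label] for label in labels]
```

With these weights, every class contributes the same total weight to the training loss, so errors on the smaller class are no longer dominated by the larger one.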

3.6. Machine Learning Classifiers

In order to determine the best option for the proposed identification system, a variety of traditional classifiers, voting classifiers, and deep learning algorithms were investigated. The traditional classifiers include Artificial Neural Networks (ANNs), Decision Trees (DTs), Random Forests (RFs), SVMs, KNNs, Logistic Regression (LR), Stochastic Gradient Descent (SGD), and Gaussian Naive Bayes (GNB).
The Voting Classifier (VC) is an ensemble learning method that selects the class with the most votes among various classifiers as the prediction of the ensemble model. In this study, the voting classifier selected the most-voted class among the top three traditional classifiers, which are the ANN, SVM, and RF.
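The majority-vote rule described above can be sketched as follows. The per-classifier predictions are hypothetical stand-ins for the outputs of the ANN, SVM, and RF, and the text does not specify how ties are handled, so no particular tie rule should be inferred from this sketch.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most-voted label per sample.

    predictions is a list with one label sequence per classifier;
    all sequences must cover the same samples in the same order.
    """
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions)]

# Hypothetical predictions for three samples
ann_pred = ["u1", "u2", "u3"]
svm_pred = ["u1", "u3", "u3"]
rf_pred = ["u2", "u2", "u1"]

ensemble_pred = majority_vote([ann_pred, svm_pred, rf_pred])
```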
The second voting classifier was the well-known XGBoost, a machine learning method built on the widely used gradient boosting framework. Gradient boosting combines a number of Decision Trees, which are weak learners, to produce a strong learner that can predict outcomes accurately. XGBoost adds new models to the ensemble in successive iterations, with each new model fitted to correct the mistakes made by the prior models; this is vital for handling the misclassifications of the minority examples.
The Long Short-Term Memory (LSTM) network, a type of Recurrent Neural Network (RNN) that can process both single data points and entire sequences of data, was used as the deep learning classifier.

4. Results and Discussion

A set of experiments was conducted to look into various concerns in order to evaluate the proposed user-identification system, including the following experiments:
  • Comparisons of identification results on filtered and unfiltered signals to recommend the best alternative for the proposed system.
  • Feature-selection comparisons to identify the most-effective attributes that can boost the identification metrics.
  • Comparing classifier performance to determine the most-suitable classifier and alternatives for the proposed system.

4.1. Filtered vs. Unfiltered Signals

In this set of experiments, 10 randomly chosen users were identified from their accelerometer signals using 8 conventional classifiers and 2 voting classifiers, with and without bandpass filtering. Figure 7 shows the highest Accuracy achieved by each classifier. Filtering the HMOG data significantly reduced the performance of the identification process, which suggests that the data were not noisy and did not require filtering: filtering removed some crucial information from the original accelerometer signal along with the noise. This is also supported by studies such as [57] and by Figure 8, which depicts the AUC of identifying 10 users using the RF classifier.
As demonstrated in Table 5, unfiltered data were better for identification, not only in terms of identification Accuracy, but also in terms of other crucial metrics such as the Precision (Pre), Recall (Rec), Specificity (Spe), F-Score (F1), Geometric Mean (GEO), and Index of Balanced Accuracy (IBA).

4.2. Feature-Selection Results

ANN, RF, SVM, VC, and XGBoost, which achieved the most-favorable outcomes in the previous set of experiments, were used to evaluate the impact of several standard feature-selection methods: the Correlation Coefficient, Information Gain, Fisher Score, and RFE. The identification Accuracy of several scenarios, where various subsets of features were picked using various selection approaches, is shown in Table 6. In general, the identification Accuracy reported by all classifiers examined was highest when no feature selection was applied. However, with the RFE, all classifiers achieved Accuracy comparable to that obtained with all features, especially when using subsets of 20 or 30 features. Evidently, the Accuracy increased as the number of features increased, demonstrating that the statistically extracted features contributed well to class distinction.

4.3. Classifiers’ Comparisons

A reasonably large number of metrics was used here in order to determine the best-possible classifier for the proposed identification system, as each metric has its own strengths and weaknesses. For example, the measures that are most-frequently used to evaluate the performance of classification algorithms are Accuracy, Precision, Sensitivity, and Specificity [58]. These metrics, however, are not enough for classifier evaluation in class-imbalanced learning and are susceptible to the data distribution [59]. The GEO and F1, on the other hand, do not take the true negatives and classes’ contributions to overall performance into consideration [45]. However, because they are less influenced by the majority classes in the imbalanced data, they are beneficial evaluation measures for the performance of different machine learning methods [60]. The IBA metric combines a measure of how dominating the class with the highest individual Accuracy rate is with an unbiased index of its Overall Accuracy [45]. Previous studies indicated that all performance metrics, with the exception of the AUC, were weakened by skewed data distributions [61,62,63]. For these reasons, the AUC was selected in addition to the aforementioned metrics to gain a better insight into the performance of the proposed identification system using a specific classifier.
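To make the relationship between these metrics concrete, the sketch below derives them for a single class from its confusion-matrix counts. It assumes one common set of definitions: GEO as the square root of Sensitivity times Specificity, and IBA as (1 + alpha × (Sensitivity − Specificity)) × Sensitivity × Specificity. The value alpha = 0.1 and the counts are assumptions; the text does not state which variants or settings were used.

```python
import math

def class_metrics(tp, fp, fn, tn, alpha=0.1):
    """Per-class metrics from confusion-matrix counts (one class vs. the rest)."""
    recall = tp / (tp + fn)            # Sensitivity
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    geo = math.sqrt(recall * specificity)
    dominance = recall - specificity   # how much one rate dominates the other
    iba = (1 + alpha * dominance) * recall * specificity
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "geo": geo,
        "iba": iba,
    }

# Hypothetical counts for one user out of 1000 test windows
metrics = class_metrics(tp=80, fp=10, fn=20, tn=890)
```

In a multi-class setting, such per-class values are typically computed one class against the rest and then averaged over the classes.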
The highlighted values in Table 7 show the best results for each metric obtained when all classifiers were applied to identify 10 users based on their accelerometer signals from the original HMOG dataset without filtering. The LSTM deep learning model had the best performance, scoring an 89% Overall Accuracy, 89% Precision, 89% Recall, 99% Specificity, 88% F1-Score, 93% GEO, 86% IBA, and 99% AUC. The XGBoost classifier came in second, scoring an 88% Overall Accuracy, 88% Precision, 88% Recall, 99% Specificity, 88% F1-Score, 93% GEO, 86% IBA, and 99% AUC.
Given that HMOG is a class-imbalanced dataset, XGBoost’s successful performance may be explained by its capacity to modify the training process to place a greater emphasis on the misclassification of the minority class in such datasets [49,56]. Based on the results shown in Table 7, apparently, LSTM was the best choice for the proposed identification system, because it effectively addresses the vanishing gradient and long-term dependency issues and concentrates on the pertinent time series data [64,65]. Figure 9 shows the Accuracy and Loss curves during the training of the LSTM model on 10 users.
It is important to note that all machine learning methods tested in all experiments used a five-fold cross-validation approach with their own default parameters.
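The five-fold protocol can be sketched as follows in plain Python. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `cross_validate` are hypothetical, and the shuffling and fixed seed are choices made for the example, since the text does not say whether the folds were shuffled or stratified.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # assumed: shuffle once
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train_idx, test_idx


def cross_validate(fit, score, X, y, k=5, seed=0):
    """Average the score of a freshly fitted model over the k held-out folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model,
                            [X[i] for i in test_idx],
                            [y[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Toy usage: a "model" that always predicts the majority training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def accuracy(model, X_test, y_test):
    return sum(label == model for label in y_test) / len(y_test)


mean_accuracy = cross_validate(fit_majority, accuracy,
                               list(range(10)), [0] * 8 + [1] * 2)
```

Each sample is held out exactly once, so the averaged score reflects performance on data the model did not see during fitting.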
Similar results were obtained when the number of users was increased to 20, 30, and 40. Figure 10 shows that the AUC maintained its value despite the increase in the number of classes. The F1-Score and GEO were also less affected by the number of classes than, for example, the Accuracy, since they were computed for each class and then averaged over all classes. This is shown in Figure 11 and Figure 12, which illustrate the performance of the two top classifiers, XGBoost and LSTM, respectively, using four metrics (Accuracy, F1-Score, GEO, and AUC) for identifying varying numbers of users (10, 20, 30, 40, and 50).
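The per-class averaging described above can be sketched in plain Python. The one-vs-rest counting and the function names are assumptions made for illustration; only the "compute per class, then average" step is taken from the text.

```python
import math


def one_vs_rest_counts(y_true, y_pred, label):
    """Confusion counts when `label` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    return tp, fp, tn, fn


def macro_f1_and_geo(y_true, y_pred):
    """Compute F1 and GEO for every class, then average over the classes."""
    f1_scores, geo_scores = [], []
    for label in sorted(set(y_true)):
        tp, fp, tn, fn = one_vs_rest_counts(y_true, y_pred, label)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        geo_scores.append(math.sqrt(recall * specificity))
    n_classes = len(f1_scores)
    return sum(f1_scores) / n_classes, sum(geo_scores) / n_classes


# Three hypothetical users; one sample of user 2 is mistaken for user 0.
macro_f1, macro_geo = macro_f1_and_geo([0, 0, 1, 1, 2, 2],
                                       [0, 0, 1, 1, 2, 0])
```

Because every class contributes one equally weighted term to the average, adding more users changes the number of terms rather than letting the largest classes dominate the score.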

4.4. Authentication

The procedures of identification and authentication are closely linked, since they both entail recognizing and verifying the legitimacy of users. However, identification is more complex than authentication because its feature space includes features from numerous subjects, whereas authentication is limited to a single subject; that is, learning binary-class data is easier than learning multi-class data [66].
To supplement this investigation, a number of experiments on authenticating a mobile phone user using accelerometer and/or gyroscope data were conducted. For this purpose, two user profiles were generated using data from 10 randomly selected users. The first profile, called User1, was made up of data from a single user chosen at random, who represented the positive class with 3027 examples, while the other nine users represented the imposters (the negative class) with 23,340 examples. The second profile, User2, was built in the same way: another randomly chosen user represented the positive class with 2559 samples, and nine other users made up the negative (imposter) class.
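A minimal sketch of how such a one-vs-imposters profile could be labeled is given below. The user identifiers and helper names are hypothetical; only the User1 class sizes (3027 positive and 23,340 negative examples) come from the text. The negative-to-positive ratio is also printed in the summary because it is the quantity commonly passed to XGBoost's `scale_pos_weight` parameter, although the experiments here are reported with default parameters.

```python
def make_profile(user_ids, genuine_user):
    """Label each sample 1 for the genuine user and 0 for any imposter."""
    return [1 if user == genuine_user else 0 for user in user_ids]


def class_summary(labels):
    """Summarize the class balance of a binary authentication profile."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return {
        "positives": positives,
        "negatives": negatives,
        "positive_share": positives / len(labels),
        # Accuracy of a trivial model that always predicts the larger class.
        "majority_baseline_accuracy": max(positives, negatives) / len(labels),
        "neg_to_pos_ratio": negatives / positives,
    }


# Hypothetical sample owners reproducing the User1 class sizes from the text.
user_ids = ["genuine"] * 3027 + [f"imposter{i % 9}" for i in range(23340)]
user1_labels = make_profile(user_ids, genuine_user="genuine")
user1_summary = class_summary(user1_labels)
```

With these counts, roughly 11.5% of the User1 samples are positive, so a model that always rejects the user would already reach an Accuracy of about 88.5%. This is one reason the imbalance-aware metrics are reported alongside Accuracy.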
The authentication experiments were conducted on data obtained from three sources: the accelerometer, gyroscope, and a combination of both. We omitted the data acquired from the magnetometer because magnetometers are impacted by environmental magnetic variation and other factors [67].
The same features described earlier were employed, but this time only LSTM and XGBoost were applied, because they were the highest performers in the previous identification results. Table 8 and Table 9 present the five-fold cross-validation results of both classifiers for smartphone user authentication.
As shown in Table 8 and Table 9, LSTM performed better than XGBoost on the binary-class problems across a range of data sources and metrics. This can be attributed to LSTM's capacity to learn long-term correlations in sequences, especially in time series data, and its capacity to mitigate the vanishing gradient problem [68].
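The mechanism behind this property can be illustrated with a single-unit LSTM cell written in plain Python. This is a didactic sketch of the standard LSTM update, not the network trained in the study: the weights are hand-picked for the example, and the names `lstm_step` and `run_sequence` are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One time step of a single-unit LSTM cell with scalar input x."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g   # additive cell-state update
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c


def run_sequence(inputs, w, h0=0.0, c0=0.0):
    """Feed a sequence through the cell and return the final (h, c)."""
    h, c = h0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, w)
    return h, c


# Hand-picked weights: forget gate close to 1, input gate close to 0.
remember = {
    "wf": 0.0, "uf": 0.0, "bf": 10.0,
    "wi": 0.0, "ui": 0.0, "bi": -10.0,
    "wo": 0.0, "uo": 0.0, "bo": 0.0,
    "wg": 1.0, "ug": 0.0, "bg": 0.0,
}
_, c_final = run_sequence([0.5] * 100, remember, c0=1.0)
```

With the forget gate held near 1, the cell state that starts at 1.0 is still close to 1.0 after 100 steps. Because the state is carried forward by a gated addition rather than by repeated squashing, information (and its gradient) can survive over long sequences.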
The confusion matrices in Figure 13 demonstrate the importance of combining gyroscope and accelerometer data: when the two sensors were used in conjunction, nearly perfect outcomes were obtained, particularly with LSTM. When used alone, however, the gyroscope data did not perform as well as the accelerometer data on different metrics and in different settings. This observation supports the use of accelerometer data alone for identification in the earlier sections.
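One simple way to combine the two sensors is feature-level fusion, sketched below in plain Python. This is an assumption made for illustration, since the text does not state how the accelerometer and gyroscope data were merged. The four statistics are placeholders for the full feature set listed in Table 3 and Table 4, and the function names are hypothetical; the use of the signal magnitude follows the description given in this paper.

```python
import math
import statistics


def magnitude(samples):
    """Magnitude of each (x, y, z) sample in a window."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]


def window_features(samples):
    """A few placeholder statistics of the magnitude signal of one window."""
    mag = magnitude(samples)
    return [statistics.mean(mag), statistics.pstdev(mag), min(mag), max(mag)]


def fused_features(accel_window, gyro_window):
    """Assumed fusion: concatenate the per-sensor feature vectors."""
    return window_features(accel_window) + window_features(gyro_window)


# Toy windows of two samples each (real windows would be much longer).
accel = [(3.0, 4.0, 0.0), (0.0, 0.0, 5.0)]
gyro = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
features = fused_features(accel, gyro)
```

The fused vector simply has twice as many entries as a single-sensor vector, so the same classifier code can be reused for the accelerometer-only, gyroscope-only, and combined experiments.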

4.5. Comparison to State-of-the-Art Methods

Table 10 compares the proposed system with some of the state-of-the-art methods that used the HMOG dataset. The table lists the sensors used, the datasets, the approach, and the evaluation metrics. As mentioned in the Related Work Section, none of the earlier studies evaluated their models with the IBA, AUC, or Specificity, which are important for assessing model performance.
As shown in Table 10, the proposed identification system, which uses LSTM on the aforementioned statistical features extracted from the smartphone accelerometer signal, outperformed existing techniques. Those techniques were primarily developed for the authentication of a user (a binary classification problem), whereas the proposed system was created for the identification of multiple users (a multi-class classification problem), which is a more challenging problem in machine learning. This is supported by García-Pedrajas and Ortiz-Boyer [70], who argued that many classifiers either perform better with binary classification problems or are created expressly for them, which is a problem in the general case of classification. It is also evident from the authentication results of this study, where near-perfect user authentication was achieved, particularly when combining the gyroscope and accelerometer data using LSTM.

4.6. Research Contributions

The novelty and contributions of this work can be summarized as follows:
  • Investigation of utility: The article investigated the potential applications and usability of accelerometer data, with a focus on user identification and authentication. The study intended to provide insights into the practical application of accelerometer data by using machine learning techniques and employing the preprocessing and feature selection approaches.
  • User identification: To reliably identify users based on accelerometer data, the article employed a variety of machine learning algorithms, including deep learning algorithms, standard classifiers, and voting classifiers. For training and analysis, the study made use of the HMOG public dataset.
  • Data preprocessing and feature selection: The research analyzed several preprocessing approaches and feature-selection strategies to increase the performance of the machine learning algorithms. These techniques aim to clean and enhance the accelerometer data in order to improve the Accuracy of user identification.
  • User authentication: The study extended the initial scope of user identification to cover the authentication process. It investigated the use of gyroscope data in conjunction with accelerometer data to examine their combined impact on authentication Accuracy.
  • Sensor comparison and analysis: The study examined the efficacy of accelerometer data alone for authentication and evaluated the impact of gyroscope data on the authentication process. This comparative analysis delved into the individual and combined contributions of these sensors, demonstrating that accelerometer data are superior to gyroscope data for authentication and that combining both is the best option, omitting the magnetometer’s data, as they are impacted by the environment.
Overall, the research contributes to a better understanding of the utility of smartphone sensors’ data, particularly in user identification and authentication. It provides significant insights into the potential applications and performance optimization approaches associated with these fields.

5. Conclusions

This paper investigated the applicability of smartphone accelerometer data in identifying users and proposed the best components that can be used for a complete application. To this end, several machine learning methods were used to learn the accelerometer data of each user in order to identify them. These included a few chosen techniques from deep learning, traditional classifiers, and voting classifiers, all of which were trained using the users’ accelerometer signals obtained from the HMOG public dataset [22]. Moreover, the use of some preprocessing methods to clean the data was investigated, in addition to investigating some feature-selection methods to improve the performance of the identification system.
The large set of experiments conducted for the purpose of this study yielded the following conclusions:
  • The best performance in user recognition was demonstrated by the LSTM deep learning system, followed by XGBoost. These approaches were successful in properly identifying users based solely on accelerometer data.
  • Using bandpass filtering resulted in poor identification. Hence, it is better to avoid using such filtering in this context because it results in information loss.
  • In terms of feature selection, the RFE method beat the other methods investigated. It was notably beneficial in improving the identification results when used in conjunction with LSTM. It should be noted, however, that all statistical features preserved their importance during the identification process.
  • Even when the number of users expanded from 10 to 50, the system consistently performed well in terms of identification. This robustness suggests that the system was capable of handling large-scale user identification scenarios.
  • Furthermore, in terms of user authentication, this study found that the accelerometer data outperformed the gyroscope data and that using LSTM when combining both signals resulted in almost perfect user authentication.
However, the limitations of this study include the following:
  • Out of the three sensors, this study only used the data from the accelerometer for identification and the accelerometer and gyroscope for authentication, leaving out the information obtained from the magnetometer, which might be beneficial to the identification process, even though it is shown to be affected by the environment and other factors [67].
  • The signal’s magnitude and FFT of the magnitude were both used to extract the statistical features. It is necessary to investigate additional methods for reducing the dimensionality of time series, such as wavelet and discrete cosine transformations.
  • Only one deep learning method (LSTM) was used in this study; more recent deep learning approaches need to be investigated such as the gated recurrent unit [71] and quasi-recurrent neural networks [72].
These limitations will be the focus of our future research, with the aim of devising more accurate identification/authentication systems. Future studies should also look into incorporating advanced optimization methods within the proposed framework. These algorithms, which include hybrid heuristics, metaheuristics, adaptive algorithms, self-adaptive algorithms, and island algorithms, have been shown to be useful in addressing difficult decision problems in a variety of disciplines such as online learning [73], scheduling [74,75], multi-objective optimization [76], transportation [77], medical data, document classification, and other related fields [78]. Investigating their use in these diverse fields may show how such algorithms can improve decision-making processes in numerous situations, including the specific decision problem addressed in our study, and may open avenues for further exploration and innovation.

Author Contributions

Conceptualization, A.S.T. and A.B.H.; methodology, E.A.-M., M.A. (Mansoor Alghamdi), M.A.A. (Majed Abdullah Alrowaily), M.A. (Malek Alrashidi), A.M. and A.B.H.; software, E.A.-M., A.S.T. and M.A. (Malek Alrashidi); validation, E.A.-M., M.A. (Malek Alrashidi) and M.A.A. (Mohammad Ali Abbadi); formal analysis, E.A.-M., M.A. (Mansoor Alghamdi) and I.S.A.; investigation, E.A.-M., I.S.A. and A.A.A.; resources, E.A.-M., M.A.A. (Majed Abdullah Alrowaily), A.M. and A.A.A.; data curation, M.A. (Mansoor Alghamdi), M.A.A. (Majed Abdullah Alrowaily), M.A. (Malek Alrashidi), I.S.A., A.M. and A.A.A.; writing—original draft, E.A.-M., A.S.T. and A.B.H.; writing—review and editing, M.A. (Mansoor Alghamdi), M.A.A. (Majed Abdullah Alrowaily), M.A. (Malek Alrashidi), I.S.A., A.M., A.A.A., M.A.A. (Mohammad Ali Abbadi) and A.B.H.; visualization, M.A. (Mansoor Alghamdi) and M.A.A. (Majed Abdullah Alrowaily); supervision, A.S.T., M.A.A. (Mohammad Ali Abbadi) and A.B.H.; project administration, A.B.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

For the purposes of this study, all experiments were performed using the HMOG smartphone dataset, which was obtained from [27].

Acknowledgments

We truly appreciate the Reviewers’ voluntary efforts and thank them for their input.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Meng, Q.; Lu, P.; Zhu, S. A Smartphone-enabled IoT System for Vibration and Noise Monitoring of Rail Transit. IEEE Internet Things J. 2023, 10, 8907–8917.
  2. Mnasri, S.; Nasri, N.; Alrashidi, M.; van den Bossche, A.; Val, T. IoT networks 3D deployment using hybrid many-objective optimization algorithms. J. Heuristics 2020, 26, 663–709.
  3. Abdallah, W.; Mnasri, S.; Nasri, N.; Val, T. Emergent IoT Wireless Technologies beyond the year 2020: A Comprehensive Comparative Analysis. In Proceedings of the IEEE 2020 International Conference on Computing and Information Technology (ICCIT-1441), Tabuk, Saudi Arabia, 9–10 September 2020; pp. 1–5.
  4. Tlili, S.; Mnasri, S.; Val, T. A multi-objective Gray Wolf algorithm for routing in IoT Collection Networks with real experiments. In Proceedings of the IEEE 2021 National Computing Colleges Conference (NCCC), Taif, Saudi Arabia, 27–28 March 2021; pp. 1–5.
  5. Rasool, G.; Hussain, Y.; Umer, T.; Rasheed, J.; Yeo, S.F.; Sahin, F. Design Patterns for Mobile Games Based on Structural Similarity. Appl. Sci. 2023, 13, 1198.
  6. Hassanat, A.B.; Altarawneh, G.; Tarawneh, A.S.; Faris, H.; Alhasanat, M.B.; de Voogt, A.; Al-Rawashdeh, B.; Alshamaileh, M.; Prasath, S.V. On Computerizing the Ancient Game of Tab. Int. J. Gaming-Comput.-Mediat. Simul. (IJGCMS) 2018, 10, 20–40.
  7. Tai, Y.; Yu, T.T. Using smartphones to locate trapped victims in disasters. Sensors 2022, 22, 7502.
  8. Skurowski, P.; Nurzyńska, K.; Pawlyta, M.; Cyran, K.A. Performance of QR code detectors near Nyquist limits. Sensors 2022, 22, 7230.
  9. Soni, V.; Yadav, H.; Semwal, V.B.; Roy, B.; Choubey, D.K.; Mallick, D.K. A Novel Smartphone-Based Human Activity Recognition Using Deep Learning in Health care. In Proceedings of the Machine Learning, Image Processing, Network Security and Data Sciences (Select Proceedings of 3rd International Conference on MIND 2021), Raipur, India, 11–12 December 2021; Springer: Singapore, 2023; pp. 493–503.
  10. Shaw, B.; Kesharwani, A. Moderating effect of smartphone addiction on mobile wallet payment adoption. J. Internet Commer. 2019, 18, 291–309.
  11. Hassanat, A.B.; Albustanji, A.A.; Tarawneh, A.S.; Alrashidi, M.; Alharbi, H.; Alanazi, M.; Alghamdi, M.; Alkhazi, I.S.; Prasath, V.S. DeepVeil: Deep learning for identification of face, gender, expression recognition under veiled conditions. Int. J. Biom. 2022, 14, 453–480.
  12. Hassanat, A.B.; Prasath, V.S.; Al-kasassbeh, M.; Tarawneh, A.S.; Al-shamailh, A.J. Magnetic energy-based feature extraction for low-quality fingerprint images. Signal Image Video Process. 2018, 12, 1471–1478.
  13. Tarawneh, A.S.; Hassanat, A.B.; Alkafaween, E.; Sarayrah, B.; Mnasri, S.; Altarawneh, G.A.; Alrashidi, M.; Alghamdi, M.; Almuhaimeed, A. DeepKnuckle: Deep Learning for Finger Knuckle Print Recognition. Electronics 2022, 11, 513.
  14. Zhao, S.; Fei, L.; Wen, J. Multiview-Learning-Based Generic Palmprint Recognition: A Literature Review. Mathematics 2023, 11, 1261.
  15. Hassanat, A.; Al-Awadi, M.; Btoush, E.; Al-Btoush, A.; Alhasanat, E.; Altarawneh, G. New mobile phone and webcam hand images databases for personal authentication and identification. Procedia Manuf. 2015, 3, 4060–4067.
  16. Hassanat, A.B. On identifying terrorists using their victory signs. Data Sci. J. 2018, 17, 1–13.
  17. Hassanat, A.B.; Btoush, E.; Abbadi, M.A.; Al-Mahadeen, B.M.; Al-Awadi, M.; Mseidein, K.I.; Almseden, A.M.; Tarawneh, A.S.; Alhasanat, M.B.; Prasath, V.S.; et al. Victory sign biometrie for terrorists identification: Preliminary results. In Proceedings of the IEEE 2017 8th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6 April 2017; pp. 182–187.
  18. Särkkä, O.; Nieminen, T.; Suuriniemi, S.; Kettunen, L. A multi-position calibration method for consumer-grade accelerometers, gyroscopes, and magnetometers to field conditions. IEEE Sens. J. 2017, 17, 3470–3481.
  19. Zhao, H.; Wang, Z. Motion measurement using inertial sensors, ultrasonic sensors, and magnetometers with extended kalman filter for data fusion. IEEE Sens. J. 2011, 12, 943–953.
  20. Scislo, L. Verification of Mechanical Properties Identification Based on Impulse Excitation Technique and Mobile Device Measurements. Sensors 2023, 23, 5639.
  21. Celestina, M.; Hrovat, J.; Kardous, C.A. Smartphone-based sound level measurement apps: Evaluation of compliance with international sound level meter standards. Appl. Acoust. 2018, 139, 119–128.
  22. Yang, Q.; Peng, G.; Nguyen, D.T.; Qi, X.; Zhou, G.; Sitová, Z.; Gasti, P.; Balagani, K.S. A multimodal data set for evaluating continuous authentication performance in smartphones. In Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems, Memphis, TN, USA, 3–6 November 2014; pp. 358–359.
  23. Sudhakar, S.R.V.; Kayastha, N.; Sha, K. ActID: An efficient framework for activity sensor based user identification. Comput. Secur. 2021, 108, 102319.
  24. Ram, V.S.S.; Kayastha, N.; Sha, K. OFES: Optimal feature evaluation and selection for multi-class classification. Data Knowl. Eng. 2022, 139, 102007.
  25. Hall, M.A. Correlation-Based Feature Selection for Machine Learning. Ph.D. Thesis, The University of Waikato, Hamilton, New Zealand, 1999.
  26. Cherifi, F.; Omar, M.; Amroun, K. An efficient biometric-based continuous authentication scheme with HMM prehensile movements modeling. J. Inf. Secur. Appl. 2021, 57, 102739.
  27. Sitová, Z.; Šeděnka, J.; Yang, Q.; Peng, G.; Zhou, G.; Gasti, P.; Balagani, K.S. HMOG: New behavioral biometric features for continuous authentication of smartphone users. IEEE Trans. Inf. Forensics Secur. 2015, 11, 877–892.
  28. Yoneda, K.; Weiss, G.M. Mobile sensor-based biometrics using common daily activities. In Proceedings of the 2017 IEEE 8th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference (UEMCON), New York, NY, USA, 19–21 October 2017; pp. 584–590.
  29. Wu, G.; Wang, J.; Zhang, Y.; Jiang, S. A continuous identity authentication scheme based on physiological and behavioral characteristics. Sensors 2018, 18, 179.
  30. Kononenko, I. Estimating attributes: Analysis and extensions of RELIEF. In Proceedings of the European Conference on Machine Learning, Catania, Italy, 6–8 April 1994; Volume 94, pp. 171–182.
  31. Primo, A.; Phoha, V.V.; Kumar, R.; Serwadda, A. Context-aware active authentication using smartphone accelerometer measurements. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Columbus, OH, USA, 23–28 June 2014; pp. 98–105.
  32. Sprager, S.; Juric, M.B. An efficient HOS-based gait authentication of accelerometer data. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1486–1498.
  33. Jain, A.; Kanhangad, V. Exploring orientation and accelerometer sensor data for personal authentication in smartphones using touchscreen gestures. Pattern Recognit. Lett. 2015, 68, 351–360.
  34. Muaaz, M.; Mayrhofer, R. Smartphone-based gait recognition: From authentication to imitation. IEEE Trans. Mob. Comput. 2017, 16, 3209–3221.
  35. Abuhamad, M.; Abuhmed, T.; Mohaisen, D.; Nyang, D. AUToSen: Deep-learning-based implicit continuous authentication using smartphone sensors. IEEE Internet Things J. 2020, 7, 5008–5020.
  36. Li, Y.; Zou, B.; Deng, S.; Zhou, G. Using feature fusion strategies in continuous authentication on smartphones. IEEE Internet Comput. 2020, 24, 49–56.
  37. Wang, R.; Tao, D. Context-aware implicit authentication of smartphone users based on multi-sensor behavior. IEEE Access 2019, 7, 119654–119667.
  38. Alzubaidi, A.; Kalita, J. Authentication of smartphone users using behavioral biometrics. IEEE Commun. Surv. Tutor. 2016, 18, 1998–2026.
  39. Shen, C.; Chen, Y.; Guan, X. Performance evaluation of implicit smartphones authentication via sensor-behavior analysis. Inf. Sci. 2018, 430, 538–553.
  40. Maghsoudi, J.; Tappert, C.C. A behavioral biometrics user authentication study using motion data from android smartphones. In Proceedings of the IEEE 2016 European Intelligence and Security Informatics Conference (EISIC), Uppsala, Sweden, 17–19 August 2016; pp. 184–187.
  41. Ehatisham-ul Haq, M.; Azam, M.A.; Loo, J.; Shuang, K.; Islam, S.; Naeem, U.; Amin, Y. Authentication of smartphone users based on activity recognition and mobile sensing. Sensors 2017, 17, 2043.
  42. Shen, C.; Li, Y.; Chen, Y.; Guan, X.; Maxion, R.A. Performance analysis of multi-motion sensor behavior for active smartphone authentication. IEEE Trans. Inf. Forensics Secur. 2017, 13, 48–62.
  43. Lee, W.H.; Lee, R. Implicit sensor-based authentication of smartphone users with smartwatch. In Proceedings of the Hardware and Architectural Support for Security and Privacy, Seoul, Republic of Korea, 18 June 2016; pp. 1–8.
  44. Mohamed, M.; Cheffena, M. Received signal strength based gait authentication. IEEE Sens. J. 2018, 18, 6727–6734.
  45. García, V.; Mollineda, R.A.; Sánchez, J.S. Index of balanced Accuracy: A performance measure for skewed class distributions. In Proceedings of the Pattern Recognition and Image Analysis: 4th Iberian Conference (IbPRIA 2009), Póvoa de Varzim, Portugal, 10–12 June 2009; pp. 441–448.
  46. Tarawneh, A.S.; Hassanat, A.B.; Altarawneh, G.A.; Almuhaimeed, A. Stop Oversampling for Class Imbalance Learning: A Review. IEEE Access 2022, 10, 47643–47660.
  47. Volaka, H.C.; Alptekin, G.; Basar, O.E.; Isbilen, M.; Incel, O.D. Towards continuous authentication on mobile phones using deep learning models. Procedia Comput. Sci. 2019, 155, 177–184.
  48. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K.; Mitchell, R.; Cano, I.; Zhou, T.; et al. Xgboost: Extreme Gradient Boosting. In R Package Version 0.4-2; R Packages: Vienna, Austria, 2015; Volume 1, pp. 1–4.
  49. Mushava, J.; Murray, M. A novel XGBoost extension for credit scoring class-imbalanced data combining a generalized extreme value link and a modified focal Loss function. Expert Syst. Appl. 2022, 202, 117233.
  50. Hassanat, A.B.; Tarawneh, A.S.; Abed, S.S.; Altarawneh, G.A.; Alrashidi, M.; Alghamdi, M. Rdpvr: Random data partitioning with voting rule for machine learning from class-imbalanced datasets. Electronics 2022, 11, 228.
  51. Zhuravlev, Y.I.; Laptin, Y.; Vinogradov, A.; Zhurbenko, N.; Likhovid, A. Nonsmooth optimization methods in the problems of constructing a linear classifier. Int. J. Inf. Model. Anal. 2012, 1, 103–111.
  52. Chakraborty, M.; Das, S. Determination of signal to noise ratio of electrocardiograms filtered by band pass and Savitzky-Golay filters. Procedia Technol. 2012, 4, 830–833.
  53. Hess, C.W.; Pullman, S.L. Tremor: Clinical phenomenology and assessment techniques. Tremor Other Hyperkinetic Movements 2012, 2, 1–15.
  54. Jaén-Vargas, M.; Leiva, K.M.R.; Fernandes, F.; Gonçalves, S.B.; Silva, M.T.; Lopes, D.S.; Olmedo, J.J.S. Effects of sliding window variation in the performance of acceleration-based human activity recognition using deep learning models. PeerJ Comput. Sci. 2022, 8, e1052.
  55. Galar, M.; Fernandez, A.; Barrenechea, E.; Bustince, H.; Herrera, F. A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 2011, 42, 463–484.
  56. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  57. De Luca, C.J.; Gilmore, L.D.; Kuznetsov, M.; Roy, S.H. Filtering the surface EMG signal: Movement artifact and baseline noise contamination. J. Biomech. 2010, 43, 1573–1579.
  58. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
  59. Elazmeh, W.; Japkowicz, N.; Matwin, S. Evaluating misclassifications in imbalanced data. In Proceedings of the Machine Learning: ECML 2006: 17th European Conference on Machine Learning, Berlin, Germany, 18–22 September 2006; pp. 126–137.
  60. Barandela, R.; Sánchez, J.S.; García, V.; Rangel, E. Strategies for learning in class imbalance problems. Pattern Recognit. 2003, 36, 849–851.
  61. Perusquía-Hernández, M.; Dollack, F.; Tan, C.K.; Namba, S.; Ayabe-Kanamura, S.; Suzuki, K. Smile Action Unit detection from distal wearable Electromyography and Computer Vision. In Proceedings of the 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), Jodhpur, India, 15–18 December 2021; pp. 1–8.
  62. Namba, S.; Sato, W.; Osumi, M.; Shimokawa, K. Assessing automated facial action unit detection systems for analyzing cross-domain facial expression databases. Sensors 2021, 21, 4222.
  63. Jeni, L.A.; Cohn, J.F.; De La Torre, F. Facing imbalanced data–recommendations for the use of performance metrics. In Proceedings of the IEEE 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, Geneva, Switzerland, 2–5 September 2013; pp. 245–251.
  64. Alawneh, L.; Mohsen, B.; Al-Zinati, M.; Shatnawi, A.; Al-Ayyoub, M. A comparison of unidirectional and bidirectional lstm networks for human activity recognition. In Proceedings of the 2020 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops), Austin, TX, USA, 23–27 March 2020; pp. 1–6.
  65. Yue-Hei Ng, J.; Hausknecht, M.; Vijayanarasimhan, S.; Vinyals, O.; Monga, R.; Toderici, G. Beyond short snippets: Deep networks for video classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 8–10 June 2015; pp. 4694–4702.
  66. Sohankar, J.; Sadeghi, K.; Banerjee, A.; Gupta, S.K. E-bias: A pervasive eeg-based identification and authentication system. In Proceedings of the 11th ACM Symposium on QoS and Security for Wireless and Mobile Networks, Cancun, Mexico, 2–5 November 2015; pp. 165–172.
  67. Berman, Z. Outliers rejection in Kalman filtering—Some new observations. In Proceedings of the 2014 IEEE/ION Position, Location and Navigation Symposium (PLANS 2014), Monterey, CA, USA, 5–8 May 2014; pp. 1008–1013.
  68. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long Short Term Memory Networks for Anomaly Detection in Time Series. In Proceedings of the European Symposium on Artificial Neural Networks, Bruges, Belgium, 22–24 April 2015; Volume 2015, p. 89.
  69. De Marcos, L.; Martínez-Herráiz, J.J.; Junquera-Sánchez, J.; Cilleruelo, C.; Pages-Arévalo, C. Comparing machine learning classifiers for continuous authentication on mobile devices by keystroke dynamics. Electronics 2021, 10, 1622.
  70. García-Pedrajas, N.; Ortiz-Boyer, D. An empirical study of binary classifier fusion methods for multiclass classification. Inf. Fusion 2011, 12, 111–130.
  71. Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259.
  72. Bradbury, J.; Merity, S.; Xiong, C.; Socher, R. Quasi-recurrent neural networks. arXiv 2016, arXiv:1611.01576.
  73. Zhao, H.; Zhang, C. An online-learning-based evolutionary many-objective algorithm. Inf. Sci. 2020, 509, 1–21.
  74. Dulebenets, M.A. An Adaptive Polyploid Memetic Algorithm for scheduling trucks at a cross-docking terminal. Inf. Sci. 2021, 565, 390–421.
  75. Dulebenets, M.A.; Kavoosi, M.; Abioye, O.; Pasha, J. A self-adaptive evolutionary algorithm for the berth scheduling problem: Towards efficient parameter control. Algorithms 2018, 11, 100.
  76. Pasha, J.; Nwodu, A.L.; Fathollahi-Fard, A.M.; Tian, G.; Li, Z.; Wang, H.; Dulebenets, M.A. Exact and metaheuristic algorithms for the vehicle routing problem with a factory-in-a-box in multi-objective settings. Adv. Eng. Inform. 2022, 52, 101623.
  77. Rabbani, M.; Oladzad-Abbasabady, N.; Akbarian-Saravi, N. Ambulance routing in disaster response considering variable patient condition: NSGA-II and MOPSO algorithms. J. Ind. Manag. Optim. 2022, 18, 1035–1062.
  78. Gholizadeh, H.; Fazlollahtabar, H.; Fathollahi-Fard, A.M.; Dulebenets, M.A. Preventive maintenance for the flexible flowshop scheduling under uncertainty: A waste-to-energy system. Environ. Sci. Pollut. Res. 2021.
Figure 1. Flowchart of the major steps in the proposed identification system.
Figure 2. The used overlapping sliding window technique for feature extraction.
Figure 3. Feature-selection results using the Correlation Coefficients of all features listed in Table 3 and Table 4.
Figure 4. Feature-selection results using the Fisher Score ranking all features listed in Table 3 and Table 4.
Figure 5. Feature-selection results using the Information Gain of each feature listed in Table 3 and Table 4.
Figure 6. Feature-selection results using feature ranking by the RFE and RF of all features listed in Table 3 and Table 4.
Figure 7. Classifiers’ Overall Accuracy on filtered and unfiltered data.
Figure 8. AUC of identifying 10 users using RF classifier: (a) AUC results of the unfiltered data; (b) AUC results of the filtered data.
Figure 9. Accuracy and Loss of the LSTM model trained and validated on 10 users. (a) Accuracy; (b) Loss.
Figure 10. AUC results of identifying 10, 20, 30, and 40 users. Nearly identical curves were obtained in all of these cases.
Figure 11. The identification performance of the XGBoost on different numbers of users.
Figure 12. The identification performance of the LSTM on different numbers of users.
Figure 13. The confusion matrices for authenticating User1 with both data sensors combined, (a) using XGBoost classifier and (b) using LSTM classifier.
Table 1. Summary of some related literature: number of users, feature-selection methods, classifiers, and performance metrics. Optimal Feature Evaluation and Selection (OFES), Correlation-based Feature Subset Selection (CFSS), Scaled Manhattan (SM), Scaled Euclidean (SE), Gaussian Naive Bayes (GNB), Stochastic Gradient Descent (SGD), Accuracy (Acc), Decision Tree (DT), Voting Classifier (VC).

| Work | Users | Sensors | Feature Selection | Classifiers | Metrics |
|---|---|---|---|---|---|
| [23] | 30 | Accelerometer, Gyroscope | OFSS, CFSS | KNN, NB, RF, SVM | Acc |
| [26] | | Accelerometer, Gyroscope, Gravity | - | HMM-UBM | EER |
| [27] | 100 | Accelerometer, Gyroscope, Magnetometer | - | SVM, SM, SE | EER |
| [29] | 40 | Flora Chip, Pulse Sensor Amped Chip | ReliefF | KNN, DT, SVM | FAR, FRR |
| [28] | 51 | Accelerometer, Gyroscope | - | KNN, DT, RF with Voting | EER, Acc |
| [47] | | Accelerometer, Gyroscope, Magnetometer, Touch Screen | - | Network with Three Dense Layers | EER, Acc, Pre |
| Proposed | 10, 20, 30, 40, 50 | Accelerometer, Gyroscope | CC, FS, IG, RFE, All Features | KNN, RF, DT, GNB, SGD, ANN, VC, XGBoost, LSTM | Acc, Pre, Rec, F1, GEO, IBA, AUC |
Table 2. Summary of HMOG dataset characteristics.

| Category | Content |
|---|---|
| Accelerometer | Timestamp, acceleration force along x/y/z axis |
| Subjects | 53 (males), 47 (females) |
| Number of Examples/Subject (Class) | In the range of 1162 to 4639 |
Table 3. Statistical features extracted from both signals: the magnitude (Mag) and the Fast Fourier Transform (FFT).

| Feature | Description | Equation |
|---|---|---|
| Above mean (Mag) | Count of acceleration measurements that are larger than the mean acceleration value of the corresponding window. | $\lvert W^{+}\rvert : W^{+} = \{a_t \in W : a_t > \frac{1}{W_l}\sum_{t=0}^{W_l} a_t\}$ |
| Above mean (FFT) | Count of frequencies that are larger than the mean frequency value of the corresponding window. | $\lvert W^{+}\rvert : W^{+} = \{f_t \in W : f_t > \frac{1}{W_l}\sum_{t=0}^{W_l} f_t\}$ |
| Below mean (Mag) | Count of acceleration measurements that are smaller than the mean acceleration value of the corresponding window. | $\lvert W^{-}\rvert : W^{-} = \{a_t \in W : a_t < \frac{1}{W_l}\sum_{t=0}^{W_l} a_t\}$ |
| Below mean (FFT) | Count of frequencies that are smaller than the mean frequency value of the corresponding window. | $\lvert W^{-}\rvert : W^{-} = \{f_t \in W : f_t < \frac{1}{W_l}\sum_{t=0}^{W_l} f_t\}$ |
| Complexity-invariant distance (Mag) | Measures how complex the time series is (number of peaks, valleys, etc.). | $CID = \sqrt{\sum_{t=1}^{W_l - 1}(a_t - a_{t+1})^2}$ |
| Complexity-invariant distance (FFT) | Measures how complex the frequency series is (number of peaks, valleys, etc.). | $CID = \sqrt{\sum_{t=1}^{W_l - 1}(f_t - f_{t+1})^2}$ |
| Skewness (Mag) | Measure of the symmetry to the right and the left of the center point of the window. | $\gamma_l = \frac{\sum_{t=0}^{W_l}(a_t - \bar{a}_w)^3}{W_l\,(s_w)^3}$ |
| Skewness (FFT) | Measure of the symmetry to the right and the left of the center point of the window. | $\gamma_l = \frac{\sum_{t=0}^{W_l}(f_t - \bar{f}_w)^3}{W_l\,(s_w)^3}$ |
| Standard deviation (Mag) | The average amount by which each acceleration value deviates from the mean acceleration. | $s_w = \sqrt{\frac{\sum_{t=0}^{W_l}(a_t - \bar{a}_w)^2}{W_l - 1}}$ |
| Standard deviation (FFT) | The average amount by which each frequency value deviates from the mean frequency. | $s_w = \sqrt{\frac{\sum_{t=0}^{W_l}(f_t - \bar{f}_w)^2}{W_l - 1}}$ |
| Sample entropy (Mag and FFT) | The complexity of physiological time-series and frequency-series signals. | $SampleEn(m, r) = -\log_e \frac{A_{m+1}(r)}{A_m(r)}$ |
| Maximum acceleration (Mag) | The maximum magnitude value. | $\max(a) = \max_{t=0}^{W_l} a_t$ |
| Maximum frequency (FFT) | The maximum frequency value. | $\max(f) = \max_{t=0}^{W_l} f_t$ |
| Median (Mag) | The middle number in the sorted magnitude values. | $\begin{cases} a_t(i) & : i = \frac{W_l(O) + 1}{2} \\ \frac{a_t(i) + a_t(i+1)}{2} & : i = \frac{W_l(E)}{2} \end{cases}$ |
| Median (FFT) | The middle number in the sorted frequency values. | $\begin{cases} f_t(i) & : i = \frac{W_l(O) + 1}{2} \\ \frac{f_t(i) + f_t(i+1)}{2} & : i = \frac{W_l(E)}{2} \end{cases}$ |
| Mean (Mag) | Average of the magnitude in the corresponding window. | $\frac{1}{W_l}\sum_{t=0}^{W_l} a_t$ |
| Mean (FFT) | Average of the frequencies in the corresponding window. | $\frac{1}{W_l}\sum_{t=0}^{W_l} f_t$ |
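As a rough illustration of how several of the time-domain features in this table, and some of those in Table 4, could be computed for one window, consider the sketch below. It assumes that the magnitude signal is the Euclidean norm of the three accelerometer axes, which the table itself does not state. The helper names `magnitude` and `window_features` are hypothetical, and NumPy is used only for convenience.

```python
import numpy as np

def magnitude(x, y, z):
    """Assumed definition of 'Mag': Euclidean norm of the three axes."""
    x, y, z = (np.asarray(v, dtype=float) for v in (x, y, z))
    return np.sqrt(x ** 2 + y ** 2 + z ** 2)

def window_features(a):
    """Compute a few of the tabulated statistics for one window `a`."""
    a = np.asarray(a, dtype=float)
    mean = a.mean()
    diffs = np.diff(a)  # consecutive changes a[t+1] - a[t]
    return {
        "mean": float(mean),
        "above_mean": int(np.sum(a > mean)),
        "below_mean": int(np.sum(a < mean)),
        "std": float(a.std(ddof=1)),              # W_l - 1 in the denominator
        "cid": float(np.sqrt(np.sum(diffs ** 2))),
        "sad": float(np.sum(np.abs(diffs))),      # sum of absolute differences
        "energy": float(np.sum(a ** 2)),
        "max": float(a.max()),
        "median": float(np.median(a)),
    }

# Toy window, not data from the HMOG dataset.
feats = window_features([1.0, 3.0, 2.0, 4.0])
```

The same function can be applied to the FFT series of a window to obtain the corresponding frequency-domain variants.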
Table 4. Statistical features extracted from both signals: the magnitude (Mag) and the Fast Fourier Transform (FFT) (continued).

| Feature | Description | Equation |
|---|---|---|
| Kurtosis (Mag) | Measure of acceleration outliers in the corresponding window. | $\beta_2 = \frac{1}{W_l}\sum_{t=0}^{W_l}\frac{(a_t - \bar{a}_w)^4}{s_w^4}$ |
| Kurtosis (FFT) | Measure of frequency outliers in the corresponding window. | $\beta_2 = \frac{1}{W_l}\sum_{t=0}^{W_l}\frac{(f_t - \bar{f}_w)^4}{s_w^4}$ |
| Autocorrelation (Mag) | Serial correlation value with lag = 1 of accelerations within the corresponding window. | $\hat{R}(l) = \frac{1}{(W_l - l)\,s_w^2}\sum_{t=0}^{W_l - l}(a_t - \bar{a}_w)(a_{t+l} - \bar{a}_w)$ |
| Autocorrelation (FFT) | Serial correlation value with lag = 1 of frequencies within the corresponding window. | $\hat{R}(l) = \frac{1}{(W_l - l)\,s_w^2}\sum_{t=0}^{W_l - l}(f_t - \bar{f}_w)(f_{t+l} - \bar{f}_w)$ |
| Sum of absolute differences (Mag) | Sum over the absolute value of consecutive changes in the magnitude series. | $SAD = \sum_{t=0}^{W_l - 1}\lvert a_{t+1} - a_t\rvert$ |
| Sum of absolute differences (FFT) | Sum over the absolute value of consecutive changes in the frequency series. | $SAD = \sum_{t=0}^{W_l - 1}\lvert f_{t+1} - f_t\rvert$ |
| Energy (Mag) | Sum of all squared accelerations for the corresponding window. | $E = \sum_{t=0}^{W_l} a_t^2$ |
| Energy (FFT) | Sum of all squared frequencies for the corresponding window. | $E = \sum_{t=0}^{W_l} f_t^2$ |
| Peaks (Mag) | Number of peaks (values larger than both of their neighbors). | $\lvert\{a_t \in W : a_{t-1} < a_t > a_{t+1}\}\rvert$ |
| Amplitude of peak power spectral density (Mag) | Amplitude of the maximum signal power. | $A(PSD_P) = \max_{a_t \in W}\frac{1}{W_l}\left\lvert\sum_{t=0}^{W_l - 1} a_t e^{-j 2\pi k t / W_l}\right\rvert^2$ |
| Median frequency (FFT) | Frequency dividing the total power area into two equal parts. | $f_{med} : \sum_{f=f_l}^{f_{med}} PSD = \frac{1}{2}\sum_{f=f_l}^{f_h} PSD$ |
| Frequency dispersion (FFT) | Dispersion of sample frequencies within the corresponding window. | $f_{disp} = 2 f_{step} : \sum_{f=f_{med} - f_{step}}^{f_{med} + f_{step}} PSD = \frac{68}{100}\sum_{f=f_l}^{f_h} PSD$ |
| Fundamental frequency (FFT) | Frequency that carries the maximum energy. | $f_{fund} : PSD_{fund} = \max_{f_l \le f \le f_h} PSD$ |
| Frequency difference (FFT) | Difference between the median frequency and the fundamental frequency. | $f_{\Delta} = f_{med} - f_{fund}$ |
| Spectral centroid amplitude (FFT) | Weighted average of the amplitude spectrum of frequencies of the corresponding window. | $SCA = \frac{\sum_{f=f_l}^{f_h}(f)(PSD)}{\sum_{f=f_l}^{f_h}(f)}$ |
| Maximum weighted PSD (FFT) | The maximum spectral energy of the frequency component within the corresponding window. | $PSD(w)_{max} = \max_{f_l \le f \le f_h}(f)(PSD)$ |
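The frequency-domain quantities in this table (fundamental frequency, median frequency, and their difference) can be sketched with a plain periodogram. This is an illustrative approximation only: the function name `spectral_features`, the sampling rate `fs`, and the removal of the window mean before the FFT are assumptions, not details given in the article.

```python
import numpy as np

def spectral_features(a, fs):
    """Periodogram-based sketch of a few frequency-domain features.

    `fs` (sampling rate in Hz) is a hypothetical parameter; subtracting
    the mean before the FFT is an assumption made to suppress the DC bin.
    """
    a = np.asarray(a, dtype=float)
    n = len(a)
    spectrum = np.fft.rfft(a - a.mean())
    psd = (np.abs(spectrum) ** 2) / n          # (1 / W_l) * |DFT|^2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)

    f_fund = freqs[np.argmax(psd)]             # frequency with maximum power
    cum = np.cumsum(psd)
    # First frequency at which the cumulative power reaches half the total.
    f_med = freqs[np.searchsorted(cum, 0.5 * cum[-1])]
    return {
        "f_fund": float(f_fund),
        "f_med": float(f_med),
        "f_delta": float(f_med - f_fund),
        "peak_psd": float(psd.max()),
    }

# Synthetic 5 Hz sine sampled at 100 Hz for 2 s (not HMOG data).
fs = 100.0
t = np.arange(200) / fs
feats = spectral_features(np.sin(2 * np.pi * 5.0 * t), fs)
```

For this synthetic single-tone signal the fundamental and median frequencies coincide, so the frequency difference is zero; real sensor windows would generally give distinct values.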
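Table 5. All classifiers’ performance on filtered and unfiltered signals.

| Classifier | Without Filter | | | | | | With Filter | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Pre | Rec | Spe | F1 | GEO | IBA | Pre | Rec | Spe | F1 | GEO | IBA |
| ANN | 0.85 | 0.85 | 0.98 | 0.85 | 0.91 | 0.82 | 0.61 | 0.61 | 0.95 | 0.61 | 0.76 | 0.56 |
| DT | 0.75 | 0.75 | 0.97 | 0.75 | 0.85 | 0.71 | 0.45 | 0.44 | 0.94 | 0.44 | 0.64 | 0.39 |
| GNB | 0.50 | 0.50 | 0.94 | 0.46 | 0.65 | 0.44 | 0.31 | 0.30 | 0.91 | 0.29 | 0.48 | 0.25 |
| KNN | 0.83 | 0.83 | 0.98 | 0.83 | 0.90 | 0.80 | 0.50 | 0.49 | 0.94 | 0.48 | 0.67 | 0.44 |
| LR | 0.63 | 0.63 | 0.95 | 0.62 | 0.77 | 0.58 | 0.49 | 0.48 | 0.94 | 0.48 | 0.67 | 0.43 |
| RF | 0.84 | 0.84 | 0.98 | 0.84 | 0.91 | 0.81 | 0.59 | 0.59 | 0.95 | 0.59 | 0.74 | 0.54 |
| SGD | 0.55 | 0.54 | 0.94 | 0.53 | 0.71 | 0.49 | 0.43 | 0.42 | 0.93 | 0.41 | 0.62 | 0.37 |
| SVM | 0.85 | 0.85 | 0.98 | 0.85 | 0.91 | 0.82 | 0.55 | 0.55 | 0.95 | 0.54 | 0.72 | 0.50 |
| Voting | 0.87 | 0.87 | 0.98 | 0.87 | 0.92 | 0.84 | 0.61 | 0.61 | 0.95 | 0.61 | 0.76 | 0.56 |
| XGBoost | 0.88 | 0.88 | 0.99 | 0.88 | 0.93 | 0.86 | 0.60 | 0.60 | 0.95 | 0.60 | 0.75 | 0.55 |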
Table 6. Identification Accuracy with different feature-selection methods and subsets.

| Classifier | Corr. Coef. | Fisher Score | | | Info. Gain | | RFE | | | All Feat |
|---|---|---|---|---|---|---|---|---|---|---|
| | | Top 10 | Top 20 | Top 30 | Over 0.1 | Over 0.15 | 10 Feat | 20 Feat | 30 Feat | |
| ANN | 0.77 | 0.59 | 0.82 | 0.84 | 0.83 | 0.80 | 0.79 | 0.84 | 0.84 | 0.85 |
| RF | 0.80 | 0.59 | 0.82 | 0.83 | 0.83 | 0.82 | 0.81 | 0.84 | 0.84 | 0.84 |
| SVM | 0.76 | 0.58 | 0.80 | 0.85 | 0.83 | 0.78 | 0.77 | 0.85 | 0.85 | 0.85 |
| Voting | 0.79 | 0.60 | 0.83 | 0.86 | 0.85 | 0.81 | 0.80 | 0.86 | 0.87 | 0.87 |
| XGBoost | 0.82 | 0.60 | 0.86 | 0.87 | 0.86 | 0.85 | 0.83 | 0.88 | 0.88 | 0.88 |
Table 7. Classifiers’ performance with different metrics. Bold values represent the top level of performance.

| Classifier | Pre | Rec | Spe | F1 | GEO | IBA | AUC | Acc |
|---|---|---|---|---|---|---|---|---|
| ANN | 0.85 | 0.85 | 0.98 | 0.85 | 0.91 | 0.82 | 0.99 | 0.85 |
| DT | 0.75 | 0.75 | 0.97 | 0.75 | 0.85 | 0.71 | 0.86 | 0.75 |
| GNB | 0.50 | 0.50 | 0.94 | 0.46 | 0.65 | 0.44 | 0.87 | 0.50 |
| KNN | 0.83 | 0.83 | 0.98 | 0.83 | 0.90 | 0.80 | 0.96 | 0.83 |
| LR | 0.63 | 0.63 | 0.95 | 0.62 | 0.77 | 0.58 | 0.93 | 0.63 |
| RF | 0.84 | 0.84 | 0.98 | 0.84 | 0.91 | 0.81 | 0.99 | 0.84 |
| SGD | 0.55 | 0.54 | 0.94 | 0.53 | 0.71 | 0.49 | 0.90 | 0.54 |
| SVM | 0.85 | 0.85 | 0.98 | 0.85 | 0.91 | 0.82 | 0.99 | 0.85 |
| Voting | 0.87 | 0.87 | 0.98 | 0.87 | 0.92 | 0.84 | 0.99 | 0.88 |
| XGBoost | 0.88 | 0.88 | 0.99 | 0.88 | 0.93 | 0.86 | 0.99 | 0.88 |
| LSTM | 0.89 | 0.89 | 0.99 | 0.88 | 0.93 | 0.86 | 0.99 | 0.89 |
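The GEO and IBA columns are not defined in this table. Assuming they follow the commonly used definitions, namely the geometric mean of recall and specificity and the index of balanced accuracy with weighting factor α, they can be sketched as below. The function names and the default `alpha=0.1` are assumptions, and multi-class averages reported in the table may not be reproduced exactly from the averaged recall and specificity.

```python
import math

def geo_mean(recall, specificity):
    """Geometric mean of recall (sensitivity) and specificity."""
    return math.sqrt(recall * specificity)

def iba(recall, specificity, alpha=0.1):
    """Index of balanced accuracy: (1 + alpha * dominance) * GEO^2.

    alpha = 0.1 is an assumed weighting factor, not a value stated
    in the article.
    """
    dominance = recall - specificity
    return (1.0 + alpha * dominance) * recall * specificity

# Recall and specificity of the XGBoost row above.
geo = geo_mean(0.88, 0.99)
iba_value = iba(0.88, 0.99)
```

Under these assumptions, the XGBoost row's recall and specificity give values that round to the tabulated GEO of 0.93 and IBA of 0.86.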
Table 8. The authentication performance of XGBoost on data obtained from different data sources. Bold values represent the top level of performance.

| Sensor | User1 | | | | | | | User2 | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Pre | Rec | Spe | F1 | GEO | IBA | Acc | Pre | Rec | Spe | F1 | GEO | IBA | Acc |
| Accel | 0.97 | 0.76 | 0.99 | 0.85 | 0.87 | 0.74 | 0.97 | 0.93 | 0.71 | 0.99 | 0.80 | 0.84 | 0.68 | 0.96 |
| Gyro | 0.94 | 0.65 | 0.99 | 0.77 | 0.80 | 0.62 | 0.95 | 0.88 | 0.32 | 0.99 | 0.47 | 0.56 | 0.30 | 0.93 |
| Accel + Gyro | 0.98 | 0.83 | 0.99 | 0.90 | 0.91 | 0.82 | 0.98 | 0.95 | 0.80 | 0.99 | 0.87 | 0.89 | 0.78 | 0.97 |
Table 9. The authentication performance of LSTM on data obtained from different data sources. Bold values represent the top level of performance.

| Sensor | User1 | | | | | | | User2 | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Pre | Rec | Spe | F1 | GEO | IBA | Acc | Pre | Rec | Spe | F1 | GEO | IBA | Acc |
| Accel | 0.98 | 0.97 | 0.99 | 0.98 | 0.99 | 0.99 | 0.99 | 0.96 | 0.95 | 0.99 | 0.95 | 0.97 | 0.94 | 0.99 |
| Gyro | 0.95 | 0.87 | 0.99 | 0.91 | 0.93 | 0.86 | 0.98 | 0.88 | 0.72 | 0.99 | 0.79 | 0.84 | 0.69 | 0.96 |
| Accel + Gyro | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 | 0.99 |
Table 10. Comparison of the results with those of some related works.

| Work | Sensors | Dataset | Approach | Metrics |
|---|---|---|---|---|
| [26] | Accelerometer, Gyroscope, Gravity | HMOG, Proprietary Dataset | HMM-UBM | EER 14.8%, EER 19.2% |
| [27] | Gyroscope, Magnetometer | HMOG, Tap and Keystroke | SVM | HMOG, Tap and Keystroke: EER 7.16% |
| [47] | Gyroscope, Scroll | HMOG | Deep Learning | EER 14%, Acc 88%, Pre 80% |
| [69] | Keystroke | HMOG | RF | Acc 68%, Pre 71%, Rec 76%, F1 73%, AUC 72%, MCC 59% |
| This Study/Identification | Accelerometer | HMOG | LSTM | Pre 89%, Rec 89%, Spe 99%, F1 88%, GEO 93%, IBA 86%, AUC 99%, Acc 89% |
| This Study/Authentication | Accelerometer, Gyroscope | HMOG | LSTM | Pre 99.9%, Rec 99.9%, Spe 99.9%, F1 99.9%, GEO 99.9%, IBA 99.9%, AUC 99.9%, Acc 99.9% |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Al-Mahadeen, E.; Alghamdi, M.; Tarawneh, A.S.; Alrowaily, M.A.; Alrashidi, M.; Alkhazi, I.S.; Mbaidin, A.; Alkasasbeh, A.A.; Abbadi, M.A.; Hassanat, A.B. Smartphone User Identification/Authentication Using Accelerometer and Gyroscope Data. Sustainability 2023, 15, 10456. https://doi.org/10.3390/su151310456
