Investigation of Using CAPTCHA Keystroke Dynamics to Enhance the Prevention of Phishing Attacks

Alamri, Emtethal K.; Alnajim, Abdullah M.; Alsuhibany, Suliman A.

doi:10.3390/fi14030082

by Emtethal K. Alamri ¹,*, Abdullah M. Alnajim ¹ and Suliman A. Alsuhibany ²

¹ Department of Information Technology, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia

² Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia

* Author to whom correspondence should be addressed.

Future Internet 2022, 14(3), 82; https://doi.org/10.3390/fi14030082

Submission received: 24 February 2022 / Revised: 7 March 2022 / Accepted: 8 March 2022 / Published: 10 March 2022

(This article belongs to the Section Cybersecurity)


Abstract

Phishing is a cybercrime that is increasing exponentially day by day. In phishing, a phisher employs social engineering and technology to misdirect victims towards revealing their personal information, which can then be exploited. Despite ongoing research to find effective anti-phishing solutions, phishing remains a serious security problem for Internet users. This paper presents an investigation of using CAPTCHA keystroke dynamics to enhance the prevention of phishing attacks. A controlled laboratory experiment was conducted, and the results indicate that the proposed approach is highly effective in protecting online services from phishing attacks. The results showed a 0% false-positive rate and a 17.8% false-negative rate. Overall, the proposed solution provides a practical and effective way of preventing phishing attacks.

Keywords: phishing attacks; keystroke dynamics; text-based CAPTCHA; authentication

1. Introduction

Due to the increasing use of the Internet in many aspects of modern life, the number and complexity of attacks on cyber-security have also been increasing exponentially day by day, making it difficult to identify, analyse, and regulate the relevant risk events [1]. Cyber-attacks are defined as any digital attempt to steal, disrupt, or gain unauthorised access to the computing environment/infrastructure so that controlled information can be stolen [2]. Attempts of this kind always involve unauthorised access to sensitive data, whether personal or organisational, thereby violating the confidentiality, integrity, and availability of that information. Consequently, companies and institutions are obliged to focus on ensuring the security of their own online services because an attack of any kind can have long-term effects, giving rise to severe financial losses, as well as the loss of customers’ trust.

Phishing, which is considered to be one of the main types of cyber-attacks faced by online service users, is a dangerous and increasingly common phenomenon. Lastdrager has defined phishing as “a scalable act of deception whereby impersonation is used to obtain information from a target” [3]. The term was coined as a serious cyber threat in 1996, when phishers stole information about the credentials of America Online (AOL) users [4]. Since this attack on AOL, phishers have continued to change and develop their methods of attacking higher-value targets.

In a phishing attack, the user is usually asked to log onto a fake website—which mimics a legitimate website—by opening a malicious email attachment. When a user fails to recognise this as a phishing attempt and inputs his or her log-in information, the phisher captures the log-in credentials, credit card information, etc. for the user’s account.

In July of 2021, the Anti-Phishing Working Group (APWG) reported 260,642 phishing websites [5]. According to the APWG, this is the highest monthly total in its reporting history. Figure 1 shows the number of phishing occurrences from the first quarter of 2021 to the third quarter of 2021.

Statistics indicate that the incidence of phishing scams has doubled in recent years due to the 2019 coronavirus (COVID-19) pandemic, which suggests that phishers are seeking opportunities to exploit current events. As stated by the World Health Organization (WHO), COVID-19 has established an ‘infodemic’ that actually benefits phishers [6]. In addition, the FBI claims that over 11 times as many phishing complaints were logged in 2020 as in 2016 [7] since phishing attacks often tailor their campaigns to current events. Nonetheless, the number of phishing scams declined during the first quarter of 2021, as shown in Figure 1, with users becoming more aware of COVID-19 phishing scams. This resulted in phishers being less successful with emails related to the pandemic. However, phishers continued to exploit the pandemic, especially with the COVID-19 vaccination rollout [8]. In addition, phishers frequently impersonate leading brands in a bid to steal confidential information from users, such as their payment credentials. New reports show that the most frequently imitated brands in global phishing attempts are Microsoft and the DHL delivery service provider [9]. All previous statistical explanations refer to the risk of phishing attacks. Phishing is considered as a lucrative criminal activity, which targets individuals and organisations and incurs millions of dollars in losses every day. Moreover, it is seldom prosecuted. According to a recent study conducted by the Ponemon Institute, the annual cost of phishing attacks in the US has increased significantly in the last six years to the point where large US companies are now paying out $15 million each year. This amounts to nearly $1500 per employee annually [10]. To date, anti-phishing techniques have not been sufficiently effective to reduce the risk of phishing attacks. 
Because phishers seek to identify the weaknesses and vulnerabilities of a given solution so that these can be exploited to carry out a successful attack, it is essential to protect users’ data from phishing attacks by educating users in the correct course of action and reporting methods if a phishing email is received. For example, Qassim University is about to launch a new awareness programme to mitigate phishing attacks. This programme is similar to one that was implemented at Stanford University, which now has a phishing awareness service [11]. Moreover, online services need to adopt a reliable phishing-prevention mechanism to ensure that only genuine users gain access to their systems. Consequently, many organisations currently seek to protect their systems by implementing stronger authentication requirements as a means of preventing unauthorised access. For example, online banking services widely use one-time passwords (OTPs) to prevent identity theft, wherein a new password is generated and must be entered for each log-in attempt. Furthermore, biometric authentication is another promising trend for combatting phishing attacks. There are a number of different systems that apply biometric information as a means of identifying people, as in the case of civil, government, and healthcare identification. Biometric authentication schemes have been gaining popularity over other types of authentication in recent years since they provide high security to protect people’s identities and are easily combined with traditional authentication techniques.

Biometric information may be divided approximately into physiological and behavioural characteristics. The biometric information used in physiological authentication techniques is derived from an individual’s physical traits, such as fingerprint and face recognition. However, the measurement of these characteristics is very costly to deploy, as is the accompanying hardware. Conversely, behavioural characteristics are based on what users have learned or acquired that differentiates them from others. These include keystroke and mouse dynamics. Out of the many possible biometric traits, keystroke dynamics are the most popular and have been extensively studied for recognition purposes [12]. Recent research has investigated the effectiveness of keystroke dynamics in order to increase the level of security in authentication systems. These studies have varied in their approach, adopting different classes of keystroke dynamics (for example, free- and fixed-text), pattern classification techniques (such as statistical and machine-learning), and experimental environments (controlling or non-controlling).

All have yielded promising results, but free-text approaches are generally considered more secure than fixed-text approaches. Moreover, numerous studies have sought to explore the intrinsic benefits of free-text keystroke dynamics in providing continuous and non-intrusive authentication. Therefore, this current study investigates the effectiveness of incorporating free-text keystroke dynamics into Completely Automated Public Turing tests to tell Computers and Humans Apart (CAPTCHAs), thereby preventing phishing attacks.

CAPTCHA technology has played a significant role as a defence mechanism, protecting Web security from malicious bot programs across the Internet. It is one of the recognised shields used to distinguish between humans and computer programs (bots). CAPTCHA technology generates simple tests based on problems that humans can solve with ease but that are difficult for computers (i.e., artificial intelligence [AI]) [13]. When the right answer is received, it is assumed that it was entered by a human, so the user is given access to the system [14]. CAPTCHAs exist on most websites and are mainly classified into four types: image-, audio-, video-, and text-based.

Text-based CAPTCHAs are one of the most widely used CAPTCHA schemes, requiring users to read distorted text (digits/letters) that is presented in an image in registration or log-in forms. Users must recognise and write the text in the input text box to obtain validation. Only then will they be granted access to the site, provided that the input text matches the CAPTCHA characters and/or digits. This task supposedly cannot be solved by AI programs. Popular platforms, such as Microsoft, Google, eBay, and Yahoo, have used this scheme as a security arrangement to authenticate users and enhance website security. In this study, the term ‘CAPTCHA’ refers solely to text-based CAPTCHAs. The following is a summary of this study:

- An effective and secure approach was designed and implemented to investigate the effectiveness of using CAPTCHA keystroke dynamics in enhancing the prevention of phishing attacks.
- Existing schemes were analysed to design an effective CAPTCHA that takes advantage of keystroke dynamics to prevent phishing attacks.
- Significant time features representing users’ typing behaviour were selected and measured according to the existing literature. To the best of our knowledge, these features have not been used before in preventing phishing attacks.
- An appropriate similarity threshold was determined to produce excellent results.
- A larger number of participants was recruited than in previous works.
- A controlled laboratory experiment was conducted in order to practically evaluate the approach applied.

The structure of this paper is as follows: Section 2 reviews related work and the background of the study, while Section 3 outlines the proposed work, and Section 4 presents the methodology. Section 5 then describes the experimental study, and Section 6 includes the evaluation metrics, while Section 7 presents and discusses the results of the experiment. Section 8 concludes the paper and provides some direction for future work.

2. Related Work and Background

To address the issue of phishing attacks, several solutions have been proposed to mitigate their effect. These solutions may be classified as technical (using software), non-technical (educational), or a combination of the two. Technical solutions can be further categorised as phishing detection and phishing prevention. Phishing detection involves analysing a website’s media content. It can use an image or URL to verify whether the website is genuine [15]. Several anti-phishing detection techniques have been developed to mitigate the impact of phishing attacks, such as security toolbars, search engines [16], black/white lists [17], machine learning [18], and visual similarity-based techniques [19]. Phishing prevention solutions attempt to prevent phishing attacks by enhancing website security. This is endeavoured through the implementation of unique methods of securing authentication schemes and platforms for user interaction [20]. These solutions have achieved widespread success in recent years, especially with the increase in identity theft and data leakage resulting from diverse means of attack, such as phishing, email spoofing, and malware. Therefore, companies, including banks, have focused on integrating strong authentication methods to protect their systems based on the premise that, even if phishers manage to steal victims’ information, they cannot use it to achieve their goals. For instance, in one study [21], an OTP and authentication tokens were used to construct a system of two-factor authentication to prevent phishing attacks. In another study [22], a novel authentication system was developed for online banking, which depended on integrated OTP and QR-codes to provide greater security and convenience for users. The authors in Ref. [23] have proposed a two-factor authentication/verification scheme using fingerprint and code verification with a pattern matrix to prevent phishing attacks. 
In the above study, it was claimed that the scheme would be very useful in the financial sector to provide a high level of security, with a phisher being unable to bypass two levels of authentication and verification. Moreover, Chetalam introduced a multi-factor authentication scheme to improve the security of a mobile money transfer system (the Mpesa app) in Kenya, integrating username and password, phone number, and voice biometrics to prevent mobile money fraud, including phishing attacks [24].

In a recent study [25], another approach to protecting electronic payment systems was developed, combining password, fingerprint, and OTP verification. This combined approach was intended to provide more reliable user authentication, consisting of multi-factor authentication in three main phases: (1) Registration: with various types of data being required (i.e., the user’s biodata, password, and fingerprint), followed by verification of accuracy and storage in the database, (2) Authentication: with the user required to enter a password and biometric fingerprint so that the latter could be checked against the fingerprint stored in the database; if a match was verified, access would be granted, and (3) Transaction: with the user selecting the amount for transfer and being authenticated by scanning their biometric fingerprint. If a match was verified, the system would create an OTP and transmit it to the user’s registered phone number. The user would then have to enter the correct OTP to complete the money transfer.

In other studies [26,27], a novel approach was introduced for the detection of phishing websites and authentication of registered users through CAPTCHA-based validation. The first study used visual cryptography to protect the privacy of an image-based CAPTCHA by dividing the original image CAPTCHA into two shares. These were maintained on separate database servers (the user’s device and the server). As a result, the original image CAPTCHA was only exposed if both CAPTCHAs were simultaneously available. The proposed methodology was implemented by using Matlab without investigating the effectiveness of the approach in a real experiment. In contrast, the other study used a cropped image CAPTCHA (CIC) algorithm, where the user undertook two phases: registration and log-in. In the registration phase, the user was required to enter all details, select two images from among 80 images that were randomly generated by the server, and perform a cropping process based on predefined constraints. The system subsequently created a user profile containing registration details and the cropped pieces, which would then be sent server-side for storage in a database. During the log-in phase, after entering their details, the users could determine whether a suspicious webpage was legitimate or a phishing website according to the image received in this phase. If the user received any of the images selected in the registration phase, along with nine random images, the suspicious webpage could be identified as legitimate. Otherwise, it would be recognised as a phishing website. If a webpage was determined as legitimate, the user would have to perform the same cropping procedure as for registration. To evaluate the proposed system, the authors recruited 114 users, all familiar with the Internet and computers. The authors used three verification keys, which were password, PIN, and the proposed method, to evaluate the user performance over three months. 
The results indicated that, in the third month, the CIC success rate was 9% higher than that of the password and 1% higher than that of the PIN. In addition, some demographic data were collected in the survey to evaluate the proposed approach in terms of ease of remembering and response time. According to the results, 82% of the participants believed CIC was easier to use, while 11% believed CIC was slow to solve. Moreover, the study by the authors of Ref. [28] proposed a method to distinguish between authorized and unauthorized users by combining CAPTCHA attributes and human capabilities called bio-detection functionalities (BDF). The authors concluded that a CAPTCHA could help protect against third-party attacks when it is embedded with mechanisms (i.e., a BDF mechanism) that help identify the user.

Furthermore, in Ref. [29], it was highlighted how behavioural biometric authentication could be used to prevent phishing. The above author discussed how keystroke dynamics could be used to prevent phishing attacks, as well as ensuring that phishers do not impersonate users. However, this idea was merely underlined without proving the concept in an experimental study.

The use of the keystroke dynamics technique is a promising trend in identifying users from typing patterns. In the early 1980s, both the National Science Foundation (NSF) and National Institute of Standards and Technology (NIST) in the US found in studies that typing patterns were unique and recognisable [30]. These initial promising results prompted a large number of researchers to explore keystroke dynamics, whereupon new methods were developed to improve the efficiency of using keystroke dynamics in user authentication, achieved by measuring and testing users’ typing patterns. In fact, all research in this area has sought to provide efficient solutions that will provide a robust and inexpensive authentication mechanism. In particular, a study [31] examined the use of free text, applying Euclidean principles to obtain the distances between key pairs according to their position on the keyboard. An interesting finding of this work, which relates closely to the present research, is that flight time is more relevant than dwell time. This finding is evaluated in the current study. The best results were obtained with the features, down-down (DD), up-down (UD), and up-up (UU), producing a 21% false acceptance rate (FAR) and 17% false rejection rate (FRR).

In addition, two studies [32,33] investigated a free-text keystroke dynamic authentication approach for English and Arabic input. However, the timing features and classification algorithms used in the two studies differed. In the first, five timing features were extracted for each key-pair and included in the user feature vector. The experiment was conducted on a sample of 21 participants. The researchers used decision trees (DTs) and support vector machines (SVMs) to classify users based on the proposed timing features. The results for the English input revealed FAR = 0.245 and FRR = 0.613 for the SVM classifier. Meanwhile, the DT results showed that FAR = 0.281 and FRR = 0.702. In contrast, the other study used keystroke duration, di-graph duration, and latency, combined as a single feature, to distinguish among and classify the user samples. The results for the English input indicated that FAR = 0.22 and FRR = 0. For both studies, only the English input results are reviewed here since they relate to the proposed study. In this regard, the study in Ref. [33] achieved more promising results than the study in Ref. [32]. Table 1 provides a simple comparison of the most relevant studies with the current work, clearly indicating the contributions.

3. Proposed Work

To precisely define the proposed work, this section describes a system interpretation approach to the prevention of phishing attacks as a means of protecting online services. This system provides secure authentication using log-in credentials (username and password) and a text-based CAPTCHA, and it integrates keystroke timing data into a single system to prevent phishing attacks. Keystroke dynamics capture a user’s typing patterns in order to identify that user, as it is difficult to reproduce a user’s typing pattern. The keystroke dynamics system evaluates a user’s typing pattern in milliseconds (ms). In general, the system verifies users who are requesting access to an online service. If a phisher requests access, the request will be either denied (the intended outcome) or, in the case of a false acceptance, accepted.

The approach illustrated in Figure 2 provides the basis of the experimental study presented in the next section. In the system model, (1) a text-based CAPTCHA is shown on the website’s log-in page. The user then enters their credentials and solves the CAPTCHA; (2) the request, along with the keystroke timing data, is sent to the authentication server; (3) the server checks the user’s details and compares them with the user’s profile in the database; and (4) the server grants access or rejects the user’s request. The following section details the proposed methodology for this study.

4. Methodology

This section presents the proposed biometric user keystroke dynamics authentication system and its components. Figure 2 depicts the proposed system approach and how it serves to prevent phishing attacks. According to Figure 2, the main implementation steps involve identifying time features that represent users’ typing behaviour and extracting timing vectors. Moreover, the methods of measuring distance and classifying users are explained, and the typing text used in this experiment is identified.

4.1. Definition of Features

Keystroke dynamics are mainly based on features of time, but some research has been grounded on other features, such as pressure, the sequence of special action keys (e.g., left/right Alt, Shift, Ctrl), and typing speed. All these features have been calculated in milliseconds. The present work applied time features obtained from two keyboard actions: depression (D_n) and release (U_n) for each key typed, wherein n indicates the key, and time is recorded in milliseconds. In this study, three timing features were extracted, as suggested by Ref. [33]:

Keystroke duration or hold time: the interval between a key being pressed and released, which may be computed according to the following formula:

H_k1 = U_k1 − D_k1

Keystroke latencies (also called press-press or DD time): time taken by the user to press two consecutive keys, which can be calculated with the following formula:

DD = D_k2 − D_k1

Di-graph duration: time difference between releasing one key and pressing another, computed with the following formula:

UD = D_k2 − U_k1

Figure 3 presents an example of these time features extracted for two keys. Based on this example, the hold time for key ‘B’ = 300 − 0 = 300 ms, and for key ‘I’ = 750 − 400 = 350 ms. In addition, for the key pair ‘B’ and ‘I’, the DD time = 400 − 0 = 400 ms and the UD time = 400 − 300 = 100 ms.
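The three formulas can be checked against the Figure 3 values with a short Python sketch. The `KeyEvent` record and the `timing_features` helper are hypothetical names introduced here for illustration; they are not taken from the authors' toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """Hypothetical record of one key press (all times in milliseconds)."""
    key: str
    down_ms: float  # D_k: moment the key was pressed
    up_ms: float    # U_k: moment the key was released


def timing_features(first: KeyEvent, second: KeyEvent) -> dict:
    """Return hold, DD, and UD times for two consecutive keys."""
    return {
        "hold_first": first.up_ms - first.down_ms,     # H_k1 = U_k1 - D_k1
        "hold_second": second.up_ms - second.down_ms,  # H_k2 = U_k2 - D_k2
        "dd": second.down_ms - first.down_ms,          # DD = D_k2 - D_k1
        "ud": second.down_ms - first.up_ms,            # UD = D_k2 - U_k1
    }


# Timestamps from the 'B' / 'I' example above.
features = timing_features(KeyEvent("B", 0, 300), KeyEvent("I", 400, 750))
print(features)
```

Note that the UD value becomes negative when the second key is pressed before the first is released, which is one way overlapping key presses show up in the data.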

4.2. Extraction of Timing Vectors

After extracting time features, the collected data were pre-processed to remove outliers and noisy data (i.e., the large amounts of data generated when a user presses two keys simultaneously in error). The server then calculated the mean values for each time feature (hold time, UD time, and DD time) to build the user’s profile, as in Ref. [33]. The system also assigned each participant a unique ID for their identification, which could also be used to label the data. Thus, the time vectors were categorised based on the user’s label data. In addition, the system provided a fake IP address for each participant because the experiment was conducted on a single laptop.
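A minimal sketch of this averaging step is given below. The function name `timing_vector` and the optional `max_ms` cut-off are assumptions made for illustration: the paper states that outliers were removed but does not specify the rule, so the cut-off shown here is only a placeholder for whatever filter was actually used.

```python
from statistics import mean


def timing_vector(holds, uds, dds, max_ms=None):
    """Collapse one CAPTCHA attempt into (mean hold, mean UD, mean DD).

    max_ms is a hypothetical outlier cut-off: values whose magnitude
    exceeds it are dropped before averaging. None disables filtering.
    """
    def clean(values):
        if max_ms is None:
            return list(values)
        return [v for v in values if abs(v) <= max_ms]

    # statistics.mean raises StatisticsError if a feature list ends up empty.
    return tuple(mean(clean(values)) for values in (holds, uds, dds))


vector = timing_vector([100, 200, 5000], [50, 150], [300, 500], max_ms=1000)
print(vector)
```

In this example the 5000 ms hold time is discarded by the assumed cut-off, so the mean hold time is computed from the remaining two values.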

4.3. Finding the Distance and Classification Methods

To determine how much a user’s test data matched their profile, a Euclidean distance measure was used. Thus, the system measured the distance between two vectors based on a Euclidean distance equation in three-dimensional space [34]:

d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2} \qquad (1)

where x and y are two timing vectors. In this study, x was the timing vector captured at log-in and y was the user profile vector. Moreover, d(x, y) ≥ 0 [34]. Algorithm 1 indicates how Euclidean distance was computed in the proposed work.

Algorithm 1: Euclidean distance (ED)

1: begin
2: Compute the differences between the corresponding values of the two timing vectors
3: Square each difference
4: Sum the values of step 3
5: Take the square root
6: end
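Algorithm 1 translates almost line for line into Python. The sketch below is illustrative rather than the authors' code; the function name `euclidean_distance` is chosen here, and the standard-library `math.dist` (Python 3.8+) computes the same quantity.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two timing vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("timing vectors must have the same length")
    differences = [a - b for a, b in zip(x, y)]  # step 2
    squares = [d * d for d in differences]       # step 3
    total = sum(squares)                         # step 4
    return math.sqrt(total)                      # step 5


# Vectors ordered as (mean hold, mean UD, mean DD), in milliseconds.
login = (101.0, 52.0, 152.0)
profile = (100.0, 50.0, 150.0)
print(euclidean_distance(login, profile))
```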

To identify the users, standard deviation (SD) was applied as a threshold for Euclidean distance in an approach inspired by the work in Ref. [33]. In the proposed system, Euclidean distance was compared to the SD values stored in the database. If the Euclidean distance was less than at least half of the SD values (rows) stored in the database for that user, the similarity threshold was considered met, and the new time vector was taken to belong to the same user as the profile being compared. Hence, the new vector would be stored in the database. Otherwise, the system would give the user six attempts. If this number of attempts was exceeded, the user would be classified as a phisher.

SD = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} \qquad (2)

where x_1, x_2, …, x_N represent the values of the time features, these being the mean values for hold, UD, and DD times. Meanwhile, \bar{x} represents the mean value of all features used, and N represents the number of features [35]. Algorithm 2 illustrates how to compute the SD in the proposed work.

Algorithm 2: Standard Deviation (SD)

1: begin
2: Compute the mean of the feature values
3: Subtract the mean from each feature value and square the result
4: Sum the values of step 3
5: Divide by N − 1 (i.e., by 2 for the three features)
6: Take the square root
7: end
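Equation (2) and Algorithm 2 can be sketched in Python as follows. The function name `sample_sd` is chosen here for illustration; the standard-library `statistics.stdev` returns the same sample standard deviation.

```python
import math


def sample_sd(values):
    """Sample standard deviation with the N - 1 denominator of Equation (2)."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean_value = sum(values) / n                      # mean of the features
    squares = [(v - mean_value) ** 2 for v in values]  # squared deviations
    return math.sqrt(sum(squares) / (n - 1))          # divide by N - 1, then root


# One timing vector: (mean hold, mean UD, mean DD) in milliseconds.
print(sample_sd([100.0, 200.0, 300.0]))
```

With three features, N − 1 equals 2, which matches the "divide by 2" step in Algorithm 2.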

4.4. Typing Text

There are two main phases in this study, as in all keystroke dynamics systems: enrolment and verification. The proposed study deals with free text because text-based CAPTCHA provides completely different text each time. Thus, it does not require the user to memorise any text. For the purpose of usability, the system generates a CAPTCHA that combines lowercase letters (a–z) and digits (1–9). The proposed solution targets strings with lowercase letters to exclude the use of shift and caps lock keys because the system need only focus on collecting key events for the character keys, thereby avoiding any other keystroke sequences.

Informed by the pilot experiment, some confusable letters were removed to increase the usability and accuracy of the proposed solution. For example, the numerical digit ‘0’ was removed because it is often confused with the letter ‘O’, and the capital letter ‘I’ was removed because it is often confused with the lowercase letter ‘l’. The lowercase letters ‘l’, ‘s’, and ‘g’ were also removed because they are often mistaken for the numbers ‘1’, ‘5’, and ‘9’, respectively, as suggested in Ref. [36]. In addition, the font size was increased to 48 points to ensure that all the characters could be clearly read by the users. The Verdana font was selected because it is stated in the literature that users solve CAPTCHAs more accurately when using the Arial or Verdana fonts [37]. Figure 4 shows a generated sample.

Previous studies have focused on using a long-text system to obtain a large amount of timing information. However, this method has a longer training phase and is not user-friendly. In addition, the accuracy of this technique is not high because the user must pause frequently to look at the text during the copy task, which can lead to inconsistencies in the collected data [38]. Therefore, in the proposed solution, a text length of 10 characters was adopted to authenticate keystroke dynamics. This text length was selected based on Ref. [39], in which the authors investigated a number of studies on anomaly detection using authentication via keystroke dynamics, wherein they observed that a text length of 10 characters was typical in keystroke dynamics authentication. This text length has proved to be effective when applied with a text-based CAPTCHA to detect attacks from humans [40]. However, the proposed system asks the user to solve a CAPTCHA seven times. This was inspired by Ref. [41], in which the authors achieved the best performance, consisting of 0.00% FAR and 0.00% FRR. Moreover, a large sample can help ensure an accurate and conclusive test result.
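The character set and text length described in this section can be sketched as follows. This covers only the generation of the 10-character string; rendering it as a Verdana image on a grey background is not shown. The names `ALPHABET` and `generate_captcha_text` are hypothetical, and the use of the `secrets` module is an assumption, since the paper does not say how the random text was produced.

```python
import secrets
import string

# Lowercase letters without 'l', 's', and 'g', plus the digits 1-9 (no '0').
EXCLUDED_LETTERS = set("lsg")
ALPHABET = "".join(
    c for c in string.ascii_lowercase if c not in EXCLUDED_LETTERS
) + "123456789"


def generate_captcha_text(length=10):
    """Return a random CAPTCHA string drawn from the reduced alphabet."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(generate_captcha_text())
```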

In this study, the generated CAPTCHA word was presented on a grey background with no background lines or noise. The main aim was to prove the effectiveness of the proposed solution and increase acceptance of the submitted idea. The CAPTCHA was displayed to the user on the sign-up and log-in webpages. A ‘refresh CAPTCHA’ button was also included, which would allow the user to view a new problem. In addition, instructions were provided to clarify that all characters were lowercase with no spaces between them.

Typed Text in the Sign-up (Enrolment) Phase

During the enrolment phase of this experiment, the system began to create a biometric template for each user by asking them to enter their credentials (username, email, and password) and to solve a CAPTCHA seven times. Moreover, guidelines appeared in the sign-up and log-in pages, explaining that all characters of the CAPTCHA were lowercase letters with no spaces between them. In addition, the participants were informed that any information entered would only be used for the purposes of the research. Figure 5 depicts a screenshot of the sign-up page.

Text in the Log-in (Verification) Phase

During this phase, the participants were required to enter their username and password, and to solve a CAPTCHA once on the log-in page. The features of the captured typing pattern were then extracted from the CAPTCHA solution and compared with those stored in the profile associated with the corresponding username and password in the database. Figure 6 depicts a screenshot of the log-in page.
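The comparison performed at log-in can be sketched as below. This is one possible reading of the decision rule from Section 4.3, not a confirmed implementation: it assumes each stored profile row holds a timing vector together with its SD, that the log-in vector is compared with every row, and that the user is accepted when the distance is within the SD for at least half of the rows. The name `is_genuine` and the row layout are hypothetical.

```python
import math


def is_genuine(login_vector, profile_rows):
    """Accept the log-in if it matches at least half of the stored rows.

    profile_rows is a list of (timing_vector, sd) pairs for one user
    (an assumed layout). A row matches when the Euclidean distance
    between the log-in vector and the row's vector is at most the row's SD.
    """
    if not profile_rows:
        return False
    matches = sum(
        1
        for vector, sd in profile_rows
        if math.dist(login_vector, vector) <= sd
    )
    return matches >= len(profile_rows) / 2


rows = [
    ((100.0, 50.0, 150.0), 10.0),
    ((102.0, 52.0, 152.0), 10.0),
    ((300.0, 300.0, 300.0), 5.0),
]
print(is_genuine((101.0, 51.0, 151.0), rows))   # close to two of three rows
print(is_genuine((500.0, 500.0, 500.0), rows))  # far from every row
```

Tying the threshold to the number of stored rows, rather than to a fixed count, means the rule keeps its meaning as more successful log-ins are added to the profile.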

5. Experimental Evaluation of the Proposed System

This section discusses the experiments conducted to evaluate the effectiveness of the proposed solution in preventing phishing attacks. As mentioned previously, the proposed work is based on calculating users’ hold time, latencies, and di-graph duration when solving a CAPTCHA test. To collect the features of keystroke dynamics, a toolkit was required. In turn, this necessitated the selection of an appropriate development platform. Although the project included both mobile and Web platforms, the focus was on Web development. The language chosen for developing the data acquisition was JavaScript because this is one of the most commonly used languages for Web and mobile applications, offering the benefits of low cost and high performance. In addition, HTML, Bootstrap, and JQuery AJAX were used. For the backend portion of this project, Python was adopted as the language and Flask as the framework. The system was developed and tested on a Windows laptop, using SQLite as the database system. Table 2 presents a brief summary of the components used to perform this experimental study. In order to evaluate the proposed system and increase the chance of generating clear results, two controlled laboratory experiments were conducted. The following subsections briefly describe each of these experiments:

5.1. Pilot Experiment

Before initiating the main experiment, a simple preliminary experiment was conducted on a small sample to examine the system’s performance. This pilot experiment was also intended to determine the appropriate similarity threshold for authenticating genuine users and excluding phishers in the main experiment. In this pilot, the similarity threshold was set to five (5), this being a randomly selected number, meaning that if the Euclidean distance value was less than or equal to five SD values stored in the user database, the user would be considered genuine and granted access to the system. Otherwise, the user would be identified as a phisher and prevented from accessing the system. The pilot experiment was conducted with five participants acting as phishers. The participants were asked to log into the website using the information provided, repeating their log-in attempts until the sixth attempt, at which point they would be stopped by the system. A plugin was used in the system to block the user’s Internet address from further attempts once a specified retry limit had been reached. The limit of six attempts was inspired by Google’s six attempts per IP address; Microsoft, by comparison, recommends a minimum of four attempts and a maximum of 10. Table 3 presents the IP address of each participant and the time taken by each to complete the task. The pilot experiment demonstrated that the system was working properly and was capable of preventing all phishers. Moreover, the similarity threshold was observed to be the total number of SD in similar profiles divided by two. Because the number of user profiles increases after each successfully authenticated attempt, a fixed threshold value might not remain effective over time and would also increase the false rejection rate (FRR). As shown in Table 3, the number of similar profiles did not exceed four, indicating that the determined threshold will be capable of producing excellent results.
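The decision rule described above can be sketched as follows. This is only one possible reading of the rule, not the system's actual code: a stored profile counts as "similar" when its Euclidean distance to the log-in attempt is within a tolerance, and the attempt is accepted when at least five stored profiles are similar. The `tolerance` parameter, the function names, and the representation of each profile as a short numeric vector are all assumptions made for illustration.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two timing vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("timing vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def count_similar_profiles(
    attempt: Sequence[float],
    profiles: Sequence[Sequence[float]],
    tolerance: float,
) -> int:
    """Count stored profiles whose distance to the attempt is within tolerance."""
    return sum(
        1 for p in profiles if euclidean_distance(attempt, p) <= tolerance
    )


def is_genuine(
    attempt: Sequence[float],
    profiles: Sequence[Sequence[float]],
    tolerance: float,
    min_similar: int = 5,  # assumed reading of the pilot threshold of five
) -> bool:
    """Accept the attempt if enough stored profiles are similar to it."""
    return count_similar_profiles(attempt, profiles, tolerance) >= min_similar
```

Under this reading, a pilot phisher with at most four similar profiles would be rejected, which is consistent with the outcome reported in Table 3.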

5.2. Main Experiment

Seventy-five participants took part in a controlled laboratory experiment to evaluate the proposed system. The participants were undergraduate students studying different subjects at Qassim University, with differing levels of typing skill, and all were familiar with text-based CAPTCHAs. They were divided into two groups: genuine users (30 participants) and phishers (45 participants). The number of participants and their division into groups were very similar to those adopted in most previous studies, for example, the study in Ref. [40]. Thus, the results of these studies were all considered to be equally credible. Nonetheless, a larger number of participants was sampled in the current work than in previous studies. The following subsections explain the experimental setup and procedure.

5.2.1. Experimental Setup

In this step, the system was prepared by deleting all data from the database and making the necessary changes identified in the pilot experiment. After preparing an appropriate place to conduct the experiment, the experimental procedure was explained to the users in an information sheet. The procedure was then re-explained to the users immediately before starting the main experiment. The experiment began with the group of genuine users. The participants in this group were registered in the system as genuine users, having entered their usernames, email addresses, and passwords, and having solved a text-based CAPTCHA seven times to collect sufficient keystroke timing data. The user subsequently needed to log into the system to gain access to the results page. If a user managed to gain access to the system, it would mean that the task was completed successfully, and all attempts were stored in the database as time vectors (user profile). Figure 7 illustrates the time vector of one genuine user, which was stored in the database. From Figure 7, it can be seen that eight samples from the user are present.
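Storing each attempt as a time vector in SQLite could look roughly like the sketch below. The table name, column names, and JSON encoding are hypothetical choices made for illustration; the actual schema of the system is not described in this paper.

```python
import json
import sqlite3
from typing import List, Sequence


def create_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create a hypothetical table of time vectors."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "username TEXT NOT NULL, "
        "time_vector TEXT NOT NULL)"
    )
    return conn


def add_sample(
    conn: sqlite3.Connection, username: str, time_vector: Sequence[float]
) -> None:
    """Store one time vector (JSON-encoded) for the given user."""
    conn.execute(
        "INSERT INTO profiles (username, time_vector) VALUES (?, ?)",
        (username, json.dumps(list(time_vector))),
    )
    conn.commit()


def load_samples(conn: sqlite3.Connection, username: str) -> List[List[float]]:
    """Return all stored time vectors for a user, oldest first."""
    rows = conn.execute(
        "SELECT time_vector FROM profiles WHERE username = ? ORDER BY id",
        (username,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

With seven enrolment samples and one successful log-in, `load_samples` would return eight vectors for that user, matching the eight samples visible in Figure 7.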

Once the information for the group of genuine participants had been collected, nine credentials of different users from various specialties were selected. This information was then offered to the phishing group so that they could gain illegal access to the system. These phishers were permitted six log-in attempts per IP address. The system would then block any further log-in attempts from that IP address. Figure 8 shows a phisher’s attempts to gain illegal access to the account of a genuine user, whose data are presented above in Figure 7.

5.2.2. The Experimental Procedure

A controlled laboratory environment was used as the experimental setting to avoid any interruption while the text-based CAPTCHA was being solved. This meant that all phones needed to be switched off (or put on silent) and any chatting with friends had to be avoided. All the participants were instructed that they needed to sign up, first by entering their username, email address, and password, and then by solving the CAPTCHA seven times. Guidelines for solving the CAPTCHA appeared in the sign-up and log-in pages, clarifying that all characters of the CAPTCHA were lowercase letters, with no spaces between them. Moreover, the system included a button to refresh the CAPTCHA so that a user could receive readable text and re-enter a solution. In addition, the participants were informed that any information entered would only be used for the purposes of the research. Following the sign-up phase, the participants were instructed to log into the system with the same email address and password that were entered in the sign-up phase, and to solve the CAPTCHA once. The participants were permitted to use backspace keys if required while typing. Finally, a welcome page with the corresponding username appeared to notify the participants that the experiment had ended. Once the genuine group’s information had been collected, the collection of the phishing group’s information began. For the phishing group, the same steps were undertaken as for the genuine group, except that the phishers did not need to sign up as they were being provided with other people’s information. They were to use this information to try and log in. All the participants received an information sheet explaining the primary goal of the experiment and how it would be conducted. This explanation was reiterated for each user individually before starting the experiment in order to ensure accurate understanding.

6. Evaluation Metrics

Several metrics were used to evaluate the effectiveness of the model. The counts of false positives (FP), false negatives (FN), true positives (TP), and true negatives (TN) are parameters commonly used by researchers of anti-phishing solutions to judge performance. Here, TP denotes the number of phishers correctly classified as phishing attackers, TN the number of genuine users correctly classified as genuine, FP the number of genuine users incorrectly classified as phishers, and FN the number of phishers incorrectly classified as genuine users. This study employs five metrics based on these parameters, as follows:

True positive rate (TPR): the proportion of all phishers who are correctly classified as phishers. The TPR is defined in Equation (3):

TPR = \frac{TP}{TP + FN}

(3)

True negative rate (TNR): the proportion of all genuine users who are correctly classified as genuine users. The TNR is calculated as shown in Equation (4):

TNR = \frac{TN}{TN + FP}

(4)

False positive rate (FPR): the proportion of all genuine users who are incorrectly classified as phishers. The FPR is defined in Equation (5):

FPR = \frac{FP}{FP + TN}

(5)

False negative rate (FNR): the proportion of all phishers who are incorrectly classified as genuine users. Equation (6) shows how to compute the FNR:

FNR = \frac{FN}{FN + TP}

(6)

Accuracy: the number of correctly classified attempts (true accept/true reject) in relation to the total number of all users’ completed attempts, computed as shown in Equation (7):

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}

(7)
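Equations (3) to (7) can be evaluated directly from the four counts, as in the following sketch. The function name is hypothetical, and the counts in the usage note are illustrative only; they are not the counts obtained in this study.

```python
from typing import Dict


def classification_rates(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Compute TPR, TNR, FPR, FNR, and accuracy as in Equations (3) to (7)."""
    if tp + fn == 0:
        raise ValueError("no phisher attempts: TPR and FNR are undefined")
    if tn + fp == 0:
        raise ValueError("no genuine attempts: TNR and FPR are undefined")
    return {
        "TPR": tp / (tp + fn),
        "TNR": tn / (tn + fp),
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }
```

For example, with illustrative counts of 9 phishers blocked, 1 phisher admitted, 10 genuine users admitted, and 0 genuine users blocked, `classification_rates(9, 10, 0, 1)` gives a TPR of 0.9, an FNR of 0.1, and an accuracy of 0.95. Note that TPR and FNR always sum to one, as do TNR and FPR.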

7. Results and Discussion

This section presents and discusses the results of the real experimental study for the proposed system. All the data were obtained from the actual experiment, and it was verified that all the participants successfully completed the given tasks. Details of these results are displayed and discussed in the following subsections.

7.1. Time Taken for Each Genuine User to Register in the System

Despite the length of the CAPTCHA and its sevenfold repetition in the registration phase, as well as having to complete the log-in phase, most of the users solved the CAPTCHA within 3–5 min, as shown in Figure 9. This indicated that the proposed system was not overly complicated or laborious.

7.2. Results for Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for Each Genuine User

Figure 10 illustrates the average of all features used in the proposed system (hold time, UD time, and DD time) for all the successful CAPTCHA answers typed by each genuine user. It should be noted that the average hold time was more constant, whereas the di-graph features were less constant between users. However, the system appeared to be effective in preventing phishers.

7.3. Number of Attempts Made by Attackers to Gain Unauthorised Access to the System

The number of attempts to obtain access to the system was limited to six for each IP address. If this maximum number was exceeded, the IP address would be blocked. Figure 11 presents the number of attempts made by attackers who gained successful access to the system. Conversely, Table 4 displays the IP addresses that were blocked because the maximum number of attempts was exceeded.
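The six-attempt limit described above can be sketched as a small in-memory counter. This is an illustration only: the class and method names are hypothetical, the sketch counts failed attempts, and it does not reproduce the plugin actually used in the system.

```python
from collections import defaultdict
from typing import DefaultDict

MAX_ATTEMPTS = 6  # limit per IP address, as stated in the text


class AttemptLimiter:
    """Track failed log-in attempts per IP address and block at the limit."""

    def __init__(self, max_attempts: int = MAX_ATTEMPTS) -> None:
        self.max_attempts = max_attempts
        self._failed: DefaultDict[str, int] = defaultdict(int)

    def is_blocked(self, ip: str) -> bool:
        """Return True once the address has used up all of its attempts."""
        return self._failed.get(ip, 0) >= self.max_attempts

    def record_failure(self, ip: str) -> bool:
        """Register a failed attempt; return True if the address is now blocked."""
        if not self.is_blocked(ip):
            self._failed[ip] += 1
        return self.is_blocked(ip)
```

A log-in handler would call `is_blocked` before verifying an attempt and `record_failure` after each rejected attempt. A production system would also need persistence and expiry of the counters, which are omitted here.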

7.4. Results of Average Hold Time, Up-Down (UD) Time, and Down-Down (DD) Time for Each Phisher

Figure 12 shows the average hold time, UD time, and DD time for all phishing attempts to gain unauthorised access to the system. It should be noted that the system recorded the timestamp in milliseconds, as mentioned previously. The results prove the effectiveness of the proposed system for preventing phishing attacks.

Figure 10 and Figure 12 illustrate the unique rhythm of each participant (genuine users and phishers) generated when solving the text-based CAPTCHA. Although the values were close, there were no duplicates. Keystroke dynamics were, therefore, found to be effective in preventing phishing attacks. In addition, Figure 11 shows that some phishers made more than one attempt to gain access to the system. These repeated attempts indicate that, although the phishers had obtained a genuine user’s credentials, it was difficult for them to mimic that user’s typing dynamics when solving the CAPTCHA because the participants’ typing rhythms were recorded in milliseconds. Moreover, the proposed system appeared to have many usability advantages over traditional systems: it can operate in stealth mode, is low in cost, requires no additional hardware, is acceptable to users, and is easy to integrate into existing security systems. However, keystroke dynamics have two disadvantages: lower accuracy (they are affected by external factors, such as fatigue or stress) and lower permanence (a user’s typing pattern may change over time). The proposed system addressed these disadvantages in two ways: the experiment was conducted in a controlled environment to show that each user’s typing pattern differed, and the system stored a new user profile after each successful attempt, thereby updating the profiles held in the database. From Table 5, it can be inferred that the proposed model is capable of preventing phishing attacks by identifying each user from their own typing pattern, with promising results.

Furthermore, the proposed approach was compared with those of previous studies, such as Refs. [21,22,23], in terms of ease of use, cost-effectiveness, observed popularity, and general security. Each of these terms is briefly explained below, as defined in Ref. [42], while the comparison is shown in Table 6.

▪ Ease of use is a basic concept referring to the facility of the authentication method adopted in terms of the level of user acceptance and system availability.
▪ Cost-effectiveness refers to an authentication method that provides excellent results without requiring high expenditure.
▪ Observed popularity indicates the percentage popularity of the method used as compared to other types of authentication.
▪ General security refers to an evaluation of the safety provided by the authentication method used.

In addition, the proposed study provides a high level of security to protect against traditional attacks, such as brute force, shoulder surfing, guessing, and dictionary attacks, as explained in the following:

▪ Brute force: the attacker tries all possible combinations of characters in the hope of finding the username and password.
− If the attacker finds the username and password, they must still know the typing rhythm of the user when solving the CAPTCHA.
▪ Shoulder surfing: the attacker observes the typing pattern of the victim when solving the CAPTCHA in order to mimic their typing rhythm.
− Although it is possible to mimic a user’s typing pattern in fixed-text systems, it is more difficult in free-text systems because it requires the attacker to observe the victim’s behaviour for the duration of their logged-in session. Therefore, it is quite rare for an attacker to be able to replicate all typing rhythms of users.
▪ Guessing: the attacker tries to guess the correct password by using the most common words that they expect users to have chosen.
− The attacker needs to obtain the typing pattern of the user to pass the CAPTCHA test.
▪ Dictionary attacks: the attacker attempts to defeat the authentication mechanism by determining the correct password from a large number of possibilities.
− If the attacker finds the correct password, they must still know the typing rhythm of the user when solving the CAPTCHA.

However, several previous studies [26,27] used an image-based CAPTCHA to prevent phishing attacks, whereas the proposed system used a text-based CAPTCHA. Thus, the present results cannot be compared to this existing work. Besides, research by the authors of [24] used voice biometrics to protect Mpesa (a Kenyan mobile banking system) from fraud in a mobile money-transfer app. A voice trait, considered as a behaviour biometric in some cases, was adopted in the above study. However, this is incompatible with the proposed approach, which involves typing patterns. Moreover, the Mpesa approach related to a mobile platform, unlike the current study, which was conducted on a Web platform.

In contrast, other studies [31,32,33] used free-text keystroke dynamics for English language input, and time features were applied to distinguish between user samples. It is, therefore, very tempting to compare the proposed work with previous research in the same domain, but all the earlier studies differ in the number of participants, extracted features, environment in which the experiments were conducted, and the classification methods applied. Hence, it is not possible to rely on the credibility of these comparisons. That said, the three above-mentioned studies are similar to this current work in some respects, such as in controlling the environment, the use of English language free text, and the application of Euclidean distance as a classification method. Therefore, a simple comparison may be made to identify limitations in the benchmarking of these present results against those reported in Refs. [31,32,33], as illustrated in Table 7. However, the proposed approach outperforms these previous studies in terms of sample size and in determining the appropriate threshold for enhancing system performance. Note that equal error rate (EER) is a value of FAR/FRR where FAR equals FRR.
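The note on EER can be made concrete with a small threshold sweep. The sketch below assumes a distance-based verifier that accepts an attempt when its distance is less than or equal to the threshold; the function names and the sample distances in the usage note are hypothetical, and the code is not the procedure used to produce Table 7.

```python
from typing import Sequence, Tuple


def far_frr(
    genuine: Sequence[float], impostor: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (FAR, FRR) when attempts with distance <= threshold are accepted."""
    far = sum(1 for d in impostor if d <= threshold) / len(impostor)
    frr = sum(1 for d in genuine if d > threshold) / len(genuine)
    return far, frr


def approximate_eer(
    genuine: Sequence[float], impostor: Sequence[float]
) -> Tuple[float, float]:
    """Sweep observed distances and return (threshold, EER estimate).

    The estimate is taken at the threshold where FAR and FRR are closest,
    as the mean of the two rates at that point.
    """
    if not genuine or not impostor:
        raise ValueError("both distance lists must be non-empty")

    def gap(threshold: float) -> float:
        far, frr = far_frr(genuine, impostor, threshold)
        return abs(far - frr)

    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates, key=gap)
    far, frr = far_frr(genuine, impostor, best)
    return best, (far + frr) / 2
```

For the illustrative distances `genuine = [1.0, 2.0, 3.0, 6.0]` and `impostor = [2.5, 5.0, 7.0, 8.0]`, FAR and FRR are both 0.25 at a threshold of 3.0, so the estimated EER is 0.25.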

8. Conclusions and Future Work

Phishing is a constant and complex issue, with the capacity to do extensive damage to the targeted party and maximise gains for the attackers. It has already led to injurious losses in the business, government, and technology sectors. Nevertheless, to date, the mechanisms for preventing phishing attacks have proved insufficient, leaving practical challenges to be overcome. Therefore, this paper proposes an approach to preventing phishing attacks using CAPTCHA keystroke dynamics. In particular, a combination of three timing features was deployed to distinguish between samples of authenticated users and phishers: keystroke duration, di-graph duration, and latency. In total, 75 users participated in the experiment over a period of six weeks, and Euclidean distance was used to verify the user samples.

The results offer sufficient evidence of the effectiveness of capturing keystroke dynamics to prevent phishing, indicating that this approach is worthy of further study. Moreover, the proposed approach outperforms all previous work in the literature in terms of speed, acceptance, cost-effectiveness, and ease of integration with other systems. Also demonstrated was the benefit of collecting adequate samples in the enrolment phase, where the users were identified. Additionally, the experiment proved that the performance of keystroke dynamics could be improved by determining the appropriate threshold. This will be the focus of the authors’ research going forward.

In future work, experiments will be conducted in an online environment with a larger number of participants in order to evaluate the proposed system in a real-world environment and obtain more accurate results. In addition, it will also be interesting to investigate other attacks that may face the proposed system when applied in the real world, such as replay attacks. Moreover, further features will be included to improve the results, and additional classifiers and more advanced methodologies will be applied.

Author Contributions

Conceptualization, E.K.A., A.M.A. and S.A.A.; Data curation, E.K.A.; Formal analysis, E.K.A.; Funding acquisition, E.K.A.; Investigation, A.M.A. and S.A.A.; Methodology, E.K.A. and A.M.A.; Project administration, A.M.A. and S.A.A.; Software, E.K.A.; Supervision, A.M.A. and S.A.A.; Validation, E.K.A., A.M.A. and S.A.A.; Writing—original draft, E.K.A.; Writing—review & editing, A.M.A. and S.A.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable, the study does not report any data.

Acknowledgments

The researchers would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Basit, A.; Zafar, M.; Liu, X.; Javed, A.R.; Jalil, Z.; Kifayat, K. A comprehensive survey of AI-enabled phishing attacks detection techniques. Telecommun. Syst. 2021, 76, 139–154.
2. Uma, M.; Padmavathi, G. A survey on various cyber attacks and their classification. Int. J. Netw. Secur. 2013, 15, 390–396.
3. Lastdrager, E.E.H. Achieving a consensual definition of phishing based on a systematic review of the literature. Crime Sci. 2014, 3, 1–10.
4. Jakobsson, M.; Myers, S. Phishing and Countermeasures: Understanding the Increasing Problem of Electronic Identity Theft; John Wiley & Sons: Hoboken, NJ, USA, 2006.
5. APWG. Phishing Activity Trends Report: 3rd Quarter 2021. 2021. Available online: https://docs.apwg.org/reports/apwg_trends_report_q3_2021.pdf?_ga=2.147528119.149518382.1644108193-680326765.1644108193&_gl=1*cr9iea*_ga*NjgwMzI2NzY1LjE2NDQxMDgxOTM.*_ga_55RF0RHXSR*MTY0NDEwODE5My4xLjAuMTY0NDEwODE5My4w (accessed on 23 February 2022).
6. Hewage, C. Coronavirus pandemic has unleashed a wave of cyber attacks-here’s how to protect yourself. Conversation 2020, 31. Available online: https://theconversation.com/coronavirus-pandemic-has-unleashed-a-wave-of-cyber-attacks-heres-how-to-protect-yourself-135057 (accessed on 23 February 2022).
7. Federal Bureau of Investigation-Internet Crime Complaint Center (IC3). 2020 Internet Crime Report. 2021. Available online: https://www.ic3.gov/Media/PDF/AnnualReport/2020_IC3Report.pdf (accessed on 23 February 2022).
8. Kulikova, T.; Shcherbakova, T.; Sidorina, T. Spam and phishing in Q1 2021. Available online: https://securelist.com/spam-and-phishing-in-q1-2021/102018/ (accessed on 23 February 2022).
9. Kulikova, T.; Shcherbakova, T.; Sidorina, T. Spam and phishing in 2020. Secur. Kaspersky 2021. Available online: https://securelist.com/spam-and-phishing-in-2020/100512/ (accessed on 23 February 2022).
10. Ponemon, L. The 2021 Cost of Phishing Study. 2021. Available online: https://www.proofpoint.com/us/resources/analyst-reports/ponemon-cost-of-phishing-study (accessed on 23 February 2022).
11. Stanford University IT. University IT Launches Phishing Awareness Service. 2016. Available online: https://uit.stanford.edu/news/university-it-launches-phishing-awareness-service (accessed on 23 February 2022).
12. Buza, K. Person identification based on keystroke dynamics: Demo and open challenge. CEUR Workshop Proc. 2016, 1612, 161–168.
13. Brodić, D.; Amelio, A. The CAPTCHA—Perspectives and Challenges; Springer Nature: Cham, Switzerland, 2020.
14. Ahn, L.V.; Blum, M.; Hopper, N.J.; Langford, J. CAPTCHA: Using Hard AI Problems for Security; Lecture Notes in Computer Science; Springer Nature: Berlin/Heidelberg, Germany, 2003; Volume 2656, pp. 294–311.
15. Varshney, G.; Misra, M.; Atrey, P.K. A survey and classification of web phishing detection schemes. Secur. Commun. Netw. 2016, 9, 6266–6284.
16. Masri, R.; Aldwairi, M. Automated Malicious Advertisement Detection using VirusTotal, URLVoid, and TrendMicro. In Proceedings of the 2017 8th International Conference on Information and Communication Systems (ICICS), Irbid, Jordan, 4–6 April 2017; pp. 336–341.
17. Jain, A.K.; Gupta, B.B. A novel approach to protect against phishing attacks at client side using auto-updated white-list. EURASIP J. Inf. Secur. 2016, 2016, 1–11.
18. Kumar, A.; Gupta, J.B.B. Towards detection of phishing websites on client-side using machine learning based approach. Telecommun. Syst. 2017, 68, 687–700.
19. Mao, J.; Li, P.; Li, K.; Wei, T.; Liang, Z. BaitAlarm: Detecting phishing sites using similarity in fundamental visual features. In Proceedings of the 2013 5th International Conference on Intelligent Networking and Collaborative Systems, Xi’an, China, 9–11 September 2013; pp. 790–795.
20. Tirfe, D.; Anand, V.K. A survey on trends of two-factor authentication. In Contemporary Issues in Communication, Cloud and Big Data Analytics; Springer: Singapore, 2022; pp. 285–296.
21. Khan, A.A. Preventing Phishing Attacks using One Time Password and User Machine Identification. Int. J. Comput. Appl. 2013, 68, 7–11.
22. Lee, Y.S.; Kim, N.H.; Lim, H.; Jo, H.K.; Lee, H.J. Online Banking Authentication system using Mobile-OTP with QR-code. In Proceedings of the 5th International Conference on Computer Sciences and Convergence Information Technology ICCIT 2010, Seoul, Korea, 30 November–2 December 2010; pp. 644–648.
23. Patel, Y.; Diana, M.S.C. Fingerprint authentication technique to prevent phishing using pattern matrix. Int. J. Eng. Res. Dev. 2013, 6, 88–92.
24. Jepkemboi, C.L. Enhancing Security of Mpesa Transactions by Use of Voice Biometrics. Ph.D. Thesis, United States International University-Africa, Nairobi, Kenya, May 2018.
25. Hassan, M.A.; Shukur, Z. A secure multi factor user authentication framework for electronic payment system. In Proceedings of the 2021 3rd International Cyber Resilience Conference (CRC) 2021, Langkawi Island, Malaysia, 29–31 January 2021.
26. James, D.; Philip, M. A novel anti phishing framework based on visual cryptography. In Proceedings of the 2012 International Conference on Power, Signals, Controls and Computation, Thrissur, India, 3–6 January 2012; pp. 207–218.
27. Krishnamoorthy, S.K.; Thankappan, S. A novel method to authenticate in website using CAPTCHA-based validation. Secur. Commun. Netw. 2016, 9, 5934–5942.
28. Nanglae, N.; Bhattarakosol, P. A study of human bio-detection function under text-based CAPTCHA system. In Proceedings of the 11th IEEE/ACIS International Conference on Computer and Information Science, Shanghai, China, 30 May–1 June 2012; pp. 139–144.
29. Costigan, N. The growing pain of phishing: Is biometrics the cure? Biom. Technol. Today 2016, 2016, 8–11.
30. Karnan, M.; Akila, M.; Krishnaraj, N. Biometric personal authentication using keystroke dynamics: A review. Appl. Soft Comput. J. 2011, 11, 1565–1573.
31. Alsultan, A.; Warwick, K. User-friendly free-text keystroke dynamics authentication for practical applications. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2013, Washington, DC, USA, 13–16 October 2013; pp. 4658–4663.
32. Alsultan, A.; Warwick, K.; Wei, H. Free-text keystroke dynamics authentication for Arabic language. IET Biom. 2016, 5, 164–169.
33. Alsuhibany, S.A.; Almushyti, M.; Alghasham, N.; Alkhudier, F. Analysis of free-Text keystroke dynamics for Arabic language using Euclidean distance. In Proceedings of the 2016 12th International Conference on Innovations in Information Technology, IIT 2016, Al-Ain, United Arab Emirates, 28–30 November 2016; pp. 185–190.
34. Garrett, P.B. Linear algebra I: Dimension. In Number Theory, Trace Formulas and Discrete Groups; Academic Press: Cambridge, MA, USA, 2014.
35. Rouaud, M. Probability, Statistics and Estimation: Propagation of Uncertainties, p.191. 865 Creative Commons. 2013. Available online: http://www.incertitudes.fr/book.pdf (accessed on 23 February 2022).
36. Alsuhibany, S.A. Optimising CAPTCHA generation. In Proceedings of the 2011 Sixth International Conference on Availability, Reliability and Security, Washington, DC, USA, 22–26 August 2011.
37. Bursztein, E.; Moscicki, A.; Fabry, C.; Bethard, S.; Mitchell, J.C.; Jurafsky, D. Easy does it: More usable CAPTCHAs. In Proceedings of the Conference on Human Factors in Computing Systems-Proceedings, Toronto, ON, Canada, 26 April–1 May 2014; pp. 2637–2646.
38. Alsultan, A.; Warwick, K. Keystroke Dynamics Authentication: A Survey of Free-text Methods. Int. J. Comput. Sci. 2013, 10, 1–10. Available online: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=6B582DD715E9CD8F474394CED80C2A56?doi=10.1.1.412.2833&rep=rep1&type=pdf%5Cnhttp://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.412.2833 (accessed on 23 February 2022).
39. Killourhy, K.S.; Maxion, R.A. Comparing anomaly-detection algorithms for keystroke dynamics. In Proceedings of the International Conference on Dependable Systems and Networks (DSN), Lisbon, Portugal, 29 June 2009; pp. 125–134.
40. Alsuhibany, S.A.; Alreshoodi, L.A. Detecting human attacks on text-based CAPTCHAs using the keystroke dynamic approach. IET Inf. Secur. 2021, 15, 191–204.
41. Alsultan, A.; Warwick, K.; Wei, H. Improving the performance of free-text keystroke dynamics authentication by fusion. In Proceedings of the 2009 IEEE/IFIP International Conference on Dependable Systems & Networks, Lisbon, Portugal, 29 June–2 July 2009; pp. 1024–1033.
42. Idrus, S.Z.S.; Cherrier, E.; Rosenberger, C.; Schwartzmann, J.J. A Review on Authentication Methods. Aust. J. Basic Appl. Sci. 2013, 7, 95–107.

Figure 1. Detection of unique phishing sites [5].

Figure 2. Proposed system model.

Figure 3. Example of extracting time features of keystroke dynamics.

Figure 4. A generated sample.

Figure 5. Sign-up page.

Figure 6. Log-in page.

Figure 7. Time vector of one genuine user.

Figure 8. Example of time vector from one phisher’s attempts.

Figure 9. Time consumed by each genuine user.

Figure 10. Average hold time, UD time, and DD time for each genuine user.

Figure 11. Number of attempts made by attackers to gain access to the system.

Figure 12. Average hold time, UD time, and DD time for each phisher.

Table 1. Comparison between the current and relevant previous studies.

Reference	Contribution	Results
[24]	Proposed a multi-factor authentication scheme using voice traits as behavioural biometrics to protect a mobile banking system (Mpesa) in Kenya.	The results demonstrated that voice biometrics have numerous potential advantages and benefits in terms of reducing risk for mobile financial systems.
[33]	Introduced a new approach involving free-text keystroke dynamics authentication to provide a high degree of usability and security. Arabic and English language typing text was used to compare the performance of Arabic input with another input. Keystroke duration, di-graph duration, and latency were combined as a feature to distinguish between samples of authenticated users and impostors.	The results showed that the proposed approach achieved excellent results with FAR = 0.2 and FRR = 0.0.
Current work	Investigate the effectiveness of using CAPTCHA keystroke dynamics in enhancing the prevention of phishing attacks.	The results are promising in terms of providing a practical and cost-effective solution to prevent phishing attacks.

Table 2. Components of the proposed system.

Components	Description
Participants	75
Genuine group	30
Phishing group	45
Gender	Female
User’s profession	Students at Qassim University
Age range	Between 19 and 27 years
Language	Python
Recording of typing rhythms	JavaScript
Timing function	Date.getTime()
Keyboard	QWERTY (laptop)
Acquisition platform	Windows
Controlled environment	Yes
Typing text	English language free text
Training sample	7 samples
Testing sample	One-time sample

Table 3. Details of participants in the pilot experiment.

IP Address	Successful Attempts	Failed Attempts	Number of Similar Profiles	Time Consumed (In Minutes)
192.168.0.1	0	6	4	Most users did not exceed two minutes
192.168.0.2			2
192.168.0.3			0
192.168.0.4			0
192.168.0.5			1

Table 4. Blocked IP addresses.

ID	IP Address
1	192.168.0.1
2	192.168.0.2
3	192.168.0.3
4	192.168.0.4
5	192.168.0.5
6	192.168.0.6
9	192.168.0.9
11	192.168.0.11
12	192.168.0.12
15	192.168.0.15
18	192.168.0.18
32	192.168.0.32
34	192.168.0.34
35	192.168.0.35
38	192.168.0.38
40	192.168.0.40
41	192.168.0.41

Table 5. Performance of the proposed system.

Approach	TPR	TNR	FPR	FNR	Accuracy
The proposed approach	82.2%	100%	0%	17.8%	85%

Table 6. Comparison between some of the related research and proposed work.

Approach	Ease of Use	Cost-Effectiveness	Observed Popularity	General Security
[21]	**	**	***	**
[22]	****	***	**	**
[23]	***	**	*	***
[23]	**	**	*	***
Current approach	***	***	***	***
Current approach	***	***	***	**

Table 7. Summary of comparison results.

Study	Sample Size	FAR		FRR		EER
[31]	15	0.21		0.17		0.19
[33]	30	0.22		0		0.11
[32]	21	0.245 (SVM)	0.281 (DT)	0.613 (SVM)	0.702 (DT)	0.429 (SVM)	0.491 (DT)
Current study	75	0.17		0.0		0.08

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Alamri, E.K.; Alnajim, A.M.; Alsuhibany, S.A. Investigation of Using CAPTCHA Keystroke Dynamics to Enhance the Prevention of Phishing Attacks. Future Internet 2022, 14, 82. https://doi.org/10.3390/fi14030082

