**3. Methods**

#### **Cognitive sciences applied to cybersecurity: an exploration based on PRISMA**

This study used a systematic literature review (SLR) based on the PRISMA methodology, which comprises four stages: identification, screening, eligibility analysis, and inclusion (see Figure 2). Study selection followed the PRISMA guidelines [21]. In the identification stage, we searched the following databases: Springer, Scopus, IEEE, Association for Computing Machinery (ACM), Web of Science, and Science Direct, restricting the search to the last three years (2019 to 2021) to identify trends in cybersecurity. The search queries established were the following:


**Figure 2.** SLR according to the PRISMA methodology.

The inclusion criterion was: (i) documents published in the scientific databases from 2019 to 2021. The exclusion criteria were: (i) documents not related to cybersecurity and (ii) documents outside the research period (2019–2021). Figure 3 shows the screening and eligibility process of the 1244 studies. We reviewed the papers' titles and abstracts using Rayyan, a web application created for the systematic review process, and removed the papers that did not comply with the inclusion criteria. At the end of the screening process, 813 articles were selected for full-text reading. Finally, we removed studies without clear proposals in the cybersecurity field, which left 748 papers for inclusion.
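The stage counts above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the original study; it takes the 748 studies used in the text mining step as the included set (an assumption about how the final count should be read), and the function name `prisma_flow` is hypothetical.

```python
# PRISMA stage counts as reported in the text.
identified = 1244   # records found in the identification stage
full_text = 813     # records kept after title/abstract screening
included = 748      # studies analyzed in the text mining step (assumed included set)

def prisma_flow(identified, full_text, included):
    """Return how many records were removed at each stage (hypothetical helper)."""
    if not identified >= full_text >= included >= 0:
        raise ValueError("stage counts must be non-increasing")
    return {
        "removed_at_screening": identified - full_text,
        "removed_at_eligibility": full_text - included,
        "inclusion_rate": included / identified,
    }

flow = prisma_flow(identified, full_text, included)
print(flow)
```

Under this reading, the screening stage removes far more records than the eligibility stage.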

**Figure 3.** General topics in cybersecurity between 2019 and 2021.

#### **Qualitative analysis using text mining technique.**

Text mining, which is considered another field in cognitive science, is essential for qualitative cybersecurity research. However, text mining requires text cleaning and tokenization as prerequisites. Within the scope of text mining, cleaning consists of eliminating from the corpus everything that does not provide information on its subject, structure, or content. There is no single way to perform this step; it depends on the purpose of the analysis and the text source. We applied a text mining analysis using R software to all 748 studies obtained in the inclusion stage of the PRISMA methodology. We eliminated non-informative patterns (web page URLs), punctuation marks, and single characters. We then tokenized the text, that is, divided it into the units used for the analysis, and stored the tokenized text. Each element of the tokenized\_text column is a list with a character vector containing the generated tokens.

Tokenization, however, changes the unit of observation. Before the text was divided, the study elements (observations) were the titles and keywords of the selected papers, each in its own row, which fulfilled the condition of tidy data: one observation per row. After tokenization, the study element becomes each token (word), which violates that condition. Each token list must therefore be expanded to recover the ideal structure, duplicating the values of the other columns as many times as necessary [29]. We carried out the analysis for the years 2020–2021, obtaining the results in Table 2 and Figure 3.
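The cleaning, tokenization, and expansion steps described above can be sketched as follows. The study used R; this Python version is only an illustration under stated assumptions and is not the authors' script. The record fields (`paper_id`, `year`, `text`) and both function names are hypothetical.

```python
import re

def clean_and_tokenize(text):
    """Lowercase, strip URLs, punctuation, and single characters; return tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # non-informative web page URLs
    text = re.sub(r"[^\w\s]|_", " ", text)               # punctuation marks
    return [tok for tok in text.split() if len(tok) > 1]  # drop single characters

def to_tidy(records):
    """Expand each record's token list so that every row holds exactly one token.

    The other columns (paper_id, year) are duplicated once per token, which
    restores the one-observation-per-row condition of tidy data.
    """
    rows = []
    for rec in records:
        for token in clean_and_tokenize(rec["text"]):
            rows.append({"paper_id": rec["paper_id"], "year": rec["year"], "token": token})
    return rows

# Hypothetical titles, used only to exercise the functions.
records = [
    {"paper_id": 1, "year": 2020, "text": "Phishing detection: a survey (see https://example.org/x)"},
    {"paper_id": 2, "year": 2021, "text": "IoT botnet detection"},
]
tidy = to_tidy(records)
for row in tidy:
    print(row)
```

Each row of `tidy` now represents one token, so frequency counts by year or by paper reduce to simple grouping operations.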


**Table 2.** General topics in cybersecurity between 2019 and 2021.

We included all works that evidenced the development of strategies and structures in cybersecurity. Furthermore, we considered articles on models developed for learning defense against cyberattacks.

Then, we built a word cloud to obtain more detail on the contributions of scientific studies in the cybersecurity domain. Algorithm 1 shows the R script used to determine the cybersecurity topics, and Figure 4 shows the word cloud results.


**Figure 4.** Results of a word cloud of 748 papers.
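A word cloud is driven by term frequencies, so the underlying computation can be illustrated with a short sketch. This is not the R script referred to as Algorithm 1; the stop-word list, the sample titles, and the function name `term_frequencies` are assumptions made for illustration.

```python
from collections import Counter

# Assumed minimal stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "using"}

def term_frequencies(titles, top_n=5):
    """Count informative words across titles and return the top_n most frequent."""
    counts = Counter()
    for title in titles:
        for word in title.lower().split():
            word = word.strip(".,;:()")
            if len(word) > 1 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)

# Hypothetical titles, not taken from the reviewed corpus.
titles = [
    "Deep learning for intrusion detection in IoT",
    "Anomaly detection using deep learning",
    "Game theory for smart grid security",
]
top_terms = term_frequencies(titles, top_n=3)
print(top_terms)
```

In a word cloud, the font size of each term would be scaled by these counts.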

#### **Cybersecurity attacks and their impact**

According to the World Economic Forum [1], cyberattacks were considered the fifth-ranked worldwide risk, above food crises, in 2020. Adversaries have developed several cyberattack scenarios. For instance, the United States Department of Justice discloses public information about scams perpetrated through websites, social networks, emails, and robocalls, among other means. These scams relate to fake news about COVID-19 vaccines, treatments, and protective equipment, and to criminals who run fake businesses to steal identities or commit fraud. Another scenario covered by the United States Department of Justice is when adversaries send text messages using fictitious phone numbers and social media accounts to harass, intimidate, cyberstalk, and attempt to sextort women [30]. CISA, in turn, mentions that adversaries use bots to conduct credential harvesting, mail exfiltration, crypto mining, point-of-sale data exfiltration, and the deployment of ransomware [31]. According to the FBI, from 2015 to 2019, the fraud losses reported to the FBI's Internet Crime Complaint Center (IC3) rose from USD 1.1 billion to USD 3.5 billion [32]. To establish the most appropriate cybersecurity defense solution, it is necessary to identify the characteristics of cyberattacks [33]. There is currently a great variety of cyberattacks [34]; Table 3 shows those that recur most often among the papers selected for this study.


**Table 3.** Cybersecurity attacks detected in text mining process.

We contrasted this result with reports from an international cybersecurity organization. We found that some of these attacks were considered the most relevant cyberattacks of 2020 by the European Union Agency for Cybersecurity (ENISA) [34]. Additionally, we compared this result with the report of a specialized cybersecurity firm. We found that four of the nine attacks documented in our study had a growth rate of between 7 and 25 percent in 2020 in America, Europe, and Asia (see Table 4). According to [67], cyberattacks can be classified by the effects they cause against a system or its architecture: misuse of resources; user access compromise; root access compromise; web access; malware; and denial of service.


**Table 4.** Growth rate (%) of cyberattacks in 2020.

Some cyberattacks use machines as attack vectors [68], while others focus on human behaviors [69]. In phishing, attackers seek to exploit human vulnerabilities resulting from factors such as solidarity, desperation, or authority control [70]. In contrast, ransomware attacks exploit vulnerabilities in operating systems or applications to encrypt users' or organizations' sensitive information [71]. Watering hole attackers use exploit kits with stealth features and seek to compromise a specific group of end-users by infecting websites [65]. A malicious URL is a link created to distribute malware or facilitate a scam [72]. Form hacking is a type of cyberattack in which hackers inject malicious JavaScript code into the payment forms of legitimate websites [73]. Table 5 shows a classification of attacks based on the adversary's resource (machine or human).



X indicates that the target is affected by the attack.

Another way to classify cyberattacks is by target, such as energy, healthcare, and transportation [74,75]. Table 6 shows some services that adversaries consider targets. A notable finding of the text mining analysis is that most research works focus on cybersecurity in the energy domain. False data injection is the best-known attack on energy services because it focuses on modifying forecasted demand data [76]. The main issue with energy services, such as smart grids, is that they depend on network infrastructure and smart meters, which could have vulnerabilities. This dependence increases the probability of cyberattacks on smart grid infrastructures [77]. Research focuses on preventing and overcoming cyberattacks by using machine learning techniques, such as artificial neural networks, to address cybersecurity challenges, especially given the considerable volume of data in power systems [74].

**Table 6.** Classification of cybersecurity attacks based on target services.


Table 7 shows topics related to cybersecurity in energy facilities. Healthcare is another domain of interest for adversaries because of the sensitive and personal information it handles [75]. In healthcare, one relevant issue is legacy software [78]. Some hospitals and medical centers find it difficult to migrate their medical records to new systems because of factors such as budget, data format, or time; this could be a disadvantage from a cybersecurity perspective. Some research focuses on improving authentication methods to reduce this gap [79]. The topics related to healthcare cybersecurity are the following:


Adversaries take advantage of vulnerabilities in different domains, such as [80]:


**Table 7.** Cybersecurity topics related to energy facilities.


The growth of new electronic services and technologies such as IoT, big data, and artificial intelligence has allowed the development of new attack vectors [81,82]. IoT has attracted adversaries' interest because of its lack of advanced security and its great coverage [83]. IoT solutions are very attractive to attackers because of the variety of attacks that can be performed on different IoT components, among which we can mention the following [84]:

• Mobile devices;


The growth of cryptocurrency and distributed authentication architectures is driving the use of blockchain [85]. Blockchain is also used in healthcare organizations to improve data integrity, authentication, and privacy, especially for sensitive data such as medical records [86]. IoT, in turn, is growing in domains such as healthcare, smart cities, and smart homes [26]. Establishing authentication mechanisms such as PKI architectures in IoT ecosystems could be expensive for many IoT devices, so smart contracts based on blockchain architecture are an alternative [87]. Below, we outline the topics related to blockchain and cybersecurity in the papers selected in this work, which were developed between 2019 and 2021:


Some cyberattacks take advantage of new technologies such as 5G, IoT, and the cloud to perform DDoS attacks [88]. Many of the growing number of IoT devices have limited computational resources and lack security configurations, which makes them vulnerable to different cyberattacks. For instance, the Mirai botnet malware exploited the vulnerabilities of an estimated 600,000 IoT devices, resulting in massive Distributed Denial of Service (DDoS) attacks [89]. Cloud computing services are also used to launch DDoS attacks. However, adversaries are focusing on low-rate DDoS attacks because their stealthy, low-rate traffic is more challenging to detect [58].

On the other hand, by hijacking the Connection-less Lightweight Directory Access Protocol (CLDAP), an attacker could perform DDoS attacks at 2.3 terabits per second [90]. Social media platforms have become relevant for interaction and social information exchange. However, attackers have used them to deceive people and make them victims of attacks [91]. Adversaries have found an attractive attack target in humans because they can be deceived through persuasion techniques [15]. Attacks based on human vulnerabilities, called social engineering, have grown in recent years [66]. Figure 5 shows a word cloud of topics related to social engineering. We can observe that human factors are relevant in this kind of attack.

The pandemic has put tremendous pressure on cybersecurity. During the COVID-19 pandemic, the social engineering attacks carried out were phishing, spamming, and scamming. These attacks were combined with socio-technical methods such as fake emails, websites, and mobile apps [92]. The need to work remotely has changed the attack surface of organizations. Attacks on VPNs, hijacking of video meetings, fake news campaigns, and phishing attacks have increased during the COVID-19 pandemic [15]. According to the text mining process, we identified the following topics related to COVID-19 and cybersecurity:


**Figure 5.** Word cloud of topics related to Social Engineering.

#### **Challenges in cybersecurity solutions**

To face cyberattacks, organizations have established cybersecurity mechanisms that could be physical, software-oriented, or procedural. Below, we show some of the most common defense mechanisms:


The mechanisms described above are the most common solutions for cyberattacks. However, in some cases it is possible to define specific defense mechanisms for each type of cyberattack. For instance, two defense techniques against phishing attacks are [67]:


MITRE [93], in turn, has defined 245 techniques that an attacker could use to execute cyberattacks. The techniques are distributed across 14 stages; each stage is associated with a step in the attacker's process of executing a cyberattack. Figure 6 shows the number of techniques associated with each stage. Figure 7 shows the frequency of MITRE techniques in the works selected in this study, which were developed between 2019 and 2021. Our text mining analysis found that the most relevant techniques are reconnaissance, discovery, lateral movement, collection, command and control, and impact. It is important to mention that the absence of other techniques, such as initial access or privilege escalation, does not indicate that these techniques are not used in cyberattacks. The information shown in Figure 8 reveals that researchers tend to focus on the result of one specific technique in their study. From the review, however, we observe that not all selected works considered the full cycle of a cyberattack, an aspect that is relevant for developing a good defense strategy. We found that most of the techniques mentioned in the selected studies focus on gathering information, such as reconnaissance, discovery, and collection.

**Figure 6.** Cybersecurity Techniques according to MITRE.

**Figure 7.** MITRE techniques identified in the works selected in this study.

**Figure 8.** MITRE techniques used in vertical domains such as Energy, Social Engineering, and IoT.

Figure 8 shows the relation between cybersecurity techniques and attack domains: energy, IoT, and social engineering. We observed that the relevance of a specific technique depends on the type of cyberattack. For instance, the most relevant techniques in social engineering are reconnaissance, resource development, persistence, and defense evasion, whereas the most relevant techniques in IoT attacks are credential access, lateral movement, and collection. This number of techniques could be a challenge because, to select the best defense strategy, cybersecurity analysts need to detect them in real time when they are used in cyberattacks.
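A per-domain count of this kind can be approximated by matching stage names against paper text. The sketch below is only an illustration and is not the procedure used in the study. It assumes the 14 stages correspond to the MITRE ATT&CK Enterprise tactic names, uses naive substring matching, and works on invented sample records with hypothetical `domain` and `text` fields.

```python
from collections import defaultdict

# Assumed mapping of the 14 stages to MITRE ATT&CK Enterprise tactic names.
TACTICS = [
    "reconnaissance", "resource development", "initial access", "execution",
    "persistence", "privilege escalation", "defense evasion", "credential access",
    "discovery", "lateral movement", "collection", "command and control",
    "exfiltration", "impact",
]

def tactic_counts_by_domain(papers):
    """Count, per domain, how many papers mention each tactic name (substring match)."""
    counts = defaultdict(lambda: defaultdict(int))
    for paper in papers:
        text = paper["text"].lower()
        for tactic in TACTICS:
            if tactic in text:
                counts[paper["domain"]][tactic] += 1
    return {domain: dict(by_tactic) for domain, by_tactic in counts.items()}

# Hypothetical records, not taken from the reviewed corpus.
papers = [
    {"domain": "IoT", "text": "Detecting lateral movement and credential access in IoT networks"},
    {"domain": "IoT", "text": "Lateral movement detection with graph learning"},
    {"domain": "social engineering", "text": "Reconnaissance and persistence in phishing campaigns"},
]
counts = tactic_counts_by_domain(papers)
print(counts)
```

Substring matching is crude (for example, it cannot tell a tactic name from the same word used in a general sense), so counts obtained this way are best read as rough indicators.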

Figure 9 shows some variants of cybersecurity attacks based on social engineering. They illustrate the versatility of these attacks, which vary depending on whether the attack techniques are used digitally, in person, or by phone.

**Figure 9.** Classification of Social Engineering attacks.

Cybersecurity solutions require adapting to new challenges:


Cybersecurity firms and researchers have been developing alternatives that focus mainly on anomaly detection, whose objective is to detect some pattern, behavior, or component used by attackers [94]. Table 8 shows topic development from 2019 to 2021 related to anomaly detection. Cybersecurity companies and researchers in the field have moved on from reactive solutions to proactive ones [95].

**Table 8.** Cybersecurity topics related to IoT.


Cybersecurity research is trying to stay one step ahead and take advantage of cybersecurity analysts' cognitive capabilities to define proactive defense strategies. Accordingly, several lines of research focus on incorporating cognitive models to generate these proactive solutions. In the selected period (2019–2021), several studies applied artificial intelligence and machine learning concepts to cybersecurity (see Table 9).


**Table 9.** Topics related to anomalies.

The use of machine learning algorithms such as Decision Tree (DT), Support Vector Machine (SVM), Naïve Bayes (NB), Random Forest (RF), K-nearest neighbor (kNN), and Artificial Neural Networks (ANNs) for building intrusion detection systems (IDS) or detecting anomaly patterns is among the most active topics in cybersecurity. A relevant fact observed in the selected papers was the growing number of studies related to deep learning applications. Researchers have considered deep learning a good alternative for facing different cybersecurity issues, such as detecting IoT attacks, APTs, DDoS, malware, and anomalies. An interesting fact is that there are three variants of deep learning:


Table 10 shows topics identified from papers in the text mining process related to security in IoT. Research on defense solutions against DDoS includes the use of cognitive science approaches such as [88]:


**Table 10.** Machine learning applied to cybersecurity.
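To make the idea of a learning-based intrusion detector concrete, the following is a minimal k-nearest-neighbor sketch in plain Python. It is an illustration only and does not reproduce any of the reviewed systems. The two flow features, the labels, and the training points are invented, and a practical detector would scale the features and use far more data.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest labeled training points."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical flow features: (packets per second, mean packet size in bytes).
train = [
    ((10.0, 500.0), "normal"),
    ((12.0, 520.0), "normal"),
    ((11.0, 480.0), "normal"),
    ((900.0, 60.0), "attack"),
    ((950.0, 64.0), "attack"),
    ((870.0, 58.0), "attack"),
]

prediction = knn_predict(train, (880.0, 62.0), k=3)
print(prediction)
```

Because kNN relies on labeled examples, it needs a training set in which traffic has already been marked as normal or malicious.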


Below are some approaches from studies published between 2019 and 2021 that propose solutions based on machine learning and deep learning for identifying malicious URLs or for sentiment analysis in social media:


Game theory is another cognitive science alternative applied to cybersecurity. Its objective is to anticipate the adversary's next step during a cyberattack. Figure 10 shows a word cloud with topics related to game theory. We identified that game theory could be applied to different domains such as energy, investment, cyber-physical systems, and computer security. Additionally, game theory research shows approaches in defense mechanisms, information dissemination, and decision making. Game theory uses computational modeling to take advantage of the cognitive processes of security analysts and adversaries to improve decision-making based on information analysis to face attacks [96].

Game theory is mostly used in economics, where it studies optimal decisions and strategies for given situations. A game is defined as an interaction between two or more participants seeking a reward. During the game, participants develop strategies to maximize their profit; a player's strategy is a specific action at a particular moment of the game [96]. Players do not necessarily represent people; they can be organizations or groups. In a Nash equilibrium, each player's strategy or set of strategies is a best response to the other players' actions, so no player can increase its profit by changing strategy alone. There are two ways to represent a game mathematically: a standard (normal) form using matrices and an extensive form using decision trees.

There are two classic types of games in game theory: cooperative and non-cooperative. A cooperative game is based on the players interacting to reach agreements on the decisions each player will carry out, with the objective of forming coalitions and determining how to distribute the rewards [97]. In non-cooperative games, however, each player must make a decision without knowing the decisions of the rest. These games are closer to what happens in the cybersecurity domain.

Complete information games are those in which each player knows all the events in the game's course from the beginning, especially when making a decision. A classic example is chess. Incomplete information games are, in most cases, simultaneous decision-making games, so each player knows something that the others do not. Interactions between an adversary and a user could be modeled as a two-player stochastic game. Using a non-linear program, it is possible to compute a Nash equilibrium to define the best response strategies for the players [98]. Developing games that consider cost, time, reward, and performance could define effective game strategies.

**Figure 10.** Game theory research topics.
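The matrix (normal-form) representation and the Nash equilibrium condition described above can be illustrated with a small non-cooperative game between a defender and an attacker. This is a minimal sketch under stated assumptions: the payoff values and action names are invented, only pure strategies are considered, and the non-linear programming approach of [98] is not reproduced.

```python
def pure_nash_equilibria(defender_payoff, attacker_payoff):
    """Return all (row, col) cells where neither player gains by deviating alone.

    Rows are defender actions, columns are attacker actions. A cell is a
    pure-strategy Nash equilibrium when the defender's payoff is maximal in
    its column and the attacker's payoff is maximal in its row.
    """
    n_rows, n_cols = len(defender_payoff), len(defender_payoff[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            best_row = all(defender_payoff[r][c] >= defender_payoff[i][c] for i in range(n_rows))
            best_col = all(attacker_payoff[r][c] >= attacker_payoff[r][j] for j in range(n_cols))
            if best_row and best_col:
                equilibria.append((r, c))
    return equilibria

# Hypothetical zero-sum payoffs.
# Rows: defender {patch, do nothing}; columns: attacker {exploit, phish}.
defender = [[-1, -2],
            [-6, -3]]
attacker = [[1, 2],
            [6, 3]]

equilibria = pure_nash_equilibria(defender, attacker)
print(equilibria)
```

With these invented payoffs the only equilibrium is the cell in which the defender patches and the attacker phishes. Some games have no pure-strategy equilibrium, in which case mixed strategies have to be considered.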

#### **4. Results and Discussion**

*Cognitive Cybersecurity Model*

Our text mining process found that the works selected in this study do not consider indirect cognitive processes or cognitive models such as OODA or MAPE-K. Including game theory in cybersecurity can lead to strategies to minimize cyberattacks from a cognitive perspective. A complete information model is the most appropriate to obtain the best decision from the game theory approach. Large network environments are very complex scenarios for developing detection and protection cybersecurity solutions. The integration of machine learning and deep learning with game theory techniques could improve proactive security solutions. Concerning Figure 2, Cassenti et al. [23] mention that technology does not consider the user learning processes. From our perspective, the game theory approach could be a solution because it validates the user's decision-making processes based on a set of experiences and patterns. From the game theory perspective, if the user (player) improves the learning process or the decision-making process based on cognitive processes, the probability of winning the game increases.

In this sense, we propose in Figure 11 a cognitive cybersecurity model that integrates cognitive processes with the machine learning, deep learning, and game theory approaches applied in cybersecurity. As shown in Figure 11, we structured the model into three layers. The first layer addresses the aspects of perception related to the cognitive processes and associates them with sources of information that can be analyzed to establish patterns of anomalies based on space-time criteria. The second layer associates the understanding processes with machine learning (ML) or deep learning (DL) techniques that can be used for anomaly detection. This association must have bi-directional feedback between analysts and technology to improve the training of ML or DL algorithms. The path toward analysis generates insight into cybersecurity situation awareness. Additionally, this feedback should support the improvement of the analyst's cognitive processes to detect cyberattacks.

**Figure 11.** Proposal for a Cognitive Cybersecurity model.

Finally, the third layer associates the cognitive projection process with game theory techniques. At this level, the decision-making processes that establish the best defense strategy must be supported by the information obtained from the ML and DL processes carried out in the lower layer. The bidirectional relation, in one direction, constitutes the computational model of the game theory component; in the other direction, it should improve the cognitive decision-making processes. However, establishing the proposed model is complex without obtaining all the information about the analyst and the adversary (see Figure 12). Modeling the adversary's characteristics would give analysts a complete view from which to make a better decision. For instance, they would know that the adversary could use a combination of tools (T), techniques (Th), and procedures (C2F). However, the list of tools and procedures can be extensive and varied. Below is a list of the most widespread RATs:

	- FudgeC2 is a campaign-oriented PowerShell C2;
	- Callidus is an open-source C2 framework that leverages Outlook, OneNote, and Microsoft Teams for command and control;
	- APfell is a cross-platform, OPSEC-aware, red-teaming, post-exploitation C2 framework;
	- DaaC2 is an open-source C2 framework that makes use of Discord as a C2;
	- Koadic is an open-source post-exploitation tool;
	- TrevorC2 is a client/server model for masking command and control through web browsers.

**Figure 12.** Attack and defense components.

We represent in Equation (1) the attack as the combination of tools (T), procedures (C2F), and techniques (Th), where w represents the weights based on the tool, procedure, and technique used by the adversary.

$$\text{Attack} = w(T) + w(C2F) + w(Th)\tag{1}$$

On the other hand, cybersecurity analysts develop a set of cognitive processes to establish the defense process. From a macro perspective, the analyst must decide whether a possible event is an attack, based on the cognitive process of perception. Bitzer et al. [99] mention that perceptual decision making is applied to two-alternative forced-choice tasks to judge perceptual feature differences. According to Bitzer, drift-diffusion models have been used to quantitatively analyze behavioral data, i.e., reaction times and accuracy. In the same vein, Dale et al. [100] mention that the cognitive analysis of vast amounts of data requires the application of heuristics and that people often misjudge the likelihood of a situation by not taking all relevant data into account. However, according to Nikolić et al. [101], the application of heuristics as mental strategies and certain deformations in the thoughts and perceptions of decision makers affect their attitudes and approach to problem solving. Trueblood et al. [102] mention that we need to understand how people make perceptual decisions to improve training and minimize misdiagnoses in the medical field. Adapting this approach to cybersecurity, the defense strategy must be oriented toward the factors associated with the cognitive process. This is described in Equation (2), where R.T is the response time associated with the time for executing a defense action by a cybersecurity analyst; H.T is the heuristic thinking associated with the process of selecting a decision; B is the bias related to human thinking; and S.A is the speed accuracy in the decision-making process.

$$\text{Cognitive Process} = w_1(R.T) + w_2(H.T) + w_3(B) + w_4(S.A)\tag{2}$$

where $w_i$ is the weight assigned to each variable.

Once the cognitive process has been carried out, the best decision is made considering the weight of each variable in the cognitive process, expressed in Equation (3).

$$\text{Decision}(j) = (\Delta P_j) \cdot \text{Cognitive Process} \tag{3}$$

where $\Delta P_j$ is the variation due to weights in cognitive processes. Therefore, the defense strategy is expressed as Equation (4).

$$\text{Defense} = \text{Decision}(j) + \text{Error} \tag{4}$$
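A small numeric sketch may help show how Equations (2) to (4) chain together. Every value below is hypothetical: the paper does not specify scales, weights, or the error term, so the inputs are assumed to be normalized to the range 0 to 1 and the weights are assumed to sum to 1.

```python
def cognitive_process(rt, ht, b, sa, weights):
    """Equation (2): weighted sum of response time, heuristic thinking, bias, speed accuracy."""
    w1, w2, w3, w4 = weights
    return w1 * rt + w2 * ht + w3 * b + w4 * sa

def decision(delta_p_j, cog):
    """Equation (3): scale the cognitive process by the variation delta_p_j."""
    return delta_p_j * cog

def defense(decision_j, error):
    """Equation (4): the decision plus an error term."""
    return decision_j + error

# Assumed normalized inputs and weights, for illustration only.
cog = cognitive_process(rt=0.5, ht=0.25, b=0.5, sa=0.75, weights=(0.4, 0.2, 0.1, 0.3))
dec = decision(delta_p_j=2.0, cog=cog)
dfn = defense(decision_j=dec, error=0.05)
print(round(cog, 3), round(dec, 3), round(dfn, 3))
```

Because the model is linear in its inputs, changing a single weight shifts the defense score in proportion to the corresponding factor.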

However, analysts in the cybersecurity decision-making process could be affected by factors such as bias and speed accuracy. Bias (B) effects and speed-accuracy (SA) effects are ubiquitous in experimental psychology. Bias effects arise when the two stimulus alternatives occur with unequal frequency or have unequal rewards attached to them. Speed-accuracy effects arise as the result of explicit instructions emphasizing speed or accuracy [103]. Computational models of decision making present a solution to this problem. In particular, we choose response time (RT) models such as the drift-diffusion model (DDM), proposed by Ratcliff [103], and the linear ballistic accumulator (LBA) model, proposed by Brown [104]. Accumulator models assume that evidence is accumulated over time until a threshold amount is reached, at which point the decision maker commits to that response option. These models contain four primary parameters related to different psychological components of simple decisions: caution, bias, stimulus processing, and motor sense.
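The accumulation-to-threshold idea can be sketched with a simple simulation of a drift-diffusion trial. This is an illustrative discretization, not the fitting procedure of [103] or [104]. The mapping of the four parameters is an assumption: `threshold` stands for caution, `start` for bias, `drift` for stimulus processing, and `non_decision` for the motor component. The labels "attack" and "benign" and all numeric values are hypothetical.

```python
import random

def simulate_ddm(drift, threshold, start=0.0, noise=1.0, dt=0.001,
                 non_decision=0.3, max_time=5.0, rng=None):
    """Simulate one trial; return (choice, response_time) or (None, max_time)."""
    if rng is None:
        rng = random.Random()
    evidence, t = start, 0.0
    while t < max_time:
        # Euler step: deterministic drift plus Gaussian noise scaled by sqrt(dt).
        evidence += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        t += dt
        if evidence >= threshold:
            return "attack", t + non_decision
        if evidence <= -threshold:
            return "benign", t + non_decision
    return None, max_time  # no boundary reached within max_time

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [simulate_ddm(drift=1.5, threshold=1.0, rng=rng) for _ in range(200)]
decided = [(choice, rt) for choice, rt in trials if choice is not None]
accuracy = sum(choice == "attack" for choice, _ in decided) / len(decided)
mean_rt = sum(rt for _, rt in decided) / len(decided)
print(f"accuracy={accuracy:.2f}, mean RT={mean_rt:.2f} s")
```

Raising `threshold` makes the simulated analyst slower but more accurate, which is the speed-accuracy effect, while moving `start` toward one boundary models a bias toward that response.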
