All data represent real interactions between external agents and the honeypot, without synthetic traffic or data anonymization.
In the MICRA prototype, the Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv subset of the CSE-CIC-IDS 2017 dataset was used. It captures the port-scanning (PortScan) scenario recorded during working hours on a Friday afternoon and contains flows labeled BENIGN or PortScan, combining benign and malicious traffic in realistic proportions. This allows the detection and response pipeline to be exercised under conditions comparable to those of a real corporate network. Combined with the genuine events captured by the Cowrie honeypot, the subset provides a test environment that mixes legitimate behavior and suspicious activity, supporting the MICRA proof of concept without resorting to synthetic data.
5.4.2. Reproducibility
To ensure methodological transparency and facilitate independent replication of the experiments, this section details the steps performed in each module implemented in the MICRA prototype. We address aspects such as the operating environment, tools, scripts used, and any specific procedures adopted to enable the reported results to be reproduced by other researchers. Each subsection includes a brief but accurate description of the steps necessary for direct replication of the experiments performed.
M1.1—Deceptive Data Streaming
The vm-honeypot virtual machine was provisioned with the Cowrie honeypot, installed and configured according to the official documentation [30]. During the experimental period, Cowrie generated session logs exclusively from public-facing SSH (TCP port 22) and Telnet (TCP port 23) services exposed to the internet.
The events used in this study were extracted solely from the honeypot logs recorded during this period. These were consolidated and exported in JSON format under the filename cowrie.json, which is included in the Supplementary Material of this article.
To facilitate processing and analysis, the cowrie.json file was securely transferred from vm-honeypot to vm-core using the SCP (Secure Copy Protocol), preserving data integrity for subsequent stages of the MICRA pipeline.
M1.2—Network Data Streaming
The CSE-CIC-IDS 2017 dataset was obtained directly from the official repository maintained by the Canadian Institute for Cybersecurity (CIC) [74]. After downloading and extracting the dataset on the data capture virtual machine (vm-sensor), we selected the specific subset titled Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv, provided in CSV format.
This subset was chosen due to its relevance to network-based threat detection scenarios and is included in the Supplementary Material of this article to support independent validation. The selected file was subsequently transferred to the central virtual machine (vm-core) using the Secure Copy Protocol (SCP), ensuring consistency with the data-handling procedures adopted throughout the prototype.
M2.1—Pattern Recognition Threat Analyzer
The M2.1 submodule was implemented as a deterministic pattern analyzer responsible for identifying Indicators of Compromise (IoCs) extracted from honeypot logs and applying them to real network traffic. This module plays a key role in the MICRA prototype by translating malicious interactions observed in controlled environments into actionable intelligence within operational flows.
In this implementation, the system reads the cowrie.json file line by line and extracts a set of IoCs, including IP addresses, domain names, full URLs, remote file names, protocol banners (e.g., SSH or HTTP user-agent strings), and file hashes (MD5, SHA-1, SHA-256), retrieved either from commands (input) or from the shasum field. These IoCs are consolidated in the file honeypot_ioc.csv.
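For illustration, the extraction step can be condensed as follows. This is a minimal sketch rather than the verbatim m21_analyzer_deterministic.py: it assumes the standard Cowrie event fields (src_ip, input, shasum) and shows only the IP, URL, and hash extractors; domains, file names, and banners follow the same regex-based pattern.

```python
# Minimal sketch of the IoC extraction (illustrative, not the verbatim script).
import json
import re
import csv

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"']+")
HASH_RE = re.compile(r"\b[a-fA-F0-9]{64}\b|\b[a-fA-F0-9]{40}\b|\b[a-fA-F0-9]{32}\b")

iocs = set()
with open("cowrie.json", encoding="utf-8") as fh:
    for line in fh:                          # one JSON event per line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("src_ip"):
            iocs.add(event["src_ip"])
        if event.get("shasum"):              # file hash recorded by Cowrie
            iocs.add(event["shasum"])
        command = event.get("input") or ""   # attacker command line
        iocs.update(IP_RE.findall(command))
        iocs.update(URL_RE.findall(command))
        iocs.update(HASH_RE.findall(command))

with open("honeypot_ioc.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["ioc"])
    for ioc in sorted(iocs):
        writer.writerow([ioc])
```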
The network traffic analyzed corresponds to the subset Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv from the CSE-CIC-IDS 2017 dataset. For each record, the script performs a full field-wise scan, comparing all values against the extracted IoC set using strict equality. If any match is found, the record is labeled with suspect_network_honeypot.
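The field-wise matching itself reduces to a strict-equality scan over every column, as in the following sketch (illustrative; the name of the added label column is an assumption based on the description above):

```python
# Minimal sketch of the field-wise IoC matching (illustrative).
import pandas as pd

iocs = set(pd.read_csv("honeypot_ioc.csv")["ioc"].astype(str))
flows = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")

# A row matches if any of its values, compared as a trimmed string,
# is strictly equal to a known IoC.
def row_matches(row) -> bool:
    return any(str(value).strip() in iocs for value in row)

mask = flows.apply(row_matches, axis=1)
flagged = flows[mask].copy()
flagged["label"] = "suspect_network_honeypot"   # assumed column name
flagged.to_csv("network_stream_honeypot_labeled.csv", index=False)
print(f"[+] {len(flagged)} rows labeled and exported")
```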
All flagged entries are saved to a new file, network_stream_honeypot_labeled.csv, which is included in the Supplementary Material of this article. The entire process was executed on the vm-core machine using Python 3.12.
Input Files
cowrie.json—Honeypot interaction log containing attacker commands and metadata.
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv—Netflow records from the CSE-CIC-IDS 2017 dataset, representing real-world network traffic.
Output Files
honeypot_ioc.csv—Consolidated list of unique IoCs extracted from the honeypot logs.
network_stream_honeypot_labeled.csv—Subset of netflow entries flagged as malicious based on IoC matching.
Reproducibility Instructions
To ensure reproducibility, the analysis script m21_analyzer_deterministic.py is provided, along with the required datasets. To replicate the results, execute the following steps:
1. Place the required input files in the same directory as the script.
2. Install the necessary dependencies:
pip install pandas
3. Run the analysis:
python m21_analyzer_deterministic.py
Execution Results
[+] 437 IoCs saved to honeypot_ioc.csv
[+] 0 rows labeled and exported to network_stream_honeypot_labeled.csv
This modular and transparent approach reinforces the MICRA design principle of adaptability. While the current implementation is based on direct matching of known IoCs, this submodule can be replaced or extended to incorporate alternative detection strategies, including probabilistic or hybrid techniques, without compromising the architectural integrity.
M2.2—Heuristic Threat Analyzer
The M2.2 submodule was implemented as a supervised classification engine responsible for detecting malicious behavior in network traffic using heuristic indicators. Its objective is to simulate real-world application of machine learning techniques to classify complex behavioral patterns associated with known malware, brute-force attempts, and coordinated scanning activities. The submodule operates on labeled datasets and employs standard evaluation metrics to assess detection performance.
For this prototype, the subset Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv from the public dataset CSE-CIC-IDS 2017 was used. The dataset was split into three parts: 70% for training, 20% for evaluation and metrics calculation, and 10% reserved for runtime inference simulation. The latter portion was saved as new_flow.csv, emulating future real-time scenarios.
The script m22_heuristic_multi.py supports parallel evaluation of multiple classifiers. In this execution, seven models were tested: Decision Tree, Random Forest, Gradient Boosting, Extra Trees, Logistic Regression, XGBoost, and LightGBM. Each model was assessed based on Accuracy, Precision, Recall, F1-score (malicious class), AUC-ROC, False Positive Rate (FPR), and Coverage. The best-performing model was selected automatically using a composite criterion (highest F1-score and lowest FPR) and exported as m22_model_best.joblib for future inference. The final labeling from this model was saved as network_stream_heuristic_labeled.csv.
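The selection logic can be illustrated with the condensed sketch below, which trains a reduced set of models and applies the same composite criterion (highest F1-score, then lowest FPR). The column handling and the two-stage 70/20/10 split are assumptions consistent with the description above, not the verbatim m22_heuristic_multi.py.

```python
# Condensed sketch of the multi-model evaluation and selection (illustrative).
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")
df.columns = df.columns.str.strip()
y = (df["Label"] != "BENIGN").astype(int)            # 1 = malicious
X = df.select_dtypes("number").replace([float("inf"), -float("inf")], 0).fillna(0)

# 70% train, 20% test, 10% held out as new_flow.csv for runtime simulation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.7, random_state=42)
X_test, X_new, y_test, _ = train_test_split(X_rest, y_rest, train_size=2/3, random_state=42)
X_new.to_csv("new_flow.csv", index=False)

models = {"rf": RandomForestClassifier(n_estimators=100, random_state=42),
          "lr": LogisticRegression(max_iter=1000)}

results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    results.append({"name": name, "model": model,
                    "f1": f1_score(y_test, pred), "fpr": fp / (fp + tn)})

# Composite criterion: highest F1-score first, lowest FPR as tie-breaker.
best = sorted(results, key=lambda r: (-r["f1"], r["fpr"]))[0]
joblib.dump(best["model"], "m22_model_best.joblib")
```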
The entire process was executed on the vm-core machine using Python 3.12 and the libraries pandas, scikit-learn, xgboost, lightgbm, and joblib.
Input Files
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv: Raw labeled netflow dataset used for training and evaluation.
Output Files
network_stream_heuristic_<model>.csv: Flows labeled as suspicious by each evaluated model.
network_stream_heuristic_labeled.csv: Final output using the best classifier.
m22_model_best.joblib: Serialized model with best F1-score and lowest FPR.
m22_comparison_metrics.csv: Summary of metrics for all evaluated models.
new_flow.csv: 10% subset reserved to simulate unseen real-time flows.
Reproducibility Instructions
1. Install dependencies:
pip install pandas scikit-learn xgboost lightgbm joblib
2. Prepare the dataset: ensure the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv is in the same directory as the script.
3. Run the script:
python m22_heuristic_multi.py --models dt rf gb et lr xgb lgb
This command will:
- Train and evaluate all models (Tables 7–21);
- Export labeled flows for each model;
- Automatically select and serialize the best model;
- Save a CSV file for runtime simulation.
Execution Results
[+] Loading dataset…
[✓] Runtime exported as: new_flow.csv
[✓] Model: Decision Tree
Table 7. Decision tree: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 0.9999 |
| AUC-ROC | 0.9999 |
| False Positives (FP) | 1 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5502 |
Table 8. Decision tree: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_decision_tree.csv
[✓] Model: Random Forest
Table 9. Random forest: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5501 |
Table 10. Random forest: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_random_forest.csv
[✓] Model: Gradient Boosting
Table 11. Gradient boosting: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9998 |
| F1-score (Malicious) | 0.9999 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 5 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0002 |
| Coverage | 0.5501 |
Table 12. Gradient boosting: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_gradient_boosting.csv
[✓] Model: Extra Trees
Table 13. Extra trees: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5501 |
Table 14. Extra trees: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_extra_trees.csv
[✓] Model: Logistic Regression
Table 15. Logistic regression: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9630 |
| Precision (Malicious) | 0.9402 |
| Recall (Malicious) | 0.9962 |
| F1-score (Malicious) | 0.9674 |
| AUC-ROC | 0.9854 |
| False Positives (FP) | 1998 |
| False Negatives (FN) | 121 |
| Specificity (TN rate) | 0.9225 |
| False Positive Rate | 0.0775 |
| False Negative Rate | 0.0038 |
| Coverage | 0.5830 |
Table 16. Logistic regression: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 0.99 | 0.92 | 0.96 | 25,771 |
| MALICIOUS | 0.94 | 1.00 | 0.97 | 31,523 |
| accuracy | | | 0.96 | 57,294 |
| macro avg | 0.97 | 0.96 | 0.96 | 57,294 |
| weighted avg | 0.96 | 0.96 | 0.96 | 57,294 |
[✓] Exported: network_stream_heuristic_logistic_regression.csv
[✓] Model: XGBoost
Table 17. XGBoost: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 1.0000 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 1 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0000 |
| Coverage | 0.5502 |
Table 18. XGBoost: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_xgboost.csv
[✓] Model: LightGBM
Table 19. LightGBM: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 1.0000 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 1 |
| False Negatives (FN) | 0 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0000 |
| Coverage | 0.5502 |
Table 20. LightGBM: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_lightgbm.csv
[✓] Comparison table saved as: m22_comparison_metrics.csv
[✓] Summary table:
Table 21. Summary of classifier metrics.

| Model | F1 | Accuracy | FPR | ROC-AUC |
|---|---|---|---|---|
| Decision Tree | 0.999937 | 0.999930 | 0.000039 | 0.999933 |
| Random Forest | 0.999952 | 0.999948 | 0.000000 | 1.000000 |
| Gradient Boosting | 0.999921 | 0.999913 | 0.000000 | 0.999992 |
| Extra Trees | 0.999952 | 0.999948 | 0.000000 | 1.000000 |
| Logistic Regression | 0.967361 | 0.963015 | 0.077529 | 0.985449 |
| XGBoost | 0.999984 | 0.999983 | 0.000000 | 1.000000 |
| LightGBM | 0.999984 | 0.999983 | 0.000039 | 1.000000 |
[✓] Best model: LightGBM—saved as: m22_model_best.joblib
[✓] Consolidated output → network_stream_heuristic_labeled.csv
Total execution time: 209.41 s
M2.3—Behavioral Threat Insights
The M2.3 submodule was implemented as an unsupervised anomaly detection engine aimed at identifying behavioral deviations in network traffic without relying on pre-labeled data. It integrates multiple models to infer abnormal patterns based on statistical and clustering-based methods, thereby enabling a robust detection pipeline for unknown or evolving threats.
In this implementation, the system processes the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv, a subset of the CSE-CIC-IDS 2017 dataset. All numeric features are standardized and submitted to four anomaly detection algorithms: Isolation Forest (ISO), Local Outlier Factor (LOF), PCA reconstruction error (PCA), and K-Means distance to centroid (KM).
Each model is configured to detect the top 1% most anomalous flows in the dataset. PCA and KM are preceded by dimensionality reduction steps to enhance performance and accuracy. After execution, the results from all models are merged to calculate a composite field named behavioral_score, representing the number of models that classified the same flow as anomalous (ranging from 0 to 4).
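The scoring step can be sketched as follows. This is illustrative only: a single detector of the four is shown, and the per-model flag column names are assumptions.

```python
# Minimal sketch of the behavioral_score composition (illustrative).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

df = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")
X = df.select_dtypes("number").replace([np.inf, -np.inf], 0).fillna(0)
X = StandardScaler().fit_transform(X)

# Example with one of the four detectors: flag the top 1% most anomalous flows.
iso = IsolationForest(contamination=0.01, random_state=42).fit(X)
df["iso_flag"] = (iso.predict(X) == -1).astype(int)

# With all four flags in place (iso/lof/pca/km), the composite score is a sum.
flag_cols = [c for c in ["iso_flag", "lof_flag", "pca_flag", "km_flag"] if c in df]
df["behavioral_score"] = df[flag_cols].sum(axis=1)
df[df["behavioral_score"] >= 1].to_csv("network_stream_behavioral_labeled.csv", index=False)
```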
All experiments were executed on the vm-core machine using Python 3.12, and required libraries include pandas, numpy, and scikit-learn.
Input Files
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv: netflow subset of the CSE-CIC-IDS 2017 dataset submitted to the anomaly detectors.
Output Files
network_stream_behavioral_labeled.csv: full dataset with labels from any model (score ≥ 1)
m23_behavioral_score3plus.csv: filtered flows with score ≥ 3 (high-confidence anomalies)
m23_behavioral_high_confidence.csv: flows flagged by all models (score = 4)
m23_comparison_metrics.csv: anomaly count and coverage by model
m23_behavioral_ip_summary.csv: top Source → Destination IP pairs with highest aggregate scores
Reproducibility Instructions
1. Install dependencies:
pip install pandas numpy scikit-learn
2. Prepare the dataset: ensure the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv is in the same directory as the script.
3. Run the script:
python m23_behavioral_analyzer.py \
  --input "Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv" \
  --models iso lof pca km
You may choose a subset of models using --models, such as iso pca.
Execution Results
[+] Loading: Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv
[✓] Running ISO…
[✓] Model: ISO
Table 22. Isolation forest anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Avg anomaly score | −0.026 |
[✓] ISO completed in 2.98 s.
[✓] Running LOF…
[✓] Model: LOF
Table 23. Local Outlier Factor (LOF) anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Mean LOF score | 1,682,202.493 |
[✓] LOF completed in 532.12 s.
[✓] Running PCA…
[✓] Model: PCA
Table 24. PCA-based anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Explained variance | 76.92% |
[✓] PCA completed in 1.05 s.
[✓] Running KM…
[✓] Model: KM
Table 25. K-Means anomaly detection summary (with PCA preprocessing).

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Mean distance to centroid | 21.433 |
[✓] KM completed in 1.22 s.
behavioral_score distribution:
Table 26. Distribution of flows by behavioral score.

| Behavioral score | Flows (%) |
|---|---|
| Score 0 | 278,198 (97.11%) |
| Score 1 | 6015 (2.10%) |
| Score 2 | 1325 (0.46%) |
| Score 3 | 921 (0.32%) |
| Score 4 | 8 (0.00%) |
Average behavioral_score: 0.040
Score Group Summary:
Table 27. Behavioral score aggregation summary.

| Behavioral_Score | Count | Percentage |
|---|---|---|
| 0 | 278,198 | 97.11 |
| 1 | 6015 | 2.10 |
| 2 | 1325 | 0.46 |
| 3 | 921 | 0.32 |
| 4 | 8 | 0.00 |
Total execution time: 544.35 s
Glossary:
Table 28. Description of output metrics and files generated by M2.3.

| Item | Description |
|---|---|
| avg_anomaly_score | mean anomaly score from Isolation Forest |
| mean_lof_score | average inverse density from LOF |
| explained_var | % of variance explained by PCA components |
| mean_distance_to_centroid | average distance to cluster center (K-Means) |
| overlap | number of flows flagged by both models |
| jaccard | intersection/union of two detection sets |
| behavioral_score | number of models that flagged the flow (0–4) |
| coverage | proportion of dataset flagged as anomalous |
| network_stream_behavioral_labeled.csv | flows flagged by any model |
| score3plus.csv | flows flagged by ≥3 models (triage) |
| m23_behavioral_high_confidence.csv | flows flagged by all models |
| m23_comparison_metrics.csv | anomaly count and coverage by model |
| m23_behavioral_ip_summary.csv | top Source→Destination IP pairs |
[✓] Top 10 Source→Destination IPs:
Table 29. Top 10 source–destination IP pairs (high-confidence flows).

| Source IP | Destination IP | Count | Avg_Score |
|---|---|---|---|
| 192.168.10.17 | 104.197.43.56 | 3 | 4.0 |
| 192.168.10.12 | 79.127.127.5 | 1 | 4.0 |
| 192.168.10.15 | 52.84.26.193 | 1 | 4.0 |
| 192.168.10.16 | 173.241.242.220 | 1 | 4.0 |
| 192.168.10.16 | 198.54.12.96 | 1 | 4.0 |
| 192.168.10.25 | 192.168.10.3 | 1 | 4.0 |
M3.1—Threat Signature Validation
The M3.1 submodule is responsible for validating potentially malicious IP addresses observed across all previous MICRA detection modules by cross-referencing them with external threat intelligence. Specifically, the system integrates with the VirusTotal public API, allowing the enrichment of Indicators of Compromise (IoCs) by leveraging a reputational score computed from multiple antivirus and security engines.
The script m31_threat_validate.py consolidates the Top 50 public IP addresses from each of the following labeled network traffic files:
network_stream_honeypot_labeled.csv (output of M2.1);
network_stream_heuristic_labeled.csv (output of M2.2);
network_stream_behavioral_labeled.csv (output of M2.3).
Each IP is validated using VirusTotal’s v3 API, with results cached locally in SQLite to minimize redundant queries and improve performance. The validation logic applies a simple binary verdict: an IP is labeled malicious if its malicious score from VirusTotal is 1 or higher; otherwise, it is considered benign.
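The lookup-with-cache logic can be sketched as follows. This is a simplified illustration: the VirusTotal v3 endpoint and x-apikey header are the public API, while the cache table layout and the absence of rate limiting are simplifications compared with m31_threat_validate.py.

```python
# Minimal sketch of the VirusTotal v3 lookup with a local SQLite cache.
import os
import sqlite3
import requests

VT_URL = "https://www.virustotal.com/api/v3/ip_addresses/{}"
API_KEY = os.environ["VT_API_KEY"]

db = sqlite3.connect("vt_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS vt_cache (ip TEXT PRIMARY KEY, malicious INTEGER)")

def vt_malicious_count(ip: str) -> int:
    row = db.execute("SELECT malicious FROM vt_cache WHERE ip = ?", (ip,)).fetchone()
    if row is not None:                      # cache hit: skip the API call
        return row[0]
    resp = requests.get(VT_URL.format(ip), headers={"x-apikey": API_KEY}, timeout=30)
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    db.execute("INSERT OR REPLACE INTO vt_cache VALUES (?, ?)", (ip, stats["malicious"]))
    db.commit()
    return stats["malicious"]

# Binary verdict, as described above: malicious if the VT count is >= 1.
verdict = "malicious" if vt_malicious_count("8.8.8.8") >= 1 else "benign"
```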
Each verified IP is stored in a PostgreSQL table named validated_iocs, including relevant metadata such as number of reports, associated country, and the date of last analysis. The final report is also exported as a consolidated CSV for traceability.
This module reinforces MICRA’s layered defense by correlating internal detection signals with trusted external sources, increasing confidence and supporting further response or triage steps.
Input Files
network_stream_honeypot_labeled.csv (labeled flows flagged by deterministic pattern-matching based on honeypot IoCs);
network_stream_heuristic_labeled.csv (labeled flows from supervised ML models);
network_stream_behavioral_labeled.csv (labeled flows from behavioral anomaly detection).
Output Files
m31_validated_ips.csv: consolidated VirusTotal enrichment results for the top 50 public IPs from each input file (Table 30).
PostgreSQL table validated_iocs: stores normalized IP reputation verdicts and metadata for long-term querying.
Reproducibility Instructions
1. Set the environment variable for database access:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Ensure the VirusTotal API key is configured, either by editing VT_API_KEY directly in the script or via:
export VT_API_KEY="your_api_key_here"
3. Install required dependencies:
pip install pandas requests psycopg2-binary
4. Run the validation:
python m31_threat_validate.py \
  --m21 network_stream_honeypot_labeled.csv \
  --m22 network_stream_heuristic_labeled.csv \
  --m23 network_stream_behavioral_labeled.csv
Execution Results
[+] network_stream_honeypot_labeled.csv: top 0 public IPs collected
[+] network_stream_heuristic_labeled.csv: top 1 public IPs collected
[+] network_stream_behavioral_labeled.csv: top 50 public IPs collected
[+] Total unique IPs to check: 51
Table 30. Public IPs validated through VirusTotal.
(1/51) 104.31.91.87 | VT_malicious = 0 | (2/51) 104.97.120.94 | VT_malicious = 0 |
(3/51) 104.97.137.26 | VT_malicious = 0 | (4/51) 106.122.252.16 | VT_malicious = 0 |
(5/51) 141.170.25.54 | VT_malicious = 0 | (6/51) 151.101.21.127 | VT_malicious = 0 |
(7/51) 157.240.18.19 | VT_malicious = 0 | (8/51) 157.240.18.35 | VT_malicious = 3 |
(9/51) 157.240.2.25 | VT_malicious = 0 | (10/51) 157.240.2.35 | VT_malicious = 0 |
(11/51) 160.17.5.1 | VT_malicious = 2 | (12/51) 162.213.33.50 | VT_malicious = 0 |
(13/51) 17.253.14.125 | VT_malicious = 0 | (14/51) 172.217.10.110 | VT_malicious = 0 |
(15/51) 172.217.10.130 | VT_malicious = 0 | (16/51) 172.217.10.226 | VT_malicious = 0 |
(17/51) 172.217.10.66 | VT_malicious = 0 | (18/51) 172.217.12.162 | VT_malicious = 0 |
(19/51) 172.217.12.174 | VT_malicious = 0 | (20/51) 172.217.12.206 | VT_malicious = 0 |
(21/51) 172.217.3.110 | VT_malicious = 0 | (22/51) 172.217.3.98 | VT_malicious = 0 |
(23/51) 172.217.6.194 | VT_malicious = 0 | (24/51) 172.217.9.226 | VT_malicious = 0 |
(25/51) 173.241.242.143 | VT_malicious = 0 | (26/51) 178.124.129.12 | VT_malicious = 0 |
(27/51) 178.255.83.1 | VT_malicious = 0 | (28/51) 192.229.211.82 | VT_malicious = 0 |
(29/51) 192.82.242.23 | VT_malicious = 0 | (30/51) 217.118.87.98 | VT_malicious = 0 |
(31/51) 31.13.71.36 | VT_malicious = 1 | (32/51) 31.13.71.7 | VT_malicious = 0 |
(33/51) 31.13.80.12 | VT_malicious = 1 | (34/51) 37.209.240.1 | VT_malicious = 0 |
(35/51) 50.63.243.230 | VT_malicious = 0 | (36/51) 62.161.94.230 | VT_malicious = 0 |
(37/51) 63.251.240.12 | VT_malicious = 0 | (38/51) 67.72.99.137 | VT_malicious = 0 |
(39/51) 68.67.178.111 | VT_malicious = 0 | (40/51) 68.67.180.12 | VT_malicious = 1 |
(41/51) 69.172.216.111 | VT_malicious = 0 | (42/51) 69.4.95.11 | VT_malicious = 0 |
(43/51) 72.21.81.48 | VT_malicious = 0 | (44/51) 74.117.200.68 | VT_malicious = 0 |
(45/51) 74.121.138.87 | VT_malicious = 0 | (46/51) 8.0.6.4 | VT_malicious = 0 |
(47/51) 8.43.72.97 | VT_malicious = 0 | (48/51) 8.43.72.98 | VT_malicious = 0 |
(49/51) 8.6.0.1 | VT_malicious = 0 | (50/51) 91.236.51.44 | VT_malicious = 0 |
(51/51) 93.184.216.180 | VT_malicious = 0 | | |
[✓] 5 malicious IPs inserted/updated in PostgreSQL.
[✓] Full report saved at: /home/linuxman/scripts/micra/results_m31/m31_validated_ips.csv
M3.2—Expert Intelligence Validation
The M3.2 submodule was implemented to incorporate human-driven threat intelligence into the unified threat indicator database used by the MICRA architecture. It allows analysts to manually validate, override, or complement indicators of compromise (IoCs) previously obtained through automated modules (M3.1) or other pipelines. This ensures that expert domain knowledge is properly integrated into the detection and decision-making process.
The input consists of a structured CSV file (expert_iocs.csv) containing analyst-verified IoCs. Each entry must explicitly declare the indicator value, type (e.g., IP, domain, URL, hash), verdict (malicious, benign, or false_positive), the analyst name, and an optional rationale. Upon execution, the script m32_manual_validate.py reads the file and performs either upserts or soft deletions into the unified PostgreSQL table validated_iocs. Fields such as score, reports, and country, typically filled by automated sources like VirusTotal, are left untouched in this stage (Tables 31–34).
The submodule enforces strict data validation rules, provides detailed execution feedback (including per-analyst and per-verdict statistics), and supports auditability and traceability by persisting the analyst, rationale, and updated_at fields.
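A minimal sketch of the upsert step is shown below. It assumes the CSV columns described above and that ioc acts as the conflict key in validated_iocs; both are assumptions that may differ from the actual m32_manual_validate.py.

```python
# Minimal sketch of the analyst-IoC upsert (illustrative; assumes ioc is unique).
import csv
import os
import psycopg2

UPSERT = """
INSERT INTO validated_iocs (ioc, ioc_type, verdict, source, analyst, rationale, updated_at)
VALUES (%s, %s, %s, 'manual', %s, %s, now())
ON CONFLICT (ioc) DO UPDATE
SET verdict = EXCLUDED.verdict,
    analyst = EXCLUDED.analyst,
    rationale = EXCLUDED.rationale,
    updated_at = now();
"""

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur, open("expert_iocs.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        if row["verdict"] not in ("malicious", "benign", "false_positive"):
            continue                     # strict validation: skip invalid rows
        cur.execute(UPSERT, (row["ioc"], row["ioc_type"], row["verdict"],
                             row["analyst"], row.get("rationale", "")))
```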
This module reinforces the MICRA design principle of human-in-the-loop intelligence validation, enabling cybersecurity professionals to intervene, contextualize, or contest automated inferences.
Input Files
expert_iocs.csv: structured CSV file containing analyst-verified IoCs.
Output Files
PostgreSQL table validated_iocs: updated with analyst verdicts, rationale, and audit metadata (analyst, updated_at).
Reproducibility Instructions
1. Prepare your environment:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Create or edit the input file expert_iocs.csv with analyst judgments.
3. Run the script:
python m32_manual_validate.py --csv expert_iocs.csv
Execution Results
Import Summary
Table 31. Summary of expert-validated IoC entries processed from CSV input.

| Metric | Value |
|---|---|
| Total entries in CSV | 49 |
| Valid inserts/updates | 49 |
| Deletions (hard/soft) | 0 |
| Invalid/ignored rows | 0 |
Verdict Breakdown
Table 32. Distribution of analyst-assigned verdicts for imported threat indicators.

| Verdict | Count |
|---|---|
| malicious | 22 |
| benign | 14 |
| false_positive | 13 |
Analysts
Table 33. Number of IoCs contributed by each human analyst during validation.

| Analyst | IoCs |
|---|---|
| Bob | 15 |
| Eve | 11 |
| Carol | 10 |
| Alice | 9 |
| Dave | 4 |
IOC Types
Table 34. Breakdown of IoC types included in the manually curated dataset.

| Type | Count |
|---|---|
| ip | 14 |
| domain | 14 |
| url | 14 |
| hash | 6 |
| junk | 1 |
[✓] Operation complete.
M3.3—Strategic Data Validation Hub
The M3.3 submodule serves as the centralized repository of validated threat intelligence within the MICRA architecture. It is designed to store and maintain Indicators of Compromise (IoCs) that have been verified as malicious or classified by expert analysis. This module provides a reliable and consistent source of ground truth, supporting decision-making, inference, and automated response across all downstream components.
Rather than performing detection or analysis itself, M3.3 functions as a persistent intelligence layer. It enables the consolidation of external threat data, manual assessments, and outputs from other analytical modules into a single authoritative structure. This unified repository facilitates streamlined access to high-confidence IoCs, promoting interoperability, auditability, and scalability within security workflows.
The centralized repository is maintained in a PostgreSQL 15 database running in a Docker container. The schema (Table 35) consists of a single table (validated_iocs) populated incrementally by upstream modules. The M3.3 implementation runs on vm-core.
Table structure—validated_iocs
Table 35. Schema definition of the validated_iocs table used for centralized threat intelligence storage.

| Column | Type | Description |
|---|---|---|
| ioc | TEXT | The IoC value (IP, domain, URL, hash, etc.) |
| ioc_type | TEXT | Type of indicator (ip, domain, url, hash, etc.) |
| verdict | TEXT | Classification result: malicious, benign, or false_positive |
| score | INTEGER | Confidence score (if available) |
| reports | INTEGER | Number of reports or sightings |
| country | TEXT | Country of origin (applicable for IP addresses) |
| last_report | DATE | Date of last known report |
| source | TEXT | Origin of the data (e.g., virustotal, manual) |
| analyst | TEXT | Analyst responsible for manual classification (if applicable) |
| rationale | TEXT | Justification or reasoning for the classification |
| updated_at | TIMESTAMPTZ | Timestamp of the last update |
This structure ensures traceability, consistency, and extensibility for future integration with other security platforms, including SIEMs, CTI feeds, and response engines.
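For reference, one possible DDL matching Table 35 is shown below. This is a sketch: the primary-key choice and the CHECK constraint are assumptions not stated in the schema description.

```sql
-- One possible DDL for validated_iocs (illustrative; constraints are assumptions).
CREATE TABLE IF NOT EXISTS validated_iocs (
    ioc         TEXT PRIMARY KEY,
    ioc_type    TEXT,
    verdict     TEXT CHECK (verdict IN ('malicious', 'benign', 'false_positive')),
    score       INTEGER,
    reports     INTEGER,
    country     TEXT,
    last_report DATE,
    source      TEXT,
    analyst     TEXT,
    rationale   TEXT,
    updated_at  TIMESTAMPTZ DEFAULT now()
);
```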
M4.1—Dynamic Perimeter Defense
The M4.1 submodule implements an automated perimeter protection mechanism by remotely applying iptables firewall rules to a Linux-based border host. Its purpose is to block confirmed malicious IP addresses, previously validated and stored in the centralized PostgreSQL repository, thereby preventing inbound or outbound traffic to known threats. The execution strategy is idempotent: before inserting any new rule, the system checks for existing entries to avoid duplication or conflict.
This mechanism reinforces the proactive defense posture of MICRA by transforming threat intelligence into real-time network enforcement actions at the perimeter level.
Architecture and Tooling
The script is executed from the analysis node (vm-core), which connects via SSH to the target perimeter host (vm-misp, running Ubuntu 24.04 LTS). The firewall on vm-misp is managed with iptables. The SSH user (micra) has a sudo rule that allows execution of /usr/sbin/iptables without a password prompt (NOPASSWD), enabling full automation without manual intervention.
The submodule is implemented in the script m41_dynamic_perimeter_block.py, which uses the libraries psycopg2, paramiko and dotenv to query the database and enforce the firewall policies.
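The idempotent check-then-insert pattern can be sketched as follows. This is illustrative: m41_dynamic_perimeter_block.py additionally loads the IP list from PostgreSQL and reads connection settings from the environment.

```python
# Minimal sketch of the idempotent perimeter-blocking step (illustrative).
import paramiko

def block_ip(ssh: paramiko.SSHClient, ip: str) -> None:
    # 'iptables -C' exits non-zero when the rule does not exist yet,
    # which makes the subsequent insertion idempotent.
    check = f"sudo /usr/sbin/iptables -C INPUT -s {ip} -j DROP"
    _, stdout, _ = ssh.exec_command(check)
    if stdout.channel.recv_exit_status() != 0:
        ssh.exec_command(f"sudo /usr/sbin/iptables -I INPUT -s {ip} -j DROP")
        print(f"Blocked {ip}")

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("vm-misp", username="micra")
for ip in ["91.236.51.44"]:          # normally read from validated_iocs
    block_ip(ssh, ip)
ssh.close()
```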
Input Files
PostgreSQL table validated_iocs: the script queries this table directly via the connection string defined in the PG_DSN environment variable.
Output Files
The script does not generate output files by default, but all commands and results are printed to the console (Table 36). Redirecting stdout (e.g., > results.txt) is recommended for auditing purposes.
Reproducibility Instructions
1. Install required libraries:
pip install psycopg2-binary paramiko python-dotenv
2. Ensure PostgreSQL access and environment configuration:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export MISP_HOST="vm-misp"
export MISP_USER="micra"
3. Ensure SSH and sudo configuration on the target host (vm-misp):
micra ALL=(ALL) NOPASSWD: /usr/sbin/iptables
4. Run the script from vm-core:
python m41_dynamic_perimeter_block.py
Execution Results
[+] Attempting to block 27 IPs on vm-misp…
Table 36. List of 27 malicious IPs successfully blocked via iptables on the perimeter host (vm-misp).
Blocked 102.157.44.105 | Blocked 105.158.118.241 |
Blocked 111.111.111.111 | Blocked 112.112.112.112 |
Blocked 113.113.113.113 | Blocked 113.169.187.159 |
Blocked 123.123.123.123 | Blocked 134.35.9.209 |
Blocked 139.195.43.166 | Blocked 147.185.221.30 |
Blocked 154.94.232.230 | Blocked 157.240.18.35 |
Blocked 160.17.5.1 | Blocked 172.64.80.1 |
Blocked 185.103.100.63 | Blocked 185.143.223.69 |
Blocked 193.233.171.95 | Blocked 234.234.234.234 |
Blocked 31.13.71.36 | Blocked 31.13.80.12 |
Blocked 45.12.112.91 | Blocked 68.67.180.12 |
Blocked 68.83.169.91 | Blocked 82.102.21.123 |
Blocked 89.89.89.89 | Blocked 91.108.245.232 |
Blocked 91.236.51.44 | |
Blocking process completed.
M4.2—Dynamic Endpoint Defense
The M4.2 submodule is responsible for synchronizing confirmed malicious IP addresses with endpoint agents in real time, leveraging the native integration capabilities of the Wazuh platform. By maintaining an up-to-date block list within the Wazuh Manager, the system ensures that all connected endpoints receive and enforce perimeter rules dynamically, supporting distributed protection in a scalable and automated fashion.
This defense mechanism enhances MICRA’s architecture by extending threat mitigation beyond the network perimeter (M4.1), reaching down to each enrolled endpoint in a coordinated and secure manner.
Implementation Architecture
The Wazuh Manager runs inside a Docker container tagged wazuh.manager. The MICRA core system (vm-core) executes the script m42_sync_endpoint_wazuh.py, which performs the following operations:
Queries the PostgreSQL validated_iocs table for up to 1000 recently validated malicious IPs, excluding private ranges;
Generates a block list file in the expected Wazuh format;
Packages the list as a tar archive and transfers it into the container directory /var/ossec/etc/shared/lists/;
Restarts the Wazuh Manager via wazuh-control, triggering automatic distribution to all agents.
This workflow supports secure and idempotent updates without requiring direct access to the host filesystem or containers.
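The container-side deployment can be sketched with the Docker SDK as follows. This is illustrative: the one-key-per-line list format follows Wazuh's CDB-list convention but is an assumption here, and the container name is taken from the execution log below.

```python
# Minimal sketch of the Wazuh block-list deployment (illustrative).
import io
import tarfile
import docker

ips = ["91.236.51.44", "45.12.112.91"]                # normally read from validated_iocs
payload = "".join(f"{ip}:\n" for ip in ips).encode()  # assumed CDB list: one key per line

# put_archive() expects a tar stream, so the list is packaged in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="blocklist_20250731.lst")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

client = docker.from_env()
container = client.containers.get("single-node-wazuh.manager-1")
container.put_archive("/var/ossec/etc/shared/lists/", buf.read())
container.exec_run("/var/ossec/bin/wazuh-control restart")  # propagate to agents
```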
Input Files
PostgreSQL Table: validated_iocs
Output Files
blocklist_<DATE>.lst (e.g., blocklist_20250731.lst), placed under /var/ossec/etc/shared/lists/ in the Wazuh Manager container: this file is automatically synchronized with all Wazuh agents and enforces dynamic blocking policies.
Reproducibility Instructions
1. Ensure prerequisites are met:
- PostgreSQL service running and accessible from vm-core;
- Docker installed and the Wazuh Manager container active;
- Environment variable PG_DSN set with the proper connection string.
2. Install required Python packages:
pip install psycopg2-binary docker
3. Execute the script:
python m42_sync_endpoint_wazuh.py
Execution Results
Generated: /tmp/tmpu9xdiloz → blocklist_20250731.lst
Target container: single-node-wazuh.manager-1 (OK)
Total IPs in database: 31
IPs synced (latest 1000): 31
Top 10 IPs to sync:
82.102.21.123
193.233.171.95
185.103.100.63
91.236.51.44
91.108.245.232
45.12.112.91
134.35.9.209
139.195.43.166
185.143.223.69
68.83.169.91
Copied to single-node-wazuh.manager-1:/var/ossec/etc/shared/lists/blocklist_20250731.lst
Manager restarted—rules will propagate in ~1 min
M4.3—Dynamic Network Intrusion Prevention
The M4.3 submodule dynamically updates network-level intrusion prevention mechanisms by generating a Suricata ruleset based on the centralized threat intelligence repository. Its goal is to ensure that newly validated malicious IPs are automatically transformed into blocking rules and distributed to perimeter sensors.
This module connects to the PostgreSQL database to extract all malicious IPs stored in the validated_iocs table. Each IP is converted into a Suricata drop rule and saved to a dedicated rules file named micra_ioc.rules. The rules are remotely deployed via SSH to the vm-sensor machine, where Suricata runs inside a Docker container. Once transferred, the module triggers a rule reload to ensure immediate enforcement.
The ruleset uses a reserved SID range (from 6,000,000 onward) and includes metadata such as timestamp and classification for easier traceability.
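The rule-generation loop reduces to a few lines, sketched below. This is illustrative: the message text and header comment may differ slightly from the real micra_ioc.rules produced by m43_sync_suricata_rules.py.

```python
# Minimal sketch of the Suricata rule generation (illustrative).
from datetime import date

SID_BASE = 6_000_000  # reserved SID range for MICRA-generated rules

def build_rules(ips):
    lines = [f"# MICRA IoC ruleset - generated {date.today().isoformat()}"]
    for offset, ip in enumerate(sorted(ips), start=1):
        lines.append(
            f'drop ip any any -> {ip} any '
            f'(msg:"MICRA IOC - Malicious IP {ip}"; '
            f'sid:{SID_BASE + offset}; rev:1; classtype:trojan-activity;)'
        )
    return "\n".join(lines) + "\n"

with open("micra_ioc.rules", "w") as fh:
    fh.write(build_rules(["91.108.245.232"]))   # normally read from validated_iocs
```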
Input Files
PostgreSQL table validated_iocs: source of all malicious IPs converted into rules.
Output Files
micra_ioc.rules: Suricata ruleset deployed to /opt/suricata/rules/ on vm-sensor. Example rule:
drop ip any any -> 91.108.245.232 any (msg:"MICRA IOC - Malicious IP 91.108.245.232"; sid:6000001; rev:1; classtype:trojan-activity;)
Reproducibility Instructions
1. Install dependencies:
pip install psycopg2-binary paramiko
2. Set environment variables:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export SENSOR_HOST="vm-sensor"
export SSH_KEY="/home/linuxman/.ssh/id_rsa"  # adjust as needed
export SSH_USER="linuxman"
3. Ensure permissions on vm-sensor: the SSH user must have write access to /opt/suricata/rules (for testing, this can be granted via chmod 777).
4. Execute the script:
python m43_sync_suricata_rules.py
Execution Results
File deployed to vm-sensor:/opt/suricata/rules/micra_ioc.rules
Total rules written: 31
Top 10 IPs:
82.102.21.123
193.233.171.95
185.103.100.63
91.236.51.44
91.108.245.232
45.12.112.91
134.35.9.209
139.195.43.166
185.143.223.69
68.83.169.91
Suricata reloaded successfully: {"message": "done", "return": "OK"}
31 rules successfully active in Suricata.
M5.1—SQL Threat Search Engine
The M5.1—SQL Threat Search Engine module implements a modular hunt generation and ingestion engine, designed to convert validated threat intelligence indicators (IoCs) into structured SQL queries. These queries, referred to as hunts, are stored and versioned for execution in Security Information and Event Management (SIEM) platforms or PostgreSQL-compatible telemetry databases.
The hunt engine allows analysts and automated systems to investigate patterns of malicious behavior retrospectively using structured queries. Each hunt is versioned with metadata (e.g., title, severity, tags) and saved both in .sql and .yml formats. Once results are collected from the SIEM or SQL engine, they can be ingested back into the MICRA system for correlation, historical tracking, and triage.
This submodule is composed of two main scripts:
m51_build_hunts.py: Generates SQL hunt templates using validated IP-based IoCs.
m51_ingest_hunts.py: Loads the CSV outputs of executed hunts into the MICRA database (suspect_historical_sql table) for storage and further analysis.
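A condensed sketch of the hunt builder is shown below. The telemetry table queried by the hunt (network_flows) is hypothetical, since the actual target schema depends on the SIEM or telemetry database in use; the metadata fields mirror the description above.

```python
# Minimal sketch of hunt generation (illustrative; network_flows is hypothetical).
import os
import yaml

os.makedirs("hunts_out", exist_ok=True)

ioc_contact_sql = """
SELECT src_ip, dst_ip, ts
FROM network_flows
WHERE dst_ip IN (SELECT ioc FROM validated_iocs
                 WHERE verdict = 'malicious' AND ioc_type = 'ip');
"""

hunt_meta = {"id": "ioc_contact",
             "title": "Contact with known malicious IPs",
             "severity": "critical",
             "description": "Flows whose destination matches a validated IoC."}

with open("hunts_out/ioc_contact.sql", "w") as fh:
    fh.write(ioc_contact_sql)
with open("hunts_out/hunts.yml", "w") as fh:
    yaml.safe_dump({"hunts": [hunt_meta]}, fh, sort_keys=False)
```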
Input Files
Validated IoCs: Malicious IP addresses stored in the PostgreSQL table validated_iocs (populated by modules M3.1, M3.2, etc.).
Executed Hunt Results (CSV): CSV files resulting from hunt execution via SIEM or manual query execution.
Output Files
hunts_out/bruteforce_auth.sql: SQL query to detect brute-force authentication attempts.
hunts_out/ioc_contact.sql: SQL query to detect contact with known malicious IPs.
hunts_out/hunts.yml: Metadata for all generated hunts, including ID, title, severity, and description.
Populated PostgreSQL table suspect_historical_sql with ingestion of hunt results via CSV files.
Reproducibility Instructions
1. Install dependencies:
pip install pyyaml psycopg2-binary sqlparse
2. Set the environment variable for the database connection:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
3. Generate hunts (SQL + metadata):
python m51_build_hunts.py
This command generates .sql files and a metadata file under hunts_out/.
4. Execute the hunts manually or via SIEM:
psql micra < hunts_out/ioc_contact.sql > results_m51/ioc_contact_result.csv
Ensure the result is saved in CSV format.
5. Ingest hunt results:
python m51_ingest_hunts.py --in-dir results_m51 --tag "Q2 Threat Hunt"
This step inserts the hunt results into the table suspect_historical_sql, preserving metadata such as source IP, destination IP, timestamp, and hunt ID.
Execution Results
After running the full cycle of M5.1, the following results were obtained:
- 2 hunts generated: bruteforce_auth.sql (high severity) and ioc_contact.sql (critical severity);
- Metadata exported to hunts_out/hunts.yml;
- CSV results ingested with tag "Q2 Threat Hunt": ioc_contact_result.csv (17 rows inserted) and bruteforce_auth_result.csv (5 rows inserted);
- Table suspect_historical_sql contains enriched data including IPs, timestamps, and original CSV rows (stored as JSON in the extra field).
The modular design allows for easy extension with new hunt types, as well as automatic or scheduled ingestion of detection results for long-term tracking and visualization.
M5.2—Malware Pattern Search Engine
The M5.2 submodule is designed to retroactively identify malware artifacts across local or historical file repositories using pattern-matching technologies such as YARA and Sigma. It operationalizes structured detection strategies to scan for previously validated threat indicators—particularly file hashes (SHA-256), suspicious strings, and behavioral signatures extracted from honeypot sessions. This process supports malware triage, incident forensics, and historical threat correlation.
Two distinct scripts were developed for this module:
m52_build_sigma_from_cowrie.py: generates Sigma rules from the Cowrie logs; the resulting file is ready for SIEM conversion via tools such as sigma convert.
m52_build_yara_from_cowrie.py: generates YARA rules with malware indicators; these rules can be used to scan local file systems using the yara CLI tool.
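For illustration, a hash-based rule of the kind generated by m52_build_yara_from_cowrie.py might look as follows; the rule name, metadata, and SHA-256 value are placeholders, not indicators from the actual capture.

```yara
import "hash"

// Hypothetical example of a generated hash rule; the SHA-256 below is a
// placeholder value, not an indicator extracted from cowrie.json.
rule micra_sha256_sample_1
{
    meta:
        source = "cowrie"
        generated = "2025-07-31"
    condition:
        hash.sha256(0, filesize) == "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```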
Input Files
cowrie.json: Honeypot log with attacker sessions (source for pattern generation)
scan_yara.csv: Output of a retroactive scan using yara -r…
Output Files
rules_out/m52_sigma_rules_iocs_<DATE>.yml: Sigma rules generated from Cowrie logs
rules_out/m52_yara_rules_iocs_<DATE>.yar: YARA rules with malware indicators
validated_iocs (PostgreSQL): Updated with new SHA-256 hits (source: yara-hunt)
Reproducibility Instructions
1. Install dependencies:
pip install psycopg2-binary pyyaml sqlparse
2. Generate Sigma rules:
python m52_build_sigma_from_cowrie.py
3. Generate YARA rules:
python m52_build_yara_from_cowrie.py
4. Run a scan using YARA (example):
yara -r rules_out/m52_yara_rules_iocs_<DATE>.yar /target_dir > scan_yara.csv
5. Ingest results into PostgreSQL:
export PG_DSN="postgresql://user:pass@host:5432/micra"
python m52_ingest_yara_results.py scan_yara.csv
Execution Results
Created rules for:
- ○
12 individual SHA-256 hashes
- ○
3 URL/IP clusters (60 IOCs total)
- ○
1 generic dropper rule
- ○
1 SSH banner detection rule
Output saved to: rules_out/m52_sigma_rules_iocs_2025-07-31.yml
Created rules for:
- ○
12 SHA-256 hashes
- ○
60 URLs and IPs in 3 grouped rules
- ○
1 combined dropper rule
- ○
1 SSH banner detection rule
Output saved to: rules_out/m52_yara_rules_iocs_2025-07-31.yar
After scanning:
- ○
SHA-256 ingested: 4 new, 2 updated, 3 skipped.
- ○
All inserted entries tagged as source = ‘yara-hunt’ and verdict = ‘malicious’
M6.1—Internal Data Intelligence Hub
The M6.1 submodule is responsible for exporting validated threat intelligence from the MICRA repository to the internal MISP (Malware Information Sharing Platform) instance for collaborative enrichment, correlation, and visualization. It consolidates strategic indicators of compromise (IoCs) classified as malicious in the PostgreSQL database and publishes them as new structured events in the MISP platform, tagged and categorized for use in IDS/IPS signatures, threat analysis, or forensic triage.
This mechanism bridges the internal threat validation architecture (Module M3.3) with a dedicated CTI platform (MISP), supporting traceability, collaboration, and long-term threat knowledge accumulation.
The submodule is executed from the vm-core node and publishes events directly to the vm-misp instance, assuming the existence of valid API credentials and appropriate visibility configurations (e.g., org-only or sharing group ID).
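The publishing step can be sketched with PyMISP as follows. This is illustrative: the event name is an assumption, while the ip-dst attribute type and the tags mirror the description above.

```python
# Minimal sketch of the MISP publishing step (illustrative).
import os
from pymisp import PyMISP, MISPEvent

misp = PyMISP("http://vm-misp", os.environ["MISP_KEY"], ssl=False)

event = MISPEvent()
event.info = "MICRA validated IoCs"       # assumed event name
event.add_tag("validated")
event.add_tag("ids:signature")

# Normally the (ip, rationale) pairs are read from validated_iocs.
for ip, rationale in [("91.236.51.44", "flagged by M3.1")]:
    event.add_attribute("ip-dst", ip, comment=rationale or "", to_ids=True)

misp.add_event(event)
```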
Input Files
PostgreSQL table validated_iocs: Contains all malicious IoCs validated in MICRA
Output Files
New MISP event: Event with ip-dst attributes for each malicious IoC
Event metadata: Includes rationale, date, and MISP tags (validated, ids:signature)
Reproducibility Instructions
1. Configure environment variables:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export MISP_KEY="your_api_key_here"
export SHARING_ID="1"  # optional: ID of sharing group
2. Run the publishing script:
python m61_ioc_m33_to_misp.py
3. Expected behavior:
- All malicious IPs are extracted from the validated_iocs table;
- A new event is created in the MISP instance;
- Each IP is added as an attribute (ip-dst) with optional analyst commentary.
Execution Results
Example output for an execution containing 31 indicators:
✓ Published event 325 with 31 IoCs to MISP at http://vm-misp
Each IoC is added with its rationale (if present), and the event is tagged appropriately. If no new IoCs are present, the script will return:
✓ No new IoCs—nothing to publish.
This design supports repeated execution without duplication due to MISP’s internal deduplication and MICRA’s control over verdict assignment.
M6.2—External Data Intelligence Hub
The M6.2 submodule enables bidirectional threat intelligence sharing between MICRA and external partners, such as ISACs or trusted MISP nodes. Its core objective is to ingest externally received IoCs into MICRA’s validation pipeline and to automatically export internally validated indicators back to partner nodes in compliance with configured sharing policies.
This architecture reinforces MICRA’s ability to operate as both a consumer and a provider of cyber threat intelligence (CTI), while ensuring that all indicators follow the same internal scrutiny path regardless of their origin.
The process is performed manually during the MVP phase, using the script m63_ioc_externo_misp_to_m33.py, which extracts new indicators tagged as status:new_external in the local MISP and inserts them into the validation_queue table for reprocessing by the M3.x validation modules.
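The ingestion logic can be sketched as follows. This is illustrative: the validation_queue column layout is an assumption, while the tag filter mirrors the description above.

```python
# Minimal sketch of the external-IoC ingestion (illustrative).
import os
import psycopg2
from pymisp import PyMISP

misp = PyMISP("http://vm-misp", os.environ["MISP_KEY"], ssl=False)
attributes = misp.search(controller="attributes",
                         tags="status:new_external", pythonify=True)

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur:
    for attr in attributes:
        cur.execute(
            "INSERT INTO validation_queue (ioc, ioc_type, source) "
            "VALUES (%s, %s, 'misp-external') ON CONFLICT DO NOTHING",
            (attr.value, attr.type),
        )
print(f"✓ {len(attributes)} IoCs inserted into validation_queue")
```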
Input Files
MISP attributes with tag status:new_external—Incoming threat intelligence tagged by automation rules
Output Files
validation_queue: Temporary queue for unverified IoCs to be processed by M3.x
Reproducibility Instructions
1. Configure required environment variables:
export MISP_KEY="your_api_key"
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Execute the ingestion script on vm-core:
python m63_ioc_externo_misp_to_m33.py
3. Confirm insertion into validation_queue; expected output:
✓ 23 IoCs inserted into validation_queue
Execution Results
After execution, all external IoCs tagged as status:new_external in MISP are routed to MICRA’s internal validation pipeline. If properly configured, the complete flow proceeds as follows:
1. External IoCs appear in MISP.
2. Script m63_ioc_externo_misp_to_m33.py inserts them into validation_queue.
3. The validation pipeline (M3.x) processes the entries and updates validated_iocs.
4. Validated indicators are published internally (M6.1) and externally via MISP sync.
This design supports reproducible, rule-based CTI sharing under trust boundaries defined by TLP and tag-based filtering.