All data represent real interactions between external agents and the honeypot, without synthetic traffic or data anonymization.
In the MICRA prototype, the Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv subset of the CSE-CIC-IDS 2017 dataset was used. It captures the port-scanning (PortScan) scenario recorded during working hours on a Friday afternoon and contains flows labeled BENIGN or PortScan, combining benign and malicious traffic in realistic proportions. This allows the detection and response pipeline to be exercised under conditions comparable to those of a real corporate network. Combined with the genuine events captured by the Cowrie honeypot, the subset provides a test environment that mixes legitimate behavior and suspicious activity, supporting the MICRA proof of concept without resorting to synthetic data.
5.4.2. Reproducibility
To ensure methodological transparency and facilitate independent replication of the experiments, this section details the steps performed in each module implemented in the MICRA prototype. We address aspects such as the operating environment, tools, scripts used, and any specific procedures adopted to enable the reported results to be reproduced by other researchers. Each subsection includes a brief but accurate description of the steps necessary for direct replication of the experiments performed.
M1.1—Deceptive Data Streaming
The vm-honeypot virtual machine was provisioned with the Cowrie honeypot, installed and configured according to the official documentation [30]. During the experimental period, Cowrie generated session logs exclusively from public-facing SSH (TCP port 22) and Telnet (TCP port 23) services exposed to the internet.
The events used in this study were extracted solely from the honeypot logs recorded during this period. These were consolidated and exported in JSON format under the filename cowrie.json, which is included in the Supplementary Material of this article.
To facilitate processing and analysis, the cowrie.json file was securely transferred from vm-honeypot to vm-core using the SCP (Secure Copy Protocol), preserving data integrity for subsequent stages of the MICRA pipeline.
M1.2—Network Data Streaming
The CSE-CIC-IDS 2017 dataset was obtained directly from the official repository maintained by the Canadian Institute for Cybersecurity (CIC) [74]. After downloading and extracting the dataset on the data capture virtual machine (vm-sensor), we selected the specific subset titled Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv, provided in CSV format.
This subset was chosen due to its relevance to network-based threat detection scenarios and is included in the Supplementary Material of this article to support independent validation. The selected file was subsequently transferred to the central virtual machine (vm-core) using the Secure Copy Protocol (SCP), ensuring consistency with the data-handling procedures adopted throughout the prototype.
M2.1—Pattern Recognition Threat Analyzer
The M2.1 submodule was implemented as a deterministic pattern analyzer responsible for identifying Indicators of Compromise (IoCs) extracted from honeypot logs and applying them to real network traffic. This module plays a key role in the MICRA prototype by translating malicious interactions observed in controlled environments into actionable intelligence within operational flows.
In this implementation, the system reads the cowrie.json file line by line and extracts a set of IoCs, including IP addresses, domain names, full URLs, remote file names, protocol banners (e.g., SSH or HTTP user-agent strings), and file hashes (MD5, SHA-1, SHA-256), retrieved either from commands (input) or from the shasum field. These IoCs are consolidated in the file honeypot_ioc.csv.
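For illustration, the extraction step can be condensed as follows. This is a minimal sketch rather than the verbatim m21_analyzer_deterministic.py: it assumes the standard Cowrie event fields (src_ip, input, shasum) and shows only the IP, URL, and hash extractors; domains, file names, and banners follow the same regex-based pattern.

```python
# Minimal sketch of the IoC extraction (illustrative, not the verbatim script).
import json
import re
import csv

IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
URL_RE = re.compile(r"https?://[^\s\"']+")
HASH_RE = re.compile(r"\b[a-fA-F0-9]{64}\b|\b[a-fA-F0-9]{40}\b|\b[a-fA-F0-9]{32}\b")

iocs = set()
with open("cowrie.json", encoding="utf-8") as fh:
    for line in fh:                          # one JSON event per line
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get("src_ip"):
            iocs.add(event["src_ip"])
        if event.get("shasum"):              # file hash recorded by Cowrie
            iocs.add(event["shasum"])
        command = event.get("input") or ""   # attacker command line
        iocs.update(IP_RE.findall(command))
        iocs.update(URL_RE.findall(command))
        iocs.update(HASH_RE.findall(command))

with open("honeypot_ioc.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["ioc"])
    for ioc in sorted(iocs):
        writer.writerow([ioc])
```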
The network traffic analyzed corresponds to the subset Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv from the CSE-CIC-IDS 2017 dataset. For each record, the script performs a full field-wise scan, comparing all values against the extracted IoC set using strict equality. If any match is found, the record is labeled with suspect_network_honeypot.
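The field-wise matching itself reduces to a strict-equality scan over every column, as in the following sketch (illustrative; the name of the added label column is an assumption based on the description above):

```python
# Minimal sketch of the field-wise IoC matching (illustrative).
import pandas as pd

iocs = set(pd.read_csv("honeypot_ioc.csv")["ioc"].astype(str))
flows = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")

# A row matches if any of its values, compared as a trimmed string,
# is strictly equal to a known IoC.
def row_matches(row) -> bool:
    return any(str(value).strip() in iocs for value in row)

mask = flows.apply(row_matches, axis=1)
flagged = flows[mask].copy()
flagged["label"] = "suspect_network_honeypot"   # assumed column name
flagged.to_csv("network_stream_honeypot_labeled.csv", index=False)
print(f"[+] {len(flagged)} rows labeled and exported")
```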
All flagged entries are saved to a new file, network_stream_honeypot_labeled.csv, which is included in the Supplementary Material of this article. The entire process was executed on the vm-core machine using Python 3.12.
Input Files
cowrie.json—Honeypot interaction log containing attacker commands and metadata.
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv—Netflow records from the CSE-CIC-IDS 2017 dataset, representing real-world network traffic.
Output Files
honeypot_ioc.csv—Consolidated list of unique IoCs extracted from the honeypot logs.
network_stream_honeypot_labeled.csv—Subset of netflow entries flagged as malicious based on IoC matching.
Reproducibility Instructions
To ensure reproducibility, the analysis script m21_analyzer_deterministic.py is provided, along with the required datasets. To replicate the results, execute the following steps:
1. Place the required input files in the same directory as the script.
2. Install the necessary dependencies:
pip install pandas
3. Run the analysis:
python m21_analyzer_deterministic.py
Execution Results
[+] 437 IoCs saved to honeypot_ioc.csv
[+] 0 rows labeled and exported to network_stream_honeypot_labeled.csv
This modular and transparent approach reinforces the MICRA design principle of adaptability. While the current implementation is based on direct matching of known IoCs, this submodule can be replaced or extended to incorporate alternative detection strategies, including probabilistic or hybrid techniques, without compromising the architectural integrity.
M2.2—Heuristic Threat Analyzer
The M2.2 submodule was implemented as a supervised classification engine responsible for detecting malicious behavior in network traffic using heuristic indicators. Its objective is to simulate real-world application of machine learning techniques to classify complex behavioral patterns associated with known malware, brute-force attempts, and coordinated scanning activities. The submodule operates on labeled datasets and employs standard evaluation metrics to assess detection performance.
For this prototype, the subset Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv from the public dataset CSE-CIC-IDS 2017 was used. The dataset was split into three parts: 70% for training, 20% for evaluation and metrics calculation, and 10% reserved for runtime inference simulation. The latter portion was saved as new_flow.csv, emulating future real-time scenarios.
The script m22_heuristic_multi.py supports parallel evaluation of multiple classifiers. In this execution, seven models were tested: Decision Tree, Random Forest, Gradient Boosting, Extra Trees, Logistic Regression, XGBoost, and LightGBM. Each model was assessed based on Accuracy, Precision, Recall, F1-score (malicious class), AUC-ROC, False Positive Rate (FPR), and Coverage. The best-performing model was selected automatically using a composite criterion (highest F1-score and lowest FPR) and exported as m22_model_best.joblib for future inference. The final labeling from this model was saved as network_stream_heuristic_labeled.csv.
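The selection logic can be illustrated with the condensed sketch below, which trains a reduced set of models and applies the same composite criterion (highest F1-score, then lowest FPR). The column handling and the two-stage 70/20/10 split are assumptions consistent with the description above, not the verbatim m22_heuristic_multi.py.

```python
# Condensed sketch of the multi-model evaluation and selection (illustrative).
import pandas as pd
import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, confusion_matrix
from sklearn.model_selection import train_test_split

df = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")
df.columns = df.columns.str.strip()
y = (df["Label"] != "BENIGN").astype(int)            # 1 = malicious
X = df.select_dtypes("number").replace([float("inf"), -float("inf")], 0).fillna(0)

# 70% train, 20% test, 10% held out as new_flow.csv for runtime simulation.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=0.7, random_state=42)
X_test, X_new, y_test, _ = train_test_split(X_rest, y_rest, train_size=2/3, random_state=42)
X_new.to_csv("new_flow.csv", index=False)

models = {"rf": RandomForestClassifier(n_estimators=100, random_state=42),
          "lr": LogisticRegression(max_iter=1000)}

results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
    results.append({"name": name, "model": model,
                    "f1": f1_score(y_test, pred), "fpr": fp / (fp + tn)})

# Composite criterion: highest F1-score first, lowest FPR as tie-breaker.
best = sorted(results, key=lambda r: (-r["f1"], r["fpr"]))[0]
joblib.dump(best["model"], "m22_model_best.joblib")
```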
The entire process was executed on the vm-core machine using Python 3.12 and the libraries pandas, scikit-learn, xgboost, lightgbm, and joblib.
Input Files
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv: Raw labeled netflow dataset used for training and evaluation.
Output Files
network_stream_heuristic_<model>.csv: Flows labeled as suspicious by each evaluated model.
network_stream_heuristic_labeled.csv: Final output using the best classifier.
m22_model_best.joblib: Serialized model with best F1-score and lowest FPR.
m22_comparison_metrics.csv: Summary of metrics for all evaluated models.
new_flow.csv: 10% subset reserved to simulate unseen real-time flows.
Reproducibility Instructions
1. Install dependencies:
pip install pandas scikit-learn xgboost lightgbm joblib
2. Prepare the dataset: ensure the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv is in the same directory as the script.
3. Run the script:
python m22_heuristic_multi.py --models dt rf gb et lr xgb lgb
This command will:
- Train and evaluate all models (Tables 7–21);
- Export labeled flows for each model;
- Automatically select and serialize the best model;
- Save a CSV file for runtime simulation.
Execution Results
[+] Loading dataset…
[✓] Runtime exported as: new_flow.csv
[✓] Model: Decision Tree
Table 7. Decision tree: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 0.9999 |
| AUC-ROC | 0.9999 |
| False Positives (FP) | 1 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5502 |
Table 8. Decision tree: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_decision_tree.csv
[✓] Model: Random Forest
Table 9. Random forest: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5501 |
Table 10. Random forest: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_random_forest.csv
[✓] Model: Gradient Boosting
Table 11. Gradient boosting: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9998 |
| F1-score (Malicious) | 0.9999 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 5 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0002 |
| Coverage | 0.5501 |
Table 12. Gradient boosting: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_gradient_boosting.csv
[✓] Model: Extra Trees
Table 13. Extra trees: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9999 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 0.9999 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 3 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0001 |
| Coverage | 0.5501 |
Table 14. Extra trees: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_extra_trees.csv
[✓] Model: Logistic Regression
Table 15. Logistic regression: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 0.9630 |
| Precision (Malicious) | 0.9402 |
| Recall (Malicious) | 0.9962 |
| F1-score (Malicious) | 0.9674 |
| AUC-ROC | 0.9854 |
| False Positives (FP) | 1998 |
| False Negatives (FN) | 121 |
| Specificity (TN rate) | 0.9225 |
| False Positive Rate | 0.0775 |
| False Negative Rate | 0.0038 |
| Coverage | 0.5830 |
Table 16. Logistic regression: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 0.99 | 0.92 | 0.96 | 25,771 |
| MALICIOUS | 0.94 | 1.00 | 0.97 | 31,523 |
| accuracy | | | 0.96 | 57,294 |
| macro avg | 0.97 | 0.96 | 0.96 | 57,294 |
| weighted avg | 0.96 | 0.96 | 0.96 | 57,294 |
[✓] Exported: network_stream_heuristic_logistic_regression.csv
[✓] Model: XGBoost
Table 17. XGBoost: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 1.0000 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 0 |
| False Negatives (FN) | 1 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0000 |
| Coverage | 0.5502 |
Table 18. XGBoost: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_xgboost.csv
[✓] Model: LightGBM
Table 19. LightGBM: evaluation metrics.

| Metric | Value |
|---|---|
| Accuracy | 1.0000 |
| Precision (Malicious) | 1.0000 |
| Recall (Malicious) | 1.0000 |
| F1-score (Malicious) | 1.0000 |
| AUC-ROC | 1.0000 |
| False Positives (FP) | 1 |
| False Negatives (FN) | 0 |
| Specificity (TN rate) | 1.0000 |
| False Positive Rate | 0.0000 |
| False Negative Rate | 0.0000 |
| Coverage | 0.5502 |
Table 20. LightGBM: classification report.

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| BENIGN | 1.00 | 1.00 | 1.00 | 25,771 |
| MALICIOUS | 1.00 | 1.00 | 1.00 | 31,523 |
| accuracy | | | 1.00 | 57,294 |
| macro avg | 1.00 | 1.00 | 1.00 | 57,294 |
| weighted avg | 1.00 | 1.00 | 1.00 | 57,294 |
[✓] Exported: network_stream_heuristic_lightgbm.csv
[✓] Comparison table saved as: m22_comparison_metrics.csv
[✓] Summary table:
Table 21. Summary of classifier metrics.

| Model | F1 | Accuracy | FPR | ROC-AUC |
|---|---|---|---|---|
| Decision Tree | 0.999937 | 0.999930 | 0.000039 | 0.999933 |
| Random Forest | 0.999952 | 0.999948 | 0.000000 | 1.000000 |
| Gradient Boosting | 0.999921 | 0.999913 | 0.000000 | 0.999992 |
| Extra Trees | 0.999952 | 0.999948 | 0.000000 | 1.000000 |
| Logistic Regression | 0.967361 | 0.963015 | 0.077529 | 0.985449 |
| XGBoost | 0.999984 | 0.999983 | 0.000000 | 1.000000 |
| LightGBM | 0.999984 | 0.999983 | 0.000039 | 1.000000 |
[✓] Best model: LightGBM—saved as: m22_model_best.joblib
[✓] Consolidated output → network_stream_heuristic_labeled.csv
Total execution time: 209.41 s
M2.3—Behavioral Threat Insights
The M2.3 submodule was implemented as an unsupervised anomaly detection engine aimed at identifying behavioral deviations in network traffic without relying on pre-labeled data. It integrates multiple models to infer abnormal patterns based on statistical and clustering-based methods, thereby enabling a robust detection pipeline for unknown or evolving threats.
In this implementation, the system processes the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv, a subset of the CSE-CIC-IDS 2017 dataset. All numeric features are standardized and submitted to four anomaly detection algorithms: Isolation Forest (ISO), Local Outlier Factor (LOF), PCA reconstruction error (PCA), and K-Means distance to centroid (KM).
Each model is configured to detect the top 1% most anomalous flows in the dataset. PCA and KM are preceded by dimensionality reduction steps to enhance performance and accuracy. After execution, the results from all models are merged to calculate a composite field named behavioral_score, representing the number of models that classified the same flow as anomalous (ranging from 0 to 4).
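The scoring step can be sketched as follows. This is illustrative only: a single detector of the four is shown, and the per-model flag column names are assumptions.

```python
# Minimal sketch of the behavioral_score composition (illustrative).
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import IsolationForest

df = pd.read_csv("Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv")
X = df.select_dtypes("number").replace([np.inf, -np.inf], 0).fillna(0)
X = StandardScaler().fit_transform(X)

# Example with one of the four detectors: flag the top 1% most anomalous flows.
iso = IsolationForest(contamination=0.01, random_state=42).fit(X)
df["iso_flag"] = (iso.predict(X) == -1).astype(int)

# With all four flags in place (iso/lof/pca/km), the composite score is a sum.
flag_cols = [c for c in ["iso_flag", "lof_flag", "pca_flag", "km_flag"] if c in df]
df["behavioral_score"] = df[flag_cols].sum(axis=1)
df[df["behavioral_score"] >= 1].to_csv("network_stream_behavioral_labeled.csv", index=False)
```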
All experiments were executed on the vm-core machine using Python 3.12, and required libraries include pandas, numpy, and scikit-learn.
Input Files
Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv: netflow subset of the CSE-CIC-IDS 2017 dataset submitted to the anomaly detectors.
Output Files
network_stream_behavioral_labeled.csv: full dataset with labels from any model (score ≥ 1)
m23_behavioral_score3plus.csv: filtered flows with score ≥ 3 (high-confidence anomalies)
m23_behavioral_high_confidence.csv: flows flagged by all models (score = 4)
m23_comparison_metrics.csv: anomaly count and coverage by model
m23_behavioral_ip_summary.csv: top Source → Destination IP pairs with highest aggregate scores
Reproducibility Instructions
1. Install dependencies:
pip install pandas numpy scikit-learn
2. Prepare the dataset: ensure the file Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv is in the same directory as the script.
3. Run the script:
python m23_behavioral_analyzer.py \
  --input "Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv" \
  --models iso lof pca km
You may choose a subset of models using --models, such as iso pca.
Execution Results
[+] Loading: Friday-WorkingHours-Afternoon-PortScan.pcap_ISCX.csv
[✓] Running ISO…
[✓] Model: ISO
Table 22. Isolation forest anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Avg anomaly score | −0.026 |
[✓] ISO completed in 2.98 s.
[✓] Running LOF…
[✓] Model: LOF
Table 23. Local Outlier Factor (LOF) anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Mean LOF score | 1,682,202.493 |
[✓] LOF completed in 532.12 s.
[✓] Running PCA…
[✓] Model: PCA
Table 24. PCA-based anomaly detection summary.

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Explained variance | 76.92% |
[✓] PCA completed in 1.05 s.
[✓] Running KM…
[✓] Model: KM
Table 25. K-Means anomaly detection summary (with PCA preprocessing).

| Metric | Value |
|---|---|
| Anomalies detected | 2865 |
| Coverage (dataset) | 1.00% |
| Mean distance to centroid | 21.433 |
[✓] KM completed in 1.22 s.
behavioral_score distribution:
Table 26. Distribution of flows by behavioral score.

| Behavioral score | Flows (%) |
|---|---|
| Score 0 | 278,198 (97.11%) |
| Score 1 | 6015 (2.10%) |
| Score 2 | 1325 (0.46%) |
| Score 3 | 921 (0.32%) |
| Score 4 | 8 (0.00%) |
Average behavioral_score: 0.040
Score Group Summary:
Table 27. Behavioral score aggregation summary.

| Behavioral_Score | Count | Percentage |
|---|---|---|
| 0 | 278,198 | 97.11 |
| 1 | 6015 | 2.10 |
| 2 | 1325 | 0.46 |
| 3 | 921 | 0.32 |
| 4 | 8 | 0.00 |
Total execution time: 544.35 s
Glossary:
Table 28. Description of output metrics and files generated by M2.3.

| Item | Description |
|---|---|
| avg_anomaly_score | mean anomaly score from Isolation Forest |
| mean_lof_score | average inverse density from LOF |
| explained_var | % of variance explained by PCA components |
| mean_distance_to_centroid | average distance to cluster center (K-Means) |
| overlap | number of flows flagged by both models |
| jaccard | intersection/union of two detection sets |
| behavioral_score | number of models that flagged the flow (0–4) |
| coverage | proportion of dataset flagged as anomalous |
| network_stream_behavioral_labeled.csv | flows flagged by any model |
| score3plus.csv | flows flagged by ≥3 models (triage) |
| m23_behavioral_high_confidence.csv | flows flagged by all models |
| m23_comparison_metrics.csv | anomaly count and coverage by model |
| m23_behavioral_ip_summary.csv | top Source→Destination IP pairs |
[✓] Top 10 Source→Destination IPs:
Table 29. Top 10 source–destination IP pairs (high-confidence flows).

| Source IP | Destination IP | Count | Avg_Score |
|---|---|---|---|
| 192.168.10.17 | 104.197.43.56 | 3 | 4.0 |
| 192.168.10.12 | 79.127.127.5 | 1 | 4.0 |
| 192.168.10.15 | 52.84.26.193 | 1 | 4.0 |
| 192.168.10.16 | 173.241.242.220 | 1 | 4.0 |
| 192.168.10.16 | 198.54.12.96 | 1 | 4.0 |
| 192.168.10.25 | 192.168.10.3 | 1 | 4.0 |
M3.1—Threat Signature Validation
The M3.1 submodule is responsible for validating potentially malicious IP addresses observed across all previous MICRA detection modules by cross-referencing them with external threat intelligence. Specifically, the system integrates with the VirusTotal public API, allowing the enrichment of Indicators of Compromise (IoCs) by leveraging a reputational score computed from multiple antivirus and security engines.
The script m31_threat_validate.py consolidates the Top 50 public IP addresses from each of the following labeled network traffic files:
network_stream_honeypot_labeled.csv (output of M2.1);
network_stream_heuristic_labeled.csv (output of M2.2);
network_stream_behavioral_labeled.csv (output of M2.3).
Each IP is validated using VirusTotal’s v3 API, with results cached locally in SQLite to minimize redundant queries and improve performance. The validation logic applies a simple binary verdict: an IP is labeled malicious if its malicious score from VirusTotal is 1 or higher; otherwise, it is considered benign.
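The lookup-with-cache logic can be sketched as follows. This is a simplified illustration: the VirusTotal v3 endpoint and x-apikey header are the public API, while the cache table layout and the absence of rate limiting are simplifications compared with m31_threat_validate.py.

```python
# Minimal sketch of the VirusTotal v3 lookup with a local SQLite cache.
import os
import sqlite3
import requests

VT_URL = "https://www.virustotal.com/api/v3/ip_addresses/{}"
API_KEY = os.environ["VT_API_KEY"]

db = sqlite3.connect("vt_cache.db")
db.execute("CREATE TABLE IF NOT EXISTS vt_cache (ip TEXT PRIMARY KEY, malicious INTEGER)")

def vt_malicious_count(ip: str) -> int:
    row = db.execute("SELECT malicious FROM vt_cache WHERE ip = ?", (ip,)).fetchone()
    if row is not None:                      # cache hit: skip the API call
        return row[0]
    resp = requests.get(VT_URL.format(ip), headers={"x-apikey": API_KEY}, timeout=30)
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    db.execute("INSERT OR REPLACE INTO vt_cache VALUES (?, ?)", (ip, stats["malicious"]))
    db.commit()
    return stats["malicious"]

# Binary verdict, as described above: malicious if the VT count is >= 1.
verdict = "malicious" if vt_malicious_count("8.8.8.8") >= 1 else "benign"
```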
Each verified IP is stored in a PostgreSQL table named validated_iocs, including relevant metadata such as number of reports, associated country, and the date of last analysis. The final report is also exported as a consolidated CSV for traceability.
This module reinforces MICRA’s layered defense by correlating internal detection signals with trusted external sources, increasing confidence and supporting further response or triage steps.
Input Files
network_stream_honeypot_labeled.csv (labeled flows flagged by deterministic pattern-matching based on honeypot IoCs);
network_stream_heuristic_labeled.csv (labeled flows from supervised ML models);
network_stream_behavioral_labeled.csv (labeled flows from behavioral anomaly detection).
Output Files
m31_validated_ips.csv: consolidated VirusTotal enrichment results for the top 50 public IPs from each input file (Table 30).
PostgreSQL table validated_iocs: stores normalized IP reputation verdicts and metadata for long-term querying.
Reproducibility Instructions
1. Set the environment variable for database access:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Ensure the VirusTotal API key is configured, either by editing VT_API_KEY directly in the script or via:
export VT_API_KEY="your_api_key_here"
3. Install required dependencies:
pip install pandas requests psycopg2-binary
4. Run the validation:
python m31_threat_validate.py \
  --m21 network_stream_honeypot_labeled.csv \
  --m22 network_stream_heuristic_labeled.csv \
  --m23 network_stream_behavioral_labeled.csv
Execution Results
[+] network_stream_honeypot_labeled.csv: top 0 public IPs collected
[+] network_stream_heuristic_labeled.csv: top 1 public IPs collected
[+] network_stream_behavioral_labeled.csv: top 50 public IPs collected
[+] Total unique IPs to check: 51
Table 30. Public IPs validated through VirusTotal.
(1/51) 104.31.91.87 | VT_malicious = 0 | (2/51) 104.97.120.94 | VT_malicious = 0 |
(3/51) 104.97.137.26 | VT_malicious = 0 | (4/51) 106.122.252.16 | VT_malicious = 0 |
(5/51) 141.170.25.54 | VT_malicious = 0 | (6/51) 151.101.21.127 | VT_malicious = 0 |
(7/51) 157.240.18.19 | VT_malicious = 0 | (8/51) 157.240.18.35 | VT_malicious = 3 |
(9/51) 157.240.2.25 | VT_malicious = 0 | (10/51) 157.240.2.35 | VT_malicious = 0 |
(11/51) 160.17.5.1 | VT_malicious = 2 | (12/51) 162.213.33.50 | VT_malicious = 0 |
(13/51) 17.253.14.125 | VT_malicious = 0 | (14/51) 172.217.10.110 | VT_malicious = 0 |
(15/51) 172.217.10.130 | VT_malicious = 0 | (16/51) 172.217.10.226 | VT_malicious = 0 |
(17/51) 172.217.10.66 | VT_malicious = 0 | (18/51) 172.217.12.162 | VT_malicious = 0 |
(19/51) 172.217.12.174 | VT_malicious = 0 | (20/51) 172.217.12.206 | VT_malicious = 0 |
(21/51) 172.217.3.110 | VT_malicious = 0 | (22/51) 172.217.3.98 | VT_malicious = 0 |
(23/51) 172.217.6.194 | VT_malicious = 0 | (24/51) 172.217.9.226 | VT_malicious = 0 |
(25/51) 173.241.242.143 | VT_malicious = 0 | (26/51) 178.124.129.12 | VT_malicious = 0 |
(27/51) 178.255.83.1 | VT_malicious = 0 | (28/51) 192.229.211.82 | VT_malicious = 0 |
(29/51) 192.82.242.23 | VT_malicious = 0 | (30/51) 217.118.87.98 | VT_malicious = 0 |
(31/51) 31.13.71.36 | VT_malicious = 1 | (32/51) 31.13.71.7 | VT_malicious = 0 |
(33/51) 31.13.80.12 | VT_malicious = 1 | (34/51) 37.209.240.1 | VT_malicious = 0 |
(35/51) 50.63.243.230 | VT_malicious = 0 | (36/51) 62.161.94.230 | VT_malicious = 0 |
(37/51) 63.251.240.12 | VT_malicious = 0 | (38/51) 67.72.99.137 | VT_malicious = 0 |
(39/51) 68.67.178.111 | VT_malicious = 0 | (40/51) 68.67.180.12 | VT_malicious = 1 |
(41/51) 69.172.216.111 | VT_malicious = 0 | (42/51) 69.4.95.11 | VT_malicious = 0 |
(43/51) 72.21.81.48 | VT_malicious = 0 | (44/51) 74.117.200.68 | VT_malicious = 0 |
(45/51) 74.121.138.87 | VT_malicious = 0 | (46/51) 8.0.6.4 | VT_malicious = 0 |
(47/51) 8.43.72.97 | VT_malicious = 0 | (48/51) 8.43.72.98 | VT_malicious = 0 |
(49/51) 8.6.0.1 | VT_malicious = 0 | (50/51) 91.236.51.44 | VT_malicious = 0 |
(51/51) 93.184.216.180 | VT_malicious = 0 | | |
[✓] 5 malicious IPs inserted/updated in PostgreSQL.
[✓] Full report saved at: /home/linuxman/scripts/micra/results_m31/m31_validated_ips.csv
M3.2—Expert Intelligence Validation
The M3.2 submodule was implemented to incorporate human-driven threat intelligence into the unified threat indicator database used by the MICRA architecture. It allows analysts to manually validate, override, or complement indicators of compromise (IoCs) previously obtained through automated modules (M3.1) or other pipelines. This ensures that expert domain knowledge is properly integrated into the detection and decision-making process.
The input consists of a structured CSV file (expert_iocs.csv) containing analyst-verified IoCs. Each entry must explicitly declare the indicator value, type (e.g., IP, domain, URL, hash), verdict (malicious, benign, or false_positive), the analyst name, and an optional rationale. Upon execution, the script m32_manual_validate.py reads the file and performs either upserts or soft deletions into the unified PostgreSQL table validated_iocs. Fields such as score, reports, and country, typically filled by automated sources like VirusTotal, are left untouched in this stage (Tables 31–34).
The submodule enforces strict data validation rules, provides detailed execution feedback (including per-analyst and per-verdict statistics), and supports auditability and traceability by persisting the analyst, rationale, and updated_at fields.
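A minimal sketch of the upsert step is shown below. It assumes the CSV columns described above and that ioc acts as the conflict key in validated_iocs; both are assumptions that may differ from the actual m32_manual_validate.py.

```python
# Minimal sketch of the analyst-IoC upsert (illustrative; assumes ioc is unique).
import csv
import os
import psycopg2

UPSERT = """
INSERT INTO validated_iocs (ioc, ioc_type, verdict, source, analyst, rationale, updated_at)
VALUES (%s, %s, %s, 'manual', %s, %s, now())
ON CONFLICT (ioc) DO UPDATE
SET verdict = EXCLUDED.verdict,
    analyst = EXCLUDED.analyst,
    rationale = EXCLUDED.rationale,
    updated_at = now();
"""

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur, open("expert_iocs.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        if row["verdict"] not in ("malicious", "benign", "false_positive"):
            continue                     # strict validation: skip invalid rows
        cur.execute(UPSERT, (row["ioc"], row["ioc_type"], row["verdict"],
                             row["analyst"], row.get("rationale", "")))
```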
This module reinforces the MICRA design principle of human-in-the-loop intelligence validation, enabling cybersecurity professionals to intervene, contextualize, or contest automated inferences.
Input Files
expert_iocs.csv: structured CSV file containing analyst-verified IoCs.
Output Files
PostgreSQL table validated_iocs: updated with analyst verdicts, rationale, and audit metadata (analyst, updated_at).
Reproducibility Instructions
1. Prepare your environment:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Create or edit the input file expert_iocs.csv with analyst judgments.
3. Run the script:
python m32_manual_validate.py --csv expert_iocs.csv
Execution Results
Import Summary
Table 31. Summary of expert-validated IoC entries processed from CSV input.

| Metric | Value |
|---|---|
| Total entries in CSV | 49 |
| Valid inserts/updates | 49 |
| Deletions (hard/soft) | 0 |
| Invalid/ignored rows | 0 |
Verdict Breakdown
Table 32. Distribution of analyst-assigned verdicts for imported threat indicators.

| Verdict | Count |
|---|---|
| malicious | 22 |
| benign | 14 |
| false_positive | 13 |
Analysts
Table 33. Number of IoCs contributed by each human analyst during validation.

| Analyst | IoCs |
|---|---|
| Bob | 15 |
| Eve | 11 |
| Carol | 10 |
| Alice | 9 |
| Dave | 4 |
IOC Types
Table 34. Breakdown of IoC types included in the manually curated dataset.

| Type | Count |
|---|---|
| ip | 14 |
| domain | 14 |
| url | 14 |
| hash | 6 |
| junk | 1 |
[✓] Operation complete.
M3.3—Strategic Data Validation Hub
The M3.3 submodule serves as the centralized repository of validated threat intelligence within the MICRA architecture. It is designed to store and maintain Indicators of Compromise (IoCs) that have been verified as malicious or classified by expert analysis. This module provides a reliable and consistent source of ground truth, supporting decision-making, inference, and automated response across all downstream components.
Rather than performing detection or analysis itself, M3.3 functions as a persistent intelligence layer. It enables the consolidation of external threat data, manual assessments, and outputs from other analytical modules into a single authoritative structure. This unified repository facilitates streamlined access to high-confidence IoCs, promoting interoperability, auditability, and scalability within security workflows.
The centralized repository is maintained in a PostgreSQL 15 database running in a Docker container. The schema (Table 35) consists of a single table (validated_iocs) populated incrementally by upstream modules. The M3.3 implementation runs on vm-core.
Table structure—validated_iocs
Table 35. Schema definition of the validated_iocs table used for centralized threat intelligence storage.

| Column | Type | Description |
|---|---|---|
| ioc | TEXT | The IoC value (IP, domain, URL, hash, etc.) |
| ioc_type | TEXT | Type of indicator (ip, domain, url, hash, etc.) |
| verdict | TEXT | Classification result: malicious, benign, or false_positive |
| score | INTEGER | Confidence score (if available) |
| reports | INTEGER | Number of reports or sightings |
| country | TEXT | Country of origin (applicable for IP addresses) |
| last_report | DATE | Date of last known report |
| source | TEXT | Origin of the data (e.g., virustotal, manual) |
| analyst | TEXT | Analyst responsible for manual classification (if applicable) |
| rationale | TEXT | Justification or reasoning for the classification |
| updated_at | TIMESTAMPTZ | Timestamp of the last update |
This structure ensures traceability, consistency, and extensibility for future integration with other security platforms, including SIEMs, CTI feeds, and response engines.
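For reference, one possible DDL matching Table 35 is shown below. This is a sketch: the primary-key choice and the CHECK constraint are assumptions not stated in the schema description.

```sql
-- One possible DDL for validated_iocs (illustrative; constraints are assumptions).
CREATE TABLE IF NOT EXISTS validated_iocs (
    ioc         TEXT PRIMARY KEY,
    ioc_type    TEXT,
    verdict     TEXT CHECK (verdict IN ('malicious', 'benign', 'false_positive')),
    score       INTEGER,
    reports     INTEGER,
    country     TEXT,
    last_report DATE,
    source      TEXT,
    analyst     TEXT,
    rationale   TEXT,
    updated_at  TIMESTAMPTZ DEFAULT now()
);
```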
M4.1—Dynamic Perimeter Defense
The M4.1 submodule implements an automated perimeter protection mechanism by remotely applying iptables firewall rules to a Linux-based border host. Its purpose is to block confirmed malicious IP addresses, previously validated and stored in the centralized PostgreSQL repository, thereby preventing inbound or outbound traffic to known threats. The execution strategy is idempotent: before inserting any new rule, the system checks for existing entries to avoid duplication or conflict.
This mechanism reinforces the proactive defense posture of MICRA by transforming threat intelligence into real-time network enforcement actions at the perimeter level.
Architecture and Tooling
The script is executed from the analysis node (vm-core), which connects via SSH to the target perimeter host (vm-misp, running Ubuntu 24.04 LTS). The firewall on vm-misp is managed with iptables. The SSH user (micra) has a sudo rule that allows execution of /usr/sbin/iptables without a password prompt (NOPASSWD), enabling full automation without manual intervention.
The submodule is implemented in the script m41_dynamic_perimeter_block.py, which uses the libraries psycopg2, paramiko and dotenv to query the database and enforce the firewall policies.
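The idempotent check-then-insert pattern can be sketched as follows. This is illustrative: m41_dynamic_perimeter_block.py additionally loads the IP list from PostgreSQL and reads connection settings from the environment.

```python
# Minimal sketch of the idempotent perimeter-blocking step (illustrative).
import paramiko

def block_ip(ssh: paramiko.SSHClient, ip: str) -> None:
    # 'iptables -C' exits non-zero when the rule does not exist yet,
    # which makes the subsequent insertion idempotent.
    check = f"sudo /usr/sbin/iptables -C INPUT -s {ip} -j DROP"
    _, stdout, _ = ssh.exec_command(check)
    if stdout.channel.recv_exit_status() != 0:
        ssh.exec_command(f"sudo /usr/sbin/iptables -I INPUT -s {ip} -j DROP")
        print(f"Blocked {ip}")

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect("vm-misp", username="micra")
for ip in ["91.236.51.44"]:          # normally read from validated_iocs
    block_ip(ssh, ip)
ssh.close()
```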
Input Files
PostgreSQL table validated_iocs: the script queries this table directly via the connection string defined in the PG_DSN environment variable.
Output Files
The script does not generate output files by default, but all commands and results are printed to the console (Table 36). Redirecting stdout (e.g., > results.txt) is recommended for auditing purposes.
Reproducibility Instructions
1. Install required libraries:
pip install psycopg2-binary paramiko python-dotenv
2. Ensure PostgreSQL access and environment configuration:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export MISP_HOST="vm-misp"
export MISP_USER="micra"
3. Ensure SSH and sudo configuration on the target host (vm-misp):
micra ALL=(ALL) NOPASSWD: /usr/sbin/iptables
4. Run the script from vm-core:
python m41_dynamic_perimeter_block.py
Execution Results
[+] Attempting to block 27 IPs on vm-misp…
Table 36. List of 27 malicious IPs successfully blocked via iptables on the perimeter host (vm-misp).
Blocked 102.157.44.105 | Blocked 105.158.118.241 |
Blocked 111.111.111.111 | Blocked 112.112.112.112 |
Blocked 113.113.113.113 | Blocked 113.169.187.159 |
Blocked 123.123.123.123 | Blocked 134.35.9.209 |
Blocked 139.195.43.166 | Blocked 147.185.221.30 |
Blocked 154.94.232.230 | Blocked 157.240.18.35 |
Blocked 160.17.5.1 | Blocked 172.64.80.1 |
Blocked 185.103.100.63 | Blocked 185.143.223.69 |
Blocked 193.233.171.95 | Blocked 234.234.234.234 |
Blocked 31.13.71.36 | Blocked 31.13.80.12 |
Blocked 45.12.112.91 | Blocked 68.67.180.12 |
Blocked 68.83.169.91 | Blocked 82.102.21.123 |
Blocked 89.89.89.89 | Blocked 91.108.245.232 |
Blocked 91.236.51.44 | |
Blocking process completed.
M4.2—Dynamic Endpoint Defense
The M4.2 submodule is responsible for synchronizing confirmed malicious IP addresses with endpoint agents in real time, leveraging the native integration capabilities of the Wazuh platform. By maintaining an up-to-date block list within the Wazuh Manager, the system ensures that all connected endpoints receive and enforce perimeter rules dynamically, supporting distributed protection in a scalable and automated fashion.
This defense mechanism enhances MICRA’s architecture by extending threat mitigation beyond the network perimeter (M4.1), reaching down to each enrolled endpoint in a coordinated and secure manner.
Implementation Architecture
The Wazuh Manager runs inside a Docker container tagged wazuh.manager. The MICRA core system (vm-core) executes the script m42_sync_endpoint_wazuh.py, which performs the following operations:
Queries the PostgreSQL validated_iocs table for up to 1000 recently validated malicious IPs, excluding private ranges;
Generates a block list file in the expected Wazuh format;
Packages the list as a tar archive and transfers it into the container directory /var/ossec/etc/shared/lists/;
Restarts the Wazuh Manager via wazuh-control, triggering automatic distribution to all agents.
This workflow supports secure and idempotent updates without requiring direct access to the host filesystem or containers.
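The container-side deployment can be sketched with the Docker SDK as follows. This is illustrative: the one-key-per-line list format follows Wazuh's CDB-list convention but is an assumption here, and the container name is taken from the execution log below.

```python
# Minimal sketch of the Wazuh block-list deployment (illustrative).
import io
import tarfile
import docker

ips = ["91.236.51.44", "45.12.112.91"]                # normally read from validated_iocs
payload = "".join(f"{ip}:\n" for ip in ips).encode()  # assumed CDB list: one key per line

# put_archive() expects a tar stream, so the list is packaged in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo(name="blocklist_20250731.lst")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

client = docker.from_env()
container = client.containers.get("single-node-wazuh.manager-1")
container.put_archive("/var/ossec/etc/shared/lists/", buf.read())
container.exec_run("/var/ossec/bin/wazuh-control restart")  # propagate to agents
```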
Input Files
PostgreSQL Table: validated_iocs
Output Files
blocklist_<DATE>.lst (e.g., blocklist_20250731.lst), placed under /var/ossec/etc/shared/lists/ in the Wazuh Manager container: this file is automatically synchronized with all Wazuh agents and enforces dynamic blocking policies.
Reproducibility Instructions
1. Ensure prerequisites are met:
- PostgreSQL service running and accessible from vm-core;
- Docker installed and the Wazuh Manager container active;
- Environment variable PG_DSN set with the proper connection string.
2. Install required Python packages:
pip install psycopg2-binary docker
3. Execute the script:
python m42_sync_endpoint_wazuh.py
Execution Results
Generated: /tmp/tmpu9xdiloz → blocklist_20250731.lst
Target container: single-node-wazuh.manager-1 (OK)
Total IPs in database: 31
IPs synced (latest 1000): 31
Top 10 IPs to sync:
82.102.21.123
193.233.171.95
185.103.100.63
91.236.51.44
91.108.245.232
45.12.112.91
134.35.9.209
139.195.43.166
185.143.223.69
68.83.169.91
Copied to single-node-wazuh.manager-1:/var/ossec/etc/shared/lists/blocklist_20250731.lst
Manager restarted—rules will propagate in ~1 min
M4.3—Dynamic Network Intrusion Prevention
The M4.3 submodule dynamically updates network-level intrusion prevention mechanisms by generating a Suricata ruleset based on the centralized threat intelligence repository. Its goal is to ensure that newly validated malicious IPs are automatically transformed into blocking rules and distributed to perimeter sensors.
This module connects to the PostgreSQL database to extract all malicious IPs stored in the validated_iocs table. Each IP is converted into a Suricata drop rule and saved to a dedicated rules file named micra_ioc.rules. The rules are remotely deployed via SSH to the vm-sensor machine, where Suricata runs inside a Docker container. Once transferred, the module triggers a rule reload to ensure immediate enforcement.
The ruleset uses a reserved SID range (from 6,000,000 onward) and includes metadata such as timestamp and classification for easier traceability.
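The rule-generation loop reduces to a few lines, sketched below. This is illustrative: the message text and header comment may differ slightly from the real micra_ioc.rules produced by m43_sync_suricata_rules.py.

```python
# Minimal sketch of the Suricata rule generation (illustrative).
from datetime import date

SID_BASE = 6_000_000  # reserved SID range for MICRA-generated rules

def build_rules(ips):
    lines = [f"# MICRA IoC ruleset - generated {date.today().isoformat()}"]
    for offset, ip in enumerate(sorted(ips), start=1):
        lines.append(
            f'drop ip any any -> {ip} any '
            f'(msg:"MICRA IOC - Malicious IP {ip}"; '
            f'sid:{SID_BASE + offset}; rev:1; classtype:trojan-activity;)'
        )
    return "\n".join(lines) + "\n"

with open("micra_ioc.rules", "w") as fh:
    fh.write(build_rules(["91.108.245.232"]))   # normally read from validated_iocs
```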
Input Files
PostgreSQL table validated_iocs: source of all malicious IPs converted into rules.
Output Files
micra_ioc.rules: Suricata ruleset deployed to /opt/suricata/rules/ on vm-sensor. Example rule:
drop ip any any -> 91.108.245.232 any (msg:"MICRA IOC - Malicious IP 91.108.245.232"; sid:6000001; rev:1; classtype:trojan-activity;)
Reproducibility Instructions
1. Install dependencies:
pip install psycopg2-binary paramiko
2. Set environment variables:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export SENSOR_HOST="vm-sensor"
export SSH_KEY="/home/linuxman/.ssh/id_rsa"  # adjust as needed
export SSH_USER="linuxman"
3. Ensure permissions on vm-sensor: the SSH user must have write access to /opt/suricata/rules (for testing, this can be granted via chmod 777).
4. Execute the script:
python m43_sync_suricata_rules.py
Execution Results
File deployed to vm-sensor:/opt/suricata/rules/micra_ioc.rules
Total rules written: 31
Top 10 IPs:
82.102.21.123
193.233.171.95
185.103.100.63
91.236.51.44
91.108.245.232
45.12.112.91
134.35.9.209
139.195.43.166
185.143.223.69
68.83.169.91
Suricata reloaded successfully: {"message": "done", "return": "OK"}
31 rules successfully active in Suricata.
M5.1—SQL Threat Search Engine
The M5.1—SQL Threat Search Engine module implements a modular hunt generation and ingestion engine, designed to convert validated threat intelligence indicators (IoCs) into structured SQL queries. These queries, referred to as hunts, are stored and versioned for execution in Security Information and Event Management (SIEM) platforms or PostgreSQL-compatible telemetry databases.
The hunt engine allows analysts and automated systems to investigate patterns of malicious behavior retrospectively using structured queries. Each hunt is versioned with metadata (e.g., title, severity, tags) and saved both in .sql and .yml formats. Once results are collected from the SIEM or SQL engine, they can be ingested back into the MICRA system for correlation, historical tracking, and triage.
This submodule is composed of two main scripts:
m51_build_hunts.py: Generates SQL hunt templates using validated IP-based IoCs.
m51_ingest_hunts.py: Loads the CSV outputs of executed hunts into the MICRA database (suspect_historical_sql table) for storage and further analysis.
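A condensed sketch of the hunt builder is shown below. The telemetry table queried by the hunt (network_flows) is hypothetical, since the actual target schema depends on the SIEM or telemetry database in use; the metadata fields mirror the description above.

```python
# Minimal sketch of hunt generation (illustrative; network_flows is hypothetical).
import os
import yaml

os.makedirs("hunts_out", exist_ok=True)

ioc_contact_sql = """
SELECT src_ip, dst_ip, ts
FROM network_flows
WHERE dst_ip IN (SELECT ioc FROM validated_iocs
                 WHERE verdict = 'malicious' AND ioc_type = 'ip');
"""

hunt_meta = {"id": "ioc_contact",
             "title": "Contact with known malicious IPs",
             "severity": "critical",
             "description": "Flows whose destination matches a validated IoC."}

with open("hunts_out/ioc_contact.sql", "w") as fh:
    fh.write(ioc_contact_sql)
with open("hunts_out/hunts.yml", "w") as fh:
    yaml.safe_dump({"hunts": [hunt_meta]}, fh, sort_keys=False)
```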
Input Files
Validated IoCs: Malicious IP addresses stored in the PostgreSQL table validated_iocs (populated by modules M3.1, M3.2, etc.).
Executed Hunt Results (CSV): CSV files resulting from hunt execution via SIEM or manual query execution.
Output Files
hunts_out/bruteforce_auth.sql: SQL query to detect brute-force authentication attempts.
hunts_out/ioc_contact.sql: SQL query to detect contact with known malicious IPs.
hunts_out/hunts.yml: Metadata for all generated hunts, including ID, title, severity, and description.
Populated PostgreSQL table suspect_historical_sql with ingestion of hunt results via CSV files.
Reproducibility Instructions
1. Install dependencies:
pip install pyyaml psycopg2-binary sqlparse
2. Set the environment variable for the database connection:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
3. Generate hunts (SQL + metadata):
python m51_build_hunts.py
This command generates .sql files and a metadata file under hunts_out/.
4. Execute the hunts manually or via SIEM:
psql micra < hunts_out/ioc_contact.sql > results_m51/ioc_contact_result.csv
Ensure the result is saved in CSV format.
5. Ingest hunt results:
python m51_ingest_hunts.py --in-dir results_m51 --tag "Q2 Threat Hunt"
This step inserts the hunt results into the table suspect_historical_sql, preserving metadata such as source IP, destination IP, timestamp, and hunt ID.
Execution Results
After running the full cycle of M5.1, the following results were obtained:
- 2 hunts generated: bruteforce_auth.sql (high severity) and ioc_contact.sql (critical severity);
- Metadata exported to hunts_out/hunts.yml;
- CSV results ingested with tag "Q2 Threat Hunt": ioc_contact_result.csv (17 rows inserted) and bruteforce_auth_result.csv (5 rows inserted);
- Table suspect_historical_sql contains enriched data including IPs, timestamps, and original CSV rows (stored as JSON in the extra field).
The modular design allows for easy extension with new hunt types, as well as automatic or scheduled ingestion of detection results for long-term tracking and visualization.
M5.2—Malware Pattern Search Engine
The M5.2 submodule is designed to retroactively identify malware artifacts across local or historical file repositories using pattern-matching technologies such as YARA and Sigma. It operationalizes structured detection strategies to scan for previously validated threat indicators—particularly file hashes (SHA-256), suspicious strings, and behavioral signatures extracted from honeypot sessions. This process supports malware triage, incident forensics, and historical threat correlation.
Two distinct scripts were developed for this module:
m52_build_sigma_from_cowrie.py: generates Sigma rules from the Cowrie logs; the resulting file is ready for SIEM conversion via tools such as sigma convert.
m52_build_yara_from_cowrie.py: generates YARA rules with malware indicators; these rules can be used to scan local file systems using the yara CLI tool.
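For illustration, a hash-based rule of the kind generated by m52_build_yara_from_cowrie.py might look as follows; the rule name, metadata, and SHA-256 value are placeholders, not indicators from the actual capture.

```yara
import "hash"

// Hypothetical example of a generated hash rule; the SHA-256 below is a
// placeholder value, not an indicator extracted from cowrie.json.
rule micra_sha256_sample_1
{
    meta:
        source = "cowrie"
        generated = "2025-07-31"
    condition:
        hash.sha256(0, filesize) == "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
}
```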
Input Files
cowrie.json: Honeypot log with attacker sessions (source for pattern generation)
scan_yara.csv: Output of a retroactive scan using yara -r…
Output Files
rules_out/m52_sigma_rules_iocs_<DATE>.yml: Sigma rules generated from Cowrie logs
rules_out/m52_yara_rules_iocs_<DATE>.yar: YARA rules with malware indicators
validated_iocs (PostgreSQL): Updated with new SHA-256 hits (source: yara-hunt)
Reproducibility Instructions
1. Install dependencies:
pip install psycopg2-binary pyyaml sqlparse
2. Generate Sigma rules:
python m52_build_sigma_from_cowrie.py
3. Generate YARA rules:
python m52_build_yara_from_cowrie.py
4. Run a scan using YARA (example):
yara -r rules_out/m52_yara_rules_iocs_<DATE>.yar /target_dir > scan_yara.csv
5. Ingest results into PostgreSQL:
export PG_DSN="postgresql://user:pass@host:5432/micra"
python m52_ingest_yara_results.py scan_yara.csv
Execution Results
Created rules for:
- ○
12 individual SHA-256 hashes
- ○
3 URL/IP clusters (60 IOCs total)
- ○
1 generic dropper rule
- ○
1 SSH banner detection rule
Output saved to: rules_out/m52_sigma_rules_iocs_2025-07-31.yml
Created rules for:
- ○
12 SHA-256 hashes
- ○
60 URLs and IPs in 3 grouped rules
- ○
1 combined dropper rule
- ○
1 SSH banner detection rule
Output saved to: rules_out/m52_yara_rules_iocs_2025-07-31.yar
After scanning:
- ○
SHA-256 ingested: 4 new, 2 updated, 3 skipped.
- ○
All inserted entries tagged as source = ‘yara-hunt’ and verdict = ‘malicious’
M6.1—Internal Data Intelligence Hub
The M6.1 submodule is responsible for exporting validated threat intelligence from the MICRA repository to the internal MISP (Malware Information Sharing Platform) instance for collaborative enrichment, correlation, and visualization. It consolidates strategic indicators of compromise (IoCs) classified as malicious in the PostgreSQL database and publishes them as new structured events in the MISP platform, tagged and categorized for use in IDS/IPS signatures, threat analysis, or forensic triage.
This mechanism bridges the internal threat validation architecture (Module M3.3) with a dedicated CTI platform (MISP), supporting traceability, collaboration, and long-term threat knowledge accumulation.
The submodule is executed from the vm-core node and publishes events directly to the vm-misp instance, assuming the existence of valid API credentials and appropriate visibility configurations (e.g., org-only or sharing group ID).
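The publishing step can be sketched with PyMISP as follows. This is illustrative: the event name is an assumption, while the ip-dst attribute type and the tags mirror the description above.

```python
# Minimal sketch of the MISP publishing step (illustrative).
import os
from pymisp import PyMISP, MISPEvent

misp = PyMISP("http://vm-misp", os.environ["MISP_KEY"], ssl=False)

event = MISPEvent()
event.info = "MICRA validated IoCs"       # assumed event name
event.add_tag("validated")
event.add_tag("ids:signature")

# Normally the (ip, rationale) pairs are read from validated_iocs.
for ip, rationale in [("91.236.51.44", "flagged by M3.1")]:
    event.add_attribute("ip-dst", ip, comment=rationale or "", to_ids=True)

misp.add_event(event)
```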
Input Files
PostgreSQL table validated_iocs: Contains all malicious IoCs validated in MICRA
Output Files
New MISP event: Event with ip-dst attributes for each malicious IoC
Event metadata: Includes rationale, date, and MISP tags (validated, ids:signature)
Reproducibility Instructions
1. Configure environment variables:
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
export MISP_KEY="your_api_key_here"
export SHARING_ID="1"  # optional: ID of sharing group
2. Run the publishing script:
python m61_ioc_m33_to_misp.py
3. Expected behavior:
- All malicious IPs are extracted from the validated_iocs table;
- A new event is created in the MISP instance;
- Each IP is added as an attribute (ip-dst) with optional analyst commentary.
Execution Results
Example output for an execution containing 31 indicators:
✓ Published event 325 with 31 IoCs to MISP at http://vm-misp
Each IoC is added with its rationale (if present), and the event is tagged appropriately. If no new IoCs are present, the script will return:
✓ No new IoCs—nothing to publish.
This design supports repeated execution without duplication due to MISP’s internal deduplication and MICRA’s control over verdict assignment.
M6.2—External Data Intelligence Hub
The M6.2 submodule enables bidirectional threat intelligence sharing between MICRA and external partners, such as ISACs or trusted MISP nodes. Its core objective is to ingest externally received IoCs into MICRA’s validation pipeline and to automatically export internally validated indicators back to partner nodes in compliance with configured sharing policies.
This architecture reinforces MICRA’s ability to operate as both a consumer and a provider of cyber threat intelligence (CTI), while ensuring that all indicators follow the same internal scrutiny path regardless of their origin.
The process is performed manually during the MVP phase, using the script m63_ioc_externo_misp_to_m33.py, which extracts new indicators tagged as status:new_external in the local MISP and inserts them into the validation_queue table for reprocessing by the M3.x validation modules.
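The ingestion logic can be sketched as follows. This is illustrative: the validation_queue column layout is an assumption, while the tag filter mirrors the description above.

```python
# Minimal sketch of the external-IoC ingestion (illustrative).
import os
import psycopg2
from pymisp import PyMISP

misp = PyMISP("http://vm-misp", os.environ["MISP_KEY"], ssl=False)
attributes = misp.search(controller="attributes",
                         tags="status:new_external", pythonify=True)

conn = psycopg2.connect(os.environ["PG_DSN"])
with conn, conn.cursor() as cur:
    for attr in attributes:
        cur.execute(
            "INSERT INTO validation_queue (ioc, ioc_type, source) "
            "VALUES (%s, %s, 'misp-external') ON CONFLICT DO NOTHING",
            (attr.value, attr.type),
        )
print(f"✓ {len(attributes)} IoCs inserted into validation_queue")
```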
Input Files
MISP attributes with tag status:new_external—Incoming threat intelligence tagged by automation rules
Output Files
validation_queue: Temporary queue for unverified IoCs to be processed by M3.x
Reproducibility Instructions
1. Configure required environment variables:
export MISP_KEY="your_api_key"
export PG_DSN="postgresql://micra_user:micra_pass@localhost:5432/micra"
2. Execute the ingestion script on vm-core:
python m63_ioc_externo_misp_to_m33.py
3. Confirm insertion into validation_queue; expected output:
✓ 23 IoCs inserted into validation_queue
Execution Results
After execution, all external IoCs tagged as status:new_external in MISP are routed to MICRA’s internal validation pipeline. If properly configured, the complete flow proceeds as follows:
1. External IoCs appear in MISP.
2. Script m63_ioc_externo_misp_to_m33.py inserts them into validation_queue.
3. The validation pipeline (M3.x) processes the entries and updates validated_iocs.
4. Validated indicators are published internally (M6.1) and externally via MISP sync.
This design supports reproducible, rule-based CTI sharing under trust boundaries defined by TLP and tag-based filtering.