Author Contributions
Conceptualization, S.S.B., D.M., S.C.B., D.H.S. and F.M.; methodology, D.H.S. and F.M.; software, D.H.S. and F.M.; validation, S.S.B., D.M., S.C.B., D.H.S. and F.M.; formal analysis, D.H.S. and F.M.; investigation, D.H.S. and F.M.; resources, S.S.B., D.M. and S.C.B.; data curation, D.H.S. and F.M.; writing (original draft preparation), D.H.S.; writing (review and editing), S.S.B., D.M., S.C.B. and D.H.S.; visualization, D.H.S. and F.M.; supervision, S.S.B., D.M. and S.C.B.; project administration, S.S.B., D.M. and S.C.B.; funding acquisition, S.S.B., D.M. and S.C.B. All authors have read and agreed to the published version of the manuscript.
Figure 1.
Preprocessing flow chart.
Figure 2.
Attributes before and after preprocessing.
Figure 3.
Cypher query: creating the nodes and edges.
Figure 4.
Cypher query: creating the network.
Figure 5.
Cypher query: generating the PageRank scores.
Figure 6.
Cypher query: setting in- and out-degree.
Figure 7.
Cypher query for count of bridges.
Figure 8.
Cypher query returning the number of nodes and edges.
Figure 9.
Node and edge cardinality for tactics of interest.
Figure 10.
Query returning paths 2 to 5 edges long.
Figure 11.
A node of the Reconnaissance tactic to destination addresses.
Figure 12.
Node of None (benign traffic) with path length of 2.
Figure 13.
Nodes of the Discovery tactic to destination addresses.
Figure 14.
UWF-ZeekData22 visualization.
Figure 15.
Showing 10,000 nodes of the benign data.
Figure 16.
Showing 10,000 nodes of “none” tactic.
Figure 17.
Highlight of a highly connected node of “none” (benign data).
Figure 18.
Cypher query for feature selection.
Figure 19.
GATJK source node classification metrics.
Figure 20.
GATJK default source node classification loss.
Figure 21.
GATJK learning rate source node classification metrics.
Figure 22.
GATJK learning rate source node classification loss.
Figure 23.
GATJK source classification learning rate comparison.
Figure 24.
GATJK epoch source node classification metrics.
Figure 25.
GATJK epoch source node classification loss.
Figure 26.
GATJK epoch destination node classification metrics.
Figure 27.
GATJK epoch destination node classification loss.
Figure 28.
GraphSAGE source node classification metrics.
Figure 29.
GraphSAGE source node classification loss.
Figure 30.
GraphSAGE destination node classification metrics.
Figure 31.
GraphSAGE destination node classification loss.
Figure 32.
GATv2 source node classification metrics.
Figure 33.
GATv2 source node classification loss.
Figure 34.
GATv2 destination node classification metrics.
Figure 35.
GATv2 destination node classification loss.
Table 1.
Count by Tactic: UWF-ZeekData22.
| Tactic | UWF-ZeekData22 Count |
|---|---|
| None | 9,281,599 |
| Reconnaissance | 9,278,722 |
| Discovery | 2086 |
| Credential Access | 31 |
| Privilege Escalation | 13 |
| Exfiltration | 7 |
| Lateral Movement | 4 |
| Resource Development | 3 |
| Persistence, Initial Access, Defense Evasion | 1 |
| Execution | 0 |
| Command and Control | 0 |
| Defense Evasion | 0 |
| Initial Access | 0 |
| Initial Access, Persistence | 0 |
Table 2.
PageRank execution time.
| Tactic | Row Count | Execution Time (ms) |
|---|---|---|
| None | 9,281,599 | 13,554 |
| Reconnaissance | 9,278,722 | 13,198 |
| Discovery | 2086 | 47 |
Table 3.
Reconnaissance: top 10 PageRank scores.
| Address | PageRank Score |
|---|---|
| 143.88.5.1:53 | 0.036933 |
| 143.88.7.1:443 | 0.002461 |
| 143.88.7.12:8080 | 0.001996 |
| 143.88.7.11:631 | 0.001677 |
| 143.88.7.15:135 | 0.001384 |
| 143.88.2.10:80 | 0.000762 |
| 143.88.7.12:22 | 0.000697 |
| 143.88.2.10:443 | 0.000673 |
| 143.88.7.12:80 | 0.000621 |
| 143.88.7.11:21 | 0.000617 |
Table 4.
Discovery: top 10 PageRank scores.
| Address | PageRank Score |
|---|---|
| 143.88.2.12:22 | 0.003656 |
| 143.88.2.12:1 | 0.001554 |
| 143.88.2.12:443 | 0.000954 |
| 143.88.2.12:80 | 0.000954 |
| 143.88.7.10:3 | 0.000909 |
| 143.88.2.12:1999 | 0.000654 |
| 143.88.2.12:5907 | 0.000654 |
| 143.88.2.12:8443 | 0.000654 |
| 143.88.2.12:3945 | 0.000654 |
| 143.88.2.12:8001 | 0.000654 |
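
To make the PageRank scores in Tables 3 and 4 easier to interpret, the following is a minimal sketch of power-iteration PageRank over a directed edge list of address:port nodes. It is not the Cypher query used in this work: the damping factor of 0.85, the fixed iteration count, the handling of nodes without outgoing edges, and the toy addresses are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank on a directed edge list (illustrative sketch)."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # Assumption: a node with no outgoing edges spreads its rank evenly.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Hypothetical flows: three sources query one DNS-like destination.
toy_edges = [
    ("10.0.0.2:50001", "10.0.0.1:53"),
    ("10.0.0.3:50002", "10.0.0.1:53"),
    ("10.0.0.4:50003", "10.0.0.1:53"),
    ("10.0.0.2:50004", "10.0.0.5:443"),
]
scores = pagerank(toy_edges)
top_node = max(scores, key=scores.get)
print(top_node, round(scores[top_node], 4))
```

In this toy graph the destination that receives the most connections ends up with the highest score, which mirrors how a heavily targeted address:port such as a DNS endpoint can dominate a ranking.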
Table 5.
In-degree centrality as tactics subgraphs.
| Reconnaissance Address–Port | Reconnaissance In-Degree | None Address–Port | None In-Degree |
|---|---|---|---|
| 143.88.5.1:53 | 27.50483817 | 10.0.10.1:53 | 27.6587867 |
| 143.88.7.15:135 | 3.174730286 | 143.88.11.1:53 | 2.40422696 |
| 143.88.7.1:443 | 1.936158381 | 8.8.8.8:53 | 1.87328573 |
| 143.88.7.12:8080 | 1.665665666 | 8.8.4.4:53 | 1.86731302 |
| 143.88.7.11:631 | 1.352463575 | 143.88.1.1:53 | 1.65291272 |
| 143.88.7.12:22 | 0.654877099 | ff02::1:2:547 | 0.27426719 |
| 143.88.7.15:1 | 0.654877099 | 143.88.255.10:53 | 0.22335524 |
| 143.88.7.11:80 | 0.512512513 | 143.88.0.41:53 | 0.10713022 |
| 143.88.7.11:21 | 0.498276054 | 172.28.128.255:138 | 0.08518095 |
| 143.88.7.12:80 | 0.441330219 | 172.28.128.255:137 | 0.06999827 |
Table 6.
Out-degree centrality as tactics subgraphs.
| Reconnaissance Address–Port | Reconnaissance Out-Degree | None Address–Port | None Out-Degree |
|---|---|---|---|
| 143.88.2.10:53565 | 14.69202536 | fe80::250:56ff:fe9e:5457:546 | 0.17432725 |
| 143.88.2.10:41562 | 14.46424202 | 172.28.128.3:138 | 0.08518095 |
| 143.88.2.10:58517 | 14.40729619 | 172.28.128.3:137 | 0.06999827 |
| 143.88.2.10:51130 | 14.33611389 | 143.88.11.10:3 | 0.05328951 |
| 143.88.2.10:38774 | 14.29340452 | 143.88.1.50:138 | 0.05276711 |
| 143.88.2.10:54736 | 14.29340452 | 143.88.11.10:68 | 0.05139322 |
| 143.88.2.10:35962 | 14.25069514 | 143.88.11.14:3 | 0.04899509 |
| 143.88.2.10:43921 | 14.23645868 | 143.88.1.50:137 | 0.04237246 |
| 143.88.2.10:62815 | 14.23645868 | 143.88.11.10:50888 | 0.0398427 |
| 143.88.2.10:44715 | 14.23645868 | 143.88.11.10:53887 | 0.03973987 |
Table 7.
Out-degree averages for UWF-ZeekData22.
| Tactic | Source Out-Degree Average | Destination In-Degree Average |
|---|---|---|
| Reconnaissance | 743.321 | 1673.07 |
| None (benign) | 40.031 | 814.657 |
| Credential Access | 1.292 | 4.429 |
| Discovery | 43.31 | 2.078 |
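
One way to read the averages in Table 7 is as the number of edges divided by the number of distinct source nodes (out-degree) or distinct destination nodes (in-degree). The sketch below computes both under that assumption; the exact definition used for Table 7 is not stated here, and the function name and toy edges are hypothetical.

```python
from collections import Counter


def degree_averages(edges):
    """Return (average source out-degree, average destination in-degree)."""
    # Count outgoing edges per source and incoming edges per destination.
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    avg_out = sum(out_degree.values()) / len(out_degree)
    avg_in = sum(in_degree.values()) / len(in_degree)
    return avg_out, avg_in


# Hypothetical subgraph for a single tactic: 4 edges, 3 sources, 2 destinations.
toy_edges = [("a", "x"), ("a", "y"), ("b", "x"), ("c", "x")]
avg_out, avg_in = degree_averages(toy_edges)
print(avg_out, avg_in)
```

Under this reading, a tactic with few destinations and many edges, as with Reconnaissance in Table 7, produces a destination in-degree average that is larger than its source out-degree average.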
Table 8.
Bridge counts by tactic.
| Tactic | Bridge Count |
|---|---|
| Reconnaissance | 2 |
| None (Benign) | 5742 |
| Credential Access | 18 |
| Discovery | 1866 |
Table 9.
Weakly connected components.
| Tactic | Components Count |
|---|---|
| Reconnaissance | 930 |
| None (Benign) | 34 |
| Credential Access | 7 |
| Discovery | 4 |
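
The component counts in Table 9 treat the graph as undirected: two nodes belong to the same weakly connected component if a path joins them once edge direction is ignored. The following is a minimal union-find sketch of that count on a toy edge list. It illustrates the concept only and is not the graph database procedure used to produce Table 9; the function name and edges are hypothetical.

```python
def count_weak_components(edges):
    """Count weakly connected components of a directed edge list."""
    parent = {}

    def find(node):
        # Register unseen nodes as their own root, then walk up to the root.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            # Direction is ignored: either endpoint may absorb the other.
            parent[root_src] = root_dst

    return len({find(node) for node in list(parent)})


# Hypothetical edges: {a, b, c} form one component and {d, e} another.
toy_edges = [("a", "b"), ("c", "b"), ("d", "e")]
component_count = count_weak_components(toy_edges)
print(component_count)
```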
Table 10.
Path length.
| Tactic | Edges |
|---|---|
| None (Benign) | 9,281,599 |
| Reconnaissance | 9,278,722 |
| Discovery | 2086 |
| Credential Access | 31 |
| Resource Development | 0 |
Table 11.
Example of node classification training parameters.
| Parameter | Value |
|---|---|
| Aggregator | Mean |
| checkpoint_freq | 5 |
| console_log_freq | 5 |
| device_type | cuda |
| hidden_features_size | [16, 16] |
| layer_type | GATJK |
| learning_rate | 0.1 |
| Metrics | [“loss”,“accuracy”,“f1_score”,“precision”,“recall”,“num_wrong_examples”] |
| node_id_property | id |
| num_epochs | 100 |
| path_to_model | /tmp/torch_models/model_GATJK_ |
| split_ratio | 0.8 |
| weight_decay | 0.0005 |
Table 12.
Tactic label values.
| Tactic | Label |
|---|---|
| None | 1 |
| Reconnaissance | 2 |
| Discovery | 3 |
| Credential Access | 4 |
| Discovery, Reconnaissance | 5 |
| Reconnaissance, none | 6 |
| no_conn | 7 |
Table 13.
Node property descriptions.
| Node Property | Description | Example Value |
|---|---|---|
| address | Concatenation of IP address and port | 143.88.7.10:54124 |
| dest_class | Numeric label of node as a destination of a tactic | 7 |
| dest_credential_access | Binary label of node as destination of Credential Access tactic | 0 |
| dest_discovery | Binary label of node as destination of Discovery tactic | 0 |
| dest_no_conn | Binary label of node as destination of no connections | 1 |
| dest_none | Binary label of node as destination of None tactic (benign, normal connection) | 0 |
| dest_reconnaissance | Binary label of node as destination of Reconnaissance tactic | 0 |
| features | Array of feature values {in-degree, out-degree, PageRank} | Array [3]: [0, 0.0009735246917805615, 0.000002120804881073081] |
| in_degree | In-degree of node | 0 |
| out_degree | Out-degree of node | 0.000973525 |
| rank | PageRank of node | 2.1208 × 10⁻⁶ |
| src_class | Numeric label of node as a source of a tactic | 2 |
| src_credential_access | Binary label of node as a source of Credential Access tactic | 0 |
| src_discovery | Binary label of node as a source of Discovery tactic | 0 |
| src_no_conn | Binary label of node as a source of no connections | 0 |
| src_none | Binary label of node as a source of None tactic (benign, normal connection) | 0 |
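
The relationship between the numeric class labels of Table 12 and the binary node properties of Table 13 can be sketched as a simple lookup. The mapping below copies the label values from Table 12. How the combined labels (5 and 6) map onto the binary flags is not stated in these tables, so setting every named tactic's flag is an assumption, and the helper name `binary_flags` is hypothetical.

```python
# Numeric class label -> tactics it names (values taken from Table 12).
TACTIC_LABELS = {
    1: ("none",),
    2: ("reconnaissance",),
    3: ("discovery",),
    4: ("credential_access",),
    5: ("discovery", "reconnaissance"),
    6: ("reconnaissance", "none"),
    7: ("no_conn",),
}

# Suffixes of the binary node properties listed in Table 13.
FLAG_NAMES = ("credential_access", "discovery", "no_conn", "none", "reconnaissance")


def binary_flags(class_label, prefix):
    """Expand a numeric class label into per-tactic 0/1 flags.

    Assumption: a combined label sets the flag of every tactic it names.
    """
    active = TACTIC_LABELS[class_label]
    return {f"{prefix}_{name}": int(name in active) for name in FLAG_NAMES}


# The example node in Table 13 has dest_class 7 (no connections as a destination).
dest_flags = binary_flags(7, "dest")
print(dest_flags)
```

For the example node, only `dest_no_conn` is set, which agrees with the example values listed in Table 13.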
Table 14.
Best GATJK results.
| Metric | Source Node Classification | Destination Node Classification |
|---|---|---|
| Accuracy | 0.9851 | 0.9785 |
| F1 Score | 0.9834 | 0.9775 |
| Precision | 0.984 | 0.978 |
| Recall | 0.9851 | 0.9785 |
Table 15.
Best GraphSAGE results.
| Metric | Source Node Classification | Destination Node Classification |
|---|---|---|
| Accuracy | 0.9553 | 0.9681 |
| F1 Score | 0.9541 | 0.966 |
| Precision | 0.9536 | 0.9662 |
| Recall | 0.9553 | 0.9681 |
Table 16.
Best GATv2 results.
| Metric | Source Node Classification | Destination Node Classification |
|---|---|---|
| Accuracy | 0.4297 | 0.8928 |
| F1 Score | 0.5442 | 0.8992 |
| Precision | 0.8399 | 0.9113 |
| Recall | 0.4297 | 0.8928 |
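
In Tables 14 to 16, recall equals accuracy in every column. This is what support-weighted averaging of per-class recall produces, since the class weights cancel the per-class denominators. The averaging mode behind these tables is not stated, so weighted averaging is an assumption here; the sketch below only demonstrates the identity on hypothetical labels and shows that weighted precision need not match.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_recall(y_true, y_pred):
    """Per-class recall averaged with weights proportional to class support."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    return sum((support[c] / total) * (hits[c] / support[c]) for c in support)


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights proportional to class support."""
    support = Counter(y_true)
    predicted = Counter(y_pred)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    total = len(y_true)
    result = 0.0
    for c in support:
        # A class that is never predicted contributes a precision of 0.
        precision_c = hits[c] / predicted[c] if predicted[c] else 0.0
        result += (support[c] / total) * precision_c
    return result


# Hypothetical labels using the class values of Table 12.
y_true = [1, 1, 1, 1, 2, 2, 2, 3, 3, 7]
y_pred = [1, 1, 1, 2, 2, 2, 1, 3, 7, 7]

acc = accuracy(y_true, y_pred)
rec = weighted_recall(y_true, y_pred)
prec = weighted_precision(y_true, y_pred)
print(acc, rec, prec)
```

On these toy labels accuracy and weighted recall coincide while weighted precision differs, the same pattern as the GATv2 source column of Table 16, where precision is well above recall.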