Generative AI and LLMs for Critical Infrastructure Protection: Evaluation Benchmarks, Agentic AI, Challenges, and Opportunities
Abstract
1. Introduction
- Implementing an all-hazards approach to risk management, considering cyber and physical threats to critical infrastructure integrity.
- Integrating Incident Response (IR) strategies with Business Continuity Planning (BCP) to ensure seamless continuity of operations during and after security incidents.
- Adopting a consequence-management approach to manage the immediate and long-term impacts of critical infrastructure failures, including economic, societal, and environmental consequences.
- Regularly assessing the security status of CNIs and conducting penetration testing to identify vulnerabilities and weaknesses.
- Employing robust security-mitigation measures such as intrusion-detection systems, cryptography methods, firewalls, anti-virus software, and emerging security technologies like blockchain, Artificial Intelligence (AI), and machine learning.
- Establishing and enforcing policies for maintaining and updating software and hardware periodically to mitigate vulnerabilities arising from outdated systems.
- Providing comprehensive cybersecurity training to staff to enhance awareness and preparedness against cyber threats.
- Enforcing robust cybersecurity policies and operating procedures, ensuring compliance with regulatory frameworks and industry standards.
- Encouraging international cooperation and coordination to address cross-border cyber threats effectively, including information sharing and joint response efforts.
- Collaborating with industry experts and sharing threat intelligence to stay ahead of emerging cyber threats and vulnerabilities.
2. Reliability of Critical National Infrastructures
2.1. Weibull Analysis
2.2. Markov Chains
- Identifying every potential state of the system, including failure and normal states, to create a complete state space.
- Creating a transition probability matrix that gives the probability of moving from each state to every other state in a given time step. This matrix is the foundation of Markov Chain analysis.
- Specifying the rates of transition between the states, including the rates of failure and repair. For continuous-time Markov Chains, the time spent in each state is usually assumed to follow an exponential distribution.
- Determining the long-term behaviour of the system by calculating the steady-state probability of each system state. This involves solving the balance equations derived from the transition probability matrix.
- Employing steady-state probabilities to determine key reliability metrics such as mean time to repair (MTTR), system availability, and mean time to failure (MTTF).
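The steps above can be illustrated with a minimal sketch for a two-state (operational/failed) discrete-time chain. The per-step probabilities `p_fail` and `p_repair` are hypothetical values chosen for illustration, not figures from this paper, and the stationary distribution is approximated by power iteration rather than by solving the balance equations in closed form.

```python
def steady_state(P, tol=1e-14, max_iter=100_000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by repeatedly applying pi <- pi * P (power iteration)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Hypothetical per-step probabilities (assumptions for illustration only).
p_fail = 0.01    # operational -> failed
p_repair = 0.10  # failed -> operational

# State 0 = operational, state 1 = failed.
P = [
    [1 - p_fail, p_fail],
    [p_repair, 1 - p_repair],
]

pi = steady_state(P)
availability = pi[0]   # long-run fraction of time spent operational
mttf = 1 / p_fail      # mean number of steps until a failure occurs
mttr = 1 / p_repair    # mean number of steps until a repair completes
```

Under these assumed values the computed availability agrees with MTTF / (MTTF + MTTR) = 100 / 110, roughly 0.909, which is the usual cross-check for a two-state model.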
2.3. Monte Carlo Simulation
- Definition of System Parameters: Set up the initial system parameters (P), such as operating conditions, failure rates, and repair rates.
- Setup of the Simulation Model: Create a model that represents the system’s operational states over time, distinguishing normal operation from failure.
- Requirements for Failure: Define performance thresholds as the basis for failure criteria, and designate the system as failed when its performance falls below a given threshold.
- Stochastic Sampling: For every parameter, draw random samples from the corresponding probability distribution; for example, sample the failure rates from their assumed distribution.
- Iteration and Statistical Analysis: Conduct multiple simulation iterations (N) to observe various outcomes, and calculate the system reliability (R) and the time to failure from the simulated runs.
- Analysis of Results: Estimate the time to failure and the reliability over the specified period from the N simulated outcomes.
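The procedure above can be sketched in a few lines under simplifying assumptions. In this sketch, failure is modelled directly as a sampled time to failure from an exponential distribution instead of through a performance threshold, and the names `failure_rate`, `mission_time`, and `n_runs`, along with their values, are hypothetical. Reliability is estimated as the fraction of runs that survive the mission time, and the mean time to failure as the average of the sampled failure times.

```python
import math
import random


def simulate_reliability(failure_rate, mission_time, n_runs, seed=0):
    """Estimate reliability over mission_time and the mean time to failure
    by sampling exponentially distributed failure times."""
    rng = random.Random(seed)
    survived = 0
    total_ttf = 0.0
    for _ in range(n_runs):
        ttf = rng.expovariate(failure_rate)  # sampled time to failure
        total_ttf += ttf
        if ttf > mission_time:               # no failure before the mission ends
            survived += 1
    return survived / n_runs, total_ttf / n_runs


# Hypothetical inputs (assumptions for illustration only).
failure_rate = 1e-3   # failures per hour
mission_time = 500.0  # hours
n_runs = 20_000

reliability, mean_ttf = simulate_reliability(failure_rate, mission_time, n_runs)

# For an exponential model the exact reliability is known, which gives a sanity check.
analytic_reliability = math.exp(-failure_rate * mission_time)
```

Because the exponential case has a closed-form answer, comparing `reliability` with `analytic_reliability` shows how the estimate tightens as `n_runs` grows; for systems without a closed form, only the simulated estimate is available.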
3. Benchmarks for Evaluating LLMs in Cybersecurity
3.1. Cybersecurity in Industrial Control Systems
3.2. Network Operations Evaluation
3.3. Debugging Capabilities of LLMs
3.4. Security Knowledge Assessment
3.5. Code-Generation Security
3.6. Foundational Knowledge in Cybersecurity
3.7. Python Code Security
3.8. IT Operations Evaluation
3.9. Adversarial Code Vulnerabilities
3.10. Cognitive-Level Cybersecurity Tasks
3.11. Coding Assistant Vulnerabilities
3.12. Code Security Evaluation
3.13. Capture the Flag Challenges
3.14. Large-Scale Vulnerability Detection
3.15. Cyber-Attack Attribution
3.16. Expanded Cybersecurity Risks
4. Cybersecurity Issues
- Malware: Malicious software like viruses, worms, and Trojan horses can compromise the integrity, availability, and confidentiality of critical infrastructure systems. A malware program may be designed to steal sensitive data, disrupt operations, or allow attackers to take control of infrastructure assets remotely.
- Ransomware: Critical infrastructure has increasingly been targeted by ransomware attacks. Attackers disrupt operations and demand large ransom payments, resulting in financial losses and outages.
- Supply Chain Attacks: Critical infrastructure often depends on third-party vendors for hardware, software, and services. Supply chain attacks, in which attackers compromise suppliers to gain access to the target infrastructure, are becoming more common and are difficult to detect.
- Phishing: Phishing attacks target employees or system users in an attempt to obtain sensitive information, such as login credentials or financial information. By impersonating legitimate entities, such as utility providers and government agencies, attackers can use phishing emails or messages to gain access to critical infrastructure networks.
- Denial-of-Service: By overloading critical infrastructure systems with traffic, these attacks cause them to become slow or unresponsive. Distributed Denial-of-Service (DDoS) attacks, launched from many devices at once, can disrupt essential services like communication networks and online utilities.
- SQL Injection: Databases are targeted by SQL injection attacks that exploit vulnerabilities in web applications. Attackers can manipulate SQL queries to access, modify, or delete sensitive data stored in critical infrastructure systems.
- Zero-Day Exploits: Zero-day exploits can take advantage of previously unknown vulnerabilities in software or hardware that have not yet been patched. These vulnerabilities are exploited by attackers to gain unauthorized access to critical infrastructure systems, steal data, or disrupt operations before security patches are available.
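As an illustration of the SQL injection item in the list above, the following minimal sketch contrasts a query built by string concatenation with a parameterized query, using Python's standard `sqlite3` module and an in-memory database. The table name, columns, and input string are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE operators (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO operators VALUES (?, ?)",
    [("alice", "engineer"), ("bob", "admin")],
)

# Attacker-controlled input that tries to turn the WHERE clause into a tautology.
malicious = "nobody' OR '1'='1"

# Vulnerable: the input is concatenated into the SQL text and changes the query logic.
unsafe_sql = "SELECT name FROM operators WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is bound as a parameter and treated purely as data.
safe_rows = conn.execute(
    "SELECT name FROM operators WHERE name = ?", (malicious,)
).fetchall()
```

The concatenated query returns every row in the table, while the parameterized query returns none, because no operator is literally named `nobody' OR '1'='1`.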
5. Trust, Privacy, and Resilience
6. Securability
- The first state is when both the system (S) and the DR site are operating normally;
- The second state is when the system is down due to a malfunction or attack;
- The third state is when the DR site is switched off;
- The fourth state is when both S and DR are switched off.
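A rough sketch of how the four states above could be quantified is given below. It rests on strong assumptions that are not taken from this paper: the system and the DR site fail and recover independently, each has a known steady-state availability (`a_sys`, `a_dr`, both hypothetical values), and service is lost only when both are off.

```python
def four_state_probabilities(a_sys, a_dr):
    """Steady-state probabilities of the four (system, DR site) states,
    assuming the two fail and recover independently of each other."""
    return {
        "both_up": a_sys * a_dr,
        "system_down": (1 - a_sys) * a_dr,
        "dr_off": a_sys * (1 - a_dr),
        "both_off": (1 - a_sys) * (1 - a_dr),
    }


# Hypothetical availabilities (assumptions for illustration only).
probs = four_state_probabilities(a_sys=0.99, a_dr=0.95)

# Service is assumed to be lost only when the system and the DR site are both off.
service_availability = 1 - probs["both_off"]
```

With these assumed inputs the probability of both being off is 0.01 × 0.05 = 0.0005, so the service availability is 0.9995. If failures are correlated, for example when one attack affects both sites, the independence assumption no longer holds and a full transition model is needed.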
Collaborative Intelligence for Privacy-Preserving CIP
7. Generative AI and Large Language Models for Critical Infrastructure Protection
7.1. LLM Lifecycle for Critical Infrastructure Protection
7.1.1. Vision and Scope: Defining the Project’s Direction for CIP
- Objective Clarification: We establish the model’s role in protecting critical infrastructure. ‘Will it analyze threat intelligence, aid vulnerability assessments, or assist in emergency response?’ Setting a clear, CIP-focused objective will guide the development process.
- Scope Determination: We identify which critical infrastructure sectors the LLM will focus on, such as energy, water, and transportation. Different sectors may require different types of data and domain knowledge.
7.1.2. Model Selection: Tailoring to CIP Requirements
- Security and Reliability: We choose an existing model or develop a new one that emphasizes security and data privacy, which are essential for CIP applications.
- Domain Adaptation: We decide whether to adapt an existing LLM or train a new one with a dataset enriched with CIP-related content.
7.1.3. Model’s Performance and Adjustment: Ensuring CIP Efficacy
- Performance Assessment: We evaluate the model’s ability to identify, classify, and predict threats to critical infrastructure.
- Adjustment for CIP: We focus adjustments on enhancing the model’s capability to deal with the specific nuances of critical infrastructure threats. This could involve prompt engineering with CIP-specific prompts or further fine-tuning on targeted datasets.
7.1.4. Evaluation and Iteration: Refining for CIP Precision
- CIP-Specific Metrics: We use evaluation metrics that reflect the model’s performance in a CIP context—threat-detection accuracy, response speed, and ability to work with domain-specific data.
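As a minimal sketch of the threat-detection accuracy metric mentioned above, the following computes accuracy, precision, recall, and F1 from binary labels (1 = threat, 0 = benign). The label vectors are made-up examples, and the choice of these four metrics is an assumption about what a CIP evaluation might report.

```python
def detection_metrics(y_true, y_pred):
    """Compute basic binary-classification metrics for threat detection."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical ground truth and model predictions (illustration only).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = detection_metrics(y_true, y_pred)
```

In a CIP setting, recall is often watched more closely than accuracy, since a missed threat (false negative) usually costs more than a false alarm.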
7.1.5. LLM Deployment: Launching the LLM Model for CIP
7.2. Predictive Analysis and Threat Intelligence: The Case of Energy Grid-Protection
7.3. Automated Incident Response: Enhancing Pipeline Security
7.4. Enhancing Communication and Coordination: Water-Treatment Facility Case Study
7.5. Challenges and Considerations
7.5.1. Building an Instruction Cybersecurity Dataset
7.5.2. Pre-Training Models
7.5.3. Supervised Fine-Tuning
7.5.4. Reinforcement Learning from Human Feedback
7.5.5. Quantization
7.5.6. Retrieval-Augmented Generation
7.5.7. Inference Optimization
8. Agentic AI for Critical Infrastructure Protection
8.1. Real-Time Anomaly Detection and Threat Mitigation
8.2. Intelligent Incident Response and Recovery
8.3. Proactive Resilience and Predictive Maintenance
8.4. ML-Assisted Protection in Critical Infrastructure
8.5. Automated Policy Enforcement and Compliance
8.6. Multi-Stakeholder Collaboration and Incident Coordination
8.7. Ethical, Secure, and Trustworthy AI for CIP
8.8. AI Ethics
8.9. Bias Mitigation in AI for CIP
- Data rebalancing ensures that diverse and representative datasets are used for training AI models, applying techniques such as synthetic data augmentation and resampling to balance underrepresented cases.
- Fairness-aware model training includes bias correction algorithms, such as adversarial debiasing and reweighting methods, to ensure that the AI system does not systematically favour or neglect specific categories.
- Regular auditing of AI models for bias using explainable AI techniques identifies and corrects potential discriminatory patterns.
- Active learning continuously updates AI models with real-world data to improve their adaptability and reduce bias over time.
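One of the simplest reweighting methods mentioned in the list above is inverse-frequency class weighting, sketched below. The label names and counts are hypothetical, and this shows only one basic rebalancing heuristic (the same formula as the common "balanced" class-weight rule), not adversarial debiasing or any method specific to this paper.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so that
    rarer classes contribute as much to the training loss as common ones."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical, imbalanced training labels (illustration only).
labels = ["normal"] * 8 + ["attack"] * 2
weights = balanced_class_weights(labels)

# Total weight carried by each class after reweighting.
weighted_totals = {cls: weights[cls] * labels.count(cls) for cls in weights}
```

Here the underrepresented `attack` class receives weight 2.5 and the `normal` class 0.625, so each class carries the same total weight (5.0) during training.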
9. Future Directions
Challenges and Real-World Adoption of Emerging Technologies
10. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CNI | Critical National Infrastructure |
DDoS | Distributed Denial-of-Service |
APT | Advanced Persistent Threat |
ICS | Industrial Control System |
IR | Incident Response |
BCP | Business Continuity Planning |
AI | Artificial Intelligence |
CIP | Critical Infrastructure Protection |
LLM | Large Language Model |
MTTF | Mean Time to Failure |
MTTR | Mean Time to Restore, Respond, or Repair |
NetOps | Network Operations |
CWE | Common Weakness Enumeration |
FDSP | Feedback-Driven Security Patching |
CTF | Capture the Flag |
DIA | Dynamic Intelligence Assessment |
NER | Named Entity Recognition |
GDPR | General Data Protection Regulation |
HIPAA | Health Insurance Portability and Accountability Act |
PDPA | Personal Data Protection Act |
SSCA | Safety and Security Co-analysis |
MTTA | Mean Time to Attack |
SMPC | Secure Multi-Party Computation |
RLHF | Reinforcement Learning from Human Feedback |
NIS | Network and Information System |
AR | Augmented Reality |
RAG | Retrieval-Augmented Generation |
References
- Critical National Infrastructure. Available online: https://www.npsa.gov.uk/critical-national-infrastructure-0 (accessed on 3 July 2024).
- 130+ Cybersecurity Statistics to Inspire Action This Year [2024 Update]. Available online: https://secureframe.com/blog/cybersecurity-statistics (accessed on 18 May 2024).
- Shifting Attack Landscapes and Sectors in Q1 2024 with a 28% Increase in Cyber Attacks Globally. Available online: https://blog.checkpoint.com/research/shifting-attack-landscapes-and-sectors-in-q1-2024-with-a-28-increase-in-cyber-attacks-globally/ (accessed on 18 May 2024).
- Cybercrime Expected to Skyrocket in Coming Years. Available online: https://www.statista.com/chart/28878/expected-cost-of-cybercrime-until-2027/ (accessed on 18 May 2024).
- High-Impact Attacks on Critical Infrastructure Climb 140%. Available online: https://securityintelligence.com/news/high-impact-attacks-on-critical-infrastructure-climb-140/ (accessed on 15 May 2024).
- What Is Industry 4.0 and How Does It Work? Available online: https://www.ibm.com/topics/industry-4-0 (accessed on 27 June 2023).
- BlackEnergy APT Attacks in Ukraine. Available online: https://www.kaspersky.com/resource-center/threats/blackenergy (accessed on 27 June 2023).
- Twenty Years of Cyberattacks on the World of Water. Available online: https://www.stormshield.com/news/twenty-years-of-cyber-attacks-on-the-world-of-water/ (accessed on 27 June 2023).
- Kansas Man Indicted in Connection with 2019 Hack at Water Utility. Available online: https://cyberscoop.com/kansas-ellsworth-water-district-hack-travnichek/ (accessed on 27 June 2023).
- Cyber-Attacks and Data Breaches in Review: May 2021. Available online: https://www.itgovernance.eu/blog/en/cyber-attacks-and-data-breaches-in-review-may-2021 (accessed on 27 June 2023).
- New APT34 Malware Targets the Middle East. Available online: https://www.trendmicro.com/ (accessed on 27 June 2023).
- Indicators of Compromise for Malware Used by APT28. Available online: https://www.ncsc.gov.uk/news/indicators-of-compromise-for-malware-used-by-apt28 (accessed on 27 June 2023).
- Significant Cyber Incidents. Available online: https://www.csis.org/programs/strategic-technologies-program/significant-cyber-incidents (accessed on 1 July 2024).
- NIST Cybersecurity Framework. Available online: https://www.nist.gov/cyberframework (accessed on 4 February 2025).
- What Is ISO/IEC 27001? Available online: https://www.iso.org/standard/27001 (accessed on 4 February 2025).
- Yigit, Y.; Bal, B.; Karameseoglu, A.; Duong, T.Q.; Canberk, B. Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks. IEEE Commun. Stand. Mag. 2022, 6, 38–44. [Google Scholar] [CrossRef]
- Yigit, Y.; Chrysoulas, C.; Yurdakul, G.; Maglaras, L.; Canberk, B. Digital Twin-Empowered Smart Attack Detection System for 6G Edge of Things Networks. In Proceedings of the 2023 IEEE Globecom Workshops (GC Wkshps), Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 178–183. [Google Scholar] [CrossRef]
- Makrakis, G.M.; Kolias, C.; Kambourakis, G.; Rieger, C.; Benjamin, J. Industrial and Critical Infrastructure Security: Technical Analysis of Real-Life Security Incidents. IEEE Access 2021, 9, 165295–165325. [Google Scholar] [CrossRef]
- Maglaras, L.; Janicke, H.; Ferrag, M.A.; Buchanan, W.J.; Tassiulas, L. Bridging the gap between Cybersecurity and Reliability for Critical National Infrastructures. BRIDGE 2023, 119, 14–19. [Google Scholar]
- Application of Monte Carlo Simulations to System Reliability Analysis. Available online: https://www.911metallurgist.com/blog/wp-content/uploads/2016/01/Application-of-Monte-Carlo-Simulations-to-System-Reliability-Analysis.pdf (accessed on 20 February 2024).
- Dechgummarn, Y.; Fuangfoo, P.; Kampeerawat, W. Predictive Reliability Analysis of Power Distribution Systems Considering the Effects of Seasonal Factors on Outage Data Using Weibull Analysis Combined With Polynomial Regression. IEEE Access 2023, 11, 138261–138278. [Google Scholar] [CrossRef]
- Liao, Q.; Wang, X.; Ling, D.; Xiao, Z.; Huang, H.Z. Equipment reliability analysis based on the Mean-rank method of two-parameter Weibull distribution. In Proceedings of the 2011 International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering, Xi’an, China, 17–19 June 2011; pp. 361–364. [Google Scholar] [CrossRef]
- Ali, S.; Zafar, T.; Shah, I.; Wang, L. Cumulative Conforming Control Chart Assuming Discrete Weibull Distribution. IEEE Access 2020, 8, 10123–10133. [Google Scholar] [CrossRef]
- Yang, Y.; Li, J.; Xu, C. Reliability Data Analysis of Aviation Equipment Based on Weibull Distribution. In Proceedings of the 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC), Qingdao, China, 2–4 December 2022; pp. 342–345. [Google Scholar] [CrossRef]
- Zuo, W.; Li, K. Three-State Markov Chain Based Reliability Analysis of Complex Traction Power Supply Systems. In Proceedings of the 2021 5th International Conference on System Reliability and Safety (ICSRS), Palermo, Italy, 24–26 November 2021; pp. 74–79. [Google Scholar] [CrossRef]
- Maglaras, L. From Mean Time to Failure to Mean Time to Attack/Compromise: Incorporating Reliability into Cybersecurity. Computers 2022, 11, 159. [Google Scholar] [CrossRef]
- Wang, Y.; Han, X.; Ding, Y. Power system operational reliability equivalent modeling and analysis based on the Markov Chain. In Proceedings of the 2012 IEEE International Conference on Power System Technology (POWERCON), Auckland, New Zealand, 30 October–2 November 2012; pp. 1–5. [Google Scholar] [CrossRef]
- Nashwan, I.I.H. Reliability Function of the Connected-(2,2)-out-of-(m,n): F Linear and Circular System Using Markov Chain. In Proceedings of the 2023 International Conference on Information Technology (ICIT), Kyoto, Japan, 14–17 December 2023; pp. 1–6. [Google Scholar] [CrossRef]
- Cardoso, J.B.; de Almeida, J.R.; Dias, J.M.; Coelho, P.G. Structural reliability analysis using Monte Carlo simulation and neural networks. Adv. Eng. Softw. 2008, 39, 505–513. [Google Scholar] [CrossRef]
- How to Use Monte Carlo simulation for Reliability Analysis? Available online: https://eracons.com/resources/monte-carlo-simulation (accessed on 10 February 2024).
- Song, C.; Kawai, R. Monte Carlo and variance reduction methods for structural reliability analysis: A comprehensive review. Probabil. Eng. Mech. 2023, 73, 103479. [Google Scholar] [CrossRef]
- Bhusal, D.; Alam, M.T.; Nguyen, L.; Mahara, A.; Lightcap, Z.; Frazier, R.; Fieblinger, R.; Torales, G.L.; Rastogi, N. SECURE: Benchmarking Generative Large Language Models for Cybersecurity Advisory. arXiv 2024, arXiv:2405.20441. [Google Scholar]
- Miao, Y.; Bai, Y.; Chen, L.; Li, D.; Sun, H.; Wang, X.; Luo, Z.; Ren, Y.; Sun, D.; Xu, X.; et al. An empirical study of netops capability of pre-trained large language models. arXiv 2023, arXiv:2309.05557. [Google Scholar]
- Tian, R.; Ye, Y.; Qin, Y.; Cong, X.; Lin, Y.; Pan, Y.; Wu, Y.; Hui, H.; Liu, W.; Liu, Z.; et al. Debugbench: Evaluating debugging capability of large language models. arXiv 2024, arXiv:2401.04621. [Google Scholar]
- Liu, Z. Secqa: A concise question-answering dataset for evaluating large language models in computer security. arXiv 2023, arXiv:2312.15838. [Google Scholar]
- Siddiq, M.L.; Santos, J.C. SecurityEval dataset: Mining vulnerability examples to evaluate machine learning-based code generation techniques. In Proceedings of the 1st International Workshop on Mining Software Repositories Applications for Privacy and Security, Singapore, 18 November 2022; pp. 29–33. [Google Scholar]
- Tihanyi, N.; Ferrag, M.A.; Jain, R.; Bisztray, T.; Debbah, M. CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge. In Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR), London, UK, 2–4 September 2024; pp. 296–302. [Google Scholar]
- Li, G.; Li, Y.; Guannan, W.; Yang, H.; Yu, Y. SecEval: A Comprehensive Benchmark for Evaluating Cybersecurity Knowledge of Foundation Models. 2023. Available online: https://github.com/XuanwuAI/SecEval (accessed on 8 February 2025).
- Alrashedy, K.; Aljasser, A.; Tambwekar, P.; Gombolay, M. Can LLMs Patch Security Issues? arXiv 2023, arXiv:2312.00024. [Google Scholar]
- Liu, Y.; Pei, C.; Xu, L.; Chen, B.; Sun, M.; Zhang, Z.; Sun, Y.; Zhang, S.; Wang, K.; Zhang, H.; et al. OpsEval: A Comprehensive IT Operations Benchmark Suite for Large Language Models. arXiv 2024, arXiv:2310.07637. [Google Scholar]
- Hossen, M.I.; Zhang, J.; Cao, Y.; Hei, X. Assessing Cybersecurity Vulnerabilities in Code Large Language Models. arXiv 2024, arXiv:2404.18567. [Google Scholar]
- Yu, Z.; Zeng, J.; Chen, S.; Xu, W.; Xu, D.; Liu, X.; Ying, Z.; Wang, N.; Zhang, Y.; Yang, M. CS-Eval: A Comprehensive Large Language Model Benchmark for CyberSecurity. arXiv 2024, arXiv:2411.16239. [Google Scholar]
- Bhatt, M.; Chennabasappa, S.; Nikolaidis, C.; Wan, S.; Evtimov, I.; Gabi, D.; Song, D.; Ahmad, F.; Aschermann, C.; Fontana, L.; et al. Purple llama cyberseceval: A secure coding benchmark for language models. arXiv 2023, arXiv:2312.04724. [Google Scholar]
- Tony, C.; Mutas, M.; Ferreyra, N.E.D.; Scandariato, R. Llmseceval: A dataset of natural language prompts for security evaluations. In Proceedings of the 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR), Melbourne, Australia, 15–16 May 2023; pp. 588–592. [Google Scholar]
- Shao, M.; Jancheska, S.; Udeshi, M.; Dolan-Gavitt, B.; Xi, H.; Milner, K.; Chen, B.; Yin, M.; Garg, S.; Krishnamurthy, P.; et al. NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security. arXiv 2024, arXiv:2406.05590. [Google Scholar]
- Tihanyi, N.; Bisztray, T.; Dubniczky, R.A.; Toth, R.; Borsos, B.; Cherif, B.; Jain, R.; Muzsai, L.; Ferrag, M.A.; Marinelli, R.; et al. Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence. In Proceedings of the 2024 IEEE International Conference on Big Data (BigData), Washington, DC, USA, 15–18 December 2024; pp. 3313–3321. [Google Scholar] [CrossRef]
- Chauvin, T. eyeballvul: A future-proof benchmark for vulnerability detection in the wild. arXiv 2024, arXiv:2407.08708. [Google Scholar]
- Deka, P.; Rajapaksha, S.; Rani, R.; Almutairi, A.; Karafili, E. AttackER: Towards Enhancing Cyber-Attack Attribution with a Named Entity Recognition Dataset. In Proceedings of the International Conference on Web Information Systems Engineering, Doha, Qatar, 2–5 December 2024; pp. 255–270. [Google Scholar]
- Wan, S.; Nikolaidis, C.; Song, D.; Molnar, D.; Crnkovich, J.; Grace, J.; Bhatt, M.; Chennabasappa, S.; Whitman, S.; Ding, S.; et al. Cyberseceval 3: Advancing the evaluation of cybersecurity risks and capabilities in large language models. arXiv 2024, arXiv:2408.01605. [Google Scholar]
- Yang, Z.; Meng, Z.; Zheng, X.; Wattenhofer, R. Assessing Adversarial Robustness of Large Language Models: An Empirical Study. arXiv 2024, arXiv:2405.02764. [Google Scholar]
- Rajaee, M.; Mazlumi, K. Multi-Agent Distributed Deep Learning Algorithm to Detect Cyber-Attacks in Distance Relays. IEEE Access 2023, 11, 10842–10849. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Friha, O.; Maglaras, L.; Janicke, H.; Shu, L. Federated Deep Learning for Cyber Security in the Internet of Things: Concepts, Applications, and Experimental Analysis. IEEE Access 2021, 9, 138509–138542. [Google Scholar] [CrossRef]
- Pinto, A.; Herrera, L.C.; Donoso, Y.; Gutierrez, J.A. Survey on Intrusion Detection Systems Based on Machine Learning Techniques for the Protection of Critical Infrastructure. Sensors 2023, 23, 2415. [Google Scholar] [CrossRef]
- Chen, Y.; Cui, M.; Wang, D.; Cao, Y.; Yang, P.; Jiang, B.; Lu, Z.; Liu, B. A survey of large language models for cyber threat detection. Comput. Secur. 2024, 145, 104016. [Google Scholar] [CrossRef]
- Yigit, Y.; Panitsas, I.; Maglaras, L.; Tassiulas, L.; Canberk, B. Cyber-Twin: Digital Twin-Boosted Autonomous Attack Detection for Vehicular Ad-Hoc Networks. In Proceedings of the ICC 2024—IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; pp. 2167–2172. [Google Scholar] [CrossRef]
- Risk Management Standards. Available online: https://www.enisa.europa.eu/publications/risk-management-standards (accessed on 4 February 2025).
- General Data Protection Regulation. Available online: https://gdpr-info.eu/ (accessed on 10 March 2024).
- Health Insurance Portability and Accountability Act. Available online: https://www.ncbi.nlm.nih.gov/books/NBK500019/ (accessed on 8 February 2025).
- Personal Data Protection Act Overview. Available online: https://www.pdpc.gov.sg/overview-of-pdpa/the-legislation/personal-data-protection-act (accessed on 10 March 2024).
- Liu, Y.; Shan, G.; Liu, Y.; Alghamdi, A.; Alam, I.; Biswas, S. Blockchain Bridges Critical National Infrastructures: E-Healthcare Data Migration Perspective. IEEE Access 2022, 10, 28509–28519. [Google Scholar] [CrossRef]
- Kendzierskyj, S.; Jahankhani, H. The Role of Blockchain in Supporting Critical National Infrastructure. In Proceedings of the 2019 IEEE 12th International Conference on Global Security, Safety and Sustainability (ICGS3), London, UK, 16–18 January 2019; pp. 208–212. [Google Scholar] [CrossRef]
- Ten, C.W.; Manimaran, G.; Liu, C.C. Cybersecurity for Critical Infrastructures: Attack and Defense Modeling. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2010, 40, 853–865. [Google Scholar] [CrossRef]
- Bakalos, N.; Voulodimos, A.; Doulamis, N.; Doulamis, A.; Ostfeld, A.; Salomons, E.; Caubet, J.; Jimenez, V.; Li, P. Protecting Water Infrastructure From Cyber and Physical Threats: Using Multimodal Data Fusion and Adaptive Deep Learning to Monitor Critical Systems. IEEE Signal Process. Mag. 2019, 36, 36–48. [Google Scholar] [CrossRef]
- Maglaras, L.; Ayres, N.; Moschoyiannis, S.; Tassiulas, L. The end of Eavesdropping Attacks through the Use of Advanced End to End Encryption Mechanisms. In Proceedings of the IEEE INFOCOM 2022—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Virtual Conference, 2–5 May 2022; pp. 1–2. [Google Scholar] [CrossRef]
- Kavallieratos, G.; Katsikas, S.; Gkioulos, V. SafeSec Tropos: Joint security and safety requirements elicitation. Comput. Stand. Interfaces 2020, 70, 103429. [Google Scholar] [CrossRef]
- Fan, S.; Yang, Z. Safety and security co-analysis in transport systems: Current state and regulatory development. Transp. Res. Part A Policy Pract. 2022, 166, 369–388. [Google Scholar] [CrossRef]
- Lautieri, S.; Cooper, D.; Jackson, D. SafSec: Commonalities between safety and security assurance. In Proceedings of the Constituents of Modern System-safety Thinking: Proceedings of the Thirteenth Safety-Critical Systems Symposium, Southampton, UK, 8–10 February 2005; pp. 65–75. [Google Scholar]
- Archer, D.W.; Bogdanov, D.; Lindell, Y.; Kamm, L.; Nielsen, K.; Pagter, J.I.; Smart, N.P.; Wright, R.N. From Keys to Databases—Real-World Applications of Secure Multi-Party Computation. Comput. J. 2018, 61, 1749–1771. [Google Scholar] [CrossRef]
- Yigit, Y.; Ahmadi, H.; Yurdakul, G.; Canberk, B.; Hoang, T.; Duong, T.Q. Digi-Infrastructure: Digital Twin-Enabled Traffic Shaping with Low-Latency for 6G Smart Cities. IEEE Commun. Stand. Mag. 2024, 8, 28–34. [Google Scholar] [CrossRef]
- Giannopoulos, A.E.; Spantideas, S.T.; Zetas, M.; Nomikos, N.; Trakadas, P. FedShip: Federated Over-the-Air Learning for Communication-Efficient and Privacy-Aware Smart Shipping in 6G Communications. IEEE Trans. Intell. Transp. Syst. 2024, 25, 19873–19888. [Google Scholar] [CrossRef]
- Balint, A.; Raja, H.; Driesen, J.; Kazmi, H. Using Domain-Augmented Federated Learning to Model Thermostatically Controlled Loads. IEEE Trans. Smart Grid 2023, 14, 4116–4124. [Google Scholar] [CrossRef]
- Chowdhury, N.; Gkioulos, V. Cyber security training for Critical Infrastructure Protection: A literature review. Comput. Sci. Rev. 2021, 40, 100361. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Debbah, M.; Al-Hawawreh, M. Generative AI for cyber threat-hunting in 6g-enabled iot networks. In Proceedings of the 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW), Bangalore, India, 1–4 May 2023; pp. 16–25. [Google Scholar]
- Zhu, B.; Mu, N.; Jiao, J.; Wagner, D. Generative AI Security: Challenges and Countermeasures. arXiv 2024, arXiv:2402.12617. [Google Scholar]
- Ferrag, M.A.; Ndhlovu, M.; Tihanyi, N.; Cordeiro, L.C.; Debbah, M.; Lestable, T.; Thandi, N.S. Revolutionizing Cyber Threat Detection with Large Language Models: A privacy-preserving BERT-based Lightweight Model for IoT/IIoT Devices. IEEE Access 2024, 12, 3363469. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Taori, R.; Gulrajani, I.; Zhang, T.; Dubois, Y.; Li, X.; Guestrin, C.; Liang, P.; Hashimoto, T.B. Alpaca: A strong, replicable instruction-following model. Stanf. Cent. Res. Found. Model. 2023, 3, 7. [Google Scholar]
- Driess, D.; Xia, F.; Sajjadi, M.S.; Lynch, C.; Chowdhery, A.; Ichter, B.; Wahid, A.; Tompson, J.; Vuong, Q.; Yu, T.; et al. Palm-e: An embodied multimodal language model. arXiv 2023, arXiv:2303.03378. [Google Scholar]
- Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S.; Anadkat, S.; et al. Gpt-4 technical report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
- Touvron, H.; Martin, L.; Stone, K.; Albert, P.; Almahairi, A.; Babaei, Y.; Bashlykov, N.; Batra, S.; Bhargava, P.; Bhosale, S.; et al. Llama 2: Open foundation and fine-tuned chat models. arXiv 2023, arXiv:2307.09288. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2023, 15, 1–45. [Google Scholar] [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Zhao, H.; Chen, H.; Yang, F.; Liu, N.; Deng, H.; Cai, H.; Wang, S.; Yin, D.; Du, M. Explainability for large language models: A survey. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–38. [Google Scholar] [CrossRef]
- Zhang, S.; Dong, L.; Li, X.; Zhang, S.; Sun, X.; Wang, S.; Li, J.; Hu, R.; Zhang, T.; Wu, F.; et al. Instruction tuning for large language models: A survey. arXiv 2023, arXiv:2308.10792. [Google Scholar]
- Min, B.; Ross, H.; Sulem, E.; Veyseh, A.P.B.; Nguyen, T.H.; Sainz, O.; Agirre, E.; Heintz, I.; Roth, D. Recent advances in natural language processing via large pre-trained language models: A survey. ACM Comput. Surv. 2023, 56, 1–40. [Google Scholar] [CrossRef]
- Zhu, Y.; Yuan, H.; Wang, S.; Liu, J.; Liu, W.; Deng, C.; Dou, Z.; Wen, J.R. Large language models for information retrieval: A survey. arXiv 2023, arXiv:2308.07107. [Google Scholar]
- Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. Adv. Neural Inf. Process. Syst. 2022, 35, 27730–27744. [Google Scholar]
- Ashkboos, S.; Mohtashami, A.; Croci, M.L.; Li, B.; Jaggi, M.; Alistarh, D.; Hoefler, T.; Hensman, J. QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs. arXiv 2024, arXiv:2404.00456. [Google Scholar]
- Frantar, E.; Ashkboos, S.; Hoefler, T.; Alistarh, D. Gptq: Accurate post-training quantization for generative pre-trained transformers. arXiv 2022, arXiv:2210.17323. [Google Scholar]
- Lin, J.; Tang, J.; Tang, H.; Yang, S.; Dang, X.; Han, S. Awq: Activation-aware weight quantization for llm compression and acceleration. arXiv 2023, arXiv:2306.00978. [Google Scholar] [CrossRef]
- Kim, S.; Hooper, C.; Gholami, A.; Dong, Z.; Li, X.; Shen, S.; Mahoney, M.W.; Keutzer, K. Squeezellm: Dense-and-sparse quantization. arXiv 2023, arXiv:2306.07629. [Google Scholar]
- Egiazarian, V.; Panferov, A.; Kuznedelev, D.; Frantar, E.; Babenko, A.; Alistarh, D. Extreme Compression of Large Language Models via Additive Quantization. arXiv 2024, arXiv:2401.06118. [Google Scholar]
- Rahmath P, H.; Srivastava, V.; Chaurasia, K.; Pacheco, R.G.; Couto, R.S. Early-Exit Deep Neural Network—A Comprehensive Survey. ACM Comput. Surv. 2024, 57, 1–37. [Google Scholar] [CrossRef]
- Gao, Y.; Xiong, Y.; Gao, X.; Jia, K.; Pan, J.; Bi, Y.; Dai, Y.; Sun, J.; Wang, H. Retrieval-augmented generation for large language models: A survey. arXiv 2023, arXiv:2312.10997. [Google Scholar]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.T.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
- Yu, G.I.; Jeong, J.S.; Kim, G.W.; Kim, S.; Chun, B.G. Orca: A distributed serving system for Transformer-Based generative models. In Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22), Carlsbad, CA, USA, 11–13 July 2022; pp. 521–538. [Google Scholar]
- Patel, P.; Choukse, E.; Zhang, C.; Goiri, Í.; Shah, A.; Maleki, S.; Bianchini, R. Splitwise: Efficient generative llm inference using phase splitting. arXiv 2023, arXiv:2311.18677. [Google Scholar]
- Li, D.; Shao, R.; Xie, A.; Xing, E.P.; Gonzalez, J.E.; Stoica, I.; Ma, X.; Zhang, H. Lightseq: Sequence level parallelism for distributed training of long context transformers. arXiv 2023, arXiv:2310.03294. [Google Scholar]
- Kang, H.; Zhang, Q.; Kundu, S.; Jeong, G.; Liu, Z.; Krishna, T.; Zhao, T. Gear: An efficient kv cache compression recipe for near-lossless generative inference of llm. arXiv 2024, arXiv:2403.05527. [Google Scholar]
- Shoeybi, M.; Patwary, M.; Puri, R.; LeGresley, P.; Casper, J.; Catanzaro, B. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv 2019, arXiv:1909.08053. [Google Scholar]
- Huang, Y.; Cheng, Y.; Bapna, A.; Firat, O.; Chen, M.X.; Chen, D.; Lee, H.; Ngiam, J.; Le, Q.V.; Wu, Y.; et al. GPipe: Easy Scaling with Micro-Batch Pipeline Parallelism. arXiv 2019, arXiv:1811.06965. [Google Scholar]
- Miao, X.; Oliaro, G.; Zhang, Z.; Cheng, X.; Jin, H.; Chen, T.; Jia, Z. Towards efficient generative large language model serving: A survey from algorithms to systems. arXiv 2023, arXiv:2312.15234. [Google Scholar]
- Gozalo-Brizuela, R.; Garrido-Merchán, E.C. A survey of Generative AI Applications. arXiv 2023, arXiv:2306.02781. [Google Scholar] [CrossRef]
- Huang, X.; Liu, W.; Chen, X.; Wang, X.; Wang, H.; Lian, D.; Wang, Y.; Tang, R.; Chen, E. Understanding the planning of LLM agents: A survey. arXiv 2024, arXiv:2402.02716. [Google Scholar]
- Ferrag, M.A.; Friha, O.; Kantarci, B.; Tihanyi, N.; Cordeiro, L.; Debbah, M.; Hamouda, D.; Al-Hawawreh, M.; Choo, K.K.R. Edge learning for 6G-enabled Internet of Things: A comprehensive survey of vulnerabilities, datasets, and defenses. IEEE Commun. Surv. Tutor. 2023, 25, 2654–2713. [Google Scholar] [CrossRef]
- Xi, Z.; Chen, W.; Guo, X.; He, W.; Ding, Y.; Hong, B.; Zhang, M.; Wang, J.; Jin, S.; Zhou, E.; et al. The rise and potential of large language model based agents: A survey. arXiv 2023, arXiv:2309.07864. [Google Scholar] [CrossRef]
- Li, Y.; Wen, H.; Wang, W.; Li, X.; Yuan, Y.; Liu, G.; Liu, J.; Xu, W.; Wang, X.; Sun, Y.; et al. Personal llm agents: Insights and survey about the capability, efficiency and security. arXiv 2024, arXiv:2401.05459. [Google Scholar]
- Guo, T.; Chen, X.; Wang, Y.; Chang, R.; Pei, S.; Chawla, N.V.; Wiest, O.; Zhang, X. Large language model based multi-agents: A survey of progress and challenges. arXiv 2024, arXiv:2402.01680. [Google Scholar]
- Jin, H.; Huang, L.; Cai, H.; Yan, J.; Li, B.; Chen, H. From llms to llm-based agents for software engineering: A survey of current, challenges and future. arXiv 2024, arXiv:2408.02479. [Google Scholar]
- Yigit, Y.; Maglaras, L.A.; Buchanan, W.J.; Canberk, B.; Shin, H.; Duong, T.Q. AI-Enhanced Digital Twin Framework for Cyber-Resilient 6G Internet of Vehicles Networks. IEEE Internet Things J. 2024, 11, 36168–36181. [Google Scholar] [CrossRef]
- Spantideas, S.T.; Giannopoulos, A.E.; Trakadas, P. Smart Mission Critical Service Management: Architecture, Deployment Options, and Experimental Results. IEEE Trans. Netw. Serv. Manag. 2024, 1. [Google Scholar] [CrossRef]
- Perez-Cerrolaza, J.; Abella, J.; Borg, M.; Donzella, C.; Cerquides, J.; Cazorla, F.J.; Englund, C.; Tauber, M.; Nikolakopoulos, G.; Flores, J.L. Artificial Intelligence for Safety-Critical Systems in Industrial and Transportation Domains: A Survey. ACM Comput. Surv. 2024, 56, 1–40. [Google Scholar] [CrossRef]
- Turtiainen, H.; Costin, A.; Hämäläinen, T. Defensive Machine Learning Methods and the Cyber Defence Chain. In Artificial Intelligence and Cybersecurity: Theory and Applications; Sipola, T., Kokkonen, T., Karjalainen, M., Eds.; Springer International Publishing: Cham, Switzerland, 2023; pp. 147–163. [Google Scholar] [CrossRef]
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
- He, F.; Zhu, T.; Ye, D.; Liu, B.; Zhou, W.; Yu, P.S. The emerged security and privacy of llm agent: A survey with case studies. arXiv 2024, arXiv:2407.19354. [Google Scholar]
- Li, X.; Wang, S.; Zeng, S.; Wu, Y.; Yang, Y. A survey on LLM-based multi-agent systems: Workflow, infrastructure, and challenges. Vicinagearth 2024, 1, 9. [Google Scholar] [CrossRef]
- Xie, J.; Chen, Z.; Zhang, R.; Wan, X.; Li, G. Large multimodal agents: A survey. arXiv 2024, arXiv:2402.15116. [Google Scholar]
- Dong, X.; Zhang, X.; Bu, W.; Zhang, D.; Cao, F. A Survey of LLM-based Agents: Theories, Technologies, Applications and Suggestions. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), Wuhan, China, 13–15 September 2024; pp. 407–413. [Google Scholar]
- Yigit, Y.; Buchanan, W.J.; Tehrani, M.G.; Maglaras, L. Review of Generative AI methods in cybersecurity. arXiv 2024, arXiv:2403.08701. [Google Scholar]
- Kieslich, K.; Keller, B.; Starke, C. Artificial intelligence ethics by design. Evaluating public perception on the importance of ethical design principles of artificial intelligence. Big Data Soc. 2022, 9, 20539517221092956. [Google Scholar] [CrossRef]
- Al-kfairy, M.; Mustafa, D.; Kshetri, N.; Insiew, M.; Alfandi, O. Ethical challenges and solutions of Generative AI: An interdisciplinary perspective. Informatics 2024, 11, 58. [Google Scholar] [CrossRef]
- Anderljung, M.; Hazell, J.; von Knebel, M. Protecting society from AI misuse: When are restrictions on capabilities warranted? AI Soc. 2024, 1–17. [Google Scholar] [CrossRef]
- Gupta, M.; Akiri, C.; Aryal, K.; Parker, E.; Praharaj, L. From chatgpt to threatgpt: Impact of Generative AI in cybersecurity and privacy. IEEE Access 2023, 11, 80218–80245. [Google Scholar] [CrossRef]
- Veale, M.; Binns, R. Fairer machine learning in the real world: Mitigating discrimination without collecting sensitive data. Big Data Soc. 2017, 4, 2053951717743530. [Google Scholar] [CrossRef]
- Friha, O.; Ferrag, M.A.; Kantarci, B.; Cakmak, B.; Ozgun, A.; Ghoualmi-Zine, N. Llm-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness. IEEE Open J. Commun. Soc. 2024, 5, 5799–5856. [Google Scholar] [CrossRef]
- Digital Twin Industry Research Report 2024–2029. Available online: https://www.businesswire.com (accessed on 4 February 2025).
- NIST Releases First 3 Finalized Post-Quantum Encryption Standards. Available online: https://www.nist.gov/ (accessed on 4 February 2025).
- Blockchain Market. Available online: https://www.marketsandmarkets.com/ (accessed on 4 February 2025).
- Yigit, Y.; Kinaci, O.K.; Duong, T.Q.; Canberk, B. TwinPot: Digital Twin-assisted Honeypot for Cyber-Secure Smart Seaports. In Proceedings of the 2023 IEEE International Conference on Communications Workshops (ICC Workshops), Rome, Italy, 28 May–1 June 2023; pp. 740–745. [Google Scholar] [CrossRef]
- Yigit, Y.; Nguyen, L.D.; Ozdem, M.; Kinaci, O.K.; Hoang, T.; Canberk, B.; Duong, T.Q. TwinPort: 5G Drone-assisted Data Collection with Digital Twin for Smart Seaports. Sci. Rep. 2023, 13, 12310. [Google Scholar] [CrossRef]
- Papathanasaki, M.; Fountas, P.; Maglaras, L.; Douligeris, C.; Ferrag, M.A. Quantum Cryptography in Maritime Telecommunications. In Proceedings of the 2021 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece, 26–28 July 2021; pp. 530–535. [Google Scholar] [CrossRef]
- Ak, E.; Canberk, B. BCDN: A proof of concept model for blockchain-aided CDN orchestration and routing. Comput. Netw. 2019, 161, 162–171. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Derdour, M.; Mukherjee, M.; Derhab, A.; Maglaras, L.; Janicke, H. Blockchain Technologies for the Internet of Things: Research Issues and Challenges. IEEE Internet Things J. 2019, 6, 2188–2204. [Google Scholar] [CrossRef]
- Tihanyi, N.; Bisztray, T.; Ferrag, M.A.; Jain, R.; Cordeiro, L.C. How secure is AI-generated code: A large-scale comparison of large language models. Empir. Softw. Eng. 2025, 30, 1–42. [Google Scholar] [CrossRef]
- Mechri, A.; Ferrag, M.A.; Debbah, M. SecureQwen: Leveraging LLMs for vulnerability detection in python codebases. Comput. Secur. 2025, 148, 104151. [Google Scholar] [CrossRef]
- Global Perspectives on Blockchain Adoption by Industry: The Future Is Now. Available online: https://www2.deloitte.com/ (accessed on 4 February 2025).
Month and Year | Attack Type | Critical National Infrastructure Sector | Area |
---|---|---|---|
January 2022 | Phishing | Government | USA |
February 2022 | Ransomware | Energy | Belgium, Germany |
March 2022 | Data Breach | Government | Italy |
April 2022 | Ransomware | Finance | Costa Rica |
May 2022 | DDoS | Transport | UK |
June 2022 | DDoS | Transport | Norway |
July 2022 | Misinformation | Communications | Ukraine |
August 2022 | Data Breach | Government | Montenegro |
September 2022 | Data Breach | Defence | Mexico |
October 2022 | Ransomware | Communications | Australia |
November 2022 | DDoS | Government | India |
December 2022 | DDoS | Government | Vatican City |
January 2023 | Ransomware | Government | UK |
February 2023 | Phishing | Government | Italy |
March 2023 | Cyber Espionage | Civil Nuclear | China |
April 2023 | Supply Chain Attack | Communications | Global |
May 2023 | Data Breach | Communications | USA |
June 2023 | Ransomware | Health | USA |
July 2023 | DDoS | Government | Trinidad and Tobago |
August 2023 | DDoS | Finance | Czech Republic |
September 2023 | Data Theft | Defence | UK |
October 2023 | Malware Phishing | Defence | South Korea |
November 2023 | Data Breach | Space | Japan |
December 2023 | Encryption Attack | Water | Russia |
January 2024 | Ransomware | Government | Sweden |
February 2024 | Data Breach | Health | France |
March 2024 | Data Leak | Defence | Germany |
April 2024 | Data Breach | Finance | El Salvador |
May 2024 | Data Breach | Defence | UK |
Dataset | Domain | #Questions | Format | Language(s) | Key Features/Notes |
---|---|---|---|---|---|
SECURE [32] | ICS (Industrial Control System) Security | 6 datasets | Knowledge extraction, understanding, reasoning | English | |
NetEval [33] | Networks | 5732 | Multiple-Choice | Multi-lingual | |
DebugBench [34] | Code Debugging | 4253 instances | Debugging tasks | C++, Java, Python | |
SecQA [35] | Computer Security | Not specified (Two versions: v1 and v2) | Multiple-Choice | English | |
SecurityEval [36] | Code Security (Vulnerability) | 130 samples | Code-based tasks | English (code contexts) | |
CyberMetric [37] | Cybersecurity | 80/500/2000/10,000 | Multiple-Choice | English | |
SecEval [38] | Cybersecurity | 2000+ | Multiple-Choice | English | |
PythonSecurityEval [39] | Code Security (Vulnerability) | Large-scale (specific number not disclosed) | Code-based tasks | English, Python | |
OpsEval [40] | IT Operations (AIOps) | 7184 (MC) + 1736 (QA) | Multiple-Choice & QA | English, Chinese | |
Dataset | Domain | #Data | Format | Language(s) | Key Features/Notes |
---|---|---|---|---|---|
EvilInstructCoder [41] | Adversarial Attacks on Code LLMs | 81 samples (0.5% of instruction dataset) | Malicious code injection tasks | English | |
CS-Eval [42] | Cybersecurity (comprehensive & bilingual) | 42 categories | Multiple-question types | English & Chinese | |
CyberSecEval [43] | Code Security & Compliance | Not specified | Code-based & Compliance tasks | English | |
LLMSecEval [44] | Code Security | 150 NL prompts | Natural Language (NL) to code tasks | English | |
NYU CTF Dataset [45] | Cybersecurity CTF Challenges | Diverse range (compiled from popular competitions) | Challenge-based tasks | English | |
DIA-Bench [46] | Mathematics, Cryptography, Cybersecurity, Computer Science | Dynamic (150 templates with mutable parameters) | Text, PDFs, Compiled Binaries, Visual Puzzles, CTF-style Challenges | English | |
eyeballvul [47] | Large-Scale Vulnerability Detection | 24,000+ vulnerabilities across 6000+ revisions | Code-based tasks | English | |
AttackER dataset [48] | Cyber-Attack Attribution | Not specified | NER-based (annotated cybersecurity texts) | English | |
CYBERSECEVAL 3 [49] | Cybersecurity Risk Measurement | 8 distinct risks | Various (e.g., offensive security, social engineering) | English | |
Below is an instruction that describes a task paired with an input that provides further context. Write a response that appropriately completes the request. |
### Instruction: |
{instruction} |
### Input: |
{input} |
### Response: |
{response} |
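
The instruction-tuning template above can be filled programmatically when preparing training records. The following is a minimal sketch, assuming each record has three string fields; the names `PROMPT_TEMPLATE` and `build_prompt` and the sample log line are hypothetical and are not taken from the paper.

```python
# Template text copied from the prompt box above; line breaks follow its layout.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n"
    "### Instruction:\n{instruction}\n"
    "### Input:\n{input}\n"
    "### Response:\n{response}"
)


def build_prompt(instruction: str, input_text: str, response: str = "") -> str:
    """Fill the template. Leave `response` empty to build an inference prompt."""
    return PROMPT_TEMPLATE.format(
        instruction=instruction.strip(),
        input=input_text.strip(),
        response=response.strip(),
    )


# Hypothetical training record (illustrative only, not from the paper).
example = build_prompt(
    instruction="Classify the following log line as benign or suspicious.",
    input_text="Failed password for root from 203.0.113.7 port 22 ssh2",
    response="suspicious",
)
print(example)
```

With a non-empty `response`, the string serves as a supervised fine-tuning example; with the default empty `response`, the same function yields a prompt that ends at the `### Response:` marker and can be passed to a model for completion.
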
Dimension | Traditional CIP | Agentic AI–Enabled CIP | Key Benefits |
---|---|---|---|
Monitoring & Anomaly Detection | Predefined thresholds, manual reviews | Adaptive thresholds via RL, integrates multiple data sources | Real-time detection, fewer false positives, fast zero-day threat identification |
Incident Response | Manual playbooks, limited automation | Automated workflows, AI-driven isolation/failover | Faster containment, consistent and scalable responses |
Predictive Maintenance | Schedule-based, siloed data | Data-driven forecasting, early failure detection | Reduced downtime, cost savings, proactive asset management |
Policy & Compliance | Periodic, manual checks | Real-time validation, automatic non-compliance flags | Continuous compliance, automated reporting, strengthened governance |
Scalability & Flexibility | Hard to scale, infrastructure-heavy | Modular architecture, easy integration | Minimal overhauls, rapid expansion, adaptable system design |
Cross-Agency Collaboration | Manual processes, slow info sharing | Multi-agent synchronization, real-time insights | Coordinated responses, streamlined crisis management, enhanced transparency |
Future Direction | CIP Impact |
---|---|
Digital Twins | Offers virtual replicas of physical assets for proactive risk mitigation, enhanced operational visibility, and real-time analytics. However, adoption in CNI is slow due to cybersecurity risks, high computational demands, and real-time data synchronization challenges [128]. |
Quantum Computing | Enables advanced threat detection, vulnerability analysis, and optimization in resource allocation; can significantly speed up critical computations. |
Quantum Cryptography | Provides quantum-resistant cryptographic techniques to secure data at rest and in transit, protecting CNIs from advanced quantum attacks. However, deployment requires costly infrastructure upgrades (fiber-optic QKD networks, quantum processors), and industry adoption is still limited to pilot programs in finance and defense [129]. |
Augmented Reality | Improves situational awareness for operators; real-time overlays of system status and threat alerts facilitate rapid decision-making. |
Resilient & Adaptive Control Systems | Uses self-healing and distributed control to maintain operations under stress, mitigating cyberattacks and physical disruptions. |
Blockchain | Ensures tamper-proof data exchange, secure identity management, and immutable audit trails to bolster trust in critical operations. However, high energy consumption, slow transaction speeds, and regulatory challenges limit large-scale CNI adoption [130]. |
High-Quality Cybersecurity Datasets | Enables robust ML model training; diversified, accurately labeled data improves threat detection and minimizes false positives. |
Agentic AI | Leverages autonomous decision-making, real-time orchestration, and adaptive learning for proactive, swift, and scalable CIP solutions. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yigit, Y.; Ferrag, M.A.; Ghanem, M.C.; Sarker, I.H.; Maglaras, L.A.; Chrysoulas, C.; Moradpoor, N.; Tihanyi, N.; Janicke, H. Generative AI and LLMs for Critical Infrastructure Protection: Evaluation Benchmarks, Agentic AI, Challenges, and Opportunities. Sensors 2025, 25, 1666. https://doi.org/10.3390/s25061666