1. Introduction
As interconnectivity grows, ensuring the security of information and communication technology (ICT) systems becomes a top priority for organizations, governments, and regulatory bodies worldwide. The rise of complex and multilayered systems, such as smart grids, 5G networks, and the Internet of Things (IoT), presents unique cybersecurity challenges. These systems are often composed of heterogeneous components that must operate in dynamic and evolving environments, making them particularly vulnerable to cyberattacks. In fact, organizations are under extra pressure from “the challenge of managing security exposures in a constantly evolving threat environment and from increasing regulatory obligations and government oversight of cybersecurity”, as reflected in the Gartner Top Trends in Cybersecurity 2024 survey [1].
In this landscape, the European Union (EU) has recognized the need for a comprehensive cybersecurity certification framework through ambitious initiatives such as the Cybersecurity Act (CSA) [2] and the Cyber Resilience Act (CRA) [3]. Both regulations agree that a suitable cybersecurity certification approach would help to assess and compare different devices and products, increasing the trust of end users in a hyperconnected society. They place additional pressure on manufacturers, emphasizing the need to support cybersecurity throughout the product lifecycle.
Therefore, the definition of a certification framework requires efforts from different areas to meet the requirements of different stakeholders, such as manufacturers, institutions, and consumers. One of the main challenges arises from the heterogeneity present in the ICT landscape. The unique characteristics of each system require adapted approaches that can provide objective comparisons with respect to cybersecurity, whereas the dynamic nature of cybersecurity requires that the certification take into account the changing conditions in which the product will operate. This dynamism often manifests itself in patches or updates needed to address new vulnerabilities discovered in specific devices or components. Therefore, an agile certification process is essential to ensure that the security level remains up-to-date throughout the lifecycle of the product, as required by the new EU regulations.
At the same time, a certification framework must address the practical needs of the market by being efficient and cost-effective, ensuring that the launch of new products is not delayed. From an end-user perspective, the technical results of the certification process are often complex and present an additional challenge in how to communicate this information clearly and accessibly. Therefore, it is crucial to convert these technical findings into a simplified format, such as a visual label or certificate. This will help non-expert users easily understand and compare products while preserving the necessary technical details to ensure transparency and trust.
To address these challenges, this article proposes a security evaluation methodology based on two building blocks: security risk assessment and security testing. This methodology aims to certify the system’s security within a specific context. In particular, to support transparency for the user about the outcome of the certification process, this work proposes the concept of a cybersecurity label which contains information related to the level of security validated in the certification process. Our approach is based on an instantiation of the security risk assessment and testing methodology proposed by the European Telecommunications Standards Institute (ETSI) [4], which relies on the International Organization for Standardization (ISO) 31000 [5] and ISO 29119 [6] standards. Furthermore, we integrate the Manufacturer Usage Description (MUD) [7] standard from the Internet Engineering Task Force (IETF) as part of the certification process.
The applicability of the proposed methodology is demonstrated in two scenarios, an ICT gateway used in smart grid infrastructures and an Artificial Intelligence (AI) Investments platform, which highlight the methodology’s relevance in assessing the security of critical infrastructure components in different environments. The contributions of this paper are threefold:
A flexible and scalable security evaluation framework that integrates risk assessment with test-based evaluation, allowing for continuous monitoring, mitigation, and recertification of ICT systems.
A dynamic security labeling system, which provides transparent and real-time security status indicators, making it easier for both experts and non-experts to assess and compare the security levels of different products or systems.
Validation of the methodology through two case studies, demonstrating its adaptability across different types of ICT systems, including both hardware-centric infrastructures and AI-driven platforms.
2. Challenges and State of the Art
This section explores the key limitations and challenges faced by current certification schemes, as well as recent advances aimed at addressing these gaps.
2.1. Objectivity and Harmonization
A critical issue is the subjectivity and variability of the metrics used within the risk assessment processes. Current assessment methodologies, such as OCTAVE [8] and EVITA [9], rely on manual assessments, leading to inconsistent evaluations [10]. Additionally, metrics such as likelihood or impact are difficult to measure due to their complexity [11], relying on historical data, which are not always available. Approaches such as ARMOUR [12] address these challenges by combining risk assessment with empirical testing, but they are limited to specific contexts, e.g., IoT devices, and they lack a generalized framework for complex ICT systems.
Additionally, the wide range of existing cybersecurity certification schemes [13] can make the comparison of certification results difficult, limiting the capability of users to compare the outcomes of different certification processes. Some certification frameworks, such as Common Criteria (CC) [14], support comparability among the results of independent cybersecurity evaluations through Collaborative Protection Profiles (cPP) under the terms of the Common Criteria Recognition Arrangement (CCRA). Others, like Commercial Product Assurance (CPA) [15], test products against CPA Security Characteristics, some of which are qualitative and ambiguous [11]. Similarly, the CAP UL 2900 standards [16] are not public, so harmonization aspects are not addressed.
Our methodology integrates security testing results into the risk assessment process, enhancing objectivity and reproducibility while being supported by automated tools. This allows for precise and dynamic assessments of evolving security vulnerabilities. The methodology considers the system as a set of components, integrating individual cybersecurity aspects along with dependencies and potential cascade effects. Moreover, the methodology is aligned with well-known standards from ETSI, ISO, and IETF to foster the harmonization of the processes and metrics.
2.2. Operational Context
The context in which the system will operate is crucial to determine the required security level for certification. Security requirements in a smart home differ from those in a medical environment. Approaches like CC, the European Cybersecurity Certification Scheme (EUCC), or the European Cybersecurity Certification Scheme for Cloud Services (EUCS) rely on security profiles to define the minimum security level for each context, while others, such as CPA, do not adequately consider the operational context [11]. Inspired by CC, our methodology reuses the notion of protection profile (PP) to establish the security level for each context, serving as the certification basis.
2.3. Cost and Time
Frameworks widely used for ICT security certification, such as CC, OCTAVE, or CAP UL [17], are based on formal, documentation-intensive processes, especially for high Evaluation Assurance Levels (EALs). While effective in establishing security baselines, CC can be inflexible and resource-intensive, making it unsuitable for fast-evolving systems requiring frequent updates [18,19]. Recertification under these schemes can be time-consuming and costly, posing challenges for continuous security assurance in dynamic environments. Although EUCC and EUCS offer important foundations for security certification, their complex processes do not always adapt quickly to emerging technologies or recertifications [20,21], and no products have been certified under these evolving schemes.
Automated testing methodologies, such as Model-Based Testing (MBT) and fuzzing, improve the efficiency of security evaluations by automatically generating test cases or random input data to detect vulnerabilities. However, these tools are often implemented in isolation and are not fully integrated with broader certification frameworks, creating a gap between empirical testing and formal certification. Our methodology integrates automated testing tools and methods to accelerate the assessment process. It is flexible enough to be instantiated through other tools and mechanisms, and it allows for composition by reusing the previous results of individual components to assess more complex systems.
One critical aspect often overlooked in security evaluation is value management. This involves balancing the trade-offs between security investments, operational costs, and the overall benefit to the organization. The proposed methodology ensures that resources are allocated efficiently by aligning risk assessment and treatment efforts with the evaluator’s priorities, which can be mapped to the organization’s priorities. By integrating continuous feedback mechanisms and cost-effective automated tools, the methodology minimizes unnecessary costs while maintaining robust security levels.
2.4. Dynamism
One of the main challenges in cybersecurity certification is the dynamic nature of cybersecurity. A product certified as secure can quickly become vulnerable due to new threats or patches, requiring recertification. EU regulations require manufacturers to address security issues, and the certification process must adapt to lifecycle changes to ensure ongoing compliance. Frameworks like CC, CPA, CSPN, ARMOUR, or CAP UL offer static, point-in-time security evaluations which do not reflect the evolving nature of cyber threats [11]. Continuous monitoring and fast recertification are crucial in environments with frequent updates or emerging vulnerabilities (e.g., zero-day attacks).
The proposed methodology integrates monitoring and automated assessment tools to support recertification. It helps manufacturers in lifecycle management by linking evaluation results with mitigation strategies and policies enforceable during the product’s deployment. For this, the methodology relies on the Manufacturer Usage Description (MUD) standard, which offers a framework for specifying the network actions a device is allowed to perform, thereby reducing the attack surface. However, MUD’s application is limited to specific device behaviors and does not extend to more complex systems with interactions between hardware, software, and networks [22]. To address this, we use an extended MUD model [23] complemented by the evaluation results generated by the methodology.
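For readers unfamiliar with MUD, the following abridged profile (rendered as a Python dictionary, with illustrative device names and values) shows the general shape such a file takes under RFC 8520: a device description plus access-control lists that restrict the device’s network behavior.

```python
import json

# Abridged MUD profile in the spirit of RFC 8520; names and URLs are illustrative.
mud_profile = {
    "ietf-mud:mud": {
        "mud-version": 1,
        "mud-url": "https://manufacturer.example.com/device.json",  # hypothetical
        "is-supported": True,
        "systeminfo": "Example smart-grid sensor",
        "from-device-policy": {
            "access-lists": {"access-list": [{"name": "from-dev-acl"}]}
        },
    },
    "ietf-access-control-list:acls": {
        "acl": [{
            "name": "from-dev-acl",
            "type": "ipv4-acl-type",
            "aces": {"ace": [{
                "name": "allow-cloud",
                # only outbound traffic to the manufacturer's cloud is allowed
                "matches": {"ipv4": {"ietf-acldns:dst-dnsname": "cloud.example.com"}},
                "actions": {"forwarding": "accept"},
            }]},
        }]
    },
}
print(json.dumps(mud_profile, indent=2))
```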
2.5. Labeling
Cybersecurity certification often lacks transparency, making it difficult for users to understand its benefits due to complex jargon and non-comparable standards. To address this, cybersecurity labels have been introduced as concise indicators of a product’s security level [12]. These labels must balance simplifying technical details for users with clearly representing certification results. In this sense, our methodology incorporates a dynamic security labeling system using QR codes that, together with the continuous monitoring processes, provides real-time visibility into the security status of a system.

Table 1 provides a summary of the comparison between the main certification frameworks and the proposed methodology based on the challenges identified in the previous subsections. Additional information about the frameworks and their limits is detailed in our previous work [11].
3. Security Evaluation Methodology for ICT Systems
Figure 1 shows the overall certification process that we propose. It is derived from the ETSI proposal [4], which combines an extended security assessment derived from ISO 31000 and typical security testing activities following the ISO 29119 standard. Building on this approach, we define a cybersecurity certification framework based on two core components outlined in the ETSI proposal: security testing and security risk assessment. Security testing focuses on identifying flaws, vulnerabilities, and other technical issues, while security risk assessment addresses potential vulnerabilities with an emphasis on legal and business concerns. The process begins with establishing the context, which lays the foundation for both streams by analyzing the environment in which the device or component will be evaluated. The methodology incorporates continuous communication and auditing to maintain a management perspective, ensuring continuous oversight and control of the information generated from assessment and testing processes. The treatment phase provides targeted security countermeasures based on the identified risks. Finally, the methodology also includes specific activities related to certification. In particular, a labeling activity has been integrated after the risk assessment process, which is in charge of communicating the security level obtained for a specific Target of Evaluation (TOE). While the CC framework defines the TOE as a collection of software, firmware, and/or hardware with accompanying guidance, our approach extends this definition to include the TOE’s configuration (e.g., specific protocols, libraries, or cryptographic parameters, among others).
This work proposes the use of specific technologies and tools for the security risk assessment and testing processes as the main building blocks of a security certification approach for ICT systems. It is, in turn, an extension of the methodology proposed by ETSI, building on other standards to take advantage of their standardized basis.
Table 2 shows an overview of the concepts and processes taken from different standards that support the proposed methodology, which will be further detailed in the next subsections. The methodology addresses key challenges identified in previous sections, such as the need for automated, dynamic, and objective risk assessment and the adaptability to different systems and tools.
3.1. Establishing the Context
To effectively assess the security of a system, it is essential to establish a strong basis to identify potential vulnerabilities and security flaws. Traditional risk assessment approaches often depend on known vulnerabilities cataloged in databases such as the National Vulnerability Database (NVD) [28] and on expert analysis. Although these are valuable sources, they may not fully capture the evolving threat landscape, especially for emerging technologies. Our methodology enhances this approach by incorporating not only known threats but also best practices and relevant security standards to form an initial set of security claims.
Therefore, in this preliminary phase of establishing the context, we define the claims that will serve as a baseline to protect the system against both known and unknown threats. In total, we elicited 74 claims, collected in [29]. Each claim has a description, a STRIDE and impact classification following our methodology, preconditions to apply the claim, dependencies on other claims, metrics to be collected, possible tests, and the sources from which the claim was obtained.
This phase also introduces the concept of Tolerance Profiles (TPs) to reflect the TOE’s operational context. Similarly to PPs in CC, these profiles define acceptable levels of risk across different security aspects within the specific operational context.
Figure 2 shows an example of a tolerance profile. The TP specifies the tolerable risk level for each security property, such as a confidentiality risk ranging from 0 to 7. If the evaluated confidentiality risk exceeds the defined threshold of 7, the product would not be eligible for certification. Additionally, a security level coding system, ranging from A (most secure, green) to D (least secure, red), links these risk thresholds to certification in a visual way through the label (see Section 3.6). Finally, in this phase, we can also establish the depth of the evaluation to be carried out. For this, we can rely on the notion of EALs outlined in the CSA, EUCC, or CC.
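As an illustration of how a TP can be made machine-readable (the YAML syntax actually used is shown later in Listing 1), the following Python sketch encodes per-property risk ceilings and the A-D band edges; all numbers except the confidentiality ceiling of 7, taken from the example above, are invented.

```python
# Sketch of a Tolerance Profile (TP): a maximum tolerable risk per security
# property plus the band edges for the A-D security levels. Values other than
# the confidentiality ceiling of 7 are illustrative assumptions.
TOLERANCE_PROFILE = {
    "confidentiality": {"max_risk": 7, "bands": [(2, "A"), (4, "B"), (6, "C"), (7, "D")]},
    "integrity":       {"max_risk": 5, "bands": [(1, "A"), (2, "B"), (4, "C"), (5, "D")]},
    "availability":    {"max_risk": 6, "bands": [(2, "A"), (3, "B"), (5, "C"), (6, "D")]},
    # ... one entry per STRIDE-derived security property
}
```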
3.2. Risk Identification
Once the general claims have been established, the next step is to identify the specific ones relevant to the TOE. The goal of the risk identification phase is to break down the system into its components, determine the applicable security properties and claims for each of them, and define the security tests needed to validate these claims. This phase transitions from a high-level understanding of the system to a more detailed, component-specific view, which is critical for the following security assessment and testing phases.
Figure 3 describes the proposed scheme for identifying risks following the well-known Unified Modeling Language (UML) notation, moving from a broad, abstract description of the system to detailed security tests for each component. The process can be summarized as follows:
(1) From system to security properties: Each system is evaluated against authentication, authorization, non-repudiation, confidentiality, integrity, and availability based on the well-known STRIDE taxonomy. Using this taxonomy simplifies the creation of a visual security label later in the process, as the security level will be quantified for each of the categories. It is important to note that not all security properties will apply to every system. For example, if a system does not interact with external resources or entities, the authorization property may not be relevant.
(2) From system security properties to components: Systems may consist of a single component (e.g., a device) or multiple interconnected components, each of which may affect one or more security properties. In this step, we identify the components that compose the system and their associated security properties. This mapping helps to identify which parts of the system require specific security tests.
(3) From components to applicable claims: Once the components are identified, we select the claims applicable to each one of their security properties. The set of claims we identified has already been classified following the STRIDE taxonomy, facilitating this process for the security evaluator.
(4) From generic claims to vulnerability-based claims: Some claims are directly linked to known vulnerabilities (e.g., “C32: The source code must not use components with known vulnerabilities”). These vulnerabilities can be identified from public databases such as the NVD, Common Weakness Enumeration (CWE) (https://cwe.mitre.org accessed on 3 February 2025), and Common Vulnerabilities and Exposures (CVE) (https://cve.mitre.org accessed on 3 February 2025). When a vulnerability is identified, it is assigned a predefined risk score (e.g., using CVSS) on a scale from 0 to 10. The security expert may refine this score based on the specific characteristics of the component and its usage context.
(5) From claims to security tests: Once the vulnerabilities and claims have been identified, the next step is to define the tests required to validate them. For vulnerability-based claims, tests are designed to verify whether the identified vulnerabilities are present in the system. For test-based claims, relevant tests are developed to ensure that the system meets the security requirements for each claim.
At the end of the process, we decompose the TOE evaluation into a tree (Figure 3) in which the leaves represent the specific security tests to be developed and executed in subsequent phases.
This phase also analyzes the dependencies between components in order to identify any cascade effects and highly sensitive components within the system; that is, components that depend on a large number of other components and are more prone to failures. Therefore, a sensitivity metric between 0 and 10 is assigned to each component of the system. The higher the sensitivity value, the more likely the component is to be impacted by cascade effects. This metric can be obtained manually or semi-automatically by analyzing the dependencies discovered in files such as the Software Bill of Materials (SBOM) [30] for software dependencies or MUD for network dependencies. The sensitivity metric also supports prioritizing components during the security evaluation and testing process. Components with higher sensitivity values are more likely to propagate failures or vulnerabilities throughout the system and should therefore be tested with greater priority. This prioritization is essential to optimize security efforts, as ensuring 100% security is impractical due to resource constraints and/or system complexity. The trade-off between security and practical applicability is managed by focusing testing and verification efforts on the most critical components, ensuring an efficient and effective evaluation.
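For illustration, one simple way to derive such a score semi-automatically is to normalize the number of transitive dependencies of each component, e.g., extracted from an SBOM. The following Python sketch shows this idea; it is an illustration, not the paper’s exact procedure.

```python
# Semi-automatic sensitivity estimation (a sketch): score each component by how
# many components it transitively depends on, normalized to the 0-10 scale.
def sensitivity_scores(dependencies):
    """dependencies: dict mapping each component to its direct dependencies."""
    def reach(comp, seen):
        # collect the transitive dependency set of `comp`
        for dep in set(dependencies.get(comp, ())) - seen:
            seen.add(dep)
            reach(dep, seen)
        return seen

    counts = {c: len(reach(c, set())) for c in dependencies}
    worst = max(counts.values()) or 1
    return {c: round(10 * n / worst, 1) for c, n in counts.items()}

# gui depends (directly or transitively) on api and db -> most sensitive
print(sensitivity_scores({"db": [], "api": ["db"], "gui": ["api"], "logger": ["db"]}))
```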
3.3. Security Testing
The security testing phase is essential for creating a comprehensive test description to evaluate the security claims identified in the previous phase. The first step in this phase is the test design in which claims relevant to the TOE are detailed and transformed into structured test outlines, prioritizing each test based on its associated risk levels. This prioritization ensures that tests targeting higher-risk components or properties are implemented first, optimizing security assessment efforts.
Once the skeleton of the tests is obtained, the test implementation phase translates them into specific low-level steps that the TOE can understand and process. Based on the nature of each test, the most suitable testing technique and tool should be chosen. Although the MBT approach is highly recommended due to the automation of the testing process, this technique can be complemented with others, such as fuzzing for testing inputs and random behaviors. An example of combining multiple techniques can be found in Section 4. The tests can be implemented using a multipurpose programming language such as Python, C, or Java with testing support (e.g., the JUnit framework) or a dedicated language such as Testing and Test Control Notation version 3 (TTCN-3) [31], which is supported by TITAN (https://projects.eclipse.org/projects/tools.titan accessed on 3 February 2025) during the execution.
The environment setup phase ensures that the TOE and its testing environment are fully configured. This environment may be local, using internal devices and resources, or remote, using platforms such as FIT-IoT (https://www.iot-lab.info accessed on 3 February 2025) for broader, distributed testing. Local environments allow for precise control, while remote setups facilitate tests that require multiple entities, such as distributed denial-of-service simulations.
With the environment established, tests are executed in the test execution phase. Automation is encouraged, for example, using the previously mentioned platforms (JUnit or TITAN) to streamline test execution. At the end of the test execution phase, a test report is generated to collect the results of the tests:
PASS if the test result meets the conditions of the test specification.
FAIL if the test result does not meet the conditions for passing the test.
Specific metrics that are not reducible to one of the other two values (PASS/FAIL). These metrics provide more refined information than a binary result, e.g., the encryption percentage, the algorithm used, or the length of the key, helping to improve the estimation of risk.
3.4. Risk Estimation
The risk estimation phase uses the test results from the previous phase to determine the security level of the system more precisely and objectively. This phase calculates risk by combining two primary factors, the likelihood of a vulnerability being exploited and the impact of such an exploitation, using the following well-established formula [32]:

Risk = Likelihood × Impact        (1)
3.4.1. Likelihood
Even if the two factors of the equation have been refined in different risk assessment schemes (e.g., CWSS, CVSS, DREAD, etc.), the likelihood continues to be a complex measurement that requires either a history of vulnerabilities or an expert who determines, possibly based on several factors (e.g., equipment, necessary knowledge, exposure of the system, etc.), the likelihood of exploiting the vulnerability. To address this problem, the proposed methodology establishes a mapping between the test result and the likelihood value. Deterministic tests, such as those that verify whether communications are encrypted, contribute to the overall risk with 0 if the test passes (encryption present) and 1 if the test fails (no encryption). More complex tests yield graded likelihood values based on empirical data, such as the percentage of encrypted data or algorithm strength, scaling results between 0 and 1 to align with the likelihood scale. This approach ensures that both binary and graded tests contribute to the nuanced estimation of likelihood, allowing security evaluators to refine assessments based on real-world test metrics.
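To make this mapping concrete, the following Python sketch (an illustration, not the methodology’s reference implementation) converts binary and graded test results into likelihood values in [0, 1].

```python
# Sketch of the test-result-to-likelihood mapping described above. Binary tests
# contribute 0 (pass) or 1 (fail); graded metrics are scaled into [0, 1].
def likelihood(test):
    if test["kind"] == "binary":
        return 0.0 if test["result"] == "PASS" else 1.0
    # Graded metric: scale its value into [0, 1] and invert when a higher
    # measured value means better security (e.g., % of encrypted traffic).
    lo, hi = test["scale"]
    fraction = (test["value"] - lo) / (hi - lo)
    return 1.0 - fraction if test.get("higher_is_safer", True) else fraction

print(likelihood({"kind": "binary", "result": "FAIL"}))                # 1.0
print(likelihood({"kind": "metric", "value": 80, "scale": (0, 100)}))  # 0.2
```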
3.4.2. Impact
The impact factor is evaluated by a security expert based on the context of the TOE. Recognizing that not all vulnerabilities have equivalent effects, we treat the impact as a multidimensional measure that encompasses several domains: safety, financial, operational, privacy, and legal impact. For example, a vulnerability in an aircraft’s control system impacts safety critically, whereas one in an in-flight entertainment system primarily affects financial and operational aspects. Following this multidimensional approach, already validated in frameworks such as HEAVENS [33], EVITA [9], or MoRA [34], ensures that impact values are aligned with the context of the system.
To reduce the subjectivity of the impact assessment, we use standardized scales that offer clear criteria for quantifying each aspect, aligned with best practices and standards. This ensures reproducibility of the evaluation results and a harmonized comparison between products evaluated by different persons. For instance, Table 3 shows the scale for safety impact based on the ISO 26262-3 standard [24]. Similarly, we have impact scales for the financial dimension (Table 4, following the classification of BSI-Standard 100-4 [26]). The operational dimension (Table 5) is based on Failure Mode and Effects Analysis (FMEA) [25], adapting the vehicular defect severity categorization to classify operational damages, as done in HEAVENS. Finally, the privacy and legislation dimension (Table 6) follows the scale dimensions proposed in HEAVENS, although it could also be aligned with other proposals such as the Privacy Impact Assessment Guideline provided by BSI [27].
The aggregate impact score is normalized on a scale of 0 to 10 (Table 7), facilitating an objective and scalable impact assessment. In this sense, the safety and financial impacts are given equal importance when estimating the overall impact level, since their repercussions can be extremely severe for stakeholders (e.g., vehicle occupants may not survive, or organizations may face bankruptcy). In contrast, the impact of the operational and privacy and legislation parameters is comparatively lower. To represent this during impact level estimation, the corresponding factors were reduced by one order of magnitude for the operational and privacy and legislation parameters relative to the safety and financial parameters.
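The following sketch illustrates one way to implement such a weighted aggregation. The weights only encode the stated ordering (operational and privacy/legislation one order of magnitude below safety and financial) and are otherwise assumptions; the authoritative mapping is the one in Table 7.

```python
# Sketch of the aggregate impact computation; the weights are assumptions that
# encode the ordering described in the text, not the exact values of Table 7.
WEIGHTS = {"safety": 10, "financial": 10, "operational": 1, "privacy_legal": 1}

def aggregate_impact(levels):
    """levels: per-dimension impact levels, each already on a 0-10 scale."""
    total = sum(WEIGHTS[d] * levels[d] for d in WEIGHTS)
    return round(10 * total / (sum(WEIGHTS.values()) * 10), 1)  # normalize to 0-10

print(aggregate_impact({"safety": 8, "financial": 5, "operational": 3, "privacy_legal": 2}))
# -> 6.1
```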
3.4.3. Risk Aggregation
While the risk identification phase focused on breaking down the high-level system analysis into low-level tests, this phase reverses the process, aggregating risks from the test level back to the overall system security properties, as shown in Figure 3.
(1) From tests to claims: The overall risk for each claim is calculated with the typical formula from NIST [32] shown in Equation (1). However, to accurately reflect the complexity of system risks, the calculation is refined depending on the type of claim. While test-based claims do not have an established impact value, vulnerability claims related to NVD entries already have a recognized impact value that the methodology can use for the risk calculation.

Test-based claims: For claims that are verified directly through testing, the risk is calculated by averaging the likelihood scores of all m related tests:

R_claim = I_claim · (1/m) · Σ_{j=1}^{m} L_j        (2)

Here, I_claim is determined by the evaluator using the standardized metrics presented in Section 3.4.2, and L_j represents the empirical likelihood derived from each test, as explained in Section 3.4.1.

Vulnerability-based claims: For claims related to n known vulnerabilities, each vulnerability v_i is assigned an impact score I_{v_i} based on established databases (e.g., the CVSS vector in the NVD). The risk is then calculated using the max function instead of the mean to emphasize the severity of the most critical vulnerability:

R_claim = max_{i=1,…,n} ( I_{v_i} · L̄_{v_i} )

where L̄_{v_i}, as before, is the likelihood averaged over all tests associated with that vulnerability.
(2) From claims to components: After calculating the risk for each claim (R_claim) based on Equation (1), we aggregate the risks at the component level. We compute the risk for each component with r associated claims as follows:

R_comp = max_{k=1,…,r} R_claim_k        (3)
Again, we use the max function instead of the mean to emphasize the severity of the most critical risk.
(3) From components to system security properties: The next step evaluates the risk for each STRIDE security property across the system. If a system property is considered in c components, the overall risk is calculated by weighting each component’s risk by its sensitivity s_k (calculated in the risk identification phase):

R_property = ( Σ_{k=1}^{c} s_k · R_comp_k ) / ( Σ_{k=1}^{c} s_k )        (4)

where R_comp_k is the value computed in Equation (3).
At the end of this process, the methodology obtains a risk value for each of the six STRIDE security properties in the system using Equation (4).
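As a worked illustration, the following Python sketch (with invented numbers) chains the aggregation steps of Equations (2), (3), and (4).

```python
# Sketch of the aggregation chain of Section 3.4.3: claim risks (impact times
# mean likelihood, max over vulnerabilities), then the max over a component's
# claims, then the sensitivity-weighted property risk. All inputs are invented.
def claim_risk(impact, likelihoods):            # test-based claim, Equation (2)
    return impact * sum(likelihoods) / len(likelihoods)

def vuln_claim_risk(vulns):                     # vulnerability-based claim
    # vulns: list of (impact, [likelihoods]) pairs, one per known vulnerability
    return max(i * sum(ls) / len(ls) for i, ls in vulns)

def component_risk(claim_risks):                # Equation (3)
    return max(claim_risks)

def property_risk(components):                  # Equation (4)
    # components: list of (sensitivity, component_risk) pairs
    total_s = sum(s for s, _ in components)
    return sum(s * r for s, r in components) / total_s

print(round(property_risk([
    (8, component_risk([claim_risk(6, [0.2, 0.5])])),   # sensitive component
    (3, component_risk([vuln_claim_risk([(9, [1.0])])])),
]), 2))  # -> 3.98
```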
3.5. Risk Evaluation
The risk evaluation phase determines whether the estimated risks for each STRIDE security property are acceptable given the specific context in which the system will operate. To this end, we rely on the TP defined in the establishing the context phase.
Each security property risk obtained from Equation (4) is compared against the acceptable risk thresholds defined by the TP, which specify the maximum risk allowed for certification within a specific context.
Figure 4 illustrates the evaluation process for a TP that sets different acceptable risk ranges. If the evaluated confidentiality risk is 3, this falls within the acceptable range, and a security level of B will be assigned to the confidentiality property. However, if the evaluated confidentiality risk is 8, it exceeds the tolerance threshold, making the risk unacceptable in this context, and, as a result, the system would not be certified. This evaluation procedure is repeated for each security property.
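This check is straightforward to automate. A minimal sketch follows, assuming the band edges of the example above; the edges for levels A through C are illustrative.

```python
# Sketch of the risk evaluation of Section 3.5 against a Tolerance Profile;
# only the tolerance ceiling of 7 comes from the example, the rest is assumed.
BANDS = [(2, "A"), (4, "B"), (6, "C"), (7, "D")]  # (upper bound, level)

def evaluate(risk, bands=BANDS):
    for upper, level in bands:
        if risk <= upper:
            return level
    return None  # risk above tolerance: the TOE cannot be certified

print(evaluate(3))  # 'B', as in the example above
print(evaluate(8))  # None -> not certifiable in this context
```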
3.6. Labeling
The labeling system introduced in the methodology serves as a visual tool to communicate evaluation results to both technical and non-technical stakeholders. The radar chart was selected as the primary representation due to its ability to intuitively summarize multiple security dimensions. While there are some proposals for security labels, they tend to be overly simplistic, merely including the certification logo, such as the “Cybersecurity Made in Europe” label from ECSO [35]. Others, such as the Carnegie Mellon label [36], present too much information, potentially overwhelming non-expert users and becoming outdated quickly.
Our proposal strikes a balance by providing a visual representation of the information without overwhelming details while offering a link to more detailed and updated information via a dynamic QR code.
Each axis displays the score for a specific property, with the chart’s enclosed area reflecting the overall security level: a larger area indicates higher security across the evaluated dimensions. This intuitive representation, inspired by the unidimensional EU Energy Label [37], links risk levels to a visual format, making complex evaluations accessible to non-experts. Alternative representations, such as linear graphs or tabular formats, were considered but discarded due to their lack of clarity and immediate comprehensibility for diverse stakeholders. By enhancing transparency, the label supports informed decision-making, empowering consumers and stakeholders to compare products and better understand their security features, fostering a more secure marketplace.
At the center of the radar chart, a QR code is embedded, as required by the EUCC. This code currently links to the project’s webpage, but its design and implementation are intended to support additional functionalities in future iterations. Specifically, the QR code shall encode a unique identity for the device, allowing users to access detailed properties through various views. For example, scanning the QR code could provide information about the device’s security certifications, risk levels, testing results, or compliance with specific standards. This feature is envisioned to enhance traceability and provide stakeholders with immediate, context-specific insights about the device.
While the current implementation is preliminary, future developments will focus on integrating the QR code with a secure database that enables real-time lookups of device properties. This will allow the label to evolve beyond static representations, supporting dynamic updates and more interactive security management.
3.7. Treatment
Risk treatment is defined by ISO 31000 as the process of modifying risks, typically through the implementation of controls. In general, the results of a security evaluation are used only to validate or certify the security of the system, overlooking an essential benefit: the identification of security vulnerabilities within the system and the opportunity to implement mitigation strategies during its operational phase. Our methodology incorporates risk treatment by embedding actionable security recommendations within the extended MUD profile defined in [23]. This approach leverages the results of the security evaluation to generate a behavioral profile that includes both manufacturer-provided security guidance and recommendations obtained from the security assessment.
3.8. Monitoring and Communication
The cybersecurity certification process should not end with the initial security assessment performed before market deployment. Recognized entities, such as the National Institute of Standards and Technology (NIST) [38] or the European Commission under the CSA and CRA, emphasize that cybersecurity certification should be an ongoing effort, adapting to evolving threats. Thus, mechanisms for maintaining updated security levels should be integral to any cybersecurity certification framework.
In this sense, the methodology integrates the MUD profile created in the previous phase with monitoring tools, enabling operators to detect and respond to any deviations from the expected behaviors outlined by the certification authority and the manufacturer in the profile. Additionally, the MUD profile could be dynamic, so the manufacturer can describe additional policies in case of a new threat before a patch or update is released.
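A minimal sketch of the kind of check this enables is shown below; the allowed destinations would be extracted from the (extended) MUD profile, and the concrete values here are invented.

```python
# Sketch of MUD-based monitoring: flag observed flows that are not covered by
# the destinations allowed in the (extended) MUD profile. Values are invented.
ALLOWED = {("cloud.example.com", 443), ("update.example.com", 443)}  # from MUD ACEs

def audit(flows):
    """flows: iterable of (destination, port) tuples observed at runtime."""
    return [f for f in flows if f not in ALLOWED]

violations = audit([("cloud.example.com", 443), ("203.0.113.7", 23)])
if violations:
    print("MUD policy violation, request re-evaluation or mitigation:", violations)
```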
4. Evaluation over Use Cases
This section demonstrates the application of the proposed security evaluation methodology to two specific use cases: an ICT Gateway (ICT GW) within a smart grid environment and an AI Investments platform. Considering the maturity levels defined in Table 8, the methodology has been applied in both scenarios at the "managed" level: the process is partially automated through specific tools, and basic metrics have been selected to provide a proof of concept.
Figure 5 provides an overview of the tools and concepts selected to support the instantiation of the methodology. In particular, the context establishment phase makes use of our predefined set of claims and TPs. The risk identification phase is supported by the ResilBlockly tool, which automates the risk analysis and the generation of an initial MUD file. As testing tools, we use GraphWalker for MBT and a fuzzing tool developed within the BIECO project. Risks are aggregated and evaluated using the SecurityScorer tool, which also produces a security label that is visualized in the BIECO GUI. The instantiation considers additional tools for the generation of the extended MUD (MUD Updater tool) and for the detection of misbehaviors based on the MUD (auditing tool). The following dedicated subsections detail how the tools are used in each use case, supporting the automation of the methodology.
4.1. ICT Gateway
This scenario, developed in the context of the NET2DG project [39], illustrates how the methodology can guide the security evaluation of complex systems composed of different subsystems.
The ICT GW, depicted in Figure 6, is a critical component in a smart grid that serves as a bridge between the distribution system operator (DSO) and a variety of data sources and actuation subsystems, including smart meters, remote terminal units, and inverters (INVs). The ICT GW consolidates and standardizes the data collected from these heterogeneous subsystems, storing them in a central database for use by domain applications. These applications, operating within the DSO, aim to improve grid efficiency, maintain voltage quality, and support outage diagnosis. The ICT GW’s architecture is organized into three layers, the adapters layer, the domain logic layer, and the service layer, each contributing distinct functions to manage data flow, ensure operational efficiency, and enable seamless integration within the grid.
While the ICT GW enhances data interoperability and grid management, it also introduces potential security risks, particularly due to its integration with multiple external subsystems. Given that the ICT GW is commercialized, a robust security evaluation offers an additional assurance layer to DSOs. The ICT GW, including the three architectural layers, is considered the TOE.
4.1.1. Context Establishment
This phase involves establishing the security context by defining relevant security and privacy claims tailored to each layer of the ICT GW architecture. These claims are selected from a predefined set that aligns with industry standards and best practices, establishing the basis for the security evaluation. A total of 11 claims were identified as critical to the system’s secure operation. Additional details about them are given in [29].
C10: Changes in user authentication values are executed securely.
C11: Sensitive parameters for secure association establishment are protected for integrity during communication.
C17: Sensitive data in transit are encrypted with appropriate methods.
C22: The system is resistant to Denial of Service (DoS) attacks.
C23: Input data are validated to prevent injection vulnerabilities.
C45: All protocols and libraries used are up-to-date.
C46: Authentication protocols utilize recommended algorithms for security.
C47: Authenticated sessions expire and require re-authentication for enhanced security.
C52: The system allows data subjects to delete their personal data permanently.
C58: The system enforces a limit on consecutive failed login attempts.
C72: Logs are protected against unauthorized removal.
In addition to these claims, we defined TPs specific to the ICT GW use case. These profiles specify the maximum acceptable risk levels across each security property (such as confidentiality, integrity, and availability) according to the ICT GW’s role in the smart grid. The profiles were structured in YAML [40] format (first block of Listing 1, lines 2–8) to facilitate interpretation and integration with automated tools, ensuring that the methodology is adaptable and scalable for different implementations.
Listing 1. YAML-based file syntax for system description.
4.1.2. Risk Identification
The risk identification phase focuses on characterizing the system, its components, and the relationships between them to identify vulnerabilities that may pose security risks. For the ICT GW, this phase involves constructing a detailed model of the system, leveraging multiple data sources and tools to provide a structured overview of components and associated security claims. In this instantiation (Figure 5), we employ the ResilBlockly tool for a detailed analysis of vulnerabilities and attack paths, as well as for creating an extended behavioral profile. ResilBlockly [41] is a tool developed within the BIECO project designed to model highly complex and interconnected systems and infrastructures while significantly reducing the cognitive load typically associated with such activities. Additionally, it includes features for modeling and identifying threats, enabling security risk assessments of identified weaknesses and vulnerabilities.
We describe the ICT GW in a YAML file following the format described in Listing 1 (lines 9–22), which follows the IETF notation, where ? denotes optional fields and * denotes repeatable fields. In this file, components are listed with their respective sensitivity scores (ranging from 0 to 10), indicating the criticality of each component. The claims block links these components to both generic and vulnerability-specific claims, together with their impact levels across the different dimensions (safety, operational, financial, and privacy/legal compliance). Vulnerability information, including known vulnerabilities from public databases (e.g., CVE), is also included in this system description, allowing for integration with risk assessment tools like ResilBlockly for further analysis.
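For illustration, a simplified description of this kind could look as follows; the field names are indicative only, not the exact syntax of Listing 1, and the values are invented (except CVE-2021-44228, a real CVE entry used as an example).

```python
import yaml  # requires PyYAML

# Illustrative system description consistent with the structure described in
# the text; field names are our guesses, not the exact Listing 1 syntax.
DESCRIPTION = """
tolerance-profile:
  confidentiality: 7
  integrity: 5
components:
  - name: adapters-layer
    sensitivity: 8
    claims:
      - id: C17
        impact: {safety: 0, operational: 5, financial: 4, privacy-legal: 6}
        tests: [test_tls_encryption]
      - id: C45
        vulnerabilities: [CVE-2021-44228]
"""
system = yaml.safe_load(DESCRIPTION)
print(system["components"][0]["sensitivity"])  # -> 8
```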
Furthermore, we use the ResilBlockly tool to create a model of the TOE (available in [23]). From this model, ResilBlockly can execute a vulnerability and weakness analysis, identifying weaknesses and vulnerabilities and associating them with components. In the analysis performed over the GUI-ICTGW interface, ResilBlockly was able to retrieve 152 CWEs and 11 CVEs from the catalogs. By carefully analyzing the CWEs and using an approach that involves selecting CWEs from the first level of a hierarchical weakness tree, the number of CWEs associated with the interface was reduced to 33. Finally, ResilBlockly assists in generating a preliminary extended MUD profile for the ICT GW, integrating the risk analysis outcomes and information from the system model. The complete MUD file and further details about its generation process are given in [23].
4.1.3. Security Testing
The methodology instantiation incorporates two security testing techniques, MBT and fuzzing, using tools developed or refined within the BIECO project, as shown in Figure 5: GraphWalker and the Fuzzing Tool. Each technique is applied to different layers of the ICT GW to verify compliance with the selected set of security claims. Testing was conducted over three iterations, each targeting one of the ICT GW’s architectural layers (adapters, logic, and service). Table 9 summarizes the implemented tests.
Fuzzing testing is a technique that aims to identify security vulnerabilities in the TOE by using unintended or incorrect inputs. This approach has proven to be effective in identifying weaknesses that may be overlooked by other testing methods. On the one hand, it can be used to test input data (data fuzzing testing) by feeding the TOE with random data to find possible errors or vulnerabilities. On the other hand, it can be used as behavioral fuzzing testing in which valid/invalid message sequences are used for the same purpose.
In this instantiation, the Fuzzing Tool (https://www.gradiant.org/blog/deteccion-de-vulnerabilidades-fuzzing-bieco/ accessed on 3 February 2025) developed within BIECO was used to generate multiple HTTP requests to the ICT GW’s endpoints, testing combinations of parameters not specified in the Swagger file. This testing approach identified potential misconfigurations or response anomalies that could pose security risks.
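Conceptually, this kind of parameter fuzzing can be approximated in a few lines of Python. The sketch below is not the BIECO Fuzzing Tool; the endpoint URL and parameter names are made up, and it simply sends unexpected value combinations and flags server errors.

```python
import itertools
import requests  # third-party HTTP client

# Naive parameter-fuzzing sketch: try unexpected value combinations against an
# endpoint and report responses that hint at robustness problems.
FUZZ_VALUES = ["", "0", "-1", "A" * 4096, "' OR '1'='1"]

def fuzz(url, params):
    for combo in itertools.product(FUZZ_VALUES, repeat=len(params)):
        payload = dict(zip(params, combo))
        r = requests.get(url, params=payload, timeout=5)
        if r.status_code >= 500:  # server error: potential robustness issue
            print("anomaly:", r.status_code, payload)

fuzz("http://ictgw.example/api/v1/measurements", ["meterId", "from"])  # hypothetical
```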
MBT involves creating a high-level model representing the TOE, allowing for automated test generation based on it. This is a major advantage, as it eliminates the need to specify each test step by step. The research presented in [42] represents an example of how MBT can be used to automate the generation of tests. Other examples also show how to combine and automate fuzzing testing with MBT to test inputs and behaviors [43].
We used the open-source tool GraphWalker (https://graphwalker.github.io/ accessed on 3 February 2025) to define the ICT GW testing model as a directed graph. Figure 7 illustrates the GraphWalker model designed for the adapter layer of the ICT GW, which was used to generate and execute tests. By default, GraphWalker generates a unique sequence of steps that covers the whole tree, and the user still needs to connect the high-level steps with specific actions in the real system. To automate this process and adapt it to the methodology’s needs, we enhanced GraphWalker with a Test Adapter and Suite Generator (TASG) extension. This extension is in charge of automating the generation of different tests based on finish conditions embedded in the model and the creation of an adapter to connect each step with the real system. This enables the automation of approximately 46.18% of the test generation process. Although some manual implementation of the adapter is necessary, subsequent modifications, additions, and repetitions of the tests can be made without significant changes to the adapter. This improves the efficiency of the re-evaluation process after a possible security change is detected. This capability is particularly valuable in complex systems with numerous components and dependencies between them. Finally, the user can run all tests, in this case using Maven, obtaining two results files:
TEST-TestSuite.xml: Records the results (pass/fail) for each test, with detailed failure reasons and assertions.
TestSuite-output.json: Collects non-binary metrics for each test in JSON format, allowing for additional information capture such as encryption strength or session expiration. The JSON schema, shown in Listing 2, specifies metrics, values, and scales, which are mapped to a 0–1 range for likelihood calculation.
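For illustration, the scaling of such non-binary metrics to the likelihood range can be sketched as follows; the field names are indicative, as the authoritative schema is the one given in Listing 2.

```python
import json

# Sketch of consuming non-binary metrics from TestSuite-output.json and
# scaling them into [0, 1]; field names (metric, value, scale) are illustrative.
RAW = '{"metrics": [{"metric": "session-expiration-min", "value": 1, "scale": [0, 60]}]}'

def scaled(entry):
    lo, hi = entry["scale"]
    return (entry["value"] - lo) / (hi - lo)  # in [0, 1]; the evaluator decides
                                              # whether this maps to likelihood
                                              # directly or inverted (Section 3.4.1)

for entry in json.loads(RAW)["metrics"]:
    print(entry["metric"], "->", round(scaled(entry), 3))  # -> 0.017
```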
The resulting test data from GraphWalker and the fuzzing tool are used in the next phases for risk estimation via the SecurityScorer tool and risk treatment via the MUD Updater tool, ensuring a structured and automated approach to analyzing test outcomes.
Listing 2. JSON file for non-binary values.
4.1.4. Risk Estimation and Evaluation
This phase calculates the security level of the system based on the data collected from the testing tools, which can be in formats such as .json, .xml, or .csv, and the YAML system description created during the risk identification phase. To automate the process, we developed a scorer tool, the SecurityScorer (Figure 5), which performs the following sequence of actions:
Reading and parsing the YAML system description file, including the TP;
Reading and parsing the outputs of the security testing phase tools, including non-binary results;
Using both sources of information to evaluate the risk value for each component based on the methodology defined in
Section 3.4;
Combining the risks of the components to calculate the overall risk value of the whole system;
Using the tolerance profiles to certify the system properties (A, B, C, D, or not certified);
Generating the cybersecurity label according to the evaluation results.
For the ICT GW, we obtained the results shown in Table 9. The last column shows the test result (pass/fail), while the sixth column shows the non-binary information collected from that test, if applicable. We discovered that some of the claims were not fulfilled by the ICT GW, such as the establishment of a limit on authentication attempts to prevent brute-force attacks or the protection against log removal. Among the non-binary information, we discovered the presence of 18 outdated libraries, a maximum of 4822 simultaneous connections supported in the service layer, and a session expiration time of 1 min. Based on the evaluation, the manufacturer was able to update the implementation to add additional security functionalities or suggest specific configurations to be enforced by the ecosystem, such as using a firewall, implementing limits on authentication attempts, or updating the identified libraries.
4.1.5. Labeling
The security label has been designed as a hexagonal radar diagram to support the visualization of the security level in a way that can be understood by non-expert users. In this case, the SecurityScorer calculates the security level for each of the six security properties based on the information produced in the previous phase and the TP defined in the YAML file. As shown in Figure 5, the label is visualized in the GUI developed within the BIECO project. In particular, Figure 8 shows the label obtained for the ICT GW after running the SecurityScorer, which integrates a QR code to accommodate future updates of the label, with the objective of providing additional information about the security evaluation process, enhancing transparency and reusing evidence.
The label indicates that the ICT GW achieved a high level of security in terms of confidentiality, integrity, availability, and authorization. However, it highlights areas needing improvement, specifically in non-repudiation and authentication security properties. For a non-expert user, the label shows a hexagon with a large green area, suggesting that the product can be considered secure.
4.1.6. Treatment
Based on the results obtained from the testing tools, we improve the initial version of the extended MUD, which was created by the ResilBlockly tool during the risk identification phase. To automate this process, we created the MUD Updater tool (Figure 5, treatment phase). It collects empirical results from the security testing phase to refine the initial values of the extended MUD, adding new configurations to reduce the attack surface and mitigations for the vulnerabilities encountered. Figure 9 shows an overview of the relationship between the MUD and the methodology, in which the original MUD file created by the manufacturer is extended with the assessment report of ResilBlockly and refined later by the MUD Updater.
The MUD Updater uses the metrics derived from the tests’ execution, which are stored in the TestSuite-output.json file. In particular, the MUD Updater analyzes the field matchWith shown in Listing 2 (line 12) to locate the name of the corresponding ACL within the extended MUD that needs to be updated or extended. Then, based on the metric’s name and value, the tool defines the specific policy to be integrated into the MUD.
The outputs of the ICT GW testing, in addition to its MUD file, were then used as inputs for the MUD Updater to derive new security configurations. In particular, the maximum number of simultaneous connections (num-connections) was established at 4822 (C22), and the cryptographic algorithm was set to RSA-OAEP-256 (C46). The MUD was extended by 160.69% with respect to the original MUD file, and 40% of the non-binary test results were integrated into it. The resulting extended MUD file can be found in [44].
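Conceptually, the MUD Updater’s matching step can be sketched as follows; the extended-MUD policy fields shown here are illustrative (see [23] for the actual model), while the num-connections value comes from the test results above.

```python
# Sketch of the MUD Updater step: use a metric's matchWith field to find the
# target ACL in the extended MUD and attach the measured value as a policy.
# The "policies" field is an illustrative stand-in for the extended-MUD model.
def update_mud(mud, metric):
    for acl in mud["ietf-access-control-list:acls"]["acl"]:
        if acl["name"] == metric["matchWith"]:
            acl.setdefault("policies", {})[metric["name"]] = metric["value"]
    return mud

mud = {"ietf-access-control-list:acls": {"acl": [{"name": "service-layer-acl"}]}}
print(update_mud(mud, {"name": "num-connections", "value": 4822,
                       "matchWith": "service-layer-acl"}))
```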
4.1.7. Monitoring and Communication
As explained previously, the communication and auditing phase deals with the security changes that may occur during the lifecycle of the system. Following the steps of Figure 5, the methodology is supported in this phase by the auditing framework and the monitoring ontology developed in [45], which is intended to detect security issues based on a set of blueprints derived from the design phase and manually specified by the user. As shown in Figure 9, one of the blueprints used in the auditing framework is the updated MUD generated from the treatment phase, which contains a specific configuration (e.g., limited connections) that must be monitored to keep the system in a secure state according to the test results. If there is a violation of the MUD policies during the monitoring process, the auditing framework requests a re-evaluation process or a mitigation action, which could imply a modification of the MUD file. Additional details on this can be found in [45,46] and are therefore not included in this article.
4.2. AI Investments
AI Investments (AII) is a platform that aims to improve financial portfolio management and reduce risk exposure by integrating advanced artificial intelligence and deep learning techniques. Its primary goal is to improve the performance metrics of hedge funds, trading companies, and other investment institutions. By adopting a comprehensive approach, the AI Investments system integrates and standardizes disparate data sets, leveraging modern advances in time series analysis and optimization algorithms to construct investment portfolios that meet optimal performance benchmarks. Once this process is complete, the platform executes trading actions through the designated stock brokerage, effectively combining technological innovation with investment strategy.
Incorporating state-of-the-art AI and deep learning mechanisms into the AI Investments platform, including the use of deep neural networks for signal creation in transactions, convolutional neural networks for pattern detection, and Long Short-Term Memory (LSTM) networks for the analysis of time-based data sequences, emphasizes the need for robust security measures across these interconnected systems. The proposed methodology aims to protect the core processes powered by AI from any potential threats that could emerge from interactions with external entities. As shown in Figure 10, the risk of cybersecurity incidents is increased by the connection between the main component of the platform, the master component, and third-party brokers. These brokers operate outside of the protected environment of the AI platform, which means that there is a possibility that they could unknowingly become pathways for cyberattacks. In this context, the master component has been selected as the TOE. Any breach in security not only poses a risk of financial loss but could also undermine confidence in the innovative trading solutions offered by AI Investments. To address this risk, a thorough security validation plan has been developed to carefully examine and remove these threats so that the platform’s defenses are strengthened against potential breaches originating from these external sources.
Compliance with strict regulatory standards related to data protection and system security is also imperative in the fintech industry. The adopted security practices guarantee adherence to these legal requirements, safeguarding the firm from possible legal and fiscal penalties due to non-compliance. The management of sensitive financial and strategic investment data, crucial to the user base of the AI platform, highlights the need for a highly secure system. The security plan in place is specifically designed to protect this vital information, ensuring that its confidentiality, integrity, and availability remain intact. With cyber adversaries constantly innovating in their methods to exploit vulnerabilities, the security validation framework for the principal component is structured to be flexible, adapting to novel threats as they arise and thus continuously securing the platform’s defenses. By rolling out a comprehensive security evaluation process, AI Investments affirms its commitment to safeguarding investor assets and information, which is vital for the platform’s endorsement and eventual success.
4.2.1. Context Establishment
We selected a set of basic security claims [29] that should be fulfilled to consider the master component secure. Table 10 (first and second columns) shows the list of selected claims and the associated security properties. In particular, we selected six claims:
C6: The changes in the authentication values for user authentication are successful.
C15: Sensitive security parameters exchanged during the communication for the establishment of a secure association should be integrity protected.
C18: Sensitive security parameters should be encrypted in transit, with appropriate encryption.
C22: Resistance to DoS attacks.
C23: Data input validation.
C47: Authenticated sessions should expire, with a new reauthentication required.
4.2.2. Risk Identification
Following the process, the master component was modeled in ResilBlockly.
Figure 11 shows a portion of the master model, simplified to contain only one RUMI and one service for the sake of readability. The master was modeled as an entity (CS) called by the master REST API. One of the master REST API functionalities is to generate the training report. To execute it, the master REST API calls the internal master processor entity; this call is represented by the green box labeled Master_API_To_Master_Get_Training_Report in Figure 11. In turn, the master processor contains a service, Generate_Training_Report (orange box), and a RUMI, Master_To_Master_API_Training_Report, which responds to the REST API with the Training_Report_Result message (blue box). The other RUMIs and services in the master processor are implemented similarly, as are the other internal subsystems of the AIT architecture, such as the monitoring API, worker API, and AIT DB API.
In this phase, we also generated the YAML file with the system description to identify and map the components, claims, security properties, and tests that will be executed. As before, the impact of each claim (column 5 of Table 10) and the master sensitivity (set to 8) were determined based on the expertise of the use case owner.
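For illustration, the following minimal Python sketch shows how such a system description could be assembled and serialized to YAML. The schema (field names such as component, sensitivity, claims, impact, and tests) is hypothetical, since the exact format used by the tooling is not reproduced here; only the sensitivity value of 8 comes from the text.

```python
# Minimal sketch of generating the system-description YAML used as input for
# the later phases. The field names and impact values are hypothetical; only
# the sensitivity of 8 is taken from the text (see Table 10 for real values).
import yaml  # PyYAML

system_description = {
    "component": "master",
    "sensitivity": 8,  # set by the use case owner
    "claims": [
        {"id": "C6", "property": "authentication", "impact": 4, "tests": ["T-C6"]},
        {"id": "C22", "property": "availability", "impact": 5, "tests": ["T-C22"]},
        # ... C15, C18, C23, and C47 follow the same pattern
    ],
}

with open("master-system-description.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(system_description, f, sort_keys=False)
```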
4.2.3. Security Testing
In this phase, we implemented the tests using GraphWalker and the Fuzzing Tool. As before, the Swagger file was used as input for the fuzzing tool, and the master model necessary to generate the selected tests was created in GraphWalker, as shown in Figure 12. The list of implemented tests can be found in the fourth column of Table 10.
The adapter interface generated by the GraphWalker extension was implemented to link the automatically generated JUnit test suite with the real master component. With this approach, we automated 44.96% of the test generation process.
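The adapter's role is to bind each model element, edges as stimuli and vertices as expected states, to interactions with the running master component. The generated adapter is actually Java/JUnit code; for consistency with the other sketches in this section, the binding is illustrated below in Python for the training-report path only, with a hypothetical endpoint path and function names.

```python
# Illustrative rendering of the adapter's responsibility: each model edge
# triggers a request, each vertex checks the resulting state. The endpoint
# URL and names below are assumptions, not the actual generated adapter.
import requests

BASE_URL = "http://localhost:8080"  # assumed address of the master REST API

def e_get_training_report(ctx: dict) -> None:
    """Edge: trigger the training-report request (Master_API_To_Master_Get_Training_Report)."""
    ctx["response"] = requests.get(f"{BASE_URL}/training/report", timeout=10)

def v_training_report_result(ctx: dict) -> None:
    """Vertex: check the Training_Report_Result response."""
    resp = ctx["response"]
    assert resp.status_code == 200
    assert "report" in resp.json()
```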
4.2.4. Risk Estimation, Evaluation, and Labeling
The results obtained from the tests are presented in Table 10. The last column shows the test result (fail/pass), while the penultimate column shows the non-binary information collected from each test, where applicable. We found that some of the claims were not fulfilled by the TOE, especially with respect to the usage of certificates. Moreover, during the non-binary tests, two unnecessarily exposed interfaces were identified, and the session expiration time was determined to be 60 min. Regarding DoS resistance, the maximum number of simultaneous connections the test machine could generate was 50,000, so the real maximum supported by the TOE was not determined. Based on this evaluation, the manufacturer implemented additional security functionality for certificate validation, improving the security of the AI Investments platform.
As before, the test report, the non-binary values, and the YAML file containing the system description were used as input to the SecurityScorer tool, which calculated the security level for each security property and generated the security label shown in Figure 13.
The label indicates that the AI Investments platform achieved a high level of security in terms of confidentiality, authorization, and availability, while highlighting areas needing improvement, specifically the authentication mechanism. Since the non-repudiation and integrity security properties were not tested in this use case, the label does not take them into account (there are no vertices for them). For a non-expert user, the label displays a medium-sized quadrilateral, suggesting that the product has a medium level of security.
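As a rough illustration of how such a label can be rendered, the following matplotlib sketch draws a radar chart from per-property levels, omitting untested properties so that they receive no vertex. The numeric values and the 0 to 5 scale are hypothetical; SecurityScorer's actual rendering logic is not shown here.

```python
# Sketch of rendering a radar-chart security label from per-property levels.
# The levels below are hypothetical; untested properties (non-repudiation,
# integrity) are simply left out, so they get no vertex, as in Figure 13.
import math
import matplotlib.pyplot as plt

levels = {
    "confidentiality": 4,
    "authorization": 4,
    "availability": 4,
    "authentication": 2,  # the area flagged for improvement
}

labels = list(levels)
values = list(levels.values())
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]

ax = plt.subplot(polar=True)
ax.plot(angles + angles[:1], values + values[:1], linewidth=2)  # close the polygon
ax.fill(angles + angles[:1], values + values[:1], alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # assumed 0-5 security-level scale
plt.savefig("security-label.png")
```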
Comparing this result with the previous use case, the ICT GW can be considered more secure. Although the simplicity of the label helps users decide between two similar products based on their security level, one of its limitations is that it can convey a false sense of security if the underlying evaluation process is not rigorous. Therefore, the certification process associated with the methodology should rely on mechanisms and certification authorities that regulate the minimum requirements for a product to be certified.
4.2.5. Treatment
After the execution of the tests generated by GraphWalker, the TestSuite-output.json file was produced to describe the non-binary values and their match with the extended MUD policies. Across the set of tests, three were non-binary. Since the exact maximum number of simultaneous connections was not determined, we established 50,000 as a secure limit. The JSON file was then used as input for the MUD updater to modify the field num-connections to 50,000, which had previously been estimated to be 3. The MUD was extended by 268.94% with respect to the original MUD file, and 33.33% of the non-binary test results were integrated into it.
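A minimal sketch of this update step is given below, assuming a simple JSON layout for both TestSuite-output.json and the extended MUD profile. The "extensions-data" key and the test-entry structure are hypothetical; only the file name, the num-connections field, the claim C22, and the values 3 and 50,000 come from the text.

```python
# Sketch of the MUD-update step under assumed JSON layouts for both files.
import json

with open("TestSuite-output.json", encoding="utf-8") as f:
    results = json.load(f)

# Assumed layout: a list of test entries, some carrying a non-binary value.
dos = next((t for t in results.get("tests", []) if t.get("claim") == "C22"), None)
limit = dos["non_binary_value"] if dos and "non_binary_value" in dos else 50_000

with open("mud-profile.json", encoding="utf-8") as f:
    mud = json.load(f)

# "ietf-mud:mud" is the RFC 8520 container; "extensions-data" is assumed.
mud["ietf-mud:mud"]["extensions-data"] = {"num-connections": limit}  # previously 3

with open("mud-profile.json", "w", encoding="utf-8") as f:
    json.dump(mud, f, indent=2)
```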
5. Discussion
The proposed security evaluation methodology for ICT systems offers a robust framework to address the limitations of traditional approaches. By integrating automated testing and dynamic labeling, this methodology meets the demand for adaptable and cost-effective cybersecurity certification processes, particularly suited for systems operating in highly dynamic environments. The automation capabilities offered by integrated tools like GraphWalker and the Fuzzing Tool significantly streamline the traditionally time-consuming certification processes. This approach not only ensures that testing is comprehensive but also supports rapid re-evaluation, facilitating ongoing validation. Automating the processing of test results, risk scoring, and MUD profile generation further simplifies this process, allowing for timely evaluations of systems.
By combining risk assessment with test-based evaluations, this approach allows for a more precise and objective assessment of security risks, capturing both binary and non-binary test results. This empirical approach, combined with best-practice approaches such as HEAVENS or the ISO standards for metrics calculation, enhances the accuracy of risk estimations and aligns security assessments with the operational needs of modern ICT environments. Moreover, the introduction of TPs tailored to the security context provides a customized approach to certification. These profiles allow the methodology to adapt to the unique security requirements of different contexts and systems. However, the definition of such TPs requires a joint effort between regulatory entities and industry, or an adaptation of the CC collaborative profiles.
Applying the methodology to two use cases has shown its effectiveness in assessing complex and interconnected systems in different contexts. Furthermore, the methodology has been designed to enable the composition of previous evaluations, since the risk estimation phase combines the individual risks of each system component. Although previous results can be reused, the combination of individual risks is modulated by the sensitivity factor, so a detailed analysis of the role of the component within the system and its dependencies with other components is necessary for the final calculation of the security level. Additionally, integration of the component into the system can lead to additional claims and integration tests that allow for verification of the joint security of the system. In this case, the results of the new tests should be combined with the previously certified security of the component.
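Purely as an illustration of this composition (the methodology's actual aggregation function is not reproduced here), the system-level risk can be read as a function of sensitivity-weighted component risks:

$$R_{\mathrm{sys}} = f\bigl(s_1 r_1,\, \ldots,\, s_n r_n\bigr), \qquad \text{e.g.} \qquad R_{\mathrm{sys}} = \max_{1 \le i \le n} s_i\, r_i,$$

where $r_i$ denotes the previously certified risk of component $i$ and $s_i$ its sensitivity factor within the composed system; the max form here is only one plausible choice of $f$.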
The methodology introduced a user-friendly visual security label that conveys the system’s security posture to both technical and non-technical stakeholders. By representing key security properties in a radar chart format, the label provides a clear summary of complex evaluations, enabling consumers to make informed decisions and easily compare the security levels of different systems. This transparency is a crucial step toward fostering trust and understanding in cybersecurity certification. However, there is still some pending discussion regarding which information should be public or private and how to manage access to the sensitive information inside the security label.
The methodology also demonstrates significant alignment with the CRA and CSA regulatory frameworks. The CRA underscores the importance of lifecycle security, requiring manufacturers to continuously address vulnerabilities and maintain compliance throughout the product lifecycle. The proposed methodology integrates continuous monitoring and automated assessment tools, enabling rapid adaptation to evolving threats and supporting lifecycle recertifications. By leveraging standards such as the MUD and extending them to complex systems, the methodology not only reduces the attack surface but also ensures adaptability to dynamic cybersecurity landscapes. These features align closely with the CRA’s emphasis on lifecycle security and vulnerability management. The use of standards within the methodology fosters consistency and comparability in certifications as required by the CSA, while the dynamic security labeling system featuring visual indicators and QR codes enhances transparency and accessibility for all stakeholders. By introducing TP and adapting security evaluations to specific operational contexts, the methodology provides tailored yet harmonized certification outcomes, supporting the CSA’s vision for a unified certification framework.
6. Conclusions and Future Work
The proposed security evaluation methodology provides several significant advantages. First, it introduces automation and dynamic features, which simplify the traditionally complex and resource-intensive certification processes. The combination of automated testing, dynamic labeling, and clear visual communication supports both technical and non-technical stakeholders, enabling informed decision-making and fostering trust. Second, the methodology’s adaptability to different operational contexts through the use of tolerance profiles makes it suitable for a wide variety of ICT systems, from simple devices to complex interconnected platforms. Lastly, the ability to integrate and reuse previous evaluations while incorporating sensitivity factors ensures efficiency without compromising the accuracy of security assessments.
To further validate and refine this methodology, future work will involve conducting a series of case studies to evaluate and rate each component of the framework. This approach will guide targeted enhancements to better address practical challenges encountered during implementation. Additionally, we will investigate how to extend the methodology with semi-automated procedures to address issues encountered during the evaluation process before treatment sharing. In relation to this, we will consider technologies and mechanisms such as Hyperledger and threat MUD to support secure information sharing between relevant stakeholders and to launch the required processes to update the information contained in the QR code of the label.
The proposed methodology, while focused on technical risk management, can also be aligned with organizational risk governance, as outlined in the ISO 31000 standard, enabling stakeholders to make informed decisions that balance technical security needs with business objectives. For example, the context establishment phase can include analyzing how ICT system risks align with enterprise risk management frameworks, regulatory compliance, and strategic goals. Similarly, the treatment phase could support the prioritization of mitigation efforts based on their impact on both technical security and business outcomes. Organizational risk management could be explored further as part of the methodology's future work.
Although coordination between stakeholders and regulatory bodies is key to defining a comprehensive cybersecurity certification framework, the proposed methodology is intended to serve as a basis for a more standardized and uniform approach. With ongoing advancements, this methodology is expected to evolve in alignment with EU regulations and ENISA guidelines, contributing to a cohesive and adaptable certification landscape in the coming years.
Author Contributions
Conceptualization, S.N.M.; methodology, S.N.M., J.F.M.-G., I.B., J.M., R.P., and A.S.; software, J.F.M.-G., I.B., J.M., and R.P.; validation, S.N.M., J.F.M.-G., I.B., J.M., and R.P.; writing—original draft preparation, S.N.M., J.F.M.-G., I.B., J.M., R.P., and A.S.; writing—review and editing, S.N.M. and R.P.; supervision, S.N.M. and A.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work has been partially funded by the European Commission through the projects H2020-952702 BIECO, H2021-101069471 CERTIFY, and DOSS (grant no. 101120270).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
Author Irene Bicchierai was employed by ResilTech s.r.l.; authors Jan Marchel and Radosław Piliszek were employed by 7bulls.com. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
CC | Common Criteria
CPA | Commercial Product Assurance
CSA | Cybersecurity Act
CSPN | Certification de Sécurité de Premier Niveau
CRA | Cyber Resilience Act
CVE | Common Vulnerabilities and Exposures
CVSS | Common Vulnerability Scoring System
CWE | Common Weakness Enumeration
CWSS | Common Weakness Scoring System
DoS | Denial of Service
DPIA | Data Protection Impact Assessment
DREAD | Damage potential, Reproducibility, Exploitability, Affected users, Discoverability
EAL | Evaluation Assurance Level
ENISA | European Union Agency for Cybersecurity
ETSI | European Telecommunications Standards Institute
EUCC | Common Criteria-based European candidate cybersecurity certification scheme
EUCS | European Cybersecurity Certification Scheme for Cloud Services
EVITA | E-safety Vehicle Intrusion Protected Applications
FMEA | Failure Mode and Effect Analysis
GDPR | General Data Protection Regulation
HEAVENS | HEAling Vulnerabilities to ENhance Software Security and Safety
ICT | Information and Communication Technology
IoT | Internet of Things
ISO | International Organization for Standardization
MBT | Model-Based Testing
MUD | Manufacturer Usage Description
NIST | National Institute of Standards and Technology
NVD | National Vulnerability Database
OCTAVE | Operationally Critical Threat, Asset, and Vulnerability Evaluation
PP | Protection Profile
STRIDE | Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privileges
TOE | Target of Evaluation
TP | Tolerance Profile
TTCN | Testing and Test Control Notation
UL | Underwriters Laboratories
YAML | YAML Ain't Markup Language
References
- Gartner. Top Cybersecurity Trends and Strategies for Securing the Future. 2024. Available online: https://www.gartner.com/en/cybersecurity/topics/cybersecurity-trends (accessed on 3 February 2025).
- Regulation (EU) 2019/881 of the European Parliament and of the Council of 17 April 2019 on ENISA (the European Union Agency for Cybersecurity) and on information and communications technology cybersecurity certification and repealing Regulation (EU) No 526/2013 (Cybersecurity Act) (Text with EEA relevance), 2019.
- Regulation (EU) 2024/2847 of the European Parliament and of the Council of 23 October 2024 on horizontal cybersecurity requirements for products with digital elements and amending Regulations (EU) No 168/2013 and (EU) 2019/1020 and Directive (EU) 2020/1828 (Cyber Resilience Act) (Text with EEA relevance), 2024.
- European Telecommunications Standards Institute (ETSI). ETSI EG 203 251 V1.1.1: Methods for Testing and Specification; Risk-based Security Assessment and Testing Methodologies. 2016. Available online: https://www.etsi.org/deliver/etsi_eg/203200_203299/203251/01.01.01_60/eg_203251v010101p.pdf (accessed on 3 February 2025).
- ISO 31000:2018; Risk Management—Guidelines. ISO (International Organization for Standardization): Geneva, Switzerland, 2018.
- ISO/IEC/IEEE 29119-2:2021; Software and Systems Engineering—Software Testing. Part 2: Test Processes. ISO (International Organization for Standardization): Geneva, Switzerland, 2021.
- Lear, E.; Romascanu, D.; Droms, R. Manufacturer Usage Description Specification. Standard IETF RFC 8520. 2019. Available online: https://tools.ietf.org/html/rfc8520 (accessed on 3 February 2025).
- Alberts, C.J.; Dorofee, A.J.; Stevens, J.F.; Woody, C. OCTAVE-S Implementation Guide, Version 1; Carnegie Mellon University. Technical Report; 2005. Available online: https://insights.sei.cmu.edu/documents/1608/2005_002_001_14273.pdf (accessed on 3 February 2025).
- E-Safety Vehicle Intrusion Protected Applications (EVITA). D3.4.3-On-Board Architecture and Protocols Verification; 2010. Available online: https://www.evita-project.org/deliverables.html (accessed on 3 February 2025).
- Shameli-Sendi, A.; Aghababaei-Barzegar, R.; Cheriet, M. Taxonomy of information security risk assessment (ISRA). Comput. Secur. 2016, 57, 14–30. [Google Scholar] [CrossRef]
- Matheu-García, S.N.; Hernández-Ramos, J.L.; Skarmeta, A.; Baldini, G. A Survey of Cybersecurity Certification for the Internet of Things. ACM Comput. Surv. 2020, 53, 1–36. [Google Scholar] [CrossRef]
- Matheu, S.N.; Hernandez-Ramos, J.L.; Skarmeta, A.F. Toward a Cybersecurity Certification Framework for the Internet of Things. IEEE Secur. Priv. 2019, 17, 66–76. [Google Scholar] [CrossRef]
- European Cyber Security Organisation (ECSO), WG1—Standardisation, Certification, Labelling and Supply Chain Management. Overview of Existing Cybersecurity Standards and Certification Schemes v2; 2017. Available online: https://ecs-org.eu/ecso-uploads/2022/10/5a31129ea8e97.pdf (accessed on 3 February 2025).
- Common Criteria. Common Criteria for Information Technology Security Evaluation. Part 1: Introduction and General Model; 2022. Available online: https://commoncriteriaportal.org/files/ccfiles/CC2022PART1R1.pdf (accessed on 3 February 2025).
- National Cyber Security Centre (NCSC). The Commercial Product Assurance (CPA) Build Standard v1.4; NCSC-1844117881-312. 2018. Available online: https://www.ncsc.gov.uk/files/CPA-Build_Standard_1-4.pdf (accessed on 3 February 2025).
- Underwriters Laboratories (UL). Software Cybersecurity for Network-Connectable Products, Part 2-1: Particular Requirements for Network Connectable Components of Healthcare and Wellness Systems; 2023. Available online: https://www.shopulstandards.com/ProductDetail.aspx?productId=UL2900-2-1 (accessed on 3 February 2025).
- Agence Nationale de la Sécurité des Systèmes d'Information (ANSSI). Certification de Sécurité de Premier Niveau des Produits des Technologies de l'Information (CSPN); Paris, No. 45/ANSSI/SDE/PSS/CCN; 2023. Available online: https://cyber.gouv.fr/sites/default/files/document/ANSSI-CSPN-CER-P-01%20Certification_de_securite_de_premier_niveau_v5.0.pdf (accessed on 3 February 2025).
- Zhou, C.; Ramacciotti, S. Common Criteria: Its Limitations and Advice on Improvement. ISSA J. 2011, 24–28. Available online: https://www.difesa.it/assets/allegati/33182/commoncriteria_issa_journal_0411.pdf (accessed on 3 February 2025).
- Hernandez-Ramos, J.L.; Matheu, S.N.; Skarmeta, A. The Challenges of Software Cybersecurity Certification [Building Security In]. IEEE Secur. Priv. 2021, 19, 99–102. [Google Scholar] [CrossRef]
- Fowler, D.; Epiphaniou, G.; Maple, C. Cybersecurity Assurance and Certification for Systems. 2022. Available online: https://www.researchgate.net/publication/370873334_Cybersecurity_Assurance_and_Certification_for_Systems (accessed on 3 February 2025).
- Khurshid, A.; Alsaaidi, R.; Aslam, M.; Raza, S. EU Cybersecurity Act and IoT Certification: Landscape, Perspective and a Proposed Template Scheme. IEEE Access 2022, 10, 129932–129948. [Google Scholar] [CrossRef]
- Hernández-Ramos, J.L.; Matheu, S.N.; Feraudo, A.; Baldini, G.; Bernabe, J.B.; Yadav, P. Defining the Behavior of IoT Devices Through the MUD Standard: Review, Challenges, and Research Directions. IEEE Access 2021, 9, 126265–126285. [Google Scholar] [CrossRef]
- Matheu García, S.N.; Sánchez-Cabrera, A.; Schiavone, E.; Skarmeta, A. Integrating the manufacturer usage description standard in the modelling of cyber–physical systems. Comput. Stand. Interfaces 2024, 87, 103777. [Google Scholar] [CrossRef]
- ISO 26262-3:2018; Road Vehicles—Functional Safety Part 3: Concept Phase. ISO (International Organization for Standardization): Geneva, Switzerland, 2018.
- Failure Modes and Effects Analysis (FMEA). University of Cambridge. Available online: https://www.ifm.eng.cam.ac.uk/research/dmg/tools-and-techniques/fmea-failure-modes-and-effects-analysis/ (accessed on 3 February 2025).
- BSI-Standard 100-4; Business Continuity Management. BSI (Federal Office for Information Security): Nordrhein-Westfalen, Germany, 2009.
- Federal Office for Information Security (BSI). Privacy Impact Assessment Guideline; Federal Office for Information Security (BSI): Bonn, Germany, 2011. [Google Scholar]
- National Institute of Standards and Technology (NIST). National Vulnerability Database (NVD). Available online: https://nvd.nist.gov (accessed on 3 February 2025).
- Matheu, S.; Sánchez, A.; Cioroaica, E.; Daoudagh, S.; Lonetti, F.; Marchetti, E.; Schiavone, E.; Massimiliano, L.; Sorokos, I.; Pintos, B.; et al. D7.1 Report on the Identified Security and Privacy Metrics and Security Claims to Evaluate the Security of a System. BIECO Project—Building Trust in Ecosystems and Ecosystem Components. 2022. Available online: https://www.bieco.org/project-description/deliverables/d7-1-report-on-the-identified-security-and-privacy-metrics-and-security-claims-to-evaluate-the-security-of-a-system (accessed on 3 February 2025).
- National Telecommunications and Information Administration (NTIA). Software Bill of Materials (SBOM). 2022. Available online: https://ntia.gov/SBOM (accessed on 3 February 2025).
- ETSI ES 201 873-1; Methods for Testing and Specification (MTS); Testing and Test Control Notation Version 3; Part 1: TTCN-3 Core Language. European Telecommunications Standards Institute (ETSI): Sophia Antipolis, France, 2023.
- NIST SP 800-30; Joint Task Force Transformation Initiative. Guide for Conducting Risk Assessments: Information security. National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2012. [CrossRef]
- Lautenbach, A. D2. Security Models. HEAVENS project—HEAling Vulnerabilities to ENhance Software Security and Safety. 2016. Available online: https://autosec.se/wp-content/uploads/2018/03/HEAVENS_D2_v2.0.pdf (accessed on 3 February 2025).
- Eichler, J.; Angermeier, D. Modular risk assessment for the development of secure automotive systems. In Proceedings of the 31st VDI/VW Joint Conference Automotive Security, Wolfsburg, Germany, 21–22 October 2015. [Google Scholar]
- European Cyber Security Organisation (ECSO). Cybersecurity Made in Europe. Available online: https://www.cybersecurity-label.eu/ (accessed on 3 February 2025).
- IoT Security and Privacy Label. Carnegie Mellon University. Available online: https://www.iotsecurityprivacy.org/ (accessed on 3 February 2025).
- Directive 2010/30/EU of the European Parliament and of the Council of 19 May 2010 on the indication by labelling and standard product information of the consumption of energy and other resources by energy-related products (recast) (Text with EEA relevance). Directive-2010/30-EN-EUR-Lex, 2010.
- National Institute of Standards and Technology (NIST). Considerations for Managing Internet of Things (IoT) Cybersecurity and Privacy Risks; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2019. Available online: https://nvlpubs.nist.gov/nistpubs/ir/2018/NIST.IR.8228-draft.pdf (accessed on 3 February 2025).
- Net2DG Project-Leveraging NETworked Data for the Digital Electricity Grid. Funded by European Union’s Horizon 2020 Research and Innovation Programme Under Grant Agreement No 774145. 2020. Available online: http://www.net2dg.eu/ (accessed on 3 February 2025).
- Ben-Kiki, O.; Evans, C.; Ingy, N. YAML Ain’t Markup Language (YAML) Version 1.2. 2021. Available online: https://yaml.org/spec/1.2.2/ (accessed on 3 February 2025).
- Schiavone, E.; Nostro, N.; Brancati, F. A MDE Tool for Security Risk Assessment of Enterprises. In Proceedings of the 10th Latin-American Symposium on Dependable Computing (LADC 2021). Sociedade Brasileira de Computação, Porto Alegre, Brazil, 22–26 November 2021; pp. 5–7. [Google Scholar] [CrossRef]
- Matheu-García, S.N.; Hernández-Ramos, J.L.; Skarmeta, A.F.; Baldini, G. Risk-based automated assessment and testing for the cybersecurity certification and labelling of IoT devices. Comput. Stand. Interfaces 2019, 62, 64–83. [Google Scholar] [CrossRef]
- Lorrain, J.; Fourneret, E.; Dadeau, F.; Legeard, B. MBeeTle: Un Outil pour la Génération de Tests à-la-Volée à l'Aide de Modèles. Groupement de Recherche CNRS du Génie de la Programmation et du Logiciel, Besançon, France. 2016. Available online: https://hal.science/hal-02472608 (accessed on 3 February 2025).
- Bicchierai, I.; Araniti, E.; Matheu-García, S.N.; Gil, J.F.M. Validating the BIECO Security Evaluation Methodology within a Smart Grid Monitoring SW. In Proceedings of the 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), Florence, Italy, 9–12 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 9–12. [Google Scholar] [CrossRef]
- Daoudagh, S.; Marchetti, E.; Calabrò, A.; Ferrada, F.; Oliveira, A.I.; Barata, J.; Peres, R.; Marques, F. DAEMON: A Domain-Based Monitoring Ontology for IoT Systems. SN Comput. Sci. 2023, 4, 1–16. [Google Scholar] [CrossRef]
- Matheu, S.; Sebastio, S.; Skarmeta, A.; Orizio, R.; Vasileiadis, S.; Kalos, V.; Muller, K.; Grubl, T.; Omana, R.; Tuck, S.; et al. Active Security for Connected Device Lifecycle: The CERTIFY Architecture. In Proceedings of the Special Session on Intelligent Internet of Things Security and Privacy (WISP), Salamanca, Spain, 26–28 June 2024. [Google Scholar]