1. Introduction
As interconnectivity grows, ensuring the security of information and communication technology (ICT) systems becomes a top priority for organizations, governments, and regulatory bodies worldwide. The rise of complex and multilayered systems, such as smart grids, 5G networks, and the Internet of Things (IoT), presents unique cybersecurity challenges. These systems are often composed of heterogeneous components that must operate in dynamic and evolving environments, making them particularly vulnerable to cyberattacks. In fact, organizations are under extra pressure from “the challenge of managing security exposures in a constantly evolving threat environment and from increasing regulatory obligations and government oversight of cybersecurity”, as reflected in the Gartner Top Trends in Cybersecurity 2024 survey [1].
In this landscape, the European Union (EU) has recognized the need for a comprehensive cybersecurity certification framework through ambitious initiatives such as the Cybersecurity Act (CSA) [2] and the Cyber Resilience Act (CRA) [3]. Both regulations agree that a suitable cybersecurity certification approach would help to assess and compare different devices and products, increasing the trust of end users in a hyperconnected society. They place additional pressure on manufacturers, emphasizing the need to support cybersecurity throughout the product lifecycle.
Therefore, the definition of a certification framework requires efforts from different areas to meet the requirements of different stakeholders, such as manufacturers, institutions, and consumers. One of the main challenges arises from the heterogeneity present in the ICT landscape. The unique characteristics of each system require adapted approaches that can provide objective comparisons with respect to cybersecurity, whereas the dynamic nature of cybersecurity requires that the certification take into account the changing conditions in which the product will operate. This dynamism often manifests itself in patches or updates needed to address new vulnerabilities discovered in specific devices or components. Therefore, an agile certification process is essential to ensure that the security level remains up-to-date throughout the lifecycle of the product, as required by the new EU regulations.
At the same time, a certification framework must address the practical needs of the market by being efficient and cost-effective, ensuring that the launch of new products is not delayed. From an end-user perspective, the technical results of the certification process are often complex and present an additional challenge in how to communicate this information clearly and accessibly. Therefore, it is crucial to convert these technical findings into a simplified format, such as a visual label or certificate. This will help non-expert users easily understand and compare products while preserving the necessary technical details to ensure transparency and trust.
To address these challenges, this article proposes a security evaluation methodology based on two building blocks: security risk assessment and security testing. This methodology aims to certify the system’s security within a specific context. In particular, to support transparency for the user about the outcome of the certification process, this work proposes the concept of a cybersecurity label which contains information related to the level of security validated in the certification process. Our approach is based on an instantiation of the security risk assessment and testing methodology proposed by the European Telecommunications Standards Institute (ETSI) [4], which relies on the International Organization for Standardization (ISO) 31000 [5] and ISO 29119 [6] standards. Furthermore, we integrate the Manufacturer Usage Description (MUD) [7] standard from the Internet Engineering Task Force (IETF) as part of the certification process.
The applicability of the proposed methodology is demonstrated in two scenarios, an ICT gateway used in smart grid infrastructures and an Artificial Intelligence (AI) Investments platform, which highlight the methodology’s relevance in assessing the security of critical infrastructure components in different environments. The contributions of this paper are threefold:
A flexible and scalable security evaluation framework that integrates risk assessment with test-based evaluation, allowing for continuous monitoring, mitigation, and recertification of ICT systems.
A dynamic security labeling system, which provides transparent and real-time security status indicators, making it easier for both experts and non-experts to assess and compare the security levels of different products or systems.
Validation of the methodology through two case studies, demonstrating its adaptability across different types of ICT systems, including both hardware-centric infrastructures and AI-driven platforms.
2. Challenges and State of the Art
This section explores the key limitations and challenges faced by current certification schemes, as well as recent advances aimed at addressing these gaps.
2.1. Objectivity and Harmonization
A critical issue is the subjectivity and variability of the metrics used within the risk assessment processes. Current assessment methodologies, such as OCTAVE [8] and EVITA [9], rely on manual assessments, leading to inconsistent evaluations [10]. Additionally, metrics such as likelihood or impact are difficult to measure due to their complexity [11], relying on historical data, which are not always available. Approaches such as ARMOUR [12] address these challenges by combining risk assessment with empirical testing, but they are limited to specific contexts, e.g., IoT devices, and they lack a generalized framework for complex ICT systems.
Additionally, the wide range of existing cybersecurity certification schemes [13] can make the comparison of certification results difficult, limiting the capability of users to compare the outcomes of different certification processes. Some certification frameworks, such as Common Criteria (CC) [14], support comparability among the results of independent cybersecurity evaluations through Collaborative Protection Profiles (cPP) under the terms of the Common Criteria Recognition Arrangement (CCRA). Others, like Commercial Product Assurance (CPA) [15], test products against CPA Security Characteristics, some of which are qualitative and ambiguous [11]. Similarly, the CAP UL 2900 standards [16] are not public, so harmonization aspects are not addressed.
Our methodology integrates security testing results into the risk assessment process, enhancing objectivity and reproducibility while being supported by automated tools. This allows for precise and dynamic assessments of evolving security vulnerabilities. The methodology considers the system as a set of components, integrating individual cybersecurity aspects along with dependencies and potential cascade effects. Moreover, the methodology is aligned with well-known standards from ETSI, ISO, and IETF to foster the harmonization of the processes and metrics.
2.2. Operational Context
The context in which the system will operate is crucial to determine the required security level for certification. Security requirements in a smart home differ from those in a medical environment. Approaches like CC, the European Cybersecurity Certification Scheme (EUCC), or the European Cybersecurity Certification Scheme for Cloud Services (EUCS) rely on security profiles to define the minimum security level for each context, while others, such as CPA, do not adequately consider the operational context [11]. Inspired by CC, our methodology reuses the notion of protection profile (PP) to establish the security level for each context, serving as the certification basis.
2.3. Cost and Time
Frameworks widely used for ICT security certification, such as CC, OCTAVE, or CAP UL [17], are based on formal, documentation-intensive processes, especially for high Evaluation Assurance Levels (EALs). While effective in establishing security baselines, CC can be inflexible and resource-intensive, making it unsuitable for fast-evolving systems requiring frequent updates [18,19]. Recertification under these schemes can be time-consuming and costly, posing challenges for continuous security assurance in dynamic environments. Although EUCC and EUCS offer important foundations for security certification, their complex processes do not always adapt quickly to emerging technologies or recertifications [20,21], and no products have been certified under these evolving schemes.
Automated testing methodologies, such as Model-Based Testing (MBT) and fuzzing, improve the efficiency of security evaluations by automatically generating test cases or random input data to detect vulnerabilities. However, these tools are often implemented in isolation and are not fully integrated with broader certification frameworks, creating a gap between empirical testing and formal certification. Our methodology integrates automated testing tools and methods to accelerate the assessment process. It is flexible enough to be instantiated through other tools and mechanisms, and it allows for composition by reusing the previous results of individual components to assess more complex systems.
One critical aspect often overlooked in security evaluation is value management. This involves balancing the trade-offs between security investments, operational costs, and the overall benefit to the organization. The proposed methodology ensures that resources are allocated efficiently by aligning risk assessment and treatment efforts with the evaluator’s priorities, which can be mapped to the organization’s priorities. By integrating continuous feedback mechanisms and cost-effective automated tools, the methodology minimizes unnecessary costs while maintaining robust security levels.
2.4. Dynamism
One of the main challenges in cybersecurity certification is the dynamic nature of cybersecurity. A product certified as secure can quickly become vulnerable due to new threats or patches, requiring recertification. EU regulations require manufacturers to address security issues, and the certification process must adapt to lifecycle changes to ensure ongoing compliance. Frameworks like CC, CPA, CSPN, ARMOUR, or CAP UL offer static, point-in-time security evaluations which do not reflect the evolving nature of cyber threats [11]. Continuous monitoring and fast recertification are crucial in environments with frequent updates or emerging vulnerabilities (e.g., zero-day attacks).
The proposed methodology integrates monitoring and automated assessment tools to support recertification. It helps manufacturers in lifecycle management by linking evaluation results with mitigation strategies and policies enforceable during the product’s deployment. For this, the methodology relies on the Manufacturer Usage Description (MUD) standard, which offers a framework for specifying the network actions a device is allowed to perform, thereby reducing the attack surface. However, MUD’s application is limited to specific device behaviors and does not extend to more complex systems with interactions between hardware, software, and networks [22]. To address this, we use an extended MUD model [23] complemented by the evaluation results generated by the methodology.
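For readers unfamiliar with MUD, the following abridged profile (rendered as a Python dictionary, with illustrative device names and values) shows the general shape such a file takes under RFC 8520: a device description plus access-control lists that restrict the device’s network behavior.

```python
import json

# Abridged MUD profile in the spirit of RFC 8520; names and URLs are illustrative.
mud_profile = {
    "ietf-mud:mud": {
        "mud-version": 1,
        "mud-url": "https://manufacturer.example.com/device.json",  # hypothetical
        "is-supported": True,
        "systeminfo": "Example smart-grid sensor",
        "from-device-policy": {
            "access-lists": {"access-list": [{"name": "from-dev-acl"}]}
        },
    },
    "ietf-access-control-list:acls": {
        "acl": [{
            "name": "from-dev-acl",
            "type": "ipv4-acl-type",
            "aces": {"ace": [{
                "name": "allow-cloud",
                # only outbound traffic to the manufacturer's cloud is allowed
                "matches": {"ipv4": {"ietf-acldns:dst-dnsname": "cloud.example.com"}},
                "actions": {"forwarding": "accept"},
            }]},
        }]
    },
}
print(json.dumps(mud_profile, indent=2))
```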
2.5. Labeling
Cybersecurity certification often lacks transparency, making it difficult for users to understand its benefits due to complex jargon and non-comparable standards. To address this, cybersecurity labels have been introduced as concise indicators of a product’s security level [12]. These labels must balance simplifying technical details for users with clearly representing certification results. In this sense, our methodology incorporates a dynamic security labeling system using QR codes that, together with the continuous monitoring processes, provides real-time visibility into the security status of a system.

Table 1 provides a summary of the comparison between the main certification frameworks and the proposed methodology based on the challenges identified in the previous subsections. Additional information about the frameworks and their limits is detailed in our previous work [11].
3. Security Evaluation Methodology for ICT Systems
Figure 1 shows the overall certification process that we propose. It is derived from the ETSI proposal [4], which combines an extended security assessment derived from ISO 31000 and typical security testing activities following the ISO 29119 standard. Building on this approach, we define a cybersecurity certification framework based on two core components outlined in the ETSI proposal: security testing and security risk assessment. Security testing focuses on identifying flaws, vulnerabilities, and other technical issues, while security risk assessment addresses potential vulnerabilities with an emphasis on legal and business concerns. The process begins with establishing the context, which lays the foundation for both streams by analyzing the environment in which the device or component will be evaluated. The methodology incorporates continuous communication and auditing to maintain a management perspective, ensuring continuous oversight and control of the information generated from assessment and testing processes. The treatment phase provides targeted security countermeasures based on the identified risks. Finally, the methodology also includes specific activities related to certification. In particular, a labeling activity has been integrated after the risk assessment process, which is in charge of communicating the security level obtained for a specific Target of Evaluation (TOE). While the CC framework defines the TOE as a collection of software, firmware, and/or hardware with accompanying guidance, our approach extends this definition to include the TOE’s configuration (e.g., specific protocols, libraries, or cryptographic parameters, among others).
This work proposes the use of specific technologies and tools for the security risk assessment and testing processes as the main building blocks of a security certification approach for ICT systems. It is, in turn, an extension of the methodology proposed by ETSI, building on other standards to take advantage of their standardized basis.
Table 2 shows an overview of the concepts and processes taken from different standards that support the proposed methodology, which will be further detailed in the next subsections. The methodology addresses key challenges identified in previous sections, such as the need for automated, dynamic, and objective risk assessment and the adaptability to different systems and tools.
3.1. Establishing the Context
To effectively assess the security of a system, it is essential to establish a strong basis to identify potential vulnerabilities and security flaws. Traditional risk assessment approaches often depend on known vulnerabilities cataloged in databases such as the National Vulnerability Database (NVD) [28] and on expert analysis. Although these are valuable sources, they may not fully capture the evolving threat landscape, especially for emerging technologies. Our methodology enhances this approach by incorporating not only known threats but also best practices and relevant security standards to form an initial set of security claims.
Therefore, in this preliminary phase of establishing the context, we define the claims that will serve as a baseline to protect the system against both known and unknown threats. In total, we elicited 74 claims, collected in [29]. Each claim has a description, a STRIDE and impact classification following our methodology, preconditions to apply the claim, dependencies on other claims, metrics to be collected, possible tests, and the sources from which the claim was obtained.
This phase also introduces the concept of Tolerance Profiles (TPs) to reflect the TOE’s operational context. Similarly to PPs in CC, these profiles define acceptable levels of risk across different security aspects within the specific operational context.
Figure 2 shows an example of a tolerance profile. The TP specifies the tolerable risk level for each security property, such as a confidentiality risk ranging from 0 to 7. If the evaluated confidentiality risk exceeds the defined threshold of 7, the product would not be eligible for certification. Additionally, a security level coding system, ranging from A (most secure, green) to D (least secure, red), links these risk thresholds to certification in a visual way through the label (see Section 3.6). Finally, in this phase, we can also establish the depth of the evaluation to be carried out. For this, we can rely on the notion of EALs outlined in the CSA, EUCC, or CC.
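As an illustration of how a TP can be made machine-readable (the YAML syntax actually used is shown later in Listing 1), the following Python sketch encodes per-property risk ceilings and the A-D band edges; all numbers except the confidentiality ceiling of 7, taken from the example above, are invented.

```python
# Sketch of a Tolerance Profile (TP): a maximum tolerable risk per security
# property plus the band edges for the A-D security levels. Values other than
# the confidentiality ceiling of 7 are illustrative assumptions.
TOLERANCE_PROFILE = {
    "confidentiality": {"max_risk": 7, "bands": [(2, "A"), (4, "B"), (6, "C"), (7, "D")]},
    "integrity":       {"max_risk": 5, "bands": [(1, "A"), (2, "B"), (4, "C"), (5, "D")]},
    "availability":    {"max_risk": 6, "bands": [(2, "A"), (3, "B"), (5, "C"), (6, "D")]},
    # ... one entry per STRIDE-derived security property
}
```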
3.2. Risk Identification
Once the general claims have been established, the next step is to identify the specific ones relevant to the TOE. The goal of the risk identification phase is to break down the system into its components, determine the applicable security properties and claims for each of them, and define the security tests needed to validate these claims. This phase transitions from a high-level understanding of the system to a more detailed, component-specific view, which is critical for the following security assessment and testing phases.
Figure 3 describes the proposed scheme for identifying risks following the well-known Unified Modeling Language (UML) notation, moving from a broad, abstract description of the system to detailed security tests for each component. The process can be summarized as follows:
(1) From system to security properties: Each system is evaluated against authentication, authorization, non-repudiation, confidentiality, integrity, and availability based on the well-known STRIDE taxonomy. Using this taxonomy simplifies the creation of a visual security label later in the process, as the security level will be quantified for each of the categories. It is important to note that not all security properties will apply to every system. For example, if a system does not interact with external resources or entities, the authorization property may not be relevant.
(2) From system security properties to components: Systems may consist of a single component (e.g., a device) or multiple interconnected components, each of which may affect one or more security properties. In this step, we identify the components that compose the system and their associated security properties. This mapping helps to identify which parts of the system require specific security tests.
(3) From components to applicable claims: Once the components are identified, we select the claims applicable to each one of their security properties. The set of claims we identified has already been classified following the STRIDE taxonomy, facilitating this process for the security evaluator.
(4) From generic claims to vulnerability-based claims: Some claims are directly linked to known vulnerabilities (e.g., “C32: The source code must not use components with known vulnerabilities”). These vulnerabilities can be identified from public databases such as the NVD, Common Weakness Enumeration (CWE) (https://cwe.mitre.org accessed on 3 February 2025), and Common Vulnerabilities and Exposures (CVE) (https://cve.mitre.org accessed on 3 February 2025). When a vulnerability is identified, it is assigned a predefined risk score (e.g., using CVSS) on a scale from 0 to 10. The security expert may refine this score based on the specific characteristics of the component and its usage context.
(5) From claims to security tests: Once the vulnerabilities and claims have been identified, the next step is to define the tests required to validate them. For vulnerability-based claims, tests are designed to verify whether the identified vulnerabilities are present in the system. For test-based claims, relevant tests are developed to ensure that the system meets the security requirements for each claim.
At the end of the process, we decompose the TOE evaluation into a tree (Figure 3) in which the leaves represent the specific security tests to be developed and executed in subsequent phases.
This phase also analyzes the dependencies between components in order to identify any cascade effects and highly sensitive components within the system; that is, components that depend on a large number of other components and are more prone to failures. Therefore, a sensitivity metric between 0 and 10 is assigned to each component of the system. The higher the sensitivity value, the more likely the component is to be impacted by cascade effects. This metric can be obtained manually or semi-automatically by analyzing the dependencies discovered in files such as the Software Bill of Materials (SBOM) [30] for software dependencies or MUD for network dependencies. The sensitivity metric also supports prioritizing components during the security evaluation and testing process. Components with higher sensitivity values are more likely to propagate failures or vulnerabilities throughout the system and should therefore be tested with greater priority. This prioritization is essential to optimize security efforts, as ensuring 100% security is impractical due to resource constraints and/or system complexity. The trade-off between security and practical applicability is managed by focusing testing and verification efforts on the most critical components, ensuring an efficient and effective evaluation.
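For illustration, one simple way to derive such a score semi-automatically is to normalize the number of transitive dependencies of each component, e.g., extracted from an SBOM. The following Python sketch shows this idea; it is an illustration, not the paper’s exact procedure.

```python
# Semi-automatic sensitivity estimation (a sketch): score each component by how
# many components it transitively depends on, normalized to the 0-10 scale.
def sensitivity_scores(dependencies):
    """dependencies: dict mapping each component to its direct dependencies."""
    def reach(comp, seen):
        # collect the transitive dependency set of `comp`
        for dep in set(dependencies.get(comp, ())) - seen:
            seen.add(dep)
            reach(dep, seen)
        return seen

    counts = {c: len(reach(c, set())) for c in dependencies}
    worst = max(counts.values()) or 1
    return {c: round(10 * n / worst, 1) for c, n in counts.items()}

# gui depends (directly or transitively) on api and db -> most sensitive
print(sensitivity_scores({"db": [], "api": ["db"], "gui": ["api"], "logger": ["db"]}))
```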
3.3. Security Testing
The security testing phase is essential for creating a comprehensive test description to evaluate the security claims identified in the previous phase. The first step in this phase is the test design in which claims relevant to the TOE are detailed and transformed into structured test outlines, prioritizing each test based on its associated risk levels. This prioritization ensures that tests targeting higher-risk components or properties are implemented first, optimizing security assessment efforts.
Once the skeleton of the tests is obtained, the test implementation phase translates them into specific low-level steps that the TOE can understand and process. Based on the nature of each test, the most suitable testing technique and tool should be chosen. Although the MBT approach is highly recommended due to the automation of the testing process, this technique can be complemented with others, such as fuzzing for testing inputs and random behaviors. An example of combining multiple techniques can be found in Section 4. The tests can be implemented using a multipurpose programming language such as Python, C, or Java with testing support (e.g., the JUnit framework) or a dedicated language such as Testing and Test Control Notation version 3 (TTCN-3) [31], which is supported by TITAN (https://projects.eclipse.org/projects/tools.titan accessed on 3 February 2025) during the execution.
The environment setup phase ensures that the TOE and its testing environment are fully configured. This environment may be local, using internal devices and resources, or remote, using platforms such as FIT-IoT (https://www.iot-lab.info accessed on 3 February 2025) for broader, distributed testing. Local environments allow for precise control, while remote setups facilitate tests that require multiple entities, such as distributed denial-of-service simulations.
With the environment established, tests are executed in the test execution phase. Automation is encouraged, for example, using the previously mentioned platforms (JUnit or TITAN) to streamline test execution. At the end of the test execution phase, a test report is generated to collect the results of the tests:
PASS if the test result meets the conditions of the test specification.
FAIL if the test result does not meet the conditions for passing the test.
Specific metrics that are not reducible to one of the other two values (PASS/FAIL). These metrics provide more refined information than a binary result, e.g., the encryption percentage, the algorithm used, or the length of the key, helping to improve the estimation of risk.
3.4. Risk Estimation
The risk estimation phase uses the test results from the previous phase to determine the security level of the system more precisely and objectively. This phase calculates risk by combining two primary factors, the likelihood of a vulnerability being exploited and the impact of such an exploitation, using the following well-established formula [32]:

Risk = Likelihood × Impact        (1)
3.4.1. Likelihood
Even if the two factors of the equation have been refined in different risk assessment schemes (e.g., CWSS, CVSS, DREAD, etc.), the likelihood continues to be a complex measurement that requires either a history of vulnerabilities or an expert who determines, possibly based on several factors (e.g., equipment, necessary knowledge, exposure of the system, etc.), the likelihood of exploiting the vulnerability. To address this problem, the proposed methodology establishes a mapping between the test result and the likelihood value. Deterministic tests, such as those that verify whether communications are encrypted, contribute to the overall risk with 0 if the test passes (encryption present) and 1 if the test fails (no encryption). More complex tests yield graded likelihood values based on empirical data, such as the percentage of encrypted data or algorithm strength, scaling results between 0 and 1 to align with the likelihood scale. This approach ensures that both binary and graded tests contribute to the nuanced estimation of likelihood, allowing security evaluators to refine assessments based on real-world test metrics.
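To make this mapping concrete, the following Python sketch (an illustration, not the methodology’s reference implementation) converts binary and graded test results into likelihood values in [0, 1].

```python
# Sketch of the test-result-to-likelihood mapping described above. Binary tests
# contribute 0 (pass) or 1 (fail); graded metrics are scaled into [0, 1].
def likelihood(test):
    if test["kind"] == "binary":
        return 0.0 if test["result"] == "PASS" else 1.0
    # Graded metric: scale its value into [0, 1] and invert when a higher
    # measured value means better security (e.g., % of encrypted traffic).
    lo, hi = test["scale"]
    fraction = (test["value"] - lo) / (hi - lo)
    return 1.0 - fraction if test.get("higher_is_safer", True) else fraction

print(likelihood({"kind": "binary", "result": "FAIL"}))                # 1.0
print(likelihood({"kind": "metric", "value": 80, "scale": (0, 100)}))  # 0.2
```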
3.4.2. Impact
The impact factor is evaluated by a security expert based on the context of the TOE. Recognizing that not all vulnerabilities have equivalent effects, we treat the impact as a multidimensional measure that encompasses several domains: safety, financial, operational, privacy, and legal impact. For example, a vulnerability in an aircraft’s control system impacts safety critically, whereas one in an in-flight entertainment system primarily affects financial and operational aspects. Following this multidimensional approach, already validated in frameworks such as HEAVENS [33], EVITA [9], or MoRA [34], ensures that impact values are aligned with the context of the system.
To reduce the subjectivity of the impact assessment, we use standardized scales that offer clear criteria for quantifying each aspect, aligned with best practices and standards. This ensures reproducibility of the evaluation results and a harmonized comparison between products evaluated by different persons. For instance, Table 3 shows the scale for safety impact based on the ISO 26262-3 standard [24]. Similarly, we have impact scales for the financial dimension (Table 4, following the classification of BSI-Standard 100-4 [26]). The operational dimension (Table 5) is based on Failure Mode and Effects Analysis (FMEA) [25], adapting the vehicular defect severity categorization to classify operational damages, as done in HEAVENS. Finally, the privacy and legislation dimension (Table 6) follows the scale dimensions proposed in HEAVENS, although it could also be aligned with other proposals such as the Privacy Impact Assessment Guideline provided by BSI [27].
The aggregate impact score is normalized on a scale of 0 to 10 (Table 7), facilitating an objective and scalable impact assessment. In this sense, the safety and financial impacts are given equal importance when estimating the overall impact level, since their repercussions can be extremely severe for stakeholders (e.g., vehicle occupants may not survive, or organizations may face bankruptcy). In contrast, the impact of the operational and privacy and legislation parameters is comparatively lower. To represent this during impact level estimation, the corresponding factors were reduced by one order of magnitude for the operational and privacy and legislation parameters relative to the safety and financial parameters.
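The following sketch illustrates one way to implement such a weighted aggregation. The weights only encode the stated ordering (operational and privacy/legislation one order of magnitude below safety and financial) and are otherwise assumptions; the authoritative mapping is the one in Table 7.

```python
# Sketch of the aggregate impact computation; the weights are assumptions that
# encode the ordering described in the text, not the exact values of Table 7.
WEIGHTS = {"safety": 10, "financial": 10, "operational": 1, "privacy_legal": 1}

def aggregate_impact(levels):
    """levels: per-dimension impact levels, each already on a 0-10 scale."""
    total = sum(WEIGHTS[d] * levels[d] for d in WEIGHTS)
    return round(10 * total / (sum(WEIGHTS.values()) * 10), 1)  # normalize to 0-10

print(aggregate_impact({"safety": 8, "financial": 5, "operational": 3, "privacy_legal": 2}))
# -> 6.1
```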
3.4.3. Risk Aggregation
While the risk identification phase focused on breaking down the high-level system analysis into low-level tests, this phase reverses the process, aggregating risks from the test level back to the overall system security properties, as shown in Figure 3.
(1) From tests to claims: The overall risk for each claim is calculated with the typical formula from NIST [32] shown in Equation (1). However, to accurately reflect the complexity of system risks, the calculation is refined depending on the type of claim. While test-based claims do not have an established impact value, vulnerability claims related to NVD entries already have a recognized impact value that the methodology can use for the risk calculation.

Test-based claims: For claims that are verified directly through testing, the risk is calculated by averaging the likelihood scores of all m related tests:

R_claim = I_claim · (1/m) · Σ_{j=1}^{m} L_j        (2)

Here, I_claim is determined by the evaluator using the standardized metrics presented in Section 3.4.2, and L_j represents the empirical likelihood derived from each test, as explained in Section 3.4.1.

Vulnerability-based claims: For claims related to n known vulnerabilities, each vulnerability v_i is assigned an impact score I_{v_i} based on established databases (e.g., the CVSS vector in the NVD). The risk is then calculated using the max function instead of the mean to emphasize the severity of the most critical vulnerability:

R_claim = max_{i=1,…,n} ( I_{v_i} · L̄_{v_i} )

where L̄_{v_i}, as before, is the likelihood averaged over all tests associated with that vulnerability.
(2) From claims to components: After calculating the risk for each claim (R_claim) based on Equation (1), we aggregate the risks at the component level. We compute the risk for each component with r associated claims as follows:

R_comp = max_{k=1,…,r} R_claim_k        (3)
Again, we use the max function instead of the mean to emphasize the severity of the most critical risk.
(3) From components to system security properties: The next step evaluates the risk for each STRIDE security property across the system. If a system property is considered in c components, the overall risk is calculated by weighting each component’s risk by its sensitivity s_k (calculated in the risk identification phase):

R_property = ( Σ_{k=1}^{c} s_k · R_comp_k ) / ( Σ_{k=1}^{c} s_k )        (4)

where R_comp_k is the value computed in Equation (3).
At the end of this process, the methodology obtains a risk value for each of the six STRIDE security properties in the system using Equation (4).
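As a worked illustration, the following Python sketch (with invented numbers) chains the aggregation steps of Equations (2), (3), and (4).

```python
# Sketch of the aggregation chain of Section 3.4.3: claim risks (impact times
# mean likelihood, max over vulnerabilities), then the max over a component's
# claims, then the sensitivity-weighted property risk. All inputs are invented.
def claim_risk(impact, likelihoods):            # test-based claim, Equation (2)
    return impact * sum(likelihoods) / len(likelihoods)

def vuln_claim_risk(vulns):                     # vulnerability-based claim
    # vulns: list of (impact, [likelihoods]) pairs, one per known vulnerability
    return max(i * sum(ls) / len(ls) for i, ls in vulns)

def component_risk(claim_risks):                # Equation (3)
    return max(claim_risks)

def property_risk(components):                  # Equation (4)
    # components: list of (sensitivity, component_risk) pairs
    total_s = sum(s for s, _ in components)
    return sum(s * r for s, r in components) / total_s

print(round(property_risk([
    (8, component_risk([claim_risk(6, [0.2, 0.5])])),   # sensitive component
    (3, component_risk([vuln_claim_risk([(9, [1.0])])])),
]), 2))  # -> 3.98
```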
3.5. Risk Evaluation
The risk evaluation phase determines whether the estimated risks for each STRIDE security property are acceptable given the specific context in which the system will operate. To this end, we rely on the TP defined in the establishing the context phase.
Each security property risk obtained from Equation (4) is compared against the acceptable risk thresholds defined by the TP, which specify the maximum risk allowed for certification within a specific context.
Figure 4 illustrates the evaluation process for a TP that sets different acceptable risk ranges. If the evaluated confidentiality risk is 3, this falls within the acceptable range, and a security level of B will be assigned to the confidentiality property. However, if the evaluated confidentiality risk is 8, it exceeds the tolerance threshold, making the risk unacceptable in this context, and, as a result, the system would not be certified. This evaluation procedure is repeated for each security property.
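This check is straightforward to automate. A minimal sketch follows, assuming the band edges of the example above; the edges for levels A through C are illustrative.

```python
# Sketch of the risk evaluation of Section 3.5 against a Tolerance Profile;
# only the tolerance ceiling of 7 comes from the example, the rest is assumed.
BANDS = [(2, "A"), (4, "B"), (6, "C"), (7, "D")]  # (upper bound, level)

def evaluate(risk, bands=BANDS):
    for upper, level in bands:
        if risk <= upper:
            return level
    return None  # risk above tolerance: the TOE cannot be certified

print(evaluate(3))  # 'B', as in the example above
print(evaluate(8))  # None -> not certifiable in this context
```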
3.6. Labeling
The labeling system introduced in the methodology serves as a visual tool to communicate evaluation results to both technical and non-technical stakeholders. The radar chart was selected as the primary representation due to its ability to intuitively summarize multiple security dimensions. While there are some proposals for security labels, they tend to be overly simplistic, merely including the certification logo, such as the “Cybersecurity Made in Europe” label from ECSO [35]. Others, such as the Carnegie Mellon label [36], present too much information, potentially overwhelming non-expert users and becoming outdated quickly.
Our proposal strikes a balance by providing a visual representation of the information without overwhelming details while offering a link to more detailed and updated information via a dynamic QR code.
Each axis displays the score for a specific property, with the chart’s enclosed area reflecting the overall security level: a larger area indicates higher security across the evaluated dimensions. This intuitive representation, inspired by the unidimensional EU Energy Label [37], links risk levels to a visual format, making complex evaluations accessible to non-experts. Alternative representations, such as linear graphs or tabular formats, were considered but discarded due to their lack of clarity and immediate comprehensibility for diverse stakeholders. By enhancing transparency, the label supports informed decision-making, empowering consumers and stakeholders to compare products and better understand their security features, fostering a more secure marketplace.
At the center of the radar chart, a QR code is embedded, as required by the EUCC. This code currently links to the project’s webpage, but its design and implementation are intended to support additional functionalities in future iterations. Specifically, the QR code shall encode a unique identity for the device, allowing users to access detailed properties through various views. For example, scanning the QR code could provide information about the device’s security certifications, risk levels, testing results, or compliance with specific standards. This feature is envisioned to enhance traceability and provide stakeholders with immediate, context-specific insights about the device.
While the current implementation is preliminary, future developments will focus on integrating the QR code with a secure database that enables real-time lookups of device properties. This will allow the label to evolve beyond static representations, supporting dynamic updates and more interactive security management.
3.7. Treatment
Risk treatment is defined by ISO 31000 as the process of modifying risks, typically through the implementation of controls. In general, the results of a security evaluation are used only to validate or certify the security of the system, overlooking an essential benefit: the identification of security vulnerabilities within the system and the opportunity to implement mitigation strategies during its operational phase. Our methodology incorporates risk treatment by embedding actionable security recommendations within the extended MUD profile defined in [23]. This approach leverages the results of the security evaluation to generate a behavioral profile that includes both manufacturer-provided security guidance and recommendations obtained from the security assessment.
3.8. Monitoring and Communication
The cybersecurity certification process should not end with the initial security assessment performed before market deployment. Recognized entities, such as the National Institute of Standards and Technology (NIST) [38] or the European Commission under the CSA and CRA, emphasize that cybersecurity certification should be an ongoing effort, adapting to evolving threats. Thus, mechanisms for maintaining updated security levels should be integral to any cybersecurity certification framework.
In this sense, the methodology integrates the MUD profile created in the previous phase with monitoring tools, enabling operators to detect and respond to any deviations from the expected behaviors outlined by the certification authority and the manufacturer in the profile. Additionally, the MUD profile could be dynamic, so the manufacturer can describe additional policies in case of a new threat before a patch or update is released.
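A minimal sketch of the kind of check this enables is shown below; the allowed destinations would be extracted from the (extended) MUD profile, and the concrete values here are invented.

```python
# Sketch of MUD-based monitoring: flag observed flows that are not covered by
# the destinations allowed in the (extended) MUD profile. Values are invented.
ALLOWED = {("cloud.example.com", 443), ("update.example.com", 443)}  # from MUD ACEs

def audit(flows):
    """flows: iterable of (destination, port) tuples observed at runtime."""
    return [f for f in flows if f not in ALLOWED]

violations = audit([("cloud.example.com", 443), ("203.0.113.7", 23)])
if violations:
    print("MUD policy violation, request re-evaluation or mitigation:", violations)
```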
4. Evaluation over Use Cases
This section demonstrates the application of the proposed security evaluation methodology to two specific use cases: an ICT Gateway (ICT GW) within a smart grid environment and an AI Investments platform. Considering the maturity levels defined in Table 8, the methodology has been applied in both scenarios at the "managed" level: the process is partially automated through specific tools, and basic metrics have been selected to provide a proof of concept.
Figure 5 provides an overview of the tools and concepts selected to support the instantiation of the methodology. In particular, the context establishment phase makes use of our predefined set of claims and TPs. The risk identification phase is supported by the ResilBlockly tool, which automates the risk analysis and the generation of an initial MUD file. As testing tools, we use GraphWalker for MBT and a fuzzing tool developed within the BIECO project. Risks are aggregated and evaluated using the SecurityScorer tool, which also produces a security label that is visualized in the BIECO GUI. The instantiation considers additional tools for the generation of the extended MUD (MUD Updater tool) and for the detection of misbehaviors based on the MUD (auditing tool). The following dedicated subsections detail how the tools are used in each use case, supporting the automation of the methodology.
4.1. ICT Gateway
This scenario, developed in the context of the NET2DG project [39], illustrates how the methodology can guide the security evaluation of complex systems composed of different subsystems.
The ICT GW, depicted in Figure 6, is a critical component in a smart grid that serves as a bridge between the distribution system operator (DSO) and a variety of data sources and actuation subsystems, including smart meters, remote terminal units, and inverters (INVs). The ICT GW consolidates and standardizes the data collected from these heterogeneous subsystems, storing them in a central database for use by domain applications. These applications, operating within the DSO, aim to improve grid efficiency, maintain voltage quality, and support outage diagnosis. The ICT GW’s architecture is organized into three layers, the adapters layer, the domain logic layer, and the service layer, each contributing distinct functions to manage data flow, ensure operational efficiency, and enable seamless integration within the grid.
While the ICT GW enhances data interoperability and grid management, it also introduces potential security risks, particularly due to its integration with multiple external subsystems. Given that the ICT GW is commercialized, a robust security evaluation offers an additional assurance layer to DSOs. The ICT GW, including the three architectural layers, is considered the TOE.
4.1.1. Context Establishment
This phase involves establishing the security context by defining relevant security and privacy claims tailored to each layer of the ICT GW architecture. These claims are selected from a predefined set that aligns with industry standards and best practices, establishing the basis for the security evaluation. A total of 11 claims were identified as critical to the system’s secure operation. Additional details about them are given in [29].
C10: Changes in user authentication values are executed securely.
C11: Sensitive parameters for secure association establishment are protected for integrity during communication.
C17: Sensitive data in transit are encrypted with appropriate methods.
C22: The system is resistant to Denial of Service (DoS) attacks.
C23: Input data are validated to prevent injection vulnerabilities.
C45: All protocols and libraries used are up-to-date.
C46: Authentication protocols utilize recommended algorithms for security.
C47: Authenticated sessions expire and require re-authentication for enhanced security.
C52: The system allows data subjects to delete their personal data permanently.
C58: The system enforces a limit on consecutive failed login attempts.
C72: Logs are protected against unauthorized removal.
In addition to these claims, we defined TPs specific to the ICT GW use case. These profiles specify the maximum acceptable risk levels across each security property (such as confidentiality, integrity, and availability) according to the ICT GW’s role in the smart grid. The profiles were structured in YAML [40] format (first block of Listing 1, lines 2–8) to facilitate interpretation and integration with automated tools, ensuring that the methodology is adaptable and scalable for different implementations.
Listing 1. YAML-based file syntax for system description.
4.1.2. Risk Identification
The risk identification phase focuses on characterizing the system, its components, and the relationships between them to identify vulnerabilities that may pose security risks. For the ICT GW, this phase involves constructing a detailed model of the system, leveraging multiple data sources and tools to provide a structured overview of components and associated security claims. In this instantiation (Figure 5), we employ the ResilBlockly tool for a detailed analysis of vulnerabilities and attack paths, as well as for creating an extended behavioral profile. ResilBlockly [41] is a tool developed within the BIECO project designed to model highly complex and interconnected systems and infrastructures while significantly reducing the cognitive load typically associated with such activities. Additionally, it includes features for modeling and identifying threats, enabling security risk assessments of identified weaknesses and vulnerabilities.
We describe the ICT GW in a YAML file following the format described in Listing 1 (lines 9–22), which follows the IETF notation, where ? denotes optional fields and * denotes repeatable fields. In this file, components are listed with their respective sensitivity scores (ranging from 0 to 10), indicating the criticality of each component. The claims block links these components to both generic and vulnerability-specific claims, together with their impact levels across the different dimensions (safety, operational, financial, and privacy/legal compliance). Vulnerability information, including known vulnerabilities from public databases (e.g., CVE), is also included in this system description, allowing for integration with risk assessment tools like ResilBlockly for further analysis.
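For illustration, a simplified description of this kind could look as follows; the field names are indicative only, not the exact syntax of Listing 1, and the values are invented (except CVE-2021-44228, a real CVE entry used as an example).

```python
import yaml  # requires PyYAML

# Illustrative system description consistent with the structure described in
# the text; field names are our guesses, not the exact Listing 1 syntax.
DESCRIPTION = """
tolerance-profile:
  confidentiality: 7
  integrity: 5
components:
  - name: adapters-layer
    sensitivity: 8
    claims:
      - id: C17
        impact: {safety: 0, operational: 5, financial: 4, privacy-legal: 6}
        tests: [test_tls_encryption]
      - id: C45
        vulnerabilities: [CVE-2021-44228]
"""
system = yaml.safe_load(DESCRIPTION)
print(system["components"][0]["sensitivity"])  # -> 8
```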
Furthermore, we use the ResilBlockly tool to create a model of the TOE (available in [23]). From this model, ResilBlockly can execute a vulnerability and weakness analysis, identifying weaknesses and vulnerabilities and associating them with components. In the analysis performed over the GUI-ICTGW interface, ResilBlockly was able to retrieve 152 CWEs and 11 CVEs from the catalogs. By carefully analyzing the CWEs and using an approach that involves selecting CWEs from the first level of a hierarchical weakness tree, the number of CWEs associated with the interface was reduced to 33. Finally, ResilBlockly assists in generating a preliminary extended MUD profile for the ICT GW, integrating the risk analysis outcomes and information from the system model. The complete MUD file and further details about its generation process are given in [23].
4.1.3. Security Testing
The methodology instantiation incorporates two security testing techniques, MBT and fuzzing, using tools developed or refined within the BIECO project, as shown in Figure 5: GraphWalker and the Fuzzing Tool. Each technique is applied to different layers of the ICT GW to verify compliance with the selected set of security claims. Testing was conducted over three iterations, each targeting one of the ICT GW’s architectural layers (adapters, logic, and service). Table 9 summarizes the implemented tests.
Fuzzing testing is a technique that aims to identify security vulnerabilities in the TOE by using unintended or incorrect inputs. This approach has proven to be effective in identifying weaknesses that may be overlooked by other testing methods. On the one hand, it can be used to test input data (data fuzzing testing) by feeding the TOE with random data to find possible errors or vulnerabilities. On the other hand, it can be used as behavioral fuzzing testing in which valid/invalid message sequences are used for the same purpose.
In this instantiation, the Fuzzing Tool (https://www.gradiant.org/blog/deteccion-de-vulnerabilidades-fuzzing-bieco/ accessed on 3 February 2025) developed within BIECO was used to generate multiple HTTP requests to the ICT GW’s endpoints, testing combinations of parameters not specified in the Swagger file. This testing approach identified potential misconfigurations or response anomalies that could pose security risks.
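Conceptually, this kind of parameter fuzzing can be approximated in a few lines of Python. The sketch below is not the BIECO Fuzzing Tool; the endpoint URL and parameter names are made up, and it simply sends unexpected value combinations and flags server errors.

```python
import itertools
import requests  # third-party HTTP client

# Naive parameter-fuzzing sketch: try unexpected value combinations against an
# endpoint and report responses that hint at robustness problems.
FUZZ_VALUES = ["", "0", "-1", "A" * 4096, "' OR '1'='1"]

def fuzz(url, params):
    for combo in itertools.product(FUZZ_VALUES, repeat=len(params)):
        payload = dict(zip(params, combo))
        r = requests.get(url, params=payload, timeout=5)
        if r.status_code >= 500:  # server error: potential robustness issue
            print("anomaly:", r.status_code, payload)

fuzz("http://ictgw.example/api/v1/measurements", ["meterId", "from"])  # hypothetical
```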
MBT involves creating a high-level model representing the TOE, allowing for automated test generation based on it. This is a major advantage, as it eliminates the need to specify each test step by step. The research presented in [42] represents an example of how MBT can be used to automate the generation of tests. Other examples also show how to combine and automate fuzzing testing with MBT to test inputs and behaviors [43].
We used the open-source tool GraphWalker (https://graphwalker.github.io/ accessed on 3 February 2025) to define the ICT GW testing model as a directed graph. Figure 7 illustrates the GraphWalker model designed for the adapter layer of the ICT GW, which was used to generate and execute tests. By default, GraphWalker generates a unique sequence of steps that covers the whole tree, and the user still needs to connect the high-level steps with specific actions in the real system. To automate this process and adapt it to the methodology’s needs, we enhanced GraphWalker with a Test Adapter and Suite Generator (TASG) extension. This extension is in charge of automating the generation of different tests based on finish conditions embedded in the model and the creation of an adapter to connect each step with the real system. This enables the automation of approximately 46.18% of the test generation process. Although some manual implementation of the adapter is necessary, subsequent modifications, additions, and repetitions of the tests can be made without significant changes to the adapter. This improves the efficiency of the re-evaluation process after a possible security change is detected. This capability is particularly valuable in complex systems with numerous components and dependencies between them. Finally, the user can run all tests, in this case using Maven, obtaining two results files:
TEST-TestSuite.xml: Records the results (pass/fail) for each test, with detailed failure reasons and assertions.
TestSuite-output.json: Collects non-binary metrics for each test in JSON format, allowing for additional information capture such as encryption strength or session expiration. The JSON schema, shown in Listing 2, specifies metrics, values, and scales, which are mapped to a 0–1 range for likelihood calculation.
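For illustration, the scaling of such non-binary metrics to the likelihood range can be sketched as follows; the field names are indicative, as the authoritative schema is the one given in Listing 2.

```python
import json

# Sketch of consuming non-binary metrics from TestSuite-output.json and
# scaling them into [0, 1]; field names (metric, value, scale) are illustrative.
RAW = '{"metrics": [{"metric": "session-expiration-min", "value": 1, "scale": [0, 60]}]}'

def scaled(entry):
    lo, hi = entry["scale"]
    return (entry["value"] - lo) / (hi - lo)  # in [0, 1]; the evaluator decides
                                              # whether this maps to likelihood
                                              # directly or inverted (Section 3.4.1)

for entry in json.loads(RAW)["metrics"]:
    print(entry["metric"], "->", round(scaled(entry), 3))  # -> 0.017
```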
The resulting test data from GraphWalker and the fuzzing tool are used in the next phases for risk estimation via the SecurityScorer tool and risk treatment via the MUD Updater tool, ensuring a structured and automated approach to analyzing test outcomes.
Listing 2. JSON file for non-binary values.
4.1.4. Risk Estimation and Evaluation
This phase calculates the security level of the system based on the data collected from the testing tools, which can be in formats such as .json, .xml, or .csv, and the YAML system description created during the risk identification phase. To automate the process, we developed a scorer tool, the SecurityScorer (Figure 5), which performs the following sequence of actions:
Reading and parsing the YAML system description file, including the TP;
Reading and parsing the outputs of the security testing phase tools, including non-binary results;
Using both sources of information to evaluate the risk value for each component based on the methodology defined in
Section 3.4;
Combining the risks of the components to calculate the overall risk value of the whole system;
Using the tolerance profiles to certify the system properties (A, B, C, D, or not certified);
Generating the cybersecurity label according to the evaluation results.
For the ICT GW, we obtained the results shown in Table 9. The last column shows the test result (pass/fail), while the sixth column shows the non-binary information collected from that test, if applicable. We discovered that some of the claims were not fulfilled by the ICT GW, such as the establishment of a limit on authentication attempts to prevent brute-force attacks or the protection against log removal. Among the non-binary information, we discovered the presence of 18 outdated libraries, a maximum of 4822 simultaneous connections supported in the service layer, and a session expiration time of 1 min. Based on the evaluation, the manufacturer was able to update the implementation to add additional security functionalities or suggest specific configurations to be enforced by the ecosystem, such as using a firewall, implementing limits on authentication attempts, or updating the identified libraries.
4.1.5. Labeling
The security label has been designed as a hexagonal radar diagram to support the visualization of the security level in a way that can be understood by non-expert users. In this case, the SecurityScorer calculates the security level for each of the six security properties based on the information produced in the previous phase and the TP defined in the YAML file. As shown in Figure 5, the label is visualized in the GUI developed within the BIECO project. In particular, Figure 8 shows the label obtained for the ICT GW after running the SecurityScorer, which integrates a QR code to accommodate future updates of the label, with the objective of providing additional information about the security evaluation process, enhancing transparency and reusing evidence.
The label indicates that the ICT GW achieved a high level of security in terms of confidentiality, integrity, availability, and authorization. However, it highlights areas needing improvement, specifically in non-repudiation and authentication security properties. For a non-expert user, the label shows a hexagon with a large green area, suggesting that the product can be considered secure.
4.1.6. Treatment
Based on the results obtained from the testing tools, we improve the initial version of the extended MUD, which was created by the ResilBlockly tool during the risk identification phase. To automate this process, we created the MUD Updater tool (Figure 5, treatment phase). It collects empirical results from the security testing phase to refine the initial values of the extended MUD, adding new configurations to reduce the attack surface and mitigations for the vulnerabilities encountered. Figure 9 shows an overview of the relationship between the MUD and the methodology, in which the original MUD file created by the manufacturer is extended with the assessment report of ResilBlockly and refined later by the MUD Updater.
The MUD Updater uses the metrics derived from the tests’ execution, which are stored in the TestSuite-output.json file. In particular, the MUD Updater analyzes the field matchWith shown in Listing 2 (line 12) to locate the name of the corresponding ACL within the extended MUD that needs to be updated or extended. Then, based on the metric’s name and value, the tool defines the specific policy to be integrated into the MUD.
The outputs of the ICT GW testing, in addition to its MUD file, were then used as inputs for the MUD Updater to derive new security configurations. In particular, the maximum number of simultaneous connections (num-connections) was established at 4822 (C22), and the cryptographic algorithm was set to RSA-OAEP-256 (C46). The MUD was extended by 160.69% with respect to the original MUD file, and 40% of the non-binary test results were integrated into it. The resulting extended MUD file can be found in [44].
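Conceptually, the MUD Updater’s matching step can be sketched as follows; the extended-MUD policy fields shown here are illustrative (see [23] for the actual model), while the num-connections value comes from the test results above.

```python
# Sketch of the MUD Updater step: use a metric's matchWith field to find the
# target ACL in the extended MUD and attach the measured value as a policy.
# The "policies" field is an illustrative stand-in for the extended-MUD model.
def update_mud(mud, metric):
    for acl in mud["ietf-access-control-list:acls"]["acl"]:
        if acl["name"] == metric["matchWith"]:
            acl.setdefault("policies", {})[metric["name"]] = metric["value"]
    return mud

mud = {"ietf-access-control-list:acls": {"acl": [{"name": "service-layer-acl"}]}}
print(update_mud(mud, {"name": "num-connections", "value": 4822,
                       "matchWith": "service-layer-acl"}))
```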
4.1.7. Monitoring and Communication
As explained previously, the communication and auditing phase deals with the security changes that may occur during the lifecycle of the system. Following the steps of Figure 5, the methodology is supported in this phase by the auditing framework and the monitoring ontology developed in [45], which is intended to detect security issues based on a set of blueprints derived from the design phase and manually specified by the user. As shown in Figure 9, one of the blueprints used in the auditing framework is the updated MUD generated from the treatment phase, which contains a specific configuration (e.g., limited connections) that must be monitored to keep the system in a secure state according to the test results. If there is a violation of the MUD policies during the monitoring process, the auditing framework requests a re-evaluation process or a mitigation action, which could imply a modification of the MUD file. Additional details on this can be found in [45,46] and are therefore not included in this article.
4.2. AI Investments
AI Investments (AII) is a platform that aims to improve financial portfolio management and reduce risk exposure by integrating advanced artificial intelligence and deep learning techniques. Its primary goal is to improve the performance metrics of hedge funds, trading companies, and other investment institutions. By adopting a comprehensive approach, the AI Investments system integrates and standardizes disparate data sets, leveraging modern advances in time series analysis and optimization algorithms to construct investment portfolios that meet optimal performance benchmarks. Once this process is complete, the platform executes trading actions through the designated stock brokerage, effectively combining technological innovation with investment strategy.
Incorporating state-of-the-art AI and deep learning mechanisms into the AI Investments platform, including the use of deep neural networks for signal creation in transactions, convolutional neural networks for pattern detection, and Long Short-Term Memory (LSTM) networks for the analysis of time-based data sequences, emphasizes the need for robust security measures across these interconnected systems. The proposed methodology aims to protect the core processes powered by AI from any potential threats that could emerge from interactions with external entities. As shown in Figure 10, the risk of cybersecurity incidents is increased by the connection between the main component of the platform, the master component, and third-party brokers. These brokers operate outside of the protected environment of the AI platform, which means that there is a possibility that they could unknowingly become pathways for cyberattacks. In this context, the master component has been selected as the TOE. Any breach in security not only poses a risk of financial loss but could also undermine confidence in the innovative trading solutions offered by AI Investments. To address this risk, a thorough security validation plan has been developed to carefully examine and remove these threats so that the platform’s defenses are strengthened against potential breaches originating from these external sources.
Compliance with strict regulatory standards related to data protection and system security is also imperative in the fintech industry. The adopted security practices guarantee adherence to these legal requirements, safeguarding the firm from possible legal and fiscal penalties due to non-compliance. The management of sensitive financial and strategic investment data, crucial to the user base of the AI platform, highlights the need for a highly secure system. The security plan in place is specifically designed to protect this vital information, ensuring that its confidentiality, integrity, and availability remain intact. With cyber adversaries constantly innovating in their methods to exploit vulnerabilities, the security validation framework for the principal component is structured to be flexible, adapting to novel threats as they arise and thus continuously securing the platform’s defenses. By rolling out a comprehensive security evaluation process, AI Investments affirms its commitment to safeguarding investor assets and information, which is vital for the platform’s endorsement and eventual success.
4.2.1. Context Establishment
We selected a set of basic security claims [29] that should be fulfilled to consider the master component secure. Table 10 (first and second columns) shows the list of selected claims and the associated security properties. In particular, we selected six claims:
C6: The changes in the authentication values for user authentication are successful.
C15: Sensitive security parameters exchanged during the communication for the establishment of a secure association should be integrity protected.
C18: Sensitive security parameters should be encrypted in transit, with appropriate encryption.
C22: Resistance to DoS attacks.
C23: Data input validation.
C47: Authenticated sessions should expire, with a new reauthentication required.
4.2.2. Risk Identification
Following the process, the master component was modeled in ResilBlockly.
Figure 11 shows a portion of the master model, simplified to contain only one RUMI and one service for the sake of readability. The master was modeled as an entity (CS) called by the master REST API. One of the master REST API functionalities is to generate the training report. To execute it, the master REST API calls the internal master processor entity; this call is represented by the green box labeled Master_API_To_Master_Get_Training_Report in Figure 11. In turn, the master processor contains a service, Generate_Training_Report (orange box), and a RUMI, Master_To_Master_API_Training_Report, which responds to the REST API with the Training_Report_Result message (blue box). The other RUMIs and services in the master processor are implemented similarly, as are the other internal subsystems of the AIT architecture, such as the monitoring API, worker API, and AIT DB API.
In this phase, we also generated the YAML file with the system description to identify and map the components, claims, security properties, and tests that will be executed. As before, the impact of each claim (column 5 of Table 10) and the master sensitivity (set to 8) were determined based on the expertise of the use case owner.
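For illustration, the following minimal Python sketch shows how such a system description could be assembled and serialized to YAML. The schema (field names such as component, sensitivity, claims, impact, and tests) is hypothetical, since the exact format used by the tooling is not reproduced here; only the sensitivity value of 8 comes from the text.

```python
# Minimal sketch of generating the system-description YAML used as input for
# the later phases. The field names and impact values are hypothetical; only
# the sensitivity of 8 is taken from the text (see Table 10 for real values).
import yaml  # PyYAML

system_description = {
    "component": "master",
    "sensitivity": 8,  # set by the use case owner
    "claims": [
        {"id": "C6", "property": "authentication", "impact": 4, "tests": ["T-C6"]},
        {"id": "C22", "property": "availability", "impact": 5, "tests": ["T-C22"]},
        # ... C15, C18, C23, and C47 follow the same pattern
    ],
}

with open("master-system-description.yaml", "w", encoding="utf-8") as f:
    yaml.safe_dump(system_description, f, sort_keys=False)
```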
4.2.3. Security Testing
In this phase, we implemented the tests using GraphWalker and the Fuzzing Tool. As before, the Swagger file was used as input for the fuzzing tool, and the master model necessary to generate the selected tests was created in GraphWalker, as shown in Figure 12. The list of implemented tests can be found in the fourth column of Table 10.
The adapter interface generated by the GraphWalker extension was implemented to link the automatically generated JUnit test suite with the real master component. With this approach, we automated 44.96% of the test generation process.
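The adapter's role is to bind each model element, edges as stimuli and vertices as expected states, to interactions with the running master component. The generated adapter is actually Java/JUnit code; for consistency with the other sketches in this section, the binding is illustrated below in Python for the training-report path only, with a hypothetical endpoint path and function names.

```python
# Illustrative rendering of the adapter's responsibility: each model edge
# triggers a request, each vertex checks the resulting state. The endpoint
# URL and names below are assumptions, not the actual generated adapter.
import requests

BASE_URL = "http://localhost:8080"  # assumed address of the master REST API

def e_get_training_report(ctx: dict) -> None:
    """Edge: trigger the training-report request (Master_API_To_Master_Get_Training_Report)."""
    ctx["response"] = requests.get(f"{BASE_URL}/training/report", timeout=10)

def v_training_report_result(ctx: dict) -> None:
    """Vertex: check the Training_Report_Result response."""
    resp = ctx["response"]
    assert resp.status_code == 200
    assert "report" in resp.json()
```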
4.2.4. Risk Estimation, Evaluation, and Labeling
The results obtained from the tests are presented in Table 10. The last column shows the test result (fail/pass), while the penultimate column shows the non-binary information collected from each test, where applicable. We found that some of the claims were not fulfilled by the TOE, especially with respect to the usage of certificates. Moreover, during the non-binary tests, two unnecessarily exposed interfaces were identified, and the session expiration time was determined to be 60 min. Regarding DoS resistance, the maximum number of simultaneous connections the test machine could generate was 50,000, so the real maximum supported by the TOE was not determined. Based on this evaluation, the manufacturer implemented additional security functionality for certificate validation, improving the security of the AI Investments platform.
As before, the test report, the non-binary values, and the YAML file containing the system description were used as input to the SecurityScorer tool, which calculated the security level for each security property and generated the security label shown in Figure 13.
The label indicates that the AI Investments platform achieved a high level of security in terms of confidentiality, authorization, and availability, while highlighting areas needing improvement, specifically the authentication mechanism. Since the non-repudiation and integrity security properties were not tested in this use case, the label does not take them into account (there are no vertices for them). For a non-expert user, the label displays a medium-sized quadrilateral, suggesting that the product has a medium level of security.
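As a rough illustration of how such a label can be rendered, the following matplotlib sketch draws a radar chart from per-property levels, omitting untested properties so that they receive no vertex. The numeric values and the 0 to 5 scale are hypothetical; SecurityScorer's actual rendering logic is not shown here.

```python
# Sketch of rendering a radar-chart security label from per-property levels.
# The levels below are hypothetical; untested properties (non-repudiation,
# integrity) are simply left out, so they get no vertex, as in Figure 13.
import math
import matplotlib.pyplot as plt

levels = {
    "confidentiality": 4,
    "authorization": 4,
    "availability": 4,
    "authentication": 2,  # the area flagged for improvement
}

labels = list(levels)
values = list(levels.values())
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]

ax = plt.subplot(polar=True)
ax.plot(angles + angles[:1], values + values[:1], linewidth=2)  # close the polygon
ax.fill(angles + angles[:1], values + values[:1], alpha=0.25)
ax.set_xticks(angles)
ax.set_xticklabels(labels)
ax.set_ylim(0, 5)  # assumed 0-5 security-level scale
plt.savefig("security-label.png")
```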
Comparing this result with the previous use case, the ICT GW can be considered more secure. Although the simplicity of the label helps users decide between two similar products based on their security level, one of its limitations is that it can convey a false sense of security if the underlying evaluation process is not rigorous. Therefore, the certification process associated with the methodology should rely on mechanisms and certification authorities that regulate the minimum requirements for a product to be certified.
4.2.5. Treatment
After the execution of the tests generated by GraphWalker, the TestSuite-output.json file was produced to describe the non-binary values and their match with the extended MUD policies. Across the set of tests, three were non-binary. Since the exact maximum number of simultaneous connections was not determined, we established 50,000 as a secure limit. The JSON file was then used as input for the MUD updater to modify the field num-connections to 50,000, which had previously been estimated to be 3. The MUD was extended by 268.94% with respect to the original MUD file, and 33.33% of the non-binary test results were integrated into it.
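A minimal sketch of this update step is given below, assuming a simple JSON layout for both TestSuite-output.json and the extended MUD profile. The "extensions-data" key and the test-entry structure are hypothetical; only the file name, the num-connections field, the claim C22, and the values 3 and 50,000 come from the text.

```python
# Sketch of the MUD-update step under assumed JSON layouts for both files.
import json

with open("TestSuite-output.json", encoding="utf-8") as f:
    results = json.load(f)

# Assumed layout: a list of test entries, some carrying a non-binary value.
dos = next((t for t in results.get("tests", []) if t.get("claim") == "C22"), None)
limit = dos["non_binary_value"] if dos and "non_binary_value" in dos else 50_000

with open("mud-profile.json", encoding="utf-8") as f:
    mud = json.load(f)

# "ietf-mud:mud" is the RFC 8520 container; "extensions-data" is assumed.
mud["ietf-mud:mud"]["extensions-data"] = {"num-connections": limit}  # previously 3

with open("mud-profile.json", "w", encoding="utf-8") as f:
    json.dump(mud, f, indent=2)
```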
5. Discussion
The proposed security evaluation methodology for ICT systems offers a robust framework to address the limitations of traditional approaches. By integrating automated testing and dynamic labeling, this methodology meets the demand for adaptable and cost-effective cybersecurity certification processes, particularly suited for systems operating in highly dynamic environments. The automation capabilities offered by integrated tools like GraphWalker and the Fuzzing Tool significantly streamline the traditionally time-consuming certification processes. This approach not only ensures that testing is comprehensive but also supports rapid re-evaluation, facilitating ongoing validation. Automating the processing of test results, risk scoring, and MUD profile generation further simplifies this process, allowing for timely evaluations of systems.
By combining risk assessment with test-based evaluations, this approach allows for a more precise and objective assessment of security risks, capturing both binary and non-binary test results. This empirical approach, combined with best-practice approaches such as HEAVENS or the ISO standards for metrics calculation, enhances the accuracy of risk estimations and aligns security assessments with the operational needs of modern ICT environments. Moreover, the introduction of TPs tailored to the security context provides a customized approach to certification. These profiles allow the methodology to adapt to the unique security requirements of different contexts and systems. However, the definition of such TPs requires a joint effort between regulatory entities and industry, or an adaptation of the CC collaborative profiles.
Applying the methodology to two use cases has shown its effectiveness in assessing complex and interconnected systems in different contexts. Furthermore, the methodology has been designed to enable the composition of previous evaluations, since the risk estimation phase combines the individual risks of each system component. Although previous results can be reused, the combination of individual risks is modulated by the sensitivity factor, so a detailed analysis of the role of the component within the system and its dependencies with other components is necessary for the final calculation of the security level. Additionally, integration of the component into the system can lead to additional claims and integration tests that allow for verification of the joint security of the system. In this case, the results of the new tests should be combined with the previously certified security of the component.
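Purely as an illustration of this composition (the methodology's actual aggregation function is not reproduced here), the system-level risk can be read as a function of sensitivity-weighted component risks:

$$R_{\mathrm{sys}} = f\bigl(s_1 r_1,\, \ldots,\, s_n r_n\bigr), \qquad \text{e.g.} \qquad R_{\mathrm{sys}} = \max_{1 \le i \le n} s_i\, r_i,$$

where $r_i$ denotes the previously certified risk of component $i$ and $s_i$ its sensitivity factor within the composed system; the max form here is only one plausible choice of $f$.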
The methodology introduced a user-friendly visual security label that conveys the system’s security posture to both technical and non-technical stakeholders. By representing key security properties in a radar chart format, the label provides a clear summary of complex evaluations, enabling consumers to make informed decisions and easily compare the security levels of different systems. This transparency is a crucial step toward fostering trust and understanding in cybersecurity certification. However, there is still some pending discussion regarding which information should be public or private and how to manage access to the sensitive information inside the security label.
The methodology also demonstrates significant alignment with the CRA and CSA regulatory frameworks. The CRA underscores the importance of lifecycle security, requiring manufacturers to continuously address vulnerabilities and maintain compliance throughout the product lifecycle. The proposed methodology integrates continuous monitoring and automated assessment tools, enabling rapid adaptation to evolving threats and supporting lifecycle recertifications. By leveraging standards such as the MUD and extending them to complex systems, the methodology not only reduces the attack surface but also ensures adaptability to dynamic cybersecurity landscapes. These features align closely with the CRA’s emphasis on lifecycle security and vulnerability management. The use of standards within the methodology fosters consistency and comparability in certifications as required by the CSA, while the dynamic security labeling system featuring visual indicators and QR codes enhances transparency and accessibility for all stakeholders. By introducing TP and adapting security evaluations to specific operational contexts, the methodology provides tailored yet harmonized certification outcomes, supporting the CSA’s vision for a unified certification framework.
6. Conclusions and Future Work
The proposed security evaluation methodology provides several significant advantages. First, it introduces automation and dynamic features, which simplify the traditionally complex and resource-intensive certification processes. The combination of automated testing, dynamic labeling, and clear visual communication supports both technical and non-technical stakeholders, enabling informed decision-making and fostering trust. Second, the methodology’s adaptability to different operational contexts through the use of tolerance profiles makes it suitable for a wide variety of ICT systems, from simple devices to complex interconnected platforms. Lastly, the ability to integrate and reuse previous evaluations while incorporating sensitivity factors ensures efficiency without compromising the accuracy of security assessments.
To further validate and refine this methodology, future work will involve conducting a series of case studies to evaluate and rate each component of the framework. This approach will guide targeted enhancements to better address practical challenges encountered during implementation. Additionally, we will investigate how to extend the methodology with semi-automated procedures to address issues encountered during the evaluation process before treatment sharing. In relation to this, we will consider technologies and mechanisms such as Hyperledger and threat MUD to support secure information sharing between relevant stakeholders and to launch the required processes to update the information contained in the QR code of the label.
The proposed methodology, while focused on technical risk management, can also be aligned with organizational risk governance, as outlined in the ISO 31000 standard, enabling stakeholders to make informed decisions that balance technical security needs with business objectives. For example, the context establishment phase can include analyzing how ICT system risks align with enterprise risk management frameworks, regulatory compliance, and strategic goals. Similarly, the treatment phase could support the prioritization of mitigation efforts based on their impact on both technical security and business outcomes. Organizational risk management could be explored further as part of the methodology's future work.
Although coordination between stakeholders and regulatory bodies is key to defining a comprehensive cybersecurity certification framework, the proposed methodology is intended to serve as a basis for a more standardized and uniform approach. With ongoing advancements, this methodology is expected to evolve in alignment with EU regulations and ENISA guidelines, contributing to a cohesive and adaptable certification landscape in the coming years.
Author Contributions
Conceptualization, S.N.M.; methodology, S.N.M., J.F.M.-G., I.B., J.M., R.P., and A.S.; software, J.F.M.-G., I.B., J.M., and R.P.; validation, S.N.M., J.F.M.-G., I.B., J.M., and R.P.; writing—original draft preparation, S.N.M., J.F.M.-G., I.B., J.M., R.P., and A.S.; writing—review and editing, S.N.M. and R.P.; supervision, S.N.M. and A.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work has been partially funded by the European Commission through the projects H2020-952702 BIECO, H2021-101069471 CERTIFY, and DOSS (grant no. 101120270).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data is contained within the article.
Conflicts of Interest
Author Irene Bicchierai was employed by ResilTech s.r.l.; authors Jan Marchel and Radosław Piliszek were employed by 7bulls.com. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
CC | Common Criteria
CPA | Commercial Product Assurance
CSA | Cybersecurity Act
CSPN | Certification de Sécurité de Premier Niveau
CRA | Cyber Resilience Act
CVE | Common Vulnerabilities and Exposures
CVSS | Common Vulnerability Scoring System
CWE | Common Weakness Enumeration
CWSS | Common Weakness Scoring System
DoS | Denial of Service
DPIA | Data Protection Impact Assessment
DREAD | Damage potential, Reproducibility, Exploitability, Affected users, Discoverability
EAL | Evaluation Assurance Level
ENISA | European Union Agency for Cybersecurity
ETSI | European Telecommunications Standards Institute
EUCC | Common Criteria-based European candidate cybersecurity certification scheme
EUCS | European Cybersecurity Certification Scheme for Cloud Services
EVITA | E-safety Vehicle Intrusion Protected Applications
FMEA | Failure Mode and Effect Analysis
GDPR | General Data Protection Regulation
HEAVENS | HEAling Vulnerabilities to ENhance Software Security and Safety
ICT | Information and Communication Technology
IoT | Internet of Things
ISO | International Organization for Standardization
MBT | Model-Based Testing
MUD | Manufacturer Usage Description
NIST | National Institute of Standards and Technology
NVD | National Vulnerability Database
OCTAVE | Operationally Critical Threat, Asset, and Vulnerability Evaluation
PP | Protection Profile
STRIDE | Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privileges
TOE | Target of Evaluation
TP | Tolerance Profile
TTCN | Testing and Test Control Notation
UL | Underwriters Laboratories
YAML | YAML Ain't Markup Language
References
- Gartner. Top Cybersecurity Trends and Strategies for Securing the Future. 2024. Available online: https://www.gartner.com/en/cybersecurity/topics/cybersecurity-trends (accessed on 3 February 2025).
- Regulation (EU) 2019/881 of the European Parliament and of the Council of 17 April 2019 on ENISA (the European Union Agency for Cybersecurity) and on information and communications technology cybersecurity certification and repealing Regulation (EU) No 526/2013 (Cybersecurity Act) (Text with EEA relevance), 2019.
- Regulation (EU) 2024/2847 of the European Parliament and of the Council of 23 October 2024 on horizontal cybersecurity requirements for products with digital elements and amending Regulations (EU) No 168/2013 and (EU) 2019/1020 and Directive (EU) 2020/1828 (Cyber Resilience Act) (Text with EEA relevance), 2024.
- European Telecommunications Standards Institute (ETSI). ETSI EG 203 251 V1.1.1: Methods for Testing and Specification; Risk-based Security Assessment and Testing Methodologies. 2016. Available online: https://www.etsi.org/deliver/etsi_eg/203200_203299/203251/01.01.01_60/eg_203251v010101p.pdf (accessed on 3 February 2025).
- ISO 31000:2018; Risk Management—Guidelines. ISO (International Organization for Standardization): Geneva, Switzerland, 2018.
- ISO/IEC/IEEE 29119-2:2021; Software and Systems Engineering—Software Testing. Part 2: Test Processes. ISO (International Organization for Standardization): Geneva, Switzerland, 2021.
- Lear, E.; Romascanu, D.; Droms, R. Manufacturer Usage Description Specification. Standard IETF RFC 8520. 2019. Available online: https://tools.ietf.org/html/rfc8520 (accessed on 3 February 2025).
- Alberts, C.J.; Dorofee, A.J.; Stevens, J.F.; Woody, C. OCTAVE-S Implementation Guide, Version 1; Carnegie Mellon University. Technical Report; 2005. Available online: https://insights.sei.cmu.edu/documents/1608/2005_002_001_14273.pdf (accessed on 3 February 2025).
- E-Safety Vehicle Intrusion Protected Applications (EVITA). D3.4.3-On-Board Architecture and Protocols Verification; 2010. Available online: https://www.evita-project.org/deliverables.html (accessed on 3 February 2025).
- Shameli-Sendi, A.; Aghababaei-Barzegar, R.; Cheriet, M. Taxonomy of information security risk assessment (ISRA). Comput. Secur. 2016, 57, 14–30. [Google Scholar] [CrossRef]
- Matheu-García, S.N.; Hernández-Ramos, J.L.; Skarmeta, A.; Baldini, G. A Survey of Cybersecurity Certification for the Internet of Things. ACM Comput. Surv. 2020, 53, 1–36. [Google Scholar] [CrossRef]
- Matheu, S.N.; Hernandez-Ramos, J.L.; Skarmeta, A.F. Toward a Cybersecurity Certification Framework for the Internet of Things. IEEE Secur. Priv. 2019, 17, 66–76. [Google Scholar] [CrossRef]
- European Cyber Security Organisation (ECSO), WG1—Standardisation, Certification, Labelling and Supply Chain Management. Overview of Existing Cybersecurity Standards and Certification Schemes v2; 2017. Available online: https://ecs-org.eu/ecso-uploads/2022/10/5a31129ea8e97.pdf (accessed on 3 February 2025).
- Common Criteria. Common Criteria for Information Technology Security Evaluation. Part 1: Introduction and General Model; 2022. Available online: https://commoncriteriaportal.org/files/ccfiles/CC2022PART1R1.pdf (accessed on 3 February 2025).
- National Cyber Security Centre (NCSC). The Commercial Product Assurance (CPA) Build Standard v1.4; NCSC-1844117881-312. 2018. Available online: https://www.ncsc.gov.uk/files/CPA-Build_Standard_1-4.pdf (accessed on 3 February 2025).
- Underwriters Laboratories (UL). Software Cybersecurity for Network-Connectable Products, Part 2-1: Particular Requirements for Network Connectable Components of Healthcare and Wellness Systems; 2023. Available online: https://www.shopulstandards.com/ProductDetail.aspx?productId=UL2900-2-1 (accessed on 3 February 2025).
- Agence Nationale de la Sécurité des Systèmes d'Information (ANSSI). Certification de Sécurité de Premier Niveau des Produits des Technologies de l'Information (CSPN); Paris, No. 45/ANSSI/SDE/PSS/CCN; 2023. Available online: https://cyber.gouv.fr/sites/default/files/document/ANSSI-CSPN-CER-P-01%20Certification_de_securite_de_premier_niveau_v5.0.pdf (accessed on 3 February 2025).
- Zhou, C.; Ramacciotti, S. Common Criteria: Its Limitations and Advice on Improvement. ISSA J. 2011, 24–28. Available online: https://www.difesa.it/assets/allegati/33182/commoncriteria_issa_journal_0411.pdf (accessed on 3 February 2025).
- Hernandez-Ramos, J.L.; Matheu, S.N.; Skarmeta, A. The Challenges of Software Cybersecurity Certification [Building Security In]. IEEE Secur. Priv. 2021, 19, 99–102. [Google Scholar] [CrossRef]
- Fowler, D.; Epiphaniou, G.; Maple, C. Cybersecurity Assurance and Certification for Systems. 2022. Available online: https://www.researchgate.net/publication/370873334_Cybersecurity_Assurance_and_Certification_for_Systems (accessed on 3 February 2025).
- Khurshid, A.; Alsaaidi, R.; Aslam, M.; Raza, S. EU Cybersecurity Act and IoT Certification: Landscape, Perspective and a Proposed Template Scheme. IEEE Access 2022, 10, 129932–129948. [Google Scholar] [CrossRef]
- Hernández-Ramos, J.L.; Matheu, S.N.; Feraudo, A.; Baldini, G.; Bernabe, J.B.; Yadav, P. Defining the Behavior of IoT Devices Through the MUD Standard: Review, Challenges, and Research Directions. IEEE Access 2021, 9, 126265–126285. [Google Scholar] [CrossRef]
- Matheu García, S.N.; Sánchez-Cabrera, A.; Schiavone, E.; Skarmeta, A. Integrating the manufacturer usage description standard in the modelling of cyber–physical systems. Comput. Stand. Interfaces 2024, 87, 103777. [Google Scholar] [CrossRef]
- ISO 26262-3:2018; Road Vehicles—Functional Safety Part 3: Concept Phase. ISO (International Organization for Standardization): Geneva, Switzerland, 2018.
- Failure Modes and Effects Analysis (FMEA). University of Cambridge. Available online: https://www.ifm.eng.cam.ac.uk/research/dmg/tools-and-techniques/fmea-failure-modes-and-effects-analysis/ (accessed on 3 February 2025).
- BSI-Standard 100-4; Business Continuity Management. BSI (Federal Office for Information Security): Nordrhein-Westfalen, Germany, 2009.
- Federal Office for Information Security (BSI). Privacy Impact Assessment Guideline; Federal Office for Information Security (BSI): Bonn, Germany, 2011. [Google Scholar]
- National Institute of Standards and Technology (NIST). National Vulnerability Database (NVD). Available online: https://nvd.nist.gov (accessed on 3 February 2025).
- Matheu, S.; Sánchez, A.; Cioroaica, E.; Daoudagh, S.; Lonetti, F.; Marchetti, E.; Schiavone, E.; Massimiliano, L.; Sorokos, I.; Pintos, B.; et al. D7.1 Report on the Identified Security and Privacy Metrics and Security Claims to Evaluate the Security of a System. BIECO Project—Building Trust in Ecosystems and Ecosystem Components. 2022. Available online: https://www.bieco.org/project-description/deliverables/d7-1-report-on-the-identified-security-and-privacy-metrics-and-security-claims-to-evaluate-the-security-of-a-system (accessed on 3 February 2025).
- National Telecommunications and Information Administration (NTIA). Software Bill of Materials (SBOM). 2022. Available online: https://ntia.gov/SBOM (accessed on 3 February 2025).
- ETSI ES 201 873-1; Methods for Testing and Specification (MTS); Testing and Test Control Notation Version 3; Part 1: TTCN-3 Core Language. European Telecommunications Standards Institute (ETSI): Sophia Antipolis, France, 2023.
- NIST SP 800-30; Joint Task Force Transformation Initiative. Guide for Conducting Risk Assessments: Information security. National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2012. [CrossRef]
- Lautenbach, A. D2. Security Models. HEAVENS project—HEAling Vulnerabilities to ENhance Software Security and Safety. 2016. Available online: https://autosec.se/wp-content/uploads/2018/03/HEAVENS_D2_v2.0.pdf (accessed on 3 February 2025).
- Eichler, J.; Angermeier, D. Modular risk assessment for the development of secure automotive systems. In Proceedings of the 31st VDI/VW Joint Conference Automotive Security, Wolfsburg, Germany, 21–22 October 2015. [Google Scholar]
- European Cyber Security Organisation (ECSO). Cybersecurity Made in Europe. Available online: https://www.cybersecurity-label.eu/ (accessed on 3 February 2025).
- IoT Security and Privacy Label. Carnegie Mellon University. Available online: https://www.iotsecurityprivacy.org/ (accessed on 3 February 2025).
- Directive 2010/30/EU of the European Parliament and of the Council of 19 May 2010 on the indication by labelling and standard product information of the consumption of energy and other resources by energy-related products (recast) (Text with EEA relevance). Directive-2010/30-EN-EUR-Lex, 2010.
- National Institute of Standards and Technology (NIST). Considerations for Managing Internet of Things (IoT) Cybersecurity and Privacy Risks; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2019. Available online: https://nvlpubs.nist.gov/nistpubs/ir/2018/NIST.IR.8228-draft.pdf (accessed on 3 February 2025).
- Net2DG Project-Leveraging NETworked Data for the Digital Electricity Grid. Funded by European Union’s Horizon 2020 Research and Innovation Programme Under Grant Agreement No 774145. 2020. Available online: http://www.net2dg.eu/ (accessed on 3 February 2025).
- Ben-Kiki, O.; Evans, C.; Ingy, N. YAML Ain’t Markup Language (YAML) Version 1.2. 2021. Available online: https://yaml.org/spec/1.2.2/ (accessed on 3 February 2025).
- Schiavone, E.; Nostro, N.; Brancati, F. A MDE Tool for Security Risk Assessment of Enterprises. In Proceedings of the 10th Latin-American Symposium on Dependable Computing (LADC 2021). Sociedade Brasileira de Computação, Porto Alegre, Brazil, 22–26 November 2021; pp. 5–7. [Google Scholar] [CrossRef]
- Matheu-García, S.N.; Hernández-Ramos, J.L.; Skarmeta, A.F.; Baldini, G. Risk-based automated assessment and testing for the cybersecurity certification and labelling of IoT devices. Comput. Stand. Interfaces 2019, 62, 64–83. [Google Scholar] [CrossRef]
- Lorrain, J.; Fourneret, E.; Dadeau, F.; Legeard, B. MBeeTle: Un Outil pour la Génération de Tests à-la-Volée à l'Aide de Modèles. Groupement de Recherche CNRS du Génie de la Programmation et du Logiciel, Besançon, France. 2016. Available online: https://hal.science/hal-02472608 (accessed on 3 February 2025).
- Bicchierai, I.; Araniti, E.; Matheu-García, S.N.; Gil, J.F.M. Validating the BIECO Security Evaluation Methodology within a Smart Grid Monitoring SW. In Proceedings of the 2023 IEEE 34th International Symposium on Software Reliability Engineering Workshops (ISSREW), Florence, Italy, 9–12 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 9–12. [Google Scholar] [CrossRef]
- Daoudagh, S.; Marchetti, E.; Calabrò, A.; Ferrada, F.; Oliveira, A.I.; Barata, J.; Peres, R.; Marques, F. DAEMON: A Domain-Based Monitoring Ontology for IoT Systems. SN Comput. Sci. 2023, 4, 1–16. [Google Scholar] [CrossRef]
- Matheu, S.; Sebastio, S.; Skarmeta, A.; Orizio, R.; Vasileiadis, S.; Kalos, V.; Muller, K.; Grubl, T.; Omana, R.; Tuck, S.; et al. Active Security for Connected Device Lifecycle: The CERTIFY Architecture. In Proceedings of the Special Session on Intelligent Internet of Things Security and Privacy (WISP), Salamanca, Spain, 26–28 June 2024. [Google Scholar]