Article

Exercise-Specific YANG Profile for AI-Assisted Network Security Labs: Bidirectional Configuration Exchange with Large Language Models †

Department of Computer Science, Nagoya Institute of Technology, Gokiso-cho, Showa-ku, Nagoya-shi 466-8555, Aichi-ken, Japan
This article is a revised and expanded version of a paper entitled “Development of Dialogue Feature between Participants and ChatGPT in Network Security Exercise System”, which was presented at the 7th International Conference on Information and Computer Technologies (ICICT 2024), Honolulu, Hawaii, USA, 15–17 March 2024.
Information 2025, 16(8), 631; https://doi.org/10.3390/info16080631
Submission received: 21 May 2025 / Revised: 17 July 2025 / Accepted: 22 July 2025 / Published: 24 July 2025
(This article belongs to the Special Issue AI Technology-Enhanced Learning and Teaching)

Abstract

Network security courses rely on hands-on labs where students configure virtual Linux networks to practice attack and defense. Automated feedback is scarce because no standard exists for exchanging detailed configurations—interfaces, bridging, routing tables, iptables policies—between exercise software and large language models (LLMs) that could serve as tutors. We address this interoperability gap with an exercise-oriented YANG profile that augments the Internet Engineering Task Force (IETF) ietf-network module with a new network-devices module. The profile expresses Linux interface settings, routing, and firewall rules, and tags each node with roles such as linux-server or linux-firewall. Integrated into our LiNeS Cloud platform, it enables LLMs to both parse and generate machine-readable network states. We evaluated the profile on four topologies—from a simple client–server pair to multi-subnet scenarios with dedicated security devices—using ChatGPT-4o, Claude 3.7 Sonnet, and Gemini 2.0 Flash. Across 1050 evaluation tasks covering profile understanding (n = 180), instance analysis (n = 750), and instance generation (n = 120), the three LLMs answered correctly in 1028 cases, yielding an overall accuracy of 97.9%. Even with only minimal follow-up cues (≤3 turns), rather than handcrafted prompt chains, analysis tasks reached 98.1% accuracy and generation tasks 93.3%. To our knowledge, this is the first exercise-focused YANG profile that simultaneously captures Linux/iptables semantics and is empirically validated across three proprietary LLMs, attaining 97.9% overall task accuracy. These results lay a practical foundation for artificial intelligence (AI)-assisted security labs where real-time feedback and scenario generation must scale beyond human instructor capacity.

1. Introduction

The persistent cybersecurity skills gap underscores the need for more effective educational models and strategic learning approaches [1]. In response, network security education has increasingly relied on hands-on exercises where students design, configure, and test virtual networks to explore both attack and defense strategies. However, current exercise platforms scale poorly: instructors must manually inspect every configuration, craft personalized feedback, and shepherd troubleshooting, a workload that quickly becomes unsustainable as class sizes and scenario complexity grow. Large language models (LLMs) are promising candidates to automate these mentoring tasks, yet no standard, machine-readable format exists for exchanging network configuration data between educational platforms and LLM-based assistants. To fill this gap, we introduce a YANG profile that captures Linux and iptables semantics and is designed for bidirectional parsing by both sides.
LLMs possess broad knowledge in the field of computer science and can respond effectively to related inquiries. Recent empirical studies confirm that LLMs can already undertake non-trivial network engineering tasks with high accuracy. ChatNet [2] reports a small capacity-planning case study and shows that retrieval-augmented generation (RAG) mitigates the calculator bottleneck more effectively than zero- or few-shot chain-of-thought (CoT) prompting. NETBUDDY shows that batching high-level policy requirements enables GPT-4 to generate P4 table entries and Border Gateway Protocol (BGP) configurations at roughly one-sixth the per-requirement cost [3]. On the NeMoEval benchmark, GPT-4 achieved 88% functional correctness while generating NetworkX-based traffic analysis code [4]. These results substantiate the practical value of LLM assistance in network design and motivate this work, which introduces a dedicated YANG-based exchange profile for network security exercises. Through interactions with ChatGPT [5], Claude [6], and Gemini [7], the author confirmed that these LLMs can provide practical guidance on basic network construction and secure network operation, often including concrete implementation examples.
In the network security exercise classes taught by the author, participants design, build, and operate networks, and observe their behavior to solve practice problems and assigned tasks. In such situations, LLMs are expected to assist in troubleshooting various issues, effectively supplementing or replacing instructors.
The LiNeS Cloud exercise environment [8] enables the construction of virtual networks on a server using user-mode Linux virtual machines as nodes and provides a web-based user interface for network editing. In this environment, participants use their local PCs to build networks composed of Linux-based devices (such as servers and firewalls), and perform both attack and defense activities. Under these conditions, it is straightforward for server-side application processes to collect the participants’ network configurations.
However, no standardized mechanism currently exists for exchanging structured configuration data between applications and LLMs in the context of educational network exercises—particularly one that supports Linux-specific semantics such as iptables rules and custom device roles. Challenges therefore remain in ensuring interoperability and seamless integration between LLMs and system components. While LLMs are capable of understanding both natural and formal languages, the ambiguity and lack of structure inherent in natural language hinder deterministic applications from accurately interpreting such information. As a result, it remains difficult for the LiNeS Cloud system to construct networks directly based on LLM-generated proposals (see Figure 1a). To overcome this limitation, we introduce an exercise-specific YANG profile that enables deterministic, bidirectional exchange of network configuration data between LiNeS Cloud and LLMs, as illustrated in Figure 1b.
YANG [9] is a formal language designed for describing network configurations, with its specifications documented in natural language and widely accessible online. Additional configuration parameters can be incorporated by systematically extending existing YANG modules according to established rules. The author confirmed that ChatGPT, Claude, and Gemini were able to describe basic network configurations with reasonable accuracy using existing YANG modules.
However, the YANG modules defined in IETF Request for Comments (RFC) documents [10] are insufficient for fully describing the exercise networks. This is due to the inclusion of Linux-based devices in the exercise environment, whose specific configurations—such as Linux-specific network interface settings, iptables-based firewall rules, and exercise-specific device roles—are not adequately captured by standard YANG modules. Since these modules are primarily designed for general-purpose network devices, they lack the structural support needed to describe Linux-specific configurations.
The author developed a dialogue system within LiNeS Cloud that enables participants to interact with ChatGPT, proposing a method to share network configuration information with ChatGPT using YANG profiles [11]. This dialogue function is expected to enable participants to consult ChatGPT for troubleshooting network design issues and implementing attack and defense strategies, thereby reducing their reliance on instructors or teaching assistants.
However, the previous study [11] did not provide sufficient detail on the design rationale or evaluation of the YANG profile, nor did it examine compatibility with LLMs other than ChatGPT. In response, this research proposes a YANG profile designed to efficiently share exercise network configuration information between the LiNeS Cloud application and LLMs. This profile builds upon the ietf-network module [12], which provides a model for network topology, and extends it using three main strategies: (1) defining and adding elements specific to exercise networks (e.g., firewall configurations), (2) incorporating reusable components from existing YANG modules (e.g., interface names), and (3) modifying dependencies and conditional relationships among elements to reflect constraints in the exercise environment. This structured approach allows for a detailed representation of exercise networks in a format interpretable by both LLMs and applications. Furthermore, this study experimentally evaluates how accurately LLMs can interpret and generate network configurations based on the proposed YANG profile.
The YANG profile proposed in this study serves as a foundational technology for integrating LLMs into network exercise systems. Leveraging this technology, LLMs can comprehend the networks constructed by participants, offer feedback and improvement suggestions, and even design exercise scenarios themselves (e.g., troubleshooting networks or attack scenarios). Such collaboration with AI is expected to significantly enhance the quality and adaptability of hands-on learning experiences, thereby broadening the educational applications of AI.
The remainder of this paper is organized as follows. Section 2 surveys five related research streams and positions our contribution. Section 3 states the design principles for the YANG profile. Section 4 details the profile specification, and Section 5 describes the systematic element definition method. Section 6 evaluates the profile with three state-of-the-art LLMs. Finally, Section 7 concludes and outlines future work.

2. Related Work

A short prototype of this idea was reported in our conference paper [11]. That work introduced the concept of sending exercise networks to ChatGPT, but (1) lacked a formal YANG profile, (2) evaluated only a single LLM, and (3) did not cover Linux/iptables. The present article addresses all three gaps.
Recent advances in network automation and AI-assisted education have highlighted the need for an efficient and trustworthy mechanism to exchange structured configuration data. The present study is situated at the confluence of five research streams: (1) network configuration management tools, (2) educational network-exercise platforms, (3) AI-based tutoring systems, (4) integrations of LLMs and structured data, and (5) YANG extension efforts. Table 1 summarizes representative studies in each stream and delineates how the proposed approach addresses their limitations. The remainder of this section reviews these studies in turn.

2.1. Network Configuration Management Tools

Wågbrant and Radic [13] quantitatively compared Ansible, Puppet, and SaltStack, demonstrating labor savings but assuming no interaction with LLMs. These tools often rely on configuration formats like YAML (YAML Ain’t Markup Language) and a command-line interface (CLI). The Cisco-Telefónica case study [14] shows that while Cisco Network Services Orchestrator (NSO) enables multi-vendor inventory consolidation, it requires integration with additional Crosswork components (such as Change Automation and Health Insights) and external workflow management systems to achieve fully automated remediation capabilities. By contrast, the present study employs a YANG-based structured representation that enables bidirectional exchange of detailed configuration data—including iptables rules and OS versions—with LLMs, thereby supporting automated diagnosis and repair suggestions.

2.2. Educational Network-Exercise Platforms

Nedyalkov [15] reported security experiments using GNS3 to study data exchange security between power electronic devices and control centers, focusing on power distribution unit (PDU) and uninterruptible power supply (UPS) devices with network protocol analysis tools. Harahus and Cavojský [16] evaluated EVE-NG and noted that instructors can share preprogrammed lab files to save time and reduce manual configuration tasks. While prior works like PocketCTF by Karagiannis et al. [17] have focused on creating lightweight, container-based cybersecurity exercise platforms, they do not provide a mechanism for the bidirectional exchange of structured configuration data with LLMs. The proposed YANG profile can be integrated with such platforms to further enhance efficiency by enabling LLMs to provide automated analysis and feedback on learner configurations.

2.3. AI-Based Tutoring Systems

PyTutor [18] generates explanations that are validated by instructors before delivery, resulting in a semi-synchronous workflow. The present study enables live sharing of network state with an LLM, through which learners can receive instant feedback and interactively revise their configurations. Recent research has shown both the potential and challenges of applying LLMs in education. For instance, Espinha Gasiba et al. [19] evaluated ChatGPT’s ability to fix software vulnerabilities, noting that while it achieves high accuracy on well-documented issues, its performance is inconsistent in context-specific scenarios. This suggests that a structured data exchange format, such as the one proposed by this study, is crucial for achieving stable LLM performance.

2.4. LLMs and Structured Data

ChatNet [2] proposes a modular framework for intent-to-configuration generation and quantitatively compares zero-shot, few-shot, CoT, and RAG prompts in a capacity-planning scenario using relative LLM score and human intervention counts. NETBUDDY [3] leveraged GPT-4 to compile declarative, high-level network policies into concrete P4 table-entry and BGP configurations, and further showed that batching multiple requirements can cut the per-requirement synthesis cost by roughly six-fold in selected scenarios. Mani et al. [4] proposed an LLM-based approach for generating network management code. In the NeMoEval benchmark, GPT-4 achieved 88% functional correctness when synthesizing traffic analysis programs that leverage the Python NetworkX library; their evaluation covered a variety of synthetic and production-inspired topologies, rather than being limited to live production networks. This study contributes an exercise-specific YANG profile and evaluates it across multiple LLMs, directly addressing the domain-adaptation gap outlined in [2].

2.5. YANG Extension Efforts

OpenConfig [20] supplies vendor-neutral YANG models for production networks but lacks Linux-specific and educational elements. Cisco NSO [14] abstracts services via YANG for multi-vendor deployments, but its standard distribution lacks a built-in workflow engine, requiring integration with external workflow management systems or additional Crosswork components for automated remediation. The present profile extends standard YANG systematically with iptables, OS versions, and exercise-specific device roles, and demonstrates its applicability in a learning environment.

2.6. Positioning and Novelty

Across the five streams, the literature exhibits several limitations: (1) configuration tools omit LLM interaction, (2) educational platforms rely on Extensible Markup Language (XML)/CLI and cannot exchange structured data, (3) generic tutoring systems do not consider the peculiarities of network exercises, (4) LLM data studies focus on commercial settings rather than education, and (5) existing YANG extensions disregard the dynamic nature of learning tasks. The present study integrates these streams by introducing the first bidirectional data-exchange mechanism between LLMs and exercise systems based on an exercise-specific YANG profile, enriched with Linux-related details, and validates its practicality on ChatGPT, Claude, and Gemini.
In short, despite the high quantitative performance reported above, no prior study has applied these LLM capabilities to hands-on security labs, nor has any work simultaneously (1) enabled bidirectional data exchange between an educational network-exercise platform and LLMs, (2) extended standard YANG with Linux/iptables semantics, and (3) benchmarked three proprietary LLMs in a single, profile-consistent setting. Together, these three elements decisively differentiate our study from prior work such as ChatNet [2] and NETBUDDY [3]. Hence, this study provides the first bidirectional LLM–exercise-platform bridge, outperforming the prior state of the art across all five capability items in Table 1.
Table 1. Capability matrix of prior LLM+network studies vs. this work.

| Category (Section) | Representative | Data Model | LLM Aware | Educational Focus | Gap Addressed by This Study |
|---|---|---|---|---|---|
| Section 2.1 | Ansible [13] | YAML/CLI | No | No | No diagnosis or advice |
| Section 2.1 | Cisco NSO: Inventory and Telemetry [14] | YANG | No | No | Semi-automated remediation; external workflow manager |
| Section 2.2 | GNS3 [15] | Unspecified/ad hoc | No | Yes | Lacks standardized, machine-readable format for configuration exchange |
| Section 2.2 | EVE-NG [16] | Unspecified/ad hoc | No | Yes | Manual configuration/lab setup burden |
| Section 2.2 | PocketCTF [17] | Unspecified/ad hoc | No | Yes | Lacks standardized, machine-readable format for configuration exchange |
| Section 2.3 | PyTutor [18] | — | ChatGPT | Yes | Semi-synchronous, no live chat |
| Section 2.3 | CyberSecurity Challenges [19] | — | ChatGPT | Yes | Lacks a specific data model, leading to inconsistent performance; focuses on code analysis, not network configuration |
| Section 2.4 | ChatNet [2] | CLI | GPT-4 | No | Linux/education support absent |
| Section 2.4 | NETBUDDY [3] | P4/BGP | GPT-4 | No | No learner assistance |
| Section 2.5 | OpenConfig [20] | YANG | No | No | Linux/iptables undefined |
| Section 2.5 | Cisco NSO [14] | YANG | No | No | No built-in workflow engine; requires external workflow management or Crosswork components for automated remediation |
| This study | YANG + LLM | YANG/JSON | GPT-4o/Claude/Gemini | Yes | Exercise-specific YANG and multi-LLM evaluation |

2.7. External Validity Supported by Related Work

Recent studies on automatic grading and repair of network configurations with large language models (LLMs) typically validate their approaches on synthetic or lab-scale topologies before moving to real-world data. For example, Mani et al. [4] evaluate GPT-4 and other LLMs on communication graphs of varying sizes as well as the public MALT enterprise network dataset, yet do not involve any student submissions. Similarly, Wang et al.’s NETBUDDY [3] demonstrates proof-of-concept scenarios—such as multiprotocol label switching (MPLS) and BGP—on small Kathará-emulated networks, leaving large-scale evaluation for future work. These precedents indicate that preliminary experiments on synthetic or controlled environments can still yield insights of practical value. Accordingly, our use of four curriculum-inspired topologies aligns with this trend and constitutes a justified preparatory step toward the large-scale assessment on anonymized student submissions that we plan for the next academic term.

3. Model Design Principles

The virtual network in LiNeS Cloud encompasses a wide range of information, including network topology and user accounts. However, the YANG profile proposed in this study focuses solely on information directly relevant to solving exercise tasks—specifically, the information that participants need to configure or reference. By narrowing the input to LLMs to only essential elements associated with the exercises, the profile reduces unnecessary noise and enhances the relevance of interactions.
Existing YANG modules are insufficient for fully describing the configuration of exercise networks. The ietf-network and ietf-network-topology modules defined in RFC 8345 provide a foundational framework for expressing basic network topology. However, they lack the capability to represent detailed settings required in the LiNeS Cloud environment, such as operating system types, service configurations, and firewall rules. Similarly, other modules like ietf-interfaces (RFC 8343) and ietf-routing (RFC 8349) are tailored for general-purpose network devices and do not contain the specific elements necessary for addressing exercise-related tasks.
To address this gap, this study proposes a YANG profile that facilitates efficient sharing of exercise network configuration information between LLMs and applications. Specifically, a new module named network-devices was defined to extend the /networks/network element from the ietf-network module. The resulting profile integrates essential components from existing modules with those defined in the new module.
The choice of /networks/network as the extension base was motivated by the following reasons:
  • It provides a well-established framework for representing basic network topology.
  • It features a structure that clearly expresses the relationships between nodes and links.
  • It offers high extensibility, allowing flexible addition of new containers and leaves.
  • It is a standardized model that facilitates LLM understanding of network topologies.
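A minimal sketch of this extension approach is shown below. The module name matches the profile described in this paper, but the namespace, the exact augment target, and the substatements are illustrative and heavily abbreviated (the full profile appears in Figure A1):

```yang
module network-devices {
  yang-version 1.1;
  namespace "urn:example:network-devices";  // illustrative namespace
  prefix nd;

  import ietf-network {  // RFC 8345 topology framework
    prefix nw;
  }

  // Attach exercise-specific data to the standard topology model;
  // here a node-level role leaf is shown as one representative addition.
  augment "/nw:networks/nw:network/nw:node" {
    leaf node-type {
      type enumeration {
        enum "L2-switch";
        enum "linux-server";
        // ...remaining roles omitted for brevity
      }
      description "The type of the network node.";
    }
  }
}
```

Because the additions live entirely in the augmenting module, the ietf-network definitions remain untouched, which is the property exploited throughout Section 5.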

4. YANG Profile

The proposed YANG profile is based on the ietf-network module defined in RFC 8345, extended with a custom network-devices module and integrated with elements from existing modules. This profile is illustrated in Figure A1, following the tree diagram format [21]. The prefixes used in this YANG profile and in the following explanation are listed in Table 2. In the subsequent explanation, elements are referenced in the format “element-name(line number)”.
Each exercise network is uniquely identified by network-id(2). Since participants may create multiple networks, this identifier is used to distinguish them. link-id(4) represents the identifier of a cable. source-node(6) and dest-node(9) contain the identifiers of the devices (i.e., node-id(12)) connected to each end of the cable. source-tp(7) and dest-tp(10) specify the identifiers of the physical ports (i.e., tp-id(16)) at both cable endpoints.
node-id(12) identifies a device. node-type(13) indicates the device type, which can be one of the following: L2-switch, repeater-hub, linux-server, linux-client, linux-firewall, linux-router, or linux-blackhat. power-state(14) specifies whether the device is powered on or off. tp-id(16) refers to the name of a physical port on the device. Depending on the device type, a list of termination-point(15) is defined. For node-type(13) set to L2-switch or repeater-hub, the list includes eth[0-4]; for linux-server or linux-client, eth0; for linux-firewall or linux-blackhat, eth[0-1]; and for linux-router, eth[0-2]. hostname(17) is a user-assigned name to help participants distinguish between devices. os-info(18) represents a combination of Linux distribution and kernel version.
name(21) is the Linux-recognized name of a network interface (e.g., eth0, br0). bridge-name(22) indicates the bridge interface to which the network interface belongs and corresponds to name(30). oper-status(23) indicates the operational state of the interface, represented as an enumeration: up for active and down for inactive. phys-address(24) is the MAC address of the interface. ip(26) and netmask(27) specify the IP address and subnet mask, respectively. name(30) denotes the name of the bridge interface.
name(34) identifies the Routing Information Base (RIB). Although the exercise network uses a single RIB, the profile retains the original structure to help LLMs understand the model more quickly and accurately. route-preference(37) indicates the preference level of a route. destination-prefix(38) specifies the destination network address, while next-hop-address(40) indicates the next hop. outgoing-interface(41) specifies the egress interface for the route, corresponding to the value of name(21).
iptables-save-output(43) contains the firewall configuration for linux-firewall, represented by the output of the iptables-save command. service-name(45) identifies a service provided by a linux-server, with supported values including http, ssh, syslog, xinetd, telnet, and ftp. local-address(47) and local-port(48) specify the IP address and port number on which the service is provided.
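To make the description above concrete, the following JSON-encoded instance fragment illustrates how a single linux-server node might appear. All identifiers, addresses, and the exact container names are invented for this example and abbreviated relative to the full profile:

```json
{
  "ietf-network:networks": {
    "network": [{
      "network-id": "exercise-net-1",
      "node": [{
        "node-id": "server-1",
        "network-devices:node-type": "linux-server",
        "network-devices:power-state": "on",
        "network-devices:hostname": "www",
        "network-devices:os-info": "Debian-4.0-Kernel-2.6.24",
        "network-devices:interfaces": {
          "interface": [{
            "name": "eth0",
            "oper-status": "up",
            "phys-address": "02:00:00:00:00:01",
            "address": { "ip": "192.168.1.10", "netmask": "255.255.255.0" }
          }]
        },
        "network-devices:services": {
          "service": [{
            "service-name": "http",
            "local-address": "192.168.1.10",
            "local-port": 80
          }]
        }
      }]
    }]
  }
}
```

Instances of this shape are what LiNeS Cloud and the LLMs exchange in both directions during the evaluation in Section 6.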

5. Definition Method for YANG Profile

This section describes the method for defining elements in the YANG profile. In this model, element definitions follow a prioritized order: “Define as identical element > Define as similar element > Define as new element.” The rationale for adopting this design approach based on such prioritization is as follows:
  • Defining as identical elements maintains consistency with industry standards, enhancing the possibility that LLMs can accurately interpret using their existing knowledge.
  • Defining as similar elements helps LLMs understand the relation to standard definitions while allowing necessary adjustments for the exercise environment.
  • Defining as new elements is limited to introducing exercise-specific concepts that cannot be expressed with existing models, thereby minimizing the increase in model complexity.

5.1. Identical Elements

When a configuration can be described using elements defined in existing YANG modules, the original definition is referenced via a reference statement to indicate that it is strictly identical:
reference "See <XPath to the element> in <module name of the element> module for the original definition.";
Table 3 lists the identical elements used in this model along with their corresponding XPaths. For instance, the listening address local-address(47) and listening port local-port(48) of TCP services can be added by reusing tcp-server-grouping from RFC 9643 [29] using the uses statement. However, as RFC 9643 was published on 10 October 2024, potentially after the LLMs’ knowledge cutoff, these elements were defined as identical to ensure consistent interpretation.
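As a hypothetical illustration of this convention, local-port(48) could be declared along the following lines; the type and XPath shown are assumptions for the example, since the authoritative wording is given in Table 3:

```yang
leaf local-port {
  type inet:port-number;  // assumed reuse of ietf-inet-types (RFC 6991)
  description "The local port number on which the service listens.";
  reference "See /tcp-server-grouping/local-bind/local-port in the
             ietf-tcp-server module for the original definition.";
}
```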

5.2. Similar Elements

When existing YANG elements can be partially reused with modifications, a reference statement is used to indicate similarity while acknowledging deviations:
reference "Defined similarly as <XPath to the element> in <module name of the element> module.";
Table 4 shows similar elements defined in the proposed YANG profile and their corresponding original XPaths. The following are specific reasons for treating these elements as similar:
  • tp-id(16): The original definition uses only the string type. To align with exercise devices, constraints were added using the must statement, e.g., eth[0-4], eth0, etc., based on node-type.
  • bridge-name(22): The original leafref path was adapted from /dot1q:bridges/... to /nw:networks/... to fit the new structure.
  • oper-status(23): The original enumeration had seven states, but only relevant options were retained for the exercise environment.
  • address(25): Defined as a child of interface(20) to clarify which interface the address belongs to.
  • bridges(28): Moved under node(11) to explicitly associate with specific nodes.
  • destination-prefix(38), next-hop-address(40): Modified the when conditions to match the structure of the exercise profile.
Other elements not explicitly listed are typically containers or lists that serve as ancestors of the elements above, or those defined in Section 5.1 and Section 5.3. These are treated as similar because only their child elements differ from standard definitions.
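To illustrate the first bullet, the role-dependent port-name constraint on tp-id(16) could be sketched as follows; this is only one plausible formulation, and the exact must expressions in the profile may differ:

```yang
leaf tp-id {
  type string;
  // Restrict port names according to the node's role; one clause per
  // device type (only the linux-server case is shown here).
  must "not(../../nd:node-type = 'linux-server') or . = 'eth0'" {
    error-message "A linux-server exposes only eth0.";
  }
  reference "Defined similarly as /networks/network/node/termination-point/tp-id
             in the ietf-network-topology module.";
}
```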

5.3. New Elements

When existing YANG models cannot represent a configuration, new elements are introduced along with description statements explaining their purpose.
node-type(13) is one such element, representing the type of device:
description "The type of the network node.";
It is defined as an enumeration type, allowing values such as L2-switch, repeater-hub, linux-server, linux-client, linux-firewall, linux-router, and linux-blackhat, each with its own description, for example,
enum "L2-switch" {description "Layer 2 switch, typically used in Ethernet networks.";}
os-info(18) was introduced instead of using os-name or os-release, to avoid ambiguity between Linux distributions and kernel versions. It is defined as follows:
type enumeration {
    enum "RedHat-7.2-Kernel-2.4.19-5um" {description "Red Hat 7.2 with Kernel 2.4.19-5um";}
    enum "Debian-4.0-Kernel-2.6.24" {description "Debian 4.0 with Kernel 2.6.24";}
}
interface(20) represents network interfaces detected by Linux and must not appear in descriptions of switching or repeater hubs. This restriction is enforced with a when statement:
when "not(/nw:networks/nw:network/nw:node/nd:node-type = 'L2-switch' or /nw:networks/nw:network/nw:node/nd:node-type = 'repeater-hub')";
outgoing-interface(41) refers to the host’s egress interface and must match name(21). This is ensured using a leafref type:
type leafref {
    path "/nw:networks/nw:network/nw:node/interfaces/interface/name";
}
iptables-save-output(43) stores the output of the iptables-save command. To aid interpretation by LLMs, it includes the following reference:
reference "Linux iptables-save command: https://man7.org/linux/man-pages/man8/iptables-save.8.html";
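Putting the pieces of this subsection together, the complete leaf might read roughly as follows; this is a sketch, and the when restriction to linux-firewall nodes is an assumption inferred from Section 4 rather than a quotation of the profile:

```yang
leaf iptables-save-output {
  type string;  // raw multi-line output of the iptables-save command
  when "../nd:node-type = 'linux-firewall'";  // assumed role restriction
  description "Firewall configuration captured verbatim from iptables-save.";
  reference "Linux iptables-save command:
             https://man7.org/linux/man-pages/man8/iptables-save.8.html";
}
```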

5.4. Design Rationale and Conceptual Model

The network-devices module proposed in this study is realized by augmenting /nw:networks/nw:network of the ietf-network module. This section formally states the guiding principles that shaped the module and illustrates the resulting layer structure.

5.4.1. Design Guidelines

The element definition order introduced in Section 5 (“Define as identical element > define as similar element > define as new element”) is grounded in three practical guidelines:
  • Minimal invasiveness
    Policy: Leave every standard YANG module unmodified and localize all additional elements under a dedicated /nd:* subtree via augment.
    Effect: Limits the ripple scope of changes, thereby improving reusability and easing maintenance.
  • Backward compatibility
    Policy: When refining existing data, follow the lifecycle guidance in Section 4.7 of RFC 8407 [30] and the update rules in Section 11 of RFC 7950; that is, never rename or repurpose a published data node, and do not change its data type unless the new type is fully syntax- and semantics-compatible. If different semantics or a stricter type is required, introduce a new leaf (or case) instead.
    Effect: Enables the profile to coexist with legacy tools and devices, thereby supporting step-wise deployment.
  • Cognitive affinity
    Policy: Reuse standard vocabulary and data types whenever possible; introduce new terms only when unavoidable.
    Effect: Secures semantic consistency with knowledge already embedded in large language models (LLMs), thereby improving interpretation accuracy.

5.4.2. Layered Conceptual Model

Figure 2 depicts the layered structure of the proposed profile. The “standard layer” adopts the ietf-network and ietf-network-topology modules (RFC 8345) without modification, while the “extension layer” (network-devices) introduces exercise-specific leaf nodes such as node-type, os-info, and iptables-save-output via augment. The dashed “vocabulary-alignment layer” is an optional future extension aimed at optimizing interactions with LLMs; it is neither implemented nor evaluated in this work.
The profile therefore maintains the definitions of RFC 8345/8349 intact, while the extension layer encapsulates all exercise-specific elements, ensuring a clear separation between standard and exercise-oriented semantics.

6. Evaluation Experiment

How well LLMs can understand and utilize the proposed YANG profile was evaluated from the following three perspectives:
  • Understanding of the YANG profile structure (Section 6.1);
  • Analysis of YANG instances based on the YANG profile (Section 6.3);
  • Generation of YANG instances based on the YANG profile (Section 6.4).
The evaluation was conducted through interactions via a web-based dialogue interface with LLMs. The YANG profile and evaluation prompts were submitted to the LLMs through an input field, and their responses were reviewed. If a response was insufficient, follow-up questions were submitted until a satisfactory response was obtained. When an incorrect response was received, the error was pointed out. If no improvement was observed after up to three correction attempts, the evaluation was terminated for that item.
Three LLMs were used in this experiment: ChatGPT 4o (OpenAI), Claude 3.7 Sonnet (Anthropic), and Gemini 2.0 Flash (Google).

6.1. Understanding of YANG Profile Structure

The YANG profile consists of both custom and existing YANG modules. Accordingly, the evaluation was conducted in two stages: first assessing the LLMs’ understanding of the custom YANG module, and then their understanding of the full YANG profile (Table 5).
In Table 5, the “√” column indicates the number of cases in which the LLM provided a correct response without any errors. The “∆” column represents the number of cases where the LLM initially made mistakes but eventually arrived at the correct answer. The “×” column indicates the number of cases where the evaluation was terminated due to a lack of sufficient progress. Across 180 evaluation items (6 tasks × 3 LLMs × 10 trials each), all models answered correctly on the first attempt, yielding an initial accuracy of 100%.
The tasks listed in Table 5 primarily involve syntactic understanding of the YANG profile, such as enumerating leaf elements, identifying their data types, determining their originating modules, and interpreting reference or constraint expressions like leafref, must, and when.
Since the complete structure of the YANG profile was made available in advance, these tasks did not require semantic inference but rather pattern recognition and structural matching. Accordingly, the high success rate (“√”) observed reflects the LLMs’ capacity for syntactic parsing and structural reasoning under well-defined inputs.

6.1.1. Level of Understanding of the Custom YANG Module

This evaluation examined whether the LLMs could detect all leaf elements defined in the custom YANG module and correctly understand the data type and role of each element.
In Task No. 1, the custom YANG module was first provided to the LLMs, followed by a task prompt requesting the enumeration of all leaf elements. The responses showed that the LLMs were able to enumerate all relevant elements.
Subsequently, in Task No. 2, the LLMs were asked to identify the data types of the leaf elements. The responses indicated that the data types were correctly recognized.
Finally, in Task No. 3, the LLMs were asked to explain the role of each leaf element. The responses indicated that the roles were correctly understood.

6.1.2. Level of Understanding of the YANG Profile

Some elements of the YANG profile (e.g., link-id(4)) are derived from existing YANG modules, as shown in Table 3 and Table 4. This introduces the potential for LLMs to confuse the source of such elements. Additionally, certain elements (e.g., network-id(2)) originate from existing modules and are not defined in the custom YANG module. As a result, there is a possibility that LLMs may rely solely on the custom module and fail to identify definitions for those elements.
To examine this issue, Task No. 4 evaluated whether LLMs could recognize the source module for each leaf in the YANG profile. After completing Task No. 3, a list of all leaf paths was provided to the LLMs, followed by a question asking them to identify the defining module for each. The responses indicated that the source of each definition was correctly recognized.
Leaves defined with the leafref type must refer to values of other elements specified via the path statement. Accurately resolving these references requires a comprehensive understanding of the overall profile structure. Task No. 5 assessed whether LLMs could interpret such leafref elements defined in existing modules (e.g., source-node(6)), through follow-up questions in the same dialogue session.
Although must and when statements are not essential to the structural definition of elements, they provide important semantic constraints and dependencies. In the proposed profile, these statements often refer to the values of other leaf elements, thus requiring a deep structural understanding to interpret correctly. Task No. 6 was designed to evaluate whether the LLMs could identify and understand the must statement for os-info(18) and the when statement for interface(20). This evaluation was also conducted as a continuation of the previous dialogue.
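For illustration, the two construct types probed in Tasks 5 and 6 take the following shape. The source-node path is the definition published in RFC 8345; the os-info constraint shown here is a simplified, hypothetical rendering of the kind of must expression used in the profile (the actual expression appears in the full module in Appendix A).

```yang
// Task 5: a leafref must be resolved against the key of the referenced list
// (definition from ietf-network-topology, RFC 8345).
leaf source-node {
  type leafref {
    path "../../../nw:node/nw:node-id";
  }
}

// Task 6: must statements constrain a leaf based on sibling values
// (hypothetical simplification of the profile's os-info(18) constraint).
leaf os-info {
  type string;
  must "../nd:node-type != 'L2-switch'" {
    error-message "Only Linux devices carry OS information.";
  }
}
```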

6.2. Networks for Evaluation Experiment

In the exercise environment, participants construct networks in accordance with the exercise tasks. These networks may vary between individuals depending on their experimental strategies and may also evolve as the tasks progress. Therefore, evaluation of the YANG profile based on actual student-created networks is considered future work.
Instead, the present experiment evaluates the YANG profile using representative network topologies that are commonly constructed in exercises, including those designed for beginners. The focus is on assessing whether LLMs can (1) accurately interpret all leaf elements and (2) correctly generate YANG instances, even for larger-scale networks with increased description complexity.
Figure 3 illustrates the topologies of the four networks used for evaluation. Squares represent devices, with the upper portion of each box indicating the device name (hostname(17)) and the lower portion showing the abbreviated device type (node-type(13)). The abbreviations used are: client for linux-client, switch for L2-switch, server for linux-server, router for linux-router, black for linux-blackhat, and firewall for linux-firewall. Lines between devices represent cables, with physical port abbreviations labeled near the connection points. Integer values correspond to interface names, such that 0 corresponds to eth0, 1 to eth1, and so on. Each device is configured according to its role—for example, a server has a valid IP address and provides services such as http and ssh.
The exercise curriculum comprises 11 practice problems and 6 exercise assignments [31]. Network 1 corresponds to the basic wiring topology shown in Figure 1a of [31] and is used as the starting point for 10 practice problems and one assignment. In this setting, participants learn fundamental operations—such as IP address configuration and service activation—and conduct both back-door attacks and defensive countermeasures. Network 4 is based on the two-host topology with a transparent firewall shown in Figure 1b of [31] and serves as the starting point for one practice problem and one assignment. This setup focuses on configuring iptables for packet filtering and log analysis. Participants may further modify these initial networks according to their experimental objectives. Network 2 and Network 3 are synthetic topologies that represent the multi-subnet and star-shaped scenarios, respectively.
The YANG instances describing each network utilize the leaf elements listed in Table A1, where √ indicates usage and × indicates non-usage. The elements bridge-name(22) and name(30) are used to represent bridging configurations implemented in firewalls. next-hop-address(40) is used for describing routing table entries and thus appears in Networks 2 and 3, which include multiple subnets. iptables-save-output(43) is employed to describe iptables settings in firewalls. service-name(45), local-address(47), and local-port(48) are used for defining server-side service configurations.
All YANG instances submitted to the LLMs were written in a human-readable JavaScript Object Notation (JSON) format, commonly referred to as “pretty-printed JSON,” which includes indentation and line breaks. Although no specific formatter was enforced, we followed standard conventions such as two-space indentation and one element per line. Token counts were measured using OpenAI’s tiktoken library with the cl100k_base encoding, which corresponds to the tokenizer used in GPT-4o. These token counts reflect the exact data that was submitted to GPT-4o during evaluation, without any further minification.
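The formatting and size-metric conventions can be reproduced with the standard library alone. The miniature instance below is a hypothetical stand-in for the real evaluation data; token counting with tiktoken's cl100k_base encoding is omitted here because it requires a third-party package.

```python
import json

# Hypothetical miniature instance standing in for the real evaluation data.
instance = {
    "ietf-network:networks": {
        "network": [
            {
                "network-id": "network1",
                "node": [
                    {"node-id": "sv0",
                     "network-devices:node-type": "linux-server"}
                ],
            }
        ]
    }
}

# Pretty-printed JSON as submitted to the LLMs: two-space indentation,
# one element per line.
pretty = json.dumps(instance, indent=2)

def count_json_keys(obj):
    """Count every key in a nested JSON structure (the metric in Table 6)."""
    if isinstance(obj, dict):
        return len(obj) + sum(count_json_keys(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_json_keys(v) for v in obj)
    return 0

print(count_json_keys(instance))  # 6 keys in this toy instance
```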
In addition to token-based indicators, Table 6 reports the number of nodes and physical links in each network topology, as well as the number of JSON keys in each instance. These metrics provide a clear view of both the structural complexity and the input size of each evaluation case.
As summarized in Figure 3 and Table A1 and Table 6, the four evaluation topologies enlarge the set of configuration elements and the overall description size in lock-step with the instructional workflow—basic wiring, inter-subnet routing, scalability testing, and security hardening. Network 1 and Network 4 replicate template topologies prescribed in the exercise curriculum [31], whereas Network 2 and Network 3 are synthetic scenarios devised to stress routing scale and topological diversity. Taken together, the four cases cover every leaf element listed in Table A1, thereby enabling a systematic comparison of model behavior across successive difficulty levels.

6.3. Analysis of YANG Instances

To evaluate the LLMs’ ability to interpret YANG instances, the custom YANG module was first provided to the models, followed by YANG instance data corresponding to the networks described in Section 6.2. Evaluation tasks tailored to each network were then submitted, and the responses from the LLMs were assessed. A response was judged as correct if it answered the given task accurately and completely without any structural or semantic errors, based on the intended interpretation of the YANG instance. When an incorrect response was received, the error was pointed out in a neutral and minimal manner, such as “source-node is incorrect” or “please check iptables-save-output again”. Correct answers were never provided to the LLMs. Instead, the feedback was limited to short prompts that merely indicated the presence of an error without guiding toward the solution. All LLMs were evaluated under identical conditions using this protocol, and each LLM was given a maximum of three attempts per item. The purpose of this evaluation design was not to lead the models to the correct answer but to examine whether each model could recognize and correct its own mistakes when given only minimal external cues.
Table 7 summarizes the results: the “√” column indicates the number of cases where the LLM responded correctly without errors; the “∆” column indicates responses that included initial mistakes but eventually reached the correct answer; and the “×” column shows cases in which the evaluation was terminated due to insufficient progress. Across the 750 evaluation items (25 tasks × 3 LLMs × 10 trials each), 25 instances (3.3%) contained errors in their initial replies. Subsequent follow-up prompts rectified 11 of them, leaving 14 unresolved; this yields an effective accuracy of 98.1%. All errors arose from just four tasks—11, 12, 18, 19—in the link-resolution category, while the remaining 21 tasks (including Task 1) were answered correctly on the first attempt.
Several notable errors were observed. For Task No. 11, Gemini responded with “Cannot determine from the given information.” In Task No. 12, it incorrectly identified the port as “cli0’s eth0.” In Task No. 18, it omitted “rtr2’s eth1” from the answer. For Task No. 19, ChatGPT provided incorrect outputs including “fw0’s eth1” and “not connected to any link.” Gemini, in the same task, repeatedly misidentified ports—returning “fw0’s eth1” seven times and “fw0’s eth0” twice.
These results indicate that while the LLMs generally understood the evaluation questions, difficulties remained in accurately interpreting the YANG instance structures. In particular, interpretation of the link(3) structure was a common source of error, with Gemini exhibiting this tendency more frequently.
To better interpret the observed performance differences among models, we retrospectively classified the evaluation tasks into two categories: syntax-based and inference-based. Syntax-based tasks involve straightforward extraction of individual values from the YANG instance, such as IP addresses, OS information, and routing entries. These tasks rely primarily on local traversal of the hierarchical structure and include task numbers 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 15, 16, 20, 21, 22, and 24. In contrast, inference-based tasks require interpretation of relationships between multiple elements, including resolving leafref paths, conditional constructs such as when or must statements, or understanding the semantics of iptables rules. These tasks include numbers 1, 5, 11, 12, 17, 18, 19, 23, and 25. Through this classification, we found that all three models performed reliably on syntax-based tasks, whereas inference-based tasks were more challenging overall and showed variation in model behavior. Notably, the inference-based tasks that generated the most errors—Tasks 12, 18, and 19—all require complex multi-hop reasoning through the network’s link structure. This reasoning requirement explains why models, particularly Gemini, frequently failed at link mapping by producing responses such as “Cannot determine from the given information” or incorrectly identifying the destination port.

6.4. Generation of YANG Instances

To evaluate whether LLMs can generate YANG instances that conform to the proposed YANG profile, each model was tasked with describing a given network using a YANG instance, based solely on its own understanding. In this evaluation, the correct YANG instances were not provided; instead, networks were presented in natural language. These descriptions intentionally excluded any references to leaf names defined in the YANG profile. Furthermore, values for enumeration-type leaves were not presented using enumeration terms to prevent LLMs from inferring leaf names based on value clues. This approach was intended to ensure that instance generation depended on an actual understanding of the YANG profile, rather than pattern matching.
To enhance interpretability, each LLM was provided with network descriptions written in the style it had itself previously produced, under the assumption that self-generated language would yield the highest level of comprehension.
The proposed custom YANG module extends existing modules, and the YANG profile consists of elements from both. To help LLMs build an understanding of the profile, the definition of the custom module was first provided, followed by the paths of leaf elements from existing modules. Once the YANG profile was introduced in this way, LLMs were given YANG instances and then asked to describe the networks in natural language—these descriptions are referred to as “explanatory texts” below.
Table 8 summarizes the evaluation of YANG instance generation for each network. Each generated instance was judged based on whether it satisfied the structural and semantic constraints implied by the input description and the YANG profile. A response was marked as “√” if it met these conditions without requiring correction; “∆” if it reached an acceptable state after follow-up feedback; and “×” if it failed to do so within three attempts. To ensure objective and reproducible evaluation of generated instances, we adopted a rule-based error classification scheme. Across 120 generation items (4 networks × 3 LLMs × 10 trials each), 25 initial outputs (20.8%) contained structural or semantic errors. Subsequent follow-up prompts resolved 17 of them, yielding an effective success rate of 93.3%; the remaining 8 items were terminated after three unsuccessful attempts.
An output was regarded as incorrect if it contained one or more of the following:
  • Illegal element: The instance includes leaf or container elements not defined in the proposed YANG profile.
  • Mandatory miss: Elements that are expected to appear with non-empty values, based on the semantics of the described network, are missing.
  • Value mismatch: Elements whose values are fixed or constrained by the task specification (e.g., node-type, destination-prefix) contain incorrect values.
If the output contained errors, a follow-up prompt was submitted to the LLM to indicate the issue. The feedback was based on the error classification scheme: for Illegal element and Value mismatch, the name of the relevant element was specified to indicate that it contained an error; for Mandatory miss, the parent element was identified to highlight the omission. For example, a follow-up prompt might say “node-type seems incorrect” or “check the routing section again”. This feedback was minimal and did not reveal the correct value. Each model was allowed up to three attempts under the same protocol.
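A minimal sketch of how this rule-based check can be mechanized is shown below. It assumes a flat set of permitted leaf names and a per-task table of expected values; both are hypothetical simplifications, since the actual scheme operates on the full hierarchical profile.

```python
# Hypothetical, flattened subset of the leaves defined in the profile.
ALLOWED_LEAVES = {"node-id", "node-type", "hostname", "os-info",
                  "destination-prefix", "next-hop-address"}

def classify_errors(instance_leaves, expected_values):
    """Return (category, leaf) pairs for each error in a generated instance.

    instance_leaves: dict mapping leaf name -> generated value
    expected_values: dict mapping leaf name -> value fixed by the task
    """
    errors = []
    for leaf in instance_leaves:
        if leaf not in ALLOWED_LEAVES:
            errors.append(("illegal-element", leaf))   # undefined in profile
    for leaf, expected in expected_values.items():
        if leaf not in instance_leaves:
            errors.append(("mandatory-miss", leaf))    # required leaf absent
        elif instance_leaves[leaf] != expected:
            errors.append(("value-mismatch", leaf))    # constrained value wrong
    return errors

# A misclassified switching hub surfaces as a value mismatch:
print(classify_errors(
    {"node-id": "hub0", "node-type": "repeater-hub"},
    {"node-type": "L2-switch"},
))  # [('value-mismatch', 'node-type')]
```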
For Networks 1 and 3, the YANG instances generated by the LLMs aligned well with expectations.
For Network 2, the outputs varied. ChatGPT consistently misclassified hub0 as a repeater hub across all 10 instances, while correctly identifying hub1 as a switching hub. Table 9 lists every switching hub that appeared in the evaluation together with the phrase in the explanatory text that indicated its type. Although the actual explanatory texts supplied to ChatGPT were written in Japanese, the table shows their literal English renderings for the reader’s convenience. ChatGPT produced only one misclassification—hub0 in Network 2—while it classified all remaining devices correctly. The erroneous phrase contains the verb clause “relays Ethernet signals”, whereas every correct case explicitly contains the word “switch” (e.g., “switching”, “Layer-2 switch”, or “L2 switch”), a direct lexical cue that the device is a switching hub.
The results for Network 4 also differed across LLMs. In nine of ChatGPT’s instances, the interface(20) of fw0 lacked a description of the bridge interface. Since the firewall in this network functions as a transparent device bridging eth0 and eth1, the bridge interface should have been described in both bridge(29) and interface(20). Claude and Gemini received explanatory texts that described the relationship with the bridge interface separately in both interfaces(19) and bridges(28). In contrast, the explanation given to ChatGPT described the relationship only in interfaces(19) and referenced the bridge only indirectly through that container. This implies that ChatGPT tends to simplify representations in both explanatory texts and generated instances, possibly omitting structural elements it regards as redundant when the relationship is already implied within a single container (i.e., interfaces(19)). However, it remains unclear whether this behavior is due to the model’s internal bias toward simplification or a sensitivity to the input structure. Further controlled experiments using parallel explanatory texts are needed to isolate the cause.
In six of the YANG instances generated by Gemini, natural language descriptions were placed in iptables-save-output(43). Although the explanatory text for this element was provided in natural language, Gemini’s output differed from the given phrasing. Notably, Gemini correctly described this element in the remaining four instances and successfully handled other newly defined elements (see Section 5.3). This suggests that Gemini is not incapable of handling custom elements. However, whereas most new elements are defined as enumeration types, iptables-save-output(43) is a string-type element whose purpose is specified in a description statement as “storing the output of iptables-save.” These observations suggest that Gemini may have a tendency to overlook or deprioritize descriptive constraints when handling newly defined elements. It is also possible that Gemini interprets string-type elements as free-form fields when no explicit formatting or examples are present in the prompt. Unlike enumeration types, which restrict possible outputs, iptables-save-output(43) provides only a descriptive instruction, which may not have been sufficiently emphasized for the model to generate configuration-like content. Further examination with constrained prompts or formatting examples would be required to verify this hypothesis.
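For reference, the kind of content the string leaf is intended to carry is a verbatim iptables-save dump rather than prose. The fragment below is illustrative only and is not taken from the evaluation data:

```
# Illustrative iptables-save output (not from the evaluation data)
*filter
:INPUT ACCEPT [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
-A FORWARD -p tcp --dport 80 -j ACCEPT
-A FORWARD -p icmp -j ACCEPT
COMMIT
```

Gemini’s failing outputs replaced dumps of this form with natural-language paraphrases of the rules.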

7. Conclusions

This study proposed a YANG profile designed to efficiently facilitate the exchange of network configuration information between applications and large language models (LLMs) in network security exercise environments. The profile is based on the ietf-network module defined in RFC 8345 and was extended to accommodate exercise-specific requirements. In particular, it introduces elements representing Linux-specific network interface configurations, firewall settings using iptables, and the functional roles of devices in exercises.
Evaluation experiments demonstrated that the proposed YANG profile is generally well understood and appropriately processed by leading LLMs such as ChatGPT, Claude, and Gemini. All models showed high accuracy in interpreting and analyzing basic network topology elements. Moreover, most exercise-specific elements—including OS types and custom node types—were correctly interpreted and generated, with only a few exceptions. The experiments also revealed several tendencies among the LLMs: device type identification was sensitive to how descriptions were phrased, some models favored simplified expressions in their outputs, and adherence to description statements was not always guaranteed. These findings offer valuable insights for designing effective information-sharing mechanisms between LLMs and network-oriented applications.
The YANG profile proposed in this study expands the potential for AI utilization in educational contexts by serving as a foundational technology for bidirectional integration between LLMs and network exercise systems. This integration provides tangible benefits to various stakeholders in cybersecurity education, which relies heavily on hands-on labs using Linux environments.
  • For Lab Practitioners (Instructors and TAs): This work addresses the critical challenge of scalability in hands-on education. Lab practitioners can leverage this profile to automate configuration management, alleviating the burden of manually inspecting hundreds of iptables rules or complex network settings. This allows them to focus on higher-level teaching rather than routine troubleshooting.
  • For Students: The primary benefit is access to personalized and immediate support. When troubleshooting complex multi-subnet topologies or firewall misconfigurations, students can receive real-time feedback from an LLM. This allows them to focus on understanding security concepts rather than debugging syntax errors, fostering deeper learning through independent problem solving.
  • For Researchers: The extensible design of the YANG profile, as demonstrated in this study, indicates potential for future expansion. It is anticipated that the profile could serve as a foundation for future extensions to support more diverse exercise scenarios, such as configuring other Linux-based security tools (e.g., intrusion detection systems) or managing the network configurations of containerized environments. This suggests that the present work can serve as a versatile starting point for research into next-generation educational platforms.
This exploratory study evaluated only four curriculum-inspired synthetic topologies, which restricts ecological validity. Moreover, the current evaluation relies exclusively on the exact match ratio and uses a modest dataset, leading to wide confidence intervals. To address these limitations, I am preparing an Institutional Review Board (IRB) application and, pending approval, will conduct a large-scale assessment on anonymized student submissions in the next academic term. Additional directions include (1) validating the profile’s usability in real classrooms, (2) extending coverage to more complex network architectures, (3) implementing and measuring the optional vocabulary-alignment layer, (4) improving LLM accuracy in understanding and generating configurations, (5) designing an operational framework that ensures privacy and security while enabling automatic verification, (6) automating the evaluation pipeline by publishing a deterministic pyang + JSON-diff grading script to eliminate subjective multi-turn interactions, and (7) assessing robustness under noisy, partial, or adversarial YANG inputs such as missing sub-trees or permuted iptables rules.
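The JSON-diff half of such a deterministic grading step could be sketched with the standard library alone; the schema-validation half (pyang) is omitted here, and the instance names are hypothetical.

```python
def json_diff(expected, actual, path=""):
    """Return human-readable differences between two JSON trees."""
    diffs = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for key in sorted(set(expected) | set(actual)):
            sub = f"{path}/{key}"
            if key not in actual:
                diffs.append(f"missing: {sub}")       # mandatory miss
            elif key not in expected:
                diffs.append(f"illegal: {sub}")       # element not in reference
            else:
                diffs.extend(json_diff(expected[key], actual[key], sub))
    elif expected != actual:
        diffs.append(f"mismatch: {path} ({expected!r} != {actual!r})")
    return diffs

# Hypothetical reference instance vs. a generated instance:
expected = {"node": {"node-id": "sv0", "node-type": "linux-server"}}
actual = {"node": {"node-id": "sv0", "node-type": "linux-client"}}
print(json_diff(expected, actual))
```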
The outcomes of this research provide a foundational basis for promoting the integration of AI into network exercise environments such as LiNeS Cloud, offering both practical and theoretical contributions to the field.
Compared with the preliminary work presented at the 7th International Conference on Information and Computer Technologies (ICICT) 2024 [11], this extended article makes three principal contributions:
  • Profile design: We distil explicit design principles and publish a self-contained YANG profile that covers Linux-specific interfaces, iptables rules, and exercise-level roles.
  • Multi-LLM evaluation: The profile is assessed on ChatGPT 4o, Claude 3.7, and Gemini 2.0, revealing model-specific strengths and failure patterns.
  • Open artefacts: All YANG modules, instance datasets, and grading scripts are released under an open-source licence to foster replication and reuse.

Funding

This research was funded by Japan Society for the Promotion of Science (JSPS) KAKENHI grant numbers 20K12108 and 24K06257.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Full YANG Tree of the Proposed Profile

Figure A1. Complete YANG tree extending ietf-network with exercise-specific elements.

Appendix B. Leaf-Element Usage Across the Evaluation Networks

Table A1. Leaf element usage across the four evaluation networks, where √ indicates usage and × indicates non-usage.
Leaf                         1    2    3    4
network-id(2)                √    √    √    √
link-id(4)                   √    √    √    √
source-node(6)               √    √    √    √
source-tp(7)                 √    √    √    √
dest-node(9)                 √    √    √    √
dest-tp(10)                  √    √    √    √
node-id(12)                  √    √    √    √
node-type(13)                √    √    √    √
power-state(14)              √    √    √    √
tp-id(16)                    √    √    √    √
hostname(17)                 √    √    √    √
os-info(18)                  √    √    √    √
name(21)                     √    √    √    √
bridge-name(22)              ×    ×    ×    √
oper-status(23)              √    √    √    √
phys-address(24)             √    √    √    √
ip(26)                       √    √    √    √
netmask(27)                  √    √    √    √
name(30)                     ×    ×    ×    √
name(34)                     √    √    √    √
route-preference(37)         √    √    √    √
destination-prefix(38)       √    √    √    √
next-hop-address(40)         ×    √    √    ×
outgoing-interface(41)       √    √    √    √
iptables-save-output(43)     ×    ×    ×    √
service-name(45)             √    √    √    ×
local-address(47)            √    √    √    ×
local-port(48)               √    √    √    ×

References

  1. Mukherjee, M.; Le, N.T.; Chow, Y.-W.; Susilo, W. Strategic Approaches to Cybersecurity Learning: A Study of Educational Models and Outcomes. Information 2024, 15, 117. [Google Scholar] [CrossRef]
  2. Huang, Y.; Du, H.; Zhang, X.; Niyato, D.; Kang, J.; Xiong, Z.; Wang, S.; Huang, T. Large Language Models for Networking: Applications, Enabling Techniques, and Challenges. arXiv 2023, arXiv:2311.17474. [Google Scholar] [CrossRef]
  3. Wang, C.; Scazzariello, M.; Farshin, A.; Kostić, D.; Chiesa, M. Making Network Configuration Human Friendly. arXiv 2023. [Google Scholar] [CrossRef]
  4. Mani, S.K.; Zhou, Y.; Hsieh, K.; Segarra, S.; Eberl, T.; Azulai, E.; Frizler, I.; Chandra, R.; Kandula, S. Enhancing Network Management Using Code Generated by Large Language Models. In Proceedings of the 22nd ACM Workshop on Hot Topics in Networks (HotNets’23), Cambridge, MA, USA, 28–29 November 2023; pp. 196–204. [Google Scholar]
  5. ChatGPT. Available online: https://chat.openai.com/chat (accessed on 18 April 2025).
  6. Claude. [AI Assistant]. Available online: https://www.anthropic.com/claude (accessed on 18 April 2025).
  7. Gemini. Available online: https://gemini.google.com (accessed on 18 April 2025).
  8. Tateiwa, Y. LiNeS Cloud: A Web-Based Hands-On System for Network Security Classes with Intuitive and Seamless Operability and Light-Weight Responsiveness. IEICE Trans. Inf. Syst. 2022, E105.D, 1557–1567. [Google Scholar] [CrossRef]
  9. Bjorklund, M. (Ed.) The YANG 1.1 Data Modeling Language; RFC 7950; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2016. [Google Scholar]
  10. RFC INDEX. Available online: https://www.rfc-editor.org/rfc-index.html (accessed on 18 April 2025).
  11. Tateiwa, Y. Development of Dialogue Feature between Participants and ChatGPT in Network Security Exercise System. In Proceedings of the 2024 7th International Conference on Information and Computer Technologies (ICICT), Honolulu, HI, USA, 15–17 March 2024; pp. 479–484. [Google Scholar]
  12. Clemm, A.; Medved, J.; Varga, R.; Bahadur, N.; Ananthakrishnan, H.; Liu, X. A YANG Data Model for Network Topologies; RFC 8345; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  13. Wågbrant, S.; Dahlén, R.V. Automated Network Configuration: A Comparison Between Ansible, Puppet, and Saltstack for Network Configuration. Bachelor’s Thesis, Mälardalen University, Västerås, Sweden, 2022. [Google Scholar]
  14. Perrin, S. Multi-Vendor Automation for Established IP Networks: A Telefónica Case Study; Heavy Reading: New York, NY, USA, 2020; Available online: https://www.cisco.com/c/dam/en/us/products/collateral/cloud-systems-management/crosswork-network-automation/telefonica-light-reading-white-paper.pdf (accessed on 23 June 2025).
  15. Nedyalkov, I. Application of GNS3 to Study the Security of Data Exchange between Power Electronic Devices and Control Center. Computers 2023, 12, 101. [Google Scholar] [CrossRef]
  16. Harahus, M.; Cavojský, M.; Bugár, G.; Pleva, M. Interactive Network Learning: An Assessment of EVE-NG Platform in Educational Settings. Acta Electrotech. Inform. 2023, 23, 3–9. [Google Scholar] [CrossRef]
  17. Karagiannis, S.; Ntantogian, C.; Magkos, E.; Ribeiro, L.L.; Campos, L. PocketCTF: A Fully Featured Approach for Hosting Portable Attack and Defense Cybersecurity Exercises. Information 2021, 12, 318. [Google Scholar] [CrossRef]
  18. Yang, A.C.; Lin, J.Y.; Lin, C.Y.; Ogata, H. Enhancing Python Learning with PyTutor: Efficacy of a ChatGPT-Based Intelligent Tutoring System in Programming Education. Comput. Educ. Artif. Intell. 2024, 7, 100309. [Google Scholar] [CrossRef]
  19. Espinha Gasiba, T.; Iosif, A.-C.; Kessba, I.; Amburi, S.; Lechner, U.; Pinto-Albuquerque, M. May the Source Be with You: On ChatGPT, Cybersecurity, and Secure Coding. Information 2024, 15, 572. [Google Scholar] [CrossRef]
  20. OpenConfig Working Group. OpenConfig Data Models (YANG v1.0). 2018–Present. Available online: https://www.openconfig.net (accessed on 24 June 2025).
  21. Bjorklund, M.; Berger, L. YANG Tree Diagrams; RFC 8340; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  22. IEEE 802.1 Working Group. ieee802-dot1q-bridge YANG Module. 2018. Available online: https://github.com/YangModels/yang/blob/main/standard/ieee/published/802.1/ieee802-dot1q-bridge.yang (accessed on 18 April 2025).
  23. IEEE 802.1 Working Group. ieee802-dot1q-types YANG Module. 2018. Available online: https://github.com/YangModels/yang/blob/main/standard/ieee/published/802.1/ieee802-dot1q-types.yang (accessed on 18 April 2025).
  24. Bjorklund, M. A YANG Data Model for Interface Management; RFC 8343; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  25. Schönwälder, J. Common YANG Data Types; RFC 6991; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2013. [Google Scholar]
  26. Bjorklund, M. A YANG Data Model for IP Management; RFC 8344; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  27. Lhotka, L.; Lindem, A.; Qu, Y. A YANG Data Model for Routing Management (NMDA Version); RFC 8349; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  28. Bierman, A.; Bjorklund, M. A YANG Data Model for System Management; RFC 7317; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2014. [Google Scholar]
  29. Watsen, K.; Wu, G.; Farrer, I. YANG Groupings for TCP Clients and TCP Servers; RFC 9643; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2024. [Google Scholar]
  30. Bierman, A. Guidelines for Authors and Reviewers of Documents Containing YANG Data Models; RFC 8407; Internet Engineering Task Force (IETF): Wilmington, DE, USA, 2018. [Google Scholar]
  31. Tateiwa, Y. Learning effectiveness of exercises using network security exercise system LiNeS Cloud. In Proceedings of the IEEE International Conference on Teaching, Assessment, and Learning for Engineering (TALE), Bengaluru, India, 9–12 December 2024. [Google Scholar]
Figure 1. Baseline vs. proposed configuration data exchange between LiNeS Cloud and LLMs: (a) baseline interaction using ad hoc configuration formats, (b) proposed bidirectional exchange based on the exercise-specific YANG profile.
Figure 2. Layered concept of the proposed YANG profile (the dashed layer is reserved for future work).
Figure 3. Networks for evaluation experiment: (a) basic client–switch–server topology; (b) two-subnet topology with router; (c) star-topology network with three routers connected to a central switch; (d) security-focused topology with firewall.
Table 2. List of prefixes and module names referenced in this paper.
Prefix | Module Name
dot1q | ieee802-dot1q-bridge [22]
dot1q-types | ieee802-dot1q-types [23]
if | ietf-interfaces [24]
inet | ietf-inet-types [25]
ip | ietf-ip [26]
nd | network-devices
nt | ietf-network-topology [12]
nw | ietf-network [12]
rt | ietf-routing [27]
sys | ietf-system [28]
tcps | ietf-tcp-server [29]
v4ur | ietf-ipv4-unicast-routing [27]
yang | ietf-yang-types [25]
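As an illustration of how the prefixes in Table 2 are consumed in practice, the sketch below resolves each prefixed step of an XPath to its defining module name, as a validator or an LLM would when interpreting paths such as /sys:system/sys:hostname. This is a minimal, hypothetical helper, not part of the published profile; only the prefix-to-module mapping itself is taken from Table 2.

```python
# Prefix-to-module mapping, transcribed from Table 2.
PREFIX_TO_MODULE = {
    "dot1q": "ieee802-dot1q-bridge",
    "dot1q-types": "ieee802-dot1q-types",
    "if": "ietf-interfaces",
    "inet": "ietf-inet-types",
    "ip": "ietf-ip",
    "nd": "network-devices",
    "nt": "ietf-network-topology",
    "nw": "ietf-network",
    "rt": "ietf-routing",
    "sys": "ietf-system",
    "tcps": "ietf-tcp-server",
    "v4ur": "ietf-ipv4-unicast-routing",
    "yang": "ietf-yang-types",
}

def expand_path(xpath: str) -> list[str]:
    """Rewrite each prefixed step of an XPath as 'module-name:local-name'."""
    steps = []
    for step in xpath.strip("/").split("/"):
        prefix, _, local = step.partition(":")
        steps.append(f"{PREFIX_TO_MODULE[prefix]}:{local}")
    return steps
```

For example, expand_path("/sys:system/sys:hostname") yields ["ietf-system:system", "ietf-system:hostname"], which matches the namespace-qualified member names used when such data is encoded in JSON.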
Table 3. Correspondence between identical elements and their reference elements from existing YANG modules.
Element | XPath
hostname (17) | /sys:system/sys:hostname
name (21) | /if:interfaces/if:interface/if:name
phys-address (24) | /if:interfaces/if:interface/if:phys-address
ip (26) | /if:interfaces/if:interface/ip:ipv4/ip:address/ip:ip
netmask (27) | /if:interfaces/if:interface/ip:ipv4/ip:address/ip:netmask
name (30) | /dot1q:bridges/dot1q:bridge/dot1q:name
name (34) | /rt:routing/rt:ribs/rt:rib/rt:name
route-preference (37) | /rt:routing/rt:ribs/rt:rib/rt:routes/rt:route/rt:route-preference
local-bind (46) | /tcps:tcp-server-grouping/tcps:local-bind
local-address (47) | /tcps:tcp-server-grouping/tcps:local-bind/tcps:local-address
local-port (48) | /tcps:tcp-server-grouping/tcps:local-bind/tcps:local-port
Table 4. Correspondence between similar elements and their reference elements, where elements are partially modified from existing YANG modules.
Element | XPath
termination-point (15) | /nw:networks/nw:network/nw:node/nt:termination-point
tp-id (16) | /nw:networks/nw:network/nw:node/nt:termination-point/nt:tp-id
bridge-name (22) | /if:interfaces/if:interface/dot1q:bridge-port/dot1q:bridge-name
oper-status (23) | /if:interfaces/if:interface/if:oper-status
address (25) | /if:interfaces/if:interface/ip:ipv4/ip:address
bridges (28) | /dot1q:bridges
bridge (29) | /dot1q:bridges/dot1q:bridge
routing (31) | /rt:routing
ribs (32) | /rt:routing/rt:ribs
rib (33) | /rt:routing/rt:ribs/rt:rib
routes (35) | /rt:routing/rt:ribs/rt:rib/rt:routes
route (36) | /rt:routing/rt:ribs/rt:rib/rt:routes/rt:route
destination-prefix (38) | /rt:routing/rt:ribs/rt:rib/rt:routes/rt:route/v4ur:destination-prefix
next-hop (39) | /rt:routing/rt:ribs/rt:rib/rt:routes/rt:route/rt:next-hop
next-hop-address (40) | /rt:routing/rt:ribs/rt:rib/rt:routes/rt:route/rt:next-hop/rt:next-hop-options/rt:simple-next-hop/v4ur:next-hop-address
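To show how the routing paths in Tables 3 and 4 translate into data access, the sketch below walks a JSON-encoded instance along the rt:routing hierarchy to recover a host's routing table. The instance fragment is hypothetical; its member names follow the XPaths above, but all values (RIB name, prefixes, addresses) are invented for illustration.

```python
# Hypothetical JSON-encoded instance fragment for one Linux router, shaped
# after the ietf-routing paths in Tables 3 and 4; all values are invented.
instance = {
    "rt:routing": {
        "rt:ribs": {
            "rt:rib": [
                {
                    "rt:name": "ipv4-main",
                    "rt:routes": {
                        "rt:route": [
                            {
                                "v4ur:destination-prefix": "192.168.2.0/24",
                                "rt:next-hop": {
                                    "v4ur:next-hop-address": "192.168.1.254"
                                },
                            }
                        ]
                    },
                }
            ]
        }
    }
}

def routes_of(inst: dict, rib_name: str) -> list[tuple[str, str]]:
    """Collect (destination-prefix, next-hop address) pairs from one named RIB."""
    for rib in inst["rt:routing"]["rt:ribs"]["rt:rib"]:
        if rib["rt:name"] == rib_name:
            return [
                (route["v4ur:destination-prefix"],
                 route["rt:next-hop"]["v4ur:next-hop-address"])
                for route in rib["rt:routes"]["rt:route"]
            ]
    return []
```

This is essentially the traversal an LLM must perform when answering routing-table questions such as tasks 3, 7, 14, and 21 in Table 7.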
Table 5. Evaluation results regarding LLMs’ understanding of YANG profile structure, with √ indicating correct understanding, ∆ indicating initially incorrect but eventually correct understanding, and × indicating failure to understand.
(Each cell gives √/∆/× counts over 10 trials per task.)
No. | Evaluation Task Summary | ChatGPT √/∆/× | Claude √/∆/× | Gemini √/∆/×
1 | Can enumerate all leaves in the custom YANG module | 10/0/0 | 10/0/0 | 10/0/0
2 | Can recognize the data type of each leaf in the custom YANG module | 10/0/0 | 10/0/0 | 10/0/0
3 | Can understand the role of each leaf in the custom YANG module | 10/0/0 | 10/0/0 | 10/0/0
4 | Can recognize the source of definition for each leaf in the YANG profile | 10/0/0 | 10/0/0 | 10/0/0
5 | Can understand leafrefs in the YANG profile | 10/0/0 | 10/0/0 | 10/0/0
6 | Can understand must statements and when statements in the YANG profile | 10/0/0 | 10/0/0 | 10/0/0
Table 6. Scale indicators of YANG instances used in evaluation.
Network | Nodes | Links | JSON Keys | GPT-4 Tokens
1 | 3 | 2 | 104 | 1164
2 | 5 | 4 | 175 | 1961
3 | 10 | 9 | 393 | 4441
4 | 5 | 4 | 176 | 2286
Table 7. Evaluation results regarding analysis of YANG instances by three LLMs, with √ indicating correct responses, ∆ indicating initially incorrect but eventually correct responses, and × indicating terminated evaluations.
(Each cell gives √/∆/× counts over 10 trials per task.)
No. | Network | Evaluation Task Summary | ChatGPT √/∆/× | Claude √/∆/× | Gemini √/∆/×
1 | 1 | To which node and port is eth0 of the host named “hub” connected? | 10/0/0 | 10/0/0 | 10/0/0
2 | 1 | What are the OS distribution and kernel of host “cli”? | 10/0/0 | 10/0/0 | 10/0/0
3 | 1 | What is the routing table of host “cli”? | 10/0/0 | 10/0/0 | 10/0/0
4 | 1 | What services are provided by the host “srv”? | 10/0/0 | 10/0/0 | 10/0/0
5 | 2 | To which node and port is eth0 of the host named “hub0” connected? | 10/0/0 | 10/0/0 | 10/0/0
6 | 2 | What are the OS distribution and kernel version of the host “rtr”? | 10/0/0 | 10/0/0 | 10/0/0
7 | 2 | What is the routing table of the host “rtr”? | 10/0/0 | 10/0/0 | 10/0/0
8 | 2 | What services are provided by the host “srv”? | 10/0/0 | 10/0/0 | 10/0/0
9 | 2 | What IP address is assigned to physical port eth1 of the host “rtr”? | 10/0/0 | 10/0/0 | 10/0/0
10 | 2 | What is the operational status of physical port eth2 on the host “rtr”? | 10/0/0 | 10/0/0 | 10/0/0
11 | 2 | Is physical port eth2 of the host “rtr” connected to a link? | 10/0/0 | 10/0/0 | 7/3/0
12 | 3 | To which node and port is eth0 of the host named “hub0” connected? | 10/0/0 | 10/0/0 | 0/2/8
13 | 3 | What are the OS distribution and kernel version of the host “rtr0”? | 10/0/0 | 10/0/0 | 10/0/0
14 | 3 | What is the routing table of the host “rtr1”? | 10/0/0 | 10/0/0 | 10/0/0
15 | 3 | What IP address is assigned to physical port eth1 of the host “rtr2”? | 10/0/0 | 10/0/0 | 10/0/0
16 | 3 | What is the operational status of physical port eth2 on the host “rtr1”? | 10/0/0 | 10/0/0 | 10/0/0
17 | 3 | Is physical port eth2 of the host “rtr1” connected to a link? | 10/0/0 | 10/0/0 | 10/0/0
18 | 3 | List all links connected to the device “hub3”. | 10/0/0 | 10/0/0 | 9/0/1
19 | 4 | To which node and port is eth1 of the host named “hub1” connected? | 8/2/0 | 10/0/0 | 1/4/5
20 | 4 | What are the OS distribution and kernel version of the host “fw0”? | 10/0/0 | 10/0/0 | 10/0/0
21 | 4 | What is the routing table of the host “fw0”? | 10/0/0 | 10/0/0 | 10/0/0
22 | 4 | What is the bridge interface of the host “fw0”? | 10/0/0 | 10/0/0 | 10/0/0
23 | 4 | Which network interfaces are connected to the bridge interface of the host “fw0”? | 10/0/0 | 10/0/0 | 10/0/0
24 | 4 | What is the value of iptables-save on the host “fw0”? | 10/0/0 | 10/0/0 | 10/0/0
25 | 4 | What does the host “fw0” filter using iptables? | 10/0/0 | 10/0/0 | 10/0/0
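Several of the tasks above (Nos. 1, 5, 12, and 19) ask which node and port sit at the far end of a given interface, which reduces to a lookup over the link list of an ietf-network-topology instance. The sketch below shows that lookup over a hypothetical link list loosely modeled on Network 1; the member layout is simplified and the values invented for illustration.

```python
# Hypothetical, simplified link list in the spirit of ietf-network-topology:
# each link joins a source and a destination termination point ("tp").
links = [
    {"source": {"node": "cli", "tp": "eth0"},
     "destination": {"node": "hub", "tp": "eth0"}},
    {"source": {"node": "hub", "tp": "eth1"},
     "destination": {"node": "srv", "tp": "eth0"}},
]

def peer_of(links: list[dict], node: str, tp: str):
    """Return the (node, termination-point) at the far end of the link
    attached to the given interface, or None if the port is unconnected."""
    for link in links:
        src, dst = link["source"], link["destination"]
        if (src["node"], src["tp"]) == (node, tp):
            return dst["node"], dst["tp"]
        if (dst["node"], dst["tp"]) == (node, tp):
            return src["node"], src["tp"]
    return None
```

A None result corresponds to tasks 11 and 17, which probe whether an operational but unwired port is recognized as having no link.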
Table 8. Evaluation results regarding generation of YANG instances by three LLMs, with √ indicating correct generation, ∆ indicating partially correct generation, and × indicating incorrect generation.
(Each cell gives √/∆/× counts over 10 trials per network.)
Network | ChatGPT √/∆/× | Claude √/∆/× | Gemini √/∆/×
1 | 10/0/0 | 10/0/0 | 10/0/0
2 | 0/10/0 | 10/0/0 | 10/0/0
3 | 10/0/0 | 10/0/0 | 10/0/0
4 | 1/4/5 | 10/0/0 | 4/3/3
Table 9. Device-type judgements obtained from ChatGPT. The phrases in column 3 are literal English translations of the Japanese explanatory texts that were actually supplied to ChatGPT.
Network | Host Name(s) | English Rendering of the Device-Type Phrase | Result
1 | hub0 | Ethernet switching hub | Correct
2 | hub0 | A dedicated Layer-2 device that relays Ethernet signals | Incorrect
2 | hub1 | Layer-2 switch | Correct
3 | hub0, hub1, hub2, hub3 | Layer-2 switch | Correct
4 | hub0, hub1 | L2 switch | Correct
Share and Cite

MDPI and ACS Style

Tateiwa, Y. Exercise-Specific YANG Profile for AI-Assisted Network Security Labs: Bidirectional Configuration Exchange with Large Language Models. Information 2025, 16, 631. https://doi.org/10.3390/info16080631
