1. Introduction
The persistent cybersecurity skills gap underscores the need for more effective educational models and strategic learning approaches [
1]. In response, network security education has increasingly relied on hands-on exercises where students design, configure, and test virtual networks to explore both attack and defense strategies. However, current exercise platforms scale poorly: instructors must manually inspect every configuration, craft personalized feedback, and shepherd troubleshooting, a workload that quickly becomes unsustainable as class sizes and scenario complexity grow. Large language models (LLMs) are promising candidates to automate these mentoring tasks, yet no standard, machine-readable format exists for exchanging network configuration data between educational platforms and LLM-based assistants. To fill this gap, we introduce a YANG profile that captures Linux and iptables semantics and is designed for bidirectional parsing by both sides.
LLMs possess broad knowledge of computer science and can respond effectively to related inquiries. Recent empirical studies confirm that they can already undertake non-trivial network engineering tasks with high accuracy. ChatNet [
2] reports a small capacity-planning case study and shows that retrieval-augmented generation (RAG) mitigates the calculator bottleneck more effectively than zero- or few-shot chain-of-thought (CoT) prompting. NETBUDDY shows that batching high-level policy requirements enables GPT-4 to generate P4 table entries and Border Gateway Protocol (BGP) configurations at roughly one-sixth the per-requirement cost [
3]. On the NeMoEval benchmark, GPT-4 achieved 88% functional correctness while generating
NetworkX-based traffic analysis code [
4]. These results substantiate the practical value of LLM assistance in network design and motivate this work, which introduces a dedicated YANG-based exchange profile for network security exercises. Through interactions with ChatGPT [
5], Claude [
6], and Gemini [
7], the author confirmed that these LLMs can provide practical guidance on basic network construction and secure network operation, often including concrete implementation examples.
In the network security exercise classes taught by the author, participants design, build, and operate networks, and observe their behavior to solve practice problems and assigned tasks. In such situations, LLMs are expected to assist in troubleshooting various issues, effectively supplementing or replacing instructors.
The LiNeS Cloud exercise environment [
8] enables the construction of virtual networks on a server using User-Mode Linux (UML) virtual machines as nodes and provides a web-based user interface for network editing. In this environment, participants use their local PCs to build networks composed of Linux-based devices (such as servers and firewalls), and perform both attack and defense activities. Under these conditions, it is straightforward for server-side application processes to collect the participants’ network configurations.
However, no standardized mechanism currently exists for exchanging structured configuration data between applications and LLMs in the context of educational network exercises—particularly one that supports Linux-specific semantics such as iptables rules and custom device roles. Challenges therefore remain in ensuring interoperability and seamless integration between LLMs and system components. While LLMs can understand both natural and formal languages, the ambiguity and lack of structure inherent in natural language prevent deterministic applications from accurately interpreting such information. As a result, it remains difficult for the LiNeS Cloud system to construct networks directly based on LLM-generated proposals (see
Figure 1a). To overcome this limitation, we introduce an exercise-specific YANG profile that enables deterministic, bidirectional exchange of network configuration data between LiNeS Cloud and LLMs, as illustrated in
Figure 1b.
YANG [
9] is a formal language designed for describing network configurations, with its specifications documented in natural language and widely accessible online. Additional configuration parameters can be incorporated by systematically extending existing YANG modules according to established rules. The author confirmed that ChatGPT, Claude, and Gemini were able to describe basic network configurations with reasonable accuracy using existing YANG modules.
However, the YANG modules defined in IETF Request for Comments (RFC) documents [
10] are insufficient for fully describing the exercise networks. This is due to the inclusion of Linux-based devices in the exercise environment, whose specific configurations—such as Linux-specific network interface settings,
iptables-based firewall rules, and exercise-specific device roles—are not adequately captured by standard YANG modules. Since these modules are primarily designed for general-purpose network devices, they lack the structural support needed to describe Linux-specific configurations.
The author developed a dialogue system within LiNeS Cloud that enables participants to interact with ChatGPT, proposing a method to share network configuration information with ChatGPT using YANG profiles [
11]. This dialogue function is expected to enable participants to consult ChatGPT for troubleshooting network design issues and implementing attack and defense strategies, thereby reducing their reliance on instructors or teaching assistants.
However, the previous study [
11] did not provide sufficient detail on the design rationale or evaluation of the YANG profile, nor did it examine compatibility with LLMs other than ChatGPT. In response, this research proposes a YANG profile designed to efficiently share exercise network configuration information between the LiNeS Cloud application and LLMs. This profile builds upon the
ietf-network module [
12], which provides a model for network topology, and extends it using three main strategies: (1) defining and adding elements specific to exercise networks (e.g., firewall configurations), (2) incorporating reusable components from existing YANG modules (e.g., interface names), and (3) modifying dependencies and conditional relationships among elements to reflect constraints in the exercise environment. This structured approach allows for a detailed representation of exercise networks in a format interpretable by both LLMs and applications. Furthermore, this study experimentally evaluates how accurately LLMs can interpret and generate network configurations based on the proposed YANG profile.
The YANG profile proposed in this study serves as a foundational technology for integrating LLMs into network exercise systems. Leveraging this technology, LLMs can comprehend the networks constructed by participants, offer feedback and improvement suggestions, and even design exercise scenarios themselves (e.g., troubleshooting networks or attack scenarios). Such collaboration with AI is expected to significantly enhance the quality and adaptability of hands-on learning experiences, thereby broadening the educational applications of AI.
The remainder of this paper is organized as follows.
Section 2 surveys five related research streams and positions our contribution.
Section 3 states the design principles for the YANG profile.
Section 4 details the profile specification, and
Section 5 describes the systematic element definition method.
Section 6 evaluates the profile with three state-of-the-art LLMs. Finally,
Section 7 concludes and outlines future work.
4. YANG Profile
The proposed YANG profile is based on the
ietf-network module defined in RFC 8345, extended with a custom
network-devices module and integrated with elements from existing modules. This profile is illustrated in
Figure A1, following the tree diagram format [
21]. The prefixes used in this YANG profile and in the following explanation are listed in
Table 2. In the subsequent explanation, elements are referenced in the format “element-name(line number)”.
Each exercise network is uniquely identified by network-id(2). Since participants may create multiple networks, this identifier is used to distinguish them. link-id(4) represents the identifier of a cable. source-node(6) and dest-node(9) contain the identifiers of the devices (i.e., node-id(12)) connected to each end of the cable. source-tp(7) and dest-tp(10) specify the identifiers of the physical ports (i.e., tp-id(16)) at both cable endpoints.
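These four leaves together encode the cable wiring, and an application (or a grading script) can resolve them mechanically. The following Python sketch illustrates this lookup using hypothetical identifiers, not data from an actual exercise network:

```python
# Hypothetical link list mirroring the structure described above:
# source-node/source-tp on one cable end, dest-node/dest-tp on the other.
links = [
    {"link-id": "cable0",
     "source": {"source-node": "cli0", "source-tp": "eth0"},
     "destination": {"dest-node": "hub0", "dest-tp": "eth1"}},
]

def peer_port(node_id, tp_id):
    """Return the (node-id, tp-id) pair at the far end of the cable,
    or None if the given port is not wired to anything."""
    for link in links:
        src, dst = link["source"], link["destination"]
        if (src["source-node"], src["source-tp"]) == (node_id, tp_id):
            return dst["dest-node"], dst["dest-tp"]
        if (dst["dest-node"], dst["dest-tp"]) == (node_id, tp_id):
            return src["source-node"], src["source-tp"]
    return None

print(peer_port("cli0", "eth0"))  # → ('hub0', 'eth1')
```

This is precisely the link-resolution step that proved error-prone for the LLMs in the evaluation of Section 6.3.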
node-id(12) identifies a device. node-type(13) indicates the device type, which can be one of the following: L2-switch, repeater-hub, linux-server, linux-client, linux-firewall, linux-router, or linux-blackhat. power-state(14) specifies whether the device is powered on or off. tp-id(16) refers to the name of a physical port on the device. Depending on the device type, a list of termination-point(15) is defined. For node-type(13) set to L2-switch or repeater-hub, the list includes eth[0-4]; for linux-server or linux-client, eth0; for linux-firewall or linux-blackhat, eth[0-1]; and for linux-router, eth[0-2]. hostname(17) is a user-assigned name to help participants distinguish between devices. os-info(18) represents a combination of Linux distribution and kernel version.
name(21) is the Linux-recognized name of a network interface (e.g., eth0, br0). bridge-name(22) indicates the bridge interface to which the network interface belongs and corresponds to name(30). oper-status(23) indicates the operational state of the interface, represented as an enumeration: up for active and down for inactive. phys-address(24) is the MAC address of the interface. ip(26) and netmask(27) specify the IP address and subnet mask, respectively. name(30) denotes the name of the bridge interface.
name(34) identifies the Routing Information Base (RIB). Although the exercise network uses a single RIB, the profile retains the original structure to help LLMs understand the model more quickly and accurately. route-preference(37) indicates the preference level of a route. destination-prefix(38) specifies the destination network address, while next-hop-address(40) indicates the next hop. outgoing-interface(41) specifies the egress interface for the route, corresponding to the value of name(21).
iptables-save-output(43) contains the firewall configuration for linux-firewall, represented by the output of the iptables-save command. service-name(45) identifies a service provided by a linux-server, with supported values including http, ssh, syslog, xinetd, telnet, and ftp. local-address(47) and local-port(48) specify the IP address and port number on which the service is provided.
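To make the hierarchy concrete, the following Python sketch assembles and pretty-prints a minimal, hypothetical instance fragment. Element names follow the tree in Figure A1, but module prefixes are omitted and most leaves are elided, so the fragment is illustrative rather than normative:

```python
import json

# Hypothetical instance fragment under the proposed profile. Module
# prefixes and most leaves are omitted for brevity; values are invented.
instance = {
    "networks": {
        "network": [{
            "network-id": "exercise-net-1",
            "node": [{
                "node-id": "sv0",
                "node-type": "linux-server",
                "power-state": "on",
                "hostname": "server-A",
                "termination-point": [{"tp-id": "eth0"}],
            }],
            # The client node cli0 on the other cable end is elided.
            "link": [{
                "link-id": "cable0",
                "source": {"source-node": "cli0", "source-tp": "eth0"},
                "destination": {"dest-node": "sv0", "dest-tp": "eth0"},
            }],
        }]
    }
}
print(json.dumps(instance, indent=2))
```

In an RFC 7951-conformant encoding, top-level and augmented members would carry module-qualified names (e.g., ietf-network:networks); the sketch drops these prefixes for readability.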
6. Evaluation Experiment
How well the proposed YANG profile can be understood and utilized by LLMs was evaluated from the following three perspectives:
Understanding of the YANG profile structure (
Section 6.1);
Analysis of YANG instances based on the YANG profile (
Section 6.3);
Generation of YANG instances based on the YANG profile (
Section 6.4).
The evaluation was conducted through interactions via a web-based dialogue interface with LLMs. The YANG profile and evaluation prompts were submitted to the LLMs through an input field, and their responses were reviewed. If a response was insufficient, follow-up questions were submitted until a satisfactory response was obtained. When an incorrect response was received, the error was pointed out. If no improvement was observed after up to three correction attempts, the evaluation was terminated for that item.
Three LLMs were used in this experiment: ChatGPT 4o (OpenAI), Claude 3.7 Sonnet (Anthropic), and Gemini 2.0 Flash (Google).
6.2. Networks for Evaluation Experiment
In the exercise environment, participants construct networks in accordance with the exercise tasks. These networks may vary between individuals depending on their experimental strategies and may also evolve as the tasks progress. Therefore, evaluation of the YANG profile based on actual student-created networks is considered future work.
Instead, the present experiment evaluates the YANG profile using representative network topologies that are commonly constructed in exercises, including those designed for beginners. The focus is on assessing whether LLMs can (1) accurately interpret all leaf elements and (2) correctly generate YANG instances, even for larger-scale networks with increased description complexity.
Figure 3 illustrates the topologies of the four networks used for evaluation. Squares represent devices, with the upper portion of each box indicating the device name (
hostname(17)) and the lower portion showing the abbreviated device type (
node-type(13)). The abbreviations used are:
client for
linux-client,
switch for
L2-switch,
server for
linux-server,
router for
linux-router,
black for
linux-blackhat, and
firewall for
linux-firewall. Lines between devices represent cables, with physical port abbreviations labeled near the connection points. Integer values correspond to interface names, such that 0 corresponds to
eth0, 1 to
eth1, and so on. Each device is configured according to its role—for example, a server has a valid IP address and provides services such as
http and
ssh.
The exercise curriculum comprises 11 practice problems and 6 exercise assignments [
31]. Network 1 corresponds to the basic wiring topology shown in Figure 1a of [
31] and is used as the starting point for 10 practice problems and one assignment. In this setting, participants learn fundamental operations—such as IP address configuration and service activation—and conduct both backdoor attacks and defensive countermeasures. Network 4 is based on the two-host topology with a transparent firewall shown in Figure 1b of [
31] and serves as the starting point for one practice problem and one assignment. This setup focuses on configuring iptables for packet filtering and log analysis. Participants may further modify these initial networks according to their experimental objectives. Network 2 and Network 3 are synthetic topologies that represent the multi-subnet and star-shaped scenarios, respectively.
The YANG instances describing each network utilize the
leaf elements listed in
Table A1, where √ indicates usage and × indicates non-usage. The elements
bridge-name(22) and
name(30) are used to represent bridging configurations implemented in firewalls.
next-hop-address(40) is used for describing routing table entries and thus appears in Networks 2 and 3, which include multiple subnets.
iptables-save-output(43) is employed to describe
iptables settings in firewalls.
service-name(45),
local-address(47), and
local-port(48) are used for defining server-side service configurations.
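Since iptables-save-output(43) stores the raw dump as a single string, a consuming application must split it back into tables, chain policies, and rules before reasoning about filtering behavior. A minimal Python sketch, using hypothetical rules rather than configurations taken from the evaluation networks:

```python
# Hypothetical iptables-save dump, as it would appear in the
# iptables-save-output(43) leaf of a linux-firewall node.
dump = """\
*filter
:INPUT DROP [0:0]
:FORWARD ACCEPT [0:0]
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p icmp -j DROP
COMMIT
"""

def parse_iptables_save(text):
    """Split an iptables-save dump into per-table chain policies and rules."""
    tables, current = {}, None
    for line in text.splitlines():
        if line.startswith("*"):                  # table header, e.g. *filter
            current = line[1:]
            tables[current] = {"policies": {}, "rules": []}
        elif line.startswith(":") and current:    # chain declaration with policy
            chain, policy = line[1:].split()[:2]
            tables[current]["policies"][chain] = policy
        elif line.startswith("-A") and current:   # append-rule line
            tables[current]["rules"].append(line)
    return tables

t = parse_iptables_save(dump)
print(t["filter"]["policies"]["INPUT"])   # → DROP
print(len(t["filter"]["rules"]))          # → 2
```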
All YANG instances submitted to the LLMs were written in a human-readable JavaScript Object Notation (JSON) format, commonly referred to as “pretty-printed JSON,” which includes indentation and line breaks. Although no specific formatter was enforced, we followed standard conventions such as two-space indentation and one element per line. Token counts were measured using OpenAI’s tiktoken library with the cl100k_base encoding. (Note that GPT-4o’s native tokenizer is o200k_base, so these counts should be read as approximate indicators of input size rather than exact GPT-4o token counts.) These token counts reflect the exact data that was submitted to GPT-4o during evaluation, without any further minification.
In addition to token-based indicators,
Table 6 reports the number of nodes and physical links in each network topology, as well as the number of JSON keys in each instance. These metrics provide a clear view of both the structural complexity and the input size of each evaluation case.
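The JSON-key count can be reproduced with a short recursive traversal. The sketch below assumes keys are counted across all nested objects and arrays; the exact convention used for Table 6 may differ in detail:

```python
import json

def count_keys(obj):
    """Recursively count JSON object keys across nested dicts and lists
    (an illustrative reimplementation of the size metric)."""
    if isinstance(obj, dict):
        return len(obj) + sum(count_keys(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_keys(v) for v in obj)
    return 0

instance = json.loads(
    '{"network": [{"network-id": "n1", "node": [{"node-id": "sv0"}]}]}'
)
print(count_keys(instance))  # → 4
```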
As summarized in
Figure 3 and
Table A1 and
Table 6, the four evaluation topologies enlarge the set of configuration elements and the overall description size in lock-step with the instructional workflow—basic wiring, inter-subnet routing, scalability testing, and security hardening. Network 1 and Network 4 replicate template topologies prescribed in the exercise curriculum [
31], whereas Network 2 and Network 3 are synthetic scenarios devised to stress routing scale and topological diversity. Taken together, the four cases cover every leaf element listed in
Table A1, thereby enabling a systematic comparison of model behavior across successive difficulty levels.
6.3. Analysis of YANG Instances
To evaluate the LLMs’ ability to interpret YANG instances, the custom YANG module was first provided to the models, followed by YANG instance data corresponding to the networks described in
Section 6.2. Evaluation tasks tailored to each network were then submitted, and the responses from the LLMs were assessed. A response was judged as correct if it answered the given task accurately and completely without any structural or semantic errors, based on the intended interpretation of the YANG instance. When an incorrect response was received, the error was pointed out in a neutral and minimal manner, such as “
source-node is incorrect” or “
please check iptables-save-output again”. Correct answers were never provided to the LLMs. Instead, the feedback was limited to short prompts that merely indicated the presence of an error without guiding toward the solution. All LLMs were evaluated under identical conditions using this protocol, and each LLM was given a maximum of three attempts per item. The purpose of this evaluation design was not to lead the models to the correct answer but to examine whether each model could recognize and correct its own mistakes when given only minimal external cues.
Table 7 summarizes the results: the “√” column indicates the number of cases where the LLM responded correctly without errors; the “∆” column indicates responses that included initial mistakes but eventually reached the correct answer; and the “×” column shows cases in which the evaluation was terminated due to insufficient progress. Across the 750 evaluation items (25 tasks × 3 LLMs × 10 trials each), 25 instances (3.3%) contained errors in their initial replies. Subsequent follow-up prompts rectified 11 of them, leaving 14 unresolved; this yields an effective accuracy of 98.1%. All errors arose from just four tasks—11, 12, 18, 19—in the link-resolution category, while the remaining 21 tasks (including Task 1) were answered correctly on the first attempt.
Several notable errors were observed. For Task No. 11, Gemini responded with “Cannot determine from the given information.” In Task No. 12, it incorrectly identified the port as “cli0’s eth0.” In Task No. 18, it omitted “rtr2’s eth1” from the answer. For Task No. 19, ChatGPT provided incorrect outputs including “fw0’s eth1” and “not connected to any link.” Gemini, in the same task, repeatedly misidentified ports—returning “fw0’s eth1” seven times and “fw0’s eth0” twice.
These results indicate that while the LLMs generally understood the evaluation questions, difficulties remained in accurately interpreting the YANG instance structures. In particular, interpretation of the link(3) structure was a common source of error, with Gemini exhibiting this tendency more frequently.
To better interpret the observed performance differences among models, we retrospectively classified the evaluation tasks into two categories: syntax-based and inference-based. Syntax-based tasks involve straightforward extraction of individual values from the YANG instance, such as IP addresses, OS information, and routing entries. These tasks rely primarily on local traversal of the hierarchical structure and include task numbers 2, 3, 4, 6, 7, 8, 9, 10, 13, 14, 15, 16, 20, 21, 22, and 24. In contrast, inference-based tasks require interpretation of relationships between multiple elements, including resolving leafref paths, conditional constructs such as when or must statements, or understanding the semantics of iptables rules. These tasks include numbers 1, 5, 11, 12, 17, 18, 19, 23, and 25. Through this classification, we found that all three models performed reliably on syntax-based tasks, whereas inference-based tasks were more challenging overall and showed variation in model behavior. Notably, the inference-based tasks that generated the most errors—Tasks 12, 18, and 19—all require complex multi-hop reasoning through the network’s link structure. This reasoning requirement explains why models, particularly Gemini, frequently failed at link mapping by producing responses such as “Cannot determine from the given information” or incorrectly identifying the destination port.
6.4. Generation of YANG Instances
To evaluate whether LLMs can generate YANG instances that conform to the proposed YANG profile, each model was tasked with describing a given network using a YANG instance, based solely on its own understanding. In this evaluation, the correct YANG instances were not provided; instead, networks were presented in natural language. These descriptions intentionally excluded any references to leaf names defined in the YANG profile. Furthermore, values for enumeration-type leaves were not presented using enumeration terms to prevent LLMs from inferring leaf names based on value clues. This approach was intended to ensure that instance generation depended on an actual understanding of the YANG profile, rather than pattern matching.
To enhance interpretability, each LLM was provided with network descriptions written in the style it had itself previously produced, under the assumption that self-generated language would yield the highest level of comprehension.
The proposed custom YANG module extends existing modules, and the YANG profile consists of elements from both. To help LLMs build an understanding of the profile, the definition of the custom module was first provided, followed by the paths of leaf elements from existing modules. Once the YANG profile was introduced in this way, LLMs were given YANG instances and then asked to describe the networks in natural language—these descriptions are referred to as “explanatory texts” below.
Table 8 summarizes the evaluation of YANG instance generation for each network. Each generated instance was judged based on whether it satisfied the structural and semantic constraints implied by the input description and the YANG profile. A response was marked as “√” if it met these conditions without requiring correction; “∆” if it reached an acceptable state after follow-up feedback; and “×” if it failed to do so within three attempts. To ensure objective and reproducible evaluation of generated instances, we adopted a rule-based error classification scheme. Across 120 generation items (4 networks × 3 LLMs × 10 trials each), 25 initial outputs (20.8%) contained structural or semantic errors. Subsequent follow-up prompts resolved 17 of them, yielding an effective success rate of 93.3%; the remaining 8 items were terminated after three unsuccessful attempts.
An output was regarded as incorrect if it contained one or more of the following:
Illegal element: The instance includes leaf or container elements not defined in the proposed YANG profile.
Mandatory miss: Elements that are expected to appear with non-empty values, based on the semantics of the described network, are missing.
Value mismatch: Elements whose values are fixed or constrained by the task specification (e.g., node-type, destination-prefix) contain incorrect values.
If the output contained errors, a follow-up prompt was submitted to the LLM to indicate the issue. The feedback was based on the error classification scheme: for Illegal element and Value mismatch, the name of the relevant element was specified to indicate that it contained an error; for Mandatory miss, the parent element was identified to highlight the omission. For example, a follow-up prompt might say “node-type seems incorrect” or “check the routing section again”. This feedback was minimal and did not reveal the correct value. Each model was allowed up to three attempts under the same protocol.
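The three error classes lend themselves to a mechanical check. The following Python sketch applies them to a single node entry, with the profile's allowed leaf set and the task-fixed values reduced to small hypothetical sets:

```python
# Hypothetical reduced rule set for illustration; the real check would
# derive ALLOWED from the YANG profile and FIXED from the task text.
ALLOWED = {"network-id", "node-id", "node-type", "hostname", "os-info"}
FIXED = {"node-type": {"sv0": "linux-server"}}

def classify_errors(node):
    """Return (error-class, element) pairs for one node entry."""
    errors = []
    for key in node:                              # Illegal element
        if key not in ALLOWED:
            errors.append(("illegal-element", key))
    for key in ("node-id", "node-type"):          # Mandatory miss
        if not node.get(key):
            errors.append(("mandatory-miss", key))
    expected = FIXED["node-type"].get(node.get("node-id"))
    if expected and node.get("node-type") and node["node-type"] != expected:
        errors.append(("value-mismatch", "node-type"))  # Value mismatch
    return errors

print(classify_errors(
    {"node-id": "sv0", "node-type": "linux-client", "color": "red"}
))  # → [('illegal-element', 'color'), ('value-mismatch', 'node-type')]
```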
For Networks 1 and 3, the YANG instances generated by the LLMs aligned well with expectations.
For Network 2, the outputs varied. ChatGPT consistently misclassified
hub0 as a repeater hub across all 10 instances, while correctly identifying
hub1 as a switching hub.
Table 9 lists every switching hub that appeared in the evaluation together with the phrase in the explanatory text that indicated its type. Although the actual explanatory texts supplied to ChatGPT were written in Japanese, the table shows their literal English renderings for the reader’s convenience. ChatGPT produced only one misclassification—
hub0 in Network 2—while it classified all remaining devices correctly. The erroneous phrase contains only the verb phrase “relays Ethernet signals”, whereas every correct case explicitly contains the word “switch” (e.g., “switching”, “Layer-2 switch”, or “L2 switch”), a direct lexical cue that the device is a switching hub.
The results for Network 4 also differed across LLMs. In nine of ChatGPT’s instances, the interface(20) of fw0 lacked a description of the bridge interface. Since the firewall in this network functions as a transparent device bridging eth0 and eth1, the bridge interface should have been described both in bridge(29) and interface(20). Claude and Gemini received explanatory texts that described the relationship with the bridge interface separately in both interfaces(19) and bridges(28). In contrast, the explanation given to ChatGPT described the relationship only in interfaces(19) and referenced the bridge only indirectly from there. This implies that ChatGPT tends to simplify representations in both explanatory texts and generated instances, omitting structural elements when the relationship is already implied within a single container (i.e., interfaces(19)). However, it remains unclear whether this behavior is due to the model’s internal bias toward simplification or a sensitivity to the input structure. Further controlled experiments using parallel explanatory texts are needed to isolate the cause.
In six of the YANG instances generated by Gemini, natural language descriptions were placed in
iptables-save-output(43). Although the explanatory text for this element was provided in natural language, Gemini’s output differed from the given phrasing. Notably, Gemini correctly described this element in the remaining four instances and successfully handled other newly defined elements (see
Section 5.3). This suggests that Gemini is not incapable of handling custom elements. However, whereas most new elements are defined as enumeration types,
iptables-save-output(43) is a string-type element whose purpose is specified in a
description statement as “storing the output of iptables-save.” These observations suggest that Gemini may have a tendency to overlook or deprioritize descriptive constraints when handling newly defined elements. It is also possible that Gemini interprets string-type elements as free-form fields when no explicit formatting or examples are present in the prompt. Unlike enumeration types, which restrict possible outputs,
iptables-save-output(43) provides only a descriptive instruction, which may not have been sufficiently emphasized for the model to generate configuration-like content. Further examination with constrained prompts or formatting examples would be required to verify this hypothesis.
7. Conclusions
This study proposed a YANG profile designed to efficiently facilitate the exchange of network configuration information between applications and large language models (LLMs) in network security exercise environments. The profile is based on the ietf-network module defined in RFC 8345 and was extended to accommodate exercise-specific requirements. In particular, it introduces elements representing Linux-specific network interface configurations, firewall settings using iptables, and the functional roles of devices in exercises.
Evaluation experiments demonstrated that the proposed YANG profile is generally well understood and appropriately processed by leading LLMs such as ChatGPT, Claude, and Gemini. All models showed high accuracy in interpreting and analyzing basic network topology elements. Moreover, most exercise-specific elements—including OS types and custom node types—were correctly interpreted and generated, with only a few exceptions. The experiments also revealed several tendencies among the LLMs: device type identification was sensitive to how descriptions were phrased, some models favored simplified expressions in their outputs, and adherence to description statements was not always guaranteed. These findings offer valuable insights for designing effective information-sharing mechanisms between LLMs and network-oriented applications.
The YANG profile proposed in this study expands the potential for AI utilization in educational contexts by serving as a foundational technology for bidirectional integration between LLMs and network exercise systems. This integration provides tangible benefits to various stakeholders in cybersecurity education, which relies heavily on hands-on labs using Linux environments.
For Lab Practitioners (Instructors and TAs): This work addresses the critical challenge of scalability in hands-on education. Lab practitioners can leverage this profile to automate configuration management, alleviating the burden of manually inspecting hundreds of iptables rules or complex network settings. This allows them to focus on higher-level teaching rather than routine troubleshooting.
For Students: The primary benefit is access to personalized and immediate support. When troubleshooting complex multi-subnet topologies or firewall misconfigurations, students can receive real-time feedback from an LLM. This allows them to focus on understanding security concepts rather than debugging syntax errors, fostering deeper learning through independent problem solving.
For Researchers: The extensible design of the YANG profile, as demonstrated in this study, indicates potential for future expansion. It is anticipated that the profile could serve as a foundation for future extensions to support more diverse exercise scenarios, such as configuring other Linux-based security tools (e.g., intrusion detection systems) or managing the network configurations of containerized environments. This suggests that the present work can serve as a versatile starting point for research into next-generation educational platforms.
This exploratory study evaluated only four curriculum-inspired synthetic topologies, which restricts ecological validity. Moreover, the current evaluation relies exclusively on the exact match ratio and uses a modest dataset, leading to wide confidence intervals. To address these limitations, the author is preparing an Institutional Review Board (IRB) application and, pending approval, will conduct a large-scale assessment on anonymized student submissions in the next academic term. Additional directions include (1) validating the profile’s usability in real classrooms, (2) extending coverage to more complex network architectures, (3) implementing and measuring the optional vocabulary-alignment layer, (4) improving LLM accuracy in understanding and generating configurations, (5) designing an operational framework that ensures privacy and security while enabling automatic verification, (6) automating the evaluation pipeline by publishing a deterministic pyang + JSON-diff grading script to eliminate subjective multi-turn interactions, and (7) assessing robustness under noisy, partial, or adversarial YANG inputs such as missing sub-trees or permuted iptables rules.
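For future direction (6), the deterministic grading step could rest on a structural JSON diff such as the following Python sketch; schema validation with pyang is assumed to run beforehand, and the path notation is illustrative:

```python
def json_diff(expected, actual, path=""):
    """Minimal structural diff between two JSON instances, reporting
    missing keys, extra keys, and value mismatches with their paths."""
    diffs = []
    if isinstance(expected, dict) and isinstance(actual, dict):
        for k in sorted(set(expected) | set(actual)):
            p = f"{path}/{k}"
            if k not in actual:
                diffs.append(("missing", p))
            elif k not in expected:
                diffs.append(("extra", p))
            else:
                diffs.extend(json_diff(expected[k], actual[k], p))
    elif expected != actual:
        diffs.append(("mismatch", path))
    return diffs

# Hypothetical grading example: expected vs. LLM-generated node entry.
exp = {"node-id": "sv0", "node-type": "linux-server"}
act = {"node-id": "sv0", "node-type": "linux-client", "hostname": "sv"}
print(json_diff(exp, act))
# → [('extra', '/hostname'), ('mismatch', '/node-type')]
```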
The outcomes of this research provide a foundational basis for promoting the integration of AI into network exercise environments such as LiNeS Cloud, offering both practical and theoretical contributions to the field.
Compared with the preliminary work presented at the 7th International Conference on Information and Computer Technologies (ICICT) 2024 [
11], this article makes three principal contributions:
Profile design: We distil explicit design principles and publish a self-contained YANG profile that covers Linux-specific interfaces, iptables rules, and exercise-level roles.
Multi-LLM evaluation: The profile is assessed on ChatGPT 4o, Claude 3.7, and Gemini 2.0, revealing model-specific strengths and failure patterns.
Open artefacts: All YANG modules, instance datasets, and grading scripts are released under an open-source licence to foster replication and reuse.