Optimizing RTL Code Obfuscation: New Methods Based on XML Syntax Tree

Yi, Hanwen; Zhang, Jin; Liu, Sheng

doi:10.3390/app14010243


by Hanwen Yi ^1,†, Jin Zhang ^1,† and Sheng Liu ^2,3,*

^1 School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China

^2 College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

^3 Key Laboratory of Advanced Microprocessor Chips and Systems, National University of Defense Technology, Changsha 410073, China

^* Author to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Appl. Sci. 2024, 14(1), 243; https://doi.org/10.3390/app14010243

Submission received: 26 November 2023 / Revised: 14 December 2023 / Accepted: 19 December 2023 / Published: 27 December 2023


Abstract

As the most widely used description code in digital circuits and system on chip (SoC) designs, register transfer level (RTL) code must be kept secure. Code obfuscation is a typical method of protecting RTL code, but popular obfuscation methods are not fully applicable to it, and some RTL code obfuscation tools suffer from incomplete functionality or obfuscation errors. To address these issues, this paper studies the RTL code security problem from the perspective of obfuscation. Based on the extensible markup language (XML) syntax tree generated by parsing RTL code, a complete RTL code refactoring model is constructed, and four targeted RTL code obfuscation methods are proposed: layout obfuscation, parameter obfuscation, critical path obfuscation, and code increment obfuscation. Using the developed obfuscation tool, the performance and effectiveness of the obfuscation methods are assessed, and the equivalence between the obfuscated code and the source code is tested. The experimental results show that the proposed obfuscation methods offer higher practicability and reliability, with obfuscation coverage that remains stable at over 98% and preservation of compiler indicative comments.

Keywords:

integrated circuit; RTL code; code security; code obfuscation; XML syntax tree

1. Introduction

The field of integrated circuit (IC) technology is advancing rapidly, leading to increased commercial cooperation within the IC industry. Because register transfer level (RTL) code is the most widely used description language in digital circuits and system on chip (SoC) designs, the industry hopes to achieve win-win business cooperation through the intellectual property (IP) embodied in RTL code projects [1]. However, security concerns related to IP often cause conflicts of interest among companies in practical collaborations. In August 2022, ARM filed a lawsuit against Qualcomm, alleging unauthorized usage of ARM’s IP in chip design [2]. Ensuring IP security during business cooperation is therefore very important. In the preliminary communication stage of business cooperation, if RTL code is shared in plain text, the security of the information inside the code is difficult to guarantee. In IP authorization, RTL code without any protective measures makes it hard to ensure that the interests of the IP provider remain unharmed after authorization. To prevent such infringements, researchers employ methods such as code encryption and obfuscation to secure the RTL code, thereby promoting secure and equitable business practices within the IC industry.

Code obfuscation and code encryption represent two distinct methodologies for safeguarding RTL code. Code obfuscation functions as a countermeasure against human analysis and reverse engineering by transforming the source code into a protected format with indiscernible semantics [3]. Obfuscated code thwarts attackers from analyzing the control flow and data interaction patterns within the code, thereby preserving the intellectual property of the developer or organization. Obfuscation focuses on making the code less readable, which is more appropriate when the developer does not want others to be able to analyze the code to obtain the fruits of their labor. Code encryption is a method of converting code into ciphertext by using a key, and is preferred to prevent intellectual property leakage through key management. Nevertheless, due to the necessity of compliance with the regulations of the Institute of Electrical and Electronics Engineers (IEEE) [4], the encryption can only rely on the key of the electronic design automation (EDA) software company. This arrangement introduces a potential vulnerability, as there exists a risk of information disclosure from the EDA software company endowed with key management responsibilities. Code obfuscation is more advantageous than code encryption in preventing code leakage and avoiding code parsing.

Current obfuscation research on RTL code still has problems. Several widely used obfuscation methods are not well suited to RTL code. For instance, data obfuscation techniques designed to complicate mathematical operations tend, when applied to RTL code, to increase the number of operation units in the circuit. This often conflicts with practical development scenarios that impose area, timing, and power requirements, which call for obfuscation methods that are more compatible with RTL code. In addition, certain existing RTL code obfuscation tools analyze code incorrectly, leading to the obfuscation of information that should be preserved. The obfuscation process of these tools is also flawed: careful observation may reveal patterns, so the obfuscation can potentially be cracked. These flaws reduce the efficiency and accuracy of RTL code obfuscation.

In response to the industry’s demand for code obfuscation, this paper proposes RTL code obfuscation methods based on the Extensible Markup Language (XML) syntax tree. The proposed approach analyzes the XML syntax tree, performs obfuscation operations that respect the characteristics and engineering requirements of RTL code, and is implemented in an obfuscation tool. During obfuscation, the tool abandons the original code style and produces obfuscated code with better security. The proposed obfuscation methods are as follows: (1) obfuscate crucial information, such as names, to conceal its meaning, with the obfuscated information being irreversible, and remove unnecessary spaces and comments from the code to increase the complexity of analysis; (2) hide all parameters that carry important information and add false parameters to the code to increase the complexity of analysis; (3) insert a well-concealed module into critical paths to extend them, thereby degrading the synthesized circuit; (4) split all “aggregate” statements to increase the amount of code while ensuring that the functions of the obfuscated code are consistent with the source code.

Finally, four sets of open-source IP cores underwent obfuscation, followed by formal verification to demonstrate the accuracy and equivalence of the obfuscated code. The obfuscation tool was experimentally compared with other tools to verify that the proposed obfuscation method has certain advantages.

2. Related Work

Safeguarding source code against unauthorized use and theft has long been a paramount concern for practitioners in relevant fields, and research on the issue is ongoing. Many techniques for enhancing code security have been developed in response, and code obfuscation is one of the most important.

In the 1990s, Collberg et al. [5] first elaborated on the concept and classification of code obfuscation. They put forward various evaluation criteria to assess the effectiveness and performance of obfuscation algorithms, and categorized obfuscation transformations into four distinct types: layout obfuscation, data obfuscation, control obfuscation, and preventive obfuscation. Building on this work, Castillo et al. [6] extended these obfuscation methods to RTL code. However, the outcomes did not meet expectations. Although certain methods proved unsuitable for RTL code, these early efforts laid the groundwork for subsequent research in RTL code obfuscation.

Hoffmann et al. [7] proposed an opaque predicate obfuscation technique that is difficult to detect through static analysis. Existing suitable signals, or small added circuits, serve as opaque predicates in the obfuscation while they are not enabled or are in a waiting state in the circuit. This technique improves concealment and keeps overhead low.

Sengupta and Chaurasia [8] introduce a novel hybrid approach in which multilevel structural obfuscation is employed to safeguard RTL descriptions, coupled with encrypted chromosomal DNA impressions for securing IP in digital signal processing (DSP) applications. The method involves embedding stealth DNA impressions into structure-obfuscated DSP designs through robust encoding and encryption using multi-iterative Feistel ciphers. This approach demonstrates greater robustness in terms of providing digital evidence and resistance against tampering.

Sazadur Rahman et al. [9] proposed an RTL finite state machine obfuscation technology called ReTrustFSM. This technology includes three types of secrecy methods: explicit secrecy using external keys, implicit secrecy based on specific clocks, and internal secrecy through hidden FSM transformation functions. The adoption of this hybrid obfuscation scheme significantly enhances the circuit’s resilience against attacks on logic locking, especially attacks against FSM obfuscation.

As electronic packaging continues to evolve and innovate with technologies such as finite substrates with layered dielectric forms [10], an increasing number of hardware obfuscation techniques are coming to the forefront. Koushanfar and Qu [11] first proposed the concept of hardware metering, specifically by selecting a part of the circuit for programming, and finally, the circuit gets a unique signature.

Li et al. [12] devised a key-based obfuscation scheme aimed at safeguarding intellectual property from infringement. The method is to embed a key in the power-up state of an integrated circuit. Without the corresponding key for unlocking, the circuit’s efficiency is significantly reduced. Additionally, the implicit key can be used as a concealed watermark.

Islam and Katkoori [13] proposed three techniques for securing RTL Intellectual Property. The first technique is proposed in the early design phase of High-Level Synthesis (HLS). Given a control data flow graph, obfuscation logic is inserted during and after HLS using synthesized information. The second technique proposes a design lockout mechanism that embeds a comparator on obfuscation logic output to check if the key is correct. When the number of errors exceeds the allowed number, a finite state machine checker is used to enforce design lockout. The third technique designs four obfuscated module variants to disguise the RTL design.

Karmakar and Chattopadhyay [14] present several strategies that can prevent SAT attacks by obfuscating scan-based Design for Testability (DfT) infrastructures. In addition, several potential solutions for inserting key gates into circuits are presented to ensure protection against various attacks that utilize vulnerable key gate locations. Finally, a finite state machine (FSM) watermarking strategy based on cellular automata is also presented to help detect potential theft of the designer’s intellectual property by an adversary.

3. Code Refactoring Model Based on XML Syntax Tree

3.1. RTL Code Parse

RTL code inherently has unique hardware operating characteristics: multiple components run in parallel, sequential logic is governed by clocks, and combinational logic is sensitive to its inputs. These features make the parsing of RTL code more complex and challenging than the parsing of other code.

Given the complexity and uncertainty in the syntax of RTL code, the development of a compliant parser can incur significant costs and may result in substantial issues, such as the omission of essential functionality or parsing errors. Consequently, opting for a comprehensive and user-friendly parser is the preferred approach.

In this paper, various open-source parsing tools for RTL code are collected, tested, and summarized, and the most comprehensive and user-friendly of them is selected for further study. As illustrated in Table 1, experimental testing reveals that certain parsers exhibit incomplete support for RTL syntax, while others produce parsing results that are not easily analyzable.

Experiments with Verilator demonstrate that it supports RTL code syntax comprehensively, is user-friendly to operate, and produces an XML file with a simple structure and clear content. Therefore, Verilator is chosen as the parser for RTL code.

3.2. RTL Code Refactoring Model

XML is a language that conveys content in a manner similar to HTML files but has no predefined tags of its own. Consequently, designers have the flexibility to define their own tags in XML based on development requirements. Data in XML files is presented in a clear format that is easy to analyze, and attributes can be added to each tag for detailed descriptions. This file format, with its strong information storage and data transfer capabilities, makes it easier to analyze and extract key information from the RTL code, thereby aiding subsequent obfuscation efforts [16].

The XML syntax tree is a hierarchical structure composed of tags in an XML document, their relationships, and attributes. It provides a clear representation of the hierarchical organization of an XML document, treating tags as nodes connected in a tree-like structure. The relationships between these nodes describe the entirety of the project. In the context of expressing RTL code, the root node serves as the entry point for external access, representing the top-level module of the project. Subsequently, the tree is traversed hierarchically from the top-level module, sequentially displaying the information of each module.

In the XML syntax tree, each node other than the root has a parent node, and every node has zero or more child nodes. The parent-child and sibling relationships of this tree structure help clarify the logic inherent in RTL code. As shown in Figure 1, the <always> tag in an XML document is at the outermost level and contains a variety of tags that are related to one another as siblings or as parent and child.

After a detailed analysis of the content within the <always> tag, the corresponding syntax tree is illustrated in Figure 2. The root node of the syntax tree is <always>, and its two sub-tags become sibling nodes at the second level. The sub-tags contained within the second-level node tags further become nodes at the third level, and so on, resulting in the derivation of a comprehensive structure for the syntax tree.
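The level-by-level structure described above can be sketched with Python's standard `xml.etree.ElementTree` module. The XML fragment below is a simplified, hypothetical stand-in for parser output: its tag and attribute names are illustrative assumptions, not the exact schema emitted by Verilator, and the `levels` helper is introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment; real parser output carries more attributes.
XML_SNIPPET = """
<always>
  <sentree>
    <senitem edge="posedge"><varref name="clk"/></senitem>
  </sentree>
  <assigndly>
    <varref name="d"/>
    <varref name="q"/>
  </assigndly>
</always>
"""

def levels(root):
    """Return one list of tag names per tree depth (breadth-first)."""
    result, frontier = [], [root]
    while frontier:
        result.append([node.tag for node in frontier])
        # Children of all nodes at this depth form the next level.
        frontier = [child for node in frontier for child in node]
    return result

tree_levels = levels(ET.fromstring(XML_SNIPPET))
for depth, tags in enumerate(tree_levels, start=1):
    print(depth, tags)
```

In this sketch the root `<always>` forms the first level, its two sub-tags form the second, and so on, mirroring how the syntax tree is derived from the nesting of tags.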

Ultimately, when presented with the syntax tree featuring clear relationships, this study reconstructs the RTL code based on the corresponding syntax of the tags and the internal data of the tags. As illustrated in Figure 3, the contents of the <always> tag are derived as RTL code.

An abstraction of the above reveals that the process can be algorithmically determined. The construction of the syntax tree is essentially achieved by leveraging the information contained in the XML file:

$$R_{t} = \mathrm{Anal}(\mathrm{Tag}_{xml}, \mathrm{Name}_{xml}, \mathrm{Rela}_{xml}) \tag{1}$$

where $R_{t}$ represents the resulting syntax tree, $\mathrm{Rela}_{xml}$ denotes the data relations in XML, and $\mathrm{Anal}$ represents the analysis algorithm. When combining the results of Formula (1) to reshape the RTL code, the emphasis is on the regularity of the derivation process and the correction of syntax:

$$R_{c} = \mathrm{Der}(\mathrm{Trav\_algo}(R_{t}), \mathrm{Data}_{xml}, \mathrm{Syn}_{RTL}) \tag{2}$$

Here, $R_{c}$ signifies the refactored RTL code, $\mathrm{Der}$ and $\mathrm{Trav\_algo}$ represent the refactoring and traversal algorithms proposed, while $\mathrm{Syn}_{RTL}$ denotes the categorized RTL grammar.

This study integrates and standardizes a series of operations in which a syntax tree is deduced from an XML file content, and then the syntax tree is deduced into RTL code. A refactoring model for RTL code has been formulated based on these operations. The RTL code refactoring model, based on the XML syntax tree, facilitates obfuscation and modification of RTL code. All subsequent research on obfuscation methods is rooted in this refactoring model.
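As a minimal sketch of the refactoring step, the Python snippet below walks a small XML tree and emits RTL text from per-tag rules. It is not the authors' implementation: the XML fragment, the tag names, the assumed child order inside `<assigndly>`, and the `derive` function are all illustrative assumptions. The recursion stands in for the traversal algorithm, and the per-tag rules stand in for the categorized RTL grammar.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; tag and attribute names are illustrative only.
XML_SNIPPET = """
<always>
  <sentree><senitem edge="posedge"><varref name="clk"/></senitem></sentree>
  <assigndly>
    <add><varref name="a"/><varref name="b"/></add>
    <varref name="q"/>
  </assigndly>
</always>
"""

def derive(node):
    """Recursively turn one XML node into RTL text (children first)."""
    kids = [derive(child) for child in node]
    if node.tag == "varref":
        return node.get("name")
    if node.tag == "add":
        return f"({kids[0]} + {kids[1]})"
    if node.tag == "senitem":
        return f"{node.get('edge')} {kids[0]}"
    if node.tag == "sentree":
        return "@(" + " or ".join(kids) + ")"
    if node.tag == "assigndly":
        # Assumed child order: source expression first, then target.
        return f"{kids[1]} <= {kids[0]};"
    if node.tag == "always":
        return f"always {kids[0]} begin {' '.join(kids[1:])} end"
    raise ValueError(f"no syntax rule for <{node.tag}>")

rtl = derive(ET.fromstring(XML_SNIPPET))
print(rtl)
```

Because code is regenerated from the tree rather than edited as text, any transformation applied to the tree (renaming a node, replacing a subtree) carries through to the emitted code, which is why a model of this kind is a convenient base for obfuscation.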

4. RTL Code Obfuscation Methods

4.1. Layout Obfuscation

The RTL code itself contains a lot of useful information during development. Whether it is a register name, a module name, or something else, most of it has its function and meaning in its name. Comments added to the code also expose specific information, so layout obfuscation is necessary. This study employs two specific layout obfuscation methods:

Name replacement: Replace register names, module names, function names, etc. in the RTL code.
Information removal: Remove all useless spaces and useless comments from the RTL code.

The name replacement method substitutes randomly generated, irreversible strings for the original information [17]. Key information such as module names, function names, variable names, and register names is first extracted from the XML syntax tree, with duplicates saved only once. A unique, meaningless encrypted string is then randomly generated for each piece of information, so that each piece of information in the code corresponds to a distinct string. Finally, the names in the original RTL code are replaced with these meaningless strings.
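A minimal Python sketch of name replacement follows. It assumes the list of names has already been extracted, and it performs the substitution with a whole-word regular expression on the source text rather than on the syntax tree, which is a simplification. The helper names `make_name_map` and `replace_names` are hypothetical, and a production tool would also need to reject strings that collide with language keywords.

```python
import re
import secrets
import string

def make_name_map(names, length=12):
    """Assign each distinct name a unique random identifier."""
    mapping, used = {}, set()
    for name in dict.fromkeys(names):          # drop duplicates, keep order
        while True:
            # Start with a letter so the result is a legal identifier.
            candidate = secrets.choice(string.ascii_letters) + "".join(
                secrets.choice(string.ascii_letters + string.digits)
                for _ in range(length - 1)
            )
            if candidate not in used and candidate not in names:
                break
        used.add(candidate)
        mapping[name] = candidate
    return mapping

def replace_names(code, mapping):
    """Replace whole-word occurrences of each mapped name."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code)

source = ("module counter(input clk, output reg [3:0] count); "
          "always @(posedge clk) count <= count + 1; endmodule")
name_map = make_name_map(["counter", "clk", "count", "clk"])
obfuscated = replace_names(source, name_map)
print(obfuscated)
```

The replacement strings are drawn at random, so they cannot be inverted from the obfuscated code alone. Keeping `name_map` corresponds to the replacement information file that is handed to the user for debugging.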

The information removal method is employed to eliminate hints in the RTL code that explain the function of the code and the meaning of variables. These details could aid attackers in analyzing the RTL code, necessitating the removal of these insecure comments. Also, there are a lot of spaces in the code that are added to make the code easier to read, and these spaces make the structure of the code clearer. In order to increase the difficulty of analyzing the code, all the spaces that do not affect the operation of the code are removed during the obfuscation process to make the structure of the code more compact and difficult to read [18].
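Information removal can be sketched in Python with regular expressions. This is a simplified illustration: it assumes the code contains no string literals with comment markers and no newline-sensitive preprocessor directives, and it removes every comment without distinguishing the ones a tool may need to keep. The function names are hypothetical.

```python
import re

def strip_comments(code):
    """Remove /* */ block comments and // line comments.
    Assumes no string literal contains comment markers."""
    code = re.sub(r"/\*.*?\*/", " ", code, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", code)

def squeeze_whitespace(code):
    """Collapse whitespace runs, then drop spaces next to punctuation."""
    code = re.sub(r"\s+", " ", code).strip()
    return re.sub(r"\s*([;,()\[\]{}=<>@:?])\s*", r"\1", code)

source = """
// 4-bit counter
module counter (input clk, output reg [3:0] count);
  /* increment on
     every rising edge */
  always @(posedge clk) count <= count + 1;
endmodule
"""

compact = squeeze_whitespace(strip_comments(source))
print(compact)
```

Spaces that separate two identifiers or keywords are kept, since removing them would change the token stream; only whitespace adjacent to punctuation is dropped.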

As illustrated in Figure 4, the application of name replacement and information removal results in the removal of the vast majority of meaningful information from the RTL code. These operations are irreversible, significantly increasing the difficulty for attackers to read and analyze the code. The security of the RTL code is thereby further enhanced, and the transformation does not introduce substantial additional overhead. To facilitate subsequent debugging, a replacement information file is output to the user after layout obfuscation.

4.2. Parameter Obfuscation

The use of parameters is very common in RTL code, with each parameter holding significant meaning and potentially altering the final result of the code. Obfuscation of parameters is necessary to avoid exposing them to attackers.

During the parsing of the RTL code, mathematical operations involving the original parameters are concealed and replaced with the result of the operation. As illustrated in Figure 5a, the parameter addition operation represented by <add> is replaced with its result, and the corresponding code changes as shown in the figure. Subsequently, the obfuscation tool randomly selects a suitable node in the syntax tree and rewrites it as a meaningless mathematical operation. False and meaningless parameters are then introduced into the operation to mislead analysis. As shown in Figure 5b, the data 4'b0100 is replaced by an equivalent addition operation with a false parameter.

After the above process, the final obfuscated code eliminates the original parameters from both the module definitions and the instantiation declarations, erasing any trace of them. At the same time, false parameters are introduced in both places. To further confuse the attacker, the false parameters in the two places carry different data.
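The two steps of parameter obfuscation can be sketched in Python as plain arithmetic on values. This is an illustration under stated assumptions, not the tool's code: the parameter names `BASE`, `OFFSET`, and `P_x9` are invented, the functions are hypothetical, and the sketch does not model the differing false-parameter data in module definitions and instantiations.

```python
import random

def fold_add(param_values, left, right):
    """Step 1: replace `left + right` on parameters by its numeric result."""
    return param_values[left] + param_values[right]

def split_with_false_param(value, width, fake_name, rng):
    """Step 2: rewrite a constant as `fake_name + remainder`.
    The false parameter's value is chosen so the sum equals the constant."""
    fake_value = rng.randrange(0, value + 1)   # never exceeds the constant
    remainder = value - fake_value
    decl = f"parameter {fake_name} = {width}'d{fake_value};"
    expr = f"({fake_name} + {width}'d{remainder})"
    return decl, expr, fake_value + remainder

# Hypothetical original parameters whose sum is the constant 4'b0100.
params = {"BASE": 1, "OFFSET": 3}
folded = fold_add(params, "BASE", "OFFSET")

rng = random.Random(0)
decl, expr, check = split_with_false_param(folded, 4, "P_x9", rng)
print(decl)
print(expr)
```

After folding, the names `BASE` and `OFFSET` no longer appear in the code; the emitted expression evaluates to the same constant but is written in terms of a parameter that has no design meaning.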

4.3. Critical Path Obfuscation

After obtaining the obfuscated code, an attacker may forgo analysis and cracking and proceed directly to simulation, synthesis, or even chip fabrication using the obfuscated code. It is therefore necessary to guard against this kind of blatant infringement.

In this paper, we propose an obfuscation scheme targeting critical paths. First, a syntax tree is built by analyzing the XML file, and all critical paths are identified and saved. Subsequently, as shown in Figure 6, when obfuscating one of these critical paths, a specialized module is inserted to extend the path. These special modules can be codec modules, First In, First Out (FIFO) modules, or even internal modules sourced from the project. The existence of these modules is concealed with a complex logic that is challenging to detect, effectively slowing down the critical path and consequently reducing the circuit’s performance.

Upon completion of the obfuscation process, the circuit’s operational speed is constrained due to the elongated critical path, potentially resulting in higher power consumption. This outcome ultimately impacts the reliability and stability of the entire circuit. Such an obfuscation method can provide effective protection to the RTL code.
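The idea that an inserted module can lengthen a path without changing its data can be illustrated with a small Python model. Everything in it is a hypothetical assumption made for illustration: the encoder/decoder pair (rotate and XOR with a fixed key), the 8-bit width, and the stage delays in picoseconds are not taken from the paper.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def rotl(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (WIDTH - n))) & MASK

def rotr(x, n):
    """Rotate an 8-bit value right by n bits."""
    return ((x >> n) | (x << (WIDTH - n))) & MASK

def encoder(x, key=0xA5):
    """Hypothetical inserted stage 1: rotate, then mask with a key."""
    return rotl(x, 3) ^ key

def decoder(y, key=0xA5):
    """Hypothetical inserted stage 2: undo the mask, then the rotation."""
    return rotr(y ^ key, 3)

def extended_path(x):
    """Data on the critical path now passes through both inserted stages."""
    return decoder(encoder(x))

# Functional equivalence: the inserted pair is the identity on every input.
identity_holds = all(extended_path(x) == x for x in range(1 << WIDTH))

# Illustrative delays only (picoseconds); the inserted stages add to the path.
original_stage_delays_ps = [400, 700, 500]
inserted_stage_delays_ps = [300, 300]
original_delay_ps = sum(original_stage_delays_ps)
extended_delay_ps = original_delay_ps + sum(inserted_stage_delays_ps)
print(identity_holds, original_delay_ps, extended_delay_ps)
```

The model shows the trade-off in miniature: the data leaving the path is unchanged for every input, while the path delay grows by the delay of the inserted logic.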

4.4. Code Increment Obfuscation

The splice (concatenation) operator enables the assembly of multiple statements into a single statement, and the “generate” statement constructs a loop structure for creating multiple instantiations of a module. Although statements of this type are convenient for developers and enhance code clarity, they also provide useful information to attackers analyzing RTL code. Code increment obfuscation disassembles these “aggregate” statements with the assistance of the XML file contents, making them less compact. Statements using the splice operator lose their associations and are split into multiple statements, and the loop of the “generate” statement is unrolled. This obfuscation preserves code correctness and heightens the difficulty of analysis.

During the parsing phase of RTL code, “aggregate” statements are identified, and their internal operations are uniquely tagged. Subsequent operations on the syntax tree will specifically handle these tagged labels, categorizing their internal operations to ensure syntactic correctness in the subsequent code refactoring. In the obfuscation process, the “aggregate” statement is reconstructed based on the processing in the initial two steps, effectively achieving the goal of adding code without altering the original circuit.

This form of obfuscation significantly augments the volume of RTL code without increasing actual circuit overhead. When coupled with layout obfuscation, it effectively heightens the challenge faced by attackers attempting to read and analyze the code.
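A minimal Python sketch of the two splitting operations follows. It works on pre-extracted information (signal names and widths, a loop body template) rather than on the XML tree, and the function names and the example signals are hypothetical.

```python
def split_concat(lhs, parts):
    """Split `assign lhs = {p0, p1, ...};` into one assignment per part.
    `parts` lists (signal_name, width) from most to least significant."""
    total = sum(width for _, width in parts)
    statements, msb = [], total - 1
    for name, width in parts:
        lsb = msb - width + 1
        statements.append(f"assign {lhs}[{msb}:{lsb}] = {name};")
        msb = lsb - 1
    return statements

def unroll_generate(template, count):
    """Expand a generate-for loop body into `count` explicit copies."""
    return [template.format(i=i) for i in range(count)]

# assign bus = {hi, mid, lo};  with widths 4, 3, and 1
split = split_concat("bus", [("hi", 4), ("mid", 3), ("lo", 1)])

# Body of a hypothetical generate loop instantiating one adder per bit
unrolled = unroll_generate(
    "adder u_add_{i} (.a(a[{i}]), .b(b[{i}]), .s(s[{i}]));", 2
)

for line in split + unrolled:
    print(line)
```

Each part of the concatenation drives its own slice of the target, so the split statements describe the same connections as the original single statement while no longer showing which signals were grouped together.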

5. Experimentation and Performance Evaluation of Obfuscation Methods

5.1. Implementation of Obfuscation Tool

To thoroughly assess the efficacy of the proposed methods for obfuscating RTL code, an obfuscation tool that incorporates all the methods outlined in this paper is developed. This tool serves as a foundation for conducting experiments on the obfuscation methods, allowing for more detailed and scientifically rigorous evaluation tests.

We choose to employ Python for the implementation of the tool. The tool is designed to perform the analysis of the syntax tree and execute the RTL code refactoring model, utilizing this model to carry out the obfuscation operations on RTL code. Specifically, the tool initially reads and analyzes both the source code and XML files, constructing a syntax tree based on this analysis. Subsequently, it extracts tags from the XML syntax tree. Following this, the tool executes the refactoring and obfuscation of the code using the written RTL syntax. The obfuscation methods are implemented at different stages of the tool’s operation, ensuring the absence of data conflicts between the various methods and facilitating the efficient completion of the obfuscation task.

5.2. Performance Evaluation

To offer a more lucid assessment of the specific performance of obfuscation tools, this paper proposes the following criteria for evaluation:

Equivalence: Examining whether the circuit functions and data paths remain equivalent compared to the original RTL code.
Obfuscation Rate: Determining the proportion of the original code that has been obfuscated.
Security: Assessing the degree to which the obfuscated code can be reasonably concealed and the difficulty an attacker faces in attempting to crack it.
Cost: The additional cost of code obfuscation [19].
Time: The time required to complete all obfuscation methods.

In the proposed evaluation criteria, equivalence is assessed by comparing the functionality and logic of the original and obfuscated circuits. In real development scenarios, users do not want the obfuscation to modify the data path and control logic of the original circuit. Therefore, a formal verification tool is used to map and compare the nodes of the obfuscated code and the original code, specifically through the reduced ordered binary decision diagram (ROBDD) used to express the logical paths of the nodes [20]. If the ROBDDs of two mapped nodes are the same, the function of the node has not been changed by the obfuscation. For formal verification, identical node information therefore indicates equivalence between the obfuscated circuit and the original circuit.
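The reason identical ROBDDs imply identical functions is that, for a fixed variable order, the reduced diagram is canonical. The Python sketch below builds tiny ROBDDs by exhaustive Shannon expansion with a shared node table, so two functions are equivalent exactly when their root identifiers match. It is a teaching sketch that scales exponentially with the number of inputs, not the algorithm of any particular formal verification tool; the `BDD` class and the example functions are hypothetical.

```python
class BDD:
    """Tiny ROBDD manager with a fixed variable order x0 < x1 < ..."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.unique = {}   # (var, low, high) -> node id; 0 and 1 are terminals

    def _mk(self, var, low, high):
        if low == high:                 # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:      # share isomorphic subgraphs
            self.unique[key] = len(self.unique) + 2
        return self.unique[key]

    def build(self, func, assignment=()):
        """Return the root id of the ROBDD of `func` by Shannon expansion."""
        if len(assignment) == self.num_vars:
            return int(bool(func(*assignment)))
        low = self.build(func, assignment + (0,))
        high = self.build(func, assignment + (1,))
        return self._mk(len(assignment), low, high)

# Hypothetical node functions: original, a rewritten form, and a broken one.
original = lambda a, b, c: (a & b) | (a & c)
rewritten = lambda a, b, c: a & (b | c)
broken = lambda a, b, c: a & b

bdd = BDD(3)
root_original = bdd.build(original)
root_rewritten = bdd.build(rewritten)
root_broken = bdd.build(broken)

print(root_original == root_rewritten)   # same function, same root
print(root_original == root_broken)      # different function, different root
```

Because both diagrams are built in the same node table, structurally different but logically equal expressions collapse to the same root, while any functional change produces a different root.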

The obfuscation rate is assessed by calculating the percentage of the original code that undergoes obfuscation. The obfuscation rate reflects the comprehensiveness of the obfuscation tool, and whether the tool takes into account the adverse effects of macro definition parameters. A higher obfuscation rate implies a reduced impact of macro definitions and a diminished exposure of the original code.

The security assessment is conducted by analyzing the obfuscation methods within the tool. Layout obfuscation hides all the naming and comment information, and the one-way substitution of information makes it impossible for the attacker to decipher the original information. Parameter obfuscation conceals all parameters and uses a substantial volume of data and false parameters, making it nearly impossible for attackers to discern the location and data of the original parameters. Consequently, attackers remain uninformed about the project’s configuration information. Critical path obfuscation does not insert meaningless modules: the inserted modules perform actual operations, so they are unlikely to arouse suspicion among attackers. Even if an attacker is determined to remove these modules, doing so incurs substantial costs, which makes this method highly resistant to compromise. Code increment obfuscation also adds considerable difficulty to the attacker’s analysis.

The cost evaluation involves comparing the variance in circuit overhead between the obfuscated code and the original code. The anticipated scenario is that the circuit incurs minimal additional overhead when critical path obfuscation is not employed. In contrast, when critical path obfuscation is applied, the circuit’s metrics undergo changes, with a more noticeable impact on timing metrics, as a way to ensure that the obfuscation objective is reached. This aligns with the specific requirements of actual development.

The assessment of the time required to complete obfuscation operations is straightforward: it involves measuring the duration from the initiation of the obfuscation tool to its completion.

The experimental evaluation selected four open-source IP cores from the OpenCores [21] website. After these IP cores were obfuscated, the tool’s performance was assessed using the metrics outlined earlier. Table 2 shows the obfuscation effectiveness of the tool. The table clearly indicates that the obfuscation performed by the tool maintains the equivalence of the original code logic. In the absence of critical path obfuscation, the circuit incurs essentially no additional overhead, so the impact is negligible. When critical path obfuscation is used, various performance metrics of the circuit change without compromising the functionality of the circuit. Timing, the crucial metric for the critical path obfuscation method, shows a significant delay, with an average change of 4.7%. The change in circuit overhead meets the criteria expected from the experiment. The obfuscation rate consistently remains at 98% or higher, indicating successful coverage of a substantial portion of the code. The duration of obfuscation is expected to correlate positively with the size of the RTL code. However, the precise time is also influenced by the complexity of reconstructing the data.

Critical path obfuscation differs from the other obfuscation methods in that it changes the structure of the project, so evaluating its effect is more complex. After critical path obfuscation, part of the code is changed, but the inserted modules ensure that the data passing through them does not change, preserving functional equivalence. This increases the various overheads of the circuit, which is acceptable for developers who need this protection.

5.3. Evaluation of Obfuscation Features

Combining the design of the obfuscation methods and the specific experimental performance, the features of the proposed obfuscation methods are summarized in the following two points.

5.3.1. High Obfuscation Coverage

In practice, when simulating RTL code, different macro definition parameters can affect the actual operation and results of the project. Certain obfuscation tools may be disturbed by these macro definitions, potentially leading to incomplete code obfuscation. This paper addresses this scenario by specifically analyzing the XML files generated under different macro definition parameters. The goal is to identify discrepancies and obfuscate the code comprehensively, ensuring that no portion of the code is overlooked. The resulting obfuscated code generally has a coverage rate of 98% or more, and the coverage rate of some project code can reach 100%.
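One way to picture the comparison across macro configurations is to collect the names found in each XML file and take their union, so that names visible only under some configurations are still obfuscated. The Python sketch below assumes that names appear in a `name` attribute; the XML fragments, the configuration labels, and the `collect_names` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

def collect_names(xml_text):
    """Return every value of a `name` attribute in one XML document."""
    root = ET.fromstring(xml_text)
    return {node.get("name") for node in root.iter() if node.get("name")}

# Hypothetical XML produced by parsing the same design under two macro sets.
xml_by_config = {
    "default": '<module name="top"><var name="data"/><var name="fast_path"/></module>',
    "LOW_POWER": '<module name="top"><var name="data"/><var name="sleep_ctrl"/></module>',
}

names_per_config = {cfg: collect_names(x) for cfg, x in xml_by_config.items()}

# Names to obfuscate: everything seen under any configuration.
all_names = set().union(*names_per_config.values())

# Names a single-configuration analysis would have missed in some run.
config_dependent = all_names - set.intersection(*names_per_config.values())

print(sorted(all_names))
print(sorted(config_dependent))
```

Analyzing only one configuration would leave the configuration-dependent names untouched in the source, which is the kind of gap that lowers the obfuscation rate.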

5.3.2. Preservation of Compiler Indicative Comments

Compiler indicative comments differ from typical explanatory comments in that they provide guidance to the EDA tool during simulation, synthesis, and other processes involving the RTL code. Deleting these comments can significantly impact the results of subsequent RTL code operations, and such action is inconsistent with the precision and rigor demanded in engineering practices. Preserving indicative comments is essential for ensuring the accurate processing of the code by EDA tools.

To prevent errors in the project, the proposed obfuscation methods filter the original code content based on the indicative comments provided in Table 3. These comments are intentionally preserved in the obfuscated code. This approach ensures that crucial guidance and instructions for the EDA tool are maintained, contributing to the accuracy and reliability of the obfuscated code during subsequent project phases.
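A filter of this kind can be sketched in Python as follows. The marker words below are assumed examples of commonly seen tool directives; they are not the contents of Table 3, which defines the actual list. The sketch handles only `//` line comments and assumes no string literal contains `//`.

```python
import re

# Assumed example markers; the paper's Table 3 defines the actual list.
DIRECTIVE_MARKERS = ("synopsys", "synthesis", "pragma", "verilator")

def is_directive(comment_text):
    """True if the first word of the comment is a known directive marker."""
    words = comment_text.strip().split()
    return bool(words) and words[0].lower() in DIRECTIVE_MARKERS

def remove_plain_comments(code):
    """Drop // comments unless their first word is a directive marker."""
    def keep_or_drop(match):
        return match.group(0) if is_directive(match.group(1)) else ""
    return re.sub(r"//([^\n]*)", keep_or_drop, code)

source = (
    "assign y = a & b; // simple AND gate\n"
    "// synopsys translate_off\n"
    "initial $display(\"debug\");\n"
    "// synopsys translate_on\n"
)

cleaned = remove_plain_comments(source)
print(cleaned)
```

Explanatory comments disappear while the directive comments stay on their own lines. Any later whitespace compaction has to keep the line break after a preserved `//` comment, since the comment would otherwise swallow the code that follows it.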

5.4. Performance Comparison

The following experiment compares the proposed obfuscation methods with one another. The experimental scheme omits one obfuscation method at a time and performs obfuscation with the remaining methods. The outcomes are then compared using three evaluation metrics: the additional circuit area overhead, the change in space occupancy, and the obfuscation rate.

After conducting multiple experiments and analyzing the results presented in Table 4, Table 5 and Table 6, the following experimental conclusions can be summarized:

In terms of area overhead, critical path obfuscation has the most significant impact on the additional overhead of the circuit, because the method adds modules to the circuit. The other obfuscation methods primarily alter the code and its presentation, so they contribute almost no additional overhead: when critical path obfuscation is omitted, the overhead is 0.0% for all four IP cores (Table 4).
Regarding changes in space occupancy, omitting layout obfuscation or critical path obfuscation shifts the results the most relative to the case where all obfuscation methods are applied (Table 5). Layout obfuscation and critical path obfuscation are therefore the methods with the greatest impact on space occupancy.
In terms of the obfuscation rate, omitting layout obfuscation causes a significant reduction, to between 20% and 40% (Table 6). Layout obfuscation is therefore the key to ensuring high obfuscation coverage.
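
These conclusions can be checked directly against the tabulated data. The Python sketch below copies the values of Table 4 and Table 6 and, for each omitted method, computes the mean difference from the row in which all methods are used. The averaging over the four IP cores and the helper name `mean_shift` are choices made for this illustration, not metrics defined in the paper.

```python
from statistics import mean

CORES = ["MCPU", "DUSB", "DAHB64", "ALTOR"]

# Additional area overhead (%) from Table 4, keyed by the method left out.
AREA = {
    "None": [3.9, 3.5, 8.6, 4.5],
    "Layout obfuscation": [2.9, 3.7, 9.8, 4.6],
    "Parameter obfuscation": [3.2, 4.3, 9.0, 4.5],
    "Critical path obfuscation": [0.0, 0.0, 0.0, 0.0],
    "Code increment obfuscation": [3.8, 4.2, 8.5, 4.7],
}

# Obfuscation rate (%) from Table 6, same keys.
RATE = {
    "None": [100.0, 100.0, 100.0, 98.4],
    "Layout obfuscation": [20.4, 28.5, 34.8, 32.7],
    "Parameter obfuscation": [97.0, 96.7, 98.6, 96.3],
    "Critical path obfuscation": [100.0, 100.0, 100.0, 98.4],
    "Code increment obfuscation": [100.0, 100.0, 100.0, 98.4],
}


def mean_shift(table: dict[str, list[float]], baseline: str = "None") -> dict[str, float]:
    """Mean per-core drop from the baseline row to each leave-one-out row."""
    base = table[baseline]
    return {
        method: mean(b - v for b, v in zip(base, row))
        for method, row in table.items()
        if method != baseline
    }


if __name__ == "__main__":
    for label, table in (("area overhead", AREA), ("obfuscation rate", RATE)):
        shift = mean_shift(table)
        top = max(shift, key=shift.get)
        print(f"{label}: largest mean drop when omitting {top}")
```

The method whose omission causes the largest drop is the one that contributes most to the corresponding metric.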

Table 7 summarizes the characteristics of the different obfuscation methods, which will help users choose the obfuscation methods that best fit their needs. Critical path obfuscation demands careful consideration, because it is a double-edged sword: while it safeguards the code from attacks, it also measurably degrades circuit performance (Table 2).

To assess the obfuscation tool’s performance more comprehensively, an experimental comparison was also conducted between the obfuscation tool developed in this paper and the VCS tool. For the four open-source IP cores, VCS (version 2020) performed the code obfuscation with the command “vcs -full64 -Xman=1 demo.v”, while the obfuscation tool developed in this research applied the proposed obfuscation methods.

The experimental results are analyzed as follows. As depicted in Figure 7a, the VCS tool consolidates all obfuscated content into one file, which substantially compresses the project size but loses the original project architecture. In contrast, the proposed obfuscation tool preserves the original project architecture: after the obfuscation methods are applied, the architecture remains unchanged and the file size changes only slightly. In terms of space occupancy, the obfuscation methods do introduce changes, but the overall impact is generally less than 10%. This slight additional space overhead is considered acceptable in exchange for the security and stability of the RTL code.

As observed in Figure 7b, the analysis of RTL code by VCS is easily affected by macro-defined parameters, so part of the code is overlooked and left unobfuscated. In contrast, the proposed obfuscation tool achieves a higher and more stable obfuscation rate than VCS, because it analyzes the code comprehensively and therefore obfuscates it more thoroughly.

In the comparison of obfuscation time, as illustrated in Figure 7c, VCS completes the obfuscation in less than 1 s. The proposed obfuscation tool needs time to analyze the XML syntax tree, refactor the code, and apply multiple obfuscation methods, so its runtime is noticeably longer than that of VCS. In return, its protection measures are more varied and stringent. Given the complexity of the tool and the security it provides, a runtime of a few seconds remains reasonable.

Several tests confirm that the proposed obfuscation methods effectively secure RTL code while preserving its correctness. Compared with other obfuscation tools, the proposed methods are able to perform multiple complex obfuscation operations with low overheads, although they do not have advantages in terms of space occupancy and obfuscation time. In inter-method comparisons, the obfuscation methods in this paper exhibit greater diversity, showcasing enhanced code protection abilities and heightened practical application potential.

6. Conclusions

This paper introduces several RTL code obfuscation methods based on the XML syntax tree. Specifically, four methods, namely layout obfuscation, parameter obfuscation, critical path obfuscation, and code increment obfuscation, are proposed to change the presentation of the original code, while other strategies are used to further enhance the robustness of the obfuscated code. The proposed methods help prevent intellectual property infringement and protect individual or collective interests, and they are practical and easy to automate.

The reliability of these obfuscation methods is demonstrated through experiments on multiple IP cores using a self-developed obfuscation tool. The methods are characterized by high obfuscation coverage and preservation of compiler indicative comments. Comparative experiments highlight the distinct advantages of the proposed obfuscation methods, making them suitable for RTL code protection tasks in real-world scenarios.

Future work will explore new obfuscation methods, including dynamic obfuscation techniques that modify code at runtime.

Author Contributions

Conceptualization, H.Y., J.Z. and S.L.; methodology, H.Y., J.Z. and S.L.; software, H.Y.; validation, H.Y. and S.L.; formal analysis, J.Z. and H.Y.; resources, S.L.; writing—original draft preparation, H.Y. and S.L.; writing—review and editing, J.Z. and S.L.; supervision, J.Z. and S.L.; funding acquisition, J.Z. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 61972055), the Research on High-performance Molecular Dynamics Simulation Technology (Automatic Task Management and Parallel Visualization Rendering Sub Project) (Grant No. 31511010402), the National Defense Science and Technology Key Laboratory Fund Project (Grant No. 2021-KJWPDL-17), the Research on High Energy Efficiency Microprocessor technology and the Key Laboratory of Advanced Microprocessor Chips and Systems (Grant No. 2019-JCJQ-ZD-090-00).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The open-source IP cores used in this study are available at https://opencores.org/ (accessed on 27 November 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, X.; Liu, J. The development of China’s integrated circuit industry: Review of the “13th Five-Year Plan” and the outlook of the “14th Five-Year Plan”. Mod. Econ. Res. 2021, 3, 87–96.
2. Arm Sues Qualcomm for Destruction of Related Chip Designs. Available online: https://new.qq.com/rain/a/20220902A00XVY00 (accessed on 7 December 2023).
3. Li, L.; Zhang, F.; Li, G. A review of research on code obfuscation techniques. Software 2020, 41, 62–65.
4. Speith, J.; Schweins, F.; Ender, M.; Fyrbiak, M.; May, A.; Paar, C. How Not to Protect Your IP—An Industry-Wide Break of IEEE 1735 Implementations. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 23–26 May 2022.
5. Collberg, C.; Thomborson, C.; Low, D. A Taxonomy of Obfuscating Transformations; Department of Computer Science, The University of Auckland: Auckland, New Zealand, 1997.
6. Meyer-Bäse, U.; Castillo, E.; Botella, G. Intellectual property protection (IPP) using obfuscation in C, VHDL, and verilog coding. In Proceedings of the Independent Component Analyses, Wavelets, Neural Networks, Biosystems, and Nanoengineering IX, Bellingham, CA, USA, 9 June 2011.
7. Hoffmann, M.; Paar, C. Stealthy Opaque Predicates in Hardware—Obfuscating Constant Expressions at Negligible Overhead. arXiv 2019, arXiv:1910.00949.
8. Sengupta, A.; Chaurasia, R. Securing IP Cores for DSP Applications Using Structural Obfuscation and Chromosomal DNA Impression. IEEE Access 2022, 10, 50903–50913.
9. Rahman, M.S.; Guo, R.; Kamali, H.M. ReTrustFSM: Toward RTL Hardware Obfuscation-A Hybrid FSM Approach. IEEE Access 2023, 11, 19741–19761.
10. Niazi, A.; Zheng, S.; Nguyen, C.; Okhmatovski, V. Full-Wave Analysis of Interconnects in Finite Substrates with Layered Media Formulation of SVS-EFIE for 3D Composite Metal-Dielectric Structures. In Proceedings of the 2023 IEEE 32nd Conference on Electrical Performance of Electronic Packaging and Systems (EPEPS), Milpitas, CA, USA, 15–18 October 2023.
11. Koushanfar, F.; Qu, G. Hardware metering. In Proceedings of the 38th Annual Design Automation Conference, New York, NY, USA, 22 June 2001.
12. Li, L.; Zhou, H. Structural transformation for best-possible obfuscation of sequential circuits. In Proceedings of the 2013 IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), Austin, TX, USA, 2–3 June 2013.
13. Islam, S.A.; Sah, L.K.; Katkoori, S. High-Level Synthesis of Key-Obfuscated RTL IP with Design Lockout and Camouflaging. ACM Trans. Design Autom. Electron. Syst. 2022, 26, 1–35.
14. Karmakar, R.; Chattopadhyay, S. Hardware IP Protection Using Logic Encryption and Watermarking. In Proceedings of the 2020 IEEE International Test Conference (ITC), Washington, DC, USA, 1–6 November 2020.
15. Takamaeda-Yamazaki, S. Pyverilog: A python-based hardware design processing toolkit for verilog hdl. In Proceedings of the Applied Reconfigurable Computing: 11th International Symposium, Bochum, Germany, 13–17 April 2015.
16. Zhang, F.; Li, Q. Constructing ontologies by mining deep semantics from XML Schemas and XML instance documents. Int. J. Intell. Syst. 2022, 37, 661–698.
17. Kumar, K.A.; Verma, A.; Kumar, H. Smart Contract Obfuscation Technique to Enhance Code Security and Prevent Code Reusability. Int. J. Math. Sci. Comput. 2022, 3, 30–36.
18. Wang, Y. Code Obfuscation System Based on Control Obfuscation and Layout Obfuscation. Master’s Thesis, University of Science and Technology of China, Hefei, China, 2017.
19. Chakraborty, R.S.; Bhunia, S. RTL hardware IP protection using key-based control and data flow obfuscation. In Proceedings of the 2010 23rd International Conference on VLSI Design, Bangalore, India, 3–7 January 2010.
20. Bryant, R.E. Graph-based algorithms for boolean function manipulation. IEEE Trans. Comput. 1986, 35, 677–691.
21. OpenCores. Available online: https://opencores.org/ (accessed on 15 August 2023).

Figure 1. The <always> tag and its sub-tags.

Figure 2. The corresponding syntax tree generated by parsing tags.

Figure 3. Parsing Syntax Trees to Refactor Out RTL Code.

Figure 4. Layout obfuscation schematic.

Figure 5. Schematic diagram of parameter changes in the syntax tree and code. (a) The original parameter is eliminated from the syntax tree and code; (b) adding false parameter to the syntax tree and code.

Figure 6. Schematic diagram of critical path changes.

Figure 7. Comparison between VCS tool and obfuscation tool in this article. (a) Comparison of space occupancy before and after obfuscation; (b) Comparison of obfuscation coverage; (c) Obfuscation time comparison.

Table 1. Advantages and disadvantages of different parsers.

Tool	Supports Full Syntax	Multi-File Processing	Format of Parsing Results
Verilator	√	√	XML
verilog-perl		√	Tool Customized Format
Pyverilog [15]		√	AST
verilog-parser	√	√	AST

√ Meets the corresponding description.

Table 2. Obfuscation effect of 4 sets of IP cores.

IP Core	Occupancy Space (KB)	Equivalence	Circuit Performance Changes (%)						Obfuscation Rate (%)	Time Consumed ¹ (s)
			Critical Path Obfuscation Used ¹			Critical Path Obfuscation Not Used ²
			Area	Timing	Power	Area	Timing	Power
MCPU	27.3	√	3.7	4.6	2.6	0.0	0.0	0.0	100.0	1.23
DUSB	148.6	√	3.1	5.5	3.3	0.0	0.0	0.0	100.0	2.35
DAHB64	410.4	√	8.1	4.1	5.0	0.0	0.0	0.0	100.0	3.11
ALTOR	298.5	√	4.6	4.7	3.8	0.0	0.0	0.0	98.4	3.55

¹ Critical path obfuscation used. ² Critical path obfuscation not used. √ Meets the corresponding description.

Table 3. Compiler indicative comments that are preserved.

Common Prefix	Follow-Up Content
//synopsys	translate_off & translate_on
	parallel_case
	full_case
	async_set_reset
	sync_set_reset
	dc_tcl_script_begin & dc_tcl_script_end
	one_cold
	one_hot
//cadence	full_case
	parallel_case
	translate_off & translate_on
	async_set_reset
	sync_set_reset

Table 4. Additional overhead for circuits that do not use a certain obfuscation method.

Methods of Obfuscation Not Used	MCPU	DUSB	DAHB64	ALTOR
None *	3.9%	3.5%	8.6%	4.5%
Layout obfuscation	2.9%	3.7%	9.8%	4.6%
Parameter obfuscation	3.2%	4.3%	9.0%	4.5%
Critical path obfuscation	0.0%	0.0%	0.0%	0.0%
Code increment obfuscation	3.8%	4.2%	8.5%	4.7%

* All proposed obfuscation methods are used.

Table 5. Change in space occupancy without using a certain obfuscation method.

Methods of Obfuscation Not Used	MCPU	DUSB	DAHB64	ALTOR
None *	3.7%	−12.8%	5.8%	4.3%
Layout obfuscation	7.4%	4.5%	8.3%	7.9%
Parameter obfuscation	3.5%	−13.3%	5.4%	4.1%
Critical path obfuscation	−2.5%	−15.4%	0.1%	−1.9%
Code increment obfuscation	3.4%	−13.6%	3.8%	3.6%

* All proposed obfuscation methods are used.

Table 6. The obfuscation rate without using a certain obfuscation method.

Methods of Obfuscation Not Used	MCPU	DUSB	DAHB64	ALTOR
None *	100.0%	100.0%	100.0%	98.4%
Layout obfuscation	20.4%	28.5%	34.8%	32.7%
Parameter obfuscation	97.0%	96.7%	98.6%	96.3%
Critical path obfuscation	100.0%	100.0%	100.0%	98.4%
Code increment obfuscation	100.0%	100.0%	100.0%	98.4%

* All proposed obfuscation methods are used.

Table 7. Characteristics of code obfuscation methods.

Obfuscation Methods	High Obfuscation Coverage	Primary Data Loss	Reduce Circuit Performance	Change in Space Occupancy	Circuit Overhead
Layout obfuscation	√	√		Decrease	No effect
Parameter obfuscation		√		Increase	No effect
Critical path obfuscation			√	Increase	Increase
Code increment obfuscation				Increase	No effect

√ Meets the corresponding description.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

