Article

Finding Taint-Style Vulnerabilities in Lua Application of IoT Firmware with Progressive Static Analysis

1
School of Cyber Science and Engineering, Information Engineering University, Zhengzhou 450000, China
2
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450000, China
*
Authors to whom correspondence should be addressed.
Appl. Sci. 2023, 13(17), 9710; https://doi.org/10.3390/app13179710
Submission received: 2 August 2023 / Revised: 19 August 2023 / Accepted: 21 August 2023 / Published: 28 August 2023
(This article belongs to the Special Issue Security Challenges for the Internet of Things and Mobile Networks)

Abstract

With the rapid growth of IoT devices, ensuring the security of embedded firmware has become a critical concern. Despite advances in existing vulnerability discovery methods, previous research has been limited to vulnerabilities occurring in binary programs. Although an increasing number of vendors are using the Lua scripting language in firmware development, no automated method is currently available to discover vulnerabilities in Lua-based programs. To fill this gap, we propose FLuaScan, a novel progressive static analysis approach specifically designed to detect taint-style vulnerabilities in Lua applications in IoT firmware. FLuaScan first heuristically locates the code that handles user input, then divides the code into different segments to conduct a progressive taint analysis. Finally, a graph-based search method is applied to identify vulnerable code that satisfies the conditions of taint propagation. To comprehensively compare FLuaScan with the state-of-the-art tool TscanCode, we conducted various experiments on a dataset consisting of 13 real-world firmware samples from different vendors. The results demonstrate the superior performance of FLuaScan in terms of accuracy (increased TP rate from 0% to 42.50%), effectiveness (discovered 21 vulnerabilities, of which 7 are unknown), and practicality (acceptable time overhead and visual output to assist in manual analysis).

1. Introduction

The rapid development of Internet of Things (IoT) technology has resulted in the proliferation of embedded devices in everyday life. Reports project that the number of IoT terminals worldwide will increase to 1.67 billion by 2023 and to 4.16 billion by 2025 [1]. Unfortunately, IoT manufacturers exhibit a lack of software security awareness, resulting in many IoT devices deployed on the Internet with either known or unknown vulnerabilities. Recent network attacks [2] have exposed the dire state of IoT security vulnerability, which significantly endangers the normal operation of both the network and the physical world. In 2016, Mirai compromised hundreds of thousands of DVRs and home routers to launch a DDoS attack against Internet mainstays like Netflix, Dyn, and GitHub [3]. Although numerous new technologies (e.g., edge computing [4,5,6] and artificial intelligence [7,8]) have been proposed and used to enhance network security, the current security situation of IoT devices remains weak [9]. It is crucial for the community to develop techniques that facilitate automatic vulnerability analysis of firmware.
Two fundamental methods for identifying firmware vulnerabilities are static analysis [10,11] and dynamic analysis [12,13,14]. Despite the progress made in dynamic analysis—for example, Firmadyne has successfully emulated the execution of over 1900 firmware images—this technique has several limitations, including low success rates and distorted simulation environments, leading to abnormal program behavior. Consequently, static analysis, which can ignore the complex environmental dependencies of firmware while detecting multiple vulnerabilities, is gaining interest in the research community as a complementary method to dynamic analysis. For example, KARONTE [11] is a static analysis framework for embedded firmware that can discover vulnerabilities due to multibinary interactions. SaTC [10] also performs taint analysis to discover bugs in binary programs. It utilizes shared keywords related to user input in the front end and back end to infer the taint source.
Currently, most research on static analysis targets only binary programs. This limitation is troublesome because IoT manufacturers are increasingly using scripting languages (especially Lua) in their development. Meanwhile, vulnerabilities in script programs are more harmful and easier to exploit, since they are often unrestricted by traditional security defense mechanisms such as ASLR, DEP, and stack canaries. Despite the possibility of numerous high-risk vulnerabilities that could compromise device security, the academic community lacks methods to identify such bugs automatically.
To fill this gap, we propose FLuaScan, the first scalable software system to automatically expose vulnerabilities in the Lua applications of IoT firmware. FLuaScan first locates input processing code within complex multifile code based on the characteristics of the dispatch module. It then applies a progressive taint analysis method, which we devised to address the limitations of current static taint analysis engines that hinder cross-function and cross-file analysis. This method facilitates the examination of taint propagation from processing handlers to sink points and effectively performs static vulnerability analysis.
We conducted a comparative analysis of FLuaScan and existing tools on a total of 13 devices. The experiments demonstrate that FLuaScan exhibits higher accuracy, efficacy, and practicality than the state-of-the-art tool TscanCode. FLuaScan achieves an average accuracy of 42% in detecting vulnerabilities within a duration of 30 s for real-world firmware images. Finally, we employed FLuaScan to identify a total of 21 vulnerabilities, 7 of which were previously undocumented.
The contributions of this paper are summarized as follows:
  • We observed and analyzed the distribution patterns employed by vendors for different functional code, subsequently proposing a novel technique for precise localization of service handlers within vendor-made service programs;
  • We designed and implemented FLuaScan, which utilizes progressive taint analysis to effectively discover taint-style vulnerabilities in the Lua applications of IoT firmware;
  • We evaluated our tool against 13 firmware programs in popular, real-world embedded devices. Experiments demonstrate that FLuaScan outperforms the state-of-the-art tool in terms of correctness, effectiveness, and practicality. Specifically, we found 21 vulnerabilities, 7 of which were previously unknown.

2. Background

2.1. Web Application in Firmware

IoT devices usually provide a web-based interface for end users to configure the system. Although there are no standards for how to implement this interface, many vendors prefer web technology because of its flexibility and simplicity. In the early stage, the web server was usually written in C. Now, more and more vendors tend to use powerful scripting languages in development, especially Lua. However, it has been proven that developing secure web applications is a non-trivial task [15]. Web servers have become primary targets for attackers because their vulnerabilities can be exploited remotely to acquire elevated privileges.

2.2. Taint-Style Vulnerabilities

Web applications often provide many features for end users and act correspondingly on user inputs and interactions, e.g., form submissions or clicks. A security vulnerability occurs when user-supplied data (i.e., taint) are not sufficiently sanitized and used in critical operations (i.e., sinks) of the application. Such vulnerabilities are known as taint-style vulnerabilities [16,17]. Taint-style vulnerabilities are a persistent security threat to web applications. Common types of vulnerabilities, such as command injection (CI), SQL injection (SQLi), etc., are instances of taint-style vulnerabilities. An attacker might exploit such a flaw by providing malicious inputs to change the expected behavior of the application, e.g., injecting malicious code.
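As a minimal illustration of the source, sanitizer, and sink roles described above, the following Python sketch builds a shell command from a user-supplied value. The function names and the `ping` scenario are assumptions made only for this example, and the command string is constructed but never executed.

```python
import shlex


def build_ping_command_unsafe(host: str) -> str:
    # The user-supplied value (taint) reaches the command string (sink)
    # without any sanitization.
    return "ping -c 1 " + host


def build_ping_command_safe(host: str) -> str:
    # shlex.quote plays the role of the sanitizer: the whole value is
    # passed as one shell word, so metacharacters lose their meaning.
    return "ping -c 1 " + shlex.quote(host)


# Hypothetical attacker input that appends a second command.
malicious = "8.8.8.8; reboot"
unsafe_cmd = build_ping_command_unsafe(malicious)
safe_cmd = build_ping_command_safe(malicious)
```

In the unsafe variant, a shell would interpret `; reboot` as a separate command, which is the command injection case mentioned above; in the safe variant, the same bytes stay inside a single quoted argument.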

2.3. Vulnerability Detection Methods in Embedded Devices

In recent years, numerous efforts have been made to identify vulnerabilities in embedded devices. This section provides a detailed analysis of these methods, highlighting their strengths and weaknesses. Furthermore, we point out the shortcomings in the existing research.
We use Table 1 to organize existing vulnerability detection literature. The “Literature” column records the abbreviation of the literature. The “Dependence” column indicates the conditions on which the work relies. The “Technique” column represents the core technology used in the work. The “Purpose” column indicates the research objectives of the work. The “Target” column records the objects analyzed in the work.
According to the technical approach, there are two main categories: dynamic analysis and static analysis.
Dynamic analysis involves running the embedded devices in a certain way and conducting analysis during the runtime [32]. Such methods can be divided into two categories based on research strategy: emulation and fuzzing. Emulation focuses on running firmware to mimic the behavior of a real device as closely as possible. FIRMADYNE [24] is a firmware emulation framework designed for automated dynamic analysis in large-scale embedded firmware. Although it is promising, its emulation rate of network reachability and web service availability is considerably low. FIRMAE [25] uses several heuristics to address the problems encountered in FIRMADYNE and increases the emulation success rate. However, it can only handle observed cases and may not handle problems in new devices. AVATAR [26] enables complex dynamic analysis of embedded devices by orchestrating the execution of an emulator, together with the real hardware. FirmFuzz [31] employs a greybox-based generational fuzzing approach coupled with static analysis and system introspection to provide targeted and deterministic bug discovery within a firmware image. The above examples of dynamic analysis rely on emulation, but accurate emulation is a non-trivial task in practice due to the diversity of embedded devices [33]. Many works [34,35,36,37,38,39] have made advances in providing large-scale and reliable emulation environments for dynamic analysis, although many problems remain unsolved [13]. Emulation does not directly discover vulnerabilities, whereas fuzzing can [40]. Table 1 introduces some typical fuzzing works. RPfuzzer [28] is a fuzzing framework specifically designed to find protocol vulnerabilities in router devices. IoTFuzzer [27] attempts to find memory corruption vulnerabilities in IoT devices via their official apps; therefore, it is firmware-free. 
Snipuzz [14] represents a novel automated blackbox fuzz testing approach to overcome the challenge of applying syntax-based fuzz testing to various non-standard communication protocol formats widely used in IoT devices. Diane [30] treats the functions that are executed after input validation but before data-transforming functions as fuzzing triggers and uses them to fuzz IoT devices automatically.
Static analysis refers to the process of discovering vulnerabilities in a program without executing it. Static methods can be divided into two categories based on research strategy: pattern-matching-based methods and taint-analysis-based methods. Pattern-matching-based methods identify inherent patterns in vulnerabilities and convert them into rules for automated vulnerability mining. Such methods are usually designed to find backdoors in firmware. Costin [21] uses fuzzy hashing to match weak keys in the firmware and finds the similarity between different firmware images across four different dimensions through association analysis. Stringer [22] proposed a method based on static data comparison analysis to detect undisclosed features and hard-coded authentication backdoors in COTS device firmware. Taint analysis [17,41] is an effective approach to finding taint-style vulnerabilities. It tracks the propagation of taints introduced from external sources (e.g., untrusted user-supplied data) during program execution, checks if the tainted data could flow to critical program locations (sinks), and reports the sinks that could potentially be manipulated by attackers [42]. Several works have applied taint analysis to vulnerability detection in IoT devices. SainT [18] detected privacy-related vulnerabilities in IoT devices by tracing the information flow between sources and external sinks. To find cryptographic misuse vulnerabilities, CRYPTOREX [20] lifts binary code to a unified IR and performs static taint analysis across multiple executables. KARONTE [11] leverages static analysis techniques to perform multibinary taint analysis. SaTC [10] determines whether variable names are commonly shared between front-end files and back-end functions and leverages this information to enhance taint analysis.
In general, dynamic analysis has the advantage of low false-positive rates when discovering vulnerabilities, but its disadvantage is the need to address numerous dependency conditions and the difficulty of conducting large-scale analysis. Static analysis, on the other hand, has the advantage of requiring fewer dependency conditions and being suitable for large-scale analysis, but it has a higher false-positive rate. In this study, we chose to use the static analysis approach to explore vulnerabilities in embedded devices, focusing on high-impact taint-style vulnerabilities.
However, in the academic community, the current research focus in vulnerability discovery is primarily limited to binary programs, leaving vulnerabilities in Lua applications reliant on manual identification.

2.4. TscanCode and Semgrep

Although no tools have been designed to detect vulnerabilities in Lua applications of IoT firmware, several works support the analysis of Lua applications. To the best of our knowledge, TscanCode [43], which was released on GitHub and has 1.7k stars, can perform static analysis of Lua applications. It was developed by Tencent to help programmers identify code defects at the beginning of the design process. However, it was not designed for IoT scenarios, leading to a high false-negative rate when analyzing firmware programs. Semgrep [44] is a fast, open-source static analysis engine for finding bugs, detecting dependency vulnerabilities, and enforcing code standards. Analysts can extract specific code snippets or perform taint analysis on programs using the interface provided by Semgrep. Additionally, Semgrep supports various programming languages, including Go, Java, C, and C++, among others. Owing to its robust capabilities and open nature, Semgrep has been widely adopted and currently has 8.4k stars on GitHub. However, for the Lua language, Semgrep only supports intraprocedural analysis. Semgrep [44] is an important component of the prototype system presented in this paper. Moreover, Semgrep [44] offers an online playground (https://semgrep.dev/playground/ accessed on 20 August 2023) that can help readers understand the code snippets provided in the subsequent sections.

3. Empirical Analysis and Problem Statement

In this section, we first present an empirical analysis of our research object. Then, based on the results, we make a detailed statement of our research problem.

3.1. Empirical Analysis

We conducted an empirical analysis to gain a precise understanding of how the Lua language is used in firmware. First, 552 firmware images from 7 vendors were selected as the initial dataset. We wrote a program to automatically extract the root file system (rootfs) from these firmware images. Then, we traversed each rootfs to identify and record the language type of each script program (Lua, asp, php, etc.). The results regarding the Lua language are recorded in Table 2. The first row represents the brand of the images, and the second row represents the total number of images involved in the empirical analysis. If the number of Lua programs in the file system exceeds 50, we classify an image as special; the third row records the number of special images. The fourth row (%) records the percentage of special images among all surveyed images, while the last row records the average number of Lua programs in the special images. The results presented in Table 2 reveal three facts.
Fact-1: IoT vendors widely use Lua programs in their product development, accounting for up to 56.70% of all surveyed images.
Fact-2: When vendors use Lua for development, it is usually in the form of multiple files.
Fact-3: The use of Lua programs is not limited to a single vendor but spans multiple vendors.
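The survey procedure described above can be sketched in a few lines of Python. This is a minimal sketch, assuming that the rootfs has already been extracted to a directory and that script languages can be recognized by file extension; the extension map and the function names are illustrative assumptions, while the threshold of 50 Lua programs comes from the text.

```python
import os
from collections import Counter

# Assumed mapping from file extension to script language (illustrative only).
SCRIPT_EXTENSIONS = {".lua": "lua", ".asp": "asp", ".php": "php"}
# From the text: an image is "special" if it holds more than 50 Lua programs.
SPECIAL_THRESHOLD = 50


def count_scripts(rootfs: str) -> Counter:
    """Walk an extracted rootfs and count script files per language."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(rootfs):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            lang = SCRIPT_EXTENSIONS.get(ext)
            if lang:
                counts[lang] += 1
    return counts


def is_special(counts: Counter) -> bool:
    """Apply the 'more than 50 Lua programs' rule to one image."""
    return counts["lua"] > SPECIAL_THRESHOLD


def summarize(per_image_counts):
    """Compute the per-vendor rows: total, special, percentage, average."""
    total = len(per_image_counts)
    special = [c for c in per_image_counts if is_special(c)]
    percent = 100.0 * len(special) / total if total else 0.0
    average = sum(c["lua"] for c in special) / len(special) if special else 0.0
    return {
        "total": total,
        "special": len(special),
        "percent_special": percent,
        "avg_lua_in_special": average,
    }
```

Calling `summarize` on the counters of all images from one vendor would yield the four quantities that each column of Table 2 reports, under the assumptions stated above.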

3.2. Research Scope and Research Goals

Fact-1 mentioned above indicates the significant prevalence of Lua program usage in firmware. However, there is no effective approach to detecting vulnerabilities in these programs. Vulnerabilities in web servers are commonly high-risk, and Lua is widely used in IoT development. Consequently, in this work, we aim to strengthen static analysis methods for the identification of taint-style vulnerabilities within the Lua-based web server of firmware. We focus on data-flow analysis. Although no dedicated data-flow solution exists for Lua programs, the theoretical foundation of data-flow analysis has been used extensively in both academic [10,11,45,46] and industrial [47] domains.

3.3. Research Challenge

Based on the facts presented in Section 3.1, designing and implementing a static taint analysis tool for Lua applications in firmware is challenging.
First, the development environment for IoT devices is heavily fragmented, with manufacturers exhibiting preferences for diverse frameworks. Most existing static analysis methods are designed for desktop systems, whose web applications often use popular frameworks. These frameworks have comprehensive documentation, which considerably aids in conducting static analysis. However, these methods cannot be directly ported to firmware. As Fact-3 shows, multiple vendors use Lua to develop IoT devices, and each vendor usually has its own framework without documentation. Consequently, data-flow analysis encounters challenges in pinpointing the initial input source from users.
Second, we need to propose a new interprocedural and cross-file taint-tracking algorithm to compensate for failures of existing methods. As Fact-2 shows, Lua applications in firmware are complicated and usually consist of various files. Therefore, a taint analysis with interprocedural and cross-file support is a prerequisite for accurate vulnerability detection. However, existing tools can only support intraprocedural analysis.
Third, it is necessary to establish vulnerability models in the IoT field to rectify the shortcomings of current approaches. However, this is a formidable undertaking due to the intricate interdependence between software and hardware within IoT devices. Moreover, the web servers of these devices require intricate interactions with back-end programs and underlying system modules.

4. Motivation

In this section, we demonstrate our core insight for solving this problem using motivating examples. After analyzing the Lua-based web servers of multiple vendors, we discovered a common characteristic. In Lua applications, a dispatch module is commonly used to parse request packets from users and direct them to the corresponding handlers for actual processing. Although different vendors may implement similar functionalities differently in web server development, a general convergence can be observed in their dispatch modules. We found two typical approaches to implementing the dispatch module, as illustrated in Listings 1 and 2.
As lines 2–4 of Listing 1 show, the web server in the Xiaomi R3600 router dispatches different service handlers by registering different entries. A service handler is the code snippet that implements the desired work of a single interaction, such as the function setConfigIotDev in Listing 1, which is responsible for configuring the router’s wireless network. We coined the term “registering entries” to describe this type of dispatch method. As for the TP-Link Archer AXE75 router, its web server dispatches different service handlers through the dispatch_tbl function in Listing 2. In the same way, we named this kind of dispatch a “handler table”.
There is a command injection vulnerability at line 21 of Listing 1. However, the existing tool TscanCode [43] fails to find this vulnerability. The root cause is that TscanCode fails to locate the service handler and therefore does not apply precise data-flow analysis. Our insight is therefore to leverage the dispatch module to enhance taint analysis. Specifically, we first locate the dispatch module by analyzing its characteristic features, then extract the service handlers. Afterward, taint analysis is launched starting from the service handlers.
Listing 1. Code snippet from Xiaomi R3600.
Listing 2. Code snippet from TP-Link Archer AXE75.
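To make the two dispatch styles more concrete, the following Python sketch locates handler names in Lua source text with regular expressions. It is a simplified stand-in, not the semgrep-based method used by FLuaScan. The two Lua snippets are hypothetical reconstructions of the general shape of each style (an `entry(..., call("name"), ...)` registration and a `dispatch_tbl` table with `cb` fields); only the identifiers setConfigIotDev and dispatch_tbl come from the text, and everything else is assumed.

```python
import re

# "Registering entries": entry({...}, call("handlerName"), ...) on one line.
REGISTERED = re.compile(r'\bentry\s*\([^\n]*?\bcall\(\s*["\'](\w+)["\']\s*\)')
# "Handler table": fields of the assumed form `cb = handler_name`.
TABLE_CB = re.compile(r'\bcb\s*=\s*([A-Za-z_]\w*)')


def locate_handlers(lua_source: str) -> dict:
    """Map each handler name found in the source to its dispatch style."""
    handlers = {}
    for match in REGISTERED.finditer(lua_source):
        handlers[match.group(1)] = "registering entries"
    # Only look for table callbacks when a dispatch table is present.
    if "dispatch_tbl" in lua_source:
        for match in TABLE_CB.finditer(lua_source):
            handlers.setdefault(match.group(1), "handler table")
    return handlers


# Hypothetical Lua snippet in the "registering entries" style.
ENTRY_STYLE = '''
function index()
    entry({"api", "misystem", "set_config_iotdev"}, call("setConfigIotDev"), (""), 221)
    entry({"api", "misystem", "status"}, call("mainStatus"), (""), 222)
end
'''

# Hypothetical Lua snippet in the "handler table" style.
TABLE_STYLE = '''
local dispatch_tbl = {
    wireless = {
        ["read"] = { cb = get_wireless },
        ["write"] = { cb = set_wireless },
    },
}
'''

entry_handlers = locate_handlers(ENTRY_STYLE)
table_handlers = locate_handlers(TABLE_STYLE)
```

A pattern-based engine such as semgrep matches on the syntax tree rather than on raw text, so it tolerates line breaks and formatting differences that these regular expressions would miss.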

5. Design and Implementation

5.1. Overall Design

Based on the above insights, we designed a prototype system, FLuaScan, which takes IoT firmware as initial input and ultimately outputs suspicious vulnerability points, along with their data flow details. As shown in Figure 1, FLuaScan consists of three tightly coupled modules.
As indicated by ❶ in Figure 1, the handler location module automatically extracts Lua applications from the firmware. Then, FLuaScan utilizes heuristic methods based on the characteristics described in Section 4 to identify the dispatch mode and locate service handlers.
Then, FLuaScan carries out a progressive taint analysis (❷ in Figure 1) starting from these service handlers. During the analysis process, our system divides multiple Lua files into segments and performs taint analysis in a segmented manner.
The final module is the graph-based vulnerability search module (❸ in Figure 1). It records the taint flows from module ❷ and organizes them as a graph based on their relationships. Then, FLuaScan uses graph search algorithms to identify execution paths along which dangerous functions receive user input without sanitization and reports them as potential vulnerability points.

5.2. Handler Extraction

Given an IoT firmware, FLuaScan first extracts rootfs from the image file. FLuaScan subsequently traverses the rootfs, extracting all Lua files and recording their respective paths.
Next, FLuaScan scans each source file using semgrep. By applying the rules presented in Listing 3, the two dispatch methods can be identified. In these rules, $F represents the function name of the handler, and the locations of these functions are recorded. Then, FLuaScan conducts an initial analysis of each handler to identify user inputs and pinpoint the starting point for data-flow analysis. The web server receives user input through the HTTP protocol, so FLuaScan extracts the user-controllable portions of the HTTP request packets to find the taint source. As shown in lines 6 to 8 of Listing 4, three parts are user-controllable: the form data, environment variables, and cookies. To record the user-controllable variables and support the subsequent graph-based search, a novel concept called the analysis node is introduced.
Listing 3. Code snippet to extract service handler utilizing semgrep.
$$N_p : \langle \mathit{attr}, \mathit{pos}, \mathit{type} \rangle \qquad (1)$$
As shown in Equation (1), the analysis node is defined as a 3-tuple consisting of attr, pos, and type. The type field is used to distinguish between different analysis nodes. For user-controlled variables, the type is denoted as I, and the node name is abbreviated as $I_x$ (like I1 in Figure 2). The attr field is used to record additional information. Taking user-controllable variables as an example, the attr field records the function name of the handler. The pos field records the specific position of the node, including the file path and the starting and ending offsets of the variable in the file.
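The 3-tuple of Equation (1) maps naturally onto a small record type. The Python sketch below is one possible representation, not the data structure used in FLuaScan; the class names, the `label` helper, and the example path and offsets are assumptions, and the M, P, and S type codes are the ones introduced in Section 5.3.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    INPUT = "I"        # user-controllable variable
    MIDDLE = "M"       # middle node (Section 5.3)
    PROPAGATION = "P"  # propagation node (Section 5.3)
    SINK = "S"         # sink node (Section 5.3)


@dataclass(frozen=True)
class Position:
    path: str   # file path of the Lua source
    start: int  # starting offset of the variable in the file
    end: int    # ending offset of the variable in the file


@dataclass(frozen=True)
class AnalysisNode:
    attr: str        # extra information, e.g. the handler's function name
    pos: Position
    type: NodeType

    def label(self, index: int) -> str:
        """Short node name such as 'I1', combining type code and index."""
        return f"{self.type.value}{index}"


# Hypothetical example: a user-controllable variable inside a handler.
node = AnalysisNode(
    attr="setConfigIotDev",
    pos=Position(path="controller/api/example.lua", start=120, end=128),
    type=NodeType.INPUT,
)
```

Making the records immutable (`frozen=True`) also makes them hashable, which is convenient when nodes are later used as graph vertices or dictionary keys.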
Listing 4. Code snippet to execute handler analysis utilizing semgrep.

5.3. Progressive Taint Analysis

Once user-controllable variables have been identified, FLuaScan employs taint analysis to expose vulnerabilities. However, semgrep [44] currently supports only intraprocedural analysis, potentially leading to false positives in vulnerability detection. To overcome this issue, we designed a progressive taint analysis engine. This engine divides multiple Lua files into segments and conducts taint analysis on each segment. It also records taint propagation across various functions and files to establish their relationships. FLuaScan initially employs the ctags [48] syntax analysis tool to parse each Lua file and assigns a distinct identifier to each object. Subsequently, this identifier is used for correlation analysis. Next, as shown in Figure 2, FLuaScan performs three rounds of taint analysis on all files: (a) handler analysis, (b) taint propagation, and (c) sink analysis.
During the (a) handler analysis process, FLuaScan aims to identify the data flow from user input to middle nodes. Middle nodes are defined as analysis nodes that have the potential to facilitate taint propagation between different functions. These nodes can be categorized into two types: (1) invocation of subfunctions, during which taint can propagate with the transfer of parameters, and (2) database write operations, during which taint can propagate after being written to a database and then retrieved again. The type field of the middle node is set as M (like M2 in Figure 2). The semgrep configuration used to accomplish this task is illustrated in Listing 4. We employed the taint mode of semgrep and designated the three user-controllable parts of the HTTP protocol as the sources of tainted data. As for the sink points, in Listing 4, lines 11–17 aim to find the invocation of subfunctions, while line 18 aims to find the database write operations. In Listing 4, lines 21–22 show the design of sanitizers, which fall into two categories: (1) vendor-customized security filtering functions, which perform security checks and filtering on variable content, and (2) string conversion functions, such as tonumber, after which the user input can no longer trigger vulnerabilities.
During the (b) taint propagation process, FLuaScan aims to identify data flows within the invoked subfunctions. The semgrep configuration used to achieve this task is depicted in Listing 5. The taint mode of semgrep is used again, and the configuration of sinks and sanitizers remains unchanged. However, the taint sources are changed and consist of two types: (1) parameters passed when a subfunction is called (lines 5–10 in Listing 5) and (2) data retrieved from the database (line 12 in Listing 5). The taint sources from subfunctions are defined as propagation nodes (like P1 in Figure 2), with their type field set to P.
Listing 5. Code snippet to execute taint propagation analysis utilizing semgrep.
There is ample evidence to suggest that the incorrect utilization of vulnerable functions serves as the root cause of numerous vulnerabilities. Hence, precise modeling of vulnerable functions is imperative to detect vulnerabilities effectively. Current methods disregard the specificities of IoT scenarios, only perceiving a limited number of sensitive system APIs as vulnerable. During the development process, it has been observed that numerous manufacturers tend to encapsulate low-level system APIs into higher-level custom APIs. Overlooking these interfaces can lead to erroneous identification of vulnerabilities. Hence, we decided to identify these custom APIs and continue to employ taint analysis with semgrep to accomplish this goal. In the specific configuration shown in Listing 6, the tainted sources are defined as the function’s parameters, while the underlying foundational command execution functions (like io.popen and os.execute) are designated as the sink points. Finally, FLuaScan discovered many custom APIs that are specific to various vendors, such as XQFunction.forkExec, LuciUtil.exec, and subprocess.call. Furthermore, these sensitive APIs are defined as sink nodes (like S1 in Figure 2).
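The idea of promoting wrapper functions to sinks can be sketched as a fixed-point computation. The Python sketch below is a deliberately simplified stand-in for the semgrep taint rules of Listing 6: it assumes that each function has already been summarized as a parameter list plus a list of calls with their argument names, it only recognizes parameters passed directly as arguments, and it also follows wrappers of wrappers, which is an extension beyond what the text states. The function summaries are hypothetical; only the API names io.popen, os.execute, and XQFunction.forkExec come from the text.

```python
# Low-level command execution functions named in the text.
BASE_SINKS = frozenset({"os.execute", "io.popen"})


def find_custom_sinks(functions: dict, base_sinks=BASE_SINKS) -> set:
    """Return functions that pass one of their parameters to a known sink.

    `functions` maps a function name to a summary of the assumed form
    {"params": [...], "calls": [(callee, [argument names]), ...]}.
    """
    sinks = set(base_sinks)
    changed = True
    while changed:  # repeat until no new wrapper is discovered
        changed = False
        for name, info in functions.items():
            if name in sinks:
                continue
            params = set(info["params"])
            for callee, args in info["calls"]:
                if callee in sinks and params & set(args):
                    sinks.add(name)
                    changed = True
                    break
    return sinks - set(base_sinks)


# Hypothetical function summaries for illustration.
SUMMARIES = {
    "XQFunction.forkExec": {
        "params": ["command"],
        "calls": [("os.execute", ["command"])],
    },
    "restart_service": {
        "params": ["name"],
        "calls": [("XQFunction.forkExec", ["name"])],
    },
    "log_message": {
        "params": ["msg"],
        "calls": [("print", ["msg"])],
    },
}

custom_sinks = find_custom_sinks(SUMMARIES)
```

A real taint engine would also track assignments and string concatenation between the parameter and the call, which this summary-based sketch leaves out.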
Listing 6. Code snippet to locate custom APIs utilizing semgrep.
Once these custom APIs are identified, the (c) sink analysis process aims to identify the incorrect usage of vulnerable functions. Unsafe usage can be classified into two types: direct propagation, which involves the direct propagation of user inputs to vulnerable APIs within a function, and indirect propagation, which refers to the propagation of user inputs across functions via middle nodes to access vulnerable APIs. FLuaScan utilizes semgrep to achieve this objective with the configuration presented in Listing 7.
Listing 7. Code snippet to execute sink analysis utilizing semgrep.
In the taint source setup, lines 24–27 in Listing 7 cover direct propagation, while lines 17–22 cover indirect propagation. In the sink point setup, both the previously discovered custom vendor APIs and common sensitive APIs are taken into account.

5.4. Graph-Based Vulnerability Search

Upon completion of progressive taint analysis, the details of taint propagation between distinct snippets were elucidated. However, vulnerability mining requires finding a complete path from the input to the sink point. Therefore, FLuaScan proposes a graph-based vulnerability search method. Drawing on the previously mentioned concept of analysis nodes, vulnerability detection is redefined as a graph search problem. Specifically, our initial step involves utilizing the Neo4j database to store all nodes that have been analyzed in prior stages. Then, FLuaScan creates three labels (INode, MNode, PNode, and SNode) in the Neo4j database corresponding to three different types of nodes. FLuaScan inserts INode and MNode into the Neo4j database, creating the internal taint relationships, based on the result of the handler analysis process. Subsequently, based on the results shown in (b) and (c), the relationships between PNode and MNode, as well as between PNode and SNode, are processed in a similar manner. The necessary condition for the existence of a vulnerability is to find a path from INode to SNode. Currently, finding such a path is not possible due to our limitation in obtaining only the relationships of direct contamination within different code snippets. The analysis of subfunctions is essential to connect these distinct code snippets into a whole path. During the progressive taint analysis, in the (a) process, the invocation of subfunctions acts as the sink point and is represented as MNodes in the Neo4j database. In contrast, in the (b) and (c) processes, the parameters of subfunctions function as taint sources and are represented as PNodes in the Neo4j database. FLuaScan examines the pos attributes of all MNodes and PNodes. A relationship between the MNode and the PNode is established if they pertain to the same subfunction. Finally, FLuaScan employs a Cypher query to search for paths that start from the INode, traverse multiple MNodes, and end at the SNode. 
Details of the Cypher query are shown in Listing 8.
Listing 8. Cypher query to detect vulnerability.
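The linking and search steps described above can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the Cypher query of Listing 8 and not FLuaScan's implementation: the node identifiers, the pos strings, and the function names link_calls_to_params and find_vulnerable_paths are all hypothetical, and pos is assumed to identify the subfunction that a node refers to.

```python
from collections import defaultdict

# Hypothetical analysis nodes: id -> (label, pos). The pos strings are
# illustrative stand-ins for whatever identifies a subfunction.
NODES = {
    "i1": ("INode", "handler.lua:set_name"),
    "m1": ("MNode", "util.lua:save_name"),  # call to save_name inside the handler
    "p1": ("PNode", "util.lua:save_name"),  # parameter of save_name
    "s1": ("SNode", "util.lua:save_name"),  # sensitive call inside save_name
    "m2": ("MNode", "util.lua:log_name"),   # call whose callee has no PNode
}

# Direct taint relationships found inside individual snippets.
INTRA_EDGES = [("i1", "m1"), ("i1", "m2"), ("p1", "s1")]


def link_calls_to_params(nodes):
    """Connect each MNode to every PNode that refers to the same subfunction."""
    params = defaultdict(list)
    for node_id, (label, pos) in nodes.items():
        if label == "PNode":
            params[pos].append(node_id)
    return [
        (node_id, param_id)
        for node_id, (label, pos) in nodes.items()
        if label == "MNode"
        for param_id in params.get(pos, [])
    ]


def find_vulnerable_paths(nodes, edges):
    """Return every simple path that starts at an INode and ends at an SNode."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    paths = []

    def walk(current, path):
        if nodes[current][0] == "SNode":
            paths.append(path)
            return
        for nxt in adjacency[current]:
            if nxt not in path:  # skip nodes already on the path to avoid cycles
                walk(nxt, path + [nxt])

    for node_id, (label, _) in nodes.items():
        if label == "INode":
            walk(node_id, [node_id])
    return paths


ALL_EDGES = INTRA_EDGES + link_calls_to_params(NODES)
PATHS = find_vulnerable_paths(NODES, ALL_EDGES)
print(PATHS)
```

Running the search on INTRA_EDGES alone returns no path, which mirrors the observation that isolated snippets cannot form a complete route from input to sink until the MNode-to-PNode links are added.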
To assist users in manual analysis, FLuaScan provides an interactive web page displaying all identified suspicious vulnerability points. Figure 3 (raw figure at https://sm.ms/image/L8VRorXCgnUSefk accessed on 20 August 2023) shows a screenshot of the web page, which represents the paths that meet the vulnerability requirements.
The purple, red, blue, and yellow nodes represent INodes, MNodes, PNodes, and SNodes, respectively. In addition to this web page, FLuaScan includes a VSCode plugin that assists in manual analysis. With this plugin, comments are automatically added at the positions of analysis nodes and displayed with different highlight colors according to their type.
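The comment-insertion idea can be illustrated with a small Python sketch. It is only an assumption about how such annotation might work, not the plugin's actual code: the annotate function, the "-- [FLuaScan] ..." comment format, and the sample Lua snippet are all hypothetical.

```python
def annotate(source, nodes):
    """Insert a Lua comment above each 1-based line number listed in nodes.

    nodes is a list of (line_number, node_type) pairs, e.g. (2, "SNode").
    """
    lines = source.splitlines()
    # Work from the bottom up so earlier line numbers stay valid after inserts.
    for line_no, node_type in sorted(nodes, reverse=True):
        target = lines[line_no - 1]
        indent = target[: len(target) - len(target.lstrip())]
        lines.insert(line_no - 1, f"{indent}-- [FLuaScan] {node_type}")
    return "\n".join(lines)


# Hypothetical two-line Lua handler: user input flows into a shell command.
SAMPLE = 'local name = http.formvalue("name")\nos.execute("echo " .. name)'
ANNOTATED = annotate(SAMPLE, [(1, "INode"), (2, "SNode")])
print(ANNOTATED)
```

An editor extension could then map each node type in these comments to a highlight color.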

6. Experiment

To ensure thorough testing of FLuaScan, three evaluation questions are proposed.
Q1: Correctness: What is the number of true positives generated by FLuaScan? (Section 6.1);
Q2: Effectiveness: Can FLuaScan detect vulnerabilities in real-world devices? (Section 6.2);
Q3: Practicability: What are the time overhead and the actual user experience of FLuaScan? (Section 6.3).
To bolster the credibility of the experiment, we curated a diverse test set of devices from various vendors. As shown in Table 3, the test set comprises firmware from 13 devices obtained from 7 distinct vendors. The final column records the number of Lua files present in each firmware image. The test set has an average of 147 Lua files per image, which satisfies the multifile detection requirement of the research objective. To evaluate the advancement of FLuaScan, we also compared it with the current state-of-the-art tool TscanCode in the experiment.
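The vendor count and the average quoted above can be checked directly against the rows of Table 3. The short Python sketch below does only that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# (brand, device, number of Lua files), copied from Table 3.
TEST_SET = [
    ("xiaomi", "r3a", 202),
    ("xiaomi", "r3600", 209),
    ("xiaomi", "r4c", 207),
    ("tplink", "TL-WR1043N", 140),
    ("tplink", "Archer-C1900", 156),
    ("tplink", "Archer-AXE75", 215),
    ("zyxel", "NBG6716", 69),
    ("zyxel", "NBG6604", 117),
    ("linksys", "EA6350V2", 137),
    ("totolink", "A6000R", 140),
    ("totolink", "X5000R", 114),
    ("trendnet", "TEW-821DAP", 115),
    ("netgear", "EX8000", 91),
]

vendor_count = len({brand for brand, _, _ in TEST_SET})
average_lua_files = mean(count for _, _, count in TEST_SET)

print(f"{len(TEST_SET)} images, {vendor_count} vendors, "
      f"{average_lua_files:.2f} Lua files per image on average")
```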

6.1. Correctness

We ran vulnerability detection on each firmware image in the test set using both FLuaScan and TscanCode. Each tool outputs a set of alerts, and three expert vulnerability hunters then conducted a second confirmation of these suspicious vulnerability points. If the experts determine that the danger can be excluded, the alert is classified as a false positive (FP); otherwise, it is classified as a true positive (TP). The detailed experimental results are recorded in Table 4, where the “alerts”, “TP”, and “FP” columns represent the total number of potential vulnerability points identified by the tool, the number of confirmed vulnerabilities after secondary verification, and the number of false positives, respectively. In addition, the “Handlers” column records the number of service handlers identified by FLuaScan from Lua applications.
As shown in Table 4, TscanCode raises significantly more alerts than FLuaScan, yet none of them corresponds to an actual vulnerability. FLuaScan achieved an overall TP rate of 42.50% on this test set (34 confirmed vulnerabilities out of 80 alerts). This result can be attributed to FLuaScan's precise extraction of handlers, which TscanCode does not perform and which leaves it unable to identify vulnerabilities accurately. For the ninth firmware, FLuaScan was unable to locate the handler that processes user input, so no vulnerabilities were discovered.
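The 42.50% figure follows from the Alerts and TP columns of Table 4 when the counts are pooled over all 13 images. The Python sketch below reproduces that arithmetic; the tuple layout and variable names are illustrative.

```python
# (device, FLuaScan alerts, FLuaScan TP, TscanCode alerts, TscanCode TP),
# copied from Table 4.
RESULTS = [
    ("r3a", 16, 5, 149, 0),
    ("r3600", 17, 6, 179, 0),
    ("r4c", 10, 5, 166, 0),
    ("TL-WR1043N", 3, 1, 141, 0),
    ("Archer-C1900", 2, 1, 138, 0),
    ("Archer-AXE75", 2, 1, 297, 0),
    ("NBG6716", 3, 2, 90, 0),
    ("NBG6604", 3, 2, 118, 0),
    ("EA6350V2", 0, 0, 134, 0),
    ("A6000R", 7, 3, 152, 0),
    ("X5000R", 7, 3, 163, 0),
    ("TEW-821DAP", 2, 1, 122, 0),
    ("EX8000", 8, 4, 96, 0),
]

flua_alerts = sum(row[1] for row in RESULTS)
flua_tp = sum(row[2] for row in RESULTS)
tscan_alerts = sum(row[3] for row in RESULTS)
tscan_tp = sum(row[4] for row in RESULTS)

# Pooled TP rate: confirmed vulnerabilities divided by all alerts raised.
flua_tp_rate = flua_tp / flua_alerts
tscan_tp_rate = tscan_tp / tscan_alerts

print(f"FLuaScan:  {flua_tp}/{flua_alerts} = {flua_tp_rate:.2%}")
print(f"TscanCode: {tscan_tp}/{tscan_alerts} = {tscan_tp_rate:.2%}")
```

Pooling the counts avoids dividing by zero for the ninth image, where FLuaScan raised no alerts at all.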

6.2. Effectiveness

After organizing all the suspicious vulnerabilities identified by FLuaScan, we ultimately discovered 21 vulnerabilities, of which 7 were previously unknown. Table 5 summarizes the information about these vulnerabilities. The third, fourth, and fifth columns give the affected devices, the names of the functions where the vulnerabilities are located, and the types of vulnerabilities, respectively. The vulnerabilities FLuaScan found fall into two types: command injection (CI) and SQL injection (SQLi). The sixth column (MNode) gives the number of middle nodes that the vulnerability traverses on its way from user input to the sensitive function. Each middle node signifies a cross-file invocation. The results show that the majority of vulnerabilities (15 of the 21) involve cross-file invocations, highlighting the necessity of cross-file static analysis, which the current state-of-the-art tool TscanCode is unable to perform. The seventh column indicates whether the vulnerability was already known, and the last column records its severity level. The results highlight two distinct characteristics. First, a vulnerability can affect multiple products from the same vendor. Second, the vulnerabilities we uncovered carry a significant level of risk, all falling into the moderate to high severity categories. These two characteristics are similar to those summarized in other literature reports, which suggests that our findings are representative.
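The counts discussed above can be tallied from the Type, MNode, and 0 Day columns of Table 5. The Python sketch below is a simple cross-check; the tuple layout and variable names are illustrative.

```python
from collections import Counter

# (function, type, MNode count, previously known?), copied from Table 5.
VULNS = [
    ("setconfigIoTDev", "CI", 0, True),
    ("wifi_access", "CI", 2, True),
    ("getThumb", "CI", 1, True),
    ("setSyncRouterFile", "CI", 2, False),
    ("setDeviceNickName", "SQLi", 3, False),
    ("updateDeviceNickname", "SQLi", 3, False),
    ("backupData", "CI", 3, True),
    ("set_country", "CI", 3, True),
    ("vpn_user_update", "CI", 0, True),
    ("multiple_ssid", "CI", 1, True),
    ("multiple_ssid5G", "CI", 1, True),
    ("wlanwpsstation", "CI", 1, True),
    ("action_wireless", "CI", 1, True),
    ("enable_wsh", "CI", 0, True),
    ("action_reboot", "CI", 0, False),
    ("reset_wifi", "CI", 0, True),
    ("iface_shutdown", "CI", 1, False),
    ("iface_reconnect", "CI", 1, False),
    ("iface_delete", "CI", 1, False),
    ("action_bandwidth", "CI", 1, True),   # netgear EX8000
    ("action_bandwidth", "CI", 0, True),   # trendnet TEW-821DAP
]

by_type = Counter(vuln_type for _, vuln_type, _, _ in VULNS)
cross_file = sum(1 for _, _, mnodes, _ in VULNS if mnodes > 0)
unknown = sum(1 for _, _, _, known in VULNS if not known)

print(f"{len(VULNS)} vulnerabilities: {dict(by_type)}")
print(f"{cross_file} traverse at least one middle node, {unknown} previously unknown")
```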

6.3. Practicality

To evaluate the practicality of FLuaScan, we combined quantitative and qualitative evaluation. In quantitative terms, we separately recorded the runtime of FLuaScan and TscanCode when analyzing the same image. Figure 4 presents these results as bar and line charts. The brown and green bars represent the runtime of FLuaScan and TscanCode, respectively, while the blue line represents the number of Lua files.
As shown in Figure 4, within the same brand, the runtime of both TscanCode and FLuaScan increases with the number of Lua files. For any given image, however, FLuaScan always takes more time than TscanCode: FLuaScan has an average runtime of 25.16 s, while TscanCode has an average runtime of 1.51 s. The primary factor behind FLuaScan’s higher time overhead is its greater analytical depth, together with the extra database writing and analysis operations. Even so, on average, FLuaScan completes vulnerability detection for a real-world device image in less than half a minute. Considering the higher accuracy of FLuaScan, we believe that this time overhead is acceptable. Since security analysts typically need to manually verify the results provided by automated tools, it is also essential for a tool to support manual code auditing.
We also compared the usage modes of TscanCode and FLuaScan in qualitative terms. The output of TscanCode is relatively simple: it includes only the locations of the issues and is separated from the original code, which makes it difficult to use directly in manual analysis. FLuaScan offers two visual output options: (1) interactive graphs that display the specific propagation of vulnerabilities and tainted data flow, as shown in Figure 3, and (2) highlighting of sensitive snippets of vulnerable code according to their attributes with the assistance of the VSCode plugin, as shown in Figure 5. In conclusion, FLuaScan’s visual output options and acceptable time cost make it more practical than TscanCode.
Based on the above results, we summarize the differences between FLuaScan and TscanCode regarding correctness, effectiveness, and practicability in tabular form. As shown in Table 6, compared with the existing tool TscanCode, FLuaScan performs better in terms of accuracy, effectiveness, and practicality.

7. Discussion

This section examines the constraints of our methodology and outlines potential directions for future development. During operation, FLuaScan leverages existing software tools such as Semgrep and Neo4j. The methodology itself, however, is new: to the best of our knowledge, no method described in the existing literature can do the same. At present, FLuaScan supports only the Lua programming language. However, the fundamental ideas about vulnerability mining discussed in this paper are applicable to various programming languages. Furthermore, Semgrep supports multiple programming languages, which should ease porting to other scripting languages. Through surveys, we found that many IoT manufacturers also employ languages such as ASP and PHP in their product development. Therefore, enhancing FLuaScan to support multiple scripting languages is our next research direction. Meanwhile, some manufacturers package compiled Lua bytecode in their firmware rather than Lua source files. The absence of symbol information in this scenario poses numerous challenges for static analysis. A further step of our research is to adapt FLuaScan to Lua programs for which no source is available.

8. Conclusions

In conclusion, this paper provides a thorough study of the security concerns associated with Lua applications in IoT firmware. It introduces a progressive taint analysis method that precisely identifies service handlers, performs segmented taint analysis, and utilizes graph search to identify vulnerability paths. Furthermore, we developed a prototype system, FLuaScan, and conducted experiments that demonstrate its superiority over the current state-of-the-art tool TscanCode in terms of correctness, effectiveness, and practicality. FLuaScan discovered 21 vulnerabilities during the experiment, of which 7 were previously unknown. The work presented in this paper provides a solid foundation for future research on vulnerability detection in programs written in other scripting languages in IoT firmware.

Author Contributions

Conceptualization, X.L. and Z.W.; Methodology, X.L., W.G. and Q.W.; Software, X.L. and Q.W.; Validation, W.G.; Writing—original draft, X.L.; Writing—review & editing, X.L. and W.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The related data are available from the corresponding authors upon request by email.

Conflicts of Interest

The authors declare that they have no conflict of interest and no competing financial interest or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. IoT Analytics. State of IoT 2023: Number of Connected IoT Devices Growing 16% to 16.7 Billion Globally. 2023. Available online: https://iot-analytics.com/number-connected-iot-devices/ (accessed on 20 August 2023).
  2. Antonakakis, M.; April, T.; Bailey, M.; Bernhard, M.; Bursztein, E.; Cochran, J.; Durumeric, Z.; Halderman, J.A.; Invernizzi, L.; Kallitsis, M.; et al. Understanding the mirai botnet. In Proceedings of the 26th USENIX Security Symposium, USENIX Security 2017, Vancouver, BC, Canada, 16–18 August 2017; Kirda, E., Ristenpart, T., Eds.; USENIX Association: Boston, MA, USA, 2017; pp. 1093–1110.
  3. Microsoft. Microsoft Digital Defense Report 2022. In Technical Report; 2023; Available online: https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE5bUvv (accessed on 20 August 2023).
  4. Deng, S.; Zhao, H.; Fang, W.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Edge Intelligence: The Confluence of Edge Computing and Artificial Intelligence. IEEE Internet Things J. 2020, 7, 7457–7469.
  5. Khan, M.N.; Rahman, H.U.; Khan, M.Z.; Mehmood, G.; Sulaiman, A.; Shaikh, A.; Alqhatani, A. Energy-Efficient Dynamic and Adaptive State-Based Scheduling (EDASS) Scheme for Wireless Sensor Networks. IEEE Sens. J. 2022, 22, 12386–12403.
  6. Hameed, M.; Yang, F.; Ghafoor, M.I.; Jaskani, F.H.; Islam, U.; Fayaz, M.; Mehmood, G. IOTA-Based Mobile Crowd Sensing: Detection of Fake Sensing Using Logit-Boosted Machine Learning Algorithms. Wirel. Commun. Mob. Comput. 2022, 2022, e6274114.
  7. Mian, T.; Choudhary, A.; Fatima, S.; Panigrahi, B.K. Artificial intelligence of things based approach for anomaly detection in rotating machines. Comput. Electr. Eng. 2023, 109, 108760.
  8. Chang, Z.; Liu, S.; Xiong, X.; Cai, Z.; Tu, G. A Survey of Recent Advances in Edge-Computing-Powered Artificial Intelligence of Things. IEEE Internet Things J. 2021, 8, 13849–13875.
  9. Stergiou, C.; Psannis, K.E.; Gupta, B.B.; Ishibashi, Y. Security, privacy & efficiency of sustainable Cloud Computing for Big Data & IoT. Sustain. Comput. Inform. Syst. 2018, 19, 174–184.
  10. Chen, L.; Wang, Y.; Cai, Q.; Zhan, Y.; Hu, H.; Linghu, J.; Hou, Q.; Zhang, C.; Duan, H.; Xue, Z. Sharing More and Checking Less: Leveraging Common Input Keywords to Detect Bugs in Embedded Systems. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual Event, 11–13 August 2021; pp. 303–319.
  11. Redini, N.; Machiry, A.; Wang, R.; Spensky, C.; Continella, A.; Shoshitaishvili, Y.; Kruegel, C.; Vigna, G. Karonte: Detecting Insecure Multi-binary Interactions in Embedded Firmware. In Proceedings of the IEEE Symposium on Security & Privacy (S&P), San Francisco, CA, USA, 18–21 May 2020; pp. 1544–1561.
  12. Zheng, Y.; Davanian, A.; Yin, H.; Song, C.; Zhu, H.; Sun, L. FIRM-AFL: High-throughput greybox fuzzing of iot firmware via augmented process emulation. In Proceedings of the 28th USENIX Security Symposium, Santa Clara, CA, USA, 14–16 August 2019; pp. 1099–1114.
  13. Zhang, C.; Wang, Y.; Wang, L. Firmware Fuzzing: The State of the Art. In Proceedings of the 12th Asia-Pacific Symposium on Internetware, Singapore, 1–3 November 2020; Internetware’20. Association for Computing Machinery: New York, NY, USA, 2020; pp. 110–115.
  14. Feng, X.; Sun, R.; Zhu, X.; Xue, M.; Wen, S.; Liu, D.; Nepal, S.; Xiang, Y. Snipuzz: Black-box Fuzzing of IoT Firmware via Message Snippet Inference. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 15–19 November 2021; CCS’21. Association for Computing Machinery: New York, NY, USA, 2021; Volume 1.
  15. Xie, W.; Chen, J.; Wang, Z.; Feng, C.; Wang, E.; Gao, Y.; Wang, B.; Lu, K. Game of Hide-and-Seek: Exposing Hidden Interfaces in Embedded Web Applications of IoT Devices. In Proceedings of the ACM Web Conference, New York, NY, USA, 25 April 2022; WWW ’22. Association for Computing Machinery: New York, NY, USA, 2022; pp. 524–532.
  16. Schwartz, E.J.; Avgerinos, T.; Brumley, D. All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask). In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; SP ’10. pp. 317–331.
  17. Cheng, K.; Liu, T.; Guan, L.; Liu, P.; Li, H.; Zhu, H.; Sun, L. Finding Taint-Style Vulnerabilities in Linux-based Embedded Firmware with SSE-based Alias Analysis. arXiv 2022, arXiv:2109.12209.
  18. Celik, Z.B.; Babun, L.; Sikder, A.K.; Aksu, H.; Tan, G.; McDaniel, P.D.; Uluagac, A.S. Sensitive information tracking in commodity IoT. In Proceedings of the 27th USENIX Security Symposium, USENIX Security 2018, Baltimore, MD, USA, 15–17 August 2018; Enck, W., Felt, A.P., Eds.; USENIX’18; USENIX Association: Berkeley, CA, USA, 2018; pp. 1687–1704.
  19. Cheng, K.; Li, Q.; Wang, L.; Chen, Q.; Zheng, Y.; Sun, L.; Liang, Z. DTaint: Detecting the Taint-Style vulnerability in embedded device firmware. In Proceedings of the 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Luxembourg, 25–28 June 2018; DSN’18. pp. 430–441, ISSN 2158-3927.
  20. Zhang, L.; Chen, J.; Diao, W.; Guo, S.; Weng, J.; Zhang, K. CryptoREX: Large-Scale Analysis of Cryptographic Misuse in IoT Devices; USENIX Association: Berkeley, CA, USA, 2019; pp. 151–164.
  21. Costin, A.; Zarras, A.; Francillon, A. Automated Dynamic Firmware Analysis at Scale: A Case Study on Embedded Web Interfaces. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, Xi’an, China, 30 May–3 June 2016; ASIA CCS ’16. Association for Computing Machinery: New York, NY, USA, 2016; pp. 437–448.
  22. Thomas, S.L.; Chothia, T.; Garcia, F.D. Stringer: Measuring the Importance of Static Data Comparisons to Detect Backdoors and Undocumented Functionality. In Proceedings of the Computer Security-ESORICS 2017-22nd European Symposium on Research in Computer Security, Oslo, Norway, 11–15 September 2017; Proceedings, Part II; Foley, S.N., Gollmann, D., Snekkenes, E., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10493, pp. 513–531.
  23. Shoshitaishvili, Y.; Wang, R.; Hauser, C.; Kruegel, C.; Vigna, G. Firmalice-Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware. In Proceedings of the 22nd Annual Network and Distributed System Security Symposium, NDSS 2015, San Diego, CA, USA, 8–11 February 2015.
  24. Chen, D.D.; Woo, M.; Brumley, D.; Egele, M. Towards Automated Dynamic Analysis for Linux-based Embedded Firmware. In Proceedings of the 23rd Annual Network and Distributed System Security Symposium, NDSS 2016, San Diego, CA, USA, 21–24 February 2016.
  25. Kim, M.; Kim, D.; Kim, E.; Kim, S.; Jang, Y.; Kim, Y. FirmAE: Towards Large-Scale Emulation of IoT Firmware for Dynamic Analysis. In Proceedings of the Annual Computer Security Applications Conference, Austin, TX, USA, 7–11 December 2020; ACSAC ’20. Association for Computing Machinery: New York, NY, USA, 2020; pp. 733–745.
  26. Zaddach, J.; Bruno, L.; Francillon, A.; Balzarotti, D. AVATAR: A Framework to Support Dynamic Security Analysis of Embedded Systems’ Firmwares. In Proceedings of the 21st Annual Network and Distributed System Security Symposium, NDSS 2014, San Diego, CA, USA, 23–26 February 2014.
  27. Chen, J.; Diao, W.; Zhao, Q.; Zuo, C.; Lin, Z.; Wang, X.; Lau, W.C.; Sun, M.; Yang, R.; Zhang, K. IoTFuzzer: Discovering Memory Corruptions in IoT Through App-based Fuzzing. In Proceedings of the 2018 Network and Distributed System Security Symposium, San Diego, CA, USA, 18–21 February 2018.
  28. Wang, Z.; Zhang, Y.; Liu, Q. RPFuzzer: A Framework for Discovering Router Protocols Vulnerabilities Based on Fuzzing. KSII Trans. Internet Inf. Syst. 2013, 7, 1989–2009.
  29. Zhang, Y.; Huo, W.; Jian, K.; Shi, J.; Lu, H.; Liu, L.; Wang, C.; Sun, D.; Zhang, C.; Liu, B. SRFuzzer: An automatic fuzzing framework for physical SOHO router devices to discover multi-type vulnerabilities. In Proceedings of the 35th Annual Computer Security Applications Conference, ACSAC 2019, San Juan, PR, USA, 9–13 December 2019; ACSAC’19; Balenson, D., Ed.; ACM: New York, NY, USA, 2019; pp. 544–556.
  30. Redini, N.; Continella, A.; Das, D.; De Pasquale, G.; Spahn, N.; Machiry, A.; Bianchi, A.; Kruegel, C.; Vigna, G. Diane: Identifying Fuzzing Triggers in Apps to Generate Under-constrained Inputs for IoT Devices. In Proceedings of the 42nd IEEE Symposium on Security and Privacy, SP 2021, San Francisco, CA, USA, 24–27 May 2021; Volume 2021, pp. 484–500, ISSN 10816011.
  31. Srivastava, P.; Peng, H.; Li, J.; Okhravi, H.; Shrobe, H.; Payer, M. FirmFuzz: Automated IoT firmware introspection and analysis. In Proceedings of the 2nd International ACM Workshop on Security and Privacy for the Internet-of-Things, London, UK, 15 November 2019; pp. 15–21.
  32. Li, X.; Wei, Q.; Wu, Z.; Guo, W. A comprehensive survey of vulnerability detection method towards Linux-based IoT devices. In Proceedings of the 2023 2nd International Conference on Networks, Communications and Information Technology, Xi’an, China, 7–9 June 2023; CNCIT ’23. Association for Computing Machinery: New York, NY, USA, 2023; pp. 35–41.
  33. Muench, M.; Stijohann, J.; Kargl, F.; Francillon, A.; Balzarotti, D. What You Corrupt Is Not What You Crash: Challenges in Fuzzing Embedded Devices. In Proceedings of the 2018 Network and Distributed System Security Symposium, San Diego, CA, USA, 18–21 February 2018. NDSS’18.
  34. Fasano, A.; Ballo, T.; Muench, M.; Leek, T.; Bulekov, A.; Dolan-Gavitt, B.; Egele, M.; Francillon, A.; Lu, L.; Gregory, N. SoK: Enabling Security Analyses of Embedded Systems via Rehosting. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, Hong Kong, China, 7–11 June 2021; pp. 687–701.
  35. Clements, A.A.; Gustafson, E.; Scharnowski, T.; Grosen, P.; Fritz, D.; Kruegel, C.; Vigna, G.; Bagchi, S.; Payer, M. HALucinator: Firmware Re-hosting Through Abstraction Layer Emulation. In Proceedings of the 29th USENIX Security Symposium, USENIX Security 2020, Boston, MA, USA, 12–14 August 2020; USENIX’20; Capkun, S., Roesner, F., Eds.; USENIX Association: Boston, MA, USA, 2020; pp. 1201–1218.
  36. Scharnowski, T.; Bars, N.; Schloegel, M.; Gustafson, E.; Muench, M.; Vigna, G.; Kruegel, C.; Holz, T.; Abbasi, A. Fuzzware: Using precise MMIO modeling for effective firmware fuzzing. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; Security’22. USENIX Association: Boston, MA, USA, 2022.
  37. Zhou, W.; Guan, L.; Liu, P.; Zhang, Y. Automatic firmware emulation through invalidity-guided knowledge inference. In Proceedings of the 30th USENIX Security Symposium, USENIX Security 2021, Vancouver, BC, Canada, 11–13 August 2021; USENIX’21; Bailey, M., Greenstadt, R., Eds.; USENIX Association: Boston, MA, USA, 2021; pp. 2007–2024.
  38. Feng, B.; Mera, A.; Lu, L. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In Proceedings of the 29th USENIX Security Symposium, USENIX Security 2020, Boston, MA, USA, 12–14 August 2020; Capkun, S., Roesner, F., Eds.; USENIX Association: Boston, MA, USA, 2020; pp. 1237–1254.
  39. Gustafson, E.; Muench, M.; Spensky, C.; Redini, N.; Machiry, A.; Fratantonio, Y.; Balzarotti, D.; Francillon, A.; Choe, Y.R.; Kruegel, C.; et al. Toward the Analysis of Embedded Firmware through Automated Re-Hosting. In Proceedings of the 22nd International Symposium on Research in Attacks, Intrusions and Defenses, RAID 2019, Beijing, China, 23–25 September 2019; pp. 135–150.
  40. Qin, C.; Peng, J.; Liu, P.; Zheng, Y.; Cheng, K.; Zhang, W.; Sun, L. UCRF: Static analyzing firmware to generate under-constrained seed for fuzzing SOHO router. Comput. Secur. 2023, 128, 103157.
  41. Chen, S.; Lin, Z.; Zhang, Y. SelectiveTaint: Efficient data flow tracking with static binary rewriting. In Proceedings of the 30th USENIX Security Symposium, USENIX Security 2021, Vancouver, BC, Canada, 11–13 August 2021; Bailey, M., Greenstadt, R., Eds.; USENIX Association: Boston, MA, USA, 2021; pp. 1665–1682.
  42. Luo, C.; Li, P.; Meng, W. TChecker: Precise static inter-procedural analysis for detecting taint-style vulnerabilities in PHP applications. In Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, CCS 2022, Los Angeles, CA, USA, 7–11 November 2022; Yin, H., Stavrou, A., Cremers, C., Shi, E., Eds.; ACM: New York, NY, USA, 2022; pp. 2175–2188.
  43. Tencent. TscanCode: A Static Code Analyzer for C++, C#, Lua. Available online: https://github.com/Tencent/TscanCode (accessed on 20 August 2023).
  44. Semgrep—Find Bugs and Enforce Code Standards. Available online: https://semgrep.dev/ (accessed on 20 August 2023).
  45. Medeiros, I.; Neves, N.F.; Correia, M. DEKANT: A static analysis tool that learns to detect web application vulnerabilities. In Proceedings of the 25th International Symposium on Software Testing and Analysis, ISSTA 2016, Saarbrücken, Germany, 18–20 July 2016; Zeller, A., Roychoudhury, A., Eds.; ACM: New York, NY, USA, 2016; pp. 1–11.
  46. Yao, M.; Cui, B.; Chen, C. Research on IoT Device Vulnerability Mining Technology Based on Static Preprocessing and Coloring Analysis. In Proceedings of the Innovative Mobile and Internet Services in Ubiquitous Computing (IMIS-2020), Lodz, Poland, 1–3 July, 2020; Barolli, L., Poniszewska-Maranda, A., Park, H., Eds.; Advances in Intelligent Systems and Computing; Springer International Publishing: Cham, Switzerland, 2020; pp. 254–263.
  47. Bandara, V.; Rathnayake, T.; Weerasekara, N.; Elvitigala, C.; Thilakarathna, K.; Wijesekera, P.; De Zoysa, K.; Keppitiyagama, C. Large scale analysis on vulnerability remediation in open-source JavaScript projects. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, 15–19 November 2021; pp. 2447–2449.
  48. Universal Ctags. 2023. Available online: https://github.com/universal-ctags/ctags (accessed on 20 August 2023).
Figure 1. Overview of FLuaScan.
Figure 2. The overview of progressive taint analysis engine.
Figure 3. The visual result of a graph-based vulnerability search in FLuaScan.
Figure 4. The runtime of FLuaScan and Tscancode to detect vulnerabilities in the test set.
Figure 5. The vulnerability source code highlighted with different colors supported by FLuaScan’s VSCode plugin.
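Table 1. Summary of existing vulnerability detection methods in embedded devices.
Category | Strategy | Literature | Target | Technology | Purpose
--- | --- | --- | --- | --- | ---
Static analysis | Taint analysis-based method | KARONTE [11] | Binary program in firmware | Multibinary interaction and taint analysis | Taint-style vulnerabilities (BOF/CI)
Static analysis | Taint analysis-based method | SainT [18] | IoT app | Taint analysis | Sensitive information leak
Static analysis | Taint analysis-based method | DTaint [19] | Binary program in firmware | Taint analysis | Taint-style vulnerabilities (BOF/CI)
Static analysis | Taint analysis-based method | SATC [10] | Binary program in firmware | Taint analysis | Taint-style vulnerabilities (BOF/CI)
Static analysis | Taint analysis-based method | CryptoREX [20] | Binary program in firmware | IR-based analysis | Crypto misuse
Static analysis | Pattern-matching-based method | Costin [21] | Firmware | Correlation analysis | Backdoor
Static analysis | Pattern-matching-based method | Stringer [22] | Firmware | Static data comparison | Backdoor
Static analysis | Pattern-matching-based method | Firmalice [23] | Firmware | Symbolic execution and program slicing | Backdoor
Dynamic analysis | Analysis basis | Firmadyne [24] | Firmware | Full system emulation | Emulation purpose
Dynamic analysis | Analysis basis | FirmAE [25] | Firmware | Arbitration emulation | Emulation purpose
Dynamic analysis | Analysis basis | Avatar [26] | Firmware | Forward I/O and full-separation mode emulation | Dynamic analysis framework
Dynamic analysis | Fuzzing | FIRM-AFL [12] | Firmware | Greybox fuzzing | BOF/NPD
Dynamic analysis | Fuzzing | IoTFuzzer [27] | Firmware | Blackbox fuzzing | BOF/NPD
Dynamic analysis | Fuzzing | RPFuzzer [28] | Firmware | Greybox fuzzing | DoS
Dynamic analysis | Fuzzing | SRFuzzer [29] | Firmware | Blackbox fuzzing | BOF/NPD/XSS
Dynamic analysis | Fuzzing | Snipuzz [14] | Firmware | Blackbox fuzzing and message snippet inference | NPD/DoS
Dynamic analysis | Fuzzing | Diane [30] | Firmware | Blackbox fuzzing and app static analysis | BOF/DoS
Dynamic analysis | Fuzzing | FirmFuzz [31] | Firmware | Taint analysis and structured symbolic expression | BOF/NPD/CI/XSS
BOF: buffer overflow; CI: command injection; NPD: null pointer dereference; DoS: denial of service; XSS: cross-site scripting. The per-row marks of the Dependence columns (Device, Fw, App) are not reproduced here.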
Table 2. Information about representative experimental platforms in empirical analysis.
Brand | Xiaomi | Tplink | Trendnet | Linksys | Netgear | Totolink | Zyxel
--- | --- | --- | --- | --- | --- | --- | ---
Images | 192 | 133 | 81 | 53 | 39 | 23 | 31
Lua > 50 | 184 | 69 | 25 | 16 | 5 | 5 | 9
% | 95.83% | 51.88% | 30.86% | 30.19% | 12.82% | 21.74% | 29.03%
Avg. | 209.39 | 168.39 | 173.68 | 149.63 | 125 | 117.34 | 95
Table 3. Information about the test set.
# | Brand | Device | Version | Lua Files
--- | --- | --- | --- | ---
1 | xiaomi | r3a | V2.12.8 | 202
2 | xiaomi | r3600 | V1.0.17 | 209
3 | xiaomi | r4c | V2.28.48 | 207
4 | tplink | TL-WR1043N | V5_161118 | 140
5 | tplink | Archer-C1900 | V2.0_220524 | 156
6 | tplink | Archer-AXE75 | V1_220405 | 215
7 | zyxel | NBG6716 | V1.00 | 69
8 | zyxel | NBG6604 | V1.00 | 117
9 | linksys | EA6350V2 | V2.1.2.184309 | 137
10 | totolink | A6000R | V1.0.1 | 140
11 | totolink | X5000R | V9.1.0cu.2089 | 114
12 | trendnet | TEW-821DAP | V1.08B04 | 115
13 | netgear | EX8000 | V1.0.1.164 | 91
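Table 4. Detailed result of the test set.
# | Brand | Device | FLuaScan Alerts | FLuaScan TP | FLuaScan FP | Handlers | TscanCode Alerts | TscanCode TP | TscanCode FP
--- | --- | --- | --- | --- | --- | --- | --- | --- | ---
1 | xiaomi | r3a | 16 | 5 | 11 | 448 | 149 | 0 | 149
2 | xiaomi | r3600 | 17 | 6 | 11 | 488 | 179 | 0 | 179
3 | xiaomi | r4c | 10 | 5 | 5 | 399 | 166 | 0 | 166
4 | tplink | TL-WR1043N | 3 | 1 | 1 | 347 | 141 | 0 | 141
5 | tplink | Archer-C1900 | 2 | 1 | 1 | 321 | 138 | 0 | 138
6 | tplink | Archer-AXE75 | 2 | 1 | 1 | 820 | 297 | 0 | 297
7 | zyxel | NBG6716 | 3 | 2 | 1 | 92 | 90 | 0 | 90
8 | zyxel | NBG6604 | 3 | 2 | 1 | 65 | 118 | 0 | 118
9 | linksys | EA6350V2 | 0 | 0 | 0 | 0 | 134 | 0 | 134
10 | totolink | A6000R | 7 | 3 | 4 | 113 | 152 | 0 | 152
11 | totolink | X5000R | 7 | 3 | 4 | 91 | 163 | 0 | 163
12 | trendnet | TEW-821DAP | 2 | 1 | 1 | 37 | 122 | 0 | 122
13 | netgear | EX8000 | 8 | 4 | 4 | 43 | 96 | 0 | 96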
Table 5. The vulnerabilities found by FLuaScan.
# | Vendor | Device | Vuln Function | Type | MNode | 0 Day | Severity
--- | --- | --- | --- | --- | --- | --- | ---
1 | xiaomi | r3a, r4c | setconfigIoTDev | CI | 0 | Known | High
2 | xiaomi | r4c, r3a, r3600 | wifi_access | CI | 2 | Known | High
3 | xiaomi | r3a, r4c | getThumb | CI | 1 | Known | High
4 | xiaomi | r3a, r4c | setSyncRouterFile | CI | 2 | Unknown | High
5 | xiaomi | r3a, r4c | setDeviceNickName | SQLi | 3 | Unknown | Middle
6 | xiaomi | r3a, r4c | updateDeviceNickname | SQLi | 3 | Unknown | Middle
7 | xiaomi | r3a, r4c | backupData | CI | 3 | Known | High
8 | tplink | TL-WR1043N, Archer-AXE75, Archer-C1900 | set_country | CI | 3 | Known | High
9 | tplink | Archer-AXE75 | vpn_user_update | CI | 0 | Known | High
10 | zyxel | NBG6604, NBG6716 | multiple_ssid | CI | 1 | Known | High
11 | zyxel | NBG6604, NBG6716 | multiple_ssid5G | CI | 1 | Known | High
12 | zyxel | NBG6604, NBG6716 | wlanwpsstation | CI | 1 | Known | High
13 | zyxel | NBG6604, NBG6716 | action_wireless | CI | 1 | Known | High
14 | totolink | A6000R | enable_wsh | CI | 0 | Known | High
15 | totolink | A6000R, X5000R | action_reboot | CI | 0 | Unknown | High
16 | totolink | A6000R, X5000R | reset_wifi | CI | 0 | Known | High
17 | netgear | EX8000 | iface_shutdown | CI | 1 | Unknown | High
18 | netgear | EX8000 | iface_reconnect | CI | 1 | Unknown | High
19 | netgear | EX8000 | iface_delete | CI | 1 | Unknown | High
20 | netgear | EX8000 | action_bandwidth | CI | 1 | Known | High
21 | trendnet | TEW-821DAP | action_bandwidth | CI | 0 | Known | High
Table 6. Comparison of FLuaScan and TscanCode in terms of correctness, effectiveness, and practicability.
Tool | Correctness | Effectiveness | Practicability
--- | --- | --- | ---
TscanCode | 0% TP rate | 0 actual vulnerabilities | Time cost: average 1.51 s; output: simple text
FLuaScan | 42.50% TP rate | 21 vulnerabilities (7 unknown); moderate to high severity risk | Time cost: average 25.16 s (acceptable); output: two visual outputs that assist manual analysis
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, X.; Wei, Q.; Wu, Z.; Guo, W. Finding Taint-Style Vulnerabilities in Lua Application of IoT Firmware with Progressive Static Analysis. Appl. Sci. 2023, 13, 9710. https://doi.org/10.3390/app13179710
