LogInjector: Detecting Web Application Log Injection Vulnerabilities

Pan, Zulie; Chen, Yu; Chen, Yuanchao; Shen, Yi; Li, Yang

doi:10.3390/app12157681


by Zulie Pan ¹,², Yu Chen ¹,²,*, Yuanchao Chen ¹,², Yi Shen ¹,² and Yang Li ¹,²

¹ College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China

² Anhui Province Key Laboratory of Cyberspace Security Situation Awareness and Evaluation, Hefei 230037, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(15), 7681; https://doi.org/10.3390/app12157681

Submission received: 17 June 2022 / Revised: 21 July 2022 / Accepted: 25 July 2022 / Published: 30 July 2022

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract

Web applications widely use logging functionality, but improper handling can bring serious security threats. An attacker can write malicious data to the web application logs and then trigger its execution by accessing the view–logs interface, resulting in a web application log injection vulnerability. However, detecting this type of vulnerability requires automatic discovery of log-injectable interfaces and view–logs interfaces, which is difficult. In addition, bypassing the application-specific input-filtering checks to write an effective payload to the log is also challenging. This paper proposes LogInjector, an efficient web application log injection vulnerability detection method. First, it obtains the log storage form and location and then finds the log-injectable interfaces through the extended dynamic crawler. Second, it automatically identifies the web application view–logs interfaces. Finally, LogInjector utilizes a dynamic testing approach based on feedback-guided mutation to detect web application log injection vulnerabilities. To verify the effectiveness of LogInjector, we test it on 14 popular real-world web applications and compare it with Black Widow, the state-of-the-art web vulnerability scanner. LogInjector detects 16 web application log injection vulnerabilities, including 6 zero-day vulnerabilities, while Black Widow detects only three, demonstrating the effectiveness of LogInjector in practice.

Keywords: log injection vulnerabilities; log-injectable interfaces; view–logs interfaces; feedback-guided mutation

1. Introduction

Logging is a basic functionality of most web applications. It records the running information, user operations, and debugging, warning, and error information of the web application, which makes it much easier for the website administrator to operate and maintain the website. However, if the web application does not properly handle the log storage and viewing process, malicious data input by the user can be written into the log. When the web interface for viewing logs is accessed, the injected malicious data will be executed as code. This results in a web application log injection vulnerability, referred to hereafter as a log injection vulnerability. We define a web interface that can write logs as a log-injectable interface and the web interface for viewing logs as a view–logs interface.

Nowadays, almost every web application has a logging functionality, and the security awareness of web application developers is uneven. Some developers do not properly and effectively process the data entered by users into the log. As a result, a large number of web applications may have log injection vulnerabilities and bring potential risks to web applications.

Log injection vulnerabilities are more serious than other regular vulnerabilities of web applications for two reasons. First, a log injection vulnerability persistently stores the malicious payload in a database or file on the web application server, so it causes persistent harm to the web application. Second, unauthenticated users can inject malicious data into the logs. For example, some web applications record the login information of users who fail to authenticate, so an unauthenticated user can inject data into the log. The bar for exploiting a log injection vulnerability is therefore low.

For this type of vulnerability detection, the main challenges are as follows:

There are many web interfaces in web applications that interact with users. However, only a few can inject data into the log, so it is challenging to accurately find the log-injectable interfaces.
The log injection vulnerabilities need to be triggered by accessing the view–logs interface of the web application, and it is difficult to identify the view–logs interface.
Each web application has a different input-filtering check. It is challenging to pass the application-specific input-filtering checks of the web application and to ensure that the injected data can be executed.

To address these challenges, we propose LogInjector, an efficient detection framework for log injection vulnerabilities. Specifically, first, LogInjector obtains the log storage form and location, and then LogInjector uses the extended dynamic crawler to find the log-injectable interfaces. Second, LogInjector combines dynamic crawler and static features to identify the view–logs interfaces. Finally, based on the obtained information, LogInjector adopts a dynamic testing method based on the feedback-guided mutation to detect log injection vulnerabilities.

The contributions of this paper are summarized as follows:

We systematically studied the web application log injection vulnerabilities for the first time and proposed LogInjector, an effective web application log injection vulnerability-detection method;
LogInjector can automatically obtain the log storage location, the log-injectable interfaces, and the view–logs interfaces. Based on this information, LogInjector employs dynamic testing based on the feedback-guided mutation to detect log injection vulnerabilities effectively;
We implemented a prototype of LogInjector and evaluated the effectiveness of LogInjector on 14 web applications. LogInjector detected a total of 16 log injection vulnerabilities, including 6 zero-day vulnerabilities.

The rest of the paper is structured as follows. In Section 2, we give the background knowledge on the log injection vulnerabilities. Section 3 discusses related work. Section 4 discusses an overview of LogInjector. Section 5 presents the detailed design method of LogInjector. We describe the evaluation of LogInjector in Section 6. Finally, Section 7 concludes the paper.

2. Background

In this section, we provide some background on log injection vulnerabilities. We use the impact of a real log injection vulnerability to illustrate the motivation for our work in Section 2.1. In addition, we collect the log injection vulnerabilities disclosed in the CVE (Common Vulnerabilities and Exposures) database and systematically study and analyze their characteristics and security impact in Section 2.2. Finally, we summarize the conditions under which log injection vulnerabilities exist in Section 2.3.

2.1. A Motivating Example

A log injection vulnerability exists in the edit blog posts module of PHPFusion [1] (9.03.20) (CVE-2020-17449 [2]). Suppose a user editing a blog post submits a blog_image parameter that includes a single quote and a piece of malicious HTML script code, such as test.png ' <script>alert('log injection')</script>. When the user saves the changes, the single quote in the blog_image parameter causes an error in the code that executes the SQL statement. The relevant code for PHPFusion to store the error message is shown in Listing 1. PHPFusion stores the error level, the error message, the name of the file in which the error was raised, and the line number where it was raised in the DB_ERRORS table.

Listing 1. PHPFusion logs the error message.

The relevant code snippet showing error log information is shown in Listing 2. When the website administrator accesses the web page of the Error Log notification module, PHPFusion will query the corresponding error information from the DB_ERRORS table according to the error_id input by the HTTP request GET method in line 7 and output the error message in line 9. The injected malicious code is parsed and executed by the browser, triggering the vulnerability. If a malicious user enters a payload that steals the user’s cookies, when the website administrator views the logs, the malicious user can steal the administrator’s cookie and take over the site.

Listing 2. Display of error log information.

2.2. Investigating Known Log Injection Vulnerabilities

We collected 14 known log injection vulnerabilities from CVE, and these 14 vulnerabilities are distributed in 13 different web applications. We conducted an in-depth analysis of these vulnerabilities and found the following:

Finding 1: Eight of these web applications use databases to store logs, and the remaining five use files to store logs. For web applications that store logs in the database, the corresponding table name of the log storage location will contain the keywords error, log, etc. For example, Contao [3] uses the tl_log table to record logs. For a web application that uses files to store logs, the file name of the stored log also contains the keywords error or log, etc., such as the webmin.log file in Webmin [4].

Finding 2: Among the web applications studied, eight log injection vulnerabilities are caused by web application logs recording user operations, and six are caused by recording program code running error messages, such as when an error occurs in the program code associated with the execution of an SQL statement. These 14 vulnerabilities are all triggered by submitting the corresponding payload to the form on the web page. In particular, for the vulnerability caused by recording program execution error information, it is necessary to input the characters that cause the code execution error to the corresponding web form, such as single quotation marks.

Finding 3: These 13 web applications all have their own input-filtering checks, and the implementations of their respective input-filtering checks are also different. The input-filtering checks of each web application have corresponding bypass methods.

Finding 4: Ten of the log injection vulnerabilities can cause stored cross-site scripting execution (Stored XSS), and the remaining four cause remote code execution (RCE). The damage caused by the log injection vulnerabilities is very serious. An attacker can steal user’s cookies with stored cross-site scripting execution. If an attacker succeeds in stealing a website administrator’s cookies, the attacker can take over the website. In addition, for remote code execution, an attacker can use this vulnerability to take control of the host and remotely steal private user information, and the impact is more serious.

2.3. Vulnerability Conditions

OWASP (Open Web Application Security Project) describes log injection vulnerabilities [5] as follows: writing unvalidated user input to log files can allow an attacker to forge log entries or inject malicious content into the logs. By referring to the definition proposed by OWASP and a systematic study of known log injection vulnerabilities, we summarize three conditions for the existence of such vulnerabilities:

(1): There is user-controllable input that can be written to the log. A log injection vulnerability must have a user-controlled input source that can write to the log.
(2): There must be a view–logs interface. A payload injected into the log is executed, and the vulnerability triggered, only when the view–logs interface is accessed.
(3): The injected malicious data can pass the input-filter checks of the web application and retain the execution semantics. A log injection vulnerability exists only if the payload can be stored in the log and the payload can be executed.
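
The three conditions can be illustrated with a minimal Python sketch. It is not code from any of the studied applications: the in-memory `log_entries` list and the functions `write_log`, `render_log_unsafe`, and `render_log_safe` are hypothetical names chosen for this illustration.

```python
import html

log_entries = []  # hypothetical stand-in for a log table or log file


def write_log(username: str) -> None:
    # Condition (1): user-controlled input reaches the log unchanged.
    log_entries.append(f"Login failed for user: {username}")


def render_log_unsafe() -> str:
    # Conditions (2) and (3): the view-logs page echoes entries verbatim,
    # so injected markup keeps its execution semantics in the browser.
    return "".join(f"<li>{entry}</li>" for entry in log_entries)


def render_log_safe() -> str:
    # Escaping on output removes the execution semantics of the payload.
    return "".join(f"<li>{html.escape(entry)}</li>" for entry in log_entries)


write_log("<script>alert('log injection')</script>")
unsafe_page = render_log_unsafe()
safe_page = render_log_safe()
print(unsafe_page)
print(safe_page)
```

In the unsafe rendering the script tag survives intact, whereas in the escaped rendering it is displayed as text, so condition (3) no longer holds.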

3. Related Work

Log injection vulnerabilities can lead to (1) the injection of malicious XSS (cross-site scripting execution) payloads into web application logs, triggering stored cross-site scripting execution when the view–logs interface is accessed, and (2) injection of commands that parsers (such as PHP parsers) could execute, triggering remote code execution when accessing the view–logs interface. Therefore, log injection vulnerabilities are a special kind of second-order vulnerability. Current web application vulnerability detection research mainly focuses on first-order vulnerabilities, mainly including reflected cross-site scripting execution, first-order SQL injection, first-order remote code execution, and file upload vulnerabilities. There are few studies on second-order vulnerabilities.

We now discuss related work on first-order vulnerability analysis and second-order vulnerability analysis.

3.1. First-Order Vulnerability Analysis

Jovanovic et al. [6] used data flow analysis to detect XSS and SQL injection vulnerabilities and put considerable work into modeling aliases. However, their method has a high false-positive rate due to missing dynamic features and context analysis. Appelt et al. [7] proposed an input mutation method for the automated testing of SQL injection vulnerabilities. They proposed three categories of mutation operations based on behaviour-changing, syntax-repairing, and obfuscation, aiming to generate inputs that are executable and able to pass web application firewalls through random mutation. Backes et al. [8] suggested an approach that leverages a code property graph (CPG) for discovering vulnerabilities in the server side of PHP web applications. They then obtained the vulnerability path by performing graph queries on the CPG. Fang et al. [9] presented a novel approach based on deep learning. This involves extracting features using Word2vec from XSS payloads to capture the word order information by mapping each payload to a feature vector. The downside of this approach is the problem of false negatives. Tang et al. [10] presented a high-accuracy SQL injection-detection method based on neural networks. However, this approach does not support context analysis. Van Rooij et al. [11] proposed a grey-box fuzzing method for web applications, using code coverage to guide fuzz testing. They proposed five mutation operations designed to trigger new execution paths, but they only detected reflected cross-site scripting vulnerabilities. Erdődi et al. [12] proposed leveraging reinforcement learning to automatically exploit SQL injection vulnerabilities, but this method is only suitable for simple scenarios such as capture-the-flag (CTF) challenges. Liu et al. [13] proposed a cross-site scripting payload generation method based on a genetic algorithm. They sequenced the initial attack vector into genes and then mutated the gene sequence according to the commonly used bypass methods. Finally, an effective vector for detecting cross-site scripting-execution vulnerabilities is generated. However, this method lacks adaptive capability. Lee et al. [14] proposed a novel method of adapting attack payloads to a target reflected XSS vulnerability using reinforcement learning. They leveraged context-aware payload generation to trigger complex XSS vulnerabilities; however, they only support the detection of reflected cross-site scripting vulnerabilities. Zhao et al. [15] proposed directed fuzzing to detect remote code-execution vulnerabilities. The effective mutation strategy they designed achieves good results in vulnerability detection but only supports first-order remote code-execution vulnerabilities.

3.2. Second-Order Vulnerability Analysis

Log injection vulnerabilities are a special kind of second-order vulnerability. Some existing second-order vulnerability detection methods cannot detect such vulnerabilities very well. At present, second-order vulnerability detection methods mainly include static analysis and dynamic testing methods.

Static methods were first proposed to detect second-order vulnerabilities in web applications [16,17,18,19]. Balzarotti et al. [16] proposed MiMoSA to find multi-module data flow and workflow vulnerabilities. It models the data flow in the database by dynamically executing SQL statements. Moreover, it pays attention to detecting web application workflow vulnerabilities. However, MiMoSA does not support multi-step exploits. Dahse et al. [17] used a static method, which first models and analyzes the reading and writing of the memory location of the web server. After the data flow analysis is completed, it judges whether the data read from the persistent storage are controllable and whether they are vulnerable. Olivo et al. [18] leveraged static analysis to detect second-order denial-of-service (DoS) and then perform symbolic execution to generate candidate attack vectors. However, their system can only detect second-order DoS vulnerabilities. Yan et al. [19] statically analyze the source code to find a vulnerable field in the database that could lead to a second-order SQL injection vulnerability and then generate a test sequence related to that field. However, this method can only detect second-order SQL injection vulnerabilities.

The methods of dynamically detecting second-order vulnerabilities include black-box testing methods and gray-box testing methods. Previous research [20,21,22,23,24] has also shown that it is hard for black-box testing to detect such vulnerabilities. McAllister et al. [25] presented a black-box scanner capable of detecting second-order XSS vulnerabilities. It explores larger application parts by generating more comprehensive test cases. However, it cannot easily analyze the inter-state dependencies of the web application and cannot identify which page the injected data are reflected on. Doupé et al. [26] presented a state-aware black-box web vulnerability scanner. It can achieve high code coverage, and its disadvantage is that it does not support inter-state dependency analysis. Duchene et al. [27] used a black-box approach to detect web vulnerabilities and applied a genetic algorithm to modify the payloads. This approach does not produce false positives, but it requires the ability to reset the application. Steinhauser et al. [28] extended a common database to intercept the executed SQL statement. This approach identifies injected input in web output and then checks encoding and filtering rules. Eriksson et al. [29] analyzed inter-state dependencies by injecting unique tokens into pages and looking for these strings in subsequent web pages. However, their system can only detect cross-site scripting vulnerabilities.

Currently, many researchers are focusing on protection mechanisms for stored cross-site scripting execution and code-execution vulnerabilities. Alkhalaf et al. [30] proposed a detection method for input sanitization/validation in web applications. Chin et al. [31] utilized dynamic tainting analysis to defend against code execution attacks; however, this system suffers from runtime overhead. Bisht et al. [32] detected possible attacks by building benign user input models, but this approach requires accurate modeling of ever-evolving attackers and target applications. Bulekov et al. [33] introduced an abstraction-aware technique for applying PoLP (Principle Of Least Privilege) to interpret PHP applications. Every PHP program should only be allowed to call the system calls it needs to function correctly.

A summary of related work is given in Table 1; we found that there is currently no effective detection method for log injection vulnerability, and the existing methods cannot bypass application-specific input-filtering checks well.

4. Overview

We propose LogInjector, an efficient automated detection method for log injection vulnerabilities in web applications. Its overall architecture is shown in Figure 1. It consists of three parts: finding log-injectable interfaces, identifying view–logs interfaces, and dynamic testing of feedback-guided mutations.

4.1. Finding of Log-Injectable Interfaces

The web interfaces are the only input points for web application and user interaction. The collection and identification of the web interfaces is a very important part of vulnerability detection work. LogInjector first obtains the log storage form and the location of the web application server and then uses a crawler to crawl the web application. Note that during the crawling process, the crawler automatically fills each form with specific inputs to try to trigger writing to the log, and after the crawler sends each HTTP request, it can automatically observe the log storage location and determine whether the log is written.

4.2. Identification of View–Logs Interfaces

LogInjector combines the dynamic crawler and static features of the view–logs interface to identify the view–logs interface of a web application. While the dynamic crawler crawls a website, LogInjector checks whether each web interface has the static features of a view–logs interface to decide whether it is one.

4.3. Dynamic Testing of Feedback-Guided Mutations

After obtaining the log-storage location, log-injectable interfaces, and view–logs interfaces, we conducted dynamic testing based on feedback-guided mutation. We propose five kinds of initial input, six kinds of mutation operations, and eight kinds of feedback. First, LogInjector takes the initial input into the request parameters of the log-injectable interface and tests it as a test case. We abbreviate the test case as TC. Second, it analyzes the test case’s feedback information and scores the test case. Finally, it preferentially selects test cases with high scores for subsequent mutation and repeats the previous step until monitoring the vulnerability trigger or reaching the maximum number of attempts.

5. Design

In this section, we introduce the specific design of LogInjector in detail.

5.1. Finding Log-Injectable Interfaces

Get Log Storage Form and Location. Web application logs are generally stored in the database or file of the web application server. LogInjector first performs a fuzzy query on the web application database through log-related keywords (such as log, errors, etc.) and then uses regular expressions to match the web application directory files to obtain the log-related file names. Finally, the web application log’s storage form and location in the web server are obtained.
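
The keyword-based lookup described above can be sketched as follows. This is a minimal illustration rather than LogInjector's implementation: it uses an in-memory SQLite database so that it is self-contained (the studied applications may use other database systems), and the function names `find_log_tables` and `find_log_files` and the exact regular expression are assumptions.

```python
import re
import sqlite3
import tempfile
from pathlib import Path

LOG_KEYWORDS = ("log", "error")  # keywords named in the text
LOG_FILE_RE = re.compile(r"(log|error)", re.IGNORECASE)  # assumed pattern


def find_log_tables(conn: sqlite3.Connection) -> list[str]:
    """Fuzzy-query the schema for table names containing a log keyword."""
    found = set()
    for keyword in LOG_KEYWORDS:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE ?",
            (f"%{keyword}%",),
        )
        found.update(name for (name,) in rows)
    return sorted(found)


def find_log_files(root: Path) -> list[str]:
    """Match file names under the application directory against the pattern."""
    return sorted(
        p.name for p in root.rglob("*") if p.is_file() and LOG_FILE_RE.search(p.name)
    )


# Demo with table and file names taken from the examples in Section 2.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tl_log (id INTEGER, text TEXT)")
conn.execute("CREATE TABLE db_errors (id INTEGER, message TEXT)")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
log_tables = find_log_tables(conn)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "webmin.log").write_text("")
    (root / "index.php").write_text("")
    log_files = find_log_files(root)

print(log_tables, log_files)
```

A plain substring match such as this one also accepts names like `login` or `catalog`, so a real implementation would likely need a stricter pattern or a manual review step.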

Find Log-Injectable Interfaces. Web application logs usually record user operations, exceptions, and error messages, such as users logging in, posting comments on web pages, entering illegal data into web forms, or importing files in the wrong format. Therefore, we use the crawler to simulate user operations, input data in the wrong format, and try to trigger the log writing functionality of the web application. We build on crawlergo [34] and extend it to achieve this functionality. Crawlergo is an open-source browser crawler. It uses Chrome headless mode for URL collection. It hooks key positions of the whole web page during the DOM rendering stage, automatically fills and submits forms with intelligent JS event triggering, and collects as many entries exposed by the website as possible [34]. Specifically, crawlergo first injects JavaScript code to hook key functions and events, such as History API [35], setTimeout [36], and registered events, before the webpage loads. Then, it uses a headless browser to simulate form filling and click submit operations, filling in the corresponding data according to the input type of the form, so that the JavaScript code sends the request in the correct format. Finally, it collects links from src, href, data-url, longDesc, lowsrc attributes, and comments to cover as much of the web application as possible. In addition, crawlergo can also find directories and files that are not referenced anywhere by setting a parameter to a custom dictionary.

Our goal in this paper is to find log-injectable interfaces. For this purpose, LogInjector extends the functionality of crawlergo. After LogInjector recognizes web page forms, it fills each form with a constructed specific input consisting of special characters ('"<>) plus a random string of a fixed length. For a file upload form, it uploads a file containing the same special characters ('"<>) and a fixed-length random string. After LogInjector automatically submits the form, it determines whether the constructed input is written to the log according to the log storage form and the location obtained before. Specifically, if the log is stored in a file, LogInjector performs string matching on the contents of the log file to detect whether the random string filled in the current form is stored. If the log is stored in a database, LogInjector executes an SQL statement to query the log storage location and detects whether the random string filled in the current form is stored. If LogInjector detects the corresponding random string, the web interface is a log-injectable interface. Finally, after LogInjector finishes crawling the web application, it records all log-injectable interfaces obtained, which are used as mutation vectors in Section 5.3.
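
The probe-and-check step for file-based logs can be sketched as follows. This is an illustration under stated assumptions, not the crawlergo extension itself: the token length of 12 is arbitrary (the paper only says the length is fixed), and `fake_submit` is a hypothetical stand-in for the crawler submitting a form to an application that logs invalid input.

```python
import secrets
import string
import tempfile
from pathlib import Path

SPECIAL_CHARS = "'\"<>"  # special characters named in the text
TOKEN_LENGTH = 12        # assumed; the paper does not state the length


def make_probe() -> tuple[str, str]:
    """Return (form_value, token): special characters plus a random token."""
    token = "".join(secrets.choice(string.ascii_lowercase) for _ in range(TOKEN_LENGTH))
    return SPECIAL_CHARS + token, token


def token_in_log_file(log_path: Path, token: str) -> bool:
    """String-match the random token against the log file contents."""
    return token in log_path.read_text(errors="replace")


def fake_submit(log_path: Path, value: str) -> None:
    # Hypothetical application behaviour: invalid input is written to the log.
    with log_path.open("a") as handle:
        handle.write(f"[error] invalid input: {value}\n")


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "app.log"
    log_path.write_text("")
    value, token = make_probe()
    before = token_in_log_file(log_path, token)
    fake_submit(log_path, value)
    after = token_in_log_file(log_path, token)

print(before, after)
```

Because each form receives its own random token, a match in the log identifies exactly which interface wrote the entry.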

5.2. Identification of View–Logs Interfaces

The view–logs interface is a web interface provided by the web application for website administrators to monitor the running status of the web application. Accurately identifying the view–logs interfaces is very important for the detection of log injection vulnerability. To the best of our knowledge, there is currently no automated way to identify view–logs interfaces. The existing methods for the detection of log injection vulnerability require a lot of manual operations to complete this part of the work.

Considering the specification of web application code writing and the convenience of users, view–logs interfaces designed by web application developers include the following characteristics:

(1): The URL contains keywords such as log, error;
(2): The HTML <title> tag contains keywords such as log, error;
(3): Tags such as <a>, <h>, and <span> in the HTML of the web page contain keywords such as log, error.

Figure 2 shows the view–logs web page of PHPFusion [1]. To highlight the key information, we omit irrelevant code. The corresponding URL contains the error keyword, the <title> tag contains the Error Log keyword, and the <h3> tag of the HTML page contains the Error Log keyword. Based on these static features, we can identify whether the web interface of the current page is the view–logs interface.

However, since modern web applications use a lot of JavaScript code and client-side events, they have complex dynamic features [29]. For example, when a user clicks a button on a web page, it will cause the entire DOM tree of the web page to change. Therefore, we combine dynamic crawlers and these static features to identify the view–logs interfaces.
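
The three static features can be checked with a short Python sketch. Several details here are assumptions rather than the paper's method: the keyword pattern rejects longer words such as "login" or "blog", heading tags `<h1>` to `<h6>` stand in for `<h>`, the example URL and page are made up, and the rule that two of the three features suffice (`min_features=2`) is arbitrary, since the text does not say how the features are combined.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed pattern: "log(s)" or "error(s)" not embedded in a longer word.
KEYWORD_RE = re.compile(r"(?<![a-z])(logs?|errors?)(?![a-z])", re.IGNORECASE)
TEXT_TAGS = {"title", "a", "span", "h1", "h2", "h3", "h4", "h5", "h6"}


class KeywordTagParser(HTMLParser):
    """Record which tags in TEXT_TAGS enclose text with a log keyword."""

    def __init__(self) -> None:
        super().__init__()
        self._open: list[str] = []
        self.hits: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag in TEXT_TAGS:
            self._open.append(tag)

    def handle_endtag(self, tag):
        if tag in self._open:
            # Close the most recent matching tag and anything nested in it.
            index = len(self._open) - 1 - self._open[::-1].index(tag)
            del self._open[index:]

    def handle_data(self, data):
        if self._open and KEYWORD_RE.search(data):
            self.hits.add(self._open[-1])


def static_features(url: str, page_html: str) -> dict[str, bool]:
    parser = KeywordTagParser()
    parser.feed(page_html)
    parsed = urlparse(url)
    return {
        "url": bool(KEYWORD_RE.search(parsed.path + "?" + parsed.query)),
        "title": "title" in parser.hits,
        "tags": bool(parser.hits - {"title"}),
    }


def looks_like_view_logs(url: str, page_html: str, min_features: int = 2) -> bool:
    return sum(static_features(url, page_html).values()) >= min_features


log_page = "<html><head><title>Error Log</title></head><body><h3>Error Log</h3></body></html>"
login_page = "<html><head><title>Login</title></head><body><a href='/blog'>Blog</a></body></html>"
is_log = looks_like_view_logs("http://example.test/administration/errors.php?error_id=1", log_page)
is_login = looks_like_view_logs("http://example.test/login.php", login_page)
print(is_log, is_login)
```

In the combined approach described above, a check like this would run on the DOM that the dynamic crawler obtains after client-side events have fired, not only on the initial HTML.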

5.3. Dynamic Testing of Feedback-Guided Mutations

Through Section 5.1 and Section 5.2, we can obtain the log storage location, the log-injectable interfaces, and view–logs interfaces. In order to efficiently and accurately detect log injection vulnerabilities, we propose a dynamic testing method based on feedback-guided mutation, which includes five types of initial inputs, six types of mutation operations, and eight types of feedback to realize the automatic detection of log injection vulnerability.

5.3.1. Workflow

We propose Algorithm 1 for dynamic testing of log injection vulnerabilities. The input of the algorithm includes the initial input init_input and the heap tree heaptree. LogInjector maintains a fixed-size heap tree to store test cases that mutate well. The output of the algorithm is $TC_{success}$, which represents the set of successful test cases. First, LogInjector adds the initial input to the heap tree and then takes the first element of the heap tree as $TC_{current}$, where $TC_{current}$ represents the current test case (lines 2–3). Second, LogInjector sends $TC_{current}$ and then calculates the score of $TC_{current}$ based on the feedback information and verifies whether the vulnerability is triggered (lines 7–9). If the vulnerability is triggered, the successful test case is returned directly (line 26). If the vulnerability is not triggered, LogInjector compares the score of $TC_{current}$ with that of the parent test case (line 13). If $TC_{current}$ is better than the parent test case, LogInjector adds $TC_{current}$ to the heap tree (line 14). Otherwise, LogInjector reduces the score of the parent test case by 0.1 (line 16), which avoids repeatedly mutating test cases that did not achieve better results after mutation. Then, LogInjector adjusts the heap tree (line 18). If the number of elements in the heap tree is larger than the fixed size we defined (line 19), LogInjector removes $TC_{least}$, the test case with the lowest score, from the heap tree (lines 20–21).

Then, LogInjector takes the test case with the highest score from the heap tree and mutates it randomly to obtain the next $TC_{current}$ (line 23). Finally, LogInjector repeats the above process until a vulnerability is successfully detected or the number of tests reaches a threshold.

Algorithm 1 Dynamic Testing Algorithm Based on Feedback-Guided Mutation.

Input: $init\_input$, $heaptree$
Output: $TC_{success}$
1: $TC_{success} = \{\}$
2: add $init\_input$ into heaptree
3: $TC_{current}$ = heaptree[0]
4: heaptree[0].score = 0
5: success = False
6: while not success and max tries not reached do
7:     feedback, response = SendRequest($TC_{current}$)
8:     $TC_{current}$.score = compute(feedback)
9:     if Vulexist(response) then
10:         $TC_{success} = TC_{success} \cup \{TC_{current}\}$
11:         success = True
12:     else
13:         if $TC_{current}$.score > heaptree[0].score then
14:             add $TC_{current}$ into heaptree
15:         else
16:             heaptree[0].score -= 0.1
17:         end if
18:         heapify(heaptree)
19:         if len(heaptree) > threshold then
20:             $TC_{least}$ = Least(heaptree)
21:             remove $TC_{least}$ from heaptree
22:         end if
23:         $TC_{current}$ = RandomMutate(heaptree[0])
24:     end if
25: end while
26: return $TC_{success}$
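
Algorithm 1 can be sketched in Python roughly as follows. This is a simplified reading, not the authors' prototype: it returns the first successful test case instead of a set, the heap size and try limit are arbitrary, and `toy_send` and `toy_mutate` are hypothetical stand-ins for the real request, feedback, and mutation components. The toy filter deletes exact lowercase script tags, and the toy oracle fires when an opening script tag survives in any letter case.

```python
import heapq
import itertools
import random
from difflib import SequenceMatcher
from typing import Callable, Optional, Tuple


def feedback_guided_test(
    init_input: str,
    send: Callable[[str], Tuple[float, bool]],
    mutate: Callable[[str, random.Random], str],
    heap_size: int = 8,
    max_tries: int = 500,
    seed: int = 0,
) -> Optional[str]:
    """Return the first test case that triggers the oracle, or None."""
    rng = random.Random(seed)
    order = itertools.count()  # unique tie-breaker so entries never compare strings
    # heapq is a min-heap, so scores are stored negated: heap[0] is the best case.
    heap = [[0.0, next(order), init_input]]
    current = init_input
    for _ in range(max_tries):
        score, triggered = send(current)
        if triggered:
            return current
        if score > -heap[0][0]:
            heap.append([-score, next(order), current])
        else:
            heap[0][0] += 0.1  # negated score rises, so the parent's score drops by 0.1
        heapq.heapify(heap)
        if len(heap) > heap_size:
            heap.remove(max(heap))  # largest negated score = lowest real score
            heapq.heapify(heap)
        current = mutate(heap[0][2], rng)
    return None


def toy_send(tc: str) -> Tuple[float, bool]:
    # Hypothetical target: store the filtered input, score by similarity.
    stored = tc.replace("<script>", "").replace("</script>", "")
    score = SequenceMatcher(None, tc, stored).ratio()
    return score, "<script>" in stored.lower()


def toy_mutate(tc: str, rng: random.Random) -> str:
    i = rng.randrange(len(tc))  # flip the case of one random character
    return tc[:i] + tc[i].swapcase() + tc[i + 1:]


found = feedback_guided_test("<script>alert(1)</script>", toy_send, toy_mutate)
print(found)
```

Storing negated scores lets the standard-library min-heap serve as the max-heap the algorithm needs; penalising the parent after a fruitless mutation gradually moves other stored test cases to the top.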

5.3.2. Initial Inputs

Different log injection vulnerabilities have different effects: some lead to cross-site scripting execution, and others lead to remote code execution. Therefore, we propose five initial inputs, as shown in Table 2. S1 inputs the script tag, which is used to execute the JavaScript statement in the tag. S2 executes the injected JavaScript code through an HTML event attribute. S3 executes the injected JavaScript code through the JavaScript pseudo protocol. S4 executes a remote JavaScript file through the src or href attribute. S5 executes the inserted PHP code through the PHP tag.

5.3.3. Mutation Operations

The purpose of mutation is to allow injected code to both retain execution semantics and bypass the web application’s input-filter checks. Our study of the payload composition of log injection vulnerabilities found that a payload mainly consists of five core parts: prefix, tag, attribute, code snippet, and suffix, as shown in Figure 3.

We further observed the implementation of different web applications’ input-filter check modules. We found that it mainly detects whether the data input by the user contain dangerous tags, attributes, and code snippets. Once it is found that the user enters these dangerous data, the web application will filter out these dangerous data or prohibit the input. However, because different web applications use different input-filter check modules and different web application developers have different levels of security awareness, there are usually input detection module defects and corresponding bypass methods. Therefore, we propose six mutation operations for the five core parts of prefix, suffix, tag, attribute, and code snippet and the whole test case. Each mutation operation serves both of the purposes mentioned above. We take the HTTP request parameters in the log-injectable interfaces recorded in Section 5.1 as the mutation vector. The six mutation operations are as follows:

M1: Add prefix: This mutation operation closes or comments out the HTML or PHP code before the injection point so that the injected code retains its execution semantics. It prepends characters such as ', ", >, '>, ">, -->, */, ';, and "; to the test case.

M2: Add suffix: Injected data usually break the syntax of the original code, so the code cannot execute correctly. To keep the result syntactically valid, this mutation operation appends characters such as <!--, ', ", //, #, and ; to the test case. These characters comment out or close the code after the injection point.

M3: Tag transformation: Tag transformation applies operations such as case transformation, double-writing tags, replacing tags, deleting tags, and tag encoding to the tags in the test case to bypass the detection of dangerous tags by the input-filter check module.

M4: Attribute transformation: Attribute transformation transforms the attributes in the test case, including case transformation, double-writing attributes, attribute replacement, and attribute encoding.

M5: Code snippet transformation: Code snippet transformation includes replacing the functions in the injected payload, splitting code blocks, encoding code blocks, and removing brackets to bypass the input detection module’s filtering of dangerous characters, for example alert(1)=>confirm(1), alert(1)=>var A="ale"+"rt"+"(1);";eval(A);, phpinfo()=>(~%8F%97%8F%96%91%99%90)(), and alert(1)=>onerror=alert; throw 1.

M6: Overall transformation: Overall transformation includes transforming the entire test case, including encoding the entire test case, replacing spaces and parentheses in the test case, and so on.
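The six mutation operations can be illustrated with a small sketch. It implements one representative variant per operation family. The function names, the alternating-case scheme, and the way a tag is double-written are illustrative assumptions, not the LogInjector implementation. The `not_encode` helper reproduces the bitwise-NOT, percent-encoded form of phpinfo shown in M5.

```python
import random
from urllib.parse import quote

PREFIXES = ["'", '"', ">", "'>", '">', "-->", "*/", "';", '";']  # M1 candidates
SUFFIXES = ["<!--", "'", '"', "//", "#", ";"]                      # M2 candidates


def add_prefix(case: str, rng: random.Random) -> str:
    """M1: close or comment out the code before the injection point."""
    return rng.choice(PREFIXES) + case


def add_suffix(case: str, rng: random.Random) -> str:
    """M2: comment out or close the code after the injection point."""
    return case + rng.choice(SUFFIXES)


def mix_tag_case(case: str, tag: str = "script") -> str:
    """M3 (case transformation): script -> ScRiPt."""
    mixed = "".join(c.upper() if i % 2 == 0 else c for i, c in enumerate(tag))
    return case.replace(tag, mixed)


def double_write_tag(case: str, tag: str = "script") -> str:
    """M3 (double-writing): script -> scrscriptipt, which survives one removal pass."""
    half = len(tag) // 2
    return case.replace(tag, tag[:half] + tag + tag[half:])


def replace_function(case: str) -> str:
    """M5 (function replacement): alert(1) -> confirm(1)."""
    return case.replace("alert(1)", "confirm(1)")


def not_encode(function_name: str) -> str:
    """M5 (encoding): bitwise-NOT each byte and percent-encode the result."""
    encoded = "".join(f"%{(~b) & 0xFF:02X}" for b in function_name.encode("ascii"))
    return f"(~{encoded})()"


def encode_whole(case: str) -> str:
    """M6 (overall transformation): percent-encode the entire test case."""
    return quote(case, safe="")


seed = "<script>alert(1)</script>"
print(mix_tag_case(seed))
print(double_write_tag(seed))
print(not_encode("phpinfo"))
```

Attribute transformation (M4) follows the same pattern as the tag helpers, with an attribute name such as onerror passed in place of the tag name.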

5.3.4. Feedback

Each test case generated by LogInjector ends with a fixed-length random string, which lets LogInjector accurately locate the test case in the log storage location and in the HTML response and then calculate the feedback information for that test case. The eight kinds of feedback information are shown in Table 3. For F1, F2, F3, and F4, we search the log storage location to identify whether the tags, attributes, and code snippets were successfully injected into the log and then calculate the similarity between the test case we entered and the string injected into the log. For F5, F6, F7, and F8, we parse the HTML response with the html5lib [37] library, determine whether the input tags, attributes, and code snippets are injected into the HTML response, and calculate the string similarity between the input test case and the output content in the HTML response. We assign a weight to each kind of feedback information and compute a score for each test case that indicates the effect of its mutation.
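The feedback computation can be sketched as follows. This is a simplified illustration under stated assumptions: the weights in `WEIGHTS` are hypothetical, the similarity measure is `difflib.SequenceMatcher` from the Python standard library, the stored content is taken to be the characters immediately before the random marker, and plain substring matching stands in for the html5lib-based parsing of the HTML response.

```python
import html
import secrets
from difflib import SequenceMatcher

# Hypothetical weights; the values used by LogInjector are not given in this section.
WEIGHTS = {"tag": 1.0, "attribute": 1.0, "code": 1.0, "similarity": 2.0}


def make_marker(n_bytes: int = 4) -> str:
    """Fixed-length random string appended to every test case."""
    return secrets.token_hex(n_bytes)  # always 2 * n_bytes hex characters


def observe(sink: str, marker: str, width: int) -> str:
    """Return the `width` characters stored immediately before the marker."""
    end = sink.find(marker)
    return "" if end == -1 else sink[max(0, end - width):end]


def feedback(parts: dict, payload: str, marker: str, sink: str) -> dict:
    """F1-F4 when `sink` is the log content, F5-F8 when it is the HTML response."""
    seen = observe(sink, marker, len(payload))
    return {
        "tag": float(parts["tag"] in seen),
        "attribute": float(parts["attribute"] in seen),
        "code": float(parts["code"] in seen),
        "similarity": SequenceMatcher(None, payload, seen).ratio(),
    }


def total(fb: dict) -> float:
    """Weighted score of one group of feedback values."""
    return sum(WEIGHTS[name] * value for name, value in fb.items())


payload = "<img src=x onerror=alert(1)>"
parts = {"tag": "<img", "attribute": "onerror", "code": "alert(1)"}
marker = make_marker()
log_text = "search failed: " + payload + marker + "\n"        # stored verbatim
html_text = "<td>" + html.escape(payload) + marker + "</td>"  # escaped on output
log_fb = feedback(parts, payload, marker, log_text)
html_fb = feedback(parts, payload, marker, html_text)
print(total(log_fb), total(html_fb))
```

In this example the payload reaches the log unchanged but is HTML-escaped in the response, so the log-side score is higher than the HTML-side score. A difference of this kind tells the mutation stage that the output encoding, not the input filter, is what still has to be bypassed.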

5.3.5. Vulnerability Verification

Vulnerability verification determines whether the current test case successfully triggers a vulnerability. To reduce false positives, LogInjector selects the verification method according to the initial input from which each test case was derived. For initial inputs S1–S4, LogInjector uses the html5lib library to parse the HTML response and locates the test case in the response through the fixed-length random string that the test case contains. Then, LogInjector extracts the tag at this position. If it is a script tag, LogInjector checks whether the code snippet in the tag contains alert(1). If it is a media tag, LogInjector checks whether the tag has an event attribute with the on prefix whose value is alert(1), such as onerror=alert(1). If it is another tag, LogInjector checks whether the tag contains a src or href attribute whose value is http://attack.js or javascript:alert(1). For initial input S5, LogInjector checks whether the HTML response contains the characteristic string (Zend Memory Manager) output by phpinfo. If these requirements are met, LogInjector judges that the vulnerability is triggered.
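The verification rules above can be sketched in a few lines. To keep the example free of third-party dependencies, it uses `html.parser` from the Python standard library in place of html5lib. The set `MEDIA_TAGS`, the class and function names, and the rule that the injected element is the most recently opened tag before the marker are simplifying assumptions.

```python
from html.parser import HTMLParser

MEDIA_TAGS = {"img", "audio", "video", "svg"}  # assumed set of media tags
PHPINFO_SIGNATURE = "Zend Memory Manager"


class InjectedElementFinder(HTMLParser):
    """Records the element opened most recently before the random marker."""

    def __init__(self, marker: str):
        super().__init__()
        self.marker = marker
        self.last = None        # most recently opened element
        self.in_script = False  # True while inside <script>...</script>
        self.hit = None         # element located next to the marker

    def handle_starttag(self, tag, attrs):
        self.last = {"tag": tag, "attrs": dict(attrs), "text": ""}
        self.in_script = tag == "script"

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.hit is None and self.marker in data:
            self.hit = self.last
        elif self.in_script and self.last is not None:
            self.last["text"] += data


def is_triggered(seed_id: str, response: str, marker: str) -> bool:
    """Apply the check that matches the initial input (S1-S5) of the test case."""
    if seed_id == "S5":
        return PHPINFO_SIGNATURE in response
    finder = InjectedElementFinder(marker)
    finder.feed(response)
    finder.close()
    element = finder.hit
    if element is None:
        return False
    attrs = element["attrs"]
    if element["tag"] == "script":
        return "alert(1)" in element["text"]
    if element["tag"] in MEDIA_TAGS:
        return any(name.startswith("on") and (value or "").strip() == "alert(1)"
                   for name, value in attrs.items())
    return any((attrs.get(name) or "") in ("http://attack.js", "javascript:alert(1)")
               for name in ("src", "href"))


MARK = "q7f3k9z2"
executed = is_triggered("S1", f"<td><script>alert(1)</script>{MARK}</td>", MARK)
escaped = is_triggered("S1", f"<td>&lt;script&gt;alert(1)&lt;/script&gt;{MARK}</td>", MARK)
print(executed, escaped)
```

Checking the parsed element rather than searching the raw response for the payload string is what keeps false positives down: an HTML-escaped payload still appears as text in the page but does not form a script element.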

6. Evaluation

In this section, we evaluate LogInjector through experiments. We introduce the experimental setup in Section 6.1. In Section 6.2, we exploit LogInjector to detect known log injection vulnerabilities and zero-day log injection vulnerabilities to illustrate the effectiveness of LogInjector. The advantages of LogInjector are demonstrated in Section 6.3 by comparison with recent similar related work.

6.1. Experiment Setup

6.1.1. Experiment Dataset

We selected 14 web applications as the experimental dataset, as shown in Table 4. The dataset includes the web applications used in the experiments of recent related research [11,15] and web applications with known log injection vulnerabilities. We downloaded the source code of each application from its official website or GitHub and configured each web application. Note that the versions of the first nine web applications in the table are old versions that are no longer in use, and the versions of the last five web applications were the latest versions at the time of the experiment.

6.1.2. Test Environment

We experimented on Ubuntu 21.04 with an Intel Core i7-10750 (2.60 GHz) CPU and 32 GB of RAM. We deployed each web application on Ubuntu 21.04 with an Intel Core i7-10750 (2.60 GHz) CPU and 4 GB of RAM.

6.2. Effectiveness

We used LogInjector to detect web application log injection vulnerabilities in 14 web applications and detected 16 log injection vulnerabilities, including 6 zero-day log injection vulnerabilities.

6.2.1. Known Vulnerabilities Verification

In order to verify the effectiveness of LogInjector, we first use LogInjector to test web applications with known log injection vulnerabilities. The web applications, version numbers, and vulnerability numbers corresponding to these known vulnerabilities are shown in Table 5. The experimental results show that LogInjector can detect all the vulnerabilities in Table 5.

LogInjector can accurately identify that SuiteCRM and webmin use files to record logs, and the other seven web applications use databases to record logs. By constructing a specific input to the web form, LogInjector can discover the log-injectable interfaces of the web application in Table 5. Combining the dynamic crawler and the static features of the view–logs interfaces, LogInjector can identify the view–logs interfaces of the web application in Table 5 and further use the dynamic testing method based on feedback-guided mutation to bypass the input-filter checks of these web applications.

6.2.2. Zero-Day Vulnerability Detection

The effectiveness of LogInjector has been preliminarily proved through previous experiments. Next, we verify the effectiveness of LogInjector for zero-day log injection vulnerability detection. We selected the web applications shown in the first column of Table 6, including the web applications used in previous similar work [11,15,29] experiments, and finally, we detected a total of six zero-day log injection vulnerabilities, five of which can lead to remote code execution and one that can cause cross-site scripting execution. We have notified the corresponding manufacturers of the details of these vulnerabilities and reported them to CNVD (China National Vulnerability Database). Table 6 lists the details of the zero-day vulnerabilities detected by LogInjector, including web applications and their version numbers, vulnerability types, and obtained vulnerability numbers.

LogInjector identified that, among the above six web applications, PHPFusion stores its log in the database and the remaining five store their logs in files. For AbanteCart, LogInjector imports files with specific content through the file upload form of AbanteCart’s data import module. When the AbanteCart back-end program imports data according to the file content, the content does not meet the required format, so a code execution error is reported and the error message is recorded in the corresponding log file. For the other five web applications, LogInjector submits specific input in the identified forms and finds the log-injectable interfaces. In addition, LogInjector identifies the view–logs interfaces of these six applications based on the dynamic crawler and static features.

After successfully identifying the log-injectable interfaces and the view–logs interfaces, LogInjector performs dynamic testing according to Algorithm 1. First, it fills the five initial inputs in Table 2 into the log-injectable interfaces in turn and stores them in the heaptree as initial test cases. Second, it takes a test case from the heaptree for testing and calculates the score of the test case according to the feedback information in Table 3. Third, it uses the vulnerability verification mechanism proposed in Section 5.3.5 to verify whether the current test case triggers the vulnerability. If the current test case triggers an RCE vulnerability or an XSS vulnerability, LogInjector stops detection and outputs the successful test case. If the current test case does not trigger the vulnerability, LogInjector checks whether its score is higher than the highest score in the heaptree; if it is, the current test case is added to the heaptree, and otherwise, the score of the test case decreases by 0.1. Then, LogInjector selects the test case with the highest score from the heaptree, randomly applies one of the mutation operations described in Section 5.3.3 to produce the current test case, and repeats the above process until a vulnerability is detected or the number of tests reaches the threshold.

Note that although we have identified the log-injectable interfaces and the view–logs interfaces, we cannot determine in advance whether a possible vulnerability is a cross-site scripting vulnerability or a remote code-execution vulnerability. Therefore, the inputs during dynamic testing include the five initial inputs in Table 2 and the inputs produced by the mutation operations proposed in Section 5.3.3, such as <script>alert(1)</script>, <SCrIpT>alert(1)</SCrIpT>, <?php phpinfo();?>, and <? phpinfo();?>. Finally, by analyzing the feedback and repeatedly mutating the test case, LogInjector successfully writes an effective payload to the log of each web application and triggers the vulnerability.

It is worth noting that when we used LogInjector to detect the known vulnerability CVE-2020-17449 of PHPFusion 9.03.20, we found another zero-day log injection vulnerability in this version of PHPFusion. When special characters are entered in the search form of the members search module of PHPFusion, the web application code reports an execution error, and the log information of the error is stored in the DB_ERRORS database table. When the website administrator accesses the “Error Log” module in the background, the execution of the entered malicious data is triggered.
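The dynamic testing loop described above (Algorithm 1) can be sketched with a priority queue. This is an illustration under stated assumptions, not the LogInjector implementation: `toy_filter` and `toy_oracle` are hypothetical stand-ins for a target application and for the verification step, the score is a single string similarity rather than the weighted feedback of Table 3, and the 0.1 decay is interpreted here as lowering the score of the parent test case whose mutation did not improve on it.

```python
import heapq
import itertools
import random
import re
from difflib import SequenceMatcher


def toy_filter(value: str) -> str:
    """Hypothetical input filter: removes lowercase 'script' in a single pass."""
    return value.replace("script", "")


def toy_oracle(stored: str) -> bool:
    """Stand-in for vulnerability verification: did a script element survive?"""
    return re.search(r"<script>alert\(1\)</script>", stored, re.I) is not None


MUTATIONS = [
    lambda c: c.replace("script", "ScRiPt"),        # M3: case transformation
    lambda c: c.replace("script", "scrscriptipt"),  # M3: double-writing
    lambda c: '">' + c,                             # M1: add prefix
]


def score(case: str, stored: str) -> float:
    """Simplified feedback: similarity between the input and what was stored."""
    return SequenceMatcher(None, case, stored).ratio()


def fuzz(seeds, budget: int = 200, rng_seed: int = 0):
    """Return a test case that passes the oracle, or None when the budget runs out."""
    rng = random.Random(rng_seed)
    order = itertools.count()  # tie-breaker so the heap never compares equal scores by string
    heap = []                  # min-heap over negated scores, i.e. a max-heap over scores
    for case in seeds:
        stored = toy_filter(case)
        if toy_oracle(stored):
            return case
        heapq.heappush(heap, (-score(case, stored), next(order), case))
    for _round in range(budget):
        neg_best, _tie, parent = heap[0]
        child = rng.choice(MUTATIONS)(parent)
        stored = toy_filter(child)
        if toy_oracle(stored):
            return child
        child_score = score(child, stored)
        if child_score > -neg_best:
            heapq.heappush(heap, (-child_score, next(order), child))
        else:
            # Assumed reading of the 0.1 decay: demote the parent that failed to improve.
            heapq.heapreplace(heap, (neg_best + 0.1, next(order), parent))
    return None


found = fuzz(["<script>alert(1)</script>"])
print(found)
```

The decay keeps the search from mutating the same high-scoring test case indefinitely: after enough unproductive mutations, its score falls below that of other candidates and the loop moves on.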

6.3. Comparison with Existing Work

In this section, we compare and analyze LogInjector with Black Widow [29]. We choose Black Widow because currently only Black Widow has the ability to detect a subset of log injection vulnerabilities automatically and it has open-source code. Black Widow adopts the method of a black box dynamic crawler by inputting random strings in each input box and matching these strings in subsequent web pages to analyze the web application’s inter-state dependencies. Then, it performs vulnerability detection. We use LogInjector and Black Widow to detect the 16 log injection vulnerabilities in the second column of Table 7. The experimental results are shown in Table 7. We enumerate the detection capabilities of LogInjector and Black Widow for these 16 log injection vulnerabilities.

As shown in Table 7, LogInjector detected a total of 16 log injection vulnerabilities, including 9 cross-site scripting vulnerabilities and 7 remote code-execution vulnerabilities. Black Widow detected a total of three log injection vulnerabilities, all of which are cross-site scripting vulnerabilities. Because Black Widow cannot detect remote code-execution vulnerabilities, we mainly analyzed the reasons why Black Widow did not detect the remaining six cross-site scripting vulnerabilities, which can be summarized as the following three:

Reason-1: The input constructed by Black Widow cannot be written to the log;
Reason-2: The set of test cases injected by Black Widow is limited; even when a test case is written to the log, it no longer has execution semantics, so the injected data cannot be executed;
Reason-3: Black Widow injected a payload with code execution semantics but did not identify the view–logs interface and therefore failed to trigger the vulnerability.

The results of our analysis of these six vulnerabilities are shown in Table 8. Taking Reason-2 as an example, the two web applications in the Reason-2 row of Table 8 each apply different input filtering to user input. Because Black Widow uses a fixed set of test cases, the injected data are encoded or replaced and cannot be executed. Thanks to our proposed dynamic testing method based on feedback-guided mutation, LogInjector can mutate test cases based on the feedback in the logs as well as in the HTML responses and finally enter an effective payload.

The comparative experimental results confirm the difficulty of detecting such vulnerabilities with black-box detection methods. This result is similar to those of previous studies [21,22,23,24,48]. For special scenarios, such as the Reason-2 row in Table 8, blindly traversing the input test cases cannot trigger the vulnerability. On the other hand, as shown in the Reason-3 row in Table 8, even if an effective payload is injected, the injection is of no use unless the tool identifies which page must be visited to trigger the vulnerability. The comparative experiments further illustrate why LogInjector needs to obtain the log storage location, the log-injectable interfaces, and the view–logs interfaces in the early stage.

6.4. Discussion

In this section, we will discuss the capabilities and limitations of LogInjector and future directions of research.

Malicious attackers can use log injection vulnerabilities to disrupt the normal operation of a website, bypass authentication to take over the back end of the website, and remotely control the target host to obtain users’ private information, seriously affecting the availability, confidentiality, and integrity of web applications. LogInjector can detect existing log injection vulnerabilities during web application testing and report them to web application developers, who can then fix the vulnerabilities and avoid the harm they cause. This ensures that the web application can provide services to users normally and that users’ private data are not stolen.

We propose LogInjector, an effective method for detecting log injection vulnerabilities; however, LogInjector has some limitations. (1) The coverage issue of crawling target interface: LogInjector is built on the basis of crawlergo. If the vulnerability-related interface is missed during the crawling process of the target web application, the corresponding vulnerability cannot be detected. (2) Inability to discover new types of vulnerabilities: The mutation operations of LogInjector are set based on known methods that bypass the application-specific content-filtering checks. Therefore, the log injection vulnerabilities discovered by LogInjector are known types of vulnerabilities, and it cannot discover new kinds of vulnerabilities. These limitations and challenges motivate us to go further in future research.

In the future, we intend to study improving the coverage of the crawling target interface to optimize LogInjector and extend this method to research on the detection of other new types of vulnerabilities (Cache Poisoning [49] and HTTP request smuggling [50]).

7. Conclusions

In this paper, we propose LogInjector, an effective method for detecting log injection vulnerabilities in web applications. LogInjector uses the extended crawler to identify the log-injectable interfaces and the view–logs interfaces. Then, it uses feedback information to guide the mutation of test cases for detecting log injection vulnerabilities. We evaluate the performance of LogInjector in detecting the log injection vulnerabilities of 14 web applications. The experimental results demonstrate that LogInjector can effectively detect log injection vulnerabilities. LogInjector found 16 web application log injection vulnerabilities in 14 web applications, 6 of which are zero-day vulnerabilities. Finally, LogInjector can be used to detect potentially vulnerable interactive functions in real-world web applications so that developers can fix the vulnerabilities in time and avoid more serious losses.

Author Contributions

Conceptualization, Z.P. and Y.C. (Yu Chen); methodology, Y.C. (Yu Chen); software, Y.C. (Yu Chen); validation, Z.P., Y.C. (Yuanchao Chen), Y.L. and Y.S.; formal analysis, Y.C. (Yu Chen); investigation, Y.C. (Yuanchao Chen); resources, Y.S. and Y.L.; data curation, Y.C. (Yuanchao Chen); writing—original draft preparation, Y.C. (Yu Chen); writing—review and editing, Z.P.; visualization, Y.C. (Yuanchao Chen); supervision, Y.S.; project administration, Y.C. (Yuanchao Chen); funding acquisition, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Acknowledgments

We would like to sincerely thank the reviewers for their insightful comments, which helped us improve this work.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DOM	Document Object Model
CNVD	China National Vulnerability Database
XSS	Cross-site scripting
RCE	Remote code execution
OWASP	Open Web Application Security Project

References

1. Home—Official Home of the PHPFusion CMS. Available online: https://www.php-fusion.co.uk/ (accessed on 1 June 2022).
2. CVE—CVE-2020-17449. Available online: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2020-17449 (accessed on 1 June 2022).
3. Contao Open Source CMS—Contao. Available online: https://contao.org/ (accessed on 1 June 2022).
4. Webmin. Available online: https://www.webmin.com/ (accessed on 1 June 2022).
5. Log Injection Software Attack | OWASP Foundation. Available online: https://owasp.org/www-community/attacks/Log_Injection (accessed on 1 June 2022).
6. Jovanovic, N.; Kruegel, C.; Kirda, E. Static analysis for detecting taint-style vulnerabilities in web applications. J. Comput. Secur. 2010, 18, 861–907.
7. Appelt, D.; Nguyen, C.D.; Briand, L.C.; Alshahwan, N. Automated testing for SQL injection vulnerabilities: An input mutation approach. In Proceedings of the 2014 International Symposium on Software Testing and Analysis, San Jose, CA, USA, 21–25 July 2014; pp. 259–269.
8. Backes, M.; Rieck, K.; Skoruppa, M.; Stock, B.; Yamaguchi, F. Efficient and flexible discovery of php application vulnerabilities. In Proceedings of the 2017 IEEE European Symposium on Security and Privacy (EuroS&P), Paris, France, 26–28 July 2017; pp. 334–349.
9. Fang, Y.; Li, Y.; Liu, L.; Huang, C. DeepXSS: Cross site scripting detection based on deep learning. In Proceedings of the 2018 International Conference on Computing and Artificial Intelligence, Chengdu, China, 12–14 March 2018; pp. 47–51.
10. Tang, P.; Qiu, W.; Huang, Z.; Lian, H.; Liu, G. Detection of SQL injection based on artificial neural network. Knowl.-Based Syst. 2020, 190, 105528.
11. Van Rooij, O.; Charalambous, M.A.; Kaizer, D.; Papaevripides, M.; Athanasopoulos, E. webFuzz: Grey-Box Fuzzing for Web Applications. In Computer Security—ESORICS 2021; Bertino, E., Shulman, H., Waidner, M., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 152–172.
12. Erdődi, L.; Sommervoll, Å.Å.; Zennaro, F.M. Simulating SQL injection vulnerability exploitation using Q-learning reinforcement learning agents. J. Inf. Secur. Appl. 2021, 61, 102903.
13. Liu, Z.; Fang, Y.; Huang, C.; Xu, Y. GAXSS: Effective Payload Generation Method to Detect XSS Vulnerabilities Based on Genetic Algorithm. Secur. Commun. Netw. 2022, 2022, 2031924.
14. Lee, S.; Wi, S.; Son, S. Link: Black-Box Detection of Cross-Site Scripting Vulnerabilities Using Reinforcement Learning. In Proceedings of the ACM Web Conference, Lyon, France, 25–29 April 2022; pp. 743–754.
15. Zhao, J.; Lu, Y.; Zhu, K.; Chen, Z.; Huang, H. Cefuzz: An Directed Fuzzing Framework for PHP RCE Vulnerability. Electronics 2022, 11, 758.
16. Balzarotti, D.; Cova, M.; Felmetsger, V.V.; Vigna, G. Multi-module vulnerability analysis of web-based applications. In Proceedings of the 14th ACM Conference on Computer and Communications Security, Alexandria, VA, USA, 28–31 October 2007; pp. 25–35.
17. Dahse, J.; Holz, T. Static Detection of Second-Order Vulnerabilities in Web Applications. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, USA, 20–22 August 2014; pp. 989–1003.
18. Olivo, O.; Dillig, I.; Lin, C. Detecting and exploiting second order denial-of-service vulnerabilities in web applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 616–628.
19. Yan, L.; Li, X.; Feng, R.; Feng, Z.; Hu, J. Detection method of the second-order SQL injection in Web applications. In Proceedings of the International Workshop on Structured Object-Oriented Formal Language and Method, Queenstown, New Zealand, 29 October 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 154–165.
20. Suto, L.; San, C. Analyzing the Accuracy and Time Costs of Web Application Security Scanners. Appl. Secur. 2010, 187, 64.
21. Bau, J.; Bursztein, E.; Gupta, D.; Mitchell, J. State of the art: Automated black-box web application vulnerability testing. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 332–345.
22. Vieira, M.; Antunes, N.; Madeira, H. Using web security scanners to detect vulnerabilities in web services. In Proceedings of the IEEE/IFIP International Conference on Dependable Systems & Networks, Lisbon, Portugal, 2 July 2009; pp. 566–571.
23. Doupé, A.; Cova, M.; Vigna, G. Why Johnny can’t pentest: An analysis of black-box web vulnerability scanners. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Bonn, Germany, 8–9 July 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 111–131.
24. Parvez, M.; Zavarsky, P.; Khoury, N. Analysis of effectiveness of black-box web application scanners in detection of stored SQL injection and stored XSS vulnerabilities. In Proceedings of the 2015 10th International Conference for Internet Technology and Secured Transactions (ICITST), London, UK, 14–16 December 2015; pp. 186–191.
25. McAllister, S.; Kirda, E.; Kruegel, C. Leveraging User Interactions for In-Depth Testing of Web Applications. In Proceedings of the International Symposium on Recent Advances in Intrusion Detection, Cambridge, MA, USA, 15–17 September 2008.
26. Doupé, A.; Cavedon, L.; Kruegel, C.; Vigna, G. Enemy of the State: A State-Aware Black-Box Web Vulnerability Scanner. In Proceedings of the 21st USENIX Security Symposium (USENIX Security 12), Bellevue, WA, USA, 8–10 August 2012; pp. 523–538.
27. Duchene, F.; Rawat, S.; Richier, J.L.; Groz, R. KameleonFuzz: Evolutionary fuzzing for black-box XSS detection. In Proceedings of the 4th ACM Conference on Data and Application Security and Privacy, San Antonio, TX, USA, 3–5 March 2014; pp. 37–48.
28. Steinhauser, A.; Ta, P. Database traffic interception for graybox detection of stored and context-sensitive XSS. Digit. Threat. Res. Pract. 2020, 1, 1–23.
29. Eriksson, B.; Pellegrino, G.; Sabelfeld, A. Black Widow: Blackbox Data-driven Web Scanning. In Proceedings of the 2021 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 24–27 May 2021; pp. 1125–1142.
30. Alkhalaf, M.A. Automatic Detection and Repair of Input Validation and Sanitization Bugs; University of California: Santa Barbara, CA, USA, 2014.
31. Chin, E.; Wagner, D. Efficient character-level taint tracking for Java. In Proceedings of the 2009 ACM Workshop on Secure Web Services, Chicago, IL, USA, 13 November 2009; pp. 3–12.
32. Bisht, P.; Madhusudan, P.; Venkatakrishnan, V. CANDID: Dynamic candidate evaluations for automatic prevention of SQL injection attacks. ACM Trans. Inf. Syst. Secur. (TISSEC) 2010, 13, 1–39.
33. Bulekov, A.; Jahanshahi, R.; Egele, M. Saphire: Sandboxing PHP Applications with Tailored System Call Allowlists. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, CA, USA, 11–13 August 2021; pp. 2881–2898.
34. Qianlitp/Crawlergo: A Powerful Browser Crawler for Web Vulnerability Scanners. Available online: https://github.com/Qianlitp/crawlergo (accessed on 1 June 2022).
35. History API—Web APIs | MDN. Available online: https://developer.mozilla.org/en-US/docs/Web/API/History_API (accessed on 1 June 2022).
36. setTimeout()—Web APIs | MDN. Available online: https://developer.mozilla.org/en-US/docs/Web/API/setTimeout (accessed on 1 June 2022).
37. html5lib/html5lib-Python: Standards-Compliant Library for Parsing and Serializing HTML Documents and Fragments in Python. Available online: https://github.com/html5lib/html5lib-PYTHON (accessed on 1 June 2022).
38. Easy Digital Downloads—Simple eCommerce for Selling Digital Files—WordPress plugin | WordPress.org. Available online: https://wordpress.org/plugins/easy-digital-downloads/ (accessed on 1 June 2022).
39. SuiteCRM—Open Source CRM Software Application for Businesses. Available online: https://suitecrm.com/ (accessed on 1 June 2022).
40. Zen Cart Support—Zen Cart™—Putting the Dream of Your Own Business within Reach of Anyone! Available online: https://www.zen-cart.com/ (accessed on 1 June 2022).
41. Free Shopping Cart Application and Open Source Ecommerce Solution. Available online: https://www.abantecart.com/ (accessed on 1 June 2022).
42. Maxsite CMS. Available online: https://max-3000.com/ (accessed on 1 June 2022).
43. DSGVO All in one for WP—WordPress Plugin | WordPress.org. Available online: https://wordpress.org/plugins/dsgvo-all-in-one-for-wp/ (accessed on 1 June 2022).
44. WP Photo Album Plus—WordPress Plugin | WordPress.org. Available online: https://wordpress.org/plugins/wp-photo-album-plus/ (accessed on 1 June 2022).
45. FREE Shopping Cart and Open Source eCommerce Platform—Start Selling Online for Free. Available online: https://www.oscommerce.com/ (accessed on 1 June 2022).
46. Better WordPress Google XML Sitemaps (Support Sitemap Index, Multi-Site and Google News)—WordPress Plugin | WordPress.org. Available online: https://wordpress.org/plugins/bwp-google-xml-sitemaps/ (accessed on 1 June 2022).
47. CE Phoenix Cart, Free Open Source Shopping Cart | CE Phoenix Cart. Available online: https://phoenixcart.org/ (accessed on 1 June 2022).
48. Anagandula, K.; Zavarsky, P. An Analysis of Effectiveness of Black-Box Web Application Scanners in Detection of Stored SQL Injection and Stored XSS Vulnerabilities. In Proceedings of the 2020 3rd International Conference on Data Intelligence and Security (ICDIS), South Padre Island, TX, USA, 24–26 June 2020; pp. 40–48.
49. Cache Poisoning | OWASP Foundation. Available online: https://owasp.org/www-community/attacks/Cache_Poisoning (accessed on 1 June 2022).
50. What is HTTP Request Smuggling? Tutorial & Examples | Web Security Academy. Available online: https://portswigger.net/web-security/request-smuggling (accessed on 1 June 2022).

Figure 1. The architecture of LogInjector. (TC in the figure represents the test case).

Figure 2. HTML code of viewing error log web interface page of PHPFusion.

Figure 3. Log injection vulnerability payload structure.

Table 1. Summary of related work.

Research Focus	Research Articles	Approach	Advantages	Disadvantages
First-order vulnerabilities	Jovanovic et al. [6] (2010)	Data flow analysis	Supports alias analysis	High false positives; misses dynamic features
	Appelt et al. [7] (2014)	Input mutation	Bypasses web application firewalls	Limited mutation operations
	Backes et al. [8] (2017)	Code property graph	Detects vulnerabilities based on graph queries	False positives exist; misses dynamic features
	Fang et al. [9] (2018)	Deep learning	Extracts the features of XSS payloads	False negatives exist
	Tang et al. [10] (2020)	Neural network	Bypasses web application firewalls	Misses context analyses
	Van Rooij et al. [11] (2021)	Coverage-based greybox fuzzing	High coverage	Limited mutation operations; only detects reflected XSS
	Erdődi et al. [12] (2021)	Reinforcement learning	Supports autonomous penetration testing	Only works with simple CTF challenges
	Liu et al. [13] (2022)	Genetic algorithm	Generates effective XSS payloads	Lacks adaptive capability
	Lee et al. [14] (2022)	Reinforcement learning	Supports output context analyses	Only detects reflected XSS
	Zhao et al. [15] (2022)	Directed fuzzing	Has an effective mutation strategy	Only detects first-order RCE
Second-order vulnerabilities	Balzarotti et al. [16] (2007)	Data flow analysis	Identifies sophisticated multi-module vulnerabilities	Does not support multi-step exploits
	McAllister et al. [25] (2008)	Black-box dynamic	Explores larger application parts	Does not support inter-state dependency analysis
	Doupé et al. [26] (2012)	State-aware black box	Explores more application code	Misses complex workflows and inter-state dependencies
	Duchene et al. [27] (2014)	Black box dynamic; genetic algorithm	No false positives	Requires the ability to reset the application
	Dahse et al. [17] (2014)	Taint analysis	Models the data flow through persistent data stores	High false positives
	Yan et al. [19] (2014)	Combines with static and dynamic analysis	Reduces false positives	Only detects second-order SQLi
	Olivo et al. [18] (2015)	Symbolic execution	Supports generating candidate attack vectors	Only detects second-order DoS
	Steinhauser et al. [28] (2020)	Graybox dynamic	Intercepts database traffic to identify vulnerabilities	Only detects stored XSS
	Eriksson et al. [29] (2021)	Black box data-driven	Supports inter-state dependency analysis	Only detects stored XSS

Table 2. Initial inputs.

ID	Initial Inputs	Description
S1	<script>alert(1)</script>	Script Tag
S2	<img src='x' onerror=alert(1)>	Event Attributes
S3	javascript:alert(1)	JavaScript Pseudo Protocol
S4	<link rel=import src='http://attack.js'>	Remote js file
S5	<?php phpinfo();?>	PHP Tag

Table 3. Feedback information.

ID	Description	Score
F1	Whether the tag is injected into the log	0–1
F2	Whether the attribute is injected into the log	0–1
F3	Whether the code snippet is injected into the log	0–1
F4	String similarity between the payload and the content injected in the log	[0,1]
F5	Whether the tag is injected into HTML	0–1
F6	Whether the attribute is injected into HTML	0–1
F7	Whether the code snippet is injected into HTML	0–1
F8	String similarity between the payload and the corresponding content in the HTML	[0,1]

Table 4. Experimental dataset.

ID	Web Application	Version	ID	Web Application	Version
1	Contao [3]	4.9.5	8	easy-digital-downloads [38]	2.11.5
2	SuiteCRM [39]	7.11.19	9	ZenCart [40]	1.5.5e
3	Webmin [4]	1.941	10	AbanteCart [41]	1.3.2
4	PHPFusion [1]	9.03.20	11	MaxSite CMS [42]	v108
5	DSGVO All in one for WP [43]	3.9	12	SuiteCRM [39]	7.11.23
6	WP Photo Album Plus [44]	8.0.9	13	osCommerce [45]	2.3.4.1
7	BWP Google XML Sitemaps [46]	1.4.1	14	CE Phoenix [47]	1.0.8.14

Table 5. Summary of public vulnerabilities.

Web Application	Detectable by LogInjector	Vulnerability Type	VUL-ID
Contao 4.9.5	Yes	XSS	CVE-2021-35210
SuiteCRM 7.11.19	Yes	RCE	CVE-2021-42840
Webmin 1.941	Yes	XSS	CVE-2020-8820
Webmin 1.941	Yes	XSS	CVE-2020-8821
PHPFusion 9.03.20	Yes	XSS	CVE-2020-17449
ZenCart 1.5.5e	Yes	RCE	CVE-2017-11675
DSGVO All in one for WP 3.9	Yes	XSS	CVE-2021-24294
WP Photo Album Plus 8.0.9	Yes	XSS	CVE-2021-25115
BWP Google XML Sitemaps 1.4.1	Yes	XSS	CVE-2022-0230
Easy-digital-downloads 2.11.5	Yes	XSS	CVE-2022-0706

Table 6. Zero-day log injection vulnerabilities discovered by LogInjector.

Web Application	Detectable by LogInjector	Vulnerability Type	VUL-ID
PHPFusion 9.03.20	Yes	XSS	CNVD-2022-48580
AbanteCart 1.3.2	Yes	RCE	CNVD-2022-48600
MaxSite CMS v108	Yes	RCE	CNVD-2022-51198
SuiteCRM 7.11.23	Yes	RCE	CNVD-2022-51165
osCommerce 2.3.4.1	Yes	RCE	CNVD-2022-48592
CE Phoenix 1.0.8.14	Yes	RCE	CNVD-2022-48597

Table 7. Vulnerability detection capability comparison.

Vulnerability Type	VUL-ID	Web Application	LogInjector	Black Widow
XSS	CVE-2021-35210	Contao 4.9.5	✓	✗
	CVE-2020-8820	Webmin 1.941	✓	✗
	CVE-2020-8821	Webmin 1.941	✓	✗
	CVE-2021-24294	DSGVO All in one for WP 3.9	✓	✓
	CVE-2021-25115	WP Photo Album Plus 8.0.9	✓	✗
	CVE-2022-0230	BWP Google XML Sitemaps 1.4.1	✓	✓
	CVE-2022-0706	easy-digital-downloads 2.11.5	✓	✓
	CVE-2020-17449	PHPFusion 9.03.20	✓	✗
	CNVD-2022-48580	PHPFusion 9.03.20	✓	✗
RCE	CVE-2017-11675	ZenCart 1.5.5e	✓	✗
	CVE-2021-42840	SuiteCRM 7.11.19	✓	✗
	CNVD-2022-51165	SuiteCRM 7.11.23	✓	✗
	CNVD-2022-51198	MaxSite CMS v108	✓	✗
	CNVD-2022-48600	AbanteCart 1.3.2	✓	✗
	CNVD-2022-48592	osCommerce 2.3.4.1	✓	✗
	CNVD-2022-48597	CE Phoenix 1.0.8.14	✓	✗
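
As a quick cross-check of the totals reported in the abstract, the rows of Table 7 can be tallied with a short Python sketch. The data below is transcribed from the table; the variable names and the convention that a `CNVD-` identifier marks a zero-day finding (following Table 6) are assumptions made for this illustration.

```python
from collections import Counter

# (vulnerability type, VUL-ID, detected by LogInjector, detected by Black Widow),
# transcribed from Table 7.
ROWS = [
    ("XSS", "CVE-2021-35210", True, False),
    ("XSS", "CVE-2020-8820", True, False),
    ("XSS", "CVE-2020-8821", True, False),
    ("XSS", "CVE-2021-24294", True, True),
    ("XSS", "CVE-2021-25115", True, False),
    ("XSS", "CVE-2022-0230", True, True),
    ("XSS", "CVE-2022-0706", True, True),
    ("XSS", "CVE-2020-17449", True, False),
    ("XSS", "CNVD-2022-48580", True, False),
    ("RCE", "CVE-2017-11675", True, False),
    ("RCE", "CVE-2021-42840", True, False),
    ("RCE", "CNVD-2022-51165", True, False),
    ("RCE", "CNVD-2022-51198", True, False),
    ("RCE", "CNVD-2022-48600", True, False),
    ("RCE", "CNVD-2022-48592", True, False),
    ("RCE", "CNVD-2022-48597", True, False),
]

loginjector_total = sum(row[2] for row in ROWS)
black_widow_total = sum(row[3] for row in ROWS)
# Assumption: CNVD identifiers correspond to the zero-day findings of Table 6.
zero_day_total = sum(row[1].startswith("CNVD-") for row in ROWS)
by_type = Counter(row[0] for row in ROWS)

print(loginjector_total, black_widow_total, zero_day_total, dict(by_type))
# prints: 16 3 6 {'XSS': 9, 'RCE': 7}
```

The tally agrees with the abstract: 16 vulnerabilities detected by LogInjector, 6 of them zero-day, against 3 detected by Black Widow.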

Table 8. Causes of the vulnerabilities that Black Widow did not detect.

Reason	Web Application	VUL-ID
Reason-1	PHPFusion 9.03.20	CVE-2020-17449
Reason-1	PHPFusion 9.03.20	CNVD-2022-48580
Reason-2	WP Photo Album Plus 8.0.9	CVE-2021-25115
Reason-2	Webmin 1.941	CVE-2020-8820
Reason-2	Webmin 1.941	CVE-2020-8821
Reason-3	Contao 4.9.5	CVE-2021-35210

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Pan, Z.; Chen, Y.; Chen, Y.; Shen, Y.; Li, Y. LogInjector: Detecting Web Application Log Injection Vulnerabilities. Appl. Sci. 2022, 12, 7681. https://doi.org/10.3390/app12157681

