The widespread use and distinctive characteristics of RESTful APIs have recently drawn significant attention from the testing research community. In this section, we provide an overview of related work on automated test generation for RESTful APIs, with a particular focus on existing methods for parameter value generation. Additionally, we introduce related research on using large language models (LLMs) for fuzzing.
2.1. Automated Test Generation for RESTful APIs
For RESTful APIs, most existing work on automated test generation employs black-box testing.
In 2019, Atlidakis et al. proposed RESTler [5], the first stateful black-box RESTful API fuzzer, designed to automatically test cloud services through their RESTful APIs. RESTler is a generation-based fuzzer that instantiates parameter values from a predefined dictionary and statically analyzes the input OpenAPI specification to infer dependencies between request types. Atlidakis et al. also implemented several security rule checkers in RESTler [17] that automatically detect violations of these security rules. To further extend RESTler's capabilities, Godefroid et al. defined differential regression testing for RESTful APIs, which uses RESTler to construct network logs on different versions of the target RESTful APIs and detects service and specification regressions by comparing these logs [18]. Godefroid et al. also studied how to generate data payloads learned from RESTful API specifications and find data-processing bugs in cloud services [19].
Model-based testing is a representative approach for automatically generating test cases for RESTful APIs. Viglianisi et al. proposed RestTestGen [6], which considers data dependencies among operations as well as operation semantics. The dependencies between APIs are specified by an operation dependency graph (ODG), which is initialized from an OpenAPI schema and evolves during test execution. Liu et al. proposed Morest [7], which models the dependencies with a dynamically updated RESTful-service property graph (RPG). Compared with the ODG, the RPG not only models the producer–consumer dependencies between APIs in more detail but also captures property equivalence relations between schemas, allowing it to describe a broader range of RESTful service behaviors and to update itself flexibly based on execution feedback.
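To make the producer–consumer idea behind these graph-based models concrete, the following minimal Python sketch infers dependency edges by matching the response fields one operation produces against the parameters another consumes. The operations and field names are invented for illustration; this is only a sketch of the concept, not code from either tool.

```python
# Minimal sketch of producer-consumer dependency inference, in the spirit of
# graph-based models such as the ODG/RPG (illustrative names, not tool code).
from dataclasses import dataclass, field

@dataclass
class Operation:
    name: str                                        # e.g., "POST /projects"
    inputs: set[str] = field(default_factory=set)    # parameters consumed
    outputs: set[str] = field(default_factory=set)   # response fields produced

def build_dependency_edges(ops: list[Operation]) -> list[tuple[str, str, str]]:
    """An edge (producer, consumer, field) means the consumer can reuse a
    value produced by the producer, so the producer should be called first."""
    edges = []
    for producer in ops:
        for consumer in ops:
            if producer is consumer:
                continue
            for shared in producer.outputs & consumer.inputs:
                edges.append((producer.name, consumer.name, shared))
    return edges

ops = [
    Operation("POST /projects", inputs={"name"}, outputs={"project_id"}),
    Operation("GET /projects/{project_id}", inputs={"project_id"}),
]
print(build_dependency_edges(ops))
# [('POST /projects', 'GET /projects/{project_id}', 'project_id')]
```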
Karlsson et al. proposed QuickREST [20], a property-based test generation approach that uses OpenAPI schemas; QuickREST checks for non-500 status codes and schema compliance. Hatfield-Dodds et al. presented Schemathesis [21], a tool for finding semantic errors and crashes in OpenAPI or GraphQL web APIs through property-based testing.
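As an illustration of this property-based style, the snippet below shows a typical pytest-based Schemathesis test. The exact entry point varies across Schemathesis versions, and the specification URL is a placeholder, so treat this as a sketch rather than canonical usage.

```python
# Example property-based test with Schemathesis (pytest-style). The entry
# point may differ across Schemathesis versions; the URL is a placeholder.
import schemathesis

schema = schemathesis.from_uri("http://localhost:8080/openapi.json")

@schema.parametrize()
def test_api(case):
    # Generates inputs from the schema, sends the request, and checks
    # properties such as "no 5xx responses" and response-schema conformance.
    case.call_and_validate()
```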
Tsai et al. proposed HsuanFuzz [22], which introduced the new concept of “RESTful API black-box fuzz testing based on coverage level guidelines”. HsuanFuzz uses the test coverage level [23] as feedback to guide black-box fuzzers and employs pairwise testing to reduce the combinations of test parameters and accelerate testing. Wu et al. presented RestCT [10], a systematic and fully automated approach that adopts combinatorial testing (CT) to test RESTful APIs. RestCT is systematic in that it covers and tests not only the interactions of a certain number of operations in RESTful APIs but also the interactions of specific input parameters within each operation.
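The following toy Python sketch illustrates the pairwise (2-way) idea underlying such combinatorial approaches using a greedy covering construction. The parameters and values are invented, and real tools such as RestCT use far more sophisticated covering-array generation.

```python
# Toy greedy sketch of pairwise (2-way) combinatorial testing over request
# parameters; illustrative only, not how RestCT builds covering arrays.
from itertools import combinations, product

params = {
    "sort":  ["asc", "desc"],
    "state": ["open", "closed", "all"],
    "page":  [1, 2],
}
names = list(params)

def pairs_of(assignment):
    """All (param, value) pairs covered by one full assignment."""
    return {
        ((a, assignment[a]), (b, assignment[b]))
        for a, b in combinations(names, 2)
    }

# Every value combination of every parameter pair must appear in some test.
uncovered = set().union(*(pairs_of(dict(zip(names, c)))
                          for c in product(*params.values())))

tests = []
while uncovered:
    # Greedily pick the full assignment covering the most uncovered pairs.
    best = max(product(*params.values()),
               key=lambda c: len(pairs_of(dict(zip(names, c))) & uncovered))
    assignment = dict(zip(names, best))
    tests.append(assignment)
    uncovered -= pairs_of(assignment)

print(f"{len(tests)} tests cover all pairs (vs. 12 exhaustive combinations)")
```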
Very recently, Lin et al. proposed foREST [8], a novel tree-based black-box fuzzing approach for RESTful APIs that is more efficient than traditional graph-based approaches. foREST models the relations of APIs with a tree structure, which both reduces the complexity of API dependencies and captures the priority of resource dependencies. Fuzzing experiments on real-world RESTful services show that foREST outperforms state-of-the-art tools.
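The sketch below illustrates the tree intuition by folding URL paths into a resource tree, so that a parent resource (e.g., /projects) is exercised before its children. The paths are invented examples, not foREST's actual model.

```python
# Illustrative sketch of modeling API paths as a resource tree (in the spirit
# of tree-based approaches such as foREST; not the tool's actual code).
paths = [
    "/projects",
    "/projects/{project_id}",
    "/projects/{project_id}/issues",
    "/projects/{project_id}/issues/{issue_id}",
    "/users",
]

tree: dict = {}
for path in paths:
    node = tree
    for segment in path.strip("/").split("/"):
        node = node.setdefault(segment, {})  # shared prefixes share nodes

def show(node, depth=0):
    for name, child in node.items():
        print("  " * depth + "/" + name)
        show(child, depth + 1)

show(tree)
# /projects
#   /{project_id}
#     /issues
#       /{issue_id}
# /users
```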
A few studies employ white-box information (e.g., code coverage) to assist automated test generation for RESTful APIs.
Atlidakis et al. proposed Pythia [24], a grey-box fuzzer that uses the built-in functions of Ruby to obtain code coverage. Pythia builds on the execution results of RESTler: its mutation stage is improved with a machine learning method, and grammar validity is maintained even after noise is injected. Arcuri et al. considered testing from a developer's perspective and designed the white-box testing tool EvoMASTER [25], which uses genetic algorithms to generate random test cases and guides mutation based on coverage. Subsequently, Arcuri et al. [26] improved EvoMASTER by employing the MIO algorithm for intelligent sampling to generate predefined structured tests, thus accelerating the search process. However, the enhanced EvoMASTER still exhibited low code coverage on certain projects, leaving room for further improvement. To address this, they [27] took the database state into account by monitoring SQL commands and calculating heuristic values to enhance search-based testing. They have continued to optimize EvoMASTER with an adaptive weight-based hypermutation approach [28] and by extending Rd-MIO* with SQL commands [29].
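As a rough illustration of this search-based style, the sketch below keeps mutated test cases only when they improve measured coverage. It is a drastically simplified stand-in for EvoMASTER's evolutionary search (the MIO algorithm is considerably more elaborate), with all helper functions left abstract.

```python
# Toy sketch of coverage-guided, search-based test generation; illustrative
# only. `mutate` and `run_and_measure_coverage` are abstract placeholders.
import random

def evolve(seed_tests, mutate, run_and_measure_coverage, budget=100):
    # Each entry pairs a test case with the (numeric) coverage it achieved.
    population = [(t, run_and_measure_coverage(t)) for t in seed_tests]
    for _ in range(budget):
        parent, parent_cov = random.choice(population)
        child = mutate(parent)
        child_cov = run_and_measure_coverage(child)
        if child_cov > parent_cov:      # keep mutants that improve coverage
            population.append((child, child_cov))
    return [t for t, _ in population]
```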
Regarding concrete parameter value generation, static mapping dictionaries and random values are likely the simplest and most common approaches in existing research [5,19]. RestTestGen [6] proposes four sources for generating valid parameter values: values observed in a previous successful request, parameter dictionaries, examples from the documentation, or completely random values. Most subsequent works rely on these conventional methods to generate request parameter values. According to the evaluation of 10 RESTful API fuzzers conducted by Kim et al. [30], these tools generate many invalid requests that are rejected by the services and fail to exercise service functionality in depth; in most cases, this occurs when parameters require domain-specific values or are subject to data-format restrictions.
Several recent works focus on generating valid parameter values to improve the effectiveness of test requests. Martin-Lopez et al. proposed the automated testing tool RESTest [11], building on their previous work that introduced an IDL for formalizing request parameter constraints together with an automated analysis tool [31]. RESTest automatically analyzes inter-parameter constraints and generates parameter values that satisfy them. RestCT [10] proposes a pattern-matching approach to infer parameter constraints and then uses combinatorial testing to construct constraint-covering arrays of request parameters, thereby instantiating request parameter values. However, since RestCT relies on a set of hard-coded patterns, it lacks flexibility and generality.
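The snippet below gives a flavor of this style of hard-coded pattern matching over parameter descriptions. The regular expressions and constraint kinds are invented examples rather than RestCT's actual rules.

```python
# Illustrative regex-based inference of inter-parameter constraints from
# natural-language descriptions (invented patterns, not RestCT's rules).
import re

RULES = [
    (re.compile(r"required if (\w+) is set", re.I),              "RequiredIf"),
    (re.compile(r"mutually exclusive with (\w+)", re.I),         "ExclusiveWith"),
    (re.compile(r"must be (?:greater|larger) than (\w+)", re.I), "GreaterThan"),
]

def infer_constraints(param: str, description: str):
    constraints = []
    for pattern, kind in RULES:
        for match in pattern.finditer(description):
            constraints.append((kind, param, match.group(1)))
    return constraints

print(infer_constraints(
    "end_date",
    "Must be greater than start_date; required if sort is set."))
# [('RequiredIf', 'end_date', 'sort'), ('GreaterThan', 'end_date', 'start_date')]
```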
NLPtoREST [12] uses natural language processing to automatically extract parameter constraints described in natural language in OpenAPI documentation and adds these constraints back into the OpenAPI document, following OpenAPI syntax, to assist in generating valid request parameter values during fuzz testing. Alonso et al. [13] considered the practical meaning of parameters and used natural language processing to extract data from a knowledge base, improving the automation of parameter value generation. Instead of generating valid parameter values, Mirabella et al. [32] proposed a deep-learning-based approach that automatically infers, before dynamic testing, whether a generated request is valid, i.e., whether it satisfies all parameter constraints.
Existing test case generation still has two main limitations, which cause state-of-the-art RESTful API fuzzers to generate many invalid test requests during fuzzing. On the one hand, existing test case generators are unaware of the errors in invalid test requests, leading to the subsequent generation of more invalid test requests containing the same incorrect parameters. Instead of reassigning all parameters of the request template each time, we propose an innovative approach to generate new test requests by revising the incorrect parts of previously tested invalid requests. On the other hand, existing strategies for parameter value generation struggle to produce valid values for incorrect parameters due to a lack of effective information. Relying solely on information extracted from the OAS is insufficient to generate valid values for these incorrect parameters. These approaches overlook useful parameter description information in the error responses of invalid test requests. Moreover, they do not attempt to utilize the existing API-related resource information in the target services. In this paper, we design two new strategies for generating valid parameter values, based on understanding the semantic information of dynamic error responses and actively accessing API-related resource information from the SUTs.
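To make this intuition concrete, the following Python sketch shows one way such an error-response-driven repair step could look. The prompt wording and the ask_llm client are hypothetical placeholders; this is a simplified illustration of the general idea, not the exact implementation presented later in this paper.

```python
# Minimal sketch: ask an LLM to revise only the invalid parameters of a
# rejected request, guided by the service's error response. The prompt text
# and `ask_llm` helper are hypothetical placeholders.
import json

def revise_request(request_params: dict, error_response: str, ask_llm) -> dict:
    prompt = (
        "A REST API request was rejected. Original parameters:\n"
        f"{json.dumps(request_params)}\n"
        f"Error response:\n{error_response}\n"
        "Return a JSON object with corrected values for ONLY the parameters "
        "the error message identifies as invalid."
    )
    fixes = json.loads(ask_llm(prompt))   # e.g., {"date": "2024-05-01"}
    return {**request_params, **fixes}    # keep valid parameters unchanged

params = {"project": "demo", "date": "05/01/2024"}
error = '{"message": "date must be in ISO 8601 format (YYYY-MM-DD)"}'
# revised = revise_request(params, error, ask_llm=my_llm_client)
```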
2.2. Large Language Models for Fuzzing
Emerging LLMs have demonstrated impressive performance on many specific downstream tasks through prompt engineering [33], where users provide the LLM with a description of the task and, optionally, a few examples of solving it. More recently, researchers have explored the capacity of LLMs in the field of fuzzing.
Lemieux et al. proposed CODAMOSA [34], the first work to apply LLMs to fuzzing; it demonstrates the capability of LLMs to automatically generate test cases for Python modules. For fuzzing deep learning libraries such as PyTorch and TensorFlow, Deng et al. proposed TitanFuzz [35], which uses Codex [36] to generate seed programs and InCoder [37] to perform template-based mutation. Deng et al. also proposed FuzzGPT [38], which utilizes historical bug-triggering code snippets to either prompt or fine-tune LLMs to generate more unusual code snippets, leading to more effective fuzzing. Ackerman et al. [39] leveraged LLMs for fuzzing parsers, employing an LLM to recursively examine a natural language format specification and generate instances from it as strong seed examples for a mutation fuzzer. Meng et al. proposed CHATAFL [40], an LLM-guided fuzzing engine for protocol implementations that overcomes the challenges faced by existing protocol fuzzers by exploiting LLMs' knowledge of the message types of well-known protocols. Cheng et al. proposed MSFuzz [41], which leverages LLMs to augment protocol fuzzing with message syntax comprehension. Wang et al. proposed LLMIF [42], a fuzzing algorithm that incorporates LLMs into IoT fuzzing. Xia et al. proposed Fuzz4All [43], which leverages LLMs as an input generation and mutation engine, enabling the generation of diverse and realistic inputs for any practically relevant language.
Two contemporaneous studies explore the capability of LLMs in RESTful API testing. Decrop et al. proposed RESTSpecIT [44], the first automated RESTful API specification inference and black-box testing approach leveraging LLMs. Kim et al. proposed RESTGPT [45], which harnesses LLMs to enhance REST API specifications by identifying constraints and generating relevant parameter values. Unlike these works, we leverage LLMs to understand the semantic information in error responses during dynamic testing and to revise invalid parameters; in other words, we apply LLMs to specialized tasks from a different perspective to enhance RESTful API fuzzing.