1. Introduction
Structured concurrency is a concurrent programming method that constrains the concurrent behavior between tasks by defining a concurrent region, enhancing the program’s maintainability and observability. The structured concurrency framework was first introduced in JDK 19 [1], which defines a clear scope for a set of tasks that are executed simultaneously in order to manage them, aiming at more precise control and management of concurrent tasks. Structured concurrency can help to simplify thread management issues in concurrent programming and can realize the programming style of one task per thread.
In recent years, various methods and tools have been proposed in both academia and industry to enhance the performance of concurrent programs. Some researchers have proposed methods for converting sequential Java code to concurrent code. Dig et al. [2] proposed a method to improve the responsiveness of mobile wearable devices by converting long-running blocking code to asynchronous execution. Tramontana et al. [3] described how to utilize program analysis techniques to parallelize identified serial code. Other researchers have suggested using new features in the JDK to transform serial streams to parallel streams. Midolo et al. [4] proposed a refactoring method to transform different for loops to the corresponding stream structures to improve the execution efficiency of the loops. Radoi et al. [5] proposed a refactoring method called Relooper, which transforms serial arrays to parallel arrays that can be processed in parallel. Additionally, there has been work on lock-related refactoring in parallel programs; for example, Frank et al. [6] proposed an automatic refactoring tool called Relocker, which helps programmers refactor synchronized blocks to reentrant or read–write locks. Zhang et al. introduced the refactoring tools FineLock [7] and Clock [8] to improve lock patterns.
Most of the refactoring tools mentioned above are built on the Java 8 APIs, such as Lambda expressions and the Stream API. Although these tools have proven useful in their specific application scenarios, the limitations of their technical foundation mean that none of them can effectively address the complexity of concurrent programming arising from task coordination, error propagation control, and the organization of complex concurrent logic. It is therefore difficult for them to achieve in-depth optimization and efficient management of concurrent programs. In contrast, our tool is based on the structured concurrency features introduced in JDK 19 and can provide developers with a convenient, safe, and efficient concurrent programming model. In Java, unstructured concurrency can be refactored to structured concurrency manually, but manual refactoring is time consuming, labor intensive, and prone to errors. Automated refactoring can be employed instead, but it faces the following main challenges: (1) identifying unstructured concurrency and refactoring it to structured concurrency through program analysis, (2) ensuring the independence of tasks during the structured concurrency refactoring, and (3) ensuring that tasks do not exceed the boundaries of structured-concurrency control after refactoring.
To address these issues, this paper proposes an automated refactoring method. The method first uses a Visitor-Pattern-based analysis to locate the unstructured code to be refactored. Then, it applies precondition checks to filter the unstructured code and conducts a scope analysis, which ensures that variable scopes remain unchanged after refactoring. Finally, the refactoring is completed on the AST (abstract syntax tree) based on the results obtained. Using this method, we implemented the automated refactoring tool ReStruct as a plugin within the Eclipse JDT [9] framework. The tool was used to conduct experiments on seven real-world projects, including SystemML [10] and Light-4j [11]. The experimental results evaluate ReStruct in terms of the number of refactorings, the changed LOCs, the refactoring time, and the performance of the refactored programs. A total of 82 refactorings were performed by ReStruct, with an average time of 27.3 s per project. These results demonstrate that ReStruct can effectively refactor unstructured concurrency to structured concurrency in Java.
The remainder of this paper is organized as follows: Section 2 discusses related research work. Section 3 presents the motivation. Section 4 introduces the automated refactoring method for structured concurrency. Section 5 presents the consistency check of the program before and after refactoring. Section 6 presents the implementation of ReStruct. Section 7 discusses the experimental results. Section 8 presents threats to the validity of the experiment. Section 9 concludes the paper.
3. Motivation
This section illustrates the motivation using the method callApiAsyncMultiThread from the class Http2ClientPoolTest in the project Light-4j, as shown in Figure 1a. This method uses multiple threads to call an API asynchronously. First, it creates a thread pool with a fixed number of threads and a set of tasks (lines 2–4). Then, it submits the tasks using the method invokeAll(), which returns a collection of futures representing the results of these tasks (line 5). Through the futures, we can obtain the execution results or the exception information of the tasks. Next, it iterates through the futures and uses the method Future.get() to obtain the results of the tasks (lines 6–9). Finally, it outputs the results (line 10). This is the unstructured way of submitting concurrent tasks, and such code suffers from several problems. First, exception handling for the subtasks is not synchronized with the main thread’s capturing mechanism. When the main thread catches an exception from a subtask and terminates, it may not be able to promptly cancel or abort the remaining subtasks. Second, an interruption signal caught by the main thread is not effectively transmitted to the subtasks. Even if the main thread stops executing after encountering an interruption, the interruption signal will not be propagated to the subtasks, so subtasks that should be canceled immediately may continue executing. Finally, the main thread’s response to exceptions from subtasks may be delayed. Because Future.get is a synchronous blocking method, the main thread enters a blocked state while waiting for the result of a particular task; if another subtask encounters an exception at this time, the main thread may not be able to respond to it promptly. These issues may lead to the waste of thread resources and could even cause thread leaks.
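The pattern in Figure 1a can be sketched as a minimal, self-contained Java program. The class name and task bodies below are invented placeholders for illustration, not the actual Light-4j code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class UnstructuredDemo {
    public static List<String> callApiAsync() throws InterruptedException, ExecutionException {
        // Create a thread pool with a fixed number of threads and a set of tasks.
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                int id = i;
                tasks.add(() -> "response-" + id); // placeholder for an asynchronous API call
            }
            // Submit all tasks; invokeAll returns one Future per task.
            List<Future<String>> futures = executor.invokeAll(tasks);
            List<String> resultList = new ArrayList<>();
            // Blocking result acquisition: Future.get waits for each task in turn,
            // so an exception in a later task is only observed after earlier gets return.
            for (Future<String> f : futures) {
                resultList.add(f.get());
            }
            return resultList;
        } finally {
            executor.shutdown(); // destroy the thread pool
        }
    }
}
```

Because the futures are drained sequentially with the blocking Future.get, a failure in the last subtask cannot cancel or interrupt its siblings, which is exactly the problem described above.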
Figure 1b illustrates the result of refactoring with IntelliJ IDEA [24]. IDEA cannot directly provide this type of refactoring; however, it can provide refactoring in the form of template code. Template code means pre-editing the code to be generated and then generating it via shortcut keys at the place where it needs to be inserted. Lines 2–3 are the original code, and the code template for structured concurrency is on lines 4–10. The template has no direct association with the original code and requires developers to modify it manually to match the original code. First, during the task iteration, the generic type (T) should be replaced by the return type of the task. Because the name of the task set is unknown, a placeholder is left, and the name of the task set from the original code must be inserted at this position (line 5). Second, the code for obtaining the results of the tasks should be added after the method scope.throwIfFailed. The code for result acquisition in Figure 1a must be inserted on line 10 in Figure 1c, and its definition therefore needs to be moved forward, as shown on line 4 in Figure 1c. In general, this refactoring is a semiautomated approach; current integrated development environments do not support automatically refactoring unstructured concurrency to structured concurrency.
In contrast, Figure 1c shows the correct refactoring result. First, because the refactoring affects the scope of the variable resultList, this variable is declared on line 4. Next, a scope object representing the structured concurrency context is created; the object is automatically released once its context concludes. The strategy used by the scope object is shutdownOnFailure, indicating that the failure of any task leads to the failure of the entire task (line 5). The tasks are then submitted using the method scope.fork (lines 7–9). The method scope.join blocks the main thread until the subtasks are completed. If any subtask throws an exception, the method scope.throwIfFailed will immediately catch it and automatically cancel the execution of the remaining subtasks, effectively avoiding the risk of thread leakage (lines 10–11). Afterward, the results of the tasks are obtained with the resultNow method, which does not block and immediately retrieves the results (line 13). Finally, the results of the tasks are output. Comparing Figure 1b,c, we can see that a lot of logic in the template method provided by IDEA still needs to be supplemented manually.
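A minimal sketch of the target shape is shown below, written against the JDK 19 incubator API the paper targets (jdk.incubator.concurrent; running it requires `--add-modules jdk.incubator.concurrent` on JDK 19/20, and in later JDKs the API moved to java.util.concurrent, where fork returns a Subtask rather than a Future). The task bodies are invented placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Future;
import jdk.incubator.concurrent.StructuredTaskScope;

public class StructuredDemo {
    public static List<String> callApiStructured() throws Exception {
        // Declared outside the scope, since the try-with-resources block below
        // would otherwise shrink its scope (see the scope analysis in Section 4.4).
        List<String> resultList = new ArrayList<>();
        try (var scope = new StructuredTaskScope.ShutdownOnFailure()) {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                int id = i;
                futures.add(scope.fork(() -> "response-" + id)); // submit subtasks
            }
            scope.join();          // block until all subtasks finish or one fails
            scope.throwIfFailed(); // propagate the first failure; siblings are cancelled
            for (Future<String> f : futures) {
                resultList.add(f.resultNow()); // non-blocking: all subtasks are done here
            }
        } // the scope is closed automatically by try-with-resources
        return resultList;
    }
}
```

The try-with-resources block is the "concurrent region": no subtask can outlive it, which is what makes the concurrency structured.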
We also measured the execution times before and after the refactoring using the benchmarking tool JMH [25]. The results show that the execution time before refactoring is 4.2 s, while the execution time after refactoring is 3.8 s, a 9.5% reduction in runtime. This indicates that refactoring unstructured concurrency to structured concurrency can effectively improve program execution efficiency. However, current integrated development environments still do not provide sufficient support for this type of refactoring.
4. Design
This section introduces ReStruct, a refactoring method that can assist developers in refactoring unstructured concurrent code to structured concurrent code. The design of this method consists of four main parts: the refactor probe, the precondition checks, program analysis, and transformation.
4.1. Overview
Figure 2 presents an architectural overview of ReStruct. First, it parses the Java source code to generate an AST and traverses the AST with the Visitor Pattern to locate the unstructured code. Second, it performs precondition checks on the unstructured code to validate whether it meets the requirements for refactoring. Third, ReStruct performs a scope analysis to collect the variables whose scopes will change during refactoring. Based on these results, the refactoring is carried out on the AST. Finally, the refactored code is checked against the consistency rules.
To better describe the refactoring method, we make the following definitions, as shown in Figure 3. Here, cs represents the statement that creates a thread pool, ss represents the statement that submits tasks to the thread pool, gs represents the statement that obtains the result of the execution of a task, ds represents the statement that destroys the thread pool, and os represents the statements located between ss and gs. Among them, the scopes of the variables defined in os may change after refactoring.
Definition 1. (Method pending refactoring): In a program (P), a set (M) collects all the methods to be refactored, M = {m_1, m_2, …, m_n}. For ∀m_i ∈ M, it holds that ss ∈ m_i && gs ∈ m_i, which denotes that m_i must contain ss and gs.
Definition 2. (Tasks in method): In a program (P), all the tasks in M are defined as T, T = {t_1, t_2, …, t_n}. For ∀m_i ∈ M, t_i = {task_1, task_2, …, task_k}, which indicates that each t_i corresponds to an m_i and contains k tasks. The variables k_before and k_after represent the numbers of tasks before and after refactoring in m_i.
Definition 3. (The type of method): In a method (m_i), we classify m_i according to the relative positions of ss and gs. If there exists a try structure that acts as a parent node for both statements, we classify the type of m_i as type_in, as shown in Figure 4a. In contrast, the type is defined as type_out in Figure 4b, and this try structure is represented as twr.
4.2. Refactor Probe
The purpose of the refactor probe is to identify unstructured code segments in a Java program. First, ReStruct uses the JDT AST parser to convert the entire project to the corresponding ASTs. The AST is a tree representation of the source code and presents the structure and syntax information of the program. In the AST, each variable carries binding information, from which the specific type of a variable can be determined. Second, ReStruct utilizes the Visitor Pattern to traverse the AST and checks the type of the left-hand variable in each VariableDeclaration statement. Once a variable is found to belong to the ExecutorService interface or one of its implementation classes, the variable is recorded. Continuing the traversal, ReStruct checks whether the variable is used in any MethodInvocation statement. If so, the currently traversed method is marked and added to the set (M).
4.3. Precondition
The purpose of the precondition checks is to verify whether the methods in the set (M) meet the refactoring requirements. They focus mainly on the following two aspects:
4.3.1. Precondition for shutdownOnFailure Strategy
ShutdownOnFailure is a strategy in structured concurrency. It regards all the tasks as a whole and requires them to succeed together; when one of the tasks fails, it cancels the remaining running tasks. To make the code before refactoring comply with this principle, we need to check two aspects. First, exceptions in subtasks should be captured in a timely manner in the main thread, so that the main thread is aware of the execution status of the subtasks; once a subtask fails, the failure of the entire concurrent task can be recognized and handled. Second, the submitted subtasks should be independent of each other. This independence prevents an exception in one subtask from spreading to other subtasks through dependency relationships, thus ensuring that all the subtasks can be regarded as a whole.
Check task result acquisition. In Java’s multithreaded task return mechanism, when a task is submitted, a Future object can be used to represent the returned result of the task. Using the method Future.get, we can obtain the result of the execution of the corresponding subtask or track the exception information from the subtask. If the method Future.get is not called in the main thread, an exception occurring in a subtask cannot be captured, as shown in Figure 5. The code snippet comes from the method aggregateUnaryMatrix of the class LibMatrixAgg in the project SystemML. In this snippet, a fixed-size thread pool is created and a group of tasks is added to the task set (lines 2–10). Then, this group of tasks is submitted through the method invokeAll. Finally, the thread pool is closed (lines 11–12). However, this method does not retain the return value of invokeAll, so the main thread cannot use Future.get to obtain the exceptions or the results of the subtasks. Therefore, regardless of whether the subtasks execute successfully, the main thread will continue to execute. This clearly violates the shutdownOnFailure strategy of structured concurrency. Consequently, we need to determine whether a Future object is returned after all the subtasks are submitted, and whether the results of the tasks are obtained through Future.get in the main thread. Only then can the main thread capture exceptions in subtasks in a timely manner and terminate.
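The problem can be reproduced with a few lines of plain Java (the task bodies are invented for illustration): a subtask failure is silent until Future.get is called, and if the return value of invokeAll is discarded, the failure is never observed at all:

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class LostExceptionDemo {
    // Returns true if the subtask's exception was observed in the calling thread.
    public static boolean observesFailure(boolean callGet) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks = List.of(
                    () -> "ok",
                    () -> { throw new IllegalStateException("subtask failed"); });
            // invokeAll blocks until all tasks have completed (successfully or not).
            List<Future<String>> futures = pool.invokeAll(tasks);
            if (!callGet) {
                return false; // result of invokeAll discarded: the failure is never seen
            }
            for (Future<String> f : futures) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    return true; // the subtask's exception only surfaces through get()
                }
            }
            return false;
        } finally {
            pool.shutdown();
        }
    }
}
```

This is the condition the precondition check looks for: a Future returned from the submission and a corresponding Future.get in the main thread.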
Check task dependency. When converting from unstructured concurrency to structured concurrency, it is necessary to ensure independence between tasks. From the perspective of error propagation, independent tasks have no complicated dependencies and can directly terminate all the tasks according to the strategy ShutdownOnFailure once a certain task fails. From the aspects of task scheduling and execution, in structured concurrency, the relationship between a parent task and subtasks is usually organized into a tree structure. This tree structure allows us to decompose complex tasks into a series of smaller subtasks, forming a clear hierarchical relationship. If the subtasks are independent of each other, it avoids mutual constraints in sibling tasks, enabling a more flexible organization and execution of tasks. In this way, structured concurrency can better manage and schedule tasks, improving the maintainability and execution efficiency of the program.
Task dependency analysis aims to identify dependencies between subtasks in a multithreaded environment. In multithreaded programming, dependencies between tasks are typically manifested as control and data dependencies. Control dependency refers to the relationship imposed by the execution order among tasks. Data dependency refers to the potential competition that may arise when multiple tasks access or modify shared resources. Algorithm 1 presents the task dependency analysis, which takes the method m and the tasks in m as input. By invoking the methods hasConDep and hasDataDep, Algorithm 1 performs the control and data dependency analysis (line 2). If any dependency exists, it returns true (line 3).
Algorithm 1: Task Dependency Analysis
The method hasConDep detects control dependencies between tasks. Typically, tasks at the same level do not call one another; therefore, the control dependency check primarily determines whether one task depends on the execution results of another task. hasConDep also takes m and tasks as input parameters. First, it obtains all the Future objects and the results of these Future objects in m, recording them as a set (res) (line 6). This step ensures that all possible sources of dependencies are covered. Then, it iterates through the statements in tasks. If a statement uses a variable that belongs to res, the current task depends on the execution results of another task, and the algorithm returns true (lines 7–10). If no such variable is found, it checks the following statement. Finally, if no dependency is detected, it returns false (line 11).
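A much-simplified sketch of this check is given below, with tasks modeled as lists of used variable names rather than AST statements (the types and names here are ours, not the paper's implementation):

```java
import java.util.List;
import java.util.Set;

public class ConDepCheck {
    /**
     * Returns true if any task uses a variable that holds the result of
     * another task, i.e., a Future result recorded in res.
     */
    public static boolean hasConDep(Set<String> res, List<List<String>> tasks) {
        for (List<String> taskUsedVars : tasks) {
            for (String usedVar : taskUsedVars) {
                if (res.contains(usedVar)) {
                    return true; // the task consumes another task's result
                }
            }
        }
        return false; // no statement in any task touches a Future result
    }
}
```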
The method hasDataDep analyzes data dependencies between tasks; it determines dependency by analyzing the method (m) and the collection of tasks. First, it performs pointer analysis and returns the result of the pointer analysis (pta) (line 13). Here, pta is an instance of the interface PointsToAnalysis, which implements pointer analysis and defines a series of methods for performing it. Among them, the method reachingObject takes an object as input and returns the set of variables that reference this object. Next, a collection (set) is defined to store the pointer information of the variables for each task (line 14). Then, the algorithm iterates through each task, collecting the “value” objects and their corresponding pointer information in the task and storing them in the map (mp); here, “value” represents a general variable in a program, including constants and variables. After the task traversal completes, mp is added to set (lines 15–20). In this way, set contains the pointer information of the variables in m’s tasks. Finally, if the values of different maps in set intersect and one of them involves a write operation on its keys, true is returned (lines 21–25).
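The same idea can be sketched with the points-to sets replaced by plain maps from variable names to abstract object IDs, plus a per-task write set — a toy stand-in for a real pointer analysis, with all names invented:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DataDepCheck {
    /** A task's view: points-to info per variable, plus the variables it writes. */
    public record TaskInfo(Map<String, Set<Integer>> pointsTo, Set<String> writes) {}

    /** True if two tasks may reach the same object and at least one of them writes it. */
    public static boolean hasDataDep(List<TaskInfo> tasks) {
        for (int i = 0; i < tasks.size(); i++) {
            for (int j = i + 1; j < tasks.size(); j++) {
                if (conflict(tasks.get(i), tasks.get(j)) || conflict(tasks.get(j), tasks.get(i))) {
                    return true;
                }
            }
        }
        return false;
    }

    private static boolean conflict(TaskInfo writer, TaskInfo other) {
        for (String var : writer.writes()) {
            Set<Integer> written = writer.pointsTo().getOrDefault(var, Set.of());
            for (Set<Integer> reached : other.pointsTo().values()) {
                Set<Integer> overlap = new HashSet<>(written);
                overlap.retainAll(reached); // points-to sets intersect?
                if (!overlap.isEmpty()) {
                    return true; // shared object with at least one write: data dependency
                }
            }
        }
        return false;
    }
}
```

Two tasks that only read a shared object produce no conflict; a single writer aliasing another task's variable does, which mirrors the write-intersection test on lines 21–25.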
4.3.2. Precondition for Checking the Thread Pool
It is inadvisable to refactor methods that use a customized thread pool, for two main reasons. First, a customized thread pool has its own logic, which does not align well with the APIs of structured concurrency. For example, a customized thread pool may have unique pre-processing logic after task submission, which differs considerably from the standard task submission process of structured concurrency. Second, there are significant differences in aspects such as task state management, concurrency control mechanisms, and lifecycles, which make refactoring rather difficult. For instance, with regard to concurrency control, a customized thread pool may utilize a particular locking mechanism to guarantee mutually exclusive access to resources, whereas structured concurrency employs different methods of task isolation and dependency control; direct refactoring may cause conflicts. In our manual refactoring experiments, we found that refactoring customized thread pools frequently gives rise to unexpected errors.
Through the refactor probe, we locate the corresponding cs and check whether the type of the variable declared in cs inherits from ThreadPoolExecutor. If so, the currently traversed method is removed from M.
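This criterion amounts to a subtype test on the declared type of the pool variable. ReStruct performs the equivalent check statically on JDT type bindings; the reflective sketch below is only our illustration of the rule:

```java
import java.util.concurrent.ThreadPoolExecutor;

public class PoolCheck {
    /**
     * True if the declared type of the pool variable in cs is
     * ThreadPoolExecutor or a subclass of it (i.e., a customized pool),
     * in which case the enclosing method is removed from M.
     */
    public static boolean shouldExclude(Class<?> declaredType) {
        return ThreadPoolExecutor.class.isAssignableFrom(declaredType);
    }
}
```

Note that the test is on the declared type: a pool declared as ExecutorService and created through an Executors factory passes the precondition, while a variable declared as a ThreadPoolExecutor subclass does not.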
4.4. Program Analysis
In this section, we will focus on scope analysis. This analysis provides a critical foundation for subsequent refactoring.
Scope analysis obtains the variables whose scopes will change after refactoring. The structured concurrent code is typically contained within a try-with-resources structure. If the method is of type_out (i.e., ss and gs do not share a try structure as a common parent node) and defines variables in os, refactoring to structured concurrency will reduce the scope of these variables because of the newly added try-with-resources structure. As shown in Figure 1a,c, after refactoring, the try-with-resources structure will enclose the code from line 5 to line 9 of Figure 1a. This reduces the scope of resultList, defined on line 6 in Figure 1a. The definition of the variable resultList should therefore be moved outside of the try-with-resources structure. The purpose of scope analysis is to find such variables.
Algorithm 2 is the pseudocode for the scope analysis; its input is the method m and its output is a set of variables. LocalUses is used to check the usage of a particular variable. Specifically, a “local” object represents a local variable, and the getUsesOf method in LocalUses takes a “local” object as a parameter and yields a set of positional details on the usage of this local variable. The main steps of the algorithm are as follows:
ReStruct defines a collection (res) to collect the variables whose scopes may change after refactoring (line 1);
If the method is of type_in (i.e., ss and gs already share a try structure as a parent node), the algorithm returns res directly (lines 2–3), because methods of this type will not introduce any new try-with-resources structure after refactoring. Otherwise, it identifies the sets of statements that follow ss and gs, denoted as succ(ss) and succ(gs) (lines 4–5). Then, it calculates the difference between succ(ss) and succ(gs), which represents os (line 6);
If os is empty, it indicates that no variable’s scope will change after refactoring, and the algorithm returns res (lines 7–8);
Otherwise, it obtains localUses to acquire the location information of where each variable is used. Then, it iterates through os to check whether each statement defines a local variable. If so, it uses the method localUses.getUsesOf() to collect the locations of the statements that use the local variable. If this location information intersects with succ(gs), the scope of this local variable will change after refactoring, and the variable is added to res (lines 9–12). Finally, the algorithm returns res (line 13).
Algorithm 2: Scope Analysis
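The set manipulation in Algorithm 2 can be sketched over a toy statement model, where each statement has an index, a set of defined variables, and a set of used variables (a simplification of JDT's use-location query; all names here are ours):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ScopeAnalysisSketch {
    public record Stmt(int index, Set<String> defs, Set<String> uses) {}

    /**
     * Variables defined between ss and gs (i.e., in os) that are still used
     * after gs: their scope would shrink if os were wrapped in a new
     * try-with-resources block, so their definitions must be hoisted out.
     */
    public static Set<String> varsNeedingHoisting(List<Stmt> stmts, int ssIndex, int gsIndex) {
        Set<String> res = new HashSet<>();
        for (Stmt s : stmts) {
            boolean inOs = s.index() > ssIndex && s.index() < gsIndex;
            if (!inOs) continue;
            for (String v : s.defs()) {
                // Is v used in any statement after gs, i.e., in succ(gs)?
                for (Stmt later : stmts) {
                    if (later.index() > gsIndex && later.uses().contains(v)) {
                        res.add(v); // definition must be moved before the new twr
                    }
                }
            }
        }
        return res;
    }
}
```

With ss on line 5 and gs on line 9, a resultList defined on line 6 and printed on line 10 is exactly the case of Figure 1a.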
4.5. Transformation
This section elaborates on the refactoring transformation, which takes the method m as input and introduces structured concurrency by modifying the AST. The key steps include adjusting variable scopes based on the results of the scope analysis, inserting and deleting AST nodes in accordance with the immutability of AST nodes, introducing the structured concurrency API, and performing a consistency check to ensure the correctness of the refactoring. Here, the immutability of AST nodes means that, because the AST is a syntax-based tree structure in which each part of the code corresponds to a node, modifying the code structure requires creating a new node and copying the information of the original node to it, rather than directly modifying the parent–child relationships of the original node. Algorithm 3 presents a detailed delineation of the transformation.
- (1) ReStruct obtains a copy (m’) of the method m and uses it as the input for the subsequent consistency check, which is presented in the next section (line 1). Then, it imports the necessary structured concurrency packages and creates a scope object to represent the structured concurrency context (lines 2–3);
- (2) It obtains the result of the scope analysis and records it as set. Then, it iterates over set, redefines the variables therein, and deletes their definitions in the original program (lines 7–9);
- (3) It checks whether ss and gs have a common parent node, twr. If not, it creates a try-with-resources node as twr and inserts twr after the definitions above (lines 7–10);
- (4) It adds the scope object to the resources of twr and submits the tasks using the method scope.fork (lines 11–12);
- (5) For methods of type_out (where a new twr has been created), because of the immutability of AST nodes, it iterates over os, using the method AST.copyNode to duplicate each statement and insert the copy into twr. Finally, it removes the original node from the AST (lines 13–16);
- (6) It inserts the methods scope.join and scope.throwIfFailed and replaces Future.get with Future.resultNow (lines 17–18). Then, it tries to remove the statements cs and ds from the AST (line 19);
- (7) Finally, it performs a consistency check on the refactored method (m). If it fails the consistency check, it returns false (lines 19–21).
Algorithm 3: Transformation
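Step (5)'s copy-then-delete discipline can be illustrated with a toy tree: instead of re-parenting a node, a copy is created, inserted under the new parent, and the original is removed. This mirrors the spirit of copying AST subtrees in JDT; the Node class below is an invented stand-in, not JDT's ASTNode:

```java
import java.util.ArrayList;
import java.util.List;

public class AstMoveSketch {
    public static class Node {
        final String label;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        Node copy() { // deep copy, since a node cannot simply be re-parented
            Node c = new Node(label);
            for (Node child : children) c.children.add(child.copy());
            return c;
        }
    }

    /** Move stmt from oldParent into twr by copy + insert + delete. */
    public static void moveInto(Node oldParent, Node stmt, Node twr) {
        twr.children.add(stmt.copy());   // insert a fresh copy under the new parent
        oldParent.children.remove(stmt); // then remove the original node
    }
}
```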
5. Consistency Check
The consistency check verifies the consistency of the program before and after refactoring. FlexSync [26] is a tool that can refactor and convert among different synchronization mechanisms by applying different markers. It defines a set of consistency check rules to verify the correctness of concurrent programs before and after refactoring. Similarly to FlexSync, we design consistency check rules for ReStruct. Before presenting these rules, we first give the relevant definitions.
Definition 4. (Task dependency). The dependency relationship for task_j in m_i is defined as the set D_j. For ∀d ∈ D_j, we have d ∈ t_i, indicating that the execution of task_j may depend on multiple tasks.
Definition 5. (Resources). For each os in m_i, the definitions of the various variables, objects, and other resources contained in os are represented as the set R_os.
Definition 6. (Resource access). For each m_i ∈ M, the read and write operations contained in the statements after gs are denoted as OP. In OP, if there exists an operation (op) that accesses a certain resource (r), this is represented as op → r.
Definition 7. (Task cancellation mechanism). For m_i ∈ M, when tasks fail, the set of tasks that are canceled at that time is defined as the set C ⊆ t_i. A failure may cause multiple tasks to be canceled; however, the number of canceled tasks must be less than the total number of tasks in m_i minus one.
Based on the definitions provided above, the following consistency check rules for refactoring are presented:
Rule 1: Before refactoring, ∀m_i ∈ M, with t_i ∈ T, it follows that |t_i| = k_before. After refactoring, for the same m_i, it remains that k_after = k_before.
This rule indicates that the number of tasks will not change after refactoring, thus ensuring the integrity of the tasks.
Rule 2: Before refactoring, for t_i in m_i, for all task_j ∈ t_i, we have D_j = ∅. After refactoring, in this m_i, for ∀task_j ∈ t_i, we still have D_j = ∅.
This rule indicates that if there is no dependency relationship among the tasks before refactoring, then there will still be no dependency relationship among the tasks after refactoring.
Rule 3: Before refactoring, for ∀m_i ∈ M, it is the case that C = ∅. After refactoring, for the same m_i, we have |C| = r, where r is the number of tasks that have not yet been completed at that time.
Rule 3 ensures that code that cancels specific tasks will not be refactored to structured concurrency. The reason is that the shutdownOnFailure strategy does not allow the selective cancellation of individual tasks. In essence, this rule safeguards the integrity of the task cancellation mechanism during the refactoring process.
Rule 4: Before refactoring, for ∀m_i ∈ M, ∀r ∈ R_os, ∃op ∈ OP such that op → r. After refactoring, for the same r, OP still contains an op such that op → r.
Rule 4 ensures that the resources in os that are accessible by a specific read/write operation (op) before refactoring remain accessible by the same operation after refactoring.