A Combinatorial Strategy for API Completion: Deep Learning and Heuristics

Liu, Yi; Yin, Yiming; Deng, Jia; Li, Weimin; Peng, Zhichao

doi:10.3390/electronics13183669

by Yi Liu ¹,²,*, Yiming Yin ¹, Jia Deng ¹, Weimin Li ² and Zhichao Peng ²

¹ School of Computer Science and Engineering, Hunan University of Science and Technology, Xiangtan 411201, China

² School of Information, Hunan University of Humanities, Science and Technology, Loudi 417000, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3669; https://doi.org/10.3390/electronics13183669

Submission received: 18 August 2024 / Revised: 9 September 2024 / Accepted: 10 September 2024 / Published: 15 September 2024

Abstract:

Remembering software library components and mastering their application programming interfaces (APIs) is a daunting task for programmers, due to the sheer volume of available libraries. API completion tools, which predict subsequent APIs based on code context, are essential for improving development efficiency. Existing API completion techniques, however, face specific weaknesses that limit their performance. Pattern-based code completion methods that rely on statistical information excel in extracting common usage patterns of API sequences. However, they often struggle to capture the semantics of the surrounding code. In contrast, deep-learning-based approaches excel in understanding the semantics of the code but may miss certain common usages that can be easily identified by pattern-based methods. Our insight into overcoming these challenges is based on the complementarity between these two types of approaches. This paper proposes a combinatorial method of API completion that aims to exploit the strengths of both pattern-based and deep-learning-based approaches. The basic idea is to utilize a confidence-based selector to determine which type of approach should be utilized to generate predictions. Pattern-based approaches will only be applied if the frequency of a particular pattern exceeds a pre-defined threshold, while in other cases, deep learning models will be utilized to generate the API completion results. The results showed that our approach dramatically improved the accuracy and mean reciprocal rank (MRR) in large-scale experiments, highlighting its utility.

Keywords: API completion; transformer; n-gram; combinatorial strategy; deep learning

1. Introduction

The modern software industry increasingly relies on third-party libraries provided by companies or open-source organizations, and programmers frequently need to interact with the APIs of these libraries to accomplish daily tasks. However, most API libraries are complex and difficult to use, leaving developers with the daunting task of finding APIs in a vast ocean of types [1]. Therefore, becoming acquainted with all available APIs is nearly impossible, and even experienced programmers encounter numerous issues when using unfamiliar API libraries, not to mention novice developers. Moreover, the API libraries provided by different companies and organizations follow various styles, and there may be complex dependencies between APIs, which adds to the difficulty of API usage. In response to these challenges, developers need an effective way to better manage complex API libraries. Consequently, API completion technology has emerged. It facilitates the software development process by providing developers with contextually relevant and timely API suggestions, which enhance development efficiency and improve code quality.

Existing API completion technologies fall primarily into two categories: pattern-based and learning-based approaches.

Pattern-based methods view API completion as a recommendation task, employing traditional collaborative filtering (user-item) techniques [2,3]. In these approaches, the internal context is treated as the user, and the APIs as items. Similarities between users are calculated to identify and recommend the most similar APIs. There have been a number of studies that captured API usage patterns by mining association rules from large corpora, which is a complicated computational process [4,5,6,7]. Additionally, some tools augment existing API recommendation methods by using heuristic parsers to mine API usage patterns [8]. These tools address API recommendation challenges in common usage patterns. However, these approaches do not try to understand the semantics of the code, and thus fail to consider the relationship between APIs and other tokens in the code.

Learning-based approaches rely on the naturalness of the software [9], deploying machine learning or deep learning methods on the code. Unlike pattern-based approaches, learning-based approaches consider the relationship between APIs and other code tokens. They frame API completion as a next-token prediction problem and introduce numerous statistical language models to address it [10,11,12]. Additionally, recent research has attempted to utilize syntax and data flow information for more accurate prediction using deep learning models [13,14]. Overall, deep-learning-based completion techniques have shown promising results in API recommendation, with the Transformer model standing out for its superior performance on real datasets [1,15]. However, deep learning models are non-deterministic black boxes with poor explainability, so they may produce inaccurate results without any obvious cause. In particular, deep-learning-based approaches often miss common API usage patterns that can be easily identified by deterministic pattern-based approaches [1,8].

To address the aforementioned limitations, our observation is that the key to improving API completion lies in a hybrid approach that leverages the robust pattern recognition capabilities of pattern-based approaches and the semantic understanding strengths of deep learning models. While pattern-based approaches excel at identifying commonly used API sequences based on historical data, they often overlook the nuanced contextual clues that deep learning methods excel at capturing. Conversely, deep learning models, particularly those based on the Transformer architecture, provide a rich understanding of code semantics and can predict less frequent but contextually appropriate APIs that pattern-based methods might miss.

Based on this finding, we propose a combinatorial strategy that combines deep learning and heuristics for API completion (DLH-API). It skillfully combines the advantages of both approaches. Specifically, we utilize advanced Transformer models to perform deep learning on large-scale code datasets, extracting code context features to effectively predict APIs. We also adopt a Markov-based N-gram model to analyze the co-occurrence relationships among different APIs in the code, and predict APIs based on their frequency of occurrence. At the core of our approach is a confidence-based selector, which determines which type of approach should be utilized to generate predictions. The N-gram model will only be adopted if the frequency of a particular API sequence pattern exceeds a pre-defined threshold, as that indicates such a pattern is rather common in the real-world projects. In other cases, the Transformer model will be relied on to generate the API completions, since this deep learning technique generally achieves better prediction accuracy.

To evaluate the performance of DLH-API, we conducted experiments on four publicly available Python datasets, using accuracy and mean reciprocal rank (MRR) as evaluation metrics. The experimental results demonstrated that our combinatorial strategy achieved the best results across all datasets, compared to either an individual learning-based or pattern-based approach. For instance, the accuracy of DLH-API on the DL dataset reached 34.16%, which was higher than the accuracies of 32.85% and 15.06% for the standalone learning-based and pattern-based methods.

Our main contributions are summarized as follows:

We identified the complementarity between learning-based and pattern-based methods, and proposed a new API completion strategy that effectively integrates both approaches. This addresses the challenge that a single method struggles to simultaneously capture token-level details and specific API usage patterns.
We performed extensive experiments to evaluate our approach on four publicly available Python benchmark datasets. The experimental results demonstrated significant improvements over existing technologies.

2. Background

In this section, we will introduce the background knowledge used in this paper, including pattern-based and learning-based API completion methods.

2.1. Pattern-Based API Completion Methods

Pattern-based API completion methods work by capturing API usage patterns from large code repositories and using the similarity of these patterns for matching and completion. An API usage pattern refers to the sequence of API calls required to perform a specific function, and accurately capturing these patterns is crucial. Zhong et al. [16] first proposed a classical API usage pattern mining algorithm MAPO (mining API usages from open-source repositories). The method first clusters API call sequences and uses the SPAM algorithm [17] to mine frequent API call sequences in each clustered subset. Then, recommendations are made based on the similarity of API usage patterns. To reduce the redundancy in recommendation results, Wang et al. [18] proposed a new method called UP-Miner based on MAPO. This method employs a probabilistic model to represent the call sequences, which increases the simplicity and coverage of the API usage patterns. On the other hand, Raychev et al. [19] viewed the API completion task as a prediction problem for sentence probability. They extracted API call sequences from large codebases and used a bigram model to create the simplest forms of API usage patterns. Then, by analyzing the frequency of these patterns, generating candidate values, and finally using recurrent neural networks (RNN) to create probability distributions, this method achieves rapid and effective API completion. This approach opened a new way to predict APIs by combining pattern-based and learning-based methods.

The n-gram model [20] is one of the most commonly used models for modeling both natural language and programming languages. Its principle is based on a simple statistical assumption: the occurrence of a word depends only on the preceding $n - 1$ words. This assumption is often referred to as the Markov property, i.e., future states depend only on a finite history of preceding states. In API call sequences, this model is employed to define the usage pattern of APIs, where a sequence of n APIs occurs with a specific frequency. Thus, the probability of the next API’s occurrence is conditioned on the preceding $n - 1$ APIs, and is calculated as follows:

p (a_{l} ∣ a_{1} a_{2} \dots a_{l - 1}) = p (a_{l} ∣ a_{l - n + 1} \dots a_{l - 1})

(1)

where l represents the length of the API call sequence, $a_{i}$ denotes the ith API in the sequence, and $p (\cdot)$ denotes the probability of the occurrence of an API.

Additionally, according to the definition of conditional probability, the probability of an API call sequence $a_{1}, a_{2}, \dots, a_{k}$ can be calculated using the following formula:

p (a_{1} a_{2} \dots a_{k}) = p (a_{1}) p (a_{2} | a_{1}) \dots p (a_{k} | a_{1} a_{2} \dots a_{k - 1})

(2)

Based on Equations (1) and (2), if a 3-gram model is selected, each conditional term keeps only the two preceding APIs, and the probability of a given API call sequence can be approximated as follows:

p (a_{1} a_{2} \dots a_{k}) \approx p (a_{1}) p (a_{2} | a_{1}) p (a_{3} | a_{1} a_{2}) \dots p (a_{k} | a_{k - 2} a_{k - 1})

(3)

The n-gram model is a fundamental approach to capturing API usage patterns by extracting sequences of API method calls from software projects. This approach effectively identifies frequent sequences that predict subsequent API calls, providing a straightforward and practical way to predict API usage. However, pattern-based API completion methods mainly create the usage context by extracting API method call sequences from software projects. They lack a deep understanding of the relationships between APIs and other tokens in the code, which leads to weak context matching.
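As a rough illustration of Equations (1) to (3), the following minimal Python sketch estimates the probability of an API call sequence under a 3-gram approximation. The function names (`count_ngrams`, `conditional`, `sequence_probability`) and the toy corpus are hypothetical and are not taken from the paper. The sketch uses plain maximum-likelihood counts without smoothing, which is an assumption made for brevity.

```python
from collections import Counter

def count_ngrams(sequences, max_n=3):
    """Count every n-gram of length 1..max_n; the empty tuple stores the token total."""
    counts = Counter()
    for seq in sequences:
        counts[()] += len(seq)
        for n in range(1, max_n + 1):
            for i in range(len(seq) - n + 1):
                counts[tuple(seq[i:i + n])] += 1
    return counts

def conditional(counts, history, api):
    """Unsmoothed estimate of p(api | history) as count(history + api) / count(history)."""
    history = tuple(history)
    if counts[history] == 0:
        return 0.0
    return counts[history + (api,)] / counts[history]

def sequence_probability(counts, seq, n=3):
    """Chain rule (Eq. 2) with each history truncated to the last n - 1 APIs (Eq. 1)."""
    prob = 1.0
    for i, api in enumerate(seq):
        history = seq[max(0, i - (n - 1)):i]
        prob *= conditional(counts, history, api)
    return prob

# Toy corpus of API call sequences (illustrative only).
corpus = [
    ["open", "read", "close"],
    ["open", "read", "close"],
    ["open", "write", "close"],
]
counts = count_ngrams(corpus)
prob = sequence_probability(counts, ["open", "read", "close"])
```

On this toy corpus the three factors are p(open) = 3/9, p(read | open) = 2/3, and p(close | open read) = 2/2, so `prob` evaluates to 2/9.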

2.2. Learning-Based API Completion Methods

Learning-based API completion methods apply machine learning or deep learning techniques to large-scale corpora. They automatically learn the relationships between APIs and other tokens in the code to complete APIs. These methods not only understand the semantics of the code, but also recommend APIs based on the learned semantic information.

Early learning-based API completion methods relied on simple statistical models that recommended APIs by analyzing the co-occurrence frequencies of codes [21,22]. With the development of deep learning techniques, deep-learning-based API-completion methods have gradually gained attention. These approaches do not just consider the co-occurrence relations between APIs but treat APIs as individual tokens within the code and transform the API recommendation task into the problem of predicting the next token. Gu et al. [23] first demonstrated the effectiveness of deep learning in the field of API prediction. Their DeepAPI method, which used an RNN encoder–decoder architecture, effectively predicted API calls. Additionally, the introduction of LSTMs addressed the issue of vanishing gradients in analyzing complex code structures, resulting in significant improvements in API completion [24,25]. Many studies have applied these deep learning models to API completion, with notable success. Recently, with the introduction of the Transformer model, the learning-based approach to API completion has stepped into a new phase. By handling longer code sequences and capturing more complex dependencies, the Transformer model has further improved the relevance and accuracy of API completion [15,26], marking significant progress in understanding program semantics and structure.

The Transformer model abandons the traditional recurrent structure and adopts a self-attention mechanism to process sequence data in parallel. The structure is illustrated in Figure 1. In the self-attention layer, the embedding of each token (a row of the input matrix E) is multiplied by three trainable weight matrices, $W_{q}$, $W_{k}$, and $W_{v}$, to obtain three vectors of the same length, Q, K, and V, respectively. Then, the similarity scores of Q and K are computed and used to weight V, yielding the vector representation of each token. The specific calculation process is as follows:

Q = E W_{q}, K = E W_{k}, V = E W_{v}

(4)

Attn (Q, K, V) = softmax (\frac{Q K^{T}}{\sqrt{d_{k}}}) V

(5)

At the same time, the Transformer model enhances the performance of API completion tasks through multi-head self-attention and masked self-attention. Multi-head self-attention captures features from different perspectives within the sequence, while masked self-attention ensures that each position in the sequence cannot “see” the tokens that follow it. In API completion, the Transformer uses the self-attention mechanism to capture the dependencies between any two tokens in the code, even if they are far apart in the sequence. This capability allows each token to interact with the other visible tokens in the sequence, effectively improving the model’s understanding of the context.
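The computation in Equations (4) and (5), together with the masking described above, can be sketched in a few lines of dependency-free Python. This is a single-head illustration under stated assumptions: the helper names (`matmul`, `softmax`, `causal_self_attention`), the toy embeddings, and the identity weight matrices are hypothetical and only serve to make the masking visible.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(E, Wq, Wk, Wv):
    """Single-head masked self-attention: softmax(QK^T / sqrt(d_k)) V."""
    Q, K, V = matmul(E, Wq), matmul(E, Wk), matmul(E, Wv)
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)
    weights = []
    for i, row in enumerate(scores):
        # Position i may attend only to positions j <= i; later ones get -inf.
        masked = [s / math.sqrt(d_k) if j <= i else float("-inf")
                  for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, V), weights

# Three toy token embeddings of dimension 2, with identity projections.
E = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
output, weights = causal_self_attention(E, identity, identity, identity)
```

Each row of `weights` sums to 1, and every entry above the diagonal is 0, so the first token attends only to itself.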

3. Motivation Example

3.1. Example 1

Deep-learning-based approaches, especially the Transformer model, are able to capture and process complex code context dependencies and outperform n-gram models based on API call patterns in API completion tasks. As shown in the code snippet in Figure 2, the “with tf.?” implies that some kind of context management operation needs to be performed in TensorFlow 2.16.1.

The model analyzes the current context, including operations such as fetching tensor shapes, comparing elements, and calculating total elements. At the same time, the model learns from the training data that tf.control_dependencies() is typically used to ensure that particular operations are executed before further actions, to enforce execution order. As a result, despite the rarity of this API call pattern in training data, the Transformer model can accurately predict the use of tf.control_dependencies() by inferring the order of operations and dependencies from the context, demonstrating its strong code comprehension capabilities.

In contrast, pattern-based methods rely on frequent call patterns. If the training data lack sufficient instances of tf.control_dependencies() calls, these methods may fail to predict correctly. However, the Transformer model can overcome this limitation by understanding the logical relationships, allowing it to make accurate predictions.

3.2. Example 2

Deep-learning-based API completion methods often outperform traditional methods in handling Python 3.9 code [13,27,28]. However, there are some limitations, such as the inability to complete APIs under certain specific patterns. We demonstrate these limitations with an example of the completion results of the Transformer model, which is now the state-of-the-art model, as shown in Figure 3:

In the code described above, after “model.”, the Transformer incorrectly completed this with the API add(). In fact, the correct completion result is the API summary(), which means that after the statement model.add(Dense(action_size(), activation=‘softmax’)), the model does not continue to add other new layers, but rather completes the definition of the model’s architecture, and only needs to call the API summary() to print the model. The possible reason for this error is that the performance of the Transformer model highly depends on the quality and coverage of the training data. In the context where completion was needed, the model had already called the API add() twice consecutively to add new layers, so it “naturally” assumed that another layer would follow. This suggests that there were few examples in the training data with a similar context, leading to the Transformer model possibly not learning the correct API call sequence. However, in the pattern-based n-gram model, since the usage pattern add()→dense()→action_size()→summary() was counted in the training set, and the API call sequence add()→dense()→action_size() was recognized in the test context, it might correctly predict the next API as summary().

4. Method

4.1. Overall Architecture

In this paper, we propose DLH-API by integrating the current state-of-the-art deep learning model Transformer with a pattern-based n-gram model. The strategy leverages the efficient code comprehension capability of Transformers to fully capture the details and contextual information at the code token level, and performs API prediction through representation learning on large-scale datasets. However, for API usage patterns that cannot be addressed at such a fine-grained token level [8], we utilize a confidence-based n-gram model for prediction. The confidence thresholds are set using heuristic methods to enhance the accuracy of the predictions. The system looks up API call sequence prefixes in a statistics database and selects the most frequent API as the prediction when the call frequency exceeds an experimentally determined threshold, thus ensuring reliable and precise model performance.

The framework of DLH-API is shown in Figure 4. It includes three main components: an autoregressive Transformer decoder model, the n-gram model, and the heuristic confidence-driven combinatorial strategy module:

(1): Autoregressive Transformer Decoder Model: This model converts the abstract syntax tree (AST) sequences of source code, including generated sequences, into vector representations. Utilizing multi-head self-attention layers, it identifies dependencies within the input sequence to create context vectors. These vectors are nonlinearly processed by a feed-forward neural network to generate predictions for the current time step.
(2): N-gram Model: First, the API call sequences are extracted from the AST sequences of the source code, which are used as the input of the n-gram model. Then, the model extracts all n-grams (consecutive n tokens) from these API call sequences and calculates their frequency. Using these statistics, it computes the probability $p (a | t_{1} t_{2} t_{3})$ of the next API based on the previous $n - 1$ tokens, selecting the API with the highest probability as the prediction result.
(3): Heuristic confidence-driven combinatorial strategy module: Confidence levels are set using a heuristic method. When the confidence level of a predicted API exceeds a certain threshold, predictions are made using the n-gram model. Conversely, if the confidence level of a predicted API falls below the threshold, predictions are made using the autoregressive Transformer decoder model.

4.2. Prediction Based on the Autoregressive Transformer Model

To improve the accuracy of API predictions, several alternative approaches have been considered. For example, augmenting RNNs with attention [29] has been suggested to compensate for the loss of signal relevance over long distances. The Transformer model is better equipped to handle long-range dependencies and has achieved excellent results in various NLP tasks [30]. In API completion tasks, the Transformer model has also demonstrated superior performance compared to other models. Therefore, for deep-learning-based methods, we decided to continue using the state-of-the-art Transformer model [1,15,26].

First, the program is preprocessed. To eliminate the interference of code comments in the program analysis, we remove all comments from the source code. Next, we replace all string constants with the “STRING_LITERAL” token to standardize processing and reduce variable complexity. This helps us focus on the structure of the code rather than specific data content. Finally, the code snippets are converted into token sequences in preparation for the input model.
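The three preprocessing steps above can be sketched with Python's standard `tokenize` module. This is only an illustration: the paper does not state which tokenizer was used, and the function name `preprocess` and the decision to drop layout tokens (newlines, indents, dedents) are assumptions.

```python
import io
import tokenize

# Layout tokens are dropped in this sketch; whether the original pipeline
# keeps them is not stated in the text, so this is an assumption.
LAYOUT_TOKENS = (tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                 tokenize.DEDENT, tokenize.ENDMARKER)

def preprocess(source):
    """Remove comments, replace string constants, and return a token sequence."""
    tokens = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.COMMENT or tok.type in LAYOUT_TOKENS:
            continue
        if tok.type == tokenize.STRING:
            tokens.append("STRING_LITERAL")
        else:
            tokens.append(tok.string)
    return tokens

source = 'x = open("data.txt")  # load file\nprint(x.read())\n'
tokens = preprocess(source)
```

For the two-line snippet above, the comment disappears and the file name becomes `STRING_LITERAL`, while identifiers and operators are kept as individual tokens.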

Next, the context is encoded. For the preprocessed token sequences, we use an autoregressive Transformer decoder model with multi-head self-attention to encode the context [31]. This model, through its multi-head self-attention mechanism, focuses on current and previous positions within the sequence, enhancing the model’s ability to capture long-distance dependencies. Formally, if $h (n)$ denotes the output of the nth layer of the Transformer block, then at layer $n + 1$, $h (n)$ serves as the input vector. It is processed through the multi-head self-attention mechanism and a feed-forward neural network. The specific formulas are as follows:

z (n + 1) = LNorm (h (n)) + MHead (h (n))

(6)

h (n + 1) = LNorm (z (n + 1)) + FFN (z (n + 1))

(7)

where LNorm(·) denotes normalization processing, MHead(·) represents the multi-head attention mechanism, and FFN(·) refers to the application of a feed-forward neural network.

Next, $h (n + 1)$ serves as the input to the subsequent Transformer block and is passed on to the next layer. The input is processed through 6 such Transformer blocks in this way, which ensures effective transfer of information from layer to layer and helps mitigate the problem of vanishing gradients. This enables the model to effectively learn the features of deep networks and finally obtain an encoded representation of the context.

In the prediction phase, the autoregressive Transformer decoder model calculates the probability distribution of the next API using the following formula:

p (a_{n + 1} | t_{1}, t_{2}, \dots, t_{n}) = softmax (W \cdot h^{(L)} + b)

(8)

where $a_{n + 1}$ is the (n + 1)-th token, which is an API; $h^{(L)}$ is the output of the last layer L of the Transformer blocks; and W and b are trainable model parameters.

4.3. Prediction Based on the n-Gram Model

Firstly, we extract all relevant API call sequences from the project’s code repository on a per-Python file basis. In this paper, following established practices [23], we convert the source code into an abstract syntax tree (AST), identify the API nodes within it, and then retain only the API call nodes and control nodes (such as keywords like ‘for’, ‘if’, ‘return’, etc.) from the token sequence of the source code. This process results in a complete API call sequence for each file, as shown in Figure 5:
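A minimal sketch of this extraction step, using Python's standard `ast` module, is given below. It is an approximation under stated assumptions: the class name `ApiSequenceExtractor` is hypothetical, every call expression is treated as an API call, and only `for`, `while`, `if`, and `return` are recorded as control nodes. The actual filtering rules used in the paper may differ.

```python
import ast

# Control nodes kept in the sequence (a subset chosen for illustration).
CONTROL_NODES = {ast.For: "for", ast.While: "while", ast.If: "if", ast.Return: "return"}

class ApiSequenceExtractor(ast.NodeVisitor):
    """Collect call names and control keywords in AST traversal order."""

    def __init__(self):
        self.sequence = []

    def visit_Call(self, node):
        func = node.func
        if isinstance(func, ast.Attribute):
            self.sequence.append(func.attr + "()")   # e.g. model.add(...) -> add()
        elif isinstance(func, ast.Name):
            self.sequence.append(func.id + "()")     # e.g. Dense(...) -> Dense()
        self.generic_visit(node)                     # descend into nested calls

    def generic_visit(self, node):
        keyword = CONTROL_NODES.get(type(node))
        if keyword:
            self.sequence.append(keyword)
        super().generic_visit(node)

source = """
model = Sequential()
model.add(Dense(24))
if verbose:
    model.summary()
"""
extractor = ApiSequenceExtractor()
extractor.visit(ast.parse(source))
api_sequence = extractor.sequence
```

For this snippet the resulting sequence is `Sequential()`, `add()`, `Dense()`, `if`, `summary()`: the outer call is recorded before the calls nested in its arguments.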

Secondly, we utilize a Markov-chain-based n-gram model to analyze API co-occurrences, mine frequent sequences, and identify common API calling patterns, thereby creating an API statistical database. A Markov chain is a mathematical model which assumes that, when the number of code tokens is large, the probability of a future code token appearing depends only on the immediately preceding tokens and not on any others. The n-gram is a commonly used model to capture the co-occurrence relationships between APIs, predicting the nth API based on the previous $n - 1$ tokens. Due to the repetitive nature of code, code segments that implement specific functionality are usually realized by calling certain API sequences according to certain rules. For example, data request and processing immediately follow user authentication, and automatic login occurs right after user registration. By capturing the statistical dependencies between API calls, we can effectively identify the patterns of API usage in the actual development process. These patterns often represent optimal or common practices, reflecting specific functional implementation strategies. Therefore, Markov-chain-based n-gram models can be used not only for identifying patterns of API calls, but also for helping programmers to predict the APIs to be called. We calculate the co-occurrence probabilities from the API call sequences to form an API statistical database, which records the frequency of a certain API appearing after $n - 1$ common API call items. For instance, the n-gram item (net(),model(),net()):{count_params():124,model():54,…} indicates that after the API sequence net()→model()→net(), the API count_params() appears 124 times and model() 54 times. We call (net(),model(),net()) the API call sequence prefix and use it to predict the probability of subsequent API calls.
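The statistical database described above maps each prefix of $n - 1$ API calls to the frequencies of the APIs that follow it. A minimal sketch of how such a structure could be built is shown below. The function name `build_api_database` and the toy sequences are hypothetical, and the counts are far smaller than those in the paper's example.

```python
from collections import Counter, defaultdict

def build_api_database(sequences, n=4):
    """Map each (n - 1)-item prefix to a Counter of the APIs that follow it."""
    database = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            prefix = tuple(seq[i:i + n - 1])
            next_api = seq[i + n - 1]
            database[prefix][next_api] += 1
    return database

# Toy API call sequences, one per file (illustrative only).
sequences = [
    ["net()", "model()", "net()", "count_params()"],
    ["net()", "model()", "net()", "count_params()"],
    ["net()", "model()", "net()", "count_params()"],
    ["net()", "model()", "net()", "model()"],
]
api_database = build_api_database(sequences, n=4)
entry = api_database[("net()", "model()", "net()")]
```

Here `entry` holds `count_params()` with a count of 3 and `model()` with a count of 1, mirroring the layout of the item (net(),model(),net()):{count_params():124,model():54,…}.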

4.4. Heuristic Confidence-Driven Combinatorial Strategy for API Completion

The autoregressive Transformer decoder model excels in the API-completion task [1] by analyzing a vast amount of actual API calls in code repositories, which can sensitively capture the details of tokens and complex API usage contexts. This fine-grained analysis of code features is incomparable to pattern-based API completion approaches, as shown in Section 3.1. However, despite the advantages of the Transformer model in handling complex situations, it may not always provide accurate predictions for certain specific API usage patterns. In these scenarios, pattern-based methods might offer more precise predictions, as illustrated in Section 3.2.

To compensate for these deficiencies, we employ two methods in a heuristic confidence-driven combinatorial strategy module for API prediction: one is a learning-based token-level autoregressive Transformer decoder model, and the other is a pattern-based n-gram model for statistical analysis of API call sequences. By default, prediction uses the autoregressive Transformer model; for the specific API call sequences on which it is ineffective, the n-gram model is used instead. In practical applications, the system first checks whether the prefix of the API call sequence to be predicted already exists in the API statistical database. If it exists and the frequency of API calls exceeds a set threshold, the API with the highest occurrence frequency is chosen as the prediction result. For example, if the prefix of the API call sequence to be completed is net()→model()→net(), and there is a matching entry in the API statistical database, (net(),model(),net()):{count_params():124,model():54,…}, with a threshold set at 10, then since the frequency of count_params() is 124, far exceeding the threshold, the system will select count_params() as the prediction result. If there is no matching prefix in the statistical database, or the frequency does not exceed the threshold, the system defaults to using the autoregressive Transformer model for prediction. The specific combinatorial strategy is illustrated in Figure 6.
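The selection logic can be sketched as follows. This is a hedged illustration, not the authors' implementation: `select_prediction` and `stub_transformer` are hypothetical names, the Transformer is replaced by a stub that always returns `add()`, and the threshold is treated as an absolute count of 10, as in the worked example in the text. Section 5.4 reports a relative-frequency confidence level instead, which would require normalizing the counts.

```python
def select_prediction(prefix, database, transformer_predict, threshold=10):
    """Return (predicted_api, source) for one completion request."""
    candidates = database.get(tuple(prefix), {})
    if candidates:
        best_api, frequency = max(candidates.items(), key=lambda item: item[1])
        if frequency > threshold:
            return best_api, "n-gram"
    # No matching prefix, or the pattern is too rare: fall back to the model.
    return transformer_predict(prefix), "transformer"

api_database = {
    ("net()", "model()", "net()"): {"count_params()": 124, "model()": 54},
    ("add()", "add()", "add()"): {"summary()": 3},
}

def stub_transformer(prefix):
    """Hypothetical stand-in for the trained Transformer decoder."""
    return "add()"

frequent = select_prediction(["net()", "model()", "net()"], api_database, stub_transformer)
rare = select_prediction(["add()", "add()", "add()"], api_database, stub_transformer)
unseen = select_prediction(["open()", "read()", "close()"], api_database, stub_transformer)
```

With these inputs, `frequent` is answered by the n-gram database with `count_params()`, while `rare` (frequency 3, below the threshold) and `unseen` (no matching prefix) both fall back to the stubbed Transformer.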

This hybrid strategy integrates the advantages of deep learning and pattern analysis to form a working model with complementary strengths. On one hand, it fully leverages the exceptional capability of deep learning models to handle large datasets, employing the Transformer model for deep token-level understanding and context mining of code. This approach demonstrates high learning efficiency and predictive accuracy in broadly applicable API scenarios. On the other hand, to address specific shortcomings that deep learning might face in certain situations, the strategy incorporates pattern analysis methods. This method parses the source code to identify API usage patterns, recognizing and supplementing API call information that the deep learning model might miss, effectively broadening the application boundary. As a result, this dual-track hybrid API completion strategy ensures that the accuracy of API prediction is effectively improved when it comes to widely diverse API usage scenarios.

5. Evaluation

5.1. Dataset

We used four publicly available Python datasets [1] for our method evaluation, which were “ML”, “Security”, “Web”, and “DL” (https://zenodo.org/records/5797297#.YcU6gS21H8A, accessed on 10 June 2024). Each dataset includes 500 repositories from GitHub (https://github.com, accessed on 28 October 2022) with the highest star ratings and 500 with the most forks. Repositories with fewer than 10 files, less than 1000 lines of code, or where Python accounts for less than 10% of the code were filtered out with the cloc tool [32]. The detailed statistics of the datasets are displayed in Table 1.

5.2. Evaluation Metrics

We evaluated the performance of the model using two metrics: accuracy and mean reciprocal rank (MRR). Accuracy is the proportion of the number of correctly predicted APIs to the total number of APIs, which is calculated as follows:

Accuracy = \frac{\mathit{predictions}_{True}}{\mathit{predictions}_{All}}

(9)

where $\mathit{predictions}_{True}$ denotes the number of correctly predicted APIs and $\mathit{predictions}_{All}$ denotes the total number of API predictions.

Additionally, mean reciprocal rank (MRR) is a metric that considers the position of a correct prediction within the list of predicted candidates. It measures a model’s performance by calculating the average of the reciprocals of the ranking positions of the first correct prediction across all prediction instances. A higher rank of the correct prediction results in a higher reciprocal value, indicating the better prediction quality of the model. The formula for calculating MRR is as follows:

MRR = \frac{1}{n} \sum_{i = 1}^{n} \frac{1}{\mathit{rank}_{i}}

(10)

where n denotes the number of prediction instances, and $\mathit{rank}_{i}$ denotes the position of the correct API in the completion list for the ith instance.
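Both metrics in Equations (9) and (10) can be computed from ranked candidate lists, as in the sketch below. The function names and the toy predictions are hypothetical. Two conventions are assumed here: accuracy is measured on the top-ranked candidate, and an instance whose correct API is missing from the list contributes 0 to the MRR.

```python
def accuracy(ranked_predictions, targets):
    """Fraction of instances whose top-ranked candidate is the correct API."""
    correct = sum(1 for ranked, target in zip(ranked_predictions, targets)
                  if ranked and ranked[0] == target)
    return correct / len(targets)

def mean_reciprocal_rank(ranked_predictions, targets):
    """Average of 1 / rank of the correct API; 0 when it is not in the list."""
    total = 0.0
    for ranked, target in zip(ranked_predictions, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)

# Three toy completion requests with two candidates each (illustrative only).
ranked_predictions = [
    ["summary()", "add()"],      # correct API at rank 1
    ["add()", "compile()"],      # correct API at rank 2
    ["fit()", "predict()"],      # correct API missing
]
targets = ["summary()", "compile()", "evaluate()"]

acc = accuracy(ranked_predictions, targets)
mrr = mean_reciprocal_rank(ranked_predictions, targets)
```

For these three requests the accuracy is 1/3 and the MRR is (1 + 1/2 + 0) / 3 = 0.5.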

5.3. Experimental Setup

In our experiments, we configured the autoregressive Transformer model with the following parameters: the embedding layer dimension was set to 300, the model architecture included 6 layers, and the maximum length of the input sequences was limited to 600, with each hidden layer unit also having a dimension of 300. In the multi-head attention mechanism, we set up 6 heads, each with a dimension of 60. For long sequences (i.e., files longer than 1000 tokens), we followed [33] and used a sliding window to divide long sequences into shorter segments, which preserved some of the previous contextual information. Additionally, to simplify the model’s vocabulary and improve the processing efficiency, all string constants in the code were replaced with the “STRING_LITERAL” token, and unknown vocabulary was marked as “UNK”. This experiment used a cross-entropy loss function and combined mini-batch SGD with an Adam optimizer for model training, with the initial learning rate set at 0.001. When building the API statistical database based on the n-gram model, we set n to four different values: 3, 4, 5, and 6. All experiments were performed on a Linux machine equipped with an Nvidia RTX 2080 Ti GPU with 11 GB of video memory.
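The sliding-window splitting mentioned above can be illustrated with the following sketch. The exact window size and stride used in the experiments are not given in the text, so the defaults below (`window=1000`, `stride=500`) and the function name `sliding_windows` are assumptions chosen only to show how overlapping segments keep part of the earlier context.

```python
def sliding_windows(tokens, window=1000, stride=500):
    """Split a long token sequence into overlapping segments of at most `window` tokens."""
    if len(tokens) <= window:
        return [tokens]
    segments = []
    start = 0
    while True:
        segments.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # the last segment reached the end of the file
        start += stride
    return segments

# Small numbers make the overlap easy to see.
segments = sliding_windows(list(range(10)), window=4, stride=2)
```

With a window of 4 and a stride of 2, consecutive segments share two tokens, so each segment after the first begins with context already seen in the previous one.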

5.4. Experimental Results and Analysis

The experiments showed that when the autoregressive Transformer decoder model was combined with the 5-gram model, selecting a relative frequency confidence level of about 0.6 resulted in better performance for API predictions. Figure 7 shows the prediction results of the 5-gram, Transformer, and DLH-API models.

As can be seen from Figure 7, compared to using the autoregressive Transformer model or the n-gram-model-based API sequence call statistics method alone, DLH-API improved both the accuracy and MRR of API completion. This demonstrates the effectiveness of combining deep learning and heuristics for API prediction.

To gain a deeper understanding and evaluate our proposed approach, we conducted experiments to investigate the following five research questions:

RQ1: How effective is it to use an autoregressive Transformer decoder model and an n-gram statistical method based on API usage patterns for predicting APIs separately?

To investigate the effectiveness of using the autoregressive Transformer decoder model alone for API completion, we evaluated its performance across four datasets: ML, Security, Web, and DL. The autoregressive Transformer decoder model can predict other code tokens besides API tokens, but in this paper, we only evaluated its performance in the API completion task. The results are shown in Table 2:

The experimental results in Table 2 show that, although the autoregressive Transformer-decoder-based model achieved the best performance among five methods presented at recent top conferences [1], its API completion accuracy was still low on real datasets, leaving much room for improvement.

In addition, API usage patterns are also widely applied in API completion, where the n-gram model is a representative and simple method that has achieved good results in various studies [19]. This model predicts the next API based on the previous n - 1 items of an API call sequence, where n is a hyperparameter of the n-gram model. Following the settings used by Xiao et al. [34], in this experiment, n was set to 3, 4, and 5. Using the API statistical database constructed from the training set, APIs in the test set were predicted by matching the corresponding API call sequence prefixes (i.e., the previous n - 1 items). If a corresponding prefix existed, the API that most frequently followed it as the nth item was chosen as the prediction result. The results are shown in Table 3.
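The prefix-matching procedure described above can be sketched in a few lines. The names `build_ngram_db` and `predict_next` are hypothetical, and the dictionary-of-counters layout is one plausible way to hold the API statistical database, not necessarily the one used in the experiments.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Sequence, Tuple

Prefix = Tuple[str, ...]


def build_ngram_db(sequences: Iterable[Sequence[str]], n: int) -> Dict[Prefix, Counter]:
    """Count, for every (n-1)-item prefix in the training set, which API follows it."""
    if n < 2:
        raise ValueError("n must be at least 2")
    db: Dict[Prefix, Counter] = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - n + 1):
            prefix = tuple(seq[i:i + n - 1])
            db[prefix][seq[i + n - 1]] += 1
    return dict(db)


def predict_next(db: Dict[Prefix, Counter], context: Sequence[str], n: int) -> Optional[str]:
    """Return the most frequent successor of the last n-1 APIs, or None if unmatched."""
    if len(context) < n - 1:
        return None
    counts = db.get(tuple(context[-(n - 1):]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Toy training set (made up): the prefix below is followed by eval() three times
# and by load_state_dict() once.
prefix = ["net()", "load_state_dict()", "net()"]
training = [prefix + ["eval()"]] * 3 + [prefix + ["load_state_dict()"]]
db = build_ngram_db(training, n=4)
prediction = predict_next(db, prefix, n=4)
```

When the prefix has never been seen in the training set, `predict_next` returns `None`, which corresponds to the case where the statistical method makes no prediction at all.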

As shown in Table 3, compared to the autoregressive Transformer decoder model, the accuracy and MRR of the API sequence call statistical method prediction based on the n-gram model were low overall. Moreover, as the value of n in the n-gram increased, both the accuracy and MRR decreased. This illustrates the superiority of deep learning methods.

RQ2: How effective is the combination of deep learning and heuristics for API prediction?

The n-gram model for statistical analysis of API call sequences primarily serves as a supplementary strategy to deep learning, used for predicting APIs in specific usage patterns that the deep learning model may overlook or find difficult to handle. For this to work, two conditions matter. First, the test set should contain API usage patterns that match those in the API statistics database; the proportion of such matches is the prefix matching rate (PMR). Second, among all matching API call sequence prefixes, the predictions should be as accurate as possible, which is measured as precision. Table 4 shows the experimental results for these two metrics across the four datasets.
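The two metrics can be made concrete with a short sketch. The function name `evaluate_ngram` and the toy data are hypothetical; the database is assumed to map each (n-1)-item prefix to a counter of the APIs that followed it in the training set.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

Prefix = Tuple[str, ...]


def evaluate_ngram(
    db: Dict[Prefix, Counter],
    test_cases: Iterable[Tuple[Sequence[str], str]],
    n: int,
) -> Tuple[float, float, float]:
    """Return (PMR, precision, accuracy) for (context, correct API) test cases."""
    total = matched = correct = 0
    for context, target in test_cases:
        total += 1
        if len(context) < n - 1:
            continue  # context too short to form a prefix
        counts = db.get(tuple(context[-(n - 1):]))
        if not counts:
            continue  # prefix not present in the statistical database
        matched += 1
        if counts.most_common(1)[0][0] == target:
            correct += 1
    pmr = matched / total if total else 0.0            # share of cases with a matching prefix
    precision = correct / matched if matched else 0.0  # correctness among matched cases only
    accuracy = correct / total if total else 0.0       # correctness over all cases
    return pmr, precision, accuracy


# Toy database and test cases (made up) with n = 3.
toy_db = {("a()", "b()"): Counter({"c()": 3, "d()": 1})}
toy_cases = [
    (["a()", "b()"], "c()"),  # matched, correct
    (["a()", "b()"], "d()"),  # matched, wrong
    (["x()", "y()"], "c()"),  # no matching prefix
    (["b()"], "c()"),         # context too short
]
pmr, precision, accuracy = evaluate_ngram(toy_db, toy_cases, n=3)
```

Since unmatched cases yield no prediction, overall accuracy is the product of PMR and precision. The reported numbers are consistent with this: for ML with n = 3, 52.79% × 21.06% ≈ 11.12%, which is the accuracy listed in Table 3.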

As can be seen from Table 4, the prefix matching rate gradually decreased and the precision gradually increased with increasing n. This suggests that, with a larger n, fewer API call sequence prefixes in the test set matched those in the training set, but the predictions for the matched prefixes were more often correct. Based on this observation, we chose to combine the 5-gram-model-based API call sequence statistics method with the autoregressive Transformer decoder model for API prediction (the impact of different n values on the results is discussed under RQ4). The results of the combinatorial strategy prediction are shown in Table 5.

From the above results, it can be seen that the results of the combinatorial strategy prediction were improved, both in terms of accuracy and MRR, relative to the autoregressive Transformer decoder based model and n-gram model alone. This demonstrates the effectiveness of the combinatorial prediction strategy.

RQ3: What is the effect of setting confidence levels for different heuristics on the performance of the combinatorial prediction strategy?

To enhance the prediction accuracy, we explored two heuristic methods for setting thresholds, based on absolute and relative frequencies. Absolute frequency refers to the actual count of API occurrences; for instance, (net(),load_state_dict(),net()): {eval():3,load_state_dict():1} indicates that in the training set, after the API call sequence prefix (net(),load_state_dict(),net()), eval() appears three times and load_state_dict() once. Relative frequency indicates the proportion of a specific API’s occurrences relative to all APIs observed under the same call prefix. In the example provided, relative frequency is represented as (net(),load_state_dict(),net()): {eval():0.75,load_state_dict():0.25}, meaning that after the prefix (net(),load_state_dict(),net()), eval() appeared with a relative frequency of 75%, and load_state_dict() with 25%. We set thresholds θ based on both absolute and relative frequencies. If the prefix of the API to be predicted existed in the statistical database and the frequency of its most frequent successor exceeded the threshold θ, that API was selected as the prediction; otherwise, the prediction of the deep learning model was used. For example, on the DL dataset with n = 5, we evaluated the performance of the combinatorial strategy predictions under different thresholds for both absolute and relative frequencies. The results are shown in Figure 8.
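A minimal sketch of the confidence-based selection follows, using the net()/load_state_dict() example from the text. The names `ngram_candidate` and `complete` are hypothetical, `stub_model` is a stand-in for the Transformer rather than a real model, and the strict comparison follows the word "exceeded"; whether the original implementation uses a strict or inclusive comparison is not stated.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence, Tuple

Prefix = Tuple[str, ...]


def ngram_candidate(
    db: Dict[Prefix, Counter],
    context: Sequence[str],
    n: int,
    theta: float,
    relative: bool = True,
) -> Optional[str]:
    """Return the most frequent successor if its frequency exceeds theta, else None."""
    if len(context) < n - 1:
        return None
    counts = db.get(tuple(context[-(n - 1):]))
    if not counts:
        return None
    api, count = counts.most_common(1)[0]
    # Relative frequency: share among all successors of this prefix.
    # Absolute frequency: the raw count.
    score = count / sum(counts.values()) if relative else count
    return api if score > theta else None  # assumption: strict comparison


def complete(
    db: Dict[Prefix, Counter],
    context: Sequence[str],
    n: int,
    theta: float,
    fallback: Callable[[Sequence[str]], str],
    relative: bool = True,
) -> str:
    """Use the n-gram prediction when it is confident enough, else the fallback model."""
    candidate = ngram_candidate(db, context, n, theta, relative)
    return candidate if candidate is not None else fallback(context)


def stub_model(context: Sequence[str]) -> str:
    """Hypothetical stand-in for the deep learning model's top prediction."""
    return "train()"


prefix = ("net()", "load_state_dict()", "net()")
db = {prefix: Counter({"eval()": 3, "load_state_dict()": 1})}

confident = complete(db, list(prefix), n=4, theta=0.6, fallback=stub_model)    # 0.75 > 0.6
deferred = complete(db, list(prefix), n=4, theta=0.8, fallback=stub_model)     # 0.75 <= 0.8
absolute = complete(db, list(prefix), n=4, theta=2, fallback=stub_model, relative=False)  # 3 > 2
```

With a relative threshold, a prefix seen only once can still pass (its single successor has frequency 1.0), whereas an absolute threshold filters such rare prefixes out; the two heuristics therefore select different subsets of n-gram predictions.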

Figure 8 shows that when using absolute frequency, the highest accuracy, 33.89%, occurred when the threshold was set to 1. With relative frequency, the accuracy peaked at 34.16% when the threshold was 0.6, which is higher than the best accuracy under absolute frequency. Additionally, under both absolute and relative frequencies, the accuracy first increased and then decreased as the threshold grew. Similar observations were made in the experiments conducted on the other three datasets.

RQ4: How do different values of n affect the combinatorial prediction strategy?

The data from Table 4 indicate that, as the value of n increased, the precision of API predictions based on the n-gram model improved, but the prefix matching rate decreased. In other words, although increasing n enhanced the precision, fewer API call prefixes in the test set matched those in the training set. Thus, increasing n did not necessarily enhance the overall performance of the combinatorial strategy predictions. Additionally, the analysis for RQ3 reveals that setting the confidence based on relative frequency was more effective than using absolute frequency. Based on this observation, this experiment focused on the performance of the combinatorial prediction strategy at different thresholds (at intervals of 0.1) under relative frequency. Figure 9 illustrates the prediction performance for different values of n on the DL dataset.

From Figure 9, it can be seen that for accuracy, with a confidence level of 0, the result for n = 6 was indeed better than that for n = 5. However, as the confidence level increased, the combinatorial strategy prediction accuracy for n = 6 improved slowly, reaching its peak of 34.06% at a threshold of 0.5, which was still lower than the optimal result of 34.16% achieved with n = 5. This indicates that, while increasing n can improve the precision of prefix matching, the reduced number of matching prefixes led to a decrease in the overall number of correctly predicted APIs, thus affecting the combinatorial strategy prediction results. The performance for MRR was similar, achieving the best results at n = 5. Similar conclusions were obtained on the other three datasets, where the combinatorial strategy prediction performance at n = 5 was relatively better.

RQ5: How does DLH-API perform compared to baselines?

The following baseline models were run for comparison, to evaluate our API completion model.

PyART [13]: This model utilizes a predictive framework optimized with heuristic principles, incorporating data-flow details, token similarity, and token co-occurrence to enhance API completion. It effectively employs these heuristic features to achieve accurate predictions.
MPL [12]: This method enhances the AST characterization of source code by integrating multiple paths, and leverages LSTM to improve API completion. It focuses on extending the traditional AST approach to better capture diverse code features.
TravTrans [15]: This approach processes AST node sequences using a pre-order traversal, encoding them into a Transformer model to predict masked API nodes. It aims to accurately identify APIs in the source code by utilizing advanced transformation techniques.

The results are shown in Table 6. The last row shows the outcomes of our method, while the other rows represent the baseline results. From these results, it is evident that our proposed method outperformed all the baseline methods across the four datasets. In particular, it excelled on the DL dataset, with an accuracy and MRR of 34.16% and 45.64%, respectively, a relative improvement of 3.99% and 4.90% over the state-of-the-art method (i.e., TravTrans). We attribute this success to our method employing a powerful backbone deep learning model and introducing API call sequence information.
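The 3.99% and 4.90% figures can be checked against the DL column of Table 6. The short calculation below suggests they are relative improvements over TravTrans rather than differences in percentage points (which would be 1.31 and 2.13); the helper name is hypothetical.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return (new - baseline) / baseline * 100.0


# DL dataset values from Table 6: DLH-API vs. TravTrans.
accuracy_gain = relative_improvement(34.16, 32.85)
mrr_gain = relative_improvement(45.64, 43.51)
```

Rounded to two decimals, these evaluate to 3.99 and 4.90, matching the values quoted in the text.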

6. Discussion

Our research combined an n-gram model with an autoregressive Transformer model to propose a new API completion strategy, DLH-API. This strategy was evaluated on four different publicly available Python datasets and demonstrated superior performance compared to single models. This suggests that, while deep learning models can effectively capture complex code contexts and semantic relationships, traditional statistical methods still exhibit unique advantages for certain specific API usage patterns.

A potential threat to the internal validity of our experiments is the selection of hyperparameters, particularly the setting of confidence thresholds in the n-gram model. The performance of our method was significantly influenced by different confidence settings. Theoretically, these confidence thresholds vary continuously, but in our experiments, they were adjusted manually, so we cannot guarantee that these settings are optimal. Nonetheless, we conducted a series of experiments at 0.1 intervals and successfully identified a relatively optimal confidence threshold, which achieved a significant improvement in performance.
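A threshold sweep at 0.1 intervals of the kind described here could look like the sketch below. All names and data are hypothetical: each validation case carries a precomputed prediction from the deep learning model, and the n-gram prediction replaces it only when its relative frequency exceeds the threshold (strict comparison assumed).

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Prefix = Tuple[str, ...]
Case = Tuple[Sequence[str], str, str]  # (context, correct API, deep model prediction)


def combined_accuracy(db: Dict[Prefix, Counter], cases: List[Case], n: int, theta: float) -> float:
    """Accuracy of the combined strategy at one relative-frequency threshold."""
    correct = 0
    for context, target, model_prediction in cases:
        prediction = model_prediction
        counts = db.get(tuple(context[-(n - 1):])) if len(context) >= n - 1 else None
        if counts:
            api, count = counts.most_common(1)[0]
            if count / sum(counts.values()) > theta:
                prediction = api  # n-gram is confident enough to override the model
        correct += prediction == target
    return correct / len(cases) if cases else 0.0


def sweep_thresholds(db: Dict[Prefix, Counter], cases: List[Case], n: int, steps: int = 10):
    """Evaluate thresholds 0.0, 0.1, ..., 1.0 and return (best, all results)."""
    results = [(i / steps, combined_accuracy(db, cases, n, i / steps)) for i in range(steps + 1)]
    best = max(results, key=lambda pair: pair[1])  # ties resolve to the lowest threshold
    return best, results


# Toy data (made up), n = 2.
toy_db = {
    ("a()",): Counter({"b()": 3, "c()": 1}),  # top successor has relative frequency 0.75
    ("x()",): Counter({"y()": 1, "z()": 1}),  # top successor has relative frequency 0.5
}
toy_cases = [
    (["a()"], "b()", "c()"),  # model wrong, n-gram right
    (["x()"], "z()", "z()"),  # model right, n-gram wrong
    (["q()"], "r()", "r()"),  # no matching prefix, model right
]
best, results = sweep_thresholds(toy_db, toy_cases, n=2)
```

On this toy data the accuracy rises once the unreliable 0.5-frequency pattern is filtered out and falls again when the threshold becomes high enough to reject the reliable one, which is the same increase-then-decrease shape reported for Figure 8. A grid this coarse can only locate the best threshold to within one step, which is the limitation acknowledged above.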

Threats to external validity mainly concern the generalizability of our proposed DLH-API strategy across different application scenarios. Although DLH-API performed well on multiple Python datasets, extending this strategy to other programming environments may be challenging. First, due to significant differences in syntax and API usage habits among programming languages, the experimental results based on Python cannot be guaranteed to directly apply to other object-oriented languages like Java or C#. Second, even though the DLH-API strategy relies on language-independent techniques, such as the Transformer model and n-gram model, these models may still require targeted adjustments and optimizations when handling language-specific constructs and library functions. Therefore, future work will involve further testing and adjusting the model in different programming environments to ensure its effectiveness and reliability in a broader range of applications.

7. Related Work

7.1. Deep-Learning-Based API Completion

Deep learning technology has brought significant progress to API completion [1,35,36]. Yan et al. [37] introduced the APIHelper model, enhancing API call completion by leveraging a modified long short-term memory (LSTM) architecture. This approach involves API concatenation encoding, where object types and APIs are individually encoded and then concatenated to serve as input for the LSTM model. Similarly, Svyatkovskiy et al. [28] adapted an LSTM model to autocomplete common APIs in Python packages, utilizing pre-order traversal results of abstract syntax trees (ASTs) directly as LSTM inputs, differing from Yan et al.’s approach of using API call sequences. Nguyen et al. [38] extracted templates from a code corpus and ranked them using deep learning models for API completion suggestions. Chen et al. [39] developed the DeepAPIRec model, which employs a Tree-LSTM approach to model ASTs extracted from source code, recommending APIs for vacant spots and completing API parameters using data flow analysis. Liu et al. [12] employed a multipath approach for code representation and mitigated out-of-vocabulary (OOV) words through a pointer network, achieving commendable results in API completion. Kim et al. [15] fed AST traversal sequences to a Transformer model and achieved state-of-the-art results in the API completion task.

7.2. Heuristic API Completion

In the domain of API completion, heuristic methods have garnered attention due to their ability to offer recommendations based on specific rules or patterns [2,5]. Bruch et al. [40] introduced three heuristic API completion techniques: frequency-based completion, recommending the most commonly used APIs; co-occurrence-based completion, suggesting APIs that frequently appear together; and a K-nearest neighbors approach (BMN) that identifies code snippets with similar features using the Hamming distance. While the BMN method showed promising results, its practical application is limited by computational costs and its disregard for the sequence of API calls. Proksch et al. [41] enhanced the BMN method by incorporating a pattern-based Bayesian network (PBN), which improved accuracy using additional code features such as parameter arrays, class context, and definition types. He et al. [13] introduced PyART, a novel heuristic method for real-time API recommendation in Python programs. It used lightweight analysis to mimic human-like partial data flow insights. This method focuses on data-flow, token similarity, and co-occurrence, and is trained using random forest models. Evaluations across various Python projects showed that PyART consistently outperformed competitors, effectively meeting real-time requirements.

8. Conclusions

In this paper, we introduced a novel API completion strategy DLH-API. This strategy integrates the traditional n-gram model with the advanced autoregressive Transformer model and employs heuristic methods to set confidence thresholds, aiming to enhance accuracy and MRR in API completion tasks. The experimental results showed significant improvements in accuracy and MRR compared to using pattern-based or learning-based methods alone, confirming the effectiveness of the DLH-API strategy. This provides new perspectives and tools for future research and applications in the field of API completion.

In future work, we plan to extend the testing strategy to cover a wider range of datasets from different programming languages. This approach will help us evaluate the robustness and adaptability of the model in different coding environments. At the same time, we aim to refine the algorithmic framework with a focus on the automatic setting of confidence thresholds. These refinements aim to improve the accuracy of the DLH-API strategy and ensure its validity and reliability in a wider range of application scenarios.

Author Contributions

Conceptualization, Y.L.; Methodology, Y.L.; Validation, J.D.; Investigation, Y.Y.; Resources, Z.P.; Data curation, Z.P.; Supervision, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

The work described in this paper was fully supported by the Research Project of Hunan Provincial Education Department (No. 22C0600) and Provincial Natural Science Foundation of Hunan (No. 2024JJ7249).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Peng, Y.; Li, S.; Gu, W.; Li, Y.; Wang, W.; Gao, C.; Lyu, M.R. Revisiting, benchmarking and exploring API recommendation: How far are we? IEEE Trans. Softw. Eng. 2022, 49, 1876–1897.
2. Nguyen, P.T.; Di Rocco, J.; Di Ruscio, D.; Ochoa, L.; Degueule, T.; Di Penta, M. Focus: A recommender system for mining api function calls and usage patterns. In Proceedings of the 2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE), Montreal, QC, Canada, 25–31 May 2019; pp. 1050–1060.
3. D’Souza, A.R.; Yang, D.; Lopes, C.V. Collective intelligence for smarter API recommendations in python. In Proceedings of the 2016 IEEE 16th International Working Conference on Source Code Analysis and Manipulation (SCAM), Raleigh, NC, USA, 2–3 October 2016; pp. 51–60.
4. Wen, F.; Aghajani, E.; Nagy, C.; Lanza, M.; Bavota, G. Siri, write the next method. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021; pp. 138–149.
5. Xie, R.; Kong, X.; Wang, L.; Zhou, Y.; Li, B. Hirec: Api recommendation using hierarchical context. In Proceedings of the 2019 IEEE 30th International Symposium on Software Reliability Engineering (ISSRE), Berlin, Germany, 28–31 October 2019; pp. 369–379.
6. Nguyen, A.T.; Hilton, M.; Codoban, M.; Nguyen, H.A.; Mast, L.; Rademacher, E.; Nguyen, T.N.; Dig, D. API code recommendation using statistical learning from fine-grained changes. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, WA, USA, 13–18 November 2016; pp. 511–522.
7. Nguyen, T.T.; Pham, H.V.; Vu, P.M.; Nguyen, T.T. Learning API usages from bytecode: A statistical approach. In Proceedings of the 38th International Conference on Software Engineering, Austin, TX, USA, 14–22 May 2016; pp. 416–427.
8. Wei, M.; Huang, Y.; Wang, J.; Shin, J.; Harzevili, N.S.; Wang, S. API recommendation for machine learning libraries: How far are we? In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Singapore, 14–18 November 2022; pp. 370–381.
9. Hindle, A.; Barr, E.T.; Gabel, M.; Su, Z.; Devanbu, P. On the naturalness of software. Commun. ACM 2016, 59, 122–131.
10. Nguyen, A.T.; Nguyen, T.N. Graph-based statistical language model for code. In Proceedings of the 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering, Florence, Italy, 16–24 May 2015; Volume 1, pp. 858–868.
11. Raychev, V.; Bielik, P.; Vechev, M. Probabilistic model for code with decision trees. ACM SIGPLAN Not. 2016, 51, 731–747.
12. Liu, Y.; Liu, J.; Zhang, X.; Hu, H. A Multiple-Path Learning Neural Network Model for Code Completion. In Proceedings of the 2023 IEEE International Conference on Web Services (ICWS), Chicago, IL, USA, 2–8 July 2023; pp. 224–233.
13. He, X.; Xu, L.; Zhang, X.; Hao, R.; Feng, Y.; Xu, B. Pyart: Python api recommendation in real-time. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021; pp. 1634–1645.
14. Natella, R.; Liguori, P.; Improta, C.; Cukic, B.; Cotroneo, D. AI Code Generators for Security: Friend or Foe? IEEE Secur. Priv. 2024; early access.
15. Kim, S.; Zhao, J.; Tian, Y.; Chandra, S. Code prediction by feeding trees to transformers. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), Madrid, Spain, 22–30 May 2021; pp. 150–162.
16. Zhong, H.; Xie, T.; Zhang, L.; Pei, J.; Mei, H. MAPO: Mining and recommending API usage patterns. In Lecture Notes in Computer Science, Proceedings of the ECOOP 2009–Object-Oriented Programming: 23rd European Conference, Genoa, Italy, 6–10 July 2009; Proceedings 23; Springer: Berlin/Heidelberg, Germany, 2009; pp. 318–343.
17. Ayres, J.; Flannick, J.; Gehrke, J.; Yiu, T. Sequential pattern mining using a bitmap representation. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, 23–26 July 2002; pp. 429–435.
18. Wang, J.; Dang, Y.; Zhang, H.; Chen, K.; Xie, T.; Zhang, D. Mining succinct and high-coverage API usage patterns from source code. In Proceedings of the 2013 10th Working Conference on Mining Software Repositories (MSR), San Francisco, CA, USA, 18–19 May 2013; pp. 319–328.
19. Raychev, V.; Vechev, M.; Yahav, E. Code completion with statistical language models. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, Edinburgh, UK, 9–11 June 2014; pp. 419–428.
20. Cavnar, W.B.; Trenkle, J.M. N-gram-based text categorization. In Proceedings of the SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV, USA, 11–13 April 1994.
21. Nguyen, T.T.; Nguyen, A.T.; Nguyen, H.A.; Nguyen, T.N. A statistical semantic language model for source code. In Proceedings of the 2013 9th Joint Meeting on Foundations of Software Engineering, Saint Petersburg, Russia, 18–26 August 2013; pp. 532–542.
22. Niu, H.; Keivanloo, I.; Zou, Y. API usage pattern recommendation for software development. J. Syst. Softw. 2017, 129, 127–139.
23. Gu, X.; Zhang, H.; Zhang, D.; Kim, S. Deep API learning. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, Seattle, WA, USA, 13–18 November 2016; pp. 631–642.
24. Xiao, Y.; Song, W.; Ahmed, S.; Ge, X.; Viswanath, B.; Meng, N.; Yao, D. Measurement of Embedding Choices on Cryptographic API Completion Tasks. ACM Trans. Softw. Eng. Methodol. 2024, 33, 1–30.
25. Tang, Z.; Li, C.-Y.; Ge, J.-D.; Luo, B. Method of API Completion Based on Object Type. J. Softw. 2022, 33, 1736–1757. (In Chinese)
26. Fang, L.; Ge, L.; Bolin, W.; Xin, X.; Zhiyi, F.; Zhi, J. A unified multi-task learning model for AST-level and token-level code completion. Empir. Softw. Eng. 2022, 27, 1–38.
27. Svyatkovskiy, A.; Deng, S.K.; Fu, S.; Sundaresan, N. Intellicode compose: Code generation using transformer. In Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Virtual, 8–13 November 2020; pp. 1433–1443.
28. Svyatkovskiy, A.; Zhao, Y.; Fu, S.; Sundaresan, N. Pythia: Ai-assisted code completion system. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2727–2735.
29. Li, J.; Wang, Y.; Lyu, M.R.; King, I. Code completion with neural attention and pointer networks. arXiv 2017, arXiv:1711.09573.
30. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017.
32. AlDanial. Count Lines of Code. 2021. Available online: https://github.com/AlDanial/cloc (accessed on 10 June 2024).
33. Al-Rfou, R.; Choe, D.; Constant, N.; Guo, M.; Jones, L. Character-level language modeling with deeper self-attention. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3159–3166.
34. Xiao, Y. Neural Network-Based Methodologies for Securing Cryptographic Code. Ph.D. Thesis, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA, 2022.
35. Gao, L.; Madaan, A.; Zhou, S.; Alon, U.; Liu, P.; Yang, Y.; Callan, J.; Neubig, G. Pal: Program-aided language models. In Proceedings of the International Conference on Machine Learning, PMLR, Honolulu, HI, USA, 23–29 July 2023; pp. 10764–10799.
36. Patil, S.G.; Zhang, T.; Wang, X.; Gonzalez, J.E. Gorilla: Large language model connected with massive apis. arXiv 2023, arXiv:2305.15334.
37. Yan, J.; Qi, Y.; Rao, Q.; He, H. Learning API Suggestion via Single LSTM Network with Deterministic Negative Sampling. In Proceedings of the SEKE, San Francisco, CA, USA, 1–3 July 2018; pp. 136–137.
38. Nguyen, S.; Nguyen, T.; Li, Y.; Wang, S. Combining program analysis and statistical language model for code statement completion. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 710–721.
39. Chen, C.; Peng, X.; Sun, J.; Xing, Z.; Wang, X.; Zhao, Y.; Zhang, H.; Zhao, W. Generative API usage code recommendation with parameter concretization. Sci. China Inf. Sci. 2019, 62, 1–22.
40. Bruch, M.; Monperrus, M.; Mezini, M. Learning from examples to improve code completion systems. In Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Amsterdam, The Netherlands, 24–28 August 2009; pp. 213–222.
41. Proksch, S.; Lerch, J.; Mezini, M. Intelligent code completion with Bayesian networks. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2015, 25, 1–31.

Figure 1. Schematic diagram of transformer model structure.

Figure 2. An example where the Transformer model predicts correctly but the n-gram method does not, where the red ? denotes the API to be completed.

Figure 3. An example where the n-gram method correctly predicts but the Transformer model does not, where the red ? denotes the API to be completed.

Figure 4. Overview of the combinatorial strategy that combines deep learning and heuristics for API completion.

Figure 5. Schematic diagram of API call sequence.

Figure 6. Schematic diagram of the combinatorial strategy for API prediction.

Figure 7. The accuracy (left) and MRR (right) of the 5-gram, Transformer, and DLH-API models.

Figure 8. The accuracy of absolute frequency (left) and relative frequency (right) at different thresholds.

Figure 9. Comparison of accuracy (left) and MRR (right) for different values of n.

Table 1. Dataset Information.

Dataset	Projects	Files	API (Training Set)	API (Test Set)
ML	323	46,556	2,206,423	1,094,635
Security	126	15,785	36,870	179,815
Web	568	82,771	1,325,006	622,548
DL	307	39,577	1,453,038	710,177

Table 2. The accuracy and MRR of the autoregressive Transformer-decoder-based model for API completion.

Dataset	Accuracy	MRR
ML	0.3127	0.4232
Security	0.2825	0.3683
Web	0.3092	0.4070
DL	0.3285	0.4351

Table 3. The accuracy and MRR of the n-gram model-based API call sequence statistics method for API completion.

Dataset	n = 3		n = 4		n = 5
Dataset	Accuracy	MRR	Accuracy	MRR	Accuracy	MRR
ML	11.12%	23.83%	7.25%	17.51%	4.89%	9.98%
Security	10.56%	14.41%	8.29%	15.83%	6.12%	9.65%
Web	11.27%	24.53%	8.14%	15.97%	6.34%	11.27%
DL	15.06%	26.86%	9.99%	19.62%	6.92%	12.54%

Table 4. PMR and precision of the n-gram-model-based API call sequence statistics method.

Dataset	n = 3		n = 4		n = 5
Dataset	PMR	Precision	PMR	Precision	PMR	Precision
ML	52.79%	21.06%	21.39%	33.88%	9.63%	50.76%
Security	50.64%	20.74%	20.09%	34.21%	9.55%	46.92%
Web	55.23%	22.30%	22.97%	35.67%	10.81%	48.35%
DL	57.25%	26.30%	24.07%	41.50%	11.79%	58.72%

Table 5. The combinatorial strategy prediction results with n = 5.

Dataset	Accuracy	MRR
ML	0.3151	0.4342
Security	0.2830	0.3706
Web	0.3919	0.4096
DL	0.3363	0.4492

Table 6. Comparison of accuracy and MRR for API completion across different datasets.

Model	ML		Security		Web		DL
Model	Accuracy	MRR	Accuracy	MRR	Accuracy	MRR	Accuracy	MRR
PyART	18.33%	27.25%	25.60%	36.71%	26.42%	37.08%	21.85%	33.47%
MPL	27.20%	38.17%	25.91%	36.52%	26.06%	38.21%	25.34%	35.52%
TravTrans	31.27%	42.32%	28.25%	36.83%	30.92%	40.70%	32.85%	43.51%
DLH-API	32.40%	44.39%	29.63%	38.92%	32.28%	42.69%	34.16%	45.64%

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
