1. Introduction
The emerging trend towards decentralization drives the pursuit of technologies that facilitate tamper-proof data exchange. Blockchain, built on a peer-to-peer architecture, combines several core technologies including digital signatures, smart contracts, cryptographic hashing, and consensus mechanisms [1,2,3]. With the continued development of blockchain technology in a multitude of applications, the number of smart contracts deployed in distributed ledgers has increased tremendously. Smart contracts [4] are self-executing decentralized applications running on blockchain, used to govern financial assets, that once deployed are autonomous and immutable. Smart contracts adhere to the underlying configuration of distributed ledgers and inherit their automation, immutability, and decentralization qualities [5].
With the widespread adoption of blockchain platforms across various decentralized applications, smart contract interoperability is continuously evolving. Ethereum [6] has become a prominent smart-contract-based blockchain platform due to the increasing adoption of its decentralized applications. Ethereum's surging popularity can be attributed to its high level of robustness and adaptability for a wide range of applications [7]. Ethereum currently hosts over four million smart contracts, with approximately 670,000 new contracts deployed each month and over 3000 DApps [8]. A major concern for users is therefore identifying the desired application service among this multitude of smart contracts in a timely and efficient manner; the identification of contracts is one of Ethereum's primary challenges. Some smart contract developers make their source code available along with a description of its context and purpose, but unless the developer invests in publicizing a contract through designated fora, the vast majority of smart contracts remain anonymous and hardly traceable. With the growing number of smart contracts, assisting users in identifying the required service has become an ongoing challenge. As a result, it is imperative to outline a hierarchy that provides a comprehensive mapping of smart contracts, going beyond the primary query services of blockchain platforms, which are limited to contract address, block number, transaction hash, and timestamp. An important step towards enabling such searches is the accurate labeling of contracts, which was initially performed through an inefficient manual process. A comprehensive classification model capable of automatically classifying existing and newly deployed contracts is therefore required.
Research has been carried out to help comprehend what smart contracts do and to enable contract searches based on their context and purpose. The proposed methods center on learning the characteristics and the structural code embeddings of smart contracts. However, the existing classification and topic modeling schemes for smart contracts are limited to application domains such as entertainment, management, the IoT, lottery, gambling, and gaming [9,10,11]. Meanwhile, recent studies have underlined the significance of blockchain-enabled peer-to-peer energy trading systems [12] leveraging smart contracts [13].
The advent of blockchain technology offers the potential to securely automate P2P energy trading [14,15]. Smart contracts have proven effective for the autonomous and secure execution of end-to-end energy transactions based on local consumer preferences. Ethereum's smart contracts act as strict protocols on the blockchain that allow energy transactions to be carried out once all prerequisites have been met. Energy exchanges are monitored as financial transactions, with the corresponding resource consumption quantified in gas units and remitted in Ether at the gas price [16,17]. However, despite the prevalent adoption of smart contracts in energy applications that streamline consumer and prosumer interactions towards a robust settlement process, no research has been conducted on the identification and analysis of energy smart contracts. Considering the multitude of deployed contracts, the lack of methodologies such as classification models for the analysis of energy smart contracts makes it challenging to gain insights into this ecosystem and to identify vulnerable smart contracts once deployed.
To the best of our knowledge, no prior contribution has explicitly addressed the detection and analysis of energy smart contracts. The key contributions of this article can be summarized as follows. This paper demonstrates the significance of domain-specific classification and analysis tools for smart contracts. This is accomplished using a method that leverages contextual terms for the classification of smart contracts based on their code and comments. The proposed classification pipeline provides a method for the detection and labeling of energy smart contracts using their source code. It uses feature engineering methods to produce domain-specific corpora, which are subsequently embedded within machine learning classification models. Domain-specific analysis derives technical and language-dependent features to enhance the structural and procedural understanding of smart contracts. This is achieved by employing contextual terms at the lexical level, searching for key terms and attribute tags to develop the energy corpus, which aids in deciphering the code's context and retrieving energy contracts. By discerning the domains, we intend to reach higher levels of abstraction and handle the intricacies of DApp design specifications. The developed model captures the entire lexicon of transactions used in developing an energy smart contract. Finally, energy smart contracts are analyzed to identify patterns in the distribution of code segments, the predominance of specific functions, and recurring contracts across the Ethereum network.
The proposed approach can be used by contract developers to track similar contracts deployed on Ethereum. Contract-level representation is used to include the highest level of granularity in the classification; thus, the proposed method is applicable to any application domain by using the key terms and attributes pertaining to that domain. The corresponding corpus is developed using contextual terms, and the collected attributes are then used to train the classifier. This methodology can be further extended to include measures for anomaly detection and malicious contract detection in the context of energy smart contract analysis. Security practitioners can use it to investigate the potential vulnerabilities of energy smart contracts. The search for vulnerabilities can be optimized by factoring in elements such as the semantics of the defects, while the vulnerable contract's syntactic representation is protected to the greatest extent possible. This is an important step towards the design and development of auditing systems to address the identified vulnerabilities.
The article is organized as follows. Section 2 introduces energy smart contracts and provides background information on smart contracts and natural language processing. In Section 3, related works are discussed to further highlight the contributions of this research. Section 4 describes the research methodology and the proposed classification pipeline, followed by Section 5, which showcases the classification results obtained by the baseline models. Finally, Section 6 summarizes the conclusions and future research directions.
3. Related Work
Smart contracts, unlike conventional contracts, are not written in natural language, making it difficult to determine their context. Compared to conventional programming languages such as C and Java, uniform understanding of Ethereum smart contracts is relatively limited. Nonetheless, there have been a few attempts at classifying Ethereum smart contracts prior to this research. However, the smart contract classifications described in existing works are not necessarily consistent with Buterin's initial classification into the three tiers of financial, semi-financial, and non-financial applications [24]. For instance, Shi et al. [25] applied NLP on bytecode to classify contracts as governance, finance, gaming, wallet, and social, although wallets are a subset of financial applications according to Buterin's early classification. Using LSTM, Hu et al. [11] classified Ethereum smart contracts by analyzing transactions to identify six behavior patterns: game, gambling, exchange, finance, high-risk, and social. Later, Tian et al. [9] developed a smart contract classification strategy based on Bi-LSTM and Gaussian latent Dirichlet allocation (LDA) to classify contracts as entertainment, management, lottery and tools, finance, IoT, and others.
According to our experience, there is no consistent set of smart contract specifications in the published research, and smart contracts associated with comparable practices are often classified differently. The proposed study extracts domain models with the goal of deriving business logic from current Ethereum-based DApps aimed at transactive energy systems. Complete formal characterizations of a program's intended behavior are rarely available; thus, it is imperative to describe smart contracts from a domain-specific standpoint. Furthermore, addressing prevalent smart contract vulnerabilities mandates a semantic and syntactic understanding of the compromised contract.
Although the transparent execution of smart contracts has enhanced the readability of blockchain-enabled systems, the characteristics of distributed ledgers make it extremely challenging to revoke vulnerable smart contracts once they are deployed. As a result, massive financial losses caused by smart contract security breaches in past intrusions have compromised the ecological stability of the contract layer in widely adopted blockchain platforms such as Ethereum. Hence, a growing concern in blockchain security is the detection of smart contract vulnerabilities [26,27]. New scalability and security vulnerabilities will emerge as the scale of Ethereum projects advances over time. Novel vulnerability detection algorithms should detect and assess emerging security threats and determine how to mitigate them. Accordingly, the search for vulnerabilities can be optimized by factoring in elements such as the semantics of the defects. In fact, contract classification based on the application domain and transaction context offers greater insight into the syntactic and semantic properties of a given class of contracts. This can be further used for the design and implementation of customized vulnerability and fault detection mechanisms for a specific domain, including transactive energy systems [28]. Consequently, it is important to design and employ smart contract analysis tools to gain a broader knowledge of contracts with respect to their underlying domains [29]. This will help establish the groundwork for the development of domain-specific vulnerability detection algorithms for the detection and mitigation of unknown vulnerabilities and will facilitate protecting vulnerable contracts' syntactic representation to the greatest extent possible.
4. Methodology
To evaluate the grammatical, symbolic, and arithmetic characteristics of smart contracts, this study analyzes their source codes using machine learning algorithms, deployed as predictive models for detecting Ethereum energy smart contracts. As depicted in Figure 2, the proposed classification pipeline can be deconstructed into three main stages: pre-training, training, and testing. Pre-training encompasses data collection and feature engineering, followed by the embedding layer and baseline models under training, after which the models are tested and evaluated. Logistic regression (LR), naive Bayes (NB), and Support Vector Machine (SVM) are employed as classifiers, and a comparative performance analysis is carried out on the obtained results. As shown in Figure 2, pre-training embeddings and domain-specific embeddings form the corpus. Its components are further illustrated in the form of the ontology shown in Figure 3 and the embeddings listed in Table 1, which can be interpreted in terms of their relative frequency (see Section 5 for an example).
4.1. Data Collection and Pre-Processing
This study is performed on smart contract Solidity source codes retrieved from etherscan.io [8]. Since the code is explicitly used as the input for the classifier, it must be parsed to acquire an appropriate code representation [37,38]. Code representation is performed using a customized Solidity parser at the contract, function, comment, and token levels to identify semantic information within the focal points. Contract-level representation is employed to provide the highest level of granularity in the classification. Moreover, pre-processing of the raw unstructured text is required for content analysis. Pre-processing of the source code includes cleaning, normalizing, and stemming to remove stop words and semantically irrelevant terms and to break words down to their roots for the detection of semantic similarities during feature extraction [37].
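The cleaning, normalization, and stemming steps above can be sketched in a few lines of Python. This is only an illustrative pipeline under assumed conventions (regex-based comment stripping, a toy stop-word list, and a naive suffix stripper standing in for a real stemmer), not the authors' parser:

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "this"}

def preprocess(source):
    """Clean, normalize, and crudely stem Solidity source text."""
    # Strip single-line (// ...) and block (/* ... */) comments
    source = re.sub(r"//[^\n]*|/\*.*?\*/", " ", source, flags=re.DOTALL)
    # Split camelCase identifiers: "transferEnergy" -> "transfer Energy"
    source = re.sub(r"([a-z])([A-Z])", r"\1 \2", source)
    # Normalize: lowercase, keep alphabetic tokens only
    tokens = re.findall(r"[a-z]+", source.lower())
    # Drop stop words and very short tokens
    tokens = [t for t in tokens if t not in STOP_WORDS and len(t) > 2]
    # Naive suffix stripping as a stand-in for a real stemmer
    return [re.sub(r"(ing|ed|s)$", "", t) if len(t) > 4 else t for t in tokens]

code = """
// Transfer energy credits between peers
contract EnergyTrade {
    function transferEnergy(address buyer) public {}
}
"""
print(preprocess(code))
```

Note that the comment text never reaches the token stream in this sketch; in the actual study, comments are parsed separately at the comment level rather than discarded.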
4.2. Building a Domain Corpus
In NLP, the frequency of terms is important for analyzing context. Machine learning has been widely adopted for the semantic and syntactic analysis of text, and most such approaches require a corpus of annotated text for the underlying algorithm to learn the significance of each term in a document [38,39]. The corpus must be a balanced, representative collection of terms about a specific topic [40]. This section discusses the process of corpus development and energy token extraction.
A vocabulary of 376 terms and the energy corpus were created using domain knowledge and by extracting domain features and terms from both the comment and code segments of the contracts. Text analysis and feature extraction methods operate at different levels of granularity, including document-level, sentence-level, and word-level analyses. In developing the energy corpus and processing the smart contracts, we analyzed source codes at the lexical level. From a lexical standpoint, a domain-specific corpus facilitates capturing the terms and statements that correspond to a specific application domain with prototypical measures at the syntactic and semantic levels. Moreover, a domain-specific corpus distinguishes itself by the relative closure of its lexicon: unlike a general corpus, a domain-specific vocabulary is nearly finite. Since certain syntactic structures and classes are more prevalent in a given application domain than in a general corpus, we aim to create key energy terms, tokens, and attribute tags that facilitate interpreting the context of the code and retrieving energy contracts. Moreover, semantics serve a limited role in smart contract development, and using keywords allows a surface-level interpretation of source codes. Using term frequency and relevance, the corpus is intended to connect a knowledge base, as a dictionary, to the source code text. The corpus is updated progressively as new energy contracts are identified during the process.
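As a minimal sketch of how a domain lexicon can be used to gauge the "energy-ness" of a tokenized contract, the following scores a token stream against a seed term set. The `ENERGY_LEXICON` and the `energy_score` helper are hypothetical illustrations, not the study's 376-term corpus:

```python
from collections import Counter

# Hypothetical seed lexicon; the study's energy corpus holds 376 terms
ENERGY_LEXICON = {"energy", "power", "grid", "solar", "kwh", "prosumer",
                  "meter", "tariff", "renewable", "watt"}

def energy_score(tokens):
    """Fraction of token occurrences that belong to the domain lexicon."""
    counts = Counter(tokens)
    hits = sum(n for t, n in counts.items() if t in ENERGY_LEXICON)
    return hits / max(1, sum(counts.values()))

tokens = ["contract", "solar", "trade", "energy", "transfer", "kwh"]
print(round(energy_score(tokens), 2))  # 3 of 6 tokens are in the lexicon -> 0.5
```

The near-finite lexicon noted above is what makes such a simple overlap measure workable: the set of domain terms can be enumerated and extended incrementally as new energy contracts are found.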
4.3. Embedding Layer
The main challenge in active learning is selecting the most insightful data instances to label and use to begin training. The choice of embeddings requires special consideration since we are parsing source code. As mentioned earlier, source code deviates from natural text in that it features distinct granularity levels and lacks high-quality contextual information at the document level. NLP and word embeddings have recently seen considerable advances; however, source code processing requires semantic knowledge at the concept level rather than unique occurrences in text, whereas instance-specific embeddings (as introduced in BERT and similar approaches) are best suited for language translation and search engine queries [45]. Hence, feature selection techniques are useful for identifying and eliminating unnecessary and irrelevant subsets of features [46]. Word2Vec [47] and Term Frequency–Inverse Document Frequency (TF-IDF) [48] were used in this work as feature extraction methods, applied to over 10,000 smart contracts to facilitate preliminary filtering. Word2Vec is a pre-trained word embedding neural network that is effective for text classification with a small corpus, as in our case. The Word2Vec embedding layer retains the semantic and syntactic information of codes and comments and predicts the context of the terms. Using Continuous Bag of Words (CBOW) as the underlying architecture, the semantic correlation of the existing terms in the corpus was evaluated to find the closest match [49]. CBOW quantifies the frequency of the terms in the document by assigning each term a value representing the occurrence of that feature. The corpus is updated with the terms that have the highest semantic similarity scores, as illustrated in Table 1. Using a lexicon of energy-relevant terms, the occurrence of each term and its corresponding semantic correlation score are factored in as measures of the energy-ness of the contract. Subsequently, the pre-processed contracts are passed through an embedding layer stacked in front of the classification model. TF-IDF is used as a sparse embedding layer to extract features from the labeled data. It works by penalizing frequently occurring terms in the source code to identify prominent yet infrequently used terms that prevail over the context. Using TF-IDF, each term in the source code is assigned a weight that determines its significance based on its term frequency and inverse document frequency, which is then used for training the classifier.
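The TF-IDF weighting just described can be reproduced from first principles. The following is a from-scratch sketch of the scheme (raw term frequency times log inverse document frequency) rather than the library implementation presumably used in the study:

```python
import math
from collections import Counter

def tfidf(docs):
    """TF-IDF weights for a list of tokenized documents.

    TF is the raw term frequency; IDF = log(N / df), so a term that
    occurs in every document is penalized down to zero weight.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))                # document frequency per term
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [
    ["contract", "energy", "grid", "contract"],
    ["contract", "token", "transfer"],
    ["contract", "energy", "meter"],
]
w = tfidf(docs)
# "contract" occurs in every document -> weight 0; "grid" is unique to doc 0
print(max(w[0], key=w[0].get))  # -> grid
```

This is exactly the "penalizing frequently occurring terms" behavior noted above: ubiquitous boilerplate terms such as `contract` vanish, while rarer, context-bearing terms dominate the feature vector.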
4.4. Baseline Models
Determining the best classifier is an imperative yet challenging decision in any text classification workflow. It needs to take into account many aspects, including the data composition, scalability of training, and run-time efficiency. In this study, NB, LR, and SVM are used as baseline models for the classification of Ethereum smart contracts [50].
4.4.1. Logistic Regression
LR is a discriminative, probabilistic classifier that is commonly employed in NLP as a supervised machine learning model. Rather than assuming a linear relationship between the dependent and independent variables, LR models the relationship between categorical variables using a logistic function. It requires a training corpus to detect discriminating features between the desired classes. The input corpus is used by LR to learn the domain's verbal intuition and syntactic literature, as well as to retrieve document features and biased terms. Each input feature is assigned a weight that represents its importance in the classification decision. LR assigns higher weights to the primitive terms, although it is not capable of generating an instance of these terms on its own. A bias term, commonly referred to as the intercept, is also added to the weighted inputs. This implies that energy terms are negatively associated with the non-energy decision, as illustrated in Figure 4. Instead of determining similarity, LR takes into account the distance between the energy and the non-energy contracts. Gradient descent is then used to iteratively update the weights to minimize the cross-entropy loss, which is a convex optimization problem. The algorithm's resistance to correlated features contributes to a higher classification precision.
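A toy, from-scratch illustration of the mechanics above: batch gradient descent on the cross-entropy loss, with the learned weights showing the opposite-signed association of energy and non-energy terms. The feature names and data are invented for illustration and are not the study's model:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the (convex) cross-entropy loss."""
    w = [0.0] * len(X[0])
    b = 0.0                                # bias term (the intercept)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                   # dLoss/dLogit for cross-entropy
            grad_b += err
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

# Invented features: [freq("energy"), freq("lottery")]; label 1 = energy
X = [[4, 0], [3, 1], [0, 3], [1, 4]]
y = [1, 1, 0, 0]
w, b = train_logreg(X, y)
print(w[0] > 0 and w[1] < 0)  # energy weight positive, lottery negative -> True
```

The sign pattern of the learned weights is the toy analogue of Figure 4, where energy terms carry negative scores against the non-energy decision.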
4.4.2. Naive Bayes
Another probabilistic, supervised classifier employed in this study is NB. It determines the likelihood of a label based on previously observed characteristics and their conditional independence. Using Bayes' theorem, this model identifies the correlation between conditional probabilities and statistical quantities. As an incremental approach, NB is predicated on the assumption that all attributes are independent, and any context disregarding this conditional independence principle deteriorates its performance. The source code is transformed into a feature vector as input for naive Bayes, which is trained on the training set to estimate the likelihood of energy-ness given each feature. Generally, features can be developed by analyzing the training set while keeping linguistic intuitions and the domain-specific linguistic literature in mind. Developing complex features that are variations of a number of primitive features is especially useful, as illustrated in Figure 3. A thorough assessment of errors on the training set often yields insights into these features. NB generates the probability of each feature for each class, such that the probability of each feature can be optimized to project the energy-ness or non-energy-ness of the smart contracts.
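The conditional-independence mechanics can be illustrated with a minimal multinomial NB trained on invented token counts; this is a sketch with Laplace smoothing, not the study's implementation:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Multinomial NB with Laplace smoothing (terms assumed independent)."""
    vocab = {t for d in docs for t in d}
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d)

    def log_prob(doc, c):
        total = sum(counts[c].values()) + len(vocab)    # smoothed denominator
        lp = math.log(priors[c] / len(docs))            # class prior
        for t in doc:
            lp += math.log((counts[c][t] + 1) / total)  # per-term likelihood
        return lp

    return lambda doc: max(counts, key=lambda c: log_prob(doc, c))

docs = [["energy", "grid", "kwh"], ["solar", "energy"],
        ["lottery", "bet"], ["game", "bet", "jackpot"]]
labels = ["energy", "energy", "non-energy", "non-energy"]
classify = train_nb(docs, labels)
print(classify(["grid", "solar"]))  # -> energy
```

Each term contributes an independent log-likelihood factor, which is precisely the conditional independence assumption, and why correlated features degrade NB relative to LR, as discussed in the results.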
4.4.3. Support Vector Machine
The last supervised learning approach used in this research is SVM. It has proven to be an effective method for pattern recognition and text classification. As a discriminant classifier with a statistical learning paradigm, SVM attempts to capture the optimum trade-off between complexity and learning to ensure maximum generalization and minimum structural risk. The source code is perceived as a bag of words and each term is associated with a feature, where the significance of the feature is determined by the frequency with which it appears inside the contract using TF-IDF. Once feature vectors are obtained, SVM transforms the training set into a multidimensional space to create a hyperplane. The optimal position of the hyperplane is directly affected by the data points closest to the decision boundary, as they are the most challenging to identify. In addition, a subset of the training set is used as support vectors in the decision function, making it memory efficient. Using a higher dimension, SVM distinguishes the classes with the highest marginal distance, establishing a decision boundary to optimize the classification accuracy.
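A minimal margin-based sketch of the idea: Pegasos-style subgradient descent on the hinge loss separates toy "energy" and "non-energy" term-count vectors. The data and hyperparameters are illustrative only and do not reflect the study's SVM configuration:

```python
def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style subgradient descent on the hinge loss.

    Labels must be +1 / -1; returns the learned weight vector.
    """
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)                     # decaying step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]    # regularization shrink
            if margin < 1:                            # hinge subgradient step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

# Toy term-count features: [count("energy"), count("game")]
X = [[3, 0], [2, 0], [0, 3], [0, 2]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
preds = [1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1 for x in X]
print(preds)  # -> [1, 1, -1, -1]
```

Only points with margin below 1 trigger an update, which mirrors the text: examples classified correctly with sufficient certainty incur no hinge penalty, and the boundary is shaped by the points closest to it.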
5. Evaluation Results
Table 2 and Table 3 demonstrate how each model performs on energy and non-energy contracts independently, as well as the overall accuracy of each algorithm using precision, recall, and the F1-score [51]. The obtained results show that LR produced the maximum accuracy of 98.34%, with 97.53% precision, 98.78% recall, and a 98.12% F1-score. Figure 4 depicts LR's effective energy assessment of the terms frequently encountered in the source codes. The model is fed the TF-IDF features and their corresponding coefficients, which translate to a weighted combination of input features used to determine the importance of each feature in the overall log-loss calculation. Hence, the probability of the contract being a non-energy contract increases as the log-loss increases, while the probability of it being an energy contract increases as the log-loss decreases. As a result, the term "Energy" was assigned a −8.0191 correlation score with non-energy terms and is regarded as the most energy-related term. Pylon [52], which has been further identified as one of the dominant energy tokens adopted in several energy transactions, was classified as an energy-related term with a −2.361 correlation score with respect to non-energy terms. On the other hand, terms such as game, gamble, sport, betting, and lottery, which embody dominant application areas for smart contracts, are classified as non-energy terms [11].
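Ranking terms by the sign and magnitude of LR coefficients, as in Figure 4, can be sketched as follows. The first two coefficient values echo the two scores reported above; the remaining entries are invented for illustration:

```python
# Illustrative LR coefficients over TF-IDF features; negative values pull
# toward the energy class (-8.0191 and -2.361 echo Figure 4, the rest
# are invented)
coefficients = {"energy": -8.0191, "pylon": -2.361, "grid": -1.90,
                "lottery": 3.10, "game": 2.75, "betting": 2.40}

ranked = sorted(coefficients.items(), key=lambda kv: kv[1])
most_energy = ranked[0][0]        # most negative -> most energy-related
most_non_energy = ranked[-1][0]   # most positive -> most non-energy-related
print(most_energy, most_non_energy)  # -> energy lottery
```

Reading the coefficients this way turns the trained classifier itself into an analysis tool for the domain lexicon.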
LR achieved substantial results since the classification task is fundamentally a binary problem. In addition, the probabilistic structure of LR allows the use of the likelihood ratio to reduce the costs associated with misclassification. The primary difference between LR and NB, the two best-performing algorithms, is that LR is a discriminative classifier whereas NB is a generative classifier [53]. As a result, the success of LR over NB can be traced back to LR's robustness to correlated features and NB's strict conditional independence assumption. Although NB is acknowledged for its fast convergence, it demonstrated relatively high errors compared to LR. Nevertheless, because of its comparable outcomes and effortless training, NB remains a viable technique for small datasets [54].
SVM produced satisfactory results in the contract classification, with 87.23% accuracy. SVM is able to generalize because, as a margin-based method, it does not penalize instances in which the correct decision is made with a reasonable degree of certainty. However, SVM underperformed in comparison to the other algorithms: it aims to maximize the perpendicular distance between the two edges of the hyperplane to reduce the risk of generalization errors, and as a result of the high correlation between smart contracts, the marginal distance of the data points decreased, resulting in a higher generalization error and a lower accuracy.
Subsequently, a fraction of the identified energy contracts were analyzed to capture any patterns in code segment distribution, the prevalent adoption of specific functions, and recurring contracts across the Ethereum network [29]. Figure 5 depicts the most common energy tokens used to facilitate transactions in energy smart contracts. POWR is a utility token that grants access to the Power Ledger platform and its peer-to-peer features and serves as the ecosystem's fuel [44]. Similarly, WPP Energy is a publicly available blockchain-based renewable energy investment platform that offers peer-to-peer, smart-contract-enabled transactions using WPP tokens in an effort to promote the use of cryptocurrency in the energy market [34]. WePower is another blockchain-based green energy trading platform that enables energy suppliers to raise capital for green energy efficiency through smart contracts and WPR tokens. These tokens represent the energy to be produced in the days to come, giving buyers the opportunity to invest in renewable energy [55].
Finally, potential discrepancies between the code segments of the two categories were examined by selecting 20 energy smart contracts and 20 non-energy smart contracts. Figure 6 illustrates the comparative results obtained for the distribution of contract code segments. Compared to non-energy smart contracts, the results imply that the development of energy contracts tends to prioritize the adoption of contracts and libraries over interfaces. The results for logical lines of code (LLOC, excluding comments and empty lines), source lines of code (SLOC), number of functions (NF), deepest nesting level (NL), number of parameters (PAR), and comment lines (CLOC) also validate the prevalent use of identical contracts with minor adjustments and demonstrate that the lines of comments are heterogeneous and not necessarily proportionate to the length of the code. Analyzing dominant contract and function names across both energy and non-energy smart contracts further confirms the prominent adoption of StandardToken and ERC20 in both classes. As illustrated in Figure 7, the primary differences between the two classes in terms of function names can be attributed to the Ownable, SafeMath, Pausable, and Mintable contracts. Mintable contracts are not commonly used in energy contract development, since Mintable tokens feature a non-fixed total supply, allowing the token issuer to mint additional tokens. On the other hand, Ownable, SafeMath, and Pausable are not identified as dominant non-energy contracts. Ownable contracts may be utilized for lowering gas costs and binding configuration functions to specific external addresses and are widely adopted in the energy sector. The SafeMath library examines whether an arithmetic operation will result in an integer overflow or underflow; energy contracts use SafeMath to throw an exception and roll back the transaction. Pausable contracts are another common practice among energy contracts, allowing the owner of a Pausable contract to halt and restart functions. Since the owner can pause the functionality at any time, users may be hesitant to utilize the corresponding DApp, making this a design drawback for energy contracts; from a vulnerability analysis standpoint, however, the lack of a pause mechanism requires vulnerable contracts to be aborted while an alternate instance becomes available on the blockchain. Finally, the same analysis of functions in both classes reveals the adoption of comparable functions across each category, with more prevalent adoption among energy contracts, as illustrated in Figure 7.
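A rough sketch of how line-based metrics such as SLOC, CLOC, and NF might be counted for a Solidity source file; a real metrics tool, as presumably used for Figure 6, would rely on a proper parser rather than line heuristics:

```python
def code_metrics(source):
    """Rough SLOC/CLOC/NF counts for a Solidity source file.

    SLOC counts non-empty lines; CLOC counts single-line comments;
    NF counts function declarations. Line heuristics only.
    """
    sloc = cloc = nf = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue                       # empty lines are excluded
        sloc += 1
        if stripped.startswith("//"):
            cloc += 1
        if stripped.startswith("function "):
            nf += 1
    return {"SLOC": sloc, "CLOC": cloc, "NF": nf}

sample = """\
// Simple energy token
contract EnergyToken {
    function mint(address to) public {}
    function burn(address from) public {}
}
"""
print(code_metrics(sample))  # -> {'SLOC': 5, 'CLOC': 1, 'NF': 2}
```

Comparing such per-contract metric vectors across the two classes is what surfaces the heterogeneity of comment lines and the reuse of near-identical contracts noted above.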
6. Conclusions
Blockchain technology has brought innovation to a wide array of industries. The number of transactions on the Ethereum blockchain is approaching half a billion, turning Ethereum into the largest smart contract blockchain platform. Unlike traditional contracts, smart contracts are not written in a natural language, making it difficult to determine their context. As a result, smart contract classification based on the application domain and transaction context provides greater insight into the syntactic and semantic properties of that domain. With the progression towards a more decentralized and dynamic energy system, the impact of blockchain-enabled smart contracts in transactive energy systems has gained prominence. As a result, it is imperative to analyze the energy smart contract feature space to gain more insights into the characteristics of contracts deployed for energy transactions.
Analyzing over 10,000 smart contract Solidity source codes, this study proposes an approach to discriminate energy smart contracts using the publicly accessible Ethereum source codes. NLP and machine learning classification algorithms are employed to detect and properly label energy smart contracts. First, a domain-specific embedding layer is generated to identify and analyze energy tokens and energy-related terms. Subsequently, both the energy corpus and categorical attributes are employed as baselines for the training of the classification algorithms. Logistic regression, naive Bayes, and Support Vector Machine are implemented as classifiers. The classification performance of each algorithm is then evaluated using accuracy, precision, recall, and F1-score metrics. Energy smart contracts are detected with up to 98.34% accuracy, with LR outperforming the other algorithms. Detected contracts are further examined to discern any discrepancies or patterns in the distribution of code segments, the predominant use of specific functions, and recurring contracts across the Ethereum network.
We anticipate that the proposed approach will help to establish the groundwork for innovative solutions for domain-specific classification and vulnerability detection of smart contracts. Looking exclusively into the grammatical, symbolic, and arithmetic characteristics of energy smart contracts may facilitate the identification of vulnerability features that may have gone undetected in previous studies. Subsequently, machine learning models can be employed for vulnerability assessments of the energy contracts at the function level. To improve the accuracy of the existing vulnerability detection models, the implications of integrating the conventional pattern extraction methods with machine learning models can also be investigated.