1. Introduction
Implementing current information technology (IT) systems with a suitable information and communication technology (ICT) structure has become important in the current decade. The rapid advancement of technology allows various tasks to be completed simultaneously and delivers high performance and efficiency. The role that IT plays in several industries, including healthcare and decision-support systems, underlines its importance across a variety of sectors. To implement ICT, organizations use specific software that is crucial to service delivery and production measures. Notably, businesses that produce software and hardware prioritize the development and application of these instruments and normally ignore the sustainability of the resources used in their manufacturing. This oversight has resulted in a notable problem that may be examined from two viewpoints: (1) the software within the information technology system, and (2) sustainability [1]. The result for IT systems is an exponential increase in software complexity, which adversely affects costs, stakeholders, and the owner [2]. On the other hand, IT systems negatively influence sustainability due to the amount of hardware, data centers, and energy required for these systems to be developed, maintained, and run. Additionally, it is important to consider the negative effects on social and economic aspects [3].
The demand for sustainable and eco-friendly software is rising, with implications for performance levels, network bandwidth, and hardware requirements. These factors also affect energy consumption and the use of natural resources. Research on greening software engineering appears to be a critical answer to this growing demand. The goal of this research is to improve software systems’ sustainability through a multiangle approach. To design software with as few negative impacts as possible on the environment, society, and the economy, it becomes imperative to adopt the viewpoint of sustainable software development. Numerous studies show the intersection of nonfunctional requirements (NFRs), such as performance, maintainability, scalability, and usability, with sustainable environmental factors. This intersection demonstrates the relationship between green IT factors and NFRs. The key evidence for this relationship is given below:
Resource Consumption: Green IT factors focus on the best possible usage of resources, including software and hardware resources. NFRs such as scalability and resource consumption ensure that the system can handle additional load without adding unnecessary resources. Research studies indicate that efficient resource management in the software development process can lead to better use of the existing infrastructure.
Energy Efficiency: Minimizing energy consumption is one of the major green IT factors. NFRs such as efficiency and performance are directly related to this factor. Energy consumption can be reduced by decreasing the computational resources used, through optimized algorithms and efficient coding techniques [4].
Usability: Usability is a major NFR that can influence user behavior towards green IT practices. For example, a user-friendly interface that encourages users to power down the system when it is not in use can contribute to energy saving. Well-designed interface strategies can have a positive impact on sustainability factors.
Maintainability: A software system that follows green IT practices has a long lifecycle. Maintainability is an NFR that ensures the system can be easily maintained and updated. A well-maintained system is unlikely to need replacement, which reduces electronic waste, one of the green IT factors.
Regulatory Compliance: NFRs related to regulatory compliance ensure that a system meets regulatory requirements, which often include energy efficiency and efficient waste management. By meeting these requirements, the system directly supports green IT practices.
The study presented here contributes to the convergence of software engineering and sustainability in several ways. First, we present a greening requirements engineering concept; requirements engineering (RE) is an important stage in the software development lifecycle (SDLC). Many studies have used machine learning and deep learning approaches for requirement classification [5,6]. In particular, our study defines a procedure to link nonfunctional requirements (NFRs) to characteristics of green software. Secondly, we add a new feature that captures the author-defined aspects of software sustainability. Our study extends the widely used Predictor Models in Software Engineering (PROMISE) dataset [7]. This dataset is important for training the language model that we created in this work. This study creates a link between NFRs and the sustainability dimensions that exist in the PROMISE dataset [7].
Thirdly, to classify different kinds of NFRs into relevant sustainability aspects, we build, implement, and evaluate an effective, precisely tailored Bidirectional Encoder Representations from Transformers (BERT) language model. This model is a keystone of our methodology, which measures the degree of sustainability perception and attention at the RE stage of the SDLC. Different pre-trained language models are used for different natural language processing (NLP) applications. For example, OpenAI’s Generative Pre-Trained Transformer (GPT) generates text using an unsupervised learning technique [8]. Using a uniform framework, Google’s Text-to-Text Transfer Transformer (T5) is trained on tasks including summarization and text translation [9]. Facebook presented the Robustly Optimized BERT Pretraining Approach (RoBERTa), a BERT variant that improves overall performance by incorporating more training data and refined training methodologies [10]. In this study, we employ the BERT model based on the features of the selected dataset and the proposed task of mapping NFRs to sustainability aspects of the software. BERT is a highly efficient pre-trained language model that performs exceptionally well in various NLP tasks, especially fine-grained classification tasks that require the classification of text into discrete groups.
In contrast to previous language models, BERT’s attention mechanism and bidirectional nature provide extra advantages for the accuracy of fine-grained classification tasks. Additionally, BERT is especially well suited for fine-tuning on a limited dataset, as used in this study [8].
The real motivation for studying sustainable requirements engineering is its capacity to affect the environment. Sustainable requirements engineering helps us to design systems that reduce resource and energy consumption and is useful for producing eco-friendly software systems. It can also lead to cheaper and more cost-effective software systems. Incorporating sustainability into the software development process makes it more efficient and competitive. Moreira [11] discussed the impact of sustainable requirements engineering on the software development process. That study built a sustainability requirements catalog with the help of a systematic mapping approach to show the sustainability characteristics that affect the development process. A different study [12] addressed the effectiveness of software sustainability through a systematic mapping approach. A third study [13] analyzed the impact of sustainable requirements on social and environmental issues and found them useful in all types of projects. The main contributions of our research are given below:
Focuses on the correlation of nonfunctional requirements with sustainable green IT factors.
Extends the PROMISE_exp dataset by adding 61 new instances and one column of sustainability class with the binary labels “socio-economic” and “eco-technical”.
Performs sustainability-factor labeling on all 1030 instances of the extended PROMISE_exp dataset.
Evaluates the BERT model for classifying sustainability factors within the context of NFRs.
Improves evaluation metrics’ values and accuracy by using the BERT language model for the classification of sustainability factors.
This paper is structured as follows: A literature review is presented in Section 2. The training dataset, the phases of dataset growth and preprocessing, and the feature engineering procedure are covered in Section 3. The mapping approach suggested in this study is presented and discussed in Section 4. In Section 5, the evaluation of the experiment carried out to assess the suggested BERT model is shown, and the outcomes are discussed. The study’s conclusions and potential future research directions are covered in Section 6.
3. Preprocessing and Dataset Training
We used an expanded version of PROMISE, the most commonly used dataset in software engineering research. The PROMISE_exp dataset serves as the base dataset in our research study. We added healthcare software system requirements to this dataset. First, we analyzed the healthcare system requirements. After analysis, we added those requirements to our repository. Experts validated the extracted requirements. This process improves the original dataset. The main reason for choosing healthcare system requirements is the diverse nature of the healthcare system. As the healthcare system is a critical and sociotechnical system, adding requirements from this system improves the dataset and may reveal other perspectives on it. The PROMISE dataset has 625 labeled requirements, and PROMISE_exp has 969 labeled requirements. The requirements of both datasets are written in English. Numerous studies have used these datasets for requirements classification purposes [63]. Both datasets contain three major attributes: the project ID, the requirement text, and the class of the requirement. The project ID identifies the project from which the requirement was extracted. The requirement text is the text describing a specific requirement of the project. The class attribute defines the major category to which the requirement belongs. PROMISE_exp contains functional requirements and 11 types of nonfunctional requirements. Those types are usability (US), legal (L), scalability (SC), portability (PO), maintainability (MN), availability (A), security (SE), look and feel (LF), fault tolerance (FT), performance (PE), and operational (O). Some other nonfunctional requirement types were added along with the healthcare system requirements. Those types are discussed in Section 3.2.
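As a rough illustration of the three attributes described above, the following Python sketch models one dataset row. The class name `Requirement`, the field names, the code "F" for functional requirements, and the class assigned to the sample row are hypothetical choices made for illustration; they are not taken from the dataset files.

```python
from dataclasses import dataclass

# Abbreviations of the 11 nonfunctional classes listed in the text.
NFR_CLASSES = {
    "US": "usability",
    "L": "legal",
    "SC": "scalability",
    "PO": "portability",
    "MN": "maintainability",
    "A": "availability",
    "SE": "security",
    "LF": "look and feel",
    "FT": "fault tolerance",
    "PE": "performance",
    "O": "operational",
}

FUNCTIONAL = "F"  # assumed code for functional requirements


@dataclass(frozen=True)
class Requirement:
    """One row of the dataset: project ID, requirement text, class."""
    project_id: int
    text: str
    req_class: str

    def is_nonfunctional(self) -> bool:
        # True when the class code is one of the 11 NFR abbreviations.
        return self.req_class in NFR_CLASSES


# Hypothetical row; the project ID and the "PE" class are assumptions.
example = Requirement(1, "The system shall refresh the display every 60 s", "PE")
```

Keeping the class codes in a single lookup table makes it straightforward to add further NFR types when the dataset is extended.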
3.1. Overview of Proposed Novel Methodology
The proposed methodology has several benefits over the existing literature studies. The proposed study maps NFRs to sustainable green IT factors and classifies them within the context of NFRs using the BERT model. Classification accuracy and other evaluation metrics have also been calculated. The details of every part of the proposed study will be discussed in the following sections.
Figure 1 presents the basic workflow of the proposed methodology. The novel contributions of the proposed work are highlighted in red boxes in the figure.
The highlighted boxes in Figure 1 show the major contributions of our proposed research study. In the first box, we extend the PROMISE_exp dataset by adding 61 new requirements from the healthcare system document. After extending the dataset, we map the NFRs to sustainable green IT factors. We form two sustainable green IT groups (socio-economic and eco-technical) by merging four sustainability dimensions (economic, environmental, social, and technical), and we add one column for the binary classification of green IT factors. After evaluating the 18 different NFR types in the dataset, software experts map each NFR to its relevant green IT factor. Each NFR is assessed based on its contribution to the socio-economic or eco-technical group and is labeled according to its nature. For example, one NFR from the dataset is “The system shall refresh the display every 60 s”, which falls under the eco-technical group due to its contribution to the environmental and technical sustainability dimensions. After mapping all NFRs to green IT factors, we apply the BERT model for feature engineering and then compute the classification accuracy of our proposed methodology. The BERT model gives us an accuracy of 0.90 and classifies 403 socio-economic and 149 eco-technical requirements.
Our methodology differs from existing studies. Prior studies have identified hidden sustainability characteristics in the requirement change management process, noted that sustainable software has different dimensions for understanding NFRs in the system, addressed sustainable requirements in the Scrum lifecycle, and addressed the importance of sustainability in the requirements engineering process. In contrast, our proposed study merges the four dimensions of sustainability into two green IT groups and maps all NFRs to those green IT factors according to their context. The details of every part of Figure 1 are given in the following sections.
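The merging of the four sustainability dimensions into two green IT groups described above can be sketched as a simple lookup. This is a minimal illustration, not the experts' actual procedure: the function name `green_it_group` is hypothetical, and raising an error when a requirement touches dimensions from both groups is an assumption standing in for the expert judgment used in the study.

```python
# Two green IT groups formed from the four sustainability dimensions.
DIMENSION_TO_GROUP = {
    "social": "socio-economic",
    "economic": "socio-economic",
    "environmental": "eco-technical",
    "technical": "eco-technical",
}


def green_it_group(dimensions):
    """Return the green IT group for the dimensions an NFR contributes to.

    Raises ValueError for unknown dimensions, for an empty input, or when
    the dimensions span both groups (assumed to need an expert decision).
    """
    unknown = set(dimensions) - set(DIMENSION_TO_GROUP)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    groups = {DIMENSION_TO_GROUP[d] for d in dimensions}
    if len(groups) != 1:
        raise ValueError("no single group; expert decision required")
    return groups.pop()


# The display-refresh requirement contributes to the environmental and
# technical dimensions, so it falls in the eco-technical group.
group = green_it_group(["environmental", "technical"])
```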
3.2. PROMISE_exp Dataset Expansion
We extended the PROMISE_exp dataset as a part of our research contribution and to cover all aspects of software quality attributes. All the steps executed for preprocessing, labeling, and extending the PROMISE_exp dataset are shown as an activity diagram in Figure 2.
We expanded the dataset in two different steps. Firstly, we added seven more types of nonfunctional requirement to the dataset: efficiency (EF), interoperability (IN), reliability (RE), accessibility (AC), reusability (REU), accuracy (ACU), and adaptability (AD). These nonfunctional requirements can also be used to determine the sustainability degree of the software. All the steps shown in Figure 2 are preprocessing steps performed before adding the functional and nonfunctional instances to the repository. In the first stage, we searched for a healthcare system requirement document and retrieved it. After a deeper analysis of the healthcare system requirements, functional and nonfunctional requirements were extracted and classified according to their type. After extraction, experts validated the software requirements by checking their consistency and completeness. After validation, all extracted requirements were added to the PROMISE_exp dataset, which extended it. The expanded PROMISE_exp dataset now contains 1030 requirements distributed over 50 software projects. There are a total of 478 functional and 552 nonfunctional requirements in the extended dataset. The distribution of all types of nonfunctional requirement is presented in Figure 3. Another column was created in the dataset to specify the sustainability factors as a part of our research contribution. At this point, the expanded dataset was unlabeled. The next section describes the labeling process and sustainability factors.
From Figure 3, we can see that the SE (security) NFR has the greatest number of instances (128) in the extended dataset, while AD (adaptability), RE (reliability), IN (interoperability), and REU (reusability) have small numbers of instances. The large number of security requirements indicates that security is a major concern in the PROMISE_exp dataset. Before the extension, the PROMISE_exp dataset contained 969 requirements. We added the 61 new requirements of the healthcare system to the dataset. As security is a major concern in the healthcare system, security requirements form the largest NFR class in the extended dataset. Many NFRs come under the category of security requirements, such as authentication, authorization, integrity, confidentiality, immunity, and auditing [64]. Another reason is the capability of security to influence other NFRs. Security can directly or indirectly affect the performance, efficiency, scalability, portability, usability, and availability of the system [65]. These are the reasons for the large share of security requirements in most software engineering–related datasets.
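The dataset sizes reported in this section can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers given in the text; the variable names are chosen for illustration.

```python
# Figures reported in the text.
original_size = 969      # PROMISE_exp before the extension
added = 61               # new healthcare requirements
functional = 478
nonfunctional = 552
security = 128           # SE instances in the extended dataset

extended_size = original_size + added

# The extended dataset should hold 1030 requirements in total,
# and the functional/nonfunctional split should add up to the same number.
assert extended_size == 1030
assert functional + nonfunctional == extended_size

# Share of security requirements among all nonfunctional requirements.
security_share = security / nonfunctional
print(f"Security share of NFRs: {security_share:.1%}")  # prints 23.2%
```

Security therefore accounts for a little under a quarter of all nonfunctional requirements in the extended dataset.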
3.3. Dataset Labeling
Given the nature of the extended PROMISE_exp dataset, manual labeling was selected. A software domain specialist was selected to analyze each instance and assign it an appropriate label. These labels serve as a good resource to show the context of sustainability factors related to green IT principles. The labeling process is presented in Figure 4.
To map sustainability dimensions to green IT factors, two labels were introduced to describe the categories of green IT factors: “eco-technical” and “socio-economic”. These two labels were used for the semantic classification of NFR text into two categories. The software expert was asked to decide on each unlabeled NFR in the dataset. Software experts analyzed all NFRs and decided whether each NFR contributes to technical or environmental sustainability. For example, the requirement “The application gives output in the designated period” contributes to environmental and technical sustainability by limiting energy and power consumption. After this labeling, the distribution of green IT factors shows that the expanded dataset has more socio-economic than eco-technical labels: 403 socio-economic and 149 eco-technical green IT factors. The evaluation process for labeling the NFRs consists of three steps. Firstly, we check the impact of each NFR on the sustainability dimensions. Then, we evaluate the semantics of each NFR and check its impact on the green IT factor. Finally, we add a new column to label each NFR with a specific green IT factor.
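The final labeling step, adding a column and inspecting the label distribution, can be sketched as follows. The labeling in the study is manual, so the `decide` callable here is only a hypothetical stand-in for the expert's judgment, and the row layout (dictionaries with a `sustainability` key) and the toy rows are assumptions for illustration.

```python
from collections import Counter

VALID_LABELS = {"socio-economic", "eco-technical"}


def label_requirements(rows, decide):
    """Return copies of rows with a new 'sustainability' column.

    `decide` stands in for the expert: it receives one row and returns
    one of the two green IT labels.
    """
    labeled = []
    for row in rows:
        label = decide(row)
        if label not in VALID_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        labeled.append({**row, "sustainability": label})
    return labeled


def label_distribution(labeled_rows):
    """Count how many rows carry each green IT label."""
    return Counter(row["sustainability"] for row in labeled_rows)


# Distribution reported in the text; it sums to the 552 NFRs in the dataset.
reported = Counter({"socio-economic": 403, "eco-technical": 149})
assert sum(reported.values()) == 552
```

A usage example with two toy rows and a trivial stand-in rule:

```python
rows = [
    {"text": "The application gives output in the designated period", "class": "PE"},
    {"text": "The interface shall be easy to learn", "class": "US"},
]
# Toy rule for illustration only; real labels come from the expert.
toy_rule = lambda row: "eco-technical" if row["class"] == "PE" else "socio-economic"
print(label_distribution(label_requirements(rows, toy_rule)))
```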
3.4. Feature Engineering
In this section, we describe the feature engineering that extracts the language features of sustainability NFRs from the extended dataset. There are numerous feature-engineering techniques, but we selected BERT, which was introduced by Google AI. The BERT model captures syntactic relations and text semantics precisely. It performs well in classification, question answering, and other natural language processing tasks [66]. BERT is a pre-trained model with two major training objectives: masked language modeling and next-sentence prediction. These objectives make BERT a popular and adaptable model for text classification in our proposed study. BERT has many advantages. One is its transfer-learning ability. BERT also replaces the traditional pipeline for natural language processing-based systems. Processing time can be reduced, as BERT is less time-consuming than other deep learning models. The BERT model can process long text sentences concurrently and efficiently, and it can understand long sentences at a finer level than other models [67]. These benefits give BERT an advantage over other techniques such as LSTM (Long Short-Term Memory), TF-IDF (Term Frequency–Inverse Document Frequency), and Bag of Words.
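To give an intuition for the masked language modeling objective mentioned above, the toy sketch below hides a fraction of the tokens in a requirement and records the originals as prediction targets. It is a simplification under stated assumptions: it splits on whitespace instead of using BERT's WordPiece tokenizer, always substitutes the `[MASK]` token, and uses a 15% masking rate; the function name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Mask roughly `mask_prob` of the tokens (at least one).

    Returns the masked token list and a dict mapping each masked
    position to the original token the model would have to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_prob))
    positions = rng.sample(range(len(tokens)), n_mask)
    targets = {pos: tokens[pos] for pos in positions}
    masked = [MASK if i in targets else tok for i, tok in enumerate(tokens)]
    return masked, targets


tokens = "the system shall refresh the display every 60 s".split()
masked, targets = mask_tokens(tokens)
```

During pre-training, the model learns to recover the entries of `targets` from the surrounding context on both sides, which is where the bidirectional representation comes from.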
In this study, we used the scikit library for feature extraction and pre-processing tasks. The PyTorch library for Python, a user-friendly library for tensor computation, can also be used. PyTorch is an open-source library that offers compatibility with a broad range of other Python libraries [68].
Figure 5 shows the BERT model training process as an activity diagram. Firstly, the extended dataset is converted to a scikit frame using the scikit library. Then, we load the BERT tokenizer and convert the NFR text into tokens. After tokenization, all labels are converted into numeric values, and the training and testing datasets are formed. Next, the BERT model is used for sequence classification. After the classification trainer is created and run, the accuracy of the BERT model is calculated.
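The training steps listed above can be sketched in Python as follows. This is an illustrative outline rather than the study's actual code: it assumes the Hugging Face `transformers` library with PyTorch, the `bert-base-uncased` checkpoint, an 80/20 split, and the hyperparameters shown, none of which are stated in the text. The helper names (`encode_labels`, `split_rows`, `fine_tune_bert`) are hypothetical, and the heavy imports are deferred so that the label-encoding and splitting helpers can be used on their own.

```python
import random

# Numeric encoding of the two green IT labels (ordering is an assumption).
LABEL_TO_ID = {"socio-economic": 0, "eco-technical": 1}


def encode_labels(labels):
    """Convert text labels into numeric values."""
    return [LABEL_TO_ID[label] for label in labels]


def split_rows(texts, label_ids, test_fraction=0.2, seed=42):
    """Shuffle and split into (train_texts, train_ids), (test_texts, test_ids)."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))

    def pick(idx):
        return [texts[i] for i in idx], [label_ids[i] for i in idx]

    return pick(indices[n_test:]), pick(indices[:n_test])


def fine_tune_bert(train, test, model_name="bert-base-uncased"):
    """Fine-tune BERT for binary sequence classification; return test accuracy."""
    import torch
    from transformers import (BertForSequenceClassification, BertTokenizerFast,
                              Trainer, TrainingArguments)

    tokenizer = BertTokenizerFast.from_pretrained(model_name)

    class NFRDataset(torch.utils.data.Dataset):
        def __init__(self, texts, label_ids):
            # Tokenize the NFR text; pad so every example has equal length.
            self.enc = tokenizer(texts, truncation=True, padding=True,
                                 max_length=128)
            self.label_ids = label_ids

        def __len__(self):
            return len(self.label_ids)

        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.label_ids[i])
            return item

    model = BertForSequenceClassification.from_pretrained(model_name,
                                                          num_labels=2)
    args = TrainingArguments(output_dir="bert_nfr_out", num_train_epochs=3,
                             per_device_train_batch_size=16)
    test_ds = NFRDataset(*test)
    trainer = Trainer(model=model, args=args,
                      train_dataset=NFRDataset(*train), eval_dataset=test_ds)
    trainer.train()

    # Accuracy: fraction of test requirements whose predicted label is correct.
    predicted = trainer.predict(test_ds).predictions.argmax(axis=-1)
    correct = sum(int(p == y) for p, y in zip(predicted, test[1]))
    return correct / len(test[1])
```

A typical call would be `train, test = split_rows(texts, encode_labels(labels))` followed by `fine_tune_bert(train, test)`, where `texts` and `labels` hold the NFR text and sustainability labels of the extended dataset.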