1. Introduction
Performance testing is a widely used concept that is considered in the design process of information systems and of applications in particular. Precise determination of application performance is becoming increasingly important for economic, computing resource reservation, and reliability reasons. For services using complex server systems with implemented vertical and horizontal scaling functions, the lack of precisely defined performance thresholds is not a critical problem. Increasingly, however, applications, including critical systems, are running on Internet of Things (IoT) devices that have limited hardware resources. Traditional programming paradigms may not be sufficient to effectively manage limited resources such as CPU and RAM. Therefore, more attention should be paid to developing techniques that accurately determine parameters such as application performance and the load placed on individual hardware resources depending on the number of user queries, response size, database queries, etc.
Initial research work showed that the number of tools used in performance testing is very large, and new ones are being developed all the time, both commercial and distributed under open-source licenses. However, the work identified a problem related to the precise definition of the term “performance test”. In the literature, we can find many different definitions of this concept and many different types of this test. Related to this situation is a much more serious problem: the possibility of comparing the results of performance tests of an application installed in different software and hardware environments and performed by different entities. Currently, there are no consistent methods available that describe how to perform such comparative tests. Thus, based on the analysis of the current state of knowledge and the gaps that were identified, the authors proposed a new model for defining and implementing performance tests. The main objectives and the authors’ own contributions of this article are as follows:
Conceptual ordering related to the implementation of performance tests.
Development of a coherent model that allows the realization of comparative tests, taking into account the possibility of analyzing individual system components in both software and hardware aspects.
Testing of the developed solution in real conditions.
This paper is organized as follows:
Section 1 provides an introduction and defines the scope of the work conducted.
Section 2 reviews the scientific and specialized literature related to the area of conducted research and identifies gaps in the current state of knowledge.
Section 3 formulates the problem and proposes a new model for the definition of performance tests.
Section 4 presents the test results of the developed model.
Section 5 concludes the article.
2. Literature Review
As part of the preliminary work, it was decided to check how the performance tests currently described in scientific publications, as well as in the professional and specialized literature, are implemented. The first step was a detailed analysis of the leading publishers and databases of scientific publications: IEEE, Springer, Google Scholar, Elsevier, MDPI—Multidisciplinary Digital Publishing Institute, ScienceDirect, Web of Science, and Scopus. The following keywords were used in the search: performance testing, performance test application, performance methodology, stress testing, load testing, application stress test, web application stress test, web server stress test, performance stress test technology, application bottleneck, web server bottleneck, distributed system performance test, web request performance, TCP performance test, and formal software testing.
The paper [
1] presents an analysis of various performance testing methodologies and best practices used to evaluate the performance of software systems, including the importance of testing in detecting bottlenecks and optimizing resource utilization. Various performance tests such as load tests, stress tests, and scalability tests are described. The importance of designing appropriate and realistic test cases, including but not limited to the use of real user data, is emphasized to assure a better and more accurate performance evaluation. The possibilities of using modern tools and technologies in performance testing were analyzed. The importance of integrating continuous testing into the software development and operation cycle was emphasized, which is key to early identification of performance regressions and facilitates rapid troubleshooting. Also highlighted are the challenges and limitations of performance testing, including the complexity of predicting user behavior, the dynamic nature of distributed systems, and the need for effective analysis of test results. The article focuses mainly on describing test data and solutions. However, it does not include an analysis of real tests conducted by the authors in laboratory or test conditions. The authors of the article [
2] presented a study of the impact of Asterisk server hardware configuration on VOIP Quality of Service (QoS). They gradually loaded the server with massive calls and checked the performance of the CPU and RAM. They also scanned network packets and monitored call quality. They tried to examine the threshold value for the number of bulk connections generated by the hardware configuration in order to guarantee good QoS. The article is strongly practical in nature. However, it lacks a detailed description of the test scenarios themselves. In the publication [
3], a tool called Testing Power Web Stress was developed, providing the ability to implement extreme server testing along with the ability to analyze the server response time in the course of executing a given transaction. Three functional areas were created for testing: within the Local Area Network (LAN), from the computer to the Wide Area Network (WAN), and from the computer to the cloud network and then to the WAN environment. In order to verify the accuracy of the function, a free network load test tool called Pylot was used as a benchmark. Both of the tools used recorded response times; however, they also encountered limitations, such as the inability to analyze different computer specifications and the inability to host a website using Hypertext Transfer Protocol Secure (HTTPS). The article limits the range of tests considered; e.g., no server throughput tests were conducted. There is also no information on the consumption of server resources during the tests. The article [
4] compares the performance of network testing tools such as Apache JMeter and SoapUI. The key objectives of the article were to evaluate the key technological differences between Apache JMeter and SoapUI and to examine the benefits and drawbacks associated with the use of these technologies. Ultimately, the authors came to the following conclusions: compared to SoapUI, JMeter is more suitable for client–server architecture. JMeter handles a higher load of HTTP user requests than SoapUI; it is also open source, and no infrastructure is required for its installation. The tool is based on Java, so it is platform independent. JMeter can be used to test the load of large projects and generates more accurate results in graphs and Extensible Markup Language (XML) formats. The article lacked a representation of the test environment. It also did not indicate the ability to track the occupancy of server resources during testing. Publication [
5] proposes a performance testing scheme for mobile applications based on LoadRunner, an automated testing tool developed in the C programming language. The solution supports performance testing of applications in C, Java, Visual Basic (VB), JavaScript, etc. The test tool can generate a large number of concurrent user sessions, or concurrent users, to realize a concurrent load. Real-time performance monitoring makes it possible to perform tests on the business system and, consequently, to detect bottlenecks, e.g., memory leaks, overloaded components, and other system defects. The tests performed are well described, but the paper does not provide a general scheme for conducting the test or exact methods for tracking key resources of the server under test. The authors of the article [
6] describe application performance analysis using Apache JMeter and SoapUI. Apache JMeter is pointed out as a useful tool for testing server performance under heavy load. SoapUI, on the other hand, verifies the quality of service of a specific application under variable load. The authors described how load tests should be designed and created a table showing how to identify whether a test is a load test according to various characteristics. The paper [
7] analyzed the impact of more realistic dynamic load on network performance metrics. A typical e-commerce scenario was evaluated and various dynamic user behaviors were reproduced. The results were compared with those obtained using traditional workloads. The authors created their test model and showed what their network topology looks like. To define the dynamic workloads, they used the Dweb model, which allows them to model behaviors that cannot be represented with such accuracy using traditional approaches. The authors found that dynamic workloads degraded server performance more than traditional workloads. The authors of the article [
8] aimed to present a method for estimating server load based on measurements of external network traffic obtained at an observation point close to the server. In conducting the study, the time between the TCP SYN and SYN + ACK segments on the server side was measured. It was found that the server response time (SRT) varies throughout the day depending on server load. The concept introduced in paper [
9] describes an AI-based framework for autonomous performance testing of software systems. It uses model-free reinforcement learning (specifically Q-learning) with multiple experience bases to determine performance breaking points for CPU-intensive, memory-intensive, and disk-intensive applications. This approach allows the system to learn and optimize stress testing without relying on performance models. However, the article primarily focuses on describing the framework and its theoretical use, without providing practical examples of its implementation. Publication [
10] presents an analysis of indicators and methods for web performance testing and puts forward testing processes and methods to optimize the strategy. The paper has the merit of providing a detailed definition of stress and load testing and a number of strategies for optimizing test performance. However, its focus is strictly on web application testing, which limits the broader applicability of the tests. Article [
11] demonstrates the application of model-driven architecture (MDA) to performance testing. The authors do this by extending their own tool to generate test cases capable of checking performance-specific behavior of a system. However, as the authors themselves acknowledge, the method requires further development, as it does not provide insight into identifying bottlenecks or into how different system resources are utilized. It merely assesses overall system performance, indicating the percentage of tests that passed or failed for a given number of test executions. Paper [
12] shows an example of analyzing the behavior of a system in its current server environment and then optimizing the configuration of the service and the server, with JMeter as the performance testing tool. The researchers focus on testing application performance with load testing and stress testing. The paper presents a single specific example of testing on one system, which limits its universality. Article [
13] presented an approach to load testing of websites based on stochastic models of user behavior. Furthermore, the authors described the implementation of load testing in a visual modeling and performance testbed generation tool. However, the article is limited to load testing only. It does not describe other types of tests. The example only applies to websites. In work [
14], the authors apply load and stress testing to software-defined networking (SDN) controllers. In essence, the article focuses only on answering the question “How much throughput can each controller provide?”. While the article explains why the performance tests were performed and briefly describes the methodology, it does not explain why the specific tests were chosen, nor does it delve into the nature of the testing methodology. The article is highly technical, focusing on test results, but lacks justification for the specific testing approaches used.
The scientific papers presented above on performance testing have often been characterized by an overly general approach or limited to a few specific tools, such as JMeter, without a thorough presentation of the methodology and specific examples of its broader application. Some relied only on brief descriptions of the tests in question, while others mentioned what the tests were about and showed a brief example of how to use them in practice or how to analyze the results obtained. In addition, the works mainly focused on two types of tests: stress and load tests. Further, specialized industry sources, technical reports, and materials from manufacturers of performance testing solutions were analyzed. In reports [
15,
16,
17] issued by IXIA, one can find descriptions of test parameter settings and a list of possible load tests with simple usage examples. The official user’s guide to IxLoad [
18], which contained similar information, was also analyzed. In addition, a number of articles on load testing were found in the online literature. In [
19], the definition of stress testing was presented and the most important aspects were briefly described, such as why to do such tests, what to pay attention to when testing, possible benefits, etc. However, the descriptions, metrics, and examples presented were very vague; e.g., the test presented assumed saturation of server resources, but no explanation of how to achieve this was given. The text is limited to stress tests only. Reference [
20] also describes in general terms what stress testing is and presents two metrics: the Mean time between failures (MTBF) and the Mean time to failure (MTTF). As before, these metrics are too general, as they assume averaging of time. Article [
21] describes the differences between the different tests and proposes several testing tools, such as JMeter, but does not provide examples of their use. It is also very vague about metrics. Similarly, item [
22] does not present examples or define relevant metrics to describe test results. Items [
23,
24] are limited only to defining what stress tests are and explaining the benefits of using them. Item [
25], compared to the previous two, does at least present a list of stress testing tools with their characteristics, but no examples of their use. In [
26], stress tests based on a JavaScript script that simulates the work of a certain number of users for a set period of time are presented. Item [
27] described different types of tests and proposed the BlazeMeter tool for conducting them but presented only a cursory example of its use.
In summary, the literature items presented provided only general descriptions of load-type tests (stress and load tests) and of the differences between them and performance tests. These works pointed to exemplary tools, such as JMeter, without a detailed explanation of the methodology, a description of specific metrics, or examples of application. Among the main problems identified during the literature analysis and from the authors’ experience, we can highlight the following:
(1) The lack of official standards, including tools or testing methodologies, results in a lack of uniform test scenarios, leading to divergent approaches to testing systems in practice. When there are no clearly defined standards or guidelines for tools or test methodologies, different teams may use different approaches. In addition, the lack of consistent guidelines can result in tests not being conducted in a complete and reliable manner, jeopardizing the reliability of the results and the effectiveness of the system evaluation. For example, a testing team focusing only on stress testing at the expense of load testing may overlook memory leakage issues.
(2) There is a noticeable lack of experience in test preparation teams. Failure to include in a test scenario all the key aspects and variables that occur in a real operating environment creates the risk that the system may perform well under test conditions but fail to meet requirements in real-world applications. For example, skipping stress testing may result in overlooking anomaly-handling problems such as a sudden increase in the number of users.
(3) Subjectivity in performance evaluation. When there are no clearly defined and objective evaluation criteria, evaluators may rely on their own beliefs, experiences, and intuition, which in turn leads to uneven and potentially unfair assessments. Subjectivity in interpreting results can also create conflicts and misunderstandings among different stakeholders, who may have different expectations and priorities. As a result, decision-making based on such results becomes less transparent and more prone to error and manipulation.
(4) There is also a noticeable lack of diversity in performance testing, which in practice translates into being limited to only a few types of tests. This can lead to an incomplete understanding of system performance and its behavior under varying conditions. When only stress tests and load tests are described, other important aspects of performance testing that can provide valuable information about the system are omitted. For example, information on scalability, throughput, and endurance tests is very often missing.
(5) In published works and reports, we are very often confronted with a very elaborate description of the assumptions of the tests and the tools used, while only a theoretical description of the tests is given, without an example of their use in practice. The analyzed works lack detailed descriptions of test preparation and implementation. This stage is very often overlooked, yet it is crucial for the results obtained.
4. Experiment and Discussion
This section presents the results of applying the above test scheme to some of the most common types of load tests carried out for two real systems (web servers)—the first, without most additional services such as DNS, running under laboratory conditions, and the second running under production conditions.
All tests were performed using the IXIA Novus One traffic generator and the dedicated IxLoad software. The application operates on layers 4–7 of the ISO/OSI model, which makes it possible to test the efficiency of support for specific stateful protocols and to examine the server’s response under fixed-bandwidth traffic. The parameters of the generator used are shown below:
General specifications: OS: Ixia FlixOS Version 2020.2.82.5; RAM: 8 GB; Hard disk: 800 GB; 4 × 10 GBase-T-RJ-45/SFP+; maximum bandwidth: 10 Gbit/s; complies with standards: IEEE 802.1x, IEEE 802.3ah, IEEE 802.1as, IEEE 802.1Qbb, IEEE 1588v2, IEEE 802.1Qat.
Protocols: data link: Fast Ethernet, Gigabit Ethernet, 10 Gigabit Ethernet; network/transport: TCP/IP, UDP/IP, L2TP, ICMP/IP, IPSec, iSCSI, ARP, SMTP, FTP, DNS, POP3, IMAP, DHCPv6, NFS, RTSP, DHCPv4, IPv4, IPv6, SMB v2, SMB v3; routing: RIP, BGP-4, EIGRP, IGMP, OSPFv2, PIM-SM, OSPFv3, PIM-SSM, RIPng, MLD, IS-ISv6, BGP-4+, MPLS, IS-ISv4; VoIP: SIP, RTP.
4.1. Scenario 1
In the first case, an example of the use of the model bypassing the Network Layer is demonstrated, i.e., . The performance test model without a network layer is useful in scenarios where the objective is to directly evaluate the performance of a server or application (let’s denote this as ). In order to achieve the objective, it was decided to run the following types of tests (elements of the set):
—The Connections per Second test (CPS) provides a rough performance metric and allows for determining how well the server can accept and handle new connections. The test aims to establish the maximum number of connections (without transactions) required to maximize CPU resource utilization.
—The throughput test determines the maximum throughput of a server. It should be noted, however, that this refers to the throughput of the application layer and not to a general measurement of throughput as the total number of bits per second transmitted on the physical medium.
In addition, the following elements of the sets were defined:
For : —TCP/UDP Packet Generators, —Servers: Web, Database; —OS Built-in Monitoring Tools.
For : , —HTTP Request Generators, , .
The test notation for the scenario 1 target is as follows:
In this example, the test object was an html page. It was hosted on an Apache 2.4 server. Services such as DNS, SSL, or firewall (possible bottlenecks) were also disabled. The system was hosted on a server with the following parameters: OS: Debian 12; RAM: 8 GB; Hard disc: 800 GB; CPU: Intel Pentium i5. The topological pattern of the test is shown in
Figure 6. In addition, all data were collected on a dedicated desktop computer, which acted as the DATA COLLECTOR (DC) and DATA ANALYZER (DA) (see
Figure 3).
During the test, TCP connections were attempted and HTTP requests were sent using the GET method to an example resource on the server, in this case a web page named index.html (size: 340 bytes). The web page was almost completely devoid of content. As can be seen in
Figure 6, the system is devoid of any additional devices; communication takes place directly between the traffic generator and the server. Network traffic passes through a 1 Gbit/s network interface. The built-in resource usage monitor of Debian 12 was used as the monitoring agent.
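To illustrate the role of the monitoring agent, a minimal sketch of such an agent is given below. It is not the tool used in the experiments (the built-in Debian monitor was used there); the psutil dependency, output file name, and sampling parameters are assumptions made for the illustration only.
# Hypothetical monitoring agent (a sketch, not the authors' tool): it samples
# CPU, RAM, and disk usage once per second and appends the values to a CSV
# file that the DATA COLLECTOR / DATA ANALYZER host can ingest after the run.
import csv
import time

import psutil  # assumed third-party dependency


def monitor(output_path: str, duration_s: int = 120, interval_s: float = 1.0) -> None:
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "cpu_percent", "ram_percent", "disk_percent"])
        end = time.time() + duration_s
        while time.time() < end:
            writer.writerow([
                time.time(),
                psutil.cpu_percent(interval=None),   # CPU utilization since the previous call
                psutil.virtual_memory().percent,     # RAM utilization
                psutil.disk_usage("/").percent,      # root filesystem usage
            ])
            f.flush()
            time.sleep(interval_s)


if __name__ == "__main__":
    monitor("server_resources.csv")
A real deployment would typically also record disk and network I/O counters, but the sketch is sufficient to show how per-second resource samples can be collected for later analysis.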
Test
. Below (
Figure 7) is a graph of the dependence of the number of connections per second on time and a report (
Figure 8) for the CPS test (
).
As can be read from
Figure 7, the web server was able to handle 30,000 connections per second with a success rate of 92.57% (3,887,403 transactions out of 4,199,327 total attempts were made)—
Figure 8. Such results mean that a limit has been reached on the number of connections that the web server is still able to make without a complete communication breakdown. These results are within expectations.
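The principle behind the CPS measurement can be reproduced on a small scale with the following Python sketch, which repeatedly opens a new TCP connection, issues one HTTP GET for the test resource, and counts completed connections per second. The host, port, and path are assumed values; the rates reported above require a dedicated traffic generator such as the IXIA Novus One, not a single-threaded script, so this is an illustration of the measurement idea only.
# Minimal, single-threaded sketch of the CPS idea: count how many new
# connections complete (connect + one GET) per second. Illustrative only.
import socket
import time

HOST, PORT, PATH = "192.0.2.10", 80, "/index.html"   # assumed test server and resource
REQUEST = f"GET {PATH} HTTP/1.1\r\nHost: {HOST}\r\nConnection: close\r\n\r\n".encode()


def cps_probe(duration_s: int = 10) -> float:
    ok = 0
    start = time.time()
    while time.time() - start < duration_s:
        try:
            with socket.create_connection((HOST, PORT), timeout=1.0) as s:
                s.sendall(REQUEST)
                s.recv(1024)          # read the start of the response
                ok += 1               # count only completed connections
        except OSError:
            pass                      # refused or timed-out attempts are not counted
    return ok / duration_s


if __name__ == "__main__":
    print(f"approx. connections per second: {cps_probe():.1f}")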
Test . In the next step, the maximum supported throughput was tested (the throughput test).
As can be seen in
Figure 9, almost the entire 1 Gbit/s communication bandwidth was used on the test server, maintaining almost 100% communication correctness (1,123,294 transactions out of 1,123,473 total attempts; see
Figure 10). These results are within expectations.
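The application-layer character of this throughput measurement can be illustrated with the short sketch below: it counts only HTTP payload bytes received within a fixed time window and converts them to Mbit/s, ignoring framing and retransmissions on the wire. The URL and window length are assumptions, and the standard-library urllib client will not saturate a 1 Gbit/s link; the sketch mirrors the definition of the test, not the tooling used.
# Sketch of an application-layer (layer-7) throughput measurement:
# total HTTP payload bytes received in a fixed window, expressed in Mbit/s.
import time
from urllib.request import urlopen

URL = "http://192.0.2.10/index.html"   # hypothetical test resource


def goodput_mbps(window_s: int = 10) -> float:
    received = 0
    start = time.time()
    while time.time() - start < window_s:
        with urlopen(URL, timeout=2.0) as resp:
            received += len(resp.read())   # count application-layer payload bytes only
    return received * 8 / window_s / 1_000_000


if __name__ == "__main__":
    print(f"approx. application-layer throughput: {goodput_mbps():.2f} Mbit/s")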
4.2. Scenario 2
Scenario 2 demonstrates an example of the use of a model that considers all three layers, i.e., . This model of performance testing is useful in scenarios where the goal () is to assess the resilience of the system when operating under heavy load. In order to meet the objective, it was decided to run the following types of tests (elements of the set):
—Described in scenario 1.
—Described in scenario 1.
—The concurrent connection (CC) test allows for determining the maximum number of active simultaneous TCP sessions that the server is able to maintain. In other words, it provides an answer to the question of how many concurrently active sessions the server is able to maintain before it runs out of memory.
In addition, the following elements of the sets were defined (only those elements of the sets that were not defined in scenario 1 will be described):
For : , —Routers, Switches, Firewalls, —Data Analysis Tools: Grafana,
For : , , ,
For : , ,
The test notation for the scenario 2 target is as follows:
Including the network layer in the performance test model allows for a more realistic simulation of actual system operating conditions. Such a model takes into account the impact of network devices and network topology on server and application performance.
In the second scenario, the test object was again a web server, but this time a much more complex system, as it was a working e-learning platform. In order to best replicate real-life conditions, the tests were performed with a DNS server and an installed SSL certificate. The web server functions are provided by two virtual machines. The machines run on a server managed by the ESXi hypervisor. The first machine hosts the Moodle platform, while the second hosts the PostgreSQL database associated with Moodle. The pattern of the test topology for scenario 2 is shown in
Figure 11.
As can be seen in
Figure 11, the system is much more complex than the previous one, because the system chosen for testing is as close as possible to real systems. This time, network traffic is not generated directly to the server, but instead passes through an intermediate network layer (a real campus network loaded with normal user traffic).
Test
. Below (
Figure 12) is a graph, generated by the IxLoad software, showing the number of connections per second (CPS) over time.
Test on the e-learning platform showed (see
Figure 12) that the maximum number of connections per second the server is able to handle is around 450–500 (this value stabilizes after approximately 40 s). These results are well below expectations.
In addition, it can be seen in
Figure 13 that after about 40 s there is a sharp increase in the number of concurrent connections, which is a sign of abnormal communication with the server.
As can be read from
Figure 14, out of 280,665 HTTP requests, a correct response was received for 10,486 attempts (2xx), while 270,177 requests were rejected with an error (5xx abort). This error usually means that the server’s resources have been exhausted. The CPU, RAM, and disk usage percentages on the virtual machines with Moodle and the database are shown below.
As can be seen in
Figure 15, around the 40th second there is a 10% increase in RAM usage (approximately 13 GB) due to the increase in concurrent connections. While the overall resource usage on the Moodle machine does not exceed 50%, on the database VM the CPU usage from around the 40th second is 100% and the disk usage is around 75–80% (
Figure 16).
Taking this into account, the test was repeated, but this time limiting the number of generated connections to 400 per second to determine the maximum CPS value at which the server still maintained 100% correct communication. The limit of 400 connections per second was chosen based on the observation that the database VM’s CPU usage reached 100% and RAM usage increased significantly during the initial test. By reducing the connection rate, we aimed to identify the threshold at which the server could handle the load efficiently without reaching critical resource limits. Additionally, this approach allowed us to verify whether the performance issues consistently occur above 400 CPS or if this value was an anomaly.
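A minimal sketch of the pacing idea behind the limited test is given below: connection attempts are scheduled on a fixed 1/400 s grid so that the offered load does not exceed 400 CPS. In the experiment this limit was configured in IxLoad; the host, port, and duration below are assumptions, and a single-threaded script of this kind only illustrates the pacing logic rather than generating the actual load.
# Sketch of pacing connection attempts to a fixed rate (here 400 CPS).
import socket
import time

HOST, PORT = "192.0.2.20", 443   # assumed address of the system under test
TARGET_CPS = 400


def paced_attempts(duration_s: int = 30) -> int:
    interval = 1.0 / TARGET_CPS
    completed = 0
    next_slot = time.monotonic()
    deadline = next_slot + duration_s
    while time.monotonic() < deadline:
        # Sleep until the next scheduled slot so attempts stay on the 400/s grid.
        delay = next_slot - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        next_slot += interval
        try:
            with socket.create_connection((HOST, PORT), timeout=1.0):
                completed += 1
        except OSError:
            pass                  # failed attempts do not count as completed connections
    return completed


if __name__ == "__main__":
    print("completed connections:", paced_attempts())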
At this point, it should be noted that the
test was performed with different call values for the
generator, which was defined as TCP/UDP Packet Generators. It is possible to define more precisely the generator used along with its call parameters, e.g.,
—TCP/UDP Packet Generators (without CPS limit),
—TCP/UDP Packet Generators (with a 400 CPS limit), in which case the test description could be written as follows:
The level of detail in describing a given testing process depends on the requirements defined by the project team.
Figure 17 shows a graph of the time dependence of the CPS for the repeated test: 400 CPS, i.e., as assumed before, while the number of concurrent connections (
Figure 18), although lower than in the first test, is still relatively high. This means that the server load is extremely close to resource saturation and, although 100% correct packet transmission was obtained, there are already delays (hence the relatively high number of concurrent connections). Admittedly,
Figure 19 shows a 100% success rate for 250,007 HTTP requests, and an analysis of resource usage on the Moodle virtual machine (see
Figure 20) shows that the server still has a significant amount of free computing resources, but
Figure 21 also shows that CPU usage on the database VM is fluctuating around 95%. In summary, 400 connections per second is the extreme value that the server is able to handle without packet loss.
Test . The CC test was performed as the next step.
As can be seen in
Figure 22, the maximum number of concurrent connections obtained is approximately 2200. RAM consumption increased (
Figure 23) by approximately 10% (13 GB), but the result is once again significantly lower than expected. Analyzing
Figure 24, it can be seen that the CPU consumption on the database VM has again reached 100%.
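The mechanism behind the CC test can be sketched as follows: TCP sessions are opened one after another and deliberately kept open, and the script reports how many it managed to hold simultaneously before a connection attempt failed. This is only an illustration under assumed host, port, and attempt-limit values; client-side limits (ephemeral ports, file descriptors) and the lack of application traffic mean it does not replace the IxLoad CC test used here.
# Sketch of the concurrent connection (CC) idea: open sessions, keep them
# open, and report how many are held simultaneously.
import socket

HOST, PORT = "192.0.2.20", 443   # assumed address of the system under test
MAX_ATTEMPTS = 5000


def concurrent_connections() -> int:
    held = []
    try:
        for _ in range(MAX_ATTEMPTS):
            try:
                s = socket.create_connection((HOST, PORT), timeout=1.0)
            except OSError:
                break             # server (or client) refused further sessions
            held.append(s)        # keep the socket open to hold the session
        return len(held)
    finally:
        for s in held:
            s.close()


if __name__ == "__main__":
    print("sessions held open simultaneously:", concurrent_connections())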
Test . In the last step, the maximum supported throughput was tested (the throughput test).
As can be seen from
Figure 25, the maximum throughput is 112 Mbps. For this throughput, the correctness of communication is almost 100% (
Figure 26). However, as in the previous tests, this value is lower than expected. Again, as can be seen in
Figure 27, there are still a lot of free resources left on the virtual machine with Moodle, while on the virtual machine with the database the CPU resources have been fully saturated (
Figure 28).
The experiments carried out in this section have shown how to use the proposed model to implement the tests. These tests were performed on two different web servers in order to evaluate their performance and identify potential problems associated with their operation. The first stage of testing was carried out on a simple web server, where the results were as expected. This server achieved relatively high CPS and throughput values, confirming that under laboratory conditions the tests prepared on the basis of the proposed framework work correctly and effectively measure server performance.
The second stage of testing included a system operating in production conditions. The tests showed a decrease in performance with increasing intensity of the generated loads. CPS, CC, and throughput values were much lower than expected, which suggests the existence of performance problems at the configuration level or at the system level. Further analysis revealed that the main cause of these problems was the saturation of processor resources on the virtual machine on which the database runs. The use of the proposed model turned out to be crucial in preparing the testing process and in the interpretation of results by different teams working on different components, which allowed for a relatively quick and precise diagnosis.
Table 1 compares the key characteristics of the proposed layered model with the classical approach to performance testing.
The comparative areas were selected on the basis of the literature analysis conducted in Section 2, with particular emphasis on the areas of concern identified in that analysis ((1)–(5)). As can be read from the table above, the new approach is distinguished from the classic approach by several features. Firstly, in the new model presented, tests are organized in layers corresponding to different stages of preparation and execution, allowing a clear separation of tasks. In contrast, the classical approach is typically linear and handled by a single team, which limits efficiency. Additionally, we use a precise objectives-tests matrix that aligns specific goals with appropriate tests, ensuring better coverage and focus. The classical approach relies more on general practices and experience, which may result in less targeted testing. The layered model is also more modular, allowing easy adjustments to meet the unique needs of each system. The classical approach is less adaptable, making it harder to adjust during different project phases. By isolating tasks, the layered model enables different teams to work concurrently on separate layers, optimizing both time and cost. Moreover, the clear steps and structure of our method make it easier to replicate across various projects, while the classical approach relies heavily on team experience, which makes replication more challenging. In terms of cost efficiency, the layered specialization of the model and targeted task assignment help optimize resource usage, leading to potential cost savings. Of course, the initial stage related to model definition can be costly for simple tests, but in the long run it has a cost-reducing effect. However, the classical model holds an advantage in simplicity and faster deployment for straightforward projects. In cases where the system is less complex and testing requirements are standard, the classical approach may be more intuitive and easier to implement, as it relies on well-established and proven practices. This comparison highlights the strengths and limitations of both methods, providing context for why the new approach is better suited for complex and evolving IT environments.
During our work, we also analyzed the possibility of using formal methodologies based on UML2 and SysML for test descriptions. They largely focus on the detailed definition of functional tests, including detailed analysis of the correlations between the components of a given subsystem, and on defining in great detail the requirements for specific parameters (including performance requirements and the availability of individual components), e.g., [
38,
39,
40]. From the point of view of the proposed model, these tests can be viewed as individual elements of a set
, while the approach proposed in the article can be treated as a metamodel. On the other hand, the direct use of systems modeling language models to design and describe a whole range of tests is hampered by the lack of direct support for performance test procedures and the need to apply a complex process of converting the notation used into formats that are understood by testers or can be used to prepare test tool configurations [
41]. The requirements defined in the model-based systems engineering (MBSE) approach are generally functional in nature, and using them to describe performance tests requires considerable expertise in the selection of testing methods and means [
42]. In our proposed model, tests are defined together with an indication of the tools for their implementation, and the model focuses on performance testing, which plays an increasingly important role in deploying small and medium-sized systems in production conditions. The process of defining a system model for MBSE is complex and time-consuming, requires expertise, and is often reserved for large, complex IT systems specific to areas such as aerospace, automotive, defense, and telecommunications.
5. Conclusions
Performance testing of applications and hardware components used to build IT systems is crucial to their design, implementation, and subsequent operation. At present, there are no consistent methods that precisely distinguish between different types of tests and define how to conduct them. This contributes to difficulties in the process of test implementation and the interpretation of their results. In this paper, based on the analysis of the literature, a number of problems related to the implementation of performance tests have been identified. Then, a model was proposed which, in four steps, makes it possible to precisely define the goal of testing and the methods and means to achieve it. The proposed solution can be seen in terms of a design framework. In the course of implementing the developed model, engineering teams can independently define the individual elements of the sets and determine the topology used for test execution. The proposed approach makes it possible to minimize the impact of the problems defined in Section 2 ((1)–(5)) on the efficiency and accuracy of the implemented tests, in particular in the areas of precise definition of test scenarios, their differentiability, and their repeatability. In addition, dividing the testing process into layers correlated with the test preparation steps makes it possible to separate quasi-independent areas, which can be handled by specialized engineering teams or outsourced to external companies. Such an approach both accelerates the process of performance test execution and can reduce the cost of its implementation.
The developed model was used to plan and conduct tests in a laboratory environment and in a real web application environment. As a result of the performance tests, bottlenecks in the system were identified, as well as the key parameters that affect system performance. In addition, the team of developers creating the solution in production was able to understand exactly how the tests were conducted and which system parameters were tested and how, which resulted in faster development of fixes and optimization of system performance. Furthermore, using the proposed solution, the team of developers could quickly develop a new type of test and, using the developed notation, communicate its description to the testing team.
During the research, it became apparent that the correct definition of the elements of the sets is especially important for the effective use of the proposed solution. This requires the team to have experience and extensive knowledge of test execution and of the operation of the hardware and software components that make up each layer. Therefore, further work will focus on developing a reference set of elements that can be used by less experienced teams in the process of test planning and execution.