Article

Studying the Efficiency of the Apache Kafka System Using the Reduction Method, and Its Effectiveness in Terms of Reliability Metrics Subject to a Copula Approach

by Elsayed E. Elshoubary 1 and Taha Radwan 2,3,*
1 Basic Science Department, El-Ahram Institute for Engineering and Technology, Cairo 12573, Egypt
2 Department of Management Information Systems, College of Business and Economics, Qassim University, Buraydah 52571, Saudi Arabia
3 Department of Mathematics and Statistics, Faculty of Management Technology and Information Systems, Port Said University, Port Said 42521, Egypt
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6758; https://doi.org/10.3390/app14156758
Submission received: 23 June 2024 / Revised: 29 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024

Abstract

This research envisages a system composed of three subsystems connected in series. Each subsystem comprises three units connected in parallel. For the system to function, at least one unit per subsystem must remain operational. Unit failure is governed by an exponential distribution, while unit repair is governed by either a general distribution or a Gumbel–Hougaard family copula distribution. The primary goal of this research is to compare the overall performance of our system under these two different regimes for performing repairs. Laplace transforms and supplementary variable methods are employed in solving the system. Our metrics for evaluating system performance are the availability, reliability, mean time to failure, and cost. The second goal of this research is to showcase a strategy for reduction that enhances the overall efficiency and availability of our system.

1. Introduction

In the realm of distributed systems, effective and rapid communication between components is essential. Apache Kafka emerges as a favored solution for fulfilling this need thanks to its ability to provide speedy transmission, handle large volumes of data efficiently, and ensure system resilience in the face of failures. For any mission-critical application that engages in a cycle of ’read–process–write’ operations and requires the guarantee of processing each step exactly once, transactions become essential. In Apache Kafka, transactional capabilities are built upon the foundation of durability settings such as acknowledgments, producer idempotence, and consumer offset commit strategies, ensuring reliability and consistency in data processing. The architecture from a macroscopic perspective comprises entities known as Producers, which generate messages destined for an Apache Kafka cluster, and consumers, which retrieve and process these messages. This cluster, consisting of multiple brokers, serves as the backbone of the system, with each broker housing specific partitions corresponding to different topics. The orchestration of these brokers is overseen by the Apache Zookeeper server, tasked with the management and coordination of their activities. The advantages of Apache Kafka systems continue to be substantial.
For instance, Apache Kafka leverages the file system for message storage and caching. With advancements in disk performance in recent times, modern operating systems now incorporate techniques such as read-ahead and write-behind, enabling the prefetching of data in larger blocks and consolidating smaller logical writes into larger physical ones. Additionally, there has been a significant increase in the utilization of main memory for disk caching. In this context, a contemporary operating system efficiently allocates all available free memory to disk caching with minimal performance impact when the memory is subsequently reclaimed. Han Wu [1] evaluated the reliability of Kafka in different application scenarios and tested the impacts of various configuration parameters on the reliability of Kafka, including retry strategies and replications of partitions for fault tolerance. Han Wu et al. [2] studied the impact of batch size on the performance of Kafka, considering both spatial and temporal batch size.
Multi-component systems, particularly those with a k-out-of-n structure, are widely used across many industries and organizations, and researchers have made significant contributions to their reliability analysis. The term k-out-of-n refers to either a k-out-of-n: good (G) or a k-out-of-n: fail (F) system, where k and n are positive integers and n is generally greater than k. A k-out-of-n: G system functions as long as at least k of its n units are operational, whereas a k-out-of-n: F system fails if at least k of its n components fail. Redundancy is a key technique for bolstering the reliability and availability of physical systems, and both configurations are commonly employed in the design of critical mechanical, electrical, and electronic systems. Researchers have extensively examined the reliability attributes of various system types through mathematical modeling and have analyzed different measures using a variety of techniques.
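The two conventions can be made concrete with a short calculation. The sketch below assumes n independent, identical units that each work with probability p; this assumption and the function names are introduced here purely for illustration. It relies on the fact that a k-out-of-n: F system stays up exactly when at least n − k + 1 units are up.

```python
from math import comb

def k_out_of_n_good(k: int, n: int, p: float) -> float:
    """Probability that at least k of n i.i.d. units work (k-out-of-n: G)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def k_out_of_n_fail(k: int, n: int, p: float) -> float:
    """Reliability of a k-out-of-n: F system, which fails once k units are down.

    It works while at most k - 1 units have failed, i.e. at least n - k + 1 work.
    """
    return k_out_of_n_good(n - k + 1, n, p)

# 1-out-of-3: G is a pure parallel arrangement: only the all-failed case is down.
r_parallel = k_out_of_n_good(1, 3, 0.9)
# 3-out-of-3: G is a pure series arrangement: every unit must work.
r_series = k_out_of_n_good(3, 3, 0.9)
print(r_parallel, r_series)
```

The 1-out-of-3: G case is the redundancy scheme used for each subsystem in the model studied in this paper.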
Researchers often prioritize studying k-out-of-n-type systems due to their broader applicability compared to purely parallel or purely series systems. These configurations are commonly encountered in practical scenarios. In [3], Yusuf and colleagues conducted a study examining the performance of a complex computer system consisting of three interconnected subsystems operating sequentially and evaluated the effectiveness of a Copula repair policy in maintaining system functionality. Yusuf and Hussaini [4] conducted an analysis on a three-unit redundant system examining three different types of failures, including a comparative examination of various failure scenarios. Ram and Kumar [5] examined how well a system performs when subjected to a 1-out-of-2: G scheme with flawless reworking, considering various failure types and two distinct repair methods utilizing copula metric for analysis. Kumar et al. [6] studied the profitability and reliability metrics of a warm standby system consisting of non-identical units with a single server operating in both normal and abnormal environments. They utilized semi-Markov processes and regenerative point techniques while implementing a first-come, first-served (FCFS) policy. Taj et al. [7] investigated a cable plant subsystem that prioritizes repair over preventive maintenance. They utilized semi-Markov processes and regenerating point techniques in their analysis.
Gokhan Gokdere et al. [8] employed the transition matrix approach to analyze a system that evolves over time. Amirian et al. [9] developed a method for solving problems that follow a sequential linear pattern, where the sequence starts with $k_1$, then progresses to $k_2$, and so on, until $k_{n-r+1}$. They also investigated circular out-of-r-from-n $(k_1, k_2, \ldots, k_n)$ problems, which involve selecting F systems out of a total of n, with r being the number of systems chosen in each selection. They collected numerical data on the reliability of different combinations of k, r, and n values using the MATLAB program, version 2021. Elshoubary [10] undertook a thorough investigation into the dynamics of collaboration within the framework of Software-Defined Networking (SDN). This study explored different dimensions such as the dependability of SDN, the mean time to failure (MTTF), a comparison of costs and benefits, the evaluation of availability using copula distribution, and an analysis of how implementing reduction strategies influenced overall system performance. Elshoubary et al. [11] examined how a multifaceted system was constructed, comprising three separate subsystems. Ram et al. [5] and their colleagues utilized copula repair techniques to analyze the system’s performance. Temraz [12] investigated the availability and reliability of a parallel system subjected to imperfect repair and replacement methods. The study also analyzed the associated costs and optimized them. Maihulla et al. [13] conducted a study examining the reliability, availability, and consistency of solar systems. Singh et al. [14] evaluated the performance of a repairable system that consisted of two subsystems connected in series using a 2-out-of-4: G approach. Singh et al. [15] analyzed numerous computer networking systems and their associated components.
Singh and colleagues [16] conducted an assessment to determine the effectiveness of a sophisticated repairable system. This system consisted of two subsystems operating simultaneously connected by a single faulty switch. Lado et al. [17] conducted a study examining the cost assessment of a complex repairable system consisting of two subsystems connected in series. Anas and Yusuf [18] investigated the reliability metrics to determine the strength of a six-subsystem reverse osmosis system using the Gumbel–Hougaard family copula. The study provides graphical representations of the results over time and failure rates, offering insights to improve water treatment systems in harsh or contaminated environments. Sengar and Singh [19] analyzed the reliability of an engine assembly system with an inspection facility, comprising nine main units. The system can fail due to individual unit failures or catastrophic failure. Using the supplementary variable method and the Gumbel–Hougaard copula, they obtained various reliability metrics. Wang et al. [20] studied a reduced-order observer-based reset output feedback controller designed to address the output synchronization challenge in linear heterogeneous clustered multi-agent systems (CMASs). Liang et al. [21] introduced an optimal controller designed to minimize the expected value of a function influenced by random disturbances. Poonia and Singh [22] analyzed the system reliability of a computer network that features the delayed reporting of non-fatal faults. The network is structured into four subsystems: a router, a gateway switch, two identical load balancers, and a group of n similar applications or web servers operating under a k-out-of-n: G policy with a standby arrangement, all connected in series.
In this paper, a novel model is introduced for an Apache Kafka system, which is composed of three interconnected subsystems arranged in series: a producer group, an Apache Kafka cluster, and a consumer group (see Figure 1). Each subsystem consists of three identical units operating in parallel, employing a 1-out-of-3 redundancy strategy for robust performance. The units within subsystem 2 are linked to ZooKeeper to ensure the smooth functioning of the overall system. The failure rates of the units in each subsystem are assumed to be constant and follow exponential distributions. Additionally, we have implemented two repair distribution models: a general distribution and a Gumbel–Hougaard copula distribution. By utilizing transition diagrams and a system of partial differential equations, we have conducted an analysis to determine various reliability metrics such as reliability, availability, mean time to failure (MTTF), sensitivity analysis, and profit function. These insights can be valuable for managerial staff in the industry when making decisions.
The paper is structured into several sections. Section 1 examines the existing literature and research findings. Section 2 provides an overview of the system, including its design assumptions and notation. Section 3 illustrates the system’s behavior through transition diagrams under varying failure and repair scenarios along with mathematical modeling using differential equations. Section 4 presents analytical findings on system performance metrics such as availability, reliability, mean time to failure (MTTF), sensitivity analysis, and expected profit margins demonstrated through specific cases. Finally, Section 5 offers concluding remarks on the study.

2. Model Description and Notations

2.1. State Description

Table 1 describes the possible transient states of the system. States $S_0$, $S_1$, $S_2$, $S_3$, $S_4$, $S_5$, and $S_6$ are operational states, while $S_7$, $S_8$, $S_9$, and $S_{10}$ are failed states. In the transition diagram of Figure 2, $S_0$ represents the initial, fully functional state. States $S_1$ through $S_6$ indicate varying degrees of partial failure or degradation, and states $S_7$, $S_8$, $S_9$, and $S_{10}$ represent complete failure. When one unit within subsystem 1, 2, or 3 fails, the system moves to the partially failed state $S_1$, $S_3$, or $S_5$, respectively. If two units fail within the same subsystem, the system moves to the corresponding major partially failed state $S_2$, $S_4$, or $S_6$. The complete failure states $S_7$, $S_8$, and $S_9$ are reached when all three units of subsystem 1, 2, or 3, respectively, have failed. State $S_{10}$ represents complete failure caused specifically by a ZooKeeper failure.

2.2. Assumptions

In consideration of this article’s perspective, this paper operates under the following assumptions:
  • At the beginning, all subsystems operate flawlessly.
  • The system functions efficiently as long as at least one unit within each subsystem is in good working condition following the 1-out-of-3: G policy.
  • The system becomes inoperative if all three units of any one subsystem fail. A failure of the ZooKeeper unit within subsystem 2 also renders the system inoperative.
  • The rate at which failures occur remains consistent over time, and these rates are distributed according to an exponential distribution.
  • Partially failed states are repaired according to a general distribution, while completely failed states are restored according to the Gumbel–Hougaard family copula distribution.
  • A repairman is employed on a full-time basis and has the capability to fix units that have experienced partial or complete failure.
  • The repaired system functions with the same reliability and efficiency as a new one.
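The operating rule stated in these assumptions can be written as a small structure function. The sketch below is illustrative only: the class and field names are hypothetical and do not come from the paper. It encodes the series arrangement of three 1-out-of-3: G subsystems together with the ZooKeeper dependency.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SystemState:
    """Number of working units per subsystem (0 to 3) and ZooKeeper status."""
    producers_up: int
    brokers_up: int
    consumers_up: int
    zookeeper_up: bool = True

def system_operational(state: SystemState) -> bool:
    """Series of three 1-out-of-3: G subsystems, plus the ZooKeeper unit.

    The system is up only if ZooKeeper works and every subsystem still has
    at least one working unit.
    """
    subsystems = (state.producers_up, state.brokers_up, state.consumers_up)
    return state.zookeeper_up and all(units >= 1 for units in subsystems)

# Two failed brokers: degraded but still operational.
print(system_operational(SystemState(3, 1, 3)))
# All consumers failed: the series arrangement brings the system down.
print(system_operational(SystemState(3, 3, 0)))
```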

2.3. Notations

$t$ / $s$: Time variable / Laplace transform variable.
$\sigma_1$ / $\sigma_2$ / $\sigma_3$: Failure rate of each unit in subsystems 1, 2, and 3, respectively.
$\sigma_4$: Failure rate of the ZooKeeper unit in subsystem 2.
$\zeta(x)$: Repair rate of a subsystem.
$P_i(t)$: Probability that the system is in state $S_i$ at time $t$.
$P^{*}(s)$: Laplace transform of the state probability $P(t)$.
$P_i(x, t)$: Probability that the system is in state $S_i$ at time $t$ while under repair, where $x$ is the repair variable.
$V_1$ / $V_2$: Revenue generated / service cost per unit time.
$E_p(t)$: Expected profit over the time interval $[0, t)$.
$\nu(x)$: Joint repair rate given by the Gumbel–Hougaard family copula for the transition from a failed state $S_j$, $j = 7, \ldots, 10$, to the fully operational state $S_0$:
$$\nu(x) = C_{\epsilon}\big(v_1(x), v_2(x)\big) = \exp\left[x^{\epsilon} + \{\log \zeta(x)\}^{\epsilon}\right]^{1/\epsilon}, \quad 1 \le \epsilon < \infty,$$
where $v_1 = \zeta(x)$ and $v_2 = e^{x}$.
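A minimal numerical sketch of this copula repair rate follows. It assumes the usual reading of the Gumbel–Hougaard expression, in which the power 1/ϵ is applied to the bracketed sum inside the exponential, and it restricts the inputs to x ≥ 0 and ζ ≥ 1 so that every power is real. Both the restriction and the function name are choices made here for illustration.

```python
import math

def gumbel_hougaard_repair_rate(x: float, zeta: float, eps: float) -> float:
    """Evaluate exp[(x**eps + (log zeta)**eps) ** (1/eps)].

    Assumption: the exponent 1/eps acts on the bracketed sum, as in the
    standard Gumbel-Hougaard form with v1 = zeta and v2 = exp(x).
    """
    if eps < 1.0:
        raise ValueError("eps must be at least 1")
    if x < 0.0 or zeta < 1.0:
        raise ValueError("this sketch requires x >= 0 and zeta >= 1")
    inner = x**eps + math.log(zeta) ** eps
    return math.exp(inner ** (1.0 / eps))

# With eps = 1 the expression reduces to zeta * exp(x).
print(gumbel_hougaard_repair_rate(x=1.0, zeta=1.0, eps=1.0))
```

For ϵ = 1, ζ = 1, and x = 1, the rate equals e, which is much larger than the general repair rate ζ = 1 used for partially failed states.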

3. Mathematical Model and Solution

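Treating the transition diagram in Figure 2 as a Markov process yields the following system of first-order differential equations:
$$\left(\frac{d}{dt} + 3\sigma_1 + 3\sigma_2 + 3\sigma_3 + \sigma_4\right) P_0(t) = \int_0^{\infty} \zeta(x) P_1(x,t)\,dx + \int_0^{\infty} \zeta(y) P_3(y,t)\,dy + \int_0^{\infty} \zeta(z) P_5(z,t)\,dz + \int_0^{\infty} \nu(x) P_7(x,t)\,dx + \int_0^{\infty} \nu(y) P_8(y,t)\,dy + \int_0^{\infty} \nu(z) P_9(z,t)\,dz + \int_0^{\infty} \nu(m) P_{10}(m,t)\,dm \tag{1}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial x} + 2\sigma_1 + \sigma_4 + \zeta(x)\right) P_1(x,t) = 0 \tag{2}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial x} + \sigma_1 + \sigma_4 + \zeta(x)\right) P_2(x,t) = 0 \tag{3}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial y} + 2\sigma_2 + \sigma_4 + \zeta(y)\right) P_3(y,t) = 0 \tag{4}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial y} + \sigma_2 + \sigma_4 + \zeta(y)\right) P_4(y,t) = 0 \tag{5}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial z} + 2\sigma_3 + \sigma_4 + \zeta(z)\right) P_5(z,t) = 0 \tag{6}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial z} + \sigma_3 + \sigma_4 + \zeta(z)\right) P_6(z,t) = 0 \tag{7}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial x} + \nu(x)\right) P_7(x,t) = 0 \tag{8}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial y} + \nu(y)\right) P_8(y,t) = 0 \tag{9}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial z} + \nu(z)\right) P_9(z,t) = 0 \tag{10}$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial m} + \nu(m)\right) P_{10}(m,t) = 0 \tag{11}$$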
Boundary conditions:
$$P_1(0,t) = 3\sigma_1 P_0(t) + \int_0^{\infty} \zeta(x) P_2(x,t)\,dx \tag{12}$$
$$P_2(0,t) = 2\sigma_1 P_1(0,t) \tag{13}$$
$$P_3(0,t) = 3\sigma_2 P_0(t) + \int_0^{\infty} \zeta(y) P_4(y,t)\,dy \tag{14}$$
$$P_4(0,t) = 2\sigma_2 P_3(0,t) \tag{15}$$
$$P_5(0,t) = 3\sigma_3 P_0(t) + \int_0^{\infty} \zeta(z) P_6(z,t)\,dz \tag{16}$$
$$P_6(0,t) = 2\sigma_3 P_5(0,t) \tag{17}$$
$$P_7(0,t) = \sigma_1 P_2(0,t) \tag{18}$$
$$P_8(0,t) = \sigma_2 P_4(0,t) \tag{19}$$
$$P_9(0,t) = \sigma_3 P_6(0,t) \tag{20}$$
$$P_{10}(0,t) = \sigma_4 \left[ P_0(t) + \sum_{i=1}^{6} P_i(0,t) \right] \tag{21}$$
Initial condition:
$$P_0(0) = 1, \quad P_w(x,0) = 0, \quad w = 1, 2, \ldots, 10 \tag{22}$$
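Applying the Laplace transform to Equations (1) through (21) gives the following:
$$\left(s + 3\sigma_1 + 3\sigma_2 + 3\sigma_3 + \sigma_4\right) P_0^{*}(s) = 1 + \int_0^{\infty} \zeta(x) P_1^{*}(x,s)\,dx + \int_0^{\infty} \zeta(y) P_3^{*}(y,s)\,dy + \int_0^{\infty} \zeta(z) P_5^{*}(z,s)\,dz + \int_0^{\infty} \nu(x) P_7^{*}(x,s)\,dx + \int_0^{\infty} \nu(y) P_8^{*}(y,s)\,dy + \int_0^{\infty} \nu(z) P_9^{*}(z,s)\,dz + \int_0^{\infty} \nu(m) P_{10}^{*}(m,s)\,dm \tag{23}$$
$$\left(s + \frac{\partial}{\partial x} + 2\sigma_1 + \sigma_4 + \zeta(x)\right) P_1^{*}(x,s) = 0 \tag{24}$$
$$\left(s + \frac{\partial}{\partial x} + \sigma_1 + \sigma_4 + \zeta(x)\right) P_2^{*}(x,s) = 0 \tag{25}$$
$$\left(s + \frac{\partial}{\partial y} + 2\sigma_2 + \sigma_4 + \zeta(y)\right) P_3^{*}(y,s) = 0 \tag{26}$$
$$\left(s + \frac{\partial}{\partial y} + \sigma_2 + \sigma_4 + \zeta(y)\right) P_4^{*}(y,s) = 0 \tag{27}$$
$$\left(s + \frac{\partial}{\partial z} + 2\sigma_3 + \sigma_4 + \zeta(z)\right) P_5^{*}(z,s) = 0 \tag{28}$$
$$\left(s + \frac{\partial}{\partial z} + \sigma_3 + \sigma_4 + \zeta(z)\right) P_6^{*}(z,s) = 0 \tag{29}$$
$$\left(s + \frac{\partial}{\partial x} + \nu(x)\right) P_7^{*}(x,s) = 0 \tag{30}$$
$$\left(s + \frac{\partial}{\partial y} + \nu(y)\right) P_8^{*}(y,s) = 0 \tag{31}$$
$$\left(s + \frac{\partial}{\partial z} + \nu(z)\right) P_9^{*}(z,s) = 0 \tag{32}$$
$$\left(s + \frac{\partial}{\partial m} + \nu(m)\right) P_{10}^{*}(m,s) = 0 \tag{33}$$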
Laplace transforms of the boundary conditions:
$$P_1^{*}(0,s) = 3\sigma_1 P_0^{*}(s) + \int_0^{\infty} \zeta(x) P_2^{*}(x,s)\,dx \tag{34}$$
$$P_2^{*}(0,s) = 2\sigma_1 P_1^{*}(0,s) \tag{35}$$
$$P_3^{*}(0,s) = 3\sigma_2 P_0^{*}(s) + \int_0^{\infty} \zeta(y) P_4^{*}(y,s)\,dy \tag{36}$$
$$P_4^{*}(0,s) = 2\sigma_2 P_3^{*}(0,s) \tag{37}$$
$$P_5^{*}(0,s) = 3\sigma_3 P_0^{*}(s) + \int_0^{\infty} \zeta(z) P_6^{*}(z,s)\,dz \tag{38}$$
$$P_6^{*}(0,s) = 2\sigma_3 P_5^{*}(0,s) \tag{39}$$
$$P_7^{*}(0,s) = \sigma_1 P_2^{*}(0,s) \tag{40}$$
$$P_8^{*}(0,s) = \sigma_2 P_4^{*}(0,s) \tag{41}$$
$$P_9^{*}(0,s) = \sigma_3 P_6^{*}(0,s) \tag{42}$$
$$P_{10}^{*}(0,s) = \sigma_4 \left[ P_0^{*}(s) + \sum_{i=1}^{6} P_i^{*}(0,s) \right] \tag{43}$$
By solving Equations (23)–(33) using the boundary conditions specified in (34)–(43) and applying Equation (22), one can obtain the following results:
P 0 * ( s ) = 1 G ( s )
P 1 * ( s ) = 3 σ 1 ( 1 S ζ * ( s + 2 σ 1 + σ 4 ) ) G ( s ) ( s + 2 σ 1 + σ 4 ) ( 1 2 σ 1 ( S ζ * ( s + σ 1 + σ 4 ) ) )
P 2 * ( s ) = 6 σ 1 2 ( 1 S ζ * ( s + σ 1 + σ 4 ) ) G ( s ) ( s + σ 1 + σ 4 ) ( 1 2 σ 1 ( S ζ * ( s + σ 1 + σ 4 ) ) )
P 3 * ( s ) = 3 σ 2 ( 1 S ζ * ( s + 2 σ 2 + σ 4 ) ) G ( s ) ( s + 2 σ 2 + σ 4 ) ( 1 2 σ 2 ( S ζ * ( s + σ 2 + σ 4 ) ) )
P 4 * ( s ) = 6 σ 2 2 ( 1 S ζ * ( s + σ 2 + σ 4 ) ) G ( s ) ( s + σ 2 + σ 4 ) ( 1 2 σ 2 ( S ζ * ( s + σ 2 + σ 4 ) ) )
P 5 * ( s ) = 3 σ 3 ( 1 S ζ * ( s + 2 σ 3 + σ 4 ) ) G ( s ) ( s + 2 σ 3 + σ 4 ) ( 1 2 σ 3 ( S ζ * ( s + σ 3 + σ 4 ) ) )
P 6 * ( s ) = 6 σ 3 2 ( 1 S ζ * ( s + σ 3 + σ 4 ) ) G ( s ) ( s + σ 3 + σ 4 ) ( 1 2 σ 3 ( S ζ * ( s + σ 3 + σ 4 ) ) )
P 7 * ( s ) = 6 σ 1 3 ( 1 S ν * ( s ) ) G ( s ) ( s ( 1 2 σ 1 ( S ζ * ( s + σ 1 + σ 4 ) ) ) )
P 8 * ( s ) = 6 σ 2 3 ( 1 S ν * ( s ) ) G ( s ) ( s ( 1 2 σ 2 ( S ζ * ( s + σ 2 + σ 4 ) ) ) )
P 9 * ( s ) = 6 σ 3 3 ( 1 S ν * ( s ) ) G ( s ) ( s ( 1 2 σ 3 ( S ζ * ( s + σ 3 + σ 4 ) ) ) )
P 10 * ( s ) = σ 4 ( 1 S ν * ( s ) ) G ( s ) s [ 1 + ( 3 σ 1 ) ( 1 + 2 σ 1 ) 1 2 σ 1 ( S ζ * ( s + σ 1 + σ 4 ) ) + ( 3 σ 2 ) ( 1 + 2 σ 2 ) 1 2 σ 2 ( S ζ * ( s + σ 2 + σ 4 ) ) + ( 3 σ 3 ) ( 1 + 2 σ 3 ) 1 2 σ 3 ( S ζ * ( s + σ 3 + σ 4 ) ) ]
where
G ( s ) = s [ 1 + σ 4 ( 1 S ν * ( s ) ) s + [ 3 σ 1 1 2 σ 1 ( S ζ * ( s + σ 1 + σ 4 ) ) ] [ [ ( 1 S ζ * ( s + 2 σ 1 + σ 4 ) ) ( s + 2 σ 1 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] + 2 σ 1 [ ( 1 S ζ * ( s + σ 1 + σ 4 ) ) ( s + σ 1 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] ] + [ 3 σ 2 1 2 σ 2 ( S ζ * ( s + σ 2 + σ 4 ) ) ] [ [ ( 1 S ζ * ( s + 2 σ 2 + σ 4 ) ) ( s + 2 σ 2 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] + 2 σ 2 [ ( 1 S ζ * ( s + σ 2 + σ 4 ) ) ( s + σ 2 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] ] + [ 3 σ 3 1 2 σ 3 ( S ζ * ( s + σ 3 + σ 4 ) ) ] [ [ ( 1 S ζ * ( s + 2 σ 3 + σ 4 ) ) ( s + 2 σ 3 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] + 2 σ 3 [ ( 1 S ζ * ( s + σ 3 + σ 4 ) ) ( s + σ 3 + σ 4 ) + σ 4 ( 1 S ν * ( s ) ) s ] ] ]
The following Laplace transforms represent the probabilities that the system is operational and that it has failed at any given time:
$$P_{up}^{*}(s) = P_0^{*}(s) + P_1^{*}(s) + P_2^{*}(s) + P_3^{*}(s) + P_4^{*}(s) + P_5^{*}(s) + P_6^{*}(s) \tag{56}$$
$$P_{down}^{*}(s) = 1 - P_{up}^{*}(s) \tag{57}$$
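As one worked step, the expression for the transformed probability of state $S_1$ can be recovered from its transformed equation and boundary condition. The derivation below is a sketch using the standard supplementary-variable manipulations rather than a reproduction of the authors' working; the closed form of $S_{\zeta}^{*}$ written in the third line is the usual definition and is assumed here.

```latex
\begin{aligned}
P_1^{*}(x,s) &= P_1^{*}(0,s)\,
  \exp\!\Big[-(s+2\sigma_1+\sigma_4)\,x-\int_0^{x}\zeta(u)\,du\Big],\\
P_1^{*}(s) &= \int_0^{\infty} P_1^{*}(x,s)\,dx
  = P_1^{*}(0,s)\,\frac{1-S_{\zeta}^{*}(s+2\sigma_1+\sigma_4)}{s+2\sigma_1+\sigma_4},\\
S_{\zeta}^{*}(s) &= \int_0^{\infty}\zeta(x)\,
  \exp\!\Big[-s\,x-\int_0^{x}\zeta(u)\,du\Big]\,dx,\\
P_1^{*}(0,s) &= 3\sigma_1 P_0^{*}(s)
  + 2\sigma_1 P_1^{*}(0,s)\,S_{\zeta}^{*}(s+\sigma_1+\sigma_4)
  \;\Longrightarrow\;
  P_1^{*}(0,s)=\frac{3\sigma_1 P_0^{*}(s)}{1-2\sigma_1 S_{\zeta}^{*}(s+\sigma_1+\sigma_4)}.
\end{aligned}
```

The second line follows from integrating by parts, and the fourth from inserting the solution for state $S_2$ into the boundary condition for state $S_1$. Combining the second and fourth lines with $P_0^{*}(s) = 1/G(s)$ reproduces the stated result for state $S_1$; the other partially failed states follow the same pattern.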

4. Model Analysis

4.1. System Availability

Case I: Availability of the system with copula repair.
Rates of repair are set by the Gumbel–Hougaard copula distribution:
$$S_{\nu}^{*}(s) = \frac{\exp\left[x^{\epsilon} + \{\log \zeta(x)\}^{\epsilon}\right]^{1/\epsilon}}{s + \exp\left[x^{\epsilon} + \{\log \zeta(x)\}^{\epsilon}\right]^{1/\epsilon}}, \quad S_{\zeta}^{*}(s) = \frac{\zeta}{s + \zeta}$$
Substituting $\sigma_1 = 0.01$, $\sigma_2 = 0.02$, $\sigma_3 = 0.03$, $\sigma_4 = 0.04$, $\epsilon = 1$, $\zeta = 1$, $x = 1$, and $\zeta(x) = \zeta(y) = 1$ into Equation (56) and applying the inverse Laplace transform in Mathematica gives the availability:
$$\begin{aligned} P_{up}(t) = {} & 0.98531 + 0.0190956\,e^{-2.77139t} - 0.00422647\,e^{-1.22903t} - 0.000044883\,e^{-1.09014t} - 0.0000623626\,e^{-1.06847t} \\ & - 0.0000639626\,e^{-1.05263t} - 4.3467 \times 10^{-6}\,e^{-1.02726t} - 3.67201 \times 10^{-6}\,e^{-1.0153t} - 2.28297 \times 10^{-13}\,e^{-0.1t} \\ & + 2.69213 \times 10^{-12}\,e^{-0.08t} - 4.39599 \times 10^{-12}\,e^{-0.07t} + 2.06294 \times 10^{-12}\,e^{-0.06t} - 2.4632 \times 10^{-13}\,e^{-0.05t} \end{aligned} \tag{58}$$
Case II: Availability of the system with general repair.
Rates of repair are set by two general distributions:
$$S_{\nu}^{*}(s) = \frac{\nu}{s + \nu}, \quad S_{\zeta}^{*}(s) = \frac{\zeta}{s + \zeta}$$
Assigning the same numerical values to the failure and repair rates as in Case I and using Equation (56), one obtains the availability:
$$\begin{aligned} P_{up}(t) = {} & 0.961052 + 0.0305836\,e^{-1.27165t} + 0.000749247\,e^{-1.09104t} + 0.00140467\,e^{-1.0701t} + 0.001385\,e^{-1.05447t} \\ & + 0.000270381\,e^{-1.02754t} + 0.000549342\,e^{-1.01579t} + 0.00401258\,e^{-1.00535t} - 3.68768 \times 10^{-13}\,e^{-0.1t} \\ & + 1.6794 \times 10^{-12}\,e^{-0.08t} - 2.69796 \times 10^{-12}\,e^{-0.07t} + 1.28643 \times 10^{-12}\,e^{-0.06t} - 2.84601 \times 10^{-13}\,e^{-0.05t} \end{aligned} \tag{59}$$
Case III: Reduction method.
The reduction method multiplies the failure rates of the individual units by a factor $\rho$, where $0 < \rho < 1$. Rescaling the Case I parameters in this way yields higher system availability than the original configuration:
$$\begin{aligned} P_{up}(t) = {} & 0.991204 + 0.0103481\,e^{-2.74674t} - 0.00148593\,e^{-1.13728t} - 0.0000165956\,e^{-1.05404t} - 0.0000225276\,e^{-1.041t} \\ & - 0.0000244228\,e^{-1.03143t} - 1.08928 \times 10^{-6}\,e^{-1.01633t} - 1.80207 \times 10^{-6}\,e^{-1.00916t} + 5.1252 \times 10^{-13}\,e^{-0.06t} \\ & - 2.1356 \times 10^{-12}\,e^{-0.048t} + 1.75621 \times 10^{-12}\,e^{-0.042t} - 1.28452 \times 10^{-12}\,e^{-0.036t} + 1.60821 \times 10^{-13}\,e^{-0.03t} \end{aligned} \tag{60}$$
By incrementing the time variable t in Equations (58)–(60) from t = 0 to t = 10 , one can generate Table 2 and the corresponding Figure 3. These results present the evolution of the availability in each of the three distinct scenarios introduced above.
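The tabulation described above can be sketched in a few lines. The coefficients and decay rates below are copied from Equations (58)–(60), with every exponential taken as decaying in time; terms whose coefficients are of order 10⁻¹² or smaller are left out because they do not affect the printed digits. The dictionary layout and function name are illustrative choices, not part of the paper.

```python
import math

# Steady-state value and (coefficient, decay rate) pairs from Equations (58)-(60).
# Terms with coefficients of order 1e-12 or smaller are omitted in this sketch.
CASES = {
    "copula repair": (0.98531, [
        (0.0190956, 2.77139), (-0.00422647, 1.22903), (-0.000044883, 1.09014),
        (-0.0000623626, 1.06847), (-0.0000639626, 1.05263),
        (-4.3467e-6, 1.02726), (-3.67201e-6, 1.0153),
    ]),
    "general repair": (0.961052, [
        (0.0305836, 1.27165), (0.000749247, 1.09104), (0.00140467, 1.0701),
        (0.001385, 1.05447), (0.000270381, 1.02754), (0.000549342, 1.01579),
        (0.00401258, 1.00535),
    ]),
    "reduction": (0.991204, [
        (0.0103481, 2.74674), (-0.00148593, 1.13728), (-0.0000165956, 1.05404),
        (-0.0000225276, 1.041), (-0.0000244228, 1.03143),
        (-1.08928e-6, 1.01633), (-1.80207e-6, 1.00916),
    ]),
}

def availability(case: str, t: float) -> float:
    """Evaluate P_up(t) for one of the three cases."""
    steady_state, terms = CASES[case]
    return steady_state + sum(c * math.exp(-rate * t) for c, rate in terms)

print("t   " + "  ".join(f"{name:>14s}" for name in CASES))
for t in range(0, 11):
    row = "  ".join(f"{availability(name, t):14.6f}" for name in CASES)
    print(f"{t:<3d} {row}")
```

Each expression evaluates to approximately 1 at t = 0, as expected for a system that starts in the fully operational state, and settles to its constant term as the exponentials decay.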

4.2. System Reliability Analysis

The reliability of a non-repairable system is the probability that it functions without any intervention or repair. Setting all repair rates to zero in Equation (56) and applying the inverse Laplace transform gives this probability.
Case I: Using the failure rates given in Section 4.1 and proceeding as in the availability analysis, the reliability of the original system is
$$R(t) = -0.420601\,e^{-0.22t} + 0.75\,e^{-0.1t} + 0.428571\,e^{-0.08t} + 0.036\,e^{-0.07t} + 0.2025\,e^{-0.06t} + 0.00352941\,e^{-0.05t} \tag{61}$$
Case II: Under the reduction approach, the failure rates of the system's units are multiplied by a factor $\rho$, with $0 < \rho < 1$. Substituting the reduced parameter values from Section 4.1 into Equation (56) and performing the inverse Laplace transform gives
$$R(t) = -0.398789\,e^{-0.132t} + 0.75\,e^{-0.06t} + 0.428571\,e^{-0.048t} + 0.0216\,e^{-0.042t} + 0.1965\,e^{-0.036t} + 0.00211765\,e^{-0.03t} \tag{62}$$
The values of $R(t)$ obtained from Equations (61) and (62) are given in Table 3 and plotted in Figure 4.
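A short sketch for evaluating the two reliability expressions follows. The coefficient and rate pairs are taken from Equations (61) and (62) with decaying exponentials; the leading coefficient in each expression is negative, which is what makes R(0) equal to 1. The names used are illustrative only.

```python
import math

# (coefficient, decay rate) pairs from Equations (61) and (62).
ORIGINAL = [(-0.420601, 0.22), (0.75, 0.10), (0.428571, 0.08),
            (0.036, 0.07), (0.2025, 0.06), (0.00352941, 0.05)]
REDUCED = [(-0.398789, 0.132), (0.75, 0.06), (0.428571, 0.048),
           (0.0216, 0.042), (0.1965, 0.036), (0.00211765, 0.03)]

def reliability(terms, t: float) -> float:
    """Evaluate R(t) as a sum of decaying exponentials."""
    return sum(c * math.exp(-rate * t) for c, rate in terms)

print("t   original   reduced")
for t in range(0, 11):
    print(f"{t:<3d} {reliability(ORIGINAL, t):.6f}   {reliability(REDUCED, t):.6f}")
```

The decay rates in Equation (62) are 0.6 times those in Equation (61), which is consistent with a reduction factor of ρ = 0.6, although that value is inferred here from the exponents rather than stated in the text.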

4.3. Mean Time to Failure (MTTF) Analysis

Setting all repair rates to zero in Equation (56) and taking the limit as $s$ approaches zero gives the system's mean time to failure (MTTF):
$$MTTF = \lim_{s \to 0} P_{up}^{*}(s)$$
Evaluating this limit yields
$$MTTF = \frac{1}{3(\sigma_1 + \sigma_2 + \sigma_3) + \sigma_4} \left[ 1 + 3\sigma_1 \left( \frac{1}{2\sigma_1 + \sigma_4} + \frac{2\sigma_1}{\sigma_1 + \sigma_4} \right) + 3\sigma_2 \left( \frac{1}{2\sigma_2 + \sigma_4} + \frac{2\sigma_2}{\sigma_2 + \sigma_4} \right) + 3\sigma_3 \left( \frac{1}{2\sigma_3 + \sigma_4} + \frac{2\sigma_3}{\sigma_3 + \sigma_4} \right) \right] \tag{63}$$
One can examine the impact of various parameter values ( σ 1 = 0.01 , σ 2 = 0.02 , σ 3 = 0.03 , σ 4 = 0.04 ) on the mean time to failure (MTTF) by systematically changing each parameter individually. Specifically, we are varying σ 1 , σ 2 , σ 3 , and σ 4 across a range of values (0.01 to 0.10 in increments of 0.01) in Equation (63). The resulting variations in MTTF, in relation to the corresponding failure rates, are documented in Table 4 and visualized in Figure 5.
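The parameter sweep described above can be sketched directly from Equation (63). The function and variable names are illustrative; the baseline values are those quoted in the text.

```python
def mttf(sigma1: float, sigma2: float, sigma3: float, sigma4: float) -> float:
    """Mean time to failure according to Equation (63)."""
    total_rate = 3 * (sigma1 + sigma2 + sigma3) + sigma4
    bracket = 1.0
    for sigma in (sigma1, sigma2, sigma3):
        bracket += 3 * sigma * (1 / (2 * sigma + sigma4) + 2 * sigma / (sigma + sigma4))
    return bracket / total_rate

BASE = {"sigma1": 0.01, "sigma2": 0.02, "sigma3": 0.03, "sigma4": 0.04}

def sweep(name: str):
    """Vary one failure rate from 0.01 to 0.10 while holding the others at BASE."""
    values = [round(0.01 * k, 2) for k in range(1, 11)]
    return [(v, mttf(**{**BASE, name: v})) for v in values]

for name in BASE:
    print(name, [f"{m:.3f}" for _, m in sweep(name)])
```

At the baseline parameters the formula gives an MTTF of about 14.905 time units, which agrees with integrating the reliability expression in Equation (61) term by term.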

4.4. Sensitivity Analysis of the MTTF

The partial derivatives of the MTTF with respect to the failure rates measure the sensitivity of the system to each rate. The sensitivity values obtained by differentiating Equation (63) and substituting $\sigma_1 = 0.01$, $\sigma_2 = 0.02$, $\sigma_3 = 0.03$, and $\sigma_4 = 0.04$ are presented in Table 5 and illustrated in Figure 6.
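The sensitivities can be approximated numerically without symbolic differentiation. The sketch below uses central finite differences on Equation (63); this is a numerical stand-in chosen here for illustration, not the analytical differentiation described in the text, and the step size h is an arbitrary choice.

```python
def mttf(sigma1: float, sigma2: float, sigma3: float, sigma4: float) -> float:
    """Mean time to failure according to Equation (63)."""
    total_rate = 3 * (sigma1 + sigma2 + sigma3) + sigma4
    bracket = 1.0
    for sigma in (sigma1, sigma2, sigma3):
        bracket += 3 * sigma * (1 / (2 * sigma + sigma4) + 2 * sigma / (sigma + sigma4))
    return bracket / total_rate

def sensitivity(base: dict, h: float = 1e-6) -> dict:
    """Central-difference estimate of d(MTTF)/d(sigma_k) for each failure rate."""
    result = {}
    for name, value in base.items():
        upper = {**base, name: value + h}
        lower = {**base, name: value - h}
        result[name] = (mttf(**upper) - mttf(**lower)) / (2 * h)
    return result

BASE = {"sigma1": 0.01, "sigma2": 0.02, "sigma3": 0.03, "sigma4": 0.04}
for name, slope in sensitivity(BASE).items():
    print(f"d MTTF / d {name} = {slope:.3f}")
```

At the baseline parameters every estimated derivative is negative, meaning that raising any single failure rate shortens the MTTF.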

4.5. Cost Analysis

The expected profit of the system over the time interval $[0, t)$, assuming that the service facility is always available, is given by
$$E_p(t) = V_1 \int_0^{t} P_{up}(t)\,dt - V_2\,t \tag{64}$$
Case I: Cost analysis with copula repair. Here $V_1$ is the revenue generated and $V_2$ is the service cost per unit time. Substituting Equation (58) into Equation (64), with the parameter values of Case I in Section 4.1, gives Equation (65). The expected profit values are reported in Table 6 and plotted in Figure 7.
$$\begin{aligned} E_p(t) = \big( & 0.00328323 - 0.00689025\,e^{-2.77139t} + 0.00343887\,e^{-1.22903t} + 0.0000411719\,e^{-1.09014t} + 0.0000583665\,e^{-1.06847t} \\ & + 0.0000607646\,e^{-1.05263t} + 4.23136 \times 10^{-6}\,e^{-1.02726t} + 3.61666 \times 10^{-6}\,e^{-1.0153t} + 2.28297 \times 10^{-12}\,e^{-0.1t} \\ & - 3.36516 \times 10^{-11}\,e^{-0.08t} + 6.27998 \times 10^{-11}\,e^{-0.07t} - 3.43823 \times 10^{-11}\,e^{-0.06t} + 4.92639 \times 10^{-12}\,e^{-0.05t} + 0.98531\,t \big) V_1 - V_2\,t \end{aligned} \tag{65}$$
Case II: Cost analysis with general repair.
Applying the same parameter values as in Case I of Section 4.1 and substituting Equation (59) into Equation (64) gives Equation (66):
$$\begin{aligned} E_p(t) = \big( & 0.0321583 - 0.0240503\,e^{-1.27165t} - 0.00068673\,e^{-1.09104t} - 0.00131265\,e^{-1.0701t} - 0.00131345\,e^{-1.05447t} \\ & - 0.000263134\,e^{-1.02754t} - 0.000540803\,e^{-1.01579t} - 0.00399124\,e^{-1.00535t} + 3.68768 \times 10^{-12}\,e^{-0.1t} \\ & - 2.09926 \times 10^{-11}\,e^{-0.08t} + 3.85423 \times 10^{-11}\,e^{-0.07t} - 2.14406 \times 10^{-11}\,e^{-0.06t} + 5.69202 \times 10^{-12}\,e^{-0.05t} + 0.961052\,t \big) V_1 - V_2\,t \end{aligned} \tag{66}$$
The expected profit is reported in Table 7 and Figure 8 for $V_1 = 1$, $V_2 = 0.1$ to $0.5$, and $t$ from 0 to 10 time units.
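The profit calculation can be sketched by integrating the availability expressions term by term. The code below uses the coefficients of Equations (58) and (59) with decaying exponentials and omits terms of order 10⁻¹² or smaller; the data layout and function name are illustrative assumptions.

```python
import math

# Steady-state availability and (coefficient, decay rate) pairs from
# Equations (58) and (59); terms of order 1e-12 or smaller are omitted.
AVAILABILITY = {
    "copula repair": (0.98531, [
        (0.0190956, 2.77139), (-0.00422647, 1.22903), (-0.000044883, 1.09014),
        (-0.0000623626, 1.06847), (-0.0000639626, 1.05263),
        (-4.3467e-6, 1.02726), (-3.67201e-6, 1.0153),
    ]),
    "general repair": (0.961052, [
        (0.0305836, 1.27165), (0.000749247, 1.09104), (0.00140467, 1.0701),
        (0.001385, 1.05447), (0.000270381, 1.02754), (0.000549342, 1.01579),
        (0.00401258, 1.00535),
    ]),
}

def expected_profit(case: str, t: float, v1: float, v2: float) -> float:
    """E_p(t) = V1 * integral of P_up over [0, t] - V2 * t, integrated exactly."""
    steady_state, terms = AVAILABILITY[case]
    uptime = steady_state * t
    # The integral of c * exp(-r * u) over [0, t] is (c / r) * (1 - exp(-r * t)).
    uptime += sum(c / r * (1.0 - math.exp(-r * t)) for c, r in terms)
    return v1 * uptime - v2 * t

for v2 in (0.1, 0.2, 0.3, 0.4, 0.5):
    row = [f"{expected_profit(case, 10, 1.0, v2):.4f}" for case in AVAILABILITY]
    print(f"V2={v2:.1f}  copula={row[0]}  general={row[1]}")
```

For a fixed revenue rate, the profit under copula repair exceeds that under general repair because its steady-state availability is higher, and the profit falls linearly as the service cost V2 increases.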

5. Conclusions

This paper examines the performance measures of a repairable system composed of three subsystems arranged in series, with each subsystem having three replica units in a parallel configuration. The system operates under a 1-out-of-3: G strategy. The analysis, carried out with the supplementary variable technique, indicates that copula repair is the more effective repair policy. Table 2 and Figure 3 present the system’s availability over time under copula repair, general repair, and reduced failure rates. The data show that the system’s availability decreases over time in all scenarios and is higher when copula repair is employed. The reduction strategy in Case III lowers the unit failure rates of the system by a factor $\rho$ (where $0 < \rho < 1$) in order to enhance availability. The reliability results follow the same pattern as the availability results. Reliability decreases over time in both Cases I and II, and Table 3 and Figure 4 show that lowering the unit failure rates by the factor $\rho$ improves on the reliability of the original system.
Figure 5 illustrates how the mean time to failure (MTTF) of the system changes in response to variations in $\sigma_1$, $\sigma_2$, $\sigma_3$, and $\sigma_4$, with the other parameters held constant. Increasing any one of these failure rates decreases the MTTF of the system. Table 5 and the corresponding Figure 6 indicate that the sensitivity of the MTTF increases with the failure rates; that is, the system becomes more responsive as failure rates rise. Figure 7 and Figure 8 show that the expected profit rises over time, with the expected profit under copula repair exceeding that under general repair. This suggests that copula repair not only enhances system availability but also leads to higher profits from the operation of repairable systems. In addition, the profit decreases as the service cost rises. The findings suggest that reliability modeling can effectively evaluate and improve the robustness, efficiency, and performance of an Apache Kafka system. To alleviate congestion in the repair facility, this research could be extended, again using the supplementary variable technique, to a system with multiple subsystems and diverse repair machines. This work holds promise for benefiting industrial applications and addressing human needs.

Author Contributions

Conceptualization, E.E.E. and T.R.; Methodology, E.E.E. and T.R.; Software, E.E.E. and T.R.; Validation, E.E.E. and T.R.; Formal analysis, E.E.E. and T.R.; Investigation, E.E.E. and T.R.; Resources, E.E.E. and T.R.; Data curation, E.E.E. and T.R.; Writing—original draft, E.E.E. and T.R.; Writing—review & editing, E.E.E. and T.R.; Visualization, E.E.E. and T.R.; Supervision, E.E.E. and T.R.; Project administration, E.E.E. and T.R.; Funding acquisition, T.R. All authors have read and agreed to the published version of the manuscript.

Funding

The researchers would like to thank the Deanship of Graduate Studies and Scientific Research at Qassim University for financial support (QU-APC-2024-9/1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Han, W. Reliability evaluation of the Apache Kafka streaming system. In Proceedings of the 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Berlin, Germany, 27–30 October 2019.
  2. Han, W.; Zhihao, S.; Peng, G.; Wolter, K. A Reactive batching strategy of Apache Kafka for reliable stream processing in real-time. In Proceedings of the 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), Coimbra, Portugal, 12–15 October 2020.
  3. Yusuf, I.; Lado, A.I.; Singh, V.V.; Ali, U.A.; Sufi, N.A. Performance analysis of multi computer system consisting of three subsystems in series configuration using copula repair policy. SN Comput. Sci. 2020, 1, 241.
  4. Yusuf, I.; Hussaini, N. A comparative analysis of three unit redundant systems with three types of failures. Arab. J. Sci. Eng. 2014, 4, 3337–3349.
  5. Ram, M.; Kumar, A. Performability analysis of a system under 1-out-of-2: G scheme with perfect reworking. J. Braz. Soc. Mech. Sci. Eng. 2015, 37, 1029–1038.
  6. Kumar, A.; Panwar, D.; Malik, S.C. Profit analysis of a warm standby non-identical unit system with single server performing in normal and abnormal environment. Life Cycle Reliab. Saf. Eng. 2019, 8, 219–226.
  7. Taj, S.Z.; Rizman, S.M.; Alkali, B.M. Probabilistic modeling and analysis of a cable plant subsystem with priority to repair over preventive maintenance. Insp. Manag. J. Math. 2017, 3, 13–21.
  8. Gokdere, G.; Keung, H.; Tony, N. Time-dependent reliability analysis for repairable consecutive k-out-of-n: F system. Stat. Theory Relat. Fields 2021, 6, 139–147.
  9. Amirian, Y.; Khodadidi, A. Modeling and Exact Reliability for a Consecutive Linear (k1, k2, …, knr+1)-out-of-r-from-n: F System and consecutive circular (k1, k2, …, kn)-out-of-r-from-n: F System. Int. J. Reliab. Qual. Saf. Eng. 2021, 4, 2150029.
  10. Elshoubary, E.E. Effect of reduction method on the performance a software defined network system using Gumbel Hougaard family copula distribution. J. N. Soc. Phys. Sci. 2023, 4, 1402.
  11. Elshoubary, E.E.; Abu Shaeer, Z.F. Performance study of a complex system with three subsystems in series configuration using reduction method and copula distribution. Int. J. Comput. Intell. Ctrl. 2021, 13, 19–28. Available online: https://www.mukpublications.com/resources/ijcic%20v13-2-3.pdf (accessed on 30 July 2024).
  12. Temraz, N. Availability and reliability of a parallel system under imperfect repair and replacement: Analysis and cost optimization. Int. J. Sys. Assur. Eng. Manag. 2019, 10, 1002–1009.
  13. Maihulla, A.S.; Yusuf, I. Reliability, availability, maintainability, and dependability analysis of photovoltaic systems. Life Cycle Reliab. Saf. Eng. 2022, 11, 19–26.
  14. Chand, U.; Kumar, R.; Rawal, D.K.; Singh, V.V. Stochastic analysis of a complex repairable system comprises two subsystems in series configuration via multi-repair strategy. Life Cycle Reliab. Saf. Eng. 2022, 11, 323–335.
  15. Singh, V.V.; Gahlot, M. Reliability analysis of (n) clients system under star topology and copula linguistic approach. Int. J. Comput. Syst. Eng. 2021, 6, 123–133.
  16. Singh, V.V.; Poonia, P.K.; Adbullahi, A.H. Performance analysis of a complex repairable system with two subsystems in series configuration with an imperfect switch. J. Math. Comput. Sci. 2020, 10, 359.
  17. Lado, A.K.; Singh, V.V. Cost assessment of complex repairable system consisting two subsystems in series configuration using Gumbel Hougaard family copula. Int. J. Qual. Reliab. Manag. 2019, 36, 1683.
  18. Anas, S.M.; Yusuf, I. Reliability modeling and performance analysis of reverse osmosis machine in water purification using Gumbel–Hougaard family copula. Life Cycle Reliab. Saf. Eng. 2022, 12, 11–22.
  19. Sengar, S.; Singh, S.B. Reliability Analysis of an Engine Assembly Process of Automobiles with Inspection Facility. Math. Theory Model. 2014, 4, 153–164. Available online: https://www.iiste.org/Journals/index.php/MTM/article/view/13118/13376 (accessed on 30 July 2024).
  20. Wang, Q.; Hu, J.; Wu, Y.; Zhao, Y. Output synchronization of wide-area heterogeneous multi-agent systems over intermittent clustered networks. Inf. Sci. 2023, 619, 263–275.
  21. Liang, Q.; Hu, J.; Xiang, L.; Shi, K. Adaptive dynamic programming and distributionally robust optimal control of linear stochastic system using the Wasserstein metric. Int. J. Adapt. Control. Signal Process. 2024; early view.
  22. Poonia, P.K.; Singh, V.V. Stochastic analysis of delayed reporting of faults in a computer network using copula distribution. Int. J. Qual. Eng. Tech. 2023, 9, 161–182.

Figure 1. Block diagram of Apache Kafka system.
Figure 2. An illustration of the model’s state transitions.
Figure 3. Availability analysis of two cases.
Figure 4. Reliability analysis of two cases.
Figure 5. MTTF versus rates of failure.
Figure 6. Sensitivity analysis of the system as a function of failure rate.
Figure 7. The expected profit resulting from the improvement of copula.
Figure 8. Profit expected for the general repair plan.

Table 1. Description of the system states.

| State | Description |
|---|---|
| S0 | All components of the subsystems are expected to be operating correctly and without any issues. |
| S1, S3, S5 | Degraded states representing a minor failure. Despite the failure of a single unit within subsystem 1, 2, or 3, the overall system remains operational because two units are still functioning properly. Repairs are underway to restore the system. |
| S2, S4, S6 | After the malfunction of at least two units within subsystem 1, 2, or 3, the system is in a significantly impaired operational state and requires ongoing repairs to restore full functionality. |
| S7, S8, S9 | Significant breakdowns caused by the failure of more than two units within subsystem 1, 2, or 3. The Gumbel–Hougaard copula distribution is employed to repair the system. |
| S10 | Subsystem 2 has stopped working completely because of a malfunction in the ZooKeeper unit. The system is repaired using the copula distribution. |

Table 2. Comparing the availability of the copula and general systems.

| Time | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| 0 | 1 | 1 | 1 |
| 1 | 0.985207 | 0.972606 | 0.997056 |
| 2 | 0.985002 | 0.964517 | 0.997001 |
| 3 | 0.985202 | 0.962104 | 0.997037 |
| 4 | 0.985277 | 0.961376 | 0.997054 |
| 5 | 0.9853 | 0.961153 | 0.99706 |
| 6 | 0.985307 | 0.961084 | 0.997063 |
| 7 | 0.985309 | 0.961062 | 0.997063 |
| 8 | 0.98531 | 0.961055 | 0.997064 |
| 9 | 0.98531 | 0.961053 | 0.997064 |
| 10 | 0.98531 | 0.961052 | 0.997064 |

Table 3. Variation of reliability with respect to time in various cases.

| Time | Case 1 | Case 2 |
|---|---|---|
| 0 | 1 | 1 |
| 1 | 0.96434 | 0.992194 |
| 2 | 0.922462 | 0.984121 |
| 3 | 0.876713 | 0.975801 |
| 4 | 0.828879 | 0.967258 |
| 5 | 0.780305 | 0.958509 |
| 6 | 0.731992 | 0.949574 |
| 7 | 0.684668 | 0.940472 |
| 8 | 0.63885 | 0.931218 |
| 9 | 0.594894 | 0.92183 |
| 10 | 0.553027 | 0.912322 |

Table 4. Variation of MTTF with respect to failure rates.

| Failure Rates | MTTF (σ1) | MTTF (σ2) | MTTF (σ3) | MTTF (σ4) |
|---|---|---|---|---|
| 0.01 | 14.9052 | 15.7955 | 17.5875 | 14.9052 |
| 0.02 | 14.2286 | 14.9052 | 16.2737 | 14.9052 |
| 0.03 | 13.3724 | 13.8651 | 14.9052 | 14.9052 |
| 0.04 | 12.5392 | 12.8898 | 13.688 | 14.9052 |
| 0.05 | 11.7801 | 12.0233 | 12.6432 | 14.9052 |
| 0.06 | 11.1031 | 11.2651 | 11.7516 | 14.9052 |
| 0.07 | 10.5027 | 10.6029 | 10.9881 | 14.9052 |
| 0.08 | 9.9701 | 10.0229 | 10.3297 | 14.9052 |
| 0.09 | 9.49622 | 9.51224 | 9.7578 | 14.9052 |
| 0.1 | 9.07289 | 9.06025 | 9.25714 | 14.9052 |

Table 5. Variation of sensitivity with respect to failure rates.

| Failure Rates | ∂(MTTF)/∂σ1 | ∂(MTTF)/∂σ2 | ∂(MTTF)/∂σ3 | ∂(MTTF)/∂σ4 |
|---|---|---|---|---|
| 0.01 | −41.9193 | −62.5954 | −107.932 | −569.162 |
| 0.02 | −82.4095 | −102.874 | −140.725 | −370.604 |
| 0.03 | −85.9876 | −102.218 | −130.34 | −263.725 |
| 0.04 | −79.949 | −92.2716 | −112.923 | −198.283 |
| 0.05 | −71.7737 | −81.0735 | −96.4012 | −154.875 |
| 0.06 | −63.7345 | −70.7879 | −82.3463 | −124.444 |
| 0.07 | −56.4945 | −61.8877 | −70.7466 | −69.3555 |
| 0.08 | −50.179 | −28.6126 | −61.232 | −51.685 |
| 0.09 | −44.7334 | −24.9794 | −53.4053 | −45.353 |

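
As a rough cross-check of Table 5 against Table 4, the sketch below assumes that the sensitivity columns are partial derivatives of the MTTF with respect to each failure rate, and approximates the σ1 derivative by central finite differences on the MTTF (σ1) column of Table 4. The data are copied from the two tables; the function name is hypothetical, and the finite-difference approach is an illustration, not the authors' analytic derivation.

```python
# MTTF as sigma_1 varies (Table 4) and the reported sigma_1 sensitivity (Table 5).
rates = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
mttf_sigma1 = [14.9052, 14.2286, 13.3724, 12.5392, 11.7801,
               11.1031, 10.5027, 9.9701, 9.49622, 9.07289]
reported = {0.02: -82.4095, 0.03: -85.9876, 0.04: -79.949, 0.05: -71.7737,
            0.06: -63.7345, 0.07: -56.4945, 0.08: -50.179, 0.09: -44.7334}

def central_differences(xs, ys):
    """Approximate dy/dx at each interior grid point with a central difference."""
    return {
        xs[i]: (ys[i + 1] - ys[i - 1]) / (xs[i + 1] - xs[i - 1])
        for i in range(1, len(xs) - 1)
    }

if __name__ == "__main__":
    estimates = central_differences(rates, mttf_sigma1)
    for rate, value in reported.items():
        print(f"{rate:.2f}  reported={value:9.4f}  finite-diff={estimates[rate]:9.4f}")
```

From a failure rate of 0.04 upward the finite-difference estimates stay within about 0.4 of the reported values. The gap is larger at 0.02 and 0.03, where the grid spacing of 0.01 is coarse relative to how quickly the derivative changes.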
Table 6. Expected profit where the repair follows a copula distribution.

| Time | V2 = 0.5 | V2 = 0.4 | V2 = 0.3 | V2 = 0.2 | V2 = 0.1 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0.489226 | 0.589226 | 0.689226 | 0.789226 | 0.889226 |
| 2 | 0.974191 | 1.17419 | 1.37419 | 1.57419 | 1.77419 |
| 3 | 1.4593 | 1.7593 | 2.0593 | 2.3593 | 2.6593 |
| 4 | 1.94455 | 2.34455 | 2.74455 | 3.14455 | 3.54455 |
| 5 | 2.42984 | 2.92984 | 3.42984 | 3.92984 | 4.42984 |
| 6 | 2.91515 | 3.51515 | 4.11515 | 4.71515 | 5.31515 |
| 7 | 3.40045 | 4.10045 | 4.80045 | 5.50045 | 6.20045 |
| 8 | 3.88576 | 4.68576 | 5.48576 | 6.28576 | 7.08576 |
| 9 | 4.37107 | 5.27107 | 6.17107 | 7.07107 | 7.97107 |
| 10 | 4.85638 | 5.85638 | 6.85638 | 7.85638 | 8.85638 |

Table 7. Expected profit assuming a general distribution for the repairs.

| Time | V2 = 0.5 | V2 = 0.4 | V2 = 0.3 | V2 = 0.2 | V2 = 0.1 |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0.483578 | 0.583578 | 0.683578 | 0.783578 | 0.883578 |
| 2 | 0.951341 | 1.15134 | 1.35134 | 1.55134 | 1.75134 |
| 3 | 1.41442 | 1.71442 | 2.01442 | 2.31442 | 2.61442 |
| 4 | 1.87608 | 2.27608 | 2.67608 | 3.07608 | 3.47608 |
| 5 | 2.33733 | 2.83733 | 3.33733 | 3.83733 | 4.33733 |
| 6 | 2.79844 | 3.39844 | 3.99844 | 4.59844 | 5.19844 |
| 7 | 3.25951 | 3.95951 | 4.65951 | 5.35951 | 6.05951 |
| 8 | 3.72057 | 4.52057 | 5.32057 | 6.12057 | 6.92057 |
| 9 | 4.18162 | 5.08162 | 5.98162 | 6.88162 | 7.78162 |
| 10 | 4.64267 | 5.64267 | 6.64267 | 7.64267 | 8.64267 |

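
The profit tables can be read with a short sketch, assuming the expected profit has the form (revenue per unit time) × (integral of availability up to t) minus V2 × t, with revenue per unit time equal to 1. That form is an assumption here, not quoted from this section. Under it, adding V2 × t back to each entry should give the same value in every column. The numbers are the t = 10 rows of Table 6 and Table 7; the function name is hypothetical.

```python
# t = 10 rows of Table 6 (copula repair) and Table 7 (general repair).
service_costs = [0.5, 0.4, 0.3, 0.2, 0.1]
copula_t10 = [4.85638, 5.85638, 6.85638, 7.85638, 8.85638]
general_t10 = [4.64267, 5.64267, 6.64267, 7.64267, 8.64267]

def implied_uptime_integral(profits, costs, t):
    """Add the assumed service-cost term V2 * t back to each profit entry.

    If profit = (integral of availability) - V2 * t with unit revenue, the
    returned values should coincide across all service-cost columns.
    """
    return [p + c * t for p, c in zip(profits, costs)]

if __name__ == "__main__":
    copula = implied_uptime_integral(copula_t10, service_costs, 10)
    general = implied_uptime_integral(general_t10, service_costs, 10)
    print("copula :", [round(v, 5) for v in copula])
    print("general:", [round(v, 5) for v in general])
    print("profit gap at t = 10:", round(copula[0] - general[0], 5))
```

Each row collapses to a single value, which is consistent with a profit that is linear in the service cost. The difference between the two repair policies at t = 10 is then the same for every V2.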
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Elshoubary, E.E.; Radwan, T. Studying the Efficiency of the Apache Kafka System Using the Reduction Method, and Its Effectiveness in Terms of Reliability Metrics Subject to a Copula Approach. Appl. Sci. 2024, 14, 6758. https://doi.org/10.3390/app14156758
