Towards Double-Layer Dynamic Heterogeneous Redundancy Architecture for Reliable Railway Passenger Service System

Wu, Xinghua; Wang, Mingzhe; Shen, Jinsheng; Gong, Yanwei

doi:10.3390/electronics13183592


¹ Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing Jiaotong University, Beijing 100044, China

² Institute of Computing Technology, China Academy of Railway Sciences, Beijing 100081, China

³ Strategic Development Department, China Association for Science and Technology, Beijing 100083, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3592; https://doi.org/10.3390/electronics13183592

Submission received: 16 July 2024 / Revised: 30 August 2024 / Accepted: 8 September 2024 / Published: 10 September 2024

(This article belongs to the Special Issue Reliability, Fault Tolerance and Safety of Electronic Devices and Systems)


Abstract

Researchers have proposed the dynamic heterogeneous redundancy (DHR) architecture, which integrates dynamic, heterogeneous, redundant, and closed-loop feedback elements into the system, to fortify the reliability of the railway passenger service system (RPSS). However, common DHR architectures have at least two weaknesses: (1) they require system nodes with ample computing and storage resources; (2) they have hardly considered the reliability of the DHR architecture itself. To this end, this paper proposes a double-layer DHR (DDHR) architecture to ensure the reliability of the RPSS. The architecture introduces a set of algorithms for its data processing flow, namely an optimized co-computation algorithm and a ruling weight optimization algorithm, which improve the reliability of the DDHR architecture. To evaluate that reliability, this paper also proposes two metrics: (1) Dynamic available similarity. This metric does not rely on the overall similarity of the double-layer redundant executor sets but evaluates the similarity they exhibit under the specified interaction paths within a single scheduling cycle. The smaller the similarity, the higher the reliability. (2) Scheduling cycle under dual-layer similarity threshold. This metric evaluates the reliability of the RPSS under actual conditions by setting schedulable similarity thresholds within the same layer and between different layers of the dual-layer redundant executors during scheduling. Finally, simulation experiments and prototype system experiments are carried out. The numerical results show that the DDHR architecture outperforms the traditional DHR architecture in reliability and performance under different redundancy levels and dynamic available similarity thresholds. Its algorithmic complexity and multi-task concurrency performance are slightly worse than those of the DHR architecture, but it remains applicable to the main operations of the RPSS.

Keywords:

DHR; mimetic defense; reliability; railway system

1. Introduction

The railway passenger service system (RPSS), built on a cloud-edge architecture, is an essential part of China’s railway information network. It integrates various passenger services, including railway station broadcasting, guiding, and video [1]. The system architecture consists of three parts: the regional center node, the line center node, and the railway passenger station node. As depicted in Figure 1, the regional center node is equipped with a regional center data service cloud platform in redundant mode, which centrally calculates and answers the business requests (including control, management, data interaction, etc.) pooled at the station and ensures the accuracy and reliability of station system operations. Similarly, the line center node incorporates a line center management platform to aggregate and manage business data while exchanging data with the regional center node. At the railway passenger station node, several end systems for various services, including broadcasting, guiding, querying, and seeking assistance, are deployed and controlled.

The RPSS provides comprehensive information services to millions of passengers daily, covering station entry and exit procedures, transportation arrangements, and connections at thousands of high-speed railway stations nationwide, thereby safeguarding the orderly functioning of station passenger transportation. However, the RPSS is a crucial information infrastructure for China’s railways [2,3,4], and two trends threaten it: the growing number of heterogeneous end systems in the expanding RPSS and the emergence of more unknown attacks. Both can cause the service requests sent by station-end systems and devices to the regional center node to be corrupted or tampered with. This in turn leads to errors in the regional center data, which affects the reliability of service request responses for the entire regional center or for the wider system. These two problems are described in detail as follows:

(1): More heterogeneous end systems enlarge the attack surface [5] of the RPSS, which reduces its reliability. As the number and variety of end systems grow, so does the attack surface, which widens the range of possible system failures and causes maintenance delays that lower the overall reliability of the RPSS.
(2): More unknown and targeted attacks increase the risk that compromised station-end systems affect the computing services of the regional centers. APT attack programs and malicious destructive programs implanted by exploiting vulnerabilities in the station-end systems may disrupt the normal computation of the regional center service program by corrupting terminal data, leading to the failure of the regional node service, especially in the field of transportation [6,7,8,9].

At present, relevant research usually adopts active defense strategies to ensure the reliability of the RPSS. The Cyber Mimic Defense (CMD) technology [10] is one of the prevailing technologies that was initially proposed by Professor Wu in 2014. Its fundamental concept revolves around targeting unknown threats and vulnerable backdoors in cyberspace by leveraging the vulnerability of the attack chain. This is accomplished through the implementation of a dynamic heterogeneous redundancy (DHR) architecture, which integrates dynamic, heterogeneous, redundant, and closed-loop feedback elements into the system, consequently fortifying the system’s reliability [11,12].

In recent years, the research and application of DHR architecture have demonstrated its efficacy in ensuring the reliability of the system. Therefore, we need to ensure reliable operation by building a set of DHR architectures adapted to the RPSS.

It should be added that the DHR architecture does not replace other network security devices and strategies (e.g., protection against DDoS attacks), but rather assists other security products in improving system reliability. Therefore, the reliability goals of our research are as follows: (1) Design a redundancy architecture that guarantees stable operation of the system’s regional center nodes with higher reliability than the existing redundancy architecture of the RPSS. (2) Optimize the DHR architecture so that its reliability is better than that of the existing DHR architecture while adapting to the special resource environment of the RPSS and the actual engineering operation conditions.

However, there are at least two weaknesses with the common DHR architecture in practical engineering applications.

Weakness 1: The existing research primarily operates within an ideal state, neglecting the constraints imposed by limited heterogeneous redundant executor pools, specifically regarding the quantity and variety of heterogeneous redundant executor pools. In the resource environment of the RPSS, the regional center and the line center encounter limitations in terms of available heterogeneous redundant resources. These restrictions hinder the full utilization of the defense capabilities of DHR architecture.

Weakness 2: Existing research lacks studies on mimetic adjudication strategies as well as redundant executor scheduling strategies in the distributed deployment scenario of cloud-edge architecture. As a result, the traditional mimetic adjudication data processing cannot be effectively adapted to the cloud-edge architecture deployment model.

Therefore, this paper aims to address the aforementioned weaknesses by improving the DHR architecture. The innovative contributions of this paper include the following:

(a): We propose a double-layer DHR (DDHR) architecture adapted to the cloud-edge architecture of the RPSS. The DDHR architecture contains a data flow processing algorithm and a weighted mimetic ruling algorithm. The data flow processing algorithm adapts to the distributed deployment of the redundant executor sets and hides the characteristics of the upper-layer redundant executor set, which reduces the similarity of the system and improves its dynamics. The weighted mimetic ruling algorithm raises the upper limit on the number of redundant executors for which voting over the double-layer redundant executor set remains effective, while keeping the complexity of the voting algorithm minimal.
(b): We propose a reliability indicator, called dynamic available similarity, which is a metric specific to the DDHR architecture that evaluates the available heterogeneity of a double-layer redundant executor set against a single attack process under random interaction conditions. Unlike the traditional heterogeneity metrics of the DHR architecture, dynamic available similarity determines the heterogeneity faced by a single attack dynamically and in real time, according to the access restrictions between the upper and lower redundancy layers of the DDHR architecture.
(c): We propose a reliability indicator, called scheduling cycle under dual-layer similarity threshold conditions, which is another special metric to evaluate the dynamic performance of DDHR architecture in practical engineering applications when redundant entities are cleaned and scheduled offline. This metric expands on the traditional DHR architecture dynamic metrics by adding similarity limits between the same layer and different layers, to simulate the impact of redundancy offline, cleaning, and other situations on system reliability in real engineering scenarios.
(d): We design a prototype laboratory system for the DDHR architecture. By deploying heterogeneous systems and applications in a multi-virtual machine environment on a trial basis in the lab, we have realized the construction of the DDHR architecture, validation of its effectiveness, and verification of its multi-task concurrent performance.

The subsequent sections of this article are structured as follows.

(1): Related Work: Section 2 provides an introduction to the relevant references and prior studies in the field.
(2): DDHR Architecture and Process Design: in Section 3, the RPSS with DDHR architecture is modeled, and the process and algorithm designs are presented in pseudo-code.
(3): Numerical Modeling and Comparison: Section 4 focuses on performing numerical modeling of indicators specific to the DDHR architecture. Furthermore, a comparative analysis is conducted, comparing the results with those obtained from the DHR architecture.
(4): Experimental Simulation: Section 5 presents the details of the experimental simulations carried out to validate the proposed approach.
(5): Conclusion and Future Work: Section 6 concludes the article by summarizing the findings and contributions. Additionally, it outlines potential directions for future research.

2. Related Work

The core architecture of CMD, known as the DHR architecture [13], comprises various components such as input agents, executors, voters, policy schedulers, and heterogeneous redundant executor pools, as illustrated in Figure 2. The fundamental processing flow of the system is as follows. (1) Dynamic allocation of redundant executors: the scheduling module dynamically assigns redundant executors from the redundant executors pool to the processing module using a dynamic selection algorithm. (2) Forwarding of user-sent message status: the input agent forwards the message status sent by the user to different redundant executors within the processing module. (3) Processing and consistency decision-making: The redundant executors process the received requests and send them to the voting unit. The voting unit then makes a consistent decision and produces the output result. (4) Negative feedback and redundant executors rescheduling: In case any inconsistent rulings are detected during the decision-making process, the redundant executors responsible undergo the negative feedback mechanism. This feedback is transmitted back to the scheduling module, triggering the rescheduling of the redundant executors.
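The four-step flow above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited work: the executor pool is modeled as a dictionary of plain functions, random selection stands in for the dynamic selection algorithm, and the names `schedule` and `dhr_round` are hypothetical.

```python
import random
from collections import Counter


def schedule(pool, k, rng):
    """Step (1): pick k executors from the pool (random choice is a placeholder policy)."""
    return rng.sample(sorted(pool), k)


def dhr_round(pool, active, request, rng):
    """Run one DHR cycle: dispatch, vote, and reschedule dissenting executors."""
    # Step (2): the input agent forwards the same request to every active executor.
    outputs = {name: pool[name](request) for name in active}
    # Step (3): the voter outputs the most common result (majority consistency).
    winner, _ = Counter(outputs.values()).most_common(1)[0]
    # Step (4): negative feedback, executors that disagree are replaced from standby.
    dissenters = [name for name, out in outputs.items() if out != winner]
    standby = [name for name in sorted(pool) if name not in active]
    replacements = rng.sample(standby, min(len(dissenters), len(standby)))
    new_active = [name for name in active if name not in dissenters] + replacements
    return winner, dissenters, new_active


# Hypothetical pool: three healthy executors and one that returns a tampered result.
pool = {"e1": str.upper, "e2": str.upper, "e3": lambda r: "tampered", "e4": str.upper}
winner, dissenters, active = dhr_round(pool, ["e1", "e2", "e3"], "ping", random.Random(0))
```

In this run the tampered executor `e3` is outvoted, reported as a dissenter, and swapped for the standby executor `e4`.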

In recent years, numerous scholars have conducted research and analysis on mimic defense architecture, with a primary focus on the design, implementation, and optimization of mimic defense architecture, dynamic scheduling, and voting strategies.

(1): Regarding the construction of mimic defense architecture.

Ren et al. [14] introduced the mimic security resilient controller for SDN network frameworks to address their vulnerability to backdoor attacks. They designed a unified multi-controller network update request strategy to prevent attacks from a single malicious controller. Wu et al. [15] proposed an active defense development framework (ICS) for cloud-native environments. They utilized technologies such as multi-version assembly, multi-instance deployment, and diversified compilation to enhance system complexity and its ability to resist attacks. Wang et al. [16] introduced an IoT DHR architecture based on the double deep reinforcement learning network (DDQN). They trained and optimized scheduling and decision strategies using the DDQN network, enabling dynamic scheduling in container cloud environments driven by Kubernetes. Sepczuk [17] proposed a defense model that combines the DHR architecture with a WAF. The WAF establishes temporary redundant execution rules when possible HTTP attacks are detected, thereby improving its security. Li et al. [18] addressed the system and network security of connected automated vehicles (CAVs), constructed an intelligent DHR scheme suited to their characteristics using the CTMC model, and verified the feasibility of the architecture through simulation tests.

(2): In the context of mimic defense dynamic scheduling and voting strategy analysis and optimization.

Wei et al. [19] proposed a conditional probability voting algorithm (CPVA) based on heterogeneity to address isomorphism errors that may occur in voting algorithms. Chen et al. [20] tackled the nonlinear problem of component heterogeneity superposition and introduced a heterogeneous evaluation model based on the minimum L-order error probability. Liu et al. [21] presented a random seed and minimum similarity algorithm (RSMS) and introduced a comprehensive evaluation index based on the similarity characteristic distance of system components. Building upon Liu et al.’s work, Pu et al. [22] further proposed a redundant executors heterogeneous evaluation indicator based on the dual dimensions of space and time. They also introduced the PSPT redundant executors scheduling algorithm based on the time slice strategy. Tong et al. [23] proposed the concept of spatial and temporal hybrid diversity based on the redundancy of DHR architectures and proposed attack step (AL) and attack tolerance (AT) as security metrics to evaluate the security of architectures. Shi et al. [24] proposed an evolutionary DHR system. They solved the problem of the limited number of heterogeneous executors by adding evolutionary sub-strategies of executors. Finally, they verified the effectiveness of the proposed scheme by constructing a game model. Chen et al. [25] proposed a dynamic architecture evaluation method based on incomplete information game strategies, and further calculated and evaluated the benefits of both offense and defense through the Markov chain model to verify the security of the architecture. Li et al. [26] proposed a time threshold-based TIRTS scheduling algorithm, a task-based threshold TARTS scheduling algorithm, and the MQS multi-level queue scheduling algorithm that integrates time and task thresholds. Guo et al. [27] proposed a scheduling optimization strategy based on the sliding window model, which realized the dynamics of the DHR architecture by setting redundant scheduling feedback exception thresholds and time limits, and finally evaluated the algorithm through the Monte Carlo method. Zhu et al. [28] proposed a comprehensive scheduling algorithm (HHAC) based on high-order heterogeneity and adaptive historical confidence to optimize the dynamic strategy of DHR architecture, and they further analyzed the dynamic indicator among the CRS, TIRTS, RSMS, and HHAC algorithms. Shao et al. [29] proposed a dynamic scheduling algorithm (HCDC) based on historical credibility and K-Means heterogeneous clustering and, through simulation experiments, compared the HCDC algorithm with the RS, MD, and OMD algorithms on the rate of successful attacks on the system and other indicators.

In recent years, extensive research has been conducted on the design and application of DHR architecture in various fields such as routers, servers, and cloud computing. The feasibility and security of DHR architecture have been verified, and progress has been made in optimizing the model scheduling and decision-making processes. Evaluation indicators have evolved from single, static indicators to dynamic composite indicators that consider both reliability and cost. Furthermore, the continuous optimization of the architecture has led to the development of scheduling and decision-making strategies from single-index threshold scheduling decisions to comprehensive scheduling decisions that consider high-order heterogeneity and security. A comparison of the various types of metrics and optimization directions for the DHR architecture in the relevant literature is shown in Table 1.

3. DDHR Architecture Design

In this section, the design idea of the DDHR architecture is first introduced in Section 3.1. Section 3.2 then presents the logical architecture of the DDHR architecture in the context of the RPSS, and Section 3.3 describes its engineering implementation. Finally, Section 3.4 designs the data processing flow of the DDHR architecture, proposes a decision weight setting strategy adapted to the upper and lower redundant executor sets, and concludes with the pseudo-code for the data processing algorithm.

3.1. DDHR Architecture Design Idea

The main defense strategies of the DHR architecture against attacks can be summarized as follows: (1) using double-layer heterogeneous, equivalent redundant computing nodes together with a joint decision-making strategy, so that an attacker cannot compromise enough computing nodes within a limited time to cause a system error; (2) enabling the entire system to quickly clean and replace a faulty redundant executor while it is running. Building on these strategies, the core of the DDHR architecture proposed in this paper is the design of methods and strategies that further consider the limited resources of the redundant pools and the reliability enhancement offered by two-layer cooperative computing.

This section presents the schematic design idea for the DDHR architecture, as depicted in Figure 3.

(1): Layering of redundant resources to improve system dynamics under resource-limited conditions.

By converting the 2n centrally deployed heterogeneous redundant executors of the traditional DHR architecture into a distributed deployment, they are dispersed to the upper and lower layers. This allows each layer of the redundant executors pool to have enough resources for the dynamic scheduling of redundant executors. Additionally, the results are integrated through collaborative computation, computation weight optimization, and unified voting. Consequently, this approach effectively addresses the issue of the insufficient redundant scheduling capability in the DHR architecture within the constraints of a limited redundant executors pool.

(2): Access Hiding to Upper Redundant Resources to Improve System Heterogeneity under Two-layer Computing Conditions.

Through the establishment of a scheduling gateway, the isolation and management of communication within the heterogeneous redundancy layers are achieved. Specifically, the lower-layer redundant executors are fully accessible to the input agent, whereas the upper-layer redundant executors and heterogeneous redundant executor pools remain inaccessible. Only limited individual communication is permitted through the scheduling gateway. This approach allows the system similarity to be further constrained between the upper and lower layers of randomly connected redundancies, thus improving the heterogeneity of the system under two-layer computational conditions.

In summary, we can enhance the dynamics and heterogeneity of the traditional DHR architecture through modification of the DDHR architecture, so as to further improve its reliability, under the condition of limited heterogeneous redundant resources that can be dispatched by different center nodes of the RPSS.

3.2. Design of DDHR Logical Architecture for RPSS

The DDHR logical architecture can be designed based on the following six modules: processing module, scheduling gateway, scheduling module, redundant executors pool, input module, and output module. These modules collectively form the core components of the architecture. The specific operational functions of each module are depicted in Figure 4.

(1): Processing Module:

The processing module in the DDHR logical architecture consists of two parts: the lower processing module and the upper processing module. Each module contains different redundant executors obtained from the redundant executors pool by the scheduling module. The lower and upper redundant executors are isolated from each other to prevent unauthorized access.

Lower-layer processing module: This module comprises m redundant executors and is primarily responsible for receiving message requests from the input agent. It parses and calculates these requests using the m redundant executors and forwards them to the upper-layer processing module through the scheduling gateway routing.

Upper-layer processing module: This module consists of n redundant executors (n ≥ m). It is not directly connected to the input module but is linked only to the authorized lower-layer processing module through the scheduling gateway. The functions of the upper-layer processing module include the following: (1) Receiving the input agent’s message requests, as forwarded by the scheduling gateway, and performing parsing and calculations. (2) Receiving calculation results from the lower-layer processing module (m redundant executors) and performing comparison calculations. (3) Sending the calculation results from both the lower and upper-layer processing modules to the voting unit for output judgment.

(2): Scheduling Gateway:

The scheduling gateway is a newly introduced module in the DDHR architecture. Its main functions are as follows: (1) During the system initialization phase, it generates and stores sets of m redundant executors for the lower-layer processing module and n redundant executors for the upper-layer processing module based on the algorithm strategy of the scheduling module. It also establishes routing access lists between these sets. (2) It forwards message requests sent by the input agent. (3) Upon request, it provides the routing access strategy between the redundant executors of the upper-layer processing module and the lower-layer processing module.

(3): Scheduling Module:

The scheduling module consists of an upper-layer scheduling module and a lower-layer scheduling module. Their responsibilities include generating and storing the heterogeneous scheduling algorithm of the DDHR architecture and, based on that algorithm, publishing resource combinations from the upper- and lower-layer redundant pools to the corresponding processing modules and scheduling gateways.

(4): Redundant executors pool:

The upper and lower redundant executor pools serve as storage for their respective redundant components. Their primary functions encompass two aspects: (1) enabling the configuration of corresponding heterogeneous components based on the requirements of the scheduling module, and (2) facilitating the repair and cleansing of faulty or high-risk heterogeneous components.

(5): Input Module:

The input module in the DDHR architecture is primarily responsible for sending message requests to the lower-layer processing module. It also sends these message requests to the scheduling gateway.

(6): Output Module:

The output module is responsible for unified decision-making and processing of the calculation results from both the upper-layer processing module and the lower-layer processing module. It uniformly sends decision-making requests to the regional center access server for business distribution processing.
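The access hiding performed by the scheduling gateway in module (2) can be illustrated with a small sketch. It assumes the routing access list is a simple mapping from each lower-layer executor to the upper-layer executors it may contact; the class name `SchedulingGateway` and its methods are hypothetical and not taken from the paper.

```python
class SchedulingGateway:
    """Toy gateway: only authorized lower-layer executors can learn upper-layer routes."""

    def __init__(self, access_list):
        # access_list maps a lower-layer executor to its permitted upper-layer executors.
        self._access_list = {low: tuple(ups) for low, ups in access_list.items()}

    def upper_targets(self, requester):
        """Return the upper-layer executors visible to an authorized lower-layer executor."""
        if requester not in self._access_list:
            raise PermissionError(f"{requester} is not an authorized lower-layer executor")
        return self._access_list[requester]

    def forward(self, requester, message):
        """Forward the input agent's message only along the requester's permitted routes."""
        return {upper: message for upper in self.upper_targets(requester)}


# Hypothetical association: d1 may reach u1 and u3, d2 may reach u2.
gateway = SchedulingGateway({"d1": ["u1", "u3"], "d2": ["u2"]})
routes_for_d1 = gateway.upper_targets("d1")
```

Any caller that is not a registered lower-layer executor, such as the input agent itself, gets a `PermissionError` instead of a route, which mirrors the rule that the upper layer is never directly reachable.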

3.3. Engineering Architecture Design

According to the aforementioned logical architecture, the DDHR architecture for the RPSS can be constructed by leveraging the existing resource transformation of the private cloud platform in the regional center and the line center management platform. This is illustrated in Figure 5.

(1): The Line Center Node transformation

The resources of the line center node depicted in Figure 1 are transformed through virtualization. This transformation involves deploying the message input agent service at the message aggregation entrance node. Additionally, Docker containers and virtual machines (VMs) are used to integrate the available components, including CPUs and operating systems. Various types of redundant heterogeneous resources (such as JavaWeb applications, MQ applications, Redis applications, etc.) are also integrated. The platform employs Kubernetes and Nginx for unified scheduling and establishes the redundant executors pool, the lower-layer redundant executors, and the scheduling gateways.

(2): The Regional center Node transformation

Given that the private cloud platform at the regional center is already deployed in a virtualized manner, the transformation primarily focuses on the deployed applications. Specifically, in the access service cluster of the regional center illustrated in Figure 1, the CPU resources, operating system resources, and various types of redundant heterogeneous resources for message processing applications are scheduled and deployed. Moreover, the upper-layer redundant executors and unified voting services are constructed on the regional center node.

3.4. DDHR Architecture Data Processing Design

The relevant symbols in this paper are shown in Table 2.

The system business process encompasses several key stages: system initialization, message request submission by the client agent, execution by the lower-layer redundant executors, operation of the scheduling gateway, execution by the upper-layer redundant executors, and mimic voting. The working flow is shown in Figure 6. In the figure, “...” indicates that the content of each redundant executor’s calculation result is omitted.

(1): System Initialization

Assume that the lower-layer redundant executors pool contains a total of M redundant executors and the upper-layer redundant executors pool contains a total of N redundant executors. During the initial phase of the system, the scheduling module uses the loaded mimic scheduling algorithm to select redundant executors from the lower-layer and upper-layer redundant executor pools. This selection forms a lower-layer redundant executors set (D) comprising m redundant executors and an upper-layer redundant executors set (U) comprising n redundant executors (n ≥ m). Concurrently, the scheduling module establishes the association between the m lower-layer redundant executors and the n upper-layer redundant executors based on the scheduling algorithm. This association information is then transmitted to the scheduling gateway to generate the corresponding routing policy.
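A minimal sketch of this initialization step follows. The paper does not specify the mimic scheduling algorithm here, so the sketch assumes uniform random selection and a round-robin association between layers; both choices, and the function name `initialize`, are placeholders for illustration only.

```python
import random


def initialize(lower_pool, upper_pool, m, n, rng):
    """Select D (m executors) and U (n executors) and link each upper executor to a lower one."""
    if not (m <= n and m <= len(lower_pool) and n <= len(upper_pool)):
        raise ValueError("need m <= n, m <= M and n <= N")
    lower_set = rng.sample(lower_pool, m)  # D, drawn from the M lower-layer executors
    upper_set = rng.sample(upper_pool, n)  # U, drawn from the N upper-layer executors
    # Round-robin association: since n >= m, every lower executor gets at least one route.
    routes = {low: [] for low in lower_set}
    for j, up in enumerate(upper_set):
        routes[lower_set[j % m]].append(up)
    return lower_set, upper_set, routes


# Hypothetical pools with M = 6 and N = 8 executors.
lower_pool = [f"d{i}" for i in range(6)]
upper_pool = [f"u{i}" for i in range(8)]
D, U, routes = initialize(lower_pool, upper_pool, m=2, n=3, rng=random.Random(42))
```

The returned `routes` mapping is the association information that would be handed to the scheduling gateway to build its routing policy.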

(2): Message Request Submission by the Client Agent

When the terminal sends business requests, the client agent encapsulates them as the message inf and transmits it to both the lower-layer redundant executors and the scheduling gateway.

(3): Execution by the Lower-layer Redundant executors

The m lower-layer redundant executors parse and calculate the message inf transmitted by the client agent to generate the lower-layer execution result set $D = \{k_{i} \mid k_{1}, k_{2}, \dots, k_{m}\}$. Subsequently, the lower-layer redundant executors request from the scheduling gateway the access list of upper-layer redundant executors, establish a secure link, and transmit $k_{i}$ to the respective upper-layer redundant executors.

(4): Operation of the Scheduling Gateway

The scheduling gateway stores and provides the appropriate access list of upper-layer redundant executors in response to requests from lower-layer redundant executors. Simultaneously, it forwards the message inf sent by the client agent to the corresponding upper-layer redundant executors based on the access request.

(5): Execution by the Upper-layer Redundant Executors

The upper-layer redundant executors set U receives the link request from the lower-layer redundant executors set, receives the result set D, and performs secondary calculations to obtain the result set $KV = \{\langle k_{i}, v_{i} \rangle \mid \langle k_{1}, v_{1} \rangle, \langle k_{2}, v_{2} \rangle, \dots, \langle k_{m}, v_{m} \rangle, \langle k_{m}, v_{n} \rangle\}$. Simultaneously, it performs calculations on the forwarded message inf from the scheduling gateway, resulting in the result set $K'V' = \{\langle k'_{i}, v'_{i} \rangle \mid \langle k'_{1}, v'_{1} \rangle, \langle k'_{2}, v'_{2} \rangle, \dots, \langle k'_{n}, v'_{n} \rangle\}$. Subsequently, the upper-layer redundant executors set combines and stores the execution results KV and K′V′ obtained from the lower-layer redundant executors and sends them to the voting machine for decision-making.
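Stages (3) to (5) can be sketched as follows. The sketch rests on an interpretation that the text does not spell out: each lower-layer executor is modeled as a single parsing function that yields k, and each upper-layer executor as a hypothetical `(parse, compute)` pair, so that KV holds the pairs computed from the lower-layer results and K′V′ holds the pairs computed directly from inf.

```python
def run_layers(inf, lower, upper, routes):
    """Return (D, KV, K'V') for one message.

    lower:  name -> parse function producing k from inf
    upper:  name -> (parse, compute) pair, compute producing v from k
    routes: lower name -> list of upper names (the gateway's access list)
    """
    # Stage (3): every lower-layer executor computes its result k_i from inf.
    d = {low: parse(inf) for low, parse in lower.items()}
    kv, kv_prime = [], []
    for low, ups in routes.items():
        for up in ups:
            parse, compute = upper[up]
            # Stage (5a): secondary calculation on the k received from the lower layer.
            kv.append((d[low], compute(d[low])))
            # Stage (5b): independent calculation on inf forwarded by the gateway.
            k_prime = parse(inf)
            kv_prime.append((k_prime, compute(k_prime)))
    return d, kv, kv_prime


# Hypothetical executors: d2 has been compromised and returns a tampered key.
lower = {"d1": str.strip, "d2": lambda s: "tampered"}
upper = {name: (str.strip, str.upper) for name in ("u1", "u2", "u3")}
routes = {"d1": ["u1", "u3"], "d2": ["u2"]}
d, kv, kv_prime = run_layers(" open gate ", lower, upper, routes)
```

Here the pair derived from the compromised `d2` shows up as an inconsistency inside KV, while K′V′ stays uniform because the upper layer recomputed everything from the original message.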

(6): Mimic Voting

Voting Weight Setting:

In the general dynamic heterogeneous redundancy (DHR) architecture design, all redundant executors are initially set as equivalent. The decision-making process follows a mimic approach based on the principle of majority consistency. This means that in a redundant system with n redundant executors, if the ruling result is incorrect, it is necessary to wait until the number of inconsistencies among the redundant executors is greater than or equal to

[(n + 1) / 2]

. In references [15,16], different discrimination algorithms have been proposed to determine the weights assigned to the judgments of redundant executors. However, these algorithms rely on prior attributes of the redundant executors, including their heterogeneous attributes and historical decisions, so their complexity reaches

O (n^{2})

. Taking into account the characteristics of the double-layer redundant executors connection calculation in the DDHR architecture, this article introduces the concept of a voting weight index, denoted as w. It establishes a relationship between the weights w_down and w_up assigned to the calculation results of the lower and upper redundant executors, respectively, and the final decision result

V (w)

according to Formula (1):

\begin{array}{l} V (w) = V (w_{down}) + V (w_{up}) \\ \text{s.t. } \begin{cases} w_{down} = - w \\ w_{up} = w + ε \\ ε ≪ w, w > 0 \end{cases} \end{array}

(1)

Here, the voting weight w_up of the upper-layer redundant executors is the sum of w and a parameter

ε

, which is significantly smaller than w. This ensures that the overall decision-making result remains consistent with the majority while not significantly impacting the majority. In this case, if the ruling result is incorrect, it requires a minimum of

[n / 2] + 1

inconsistencies among the redundant executors for equivalent redundancy. It can be observed that

[n / 2] + 1 > [(n + 1) / 2]

, indicating that the system can still operate normally even after all the lower-layer redundant executors are attacked. This enhances the reliability of the redundant system. Additionally, the algorithm complexity is of a constant order

O (1)

, which is smaller than that described in references [15,16]. This makes it more convenient for practical application.

Voting strategy:

The voting machine first verifies the consistency of the entire KV set. If consistent, the result will be output directly based on the KV set; otherwise, the KV set and the K′V′ result set will be compared with the majority consistency ruling under the ruling weight condition to obtain the final result output.

The simulation code for the entire data processing algorithm is shown in Algorithm 1:

Algorithm 1. Data processing algorithm simulation code

(1) Initialization
Input: redundant executors pool, upper-layer redundant executors set, lower-layer redundant executors set, redundant executors routing link list GList(), scheduling algorithm F
Function Init
   U, D, GList() ← ϕ
   U, D ← F(A)
   GList() ← F(U, D)
End
(2) Business process
Input: message inf, upper-layer redundant executors set, upper-layer isomer set redundancy n, lower-layer isomer set D, lower-layer isomer set redundancy m, scheduling gateway storage message set G, upper and lower isomer link list GList(), data processing result set K, V
Output: Mimic verdict result rs
Function mimicjob
   Init()
   for i in D do
     // The lower redundant executors process the message inf to obtain k
     k[i] = D[i].job(inf)
     if k[i] != null then
       G ← GList(k[i], inf)
       D ← k[i]
     endif
   endfor
   for j in U do
     // The upper redundant executors process K to obtain <k, v>
     <k[j], v[j]> = U[j].job(D)
     KV ← <k[j], v[j]>
     // The upper redundant executors process the message inf and obtains <k′, v′>
     <k′[j], v′[j]> = U[j].job(G)
     K′V′ ← <k′[j], v′[j]>
   endfor
   if count(KV ∩ D) == n then
     rs ← KV
   else
     // The upper and lower-layer redundant executors output results
     if samecount(KV ∪ K′V′) > [(n + 1) / 2] then
       rs ← KV ∩ K′V′
     elseif samecount(KV ∪ K′V′) == [(n + 1) / 2] then
       rs ← KV
     else
       rs ← ϕ
     endif
   endif
   return rs
end
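
As an illustration only, the ruling branch of Algorithm 1 can be sketched in Python. The sketch assumes that each executor result is a hashable value, that samecount means the size of the largest group of identical results in KV ∪ K′V′, and that [(n + 1) / 2] denotes integer (floor) division. The function name mimic_ruling and these readings are assumptions made for the example, not details confirmed by the paper.

```python
from collections import Counter


def mimic_ruling(kv, kv_prime):
    """Return the ruled result, or None when no ruling is possible.

    kv       -- results the upper layer computed from the lower-layer outputs
    kv_prime -- results the upper layer computed from the gateway-forwarded message
    """
    n = len(kv)
    if n == 0:
        return None
    # Step 1: if the whole KV set is consistent, output it directly.
    if len(set(kv)) == 1:
        return kv[0]
    # Step 2: majority ruling over the combined results
    # (assumed reading of samecount in Algorithm 1).
    threshold = (n + 1) // 2
    value, same = Counter(list(kv) + list(kv_prime)).most_common(1)[0]
    if same > threshold:
        return value
    if same == threshold:
        # Exactly at the threshold: fall back to the most common KV result.
        return Counter(kv).most_common(1)[0][0]
    return None


if __name__ == "__main__":
    print(mimic_ruling(["ok", "ok", "ok"], ["ok", "bad", "ok"]))
    print(mimic_ruling(["ok", "bad", "ok"], ["ok", "ok", "worse"]))
    print(mimic_ruling(["a", "b", "c"], ["d", "e", "f"]))
```

Returning None corresponds to the empty result ϕ in Algorithm 1, which a caller would treat as a failed ruling.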

4. DDHR Architecture Analysis

Besides the system’s inherent reliability indicators, dynamics and heterogeneity are two key factors that directly determine the efficacy of the DDHR architecture. Greater heterogeneity reduces the probabilities of system evasion or successful attacks, and higher dynamism in the heterogeneous scheduling reduces the probability of a successful attack even further. Accordingly, this paper employs numerical modeling to examine the system’s dynamics and heterogeneity, and the failure probability of the system itself, followed by a comparative analysis of the DHR architecture and DDHR architecture.

The analysis below rests on the following assumptions:

Assumption 1.

The attacker’s attack on the same layer of the redundant executors set is randomized and there is no a priori situation.

Assumption 2.

The upper and lower redundant executor pools have the same resources and the upper and lower redundant executor sets have the same redundancy.

Assumption 3.

Each redundant executor in the pool of the upper redundant executors has only a unique corresponding associated lower redundant executor.

Assumption 4.

The attacker can accumulate experience across attacks, but there is no attack escape scenario.

4.1. System Heterogeneity Analysis

System heterogeneity is, in fact, an aggregate function of the heterogeneity exhibited by each redundant component within the system. For instance, assuming a redundant executor a comprises k distinct components, denoted as

a_{i}, a_{j}

, each possessing heterogeneity feature vectors represented by

l^{i j} = {{l^{i j}}_{1} \dots {l^{i j}}_{n}}

, the heterogeneity of the redundant executors can be mathematically described by the function

h (l^{i j})

applied to the component heterogeneity feature vectors

l^{i j}

of the redundant executors. In previous studies [21], it has been observed that computer systems exhibit widespread dissimilarity, characterized by diverse forms that make it challenging to establish a precise definition of “complete” dissimilarity. However, when considering its dual problem, similarity tends to converge, thereby allowing the possibility of defining a “complete” isomorphism. Consequently, in this study, we undertake a comparative analysis of system isomorphism by examining the similarity of redundant executors. Additionally, we introduce the concept of “instantly available similarity” to evaluate the system’s performance within the context of the distinctive features of the DDHR architecture.

Let us consider an n-dimensional redundant system

A = {a_{1} \dots a_{n}}

, where s_ij represents the similarity between two redundancies, namely

a_{i}, a_{j}

. In this context, the existence similarity matrix for the redundant system A can be mathematically formulated as follows:

S = (\begin{matrix} 1 & \dots & s_{1 n} \\ ⋮ & ⋱ & ⋮ \\ s_{n 1} & \dots & 1 \end{matrix})

(2)

In the case where

i = j

, indicating a comparison of a redundant executor with itself, the similarity value s_ij is set to “1”. However, when

i \neq j

, the redundant executors are similar but not exactly equivalent, and the similarity value satisfies

s_{i j} \in [0, 1)

. The matrix S is a real symmetric matrix of order n, where all diagonal elements are equal to 1. The corresponding normalized mathematical expression for similarity, as reported in [21], is as follows:

S |_{A^{n}} = \frac{1}{C_{n}^{2}} \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} s_{i j}

(3)
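
To make Equation (3) concrete, the following minimal sketch averages the entries above the diagonal of a similarity matrix, which is the normalized similarity divided by C(n, 2). The function name average_similarity and the 3 × 3 example values are hypothetical and chosen only for illustration.

```python
def average_similarity(S):
    """Mean of the strictly upper-triangular entries of a square similarity matrix."""
    n = len(S)
    if n < 2:
        raise ValueError("need at least two redundant executors")
    pairs = n * (n - 1) // 2  # C(n, 2), the number of unordered executor pairs
    total = sum(S[i][j] for i in range(n - 1) for j in range(i + 1, n))
    return total / pairs


if __name__ == "__main__":
    # Hypothetical symmetric similarity matrix with ones on the diagonal.
    S = [
        [1.0, 0.2, 0.4],
        [0.2, 1.0, 0.3],
        [0.4, 0.3, 1.0],
    ]
    print(average_similarity(S))
```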

Consider a set of redundant executors scheduling schemes denoted as

F = {f_{i} | f_{1} \dots f_{m}}

, each with a fixed redundancy n. In the scenario where all redundancies within the schemes are equivalent and externally exposed, and each scheme

f_{i}

responds to an attack at moment

Δ t

, the system’s reliability can be effectively correlated with the instantaneous available similarity matrix S_fi. Notably, this similarity matrix can be represented by the similarity matrix of the redundant system itself, expressed as follows:

S_{f i} = S = (\begin{matrix} 1 & \dots & s_{1 n} \\ ⋮ & ⋱ & ⋮ \\ s_{n 1} & \dots & 1 \end{matrix})

(4)

Regarding the DDHR architecture, the reachability between the upper and lower layers of redundant executors is constrained by the scheduling gateway. Consequently, the scheme

f_{i}

can effectively correlate the system’s reliability by utilizing the similarity matrix S_fi, and can be represented as

S \times l Δ t

in response to an attack at moment

Δ t

. In this context,

l Δ t

represents the instantaneous reliability correlation vector matrix at moment

Δ t

, which is of the same order as S_fi. The elements of the matrix are assigned a value of “1” if reachability is present, and “0” if reachability is absent. It is important to note that when two redundancies are not reachable, their similarity is considered to be “0” (Note: the matrix element s_ij denotes similarity, not the reachability path). Based on this, the instantaneous usable similarity matrix S′_fi for the redundant system under the DDHR architecture can be represented as follows:

{S^{'}}_{f i} = S \times l Δ t = (\begin{matrix} 1 & s_{12} & s_{13} & \dots & s_{1 i} & 0 & \dots & 0 \\ s_{21} & 1 & s_{23} & \dots & 0 & s_{2 (i + 1)} & \dots & 0 \\ s_{31} & s_{32} & 1 & \dots & 0 & 0 & s_{3 (i + 2)} & 0 \\ ⋮ & ⋮ & ⋮ & ⋱ & ⋮ & ⋮ & ⋮ & ⋮ \\ s_{i 1} & 0 & 0 & \dots & 1 & s_{i (i + 1)} & s_{i (i + 2)} & s_{i n} \\ ⋮ & ⋮ & ⋮ & \dots & ⋮ & ⋮ & ⋮ & ⋮ \\ 0 & \dots & 0 & \dots & s_{n i} & s_{n (i + 1)} & \dots & 1 \end{matrix})

(5)

In real environments, achieving extensive complete heterogeneity is nearly impossible. Consequently, the non-zero off-diagonal elements of S′_fi coincide with the corresponding elements of S_fi, while the elements for unreachable pairs are zero. As a result, based on Equation (3), for a heterogeneous executor set within the DDHR architecture with the same redundancy, its similarity is lower than that of the heterogeneous executor set within the traditional DHR architecture. In other words,

{S^{'}}_{f i} |_{A^{n}} < S_{f i} |_{A^{n}}

.
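
The effect of the reachability constraint can be illustrated with a small sketch. It treats the product in Equation (5) as an element-wise product of the similarity matrix with a 0/1 reachability matrix, which is an assumed reading of the notation, and then compares the average similarity of Equation (3) before and after masking. The four-executor layout (executors 0 and 1 in the lower layer, 2 and 3 in the upper layer, linked 0–2 and 1–3) and all numeric values are hypothetical.

```python
def average_similarity(S):
    """Mean of the strictly upper-triangular entries, as in Equation (3)."""
    n = len(S)
    pairs = n * (n - 1) // 2
    return sum(S[i][j] for i in range(n - 1) for j in range(i + 1, n)) / pairs


def masked_similarity(S, reach):
    """Element-wise product of S with a 0/1 reachability matrix; the diagonal stays 1."""
    n = len(S)
    return [
        [1.0 if i == j else S[i][j] * reach[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical similarity matrix for four executors.
S = [
    [1.0, 0.3, 0.2, 0.4],
    [0.3, 1.0, 0.5, 0.1],
    [0.2, 0.5, 1.0, 0.25],
    [0.4, 0.1, 0.25, 1.0],
]
# Same-layer pairs are reachable; across layers only the linked pairs 0-2 and 1-3 are.
reach = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 1, 1],
]

if __name__ == "__main__":
    S_masked = masked_similarity(S, reach)
    print(average_similarity(S), average_similarity(S_masked))
```

Because masking can only replace entries with zero, the masked average never exceeds the unmasked one, which matches the comparison stated above.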

4.2. System Dynamics Analysis

The dynamics of the DHR architecture are typically assessed by evaluating the scheduling cycle of the set of redundant executors. Meanwhile, the scheduling cycle of a computing system can be approximated by calculating the number of schedulable schemes without considering the scheduling time of the redundant executors or the complexity condition. In recent years, studies focusing on DHR scheduling algorithms, such as references [22,23,28,29], have emphasized the balanced optimization of scheduling scheme heterogeneity and dynamics. This means that while ensuring the desired level of heterogeneity, efforts are made to enhance system dynamics by introducing additional parameters, optimizing redundant executor finding strategies, and exploring other approaches. Among these algorithms, the stochastic scheduling algorithm stands out for its high dynamics and lower computational complexity, disregarding other reliability-related indicators such as heterogeneity. Consequently, this paper takes a comprehensive system perspective to analyze the dynamics of the two architectural models under the same stochastic scheduling algorithm.

Definition 1.

The sum of non-duplicated scheduling schemes in the entire redundant executors pool represents the number of schedulable schemes for a redundant system.

Let us consider a redundant system A with N redundant executors in the redundant executors pool. The number of scheduling schemes T can be mathematically expressed as follows for the set of single-tier redundancies with a redundancy r, assuming that all redundant executors in the pool satisfy the scheduling condition:

T = C_{N}^{r} = \frac{N!}{r! (N - r)!}

(6)
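
As a quick numerical check of Equation (6), the sketch below evaluates the binomial coefficient with Python's standard library. The pool size of 12 is taken from the simulation setup in Section 5; the function name scheme_count is a label chosen for this example.

```python
import math


def scheme_count(pool_size, redundancy):
    """Number of distinct scheduling schemes T = C(N, r) from Equation (6)."""
    return math.comb(pool_size, redundancy)


if __name__ == "__main__":
    for r in (6, 8):
        print(r, scheme_count(12, r))  # prints "6 924" and then "8 495"
```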

In the absence of any restrictions on the scheduling of redundant executors within the redundant system, each redundant executor can be assumed to be identical in the absence of predefined randomized attack conditions. In this case, the number of scheduling schemes can be formulated similarly as follows:

T^{'} = C_{N}^{r} = \frac{N!}{r! (N - r)!}

(7)

It can be concluded that in the absence of any restrictions, both the DHR redundant executors set and DDHR redundant executors set modes of scheduling have the same number of possible scenarios.

In the actual deployment environment, there are situations where a redundant executor requires offline updates. In such cases, the heterogeneity characteristics of the updated redundant executors may not meet the operational requirements of the system. This discrepancy is defined in this paper as the similarity threshold, denoted as s. Considering the limitation imposed by the similarity threshold, the number of aggregate scheduling schemes for the DDHR redundant executors is controlled by two parameters: the intra-layer similarity threshold s_lim and the inter-layer similarity threshold s_limud. On the other hand, the number of aggregate scheduling schemes for the DHR redundant executors is controlled by a similarity threshold s′_lim. The system makes the following assumptions:

Assumption 5.

The similarity thresholds for the two-layer redundant executors pool in the DDHR architecture and the redundant executors pool in the DHR architecture are equal, i.e.,

{s^{'}}_{\lim} = s_{\lim}

. Additionally, for the inter-layer similarity threshold in the DDHR architecture,

s_{\lim u d} \leq s_{\lim}

.

Assumption 6.

There exists a function, denoted as

y = y (N, s_{\lim})

, which maps the total number of redundant executors N in the redundant executors pool and the similarity threshold s_lim to a unique per-layer redundant executors set y with a count (

c o u n t_{y}

).

Assumption 7.

Similar to Assumption 6, there exists a function, denoted as

y^{'} = y (N, s_{\lim u d})

, which maps the similarity threshold s_limud to the set of redundant executors with access restriction between layers. The number of redundant executors in this set is denoted as

c o u n t_{y^{'}}

.

Assumption 8.

We consider only the case where the set of redundant executors with access restriction between layers, denoted as y′, is a subset of the per-layer redundant executors set y, i.e.,

y^{'} \subseteq y

.

The number of scheduling schemes for the DHR architecture, i.e., the number of schedulable redundant executors that satisfy the similarity threshold s_lim and form the set y, can be expressed as follows:

\begin{matrix} T = \frac{count_{y}!}{r! (count_{y} - r)!} \\ \text{s.t. } y \subseteq A \end{matrix}

(8)

The number of scheduling scenarios for the DDHR architecture is determined by the number of redundancies that satisfy the requirement of having the lower-layer redundant executor in the set and the associated upper-layer redundant executor also in the set. Mathematically, this can be expressed as follows:

\begin{matrix} T^{'} = C_{count_{y}}^{r / 2} \cdot C_{count_{y^{'}}}^{r / 2} = \frac{count_{y}!}{(r / 2)! (count_{y} - r / 2)!} \cdot \frac{count_{y^{'}}!}{(r / 2)! (count_{y^{'}} - r / 2)!} \\ \text{s.t. } y^{'} \subseteq y \subseteq A \end{matrix}

(9)
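
Equations (8) and (9) can be compared numerically with a short sketch. The function names and the example counts (10 executors satisfying the intra-layer threshold, 8 of them also satisfying the inter-layer threshold) are hypothetical values chosen for illustration, not results from the paper.

```python
import math


def dhr_schemes(count_y, r):
    """Equation (8): choose r executors from the count_y that satisfy s_lim."""
    return math.comb(count_y, r)


def ddhr_schemes(count_y, count_y_prime, r):
    """Equation (9): choose r/2 executors per layer, one layer limited by count_y'."""
    if r % 2 != 0:
        raise ValueError("the total redundancy r must be even for two equal layers")
    half = r // 2
    return math.comb(count_y, half) * math.comb(count_y_prime, half)


if __name__ == "__main__":
    print(dhr_schemes(10, 6))      # 210
    print(ddhr_schemes(10, 8, 6))  # 120 * 56 = 6720
```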

4.3. System Failure Probability Analysis

In this paper, we analyze the failure probability model of a redundant system A designed with the DDHR architecture when subjected to attacks. The attacker can select any lower-layer redundant executor to attack. After completing the attack, if successful, the attacker can randomly choose either the corresponding redundancy in the same layer or the upper redundant executors to attack. If the attack fails, the attacker will continue to randomly select lower redundant executors to attack.

When the system A fails, meaning that h redundant executors have been successfully attacked, k of them are lower-layer redundant executors and h − k are upper-layer redundant executors. Let us denote the probability of the i-th lower-layer redundant executor d_i being attacked successfully as p_i. Then, the probability that the upper-layer redundant executor u_i corresponding to d_i is attacked successfully is denoted as

p (u_{i} | d_{i})

, and the probability that it is not attacked successfully is denoted as

p (\bar{u_{i}} | d_{i})

. Additionally, the probability that any other lower-layer redundant executor d_j is not attacked successfully, given that d_i has been attacked successfully, is denoted as

p ({\bar{d}}_{j} | d_{i})

. The average failure probability P of the entire redundant system A can be expressed as follows:

P = \frac{1}{r - h + 1} (\sum_{h = [\frac{r}{2}] + 1}^{r} \sum_{k = [\frac{r}{4}] + 1}^{r / 2} (\prod_{\begin{array}{l} i = 1 \\ j = i \end{array}}^{k} p (d_{j} | d_{i}) \prod_{\begin{array}{l} i = 1 \\ l \neq i \end{array}}^{h - k} p (\bar{d_{l}} | d_{i}) \prod_{m = 1}^{k} p (u_{m} | d_{m}) \prod_{n = 1}^{h - k} p (\bar{u_{n}} | d_{n})))

(10)

Furthermore, assuming that all failure factors of a single redundant executor are influenced by the attacks on redundant executors in the same layer or the upper and lower layers with corresponding relationships, we introduce two similarity functions:

g (s_{i j})

and

f (s_{i j})

. These functions map the similarity and failure rate of a single layer, as well as the similarity and failure rate between the two layers, respectively. The conditional probabilities in the previous equations can be expressed as the product of the probability that the redundant executors themselves fail due to an attack and the probability of the redundant executors themselves being attacked, considering the similarity functions

g (s_{i j})

or

f (s_{i j})

. For the sake of convenience in research, we can map the failure rate to the average similarity rate. Let

g (\bar{s})

denote the average similarity correlation function of redundancies in the same layer, and

f (\bar{s})

denote the average similarity correlation function of redundancies in different layers. With this mapping, the expression function for the failure probability P of the redundant system A can be rewritten as follows:

P = \frac{1}{r - h + 1} (\sum_{h = [\frac{r}{2}] + 1}^{r} \sum_{k = [\frac{r}{4}] + 1}^{r / 2} \begin{array}{l} [\prod_{i = 1}^{k} p (d_{i}) p (u_{i}) \prod_{l \neq i}^{h - k} (1 - p (d_{l})) \prod_{n = 1}^{h - k} (1 - p (u_{n})) \cdot f {(\bar{s})}^{r / 2} \\ \cdot g {(\bar{s})}^{(r / 2) - 1}] \end{array})

(11)

For a system with the DHR architecture, there is no inter-layer similarity, so the term

f (\bar{s})

does not exist. As a result, the upper and lower redundant executors can be considered together, meaning that the terms

p (d)

and

p (u)

can be merged. With these considerations, the equation can be rewritten as follows:

P = \frac{1}{r - h + 1} (\sum_{h = [\frac{r}{2}] + 1}^{r} [\prod_{i = 1}^{h} p (d_{i}) \prod_{l \neq i}^{r - h} (1 - p (d_{l})) \cdot g {(\bar{s})}^{r - 1}])

(12)

5. Experiment Simulation

This section presents the simulation and metric analysis of single-layer DHR and double-layer DDHR redundant systems. The experiments were conducted in an environment consisting of an Intel Core i7 7200 CPU, 16 GB DDR memory, the Windows 11 Professional operating system, and the Python 3.9 runtime. For the simulations, we considered a redundant executors pool with a given degree of redundancy, set to 12. The similarity between redundant executors was generated randomly following a β-distribution with parameters (5, 15) [30]. Consequently, we obtained a similarity matrix as shown in Figure 7.
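
A similarity matrix of this kind can be generated with the Python standard library. The sketch below is one possible construction, assuming each unordered pair receives an independent Beta(5, 15) draw; the function name, the seed value, and the symmetric fill are choices made for this example and are not taken from the paper's simulation code.

```python
import random


def random_similarity_matrix(n, alpha=5.0, beta=15.0, seed=None):
    """Symmetric n x n matrix with unit diagonal and Beta(alpha, beta) off-diagonal entries."""
    rng = random.Random(seed)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n - 1):
        for j in range(i + 1, n):
            s = rng.betavariate(alpha, beta)  # draw lies in [0, 1]
            S[i][j] = S[j][i] = s
    return S


if __name__ == "__main__":
    S = random_similarity_matrix(12, seed=2024)
    for row in S[:3]:
        print([round(x, 2) for x in row])
```

With parameters (5, 15) the distribution has mean 5 / (5 + 15) = 0.25, so most generated similarities fall well below the thresholds of 0.3 to 0.4 used later in the experiments.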

5.1. Available Similarity Simulation Experiments

For the cases where the redundancy r was 3 or 4, we conducted 100 independent experiments for both DHR architecture (with redundancy 3 or 4) and DDHR architecture (with upper redundancy 3 and lower redundancy 3 for r = 6, and upper redundancy 4 and lower redundancy 4 for r = 8). The results of these experiments are illustrated in Figure 8a,b.

The statistics of the average similarity experiment results are shown in Table 3.

Based on the aforementioned experiments, it is evident that when the redundancy is identical, the average usable similarity of the DDHR architecture is significantly lower than that of the DHR architecture, at approximately 47% of the DHR value. Additionally, when the redundancy of each layer in the DDHR architecture matches that of the DHR architecture, the average usable similarity of the DDHR architecture’s redundancy set amounts to approximately 46% of that of the redundancy set in the DHR architecture.

5.2. Scheduling Cycle Simulation Experiment under Double-Layer Similarity Threshold Conditions

For the cases where the redundancy is r = 6 and r = 8, we conducted 500 experiments considering two scenarios: without considering the similarity threshold and considering the similarity threshold. For r = 6, the upper layer and the lower layer of the DDHR architecture each have a redundancy of 3; for r = 8, each layer has a redundancy of 4.

(1): Without considering the similarity threshold:

When the similarity threshold restriction is not taken into account, the scheduling cycle of the DDHR redundant executors set is the same as that of the DHR redundant executors set. The average scheduling cycle is depicted in Figure 9.

The statistics of the average scheduling cycle experimental results, when the similarity threshold restriction is not considered, are shown in Table 4.

(2): Considering the similarity threshold:

For the DDHR architecture, the same-layer similarity threshold s_lim is set to 0.40, and the inter-layer similarity thresholds s_limud are set to 0.35 and 0.3, respectively. For the DHR architecture, the single-layer similarity thresholds s_lim are set to 0.40, 0.35, and 0.3, respectively. The scheduling cycles of the different architectures and the trends of changes are presented in Figure 10a,b.
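
One way to picture how a similarity threshold shrinks the scheduling cycle is to count the schemes that remain schedulable. The sketch below assumes that a scheme is schedulable only when every pair of executors in it has similarity at most s_lim; the paper does not spell out this rule, so both the rule and the small example matrix are assumptions for illustration.

```python
from itertools import combinations


def schedulable_schemes(S, r, s_lim):
    """Count r-subsets of executors whose pairwise similarities are all <= s_lim."""
    count = 0
    for subset in combinations(range(len(S)), r):
        if all(S[i][j] <= s_lim for i, j in combinations(subset, 2)):
            count += 1
    return count


# Hypothetical similarity matrix for a pool of four executors.
S = [
    [1.0, 0.2, 0.4, 0.1],
    [0.2, 1.0, 0.3, 0.5],
    [0.4, 0.3, 1.0, 0.25],
    [0.1, 0.5, 0.25, 1.0],
]

if __name__ == "__main__":
    for threshold in (1.0, 0.4, 0.3):
        print(threshold, schedulable_schemes(S, 2, threshold))
```

Lowering the threshold can only remove schemes from the count, which is the qualitative trend examined in the threshold experiments.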

(1): DDHR Architecture Scheduling Cycle.

(2): DHR Architecture Scheduling Cycle

Table 5 presents the statistical analysis of the average scheduling cycle experimental results considering the limitation imposed by the similarity threshold.

Based on the aforementioned experiments, it is evident that when the similarity threshold is not taken into consideration, the system scheduling cycles (Ts) for both the DHR redundant executors collection and DDHR redundant executors collection are approximately equal to the

C_{N}^{r}

. When taking the similarity threshold into account and setting the single-layer similarity threshold s_lim to 0.4, the overall scheduling cycles of the DDHR architecture and DHR architecture are reduced by approximately 42% and 71%, respectively, for redundancy of r = 6 and r = 8. Moreover, utilizing the constraints imposed by the single-layer similarity thresholds, the DDHR architecture adopts inter-layer similarity thresholds s_limud of 0.35 and 0.3, while the DHR architecture employs similarity thresholds s_lim of 0.35 and 0.3. Under the condition of redundancy of r = 6, the scheduling cycle of the DDHR architecture is reduced by 13% and 61%, and the DHR architecture experiences a reduction of 35% and 76%. Furthermore, under the redundancy of r = 6, the overall scheduling cycles of the DDHR architecture and DHR architecture are reduced by approximately 42% and 71%, respectively.

(3): Comparison of DDHR and DHR architectures with the changing trend of thresholds.

The comparison of the scheduling cycle trend with the threshold for the DDHR and DHR architectures, under the same threshold, is depicted in Figure 10c. It is evident that as the threshold value decreases, the DDHR architecture exhibits a slower decrease in its scheduling cycle compared to the DHR architecture. This implies that in practical operational scenarios, the DDHR architecture demonstrates better adaptability to situations such as offline operations and cleaning, while still maintaining the system’s effective redundant scheduling. Consequently, it is more suitable for engineering applications.

5.3. System Failure Probability Simulation Experiment

Since all redundant executors can be considered equivalent without considering the a priori case, it is assumed that the failure rates of the redundant executors in the pool are all the same, set at 0.1. Additionally, assuming that the correlation functions between similarity and the probability of system failure denoted as

g (s_{i j})

and

f (s_{i j})

, respectively, are linearly increasing functions within the value domain of

(0, + \infty)

and the range of

(0, 1)

, the constant coefficients of the average similarity are represented by

g (\bar{s})

and

f (\bar{s})

. The failure rate of the system with redundancy of 6 and 8 is depicted in Figure 11.

Meanwhile, we further compare the system failure probability of DDHR architecture, DHR architecture, and traditional redundant architecture, as shown in Table 6.

For redundancy r = 6 and 8, the average failure rates of the DDHR architecture are approximately 61% and 54% of that observed in the DHR architecture, respectively. Consequently, the DDHR architecture exhibits a lower average failure rate compared to the DHR architecture. At the same time, the mean failure probabilities of the DHR and DDHR architectures are only about one-thousandth of that of traditional redundant architectures. In practical redundant systems, it is commonly observed that the similarity between each redundant executor of the system and its component program exhibits a nonlinear positive correlation. This observation is supported by the reference study [20], which highlights that if the dissimilarity between one component of the two redundant executors is significantly larger than the dissimilarity among the other components, the resulting ruling error will be greater when the overall dissimilarity of the redundant executors is linearly composed of the dissimilarities of the individual components. The correlation can be analyzed using the following approach:

Let us consider a redundant executor, denoted as

a = {l_{1} \dots l_{n}}

, which comprises n weight approximations of the heterogeneity component vector l. If we treat the first n − 1 heterogeneity vectors as a collective entity, we can represent it as

a = {l_{1 \cup ‥ n - 1}, l_{n}}

. In this scenario, when comparing redundant executors with other isomorphic redundant executors, it becomes evident that the weight influence of

l_{1 \cup ‥ n - 1}

is relatively larger. To further investigate this, we define

f (l_{1 \cup ‥ n - 1})

as the influence function of

l_{1 \cup ‥ n - 1}

on the reliability of the redundant executors. The integral

\int f (l_{1 \cup ‥ n - 1}) d Δ l_{n}

, calculated over the proportion of the number of redundant executors

Δ l_{n}

, denotes the function of the effect of heterogeneity on reliability.

The obtained result

\int f (l_{1 \cup ‥ n - 1}) d Δ l ≫ \int f (l_{n}) d Δ l

implies that the impact of component similarity attributes on the system failure rate gradually diminishes beyond a certain threshold of similar components. Consequently, this section further assumes the correlation functions g(s) and f(s) between similarity and system failure probability, where

g (s) = 1 / (1 - e^{- s})

and

f (s) = 1 / (1 - e^{- s / 2})

, to qualitatively evaluate the system reliability and the similarity of redundant executors. Moreover, we consider

f (s)

as a similarity function between the upper and lower layers of redundant executors, which is slightly smaller than the similarity function

g (s)

within the same layer of redundant executors. The relationship between the number of similar components and the system failure rate for the redundancy r = 6 and 8 is then depicted in Figure 12.

5.4. Comparative Analysis of System Model Complexity and System Overheads

Assuming that the numbers of upper and lower redundant executors are both n, the redundancy of the whole system is 2n. The overall computational time complexity of the DHR architecture is then that of the independent parallel computation of 2n redundant executors, which can be written as

O (2 n)

. The computational time complexity of the DDHR architecture, according to the algorithm proposed in Section 3 of this paper, is equal to

O (3 n)

. The comparison of the time complexity of the two algorithms is shown in Table 7. The overall computational time complexity of the DDHR architecture is therefore slightly higher than that of the DHR architecture.

In terms of system overhead, the computational overhead of the DHR architecture and the DDHR architecture is the same because each redundant executor is independently computing in parallel with each other. If the computation delay and network transmission delay of different heterogeneous systems are ignored, there is no change in the number of computing operations of each redundant executor at the same time.

5.5. Laboratory Simulation Experiment

We further built a prototype system environment in a high-speed railway laboratory and conducted both validity and multi-task concurrency performance experiments to verify its performance in real-world environments.

(1): Setting up the experimental environment for the prototype system

As shown in Figure 13, we constructed the simplest DHR/DDHR architecture with a redundancy of 6 by having one client, two Ngnix agent hosts, six virtual machines, and one database server.

The client issues requests and receives feedback, the Ngnix-1 agent host manages the access agents of all the VMs, and the Ngnix-2 agent host switches between the DHR and DDHR architectures. When the Ngnix-2 proxy host enables its proxy function, it acts as the scheduling gateway of the DDHR architecture and realizes the two-layer computing of the DDHR architecture by controlling the data received from the VM1, VM2, and VM3 virtual machines and from Ngnix-1. When Ngnix-2 disables the proxy function, it directly forwards the data sent by Ngnix-1 to VM4–VM6 to realize the redundant executor computing function of the DHR architecture. The database server stores the simulated access data and performs the mimetic ruling computation. The six virtual hosts simulate the lower-layer and upper-layer heterogeneous redundant executors, and their specific configurations are shown in Table 8.

The six redundant executors deploy the same business applications in heterogeneous environments separately and are unified through the management center on the agent host, as shown in Figure 14.

(2): Validity experiments

We run validity experiments under traditional Master-slave mode redundancy, the DHR architecture, and the DDHR architecture, respectively. In each case, a client requests one piece of train arrival and departure data (about 1 KB) from the database server, and the Ngnix proxy hosts are used to alter the results of the parsing computations on the VM virtual hosts.

Master-slave mode redundancy

Select VM1 as the master node, and change the data of the VM1 node to “−1” through Ngnix-1 when the client sends out the request; then, the client will fail to return results.

DHR and DDHR Architecture

When the client sends out a request, randomly select any 1–2 nodes of VM1–VM6, and change their data to “−1” through Ngnix-1, with Ngnix-2 turned off; the client can still return data normally.

When the client sends out a request, randomly select any 1–3 nodes of VM1–VM3, and change their data to “−1” through Ngnix-1, with Ngnix-2 turned on; the client can still return data normally.

Figure 15 shows the results returned by the client under traditional Master-slave mode redundancy, the DHR architecture, and the DDHR architecture. The DDHR and DHR architectures can tolerate failures of a portion of the compute nodes, whereas the traditional redundant architecture fails when the master node fails.

(3): Concurrency Performance Experiment

We tested the DHR architecture and DDHR architecture through performance testing software for 5 min under simulated real business scenarios of about 100–200 users concurrently, and the results are shown in Figure 16. Figure 16a represents the response time comparison between DHR architecture and DDHR architecture under the condition of 100 users’ concurrency; Figure 16b represents the response time comparison between DHR architecture and DDHR architecture under the condition of 200 users’ concurrency.

The average response times of the DHR architecture and DDHR architecture under 100 and 200 users’ concurrency conditions are shown in Table 9.

The average response time of the DDHR architecture is higher than that of the DHR architecture at both concurrency levels: about 1.43 times at 100 concurrent users and about 1.48 times at 200 concurrent users (Table 9). The DDHR architecture is therefore somewhat worse than the DHR architecture in terms of average response time. However, even with 200 concurrent users, its average response time remains far below the business response requirement of an RPSS station, i.e., an average response time of less than 3 s [1].
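The ratios can be recomputed directly from the averages in Table 9. The short sketch below does so and checks each DDHR value against the 3 s requirement; the helper names are illustrative only.

```python
# Average response times in seconds, copied from Table 9.
RESPONSE_TIME_S = {
    100: {"DHR": 0.176, "DDHR": 0.252},
    200: {"DHR": 0.288, "DDHR": 0.426},
}
REQUIREMENT_S = 3.0  # business response requirement cited in the text [1]


def overhead_ratio(users):
    """Ratio of DDHR to DHR average response time at a concurrency level."""
    row = RESPONSE_TIME_S[users]
    return row["DDHR"] / row["DHR"]


def meets_requirement(users, arch="DDHR"):
    """True if the average response time is below the 3 s requirement."""
    return RESPONSE_TIME_S[users][arch] < REQUIREMENT_S


for users in RESPONSE_TIME_S:
    print(users, round(overhead_ratio(users), 3), meets_requirement(users))
```

To three decimals the ratios are 1.432 and 1.479, and both DDHR averages are under the requirement by a wide margin.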

In summary, the DDHR architecture proposed in this paper has the following advantages over the DHR architecture:

(1) Under the cloud-edge architecture of the RPSS and the limited schedulable resources at each deployment level, the resources at different levels can be comprehensively utilized to achieve unified redundant scheduling. (2) Under the cloud-edge architecture, it has better dynamics, heterogeneity, and reliability than the DHR architecture.

However, the dual-layer architecture also has certain weaknesses: the additional computational overhead caused by the computational complexity makes the DDHR architecture perform slightly worse than the traditional DHR architecture in processing large amounts of data and under the condition of multi-task concurrency.

Therefore, DDHR architecture can be applied to the key core business modules of the RPSS, such as control and management command processing with less data volume in the station nodes, rather than to a large amount of data interaction, synchronization, and other businesses. It can fully utilize the existing resources and improve the reliability of the system in the process of business control and management.

6. Conclusions

This paper aims to ensure the reliability of China’s railway passenger service system (RPSS) during its extended construction. To this end, it proposes a double-layer dynamic heterogeneous redundancy (DDHR) architecture, together with a data flow processing algorithm and a weighted mimetic ruling algorithm adapted to it. With these, the redundant executors of the RPSS can be uniformly scheduled and co-computed under the resource-limited conditions of the regional center nodes and the line center nodes, which enhances the reliability of the mimetic architecture when resources are limited. Numerical modeling analysis and simulation experiments are then carried out to evaluate the performance metrics of the DDHR architecture. The results show that the DDHR architecture outperforms the DHR architecture in terms of heterogeneity, dynamics, and the probability of failure under attack. The computational complexity and overhead of the two architectures are also analyzed: the computational overhead of the DDHR architecture is the same as that of the DHR architecture, while its computational complexity is slightly higher. Finally, a simple prototype system with a redundancy of 6 is built, and its effectiveness and multi-task concurrent performance are verified experimentally. The experiments show that both the DDHR and DHR architectures can effectively realize business computing and fault tolerance. They also show that the DDHR architecture is weaker than the DHR architecture in multi-task concurrency performance, although it still satisfies the business needs of the RPSS for core business computation and control.

Nonetheless, this paper has not performed an in-depth study of the application scenarios of the DDHR architecture, or of the optimization of the redundant executor scheduling algorithm. Future research will focus on the optimization of the scheduling algorithm for the DDHR architecture.

Author Contributions

Conceptualization, X.W., M.W. and J.S. Investigation and methodology, X.W., M.W. and J.S. Writing original draft preparation, X.W.; Writing-review and editing, M.W., J.S. and Y.G. Software, X.W., M.W. and J.S. Validation, X.W., M.W. and J.S. All the authors have proofread the final version. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Scientific Research Project of China Academy of Railway Sciences Co., Ltd. (No. N2023S005).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. TB 10074-2016; Design Specification for Railway Passenger Transport Service Information System. National Railway Administration: Beijing, China, 2016.
2. Liu, S.; Yin, C.; Chen, D.; Lv, H.; Zhang, Q. Cascading Failure in Multiple Critical Infrastructure Interdependent Networks of Syncretic Railway System. IEEE Trans. Intell. Transp. Syst. 2022, 23, 5740–5753.
3. Bešinović, N.; Nassar, R.F.; Szymula, C. Resilience assessment of railway networks: Combining infrastructure restoration and transport management. Reliab. Eng. Syst. Saf. 2022, 224, 108538.
4. Arvidsson, B.; Johansson, J.; Guldåker, N. Critical infrastructure, geographical information science and risk governance: A systematic cross-field review. Reliab. Eng. Syst. Saf. 2021, 213, 107741.
5. Theisen, C.; Munaiah, N.; Al-Zyoud, M.; Carver, J.C.; Meneely, A.; Williams, L. Attack surface definitions: A systematic literature review. Inf. Softw. Technol. 2018, 104, 94–103.
6. Park, N.-E.; Lee, Y.-R.; Joo, S.; Kim, S.-Y.; Park, J.-Y.; Kim, S.-Y.; Lee, I.-G. Performance evaluation of a fast and efficient intrusion detection framework for advanced persistent threat-based cyberattacks. Comput. Electr. Eng. 2023, 105, 108548.
7. Zhang, J.; Zheng, J.; Zhang, Z.; Chen, T.; Tan, Y.-A.; Zhang, Q.; Li, Y. ATT&CK-based Advanced Persistent Threat attacks risk propagation assessment model for zero trust networks. Comput. Netw. 2024, 245, 110376.
8. Stojanović, B.; Hofer-Schmitz, K.; Kleb, U. APT datasets and attack modeling for automated detection methods: A review. Comput. Secur. 2020, 94, 101734.
9. Kumar, R.; Kela, R.; Singh, S.; Trujillo-Rasua, R. APT attacks on industrial control systems: A tale of three incidents. Int. J. Crit. Infrastruct. Prot. 2022, 37, 100521.
10. Zheng, Y.; Li, Z.; Xu, X.; Zhao, Q. Dynamic defenses in cyber security: Techniques, methods, and challenges. Digit. Commun. Netw. 2022, 8, 422–435.
11. Wu, J. Cyberspace Mimic Defense: Generalized Robust Control and Endogenous Security; Springer: Cham, Switzerland, 2020; pp. 207–272.
12. Wu, J. Cyberspace Mimic Defense; Springer: Cham, Switzerland, 2016.
13. Wu, J. Cyberspace Endogenous Safety and Security. Engineering 2022, 15, 179–185.
14. Ren, Q.; Guo, Z.; Wu, J.; Hu, T.; Jie, L.; Hu, Y.; He, L. SDN-ESRC: A Secure and Resilient Control Plane for Software-Defined Networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2366–2381.
15. Qiang, W.; Chunming, W.; Xincheng, Y.; Qiumei, C. Intrinsic Security and Self-Adaptive Cooperative Protection Enabling Cloud Native Network Slicing. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1287–1304.
16. Wang, Z.; Jiang, D.; Lv, Z. AI-Assisted Trustworthy Architecture for Industrial IoT Based on Dynamic Heterogeneous Redundancy. IEEE Trans. Ind. Inform. 2022, 19, 2019–2027.
17. Sepczuk, M. Dynamic Web Application Firewall detection supported by Cyber Mimic. J. Netw. Comput. Appl. 2023, 213, 89–97.
18. Li, Y.; Liu, Q.; Zhuang, W.; Zhou, Y.; Cao, C.; Wu, J. Dynamic Heterogeneous Redundancy-Based Joint Safety and Security for Connected Automated Vehicles. IEEE Veh. Technol. Mag. 2023, 18, 89–97.
19. Wei, S.; Zhang, H.; Zhang, W.; Yu, H. Conditional Probability Voting Algorithm Based on Heterogeneity of Mimic Defense System. IEEE Access 2020, 8, 188760–188770.
20. Chen, G.; Shi, G.; Chen, L.; He, X.; Jiang, S. A Novel Model of Mimic Defense Based on Minimal L-Order Error Probability. IEEE Access 2020, 8, 180481–180490.
21. Liu, Q.R.; Lin, S.J.; Gu, Z.Y. Heterogeneous redundancies scheduling algorithm for mimic security defense. J. Commun. 2018, 39, 188–198.
22. Pu, L.; Wu, J.; Ma, H.; Zhu, Y.; Li, Y. MimicCloudSim: An environment for modeling and simulation of mimic cloud service. China Commun. 2021, 18, 212–221.
23. Tong, Q.; Guo, Y. A comprehensive evaluation of diversity systems based on mimic defense. Sci. China Inf. Sci. 2021, 64, 229304.
24. Shi, L.; Miao, Y.; Ren, J.; Liu, R. Game analysis and optimization for evolutionary Dynamic Heterogeneous Redundancy. IEEE Trans. Netw. Serv. Manag. 2023, 20, 4186–4197.
25. Chen, Z.; Cui, G.; Zhang, L.; Yang, X.; Li, H.; Zhao, Y.; Ma, C.; Sun, T. Optimal Strategy for Cyberspace Mimic Defense Based on Game Theory. IEEE Access 2021, 9, 68376–68386.
26. Li, Q.; Meng, S.; Sang, X.; Zhang, H.; Wang, S.; Bashir, A.K.; Yu, K.; Tariq, U. Dynamic Scheduling Algorithm in Cyber Mimic Defense Architecture of Volunteer Computing. ACM Trans. Internet Technol. 2021, 21, 1–33.
27. Guo, W.; Wu, Z.; Zhang, F.; Wu, J. Scheduling Sequence Control Method Based on Sliding Window in Cyberspace Mimic Defense. IEEE Access 2020, 8, 1517–1533.
28. Zhu, Z.; Yu, H.; Liu, Q.; Liu, D.; Yu, H. An Adaptive Multi-executors Scheduling Algorithm Based on Heterogeneity for Cyberspace Mimic Defense. Secur. Commun. Netw. 2022, 13, 2300407.
29. Shao, S.; Ji, Y.; Zhang, W.; Liu, S.; Jiang, F.; Cao, Z.; Wu, F.; Zeng, F.; Zuo, J.; Zhou, L. A DHR executor selection algorithm based on historical credibility and dissimilarity clustering. Sci. China 2023, 66, 212304.
30. Yao, W.B.; Yang, X.Z. Design of selective algorithm for diverse software components. J. Harbin Inst. Technol. 2003, 35, 261–264.

Figure 1. Architecture of RPSS.

Figure 2. Dynamic heterogeneous redundancy structure.

Figure 3. Schematic diagram of DDHR architecture design concept.

Figure 4. DDHR logical architecture.

Figure 5. DDHR engineering architecture of RPSS.

Figure 6. Working flow.

Figure 7. Similarity matrix of redundant executors pool for an experiment.

Figure 8. (a) The average similarity of DHR architecture when r = 3, 4. (b) The average similarity of DHR and DDHR architectures when r = 6, 8.

Figure 9. The average scheduling cycle of DHR and DDHR architectures when r = 6, 8.

Figure 10. (a) The average scheduling cycle of DDHR architectures when r = 6, 8 under the condition of similarity threshold limit. (b) The average scheduling cycle of DHR architectures when r = 6, 8 under the condition of similarity threshold limit. (c) Trends in average scheduling cycle time for DHR and DDHR architectures when r = 6, 8.

Figure 11. Comparison of the failure rates of DHR and DDHR systems under the assumption of linear correlation between similarity and probability of system failure when r = 6, 8.

Figure 12. Comparison of the failure rates of DHR and DDHR systems under the assumption of nonlinear correlation between similarity and redundancy failure rate when r = 6, 8.

Figure 13. System prototype of DHR/DDHR architecture.

Figure 14. Heterogeneous redundant executors deployment and management page.

Figure 15. Experimental testing of the effectiveness of Master-slave mode redundancy, DHR, and DDHR architectures.

Figure 16. (a) The response time comparison between DHR architecture and DDHR architecture under the condition of 100 users’ concurrency. (b) The response time comparison between DHR architecture and DDHR architecture under the condition of 200 users’ concurrency.

Table 1. Comparison of research on DHR architecture indicator.

	Optimization Direction	Scheduling Constraints	Heterogeneity	Dynamic	Reliability	Other Indicators
[19]	Voting strategy		√			Voting heterogeneous weights
[20]	Voting/Scheduling strategy		√	√		Historical credibility decision-making
[21]	Scheduling strategy		√	√	√
[22]	Scheduling strategy		√	√		Redundancy security
[23]	Scheduling strategy		√	√		AL/AT
[24,25]	Scheduling strategy		√	√	√	attack cost/defensive cost
[26,27]	Scheduling strategy			√		Time, task threshold
[28,29]	Scheduling strategy		√	√		High-order heterogeneity, historical confidence
This paper	Voting/Scheduling strategy	√	√	√	√	Available Similarity and Redundancy Weights

Table 2. Symbol definition.

Symbol	Definition
A	Heterogeneous redundant system
a	Redundant executor
U	Upper-layer redundant executors set
D	Lower-layer redundant executors set
r, m, n	Redundancy
l	Heterogeneous feature vectors
$u_{i}$	The i-th upper redundant executor
$d_{i}$	The i-th lower redundant executor
$f$	Scheduling scheme
s	Similarity threshold
Slim	Same layer redundancy similarity threshold
Slimud	Different layer redundancy similarity threshold
$S_{f_{i}}$	The similarity matrix of the scheme $f_{i}$
T	Scheduling cycle
P	Average system failure probability
$p(u_{i})$	Average failure probability of Upper-layer redundant executors set
$p(d_{i})$	Average failure probability of Lower-layer redundant executors set

Table 3. The average similarity of the DHR and DDHR architecture redundant executor sets when r = 3, 4, 6, and 8.

Redundancy Rate r	DHR-Average Available Similarity	DDHR-Average Available Similarity
r = 3	0.2540	--
r = 4	0.2444	--
r = 6	0.24514	0.1175
r = 8	0.2498	0.1161

Table 4. The average scheduling cycle of the redundant executor sets of DHR and DDHR when r = 6, 8.

Scheduling Cycle T	r = 6	r = 8
DHR/DDHR	933.74	480.74

Table 5. The average scheduling cycle of the redundant executor sets of DHR and DDHR under similarity threshold limits when r = 6, 8.

	DDHR Scheduling Cycle T			DHR Scheduling Cycle T
	slim = 0.4	slimud = 0.35	slimud = 0.30	slim = 0.4	slim = 0.35	slim = 0.30
r = 6	546.58	476.88	213.51	540.02	353.85	130.48
r = 8	143.86	136.52	86.27	139.19	56.84	20.26

Table 6. Comparison of the failure rates of DHR, DDHR, and traditional redundancy architectures.

Probability of System Failure (p)	r = 6	r = 8
Traditional redundancy	$1.67 \times 10^{- 2}$	$1.25 \times 10^{- 2}$
DHR architecture	$8.89 \times 10^{- 5}$	$9.24 \times 10^{- 6}$
DDHR architecture	$5.40 \times 10^{- 5}$	$5.01 \times 10^{- 6}$

Table 7. The comparison of time complexity of the DHR and DDHR architecture algorithms.

	DHR	DDHR
Time complexity	$O(2n)$	$O(3n)$

Table 8. Redundant executor configurations of the prototype system.

Classification	Name and Configuration	Environment
Lower-layer	(D)VM1(4C,8G)	JAVA_1.8&Linux_RedHat8
	(D)VM2(4C,8G)	c#&Winserver_2018
	(D)VM3(4C,8G)	Python_3.9&Windows_11
Upper-layer	(U)VM4(4C,8G)	JAVA_1.8&Kirin_V10
	(U)VM5(4C,8G)	Asp.Netcore&Linux_centos8
	(U)VM6(4C,8G)	C++&Unicon

Table 9. Average response time for DHR architecture and DDHR architecture.

Number of Concurrent Users	DHR	DDHR
100	0.176 s	0.252 s
200	0.288 s	0.426 s

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

