Model-Driven Bayesian Network Learning for Factory-Level Fault Diagnostics and Resilience

Ademujimi, Toyosi; Prabhu, Vittaldas

doi:10.3390/su16020513

by Toyosi Ademujimi and Vittaldas Prabhu *

Harold and Inge Marcus Department of Industrial and Manufacturing Engineering, Pennsylvania State University, University Park, PA 16802, USA

* Author to whom correspondence should be addressed.

Sustainability 2024, 16(2), 513; https://doi.org/10.3390/su16020513

Submission received: 29 October 2023 / Revised: 31 December 2023 / Accepted: 4 January 2024 / Published: 7 January 2024

(This article belongs to the Special Issue Industry 4.0: Smart Green Applications)

Abstract

We propose to use engineering models for Bayesian Network (BN) learning for fault diagnostics at the factory level using key performance indicators (KPIs) such as overall equipment effectiveness (OEE). OEE is widely used in industry; it measures sustainability by capturing product quality (e.g., less scrap) and measures resilience by capturing availability. A major advantage of the proposed approach is that the engineering models are likely to be available long before the corresponding digitalized smart factory becomes fully operational. Specifically, for BN structure learning, we propose to use analytical queueing theory models of the factory to elicit the structure, and to carry out intervention we propose to use designed experiments based on discrete-event simulation models of the factory. For parameter learning, we apply a qualitative maximum a posteriori (QMAP) method and propose additional expert constraints based on the law of propagation of uncertainty from queueing theory. Furthermore, the proposed approach overcomes the challenge of obtaining balanced-class data in BN learning for fault diagnostics. We apply the proposed BN learning approach to (i) a 4-robot cell in our laboratory and (ii) a robotic machining cell in a commercial vehicle factory. In both cases, the proposed method is found to be efficacious in accurately learning the BN structure and parameters, as measured using the structural Hamming distance and the Kullback–Leibler divergence score, respectively. The proposed approach can pave the way for a new class of resilient and sustainable smart manufacturing systems.

Keywords: machine learning; diagnostics; OEE; queueing models; smart manufacturing; smart factory; resilience; sustainability

1. Introduction

Industry 4.0 is leading to infusion of a variety of technologies such as Artificial Intelligence/Machine Learning (AI/ML), networked sensors that are increasingly based on wireless networks, and powerful computing capabilities through cloud platforms. There are significant opportunities to leverage these advances to make manufacturing systems smarter and greener. Manufacturing systems can be made more sustainable by making them more resilient, e.g., “bounce back quickly from disruptions”. Such sustainable and resilient manufacturing systems would perform at or near zero waste levels in terms of energy, material, and other resources. One of the prevailing challenges in achieving this is uncertainty in general, and random faults in particular. A specific challenge in dealing with random faults is the need for quick and accurate diagnostics to ensure resilience to faults. Likewise, quick and accurate diagnostics can also contribute to sustainability by enabling fuller utilization of the installed manufacturing capacity.

For research in smart and green Industry 4.0 initiatives to be relevant to engineering practitioners, the metrics used to model performance should be widely used in practice. Overall equipment effectiveness (OEE) is one such metric that combines availability, performance, and quality (this is discussed in more detail later). OEE is widely used in industry and it measures sustainability by capturing product quality (e.g., less scrap) and measures resilience by capturing availability. Better maintenance leads to improved availability, thereby improving OEE and sustainability. The sustainability benefits of improving maintenance tend to be second-order benefits that accrue over the long-term such as life cycle of the asset [1]. Although the link between maintenance and sustainability is intuitive, researchers have begun to focus on it more recently [2,3]. There has been a proposal for overall environmental equipment effectiveness (OEEE) that combines OEE with sustainability [4]. There is a need to develop a more comprehensive framework that can more completely characterize the relationship between maintenance and sustainability, and that can be thoroughly validated [5]. Industry 4.0 enables entirely new paradigms for maintenance by making systems smarter and more sustainable [6]. Advances in ML coupled with industry level data can also be used to benchmark factories to quickly identify opportunities to improve energy efficiency, thereby making manufacturing greener [7].

Additive manufacturing (AM) as an emerging technology [8] has shown great potential in reducing material waste. Advances through Industry 4.0 in sensors and AI/ML can significantly contribute to reducing scrap parts by enabling in situ and real-time process monitoring, especially in AM processes. In the past, statistical process control (SPC) has been the main approach in checking for bad parts that violate dimensional tolerance, which typically would trigger machine maintenance/calibration and potentially scrapped parts because of the time lags involved [9]. Well-maintained machines hold tolerances better, and produce less scrap and rework parts, which contributes to sustainability. Such scrapped parts can be minimized or eliminated by embedding in situ sensors and real-time computation for defect detection and part quality monitoring [10,11].

AI/ML can be used for improving manufacturing OEE through fault diagnostics, thereby improving resilience and sustainability. Manufacturing system faults can be classified using the integrated ISA-95 and ISA-98 equipment hierarchical model [12] as either factory-level faults or machine-level faults [13]. Machine-level faults occur at the equipment module level, and they include the loss of functionality of individual machine’s components or sub-components. Factory (or system) level faults, on the other hand, occur at or above the unit level, and they refer to shortcomings in manufacturing key performance indicators (KPIs) such as those defined in the ISO 22400-1 [14] and ISO 22400-2 [15] standards, including low throughput and low overall equipment effectiveness (OEE). Since these KPIs are used by upper management for tracking system performance, identifying which KPI drives the other can be very informative for strategic improvement.

In the context of Industry 4.0, ML can be used to improve OEE by using approaches like Bayesian Network (BN) models. BN belongs to the class of probabilistic graphical models being studied extensively for several diagnostics applications, including determining the influential relationships between different human resource KPIs [16] as well as between manufacturing KPIs [17]. BN factorizes the global distribution of random variables into a single model, making it a good fit for representing complex hierarchical systems like manufacturing systems at the factory-level. This hierarchical representation allows for determination of fault paths from symptom, i.e., factory-level fault, to root cause, i.e., machine-level fault.

Training a BN model entails two tasks: structure learning and parameter learning. The most prevalent structure learning methods in the literature are carrying out designed experiments on the physical system, using heuristic search methods to learn the tree structure from observational data, and eliciting the structure from domain experts. Although experimentation is usually required to identify the causal relationships between nodes [18,19], experimentation on the physical manufacturing plant is prohibitively expensive in practice. Heuristic search methods require a large amount of balanced-class data to obtain a good network, which is not always available, especially in factories at an early stage of smart manufacturing. A class imbalance arises because failure events are relatively infrequent compared to normal events. Another drawback of learning the directed acyclic graph (DAG) structure using observational data is that, at best, only the graph’s skeleton can be identified [20], as observational data cannot differentiate between graphs that fall in the same independence equivalence (I-equivalent) class, even with unlimited data [21]. For example, in Figure 1, graphs (a) to (c) are I-equivalent with their skeleton shown in (e). Expert knowledge elicitation, which is usually the preferred method, assumes that the expert can readily recall the causal influences between variables, which is not always the case. Moreover, opinions frequently vary among experts in the same domain [16,22]. For parameter learning, maximum likelihood estimation (MLE) is a commonly used method, which also requires a large amount of training data to perform well. Purely data-driven methods will produce unreliable parameters when the dataset is small or incomplete [22].

Given the above context for Industry 4.0 approaches for smart and green manufacturing, we posit that there are significant research opportunities to develop ML techniques that can leverage engineering models to accelerate machine learning rather than to be limited to strictly operational data from the manufacturing system. This will allow the manufacturing system to be smarter sooner, thereby making the system more resilient earlier and thus more sustainable. In this paper, we address this opportunity by proposing a BN learning method that uses engineering models to augment the expert-elicitation process for both structure and parameter learning. For structure learning, we first propose a method that applies an analytical model based on queueing theory to determine the structure, followed by designed experiments using discrete-event simulation (DES) model to carry out intervention for causal discovery. A DES model can mirror the operating conditions, thereby simulating the behavior of the factory realistically in terms of performance [23,24]. This approach is much more cost-effective than perturbing the actual manufacturing system for causal discovery. For parameter learning using the qualitative maximum a posteriori (QMAP) method [25], we propose some prior constraints adapted from the queueing theory literature. Our long-term objective is to use learning approaches to improve OEE, to pave the way for a new class of resilient and sustainable smart manufacturing systems.

The rest of this paper is organized as follows. We first provide some background on BN followed by a literature review on current training methods. Next, our proposed methodology is presented, starting with structure learning using a queueing model followed by structure learning using a DES model, and then we move on to parameter learning using a QMAP method. Finally, we show the efficacy of our method using an experimental test bed and a case study followed by the discussion section, our conclusions, and future research directions.

2. Literature Review

Sustainable manufacturing entails the creation of manufactured goods using processes and systems that minimize negative environmental impacts, conserve energy and materials, are safe for employees, and are economically viable [26]. The ongoing adoption of smart manufacturing is reshaping the way products are designed, manufactured, and maintained, potentially resulting in more resilient and sustainable manufacturing systems [27]. In particular, improved fault diagnostics promote resilient and sustainable manufacturing by reducing waste in terms of equipment downtime and rework, reducing the consumption of materials by preventing the production of defective parts caused by faulty equipment, and improving worker safety [28,29]. Several machine learning models, including the Bayesian Network (BN) model, are thus being applied to improve manufacturing fault diagnostics.

A BN is a probabilistic graphical model that employs a DAG $G$ to represent the probabilistic dependencies between a set of random variables and a parameter set $\theta$ to represent the degree of influence between these variables. A directed arc from one variable to another, such as $X_{j} \to X_{i}$, means that $X_{j}$ is the parent or cause of $X_{i}$, and $X_{i}$ the child or direct effect of $X_{j}$, indicating a probabilistic cause and effect relationship between the variables. The joint probability distribution of a BN can be decomposed into local distributions as follows using the chain rule of BNs:

$$P(X) = \prod_{i=1}^{m} P(X_{i} \mid Pa(X_{i})) \tag{1}$$

where $Pa(X_{i})$ is the parent set of node $X_{i}$. A BN embeds a local Markov property that each variable is conditionally independent of its non-descendants given its parents. For root cause analysis purposes, the parent–child relationship is interpreted as causality. It is worth noting, however, that it is difficult to derive causality from observational data as statistical dependency does not always imply causality [30]. Training a BN model entails two sequential tasks: learning the DAG structure $G$ first, followed by fitting parameters $\theta$ from data set $D$ to the DAG $G$, which can be written mathematically as

$$P(G, \theta \mid D) = P(G \mid D) \times P(\theta \mid G, D). \tag{2}$$
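As an illustration of the chain rule in Equation (1), the following minimal sketch evaluates the joint probability of a three-node chain by multiplying the local conditional distributions. The network, the node names, and all probability values are hypothetical and are chosen only for illustration.

```python
from itertools import product

# Hypothetical three-node chain X1 -> X2 -> X3 with binary states (0 = low, 1 = normal).
parents = {"X1": [], "X2": ["X1"], "X3": ["X2"]}

# cpt[node][parent_states] = P(node = 1 | parents); values are illustrative only.
cpt = {
    "X1": {(): 0.9},
    "X2": {(0,): 0.3, (1,): 0.8},
    "X3": {(0,): 0.2, (1,): 0.7},
}


def joint_probability(assignment):
    """Chain rule of BNs: product of P(X_i | Pa(X_i)) over all nodes."""
    prob = 1.0
    for node, pa in parents.items():
        pa_states = tuple(assignment[p] for p in pa)
        p_one = cpt[node][pa_states]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob


# Sanity check: the joint distribution sums to 1 over all 2^3 assignments.
total = sum(
    joint_probability(dict(zip(parents, states)))
    for states in product([0, 1], repeat=3)
)
```

Because each factor only involves a node and its parents, the full joint over three binary variables is specified here by five numbers rather than seven.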

2.1. Bayesian Network Structure Learning

One of the main limitations in the application of BN for fault diagnostics of complex manufacturing systems is the difficulty in determining the network structure [10,16]. Structure learning methods generally fall under one of these four approaches: heuristic search, design of experiment, expert opinion, or engineering model. Different score-based heuristic search methods have been applied to learn the BN DAG for manufacturing fault diagnostics applications such as the K2 algorithm [31] and the Chow–Liu algorithm [32].

A major limitation of heuristic structure learning methods using exploratory data is that only the essential graph containing the skeleton and v-structures can be discovered [33]. That is, only a Markov equivalence class of networks can be learned [21]. Figure 1 shows examples of graphs that are Markov equivalent. Furthermore, these heuristic methods require a large amount of training data to obtain a good network, which is not always available, especially for manufacturing companies at the beginning or pilot stage of data analytics [34,35]. A possible alternative is therefore to use intervention data obtained through designed experiments to learn the DAG.
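The equivalence criterion mentioned here (same skeleton and same v-structures) can be checked mechanically. The sketch below is a minimal illustration on hypothetical three-node graphs; these graphs are assumptions made for the example and are not claimed to match the panels of Figure 1.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of the graph: each arc becomes an unordered pair."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Return colliders a -> c <- b in which a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, c in edges:
        parents.setdefault(c, set()).add(a)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found


def markov_equivalent(edges1, edges2):
    """Two DAGs are Markov equivalent iff skeleton and v-structures coincide."""
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Hypothetical graphs over X, Y, Z, given as lists of (parent, child) arcs.
chain = [("X", "Y"), ("Y", "Z")]
reverse_chain = [("Z", "Y"), ("Y", "X")]
fork = [("Y", "X"), ("Y", "Z")]
collider = [("X", "Y"), ("Z", "Y")]
```

In this example the chain, the reversed chain, and the fork share one skeleton and contain no v-structure, so observational data alone cannot separate them; the collider is the only graph of the four that can be distinguished.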

Carrying out designed experiments facilitates causal discovery [18,19], which is highly desirable for learning causal BNs. Designed experiments were used to determine the DAG for fault diagnosis in References [36,37]. Since intervening on a physical manufacturing plant can result in lost production time, we propose performing the experiments on a DES model of the manufacturing system instead. Additionally, since single-variable intervention might result in an incorrect DAG, this paper uses the sequential DAG learning strategy proposed in Reference [38], which first implements a single-variable intervention step and then a multiple-variable intervention step to confirm or reject the edges learnt in the first step.

Another common approach for structure learning is expert knowledge elicitation, which involves obtaining the influential relationships between variables from a domain expert. More often than not, it is used in conjunction with other structure learning methods. To determine the influential relationships between different human resource KPIs in a manufacturing industry, Xiao et al. [16] combined expert knowledge and the L1-regularized Markov blanket heuristic search algorithm. Their method is applicable to small dataset situations, where the available data are used to construct a prior network and then the edge information that maximizes the posterior distribution of the network is sequentially elicited from the expert. Li and Shi [39] used a prototypical constraint-based (PC) algorithm with manufacturing domain knowledge to determine the DAG for fault diagnostics of a rolling manufacturing process. Some major drawbacks of expert elicitation include misinterpretation of causal direction due to the difference between domain terminology and BN terminology [40], differences in opinion among experts from the same domain, and the inability of experts to readily recall the DAG from memory. A method to train BN using expert knowledge documented in maintenance logs was proposed in Reference [41].

Engineering models such as failure mode and effect analysis (FMEA), fault tree analysis (FTA), and fishbone diagrams have been used by several authors to create the BN tree for fault diagnosis applications. These models embed some knowledge about the domain, and thus could be an improvement over direct DAG elicitation from domain experts. A finite element analysis (FEA) model was used in Reference [42], whereas FMEA was used in Reference [43]. Pradhan et al. [44] utilized several quality engineering tools such as FMEA, fishbone diagram, FTA, 5-whys report, 8-D report, variation sensitivity matrix, and ontology to generate the BN structure. A fault tree was used to learn the BN structure in References [45,46,47]. Graph theory knowledge and matrix tools were used in Reference [48] for CNC machine fault diagnosis, whereas a multilevel flow model of a chemical plant that decomposes the system into mass and energy flows was used in Reference [49]. To characterize the influence of manufacturing decisions, variables, and constraints on manufacturing KPIs, Panicker et al. [17] utilized cost models of products produced using an AM process to develop the causal graph. A hybrid structure learning approach, in which domain knowledge is used as structure priors to guide a score-based heuristic search method, was proposed in Reference [50] for training BNs to diagnose process faults in multi-stage manufacturing processes. For quality assurance in AM processes, Reference [10] proposed a hybrid structure learning method that first utilizes domain ontology to add constraints between variables, then uses a PC algorithm to construct the graph skeleton, and lastly uses a Hill-Climbing score-based method to add, delete, and change the direction of edges. In the probabilistic Boolean network domain, methods such as the state-flipped control technique have been explored to investigate their stabilization [51,52].

Our method follows this line of research by using a different type of engineering model for structure learning. Unlike most existing methods, which are applicable only to lower-level machine faults, our method applies to factory-level faults, which is important for improving system-level resilience and sustainability.

2.2. Bayesian Network Parameter Learning

Once the BN DAG has been obtained, the next step in the training process is to determine the model parameters also known as the conditional probability table (CPT). Parameter estimation methods include the classical maximum likelihood estimation (MLE) and maximum a posteriori estimation (MAP), and their variants. A comprehensive review of BN parameter learning is presented in Reference [53]. MLE is a very effective parameter-estimation method for data-rich applications, and it was used in References [39,43,54]. For small dataset applications like the one being studied in this paper, however, MLE falls short of providing accurate and representative parameters. Several alternative approaches have thus been proposed such as transfer learning using expert knowledge [55] and expert elicitation based on fuzzy theory [47]. Qualitative constraints elicited from domain experts can also be introduced into the MLE algorithm to limit the parameter search space such as in Reference [44].

The MAP method is a Bayesian statistics parameter learning method that is also common in the small-dataset BN literature. It allows for the incorporation of expert knowledge in the form of priors, and it averages the priors with the likelihood calculated from the sample data to obtain the posterior parameters. Dey and Stori [36] utilized a uniform prior to learn the CPT from a sparse training database for the root cause analysis of machining process variation. Coming up with exact quantitative prior probabilities is, however, challenging in practice. A more practical alternative is to elicit qualitative constraints in the form of relationships between parameters such as $p_{1} \approx p_{3}$, $(p_{1} + p_{2}) > 0.7$, $(p_{1} + p_{2}) < p_{3}$, etc., where $p_{1}$, $p_{2}$, and $p_{3}$ denote individual parameters in a CPT. These constraints are then used to restrict the prior probability search space during parameter optimization. Some of these methods include constrained maximum a posteriori (CMAP) [22], qualitative maximum a posteriori (QMAP) [25], further constrained qualitative maximum a posteriori (FC-QMAP) [56], and the parameter extension under constraints method [57]. We apply the QMAP approach for parameter learning in this paper and we propose qualitative constraints derived from engineering models.
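To make the idea of restricting the prior with qualitative constraints concrete, the following sketch draws candidate parameter vectors from a uniform Dirichlet distribution, rejects those that violate the constraints, and blends the mean of the accepted samples with observed counts. It is a simplified illustration and not the QMAP algorithm of Reference [25]: the constraints, the counts, the `prior_weight` value, and the posterior-mean-style blend are all assumptions made for this example, and the three parameters are treated as one distribution over three states.

```python
import random


def sample_dirichlet(k, rng):
    """Draw one point from a uniform Dirichlet over k states (normalized Gamma(1) draws)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def constrained_prior_mean(k, constraints, n_samples=20000, seed=0):
    """Rejection sampling: keep only samples that satisfy every constraint, then average."""
    rng = random.Random(seed)
    accepted = [
        p for p in (sample_dirichlet(k, rng) for _ in range(n_samples))
        if all(c(p) for c in constraints)
    ]
    if not accepted:
        raise ValueError("no sample satisfied the constraints")
    return [sum(p[j] for p in accepted) / len(accepted) for j in range(k)]


def posterior_estimate(counts, prior_mean, prior_weight):
    """Blend observed counts with prior pseudo-counts (prior_weight = equivalent sample size)."""
    n = sum(counts)
    return [(c + prior_weight * m) / (n + prior_weight)
            for c, m in zip(counts, prior_mean)]


# Hypothetical qualitative constraints on (p1, p2, p3).
constraints = [
    lambda p: p[0] + p[1] > 0.7,
    lambda p: p[0] > p[1],
]
prior_mean = constrained_prior_mean(3, constraints)
estimate = posterior_estimate(counts=[2, 1, 1], prior_mean=prior_mean, prior_weight=4.0)
```

With only four observations, the constrained prior dominates the estimate; as more data arrive, the counts progressively outweigh the pseudo-counts.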

2.3. Data Generation

Another research area in the limited dataset literature is data generation. One of the most prevalent methods is data augmentation, where the available small and imbalanced dataset is extended through some function to generate synthetic data. The feasibility of using Kriging and Radial Basis Function models to generate data for training BN was explored in Reference [35], and the results show that in some cases generated data could increase the accuracy of the model. Generating data directly from a simulation model of the actual system is another promising approach. Jain, Narayanan, and Lee [58] generated CNC machine-level data from a virtual factory model of a small job shop for training neural networks and Gaussian process regression ML algorithms. The ability to generate synthetic data from simulation models allowed for the testing of these predictive models even when only a small amount of real data was available, thereby expediting the data-analytics application process. The use of a boosted ensembles ML model for modeling manufacturing systems with a small data size has also been studied [59]. Our approach falls along these lines but is applicable to system-level performance evaluation using BN.

3. Engineering Model-Driven Bayesian Network Learning

The main thesis of this paper is that the engineering models used for designing and analyzing factories can be used for Bayesian Network learning, thereby using domain knowledge contained in engineering models to accelerate learning. Figure 2 illustrates the key components of the proposed approach. Domain knowledge pertaining to the design of a factory is usually contained in analytical and simulation models, based on queueing theory and discrete-event simulation techniques. These models can be used to learn the structure and parameters of the BN. A major advantage of the proposed approach is that the engineering models are likely to be available long before the digitalized smart factory becomes operational. Post-digitalization, smart manufacturing systems can be expected to have an established flow of operational data from sensors and transactions resulting from part flows in the factory. KPIs used for managing operations along with detailed data can be used to improve the accuracy of the BN parameters learnt during pre-digitalization and also for diagnostics.

Typically, full digitalization for smart manufacturing involves installing IoT sensors and networking them to acquire machine- and process-level operational data, which can take several months or years to become fully functional [60]. The proposed method offers the prospect of leveraging learning from engineering models to rapidly ramp up factory-level fault diagnostics capabilities in smart manufacturing systems. Next, each of the major components of the proposed engineering model-driven learning approach is detailed.

3.1. Bayesian Network Structure Learning

We are proposing two methods for augmenting the expert-opinion DAG learning for system-level fault modelling. The first is using a queueing systems model to elicit the DAG and the second is using a DES model to carry out intervention to learn the DAG.

3.1.1. Queueing Model for Structure Learning

We first analyze the influential relationships between a single KPI type, specifically OEE, across multiple workstations in a manufacturing system, followed by those of different KPI types.

Learning the DAG for a Single KPI Type. To determine the DAG for one manufacturing KPI type across several workstations, we need to distinguish between KPIs whose effects propagate to other workstations (e.g., throughput, OEE) and those whose effects are local (e.g., availability, quality ratio, first pass quality). For example, low throughput in upstream workstations will likely result in low throughput in downstream workstations due to starving. In contrast, the availability of a workstation will generally not affect the availability of other workstations, although it might affect a different type of KPI of downstream workstations, such as throughput, due to the reduction in the number of incoming parts from the machine with low availability.

In this paper, we focus on the performance influence between workstations in a plant evaluated by their individual OEE metrics and the overall manufacturing system output measured by the plant’s throughput. OEE is a compound KPI that aggregates the effect of three other KPIs, namely availability (A), performance (P), and quality ratio (Q), to evaluate the productivity of a station. The OEE equation is given as:

$$OEE = A \times P \times Q \tag{3}$$

where A measures the uptime of a machine, calculated as the ratio of run time to planned production time; P accounts for part-production speed, calculated as the ratio of produced parts to the ideal throughput; and Q tracks product quality, calculated as the ratio of good parts to the total number of parts produced. OEE measures sustainability by capturing product quality (e.g., less scrap) and measures resilience by capturing availability. Hence, OEE is not only widely accepted among practitioners, but it can serve as a starting point for engineering resilient and sustainable manufacturing systems. Our long-term objective is to use this learning approach to improve OEE, to pave the way for a new class of resilient and sustainable smart manufacturing systems.
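As a worked instance of Equation (3), the sketch below computes A, P, Q, and OEE for a single shift. The shift figures are hypothetical, and Q is taken as the share of good parts among the parts produced so that a higher Q means less scrap.

```python
def oee_components(run_time, planned_time, produced, ideal_output, good_parts):
    """Return (A, P, Q, OEE) for one station over one reporting period."""
    availability = run_time / planned_time   # A: uptime share of planned production time
    performance = produced / ideal_output    # P: actual output relative to ideal output
    quality = good_parts / produced          # Q: share of good parts (assumed definition)
    return availability, performance, quality, availability * performance * quality


# Hypothetical shift: 480 min planned, 400 min running,
# 360 parts produced against an ideal of 400, of which 342 are good.
A, P, Q, oee = oee_components(
    run_time=400, planned_time=480, produced=360, ideal_output=400, good_parts=342
)
```

Because OEE is a product, a shortfall in any one factor caps the overall value: here each factor is at least 0.83, yet the resulting OEE is roughly 0.71.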

A manufacturing system can be viewed as a queueing system where parts arrive at a particular rate to workstations for processing and depart once processing is completed. As a result, the factory dynamics are largely dictated by the part flow, which depends on the equipment layout and the processing capacity of each workstation. Therefore, the process-flow information, together with the Markov assumption of BN, dictates the system-level influence between workstations’ KPIs.

For illustration, we use the equations of flow variability interactions between workstations obtained from Reference [61]. A workstation can contain multiple similar machines that perform a particular task in parallel. Variability in any workstation can be adequately explained by the variability in the processing time of the workstation itself and variability in the arrival rate of parts to the workstation dictated by the variability in departures from the preceding workstation. For example, consider the simple serial-manufacturing line depicted in Figure 3a where part departures from station $(i - 1)$ become arrivals to station $(i)$ and departures from station $(i)$ become arrivals to station $(i + 1)$. The original manufacturing rate (in parts per unit time) of workstation $(i + 1)$ is $r_{o}(i + 1)$, which is the inverse of the average part processing time. The first source of variability is the effect of the reduction in production rate due to machine downtimes and product quality, characterized by the effective production rate $r_{e}(i + 1)$. Let $p(i + 1)$ and $A(i + 1)$ represent the fraction of defective parts produced and the availability, respectively, of workstation $(i + 1)$, then

$$r_{e}(i + 1) = (1 - p(i + 1)) \times A(i + 1) \times r_{o}(i + 1) \tag{4}$$

These effects of quality and availability are local to the workstation. Quality is independent in this case because we are assuming that inspection is carried out at the end of a workstation and that only good parts are passed on to the next workstation. The second source of variability is the part-arrival rate $r_{a}(i + 1)$. Since station $(i)$ feeds station $(i + 1)$ with parts, the arrival rate of station $(i + 1)$ must be equal to the departure rate (or throughput) of station $(i)$, which is $r_{d}(i)$. That is, $r_{a}(i + 1) = r_{d}(i)$ and $c_{a}(i + 1) = c_{d}(i)$, where $c_{a}$ and $c_{d}$ are the coefficients of variability of arrival and departure rates, respectively. As a result, any performance metric for station $(i + 1)$ will be determined by these two sources of variability. The throughput of station $(i)$ is a function of its arrival parameters $r_{a}(i)$, $c_{a}(i)$, and effective processing rate parameters $r_{e}(i)$ and $c_{e}(i)$; thus, any KPI for station $(i)$ will result from the effects of these variables. As such, the KPIs of station $(i)$ influence the KPIs of station $(i + 1)$ through the transfer of parts to it. Likewise, the KPIs of station $(i - 1)$ will influence the KPIs of station $(i)$ in the same way.
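The rate relationships in this paragraph can be traced numerically. The sketch below applies Equation (4) and the linkage that arrivals to station (i + 1) equal departures from station (i) to a three-station serial line. The station data and the release rate are hypothetical, and the rule that a station processes at the smaller of its arrival rate and its available capacity is a simplifying assumption of this example (mean rates only, no blocking, variability coefficients ignored), not a formula taken from Reference [61].

```python
def effective_rate(r_o, availability, defect_fraction):
    """Equation (4): r_e = (1 - p) * A * r_o."""
    return (1.0 - defect_fraction) * availability * r_o


def line_departure_rates(release_rate, stations):
    """Propagate mean rates down a serial line using r_a(i+1) = r_d(i).

    Assumption (not from the source): a station processes parts at
    min(arrival rate, A * r_o) and passes on only the good fraction (1 - p).
    """
    rates = []
    r_a = release_rate
    for s in stations:
        processed = min(r_a, s["A"] * s["r_o"])
        r_d = (1.0 - s["p"]) * processed
        rates.append(r_d)
        r_a = r_d  # departures of this station are arrivals to the next
    return rates


# Hypothetical stations: nominal rate r_o (parts/hour), availability A, defect fraction p.
stations = [
    {"r_o": 12.0, "A": 0.90, "p": 0.02},
    {"r_o": 10.0, "A": 0.80, "p": 0.05},
    {"r_o": 11.0, "A": 0.95, "p": 0.01},
]
rates = line_departure_rates(release_rate=15.0, stations=stations)
```

In this example the second station is the bottleneck, so the third station is starved: its departure rate is set by what it receives rather than by its own capacity, which is the upstream-to-downstream influence the text describes.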

However, since the effects of the process variability of station $(i - 1)$ cannot directly influence the process variability of station $(i + 1)$ without passing through station $(i)$, the KPIs of station $(i - 1)$ are conditionally independent of the KPIs of station $(i + 1)$ given station $(i)$. That is, all of the variability of station $(i + 1)$ can be fully explained by the variability in station $(i)$, which is the Markov assumption that the BN DAG encodes: a node is conditionally independent of its non-descendants given its parents. We can, therefore, apply this logic to create the KPI metric DAG by assigning parents to each child based on the manufacturing process flow.
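The parent-assignment rule stated here can be sketched as a direct mapping from process-flow arcs to BN arcs. The station names, the node-naming scheme, and the `kpi_dag_from_flow` helper are hypothetical and serve only to illustrate the idea.

```python
def kpi_dag_from_flow(flow_edges, kpi="OEE"):
    """Map each process-flow arc (upstream, downstream) to a BN arc between KPI nodes.

    Returns a dict: KPI node -> list of parent KPI nodes.
    """
    parents = {}
    for upstream, downstream in flow_edges:
        parents.setdefault(f"{kpi}_{upstream}", [])
        parents.setdefault(f"{kpi}_{downstream}", []).append(f"{kpi}_{upstream}")
    return parents


# Hypothetical three-station serial line: WS1 -> WS2 -> WS3.
serial_line = [("WS1", "WS2"), ("WS2", "WS3")]
serial_dag = kpi_dag_from_flow(serial_line)

# Hypothetical merge: WS1 and WS2 both feed WS3.
merge_line = [("WS1", "WS3"), ("WS2", "WS3")]
merge_dag = kpi_dag_from_flow(merge_line)
```

For the serial line, each station's OEE node has exactly one parent, the OEE node of the station that feeds it, so the first station does not appear as a direct parent of the third.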

The corresponding BN tree for the serial line in Figure 3a is presented in Figure 3b. It is worth noting that if there is a limited buffer between two machines, then blocking can occur whenever the downstream machine fails or is busy processing. Therefore, this introduces cycles in the BN DAG as a downstream machine’s performance can now affect an upstream machine’s performance. Our current method therefore only applies to balanced manufacturing lines that have adequate buffer size between workstations such that blocking never occurs. This is a realistic assumption in many manufacturing facilities. In addition, end-of-line quality inspection will make the quality ratio of the workstations dependent on one another and will also affect the way the availability term of OEE KPI is calculated; thus, such systems are not considered in this paper.

Learning the DAG for Multiple KPI Types. For structure learning where there is more than one KPI type, our method can still be used to generate the BN tree in a two-step process. The first step involves using our queueing method to reduce the search space by limiting possible parents to only KPIs of the same workstation or the preceding workstation(s) based on the process flow. That is, only KPIs of workstation $(i)$ can be parents of workstation $(i + 1)$’s KPIs. This reduces the search space as each group will contain a smaller number of variables, leading to more accurate results compared to searching the whole structure from scratch at once [62]. The second step involves using either expert opinion or any other DAG learning method to determine the influence between different KPI types of the same workstation and/or adjacent workstations.
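A minimal sketch of the first step is given below, assuming stations are described by process-flow pairs and KPI nodes by `(station, KPI type)` tuples. The function name `allowed_parents` and the KPI labels are hypothetical. The output is only a candidate list; the second step (expert opinion or a DAG learning method) still has to choose among the candidates and orient same-station edges so that no cycle is created.

```python
def allowed_parents(flow_edges, kpi_types):
    """Candidate parents of each KPI node: KPIs of the same station
    or of its immediate upstream station(s) in the process flow."""
    upstream = {}
    for up, down in flow_edges:
        upstream.setdefault(up, [])
        upstream.setdefault(down, []).append(up)
    candidates = {}
    for station, ups in upstream.items():
        for kpi in kpi_types:
            same = [(station, other) for other in kpi_types if other != kpi]
            prev = [(u, other) for u in ups for other in kpi_types]
            candidates[(station, kpi)] = same + prev
    return candidates


# Hypothetical three-station line with two KPI types per station.
cands = allowed_parents([("S1", "S2"), ("S2", "S3")], ["OEE", "TH"])
print(cands[("S3", "OEE")])
```

In this example, the KPIs of S1 never appear as candidate parents of S3's KPIs, so any search in the second step runs over a much smaller set of edges than an unrestricted search.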

3.1.2. Discrete-Event Simulation Model for Structure Learning

The DES model can be used for DAG learning either via structural intervention or via data generation, where a DAG learning algorithm is applied to learn the DAG from the generated data. Carrying out a full designed experiment is the most precise method for discovering the true causal structure between variables [63], and we make use of the structural intervention algorithm (SIA) for BN learning [38] in this paper. SIA consists of two steps: (1) an influence-discovery step, where intervention is carried out on a single variable to determine the other variables it influences, and (2) a parent-confirmation step, where an intervention on multiple variables is performed simultaneously to establish whether the influence discovered in the first step is a direct cause (parent) or an ancestor.
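The two SIA steps can be illustrated on a toy model. The sketch below is one plausible reading of the two-step description above, not the implementation from [38]: the `simulate` function is a deterministic stand-in for the DES model (a node is world-class only if all of its parents are), the two-state coding is a simplification, and all names are hypothetical.

```python
# Hidden "true" structure, used only inside the toy simulator.
TRUE_PARENTS = {"R1": [], "R2": ["R1"], "R3": ["R2"], "R4": ["R3"], "TH": ["R4"]}
ORDER = ["R1", "R2", "R3", "R4", "TH"]  # topological order of the toy line


def simulate(do):
    """Toy stand-in for a DES run: do maps intervened nodes to forced states."""
    state = {}
    for node in ORDER:
        if node in do:
            state[node] = do[node]
        else:
            ok = all(state[p] == "WC" for p in TRUE_PARENTS[node])
            state[node] = "WC" if ok else "L"
    return state


def influence_discovery(nodes):
    """Step 1: toggle one variable at a time and record which others change."""
    influenced = {}
    for x in nodes:
        hi, lo = simulate({x: "WC"}), simulate({x: "L"})
        influenced[x] = [y for y in nodes if y != x and hi[y] != lo[y]]
    return influenced


def parent_confirmation(nodes, influenced):
    """Step 2: hold the other candidate causes of y fixed while toggling x."""
    parents = {y: [] for y in nodes}
    for x in nodes:
        for y in influenced[x]:
            others = [z for z in nodes if z not in (x, y) and y in influenced[z]]
            held = {z: "WC" for z in others}
            hi = simulate({**held, x: "WC"})
            lo = simulate({**held, x: "L"})
            if hi[y] != lo[y]:  # effect survives, so x is a direct parent
                parents[y].append(x)
    return parents


influenced = influence_discovery(ORDER)
learned = parent_confirmation(ORDER, influenced)
print(influenced["R1"])
print(learned)
```

After the first step, R1 appears to influence every downstream node; the second step removes the ancestor-only links and leaves the chain structure.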

3.2. Bayesian Network Parameter Learning

Once the BN structure is determined, the next step is to fit the parameters that represent the degree of influence between a child node and its parent(s). To generate the parameters for small-dataset applications, we adopt the QMAP algorithm along with parameter constraints derived from a queueing model. QMAP uses a Monte Carlo sampling method (rejection sampling) to construct Dirichlet priors from the qualitatively constrained parameter space for the MAP estimation. The QMAP score function is given by:

P(\theta \mid G, D, \Omega) = \frac{P(D \mid \theta, G) \, P(\theta \mid G, \Omega)}{P(D \mid G, \Omega)}

(5)

where $G$ is the DAG, $D$ is the available data, $\Omega$ is the qualitative constraint information elicited from the expert, and $\theta$ is the BN parameter set to be learned. The maximum estimate of the QMAP score for a particular parameter is computed as:

\hat{\theta}_{ijk} = \frac{N_{ijk} + M_{ijk}}{\sum_{k = 1}^{r_{i}} (N_{ijk} + M_{ijk})} = \frac{N_{ijk} + M_{ijk}}{N_{ij} + M_{ij}}

(6)

where $i, j, k$ are the node, parent-configuration, and state indexes for $X_{i}$, respectively, with $1 \leq i \leq n$, $1 \leq j \leq q_{i}$, and $1 \leq k \leq r_{i}$. Here, $n$ is the total number of nodes in the network, $q_{i}$ is the total number of states (configurations) of the parents $Pa(X_{i})$, and $r_{i}$ is the number of states of $X_{i}$. $N_{ijk}$ are counts from the available dataset, while $M_{ijk}$ are pseudo-counts created from the qualitative constraints using a rejection-sampling method.
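As a small numerical illustration of Equation (6), the sketch below computes the estimate for one node and one parent configuration. The counts and pseudo-counts are made-up values, and the function name `qmap_estimate` is hypothetical.

```python
def qmap_estimate(N, M):
    """Equation (6) for fixed i and j: N and M are lists indexed by state k."""
    total = sum(n + m for n, m in zip(N, M))
    return [(n + m) / total for n, m in zip(N, M)]


# Hypothetical data counts and constraint-derived pseudo-counts for states (L, TP, WC).
N = [0, 1, 3]
M = [0.5, 1.5, 6.0]

theta_qmap = qmap_estimate(N, M)
theta_prior_only = qmap_estimate([0, 0, 0], M)  # no data: the prior alone decides
theta_mle = [n / sum(N) for n in N]             # maximum likelihood, for comparison

print(theta_qmap)
print(theta_prior_only)
print(theta_mle)
```

With no data, the estimate reduces to the normalized pseudo-counts; as the counts grow, the data term dominates. Unlike the maximum-likelihood estimate, the estimate from Equation (6) does not assign exactly zero probability to a state that happens to be unobserved in a small sample.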

We first utilize the relationship between KPI values to propose prior constraints regarding the rarity of certain KPI combinations in the Intra-CPT constraint group. We also propose an Inter-CPT constraint, named the variability-propagation constraint, based on the queueing-system nature of manufacturing systems.

Prior constraints used in this paper are derived from statistical relations as well as from expert opinion, and are grouped into three categories as follows:

(1): Axiomatic constraints, which are statistical constraints based on the laws of probability, such as that probabilities must be non-negative and sum to one: $\sum_{k = 1}^{r_{i}} \theta_{ijk} = 1,\ 0 \leq \theta_{ijk} \leq 1,\ \forall i, j, k$.
(2): Intra-CPT constraints, which are constraints obtained from domain experts about the relationships between the probabilities within a node’s CPT. Examples include approximate-equality constraints, which specify the probability of some known events, such as that the probability of rare events should be very small; e.g., it is highly unlikely for a child node to have a high OEE value given that the parent node has a low OEE value, and, likewise, it is more likely that a child node has a high OEE value given that its parent node has a high OEE value. An additional example is the inequality relationship between probabilities, such as that the probability of observing a child node with high OEE given that the parent has high OEE is greater than the probability of observing a child node with high OEE given that the parent node OEE is medium: $\theta_{ijk} \approx 0,\ \theta_{ijk} > b,\ \theta_{ij'k'} \approx \theta_{ijk},\ \theta_{ijk'} < \theta_{ijk},\ \forall j \neq j', k \neq k'$.
(3): Inter-CPT constraints, which are also expert constraints but apply to the relationship between different nodes’ CPTs. We propose the yield-loss or variability-propagation constraint, which is based on the relationship between workstations in a manufacturing system. In a serial manufacturing line, variability propagates through the system, resulting in cumulative yield losses going from upstream to downstream workstations. For example, in a production line with three serial workstations, losses in the first workstation will result in a reduced number of finished parts that it sends to the second workstation. This reduction in input, along with losses in the second workstation itself, will cumulatively reduce the performance of the second workstation when compared to the first workstation. Likewise, a reduction in the throughput of the second workstation will further reduce the number of parts being sent to the third workstation, resulting in the performance of the third workstation being lower than that of the second workstation. Any other available information that can be used to infer the relative performance of workstations falls into this constraint category: $\theta_{i'j'k'} < \theta_{ijk},\ \forall i \neq i', j \neq j', k \neq k'$.
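The way such constraints can be turned into pseudo-counts by rejection sampling is sketched below for a single CPT column. Several things here are assumptions made for illustration and may differ from the QMAP construction: candidates are drawn uniformly from the probability simplex, the accepted samples are averaged, and the mean is scaled by an assumed equivalent sample size `ess`. The constraint thresholds (0.05 and 0.6) are also hypothetical.

```python
import random


def sample_simplex(r, rng):
    """One uniform draw from the r-state probability simplex (Dirichlet(1, ..., 1))."""
    g = [rng.gammavariate(1.0, 1.0) for _ in range(r)]
    s = sum(g)
    return [x / s for x in g]


def pseudo_counts(constraint, r, ess, n_accept=2000, max_tries=1_000_000, seed=0):
    """Rejection sampling: keep draws that satisfy the constraint, then
    scale their mean by the equivalent sample size to obtain pseudo-counts."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        theta = sample_simplex(r, rng)  # axiomatic constraints hold by construction
        if constraint(theta):
            accepted.append(theta)
            if len(accepted) == n_accept:
                break
    if len(accepted) < n_accept:
        raise RuntimeError("constraint region too small for plain rejection sampling")
    mean = [sum(t[k] for t in accepted) / n_accept for k in range(r)]
    return [ess * m for m in mean]


# Hypothetical intra-CPT constraint for a child given that its parent is WC,
# with states ordered (L, TP, WC): low OEE is rare, and WC is the most likely state.
def parent_wc_constraint(theta):
    return theta[0] <= 0.05 and theta[2] > 0.6


M = pseudo_counts(parent_wc_constraint, r=3, ess=10.0)
print(M)
```

An inter-CPT constraint such as the variability-propagation constraint relates parameters of different nodes, so it would require sampling the affected CPT columns jointly and testing the constraint on the joint draw; that extension is omitted here.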

4. Experimental Setup and Case Study

To illustrate our proposed method, we apply the BN learning approach to (i) a 4-robot cell in our Factory for Advanced Manufacturing Education Laboratory and (ii) a robotic machining cell in a commercial vehicle factory. The simulation models were created using Simio discrete-event simulation software version 12.205, whereas the BN models were created using the bnlearn package version 4.6.1 [64] in R version 3.6.3. The accuracy of the learned DAGs was evaluated using the structural-hamming distance (SHD) metric [65]. SHD describes the number of changes that need to be made to the estimated DAG to turn it into the true DAG. It is the sum of the missing edges, the extra edges, and the incorrect edge directions in the estimated DAG [66]. For evaluating the parameter accuracy, the Kullback–Leibler (KL) divergence score [67] was used, which indicates the divergence between the learned parameters and the true parameters. The KL-divergence score originates from the field of information theory and is a measure of relative entropy between two probability distributions [68]. Both metrics are non-negative, lower values are better, and a value of zero means the estimate is identical to the true value. The average KL-divergence score was computed using Equation (7).

\overline{KL}(\theta, \hat{\theta}) = \frac{1}{\sum_{i = 1}^{n} r_{i} q_{i}} \sum_{i = 1}^{n} \sum_{j = 1}^{q_{i}} \sum_{k = 1}^{r_{i}} \theta_{ijk} \log\left(\frac{\theta_{ijk}}{\hat{\theta}_{ijk}}\right)

(7)
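Both metrics are simple to compute once the DAGs and CPTs are available. The sketch below is a plain-Python illustration rather than the bnlearn routines used in this work: DAGs are given as sets of `(parent, child)` pairs, CPTs as nested lists indexed by node, parent configuration, and state, and the example graphs and probabilities are hypothetical.

```python
import math


def shd(true_edges, est_edges):
    """Structural Hamming distance: missing + extra + wrongly oriented edges."""
    true_edges, est_edges = set(true_edges), set(est_edges)
    reversed_edges = {(a, b) for (a, b) in est_edges - true_edges if (b, a) in true_edges}
    extra = (est_edges - true_edges) - reversed_edges
    missing = {(a, b) for (a, b) in true_edges - est_edges if (b, a) not in est_edges}
    return len(missing) + len(extra) + len(reversed_edges)


def mean_kl(theta, theta_hat):
    """Average KL divergence over all CPT entries, following the equation above.
    Assumes every estimated probability is strictly positive."""
    total, n_params = 0.0, 0
    for cpt, cpt_hat in zip(theta, theta_hat):      # nodes i
        for row, row_hat in zip(cpt, cpt_hat):      # parent configurations j
            for p, p_hat in zip(row, row_hat):      # states k
                n_params += 1
                if p > 0:                           # 0 * log(0 / x) is taken as 0
                    total += p * math.log(p / p_hat)
    return total / n_params


# Hypothetical chain DAG and an estimate with one extra edge.
true_dag = {("R1", "R2"), ("R2", "R3"), ("R3", "R4"), ("R4", "TH")}
est_dag = true_dag | {("R3", "TH")}
print(shd(true_dag, est_dag))

# Hypothetical single binary node with one parent configuration.
theta_true = [[[0.5, 0.5]]]
theta_est = [[[0.25, 0.75]]]
print(mean_kl(theta_true, theta_est))
```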

The objective was to train a BN to determine the influence of the daily performance of each workstation, evaluated by its OEE, on the other workstations as well as on the overall output of the manufacturing system, appraised by the throughput (TH). Both variables were discretized to three levels: low (L), typical (TP), and world-class (WC), where WC and L are the best and worst KPI values, respectively. Examples of probability constraint values used for parameter learning are presented in Table 1, and the ranges were chosen by domain experts based on their expertise and knowledge of OEE distribution.

4.1. Experimental Robotic Assembly

An experimental pick-and-place robotic assembly line was set up in a laboratory at our university. It consisted of four industrial ABB IRB140 robots with IRC5 controllers that sequentially assembled 3D-printed interlocking plastic bricks, as shown in Figure 4. The fully assembled part was made up of four top bricks and one base brick, as shown in Figure 5. Each robot attached one top brick to the base brick before passing the subassembly to the next robot. For transporting parts between robot stations, i.e., material handling, an inclined plastic U-channel was used. After a robot completed its assembly task, it placed the subassembly on the elevated end of the appropriate U-channel for the subassembly to slide down to the next robot station under gravity. Each U-channel was fitted with a proximity sensor at the end to detect when an incoming subassembly was available for pick-up. The last robot placed the fully assembled part into a box after it attached the last top brick to it.

The same experimental setup is described in Reference [38], where a digital twin model was created for it to train a BN for all manufacturing faults. In this paper, we instead utilize the digital model to train a BN for factory-level fault diagnostics. Furthermore, in this paper we use an analytical queueing model for structure learning, and propose constraints based on the law of propagation of uncertainty from queueing theory for applying QMAP for parameter learning. Since the robots are healthy and are used for other teaching purposes, the types of faults were limited to those that can be simulated without causing physical damage to the robots, such as compressed air not functioning (simulated by turning off the compressed air valve), controller fault (simulated by stopping the robot program), and low performance (simulated by reducing the speed in the robot program). Part quality faults modeled include part dimension out of specification, part placed incorrectly for pick-up, and top brick out of stock. The target throughput for the assembly line per 2-h shift was 70 fully assembled parts, and the average cycle time for one robot to process a part was about 1.26 min. An operator monitored the assembly process and identified when a defective part was being produced, as defective assemblies caused the robot jaw to jam.

The first step in the BN training process was to determine the BN structure for the system-level performance evaluated by the OEE. As this is an experimental assembly, intervention was carried out directly on the physical assembly line, starting with single-variable intervention. Each OEE variable was set to different states, and any other variable whose state changed was deemed a child node of the set variable. The state change was implemented by increasing or decreasing the availability and/or quality values of the station. For example, intervening on Robot 1 OEE involves changing the availability or quality of Robot 1 and then observing the states of all other variables. In this case, all the other KPIs’ states changed due to this intervention. The resulting DAG from all single-variable interventions is presented in Figure 6. This DAG has several edges that require the parent-confirmation intervention step because nodes may show an effect not only on their child node but also on their descendant nodes, e.g., Robot 1 OEE has an edge to Robot 3 OEE, and so on. The correct DAG following the parent-confirmation step is shown in Figure 7. Next, this experiment was repeated in the simulation model and the result was the same. A total of 1000 data points were generated from the assembly line’s DES model to learn the DAG using the Hill-Climbing (HC) algorithm, and the learned DAG, which has an SHD of one, is shown in Figure 8.

The DAG elicited by applying our queueing model agrees exactly with the DAG from the designed experiment shown in Figure 7. That is, without any data at all, our method is able to generate the correct BN structure. The effects of data size on the accuracy of the learned DAG and of the learned parameters are presented in Figure 9 and Figure 10, respectively. These two figures show the improvement in the learned DAG and parameters as data size increases. The difference between the DAG and parameters learned using the proposed method and those learned using traditional methods is larger at lower data sizes. The QMAP method outperforms the MLE method at all data sizes, as shown in Figure 10, and both methods converge to about the same KL-divergence score as data size increases.

From the experimental assembly line’s factory-level fault diagnostics BN model in Figure 7, the probabilities of different events can be computed given some evidence. For example, the probability that Robot 4 OEE is world-class (WC) given that line throughput (TH) is WC is 0.84. Likewise, the probability that line TH is WC given that Robot 4 OEE is WC is 1. The probability that line TH is Low given that Robot 2 OEE is WC is 0.015, while the probability that Robot 2 OEE is WC given that line TH is Low is 0.545. These examples illustrate how a BN model can be used to reason in both the diagnostic and prognostic directions.
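The two reasoning directions can be made concrete with a two-node network (parent OEE → child KPI). In the sketch below, the prior and the CPT are hypothetical values chosen only for illustration; they are not the parameters learned in this work, and the function names are likewise assumptions.

```python
STATES = ["L", "TP", "WC"]

# Hypothetical prior P(parent) and CPT P(child | parent); not the learned values.
prior_parent = {"L": 0.2, "TP": 0.5, "WC": 0.3}
cpt_child = {
    "L":  {"L": 0.90, "TP": 0.09, "WC": 0.01},
    "TP": {"L": 0.10, "TP": 0.80, "WC": 0.10},
    "WC": {"L": 0.02, "TP": 0.18, "WC": 0.80},
}


def prognostic(child_state, parent_state):
    """P(child | parent): read directly from the CPT (cause to effect)."""
    return cpt_child[parent_state][child_state]


def diagnostic(parent_state, child_state):
    """P(parent | child): Bayes' rule over all parent states (effect to cause)."""
    joint = {p: prior_parent[p] * cpt_child[p][child_state] for p in STATES}
    return joint[parent_state] / sum(joint.values())


print(prognostic("WC", "WC"))
print(diagnostic("WC", "WC"))
```

The prognostic query is a table lookup, whereas the diagnostic query also depends on the prior of the parent, which is why the two conditional probabilities for the same pair of states generally differ.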

To illustrate how using the correct DAG can improve fault diagnostics and promote sustainability, consider a case where the throughput of the experimental assembly line is low and needs to be improved. Assume that Robots 3 and 4 both have faulty grippers, such that they both produce a large number of defective assemblies, causing them both to have low OEE. Furthermore, these faulty grippers cannot be repaired and some cost is required to replace them, but the current budget is limited. According to the incorrect DAG learned using the HC algorithm (Figure 8), Robot 3 OEE directly influences the overall throughput of the system alongside Robot 4 OEE. Thus, improving Robot 3's OEE without improving Robot 4's OEE should result in some improvement of the overall throughput of the system, which is not the case in reality. If resources are invested to improve Robot 3 alone without fixing Robot 4, the throughput of the assembly line will stay the same (only an improvement in Robot 4's OEE can directly improve the overall throughput according to the correct DAG in Figure 7) and defective parts will continue to be produced until Robot 4 is also fixed. This incorrect diagnosis will result in the production of defective parts, which is a source of material waste as well as energy waste.

4.2. Tough Trucks Robotic Machining Cell

Our proposed method was also applied to train a BN for system-level fault diagnosis of a robotic machining cell in the factory of a commercial vehicle original equipment manufacturer (OEM), which we will call Tough Trucks (name altered for privacy). The process flowchart of the machining cell is presented in Figure 11. This cell consists of three horizontal CNC machines that work in parallel for machining the parts, as well as a robot for material handling within the cell and a capping machine that inserts a bearing cap into the housing. The first process involves the machining of an as-cast axle housing to the required specification in a machining cell. Once machined, the housing moves to the part marker device, which inscribes an identification number. Next, the marked axle housing is sent to a part washer station for washing before proceeding to the final assembly station, where all internal components are inserted into the housing to create the final axle. The final axle then proceeds to the Marposs inspection station for quality inspection.

The goal is to train a BN for system-level fault diagnosis based on the OEE of the machining cell (Cell OEE) as well as that of the individual CNC machines (M1 OEE, M2 OEE and M3 OEE), part marker OEE (Mark OEE), part washer OEE (Wash OEE), axle assembly OEE (Assy OEE), and the throughput (TH) of the manufacturing line. Although a lot of the operational data were available, some of the OEE data were not. Therefore, for the purpose of illustrating the proposed BN learning we synthesized the missing data using the detailed DES model we developed to ensure overall consistency and to run a heuristic search method.

The DAG obtained using our proposed queueing-model structure learning method is presented in Figure 12. Since production cannot be stopped for the sake of carrying out intervention on the actual shop floor, intervention was carried out in the DES model of the manufacturing line to determine the true DAG using our proposed intervention method. The resulting DAG exactly matched the one developed using the queueing model. The effect of data size on the DAG learned using the HC heuristic search algorithm is presented in Figure 13. The DAG obtained using 1000 data points has an SHD value of 2 and is presented in Figure 14. Using our proposed method with no data at all, the correct DAG was determined, showing the superiority of our method when compared to heuristic search methods, which require a good amount of data to generate a good DAG. For parameter learning, the comparison between the QMAP and MLE methods for different data sizes is presented in Figure 15, and it can be seen that the QMAP method also outperforms the MLE method for all data sizes.

From the Tough Trucks BN model CPT in Figure 12, probabilities of different events can be computed given some evidence. For example, the probability that Cell OEE is TP given that Wash OEE is TP is 0.926 (meaning highly likely), and this probability drops to 0.054 when Wash OEE is WC because it is less likely to have a typical OEE at Cell when Wash OEE is high. Reasoning in the other direction, the probability that Wash OEE is TP given that Cell OEE is TP is 0.937, and the probability that Wash OEE is WC given that Cell OEE is TP is 0.064. Additionally, machine M2 usually has to have WC OEE in order to obtain a WC OEE at the assembly, i.e., the probability of Assy OEE = WC given that M2 OEE = WC is 0.629. This probability drops to 0.360 when M2 OEE = TP, and to 0.091 when M2 OEE = Low.

5. Discussion

We have demonstrated how several engineering models can be leveraged to accelerate the training of a BN model for factory-level fault diagnostics, especially for companies at the nascent stage of data collection. From the experimental and case study results, our proposed method was able to generate the correct DAG structure even without any data available, whereas the HC algorithm required at least 100 data points to perform close to ours on average. For parameter learning, our method produced parameters that were about five times closer to the ground truth even with no data points, and the MLE method required at least 200 data points to match the performance of our proposed approach with no data, showing the superiority of our method. In terms of the computational complexity of the proposed method, the SIA structure learning algorithm takes approximately O(n) computational time, where n is the number of variables in the BN. QMAP also takes approximately O(n), where n is the number of parameters to be estimated [57].

Nevertheless, there are limitations to the proposed methods. First, the queueing model for structure learning is only applicable to manufacturing lines where blocking cannot occur. Blocking occurs when a completed part cannot proceed to the downstream machine due to buffer capacity limitation. The finished part will therefore prevent the upstream machine from processing other parts, thus creating a situation where downstream machines’ performance can affect that of upstream machines leading to a violation of the BN acyclicity assumption. Furthermore, the proposed queueing model-based DAG learning method is limited to directly eliciting the DAG of one KPI type (i.e., OEE) on several machines. To apply it to elicit the DAG of different KPI types on the same machine, a hybrid DAG learning approach will have to be implemented where our method is used to limit the DAG search space for other DAG learning methods as discussed in the second part of Section 3.1.1.

In order to accurately learn the correct DAG through interventions using the DES model of the factory, it is crucial that the DES model is representative of the current state of the factory, i.e., that it accurately captures the system-level dynamics of the real factory. However, it is possible for the DES model to become obsolete if the factory undergoes updates or modifications subsequent to the design of the DES model, rendering the model’s logic and/or parameters outdated. One way to prevent this is to utilize a digital twin (DT) model instead of a DES model, as proposed in Reference [38]. The live automated bidirectional data exchange between the DT model and the physical factory can continuously keep the model’s parameters up-to-date with its physical counterpart.

6. Conclusions and Future Work

Lack of adequate and appropriate data is a major impediment in the application of Bayesian Network (BN) models for factory-level fault diagnostics. This is a pervasive problem because many companies are still in the early stages of digitalization and smart manufacturing. In this paper, we presented methods to learn the BN structure by leveraging engineering models typically used for design and analysis of factories: analytical models based on queueing theory and discrete-event simulation (DES) models. Furthermore, we provided qualitative constraints from queueing theory for learning the parameters using a qualitative maximum a posteriori (QMAP) method. Engineering models embody domain knowledge and thus augment the expert knowledge in the BN training process.

There are several fruitful directions of research stemming from this paper. One would be to study the situation when blocking can occur in the system, which in turn will violate the acyclicity assumption. Approaches that could be investigated include Markov networks, which have undirected graphs, and dynamic BNs, which provide an acyclic version of feedback loops. We only focused on modeling one key performance indicator (KPI) type on several machines in this research, and a natural extension is to include the influence of different KPIs on one another, such as the effect of the OEE of station one on the throughput of station two; this will be beneficial as different KPIs may be relevant for different workstations. Also, since more data can reduce the performance of ML models in some cases, more research is required to determine the right amount of data needed. Given that KPIs are continuous variables but were discretized in this work, the effect of different discretization levels should be investigated. Furthermore, modeling KPIs as continuous variables using continuous BNs needs to be studied. Additionally, the sensitivity of the QMAP method to the selected probability constraints should be further evaluated.

Using OEE as a KPI makes the approach presented in this paper relevant not only for practitioners but also for furthering additional aspects of resilience and sustainability. Future work could also consider additional KPIs, such as energy and environment, based on the data availability and overall performance objectives. Another future research opportunity is to thoroughly investigate how Industry 4.0 changes the impact of better capacity utilization on sustainability.

Author Contributions

Conceptualization, V.P.; methodology, software, validation, and formal analysis, T.A.; writing—original draft preparation, T.A.; writing—review and editing, V.P.; visualization, T.A.; supervision, V.P.; project administration, V.P.; funding acquisition, V.P. All authors have read and agreed to the published version of the manuscript.

Funding

The work described in this paper was funded in part by a National Institute of Standards and Technology (NIST) cooperative agreement with Penn State University, 70NANB14H255.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data not available due to legal restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Aditya, P.; Diego, G. Achieving Sustainable Development through Maintenance Excellence. J. Appl. Eng. Sci. 2012, 10, 79–84.
Takata, S.; Kimura, F.; van Houten, F.; Westkamper, E.; Shpitalni, M.; Ceglarek, D.; Lee, J. Maintenance: Changing Role in Life Cycle Management. CIRP Ann. 2004, 53, 643–655.
Garetti, M.; Taisch, M. Sustainable Manufacturing: Trends and Research Challenges. Prod. Plan. Control 2012, 23, 83–104.
Domingo, R.; Aguado, S. Overall Environmental Equipment Effectiveness as a Metric of a Lean and Green Manufacturing System. Sustainability 2015, 7, 9031–9047.
Franciosi, C.; Voisin, A.; Miranda, S.; Riemma, S.; Iung, B. Measuring Maintenance Impacts on Sustainability of Manufacturing Industries: From a Systematic Literature Review to a Framework Proposal. J. Clean. Prod. 2020, 260, 121065.
Sénéchal, O.; Trentesaux, D. A Framework to Help Decision Makers to Be Environmentally Aware during the Maintenance of Cyber Physical Systems. Environ. Impact Assess. Rev. 2019, 77, 11–22.
Sarswatula, S.A.; Pugh, T.; Prabhu, V. Modeling Energy Consumption Using Machine Learning. Front. Manuf. Technol. 2022, 2, 855208.
Coro, A.; Macareno, L.M.; Aguirrebeitia, J.; de Lacalle, L.N.L. A Methodology to Evaluate the Reliability Impact of the Replacement of Welded Components by Additive Manufacturing Spare Parts. Metals 2019, 9, 932.
Lee, J. Machine Performance Monitoring and Proactive Maintenance in Computer-Integrated Manufacturing: Review and Perspective. Int. J. Comput. Integr. Manuf. 1995, 8, 370–380.
Chen, R.; Lu, Y.; Witherell, P.; Simpson, T.W.; Kumara, S.; Yang, H. Ontology-Driven Learning of Bayesian Network for Causal Inference and Quality Assurance in Additive Manufacturing. IEEE Robot. Autom. Lett. 2021, 6, 6032–6038.
Everton, S.K.; Hirsch, M.; Stravroulakis, P.; Leach, R.K.; Clare, A.T. Review of In-Situ Process Monitoring and in-Situ Metrology for Metal Additive Manufacturing. Mater. Des. 2016, 95, 431–445.
Scholten, B. Integrating ISA-88 and ISA-95. In Proceedings of the ISA EXPO 2007, Houston, TX, USA, 2–4 October 2007; International Society of Automation (ISA): Houston, TX, USA, 2007.
Brundage, M.P.; Kulvatunyou, B.; Ademujimi, T.; Rakshith, B. Smart Manufacturing Through a Framework for a Knowledge-Based Diagnosis System. In Proceedings of the ASME 2017 International Manufacturing Science and Engineering Conference, MSEC2017, Los Angeles, CA, USA, 4–8 June 2017; American Society of Mechanical Engineers: New York, NY, USA, 2017; Volume 50749.
ISO 22400-1; Automation Systems and Integration—Key Performance Indicators (KPIs) for Manufacturing Operations Management—Part 1: Overview, Concepts and Terminology. International Organization for Standardization: Geneva, Switzerland, 2014.
ISO 22400-2; Automation Systems and Integration—Key Performance Indicators (KPIs) for Manufacturing Operations Management—Part 2: Definitions and Descriptions of KPIs. International Organization for Standardization: Geneva, Switzerland, 2011.
Xiao, C.; Jin, Y.; Liu, J.; Zeng, B.; Huang, S. Optimal Expert Knowledge Elicitation for Bayesian Network Structure Identification. IEEE Trans. Autom. Sci. Eng. 2018, 15, 1163–1177.
Panicker, S.; Nagarajan, H.P.; Mokhtarian, H.; Hamedi, A.; Chakraborti, A.; Coatanéa, E.; Haapala, K.R.; Koskinen, K. Tracing the Interrelationship between Key Performance Indicators and Production Cost Using Bayesian Networks. Procedia CIRP 2019, 81, 500–505.
Eberhardt, F.; Scheines, R. Interventions and Causal Inference. Philos. Sci. 2007, 74, 981–995.
Hauser, A.; Bühlmann, P. Two Optimal Strategies for Active Learning of Causal Models from Interventional Data. Int. J. Approx. Reason. 2014, 55, 926–939.
Eaton, D.; Murphy, K. Exact Bayesian Structure Learning from Uncertain Interventions. Artif. Intell. Stat. 2007, 2, 107–114.
Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
Yang, Y.; Gao, X.; Guo, Z.; Chen, D. Learning Bayesian Networks Using the Constrained Maximum a Posteriori Probability Method. Pattern Recognit. 2019, 91, 123–134.
Tolio, T.; Sacco, M.; Terkaj, W.; Urgo, M. Virtual Factory: An Integrated Framework for Manufacturing Systems Design and Analysis. Procedia CIRP 2013, 7, 25–30.
Jain, S.; Shao, G.; Shin, S.-J. Manufacturing Data Analytics Using a Virtual Factory Representation. Int. J. Prod. Res. 2017, 55, 5450–5464.
Chang, R.; Wang, W. Novel Algorithm for Bayesian Network Parameter Learning with Informative Prior Constraints. In Proceedings of the 2010 International Joint Conference on Neural Networks (IJCNN), Barcelona, Spain, 18–23 July 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–8.
Rosen, M.A.; Kishawy, H.A. Sustainable Manufacturing and Design: Concepts, Practices and Needs. Sustainability 2012, 4, 154–174.
Ameri, F.; Sormaz, D.; Psarommatis, F.; Kiritsis, D. Industrial Ontologies for Interoperability in Agile and Resilient Manufacturing. Int. J. Prod. Res. 2022, 60, 420–441.
Vrignat, P.; Kratz, F.; Avila, M. Sustainable Manufacturing, Maintenance Policies, Prognostics and Health Management: A Literature Review. Reliab. Eng. Syst. Saf. 2022, 218, 108140.
Angelopoulos, A.; Michailidis, E.T.; Nomikos, N.; Trakadas, P.; Hatziefremidis, A.; Voliotis, S.; Zahariadis, T. Tackling Faults in the Industry 4.0 Era—A Survey of Machine-Learning Solutions and Key Aspects. Sensors 2019, 20, 109.
Pearl, J. Causality: Models, Reasoning and Inference, 2nd ed.; Cambridge University Press: New York, NY, USA, 2009.
Yang, L.; Lee, J. Bayesian Belief Network-Based Approach for Diagnostics and Prognostics of Semiconductor Manufacturing Systems. Robot. Comput.-Integr. Manuf. 2012, 28, 66–74.
Correa, M.; Bielza, C.; Pamies-Teixeira, J. Comparison of Bayesian Networks and Artificial Neural Networks for Quality Detection in a Machining Process. Expert Syst. Appl. 2009, 36, 7270–7279.
Verma, T.S.; Pearl, J. Equivalence and Synthesis of Causal Models. In Proceedings of the Sixth Conference on Uncertainty in Artificial Intelligence, Cambridge, MA, USA, 27–29 July 1990; Elsevier Science: New York, NY, USA, 1991; pp. 255–269.
Betancourt, A. Making AI Work with Small Data. Industry Week. 2020. Available online: https://www.industryweek.com/technology-and-iiot/digital-tools/article/21122846/making-ai-work-with-small-data (accessed on 15 July 2020).
MacAllister, A.; Kohl, A.; Winer, E. Using High-Fidelity Meta-Models to Improve Performance of Small Dataset Trained Bayesian Networks. Expert Syst. Appl. 2020, 139, 112830.
Dey, S.; Stori, J.A. A Bayesian Network Approach to Root Cause Diagnosis of Process Variations. Int. J. Mach. Tools Manuf. 2005, 45, 75–91.
Endo, M.; Tsuruta, K.; Kita, S.; Nakajima, H. A Study of Cause-Effect Structure Acquisition for Anomaly Diagnosis in Discrete Manufacturing Processes. In Proceedings of the 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, 12–15 October 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 2099–2104.
Ademujimi, T.; Prabhu, V. Digital Twin for Training Bayesian Networks for Fault Diagnostics of Manufacturing Systems. Sensors 2022, 22, 1430.
Li, J.; Shi, J. Knowledge Discovery from Observational Data for Process Control Using Causal Bayesian Networks. IIE Trans. 2007, 39, 681–690.
Flores, M.J.; Nicholson, A.E.; Brunskill, A.; Korb, K.B.; Mascaro, S. Incorporating Expert Knowledge When Learning Bayesian Network Structure: A Medical Case Study. Artif. Intell. Med. 2011, 53, 181–204.
Ademujimi, T.; Prabhu, V. Fusion-Learning of Bayesian Network Models for Fault Diagnostics. Sensors 2021, 21, 7633.
Liu, Y.; Jin, S. Application of Bayesian Networks for Diagnostics in the Assembly Process by Considering Small Measurement Data Sets. Int. J. Adv. Manuf. Technol. 2013, 65, 1229–1237.
De, S.; Das, A.; Sureka, A. Product Failure Root Cause Analysis during Warranty Analysis for Integrated Product Design and Quality Improvement for Early Results in Downturn Economy. Int. J. Prod. Dev. 2010, 12, 235–253.
Pradhan, S.; Singh, R.; Kachru, K.; Narasimhamurthy, S. A Bayesian Network Based Approach for Root-Cause-Analysis in Manufacturing Process. In Proceedings of the 2007 International Conference on Computational Intelligence and Security (CIS 2007), Harbin, China, 15–19 December 2007; IEEE: Harbin, China, 2007; pp. 10–14.
Przytula, K.; Milford, R. An Efficient Framework for the Conversion of Fault Trees to Diagnostic Bayesian Network Models. In Proceedings of the 2006 IEEE Aerospace Conference, Big Sky, MT, USA, 4–11 March 2006; IEEE: Piscataway, NJ, USA, 2006; pp. 1–14.
Hu, J.; Zhang, L.; Cai, Z.; Wang, Y. An Intelligent Fault Diagnosis System for Process Plant Using a Functional HAZOP and DBN Integrated Methodology. Eng. Appl. Artif. Intell. 2015, 45, 119–135.
Cheng, J.; Zhu, C.; Fu, W.; Wang, C.; Sun, J. An Imitation Medical Diagnosis Method of Hydro-Turbine Generating Unit Based on Bayesian Network. Trans. Inst. Meas. Control 2019, 41, 3406–3420.
Zhang, Y.; Mu, L.; Shen, G.; Yu, Y.; Han, C. Fault Diagnosis Strategy of CNC Machine Tools Based on Cascading Failure. J. Intell. Manuf. 2019, 30, 2193–2202. [Google Scholar] [CrossRef]
Kirchhubel, D.; Jorgensen, T.M. Generating Diagnostic Bayesian Networks from Qualitative Causal Models. In Proceedings of the 24th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA), Zaragoza, Spain, 10–13 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1239–1242. [Google Scholar] [CrossRef]
Mondal, P.P.; Ferreira, P.M.; Kapoor, S.G.; Bless, P.N. Sequential Modeling and Knowledge Source Integration for Identifying the Structure of a Bayesian Network for Multistage Process Monitoring and Diagnosis. J. Manuf. Sci. Eng. 2024, 146, 011005. [Google Scholar] [CrossRef]
Liu, Y.; Liu, Z.; Yerudkar, A.; Del Vecchio, C. Stabilization of Probabilistic Boolean Networks via State-Flipped Control and Reinforcement Learning. IEEE Trans. Autom. Control 2023, 1–8. [Google Scholar] [CrossRef]
Liu, Z.; Zhong, J.; Liu, Y.; Gui, W. Weak Stabilization of Boolean Networks Under State-Flipped Control. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 2693–2700. [Google Scholar] [CrossRef]
Rohmer, J. Uncertainties in Conditional Probability Tables of Discrete Bayesian Belief Networks: A Comprehensive Review. Eng. Appl. Artif. Intell. 2020, 88, 103384. [Google Scholar] [CrossRef]
Li, H.; Wang, F.; Li, H. Abnormal Condition Identification and Safe Control Scheme for the Electro-Fused Magnesia Smelting Process. ISA Trans. 2018, 76, 178–187. [Google Scholar] [CrossRef]
Yuan, P.; Sun, Y.; Li, H.; Wang, F.; Li, H. Abnormal Condition Identification Modeling Method Based on Bayesian Network Parameters Transfer Learning for the Electro-Fused Magnesia Smelting Process. IEEE Access 2019, 7, 149764–149775. [Google Scholar] [CrossRef]
Guo, Z.-G.; Gao, X.-G.; Ren, H.; Yang, Y.; Di, R.-H.; Chen, D.-Q. Learning Bayesian Network Parameters from Small Data Sets: A Further Constrained Qualitatively Maximum a Posteriori Method. Int. J. Approx. Reason. 2017, 91, 22–35. [Google Scholar] [CrossRef]
Hou, Y.; Zheng, E.; Guo, W.; Xiao, Q.; Xu, Z. Learning Bayesian Network Parameters With Small Data Set: A Parameter Extension under Constraints Method. IEEE Access 2020, 8, 24979–24989. [Google Scholar] [CrossRef]
Jain, S.; Narayanan, A.; Lee, Y.-T.T. Comparison of Data Analytics Approaches Using Simulation. In Proceedings of the 2018 Winter Simulation Conference (WSC), Gothenburg, Sweden, 9–12 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1084–1095. [Google Scholar] [CrossRef]
Bustillo, A.; Urbikain, G.; Perez, J.M.; Pereira, O.M.; de Lacalle, L.N.L. Smart Optimization of a Friction-Drilling Process Based on Boosting Ensembles. J. Manuf. Syst. 2018, 48, 108–121. [Google Scholar] [CrossRef]
Clemons, J.; Why Smart Manufacturing Projects Fail. Forbes Technology Council. 2021. Available online: https://www.forbes.com/sites/forbestechcouncil/2021/05/03/why-smart-manufacturing-projects-fail/?sh=81bdab72f8ef (accessed on 15 August 2021).
Hopp, W.J.; Spearman, M.L. Factory Physics, 3rd ed.; Waveland Press: Long Grove, IL, USA, 2011. [Google Scholar]
Masegosa, A.R.; Moral, S. An Interactive Approach for Bayesian Network Learning Using Domain/Expert Knowledge. Int. J. Approx. Reason. 2013, 54, 1168–1181. [Google Scholar] [CrossRef]
Eberhardt, F. Almost Optimal Intervention Sets for Causal Discovery. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence, Catalina Island, CA, USA, 15–17 August 2012; Available online: http://arxiv.org/abs/1206.3250 (accessed on 24 December 2019).
Scutari, M. Learning Bayesian Networks with the Bnlearn R Package. J. Stat. Softw. 2010, 35, 1–22. [Google Scholar] [CrossRef]
Tsamardinos, I.; Brown, L.E.; Aliferis, C.F. The Max-Min Hill-Climbing Bayesian Network Structure Learning Algorithm. Mach. Learn. 2006, 65, 31–78. [Google Scholar] [CrossRef]
de Jongh, M.; Druzdzel, M.J. A Comparison of Structural Distance Measures for Causal Bayesian Network Models. In Recent Advances in Intelligent Information Systems; Springer: Berlin/Heidelberg, Germany, 2009; pp. 443–456. [Google Scholar]
Kullback, S.; Leibler, R. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
Hu, Z.; Hong, L.J. Kullback-Leibler divergence constrained distributionally robust optimization. Available Optim. Online 2013, 1, 9. [Google Scholar]

Figure 1. Illustration of Markov equivalence graphs. (a–c) are Markov equivalent. (d) Completed partially directed acyclic graph of (a–c). (e) Skeleton of (a–c).

Figure 2. Engineering model-driven Bayesian network learning for factory-level fault diagnostics showing available data/knowledge for BN learning at pre-digitalization and post-digitalization stages.

Figure 3. (a) Propagation of variability between workstations in a serial manufacturing line (Adapted from Reference [61]). (b) The corresponding Bayesian network structure of the serial manufacturing line shown in (a).

Figure 4. Experimental robotic-assembly line setup showing the four robots.

Figure 5. Zoomed-in image of the part assembly, starting from an empty base brick on the left, followed by a base brick with one top brick, and finally a fully assembled part (base brick with four top bricks) on the right.

Figure 6. Resulting DAG after single-variable intervention for the robotic assembly line BN.

Figure 7. The correct DAG for the experimental assembly line and the CPTs. TP and WC refer to typical OEE and world-class OEE, respectively, while LTH, TPTH, and WCTH refer to low throughput, typical throughput, and world-class throughput, respectively.

Figure 8. The incorrect DAG structure learned using the Hill-Climbing algorithm for the experimental assembly line. It has an SHD score of one, and the incorrect arc is shown in red.

Figure 9. Data size versus DAG structure accuracy for the experimental assembly line, comparing the Hill-Climbing (HC) algorithm and our proposed engineering (engr) model structure learning method. For the DAG learned using the HC algorithm, the SHD score varies across the random subsets selected for each data size. Thus, the minimum SHD (min SHD), average SHD (avg SHD), and maximum SHD (max SHD) scores are plotted.

Figure 10. Data size versus parameter accuracy for the QMAP and MLE parameter estimation methods for the experimental assembly line.

Figure 11. Process flowchart for the Tough Trucks production line.

Figure 12. Correct DAG structure for the Tough Trucks manufacturing line with CPTs.

Figure 13. Data size versus structure accuracy for Tough Trucks, using the SHD score to compare the Hill-Climbing (HC) algorithm and our proposed engineering (engr) model structure learning method. For the DAG learned using the HC algorithm, the SHD score varies across the random subsets selected for each data size. Thus, the minimum SHD (min_SHD), average SHD (avg_SHD), and maximum SHD (max_SHD) scores are shown.

Figure 14. Resulting DAG from the Hill-Climbing algorithm for Tough Trucks, showing incorrect arcs in red.

Figure 15. Data size versus parameter accuracy for the MLE and QMAP parameter learning methods for the Tough Trucks Bayesian network.

Table 1. Examples of probability constraints used in the QMAP method.

Event	Probability-Constraint Range	Remark
Low OEE of root node	0.015 to 0.035	Marginal distribution
World-class OEE of root node	0.650 to 0.800	Marginal distribution
Highly unlikely event	0.001 to 0.002	Conditional distribution
Highly likely event	0.700 to 0.950	Conditional distribution
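
As a rough illustration of how the ranges in Table 1 might be represented in software, the following minimal Python sketch stores each range and checks a candidate probability against it. It is not the QMAP estimator itself. The dictionary keys and the helper names `satisfies` and `clip_to_range` are hypothetical labels introduced here, and clipping an estimate into its range is an assumption made for illustration, not a step described in the article.

```python
# Constraint ranges copied from Table 1. The keys are illustrative labels,
# not identifiers used by the authors.
CONSTRAINTS = {
    "low_oee_root": (0.015, 0.035),          # marginal distribution
    "world_class_oee_root": (0.650, 0.800),  # marginal distribution
    "highly_unlikely": (0.001, 0.002),       # conditional distribution
    "highly_likely": (0.700, 0.950),         # conditional distribution
}


def satisfies(event: str, p: float) -> bool:
    """Return True if probability p lies inside the range for the event."""
    lower, upper = CONSTRAINTS[event]
    return lower <= p <= upper


def clip_to_range(event: str, p: float) -> float:
    """Move p to the nearest bound if it falls outside the event's range.

    This projection is an assumption for illustration only.
    """
    lower, upper = CONSTRAINTS[event]
    return min(max(p, lower), upper)


if __name__ == "__main__":
    estimate = 0.01  # hypothetical estimate for a highly unlikely event
    print(satisfies("highly_unlikely", estimate))
    print(clip_to_range("highly_unlikely", estimate))
```

Keeping the ranges in one lookup table makes it straightforward to audit which expert constraint applies to which entry of a conditional probability table.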

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

