An Outlier Detection Approach to Recognize the Sources of a Process Failure within a Multivariate Poisson Process

Hou, Chia-Ding; Su, Rung-Hung

doi:10.3390/math12182813


by Chia-Ding Hou and Rung-Hung Su *

Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City 242062, Taiwan

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(18), 2813; https://doi.org/10.3390/math12182813

Submission received: 2 August 2024 / Revised: 9 September 2024 / Accepted: 10 September 2024 / Published: 11 September 2024

(This article belongs to the Section D1: Probability and Statistics)


Abstract

Among attribute processes, the number of nonconformities, which typically follows a Poisson distribution, is among the most crucial quality attributes. Because products usually have several quality attributes, the multivariate Poisson process is of considerable importance in industry. An out-of-control multivariate Poisson process can be detected through an alarm on a multivariate control chart. Nevertheless, pinpointing the specific quality attributes that led to the process shift is difficult. Unlike the majority of studies, which examine shifts in multivariate normal processes, this study focuses on the causes of process shifts in multivariate Poisson processes. This paper first presents a statistical method for detecting outliers in a multivariate Poisson distribution. A progressive testing algorithm is then developed to identify the variables responsible for a failure within a multivariate Poisson process. According to simulation results, the proposed approach can effectively determine the sources of a process fault within a multivariate Poisson process.

Keywords: nonconformity; multivariate Poisson process; outlier

MSC: 62P30; 62J15

1. Introduction

Pinpointing the causes of process failures is crucial to improving a process. Multivariate control charts are widely used to detect process faults. However, a chart signal only indicates that a fault has disrupted the underlying process; it does not reveal which variables are responsible. Identifying the variables behind a signal on a multivariate control chart has therefore become increasingly important and has driven a rapid growth of related research.

Most previous work has sought to determine the variables that induce changes in the mean or variance of a multivariate normal process. Several studies have proposed soft-computing approaches for detecting changes in the process mean or variance [1,2,3,4,5,6,7,8,9,10,11]. Others have applied decomposition statistics to identify changes in the mean or variance of a multivariate normal process [12,13,14,15,16,17,18,19,20]. However, the majority of these studies assume that the quality characteristics can be measured numerically and follow a multivariate normal distribution. Many processes cannot be quantified in this way. Instead, multiple correlated attributes must be assessed simultaneously to determine quality characteristics [21,22,23,24,25,26,27,28,29,30,31]. When a process has multiple attributes and the numbers of nonconformities follow a multivariate Poisson distribution, as is often the case in practice [23,30,31], recognizing the cause of a signal is crucial. Therefore, in contrast to the earlier work on shifts in multivariate normal processes, the aim of this research is to devise an effective method for recognizing the sources of a process failure in a multivariate Poisson process.

The next section introduces the proposed method for recognizing the sources of nonconformity shifts in a multivariate Poisson process. Section 3 presents numerical results that show the efficiency of the proposed procedure. Section 4 provides a demonstrative example. The last section concludes the study.

2. Main Results

Let the $k$ characteristics of the $i$th observation be represented by

$$X_{i} = [X_{i1}, X_{i2}, \dots, X_{ik}]^{'}, \quad i = 1, 2, \dots, n, \tag{1}$$

which follows a multivariate Poisson distribution $MVP_{k}(λ, Σ)$, where $λ = [λ_{1}, λ_{2}, \dots, λ_{k}]^{'}$, $Σ = \mathrm{Cov}(X_{i}) = [σ_{st}]_{k \times k}$, and $σ_{st} = \mathrm{cov}(X_{is}, X_{it})$. To monitor a multivariate Poisson process, numerous multivariate control charts have been suggested in the literature. For example, Chiu and Kuo [31] proposed the following control limits:

$$\begin{array}{l} UCL = \sum_{j=1}^{k} λ_{j} + 3 \left(\sum_{j=1}^{k} λ_{j} + 2 \sum_{j<l} ρ_{jl} \sqrt{λ_{j} λ_{l}}\right)^{\frac{1}{2}} \\ CL = \sum_{j=1}^{k} λ_{j} \\ LCL = \sum_{j=1}^{k} λ_{j} - 3 \left(\sum_{j=1}^{k} λ_{j} + 2 \sum_{j<l} ρ_{jl} \sqrt{λ_{j} λ_{l}}\right)^{\frac{1}{2}} \end{array} \tag{2}$$

where

$$λ_{j} = E(X_{ij}) \tag{3}$$

and

$$ρ_{jl} = \mathrm{corr}(X_{ij}, X_{il}). \tag{4}$$

In practice, when the parameters $λ_{j}$ and $ρ_{jl}$ are unknown, unbiased estimators can be used in their place. When a multivariate control chart of this kind signals an out-of-control condition, one of the difficulties is determining which variable is responsible. The following subsection first develops an approach, based on the maximum adjusted residual, for detecting outliers in the multivariate Poisson distribution. A straightforward procedure is then proposed to recognize the causes of a process failure in a multivariate Poisson process.
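As a minimal sketch of how the control limits in Equation (2) can be evaluated, the following Python snippet computes them for given in-control means and pairwise correlations. The function name `poisson_chart_limits` and the dictionary layout for the correlations are illustrative assumptions, not part of the original study; the numbers are the in-control bivariate parameters (15.31, 2.35, 0.28) quoted later in Section 3.

```python
import math

def poisson_chart_limits(lam, rho):
    """Control limits of Equation (2) for the sum of k correlated Poisson counts.

    lam: list of in-control means.
    rho: dict mapping index pairs (j, l) with j < l to correlations (hypothetical layout).
    """
    center = sum(lam)
    # Variance of the sum: sum of the means plus twice the pairwise covariances.
    var = center + 2.0 * sum(r * math.sqrt(lam[j] * lam[l]) for (j, l), r in rho.items())
    half_width = 3.0 * math.sqrt(var)
    return center - half_width, center, center + half_width

lcl, cl, ucl = poisson_chart_limits([15.31, 2.35], {(0, 1): 0.28})
print(round(lcl, 2), round(cl, 2), round(ucl, 2))
```

A chart of this kind monitors only the total count, which is why a signal alone does not say which component has shifted.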

2.1. A Method for Identifying Outliers in the Multivariate Poisson Distribution

Consider $X_{1}, X_{2}, \dots, X_{n}$ to be a random sample from the multivariate Poisson distribution described above. The sample means are therefore

$$\bar{X}_{j} = \frac{\sum_{i=1}^{n} X_{ij}}{n}, \quad j = 1, 2, \dots, k. \tag{5}$$

Note that the multivariate Poisson distribution of $X_{i}$ implies that $X_{ij}$ has a Poisson distribution $P(λ_{j})$. In many situations it is necessary to test the hypothesis

$$H_{0}: λ_{1} = λ_{1}^{(0)}, λ_{2} = λ_{2}^{(0)}, \dots, λ_{k} = λ_{k}^{(0)}, \tag{6}$$

where $(λ_{1}^{(0)}, λ_{2}^{(0)}, \dots, λ_{k}^{(0)})$ denotes a predetermined mean vector. Accordingly, the adjusted residual can be defined as follows:

$$\begin{aligned} R_{j} &= \frac{\bar{X}_{j} - E(\bar{X}_{j})}{\sqrt{V(\bar{X}_{j})}} \\ &= \frac{\bar{X}_{j} - λ_{j}^{(0)}}{\sqrt{\frac{λ_{j}^{(0)}}{n}}}, \quad j = 1, 2, \dots, k. \end{aligned} \tag{7}$$

Under the null hypothesis, it is simple to show that $R = [R_{1}, R_{2}, \dots, R_{k}]^{'}$ converges in distribution to $N = [N_{1}, N_{2}, \dots, N_{k}]^{'}$, which follows a multivariate normal distribution with mean $0$ and covariance matrix $Σ^{*} = [σ_{st}^{*}]_{k \times k}$, where

$$σ_{st}^{*} = \begin{cases} 1, & \text{for } s = t \\ \frac{σ_{st}}{\sqrt{λ_{s}^{(0)} λ_{t}^{(0)}}}, & \text{for } s \neq t. \end{cases} \tag{8}$$
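The entries of $Σ^{*}$ follow from a short covariance calculation. The sketch below assumes, as in the text, that the $n$ observations are independent and identically distributed, so that $\mathrm{Cov}(\bar{X}_{s}, \bar{X}_{t}) = σ_{st}/n$; this intermediate step is an added illustration and is not spelled out in the original.

```latex
\begin{aligned}
\mathrm{Cov}(R_{s}, R_{t})
  &= \frac{\mathrm{Cov}(\bar{X}_{s}, \bar{X}_{t})}
          {\sqrt{λ_{s}^{(0)}/n}\,\sqrt{λ_{t}^{(0)}/n}}
   = \frac{σ_{st}/n}{\sqrt{λ_{s}^{(0)} λ_{t}^{(0)}}/n}
   = \frac{σ_{st}}{\sqrt{λ_{s}^{(0)} λ_{t}^{(0)}}}, \quad s \neq t, \\
\mathrm{Var}(R_{s})
  &= \frac{λ_{s}^{(0)}/n}{λ_{s}^{(0)}/n} = 1 .
\end{aligned}
```

Since $σ_{st} = ρ_{st} \sqrt{λ_{s} λ_{t}}$, the off-diagonal entry equals $ρ_{st}$ under the null hypothesis, so $Σ^{*}$ is simply the correlation matrix of the characteristics.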

Discovering outliers in the data is essential across a broad range of applications. Outliers can be defined in various ways depending on the application field. According to Suri, Murty, and Athithan [32], an outlier in a data set is typically defined as an object that deviates from the known/normal behavior or assumes values significantly different from the expected ones. Following this definition, this study defines outliers as data points that do not conform to the given “typical” distribution. Let $d_{i} = λ_{i} - λ_{i}^{(0)}$ denote the difference between $λ_{i}$ and $λ_{i}^{(0)}$. If every $d_{i}$ is zero, there are no outliers. If some differences are positive or negative, the corresponding variables are defined as positive or negative outliers, respectively. To our knowledge, no statistical test for outlier detection in multivariate Poisson-distributed data under this outlier definition has been published so far. To find the positive outliers of a multivariate Poisson distribution, an effective approach may be to use the largest of the adjusted residuals,

$$\max_{j} R_{j}. \tag{9}$$

At significance level $α$, the test against this one-sided alternative rejects $H_{0}$ if

$$\max_{j} R_{j} > c, \tag{10}$$

where $c$ satisfies

$$P[\max_{j} R_{j} > c \mid H_{0}] = α. \tag{11}$$

By utilizing Boole’s inequality, we can obtain an easy-to-use approximation of $c$:

$$\begin{aligned} α &= P[\max_{j} R_{j} > c \mid H_{0}] \\ &\leq \sum_{j=1}^{k} P[R_{j} > c \mid H_{0}] \\ &\approx k (1 - Φ(c)). \end{aligned} \tag{12}$$

Accordingly,

$$c \leq Z_{\frac{α}{k}}, \tag{13}$$

where $Z_{\frac{α}{k}}$ denotes the upper $\frac{α}{k}$ quantile of the standard normal distribution, that is, $1 - Φ(Z_{\frac{α}{k}}) = \frac{α}{k}$. Consequently, $Z_{\frac{α}{k}}$ is an approximate upper bound for $c$. Because it is easy to compute and typically leads to a conservative test, we suggest using this upper bound as the critical value of the test above.
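The approximate critical value $Z_{\frac{α}{k}}$ can be obtained from any standard normal quantile function. The following minimal sketch uses Python's standard library; the helper name `critical_value` is an illustrative assumption.

```python
from statistics import NormalDist

def critical_value(alpha, k):
    """Approximate critical value Z_{alpha/k}: the upper alpha/k quantile of N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / k)

# The bound grows slowly as more characteristics are tested at once.
for k in (1, 2, 3):
    print(k, round(critical_value(0.05, k), 3))
```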

We carry out a Monte Carlo simulation to assess the accuracy of this approximation. Specifically, we estimate $P[\max_{j} R_{j} > Z_{\frac{α}{k}}]$ under the null hypothesis and compare it with the nominal level. We consider two values of $k$: 2 and 3. Additionally, we consider five cases of $λ^{(0)}$, namely (1,…,1), (5,…,5), (10,…,10), (30,…,30), and (50,…,50). Sample sizes of 5, 10, 30, and 50 are simulated to evaluate the effect of sample size. Because the covariance matrix is not positive definite for $ρ \leq -0.1$, the present study considers six values of $ρ$: 0, 0.1, 0.3, 0.5, 0.7, and 0.9. Krummenauer’s [33] algorithm is used to simulate 10,000 samples from each of the given multivariate Poisson populations, and $\max_{j} R_{j}$ is calculated for each sample. The estimate of $P[\max_{j} R_{j} > Z_{\frac{α}{k}}]$ is the proportion of the 10,000 simulated $\max_{j} R_{j}$ values that exceed $Z_{\frac{α}{k}}$. Smaller deviations of this estimate from the nominal level indicate better performance.
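A size check of this kind can be sketched in a few lines of Python for the bivariate case. Note that this is not Krummenauer's algorithm [33]: for simplicity it substitutes the common-shock construction $X_{1} = Y_{1} + Y_{0}$, $X_{2} = Y_{2} + Y_{0}$ with independent Poisson terms, which yields correlation $ρ = θ/λ$ when both means equal $λ$ and $Y_{0}$ has mean $θ$. The function names, the seed, and the Knuth-style Poisson sampler are all illustrative assumptions.

```python
import math
import random
from statistics import NormalDist

def poisson(rng, lam):
    """Knuth's multiplication sampler; adequate for the moderate means used here."""
    if lam <= 0.0:
        return 0
    limit, count, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def estimate_size(lam0, rho, n, alpha, reps, seed=0):
    """Share of replications with max_j R_j > Z_{alpha/k} under H0, for k = 2."""
    rng = random.Random(seed)
    theta = rho * lam0                      # mean of the shared term; cov(X1, X2) = theta
    crit = NormalDist().inv_cdf(1.0 - alpha / 2)
    exceed = 0
    for _ in range(reps):
        totals = [0, 0]
        for _ in range(n):
            shared = poisson(rng, theta)
            totals[0] += poisson(rng, lam0 - theta) + shared
            totals[1] += poisson(rng, lam0 - theta) + shared
        # Adjusted residuals of Equation (7) for the two sample means.
        resid = [(t / n - lam0) / math.sqrt(lam0 / n) for t in totals]
        exceed += max(resid) > crit
    return exceed / reps

print(estimate_size(lam0=5.0, rho=0.3, n=10, alpha=0.05, reps=2000))
```

The printed proportion can then be compared with the nominal level 0.05; the exact value depends on the seed and the number of replications.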

For each value of $λ$, Figure 1 plots the deviations of $P[\max_{j} R_{j} > Z_{\frac{α}{k}}]$ from the nominal levels against the $ρ$ values and $(k, n)$. Figure 1 shows that, at a nominal level of 0.05, around half of the absolute deviations fall below 0.0053 and 70% fall below 0.0095. At a nominal level of 0.01, roughly 70% and 90% of the absolute deviations are below 0.0027 and 0.0057, respectively. When the nominal level is 0.001, 70% of the absolute deviations are less than 0.0007 and 90% are less than 0.0017. Consequently, the approximation appears adequate in most cases.

2.2. A Method for Recognizing the Causes of a Process Failure in a Multivariate Poisson Process

Using the method introduced above and the test procedure outlined below, one can pinpoint the main contributors to the out-of-control signals for a multivariate Poisson process:

(I): Start with i = 1.
(II): Set $α^{*} = \frac{α}{i + 1}$. (The error spending approach and the Bonferroni method are used here to keep the type I error close to its nominal value; see [34].)
(III): At the $α^{*}$ significance level, test the hypothesis $H_{0} : λ_{1} = λ_{1}^{(0)}, λ_{2} = λ_{2}^{(0)}, \dots, λ_{k} = λ_{k}^{(0)}$ with the statistic $\max_{j} R_{j}$, rejecting when it exceeds the critical value $Z_{\frac{α^{*}}{k}}$.
(IV): If the hypothesis is rejected in Step (III), remove the variable with the largest adjusted residual. Without loss of generality, assume the kth variable is removed. Update k = k − 1 and i = i + 1, and return to Step (II).
(V): If the hypothesis in Step (III) cannot be rejected, terminate and conclude that the remaining variables are not responsible for the nonconformity shifts.

The steps are repeated until the hypothesis can no longer be rejected for the remaining subset of variables. The removed characteristics are deemed contributors to the process shift, whereas the remaining characteristics are not considered to be a source.
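Steps (I) to (V) can be sketched as a short loop. The Python snippet below is a minimal illustration that takes the adjusted residuals as given and uses the approximate critical value $Z_{\frac{α^{*}}{k}}$; the function name `identify_contributors` and the zero-based variable indices are assumptions made for the example. The input residuals are those reported in Table 3.

```python
from statistics import NormalDist

def identify_contributors(residuals, alpha=0.05):
    """Apply Steps (I)-(V) and return the indices flagged as contributors, in order."""
    remaining = dict(enumerate(residuals))            # variable index -> adjusted residual
    flagged = []
    i = 1                                             # Step (I)
    while remaining:
        alpha_star = alpha / (i + 1)                  # Step (II)
        k = len(remaining)
        crit = NormalDist().inv_cdf(1.0 - alpha_star / k)
        j_max = max(remaining, key=remaining.get)     # Step (III): largest residual
        if remaining[j_max] <= crit:                  # Step (V): cannot reject, stop
            break
        flagged.append(j_max)                         # Step (IV): remove and iterate
        del remaining[j_max]
        i += 1
    return flagged

print(identify_contributors([3.30, 2.87, 1.30]))  # indices 0, 1, 2 stand for N, V, S
```

Because the residuals do not change between iterations, the loop is equivalent to comparing the ranked residuals with a sequence of critical values, as the summary that follows describes.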

To summarize, the method can be outlined as follows. Once a data set is collected, use the in-control process mean to calculate the adjusted residual for each quality characteristic. Then, rank these adjusted residuals from largest to smallest and compare them sequentially, in that order, with the corresponding critical values to determine whether each quality characteristic is out of control. The proposed method rests on the fundamental assumption that the quality characteristics follow a multivariate Poisson distribution. If they do not conform to this distribution, the method will not be able to identify the sources accurately. The method also assumes that the multivariate control chart correctly detects an out-of-control process. Under these circumstances, the method can effectively determine the contributors. If the multivariate control chart fails to detect the out-of-control condition, the method is not applicable.

As commonly understood, the multivariate Poisson distribution has broad applications in various fields. In addition, many research issues are essentially statistical problems focused on detecting outliers. Therefore, the approach introduced in this study can be applied not only for detecting sources of out-of-control processes in a multivariate Poisson process but can also offer practical potential for addressing problems involving the detection of outliers in multivariate Poisson distribution data in various fields.

3. Numerical Simulations

We perform simulations to demonstrate the effectiveness of the approach introduced above. Examining every possible data structure is infeasible, so this study considers two values of $k$: 2 and 3. All simulation programs were coded in FORTRAN V. When $k = 2$, we use the telecommunication data set discussed in Chiu and Kuo [31] and assume that the data have a bivariate Poisson distribution with $(λ_{1}^{(0)}, λ_{2}^{(0)}, ρ) = (15.31, 2.35, 0.28)$ when the process is in control. When $k = 3$, we use the hepatitis C data set examined in Pascual and Akhundjanov [30] and assume

$$(λ_{1}^{(0)}, λ_{2}^{(0)}, λ_{3}^{(0)}) = (3.15, 14.00, 4.80) \tag{14}$$

and that the correlation matrix is

$$Ψ = \begin{bmatrix} 1 & 0.00 & 0.36 \\ 0.00 & 1 & 0.16 \\ 0.36 & 0.16 & 1 \end{bmatrix}. \tag{15}$$

Additionally, three out-of-control simulation cases were considered, with the following mean vectors:

$$(λ_{1}^{(0)} + δ \sqrt{λ_{1}^{(0)}}, λ_{2}^{(0)}, \dots, λ_{k}^{(0)}), \quad \text{(case I)}$$

$$(λ_{1}^{(0)} + δ \sqrt{λ_{1}^{(0)}}, λ_{2}^{(0)} + δ \sqrt{λ_{2}^{(0)}}, λ_{3}^{(0)}), \quad \text{(case II)}$$

and

$$(λ_{1}^{(0)} + δ \sqrt{λ_{1}^{(0)}}, λ_{2}^{(0)} + δ \sqrt{λ_{2}^{(0)}}, λ_{3}^{(0)} + δ \sqrt{λ_{3}^{(0)}}). \quad \text{(case III)}$$

Twelve values of $δ$ are considered: 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0, 2.25, 2.5, 2.75, and 3.0. Cases II and III are examined only for $k = 3$. To assess the impact of sample size, the sample size is set to 5, 10, 20, 30, 50, and 100 for case I; 10, 30, 50, 100, 200, and 500 for case II; and 30, 50, 100, 200, 500, and 700 for case III. The significance level is set at 0.05. In this simulation experiment, the critical value of the test statistic is calculated with the approximation presented in the previous section. To evaluate the new approach, the accuracy recognition rate (ARR) is used, which is the percentage of correctly recognized characteristics. We perform 10,000 simulations and calculate the mean ARR. To examine the performance of the proposed method when a single quality characteristic is out of control, we investigate case I; the results are shown in Figure 2. To compare the ARR of the proposed method across the three cases, we examine the instances where $k = 3$; the results are shown in Figure 3.
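The shifted mean vectors of cases I to III, and one possible per-replication reading of the ARR, can be sketched as follows. The helper names are hypothetical, and the ARR function reflects an assumed interpretation (the share of the $k$ characteristics classified correctly as shifted or not shifted), since the text does not give a formula.

```python
import math

def shifted_means(lam0, delta, n_shifted):
    """Shift the first n_shifted means by delta standard deviations (cases I-III)."""
    return [lam + delta * math.sqrt(lam) if j < n_shifted else lam
            for j, lam in enumerate(lam0)]

def recognition_rate(flagged, truly_shifted, k):
    """Assumed ARR for one replication: share of the k characteristics classified correctly."""
    flagged, truly_shifted = set(flagged), set(truly_shifted)
    correct = sum((j in flagged) == (j in truly_shifted) for j in range(k))
    return correct / k

# Case II with the in-control means of Equation (14) and delta = 1.0.
means = shifted_means([3.15, 14.00, 4.80], delta=1.0, n_shifted=2)
print([round(m, 3) for m in means])
# Flagging only the first variable when the first two have shifted.
print(recognition_rate(flagged=[0], truly_shifted=[0, 1], k=3))
```

Averaging such per-replication rates over the simulated data sets would give a mean ARR under this interpretation.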

Figure 2 illustrates how the ARR changes with the shift value for various sample sizes. As either the sample size or the shift value increases, the ARR rises toward 1, as expected. A comparison of the three simulated cases in Figure 3 further suggests that the more quality variables are out of control, the larger the sample size needed to maintain the same ARR. Thus, with a sufficient sample size, the proposed approach can accurately identify the sources of a process failure in a multivariate Poisson process. Similar results were observed in our comprehensive simulation studies.

4. A Demonstrative Case

A practical example concerning hepatitis C, discussed in Pascual and Akhundjanov [30], is analyzed to demonstrate the proposed method. Pascual and Akhundjanov [30] used monthly data on hepatitis C notifications from three Australian states and assumed that the counts of hepatitis C incidents follow a trivariate Poisson distribution. The three states are New South Wales, Victoria, and South Australia, denoted here by N, V, and S, respectively. Pascual and Akhundjanov [30] tracked disease activity with a multivariate control chart to quickly detect any upward trends in cases. Following Pascual and Akhundjanov [30], the in-control mean vector and correlation matrix are estimated as shown in (14) and (15). To demonstrate the proposed method, we consider the following out-of-control mean vector

$$(λ_{1}^{(1)}, λ_{2}^{(1)}, λ_{3}^{(1)}) = (1.5 \times λ_{1}^{(0)}, 1.1 \times λ_{2}^{(0)}, λ_{3}^{(0)}), \tag{16}$$

which is one of the out-of-control simulation scenarios examined by Pascual and Akhundjanov [30]. By construction, N and V are the states contributing to the nonconformity shifts. When the multivariate control chart triggers an out-of-control signal, the proposed method can be used to determine the quality variables causing the shifts. For ease of explanation, an illustrative data set is generated by assuming $n = 10$ and using the distribution $MVP_{k}(λ, Σ)$ with the mean vector $(λ_{1}^{(1)}, λ_{2}^{(1)}, λ_{3}^{(1)})$ defined in (16) and the correlation matrix $Ψ$ defined in (15). The simulated data set consists of the following ten observations: (3, 16, 5), (4, 17, 4), (3, 18, 6), (6, 18, 5), (11, 19, 10), (5, 23, 6), (2, 17, 5), (3, 12, 5), (6, 17, 3), and (7, 17, 8). To assess the appropriateness of the simulated data, the one-sample Kolmogorov–Smirnov test is applied to determine whether the three variables conform to the Poisson distribution. The results are given in Table 1.

As shown in Table 1, the p-values for all three variables are above 0.05, so the hypothesis that the marginal distributions of the simulated disease counts are Poisson is not rejected. In addition, the Fisher z-transformation test statistic

$$Z = \frac{\frac{1}{2} \ln \left(\frac{1 + r}{1 - r}\right) - \frac{1}{2} \ln \left(\frac{1 + ρ}{1 - ρ}\right)}{\sqrt{\frac{1}{n - 3}}} \tag{17}$$

is employed to test whether the correlation coefficients between each pair of variables in the simulated data agree with Equation (15). The results are shown in Table 2.
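As a hedged illustration of Equation (17), the sketch below evaluates the statistic for the N and V counts listed above against the hypothesized correlation 0.00 from Equation (15). The function name `fisher_z_test` is hypothetical, $r$ is taken to be the Pearson sample correlation, and the p-value is assumed to be two-sided; under these assumptions the statistic should come out close to the 0.98 reported for this pair in Table 2.

```python
import math
from statistics import NormalDist

def fisher_z_test(x, y, rho0):
    """Return (r, Z, p) for Equation (17); r is the Pearson sample correlation."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    # atanh(r) equals (1/2) ln((1 + r) / (1 - r)), the Fisher z-transformation.
    z = (math.atanh(r) - math.atanh(rho0)) * math.sqrt(n - 3)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))   # two-sided p-value (assumption)
    return r, z, p

n_counts = [3, 4, 3, 6, 11, 5, 2, 3, 6, 7]
v_counts = [16, 17, 18, 18, 19, 23, 17, 12, 17, 17]
r, z, p = fisher_z_test(n_counts, v_counts, rho0=0.00)
print(round(r, 3), round(z, 2), round(p, 2))
```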

According to Table 2, all p-values exceed 0.05, suggesting that the correlation coefficients are in agreement with Equation (15). Based on the results above, it may be concluded that the simulated disease counts conform to a multivariate Poisson distribution. Furthermore, the sample means and adjusted residuals are calculated and shown in Table 3.
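The sample means and adjusted residuals can be recomputed directly from the ten observations with Equations (5) and (7). The following minimal sketch does so; the function name `adjusted_residuals` is an illustrative assumption, and the in-control means are those of Equation (14).

```python
import math

observations = [(3, 16, 5), (4, 17, 4), (3, 18, 6), (6, 18, 5), (11, 19, 10),
                (5, 23, 6), (2, 17, 5), (3, 12, 5), (6, 17, 3), (7, 17, 8)]
lam0 = (3.15, 14.00, 4.80)   # in-control means for N, V, S

def adjusted_residuals(data, lam0):
    """Sample means (Equation (5)) and adjusted residuals (Equation (7))."""
    n = len(data)
    means = [sum(column) / n for column in zip(*data)]
    resid = [(m - lam) / math.sqrt(lam / n) for m, lam in zip(means, lam0)]
    return means, resid

means, resid = adjusted_residuals(observations, lam0)
print([round(m, 2) for m in means])   # sample means for N, V, S
print([round(r, 2) for r in resid])   # adjusted residuals for N, V, S
```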

With a significance level of 0.05 and the application of the method described earlier, it is possible to determine the contributors to nonconformity shifts. Table 4 provides a summary of this analysis.

According to Table 4, the test statistic in the first iteration is 3.30, which exceeds the critical value 2.39, so we reject the null hypothesis. Because N has the largest adjusted residual, we conclude that it contributes to the nonconformity shifts. We remove the first variable and set i to i + 1 and k to k − 1. In the second iteration, the test statistic 2.87 again exceeds 2.39, and V has the largest value. The null hypothesis is therefore rejected and state V is considered to be a contributor. After removing the second variable, we again set i to i + 1 and k to k − 1. In the third iteration, the test statistic is 1.30, which is below 2.24, so the null hypothesis cannot be rejected and the testing procedure ends. The analysis indicates that the remaining state S is not responsible for the nonconformity shift. This example shows how the proposed method can pinpoint the contributors to nonconformity shifts effectively and easily.

5. Conclusions

In the process industry, it is crucial to recognize the contributors to an out-of-control process as quickly and accurately as possible. Unlike most previous methods, which address multivariate normal processes, this study focuses on identifying the contributors to nonconformity shifts in multivariate Poisson processes. It proposes a new test for identifying outliers in a multivariate Poisson distribution and presents an iterative testing method for recognizing the sources of a process failure in a multivariate Poisson process. According to our numerical results, the proposed method is a straightforward and effective approach for recognizing the sources of a process failure in a multivariate Poisson process. Multivariate Poisson distributions are used widely in the social and natural sciences, so the introduced test procedure may apply to various other disciplines.

When more than one hypothesis test is conducted at once, the procedure is referred to as multiple testing. The probability of making one or more false positives across the tests is referred to as the family-wise error rate. To keep the overall family-wise error rate close to the desired significance level and to reduce the risk of false positives, this study employs the Bonferroni method and the error spending approach. However, the Bonferroni correction is not the only method for handling multiple tests. Alternative techniques, including the Benjamini–Hochberg method [35], the Holm–Bonferroni method [36], and the Šidák method [37], are also viable options. More studies are necessary to determine which method performs best.

A technique that approximates the critical value of the introduced test was created with the use of Boole’s inequality. In light of our numerical findings, the approximation works effectively under the typical nominal level. Although the approximation may be satisfactory for numerous applications, using a sharper inequality could enhance its performance. Further investigation of this possibility is needed. In addition, as various other types of multivariate processes exist, further research is needed to determine whether other multivariate processes can be tackled using the same approach.

Author Contributions

Conceptualization, C.-D.H. and R.-H.S.; methodology, C.-D.H. and R.-H.S.; software, C.-D.H. and R.-H.S.; validation, R.-H.S.; formal analysis, C.-D.H.; investigation, C.-D.H. and R.-H.S.; resources, C.-D.H. and R.-H.S.; data curation, C.-D.H. and R.-H.S.; writing—original draft preparation, C.-D.H. and R.-H.S.; writing—review and editing, C.-D.H. and R.-H.S.; project administration, C.-D.H. and R.-H.S.; visualization, C.-D.H. and R.-H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially supported by the National Science and Technology Council, Taiwan, under grant number MOST 112-2118-M-030-002 (C.-D.H.).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Low, C.; Hsu, C.M.; Yu, F.J. Analysis of variations in a multivariate process using neural networks. Int. J. Adv. Manuf. Technol. 2003, 22, 911–921.
2. Chen, L.H.; Wang, T.Y. Artificial neural networks to classify mean shifts from multivariate chart signals. Comput. Ind. Eng. 2004, 47, 195–205.
3. Hwarng, H.B.; Wang, Y. Shift detection and source identification in multivariate autocorrelated process. Int. J. Prod. Res. 2010, 48, 835–859.
4. Shao, Y.E.; Huang, H.Y.; Chen, Y.J. Determining the sources of variance shifts in a multivariate process using flexible discriminant analysis. ICIC Express Lett. 2010, 4, 1573–1578.
5. Shao, Y.E.; Lu, C.J.; Wang, Y.C. A hybrid ICA-SVM approach for determining the quality variables at fault in a multivariate process. Math. Probl. Eng. 2012, 2012, 284910.
6. Shao, Y.E.; Hou, C.D. Hybrid artificial neural networks modeling for faults identification of a stochastic multivariate process. Abstr. Appl. Anal. 2013, 2013, 386757.
7. Shao, Y.E.; Hou, C.D. Fault identification in industrial processes using an integrated approach of neural network and analysis of variance. Math. Probl. Eng. 2013, 2013, 516760.
8. Shao, Y.E. Recognition of process disturbances for an SPC/EPC stochastic system using support vector machine and artificial neural network approaches. Abstr. Appl. Anal. 2014, 2014, 519705.
9. Shao, Y.E. Using a computational intelligence hybrid approach to recognize the faults of variance shifts for a manufacturing process. J. Ind. Intell. Inf. 2016, 4, 131–135.
10. Shao, Y.E.; Lin, S.C. Using a time delay neural network approach to diagnose the out-of-control signals for a multivariate normal process with variance shifts. Mathematics 2019, 7, 959.
11. Sabahno, H.; Niaki, S.T.A. New Machine-Learning Control Charts for Simultaneous Monitoring of Multivariate Normal Process Parameters with Detection and Identification. Mathematics 2023, 11, 3566.
12. Runger, G.C.; Alt, F.B.; Montgomery, D.C. Contributors to a multivariate statistical process control signal. Commun. Stat.-Theory Methods 1996, 25, 2203–2213.
13. Mason, R.L.; Tracy, N.D.; Young, J.C. A practical approach for interpreting multivariate T² control chart signals. J. Qual. Technol. 1997, 29, 396–406.
14. Maravelakis, P.E.; Bersimis, S.; Panaretos, J.; Psarakis, S. Identifying the out of control variable in a multivariate control chart. Commun. Stat.-Theory Methods 2002, 31, 2391–2408.
15. Vives-Mestres, M.; Daunis-i-Estadella, J.; Martín-Fernández, J.A. Signal interpretation in Hotelling’s T² control chart for compositional data. IIE Trans. 2016, 48, 661–672.
16. Kim, J.; Jeong, M.K.; Elsayed, E.A.; Al-Khalifa, K.N.; Hamouda, A.M.S. An adaptive step-down procedure for fault variable identification. Int. J. Prod. Res. 2016, 54, 3187–3200.
17. Pina-Monarrez, M. Generalization of the Hotelling’s T² decomposition method to the R-chart. Int. J. Ind. Eng.-Theory Appl. Pract. 2018, 25, 200–214.
18. Güler, Z.O.; Bakır, M.A. Detection and identification of mean shift using independent component analysis in multivariate processes. J. Stat. Comput. Simul. 2021, 92, 1920–1940.
19. Haq, A.; Khoo, M.B.C. An adaptive multivariate EWMA mean chart with variable sample sizes and/or variable sampling intervals. Qual. Reliab. Eng. Int. 2022, 38, 3322–3341.
20. Jing, H.; Li, J.; Bai, K. Directional monitoring and diagnosis for covariance matrices. J. Appl. Stat. 2022, 49, 1449–1464.
21. Lu, X.S.; Xie, M.; Goh, T.N.; Lai, C.D. Control chart for multivariate attribute processes. Int. J. Prod. Res. 1998, 36, 3477–3489.
22. Taleb, H. Control charts applications for multivariate attribute processes. Comput. Ind. Eng. 2009, 56, 399–410.
23. Topalidou, E.; Psarakis, S. Review of multinomial and multiattribute quality control charts. Qual. Reliab. Eng. Int. 2009, 25, 773–804.
24. Chiu, J.E.; Kuo, T.I. Control charts for fraction nonconforming in a bivariate binomial process. J. Appl. Stat. 2010, 37, 1717–1728.
25. Yang, S.F.; Yeh, J.T. Using cause selecting control charts to monitor dependent process stages with attributes data. Expert Syst. Appl. 2011, 38, 667–672.
26. Li, J.; Tsung, F.; Zou, C. Directional control schemes for multivariate categorical processes. J. Qual. Technol. 2012, 44, 136–154.
27. Niaki, S.T.A.; Jahani, P. The economic design of multivariate binomial EWMA VSSI control charts. J. Appl. Stat. 2013, 40, 1301–1318.
28. Li, J.; Tsung, F.; Zou, C. Multivariate binomial/multinomial control chart. IIE Trans. 2014, 46, 526–542.
29. Niaki, S.T.A.; Khedmati, M. Step change-point estimation of multivariate binomial processes. Int. J. Qual. Reliab. Manag. 2014, 31, 566–587.
30. Pascual, F.G.; Akhundjanov, S.B. Copula-based control charts for monitoring multivariate Poisson processes with application to hepatitis C counts. J. Qual. Technol. 2020, 52, 128–144.
31. Chiu, J.E.; Kuo, T.I. Attribute control chart for multivariate Poisson distribution. Commun. Stat.-Theory Methods 2008, 37, 146–158.
32. Suri, N.N.R.R.; Murty, M.N.; Athithan, G. Outlier Detection: Techniques and Applications. A Data Mining Perspective; Springer Nature: Cham, Switzerland, 2019.
33. Krummenauer, F. Efficient simulation of multivariate binomial and Poisson distributions. Biom. J. 1998, 40, 823–832.
34. Hou, C.D.; Chiang, J.; Tai, J.J. Identifying chromosomal fragile sites from a hierarchical-clustering point of view. Biometrics 2001, 57, 435–440.
35. Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: A practical and powerful approach to multiple hypothesis testing. J. R. Stat. Soc. Ser. B-Stat. Methodol. 1995, 57, 289–300.
36. Holm, S. A simple sequentially rejective multiple test procedure. Scand. J. Stat. 1979, 6, 65–70.
37. Sidák, Z.K. Rectangular confidence regions for the means of multivariate normal distributions. J. Am. Stat. Assoc. 1967, 62, 626–633.

Figure 1. The relationship between $ρ$ and deviation, with each $λ$ and (k, n) shown separately.

Figure 2. The relationship between $δ$ and the ARR under case I for various sample sizes with k = 2 and k = 3.

Figure 3. The relationship between $δ$ and the ARRs under different cases for various sample sizes with k = 3.

Table 1. The results of the one-sample Kolmogorov–Smirnov test for checking whether the variables conform to a Poisson distribution.

State	N	V	S
Test statistic	0.43	0.75	0.46
p-value	0.99	0.63	0.98

Table 2. The results of the Fisher z-transformation test for testing whether the correlation coefficients between each pair of variables follow Equation (15).

Correlation Coefficient	$ρ_{N, V}$	$ρ_{N, S}$	$ρ_{V, S}$
Test statistic	0.98	0.94	0.36
p-value	0.33	0.36	0.72

Table 3. The summary statistics of the simulated disease counts.

State	N	V	S
Sample mean	5.00	17.40	5.70
Adjusted residual	3.30	2.87	1.30

Table 4. Demonstration of the proposed test procedure ($α = 0.05$).

Iteration $i$	k	$R = (R_{1}, \dots, R_{k})$	Test Statistic $\max_{j} R_{j}$	Critical Value $Z_{\frac{α^{*}}{k}}$	Summary
1	3	(3.30, 2.87, 1.30)	3.30	2.39	N is the contributor
2	2	(2.87, 1.30)	2.87	2.39	V is the contributor
3	1	(1.30)	1.30	2.24	S is not the contributor

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
