Article

Deep Learning-Based Residual Control Chart for Binary Response

Jong Min Kim 1 and Il Do Ha 2,*

1 Division of Science and Mathematics, University of Minnesota-Morris, Morris, MN 56267, USA
2 Department of Statistics, Pukyong National University, Busan 48513, Korea
* Author to whom correspondence should be addressed.
Symmetry 2021, 13(8), 1389; https://doi.org/10.3390/sym13081389
Submission received: 7 June 2021 / Revised: 20 July 2021 / Accepted: 29 July 2021 / Published: 31 July 2021
(This article belongs to the Special Issue New Advances and Applications in Statistical Quality Control)

Abstract

A residual (r) control chart for an asymmetric, non-normal binary response variable with highly correlated explanatory variables is proposed in this research. To avoid multicollinearity among multiple explanatory variables, we employ and compare a neural network regression model and a deep learning regression model using Bayesian variable selection (BVS), principal component analysis (PCA), nonlinear PCA (NLPCA), or the whole set of explanatory variables. The advantage of our r control chart is that it can process both non-normal responses and correlated multivariate explanatory variables by employing a neural network model and a deep learning model. We show that the deep learning r control chart is relatively efficient for monitoring simulated and real binary response asymmetric data compared with the r control charts based on the generalized linear model (GLM) with probit and logit link functions and on the neural network regression model.

1. Introduction

The COVID-19 pandemic, which started at the end of 2019, has dramatically changed social life: people fight the spread of COVID-19 by covering their faces with masks and by social distancing. Contact-free human activity on artificial intelligence (AI) platforms has become far more common in our society since the pandemic began, and deep learning and machine learning methods for AI have consequently been developed at an exponential pace. However, applying deep learning methods to the quality control area has not yet been considered in depth, even though AI-based products such as smart glasses with optical head-mounted displays are rapidly entering our society.
Quality improvement is an endless objective in the manufacturing industries. To improve product quality, researchers try to reduce process variation by using statistical process control (SPC), which has long been the essential statistical methodology for this objective and for monitoring industrial processes in the quality control community. Walter A. Shewhart developed the control chart in 1924; the Shewhart control chart is a graphical display of a quality characteristic used for monitoring a process. Shewhart's key creative idea was to view the variability of a production process from a statistical standpoint and to decompose process variation into common and special causes. Many variants of SPC have been developed since then; diverse control charts can be found in [1,2]. So many SPC methods are now available that choosing the appropriate one for symmetric or asymmetric data has become a prime research question among quality control professionals in the manufacturing industries. Modern processes produce big, highly correlated datasets whose distributions are often asymmetric and non-normal. Such data are difficult for quality control researchers to handle because the currently available control charts cannot accommodate asymmetric data, so it is common to obtain inaccurate quality control information from them. Numerous multivariate control charts, such as Hotelling's T² chart [3], the multivariate CUSUM [4] and the multivariate EWMA [5], have been proposed to monitor a process mean vector, but these charts have difficulty handling non-normal and asymmetric data because of the problem of estimating the unknown covariance structure. Neural network-based approaches to quality control research have also become popular; recently, [6] proposed r control charts for a binary asymmetric response variable with highly correlated multivariate covariates using a single hidden layer neural network regression model.
In this research, we extend the single hidden layer neural network regression-based r control charts for binary asymmetric data to a deep learning regression model with multiple hidden layers via Bayesian variable selection (BVS), principal component analysis (PCA) and nonlinear PCA (NLPCA), so that our r control chart can overcome the multicollinearity problem among independent variables. Reference [7] similarly proposed Poisson, negative binomial and COM-Poisson principal component regression-based r control charts for monitoring dispersed count data while avoiding the multicollinearity problem.
Our research shows that our deep learning r control chart is more efficient than the current methods [6] while overcoming the multicollinearity issue of high-dimensional correlated multivariate data. Our deep learning r control chart is evaluated with simulated data and with the Cleveland heart disease real data found in the UCI machine learning repository.

2. Statistical Methods

This research presents deep learning regression-based r control charts for binary asymmetric data with multicollinearity among the independent variables. We compare deep learning regression-based r control charts fitted to the whole data (i.e., without applying any of BVS, PCA or NLPCA) with those fitted to dimension-reduced data obtained by applying one of BVS, PCA or NLPCA to the whole data. In addition, we compare our proposed control chart with the binary response regression models (GLM with logit and probit links, and the neural network regression model) proposed by [6].

2.1. Bayesian Variable Selection and Dimension Reduction by Principal Component Analysis

Before we apply the proposed control chart to a multivariate dataset, we employ the Bayesian variable selection and PCA methods to avoid the multicollinearity issue of the multivariate dataset. First, we introduce the objective Bayesian variable selection in linear models proposed in [8]. We used the GibbsBvs function with the gZellner prior in the BayesVarSel R package [9], running 10,000 iterations with a burn-in of 100, to apply the Bayesian variable selection method to the simulated data and to the real data, the Cleveland heart disease data [10].
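Below is a minimal R sketch of this BVS step, assuming a data frame dat whose first column is the binary response y and whose remaining columns are the covariates; the 0.5 inclusion-probability cutoff is our illustrative choice, not a setting stated in the paper.

    # Bayesian variable selection via Gibbs sampling (BayesVarSel);
    # the paper applies this linear-model BVS to the binary response
    library(BayesVarSel)
    bvs <- GibbsBvs(formula = y ~ ., data = dat,
                    prior.betas = "gZellner",
                    n.iter = 10000, n.burnin = 100)
    round(bvs$inclprob, 3)                               # posterior inclusion probabilities
    selected <- names(bvs$inclprob)[bvs$inclprob > 0.5]  # hypothetical cutoff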
In this paper, the r control chart for the binary response regression model with the important variables selected by the BVS is a newly proposed SPC method which monitors the binary response variable. The BVS method will be applied to the GLM with probit, the GLM with logit, and the neural network and deep learning regression models with simulated and real data. PCA is a statistical dimension reduction method that converts a multivariate dataset of correlated variables into a set of values of linearly uncorrelated variables, called principal components, which account for the variation of the original data.
References [6,7,11] considered the PCA method for SPC with multivariate highly correlated data. Reference [12] proposed nonlinear principal component analysis (NLPCA) as a kernel eigenvalue problem. Unlike linear PCA, the nonlinear kernel PCA (NLPCA) method performs a nonlinear form of principal component analysis. To extract five principal components in high-dimensional feature spaces using kernel PCA, we used the 'kernlab' R package [13], which provides the most popular kernel functions. We used the Gaussian radial basis kernel function with hyperparameter sigma = 0.2, the inverse kernel width of the radial basis kernel.
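A minimal R sketch of this kernel PCA step, assuming the input variables are held in a data frame X:

    # nonlinear (kernel) PCA with the Gaussian RBF kernel (kernlab)
    library(kernlab)
    kp <- kpca(~ ., data = X,
               kernel = "rbfdot",          # Gaussian radial basis kernel
               kpar = list(sigma = 0.2),   # inverse kernel width, as in the text
               features = 5)               # extract five principal components
    nlpcs <- rotated(kp)                   # data projected onto the kernel PCs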
The r control chart for the binary response regression model with primary principal components by PCA was introduced in [6]. However, in this paper, the r control chart for the binary response regression model with primary principal components by NLPCA is a new statistical process control method which monitors the binary response variable as a function of uncorrelated PCs, overcoming the multicollinearity issue among the independent variables. The new method will be applied to the GLM with probit, the GLM with logit, and the neural network and deep learning regression models with simulated and real data.

2.2. Generalized Linear Model and Neural Network Model for Binary Response Data

The r control charts for binary response regression models, such as the GLM with logit and probit links and the neural network regression model, were proposed by [6]. The GLM assumes the following probability density function from the exponential family:
f(y \mid \lambda, \delta) = \exp\left\{ \frac{y\lambda - a_2(\lambda)}{a_1(\delta)} + a_3(y, \delta) \right\},
where y denotes the response variable, λ the location parameter, δ the dispersion parameter, and a_1(·), a_2(·) and a_3(·) arbitrary functions. In particular, a_1(δ) is commonly of the form a_1(δ) = δ or a_1(δ) = δ/w with a known weight w, a_2(λ) is a cumulant function of λ, and a_3(y, δ) is a function of y and δ; for various forms of the three functions, we recommend Section 2.2.2 in [14].
We denote by ζ = x⊤b the linear predictor for the response y, so that ζ is a linear combination of the unknown parameters b = (b_0, b_1, …, b_p)⊤ and the input variables x = (1, x_1, …, x_p)⊤. A link function g, such that E(y) = g^{−1}(ζ), provides the relationship between the linear predictor and the mean of the distribution function. The link function g(·) specifies how the expected value μ = E(y) is converted to the linear predictor ζ, i.e.,

\zeta = g(\mu) = \mathbf{x}^{\top} \mathbf{b}. (1)
As an example, when the response variable y follows a Bernoulli distribution with success probability p, we have p = E(y) = μ. If in (1) we take the logit link function g(p) = Logit(p) = log{p/(1 − p)}, where p = P(y = 1 | x), the logit (or logistic) model is given by

\mathrm{Logit}[P(y = 1 \mid \mathbf{x})] = \mathbf{x}^{\top} \mathbf{b}, (2)
where b is the column vector of the fixed-effects regression coefficients. Equation (2) can be rewritten as

P(y = 1 \mid \mathbf{x}) = \frac{1}{1 + e^{-\mathbf{x}^{\top} \mathbf{b}}} = p(\mathbf{x}). (3)
The response probability distribution of a GLM belongs to the exponential family of distributions, which allows methods analogous to those of normal linear models for normal data [15,16]. Therefore, for asymmetrically (non-normally) distributed data, the GLM with a probit link function may not be the best model. This motivated [6] to propose a neural network model-based r control chart for better predictive accuracy with non-normal data.
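For reference, fitting these two GLMs and extracting their deviance residuals in R is straightforward; this sketch assumes a data frame dat with a binary response y:

    # GLMs with logit and probit links, plus their deviance residuals
    fit_logit  <- glm(y ~ ., data = dat, family = binomial(link = "logit"))
    fit_probit <- glm(y ~ ., data = dat, family = binomial(link = "probit"))
    r_logit  <- residuals(fit_logit,  type = "deviance")
    r_probit <- residuals(fit_probit, type = "deviance")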
Artificial neural networks (ANNs) are computational models inspired by biological neural networks that imitate human brain activity through computer simulation [17,18,19]. An ANN uses weights on the connections between neurons to determine how strongly each neuron excites or inhibits the neurons it feeds [17,18,19]. The basic form of an ANN is a single layer feed-forward network of connections among neurons. ANNs have an input layer and one or more hidden layers; the last hidden layer is connected to the output layer, which produces the outputs. Reference [20] proposed pattern recognition for bivariate process mean shifts using a feature-based ANN, and [21] proposed control chart pattern recognition using radial basis function (RBF) neural networks. Recently, reference [22] proposed statistical process control with intelligence based on a deep learning model and reviewed neural network-based statistical process control. In this paper, we used the 'nnet' R package [23] for feed-forward neural networks with a single hidden layer; for the deep learning model, we used the 'deepnet' R package [24], which trains feed-forward neural networks with the backpropagation (BP) algorithm, obtaining predictions through the 'nn.predict' command.
Following the setup of [6], the r control charts for binary response data based on the GLM models with logit and probit link functions and on neural network models employ deviance residuals, which are independent and asymptotically normally distributed with zero mean and unit variance, i.e., r_i ~ N(0, 1) for i = 1, …, n. In this research, we chose the deviance residual for the GLM models with logit and probit link functions, the neural network model and the deep learning model because the corresponding R packages provide commands for producing it, which makes it easy to compare the residuals across all the models: the GLM-based models, the single hidden layer neural network model and the multiple hidden layer deep learning model. Reference [25] proposed Shewhart control limits for the deviance residuals as follows:
E(r_i) \pm k\sqrt{\mathrm{Var}(r_i)} = \pm k, (4)
where k is determined by the false alarm probability α = 1/ARL_0, and ARL_0 is the average run length (ARL) when the process is in control. The ARL is a measure of the performance of control charts for monitoring a process.
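Since the deviance residuals are approximately N(0, 1), k can be backed out from a target in-control ARL; a minimal sketch with an illustrative ARL_0 of 370.4 (our choice, not a value fixed by the paper):

    ARL0  <- 370.4             # target in-control average run length
    alpha <- 1 / ARL0          # false alarm probability
    k <- qnorm(1 - alpha / 2)  # two-sided N(0,1) critical value, k is about 3
    c(LCL = -k, UCL = k)       # control limits for the deviance residuals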

2.3. New Binary Response Statistical Process Control Procedure

Our r control chart for a binary response uses the following statistical process control procedure for the deviance residuals (a minimal R sketch of the deep learning branch follows the list):
  • Apply the BVS, PCA or NLPCA to the input variables X and obtain the important selected variables or principal components.
  • Fit the binary response regression model by using the binary response variable y and the important selected variables or the principal components through the probit link function, the logit link function, and the neural network or deep learning regression models, respectively.
  • Obtain the deviance residuals from each model.
  • Set the k value and obtain the lower and upper control limits of the r chart using (4).
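A minimal end-to-end R sketch of the deep learning branch of this procedure, assuming an input matrix X and a binary response vector y, PCA with five components as the reduction step, and the two-hidden-layer (15, 15) architecture used later in Section 3:

    library(deepnet)
    pcs  <- prcomp(X, scale. = TRUE)$x[, 1:5]            # step 1: principal components
    nn   <- nn.train(x = pcs, y = y, hidden = c(15, 15)) # step 2: deep learning fit
    phat <- nn.predict(nn, pcs)                          # fitted probabilities
    phat <- pmin(pmax(phat, 1e-6), 1 - 1e-6)             # guard against log(0)
    # step 3: deviance residuals for a Bernoulli response
    r <- sign(y - phat) * sqrt(-2 * (y * log(phat) + (1 - y) * log(1 - phat)))
    # step 4: control limits and flagged points, here with k = 3
    k <- 3
    which(r < -k | r > k)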

3. Illustrated Examples

Using the methods proposed in Section 2, we compare the efficiency of the proposed approaches on simulated data and real data.

3.1. Simulation Study

To compare the r control charts for the binary response regression models introduced in Section 2, we need highly correlated and non-normal simulated data, which we denote by X as the input variables.
Because they relax the assumptions of normality, linearity and independence, copulas have been popular in biostatistics, econometrics, finance and statistics over the last three decades. A copula is a statistical tool for describing the dependence structure of multivariate data. Using a copula, we can separate the marginal behavior of each random variable from the joint dependence of the random variables. Every joint distribution can be expressed as F_XY(x, y) = C(F_X(x), F_Y(y)) = C(u, v), where u = F_X(x) and v = F_Y(y) are the marginal distributions.
A bivariate copula is a function C : [0, 1]² → [0, 1], whose domain is the entire unit square, with the following three properties:
(i) C(u, 0) = C(0, v) = 0 for all u, v ∈ [0, 1];
(ii) C(u, 1) = C(1, u) = u for all u ∈ [0, 1];
(iii) C(u_1, v_1) − C(u_1, v_2) − C(u_2, v_1) + C(u_2, v_2) ≥ 0 for all u_1, u_2, v_1, v_2 ∈ [0, 1] such that u_1 ≤ u_2 and v_1 ≤ v_2, where u_i = F_{X_i}(x_i) and v_i = F_{Y_i}(y_i), i = 1, 2.
See [26,27] for detailed definitions of the copula. To construct a highly correlated dependence structure for the input variables, we employed two Archimedean copula functions: the Clayton copula with dependence parameter equal to 3 and dimension equal to 30, and the Gumbel copula with dependence parameter equal to 30 and dimension equal to 30. We chose the Clayton and Gumbel copulas for generating the simulated data because both are asymmetric Archimedean copulas: the Clayton copula exhibits greater dependence in the negative (lower) tail than in the positive, whereas the Gumbel copula exhibits greater dependence in the positive (upper) tail than in the negative.
We generate a random sample of 1000 observations from each copula and assign it to X as the input variables. With each simulated random sample X, we set the coefficients β_0 = 0.17186, β_1 = 0.47606, β_2 = 0.15747, β_3 = 0.96697, β_4 = 0.42469, β_5 = 0.90749, β_6 = 0.33934, β_7 = 1.10366, β_8 = 0.33125, β_9 = 0.10993, β_10 = 0.81525, β_11 = 0.16434, β_12 = 0.13478, β_13 = 0.56175, β_14 = 0.53524, β_15 = 0.80218, β_16 = 0.17433, β_17 = 0.32668, β_18 = 0.47729, β_19 = 0.67109, β_20 = 0.37587, β_21 = 0.06678, β_22 = 0.38838, β_23 = 0.43394, β_24 = 0.04385, β_25 = 0.22217, β_26 = 0.78461, β_27 = 0.32422, β_28 = 0.10109, β_29 = 0.23819 and β_30 = 0.65768, so that P(y = 1) = 1 / (1 + exp(−β_0 − Σ_{i=1}^{30} β_i x_i)), which passes the linear predictor through an inverse logit function. Then we generate the response variable y randomly from the Bernoulli distribution with probability P(y = 1) and sample size 1000.
For the one ('1') inflated case of the binary response data, we added 0.1 to the probability P(y = 1), i.e., P(y = 1) + 0.1; for the zero ('0') inflated case, we subtracted 0.1 from the probability, i.e., P(y = 1) − 0.1. P(y = 1) itself is used for the in-control case.
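The generation of X and y can be sketched in R as follows, assuming the 'copula' package (the paper does not name its copula implementation) and illustrative stand-in coefficients in place of the 30 values listed above:

    library(copula)
    set.seed(1)
    cc <- claytonCopula(param = 3, dim = 30)   # Gumbel case: gumbelCopula(30, dim = 30)
    X  <- rCopula(1000, cc)                    # 1000 x 30 matrix of input variables
    beta0 <- 0.17186
    beta  <- runif(30, 0, 1.2)                 # stand-ins for beta_1, ..., beta_30
    p1 <- 1 / (1 + exp(-(beta0 + X %*% beta))) # inverse logit, P(y = 1)
    y_in   <- rbinom(1000, 1, p1)                 # in-control response
    y_one  <- rbinom(1000, 1, pmin(p1 + 0.1, 1))  # one-inflated case
    y_zero <- rbinom(1000, 1, pmax(p1 - 0.1, 0))  # zero-inflated case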
In each setup, we perform 1000 replications with a sample size of 1000. Table 1 shows the simulation results. Within the simulated data, 70% of the observations were assigned to the training set and 30% to the test set.
We apply the BVS, PCA and NLPCA to the input variables X and then fit the binary response regression model by using the binary response variable y and the important selected variables, the principal components, or the whole data through the probit link function, the logit link function, and the neural network or deep learning regression models, respectively. We used the 'nnet' R package [23] for feed-forward neural networks with a single hidden layer of 30 neurons, obtaining predictions with the 'predict' command; for the deep learning model, we used the 'deepnet' R package [24] with the backpropagation (BP) algorithm to train feed-forward neural networks with two hidden layers of (15, 15) neurons, obtaining predictions with the 'nn.predict' command.
\mathrm{Root\ MSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \hat{x}_i)^2}{n}}, (5)

where Root MSE is the root mean squared error, n is the number of observations, x_i is the ith actual observation, and \hat{x}_i is the predicted value of x_i.
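A one-line R helper matching Formula (5), with a hypothetical usage on fitted probabilities phat against the observed response y:

    rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
    rmse(y, phat)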
Using the Root MSE Formula (5), we computed the Root MSE of each simulated in-control dataset of sample size 1000 with 1000 repetitions in Table 1. Strikingly, the r charts based on the deep learning models with BVS, PCA, NLPCA and the whole data, for both the Clayton and the Gumbel copulas, are superior to all other cases in Table 1 in terms of the accuracy and precision measured by the mean, median and interquartile range (IQR).
From Figure 1 and Figure 2, for the in-control, over-dispersion and under-dispersion cases, we can observe that the residuals of the deep learning regression models with BVS, PCA, NLPCA and the whole data, for both the Clayton and Gumbel copulas, are superior to those of the neural network regression models with BVS, PCA, NLPCA and the whole data in Table 1 in terms of the precision measured by the spread (IQR).
For the three cases of in-control, over-dispersion and under-dispersion in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, we apply the BVS, PCA and NLPCA to the input variables X and then fit the binary response regression model by using the binary response variable y and the important selected variables or the principal components through the neural network or deep learning regression models, respectively. Using the deviance residuals from each model and (4) with k = 1, 2, 3, we compute the lower control limit (LCL) and upper control limit (UCL) for the process. The expected length of the confidence interval is computed as the average length of the control limits. The coverage probability is the proportion of the deviance residuals contained within the control limits. The lower and upper control limit values for the r chart are calculated as the center line minus and plus one, two and three standard deviations of the residuals.
Mainly, we compare the results for the deep learning regression model and the neural network regression model based on BVS, PCA, NLPCA and the whole data, because [6] showed that the neural network regression model (Nnet) outperformed the GLMs with probit and logit link functions based on PCA. We found that, for the in-control, one-inflated and zero-inflated cases, the expected lengths of the confidence interval for the deep learning regression model (DL) based on BVS, PCA, NLPCA and the whole data are shorter than in all the corresponding Nnet cases in Table 2, Table 3, Table 4 and Table 5, while in terms of the coverage probability the DL remains overall higher than the Nnet.
In terms of the ARLs, the coverage probability and the expected length of the confidence interval, we note that the r chart based on the DL with the whole data is about the same as the r charts based on the DL with BVS, PCA and NLPCA.

3.2. Real Data Analysis

For the real data application, we used the Wisconsin breast cancer data in the R package 'mlbench' [10]. The objective of collecting the data was to identify benign and malignant classes. Samples arrived periodically as Dr. Wolberg reported his clinical cases, so the database reflects this chronological grouping of the data; this grouping information has been removed from the data itself. Each variable except for the first was converted into 11 primitive numerical attributes with values ranging from 0 through 10, and there are 16 missing attribute values. The data frame contains 11 variables: (1) Id (sample code number), (2) Cl.thickness (clump thickness), (3) Cell.size (uniformity of cell size), (4) Cell.shape (uniformity of cell shape), (5) Marg.adhesion (marginal adhesion), (6) Epith.c.size (single epithelial cell size), (7) Bare.nuclei (bare nuclei), (8) Bl.cromatin (bland chromatin), (9) Normal.nucleoli (normal nucleoli), (10) Mitoses and (11) Class; one is a character variable, nine are ordered or nominal, and one is the target class.
Using the R package 'missForest' [28], we imputed the missing values in the Wisconsin breast cancer data. We set the target variable y to be Class ('malignant' = 0, 'benign' = 1) and used the nine input variables, excluding the Id and Class variables. We again used the 'nnet' R package [23] with a single hidden layer of 30 neurons via the 'predict' command and, for the deep learning model, the 'deepnet' R package [24] with two hidden layers of (15, 15) neurons via the 'nn.predict' command on the Wisconsin breast cancer data. Using the Root MSE Formula (5), we computed the Root MSE of each random sample of size 478 (≈ 683 × 0.7) out of the 683 total observations, with 1000 repetitions, in Table 6. The results confirm that the r charts based on the DL models with BVS, PCA, NLPCA and the whole data are superior to all the Nnet cases in Table 6 in terms of the accuracy and precision measured by the mean and interquartile range (IQR).
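A minimal R sketch of this data preparation, assuming current versions of the packages (in which BreastCancer has 699 rows and 16 missing values in Bare.nuclei):

    library(mlbench)
    library(missForest)
    data(BreastCancer)
    bc <- BreastCancer
    bc$Id <- NULL                                       # drop the identifier
    x <- data.frame(lapply(bc[, 1:9],
                           function(v) as.numeric(as.character(v))))
    x <- missForest(x)$ximp                             # impute the missing values
    y <- ifelse(bc$Class == "benign", 1, 0)             # malignant = 0, benign = 1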
The expected lengths of the confidence interval for the DL based on BVS, PCA, NLPCA and the whole real data are shorter than in all the Nnet cases in Table 7, while in terms of the coverage probability the DL remains overall higher than the Nnet. In terms of the ARLs, the coverage probability and the expected length of the confidence interval, the r chart based on the DL with the whole real data is about the same as the r charts based on the DL with BVS, PCA and NLPCA.
Therefore, from the simulation study and the real data analysis, we confirmed that the DL-based r control charts for binary response data with BVS, PCA, NLPCA and the whole data are superior to the Nnet-based r control charts with BVS, PCA, NLPCA and the whole data in terms of the accuracy, precision, coverage probability and expected length of the confidence interval.

4. Conclusions

In this research, we have presented binary response DL regression model-based statistical process control r charts for dispersed binary asymmetric data with multicollinearity among the input variables. We have demonstrated the flexibility and performance of the proposed DL method by running simulations under various circumstances: in-control, one-inflated and zero-inflated dispersion data. With both simulated data and real data, our proposed DL methods based on BVS, PCA, NLPCA and the whole data have shown superior performance compared with the binary response regression model-based statistical control r charts using the GLM with probit and logit link functions and the Nnet based on BVS, PCA, NLPCA and the whole data. We also showed that the binary response DL regression model-based r charts do not need dimension reduction methods such as BVS, PCA and NLPCA, because the results with the dimension reduction methods are essentially the same as those without them. Our proposed deep learning approach is superior in handling dispersed binary asymmetric data with multicollinearity among the explanatory variables. We conclude that, for high-dimensional correlated multivariate covariate data, the DL-based binary control chart is a good statistical process control method. Our proposed binary control chart by DL can be applied to improve the quality control of visual fault detection in medical equipment devices such as a full-body X-ray scanner, a brain functional magnetic resonance imaging scanner or a computed tomography (CT) scanner for detecting cancers. Our future research will develop a general version of DL-based SPC for categorical, continuous or mixed categorical and continuous data. We will also apply our proposed method to multi-stage SPC for binary outcome variables given the covariates.

Author Contributions

J.M.K. designed the model, analyzed the data and wrote the paper. I.D.H. formulated the conceptual framework, designed the model, obtained inference, and wrote the paper. Both authors cooperated to revise the paper. Both authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2020R1F1A1A01056987).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Montgomery, D.C. Statistical Quality Control, 7th ed.; John Wiley and Sons: New York, NY, USA, 2012.
  2. Qiu, P. Introduction to Statistical Process Control, 1st ed.; Chapman & Hall/CRC Texts in Statistical Science: Boca Raton, FL, USA, 2013.
  3. Hotelling, H. Multivariate Quality Control; McGraw-Hill: New York, NY, USA, 1947.
  4. Crosier, R.B. Multivariate generalizations of cumulative sum quality control schemes. Technometrics 1988, 30, 291–303.
  5. Lowry, C.A.; Woodall, W.H.; Champ, C.W.; Rigdon, S.E. Multivariate exponentially weighted moving average control chart. Technometrics 1992, 34, 46–53.
  6. Kim, J.-M.; Wang, N.; Liu, Y.; Park, K. Residual Control Chart for Binary Response with Multicollinearity Covariates by Neural Network Model. Symmetry 2020, 12, 381.
  7. Park, K.; Kim, J.-M.; Jung, D. GLM-based statistical control r-charts for dispersed count data with multicollinearity between input variables. Qual. Reliab. Eng. Int. 2018, 34, 1103–1109.
  8. Garcia-Donato, G.; Forte, A. Bayesian Testing, Variable Selection and Model Averaging in Linear Models using R with BayesVarSel. R J. 2018, 10, 329.
  9. Forte, A. Bayes Factors, Model Choice and Variable Selection in Linear Models; R package BayesVarSel; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  10. Leisch, F.; Dimitriadou, E. Machine Learning Benchmark Problems; R package mlbench; R Foundation for Statistical Computing: Vienna, Austria, 2015.
  11. Kim, J.-M.; Liu, Y.; Wang, N. Multi-stage change point detection with copula conditional distribution with PCA and functional PCA. Mathematics 2020, 8, 1777.
  12. Schölkopf, B.; Smola, A.; Müller, K.-R. Nonlinear Component Analysis as a Kernel Eigenvalue Problem. Neural Comput. 1998, 10, 1299–1319.
  13. Karatzoglou, A.; Smola, A.; Hornik, K.; National ICT Australia; Maniscalco, M.A.; Teo, C.H. Kernel-Based Machine Learning Lab; R package kernlab; R Foundation for Statistical Computing: Vienna, Austria, 2019.
  14. McCullagh, P.; Nelder, J.A. Generalized Linear Models; Chapman and Hall: New York, NY, USA, 1989.
  15. Myers, R.H.; Montgomery, D.C.; Vining, G.G. Generalized Linear Models, with Applications in Engineering and the Sciences; John Wiley and Sons: New York, NY, USA, 2002.
  16. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. Ser. A 1972, 135, 370–384.
  17. Agatonovic-Kustrin, S.; Beresford, R. Basic concepts of artificial neural network (ANN) modeling and its application in pharmaceutical research. J. Pharm. Biomed. Anal. 2000, 22, 717–727.
  18. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436.
  19. Hassabis, D.; Kumaran, D.; Summerfield, C.; Botvinick, M. Neuroscience-inspired artificial intelligence. Neuron 2017, 95, 245–258.
  20. Masood, I.; Hassan, A. Pattern Recognition for Bivariate Process Mean Shifts Using Feature-Based Artificial Neural Network. Int. J. Adv. Manuf. Technol. 2013, 66, 1201–1218.
  21. Addeh, A.; Khormali, A.; Golilarz, N.A. Control Chart Pattern Recognition Using RBF Neural Network with New Training Algorithm and Practical Features. ISA Trans. 2018, 79, 202–216.
  22. Zan, T.; Liu, Z.; Su, Z.; Wang, M.; Gao, X.; Chen, D. Statistical Process Control with Intelligence Based on the Deep Learning Model. Appl. Sci. 2020, 10, 308.
  23. Ripley, B.; Venables, W. Feed-Forward Neural Networks and Multinomial Log-Linear Models; R package nnet; R Foundation for Statistical Computing: Vienna, Austria, 2016.
  24. Rong, X. Deep Learning Toolkit in R; R package deepnet; R Foundation for Statistical Computing: Vienna, Austria, 2015.
  25. Skinner, K.R.; Montgomery, D.C.; Runger, G.C. Process monitoring for multiple count data using generalized linear model-based control charts. Int. J. Prod. Res. 2003, 41, 1167–1180.
  26. Nelsen, R.B. An Introduction to Copulas, 2nd ed.; Springer: New York, NY, USA, 2006.
  27. Kim, J.-M. A Review of Copula Methods for Measuring Uncertainty in Finance and Economics. Quant. Bio-Sci. 2020, 39, 81–90.
  28. Stekhoven, D.J. Nonparametric Missing Value Imputation Using Random Forest; R package missForest; R Foundation for Statistical Computing: Vienna, Austria, 2016.
Figure 1. Violin plots of RMSE with Clayton Copula simulated data.
Figure 2. Violin plots of RMSE with Gumbel Copula simulated data.
Table 1. Root MSE of simulated in-control data with 1000 repetitions.

Clayton Copula
Data        Model   Q1      Median  Mean    Q3      IQR
Whole Data  Nnet    0.7898  0.824   0.827   0.858   0.068
BVS         Nnet    0.4949  0.499   0.500   0.504   0.009
PCA         Nnet    0.5348  0.544   0.546   0.555   0.020
NLPCA       Nnet    0.5255  0.533   0.534   0.542   0.017
Whole Data  DL      0.4993  0.500   0.501   0.502   0.003
BVS         DL      0.4991  0.500   0.501   0.502   0.003
PCA         DL      0.4993  0.500   0.501   0.502   0.003
NLPCA       DL      0.4993  0.500   0.501   0.502   0.003
PCA         Logit   0.7288  0.766   0.767   0.803   0.074
PCA         Probit  0.7045  0.731   0.732   0.757   0.053
NLPCA       Logit   0.7207  0.754   0.755   0.791   0.071
NLPCA       Probit  0.7042  0.729   0.730   0.757   0.053

Gumbel Copula
Data        Model   Q1      Median  Mean    Q3      IQR
Whole Data  Nnet    0.7372  0.769   0.771   0.799   0.062
BVS         Nnet    0.4996  0.503   0.503   0.507   0.007
PCA         Nnet    0.5370  0.546   0.547   0.556   0.019
NLPCA       Nnet    0.5309  0.539   0.541   0.549   0.018
Whole Data  DL      0.4998  0.500   0.501   0.502   0.002
BVS         DL      0.4998  0.500   0.501   0.502   0.002
PCA         DL      0.4998  0.501   0.501   0.502   0.002
NLPCA       DL      0.4998  0.501   0.501   0.502   0.002
PCA         Logit   0.6643  0.691   0.694   0.723   0.059
PCA         Probit  0.6751  0.697   0.698   0.721   0.046
NLPCA       Logit   0.6656  0.693   0.696   0.728   0.062
NLPCA       Probit  0.6748  0.697   0.698   0.721   0.046
Table 2. ARLs of simulated data by Clayton Copula.

Clayton Copula: Whole Data
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 3.334    20.71    112.746    2.02     NA       NA
  CL                  0        0        0          0.002    0.002    0.002
  LCL                 −0.821   −1.642   −2.464     −0.498   −0.998   −1.498
  UCL                 0.822    1.643    2.464      0.501    1.001    1.501
  CI Length           1.643    3.285    4.928      1        1.999    2.999
  Coverage            0.7      0.955    0.994      0.55     1        1
Over-dispersion
  ARL                 3.292    22.289   106.563    2.695    NA       NA
  CL                  0.002    0.002    0.002      −0.002   −0.002   −0.002
  LCL                 −0.81    −1.621   −2.432     −0.495   −0.988   −1.481
  UCL                 0.813    1.624    2.435      0.491    0.983    1.476
  CI Length           1.622    3.245    4.867      0.986    1.971    2.957
  Coverage            0.699    0.956    0.994      0.585    1        1
Under-dispersion
  ARL                 3.261    22.502   112.493    2.561    NA       NA
  CL                  −0.001   −0.002   −0.001     0.001    0.001    0.002
  LCL                 −0.801   −1.601   −2.4       −0.485   −0.971   −1.457
  UCL                 0.799    1.598    2.398      0.487    0.974    1.46
  CI Length           1.6      3.199    4.798      0.972    1.945    2.917
  Coverage            0.699    0.956    0.995      0.617    1        1

Clayton Copula: BVS
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.264    138.234  130.333    2.111    NA       NA
  CL                  0.001    0.001    0.001      0.001    0.001    0.001
  LCL                 −0.504   −1.009   −1.513     −0.498   −0.998   −1.498
  UCL                 0.506    1.01     1.515      0.501    1.001    1.501
  CI Length           1.009    2.019    3.028      1        1.999    2.999
  Coverage            0.548    0.999    1          0.543    1        1
Over-dispersion
  ARL                 2.476    111.384  146.131    2.534    NA       NA
  CL                  0        0        0          0        0        0
  LCL                 −0.513   −1.026   −1.54      −0.493   −0.986   −1.479
  UCL                 0.513    1.027    1.54       0.493    0.986    1.479
  CI Length           1.026    2.053    3.079      0.986    1.972    2.957
  Coverage            0.587    0.995    1          0.584    1        1
Under-dispersion
  ARL                 2.486    149.789  181.353    2.623    NA       NA
  CL                  −0.002   −0.002   −0.002     −0.001   −0.001   −0.001
  LCL                 −0.494   −0.986   −1.478     −0.488   −0.974   −1.46
  UCL                 0.49     0.982    1.474      0.485    0.971    1.457
  CI Length           0.984    1.968    2.952      0.973    1.945    2.917
  Coverage            0.597    0.999    1          0.616    1        1
Table 3. ARLs of simulated data by Clayton Copula.

Clayton Copula: PCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.585    82.355   132.437    2.135    NA       NA
  CL                  −0.001   −0.001   −0.001     −0.002   −0.002   −0.002
  LCL                 −0.545   −1.088   −1.632     −0.502   −1.002   −1.501
  UCL                 0.543    1.086    1.63       0.498    0.997    1.497
  CI Length           1.087    2.175    3.262      1        1.999    2.998
  Coverage            0.61     0.99     0.999      0.549    1        1
Over-dispersion
  ARL                 2.727    73.39    147.581    2.365    NA       NA
  CL                  0.001    0.001    0.001      −0.001   −0.001   −0.001
  LCL                 −0.537   −1.074   −1.611     −0.493   −0.986   −1.479
  UCL                 0.538    1.076    1.613      0.492    0.985    1.477
  CI Length           1.075    2.15     3.225      0.985    1.971    2.956
  Coverage            0.613    0.988    0.999      0.586    1        1
Under-dispersion
  ARL                 2.597    71.143   143.474    2.593    NA       NA
  CL                  0.003    0.003    0.003      0.004    0.004    0.004
  LCL                 −0.528   −1.06    −1.591     −0.483   −0.97    −1.457
  UCL                 0.534    1.065    1.597      0.491    0.978    1.465
  CI Length           1.063    2.125    3.188      0.974    1.948    2.922
  Coverage            0.614    0.987    0.999      0.613    1        1

Clayton Copula: NLPCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.448    105.494  143.163    2.135    NA       NA
  CL                  −0.001   −0.001   −0.001     −0.002   −0.002   −0.002
  LCL                 −0.536   −1.071   −1.605     −0.502   −1.002   −1.501
  UCL                 0.533    1.068    1.603      0.498    0.997    1.497
  CI Length           1.069    2.139    3.208      1        1.999    2.998
  Coverage            0.592    0.994    1          0.549    1        1
Over-dispersion
  ARL                 2.478    95.062   135.056    2.365    NA       NA
  CL                  0        0        0          −0.001   −0.001   −0.001
  LCL                 −0.528   −1.056   −1.584     −0.493   −0.986   −1.479
  UCL                 0.528    1.056    1.584      0.492    0.985    1.477
  CI Length           1.056    2.112    3.168      0.985    1.971    2.956
  Coverage            0.596    0.992    1          0.586    1        1
Under-dispersion
  ARL                 2.574    90.481   141.241    2.593    NA       NA
  CL                  0.004    0.004    0.004      0.004    0.004    0.004
  LCL                 −0.517   −1.038   −1.559     −0.483   −0.97    −1.457
  UCL                 0.525    1.046    1.567      0.491    0.978    1.465
  CI Length           1.042    2.084    3.126      0.974    1.948    2.922
  Coverage            0.603    0.992    1          0.613    1        1
Table 4. ARLs of simulated data by Gumbel Copula.

Gumbel Copula: Whole Data
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 3.397    21.495   108.616    2.113    NA       NA
  CL                  0.002    0.002    0.002      0.001    0.001    0.001
  LCL                 −0.793   −1.588   −2.383     −0.499   −0.999   −1.498
  UCL                 0.797    1.592    2.386      0.501    1.001    1.500
  CI Length           1.590    3.179    4.769      1.000    1.999    2.999
  Coverage            0.716    0.955    0.994      0.549    1.000    1.000
Over-dispersion
  ARL                 3.336    21.692   109.006    2.484    NA       NA
  CL                  0.004    0.004    0.004      −0.001   −0.001   −0.001
  LCL                 −0.778   −1.559   −2.340     −0.493   −0.986   −1.479
  UCL                 0.785    1.566    2.347      0.492    0.985    1.478
  CI Length           1.562    3.124    4.687      0.986    1.971    2.957
  Coverage            0.712    0.954    0.994      0.585    1.000    1.000
Under-dispersion
  ARL                 3.684    21.393   110.314    2.582    NA       NA
  CL                  −0.004   −0.004   −0.004     0.001    0.001    0.001
  LCL                 −0.772   −1.540   −2.308     −0.486   −0.973   −1.459
  UCL                 0.765    1.533    2.301      0.487    0.974    1.460
  CI Length           1.536    3.073    4.609      0.973    1.946    2.919
  Coverage            0.712    0.955    0.994      0.615    1.000    1.000

Gumbel Copula: BVS
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.217    129.387  156.500    2.088    NA       NA
  CL                  0.002    0.002    0.002      0.001    0.001    0.001
  LCL                 −0.504   −1.010   −1.515     −0.499   −0.998   −1.498
  UCL                 0.507    1.013    1.518      0.501    1.001    1.500
  CI Length           1.011    2.022    3.033      0.999    1.999    2.998
  Coverage            0.547    0.999    1.000      0.543    1.000    1.000
Over-dispersion
  ARL                 2.350    137.165  157.292    2.316    NA       NA
  CL                  0.000    0.000    0.000      0.001    0.001    0.001
  LCL                 −0.499   −0.998   −1.497     −0.492   −0.985   −1.478
  UCL                 0.499    0.997    1.496      0.493    0.986    1.479
  CI Length           0.998    1.995    2.993      0.986    1.971    2.957
  Coverage            0.572    0.999    1.000      0.585    1.000    1.000
Under-dispersion
  ARL                 2.504    129.826  119.947    2.592    NA       NA
  CL                  0.000    0.000    0.000      0.001    0.001    0.001
  LCL                 −0.494   −0.987   −1.480     −0.486   −0.972   −1.459
  UCL                 0.493    0.986    1.479      0.487    0.974    1.460
  CI Length           0.986    1.973    2.959      0.973    1.946    2.919
  Coverage            0.599    0.999    1.000      0.615    1.000    1.000
Table 5. ARLs of simulated data by Gumbel Copula.

Gumbel Copula: PCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.640    87.182   153.242    2.153    NA       NA
  CL                  0.001    0.001    0.001      0.001    0.001    0.001
  LCL                 −0.540   −1.081   −1.622     −0.499   −0.998   −1.498
  UCL                 0.543    1.084    1.625      0.501    1.001    1.500
  CI Length           1.082    2.165    3.247      1.000    1.999    2.999
  Coverage            0.607    0.990    0.999      0.546    1.000    1.000
Over-dispersion
  ARL                 2.522    83.406   146.587    2.231    NA       NA
  CL                  −0.001   −0.001   −0.001     −0.001   −0.001   −0.001
  LCL                 −0.536   −1.070   −1.605     −0.493   −0.986   −1.479
  UCL                 0.534    1.068    1.603      0.492    0.985    1.478
  CI Length           1.069    2.139    3.208      0.986    1.971    2.957
  Coverage            0.607    0.989    0.999      0.585    1.000    1.000
Under-dispersion
  ARL                 2.428    76.504   142.214    2.440    NA       NA
  CL                  −0.002   −0.002   −0.002     −0.001   −0.001   −0.001
  LCL                 −0.527   −1.053   −1.578     −0.487   −0.973   −1.460
  UCL                 0.524    1.050    1.575      0.485    0.971    1.458
  CI Length           1.051    2.103    3.154      0.972    1.945    2.917
  Coverage            0.612    0.988    0.999      0.616    1.000    1.000

Gumbel Copula: NLPCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
In-Control
  ARL                 2.519    104.428  145.605    2.153    NA       NA
  CL                  0.002    0.002    0.002      0.001    0.001    0.001
  LCL                 −0.534   −1.069   −1.605     −0.499   −0.998   −1.498
  UCL                 0.537    1.072    1.608      0.501    1.001    1.500
  CI Length           1.071    2.142    3.213      1.000    1.999    2.999
  Coverage            0.595    0.994    1.000      0.546    1.000    1.000
Over-dispersion
  ARL                 2.371    96.315   151.893    2.231    NA       NA
  CL                  −0.001   −0.001   −0.001     −0.001   −0.001   −0.001
  LCL                 −0.530   −1.058   −1.587     −0.493   −0.986   −1.479
  UCL                 0.528    1.057    1.586      0.492    0.985    1.478
  CI Length           1.058    2.115    3.173      0.986    1.971    2.957
  Coverage            0.598    0.992    1.000      0.585    1.000    1.000
Under-dispersion
  ARL                 2.480    93.172   135.882    2.440    NA       NA
  CL                  −0.002   −0.002   −0.002     −0.001   −0.001   −0.001
  LCL                 −0.523   −1.045   −1.566     −0.487   −0.973   −1.460
  UCL                 0.519    1.041    1.562      0.485    0.971    1.458
  CI Length           1.043    2.085    3.128      0.972    1.945    2.917
  Coverage            0.606    0.991    1.000      0.616    1.000    1.000
Table 6. RMSE with Cleveland heart disease data.

Data        Model   Q1      Median  Mean    Q3      IQR
Whole Data  Nnet    0.3908  0.421   0.637   0.461   0.071
BVS         Nnet    0.3727  0.408   2.372   0.498   0.125
PCA         Nnet    0.6272  0.671   0.673   0.716   0.089
NLPCA       Nnet    0.4990  0.527   1.223   1.339   0.840
Whole Data  DL      0.4977  0.5000  0.5018  0.5041  0.0064
BVS         DL      0.4977  0.4999  0.5018  0.5039  0.0062
PCA         DL      0.4976  0.5000  0.5016  0.5039  0.0063
NLPCA       DL      0.4979  0.5003  0.5020  0.5042  0.0063
Table 7. ARLs, CIs, and coverage with Cleveland heart disease data.

Whole Data
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
  ARL                 8.644    17.071   43.735     2.220    NA       NA
  CL                  −0.119   −0.119   −0.119     0.006    0.006    0.006
  LCL                 −2.481   −4.842   −7.204     −0.494   −0.993   −1.492
  UCL                 2.243    4.604    6.966      0.505    1.004    1.503
  CI Length           4.723    9.446    14.169     0.998    1.997    2.995
  Coverage            0.769    0.918    0.996      0.549    1.000    1.000

BVS
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
  ARL                 10.099   22.687   21.988     2.165    NA       NA
  CL                  0.105    0.105    0.105      0.002    0.002    0.002
  LCL                 −1.111   −2.326   −3.542     −0.497   −0.996   −1.495
  UCL                 1.321    2.537    3.753      0.501    1.001    1.500
  CI Length           2.432    4.863    7.295      0.998    1.996    2.995
  Coverage            0.756    0.987    0.990      0.550    1.000    1.000

PCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
  ARL                 7.253    16.202   53.460     2.157    NA       NA
  CL                  −0.012   −0.012   −0.012     0.005    0.005    0.005
  LCL                 −0.646   −1.281   −1.915     −0.494   −0.993   −1.493
  UCL                 0.623    1.257    1.891      0.504    1.003    1.502
  CI Length           1.269    2.538    3.807      0.998    1.996    2.995
  Coverage            0.765    0.906    0.998      0.549    1.000    1.000

NLPCA
                      Nnet                          DL
                      1σ       2σ       3σ         1σ       2σ       3σ
  ARL                 3.227    19.832   44.864     2.179    NA       NA
  CL                  −0.003   −0.003   −0.003     0.003    0.003    0.003
  LCL                 −0.676   −1.348   −2.020     −0.496   −0.995   −1.494
  UCL                 0.669    1.341    2.013      0.501    1.000    1.499
  CI Length           1.345    2.689    4.034      0.998    1.995    2.993
  Coverage            0.703    0.949    0.994      0.551    1.000    1.000