**2. Methodology**

In this section, we give details of the Shewhart control chart for normal and non-normal environments. The following subsections discuss the known and unknown parameter scenarios, the practitioner-to-practitioner variation in the estimation stage, the presence of outliers or extreme values in the estimation sample, and the incorporation of outlier detection models into the Shewhart chart.

### *2.1. Overview of the Shewhart Control Chart*

Let $Y\_{ij}$, $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots$, represent the $i$th observation from the $j$th sample of an ongoing (continuous) process. Further, $Y\_{ij}$ follows a normal distribution with mean $\mu\_0 + \delta\sigma\_0$ and variance $\sigma\_0^2$, i.e., $Y\_{ij} \sim N(\mu\_0 + \delta\sigma\_0, \sigma\_0^2)$. The process is said to be in the in-control (IC) state if $\delta = 0$, and out-of-control (OoC) otherwise. A default Shewhart set-up monitors a process by plotting the sample mean $\overline{Y}\_j = (1/n)\sum\_{i=1}^{n} Y\_{ij}$ against the following control chart limits.

$$\text{UCL} = \mu\_0 + L \frac{\sigma\_0}{\sqrt{n}}, \quad \text{LCL} = \mu\_0 - L \frac{\sigma\_0}{\sqrt{n}} \tag{1}$$

where UCL and LCL denote the upper and lower control limits, respectively. The limits in (1) are useful when the process parameters ($\mu\_0$ and $\sigma\_0^2$) are known. When they are unknown, their respective unbiased estimators from phase-I are used instead, and the resulting control limits are in estimated form.
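As a minimal sketch of Equation (1), the limits for known parameters can be computed as follows. The function name `shewhart_limits` and its argument names are illustrative choices, not taken from the source, and Python is used here only for illustration.

```python
import math

def shewhart_limits(mu0, sigma0, n, L=3.0):
    """Return (LCL, UCL) for the mean of n observations, following Equation (1)."""
    half_width = L * sigma0 / math.sqrt(n)
    return mu0 - half_width, mu0 + half_width

# Standard normal process with samples of size 5 and the usual L = 3.
lcl, ucl = shewhart_limits(mu0=0.0, sigma0=1.0, n=5)
print(f"LCL = {lcl:.4f}, UCL = {ucl:.4f}")
```

The limits are symmetric about the in-control mean, and their width shrinks with the square root of the sample size.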

For phase-I, let $Y\_{il}$ represent the $i$th observation from the $l$th random sample, for $i = 1, 2, 3, \ldots, n$ and $l = 1, 2, 3, \ldots, m$, regarded as being in a statistically IC state. It is worth mentioning that the choice of $m$ and $n$ varies from one practitioner to another. This choice affects the accuracy of the control limits and therefore influences the average run length (ARL) in phase-II. The unbiased estimators for the parameters $\mu\_0$ and $\sigma\_0$ of an IC process are defined as:

$$\widehat{\mu}\_0 = \frac{1}{m} \sum\_{l=1}^{m} \overline{Y}\_l, \qquad \widehat{\sigma}\_0 = \frac{1}{m \text{C}\_4} \sum\_{l=1}^{m} S\_l \tag{2}$$

where $\overline{Y}\_l = \frac{1}{n}\sum\_{i=1}^{n} Y\_{il}$, $S\_l = \sqrt{\frac{\sum\_{i=1}^{n} (Y\_{il} - \overline{Y}\_l)^2}{n-1}}$, and $\text{C}\_4 = \sqrt{\frac{2}{n-1}}\,\frac{\Gamma(n/2)}{\Gamma[(n-1)/2]}$ is the bias correction constant. Subsequently, the control limits in (1) are modified to the following:

$$\widehat{\text{UCL}} = \frac{\sum\_{l=1}^{m} \overline{Y\_{l}}}{m} + L \frac{\sum\_{l=1}^{m} S\_{l}}{m \text{C}\_{4} \sqrt{n}}, \quad \widehat{\text{LCL}} = \frac{\sum\_{l=1}^{m} \overline{Y\_{l}}}{m} - L \frac{\sum\_{l=1}^{m} S\_{l}}{m \text{C}\_{4} \sqrt{n}} \tag{3}$$
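The estimators in (2) and the limits in (3) can be sketched as below. This is an illustration under the assumption that all phase-I samples have the same size; the helper names `c4` and `estimated_limits` and the toy data are hypothetical, not from the source.

```python
import math
import statistics

def c4(n):
    """Bias-correction constant for the sample standard deviation of n normal observations."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def estimated_limits(phase1_samples, L=3.0):
    """Return (LCL_hat, UCL_hat) from m equal-sized phase-I samples, following (2) and (3)."""
    m = len(phase1_samples)
    n = len(phase1_samples[0])
    mu_hat = sum(statistics.fmean(s) for s in phase1_samples) / m
    sigma_hat = sum(statistics.stdev(s) for s in phase1_samples) / (m * c4(n))
    half_width = L * sigma_hat / math.sqrt(n)
    return mu_hat - half_width, mu_hat + half_width

# Made-up toy data: m = 2 samples of size n = 3, only to show the call.
toy = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
lcl_hat, ucl_hat = estimated_limits(toy)
print(f"c4(5) = {c4(5):.4f}, LCL_hat = {lcl_hat:.4f}, UCL_hat = {ucl_hat:.4f}")
```

Dividing the average sample standard deviation by the constant removes its downward bias under normality, which is why the constant appears in the denominator of (3).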

In phase-II, the sample means $\overline{Y}\_j$ are plotted against the control limits in (3), and the chart is said to give an OoC signal if any value of $\overline{Y}\_j$ falls outside the limits. The sample number at which the statistic first falls outside the limits is recorded as the run length (RL). The RL is an important variable for measuring the performance of control charts in general, and the Shewhart chart is no exception. The most widely used property of the RL is the ARL, which is the average number of samples observed before the chart sends an OoC signal. Mathematically, $\text{ARL} = \sum\_{k=1}^{s} \text{RL}\_k / s$, where $s$ is the number of RLs recorded. In addition to the ARL, the standard deviation of the RL (SDRL) gives more information about the behavior of the RL variable when evaluating the performance of a control chart. Furthermore, the ARL is of two types: the IC ARL, denoted $\text{ARL}\_0$, and the OoC ARL, denoted $\text{ARL}\_1$. $\text{ARL}\_0$ is expected to be sufficiently large to avoid frequent false alarms. On the other hand, $\text{ARL}\_1$ is expected to be sufficiently small so that the chart signals as soon as possible after a shift in the process parameter(s).
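For intuition about these quantities, the following sketch computes the ARL and SDRL in closed form for the known-parameter normal case. It relies on a standard result that the source does not state explicitly: with independent samples and fixed limits, the RL is geometric with signal probability $p$, so the ARL is $1/p$ and the SDRL is $\sqrt{1-p}/p$. The function name `arl_sdrl_known` is an illustrative choice.

```python
import math
from statistics import NormalDist

def arl_sdrl_known(delta, n, L=3.0):
    """ARL and SDRL of the chart in (1) when the mean shifts by delta * sigma0.

    Assumes normal data, known parameters, and independent samples,
    so that the run length follows a geometric distribution.
    """
    z = NormalDist()
    shift = delta * math.sqrt(n)  # shift of the standardized sample mean
    p_signal = 1.0 - (z.cdf(L - shift) - z.cdf(-L - shift))
    return 1.0 / p_signal, math.sqrt(1.0 - p_signal) / p_signal

arl0, sdrl0 = arl_sdrl_known(delta=0.0, n=5)
arl1, sdrl1 = arl_sdrl_known(delta=1.0, n=5)
print(f"ARL0 = {arl0:.1f}, SDRL0 = {sdrl0:.1f}")
print(f"ARL1 = {arl1:.2f}, SDRL1 = {sdrl1:.2f}")
```

With no shift the result is close to the nominal in-control value of 370, and the SDRL is almost as large as the ARL, which is typical of a geometric run length.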

### *2.2. Variability in the Shewhart Chart Performance*

In this section, we explain the effect of practitioner-to-practitioner variability on the Shewhart chart, under both normal and non-normal distributions, using the Monte Carlo simulation approach. See [25–29] for more information about the effect of sample size and practitioners' variability. To this end, we develop an algorithm in the R programming language to simulate the Shewhart chart environment, using the standard Shewhart chart as our benchmark and reference point. The $\overline{X}$ chart has a control limit width determinant $L$ that influences the RL properties. We use the standard $L = 3$, which corresponds to $\text{ARL}\_0 = 370$ (see [1] for more details). Without loss of generality, we generate random samples from a standard normal distribution $N(\mu = 0, \sigma = 1)$, each of sample size $n = 5$, assuming the process parameters are known. For the non-normal case, we consider the $t$-distribution with degrees of freedom $v = 5$, 25, and 100. Since all three values of $v$ exhibit the same pattern, we report only the results for $v = 100$. In both environments (normal and $t$-distributions), we set up the chart limits as given in Equation (1) and plot the sample means against the UCL and LCL. As soon as a value of $\overline{Y}\_j$ falls outside the limits, the RL is recorded and saved. The process is iterated $10^5$ times to obtain the ARL and SDRL.
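The simulation loop described above can be sketched as follows. This is not the authors' R algorithm; it is a small Python illustration with hypothetical function names, a far smaller number of replications than the $10^5$ used in the study, and the assumption that a shift of size δ is added directly to each observation (with $\mu_0 = 0$, $\sigma_0 = 1$).

```python
import math
import random
import statistics

def draw_observation(rng, dist, v, delta):
    """One observation from N(0, 1) or Student t with v degrees of freedom, shifted by delta."""
    z = rng.gauss(0.0, 1.0)
    if dist == "t":
        # t = Z / sqrt(chi2_v / v); a chi-square with v df is Gamma(shape v/2, scale 2).
        z /= math.sqrt(rng.gammavariate(v / 2.0, 2.0) / v)
    return z + delta

def simulate_run_length(rng, n=5, L=3.0, delta=0.0, dist="normal", v=100, max_samples=10**6):
    """Number of samples drawn until the sample mean first falls outside the limits in (1)."""
    half_width = L / math.sqrt(n)  # limits for mu0 = 0, sigma0 = 1
    for j in range(1, max_samples + 1):
        ybar = sum(draw_observation(rng, dist, v, delta) for _ in range(n)) / n
        if abs(ybar) > half_width:
            return j
    return max_samples  # safety cap, assumed never reached in practice

rng = random.Random(2024)
run_lengths = [simulate_run_length(rng, delta=1.0) for _ in range(2000)]
arl1_sim = statistics.fmean(run_lengths)
sdrl1_sim = statistics.stdev(run_lengths)
print(f"simulated ARL1 = {arl1_sim:.2f}, SDRL1 = {sdrl1_sim:.2f}")
```

Setting `delta=0.0` gives the in-control case, which needs many more draws per run length because signals are rare.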

For the unknown-parameter case, we estimate the parameters from phase-I. The number of samples used for estimation differs from one practitioner to another, and so does the accuracy of the charts in phase-II. To illustrate this, we estimate both $\mu\_0$ and $\sigma\_0$ from different numbers of in-control phase-I samples, i.e., $m = 25$, 50, 100, 250, 500, and 1000, each of sample size $n = 5$. The estimates $\widehat{\mu}\_0$ and $\widehat{\sigma}\_0$ from the phase-I IC stage are then used in the same algorithm in place of $\mu\_0$ and $\sigma\_0$, respectively. Consequently, the parameter $L$ changes as the number of phase-I samples changes. The corresponding values of $L$ for the different values of $m$ are $L = 2.962$, 2.983, 2.9925, 2.997, 2.999, and 3, respectively, for the normal distribution, and $L = 2.974$, 2.995, 3.005, 3.010, 3.012, and 3.012, respectively, for the $t$-distribution with $v = 100$. These values of $L$ are determined through simulation to obtain $\text{ARL}\_0 = 370$. We carry out the simulation with different levels of shift $\delta$ ranging from 0 to 5, i.e., $\delta \in (0, 0.5, 5)$, as shown in Tables 1 and 2.
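To make the practitioner-to-practitioner variation concrete, the sketch below simulates many hypothetical practitioners, each estimating the limits from their own $m$ phase-I samples, and then evaluates the in-control ARL conditional on those estimates. The conditional ARL is computed analytically under normality rather than by a second simulation, which is a shortcut chosen here and not the authors' procedure. The two $L$ values are the ones quoted above for $m = 25$ and $m = 500$; all function names are illustrative.

```python
import math
import random
import statistics
from statistics import NormalDist

def c4(n):
    """Bias-correction constant for the sample standard deviation."""
    return math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)

def mean_and_sd(sample):
    """Sample mean and (n - 1)-denominator standard deviation."""
    n = len(sample)
    mean = sum(sample) / n
    return mean, math.sqrt(sum((y - mean) ** 2 for y in sample) / (n - 1))

def conditional_arl0(rng, m, L, n=5):
    """In-control ARL for one practitioner, given limits estimated from m N(0, 1) samples."""
    stats = [mean_and_sd([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(m)]
    mu_hat = sum(mean for mean, _ in stats) / m
    sigma_hat = sum(sd for _, sd in stats) / (m * c4(n))
    half_width = L * sigma_hat / math.sqrt(n)
    ybar = NormalDist(0.0, 1.0 / math.sqrt(n))  # true law of a phase-II sample mean
    p_false_alarm = 1.0 - (ybar.cdf(mu_hat + half_width) - ybar.cdf(mu_hat - half_width))
    return 1.0 / p_false_alarm

L_BY_M = {25: 2.962, 500: 2.999}  # values of L reported in the text for these m
rng = random.Random(7)
summary = {}
for m, L in L_BY_M.items():
    arls = [conditional_arl0(rng, m, L) for _ in range(200)]
    summary[m] = (statistics.fmean(arls), statistics.stdev(arls))
    print(f"m = {m}: mean conditional ARL0 = {summary[m][0]:.0f}, spread = {summary[m][1]:.0f}")
```

The spread of the conditional ARL across practitioners is the quantity of interest: it should shrink as $m$ grows, since the estimated limits approach the known-parameter limits.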

**Table 1.** Average run length (ARL) of the Shewhart chart with estimated parameters for standard normal and t (*v* = 100) distributions.


**Table 2.** Standard deviation of the run length (SDRL) of the Shewhart chart with estimated parameters for standard normal and t (*v* = 100) distributions.


### *2.3. Presence of Outliers in the Shewhart Chart with Estimated Parameters*

Although the estimation of the unknown parameters from phase-I samples affects the efficiency of the control chart in phase-II, the drop in chart performance is not due to estimation alone. It also stems from the presence of outlying or extreme values in the phase-I samples.

In this section, we study the effect of outliers in the phase-I samples on the performance and accuracy of the Shewhart chart. Through Monte Carlo simulation, we generate the $m$ phase-I samples from a mixture distribution, i.e., $(1 - \alpha)100\%$ of the observations come from the assumed distribution (normal or $t$) and the remaining $\alpha 100\%$ are contaminated by a chi-square distribution with $n$ degrees of freedom, denoted by $\chi^2\_{(n)}$. Consequently, the parameter estimates obtained from the $m$ samples carry the effect of these extreme values into the control chart in phase-II. Each observation of the phase-I sample is generated from the following expression:

$$\begin{aligned} &(1-\alpha)\,\mathcal{N}(\mu,\sigma^2) + \alpha \left[ \mathcal{N}(\mu,\sigma^2) + w\,\chi^2\_{(n)} \right] \quad \text{or} \\ &(1-\alpha)\,t(v) + \alpha \left[ t(v) + w\,\chi^2\_{(n)} \right] \end{aligned} \tag{4}$$

where $\alpha > 0$ is the probability of having a multiple of $\chi^2\_{(n)}$ added to the assumed distribution, which serves as the outliers in the samples, and $w \geq 1$ is the magnitude of the outlier. We develop an algorithm in R, similar to that in Section 2.2, but with the samples drawn from the environment described in (4). We set $\mu = 0$, $\sigma^2 = 1$, $v = 100$, $w = 3$, and $\alpha \in [0, 0.01]$. We design the Shewhart chart using the same parameters $L$ and $m$ as in Section 2.2.
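The contamination model in (4) can be sketched for the normal case as follows. The function names are hypothetical, and the interpretation that each observation is contaminated independently with probability α is an assumption drawn from the description above.

```python
import random
import statistics

def contaminated_observation(rng, alpha, w=3.0, n=5, mu=0.0, sigma=1.0):
    """One phase-I observation: N(mu, sigma^2), plus w * chi-square(n) with probability alpha."""
    y = rng.gauss(mu, sigma)
    if rng.random() < alpha:
        # A chi-square with n degrees of freedom is Gamma(shape n/2, scale 2).
        y += w * rng.gammavariate(n / 2.0, 2.0)
    return y

def contaminated_phase1(rng, m, n=5, alpha=0.01, w=3.0):
    """m phase-I samples of size n generated from the mixture in (4), normal case."""
    return [[contaminated_observation(rng, alpha, w, n) for _ in range(n)] for _ in range(m)]

rng = random.Random(1)
phase1 = contaminated_phase1(rng, m=1000, alpha=0.01)
pooled = [y for sample in phase1 for y in sample]
print(f"pooled mean = {statistics.fmean(pooled):.3f}, max = {max(pooled):.2f}")
```

Because the chi-square term is always positive, the contamination pushes observations upward only, so it inflates both the estimated mean and the estimated standard deviation.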

In general, the pattern exhibited by the RL properties implies the following:


Unfortunately, neither of the two suggested remedies is practicable in real life. Thus, we propose outlier-detecting structures based on the robust Tukey and MAD detection models.

### *2.4. Shewhart Chart with Outlier Detection Models*

In this section, we propose two outlier-detecting models as remedies to the issues raised in Sections 2.2 and 2.3: the Tukey and the MAD model-based Shewhart charts. Their procedures, applied in parallel to the Shewhart chart, are described in the subsections below.

### 2.4.1. The Tukey Shewhart Control Chart

For the phase-I samples, let $\widetilde{Y}$ be the median of all $m \times n$ observations. Any observation $y\_o$ with $|y\_o - \widetilde{Y}| > p \times IQR$ is declared an outlier. Here $IQR = Q\_3 - Q\_1$ is the inter-quartile range, where $Q\_3$ and $Q\_1$ are the third and first quartiles, respectively, of all $m \times n$ phase-I observations. The constant $p$ is the confidence factor of Tukey's detector, commonly chosen between 1.5 and 3.0. It should be chosen carefully: not too small, to avoid over-detection, and not too large, to prevent under-detection [18]. In this study, we choose $p = 2.2$. Applying the same algorithm, parameters, and limits employed in Section 2.2, we incorporate the Tukey outlier-detector model on the phase-I samples to screen out the extreme values present therein. We then compute the IC ARL and SDRL values for the Shewhart chart based on the Tukey model in phase-II, when the parameters are estimated.
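A minimal sketch of this screening rule on the pooled phase-I observations is given below. The function name `tukey_screen` is hypothetical, the rule is read as an absolute deviation from the median, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ slightly from the quartile definition used in the study. How the screened observations are regrouped into samples afterwards is not specified in the text and is not addressed here.

```python
import statistics

def tukey_screen(observations, p=2.2):
    """Split pooled observations into (kept, flagged) using |y - median| > p * IQR."""
    med = statistics.median(observations)
    q1, _, q3 = statistics.quantiles(observations, n=4)
    threshold = p * (q3 - q1)
    kept = [y for y in observations if abs(y - med) <= threshold]
    flagged = [y for y in observations if abs(y - med) > threshold]
    return kept, flagged

# Made-up pooled data with one obvious extreme value.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
kept, flagged = tukey_screen(data)
print("flagged:", flagged)
```

Both the median and the IQR are barely affected by a few extreme values, which is what makes the threshold itself robust to the outliers it is meant to detect.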

### 2.4.2. The Median Absolute Deviation (MAD) Shewhart Control Chart

We define the median absolute deviation (MAD) as the deviation of the dataset about the median, $\text{MAD} = \text{median}\,|Y\_{il} - \widetilde{Y}| / 0.6574$. Any observation $y\_o$ from the sample that falls outside the interval $\widetilde{Y} \pm b \times \text{MAD}$ is declared an outlier. Here $b$ is the outlier-detecting constant, chosen as 3.642 so that the percentage of observations screened by MAD is the same as that screened by Tukey. This keeps the comparison between the two outlier detectors valid [19].
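The MAD rule can be sketched in the same way. The function name `mad_screen` is hypothetical. The scaling constant is exposed as a parameter and defaults to the value printed in the text; the commonly cited normal-consistency constant is 0.6745, so the default here is an assumption about what the study used.

```python
import statistics

def mad_screen(observations, b=3.642, scale=0.6574):
    """Split pooled observations into (kept, flagged) using the interval median +/- b * MAD."""
    med = statistics.median(observations)
    mad = statistics.median([abs(y - med) for y in observations]) / scale
    kept = [y for y in observations if abs(y - med) <= b * mad]
    flagged = [y for y in observations if abs(y - med) > b * mad]
    return kept, flagged

# Same made-up pooled data as a quick check of the rule.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
kept, flagged = mad_screen(data)
print("flagged:", flagged)
```

If more than half of the observations are identical, the MAD is zero and every other value would be flagged, so a practical implementation may want to guard against that case.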

Furthermore, it is worth distinguishing between outlying and OoC sample points. The former emerge from the $m$ phase-I samples, which are used to construct the control limits for the monitoring stage (phase-II), while the latter are the sample points that fall beyond the control limits in phase-II. The presence of outlying sample points in phase-I leads to wider control limits, rendering the control charts less effective. A flowchart summarizing the procedure is depicted in Figure 1.

**Figure 1.** Flowchart of the procedures of proposed Shewhart control chart.
