Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme

Netshiozwi, Unarine; Yeganeh, Ali; Shongwe, Sandile Charles; Hakimi, Ahmad

doi:10.3390/math11173650

Open AccessArticle

Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme

¹

Department of Mathematical Statistics and Actuarial Science, Faculty of Natural and Agricultural Sciences, University of the Free State, Bloemfontein 9301, South Africa

²

Department of Industrial Engineering, Faculty of Engineering, University of Kurdistan, Sanandaj 0098, Iran

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(17), 3650; https://doi.org/10.3390/math11173650

Submission received: 15 July 2023 / Revised: 13 August 2023 / Accepted: 17 August 2023 / Published: 23 August 2023

(This article belongs to the Special Issue Mathematical Data Science with Applications in Business, Industry, and Medicine)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

Control charts, which are one of the major tools in the Statistical Process Control (SPC) domain, are used to monitor a process over time and improve the final quality of a product through variation reduction and defect prevention. As a novel development of control charts, referred to as profile monitoring, the study variable is not defined as a quality characteristic; it is a functional relationship between some explanatory and response variables which are monitored in such a way that the major aim is to check the stability of this model (profile) over time. Most of the previous works in the area of profile monitoring have focused on the development of different theories and assumptions, but very little attention has been paid to the practical application in real-life scenarios in this field of study. To address this knowledge gap, this paper proposes a monitoring framework based on the idea of profile monitoring as a data-driven method to monitor the internet usage of a telecom company. By definition of a polynomial model between the hours of each day and the internet usage within each hour, we propose a framework with three monitoring goals: (i) detection of unnatural patterns, (ii) identifying the impact of policies such as providing discounts and, (iii) investigation of general social behaviour variations in the internet usage. The results shows that shifts of different magnitudes can occur in each goal. With the aim of different charting statistics such as Hoteling T² and MEWMA, the proposed framework can be properly implemented as a monitoring scheme under different shift magnitudes. The results indicate that the MEWMA scheme can perform well in small shifts and has faster detection ability as compared to the Hoteling T² scheme.

Keywords:

control chart; internet usage monitoring; profile monitoring; statistical process control (SPC); telecom company

MSC:

03-08

1. Introduction

Statistical Process Control (SPC) is a novel idea with the aim of high-quality process monitoring and control. It has been utilised frequently in industrial processes since the 1920s [1] to monitor numeric or attribute quality characteristics and it includes seven major techniques, the most prominent of which is the control chart. To conduct a process monitoring procedure, two phases (namely, Phases I and II) are defined for each control chart in such a way that Phase I aims to estimate the process parameters from the historical data (off-line monitoring), and Phase II aims to adequately detect unnatural conditions, denoted by Out-of-Control (OC), from the stable or normal situations, denoted by In-Control (IC) as soon as possible. According to Montgomery [2], an IC process has only process-inherent random variability while a process that can be affected by assignable (non-random) causes is referred to as an OC situation. So, it could be said that the major aim of Phase II monitoring is about the detection of assignable causes from the process-inherent random ones.

To monitor a process in SPC with control charts, two approaches are usually performed. The first (conventional) one is to consider a univariate or multivariate distribution for a single or multiple quality characteristics. For the second one, the process quality is formulated and monitored by a functional relationship between a response variable and one or more explanatory variables, which is denoted by profile monitoring [3]. It could be said that the major goal of profile monitoring is to check the stability of a specified IC relationship (or profile) over time [4].

In recent decades, several studies have focused on the development of novel control charts in the area of profile monitoring either in Phase I or II. As a few examples of the pioneer works, one can refer to Kang and Albin [5], Kim, et al. [6] and Woodall, et al. [7] in which some basic control charts were developed for the linear profiles whose shape is established by a simple straight line. Although linear profiles are the simplest and the most frequent case [8], other profile shapes may be needed in the complicated process. To name just a few, these are ordinal [9], binary logistic [10,11], Poisson [12], nonlinear [13], complex relationship [14,15] and so forth. In addition to these types, polynomial profiles have been considered in several studies which are in line with the aim of this study.

The polynomial profiles are built in the form of y = b₀ + b₁x + b₂x² +…+ b_kx^k and were first discussed by Kazemzadeh, et al. [16]. They proposed three control charts for Phase I analysis based on the F-test, Hotelling’s T² (hereafter T²) and Likelihood Ratio Test (LRT) statistics. Amiri, et al. [17] considered a quadratic (polynomial with order 2) profile type in monitoring the relationship between the torque produced by an engine and the engine speed in revolutions per minute as a case study in the automotive industry. In addition, they discussed the observed challenges of monitoring polynomial profiles in Phase I and II. Zhang, et al. [18] proposed two other modified methods for a quadratic form of polynomial profiles when the IC model was linear. They concluded that T² and LRT control charts yielded the same performance. In a similar problem, i.e., changing the linear IC form to a quadratic polynomial profile, Zhang, et al. [19] proposed a new Exponential Weighted Moving Average (EWMA) charting statistic in combination with a score test approach. The simulation results indicated that their proposed approach was robust against the variations of the profiles’ shape. Furthermore, a cumulative sum (CUSUM) approach was proposed by Zhang, et al. [20] in similar problem adjustments. Despite the three previous works which were limited to speculation about the quadratic polynomial profiles, Yao, et al. [21] considered the general form of polynomial profiles and compared four common control charts for this situation.

Considering previous research in the area of profile monitoring, it could be observed that the emphasis of most studies has been on the development of new control charts not only in the polynomial model but also in all other profile types, while little attention has been paid to the practical application of the profile monitoring idea. To the best of the authors’ knowledge, there is no research about the real application of polynomial profiles, but some sporadic works could be found in other profile types. For instance, to count a few: monitoring of highway safety [22], calibration system [23] and chemical gas sensors [24] with a general linear model, plasma etch process in semiconductor manufacturing with nonlinear profiles [25], social networks and healthcare with a logistic model [26,27,28], engine procedure in the automotive industry with autocorrelated profiles [17] and customer satisfaction with a nonparametric approach [29]. Furthermore, it is noteworthy to mention that there are several practical applications of control charts for monitoring quality characteristics; see, for example, [30,31,32,33].

Similar to the above research, the aim of this study is to propose a practical application of control charts with the idea of profile monitoring in the field of internet usage monitoring. From the (telecommunication) internet companies’ point of view, the trend of total internet usage in some specific platforms (e.g., social networks or applications) during different hours in each day is crucial. Identifying different trends and behaviours in internet usage can reveal unnatural conditions and situations as well as lead to new decision and policy making such as offering discounts and markups, changing service tariff and costs, and so forth. Although a few studies investigated internet usage from other practical aspects [34,35,36], there is no monitoring scheme that has been extended yet in this area.

Thus, as the major contribution of this study, a novel monitoring scheme based on the idea of profiles is proposed to monitor the internet usage of a well-known anonymous internet company. By the aim of the proposed method, one can monitor the internet usage in each day and identify the unnatural trends and patterns. The problem definition, assumption, formulation and adjustments are completely described in the next sections. Previous studies usually investigated two common phases, Phase I and Phase II, in the practical application. Recently, Zwetsloot, et al. [37] suggested a more comprehensive SPC framework for practical application with four phases. Based on their idea and as the second major contribution, both phases (i.e., I and II) and some other steps are considered in this paper as a comprehensive framework for implementation of control charts in the area of this problem. It includes different steps such as model estimation, verification, Phase I analysis, Phase II signalling and signal interpretation. Although some other frameworks have been extended for other aims in the SPC domain [17,38,39,40], the latter are not compatible for the provided real-life data used in this paper. The proposed framework could be applied in real application of not only polynomial profiles, but also other profile types and other SPC monitoring problems.

To sum up, the contributions of this paper can be listed as follows:

(i): A practical application of profile monitoring in the area of internet usage monitoring.
(ii): Development of a novel SPC framework for detection of unnatural trends and patterns in some specific platforms (e.g., social networks or applications).
(iii): Consideration of both SPC common phases and some other steps including model estimation, verification, Phase I analysis, Phase II signalling and signal interpretation in the proposed framework.
(iv): Definition of charting statistics based on the polynomial profile model.

The remainder of the article is organized as follows. Section 2 proposes the problem definition and formulation. Section 3 presents a unified framework based on the SPC approach to solve the problem. The monitoring procedure based on the proposed framework is described in Section 4 and finally, Section 5 concludes the article and provides some direction for future research.

2. Problem Definition

In some companies such as online shops, hypermarkets, manufacturing plants and so forth, the trend (pattern) of purchase orders, consumption of a specific product or some other quality characteristics at different hours within a day can have several important consequences for the performance of the company. This type of information is very helpful in such companies to arrange the products and personnel, investigate the sales program, provide some hourly discounts and so forth. On the other hand, a change in the pattern of a specific day from the previous days could be caused by some important assignable causes, which may lead to unfavourable events. For example, if it is expected that the purchase orders will decrease with a nearly constant rate in all hours of holidays or weekends, but instead some occurrence of a sharp decrease or increase in a specific hour is observed, then it is sometimes necessary to discover the reason behind that unexpected event.

The above idea is also important in internet telecommunication (telecom) companies which provide data/broadband and internet services. As such, these companies can obtain some beneficial results from monitoring internet usage during different hours of the day. From the latter scenario, the following three major aims are investigated in this paper:

(i): First, unnatural patterns, network failure and some other abnormal situations in which the total trend of a day is affected and varied are detected by this procedure.
(ii): Identifying the impact of some policies such as providing discounts or markups for specific hours is considered as the second aim.
(iii): Some general social behaviour variations, which could also be considered as customer requirements, are detected by this idea. In this case, some specific products are subject to change and it is important to identify whether this change can affect the overall trend or not.

To reach the above aims, process monitoring is implemented, which usually refers to surveillance of an industrial process or manufacturing lines over time to achieve a high level of final product quality, by defining control limits for the desired quality characteristics and producing signals in the unfavourable situations at every step of the manufacturing process [2]. The SPC techniques, specifically control charts, provide several beneficial tools and techniques to discover unusual variability and prevent the occurrence of these again. Hence, it is necessary to define and utilize a proper control chart in the area of this problem. The simple and conventional approach is to provide some univariate or multivariate control charts to monitor the process at each hour of the day. However, it is a more complex scheme in this area, as a large number of charting statistics and control limits need to be defined [41]. To avoid these challenges, some techniques and tools are required to aggregate all the day’s results and monitor the process entirely. For this aim, the profile monitoring idea which monitors a relationship between some explanatory and response variables over time is suggested in this paper. More details about how profile monitoring deals with this situation can be found in Zou, et al. [42] and Yeganeh, et al. [43].

Hence, a well-known anonymous telecom company provided some real-life data to be studied as per the above aims using the idea of profile monitoring. The data includes the internet usage (in terms of GigaBytes (GB)) of more frequently active subscribers in two major social networks, i.e., Instagram and WhatsApp, in each hour of the days between 1 January 2021 and 30 April 2022. So, the dataset of each social network has 485 rows (days) and 24 columns (hours). For better understanding, Figure 1 depicts the profile of internet usage in the first ten days of 2021 for (a) Instagram and (b) WhatsApp. Each line (colour) represents a specific day. As shown, there seem to be low usage of internet between 3:00 and 8:00, while the rate is at the highest volume during 18:00 to 23:00 in all ten days.

Considering Figure 1, the polynomial profile models are suitable for the internet usage in each day. For this aim, we followed the approach of Wang, Li, Ma, Song and Wang [29] in which the number of months were assumed as the explanatory variables in such a way that the hours of the day establish the independent variables. Hence, the responses are defined as the usage of internet in a specific hour. For the jth day of the monitoring procedure, the polynomial profile model is as given in Equation (1).

\begin{array}{l} y_{i j} = a_{0} + a_{1} x_{i}^{1} + a_{2} x_{i}^{2} + \dots + a_{k} x_{i}^{k} + ε_{i j}, \\ ε_{i j} \sim N (0, σ^{2}), \\ i = 1, 2, \dots, n, \\ j = 1, 2, .... \end{array}

(1)

As the explanatory variables are hours, they are fixed (constant) in all the profiles and it could be written that x_i = 1, 2,…, 24. In other words, it is not necessary to use a time index (here j) for the explanatory variable [44]. Furthermore, it is assumed that the error terms are completely independent, that is, there is no between and within autocorrelated error terms. At the end of the jth day, there are 24 values (n = 24) for the internet usage at each hour so that the parameters of the model in Equation (1) are estimated by the Ordinary Least Square (OLS) algorithm as shown in Equation (2) [16].

\begin{array}{l} [{\hat{a}}_{0 j}, {\hat{a}}_{1 j}, \dots, {\hat{a}}_{k j}] = {(X^{'} X)}^{- 1} X^{'} Y_{j}, \\ X = (\begin{matrix} 1 & 1 & 1 & \dots & 1 \\ 1 & 2 & 2^{2} & \dots & 2^{k} \\ 1 & 3 & 3^{2} & \dots & 3^{k} \\ ⋮ & ⋮ & ⋮ & ⋮ & ⋮ \\ 1 & 24 & 24^{2} & \dots & 24^{k} \end{matrix}), \\ Y_{j} = [y_{1 j}, y_{2 j}, \dots, y_{24 j}]^{'} . \end{array}

(2)

In Equation (2), the fixed explanatory variable is a 24 × (k + 1) matrix and the response variable is a 24 × 1 vector. Furthermore, the error variance is estimated by the Mean Squared Error (MSE) of the residuals as follows [42]:

{\hat{σ}}_{j}^{2} = \frac{\sum_{i = 1}^{n} {(y_{i j} - ({\hat{a}}_{0 j} + {\hat{a}}_{1 j} x_{i}^{1} + {\hat{a}}_{2 j} x_{i}^{2} + \dots + {\hat{a}}_{k j} x_{i}^{k}))}^{2}}{n - k - 1} .

(3)

To solve this problem, the proper polynomial model is first assigned on the data to reach the IC model and then, the stability of this model is checked over time by the idea of profile monitoring and control chart theory. In fact, the IC regression model parameters in Equation (1), denoted by 0 index hereafter, including a₀₀, a₁₀,…, a_k₀ and σ₀², are monitored over time and triggering a signal on the jth day is equivalent to changing the IC model to OC, or the occurrence of a new pattern in internet usage on the jth day. Considering this point, the response variables do not directly monitor the process and it does not matter in this approach whether the internet usage is located in a specific range in different hours, but it is necessary that the IC polynomial predefined model has a constant form at the end of each day over time. Therefore, when the internet usage at a specific hour changes from the previous day’s pattern, and if the usage in other hours also changes according to a predefined IC polynomial model in the way that the IC model remains constant, the process is still IC; otherwise, an OC signal is triggered by this approach.

3. The Proposed Framework

As a general SPC approach, previous historical data are used to establish a control chart in Phase I and estimate the IC model. The information of m previous days, with the most common range being between 20 and 30, is also considered as the Phase I (reference) data in this study (it is usually denoted by the number of subgroups in Montgomery [2], hence we can say there are m subgroups). The value of m is considered as a hyper parameter of the model and could be obtained based on expert judgment or some other criteria. On the other hand, it is necessary to verify the polynomial model assumptions before Phases I and II analyses. Except the work presented by Amiri, Jensen and Kazemzadeh [17] and Zwetsloot, Jones-Farmer and Woodall [37], there is no other research on model verification in the profile monitoring area. Five steps are defined in the proposed approach to reach a suitable decision about the process condition; these are model estimation, model verification, Phase I analysis, Phase II analysis and signal interpretation. These steps could be applied not only in polynomial profiles but also in other profile types. The details of the five proposed steps are given in the following subsections.

3.1. Model Estimation

In the first step, the polynomial model is fitted on each subgroup and the coefficients as well as the residuals of m profiles are estimated considering a specific value of k. Hence, the residual and coefficient matrices have the form m × n and m × (k + 1), respectively.

3.2. Model Verification

The error terms of the regression model in Equation (1) contain some principal assumptions; those are as follows:

(i): Following normal distribution;
(ii): Equality of variances;
(iii): Lack of any autocorrelation.

The importance of these assumptions is due to the validation of OLS estimators under the normal distribution and independence of error terms. So, some requirements are necessary to ensure proper design principals under the prepared real data. The following checklist and solutions are followed in this framework:

The residuals of m profiles are aggregated (to have an m × n matrix) and the normality assumption is checked by the well-known Kolmogorov–Smirnov test. To modify the non-normal data, which is equal to the acceptance of an alternative hypothesis, Montgomery [2] suggested the use of transformation on the raw data. Following this idea, the responses (internet usage values) are transformed as y* = y^s, where (0 < s < 1). For different values of s (from greater than 0 to lower than 1), the polynomial model is fitted on the data, and the normality of residuals is investigated to reach the best value of s based on the model’s accuracies.
The equality of variance between the samples is checked by the Levene test computed by performing analysis of variance (ANOVA) on the squared deviations of the residual values from other samples means. It is expected to reach large p-values for the equality of variances, whereas transformation on response values could also be modified this way.
The existence of autocorrelation may occur between or within the profiles. These two situations were completely investigated by Noorossana, et al. [45] and Soleimani, et al. [46], in which the autocorrelation coefficients were denoted by ϕ and ρ, respectively. Furthermore, sometimes both of them (between and within correlation) may happen at the same time; this scenario has been discussed by Ahmadi, et al. [47]. Note that the observations of different lags are possible when studying the autocorrelation effect. So, a broad range of situations should be checked, and it is worth mentioning that this is a very time-consuming procedure. Considering the above references, it is suggested that the first order autocorrelation (i.e., lag = 1) is the most probable condition and thus, we only consider it in this paper. On the other hand, there is no general approach or hypothesis test for checking the autocorrelation in all the profiles simultaneously in the current Phase I literature.
In line with this aim, the first order autocorrelation effect is checked on the rows and columns of the m × n obtained residual matrix to reach the m-variate vector [ϕ₁, ϕ₂,…, ϕ_m] and the n-variate vector [ρ₁, ρ₂,..., ρ_n]. Then, their average could be the indicator of existence of autocorrelation ∅ between and within profiles, which is computed as $\bar{\emptyset} = \frac{\sum_{a = 1}^{m} \emptyset_{a}}{m}$ and $\bar{ρ} = \frac{\sum_{a = 1}^{n} ρ_{a}}{n}$ . Based on the results of Noorossana, Amiri and Soleimani [45] and Soleimani, Noorossana and Amiri [46], the effect of autocorrelation can be disregarded in case of absolute values lower than 0.3 values for $\bar{\emptyset}$ and $\bar{ρ}$ . When there is autocorrelation in the model, Amiri, Jensen and Kazemzadeh [17] suggested to use a Linear Mixed Model (LMM). It generates some new coefficients for the model and the effect of autocorrelation could be reduced or removed. Another possible approach is to employ the proposed statistics by Noorossana, Amiri and Soleimani [45], Soleimani, Noorossana and Amiri [46] and Ahmadi, Yeganeh and Shadman [47] for Phase I analysis, depending on the type of autocorrelation.

The best polynomial model is obtained by checking the order of different models, i.e., the determination of k in Equation (1), and yields to the satisfaction of the above three conditions. It is naturally an iterative procedure which starts from k = 2 and is terminated when the best value of k is reached. Note that the value of k is identical for all the m profiles. At the end of this step, there is a matrix of coefficients with dimension m × (k + 1).

3.3. Phase I Analysis

As mentioned, the principal aims of Phase I analysis are to check the stability of the process, estimate the parameters, remove the outliers, ensure the distribution of the data and check the false alarm rate. Kang and Albin [5] and Yeh, et al. [48] suggested the T² control chart for Phase I profile monitoring analysis due to the existence of large shifts. Furthermore, a proper power and detection ability were reported for this approach in Kazemzadeh, Noorossana and Amiri [16]. Different extensions of the T² control chart have been proposed based on different estimations of the variance–covariance matrix, one of which has been extended by Williams, et al. [49]. This approach is also applied here by defining the charting statistic for the hth profile as outlined in Equation (4).

\begin{array}{l} {\hat{β}}_{h} = [{\hat{a}}_{0 h}, {\hat{a}}_{1 h}, \dots, {\hat{a}}_{k h}] = {(X^{'} X)}^{- 1} X^{'} Y_{h}, \\ Y_{h} = [y_{1 h}, y_{2 h}, \dots, y_{24 h}]^{'}, \\ T_{h}^{2} = ({\hat{β}}_{h} - \bar{β})^{'} S^{- 1} ({\hat{β}}_{h} - \bar{β}), \\ \bar{β} = \frac{1}{m} \sum_{h = 1}^{m} {\hat{β}}_{h}, \\ h = 1, 2, \dots, m . \end{array}

(4)

Note that in this paper, the index of each profile is indicated by h in Phase I, and by j in Phase II. In Equation (4), the estimation of the variance–covariance matrix is computed as shown in Equation (5).

\begin{array}{l} S = \frac{{\hat{V}}_{h}^{'} {\hat{V}}_{h}}{2 (m - 1)}, \\ {\hat{V}}_{h} = (\begin{matrix} {\hat{v}}^{'}_{1} \\ {\hat{v}}^{'}_{2} \\ \dots \\ {\hat{v}}^{'}_{(m - 1)} \end{matrix}), \\ {\hat{v}}^{'}_{h} = {\hat{β}}_{h + 1} - {\hat{β}}_{h} . \end{array}

(5)

To obtain the control limit, the exact distribution of

T_{h}^{2}

is needed; while it may not be known, some approximations have been proposed. For example, Williams, et al. [50] showed that χ²_(k+1),_α could be considered as a proper control limit (α is determined by the user specifications). By this approach, all the m charting statistics are compared with χ²_(k+1),_α and the profiles with charting statistic greater than χ²_(k+1),_α, i.e.,

T_{h}^{2}

> χ ²_(k+1),a are removed and the procedure is iterated until there are no OC profiles. At the end of the Phase I analysis, the IC process parameters including a₀₀, a₁₀, …, a_k₀, ϕ₀, ρ₀, σ₀ and the variance–covariance matrix (S₀) are estimated based on the averaging of the remaining profiles. Note that the 0 index indicates the IC condition.

3.4. Phase II Analysis

As mentioned in the literature review, three general categories were extended in polynomial profiles entailing (i) T²-, (ii) LRT- and (iii) EWMA-based methods. Our simulations revealed similar performance between the LRT and EWMA approaches (the results can be given to the interested readers), so it is neglected here. Two other groups were chosen for the Phase II monitoring aim. As the shift in magnitude in Phase II is usually lower than that of Phase I, the T² statistic proposed by Williams, Woodall and Birch [49] is not efficient. Yao, Li, He and Zhang [21] extended a modified version of the T² control chart for Phase II analysis in polynomial profiles. In this approach, the variance–covariance matrix (S) is modified based on the idea of Kang and Albin [5]. To compute the charting statistics in the jth profile in Phase II, the parameters are first estimated using the OLS approach (

{\hat{β}}_{j}

) and then the charting statistic (

T_{j}^{2}

) are computed as shown in Equation (6).

\begin{array}{l} {\hat{β}}_{j} = [{\hat{a}}_{0 j}, {\hat{a}}_{1 j}, \dots, {\hat{a}}_{k j}] = {(X^{'} X)}^{- 1} X^{'} Y_{j}, \\ β_{0} = [a_{00}, a_{10}, \dots, a_{k 0}], \\ Y_{j} = [y_{1 j}, y_{2 j}, \dots, y_{24 j}]^{'}, \\ T_{j}^{2} = ({\hat{β}}_{j} - β_{0})^{'} S_{0}^{- 1} ({\hat{β}}_{j} - β_{0}), \\ j = 1, 2, ...., \\ S_{0} = σ_{0}^{2} X^{'} X^{- 1} . \end{array}

(6)

In Equation (6), δ₀ is the IC standard deviation of errors which is estimated in Phase I analysis using Equation (3). The OC signal is triggered when

T_{j}^{2}

is greater than

{U C L}_{T^{2}}

. As the distribution of

T_{j}^{2}

is not known in Phase II and the proposed control limit by Williams, Woodall, Birch and Sullivan [50] is suitable for Phase I,

{U C L}_{T^{2}}

is computed by a Monte Carlo simulation. For this aim, it is common to define a new criterion denoted by Average Run Length (ARL), which means, on average, the number of required samples to reach a signal. When the simulation studies are performed using IC profiles, then it is denoted as ARL₀. It is expected to be reached as a target by the adjustment of

{U C L}_{T^{2}}

[51]. To better clarify, the following Algorithm 1 is suggested to obtain

{U C L}_{T^{2}}

. Following previous works such as those of Kang and Albin [5] and Yeganeh, et al. [52], in this paper, we also set ARL₀ at 200.

Algorithm 1. Procedure of computations of

{U C L}_{T^{2}}

based on the desired ARL₀

Maxiteration = 10,000; Set

{U C L}_{T^{2}}

to an approximate value;

while ARL₀ is not equal to desired ARL₀

Let ARL = [];

for t = 1:Maxiteration do

j = 0;

while (charting statistic <

{U C L}_{T^{2}}

)

Generate random IC profile;

Compute charting statistic;

j = j + 1;

end while

Store the signalling time (j) in ARL;

end for; % ARL would be a 1 × 10,000 vector after this procedure

ARL₀ = average of (ARL)

if (ARL₀ < desired ARL₀) increase

{U C L}_{T^{2}}

; else decrease

{U C L}_{T^{2}}

; end if

end while

However, the T² control chart is not solely sufficient for Phase II analysis due to its weakness in detection of small shifts. Several authors such discussed this issue and suggested the use of EWMA control charts [5,6]. To remedy this challenge, the Multivariate EWMA (MEWMA) approach, proposed by Zou, Tsung and Wang [42], is also applied in Phase II monitoring of polynomial profiles. The comprehensive details of this scheme were discussed in Yeganeh, Shadman and Amiri [52]. Similar to

{U C L}_{T^{2}}

, UCL_MEWMA is also determined based on Monte Carlo simulations. To construct the MEWMA charting statistic, the coefficients of the jth generated polynomial profile is first scaled as follows:

\begin{array}{l} Z_{j}^{} (β) = \frac{({\hat{β}}_{j} - β_{0})}{σ_{0}^{}}, \\ Z_{j} (σ) = Φ^{- 1} {F ((n - (k + 1)) \frac{{\hat{σ}}_{j}^{2}}{σ_{0}^{2}}; n - (k + 1))}, \end{array}

(7)

where Φ⁻¹(.) is the inverse of the standard normal cumulative distribution function, and F(.,n) is the chi-square cumulative distribution function with n degrees of freedom. The (k + 2)-variate vector Z_j, which concatenates Z_j(β) and Z_j(σ) as Z_j = [Z_j(β),Z_j(σ)], is a multivariate normally distributed with mean vector 0 and covariance matrix

Σ_{m} = (\begin{matrix} {(\overset{´}{X X})}^{- 1} & 0 \\ 0 & 1 \end{matrix})

, where

Σ_{m}

is a (k + 2) × (k + 2) square matrix. Eventually, W_j is defined as the MEWMA of the vector Z_j from the first to the jth profile:

W_{j} = θ Z_{j} + (1 - θ) W_{j - 1}, j = 1, 2, ....,

(8)

where θ is the EWMA smoothing parameter (here it is assumed to be fixed and equal to 0.2) and W₀ is a (k + 2)-dimensional vector (usually equal to the mean value of Z_j in the IC condition). The MEWMA chart signals if

U_{j} = W_{j}^{T} Σ_{m}^{- 1} W_{j} > U C L_{M E W M A} = L \frac{θ}{2 - θ} .

(9)

When there is autocorrelation effect in Phase II, the LMM approach, as well as the proposed statistics by Noorossana, Amiri and Soleimani [45], Soleimani, Noorossana and Amiri [46] and Ahmadi, Yeganeh and Shadman [47] can be utilized to remove the autocorrelation effect; note that this is similar to the Phase I analysis for the T² control chart. On the other hand, several researchers, such as Sheu and Lu [53] and Li, et al. [54], studied the robustness of EWMA control charts with the effect of autocorrelation in the Phase II analysis so that the MEWMA approach [42] can be directly employed with autocorrelated data. More discussions on the latter are presented in Appendix A.

3.5. Signal Interpretation

A Phase II signal in an industrial process is the indicator of an unnatural situation and it is highly recommended to stop the process after an OC signal to find and remove the assignable causes. As mentioned in Section 2, three aims are defined for the internet usage monitoring in this paper. In line with these aims, each OC signal is investigated and discussed to reach proper justifications and conclusions.

For better understanding, the above procedures and steps are summarized as a comprehensive framework in Figure 2. In this framework, the values of m (number of subgroups in Phase I) and ARL₀ are assigned by the user.

4. Analysis of the Practical Implementation of the Proposed Framework

In this section, the proposed framework in Figure 2 is investigated through a telecom company’s real internet usage data. After several discussions with the company’s experts about the monitoring challenges and considering the predefined monitoring aims in Section 2, three different scenarios (timestamps) of internet usage are defined to better illustrate the practical application of the proposed methods. In these scenarios, an OC situation occurred in a specific time which is known to us. We want to demonstrate that had the proposed framework been used at that time, it would have identified this unnatural event by triggering an OC signal. To provide a realistic condition, we adjust the Phase I analysis before the change in a way that some of the days in the beginning of Phase II would also be IC. Using this approach, the ability of the proposed framework in the detection of false alarms (i.e., triggering an OC signal when in fact the process is IC) is also evaluated. The following subsections provide the details about each scenario. In all the scenarios, the values of m, α and ARL₀ are set at 20, 0.95 and 200, respectively.

4.1. Scenario I: Detection of Unnatural Patterns in Internet Usage

This subsection aims to investigate the unnatural pattern detection ability of the proposed method. Due to some repairs and maintenances actions in the network, there were some network problems in the internet usage from 24 September 2021 until 15 days later. Naturally, the internet usage profile was different during this interval. For brevity and the similarity of conclusions, the results of this subsection are only based on Instagram data and the results of WhatsApp are neglected.

Suppose the current time is 3 September 2021 and m is set at 20 so Phase I data involves 15 August–3 September 2021. Figure 3a depicts the profile of Instagram internet usage in each hour of the day in Phase I. It is obvious that each line depicts a specific day in this period. Considering steps 1 and 2 in the framework, i.e., model estimation and verification (Section 3.1 and Section 3.2), the proper value of k is obtained as 8. Table 1 reports the procedure of obtaining the proper value of k. In this table, the procedure of increasing the value of k was performed on raw (without transformation columns) and transformed values. The p-value of the Kolmogorov–Smirnov and Levene tests are shown with p_ks and p_l. Additionally, the values of

\bar{\emptyset}

and

\bar{ρ}

were computed by adjusting the lag to 1. The nonacceptable values (lower than 0.05 for p-values and absolute values lower than 0.3 for autocorrelation coefficients) are shown in red. It is noteworthy to mention that the best value of s (the power value) was acquired as 0.5 in all cases.

From Table 1, it could be concluded that the normal distribution is not followed in each situation when the raw values used as the p-values of the Kolmogorov–Smirnov test (p_ks) are near to 0. Having transformed the raw values by s = 0.5, we were able to reach normal error terms in k = 8 and 9. Due to the lower complexity, the value of k is assigned as 8. In this situation, the autocorrelation effects are neglected; hence, the absolute values of

\bar{\emptyset}

and

\bar{ρ}

are lower than 0.3 (it is equal to ϕ₀ = ρ₀ = 0). It is also obvious that there is no problem regarding the equality of variance based on the Levene’s test results (the values of p_l are greater than 0.05 in all cases except k = 7).

In the next step (Phase I analysis, i.e., Section 3.3), considering Equation (4), the charting statistics (

T_{h}^{2}

) are computed until there are no outliers in the data. For this scenario, it needed three iterations to have no outliers, which is depicted in Figure 4. In the first iteration (Figure 4a), the 14th sample exceeded the control limit (χ²_(9,0.95) = 16.92) and was removed from the Phase I data. Then (Figure 4b) the Phase I computations were continued by m = 19, which led to another outlier and finally, in the third iteration, all the samples (m = 18) were in IC (Figure 4c). After this procedure, the IC parameters were estimated by the remaining eighteen profiles. The IC model was obtained as y_ij = 18.34 + 3.52x_i − 6.14x_i² − 1.89x_i³ − 0.27x_i⁴ + 0.02x_i⁵ − 0.001x_i⁶ + 0.000025x_i⁷ − 0.00000026x_i⁸ + ε_ij, where the standard deviation of errors is estimated as 1.38. Furthermore, the variance–covariance matrix is a 9 × 9 matrix which is not reported, for brevity.

To start the Phase II analysis,

{U C L}_{T^{2}}

and UCL_MEWMA are calculated by simulations as 23.59 and 2.68, respectively. As the network problems, which are usually referred to as OC shift in the SPC literature, had happened on 24 September 2021, there are 20 natural (IC) days in Phase II, and it was not expected to have any signal during the period from 4 to 23 September 2021. The profile of these 20 IC samples (days) are shown in Figure 3b. It could be inferenced from Figure 3a,b that the Phase I and IC Phase II samples provided similar profiles.

To detect unnatural conditions, two control charts were used to monitor the internet usage in Phase II and these are depicted in Figure 5. As an interesting result in Figure 5a, the T² control chart had very weak performance in such a way that it triggered an OC signal with a 132-day delay. As mentioned in the initial part of this subsection, the problem in the network (or equivalent OC condition) lasted about 15 days, so it could be said that the T² control chart was not able to identify this situation. On the other hand, the MEWMA chart only needed 7 days to reach an OC signal (i.e., 1 October 2021) which is completely superior to the T² control chart. The increasing pattern in MEWMA statistics after the occurrence of the change (i.e., 24 September 2021), which is obvious in Figure 5b, indicates that the exact time of the shift was nearly seven days later. The reason behind this inference is related to another SPC field denoted as the change point estimation. As it is beyond the scope of this paper, interested readers are referred to Holland and Hawkins [55] and Montgomery [2].

As the T² control chart is weak in the detection of small shifts, it could be stated that the applied magnitude shift on 24 September 2021, due to a network problem, was small. On the other hand, the MEWMA approach, which has superior ability in small shifts, identified the OC situation after 7 days. To better show this assertion, the Instagram internet usage profiles of these 7 days (i.e., 24 September–1 October 2021) are illustrated in Figure 3c. Some differences in the pattern of this period can be seen comparing two other IC situations (Figure 3a,b) which could be an indicator of a small shift. However, decision making about the IC and OC situations with only the information from plotting the profiles is not an easy task, and a scientific and statistical approach is needed to provide acceptable results. As the last step, the provided signal should be interpreted by an expert to find the main reasons for its occurrence. That said, we know now it was caused by some repairs and maintenances actions in the network. Generally, the discussions and conclusions of this subsection show that the proposed framework is a proper choice for the identification of the unnatural patterns in the monitoring procedure of internet usage.

4.2. Scenario II: Detection the Effect of Discount Policy

The second scenario investigates the effect of a discount policy in the internet usage pattern using the proposed framework. The company had suggested an 80% discount on the internet prices due to holidays on 12–26 March 2022 between 0:00 and 7:00. So, it is expected to see a tangible increase in these hours’ usage on the given days and also some changes in the patterns of other hours. The aim is to find how and when the proposed framework identifies this situation. Additionally, dealing with the autocorrelation effect is investigated. Similarly, to the previous subsection, the results of WhatsApp are here neglected for brevity.

Suppose the current time is 6 March 2022 and m is set at 20, so Phase I data involves 15 February–6 March 2022. The profiles of these twenty days are shown in Figure 6a. Obviously, there are five IC days in Phase II (7–11 March 2022) which are depicted in Figure 6b. Finally, the discount times (12–26 March 2022) in which it is supposed to reach an OC signal as soon as possible are illustrated in Figure 6c. The effect of the discount is obvious in Figure 6c, especially between 3:00 and 7:00, as the internet usage has almost quadrupled.

It is again necessary to transform the data due to the non-normality of the raw values; for example, Figure 7 depicts the histogram of the residual of the first ten profiles, which is an indicator of the existence of the skewness in all subgroups. After transforming the values, p_K and p_L were calculated as 0.22 and 0.51, respectively (the best value of k was again 8). However, the values of

\bar{\emptyset}

and

\bar{ρ}

were computed as 0.38 and 0.06, respectively, which indicates the between-profile autocorrelation, and thus the charting statistics in Section 3.3 are not appropriate. Figure 7 depicts a matrix of the residual of the first ten Phase I profiles as the scatter plots for each pair (the diagonal histograms are depicted based on the residual of raw values of each profile). The first lag autocorrelation and also non-normality of the raw values’ residuals can be clearly observed in Figure 7.

To remedy this challenge, we first tested the LMM approach implemented by Amiri, Jensen and Kazemzadeh [17] but it was not able to remove the autocorrelation effect. Then, we utilized the approach proposed by Noorossana, Amiri and Soleimani [45]. As it was developed for Phase II analysis, we applied some modification to be able to implement it in Phase I. For further illustration, the details are provided in Appendix A. Considering the formula provided in Appendix A, the UCL was obtained as χ²_(24,0.95), which is equal to 36.42 and the charting statistics were computed as shown in Figure 8. From that, it could be concluded that all the profiles were IC and the parameters could be estimated. The IC model was obtained as y_ij = 18.11 + 5x_i − 6.83x_i² + 2x_i³ − 0.28x_i⁴ + 0.02x_i⁵ − 0.001x_i⁶ + 0.000025x_i⁷ − 0.00000025x_i⁸ + ε_ij, and the standard deviation of errors is estimated as 1.55. Furthermore, the variance–covariance matrix is a 9 × 9 matrix which is not reported, for brevity.

For Phase II analysis, the charting statistic of Noorossana, Amiri and Soleimani [45] was utilized as the T² control chart to consider the autocorrelation effect. Due to the robustness of EWMA control charts against the autocorrelation effect in Phase II analysis, we also used the MEWMA approach, proposed by Zou, Tsung and Wang [42], in this section with consideration of autocorrelation in the parameter estimation as discussed in Appendix A. For this aim, the

{U C L}_{T^{2}}

and UCL_MEWMA are calculated using simulations as 45.56 and 1.78, respectively. Figure 9 shows the charting statistics of each control chart from the beginning of Phase II until the end of the discount time (7–26 March 2022). The yellow vertical lines are the start of the discount. As it can be seen, both charts did not provide a false alarm in the first five days (7–11 March 2022). On the other hand, T² control charts had a better performance as they could detect the OC situation in the first day of the discount, but the MEWMA chart needed two days for a signal. It could be inferenced that a shift with a large magnitude occurred in the profile parameters after the beginning of the discount, as the T² control chart could detect it earlier. Due to the recursive effect in the MEWMA statistic, we can see the rise in the U_j values during the discount period. From the managerial side, the operator should check and interpret the reason behind the signal. The investigation of the usage of internet in the signal days (T²: 12 March 2022 and MEWMA: 13 March 2022) shows that there is an increase in the internet usage between 0:00 and 7:00 in this period.

4.3. Scenario III: Detection of Social Behavior Variations

Due to some protests in the company’s country, the connectivity to some major social networks such as Telegram was lost between 24 February and 1 March 2021. On the other hand, the users were able to connect to WhatsApp without any filtering; this caused a tangible increase in the WhatsApp messages during these days. The difference between this situation and Scenario I (in Section 4.1) is that only one platform changed in Scenario III while all of them were influenced in Scenario I. The aim of this subsection is to illustrate the ability of the proposed framework to detect this social behaviour variation. Some of the similar results are not given in this subsection for the sake of brevity.

Suppose that the current time is 20 February 2021 and m is set at 20 so Phase I data involves 1–20 February 2021. The profiles of these twenty days are shown in Figure 10a. Three days in Phase II did not have any change (21–23 February 2021), which are depicted in Figure 10b. Finally, the period with a sudden increase in the WhatsApp internet usage (24 February–1 March 2021) is illustrated in Figure 10c; for example, the maximum usage between 22:00 and 23:00 was about 100 GB on 1 March, while the average usage in this hour was about 60 GB in Phase I, i.e., 1–20 February 2021.

By a similar procedure as Table 1, using s = 0.55, it was possible to reach a verified model with k = 5, p_K = 0.06, p_L = 0.95,

\bar{\emptyset}

= −0.07 and

\bar{ρ}

= −0.15. So, the autocorrelation effect was neglected in the subsequent computations. Then, the Phase I analysis was carried out with two iterations and the IC model was obtained as y_ij = 9.12 − 4.19x_i − 0.73x_i² − 0.05x_i³ + 0.002x_i⁴ − 0.00002x_i⁵ + ε_ij, and the standard deviation of errors was estimated as 1.12. Furthermore, the variance–covariance matrix is a 9 × 9 matrix which is not reported, for brevity.

For Phase II analysis, the

{U C L}_{T^{2}}

and UCL_MEWMA were calculated using simulations as 18.54 and 2.18, respectively. Then, the charting statistics were computed for both approaches from the beginning of Phase II until twenty days later. The results are reported in Table 2. Both control charts could detect the unnatural situation as soon as it happened. The T² control chart retrieved from OC to IC on 5 March 2021 in which the WhatsApp internet usage was normal, while the MEWMA scheme still showed the OC situation until 13 March 2021. As the MEWMA statistic stores the previous points information, it needs more time to come back to the IC condition. To avoid false alarms, one can reset the initial parameters of MEWMA schemes (i.e., W₀ in Equation (8)) which is not in line with the aim of this study [56]. Another useful solution may be the use of smaller smoothing parameters for the MEWMA approach (θ in Equation (8)). Montgomery [2] evaluated the performance of the EWMA control chart under different smoothing parameter values. Knowing the magnitude of the OC shift, we can select the proper value for θ. The very large charting statistics in OC profiles are an indication of large shifts in the OC situation which can also be found by comparing different panels in Figure 10.

The results of the three scenarios show that the proposed framework can be considered as a practical application in the monitoring of unnatural situations and events in telecom companies. Using this procedure, it is not needed to monitor each hour of the day separately, as it incorporates all of them as a charting statistic. Due to the extension of the profile monitoring statistic, an acceptable detection ability is achieved such that the OC situations in all the scenarios are detected very quickly. Furthermore, consideration of non-normal and autocorrelated conditions provide a very powerful and accurate framework under different model assumptions.

5. Concluding Remarks

Although several novelties and ideas have been extended in the area of profile monitoring, papers about the practical application of profile monitoring are scarce. In line with this knowledge gap, in this paper, on the basis of a profile monitoring idea, a novel practical monitoring framework is proposed which includes five steps: (i) model estimation, (ii) model verification, (iii) Phase I analysis, (iv) Phase II analysis, and finally (v) signal interpretation. In this study, effort is made to consider different practical challenges and assumptions, such as the normality of data, the existence of autocorrelation, etc., and to provide useful solutions for them. The practical implementation of the proposed framework is demonstrated on the internet usage monitoring of telecom companies which has not been investigated in the literature. These companies encounter three major monitoring challenges entailing (i) detection of unnatural patterns, network failure and other abnormal situations in which the total trend of a day is affected and varied, (ii) identifying the impact of policies such as providing discounts or markups for specific hours and (iii) investigation of general social behaviour variations which could also be considered as the customer requirement. For this aim, a polynomial profile model between the hours of each day and the internet usage is defined and monitored over time. The proposed framework is applied to real historical data in three different timestamps and its monitoring ability is investigated. The results show that it can detect unnatural situations very quickly under the proper adjustments and setups.

As this paper proposes a general applicable framework, it is not limited to polynomial profiles and it can be implemented in other real applications with different IC models. Finally, future studies can apply other Phase I and II control charts under the proposed framework and compare them with this approach. The major limitation of this study relates to the violation of predefined assumption such as normality, independence and so forth. In a situation such as the latter, one useful solution is to extend the non-parametric methods in this area of study. Several non-parametric control charts have been proposed and the use of them in this area can be considered as a nice future research direction.

Author Contributions

Conceptualization, U.N., A.Y. and S.C.S.; Formal analysis, A.Y. and U.N.; Investigation, A.Y., U.N., S.C.S. and A.H., Validation, Resources, A.Y. and A.H.; Software, A.Y. and U.N.; Writing—review & editing, A.Y., U.N., S.C.S. and A.H.; Methodology, A.Y. and U.N.; Supervision, A.Y. and S.C.S.; Project Administration, S.C.S. All authors have read and agreed to the published version of the manuscript.

Funding

The Masters research work of U.N. is funded by National Research Foundation (NRF) in South Africa and the research work of A.Y. is funded by University of the Free State Postdoctoral Fellowship.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data of this study can be given to the interested researchers upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. The Proposed Charting Statistic by Noorossana, Amiri and Soleimani [45]

In this approach, it is assumed that profiles are not independent from each other over time. This dependence exists in successive profiles in such a way that the autocorrelation structure follows a first-order autoregressive model with a known coefficient ϕ between the profiles. Furthermore, the IC simple linear model parameters are shown by A₀, A₁ and σ₀² which are denoted by intercept, slope and error variance. For the jth generated profile in Phase II, the estimated response variables and errors are obtained based on the previous profile responses as

\begin{array}{l} {\hat{y}}_{i j} = ϕ y_{i j} + (1 - ϕ) (A_{0} + A_{1} x_{i}), \\ e_{i j} = y_{i j} - {\hat{y}}_{i j} = y_{i j} - ϕ y_{i j} - (1 - ϕ) (A_{0} + A_{1} x_{i}) . \end{array}

(A1)

Then, the charting statistic is defined as

\begin{array}{l} T_{j}^{2} = {\underline{e}}_{j} \sum_{{\underline{e}}_{j}}^{- 1} {\underline{e}}_{j}^{- 1}, \\ \sum_{{\underline{e}}_{j}}^{} = σ_{0}^{2} I, \end{array}

(A2)

where I represents the n × n identity matrix and

{\underline{e}}_{j}

indicates the 1 × n vector of error terms. The UCL of the proposed statistic is computed by χ²_n_,_α. Furthermore, the estimation of error variance is different from Equation (3), where for the jth generated profile, it is computed as follows:

{\hat{σ}}_{j}^{2} = \frac{\sum_{i = 1}^{n} e_{i j}^{2}}{n} .

(A3)

References

Viharos, Z.J.; Jakab, R. Reinforcement Learning for Statistical Process Control in Manufacturing. Measurement 2021, 182, 109616. [Google Scholar] [CrossRef]
Montgomery, D.C. Introduction to Statistical Quality Control; John Wiley & Sons: Hoboken, NJ, USA, 2020. [Google Scholar]
Yeganeh, A.; Abbasi, S.A.; Shongwe, S.C. A Novel Simulation-Based Adaptive MEWMA Approach for Monitoring Linear and Logistic Profiles. IEEE Access 2021, 9, 124268–124280. [Google Scholar] [CrossRef]
Liu, Y.; Zhu, J.; Lin, D.K.J. A generalized likelihood ratio test for monitoring profile data. J. Appl. Stat. 2021, 48, 1402–1415. [Google Scholar] [CrossRef] [PubMed]
Kang, L.; Albin, S.L. On-Line Monitoring When the Process Yields a Linear Profile. J. Qual. Technol. 2000, 32, 418–426. [Google Scholar] [CrossRef]
Kim, K.; Mahmoud, M.A.; Woodall, W.H. On the Monitoring of Linear Profiles. J. Qual. Technol. 2003, 35, 317–328. [Google Scholar] [CrossRef]
Woodall, W.H.; Spitzner, D.J.; Montgomery, D.C.; Gupta, S. Using Control Charts to Monitor Process and Product Quality Profiles. J. Qual. Technol. 2004, 36, 309–320. [Google Scholar] [CrossRef]
Yeganeh, A.; Shongwe, S.C. A novel application of statistical process control charts in financial market surveillance with the idea of profile monitoring. PLoS ONE 2023, 18, e0288627. [Google Scholar] [CrossRef]
Ding, D.; Tsung, F.; Li, J. Ordinal profile monitoring with random explanatory variables. Int. J. Prod. Res. 2017, 55, 736–749. [Google Scholar] [CrossRef]
Alevizakos, V.; Koukouvinos, C.; Lappa, A. Comparative study of the Cp and Spmk indices for logistic regression profile using different link functions. Qual. Eng. 2019, 31, 453–462. [Google Scholar] [CrossRef]
Mohammadzadeh, M.; Yeganeh, A.; Shadman, A. Monitoring logistic profiles using variable sample interval approach. Comput. Ind. Eng. 2021, 158, 107438. [Google Scholar] [CrossRef]
He, S.; Song, L.; Shang, Y.; Wang, Z. Change-point detection in Phase I for autocorrelated Poisson profiles with random or unbalanced designs. Int. J. Prod. Res. 2020, 1–18. [Google Scholar] [CrossRef]
Steiner, S.; Jensen, W.A.; Grimshaw, S.D.; Espen, B. Nonlinear Profile Monitoring for Oven-Temperature Data. J. Qual. Technol. 2016, 48, 84–97. [Google Scholar] [CrossRef]
Pacella, M.; Semeraro, Q. Monitoring roundness profiles based on an unsupervised neural network algorithm. Comput. Ind. Eng. 2011, 60, 677–689. [Google Scholar] [CrossRef]
Ghosh, M.; Li, Y.; Zeng, L.; Zhang, Z.; Zhou, Q. Modeling multivariate profiles using Gaussian process-controlled B-splines. IISE Trans. 2021, 53, 787–798. [Google Scholar] [CrossRef]
Kazemzadeh, R.B.; Noorossana, R.; Amiri, A. Phase I monitoring of polynomial profiles. Commun. Stat.—Theory Methods 2008, 37, 1671–1686. [Google Scholar] [CrossRef]
Amiri, A.; Jensen, W.A.; Kazemzadeh, R.B. A case study on monitoring polynomial profiles in the automotive industry. Qual. Reliab. Eng. Int. 2010, 26, 509–520. [Google Scholar] [CrossRef]
Zhang, Y.; He, Z.; Shan, L.; Zhang, M. Directed control charts for detecting the shape changes from linear profiles to quadratic profiles. Int. J. Prod. Res. 2014, 52, 3417–3430. [Google Scholar] [CrossRef]
Zhang, Y.; He, Z.; Zhang, M.; Wang, Q. A Score-test-based EWMA Control Chart for Detecting Prespecified Quadratic Changes in Linear Profiles. Qual. Reliab. Eng. Int. 2016, 32, 921–931. [Google Scholar] [CrossRef]
Zhang, Y.; Shang, Y.; He, Z.; Wang, Q. CUSUM Schemes for Monitoring Prespecified Changes in Linear Profiles. Qual. Reliab. Eng. Int. 2017, 33, 579–594. [Google Scholar] [CrossRef]
Yao, C.; Li, Z.; He, C.; Zhang, J. A Phase II control chart based on the weighted likelihood ratio test for monitoring polynomial profiles. J. Stat. Comput. Simul. 2020, 90, 676–698. [Google Scholar] [CrossRef]
Jamal, A.; Mahmood, T.; Riaz, M.; Al-Ahmadi, H.M. GLM-based flexible monitoring methods: An application to real-time highway safety surveillance. Symmetry 2021, 13, 362. [Google Scholar] [CrossRef]
Gupta, S.; Montgomery, D.C.; Woodall, W.H. Performance evaluation of two methods for online monitoring of linear calibration profiles. Int. J. Prod. Res. 2006, 44, 1927–1942. [Google Scholar] [CrossRef]
Abbas, T.; Mahmood, T.; Riaz, M.; Abid, M. Improved linear profiling methods under classical and Bayesian setups: An application to chemical gas sensors. Chemom. Intell. Lab. Syst. 2020, 196, 103908. [Google Scholar] [CrossRef]
Jeong, Y.-S.; Kim, B.; Ko, Y.-D. Exponentially weighted moving average-based procedure with adaptive thresholding for monitoring nonlinear profiles: Monitoring of plasma etch process in semiconductor manufacturing. Expert Syst. Appl. 2013, 40, 5688–5693. [Google Scholar] [CrossRef]
Azarnoush, B.; Paynabar, K.; Bekki, J.; Runger, G. Monitoring Temporal Homogeneity in Attributed Network Streams. J. Qual. Technol. 2016, 48, 28–43. [Google Scholar] [CrossRef]
Fotuhi, H.; Amiri, A.; Taheriyoun, A.R. A novel approach based on multiple correspondence analysis for monitoring social networks with categorical attributed data. J. Stat. Comput. Simul. 2019, 89, 3137–3164. [Google Scholar] [CrossRef]
Yeganeh, A.; Shadman, A.; Shongwe, S.C.; Abbasi, S.A. Employing evolutionary artificial neural network in risk-adjusted monitoring of surgical performance. Neural Comput. Appl. 2023, 35, 10677–10693. [Google Scholar] [CrossRef]
Wang, Y.; Li, J.; Ma, Y.; Song, L.; Wang, Z. Nonparametric monitoring schemes in Phase II for ordinal profiles with application to customer satisfaction monitoring. Comput. Ind. Eng. 2022, 165, 107931. [Google Scholar] [CrossRef]
Barbeito, I.; Zaragoza, S.; Tarrío-Saavedra, J.; Naya, S. Assessing thermal comfort and energy efficiency in buildings by statistical quality control for autocorrelated data. Appl. Energy 2017, 190, 1–17. [Google Scholar] [CrossRef]
Kim, J.; Lim, C. Customer complaints monitoring with customer review data analytics: An integrated method of sentiment and statistical process control analyses. Adv. Eng. Inform. 2021, 49, 101304. [Google Scholar] [CrossRef]
Chen, K.-Y.; Shaw, Y.-C. Applying back propagation network to cold chain temperature monitoring. Adv. Eng. Inform. 2011, 25, 11–22. [Google Scholar] [CrossRef]
Freitas, L.L.G.; Kalbusch, A.; Henning, E.; Walter, O.M.F.C. Using Statistical Control Charts to Monitor Building Water Consumption: A Case Study on the Replacement of Toilets. Water 2021, 13, 2474. [Google Scholar] [CrossRef]
Gebert, S.; Pries, R.; Schlosser, D.; Heck, K. Internet Access Traffic Measurement and Analysis. In Proceedings of Traffic Monitoring and Analysis; Springer: Berlin/Heidelberg, Germany, 2012; pp. 29–42. [Google Scholar]
Jovic, J.; Pantovic-Stefanovic, M.; Mitkovic-Voncina, M.; Dunjic-Kostic, B.; Mihajlovic, G.; Milovanovic, S.; Ivkovic, M.; Fiorillo, A.; Latas, M. Internet use during coronavirus disease of 2019 pandemic: Psychiatric history and sociodemographics as predictors. Indian J. Psychiatry 2020, 62, S383. [Google Scholar] [CrossRef] [PubMed]
Subudhi, R.; Palai, D. Impact of internet use during COVID lockdown. Horiz. J. Hum. Soc. Sci 2020, 2, 59–66. [Google Scholar]
Zwetsloot, I.M.; Jones-Farmer, L.A.; Woodall, W.H. Monitoring univariate processes using control charts: Some practical issues and advice. Qual. Eng. 2023, 1–13. [Google Scholar] [CrossRef]
Chuang, S.-C.; Hung, Y.-C.; Tsai, W.-C.; Yang, S.-F. A framework for nonparametric profile monitoring. Comput. Ind. Eng. 2013, 64, 482–491. [Google Scholar] [CrossRef]
Nawaz, M.; Maulud, A.S.; Zabiri, H.; Taqvi, S.A.A.; Idris, A. Improved process monitoring using the CUSUM and EWMA-based multiscale PCA fault detection framework. Chin. J. Chem. Eng. 2021, 29, 253–265. [Google Scholar] [CrossRef]
Han, D.; Tsung, F. A reference-free cuscore chart for dynamic mean change detection and a unified framework for charting performance comparison. J. Am. Stat. Assoc. 2006, 101, 368–386. [Google Scholar] [CrossRef]
Motasemi, A.; Alaeddini, A.; Zou, C. An Area-based Methodology for the Monitoring of General Linear Profiles. Qual. Reliab. Eng. Int. 2017, 33, 159–181. [Google Scholar] [CrossRef]
Zou, C.; Tsung, F.; Wang, Z. Monitoring General Linear Profiles Using Multivariate Exponentially Weighted Moving Average Schemes. Technometrics 2007, 49, 395–408. [Google Scholar] [CrossRef]
Yeganeh, A.; Abbasi, S.A.; Pourpanah, F.; Shadman, A.; Johannssen, A.; Chukhrova, N. An ensemble neural network framework for improving the detection ability of a base control chart in non-parametric profile monitoring. Expert Syst. Appl. 2022, 204, 117572. [Google Scholar] [CrossRef]
Malela-Majika, J.-C.; Shongwe, S.C.; Chatterjee, K.; Koukouvinos, C. Monitoring univariate and multivariate profiles using the triple exponentially weighted moving average scheme with fixed and random explanatory variables. Comput. Ind. Eng. 2022, 163, 107846. [Google Scholar] [CrossRef]
Noorossana, R.; Amiri, A.; Soleimani, P. On the monitoring of autocorrelated linear profiles. Commun. Stat.—Theory Methods 2008, 37, 425–442. [Google Scholar] [CrossRef]
Soleimani, P.; Noorossana, R.; Amiri, A. Simple linear profiles monitoring in the presence of within profile autocorrelation. Comput. Ind. Eng. 2009, 57, 1015–1021. [Google Scholar] [CrossRef]
Ahmadi, A.; Yeganeh, A.; Shadman, A. Monitoring simple linear profiles in the presence of within- and between-profile autocorrelation. Qual. Reliab. Eng. Int. 2023, 39, 752–775. [Google Scholar]
Yeh, A.B.; Huwang, L.; Li, Y.-M. Profile monitoring for a binary response. IIE Trans. 2009, 41, 931–941. [Google Scholar] [CrossRef]
Williams, J.D.; Woodall, W.H.; Birch, J.B. Statistical monitoring of nonlinear product and process quality profiles. Qual. Reliab. Eng. Int. 2007, 23, 925–941. [Google Scholar] [CrossRef]
Williams, J.D.; Woodall, W.H.; Birch, J.B.; Sullivan, J.H. Distribution of Hotelling’s T 2 statistic based on the successive differences estimator. J. Qual. Technol. 2006, 38, 217–229. [Google Scholar] [CrossRef]
Yeganeh, A.; Chukhrova, N.; Johannssen, A.; Fotuhi, H. A network surveillance approach using machine learning based control charts. Expert Syst. Appl. 2023, 219, 119660. [Google Scholar] [CrossRef]
Yeganeh, A.; Shadman, A.; Amiri, A. A novel run rules based MEWMA scheme for monitoring general linear profiles. Comput. Ind. Eng. 2021, 152, 107031. [Google Scholar] [CrossRef]
Sheu, S.-H.; Lu, S.-L. Monitoring autocorrelated process mean and variance using a GWMA chart based on residuals. Asia-Pac. J. Oper. Res. 2008, 25, 781–792. [Google Scholar] [CrossRef]
Li, Y.; Pan, E.; Xiao, Y. On autoregressive model selection for the exponentially weighted moving average control chart of residuals in monitoring the mean of autocorrelated processes. Qual. Reliab. Eng. Int. 2020, 36, 2351–2369. [Google Scholar] [CrossRef]
Holland, M.D.; Hawkins, D.M. A control chart based on a nonparametric multivariate change-point model. J. Qual. Technol. 2014, 46, 63–77. [Google Scholar] [CrossRef]
Mitra, A.; Lee, K.B.; Chakraborti, S. An adaptive exponentially weighted moving average-type control chart to monitor the process mean. Eur. J. Oper. Res. 2019, 279, 902–911. [Google Scholar] [CrossRef]

Figure 1. The internet usage of more frequently active subscribers for (a) Instagram and (b) WhatsApp in the first 10 days of 2021 (each line is an indicator of a specific day—10 days in total).

Figure 2. The proposed monitoring framework of this study including five steps (the values of m, a and ARL₀ are determined by the user).

Figure 3. The Instagram internet usage of more frequently active subscribers for (a) Phase I, (b) IC samples (days) of Phase II and (c) the needed days to reach an OC signal with MEWMA approach (each line is indicator of a specific day).

Figure 4. The Phase I chart statistics (T²_h) in the three iterations. The red horizontal line is equal to χ²_(9,0.95).

Figure 5. The statistics of T² (a) and MEWMA (b) control charts from the start of Phase II until an OC signal. The red horizontal lines are control limits. The yellow vertical lines indicate the network problem beginning.

Figure 6. The Instagram internet usage of more frequently active subscribers for (a) Phase I (b), IC samples (days) of Phase II and (c) the discount time (each line is indicator of a specific day).

Figure 7. The scatter plot of the first ten Phase I profiles’ residuals. The diagonal histograms were drawn by the raw values of residuals.

Figure 8. The Phase I chart statistics (T²_h) based on the Noorossana, Amiri and Soleimani [45] approach. The red horizontal line is equal to χ²_(24,0.95).

Figure 9. The statistics of T² (a) and MEWMA (b) control charts from start of Phase II until the end of the discount time. The red horizontal lines are control limits. The yellow vertical lines indicate the beginning of the discount (i.e., 12 March 2022).

Figure 10. The WhatsApp internet usage of more frequently active subscribers for (a) Phase I (b), IC samples (days) of Phase II and (c) the sudden increase time (each line is indicator of a specific day).

Table 1. The procedure for reaching the best value of k as the model estimation and verification steps (red values are indicator of nonacceptable results)—the best selected value of k is 8.

k	Without Transformation				With Transformation
k	p_ks	p_l	$\bar{\emptyset}$	$\bar{ρ}$	p_ks	p_l	$\bar{\emptyset}$	$\bar{ρ}$
2	0	0.99	0.39	−0.11	0	1	0.59	−0.10
3	0	0.97	0.4	−0.14	0	1	0.57	−0.10
4	0	0.87	−0.04	−0.15	0	0.78	0.01	−0.09
5	0	0.81	−0.06	−0.16	0.0001	0.63	−0.14	−0.11
6	0	0.78	−0.4	−0.16	0.02	0.31	−0.18	−0.11
7	0	0.04	−0.22	−0.12	0.02	0.04	−0.25	−0.09
8	0	0.22	−0.32	−0.12	0.39	0.55	−0.30	−0.07
9	0	0.18	−0.31	−0.08	0.16	0.5	−0.30	−0.06

Table 2. The charting statistics of T² and MEWMA control charts for the phase II analysis of WhatsApp internet usage—the red values exceed the control limit.

j	Date	Process Condition	$T_{j}^{2}$	U_j
1	24 February 2021	IC	2.03	0.08
2	25 February 2021	IC	14.77	0.17
3	26 February 2021	IC	4.06	0.29
4	27 February 2021	OC	1163.49	3.61
5	28 February 2021	OC	1206.91	11.31
6	1 March 2021	OC	1422.83	22.11
7	2 March 2021	OC	1584.19	34.91
8	3 March 2021	OC	1763.52	48.04
9	4 March 2021	OC	2049.03	62.63
10	5 March 2021	IC	14.27	39.35
11	6 March 2021	IC	5.94	25.86
12	7 March 2021	IC	7.46	17.92
13	8 March 2021	IC	25.02	12.24
14	9 March 2021	IC	8.59	8.52
15	10 March 2021	IC	19.21	6.07
16	11 March 2021	IC	13.36	3.55
17	12 March 2021	IC	10.91	2.94
18	13 March 2021	IC	10.01	2.07
19	14 March 2021	IC	15.37	1.15
20	15 March 2021	IC	27.94	0.98

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Netshiozwi, U.; Yeganeh, A.; Shongwe, S.C.; Hakimi, A. Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme. Mathematics 2023, 11, 3650. https://doi.org/10.3390/math11173650

AMA Style

Netshiozwi U, Yeganeh A, Shongwe SC, Hakimi A. Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme. Mathematics. 2023; 11(17):3650. https://doi.org/10.3390/math11173650

Chicago/Turabian Style

Netshiozwi, Unarine, Ali Yeganeh, Sandile Charles Shongwe, and Ahmad Hakimi. 2023. "Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme" Mathematics 11, no. 17: 3650. https://doi.org/10.3390/math11173650

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Data-Driven Surveillance of Internet Usage Using a Polynomial Profile Monitoring Scheme

Abstract

1. Introduction

2. Problem Definition

3. The Proposed Framework

3.1. Model Estimation

3.2. Model Verification

3.3. Phase I Analysis

3.4. Phase II Analysis

3.5. Signal Interpretation

4. Analysis of the Practical Implementation of the Proposed Framework

4.1. Scenario I: Detection of Unnatural Patterns in Internet Usage

4.2. Scenario II: Detection the Effect of Discount Policy

4.3. Scenario III: Detection of Social Behavior Variations

5. Concluding Remarks

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A. The Proposed Charting Statistic by Noorossana, Amiri and Soleimani [45]

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI