**2. Methodology**

In this section, control chart methodologies are developed for functional random variables $\mathcal{X}$ that take values in the functional space $E = L^2(T)$, with $T \subset \mathbb{R}$.

From observations of the functional variable $\mathcal{X}$, two samples are obtained: a calibration sample and a monitoring sample, which are functional datasets of sizes *n* and *m*, respectively. The calibration sample is used to build the Phase I control charts, and the monitoring sample is used for Phase II.

In the design of control charts, an unstable or out-of-control process is attributed to assignable causes of variation. These arise from unusual and avoidable events that disrupt the process, that is, events that change the parameters of the underlying model of the profile or functional data [**?** ]. Such variations can be eliminated by identifying and acting on their cause, which also prevents them from recurring [**?** ].

For a method of building quality control charts, performance is measured by the probability of signaling instability when the process is actually stable (the level of significance, or false alarm probability). Under $H_0$, this is the probability of obtaining at least one observed value of the statistic outside the control limits [**?** ]. Phase I involves developing the method and estimating the level of significance, whereas in Phase II the process is assumed to be under control and the level of significance is fixed.

### *2.1. Procedure for Building a Control Chart for Phase I (Stabilization)*

As mentioned above, in Phase I, the retrospective data of a calibration sample of size *n* are analyzed to evaluate the stability of the process whose quality is monitored over time and to estimate the parameters of the control chart [**?** ].

In Phase I, a control chart is used to test the hypothesis that there is no change in the distribution of the time-ordered observations $\{\mathcal{X}_1(t), \mathcal{X}_2(t), \dots, \mathcal{X}_n(t)\}$.

These changes can be isolated (freaks or bunches) or related to a change in the process being evaluated (observable as sudden or gradual shifts in the process mean). An isolated change is the occurrence of at least one observation that deviates from the distribution of the other observations [**?** ]. The hypothesis tested in Phase I is:

$$\begin{aligned} H_0&: \mathcal{X}_i(t) \stackrel{d}{=} \mathcal{X}_j(t), \quad \forall\, i, j \in \{1, \dots, n\} \\ H_a&: \mathcal{X}_i(t) \stackrel{d}{\neq} \mathcal{X}_j(t), \quad \text{for some } i, j \in \{1, \dots, n\}. \end{aligned} \tag{1}$$

The stabilization phase of a process applies an iterative method that detects and eliminates the observations (in this context, curves) that deviate in shape or magnitude from most of the observed curves. In other words, a curve is atypical if it has been generated by a different stochastic process or if it reflects a change in the trend or variability of the stochastic process relative to the remaining data [**?** ]. The number of outliers is assumed to be unknown but small.

In this study, the proposed Phase I control charts adapt depth-based outlier detection methods for functional data [**?** ].

The outlier detection method for functional data [**?** ] considers a curve atypical if its depth is lower than a given quantile of the depth distribution, which is estimated by bootstrap. In other words, an atypical curve has a significantly low depth.

This procedure can be used with different functional depths. The fda.usc library [**?** ] offers the following alternatives: Fraiman and Muniz (FM) [**? ?** ], mode depth [**?** ] and random projections (RP) depth [**? ?** ]. In this work, the outlier detection procedure of Febrero et al., available in fda.usc, has been adapted to estimate a specific quantile of the depth distribution, which plays the role of the lower control limit (LCL) of a Phase I control chart.
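To make the depth notion concrete, the following Python sketch computes the Fraiman and Muniz depth of discretized curves, using the univariate depth $1 - |1/2 - F_{n,t}(x(t))|$ integrated over $T$. It is not the fda.usc implementation: the helper name `fm_depth`, the equally spaced grid and the approximation of the integral by a grid average are assumptions of this sketch, and package options such as trimming or scaling are omitted.

```python
from typing import Optional

import numpy as np


def fm_depth(curves: np.ndarray, reference: Optional[np.ndarray] = None) -> np.ndarray:
    """Fraiman-Muniz depth of each curve (row) with respect to a reference sample.

    Hypothetical helper. Curves are assumed to be discretized on a common,
    equally spaced grid over T, so the integral over T is approximated by
    the mean over the grid points.
    """
    x = np.asarray(curves, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    # Empirical cdf at each grid point: F_{n,t}(x(t)) = #{j : ref_j(t) <= x(t)} / n
    cdf = (ref[None, :, :] <= x[:, None, :]).mean(axis=1)
    # Univariate depth at each t: 1 - |1/2 - F_{n,t}(x(t))|, which lies in [1/2, 1]
    pointwise = 1.0 - np.abs(0.5 - cdf)
    # Integral over T approximated by the average over the equally spaced grid
    return pointwise.mean(axis=1)


# Usage: 30 noisy daily profiles on 24 hourly grid points, one shifted upwards
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 24)
sample = np.sin(2 * np.pi * t) + 0.1 * rng.standard_normal((30, 24))
sample[0] += 3.0
depths = fm_depth(sample)
print(int(np.argmin(depths)), round(float(depths[0]), 3))  # prints: 0 0.5
```

The shifted curve is the pointwise maximum everywhere, so its empirical cdf equals 1 at every grid point and its depth reaches the minimum value of 1/2.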

The Phase I control chart is estimated and plotted from a depth measure (FM, RP or mode), and only the LCL is used to detect whether the process is out of control (that is, whether the depth of a curve is less than the LCL). In addition, a chart of the original curves is proposed to give an intuitive idea of the cause of each identified anomaly (by checking its shape or magnitude) and thus to help identify assignable causes. For instance, for HVAC energy consumption in buildings, anomalies may include a stopped air handler, a failure of the meter or sensor, a change in the regulation of the machines and adverse weather conditions.

In Phase I, we consider the functional random variable $\mathcal{X}$, from which a random sample $\{\mathcal{X}_1(t), \mathcal{X}_2(t), \dots, \mathcal{X}_n(t)\}$ is drawn. These data are used in the following steps to build the Phase I control chart:


out-of-control) is small (for example, *α* = 1%). The following procedures are used to estimate the LCL.

	- **–** Order the curves by decreasing depth: $\mathcal{X}_{(1)}, \dots, \mathcal{X}_{(n)}$.
	- **–** It is assumed that at most *α*% of the sample can be considered outliers.
	- **–** *B* samples are obtained by a smoothed bootstrap from the dataset that remains after discarding the *α*% least deep curves. Let $\mathcal{X}_i^{*b}$, $i = 1, \dots, n$, $b = 1, \dots, B$, denote these bootstrap samples. Each bootstrap resample is obtained as follows:
		- ∗ An index $i^*$ is sampled uniformly from $\{1, \dots, [n(1-\alpha)]\}$.
		- ∗ $Z_{i^*}$ is generated as a Gaussian process with zero mean and variance-covariance matrix $\gamma\Sigma_{\mathcal{X}}$, with $\gamma \in [0, 1]$, where $\Sigma_{\mathcal{X}}$ is the variance-covariance matrix of the observations $\mathcal{X}_{(1)}, \dots, \mathcal{X}_{([n(1-\alpha)])}$.

	- **–** The depths of the curves $\mathcal{X}_1, \dots, \mathcal{X}_n$ are computed.
	- **–** *B* samples are obtained through a smoothed bootstrap from the original dataset, weighted by depth. Let $\mathcal{X}_i^{*b}$, $i = 1, \dots, n$, $b = 1, \dots, B$, denote these bootstrap samples. These replicas are obtained as follows:
		- ∗ An index $i^*$ is sampled from $\{1, \dots, n\}$ with probabilities proportional to $D(\mathcal{X}_1), \dots, D(\mathcal{X}_n)$.
		- ∗ $Z_{i^*}$ is generated as a Gaussian process with zero mean and variance-covariance matrix $\gamma\Sigma_{\mathcal{X}}$, with $\gamma \in [0, 1]$, where $\Sigma_{\mathcal{X}}$ is the variance-covariance matrix of the observations $\mathcal{X}_1, \dots, \mathcal{X}_n$.
		- ∗ Finally, $\mathcal{X}_i^{*b} = \mathcal{X}_{i^*} + Z_{i^*}$.
	- **–** For each $b = 1, \dots, B$, $C_b$ is computed as the empirical quantile of order *α* of the depths $D(\mathcal{X}_i^{*b})$, $i = 1, \dots, n$. The final value $C = \text{LCL}$ is the *β*-quantile of the values $C_b$, $b = 1, \dots, B$.
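The two bootstrap variants above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the fda.usc code: the function `bootstrap_lcl`, the defaults `gamma=0.05` and `beta=0.5`, the use of FM depth, and the choice to compute the depths of each bootstrap sample with respect to that same sample are all assumptions. The smoothing term $Z_{i^*}$ is drawn from a multivariate normal on the grid with covariance $\gamma\Sigma_{\mathcal{X}}$.

```python
import numpy as np


def fm_depth(curves, reference=None):
    """Fraiman-Muniz depth: grid average of 1 - |1/2 - F_{n,t}(x(t))|."""
    x = np.asarray(curves, dtype=float)
    ref = x if reference is None else np.asarray(reference, dtype=float)
    cdf = (ref[None, :, :] <= x[:, None, :]).mean(axis=1)
    return (1.0 - np.abs(0.5 - cdf)).mean(axis=1)


def bootstrap_lcl(curves, alpha=0.01, n_boot=200, gamma=0.05, beta=0.5,
                  variant="weighted", seed=None):
    """Hypothetical smoothed-bootstrap estimate of the Phase I LCL.

    variant="trimmed": resample uniformly from the [n(1 - alpha)] deepest curves.
    variant="weighted": resample all curves with probability proportional to depth.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(curves, dtype=float)
    n, p = x.shape
    depths = fm_depth(x)
    if variant == "trimmed":
        # Order by decreasing depth and discard the alpha fraction of least deep curves
        keep = np.argsort(-depths)[: int(np.floor(n * (1 - alpha)))]
        pool, probs = x[keep], None
    elif variant == "weighted":
        pool, probs = x, depths / depths.sum()
    else:
        raise ValueError("variant must be 'trimmed' or 'weighted'")
    # Covariance of the smoothing noise: gamma * Sigma_X, estimated from the pool
    sigma = gamma * np.cov(pool, rowvar=False)
    c_b = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.choice(len(pool), size=n, replace=True, p=probs)  # indices i*
        z = rng.multivariate_normal(np.zeros(p), sigma, size=n, check_valid="ignore")
        boot = pool[idx] + z                                        # X*_i = X_{i*} + Z_{i*}
        c_b[b] = np.quantile(fm_depth(boot), alpha)                 # C_b
    return float(np.quantile(c_b, beta))                            # LCL = beta-quantile of C_b


# Usage: estimate the LCL from an in-control sample, then check a shifted curve
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 24)
calib = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((40, 24))
lcl = bootstrap_lcl(calib, alpha=0.05, n_boot=50, seed=2)
shifted = calib.copy()
shifted[0] += 5.0
print(lcl, fm_depth(shifted)[0] < lcl)
```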

Once the atypical curves are detected, they are removed and the procedure is repeated until the process becomes stable (under control), that is, until no atypical curves remain.
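The iterative stabilization loop can be sketched as below, with the depth function and the LCL estimator passed in as callables (for instance, an FM depth and a bootstrap LCL). The names `stabilize` and `toy_depth` are hypothetical, and the toy depth (inverse mean distance to the pointwise median) and the fixed LCL of 0.5 in the usage example are chosen only to keep the example small.

```python
from typing import Callable, List, Tuple

import numpy as np


def stabilize(curves: np.ndarray,
              depth_fn: Callable[[np.ndarray], np.ndarray],
              lcl_fn: Callable[[np.ndarray], float],
              max_iter: int = 50) -> Tuple[np.ndarray, List[List[int]]]:
    """Phase I stabilization: repeatedly drop curves whose depth is below the LCL.

    Returns the indices (in the original sample) of the retained curves and
    the indices flagged as out of control at each iteration.
    """
    x = np.asarray(curves, dtype=float)
    keep = np.arange(len(x))
    flagged: List[List[int]] = []
    for _ in range(max_iter):
        current = x[keep]
        lcl = lcl_fn(current)              # re-estimate the LCL on the current sample
        out = depth_fn(current) < lcl      # out-of-control curves in this pass
        if not out.any():
            break                          # no atypical curves left: process is stable
        flagged.append(keep[out].tolist())
        keep = keep[~out]                  # remove them and repeat
    return keep, flagged


# Hypothetical toy depth: inverse mean absolute distance to the pointwise median
def toy_depth(c: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.abs(c - np.median(c, axis=0)).mean(axis=1))


data = np.zeros((10, 5))
data[0] += 5.0
kept, flagged = stabilize(data, toy_depth, lambda c: 0.5)
print(kept.tolist(), flagged)  # prints: [1, 2, 3, 4, 5, 6, 7, 8, 9] [[0]]
```

Re-estimating the LCL inside the loop mirrors the description above, where the procedure is repeated on the reduced sample after each removal.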

### *2.2. Procedure for Building a Control Chart for Process Monitoring (Phase II)*

Phase II deals with process monitoring: the aim is to quickly detect deviations from the calibration sample stabilized in Phase I [**?** ]. In the scalar and multivariate cases, the process is monitored using the control limits estimated in Phase I as a reference [**?** ]. In this phase, the average run length (ARL) is used to evaluate the performance of the control charts [**?** ].
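For a chart whose plotted points are independent and each signal with probability $p$, the run length is geometric and ARL $= 1/p$; in control, $p = \alpha$, so $\alpha = 0.01$ gives an in-control ARL of 100. The sketch below checks this relation by simulation. It assumes independence between plotted points, which the source does not state, and the helper names are hypothetical.

```python
import numpy as np


def arl_theoretical(p: float) -> float:
    """ARL when each plotted point signals independently with probability p."""
    return 1.0 / p


def arl_simulated(p: float, n_runs: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo ARL: mean number of points plotted until the first signal.

    With independent points, the run length is geometric on {1, 2, ...}.
    """
    rng = np.random.default_rng(seed)
    return float(rng.geometric(p, size=n_runs).mean())


alpha = 0.01                     # false alarm probability per plotted point
arl0 = arl_theoretical(alpha)    # in-control ARL
print(arl0, round(arl_simulated(alpha), 1))
```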

In this context, we test whether the data obtained in Phase II (the monitoring sample), $\{\mathcal{X}_{n+1}(t), \mathcal{X}_{n+2}(t), \dots, \mathcal{X}_{n+m}(t)\}$, deviate in distribution from the reference data (the calibration sample), $\{\mathcal{X}_1(t), \mathcal{X}_2(t), \dots, \mathcal{X}_n(t)\}$.

In Phase II, in the univariate case, the distribution *F* of the under-control process is estimated from a calibration sample or reference data; that is, *F* is the distribution of the CTQ variable of the process under control (Phase I). This distribution is used to establish the control limits that will be used to monitor the process in Phase II. These limits define an interval that covers new observations of the process with high probability, provided that the process remains under control. In Phase II, a sample from a distribution *G* is monitored. Therefore, in this stage, control charts are based on testing the hypothesis:

$$\begin{aligned} H_0&: F = G \\ H_1&: F \neq G. \end{aligned} \tag{2}$$

In the FDA context, there is no density function for a functional random variable $\mathcal{X}$ that would allow the tests corresponding to Phases I and II to be carried out directly. Instead, we can estimate the distribution of the depths of the curves in a functional data sample. Thus, for Phase II, the use of rank control charts [**?** ] is proposed in an FDA context. Functional depths are computed, and these allow the ranks to be calculated. The ranks form the basis of the *r* control charts, called rank charts.

Adapting *r* control charts involves calculating the rank statistic from functional depth measures. The *r* chart plots the rank statistic over time. The center line CL = 0.5 serves as a reference for observing possible patterns or trends, and the lower control limit is LCL = *α*, where *α* is the false alarm rate.

For the practical application of this proposal, the qcr [**?** ] and fda.usc [**?** ] R packages are used. The fda.usc package provides tools for the calculation of functional data depth, while the rank control chart, among other nonparametric charts proposed in Reference [**?** ], is applied using the qcr package, which was developed by the authors.

As mentioned above, in Phase II, the curves of the Phase I calibration sample, $\{\mathcal{X}_1(t), \mathcal{X}_2(t), \dots, \mathcal{X}_n(t)\}$, are used to detect changes or deviations with respect to the process behavior described in Phase I. The curves of the monitoring sample, $\{\mathcal{X}_{n+1}(t), \mathcal{X}_{n+2}(t), \dots, \mathcal{X}_{n+m}(t)\}$, are collected, and for each new curve we test the hypothesis that it follows the same distribution as the calibration sample.

The procedure for estimating Phase II control charts follows the scheme presented in Reference [**?** ]. In particular, we assume that the rank statistic asymptotically follows a uniform distribution. This result carries over to the functional case because of the way the rank of each observation is calculated (as the proportion of calibration curves that are no deeper than the observed curve). This provides a computational advantage for monitoring continuous processes, since the LCL does not need to be estimated: it is simply set to the *α*-quantile of the uniform distribution, that is, LCL = *α* for a significance level *α*.
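A short derivation shows why LCL $= \alpha$ is reasonable, under a simplifying assumption that is ours and not the source's: the depth of a new in-control curve and the depths of the $n$ calibration curves are treated as i.i.d. continuous random variables (in practice the calibration depths are computed from the calibration sample itself, so this is only an approximation). Let $K$ be the number of calibration curves whose depth does not exceed that of the new curve, so that the rank statistic is $K/n$. The rank of the new depth among all $n+1$ values is then uniform, and the false alarm probability of the rule "rank $< \alpha$" is:

```latex
% Assumption: D(X), D(X_1), ..., D(X_n) are i.i.d. and continuous (no ties).
K = \#\{ i : D(\mathcal{X}_i) \le D(\mathcal{X}),\ i = 1, \dots, n \}
  \sim \mathrm{Uniform}\{0, 1, \dots, n\},
\qquad
P\left(\frac{K}{n} < \alpha\right) = P\big(K \le \lceil n\alpha \rceil - 1\big)
  = \frac{\lceil n\alpha \rceil}{n + 1} \xrightarrow[n \to \infty]{} \alpha .
% Example: n = 100, alpha = 0.05 gives 5 / 101, approximately 0.0495.
```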

The procedure for building the rank control chart in the functional univariate case (a process defined by one functional variable) is detailed below; it can easily be generalized to the functional multivariate case (a process defined by more than one functional variable).


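$$r_G(\mathcal{X}) = \frac{\#\{\mathcal{X}_i \mid D(\mathcal{X}_i) \le D(\mathcal{X}),\ i = 1, \dots, n\}}{n},$$

where $D(\cdot)$ denotes the functional depth computed with respect to the calibration sample $\{\mathcal{X}_1(t), \dots, \mathcal{X}_n(t)\}$.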
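A minimal Python sketch of the rank statistic and the resulting *r* chart follows, assuming FM depth computed with respect to the calibration sample. The helper names and the simulated data are illustrative, and this is not the qcr implementation. A monitoring curve signals when its rank falls below LCL $= \alpha$; CL $= 0.5$ is returned only as the reference line of the chart.

```python
import numpy as np


def fm_depth(curves, reference):
    """Fraiman-Muniz depth of each row of `curves` w.r.t. the `reference` sample."""
    cdf = (reference[None, :, :] <= curves[:, None, :]).mean(axis=1)
    return (1.0 - np.abs(0.5 - cdf)).mean(axis=1)


def rank_chart(calibration, monitoring, alpha=0.05):
    """Hypothetical r chart: rank statistic r_G of each monitoring curve and limits."""
    ref = np.asarray(calibration, dtype=float)
    new = np.asarray(monitoring, dtype=float)
    d_ref = fm_depth(ref, ref)             # D(X_i), i = 1, ..., n
    d_new = fm_depth(new, ref)             # D(X) for each monitoring curve
    # r_G(X) = #{X_i : D(X_i) <= D(X)} / n
    ranks = (d_ref[None, :] <= d_new[:, None]).mean(axis=1)
    cl, lcl = 0.5, alpha                   # center line and lower control limit
    signals = np.flatnonzero(ranks < lcl)  # points below the LCL signal out of control
    return ranks, cl, lcl, signals


rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 24)
calibration = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((50, 24))
monitoring = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal((6, 24))
monitoring[5] += 3.0                       # simulated anomaly on the last day
ranks, cl, lcl, signals = rank_chart(calibration, monitoring, alpha=0.05)
print(np.round(ranks, 2), signals)
```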


### **3. Data Collection: Case Study of HVAC Installations in Commercial Areas**

Here, we consider a case study on the control of HVAC installations in a clothing store located in a commercial area of Panama City [**?** ]. The data stream was obtained from the Σqus energy web platform. Sixteen CTQ variables are measured, chosen for their ability to provide information about energy efficiency, air quality and thermal comfort in the store: indoor temperatures, overall energy consumption, HVAC energy consumption, CO2 content in the air (ppm), relative humidity (%), and the supply (impulsion) and return temperatures of the chillers in different areas of the store (see Figure **??**).

**Figure 1.** Plan of the case study store located in Panama City.

Hourly measurements were obtained from 1 August 2017 to 31 October 2018. The HVAC facilities of the store start operating at 9:00 a.m. or 10:00 a.m. At start-up, energy consumption peaks due to the characteristics of the HVAC installation. From 12:00 p.m., consumption remains relatively constant until the store closes at 8:00 p.m., 9:00 p.m. or 10:00 p.m. The shutdown takes about 1 or 2 h, with consumption falling at a constant rate. The resulting data can therefore be treated as functional data, and FDA techniques can be applied. Note that this is a controlled case study in which the anomalies and their assignable causes had previously been detected by the maintenance staff.

The anomalies identified by the maintenance staff in this controlled environment are briefly described as follows:


Note that only working days were studied in this work to evaluate the performance of the proposed control charts.
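Turning the hourly records into daily curves restricted to working days can be sketched as follows. The source does not describe this step: the function `working_day_curves` is hypothetical, it assumes a complete hourly series that starts at 00:00 on the first day, and it does not handle public holidays or missing hours.

```python
from datetime import datetime, timedelta

import numpy as np


def working_day_curves(hourly, start, hours_per_day=24):
    """Split an hourly series into daily curves and keep Monday to Friday only.

    Assumes the series starts at 00:00 on `start` and has no missing hours;
    a trailing incomplete day is dropped.
    """
    values = np.asarray(hourly, dtype=float)
    n_days = len(values) // hours_per_day
    curves = values[: n_days * hours_per_day].reshape(n_days, hours_per_day)
    dates = [start + timedelta(days=k) for k in range(n_days)]
    is_working = np.array([d.weekday() < 5 for d in dates])  # Monday=0, ..., Friday=4
    return curves[is_working], [d for d, w in zip(dates, is_working) if w]


# Usage: one week of synthetic hourly readings starting on 1 August 2017 (a Tuesday)
week = np.arange(7 * 24)
curves, days = working_day_curves(week, datetime(2017, 8, 1))
print(curves.shape, [d.strftime("%Y-%m-%d") for d in days])
```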

### **4. Application to Real Data**

This section shows the usefulness and performance of the new graphical methodology for quality control using functional data, evaluated on the detection of energy efficiency anomalies in an HVAC installation. Specifically, the case study considers a store of a well-known Galician apparel brand in a commercial area of Panama City. In this controlled case study, the anomalies and their assignable causes were detected by the maintenance personnel.

The following analysis illustrates why FDA-based control chart methodologies are needed, using the observations of the present case study, in particular those for August. As mentioned above, in August no event destabilized the process through assignable causes. However, a methodology for scalar data (which ignores the autocorrelation between the variables) could produce an unacceptable number of false alarms.

In the scalar case, the boxplot [**?** ] is commonly used to detect anomalous or atypical data. Figure **??** shows a traditional scalar approach to outlier detection using boxplots. The left panel shows the boxplot of HVAC energy consumption for each hour, while the right panel shows the daily HVAC energy consumption curves, highlighting the curves detected as outliers by this descriptive procedure, which applies a boxplot to each hourly consumption. In the usual procedure, a curve is considered atypical if at least one of its points is detected as an outlier in some boxplot; the drawback of this approach is that it inflates the probability of type I error. Here, it detects 12 daily energy consumption curves as outliers.
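The multiple-testing problem behind this approach can be made concrete. For Gaussian data, the boxplot (Tukey) fences $Q_1 - 1.5\,\text{IQR}$ and $Q_3 + 1.5\,\text{IQR}$ lie about 2.7 standard deviations from the mean, so roughly 0.7% of in-control values fall outside them. If a daily curve has 24 hourly values and the hourly checks were independent, the probability that an in-control curve is flagged would be $1 - (1 - 0.007)^{24} \approx 0.155$. The sketch below (hypothetical helper names, simulated data) implements the per-hour boxplot rule. In real data the hourly values are correlated, so the exact inflation differs, but the per-curve false alarm rate is still not controlled.

```python
import numpy as np


def boxplot_outlier_curves(curves, k=1.5):
    """Indices of curves with at least one point outside that hour's Tukey fences."""
    x = np.asarray(curves, dtype=float)
    q1, q3 = np.percentile(x, [25, 75], axis=0)  # one boxplot per hour (column)
    iqr = q3 - q1
    outside = (x < q1 - k * iqr) | (x > q3 + k * iqr)
    return np.flatnonzero(outside.any(axis=1))


def familywise_false_alarm(p_point, n_points):
    """P(at least one flagged point) if the n_points checks were independent."""
    return 1.0 - (1.0 - p_point) ** n_points


print(round(familywise_false_alarm(0.007, 24), 3))  # prints: 0.155

# Simulated in-control curves with independent Gaussian hourly values
rng = np.random.default_rng(0)
sim = rng.standard_normal((1000, 24))
rate = len(boxplot_outlier_curves(sim)) / len(sim)
print(rate)  # fraction of in-control curves flagged by the pointwise rule
```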

**Figure 2.** Detecting outliers in the heating, ventilation and air conditioning (HVAC) energy consumption using a boxplot of energy consumption for each hour.

Based on the information described in the previous section, we first apply the data depth control chart for Phase I and then the rank control chart to monitor the process during Phase II. These two statistical techniques, together with an intuitive graphical tool (to facilitate the detection of assignable causes of the anomalies), constitute the new proposed control chart procedure for functional data. The procedure can be summarized as follows:


In Figure **??**, the black curves correspond to August (23 curves), whereas the gray curves correspond to HVAC energy consumption in September (21 curves). Only weekdays (Monday to Friday) are used in this study, since the work schedule differs on Saturdays and Sundays. In September, the actual anomalies (see Figure **??**, curves in red) were detected using the Phase I control chart based on functional depth outlier detection methods. In particular, the anomalies of 11, 21, 22, 27 and 29 September were identified (see Section **??** for more information on their assignable causes).
