**1. Introduction**

Industry 4.0 is taking its course and continuously raising new challenges to classical process operation and management functions. One pillar of any production system is guaranteeing the stability, consistency and predictability of industrial processes. This was and will always be a major concern for industry. Juran considered it a fundamental function for implementing any quality management system (the Juran trilogy of Planning, Control and Improvement), and it is an intrinsic part of every quality standard, deeply embedded in the celebrated ISO 9000 series. Traditionally, this function is operationalized on the shop floor through variation management and reduction methodologies such as Statistical Process Monitoring (SPM), Engineering Process Control (EPC), or combined versions of them [1–4], as well as other approaches like error-proofing systems (Poka Yoke). However, the way SPM is implemented in industry is changing, as a result of, among other possible drivers: (i) the increasing complexity of the processes/products under monitoring and (ii) the new challenges imposed by the data collected from them.

Regarding the first category of drivers, current processes under monitoring can present characteristics (almost) absent from the scenarios in which, over more than 80 years, most SPM technology was developed: they can exhibit stationary dynamics (autocorrelation) [5–10] or non-stationary dynamics (such as discontinuous or batch processes) [11–14]; they can have multiple set-points and operation modes [15–17]; they can present a complex network structure, being composed of many sub-processes in series or in parallel [18–22]; and they can be intrinsically multiscale in space and time [23–27].

The second category of drivers is connected to the new data-intensive environments in which industry is currently immersed. SPM is a data-driven methodology, as its implementation strongly relies on the analysis of historical data (Phase 1 analysis) and data collected during operation (Phase 2 analysis) [28]. The vast majority of SPM methods were developed to handle scalar (0th order tensor) sensor-like data, either univariately [29–31] or multivariately [32–35]. More recently, profile monitoring emerged as a new SPM branch dedicated to functional relationships [36–38] or higher-order tensorial data structures such as near-infrared (NIR) spectra [39–41], surface profilometry [37], grey-level images [42], colour and hyperspectral images [43–47], and hyphenated instruments [48], among others, expanding the SPM domain to new processes/products.

One aspect shared by all methods proposed in the past, and the fundamental premise established since the seminal work of Walter A. Shewhart [31], is that all common cause variation must be present in the reference historical Normal Operating Conditions (NOC) dataset collected for Phase 1 analysis, i.e., the dataset used to assess process stability and establish the control limits for the monitoring statistics. However, this fundamental premise does not hold in many processes, such as the assembly processes of the microelectronics industry based on Surface Mount Technology (SMT), which are the focus of this paper.

To make our presentation clearer and more objective, in the following sub-sections we describe the problem and provide an introductory sketch of the proposed solution.

#### *1.1. Problem Statement: Process Monitoring of Surface Mount Technology (SMT) Production Lines*

The assembly process of complex electronic devices involves placing, fixing and functionalizing electronic components on Printed Circuit Boards (PCB). This is done through the preliminary deposition of solder paste deposits (SPD) in specific positions, with a well-defined target shape and volume. This part of the process is critical for quality, as any defect or misplacement may result in the loss of function of the module and eventually of the entire device. Therefore, SPDs should be monitored immediately after being placed, to avoid the well-known accumulation of costs when a fault is detected later in the process (the cost typically rises by roughly a factor of 10 as one moves from one stage to the next without detecting a potential problem, leading to the \$1:\$10:\$100: ... progression in the costs due to poor quality of the assembly process). In the present case study, data arise from a modern production line equipped with Surface Mount Technology (SMT) that performs 100% inspection of all paste deposits for each PCB produced (more details about the process are provided in Section 3), which implies the simultaneous analysis of several thousand SPDs. Handling such a large number of variables for process monitoring raises several important challenges to traditional statistical process monitoring approaches, as addressed elsewhere [49], but the fundamental issue not considered so far is the poor coverage of common cause variation in the reference dataset (even when it comprises what is usually considered a sufficiently high number of samples).

For a better understanding of the problem under analysis, Figure 1a,b represent the scores for the first three principal components (PC1, PC2 and PC3) for several datasets collected during the production of the same product, all of them regarding NOC conditions. Given the large number of variables involved, we opt to present just these three scores, which represent a significant portion of the overall variability (from the properties of Principal Component Analysis, they are the three linear combinations with maximum power to explain the variability presented by the original variables; see [50,51]) and are enough to establish the picture we want to convey at this point. Each point concerns a PCB, and these points, as stated above, arise from the reference dataset (CS1) and from two other datasets collected afterwards at different periods, also regarding normal operating conditions. All observations should therefore be considered "normal", as confirmed by process experts. Conducting a Phase 1 analysis of the reference NOC dataset (CS1), it is possible to conclude that the process is stable. This can be confirmed by analyzing Figure 1c, where the two multivariate monitoring statistics fall, in general, within the region limited by the Phase 1 control limits (more details on these monitoring statistics are presented in Section 4). This historical reference dataset can therefore be used to set up the multivariate control limits for conducting the monitoring activity of future incoming PCBs. However, from the plots in Figure 1a,b, it is clearly visible that many alarms are expected to be issued when applying these control limits to data from future PCBs. This can be confirmed by analyzing Figure 1d, where the SPM charts developed using CS1 were applied to dataset CS3, resulting in sustained alarms being issued by the two control charts. Therefore, implementing any SPM methodology based on the information strictly inferred from the historical dataset will inevitably result in frequently signaling perfectly good PCBs as faulty, making this activity, as currently conducted, of very limited value.

**Figure 1.** Projection of data from several production periods (designated as CS1, CS2 and CS3) on the principal components space estimated using CS1 as the reference dataset: (**a**) Scores plot of PC1 vs. PC2; (**b**) Scores plot of PC1 vs. PC3. The 99% confidence ellipses for the scores of CS1 are also represented; (**c**) MSPM-PCA monitoring statistics (Hotelling's *T2* of the scores and the *Q* or *SPE* statistic of the residuals; see Section 4) for CS1, using CS1 as the reference dataset; (**d**) MSPM-PCA monitoring statistics for CS3, using CS1 as the reference dataset.
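The two monitoring statistics shown in Figure 1c,d are formally introduced in Section 4. Purely as a minimal sketch (hypothetical helper names, not the implementation used in this work), the PCA-based Hotelling's *T2* and *Q*/*SPE* statistics and their empirical Phase 1 control limits could be computed as follows, assuming each row of the data matrix holds the measured features of one PCB:

```python
# Minimal sketch of PCA-based MSPM statistics (hypothetical helper names):
# Hotelling's T2 is computed on the retained scores, Q/SPE on the residuals.
import numpy as np
from sklearn.decomposition import PCA

def fit_mspm_pca(X_noc, n_components=3, alpha=0.99):
    """Phase 1: fit PCA on the reference NOC data and set control limits."""
    mu = X_noc.mean(axis=0)
    sigma = X_noc.std(axis=0, ddof=1)
    Z = (X_noc - mu) / sigma                          # autoscaled reference data
    pca = PCA(n_components=n_components).fit(Z)
    T = pca.transform(Z)                              # scores
    lam = pca.explained_variance_                     # score variances
    t2 = np.sum(T**2 / lam, axis=1)                   # Hotelling's T2
    spe = np.sum((Z - pca.inverse_transform(T))**2, axis=1)   # Q / SPE
    limits = {"T2": np.quantile(t2, alpha),           # empirical percentile limits
              "SPE": np.quantile(spe, alpha)}
    return {"mu": mu, "sigma": sigma, "pca": pca, "lam": lam, "limits": limits}

def monitor(model, X_new):
    """Phase 2: compute both statistics for new PCBs with the Phase 1 model."""
    Z = (X_new - model["mu"]) / model["sigma"]
    T = model["pca"].transform(Z)
    t2 = np.sum(T**2 / model["lam"], axis=1)
    spe = np.sum((Z - model["pca"].inverse_transform(T))**2, axis=1)
    return t2, spe
```

In this sketch the control limits are simply empirical percentiles of the Phase 1 statistics; distribution-based limits (e.g., F and weighted chi-squared approximations) are the more common choice in the MSPM literature.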

The fundamental reason causing the situation described above is the limited information about process variability that can be extracted from a dataset representing a stable period of NOC operation—the reference dataset does not reflect the whole of common cause variation sources, but just a small part of it.

Fortunately, the reference NOC dataset is not the only source of information about common cause variation available.

In fact, the engineering team has been accumulating knowledge over time about the "common" causes of variability that lead to such false alarms, by analyzing case by case what is causing them. The root causes concern aspects that are very common and perfectly normal and expected in process operations, such as slight changes in the settings across different lots, automatic corrections and adjustments introduced by the assembly units, "normal" (or acceptable) tridimensional deformations of the boards fed to the process, rigid body movements (rotations, translations) of the boards as they move along the line, and other known specificities of the paste deposition tools. These aspects of variability are known and expected to happen over a long time frame, but not all of them will necessarily be present in the initial stages of the process, or in any other single isolated period in the future.
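To make one of these mechanisms concrete, the illustrative sketch below (hypothetical function names and placeholder magnitudes, not values from the studied process) shows how a small rigid-body rotation and translation of a board would perturb the nominal coordinates of its solder paste deposits:

```python
# Illustrative sketch of one common cause mechanism: a small rigid-body
# rotation + translation applied to the nominal SPD positions of a board.
# Magnitudes below are placeholders, not values from the studied process.
import numpy as np

def rigid_body_perturbation(xy_nominal, max_angle_rad=2e-3, max_shift_mm=0.05,
                            rng=None):
    """xy_nominal: (n_deposits, 2) array of nominal SPD centre coordinates."""
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(-max_angle_rad, max_angle_rad)         # small rotation
    shift = rng.uniform(-max_shift_mm, max_shift_mm, size=2)   # small translation
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return xy_nominal @ R.T + shift
```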

Furthermore, the extensive process knowledge accumulated over time enables not only the identification of the phenomena that may cause the variation patterns presented in Figure 1, but also the estimation of their magnitudes. The detailed analysis of past production runs where they took place provides information on the amount of variation associated with them. Therefore, besides the reference dataset, there is also information available about the *long term structural components of common cause variability* (at least for the dominant modes of common cause variability) and their stochastic behavior (of which the reference dataset is a particular realization).

Bringing in this extra engineering knowledge of structural common cause variation is fundamental to addressing the present problem, rooted in the unavoidable underrepresentation of common cause variation in the reference dataset.

#### *1.2. Proposed Methodology: Artificial Generation of (Common Cause) Variability (AGV)*

In this work we introduce a Data Augmentation approach for enriching the reference dataset with structural common cause variation sources. Data are generated by conducting stochastic computational simulations of process behavior using rigorous mechanistic models of the dominant structural modes of common cause variation, whose conditions and parameters are described probabilistically based on data collected from industrial runs of other related assembly processes (dispersion), whereas the targets and settings are those of the current process (central tendency). The simulations generate patterns of variation in the measurements respecting the physical constraints of the systems/products, leading to the same natural long term and short term correlation patterns found in real process data.
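A highly simplified sketch of this idea is given below (hypothetical names; the actual AGV procedure is detailed in Section 3): each virtual PCB is built around the current process targets, perturbations from the dominant variation modes are superimposed with parameters drawn from distributions estimated on related production runs, and the virtual observations are appended to the reference NOC dataset.

```python
# Hedged sketch of the data augmentation idea (not the mechanistic simulator
# described in Section 3): virtual PCBs are generated around the current
# process targets and perturbed by the dominant common cause variation modes.
import numpy as np

def augment_noc(X_ref, variation_modes, n_virtual=500, rng=None):
    """variation_modes: callables mapping (targets, rng) -> an additive
    perturbation vector with that mode's own spatial correlation structure."""
    rng = np.random.default_rng() if rng is None else rng
    targets = X_ref.mean(axis=0)             # central tendency: current process
    X_virtual = np.empty((n_virtual, X_ref.shape[1]))
    for i in range(n_virtual):
        x = targets.copy()
        for mode in variation_modes:          # dispersion: learned from related runs
            x = x + mode(targets, rng)
        X_virtual[i] = x
    return np.vstack([X_ref, X_virtual])      # augmented reference dataset

# Example of a (hypothetical) lot-to-lot offset mode affecting all deposits:
def lot_offset_mode(targets, rng, sigma_rel=0.01):
    return rng.normal(0.0, sigma_rel * np.abs(targets))
```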

Ultimately, the proposed methodology is even able to create a frame of reference for starting the monitoring right from the onset of production, using the variation patterns extracted from previous runs with other products (as shown in Section 5). We call this *conditionally expected common cause variation*, i.e., the expected variability under normal operating conditions, *conditioned* on all the past data and knowledge extracted over the years from data regarding the production of related products.

Together with the solution for the unwelcome false alarms, the proposed methodology also brings new diagnostic tools: the variation from the different simulated phenomena is usually well captured by specific principal components. Therefore, once a potential fault is detected, the analysis of the scores may reveal a possible mechanism underlying its origin (the one connected to the principal component where the deviation is more noticeable). Other causes can be explored with other diagnostic tools, such as residual analysis and contribution plots.
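As an illustration of the contribution-plot idea, and reusing the hypothetical Phase 1 model sketched after Figure 1, per-variable contributions to the *SPE* statistic of a flagged PCB could be computed as follows (again a simplified sketch, not the diagnostic implementation used in this work):

```python
# Sketch of per-variable contributions to the SPE/Q statistic of a single PCB,
# reusing the hypothetical fit_mspm_pca() model shown earlier.
import numpy as np

def spe_contributions(model, x_new):
    """Squared residual of each variable: the largest values indicate the
    deposits/features most responsible for an out-of-control SPE value."""
    z = (x_new - model["mu"]) / model["sigma"]
    t = model["pca"].transform(z.reshape(1, -1))
    e = z - model["pca"].inverse_transform(t).ravel()
    return e**2
```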

The aforementioned concept of *conditionally expected common cause variation* differs from Shewhart's original perspective, but is still inspired by it. It requires and builds upon extensive knowledge about the process physics and accumulated data. As Industry 4.0 unfolds, the evolution of Digital Twinning technology is likely to create similar opportunities to use accurate models of the process, namely for process monitoring as proposed in this article.

The present article is organized as follows. In Section 2, a brief description of the process and datasets to be analyzed is provided. Then, in Section 3, we introduce in detail the proposed data augmentation methodology (Artificial Generation of common cause Variability, AGV). The monitoring scheme based on the augmented NOC dataset is described in Section 4. The results from the application of the proposed data augmentation methodology for process monitoring to several real-world industrial datasets are presented and discussed in Section 5. Section 6 further extends the discussion of the results and their consequences. Finally, in Section 7, we provide a brief summary of the main advantages and limitations of the proposed methodology and refer to future work to address these limitations and make the approach more sensitive to localized faults affecting a relatively small number of elements.
