Following the framework development, an application is demonstrated on data from the brownstock washing department of a dissolving pulp mill in order to show how the architecture fits together and how data are interpreted for decision making.
Measurements taken in industry contain several types of errors that can be grouped under three main categories: abnormalities, random errors, and gross errors. These errors are corrected in a series of five steps. First, given that data are always processed according to a specific and targeted goal, the objective of the analysis (the scope) is defined, and data are collected (step 1). Then, data are processed, i.e., they undergo a synchronization step and a lag correction, abnormal operations are removed, and the signals are filtered to remove unwanted high-frequency components (step 2). The third step is to detect the steady-state data series that will allow the data reconciliation step (step 4). In step 5, the various operating regimes are detected and identified. This framework is limited to offline operating data (as opposed to real-time online process data), as they are an important source of process insight.
The proposed framework focuses on data that are representative of normal and desirable operating conditions. Abnormal operations, fault detection, and diagnosis are critical, but they are not the subject of this paper. Therefore, shutdown and startup periods as well as outliers are removed, and steady-state periods are detected.
Lastly, all steps have their own internal feedback loop. For instance, in the data reconciliation step, the moment one gross error is eliminated, another may be found; a practical way to perform data reconciliation is to remove gross errors one at a time.
4.4. Step 4: Unit-Wide Steady-State Data Reconciliation
Following SSD, the remaining time-series data are reconciled. Data reconciliation (DR) consists of identifying systematic (gross) measurement errors—inconsistencies with respect to known conservation laws—and more specifically persistent gross errors. This is done by fitting the data to the material and energy balances. This analysis indicates the presence of systematic error, but it does not specify which instruments (sensors) are wrong, nor whether more than one sensor is at fault.
The primary purpose of applying DR is to improve industrial plant data considering the inherent uncertainty and complexity in the process measurement systems. This is done by making sure they satisfy all material, energy, and momentum equality constraints or balances and any other inequality constraints or bounds that may be justifiably included. Therefore, DR leads to the identification/diagnosis of defective, faulty, and/or inconsistent measuring instruments. Furthermore, DR provides estimates of unmeasured quantities and qualities to be used, for instance, in daily, weekly, or monthly process, production, and/or yield accounting reports. When the raw measured values are substituted by the reconciled values and the estimated (regressed) unmeasured variables, all balances equal zero.
The process of data reconciliation aims to align measured process data with the principles of conservation of matter. By solely relying on these conservation laws, gross errors can be identified. However, if these laws do not reveal any errors, it does not mean that the data are necessarily free of gross errors. More complex models, incorporating transport phenomena, fluid mechanics, and reaction kinetics details, may uncover additional errors. Material, energy, and momentum balances are typically just a subset of the available constraints, and adding more constraints may uncover errors that were not detected using only these three [
75].
Prior to running the DR algorithm, the process variables are categorized as measured, unmeasured, or fixed (constant in the process). After running the solver, measured variables can be flagged as redundant or not, whereas unmeasured variables can be marked as observable or not. This classification highlights which variables can be reconciled (redundant variables), which ones can be estimated (observable variables) and, finally, those whose reconciliation is not possible and whose accuracy remains unchanged (non-redundant). A measurement is redundant when, if it is removed (marked as unmeasured), that value/variable is still observable. An observable variable, in turn, is one that is uniquely calculated from the model and measurements. Although data reconciliation can only take place if there is redundancy in the system, i.e., if there are more constraints or model equations than unmeasured variables (the algorithm still runs when the degrees of freedom are null, but all sensors then become non-redundant), observability is more important. Among unmeasured variables, it is necessary to define which ones can be calculated (which ones are solvable). Indeed, unobservable variables are non-unique and unreliable, and their values are arbitrary in the sense that they only satisfy the model constraints numerically.
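To make this classification concrete, the sketch below illustrates, for purely linear balance constraints, how observability and redundancy could be checked numerically with rank tests; it is a minimal illustration under assumed matrix inputs, not the classification procedure of any particular DR package.

```python
import numpy as np

def classify_variables(A_meas, A_unmeas):
    """Observability/redundancy check for linear constraints A_meas @ x_m + A_unmeas @ x_u = 0.

    A_meas   : constraint coefficients of the measured variables (n_constraints x n_measured)
    A_unmeas : constraint coefficients of the unmeasured variables (n_constraints x n_unmeasured)
    Returns two boolean lists: observable (one per unmeasured variable)
    and redundant (one per measured variable).
    """
    rank_u = np.linalg.matrix_rank(A_unmeas)
    # An unmeasured variable is observable when its column is not a linear
    # combination of the other unmeasured columns (its value is then unique).
    observable = [
        np.linalg.matrix_rank(np.delete(A_unmeas, j, axis=1)) < rank_u
        for j in range(A_unmeas.shape[1])
    ]
    # A measurement is redundant when the corresponding variable would still be
    # observable if the measurement were removed (column moved to the unmeasured set).
    redundant = [
        rank_u < np.linalg.matrix_rank(np.hstack([A_unmeas, A_meas[:, [j]]]))
        for j in range(A_meas.shape[1])
    ]
    return observable, redundant
```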
If there are unobservable variables in the system, there are three approaches to remove the unobservabilities: (1) add new sensors, which requires capital investment; (2) add more equations, if possible; and (3) reconfigure and simplify the model to remove the unobservable variables, i.e., the model may be too granular.
Employing the framework may lead to the recommendation of implementing additional sensors to increase redundancy. The sensor network can be redesigned using the data reconciliation concept. That could be achieved either by using an optimization algorithm that maximizes observability, redundancy, and precision, or manually with a simulation, adding sensors one at a time, running data reconciliation, and assessing whether observability, redundancy, and precision improved. That would maximize the reliability of decision making based on data interpretation.
Following variable classification, a data reconciliation model based on physical models (thermodynamic, material, and energy balance models) is built to identify inconsistencies. This model represents the flowsheet of the plant or sub-plant; it is a subset of the equations used in a process simulation. This step requires a profound comprehension of the process. The DR model relates to the fundamental laws of material, energy, and momentum conservation, and it also depends on the processing operations (operating modes, grades, equipment set-ups, start-ups, shutdowns, and switchovers as well as recycle/recirculation loops via bypassing, repetitive cleaning and purging, etc.). From a DR perspective, these variations in the processing operations may be mathematically expressed using the notion of temporary stream variables (versus permanent) with zero/one, open/closed, on/off or active/inactive stream switches. Temporary streams can be switched on or off depending on the state of the process. Usually, industrial plant operations know a priori when mode, grade, and/or any other type of processing logic changes will occur via their planning, scheduling, and coordinating department. As such, before, during, or after the changeover occurs, manual and/or automatic indicators such as valve actuator positions, key process conditions, and pump/compressor starts/stops will be available to aid in determining when and which stream switches changed. The DR model represents all possible streams, both permanent and temporary. Most of the streams are permanent in industrial processes—these are represented as solid lines in
Figure 7. Nevertheless, when the process conditions change (going into another operating mode), there are stream switches or temporary streams (represented as dotted lines in
Figure 7). Thus, when a process switches to a different operating mode, some flows are no longer required whereas others may be opened just for the time being. Therefore, handling those stream-switch configurations can reliably indicate that a certain mode, grade, or changeover occurred, thus reducing the possibility, plausibility, and probability of occurrence of model gross errors.
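As a minimal illustration of how such zero/one stream switches could enter a balance constraint, the toy sketch below multiplies temporary streams by a binary switch; the stream names and flow values are hypothetical.

```python
def node_balance_residual(flows_in, flows_out, switches=None):
    """Mass balance residual around a node, with temporary streams handled by 0/1 switches.

    flows_in, flows_out : dicts of stream name -> flow value
    switches            : dict of stream name -> 0 or 1 (1 = active); permanent streams default to 1
    At steady state, the residual of the active configuration should be close to zero.
    """
    switches = switches or {}
    total_in = sum(switches.get(name, 1) * q for name, q in flows_in.items())
    total_out = sum(switches.get(name, 1) * q for name, q in flows_out.items())
    return total_in - total_out

# Normal operating mode: the temporary bypass streams are switched off (inactive).
residual = node_balance_residual(
    flows_in={"feed": 120.0, "bypass_return": 15.0},
    flows_out={"product": 120.0, "bypass_out": 15.0},
    switches={"bypass_return": 0, "bypass_out": 0},
)
```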
Next, the model is validated, i.e., it is verified that there are no model errors. Moreover, it should be validated for each “top-down” operating regime. Indeed, in order to properly validate the model, the operating regimes should be acknowledged since, for every operating mode, the operations change and there might be different streams (temporary streams). Therefore, if the operating mode changes, the model could be wrong if it is not adapted properly. Hence, the temporary streams due to operating-mode or grade switches are properly accounted for in the model structure.
Before the model is validated, both model and measurement gross errors are to be expected in the process data sets. Once validation is complete, only measurement gross errors remain; there should no longer be any modelling errors. Moreover, when the process is unsteady, there will be model gross error because the mass balance equations will not equal zero. Therefore, in order to identify only measurement gross errors, unsteady periods are excluded from the analysis; this is one of the reasons SSD is required.
As part of the model validation concept, a preliminary validation stage could be performed if a simulation of the plant process (or some process units) is available. The simulated values would be regressed against the DR model. If no gross errors are present, then a proper (or the actual) validation is performed based on real process data. Afterwards, gross error detection (GED) might take place.
All three concepts (pre-validation, validation, and GED) can use initial values or starting guesses for the variables taken from a simulation base case. These may be used every time DR is run, as they “prime” the DR solver with default results. The initial values provide a warm start; since the problem is non-convex (i.e., multiple local optima exist), they can influence the solution, and convergence to the global optimum cannot be guaranteed. Hence, different starting guesses may find a different local solution. Lastly, these values are only used as default results and are updated during the reconciliation solving.
Once the process model is validated, it is used to analyze steady-state industrial process data, typically averaged or aggregated. All the time-windows that are found to be statistically steady are reconciled. The steady-state data reconciliation (SSDR) is executed for each of the spans detected; it is computed based on the average value of each variable over the multiple time steps declared to be steady in the span. The SSDR algorithm is detailed in
Appendix B.
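As a small illustration of this span-wise averaging prior to reconciliation, the sketch below (using pandas, with hypothetical variable names) averages each process variable over the time steps declared steady within each span; it is not the SSDR algorithm itself, which is given in Appendix B.

```python
import pandas as pd

def average_steady_spans(df, span_labels):
    """Average each process variable over every contiguous steady-state span.

    df          : DataFrame indexed by timestamp, one column per process variable
    span_labels : Series aligned with df; a span identifier for steady rows, NaN otherwise
    Returns one averaged observation per steady span, which is then passed to the SSDR solver.
    """
    steady_rows = df[span_labels.notna()]
    return steady_rows.groupby(span_labels.dropna()).mean()
```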
4.5. Step 5: Operating Regimes Detection and Identification
The fifth step of the proposed framework is to translate processed steady-state data into operating regimes. This gives a snapshot of the plant; the idea is therefore to perform a multivariate data analysis to detect which operating regime the plant is in when it is operating in that specific way. In order to classify the snapshots, i.e., to identify the operating regime, independent key process variables (variables of interest that are important in a process unit) that distinguish the operating regimes are identified by process experts.
Starting from a “top-down” perspective, the steady-state processed data are used to build a principal component analysis model. PCA exploits historical process data and discovers hidden phenomena that may be useful for detecting unexpected/novel or unknown operating conditions. Principal component analysis makes it possible to represent process variability and to identify and understand significant correlations (between variables) that are inherent to the plant operation. It groups variables with similar characteristics into clusters regardless of the link between consecutive time steps. Standard PCA ignores dynamics (it is not a time-series technique), treating all data points (time increments) equivalently, regardless of how far apart they are timewise [
76].
Then, the principal component scores are used as input variables in a
k-means clustering algorithm [
41,
42,
43,
44]. Only the components that explain an important proportion (90% for example) of the variation among the operation variables are selected. Based on the result of the principal component analysis—starting with the number of clusters visible on the score or loading chart—the number of clusters detected by the
k-means algorithm is changed incrementally. It is widely recognized that the number of clusters to retain is determined by trial and error, testing multiple candidate values [
42]. The
k-means algorithm is computationally efficient and can be used with large datasets. This unsupervised clustering algorithm offers many insights, is simple to implement, and the clusters it returns are easily interpretable and visualizable. Lastly, a priori subject-matter knowledge can be used to set the number of clusters [
77].
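A minimal sketch of this PCA-plus-k-means sequence, using scikit-learn, is given below; the 90% variance cut-off and incremental cluster counts follow the text, while the scaling choice, the candidate range of cluster numbers, and the variable names are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_operating_regimes(X, variance_target=0.90, k_candidates=range(2, 8)):
    """PCA on scaled steady-state data, then k-means on the retained scores.

    X : 2-D array (steady-state observations x process variables)
    Returns the fitted PCA, the retained scores, and one k-means model per candidate k.
    """
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA().fit(X_scaled)
    # Keep only the components explaining ~90% of the variance among the variables.
    n_comp = int(np.searchsorted(np.cumsum(pca.explained_variance_ratio_), variance_target) + 1)
    scores = pca.transform(X_scaled)[:, :n_comp]
    # The number of clusters is changed incrementally and assessed with process experts.
    models = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(scores) for k in k_candidates}
    return pca, scores, models
```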
These clustered data are assessed by process experts to highlight what makes one cluster different from another and to assess whether they all represent distinct operating regimes. The cluster analysis gives process insight, i.e., existing regimes, actions taken by operators, consequences of actions, root cause analysis, cause–effect relationships, etc. This insight helps make the process more effective, economical, and efficient, helps in understanding the process, phenomena, and deviations, and improves decision making. This clustering analysis is a precursor of decision making; subsequent analysis (cost analysis, environmental analysis, energy analysis, etc.) is required in order to act on the process.
Additionally, the contribution analysis and the loading chart [
78] are employed to identify which variables characterize each cluster and explain the variability between steady-state operating regimes; hence, which variables are important for a particular regime, i.e., which variables are affected by the different clusters. Lastly, the experts confirm the operating regimes and explain the fundamental drivers behind them, i.e., the variables of interest help identify the clusters.
These “bottom-up” operating regimes based on a data-driven approach as well as process knowledge offer the opportunity to interpret and understand the process in more depth. Those regimes could be linked with the way the plant is run by operators since they might operate the plant differently when producing the same product.
The identified operating regimes should be mutually exclusive. Hence, specific values of some variables must be uniquely attributable to one operating regime. In other words, there is a unique set of parameter values distinguishing each regime. All operating regimes are unique, and together they cover all the possibilities for the characterization variables (attributes).
Operating regime detection and identification improves process knowledge, as analyzing each cluster may lead to discovering unknown operating modes. On the other hand, process knowledge is required to interpret the patterns extracted from the principal component analysis and to achieve an accurate representation of the operating regimes in complex processes. The PCA results must be interpreted by process experts, and they should be cautious when doing so, as PCA blindly finds correlations; the physical reality of the process plays no part in generating the statistical outputs [
76]. Therefore, PCA results should be interpreted based on an understanding of the process fundamentals. This is a task for process experts; the “bottom-up” operating regimes are detected and identified by merging a data-driven approach with process knowledge.
Performing all the previously mentioned data management steps on the raw process data—including abnormal operations removal and noise reduction—maximizes the usefulness and truthfulness of multivariate analysis, since a statistical analysis is only as good as the raw data and pre-treated data represent the process more accurately [
76]. Multivariate analyses (such as PCA) are entirely data-driven techniques, and thus highly susceptible to the issue of “garbage in, garbage out”. They are sensitive to outliers and instrument drift; the latter can appear as a long-term trend to which a multivariate algorithm could blindly ascribe statistical significance [
76].
Noise in process data represents a different problem. Since each data point (a one-hour period, for instance) is treated as a separate observation that bears little resemblance to the others, some kind of smoothing is required before bringing them all into the analysis; this filtering improves the ability of the multivariate analysis to find correlations between variables [
76]. Therefore, a direct use of the raw data would yield meaningless multivariate analysis results, since the algorithm could, for instance, blindly attribute most of the correlation to the start/stop phenomenon and not to actual changes in the process [
76].
To conclude the framework development, operating regimes resulting from the bottom-up analysis are identified through a clustering analysis based on the combination of principal component analysis, the k-means algorithm, and process knowledge. Some of these regimes are selected, according to the scope, to solve a management problem (related to process design or operations).
4.6. Application: Brownstock Washing Department in a Dissolving Pulp Mill
An application is demonstrated on data from the brownstock washing of a dissolving pulp mill (
Figure 8) in order to illustrate the benefits of the proposed framework. This unit is isolated by buffer tanks. Those tanks absorb process fluctuations, so events in one system do not impact downstream systems. Therefore, the unit can be treated as independent from the rest of the process. This system was chosen as it shows the highest redundancy (which allows a data reconciliation analysis); pulp and paper mills are known for their lack of redundancy. The results from a simulation of this department are used to increase redundancy.
The implications of each step of the framework are shown here. This section is divided in accordance with the main parts of the framework: scope definition, signal processing, steady-state detection, unit-wide data reconciliation, and operating regime detection and identification. Each step plays a critical role in the whole framework, and together they allow the use of rectified and reconciled (clean) segmented steady-state data for design decision making.
Scope definition
The intended use of the data is to assess how many bottom-up regimes can be detected and identified in the brownstock washing (BSW) through the use of the framework when the process is running smoothly (at steady-state). The top-down regime considered is the summertime when a specific grade is being produced. Brownstock washing data that fit this context are collected.
Signal processing
In this case, data synchronization and imputation are not required as all data points are sampled at the same rate, i.e., a sampling interval or time step of 10 min (process experts confirmed that a 10-min sampling interval is adequate for this SSD). A sampling interval that reduces autocorrelation in the time-series process data is used. Additionally, as the pulp crosses the whole unit in a few minutes (the overall residence time of the system is around 10 min), there is no need for intra-unit lag correction. This is confirmed by using the lag correction analysis offered in EXPLORE (version 2.2.0.814); a straight line means that there is no lag detected (
Figure 9). If a lag were present, the graphs would show waves and a cycle.
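The lag analysis itself is performed in EXPLORE; as a generic, hedged alternative, a lag between two variables could be screened with a simple cross-correlation such as the sketch below, where the signal names and the maximum lag searched are assumptions.

```python
import numpy as np

def estimate_lag(x, y, max_lag=36):
    """Return the lag (in samples) that maximizes the cross-correlation between x and y.

    x, y    : 1-D arrays sampled at the same interval (10 min here)
    max_lag : largest lag searched in either direction (36 samples = 6 h at a 10-min interval)
    A best lag of zero suggests that no intra-unit lag correction is needed.
    """
    x = (x - np.mean(x)) / np.std(x)
    y = (y - np.mean(y)) / np.std(y)
    lags = np.arange(-max_lag, max_lag + 1)
    corr = [np.corrcoef(x[max(0, -k):len(x) - max(0, k)],
                        y[max(0, k):len(y) - max(0, -k)])[0, 1] for k in lags]
    return lags[int(np.argmax(corr))]
```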
From there, start-up and shutdown periods are removed. To do so, three independent flow variables that describe the process are used as references to detect those periods. The latter are removed using threshold values specified by a process expert. Time intervals of two and four hours are removed before and after each upset period, respectively, to account for the time required to go into shutdown and to restart the system. These time intervals are based on the time required for the thickener consistency measurement to restabilize after an upset. Lastly, the removal of process shutdowns and start-ups flagged on the reference variables is mirrored onto the other variables used as part of the analysis.
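A possible pandas implementation of this removal is sketched below; the reference column names and threshold values are hypothetical, while the 10-min sampling interval and the two- and four-hour buffers follow the text.

```python
import pandas as pd

def remove_shutdowns(df, flow_cols, thresholds, pre="2h", post="4h"):
    """Mask shutdown/start-up periods detected on reference flow variables.

    df         : DataFrame indexed by timestamp (10-min steps), all process variables
    flow_cols  : the reference flow columns used to detect upsets
    thresholds : dict of column -> minimum flow below which the unit is considered down
    pre, post  : buffers removed before and after each upset (2 h and 4 h here)
    The mask built on the reference variables is mirrored onto every other column.
    """
    down = pd.Series(False, index=df.index)
    for col in flow_cols:
        down |= df[col] < thresholds[col]
    # Extend the mask backward by `pre` and forward by `post` around each upset timestamp.
    mask = pd.Series(False, index=df.index)
    for t in df.index[down]:
        mask |= (df.index >= t - pd.Timedelta(pre)) & (df.index <= t + pd.Timedelta(post))
    return df[~mask]
```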
Then, outliers are removed using quartile analysis as well as the principal component analysis (
Figure 10). As is done for the shutdown and start-up periods, key process flow variables are used to detect outliers, and this removal is reflected in all the other variables. Since there are almost 20,000 data points, outliers are not corrected but rather assigned a not-a-number (NaN) value.
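A minimal sketch of the quartile (interquartile-range) screening on the key flow variables follows; the 1.5 × IQR fence is a common convention and an assumption here, not a value stated by the authors.

```python
import numpy as np
import pandas as pd

def flag_outliers_iqr(df, cols, fence=1.5):
    """Replace outliers flagged on the reference columns with NaN, based on the IQR rule.

    Values outside [Q1 - fence*IQR, Q3 + fence*IQR] are treated as outliers.
    The rows flagged on the reference columns are mirrored onto the other variables.
    """
    out = df.copy()
    flagged = pd.Series(False, index=df.index)
    for col in cols:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        flagged |= (df[col] < q1 - fence * iqr) | (df[col] > q3 + fence * iqr)
    out.loc[flagged, :] = np.nan  # outliers are not corrected, only marked as missing
    return out
```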
Last of all, the signals are filtered using wavelet transform (
Figure 11). Given that the quality of wavelet-transform denoising relies on the optimal configuration of its control parameters, i.e., the signal cut-off level (scale), the wavelet function, and the threshold parameters, the selection is done using process knowledge in order to retain only the trend associated with the operation of the process.
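A hedged sketch of such wavelet denoising with PyWavelets is shown below; the Daubechies wavelet, decomposition level, and universal soft threshold are placeholder choices that would, as stated above, be tuned with process knowledge.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising of a 1-D process signal.

    The detail coefficients are shrunk with a universal threshold estimated from the
    finest-scale coefficients; only the low-frequency trend of the operation is retained.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # robust noise estimate
    threshold = sigma * np.sqrt(2 * np.log(len(signal)))  # universal threshold
    coeffs[1:] = [pywt.threshold(c, threshold, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]
```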
Steady-state detection
This section describes an industrial implementation of the proposed SSD algorithm. The latter requires a data-vector of time-series data discretized into a time-ordered sequence of uniformly spaced timesteps of equal duration, with the oldest data point referenced as timestep one (1). The SSD algorithm also necessitates the tuning of the time-window, the threshold, the standard deviation of the noise (if desired), and the cut-off probability; these tuning parameters are part of the configuration of the algorithm. This implies that some prior knowledge and understanding of the process is necessary.
It is not uncommon for experts at the mill (or in plants in general) to be overoptimistic about how well the process unit performs and to mention that the process runs for many consecutive days at steady-state, possibly overestimating the stability of the plant. There are mainly two factors that may explain this. First, their steady-state analysis is most of the time qualitative. This is what they aim for, expect, and wish for, although without performing a quantitative data-driven analysis, there is no way to confirm that the process is indeed at steady-state. On the other hand, detection of steady-state periods might also be a visually subjective process expert decision. However, this visual steady-state recognition approach requires continual human attention, and it is subject to human error. More specifically, noisy measurements, slow process changes, multiple dynamic trends, and change-of-shift timing are features that may compromise the visual interpretation of data. The statistically based SSD standardizes the procedure.
In the context of this application, the monitoring horizon is five months in the summertime whereas the time-window is four hours. The latter should have a number of samples equivalent to three (3) to five (5) times the time constant of the overall process divided by the sampling interval; it should be long enough for the variable to reach steady-state or equilibrium. Here, experts stated that it corresponds to the average time required to reach steady-state in the process. Since the time to reach steady-state varies with operating conditions [
73], an average provided by a process expert is used. Too short a time-window will not give the process sufficient time to reach some level of stability, so the steady-state probability will always be low. On the other hand, too long a time-window may lead to the false conclusion that the signal is steady when in fact it is not. Additionally, long windows are not well suited for detecting unsteady behavior of short duration; it is harder to detect. Therefore, the time-window must be long enough to account for the system dynamics, but short enough to detect undesirable, short-duration changes in the process value.
Additionally, since different variables present different dynamics, a slow process should use a longer time-window, while if the dynamics are fast, the time-window duration should be shorter. Therefore, the time-window is the parameter that can account for both the sampling interval and the process dynamics. Furthermore, it is possible for every KPV to have its own time-window size [
79].
The threshold is set at a 95% confidence level (Student-t score). This parameter relates to how important it is for a period to be labeled properly; for a critical application, a higher α should be used. Therefore, if the steady-state detection is not extremely critical for a particular application, a 5% probability of incorrectly rejecting steady-state is accepted.
Then, even though the standard deviation of the noise could be assessed with Equation (A5), it is provided here for all key process variables (KPVs) targeted to detect steady-state periods. As the standard deviation of the noise increases, the maximum steady-state span increases as well—more data are deemed at steady-state and the spans get longer.
The KPVs selected by a process expert represent the process and the operations undertaken by the mill well; they have a significant impact on the latter and they should have as little autocorrelation as possible. Experts decide on the minimum number of KPVs required to detect whether the process is at steady-state, and then set up the procedure on that minimal set. For this application, a process expert at the mill mentioned that the best indicator of process stability is when three distinct pulp flows meet the steady-state conditions at the same time.
The SSD algorithm returns a probability: a value near one (1) indicates that the data-vector is steady, whereas a value near zero (0) indicates that it is unsteady. The determination of the probability limit that indicates a steady or unsteady signal is the responsibility of the user. A suitable cut-off value for whether the process is deemed to be at steady-state depends on the application. In this case, the cut-off probability is set to 95% (probability of the time-window being steady) for each KPV. Once the tuning parameters are established, the SSD function is called for every time-window until the end of the monitoring horizon.
This approach, which uses a combination of good judgment, knowledge and interpretation of process operations, and statistics, has two different modes. The first one runs every time step (sampling interval) continuously (every 10 min in this case)—it incrementally moves over the smallest time step. Therefore, every 10 min, the algorithm looks back over the time-window duration and declares whether the process has been steady or unsteady for the past 4 h. The second mode is the batch one, where the algorithm runs every 4 h, still looking back, declares the whole time-window steady or unsteady, and then moves forward to another time-window to conclude on its steadiness. In both cases, however, the time-window duration is the minimum length of time for which the process can be deemed steady or not, and for which the steady-state duration is judged representative; the algorithm will not find anything shorter than the time-window. In other words, every time the process is declared to be at steady-state, it has been steady for at least the past time-window. As part of this proposed framework, the first mode is preferred.
In order to label a period as steady, all the key process variables must be steady at the same time, i.e., all KPVs must have a probability higher than 95%. It would also be possible to label a period as being at steady-state if a specified fraction of the KPVs is at steady-state at the same time. In this case, since there are only three of them, all three must meet the probability threshold.
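The authors' SSD algorithm returns a steady-state probability per KPV; as a simplified, generic stand-in, the sketch below applies a window-based variance test per KPV (a chi-square test against a known noise variance, which is our assumption, not the authors' statistic) and labels a time step as steady only when all KPVs pass simultaneously, following the first (rolling) mode described above.

```python
import numpy as np
from scipy import stats

def window_is_steady(x, noise_sd, alpha=0.05):
    """Simplified window test: the window is deemed steady when the sample variance about
    the window mean is not significantly larger than the known noise variance (chi-square test).
    """
    n = len(x)
    chi2_stat = (n - 1) * np.var(x, ddof=1) / noise_sd**2
    return chi2_stat < stats.chi2.ppf(1 - alpha, n - 1)

def rolling_ssd(data, noise_sd, window=24):
    """Rolling mode: move one time step (10 min here) at a time and flag a step as steady
    only when *all* KPVs are steady over the past `window` samples (24 samples = 4 h).

    data     : 2-D array (time steps x KPVs)
    noise_sd : sequence of per-KPV noise standard deviations
    """
    n_steps, n_kpv = data.shape
    steady = np.zeros(n_steps, dtype=bool)
    for t in range(window, n_steps):
        steady[t] = all(
            window_is_steady(data[t - window:t, j], noise_sd[j]) for j in range(n_kpv)
        )
    return steady
```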
As a result, the algorithm evaluated that, out of the five (5)-month monitoring horizon, the process is at steady-state 32% of the time. However, a major challenge in this case is the constant starting and stopping of production lines and pieces of equipment. Consequently, knowing that there are sporadic and persistent steady-state periods, a run length of a minimum of 12 h is targeted to deem a period persistently steady; when the process runs reasonably steady for a while, these operating data might be used as a basis for decision making. The run length is the amount of time or the number of samples required to be confident in declaring that the process is persistently steady. In the algorithm, a second routine identifies the contiguous sets of steady data. This routine is a data function that determines the statistics of the contiguous spans over which the process is steady; a start, stop, and span range are assessed, as well as the maximum and minimum number of data points in a span and the span mean, mode, median, and variance. The following table (
Table 2) presents information about the collection of contiguous steady-state regimes.
Considering only the periods during which the process unit is contiguously steady for longer than the run length, a total of 20 spans are found. Once all steady-state periods are detected, they individually and separately go through the process of data reconciliation to determine whether there are measurement gross errors—each contiguous steady-state set respecting the run length becomes a dataset for data reconciliation.
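A short sketch of such a routine, extracting contiguous steady spans and keeping only those at least as long as the run length (72 samples = 12 h at a 10-min interval), is given below; the function name and interface are illustrative.

```python
def contiguous_steady_spans(steady_flags, min_run_length=72):
    """Return (start, stop) index pairs of contiguous steady periods at least
    `min_run_length` samples long (72 samples = 12 h at a 10-min sampling interval)."""
    spans, start = [], None
    for i, flag in enumerate(steady_flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run_length:
                spans.append((start, i))
            start = None
    if start is not None and len(steady_flags) - start >= min_run_length:
        spans.append((start, len(steady_flags)))
    return spans
```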
Unit-wide data reconciliation for various datasets
As part of this framework, the data reconciliation problem is modeled from a mathematical programming perspective as opposed to a matrix algebra perspective. From this standpoint, every stream variable has a reconciled value and an adjustment (revision) value. In order to perform data reconciliation, a process model is built and validated. This model consists of a set of equations (see
Supplementary Material for the model). There are sensor constraint equations, i.e., adjustment + reconciled = measurement, and model constraints, which are the laws of conservation of material, energy, and momentum, i.e., mixer, splitter, process, and density equations. The process recycling loops are included in the DR model. These generate unbounded variables, which lead to an infinite number of solutions. Hence, the simulation results are used to evaluate the split fraction and create a hard constraint.
The equations are a mixture of linear and non-linear constraints. This makes the DR problem more difficult to solve, but it is important to represent the process in its most fundamental way to reduce model gross errors. Having non-linear equations makes the problem non-convex; the problem is subject to local solutions, i.e., there are local optima. Thus, the solver is run a few times and, each time, it randomizes the initial values (starting value generation). This also helps with convergence issues. IMPL-DATA can solve non-linear data reconciliation problems.
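To illustrate the formulation (weighted least-squares adjustments subject to non-linear balance constraints, with randomized restarts to mitigate local optima), the sketch below uses SciPy on a hypothetical single-mixer example; it is a generic illustration, not IMPL-DATA, and the measurements and standard deviations are invented.

```python
import numpy as np
from scipy.optimize import minimize

# Toy example: one mixer, streams 1 + 2 -> 3, reconciling volumetric flows and consistencies.
meas = np.array([100.0, 52.0, 148.0, 0.040, 0.115, 0.068])   # F1, F2, F3, c1, c2, c3
sd = np.array([2.0, 1.0, 3.0, 0.002, 0.003, 0.002])          # measurement standard deviations

def objective(x):
    # Weighted least squares on the adjustments (weight = 1 / variance).
    return np.sum(((x - meas) / sd) ** 2)

constraints = [
    {"type": "eq", "fun": lambda x: x[0] + x[1] - x[2]},                      # volume balance
    {"type": "eq", "fun": lambda x: x[0] * x[3] + x[1] * x[4] - x[2] * x[5]}, # component balance
]

best = None
rng = np.random.default_rng(0)
for _ in range(5):  # multistart: randomized initial values to mitigate local optima
    x0 = meas * (1 + 0.05 * rng.standard_normal(meas.size))
    res = minimize(objective, x0, constraints=constraints, method="SLSQP")
    if res.success and (best is None or res.fun < best.fun):
        best = res
reconciled, adjustments = best.x, best.x - meas
```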
Generally, adding constraints to any kind of mathematical minimization problem will either increase the value of the objective function or leave it unchanged if the information is redundant. The only way adding a constraint could reduce the objective function value is if the solution is a local optimum (the solver converges to a different local optimum).
The model is validated when it is consistent with the simulation data (given some tolerance). Hence, the process simulation is used for the validation of the data reconciliation model. When the DR model is validated, it is possible to say with confidence that, from then on, only measurement GE will be identified.
In industry, not every stream has a flow and/or consistency measurement. This application is no exception; measurements are very sparse. There are many unmeasured and unobservable variables and not enough measurements to identify gross errors. Thus, there is not enough redundancy in the mill; there is redundancy overall, but not around some pieces of equipment. Therefore, redundancy is created using the simulation results (this is not a common practice in DR). Hence, a sensor is assigned to all streams; the simulation results are considered as measurements—the simulation approximates the real process. By doing so, observability is increased. There is no level of acceptable observability, but IMPL-DATA has a pre-solve observability functionality that can be applied to better determine the observability and to improve the numerical robustness of the observability detection [
80]. The pre-solve algorithm goes through the sparse unmeasured variables incidence matrix and excludes variables that are strongly or guaranteed to be observable in the singleton constraints [
80]. Declaring these as observable shrinks the matrix. A large matrix has a large condition number; therefore, making the matrix smaller inherently means a smaller condition number, which makes it more numerically reliable. Therefore, the pre-solve provides a smaller matrix with fewer constraints and variables for the observability detection analysis.
Additionally, to build a DR model, weights are assigned to measurements. These weights reflect the reliability, precision, and accuracy of measurements. For instance, temperature measurements tend to be more accurate than flow measurements; more specifically, steam flows are problematic as they are harder to calibrate and need to be corrected for temperature and pressure to get the flows right. Liquid flows, on the other hand, are generally fairly good. Temperature measurements from thermocouples are generally decent; they do not drift much. Level measurements from tanks are, most of the time, not even reconciled, as plants probably have those values right since the operators would not let tanks overflow or underflow. Lastly, pressure signals are reasonably good, and consistency sensors have a certain range over which they work reasonably well.
In this application, only flow and consistency measurements are reconciled. Measurements are given a tolerance or precision interval, which is translated into a raw variance whose inverse is used as an objective-function weight in Equation (A8). As consistency measurements present less variation than flow measurements and are well controlled, process experts decided to assign a tolerance value of 0.5% to consistency measurements and a tolerance value of 2% to flow measurements. These tolerances are for hard sensors (direct measurements); a tighter tolerance is applied when the stream has a physical sensor. However, for values assumed from the simulation (soft sensors), a larger tolerance is considered as there is more uncertainty. Therefore, a suitable tolerance for indirect (soft-sensor) consistency measurements is 4%, whereas 5% is assumed for flows. Lastly, fixed variables are given a tolerance of 0%.
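For illustration only, one possible way to turn such a percentage tolerance into a variance and an objective-function weight is sketched below; the assumption that the tolerance covers roughly two standard deviations is ours, not a conversion stated in the paper.

```python
def tolerance_to_weight(value, tol_fraction, k_sigma=2.0):
    """Convert a relative tolerance into an objective-function weight.

    value        : measured value of the stream variable
    tol_fraction : relative tolerance (e.g., 0.02 for a 2% flow, 0.005 for a 0.5% consistency)
    k_sigma      : how many standard deviations the tolerance is assumed to cover (assumption)
    weight = 1 / variance, so tighter tolerances pull the reconciled value closer to the measurement.
    """
    sigma = tol_fraction * value / k_sigma
    return 1.0 / sigma**2

w_flow_hard = tolerance_to_weight(150.0, 0.02)   # hard flow sensor, 2% tolerance
w_cons_soft = tolerance_to_weight(0.11, 0.04)    # simulated (soft) consistency, 4% tolerance
```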
Following the tolerance analysis, every variable on every stream has an indicator of whether it is measured (1) or unmeasured (0). This is the sensor switch. When there are many zeros in the sensor switches, the results start showing unobservability, and some flows present negative values (a sign of GE). Therefore, to maximize observability, a tradeoff must be found between labeling variables as unmeasured and getting negative flows. The sensor switch helps turn off the wrong sensors, either hard or soft; to keep a sensor, a value of one (1) is used. Therefore, even if a sensor is assigned to all streams, a value is assumed only for those whose sensor switch is set to one (1).
In addition to the sensor switches, which indicate whether a sensor is good and whether it may be trusted, stream switches manage permanent and temporary streams. The latter must be considered to make sure that the reconciliation results make sense. Since processes have different operating regimes, the DR model changes and must therefore allow streams to be turned on and off, i.e., it is parameterized in the sense that streams may be active or inactive. Logic is required to manage whether the streams are active or not. For instance, when a stream has a flow sensor, the latter could be used as a stream switch by setting the sensor value to zero when required.
When plants perform data reconciliation, it is mostly based on mass balances. However, most of the flows in plants are volumetric. Therefore, to convert those to mass, density measurements are required. The latter can be assumed or calculated with an ad hoc formula, densities can come from a simulator, or they can be sampled every week or every day at the lab. However, densities are not as good a measurement as flow or consistency because they are not continuous; they are sampled intermittently. Ultimately, the reliability of the density data is poor, primarily because there is little, if any, statistically driven measurement quality feedback being transferred or relayed back to the engineering, operations, instrumentation, and maintenance departments. Additionally, they could be wrong because they are based on assumptions about the process and operating modes that do not necessarily apply. Given the inherent sparsity in the density measuring system, these densities may be biased (wrong, not accurate), especially if they are operating-condition or operating-mode dependent. Hence, many problems can come from inferring the densities. Performing DR can help remove that bias, provided that bad densities are detected, identified, and removed from the data reconciliation problem.
Once all variables are accounted for and the model equations are set, a degrees-of-freedom analysis is performed. A negative value of the degrees of freedom (number of variables minus number of equations or constraints minus number of measurements or fixed variables) is expected, since the simulation values were used as substitutes for measurements. This non-linear data reconciliation problem reconciles both volume flow and density simultaneously, involving volume, density, consistency, and mass balances. In the present application, there are 53 sub-units, 107 streams, 19 flow sensors, and 4 consistency sensors. The sub-units in the brownstock washing are mixers (18), splitters (9), and processes (26). For mixers, both volume (28) and mass (18) balances or equations are required, splitters count 46 equations, and, lastly, process equations amount to 38 mass balances and 30 volumetric equations. In addition to these, the BSW process unit accounts for 107 density equations. Then, as each stream has a flow, consistency, and density value, there are 321 variables. Therefore, the number of variables, equations, and measurements (including those assumed from the simulation) yields a DOF of .
In this analysis, the qualities/intensive variables (compositions—fractions, consistency; properties—density, molecular weight; and conditions—temperature, pressure, velocity) and the quantities/extensive variables (flow, mass inventory, moles, volume, energy, and momentum) are kept distinct so as to have different constraints on the individual variables: volumetric flow, consistency, and volumetric flow*density*consistency (to obtain the mass flow). Both variable categories are reconciled simultaneously.
This DR problem only considers material balance (flow in − flow out = 0). However, if there is a hold-up unit in the model (tank, drum, vessel), the hold-up balance (flow in − flow out + opening − closing = 0) could be performed considering measurements of the level (hold-up, inventory). To do so, the opening hold-up is considered as a fixed variable, meaning that it has 0 uncertainty, or its weight is infinity, and then only the closing value is reconciled. Another way to do this would be to reconcile the difference between the opening and the closing.
In summary, when performing data reconciliation, the first step is to find unobservable unmeasured variables. DR would still run with these, but it would calculate numbers that are completely meaningless since they are non-unique. Hence, from a mathematical perspective, there is no issue; from an engineering perspective, however, unobservable unmeasured variables may be problematic. If required, all unmeasured variables can be made observable either by adding assumed values or by shrinking the data reconciliation model. Then, after running the algorithm, one must make sure that there are no negative flows greater than the constraint and convergence tolerances, because they mean that there are measurement gross errors. In fact, negative flows could also mean that their model directions should be reversed. Once all the negative flows have been resolved, gross error detection may begin.
For this application, data reconciliation is run offline on averaged steady-state data. However, data reconciliation is generally run online on hourly averaged steady-state data in order to detect the most persistent or sustained sensors with GE. The execution frequency of DR should be based on how many gross errors could happen within that time frame; the goal is to run it when there is zero or no more than one. As a matter of fact, since there are sporadic and persistent GE, a monitoring report is produced when reaching the end of the reporting horizon (the moment when conclusions about DR are made), such as a shift, a week, or a month, to inform process experts about the most problematic sensors. Hence, only persistent GE are recorded or added to the ongoing list; they may also be ranked. The ratio of the number of acceptances to the duration of the reporting horizon gives a probability of occurrence and indicates what percentage of the time a sensor is persistently faulty. Therefore, deploying online unit-wide data reconciliation may continuously improve the reliability of process data, assure that the sensor network is functioning with consistency and integrity, and provide the level of assurance required for descriptive, predictive, and prescriptive analytics.
Nevertheless, in this case, DR is run on the averaged steady-state spans of arbitrary duration, but always longer than the run length.
Table 3 presents the objective function value (Equation (A8)) for all spans. According to the total number of DOF and the 95% confidence level, the chi-squared statistic is 85.965. The objective function values are all greater than the chi-squared statistic; hence, all of the 20 steady-state spans found from SSD contain gross errors. Column 3 of
Table 3 gives the worst (largest) maximum power measurement test values (Equation (A9)) across all variables in each span. Lastly, the next column provides the values of Equation (A10). Since all values are higher than the chi-squared critical value (with one less DOF), 84.821, there is more than one gross error in each span. Therefore, it is difficult to isolate the most likely bad sensors or to reliably identify the sensors with gross errors. Considering all the measurements in each span with an MPMT value close to the worst one, it was concluded that the top four persistently problematic sensors across the steady-state spans that would need to be verified are three flow meters (06FIC137, 06FIC152, and 06FIC433) and one consistency sensor (06NIC423). This information is transmitted to process experts so that they can look at the problematic sensors. Knowing which sensors are faulty is important because experts base their analyses (such as optimization) on these data, and, until these sensors are fixed, gross errors are present in the datasets.
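The screening logic behind Table 3 can be sketched generically as follows: a global chi-square test on the reconciliation objective and a simplified per-measurement test based on standardized adjustments. Equations (A8)–(A10) define the exact statistics used by the authors; the snippet below is only an approximate stand-in with hypothetical inputs.

```python
import numpy as np
from scipy.stats import chi2

def global_gross_error_test(objective_value, dof, alpha=0.05):
    """Global test: the span contains at least one gross error when the reconciliation
    objective exceeds the chi-square critical value (85.965 at the 95% level here)."""
    return objective_value > chi2.ppf(1 - alpha, dof)

def ranked_measurement_tests(adjustments, sds):
    """Simplified measurement test: rank sensors by the absolute standardized adjustment.
    The true maximum power measurement test uses the full adjustment covariance matrix;
    this normalization is an approximation for illustration only."""
    z = np.abs(np.asarray(adjustments)) / np.asarray(sds)
    order = np.argsort(z)[::-1]
    return order, z[order]
```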
Operating regimes detection and identification
In general, in the pulp and paper industry, there is little to no acknowledgement of the fact that a process has many different steady-state operating periods. In other words, operating regimes are not explicitly considered for decision making [
8]. However, the value and potential of operating regime detection and identification are well recognized. In the present study, a model based on principal component analysis as well as the
k-means algorithm is used to identify the operating regimes of the brownstock washing department of a dissolving pulp mill. In this application, five months of data are used to detect and identify the process operating regimes.
The clusters apparent on the score chart of the PCA (
Figure 12) were confirmed with a
k-means clustering analysis detailed in
Section 4.5. The first and second components separate dissimilar observations and group together similar ones. The variables that influence these clusters can be observed through a contribution analysis (
Figure 13). Analyzing the results with process experts, it is found that the main drivers for the “bottom-up” operating regimes are the pulp level in tanks, its density, and the shower wash water flow rate. The clusters represent changes in the operating conditions.
Clustered data can be difficult to interpret, and since they are interpreted by process experts, errors can happen. Interpretation errors are part of a continuous improvement process; experts gain insight through the data processing framework.
Lastly, the loading chart of the first and second components is shown in
Figure 14. This chart identifies which variables characterize each cluster and explains the variability between the different regimes. Variables close to the center of the chart do not have much importance for components 1 and 2 (in this case), whereas variables away from the center and close to either component are of great importance—the further from the center, the more influence they have. Lastly, those located diagonally are influenced by both components.
Figure 14 shows, for instance, that the pulp consistency explains much of the variability in the first component, while the pulp density explains most of the variability in the second component.