5.2.2. Nonrepresentative Training Data

The effects of nonrepresentative training data on consistency-based MCS topology design and on redundancy-based design are evaluated. The consistency-based topology is obtained by Algorithm 1 with parameter *α* = 0, as argued in Proposition 4. Its redundancy-based counterpart is obtained by Algorithm 2. To ensure highly redundant information sources in fusion nodes, the parameter *η* is set to 0.6, i.e., sources are added to a fusion node only if their redundancy is greater than or equal to *η*.
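The thresholding step can be sketched as a greedy grouping: a source joins a fusion node only if its redundancy to every current member is at least *η*. The function `group_by_redundancy` and the toy redundancy function below are hypothetical stand-ins; Algorithm 2 and the redundancy metric *ρ* (18) are defined in the paper and differ in detail.

```python
# Hypothetical sketch of redundancy-threshold grouping (not the paper's
# Algorithm 2): a source is added to a fusion node only if its redundancy
# to every member is >= eta.

def group_by_redundancy(sources, rho, eta=0.6):
    """Greedily build fusion nodes from a list of sources."""
    nodes = []
    for s in sources:
        placed = False
        for node in nodes:
            if all(rho(s, m) >= eta for m in node):
                node.append(s)
                placed = True
                break
        if not placed:
            nodes.append([s])
    # Fusion nodes with fewer than two sources are omitted, as in the tables.
    return [n for n in nodes if len(n) >= 2]

# Toy redundancy: sources with close indices are highly redundant.
rho_toy = lambda a, b: max(0.0, 1.0 - 0.3 * abs(a - b))
nodes = group_by_redundancy([1, 2, 3, 10, 11, 30], rho_toy, eta=0.6)
```

With the toy metric, sources 1 and 2 (and 10 and 11) form nodes, while isolated sources are dropped.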

Both design approaches are applied to the Sensorless Drive Diagnosis (SDD) dataset [82,83], a multi-class classification dataset. (The SDD dataset is available for download at the University of California Machine Learning Repository [84].) Nonrepresentative training data are simulated by withholding data of certain classes from the design algorithms, creating a situation of epistemic uncertainty.

For the creation of the SDD dataset, an electromechanical drive was monitored to detect faulty system behaviour. The data comprise features obtained from phase-related motor currents and voltages. Each feature serves as an information source in this evaluation. The dataset is particularly interesting because (i) it contains highly noisy data and (ii) the data are often linearly or non-linearly correlated and thus potentially redundant. The SDD dataset contains 11 classes in total, of which class 1 represents healthy system behaviour (henceforth referred to as the normal condition). All other classes represent various fault states, such as gear or bearing damage.

The design algorithms are executed on two subsets of the dataset. First, only data belonging to the normal condition build a reduced training dataset. This reduced set manifests epistemic uncertainty. It is nonrepresentative with regard to the complete behaviour of information sources. For comparison, the second subset is constructed to include all data, i.e., the complete dataset serves as training data.

Regarding the preprocessing steps, the unimodal potential function (31) is trained on the normal condition with parameters *D*<sup>l</sup> = 2 and *D*<sup>r</sup> = 2. To account for the noise in the dataset, possibility distributions are modified with reliability parameters ∀*S* ∈ **S** : *rel*(*S*) = 0.9 and *β* = 1. Additionally, memberships are smoothed with a moving average filter using a window size of 5. As the SDD dataset provides data as singular values, the preprocessing steps result in rectangular possibility distributions.
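These two steps can be sketched as follows. The discounting rule shown is the common possibilistic form *π*′(*u*) = max(*π*(*u*), 1 − *rel*), which stands in for the paper's modification with *rel* and *β*; the moving average follows the stated window size of 5.

```python
import numpy as np

# Sketch of the two preprocessing steps, under assumptions: the discounting
# rule pi'(u) = max(pi(u), 1 - rel) is a common possibilistic form and only
# stands in for the paper's modification with rel and beta.

def discount(pi, rel=0.9):
    # An unreliable source cannot fully exclude any value: lift the
    # distribution to at least 1 - rel everywhere.
    return np.maximum(pi, 1.0 - rel)

def smooth(memberships, window=5):
    # Moving average filter over a time series of membership values.
    kernel = np.ones(window) / window
    return np.convolve(memberships, kernel, mode="same")

pi = np.array([0.0, 0.2, 1.0, 0.2, 0.0])
pi_d = discount(pi, rel=0.9)        # floor lifted to 0.1, peak unchanged
smoothed = smooth(np.ones(10))      # constant series stays ~1 in the interior
```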

The following behaviour is expected from the topology design approaches, which helps in verifying their output:


The results of Algorithms 1 and 2 are shown in Tables 2 and 3, respectively. Both tables show the fusion nodes found for the first layer of the two-layer fusion topology. Fusion nodes are listed for both reduced and complete training data along with the redundancy *ρ* (18), the range evidence *e*<sub>p</sub> (20), and the inconsistency evidence *e*<sub>c</sub> (19).

The results in Table 2 show that the MCS-based topology meets the expectation regarding fusion node sizes. Furthermore, each set **S**<sub>MCS−*α*</sub><sup>(*k*),complete</sup> is a subset of at least one **S**<sub>MCS−*α*</sub><sup>(*k*),reduced</sup>, e.g., **S**<sub>MCS−*α*</sub><sup>(1),complete</sup> ⊂ **S**<sub>MCS−*α*</sub><sup>(7),reduced</sup>. It is also notable that—especially but not exclusively on reduced data—some sources occur in many fusion nodes.

This applies, for example, to sources 25 and 37. Sources with little informative value are likely to be consistent with other sources because they provide possibility distributions that are wide or even close to total ignorance. Sources 25 and 37 both provide large possibility distributions covering a significant part of the frame of discernment. Lastly, no fusion node based on complete data is identical to a fusion node based on reduced data (unlike in the redundancy-based approach below); the fusion nodes differ significantly. This means that nonrepresentative data limits the performance of the consistency-based approach substantially: because epistemic uncertainty is not considered by Algorithm 1, fusion nodes are inflated with spuriously consistent information sources.
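This effect can be reproduced with the standard possibilistic consistency measure sup<sub>*u*</sub> min(*π*<sub>1</sub>(*u*), *π*<sub>2</sub>(*u*)); the paper's inconsistency evidence *e*<sub>c</sub> (19) may be defined differently, so this is an illustration of the mechanism only.

```python
import numpy as np

# A distribution close to total ignorance overlaps every other distribution
# at a high level, so it looks consistent with everything -- even with two
# precise sources that flatly contradict each other.

u = np.linspace(0.0, 1.0, 101)  # discretised frame of discernment
consistency = lambda p1, p2: np.max(np.minimum(p1, p2))  # sup-min overlap

narrow_a = np.where(np.abs(u - 0.2) < 0.05, 1.0, 0.0)  # precise source near 0.2
narrow_b = np.where(np.abs(u - 0.8) < 0.05, 1.0, 0.0)  # precise source near 0.8
ignorant = np.full_like(u, 0.95)                        # near-total ignorance

c_ab = consistency(narrow_a, narrow_b)   # disjoint precise sources: 0.0
c_ai = consistency(narrow_a, ignorant)   # ignorance "agrees" with anything: 0.95
```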

The results of the redundancy-based approach (Table 3) also meet the expectations formulated beforehand, i.e., ∀*k*, ∃*k*′ : **S**<sub>*ρ*</sub><sup>(*k*)</sup> ⊆ **S**<sub>MCS−*α*</sub><sup>(*k*′)</sup>. In contrast to the consistency-based approach, sources with little informative value (e.g., sources 25 and 37) are not part of fusion nodes. The computation of the range (22) penalises wide possibility distributions. This is because of the disjunctive fusion prior to computing the position of a set of distributions (21): sets including information items close to total ignorance are given a position close to 0.5, resulting in low range values and hence low redundancies.
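A minimal sketch of this mechanism, assuming the position of a set is the centre of gravity of its disjunctive (pointwise maximum) fusion as in (21); the exact definitions of (20)–(22) are in the paper. A near-ignorance member pins the position at 0.5 regardless of the data, so positions barely vary and the range stays low.

```python
import numpy as np

# Position of a set of distributions: centre of gravity of their
# disjunctive (pointwise max) fusion. A near-ignorance member dominates
# the max and pins the position near 0.5.

u = np.linspace(0.0, 1.0, 1001)

def position(*pis):
    fused = np.maximum.reduce(pis)             # disjunctive fusion
    return np.sum(u * fused) / np.sum(fused)   # centre of gravity

tri = lambda c, w: np.clip(1.0 - np.abs(u - c) / w, 0.0, 1.0)  # triangular pi
ignorant = np.ones_like(u)                                     # total ignorance

pos_narrow = position(tri(0.8, 0.05))                   # follows the data: ~0.8
pos_with_ignorant = position(tri(0.8, 0.05), ignorant)  # pinned at 0.5
```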

Similar to the consistency-based approach, the number of fusion nodes decreases from reduced to complete training data. This shows that the redundancy-based approach is not able to rule out all sets showing spurious redundancy. However, the majority of nodes learned on complete data are identical to nodes learned on reduced data. This is true for the sets {10, 11, 12}, {19, 20, 21, 22, 23, 24}, {31, 32, 33}, {34, 35, 36}, and {46, 47, 48}, with {7, 8, 9} coming close. This shows that the redundancy-based approach finds significant sets despite nonrepresentative training data.

**Table 2.** Fusion nodes and their contributing information sources as designed by Algorithm 1 with parameter *α* = 0. Grouped information sources are consistent for all instances of training data (see metric *e*<sub>c</sub> (19)). The left side shows fusion nodes found on reduced training data with high epistemic uncertainty, i.e., only data of the class representing the normal condition were available. The right side shows nodes found on complete data. Fusion node sets on reduced training data do not meet the required redundancy threshold (i.e., *ρ* < *η*), which is due to the low range-based evidence *e*<sub>p</sub> (20). Information sources are numbered as provided by the SDD dataset [82,83]. Fusion nodes with fewer than two information sources are omitted. In total, 24 fusion nodes were found on reduced data and 28 on complete data.


Therefore, the redundancy-based approach copes better than the consistency-based approach in situations with high epistemic uncertainty because the evidence *e*<sub>p</sub> (20) quantifies epistemic uncertainty. Nonetheless, it is advisable to update and adapt fusion nodes and topology with newly available data. This reduces the risk of retaining nodes with spurious redundancy.

Figure 7 depicts scatter plots of selected information sources to visualise the shortcomings of the consistency-based approach and to show the effects of epistemic uncertainty. Information items may be close to each other—and therefore consistent—for parts of the training data (see plots (a), (b), and (c)). This is indicated by the positions of items being clustered in the upper right corners for reduced training data. However, this does not mean that the consistent behaviour carries over to complete data (which is only true for (c)).

**Table 3.** Fusion nodes and their contributing information sources as designed by Algorithm 2 with parameters *α* = 0 and *η* = 0.6. Grouped information sources are consistent for all instances of training data and range over a significant part of the frame of discernment. The left side shows fusion nodes found on reduced training data with high epistemic uncertainty. The right side shows nodes found on complete data. Information sources (features) are numbered as provided by the SDD dataset [82,83]. Fusion nodes with fewer than two information sources are omitted. In total, 29 fusion nodes were found on reduced data and 31 on complete data.


**Figure 7.** Information items of selected information sources belonging to reduced training data (green) and complete training data (blue). Data belong to the Sensorless Drive Diagnosis dataset [82,83]. Subplot (**a**) shows information sources (features) {1, 5}, (**b**) {25, 10}, and (**c**) {43, 45}. Each point in the scatter plots represents the position or centre of gravity of a possibility distribution obtained by (21). Possibility distributions of a single pair are plotted below each scatter plot to give an intuition about the size of the distributions. In the case of reduced training data, information sources (**a**) {1, 5} and (**b**) {25, 10} belong to fusion nodes in the consistency-based approach (see Table 2) but not in the redundancy-based approach (see Table 3). Without the additional information provided by the range metric (22), the consistency-based approach groups sources that turn out to be inconsistent on complete training data. Sources (**c**) {43, 45} are given as an example in which information items are consistent over the complete training data. Both the consistency-based and the redundancy-based approach consider {43, 45} in fusion nodes. Note that the scatter plot in (**a**) is zoomed in for better visibility.

#### 5.2.3. Defective Sources

Regarding defective sources, two adaptations of the MCS topology were proposed in this paper. Both adaptations—(i) discounting defective sources and (ii) estimation-fusion-based nodes—were evaluated on data with purposely engineered source defects.

The Typical Sensor Defects (TSD) dataset [21] provides such defective sources. (The TSD dataset is available for download at https://zenodo.org/record/56358 (accessed on 9 March 2022).) The TSD dataset contains data of a storage container for hazardous and flammable materials measured, e.g., by temperature, smoke, and gas sensors. The dataset comprises several files, each of which includes a specific simulated source defect, such as incremental drift or outlier readings. For this evaluation, the files "data\_standard.csv" and "data\_drift\_0\_001.csv" are used.

The first provides unaltered data without defects. The second contains the same data with the exception that a temperature sensor (feature 15) drifts with 1‰ h<sup>−1</sup> of its base value. Regarding preprocessing, the parameters for the unimodal potential function (31) are provided as metadata in the dataset. As the data are hardly affected by noise, sources are considered fully reliable, and no averaging filter is applied. Data are provided with an error margin of ±2% of the sensor's measurement range [21], creating a uniform probability density function. Thus, preprocessing results in triangular possibility distributions.
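This is the classical probability-possibility transformation of a uniform density: a measurement *x* with symmetric error margin ±*d* yields a triangular distribution that peaks at *x* and reaches 0 at the interval bounds. A minimal sketch (the grid and values are illustrative, not TSD data):

```python
import numpy as np

# Triangular possibility distribution from a measurement x with symmetric
# error margin +-d (probability-possibility transform of the uniform
# density over [x - d, x + d]).

def triangular_pi(u, x, d):
    """Possibility of value u given measurement x with margin +-d."""
    return np.clip(1.0 - np.abs(u - x) / d, 0.0, 1.0)

u = np.linspace(0.0, 1.0, 501)
x, d = 0.5, 0.02            # +-2% margin on a [0, 1] measurement range
pi = triangular_pi(u, x, d)  # peak 1 at u = x, zero outside [0.48, 0.52]
```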

The fusion topology is learned on unaltered data using the consistency-based approach of Algorithm 1—again with *α* = 0. This creates three fusion nodes on the first layer:


Their fusion results are fused at the final node *fn*(1,2) using MCS fusion (6). For the first-layer nodes, the following fusion rules are used and evaluated:


Intermediate and final fusion outputs are computed for each of these fusion rules. The results of the same fusion rule on unaltered (standard) and drifted data are compared regarding their similarity. As a similarity measure, the possibilistic Jaccard index [32,85]

$$\text{sim}(\mathbf{p}) = \frac{\int\_0^1 \min\left(\pi\_{(k),\text{standard}}^{\text{fu}}(\mu), \pi\_{(k),\text{drift}}^{\text{fu}}(\mu)\right) \text{d}\mu}{\int\_0^1 \max\left(\pi\_{(k),\text{standard}}^{\text{fu}}(\mu), \pi\_{(k),\text{drift}}^{\text{fu}}(\mu)\right) \text{d}\mu} \tag{32}$$

is applied. Similarities lie in the range *sim* ∈ [0, 1], with *sim* = 1 indicating full similarity. Table 4 lists the minimum, arithmetic mean, and maximum of the computed similarity values for *fn*(2) and *fn*(1,2). High similarities show robust behaviour against the defective source. As *fn*(1) and *fn*(3) contain no defective sources, they are omitted from the table.
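Equation (32) is straightforward to evaluate numerically. The sketch below discretises the integrals with a simple Riemann sum on a uniform grid; the triangular test distributions are illustrative only, not fusion outputs from the TSD data.

```python
import numpy as np

# Numerical evaluation of the possibilistic Jaccard index (32): ratio of
# the areas under the pointwise minimum and maximum of two distributions.

def jaccard(pi_standard, pi_drift, u):
    du = u[1] - u[0]  # uniform grid spacing
    num = np.minimum(pi_standard, pi_drift).sum() * du
    den = np.maximum(pi_standard, pi_drift).sum() * du
    return num / den

u = np.linspace(0.0, 1.0, 1001)
tri = lambda c, w: np.clip(1.0 - np.abs(u - c) / w, 0.0, 1.0)

sim_identical = jaccard(tri(0.5, 0.1), tri(0.5, 0.1), u)  # full similarity: 1.0
sim_shifted = jaccard(tri(0.5, 0.1), tri(0.6, 0.1), u)    # drifted output: ~1/7
```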

It can be seen from the results that renormalised conjunctive fusion, which is the default rule in MCS fusion, was affected the most by the drifting source. Measures against defective sources are therefore reasonable.
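The sensitivity can be illustrated directly. The sketch below assumes renormalised conjunctive fusion is the pointwise minimum rescaled by the consistency height; the paper's MCS fusion rule (6) additionally involves the MCS selection step, which is omitted here.

```python
import numpy as np

# Why renormalised conjunctive fusion is sensitive to a drifting source:
# the pointwise minimum of a drifted and an unaltered distribution has a
# small height, and renormalisation stretches this small overlap back to 1,
# so a single defective source visibly shifts the fused result.

u = np.linspace(0.0, 1.0, 1001)
tri = lambda c, w: np.clip(1.0 - np.abs(u - c) / w, 0.0, 1.0)

def conj_renorm(*pis):
    fused = np.minimum.reduce(pis)   # conjunctive fusion
    h = fused.max()                  # consistency height
    return fused / h if h > 0 else fused

ok, drifted = tri(0.5, 0.1), tri(0.58, 0.1)  # one source drifted by 0.08
fused = conj_renorm(ok, drifted)
peak = u[np.argmax(fused)]  # lands between the two modes (~0.54), not at 0.5
```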

The approach of detecting defective sources and discounting them by widening inconsistent possibility distributions improved the robustness slightly but not substantially. This ineffectiveness has two reasons. First, widening with (25) shifts the fusion result toward reliable sources but does not guarantee that the original fusion result is restored. It is reasonable to assume that the parameter *β* has a substantial impact, which needs to be investigated in further work. Second, a drifting possibility distribution may actually drift into other possibility distributions, creating a false most consistent subset in the process.

This may lead to situations in which the wrong source is discounted. It is assumed that the risk of this happening decreases with the number of sources in a fusion node. Estimation fusion nodes, on the other hand, showed a significant increase in robustness, evidenced by the higher minimum and mean values. Weighted estimation fusion demonstrated the best performance. Due to its averaging nature, estimation fusion reduces the effects of defective sources more effectively the higher the number of sources.
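The averaging argument can be made concrete. The sketch below works on point estimates rather than possibility distributions, so it only illustrates the scaling of the error with node size, not the paper's estimation fusion rule: a single source drifting by *d* shifts an unweighted average of *n* sources by only *d*/*n*.

```python
import numpy as np

# Averaging effect of (weighted) estimation fusion on a single drifted
# source: the induced error shrinks as the fusion node grows.

def estimation_fuse(values, weights=None):
    values = np.asarray(values, dtype=float)
    if weights is None:
        weights = np.ones_like(values)  # unweighted case
    weights = np.asarray(weights, dtype=float)
    return np.sum(weights * values) / np.sum(weights)

true_value, drift = 20.0, 5.0
small_node = [true_value] * 2 + [true_value + drift]  # 3 sources, 1 drifting
large_node = [true_value] * 9 + [true_value + drift]  # 10 sources, 1 drifting

err_small = abs(estimation_fuse(small_node) - true_value)  # drift / 3
err_large = abs(estimation_fuse(large_node) - true_value)  # drift / 10
```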

**Table 4.** Similarity between fusion node outputs on the unaltered (standard) dataset and the drift-affected dataset. The table shows the minimum, arithmetic mean, and maximum of the similarities computed on each data instance. The drift-affected source belongs to *fn*(2). Therefore, *fn*(1) and *fn*(3) are not explicitly listed. Similarity is increased by the proposed countermeasures against defective sources.

