*3.2. Combination Pattern*

Regarding uncorrelated categorical variables, enforcing their limited set of qualitative values is the main intra-feature constraint. Therefore, the interval approach cannot be replicated even if they are encoded to a numerical form, and a straightforward solution is to record each value a feature can assume. Nonetheless, the most pertinent aspect of perturbing tabular data is the correlation between multiple variables. Since the value present in a variable may influence the values used for other variables, there can be several inter-feature constraints. To improve beyond the previous solution and fulfil both types of constraints, several features can be combined into a single common record.

The combination pattern records the valid combinations to perform a simultaneous and coherent perturbation of multiple features (Figure 3). It can be configured with locked features, whose values are used to find combinations for other features without being modified. Due to the simultaneous perturbations, its 'probability to be applied', in the (0, 1] interval, can affect several features.

**Figure 3.** Combination pattern (business process model and notation).

Besides the initially recorded combinations, new data can provide additional possibilities. These can be merged with the previously recorded combinations or used as gradual updates. For a given feature and a momentum $k \in [0, 1]$, the number of updated combinations $C_i$ of a batch $i$ is given by:

$$C_i = C_{i-1} \cdot k + \text{unique}(x_i) \tag{4}$$

where $\text{unique}(x_i)$ is the number of unique combinations of the samples $x_i$ of batch $i$.
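As a quick numerical illustration of Equation (4), the following minimal sketch evaluates the update for one batch. It assumes that each sample is represented as a tuple of feature values; the helper name `updated_count` and the example values are hypothetical and only serve to show the arithmetic.

```python
def updated_count(previous_count, k, batch):
    """Evaluate Equation (4) for one batch (hypothetical helper).

    previous_count: C_{i-1}, the count carried over from the previous batch.
    k: momentum in [0, 1].
    batch: samples x_i, each assumed to be a tuple of feature values.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("momentum k must lie in [0, 1]")
    unique_in_batch = len(set(batch))  # unique(x_i)
    return previous_count * k + unique_in_batch


# Illustrative batch: three samples, two of which are identical.
batch = [("tcp", "http"), ("tcp", "http"), ("udp", "dns")]
print(updated_count(10, 0.5, batch))  # 10 * 0.5 + 2 = 7.0
```

Under this reading of the formula, $k = 0$ keeps only the contribution of the current batch, whereas $k = 1$ retains the previous count in full.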

Each perturbation created by this pattern consists of a combination randomly selected from the current possibilities, considering the locked features. It directly replaces the original values, ensuring that the features remain coherent.
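The behaviour described in this section can be sketched as follows. This is a minimal illustration under stated assumptions and not the reference implementation: the class name `CombinationPattern`, its methods, and the dictionary-based sample representation are hypothetical, and samples with no matching recorded combination are assumed to be left unchanged.

```python
import random


class CombinationPattern:
    """Hypothetical sketch of the combination pattern."""

    def __init__(self, features, locked=(), probability=1.0, seed=None):
        if not 0.0 < probability <= 1.0:
            raise ValueError("probability must lie in (0, 1]")
        self.features = list(features)
        locked = set(locked)
        # Locked features guide the search but are never modified.
        self.locked = [f for f in self.features if f in locked]
        self.probability = probability
        self.combinations = {}  # dict used as an insertion-ordered set
        self.rng = random.Random(seed)

    def fit(self, samples):
        """Record every combination of the configured features."""
        for sample in samples:
            combo = tuple(sample[f] for f in self.features)
            self.combinations[combo] = None

    def perturb(self, sample):
        """Return a copy of the sample with a coherent perturbation."""
        perturbed = dict(sample)
        # random() is in [0, 1), so probability 1.0 always applies.
        if self.rng.random() >= self.probability:
            return perturbed
        # Keep only combinations that agree with the locked features.
        candidates = [
            combo for combo in self.combinations
            if all(combo[self.features.index(f)] == sample[f]
                   for f in self.locked)
        ]
        if not candidates:  # assumption: no match means no change
            return perturbed
        chosen = self.rng.choice(candidates)
        # Replace all combined features at once to keep them coherent.
        perturbed.update(zip(self.features, chosen))
        return perturbed


pattern = CombinationPattern(
    ["proto", "service", "flag"], locked=["proto"], seed=0
)
pattern.fit([
    {"proto": "tcp", "service": "http", "flag": "SF"},
    {"proto": "tcp", "service": "ssh", "flag": "S0"},
    {"proto": "udp", "service": "dns", "flag": "SF"},
])
print(pattern.perturb({"proto": "udp", "service": "http", "flag": "S0"}))
```

In this example, the only recorded combination compatible with the locked value `udp` is (`udp`, `dns`, `SF`), so the incoherent input is replaced by it while the locked feature keeps its value.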
