1. Introduction
One of the goals of experimentation is to establish the form of the relationships that allow accurate data to be obtained for design purposes. Based on this objective, Wilkie in 1962 addressed the need to use mixed-level designs with 6 or 8 levels for one or more factors, and presented a case study as well as a statistical analysis [1]. Mixed-level designs are commonly used in different applications, especially when factors are qualitative. Mixed-level designs are defined as those in which the factors have different numbers of levels [2,3]. Regarding the factors' levels, two cases are distinguished: when the number of levels is equal for all factors, for example $D(4^5)$, the designs are called pure or symmetrical; when the number of levels differs for at least one factor, for example $D(2^4 3^1 4^1)$, they are called mixed or asymmetrical [4]. Within the group of asymmetrical designs, there are two subgroups with different characteristics. The first is a design in which some of the levels are multiples of each other, for example $D(3^1 6^1 7^1)$. The second is a design in which none of the levels are multiples of each other, for example $D(3^1 5^1 7^1)$. This research focuses on the second subgroup, which we call pure asymmetrical arrays.
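The classification above can be sketched in a few lines of Python. This is only an illustration: the function name `classify_design` and the returned labels are hypothetical, and the sketch assumes that only distinct numbers of levels are compared, so repeated factors such as the four two-level factors in $D(2^4 3^1 4^1)$ do not count as multiples of each other.

```python
from itertools import combinations

def classify_design(levels):
    """Classify a design from the list of its factors' numbers of levels."""
    distinct = sorted(set(levels))
    if len(distinct) == 1:
        return "symmetrical"
    # distinct is sorted, so a < b in every pair; b is a multiple of a
    # exactly when b % a == 0.
    has_multiple = any(b % a == 0 for a, b in combinations(distinct, 2))
    if has_multiple:
        return "asymmetrical (some levels are multiples)"
    return "pure asymmetrical"

print(classify_design([4] * 5))             # D(4^5): symmetrical
print(classify_design([2, 2, 2, 2, 3, 4]))  # D(2^4 3^1 4^1): 4 is a multiple of 2
print(classify_design([3, 6, 7]))           # D(3^1 6^1 7^1): 6 is a multiple of 3
print(classify_design([3, 5, 7]))           # D(3^1 5^1 7^1): pure asymmetrical
```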
Practical success when using mixed-level designs is due to the efficient use of experimental runs to study many factors simultaneously [5]. Fractional factorial designs are the most popular designs in experimental investigation [6]. Traditional construction methods for mixed-level fractional factorial designs often rely on complex programming, specialized software, and powerful computer equipment; see [2,3,7,8,9,10,11,12,13,14]. A considerable number of criteria have been developed to measure balance and orthogonality as quality attributes [3,7,8,15,16,17], and some of these criteria have been compared with each other [5,18]. Applications are described in [1,10,19,20,21], and techniques for augmenting designs, in which the minimal requirement is to add a number of runs equal to 50% of the initial design size, are given in [22,23]. According to the literature, there remains an opportunity to develop an algorithm with zero computational cost that allows the construction of fractions with the best levels of balance and orthogonality. Two situations are of particular interest: (1) when these designs form a design by themselves; and (2) when these designs are joined to other designs to form a new design. For example, an orthogonal fraction of the design $(n, 2^4)$ and a semi-orthogonal fraction of the design $(n, 3^1 5^1 7^1)$ can form a semi-orthogonal fraction of the design $(n, 2^4 3^1 5^1 7^1)$; this is a common practice to form a mixed-level fraction [9].
At present, the use of mixed-level fractional factorial designs in early stages of experimentation opens an important possibility for the targeted use of resources (e.g., human resources, raw materials, and machinery). It allows the experimenter to screen the influential effects economically, with advantages such as better allocation of resources, results in a shorter time, and reduced deterioration of machines and equipment. Although the advantages of mixed-level fractional factorial designs are widely known, their use has been limited because the existing techniques for generating these fractions require tools that demand extensive expertise and investment (e.g., complex methods, specialized computer equipment, specialist labor, and specialized software).
There is an interest in the development of an instrument that breaks with the need for these additional resources. This research offers a zero computational cost tool that expands the tools currently offered by the state of the art, providing the experimenter with an easy to understand and apply method that does not require complex programming and can be used by anyone with basic knowledge of statistics, and therefore facilitating the implementation of mixed-level fractional factorial design in different fields of study.
Pantoja et al. (2019) developed the NOBA (near-orthogonal balanced array) method to generate balanced orthogonal and semi-orthogonal mixed-level fractional factorial designs. The study showed that a percentage of the designs analyzed proved to be "infractionable" due to the nature of their factors [24]. Several examples of these designs, including 2 to 6 factors, are shown in Table 1 and Table 2. In these designs, several of the levels are not multiples of each other. Therefore, the least common multiple of the levels is equal to the number of runs of the design matrix. When a pure asymmetrical array is chosen to be fractioned, the size of the fraction stops being a multiple of the number of levels of at least one factor. Thus, only near-orthogonal, near-balanced arrays can be generated. For this reason, the fractions generated are called near-orthogonal, near-balanced pure asymmetrical arrays (NONBPAs). This group is clearly the least studied, since fractions belonging to it have only been published in [3]. That work introduces the concept of the efficient array (EA), the design with the best possible balance and orthogonality properties. EAs have been obtained by applying genetic algorithms to optimize an objective function formed by the sum of the standardized J2-optimality and the standardized balance coefficient (Form II). In this context, and considering that a NONBPA could be required in any field of application just as much as any other design, the importance of studying NONBPAs became evident.
Consider a shoe manufacturing company in which the implementation of a NONBPA is required. The objective is to evaluate different materials for a new shoe concept aimed at users with foot pathologies. Two response variables are of interest: pathological benefits and production costs. The required design is $(2^1 3^1 5^1 7^1)$, and the factors to consider are buttress material, lining, sole type, and slipper material. Table 3 shows the design levels. In this case, running a full factorial (210 runs) was ruled out due to the projected costs and time required. The decision was to run a NONBPA consisting of only 20 runs (9.5% of the full factorial).
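The run counts in this example can be checked with a short calculation. The sketch below also lists, for each factor, the level frequencies that are as equal as possible in 20 runs; the helper name `near_balanced_counts` is hypothetical and only illustrates why a 20-run fraction cannot be perfectly balanced for the three-level and seven-level factors.

```python
from math import prod

def near_balanced_counts(n_runs, n_levels):
    """Level frequencies that are as equal as possible for one factor."""
    q, r = divmod(n_runs, n_levels)
    # r levels appear q + 1 times, the remaining levels appear q times.
    return [q + 1] * r + [q] * (n_levels - r)

levels = [2, 3, 5, 7]       # design (2^1 3^1 5^1 7^1) from the example
full_runs = prod(levels)    # size of the full factorial
fraction_runs = 20
share = fraction_runs / full_runs

print(full_runs, f"{share:.1%}")   # 210 9.5%
for s in levels:
    status = "balanced" if fraction_runs % s == 0 else "near-balanced at best"
    print(s, near_balanced_counts(fraction_runs, s), status)
```

With 20 runs, the two-level and five-level factors can be balanced (10 and 4 runs per level), while the three-level and seven-level factors can only be near-balanced.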
A notable contribution of this research is the development of two algorithms of zero computational cost. The first algorithm allows the construction of a NONBPA fraction, and the second provides a strategy to augment these fractions with M additional runs. Both designs, the original NONBPA and its augmented version, were compared to the EAs presented in [3]. The results showed that the NONBPAs are just as good as the EAs in terms of the GBM (general balanced metric), J2 (orthogonality parameter), and the average variance inflation factor (average VIF).
The paper is organized as follows: Section 1 presents the introduction and motivation. Section 2 presents several new concepts and two algorithms (the NONBPA structure, a method to build a NONBPA with an example, and a strategy to augment NONBPAs with M additional runs). In Section 3, a comparison of NONBPAs vs. EAs is provided. Section 4 presents a practical application, and finally, the conclusions are presented in Section 5.
1.1. Mixed-Level Fractional Factorial Designs
The study of orthogonal arrays has been the focus of many investigations; two desirable properties for these arrays are balance and orthogonality. Orthogonal arrays contain pairs of linearly independent columns and are useful to evaluate the importance of several factors. Orthogonality ensures that the effects can be estimated independently [7]. For a matrix to be balanced, each possible factor level must appear the same number of times in each column. Columns whose levels do not appear with the same frequency are called unbalanced. The concept of near-balanced denotes that, although not all levels appear equally often due to design size limitations, the levels appear with frequencies as similar as possible. Balance is important because running each level of a factor the same number of times results in a uniform distribution of information across levels. Thus, there is consistency in the variances of the differences of observations in pairs of treatment combinations [3].
Mixed-level fractional factorial designs have led to the continued development of criteria to measure the quality of these arrays. Xu and Wu (2001) developed the generalized minimum aberration (GMA) criterion for comparing asymmetrical fractional factorial designs. This criterion is independent of the choice of treatment contrasts, and thus model free, and it is applicable to both symmetrical and asymmetrical designs [15]. Xu (2003) proposed the minimum moment aberration (MMA) criterion to assess the goodness of nonregular designs and supersaturated designs [16]. Xu and Deng (2005) proposed the moment aberration projection (MAP) criterion to rank and classify nonregular designs; it measures the goodness of a design through moments of the number of coincidences between the rows of its projection designs [17]. Xu (2002) presented the J2 parameter (see Section 1.3) [7]. Dean and Lewis (2006) offered an important review of these criteria from the minimum aberration perspective [21]. Liu et al. (2006) generalized the $\chi^2(D)$ criterion and investigated connections between the GMA, MMA, and MAP criteria [5]. Guo et al. (2007) defined the balance coefficient criterion for main effects and used it as an objective function to measure the degree of balance and orthogonality of near-orthogonal arrays generated with genetic algorithms. That study presents a catalog of 20 arrays, also called EAs; one characteristic of these designs is that they require a reduced number of runs while preserving high levels of balance and orthogonality [3]. Guo et al. (2009) extended the balance coefficient beyond main effects, giving rise to the GBM, a minimum aberration criterion that can be used to evaluate and compare mixed-level fractional factorial designs [8] (see Section 2.3).
Several methods exist for the construction of mixed-level fractional factorial designs. Wang and Wu (1992) proposed an approach for the construction of orthogonal designs based on difference matrices [10]. Wang (1996) presented a method for the construction of orthogonal asymmetrical arrays through the generalized Kronecker sum mixed-level matrix and mixed difference matrices [11]. Nguyen (1996) presented a method to augment orthogonal arrays with additional columns in such a way that the resulting design possesses good levels for E and other criteria [19]. DeCock and Stufken (2000) designed an algorithm for the construction of orthogonal mixed-level designs by searching some existing two-level orthogonal designs [25]. Xu (2002) developed an algorithm to add columns sequentially to a design by using the generalized minimum aberration and minimum moment aberration criteria [7]. Salawu (2012) used the balance coefficient and J2 optimality criteria to compare the two forms of balance coefficient methods using the generalized minimum aberration and minimum moment aberration criteria [26]. Fontana (2017) presented a methodology based on the joint use of polynomial counting functions, complex coding of levels, and algorithms for quadratic optimization [13]. Grömping and Fontana (2018) proposed an algorithm for the generation of mixed-level arrays with generalized minimum aberration using mixed integer optimization with conic quadratic constraints [14]. Pantoja et al. (2019) developed the NOBA method, an algorithm based on divisor factors and permuted vectors that can generate mixed-level fractional factorial designs [24].
One consequence of using a fractional factorial design is the aliasing of factorial effects. A standard follow-up strategy involves adding a second fraction called a foldover. A foldover can be constructed for various reasons. If the analysis of the initial design reveals that a particular set of main effects and interactions is significant, the foldover design can be chosen to resolve the confounding among them; if one factor is very important, it should not be confounded with other factors. Alternatively, the goal may be to dealias all, or as many as possible, of the main effects from the two-factor interactions (2FIs), or the 2FIs from each other [27,28]. A full foldover consists of adding a second fraction of the same size as the initial fraction, obtained by inverting the signs of one or more columns (for two-level designs) or by rotating one or more columns (for three-level and mixed-level designs) [29].
The foldover is only one of several augmentation techniques developed for two-level designs; other techniques include the semifold, the D-optimal semifold, the quarterfold, and the R3 algorithm. Sequential experimentation techniques for mixed-level designs include the foldover [22] and the semifold [23]. The foldover is constructed by rotating columns and the semifold by performing an exhaustive search. The foldover technique is computationally more efficient than searching for additional runs in the full factorial, which may not be practical. Its main disadvantage is that it requires the same number of runs as the initial array, so the augmented design may be large in some situations. To reduce the number of runs required by a foldover, the concept of the semifold was introduced, making it possible to reduce foldover plans to half the number of runs [23].
1.2. General Balanced Metric and Balanced Columns
Balanced columns contain all levels equally often. Therefore, a balanced matrix for main effects has a value of GBM = 0 (Equation (5)). Columns whose levels do not appear equally often are called unbalanced. The concept of near-balanced denotes that, while not all levels appear equally often due to size limitations, all levels appear as equally often as possible. Therefore, both balanced and near-balanced designs are considered to have optimal balance given the constraint on the number of runs. An unbalanced column is considered not near-balanced when it is neither balanced nor near-balanced [8]. Ghosh and Chowdhury mentioned the importance of balance for estimating some or all treatment contrasts with the same variance; they also mentioned the importance of common variance (CV) designs when the objective is to discriminate between two models having common as well as uncommon parameters. Their paper emphasizes the major role played by the uncommon parameters and generalizes the concept of CV designs to the case of at most k (≥1) uncommon parameters. They also introduce a new concept of “Robust CV designs for replications”, which allows replicated observations, and demonstrate the robustness for equally replicated observations. In addition, two general designs for three-level symmetric factorial experiments are presented [30].
Guo et al. (2009) define the GBM as a measure of the degree of balance for both main effects and interactions in a mixed-level design [8]. Consider an $n \times k$ design matrix $d = [x_1, \ldots, x_k]$, where $n$ is the number of rows and $k$ is the number of factors. Let $X_t$ denote the matrix including all $t$-factor interaction columns, so that $X_1$ is the one-factor-interaction matrix for the main effects. Note that $X_1$ is equivalent to $d$. Therefore, the whole interaction matrix involves all $t$-factor interaction matrices $X_t$, $t = 1, \ldots, k$. That is (see Equation (1)),
$$X = [X_1, X_2, \ldots, X_k]. \qquad (1)$$
Let $l_j^t$ be the number of levels of the $j$th column in $X_t$. Let $c_{rj}^t$ be the number of times the $r$th level appears in the $j$th column of $X_t$. Let $c_j^t = (c_{1j}^t, \ldots, c_{l_j^t j}^t)$ be the counts for each level for the $j$th column of $X_t$. The notation $H_j^t$ is used for the balance coefficient of the $j$th column of $X_t$. We can employ a distance function to reflect the degree of balance and define the $j$th column's balance coefficient as shown in Equation (2),
$$H_j^t = \sum_{r=1}^{l_j^t} \left(c_{rj}^t - T_j^t\right)^2 \qquad (2)$$
for the $t$-factor interaction matrix, where $T_j^t = n / l_j^t$ is fixed. Substituting $\sum_{r=1}^{l_j^t} c_{rj}^t = n$, $H_j^t$ becomes Equation (3),
$$H_j^t = \sum_{r=1}^{l_j^t} \left(c_{rj}^t\right)^2 - \frac{n^2}{l_j^t}. \qquad (3)$$
The balance coefficients $H^t$ for $X_t$ just sum the $H_j^t$ and are defined as shown in Equation (4),
$$H^t = \sum_{j=1}^{\binom{k}{t}} H_j^t. \qquad (4)$$
Then, the GBM can be defined as in Equation (5),
$$\mathrm{GBM} = (H^1, H^2, \ldots, H^k). \qquad (5)$$
For two designs $d_1$ and $d_2$, suppose $r$ is the smallest value such that $H^r(d_1) \neq H^r(d_2)$. Then $d_1$ is said to be more general balanced than $d_2$ if $H^r(d_1) < H^r(d_2)$. If no design is more general balanced than $d_1$, then $d_1$ is said to be the most general balanced design. To calculate the value of the GBM, note that $H_j^t$ (Equation (2)) represents the error between the frequency with which each level appears and the frequency with which it should appear. Therefore, for a semi-balanced column $H_j^t > 0$, and this value tends to increase as the frequency of one or more of the levels in that column moves away from the mean, which in this context corresponds to the frequency with which each level should appear.
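A minimal sketch of this calculation is given below. It reads the balance coefficient of a column as the sum of squared differences between the observed level counts and the ideal count (number of runs divided by number of levels), and sums these coefficients over all t-factor interaction columns. It assumes that the levels of a t-factor interaction column are the combinations of the levels of its t factors, so that the number of levels is their product; the function names `balance_coefficient` and `gbm` and the example design are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import prod

def balance_coefficient(column, n_levels):
    """Sum of squared deviations of level counts from n / n_levels."""
    n = len(column)
    target = n / n_levels
    counts = Counter(column)
    total = sum((c - target) ** 2 for c in counts.values())
    # Levels that never appear each deviate from the target by `target`.
    total += (n_levels - len(counts)) * target ** 2
    return total

def gbm(design, levels):
    """Return the list of summed balance coefficients for t = 1, ..., k."""
    k = len(levels)
    result = []
    for t in range(1, k + 1):
        h_t = 0.0
        for cols in combinations(range(k), t):
            # One interaction column: the combination of the t factor levels.
            combined = [tuple(run[c] for c in cols) for run in design]
            h_t += balance_coefficient(combined, prod(levels[c] for c in cols))
        result.append(h_t)
    return result

full = [[a, b] for a in range(2) for b in range(3)]  # full 2 x 3 factorial
fraction = [[0, 0], [0, 1], [1, 2], [1, 0]]          # hypothetical 4-run fraction
print(gbm(full, [2, 3]))      # all entries are zero for a full factorial
print(gbm(fraction, [2, 3]))
```

Designs can then be ranked by comparing these lists entry by entry, starting with the main-effects entry.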
1.3. J2 and VIFs for Orthogonal Arrays
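The J2 optimality parameter was proposed by Xu [7]. For an $N \times n$ matrix $d = [x_{ik}]$, a weight $w_k > 0$ is assigned to column $k$, which has $s_k$ levels. For $1 \le i, j \le N$, let (see Equation (6))
$$\delta_{i,j}(d) = \sum_{k=1}^{n} w_k \, \delta(x_{ik}, x_{jk}), \qquad (6)$$
where $\delta(x, y) = 1$ if $x = y$ and $0$ otherwise. The $\delta_{i,j}(d)$ value measures the similarity between the $i$th and $j$th rows of $d$. In particular, if $w_k = 1$ is chosen for all $k$, then $\delta_{i,j}(d)$ is the number of coincidences between the $i$th and $j$th rows. J2 is defined in Equation (7),
$$J_2(d) = \sum_{1 \le i < j \le N} \left[\delta_{i,j}(d)\right]^2. \qquad (7)$$
A design is J2-optimal if it minimizes $J_2(d)$. Minimizing $J_2(d)$ makes the rows of $d$ as dissimilar as possible.
For an $N \times n$ matrix $d$ whose $k$th column has $s_k$ levels and weight $w_k$, the following bound holds, with equality if and only if $d$ is an orthogonal array (OA) (see Equation (8)),
$$J_2(d) \ge L(n) = \frac{1}{2}\left[\left(\sum_{k=1}^{n} N s_k^{-1} w_k\right)^2 + \sum_{k=1}^{n} (s_k - 1)\left(N s_k^{-1} w_k\right)^2 - N\left(\sum_{k=1}^{n} w_k\right)^2\right]. \qquad (8)$$
$L(n)$ is the minimum value reached by J2 when a matrix is orthogonal. Therefore, since the NONBPAs are semi-orthogonal arrays, the value of $L(n)$ cannot be used as a reference point when minimizing J2. A more direct comparison is achieved by calculating the average VIF.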
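The J2 calculation can be sketched directly from its description as a sum, over all pairs of rows, of the squared weighted number of coincidences. The lower bound is coded in the form usually attributed to Xu [7]; the exact expression should be treated as an assumption to be checked against that reference. The function names `j2` and `j2_lower_bound` are hypothetical, and unit weights are used when none are given.

```python
from itertools import combinations

def j2(design, weights=None):
    """Sum over row pairs of the squared weighted number of coincidences."""
    w = weights or [1] * len(design[0])
    total = 0
    for row_i, row_j in combinations(design, 2):
        delta = sum(wk for wk, a, b in zip(w, row_i, row_j) if a == b)
        total += delta ** 2
    return total

def j2_lower_bound(n_runs, levels, weights=None):
    """Lower bound L(n) on J2 (assumed form; attained by orthogonal arrays)."""
    w = weights or [1] * len(levels)
    a = sum(n_runs * wk / s for s, wk in zip(levels, w))
    b = sum((s - 1) * (n_runs * wk / s) ** 2 for s, wk in zip(levels, w))
    c = n_runs * sum(w) ** 2
    return 0.5 * (a ** 2 + b - c)

full = [[x, y] for x in range(2) for y in range(3)]  # full 2 x 3 factorial
fraction = [[0, 0], [0, 1], [1, 2], [1, 0]]          # hypothetical 4-run fraction
print(j2(full), j2_lower_bound(6, [2, 3]))           # equal: the design is orthogonal
print(j2(fraction), j2_lower_bound(4, [2, 3]))       # J2 exceeds the bound
```

For the full factorial the two values coincide, while for the 4-run fraction J2 lies above the bound, which is why the bound is not a useful target for semi-orthogonal arrays.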
The VIF (variance inflation factor) of the predictor $x_j$ is calculated from the linear relationship between $x_j$ and the other independent variables $[x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m]$, as shown in Equation (9),
$$\mathrm{VIF}_j = \frac{1}{1 - R_j^2}, \qquad (9)$$
where $R_j^2$ is the coefficient of determination of the regression of $x_j$ on all other independent variables in the data set $[x_1, x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_m]$ (see Equation (10)).
If VIF = 1, then the coefficient of determination $R_j^2 = 0$ and the predictors are not correlated; if 1 < VIF ≤ 5, the predictors are moderately correlated; and VIF > 10 indicates that the correlation between predictors is excessively influencing the regression results. VIFs are easy to interpret, since the higher the VIF value, the greater the correlation between the predictors [31,32].
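As a small numerical illustration, the VIFs of a set of predictors can be obtained from the diagonal of the inverse of their correlation matrix, which is equivalent to computing 1 / (1 − R²) for each predictor. The sketch below uses plain Python and treats the design columns as numeric predictors, which is a simplification for qualitative factors; the function names are hypothetical, and the inversion fails if the predictors are perfectly collinear.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long columns."""
    n = len(columns[0])
    centered = [[v - sum(c) / n for v in c] for c in columns]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    m = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(m)]
           for i, row in enumerate(matrix)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]

def vifs(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

def average_vif(columns):
    values = vifs(columns)
    return sum(values) / len(values)

orthogonal = [[-1, -1, 1, 1], [-1, 1, -1, 1]]  # columns of a 2^2 factorial
correlated = [[1, 2, 3, 4], [1, 2, 3, 5]]      # hypothetical correlated pair
print(vifs(orthogonal), average_vif(orthogonal))
print(vifs(correlated), average_vif(correlated))
```

For the orthogonal pair both VIFs equal 1, while the strongly correlated pair gives values well above 10.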
5. Conclusions
Industrial experiments often involve categorical and numerical factors with different numbers of levels; the corresponding designs are commonly known as mixed-level designs. Mixed-level designs require a high number of runs and are difficult to carry out because of the cost and time required. One alternative to running a full factorial is to run a mixed-level fractional factorial design. Unfortunately, these fractions are not easy to construct because they often require complex programming techniques, specialized software, and expensive computer equipment.
The new method presented here, called NONBPA, is an algorithm capable of generating mixed-level fractional factorial designs when the factor levels are not multiples of each other. The near-orthogonal, near-balanced pure asymmetrical arrays generated are extremely flexible in run size and possess high levels of balance and orthogonality. The arrays generated with this method were compared to the EAs presented in [3], and the results showed that the balance and orthogonality properties were identical for both methods. In addition to the construction method, a method to perform augmentations was also provided; it allows augmenting any NONBPA with M additional runs while preserving the balance and orthogonality properties.
The main advantages of the NONBPA method are that it is easy to understand and apply, it does not require complex programming, the computational cost is low and it can be used by any person with basic knowledge in statistics.
GBM and the average VIF are parameters that allow comparing, respectively, balance and orthogonality between arrays with the same or a different number of runs. On the other hand, J2 only allows comparing the level of orthogonality between arrays that have the same number of runs. A disadvantage of using J2 for NONBPAs is that the minimum value L(n) is not known for semi-orthogonal arrays. Therefore, for the NONBPAs the use of the average VIF is recommended.
Future research for the NONBPA will focus on evaluating balance and orthogonality beyond main effects, opening a greater number of possibilities for experimenters in the various fields of application.