*Article* **An Extension of the Concept of Derivative: Its Application to Intertemporal Choice**

#### **Salvador Cruz Rambaud 1,\* and Blas Torrecillas Jover <sup>2</sup>**


Received: 26 March 2020; Accepted: 26 April 2020; Published: 2 May 2020

**Abstract:** The framework of this paper is the concept of derivative from the point of view of abstract algebra and differential calculus. The objective of this paper is to introduce a novel concept of derivative which arises in certain economic problems, specifically in intertemporal choice when trying to characterize moderately and strongly decreasing impatience. To do this, we have employed the usual tools and magnitudes of financial mathematics with an algebraic nomenclature. The main contribution of this paper is twofold. On the one hand, we have proposed a novel framework and a different approach to the concept of relative derivation which satisfies the so-called generalized Leibniz's rule. On the other hand, although this approach can be applied to other disciplines, we have presented the mathematical characterization of the two main types of decreasing impatience in the ambit of behavioral finance, based on a previous characterization involving the proportional increase of the variable "time". Finally, this paper points out other patterns of variation which could be applied in economics and other scientific disciplines.

**Keywords:** derivation; intertemporal choice; decreasing impatience; elasticity

**MSC:** 16W25

**JEL Classification:** G41

#### **1. Introduction and Preliminaries**

In most social and experimental sciences, such as economics, psychology, sociology, biology, chemistry, physics, epidemiology, etc., researchers are interested in finding, *ceteris paribus*, the relationship between the explained variable and one or more explaining variables. This relationship need not be linear; that is to say, linear increments in the value of an independent variable do not necessarily lead to linear variations of the dependent variable. This is logical, taking into account the non-linearity of most physical or chemical laws. These circumstances motivate the necessity of introducing a new concept of derivative which, of course, generalizes the concepts of classical and directional derivatives. Consequently, the frequent search for new patterns of variation in the aforementioned disciplines justifies a new framework and a different approach to the concept of derivative, able to help in modelling the decision-making process. Let us start with some general concepts.

Let *A* be an arbitrary *K*-algebra (not necessarily commutative or associative), where *K* is a field. A *derivation* over *A* is a *K*-linear map *D* : *A* → *A* satisfying Leibniz's identity:

$$D(ab) = D(a)b + aD(b).$$


It can be easily demonstrated that the sum, difference, scalar product and composition of derivations are derivations. In general, the product is not a derivation, but the so-called *commutator*, defined as $[D_1, D_2] = D_1 D_2 - D_2 D_1$, is a derivation. We are going to denote by $\mathrm{Der}_K(A)$ the set of all derivations on *A*. This set has the structure of a *K*-module and, with the commutator operation, becomes a Lie algebra. This algebraic notion includes the classical partial derivations of real functions of several variables and the Lie derivative with respect to a vector field in differential geometry. Derivations and differentials are important tools in algebraic geometry and commutative algebra (see [1–3]).

The notion of derivation was extended to the so-called (*σ*, *τ*)-derivation [4] for an associative C-algebra *A*, where *σ* and *τ* are two different algebra endomorphisms of *A* (*σ*, *τ* ∈ End<sub>Alg</sub>(*A*)), as a C-linear map *D* : *A* → *A* satisfying:

$$D(ab) = D(a)\,\tau(b) + \sigma(a)D(b).$$

If *τ* = id*A*, we obtain *D*(*ab*) = *D*(*a*)*b* + *σ*(*a*)*D*(*b*). In this case, we will say that *D* is a *σ*-derivation. There are many interesting examples of these generalized derivations. The *q*-derivation, as a *q*-differential operator, was introduced by Jackson [5]. In effect, let *A* be a C-algebra (which could be C[*z*, *z*<sup>−1</sup>] or various function spaces). Two important generalizations of the derivation are *Dq* and *Mq* : *A* → *A*, defined as:

$$(D\_q(f))(z) = \frac{f(qz) - f(z)}{qz - z} = \frac{f(qz) - f(z)}{(q - 1)z}$$

and

$$M_q(f)(z) = \frac{f(qz) - f(z)}{q - 1}.$$

These operations satisfy the *q*-deformed Leibniz's rule, $D_q(fg) = D_q(f)g + \sigma_q(f)D_q(g)$, where *σq*(*f*)(*z*) = *f*(*qz*); i.e., the *q*-derivation is a *σ*-derivation. Observe that, unlike the usual one, this formula is not symmetric.

Now, we can compare this *q*-derivation with the classical *h*-derivation, defined by:

$$(D\_h(f))(z) = \frac{f(z+h) - f(z)}{h}.$$

In effect,

$$\lim_{q \to 1} D_q(f)(z) = \lim_{h \to 0} D_h(f)(z) = \frac{\mathrm{d}f(z)}{\mathrm{d}z},$$

provided that *f* is differentiable.
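The convergence can be checked numerically. The following is a minimal sketch (the test function *f*(*z*) = *z*³ and the sample point are arbitrary choices, not taken from the text), comparing the *q*-derivative, the *h*-derivative and the exact derivative:

```python
def D_q(f, z, q):
    # Jackson q-derivative: (f(qz) - f(z)) / ((q - 1) z)
    return (f(q * z) - f(z)) / ((q - 1) * z)

def D_h(f, z, h):
    # Classical h-derivative (forward difference)
    return (f(z + h) - f(z)) / h

f = lambda z: z**3   # D_q(z^3) = (1 + q + q^2) z^2, which tends to 3z^2 as q -> 1
z = 2.0

for q, h in [(1.5, 1.0), (1.05, 0.1), (1.0005, 0.001)]:
    print(f"q={q:<7} D_q={D_q(f, z, q):.6f}  D_h={D_h(f, z, h):.6f}  exact={3 * z**2:.6f}")
```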

The *q*-derivation is the key notion of the quantum calculus, which allows us to study functions which are not differentiable. This theory has been developed by many authors and has found several applications in quantum groups, orthogonal polynomials, basic hypergeometric functions, combinatorics, calculus of variations and arithmetic [6,7].

In general, the *q*-derivation is more difficult to compute. For instance, there is no chain rule for this kind of derivation. We refer the interested reader to the books by Kac and Cheung [8], Ernst [9], and Annaby and Mansour [10] for more information about *q*-derivations and *q*-integrals.

The *q*-derivation has been generalized in many directions within the existing literature. Recently, the *β*-derivative was introduced by Auch in his thesis [11], where *β*(*z*) = *az* + *b*, with *a* ≥ 1, *b* ≥ 0 and *a* + *b* > 1, the general case being considered in [12] and continued by several scholars [13]:

$$D\_{\beta}(f)(z) = \frac{f(\beta(z)) - f(z)}{\beta(z) - z},$$

where *β*(*z*) ≠ *z* and *β* : *I* → *I* is a strictly increasing continuous function, *I* ⊆ R. Thus, the *q*-derivation is a particular case of this new concept. Moreover, the operator by Hahn [14,15], defined as:

$$D_{q,\omega}(f)(z) = \frac{f(qz + \omega) - f(z)}{(q - 1)z + \omega},$$

introduced to study orthogonal polynomials, is also a particular case. Another generalization of *q*-calculus is the so-called (*p*, *q*)-derivative, which is a (*σ*, *τ*)-derivation [16].

Figure 1 summarizes the different types of derivations presented in this section and shows the relationships between them.

**Figure 1.** Chart of the different revised derivations.

This paper has been organized as follows. After this Introduction, Section 2 presents the novel concept of derivation relative to a given function *f*. It is shown that this derivative fits within the framework of relative derivations and satisfies Leibniz's rule. In Section 3, this new algebraic tool is used to characterize those discount functions exhibiting moderately and strongly decreasing impatience. In Section 4, the obtained results are discussed in the context of other variation patterns (quadratic, logarithmic, etc.) present in economics, finance and other scientific fields. Finally, Section 5 summarizes and concludes.

#### **2. An Extension of the Concept of Derivative**

#### *2.1. General Concepts*

Let *A* be a *K*-algebra and *M* be an *A*-module, where *K* is a field. Let *σ* : *A* → *A* be an endomorphism of *A*. A *σ*-*derivation D* on *M* is a *K*-linear map

$$D : A \longrightarrow M,$$

such that

$$D(ab) = D(a)b + \sigma(a)D(b),$$

for every *a* and *b* ∈ *A*. From a structural point of view, let us denote by $\mathrm{Der}^{\sigma}_{K}(A, M)$ the set of all *σ*-derivations on *M*. Obviously, $\mathrm{Der}^{\sigma}_{K}(A, M)$ is an *A*-module. In effect, if $D, D_1, D_2 \in \mathrm{Der}^{\sigma}_{K}(A, M)$ and *a* ∈ *A*, then $D_1 + D_2 \in \mathrm{Der}^{\sigma}_{K}(A, M)$ and $aD \in \mathrm{Der}^{\sigma}_{K}(A, M)$.

In the particular case where *M* = *A*, *D* will be called a *K-derivation on A*, and

$$\operatorname{Der}_K^{\sigma}(A, A) := \operatorname{Der}_K^{\sigma}(A)$$

will be called the *module of σ-derivations on A*.

#### *2.2. Relative Derivation*

Let us consider the module of *K*-derivations on the algebra *A*, $\mathrm{Der}^{\sigma}_{K}(A)$. For every $D_0$ and $D \in \mathrm{Der}^{\sigma}_{K}(A)$ and *a* ∈ *A*, we can define the *derivation relative to* $D_0$ *and* $a$ as

$$D\_a(\cdot) := D\_0(a)D(\cdot)\,.$$

**Lemma 1.** *If A is commutative, then Da is a σ-derivation.*

**Proof.** In effect, clearly *Da* is *K*-linear and, moreover, satisfies the generalized Leibniz's condition:

$$\begin{array}{rcl} D_a(xy) &=& D_0(a)D(xy) \\ &=& D_0(a)[D(x)y + \sigma(x)D(y)] \\ &=& D_0(a)D(x)y + \sigma(x)D_0(a)D(y) \\ &=& D_a(x)y + \sigma(x)D_a(y). \end{array}$$

This completes the proof.

Given two *σ*-derivations, $D_0, D \in \mathrm{Der}^{\sigma}_{K}(A)$, we can define a map

$$\mathcal{D}: A \to Der\_K^{\sigma}(A)$$

such that

$$\mathcal{D}(a) = D\_a := D\_0(a)D.$$

**Example 1.** *Consider the polynomial ring* $A := K[x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n]$, $D_0 = \partial_{x_1} + \cdots + \partial_{x_n}$ *and* $D = \partial_{y_1} + \cdots + \partial_{y_n}$*. Then, for* $a = x_1 y_1 + \cdots + x_n y_n$ *and every* $f \in A$*, one has:*

$$\begin{array}{rcl} D_a(f) &=& D_0(a)D(f) \\ &=& D_0(x_1y_1 + \cdots + x_ny_n)D(f) \\ &=& (y_1 + \cdots + y_n)[\partial_{y_1}(f) + \cdots + \partial_{y_n}(f)]. \end{array}$$
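As a check, the computation of Example 1 can be reproduced symbolically. A minimal sketch with sympy for *n* = 2 (the test polynomial *f* is an arbitrary choice):

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

D0 = lambda F: sp.diff(F, x1) + sp.diff(F, x2)   # D_0 = d/dx1 + d/dx2
D  = lambda F: sp.diff(F, y1) + sp.diff(F, y2)   # D   = d/dy1 + d/dy2
a  = x1*y1 + x2*y2

D_a = lambda F: D0(a) * D(F)                     # relative derivation D_a := D_0(a) D

f = x1**2 * y1 + x2 * y2**3                      # arbitrary test polynomial
lhs = sp.expand(D_a(f))
rhs = sp.expand((y1 + y2) * (sp.diff(f, y1) + sp.diff(f, y2)))
print(lhs.equals(rhs))                           # True
```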

**Proposition 1.** *For a commutative ring A,* D *is a σ-derivation.*

**Proof.** Firstly, let us see that D is *K*-linear. In effect, for every *a*, *b* ∈ *A* and *k* ∈ *K*, one has:

$$\begin{array}{rcl} \mathcal{D}(a+b) &=& D_{a+b} = D_0(a+b)D \\ &=& D_0(a)D + D_0(b)D = D_a + D_b = \mathcal{D}(a) + \mathcal{D}(b) \end{array}$$

and

$$\begin{array}{rcl} \mathcal{D}(ka) &=& D_{ka} = D_0(ka)D \\ &=& kD_0(a)D = kD_a = k\mathcal{D}(a). \end{array}$$

Secondly, we are going to show that D satisfies the generalized Leibniz condition. In effect, for every *a*, *b* ∈ *A*, one has:

$$\begin{array}{rcl} \mathcal{D}(ab) &=& D_{ab} = D_0(ab)D = [D_0(a)b + \sigma(a)D_0(b)]D \\ &=& [D_0(a)D]b + \sigma(a)[D_0(b)D] = D_a b + \sigma(a)D_b = \mathcal{D}(a)b + \sigma(a)\mathcal{D}(b). \end{array}$$

Therefore, D is a *σ*-derivation.

Now, we can compute the bracket of two relative *σ*-derivations *Da* and *Db*, for every *a*, *b* ∈ *A*:

$$\begin{array}{rcl} [D_a, D_b] &=& D_a \circ D_b - D_b \circ D_a \\ &=& D_a[D_0(b)D] - D_b[D_0(a)D] \\ &=& D_0(a)D(D_0(b)D) - D_0(b)D(D_0(a)D) \\ &=& D_0(a)[D(D_0(b))D + \sigma(D_0(b))D^2] - D_0(b)[D(D_0(a))D + \sigma(D_0(a))D^2] \\ &=& [D_0(a)D(D_0(b)) - D_0(b)D(D_0(a))]D + [D_0(a)\sigma(D_0(b)) - D_0(b)\sigma(D_0(a))]D^2. \end{array}$$

Observe that, although the bracket of two derivations is a derivation, in general it is not a relative derivation. However, if *A* is commutative and *σ* is the identity, then the former bracket can be simplified as follows:

$$\begin{array}{rcl} [D_a, D_b] &=& [D_0(a)D(D_0(b)) - D_0(b)D(D_0(a))]D + [D_0(a), D_0(b)]D^2 \\ &=& [D_0(a)D(D_0(b)) - D_0(b)D(D_0(a))]D. \end{array}$$

Moreover, if *A* is commutative and *σ* is the identity, the double bracket of three derivations *Da*, *Db* and *Dc*, for every *a*, *b*, *c* ∈ *A*, can be computed by writing $E := D_0(a)D(D_0(b)) - D_0(b)D(D_0(a))$, so that $[D_a, D_b] = ED$:

$$\begin{array}{rcl} [[D_a, D_b], D_c] &=& ED \circ D_0(c)D - D_0(c)D \circ ED \\ &=& E[D(D_0(c))D + D_0(c)D^2] - D_0(c)[D(E)D + ED^2] \\ &=& [E\,D(D_0(c)) - D_0(c)D(E)]D. \end{array}$$

Since the relative derivations are true derivations, they satisfy Jacobi's identity, i.e., for every *a*, *b*, *c* ∈ *A*, the following identity holds:

$$[D_a, [D_b, D_c]] + [D_b, [D_c, D_a]] + [D_c, [D_a, D_b]] = 0.$$

For *σ*-derivations, one could modify the definition of the bracket as in [17], and then a Jacobi-like identity is obtained [17, Theorem 5]. We leave the details to the reader.

Given another derivation $D_1$, we can define a new relative *σ*-derivation $D'_a := D_1(a)D$. In this case, the new bracket is:

$$\begin{array}{rcl} [D_a, D'_a] &=& D_a \circ D'_a - D'_a \circ D_a \\ &=& D_0(a)D(D_1(a)D) - D_1(a)D(D_0(a)D) \\ &=& D_0(a)[D(D_1(a))D + \sigma(D_1(a))D^2] - D_1(a)[D(D_0(a))D + \sigma(D_0(a))D^2] \\ &=& [D_0(a)D(D_1(a)) - D_1(a)D(D_0(a))]D + [D_0(a)\sigma(D_1(a)) - D_1(a)\sigma(D_0(a))]D^2. \end{array}$$

If *A* is commutative and *σ* is the identity, then:

$$[D_a, D'_a] = [D_0(a)D(D_1(a)) - D_1(a)D(D_0(a))]D.$$

The chain rule is also satisfied for relative derivations when *A* is commutative. In effect, assume that, for every *f* , *g* ∈ *A*, the composition, *f* ◦ *g*, is defined. Then

$$\begin{array}{rcl} D_a(f \circ g) &=& D_0(f \circ g)D(f \circ g) \\ &=& [D_0(f) \circ g]D_0(g)\,[D(f) \circ g]D(g) \\ &=& [D_0(f) \circ g][D(f) \circ g]\,D_0(g)D(g) \\ &=& [D_0(f)D(f) \circ g]\,D_0(g)D(g) \\ &=& [D_a(f) \circ g]\,D_a(g). \end{array}$$

#### *2.3. Derivation Relative to a Function*

Let *f*(*x*, Δ) be a real function of two variables *x* and Δ such that, for every *a*:

$$\lim\_{\Delta \to 0} f(a, \Delta) = a. \tag{1}$$

Let *F*(*x*) be a real function differentiable at *x* = *a*. The *derivative of F relative to f , at x* = *a*, denoted by *Df*(*F*)(*a*), is defined as the following limit:

$$D\_f(F)(a) := \lim\_{\Delta \to 0} \frac{F[f(a, \Delta)] - F(a)}{\Delta}. \tag{2}$$

In this setting, we can define *g*(*x*, Δ) as the new function, also of two variables *x* and Δ, satisfying the following identity:

$$f(x, \Delta) := x + g(x, \Delta). \tag{3}$$

Observe that *g*(*x*, Δ) = *f*(*x*, Δ) − *x* and, consequently,

$$\lim_{\Delta \to 0} g(a, \Delta) = 0.$$

The set of functions *g*(*x*, Δ), denoted by S, is a subalgebra of the algebra of real-valued functions of two variables *x* and Δ, represented by A. In effect, it is easy to check that, if *g*, *g*1, *g*2 ∈ S and *λ* ∈ R, then *g*1 + *g*2, *g*1 · *g*2, and *λg* belong to S. Therefore, if F denotes the set of the so-defined functions *f*(*x*, Δ), we can write:

$$\mathcal{F} = \text{id} + \mathcal{S}, \tag{4}$$

where id(*x*) = *x* is the identity function of one variable. In S, we can define the following binary relation:

$$g_1 \sim g_2 \text{ if, and only if, } \lim_{\Delta \to 0} \frac{g_1(x, \Delta)}{\Delta} = \lim_{\Delta \to 0} \frac{g_2(x, \Delta)}{\Delta}. \tag{5}$$

Obviously, ∼ is an equivalence relation. Now, we can define *h*(*x*, Δ) as the new function also of two variables *x* and Δ, satisfying the following identity:

$$g(x, \Delta) := h(x, \Delta)\Delta. \tag{6}$$

Thus,

$$h(x, \Delta) = \frac{f(x, \Delta) - x}{\Delta}. \tag{7}$$

Therefore,

$$D_f(F)(a) = \lim_{\Delta \to 0} \frac{F[a + h(a, \Delta)\Delta] - F(a)}{h(a, \Delta)\Delta} \cdot \lim_{\Delta \to 0} h(a, \Delta). \tag{8}$$

Observe that now $\lim_{\Delta \to 0} h(a, \Delta)$ only depends on *a*, whereby it can be simply denoted as *h*(*a*):

$$h(a) := \lim\_{\Delta \to 0} h(a, \Delta). \tag{9}$$

Thus, Equation (8) results in:

$$D\_f(F)(a) = D(F)(a) \cdot h(a). \tag{10}$$

If *f*(*x*, Δ) is differentiable at Δ = 0, then *f*(*x*, Δ) is continuous at Δ = 0, whereby

$$f(a,0) = \lim\_{\Delta \to 0} f(a,\Delta) = a \tag{11}$$

and, consequently,

$$h(a) = \lim\_{\Delta \to 0} h(a, \Delta) = \lim\_{\Delta \to 0} \frac{f(a, \Delta) - f(a, 0)}{\Delta} = \left. \frac{\partial f(a, \Delta)}{\partial \Delta} \right|\_{\Delta = 0} \,. \tag{12}$$

Therefore,

$$D\_f(F)(a) = \left. \frac{\partial f(a, \Delta)}{\partial \Delta} \right|\_{\Delta = 0} D(F)(a). \tag{13}$$

Observe that we are representing by $D : \mathcal{C}^1(\mathbb{R}) \to \mathcal{C}^1(\mathbb{R})$ the operator $D = \frac{\mathrm{d}}{\mathrm{d}x}$. If, additionally, the partial derivative $\left.\frac{\partial f(a, \Delta)}{\partial \Delta}\right|_{\Delta = 0}$ is simply denoted by $\partial_{y=0}(f)(a)$, expression (13) remains as:

$$D\_f(F)(a) = \partial\_{y=0}(f)(a)D(F)(a) = \partial\_{y=0}(g)(a)D(F)(a)$$

or, globally,

$$D_f(F) = \partial_{y=0}(f)D(F) = \partial_{y=0}(g)D(F).$$

Thus, $D_f : \mathcal{C}^1(\mathbb{R}) \to \mathcal{C}^1(\mathbb{R})$ is really a derivation:

$$\begin{array}{rcl} D_f(FG) &=& \partial_{y=0}(f)D(FG) \\ &=& \partial_{y=0}(f)[D(F)G + FD(G)] \\ &=& [\partial_{y=0}(f)D(F)]G + F[\partial_{y=0}(f)D(G)] \\ &=& D_f(F)G + FD_f(G). \end{array}$$

Observe that *h*(*a*) or $\partial_{y=0}(f)(a)$ represents the equivalence class including the function *g*(*a*, Δ). Moreover, the set of all suitable values of $D_f(F)(a)$ is restricted to the set

$$D(F)(a) \cdot (\mathcal{S}/\sim),$$

where S/ ∼ is the quotient set derived from the equivalence relation ∼.

The name assigned to this derivative can be justified as follows. Observe that the graphical representation of *g*(*a*, Δ) is a surface which describes a kind of "valley" over the *a*-axis (that is to say, Δ = 0) (see Figure 2). Therefore, for every value of *a*, a path can be obtained by intersecting the surface with the vertical plane crossing the point (*a*, 0), giving rise to the function *g*(*a*, Δ), which represents the increment of *a*. As previously indicated, *g*(*a*, Δ) tends to zero as Δ approaches zero (represented by the red arrow).

**Figure 2.** Plotting function *g*(*a*, Δ).

In the particular case in which

$$f(\mathbf{x}, \Delta) = \mathbf{x} + \Delta,\tag{14}$$

obviously, one has:

$$D\_f(F)(a) = D(F)(a),\tag{15}$$

that is to say, the derivative relative to the function *f*(*x*, Δ) = *x* + Δ (absolute increments) coincides with the usual derivative.

**Example 2.** *Assume that (percentage increments of the variable):*

$$f(\mathbf{x}, \Delta) = \mathbf{x} + \frac{\Delta}{\mathbf{x}}.$$

*In this case,*

$$D\_f(F)(a) = \frac{1}{a}D(F)(a).$$
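A quick numerical sanity check of Example 2, directly from the limit in definition (2) (the test function *F* = sin and the point *a* = 2 are arbitrary choices):

```python
import math

F = math.sin                     # test function; D(F) = cos
f = lambda x, d: x + d / x       # f(x, Δ) = x + Δ/x (percentage increments)

a = 2.0
for d in [1e-2, 1e-4, 1e-6]:
    approx = (F(f(a, d)) - F(a)) / d        # difference quotient of definition (2)
    exact  = math.cos(a) / a                # D_f(F)(a) = (1/a) D(F)(a)
    print(f"Δ={d:g}  quotient={approx:.8f}  (1/a)F'(a)={exact:.8f}")
```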

In the context of certain scientific problems, it is interesting to characterize the variation (increase or decrease) of the so-defined derivative relative to a given function. In this case, the sign of the usual derivative of this relative derivative will be useful:

$$D[D\_f(F)] = D[\partial\_{y=0}(f)D(F)] = D[\partial\_{y=0}(f)]D(F) + \partial\_{y=0}(f)D^2(F). \tag{16}$$

Thus, if *Df*(*F*) must be increasing (resp. decreasing), then

$$D^2(F) > -\frac{D[\partial\_{y=0}(f)]D(F)}{\partial\_{y=0}(f)}\tag{17}$$

(resp. $D^2(F) < -\frac{D[\partial_{y=0}(f)]D(F)}{\partial_{y=0}(f)}$).

**Example 3.** *Assume that f*(*x*, Δ) = ln(exp{*x*} + Δ)*. In this case,*

$$D_f(F)(a) = \frac{1}{\exp\{a\}} D(F)(a).$$

*The condition of increase of Df*(*F*) *leads to*

$$D^2(F) > D(F).$$
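A symbolic check of Example 3 (a sketch; *F* is left generic):

```python
import sympy as sp

x, delta = sp.symbols('x Delta')
f = sp.log(sp.exp(x) + delta)                # f(x, Δ) = ln(exp{x} + Δ)

h = sp.simplify(sp.diff(f, delta).subs(delta, 0))
print(h)                                     # exp(-x), i.e. 1/exp{x}

F = sp.Function('F')
Df = h * sp.diff(F(x), x)                    # D_f(F) = exp(-x) D(F)
growth = sp.expand(sp.diff(Df, x) * sp.exp(x))
print(growth)                                # F''(x) - F'(x): positive iff D^2(F) > D(F)
```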

#### **3. An Application to Intertemporal Choice: Proportional Increments**

This section is going to apply this new methodology to a well-known economic problem, more specifically to intertemporal choice. In effect, we are going to describe a noteworthy particular case of our derivative when the change in the variable is due to proportional instead to absolute increments. To do this, let us start with the description of the setting in which the new derivative will be applied (see [18,19]).

Let *X* be the set R<sup>+</sup> of non-negative real numbers and *T* a non-degenerate closed interval of [0, +∞). A *dated reward* is a couple (*x*, *t*) ∈ *X* × *T*. In what follows, we will refer to *x* as the *amount* and *t* as the *time of availability* of the reward. Assume that a decision maker exhibits a continuous weak order on *X* × *T*, denoted by ⪯, satisfying the following conditions (the relations ≺, ≻, and ∼ can be defined as usual):

1. For every *s* ∈ *T* and *t* ∈ *T*, (0,*s*) ∼ (0, *t*) holds.


The most famous representation theorem of preferences is due to Fishburn and Rubinstein [20]: If order, monotonicity, continuity, impatience, and separability hold, and the set of rewards *X* is an interval, then there are continuous real-valued functions *u* on *X* and *F* on the time interval *T* such that

$$
(x, s) \preceq (y, t) \text{ if, and only if, } u(x)F(s) \le u(y)F(t).
$$

Additionally, function *u*, called the *utility*, is increasing and satisfies *u*(0) = 0. On the other hand, function *F*, called the *discount function*, is decreasing, positive and satisfies *F*(0) = 1.

Assume that, for the decision maker, the rewards (*x*, *s*) and (*y*, *t*), with *s* < *t*, are indifferent, that is to say, (*x*, *s*) ∼ (*y*, *t*). Observe that, necessarily, *u*(*x*) < *u*(*y*). The *impatience* in the interval [*s*, *t*], denoted by *I*(*s*, *t*), can be defined as the difference *u*(*y*) − *u*(*x*), which is the amount that the agent is willing to lose in exchange for an earlier receipt of the reward. However, in economics, magnitudes should be defined in relative, rather than absolute, terms. Thus, the impatience corresponding to the interval [*s*, *t*], relative to time and amount, should be:

$$I(s, t) := \frac{u(y) - u(x)}{(t - s)u(y)}.$$

According to [20], the following equation holds:

$$
u(x)F(s) = u(y)F(t),
$$

whereby

$$I(s,t) = \frac{F(s) - F(t)}{(t-s)F(s)}.$$

Observe that, in algebraic terms, *I*(*s*, *t*) is the classical "logarithmic" derivative, with minus sign, of *F* at time *s*:

$$I(s,t) = -\left(\frac{D\_{t-s}(F)}{F}\right)(s),$$

where *Dt*−*<sup>s</sup>* is the classical *h*-derivation, with *h* = *t* − *s*. However, in finance, the most employed measure of impatience is given by the limit of *I*(*s*, *t*) when *t* tends to *s*, giving rise to the well-known concept of *instantaneous discount rate*, denoted by *δ*(*s*):

$$\delta(s) := -\lim\_{t \to s} \left( \frac{D\_{t-s}(F)}{F} \right)(s) = -D(\ln F)(s).$$
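For instance (two standard discount functions, given here only as an illustration): exponential discounting, $F(t) = \exp\{-kt\}$ with *k* > 0, yields the constant rate $\delta(s) = -D(\ln F)(s) = k$, whereas hyperbolic discounting, $F(t) = (1 + kt)^{-1}$, yields

$$\delta(s) = \frac{k}{1 + ks},$$

which is strictly decreasing in *s*.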

For detailed information about the different concepts of impatience in intertemporal choice, see [21]. The following definition introduces a central concept to analyze the evolution of impatience with the passage of time.

**Definition 1** ([19])**.** *A decision-maker exhibiting preferences ⪯ has decreasing impatience (DI) if, for every s* < *t, k* > 0 *and* 0 < *x* < *y,* (*x*, *s*) ∼ (*y*, *t*) *implies* (*x*, *s* + *k*) ⪯ (*y*, *t* + *k*)*.*

A consequence is that, under the conditions of Definition 1, given *σ* > 0, there exists *τ* = *τ*(*σ*) > *σ* such that

$$(x, s + \sigma) \sim (y, t + \tau).$$

The existence of *τ* is guaranteed when, as usual, the discount function is regular, i.e., it satisfies $\lim_{t \to \infty} F(t) = 0$. A specific case of DI is given by the following definition.

**Definition 2** ([19])**.** *A decision-maker exhibiting decreasing impatience has strongly decreasing impatience if sτ* ≥ *tσ.*

The following proposition provides a nice characterization of strongly decreasing impatience.

**Proposition 2** ([22])**.** *A decision-maker exhibiting preferences ⪯ has strongly decreasing impatience if, and only if, for every s* < *t, λ* > 1 *and* 0 < *x* < *y,* (*x*, *s*) ∼ (*y*, *t*) *implies* (*x*, *λs*) ≺ (*y*, *λt*)*.*

**Definition 3.** *Let F*(*t*) *be a discount function differentiable in its domain. The elasticity of F*(*t*) *is defined as:*

$$\epsilon\_F(t) := t \frac{D(F)(t)}{F(t)} = tD(\ln F)(t) = -t\delta(t).$$
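For instance, for the same two illustrative discount functions: the exponential $F(t) = \exp\{-kt\}$ has $\epsilon_F(t) = -kt$, while the hyperbolic $F(t) = (1 + kt)^{-1}$ has

$$\epsilon_F(t) = -t\delta(t) = -\frac{kt}{1 + kt}.$$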

**Theorem 1.** *A decision-maker exhibiting preferences ⪯ has strongly decreasing impatience if, and only if,* $D^2(F) > -\frac{D(F)}{\mathrm{id}}$*.*

**Proof.** In effect, for every *s* < *t*, *λ* > 1 and 0 < *x* < *y*, by Proposition 2, (*x*, *s*) ∼ (*y*, *t*) implies (*x*, *λs*) ≺ (*y*, *λt*). Consequently,

$$
u(x)F(s) = u(y)F(t)
$$

and

$$
u(x)F(\lambda s) < u(y)F(\lambda t).
$$

Dividing, side by side, the former inequality by the former equality, one has:

$$\frac{F(\lambda s)}{F(s)} < \frac{F(\lambda t)}{F(t)},$$

from where:

$$
\ln F(\lambda s) - \ln F(s) < \ln F(\lambda t) - \ln F(t).
$$

As *λ* > 1, we can write *λ* := 1 + Δ, with Δ > 0, and so:

$$
\ln F((1+\Delta)s) - \ln F(s) < \ln F((1+\Delta)t) - \ln F(t).
$$

By dividing both members of the former inequality by Δ and letting Δ → 0, one has:

$$D_f(F)(s) \le D_f(F)(t),$$

where *f*(*x*, Δ) := (1 + Δ)*x*. Therefore, the function *Df*(*F*) is increasing, whereby:

$$D[D\_f(F)] \ge 0.$$

In order to calculate *Df*(*F*), take into account that now there is a proportional increment of the variable, that is to say:

$$f(\mathbf{x}, \Delta) = (1 + \Delta)\mathbf{x}.$$

Thus,

$$D\_f(F)(a) = aD(F)(a)$$

or, globally,

$$D\_f(F) = \text{id}D(F).$$

Consequently, id*D*(*F*) is increasing, whereby:

$$D[\mathrm{id}\,D(F)] = D(F) + \mathrm{id}\,D^2(F) > 0,$$

from where:

$$D^2(F) > -\frac{D(F)}{\text{id}}.$$

The proof of the converse implication is obvious.

**Example 4.** *The discount function F*(*t*) = exp{− arctan(*t*)} *exhibits strongly decreasing impatience. In effect, a simple calculation shows that:*

$$D(F)(a) = -\frac{F(a)}{1+a^2} \quad \text{and} \quad D^2(F)(a) = \frac{(1+2a)F(a)}{(1+a^2)^2}.$$

*In this case, the inequality* $D^2(F) > -\frac{D(F)}{\mathrm{id}}$ *results in* $a^2 + a - 1 > 0$*, which holds for* $a > \frac{-1+\sqrt{5}}{2}$*.*
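The computation of Example 4 can also be verified symbolically; a sketch:

```python
import sympy as sp

t = sp.symbols('t', positive=True)
F = sp.exp(-sp.atan(t))

# Strongly DI requires D^2(F) > -D(F)/id, i.e. F''(t) + F'(t)/t > 0
expr = sp.diff(F, t, 2) + sp.diff(F, t) / t
print(sp.simplify(expr * t * (1 + t**2)**2 / F))   # t**2 + t - 1

print(sp.solve(t**2 + t - 1, t))                   # [-1/2 + sqrt(5)/2]
```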

The following result can be derived from Theorem 1 [22].

**Corollary 1.** *A decision-maker exhibiting preferences ⪯ has strongly decreasing impatience if, and only if,* id *δ is decreasing.*

**Proof.** It is immediate, taking into account that $\mathrm{id}\,D(F)$ is increasing. As *F* is decreasing, then $\mathrm{id}\,\frac{D(F)}{F}$ is increasing and, consequently,

$$\mathrm{id}\,\delta = -\mathrm{id}\,\frac{D(F)}{F}$$

is decreasing. The proof of the converse implication is obvious.

Another specific case of DI is given by the following definition.

**Definition 4** ([19])**.** *A decision-maker exhibiting decreasing impatience has moderately decreasing impatience if sτ* < *tσ.*

The following corollary provides a characterization of moderately decreasing impatience.

**Corollary 2** ([22])**.** *A decision-maker exhibiting preferences ⪯ has moderately decreasing impatience if, and only if, for every s* < *t, k* > 0*, λ* > 1 *and* 0 < *x* < *y,* (*x*, *s*) ∼ (*y*, *t*) *implies* (*x*, *s* + *k*) ⪯ (*y*, *t* + *k*) *but* (*x*, *λs*) ⪰ (*y*, *λt*)*.*

**Corollary 3.** *A decision-maker exhibiting preferences ⪯ has moderately decreasing impatience if, and only if,* $\frac{[D(F)]^2}{F} < D^2(F) \le -\frac{D(F)}{\mathrm{id}}$*.*

**Proof.** It is an immediate consequence of Theorem 1 and of the fact that, in this case, $\delta = -\frac{D(F)}{F}$ is decreasing. Indeed, $D(\delta) = \frac{[D(F)]^2 - F \cdot D^2(F)}{F^2}$, which is negative exactly when $\frac{[D(F)]^2}{F} < D^2(F)$.

The following result can be derived from Corollary 3 [22].

**Corollary 4.** *A decision-maker exhibiting preferences ⪯ has moderately decreasing impatience if, and only if,* id *δ is increasing but δ is decreasing.*

#### **4. Discussion**

In this paper, we have introduced a new modality of relative derivation, specifically the so-called derivation of *F*(*x*) relative to a function *f*(*x*, Δ) := *x* + *g*(*x*, Δ), where *g*(*x*, Δ) represents the increments in the variable *x*. Obviously, this novel concept generalizes the two most important derivatives used in differential calculus:

• the classical derivative, which corresponds to *g*(*x*, Δ) = Δ (absolute increments); and

• the directional derivative, which corresponds to *g*(*x*, Δ) = Δ*v*, for a given direction *v*.

It is easy to show that, in the former cases, Equation (13) leads to the well-known expressions of these two derivatives. In this paper, we have gone a step further and have considered proportional variations of the independent variable. These increments appear in the so-called *sensitivity analysis*, a financial methodology which determines how changes in one variable can affect another variable. This method, also called simulation analysis, is usually employed in financial problems under uncertainty and also in econometric regressions.

In effect, in some economic contexts, percentage variations of the independent variable are analyzed. For example, the *elasticity* is the ratio of the percentage variations of two economic magnitudes. In linear regression, if the explanatory and the explained variables are transformed by the natural logarithm, it is of interest to analyze the percentage variation of the dependent variable compared to percentage changes in the value of an independent variable. In this case, we would be interested in analyzing the ratio:

$$\frac{f(x + \Delta x) - f(x)}{f(x)}$$

when $\frac{\Delta x}{x} = \lambda$, for a given *λ*. Thus, the former ratio remains as:

$$\frac{f(x + \lambda x) - f(x)}{f(x)}.$$

In another economic context, Karpoff [23], when investigating the relationship between the price and the volume of transactions of an asset in a stock market, suggests quadratic and logarithmic increments:

• Quadratic increments aim to determine the variation of the volume when quadratic changes in the price of an asset have been considered. In this case,

$$g(x, \Delta) = (x + \Delta)^2 - x^2$$

and so

$$\partial_{y=0}(f)(a) = 2a.$$

• Logarithmic increments aim to find the variation of the volume when considering logarithmic changes in the price of an asset. In this case,

$$g(x, \Delta) = \ln(x + \Delta) - \ln x$$

and so

$$\partial_{y=0}(f)(a) = \frac{1}{a}.$$

Figure 3 summarizes the different types of increments discussed in this section.

**Figure 3.** Chart of the different types of increments.
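The quantity $\partial_{y=0}(f)(a)$ for each of the variation patterns reviewed in this paper can be obtained mechanically; a sympy sketch:

```python
import sympy as sp

x, delta = sp.symbols('x Delta')

# g(x, Δ) for each pattern of variation; f = id + g and h(a) = ∂f/∂Δ at Δ = 0
patterns = {
    'absolute':     delta,                          # classical derivative
    'proportional': delta * x,                      # f(x, Δ) = (1 + Δ)x, Section 3
    'percentage':   delta / x,                      # Example 2
    'quadratic':    (x + delta)**2 - x**2,          # Karpoff's quadratic increments
    'logarithmic':  sp.log(x + delta) - sp.log(x),  # Karpoff's logarithmic increments
}

for name, g in patterns.items():
    h = sp.simplify(sp.diff(x + g, delta).subs(delta, 0))
    print(f"{name:13s} h(a) = {h}")                 # 1, x, 1/x, 2*x, 1/x
```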

Indeed, some other variation models could be mentioned here. Take into account that some disciplines, such as biology, physics or economics, might be interested in explaining the increments in the dependent variable by using alternative patterns of variation. For example, think about a particle which is moving by following a given trajectory. In this context, researchers may be interested in knowing the behavior of the explained variable when the particle is continuously changing its position according to a given function.

#### **5. Conclusions**

This paper has introduced the novel concept of derivative of a function relative to another given function. The manuscript has been divided into two parts. The first part is devoted to the algebraic treatment of this concept and its basic properties in the framework of other relative derivatives. Moreover, this new derivative has been put in relation with the main variants of derivation in the field of abstract algebra. Given two *σ*-derivations over a *K*-algebra *A*, where *K* is a field, a relative *σ*-derivation has been associated to any function. This construction is, in fact, a derivation from *A* to the *A*-module of *σ*-derivations. Specifically, if *σ* is the identity of the algebra, these derivations can be applied to the theory of intertemporal choice.

The second part deals with the mathematical characterization of the so-called "strongly" and "moderately decreasing impatience", based on previous characterizations involving the proportional increase of the variable "time". In effect, a specific situation, the case of proportional increments, plays a noteworthy role in economics, namely in intertemporal choice, where the analysis of decreasing impatience is a topic of fundamental relevance. Indeed, the proportional increment of time is linked to the concepts of strongly and moderately decreasing impatience. Therefore, the calculation of derivatives relative to this class of increments allows us to characterize these important modalities of decreasing impatience.

Moreover, after providing a geometric interpretation of this concept, this derivative has been calculated relatively to certain functions which represent different patterns of variability of the main variable involved in the problem.

Observe that, according to Fishburn and Rubinstein [20], the continuity of the order relation implies that functions *F* and *u* are continuous but not necessarily differentiable. Indeed, this is a limitation of the approach presented in this paper which affects both the function *F* and the variation pattern *g*. A further line of research could be to analyze the case of functions which are differentiable except possibly at a finite number of points in their domains.

Finally, apart from this financial application, another future research line is the characterization of other financial problems with specific models of variability. In this way, we can point out the proportional variability of reward amounts [24].

**Author Contributions:** Conceptualization, S.C.R. and B.T.J.; Formal analysis, S.C.R. and B.T.J.; Funding acquisition, S.C.R. and B.T.J.; Supervision, S.C.R. and B.T.J.; Writing – original draft, S.C.R. and B.T.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors gratefully acknowledge financial support from the Spanish Ministry of Economy and Competitiveness [National R&D Project "La sostenibilidad del Sistema Nacional de Salud: reformas, estrategias y propuestas", reference: DER2016-76053-R] and [National R&D Project "Anillos, módulos y álgebras de Hopf", reference: MTM2017-86987-P].

**Acknowledgments:** We are very grateful for the valuable comments and suggestions offered by three anonymous referees.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

DI Decreasing Impatience

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **The VIF and MSE in Raise Regression**

**Román Salmerón Gómez 1, Ainara Rodríguez Sánchez <sup>2</sup>, Catalina García García 1,\* and José García Pérez <sup>3</sup>**


Received: 1 April 2020; Accepted: 13 April 2020; Published: 16 April 2020

**Abstract:** The raise regression has been proposed as an alternative to ordinary least squares estimation when a model presents collinearity. In order to analyze whether the problem has been mitigated, it is necessary to develop measures to detect collinearity after the application of the raise regression. This paper extends the concept of the variance inflation factor to be applied in a raise regression. The relevance of this extension is that it can be applied to determine the raising factor which allows an optimal application of this technique. The mean square error is also calculated since the raise regression provides a biased estimator. The results are illustrated by two empirical examples where the application of the raise estimator is compared to the application of the ridge and Lasso estimators that are commonly applied to estimate models with multicollinearity as an alternative to ordinary least squares.

**Keywords:** detection; mean square error; multicollinearity; raise regression; variance inflation factor

#### **1. Introduction**

In the last fifty years, different methods have been developed to avoid the instability of estimates derived from collinearity (see, for example, Kiers and Smilde [1]). Some of these methods can be grouped within a general denomination known as penalized regression.

In general terms, penalized regression starts from the linear model (with *p* variables and *n* observations), **Y** = **X***β* + **u**, and obtains the regularization of the estimated parameters by minimizing the following objective function:

$$(\mathbf{Y} - \mathbf{X}\boldsymbol{\beta})^t (\mathbf{Y} - \mathbf{X}\boldsymbol{\beta}) + P(\boldsymbol{\beta}),$$

where *P*(*β*) is a penalty term that can take different forms. One of the most common penalty terms, the bridge penalty term ([2,3]), is given by

$$P(\beta) = \lambda \sum_{j=1}^{p} \left| \beta_j \right|^{\alpha}, \quad \alpha > 0,$$

where *λ* is a tuning parameter. Note that the ridge ([4]) and the Lasso ([5]) regressions are obtained when *α* = 2 and *α* = 1, respectively. Penalties have also been called soft thresholding ([6,7]).

These methods are applied not only for the treatment of multicollinearity but also for the selection of variables (see, for example, Dupuis and Victoria-Feser [8], Li and Yang [9], Liu et al. [10], or Uematsu and Tanaka [11]), which is a crucial issue in many areas of science when the number of variables exceeds the sample size. Zou and Hastie [12] proposed elastic net regularization by using the penalty terms *λ*1 and *λ*2 that combine the Lasso and ridge regressions:

$$P(\beta) = \lambda_1 \sum_{j=1}^p |\beta_j| + \lambda_2 \sum_{j=1}^p \beta_j^2.$$

Thus, the Lasso regression usually selects one of the regressors from among all those that are highly correlated, while the elastic net regression selects several of them. In the words of Tutz and Ulbricht [13] "the elastic net catches all the big fish", meaning that it selects the whole group.

From a different point of view, other authors have also presented different techniques and methods well suited for dealing with the collinearity problems: continuum regression ([14]), least angle regression ([15]), generalized maximum entropy ([16–18]), the principal component analysis (PCA) regression ([19,20]), the principal correlation components estimator ([21]), penalized splines ([22]), partial least squares (PLS) regression ([23,24]), or the surrogate estimator focused on the solution of the normal equations presented by Jensen and Ramirez [25].

Focusing on collinearity, the ridge regression is one of the more commonly applied methodologies and it is estimated by the following expression:

$$
\hat{\boldsymbol{\beta}}(K) = \left(\mathbf{X}^t \mathbf{X} + K \cdot \mathbf{I}\right)^{-1} \mathbf{X}^t \mathbf{Y} \tag{1}
$$

where **I** is the identity matrix with adequate dimensions and *K* is the ridge factor (ordinary least squares (OLS) estimators are obtained when *K* = 0). Although ridge regression has been widely applied, it presents some problems in current practice in the presence of multicollinearity, and the estimators derived from the penalties above inherit these same problems whenever *n* > *p*:


$$\sum\_{i=1}^{n} \left( Y\_i - \bar{Y} \right)^2 = \sum\_{i=1}^{n} \left( \hat{Y}\_i(K) - \bar{Y} \right)^2 + \sum\_{i=1}^{n} e\_i(K)^2 + 2 \sum\_{i=1}^{n} \left( \hat{Y}\_i(K) - \bar{Y} \right) \cdot e\_i(K).$$

When the OLS estimators are obtained (*K* = 0), the third term is null. However, this term is not null when *K* is not zero. Consequently, the relationship *TSS*(*K*) = *ESS*(*K*) + *RSS*(*K*) is not satisfied in ridge regression, and the definition of the coefficient of determination may not be suitable. This fact not only limits the analysis of the goodness of fit but also affects the global significance since the critical coefficient of determination is also questioned. Rodríguez et al. [28] showed that the estimators obtained from the penalties mentioned above inherit the problem of the ridge regression in relation to the goodness of fit.
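The failure of the decomposition *TSS*(*K*) = *ESS*(*K*) + *RSS*(*K*) under ridge regression is easy to reproduce. A minimal numeric sketch on simulated data (the design and coefficients are arbitrary choices, not from the paper's examples):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

def cross_term(K):
    # 2 * sum((y_hat(K) - y_bar) * e_i(K)): zero under OLS, non-zero under ridge
    beta = np.linalg.solve(X.T @ X + K * np.eye(p), X.T @ y)
    e = y - X @ beta
    return 2 * np.sum((X @ beta - y.mean()) * e)

for K in [0.0, 0.5, 2.0]:
    print(f"K={K}: cross term = {cross_term(K):.6f}")
```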

In order to overcome these problems, this paper is focused on the raise regression (García et al. [29] and Salmerón et al. [30]) based on the treatment of collinearity from a geometrical point of view. It consists in separating the independent variables by using the residuals (weighted by the raising factor) of the auxiliary regression traditionally used to obtain the VIF. Salmerón et al. [30] showed that the raise regression presents better conditions than ridge regression and, more recently, García et al. [31] showed, among other questions, that the ridge regression is a particular case of the raise regression.

This paper presents the extension of the VIF to the raise regression, showing that, although García et al. [31] proved that the application of the raise regression guarantees a diminishing of the VIF, it is not guaranteed that its value falls below the threshold traditionally established as troubling. Thus, it will be concluded that a single application of the raise regression does not guarantee the mitigation of the multicollinearity. Consequently, this extension complements the results presented by García et al. [31] and determines, on the one hand, whether it is necessary to apply a successive raise regression (see García et al. [31] for more details) and, on the other hand, the most adequate variable to raise and the optimal value for the raising factor in order to guarantee the mitigation of the multicollinearity.

On the other hand, the transformation of variables is common when strong collinearity exists in a linear model. The transformation to unit length (see Belsley et al. [32]) or standardization (see Marquardt [27]) is typical. Although the VIF is invariant to these transformations when it is calculated after estimation by OLS (see García et al. [26]), this is not guaranteed either in the case of the raise regression or in ridge regression, as shown by García et al. [26]. The analysis of this fact is one of the goals of this paper.

Finally, since the raise estimator is biased, it is interesting to calculate its mean square error (MSE). It is studied whether the MSE of the raise regression is less than the one obtained by OLS. In this case, this study could be used to select an adequate raising factor similar to what is proposed by Hoerl et al. [33] in the case of the ridge regression. Note that estimators with MSE less than the one from OLS estimators are traditionally preferred (see, for example, Stein [34], James and Stein [35], Hoerl and Kennard [4], Ohtani [36], or Hubert et al. [37]). In addition, this measure allows us to conclude whether the raise regression is preferable, in terms of MSE, to other alternative techniques.

The structure of the paper is as follows: Section 2 briefly describes the VIF and the raise regression, and Section 3 extends the VIF to this methodology. Some desirable properties of the VIF are analyzed, and its asymptotic behavior is studied. It is also concluded that the VIF is invariant to data transformation. Section 4 calculates the MSE of the raise estimator, showing that there is a minimum value that is less than the MSE of the OLS estimator. Section 5 illustrates the contribution of this paper with two numerical examples. Finally, Section 6 summarizes the main conclusions of this paper.

#### **2. Preliminaries**

#### *2.1. Variance Inflation Factor*

The following model for *p* independent variables and *n* observations is considered:

$$\mathbf{Y} = \beta\_1 + \beta\_2 \mathbf{X}\_2 + \dots + \beta\_i \mathbf{X}\_i + \dots + \beta\_p \mathbf{X}\_p + \mathbf{u} = \mathbf{X}\beta + \mathbf{u},\tag{2}$$

where **Y** is a vector *n* × 1 that contains the observations of the dependent variable, **X** = [**1 X**2 ... **X***i* ... **X***p*] (with **1** being a vector of ones with dimension *n* × 1) is a matrix with order *n* × *p* that contains (by columns) the observations of the independent variables, *β* is a vector *p* × 1 that contains the coefficients of the independent variables, and **u** is a vector *n* × 1 that represents the random disturbance, which is supposed to be spherical (*E*[**u**] = **0** and *Var*(**u**) = *σ*<sup>2</sup>**I**, where **0** is a vector of zeros with dimension *n* × 1 and **I** the identity matrix with adequate dimensions, in this case *n* × *n*).

Given the model in Equation (2), the variance inflation factor (VIF) is obtained as follows:

$$VIF(k) = \frac{1}{1 - R_k^2}, \quad k = 2, \dots, p, \tag{3}$$

where $R_k^2$ is the coefficient of determination of the regression of the variable **X***k* as a function of the rest of the independent variables of the model in Equation (2):

$$\mathbf{X}\_{k} = \mathbf{a}\_{1} + \mathbf{a}\_{2}\mathbf{X}\_{2} + \dots + \mathbf{a}\_{k-1}\mathbf{X}\_{k-1} + \mathbf{a}\_{k+1}\mathbf{X}\_{k+1} + \dots + \mathbf{a}\_{p}\mathbf{X}\_{p} + \mathbf{v} = \mathbf{X}\_{-k}\mathbf{a} + \mathbf{v},\tag{4}$$

where **X**−*<sup>k</sup>* corresponds to the matrix **X** after the elimination of the column *k* (variable **X***k*).

If the variable **X***k* has no linear relationship (i.e., is orthogonal) with the rest of the independent variables, the coefficient of determination will be zero ($R_k^2 = 0$) and $VIF(k) = 1$. As the linear relationship increases, the coefficient of determination ($R_k^2$) and, consequently, $VIF(k)$ will also increase. Thus, the higher the VIF associated with the variable **X***k*, the greater the linear relationship between this variable and the rest of the independent variables in the model in Equation (2). It is considered that the collinearity is troubling for values of VIF higher than 10. Note that the VIF ignores the role of the constant term (see, for example, Salmerón et al. [38] or Salmerón et al. [39]) and, consequently, this extension will be useful when the multicollinearity is essential; that is to say, when there is a linear relationship between at least two independent variables of the regression model, without considering the constant term (see, for example, Marquardt and Snee [40] for the definitions of essential and nonessential multicollinearity).
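A direct implementation of Equations (3) and (4) (a sketch, not the authors' code; column 0 is assumed to be the constant term):

```python
import numpy as np

def vif(X):
    # VIF(k) = 1 / (1 - R_k^2), with R_k^2 from the auxiliary regression in Eq. (4)
    n, p = X.shape
    out = {}
    for k in range(1, p):                       # skip the constant column
        Xk = X[:, k]
        X_mk = np.delete(X, k, axis=1)
        a, *_ = np.linalg.lstsq(X_mk, Xk, rcond=None)
        rss = np.sum((Xk - X_mk @ a) ** 2)
        tss = np.sum((Xk - Xk.mean()) ** 2)
        r2 = 1.0 - rss / tss
        out[k] = 1.0 / (1.0 - r2)
    return out
```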

#### *2.2. Raise Regression*

Raise regression, presented by García et al. [29] and further developed by Salmerón et al. [30], uses the residuals of the model in Equation (4), **e**_k, to raise the variable *k* as $\tilde{\mathbf{X}}_k = \mathbf{X}_k + \lambda \mathbf{e}_k$, with *λ* ≥ 0 (called the raising factor), and verifies that $\mathbf{e}_k^t \mathbf{X}_{-k} = \mathbf{0}$, where **0** is a vector of zeros with adequate dimensions. In that case, the raise regression consists in the estimation by OLS of the following model:

$$\mathbf{Y} = \beta_1(\lambda) + \beta_2(\lambda)\mathbf{X}_2 + \dots + \beta_k(\lambda)\tilde{\mathbf{X}}_k + \dots + \beta_p(\lambda)\mathbf{X}_p + \tilde{\mathbf{u}} = \tilde{\mathbf{X}}\boldsymbol{\beta}(\lambda) + \tilde{\mathbf{u}}, \tag{5}$$

where $\tilde{\mathbf{X}} = [\mathbf{1}\ \mathbf{X}_2\ \dots\ \tilde{\mathbf{X}}_k\ \dots\ \mathbf{X}_p] = [\mathbf{X}_{-k}\ \tilde{\mathbf{X}}_k]$. García et al. [29] showed (Theorem 3.3) that this technique does not alter the global characteristics of the initial model. That is to say, the models in Equations (2) and (5) have the same coefficient of determination and experimental statistics for the global significance test.
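A sketch of the raise regression itself, following the construction above (a hypothetical helper, not the authors' code); note that the R² of the raised model should match the OLS one for any *λ* ≥ 0, in line with Theorem 3.3 of García et al. [29]:

```python
import numpy as np

def raise_fit(X, y, k, lam):
    # Raise column k: X_k~ = X_k + lam * e_k, with e_k from the auxiliary regression
    Xk = X[:, k]
    X_mk = np.delete(X, k, axis=1)
    a, *_ = np.linalg.lstsq(X_mk, Xk, rcond=None)
    e_k = Xk - X_mk @ a                      # residuals, orthogonal to X_mk
    X_raised = X.copy()
    X_raised[:, k] = Xk + lam * e_k
    beta, *_ = np.linalg.lstsq(X_raised, y, rcond=None)
    resid = y - X_raised @ beta
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, r2                          # r2 is invariant in lam
```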

Figure 1 illustrates the raise regression for two independent variables being geometrically separated by using the residuals weighted by the raising factor *λ*. Thus, the selection of an adequate value for *λ* is essential, analogously to what occurs with the ridge factor *K*. A preliminary proposal about how to select the raising factor in a model with two independent standardized variables can be found in García et al. [41]. Other recently published papers introduce and highlight the various advantages of raise estimators for statistical analysis: Salmerón et al. [30] presented the raise regression for *p* = 3 standardized variables and showed that it presents better properties than the ridge regression and that the individual inference of the raised variable is not altered, García et al. [31] showed that it is guaranteed that all the VIFs associated with the model in Equation (5) diminish but that it is not possible to quantify the decrease, García and Ramírez [42] presented the successive raise regression, and García et al. [31] showed, among other questions, that ridge regression is a particular case of raise regression.

**Figure 1.** Representation of the raise method.

The following section presents the extension of the VIF to be applied after the estimation by raise regression, since it will be interesting to analyze whether, after raising one independent variable, the VIF falls below 10. It will also be analyzed when a successive raise regression can be recommendable (see García and Ramírez [42]).

#### **3. VIF in Raise Regression**

To calculate the VIF in the raise regression, two cases have to be differentiated depending on the dependent variable, **X***k*, of the auxiliary regression:

1. If it is the raised variable, $\tilde{\mathbf{X}}_i$ with *i* = 2, ... , *p*, the coefficient of determination, $R_i^2(\lambda)$, of the following auxiliary regression has to be calculated:

$$\begin{array}{rcl} \tilde{\mathbf{X}}_i &=& a_1(\lambda) + a_2(\lambda)\mathbf{X}_2 + \cdots + a_{i-1}(\lambda)\mathbf{X}_{i-1} + a_{i+1}(\lambda)\mathbf{X}_{i+1} + \cdots + a_p(\lambda)\mathbf{X}_p + \tilde{\mathbf{v}} \\ &=& \mathbf{X}_{-i}\mathbf{a}(\lambda) + \tilde{\mathbf{v}}. \end{array} \tag{6}$$

2. If it is not the raised variable, **X***j* with *j* = 2, ... , *p*, *j* ≠ *i*, the coefficient of determination, $R_j^2(\lambda)$, of the following auxiliary regression has to be calculated:

$$\mathbf{X}_j = a_1(\lambda) + a_2(\lambda)\mathbf{X}_2 + \dots + a_i(\lambda)\tilde{\mathbf{X}}_i + \dots + a_{j-1}(\lambda)\mathbf{X}_{j-1} + a_{j+1}(\lambda)\mathbf{X}_{j+1} + \dots + a_p(\lambda)\mathbf{X}_p + \tilde{\mathbf{v}} = \begin{pmatrix} \mathbf{X}_{-i,-j} & \tilde{\mathbf{X}}_i \end{pmatrix} \begin{pmatrix} \mathbf{a}_{-i,-j}(\lambda) \\ a_i(\lambda) \end{pmatrix} + \tilde{\mathbf{v}}, \tag{7}$$

where $\mathbf{X}_{-i,-j}$ corresponds to the matrix **X** after the elimination of columns *i* and *j* (variables **X***i* and **X***j*). The same notation is used for $\mathbf{a}_{-i,-j}(\lambda)$.

Once these coefficients of determination are obtained (as indicated in the following subsections), the VIF of the raise regression will be given by the following:

$$VIF(k, \lambda) = \frac{1}{1 - R\_k^2(\lambda)}, \quad k = 2, \dots, p. \tag{8}$$

#### *3.1. VIF Associated with Raise Variable*

In this case, for *i* = 2, ... , *p*, the coefficient of determination of the regression in Equation (6) is given by

$$\begin{array}{rcl} R_i^2(\lambda) &=& 1 - \frac{(1 + 2\lambda + \lambda^2)RSS_i^{-i}}{TSS_i^{-i} + (\lambda^2 + 2\lambda)RSS_i^{-i}} = \frac{ESS_i^{-i}}{TSS_i^{-i} + (\lambda^2 + 2\lambda)RSS_i^{-i}} \\ &=& \frac{R_i^2}{1 + (\lambda^2 + 2\lambda)(1 - R_i^2)} \end{array} \tag{9}$$

since:

$$\begin{array}{rcl} TSS_i^{-i}(\lambda) &=& \tilde{\mathbf{X}}_i^t \tilde{\mathbf{X}}_i - n \cdot \overline{\tilde{X}}_i^2 = \mathbf{X}_i^t \mathbf{X}_i + (\lambda^2 + 2\lambda)\mathbf{e}_i^t \mathbf{e}_i - n \cdot \overline{X}_i^2 \\ &=& TSS_i^{-i} + (\lambda^2 + 2\lambda)RSS_i^{-i}, \\ RSS_i^{-i}(\lambda) &=& \tilde{\mathbf{X}}_i^t \tilde{\mathbf{X}}_i - \hat{\mathbf{a}}(\lambda)^t \mathbf{X}_{-i}^t \tilde{\mathbf{X}}_i = \mathbf{X}_i^t \mathbf{X}_i + (\lambda^2 + 2\lambda)\mathbf{e}_i^t \mathbf{e}_i - \hat{\mathbf{a}}^t \mathbf{X}_{-i}^t \mathbf{X}_i \\ &=& (\lambda^2 + 2\lambda + 1)RSS_i^{-i}, \end{array}$$

where $TSS_i^{-i}$, $ESS_i^{-i}$ and $RSS_i^{-i}$ are the total sum of squares, explained sum of squares, and residual sum of squares of the model in Equation (4). Note that it has been taken into account that

$$\widetilde{\mathbf{X}}\_i^t \widetilde{\mathbf{X}}\_i = (\mathbf{X}\_i + \lambda \mathbf{e}\_i)^t \left(\mathbf{X}\_i + \lambda \mathbf{e}\_i\right) = \mathbf{X}\_i^t \mathbf{X}\_i + (\lambda^2 + 2\lambda) \mathbf{e}\_i^t \mathbf{e}\_i.$$

since $\mathbf{e}_i^t \mathbf{X}_i = \mathbf{e}_i^t \mathbf{e}_i = RSS_i^{-i}$ and

$$
\hat{\mathbf{a}}(\lambda) = \left(\mathbf{X}_{-i}^t \mathbf{X}_{-i}\right)^{-1} \mathbf{X}_{-i}^t \tilde{\mathbf{X}}_i = \hat{\mathbf{a}}
$$

due to $\mathbf{X}_{-i}^t \tilde{\mathbf{X}}_i = \mathbf{X}_{-i}^t \mathbf{X}_i$.

Indeed, from Equation (9), it is evident that:

1. $R_i^2(\lambda)$ decreases as *λ* increases.

2. $\lim_{\lambda \to +\infty} R_i^2(\lambda) = 0$.

3. $R_i^2(\lambda)$ is continuous in zero. That is to say, $R_i^2(0) = R_i^2$.

Finally, from properties 1) and 3), it is deduced that $R_i^2(\lambda) \le R_i^2$ for all *λ*.

#### *3.2. VIF Associated with Non-Raised Variables*

In this case, for *j* = 2, ... , *p*, with *j* ≠ *i*, the coefficient of determination of the regression in Equation (7) is given by

$$\begin{array}{rcl} R_j^2(\lambda) &=& 1 - \frac{RSS_j^{-j}(\lambda)}{TSS_j^{-j}(\lambda)} \\ &=& \frac{1}{TSS_j^{-j}} \left( TSS_j^{-j} - RSS_j^{-i,-j} + \frac{RSS_i^{-i,-j} \left( RSS_j^{-i,-j} - RSS_j^{-j} \right)}{RSS_i^{-i,-j} + (\lambda^2 + 2\lambda) \cdot RSS_i^{-i}} \right) \end{array} \tag{10}$$

Taking into account that $\tilde{\mathbf{X}}_i^t \mathbf{X}_j = (\mathbf{X}_i + \lambda \mathbf{e}_i)^t \mathbf{X}_j = \mathbf{X}_i^t \mathbf{X}_j$ since $\mathbf{e}_i^t \mathbf{X}_j = 0$, it is verified that

$$TSS_j^{-j}(\lambda) = \mathbf{X}_j^t \mathbf{X}_j - n \cdot \overline{X}_j^2 = TSS_j^{-j}$$

and, from Appendices A and B,

$$RSS_j^{-j}(\lambda) = \mathbf{X}_j^t \mathbf{X}_j - \hat{\mathbf{a}}(\lambda)^t \begin{pmatrix} \mathbf{X}_{-i,-j}^t \mathbf{X}_j \\ \tilde{\mathbf{X}}_i^t \mathbf{X}_j \end{pmatrix} = RSS_j^{-i,-j} - \frac{RSS_i^{-i,-j} \left( RSS_j^{-i,-j} - RSS_j^{-j} \right)}{RSS_i^{-i,-j} + (\lambda^2 + 2\lambda) \cdot RSS_i^{-i}},$$

the intermediate algebra following from the expressions obtained in Appendices A and B,

where $TSS_j^{-j}$ and $RSS_j^{-j}$ are the total sum of squares and the residual sum of squares of the model in Equation (4), and $RSS_i^{-i,-j}$ and $RSS_j^{-i,-j}$ are the residual sums of squares of the models:

$$\mathbf{X}_i = \mathbf{X}_{-i,-j}\boldsymbol{\gamma} + \boldsymbol{\eta}, \tag{11}$$

$$\mathbf{X}_j = \mathbf{X}_{-i,-j}\boldsymbol{\delta} + \boldsymbol{\nu}. \tag{12}$$

Indeed, from Equation (10), it is evident that

1. $R_j^2(\lambda)$ decreases as *λ* increases.

2. $\lim_{\lambda \to +\infty} R_j^2(\lambda) = \dfrac{TSS_j^{-j} - RSS_j^{-i,-j}}{TSS_j^{-j}}$.

3. $R_j^2(\lambda)$ is continuous at zero. That is to say, $R_j^2(0) = \dfrac{TSS_j^{-j} - RSS_j^{-j}}{TSS_j^{-j}} = R_j^2$.

Finally, from properties 1) and 3), it is deduced that $R_j^2(\lambda) \leq R_j^2$ for all *λ*.

#### *3.3. Properties of VIF*(*k*, *λ*)

From the conditions verified by the coefficients of determination in Equations (9) and (10), it is concluded that *VIF*(*k*, *λ*) (see Equation (8)) verifies the following:


1. The VIF associated with the raise regression is continuous at zero because the coefficients of determination of the auxiliary regressions in Equations (6) and (7) are also continuous at zero. That is to say, for *λ* = 0, it coincides with the VIF obtained for the model in Equation (2) when it is estimated by OLS:

$$VIF(k,0) = \frac{1}{1 - R_k^2(0)} = \frac{1}{1 - R_k^2} = VIF(k), \quad k = 2, \dots, p.$$

2. The VIF associated with the raise regression decreases as *λ* increases since this is the behavior of the coefficient of determination of the auxiliary regressions in Equations (6) and (7). Consequently,

$$VIF(k, \lambda) = \frac{1}{1 - R\_k^2(\lambda)} \le \frac{1}{1 - R\_k^2} = VIF(k), \quad k = 2, \dots, p, \quad \forall \lambda \ge 0.$$

3. The VIF associated with the raised variable is always higher than one and tends to one since

$$\lim_{\lambda \to +\infty} VIF(i, \lambda) = \lim_{\lambda \to +\infty} \frac{1}{1 - R_i^2(\lambda)} = \frac{1}{1 - 0} = 1, \quad i = 2, \dots, p.$$

4. The VIF associated with the non-raised variables has a horizontal asymptote since

$$\begin{aligned} \lim_{\lambda \to +\infty} VIF(j, \lambda) &= \lim_{\lambda \to +\infty} \frac{1}{1 - R_j^2(\lambda)} = \frac{1}{1 - \frac{TSS_j^{-j} - RSS_j^{-i,-j}}{TSS_j^{-j}}} \\ &= \frac{TSS_j^{-j}}{RSS_j^{-i,-j}} = \frac{TSS_j^{-i,-j}}{RSS_j^{-i,-j}} = \frac{1}{1 - R_{ij}^2} = VIF_{-i}(j), \end{aligned}$$

where $R_{ij}^2$ is the coefficient of determination of the regression in Equation (12) for *j* = 2, ..., *p* and *j* ≠ *i*. Indeed, this asymptote corresponds to the VIF, $VIF_{-i}(j)$, of the regression $\mathbf{Y} = \mathbf{X}_{-i}\boldsymbol{\xi} + \mathbf{w}$ and, consequently, will also always be equal to or higher than one.

Thus, from properties (1) to (4), *VIF*(*k*, *λ*) has the very desirable properties of being continuous, monotone in the raise parameter, and higher than one, as presented in García et al. [26].

In addition, property (4) can be applied to determine the variable to be raised by simply considering the one with the lowest horizontal asymptote. If the asymptote is lower than 10 (the threshold traditionally established as worrying), the extension could be applied to determine the raising factor by selecting, for example, the first *λ* that verifies *VIF*(*k*, *λ*) < 10 for *k* = 2, ..., *p*. If none of the *p* − 1 asymptotes is lower than the established threshold, raising one independent variable will not be enough, and a successive raise regression is recommended (see García and Ramírez [42] and García et al. [31] for more details). Note that, if it were necessary to raise more than one variable, it is guaranteed that there will be values of the raising parameter that mitigate multicollinearity since, in the extreme case where all the variables of the model are raised, all the VIFs associated with the raised variables tend to one.
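This selection rule can be sketched in R (an illustrative helper, not the authors' code: it obtains each VIF from the auxiliary regressions of the raised data and scans a grid for the first *λ* with all VIFs below 10):

```r
# Sketch: VIFs of the raise regression and first lambda with all VIFs < 10.
raise_vifs <- function(X, i, lambda) {
  # X: n x p matrix of regressors (no intercept column); i: variable to raise
  e_i <- residuals(lm(X[, i] ~ X[, -i]))   # residuals of X_i on the rest
  Xr <- X
  Xr[, i] <- X[, i] + lambda * e_i         # raised variable X_i + lambda * e_i
  sapply(seq_len(ncol(Xr)), function(k) {
    r2 <- summary(lm(Xr[, k] ~ Xr[, -k]))$r.squared
    1 / (1 - r2)                           # VIF of the k-th regressor
  })
}

lambda_vif <- function(X, i, grid = seq(0, 100, by = 0.1)) {
  ok <- sapply(grid, function(l) all(raise_vifs(X, i, l) < 10))
  grid[which(ok)[1]]                       # NA if no lambda on the grid works
}
```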

#### *3.4. Transformation of Variables*

The transformation of data is very common when working with models where strong collinearity exists. For this reason, this section analyzes whether the transformation of the data affects the VIF obtained in the previous section.

Since the expression given by Equation (9) can be written, for *i* = 2, ..., *p*, as a function of $R_i^2$:

$$R\_i^2(\lambda) = \frac{R\_i^2}{1 + (\lambda^2 + 2\lambda) \cdot (1 - R\_i^2)},$$

it is concluded that it is invariant to origin and scale changes and, consequently, the VIF calculated from it will also be invariant.
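As a purely illustrative computation with this expression, a raised variable with $R_i^2 = 0.99$ (that is, $VIF(i) = 100$) and $\lambda = 1$ gives

$$R_i^2(1) = \frac{0.99}{1 + 3 \cdot 0.01} \approx 0.9612, \qquad VIF(i,1) = \frac{1}{1 - 0.9612} \approx 25.8,$$

which shows how quickly the raising reduces the VIF of the raised variable.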

On the other hand, the expression given by Equation (10) can be written, for *j* = 2, ..., *p* with *j* ≠ *i*, as

$$\begin{aligned} R_j^2(\lambda) &= 1 - \frac{RSS_j^{-i,-j}}{TSS_j^{-j}} + \frac{1}{TSS_j^{-j}} \cdot \frac{RSS_i^{-i,-j}\cdot\left(RSS_j^{-i,-j} - RSS_j^{-j}\right)}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)\cdot RSS_i^{-i}} \\ &= R_{ij}^2 + \frac{RSS_i^{-i,-j}}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)\cdot RSS_i^{-i}} \cdot \frac{RSS_j^{-i,-j} - RSS_j^{-j}}{TSS_j^{-j}} \\ &= R_{ij}^2 + \frac{R_j^2 - R_{ij}^2}{1 + (\lambda^2+2\lambda)\cdot\frac{RSS_i^{-i}}{RSS_i^{-i,-j}}}, \end{aligned} \tag{13}$$

where it was applied that $TSS_j^{-j} = TSS_j^{-i,-j}$.

In this case, by following García et al. [26], transforming the variable **X***<sup>i</sup>* as

$$\mathbf{x}_i = \frac{\mathbf{X}_i - a_i}{b_i}, \quad a_i \in \mathbb{R}, \ b_i \in \mathbb{R} - \{0\}, \quad i = 2, \dots, p,$$

it is obtained that $RSS_i^{-i}(T) = \frac{1}{b_i^2} RSS_i^{-i}$ and $RSS_i^{-i,-j}(T) = \frac{1}{b_i^2} RSS_i^{-i,-j}$, where $RSS_i^{-i}(T)$ and $RSS_i^{-i,-j}(T)$ are the residual sums of squares of the transformed variables.

Taking into account that $\mathbf{X}_i$ is the dependent variable in the regressions behind $RSS_i^{-i}$ and $RSS_i^{-i,-j}$, the following is obtained:

$$\frac{RSS\_i^{-i}}{RSS\_i^{-i,-j}} = \frac{RSS\_i^{-i}(T)}{RSS\_i^{-i,-j}(T)}.$$

Then, the expression given by Equation (13) is invariant to data transformations (as long as the dependent variables of the regressions behind $RSS_i^{-i}$ and $RSS_i^{-i,-j}$ are transformed in the same form; for example, (a) taking $a_i$ as the mean and $b_i$ as the standard deviation (typification), (b) taking $a_i$ as the mean and $b_i$ as the standard deviation multiplied by the square root of the number of observations (standardization), or (c) taking $a_i$ as zero and $b_i$ as the square root of the sum of squares of the observations (unit length)) and, consequently, the VIF calculated from it will also be invariant.
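A quick numerical check of this invariance can be run with the helper raise_vifs() sketched in Section 3.3 (the simulated data below are purely illustrative):

```r
set.seed(1)
X <- matrix(rnorm(60), nrow = 20, ncol = 3)
X[, 3] <- X[, 2] + rnorm(20, sd = 0.05)   # induce strong collinearity
Xs <- scale(X)                            # origin and scale change of the data
round(raise_vifs(X, 3, 2) - raise_vifs(Xs, 3, 2), 6)   # differences ~ 0
```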


#### **4. MSE for Raise Regression**

Since the estimator $\widehat{\beta}(\lambda)$ obtained from Equation (5) is biased, it is interesting to study its Mean Square Error (MSE).

Taking into account that, for *k* = 2, . . . , *p*,

$$\begin{aligned} \widetilde{\mathbf{X}}_k &= \mathbf{X}_k + \lambda\mathbf{e}_k \\ &= (1+\lambda)\mathbf{X}_k - \lambda\left(\widehat{\alpha}_0 + \widehat{\alpha}_1\mathbf{X}_1 + \dots + \widehat{\alpha}_{k-1}\mathbf{X}_{k-1} + \widehat{\alpha}_{k+1}\mathbf{X}_{k+1} + \dots + \widehat{\alpha}_p\mathbf{X}_p\right), \end{aligned}$$

it is obtained that the matrix $\widetilde{\mathbf{X}}$ of the expression in Equation (5) can be rewritten as $\widetilde{\mathbf{X}} = \mathbf{X} \cdot \mathbf{M}_\lambda$, where

$$\mathbf{M}\_{\lambda} = \begin{pmatrix} 1 & 0 & \cdots & 0 & -\lambda\widehat{\boldsymbol{\alpha}}\_{0} & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & -\lambda\widehat{\boldsymbol{\alpha}}\_{1} & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & -\lambda\widehat{\boldsymbol{\alpha}}\_{k-1} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & 1 + \lambda & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & -\lambda\widehat{\boldsymbol{\alpha}}\_{k+1} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & -\lambda\widehat{\boldsymbol{\alpha}}\_{p} & 0 & \cdots & 1 \\ \end{pmatrix} \tag{14}$$

Thus, we have $\widehat{\beta}(\lambda) = (\widetilde{\mathbf{X}}^t \cdot \widetilde{\mathbf{X}})^{-1}\widetilde{\mathbf{X}}^t \cdot \mathbf{Y} = \mathbf{M}_\lambda^{-1} \cdot \widehat{\beta}$, and then the estimator of $\beta$ obtained from Equation (5) is biased unless $\mathbf{M}_\lambda = \mathbf{I}$, which only occurs when $\lambda = 0$, that is to say, when the raise regression coincides with OLS. Moreover,

$$\begin{aligned} tr\left(Var\left(\widehat{\beta}(\lambda)\right)\right) &= tr\left(\mathbf{M}_\lambda^{-1} \cdot Var(\widehat{\beta}) \cdot (\mathbf{M}_\lambda^{-1})^t\right) = \sigma^2\, tr\left((\widetilde{\mathbf{X}}^t\widetilde{\mathbf{X}})^{-1}\right), \\ \left(E[\widehat{\beta}(\lambda)] - \beta\right)^t\left(E[\widehat{\beta}(\lambda)] - \beta\right) &= \beta^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})\beta, \end{aligned}$$

where *tr* denotes the trace of a matrix.

In that case, the MSE for raise regression is

$$\begin{aligned} \text{MSE}\left(\widehat{\beta}(\lambda)\right) &= tr\left(Var\left(\widehat{\beta}(\lambda)\right)\right) + \left(E[\widehat{\beta}(\lambda)] - \beta\right)^t\left(E[\widehat{\beta}(\lambda)] - \beta\right) \\ &= \sigma^2\, tr\left((\widetilde{\mathbf{X}}^t\widetilde{\mathbf{X}})^{-1}\right) + \beta^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})\beta \\ &\overset{\text{Appendix C}}{=} \sigma^2\, tr\left(\left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1}\right) + \left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2\cdot\frac{\lambda^2 + h}{(1+\lambda)^2}, \end{aligned}$$

where $h = \frac{\sigma^2}{\beta_k^2 \cdot RSS_k^{-k}}$.

We can obtain the MSE from the estimated values of $\sigma^2$ and $\beta_k$ in the model in Equation (2). On the other hand, once these estimations are obtained and taking into account Appendix C, $\lambda_{min} = \frac{\widehat{\sigma}^2}{\widehat{\beta}_k^2 \cdot RSS_k^{-k}}$ minimizes $\text{MSE}\left(\widehat{\beta}(\lambda)\right)$. Indeed, it is verified that $\text{MSE}\left(\widehat{\beta}(\lambda_{min})\right) < \text{MSE}\left(\widehat{\beta}(0)\right)$; that is to say, if the goal is exclusively to minimize the MSE (as in the work presented by Hoerl et al. [33]), $\lambda_{min}$ should be selected as the raising factor.

Finally, note that, if $\lambda_{min} > 1$, then $\text{MSE}\left(\widehat{\beta}(\lambda)\right) < \text{MSE}\left(\widehat{\beta}(0)\right)$ for all $\lambda > 0$.
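A compact sketch of this computation (illustrative function and argument names; the OLS fit of Equation (2) supplies $\widehat{\sigma}^2$ and $\widehat{\beta}_k$):

```r
# Sketch: estimated lambda_min = sigma^2 / (beta_k^2 * RSS_k^{-k}).
lambda_min <- function(y, X, k) {
  fit <- lm(y ~ X)
  sigma2 <- summary(fit)$sigma^2           # estimated sigma^2
  beta_k <- coef(fit)[k + 1]               # +1 skips the intercept
  RSS_k <- sum(residuals(lm(X[, k] ~ X[, -k]))^2)  # RSS of X_k on the rest
  sigma2 / (beta_k^2 * RSS_k)              # this is also the value of h
}
```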

#### **5. Numerical Examples**

To illustrate the results of the previous sections, two different data sets are used, covering the two situations shown in the graphs of Figures A1 and A2. The second example also compares the results obtained by the raise regression to those obtained by the application of ridge and Lasso regression.

#### *5.1. Example 1: h* < 1

The data set includes different financial variables of 15 Spanish companies for the year 2016 (consolidated accounts and results between €800,000 and €9,000,000) obtained from the Sistema de Análisis de Balances Ibéricos (SABI) database. The relationship is studied between the number of employees, *E*, and the fixed assets (€), *FA*; operating income (€), *OI*; and sales (€), *S*. The model is expressed as

$$E = \beta\_1 + \beta\_2 FA + \beta\_3 OI + \beta\_4 S + \mu. \tag{15}$$

Table 1 displays the results of the estimation by OLS of the model in Equation (15). The presence of essential collinearity in the model in Equation (15) is indicated by the determinant close to zero (0.0000919) of the correlation matrix of the independent variables

$$R = \begin{pmatrix} 1 & 0.7264656 & 0.7225473 \\ 0.7264656 & 1 & 0.9998871 \\ 0.7225473 & 0.9998871 & 1 \end{pmatrix},$$

and by the VIFs (2.45664, 5200.315, and 5138.535) higher than 10. Note that the collinearity is caused fundamentally by the relationship between **OI** and **S**.

In contrast, since the coefficients of variation of the independent variables (1.015027, 0.7469496, and 0.7452014) are higher than 0.1002506, the threshold established as troubling by Salmerón et al. [39], it is possible to conclude that the nonessential multicollinearity is not troubling. Thus, the extension of the VIF seems appropriate to check whether the application of the raise regression has mitigated the multicollinearity.

**Remark 1.** *$\lambda^{(1)}$ and $\lambda^{(2)}$ will denote the raising factors of the first and second raising, respectively.*



#### 5.1.1. First Raising

A possible solution is to apply the raise regression to try to mitigate the collinearity. To decide which variable is raised, the thresholds for the VIFs associated with the raise regression are calculated, with the goal of raising the variable that presents the smallest horizontal asymptote. In addition to raising the variable that presents the lowest VIF, it would be interesting to obtain a lower mean squared error (MSE) after raising. For this, $\lambda^{(1)}_{min}$ is calculated for each case. Results are shown in Table 2. Note that the variable to be raised should be the second or the third since their asymptotes are lower than 10, although in both cases $\lambda^{(1)}_{min}$ is lower than 1 and it is not guaranteed that the MSE of the raise regression will be less than the one obtained from the estimation by OLS of the model in Equation (15). For this reason, this table also shows the values of $\lambda^{(1)}$ that make the MSE of the raise regression coincide with the MSE of the OLS regression, $\lambda^{(1)}_{mse}$, and the minimum value of $\lambda^{(1)}$ that leads to values of VIF less than 10, $\lambda^{(1)}_{vif}$.

**Table 2.** Horizontal asymptotes for the variance inflation factors (VIFs) after raising each variable and $\lambda^{(1)}_{min}$, $\lambda^{(1)}_{mse}$, and $\lambda^{(1)}_{vif}$.


Figure 2 displays the VIFs associated with the raise regression for $0 \leq \lambda^{(1)} \leq 900$ after raising the second variable. It is observed that the VIFs are always higher than their corresponding horizontal asymptotes.

The model after raising the second variable will be given by

$$E = \beta_1(\lambda) + \beta_2(\lambda)\mathbf{FA} + \beta_3(\lambda)\widetilde{\mathbf{OI}} + \beta_4(\lambda)\mathbf{S} + \widetilde{u}, \tag{16}$$

where $\widetilde{\mathbf{OI}} = \mathbf{OI} + \lambda^{(1)} \cdot \mathbf{e}_{OI}$, with $\mathbf{e}_{OI}$ the residuals of the regression:

$$\mathbf{OI} = \boldsymbol{\alpha}\_1 + \boldsymbol{\alpha}\_2 \mathbf{FA} + \boldsymbol{\alpha}\_3 \mathbf{S} + \boldsymbol{\nu}.$$
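As an illustration, this first raising can be reproduced with a few lines of R (a sketch assuming the SABI variables are loaded as the vectors E, FA, OI, and S; not the authors' code):

```r
e_OI <- residuals(lm(OI ~ FA + S))   # residuals of the auxiliary regression
OI_r <- OI + 24.5 * e_OI             # raised variable with lambda(1)_vif = 24.5
fit16 <- lm(E ~ FA + OI_r + S)       # raise regression of Equation (16)
```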

**Figure 2.** VIF of the variables after raising **OI**.

**Remark 2.** *The coefficient of variation of $\widetilde{\mathbf{OI}}$ for $\lambda^{(1)} = 24.5$ is equal to 0.7922063; that is to say, it was slightly increased.*

As can be observed in Table 3, the collinearity in Equation (16) is not mitigated by considering $\lambda^{(1)}$ equal to $\lambda^{(1)}_{min}$ or $\lambda^{(1)}_{mse}$. For this reason, Table 1 only shows the values of the model in Equation (16) for the value of $\lambda^{(1)}$ that leads to VIFs lower than 10.


**Table 3.** VIFs of the regression in Equation (16) for $\lambda^{(1)}$ equal to $\lambda^{(1)}_{min}$, $\lambda^{(1)}_{mse}$, and $\lambda^{(1)}_{vif}$.

#### 5.1.2. Transformation of Variables

After the first raising, it is interesting to verify that the VIF associated with the raise regression is invariant to data transformations. With this goal, the second variable was raised, obtaining $VIF(\mathbf{FA}, \lambda^{(1)})$, $VIF(\widetilde{\mathbf{OI}}, \lambda^{(1)})$, and $VIF(\mathbf{S}, \lambda^{(1)})$ for $\lambda^{(1)} \in \{0, 0.5, 1, 1.5, 2, \dots, 9.5, 10\}$, supposing original, unit-length, and standardized data. Next, the three possible differences and the average of the VIFs associated with each variable are obtained. Table 4 displays the results, from which it is possible to conclude that the differences are almost null and, consequently, that the VIF associated with the raise regression is invariant to the most common data transformations.

**Table 4.** Effect of data transformations on VIF associated with raise regression.


#### 5.1.3. Second Raising

After the first raising, we can use the results obtained from the value of *λ* that makes all VIFs less than 10 or consider the results obtained for $\lambda_{min}$ or $\lambda_{mse}$ and continue the procedure with a second raising. Following the second option, we start from the value $\lambda^{(1)} = \lambda^{(1)}_{min} = 0.42$ obtained after the first raising. From Table 5, the third variable is selected to be raised. Table 6 shows the VIFs associated with the following model for $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$:

$$E = \beta_1(\lambda) + \beta_2(\lambda)\mathbf{FA} + \beta_3(\lambda)\widetilde{\mathbf{OI}} + \beta_4(\lambda)\widetilde{\mathbf{S}} + \widetilde{u}, \tag{17}$$

where $\widetilde{\mathbf{S}} = \mathbf{S} + \lambda^{(2)} \cdot \mathbf{e}_S$, with $\mathbf{e}_S$ the residuals of the regression:

$$\mathbf{S} = \alpha_1(\lambda) + \alpha_2(\lambda)\mathbf{FA} + \alpha_3(\lambda)\widetilde{\mathbf{OI}} + \widetilde{\mathbf{v}}.$$
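The second raising admits the same kind of R sketch (illustrative names; OI_r denotes the variable raised in the first step, here with $\lambda^{(1)} = 0.42$):

```r
OI_r <- OI + 0.42 * residuals(lm(OI ~ FA + S))  # first raising, lambda(1) = 0.42
e_S <- residuals(lm(S ~ FA + OI_r))             # auxiliary regression above
S_r <- S + 17.5 * e_S                           # second raising, lambda(2)_vif = 17.5
fit17 <- lm(E ~ FA + OI_r + S_r)                # raise regression of Equation (17)
```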

**Remark 3.** *The coefficient of variation of $\widetilde{\mathbf{OI}}$ for $\lambda^{(1)} = 0.42$ is equal to 0.7470222, and the coefficient of variation of $\widetilde{\mathbf{S}}$ for $\lambda^{(2)} = 17.5$ is equal to 0.7473472. In both cases, they were slightly increased.*

Note that it is only possible to state that the collinearity has been mitigated when $\lambda^{(2)} = \lambda^{(2)}_{vif} = 17.5$. The results of this estimation are displayed in Table 1.

**Table 5.** Horizontal asymptotes for the VIFs after raising each variable in the second raising for $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$.


**Table 6.** VIFs of the regression in Equation (17) for $\lambda^{(2)}$ equal to $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$.


Considering instead that, after the first raising, $\lambda^{(1)} = \lambda^{(1)}_{mse} = 1.43$ is taken, from Table 7, the third variable is selected to be raised. Table 8 shows the VIFs associated with the following model for $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$:

$$E = \beta_1(\lambda) + \beta_2(\lambda)\mathbf{FA} + \beta_3(\lambda)\widetilde{\mathbf{OI}} + \beta_4(\lambda)\widetilde{\mathbf{S}} + \widetilde{u}, \tag{18}$$

where $\widetilde{\mathbf{S}} = \mathbf{S} + \lambda^{(2)} \cdot \mathbf{e}_S$.

**Remark 4.** *The coefficient of variation of $\widetilde{\mathbf{OI}}$ for $\lambda^{(1)} = 1.43$ is equal to 0.7473033, and the coefficient of variation of $\widetilde{\mathbf{S}}$ for $\lambda^{(2)} = 10$ is equal to 0.7651473. In both cases, they were slightly increased.*

**Remark 5.** *Observing the coefficients of variation of $\widetilde{\mathbf{OI}}$ for different raising factors, it is concluded that the coefficient of variation increases as the raising factor increases: 0.7470222 (λ = 0.42), 0.7473033 (λ = 1.43), and 0.7922063 (λ = 24.5).*

Note that it is only possible to state that the collinearity has been mitigated when $\lambda^{(2)} = \lambda^{(2)}_{vif} = 10$. The results of the estimations of this model are shown in Table 1.

**Table 7.** Horizontal asymptotes for the VIFs after raising each variable in the second raising for $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$.



**Table 8.** VIFs of the regression in Equation (18) for $\lambda^{(2)}$ equal to $\lambda^{(2)}_{min}$, $\lambda^{(2)}_{mse}$, and $\lambda^{(2)}_{vif}$.

#### 5.1.4. Interpretation of Results

Analyzing the results of Table 1, it is possible to conclude that


Thus, in conclusion, the model in Equation (16) is selected, as it presents the smallest MSE and an improvement in the individual significance of the variables.

#### *5.2. Example 2: h* > 1

This example uses the following model previously applied by Klein and Goldberger [43] about consumption and salaries in the United States from 1936 to 1952 (1942 to 1944 were war years, and data are not available):

$$\mathbf{C} = \beta\_1 + \beta\_2 \mathbf{W} \mathbf{I} + \beta\_3 \mathbf{N} \mathbf{W} \mathbf{I} + \beta\_4 \mathbf{F} \mathbf{I} + \mathbf{u},\tag{19}$$

where **C** is consumption, **WI** is wage income, **NWI** is non-wage, non-farm income, and **FI** is farm income. Its estimation by OLS is shown in Table 9.

However, this estimation is questionable since no estimated coefficient is significantly different from zero while the model is globally significant (at the 5% significance level), and the VIFs associated with each variable (12.296, 9.23, and 2.97) indicate the presence of severe essential collinearity. In addition, the determinant of the correlation matrix

$$\mathbf{R} = \begin{pmatrix} 1 & 0.9431118 & 0.8106989 \\ 0.9431118 & 1 & 0.7371272 \\ 0.8106989 & 0.7371272 & 1 \end{pmatrix},$$

is equal to 0.03713592 and, consequently, lower than the threshold recommended by García et al. [44] ($1.013 \cdot 0.1 + 0.00008626 \cdot n - 0.01384 \cdot p = 0.04714764$ with *n* = 14 and *p* = 4); hence, the conclusion that the near multicollinearity existing in this model is troubling is maintained.

Once again, the values of the coefficients of variation (0.2761369, 0.2597991, and 0.2976122) indicate that the nonessential multicollinearity is not troubling (see Salmerón et al. [39]). Thus, the extension of the VIF seems appropriate to check whether the application of the raise regression has mitigated the near multicollinearity.

Next, the estimation of the model by raise regression is presented, and the results are compared to the estimations by ridge and Lasso regression.

#### 5.2.1. Raise Regression

When calculating the thresholds that would be obtained for the VIFs by raising each variable (see Table 10), it is observed that, in all cases, they are less than 10. However, when calculating $\lambda_{min}$ in each case, a value higher than one is only obtained when raising the third variable. Figure 3 displays the MSE for $\lambda \in [0, 37)$. Note that $\text{MSE}(\widehat{\beta}(\lambda))$ is always less than the one obtained by OLS, 49.434, and presents an asymptote: $\lim_{\lambda\to+\infty}\text{MSE}(\widehat{\beta}(\lambda)) = 45.69422$.

**Figure 3.** Mean square error (MSE) for the model in Equation (19) after raising the third variable.


**Table 9.** Estimation of the original and raised models: the standard deviation is inside the parentheses, $R^2$ is the coefficient of determination, $F_{3,10}$ is the experimental value of the joint significance test, and $\widehat{\sigma}^2$ is the variance estimate of the random perturbation.


**Table 10.** Horizontal asymptotes for the VIFs after raising each variable and $\lambda_{min}$.

The following model is obtained by raising the third variable:

$$\mathbf{C} = \beta_1(\lambda) + \beta_2(\lambda)\mathbf{WI} + \beta_3(\lambda)\mathbf{NWI} + \beta_4(\lambda)\widetilde{\mathbf{FI}} + \widetilde{u}, \tag{20}$$

where $\widetilde{\mathbf{FI}} = \mathbf{FI} + \lambda \cdot \mathbf{e}_{FI}$, with $\mathbf{e}_{FI}$ the residuals of the regression:

$$\mathbf{FI} = \kappa_1 + \kappa_2\mathbf{WI} + \kappa_3\mathbf{NWI} + \mathbf{v}.$$
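This raising follows the same R pattern as in Example 1 (a sketch assuming the Klein and Goldberger variables are loaded as C, WI, NWI, and FI):

```r
e_FI <- residuals(lm(FI ~ WI + NWI))  # auxiliary regression above
FI_r <- FI + 6.895 * e_FI             # raised variable with lambda = lambda_min
fit20 <- lm(C ~ WI + NWI + FI_r)      # raise regression of Equation (20)
```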

**Remark 6.** *The coefficient of variation of $\widetilde{\mathbf{FI}}$ for λ = 6.895 is 1.383309. Thus, the application of the raise regression has mitigated the nonessential multicollinearity in this variable.*

Table 9 shows the results for the model in Equation (20) with λ = 6.895. In this case, the MSE is the lowest possible among all possible values of *λ* and lower than the one obtained by OLS for the model in Equation (19). Furthermore, the collinearity is no longer strong since all the VIFs are lower than 10 (9.098, 9.049, and 1.031, respectively). However, the individual significance of the variables was not improved.

With the purpose of improving this situation, another variable is raised. If the first variable is selected to be raised, the following model is obtained:

$$\mathbf{C} = \beta_1(\lambda) + \beta_2(\lambda)\widetilde{\mathbf{WI}} + \beta_3(\lambda)\mathbf{NWI} + \beta_4(\lambda)\mathbf{FI} + \widetilde{u}, \tag{21}$$

where $\widetilde{\mathbf{WI}} = \mathbf{WI} + \lambda \cdot \mathbf{e}_{WI}$, with $\mathbf{e}_{WI}$ the residuals of the regression:

$$\mathbf{WI} = \alpha_1 + \alpha_2\mathbf{NWI} + \alpha_3\mathbf{FI} + \mathbf{v}.$$

**Remark 7.** *The coefficient of variation of $\widetilde{\mathbf{WI}}$ for λ = 0.673 is 0.2956465. Thus, it is noted that the raise regression has slightly mitigated the nonessential multicollinearity of this variable.*

Table 9 shows the results for the model in Equation (21) with λ = 0.673. In this case, the MSE is lower than the one obtained by OLS for the model in Equation (19). Furthermore, the collinearity is no longer strong since all the VIFs are lower than 10 (5.036024, 4.705204, and 2.470980, respectively). Note that, raising this variable, the values of the VIFs are lower than when raising the third variable, but the MSE is higher. However, this model is selected as preferable since its individual significance is better and its MSE is still lower than the one obtained by OLS.

#### 5.2.2. Ridge Regression

This subsection presents the estimation of the model in Equation (19) by ridge regression (see Hoerl and Kennard [4] or Marquardt [45]). The first step is the selection of an appropriate value of *K*.

The following suggestions are addressed:


• García et al. [44] proposed the following values:

$$\begin{aligned} K_{exp} &= 0.006639 \cdot e^{1-\det(\mathbf{R})} - 0.00001241 \cdot n + 0.005745 \cdot p, \\ K_{linear} &= 0.01837 \cdot (1 - \det(\mathbf{R})) - 0.00001262 \cdot n + 0.005678 \cdot p, \\ K_{sq} &= 0.7922 \cdot (1 - \det(\mathbf{R}))^2 - 0.6901 \cdot (1 - \det(\mathbf{R})) - 0.000007567 \cdot n - 0.01081 \cdot p, \end{aligned}$$

where det(**R**) denotes the determinant of the correlation matrix, **R**.

The following values are obtained: $K_{HKB} = 0.417083$, $K_{VIF} = 0.013$, $K_{exp} = 0.04020704$, $K_{linear} = 0.04022313$, and $K_{sq} = 0.02663591$.
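These last three values can be reproduced directly from the formulas above (a sketch; det(**R**), *n*, and *p* are taken from this example):

```r
detR <- 0.03713592; n <- 14; p <- 4
K_exp <- 0.006639 * exp(1 - detR) - 0.00001241 * n + 0.005745 * p
K_linear <- 0.01837 * (1 - detR) - 0.00001262 * n + 0.005678 * p
K_sq <- 0.7922 * (1 - detR)^2 - 0.6901 * (1 - detR) - 0.000007567 * n - 0.01081 * p
round(c(K_exp, K_linear, K_sq), 8)  # approx. 0.04020704, 0.04022313, 0.02663591
```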

Tables 11 and 12 show the estimations obtained from the ridge estimators (expression (1)) and the individual significance intervals obtained by bootstrap, considering percentiles 5 and 95 with 5000 repetitions (the results for $K_{linear}$ are not shown, as they are very similar to those obtained with $K_{exp}$). The goodness of fit, following the results shown by Rodríguez et al. [28], and the MSE are also calculated.

Note that only the constant term can be considered significantly different from zero and that, curiously, the value of *K* proposed by Hoerl et al. [33] leads to a value of the MSE higher than the one from OLS, while the values proposed by García et al. [26] and García et al. [44] lead to values of the MSE lower than the one obtained by OLS. All cases lead to values of VIF lower than 10 (see García et al. [26] for their calculation).


In any case, the lack of individual significance justifies the selection of the raise regression as preferable in comparison to the models obtained by ridge regression.

**Table 11.** Estimation of the ridge models for $K_{HKB} = 0.417083$ and $K_{VIF} = 0.013$: the bootstrap confidence interval (percentiles 5 and 95) is inside the parentheses, and $R^2$ is the coefficient of determination obtained from Rodríguez et al. [28].


**Table 12.** Estimation of the ridge models for $K_{exp} = 0.04020704$ and $K_{sq} = 0.02663591$: the bootstrap confidence interval (percentiles 5 and 95) is inside the parentheses, and $R^2$ is the coefficient of determination obtained from Rodríguez et al. [28].


#### 5.2.3. Lasso Regression

Lasso regression (see Tibshirani [5]) is a method initially designed to select variables by constraining coefficients to zero, being especially useful in models with a high number of independent variables. However, this estimation methodology has also been widely applied in situations where the model presents worrying near multicollinearity.

Table 13 shows the results obtained by the application of Lasso regression to the model in Equation (19) by using the package *glmnet* of the programming environment R (R Core Team [46]). Note that these estimations are obtained for the optimal value λ = 0.1258925 obtained after k-fold cross-validation.
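A minimal sketch of this kind of fit with *glmnet* (variable names as in Equation (19); not the authors' exact code):

```r
library(glmnet)
X <- cbind(WI, NWI, FI)
cv <- cv.glmnet(X, C, alpha = 1)   # alpha = 1 selects the Lasso penalty
coef(cv, s = "lambda.min")         # coefficients at the cross-validated lambda
```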

**Table 13.** Estimation of the Lasso model for λ = 0.1258925: the bootstrap confidence interval (percentiles 5 and 95) is inside the parentheses.


The inference obtained by the bootstrap methodology (with 5000 repetitions) allows us to conclude that, in at least 5% of the cases, the coefficient of **NWI** is constrained to zero. Thus, this variable should be eliminated from the model.

However, we consider that this situation should be avoided and, as an alternative to the elimination of the variable, that is, as an alternative to the following model, the estimation by raise or ridge regression is proposed:

$$\mathbf{C} = \pi_1 + \pi_2\mathbf{WI} + \pi_3\mathbf{FI} + \varepsilon. \tag{22}$$

It could also be appropriate to apply the residualization method (see, for example, York [47], Salmerón et al. [48], and García et al. [44]), which consists in the estimation of the following model:

$$\mathbf{C} = \tau\_1 + \tau\_2 \mathbf{WI} + \tau\_3 \mathbf{FI} + \tau\_4 \mathbf{res}\_{\mathbf{NWI}} + \varepsilon,\tag{23}$$

where, for example, **res<sub>NWI</sub>** represents the residuals of the regression of **NWI** as a function of **WI**, which is interpreted as the part of **NWI** not related to **WI**. In this case (see García et al. [44]), it is verified that $\widehat{\pi}_i = \widehat{\tau}_i$ for *i* = 1, 2, 3. That is to say, the model in Equation (23) estimates the same relationship of **WI** and **FI** with **C** as the model in Equation (22), with the benefit that the variable **NWI** is not eliminated, since a part of it is still considered.
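The residualization in Equation (23) is a two-line computation in R (a sketch with the same illustrative variable names):

```r
res_NWI <- residuals(lm(NWI ~ WI))   # part of NWI not related to WI
fit23 <- lm(C ~ WI + FI + res_NWI)   # tau_1, tau_2, tau_3 match Equation (22)
```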

#### **6. Conclusions**

The Variance Inflation Factor (VIF) is one of the most applied measures to diagnose collinearity, together with the Condition Number (CN). Once collinearity is detected, different methodologies can be applied, such as the raise regression, but it is then required to check whether the methodology has mitigated the collinearity effectively. This paper extends the concept of the VIF so that it can be applied after the raise regression and presents an expression of the VIF that verifies the desirable properties discussed in Section 3.3 (see García et al. [26]): it is continuous at *λ* = 0 (where it coincides with the VIF of the OLS model), it decreases as the raising factor increases, and it is always higher than one.


The paper also shows that the VIF in the raise regression is scale invariant, scaling being a very common transformation when working with models with collinearity. Thus, it yields identical results regardless of whether predictions are based on unstandardized or standardized predictors. In contrast, the VIFs obtained from other penalized regressions (ridge regression, Lasso, and Elastic Net) are not scale invariant and hence yield different results depending on the predictor scaling used.

Another contribution of this paper is the analysis of the asymptotic behavior of the VIF associated with the raised variable (verifying that its limit is equal to 1) and with the rest of the variables (presenting a horizontal asymptote). This analysis allows us to conclude that the variable to be raised can be selected as the one with the lowest horizontal asymptote and that, if no asymptote is below the established threshold, raising a single variable will not be enough to mitigate the multicollinearity.


On the other hand, since the raise estimator is biased, the paper analyzes its Mean Square Error (MSE), showing that there is a value of *λ* that minimizes it and that the resulting MSE can be lower than the one obtained by OLS. However, it is not guaranteed that the VIF for this value of *λ* is below the established thresholds. The results are illustrated with two numerical examples; in the second one, the results obtained by OLS are compared to those obtained with the raise, ridge, and Lasso regressions, which are widely applied to estimate models with worrying multicollinearity. It is shown that the raise regression can compete with and even outperform these methodologies.

Finally, we propose the following questions as future lines of research:


**Funding:** This research received no external funding.

**Acknowledgments:** We thank the anonymous referees for their useful suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

**Author Contributions:** conceptualization, J.G.P., C.G.G. and R.S.G. and A.R.S.; methodology, R.S.G. and A.R.S.; software, A.R.S.; validation, J.G.P., R.S.G. and C.G.G.; formal analysis, R.S.G. and C.G.G.; investigation, R.S.G. and A.R.S.; writing—original draft preparation, A.R.S. and C.G.G.; writing—review and editing, C.G.G.; supervision, J.G.P. All authors have read and agreed to the published version of the manuscript.

#### **Appendix A**

Given the linear model in Equation (7), it is obtained that

$$\begin{aligned} \widehat{\alpha}(\lambda) &= \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j} & \mathbf{X}_{-i,-j}^t\widetilde{\mathbf{X}}_i \\ \widetilde{\mathbf{X}}_i^t\mathbf{X}_{-i,-j} & \widetilde{\mathbf{X}}_i^t\widetilde{\mathbf{X}}_i \end{pmatrix}^{-1} \cdot \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ \widetilde{\mathbf{X}}_i^t\mathbf{X}_j \end{pmatrix} = \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j} & \mathbf{X}_{-i,-j}^t\mathbf{X}_i \\ \mathbf{X}_i^t\mathbf{X}_{-i,-j} & \mathbf{X}_i^t\mathbf{X}_i + (\lambda^2+2\lambda)RSS_i^{-i} \end{pmatrix}^{-1} \cdot \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ \mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} \\ &= \begin{pmatrix} A(\lambda) & B(\lambda) \\ B(\lambda)^t & C(\lambda) \end{pmatrix} \cdot \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ \mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} = \begin{pmatrix} A(\lambda)\cdot\mathbf{X}_{-i,-j}^t\mathbf{X}_j + B(\lambda)\cdot\mathbf{X}_i^t\mathbf{X}_j \\ B(\lambda)^t\cdot\mathbf{X}_{-i,-j}^t\mathbf{X}_j + C(\lambda)\cdot\mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} = \begin{pmatrix} \widehat{\alpha}_{-i,-j}(\lambda) \\ \widehat{\alpha}_i(\lambda) \end{pmatrix}. \end{aligned}$$

Since it is verified that **e***<sup>t</sup> i* **<sup>X</sup>**−*i*,−*<sup>j</sup>* <sup>=</sup> **<sup>0</sup>**, then **<sup>X</sup>**.*<sup>t</sup> i* **<sup>X</sup>**−*i*,−*<sup>j</sup>* = (**X***<sup>i</sup>* <sup>+</sup> *<sup>λ</sup>***e***i*)*<sup>t</sup>* **<sup>X</sup>**−*i*,−*<sup>j</sup>* <sup>=</sup> **<sup>X</sup>***<sup>t</sup> i* **X**−*i*,−*j*, where

$$\begin{aligned} C(\lambda) &= \left(\mathbf{X}_i^t\mathbf{X}_i + (\lambda^2+2\lambda)RSS_i^{-i} - \mathbf{X}_i^t\mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_i\right)^{-1} \\ &= \left(\mathbf{X}_i^t\left(\mathbf{I} - \mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\right)\mathbf{X}_i + (\lambda^2+2\lambda)RSS_i^{-i}\right)^{-1} = \left(RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}\right)^{-1}, \\ B(\lambda) &= -\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_i \cdot C(\lambda) = \frac{RSS_i^{-i,-j}}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot B, \\ A(\lambda) &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1} + \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_i \cdot C(\lambda) \cdot \mathbf{X}_i^t\mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1} \\ &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1} + \frac{(RSS_i^{-i,-j})^2}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot B \cdot B^t, \end{aligned}$$

with $B$ as defined in Appendix B.

Then,

$$\begin{aligned} \widehat{\alpha}_{-i,-j}(\lambda) &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_j + \frac{(RSS_i^{-i,-j})^2}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot B \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ &\quad + \frac{RSS_i^{-i,-j}}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot B \cdot \mathbf{X}_i^t\mathbf{X}_j \\ &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_j + \frac{RSS_i^{-i,-j}\left(RSS_i^{-i,-j} \cdot B \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + B \cdot \mathbf{X}_i^t\mathbf{X}_j\right)}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}}, \\ \widehat{\alpha}_i(\lambda) &= \frac{RSS_i^{-i,-j}}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot \left(B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + C \cdot \mathbf{X}_i^t\mathbf{X}_j\right) = \frac{RSS_i^{-i,-j}}{RSS_i^{-i,-j} + (\lambda^2+2\lambda)RSS_i^{-i}} \cdot \widehat{\alpha}_i. \end{aligned}$$

#### **Appendix B**

Given the linear model

$$\mathbf{X}_j = \mathbf{X}_{-j}\boldsymbol{\alpha} + \mathbf{v} = \begin{pmatrix} \mathbf{X}_{-i,-j} & \mathbf{X}_i \end{pmatrix}\begin{pmatrix} \boldsymbol{\alpha}_{-i,-j} \\ \boldsymbol{\alpha}_i \end{pmatrix} + \mathbf{v},$$


it is obtained that

$$\begin{aligned} \widehat{\alpha} &= \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j} & \mathbf{X}_{-i,-j}^t\mathbf{X}_i \\ \mathbf{X}_i^t\mathbf{X}_{-i,-j} & \mathbf{X}_i^t\mathbf{X}_i \end{pmatrix}^{-1} \cdot \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ \mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} = \begin{pmatrix} A & B \\ B^t & C \end{pmatrix} \cdot \begin{pmatrix} \mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ \mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} \\ &= \begin{pmatrix} A \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + B \cdot \mathbf{X}_i^t\mathbf{X}_j \\ B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + C \cdot \mathbf{X}_i^t\mathbf{X}_j \end{pmatrix} = \begin{pmatrix} \widehat{\alpha}_{-i,-j} \\ \widehat{\alpha}_i \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} C &= \left(\mathbf{X}_i^t\mathbf{X}_i - \mathbf{X}_i^t\mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_i\right)^{-1} \\ &= \left(\mathbf{X}_i^t\left(\mathbf{I} - \mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\right)\mathbf{X}_i\right)^{-1} = \left(RSS_i^{-i,-j}\right)^{-1}, \\ B &= -\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_i \cdot C, \\ A &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\cdot\left(\mathbf{I} + \mathbf{X}_{-i,-j}^t\mathbf{X}_i \cdot C \cdot \mathbf{X}_i^t\mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\right) \\ &= \left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1} + \frac{1}{C} \cdot B \cdot B^t. \end{aligned}$$

In that case, the residual sum of squares is given by

$$\begin{aligned} RSS_j^{-j} &= \mathbf{X}_j^t\mathbf{X}_j - \left(A \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + B \cdot \mathbf{X}_i^t\mathbf{X}_j\right)^t\mathbf{X}_{-i,-j}^t\mathbf{X}_j - \widehat{\alpha}_i^t\mathbf{X}_i^t\mathbf{X}_j \\ &= \mathbf{X}_j^t\mathbf{X}_j - \mathbf{X}_j^t\mathbf{X}_{-i,-j} \cdot A^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j - \mathbf{X}_j^t\mathbf{X}_i \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j - \widehat{\alpha}_i^t\mathbf{X}_i^t\mathbf{X}_j \\ &= \mathbf{X}_j^t\mathbf{X}_j - \mathbf{X}_j^t\mathbf{X}_{-i,-j}\left(\mathbf{X}_{-i,-j}^t\mathbf{X}_{-i,-j}\right)^{-1}\mathbf{X}_{-i,-j}^t\mathbf{X}_j \\ &\quad - RSS_i^{-i,-j} \cdot \mathbf{X}_j^t\mathbf{X}_{-i,-j} \cdot B \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j - \mathbf{X}_j^t\mathbf{X}_i \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j - \widehat{\alpha}_i^t\mathbf{X}_i^t\mathbf{X}_j \\ &= RSS_j^{-i,-j} - \left(RSS_i^{-i,-j} \cdot \mathbf{X}_j^t\mathbf{X}_{-i,-j} \cdot B \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + \mathbf{X}_j^t\mathbf{X}_i \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + \widehat{\alpha}_i^t\mathbf{X}_i^t\mathbf{X}_j\right), \end{aligned}$$

and consequently

$$RSS_j^{-i,-j} - RSS_j^{-j} = RSS_i^{-i,-j} \cdot \mathbf{X}_j^t\mathbf{X}_{-i,-j} \cdot B \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + \mathbf{X}_j^t\mathbf{X}_i \cdot B^t \cdot \mathbf{X}_{-i,-j}^t\mathbf{X}_j + \widehat{\alpha}_i^t\mathbf{X}_i^t\mathbf{X}_j.$$

#### **Appendix C**

First, starting from the expression in Equation (14), it is obtained that

$$\mathbf{M}_\lambda^{-1} = \begin{pmatrix} 1 & 0 & \cdots & 0 & \frac{\lambda}{1+\lambda}\widehat{\alpha}_0 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 & \frac{\lambda}{1+\lambda}\widehat{\alpha}_1 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 & \frac{\lambda}{1+\lambda}\widehat{\alpha}_{k-1} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & \frac{1}{1+\lambda} & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 & \frac{\lambda}{1+\lambda}\widehat{\alpha}_{k+1} & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 & \frac{\lambda}{1+\lambda}\widehat{\alpha}_p & 0 & \cdots & 1 \end{pmatrix},$$


and then,

$$(\mathbf{M}_\lambda^{-1} - \mathbf{I})^t(\mathbf{M}_\lambda^{-1} - \mathbf{I}) = \begin{pmatrix} 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & a(\lambda) & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \\ \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & 0 & 0 & \cdots & 0 \end{pmatrix},$$

where $a(\lambda) = \frac{\lambda^2}{(1+\lambda)^2}\cdot\left(\widehat{\alpha}_0^2 + \widehat{\alpha}_1^2 + \cdots + \widehat{\alpha}_{k-1}^2 + 1 + \widehat{\alpha}_{k+1}^2 + \cdots + \widehat{\alpha}_p^2\right)$. In that case,

$$\beta^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})^t(\mathbf{M}_\lambda^{-1} - \mathbf{I})\beta = a(\lambda) \cdot \beta_k^2.$$

Second, partitioning $\widetilde{\mathbf{X}}$ in the form $\widetilde{\mathbf{X}} = \begin{pmatrix} \mathbf{X}_{-k} & \widetilde{\mathbf{X}}_k \end{pmatrix}$, it is obtained that

$$(\widetilde{\mathbf{X}}^t\widetilde{\mathbf{X}})^{-1} = \begin{pmatrix} \left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1} + \frac{\widehat{\alpha}\widehat{\alpha}^t}{(1+\lambda)^2 \cdot \mathbf{e}_k^t\mathbf{e}_k} & -\frac{\widehat{\alpha}}{(1+\lambda)^2 \cdot \mathbf{e}_k^t\mathbf{e}_k} \\ -\frac{\widehat{\alpha}^t}{(1+\lambda)^2 \cdot \mathbf{e}_k^t\mathbf{e}_k} & \frac{1}{(1+\lambda)^2 \cdot \mathbf{e}_k^t\mathbf{e}_k} \end{pmatrix},$$

and then,

$$tr\left((\widetilde{\mathbf{X}}^t\widetilde{\mathbf{X}})^{-1}\right) = tr\left(\left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1}\right) + \frac{1}{(1+\lambda)^2 \cdot \mathbf{e}_k^t\mathbf{e}_k} \cdot \left(tr\left(\widehat{\alpha}\widehat{\alpha}^t\right) + 1\right).$$

Consequently, it is obtained that

$$\text{MSE}\left(\widehat{\beta}(\lambda)\right) = \sigma^2\, tr\left(\left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1}\right) + \left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2\cdot\frac{\lambda^2 + h}{(1+\lambda)^2}, \tag{A1}$$

where $h = \frac{\sigma^2}{\beta_k^2 \cdot RSS_k^{-k}}$.

Third, the first and second derivatives of the expression in Equation (A1) are, respectively,

$$\begin{aligned} \frac{\partial}{\partial\lambda}\text{MSE}\left(\widehat{\beta}(\lambda)\right) &= \left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2\cdot\frac{2(\lambda - h)}{(1+\lambda)^3}, \\ \frac{\partial^2}{\partial\lambda^2}\text{MSE}\left(\widehat{\beta}(\lambda)\right) &= -2\left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2\cdot\frac{2\lambda - (1+3h)}{(1+\lambda)^4}. \end{aligned}$$

Since $\lambda \geq 0$, it is obtained that $\text{MSE}\left(\widehat{\beta}(\lambda)\right)$ is decreasing if $\lambda < h$ and increasing if $\lambda > h$, and it is concave if $\lambda > \frac{1+3h}{2}$ and convex if $\lambda < \frac{1+3h}{2}$.

Indeed, given that

$$\begin{aligned} \lim_{\lambda\to+\infty}\text{MSE}\left(\widehat{\beta}(\lambda)\right) &= \sigma^2\, tr\left(\left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1}\right) + \left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2, \\ \text{MSE}\left(\widehat{\beta}(0)\right) &= \sigma^2\, tr\left(\left(\mathbf{X}_{-k}^t\mathbf{X}_{-k}\right)^{-1}\right) + \left(1 + \sum_{j=0,\, j\neq k}^{p} \widehat{\alpha}_j^2\right)\cdot\beta_k^2\cdot h, \end{aligned} \tag{A2}$$

if $h > 1$, then $\text{MSE}\left(\widehat{\beta}(0)\right) > \lim_{\lambda\to+\infty}\text{MSE}\left(\widehat{\beta}(\lambda)\right)$, and if $h < 1$, then $\text{MSE}\left(\widehat{\beta}(0)\right) < \lim_{\lambda\to+\infty}\text{MSE}\left(\widehat{\beta}(\lambda)\right)$. That is to say, if $h > 1$, the raise estimator always presents a lower MSE than the one obtained by OLS, and, comparing the expressions in Equations (A1) and (A2) when $h < 1$, $\text{MSE}\left(\widehat{\beta}(\lambda)\right) \leq \text{MSE}\left(\widehat{\beta}(0)\right)$ if $\lambda \leq \frac{2h}{1-h}$.
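For completeness, the last bound follows from Equation (A1) by direct algebra (with $h < 1$ and $\lambda > 0$):

$$\text{MSE}\left(\widehat{\beta}(\lambda)\right) \leq \text{MSE}\left(\widehat{\beta}(0)\right) \iff \frac{\lambda^2 + h}{(1+\lambda)^2} \leq h \iff \lambda^2 + h \leq h(1+\lambda)^2 \iff \lambda^2(1-h) \leq 2h\lambda \iff \lambda \leq \frac{2h}{1-h}.$$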

From this information, the behavior of the MSE is represented in Figures A1 and A2. Note that the MSE presents a minimum value for *λ* = *h*.

#### **References**


