1. Introduction
Actuarial, financial, and economic literature abounds with models and analyses of background, or systematic, risks that affect decision making (cf., e.g., Finkelshtain et al. 1999; Franke et al. 2006, 2011; Nachman 1982; Pratt 1998; Guo et al. 2018; Furman et al. 2018; and references therein). Various models have been proposed, including additive, multiplicative, and more intricate ones that couple underlying losses (or, generally speaking, inputs) with background risks. For recent far-reaching contributions to this area, we refer to Perote et al. (2015), Su (2016), Su and Furman (2017a, 2017b), Semenikhine et al. (2018), and Guo et al. (2018), as well as to the extensive lists of references therein.
Whether or not these risks affect the underlying input variables, and thus decision making, is a problem of immense interest. From the conceptual point of view, broadly speaking, two scenarios arise. First, if it is suspected that the outputs are affected, then testing whether or not this is indeed the case falls, in a sense, within the context of regression analysis, though additional statistical challenges arise (e.g., Perote and Perote-Peña 2004; Perote et al. 2015; Chen et al. 2018; Gribkova and Zitikis 2018). The second scenario, which is the main topic of the present paper, deals with the case when it is the inputs that are possibly affected by risks.
Statistically speaking, the input and output random variables X and Y, respectively, are connected in the risk-free scenario by a “transfer” function h via the equation

Y = h(X).   (1)

We wish to have an algorithm that would tell us whether risk-free model (1) is true or the risk-contaminated one

Y = h(X + δ),   (2)

where δ is an exogenous risk, sometimes called input-reading error, that directly affects the input X and thus, indirectly, the output variable as well. We note that Chen et al. (2018) consider model (1) with deterministic inputs, like those to be defined in Equation (3) below. Gribkova and Zitikis (2018) explore risk-free model (1), which can be viewed as the “null hypothesis” in the context of the present paper. Hence, model (2) can be viewed as the “alternative hypothesis,” and the algorithm to be constructed and illustrated in this paper will distinguish between the two hypotheses.
The rest of the paper is organized as follows. In Section 2, we lay out the foundations for assessing the presence, or absence, of input-affecting risks. In Section 3, we describe the algorithm itself. It relies on two statistics whose roles, interrelationship, and asymptotic properties are presented in Section 4 and Section 5. Section 6 concludes the paper with a brief overview of the main findings.
2. The Model
Systems are usually associated with finite-length transfer windows, say [a, b], and also with transfer functions h defined on them. Let X_1, …, X_n be input random variables, which we assume to be pre-whitened (e.g., Box et al. 2015), that is, independent and identically distributed (iid). Denote their marginal cumulative distribution function (cdf) by F, whose support is the transfer window [a, b]. Hence, the input values are always in [a, b]. We assume that the cdf F is strictly increasing on the interval [a, b], with F(a) = 0 and F(b) = 1. In fact, to simplify mathematics and still cover a wide variety of applications, we assume that the cdf is continuously differentiable and its probability density function (pdf) is bounded away from 0 on the transfer window [a, b].

Denote the input-affecting risks by δ_1, …, δ_n, which act upon the inputs X_1, …, X_n as visualized in Figure 1.
We assume that the input-affecting risks are pre-whitened, that is, iid random variables, and we also assume that they are independent of the input variables and affect their values in the additive way. The inputs take values in the interval [a, b], but the risks δ_1, …, δ_n, being exogenous variables, are not restricted to any domain and can therefore take any real values. Our goal in this paper is to offer a practical way for detecting whether the risks are absent or present. The following two notes relate our research to topics in the statistical literature.
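To make the model concrete, the following minimal Python sketch simulates inputs passing through a transfer window under additive risks. The window [a, b], the transfer function h, and the risk distribution used here are illustrative assumptions, and the forcing of affected inputs back into the window anticipates the convention spelled out in the next section.

```python
import random

def transfer(inputs, risks, h, a, b):
    """Pass each input through the window [a, b]: the risk is added to
    the input, the sum is forced back into the window, and the transfer
    function h is applied to the result."""
    outputs = []
    for x, d in zip(inputs, risks):
        w = min(max(x + d, a), b)  # affected input, clipped to [a, b]
        outputs.append(h(w))
    return outputs

# Illustration with a hypothetical window, transfer function, and risk size.
rng = random.Random(1)
a, b = 0.0, 1.0
h = lambda x: x * x
xs = [rng.uniform(a, b) for _ in range(5)]        # iid inputs in [a, b]
deltas = [rng.gauss(0.0, 0.3) for _ in range(5)]  # iid additive risks
print(transfer(xs, deltas, h, a, b))
```

Setting all risks to 0 recovers the risk-free model, in which the outputs are simply h evaluated at the original inputs.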
First, the problem that we tackle is different from that dealing with errors-in-variables, where observations already contain errors, whereas in our case, the inputs are uncontaminated but possibly become such while being transferred into the filter, also known as the transmission channel in the engineering literature. That is, in the errors-in-variables scenario, we would observe the already contaminated values X_i + δ_i, whereas in the present context we observe the original inputs X_i and want to know whether or not they are affected by the risks δ_i during the transfer.
Second, there is a connection between our research and classical regression, and we have already noted the contributions by Perote and Perote-Peña (2004) and Perote et al. (2015), where we also find extensive lists of related references. Namely, given the outputs Y_i = h(X_i + δ_i) and assuming for the sake of argument that the risks δ_i are small, the Taylor formula gives the approximation Y_i ≈ h(X_i) + δ_i h′(X_i), which places the input-based scenario into the output-based scenario Y_i = h(X_i) + ε_i, but the risks ε_i = δ_i h′(X_i) depend on the inputs X_i via the term h′(X_i). This dependence feature presents a major hurdle, which we circumvent in our following considerations, producing a user-friendly algorithm for detecting the risks δ_i when they are present.
Throughout the paper we assume that the transfer function has a bounded and continuous first derivative, and we also assume that the derivative is not identically equal to 0, thus ruling out the trivial case of constant transfer functions. Actually, throughout the paper we also exclude the case h(a) = h(b), which causes some technical complications but is hardly of practical relevance, as we shall explain in the next section. If, however, due to some considerations we would need to depart from these conditions, then there is room for relaxing them, though naturally at the expense of more complex considerations.
3. The Algorithm
We first elaborate on the definition of outputs. Indeed, even though the inputs X_i are in the transfer window [a, b], the affected inputs X_i + δ_i may or may not be in [a, b], which is the domain of definition of the transfer function h. Hence, the actual outputs are

Y_i = h(W_i),

where W_i = min{max{X_i + δ_i, a}, b} is the affected input forced back into the transfer window. Since the cdf of X is continuous, we can uniquely order the random variables X_1, …, X_n. The resulting order statistics X_{1:n} ≤ ⋯ ≤ X_{n:n} give rise to the concomitants Y_{1,n}, …, Y_{n,n} (e.g., David and Nagaraja 2003). Based on them, we define two statistics and then, in turn, their ratio.
The algorithm, to be introduced in a moment, for detecting input-affecting risks is based on the asymptotics, as n gets large, of the ratio and one of the statistics, which we call the pivot and its supporter, respectively, thus hinting at their main and supporting roles. Before formulating the algorithm, we make the natural assumption that the risks, when they exist, should not be so large that the performance of the system would be derailed to such an extent that it becomes unnecessary to run any algorithm. For the purpose of rigour, in the following definition we summarize the circumstances under which there is ambiguity as to the absence, or presence, of input-reading risks, and thus employing the algorithm becomes warranted.
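Since the displayed formulas for the two statistics are not reproduced above, the sketch below assumes concrete forms for illustration only: the supporter is taken to be the total variation of the concomitant sequence, and the pivot the fraction of that variation coming from increases, in the spirit of the index of increase of Davydov and Zitikis (2017). These formulas are assumptions of this sketch, not a restatement of the paper's definitions.

```python
def pivot_and_supporter(xs, ys):
    """Order the sample by the inputs, carry the outputs along as
    concomitants, and form the (assumed) supporter and pivot."""
    concomitants = [y for _, y in sorted(zip(xs, ys))]
    diffs = [concomitants[i + 1] - concomitants[i]
             for i in range(len(concomitants) - 1)]
    supporter = sum(abs(d) for d in diffs)  # assumed: total variation
    if supporter == 0.0:
        return float("nan"), 0.0
    pivot = sum(d for d in diffs if d > 0) / supporter  # assumed: share of increases
    return pivot, supporter

# For an increasing transfer function and risk-free data, every concomitant
# difference is an increase, so the assumed pivot equals 1.
xs = [0.1, 0.5, 0.3]
print(pivot_and_supporter(xs, [x * x for x in xs]))
```

Under these assumed forms, a bounded supporter signals that the concomitants behave like a smooth function of the ordered inputs, whereas unbounded growth signals oscillation induced by the risks.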
Definition 1. The presence of input-affecting risk is suspected, and thus becomes a subject for testing, when it is believed that there is a set B ⊂ [a, b] such that the event {X ∈ B} has a (strictly) positive probability and, for all x ∈ B, the random variable h(min{max{x + δ, a}, b}) is non-degenerate, due to the random δ.
We note at the outset that Definition 1 is a user-friendly reformulation of the technical-looking condition (10) to be presented in Section 5 below, where it plays a pivotal role in setting rigorous mathematical foundations for our algorithm. In this regard, we note that the condition is tightly tied to the indefinite growth of the supporter when the sample size n grows, as we shall see in Theorem 3 below. Hence, if the subject-matter knowledge is not sufficiently convincing for the decision maker to see whether or not the circumstances delineated by Definition 1 hold, then data-based checking of the asymptotic behaviour of the supporter for large n should clarify the situation.
Definition 1 implies that the system’s output varies not just because of X but also because of δ, assuming of course that the latter is present, that is, is not degenerate at 0. This, for example, excludes situations (as unquestionably obvious) when x + δ > b for every x (i.e., when δ is very large), or when x + δ < a for every x (i.e., when −δ is very large). In either of these extreme cases, the decision maker would immediately see the system’s malfunction because of the outputs constantly lingering on, or near, the boundaries h(a) and h(b), and thus no special testing would be warranted.
We are now ready to formulate the algorithm for detecting the input-affecting risk when its presence is suspected.
- Case 1: The pivot is not approaching 1/2.
- (i) If the pivot decisively tends to a limit other than 1/2, then we advise the decision maker about the absence of the risk.
- (ii) If the pivot seems to tend to a limit other than 1/2 but there is some doubt as to whether this is true, then we check if the supporter is asymptotically bounded, and if yes, then we advise the decision maker about the absence of the risk.
- Case 2: The pivot is approaching 1/2.
- (i) If the supporter tends to infinity, then we advise the decision maker about the presence of the risk.
- (ii) If the supporter is asymptotically bounded, then the risk-free limit and 1/2 are likely to be insufficiently different to have already triggered Case 1 above, and we thus advise the decision maker about the absence of the risk.
In the next two sections, we present rigorous results upon which the above algorithm relies. We note in passing that irrespective of whether the algorithm detects risks or not, in either case we may still wish to double-check the findings. It can also be necessary to check the system’s vulnerability (e.g., Hug and Giampapa 2012; and references therein). In such cases, we can use artificially constructed inputs, such as the deterministic ones defined by Equation (3).
We conclude this section with an example that shows how the algorithm works in practice. For this, we fix a smooth transfer function h on the transfer window [a, b]. Furthermore, upon recalling that the (unconditional) Lomax cdf is F(x) = 1 − (1 + x/θ)^(−α) for x ≥ 0, with shape parameter α > 0 and scale parameter θ > 0, we assume that the input X follows this distribution conditioned on the transfer interval [a, b]. Throughout the illustration, the parameters α and θ are kept fixed.
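The conditioned Lomax inputs can be sampled by inverting the cdf on the sub-interval [F(a), F(b)]. The window and parameter values used below are illustrative assumptions, since the paper's specific choices are not reproduced here.

```python
import random

def lomax_cdf(x, alpha, theta):
    """Unconditional Lomax cdf: F(x) = 1 - (1 + x/theta)**(-alpha), x >= 0."""
    return 1.0 - (1.0 + x / theta) ** (-alpha)

def sample_conditioned_lomax(n, alpha, theta, a, b, rng=random):
    """Draw n iid values from the Lomax law conditioned on [a, b] by
    mapping uniforms on [F(a), F(b)] through the inverse cdf."""
    fa, fb = lomax_cdf(a, alpha, theta), lomax_cdf(b, alpha, theta)
    return [theta * ((1.0 - rng.uniform(fa, fb)) ** (-1.0 / alpha) - 1.0)
            for _ in range(n)]

# Hypothetical parameter choices: alpha = 2, theta = 1, window [0, 1].
xs = sample_conditioned_lomax(1000, 2.0, 1.0, 0.0, 1.0, random.Random(42))
print(min(xs), max(xs))  # both lie within the window [0, 1]
```

The inverse-cdf step uses the explicit formula F^(-1)(u) = θ((1 − u)^(−1/α) − 1), which follows directly from the Lomax cdf above.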
Let δ_1, …, δ_n follow the normal distribution with mean 0 and standard deviation σ. In the risk-free case (i.e., σ = 0), the asymptotics of the pivot and the supporter is depicted in panels (a) and (b) of Figure 2, and when σ > 0, their asymptotics is depicted in panels (c) and (d). We also check the performance of the algorithm when the risk δ is discrete, specifically, when it is equal to one fixed value with a certain probability and to 2 with the complementary probability. The asymptotics of the pivot and the supporter in this case is depicted in panels (e) and (f) of Figure 2.
We see from the left-hand panels that the pivot converges to a limit other than 1/2 (i.e., to the value defined by Equation (4) in the next section) only in the risk-free case. The increasing pattern of the supporter in panels (d) and (f) confirms the presence of input risk in both scenarios, which has initially been detected by the pivot (due to its convergence to 1/2) in panels (c) and (e). Note that the convergence to 1/2 in panel (e) is decisive, whereas the convergence in panel (c) may not be so well pronounced, and thus the increasing pattern of the supporter in panel (d) provides reassurance.
4. Asymptotics of the Pivot
For another perspective on the meaning of the limit defined by Equation (4), we refer to Davydov and Zitikis (2017), where it arises as the solution to an optimization problem. The importance of Theorem 1 in the present paper follows from the fact that when the cdf of the risk δ is non-degenerate, then (details in Section 5 below) the pivot converges to 1/2 when n → ∞. Of course, the limit 1/2 can also manifest when the risk δ is absent, that is, in the context of Theorem 1, but this can happen only when h(a) = h(b). Indeed, as is easy to check from the equations defining the limit, whose ingredients involve the positive and the total variations of the transfer function over the window, the limit is equal to 1/2 if and only if h(a) = h(b). The latter property is, however, an exception rather than the rule: it manifests in such cases when, for example, the system is down and thus the output takes the same value irrespective of the input. Hence, unless explicitly noted otherwise, throughout the paper we assume h(a) ≠ h(b), as we have already mentioned earlier.
We next discuss how to check whether or not the risk δ is degenerate. Naturally, in order to detect anomalies, the original state of the system has to be in reasonable working order (cf., e.g., Cárdenas et al. 2011, p. 360). Gribkova and Zitikis (2018) have put forward an argument in favour of the following definition.
Definition 2. A system is in reasonable working order whenever in the absence of input-affecting risk (i.e., when δ = 0 almost surely), the sequence of supporter values is asymptotically bounded in probability. In mathematical terms, we write this as O_P(1) when n → ∞.
Given that in the absence of input-affecting risk we are exploring the asymptotic behaviour of the pivot, which is the ratio of two statistics, both of which are asymptotically bounded in probability, this requirement is natural. It can be seen from the following argument involving the mean-value theorem: each difference of consecutive concomitants satisfies

|h(X_{i+1:n}) − h(X_{i:n})| ≤ |h′(ξ_{i,n})| (X_{i+1:n} − X_{i:n})   (6)

for some ξ_{i,n} between X_{i:n} and X_{i+1:n}, where, upon summing over i, the right-hand sides add up to at most sup_{a ≤ x ≤ b} |h′(x)| (b − a). As a side-note, the right-hand side of bound (6) implies that, if needed, the boundedness of the first derivative of the transfer function can be relaxed and the system can still remain in reasonable working order, as per Definition 2.
We next present an example that shows what happens with the system when the input-affecting risk is present, that is, when the cdf of δ is non-degenerate. Before starting the example, we recall (David and Nagaraja 2003) that the concomitants can be written as follows:

Y_{i,n} = h(min{max{X_{i:n} + δ_{[i:n]}, a}, b}),

where δ_{[i:n]} is the random variable among δ_1, …, δ_n that corresponds to X_{i:n}. As noted by David and Nagaraja (2003, p. 145), the random variables δ_{[1:n]}, …, δ_{[n:n]} are iid and follow the cdf of the original risk δ.
Example 1. Let δ take a value greater than b − a with probability p and a value smaller than −(b − a) with probability 1 − p. The latter assumption implies that irrespective of the value of X, the value of X + δ is above b with probability p and below a with probability 1 − p. Hence, each concomitant is equal to h(b) with probability p and to h(a) with probability 1 − p. Since each concomitant can take only two values, the absolute difference of two consecutive concomitants is equal to |h(b) − h(a)| when the corresponding two risks take different values and 0 otherwise. Consequently, we arrive at Equation (7). Since the risks paired with the ordered inputs are iid and follow the same cdf as the original δ, the probability that two consecutive concomitants differ is equal to 2p(1 − p), and thus Equation (7) implies statement (8). From this we conclude that if p is neither 0 nor 1, which we assume, and if h(a) ≠ h(b), which we also assume, then the supporter tends to infinity when n → ∞. Analogous arguments lead to statement (9) concerning the pivot. Combining statements (8) and (9), we conclude that, when n → ∞, the pivot converges to 1/2 while the supporter grows indefinitely, which in turn implies that the system is affected by the risk. This concludes Example 1.

The above example has been constructed to show, in a somewhat dramatic way, what happens when the input-affecting risk pushes the input outside the transfer window, but the same conclusion can be reached under much weaker assumptions on δ, as we shall show in the next section.
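Example 1 can be replayed numerically. As with the earlier sketches, the concrete pivot and supporter formulas below (share of increases and total variation of the concomitants) are assumptions, and the transfer function is an arbitrary choice satisfying h(a) ≠ h(b); the supporter should then grow roughly linearly in n while the pivot hovers near 1/2.

```python
import random

def example_one(n, p, h, a, b, rng):
    """Two-valued risk of magnitude exceeding b - a, so every affected
    input is clipped to an endpoint of the window, as in Example 1."""
    big = (b - a) + 1.0
    xs = [rng.uniform(a, b) for _ in range(n)]
    ys = [h(min(max(x + (big if rng.random() < p else -big), a), b))
          for x in xs]
    cons = [y for _, y in sorted(zip(xs, ys))]   # concomitants
    diffs = [cons[i + 1] - cons[i] for i in range(n - 1)]
    supp = sum(abs(d) for d in diffs)            # assumed supporter
    piv = sum(d for d in diffs if d > 0) / supp  # assumed pivot
    return piv, supp

rng = random.Random(7)
h = lambda x: 2.0 * x + 1.0  # illustrative transfer function, h(0) != h(1)
p1, s1 = example_one(1000, 0.5, h, 0.0, 1.0, rng)
p2, s2 = example_one(4000, 0.5, h, 0.0, 1.0, rng)
print(round(p1, 3), round(s1, 1), round(s2 / s1, 2))
```

With p = 1/2, each consecutive pair of concomitants differs with probability 2p(1 − p) = 1/2, so the supporter grows like |h(b) − h(a)| n/2, and quadrupling n roughly quadruples it.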