#### *2.1. Information Quality*

There are many ways to describe information quality [21–23]. The best known are the descriptions in the reports and publications of the Massachusetts Institute of Technology Information Quality Program (MITIQ) [24]. Among other things, the programme developed an information quality model based on sixteen dimensions. The dimensions of information quality defined by the MITIQ are described as follows [24–27]:



16. Value-added (Dvadd): a dimension that determines the benefits of using the data, i.e., whether the data themselves are beneficial to the task.

Figure 1 shows all the above-mentioned dimensions affecting information quality. Each dimension has a direct impact on information quality. Assume that the value of each dimension (the dimension factor) lies in the range from 0 to 1: a dimension that does not reduce the quality of information has the value 1, and a dimension that significantly reduces the quality has the value 0. Adopting the range ⟨0, 1⟩ allows information quality to be calculated by statistical methods (e.g., the probability of error Pe gives the free-of-error dimension coefficient as 1 − Pe), and also by methods of estimating uncertainty, such as the mathematical theory of evidence (Dempster–Shafer theory) or CF modelling [28–30].

In general, information quality (IQ) consists of the above-mentioned dimensions. Thus, IQ can be described by the formula:

$$\text{IQ} = f(w_1, w_2, \dots, w_m), \tag{1}$$

where:


In the study below, modelling based on the certainty factor of hypothesis [31,32] was applied.

#### *2.2. Modelling Certainty Factor of Hypothesis*

As mentioned above, modelling based on the certainty factor (CF) of a hypothesis is a convenient way to describe information quality. The value of this factor is assumed to indicate directly the quality of the information related to the given hypothesis.

An accurate presentation requires some formalism [31,32]. In simplified form, the certainty factor is defined as:

$$\text{CF}(\mathbf{s}) = \text{MB}(\mathbf{s}) - \text{MD}(\mathbf{s}),\tag{2}$$

where:


One has to bear in mind that:

$$\text{MB} \in \langle 0, 1 \rangle; \quad \text{MD} \in \langle 0, 1 \rangle; \quad \text{CF} \in \langle -1, 1 \rangle. \tag{3}$$
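Equations (2) and (3) can be illustrated with a minimal Python sketch. The function name `certainty_factor` and the sample values are illustrative assumptions, not taken from the cited sources.

```python
def certainty_factor(mb: float, md: float) -> float:
    """Return CF = MB - MD (Equation (2)).

    The range check mirrors Equation (3): MB and MD must lie in [0, 1],
    which guarantees that the result lies in [-1, 1].
    """
    if not (0.0 <= mb <= 1.0 and 0.0 <= md <= 1.0):
        raise ValueError("MB and MD must lie in the range [0, 1]")
    return mb - md


# Illustrative values only: strong belief, weak disbelief.
cf = certainty_factor(0.9, 0.1)
print(f"CF = {cf:.2f}")
```

A hypothesis with equal belief and disbelief yields CF = 0, and the extreme values −1 and 1 are reached only when one of the two measures is 1 and the other is 0.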

In terms of probability, the measure of belief (MB) and the measure of disbelief (MD) can be interpreted as:

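$$\text{CF}(\text{s}) = \begin{cases} 1 & \text{if } \text{P}(\text{s}) = 1 \\ \text{MB}(\text{s}) & \text{if } \text{P}(\text{s}) > \text{P}(\neg \text{s}) \\ 0 & \text{if } \text{P}(\text{s}) = \text{P}(\neg \text{s}) \\ -\text{MD}(\text{s}) & \text{if } \text{P}(\text{s}) < \text{P}(\neg \text{s}) \\ -1 & \text{if } \text{P}(\text{s}) = 0 \end{cases} \tag{4}$$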

where:


However, as mentioned, we do not aim to determine probability, because our quality measure is related to the CF of the final hypothesis of the model.

Since there are many varieties of CF modelling, the basic relationships used in this paper are described below [31,32].

#### 2.2.1. Parallel Basic Model

The formula for the transition between two parallel observations and the hypothesis, according to Figure 2, is [30]:

$$\text{CF}(\text{h}, \text{e1}, \text{e2}) = \begin{cases} \text{CF}(\text{h}, \text{e1}) + \text{CF}(\text{h}, \text{e2}) - \text{CF}(\text{h}, \text{e1}) \cdot \text{CF}(\text{h}, \text{e2}) & \text{if } \text{CF}(\text{h}, \text{e1}) \ge 0 \text{ and } \text{CF}(\text{h}, \text{e2}) \ge 0\\ \frac{\text{CF}(\text{h}, \text{e1}) + \text{CF}(\text{h}, \text{e2})}{1 - \min(|\text{CF}(\text{h}, \text{e1})|; |\text{CF}(\text{h}, \text{e2})|)} & \text{if } \text{CF}(\text{h}, \text{e1}) \cdot \text{CF}(\text{h}, \text{e2}) < 0\\ \text{CF}(\text{h}, \text{e1}) + \text{CF}(\text{h}, \text{e2}) + \text{CF}(\text{h}, \text{e1}) \cdot \text{CF}(\text{h}, \text{e2}) & \text{if } \text{CF}(\text{h}, \text{e1}) < 0 \text{ and } \text{CF}(\text{h}, \text{e2}) < 0 \end{cases} \tag{5}$$

**Figure 2.** Parallel transitions between two observations and a hypothesis.
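The parallel rule of Equation (5) can be sketched in Python as follows. The function name `combine_parallel` is an illustrative assumption. Two cases that Equation (5) leaves open are handled by assumption as well: a zero coefficient paired with a negative one falls through to the mixed-sign branch, and the contradictory pair 1 and −1 is rejected.

```python
def combine_parallel(cf1: float, cf2: float) -> float:
    """Combine two certainty factors that support the same hypothesis."""
    for cf in (cf1, cf2):
        if not -1.0 <= cf <= 1.0:
            raise ValueError("certainty factors must lie in [-1, 1]")
    if cf1 >= 0 and cf2 >= 0:
        # Both observations confirm the hypothesis.
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        # Both observations disconfirm the hypothesis.
        return cf1 + cf2 + cf1 * cf2
    # Mixed signs: normalise the sum by the smaller magnitude.
    denominator = 1.0 - min(abs(cf1), abs(cf2))
    if denominator == 0.0:
        raise ValueError("CF = 1 and CF = -1 cannot be combined")
    return (cf1 + cf2) / denominator


print(combine_parallel(0.5, 0.5))    # 0.75
print(combine_parallel(-0.5, -0.5))  # -0.75
```

Two confirming observations reinforce each other without exceeding 1, and two disconfirming observations behave symmetrically.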

#### 2.2.2. Serial Basic Model

In the case of a serial model for positive values (which is the case in the modelling described later), the following relationship was used, according to Figure 3 [31,32]:

$$\text{CF}(\text{h,e1,e2}) = \begin{cases} \text{CF}(\text{e2,e1}) \cdot \text{CF}(\text{h,e2}) & \text{if } \quad \text{CF}(\text{e2,e1}) > 0\\ 0 & \text{if } \quad \text{CF}(\text{e2,e1}) \le 0 \end{cases} \tag{6}$$
 

**Figure 3.** Serial transitions between two observations and a hypothesis.
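The serial rule of Equation (6) can be sketched in the same way. The function and parameter names are illustrative assumptions; `cf_e2_e1` stands for CF(e2,e1) and `cf_h_e2` for CF(h,e2).

```python
def combine_serial(cf_e2_e1: float, cf_h_e2: float) -> float:
    """Propagate certainty along the chain e1 -> e2 -> h (Equation (6))."""
    if cf_e2_e1 > 0:
        # A positive intermediate factor scales the downstream factor.
        return cf_e2_e1 * cf_h_e2
    # A zero or negative intermediate factor gives no support to h.
    return 0.0


print(combine_serial(0.5, 0.5))   # 0.25
print(combine_serial(-0.3, 0.8))  # 0.0
```

For positive coefficients the serial chain is a plain product, so each additional link can only lower the resulting certainty factor or leave it unchanged.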

Both connections, parallel and serial, can be reduced to a single connection, as shown in Figure 4. This property simplifies the calculations in the model proposed in the next section.

**Figure 4.** The result of the simplification based on formulas (5) or (6).

In the following considerations, the final hypothesis's certainty factor is the value of the information quality.

#### *2.3. Parallel-Serial Model of the Analysed Solution of the Meteorological Station*

The literature contains many models describing various states of information. The most developed ones can be found in [33], where they are called information processes. The following types of information processes are distinguished (Figure 5):


In Figure 5, the three information states are combined into one because they usually occur together. Such a presentation of information processes also makes it possible to slightly simplify the model, which does not affect the model's overall accuracy.

**Figure 5.** Diagram of a general information quality model of an information system [40].

A generalised information quality model can be presented as follows. Each of the previously mentioned information states can be a consecutive node of the information quality model, as shown in general form in Figure 6 [40].

**Figure 6.** Diagram of the general information quality model of an information system [40].

In the presented case, the information quality model is limited to five information states, of which the fourth state contains three information processes, as shown in Figure 5. The general model consists of five hypotheses related to information states (Figure 6) and contains groups of factors, which influence measurement quality as below:

1. Dimensions related to the main data source. In this case, the data source is the weather station. The dimensions associated with this source influence the value of the indirect hypothesis h1. In the case of data source redundancy, the h1 hypothesis consists of many indirect hypotheses.


Each of the above points can be described with a full information quality model presented in Section 2.1. A schematic representation of such a model is shown in Figure 7.

**Figure 7.** General model of information quality for weather stations.

In the case of the qualitative model, only those dimensions that significantly affect the result are of any interest. Thus, in the following description, only those factors that have such an influence are presented.

The final hypothesis is h—the data have been correctly interpreted. It consists of dependent indirect hypotheses (Figure 7):


Each of the indirect hypotheses created on the basis of observations results from observing factors in a given group. In order to simplify the final calculations, the following description includes only some of the events and the observations that may affect the quality of information.

The indirect hypothesis based on the observations e1a consists of independent observations:


The indirect hypothesis based on the observations e1b consists of independent observations:


The indirect hypothesis based on the observations e2 consists of independent observations:


The indirect hypothesis based on the observations e4 consists of independent observations:


The indirect hypothesis based on the observations e5 consists of independent observations:


Figure 8 shows the graph of the model of the indirect hypothesis h1a. The models of hypotheses h1b, h3, h4, and h5 are similar.

Figure 9 shows the graph of the model of the indirect hypothesis h2.

Figure 10 shows the graph of the model of the indirect hypothesis h1.

**Figure 8.** Model for the indirect hypothesis h1a, h1b, h3, h4, and h5.

**Figure 9.** Model for the indirect hypothesis h2.

**Figure 10.** Model for the indirect hypothesis h1 (IQ1max—this is the maximum value that the hypothesis factor can reach [40]).

#### **3. Method Verification and its Computer Exemplification**

Sample calculations are presented below. Observation coefficients were estimated for the real measuring station shown in Figure 11 (observations e1a, e1b, and e2) and on the basis of the authors' earlier publications [40,46]. The meteorological station is located in Poland, in the northwestern part of Warsaw, on the premises of the Military University of Technology (geographical coordinates: 52°15′10.6″ N and 20°53′58.9″ E). The measurements were taken in May 2020. During the measurements, the following weather parameters were recorded: wind speed from 0 m/s to 12 m/s, temperature from 3 °C to 25 °C, and relative humidity from 35% to 90%.

The meteorological station includes the following sensors:

• Digital temperature and relative humidity sensor marked with the catalogue symbol SRH1A (abbreviation comes from the words: sensor, relative humidity) placed in an anti-radiation shield.


**Figure 11.** Meteorological station and its electronic module.

Additionally, the station includes a "Micropower" module which records and transmits data to the server. The station is situated on a two-meter-high aluminium mast. The digital temperature and relative humidity sensor marked with the catalogue symbol SRH1A is a measurement device which can operate both in external conditions and inside buildings. Its basic technical data [47] are shown in Table 1.

**Table 1.** Technical data of the digital temperature and relative humidity sensor.


The analogue NTC temperature sensor with the catalogue symbol ST1R [48] is designed to measure air temperature. Its case is made of stainless steel, which allows the sensor to be used in difficult atmospheric conditions. The basic parameters are as follows:


In the block diagram of the meteorological station (Figure 12), the metrological data processing path for ambient temperature consists of the blocks with a filled background.

**Figure 12.** Block diagram of a meteorological station.

With reference to the diagram in Figure 5, the individual elements are related to the observations in accordance with the following list:


The element related to e1a observations consists of an analogue NTC (negative temperature coefficient) temperature sensor with catalogue symbol ST1R, working properly in the range of 11–16 V supply voltage and a cable connection with a recorder. As described in the previous section of the article, the following characteristic observations were distinguished for this element:


The e1b element consists of a digital temperature sensor with catalogue symbol SRH1A, which works correctly in the supply voltage range of 4–16 V, and an interface for serial data transmission in the SDI-12 standard (the serial-digital interface standard for microprocessor-based sensors). As described in the previous section, the following characteristic observations were distinguished for this element:


Element 2 is a specialised recorder based on a single-chip micropower data logger microcontroller, requiring a supply voltage of 5–16 V and made in a technology that meets the IP67 standard of resistance to environmental factors. The recorder additionally includes an SDI-12 standard input expansion module and a memory card. Based on the observations, it was determined that the following events can occur in the e2 element:


The values in Table 2 were calculated on the basis of the annual observation time of the meteorological station, which is shown in Figure 11. The states of fitness and unfitness of individual elements included in the tested meteorological station were determined [49–52].

**Table 2.** Observation coefficients (hxx, exx.x).


The value of the maximum coefficient of hypothesis (h1, IQ1max) was assumed at a level close to the value 1, namely 0.9999.

The coefficients of successive indirect hypotheses are determined using Equation (5).

$$\begin{array}{l} \text{CF}(\text{h1a}, \text{e1a1}, \text{e1a2}) = \frac{\text{CF}(\text{h1a}, \text{e1a1}) + \text{CF}(\text{h1a}, \text{e1a2})}{1 - \min(|\text{CF}(\text{h1a}, \text{e1a1})|; |\text{CF}(\text{h1a}, \text{e1a2})|)}\\ = \frac{0.95 + (-0.02)}{1 - \min(|0.95|; |-0.02|)} = \frac{0.93}{0.98} \cong 0.94898 \end{array} \tag{7}$$

$$\begin{array}{l} \text{h1a} = \text{CF}(\text{h1a}, \text{e1a1}, \text{e1a2}, \text{e1a3}) = \frac{\text{CF}(\text{h1a}, \text{e1a1}, \text{e1a2}) + \text{CF}(\text{h1a}, \text{e1a3})}{1 - \min(|\text{CF}(\text{h1a}, \text{e1a1}, \text{e1a2})|; |\text{CF}(\text{h1a}, \text{e1a3})|)}\\ = \frac{0.94898 + (-0.04)}{1 - \min(|0.94898|; |-0.04|)} = \frac{0.90898}{0.96} \cong 0.94685 \end{array} \tag{8}$$
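The two steps of Equations (7) and (8) can be checked numerically with a short Python sketch. The helper `combine_parallel` is an illustrative implementation of Equation (5), not the authors' program. The three coefficients are the ones substituted in Equations (7) and (8).

```python
from functools import reduce


def combine_parallel(cf1: float, cf2: float) -> float:
    """Parallel combination of two certainty factors (Equation (5))."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1.0 - min(abs(cf1), abs(cf2)))


# CF(h1a, e1a1), CF(h1a, e1a2), CF(h1a, e1a3) as used in Equations (7) and (8).
observations = [0.95, -0.02, -0.04]

step_7 = combine_parallel(observations[0], observations[1])
h1a = reduce(combine_parallel, observations)  # folds the list left to right

print(round(step_7, 5))  # 0.94898
print(round(h1a, 5))     # 0.94685
```

Folding the observations one at a time mirrors the order of the hand calculation: the result of Equation (7) becomes the first argument of Equation (8).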

h1b, h2, h3, h4, and h5 are calculated in a similar way and they amount to:

$$\text{h1b} = 0.98988, \quad \text{h2} = 0.98989, \quad \text{h3} = 0.87319, \quad \text{h4} = 0.82032, \quad \text{h5} = 0.64124$$

The next step is to determine the value of h1, again using Equation (5). For this purpose, h1a and h1b are replaced by the values h1a′ = −1 + h1a = −0.05315 and h1b′ = −1 + h1b = −0.01012.

$$\text{CF}(\text{h1}, \text{IQ1max}, \text{h1a}') = \frac{\text{CF}(\text{h1}, \text{IQ1max}) + \text{CF}(\text{h1}, \text{h1a}')}{1 - \min(|\text{CF}(\text{h1}, \text{IQ1max})|; |\text{CF}(\text{h1}, \text{h1a}')|)} = \frac{0.9999 + (-0.05315)}{1 - \min(|0.9999|; |-0.05315|)} \cong 0.999894, \tag{9}$$

$$\begin{array}{rcl} \text{h1} &=& \text{CF}(\text{h1}, \text{IQ1max}, \text{h1a}', \text{h1b}') = \frac{\text{CF}(\text{h1}, \text{IQ1max}, \text{h1a}') + \text{CF}(\text{h1}, \text{h1b}')}{1 - \min(|\text{CF}(\text{h1}, \text{IQ1max}, \text{h1a}')|; |\text{CF}(\text{h1}, \text{h1b}')|)} \\ &=& \frac{0.999894 + (-0.01012)}{1 - \min(|0.999894|; |-0.01012|)} \cong 0.999893 \end{array} \tag{10}$$
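The calculation of h1 in Equations (9) and (10) can be reproduced with the sketch below. It assumes that only the mixed-sign branch of Equation (5) is needed, because IQ1max is positive and both shifted values are negative. The names `combine_mixed`, `IQ1_MAX`, `h1a_shifted` and `h1b_shifted` are illustrative.

```python
def combine_mixed(cf_pos: float, cf_neg: float) -> float:
    """Mixed-sign branch of Equation (5): one positive and one negative factor."""
    if cf_pos * cf_neg >= 0:
        raise ValueError("this helper covers only factors of opposite sign")
    return (cf_pos + cf_neg) / (1.0 - min(abs(cf_pos), abs(cf_neg)))


IQ1_MAX = 0.9999  # assumed maximum coefficient of hypothesis h1
h1a = 0.94685     # result of Equation (8)
h1b = 0.98988     # value reported for the redundant digital sensor

# Shift the indirect hypotheses so that they act as reductions of IQ1max.
h1a_shifted = -1.0 + h1a
h1b_shifted = -1.0 + h1b

step_9 = combine_mixed(IQ1_MAX, h1a_shifted)
h1 = combine_mixed(step_9, h1b_shifted)

print(round(step_9, 6))  # 0.999894
print(round(h1, 6))      # 0.999893
```

Because each shifted value only slightly lowers IQ1max, the redundant pair of temperature sensors keeps h1 very close to its assumed maximum.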

Using Equation (6), the final hypothesis coefficient can be determined as:

$$\text{h} = \text{h1} \cdot \text{h2} \cdot \text{h3} \cdot \text{h4} \cdot \text{h5} = 0.999893 \cdot 0.98989 \cdot 0.87319 \cdot 0.82032 \cdot 0.64124 \cong 0.45462. \tag{11}$$
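The serial product for the final hypothesis can be verified with a few lines of Python. The dictionary layout is an illustrative assumption; the five values are the indirect hypothesis coefficients reported above.

```python
from math import prod

# Indirect hypothesis coefficients reported in the text.
indirect = {
    "h1": 0.999893,
    "h2": 0.98989,
    "h3": 0.87319,
    "h4": 0.82032,
    "h5": 0.64124,
}

# For positive coefficients, Equation (6) reduces to a plain product.
h = prod(indirect.values())
weakest = min(indirect, key=indirect.get)

print(f"h = {h:.6f}")
print(f"weakest link: {weakest} = {indirect[weakest]}")
```

The product agrees with the value of about 0.45462 given for the final hypothesis. Since every factor is below 1, the smallest coefficient (here h5) sets an upper bound on h.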

#### **4. Simulation and Results using Real Measurements**

In order to present the influence of the observation coefficients on the indirect hypotheses and the final hypothesis, a series of simulations was performed. The results are presented below in the form of graphs. The simulations were run with a programme written (by the first author) for this purpose. The first graph in Figure 13 shows the effect of the observation coefficient values associated with the temperature sensors. The range of the coefficients e1a.1 and e1b.1 is from 0.5 to 0.99.

The next graph (Figure 14) shows the influence of the negative coefficients of observations on the analogue temperature sensor. The range of the coefficients e1a.2 and e1a.3 is from −0.09 to −0.01.

**Figure 13.** The result of the simulation of the h hypothesis value depending on the observation coefficients e1a.1 and e1b.1.

**Figure 14.** The result of the simulation of the h hypothesis value depending on the observation coefficients e1a.2 and e1a.3.

The next graph (Figure 15) shows the influence of the negative coefficients of observations related to the digital temperature sensor. The range of the coefficients e1b.2 and e1b.3 is from −0.099 to −0.001. The coefficient e1b.4 was omitted because function h (e1b.4) has the same values as function h (e1b.3).

**Figure 15.** The result of the simulation of the h hypothesis value depending on the observation coefficients e1b.2 and e1b.3.
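A sweep of this kind can be sketched as follows. This is not the authors' simulation program: it is a simplified, hypothetical reconstruction that varies only the coefficient e1a.1 and holds every other value at the level used in the sample calculation (e1a.2 = −0.02, e1a.3 = −0.04, h1b = 0.98988, IQ1max = 0.9999, and the reported h2 to h5). All function and constant names are illustrative.

```python
from functools import reduce
from math import prod


def combine_parallel(cf1: float, cf2: float) -> float:
    """Parallel combination of two certainty factors (Equation (5))."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 - cf1 * cf2
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 + cf1 * cf2
    return (cf1 + cf2) / (1.0 - min(abs(cf1), abs(cf2)))


IQ1_MAX = 0.9999                                 # assumed maximum coefficient of h1
H1B = 0.98988                                    # held fixed in this sketch
H2_TO_H5 = (0.98989, 0.87319, 0.82032, 0.64124)  # reported indirect hypotheses


def final_hypothesis(e1a1: float, e1a2: float = -0.02, e1a3: float = -0.04) -> float:
    """Return h for a given e1a.1, with all other coefficients held fixed."""
    h1a = reduce(combine_parallel, (e1a1, e1a2, e1a3))
    # Shifted values h1a' and h1b' reduce IQ1max, as in the sample calculation.
    h1 = reduce(combine_parallel, (IQ1_MAX, h1a - 1.0, H1B - 1.0))
    # Serial chain of positive coefficients: plain product.
    return h1 * prod(H2_TO_H5)


for e1a1 in (0.5, 0.6, 0.7, 0.8, 0.9, 0.99):
    print(f"e1a.1 = {e1a1:.2f} -> h = {final_hypothesis(e1a1):.6f}")
```

Under these assumptions h increases monotonically with e1a.1, but only slightly, because the redundant digital sensor and the assumed IQ1max keep h1 close to 1 over the whole range.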

Figures 13–15 show the functions that represent the impact of the most important selected observations on the final hypothesis h (the data were correctly interpreted). The graphs of these functions are nonlinear. In the ideal model, they should tend asymptotically to the value that, for this model, means absolute excellence of the system, as shown by the idealised curve in Figure 16 [40]. The graphs in Figures 13–15 also show that each of the observation coefficients affects the final value of h.

**Figure 16.** Illustration of the process of improving quality as a pursuit of excellence [40].

In practical terms, the presented simulation results make it possible to show whether the values calculated for the designed model are consistent with the assumptions.
