2. State of the Art
According to the definition given in [3,4], information is knowledge about objects, such as facts, events, things, processes, or ideas, including concepts that have special meaning in a specific context, and knowledge that reduces or removes the uncertainty about the occurrence of a specific event from a given set of possible events. In turn, data stand for a reinterpretable representation of information that is formalized in a manner suitable for communication, interpretation, or processing. This suggests that data are items of information without a dimension attribute. For example, the detector transmits data about a measured physical quantity with a specific value, but only the attribute of the measurement unit transforms the data into information.
IQ can be expressed as a coefficient (or set of coefficients) that indicates the quality level of the information. The definition in the ISO standard includes three components [3]:
Syntactic IQ is the degree to which the data conform to a specific syntax;
Semantic IQ is the unique and unambiguous conformance between identifiable data units and the entities represented;
Pragmatic IQ is conformance with the requirements concerning the use.
In this study, the applied methods enable the determination of IQ both in continuous modeling, presented as a set of IQ dimensions [2], and in discrete or hierarchical modeling, including division into categories as in ISO 8000-8 [3]. The applied methods are based on the calculation of dependent and independent coefficients. This approach enables the construction of arbitrary multi-layer models (including the hierarchical models adopted in the aforementioned ISO standard). Therefore, the rest of this article presents flat IQ models without categorization. Whether or not the dimensions are divided into categories does not affect the modeling results of independent elements. Because these flat models can be combined into larger multi-layer structures that also reflect a hierarchical division, the model described in ISO can be regarded as one particular form of a multi-layer model.
Based on the work of the ancient philosophers Lao Tsu and Plato [5], the measure of quality can be expressed as the pursuit of perfection. Figure 1 shows quality improvement (a term defined in ISO 9000:2015 [6]) as the pursuit of excellence. The graph in Figure 1 shows that excellence is not achievable: it is the limit of the infinite quality improvement function. However, each subsequent improvement step increases the measure of quality and brings it closer to perfection.
Expressed in mathematical symbols:
$$\lim_{n \to \infty} w_n = D \qquad (1)$$
where:
D – perfection;
n – the number of quality improvement steps;
w_n – the sequence of quality measure values.
Based on Equation (1), it can be assumed that any function converging to some value at infinity can be a function that describes the quality measure well. The same dependencies also apply to IQ.
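As an illustration of Equation (1), the following minimal sketch uses one possible convergent sequence, w_n = D(1 - r^n). The geometric form, the function name `quality_measure`, and the parameter `rate` are assumptions chosen for illustration only; any sequence converging to D would serve equally well.

```python
def quality_measure(n: int, perfection: float = 1.0, rate: float = 0.5) -> float:
    """Hypothetical quality measure after n improvement steps.

    Assumed form: w_n = D * (1 - rate**n), which approaches D as n grows
    but never reaches it for any finite n (with 0 < rate < 1).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if not 0.0 < rate < 1.0:
        raise ValueError("rate must lie strictly between 0 and 1")
    return perfection * (1.0 - rate ** n)


# Quality after each of the first ten improvement steps.
values = [quality_measure(n) for n in range(1, 11)]
for step, value in enumerate(values, start=1):
    print(f"step {step:2d}: w_n = {value:.6f}")
```

Each step raises the measure, yet the gap to D only shrinks and never closes, which mirrors the shape described for Figure 1.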
3. IQ Measure
In technical systems, IQ [3] has a large impact on the assessment of the system, especially when it comes to information systems used in such critical areas as healthcare, energy, and transport. All data can be affected by various errors that introduce a certain amount of uncertainty into the information. This uncertainty can serve as an indicator of IQ: the smaller the uncertainty, the higher the IQ. This may concern, for example, the uninterruptible power supply (UPS) of hospital equipment or road traffic control, where the communicativeness and legibility of traffic signs are an example.
In the literature, there are many studies examining IQ. Those that are most specific and characteristic of the subject matter of this article are discussed below. The main source of publications was a project launched at the Massachusetts Institute of Technology under the name MIT Information Quality Program (MITIQ) [1]. Publications related to this project are the most frequently cited references in contemporary IQ studies, and this paper also refers to many of them. Two books are among the most important items published under the MITIQ project. The first is “Information Quality” [8]. The book describes the multidimensionality of IQ, how to measure it, and how to manage it. Difficulties in scaling and interpreting IQ measurements are also described. This book mentions as many as one hundred and eighteen quality attributes (features) that can be included in fifteen dimensions of IQ. The second book published within the MITIQ program worth mentioning in this study is “Introduction to Information Quality” [2]. The book discusses the basics of the current understanding of the multidimensionality of IQ, which is the basis for the modeling methods proposed in subsequent sections of this paper. It also extends the previously established view on the multidimensionality of IQ, adding, among others, accessibility, security, and ease of manipulation. The book is the foundation for the guidelines regarding IQ published in 2010 on the US Department of Justice’s website.
In 2015, the ISO 8000-8 Information and Data Quality: Concepts and Measuring standard [3] was published partly as an alternative to the work of MITIQ. The standard refers rather modestly to what was developed by the MITIQ program, but cites authors from earlier publications, including the authors of the two above-mentioned books [2,8]. ISO 8000-8 describes the IQ dimensionality and hierarchical classification of IQ, and includes definitions of basic concepts such as data, information, metadata, and data unit. This standard introduces a division of IQ into three main categories: syntactic, semantic, and pragmatic. It also schematically presents the general principles of measuring the overall IQ and the specific (subjective) IQ for each category. The model presented there is quite modest, and its description does not reflect the real extent of the problem but only tries to standardize the approach, which, by using the above-mentioned three categories, seems quite limited. In the following sections, the concept of determining IQ is developed based on a multi-layer quality model. The model described in ISO 8000 is compatible with this concept; namely, it is a special case of it.
Successive publications appearing worldwide indicate the extent and variety of issues related to IQ, its determination, and interpretation. Publication [9] is a fine example. This publication presents approaches to IQ analysis in data integration framework schemes. It describes the integrated scheme quality assessment and distributed access to information systems, focusing on the minimality, consistency, and completeness of information. A multidimensional model of IQ was used for the evaluation. This approach is quite similar to that proposed in the following sections, although more modest.
The publication [
10] introduces a new approach to IQ measurement and uses the Six Sigma (6σ) method to estimate IQ. This approach focuses on continuously improving the IQ by systematically assessing many of the IQ dimensions. In particular, it deals with the correlation and the relative importance of the IQ dimensions. Thanks to this method, a precise and systematic criterion for assessing the quality of information is proposed. This concept seems quite attractive, but it does not exhaust the complexities of IQ modeling.
The article [
11] discusses and analyzes the notion of IQ in terms of a pragmatic philosophy of language. It states that the concept of IQ is of great importance and must be situated better within a sound philosophy of information. It turns out that much research on IQ conceptualizes IQ as an inherent property of the information itself. A model of multidimensional IQ was presented, in which twenty-two dimensions were specified (accurate, appropriate, authentic, authoritative, balanced, believable, complete, comprehensive, correct, credible, current, good, neutral, relevant, reliable, objective, true, trustworthy, understandable, useful, usability, valid). These are more than the dimensions used in modeling conducted in the following sections. However, the modeling proposed in the following sections is open-ended and can theoretically be applied to an indefinite number of dimensions and may also include the number of dimensions shown in the article [
11].
One of the co-authors (the main co-author) published the first original publications describing the IQ in 2013 and 2014 [
12,
13]. The articles present a model for determining IQ based on the Certainty Factor (CF) in highway telematics. Such modeling usually concerns expert systems or artificial intelligence. However, in highway telematics systems, the main elements are computer systems that analyze and process data on vehicle traffic. Modeling assumed multidimensionality of the IQ. These dimensions are shown as both dependent and independent.
In [
14], the authors present the estimation of IQ in various domains. The article discusses the issues of portability of IQ modeling between domains. This led to the conclusion that an independent model should actually be created for each domain. The arguments presented in the following sections of this study lead to similar conclusions. In the proposed method of multidimensional, open modeling in this article, it is possible to build such an open model that will enable the description of IQ in many domains.
In 2014, the main co-author published two original works that provide the basis for this study [
15,
16]. Both publications were presented at the ESREL (European Safety and Reliability) Conference in 2014. The first paper [
15] discusses the IQ estimation model of ICT systems based on CF modeling. This modeling practice was typically used in expert systems or artificial intelligence. Here, however, computer systems that use data from ICT systems are discussed. CF modeling is one of the methods that allow us to obtain information about the properties of a system when data about this system are incomplete. The model helps identify and locate the weakest system components that have a disastrous effect on IQ. The second publication [
16] is a continuation of the previous works [
12,
13] involving the determination of IQ in various systems. When describing IQ, several basic dimensions were defined, such as: availability, actual value, completeness, reliability, flexibility, form, importance over time, accuracy, reliability, selectivity, and importance. One of the features of IQ dimensions was determined. The CF-modeling and Dempster–Shafer mathematical evidence methods were used.
The discussion on this topic was extended by the co-authors at the next ESREL 2015 conference [
17]. The publication demonstrated that modeling the uncertainty of IQ can be achieved using the mathematical evidence theory as in the publication from ESREL 2014 [
15,
16]. While in the case of independent sources influencing the IQ, the use of evidence theory is quite simple, in the case of dependent sources, this modeling is not possible. This work proposes a method of determining the IQ for dependent sources (a serial model). A two-layer model consisting of dependent and independent elements was presented. This multi-layer modeling became the basis for the models shown in the following sections of this study.
A different approach can be seen in the study described in [
18]. It attempts to investigate the importance of many information dimensions in measuring the IQ from the user’s point of view. The article provides a detailed analysis of the nature and importance of the various dimensions of IQ and their differences depending on the context and user demographics. This is an approach that takes into account only the subjective dimensions of IQ.
The next work [
19] presents the issues of IQ measurability. The article examines the reasons underlying the differences in the measurability of IQ. Using the structure of Gigerenzer’s “building blocks”, it was hypothesized that the feasibility of using a set of heuristic principles when assessing different IQ dimensions is a key factor influencing the inter-rater agreement (content moderators) in IQ judgments. This method was used to assess IQ in Internet resources.
Alternative approaches to understanding and modeling IQ that typically involve a particular approach or partial quality assessment have been described above. The publications below display how IQ can be measured. This issue has been examined not only for studies related to technical systems.
In [
20], the methodology for assessing the IQ for fifteen dimensions was defined and arranged in groups. The proposed methodology for IQ assessment (AIMQ) as a whole provides a practical tool for measuring IQ for an organization. It can apply at various organizational levels, such as the financial industry, healthcare, and manufacturing. The methodology is useful in identifying IQ issues, prioritizing areas of IQ improvement, and monitoring IQ improvements over time. This article presents a method that allows the IQ to be assessed in a hierarchical practical model arranged in groups. Such modeling usually has limitations; for example, such a model cannot be open because it is limited by groups. It has the same restrictions as the model described in ISO 8000 [
3].
The article [
21] presents a method that can be used in measuring the IQ of Internet resources. The presented method of measuring the IQ was limited to sixteen criteria, which partially overlapped with the dimensions presented in [
2]. The method was based on four successive steps with repetitions of sections. The content of websites, traffic volume, understanding, and feedback were examined, which means that this method enables the measurement of the quality of both information content and the quality of the medium that the Internet is.
A different approach was presented in [
22]. It attempts to indicate the best method of quality measurement yet, assuming that it is the definition of quality that imposes the measurement methodology. The paper includes a literature review and detects flaws in the methods presented there in the form of omitting the variability of requirements over time and different meanings of quality features. A method was proposed based on the division into analytical and synthetic measurements.
The study in [23] proposed quality assessment on two levels, based on decomposing the information fusion system into its elementary modules. The first (global) level describes the entire information fusion system, and the second (local) level describes each elementary module. The method was based on the multidimensionality of the IQ, and the fusion was performed by estimating Bayesian subjective probability. The method seems very complicated, which limits its use.
The following article, [
24], presented an IQ model that shows how to understand IQ in the context of systems and also how to determine some common IQ indicators. The importance of predicting and modeling the IQ was also described. Building information chains automatically to meet the expected IQ was suggested. The limitation of this method is the application of chains that prevent the use of more complex structures.
Uncertainty modeling to determine IQ occurs very rarely in the literature. One of the few examples not written by the authors of this paper is [25], which describes the relationship between IQ and uncertainty modeling. Information uncertainty was presented as part of the IQ model. This type of approach seems to be very attractive, as shown in the following sections. However, the article does not develop the method into full IQ modeling using uncertainty modeling.
Summarizing the above-mentioned publications, there appears to be no specific method for determining the IQ, especially in technical systems. The authors of this article attempt to fill this gap. Another disadvantage of the methods discussed above is that they are often limited to a specific model, that is, they lack openness to new dimensions of quality or their features. A further restriction is the frequent dependence on individual groups of IQ dimensions, which limits the flexibility of modeling and confines it to selected dimensions. Yet another limitation is the frequent omission of multi-level modeling, even with the simplest, minimal division of quality dimensions into features. There is also no link between the IQ and its subsequent states. The authors attempt to eliminate all these limitations in this study by presenting a multi-layer model of IQ that uses uncertainty modeling and takes into account information states known from the literature. The article uses the mathematical evidence method as an example of uncertainty modeling used to determine the IQ. In the following sections, in addition to the description of the method, an example of calculations for the selected model and a simulation of the model results depending on the input coefficients are also presented. A similar approach was presented by the main co-author in [7,26,27,28,29]. The presented approach is also expected to be applicable to the assessment and analysis of other systems by assessing reliability or risk, as presented in the works [30,31,32]. Using the modeling presented here in various other types of technical system assessments, such as diagnostics [33], risk assessment related to road and rail signaling [27,34], and development-related applications [35], seems possible too.
4. Research Problem and Research Methodology
The definition of the problem involves, on the one hand, the information quality dimensions, which depend on their features, and, on the other hand, the information states, which depend on the structure of the ICT system. In order to model information quality, a flat model of sixteen quality dimensions presented in [2] has been adopted [7,8,27]. This model is highly flexible: it makes it possible to define the features of quality dimensions and to make the dimensions depend on these features. The model applies the quality dimensions presented in Table 1 and in Figure 2.
The second element that requires modeling is the structure of the ICT system. Many models describing diverse information states in systems can be found in the literature. The most elaborate description is given in [36], where they are called information processes. The following types of information processes can be specified: generating, collecting, storage, processing, transmission, sharing, and interpreting (Figure 3).
Each of the seven above-named information states is a consecutive element influencing IQ. Thus, a formula can be devised in the following way:
$$IQ = [w_{i,j}]_{m \times n}$$
where:
m – the number of dimensions, IQ components (equals 16 according to Table 1);
n – the number of information states, related to registration and data transmission in an ICT system (equals 7 according to Figure 3);
w – a variable determining the influence of the particular dimension (e.g., range of values [0,1]).
The general form of the matrix:
$$IQ = \begin{bmatrix} w_{1,1} & \cdots & w_{1,n} \\ \vdots & \ddots & \vdots \\ w_{m,1} & \cdots & w_{m,n} \end{bmatrix}$$
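As an illustration, the coefficients w can be held in an m × n array. The sketch below assumes that rows correspond to the IQ dimensions and columns to the information states; this orientation, the helper names `make_iq_matrix` and `is_valid`, and the default value of 1.0 are illustrative assumptions rather than part of the described method.

```python
from typing import List

M_DIMENSIONS = 16  # number of IQ dimensions (Table 1)
N_STATES = 7       # number of information states (Figure 3)

Matrix = List[List[float]]


def make_iq_matrix(m: int, n: int, fill: float = 1.0) -> Matrix:
    """Build an m x n matrix of coefficients, all set to `fill` (assumed default)."""
    if not 0.0 <= fill <= 1.0:
        raise ValueError("coefficients are assumed to lie in [0, 1]")
    # Each row is created separately so that rows do not share storage.
    return [[fill] * n for _ in range(m)]


def is_valid(matrix: Matrix) -> bool:
    """Check that the matrix is rectangular and every coefficient lies in [0, 1]."""
    if not matrix:
        return False
    width = len(matrix[0])
    return all(
        len(row) == width and all(0.0 <= w <= 1.0 for w in row)
        for row in matrix
    )


iq = make_iq_matrix(M_DIMENSIONS, N_STATES)
iq[2][4] = 0.95  # hypothetical value: third dimension, fifth information state
```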
As mentioned before, a method for determining IQ dimensions is needed. Figure 4 presents the position of the quality dimension features with respect to the dimensions themselves. It is worth noting that the dimensions as such can be completely independent, yet their features can be shared between the dimensions. This means that one feature can influence many IQ dimensions. For example, dimensions no. 3 (believability) and no. 8 (free of error) (see Table 1) can share determining features, e.g., errors in the transmission or storage of the data that constitute the information (Figure 5).
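The sharing of features between dimensions can be sketched as a simple mapping. In the sketch below, the two dimensions and the two shared features come from the example above, while the feature "source reputation" and all identifier names are hypothetical additions used only to show a feature that is not shared.

```python
from collections import defaultdict
from typing import Dict, Set

# Dimension -> features that determine it.
dimension_features: Dict[str, Set[str]] = {
    "believability": {
        "transmission errors",
        "data storage errors",
        "source reputation",  # hypothetical feature, not taken from the text
    },
    "free of error": {
        "transmission errors",
        "data storage errors",
    },
}


def invert(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Turn a dimension -> features mapping into feature -> dimensions."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for dimension, features in mapping.items():
        for feature in features:
            result[feature].add(dimension)
    return dict(result)


feature_dimensions = invert(dimension_features)

# Features that influence more than one dimension.
shared = sorted(f for f, dims in feature_dimensions.items() if len(dims) > 1)
print(shared)
```

Keeping the feature-to-dimension view explicit makes it easy to see which observations must be reused when several dimensions are evaluated.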
Figure 6 presents a universal diagram of an IQ model for an ICT system taking into account information states, IQ dimensions, and their dimension features.
The IQ dimensions are determined using the method from publication [17], which was devised to establish the values of the factors of quality dimension features. This method is based on uncertainty modeling using mathematical evidence and dependent relations in serial models.
Taking into consideration what has been described so far in this study, the IQ of a chosen ICT system should be determined by following the flowchart in Figure 7. The first step is the choice of the information states of the given ICT system on the basis of Figure 3. The second step is the choice of IQ dimensions on the basis of Figure 2 or Table 1. At this point, all dimensions can be taken into account, but this will complicate the calculations. Generally, it is not necessary to include all IQ dimensions presented in Table 1 to determine the IQ of a chosen system. In the third step, the features of the chosen dimensions (Figure 4) should be selected, allowing for the fact that one feature can affect several dimensions (Figure 5). According to the literature [2], over one hundred and thirty features can be assigned to the presented sixteen dimensions. Only those features that have a significant impact on IQ, as in the example, should be selected. In the fourth step, it is possible to decide which of the information states for the evaluated ICT system will be multiplied (Figure 6). The penultimate step is to create a model or models to describe the impact of subsequent elements on the IQ. The last step is the calculation leading to one final IQ indicator. This sequence is used in the example in Section 7.
7. Method Demonstration
To demonstrate the method, a program was written that simulates some features of one of the IQ dimensions and the resulting final quality value for the IQ model of the ICT system presented in Figure 8. This software was created by Marek Stawowy, one of the authors of this article. The software, named DSHyb, enables calculations for uncertainty models applying DS (mathematical evidence) and the hybrid method. Figure 8 shows that the model was restricted to two information states of an ICT system. State e1 stands for the state of information transmission, and state e2 stands for the state of information interpretation.
The model of IQ state e1 presented in
Figure 9 includes three IQ dimensions:
e1a—appropriate amount of data;
e1b—believability;
e1c—error free.
The model of dimension e1c presented in Figure 10 includes four dimension features:
e1ca—correct information transmission;
e1cb—transmission of faulty data;
e1cc—data assignment to attributes failure;
e1cd—wrong attributes.
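The three layers listed above (states, dimensions, features) can be sketched as a small tree. The keys e1, e2, e1a to e1c, and e1ca to e1cd and their labels follow the text; the `Node` class, the root key "system", and the absence of children under e2 (whose dimensions are not listed here) are assumptions of this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One element of the multi-layer model: a state, a dimension, or a feature."""
    key: str
    label: str
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return the elements that have no sub-elements, in depth-first order."""
        if not self.children:
            return [self]
        result: List["Node"] = []
        for child in self.children:
            result.extend(child.leaves())
        return result


e1c = Node("e1c", "error free", [
    Node("e1ca", "correct information transmission"),
    Node("e1cb", "transmission of faulty data"),
    Node("e1cc", "data assignment to attributes failure"),
    Node("e1cd", "wrong attributes"),
])
e1 = Node("e1", "information transmission", [
    Node("e1a", "appropriate amount of data"),
    Node("e1b", "believability"),
    e1c,
])
e2 = Node("e2", "information interpretation")  # dimensions of e2 are not listed here
system = Node("system", "ICT system", [e1, e2])  # root key is hypothetical

print([node.key for node in system.leaves()])
```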
Table 2 presents values that are assigned to the model of IQ dimension in Figure 10.
A table is the clearest way of presenting calculations for independent elements using the theory of mathematical evidence. In this form, the dependencies can be shown in observation tables, as in Table 3, Table 4, Table 5 and Table 6, which show the subsequent stages of mass calculation. As a result, the mass m9 with the index e1ca gives the result Bel(e1ca).
The determination of the Bel value can also be presented in the form of a matrix. A detailed example of such an operation can be found, among others, in [17,27].
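As a hedged illustration of how a Bel value can arise from combining independent observations, the sketch below implements the standard Dempster rule of combination on a two-element frame. It is not the DSHyb software and not the hybrid method of [17,27]; the frame labels, function names, and mass values are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]

OK = frozenset({"ok"})
FAIL = frozenset({"fail"})
THETA = OK | FAIL  # the whole frame, i.e. "unknown"


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses, keep non-empty intersections, renormalize."""
    combined: Mass = {}
    conflict = 0.0
    for (set_a, mass_a), (set_b, mass_b) in product(m1.items(), m2.items()):
        intersection = set_a & set_b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict and cannot be combined")
    return {s: value / (1.0 - conflict) for s, value in combined.items()}


def belief(mass: Mass, hypothesis: FrozenSet[str]) -> float:
    """Bel(A): total mass of all focal sets contained in A."""
    return sum(value for s, value in mass.items() if s <= hypothesis)


# Two hypothetical independent observations supporting "ok".
observation_1: Mass = {OK: 0.9, THETA: 0.1}
observation_2: Mass = {OK: 0.8, THETA: 0.2}

result = combine(observation_1, observation_2)
print(round(belief(result, OK), 4))
```

With these hypothetical inputs, the two partial supports reinforce each other, so the combined belief in "ok" exceeds either individual mass.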
Having assigned the value with the use of the hybrid method described above and in [17,27], e1c = Bel(e1ca) = 0.9801.
Table 7 presents values that are assigned to the model of the IQ state in Figure 9.
Having assigned the value with the use of the hybrid method described in [17,27] and the example for calculating e1c, e1 = 0.99944.
Table 8 presents values that are assigned to the model of IQ states in Figure 8.
In this case, the elements are dependent (serial), so Equation (12) must be applied. Thus, h = e1 · e2 = 0.97945.
This value will be the IQ coefficient of the ICT system presented in the above example.
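The serial step can be checked with a short calculation. The value e1 = 0.99944 and the product form h = e1 · e2 are taken from the text; the value e2 = 0.98 is an assumption inferred from the reported result, and the function name `serial_iq` is hypothetical.

```python
import math
from typing import Iterable


def serial_iq(values: Iterable[float]) -> float:
    """IQ of dependent (serial) elements, taken as the product of their values."""
    values = list(values)
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("each value is assumed to lie in [0, 1]")
    return math.prod(values)


e1 = 0.99944  # value reported in the text
e2 = 0.98     # assumption: inferred from h = 0.97945, not stated in this section

h = serial_iq([e1, e2])
print(round(h, 5))
```

Because the elements are multiplied, the overall coefficient can never exceed the weakest element, which is why a serial model highlights the weakest link.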
As in the articles [17,29], a simulation of IQ is presented below, depending on the positive observation e1ca (Figure 11) and the negative observation e1cb (Figure 12). In order to visualize how IQ changes as a function of one of the IQ dimension features, a simulation was performed for the e1ca value (an observation with a positive influence on the IQ) in the range between 0.05 and 0.99. The results of this simulation are presented in the form of a diagram in Figure 11. The relationship obtained from the simulation approximates the expected graph presented in Figure 1.
In order to obtain a visualization of IQ dependency change as a function of one of the IQ dimension features, a simulation was performed for the e1cb (observation with negative influence on the IQ) value with a range between 0.001 and 0c. The results of this simulation are presented in the form of a diagram in
Figure 12.