## **1. Introduction**

Highway agencies devote significant resources to collecting, storing, and maintaining many forms of data, ranging from preliminary survey data to pavement condition data, throughout the life cycle of a highway project. For instance, the National Highway Authority of India launched Data Lake, a project monitoring tool that tracks the progress of projects and acts as the central repository of documents across the project life cycle [1]. According to the FMI (2019) report titled "Big Data Equals Big Questions for the Engineering and Construction Industry," some of the most significant infrastructure projects require an average of 130 million emails, 55 million documents, and 12 million workflows. At the same time, 95.5% of all data collected in the engineering and construction industry is unutilised because many firms cannot manage and process vast amounts of data for decision-making [2]. According to a 2018 industry report by FMI titled "Construction Disconnected," 48% of all rework in infrastructure projects in the United States is caused by poor data and miscommunication, resulting in an annual cost of over USD 31.3 billion. Globally, an average of 52% of rework was caused by poor data and communication, amounting to USD 280 billion. Specifically, 34.4 percent of rework was caused by incorrect project data, meaning out-of-date or otherwise flawed data, while 28.8 percent was caused by difficulty in gaining access to necessary project data [3]. Despite this significant investment, whether the data are utilised to meet users' needs for extracting information and knowledge and for supporting decisions has become debatable [4]. Data collection is becoming an increasingly significant asset in highway management and operation. Several systems and technologies have created significant infrastructure data in recent years [5].

**Citation:** Krishna, C.M.; Ruikar, K.; Jha, K.N. Determinants of Data Quality Dimensions for Assessing Highway Infrastructure Data Using Semiotic Framework. *Buildings* **2023**, *13*, 944. https://doi.org/10.3390/buildings13040944

Academic Editors: Ming-Hung Hsu, Osama Abudayyeh, Zheng-Yun Zhuang and Ying-Wu Yang

Received: 30 December 2022 Revised: 27 March 2023 Accepted: 29 March 2023 Published: 2 April 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Data have been widely used to manage system operations and provide information on highway conditions. However, public and private users have found that using and operating on the data is becoming increasingly complex. Data are collected with varying degrees of precision and resolution, and data formats are often incompatible [6].

Technological advancements in data collection enable real-time monitoring and generate massive volumes of data, such as the data collected in the structural health monitoring of a bridge [7] and the data collected during the degradation process of concrete material [8]. The issue intensifies as the volume of data continues to increase [9–15]. Ghasemaghaei and Calic [16] discussed the role of data quality and diagnosticity in a firm's decision-making, considering the effect of big data processing. However, there is substantial evidence that data quality issues are pervasive in practice and that relying on poor or uncertain data results in less effective decision-making. It also increases the cost of correcting the data in the decision-making process of highway projects [17,18].

Data quality has been extensively studied in various disciplines for several decades [19]. It has become a professional field, emphasising organisational strategy and effective decision-making [20,21]. In addition, data quality is considered a multi-dimensional concept in the literature [22–24]. In the last two decades, scholars and practitioners have proposed several classifications of data quality dimensions, many of which have overlapping and occasionally contradictory meanings across their respective disciplines (e.g., [14,24–26]). Despite the different classifications, few investigations have attempted to integrate these perspectives on data quality dimensions to assess the quality of highway data for effective decision-making. For instance, Coleman [27] gave an insightful examination of the various current classifications of data quality dimensions and identified sixteen mutually incompatible dimensions.

Although numerous studies have established the significance of data quality for decision-making using various frameworks and methodologies, little attention has been given to assessing data quality at the different decision-making levels of highway projects [5,28,29]. Samitsch et al. [30] provided a guide for companies seeking to improve organisational performance by improving data quality, using a combination of 16 dimensions. Addressing this issue necessitates a method for comprehending data quality, followed by methods for enhancing data quality and for making decisions based on data quality information. This research proposes a semiotic-based framework for comprehending highway infrastructure data quality, consisting of four levels: syntactic (form), empiric (connection), semantic (meaning), and pragmatic (use) [29]. The framework assesses and interprets data quality by applying semiotic theory, which concerns the use of signs and symbols to convey data, information, and meaning [31]. A review of data quality frameworks applied in various fields was also carried out. For example, the semiotics framework, the AIMQ methodology, data quality assessment (DQA), the observe-orient-decide-act methodology (OODA DQ), and the Canadian Institute for Health Information (CIHI) framework are used in the healthcare industry for data quality assessment [32–36], while the total data quality management (TDQM) framework, comprehensive methodology for data quality management (CDQ), data quality practical approach (DQPA), task-based data quality method (TBDQ), and data quality assignment framework (DQAF) are used in the IT industry to deliver high-quality information products (IP) to information consumers [9,37–41]. The DQMos model and the DQMes methodology are used for evaluating the quality of data from software engineering experiments [42].

A questionnaire survey of National Highway stakeholders identified the critical data quality dimensions at each level of the proposed semiotic framework for decision-making. The survey helps National Highway stakeholders understand the parameters, or dimensions, of data quality needed to assess the quality of data stored in the data lake. The study identified frameworks for analysing data quality and determined the appropriate framework for assessing highway infrastructure data. Currently, there are no specialised studies of data quality dimensions for evaluating highway infrastructure data.

The study began with a literature review, which was used to identify existing data quality frameworks. The second step identified data quality dimensions within the four levels of the semiotic data quality framework. The third step comprised an interview and a questionnaire, conducted in two stages. In the first stage, an interview survey was undertaken to develop a list of data quality characteristics that reflect the opinions of data consumers regarding data quality. In the second stage, a questionnaire was developed from the dimensions identified through the interview study. The questionnaire survey gathered information on the importance of each dimension to data consumers at their individual level of decision-making. The dimensions were then ranked within the categories of the semiotic framework to comprehend stakeholders' priorities for each data quality characteristic.
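The ranking step described above can be illustrated with a minimal sketch. It assumes that each respondent rates a dimension on a 1 to 5 importance scale and that dimensions are ranked by mean rating within each semiotic level; the ranking statistic is not specified in this section, so the mean is an assumption. The dimension names, their assignment to levels, and the ratings are all hypothetical values chosen for illustration, not survey results.

```python
from collections import defaultdict
from enum import Enum
from statistics import mean


class SemioticLevel(Enum):
    """The four levels of the semiotic framework named in the text."""
    SYNTACTIC = "form"
    EMPIRIC = "connection"
    SEMANTIC = "meaning"
    PRAGMATIC = "use"


# Hypothetical dimension-to-level assignment, for illustration only.
DIMENSION_LEVEL = {
    "consistency": SemioticLevel.SYNTACTIC,
    "accessibility": SemioticLevel.EMPIRIC,
    "accuracy": SemioticLevel.SEMANTIC,
    "completeness": SemioticLevel.SEMANTIC,
    "timeliness": SemioticLevel.PRAGMATIC,
    "relevance": SemioticLevel.PRAGMATIC,
}

# Hypothetical 1-5 importance ratings, one list entry per respondent.
RESPONSES = {
    "consistency": [4, 5, 4],
    "accessibility": [3, 4, 4],
    "accuracy": [5, 5, 4],
    "completeness": [4, 4, 3],
    "timeliness": [5, 4, 4],
    "relevance": [3, 4, 5],
}


def rank_within_levels(responses, dimension_level):
    """Return {level: [(dimension, mean_score), ...]}, highest mean first."""
    grouped = defaultdict(list)
    for dimension, scores in responses.items():
        grouped[dimension_level[dimension]].append((dimension, mean(scores)))
    return {
        level: sorted(items, key=lambda item: item[1], reverse=True)
        for level, items in grouped.items()
    }


if __name__ == "__main__":
    ranking = rank_within_levels(RESPONSES, DIMENSION_LEVEL)
    for level, ranked in ranking.items():
        print(level.name, [(name, round(score, 2)) for name, score in ranked])
```

Grouping by level before sorting keeps the ranking local to each category, so a dimension is compared only with others at the same semiotic level. A different statistic, such as a weighted index, could replace `mean` without changing the structure.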

The paper is structured as follows. The next section reviews the literature on frameworks and data quality dimensions and identifies the most effective framework for evaluating highway infrastructure data. The subsequent section addresses the research methodology, the findings, and an analysis of the findings. Finally, conclusions and the scope for future work are presented.

## **2. Objectives**

The study's main objective was to investigate highway infrastructure data quality dimensions and the framework for assessing data quality. According to the 2018 FMI report, the cost of rework caused by poor data quality and poor accessibility of data was USD 31.3 billion in the United States, USD 8.4 billion in Australia and New Zealand, and USD 10.2 billion in the United Kingdom [3]. The literature shows that poor data quality negatively impacts the time and cost of making a decision, as well as decision-making performance, across the highway infrastructure project life cycle. Hence, assessing data quality is critical for organisations, which makes it important to identify the dimensions that define data quality. The objective of the study was divided into three key research objectives as follows:


## **3. Literature Review**
