**1. Introduction**

Over the last few years it has become more common for organizations to implement data science initiatives to support the digital transformation of their business (Provost and Fawcett 2013). However, organizations continue to find it difficult to trust data science outcomes for decision-making purposes, as the data is often found to be lacking the required quality (Lin et al. 2006), and it is often unclear how compliant the use of the data and the algorithms is with regard to relevant legal frameworks and societal norms and values (Nunn 2009; van den Broek and van Veenstra 2018). These uncertainties are a barrier to the acceptance and use of data science outcomes due to the possibility of financial risk and damage to an organization's reputation. For example, when making decisions regarding the management of physical assets, asset managers need to be able to trust the data science outcomes before they are confident enough to use these outcomes. Examples of these decisions include when and where to perform maintenance on highways or when to replace a bridge. Erring on the side of caution can be unnecessarily expensive, whilst irresponsible delay of maintenance can put public safety at risk. In order for data science to be successfully adopted, it is therefore vital that organizations are able to trust the integrity of the data science outcomes (Council on Library and Information Resources 2000; Randall et al. 2013). Recently, data governance has gained traction with many organizations as a means to develop this trust (Al-Ruithe et al. 2019; Brous et al. 2016). However, it remains unclear how data governance contributes to the development and maintenance of trust in data science for decision-making, leading to calls for more research in this area (Al-Ruithe et al. 2019; Brous et al. 2020).

The goal of data science is to improve decision-making. According to Dhar (2013), the term data science refers to knowledge gained through systematic study and presented in the form of testable explanations and predictions. As such, data science differs from traditional science in a number of ways (Dhar 2013; Provost and Fawcett 2013). Traditionally, scientists study a specific subject and gather data about that subject. This data is then analyzed to gain in-depth knowledge about that subject. Data scientists tend to approach this process differently, namely by gathering a wide variety of existing data and identifying correlations within the data which provide previously unknown or unexpected practical insights. However, research has shown that favoring analytical techniques over domain knowledge can lead to risks related to incorrect interpretation of the data (Provost and Fawcett 2013). Due to the automation of the decision-making process, it may be tempting to regard data science decision-making outcomes as being purely rational. However, as with all decision-making, the quality of the outcomes is subject to the constraints of bounded rationality (Simon 1947; Newell and Simon 1972), in that decision-making is constrained by the quality of the data available at the time. Data science models make decisions based on the information available to them at the time and also in the time given (Gama 2013). According to Gama (2013), bounded rationality can also appear in data science in the tradeoff between the time and space required to solve a query and the accuracy of the answer. As such, it is not surprising that many organizations are implementing data governance in order to gain control over these factors (Alofaysan et al. 2014; Brous et al. 2020; van den Broek and van Veenstra 2018). Although recognized as being a powerful decision-making tool, data science is limited by the quality of the data inputs and the quality of the model itself.

Data governance can be defined as "the exercise of authority and control (planning, monitoring, and enforcement) over the management of data assets" (DAMA International 2017, p. 67), and can provide direct and indirect benefits (Ladley 2012). For example, Paskaleva et al. (2017) show that adoption of data governance can change how data is created, collected, and used in organizations. Data governance can greatly improve the awareness of data science outcomes for the management of infrastructure in, for example, a smart city environment (Paskaleva et al. 2017). However, information technology (IT)-driven data governance initiatives have failed in the past (Al-Ruithe et al. 2019), often being affected by technical feasibility aspects carried out on a system-by-system basis.

In this paper, a different starting point is used, and the focus is put on the investigation of data governance as a boundary condition for data science, which needs to be satisfied in order to be able to trust data science outcomes, as suggested by Brous et al. (2020) and Janssen et al. (2020). In this research, boundary conditions for data science are defined as socio-technical constraints that need to be satisfied in order to be able to trust data science outcomes. These conditions refer to the "who, where, when" aspects (Busse et al. 2017) that must be addressed before data science outcomes can be used. Previous research (Brous et al. 2020; Janssen et al. 2020) has suggested that data governance can be viewed as a boundary condition for data science. As such, our main research question asks: how is data governance a boundary condition for data science decision-making outcomes?

In order to answer this question, two explanatory data science case studies in the asset management domain were analyzed with specific regard to the role of data governance as a boundary condition for trustworthy predictive decision-making through the creation of trust in data science decision-making outcomes. The first case under study is a data science project designed to improve the efficiency of road maintenance through predictive maintenance. The project was performed under the auspices of a large European government organization using a multitude of datasets which were sourced both within the organization and externally. Open data (Zuiderwijk and Janssen 2014) were also employed within this case study. The second case study is a data science project which analyzes transformer data to identify the fraudulent use of electricity within medium and low tension electrical grids without infringing privacy regulations. This project was performed under the auspices of a European distribution grid operator (DGO) which is responsible for the distribution of electricity over medium and low tension grids in a highly industrialized region of Europe.

Duality of technology theory (Orlikowski 1992) is used to guide the analysis of the case studies in understanding trust in data science outcomes as a boundary value problem and specifically the role of data governance as a boundary condition for trusting data science outcomes. Duality of technology (Orlikowski 1992) describes technology as assuming structural properties while being the product of human action. From a technology standpoint, data science outcomes are created by data scientists in a social context, and are socially constructed by users who attach different meanings to them and provide feedback to the data scientists. In this way, data science outcomes are the result of the ongoing interaction of human choices and organizational contexts, as suggested by duality of technology (Orlikowski 1992). This approach differs from previous research into data science success factors, which has focused on the view that data science is either an objective, external force which has a deterministic impact on organizational properties (Madera and Laurent 2016), or that trust in data science outcomes is purely a result of strategic choice and social action (Gao et al. 2015). Duality of technology theory suggests that either model would be incomplete and that both perspectives should be taken into account when analyzing boundary conditions of data science. The results of the case studies suggest that data science outcomes are more likely to be accepted if the organization has an established data governance capability, and we conclude that data governance is a boundary condition for data science as it enables organizational conditions and consequences of data science to be met and ensures that outcomes may be trusted.

The paper is structured as follows. Section 2 presents the background literature regarding the relationship between data governance and data science. Section 3 describes the methodology of the research. Section 4 describes the findings of the case studies. Section 5 discusses these findings, and Section 6 presents the conclusions.
