**3. Methodology**

This paper describes two exploratory case studies using a multi-method approach to investigate the role of data governance as a boundary condition for data science. The case study is a widely adopted method for examining contemporary phenomena, such as the adoption of data governance (Choudrie and Dwivedi 2005; Eisenhardt 1989). In this research, we follow the explanatory case study design proposed by Yin (2009), which comprises the research question, the propositions for research, the unit of analysis, and the logic linking the data to the propositions. As suggested by Eisenhardt (1989), the research was contextualized by a review of background literature.

The background literature reveals that the results of data science initiatives are often not accepted by asset management organizations (Brous et al. 2017). Data science initiatives face a number of acceptance challenges in these organizations due, in part, to a lack of trust in data science outcomes (Cao et al. 2016; Yoon 2017). These challenges have led many asset management organizations to adopt data governance as a means of coordinating and controlling the impact of data science on the organization. However, data governance remains a poorly understood concept, and its contribution to the success of data science has not been widely researched. As discussed above, our main research question is therefore: how is data governance a boundary condition for data science?

Following Ketokivi and Choi (2014), deduction-type reasoning provided the basic logic for the propositions to be tested in a particular context, namely data science in the asset management domain. According to Ketokivi and Choi (2014), this general logic is augmented by contextual considerations. The data analysis in this research combines within-case analysis (Miles and Huberman 1994) with cross-case analysis, which made it possible to delineate the combination of factors that may have contributed to the outcomes of the cases (Khan and Van Wynsberghe 2008). The unit of analysis was a data science project in the asset management domain.

Two case studies were selected. The first case study, "Project A", was a data science project aimed at predictive, "just-in-time" maintenance of asphalted roads. The project was conducted under the auspices of a large European public organization tasked with the maintenance of national highways. The second case study, "Project B", was a data science project aimed at discovering fraudulent use of electricity in medium- and low-tension electrical grids without impacting individual privacy rights. Table 1 below shows the properties of the two cases according to subject, domain, organization size, organization type, number of datasets used, and project duration.


**Table 1.** Case selection.

The case studies were conducted using a multi-method approach that drew on multiple data sources. To prepare the respective organizations for the case studies, both were provided with information material outlining the objectives of the research. Following the suggestions of Yin (2009), the case study research followed a research protocol.

The primary data source was individual interviews. The interviews were conducted by the researchers over a period of two weeks, six months after the completion of the projects. Each interview was limited to one hour and followed a set line of questioning, although room was left for follow-up questions to clarify descriptions or subjective statements. In both cases, two data scientists (interviewees 1 and 2), one enterprise data architect (interviewee 3), and two data governance officers (interviewees 4 and 5) were interviewed.

Secondary data sources included relevant market research, policy documents, and websites. Internal policy documents were provided to the research team by the interviewees, and the researchers were also given access to the organizations' intranet and internet websites. All documents reviewed are available in the public domain.

Factors relating to the role of data governance as a boundary condition for data science were triangulated in three steps: aspects of data governance found in the internal documentation were listed, these were compared with the aspects of data governance raised in the interviews, and the results were matched with the interviewees' responses on how each aspect contributed to the success of the project. Interviewees were also asked to provide feedback on possible improvements.
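
The triangulation described above can be read as a comparison of sets. The following is a minimal sketch of that reading, not the procedure actually used in the study. The function `triangulate`, the `TriangulationResult` fields, and the aspect labels in the example are all hypothetical and were introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangulationResult:
    """Outcome of comparing aspects across sources (hypothetical structure)."""
    corroborated: frozenset      # found in documents AND raised in interviews
    documents_only: frozenset    # found in documents but not raised in interviews
    interviews_only: frozenset   # raised in interviews but absent from documents
    credited: frozenset          # corroborated aspects linked to project success


def triangulate(document_aspects, interview_aspects, credited_by_interviewees):
    """Compare data governance aspects from two sources and match the overlap
    with the aspects that interviewees said contributed to project success."""
    docs = frozenset(document_aspects)
    interviews = frozenset(interview_aspects)
    corroborated = docs & interviews
    return TriangulationResult(
        corroborated=corroborated,
        documents_only=docs - interviews,
        interviews_only=interviews - docs,
        credited=corroborated & frozenset(credited_by_interviewees),
    )


# Placeholder labels, assumed for illustration; they are not findings of the study.
example = triangulate(
    document_aspects=["aspect_a", "aspect_b", "aspect_c"],
    interview_aspects=["aspect_b", "aspect_c", "aspect_d"],
    credited_by_interviewees=["aspect_c", "aspect_d"],
)
print(sorted(example.corroborated))
print(sorted(example.credited))
```

Keeping the `documents_only` and `interviews_only` sets separate makes disagreements between sources visible, so that an aspect is reported as corroborated only when it appears in both the documentation and the interviews.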
