Data-Driven Business Innovation Processes: Evidence from Authorized Data Flows in China

Gao, Xueyuan; Feng, Hua

doi:10.3390/systems12080280


by Xueyuan Gao and Hua Feng *

School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China

* Author to whom correspondence should be addressed.

Systems 2024, 12(8), 280; https://doi.org/10.3390/systems12080280

Submission received: 24 June 2024 / Revised: 31 July 2024 / Accepted: 1 August 2024 / Published: 2 August 2024


Abstract

The importance of data in current societal activities cannot be overstated, yet we know little about how data are governed and applied. Drawing on the Chinese Government Data Empowerment Initiative, this paper examines the process of data-driven business innovation. Using a staggered difference-in-differences (DiD) model, we found that government data points effectively facilitate firms’ product innovation, with higher-quality data associated with better innovation performance. Government-authorized data points help firms accumulate experience in applying and managing data, thereby enhancing their production performance. Furthermore, these data help firms improve the quality of their product innovations and achieve iterative product upgrades. We also found that government-authorized data points not only generate product innovations for government use but also stimulate commercial product innovations. This research provides important insights into data governance and enterprise data management decisions.

Keywords:

data governance; data quality; government data; data management; business innovation; innovation management

1. Introduction

The application of technologies such as big data and artificial intelligence has not only catalyzed explosive growth in data within the social system but also heightened the demand for data. Key technologies associated with Industry 4.0 present vast opportunities for both governmental and business model innovation [1,2]. These technologies enable companies to efficiently process and apply data, extracting valuable information and knowledge to complete the value creation process [3]. In recent years, numerous examples have shown how Industry 4.0 technologies drive fundamental business model transformations [4,5]. Nonetheless, the existing literature does not adequately explore the implications of these transformations for business management [6], and research on the mechanisms underpinning business model innovation remains sparse [7]. This paper contributes to this expansive topic by examining business innovation models from a data-driven perspective.

Industry 4.0 technologies inherently rely on data to realize their value, yet few studies directly examine the processes through which data points generate value. The pivotal role of data underscores why big data and cloud computing technologies exert a more significant influence on business model innovation than other technologies [8]. Increasingly, scholars posit that data points drive knowledge accumulation, thereby fostering long-term economic growth [9]. For organizations to remain competitive, continuous innovation and adaptation of business models are imperative [10]. This raises two questions: How do data points circulate within social systems to generate value? What impact do these data have on business innovation?

While part of the literature acknowledges the theoretical value of data in the process of knowledge accumulation, there is a notable lack of practical assessment. One significant impediment is data privacy concerns, which restrict data availability and obscure the actual flow of data in society. However, government data points mitigate this problem by encompassing the vast majority of societal data. For instance, in China, the government holds approximately 80% of all data, encompassing various aspects of life. Thus, government data points serve as a comprehensive measure of the total volume of data in society. Additionally, the government facilitates data sharing and utilization through contractual licensing with companies. By entering into such contracts, companies gain access to extensive and rich datasets, enhancing their innovation capabilities. This symbiotic approach not only promotes data flow across society but also aids companies in the knowledge accumulation process. Consequently, the economic value of data is harnessed, contributing to the overall sustainability of the economy.

The government data-sharing process in China offers an exemplary case to assess the value development and circulation processes of data. Using this framework, we examined the process of value generation and the flow of data within social systems. We acquired and analyzed all government data-sharing contracts from 1996 to 2023, examining the subsequent knowledge accumulation in numerous high-tech firms that gained access to government data. These firms, which specialize in advanced technologies such as big data and artificial intelligence, are central to societal knowledge accumulation. Therefore, if we observed a distinct pattern of knowledge accumulation through this process, it would substantiate the notion that social data points foster a positive feedback loop.

Leveraging the particular setting of Chinese government data flows, we found that government data sharing significantly increases the innovation output of high-tech firms and that this effect varies across China’s economic regions. Knowledge accumulation is more pronounced in regions with higher levels of economic development, which to some extent reflects regional differences in data quality: more developed regions typically have higher-quality data and are therefore more likely to generate effective knowledge accumulation. In addition, we found that data exhibit diminishing marginal value: smaller firms, which often lack access to data, have a greater need for government data and are more likely to reap the benefits of knowledge accumulation. We also found that enterprises gained innovation experience by utilizing government data, which helped them complete the iterative upgrading of knowledge. Finally, we found that data generate positive externalities: the data obtained from government departments not only helped firms produce the software products needed by the government but also helped them develop additional commercial applications.

Our paper contributes in several ways: First, we provide direct insights into the governance and application of data in society. By demonstrating that government-initiated data sharing can expedite data flow to enterprises through authorized channels, our findings suggest that this approach can mitigate data privacy concerns while transforming vast amounts of data into actionable knowledge. Given the valuable nature of data points, companies often lack incentives to share them. Opening up and circulating government data facilitates the completion of the entire data cycle process. Second, we offer guidance on how organizations can effectively integrate and manage data. Our paper indicates that companies can leverage data across multiple knowledge accumulation processes rather than limit their application to a single R&D dimension. The insights gained from data utilization drive further product enhancements and iterations. Third, our study provides empirical support for the emerging theoretical literature on the value of data for economic growth. We identify positive externalities from data in the knowledge accumulation process, highlighting that these benefits can be shared across different segments of knowledge accumulation.

We also contribute to the literature on the government data ecosystem. While some studies emphasize the importance of government data for knowledge accumulation, they often lack analysis of real-world scenarios. Our research provides crucial empirical evidence by examining the impact of government data on the dynamics of knowledge accumulation within the context of Chinese government data sharing. This analysis offers valuable insights into how government data can effectively drive knowledge accumulation in practical settings.

The remainder of the paper is organized as follows: Section 2 presents the literature review, discusses studies related to this paper, and introduces our theoretical framework. Section 3 presents the empirical design, detailing the data sources and empirical strategy. Section 4 covers the regression analysis, including the benchmark regression and robustness tests. Section 5 and Section 6 present the heterogeneity analysis and the underlying mechanisms, respectively. Finally, Section 7 presents the discussion and conclusion.

2. Literature Review and Theoretical Framework

2.1. Literature Review

2.1.1. The Economic Value of Data

Data points generated from various sectors of society hold significant practical value for production applications. They have become a crucial input for numerous organizations [11,12], facilitating the extraction of valuable knowledge and information through processing. Acquisti et al. (2016) highlighted that non-rivalry is pivotal for the long-term economic value of data [13]. Unlike physical goods, data can be extensively utilized without depletion or exclusive use, allowing widespread sharing and utilization among many individuals without diminishing their overall availability. Essentially, the marginal cost of data approaches zero, enabling different firms to apply the same dataset across various economic activities [14].

Positive externalities represent a significant economic attribute of data, where processing and applying large datasets often yield unintended benefits. Schaefer and Sapi (2020) demonstrated that accumulating search data enhances firms’ product quality [15]. As firms expand their data repositories, their capacity to process and apply data grows, thereby increasing the likelihood of deriving new knowledge [16,17]. This results in data exhibiting increasing returns to scale, contributing progressively greater value when combined with factors of production such as labor and capital [14].

The value of data is rapidly increasing [18]. Data play a crucial role in predicting the actual value of products and are essential for product pricing strategies [19]. Many organizations have leveraged data accumulation to foster innovative activities [20]. By acquiring extensive consumer-related data, companies can accurately discern user preferences and recommend more suitable products [21]. Furthermore, large datasets can effectively adjust asset pricing to align with real values [22], significantly enhancing decision-making processes in the manufacturing and trading sectors through big data technologies. This body of literature underscores the substantial value of data across various industries.

A portion of the literature explores the beneficial impacts generated by data. Big data is a lucrative source of information that enhances operational efficiency and optimizes resource allocation [23,24]. By integrating data from diverse sources, firms can effectively forecast customer demand. Lamba and Singh (2019) found that the utilization of big data optimizes supply chain management and enhances its effectiveness [25]. Begenau et al. (2018) demonstrated that financial firms’ analytics, leveraging large volumes of data, substantially mitigate information asymmetry issues [26]. Sorescu (2017) argued that the intensive use of data opens up a wide range of possibilities for innovation in business models [27].

From an economic systems perspective, data serve as a primary driver of economic growth through their role in knowledge production. The increasing demand for data in numerous innovative activities underscores their inherent value, despite data being a by-product of economic activities. Moreover, the non-rival nature of data allows for their shared utilization across various innovative endeavors. Xie and Zhang (2024) explored the impact of different data types on economic development, considering data generated not only by consumers but also through firms’ production processes [28]. They found that utilizing both types of data results in higher economic growth rates than relying solely on consumer data. This highlights the importance of both data quantity and quality in fostering economic growth. Additionally, Hou et al. (2022) argued that limited data storage capacity can pose a significant constraint on data-driven knowledge accumulation [29]. Intensive data applications in innovation activities require adequate storage infrastructure.

Data points contain a vast amount of information and knowledge that requires processing to extract critical insights. Early literature emphasizes the value of knowledge accumulation for technological innovation [30]. Through the learning-by-doing process, firms accumulate a large amount of tacit knowledge as well as organizational capital that is uniquely their own, which helps firms increase productivity [31,32]. Data-based knowledge can be traded and circulated more effectively than other forms of knowledge accumulation. This means that data have a considerable ability to enable knowledge diffusion, thereby generating positive externalities.

Aggregation, exchange, and reconfiguration of information provide new opportunities for corporate innovation. In the innovation process, data points often serve as critical tools for training new algorithms and developing products, significantly enhancing R&D activities and the creation of new services [33]. When the training dataset contains more information, it leads to higher-quality algorithms and products [34]. As a firm’s product innovations attract more users, the user data points serve as valuable supplementary information to the original training data, leveraging the data’s positive externalities to further enhance product innovation quality. This positive feedback loop increases the marginal returns of data and underscores their pivotal role in firms’ innovation activities [19]. Agrawal et al. (2018a) argued that applying data reduces the cost of errors in R&D, enabling researchers to comprehensively deduce various potential innovation outcomes and thereby reducing the complexity of innovation activities [35].

These theoretical studies affirm the significant value of data elements in firms’ innovation activities, emphasizing the importance of competitiveness and increasing marginal payoffs in enhancing innovation success. However, some studies find that the marginal value of data may be diminishing. Bajari et al. (2019) observed a decreasing trend in the effectiveness of data for predictive algorithms, while an Amazon study indicated that extensive data points covering various product types do not necessarily improve algorithm training accuracy [36]. These findings underscore that specific dimensions of data may offer limited value despite their overall long-term importance for algorithmic training. Cong et al. (2021) theorized about the adverse effects of over-reliance on data elements in innovation and economic activities [33]. They argued that excessive dependence on data could strain the scientific research workforce, exacerbating the reliance on data and potentially slowing economic growth and innovation. Applying too many data points may instead lead to a decrease in the quality of decision-making and depress the operational efficiency of the organization [37]. Thus, data can be a double-edged sword for business models.

2.1.2. Data Application in the Public and Private Sectors

Studies on data for business model innovation focus mainly on the private sector. Arthur and Owen (2019) found that data have significant value for supply chain innovation [38]. Improving the accuracy of new product forecasting by using various types of information is a key channel to exploit the value of data for innovation [39]. Mariani and Fosso Wamba (2020) found that data points help in predicting the demand for new products and help organizations better align their innovation strategies [40]. Firms utilize data for value creation, thereby enabling business model innovation. Research on entrepreneurship similarly indicates that the ability to analyze data is critical to improving operational efficiency [41]. Mikalef et al. (2020) found that big data enhances firms’ incremental and radical innovative capabilities, enabling firms that collect and process large amounts of data to quickly gain new insights from changing environments [42].

There is also some literature that focuses on public organizations. The literature has examined the innovative value of government data, often using qualitative analysis. Government data points originate from the Open Government Data (OGD) initiative, which aims to improve government transparency and guarantee citizens’ right to know [43,44]. Anyone can access open data points from government departments and freely use, reuse, and redistribute them. In the early years, the public lacked the means to fully exploit the value of government data due to limited access and processing capabilities [45]. Thanks to the development of technologies such as big data, research on the exploitation of government data has gradually emerged in recent years [46]. The public sector has a natural advantage in collecting data on health care, education, etc. The private sector can only collect such information with authorization and permission. More importantly, the private sector does not have the ability to collect data across society, but the public sector does. Therefore, government data points are usually of higher quality and value than private data.

Hopp et al. (2018) highlighted the significant value of applying data at scale to provide precision medicine [47]. Chen (2018) similarly found that large-scale integration of different types of healthcare data can drive the efficient utilization of healthcare resources [48]. In addition, the government has a natural advantage in accessing data in areas such as transportation and healthcare, whereas firms, even if authorized to collect such data, often face significant privacy challenges. Therefore, the government’s ability to collect data points and authorize their use by firms is an effective means of mitigating the negative impacts of data. Although related research is reflected in the literature on government data ecosystems, it is not well connected to business model innovation.

In another part of the literature on government data ecosystems, researchers mainly explore the circulation and utilization of government data on a theoretical level [49]. Government data ecosystems aim to promote open sharing and enhance the utilization and circulation of data within the public sector [50,51]. Similar to how data points foster knowledge accumulation, Fang et al. (2024) argued that government data can stimulate innovation and incentivize broader societal utilization [52]. Yang et al. (2015) analyzed various factors influencing data flows, focusing on interactions with data rather than the economic attributes that affect data circulation [53]. Theoretically, government data points contain information about every aspect of society, suggesting that government data may have economic value. Liu et al. (2022) argued that it is big data technology that makes government data economically valuable [54]. While this literature emphasizes the value of government data, it is relatively weak in analyzing the impact of government data on business models. This leaves a gap that our research seeks to fill.

In summary, the existing literature focuses primarily on how data points operate within individual sectors and lacks analysis of data exchange between the public and private sectors. Research on the practical use of government data is similarly limited [55]. We aim to analyze how government data can transform business innovation models. In Section 2.2, we explore how government data points enter the private sector, addressing the gap in the government data ecosystem literature regarding cross-border data flow [52]. Additionally, we examine how firms leverage government data for business innovation and how this process is realized. The literature currently lacks an exploration of government data exploitation in the context of private sector data development as well as an in-depth analysis of how government data points facilitate knowledge creation within firms [56,57].

2.2. Theoretical Framework

We present the conceptual model of the study in Figure 1. We used the flow of data from government departments as the starting point for our analysis. In its role as a public service provider, the government collects and holds data on most aspects of social activity, including transportation, finance, and healthcare. The government’s collection of public data can effectively promote the orderly flow of data and provide a basis for the data value creation process. The data are initially processed in government departments, where they are disaggregated into various types of information and downgraded in sensitivity. By encrypting the sensitive information they contain, privacy-compromising situations can be effectively avoided [58]. These desensitized data already possess some characteristics of information, but because they lack commercial objectives, they require further processing before they can be fully transformed into useful information. The government initially processes data points to enable their use in both the government and the private sector: it structures the data around administrative objectives and then makes them available for business use. Thus, the government effectively serves as a data source.
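The desensitization step can be illustrated with a minimal Python sketch. It is an assumption on our part, not the procedure used by government departments: it substitutes keyed hashing (HMAC-SHA256 pseudonymization) for the encryption mentioned above, and the field names (`citizen_id`, `phone`, `home_address`) and the key are hypothetical.

```python
import hashlib
import hmac

# Hypothetical schema: which fields count as direct identifiers or as
# fields to drop is an assumption, not taken from the paper.
DIRECT_IDENTIFIERS = {"citizen_id", "phone"}
DROPPED_FIELDS = {"home_address"}


def desensitize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` with identifiers pseudonymized and some fields dropped."""
    cleaned = {}
    for field, value in record.items():
        if field in DROPPED_FIELDS:
            continue  # fields not needed downstream are removed entirely
        if field in DIRECT_IDENTIFIERS:
            # Keyed hash (HMAC-SHA256): identical inputs map to identical tokens,
            # so records stay linkable, but the raw identifier is not released.
            digest = hmac.new(secret_key, str(value).encode("utf-8"), hashlib.sha256)
            cleaned[field] = digest.hexdigest()[:16]
        else:
            cleaned[field] = value  # non-sensitive attributes pass through unchanged
    return cleaned


if __name__ == "__main__":
    key = b"held-by-the-data-provider"  # hypothetical secret kept on the government side
    trip = {"citizen_id": "ID-000123", "phone": "PHONE-000456",
            "home_address": "Example Road 1", "station": "Xizhimen", "hour": 8}
    print(desensitize(trip, key))
```

Because the key stays with the data holder, firms receiving the output can link records belonging to the same person without recovering who that person is; whether real government pipelines work this way is not stated in the text.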

When data points enter the private sector, firms fully exploit the information they contain, transforming it into knowledge. The use of data produces new outcomes, such as knowledge, products, and services, and analyzing and processing the data effectively contributes to product innovation within the firm [59,60]. The value of data is fully realized through innovative products and services [27]. Companies can effectively use data to analyze customer needs and provide better products and services. Therefore, we proposed the following research hypothesis:

Hypothesis 1:

Government data points have a positive impact on business innovation.

Data create valuable, rare, and inimitable resources for firms [61], forming intellectual capital that belongs to the firm. Enterprises possess a certain ability to reorganize and integrate knowledge, which allows this knowledge to be applied in various activities, helping them improve existing products and processes [62]. For example, by analyzing traffic data, firms not only develop utility innovations applicable to urban transportation planning but also update commercially available mapping software. The knowledge gained from data helps companies better understand customer and market needs, leading to key innovations [63]. Thanks to big data technologies, firms can rapidly transform many types of knowledge into innovations [61]. Therefore, we proposed Hypotheses 2 and 3:

Hypothesis 2:

Government data points improve the quality of innovation by way of product and process innovation, thus optimizing the business innovation process.

Hypothesis 3:

Government data points generate knowledge that is shared and utilized across various innovation activities, expanding the types of business innovations, especially those aimed at serving the public sector.

Throughout the system, the government accelerates data flow by providing public services and minimizing privacy-related impediments. Enterprises use data to create innovative products for both the government and other business sectors. These users, spanning a wide range of industries, continue to generate large amounts of data, which the government can further collect, process, and utilize. The government receives feedback from these newly generated data to further optimize the structure of the data available to enterprises. As a result, the cross-border flow of government data helps firms update their business innovation models. The newly generated data points continue to serve as raw material for firms’ innovations, enabling them to produce more high-quality innovations and expand their user markets. Through this process, a complete closed loop of data flow is formed between the public and private sectors. In the following sections, we provide empirical evidence for this theoretical model, utilizing the substantial data points held by the Chinese government, which account for nearly 80% of all data.

3. Empirical Design

3.1. Data Sources

We first needed to assess the data that the government collects and authorizes for business use. We used procurement contract details from government procurement databases as the core source for analyzing firms’ access to data. In China, approximately 80% of data points with potential value are held by the government sector, making government data the primary channel through which enterprises obtain data inputs. To provide public services and ensure daily operations, government departments seek suppliers through procurement. To maintain transparency and fairness, contract information is disclosed through the government procurement database whenever a government department signs a contract with a supplier. The disclosed information includes the contract name, subject, key details, amount, and primary terms. Some procurement contracts are closely related to data, such as the “5G Intelligent Big Data Telephone Consultation Return System Service Procurement Project”; such contracts typically involve the sharing of government data, and suppliers are required to provide targeted products to government departments after acquiring the data. Therefore, the disclosed information on government procurement contracts provides a measure of an enterprise’s access to government data. These contracts also capture the government’s initial processing and handling of government data; in other words, they serve as proxies for the government’s functions of collecting, processing, and operating data, as illustrated in Figure 1.

We obtained all government procurement contract data from the Government Procurement Database (GPD) for the period 1996–2023. Because the database covers all types of government transactions, we needed to identify the contracts that were related to data. A contract is labeled as data-related if it meets at least one of the following conditions: (1) it has a clear requirement for the development and application of digital technology, such as “big data technology” or “artificial intelligence technology”, or otherwise requires intensive data use; (2) it covers important projects involving the dissemination of substantial amounts of data, such as “smart city”, “digital government”, or the “skynet project”; or (3) it is issued by government departments capable of generating substantial amounts of data, such as information and transportation departments. Based on these criteria, we initially identified 613,477 data-related contracts from a total of 2,758,754 contracts. These contracts cover multiple vendors, and 4855 vendors received more than one data contract.
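The three screening conditions can be expressed as a simple rule-based classifier. The sketch below is hypothetical: the keyword lists, the `Contract` fields, the department names in the usage example, and the use of substring matching are illustrative assumptions, not the authors’ actual screening code.

```python
from dataclasses import dataclass

# Hypothetical keyword lists illustrating criteria (1)-(3); the paper's full
# keyword sets and department list are not reproduced here.
TECH_KEYWORDS = ("big data", "artificial intelligence")
PROJECT_KEYWORDS = ("smart city", "digital government", "skynet")
DATA_RICH_DEPARTMENTS = ("information", "transportation")


@dataclass
class Contract:
    name: str
    department: str


def is_data_related(contract: Contract) -> bool:
    """A contract counts as data-related if it meets at least one criterion."""
    name = contract.name.lower()
    dept = contract.department.lower()
    meets_tech = any(k in name for k in TECH_KEYWORDS)                 # criterion (1)
    meets_project = any(k in name for k in PROJECT_KEYWORDS)           # criterion (2)
    meets_department = any(k in dept for k in DATA_RICH_DEPARTMENTS)   # criterion (3)
    return meets_tech or meets_project or meets_department


if __name__ == "__main__":
    contracts = [
        Contract("5G Intelligent Big Data Telephone Consultation Return System "
                 "Service Procurement Project", "Health Commission"),
        Contract("Office furniture purchase", "Finance Bureau"),
        Contract("Smart City Platform Upgrade", "Urban Management Bureau"),
        Contract("Road sign maintenance", "Transportation Bureau"),
    ]
    data_related = [c for c in contracts if is_data_related(c)]
    print(len(data_related))  # 3: every contract except the furniture purchase
```

Keyword rules of this kind are transparent but coarse; criterion (3), for instance, flags every contract from a data-rich department regardless of its subject, which is consistent with the text’s description of it as an initial, broad screen.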

From these contracted suppliers, we manually identified high-tech companies with technological innovation capabilities. This screening primarily used the “Tianyancha” database, which was established by an authoritative credit bureau and verified by official government departments. This database records in detail the business status, equity relationships, intellectual property rights, and other relevant information of companies in China. Based on this information, we first determined whether each contract supplier is a high-tech company, mainly by examining its business scope and intellectual property information.

Next, we determined whether these firms engage in technological innovation, measured by whether they have software output. Software embodies algorithmic logic that requires data for testing and development, so software innovation depends on data. We define “software output” as the innovative outcomes generated by firms through the use of big data; this metric captures how effectively data utilization drives business innovation. By matching companies with data contracts to their software outputs, we formed a database covering data input and software output. After excluding companies with no software output, we obtained detailed information on 278,315 software outputs from 1815 high-tech companies. Observing the impact of access to data contracts on firms’ innovation output allowed us to test the research hypotheses in Section 2.2.
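The matching step, which links data contracts to software outputs by firm and drops suppliers without software output, might look like the following sketch. The `(firm_id, item_id)` tuple layout and the function name `build_panel_inputs` are assumptions for illustration, not the authors’ data structures.

```python
from collections import defaultdict


def build_panel_inputs(contracts, software_outputs):
    """Link data contracts and software outputs by firm (hypothetical helper).

    contracts: list of (firm_id, contract_id) pairs for data-related contracts
    software_outputs: list of (firm_id, software_id) pairs
    Returns {firm_id: {"contracts": [...], "software": [...]}}, keeping only
    firms that hold at least one data contract AND have software output.
    """
    by_firm = defaultdict(lambda: {"contracts": [], "software": []})
    for firm, contract_id in contracts:
        by_firm[firm]["contracts"].append(contract_id)
    for firm, software_id in software_outputs:
        if firm in by_firm:  # membership test does not create new keys
            by_firm[firm]["software"].append(software_id)
    # Exclude contract suppliers with no software output.
    return {firm: rec for firm, rec in by_firm.items() if rec["software"]}


if __name__ == "__main__":
    contracts = [("A", "c1"), ("A", "c2"), ("B", "c3"), ("C", "c4")]
    software = [("A", "s1"), ("A", "s2"), ("C", "s3"), ("D", "s4")]
    panel = build_panel_inputs(contracts, software)
    print(sorted(panel))  # ['A', 'C']: B has no software, D has no data contract
```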

Figure 2 shows the trend in software output over the sample period. The number of software outputs has risen since 2010, with the growth rate increasing rapidly since 2015 and remaining high in recent years. This trend coincides with China’s strong development of digital technologies and the explosion of data volumes. During the sample period, 75.5% of companies produced more than one product innovation, spanning a range of different application areas.

3.2. Empirical Strategy

We used a staggered difference-in-differences (DiD) model to test the effect of government data authorization on product innovation. Several factors motivated our choice of this model. First, the government provides the data through a contract that indicates the date of the transaction but does not specify the volume of data. Second, firms do not access data simultaneously, and there is no uniform point in time at which all firms receive treatment. Some firms receive data support relatively early in the sample period, while others receive it later. To estimate the average impact of data on firms’ product innovation output, we needed to analyze the post-treatment effect for the sample that received treatment at different time points.

We used a variant of the parallel trends assumption for staggered DiD under which parallel trends need to hold only among units that have already received some treatment [64]. Specifically, firms that initially acquire contracts with relatively few data points and later acquire contracts with relatively more data form the treatment group, while firms that acquire only contracts with relatively few data points serve as the control group [65]. This approach avoids the strong assumptions of traditional parallel trends. Because contracts are not signed simultaneously, it is difficult to find an ideal control group for which parallel trends hold at every point in time. For example, a certain percentage of firms may develop the ability to access data support over time. Consequently, the criteria for judging whether a firm accesses data support can produce a heterogeneous control group, potentially violating the parallel trends assumption. Because every firm in our sample has already obtained some data contract, this setup improves estimation by holding constant some of the factors that determine whether a firm has access to data support.
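The group assignment described above can be sketched as follows, assuming each firm’s contracts are recorded as `(half_year_period, is_substantial)` pairs. The function name and the handling of firms whose first contract is already substantial (returned as `None`, i.e., left unassigned) are our assumptions; the text does not say how such firms are treated.

```python
from typing import Optional


def assign_group(contract_history: list[tuple[int, bool]]) -> Optional[str]:
    """Classify a firm from its (half_year_period, is_substantial) contract history.

    - "treatment": the earliest contracts are small-data and a substantial-data
      contract arrives in a later period
    - "control": only small-data contracts are ever observed
    - None: any other pattern (e.g., a substantial contract in the first period);
      this handling is an assumption, not stated in the paper
    """
    if not contract_history:
        return None
    history = sorted(contract_history)
    first_period = history[0][0]
    substantial_periods = [p for p, big in history if big]
    if not substantial_periods:
        return "control"
    if min(substantial_periods) > first_period:
        return "treatment"
    return None


if __name__ == "__main__":
    print(assign_group([(2, False), (5, True)]))   # treatment
    print(assign_group([(2, False), (4, False)]))  # control
    print(assign_group([(3, True)]))               # None
```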

The form of the staggered DiD model is shown in Equation (1):

$$Y_{i,t} = \sum_{\tau \neq 0} Data_{high} \cdot \mathbf{1}[P_{i,t} = \tau]\, \beta_{1\tau} + \sum_{\tau \neq 0} \mathbf{1}[P_{i,t} = \tau]\, \beta_{2\tau} + \gamma \sum_{\tau \neq 0} Control_{i\tau} + \alpha_i + \phi_t + \epsilon_{it} \qquad (1)$$

where $t$ is measured in half-year intervals and $\tau$ is an integer indexing half-year periods. $Data_{high}$ is a dummy variable for obtaining substantial data contracts. $P_{i,t} = t - G_i$ is the period of firm $i$ relative to the point at which it acquired the substantial data contract, and $G_i$ is the period (in half-year units) in which the firm acquired that contract. For example, $P_{i,t} = 1$ denotes the first half-year period following the acquisition of a substantial data contract. $Y_{i,t}$ is the software output of firm $i$ in the corresponding six-month period. $Control_{i\tau}$ is a set of control variables, mainly the firm’s registered capital ($Capital_{reg}$), the number of employees ($Employ$), and the founding date ($EstabDate$). $\alpha_i$ denotes firm fixed effects, which absorb time-invariant firm characteristics, and $\phi_t$ denotes time fixed effects, which absorb common shocks to high-tech firms in each period. $\epsilon_{it}$ is the error term. In all regressions, standard errors are clustered at the firm level. Our parameter of interest is $\beta_{1\tau} + \beta_{2\tau}$, which measures the causal effect of accessing substantial government data.
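To make the timing in Equation (1) concrete, the sketch below maps dates to half-year periods, computes the relative period $P_{i,t} = t - G_i$, and builds the event-time indicators with $\tau = 0$ omitted, together with their interactions with $Data_{high}$. The base year, the window of relative periods, and the function names are assumptions for illustration. The text does not state how $G_i$ is set for control firms, so the sketch simply takes $G_i$ as given.

```python
from datetime import date


def half_year_index(d: date, base_year: int) -> int:
    """Half-year counter: January-June of base_year -> 0, July-December -> 1, and so on."""
    return 2 * (d.year - base_year) + (0 if d.month <= 6 else 1)


def event_regressors(t: int, g: int, data_high: int, window: range) -> dict:
    """Regressors of Equation (1) for one firm-period.

    P = t - G is the relative period; tau = 0 is omitted, as in the equation.
    Returns 1[P = tau] and Data_high * 1[P = tau] for every other tau in the window.
    """
    p = t - g
    row = {}
    for tau in window:
        if tau == 0:
            continue
        d = int(p == tau)
        row[f"P={tau}"] = d
        row[f"high*P={tau}"] = data_high * d
    return row


g = half_year_index(date(2019, 8, 15), base_year=2016)  # contract signed in H2 2019 -> 7
t = half_year_index(date(2020, 3, 1), base_year=2016)   # observation in H1 2020 -> 8
row = event_regressors(t, g, data_high=1, window=range(-4, 5))
print(t - g, row["P=1"], row["high*P=1"])  # 1 1 1
```

Because both $t$ and $G_i$ are counted in half-years, $P_{i,t} = 1$ lands on the first semi-annual period after the contract, matching the definition in the text.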

We extended our initial screening of data-supporting contracts with additional criteria to identify contracts likely to involve substantial data volumes. These include contracts designated for critical government data initiatives and contracts explicitly linked to the deployment of digital technology in the public safety sector. Such contracts typically cover more precise and extensive datasets, so we treat them as carrying high-quality data. For example, the “Video Skynet Social Resource Integration Platform Construction Project” undertaken by the public security department focuses on systems such as video access.
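The screening rule above could be approximated with a keyword filter on contract titles and issuing departments. The keyword lists and the function `is_substantial` below are hypothetical placeholders rather than the terms the authors used; the sketch only shows the logical structure of the two criteria (a critical data initiative, or digital technology deployed in public safety).

```python
# Hypothetical keyword lists: illustrative only, not the authors' actual screening terms.
CRITICAL_INITIATIVE_TERMS = ("skynet", "big data center", "data resource platform")
PUBLIC_SAFETY_TERMS = ("public security", "police")
DIGITAL_TERMS = ("video", "platform", "system", "data")


def is_substantial(title: str, department: str) -> bool:
    """Flag a contract as carrying substantial data if
    (1) it belongs to a critical government data initiative, or
    (2) it deploys digital technology in the public safety sector."""
    text = f"{title} {department}".lower()
    critical = any(term in text for term in CRITICAL_INITIATIVE_TERMS)
    public_safety = any(term in text for term in PUBLIC_SAFETY_TERMS)
    digital = any(term in text for term in DIGITAL_TERMS)
    return critical or (public_safety and digital)


print(is_substantial("Video Skynet Social Resource Integration Platform Construction Project",
                     "Municipal Public Security Bureau"))  # True
print(is_substantial("Office furniture procurement", "District Education Bureau"))  # False
```

Criterion (2) requires both a public safety signal and a digital technology signal, so an ordinary procurement by a police department would not be flagged.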

This empirical approach allows a direct comparison of firms before and after they acquire substantial data inputs, so the estimated coefficients more directly reflect the impact of data inputs on product innovation among high-tech firms. Table 1 presents summary statistics for the key variables, disaggregated by group. Firms acquiring substantial data inputs have higher software output, are larger, and hold more working capital than firms acquiring smaller data volumes. We therefore control for these differences in our analysis.

4. Product Innovation Effects of Government Data Authorization

4.1. Baseline Regression

We examined the impact of government data inputs on firms’ product innovation using the model in Equation (1). The baseline results show that firms acquiring substantial data inputs produce significantly more software outputs than firms acquiring small data inputs. Figure 3 presents the baseline regression results. The treatment indicates whether a firm that initially had access to a small amount of data later gained access to substantial data. Standard errors are clustered at the firm level. The pre-treatment coefficients show no systematic trend, which is consistent with the parallel trends assumption. Two years after acquiring substantial data inputs, these firms produce nine more software outputs than firms with small data inputs. This indicates a sizable positive causal effect of government data on product innovation in high-tech firms and underscores the importance of high-quality data inputs for innovation. The baseline results therefore support Hypothesis 1.

4.2. Robustness Checks

4.2.1. Changing Data Contract Identification Rules

To ensure that our rule for identifying contracts with substantial data inputs does not drive the estimated average treatment effect, we adopted a more stringent criterion that considers only digital-technology contracts disclosed by the core public security sector. All other settings follow the baseline model. The black dots in Figure 4 show the corresponding staggered DiD estimates. The results do not depart significantly from the baseline, and the parallel trends assumption continues to hold between the treatment and control groups. Two years after acquiring substantial data inputs, firms again exhibit significantly higher software output than firms acquiring small data inputs.

We also used the level of regional digital infrastructure development as an alternative rule for determining whether firms receive substantial data inputs. Regions with more developed digital infrastructure can collect more high-quality data, which improves the quality of data inputs and thus strengthens firms’ capacity for product innovation. Data from these regions should therefore be more valuable than data from economically disadvantaged regions. Specifically, we proxied the potential data quality of each contract by the number of cameras in the region of the department issuing the contract. Contracts whose proxied data volume is above the regional average are classified as substantial data inputs, and those below it as small data inputs. The gray squares in Figure 4 show the results under this criterion. They remain consistent with the baseline, and the parallel trends assumption holds between the treatment and control groups. This robustness test supports our findings.
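A minimal sketch of the camera-based rule, assuming each contract is linked to the region of its issuing department and that “the regional average” means the unweighted mean camera count across regions. The data and the function name `classify_by_cameras` are hypothetical, and treating contracts exactly at the average as small data inputs is also an assumption.

```python
from statistics import mean


def classify_by_cameras(contract_regions: dict, cameras_by_region: dict) -> dict:
    """Return {contract_id: True if the issuing region's camera count exceeds the average}."""
    average = mean(list(cameras_by_region.values()))  # unweighted mean across regions (assumption)
    return {cid: cameras_by_region[region] > average
            for cid, region in contract_regions.items()}


cameras = {"R1": 120, "R2": 40, "R3": 80}  # hypothetical camera counts per region
contracts = {"c1": "R1", "c2": "R2", "c3": "R3"}
print(classify_by_cameras(contracts, cameras))  # {'c1': True, 'c2': False, 'c3': False}
```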

4.2.2. Consideration of Negative Weights

Regression analyses using staggered DiD often face the problem of “negative weighting” [66]. In staggered DiD models, it is difficult to justify the assumption that treatment effects are homogeneous across periods and firms. One potential issue arises when firms that receive substantial data inputs early in the sample period serve as controls for firms that receive such contracts later. Early treatment effects can then enter the estimate with negative weights, contaminating comparisons across periods and potentially biasing the coefficients estimated for the middle and late parts of the sample. In short, assuming a homogeneous average treatment effect across all periods after firms receive substantial data inputs is problematic, particularly for firms that initially receive small data inputs.

To mitigate this potential bias, we adjusted the regression weights. The blue diamonds in Figure 4 show the results of this adjustment, which reduces the influence of the early stages of data processing and places more weight on later samples. The results remain consistent with the baseline: two years after acquiring substantial data inputs, firms show significantly higher software output than firms with small data inputs.
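The negative-weighting problem described in this subsection can be seen in a toy example. For a static two-way fixed effects regression on a balanced panel, the coefficient equals a weighted sum of the cell-level treatment effects, with weights proportional to the treatment indicator after removing firm and period means. The sketch below computes those weights exactly with Python’s `fractions` module. It illustrates the general problem only; it is not the event-study specification in Equation (1) or the authors’ reweighting procedure, and the two-firm panel is invented.

```python
from fractions import Fraction


def twfe_weights(panel: dict) -> dict:
    """Weights a static two-way fixed effects regression places on each treated cell.

    panel maps firm -> list of 0/1 treatment indicators (balanced panel).
    The weight on cell (i, t) is the residual of D_it after removing firm and
    period means, scaled so that the weights over treated cells sum to one.
    """
    firms = list(panel)
    n, T = len(firms), len(panel[firms[0]])
    firm_mean = {f: Fraction(sum(panel[f]), T) for f in firms}
    period_mean = [Fraction(sum(panel[f][t] for f in firms), n) for t in range(T)]
    grand_mean = Fraction(sum(map(sum, panel.values())), n * T)
    # In a balanced panel, double demeaning equals the residual from regressing
    # D on firm and period dummies.
    resid = {(f, t): panel[f][t] - firm_mean[f] - period_mean[t] + grand_mean
             for f in firms for t in range(T)}
    treated = [cell for cell in resid if panel[cell[0]][cell[1]] == 1]
    total = sum(resid[cell] for cell in treated)
    return {cell: resid[cell] / total for cell in treated}


# Firm "early" is treated from period 1, firm "late" from period 2 (no never-treated firms).
weights = twfe_weights({"early": [0, 1, 1], "late": [0, 0, 1]})
for cell, w in weights.items():
    print(cell, w)
# ('early', 1) 1
# ('early', 2) -1/2
# ('late', 2) 1/2
```

The later period of the early-treated firm receives a weight of -1/2: its outcome effectively serves as a control for the late-treated firm, so if treatment effects grow with time since treatment, the pooled estimate can be pulled downward.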

4.2.3. Excluding the Effect of Financial Subsidies

Another factor that may interfere with causal identification is financial subsidies. Every contract typically involves a flow of funds, because government departments purchase products from firms through procurement. The increase in innovation output in the treatment group might therefore reflect direct financial support rather than data, which would confound causal identification. To address this, we controlled for the transaction amount each firm received from its data contracts by adding the interaction between this amount and the relative period to the regression. The red triangles in Figure 4 show the results after controlling for transaction amounts. They are broadly consistent with the baseline, suggesting that direct financial subsidies do not drive the estimated causal effect.
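One reading of the transaction-amount control is to interact the amount with each relative-period indicator, so that the effect of money may differ by period. The sketch below builds those terms for one firm-period. The function name, the omission of $\tau = 0$ (mirroring Equation (1)), and the assumption that `amount` is already on the scale used in the regression (for example, logged) are ours, not the authors’.

```python
def amount_interactions(amount: float, p: int, window: range) -> dict:
    """Controls amount * 1[P = tau] for each tau in the window except the omitted tau = 0."""
    return {f"amount*P={tau}": amount * float(p == tau) for tau in window if tau != 0}


print(amount_interactions(3.0, p=2, window=range(-2, 3)))
# {'amount*P=-2': 0.0, 'amount*P=-1': 0.0, 'amount*P=1': 0.0, 'amount*P=2': 3.0}
```

These terms would enter the regression alongside the event-time indicators, so any period-specific effect of contract money is absorbed rather than attributed to data.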

Finally, we extended the post-treatment window to assess whether this positive effect is long-lasting and stable. Using the same staggered DiD model as in the baseline regression, we extended the post-treatment observation period to period 8. The positive causal effect of substantial data inputs on product innovation persists and remains evident two years after firms receive the substantial data contract. Because these results mirror the baseline, we do not present them here.

5. Heterogeneity Analysis

5.1. Regional Data Quality

In the baseline regression, we found that data inputs have a positive causal effect on a firm’s product innovation, and this positive effect becomes more pronounced when firms acquire substantial data inputs. This implies that larger or higher-quality data points are more valuable for product innovation. Regions with higher levels of economic development typically have a greater ability to provide substantial data inputs. Therefore, we predicted that firms in economically advanced regions with substantial data inputs would exhibit better product innovation performance compared to firms in other regions.

To capture this potential impact, we classified firms acquiring substantial data input contracts in the East as the group with a high level of economic development, and firms acquiring such contracts in other regions as the group with a relatively lower level of economic development. Firms that do not receive substantial data input contracts were assigned to groups according to their office location. Because the eastern region of China is more economically developed, firms may obtain more and higher-quality data there than in other regions.
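The grouping rule can be written as a short function. The set of eastern provinces below is an assumption (the ten provinces usually assigned to the eastern region in China’s official four-region division); the text does not list the provinces it uses, and the function name is hypothetical.

```python
from typing import Optional

# Assumed list of eastern provinces; the text does not enumerate the regions it uses.
EASTERN_PROVINCES = {
    "Beijing", "Tianjin", "Hebei", "Shanghai", "Jiangsu",
    "Zhejiang", "Fujian", "Shandong", "Guangdong", "Hainan",
}


def region_group(treated: bool, contract_province: Optional[str], office_province: str) -> str:
    """Firms with a substantial-data contract are grouped by where the contract was issued;
    all other firms are grouped by their office location."""
    province = contract_province if treated else office_province
    return "east" if province in EASTERN_PROVINCES else "other"


print(region_group(True, "Zhejiang", "Sichuan"))  # east (contract issued in the East)
print(region_group(False, None, "Sichuan"))       # other (grouped by office location)
```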

We ran separate regressions for each regional group, and Figure 5 shows the staggered DiD results. All other settings follow the baseline model. The black dots show the results for the region with a relatively high economic level (the eastern region), and the blue diamonds show those for the other regions. Notably, firms acquiring substantial data inputs from the East achieve better innovation performance than similar firms in other regions, and the positive causal effect of these data inputs appears quickly in the East but relatively later (typically a year later) in other regions. Because the staggered DiD design excludes some of the factors that may affect whether a firm is selected as a supplier, and because we control for several confounders of product innovation output, it is reasonable to attribute this better performance to higher-quality data inputs. Higher-quality data plausibly shortens the time firms need to train their algorithms, allowing them to produce software outputs more quickly.

5.2. Scarcity of Data

The value of data inputs for firms’ product innovation is likely to vary depending on the size of the firm. Larger firms have the inherent ability to generate more data, making them more likely to experience the curse of data volume when using data to train algorithms [36]. To verify this scenario, we categorized the sample into large, medium, and small firms based on the official classification of firm size and ran separate regressions for the different groups.

Figure 6 shows the regression results for the firm size subgroups. The gray dots represent large firms, the blue diamonds medium firms, and the red squares small firms. Substantial data inputs continue to have a significant positive effect on product innovation in medium and small firms, but not in large firms. This result partly supports our hypothesis: larger firms may have alternative channels for acquiring suitable data, such as access to more private data, which diminishes the value of substantial government data inputs. This is reflected in the difference in coefficients across subsamples. However, only a small number of firms are classified as large, which reduces the precision of the estimates for that group; the wider confidence intervals indicate larger errors and weaken the strength of this evidence.

6. Dynamics of Knowledge Accumulation

6.1. Learning from Data

When firms acquire substantial data inputs, they accumulate experience related to their innovation activities as they apply the data to produce software. This experience, which can be gained only through actual innovation activity, helps firms continuously improve their innovation efficiency, a “learning-by-doing” effect of data use. To measure it, we calculated each firm’s average software output rate from its output in each half-year period and divided firms into high and low innovation capability groups using the sample mean as the cutoff. If learning effects exist, the high-capability group should show significantly higher output after acquiring substantial data inputs than the low-capability group.
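A minimal sketch of the capability split, assuming “sample mean” refers to the mean of firm-level average output rates and that firms exactly at the cutoff fall into the low group; the input format and function name are hypothetical.

```python
from statistics import mean


def capability_groups(outputs: dict) -> dict:
    """outputs maps firm -> software counts in each half-year period.

    Each firm's output rate is its mean half-year output; firms above the mean
    of these rates are "high", the rest (including ties) are "low".
    """
    rates = {firm: mean(counts) for firm, counts in outputs.items()}
    cutoff = mean(list(rates.values()))
    return {firm: "high" if rate > cutoff else "low" for firm, rate in rates.items()}


print(capability_groups({"A": [4, 6], "B": [1, 1], "C": [2, 4]}))
# {'A': 'high', 'B': 'low', 'C': 'low'}
```

Averaging firm-level rates gives each firm equal weight; a pooled mean over all firm-periods would instead weight firms by how many periods they are observed, which matters in an unbalanced panel.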

Figure 7 shows the corresponding regression results. The black dots represent the high-capability group and the blue diamonds the low-capability group. The two groups differ significantly: four years after treatment, the positive effect of substantial data inputs on product innovation is significantly larger in the high innovation capability group than in the low innovation capability group. This suggests that firms accumulate valuable experience from using data for software development and that this accumulated experience significantly enhances their product innovation performance.

6.2. Quality Upgrading Effect

Data points help firms accelerate the iterative upgrading of their products, thus continuously improving the quality of their product innovation. When more data points are used to train various algorithms, firms can continuously update existing algorithms based on these data, enhancing the quality of the algorithms. Consequently, the data may have a quality-upgrading effect on innovation, aiding in the continuous improvement of innovation quality.

To capture this mechanism, we counted the software upgrade versions released by each firm over the sample period and treated them as iterative upgrades within product innovation. Software updates and upgrades are usually released after the initial (major) version and carry a minor version number; they refine modules of poorer quality in the previous version. If substantial data inputs help firms upgrade the quality of their innovation, we expected to see positive average treatment effects on these minor-version software outputs.
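Counting minor versions requires a rule for reading version strings. The sketch below assumes versions follow a `major.minor[.patch]` pattern such as “V1.0” or “V2.1” and treats any release with a non-zero minor or patch number as a minor-version upgrade; the pattern and the function names are illustrative assumptions, not the authors’ coding rules.

```python
import re

# Matches "major.minor" with an optional ".patch", e.g. "V1.0", "v2.1", "V3.0.2".
VERSION_PATTERN = re.compile(r"(\d+)\.(\d+)(?:\.(\d+))?")


def is_minor_version(version: str) -> bool:
    """True for update releases (non-zero minor or patch number), False for x.0 major releases."""
    match = VERSION_PATTERN.search(version)
    if match is None:
        return False  # unparseable version strings are not counted
    minor = int(match.group(2))
    patch = int(match.group(3) or 0)
    return minor > 0 or patch > 0


def count_minor_versions(versions) -> int:
    """Number of minor-version releases in a firm's list of software versions."""
    return sum(is_minor_version(v) for v in versions)


print(count_minor_versions(["V1.0", "V1.1", "V2.0", "V2.0.3", "Beta"]))  # 2
```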

We then re-ran the regression with minor-version software output as the dependent variable, keeping all other settings from the baseline model. Figure 8 shows the results. The average treatment effect is significantly positive, indicating that substantial data inputs significantly increase firms’ minor-version software output two and a half years after treatment. This supports the view that substantial data inputs help firms continuously improve the quality of their product innovation through iterative upgrades. Taken together, the analysis so far supports Hypothesis 2.

6.3. Data Diffusion Effects

The value of data inputs for firms’ product innovation may have a diffusion effect across different types of software outputs. While data inputs are initially intended for training software products needed by the government sector, these data can also generate value in the production of software for other uses. In other words, data inputs from the government sector may help firms generate product innovations not only for the government sector but also for other applications.

To verify this diffusion mechanism, we divided each high-tech firm’s software output into public-use products and other-use products according to intended use. Public-use products are software explicitly designed for the government. Figure 9 presents the results by product use. The black dots represent software used for other purposes, and the blue diamonds represent software used exclusively in government departments. We found significant spillover effects: three years after treatment, substantial data inputs authorized by the public sector help firms generate not only product innovations for the public sector but also more product innovations for other uses. This provides further evidence of the value of data inputs for product innovation and supports Hypothesis 3.

7. Discussion and Conclusions

This paper examines how data circulate to the private sector through authorized sharing, focusing on government data sharing in China. We provide empirical evidence for a large and emerging body of theoretical literature, illustrating not only how data generate value but also how firms manage and apply data. Drawing on China’s unique institutional context, our study first shows that data from social systems can enter the private sector through authorized sharing. This helps firms complete the crucial stage of data value development, namely the innovation chain. Data-driven business models emerge from analyzing valuable information, and our study explains how firms use data to achieve this. These findings are relevant not only for firms seeking to use data for innovation but also for those pursuing data-driven business model innovation. Because most related research is conceptual, our empirical evidence is particularly valuable.

First, this paper provides important insights into the systematic management and application of data in business. The rapid development of Industry 4.0 technologies has sparked new interest in the study of business models in the digital age [67]. We contribute to this literature from the perspective of the commercial exploitation and use of data. Our findings are likely to generalize because we systematically observe a large number of firms applying data for innovation, covering essentially all firms that use government data for innovation. In addition, this paper treats data as the raw material for core Industry 4.0 technologies such as big data and artificial intelligence, so our findings may inform business model innovation across different applications of these technologies. We encourage firms to fully exploit data, accumulate unique intellectual capital from it, and use it for innovation activities and business model renewal. Our empirical study thus also supports the many studies recommending that firms cultivate data analytics capabilities [68].

Second, this paper identifies another channel through which firms can access and use data: government data. This channel has received very little attention in the existing literature because it involves cross-sectoral data flows. Most studies focus on the application of data within either the public or the private sector, and few examine how government data affect business innovation. We address this gap by examining how government data circulate across sectoral boundaries in China. Our study has important managerial implications for both policymakers and firms.

We suggest that policymakers assume the role of data providers. By encrypting government data, the government can reduce risks such as privacy breaches [58]. Meanwhile, firms should actively access and use government data, which gives them the opportunity to serve the public sector and broaden their business areas. Moreover, the knowledge that firms acquire from government data can be shared across applications, which increases their commercial innovation output and improves their innovation efficiency. Firms should therefore integrate these strategically important data assets as fully as possible and explore the potential value of the data across different application scenarios.

Third, this paper is relevant to government data sharing in both developing and developed countries. Many countries around the world are rapidly implementing open government data (OGD) programs, and in recent years developed countries have introduced numerous policies to facilitate the flow of data. Although government data were originally opened to satisfy the public’s right to know, research on exploiting their economic value is gaining attention [69]. Our study emphasizes the importance of using government data to create value, particularly for business innovation. Our China-based evidence suggests that developing countries can make full use of government data to raise the level of innovation in society and support business innovation, thereby driving economic growth. Developed countries can likewise learn from this approach and promote the open sharing of government data with the private sector.

However, researchers should also consider the possible adverse effects of the data value exploitation process. We found that data do not significantly increase the innovation output of large firms, possibly because holding large amounts of data lowers the quality of decision-making [37]. It could also reflect a lower willingness of large firms to innovate. We therefore suggest that policymakers open up more government data to SMEs, which have a greater need for data than large firms.

There is still ample room for future research on this broad theme. One limitation is that government data disclosure rules make it challenging to directly assess the total number of data points. Future studies could attempt to obtain data volume metrics directly, providing a more precise measure of the data within China’s social system. Additionally, our measurement of the knowledge accumulation sector is not exhaustive. We primarily focused on companies engaged in digital technology innovations, operating under the assumption that these companies, which intensively use data, are highly representative and constitute the main knowledge accumulation sector in society. However, other firms may also utilize data for product innovation, and future research should consider including these firms to provide a more comprehensive analysis.

Author Contributions

Conceptualization, methodology, software, formal analysis, investigation, X.G.; validation, X.G. and H.F.; resources, H.F.; data curation, H.F.; writing—original draft preparation, X.G.; writing—review and editing, X.G.; visualization, X.G.; supervision, H.F.; project administration, H.F.; funding acquisition, H.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China, “Research on Enhancing the Overall Effectiveness of the National Innovation System by Four-Chain Integration”, grant number: 23AJY001.

Data Availability Statement

Data will be available on request.

Acknowledgments

The authors would like to thank the people who provided help and suggestions for improving this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, M.; Sinha, A.; Hu, K.; Shah, M.I. Impact of Technological Innovation on Energy Efficiency in Industry 4.0 Era: Moderation of Shadow Economy in Sustainable Development. Technol. Forecast. Soc. Chang. 2021, 164, 120521.
Veile, J.W.; Schmidt, M.C.; Voigt, K.I. Toward a New Era of Cooperation: How Industrial Digital Platforms Transform Business Models in Industry 4.0. J. Bus. Res. 2022, 143, 387–405.
Wang, S.; Wan, J.; Zhang, D.; Li, D.; Zhang, C. Towards Smart Factory for Industry 4.0: A Self-Organized Multi-Agent System with Big Data Based Feedback and Coordination. Comput. Netw. 2016, 101, 158–168.
Baden-Fuller, C.; Haefliger, S. Business Models and Technological Innovation. Long Range Plann. 2013, 46, 419–426.
Schaefer, D.; Walker, J.; Flynn, J. A Data-Driven Business Model Framework for Value Capture in Industry 4.0. In Advances in Manufacturing Technology; IOS Press: Amsterdam, The Netherlands, 2017; Volume XXXI, pp. 245–250.
Foss, N.J.; Saebi, T. Fifteen Years of Research on Business Model Innovation: How Far Have We Come, and Where Should We Go? J. Manag. 2017, 43, 200–227.
Shet, S.V.; Pereira, V. Proposed Managerial Competencies for Industry 4.0—Implications for Social Sustainability. Technol. Forecast. Soc. Chang. 2021, 173, 121080.
Agostini, L.; Nosella, A. Industry 4.0 and Business Models: A Bibliometric Literature Review. Bus. Process Manag. J. 2021, 27, 1633–1655.
Farboodi, M.; Veldkamp, L. Long-run growth of financial data technology. Am. Econ. Rev. 2020, 110, 2485–2523.
Kraus, S.; Durst, S.; Ferreira, J.J.; Veiga, P.; Kailer, N.; Weinmann, A. Digital Transformation in Business and Management Research: An Overview of the Current Status Quo. Int. J. Inf. Manag. 2022, 63, 102466.
Arrieta-Ibarra, I.; Goff, L.; Jiménez-Hernández, D.; Lanier, J.; Weyl, E.G. Should we treat data as labor? Moving beyond “free”. In AEA Papers and Proceedings; American Economic Association: Nashville, TN, USA, 2018; Volume 108, pp. 38–42.
Cong, L.; Li, B.W.; Zhang, Q.T. Alternative data in fintech and business intelligence. In The Palgrave Handbook of FinTech and Blockchain; Palgrave Macmillan: Cham, Switzerland, 2021; pp. 217–242.
Acquisti, A.; Taylor, C.; Wagman, L. The economics of privacy. J. Econ. Lit. 2016, 54, 442–492.
Veldkamp, L.; Chung, C. Data and the aggregate economy. J. Econ. Lit. 2024, 62, 458–484.
Schaefer, M.; Sapi, G. Learning from data and network effects: The example of internet search. SSRN Electron. J. 2020. DIW Berlin Discussion Paper No. 1894. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3688819 (accessed on 3 May 2022).
Jones, C.I.; Tonetti, C. Nonrivalry and the Economics of Data. Am. Econ. Rev. 2020, 110, 2819–2858.
Agrawal, A.; McHale, J.; Oettl, A. Finding needles in haystacks: Artificial intelligence and recombinant growth. In The Economics of Artificial Intelligence: An Agenda; University of Chicago Press: Chicago, IL, USA, 2018; pp. 149–174.
Acciarini, C.; Cappa, F.; Boccardelli, P.; Oriani, R. How can organizations leverage big data to innovate their business models? A systematic literature review. Technovation 2023, 123, 102713.
Farboodi, M.; Veldkamp, L. A Model of the Data Economy (No. w28427); National Bureau of Economic Research: Cambridge, MA, USA, 2021.
Garicano, L.; Rossi-Hansberg, E. Organizing Growth. J. Econ. Theory 2012, 147, 623–656.
Kitchens, B.; Dobolyi, D.; Li, J.; Abbasi, A. Advanced Customer Analytics: Strategic Value through Integration of Relationship-Oriented Big Data. J. Manag. Inf. Syst. 2018, 35, 540–574.
Veldkamp, L.L. Slow boom, sudden crash. J. Econ. Theory 2005, 124, 230–257.
Ianni, M.; Masciari, E.; Sperlí, G. A Survey of Big Data Dimensions vs Social Networks Analysis. J. Intell. Inf. Syst. 2021, 57, 73–100.
Etzion, D.; Aragon-Correa, J.A. Big Data, Management, and Sustainability: Strategic Opportunities Ahead. Organ. Environ. 2016, 29, 147–155.
Lamba, K.; Singh, S.P. Dynamic Supplier Selection and Lot-Sizing Problem Considering Carbon Emissions in a Big Data Environment. Technol. Forecast. Soc. Chang. 2019, 144, 573–584.
Begenau, J.; Farboodi, M.; Veldkamp, L. Big Data in Finance and the Growth of Large Firms. J. Monet. Econ. 2018, 97, 71–87.
Sorescu, A. Data-Driven Business Model Innovation: Business Model Innovation. J. Prod. Innov. Manag. 2017, 34, 691–696.
Xie, D.; Zhang, L. A Generalized Model of Growth in the Data Economy. SSRN 2022. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4033576 (accessed on 13 February 2022).
Hou, Y.; Huang, J.; Xie, D.; Zhou, W. The Limits to Growth in the Data Economy: Storage Constraint and “Data Productivity Paradox”. SSRN 2022. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4099544 (accessed on 3 May 2022).
Jovanovic, B.; Nyarko, Y. Learning by Doing and the Choice of Technology; National Bureau of Economic Research: Cambridge, MA, USA, 1994.
Atkeson, A.; Kehoe, P.J. Modeling and measuring organization capital. J. Polit. Econ. 2005, 113, 1026–1053.
Oberfield, E.; Venkateswaran, V. Expertise and firm dynamics. In 2018 Meeting Papers; No. 1132; Society for Economic Dynamics: New York, NY, USA, 2018.
Cong, L.W.; Xie, D.; Zhang, L. Knowledge accumulation, privacy, and growth in a data economy. Manag. Sci. 2021, 67, 6480–6492.
Schaefer, M.; Sapi, G. Complementarities in learning from data: Insights from general search. Inf. Econ. Policy 2023, 65, 101063.
Agrawal, A.; Gans, J.; Goldfarb, A. Prediction, judgment, and complexity: A theory of decision-making and artificial intelligence. In The Economics of Artificial Intelligence: An Agenda; University of Chicago Press: Chicago, IL, USA, 2018; pp. 89–110.
Bajari, P.; Chernozhukov, V.; Hortaçsu, A.; Suzuki, J. The impact of big data on firm performance: An empirical investigation. In AEA Papers and Proceedings; American Economic Association: Nashville, TN, USA, 2019; Volume 109, pp. 33–37.
Ghasemaghaei, M.; Turel, O. Possible Negative Effects of Big Data on Decision Quality in Firms: The Role of Knowledge Hiding Behaviours. Inf. Syst. J. 2020, 31, 268–293.
Arthur, K.N.A.; Owen, R. A Micro-Ethnographic Study of Big Data-Based Innovation in the Financial Services Sector: Governance, Ethics and Organizational Practices. J. Bus. Ethics 2019, 160, 363–375.
Nguyen Dang Tuan, M.; Nguyen Thanh, N.; Le Tuan, L. Applying a Mindfulness-Based Reliability Strategy to the Internet of Things in Healthcare—A Business Model in the Vietnamese Market. Technol. Forecast. Soc. Chang. 2019, 140, 54–68.
Mariani, M.M.; Fosso Wamba, S. Exploring How Consumer Goods Companies Innovate in the Digital Age: The Role of Big Data Analytics Companies. J. Bus. Res. 2020, 121, 338–352.
Dubey, R.; Gunasekaran, A.; Childe, S.J.; Bryde, D.J.; Giannakis, M.; Foropon, C.; Roubaud, D.; Hazen, B.T. Big Data Analytics and Artificial Intelligence Pathway to Operational Performance under the Effects of Entrepreneurial Orientation and Environmental Dynamism: A Study of Manufacturing Organisations. Int. J. Prod. Econ. 2020, 226, 107599.
Mikalef, P.; Boura, M.; Lekakos, G.; Krogstie, J. The Role of Information Governance in Big Data Analytics Driven Innovation. Inf. Manag. 2020, 57, 103361.
Janssen, M.; Charalabidis, Y.; Zuiderwijk, A. Benefits, adoption barriers and myths of open data and open government. Inf. Syst. Manag. 2012, 29, 258–268.
Moon, M.J. Shifting from old open government to new open government: Four critical dimensions and case illustrations. Public Perform. Manag. Rev. 2020, 43, 535–559.
Zuiderwijk, A.; Janssen, M.; Choenni, S.; Meijer, R.; Alibaks, R.S. Socio-technical impediments of open data. Electron. J. e-Gov. 2012, 10, 156–172.
Zhao, Y.; Fan, B. Effect of an agency’s resources on the implementation of open government data. Inf. Manag. 2021, 58, 103465.
Hopp, W.J.; Li, J.; Wang, G. Big Data and the Precision Medicine Revolution. Prod. Oper. Manag. 2018, 27, 1647–1664.
Chen, P.-T. Medical Big Data Applications: Intertwined Effects and Effective Resource Allocation Strategies Identified through IRA-NRM Analysis. Technol. Forecast. Soc. Chang. 2018, 130, 150–164.
Heimstädt, M.; Saunderson, F.; Heath, T. From toddler to teen: Growth of an open data ecosystem. JeDEM-eJournal eDemocracy Open Gov. 2014, 6, 123–135.
Jetzek, T.; Avital, M.; Bjørn-Andersen, N. The sustainable value of open government data. J. Assoc. Inf. Syst. 2019, 20, 702–734.
Wang, H.J.; Lo, J. Adoption of open government data among government agencies. Gov. Inf. Q. 2016, 33, 80–88.
Fang, J.; Zhao, L.; Li, S. Exploring open government data ecosystems across data, information, and business. Gov. Inf. Q. 2024, 41, 101934.
Yang, T.M.; Lo, J.; Shiang, J. To open or not to open? Determinants of open government data. J. Inf. Sci. 2015, 41, 596–612.
Liu, Z.G.; Li, X.Y.; Zhu, X.H. Scenario Modeling for Government Big Data Governance Decision-Making: Chinese Experience with Public Safety Services. Inf. Manag. 2022, 59, 103622.
Safarov, I.; Meijer, A.; Grimmelikhuijsen, S. Utilization of open government data: A systematic literature review of types, conditions, effects and users. Inf. Polity 2017, 22, 1–24.
Jetzek, T.; Avital, M.; Bjørn-Andersen, N. Data-driven innovation through open government data. J. Theor. Appl. Electron. Commer. Res. 2014, 9, 100–120.
Magalhaes, G.; Roseira, C. Open Government Data and the Private Sector: An Empirical View on Business Models and Value Creation. Gov. Inf. Q. 2020, 37, 101248.
Trabucchi, D.; Buganza, T.; Pellizzoni, E. Give Away Your Digital Services: Leveraging Big Data to Capture Value. Res. Technol. Manag. 2017, 60, 43–52.
Minatogawa, V.L.F.; Franco, M.M.V.; Rampasso, I.S.; Anholon, R.; Quadros, R.; Durán, O.; Batocchio, A. Operationalizing Business Model Innovation through Big Data Analytics for Sustainable Organizations. Sustainability 2019, 12, 277.
Akter, S.; Wamba, S.F.; Gunasekaran, A.; Dubey, R.; Childe, S.J. How to improve firm performance using big data analytics capability and business strategy alignment? Int. J. Prod. Econ. 2016, 182, 113–131.
Erevelles, S.; Fukawa, N.; Swayne, L. Big Data consumer analytics and the transformation of marketing. J. Bus. Res. 2016, 69, 897–904.
Wu, L.; Hitt, L.; Lou, B. Data analytics, innovation, and firm productivity. Manag. Sci. 2020, 66, 2017–2039.
Story, V.; O’Malley, L.; Hart, S. Roles, role performance, and radical innovation competencies. Ind. Mark. Manag. 2011, 40, 952–966.
Callaway, B.; Sant’Anna, P.H. Difference-in-differences with multiple time periods. J. Econom. 2021, 225, 200–230.
Beraja, M.; Yang, D.Y.; Yuchtman, N. Data-intensive innovation and the state: Evidence from AI firms in China. Rev. Econ. Stud. 2023, 90, 1701–1723.
Sun, L.; Abraham, S. Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. J. Econom. 2021, 225, 175–199.
Frank, A.G.; Dalenogare, L.S.; Ayala, N.F. Industry 4.0 technologies: Implementation patterns in manufacturing companies. Int. J. Prod. Econ. 2019, 210, 15–26.
Jha, A.K.; Agi, M.A.N.; Ngai, E.W.T. A note on big data analytics capability development in supply chain. Decis. Support Syst. 2020, 138, 113382.
Jetzek, T.; Avital, M.; Bjørn-Andersen, N. The Value of Open Government Data: A Strategic Analysis Framework. In Proceedings of the 2012 Pre-ICIS Workshop: Open Data and Open Innovation in eGovernment, Orlando, FL, USA, 15 December 2012.

Figure 1. Government data-driven knowledge-building processes.

Figure 2. Changes in firm software output over the period 1996–2023.

Figure 3. Baseline regression.

Figure 4. Robustness checks.

Figure 5. Heterogeneity analysis of regional data quality.

Figure 6. Heterogeneity analysis of the scarcity of data.

Figure 7. Learning from data.

Figure 8. Quality upgrading effect.

Figure 9. Data diffusion effects.

Table 1. Summary statistics.

Variables	Substantial Data (1)	Little Data (2)
Total product innovation	15.4 (37.2)	10.6 (44.8)
Minor version	5.0 (16.2)	2.9 (21.0)
$Employ$	371.5 (1676.5)	148.7 (931.0)
$Capital_{reg}$	56.7 (869.8)	9.5 (58.0)
Duration of the company	38.4 (13.1)	31.4 (13.0)
Operation capital	55.6 (893.5)	8.5 (57.2)
N	14,884	8316

Note: Table 1 compares economic indicators for firms with access to substantial data inputs and firms with little data input; standard deviations are reported in parentheses.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
