#### *2.1. Digital Transformation*

In recent years, digital transformation has gradually become a core strategic direction of global technological change. Governments in many countries have successively introduced digital strategies to guide the development of digital technology and promote digital transformation [27]. For example, in August 2015, Singapore released the "Smart Country 2025 Plan", which focuses on using artificial intelligence (AI) and data science, immersive media, the Internet of Things (IoT) and cybersecurity technologies to improve social productivity, increase employment opportunities in highly skilled jobs, accommodate an aging population and cultivate social cohesion. In May 2016, in order to build a stronger and safer digital Denmark, the central, regional and local governments of Denmark jointly formulated and promulgated a national digital strategy called "Digital Strategy 2016–2020". This strategy laid out a blueprint for the digital transformation of government departments, enterprises and individuals. Australia's "Digital Transformation Strategy 2025" provides a specific roadmap for improving the way digital services are supplied to individuals and enterprises, and describes in detail the current preparations and future acceleration plans for realizing the strategy. At the Fourth Plenary Session of the 19th CPC Central Committee, China identified that it was necessary to "establish and improve the use of the Internet, big data, artificial intelligence and other technical means for digital transformation, promote the construction of digital government, and strengthen the orderly sharing of data" [27].

However, the question of "what is digital transformation" remains controversial among researchers and well-known organizations across the world. Some scholars define digital transformation at the technical level. Westerman et al. defined digital transformation as using technology to fundamentally improve the performance or influence of enterprises [28,29], whereas Fitzgerald et al. regard it as the use of new digital technologies (such as social media, mobile, analytics or embedded devices) to achieve significant business improvements (such as enhancing customer experience, optimizing operations or creating new business models) [30,31]. Other scholars define digital transformation at the organizational level. Demirkan et al. hold that digital transformation represents a profound and accelerated transformation of business activities, processes, capabilities and modes, which makes full use of the changes and opportunities brought by digital technology and its impact on the whole of society from a strategic priority perspective [32]. Haffke defines digital transformation as including the digitalization of sales and communication channels, which provides a new platform and a new way of interacting with customers, as well as the digitalization of company offerings (namely products and services), which replaces or supplements physical products. In this view, digital transformation also describes how tactical and strategic business moves are triggered through data-driven insights and the introduction of digital business models, so as to realize new ways of value capture [33]. On the basis of previous studies, Vial summarized the definitions of digital transformation and holds that it refers to the process of triggering significant changes in the attributes of an entity through the combination of information, computing, communication and connectivity technologies, in order to improve that entity [34]. This definition includes four attributes of digital transformation, namely target entity, means, scope and degree of change, and expected outcome.

#### *2.2. Transformation of the Construction Industry*

In the 21st century, the fourth industrial revolution has brought great technological and scientific progress, which embraces the use of computers and cyber-physical systems. The construction industry has also benefited from this progress, giving rise to the concept of the digital transformation of the construction industry, which has attracted much attention in the past few years [35]. For instance, Sawhney et al. defined this phenomenon as a "transformative framework" in which three kinds of changes have taken place, viz. industrial production and construction, cyber-physical systems, and digital technology [36]. Examples of digital technologies include building information modeling (BIM), the common data environment (CDE), unmanned aerial vehicle (UAV) systems, cloud-based project management, augmented reality/virtual reality (AR/VR), artificial intelligence (AI), cybersecurity, big data and analytics, blockchain and laser scanners. Cyber-physical systems cover robots and automation, sensors, the Internet of Things, workers with wearable sensors, actuators, additive manufacturing, off-site and on-site construction, and equipment with integrated sensors and embedded systems. However, the digital transformation of the construction industry does not refer only to the application of technology in the field of construction. Maskuriy et al. suggest that it also includes the whole process from construction resettlement conditions to design and investment preparation, as well as the construction process itself, the operation and maintenance of buildings, and the re-enactment of government construction legislation, which includes the standardization of new processes. From the project management perspective, digital transformation projects require project and budget preparation, construction approval, construction management and the specification of fully electronic construction. Furthermore, the principles of public construction contracts should be applied by law to ensure the necessary delivery of the project management process [37].

#### **3. Research Design**

In this empirical study, the LDA-DEMATEL-ANP model is used to discover research themes and analyze the key factors for the digital transformation of the construction industry (as shown in Figure 1). Latent Dirichlet Allocation (LDA) is a hierarchical Bayesian probabilistic graphical model, which has become one of the mainstream topic models and is widely used in computing research such as text mining. As an unsupervised machine learning method, LDA can mine latent topic information in texts and help researchers find latent topics in large-scale text collections [38]. Compared with traditional keyword-based statistical analysis, the LDA topic model does not rely on clustering single co-occurring word pairs. Instead, it generates a series of topic-related terms probabilistically, mines the semantic information of each topic in depth, and quantifies both the intensity of each topic and the relationships between topics. This approach can judge the development trend of a subject field more accurately. In this study, we first extracted a series of topics from the literature records of digitalization research in the construction industry, and then identified 12 research topics as the key factors for digital transformation in the construction industry according to topic intensity and research needs.

The decision-making trial and evaluation laboratory (DEMATEL) is a systematic analysis method that combines graph theory with matrix tools. From the collected data, it expresses the logical relationships among key influencing factors in matrix form and calculates the influence degree and affected degree of each factor, which serve as the theoretical basis for constructing a causal relationship model among the factors [39]. In this study, we distributed questionnaires to ten experts to determine the interactions among the twelve topics.
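As a rough illustration of the calculation described above, the following Python sketch normalizes a direct-influence matrix $A$ into $N$, derives the total-influence matrix as $T = N(I - N)^{-1}$, and sums the rows and columns of $T$ to obtain the influence degree and affected degree of each factor. The 3 × 3 matrix, the 0 to 3 scoring range and all function names are assumptions made for illustration and are not the survey data of this study; normalizing by the largest row or column sum is one common convention, not necessarily the one used here.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def dematel(direct):
    """Return (total-influence matrix, influence degrees, affected degrees)."""
    n = len(direct)
    row_sums = [sum(row) for row in direct]
    col_sums = [sum(direct[i][j] for i in range(n)) for j in range(n)]
    scale = max(max(row_sums), max(col_sums))           # assumed normalization
    norm = [[v / scale for v in row] for row in direct]
    i_minus_n = [[(1.0 if i == j else 0.0) - norm[i][j] for j in range(n)]
                 for i in range(n)]
    total = matmul(norm, inverse(i_minus_n))            # T = N (I - N)^-1
    influence = [sum(row) for row in total]             # row sums of T
    affected = [sum(total[i][j] for i in range(n)) for j in range(n)]  # column sums
    return total, influence, affected


# Hypothetical expert scores: entry [i][j] is how strongly factor i affects factor j.
direct = [[0, 3, 2],
          [1, 0, 1],
          [2, 1, 0]]
total, influence, affected = dematel(direct)
for i, (d, r) in enumerate(zip(influence, affected)):
    print(f"factor {i}: influence degree = {d:.3f}, affected degree = {r:.3f}")
```

A factor whose influence degree exceeds its affected degree acts mainly as a cause in the resulting model, while the opposite case marks it mainly as an effect.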

The Analytic Network Process (ANP) is a decision-making method proposed by T.L. Saaty of the University of Pittsburgh in 1996 that accommodates hierarchical structures whose elements are not independent of one another. It is a practical decision-making method developed on the basis of the analytic hierarchy process (AHP) [40]. In this study, the ANP network hierarchy is established from the influence relationships among the twelve topics, as quantified by the DEMATEL method, and an index system of key influencing factors is constructed. This index system is then used to evaluate the digital transformation of the construction industry quantitatively. This technical route offers a degree of innovation in discovering the key influencing factors for the digital transformation of the construction industry and in constructing an index system of such factors.

#### *3.1. Data Sources*

The Web of Science and CNKI are the data sources of the literature. In CNKI, the literature type is set to periodical, and a professional search is run with the themes "construction industry" and "digitalization". In the Web of Science, the literature type is set to articles, conference papers and review papers, and a basic search is run with the themes "construction" and "digital". All of the retrieved literature records are then downloaded and exported in batch in Excel format, and repeated, irrelevant and incomplete records are sifted out, which finally yields a total of 50 publications (as shown in Table 1).

**Figure 1.** The research framework.


**Table 1.** The representative literature data.

#### *3.2. Data Processing*

In this step, the titles, keywords, abstracts and body text are extracted from the literature records to form the corpus source of the LDA model. Part-of-speech tagging and lemmatization are carried out on the corpus source file with Jieba in Python, and the resulting data are preprocessed by word segmentation and stop-word removal to obtain the text corpus. Next, the terms are extracted to obtain the document-word matrix.
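The preprocessing described above can be sketched roughly as follows. This is a minimal, dependency-free illustration: a regular-expression tokenizer stands in for Jieba, and the stop-word list, the two sample documents and the function names are all assumptions made for the example rather than the study's actual corpus or code.

```python
import re
from collections import Counter

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"the", "of", "and", "in", "for", "a", "is", "to", "on"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def document_word_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][v] counts word v in document d."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    index = {word: i for i, word in enumerate(vocabulary)}
    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for word, count in Counter(tokens).items():
            row[index[word]] = count
        matrix.append(row)
    return vocabulary, matrix


# Hypothetical two-document corpus standing in for titles and abstracts.
documents = [
    "Digital transformation of the construction industry",
    "BIM adoption in construction and digital management",
]
vocabulary, matrix = document_word_matrix(documents)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of the resulting matrix corresponds to one document and each column to one vocabulary term, which is the input shape a topic model expects.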

#### *3.3. LDA Thematic Model Training*

The LDA model is built with the Sklearn package in Python. Before building the model, it is necessary to determine its optimal number of topics. In this study, the optimal number of topics is determined using the model perplexity, which is calculated as shown in Formula (1), where $M$ is the number of documents, $N_d$ is the number of words in document $d$, and $P(w_d)$ is the probability of the words $w_d$ in document $d$.

$$\text{perplexity} = \exp\left\{-\frac{\sum_{d=1}^{M} \log P(w_d)}{\sum_{d=1}^{M} N_d}\right\} \tag{1}$$

The degree of perplexity indicates the uncertainty (i.e., information entropy) of the topic to which a document belongs. Perplexity is a standard measure for evaluating LDA topic models [41]. When the downward trend of perplexity is no longer obvious, or at the inflection point, the corresponding k value is the optimal number of topics. In this study, the perplexity of the LDA model on the text collection is in its lowest region when the number of topics is 12, 13 or 14. Considering the theoretical soundness and interpretability of the topics, and to avoid over-classification, this study set the optimal number of topics to k = 12 (as shown in Figure 2).
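To make Formula (1) concrete, the short Python sketch below computes perplexity from per-document log-likelihoods and document lengths. The function name and the toy inputs are assumptions for illustration. The example uses a model that assigns every word the same probability over a vocabulary of eight words, in which case the perplexity should equal the vocabulary size.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Formula (1): exp(-sum_d log P(w_d) / sum_d N_d)."""
    if len(doc_log_likelihoods) != len(doc_lengths):
        raise ValueError("one log-likelihood and one length are needed per document")
    total_words = sum(doc_lengths)
    if total_words == 0:
        raise ValueError("the corpus contains no words")
    return math.exp(-sum(doc_log_likelihoods) / total_words)


# Toy check: a uniform model over 8 words, applied to three documents.
vocab_size = 8
doc_lengths = [5, 3, 10]
doc_log_likelihoods = [n * math.log(1.0 / vocab_size) for n in doc_lengths]
print(perplexity(doc_log_likelihoods, doc_lengths))
```

Lower values indicate that the model is less "surprised" by the documents, which is why the number of topics is chosen where the curve flattens or reaches its lowest region.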

For parameter setting, the Sklearn package in Python is used to infer the topic and word distributions. The number of iterations (max\_iter) is 2000, the algorithm for solving LDA (learning\_method) is "batch", the prior parameter α (doc\_topic\_prior) is 50/k, where k is the number of topics, the prior parameter β (topic\_word\_prior) is 0.01, and all other parameters use their default values. Finally, the 15 words with the highest probability under each topic are extracted and output in descending order of word frequency, and five high-probability words under each topic are selected to represent the meaning of the topic and serve as the core words for topic identification.
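The settings and the word-ranking step above might be expressed as in the following sketch. The dictionary keys are the keyword arguments of scikit-learn's `LatentDirichletAllocation`, with `n_components` standing for the number of topics; the helper `top_words`, the six-word vocabulary and the weights are hypothetical stand-ins, since the fitted topic-word matrix of this study is not reproduced here.

```python
n_topics = 12

# Settings as reported in the text; all other parameters keep their defaults.
lda_settings = {
    "n_components": n_topics,
    "max_iter": 2000,
    "learning_method": "batch",
    "doc_topic_prior": 50 / n_topics,   # alpha = 50/k
    "topic_word_prior": 0.01,           # beta
}


def top_words(topic_word_weights, vocabulary, n_top=15):
    """For each topic, return the n_top (word, weight) pairs with the largest weight."""
    ranked = []
    for weights in topic_word_weights:
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        ranked.append([(vocabulary[i], weights[i]) for i in order[:n_top]])
    return ranked


# Hypothetical topic-word weights for two topics over a six-word vocabulary.
vocabulary = ["bim", "cost", "data", "policy", "safety", "talent"]
topic_word_weights = [
    [0.30, 0.05, 0.25, 0.10, 0.20, 0.10],
    [0.05, 0.30, 0.10, 0.35, 0.05, 0.15],
]
for topic_id, words in enumerate(top_words(topic_word_weights, vocabulary, n_top=3)):
    print(topic_id, [word for word, _ in words])
```

With scikit-learn installed, the settings could be passed as `LatentDirichletAllocation(**lda_settings)`, and the `components_` attribute of the fitted model would play the role of `topic_word_weights`.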

**Figure 2.** The trend of perplexity under different topic numbers.

#### *3.4. Constructing Evaluation Indicators of Key Influencing Factors*

Step 1: Determine the index relationships based on the DEMATEL method. Using a four-level DEMATEL scale scored by experts, the indicators are compared pairwise to form an initial impact matrix $A = [a_{ij}]_{n \times n}$. The initial impact matrix is then normalized, and a comprehensive impact matrix is obtained from it by calculation.

Step 2: Establish a network hierarchy. The DEMATEL method determines the significant relationships between evaluation indexes through quantitative calculation, which provides a basis for establishing the network hierarchy. The network hierarchy of the ANP method includes two levels, namely the control layer and the network layer, and the network layer is composed of the corresponding evaluation indexes and the relationships between them.

Step 3: Generate the judgment matrix. Suppose there are $n$ indexes in the network layer, $A_1, A_2, \ldots, A_n$, and that $A_i$ contains the secondary indexes $A_{i1}, A_{i2}, \ldots, A_{ik}$. Taking the element $A_{jl}$ in $A_j$ as the criterion, the elements of the index $A_i$ are scored on the nine-level scale according to the intensity of their effect on $A_{jl}$, so as to obtain the judgment matrix.

Step 4: Calculate the weight vector matrix. The eigenvectors are normalized, and the consistency ratio is used to test whether the judgment matrix meets the consistency requirement. If the consistency ratio is greater than 0.1, the judgment matrix is considered to have failed the consistency test, and it is necessary to score again to obtain new data.

Step 5: Determine the weighted supermatrix. The normalized eigenvectors of each network layer under the control layer are combined to form a supermatrix $W$, and the weighted supermatrix $W_w$ is obtained by normalizing it.

Step 6: Calculate the limit supermatrix. Formula (2) is used to find the limit of the weighted supermatrix $W_w$. If the limit converges and is unique, the weight of each evaluation index can be read from it, which also shows that the index weights fully reflect the relationships among the indexes.

$$\lim_{k \to \infty} W_w^k \tag{2}$$
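Steps 4 to 6 can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the computation performed in this study: the judgment matrix and supermatrix are invented, power iteration is one of several ways to approximate the principal eigenvector, the random index values are Saaty's commonly tabulated ones for matrices of order 1 to 9, and the weighting in Step 5 is reduced here to plain column normalization.

```python
# Saaty's commonly tabulated random index values, keyed by matrix order.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def priority_vector(judgment, iterations=100):
    """Approximate the principal eigenvector (summing to 1) and its eigenvalue."""
    n = len(judgment)
    weights = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        product = [sum(judgment[i][j] * weights[j] for j in range(n)) for i in range(n)]
        lam = sum(product)                 # weights sum to 1, so this estimates lambda_max
        weights = [p / lam for p in product]
    return weights, lam


def consistency_ratio(judgment):
    """CR = CI / RI with CI = (lambda_max - n) / (n - 1)."""
    n = len(judgment)
    if n < 3:
        return 0.0                         # 1x1 and 2x2 matrices are always consistent
    _, lam = priority_vector(judgment)
    return ((lam - n) / (n - 1)) / RANDOM_INDEX[n]


def column_normalize(matrix):
    """Scale every column so that it sums to 1 (assumed weighting step)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def limit_supermatrix(weighted, tol=1e-12, max_steps=10000):
    """Raise the weighted supermatrix to successive powers until it stops changing."""
    current = weighted
    for _ in range(max_steps):
        nxt = matmul(current, weighted)
        change = max(abs(x - y) for r1, r2 in zip(nxt, current) for x, y in zip(r1, r2))
        if change < tol:
            return nxt
        current = nxt
    raise RuntimeError("the powers of the supermatrix did not converge")


# Hypothetical nine-level-scale judgments for three indexes.
judgment = [[1, 2, 4], [0.5, 1, 2], [0.25, 0.5, 1]]
weights, _ = priority_vector(judgment)
print("weights:", [round(w, 4) for w in weights])
print("passes consistency test:", consistency_ratio(judgment) <= 0.1)

# Hypothetical unweighted supermatrix; its columns are normalized before taking the limit.
supermatrix = [[0.5, 0.2, 0.6], [0.3, 0.6, 0.6], [0.2, 0.2, 0.8]]
limit = limit_supermatrix(column_normalize(supermatrix))
print("limit weights:", [round(row[0], 4) for row in limit])
```

When the limit exists and is unique, every column of the limit matrix holds the same vector, so reading any one column gives the final index weights.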

#### **4. Empirical Results and Analysis**

#### *4.1. Analysis of Word Frequency*

Word frequency is a common measure in text mining of how often a word is repeated in a corpus. The more often a word appears in the corpus, the more likely it is to be a focus of study. Python is used to segment the text data of this study, and a word cloud map is generated from the 100 most frequent words, with the minimum word length set to two (as shown in Figure 3).
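A minimal sketch of the counting step behind such a word cloud is given below. It assumes that the length filter refers to a minimum of two characters per word; the token list and the function name are invented for the example.

```python
from collections import Counter


def top_terms(tokens, min_length=2, top_n=100):
    """Count tokens of at least min_length characters and return the top_n most frequent."""
    counts = Counter(token for token in tokens if len(token) >= min_length)
    return counts.most_common(top_n)


# Hypothetical segmented text; single-character tokens are filtered out.
tokens = ["reform", "administration", "reform", "a", "capital",
          "talents", "reform", "capital", "x"]
for word, frequency in top_terms(tokens, top_n=3):
    print(word, frequency)
```

The resulting (word, frequency) pairs are the usual input to a word cloud generator, where the frequency determines the displayed size of each word.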

**Figure 3.** A word cloud map based on the top 100 word frequency.

According to the word size distribution in Figure 3, research on the digital transformation of the construction industry concentrates on the term "reform", followed by "administration", "capital", "talents" and "construction".

#### *4.2. Training Results of LDA Thematic Model*

After LDA model training and topic extraction, the optimal model with k = 12 topics was selected, and topic clustering and pyLDAvis visualization were carried out (as shown in Figure 4). The five highest-probability words under each topic are used as the characteristic words of that topic. An important step in exploring the path of digital transformation in the construction industry is to transform these surface-level characteristic words into the underlying influencing factors. Therefore, on the basis of the clustering results of the topics and the meanings of the high-probability words, relevant literature and experts were consulted to identify the topics manually, which yields the topic-term distribution (as shown in Table 2).

**Figure 4.** The pyLDAvis visualization of themes: key influencing factors for the digital construction industry.


