Utilizing Large Language Models to Illustrate Constraints for Construction Planning

He, Chuanni; Yu, Bei; Liu, Min; Guo, Lu; Tian, Li; Huang, Jianfeng

doi:10.3390/buildings14082511

Open AccessArticle

Utilizing Large Language Models to Illustrate Constraints for Construction Planning

by

Chuanni He

¹

,

Bei Yu

²

,

Min Liu

^1,*

,

Lu Guo

²

,

Li Tian

³ and

Jianfeng Huang

⁴

¹

Department of Civil and Environmental Engineering, Syracuse University, Syracuse, NY 13244, USA

²

School of Information Studies, Syracuse University, Syracuse, NY 13244, USA

³

School of Civil Engineering, Qingdao University of Technology, Qingdao 266033, China

⁴

Qingdao Xinhuayou Construction Group Company, No. 108 Zhuzhou Rd., Qingdao 266101, China

^*

Author to whom correspondence should be addressed.

Buildings 2024, 14(8), 2511; https://doi.org/10.3390/buildings14082511

Submission received: 8 July 2024 / Revised: 6 August 2024 / Accepted: 9 August 2024 / Published: 14 August 2024

(This article belongs to the Section Construction Management, and Computers & Digitization)

Download

Browse Figures

Versions Notes

Abstract

Effective construction project planning relies on addressing constraints related to materials, labor, equipment, and others. Planning meetings are typical venues for stakeholders to identify, communicate, and remove constraints. However, a critical gap exists in lacking an automated approach to identify, classify, analyze, and track constraint discussions during onsite planning meetings. Therefore, this research aims to 1. develop a natural language processing model to classify constraints in meeting discussions; 2. uncover the discussion patterns of managers and foremen regarding various constraints; and 3. extract the root causes for constraints, evaluate their impacts, and prepare managers to develop practical solutions for constraint removal. This research collected meeting transcripts from 94 onsite planning meetings of a building project, spanning 263,836 words. Next, this research leveraged a general pretrained transformer (GPT) to segment discussion dialogs into topics. A Bidirectional Encoder Representations from Transformers (BERT)-based model was developed to categorize constraint types for each topic. The constraint patterns among meeting attendees were assessed. Furthermore, a GPT-based tool was devised to track root causes, impacts, and solutions for various constraints. Test results revealed an 8.8% improvement in constraint classification accuracy compared with the traditional classification model. An occupational characteristic in constraint discussion was observed in that the management team tended to balance their focus on various constraints, while foremen concentrated on more practical issues. This research contributes to the body of knowledge by leveraging language models to analyze construction planning meetings. The findings facilitate project managers in establishing constraint logs for diagnosing and prognosticating planning issues.

Keywords:

constraint discussion; planning meeting; general pretrained transformer (GPT); bidirectional encoder representations from transformers (BERT) classification; natural language processing

1. Introduction

Statistics indicate that approximately one-third of construction megaprojects fail to meet cost or schedule targets due to inadequate constraint management [1]. Effective constraint management is crucial for aligning stakeholder goals, streamlining production workflows, and maintaining stable labor productivity [2]. However, traditional project planning methods, such as the critical path method, primarily focus on compressing tasks to adhere to schedules, often leading to imbalanced resource allocation and work disruptions [3]. Additionally, the dynamic nature of construction sites means that constraints are often interrelated, adding challenges to constraint management [4]. Recognizing this complexity, project managers invest significant time and effort in identifying, understanding, analyzing, and addressing constraints from various stakeholders [5].

Planning meetings on the job site are key platforms for project teams to discuss and negotiate constraints. These routine meetings are vital in communicating stakeholder perspectives, identifying challenges, and formulating strategies. The collaborative environment ensures that all parties are informed and reach a consensus on overcoming constraints [6]. Abundant research efforts have been made to facilitate an effective organization of the planning meetings [7,8,9]. However, existing research does not fill the knowledge gaps of adequately representing the dynamic interactions among constraints due to their qualitative analysis nature or limited quantitative performance indicators [4]. Therefore, addressing the fragmented nature of constraint discussions so that automated approaches can be developed to identify and analyze constraints from planning meetings is essential for improving planning efficiency.

Tackling this issue, recent studies have shown the effectiveness of large language models (LLMs) in predicting and extracting domain-specific knowledge [10,11,12]. As a subdivision of natural language processing (NLP), an LLM is an artificial intelligence system trained on vast text data to generate, understand, and interact with human language. The pretraining process on extensive text data enables LLMs to interpret complex and technical discussions. In addition, these models excel in processing large text volumes efficiently and provide context-sensitive analysis, enabling a comprehensive understanding of complex technical discussions without explicit reliance on predefined rules or principles.

Despite the great potential of implementing LLMs for text analysis, research efforts regarding applying them to analyze construction planning constraints are still nascent. Therefore, three research questions can be outlined. First, how to leverage LLMs to automatically identify constraint discussions? Second, are there specific constraint discussion patterns associated with different stakeholders? Third, how to efficiently extract critical information from constraint discussions to facilitate planning decision making?

Accordingly, the objectives of this research were to (1) develop a Bidirectional Encoder Representations from Transformers (BERT)-based model to classify constraints in meeting discussions; (2) identify the frequency and length of discussion patterns of managers and foremen on various types of constraints; and (3) create a general pretrained transformer (GPT)-based model to extract the root causes for constraints, evaluate the impact, and direct managers to develop effective solutions for constraint removal.

This research developed a language model-driven framework to identify, classify, and analyze constraints from planning meetings. First, meeting transcripts were collected from a construction case project. A prompt-based text segmentation model using GPT-3.5 was developed to categorize meeting dialogs into topics. Following this, a BERT-based model was devised to predict potential constraints for each topic. This research also assessed discussion sufficiency at meeting attendees and constraint levels to identify abnormal discussions. Finally, this research leveraged a GPT-4 model to establish a constraint information extraction tool to extract the root causes, potential impacts, and solutions for constraints from meeting transcripts. The experiment results indicate that the more matured GPT-4 model encompasses a deeper understanding of domain-specific knowledge to provide precise summarization. This research introduced a novel approach to identifying and classifying constraints from planning meetings. The method streamlines the planning process by anticipating upcoming issues, supporting continuous improvements, and informing policy decisions on constraint management strategies.

2. Background and Literature Review

This research is grounded in the constraint theory within construction production planning and control. This section elaborates on the aspects and definitions of constraints in planning meetings and explores the application of NLP in the construction industry. It also examines emerging LLM-based approaches, which are the foundations of the methodology.

2.1. Constraint Removal in Construction Meetings

Construction workflow, which is the sequential transfer of tasks among teams, plays a vital role in the overall performance of a construction project. A stable workflow indicates that tasks are implemented well as planned. Constraints are obstacles or restrictions that hinder the completion of these planned tasks. In traditional construction planning frameworks such as the critical path method (CPM), constraints were often identified as time, space, and labor, and were not associated with specific tasks [13,14]. With the evolution of lean construction theory, the well-defined last planner system (LPS) incorporates a look-ahead plan, which drives actions on specific activities to address constraints [15]. From the lean perspective, removing critical constraints reduces workflow variation and enhances productivity, which is key to project success [16,17]. Ref. [16] identified seven constraints that can be removed from a planning meeting to improve workflow. Work without fully removing constraints can lead to multiple conditions, including congestion, out-of-sequence work, unexpected starts or stops, inaccurate detailed plans, material obstruction, overtime, and oversizing crews. Ref. [18] introduced two more categories related to safety and unfamiliar working conditions, highlighting the national laws and challenges frequently encountered in practice.

Meetings are crucial for construction planning and scheduling since they provide a formal communication channel for project stakeholders [19]. In the labor-intensive construction industry, decision making is a multiphase complex problem [20]. Hence, planning meetings serve as a platform for stakeholders to align their objectives, discuss potential challenges, and devise strategies to remedy risks [21]. Ref. [22] highlighted the importance of regular meetings to monitor progress and make necessary adjustments. They argued that iterative meetings adaptively address unforeseen challenges, ensuring the project stays on track. In practice, a construction planning meeting is one of the most common venues for discussion and removing constraints [5]. The iterative nature of construction planning meetings ensures that constraints are continuously and sufficiently discussed, monitored, and addressed [23]. Abundant research efforts have been made to facilitate constraint removal via optimizing planning meeting efficiency. Ref. [24] spotted that a proper form and priority of planning meetings provides ample opportunity for increasing safety climates. Ref. [25] gauged the linkage between meetings and workflow reliability. The results found that meetings with other planning and control functions explain 54% of the variability in causes of failure. Ref. [5] calculated the uncertainty and impact per constraint discussion during planning meetings.

However, most existing research methods for constraint discussion require extensive manual work to convert dialogs to numeric measurements, which limits the application of existing optimization methods [5,26,27]. In addition, existing constraint optimization methods often utilize static performance measures such as the number of constraints removed or percentage of constraints removed, which is inflexible to reflect the dynamic changing workflow on the jobsite. Therefore, a research gap exists in the lack of a method to dissect discussion dialogs directly to identify constraint types, extract the root causes, and measure the impact.

This research aimed to resolve the gap by developing an NLP-based approach to classify constraints from meeting transcripts. Adopting the definitions from [16,18], this research investigated eight types of constraints. (1) External condition (EC): external environmental factors that affect construction, such as climate, geological conditions, traffic conditions, and more. (2) Equipment availability (EA): the availability and usability of various equipment required during construction, including construction machinery, tools, instruments, and more. (3) Labor availability (LA): the availability and useability of human resources required in construction, including workers, technicians, supervisors, and more. (4) Material availability (MA): the availability and usability of various materials required in construction, including cement, steel bars, bricks, sand, scaffolding, pipelines, prefabricated components, and more. (5) Prerequisite readiness (PR): the prerequisite activities of certain construction activities, including necessary preparatory activities, activities that occupy various resources in subsequent processes, and more. (6) Space availability (SA): whether the workspace of the construction site is sufficient to meet the construction needs, including the area, height, passages, and more. (7) Design and working method (DWM): the design plan and specific construction methods, process flows, and working technologies of construction. (8) Safety: safety issues during civil engineering construction, including personal safety of workers, safe use of equipment, safety management of construction sites, and more.

2.2. Natural Language Processing for Construction Documents

As a branch of artificial intelligence, NLP deals with the interaction between computers and humans through natural language. With the rise of digitalization in construction, effectively managing extensive text-based information is crucial [28,29,30]. NLP has been tested effective in automating text extraction and analysis [12,31]. Over the past years, NLP has been an active research area in construction for data collection, knowledge discovery, information extraction and retrieval, automated reasoning, opinion mining, and language generation [32].

Earlier applications of NLP in construction involved traditional methods like syntactic and semantic analysis [11] for tasks including concept relation extraction [33], automated compliance checks [34], and reasoning in regulatory documents [35]. A notable advancement in NLP came with the introduction of transformer models, which gained prominence due to the innovation of a self-attention mechanism [36]. The ability of transformers to understand dependencies between words, irrespective of their positional distance, made them highly effective for various NLP tasks. BERT became a notable example among the transformer-based models [37]. BERT’s breakthrough lies in its bidirectional training on text data, allowing for a deeper understanding than traditional unidirectional approaches. In addition, the flexibility of BERT in being fine-tuned for various tasks increases its versatility and popularity in engineering applications [32]. Previous research demonstrated the capability of the BERT model in addressing complex and construction domain-specific language processing tasks [10,12,38]. However, most applications focus on structured text data, such as legal documents or technical specifications. Research on applying BERT to colloquial or narrative contexts of meeting transcripts or dialogs in construction remains limited [32]. This gap is attributable to the inherent variability and fragmented structure of dialogs, which present unique challenges for NLP models to infer implicit information and segment topics.

Some models, such as BERT or XLNet, can be fine-tuned on smaller datasets for specific tasks [37]. Larger models can employ prompt-based few-shot learning to achieve similar results. Recently, one prominent LLM framework is the GPT series, including InstructGPT, ChatGPT, and GPT-4 [39,40]. This ability has shifted NLP research toward a prompt-based approach, where the model’s outputs are customized based on user-provided prompts [41]. Due to this feature, GPT advanced NLP models to obtain similar state-of-the-art (SOTA) performances in zero-shot or few-shot learning [42]. Research attempts have been made to apply GPT models in the construction domain. Ref. [43] proposed a GPT-2-based model to learn the implicit constraints of construction schedule relationships. Ref. [11] applied a GPT-3.5 model to develop a virtual assistant for BIM information query and extraction. Ref. [44] synthesized ChatGPT with augmented virtual reality to facilitate construction facility operations and maintenance training. The applications of LLMs show BERT’s enhanced text classification capability by capturing contextual details. Additionally, GPT models feature SOTA few-shot learning capabilities for generating contextually relevant text, facilitating information extraction from unstructured data. Given that primary objectives of construction constraint management are to identify, classify, and analyze various constraints from unstructured data sources like meeting minutes, there is a great potential to leverage LLMs to enhance the discovery and diagnosis of constraints from construction planning meetings. Nevertheless, research efforts in this direction are limited. Thus, further investigation is needed to explore how few-shot learning in GPTs can augment BERT’s capabilities in understanding construction domain-specific dialogs.

3. Research Method

This research proposed a framework to predict, analyze, and track constraint discussions from meeting transcripts. The framework comprises three interconnected modules, as depicted in Figure 1: a topic segmentation module, a constraint classification module, and a constraint analyzing and tracking module.

The topic segmentation module aims to segment or synthesize fragmented meeting dialogs into unified discussion topics, serving as the basis for constraint analysis. This research employed a prompt-based one-shot learning approach utilizing a GPT model. As shown in step A of Figure 1, it fed daily meeting transcripts and custom-designed prompts to GPT. Subsequently, step B instructed GPT to segment dialogs into distinct discussion topics according to their semantics. These topics constituted multiple dialogs and were output in a text file in step C for further analysis.

Following this, the constraint classification module categorized each meeting discussion topic into various constraints. The outputs from the first module were analyzed, and human effort was dedicated to annotating constraints, categorizing each discussion topic into one of nine classes: eight constraint types and a “no constraint” class. In step D, text cleaning was implemented to prepare for machine learning. The cleaned text data were then used to fine-tune a customized BERT classification model in step E. Through a training process, the optimal model was identified in step F. This model was then used to predict constraints on a test meeting topic dataset.

The constraint analyzing and tracking module focuses on analyzing the sufficiency of discussion for each constraint and extracting critical information pertinent to constraint discussions. After obtaining constraint prediction results from step G, the discussion sufficiency for specific constraint topics was evaluated by comparing them with the average discussion sufficiency among other topics. Step H identified topics with high variances as either under-discussed or over-discussed. Using a user-defined prompt shown in step I, these topics were then processed through a GPT-based chatbot, which extracts insights relevant to each constraint discussion, facilitating construction managers in their decision-making processes. In step J, a user-defined function is implemented to locate the discussion topics related to high discussion sufficiency variances. This research employs few-shot learning to develop a virtual assistant capable of extracting the causes, impacts, and potential remedies for abnormal topics. Finally, step K validates the framework, and 17 third-party construction professionals evaluated the performance.

3.1. Discussion Topic Segmentation

The initial step in analyzing the meeting transcripts involves segmenting the conversations by discussion topics. This research employed the GPT-3.5-turbo-16k model for topic segmentation due to three reasons. First, GPT-3.5′s extensive pretraining on diverse data offers a deep understanding of complex contextual relations. Second, the model exhibits remarkable generalization capabilities, efficiently adapting to new tasks through zero-shot or few-shot learning approaches without extensive training and testing datasets. Third, its context window of 16,385 tokens per request allows for processing a large volume of conversation per request, thereby minimizing batch sizes during learning.

The Chat Completion API was utilized to build data pipelines and execute topic segmentation tasks. Informed by existing studies on similar tasks [11,41,45], this research designed the prompts following a trial-and-error methodology. For this task, the model inputs were batches of meeting dialogs, each represented by a single string. The module groups the dialogs based on discussion topics through prompt-based one-shot learning. The segmented outputs were organized in a data frame. Each dialog was encapsulated in single quotation marks, and dialogs on the same topic were concatenated and prepared for constraint identification and classification. Additionally, this study conducted a pilot study to evaluate the model’s applicability by randomly selecting 20 discussion topics. The corpus covered by these topics was fed to the model for topic segmentation. Parallelly, two authors categorized these topics independently to obtain the ground truth labels. The results demonstrate that the GPT model correctly segmented 18 out of the 20 topics, indicating an accuracy of 90%. The pilot study revealed the model’s acceptable applicability and capability for the segmentation task.

3.2. Constraint Classification Using BERT

The second module developed a BERT classification model to predict the type of constraint per discussion topic. This module encompasses several steps: data preprocessing, model development, and model training and evaluation.

3.2.1. Data Preprocessing

This research was designed to analyze meeting transcripts recorded in the Chinese language. Given the unique characteristics of the Chinese language, the text preprocessing approach diverges significantly from that used for English. As a hieroglyphic language, Chinese characters are composed of radicals. For instance, as depicted in Figure 2, a single character combines a semantic radical “female” and a phonological radical “horse”, the latter sharing pronunciation with the character for “mother” with a different tone [46]. Consequently, processes like lowercasing, stemming, and lemmatization are inapplicable to Chinese text, as they could distort the meaning of words by separating their radicals.

Therefore, the data cleaning process includes two major steps: tokenization and stop word removal. Tokenization splits a sequence of text into smaller tokens, facilitating context understanding for machine learning models. This study employed the Jieba Python package for Chinese tokenization [47]. Additionally, the dialog form of the meeting transcripts implied that stop words were prevalent and could introduce significant noise. In NLP, stop words represent the “meaningless” words with little value in understanding the text meaning. The StopwordsISO Python package was utilized to remove these stop words after tokenization, as it offers an extensive stop word dictionary for multiple languages [48]. While some studies have included stop words in BERT modeling due to their potential utility in contextual understanding [38,49], this research removed them. This is because most constraint features are rooted in keywords rather than contextual information. In addition, parameter tuning comparisons revealed that text without stop words yielded better classification performance. The complete process of text tokenization and stop word removal is illustrated in Figure 3. Stop words were substituted by a single space during preprocessing.

3.2.2. Constraint Classification Model

This research fine-tuned a “bert-base-Chinese” model for classifying constraint types. The model contains 12 encoder block layers, 768 embedding dimensions, and 12 attention heads, with character-based tokenization and approximately 110 million parameters [50]. To adapt the BERT model for multi-class classification tasks, two multilayer perceptron (MLP) layers were integrated. Each block in the MLP comprises a linear dense layer, a batch normalization layer, a rectified linear unit (ReLU) activation function, and a dropout layer with a rate of 0.1. The application of the ReLU function is due to its computational efficiency, sparsity, and ability to capture non-linear patterns compared with other alternatives such as sigmoid or tanh [51]. The model used a cross-entropy function to measure training loss, and the AdamW optimizer was employed during training. The model inputs are cleaned texts for each discussion topic, and the model will output nine prediction probabilities per discussion topic indicating the types of constraint it is associated with.

After data preprocessing, the discussion topics and constraint labels were divided into training and test sets in an 80:20 ratio, applying a stratification technique to maintain balanced distributions. Given the inherent imbalance in constraint class distribution, a combination of oversampling and undersampling techniques was employed for the training set. For example, as shown in Figure 4a, the “no constraint” majority class of the original training set (Null) has over 800 samples, whereas the maximum sample count for constraint classes was 87. Therefore, the “no constraint” class was undersampled to 80, and other minority constraint classes were oversampled to the same number. Note that even though this research applied resampling approaches, the unique data samples in each group are still unbalanced. This may lead to potential biases of constraint classification models at the application stage. Detailed discussions regarding the model performance will be discussed in the experiment sections.

A grid search approach was utilized to tune the BERT classifier. The optimal model with the lowest validation loss was then identified for subsequent performance evaluation and constraint predictions.

3.2.3. Model Evaluation

This research applied the test accuracy and macro average precision, recall, and F1 score as performance indicators. These metrics were calculated according to Equations (1)–(4), respectively. Here, TP, FN, FP, and TN represent the numbers of true positives, false negatives, false positives, and true negatives, respectively. A confusion matrix was constructed to further dissect and understand the model’s performance across various constraint types, detailing the test results. An in-depth discussion and analysis of the model performance on the test set are presented in the Section 4.

A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N},

(1)

P r e c i s i o n = \frac{T P}{T P + F P},

(2)

R e c a l l = \frac{T P}{T P + F N},

(3)

F 1 = \frac{2 \times P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l},

(4)

3.3. Prompt-Based Constraint Analyzing and Tracking

The third module aims to provide insights to construction managers on efficiently removing constraints. This research proposed a discussion sufficiency metric to identify abnormal constraint discussions from various attendees. These discussion contents were then analyzed by a GPT-based tool to track their causes, impacts, and solutions.

3.3.1. Topic Discussion Sufficiency

Abundant numerical methods and models were proposed to measure the uncertainty, impact, and association among constraints [4,5,52]. This study developed a metric discussion offset (

O

) for individual constraints, meeting attendees, and constraint–attendee pairs. Defining the set of constraints as set

A

and the set of meeting attendees as set

B

, we have:

O_{i} = \frac{L_{i}}{N_{i}} - \frac{\sum_{i \in A} L_{i}}{\sum_{i \in A} N_{i}},

(5)

O_{j} = \frac{L_{j}}{N_{j}} - \frac{\sum_{j \in B} L_{j}}{\sum_{j \in B} N_{j}},

(6)

O_{i, j} = |\frac{L_{i, j}}{N_{i, j}} - \frac{\sum_{i \in A} L_{i, j}}{\sum_{i \in A} N_{i, j}}| + |\frac{L_{i, j}}{N_{i, j}} - \frac{\sum_{j \in B} L_{i, j}}{\sum_{j \in B} N_{i, j}}|,

(7)

where

L

is the length of the discussion, and

N

represents the number of the discussion. The discussion offset indicates the sufficiency of the discussion. Using Equation (5) as an example, if a constraint

i

was discussed in 10 topics, and the total length of these topics is 2000 words, then

\frac{L_{i}}{N_{i}} = 200

words/constraint. In addition, if 100 topics were related to all constraints, and their length is 22,000 words, then we have

\frac{\sum_{i \in A} L_{i}}{\sum_{i \in A} N_{i}} = \frac{22,000}{100} = 220 w o r d s / c o n s t r a i n t

on average. Therefore, the discussion offset for constraint

i

is

200 - 220 = - 20

words/constraint, indicating that constraint

i

discussions are 20 words less than the average. Similarly, according to Equation (6), the discussion offset for meeting attendee

j

can be obtained. Equation (7) focuses on a more specific constraint–attendee pair. The

\frac{L_{i, j}}{N_{i, j}}

portion calculates the average length of a constraint

i

mentioned by attendee

j

. The absolute offset of this mean value compared with the constraint average and the attendee average were then summed. Hence, the discussion offset

O_{i, j}

measures the discussion sufficiency regarding a constraint mentioned by a specific attendee.

3.3.2. GPT-Based Constraint Information Extraction

The discussion offset identifies insufficient or excessive discussions regarding specific constraints and attendees. Insufficient discussions might overlook vital information, while excessive discussions could lead to redundant or distracting content. Hence, accurately identifying and extracting critical information is essential for managers to diagnose problematic discussions. This research further leveraged the GPT-4 model to extract critical components from discussion topics for constraint removal. The GPT-4 model was applied in this task because of two reasons. First, it has a long context window of 8192 tokens to accommodate long conversations. Second, the improved few-shot learning ability facilitates the model to efficiently understand the domain-specific information, recognize similar information, and correct inaccurate or error information.

Discussion topics with predicted constraints from the BERT classification module were initially obtained. The authors then established a data pipeline to identify the attendees corresponding to each dialog. As shown in Figure 5, the “Content” column contains the discussion dialogs. The “Attendee” column includes attendees associated with each dialog accordingly. The “Constraint” column represents the classified constraint from BERT.

Subsequently, a user-defined function leveraging the GPT-4 model was developed for constraint information extraction. This involved designing prompts for the model to identify the cause, impact, and potential remedies for each constraint while verifying and correcting any misclassifications from BERT. Eventually, 17 third-party construction professionals evaluated the information extraction results and other outputs to validate the framework.

4. Experiments and Evaluations

The proposed framework was evaluated via empirical studies. This research applied the framework to a case project. Domain experts evaluated the results to establish research validity and generalizability.

4.1. Data Collection

In this empirical study, this research analyzed meeting transcripts from a high-rise office building project in Qingdao, China. The project is a standard 46-floor structure with a three-floor basement, encompassing a total area of 111,884 m², and spanning a project duration of 890 days. The project adopted the Engineering, Procurement, and Construction (EPC) delivery method, with the general contractor overseeing procurement responsibilities. This study primarily concentrated on the structural erection phase of above-ground building structures. This research selected this case project since it has a medium scope and budget. It also applied standard construction approaches and common project delivery methods. To this end, the case project is typical and representative, and the proposed method can be generalizable to broader construction projects once it is validated in the case project.

During the data collection phase, meeting inputs were gathered from seven distinct departments: technology, quality, design and BIM studio, electrical and mechanical engineering, construction management, safety management, and procurement. This research selected two types of meetings, daily huddles and weekly meetings, for data collection since they met three critical criteria. First, they are formal meetings organized by the general contractor (GC). Second, they were held regularly, minimizing potential omissions during data collection. Third, these meetings were organized following a meeting agenda, which is beneficial in establishing a norm for data collection and can enhance the generalizability of the findings. Daily huddles were convened daily at 4:20 pm. Regular attendees were production managers from the GC, subcontractors, and auxiliary labor teams. Project managers from the GC participated in the meeting irregularly. The weekly plenary meetings, scheduled every Monday at 6:10 pm, were mandatory for all management teams for this project. Table 1 presents detailed insights into these meetings.

This research collected meeting transcripts spanning 119 working days, from February 28 to 26 June 2019. This period included 100 daily huddles and 14 weekly meetings. However, 19 daily huddles were not held due to factors such as adverse weather conditions, national holidays, and site visits by leaders. One weekly meeting was also canceled due to the project manager’s absence. This study utilized a non-intrusive observation method to collect meeting transcripts data. During the data collection phase, one of the authors was present on the job site and participated in all project meetings. These meetings were recorded using a voice recorder, and the transcripts were manually typed based on these recordings after each meeting. All data analyses were conducted after the meetings, ensuring the transcripts remained accurate, original, and intact. To avoid ethical concerns, pseudonyms were used during data collection to prevent personal information-related bias.

The cumulative data comprise 263,836 Chinese characters in a dialog format. Sample meeting transcripts are illustrated in Figure 6. Texts in the “Content” column represent a dialog in each row. Finally, all meeting dialogs were organized in an Excel data table and saved as a CSV file for data processing.

4.2. Model Applications

This research applied Python built-in and user-defined functions, as well as built data pipelines to obtain the models’ performance metrics. During the model application, the data were processed and analyzed following the sequential order in the Research Method section. Initially, the raw meeting transcripts were segmented into discussion topics with GPT-3.5. The input corpus was in Chinese, so Chinese prompts were used to ensure output consistency. The English translation is presented in Figure 7.

The system message was defined to guide the model’s general behavior. The authors specified that the model should behave like a construction professional. Notice that despite the large context window of the model, it is insufficient to encompass all meeting transcripts. Consequently, a sliding window approach was adopted, dividing transcripts into batches compatible with the model’s context length. Upon examining the raw transcript data, a sliding window of approximately 8000 words was identified as optimal for two reasons. First, it matched the average context length of the daily huddles. This segmentation resulted in dividing the transcripts from 114 meetings into 123 segments. Second, it can potentially reduce the risk of hallucination, where lengthy and complex prompts might lead the model to generate unrelated or loosely related outputs [53]. For meetings with a transcript length of greater than 8000 words, manual efforts were made to split them into segments of less than 8000 words. Furthermore, a loop function fed the “User Input” section with transcript batches in each iteration. The prompt capped the context length for output segments at less than 500 words, aligning with the BERT model’s maximum sequence length of 512 tokens [37]. This constraint ensured that most segments would not exceed the 512-token limit post cleaning, thereby avoiding information loss due to truncation.

Data labeling should be implemented before feeding the discussion topic data for constraint classification. The authors independently annotated the discussion topics. To ensure the validity of the labeling process, all annotators labeled a subset of discussion topics. The inter-annotator agreement rate between the two authors was 0.90, demonstrating acceptable consistency in the data-labeling process. Inconsistent annotations were discussed among all annotators to determine the final constraint type.

The labeled data were preprocessed according to the steps discussed in prior sections. Eventually, 727 training samples and 140 test samples were prepared for BERT model training. Table 2 lists the customized hyperparameters during grid search training. All other hyperparameters were kept default. The pretrained model applied was from Google’s BERT model on the Huggingface Pytorch port [54]. The optimal model was identified through five-fold cross-validation. Torch version 2.1 was applied to build the model. Other key libraries include transformers 4.32.1, scikit-learn 1.2.2, and pandas 2.0.3. The model tuning was conducted on Google Colab virtual machines equipped with an A100 GPU accelerator.

To evaluate the third module, the constraint type per topic was obtained from the prediction results of the BERT model. Next, discussion sufficiency metrics were calculated for each pair of constraint and attendee using Equation 7. This research developed the user prompts in Figure 8 to achieve GPT-based constraint information extraction through a user-defined function.

The prompts constitute four major chunks: general instruction, constraint definition, few-shot examples, and input data. Based on experience from data labeling, it was determined that the most pertinent constraint-related information typically spans a range from three topics before to two topics after the target topic. Hence, for a specific topic

i

, the function feeds dialogs from six topics

(i - 3, i + 2)

into the prompts. The target topic is surrounded by three asterisks for the GPT model to locate the correct constraint. In addition, an instruction was given to force GPT to first determine the correctness of the constraint before information extraction. The prompt also prioritizes the answer “not found” to reduce hallucination. The few-shot example section guides the model’s response structure. It includes two examples that follow a consistent format: first assessing the correctness of the constraint, then discussing its cause, impact, and potential remediation based on the actual constraint type. The examples include one illustrating correct classification without detailed remedial discussions and another demonstrating incorrect classification with a remedial measure discussion. Lastly, the input data section incorporates the dialog contexts and their classified constraints.

In the operational stage, each function call involves the data pipeline querying and identifying a discussion topic based on meeting date, constraint type, and involved attendees. The selected topic and its contextual contents are concatenated and fed to the “Dialogue” section, while the constraint is included in the “Constraint” section. Additionally, to enhance the consistency and reliability of outcomes from the GPT-4 model, the temperature parameter was set to 0, and all other parameters were kept default.

4.3. Results and Discussions

The topic segmentation phase involved processing 9197 dialogs from the original meeting transcripts, yielding 1645 discussion topics. On average, each topic comprised 161 words and encompassed approximately 5.6 dialogs. Based on the topic segmentation results, the constraint labeling results are shown in Figure 9. This figure reflects the interaction among meeting attendees, with higher dialog frequencies indicating more extensive stakeholder involvement in specific constraints.

The analysis revealed that the top three constraints with the highest discussion frequencies were equipment availability, the design and working method, and space availability. These findings align with expectations given the nature of the project, a high-rise building in an urban setting with limited working space. Equipment availability, particularly tower cranes, which are crucial for vertical transportation, can be a significant bottleneck in the project schedule [55]. The need for intensive negotiation among multiple subcontractors to prioritize lifting tasks elucidates the high frequency of discussions surrounding this constraint. This also provides insight into the lower discussion frequency for the labor availability constraint. Since subcontractors were responsible for labor recruitment, less coordination was required across different trades. The second most discussed constraint, the design and working method, reflects the inherent complexities of working methods and design document details. Space availability gathered significant attention as the workspace is a shared resource and a typical constraint for high-rise building projects. The frequent discussions on this topic highlight the challenge of workspace coordination in resource-constrained project scheduling [56].

4.3.1. Constraint Classification with BERT

Accurate constraint classification is the backbone of GPT-based information extraction. This research established a support vector classifier (SVC) as the benchmark to evaluate the performance of the proposed BERT classifier. Employing the same training and test datasets, the optimal SVC model was identified through grid search parameter tuning:

C = 5

,

gamma = scale

, and

kernel = rbf

. The comparative performance of the SVC and BERT models on the test set was assessed using the test metrics outlined in Equations (1)–(4). Figure 10 presents the test metrics for both models, with error bars representing the standard deviation of the BERT model across ten iterations. The results indicate that the BERT model significantly outperformed the benchmark SVC in all test metrics. The statistics demonstrate that the BERT model has significantly higher accuracy in identifying constraint types with high sensitivity and specificity.

The confusion matrix was obtained from the BERT test results to identify the source of false predictions, as shown in Figure 11. The results reveal that the BERT model performed well in safety, labor availability, external conditions, and non-specific constraint topics. However, higher misclassification rates were observed in the design and working method, equipment availability, and prerequisite readiness categories. The underlying causes for these discrepancies can be attributed to two main reasons. The first relates to the inherent capabilities of the BERT model. Discussions involving safety, labor, and external conditions like weather or holidays typically include common keywords, facilitating easier identification. In contrast, discussions about design and working methods, construction equipment, and work sequences often involve construction-specific jargon, which is less prevalent in the data pretrained into BERT. This domain-specific vocabulary challenges the model’s ability to utilize prior knowledge for classification, leading to additional context understanding and generalization barriers. Given such observations, involving extensive training with additional data samples in these categories may be one solution to improve the model’s natural language understanding regarding such vocabulary. Furthermore, a pretrained embedding model focusing on domain-specific keywords may also help improve the model performance [57]. The second reason pertains to the intrinsic nature of the constraints themselves. Discussions about working methods, prerequisites, and the use of shared equipment frequently intersect with other construction resources like labor and materials. For instance, a sample discussion topic associated with prerequisite readiness, detailed below, demonstrates this interconnection.

“Can we start plastering tomorrow?” “Maybe not. I’ve prepared the woodworking, but the rebar isn’t fully installed yet.” “But I saw quite a few steelworkers on the job.” “Yes, there were a lot of them. They work on a contract basis, finish the job and then leave. I’ve done the preparatory work.” “But the scaffold isn’t done yet. When will there be workspace on the main tower above? Can we work up there?” “Sure.” “So, are we tying rebar now?” “No, wasn’t it disapproved?” “So, it means the scaffolding isn’t ready yet.”

This case illustrates a real constraint wherein the incomplete predecessive rebar installation work impedes the scheduled commencement of plastering work. Notice that the dialogs for this topic encompass a variety of key elements such as labor, materials, and workspace. This highlights a strong interrelation between the primary constraint and other ones. Consequently, this complexity poses a challenge and may result in the model showing less favorable results in some circumstances. Despite these nuances, the BERT model demonstrated practical applicability in effectively predicting constraints in construction project discussions.

4.3.2. Constraint Discussion Patterns

Due to the limited data samples, this research used the BERT constraint prediction results of the entire dataset for a consistent analysis. Next, the relative discussion frequency for each meeting attendee was calculated through user-defined Python functions and shown in Figure 12. Note that attendees with less than 30 dialogs during the data collection period were excluded to ensure result validity. This figure reveals the constraint distribution per attendee during the meetings. For instance, the first value of 0.28 for Mr. D indicates that 28% of his discussions on constraints focused on design and working methods. In identifying discussion frequencies greater than 0.3 as highly frequent, it was observed that nine attendees predominantly discussed certain constraints in the meetings. These attendees were categorized into two groups based on their job roles: management and foreman teams. The dominant constraints for these groups are detailed in Table 3. A distinct occupational characteristic was found. First, only three management team members exhibited a high frequency of discussion on specific constraints, which is fewer than the foreman team. Members like the project manager Mr. LW, the structural subcontractor manager Mr. LJ, and the GC labor manager Mr. Q discussed a variety of constraints in a more balanced manner, reflecting the management team’s extensive control over diverse project aspects. Only the safety officer Mr. T focused significantly on safety constraints. Second, there is a significant difference in high-frequency constraints between the management and foreman teams. The management team generally focused on more high-level topics such as project safety. The only exception is the chief engineer Mr. ZH who paid substantial attention to the practicalities of design and working methods. Conversely, the foreman team is closely aligned with practical and actionable items like equipment, space, and working methods, correlating directly with their job responsibilities. For example, technician Mr. F spent significant effort coordinating construction equipment such as tower cranes, hoists, loaders, and water pumps to solve equipment availability constraints from other attendees. Foreman Mr. S focused on the critical path activities and shared workspace with other trades. Similarly, foreman Mr. M, responsible for steelwork, frequently requested additional lighting facilities to work efficiently at night. As to the foreman for the waterproofing trade, the most frequent discussion for Mr. G is to reserve holes in waterproofing materials for other trades such as pipeline, scaffolding, and steel structures, leading to his top constraint in design and working method.

The insights from these results are two-fold. First, managers and foremen demonstrated a notable inconsistency in the top constraints. This discrepancy sheds light on front-line laborers’ specific needs and concerns, reminding project managers to know their needs and facilitating constraint removal for higher productivity. Second, by accumulating discussion frequency data for a project team, managers gain prior insight into common constraints likely to be encountered by specific stakeholders. This knowledge is instrumental in establishing an early warning mechanism, enabling the management team to proactively diagnose and anticipate potential constraints.

Furthermore, the discussion offsets for each attendee and constraint were calculated through user-defined functions, as illustrated in Figure 13, where the horizontal dashed line represents the average words used in a dialog per constraint (

\frac{\sum_{i \in A} L_{i, j}}{\sum_{i \in A} N_{i, j}}

in Equation (7)), and the vertical dashed line is the average number of words per attendee (

\frac{\sum_{j \in B} L_{i, j}}{\sum_{j \in B} N_{i, j}}

in Equation (7)). Each data point corresponds to a constraint mentioned by a specific attendee. The

x

coordinate represents the difference between the number of words used on the constraint and the average number of words used by them on all constraints. The

y

coordinate represents the difference between the number of words used on the constraint by them and the average number of words used on the constraint by all attendees. For example, Mr. D’s coordinates for the space availability constraint are (2.2, −13.8). This means that compared with himself, he used 2.2 more words to describe space-related constraints than the average number of words he used on all constraints. He used 13.8 fewer words on space-related constraints than the average number of words used by all attendees.

Figure 13 shows that constraints such as EA, MA, LA, and PR are relatively more converged to the center, indicating a uniform effort by attendees in their discussions. In contrast, DWM, EC, and safety discussions are more spread out in horizontal and vertical directions, showing a higher variability in the length of discussion per attendees and constraint. This observation holds practical implications for construction managers in organizing planning meetings. For constraints with consistent discussion efforts, a routine schedule for open discussion should be allocated, while those with high variability should be treated on a case-by-case basis. For example, time should be allocated in advance for the constraints with high variability.

The positive and negative offsets in Figure 13 create four distinct quadrants, each indicating different discussion dynamics. Data points in quadrant I indicate that the attendees spent more effort explaining the corresponding constraint than they did for all constraints. They also used more words to discuss the constraint than what all attendees did for that constraint. Such phenomena may be associated with long, complex discussions regarding details between several stakeholders. Therefore, a brief pre-meeting for the attendees in quadrant I can be beneficial to streamline the plenary session. Discussions in quadrant II indicate that the attendees spent less effort than themselves, whereas the discussions are longer than those presented by others. In such cases, redundant information may be included in these discussions. Attendees in this section can prepare a list of key points and share it in advance to save the meeting time. Similarly, discussions in quadrant III indicate succinct minor issues, characterized by less effort and a shorter discussion length. Finally, while emphasized by the attendee, quadrant IV discussions might be perceived as less significant by the audience. Therefore, managers should primarily focus on these topics. This recognition is crucial for ensuring that all pertinent issues, regardless of their perceived scale, are adequately addressed.

4.3.3. Constraint-Related Information Extraction

This section applied GPT-based constraint information extraction to provide content-based understanding to facilitate construction managers’ efficient contextual awareness. The analysis of discussions in quadrants II and IV in Figure 13 suggest potential issues with redundancy and insufficient depth, respectively. Figure 14 presents two typical cases in these quadrants. A “get_summary” function was developed to query associated topic contents. The GPT-4 model processed these contents and their preceding three and subsequent two topics through a data pipeline.

The utility of the GPT-based model is highlighted when addressing redundant discussions. As shown in Figure 13, the DWM constraint from Mr. T lies in quadrant II. Dialogs from Mr. T are underlined in the input section of Figure 14a. Mr. T mentioned multiple details regarding welding, painting, tying rebar, and organizing the workspace. He only provided reminders for various subcontractors, and no critical issues were discussed. Hence, from the manager’s perspective, the only case that should be considered is that all these minor issues have been communicated to the corresponding stakeholders, and no further questions have been raised. As such, the structured outputs from GPT accurately captured all details from the long dialogs. The model identified the cause of the issue, and the solutions were also extracted from the contextual topics. This enables managers to quickly grasp all the minor problems and communicate them appropriately, avoiding the need for extensive retrospection during the meeting.

Similarly, the model effectively identifies insufficient discussions as an early warning. An instance of this is observed in Mr. ZH’s discussion on safety constraints, located in quadrant IV of Figure 13. As presented in Figure 14b, the project team is resolving the safety issue around the elevator shaft. Here, Mr. ZH reminded the team that the steel form should stretch out and, therefore, cannot fully cover the shaft. However, no exact solution was proposed after his suggestion. The model succinctly captures the core issue around the elevator shaft and the associated risks while highlighting the absence of a proposed solution. To sum up, the GPT-based information extraction clarifies complex and redundant information, tracks critical decisions and progress, and assists in diagnosing potential risks and safety hazards. This approach streamlines the management process, ensuring that attention is focused on the most pertinent issues and facilitating more effective decision making. Project managers can apply the tool to rapidly locate the key constraints from meetings and develop targeted strategies based on their discussion status. However, as the method grasps constraints’ immediate impacts and solutions in real time, the long-term effects of these constraints may be underestimated. Therefore, it is also suggested that the project management team focus more on the plan’s impacts in the long run.

5. Framework Validation

This empirical study’s results were evaluated through two phases to establish research validity. Phase one used the statistical performance metrics to evaluate the BERT constraint classifier. Phase two conducted a questionnaire to evaluate GPT-based topic segmentation and information extraction. These approaches collectively established the validity of the proposed framework.

As shown in Figure 10, the BERT model has a test accuracy of 0.728, precision of 0.756, recall of 0.728, and F1 score of 0.730. In combining these metrics with the test confusion matrix in Figure 11, it can be concluded that the BERT model effectively identifies the pattern and semantic features of different constraint discussions.

The authors developed a questionnaire survey to assess the GPT outputs. We invited 25 construction professionals, affiliated with third-party industry organizations, to gauge nine aspects of the GPT models, as indicated in Table 4. Seventeen useful answers were collected and analyzed manually by the authors. They have an average of 10.6 years of work experience in the construction industry. Their job roles include chief engineer, project manager, technician, production manager, cost engineer, foreman, and technical leader. The authors randomly selected ten constraint discussions. For each discussion, we presented the professionals with the original dialogs, topic segmentation results, the BERT constraint prediction, and GPT information extraction outputs to evaluate the overall performances of the topic segmentation, BERT classification, and constraint information extraction.

Table 4 suggests that respondents generally evaluated all aspects of the model positively based on their median responses. Respondents found the topic segmentation results accurate, highly consistent, and extensible. They also claimed that the GPT-4 model outputs are representative, clear, consistent, and extensible. The respondents concluded that the overall outputs are comprehensive, accurate, and valid in capturing all constraint factors. In some instances, misclassifications with incorrect root causes were noted, impacting the conciseness and applicability of the outputs. The comments align with the model’s classification results, as misclassification results can be observed from several constraints. Nevertheless, all respondents acknowledged this framework’s validity and potential contributions, with no negative response (scale 5–7) in any identified criteria.

6. Conclusions

This research applied an approach to unveil potential constraints from construction planning discussions, quantify the discussion sufficiency among meeting attendees, and extract key information. This study introduced an attendee-level analysis of discussion adequacy, using empirical data to measure discussion sufficiency. Furthermore, the prompt-based information extraction evaluated each constraint discussion’s root cause, impact, and solutions. The research methodology relies on utilizing construction planning meeting transcripts, which are ubiquitous in construction projects. Existing case studies demonstrated similar approaches that apply meeting minutes to identify and analyze planning constraints [7,58]. Therefore, it can be reasonably concluded that while the specific outcomes of the constraint information extraction may vary across different projects, the implementation of our methodology is consistent across various project scenarios. Regarding the research methods, a recent study applied similar BERT models for construction document topic classification [12]. The capability of GPT models in summarizing construction documents has also been confirmed by existing research [59]. Hence, the proposed method exhibits high generalizability, and the insights from this research are valuable to a broader construction industry.

The results show that BERT effectively identifies various linguistic features of constraint discussions with 72.8% accuracy on the test set. Compared with traditional machine learning benchmarks, the improvements of BERT regarding accuracy, precision, recall, and F1 score are 8.8%, 30.6%, 27.8%, and 30.0%. Given the test result from a real-world case project, this research identified the communication patterns for different constraints. Topics such as construction safety, labor issues, and external conditions are often keyword-oriented. The constraint discussion often includes specific terminologies or central sentences, making the classification process straightforward. While discussions related to design and working method issues or prerequisite readiness, complex jargon, and implicit inferences are often included, these one-of-a-kind discussions posed challenges to BERT’s fine-tuning, leading to suboptimal classification. Despite this, the model’s overall competency was confirmed, suggesting a need for further integration of domain-specific knowledge with language models to enhance practical robustness and generalizability.

This study also revealed significant occupational characteristics in constraint discussions within planning meetings. Management team members tended to have a balanced focus on various constraints, while foremen and subcontractors concentrated on more practical issues. This disparity highlights the need for better alignment between managers’ expectations and foremen’s goals. In addition, the discussion sufficiency metric identified two types of discussions that can be further optimized: one with potential redundancy, and the other may suffer from insufficient discussion and lack of critical information. The findings highlight the implicit risks concealed in constraint discussions.

Furthermore, the GPT-based information extraction tool effectively captured critical aspects of each constraint from unstructured texts. GPT-4 can accurately review lengthy conversations and effectively deliver retrieved information. Compared with generative summarization approaches, the extractive approach reduces LLMs’ potential bias and hallucination issues. Evaluation results from construction professionals demonstrate the effectiveness and versatility of a prompt-based approach for construction meeting analysis, setting up a baseline for extensive LLM studies on planning meetings.

Our study follows a similar method of using meeting minutes to identify planning constraints, as seen in earlier research [7,60]. In addition, another case study of a bridge project demonstrated constraint frequency and distribution patterns similar to this study [58]. This similarity suggests that our methods can be applied across various project types, highlighting the generalizability of our proposed framework. Furthermore, this research demonstrates several advancements over existing studies. Regarding planning constraint management, existing studies often rely on manual or semi-automated approaches to identify constraint items from fragmented planning materials [5,26]. This research first applied language models to automate constraint identification and analysis. Regarding topic classification using BERT in the construction domain, recent research achieved a mean accuracy of 78% based on a three-class problem [61]. This research advanced this effort by introducing a nine-class classifier with a similar testing performance. Our work could potentially set a new benchmark for multi-class classification using BERT models in the construction field.

The research findings offer practical implications for construction project management. This study introduced a tool to identify, classify, and diagnose constraints in unstructured meeting discussions. The occupational characteristics of discussions suggest the need for a more effective communication channel besides the plenary meeting. The results recommend that professionals use discussion sufficiency analysis to better organize meeting agendas by considering constraints and attendee patterns. Construction managers can improve planning efficiency by compiling discussion logs of constraints, enabling them to anticipate future issues. The proposed framework eventually promotes a continuous improvement cycle in eliminating planning constraints. Furthermore, the findings contribute to the division of responsibility regarding constraint removal under different project delivery frameworks, facilitating policymakers to adopt more effective strategies for each project delivery method.

In summary, this research contributes three-fold to the body of knowledge. First, it pioneered the application of BERT models to classify constraint discussions from real-world meeting transcripts. Second, it proposed a new metric, discussion sufficiency, to depict constraint discussion preferences by different attendees, and identified redundancy and insufficiency. Third, the innovative prompt-based information extraction tool based on GPT extracts key components to provide structured outputs, which helps establish a standardized constraint log from fragmented dialogs.

This research has several limitations that can also lead to future research. The meeting transcripts were collected from a single case project. Applying the research framework to various projects is recommended to enhance its generalizability. The project scope, communication style of stakeholders, and local regulations can affect the constraint discussion frequency and sufficiency. In addition, as this research applied few-shot learning without fine-tuning the GPT models, they occasionally generated incorrect constraint judgments during information extraction. Using a more sophisticated information extraction framework, such as the retrieval-augmented generation (RAG) architecture, can facilitate the deployment of the language models in production environments [57]. Future research is recommended to explore the generalizability of the proposed framework by applying it to various construction project types and scales. Moreover, incorporating a domain knowledge base such as an ontology may further enhance the practical robustness and generalizability of the BERT models and alleviate potential ethical issues raised by applying generative AI technologies.

Author Contributions

Conceptualization, M.L. and C.H.; methodology, M.L., B.Y., and C.H.; validation, L.T. and J.H.; formal analysis, C.H. and L.G.; resources, M.L.; writing—original draft preparation, C.H.; writing—review and editing, C.H. and M.L.; visualization, C.H. and L.G.; supervision, M.L. and L.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article material; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

McKinsey & Company. Increasing Transparency in Megaproject Execution. Available online: https://www.mckinsey.com/capabilities/operations/our-insights/increasing-transparency-in-megaproject-execution#/ (accessed on 2 July 2024).
Lagos, C.I.; Alarcón, L.F. Assessing the Relationship between Constraint Management and Schedule Performance in Chilean and Colombian Construction Projects. J. Manag. Eng. 2021, 37, 04021046. [Google Scholar] [CrossRef]
Ottesen, J.L.; Martin, G.A. Bare Facts and Benefits of Resource-Loaded CPM Schedules. J. Leg. Aff. Disput. Resolut. Eng. Constr. 2019, 11, 02519001. [Google Scholar] [CrossRef]
Wu, C.; Li, X.; Jiang, R.; Guo, Y.; Wang, J.; Yang, Z. Graph-Based Deep Learning Model for Knowledge Base Completion in Constraint Management of Construction Projects. Comput. Civ. Infrastruct. Eng. 2022, 38, 702–719. [Google Scholar] [CrossRef]
Javanmardi, A.; Abbasian-Hosseini, S.A.; Liu, M.; Hsiang, S.M. Improving Effectiveness of Constraints Removal in Construction Planning Meetings: Information-Theoretic Approach. J. Constr. Eng. Manag. 2020, 146, 04020015. [Google Scholar] [CrossRef]
He, C.; Liu, M.; Alves, T.C.L.; Scala, N.M.; Hsiang, S.M. Prioritizing Collaborative Scheduling Practices Based on Their Impact on Project Performance. Constr. Manag. Econ. 2022, 40, 618–637. [Google Scholar] [CrossRef]
Hamzeh, F.; Zankoul, E.; Sakka, F. El Removing Constraints to Make Tasks Ready in Weekly Work Planning. Procedia Eng. 2016, 164, 68–74. [Google Scholar] [CrossRef]
Wang, J.; Shou, W.; Wang, X.; Wu, P. Developing and Evaluating a Framework of Total Constraint Management for Improving Workflow in Liquefied Natural Gas Construction. Constr. Manag. Econ. 2016, 34, 859–874. [Google Scholar] [CrossRef]
Chen, G.; He, C.; Hsiang, S.; Liu, M.; Li, H. A Mechanism for Smart Contracts to Mediate Production Bottlenecks under Constraints. In Proceedings of the 31st Annual Conference of the International Group for Lean Construction (IGLC), Lille, France, 26 June–2 July 2023; International Group for Lean Construction (IGLC): Lille, France, 2023; pp. 1232–1244. [Google Scholar]
Wang, N.; Issa, R.R.A.; Anumba, C.J. Transfer Learning-Based Query Classification for Intelligent Building Information Spoken Dialogue. Autom. Constr. 2022, 141, 104403. [Google Scholar] [CrossRef]
Zheng, J.; Fischer, M. Dynamic Prompt-Based Virtual Assistant Framework for BIM Information Search. Autom. Constr. 2023, 155, 105067. [Google Scholar] [CrossRef]
Moon, S.; Chi, S.; Im, S.B. Automated Detection of Contractual Risk Clauses from Construction Specifications Using Bidirectional Encoder Representations from Transformers (BERT). Autom. Constr. 2022, 142, 104465. [Google Scholar] [CrossRef]
Olivieri, H.; Seppänen, O.; Alves, T.D.C.L.; Scala, N.M.; Schiavone, V.; Liu, M.; Granja, A.D. Survey Comparing Critical Path Method, Last Planner System, and Location-Based Techniques. J. Constr. Eng. Manag. 2019, 145, 04019077. [Google Scholar] [CrossRef]
He, C.; Liu, M.; Zhang, Y.; Wang, Z.; Hsiang, S.M.; Chen, G.; Li, W.; Dai, G. Space—Time—Workforce Visualization and Conditional Capacity Synthesis in Uncertainty. J. Manag. Eng. 2023, 39, 04022071. [Google Scholar] [CrossRef]
Ballard, H.G. The Last Planner System of Production Control. Ph.D. Thesis, University of Birmingham, Birmingham, UK, 2000. [Google Scholar]
Koskela, L. Management of Production in Construction: A Theoretical View. In Proceedings of the 7th Annual Conference of the International Group for Lean Construction, Berkeley, CA, USA, 26–28 July 1999; pp. 241–252. [Google Scholar]
Ballard, G.; Howell, G. An Update on Last Planner. In Proceedings of the 1th Annual Conference of the International Group for Lean Construction, Blacksburg, VA, USA, 22–24 July 2003. [Google Scholar]
Lindhard, S.; Wandahl, S. Improving the Making Ready Process—Exploring the Preconditions to Work Tasks in Construction. In Proceedings of the 20th Annual Conference of the International Group for Lean Construction, San Diego, CA, USA, 18–20 July 2012. [Google Scholar]
Mincks, W.R.; Johnston, H. Construction Jobsite Management, 4th ed.; Cengage Learning: Boston, MA, USA, 2017; ISBN 9781285224930. [Google Scholar]
He, C.; Liu, M.; Hsiang, S.M.; Pierce, N. Synthesizing Ontology and Graph Neural Network to Unveil the Implicit Rules for US Bridge Preservation Decisions. J. Manag. Eng. 2024, 40, 04024007. Available online: https://ascelibrary.org/doi/10.1061/JMENEA.MEENG-5803 (accessed on 6 November 2023).
Gorse, C.A.; Emmitt, S. Informal Interaction in Construction Progress Meetings. Constr. Manag. Econ. 2009, 27, 983–993. [Google Scholar] [CrossRef]
Javanmardi, A.; He, C.; Hsiang, S.M.; Abbasian-Hosseini, S.A.; Liu, M. Enhancing Construction Project Workflow Reliability through Observe–Plan–Do–Check–React Cycle: A Bridge Project Case Study. Buildings 2023, 13, 2379. [Google Scholar] [CrossRef]
Ponton, H.; Osborne, A.; Thompson, N.; Greenwood, D. The Power of Humour to Unite and Divide: A Case Study of Design Coordination Meetings in Construction. Constr. Manag. Econ. 2020, 38, 32–54. [Google Scholar] [CrossRef]
Pousette, A.; Törner, M. Effects of Systematic Work Preparation Meetings on Safety Climate and Psychosocial Conditions in the Construction Industry. Constr. Manag. Econ. 2016, 34, 355–365. [Google Scholar] [CrossRef]
Zegarra, O.; Alarcón, L.F. Coordination of Teams, Meetings, and Managerial Processes in Construction Projects: Using a Lean and Complex Adaptive Mechanism. Prod. Plan. Control 2019, 30, 736–763. [Google Scholar] [CrossRef]
Masoetsa, T.G.; Ogunbayo, B.F.; Aigbavboa, C.O.; Awuzie, B.O. Assessing Construction Constraint Factors on Project Performance in the Construction Industry. Buildings 2022, 12, 1183. [Google Scholar] [CrossRef]
Liu, M.; Jin, Y.; Lu, Y.; Chen, M.; Hou, B.; Chen, W.; Wen, X.; Yu, X. A Wellbore Stability Model for a Deviated Well in a Transversely Isotropic Formation Considering Poroelastic Effects. Rock Mech. Rock Eng. 2016, 49, 3671–3686. [Google Scholar] [CrossRef]
Hasan, S.; Sacks, R. Integrating BIM and Multiple Construction Monitoring Technologies for Acquisition of Project Status Information. J. Constr. Eng. Manag. 2023, 149, 04023051. [Google Scholar] [CrossRef]
Wu, C.; Cui, J.; Xu, X.; Song, D. The Influence of Virtual Environment on Thermal Perception: Physical Reaction and Subjective Thermal Perception on Outdoor Scenarios in Virtual Reality. Int. J. Biometeorol. 2023, 67, 1291–1301. [Google Scholar] [CrossRef] [PubMed]
Zhang, Y.; Qu, T.; Yao, T.; Gong, Y.; Bian, X. Research on the Application of BIM Technology in Intelligent Building Technology. Appl. Comput. Eng. 2024, 61, 29–34. [Google Scholar] [CrossRef]
Zhang, H.; Wang, L.; Xu, J. Using the Equal Sentiment Enhancement with Distribution (ESED) Algorithm in Text Sentiment Analysis: Predicting Customers Purchasing Intention (CPI) for IT Services on Freelance Platforms. In Proceedings of the Third International Conference on Electronic Information Engineering and Data Processing (EIEDP 2024), Kuala Lumpur, Malaysia, 15–17 March 2024; Volume 13184, p. 131841G. [Google Scholar]
Chung, S.; Moon, S.; Kim, J.; Kim, J.; Lim, S.; Chi, S. Comparing Natural Language Processing (NLP) Applications in Construction and Computer Science Using Preferred Reporting Items for Systematic Reviews (PRISMA). Autom. Constr. 2023, 154, 105020. [Google Scholar] [CrossRef]
Al Qady, M.; Kandil, A. Concept Relation Extraction from Construction Documents Using Natural Language Processing. J. Constr. Eng. Manag. 2010, 136, 294–302. [Google Scholar] [CrossRef]
Salama, D.A.; El-Gohary, N.M. Automated Compliance Checking of Construction Operation Plans Using a Deontology for the Construction Domain. J. Comput. Civ. Eng. 2013, 27, 681–698. [Google Scholar] [CrossRef]
Shuai, L.; Hubo, C.; Kamat, V.R. Integrating Natural Language Processing and Spatial Reasoning for Utility Compliance Checking. J. Constr. Eng. Manag. 2016, 142, 4016074. [Google Scholar] [CrossRef]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 2017, 5999–6009. [Google Scholar]
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
Baek, S.; Han, S.H.; Jung, W. Automated Identification of Active Players for International Construction Market Entry Using Natural Language Processing. J. Manag. Eng. 2023, 39, 04023025. [Google Scholar] [CrossRef]
OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F.L.; Almeida, D.; Altenschmidt, J.; Altman, S. GPT-4 Technical Report. arXiv, 2023; arXiv:2303.08774. [Google Scholar]
OpenAI. Introducing ChatGPT. Available online: https://openai.com/blog/chatgpt (accessed on 6 November 2023).
Kim, Y.; Guo, L.; Yu, B.; Li, Y. Can ChatGPT Understand Causal Language in Science Claims? In Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, Toronto, ON, Canada, 14 July 2023; Association for Computational Linguistics (ACL): Stroudsburg, PA, USA, 2023; pp. 379–389. [Google Scholar]
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 1–46. [Google Scholar] [CrossRef]
Amer, F.; Jung, Y.; Golparvar-Fard, M. Construction Schedule Augmentation with Implicit Dependency Constraints and Automated Generation of Lookahead Plan Revisions. Autom. Constr. 2023, 152, 104896. [Google Scholar] [CrossRef]
Xu, F.; Nguyen, T.; Du, J. Augmented Reality for Maintenance Tasks with ChatGPT for Automated Text-to-Action. J. Constr. Eng. Manag. 2024, 150, 04024015. [Google Scholar] [CrossRef]
Lee, P.; Bubeck, S.; Petro, J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. N. Engl. J. Med. 2023, 388, 1233–1239. [Google Scholar] [CrossRef] [PubMed]
Yu, L.; Reichle, E.D.; Jones, M.; Liversedge, S.P. RadicalLocator: A Software Tool for Identifying the Radicals in Chinese Characters. Behav. Res. Methods 2015, 47, 826–836. [Google Scholar] [CrossRef] [PubMed]
Sun, A. Jieba Chinese Text Segmentation. Available online: https://github.com/fxsjy/jieba (accessed on 12 November 2023).
Suriyawongkul, A. Stopwords ISO. Available online: https://github.com/stopwords-iso/stopwords-iso (accessed on 12 November 2023).
Yu, B.; Li, Y.; Wang, J. Detecting Causal Language Use in Science Findings. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4664–4674. [Google Scholar] [CrossRef]
Google Research BERT. Available online: https://github.com/google-research/bert (accessed on 8 November 2023).
Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
Hamzeh, F.R.; Zankoul, E.; Rouhana, C. How Can ‘Tasks Made Ready’ during Lookahead Planning Impact Reliable Workflow and Project Duration? Constr. Manag. Econ. 2015, 33, 243–258. [Google Scholar] [CrossRef]
Rawte, V.; Chakraborty, S.; Pathak, A.; Sarkar, A.; Tonmoy, S.M.T.I.; Chadha, A.; Sheth, A.; Das, A. The Troubling Emergence of Hallucination in Large Language Models—An The Troubling Emergence of Hallucination in Large Language Models—An Extensive Definition, Quantification, and Prescriptive Remediations. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–10 December 2023. [Google Scholar]
Huggingface. Huggingface Transformers. Available online: https://github.com/huggingface/transformers (accessed on 2 June 2024).
Chen, G.; Liu, M.; Zhang, Y.; Wang, Z.; Hsiang, S.M.; He, C. Using Images to Detect, Plan, Analyze, and Coordinate a Smart Contract in Construction. J. Manag. Eng. 2023, 39, 04023002. [Google Scholar] [CrossRef]
Ma, H.; Zhang, H.; Chang, P. 4D-Based Workspace Conflict Detection in Prefabricated Building Constructions. J. Constr. Eng. Manag. 2020, 146, 04020112. [Google Scholar] [CrossRef]
Chen, J. Demystifying Large Language Models: Unraveling the Mysteries of Language Transformer Models, Build from Ground up, Pre-Train, Fine-Tune and Deployment, 1st ed.; James Chen: Beijing, China, 2024. [Google Scholar]
Javanmardi, A.; Abbasian-Hosseini, S.A.; Hsiang, S.M.; Liu, M. Constraint Removal and Work Plan Reliability: A Bridge Project Case Study. In Proceedings of the 26th Annual Conference of the International Group for Lean Construction, Chennai, India, 18–22 July 2018; pp. 807–817. [Google Scholar]
Nyqvist, R.; Peltokorpi, A.; Seppänen, O. Can ChatGPT Exceed Humans in Construction Project Risk Management? Eng. Constr. Archit. Manag. 2024, 31, 223–243. [Google Scholar] [CrossRef]
Choo, H.J.; Tommelein, I.D.; Ballard, G.; Zabelle, T.R. Workplan: Constraint-Based Database for Work Package Scheduling. J. Constr. Eng. Manag. 1999, 125, 151–160. [Google Scholar] [CrossRef]
Pham, H.T.T.L.; Han, S. Natural Language Processing with Multitask Classification for Semantic Prediction of Risk-Handling Actions in Construction Contracts. J. Comput. Civ. Eng. 2023, 37, 04023027. [Google Scholar] [CrossRef]

Figure 1. Research framework.

Figure 2. Radicals for the Chinese character “mother” (adapted from [46]).

Figure 3. Sample text preprocessing procedures.

Figure 4. Data resampling. (a) Full dataset before resampling; (b) training set after resampling.

Figure 5. Input dataset for GPT-based constraint component extraction.

Figure 6. Sample meeting transcripts.

Figure 7. Prompts for topic segmentation.

Figure 8. Prompts for constraint component extraction.

Figure 9. Number of dialogs per constraint.

Figure 10. Constraint classification model performance.

Figure 11. Confusion matrix for BERT classifier.

Figure 12. Constraint discussion frequency by meeting attendees.

Figure 13. Discussion sufficiency.

Figure 14. GPT-based constraint information extraction samples. (a) Sample topic with redundant information; (b) sample topic with insufficient discussion.

Table 1. Meeting descriptions.

Meeting Type	Host	Attendees (Min, Max, Average)	Duration in Minutes (Min, Max, Average)	General Agenda
Daily huddle	Production Manager (GC)	(3, 22, 10)	(16, 71, 45)	1. Violations of regulations at the construction site. 2. Issues with civilized construction. 3. Equipment, labor, and material coordination. 4. Updating task completion. Identifying reasons for unfinished tasks. Making work plans for the next day.
Weekly meeting	Project Manager (GC)	(14, 22, 20)	(90, 168, 120)	1. Updating completion of the past week (project manager). 2. Identifying issues for each department (project manager). 3. Planning schedule for each department (project manager). 4. Summarizing issues encountered by production departments (production manager). 5. Discussing technical issues (technical directors). 6. Financial, contractual, and legal issues (business departments). 7. Open discussion.

Table 2. BERT hyperparameters in grid search.

Hyperparameter	Description	Range of Grid Search	Best Value
learning_rate	The step size at each iteration while moving to minimize the loss function.	(5 × 10⁻⁶, 1 × 10⁻⁵, 5 × 10⁻⁵, 1 × 10⁻⁴)	1 × 10⁻⁵
epochs	The number of times the algorithm will work through the entire training dataset.	(5, 8, 10)	8
batch_size	The number of training examples used in one iteration.	(8, 16, 32)	16

Table 3. High-frequency constraints by attendees.

Management Team		Foreman Team
Name and Job Role	High Frequent Constraint (Frequency)	Name and Job Role	High Frequent Constraint (Frequency)
Mr. LB Production manager	Safety (0.36)	Mr. D Foreman from GC	EA (0.31)
Mr. T Safety manager	Safety (0.47)	Mr. F Technician	EA (0.46)
Mr. ZH Chef engineer from GC	DWM (0.39)	Mr. DU Foreman from steel structure subcontractor	SA (0.43)
		Mr. S Foreman from GC	SA (0.49)
		Mr. G Foreman from waterproofing subcontractor	DWM (0.44)
		Mr. M Foreman from GC	EA (0.45)

Table 4. Summary of questionnaire survey results.

Evaluation Criteria	Question	Mean ¹	Standard Deviation	Result Interpretation Based on Median
Representation	How representative are the model outputs?	2.41	0.97	Agree
Comprehensiveness	Does the dialog segmentation present a comprehensive discussion topic?	2.76	1.06	Somehow agree
Classification	Does the model accurately identify the true constraint?	2.71	1.13	Somehow agree
Content validity	Does the output correctly extract the cause, impact, and remedy solutions for the identified constraint?	2.53	1.09	Somehow agree
Clarity	Is the output easy to understand and free from confusion?	2.35	0.90	Agree
Conciseness	Any unnecessary or redundant information?	2.59	0.97	Somehow agree
Consistency	Does the output maintain a uniform and dependable framework over time?	2.41	0.77	Agree
Applicability	Are the results applicable to specific practical scenarios?	2.82	1.15	Somehow agree
Extendibility	Are the results extendable to other project settings?	2.71	1.13	Agree

¹ Seven-point Likert scale: 1 = most favorable; 7 = least favorable.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

He, C.; Yu, B.; Liu, M.; Guo, L.; Tian, L.; Huang, J. Utilizing Large Language Models to Illustrate Constraints for Construction Planning. Buildings 2024, 14, 2511. https://doi.org/10.3390/buildings14082511

AMA Style

He C, Yu B, Liu M, Guo L, Tian L, Huang J. Utilizing Large Language Models to Illustrate Constraints for Construction Planning. Buildings. 2024; 14(8):2511. https://doi.org/10.3390/buildings14082511

Chicago/Turabian Style

He, Chuanni, Bei Yu, Min Liu, Lu Guo, Li Tian, and Jianfeng Huang. 2024. "Utilizing Large Language Models to Illustrate Constraints for Construction Planning" Buildings 14, no. 8: 2511. https://doi.org/10.3390/buildings14082511

APA Style

He, C., Yu, B., Liu, M., Guo, L., Tian, L., & Huang, J. (2024). Utilizing Large Language Models to Illustrate Constraints for Construction Planning. Buildings, 14(8), 2511. https://doi.org/10.3390/buildings14082511

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Utilizing Large Language Models to Illustrate Constraints for Construction Planning

Abstract

1. Introduction

2. Background and Literature Review

2.1. Constraint Removal in Construction Meetings

2.2. Natural Language Processing for Construction Documents

3. Research Method

3.1. Discussion Topic Segmentation

3.2. Constraint Classification Using BERT

3.2.1. Data Preprocessing

3.2.2. Constraint Classification Model

3.2.3. Model Evaluation

3.3. Prompt-Based Constraint Analyzing and Tracking

3.3.1. Topic Discussion Sufficiency

3.3.2. GPT-Based Constraint Information Extraction

4. Experiments and Evaluations

4.1. Data Collection

4.2. Model Applications

4.3. Results and Discussions

4.3.1. Constraint Classification with BERT

4.3.2. Constraint Discussion Patterns

4.3.3. Constraint-Related Information Extraction

5. Framework Validation

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI