An Integrated Planning and Control Framework (IPCF) for Construction Projects—Step 1: Development of the Construction Data Hub (CDH)

Ghazal, Mai; Hammad, Ahmed

doi:10.3390/app15094682


by Mai Ghazal * and Ahmed Hammad

Construction Engineering and Management, Department of Civil and Environmental Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(9), 4682; https://doi.org/10.3390/app15094682

Submission received: 9 March 2025 / Revised: 15 April 2025 / Accepted: 20 April 2025 / Published: 23 April 2025

(This article belongs to the Section Civil Engineering)


Abstract

Construction projects generate a significant volume of scattered data in various formats. However, having a large amount of data is insufficient; there is a need to obtain the appropriate metadata to enable extracting useful knowledge from it. Therefore, professionals need a consistent data acquisition model to gather comprehensive data from multiple projects and organizations in a format ready for applying machine learning. This research proposes an Integrated Planning and Control Framework (IPCF) to implement the concept of “From Data to Decision (FD2D)” in the construction industry. The first step of the framework is the development of the Construction Data Hub (CDH). The CDH seeks to collect data from twelve dimensions that impact the project’s planning and control. It relies on using the industry-accepted concept of work packages, which is the optimum level of detail for data acquisition. To validate the CDH, a machine learning model that utilizes the data collected through the CDH is developed to analyze the factors influencing construction project profit. The study revealed six significant profit-influencing factors. These factors might assist estimators in predicting profit margins during the early estimation stage, instead of relying on intuition or uniform rates, which are not always reliable methods.

Keywords: data acquisition model; data warehouse; data management; from data to decision (FD2D); work breakdown structure (WBS); product breakdown structure (PBS); knowledge management; construction management

1. Introduction

Construction projects are well-known for their risk and unpredictability [1]. This is especially true during the project’s early stages, such as the feasibility and planning phases, as the amount of available information is severely limited. Construction projects continuously generate vast quantities of data throughout the different stages, including design data, schedules, financial data, enterprise resource planning (ERP) systems, and other information, as noted by Bilal et al. [2]. Knowledge management in the construction industry involves an integrated approach that combines information technology with social techniques [3]. Access to and management of the right data at the right time play a crucial role in determining the success of a project. Lee et al. [4] indicated that managing construction project-related knowledge can be challenging as it is typically dispersed among various stakeholders, including owners, supply chains, organizations, employees, and customers, making it difficult to conduct comprehensive data analysis.

Effective management of project factors is crucial for successful construction projects [5]. In today’s construction industry, construction professionals rely on scheduling and progress monitoring to ensure the timely completion of projects [6]. However, during the early stages of the project, the amount of detailed data available about the project is limited. This makes preparing accurate schedules and cost estimates nearly impossible and adversely affects resource management, production rates, execution strategies, and overall decision-making. Collecting large amounts of high-quality data that fully represent the real construction world has significant limitations [7]. Currently, the significant challenges facing machine learning (ML) are insufficient training data, unstructured data, low-quality data, and training data with irrelevant features, which lead to underfitting and overfitting [8,9,10]. Similarly, ref. [11] specifically highlighted that machine learning algorithms are used ineffectively because of data limitations and information constraints. Various studies have used metaheuristic algorithms such as genetic algorithms (GA) to solve scheduling problems that arise from limited data; some have even developed complex models to address real requirements, such as considering multiple projects [12], changing resource constraints [13], and simultaneously considering resource-constrained and time–cost trade-off problems [14]. However, Wang et al. [15] identify the difficulty of data collection as a significant flaw in these studies, which makes these methods difficult to apply to practical projects. Accordingly, ML and heuristic models should consider two major issues: the first is a structured data source to provide the time, cost, resources, greenhouse gas (GHG) emissions, etc., for project planning; the second is a suitable transfer method, such as BIM, to move the data from the data source to the mathematical model.

Pan and Zhang [16] highlighted that, as a field of computer science, AI enables computers to perceive and learn input much like humans do. It encompasses knowledge representation, reasoning, problem-solving, and planning to address complex and ambiguous problems in a deliberate, intelligent, and adaptive manner. Investment in AI is experiencing rapid growth, with machine learning playing a significant role in exploring robust data from multiple sources and leveraging insights for informed decision-making. They also discussed the current gap in the adoption of AI techniques within construction projects, as it still lags behind other industries despite the exponential increase in generated data. Consequently, there is substantial interest in implementing various AI methods in the AEC industry to utilize the valuable opportunity presented by digital evolution for improved performance and profitability. One of the primary applications of artificial intelligence lies in the domain of “From Data to Decision” (FD2D). This term focuses on digital transformation and artificial intelligence from the data value chain to human value [17]. It involves developing and studying digital transformation and artificial intelligence applications across various academic disciplines and industrial sectors, and their impact on society. FD2D is a framework designed to convert diverse raw data into actionable insights using organized workflows. In construction, FD2D consists of three key steps: First is data acquisition, which standardizes and integrates scattered project information such as schedules, budgets, resources, and embodied carbon emissions. Second, knowledge extraction involves using analytics tools like AI and machine learning to identify patterns. Lastly, decision support translates these insights into actionable operational strategies, focusing on areas such as risk mitigation and resource optimization. 
The proposed Integrated Planning and Control Framework (IPCF) operationalizes FD2D’s first step by tackling the challenge of data fragmentation in construction. Specifically, IPCF’s Construction Data Hub (CDH) standardizes data collection across 12 dimensions (Section 4), facilitating downstream FD2D processes such as predictive modeling (Section 6). Therefore, IPCF is fundamental to FD2D’s objective of data-driven construction management.

Building Information Modeling (BIM) is a significant advancement in architecture, engineering, and construction (AEC). BIM technology creates accurate digital models of buildings, enhancing design analysis and control over manual processes. These models include precise geometry and data to support construction, fabrication, and procurement [18]. BIM also models a building’s lifecycle, enabling new design and construction capabilities and evolving project team roles. When effectively implemented, BIM leads to more integrated design and construction, resulting in higher quality buildings, reduced costs, and shorter project durations. BIM has emerged as a crucial tool for structuring and sharing project data among stakeholders, particularly in design and coordination [16]. However, BIM’s focus on geometric and procedural data often neglects essential dimensions such as labor productivity, environmental impact, and social metrics, which remain scattered across disparate systems. The proposed CDH enhances BIM’s capabilities by integrating these non-geometric dimensions into a unified framework, enabling comprehensive analysis. For instance, while BIM may track material quantities, the CDH associates this data with cost, carbon emissions, and supplier histories, bridging gaps in early-stage decision-making.

This research focuses on resolving the first issue found in the previous models: collecting and structuring the data. It applies the FD2D concept to the construction industry to develop a data-driven decision-making tool called the Integrated Planning and Control Framework (IPCF), proposing a methodology (Figure 1) that transforms the significant volume of scattered construction data in various formats into valuable decisions. First, comprehensive data and appropriate metadata are collected from projects during the different construction project stages. Second, these data points are linked to one or more construction project phases, and a data acquisition model called the Construction Data Hub (CDH) is developed to collect data from multiple projects and organizations in a format ready for applying machine learning and artificial intelligence techniques. The CDH uses mapping tables to standardize terminologies such as management levels, project phases, and categories. Lastly, the collected data are used as input to knowledge-based decision-making (KBDM) to derive valuable decisions (discovered knowledge) that improve future projects. This last step provides the feedback loop for continuous improvement.
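As a minimal sketch of how a mapping table might standardize terminology across organizations, the Python snippet below normalizes organization-specific phase labels to a common vocabulary. The labels, the `PHASE_MAP` dictionary, and the `standardize_phase` function are illustrative assumptions and do not reproduce the CDH's actual mapping tables.

```python
# Hypothetical mapping table: organization-specific phase labels -> standard phases.
# All entries are illustrative assumptions, not the CDH's real mapping tables.
PHASE_MAP = {
    "eng": "Engineering",
    "engineering": "Engineering",
    "design": "Engineering",
    "proc": "Procurement",
    "purchasing": "Procurement",
    "procurement": "Procurement",
    "const": "Construction",
    "construction": "Construction",
    "site works": "Construction",
}


def standardize_phase(raw_label: str) -> str:
    """Return the standard phase name for a raw label, or 'Unmapped' if unknown."""
    key = raw_label.strip().lower()
    return PHASE_MAP.get(key, "Unmapped")


# Records from three hypothetical projects that name their phases differently.
records = [
    {"project": "A", "phase": "Design"},
    {"project": "B", "phase": " PROC "},
    {"project": "C", "phase": "Commissioning"},
]
standardized = [{**r, "phase": standardize_phase(r["phase"])} for r in records]
print(standardized)
```

Labels that fall outside the table are flagged as "Unmapped" rather than guessed, so that a data steward can extend the table deliberately.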

2. Literature Review

2.1. Background of the Work Breakdown Structure (WBS) Concept

In 2000, the US Department of Defense created a special form of WBS for defense materiel items, enabling them to use available data to build historic files, which can then aid in the future development of similar defense materiel cost estimates [19]. This special WBS collects data for the seven types of military systems in a common framework, which is specific in the first three levels of each system. A WBS dictionary that includes all the elements also ensures consistent interpretations of the content of each element from project to project. The third edition of the PMBOK Guide [20], published by the Project Management Institute (PMI), defines WBS as follows: “a deliverable-oriented hierarchical decomposition of the work to be executed by the project team to accomplish the project objectives and create the required deliverables. It organizes and defines the total scope of the project. Each descending level represents an increasingly detailed definition of the project work. The WBS is decomposed into work packages”. Contained within WBS is a product breakdown structure, which usually consists of the specified prime product(s) at the higher level, and the systems, segments, subsystems, etc., at consecutive lower levels [21,22]. The PMBOK Guide also defines the organizational breakdown structure (OBS) and the resource breakdown structure (RBS). Whereas the OBS is “used to show which work components have been assigned to which organizational units” [20], the resources breakdown structure is “a logical and useful classification of the resources needed to accomplish a project’s objectives” [23].

2.2. Using Data Acquisition Models and Data Warehouses for the Collection of Historical Data

Numerous studies have applied data acquisition models and data warehousing to tackle scheduling, cost estimation, and resource planning in construction. For instance, Chau et al. [24] presented a construction management decision support system that integrates with a project’s material data warehouse, with a specific emphasis on inventory decision-making. Ahmad et al. [25] developed a data warehouse system that determines the most suitable residential site. Similarly, Fan et al. [26] proposed an equipment data warehouse integrated with a decision support system. Rujirayanyong and Shi [27] introduced a project-oriented data warehouse (PDW) for contractors that combines WBS and OBS structures for detailed management insights. In 2013, Park and Kim [28] developed a sewer infrastructure decision support system based on a data warehouse capturing data from installation, inspection, and renewal processes, aiding in strategic decision-making. Hammad et al. [29] developed a labor resources data warehouse to predict activity delays and estimate the use of labor resources. Ghazal and Hammad [30] developed a data acquisition model to collect cost data from completed projects to predict construction project cost overruns. Elkholosy et al. [31] designed a data acquisition model to track and store project data systematically. Their main goal was to create a labor resource forecasting model with data mining for future project use. Notably, their model does not collect labor requirements at the work package level. Golabchi and Hammad [32] created a data acquisition system with a tracking feature to gather labor resource needs for various work packages from multiple projects to forecast resource utilization rates throughout the project.

As summarized in Table 1, existing data acquisition models primarily focus on isolated dimensions (e.g., labor, materials) and project-level data, with only Rujirayanyong and Shi [27] and Golabchi and Hammad [32] capturing work package (WP)-level details. Importantly, none incorporate emerging metrics such as GHG emissions or social impact, nor do they allow for dynamic updates for new dimensions. These gaps (limited scope, coarse granularity, and static frameworks) directly motivate the design of the CDH in Section 3, which standardizes WP-level data collection across 12 dimensions while facilitating flexible expansions (e.g., through a modular SQLite architecture in Section 5).

2.3. Application of Machine Learning and Big Data Analysis in Construction

Machine learning is crucial for making construction “smart,” enabling site supervision, automatic detection, and intelligent maintenance [33]. The main benefit of machine learning is its ability to identify complex patterns and relationships within data sets that may be overlooked by humans [34]. By applying machine learning algorithms in construction project management, it becomes possible to reveal these patterns and correlations, allowing managers to enhance their decision-making and anticipate potential challenges [35,36]. Previous studies [37,38,39,40] illustrate that machine learning can provide crucial insights into various aspects of project management. These insights can assist in predicting project duration, anticipating cost overruns, optimizing resource use, and identifying key risk elements that could significantly affect project outcomes. This predictive capability empowers project managers to proactively address challenges and implement changes to enhance overall project performance.

Several studies in the construction field have used machine learning for predictive modeling. For example, ref. [41] merges BIM, AI, and operational data for proactive planning. A case study of a 35,000 m² historical renovation in Rome shows that integrating BIM with AI enables predictive risk mitigation, cutting delays by 3% and optimizing resource efficiency. The results highlight improved time/cost control compared to traditional methods, showcasing the value of digital workflows in complex projects. Ref. [42] developed a hybrid model utilizing a deep forest algorithm designed to enhance HR recruitment in intelligent manufacturing companies. In a more recent study, Ebrahimi et al. [43] employed a combination of hybrid feature selection, machine learning, and particle swarm optimization (PSO) to boost productivity in construction labor. They utilized four different predictive models alongside random forest techniques to evaluate labor performance effectively, aiming to optimize productivity and derive the maximum value from key influencing factors. Ref. [44] used PSO and machine learning to reduce construction project management costs and create predictive models. Several algorithms, such as linear regression, decision trees, support vector machines (SVM), gradient boosting, random forests, k-nearest neighbors (KNN), and CNN regression, were tested for project cost prediction accuracy. The regression ensemble outperformed the individual algorithms in predictive accuracy.

2.4. Research Gaps

All the studies that implemented data acquisition models designed them specifically for the aim of the study. For instance, Hammad et al. [29], Elkholosy et al. [31], and Golabchi and Hammad [32] developed labor resource data acquisition models. Also, the models developed by Chau et al. [24], Ahmad et al. [25], Fan et al. [26], and Park and Kim [28] focused on decision-making for construction inventory, residential site selection, equipment management, and sewer infrastructure systems, respectively. Similarly, the data acquisition model proposed by Ghazal and Hammad [30] included the construction project’s cost overrun data. The models were applied successfully, and the results obtained from using them in knowledge discovery were satisfactory, but these models lack generality: implementing them in dissimilar cases is either not possible or requires modification. Rujirayanyong and Shi [27] presented a comprehensive project-oriented data warehouse (PDW) that combines WBS and OBS, but their PDW was dedicated to providing detailed information about management responsibilities. Construction organizations, therefore, require a generic data acquisition model that leads to a consistent data warehouse that can be utilized for different purposes. It should collect data at the project’s most efficient level of detail (the work package level), similar to the models proposed by Hammad et al. [29] and Golabchi and Hammad [32]. It should also contain emerging dimensions such as greenhouse gas (GHG) emissions, solid waste, productivity, risk, safety, social impact, etc.

This research addresses the gap found in the current literature by implementing a generic integrated data acquisition model that includes data from different dimensions. Moreover, it standardizes terminology by defining standard levels, phases, and subphases to enable the identification of meaningful metadata that can be used to capture data from multiple projects and organizations. In this research, the data acquisition model includes the GHG dimension in addition to time, cost, and resources, while other dimensions can be added in the future through a simple modification of the tool. This study also introduces the novel concept of storing the multi-dimensional data related to the construction work packages in key units. Presenting the data in key units facilitates the process of developing a WBS for future projects.

3. Methodology

This research proposes a data acquisition model called the Construction Data Hub (CDH). The CDH seeks to collect construction project data from twelve dimensions that impact the project’s planning and control. It adopts the idea of a standardized WBS form introduced by the US Department of Defense and expands upon the data acquisition model developed by Elkholosy et al. [31], which integrates the OBS and RBS and collects data from four dimensions (planning and scheduling, cost estimating, cost control, and labor resources) at the project level (Level 1). This research proposes a further integration of the four structures: OBS, RBS, PBS, and WBS. It focuses on collecting multidimensional construction project data by adding eight new dimensions to the framework by Elkholosy et al. [31]. The additional dimensions are scope management, materials management, equipment management, progress measurement and performance evaluation, document management, risk management, environmental impact, and social impact. It also increases the level of detail by collecting data at the work package level (Level 3) rather than at the project level (Level 1) used in the study by Elkholosy et al. [31].

The research methodology consists of three phases. The first phase includes a comprehensive literature review divided into three sections: screening the background of the WBS concept, using data acquisition models and data warehouses to collect historical data, and identifying research gaps. The second phase identifies the twelve dimensions that impact a project’s planning and control processes and explains the importance of each dimension, then recognizes the data fields that should be tracked during the project to be stored in the CDH. This phase introduces three new terms/products in the construction project. The first is the multidimensional work package (MDWP), which is an enhanced version of the traditional work package (WP). This section illustrates the structure of the MDWP and its new data dimensions (attributes), making it more powerful than the conventional work package. The second product is the generic template for WBS, which defines standard levels, phases, subphases, and work packages to establish a comprehensive planning template that collects data consistently and meaningfully. The third product is the progress measurement system that uses predefined weighted progress activities for each of the MDWP types. The proposed progress measurement system introduces consistency in evaluating the performance of projects for more efficient analysis of collected data. The third phase presents a case study using actual project data to show the applicability of the framework in real-life situations. An overview of the methodology is presented in Figure 2.

4. Dimensions Impacting Project Planning and Control

To develop an inclusive framework for the data acquisition model, the authors sought to identify the dimensions that impact the planning and control of a project. In 2021, Taghinezhad et al. [45] performed a comprehensive literature review to identify the project management dimensions related to the successful delivery of transportation projects. They identified twelve dimensions: time management, cost management, quality control and inspection, environmental process, right of way and utilities, safety, outsourcing, value engineering, change orders, type of contract, workforce qualification, and operation and maintenance. This research employs the dimensions identified by Taghinezhad et al. [45] and further analyzes both the applicability of these dimensions to construction projects in general and the availability of data related to them.

Based on the authors’ analysis of the literature and practical experience as project managers, this study examines twelve dimensions that need proper identification and standardization: scope management, planning and scheduling, cost estimating, cost control, labor resources, materials management, equipment management, progress measurement and performance evaluation, document management, risk management, environmental impact, and social impact. Two dimensions (quality and safety) were excluded from the framework as these dimensions are managed using different systems. After consulting industry experts and professionals on construction projects, who concurred with this list of dimensions, the authors developed the data acquisition model, suggesting some improvements for the current practices and identifying relevant metadata to be considered for each dimension while collecting the project’s data.

To summarize the selection criteria, a rigorous two-stage process guided the selection of the twelve dimensions for the CDH framework. First, a literature review of construction project management systems, such as the PMBOK and domain-specific studies [20,45], identified recurring critical factors across successful projects. Second, these dimensions were validated through consultations with industry experts. Accordingly, dimensions were included based on three criteria: (1) empirical evidence of influencing project success (e.g., cost/time performance), (2) alignment with emerging industry priorities (e.g., environmental/social dimensions), and (3) feasibility of standardized data collection (e.g., work package-level GHG metrics). This dual academic–practical approach ensures that the CDH addresses both the foundational and forward-looking needs of the AEC sector.

Structuring a comprehensive data acquisition model that includes all the above-mentioned dimensions required a significant analysis of all the stakeholders and entities involved in executing the project. Entities here start at a high level to include the owner, consultant, construction organization, and subcontractors, and end at the level of resource hours (i.e., labor, material, and equipment) required to execute the project. The CDH is based on the integration between the four main structures in any construction project: OBS, PBS, RBS, and WBS. Figure 3 depicts this relationship, starting with an organization that has one or more divisions, each of which has multiple business units comprising various departments. Each department owns multiple types of resources, such as labor, material, and equipment. Each business unit is accountable for managing one or more portfolios to attain the strategic goals and objectives of the organization. A portfolio consists of multiple programs: a program is formed by a group of related projects, and the completion of the program’s goals relies on the successful execution of all these projects. A project has several phases, and each phase has disciplines, such as architecture, civil, structural, mechanical, electrical, etc. Each project is then decomposed into many work packages, and each work package is specific and large enough to describe the work executed to produce a product in a construction project. Accordingly, we can conclude that the key element here is the work package, as it connects the four structures. This highlights its value and the significant need for proper utilization of it as a knowledge carrier between the organization’s projects. The integration mechanics are explicitly illustrated later in Section 5, which serves as the relational model for the system.
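To make the role of the work package as the link between the four structures more concrete, the following is a minimal relational sketch using Python's standard `sqlite3` module. It is not the CDH schema: the table names, column names, and sample rows are hypothetical, and the hierarchy is deliberately shortened (divisions, portfolios, programs, and disciplines are omitted).

```python
import sqlite3

# Hypothetical, simplified schema; names are illustrative assumptions only.
SCHEMA = """
-- OBS: organizational units
CREATE TABLE business_unit (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE department (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    business_unit_id INTEGER NOT NULL REFERENCES business_unit(id));
-- RBS: resources owned by departments
CREATE TABLE resource (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    kind TEXT NOT NULL CHECK (kind IN ('labor', 'material', 'equipment')),
    department_id INTEGER NOT NULL REFERENCES department(id));
CREATE TABLE project (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    business_unit_id INTEGER NOT NULL REFERENCES business_unit(id));
-- PBS: library of predefined work package types
CREATE TABLE wp_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- WBS: a work package ties a project to a PBS type and a phase
CREATE TABLE work_package (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    wp_type_id INTEGER NOT NULL REFERENCES wp_type(id),
    phase TEXT NOT NULL);
-- Link table: resources consumed by each work package
CREATE TABLE wp_resource (
    work_package_id INTEGER NOT NULL REFERENCES work_package(id),
    resource_id INTEGER NOT NULL REFERENCES resource(id),
    quantity REAL NOT NULL);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO business_unit VALUES (1, 'Industrial')")
conn.execute("INSERT INTO department VALUES (1, 'Civil', 1)")
conn.execute("INSERT INTO resource VALUES (1, 'Carpenter', 'labor', 1)")
conn.execute("INSERT INTO resource VALUES (2, 'Crane', 'equipment', 1)")
conn.execute("INSERT INTO project VALUES (1, 'Plant A', 1)")
conn.execute("INSERT INTO wp_type VALUES (1, 'Foundations')")
conn.execute("INSERT INTO work_package VALUES (1, 1, 1, 'Construction')")
conn.executemany(
    "INSERT INTO wp_resource VALUES (?, ?, ?)", [(1, 1, 120.0), (1, 2, 16.0)]
)

# Starting from the work package, one query reaches all four structures.
rows = conn.execute("""
    SELECT p.name, t.name, d.name, r.kind, wr.quantity
    FROM work_package wp
    JOIN project p ON p.id = wp.project_id
    JOIN wp_type t ON t.id = wp.wp_type_id
    JOIN wp_resource wr ON wr.work_package_id = wp.id
    JOIN resource r ON r.id = wr.resource_id
    JOIN department d ON d.id = r.department_id
    ORDER BY r.id
""").fetchall()
for row in rows:
    print(row)
```

In this sketch every join passes through `work_package`, which mirrors the observation above that the work package is the element connecting the OBS, PBS, RBS, and WBS.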

During the planning phase, the available information, such as schedules, estimates, designs, drawings, and other documents, is not detailed enough to prepare proper estimates or select the optimal construction execution method. The conventional work package only includes data related to the cost and time required to execute that package. To improve the quality of the conventional work package, this research introduces a new term, the “multidimensional work package” (MDWP). The MDWP extends traditional WPs by integrating 12 dimensions (Figure 4), including scope, resources (e.g., labor, material, and equipment), real-time progress measurement, GHG emissions, risks, etc. Table 1 illustrates the benefits of MDWPs vs. traditional WPs. Each MDWP uses weighted progress activities (WPAs) to dynamically calculate percentage completion based on predefined Rules of Credit (ROC). For instance, an MDWP for “Steel Beam Installation” might assign weights to sub-tasks (e.g., 30% for delivery, 50% for welding, 20% for inspection), enabling precise progress tracking tied to resource use and emissions (Section 4.5). The MDWPs are entities; hence, each MDWP has attributes that describe its particular properties. In this research, the attributes are unit of measure (UoM), labor hours per unit, equipment hours per unit, material per unit, GHG emissions (kg of CO₂ equivalent per unit), and cost per unit. These attributes are crucial as they structure the main properties of each MDWP and have their equivalent key quantities.
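The per-unit attributes listed above lend themselves to a simple record type. The sketch below is one possible representation, assuming that totals scale linearly with the key quantity; the class name, field names, and all numeric values are hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDWPType:
    """Per-unit attributes of a multidimensional work package type (illustrative)."""

    name: str
    uom: str  # unit of measure of the key quantity
    labor_hours_per_unit: float
    equipment_hours_per_unit: float
    material_per_unit: float
    ghg_kg_co2e_per_unit: float
    cost_per_unit: float

    def totals(self, key_quantity: float) -> dict:
        """Scale each per-unit attribute by the key quantity (assumed linear)."""
        return {
            "labor_hours": self.labor_hours_per_unit * key_quantity,
            "equipment_hours": self.equipment_hours_per_unit * key_quantity,
            "material": self.material_per_unit * key_quantity,
            "ghg_kg_co2e": self.ghg_kg_co2e_per_unit * key_quantity,
            "cost": self.cost_per_unit * key_quantity,
        }


# Hypothetical unit rates for a concrete foundation work package type.
foundation = MDWPType(
    name="Concrete Foundation",
    uom="m3",
    labor_hours_per_unit=2.5,
    equipment_hours_per_unit=0.5,
    material_per_unit=1.0,
    ghg_kg_co2e_per_unit=300.0,
    cost_per_unit=450.0,
)

estimate = foundation.totals(key_quantity=100.0)
print(estimate)
```

Storing the attributes per unit of key quantity is what would allow rates from completed projects to be reused when estimating a new project with a different quantity.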

4.1. The 1st Dimension: Scope Management

Every dimension has its related data; these data are usually stored in different systems or software. The first dimension, scope management, is the core element of the project management dimensions, as it requires a precise and clear definition of the project work and impacts other dimensions such as time, cost, resources, etc. Scope management in construction is the process of defining, documenting, and controlling the work required for a project (PMBOK Guide). It is critical for completing projects on time, within budget, and to the required quality standards. The scope is mainly represented through a WBS, which is a decomposition of the full project into multiple work packages. These work packages utilize multiple resources at the same time (i.e., labor, material, and equipment) and are executed through major phases such as engineering, procurement, and construction. Other details included in this dimension are the key quantity name and unit of measure, baseline key quantity amount, actual key quantity amount, list of change orders, and the responsible resource for each work package.

4.2. The 2nd Dimension: Planning and Scheduling

Planning and scheduling is considered a significant dimension for the MDWP. In this research, the authors propose a generic template for WBSs by initiating two predefined libraries for product breakdown structure (PBS) and resources breakdown structure (RBS), to attain the standardization concept and improve the quality of the schedules and WBS developed by planners.

Generic Template for Work Breakdown Structure (WBS)

Starting with the generic template for the WBS, the structure of the template is based on the two predefined libraries: product breakdown structures (PBS) and resources breakdown structures (RBS). The PBS library contains the full scope of construction projects represented in the form of smaller, manageable work items as predefined work package types. This library is built in CDH by initiating all the work package types as activity codes. Those standardized packages can be utilized to build specific WBS for any future project while maintaining the standardization of levels, phases, subphases, and terms for all the organization’s projects. Similarly, another library is initiated for RBS to represent the predefined resources for labor, contractors, materials, and equipment. The template has a hierarchical structure and consists of four levels (Figure 5). The first two levels are project-specific: level 1 contains the project description, while level 2 divides the project into areas (plants or units) and components. To increase data collection consistency, each area in level 2 is further divided into phases and sub-phases. These phases and sub-phases are necessary to categorize the work packages at level 3, and are pre-defined as follows:

General:

- General;
- Planning/Scheduling;
- Cost Estimating;
- Cost Control.

Engineering Phase:

- Preliminary;
- Conceptual;
- Detailed.

Procurement Phase:

- Requisition and Awarding (purchase orders and subcontracts);
- Materials Management;
- Contract Administration.

Construction Phase:

- Manufacturing and Fabrication;
- Module Assembly;
- Site Installation.

Level 3 contains generic work package types that are extracted from the PBS library and present scope deliverables or product packages. Level 3 is the most significant level of the proposed template because it represents the whole scope of work divided into predefined work packages, such as columns, beams, foundations, structural steel, concrete, etc. However, the work packages here are not limited to the work executed during the construction phase. Some work packages may go under the engineering or procurement phase, while others, such as schedule and budget control baselines, may not go under any phase. Hence, to structure a consistent template, a new phase category called “General” is added to include all the work packages that do not represent work conducted in the engineering, procurement, or construction phases. The authors propose the term “general work packages” (GWP) to be consistent with similar terms used in the industry, such as “engineering work packages” (EWP), “procurement work packages” (PWP), and “construction work packages” (CWP). The final level of the WBS, level 4, contains progress activities for each work package.

To structure a project WBS from scratch using the suggested framework in Figure 5, the planner should refer to the predefined libraries for PBS and RBS and assign all the requirements to work package types at level 3. The generic structure for WBS provides a comprehensive planning template that collects data consistently and meaningfully to facilitate further analysis and make sustainable decisions. Standardizing the terminologies with predefined levels, phases, and subphases enables the identification of meaningful metadata that can be used to capture data from multiple projects and organizations. Creating predefined work package types also enables meaningful data collection, and it facilitates data transfer.
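The four-level template described above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class names, the `PBS_LIBRARY` excerpt, and the progress activities ("Formwork", "Rebar", "Pour") are hypothetical, while the phases and sub-phases are taken from the list given earlier.

```python
from dataclasses import dataclass, field

# Phases and sub-phases as listed in the template description.
PHASES = {
    "General": ["General", "Planning/Scheduling", "Cost Estimating", "Cost Control"],
    "Engineering": ["Preliminary", "Conceptual", "Detailed"],
    "Procurement": ["Requisition and Awarding", "Materials Management",
                    "Contract Administration"],
    "Construction": ["Manufacturing and Fabrication", "Module Assembly",
                     "Site Installation"],
}

# Hypothetical excerpt of a PBS library; the entries are illustrative only.
PBS_LIBRARY = {"Foundations", "Columns", "Beams", "Structural Steel"}


@dataclass
class WorkPackage:  # level 3: a work package type drawn from the PBS library
    wp_type: str
    phase: str
    sub_phase: str
    progress_activities: list = field(default_factory=list)  # level 4

    def __post_init__(self):
        # Reject anything that is not in the predefined libraries.
        if self.wp_type not in PBS_LIBRARY:
            raise ValueError(f"unknown work package type: {self.wp_type}")
        if self.sub_phase not in PHASES.get(self.phase, []):
            raise ValueError(f"unknown phase/sub-phase: {self.phase}/{self.sub_phase}")


@dataclass
class Area:  # level 2: area (plant or unit)
    name: str
    work_packages: list = field(default_factory=list)


@dataclass
class Project:  # level 1: project description
    description: str
    areas: list = field(default_factory=list)


project = Project("Example plant")
unit = Area("Unit 1")
unit.work_packages.append(
    WorkPackage("Foundations", "Construction", "Site Installation",
                ["Formwork", "Rebar", "Pour"])
)
project.areas.append(unit)
```

Validating each work package against the libraries at creation time is one way to enforce the standardized terminology that the template relies on.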

4.3. The 3rd and 4th Dimensions: “Cost Estimating” and “Cost Control”

The third and fourth dimensions, “Cost Estimating” and “Cost Control”, are also covered under the proposed WBS structure. Planners can collect the data related to these dimensions by adding columns for the budgeted, actual, and variance costs of labor and materials. These columns capture project details such as the baseline amount and rate and the actual amount and rate for each resource required to complete a WP.
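As a minimal sketch of how such columns could be derived for one resource of a WP, the function below multiplies amount by rate and takes the difference. The function name, the sample figures, and the sign convention (variance = actual minus budgeted) are assumptions made for illustration; the source does not specify them.

```python
def cost_variance(baseline_amount, baseline_rate, actual_amount, actual_rate):
    """Return budgeted cost, actual cost, and variance for one resource of a WP."""
    budgeted = baseline_amount * baseline_rate
    actual = actual_amount * actual_rate
    # Assumed convention: a positive variance means the WP is over budget.
    return {"budgeted": budgeted, "actual": actual, "variance": actual - budgeted}


# Hypothetical labor resource: 100 planned hours vs. 120 actual hours at 50 per hour.
row = cost_variance(baseline_amount=100, baseline_rate=50.0,
                    actual_amount=120, actual_rate=50.0)
```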

4.4. The 5th, 6th, and 7th Dimensions: “Labor Resources Management”, “Materials Management”, and “Equipment Management”

Similarly, planners can also collect the data related to these dimensions by referring to the predefined libraries for each branch of the RBS and tracking the utilization of the resources for each WP. The RBS’s three sub-libraries are for labor, materials, and equipment.

4.5. The 8th Dimension: “Progress Measurement and Performance Evaluation” Using Weighted Progress Activities (WPA)

To collect reliable data that can be utilized during the execution of a project or analyzed after its closure, this research proposes a progress measurement system. This system evaluates project progress and performance by assigning weights to the predefined progress activities according to the Rules of Credit (ROC) and then measuring the percentage of completion for each WP. Lopez [46] defined the Rules of Credit (ROC) as “referring to the guidelines by which the physical progress of a project is evaluated, assigning value to each milestone met. These milestones are reflected in a percentage of the total contract and serve as the basis for determining when and how much the contractor should be paid for completed work. Proper application of these rules is essential to keeping the project on schedule and on budget”. In other words, the project’s progress is measured by tracking the percentage of completion of the predefined weighted progress activities (WPA). The project’s performance is similarly evaluated by taking the planned values (PV) from Primavera P6 and the actual values (AV) from the timesheet system at the construction site, then calculating the earned values (EV) for each WP. The system has a “Status” column to show the project’s current status and a “Comments” column for planners to record the causes of deviations from the original budget and schedules. Figure 6 is an example of WPA for a concrete module work package (construction work package) to measure the percentage of completion. The proposed tool requires identifying the progress activities (PA) and then assigning weights to these activities. The weights assigned to the activities, which include module assembly, module transportation, and site installation, are 60%, 10%, and 30%, respectively. After that, the planner can use the tool to measure progress during the execution phase by entering the actual finish date and the letter “A”, which stands for “Accomplished”, when the work is completed. Accordingly, the EV, the variance in percentage of completion, and the current status are calculated for the specified work package.
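The weighted progress calculation can be illustrated with the 60/10/30 weights of the concrete module example. This is a sketch under stated assumptions: the function names and the budget figure are hypothetical, and the earned value is computed as percentage of completion times the WP budget, which is the usual earned value convention rather than a formula given in the source.

```python
# Weights (in percent) for the concrete module work package example.
WEIGHTS = {
    "Module Assembly": 60,
    "Module Transportation": 10,
    "Site Installation": 30,
}


def percent_complete(status):
    """Sum the weights of all progress activities marked 'A' (Accomplished)."""
    return sum(w for activity, w in WEIGHTS.items() if status.get(activity) == "A")


def earned_value(status, budget):
    """Assumed convention: EV = percentage of completion x WP budget."""
    return percent_complete(status) * budget / 100


# Assembly and transportation are accomplished; site installation is not.
status = {"Module Assembly": "A", "Module Transportation": "A"}
pc = percent_complete(status)               # 70 (percent)
ev = earned_value(status, budget=200_000)   # hypothetical budget
```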

4.6. The 9th Dimension: “Document Management”

Document management is a critical component of project management that guarantees both the effective handling and accessibility of project-related documents, contributing to project success, compliance, and risk mitigation. Construction documents usually include blueprints, drawings, contracts, permits, invoices, change orders, and correspondence. Although document management often involves the use of specialized software tools and systems, the proposed CDH tool suggests collecting a list of IFC (Issued for Construction) documents and as-built documents at this stage, while other documents might be added in the future.

4.7. The 10th Dimension: “Risk Management”

Managing risks in construction involves identifying, assessing, prioritizing, and minimizing potential risks that could affect the success of a construction project. Effective risk management enables construction firms and project teams to anticipate and resolve such challenges to minimize negative impacts on project timelines, budgets, quality, and safety. In this research, the CDH suggests collecting lists of risks for each WP type registered in the PBS library. Each list includes details such as risk name, description, magnitude, and probability. It also includes the risk impact on the scope, schedule, and budget of the project.
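A risk list of this kind could be represented as follows. The field names, the 1 to 5 magnitude scale, the sample risks, and the ranking by probability times magnitude are all assumptions made for illustration; the source lists the fields but does not prescribe a scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    wp_type: str        # work package type from the PBS library
    name: str
    probability: float  # 0..1
    magnitude: int      # hypothetical scale: 1 (low) to 5 (high)

    @property
    def score(self):
        # Assumed prioritization rule, not taken from the source.
        return self.probability * self.magnitude


register = [
    Risk("Foundations", "Unexpected soil conditions", 0.25, 4),
    Risk("Structural Steel", "Late material delivery", 0.5, 3),
]

# Highest score first, so the planner sees the most pressing risk at the top.
ranked = sorted(register, key=lambda r: r.score, reverse=True)
```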

4.8. The 11th Dimension: “Environmental Impact”

In addition to air, water, soil, and noise pollution, construction can also harm the environment through waste generation, habitat destruction, and energy consumption. The proposed CDH tool, therefore, suggests collecting data related to GHG emissions and solid waste generated during the execution of a work package. This includes measuring GHG emissions and resource consumption (fuel, electricity, etc.) during the manufacturing of the building material/module, the transportation of the material/module to the site, and the on-site construction phase. It also includes collecting data related to the amount of solid waste generated during the construction phase at the work package level. This data can then be analyzed to identify the work packages that generate more GHG emissions/solid waste and to find execution or material alternatives for these packages.
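The analysis described above amounts to summing emissions per work package across the three stages and ranking the results. The sketch below uses invented records (work package names, stages, and kg CO2e values are hypothetical) purely to show the aggregation.

```python
from collections import defaultdict

# (work package, stage, GHG emissions in kg CO2e); all values are hypothetical.
records = [
    ("Concrete Module", "manufacturing", 1200.0),
    ("Concrete Module", "transportation", 300.0),
    ("Concrete Module", "on-site construction", 150.0),
    ("Structural Steel", "manufacturing", 900.0),
    ("Structural Steel", "transportation", 100.0),
]

# Sum emissions over all stages for each work package.
totals = defaultdict(float)
for wp, _stage, kg in records:
    totals[wp] += kg

# The work package with the largest total is the first candidate for alternatives.
highest = max(totals, key=totals.get)
```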

4.9. The 12th Dimension: “Social Impact”

The social impacts of construction projects occur during and after the construction and may vary depending on the project’s nature, scale, and location. Among the significant social impacts of construction projects are the creation of employment opportunities for residents, the improvement of infrastructure that can attract businesses and investments, and the enhancement of residents’ overall quality of life by providing better access to essential services. However, such projects can also lead to temporary traffic congestion and detours, which may inconvenience nearby residents and businesses. Although this research does not cover the specific metadata, fields, or methods for measuring social impacts, the flexibility of the proposed CDH enables the user to customize the data acquisition model and integrate this dimension into their project management analysis.

5. Developing Construction Data Hub (CDH)

After analyzing the articles mentioned earlier in the literature and employing the twelve dimensions that impact planning and control in these studies, an evaluation of the addressed dimensions, associated phases, and the detailed work level from each study is shown in Table 2. All the studies that utilized data acquisition models were custom-tailored to their respective research objectives. While these models were effectively utilized and yielded satisfactory results in knowledge discovery, they lacked generality. Implementing these models in different scenarios might be challenging, either due to their lack of adaptability or the need for extensive modifications. The results shown in Table 2 indicate that none of the previous studies addressed the twelve dimensions while focusing on collecting project data at the work package level, which is the most efficient detailed work level. The proposed CDH model supports structuring a comprehensive data warehouse at the work package level and seeks to collect construction project data from twelve dimensions that impact the project’s planning and control.

An appropriate structure for the CDH requires the proper identification of all the entities that are involved in generating construction project data and the precise relationships between them. The authors divided the entities into two groups: organization-related entities and project-related entities. Organization-related entities refer to portfolios, programs, projects, and resources, whereas project-related entities are phases, disciplines, and work package types (also referred to as “MDWPs”). The proposed CDH adopts a relational database structure. Figure 7 presents the entity–relationship diagram (ERD) that illustrates the various entities within the database and their interconnections. Each of the entities included in the ERD has several attributes that describe that entity. The MDWP entity is the main entity upon which the rest of the ERD is based, since the MDWP is the product of the integration between the OBS, WBS, RBS, and PBS. The integration is shown through the relation between the significant entities, such as the organization and the project.
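A reduced version of such a relational structure can be sketched with Python's built-in `sqlite3` module. The table and column names below are assumptions chosen for illustration and do not reproduce the ERD in Figure 7; the sketch only shows how organization-related entities (portfolio, program, project) and project-related entities (work package type, phase, discipline) can meet in an MDWP table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE portfolio (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE program (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    portfolio_id INTEGER NOT NULL REFERENCES portfolio(id));
CREATE TABLE project (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL,
    program_id INTEGER NOT NULL REFERENCES program(id));
CREATE TABLE wp_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Hypothetical MDWP table linking a project to a PBS work package type.
CREATE TABLE mdwp (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    wp_type_id INTEGER NOT NULL REFERENCES wp_type(id),
    phase TEXT NOT NULL, discipline TEXT, uom TEXT);
""")

conn.execute("INSERT INTO portfolio (id, name) VALUES (1, 'Buildings')")
conn.execute("INSERT INTO program (id, name, portfolio_id) VALUES (1, 'Program 1', 1)")
conn.execute("INSERT INTO project (id, name, program_id) VALUES (1, 'Project A', 1)")
conn.execute("INSERT INTO wp_type (id, name) VALUES (1, 'Foundations')")
conn.execute(
    "INSERT INTO mdwp (project_id, wp_type_id, phase, discipline, uom) "
    "VALUES (1, 1, 'Construction', 'Civil', 'm3')"
)

# A work package that points at a non-existent project is rejected.
try:
    conn.execute(
        "INSERT INTO mdwp (project_id, wp_type_id, phase) VALUES (99, 1, 'Construction')"
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

mdwp_count = conn.execute("SELECT COUNT(*) FROM mdwp").fetchone()[0]
```

Foreign keys make the relationships between entities explicit, so a work package cannot exist without a valid project and work package type.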

The authors used SQLite to build the client/server application, which demonstrates the conceptual model of the CDH (Figure 8). Figure 8a shows the main page of the CDH, which presents entities such as companies, portfolios/programs, and projects. Figure 8b shows an example of an organization’s portfolio that includes one program, namely program 1, and Figure 8c shows an example of a project’s WBS and its related WPs (showing attributes such as unit of measure (UOM), phases, resource details, etc.). Through the developed CDH, users can input new data manually or import it from Primavera P6, Microsoft Project, Microsoft Access, or Excel files. The CDH can be customized to meet the needs of its users since several entities and attributes can be added as required.

The CDH uses the concept of a dynamic breakdown structure, which enables the user to perform flexible grouping; for instance, work packages can be grouped according to discipline, phase, sub-phase, etc. Stakeholders can utilize the CDH to initiate a new project by entering the data points related to the project details. Data points include the portfolio or program to which the project belongs, the business unit or department that manages it, the phases of the project, the duration, and the estimated budget for each work package. The MDWP can then be populated with other data points as required, such as those related to resources, environmental impacts, etc. The developed database acts as a tool for collecting and storing detailed data about projects, which can be easily accessed in one place during and after the completion of a project. Users can also track the current status of a project during its execution phase by performing a simple query about, for example, the number of resource hours utilized or the amount of GHG emissions generated to date.
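The flexible grouping behind a dynamic breakdown structure can be sketched as a roll-up in which the grouping key is chosen at query time. The records, field names, and the `roll_up` helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical work package records with labor hours utilized to date.
work_packages = [
    {"wp": "Foundations", "discipline": "Civil",
     "phase": "Construction", "labor_hours": 320},
    {"wp": "Structural Steel", "discipline": "Structural",
     "phase": "Construction", "labor_hours": 450},
    {"wp": "Steel Shop Drawings", "discipline": "Structural",
     "phase": "Engineering", "labor_hours": 80},
]


def roll_up(rows, key, measure):
    """Total a measure over work packages, grouped by any attribute."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


# The same data grouped two different ways, without restructuring it.
by_discipline = roll_up(work_packages, "discipline", "labor_hours")
by_phase = roll_up(work_packages, "phase", "labor_hours")
```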

6. Case Study

An actual case study is illustrated to show the applicability of the proposed Construction Data Hub (CDH) in real-life situations and to prove that collecting construction project data from the proposed twelve dimensions is significant for tackling construction project difficulties. A machine learning model that utilizes the data collected through the CDH is developed to analyze the factors influencing construction projects’ profit. The case study data set was obtained from a private construction organization that carries out construction works in Alberta, Canada. Their construction projects include government buildings, residential buildings, universities, schools, hospitals, parks, playgrounds, infrastructure works, etc.

6.1. Factors Influencing Construction Project Profits

Accurately estimating profit margins for construction projects is a crucial decision that construction firms’ estimators make during the initial design phases. Nevertheless, this task is challenging due to various external, organizational, and project-related factors impacting profit [47,48,49]. Project teams rely on intuition or uniform rates, which are not always reliable methods for determining profit margins. The construction team might also consider other factors/attributes available in their databases. Organizational attributes such as divisions, business units, portfolios, and programs might influence the profit margin. More attributes that are project-related need to be considered, too, such as the joint venture partner, project financing model (PFM), project delivery method (PDM), detailed location, late completion penalty, and the project’s architect/engineer. Bilal and Oyedele [50] pointed out that project complexity, including risks, opportunities, and the distance the construction route covers across rivers, roads, rails, and utilities, are significant factors. They also emphasized the importance of resource allocation in predicting profit margins. The key resources include the project manager (PM), quantity surveyor (QS), commercial manager, design manager, suppliers, and subcontractors.

6.2. Data Investigation and Analysis

The original data set obtained from the organization consisted of 2018 construction projects from disciplines such as residential, commercial and institutional, infrastructure, and industrial. It included projects executed between 2004 and 2024. Only projects with 100% execution percentages were selected to be included in the research data set, while other projects with execution percentages less than 100% were discarded. Although the organization invested a lot of hours collecting data from different data sources and project management software, it was found that the collected data were still not ready for the data mining process. Soibelman and Kim [51] highlighted the reason behind this issue: the absence of clearly defined, automated mechanisms to extract, process, analyze data, and summarize results for construction managers.

The original data set included 186 fields. Hence, the authors performed data cleaning and further investigation processes for the data set, which revealed some main issues such as data duplications and combining many fields. These issues were mainly due to the way data were collected, as some information was in MS Excel sheets, which was different from the information collected in other management software, leading to low data quality and analytic suitability. For instance, some fields combined significant information such as project delivery methods (PDM), project financing models (PFM), and contract types in one field. To avoid any future data collection issues, the authors created a data dictionary to facilitate the data collection process and serve as a foundation for required analytics data. Further, a data gap analysis was performed to determine the missing fields and information. As a result, the data were organized based on seven categories: organization breakdown structure (OBS), project details, stakeholders, location, budget, cost and profit, and duration.

Although the authors could not collect data at the work package level as proposed earlier in their methodology, employing the proposed CDH idea at the project level proved its applicability. The authors collected the appropriate metadata from different structures and dimensions and then arranged and categorized the fields. They collected, cleaned, and organized construction projects’ metadata under two structures: the OBS and the RBS. The metadata related to the other two structures, the WBS and the PBS, were not available in the obtained data set. Also, metadata from six of the proposed twelve dimensions were obtained. The available dimensions are scope management, planning and scheduling, cost estimating, cost control, labor resources, and document management (Table 3). The authors integrated the metadata from the two structures and six dimensions in one data sheet and used it in their analysis. To avoid redundancy, the metadata fields were re-evaluated; for instance, the authors calculated the “approved change orders %” and removed the field of “approved change orders”. They also used the fields “baseline start date”, “baseline finish date”, “actual start date”, and “actual finish date” to calculate the new fields called “planned duration in days” and “actual duration in days” and then discarded the date fields. Other fields, such as “project financing model (PFM)” and “postal/zip code”, were discarded due to missing values. Thus, the resulting data set included 507 projects with 28 significant fields as profit influential factors, which will be utilized later as inputs to the machine learning model.
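The derived fields mentioned above can be sketched with the standard library. Two points are assumptions, since the source does not state them: durations are taken as the difference in calendar days between finish and start, and the change order percentage is computed relative to the baseline budget. The sample record is hypothetical.

```python
from datetime import date


def duration_days(start, finish):
    """Assumed definition: calendar days between start and finish."""
    return (finish - start).days


def change_order_pct(approved_change_orders, baseline_budget):
    """Assumed definition: approved change orders as a share of the baseline budget."""
    return 100.0 * approved_change_orders / baseline_budget


# Hypothetical project record before the date fields are discarded.
record = {
    "baseline_start": date(2020, 1, 6),
    "baseline_finish": date(2020, 6, 4),
    "actual_start": date(2020, 1, 13),
    "actual_finish": date(2020, 7, 1),
    "approved_change_orders": 50_000,
    "baseline_budget": 1_000_000,
}

derived = {
    "planned_duration_days": duration_days(record["baseline_start"],
                                           record["baseline_finish"]),
    "actual_duration_days": duration_days(record["actual_start"],
                                          record["actual_finish"]),
    "approved_change_orders_pct": change_order_pct(record["approved_change_orders"],
                                                   record["baseline_budget"]),
}
```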

6.3. Data Visualization Using Online Analytical Processing (OLAP) Technique

At this stage, the data set was prepared, cleaned, and ready for visualization using an OLAP technique such as Pivot tables in Microsoft Excel. This technique is a powerful visualizing tool, as it gives the ability to summarize, analyze, explore, and present large, detailed data sets. Figure 9 shows an example of data visualization performed during the research. It presents the number of executed projects under each portfolio and groups the portfolios according to their division. This figure clearly shows that the buildings division has executed more projects than the other divisions, while the two portfolios, buildings Alberta and buildings Prairies, have more projects than the others.
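The same kind of pivot summary can be reproduced in a few lines of code. The rows below are invented for illustration: “Buildings Alberta” and “Buildings Prairies” are portfolio names mentioned in the text, while the “Infrastructure” division, the “Infrastructure West” portfolio, and all counts are hypothetical.

```python
from collections import Counter

# (division, portfolio) for each executed project; rows are hypothetical.
projects = [
    ("Buildings", "Buildings Alberta"),
    ("Buildings", "Buildings Alberta"),
    ("Buildings", "Buildings Prairies"),
    ("Infrastructure", "Infrastructure West"),
]

# Count of projects per (division, portfolio) cell, as in a pivot table.
pivot = Counter(projects)

# Roll up to the division level by dropping the portfolio from the key.
by_division = Counter(division for division, _portfolio in projects)
```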

6.4. Determining Influential Factors Affecting Construction Projects’ Profit Using Machine Learning

As mentioned earlier, the final data set included twenty-eight fields nominated as profit influential factors. The authors selected RapidMiner software (https://docs.rapidminer.com/9.9/studio/installation/, accessed on 19 April 2025) to perform the machine learning process and used the random forest algorithm to develop a feature selection model that analyzes these factors. Feature selection is a useful step before creating a prediction model because it reduces the number of inputs by identifying the most meaningful ones. The developed feature selection model uses the “weight by tree importance” operator [52], which calculates the weight of the attributes by analyzing the split points of a random forest model. Attributes with higher weights are considered more relevant and important. This weighting scheme involves using a designated random forest to ascertain the relative importance of the attributes used. To achieve this, each node of each tree is examined to retrieve the benefit generated by the split at that node. Subsequently, the benefits for each attribute that was used for a split are summed. The average benefit across all trees is then regarded as the importance of the attribute.
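The weighting scheme described above (sum the split benefits per attribute, then average across trees) can be sketched on a toy forest. This is not RapidMiner's implementation: each tree is represented simply as a list of (attribute, benefit) pairs, the benefit values are invented, and the final rescaling so that the top attribute has weight 1.00 is an assumption made to match the scale of the reported weights.

```python
from collections import defaultdict

# Toy forest: each tree is a list of (split attribute, benefit of that split).
forest = [
    [("baseline_budget", 0.5), ("planned_duration", 0.25), ("baseline_budget", 0.25)],
    [("baseline_budget", 0.75), ("contract_type", 0.25)],
]


def tree_importance(forest):
    """Sum the benefit of every split per attribute, then average over trees."""
    totals = defaultdict(float)
    for tree in forest:
        for attribute, benefit in tree:
            totals[attribute] += benefit
    return {attribute: total / len(forest) for attribute, total in totals.items()}


def normalize(weights):
    """Assumed rescaling: divide by the largest weight so the maximum is 1.0."""
    top = max(weights.values())
    return {attribute: w / top for attribute, w in weights.items()}


weights = normalize(tree_importance(forest))
```

An attribute that is never chosen for a split receives no benefit at all, which is consistent with the zero weights reported for some attributes.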

The proposed model examines each profit influential factor as an input and identifies its correlation with the model output, which is profit. Thus, the significant factors can be used to develop a machine learning prediction model to predict profit for future projects. In this research, the developed feature selection model calculated the weights for twenty-three factors out of twenty-eight (Table 4). The most influential attributes, with a weight of more than 0.5, were baseline budget, approved change orders %, current budget, actual cost, planned duration in days, and baseline cost. The model assigned zero weights to five attributes, namely country, city, arch/engineer, project manager, and owner, which indicates that these attributes are less relevant and important to the profit.

During split point analysis, the random forest model assigned higher weights to factors that most effectively partitioned the data set into distinct profit-based clusters. Specifically, the algorithm evaluates all possible splits across each variable (e.g., baseline budget) to maximize information gain, which is measured by metrics like Gini impurity reduction or entropy. Factors such as baseline budget (weight: 1.00) consistently created the purest subsets, indicating their predictive power. This improvement in split frequency and purity is aggregated across all trees in the forest, with the baseline budget emerging as the dominant splitter.
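The split evaluation can be illustrated with Gini impurity reduction on a toy data set. Everything in the sketch is hypothetical: the budget values, the “low”/“high” profit classes (a discretization introduced only for this example), and the helper names. For a continuous target such as profit, a regression forest would typically use a variance-based criterion instead.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(values, labels, threshold):
    """Impurity reduction from splitting at values <= threshold vs. > threshold."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


def best_split(values, labels):
    """Try every distinct value (except the largest) as a threshold."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: gini_gain(values, labels, t))


# Hypothetical baseline budgets (in millions) and profit classes.
budgets = [1, 2, 3, 4, 10, 12, 15, 20]
classes = ["low", "low", "low", "low", "high", "high", "high", "high"]

threshold = best_split(budgets, classes)
gain = gini_gain(budgets, classes, threshold)
```

In this toy case the split at a budget of 4 separates the two classes perfectly, so it yields the largest possible impurity reduction for a balanced two-class set.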

The baseline budget emerged as the most influential factor (weight: 1.00), as it encapsulates the project’s initial financial scale and complexity. Larger budgets typically involve more stakeholders, procurement risks, and resource dependencies, which amplify cost overrun risks. This finding aligns with Rui et al. [49], who established that budget size is a proxy for project complexity. Approved change orders (weight: 0.86) indicate deviations from the original plan. Frequent changes often reveal insufficient feasibility studies or fluctuating client demands, diminishing profit margins due to rework and delays [50]. The tight coupling of the current budget (0.82) and actual cost (0.56) weights underscores the impact of dynamic cost control. Projects that proactively revise budgets mitigate losses, while those with rigid plans suffer higher overruns [30]. In contrast, variables like “city” (weight: 0.00) failed to segment the data meaningfully, leading to their exclusion. The model’s reliance on empirical split efficiency (rather than correlation alone) ensures robust, interpretable feature importance aligned with construction management realities.

Moreover, the data for many influential profit factors are available at the early stages of the project, such as the planning stage. Accordingly, these factors can be utilized as inputs for a profit prediction model. The authors highlighted that the factors available at the planning stage with a weight of more than 0.25 are relevant and important to the profit. These factors include project type, project delivery method (PDM), portfolio, contract type, program, late completion penalty, project category, baseline cost, planned duration in days, and baseline budget.

The CDH acts as a comprehensive data hub that can be utilized to tackle various construction project issues. The outcomes of the case study emphasized the importance of collecting data from multiple dimensions and structures of a construction project. These results are consistent with the previously discussed studies, which show that a project’s profit is impacted by various external, organizational, and project-related factors. To reflect this, in this research, many profit influential factors were under the OBS structure; namely, company, business unit, division, portfolio, and program. These were categorized as organizational factors. Other factors under the six dimensions mentioned earlier in Table 3 were categorized as project-related factors, such as project type, project delivery method (PDM), contract type, project category, duration, budget, etc. Also, factors such as joint venture, region, and province/state were categorized as external factors. The developed model was unable to identify any correlation between the project stakeholders, such as the project manager, arch/engineer, and owner, and the profit.

The absence of work package (WP)-level data significantly limits the model’s capacity to identify micro-scale drivers of profit fluctuations. While project-level metrics (e.g., baseline budget, total duration) offer macro-level correlations with profit, they do not capture task-specific inefficiencies that cumulatively affect financial outcomes. For instance, a project may seem profitable overall yet hide critical inefficiencies in specific WPs, such as repetitive rework in electrical installations due to design clashes or material waste in structural steel fabrication resulting from poor cutting precision. These micro-scale issues often go unnoticed in aggregated data but can substantially inflate costs when compounded across multiple WPs. WP-level tracking (e.g., labor hours per cubic meter of concrete poured or defect rates per subcontractor) would allow for the precise identification of problem areas, facilitating targeted interventions like subcontractor retraining or process redesign. Without this granularity, the model’s insights remain confined to broad trends, which reduces its usefulness for practical decision-making.

The main limitation of the developed model is the unavailability of a data set at the work package level; this might explain the model’s inability to identify any correlation between the project stakeholders and the profit margins. Machine learning algorithms tend to perform better with larger data sets. Therefore, as more data are collected and properly tracked using the CDH, it will be necessary to revalidate the model to ensure it continues to perform optimally.

7. Conclusions

Construction projects generate vast amounts of data. For the data to be useful, it must be relevant, accurate, and organized into data sets that are large enough to be processed into meaningful analysis. Construction practitioners, therefore, require a consistent integrated data acquisition model to collect detailed data from different dimensions across multiple projects. The proposed Construction Data Hub (CDH) integrates the four main structures of any construction project: organization breakdown structures (OBSs), product breakdown structures (PBSs), resource breakdown structures (RBSs), and work breakdown structures (WBSs). The CDH also collects data related to twelve dimensions that impact the project’s planning and control, and focuses on the work package as a connector between the four breakdown structures and a knowledge carrier across an organization’s different projects. To enhance the traditional work package, the authors have introduced the multi-dimensional work package (MDWP), which includes new dimensions of data. Since construction organizations have different work-level definitions, a generic model is required to collect detailed data at a specific level, regardless of a particular organization’s terminology. Defining standard levels, phases, and subphases is crucial to capturing meaningful metadata from various projects and organizations.

Academic studies on construction planning often face difficulties during the data collection process, which researchers consider a critical step that directly impacts the accuracy of their models. The CDH contributes to academia by supporting a structured and comprehensive data warehouse that serves as a central data facility for various stakeholders to retrieve the right data for making sustainable decisions. Furthermore, the data warehouse contains high-quality data from several dimensions that can be utilized to perform detailed analyses, offering valuable insights and serving as a resource for future studies, such as mathematical models for forecasting and optimization.

This research also addresses a gap in the industry by utilizing details at the work package level, which is more efficient than the current industry practice. The CDH is a practical tool since it uses the data generated during the execution of the project and is kept updated during the progress of the project. The aim of collecting details from the four structures at the work package level is to provide project managers with a dynamic tool that can obtain insights about a specific element such as a project, program, resource, equipment, or construction work package (as a product), through rolling-up, drilling-down and slice and dice techniques using the group and sort option. Also, it can be utilized for forecasting work package requirements, such as durations, costs, and GHG emissions for future projects.

Future research will explore the integration of BIM to automate data collection at the work package level. For instance, linking the CDH to BIM could enable real-time extraction of geometric, schedule, and cost data while enhancing the CDH with non-geometric attributes such as labor hours and embodied carbon. Additionally, the model’s accuracy will be re-evaluated using work package-level data sets that capture finer-grained metrics (e.g., schedule, budgets, resources such as labor, materials, and equipment, and productivity rates per WP). Collaborations with industry partners are underway to access such data, addressing the current limitation of project-level aggregation.

Author Contributions

Conceptualization, M.G. and A.H.; Methodology, M.G. and A.H.; Software, M.G.; Validation, M.G. and A.H.; Formal analysis, M.G. and A.H.; Investigation, M.G.; Resources, M.G.; Data curation, M.G.; Writing—original draft, M.G.; Writing—review & editing, M.G. and A.H.; Visualization, M.G. and A.H.; Supervision, A.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Sciences and Engineering Research Council of Canada, grant number ALLRP 577032-2022. The APC was also funded by the Natural Sciences and Engineering Research Council of Canada.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

The data supporting the findings of this study are not publicly available due to confidentiality agreements and ethical restrictions related to the privacy of the participant organization involved.

Conflicts of Interest

The authors declare no conflict of interest.

References

Zayed, T.; Liu, Y. Risk and unpredictability in construction projects. J. Constr. Eng. Manag. 2014, 140, 04014004. [Google Scholar] [CrossRef]
Bilal, M.; Oyedele, L.O.; Qadir, J.; Munir, K.; Ajayi, S.O.; Akinade, O.O.; Owolabi, H.A.; Alaka, H.A.; Pasha, M. Big data in the construction industry: A review of present status, opportunities, and future trends. Adv. Eng. Inform. 2016, 30, 500–521. [Google Scholar] [CrossRef]
Yepes, V.; López, D. Knowledge management in construction projects: An integrated approach. J. Constr. Eng. Manag. 2021, 147, 04021075. [Google Scholar] [CrossRef]
Lee, S.; Yu, J.; Jeong, D. Knowledge management in construction projects: Challenges and solutions. Autom. Constr. 2008, 17, 940–948.
Golestanizadeh, M.; Zavadskas, E.K.; Antucheviciene, J. Effective management of project factors in construction projects. Sustainability 2023, 15, 1234.
ElMenshawy, M.; Marzouk, M. Scheduling and progress monitoring in construction projects. J. Constr. Eng. Manag. 2021, 147, 04021045.
Hong, E.; Yi, J.S.; Lee, D. CTGAN-Based Model to Mitigate Data Scarcity for Cost Estimation in Green Building Projects. J. Manag. Eng. 2024, 40, 04024024.
Sheng, V.S.; Provost, F.; Ipeirotis, P.G. Get another label? Improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 614–622.
Jordan, M.I.; Mitchell, T.M. Machine learning: Trends, perspectives, and prospects. Science 2015, 349, 255–260.
Gudivada, V.; Apon, A.; Ding, J. Data quality considerations for big data and machine learning: Going beyond data cleaning and transformations. Int. J. Adv. Softw. 2017, 10, 1–20.
Matel, E.; Vahdatikhaki, F.; Hosseinyalamdary, S.; Evers, T.; Voordijk, H. An artificial neural network approach for cost estimation of engineering services. Int. J. Constr. Manag. 2022, 22, 1274–1287.
Sonmez, R.; Uysal, F.; Arditi, D. Genetic algorithms for multi-project scheduling in construction. J. Constr. Eng. Manag. 2015, 141, 04014082.
Abuwarda, Z.; Hegazy, T.; Zayed, T. Resource-constrained scheduling in construction projects. J. Constr. Eng. Manag. 2016, 142, 04016010.
Sonmez, R.; Uysal, F.; Arditi, D. Time-cost trade-off analysis in construction projects using genetic algorithms. J. Constr. Eng. Manag. 2016, 142, 04016025.
Wang, J.; Li, H.; Li, Y. Challenges in data collection for construction project management. Autom. Constr. 2020, 110, 103016.
Pan, Y.; Zhang, L. Roles of artificial intelligence in construction engineering and management: A critical review and future trends. Autom. Constr. 2021, 122, 103517.
FD2D. From Data to Decision (FD2D). Available online: https://fd2d.org/ (accessed on 14 June 2024).
Eastman, C.M. BIM Handbook: A Guide to Building Information Modeling for Owners, Managers, Designers, Engineers and Contractors; John Wiley & Sons: Hoboken, NJ, USA, 2011.
MIL-STD-881E; Standard Practice, Work Breakdown Structures for Defense Materiel Items. Department of Defense: Washington, DC, USA, 2020.
Project Management Institute. Project Management Body of Knowledge (PMBOK Guide), 3rd ed.; Project Management Institute: Newtown Square, PA, USA, 2004.
Haugan, G.T. Effective Work Breakdown Structures; Management Concepts Inc.: Vienna, VA, USA, 2001.
Rodriguez, A.; Locksley, W.; Shishko, R. Systems Engineering Handbook; NASA/SP-2007-6105 Rev1; National Aeronautics and Space Administration, NASA Headquarters: Washington, DC, USA, 2007. Available online: http://ntrs.nasa.gov/ (accessed on 19 April 2025).
Rad, P.F. Advocating a deliverable-oriented work breakdown structure. Cost Eng. 1999, 41, 35–39.
Chau, K.W.; Cao, Y.; Anson, M.; Zhang, J. Application of data warehouse and decision support system in construction management. Autom. Constr. 2003, 12, 213–224.
Ahmad, I.; Azhar, S.; Lukauskis, P. Development of a decision support system using data warehousing to assist builders/developers in site selection. Autom. Constr. 2004, 13, 525–542.
Fan, H.; Kim, H.; Zaïane, O.R. Data warehousing for construction equipment management. Can. J. Civ. Eng. 2006, 33, 1480–1489.
Rujirayanyong, T.; Shi, J.J. A project-oriented data warehouse for construction. Autom. Constr. 2006, 15, 800–807.
Park, C.S.; Kim, H.J. A sewer infrastructure decision support system based on a data warehouse. J. Constr. Eng. Manag. 2013, 139, 04013015.
Hammad, A.W.A.; Akbarnezhad, A.; Rey, D. Labor resource data warehouse for construction project management. J. Constr. Eng. Manag. 2014, 140, 04014015.
Ghazal, M.M.; Hammad, A. Application of knowledge discovery in database (KDD) techniques in cost overrun of construction projects. Int. J. Constr. Manag. 2022, 22, 1632–1646.
Elkholosy, W.; Hammad, A.; Akbarnezhad, A. Data acquisition model for labor resource forecasting in construction projects. J. Constr. Eng. Manag. 2022, 148, 04022015.
Golabchi, H.; Hammad, A. Estimating labor resource requirements in construction projects using machine learning. Constr. Innov. 2023, 24, 1048–1065.
Xu, Y.; Zhou, Y.; Sekula, P.; Ding, L. Machine learning in construction: From shallow to deep learning. Dev. Built Environ. 2021, 6, 100045.
Akinosho, T.D.; Oyedele, L.O.; Bilal, M.; Ajayi, A.O.; Delgado, M.D.; Akinade, O.O.; Ahmed, A.A. Deep learning in the construction industry: A review of present status and future innovations. J. Build. Eng. 2020, 32, 101827.
Awada, M.; Srour, F.J.; Srour, I.M. Data-driven machine learning approach to integrate field submittals in project scheduling. J. Manag. Eng. 2021, 37, 04020104.
Shoar, S.; Chileshe, N.; Edwards, J.D. Machine learning-aided engineering services’ cost overruns prediction in high-rise residential building projects: Application of random forest regression. J. Build. Eng. 2022, 50, 104102.
Ashtari, M.A.; Ansari, R.; Hassannayebi, E.; Jeong, J. Cost overrun risk assessment and prediction in construction projects: A Bayesian network classifier approach. Buildings 2022, 12, 1660.
Banerjee Chattapadhyay, D.; Putta, J.; Rao, P.R.M. Risk identification, assessments, and prediction for mega construction projects: A risk prediction paradigm based on cross analytical-machine learning model. Buildings 2021, 11, 172.
Darko, A.; Glushakova, I.; Boateng, E.B.; Chan, A.P. Using machine learning to improve cost and duration prediction accuracy in green building projects. J. Constr. Eng. Manag. 2023, 149, 04023061.
Wang, P.; Wang, K.; Huang, Y.; Fenn, P. A contingency approach for time-cost trade-off in construction projects based on machine learning techniques. Eng. Constr. Archit. Manag. 2024, 31, 4677–4695.
Agostinelli, S.; Cumo, F.; Marzo, R.; Muzi, F. Digital construction strategy for project management optimization in a building renovation site: Machine learning and big data analysis. In International Conference on Trends on Construction in the Post-Digital Era; Springer International Publishing: Cham, Switzerland, 2022; pp. 20–35.
Xie, Q. Machine learning in human resource system of intelligent manufacturing industry. Enterp. Inf. Syst. 2022, 16, 264–284.
Ebrahimi, S.; Fayek, A.R.; Sumati, V. Hybrid artificial intelligence HFS-RF-PSO model for construction labor productivity prediction and optimization. Algorithms 2021, 14, 214.
Almahameed, B.A.; Bisharah, M. Applying machine learning and particle swarm optimization for predictive modeling and cost optimization in construction project management. Asian J. Civ. Eng. 2024, 25, 1281–1294.
Taghinezhad, A.; Jafari, A.; Kermanshachi, S.; Nipa, T. Construction project management dimensions in transportation agencies: Case study of the US Department of Transportation. Pract. Period. Struct. Des. Constr. 2021, 26, 06021002.
Lopez, D. Rules of Credit in Construction Contracts. PMIKSA. 2024. Available online: https://pmiksa.sa/rules/ (accessed on 10 January 2024).
Rui, Z.; Li, X.; Li, S. Factors influencing construction project profits: A review. J. Constr. Eng. Manag. 2012, 138, 601–611.
Rui, Z.; Li, X.; Li, S. Organizational factors influencing construction project profits. J. Constr. Eng. Manag. 2012, 138, 701–710.
Rui, Z.; Li, X.; Li, S. Project-related factors influencing construction project profits. J. Constr. Eng. Manag. 2013, 139, 801–811.
Bilal, M.; Oyedele, L.O.; Kusimo, H.O.; Owolabi, H.A.; Akanbi, L.A.; Ajayi, A.O.; Akinade, O.O.; Delgado, J.M. Investigating profitability performance of construction projects using big data: A project analytics approach. J. Build. Eng. 2019, 26, 100850.
Soibelman, L.; Kim, H. Data preparation process for construction knowledge generation through knowledge discovery in databases. J. Comput. Civ. Eng. 2002, 16, 39–48.
RapidMiner Studio. Weight by Tree Importance Operator. RapidMiner Documentation. 2024. Available online: https://docs.rapidminer.com/ (accessed on 9 April 2024).

Figure 1. Integrated planning and control framework (IPCF) “From Data to Decision (FD2D)”.

Figure 2. Research methodology framework.

Figure 3. The core idea of integrating the four main structures: OBS, PBS, RBS, and WBS.

Figure 4. The twelve dimensions of MDWP.

Figure 5. The generic structure of WBS.

Figure 6. Progress measurement and performance evaluation using Excel.

Figure 7. The CDH entity relationship diagram (ERD).

Figure 8. Screenshots of the developed CDH tool.

Figure 9. Data visualization using OLAP techniques: Executed projects grouped by their portfolios and divisions.

Table 1. Benefits of MDWPs vs. Traditional WPs.

Aspect	Traditional WP	MDWP	Advantage
Progress Tracking	Manual % complete (e.g., 50% poured)	Automated, ROC-weighted WPAs (Section 4.5)	Eliminates guesswork; ties progress to physical milestones
Data Granularity	Time/cost only	12 dimensions	Enables AI-driven analytics (Section 6.4)
Real-Time Updates	Static (updated weekly/monthly)	Dynamic (BIM/ERP integrations)	Reduces reporting latency
Risk Mitigation	Reactive (post-incident)	Proactive (e.g., safety trends per WP)	Lowers incident rates
Sustainability	Not tracked	Embodied carbon, solid waste	Supports ESG ¹ reporting

¹ Environmental, Social, and Governance Reporting.
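
As a rough illustration of the "ROC-weighted WPAs" row in Table 1, the sketch below computes a work package's percent complete as the sum of the rule-of-credit (ROC) weights of its finished activities. The activity names, the weights, and the helper `work_package_progress` are hypothetical and are not taken from the paper; the authors' own scheme is the one described in Section 4.5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedProgressActivity:
    """One physical milestone inside a work package (hypothetical structure)."""
    name: str
    weight: float   # rule-of-credit share of the work package, 0..1
    complete: bool  # True once the milestone is physically achieved


def work_package_progress(activities):
    """Return percent complete (0..1) as the sum of weights of finished activities."""
    total = sum(a.weight for a in activities)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"ROC weights must sum to 1.0, got {total}")
    return sum(a.weight for a in activities if a.complete)


# Assumed weights for a concrete slab work package, for illustration only.
slab = [
    WeightedProgressActivity("Formwork installed", 0.30, True),
    WeightedProgressActivity("Rebar placed", 0.30, True),
    WeightedProgressActivity("Concrete poured", 0.40, False),
]

progress = work_package_progress(slab)
print(f"Slab work package progress: {progress:.0%}")
```

Because credit is earned only when a milestone is physically finished, the reported figure does not depend on a subjective estimate such as "50% poured".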

Table 2. Evaluation of the literature review.

Reference	Dimension	Phase	Work Level
Chau et al. [24]	Material management	Construction	Project level (Level 1)
Ahmad et al. [25]	N/A	Feasibility	Project level (Level 1)
Fan et al. [26]	Equipment management	Construction	Project level (Level 1)
Rujirayanyong and Shi [27]	Scope management; Planning and scheduling; Cost estimating; Cost control; Material management; Labor resources management; Document management	Construction	Activity level (Level 4)
Park and Kim [28]	Planning and scheduling; Cost estimating	Construction	Project level (Level 1)
Hammad et al. [29]	Planning and scheduling; Labor resources management	Construction	Work package level (Level 3)
Ghazal and Hammad [30]	Planning and scheduling; Cost estimating; Cost control	Construction	Project level (Level 1)
Elkholosy et al. [31]	Planning and scheduling; Cost estimating; Cost control; Labor resources management	Construction	Project level (Level 1)
Golabchi and Hammad [32]	Planning and scheduling; Labor resources management	Construction	Work package level (Level 3)

Table 3. Metadata fields and their associated structure or dimension.

Structure/Dimension	Metadata Fields
OBS	Company; Division; Business Unit; Portfolio; Program
Scope management	Approved Change Orders %
Planning and scheduling	Planned Duration in Days; Actual Duration in Days
Cost estimating	Baseline Budget; Baseline Cost
Cost control	Current Budget; Actual Cost; Profit %
Labor resources Stakeholders	Project Manager; Owner; Architect (Engineer)
Document management	Project Type; Project Category; Complexity Level; Joint Venture; Joint Venture Partner; Late Completion Penalty; Currency; Project Delivery Method (PDM); Contract Type; Region; Country; Province/State; City
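
One way to hold the grouping in Table 3 in code is a plain mapping from each structure or dimension to its metadata fields, sketched below. The row labels are copied as printed, the split of each row into individual fields is inferred from the attribute names in Table 4, and the names `METADATA_FIELDS` and `dimension_of` are illustrative rather than part of the CDH tool.

```python
# Table 3 as a mapping: structure/dimension -> metadata fields.
# Field boundaries are an assumption based on the attribute names in Table 4.
METADATA_FIELDS = {
    "OBS": ["Company", "Division", "Business Unit", "Portfolio", "Program"],
    "Scope management": ["Approved Change Orders %"],
    "Planning and scheduling": [
        "Planned Duration in Days",
        "Actual Duration in Days",
    ],
    "Cost estimating": ["Baseline Budget", "Baseline Cost"],
    "Cost control": ["Current Budget", "Actual Cost", "Profit %"],
    "Labor resources Stakeholders": [
        "Project Manager",
        "Owner",
        "Architect (Engineer)",
    ],
    "Document management": [
        "Project Type", "Project Category", "Complexity Level",
        "Joint Venture", "Joint Venture Partner", "Late Completion Penalty",
        "Currency", "Project Delivery Method (PDM)", "Contract Type",
        "Region", "Country", "Province/State", "City",
    ],
}


def dimension_of(field):
    """Return the structure/dimension that a metadata field belongs to."""
    for dimension, fields in METADATA_FIELDS.items():
        if field in fields:
            return dimension
    raise KeyError(f"Unknown metadata field: {field}")


total_fields = sum(len(fields) for fields in METADATA_FIELDS.values())
print(total_fields, dimension_of("Contract Type"))
```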

Table 4. Construction project’s profit influential factors and their weights.

No.	Attribute	Weight
1	Currency	0.02
2	Joint Venture	0.04
3	Joint Venture Partner	0.06
4	Company	0.14
5	Business Unit	0.14
6	Division	0.15
7	Complexity Level	0.17
8	Province/State	0.19
9	Region	0.22
10	Project Type	0.23
11	Project Delivery Method (PDM)	0.23
12	Portfolio	0.24
13	Contract Type	0.29
14	Program	0.33
15	Late Completion Penalty	0.36
16	Actual Duration in Days	0.43
17	Project Category	0.45
18	Baseline Cost	0.54
19	Planned Duration in Days	0.56
20	Actual Cost	0.56
21	Current Budget	0.82
22	Approved Change Orders %	0.86
23	Baseline Budget	1.00
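
To read Table 4 from the most to the least influential attribute, the weights can be sorted in descending order, as in the sketch below. Selecting the six largest weights is an assumption made here only because the abstract reports six significant factors; the selection criterion the authors actually applied is not stated in this part of the article, and the names `WEIGHTS` and `top_factors` are illustrative.

```python
# Attribute weights copied from Table 4 (listed there in ascending order).
WEIGHTS = {
    "Currency": 0.02,
    "Joint Venture": 0.04,
    "Joint Venture Partner": 0.06,
    "Company": 0.14,
    "Business Unit": 0.14,
    "Division": 0.15,
    "Complexity Level": 0.17,
    "Province/State": 0.19,
    "Region": 0.22,
    "Project Type": 0.23,
    "Project Delivery Method (PDM)": 0.23,
    "Portfolio": 0.24,
    "Contract Type": 0.29,
    "Program": 0.33,
    "Late Completion Penalty": 0.36,
    "Actual Duration in Days": 0.43,
    "Project Category": 0.45,
    "Baseline Cost": 0.54,
    "Planned Duration in Days": 0.56,
    "Actual Cost": 0.56,
    "Current Budget": 0.82,
    "Approved Change Orders %": 0.86,
    "Baseline Budget": 1.00,
}


def top_factors(weights, k):
    """Return the k (attribute, weight) pairs with the largest weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# k = 6 is an assumption tied to the abstract, not a rule from the paper.
top_six = top_factors(WEIGHTS, 6)
for attribute, weight in top_six:
    print(f"{attribute}: {weight:.2f}")
```

Under that assumption, every selected attribute has a weight of at least 0.54, while the next attribute in the ranking (Project Category) sits at 0.45.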


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

