1. Introduction
As the construction sector undergoes a digital transformation, the amount of information packed into building models is exponentially increasing. Building design, construction and operation are information-intensive activities. For example, even two decades ago in UK construction, on average, one computer-aided design (CAD) document was produced for every 9 m² of building floor space [1]. Researchers [2] have reported the problem of ‘information overload’ in the construction sector. Building Information Modeling (BIM) models are following this general trend and becoming more information-rich. Regarding volumes of information specifically in BIM models, BIM platforms have been identified as a particularly favorable communication medium in construction compared to extranets, email and Enterprise Resource Planning systems [3]. The advantages of BIM over documents and extranets have been reported [4]. Although no absolute measures of the quantities of information were found, the implication from such studies is that BIM models are increasingly information-rich, motivating the development of a search engine to enable BIM users to meet their information needs.
Researchers have studied the negative impact of information overload on productivity in general office work [5,6]. Practitioner-based findings from the construction industry agree with those from academic research. In a panel discussion between industry experts organized by Construction Manager magazine [7], all agreed that ‘information overload’ was a huge concern that hinders a project’s productivity. Similarly, in a report commissioned by the British Government [8], it is argued that the industry must modernize and embrace digital technologies to tackle the acute problems of low productivity and poor collaboration.
Although such findings are based on subjective perceptions rather than objective measurements, from a work productivity perspective, the generation and management of information do not appear to be the problem but rather the retrieval and consumption of information. According to a survey of nearly 600 construction leaders from around the world, construction professionals spend around 14% of their time searching for project information [9]. It is often implicitly assumed that information will be created during the design and construction of the built environment, but particular effort is needed to make this information retrievable and reusable [10]. Beyond construction, a survey of 345 IT and storage professionals [11] highlighted the importance of this later phase of retrieval in the information lifecycle.
Search engines for building model archives and project databases were found to improve aspects of design [12], cost [13] and construction [14]. Information search and retrieval in BIM have been studied from a number of perspectives. A review [15] cites knowledge management, design reuse and continuous improvement as the main motives for developing BIM information retrieval systems (i.e., search engines). Approaches are classified into context-, geometry-, and content-based BIM retrieval. The research reported here uniquely touches upon all three, combining text search augmented by 3D data and interrelationships between 3D objects to account for context.
The science of information retrieval (IR) is usually associated with documents, which can be thought of as relics from the pre-BIM age. It has been shown [12] that parameters in 3D building models could be treated as very short documents, and the application of traditional IR techniques yielded reasonable retrieval performance. The challenge remains of augmenting traditional text-based IR techniques with 3D data. The work reported here builds on earlier inconclusive work [16] and specifically explores the exploitation of interrelationships between 3D objects to improve retrieval performance and the impact of information standards on retrieval performance.
Interrelationships between 3D objects can account for context when identifying items relevant to a search query. The importance of context is increasingly recognized [15,17]. The study of interrelationships between objects in 3D models is often labeled as model topology. A succinct review of the concept was presented as part of the 3DIR project [16]. In this research, topological relationships are taken to include any relationships between 3D building elements in a model that may enhance information retrieval. These relationships might be strictly topological and concerned with interior/boundary/exterior of 3D components as noted below, more general spatial/directional relationships, or even relationships as they occur in a very general semantic sense, albeit linked to 3D space. In this sense, any two objects in a model sharing the same attribute (e.g., two components supplied by the same manufacturer or made of the same material) can be said to be related. If a user searching for information is interested in the first object, but not the second object related to it, information from the second object can still be retrieved but ranked as less relevant. Graph theory has been proposed as a useful theoretical lens for studying model topology to improve retrieval performance [16] and this is the lens adopted here.
As more of the design, construction and operation of constructed facilities is digitized, and in light of the inherently collaborative and fragmented nature of the sector, the importance of standardization has emerged. The research reported here, therefore, goes on to investigate the effect of information standards on retrieval performance. Information standards are increasingly needed in the delivery of design and construction by complex supply chains, and it is important that standards do not prevent BIM users from meeting their information needs. Two pertinent standards are tested: Industry Foundation Classes (IFC) and Uniclass-2015. Previous research [18] identified those two as among the most widely adopted in the UK. IFC has emerged as a globally influential standard schema for BIM data exchange. Uniclass-2015 is particularly important in the UK where it is aligned to the country’s BIM strategy.
The aim of this research is to investigate how information retrieval performance from BIM models can be improved by exploiting relationships between 3D objects in the model and to gauge the effect of information standards on IR performance.
3. Materials and Methods
The steps of the research method are illustrated in Figure 2. To investigate the exploitation of topological relationships in BIM models, the 3DIR toolset was adopted. The 3DIR toolset was a publicly available BIM search engine developed as an add-in for Autodesk Revit as part of previous research [16]. The 3DIR toolset indexes the text parameters of the 3D objects in a Revit model and enables keyword searches based on this index. It also follows URLs to index external documents linked to 3D objects. The 3DIR toolset exploits model topology by identifying related objects and encoding these relationships in the index. Related objects are currently identified based on hosting, touching or intersecting relationships [16]. The hosting relationship is explicitly encoded in the Revit information architecture, whereas the touching and intersecting relationships can be inferred using simple computations through the Revit Application Programming Interface. Those relationships were chosen as a feasible set of relationships to test the concept of exploiting such relationships. A deeper investigation into meaningful relationships between 3D objects is recommended for future work.
For this research, a structural 3D model was obtained from Sir Robert McAlpine’s project, Battersea Phase 3a. The Battersea Power Station project is a £9 bn eight-phase development on the banks of the River Thames featuring more than 4000 new homes, commercial offices, retail units, and a hotel. Phase 3a is a key phase within the overall masterplan, comprising over 550 apartments, a lifestyle hotel, and a pedestrianized retail high street. The specific building adopted for this research is a large 16-storey multifunctional development. The actual building is shown on the right of Figure 1.
At its core, as stated above, the 3DIR toolset creates an index of all text terms in a Revit model, i.e., texts from parameters and linked documents. With reference to Figure 1, each entry in the index is a Vi item. Each Vi item in the index is represented as a high-dimensionality vector, with a weight based on term frequency representing each textual term. The Apache Lucene library [42] was used to compute these vector space model weights. Lucene uses the Term Frequency/Inverse Document Frequency computation (TF-IDF) [22], whereby term frequencies are normalized by the number of documents in the corpus that contain that term, thereby attaching greater significance to term frequencies where that term is rarer across the corpus. Additionally, each indexed Vi item contains a reference to its related 3D object (i.e., V3D), which represents an En edge.
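The TF-IDF weighting described above can be illustrated with a minimal sketch. It uses the textbook formula (raw term frequency multiplied by the logarithm of the inverse document frequency) and is not a reproduction of Lucene's own scoring, which applies further normalizations. The function name `tfidf_index`, the item identifiers and the parameter texts are hypothetical.

```python
import math
from collections import Counter

def tfidf_index(items):
    """Map each Vi item id to a sparse {term: weight} vector (textbook TF-IDF)."""
    tokenized = {item_id: text.lower().split() for item_id, text in items.items()}
    n_items = len(tokenized)
    # Document frequency: the number of Vi items that contain each term.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    index = {}
    for item_id, tokens in tokenized.items():
        tf = Counter(tokens)
        # Terms that are rarer across the corpus receive a larger weight.
        index[item_id] = {t: tf[t] * math.log(n_items / df[t]) for t in tf}
    return index

# Hypothetical parameter texts; `parent` records the En edge from Vi to V3D.
items = {"vi1": "transfer slab", "vi2": "slab concrete", "vi3": "lobby stair"}
parent = {"vi1": "obj_a", "vi2": "obj_a", "vi3": "obj_b"}
index = tfidf_index(items)
```

In this toy corpus, "transfer" occurs in one item and "slab" in two, so "transfer" carries the larger weight within `vi1`.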
Three 2-term search queries were developed and tested: “transfer slab”, “lobby stair” and “roof pavilion”. In addition, to test the effect of the query type, simple 1-term variants of the queries were also tested: “transfer”, “lobby” and “pavilion”. Those queries were chosen as they each returned about 100 search results, which was deemed to be a sensible amount to enable the research questions to be addressed. A 35 min interview was conducted with the project’s BIM manager, to identify relevant items for each test query. The interview was conducted in compliance with the ethical research protocol of Loughborough University. The interview consisted of a contextual introduction to the BIM search engine and this research, followed by questions regarding what the expert expected a BIM search engine to retrieve from the 3D model for each test query. Although 3DIR search results are, strictly speaking, Vi items, it was felt more natural to ask the interviewee to list relevant V3D items. In the subsequent analysis and calculation of IR performance measures, a research-specific 3DIR-ranked list of search results was created. A V3D object was considered retrieved and ranked at its top-ranking Vi item; any lower-ranked Vi items were not considered.
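The rule of ranking each V3D object at its top-ranking Vi item can be sketched as follows. This is an illustrative reading of the rule, not the 3DIR implementation; the function name `collapse_to_objects` and the identifiers are hypothetical.

```python
def collapse_to_objects(ranked_vi, parent):
    """Turn a ranked list of Vi items into a ranked list of V3D objects.

    Each object is kept at the position of its first (top-ranking) Vi item;
    lower-ranked Vi items of the same object are ignored.
    """
    seen = set()
    ranked_v3d = []
    for vi in ranked_vi:
        obj = parent[vi]
        if obj not in seen:
            seen.add(obj)
            ranked_v3d.append(obj)
    return ranked_v3d

# Hypothetical search result: four Vi items belonging to two 3D objects.
ranked_vi = ["vi3", "vi1", "vi4", "vi2"]
parent = {"vi1": "obj_a", "vi2": "obj_a", "vi3": "obj_b", "vi4": "obj_b"}
ranked_v3d = collapse_to_objects(ranked_vi, parent)
```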
One expert was interviewed due to the limited availability of others. Although the information from one expert was satisfactory, interrater reliability could not be established. However, the first author’s familiarity with the structural model meant that relevant items could independently be identified. Discrepancies between the expert’s and the researcher’s lists of relevant items were discussed at length during the interview until consensus was reached. With the interviewee’s permission, the interview was audio recorded.
The graph theoretic formulation above provides a useful framework for exploiting topological relationships (i.e., Et edges) in relevance computations for IR. To test this, the following computations were implemented. When a keyword query is entered in 3DIR, the tool computes a score for each Vi item based on term frequency. This standard score is referred to as S(Vi). Each Vi item relates to a 3D object (V3D), connected by an En edge. Therefore, a holistic score for a 3D object can be calculated as the mean of the standard term frequency scores of its constituent Vi items:
$$S(V_{3D}) = \overline{S(V_i)} \qquad (1)$$
where the overbar indicates the mean of the S(Vi) scores for the 3D object in question.
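A minimal sketch of the mean in Equation (1) follows. It assumes that every indexed Vi item of an object contributes to the mean, including items that score zero for the query; the source does not state how zero scores are treated. The function name `object_scores` and the scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def object_scores(vi_scores, parent):
    """Compute S(V3D) as the mean of the S(Vi) scores of each object's Vi items."""
    grouped = defaultdict(list)
    for vi, score in vi_scores.items():
        grouped[parent[vi]].append(score)  # follow the En edge from Vi to V3D
    return {obj: mean(scores) for obj, scores in grouped.items()}

# Hypothetical S(Vi) scores for one query.
vi_scores = {"vi1": 0.8, "vi2": 0.2, "vi3": 0.5}
parent = {"vi1": "obj_a", "vi2": "obj_a", "vi3": "obj_b"}
s_v3d = object_scores(vi_scores, parent)
```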
For each 3D object, Et edges can be followed to generate a list of Neighbor objects (i.e., other 3D objects related by hosting, touching or intersecting relationships) as well as a list of Neighbor-of-Neighbor objects. Based on the query at hand, an S(V3D) score can be calculated for each 3D object in those sets of Neighbors and Neighbors-of-Neighbors by taking the mean of the object’s S(Vi) scores (as in Equation (1)). Subsequently, a mean score can be calculated for all the Neighbor objects and Neighbor-of-Neighbor objects for each 3D object in the index, represented by S(V3D−N) and S(V3D−NN), respectively:
$$S(V_{3D-N}) = \overline{s(V_{3D-N})} \qquad (2)$$
$$S(V_{3D-NN}) = \overline{s(V_{3D-NN})} \qquad (3)$$
where the overbar indicates the mean of s(V3D−N) and s(V3D−NN) for the 3D object in question.
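The Neighbor and Neighbor-of-Neighbor means can be sketched as a two-hop traversal of the Et edges. Several points are assumptions rather than details from the source: Et edges are treated as undirected, the Neighbor-of-Neighbor set excludes the object itself and its direct Neighbors, and an object with no Neighbors receives a score of zero. All names and values are hypothetical.

```python
from statistics import mean

def neighbors(obj, edges):
    """Objects one Et edge away from obj."""
    return set(edges.get(obj, ()))

def neighbors_of_neighbors(obj, edges):
    """Objects two Et edges away (assumed to exclude obj and its direct Neighbors)."""
    direct = neighbors(obj, edges)
    second = set()
    for n in direct:
        second |= neighbors(n, edges)
    return second - direct - {obj}

def mean_score(objects, s_v3d):
    """Mean S(V3D) over a set of objects; 0.0 for an empty set (assumption)."""
    scores = [s_v3d.get(o, 0.0) for o in objects]
    return mean(scores) if scores else 0.0

# Hypothetical chain of objects a-b-c-d with their S(V3D) scores for one query.
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
s_v3d = {"a": 0.5, "b": 0.4, "c": 0.2, "d": 0.9}
s_n = mean_score(neighbors("a", edges), s_v3d)                # S(V3D-N) for "a"
s_nn = mean_score(neighbors_of_neighbors("a", edges), s_v3d)  # S(V3D-NN) for "a"
```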
The above computations form the basis for the relevance measures, presented in Table 1. These, in turn, allowed testing the effects of exploiting an object’s relationships in a 3D model, in an attempt to improve the ranking of search results, i.e., the measurement of relevance to a query. The additional consideration of Neighbors (denoted by N) and Neighbors-of-Neighbors (denoted by NN) in the IR computations aimed to account for context when identifying relevant items (rather than retrieving items in isolation from a 3D model). The 3DIR toolset generates these sets of related 3D objects using the hosting, touching or intersecting relationships. As the V3D object is the meaningful “atom” of information, the bottom three relevance measures in Table 1 are the main ones reported, with the “Vi + V3D” relevance measure serving as the baseline.
In Table 1, C1–C9 are constants, i.e., weighting factors as shown in Table 2. These constants were maintained from earlier research [16], where they were heuristically set. Fine-tuning the constants was considered beyond the scope of this research, although a sensitivity analysis was conducted and is reported here in Section 4.3.
The Revit model used contained 27,008 V3D objects, which translated into 146,036 Vi items. Therefore, on average, a 3D object from the model contained around 5 indexed parameters. Of the 27,008 3D objects, 3DIR registered only 2932 objects with Neighbors (1.85 Neighbors per 3D object on average). These objects, on average, also had 3.42 Neighbors-of-Neighbors (estimated by squaring the average number of Neighbors, as the number of Neighbors-of-Neighbors was not stored in the index). The sparse set of topological relationships in the model is concerning; only just over one-tenth of the 3D objects had Neighbors, and each object with Neighbors had, on average, just under two Neighbors. This raises concerns about the experiments realizing the potential for retrieval performance improvement from considering topological relationships. This limitation is discussed in more detail in Section 6.
To test the effect of compliance with IFC and Uniclass-2015 standards, 3DIR functionality was used to index only IFC or Uniclass-2015 data. Once converted to IFC, the model contained 8419 V3D objects and 94,645 Vi items (on average, 11 indexed parameters per 3D object). Upon closer inspection of the illogical drop in V3D objects following conversion to IFC, it emerged that some items in the Revit model were being wrongly considered as V3D instead of Vi. For Uniclass-2015, the model contained 4384 V3D objects (lower still, but understandable as not all 3D objects contained Uniclass-2015 data), and 8768 Vi items.
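The averages quoted for the three indexes follow directly from the reported counts, as the short check below illustrates. The counts are taken from the text; the variable names are hypothetical.

```python
# Reported counts of V3D objects and Vi items per index.
counts = {
    "Revit": {"v3d": 27008, "vi": 146036},
    "IFC": {"v3d": 8419, "vi": 94645},
    "Uniclass-2015": {"v3d": 4384, "vi": 8768},
}

# Average number of indexed parameters (Vi items) per 3D object.
params_per_object = {name: c["vi"] / c["v3d"] for name, c in counts.items()}

# Share of Revit objects that have at least one Neighbor.
share_with_neighbors = 2932 / counts["Revit"]["v3d"]

# Neighbors-of-Neighbors estimated by squaring the average Neighbor count.
estimated_nn = 1.85 ** 2

for name, value in params_per_object.items():
    print(f"{name}: {value:.2f} Vi items per V3D object")
print(f"Objects with Neighbors: {share_with_neighbors:.1%}")
print(f"Estimated Neighbors-of-Neighbors: {estimated_nn:.2f}")
```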
Measurements of Precision and Recall were used to gauge IR performance, given lists of relevant V3D objects for each test query from the interview and ranked lists of retrieved V3D objects for each test query from 3DIR. Other IR performance measures were considered but ultimately rejected; from the literature reviewed, Recall and Precision proved to be the most practical and widely accepted. Precision and Recall were combined in Precision–Recall curves. In addition, Average Precision (AP) was used to summarize entire Precision–Recall curves by averaging the Precision at the eleven standard Recall levels. For reporting results, Mean Average Precision (MAP) was also used, which averaged the three AP values for a particular query type (e.g., 1-term) and model (Revit, IFC or Uniclass-2015). One drawback of AP values is that they favor cases where relevant V3D objects are retrieved quickly, i.e., early in the ranking [22]. Thus, 3DIR might give good AP values, but have poor Recall performance. For this reason, an F-measure (i.e., harmonic mean) combining Recall and Precision was considered. However, it would not capture the ranking change of search results that was expected from the contextual measures (as the AP measure does). Variations of the F-measure were also considered, but ultimately the AP measure was deemed superior. As a result, in cases where the AP obscures Recall performance, further analysis of IR performance was conducted via the plotted Precision–Recall curves, allowing deeper investigation into IR performance in information standards. Ultimately, the single value summary enabled tabulation and juxtaposition of IR performance results.
6. Conclusions
This research investigates information retrieval from BIM environments, where information is linked to a 3D model. In particular, it focuses on IR performance based on the 3D object parameters/attributes (textual or symbolic data) utilizing the 3DIR toolset. This research examines contextual relevance measures, which in addition to the retrieved 3D object, consider the relevance of other 3D objects related to the object in question, in order to improve search results ranking (i.e., Precision). Although significant improvement in Precision performance remains elusive, this research proposes a promising retrieval mechanism from 3D models. It provides a framework for exploiting relationships between objects for the retrieval of information from BIM models. The contextual relevance measures presented in this paper constitute a significant original contribution, whether measuring the relevance of a 3D object as a whole, a 3D object plus its set of “Neighbors” or a 3D object plus its sets of “Neighbors” and “Neighbors-of-Neighbors”. This research has also uniquely extended the 3DIR BIM search engine to comply with information standards and shows that retrieval from IFC models is adequate, while retrieval from Uniclass-2015 datasets is poor.
Only a single model (and a limited subpart of a larger construction development) could be studied. A larger-scope building model would have allowed a fuller exploration of the benefits of the concepts proposed here. The limited modelling of relationships between 3D objects (with each 3D object being related to 1.85 others on average) also might have prevented the true benefit of the contextual measures from emerging. It would be promising to explore other relationships, beyond hosting, touching or intersecting. Other relationships might be based on other spatial relationships or shared attributes such as supplier, material, or fire rating. For IFC, it was not possible to study relationships between 3D objects. This was because of the way a Revit model is exported to IFC rather than a limitation of the IFC schema itself. Indeed, the accuracy of the conversion to models complying with information standards is questionable and raises concerns about the validity of the findings reported here. The IFC model contained about 32% of the number of 3D objects in the native Revit model. For Uniclass-2015, the number of 3D objects in the Uniclass-2015 model was 16% of the number of 3D objects in the Revit model. Although this limited translation to standard formats is a noteworthy result, richer models complying with information standards are needed to test the true effect of these information standards on information retrieval performance. The sheer quantity of objects in the model used here would make manual classification infeasible. To produce such richer models, future research can explore semiautomatic mechanisms to support translation into information standards.
Despite the limitations, the particular model used was from a real, large-scale project with limited detail in the model, perhaps a challenging, near “worst-case” scenario. This should mean that the findings reported here are generalizable and that the contextual relevance measures are worthy of future research and development beyond the 3DIR toolset.
Further research is recommended to fine-tune the relevance measures proposed. The weighting factors can be adjusted to optimize IR performance. Further research is also recommended to extend the type of 3D object relationships that are exploited when searching. In an editorial to a special issue on the topic [44], a research agenda has been set out to study topology, which touches on the diversity of topological relationships. Mechanisms have been proposed [45] to automatically identify such spatial relationships in 3D Geographic Information Systems datasets. Based on the exponential rise of information stored digitally, BIM platforms are increasingly in need of a dedicated search engine. It is recommended that this work inform the built-in search functions included in commercial BIM platforms. Given the good IR performance reported in this research, the construction industry can benefit from such developments, facilitating knowledge/information management and drastically reducing the time spent searching for information.