BIM Search Engine: Effects of Object Relationships and Information Standards

Molsa, Maciej; Demian, Peter; Gerges, Michael

doi:10.3390/buildings13071591

by Maciej Molsa ¹, Peter Demian ²,* and Michael Gerges ³

¹ Now First Limited, London NW6 2JX, UK
² School of Architecture, Building and Civil Engineering, Loughborough University, Loughborough LE11 3TU, UK
³ Faculty of Science and Engineering, University of Wolverhampton, Wolverhampton WV1 1LY, UK
* Author to whom correspondence should be addressed.

Buildings 2023, 13(7), 1591; https://doi.org/10.3390/buildings13071591

Submission received: 5 May 2023 / Revised: 8 June 2023 / Accepted: 20 June 2023 / Published: 23 June 2023

(This article belongs to the Section Construction Management, and Computers & Digitization)

Abstract

As Building Information Modeling (BIM) models are getting bigger, with more information linked to geometrical 3D models, a dedicated BIM search engine is important. A BIM search engine was developed to examine the value of exploiting a 3D object’s topological relationships to other 3D objects when assessing that object’s relevance to a query. The impacts of two information standards, the Industry Foundation Classes (IFC) and Uniclass-2015, on information retrieval (IR) performance were also measured. The 3DIR Autodesk Revit toolset was used on a structural model of a 16-story building from an industry partner. The retrieval performance measures of Precision and Recall did not clearly highlight the benefit, although the increased relevance values of those objects deemed by experts to be relevant demonstrate the promise of such contextual measures. The effect of shifting from the native Revit file format to various standards was tested: IR performance was poor with the Uniclass-2015 dataset and, with the IFC model, was comparable to that of the native Revit model. Although not shown conclusively to improve retrieval performance, the contextual relevance measures presented in this paper are promising and constitute a significant original contribution. Future research is needed to fine-tune these measures and fully realize their potential.

Keywords: building information modelling; search engine; information retrieval; 3DIR; topology; information standards

1. Introduction

As the construction sector undergoes a digital transformation, the amount of information packed into building models is exponentially increasing. Building design, construction and operation are information-intensive activities. For example, even two decades ago in UK construction, on average, one computer-aided design (CAD) document was produced for every 9 m² of building floor space [1]. Researchers [2] have reported the problem of ‘information overload’ in the construction sector. Building Information Modeling (BIM) models are following this general trend and becoming more information-rich. Regarding volumes of information specifically in BIM models, BIM platforms have been identified as a particularly favorable communication medium in construction compared to extranets, email and Enterprise Resource Planning systems [3]. The advantages of BIM over documents and extranets have been reported [4]. Although no absolute measures of the quantities of information were found, the implication from such studies is that BIM models are increasingly information-rich, motivating the development of a search engine to enable BIM users to meet their information needs.

Researchers have studied the negative impact of information overload on productivity in general office work [5,6]. Practitioner-based findings from the construction industry agree with those from academic research. In a panel discussion between industry experts organized by Construction Manager magazine [7], all agreed that ‘information overload’ was a huge concern that hinders a project’s productivity. Similarly, in a report commissioned by the British Government [8], it is argued that the industry must modernize and embrace digital technologies to tackle the acute problems of low productivity and poor collaboration.

Although such findings are based on subjective perceptions rather than objective measurements, from a work productivity perspective, the generation and management of information do not appear to be the problem but rather the retrieval and consumption of information. According to a survey of nearly 600 construction leaders from around the world, construction professionals spend around 14% of their time searching for project information [9]. It is often implicitly assumed that information will be created during the design and construction of the built environment, but particular effort is needed to make this information retrievable and reusable [10]. Beyond construction, a survey of 345 IT and storage professionals [11] highlighted the importance of this later phase of retrieval in the information lifecycle.

Search engines for building model archives and project databases were found to improve aspects of design [12], cost [13] and construction [14]. Information search and retrieval in BIM have been studied from a number of perspectives. A review [15] cites knowledge management, design reuse and continuous improvement as the main motives for developing BIM information retrieval systems (i.e., search engines). Approaches are classified into context-, geometry-, and content-based BIM retrieval. The research reported here uniquely touches upon all three, combining text search augmented by 3D data and interrelationships between 3D objects to account for context.

The science of information retrieval (IR) is usually associated with documents, which can be thought of as relics from the pre-BIM age. It has been shown [12] that parameters in 3D building models could be treated as very short documents, and the application of traditional IR techniques yielded reasonable retrieval performance. The challenge remains of augmenting traditional text-based IR techniques with 3D data. The work reported here builds on earlier inconclusive work [16] and specifically explores the exploitation of interrelationships between 3D objects to improve retrieval performance and the impact of information standards on retrieval performance.

Interrelationships between 3D objects can account for context when identifying items relevant to a search query. The importance of context is increasingly recognized [15,17]. The study of interrelationships between objects in 3D models is often labeled as model topology. A succinct review of the concept was presented as part of the 3DIR project [16]. In this research, topological relationships are taken to include any relationships between 3D building elements in a model that may enhance information retrieval. These relationships might be strictly topological and concerned with interior/boundary/exterior of 3D components as noted below, more general spatial/directional relationships, or even relationships as they occur in a very general semantic sense, albeit linked to 3D space. In this sense, any two objects in a model sharing the same attribute (e.g., two components supplied by the same manufacturer or made of the same material) can be said to be related. If a user searching for information is interested in the first object, but not the second object related to it, information from the second object can still be retrieved but ranked as less relevant. Graph theory has been proposed as a useful theoretical lens for studying model topology to improve retrieval performance [16] and this is the lens adopted here.

As more of the design, construction and operation of constructed facilities is digitized, and in light of the inherently collaborative and fragmented nature of the sector, the importance of standardization has emerged. The research reported here, therefore, goes on to investigate the effect of information standards on retrieval performance. Information standards are increasingly needed in the delivery of design and construction by complex supply chains, and it is important that standards do not prevent BIM users from meeting their information needs. Two pertinent standards are tested: Industry Foundation Classes (IFC) and Uniclass-2015. Previous research [18] identified those two as among the most widely adopted in the UK. IFC has emerged as a globally influential standard schema for BIM data exchange. Uniclass-2015 is particularly important in the UK where it is aligned to the country’s BIM strategy.

The aim of this research is to investigate how information retrieval performance from BIM models can be improved by exploiting relationships between 3D objects in the model and to gauge the effect of information standards on IR performance.

2. Literature Review

2.1. Information Management in BIM

Despite becoming ever more densely packed with information, BIM is still emerging as a useful anchor for information management purposes, whether to facilitate the retrieval and flow of information within a single project or between projects. Broadly, knowledge (information) management has been proposed to address many challenges in construction, ultimately benefitting project quality, time and cost [19]. The CoMem (Corporate Memory) system supports design reuse from project to project to avoid wheels being needlessly reinvented [12]. A prototype system, ContextGen [13], retrieves relevant and contextualized cost information from past projects. An automatic retrieval system [14] has been proposed that provides similar past accident cases to improve health and safety on construction sites. IR techniques have been adapted to develop a system for automatic classification of construction documents, associating documents with their corresponding CAD components [20]. Beyond text, techniques have been developed [21] to automatically classify construction site photographs, and IR techniques have been adapted to enhance retrieval of relevant site photographs from project databases. The literature reviewed here demonstrates the potential of applying IR techniques in BIM environments, with their heterogeneous information types. The need for a BIM search engine clearly emerges, prompting a review of information retrieval concepts and literature.

2.2. Information Retrieval

Information retrieval (IR) is concerned with systems that support users in meeting their information needs. IR textbooks [22] distinguish two modes of interacting with an information repository: retrieval and browsing. This research focuses on retrieval, specifically on how 3D content can improve the quantification of relevance to a query and, ultimately, the ranking of search results. Relevance is generally defined in terms of being connected or pertinent to a matter at hand, but is potentially multifaceted and difficult to understand and quantify [23]. Relevance remains a cornerstone of IR and has been the focus of much research [23,24]. Quantifying relevance remains challenging because of its subjectivity and context dependency. It appears from the literature that there is still scope for research to help meet this challenge. The 3D object orientation of BIM environments offers unexplored opportunities. Applying established information retrieval techniques to the text in BIM models is a starting point to quantify relevance, but 3D information and relationships between 3D objects offer the potential for further improvements in retrieval performance.

Focusing on improving the performance of search engines and the underlying quantification of relevance, there has been much relevant research, particularly in built environment design and construction. Query expansion has been proposed [25] to improve the retrieval performance of an online product search engine. Domain knowledge in the form of an ontology has also been used [26] to normalize and expand index terms and so improve retrieval of useful documents. Furthermore, domain knowledge has been complemented with natural language processing [27] to improve retrieval from BIM object libraries. Natural language processing and IFC have also been combined to improve retrieval from hierarchical BIM models [28].

The need for IT to consider context has also been recognized in IR in general [29] and particularly in the built environment [15]. Four “levels of context” have been set out [30] that are relevant to information retrieval. Context in the research reported here is most closely aligned to their query level of context, whereby context is information not explicitly encoded in queries or information resources, but which nevertheless might improve query retrieval performance.

Evaluating IR systems is important. The most widely used measures are Recall and Precision [24,31]. Recall is the ratio between the number of relevant items retrieved and the total number of relevant items in the collection. It gauges a search engine’s ability to retrieve relevant items. Precision is the ratio between the number of relevant items retrieved and the total number of items retrieved. It gauges a search engine’s ability to filter out irrelevant items (Section 3.2.1 in [22] gives precise equations for calculating Precision and Recall).
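The two ratios defined above can be illustrated with a minimal sketch. The function name and the object identifiers are hypothetical and chosen only for illustration; they are not taken from 3DIR or from the test model.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved items."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    # Relevant items that the search actually returned.
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: four items retrieved, three relevant items in the
# collection, two of which were retrieved.
retrieved = ["slab-01", "slab-02", "wall-07", "stair-03"]
relevant = ["slab-01", "slab-02", "slab-09"]
p, r = precision_recall(retrieved, relevant)
print(f"Precision = {p:.2f}, Recall = {r:.2f}")
```

Here two of the four retrieved items are relevant, and two of the three relevant items were found.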

Precision and Recall are complementary and are usually combined in a Precision–Recall curve where Precision is recorded as more and more results are retrieved (and Recall increases). Section 3.2.1 in [22] gives the procedure for generating a Precision–Recall curve, which is the procedure used for the curves presented in this paper. A Precision–Recall curve can be summarized in a single figure by averaging the Precision at 11 standard Recall levels. This is useful for comparing relevance computations. Although Precision and Recall have been used over the last four decades, they are still recognized as valid and preferred among researchers.
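As a rough illustration of the single-figure summary, the sketch below computes an 11-point average Precision from a ranked list. It assumes the common interpolation convention in which the Precision at a standard Recall level is the highest Precision observed at any Recall greater than or equal to that level; the exact procedure in [22] may differ in detail, and the function names are hypothetical.

```python
def precision_recall_points(ranked, relevant):
    """Record (recall, precision) each time a relevant item appears in the ranking."""
    relevant_set = set(relevant)
    points = []
    hits = 0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant_set:
            hits += 1
            points.append((hits / len(relevant_set), hits / rank))
    return points


def eleven_point_average_precision(ranked, relevant):
    """Average the interpolated Precision at Recall levels 0.0, 0.1, ..., 1.0."""
    points = precision_recall_points(ranked, relevant)
    levels = [i / 10 for i in range(11)]
    interpolated = []
    for level in levels:
        # Assumed convention: best Precision at any Recall >= this level.
        candidates = [p for r, p in points if r >= level - 1e-12]
        interpolated.append(max(candidates) if candidates else 0.0)
    return sum(interpolated) / len(levels)


# Hypothetical ranking in which relevant items sit at ranks 1, 3 and 5.
ap = eleven_point_average_precision(["a", "x", "b", "y", "c"], ["a", "b", "c"])
print(round(ap, 3))
```

Recall levels that a ranking never reaches contribute a Precision of zero, so a system that misses relevant items is penalized at the high-Recall end of the curve.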

2.3. Topological Modeling in Buildings

Objects in BIM models are interrelated, and this is often referred to as model topology. Exploiting such relationships may help to account for context and can conceivably improve retrieval performance. In a general sense, topology is the study of the way in which constituent parts are interrelated or arranged. In spatial modelling, topology is concerned with the notions of “interior”, “boundary”, or “exterior”. These notions could be captured by the Industry Foundation Classes, as buildings were modelled in 3D Euclidean space [32]. Algorithms have been presented [33] for the standard topological operators in 3D space: within, contain, touch, overlap, disjoint and equal. Conceptually related, spatial–topological relationships within floorplans have been captured [34], creating a sketch-based search engine aiding retrieval of similar architectural floor plans. A Query Language for Building Information Models (QL4BIM) has been proposed [35], with algorithms and implementation methods focusing on spatial semantic queries. The concept of exploiting topology in 3D models appears to be a promising avenue for exploration regarding improving retrieval from BIM models.
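To make three of the operators listed above concrete (disjoint, touch and overlap), the following sketch classifies a pair of axis-aligned bounding boxes. This is a deliberate simplification: the algorithms in [33] operate on real 3D geometry, and the function name, box representation and tolerance used here are assumptions made only for illustration.

```python
def classify_boxes(a, b, tol=1e-9):
    """Classify two axis-aligned boxes as 'disjoint', 'touch' or 'overlap'.

    Each box is ((xmin, ymin, zmin), (xmax, ymax, zmax)).
    """
    (a_min, a_max), (b_min, b_max) = a, b
    touching = False
    for axis in range(3):
        # A gap along any single axis means the boxes share no point.
        if a_max[axis] < b_min[axis] - tol or b_max[axis] < a_min[axis] - tol:
            return "disjoint"
        # Boundaries coincide on this axis, so the interiors cannot overlap.
        if abs(a_max[axis] - b_min[axis]) <= tol or abs(b_max[axis] - a_min[axis]) <= tol:
            touching = True
    return "touch" if touching else "overlap"


unit = ((0, 0, 0), (1, 1, 1))
print(classify_boxes(unit, ((1, 0, 0), (2, 1, 1))))        # shares a face
print(classify_boxes(unit, ((0.5, 0.5, 0.5), (2, 2, 2))))  # interiors intersect
print(classify_boxes(unit, ((3, 3, 3), (4, 4, 4))))        # separated
```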

2.4. Graph Theoretic Formulation of Information Linked to 3D Models

A graph-theoretic formulation and accompanying relevance computations [16] are invoked as the points of departure for this research. This theoretical lens has proven to be extremely useful in researching information linked to 3D space, as in information-rich BIM models. The graph theoretic vocabulary of vertices and edges serves as an elegant language to convey 3D objects enriched with attributes and possibly linked to external documents. Edges in graphs are particularly important for this research, as relationships between 3D objects can be thought of as edges.

Principles of graph theory have been used by several researchers [36,37] to capture the topological relationships between building objects. However, unlike the work reported here, none of the studies encountered in the literature clearly distinguished between 3D and textual information or between different relationship types in a 3D model.

Figure 1 gives the graph theoretic formulation that makes these distinctions. A graph consists of a set of vertices connected by edges, where an edge links only two vertices. Hence, mathematically modelling any graph X consists of listing its set of vertices V(X) and set of edges E(X) [38]. In the case of BIM, this formulation distinguishes two sets of vertices and two sets of edges:

V_3D: The set of vertices representing 3D objects in the model.
V_i: The set of vertices representing information items linked to the 3D objects, which can be either 3D object properties treated as short documents or full-text documents.
E_n: A ‘natural’ edge joining a vertex in set V_3D and a vertex in set V_i, which originates from the nature of parametric modelling in 3D BIM environments.
E_t: A ‘topological’ edge joining two related vertices in set V_3D, representing the relationship between these two 3D objects. Such relationships and their exploitation to improve retrieval performance are focal points of this research.
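The two vertex sets and two edge sets above could be held in a small data structure such as the one sketched below. This is only an illustrative sketch: the class, its method names and the example identifiers are hypothetical and do not describe how 3DIR stores its index.

```python
from dataclasses import dataclass, field


@dataclass
class BimGraph:
    """Two vertex sets (V_3D, V_i) and two edge sets (E_n, E_t)."""
    v_3d: set = field(default_factory=set)    # ids of 3D objects
    v_i: dict = field(default_factory=dict)   # information item id -> text
    e_n: dict = field(default_factory=dict)   # information item id -> id of its 3D object
    e_t: set = field(default_factory=set)     # unordered pairs of related 3D objects

    def add_object(self, obj_id):
        self.v_3d.add(obj_id)

    def add_info(self, info_id, obj_id, text):
        """Attach an information item to a 3D object via a 'natural' edge."""
        if obj_id not in self.v_3d:
            raise KeyError(f"unknown 3D object: {obj_id}")
        self.v_i[info_id] = text
        self.e_n[info_id] = obj_id

    def relate(self, obj_a, obj_b):
        """Record a 'topological' edge between two distinct 3D objects."""
        if obj_a == obj_b or not {obj_a, obj_b} <= self.v_3d:
            raise ValueError("edge must join two distinct known 3D objects")
        self.e_t.add(frozenset((obj_a, obj_b)))

    def neighbors(self, obj_id):
        """3D objects joined to obj_id by an E_t edge."""
        return {other for edge in self.e_t if obj_id in edge
                for other in edge if other != obj_id}

    def info_items(self, obj_id):
        """Information items joined to obj_id by an E_n edge."""
        return [i for i, owner in self.e_n.items() if owner == obj_id]


graph = BimGraph()
for obj in ("slab-1", "column-1", "stair-1"):
    graph.add_object(obj)
graph.add_info("p1", "slab-1", "transfer slab")
graph.add_info("p2", "slab-1", "concrete")
graph.add_info("p3", "stair-1", "lobby stair")
graph.relate("slab-1", "column-1")
print(graph.neighbors("slab-1"), graph.info_items("slab-1"))
```

Storing each E_t edge as an unordered pair reflects the assumption that relationships such as touching are symmetric; a directed relationship such as hosting would need an ordered pair instead.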

2.5. Information Standards

BIM integrates the fragmented construction industry by increasing collaboration and facilitating sharing of information between organizations and across all project phases. With a variety of BIM platforms available, interoperability between software applications remains an issue. Information standards, i.e., neutral file formats and standard classifications, are being continuously developed and are necessary to support data exchange. The concept of Open BIM is based on open standards and workflows that allow different stakeholders to share data across any BIM platform.

From previous research on information management in a digitalized construction industry in Britain [18], Industry Foundation Classes (IFC), Uniclass-2015 and Construction Operations Building Information Exchange (COBie) emerged as pertinent standards. It is noteworthy that IFC is a schema, while Uniclass-2015 and COBie are classification systems. Limited research was found on the effect of information standards on retrieval performance, especially Uniclass-2015 and COBie. Previous research has concluded that the complex nature of IFC makes retrieval from IFC models difficult [39,40]. Furthermore, Uniclass-2015 has been identified [18] as having a significant professional following in the UK but having less flexibility in terms of classifying object parameters compared to the ISO 15926 series of standards. Ultimately, the literature review seems to suggest poor IR performance for both IFC and Uniclass-2015 compliant models. (Interestingly, researchers [41] have attempted to characterize and classify BIM users).

To conclude the literature review, the need for a BIM search engine is apparent, particularly one that accounts for context. Topological relationships within BIM models offer this opportunity to account for context, but IR performance deserves formal study, as does the impact of compliance with information standards.

3. Materials and Methods

The steps of the research method are illustrated in Figure 2. To investigate the exploitation of topological relationships in BIM models, the 3DIR toolset was adopted. The 3DIR toolset was a publicly available BIM search engine developed as an add-in for Autodesk Revit as part of previous research [16]. The 3DIR toolset indexes the text parameters of the 3D objects in a Revit model and enables keyword searches based on this index. It also follows URLs to index external documents linked to 3D objects. The 3DIR toolset exploits model topology by identifying related objects and encoding these relationships in the index. Related objects are currently identified based on hosting, touching or intersecting relationships [16]. The hosting relationship is explicitly encoded in the Revit information architecture, whereas the touching and intersecting relationships can be inferred using simple computations through the Revit Application Programming Interface. Those relationships were chosen as a feasible set of relationships to test the concept of exploiting such relationships. A deeper investigation into meaningful relationships between 3D objects is recommended for future work.

For this research, a structural 3D model was obtained from Sir Robert McAlpine’s project, Battersea Phase 3a. The Battersea Power Station project is a £9 bn eight-phase development on the banks of the River Thames featuring more than 4000 new homes, commercial offices, retail units, and a hotel. Phase 3a is a key phase within the overall masterplan, comprising over 550 apartments, a lifestyle hotel, and a pedestrianized retail high street. The specific building adopted for this research is a large 16-storey multifunctional development. The actual building is shown on the right of Figure 1.

At its core, as stated above, the 3DIR toolset creates an index of all text terms in a Revit model, i.e., texts from parameters and linked documents. With reference to Figure 1, each entry in the index is a V_i item. Each V_i item in the index is represented as a high-dimensionality vector, with a weight based on term frequency representing each textual term. The Apache Lucene library [42] was used to compute these vector space model weights. Lucene uses the Term Frequency/Inverse Document Frequency computation (TF-IDF) [22], whereby term frequencies are normalized by the number of documents in the corpus that contain that term, thereby attaching greater significance to term frequencies where that term is rarer across the corpus. Additionally, each indexed V_i item contains a reference to its related 3D object (i.e., V_3D), which represents an E_n edge.
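The idea behind TF-IDF weighting can be sketched with the textbook formulation, in which a term's weight is its frequency in a document multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. This is an assumption made for illustration: it is not Lucene's actual scoring formula, which includes further normalization factors, and the short "documents" below are hypothetical parameter values.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one sparse {term: weight} vector per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({term: count * math.log(n_docs / doc_freq[term])
                        for term, count in term_freq.items()})
    return vectors


# Hypothetical V_i items, each treated as a very short document.
vectors = tf_idf_vectors(["transfer slab", "roof slab", "lobby stair"])
print(vectors[0])
```

In this toy corpus "slab" occurs in two of three documents and "transfer" in only one, so "transfer" receives the larger weight in the first vector.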

Three 2-term search queries were developed and tested: “transfer slab”, “lobby stair” and “roof pavilion”. In addition, to test the effect of the query type, simple 1-term variants of the queries were also tested: “transfer”, “lobby” and “pavilion”. Those queries were chosen as they each returned about 100 search results, which was deemed to be a sensible amount to enable the research questions to be addressed. A 35-minute interview was conducted with the project’s BIM manager to identify relevant items for each test query. The interview was conducted in compliance with the ethical research protocol of Loughborough University. The interview consisted of a contextual introduction to the BIM search engine and this research, followed by questions regarding what the expert expected a BIM search engine to retrieve from the 3D model for each test query. Although 3DIR search results are, strictly speaking, V_i items, it was felt more natural to ask the interviewee to list relevant V_3D items. In the subsequent analysis and calculation of IR performance measures, a research-specific 3DIR-ranked list of search results was created. A V_3D object was considered retrieved and ranked at the position of its top-ranking V_i item; any lower-ranked V_i items belonging to the same object were not considered.
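The conversion from a ranked list of V_i items to a ranked list of V_3D objects can be sketched as follows. The function name, the mapping from items to owning objects and the identifiers are hypothetical.

```python
def rank_objects(ranked_info_items, owner_of):
    """Collapse a ranked list of V_i items into a ranked list of V_3D objects.

    Each object takes the position of its top-ranking V_i item; later items
    belonging to an object already listed are skipped.
    """
    seen = set()
    ranked_objects = []
    for info_id in ranked_info_items:
        obj_id = owner_of[info_id]
        if obj_id not in seen:
            seen.add(obj_id)
            ranked_objects.append(obj_id)
    return ranked_objects


# Hypothetical search results: items i1 and i2 both belong to object A.
owner_of = {"i1": "A", "i2": "A", "i3": "B", "i4": "C"}
objects = rank_objects(["i3", "i1", "i4", "i2"], owner_of)
print(objects)
```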

One expert was interviewed due to the limited availability of others. Although information from one expert was satisfactory, it lacked interrater reliability. However, the first author’s familiarity with the structural model meant that relevant items could independently be identified. Discrepancies between the expert’s and the researcher’s lists of relevant items were discussed at length during the interview until consensus was reached. With the interviewee’s permission, the interview was audio recorded.

The graph theoretic formulation above provides a useful framework for exploiting topological relationships (i.e., E_t edges) in relevance computations for IR. To test this, the following computations were implemented. When a keyword query is entered in 3DIR, the tool computes a score for each V_i item based on term frequency. This standard score is referred to as S(V_i). Each V_i item relates to a 3D object (V_3D), to which it is connected by an E_n edge. Therefore, a holistic score for a 3D object can be calculated as the mean of the standard term frequency scores of its constituent V_i items:

S(V_{3D}) = \overline{S(V_{i})} \qquad (1)

where the overbar indicates the mean of the S(V_i) scores for the 3D object in question.

For each 3D object, E_t edges can be followed to generate a list of Neighbor objects (i.e., other 3D objects related by hosting, touching or intersecting relationships) as well as a list of Neighbor-of-Neighbor objects. Based on the query at hand, an S(V_3D) score can be calculated for each 3D object in those sets of Neighbors and Neighbors-of-Neighbors by taking the mean of the object’s S(V_i) scores (as in Equation (1)). Subsequently, a mean score can be calculated for all the Neighbor objects and Neighbor-of-Neighbor objects for each 3D object in the index, represented by S(V_3D−N) and S(V_3D−NN), respectively:

S(V_{3D-N}) = \overline{s(V_{3D-N})} \qquad (2)

S(V_{3D-NN}) = \overline{s(V_{3D-NN})} \qquad (3)

where the overbar indicates the mean of s(V_3D−N) and s(V_3D−NN) for the 3D object in question.
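Equations (1)–(3) can be sketched in a few lines. Two points in this sketch are assumptions rather than details taken from 3DIR: an object's own identifier and its direct Neighbors are excluded from its Neighbors-of-Neighbors set, and objects or sets with no scores are given a score of zero. The weighted combination of the three scores (Table 1) is not reproduced here, and all names and values are hypothetical.

```python
from statistics import mean


def object_score(obj_id, info_scores, info_of):
    """Equation (1): mean of the S(V_i) scores of an object's information items."""
    scores = [info_scores.get(i, 0.0) for i in info_of.get(obj_id, [])]
    return mean(scores) if scores else 0.0


def neighbor_sets(obj_id, neighbors_of):
    """Return (Neighbors, Neighbors-of-Neighbors) by following E_t edges."""
    n = set(neighbors_of.get(obj_id, ()))
    nn = set()
    for nb in n:
        nn |= set(neighbors_of.get(nb, ()))
    # Assumption: exclude the object itself and its direct Neighbors.
    nn -= n | {obj_id}
    return n, nn


def contextual_scores(obj_id, info_scores, info_of, neighbors_of):
    """Return S(V_3D), S(V_3D-N) and S(V_3D-NN) for one object."""
    n, nn = neighbor_sets(obj_id, neighbors_of)
    s_obj = object_score(obj_id, info_scores, info_of)
    s_n = mean([object_score(x, info_scores, info_of) for x in n]) if n else 0.0
    s_nn = mean([object_score(x, info_scores, info_of) for x in nn]) if nn else 0.0
    return s_obj, s_n, s_nn


# Hypothetical chain of objects A - B - C with per-item query scores.
info_of = {"A": ["a1", "a2"], "B": ["b1"], "C": ["c1", "c2"]}
info_scores = {"a1": 0.4, "b1": 0.6, "c1": 1.0}  # unlisted items score zero
neighbors_of = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
s_obj, s_n, s_nn = contextual_scores("A", info_scores, info_of, neighbors_of)
print(s_obj, s_n, s_nn)
```

Because each score is a mean, items or Neighbors that score zero for the query pull the corresponding average down, which matters when most related objects are irrelevant to the query.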

The above computations form the basis for the relevance measures presented in Table 1. These, in turn, allowed testing the effects of exploiting an object’s relationships in a 3D model, in an attempt to improve the ranking of search results, i.e., the measurement of relevance to a query. The additional consideration of Neighbors (denoted by N) and Neighbors-of-Neighbors (denoted by NN) in the IR computations aimed to account for context when identifying relevant items (rather than retrieving items in isolation from a 3D model). The 3DIR toolset generates these sets of related 3D objects using the hosting, touching or intersecting relationships. As the V_3D object is the meaningful “atom” of information, the bottom three relevance measures in Table 1 are the main ones reported, with the “V_i + V_3D” relevance measure serving as the baseline.

In Table 1, C₁–C₉ are constants, i.e., weighting factors as shown in Table 2. These constants were maintained from earlier research [16], where they were heuristically set. Fine-tuning the constants was considered beyond the scope of this research, although a sensitivity analysis was conducted and is reported here in Section 4.3.

The Revit model used contained 27,008 V_3D objects, which translated into 146,036 V_i items. Therefore, on average, a 3D object from the model contained around 5 indexed parameters. Of the 27,008 3D objects, 3DIR registered only 2932 objects with Neighbors (1.85 Neighbors per 3D object on average). These objects, on average, also had 3.42 Neighbors-of-Neighbors (estimated by squaring the average number of Neighbors, as the number of Neighbors-of-Neighbors was not stored in the index). The sparse set of topological relationships in the model is concerning; only just over one-tenth of the 3D objects had Neighbors, and each object with Neighbors had, on average, just under two Neighbors. This raises concerns about the experiments realizing the potential for retrieval performance improvement from considering topological relationships. This limitation is discussed in more detail in Section 6.

To test the effect of compliance with IFC and Uniclass-2015 standards, 3DIR functionality was used to index only IFC or Uniclass-2015 data. Once converted to IFC, the model contained 8419 V_3D objects and 94,645 V_i items (on average, 11 indexed parameters per 3D object). Upon closer inspection of the illogical drop in V_3D objects following conversion to IFC, it emerged that some items in the Revit model were being wrongly considered as V_3D instead of V_i. For Uniclass-2015, the model contained 4384 V_3D objects (lower still, but understandable as not all 3D objects contained Uniclass-2015 data), and 8768 V_i items.
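The averages quoted for the three model variants follow directly from the reported counts, as the short check below shows. Only figures stated in the text are used; the variable names are illustrative.

```python
# (V_3D objects, V_i items) as reported for each model variant.
models = {
    "Revit": (27008, 146036),
    "IFC": (8419, 94645),
    "Uniclass-2015": (4384, 8768),
}

# Average number of indexed V_i items per V_3D object.
items_per_object = {name: n_items / n_objects
                    for name, (n_objects, n_items) in models.items()}
for name, ratio in items_per_object.items():
    print(f"{name}: {ratio:.2f} V_i items per V_3D object")

# Share of Revit objects that have at least one Neighbor.
share_with_neighbors = 2932 / 27008
# Neighbors-of-Neighbors estimated by squaring the mean number of Neighbors.
estimated_nn = 1.85 ** 2
print(f"{share_with_neighbors:.1%} of objects have Neighbors; "
      f"estimated Neighbors-of-Neighbors = {estimated_nn:.2f}")
```

The ratios come to roughly 5.4, 11.2 and exactly 2 items per object, and about 10.9% of Revit objects have Neighbors, consistent with the rounded figures in the text.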

Measurements of Precision and Recall were used to gauge IR performance, given lists of relevant V_3D objects for each test query from the interviews and ranked lists of retrieved V_3D objects for each test query from 3DIR. Other IR performance measures were considered but ultimately rejected; from the literature reviewed, Recall and Precision proved to be the most practical and widely accepted. Precision and Recall were combined in Precision–Recall curves. In addition, Average Precision (AP) was used to summarize entire Precision–Recall curves by averaging the Precision at the eleven standard Recall levels. For reporting results, Mean Average Precision (MAP) was also used, which averaged the three AP values for a particular query type (e.g., 1-term) and model (Revit, IFC or Uniclass-2015). One drawback of AP values was that they favor cases where relevant V_3D objects are retrieved quickly, i.e., early in the ranking [22]. Thus, 3DIR might give good AP values, but have poor Recall performance. For this reason, an F-measure (i.e., harmonic mean) combining Recall and Precision was considered. However, it would not capture the ranking change of search results that was expected from the contextual measures (as the AP measure does). Variations of the F-measure were also considered, but ultimately the AP measure was deemed superior. As a result, in cases where the AP obscures Recall performance, further analysis of IR performance was conducted via the plotted Precision–Recall curves, allowing deeper investigation into IR performance in information standards. Ultimately, the single value summary enabled tabulation and juxtaposition of IR performance results.
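The two summary figures discussed above can be sketched as follows. The F-measure shown is the balanced harmonic mean of Precision and Recall, and MAP is the plain mean of the AP values for a group of queries; the function names and the AP values in the example are hypothetical and are not results from this study.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of Precision and Recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mean_average_precision(ap_values):
    """Mean of the AP values of a set of queries (e.g., three 1-term queries)."""
    return sum(ap_values) / len(ap_values)


# Hypothetical AP values for three queries of the same type.
map_value = mean_average_precision([0.2, 0.4, 0.6])
# Hypothetical result set: half of the retrieved items are relevant,
# and every relevant item was retrieved.
f_value = f_measure(0.5, 1.0)
print(round(map_value, 3), round(f_value, 3))
```

The F-measure is computed from the retrieved set as a whole, so reordering the same results leaves it unchanged, whereas AP responds to where in the ranking the relevant objects appear. This is why AP is better suited to detecting ranking changes.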

4. Results

4.1. Basic Retrieval

Based on the test search queries developed for the structural 3D model, the 3DIR information retrieval results are shown in Table 3 and Table 4 for the Revit model and the IFC and Uniclass-2015 variants. Table 3 summarizes the retrieval results for one-term test queries (labelled 1a, 1b and 1c), and Table 4 summarizes the retrieval results for two-term test queries (labelled 2a, 2b and 2c). It is noteworthy that the interviewer and interviewee did not always agree on relevant/irrelevant objects during the interview; the data in Table 3 and Table 4 are after such discrepancies were discussed and resolved.

4.2. Precision–Recall Curves

Figure 3 shows the average (across the three one-term queries; 1a, 1b and 1c) Precision–Recall curves for the Revit, IFC and Uniclass-2015 models, using the holistic baseline measure, i.e., the “V_i + V_3D” relevance measure. Figure 4 shows the same but averaged over the three two-term queries; 2a, 2b and 2c. The following observations can be made:

IR performance is better for one-term than two-term queries.
Relevant V_3D objects were ranked earlier in the search results for the Revit model than the IFC and Uniclass-2015 models.
On average, IFC retrieved fewer relevant V_3D objects; this is particularly clear from the lower IFC Precision in Figure 3.
IR performance was generally poorest in the Uniclass-2015 model, with no or few relevant V_3D objects retrieved.

The contextual measures tested (i.e., relevance measures exploiting relationships between 3D objects) were expected to be useful, especially for the two-term queries. For example, if a user was searching for ‘lobby stair’, such a contextual measure was intended to rate the relevance of a ‘stair’ object more highly if it was touching other objects in the ‘lobby’. However, this desired effect was not observed, and the IR performance remained unchanged for the contextual measures compared to the baseline measure. Thus, those Precision–Recall curves are not reported here for brevity. Nevertheless, the following observations can be made regarding the “V_i + V_3D + N” and “V_i + V_3D + N + NN” relevance measures:

In Revit, such contextual measures did change the relevance scores but minimally, not impacting the ranking. To further explore this effect, the weighting constants for the topological relationships were experimentally amplified (as described below).
In IFC, such contextual measures had no influence on relevance scores.
In Uniclass-2015, query 2b showed no change in relevance scores. The sparse Uniclass-2015 data meant that few neighboring objects were retrieved, preventing conclusions from being drawn.

4.3. Amplified Weighting Constants for Topological Relationships

The contextual measures in Revit had only a minimal effect, which, if amplified, could potentially improve the ranking of search results. The effect was small because most objects in the sets of Neighbors and Neighbors-of-Neighbors had a relevance score of zero for the query, so the averages over those sets were small (the S(V_3D−N) and S(V_3D−NN) terms in Table 1). To explore this, a simple test was carried out that exaggerated the weighting constants assigned to the scores of Neighbors and Neighbors-of-Neighbors (Table 2). Constants C₅, C₈ and C₉ were changed to 10, the maximum value allowed by the 3DIR prototype settings. Table 5 summarizes the results for all test search queries when these constants were increased.
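
The size of the neighbor term can be made concrete with a small Python sketch of the “V_i + V_3D + N” measure from Table 1. The constants come from Table 2 and from the amplification described above; the individual scores are hypothetical, the function name is illustrative, and it is assumed that C₃ and C₄ kept their Table 2 values when C₅ was raised to 10.

```python
from statistics import mean
from typing import Sequence


def contextual_score(s_vi: float, s_v3d: float,
                     neighbor_scores: Sequence[float],
                     c_vi: float, c_v3d: float, c_n: float) -> float:
    """C3*S(V_i) + C4*S(V_3D) + C5*S(V_3D-N), with S(V_3D-N) taken as the
    average relevance score over the object's Neighbors."""
    s_n = mean(neighbor_scores) if neighbor_scores else 0.0
    return c_vi * s_vi + c_v3d * s_v3d + c_n * s_n


# Hypothetical scores: three of the four Neighbors do not match the query.
s_vi, s_v3d = 0.6, 0.4
neighbor_scores = [0.0, 0.0, 0.0, 0.5]

heuristic = contextual_score(s_vi, s_v3d, neighbor_scores, 0.5, 0.3, 0.2)
amplified = contextual_score(s_vi, s_v3d, neighbor_scores, 0.5, 0.3, 10.0)

print(f"Table 2 constants: {heuristic:.3f}")  # neighbor term adds only 0.025
print(f"C5 amplified to 10: {amplified:.3f}")  # neighbor term adds 1.25
```

With the Table 2 weights the neighbor term contributes a few hundredths to the score, which is too little to reorder objects whose own scores differ; raising the constant to 10 makes the same neighbor average dominate the total.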

Ultimately, Table 5 indicates that the contextual measures work as expected, but the relevance scores of objects remained unchanged whenever their neighboring objects were considered irrelevant (i.e., scored zero) or when 3DIR captured no Neighbors/Neighbors-of-Neighbors for these objects. The negatively affected ranking (i.e., Precision) for query 2b is notable. The IR results for query 2b, in which the amplified constants were used in the contextual measures, are shown in Figure 5. In this case, objects deemed to be irrelevant by the human expert were ranked higher by the contextual measures, resulting in a drop in Precision for those measures. The most plausible explanation is that, for this particular query, “Lobby stair”, the Neighbors and Neighbors-of-Neighbors of the relevant objects were in general irrelevant. Therefore, the contextual measures diluted rather than strengthened the relevance scores of those relevant objects.
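
The dilution effect described for query 2b can be reproduced on a toy example. The Python sketch below is an illustration only: the object names, scores and adjacency are invented, and the object's own relevance is collapsed into a single score with a combined weight of 0.8 (a simplifying assumption, not the 3DIR formula).

```python
from statistics import mean

# Hypothetical own relevance scores for a "lobby stair" style query.
own_score = {
    "stair_A": 0.5,      # judged relevant
    "lift_B": 0.4,       # judged irrelevant
    "wall_1": 0.0,
    "wall_2": 0.0,
    "lobby_floor": 0.3,
    "lobby_door": 0.3,
}

# Hypothetical touching/hosting relationships.
neighbors = {
    "stair_A": ["wall_1", "wall_2"],          # Neighbors do not match the query
    "lift_B": ["lobby_floor", "lobby_door"],  # Neighbors match 'lobby'
}


def rank(candidates, c_own, c_n):
    """Rank candidates by c_own * own score + c_n * average Neighbor score."""
    def score(obj):
        adjacent = neighbors.get(obj, [])
        s_n = mean(own_score[n] for n in adjacent) if adjacent else 0.0
        return c_own * own_score[obj] + c_n * s_n
    return sorted(candidates, key=score, reverse=True)


print(rank(["stair_A", "lift_B"], c_own=0.8, c_n=0.2))   # relevant stair stays first
print(rank(["stair_A", "lift_B"], c_own=0.8, c_n=10.0))  # irrelevant lift overtakes it
```

When the relevant object's Neighbors score zero, amplification gives it no boost, while an irrelevant object surrounded by matching Neighbors gains heavily and moves above it.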

4.4. Recall Performance

To compare Recall performance between Revit, IFC and Uniclass-2015, basic Recall figures are plotted, regardless of the Precision data, based on the V_3D figures in Table 3 and Table 4. Figure 6 and Figure 7 show mixed Recall performance for Revit and IFC, with Uniclass-2015 lagging behind, for both one-term and two-term queries.
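
The basic Recall figures follow directly from the counts in Table 3 and Table 4, as the short Python sketch below shows. The numbers are copied from those tables; the variable and function names are illustrative.

```python
# Relevant V_3D objects agreed between experts (Table 3 and Table 4).
relevant_total = {"1a": 39, "1b": 60, "1c": 65, "2a": 6, "2b": 13, "2c": 65}

# Retrieved V_3D objects which are relevant, per model (Table 3 and Table 4).
relevant_retrieved = {
    "Revit": {"1a": 39, "1b": 31, "1c": 36, "2a": 6, "2b": 11, "2c": 36},
    "IFC": {"1a": 38, "1b": 7, "1c": 47, "2a": 6, "2b": 7, "2c": 61},
    "Uniclass-2015": {"1a": 0, "1b": 0, "1c": 0, "2a": 0, "2b": 2, "2c": 0},
}


def recall_table():
    """Recall = relevant objects retrieved / all relevant objects."""
    return {
        model: {q: hits / relevant_total[q] for q, hits in per_query.items()}
        for model, per_query in relevant_retrieved.items()
    }


recall = recall_table()
for model, per_query in recall.items():
    row = ", ".join(f"{q}: {value:.2f}" for q, value in per_query.items())
    print(f"{model}: {row}")
```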

It could be argued that the strikingly poor Recall performance for the Uniclass-2015 model is attributable to the low proportion of 3D objects with Uniclass-2015 classifications, but this does not fully explain it. Removing such unclassified 3D objects from the list of relevant 3D objects and from the ranked list of retrieved 3D objects did not have any effect on the zero Recall scores for queries 2a and 2c, and it had only a very small effect on the Recall for query 2b.

4.5. Information Retrieval Performance Summary

Table 6 summarizes the IR performance in terms of MAP values of all tests conducted in this research, examining the impact of a 3D object’s relationships and information standards on IR performance, using the heuristically set values for the constants, as set out in Table 2.

From Table 6, three main conclusions can initially be drawn: relevance measures considering topological relationships had no significant effect on IR performance; IR performance decreased when the model was converted to the information standards (with very poor IR performance in Uniclass-2015); and IR performance was generally worse for two-term queries than for one-term queries. However, these results need to be read with some caution, mainly because of the findings from the additional tests carried out and the fact that AP measures can obscure some aspects of IR performance.
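
To illustrate how Average Precision (AP) can hide low Recall, the Python sketch below computes AP and MAP for toy ranked lists. Two normalizations are in common use: dividing by the number of relevant objects retrieved, or by the total number of relevant objects. Which one underlies Table 6 is not stated here, so the sketch supports both; the function names and data are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool],
                      total_relevant: Optional[int] = None) -> float:
    """Mean of Precision values at the ranks where a relevant object appears.

    If total_relevant is None, normalize by the relevant objects retrieved;
    otherwise normalize by all relevant objects, retrieved or not.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    denominator = hits if total_relevant is None else total_relevant
    return precision_sum / denominator if denominator else 0.0


def mean_average_precision(
        queries: Sequence[Tuple[Sequence[bool], Optional[int]]]) -> float:
    """Mean of the per-query AP values."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


# Three objects retrieved, all relevant, but six relevant objects exist.
ranking = [True, True, True]
print(average_precision(ranking))                    # 1.0: low Recall is invisible
print(average_precision(ranking, total_relevant=6))  # 0.5: missing objects penalized
print(mean_average_precision([(ranking, None), ([False, True], None)]))
```

Under the first normalization, a query that retrieves only relevant objects scores a perfect AP even if many relevant objects are never retrieved, which is one reason to read MAP alongside the Recall figures.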

5. Discussion

5.1. One-Term vs. Two-Term Query Performance

From Table 6, one-term queries seem to outperform two-term queries. This may be due to the retrieved V_3D objects for one-term queries predominantly being relevant (all retrieved objects from the Revit model were relevant), resulting in high Precision and excellent MAP results. However, turning to the Recall performance for one-term queries, Figure 6 clearly shows a lack of relevant 3D objects being retrieved by 3DIR (low Recall, especially for queries 1b and 1c). This is also apparent from Figure 3, but the effect is diluted by averaging the Precision values at the higher Recall levels across the three queries. Comparing query 1a to queries 1b and 1c, the term ‘transfer’ is more related to the object’s technical specification, while ‘lobby’ and ‘pavilion’ are terms related to the object’s spatial location in the building. Logically, the poor retrieval of relevant 3D objects for queries 1b and 1c stemmed from the quality and richness of the textual information embedded in the objects. In this sense, IR performance relies heavily on practitioners including quality information in the model’s objects.

The additional keyword for two-term queries resulted in a wider distribution of relevance scores of retrieved results, as expected. For query 2b, for example, a ‘lobby stair’ object became more relevant than a ‘lobby lift’ object, resulting in a ranking that was more focused on the query. However, comparing Figure 3 and Figure 4 shows that IR performance (i.e., Precision) was clearly hindered for the two-term queries (also visible by comparing the one-term and two-term columns in Table 6). Because the additional term made the queries more specific, these queries (especially 2a and 2b) were somewhat expected to perform worse than one-term queries; fewer 3D objects were now relevant, while 3DIR retrieved more objects by simple keyword matching. Yet, the IR performance for two-term queries, including Recall (Figure 7), was good. The contextual measures tested (i.e., relevance measures exploiting relationships between 3D objects) aimed to improve the Precision of these queries but proved to be unsuccessful in some cases (discussed below). In running query 2b in Revit, search results differed when ‘stair’ or ‘stairs’ was used as the keyword. Although this did not impede IR performance in this research, it is recognized that, again, the quality and consistency of text throughout the model are vitally important. In such cases, stemming would be beneficial, as indicated in other research [12].
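
The ‘stair’ versus ‘stairs’ discrepancy is the kind of mismatch that stemming removes. The Python sketch below uses a deliberately naive plural-stripping rule to show the idea; it is not the stemmer of any particular search library and not part of 3DIR, and the parameter text "Lobby Stair Core 2" is a hypothetical example.

```python
def naive_stem(term: str) -> str:
    """Lower-case a term and strip simple English plural endings."""
    term = term.lower()
    if term.endswith("ies") and len(term) > 4:
        return term[:-3] + "y"          # lobbies -> lobby
    if term.endswith("s") and not term.endswith("ss") and len(term) > 3:
        return term[:-1]                # stairs -> stair
    return term                         # glass, stair, lobby unchanged


def matches(query: str, text: str) -> bool:
    """True when every stemmed query term occurs among the stemmed text terms."""
    query_stems = {naive_stem(t) for t in query.split()}
    text_stems = {naive_stem(t) for t in text.split()}
    return query_stems <= text_stems


parameter_text = "Lobby Stair Core 2"  # hypothetical object parameter value
print(matches("lobby stair", parameter_text))   # True
print(matches("lobby stairs", parameter_text))  # True once both sides are stemmed
```

A production system would normally rely on an established stemming algorithm rather than hand-written suffix rules, which mishandle many irregular words.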

Compared to other test query pairs, query 1c (‘pavilion’, which was located on the roof) and query 2c (‘roof pavilion’) shared the same 3D objects judged to be relevant. This query pair, therefore, provided the opportunity to isolate the effect of the more descriptive two-term queries. It was hoped that the additional term would increase the relevance scores of relevant objects retrieved by ‘pavilion’, while also retrieving more relevant objects containing only the term ‘roof’ that would otherwise be overlooked. This expected benefit of the second query term was clearly realized in the case of the IFC model, as shown in Figure 8. However, the additional term ‘roof’ increased the relevance scores of objects retrieved by the one-term query by an approximately similar magnitude, whether they were relevant objects or not. Hence, no improvement appears in Figure 8 in terms of Precision up to the 0.7 Recall level. Moreover, the additional objects retrieved in IFC by ‘roof pavilion’ (query 2c) were ranked lower (indicated by the Precision–Recall curve’s low Precision near the end), as anticipated. This was simply due to only the term ‘roof’ contributing to these objects’ relevance scores. In this case particularly, IR results using the contextual measures would have been informative if they had worked for the IFC model (as discussed below).

5.2. Information Retrieval from Information Standards

Results in Table 6 show that information standards negatively impacted IR performance. Compared to retrieving information from the Revit model, IFC IR performance marginally decreased, while Uniclass-2015 IR performance proved to be extremely poor. Thus, considering the different ways each information standard describes 3D objects, the results suggest IFC maintains most of the textual information embedded in the native 3D object’s parameters when exported from Revit. Uniclass-2015 loses much textual information, drastically hindering retrieval capabilities from 3D models. In a way, these results are not surprising. The difficulty of information retrieval from IFC models has been reported in the literature [39,40], while the limitations of Uniclass-2015 have also been reported [18], particularly its single way of classifying objects, suggesting IR is very restricted to these specific classification names. It can be argued that Uniclass-2015 is more standardized to allow better interoperability between stakeholders, hence its less flexible nature [18]. However, the results clearly indicate the negative influence this can have on the IR potential. Ultimately, the decrease in IR performance from information standards was expected, but the measurement of the drop in IR performance is a unique finding of this research.

Compared to the results in Revit, the initial Precision in IFC tends to be slightly better for two-term than for one-term queries, as shown in Figure 4 and Figure 3, respectively. Precision for IFC tends to be maintained at greater Recall levels (surpassing Revit’s Precision) for one-term queries. For two-term queries, the initial Precision gain is small compared to the Precision drop at a later Recall level. Compared to Uniclass-2015, IFC seems to enable retrieval of objects for the more spatially oriented query terms. Thus, despite losing some information (implied by the worse Recall performance), detailed information for some objects is retained by IFC. This suggests that good IR from IFC models can be achieved.

IR performance from Uniclass-2015 was very poor. No objects were retrieved for one-term queries (Table 3), and only two objects (both relevant) were retrieved for two-term query 2b (Table 4). The poor Recall performance is evident from Figure 6 and Figure 7. Although the large proportion of 3D objects missing Uniclass-2015 classifications played a role (only about half of the 3D objects in the model had Uniclass-2015 classification data), this is also due to a limitation in the nature of Uniclass-2015. Fundamentally, unlike IFC, Uniclass-2015 is a classification system rather than a schema in the strict sense [18]. Therefore, Uniclass-2015 does not capture information in 3D objects in the same way, which is believed to have partially contributed to its poor IR performance. For example, ‘slabs’ are classified as ‘decking systems’, and the structural function ‘transfer’ for beams is simply classified and interpreted as ‘structural beam systems’. Furthermore, in terms of completely losing information, Uniclass-2015 does not capture the more spatially oriented information. Hence, no objects were retrieved for query terms such as ‘lobby’, ‘roof’ and ‘pavilion’. This is supported by the Recall performance of query 1b and query 2b, as relevant ‘lobby stair’ objects were retrieved due to the term ‘stair’ rather than ‘lobby’. This effect explains why two-term queries performed better than one-term queries for Uniclass-2015.
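
The loss of query terms under classification can be sketched with a toy keyword search in Python. The ‘decking systems’ and ‘structural beam systems’ labels follow the examples above; every other string (object ids, native parameter text, the ‘stair systems’ label) is hypothetical and not quoted from the Uniclass-2015 tables, and the any-term matching rule is an assumption made for illustration.

```python
# Each object carries its native parameter text and a classification label.
OBJECTS = [
    {"id": "beam-01", "native_text": "transfer beam level 2",
     "uniclass_text": "structural beam systems"},
    {"id": "slab-01", "native_text": "transfer slab lobby",
     "uniclass_text": "decking systems"},
    {"id": "stair-01", "native_text": "lobby stair precast",
     "uniclass_text": "stair systems"},  # hypothetical label
]


def search(query: str, field: str) -> list:
    """Return ids of objects whose text in `field` contains any query term."""
    terms = set(query.lower().split())
    return [obj["id"] for obj in OBJECTS
            if terms & set(obj[field].lower().split())]


print(search("transfer", "native_text"))       # ['beam-01', 'slab-01']
print(search("transfer", "uniclass_text"))     # []: 'transfer' is lost
print(search("lobby", "uniclass_text"))        # []: spatial term is lost
print(search("lobby stair", "uniclass_text"))  # ['stair-01'], matched via 'stair'
```

The last call mirrors the behavior reported for query 2b: the object is found because of ‘stair’ alone, so the two-term query retrieves something where the one-term spatial query retrieves nothing.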

5.3. Exploiting Relationships between 3D Objects

The contextual measures did not deliver the hoped-for gains in IR performance. One possible explanation is that relevant items were in any case ranked highly for the test queries (with the exception of query 2a) and so there was little scope for further improvement in IR performance by the contextual measures. The limited text in 3D object parameters resulted in relevance scores (i.e., the raw S(V_i) values) of retrieved objects being mostly identical. The contextual measures were observed to scatter those scores due to the relevance of retrieved objects’ Neighbors and Neighbors-of-Neighbors, matching earlier findings [16]. The more scattered relevance scores suggest that objects were now ranked less arbitrarily than they were when the relevance scores were identical. Relevant objects remained ranked first (for two-term queries); and contextualization of retrieved objects through Neighbors worked as anticipated. Notably, Neighbors-of-Neighbors resulted in these scores becoming even more distributed, suggesting even more informative ranking was achieved. Furthermore, for query 2c, a greater relevance score difference between relevant and irrelevant objects was observed to occur after considering Neighbors, and further improved after considering Neighbors-of-Neighbors (Table 5). This clearly signposts the benefit of both contextual measures, although decisive gains in IR performance remain elusive.

For retrieval from the IFC model, relevant objects mostly received the top ranking in search results anyway. Queries 1c and 2c (Figure 8) are an exception, and they might have benefited from the contextual relevance measures. However, it was found that 3DIR does not capture relationships between 3D objects in IFC, so the contextual measures had no influence on relevance scores. Had relationships in IFC been registered by 3DIR, the contextual measures would have been expected to improve the ranking, particularly for queries 1c and 2c. Many objects judged to be relevant to those queries are interrelated and contain the terms ‘roof’ and ‘pavilion’ (the high Recall value for query 2c in Figure 7 supports this statement).

In the case of Uniclass-2015, relationships between 3D objects were carried over from the Revit model (Uniclass-2015 being used only to classify objects). The issue remained that almost half of the model was not classified. This resulted in generally poor IR performance, and the benefit of the contextual relevance measures (whether actual or potential) was impossible to gauge.

Despite the holistic measure (“V_i + V_3D”) being used as the baseline IR performance for this research into the impact of object relationships on IR performance, it is noteworthy that it delivered significant IR performance gains over the basic “V_i” relevance measure which was purely based on term frequency. While the more distributed relevance scores provided a more informative ranking, considering the relevancy of a 3D object as a whole (based on the relevance of other object parameters) proved to be undoubtedly beneficial. Not only did it lead to clustering of V_i items (in relation to a 3D object), but it also resulted in relevant V_i items being ranked higher. Recall and Precision performance here is broadly comparable to other studies [43]. More appropriate tests using IR performance measures on retrieved V_i items are required to assess this benefit.

6. Conclusions

This research investigates information retrieval from BIM environments, where information is linked to a 3D model. In particular, it focuses on IR performance based on the 3D object parameters/attributes (textual or symbolic data) utilizing the 3DIR toolset. This research examines contextual relevance measures, which in addition to the retrieved 3D object, consider the relevance of other 3D objects related to the object in question, in order to improve search results ranking (i.e., Precision). Although significant improvement in Precision performance remains elusive, this research proposes a promising retrieval mechanism from 3D models. It provides a framework for exploiting relationships between objects for the retrieval of information from BIM models. The contextual relevance measures presented in this paper constitute a significant original contribution, whether measuring the relevance of a 3D object as a whole, a 3D object plus its set of “Neighbors” or a 3D object plus its sets of “Neighbors” and “Neighbors-of-Neighbors”. This research has also uniquely extended the 3DIR BIM search engine to comply with information standards and shows that retrieval from IFC models is adequate, while retrieval from Uniclass-2015 datasets is poor.

Only a single model (and a limited subpart of a larger construction development) could be studied. A larger scope building model would have allowed a fuller exploration of the benefits of the concepts proposed here. The limited modelling of relationships between 3D objects (with each 3D object being related to 1.85 others on average) also might have prevented the true benefit of the contextual measures from emerging. It would be promising to explore other relationships, beyond hosting, touching or intersecting. Other relationships might be based on other spatial relationships or shared attributes such as supplier, material, or fire rating. For IFC, it was not possible to study relationships between 3D objects. This was because of the way a Revit model is exported to IFC rather than a limitation of the IFC schema itself. Indeed, the accuracy of the conversion to models complying with information standards is questionable and raises concerns about the validity of the findings reported here. The IFC model contained about 32% of the number of 3D objects in the native Revit model. For Uniclass-2015, the number of 3D objects in the Uniclass-2015 model was 16% of the number of 3D objects in the Revit model. Although this limited translation to standard formats is a noteworthy result, richer models complying with information standards are needed to test the true effect of these information standards on information retrieval performance. The sheer quantity of objects in the model used here would make manual classification unfeasible. To produce such richer models, future research can explore semiautomatic mechanisms to support the automatic translation into information standards.
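
One way the suggested attribute-based relationships could be derived is sketched below in Python: objects that share a value for a chosen attribute are treated as Neighbors. This is a hypothetical illustration of the idea, not a feature of 3DIR; the attribute names follow the examples above, and the objects and values are invented.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set


def shared_attribute_neighbors(objects: Dict[str, Dict[str, str]],
                               attributes: Iterable[str]) -> Dict[str, Set[str]]:
    """Link every pair of objects that share a value for any listed attribute."""
    neighbors = defaultdict(set)
    for attr in attributes:
        groups = defaultdict(list)  # attribute value -> object ids
        for obj_id, props in objects.items():
            value = props.get(attr)
            if value is not None:
                groups[value].append(obj_id)
        for members in groups.values():
            for a, b in combinations(members, 2):
                neighbors[a].add(b)
                neighbors[b].add(a)
    return dict(neighbors)


# Hypothetical objects and attribute values.
model_objects = {
    "door-1": {"supplier": "Acme", "fire_rating": "60"},
    "door-2": {"supplier": "Acme"},
    "wall-1": {"fire_rating": "60", "material": "concrete"},
    "slab-1": {"material": "concrete"},
}

links = shared_attribute_neighbors(
    model_objects, ["supplier", "material", "fire_rating"])
for obj_id in sorted(links):
    print(obj_id, "->", sorted(links[obj_id]))
```

Such derived Neighbor sets could be fed into the same contextual measures as the hosting, touching and intersecting relationships, although common attribute values (for example, a material shared by most of a structural model) would need to be filtered or down-weighted to keep the sets meaningful.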

Despite the limitations, the particular model used was from a real, large-scale project with limited detail in the model, perhaps a challenging, near “worst-case” scenario. This should mean that the findings reported here are generalizable and that the contextual relevance measures are worthy of future research and development beyond the 3DIR toolset.

Further research is recommended to fine-tune the relevance measures proposed. The weighting factors can be adjusted to optimize IR performance. Further research is also recommended to extend the type of 3D object relationships that are exploited when searching. In an editorial to a special issue on the topic [44], a research agenda has been set out to study topology, which touches on the diversity of topological relationships. Mechanisms have been proposed [45] to automatically identify such spatial relationships in 3D Geographic Information Systems datasets. Based on the exponential rise of information stored digitally, BIM platforms are increasingly in need of a dedicated search engine. It is recommended that this work inform the built-in search functions included in commercial BIM platforms. Given the good IR performance reported in this research, the construction industry can benefit from such developments, facilitating knowledge/information management and drastically reducing the time spent searching for information.

Author Contributions

Conceptualization, M.M. and P.D.; methodology, M.M. and P.D.; validation, M.M., P.D. and M.G.; formal analysis, M.M.; investigation, M.M.; resources, P.D.; data curation, M.M.; writing—original draft preparation, P.D. and M.G.; writing—review and editing, P.D., M.M. and M.G.; supervision, P.D.; project administration, M.M.; funding acquisition, P.D. All authors have read and agreed to the published version of the manuscript.

Funding

The original 3DIR BIM search engine was funded by a Brian Mercer Feasibility Award from the Royal Society. The current version for the work reported here was partly funded by Centre for Digital Built Britain as part of the Network FOuNTAIN: Network for ONTologies and Information management in Digital Built Britain.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The complete data are not publicly available due to participant confidentiality.

Acknowledgments

The 3DIR toolset was developed as an Autodesk Revit add-in by Neil Sutton, Software Development Manager at Graitec UK. The authors are grateful to Robert McAlpine for access to building models for the Battersea Phase 3a project.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

References

1. Gray, C.; Hughes, W. Building Design Management; Routledge: Oxfordshire, UK, 2001.
2. Ugwu, O.O. A service-oriented framework for sustainability appraisal and knowledge management. J. Inf. Technol. Constr. (Itcon) 2005, 10, 245–263.
3. Demian, P.; Walters, D. The advantages of information management through building information modelling. Constr. Manag. Econ. 2014, 32, 1153–1165.
4. Charalambous, G.; Thorpe, T.; Demian, P.; Yeomans, S.G.; Doughty, N.; Peters, C. Collaborative BIM in the Cloud and the Communication tools to support it. In Proceedings of the 30th CIB W78 International Conference on Applications of IT in the AEC Industry, Beijing, China, 12–19 October 2013; pp. 58–67.
5. Karr-Wisniewski, P.; Lu, Y. When more is too much: Operationalizing technology overload and exploring its impact on knowledge worker productivity. Comput. Hum. Behav. 2010, 26, 1061–1072.
6. Roetzel, P.G. Information overload in the information age: A review of the literature from business administration, business psychology, and related disciplines with a bibliometric approach and framework development. Bus. Res. 2019, 12, 479–522.
7. Kenny, J. Have We Reached Information Overload? Construction Manager. 3 November 2017. Available online: https://constructionmanagement.co.uk/round-table-have-we-reached-information-overload/ (accessed on 3 May 2023).
8. Farmer, M. The Farmer Review of the UK Construction Labour Model: Modernise or Die; Construction Leadership Council: London, UK, 2016.
9. Schott, P. Construction Disconnected: The High Cost of Poor Data and Miscommunication, PlanGrid. 2018. Available online: https://blog.plangrid.com/2018/08/fmi-plangrid-construction-report/ (accessed on 3 May 2023).
10. Hu, W. Conceptual Framework of Information Retrieve and Reuse in Construction Projects. In Proceedings of the 2008 International Conference on Advanced Computer Theory and Engineering, Washington, DC, USA, 20–22 December 2008; pp. 735–739.
11. Short, J.E. Information Lifecycle Management Concepts, Practices, and Value. In A Report for the Society for Information Management. 2007.
12. Demian, P.; Fruchter, R. Measuring relevance in support of design reuse from archives of building product models. J. Comput. Civ. Eng. 2005, 19, 119–136.
13. Kiziltas, S.; Akinci, B. Automated generation of customized field data collection templates to support information needs of cost estimators. J. Comput. Civ. Eng. 2010, 24, 129–139.
14. Kim, H.-S.; Lee, H.-S.; Park, M.-S.; Hwang, S.-J. Information Retrieval in Construction Hazard Identification. Korean J. Constr. Eng. Manag. 2011, 12, 53–63.
15. Khademi, H.; Behan, A. A review of approaches to solving the problem of BIM search: Towards intelligence-assisted design. In Proceedings of the CitA BIM Gathering 2017, Dublin, Ireland, 23–24 November 2017.
16. Demian, P. BIM search engine: Exploiting interrelations between objects when assessing relevance. In Proceedings of the 17th International Conference on Computing in Civil and Building Engineering (ICCCBE2018), Tampere, Finland, 5–7 June 2018.
17. Demian, P.; Balatsoukas, P. Information retrieval from civil engineering repositories: Importance of context and granularity. J. Comput. Civ. Eng. 2012, 26, 727–740.
18. Demian, P.; Yeomans, S.G.; Murguia-Sanchez, D.E.; West, M.; Barr, S.; Beach, T.; Kassem, M.; Buhagiar, J.; Chapman, L.; Gibbs, D.-J.; et al. Network FOuNTAIN a CDBB Network: For Ontologies and Information Management in Digital Built Britain. Final Report. 2019. Available online: https://hdl.handle.net/2134/37318 (accessed on 19 June 2023).
19. Anumba, C.J.; Egbu, C.; Carrillo, P. (Eds.) Knowledge Management in Construction; Wiley: Hoboken, NJ, USA, 2008.
20. Soibelman, L.; Han, J. Automated classification of construction project documents. J. Comput. Civ. Eng. 2002, 16, 234–243.
21. Brilakis, I.K.; Soibelman, L. Shape-based retrieval of construction site photographs. J. Comput. Civ. Eng. 2008, 22, 14–20.
22. Baeza-Yates, R.; Ribeiro-Neto, B. Modern Information Retrieval; ACM: New York, NY, USA, 1999; Volume 463.
23. Cooper, W.S. A definition of relevance for information retrieval. Inf. Storage Retr. 1971, 7, 19–37.
24. Dominich, S. The Modern Algebra of Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2008; pp. 74–93.
25. Lin, K.Y.; Soibelman, L. Incorporating domain knowledge and information retrieval techniques to develop an architectural/engineering/construction online product search engine. J. Comput. Civ. Eng. 2009, 23, 201–210.
26. Rezgui, Y. Ontology-centered knowledge management using information retrieval techniques. J. Comput. Civ. Eng. 2006, 20, 261–270.
27. Wu, S.; Shen, Q.; Deng, Y.; Cheng, J. Natural-language-based intelligent retrieval engine for BIM object database. Comput. Ind. 2019, 108, 73–88.
28. Wang, J.; Gao, X.; Zhou, X.; Xie, Q. Multi-scale Information Retrieval for BIM using Hierarchical Structure Modelling and Natural Language Processing. J. Inf. Technol. Constr. 2021, 26, 409–426.
29. Merrouni, Z.A.; Frikh, B.; Ouhbi, B. Toward contextual information retrieval: A review and trends. Procedia Comput. Sci. 2019, 148, 191–200.
30. Cool, C.; Spink, A. Issues of context in information retrieval (IR): An introduction to the special issue. Inf. Process. Manag. 2002, 38, 605–611.
31. Chowdhury, G.G. Introduction to Modern Information Retrieval; Facet Publishing: London, UK, 2010.
32. Paul, N. Basic topological notions and their relation to BIM. In Handbook of Research on Building Information Modeling and Construction Informatics: Concepts and Technologies; IGI Global: Hershey, PA, USA, 2010; pp. 451–472.
33. Borrmann, A.; Rank, E. Topological analysis of 3D building models using a spatial query language. Adv. Eng. Inform. 2009, 23, 370–385.
34. Langenhan, C.; Petzold, F. The fingerprint of architecture-sketch-based design methods for researching building layouts through the semantic fingerprinting of floor plans. Int. Electron. Sci.-Educ. J. Archit. Mod. Inf. Technol. 2010, 4, 1–8.
35. Daum, S.; Borrmann, A. Processing of topological BIM queries using boundary representation based methods. Adv. Eng. Inform. 2014, 28, 272–286.
36. Langenhan, C.; Weber, M.; Liwicki, M.; Petzold, F.; Dengel, A. Graph-based retrieval of building information models for supporting the early design stages. Adv. Eng. Inform. 2013, 27, 413–426.
37. Khalili, A.; Chua, D.K.H. IFC-based graph data model for topological queries on building elements. J. Comput. Civ. Eng. 2015, 29, 04014046.
38. Aldous, J.M.; Wilson, R.J. Graphs and Applications: An Introductory Approach; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003.
39. Gao, G.; Liu, Y.-S.; Wang, M.; Gu, M.; Yong, J.-H. A query expansion method for retrieving online BIM resources based on Industry Foundation Classes. Autom. Constr. 2015, 56, 14–25.
40. Tauscher, E.; Bargstädt, H.J.; Smarsly, K. Generic BIM queries based on the IFC object model using graph theory. In Proceedings of the 16th International Conference on Computing in Civil and Building Engineering, Osaka, Japan, 6–8 June 2016; pp. 6–8.
41. Forcael, E.; Puentes, C.; García-Alvarado, R.; Opazo-Vega, A.; Soto-Muñoz, J.; Moroni, G. Profile Characterization of Building Information Modeling Users. Buildings 2023, 13, 60.
42. Lucene 2022. Apache Lucene. Available online: https://lucene.apache.org/ (accessed on 4 May 2023).
43. Li, N.; Li, Q.; Liu, Y.-S.; Lu, W.; Wang, W. BIMSeek++: Retrieving BIM components using similarity measurement of attributes. Comput. Ind. 2020, 116, 103186.
44. Ellul, C.; Haklay, M. The research agenda for topology and spatial databases. Comput. Environ. Urban Syst. 2007, 31, 373–378.
45. Ellul, C.; Haklay, M.M. Using a B-rep structure to query 9-intersection topological relationships in 3D GIS–reviewing the approach and improving performance. In 3D Geo-Information Sciences; Springer: Berlin/Heidelberg, Germany, 2009; pp. 127–151.

Figure 1. A graph theoretic representation of BIM models as a lens for studying information retrieval.

Figure 2. Diagrammatic representation of research method.

Figure 3. Precision–Recall curves for one-term query type for Revit, IFC and Uniclass-2015, using the baseline holistic measure.

Figure 4. Precision–Recall curves for two-term query type for Revit, IFC and Uniclass-2015, using the baseline holistic measure.

Figure 5. Query 2b Precision–Recall curves in Revit, when amplified constants were applied in contextual measures.

Figure 6. Recall performances between Revit, IFC and Uniclass-2015 for one-term queries.

Figure 7. Recall performances between Revit, IFC and Uniclass-2015 for two-term queries.

Figure 8. Precision–Recall curve for queries 1c and 2c in IFC, using the holistic measure.

Table 1. Relevance measures for investigating information retrieval performance.

Name of Relevance Measure	Equation	Rationale
“$V_i$”	$S(V_i)$	Standard $V_i$ Lucene score based on term frequency
“$V_i + V_{3D}$”	$C_1 S(V_i) + C_2 S(V_{3D})$	Also considering relevance of 3D object as a whole
“$V_i + V_{3D} + N$”	$C_3 S(V_i) + C_4 S(V_{3D}) + C_5 S(V_{3D-N})$	Also considering relevance of 3D object’s Neighbors
“$V_i + V_{3D} + N + NN$”	$C_6 S(V_i) + C_7 S(V_{3D}) + C_8 S(V_{3D-N}) + C_9 S(V_{3D-NN})$	Also considering relevance of 3D object’s Neighbors-of-Neighbors

Table 2. Constants used for relevance measures in this research.

Name of Relevance Measure	Constants
“$V_i + V_{3D}$”	$C_1 = 0.7, C_2 = 0.3$
“$V_i + V_{3D} + N$”	$C_3 = 0.5, C_4 = 0.3, C_5 = 0.2$
“$V_i + V_{3D} + N + NN$”	$C_6 = 0.4, C_7 = 0.3, C_8 = 0.2, C_9 = 0.1$

Table 3. Results for one-term test search queries in Revit, IFC and Uniclass-2015.

	Query 1a	Query 1b	Query 1c
Query Terms	Transfer	Lobby	Pavilion
Relevant $V_{3D}$ objects (agreed between experts)	39	60	65
Revit model (3DIR results)
$V_i$ items retrieved	40	36	36
Corresponding $V_{3D}$ objects	39	31	36
$V_{3D}$ objects which are relevant	39	31	36
IFC model (3DIR results)
$V_i$ items retrieved	75	28	50
Corresponding $V_{3D}$ objects	38	7	50
$V_{3D}$ objects which are relevant	38	7	47
Uniclass-2015 (3DIR results)
$V_i$ items retrieved	0	0	0
Corresponding $V_{3D}$ objects	0	0	0
$V_{3D}$ objects which are relevant	0	0	0

Table 4. Results for two-term test search queries in Revit, IFC and Uniclass-2015.

	Query 2a	Query 2b	Query 2c
Query Terms	Transfer Slab	Lobby Stair	Roof Pavilion
Relevant $V_{3D}$ objects (agreed between experts)	6	13	65
Revit model (3DIR results)
$V_i$ items retrieved	67	58	77
Corresponding $V_{3D}$ objects	59	45	76
$V_{3D}$ objects which are relevant	6	11	36
IFC model (3DIR results)
$V_i$ items retrieved	126	54	428
Corresponding $V_{3D}$ objects	54	13	396
$V_{3D}$ objects which are relevant	6	7	61
Uniclass-2015 (3DIR results)
$V_i$ items retrieved	0	2	0
Corresponding $V_{3D}$ objects	0	2	0
$V_{3D}$ objects which are relevant	0	2	0

Table 5. Test results when constants in contextual relevance measures were amplified (Revit model).

Query	Relevance Measure	Remarks
1a	“ $V_{i} + V_{3 D} + N$ ”	No impact on overall relevance scores.
1a	“ $V_{i} + V_{3 D} + N + N N$ ”	No impact on overall relevance scores.
1b	“ $V_{i} + V_{3 D} + N$ ”	Relevance scores increased for $V_{3 D}$ objects. Ranking of $V_{3 D}$ objects reshuffled, but difficult to identify impact on IR performance as all the retrieved $V_{3 D}$ objects were relevant.
1b	“ $V_{i} + V_{3 D} + N + N N$ ”	Same as the above; further increase of relevance scores and further reshuffle of $V_{3 D}$ objects.
1c	“ $V_{i} + V_{3 D} + N$ ”	Similar to query 1b.
1c	“ $V_{i} + V_{3 D} + N + N N$ ”	Similar to query 1b.
2a	“ $V_{i} + V_{3 D} + N$ ”	No impact on overall relevance scores.
2a	“ $V_{i} + V_{3 D} + N + N N$ ”	IR performance remained the same. Only the relevance scores of the top two $V_{3 D}$ (irrelevant) objects increased.
2b	“ $V_{i} + V_{3 D} + N$ ”	Resulted in poorer IR performance. See Figure 5.
2b	“ $V_{i} + V_{3 D} + N + N N$ ”	Slightly poorer IR performance compared to the above relevance measure. See Figure 5.
2c	“ $V_{i} + V_{3 D} + N$ ”	Relevance scores increased for all relevant $V_{3 D}$ objects and a few irrelevant $V_{3 D}$ objects. Ranking of $V_{3 D}$ objects reshuffled, but IR performance remained the same (i.e., relevant $V_{3 D}$ objects maintained their top rank among the retrieved $V_{3 D}$ objects).
2c	“ $V_{i} + V_{3 D} + N + N N$ ”	Similar to above. Notably, the relevance scores of all relevant $V_{3 D}$ objects remained the same except for three, creating a bigger relevance score difference between those three relevant $V_{3 D}$ objects and the irrelevant $V_{3 D}$ objects.

Table 6. Mean Average Precision (MAP) results for relevance measures tested.

	Revit		IFC		Uniclass-2015
Query Type	1-Term	2-Term	1-Term	2-Term	1-Term	2-Term
“ $V_{i} + V_{3 D}$ ” Performance	1	0.715	0.980	0.662	0	0.333
“ $V_{i} + V_{3 D} + N$ ” Performance	1	0.715	0.980	0.662	0	0.333
“ $V_{i} + V_{3 D} + N + N N$ ” Performance	1	0.715	0.980	0.662	0	0.333
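
For readers unfamiliar with the metric, the sketch below shows one common way to compute Mean Average Precision from ranked result lists. It normalizes each query's average precision by the number of relevant items actually retrieved and scores an empty result list as 0. This variant is an assumption of the sketch: the exact formula used in the study is not given in this table, although the variant is consistent with the Uniclass-2015 two-term value (one query with two retrieved, both relevant, and two queries with no results give $(0 + 1 + 0) / 3 \approx 0.333$).

```python
def average_precision(ranked_relevance):
    """Average precision for one ranked list of booleans (True = relevant).

    Assumption: normalize by the number of relevant items retrieved, and
    return 0.0 when no relevant item appears in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(ranked_lists):
    """Mean of the per-query average precision values."""
    if not ranked_lists:
        return 0.0
    return sum(average_precision(r) for r in ranked_lists) / len(ranked_lists)


if __name__ == "__main__":
    # Retrieval pattern matching the Uniclass-2015 rows of Table 4:
    # queries 2a and 2c return nothing, query 2b returns two relevant objects.
    uniclass_two_term = [[], [True, True], []]
    print(round(mean_average_precision(uniclass_two_term), 3))
```

Because average precision depends only on the order of relevant and irrelevant items, adding contextual terms changes MAP only if it moves an irrelevant object above or below a relevant one, which may explain why the three rows of Table 6 are identical.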

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

