Comprehensive Calculation Method of Semantic Similarity of Transport Infrastructure Ontology Concept Based on SHO-BP Algorithm

Bao, Tuyu; Chen, Kun; Zhang, Hao; Zhang, Zheng; Ai, Qingsong; Yan, Junwei

doi:10.3390/app131910587

by Tuyu Bao ¹, Kun Chen ¹,², Hao Zhang ¹, Zheng Zhang ¹, Qingsong Ai ¹,³ and Junwei Yan ¹,²,*

¹ School of Information Engineering, Wuhan University of Technology, Wuhan 430070, China

² Hubei Key Laboratory of Broadband Wireless Communication and Sensor Networks, Wuhan University of Technology, Wuhan 430070, China

³ School of Computer and Information Engineering, Hubei University, Wuhan 430062, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(19), 10587; https://doi.org/10.3390/app131910587

Submission received: 12 August 2023 / Revised: 10 September 2023 / Accepted: 19 September 2023 / Published: 22 September 2023

Abstract:

Semantic information interaction plays an important role in transportation infrastructure modeling and management. To ensure semantic consistency during information exchange and data integration, ontology technology is commonly employed to measure the semantic relevance between concepts. Ontology semantic similarity accurately expresses relationships among various concepts in the domain, and when combined with Building Information Modeling (BIM) technology, it improves the efficiency of information transmission and management in construction. However, the complex structure, diverse components, and strong attribute diversity of transportation infrastructure pose challenges for analysis and computation, leading to limited precision in existing ontology semantic similarity methods. To address these issues, this paper proposes a transport infrastructure ontology concept semantic similarity measurement model based on the Back Propagation (BP) neural network algorithm improved by the Spotted Hyena Optimizer (SHO-BP). Firstly, a semantic network for transportation infrastructure is established, and an ontology-based semantic similarity calculation model is constructed with three approaches: the Edge-Counting method, the Feature-based method, and the Information-Content method. Then, the SHO-BP algorithm is employed to comprehensively weight these three similarity measures. Finally, using bridge BIM models as examples, the semantic similarity of the transportation infrastructure concepts involved in the BIM models is computed with the weighted model derived from the preceding steps. The experiments demonstrate that the SHO-BP algorithm achieves a higher Pearson correlation coefficient than other algorithms for the comprehensive semantic similarity results in the field of transportation infrastructure. This improvement effectively enhances the accuracy of ontology semantic similarity calculation, and it is conducive to the sharing and integration of BIM information in different systems.

Keywords:

ontology; semantic similarity; transport infrastructure; back propagation neural network; spotted hyena optimizer

1. Introduction

With the continuous progress of computers and information technology, the construction industry is gradually moving towards digitization and informatization. In this context, BIM technology has been widely used in the construction field, with many building projects adopting BIM for design, construction, and operation management [1]. BIM technology covers the entire lifecycle of a building, and in the process of constructing transportation infrastructure, information from BIM models can be utilized in various stages of construction [2,3]. Interacting and integrating information from BIM models into different systems can improve the efficiency and accuracy of information transmission in construction projects. However, the construction of transportation infrastructure involves diverse projects of different types, and different stages of the same project have varying requirements, resulting in the heterogeneity of indicators. In theory, the information contained within a BIM model could encompass the entire lifecycle of a building. However, due to the aforementioned phenomenon of indicator heterogeneity, software engineers, when tasked with constructing a BIM information sharing system, encounter concepts that are semantically related yet have different names across various indicators. The semantic associations among these concepts lack a quantifiable metric, leading engineers to rely on subjective judgment to assess their semantic associations. Such a workflow not only proves to be inefficient but also introduces numerous errors, significantly impeding the integration and sharing of BIM information.

Semantic similarity is a metric used in the fields of natural language processing (NLP) and knowledge representation to measure the degree of semantic similarity between two texts or concepts. It is a method for comparing and quantifying semantic associations between texts or concepts [4]. There are various methods to calculate semantic similarity, among which those combined with ontology technology are known as ontology-based semantic similarity. Ontology-based semantic similarity is a metric used to measure the degree of semantic similarity between concepts or instances in an ontology [5,6]. When calculating ontology-based semantic similarity, structural information and property information from ontology trees can be utilized to compute the semantic relevance of concepts in a domain, addressing the problem of semantic inconsistency in data interaction involving multiple heterogeneous sources [7,8].

To ensure the reliability of ontology-based semantic similarity calculation results, it is necessary to construct a complete domain ontology model and consider multiple dimensions of semantic information comprehensively. Currently, ontology semantic similarity can be calculated from various dimensions, such as path distance, hierarchical structure, attribute features, and semantic relationships [9], but the accuracy of the calculation depends on the fit between the ontology model and the domain. Ontology technology has been widely applied in the construction field, including the construction and maintenance of building knowledge bases, semantic search and inference of building information, and knowledge management of BIM models [10,11,12,13].

In the construction field, the use of existing ontology-based semantic similarity calculation methods often yields favorable results. However, applying the same methods in the domain of transportation infrastructure proves to be less satisfactory. Compared to traditional buildings, transportation infrastructure has more complex structures, larger spans, and involves more elements, which means that after ontology mapping, transportation infrastructure ontology models have more nodes, more hierarchical levels, and stronger attribute diversity. These characteristics indicate that using existing ontology-based similarity calculation methods to compute the semantic similarity in the field of transportation infrastructure may affect the precision of results. To obtain calculation results that meet the practical engineering needs, it is necessary to study ontology model construction and ontology-based semantic similarity calculation methods tailored for transportation infrastructure.

In order to address the challenges encountered by existing methods in computing ontology-based semantic similarity within the domain of transportation infrastructure, this paper introduces a comprehensive ontology-based semantic similarity calculation method based on the SHO-BP algorithm. This method accurately computes ontology-based semantic similarity between concepts in the field of transportation infrastructure, which can effectively address the issue of heterogeneity in indicators encountered in the integration of information in the transportation infrastructure domain. The major contributions of this study are as follows:

Construction of ontology models tailored for the domain of transportation infrastructure, capable of storing information from BIM models. These ontology models serve as a basis for analyzing various types of transportation infrastructure.
Adjustment and optimization of the existing Edge-Counting, Information-Content, and Feature-based methods to make them more suitable for analyzing the ontology models constructed in this paper.
Proposal of an ontology-based comprehensive semantic similarity calculation method based on the SHO-BP algorithm, addressing the absence of effective means for measuring semantic association among concepts in the transportation infrastructure domain.

The remainder of this paper is organized as follows: Section 2 provides a literature review on ontology-based semantic similarity calculation. Subsequently, Section 3 elaborates on the workflow of the SHO-BP algorithm and the process of comprehensive semantic similarity calculation. Section 4 details the experimental process of ontology model construction and semantic similarity calculation, followed by an analysis of the experimental results. Finally, Section 5 presents conclusions and discusses future research directions.

2. Related Work

The ontology model can store rich semantic, structural, and attribute information, and it allows similarity calculations from multiple perspectives. Early research on ontology similarity mainly focused on analyzing similarity from a single perspective. Sánchez et al. [5] categorized these approaches into Edge-Counting, Feature-based, and Information-Content similarity calculation methods.

The method of measuring similarity by the length of the path between two nodes in the ontology is known as the path Edge-Counting method. Yang and Powers [14] proposed a model using edge annotation to measure semantic similarity in WordNet classifications. They annotated directed edges in the ontology model based on the types of relationships between concepts. Different types of paths were assigned different weights in the calculation to align the results more closely with human judgment. However, due to the involvement of seven free parameters in the algorithm, the stability of the calculation results is not sufficient.

The Feature-based method determines the similarity between two concepts by analyzing their common and different properties. Zhang and Tian [15] proposed a similarity calculation process that subdivides properties into property names, property data types, and property instance data, and uses string similarity calculation methods to calculate the similarity of property names and property data types with a comprehensive weighting. However, the accuracy of the results obtained by this method is largely influenced by manually setting the weights, making it subjective. Moreover, this method lacks universality for different ontologies.

The Information-Content method combines information entropy with ontology relationships. Couto [16] proposed a model called DiShin, which can compute the shared information content between two ontology concepts based on the information content of their disjunctive common ancestors. This method introduces information entropy and analyzes the information content of two concepts, but it does not utilize higher-level ontology features in the model.

The aforementioned similarity calculation methods utilize information from the ontology to measure the relevance between concepts but only use partial information from the ontology model. In recent years, research on ontology-based semantic similarity no longer focuses solely on individual similarity measures but instead combines multiple measures to perform mixed calculations, involving more comprehensive elements and obtaining more reliable results. Zhao et al. [17] proposed a contributor’s reputation model based on the semantic similarity of ontology concepts. In this method, Edge-Counting similarity and Feature-based similarity are used as indicators to calculate the semantic similarity between concepts. Then, the evaluation reputation of Volunteered Geographic Information (VGI) contributors is computed based on the integrated semantic similarity, geometric similarity, and topological similarity of the object versions. As semantic similarity is just one aspect of calculating the contributor reputation, this method analyzes semantic similarity from the perspectives of edges and properties in the ontology. Zhao et al. [18] proposed an improved semantic similarity algorithm that weights the Edge-Counting similarity and Information-Content similarity to calculate the comprehensive semantic similarity between concepts in WordNet. The Edge-Counting similarity comprehensively analyzes various structural features of the ontology model, including the deepness, density, and importance of edges. However, as the aim of this method is to calculate the semantic similarity between WordNet concepts, it cannot use the similarity of properties between concepts as an indicator. Xu et al. [19] improved the classic similarity calculation model based on semantic distance, information content, and concept property, and also proposed a similarity calculation model based on the view of concept’s sub-nodes coincidence. 
They introduced a BP neural network in the weighted calculation and optimized the neural network using Simulated Annealing (SA) to obtain the weights of various similarities, avoiding the subjectivity of manually setting weights and enhancing the accuracy of integrating ontology similarities. However, this method is specifically designed for command information concepts and is challenging to directly apply to the domain of transportation infrastructure. Wang et al. [20] combined ontology technology to propose a geographic information semantic similarity calculation method. In the geographic information domain, spatial distribution characteristics and geometric features are essential for analysis. Hence, this method focuses more on the types and ranges of properties used in the model. It evaluates the contribution of ontology properties, measures the influence of relative positions in the ontology hierarchy structure, and calculates the geometric similarity of geographic spatial entities to obtain geographic semantic similarity more accurately and effectively. For the transportation infrastructure domain, spatial distribution characteristics and geometric features are equally important for analysis. However, the current similarity calculation methods for transportation infrastructure do not fully utilize the relevant attributes in the ontology.

In general, hybrid similarity calculation methods often yield superior results compared to approaches focused solely on individual factors. Hybrid semantic similarity is derived by performing additive weighting on different similarity outcomes, requiring the computational model to express relationships among these diverse similarities. Nevertheless, existing hybrid similarity calculation models often demand extensive data during training, and the determination of model weights heavily relies on expert opinions and empirical data, introducing subjectivity into weight allocation. Neural networks, with their potent nonlinear modeling and feature learning capabilities, can objectively and accurately represent relationships between similarities after training. Therefore, we opt for neural networks to perform hybrid similarity calculation. The integrated computation employed in this study involves additive weighting of Edge-Counting, Feature-based, and Information-Content similarities. During training, the three individual similarities serve as inputs to the neural network, while the comprehensive semantic similarity of the training set serves as the output. Both the BP neural network and most other commonly used neural networks have shown admirable performance on this weighting task. Furthermore, because research on BP neural networks is mature, training and parameter tuning for them are relatively straightforward. Hence, we have chosen to employ a BP neural network for weight allocation in this paper. Nonetheless, BP neural networks come with inherent limitations that can impact results. To mitigate these limitations, we employ the SHO algorithm to optimize the BP neural network, aiming to enhance the accuracy of the computational outcomes.

3. Methods

3.1. BP Neural Network

The BP neural network [21] is an artificial neural network based on the error back propagation algorithm. It is a type of multi-layered feed-forward neural network that possesses excellent capabilities for nonlinear mapping and approximating arbitrary functions. Consequently, it finds wide applications in fields such as classification, regression, pattern recognition, and signal processing. Typically, a BP neural network consists of three layers: the input layer, the hidden layer, and the output layer, as shown in Figure 1.

In this study, the similarity measures $X_1$ to $X_n$ are used as inputs to the network. The activation function chosen for the network is the Sigmoid function, represented by the following formula:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

The variable $x$ in the Sigmoid function ranges from negative infinity to positive infinity.

Let $Z_k$ denote the expected value corresponding to $Y_k$. The error function of the BP neural network is then represented by the following formula:

$$E = \frac{1}{2} \sum_{k=1}^{m} (Z_k - Y_k)^2 \tag{2}$$

The BP neural network adjusts the values of $W_{ij}$ and $W_{jk}$ according to the partial derivatives of the error $E$ with respect to each of them in order to minimize the error. Training is stopped when the error reaches the predetermined target or when the BP neural network completes the expected number of training iterations.
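The training loop described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the layer sizes, learning rate `lr`, input vector, and target value are hypothetical, and thresholds (biases) are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, Equation (1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_ij, w_jk):
    """Forward pass through input -> hidden -> output (no biases in this sketch)."""
    hidden = sigmoid(x @ w_ij)
    return hidden, sigmoid(hidden @ w_jk)

def error(z, y):
    """Error function, Equation (2)."""
    return 0.5 * np.sum((z - y) ** 2)

def train_step(x, z, w_ij, w_jk, lr=0.5):
    """One gradient-descent update of W_ij and W_jk using dE/dW."""
    hidden, y = forward(x, w_ij, w_jk)
    delta_out = (y - z) * y * (1.0 - y)                       # dE/d(net input), output layer
    delta_hid = (delta_out @ w_jk.T) * hidden * (1.0 - hidden)  # back-propagated to hidden layer
    w_jk_new = w_jk - lr * np.outer(hidden, delta_out)
    w_ij_new = w_ij - lr * np.outer(x, delta_hid)
    return w_ij_new, w_jk_new

rng = np.random.default_rng(0)
x = np.array([0.8, 0.6, 0.7])   # hypothetical input similarities X_1..X_3
z = np.array([0.75])            # hypothetical expected value Z_1
w_ij = rng.normal(scale=0.5, size=(3, 4))
w_jk = rng.normal(scale=0.5, size=(4, 1))

e_before = error(z, forward(x, w_ij, w_jk)[1])
for _ in range(200):            # stop after a fixed number of iterations
    w_ij, w_jk = train_step(x, z, w_ij, w_jk)
e_after = error(z, forward(x, w_ij, w_jk)[1])
```

In practice the loop would also stop early once the error falls below a chosen target, as the text describes.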

3.2. Spotted Hyena Optimizer

Spotted Hyena Optimizer [22] is inspired by the hunting behavior of spotted hyenas. In optimization algorithms, each individual corresponds to a fitness value, where higher fitness indicates a better solution for that individual. In this paper’s optimization algorithm, a lower mean squared error leads to higher fitness. The two key mechanisms in the optimization algorithm are the encircling mechanism and the hunting mechanism.

3.2.1. Encircling Mechanism

The encircling mechanism of the SHO algorithm can be expressed as follows:

$$D_s = |B \cdot P_P(t) - P(t)| \tag{3}$$

Here, $D_s$ represents the distance between the prey and the spotted hyena, $t$ indicates the current iteration, $P_P(t)$ represents the position of the prey, $P(t)$ is the position of the spotted hyena, $B = 2 r_1$ stands for the oscillation factor, and $r_1$ is a random vector in the range [0, 1].

The new position of the spotted hyena after one iteration is represented by $P(t + 1)$, as shown in the following equation:

$$P(t + 1) = P_P(t) - E \cdot D_s \tag{4}$$

In the equation above, $E = 2 h r_2 - h$ represents the convergence factor; $r_2$ denotes a random number in the range [0, 1]; and $h$ represents the control factor, which decreases as the iteration progresses. The equation is given as follows:

$$h = 5 - 5 \cdot \frac{Iteration}{Max_{Iteration}} \tag{5}$$

In Equation (5), $Iteration$ represents the current iteration number, and $Max_{Iteration}$ represents the maximum number of iterations. After a single iteration, if $|E| < 1$, the individual spotted hyena will approach the prey. Because $E$ lies in $[-h, h]$ and $h$ falls from 5 to 0, the bound on the absolute value of $E$ gradually decreases through continuous iteration, resulting in the gradual shrinkage of the hunting area.
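A minimal sketch of one encircling update follows. It reads Equation (4) as the prey position minus $E$ times the distance of Equation (3), which is how the standard SHO formulation is usually written; that reading, the function names, and the example positions are assumptions made for illustration.

```python
import numpy as np

def control_factor(iteration, max_iteration):
    """Control factor h, Equation (5): falls linearly from 5 to 0."""
    return 5.0 - 5.0 * iteration / max_iteration

def encircle(position, prey, iteration, max_iteration, rng):
    """One encircling update of a hyena position towards the prey."""
    h = control_factor(iteration, max_iteration)
    b = 2.0 * rng.random(position.shape)           # oscillation factor B = 2 r1
    e = 2.0 * h * rng.random(position.shape) - h   # convergence factor E = 2 h r2 - h
    d_s = np.abs(b * prey - position)              # distance, Equation (3)
    return prey - e * d_s                          # new position, Equation (4) as read above

rng = np.random.default_rng(0)
hyena = np.array([1.0, -2.0])   # hypothetical current position P(t)
prey = np.array([0.5, 0.5])     # hypothetical prey position P_P(t)
next_position = encircle(hyena, prey, iteration=10, max_iteration=100, rng=rng)
```

At the final iteration $h = 0$, so $E = 0$ and the update returns the prey position itself, which matches the shrinking hunting area described above.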

3.2.2. Hunting Mechanism

In the SHO algorithm, the hunting mechanism can be represented by Equations (6) and (7):

$$D_h = |B \cdot P_h(t) - P_k| \tag{6}$$

$$P_k = P_h - E \cdot D_h \tag{7}$$

In Equations (6) and (7), $P_k$ represents a regular spotted hyena individual, $P_h(t)$ defines the first best position of spotted hyenas, and $D_h$ represents the distance between a regular individual and $P_h(t)$. $N$ of the spotted hyena individuals are classified into the optimal cluster based on their individual fitness values, where $N$ can be calculated as follows:

$$N = Count_{nos}(P_h, P_{h+1}, \dots, (P_h + M)) \tag{8}$$

In Equation (8), $M$ is a random vector in the range [0.5, 1], which represents the range for partitioning the optimal cluster. The terms $P_h$ to $P_h + M$ in parentheses represent the individuals closest to the optimal individual $P_h$. All individuals within this range are assigned to the optimal cluster $C_h$, expressed by the following formula:

$$C_h = P_k + P_{k+1} + \dots + P_{k+N} \tag{9}$$

$C_h$ is the sum of the position vectors of all individuals $P_k$ to $P_{k+N}$ in the optimal cluster, and $C_h / N$ represents the center position of the optimal cluster.

When $|E| < 1$, the best spotted hyena individual takes the center of the optimal cluster as its position for the next iteration, described by the following formula:

$$P_h(t + 1) = \frac{C_h}{N} \tag{10}$$
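The hunting update can be sketched as follows. This is a simplified illustration under stated assumptions: the cluster size is passed in directly as the hypothetical parameter `n_cluster` instead of being counted through the random vector $M$ of Equation (8), fitness is represented by an error value where lower is better, and $B$ and $E$ are supplied by the caller.

```python
import numpy as np

def hunt_step(population, errors, e, b, n_cluster):
    """Move the best position to the centre of the optimal cluster.

    population : (n, d) array of hyena positions
    errors     : (n,) array, lower error means higher fitness
    e, b       : convergence factor E and oscillation factor B
    n_cluster  : number of individuals kept in the optimal cluster (hypothetical)
    """
    best = population[np.argmin(errors)]            # P_h, the best individual
    d_h = np.abs(b * best - population)             # Equation (6) for every P_k
    candidates = best - e * d_h                     # Equation (7)
    # Keep the candidates closest to the best individual as the optimal cluster.
    order = np.argsort(np.linalg.norm(candidates - best, axis=1))
    c_h = candidates[order[:n_cluster]].sum(axis=0)  # Equation (9)
    return c_h / n_cluster                           # Equation (10)

population = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [0.0, 3.0], [4.0, 4.0]])
errors = np.array([0.5, 0.1, 0.9, 0.7, 0.3])
new_best = hunt_step(population, errors, e=0.4, b=1.2, n_cluster=3)
```

When `e` is zero every candidate collapses onto the best individual, so the cluster centre coincides with $P_h$; larger values of $|E|$ spread the candidates out and move the centre.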

3.3. SHO-BP Algorithm

Although the BP neural network has shown promising results in ontology-based semantic similarity calculation, it still suffers from issues such as slow convergence, susceptibility to local optima, and dependence of result accuracy on the initial weights. This paper proposes an optimization approach using the SHO algorithm to improve the training process of the BP neural network and enhance its performance. By integrating the hunting mechanism of the SHO algorithm, the convergence speed of the BP neural network model is increased. Furthermore, each weight adjustment takes into account the individual fitness values of the entire spotted hyena population, preventing the model from becoming stuck in a local optimal solution. The specific optimization process is illustrated in Figure 2.

As shown in Figure 2, the SHO algorithm updates the weights and thresholds of the BP neural network during the iterative process. The BP neural network incorporates these new weights and thresholds for training, and the current weights and thresholds are evaluated for optimality based on the obtained errors. Through continuous updates and adjustments to these parameters, the SHO algorithm aims to discover superior solutions for the neural network. This process partially reduces the sensitivity of the BP neural network to initial weights and thresholds, thereby enhancing the model’s performance and robustness.
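One way to couple the two algorithms is to treat each spotted hyena as a flat vector holding every weight and threshold of the network, and to score it by the network error it produces. The sketch below shows only that encoding and scoring step; the layer sizes, the random data, and all function names are hypothetical, and the mean squared error is used as the fitness signal as stated in Section 3.2.

```python
import numpy as np

N_IN, N_HID, N_OUT = 3, 4, 1   # hypothetical layer sizes
DIM = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # weights plus thresholds

def decode(vec):
    """Split a flat hyena position into weights and thresholds of the network."""
    i = 0
    w1 = vec[i:i + N_IN * N_HID].reshape(N_IN, N_HID)
    i += N_IN * N_HID
    b1 = vec[i:i + N_HID]
    i += N_HID
    w2 = vec[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT)
    i += N_HID * N_OUT
    b2 = vec[i:i + N_OUT]
    return w1, b1, w2, b2

def predict(vec, x):
    """Forward pass of the BP network encoded by vec."""
    w1, b1, w2, b2 = decode(vec)
    hidden = 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

def mse(vec, x, z):
    """Mean squared error; a lower value means a fitter hyena."""
    return float(np.mean((predict(vec, x) - z) ** 2))

def best_initial_vector(population, x, z):
    """Score every hyena and return the fittest one with all error values."""
    errors = np.array([mse(p, x, z) for p in population])
    return population[np.argmin(errors)], errors

rng = np.random.default_rng(0)
x = rng.random((5, N_IN))       # hypothetical similarity triples
z = rng.random((5, N_OUT))      # hypothetical reference similarities
population = rng.normal(size=(6, DIM))
best_vec, errors = best_initial_vector(population, x, z)
```

The fittest vector would then seed the BP network, with the encircling and hunting updates producing the next population between evaluations.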

3.4. Comprehensive Semantic Similarity Calculation Method

The workflow of the comprehensive semantic similarity calculation method based on the SHO-BP algorithm is depicted in Figure 3. Firstly, a transport infrastructure BIM model is converted into an ontology model by analyzing its structure. Then, two concepts are selected from the ontology model, their positions in the ontology tree structure are determined, and their properties and parameters are retrieved in order to calculate three types of similarity: Edge-Counting, Feature-based, and Information-Content similarity. Finally, the SHO algorithm is used to optimize the weights and thresholds of the BP neural network. The optimized BP neural network is then employed to perform comprehensive weighting on the three types of similarity, resulting in the overall ontology-based semantic similarity.

The process of calculating comprehensive similarity involves weighting and summing three types of similarity: Edge-Counting, Feature-based, and Information-Content. The weights for this summation can be obtained through the training of the BP neural network. To perform this weight allocation, we set the Edge-Counting, Feature-based, and Information-Content similarities between different concepts as inputs to the BP neural network. Simultaneously, we use the comprehensive similarity from the training dataset as the output for training the neural network. During the training process, the network continuously adjusts its weights and biases using the back propagation algorithm to minimize the loss function. Upon completion of training, the weights of the output layer are extracted and used as the weights for the three similarity factors, which are subsequently employed in the comprehensive calculation task. However, the network is prone to becoming stuck in a local optimal solution during this training, resulting in a calculation model with lower accuracy. Therefore, we employ the SHO-BP algorithm to facilitate the training process. Through the SHO algorithm, we optimize the initial weights and thresholds of the BP neural network. Leveraging the global search capability of the SHO algorithm prevents the network from falling into a local optimal solution. This ensures that the weights obtained during training enable the accurate computation of comprehensive similarity while reducing the time required for the BP neural network to converge. The following sections provide a detailed introduction to the three similarity calculation methods.

3.4.1. Edge-Counting Similarity

Ontologies employ tree-like structures to represent relationships between concepts, where the geometric arrangement of the tree encodes semantic information about the concepts. When studying semantic similarity based on the distance between two concepts, it is necessary to compare the distance between the positions of these two concepts in the ontology tree. Wu and Palmer’s method [23] and Leacock and Chodorow’s method [24] comprehensively analyze the impact of common parent nodes and the depth of the compared concepts in the semantic network on the calculation results. Furthermore, the distance between two nodes is also influenced by the relationship between them. The bridge ontology constructed in this study mainly consists of three types of relationships: part–whole, concept–instance, and synonym. Clearly, the distance between concepts influenced by synonym relationships should be smaller than that influenced by part–whole relationships. After analyzing the concepts and relationships in the bridge ontology model, different weights for the edges corresponding to different relationships are set according to the following criteria:

$$W_{dis}(c_i, c_j) = \begin{cases} 0, & \text{synonym} \\ 0.2, & \text{concept-instance} \\ 0.5, & \text{part-whole} \end{cases} \tag{11}$$

For a path from node $c_1$ to node $c_4$ that runs from $c_1$ to $c_2$, $c_2$ to $c_3$, and $c_3$ to $c_4$, the distance with weighted edges $Dis(c_1, c_4)$ can be represented by the following formula:

$$Dis(c_1, c_4) = W_{dis}(c_1, c_2) + W_{dis}(c_2, c_3) + W_{dis}(c_3, c_4) \tag{12}$$

The formula for calculating Edge-Counting similarity used in this study is shown as Equation (13):

$$Sim^{dis}(c_i, c_j) = \frac{dep(lca(c_i, c_j), c_i) + dep(lca(c_i, c_j), c_j)}{(Dis(c_i, c_j) + 1)(max(dep(c_i)) + max(dep(c_j)))} \tag{13}$$

In the formula, $lca(c_i, c_j)$ represents the lowest common ancestor node between nodes $c_i$ and $c_j$, $dep(lca(c_i, c_j), c_i)$ represents the depth of the lowest common ancestor node in the branch of the ontology tree where node $c_i$ is located, and $max(dep(c_i))$ represents the maximum depth of the branch starting from node $c_i$ in the ontology tree.
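Equations (11) to (13) can be combined into a short calculation. In the sketch below the depth quantities are passed in as plain numbers because their extraction depends on how the ontology tree is stored; the relation labels, function names, and example values are hypothetical.

```python
# Edge weights per relationship type, Equation (11).
EDGE_WEIGHTS = {"synonym": 0.0, "concept-instance": 0.2, "part-whole": 0.5}

def weighted_distance(path_relations):
    """Sum the edge weights along the path between two concepts, Equation (12)."""
    return sum(EDGE_WEIGHTS[relation] for relation in path_relations)

def edge_counting_similarity(dep_lca_i, dep_lca_j, dis, max_dep_i, max_dep_j):
    """Edge-Counting similarity, Equation (13).

    dep_lca_i, dep_lca_j : depth of the lowest common ancestor in each branch
    dis                  : weighted distance between the two concepts
    max_dep_i, max_dep_j : maximum depth of the branch of each concept
    """
    return (dep_lca_i + dep_lca_j) / ((dis + 1.0) * (max_dep_i + max_dep_j))

# Hypothetical path: two part-whole edges followed by one concept-instance edge.
dis = weighted_distance(["part-whole", "part-whole", "concept-instance"])
sim = edge_counting_similarity(2, 2, dis, 4, 4)
```

With these example numbers the distance is 1.2 and the similarity is 4 / (2.2 × 8), roughly 0.227. A path made only of synonym edges has distance 0, so the denominator reduces to the sum of the two maximum depths.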

3.4.2. Feature-Based Similarity

In the ontology model, each node is associated with several properties. The higher the degree of property overlap between two concepts, the greater the similarity between them. The Tversky algorithm [25] calculates Feature-based similarity by performing weighted operations on the union and intersection of the properties of two nodes, which determines the proportion of shared attributes between them. However, this method gives all properties equal weight, while in reality, different properties may contribute differently to the similarity between concepts. Zhang and Tian [15] address this issue by dividing the property information into three elements: property name, property data type, and property value. They calculate the similarity of these three elements separately and then combine them with the weighted formula shown in Equation (14) for Feature-based similarity computation.

$$Sim^{ele}_{a*b}(a, b) = \alpha \, sim(a_n, b_n) + \beta \, sim(a_d, b_d) + \gamma \, sim(a_v, b_v) \tag{14}$$

In the formula above, $a$ and $b$ represent a pair of properties compared between nodes $c_i$ and $c_j$. $sim(a_n, b_n)$ denotes the similarity of property names, $sim(a_d, b_d)$ denotes the similarity of data types, and $sim(a_v, b_v)$ denotes the similarity of property values. $\alpha$, $\beta$, and $\gamma$ are parameters satisfying $\alpha + \beta + \gamma = 1$. Because property names and property data types are strings, the cosine similarity algorithm is used to calculate $sim(a_n, b_n)$ and $sim(a_d, b_d)$. If there are $k$ pairs of properties between concepts $c_i$ and $c_j$, the Feature-based similarity of $c_i$ and $c_j$ is expressed as follows:

$$Sim^{ele} = \sum_{i=1}^{k} \omega_i \, Sim^{ele}_{a*b}(a_i, b_i) \tag{15}$$

In Equation (15), $\omega_i$ represents the weight assigned to the $i$-th property pair, satisfying the condition $\sum_{i=1}^{k} \omega_i = 1$. When constructing the ontology model described in this paper, properties that are considered important for defining concepts are arranged towards the end of the property list. Therefore, property pairs later in the list receive greater weights than earlier ones.
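The following sketch illustrates Equations (14) and (15), reading Equation (14) as a weighted sum of the three element similarities. Several details are assumptions, since the text does not fix them: strings are compared by cosine similarity over character counts, property values are compared by exact match, the values of `alpha`, `beta`, and `gamma` are arbitrary, and the pair weights grow linearly with list position so that later pairs count more.

```python
import math
from collections import Counter

def cosine_similarity(s, t):
    """Cosine similarity of two strings over character-count vectors (assumed tokenization)."""
    a, b = Counter(s.lower()), Counter(t.lower())
    dot = sum(a[ch] * b[ch] for ch in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pair_similarity(p, q, alpha=0.5, beta=0.2, gamma=0.3):
    """Similarity of one property pair, Equation (14), with alpha + beta + gamma = 1."""
    sim_value = 1.0 if p["value"] == q["value"] else 0.0  # exact match, an assumption
    return (alpha * cosine_similarity(p["name"], q["name"])
            + beta * cosine_similarity(p["dtype"], q["dtype"])
            + gamma * sim_value)

def feature_similarity(pairs):
    """Feature-based similarity, Equation (15), with weights that sum to 1."""
    k = len(pairs)
    total = k * (k + 1) / 2
    # omega_i = i / total, so pairs later in the list receive larger weights.
    return sum((i / total) * pair_similarity(p, q)
               for i, (p, q) in enumerate(pairs, start=1))

length_a = {"name": "length", "dtype": "double", "value": 30.0}
length_b = {"name": "length", "dtype": "double", "value": 30.0}
material_a = {"name": "material", "dtype": "string", "value": "C50"}
material_b = {"name": "materials", "dtype": "string", "value": "C40"}
sim_ele = feature_similarity([(length_a, length_b), (material_a, material_b)])
```

Identical property lists give a similarity of 1, and any mismatch in a later pair lowers the result more than the same mismatch in an earlier pair.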

3.4.3. Information-Content Similarity

In the ontology model, each node corresponding to a concept does not appear with equal probability. By statistically analyzing the occurrence probability of concepts or instances corresponding to each node in the bridge BIM model, the information content of concept nodes can be computed, as shown in the following formula:

$$Info(c_i) = -\log(p(c_i)) \tag{16}$$

In the formula, $p(c_i)$ represents the probability of concept $c_i$ occurring in the BIM model.

In the bridge ontology model constructed in this study, the bridge architectural concepts are hierarchically organized from the whole to the parts. Concepts at the same depth in the ontology model should receive higher weights for their information similarity compared to concepts at different depths. Moreover, the greater the depth difference, the lower the weight. Following the method proposed by Lin [6] and considering the features of hierarchical division in the transport infrastructure ontology model, the Information-Content similarity is defined as follows:

$$Sim^{info}(c_i, c_j) = \frac{2 \, Info(lca(c_i, c_j)) \times W_{dep}(c_i, c_j)}{Info(c_i) + Info(c_j)} \tag{17}$$

In Equation (17), the value of $W_{dep}(c_i, c_j)$ depends on the depth difference between $c_i$ and $c_j$, denoted $D_{dep}(c_i, c_j)$. The value of $W_{dep}(c_i, c_j)$ is defined in the following formula:

$$W_{dep}(c_i, c_j) = \begin{cases} 1, & D_{dep}(c_i, c_j) = 0 \\ 0.7, & D_{dep}(c_i, c_j) = 1 \\ 0.5, & D_{dep}(c_i, c_j) = 2 \\ 0.3, & D_{dep}(c_i, c_j) = 3 \\ 0.1, & \text{other cases} \end{cases} \tag{18}$$
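Equations (16) to (18) translate directly into code. In this sketch the occurrence probabilities and depths are supplied by the caller, and the function names and example numbers are hypothetical. The natural logarithm is used, which is an assumption, although the base cancels in the ratio of Equation (17).

```python
import math

# Depth-difference weights, Equation (18); any other difference maps to 0.1.
DEPTH_WEIGHTS = {0: 1.0, 1: 0.7, 2: 0.5, 3: 0.3}

def info(p):
    """Information content of a concept with occurrence probability p, Equation (16)."""
    return -math.log(p)

def depth_weight(depth_i, depth_j):
    """Weight W_dep for the depth difference between two concepts."""
    return DEPTH_WEIGHTS.get(abs(depth_i - depth_j), 0.1)

def info_similarity(p_i, p_j, p_lca, depth_i, depth_j):
    """Information-Content similarity, Equation (17).

    Probabilities must be below 1, otherwise the denominator can be zero.
    """
    return (2.0 * info(p_lca) * depth_weight(depth_i, depth_j)
            / (info(p_i) + info(p_j)))

# Hypothetical concepts: both fairly rare, sharing a more common ancestor.
sim_info = info_similarity(p_i=0.02, p_j=0.05, p_lca=0.3, depth_i=3, depth_j=4)
```

Two concepts at the same depth whose information content equals that of their lowest common ancestor reach the maximum value of 1, and each extra level of depth difference scales the result down according to Equation (18).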

4. Results and Discussion

4.1. Experimental Procedure

In this study, bridge BIM models are selected as examples of transportation infrastructure to calculate the semantic similarity of concepts in the BIM model. The following steps were taken:

Step 1: Collecting data. A total of 300 pairs of bridge-related concepts are selected from the Classification and Coding Standard for Road Traffic Infrastructure Unit Information Model, and the reference similarity of each pair is obtained from expert opinions. Bridge BIM models used in several engineering projects are also selected, and their data are used for ontology mapping.

Step 2: Establishing the semantic network. Following the approach described in [10], a bridge ontology model corresponding to the bridge models is constructed in Protégé, and a semantic network is built in Python from the tree-like structure of the ontology model.

Step 3: Training the calculation model. The Edge-Counting, Feature-based, and Information-Content similarities between concepts in the semantic network are calculated. The similarity data are then divided into training and testing samples, and the calculation model is trained on the training samples.

Step 4: Testing and analyzing the results. The trained model is used to calculate the comprehensive ontology-based semantic similarity for the test samples, and the results are compared with those obtained by other methods.

4.2. Constructing Ontology Model and Semantic Network

By analyzing bridge BIM models and relevant knowledge in the field of bridge engineering, the bridge structures are abstracted and classified. Considering the engineering requirements of transportation infrastructure, the bridge structures are categorized into nine major classes: Pre-stressed Component, Foundation Component, Abutment, Pier, Girder Bridge Component, Arch Bridge Component, Cable-Stayed Bridge Component, Suspension Bridge Component, and Bridge Deck and Subsidiary Engineering Component. Based on this classification, a basic structural component ontology model for bridges is established, and further subdivisions are made within each of the nine main classes. With the class hierarchy defined, conceptual relationships are set from top to bottom, and properties and instances are added to the concepts. The relationships between classes in the ontology mainly consist of Part-of and Instance-of relationships. The bridge ontology model constructed in this paper using Protégé is shown in Figure 4.

According to the structure of the bridge ontology model, a semantic network for ontology model mapping is established using Python. Each class in the ontology model corresponds to a node in the semantic network. The nodes in the semantic network need to store information for calculating similarities. The code to set up a node in the semantic network is shown as follows:

G.add_node('Abutment', p=0.27, attrs=[('component', 'string'), ('height', 'double'), ('length', 'double'), ('type', 'abutment')])

Here, G is a directed graph representing the constructed semantic network, and “p” is the occurrence probability of the node's concept, obtained from frequency statistics of concepts in the BIM models. The property groups required for calculating Feature-based similarity are stored as pairs in the “attrs” list of each node.

In the semantic network, nodes are connected by directed edges, and each edge is given a weight according to the relationship between the two nodes. For example, “Bridge” and “Abutment” are linked by a “Part-of” relationship. The code for establishing this edge in Python is as follows:

G.add_edge('Bridge', 'Abutment', rel='Part-of', weight=0.5)

Once the semantic network is established, the calculation of Edge-Counting, Feature-based, and Information-Content similarities can be performed after selecting two concept nodes from the semantic network.
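To make the role of the network concrete, the sketch below builds a tiny tree-like semantic network and looks up the depth and lowest common ancestor that the similarity measures rely on. The paper does not name the graph library it uses, so `SemanticNetwork` is a hypothetical pure-Python stand-in that only mirrors the `add_node` and `add_edge` call shapes shown above. The “Pier” and “Pier column” edges are invented for illustration, and edges are assumed to point from parent to child.

```python
class SemanticNetwork:
    """Minimal stand-in for a directed graph with node and edge attributes."""

    def __init__(self):
        self.nodes = {}    # node name -> attribute dict
        self.edges = {}    # (parent, child) -> attribute dict
        self.parent = {}   # child -> parent (tree-like structure)

    def add_node(self, name, **attrs):
        self.nodes.setdefault(name, {}).update(attrs)

    def add_edge(self, parent, child, **attrs):
        self.nodes.setdefault(parent, {})
        self.nodes.setdefault(child, {})
        self.edges[(parent, child)] = attrs
        self.parent[child] = parent

    def path_to_root(self, name):
        """Return [name, parent, grandparent, ..., root]."""
        path = [name]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path

    def depth(self, name):
        """Number of edges between the node and the root."""
        return len(self.path_to_root(name)) - 1

    def lca(self, a, b):
        """Lowest common ancestor of a and b, or None if they share none."""
        ancestors_of_a = set(self.path_to_root(a))
        for node in self.path_to_root(b):
            if node in ancestors_of_a:
                return node
        return None


G = SemanticNetwork()
G.add_node("Abutment", p=0.27, attrs=[("component", "string"), ("height", "double")])
G.add_edge("Bridge", "Abutment", rel="Part-of", weight=0.5)
# The two edges below are hypothetical and only make the example non-trivial.
G.add_edge("Bridge", "Pier", rel="Part-of", weight=0.5)
G.add_edge("Pier", "Pier column", rel="Part-of", weight=0.5)

print(G.lca("Abutment", "Pier column"), G.depth("Pier column"))  # Bridge 2
```

The depth difference feeds the depth weight of the Information-Content similarity, and the lowest common ancestor supplies the shared concept whose information content appears in its numerator.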

4.3. Model Training and Testing

The calculation model uses the BP neural network to compute the comprehensive semantic similarity by weighting the Edge-Counting, Feature-based, and Information-Content similarities, thereby avoiding the subjectivity of manual weighting. To make the weight settings sound and accurate, the model is trained on a training dataset, and the SHO algorithm is employed during training to optimize the BP neural network [26]. The specific flowchart is illustrated in Figure 5.

Within the ontology model, certain concepts may have multiple meanings or interpretations. Take the concept of “bridge pier” as an example: it is relevant to various types of bridges, but the piers of different bridges have distinct meanings, functions, and structures. To address this issue, each distinct meaning is created as an instance in the ontology model. When calculating the similarity between a regular concept A and a concept B with multiple meanings, the comprehensive similarity between concept A and each instance of concept B is computed independently. Weights are then assigned according to the relationship between each meaning and concept B, with a stronger relationship leading to a larger weight. Finally, the individual similarities are combined using these weights to obtain the comprehensive similarity between concept A and concept B.
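The handling of concepts with multiple meanings amounts to a weighted combination of per-instance similarities. The following sketch shows one way to express it; the function name, the normalization of the weights so that they sum to 1, and the sample values are assumptions, since the paper does not specify how the weights are scaled.

```python
def multi_meaning_similarity(sim_to_instances, relevance):
    """Weighted average of similarities between concept A and each meaning of B.

    sim_to_instances: comprehensive similarity between A and each instance of B.
    relevance: non-negative relevance of each instance to B (same order).
    """
    if len(sim_to_instances) != len(relevance):
        raise ValueError("one relevance score is needed per instance")
    total = sum(relevance)
    if total <= 0:
        raise ValueError("relevance scores must sum to a positive value")
    # Normalizing by the total relevance is an assumption made for this sketch.
    return sum(s * r for s, r in zip(sim_to_instances, relevance)) / total


# Hypothetical values: a regular concept compared with three meanings of "bridge pier".
sims = [0.42, 0.35, 0.30]
relevance = [3.0, 2.0, 1.0]
print(round(multi_meaning_similarity(sims, relevance), 4))
```

Because the weights are normalized, the combined value always lies between the smallest and largest per-instance similarity.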

In Figure 5, the similarity evaluation values are semantic similarity scores determined by domain experts, which are considered as the target for prediction. By arranging and evaluating the similarity of various concepts in pairs within the bridge ontology model, multiple sets of data are obtained. Among them, 300 sets of data are selected, with 270 sets used as training samples to train the BP neural network model, and the remaining 30 sets used as a test set to assess the accuracy of the experimental results. The training regression curve of the BP neural network is depicted in Figure 6, showing a high degree of fit with the actual data points and demonstrating excellent predictive capabilities.
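The 270/30 division of the samples can be sketched as follows. The paper does not state how samples were assigned to the two sets, so the random shuffle, the fixed seed, the function name, and the synthetic placeholder samples are all assumptions made for illustration.

```python
import random


def split_samples(samples, n_train=270, seed=0):
    """Shuffle a copy of the samples and split it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder samples: (edge_sim, feature_sim, info_sim, expert_score).
# The values are synthetic; the real data come from the bridge ontology.
rng = random.Random(42)
samples = [(rng.random(), rng.random(), rng.random(), rng.random()) for _ in range(300)]

train, test = split_samples(samples)
print(len(train), len(test))  # 270 30
```

Each sample pairs the three similarity values used as network inputs with the expert evaluation value used as the training target.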

After inputting the three types of similarity data of the testing samples into the trained BP neural network, the comprehensive ontology semantic similarity output results were obtained. The comparison curve and error curve between the results obtained by SHO-BP algorithm and the similarity evaluation values are shown in Figure 7. The neural network’s predicted results exhibited a small error compared to the evaluation values, demonstrating high accuracy.

4.4. Comparison and Analysis

In order to validate the effectiveness of the SHO-BP algorithm, this study applies other computational methods to predict semantic similarity for the same set of test samples. Pearson correlation coefficient analysis and error analysis are conducted to compare the performance of these methods. Considering that there are currently no specific ontology-based semantic similarity calculation methods tailored for the domain of transportation infrastructure, we selected and compared our proposed method with five commonly used ontology-based semantic similarity calculation methods. The five computational methods are as follows: Whale Optimization Algorithm optimized BP neural network method (WOA-BP) [27], Improved Edge-Counting method [14], Improved IC (Information-Content) method [28], Hierarchical Ontology Model (HOM) [29] method, and HowNet [30] method. These five methods possess distinct characteristics, all of which can be employed to measure the semantic association among concepts within a domain. The partial comparison results obtained from different methods are presented in Table 1, where the similarity evaluation values serve as the prediction targets.

As shown in Table 1, the results obtained by the SHO-BP algorithm are closer to the evaluation values compared to other methods. Both SHO-BP and WOA-BP utilize optimization algorithms to improve the performance of the BP neural network. While the results of the SHO-BP and the WOA-BP are relatively close, the SHO-BP algorithm demonstrates better performance by achieving results that are even closer to the evaluation values.

We also compared the Pearson correlation coefficients and errors obtained from different calculation methods, as shown in Table 2 and Table 3. The Pearson coefficient is used to measure the linear correlation between results and evaluation values, where a higher Pearson coefficient indicates better algorithm performance.
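For reference, the Pearson coefficient can be computed as in the sketch below. The input values are the ten concept pairs listed in Table 1, which are only part of the 30 test samples, so the printed value is not expected to reproduce the coefficient reported in Table 2. The function name is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Evaluation values and SHO-BP results for the ten pairs shown in Table 1.
evaluation = [0.48475, 0.73087, 0.19581, 0.21445, 0.93627,
              0.41634, 0.50807, 0.96123, 0.84485, 0.32018]
sho_bp = [0.41420, 0.74099, 0.21033, 0.20296, 0.92994,
          0.41988, 0.50355, 0.92336, 0.74868, 0.29312]

print(round(pearson(evaluation, sho_bp), 4))
```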

As shown in Table 2 and Table 3, the SHO-BP algorithm exhibits the highest Pearson correlation coefficient, indicating a strong correlation with the evaluation values, and it achieves the smallest error on all three metrics. Although the improved Edge-Counting method can achieve good results for some concepts, its large average error indicates that its overall performance is poor. While the maximum error of the WOA-BP algorithm is close to that of the SHO-BP algorithm, the two algorithms differ more clearly in the average error and the standard deviation.
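The error statistics can be sketched in the same way. The snippet assumes that the errors are absolute differences between a method's results and the evaluation values and that the population standard deviation is used; the paper states neither explicitly. The data are again the ten pairs listed in Table 1, so only the maximum error can be expected to agree with Table 3, and only when the worst pair is among those ten.

```python
import statistics


def error_summary(predicted, reference):
    """Maximum, mean, and standard deviation of the absolute errors."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return max(errors), statistics.mean(errors), statistics.pstdev(errors)


# Evaluation values and results for the ten pairs shown in Table 1.
evaluation = [0.48475, 0.73087, 0.19581, 0.21445, 0.93627,
              0.41634, 0.50807, 0.96123, 0.84485, 0.32018]
sho_bp = [0.41420, 0.74099, 0.21033, 0.20296, 0.92994,
          0.41988, 0.50355, 0.92336, 0.74868, 0.29312]
woa_bp = [0.38288, 0.75072, 0.21257, 0.19764, 0.88600,
          0.42420, 0.49456, 0.88468, 0.74284, 0.27865]

max_sho, mean_sho, std_sho = error_summary(sho_bp, evaluation)
max_woa, mean_woa, std_woa = error_summary(woa_bp, evaluation)
print(round(max_sho, 5), round(max_woa, 5))  # 0.09617 0.10201
```

Both maxima come from the pair “Steel box girder” and “Composite steel box girder” and coincide with the maximum errors listed for SHO-BP and WOA-BP in Table 3, which supports the absolute-error reading for that column.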

In the aforementioned experiments, the results of the SHO-BP algorithm and the WOA-BP algorithm are very close. To further compare the optimization performance of SHO and WOA algorithms on the BP neural network, standard test functions are used to evaluate the optimization effects of both algorithms. The convergence curves of the two algorithms under four test functions are shown in Figure 8, Figure 9, Figure 10 and Figure 11, and the formulas of the test functions are given by Equations (19)–(22).

$$f_1(x) = \sum_{i=1}^{30} x_i^2, \quad -100 \le x_i \le 100 \tag{19}$$

$$f_6(x) = \sum_{i=1}^{30} \left(\left|x_i + 0.5\right|\right)^2, \quad -100 \le x_i \le 100 \tag{20}$$

$$f_9(x) = \sum_{i=1}^{30} \left[x_i^2 - 10\cos(2\pi x_i) + 10\right], \quad -5.12 \le x_i \le 5.12 \tag{21}$$

$$f_{10}(x) = -20\exp\left(-0.2\sqrt{\frac{1}{30}\sum_{i=1}^{30} x_i^2}\right) - \exp\left(\frac{1}{30}\sum_{i=1}^{30}\cos(2\pi x_i)\right) + 20 + e, \quad -32 \le x_i \le 32 \tag{22}$$
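The four test functions translate directly into Python, as sketched below. The function names follow the figure captions, so the fourth function is called `f10` as in the caption of Figure 11. Equation (20) is implemented with the absolute value as it is written here, and the additive constant in Equation (22) is taken to be Euler's number, as in the standard Ackley function; both points are assumptions about the intended definitions.

```python
import math


def f1(x):
    """Sphere function, Equation (19)."""
    return sum(xi ** 2 for xi in x)


def f6(x):
    """Equation (20), using the absolute value as written in the text."""
    return sum(abs(xi + 0.5) ** 2 for xi in x)


def f9(x):
    """Rastrigin function, Equation (21)."""
    return sum(xi ** 2 - 10 * math.cos(2 * math.pi * xi) + 10 for xi in x)


def f10(x):
    """Ackley function, Equation (22), with the additive constant taken as e."""
    n = len(x)
    mean_sq = sum(xi ** 2 for xi in x) / n
    mean_cos = sum(math.cos(2 * math.pi * xi) for xi in x) / n
    return -20 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20 + math.e


origin = [0.0] * 30
print(f1(origin), f6(origin), f9(origin), f10(origin))
```

Under these assumptions `f1`, `f9`, and `f10` reach their minimum of zero at the origin, while `f6` is minimized at $x_i = -0.5$, which makes the functions convenient for checking how closely an optimizer approaches a known optimum.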

The convergence curves show that the SHO algorithm converges rapidly on the four test functions and approaches the optimal solution more quickly than the WOA algorithm. Together with the experimental data above, these findings indicate that the SHO-BP algorithm achieves superior performance when computing the ontology-based semantic similarity of transportation infrastructure.

5. Conclusions

As transportation infrastructure becomes increasingly informatized and digitized, there is a growing demand for digital, information-based, and intelligent software in transportation engineering projects. The construction of transportation infrastructure is a multidisciplinary process, so the software used must be able to integrate information from various domains. This paper proposes a method for calculating the ontology-based semantic similarity of transportation infrastructure concepts, which enables the computation of semantic correlations between different concepts within the transportation infrastructure domain. The method can facilitate the alignment of information from diverse data sources during information integration, enabling the integrated management and utilization of data. As a typical form of transportation infrastructure, bridges were chosen as the subject of our experiments, and similarity calculation experiments were conducted using bridge-related concepts. The results show that the SHO-BP algorithm yields the smallest maximum error, average error, and standard deviation of error among the algorithms assessed. Furthermore, the correlation coefficient between its results and the evaluation values is higher than that of the other algorithms, confirming the superiority of the SHO-BP algorithm.

However, the experiments also show that there is a significant relative error when comparing the similarity of two concepts with low ontology similarity. This phenomenon might be due to the fact that the majority of concepts in the bridge ontology exhibit relatively high similarity, and concepts with low similarity are underrepresented in the dataset. The aforementioned issue could potentially impact the future integration of information in transportation infrastructure engineering projects related to BIM systems. Therefore, further research is still required to investigate the ontology structure and the computation methods for semantic similarity of transportation infrastructure. Moreover, the calculation model constructed in this study demonstrates adaptability to complex ontology structures, enabling comprehensive utilization of information within ontology models in the field of transportation infrastructure. This implies the potential applicability of this approach to other domains characterized by intricate ontology structures, such as transportation, urban planning, and hydraulic engineering, among others. These domains’ ontology models share similarities with the structure of transportation infrastructure and exhibit certain levels of relevance to it. Future research can verify the reliability of this method in other domains, thereby expanding the applicability of this computational approach.

Author Contributions

Conceptualization, T.B. and J.Y.; methodology, T.B., Z.Z. and H.Z.; software, T.B. and Z.Z.; validation, T.B., H.Z. and J.Y.; formal analysis, T.B. and J.Y.; investigation, T.B. and H.Z.; resources, T.B., H.Z. and Z.Z.; data curation, T.B. and H.Z.; writing—original draft preparation, T.B. and J.Y.; writing—review and editing, T.B., H.Z. and J.Y.; visualization, T.B. and Z.Z.; supervision, K.C. and J.Y.; project administration, Q.A.; funding acquisition, Q.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Project of China (2021YFB2600302).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

References

1. Yang, A.; Han, M.; Zeng, Q.; Sun, Y. Adopting building information modeling (BIM) for the development of smart buildings: A review of enabling applications and challenges. Adv. Civ. Eng. 2021, 2021, 8811476.
2. Olawumi, T.O.; Chan, D.W. Building information modelling and project information management framework for construction projects. J. Civ. Eng. Manag. 2019, 25, 53.
3. Pauwels, P.; Terkaj, W. EXPRESS to OWL for construction industry: Towards a recommendable and usable ifcOWL ontology. Autom. Constr. 2016, 63, 100–133.
4. Aouicha, M.B.; Hadj Taieb, M.A.; Hamadou, A.B. Taxonomy-based information content and wordnet-wiktionary-wikipedia glosses for semantic relatedness. Appl. Intell. 2016, 45, 475–511.
5. Sánchez, D.; Batet, M.; Isern, D.; Valls, A. Ontology-based semantic similarity: A new feature-based approach. Expert Syst. Appl. 2012, 39, 7718–7728.
6. Kulmanov, M.; Smaili, F.Z.; Gao, X.; Hoehndorf, R. Semantic similarity and machine learning with ontologies. Brief. Bioinform. 2021, 22, bbaa199.
7. Batet, M.; Sánchez, D.; Valls, A.; Gibert, K. Semantic similarity estimation from multiple ontologies. Appl. Intell. 2013, 38, 29–44.
8. Lu, W.; Qin, Y.; Qi, Q.; Zeng, W.; Zhong, Y.; Liu, X.; Jiang, X. Selecting a semantic similarity measure for concepts in two different CAD model data ontologies. Adv. Eng. Inform. 2016, 30, 449–466.
9. Sathiya, B.; Geetha, T. A review on semantic similarity measures for ontology. J. Intell. Fuzzy Syst. 2019, 36, 3045–3059.
10. Liu, H.; Lu, M.; Al-Hussein, M. Ontology-based semantic approach for construction-oriented quantity take-off from BIM models in the light-frame building industry. Adv. Eng. Inform. 2016, 30, 190–207.
11. Niknam, M.; Karshenas, S. A shared ontology approach to semantic representation of BIM data. Autom. Constr. 2017, 80, 22–36.
12. Zhou, Z.; Goh, Y.M.; Shen, L. Overview and analysis of ontology studies supporting development of the construction industry. J. Comput. Civ. Eng. 2016, 30, 04016026.
13. Park, H.; Shin, S. A Proposal for Basic Formal Ontology for Knowledge Management in Building Information Modeling Domain. Appl. Sci. 2023, 13, 4859.
14. Yang, D.; Powers, D.M. Measuring Semantic Similarity in the Taxonomy of WordNet; Australian Computer Society: Sydney, Australia, 2005.
15. Zhang, Z.; Tian, S.; Liu, H. Compositive approach for ontology similarity computation. Comput. Sci. 2008, 35, 142–145.
16. Couto, F.M.; Silva, M.J. Disjunctive shared information between ontology concepts: Application to Gene Ontology. J. Biomed. Semant. 2011, 2, 5.
17. Zhao, Y.; Wei, X.; Liu, Y.; Liao, Z. A Reputation Model of OSM Contributor Based on Semantic Similarity of Ontology Concepts. Appl. Sci. 2022, 12, 11363.
18. Zhao, C.; Cai, A. The similarity calculation of concept names. In Proceedings of the 2016 2nd International Conference on Cloud Computing and Internet of Things (CCIOT), Dalian, China, 22–23 October 2016; pp. 93–96.
19. Xu, F.; Ye, X.; Li, L.; Cao, J.; Wang, X. Comprehensive calculation of semantic similarity of ontology concept based on SA-BP. Comput. Sci. 2020, 47, 199–204.
20. Wang, L.; Zhang, F.; Du, Z.; Chen, Y.; Zhang, C.; Liu, R. A hybrid semantic similarity measurement for geospatial entities. Microprocess. Microsyst. 2021, 80, 103526.
21. Zhang, L.; Wang, F.; Sun, T.; Xu, B. A constrained optimization method based on BP neural network. Neural Comput. Appl. 2018, 29, 413–421.
22. Dhiman, G.; Kaur, A. Spotted hyena optimizer for solving engineering design problems. In Proceedings of the 2017 International Conference on Machine Learning and Data Science (MLDS), Noida, India, 14–15 December 2017; pp. 114–119.
23. Chandrasekaran, D.; Mago, V. Evolution of semantic similarity—A survey. ACM Comput. Surv. (CSUR) 2021, 54, 1–37.
24. Altınel, B.; Ganiz, M.C. Semantic text classification: A survey of past and recent advances. Inf. Process. Manag. 2018, 54, 1129–1153.
25. Taieb, M.A.H.; Aouicha, M.B.; Hamadou, A.B. Ontology-based approach for measuring semantic similarity. Eng. Appl. Artif. Intell. 2014, 36, 238–261.
26. Dhiman, G.; Kumar, V. Spotted hyena optimizer for solving complex and non-linear constrained engineering problems. In Harmony Search and Nature Inspired Optimization Algorithms: Theory and Applications, ICHSA 2018; Springer: Singapore, 2019; pp. 857–867.
27. Aljarah, I.; Faris, H.; Mirjalili, S. Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput. 2018, 22, 1–15.
28. Zhang, P.; Qi, J.H.; Wu, M. An Ontology Concept Update Method Based on Hybrid Semantic Similarity. In Proceedings of the 2nd International Conference on Mechanical Engineering, Industrial Materials and Industrial Electronics (MEIMIE), Dalian, China, 29–30 March 2019; pp. 232–240.
29. Huang, H.; Liu, Z.; Zhang, W. Research on calculating semantic similarity based on HOM. Syst. Eng. Electron. 2009, 31, 1750–1754.
30. Bai, J.; Bu, Y. An improved algorithm for semantic similarity based on HowNet. In Proceedings of the 2018 2nd International Conference on Data Science and Business Analytics (ICDSBA), Changsha, China, 21–23 September 2018; pp. 65–70.

Figure 1. The structure of BP neural network (red dotted line represents the direction of back propagation).

Figure 2. Flowchart of the SHO-BP algorithm.

Figure 3. Flowchart of comprehensive semantic similarity calculation.

Figure 4. Bridge ontology model in Protégé.

Figure 5. Flowchart of model training and testing (the concepts involved in training and testing in the figure are examples).

Figure 6. Regression curves for training and testing sets.

Figure 7. The SHO-BP algorithm result and evaluation comparison curve (above), error curve (below).

Figure 8. Convergence curve under test function $f_1(x)$.

Figure 9. Convergence curve under test function $f_6(x)$.

Figure 10. Convergence curve under test function $f_9(x)$.

Figure 11. Convergence curve under test function $f_{10}(x)$.

Table 1. Semantic similarity calculated in different methods (the semantic similarity closest to the evaluation value within each row is highlighted in bold).

Concept Pair	Evaluation Value	SHO-BP	WOA-BP	Improved Edge-Counting	Improved IC	HOM	HowNet
Bridge deck / Bridge bearing	0.48475	0.41420	0.38288	0.34014	0.40324	0.39224	0.65855
Concrete box girder / Concrete T-beam	0.73087	0.74099	0.75072	0.70862	0.69431	0.74978	0.57454
Abutment / Arch segment	0.19581	0.21033	0.21257	0.12870	0.22043	0.16745	0.15145
Transverse beam / Suspender cable	0.21445	0.20296	0.19764	0.17351	0.20922	0.17601	0.35266
Arch segment / Steel frame arch	0.93627	0.92994	0.88600	0.95768	0.78764	0.97456	0.46754
Tower column / Steel anchor box	0.41634	0.41988	0.42420	0.34014	0.40220	0.39520	0.37844
Bridge / Bridge deck	0.50807	0.50355	0.49456	0.47006	0.54469	0.46456	0.63891
Girder / Concrete box girder	0.96123	0.92336	0.88468	0.99707	0.86346	0.80079	0.56754
Steel box girder / Composite steel box girder	0.84485	0.74868	0.74284	0.70862	0.70563	0.73213	0.60877
Foundation Component / Drilled pile	0.32018	0.29312	0.27865	0.23401	0.28711	0.25655	0.35875

Table 2. Pearson correlation coefficient of results obtained by different calculation methods.

Method	SHO-BP	WOA-BP	Improved Edge-Counting	Improved IC	HOM	HowNet
Pearson correlation coefficient	0.9236	0.9074	0.8676	0.84015	0.7234	0.5721

Table 3. The maximum, average, and standard deviation of the errors of different methods (the smallest value in each column is highlighted in bold).

Method	Maximum Error	Average Error	Standard Deviation of Error
SHO-BP	0.09617	0.02379	0.02136
WOA-BP	0.10201	0.02880	0.02879
Improved Edge-Counting	0.20158	0.05520	0.04035
Improved IC	0.14863	0.06452	0.06162
HOM	0.16044	0.03171	0.03592
HowNet	0.46873	0.18268	0.18007

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
