## **5. Validation**

#### *5.1. Data Collection*

To verify the effectiveness of the model for predicting the risk cost of PPP projects in the VFM evaluation stage, urban rail transit PPP projects were taken as an example. We screened a total of 18 projects from the official management database that were in the implementation stage and had completed a quantitative VFM evaluation, including comprehensive risk identification and an allocation framework. According to the profile of urban rail transit projects, a total of 11 major classes were ultimately defined: three industry-specific attributes, "route length," "unit investment," and "station quantity," were added as new classes to the original ontology model in Section 4.1, and the individuals of each class were added accordingly. Then, details of the 18 projects under the 11 classes were summarized.

Specifically, for the class of "risk factors," urban rail transit PPP projects lacked a standardized description or a normative index system applicable to multiple practical projects, which hindered the establishment of an ontology model. To solve this problem, we formed a risk index system covering these 18 projects by aggregating the risks of the 18 cases, and divided all risks into two groups, by type and by occurrence stage. The whole system contained a total of 10 primary risks and 101 secondary risks, some of which also had tertiary and quaternary risks. The more layers into which the risk indexes could be subdivided, the more complete the corresponding ontology model was and the more pronounced the differences between risks became, which helped to improve the accuracy of the whole process. Since the entire risk index system is relatively large, only primary risks, some secondary risks, and subdivided risks in the ontology model are shown in Figure 4.

**Figure 4.** Risk Factor Hierarchy.

After all the information from the 18 cases was collected, we chose Dalian Metro Line 5 as the target case to be tested, and the remaining 17 projects were used as historical cases for empirical analysis. Because of the large amount of data for the 18 cases, we list only the details of Dalian Metro Line 5 in Figures 5 and 6, which show how the information of a project is built into the ontology structure. Eventually, the individual data under each property of each case correspond to a unique node in the PPP information ontology model.

#### *5.2. Similarity Calculation between Cases*

#### 5.2.1. Attribute Weighting

Before measuring the similarity between the target and historical cases, the weights of the classes should first be evaluated. Because the "return mode" of all 18 projects is "viability gap funding," the "return mode" class cannot distinguish between cases; it is temporarily removed from this validation and will be considered when more projects are available in the future. The contributions of the remaining 10 classes then follow the rules of the ID3 algorithm, where each class is assigned a distinct weight on the principle that the greater the information gain, the greater the impact on the ontology system. Results are shown in Table 1.
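
The information-gain weighting can be illustrated with a minimal Python sketch. It rests on assumptions that are not stated in this section: the target label is taken to be a discretized risk-cost level, and the weights are obtained by normalizing the gains so that they sum to 1. The toy table, the labels, and the names `entropy`, `information_gain`, and `class_weights` are hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(attribute_values, labels):
    """ID3 information gain of one attribute with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def class_weights(table, labels):
    """Assumed normalization: each weight is the gain divided by the total gain."""
    gains = {name: information_gain(values, labels) for name, values in table.items()}
    total_gain = sum(gains.values())
    return {name: gain / total_gain for name, gain in gains.items()}


# Hypothetical toy data: four projects, two ontology classes, and an
# illustrative discretized risk-cost level as the target label.
table = {
    "procurement mode": ["public bidding", "public bidding",
                         "competitive negotiation", "competitive negotiation"],
    "operation mode": ["BOT", "BOT", "BOT", "TOT"],
}
labels = ["high", "high", "low", "low"]
weights = class_weights(table, labels)
```

In this toy case "procurement mode" separates the labels perfectly, so it receives the larger weight. A class that takes the same value for every project, such as "return mode" in the 18 cases, would have zero information gain, which is consistent with its removal from the validation.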

**Table 1.** Class weights calculation.


Note: Self-organized by the authors.

**Figure 5.** Ontology Creation of Dalian Metro Line 5.

As the results show, the "risk factors" class brought the highest information gain, which meant that its weight was also the highest. "Risk factors" had the most subdivision levels, which directly determined the depth of the ontology tree. This further indicated that, as the "risk factors" class expands, its contribution to the whole ontology will increase and the risk cost prediction of VFM evaluation will become more reliable.

#### 5.2.2. Cases Similarity

Once the class weights have been calculated, the similarity between Dalian Metro Line 5 and the historical cases in terms of each attribute can be calculated based on the information ontology model.

(1) For quantitative information, take the "invest count" as an example. The maximum total project investment in the historical database was RMB 31,300.00 million and the minimum was RMB 1457.30 million. The total project investment of Dalian Metro Line 5 was RMB 17,670.50 million and that of Tianjin Metro Line 4 was RMB 18,274.61 million, so the similarity between the two was

$$sim = 1 - \frac{|17670.50 - 18274.61|}{31300.00 - 1457.30} = 0.98.$$
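
The range-normalized similarity for quantitative attributes can be checked with a short Python sketch. The function name `quantitative_similarity` is an illustrative choice; the four investment figures are the ones quoted above.

```python
def quantitative_similarity(target, historical, minimum, maximum):
    """Range-normalized similarity: 1 - |target - historical| / (max - min)."""
    return 1 - abs(target - historical) / (maximum - minimum)


# Total project investment in RMB million, as quoted in the text.
sim = quantitative_similarity(17670.50, 18274.61, 1457.30, 31300.00)
print(round(sim, 2))  # 0.98
```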

**Figure 6.** Dalian Metro Line 5 Ontology Structure Diagram.

(2) For qualitative information, all the calculations were based on the conceptual semantic similarity of the ontology.

① First, clarify the weights of the four dimensions: semantic distance, node depth, node density, and semantic coincidence. We employed a computer program to derive the similarity values of the four dimensions between Dalian Metro Line 5 and each historical case under all classes, giving a total of 14,287 data records, which were imported into SPSS version 23.0 for principal component analysis. The contribution rates of the four principal components were taken as the final weights of the four dimensions: *α* = 0.7215, *β* = 0.2410, *γ* = 0.02925, and *φ* = 0.00825.

② Measure the combined similarity under each class. For example, the "procurement mode" of Dalian Metro Line 5 was "public bidding," while Qingdao Metro Line 4 adopted "competitive negotiation." The similarities of the four dimensions under this class were 0.875, 0.5, 0.929, and 0.333, respectively, so the combined similarity of the two projects was 0.875 × 0.7215 + 0.5 × 0.2410 + 0.929 × 0.02925 + 0.333 × 0.00825 = 0.7817.

③ Evaluate the similarity of the risk sets. The target and historical cases each contained multiple risk factors, so their similarity was a comparison between two sets of concepts. This required calculating the combined similarity of each risk factor in one case against every risk factor in the other case, and then exchanging the positions of the two cases for a second round of calculation. The maximum value of every comparison was then extracted, giving a series of values whose number equaled the total number of risk factors in the two projects, and their average was the semantic similarity of the two cases under the class of "risk factors." Because of the workload and complexity, the whole process was implemented with a computer program. Table 2 shows an example of the semantic similarity of "risk factors" between Dalian Metro Line 5, which had 28 risk factors, and Qingdao Metro Line 4, which had 25 risk factors.
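
The combined similarity and the two-round comparison of risk sets can be sketched in Python as follows. This is an illustration under stated assumptions, not the program used in the study: the function names, the toy risk factors, and the pairwise values in `TOY` are hypothetical, and only the four dimension weights and the "procurement mode" example come from the text.

```python
# Dimension weights reported in the text (PCA contribution rates).
ALPHA, BETA, GAMMA, PHI = 0.7215, 0.2410, 0.02925, 0.00825


def combined_similarity(distance, depth, density, coincidence):
    """Weighted sum of the four dimension similarities for one concept pair."""
    return ALPHA * distance + BETA * depth + GAMMA * density + PHI * coincidence


def set_similarity(target_risks, historical_risks, pair_similarity):
    """Average of the best matches found in both comparison directions."""
    best_for_target = [
        max(pair_similarity(t, h) for h in historical_risks) for t in target_risks
    ]
    best_for_historical = [
        max(pair_similarity(h, t) for t in target_risks) for h in historical_risks
    ]
    best = best_for_target + best_for_historical
    return sum(best) / len(best)


# Worked example from the text: "public bidding" vs "competitive negotiation".
procurement = combined_similarity(0.875, 0.5, 0.929, 0.333)
print(round(procurement, 4))  # 0.7817

# Hypothetical pairwise combined similarities between risk factors.
TOY = {
    frozenset(("construction delay", "financing risk")): 0.4,
    frozenset(("construction delay", "demand risk")): 0.3,
    frozenset(("cost overrun", "construction delay")): 0.5,
    frozenset(("cost overrun", "financing risk")): 0.6,
    frozenset(("cost overrun", "demand risk")): 0.2,
}


def toy_pair_similarity(a, b):
    """Identical concepts score 1.0; other pairs are looked up in TOY."""
    return 1.0 if a == b else TOY[frozenset((a, b))]


risk_similarity = set_similarity(
    ["construction delay", "cost overrun"],
    ["construction delay", "financing risk", "demand risk"],
    toy_pair_similarity,
)
```

With two target risks and three historical risks, five maxima are averaged, which mirrors the 28 + 25 = 53 values averaged for Dalian Metro Line 5 and Qingdao Metro Line 4.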


**Table 2.** Similarity calculation of "risk factors" between two projects.

Note: Self-organized by the authors.

Once all the above measurements were completed, the general similarity between Dalian Metro Line 5 and each historical case was obtained by weighting the semantic similarities under all classes with the class weights calculated by the ID3 algorithm. All the general similarities were greater than 80%. Subject to the restriction of identifying no fewer than three cases, historical cases with a general similarity above 85% were selected, and five cases met the requirement. The detailed similarity values are shown in Table 3; the contribution degree of each case, determined by its general similarity, is listed in Table 4.
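
A hedged Python sketch of the retrieval step is given below. It assumes that the contribution degree of each selected case is its general similarity divided by the sum over the selected cases, and that the preliminary risk cost is the contribution-weighted average of the historical risk costs; the fallback to the top-ranked cases when fewer than three pass the threshold is also an assumption. The case names, similarities, and costs are hypothetical placeholders, not values from Tables 3 and 4.

```python
def general_similarity(class_similarities, class_weights):
    """Weight the per-class similarities by the class weights."""
    return sum(class_weights[c] * s for c, s in class_similarities.items())


def select_similar_cases(similarities, threshold=0.85, minimum_cases=3):
    """Keep cases above the threshold; fall back to the top-ranked cases
    if fewer than `minimum_cases` qualify (assumed fallback rule)."""
    ranked = sorted(similarities.items(), key=lambda item: item[1], reverse=True)
    selected = [item for item in ranked if item[1] > threshold]
    if len(selected) < minimum_cases:
        selected = ranked[:minimum_cases]
    return dict(selected)


def contribution_weights(selected):
    """Assumed rule: normalize the general similarities to sum to 1."""
    total = sum(selected.values())
    return {name: s / total for name, s in selected.items()}


def weighted_estimate(costs, weights):
    """Contribution-weighted average of the historical risk costs."""
    return sum(weights[name] * costs[name] for name in weights)


# Hypothetical general similarities and risk costs (RMB 100 million).
similarities = {"Case A": 0.90, "Case B": 0.88, "Case C": 0.86, "Case D": 0.82}
costs = {"Case A": 40.0, "Case B": 50.0, "Case C": 45.0, "Case D": 60.0}

selected = select_similar_cases(similarities)
weights = contribution_weights(selected)
estimate = weighted_estimate(costs, weights)
```

Because the selected similarities are close to each other, the resulting weights are nearly uniform, which matches the narrow spread of contribution degrees reported for the five retrieved cases.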

As presented in Table 4, the risk costs of Tianjin Metro Line 8 Phase I and Tianjin Metro Line 4 differed greatly from those of Tianjin Metro Line 7 and Tianjin Metro Line 11 Phase I, although all four projects were located in Tianjin and were highly similar to each other in every class. Additionally, no extra efforts to reduce the risk costs were detected after further study of the complete information and VFM evaluation reports of these two cases. This suggested that there was likely some bias in their forecasting process. As a result, the preliminary total risk cost and retained risk cost calculated for Dalian Metro Line 5 deviated drastically from the actual amounts of RMB 45.70 hundred million and RMB 19.28 hundred million, respectively. To address this, some revisions were required.


**Table 3.** Calculation of the similarity of each class.


**Table 4.** Weight calculation of similar cases.

Notes: Self-organized by the authors. The currency unit is RMB 100 million.

#### *5.3. Cases Revision and Result*

Equations (16) and (17) were adopted to adjust the retained risk cost and total risk cost of Tianjin Metro Line 8 Phase I and Tianjin Metro Line 4; the correction process is listed in Table 5.

**Table 5.** Revision of historical cases.


Notes: Self-organized by the authors. The currency unit is RMB 100 million.

As the final results in Table 5 show, four of the five selected similar cases of Dalian Metro Line 5 were urban rail transit PPP projects located in Tianjin, while the remaining one was Shaoxing Urban Rail Transit Line 1. Moreover, the contribution weights of Tianjin Metro Line 11 Phase I, Tianjin Metro Line 4, Tianjin Metro Line 8 Phase I, Tianjin Metro Line 4, and Shaoxing Urban Rail Transit Line 1 were 0.2049, 0.1996, 0.1987, 0.1987, and 0.1981, respectively. This showed that the four cases located in Tianjin had the highest contribution weights. Intuitively, projects located in the same district, which follow the same urban management and planning policies and have virtually the same construction technology and investment conditions, are relatively similar to each other. The retrieval mechanism based on the ontology model achieved the objective of identifying similar projects in the same district with high priority. This indicated that the ontology model realized the structured representation of project information and supported its sharing and interoperability. Meanwhile, the conceptual semantic similarity algorithm proved feasible for guaranteeing the usability of the extracted similar cases. These advantages have been validated in previous studies. Im et al. [64] enhanced the cost management efficiency of construction projects by developing an ontological knowledge structure. Xiao et al. [65] used ontological knowledge representation to improve access to information for construction noise control. In addition, ontology-based conceptual similarity has been shown to support retrieval more accurately [66]. Ontology thus provided a clear boost to the whole CBR process.

After case revision, the risk costs of Tianjin Metro Line 8 Phase I and Tianjin Metro Line 4 were more reasonable than in the original cases, and their deviations from the other two Tianjin projects were further reduced. Ultimately, the retained risk cost of Dalian Metro Line 5 was calculated to be RMB 17.15 hundred million and the total risk cost RMB 46.80 hundred million, while the actual costs measured in the VFM evaluation were RMB 19.28 hundred million and RMB 45.70 hundred million, giving relative errors of 11.05% and 2.41%, respectively. The accuracy was greatly improved in comparison with the preliminary risk cost calculation. Furthermore, the results were more plausible because the risk costs of the target case were at basically the same level as those of all the Tianjin projects. Ji et al. [46] likewise established a more sophisticated revision mechanism to improve the accuracy of housing cost estimation using CBR, and verified that an effective revision could make a great difference by comparing the results before and after revision. Fan et al. [42] used CBR to generate desirable risk response strategies and, by analyzing the strategy-risk response relationships, revised the inapplicable strategies. These studies all showed that sound revision improves the utilization and validity of case data.
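
The two relative errors can be reproduced from the figures quoted above with a few lines of Python. The helper name `relative_error` is illustrative; the error is taken relative to the actual VFM evaluation value.

```python
def relative_error(estimated, actual):
    """Absolute deviation of the estimate, relative to the actual value."""
    return abs(estimated - actual) / actual


# Costs in RMB 100 million, as quoted in the text.
retained_error = relative_error(17.15, 19.28)  # retained risk cost
total_error = relative_error(46.80, 45.70)     # total risk cost
print(f"{retained_error:.2%}, {total_error:.2%}")  # 11.05%, 2.41%
```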

## **6. Discussion**

Risk has been shown to be one of the most important drivers of VFM in the PPP field [7–10]. Research on its identification and allocation is well advanced, but cost assessment remains difficult to address. It is usually conducted based on specialists' opinions, whose subjective bias cannot be completely eliminated. The CBR-based measurement model developed in this paper compensates for this deficiency by employing the experts' problem-solving approach while removing its subjective influence [35–37]. In addition, the knowledge representation capability and hierarchical structure of ontology provide useful support for improving CBR performance [49].

In this paper, a foundational ontology structure containing eight major classes was initially established, which is applicable to all industries in the PPP area. In the validation section, three extra classes, "route length," "station quantity," and "unit investment," were added according to the features of the urban rail transit PPP industry under study, and the individuals were created based on practical cases. The results showed that the expanded ontology model performed well in the whole process, which verified the scalability of the ontology [50]. Other classes or individuals can be added to this ontology to reflect the characteristics of other PPP industries. A comprehensive knowledge system can be further improved by developing the PPP information ontology in the future.

Ultimately, the entire validation process produced results that are in line with reality. In the creation of the risk index system, 10 primary risks and 101 secondary risks were included, with some of them further subdivided, as shown in Figure 4. Thus, "risk factors" became the most detailed class of the ontology model, serving as the basis for risk cost estimation. In the future, the risk index system can be continuously refined to provide clearer knowledge for the ontology model. When the ID3 algorithm was used to weight the major classes, "risk factors," as the key to risk costs, received the highest weight, as shown in Table 1. This not only conformed to practical perception, but also showed that the ID3 algorithm could intuitively reflect the amount of information carried by each node of the tree structure [60]. The algorithm is therefore suitable for the ontology model. Moreover, the ontology-based conceptual semantic similarity provided an effective comparison method for abstract qualitative information, especially the concept set of "risk factors" [62,63]. Based on this series of complementary calculations, the final extracted cases are the most similar to the target case, as shown in Tables 4 and 5. The cases located in the same district as the target were retrieved with priority, which indicated that the established model met the computational requirements. The retrieval mechanism delivered an effective and efficient extraction of similar cases. After revision, the total risk cost and the retained risk cost of the target case were better estimated, with reasonable relative errors; therefore, we believe that the effectiveness of the whole measurement cycle was well tested. Nevertheless, there is still room for improvement in the revision method of this research. The formula-based approach may be too rigid for cases affected by chance factors. Other methods are welcome to help eliminate the contingency of cases and to promote the extensive application of the measurement model.

## **7. Conclusions**

As the PPP mode has entered a period of steady development in China, the VFM evaluation system has likewise matured, acting as a solid foundation for the construction of PPP projects. However, it remains inadequate in risk assessment, which relies heavily on domain specialists, and the corresponding academic research has poor practicality, leading to unbalanced development between academics and practitioners. To improve the accuracy and feasibility of risk cost estimation in VFM evaluation and to increase the reuse of historical PPP cases, we combine CBR and ontology technology. The ontology model structures and integrates PPP information knowledge for the CBR cycle, and the conceptual semantic similarity algorithm, built on the ontology tree structure, improves the overall efficiency of CBR. A revision mechanism is established to ensure the accuracy of the results of the entire measurement process. In addition, more objective weighting algorithms are adopted throughout the process to reduce the reliance on experts. The proposed estimation model was tested using a total of 18 urban rail transit PPP projects in the official database. Results show that the five cases most similar to the target case were efficiently extracted. The retained risk cost and total risk cost are successfully estimated, with relative errors of 11.05% and 2.41%, respectively. Therefore, it can be concluded that the VFM risk cost measurement involving ontology and CBR is feasible and reliable. Based on the historical cases, it achieves good accuracy, which increases the independence from experts in the quantitative evaluation of VFM. It further demonstrates that the sources involved in the PPP information ontology model established in this paper can accommodate the computational requirements of the whole process, strengthening the integration and interoperability of PPP information, particularly abstract qualitative information. The combination of CBR with ontology makes fuller and more efficient use of valuable information about past projects, in the way that humans solve problems. We believe that the two will work together even better as the database is expanded and updated to be more comprehensive in the future.

However, this study has several limitations that are expected to be addressed in future work. First, for an ontology to have application capabilities, it must build a consensus among users on how the world is codified [67]. To that end, more valid historical cases are required as an information source for a more complete ontology. Second, a comprehensive model that integrates more categories of projects, such as wastewater treatment, elderly care facilities, ecological construction, and the environment, is expected to be developed in order to achieve holistic knowledge management without redundancy. Third, the risk index system in the ontology model is expected to be refined in the future. We suggest that every PPP industry adopt its own specific and normalized risk index system that unifies the description of risk factors. Finally, CBR requires valid retrieval and revision [46,47]. The revision method established in this paper may be too rigid for some target cases, and the fact that the revised results still deviate from the practical situation is not yet well explained. Other methodologies are expected to be explored to assist in the revision process. Therefore, we continue to seek a better combination to improve the accuracy of results and avoid contingency in the process.

**Author Contributions:** Conceptualization, H.W.; investigation, Q.L.; supervision, Y.Z.; validation, Q.L.; writing—original draft, Q.L.; writing—review and editing, Q.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Publicly available datasets were analyzed in this study. This data can be found here: https://www.cpppc.org:8082/inforpublic/homepage.html#/projectPublic/ (accessed on 1 April 2021).

**Conflicts of Interest:** The authors declare no conflict of interest.
