Article
Peer-Review Record

Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology Using Large Language Models—A Case in Optimizing Intermodal Freight Transportation

Smart Cities 2024, 7(5), 2392-2421; https://doi.org/10.3390/smartcities7050094
by Jose Tupayachi 1, Haowen Xu 2,*, Olufemi A. Omitaomu 2, Mustafa Can Camur 1, Aliza Sharmin 1 and Xueping Li 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 13 May 2024 / Revised: 2 August 2024 / Accepted: 9 August 2024 / Published: 31 August 2024
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This is a very interesting paper on automatic generation of ontologies. The methodological aspects are excellently presented, the used tools are well-motivated and described. I recommend including a discussion on similarities and differences between ontologies and knowledge graphs as a motivation for concentrating on ontology tools. 

The paper provides a very clear overview of the workflow, but I find the explanation of the use case too brief. As it can be seen as a demonstration of the presented methodology, it requires much more specific freight transportation details. The current version does not contain any image of the freight transportation ontology!

I would like to see specific examples within the individual steps such as examples of the used data sets, identified terms and vocabularies and specifically relationships, the derived transportation network, examples of optimisation, etc. 

According to the presented methodology, the graph data model is derived from the FAF data set manual. Can it be derived from maps? How can the information from maps be taken into consideration? Computations on graph will require lots of metrics. How have they been derived?

What kind of database is used for the ontologies, and what is the spatial schema? Please provide a chart; e.g., a UML diagram can be handier than RDF or OWL listings.

It would be valuable to include a specific reasoning example, e.g. selection of path under specific constraints. Discuss how the ontology reasoning would be visualised? Provide examples with the failures as discussed in the section for future work. 

This paper needs a section analysing the current development. It is unclear what has been realised as implementation and how complex queries can be executed.  

 

Author Response

Comment 1:  

This is a very interesting paper on automatic generation of ontologies. The methodological aspects are excellently presented, the used tools are well-motivated and described. I recommend including a discussion on similarities and differences between ontologies and knowledge graphs as a motivation for concentrating on ontology tools.  

Response: Thank you for your kind comments and recommendations. Ontologies and knowledge graphs are closely related concepts with different focuses: domain conceptualization versus information integration and management. Our method leverages LLMs to create a scenario-based ontology from research articles and the manuals of domain datasets and simulations. Subsequently, knowledge graphs are derived directly from the ontology using knowledge engineering software and tools. In the revised manuscript, we have added a discussion, with references, that explains the similarities and differences between ontologies and knowledge graphs. The revised text is highlighted in red in this revision. Please see the highlighted text in the second paragraph of Section 2 and in the second paragraph of Subsection 2.1.

  

Comment 2:  

The paper provides a very clear overview of the workflow, but I find the explanation of the use case too brief. As it can be seen as a demonstration of the presented methodology, it requires much more specific freight transportation details. The current version does not contain any image of the freight transportation ontology!

Response: We appreciate your comments. In the revised manuscript, we present the details of the freight transportation ontology and knowledge graph in the newly added Figures 6, 7, and 8 in the newly added Subsections 4.2-4.4. Figure 10 shows how this set of ontologies is used for optimization: the ontologies guide the retrieval of information that is ultimately used for the end goals of CO2 reduction and route optimization (shortest path). Please see the newly added Subsection 4.6 in this revision.

 

Comment 3:  

I would like to see specific examples within the individual steps such as examples of the used data sets, identified terms and vocabularies and specifically relationships, the derived transportation network, examples of optimisation, etc.  

Response: In the context of the intermodal transportation case study, we identified three relevant publications (as PDF files) and used them as inputs to our analytical pipeline. This raw information is processed with the proposed method and converted into text, which supplies the LLM with the data needed to perform the ontology generation. In addition, Figure 10 shows an example of optimization that ties together the theoretical examples of the ontology and shows its ultimate usage.

We provide specific examples of the intermediate outputs in a step-by-step fashion through the newly added Figures 5 and 10 in the revised manuscript. Please see the newly added Subsection 4.6 in the revised manuscript. Specific examples of the AI-generated ontology and knowledge graph are depicted in the newly added Figures 6, 7, and 8 in this revision.

 

Comment 4:  

According to the presented methodology, the graph data model is derived from the FAF data set manual. Can it be derived from maps? How can the information from maps be taken into consideration? Computations on graph will require lots of metrics. How have they been derived?

 

Response: In the context of the intermodal freight transportation case studies, some required GIS information can be derived from maps. For example, the topologies (connectivity) of transportation networks (e.g., road and rail) can be downloaded and extracted from the OpenStreetMap website.  

Many cartographic maps are created in raster formats (e.g., bitmaps and PNGs) and delivered through web-based map engines (e.g., Google Maps, ArcGIS Online, and Leaflet) via web map services. To extract transportation network information from these cartographic maps, we need to employ deep learning algorithms and digital image processing techniques to (a) segment and vectorize the network and (b) extract text labels from raster-based maps. Once the network is stored in a vector format (e.g., shapefiles), we can use various spatial analytics tools (e.g., ArcPy, GDAL, and QGIS) to extract the topology and spatially derive and join properties to the network components (e.g., edges and nodes). 

Computing on the graph requires numerous metrics. In our case study, we derived the required transportation network from the Origin-Destination (OD) pairs documented in the FAF data. The resulting network (represented by a graph) describes the simplest form of the connectivity of roads, rails, and rivers across the US, with graph components (e.g., nodes as cities and edges as routes) serving as building blocks to which the attributes and metrics necessary for decision optimization are joined. To reduce the computational load, we store the graph in a Neo4j database, which is designed to handle large-scale graph data, without including non-essential edge and node attributes. Essential attributes, such as distances and slopes, are pre-calculated using spatial analysis tools and included in the graph as numerical attributes. This practice improves Neo4j’s performance in navigating the transportation network to identify possible routes (each consisting of a list of nodes and edges) between any user-defined origin and destination. Once all alternative routes are identified (with node and edge IDs), we use the AI-generated ontology to guide information searches across distributed relational databases to retrieve relevant metrics (e.g., greenhouse gas emissions, intermodal costs, and road or rail geometry) based on the requirements of the optimization.
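The route-identification step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' Neo4j implementation: the city names, edge IDs, modes, and distances are hypothetical, and a textbook Dijkstra search stands in for the graph database's shortest-path query. The graph keeps only the pre-calculated distance on each edge, mirroring the idea of storing essential numerical attributes only.

```python
import heapq

# Hypothetical edges: (edge_id, origin, destination, mode, distance_km).
# Only the essential, pre-calculated attribute (distance) is stored.
EDGES = [
    ("e1", "Knoxville", "Nashville", "road", 290.0),
    ("e2", "Nashville", "Memphis", "road", 340.0),
    ("e3", "Knoxville", "Chattanooga", "rail", 180.0),
    ("e4", "Chattanooga", "Memphis", "rail", 500.0),
    ("e5", "Nashville", "Memphis", "water", 420.0),
]


def build_adjacency(edges):
    """Build an undirected adjacency list keyed by node name."""
    adjacency = {}
    for edge_id, src, dst, _mode, dist in edges:
        adjacency.setdefault(src, []).append((dst, edge_id, dist))
        adjacency.setdefault(dst, []).append((src, edge_id, dist))
    return adjacency


def shortest_path(adjacency, origin, destination):
    """Dijkstra search; returns (total_distance, [edge_ids]) or None."""
    queue = [(0.0, origin, [])]
    settled = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return dist, path
        if node in settled:
            continue
        settled.add(node)
        for neighbor, edge_id, length in adjacency.get(node, []):
            if neighbor not in settled:
                heapq.heappush(queue, (dist + length, neighbor, path + [edge_id]))
    return None  # destination is not reachable from origin


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(shortest_path(graph, "Knoxville", "Memphis"))
```

The returned edge IDs are what a later step would use as keys when looking up metrics that are stored outside the graph.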

We also agree with your statement “Computations on graph will require lots of metrics”. We therefore adopted the “separation of concerns” paradigm to store and manage the network topology (graph connectivity), the network geometry (vector lines in shapefile and WKT formats), and the decision metrics separately in different database tools. This paradigm can significantly improve performance.

  

Comment 5: What kind of database is used for the ontologies, and what is the spatial schema? Please provide a chart; e.g., a UML diagram can be handier than RDF or OWL listings.

Response: In our case study, we use two types of databases: (a) Neo4j, a graph database, to store the transportation network topologies defined by the FAF datasets, and (b) PostgreSQL, a relational database, to store the required metrics from FTOT and other datasets to support decision optimization. The ontology and its knowledge graph are mainly used to construct the database schema and ER diagram for the PostgreSQL database and to provide essential metadata and descriptions for each metric used in decision-making. By traversing the graph stored in Neo4j, we can identify all possible routes between a user-defined origin and destination. Based on the identified routes, our ontology facilitates the discovery and retrieval of all required metrics from the PostgreSQL database through proper information mapping (between individual metrics and their associated graph components in the identified routes, such as edges and nodes). Details of how the AI-generated ontology is used to guide the database design are elaborated in Subsections 4.2 and 4.3.

This rationale is illustrated in the newly added Figure 5 and Figure 10, both of which are workflow charts, in the revised manuscript. The newly added Figure 8 presents the network topology through the Neo4j database in a form of schema that resembles the UML format.
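A minimal sketch of the ontology-guided metric lookup follows. It rests on several assumptions that are not from the manuscript: the ontology is reduced to a small dictionary mapping a metric concept to a table and column, the table names, column names, and values are hypothetical, and Python's built-in sqlite3 module stands in for PostgreSQL so the example runs without a database server.

```python
import sqlite3

# Hypothetical ontology excerpt: metric concept -> (table, column) that holds it.
ONTOLOGY = {
    "co2_emissions": ("edge_emissions", "co2_kg"),
    "transport_cost": ("edge_costs", "cost_usd"),
}


def make_demo_db():
    """Create an in-memory relational store with made-up per-edge metrics."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge_emissions (edge_id TEXT PRIMARY KEY, co2_kg REAL)")
    conn.execute("CREATE TABLE edge_costs (edge_id TEXT PRIMARY KEY, cost_usd REAL)")
    conn.executemany(
        "INSERT INTO edge_emissions VALUES (?, ?)",
        [("e1", 120.0), ("e2", 150.0), ("e3", 40.0)],
    )
    conn.executemany(
        "INSERT INTO edge_costs VALUES (?, ?)",
        [("e1", 900.0), ("e2", 1100.0), ("e3", 700.0)],
    )
    return conn


def route_metric(conn, metric, edge_ids):
    """Sum one metric over the edges of a route, located via the ontology mapping."""
    if not edge_ids:
        return 0.0
    # Table and column names come from the trusted ONTOLOGY dict, never from user input.
    table, column = ONTOLOGY[metric]
    placeholders = ",".join("?" for _ in edge_ids)
    query = (
        f"SELECT COALESCE(SUM({column}), 0.0) FROM {table} "
        f"WHERE edge_id IN ({placeholders})"
    )
    return conn.execute(query, list(edge_ids)).fetchone()[0]


if __name__ == "__main__":
    db = make_demo_db()
    route = ["e1", "e2"]  # edge IDs as they might come back from the graph query
    for name in ONTOLOGY:
        print(name, route_metric(db, name, route))
```

The point of the design is that the graph store only needs to return edge IDs, while the mapping decides where each metric lives, so metrics can be added or moved without touching the topology.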

 

Comment 6:  

It would be valuable to include a specific reasoning example, e.g. selection of path under specific constraints. Discuss how the ontology reasoning would be visualized? Provide examples with the failures as discussed in the section for future work.  

Response: We agree with your comments. Please see the newly added Section 4.6 and Figure 10 in this revision, which present a user scenario for identifying subsets of paths based on a user-defined Origin and Destination (OD). The initial path identification is based on the shortest path query enabled by the Neo4j graph database. The identified paths, represented by combinations of modes, nodes, and edges, are then used to retrieve metrics from multi-domain urban datasets using the AI-generated ontology and a relational database. The ontology serves as a data descriptor, helping the decision support system identify relevant data sources for decision metrics and support problem formulation. Each path combination, along with its aggregated metrics, is then used as input to the Gurobi optimization engine for more sophisticated reasoning.
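To make "selection of a path under specific constraints" concrete, the sketch below picks the lowest-emission candidate route that satisfies a cost budget and a time limit. It is an assumption-laden illustration: the candidate routes and all numbers are invented, and a simple filter-then-minimize step in plain Python stands in for the optimization engine, which would handle far richer formulations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A candidate route with metrics already aggregated over its edges."""
    name: str
    modes: tuple
    co2_kg: float
    cost_usd: float
    hours: float


# Hypothetical candidates; values are illustrative only.
CANDIDATES = [
    Candidate("road_only", ("road",), 270.0, 2000.0, 9.0),
    Candidate("rail_only", ("rail",), 110.0, 2400.0, 16.0),
    Candidate("road_water", ("road", "water"), 160.0, 1800.0, 30.0),
]


def select_route(candidates, max_cost_usd, max_hours):
    """Return the feasible candidate with the lowest CO2, or None if none is feasible."""
    feasible = [
        c for c in candidates
        if c.cost_usd <= max_cost_usd and c.hours <= max_hours
    ]
    if not feasible:
        return None  # the failure case: constraints exclude every route
    return min(feasible, key=lambda c: c.co2_kg)


if __name__ == "__main__":
    best = select_route(CANDIDATES, max_cost_usd=2500.0, max_hours=24.0)
    print(best.name if best else "no feasible route")
```

Returning None when no candidate is feasible gives a simple handle on the failure scenarios the reviewer asks about: a decision support system could report which constraint eliminated each route.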

 

Comment 7:  

This paper needs a section analysing the current development. It is unclear what has been realised as implementation and how complex queries can be executed.   

Response:  

Regarding current developments, we have added a new subsection to the literature review to discuss existing approaches similar to our methods. Please refer to Subsection 2.6 in this revision. For the implementation, the architecture presented in Figure 3 has been realized using Python and the Neo4j and PostgreSQL databases. We elaborate on the implementation details and their performance in this revision. Please see the newly added Subsections 4.2, 4.3, and 4.4, as well as Figure 5. Further details regarding the implementation are provided in Subsection 4.6. We can provide the repository for our prompt templates and API endpoint on request.

Reviewer 2 Report

Comments and Suggestions for Authors

The authors introduce and explore a methodology for generating ontologies for urban systems and transportation domains. The methodology described is sound and has value in practical applications in this domain. My primary concerns relate to the use of closed-source models, lack of quantitative evaluations, and lack of citations to recent literature on the subject.

+ Strengths

- There is significant value in automated ontology building for smart cities applications and the manuscript proposes a framework that could address the problem.

+ Weaknesses

- Ontology generation using LLMs has been studied in past literature, but such approaches are not referenced. See, for example, [1][2].

- There are problems regarding reproducibility and reliability when closed-source models like GPT-4 are used for implementation. It would be beneficial to report/compare the performance of the approach using open-source LLMs, such as Llama-3.

- The manuscript lacks a quantitative analysis of the proposed approach. With hallucinations being a major concern about LLMs in deployment scenarios, it is important for their limitations in the smart cities domain to be explored, and potential methods for addressing them to be studied. A manual review and numerical analysis of the results in Section 4, for example, could provide such information.

- The prompts used in the study are not mentioned in the manuscript and are important for the reproducibility of the approach.

+ Relevant Literature:

[1] Babaei Giglou, H., D’Souza, J. and Auer, S., 2023, October. LLMs4OL: Large language models for ontology learning. In International Semantic Web Conference (pp. 408-427). Cham: Springer Nature Switzerland.

[2] Caufield, J.H., Hegde, H., Emonet, V., Harris, N.L., Joachimiak, M.P., Matentzoglu, N., Kim, H., Moxon, S., Reese, J.T., Haendel, M.A. and Robinson, P.N., 2024. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, 40(3).

Comments on the Quality of English Language

+ Minor Issues:

Line 413: promotes -> prompts

Line 466: the sentence ends abruptly.

Figure 3: missing . at the end of the caption.

Author Response

Comment 1: 

The authors introduce and explore a methodology for generating ontologies for urban systems and transportation domains. The methodology described is sound and has value in practical applications in this domain. My primary concerns relate to the use of closed-source models, lack of quantitative evaluations, and lack of citations to recent literature on the subject. 

Response: We appreciate your comments. In the revised manuscript, we have added a demonstration of the prompt templates used to tune our model (Figure 5 in Subsection 4.2) and a qualitative evaluation (Table 1 in Subsection 3.3). Additionally, we have included a more comprehensive literature review discussing recent studies that utilize LLMs to construct ontologies and knowledge graphs. Please refer to Subsection 2.6 in the revised manuscript. All modified and newly added text is highlighted in red in this revision.

In terms of quantitative analysis, we reviewed the subject of ontology evaluation in the newly added Subsection 3.3.1 in this revision. Based on the existing literature, we concluded that a CQ-based qualitative evaluation is more relevant to our ontology use case, which focuses on generating a scenario-based ontology that is more subjective (defined by the objective function and research goals).

 

+ Strengths 

- There is significant value in automated ontology building for smart cities applications and the manuscript proposes a framework that could address the problem. 

+ Weaknesses 

Comment 2: 

- Ontology generation using LLMs has been studied in past literature, but such approaches are not referenced. See, for example, [1][2]. 

Response: We agree with your comment and have added a subsection to the literature review to expand our discussion on ontology and knowledge graph creation using LLMs. Additionally, we have included discussions of two more recent references. Please see Subsection 2.6, “Knowledge Engineering and LLMs,” in the revised manuscript. Please also see the newly added bullet point 3 in Subsection 2.8 in this revision, which explains how our work complements the existing work.

 

Comment 3: 

- There are problems regarding reproducibility and reliability when closed-source models like GPT-4 are used for implementation. It would be beneficial to report/compare the performance of the approach using open-source LLMs, such as Llama-3.  

Response: We agree with your concerns. A performance comparison between GPT-4 and open-source models is provided by one of the articles included in our literature review (Subsection 2.6). As our current study primarily focuses on the application of AI-generated ontologies for supporting complex urban decisions, we cite this existing research to discuss the performance comparison between different models.

Please see the reference added to the revised manuscript:  

Kommineni, V. K., König-Ries, B., & Samuel, S. (2024). From human experts to machines: An LLM supported approach to ontology and knowledge graph construction. arXiv preprint arXiv:2403.08345. 

 

 

Comment 4: 

- The manuscript lacks a quantitative analysis of the proposed approach. With hallucinations being a major concern about LLMs in deployment scenarios, it is important for their limitations in the smart cities domain to be explored, and potential methods for addressing them to be studied. A manual review and numerical analysis of the results in Section 4, for example, could provide such information. 

Response: In terms of quantitative analysis, we reviewed the subject of ontology evaluation in the newly added Subsection 3.3.1 in this revision. Based on the existing literature, we concluded that a CQ-based qualitative evaluation is more relevant to our ontology use case, which focuses on generating a scenario-based ontology that is more subjective (defined by the objective function and research goals). We added a qualitative analysis to the revised manuscript; please see Subsection 3.3.

We also discuss potential solutions for resolving hallucinations in this revision by citing recent literature. LLM hallucinations can be minimized through a matching technique that uses the vocabulary and concept definitions from existing domain ontologies to validate AI-generated content, as proposed by Caufield et al., 2024. Please see the newly added content in Section 5 in this revision. Because we are developing an LLM-powered tool that generates a scenario-based ontology to extend an existing foundational domain ontology, hallucinations are not a major concern: our pipeline can reduce them through semantic mapping and matching techniques.

 Caufield, J. H., Hegde, H., Emonet, V., Harris, N. L., Joachimiak, M. P., Matentzoglu, N., ... & Mungall, C. J. (2024). Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, 40(3), btae104. 
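The vocabulary-matching idea can be illustrated with a deliberately simplified sketch. It is not the SPIRES method and not the authors' pipeline: the reference vocabulary and generated terms are hypothetical, and approximate string matching from Python's standard difflib module stands in for semantic mapping against a real domain ontology. Terms without a close match are flagged for manual review rather than rejected outright.

```python
from difflib import get_close_matches

# Hypothetical vocabulary taken from an existing domain ontology.
REFERENCE_VOCABULARY = [
    "freight",
    "shipment",
    "intermodal terminal",
    "rail segment",
    "road segment",
    "emission factor",
]


def normalize(term):
    """Lowercase, turn underscores into spaces, and collapse whitespace."""
    return " ".join(term.lower().replace("_", " ").split())


def validate_terms(generated_terms, vocabulary, cutoff=0.8):
    """Split generated terms into matched (term -> vocabulary entry) and flagged."""
    normalized_vocab = [normalize(v) for v in vocabulary]
    matched, flagged = {}, []
    for term in generated_terms:
        hits = get_close_matches(normalize(term), normalized_vocab, n=1, cutoff=cutoff)
        if hits:
            matched[term] = hits[0]
        else:
            flagged.append(term)  # possible hallucination or a genuinely new concept
    return matched, flagged


if __name__ == "__main__":
    llm_terms = ["Rail_Segment", "Emission Factors", "Quantum Hub"]
    ok, review = validate_terms(llm_terms, REFERENCE_VOCABULARY)
    print("matched:", ok)
    print("needs review:", review)
```

A flagged term is not necessarily wrong, since a scenario-based ontology is expected to extend the foundational vocabulary; the check only narrows down what a domain expert has to inspect.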

  

Comment 5:  

- The prompts used in the study are not mentioned in the manuscript and are important for the reproducibility of the approach.  

Response: We demonstrate some of the prompts devised in this study in the context of the individual tasks of our method. Please see Figure 5 and Figure 10 in the revised manuscript. Additionally, we have uploaded our prompt base to an online repo to support the revision of the manuscript: https://github.com/ILABUTK/RECOIL_Auto_Onotology

Access to the repo can be provided on request.

 

+ Relevant Literature: 

[1] Babaei Giglou, H., D’Souza, J. and Auer, S., 2023, October. LLMs4OL: Large language models for ontology learning. In International Semantic Web Conference (pp. 408-427). Cham: Springer Nature Switzerland. 

[2] Caufield, J.H., Hegde, H., Emonet, V., Harris, N.L., Joachimiak, M.P., Matentzoglu, N., Kim, H., Moxon, S., Reese, J.T., Haendel, M.A. and Robinson, P.N., 2024. Structured prompt interrogation and recursive extraction of semantics (SPIRES): A method for populating knowledge bases using zero-shot learning. Bioinformatics, 40(3). 

 

Reviewer 3 Report

Comments and Suggestions for Authors

The authors propose a use case for Optimizing Intermodal Freight Transportation within the context of Smart Cities. The work is interesting, but several issues need to be addressed.

- Figures 4 and 5 are not clear and need to be improved.

- Include more structured information in tables and diagrams, showing relationships with technology and models. The text is dense with information, and visual resources would greatly aid in understanding the contribution.

- References from 20 years ago may not be relevant to current technologies. Replace outdated references with more recent ones and eliminate those that are no longer applicable.

- In the literature review, comparative tables between the current proposal and existing works are necessary.

Addressing these points will make the paper more coherent and accessible, making the contribution clearer and easier to follow.

Author Response

Comment 1  

The authors propose a use case for Optimizing Intermodal Freight Transportation within the context of Smart Cities. The work is interesting, but several issues need to be addressed. 

Response: Thank you for your comments.  

 

Comment 2 

- Figures 4 and 5 are not clear and need to be improved. 

Response: In this revision, we have added more figures to elaborate on our technical details and results. Please refer to the newly added Figure 5, which extends Figure 4 from the original manuscript, and the newly added Figure 10, which expands on the technical details presented in the original Figure 5.  

 

Comment 3 

- Include more structured information in tables and diagrams, showing relationships with technology and models. The text is dense with information, and visual resources would greatly aid in understanding the contribution. 

Response: We agree with your comments. In the revised manuscript, we added Figures 5-10 and Table 1 to improve the visualization of our workflow, results, intermediate results, and implementation strategies.

 

Comment 4 

- References from 20 years ago may not be relevant to current technologies. Replace outdated references with more recent ones and eliminate those that are no longer applicable. 

Response: We agree, and we have added Subsection 2.6 in the revised manuscript to review very recent literature on the relevant subjects.

 

Comment 5: 

- In the literature review, comparative tables between the current proposal and existing works are necessary. 

Response: We appreciate your comment. We have identified four relevant works that explore the LLM's capability for constructing ontologies and knowledge graphs; please see Subsection 2.6. The literature review section has been divided into multiple subsections with different subjects and focuses, making it challenging to use a single table to summarize and compare the various works. Many of the works included in our review, particularly in Subsection 2.6 regarding the use of LLMs for generating ontologies and knowledge graphs, are ongoing preprints at a preliminary stage with different focuses, making direct comparison difficult.  

 

Comment 6: 

Addressing these points will make the paper more coherent and accessible, making the contribution clearer and easier to follow. 

Response: Thank you again for your comments. We agree, and we have carefully revised our manuscript, adding a significant amount of new material to make it more coherent. Please see the newly added Subsections 2.6 and 4.2-4.6.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors reflect my comments very well. 

Author Response

Thank you for your comments.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have addressed my comments; however, I do not believe the contribution is sufficient for publication. There is no significant addition to the existing research. Additionally, the figures are difficult to read due to their high density and complexity.

Author Response

Thank you for your comment. We have conducted a thorough revision to address your comments and concerns. All changes are highlighted in red text.

Regarding our research contributions, we expanded and refined Section 3 to elaborate on knowledge gaps in the state-of-the-art literature on a case-by-case basis, with a detailed discussion presented in Subsection 3.3. Most existing studies with a similar scope are ongoing research efforts presented through preprints. None of the existing studies focuses on the same topic as our study, which developed an LLM-powered method to generate scientific ontologies using complex and multidimensional datasets to support operations research by solving a real-world urban decision optimization problem. In contrast, existing studies primarily focus on developing proofs of concept using generic text documents and on comparing the performance of different LLMs.

In this revision, we highlighted our contributions to different disciplines, including AI applications, operations science, and software engineering, in the newly added Subsection 3.4. We have also provided a detailed explanation of the knowledge gaps in contemporary studies, many of which are presented through preprints, to highlight the uniqueness of our contributions. Table 2 (on page 24) is newly added to the revised manuscript to further demonstrate our contribution to the operations research sector.

To address the complexity of the figures and tables, we have reduced the number of figures by including some as supplementary materials in a repository (with links provided in this revision) and simplified the figures and tables with overwhelming information. Please refer to Table 1 on page 17 and the modified Figures 3, 4, and 5 in this revision. We ensured that all tables and figures are now readable by revising Table 1 and enhancing the clarity of all figures.

We would also like to mention that, when this manuscript was submitted in May 2024 (with its full preprint posted on arXiv), there was very little literature focusing on the utilization of LLMs to construct scientific ontologies for complex and multidimensional datasets. Most of the preprints presenting proof-of-concept LLM-ontology applications emerged in June and July of this year and focus on applications very different from our study. Currently, most of the existing literature, which remains scarce in this field, consists of ongoing studies presented through preprints.
