**1. Introduction**

When visually impaired people want to walk safely to their destination, they must overcome many difficulties on the street. Nowadays, the most common walking aids are still guide dogs and the long cane [1]. With advances in AI technology, smaller embedded sensors enable wearable devices to effectively detect more road conditions in the surrounding environment [2]. However, in addition to sensing the surrounding environment while walking, visually impaired people also need macro-navigation that can handle a wide range of information to help plan their travel, such as ticket booking or path planning [3]. Thanks to the development of the Global Positioning System (GPS) and Geographic Information Systems (GIS), these technologies have greatly aided the development of Electronic Travel Assistance (ETA) systems such as the MOBIC Travel Aid [4], the Arkenstone system [5], and the Personal Guidance System [6]. However, the use of human–computer interaction to accurately understand the requirements of the user still needs substantial improvement [7].

For the visually impaired, voice is the best way to communicate with a system [8], that is, through Voice User Interfaces (VUI). The latest research direction in VUI is the Conversational User Interface, or Dialogue System, which is distinguished from other VUIs by simulating natural language dialogue instead of command or response interaction [9,10]. Dialogue systems have become the main way of interacting with virtual personal assistants, smart devices, wearable devices, and social robots [11]. Additionally, deep learning technology has made great contributions to dialogue systems.

**Citation:** Chen, C.-H.; Shiu, M.-F.; Chen, S.-H. Use Learnable Knowledge Graph in Dialogue System for Visually Impaired Macro Navigation. *Appl. Sci.* **2021**, *11*, 6057. https://doi.org/10.3390/app11136057

Academic Editors: Carlos A. Jara and Manuel Armada

Received: 28 February 2021; Accepted: 24 June 2021; Published: 29 June 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Dialogue systems are usually divided into two types: task-oriented and non-task-oriented systems [12]. Our concern here is a task-oriented, multi-turn dialogue system suitable for road navigation. Understanding meaning across multiple turns is the main challenge in this kind of dialogue system. This work focuses on conversation as a means to model context [13] and fully understand the user's intentions.

Understanding the context and making the right response are the main goals of the dialogue system. After parsing the input sentence [14], we propose using the knowledge graph (KG) as the knowledge base for dialogue reasoning. A KG is a way of organizing knowledge: in addition to storing information, it can use deductive or inductive methods for reasoning [15,16]. The reasoning process is how the dialogue system understands the context, and the result of this reasoning becomes the system's response. After each conversation, the system is updated, learning more about the user and providing more accurate results for future use.

Finally, based on the learnable knowledge graph in the multi-turn dialogue system and the integration of the widely used GPS and GIS [17], we developed a macroscopic walking navigation system for the visually impaired. It can be integrated with micro-navigation to help visually impaired users arrive at their destinations safely.

#### **2. Methods**

Our task-oriented dialogue system is built with a modular architecture. Each module is responsible for a specific task and passes the results to the next module. The modules are Automatic Speech Recognition (ASR), Natural Language Understanding (NLU), Dialogue State Tracking (DST), Dialogue Policy Learning (DPL), Natural Language Generation (NLG), and Text to Speech (TTS). DST and DPL are also called dialogue management. The modular architecture is shown in Figure 1.

**Figure 1.** Modular architecture of task-oriented dialogue system.

For ASR and TTS, we use the services provided by the Google Cloud Platform [18]. The functions of the other four main modules (NLU, DST, DPL, and NLG) are described in the following subsections.
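As a minimal sketch of this modular flow, the following Python stub chains the six stages in order. Every function body here is an invented placeholder (the real system uses Google Cloud ASR/TTS and the modules described below), so only the wiring between modules is meaningful:

```python
# Hypothetical stubs for the six pipeline modules of Figure 1.

def asr(audio):          # speech -> text (stub: real system uses Google Cloud ASR)
    return audio["transcript"]

def nlu(text):           # text -> semantic frame (stub)
    return {"intent": "navigate", "utterance": text}

def dst(frame, state):   # track dialogue state across turns
    return dict(state, last_intent=frame["intent"])

def dpl(state):          # choose a system action from the tracked state
    return {"action": "confirm", "slot": state["last_intent"]}

def nlg(action):         # action -> natural-language response
    return f"Do you want to {action['slot']}?"

def tts(text):           # text -> speech (stub: passthrough)
    return {"speech": text}

def dialogue_turn(audio, state):
    """One round of dialogue: ASR -> NLU -> DST -> DPL -> NLG -> TTS."""
    text = asr(audio)
    frame = nlu(text)
    state = dst(frame, state)
    action = dpl(state)
    return tts(nlg(action)), state
```

A modular architecture like this lets each stage be replaced independently, which is how the cloud services below plug into the system.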


#### *2.1. Knowledge Graph Integration*

We propose using the knowledge memory, concept conversion, and logical reasoning of the knowledge graph to perform the inference work of DST and to send the reasoning results to DPL. Our KG uses RDF triple maps to store information. A triple map is a subject–predicate–object ternary structure [19] and is currently the most mainstream way of storing knowledge graphs. For understanding NLU semantics, syntactic analysis based on X-Bar Theory [20] is used, and the analysis results are converted into triple maps, which serve as the input of the knowledge graph. Figure 2 shows the architecture of the dialogue system after integrating the knowledge graph.

**Figure 2.** Dialogue system module architecture integrating knowledge graph.
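To make the subject–predicate–object structure concrete, the following sketch stores triples as plain Python tuples and answers wildcard pattern queries. It is a toy stand-in for a real RDF store such as Apache Jena, and the example triples are illustrative:

```python
# Toy in-memory triple store: each entry is (subject, predicate, object).
triples = {
    ("Person", "Traffic", "Zhishan MRT station"),
    ("Dinner", "locate", "Restaurant"),
    ("Restaurant", "locate", "Fast food"),
}

def match(s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

For example, `match(s="Dinner")` retrieves everything known about dinner, which is the kind of lookup the reasoning in Section 2.3 builds on.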

The implementation process of the dialogue system integrating the knowledge graph is shown in Figure 3. When the user speaks their requirements, the voice is recognized as text through ASR and then passed to the sentence parser of NLU. The sentence parser uses an X-Bar-based parsing tool to convert sentences into RDF triple maps, the format accepted by our knowledge graph. Before being sent to DST for reasoning, each RDF triple is checked for confirmation semantics. If it carries confirmation semantics and the response is affirmative, the user accepts the previous suggestion and agrees to go to the location; otherwise, the suggestion is canceled. Some inputs may be ignored (e.g., when no previous suggestion exists).

**Figure 3.** Implementation flowchart of dialogue system integrating knowledge graph.
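The confirmation-semantics check applied before triples reach DST might look like the following sketch; the word lists and the return convention are assumptions made for illustration:

```python
# Illustrative confirmation check; vocabularies are invented examples.
AFFIRMATIVE = {"yes", "sure", "ok", "yeah"}
NEGATIVE = {"no", "nope"}

def handle_confirmation(utterance, last_suggestion):
    """Return ('accept'|'cancel', suggestion) for confirmation semantics,
    or None when the utterance should go on to DST for reasoning."""
    word = utterance.strip().lower()
    if last_suggestion is None:
        return None                      # no pending suggestion: ignored
    if word in AFFIRMATIVE:
        return ("accept", last_suggestion)
    if word in NEGATIVE:
        return ("cancel", last_suggestion)
    return None                          # not confirmation semantics
```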

Unconfirmed semantic triples enter the knowledge graph in DST for reasoning. The reasoning extracts the expressed intention and then decides whether this intention can be mapped to a specific type of place, such as a restaurant or a station. If so, DPL confirms with the user: "*Do you want to go to the restaurant?*". Otherwise, DPL asks the user how to deal with the intention. For example, when the user says that he is going to have dinner, since dinner is not a type of place, the system will ask whether he wants to go to a restaurant for dinner.
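The DPL decision just described can be sketched as a simple rule: if reasoning yields a known place type, confirm it; otherwise ask how to handle the intention. The place-type set and response wording here are invented for illustration:

```python
# Hypothetical place-type vocabulary; the real set lives in the KG.
PLACE_TYPES = {"Restaurant", "Station", "Supermarket"}

def dpl_response(intention, inferred_place=None):
    """Confirm a resolved place type, or ask a follow-up question."""
    if inferred_place in PLACE_TYPES:
        return f"Do you want to go to the {inferred_place.lower()}?"
    return f"Where would you like to go for {intention.lower()}?"
```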

#### *2.2. Syntax Analysis*

We use the syntax analysis of the Google Cloud Natural Language API (Google, Mountain View, CA, USA) [21] and, after obtaining dependency trees, convert them into RDF triple maps based on X-Bar Theory.

Figure 4 illustrates how dependency trees are transformed into RDF triple maps. Taking "*I want to go to Zhishan MRT station*" as an example, we take the last X-Bars in this phrase, (go to) and (Zhishan MRT station), as the predicate and object, respectively. With "*I*" as the subject, this creates the triple (Person, Traffic, Zhishan MRT station), which is then passed into the knowledge graph for further reasoning.

**Figure 4.** Dependency trees to RDF triple maps.

The process of transforming dependency trees into RDF triple maps involves further details. For example, to fit the knowledge graph, the subject "*I*" is transformed into the upper abstract subject "*Person*", and the verb "*go to*" is transformed into the synonymous predicate "*Traffic*". Because the navigation system needs to locate a specific place, and the object Zhishan MRT station has no lower-level objects in the knowledge graph, it is delivered directly to DPL for searching. This concept conversion via the knowledge graph is shown in Figure 5.

**Figure 5.** Concept conversion via knowledge graph.
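A minimal sketch of this concept conversion, with small mapping tables standing in for the knowledge-graph lookups of Figure 5 (the dictionary contents are illustrative):

```python
# Illustrative stand-ins for the KG's abstraction and synonym lookups.
UPPER_CONCEPT = {"I": "Person", "you": "Person"}          # subject abstraction
PREDICATE_SYNONYM = {"go to": "Traffic", "walk to": "Traffic"}

def to_triple(subject, verb_phrase, obj):
    """Map parsed X-Bar parts onto knowledge-graph vocabulary."""
    s = UPPER_CONCEPT.get(subject, subject)
    p = PREDICATE_SYNONYM.get(verb_phrase, verb_phrase)
    # The object is kept as-is: a leaf location such as "Zhishan MRT
    # station" has no lower-level node and is handed to DPL to search.
    return (s, p, obj)
```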

#### *2.3. Reasoning with Knowledge Graph*

The reasoning of the knowledge graph mainly revolves around reasoning over relationships: based on the facts or relationships in the graph, it infers unknown facts or relationships [22], generally focusing on three aspects, namely the entities, the relationships, and the structure of the graph. Knowledge graph reasoning techniques fall into two main categories: those based on deduction (such as description logic [23], Datalog, and production rules) and those based on induction (such as path reasoning [24], representation learning [25], rule learning [26], and reinforcement learning [27]).

This article uses induction-based path reasoning, which mainly analyzes and extracts existing information in the knowledge graph, since most of the information in the graph represents a relationship between two entities. After syntactic analysis, the user's speech is also converted into triples as input, so that the two sides can use triple maps as a communication interface.

We use the Path Ranking Algorithm (PRA) to find the most suitable destination for the user [28]. PRA learns the relation features of the knowledge graph through random walks, quantitatively measures whether a relationship exists between two nodes, and estimates the probability of that relation. The following example illustrates the application of the PRA algorithm in macro-navigation.

In this case, a visually impaired person wants to go to a restaurant for dinner but does not know which one, so he says to the navigator, "*I want to have dinner.*" The part of the knowledge graph related to dinner is shown in Figure 6.

**Figure 6.** The part of knowledge graph in the case.

Step 1: *E<sub>q</sub>* = {Restaurant, Supermarket, Online Service}, *R*<sub>1</sub> = locate. For any *e* ∈ *E<sub>q</sub>*/*R*<sub>1</sub>, assuming the scoring function *h* = 1/3, the resulting paths are shown in Figure 7.

**Figure 7.** The part of knowledge graph in step 1.

Step 2: *E<sub>q</sub>* = {Restaurant, Fast food}, *R*<sub>2</sub> = locate. Calculate *h*(Restaurant, locate, Fast food) and *h*(Restaurant, hold, Performance); obviously *h*(Restaurant, hold, Performance) = 0. For *P*<sub>1</sub>: Dinner-Restaurant-Fast food and *P*<sub>2</sub>: Dinner-Restaurant-Performance, *h*(*P*<sub>1</sub>) > *h*(*P*<sub>2</sub>).

Step 3: Continuing in the same way, the result is shown in Figure 8.

**Figure 8.** The part of knowledge graph in step 2 and step 3.

Suppose there is a path *P*: Dinner-Restaurant-Fast food- ... -Burger King A1 Store of length *n*. The score *h<sub>i</sub>* between each pair of adjacent nodes is calculated, and all *h<sub>i</sub>* are summed to obtain the score *h*(*P*) of the entire path *P*.
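The per-hop score and path score of Steps 1–3 can be sketched on the toy graph of Figure 6 as follows. The edge data and the rule that only the place relation "locate" contributes are our reading of this worked example, not a general PRA implementation:

```python
# Toy fragment of the Figure 6 graph; edges are illustrative.
graph = {
    "Dinner": {"locate": ["Restaurant", "Supermarket", "Online Service"]},
    "Restaurant": {"locate": ["Fast food"], "hold": ["Performance"]},
}

def hop_score(node, relation, target):
    # Only the place relation "locate" contributes; a hop such as
    # (Restaurant, hold, Performance) therefore scores 0, as in Step 2.
    if relation != "locate":
        return 0.0
    targets = graph.get(node, {}).get("locate", [])
    # Uniform random-walk probability over the node's "locate" neighbors,
    # e.g. h = 1/3 for the three places reachable from Dinner in Step 1.
    return 1.0 / len(targets) if target in targets else 0.0

def path_score(path):
    # The text sums the per-hop scores h_i along the path to get h(P).
    return sum(hop_score(n, r, t) for n, r, t in path)
```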

It should be noted, however, that the weight of each path is not necessarily the same. For example, the user may prefer McDonald's to Burger King, so the final score *h*(*P*) is given a weight parameter *θ*, which is a learnable parameter.

Step 4: Calculate weighted summation.

$$\mathrm{Score}(\text{Dinner-}\dots\text{-Burger King A1 Store}) = \theta_1\, h(P_1) + \theta_2\, h(P_2) + \dots + \theta_n\, h(P_n) \tag{1}$$

More generally, given a set of paths *P*<sub>1</sub>, *P*<sub>2</sub>, ... , *P<sub>n</sub>*, one can treat these paths as features of a linear model and rank answers *e* to the query *E<sub>q</sub>* by

$$
\theta_1\, h_{E_q,P_1}(e) + \theta_2\, h_{E_q,P_2}(e) + \dots + \theta_n\, h_{E_q,P_n}(e) \tag{2}
$$

The final scoring function:

$$s(e;\theta) = \sum_{P \in \mathcal{P}(q,l)} h_{E_q,P}(e)\,\theta_P \tag{3}$$

We construct the training set from the relation set *R* together with the starting points *s* and ending points *t*, and obtain the weight parameters *θ* through logistic regression. After each conversation, the weights are updated according to the user's decision, making the knowledge graph fit the user's habits more and more closely.
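The weight learning above can be sketched as a self-contained logistic regression over path features, in the spirit of Equations (2) and (3); the feature vectors and labels below are fabricated for illustration, and a production system would likely use a library implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_theta(features, labels, lr=0.5, epochs=200):
    """features[i] = [h_P1(e), h_P2(e), ...] for candidate destination i;
    labels[i] = 1 if the user accepted that destination, else 0."""
    theta = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            g = p - y                       # gradient of the log loss
            theta = [t - lr * g * xi for t, xi in zip(theta, x)]
    return theta

def score(e_features, theta):
    # Equation (3): s(e; theta) = sum over paths of h_{Eq,P}(e) * theta_P
    return sum(t * h for t, h in zip(theta, e_features))
```

After each conversation, `train_theta` can be rerun with the newly labeled example appended, which is one way to realize the per-user weight update described above.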

#### **3. Results**

The main contribution of this paper is to introduce the knowledge graph into the navigation dialogue system and apply the PRA path-ranking algorithm to find the most suitable destination for the user. We also propose a practical macro-navigation architecture, shown in Figure 9, which clearly defines the interdependence of the main modules in the dialogue system. In addition to using the Google Cloud Platform for ASR and TTS, as described above, syntactic analysis is integrated with the Google Cloud Natural Language API, and DPL uses the Google Maps API for geographic path search [29]. For the knowledge graph implementation, we use the Apache Jena triple store (Apache Software Foundation, Forest Hill, MD, USA) [30].

**Figure 9.** Macro-navigation system architecture based on knowledge graph.

After the user's voice is converted into text by ASR, it is passed to the phrase parser for syntactic analysis. In this process, the knowledge graph is used to convert phrases into triples for subsequent path reasoning. Decisions obtained by DST from the results of PRA path reasoning are executed by DPL, such as using the Google Maps API to search for real locations or notifying micro-navigation to initiate navigation. The response message produced by DPL is converted into an appropriate sentence according to the user's language and, finally, sent to TTS to produce speech, completing one round of dialogue processing.

#### *3.1. Dialogue Experiment*

Our experiments mainly verify whether the system can discuss an appropriate destination with the user. We designed three scenarios, from simple to complex, as experimental conditions. The data in the knowledge graph were created manually.

In the first scenario, the user directly names a specific destination; the example here is the Seven-Eleven Convenience Store, Xue Cheng branch. This scenario confirms that the system has the basic ability of command dialogue. The dialogue process is shown in Table 1. Because the Seven-Eleven Convenience Store, Xue Cheng branch is a leaf node of the location relationship in the knowledge graph, the system directly leads the user there.

The system requires a hotword, "*Hi, partner*", to start the dialogue, which prevents the dialogue system from being triggered too easily.

**Table 1.** Display of the dialogue process of the navigation system in the first scenario.


In the second scenario, the user speaks an indirect destination, so that the system reaches the real destination by reasoning. The example here is that the user says he wants to go to work, and the system deduces that the place where he usually works is his company's location. This scenario shows the ability for simple reasoning, and the dialogue process is shown in Table 2. Since *E<sub>q</sub>* = {Office, Engineering Building 5} and *R*<sub>1</sub> = locate has the highest score, the system advises the user to go to Engineering Building 5.

**Table 2.** Display of the dialogue process of the navigation system in the second scenario.


The third scenario verifies longer-range reasoning: the system starts from a vaguer intention and obtains the most suitable destination through multiple rounds of dialogue. The scenario here is the same as described in the previous section, and the dialogue process is shown in Table 3.


