2.2.1. School Location Evaluation Model
Based on the five major categories of influencing factors summarized earlier, we propose a knowledge-graph-based school location evaluation model that considers the topological, direction and distance relationships of territorial spatial planning, as shown in Equation (1).
where
is the final evaluation measure of a school, with a range of 0–1. The larger the value, the better the location of the school according to the model.
is the population that the school can cover regarding specific needs;
is the total number of people living in the region;
is the Euclidean distance between the school and the factory;
is the radian value of the absolute value of the wind direction minus the angle between the school and the factory;
is the K-order neighbour value between the factory and the school;
is the total number of factories in the region;
is the Harmonic centrality of the school entity in the topological relation subgraph;
is the reciprocal of the slope at which the school is located;
is the shortest road network path between other schools;
is the K-order neighbour value between two schools;
is the total number of schools in the region;
The five part weights sum to 1, as do the two sub-weights, and the terms entering Equation (1) are standardized data.
In this article, the winter north wind is used as an example. Based on research on the relative importance of each factor [7,8,9,10], the seven weights in Equation (1) are provisionally set to 0.3, 0.2, 0.25, 0.125, 0.125, 0.5 and 0.5, respectively (Table 2).
The adjacency, association and inclusion relations among points, lines and surfaces in space are described as spatial topological relations [26,27,28]. These relations are important for the storage and expression of spatial data, spatial analysis and practical application. The main entity type in territorial spatial planning is the two-dimensional geographic entity that neither covers nor overlaps other entities. Therefore, only three surface–surface topological relations (Touches, Contains and Within) are considered. Equations (2) and (3) show that the topological relationship between two different two-dimensional geographic objects can be judged by calculating the distance between them and determining whether they share a common edge [24].
Figure 5 shows the three spatial topological relationships (Touches, Within and Contains) among different two-dimensional geographic objects a, b and c. Equations (4) and (5) show the symmetry of some topological relations, which can improve the computational efficiency of determining topological relations. In Equations (2)–(5), a and b are two different two-dimensional geographic entity objects, and L is the length of their common edge. Topological relationships are treated as one type of relationship in the knowledge graph.
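The symmetry argument can be illustrated with a short sketch. It assumes, as one plausible reading of Equations (4) and (5), that Touches is symmetric and that Within and Contains are inverses of each other; the function and variable names are hypothetical.

```python
# Assumed inverse of each surface-surface relation (illustrative reading of
# the symmetry rules; not taken verbatim from the source equations).
INVERSE = {"Touches": "Touches", "Within": "Contains", "Contains": "Within"}


def complete_relations(known):
    """Given {(a, b): relation}, add the implied (b, a) entries.

    Only one direction of each pair has to be computed geometrically;
    the reverse direction is filled in from the INVERSE table.
    """
    completed = dict(known)
    for (a, b), relation in known.items():
        completed.setdefault((b, a), INVERSE[relation])
    return completed


relations = complete_relations({("a", "b"): "Touches", ("c", "a"): "Within"})
```

With this approach, each pair of plots is tested geometrically once, and the reverse relation is derived by lookup.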
The direction relationship, which describes the cardinal direction of a target object with respect to a given reference object, is an important binary spatial relation between two geographical objects and an important part of spatial reasoning [29,30]. In this article, when calculating the direction relationship between two different two-dimensional geographical objects, each object is represented by its centre point, and the direction is measured as an angle of 0–360°. The calculation method is shown in Equation (6), which takes the centre-point coordinates of the two two-dimensional geographic objects and returns the angle between them.
Figure 6 shows the direction relationship among different two-dimensional geographic objects a, b and c. The wind direction term in Equation (1) is obtained by using Equation (6) to calculate the angle between the two geographical objects, taking the absolute value of the difference between the wind direction angle and this angle, and converting the result to radians.
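A minimal sketch of the direction calculation follows. Equation (6) is not reproduced here, so the convention used (angle measured clockwise from north) and the function names are assumptions made for illustration only.

```python
import math


def bearing_deg(x1, y1, x2, y2):
    """Angle in [0, 360) from centre point (x1, y1) to centre point (x2, y2).

    Assumed convention: 0 degrees is north (+y), measured clockwise.
    """
    return math.degrees(math.atan2(x2 - x1, y2 - y1)) % 360.0


def wind_term_rad(wind_deg, angle_deg):
    """Absolute difference between wind direction and angle, in radians."""
    return math.radians(abs(wind_deg - angle_deg))


# Hypothetical example: the factory lies due east of the school, and the
# north wind is encoded as 0 degrees (an assumption of this sketch).
angle = bearing_deg(0.0, 0.0, 100.0, 0.0)
term = wind_term_rad(0.0, angle)
```

The sketch follows the text literally and does not wrap differences larger than 180°; whether the original model does so is not stated.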
Euclidean distance, Manhattan distance, network distance, etc., are often used to express spatial metric relations. The Euclidean distance in 2D and 3D spaces is the straight-line distance between two points. The network distance is the path distance or cost distance between two points based on an actual network, such as a road network [31]. Only two-dimensional space is involved in this article, so the Euclidean distance between two different two-dimensional geographical objects is calculated as the straight-line distance between their centre points.
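As an illustration, the centre-point distance could be computed as below. The sketch assumes that the centre point is the polygon centroid and that each plot is a simple polygon given as a vertex list; both are assumptions, since the text does not define the centre point precisely.

```python
import math


def polygon_centroid(vertices):
    """Centroid of a simple polygon [(x, y), ...] via the shoelace formula."""
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def centre_distance(poly_a, poly_b):
    """Euclidean distance between the centroids of two polygons."""
    return math.dist(polygon_centroid(poly_a), polygon_centroid(poly_b))


# Hypothetical plots: two 2 x 2 squares.
school = [(0, 0), (2, 0), (2, 2), (0, 2)]
factory = [(3, 4), (5, 4), (5, 6), (3, 6)]
distance = centre_distance(school, factory)
```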
In the context of school siting, the dispersion of factory pollutants is not constrained by road networks, making the Euclidean distance the more appropriate metric; the school-factory distance in Equation (1) is calculated using this method. For the traffic features of school siting, however, road networks play a crucial role, so network distance is used to represent the commuting distance between two different geographic objects. In a road network, the shortest path distance is the length of the shortest path from the starting point through the road network to the end point. The road network terms in Equation (1) are obtained by calculating the network distance in this way. The population that a school can cover is the sum of the populations of the residential areas reachable via the shortest network path within a given commuting time.
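The network-distance calculation and the covered-population sum can be sketched as follows. This is an illustration under stated assumptions: the road network is a small hypothetical weighted graph, Dijkstra's algorithm stands in for whichever shortest path routine was actually used, and the commuting limit is expressed as a path length in metres.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm; graph is {node: {neighbour: length_m}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for nxt, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def served_population(graph, school, population, max_length_m):
    """Sum the population of residential nodes reachable within the limit."""
    dist = shortest_path_lengths(graph, school)
    return sum(p for node, p in population.items()
               if dist.get(node, float("inf")) <= max_length_m)


# Hypothetical undirected road network (lengths in metres).
roads = {
    "school": {"r1": 400.0, "r3": 1500.0},
    "r1": {"school": 400.0, "r2": 500.0},
    "r2": {"r1": 500.0},
    "r3": {"school": 1500.0},
}
residents = {"r1": 1000, "r2": 2000, "r3": 500}
covered = served_population(roads, "school", residents, 1000.0)
```

The `population` values could equally hold residential area when area is used as a proxy for population.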
Two objects are K-order neighbours if travelling from one to the other requires passing through at least k adjacent objects [24]. Figure 7 shows the adjacent objects of order 1–3 of a two-dimensional geographic object a. Everything is interrelated, but nearby objects are more strongly related [32]; that is, the smaller the k between two K-order neighbour objects, the greater the influence between them, and vice versa. In this model, k is used to measure the adjacency between different objects, which represents the spatial influence between them. k1 and k2 in Equation (1) are computed via a graph shortest path algorithm on the topological subgraph.
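A minimal sketch of the K-order computation is given below. It assumes that the topological subgraph is an unweighted adjacency list built from Touches relations and that directly touching plots are order 1; breadth-first search is used here as one possible shortest path algorithm, and the names are hypothetical.

```python
from collections import deque


def k_order(adjacency, start):
    """Return {plot: k} for every plot reachable from start.

    adjacency maps each plot to the plots it touches; k is the number of
    hops on the shortest path (assumed: direct neighbours have k = 1).
    """
    order = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, ()):
            if nxt not in order:
                order[nxt] = order[node] + 1
                queue.append(nxt)
    return order


# Hypothetical chain of plots: a touches b, b touches c, c touches d.
touches = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
orders = k_order(touches, "a")
```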
Centrality is applied to determine the importance of distinct nodes in a network [33]. Common centrality measures for a single node include degree, PageRank, betweenness and closeness. Harmonic centrality [34] is a variant of closeness centrality that was proposed to solve the problem closeness centrality has with unconnected graphs. It is a way to detect nodes that can efficiently spread information through a graph. The closeness centrality of a node is the inverse of its average farness (distance) to all other nodes. Rather than summing the distances from a node to all other nodes, the harmonic centrality algorithm sums the inverses of those distances, which enables it to handle infinite distances. Nodes with a high score have short distances to all other nodes. Equation (7) gives the standardized formula of harmonic centrality, which is computed from the sum of the reciprocals of the distances from the node to every other node (excluding itself) and the number of nodes in the graph. Harmonic centrality in the topological relations subgraph of plots is used to measure how easily each plot can be accessed and to identify plots that have a critical impact. For example, harmonic centrality can be used to judge whether the locations of public services in a city are well chosen or to reselect those locations. In this model, harmonic centrality is a very important factor in evaluating the location of a school. If the harmonic centrality is large, the school can be accessed more easily and is more compatible with other plots. The harmonic centrality term in Equation (1) is computed via the harmonic centrality algorithm on the topological relations subgraph of the plots.
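The calculation can be illustrated with a short sketch. It assumes that the normalization divides the reciprocal-distance sum by the number of nodes minus one, which is the common convention but is not confirmed by the text shown here, and that every plot appears as a key in the adjacency list.

```python
from collections import deque


def harmonic_centrality(adjacency, node):
    """Normalized harmonic centrality of node in an unweighted graph.

    Unreachable nodes contribute 0 (the reciprocal of an infinite distance),
    which is why the measure is defined on unconnected graphs.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for nxt in adjacency.get(current, ()):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    n = len(adjacency)
    if n <= 1:
        return 0.0
    total = sum(1.0 / d for other, d in dist.items() if other != node)
    return total / (n - 1)  # assumed normalization


# Hypothetical subgraph: a-b-c form a chain and d is isolated.
plots = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": []}
score_b = harmonic_centrality(plots, "b")
```

In this example the middle plot b scores highest, and the isolated plot d scores 0 instead of making the calculation fail.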
After considering the five influence principles of school location selection and evaluation discussed in Section 1, the topological relationship, direction, metric relationship and proximity are integrated into the model. The slope measures the smoothness of the terrain at the school location; in this study, the average slope of a plot is used as the measurement standard. The K-order neighbourhood value and the shortest path between two schools measure whether educational resources are reasonably allocated; the K-order neighbourhood value is calculated from the topological relationships between plots. The serviceable population within a commuting time measures the population that a school can serve. In this article, the coverage range is assumed to be a 15 min walking distance from the school, and the area of the residential zones within this range is used to represent the population. The harmonic centrality of the school plots in the topological relation subgraph measures the convenience of access between schools and other plots. It is calculated on the graph structure of the territorial spatial planning knowledge graph, and factory plots are excluded from the calculation because of the negative correlation between factories and schools. The straight-line distance between a school and a factory and the difference between their direction and the wind direction measure the impact of factory emissions on the school, and the K-order neighbourhood value measures the proximity between the factory and the school. Equation (1) is a weighted linear superposition of these five parts, with the weight of each part determined by the influence of the corresponding factor on school location. The input of the model consists of the five parts and their influence weights, and the output is the final evaluation score. The larger the score, the better the evaluation.
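The weighted linear superposition can be sketched generically. The exact form of Equation (1) is not reproduced here, so the code below only illustrates the principle: the min-max standardization, the example part scores and the order in which the weights are paired with the parts are all assumptions.

```python
def min_max(values):
    """Standardize values to the range 0-1 (assumed standardization method)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def weighted_score(parts, weights):
    """Weighted linear superposition of standardized part scores."""
    if len(parts) != len(weights):
        raise ValueError("parts and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * p for w, p in zip(weights, parts))


# The five part weights quoted in the text; their pairing with the parts
# below is hypothetical.
weights = [0.3, 0.2, 0.25, 0.125, 0.125]
parts = [0.8, 0.6, 0.5, 0.4, 1.0]  # hypothetical standardized part scores
score = weighted_score(parts, weights)
```

Because each part lies in 0–1 and the weights sum to 1, the resulting score also lies in 0–1.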
The implementation of the quantitative school location evaluation model based on the territorial spatial planning knowledge graph involves the following steps: (1) extraction of land parcel entities and their attributes from territorial spatial planning data; (2) identification of relationships among land parcels, including topological, directional and metric relationships; (3) construction of the territorial spatial planning knowledge graph; (4) storage of the knowledge graph data; (5) computation, from the knowledge graph, of the formula terms needed to build the assessment model.
2.2.2. Construction of the Knowledge Graph
The knowledge graph schema is the core of the knowledge graph. It stores the conceptual model abstracted from facts [24]. First, the knowledge graph schema of territorial spatial planning, including the classes, entities, attributes and relationships, should be defined. A class is an abstract concept of geographic objects; entities are instances of classes; an attribute is a characteristic of a class; and a relationship represents how classes and entities are associated. The schema is defined as a binary system of the plot class and the plot type class. Each plot and each plot type is an instance of its own class, and each entity has all the attributes of its class. The relationship between the two classes is that a plot belongs to a plot type. The plot class and the plot type class also have certain other relations.
Knowledge acquisition is aimed at constructing knowledge graphs from unstructured text and other structured or semi-structured sources, completing an existing knowledge graph, and discovering and recognizing entities and relations [21]. It is necessary to acquire the entities and relationships in territorial spatial planning before constructing a knowledge graph.
Entity knowledge is extracted from the digital resources of the territorial spatial planning map and the mean slope map described in Section 2 to acquire the entities and their attributes. Every plot has the attributes ID, Plot Type, Mean_slope and Code. The plot serves as the main entity, with entity properties such as ID, Shape_area, Shape_length and Mean_slope. The plot type serves as the information entity for the plot entity, with entity properties such as type code and type. Table 3 shows the entities and entity attributes of the territorial spatial planning knowledge graph.
A knowledge graph is defined as G = (E, R, F), where E, R and F are sets of entities, relations and facts [21]. The triple is a general representation of a knowledge graph. The basic forms of a triple are (entity A, relation, entity B) and (entity, attribute, attribute value). In this article, the entities and their properties are represented by triples such as (a, Mean_slope, 12), which means that the average slope of a is 12°. After the entities are acquired, the relationship knowledge in the digital resources of territorial spatial planning needs to be extracted.
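The two triple forms can be illustrated with a small sketch. The `Triple` type, the helper function and the plot type name "Residential" are hypothetical; only the (a, Mean_slope, 12) example comes from the text.

```python
from typing import NamedTuple, Union


class Triple(NamedTuple):
    """A (subject, predicate, object) fact in the knowledge graph."""
    subject: str
    predicate: str
    obj: Union[str, float]


facts = [
    Triple("a", "Mean_slope", 12.0),       # (entity, attribute, attribute value)
    Triple("a", "Touches", "b"),           # (entity A, relation, entity B)
    Triple("a", "Belong", "Residential"),  # hypothetical plot type
]


def objects_of(triples, subject, predicate):
    """Return every object stored for the given subject and predicate."""
    return [t.obj for t in triples
            if t.subject == subject and t.predicate == predicate]


slope_of_a = objects_of(facts, "a", "Mean_slope")
```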
The specific association between two entities is defined as the entity relationship, which can be regarded as the edge connecting two nodes in the knowledge graph. Spatial relationships are relationships with spatial characteristics between two different spatial objects and are mainly divided into topological, directional and metric relationships. The calculation methods discussed in Section 2.2.1 are employed to acquire the various relationships between pairs of entities in the digital resources of the study area. Each plot also has a subordinate relationship with its plot type; this relationship is acquired from the land-type classification code data. The triple form is also used to store the relationships and their attributes.
The Neo4j graph database is selected to store and visualize the territorial spatial planning knowledge graph in the form of a graph. Neo4j is an efficient, open source, native NoSQL graph database based on the Property Graph Model. Its basic structure includes nodes, relationships and properties. Nodes are connected via relationships to form a network structure [35].
To store data in the Neo4j graph database, mapping rules between the schema and Neo4j's data structures need to be designed to standardize the data storage process. The mapping rules are as follows. Nodes in the Neo4j graph database represent specific entity objects in the digital resources of territorial spatial planning. Relationships in Neo4j connect nodes, including otherwise isolated ones, to form a knowledge network. The relationship between two entities of the digital resources in territorial spatial planning is transformed into a directed relationship between two nodes in the graph database. The data attributes of entities are generally saved in the Neo4j database as attributes of nodes. Relationships between two nodes can also carry attributes, which are derived from the relationship between the two entities. For example, the attribute value of the “Touches” relationship between Plot a and Plot b is “100”, indicating that the length of the common edge between Plot a and Plot b is 100 m. The Cypher language and the py2neo tool library are used to store nodes, node attributes and node relationships in Neo4j according to the mapping rules [36]. The entities include “Plot” and “Plot Type”, with relationships such as “Belong” and “Belong kind” linking them. Additionally, various relationships exist between plots, including “Touches”, “Within”, “Contains”, “Degree”, “Euclidean distance” and “Short distance”. Node attributes follow Table 3.
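The mapping rules can be illustrated by a sketch that builds a parameterized Cypher statement for one relationship. It only constructs the query text and parameters and does not connect to a database; the function name, the `length` property name and the restriction to three relationship types are assumptions made for this example.

```python
# Relationship types accepted by this sketch (a subset of those listed above).
ALLOWED_TYPES = {"Touches", "Within", "Contains"}


def relationship_query(rel_type, a_id, b_id, props):
    """Build a Cypher statement linking two Plot nodes.

    Relationship types cannot be passed as Cypher parameters, so the type is
    checked against an allow-list before it is placed in the query text.
    """
    if rel_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported relationship type: {rel_type}")
    query = (
        "MERGE (a:Plot {ID: $a_id}) "
        "MERGE (b:Plot {ID: $b_id}) "
        f"MERGE (a)-[r:`{rel_type}`]->(b) "
        "SET r += $props"
    )
    return query, {"a_id": a_id, "b_id": b_id, "props": props}


# Plot a touches Plot b along a common edge of 100 m.
query, params = relationship_query("Touches", "a", "b", {"length": 100})
```

The returned query and parameters could then be passed to whichever Neo4j client is in use. MERGE is chosen so that repeated imports do not create duplicate nodes or relationships.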