We modeled geodesic-grid prediction as a classification problem, in line with previous grid-based geocoding research: the input is a query text, and the output is a probability distribution over geodesic grids for that text. In this section, we first introduce the multi-level partitioning of the study area using the S2 spatial index and then present a PLM-based model for geodesic-grid prediction.
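To make these multi-level grid labels concrete, the following minimal sketch uses the open-source s2sphere Python library to map a POI coordinate to its containing S2 cell at each partition level. The helper name s2_labels is our own illustration; levels 11–13 are the partition levels used later in this section.

```python
import s2sphere

def s2_labels(lat: float, lng: float, levels=(11, 12, 13)):
    """Return the S2 cell ID containing (lat, lng) at each partition level."""
    leaf = s2sphere.CellId.from_lat_lng(s2sphere.LatLng.from_degrees(lat, lng))
    # parent(level) truncates the leaf cell ID to the requested level
    return {level: leaf.parent(level).id() for level in levels}

# e.g., s2_labels(39.9042, 116.4074) -> {11: ..., 12: ..., 13: ...}
```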
3.2.2. Geodesic-Grid-Prediction Model
To predict the S2 cell to which a query text belongs, we developed a multi-level geodesic-grid-prediction model based on a PLM, which we refer to as the PTMLG. In contrast to traditional single-level geodesic-grid classifiers, the PTMLG considers the hierarchical relationships of input texts across multiple spatial scales, allowing it to exploit the multiscale geographical information within the text more effectively. Furthermore, the prediction results at the various levels can be combined by simultaneously considering the probability distribution of the query text across multiple spatial scales, which mitigates the loss of classification accuracy caused by the uneven distribution of POIs. The overall structure of the PTMLG is presented in Figure 3a. The model consists of a shared feature extractor and three identically structured classification heads, each dedicated to predicting the geodesic grids at one of the three levels; parameters are not shared among the heads.
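As a rough structural sketch of this architecture (our own, not the authors' released implementation; the checkpoint name hfl/chinese-roberta-wwm-ext and the intermediate head width are assumptions, and softmax is omitted here so the outputs can feed a cross-entropy loss directly):

```python
import torch.nn as nn
from transformers import BertModel

class PTMLG(nn.Module):
    """Shared PLM feature extractor with three unshared classification heads,
    one per S2 level (11, 12, and 13)."""
    def __init__(self, cells_per_level, plm="hfl/chinese-roberta-wwm-ext"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(plm)  # shared feature extractor
        d = self.encoder.config.hidden_size            # 768 for the base model
        # One independent head per level; no parameter sharing among heads.
        self.heads = nn.ModuleList(
            nn.Sequential(nn.Linear(d, d), nn.LeakyReLU(), nn.Linear(d, c))
            for c in cells_per_level
        )

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h_cls = out.last_hidden_state[:, 0]            # [CLS] sentence feature
        return [head(h_cls) for head in self.heads]    # per-level logits
```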
The PTMLG utilizes a bidirectional PLM composed of Transformers to obtain deep semantic representations of input texts. To aggregate features across an entire sentence, we first add a special token [CLS] at the beginning of an input query text $T$ and then add a [SEP] token at the end of the sequence to mark its end. Subsequently, $T$ is split into $L$ tokens using a tokenizer, and each token is transformed into a dense representation $e_k$ using a word-embedding matrix. Because word-position information in a sentence is important for semantic representation, the positional embedding $p_k$ of each token is retrieved from the positional-embedding matrix based on its absolute position in the sequence. Next, the dense representation $e_k$ of the token and its positional representation $p_k$ are summed and input into the encoder. Feature interaction is performed using a multi-head self-attention mechanism, which produces a semantic representation of each token in the text. This process can be represented as follows:

$$h_k^0 = e_k + p_k, \qquad h_k^i = \mathrm{Transformer}\left(h_k^{i-1}\right), \quad i = 1, \dots, N$$

where $h_k^0$ represents the input of the $k$-th token, $k \in [1, L]$, and $h_k^i$ denotes the semantic representation of the $k$-th token output by the $i$-th layer of the PLM encoder. We initialized the language model's parameters using a RoBERTa-wwm-base model pre-trained on Chinese corpora. The dimensions of $e_k$ and $h_k^i$ are both 768, where $i \in [1, N]$ and $N$ is the number of encoder layers (12 for the base model).
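For illustration, with the HuggingFace transformers library this encoding step looks roughly as follows (hfl/chinese-roberta-wwm-ext is one public Chinese RoBERTa-wwm-base checkpoint; the query string is an invented example):

```python
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
encoder = BertModel.from_pretrained("hfl/chinese-roberta-wwm-ext")

# The tokenizer inserts [CLS] at the start and [SEP] at the end automatically.
inputs = tokenizer("中关村大街27号", return_tensors="pt")
outputs = encoder(**inputs)
h = outputs.last_hidden_state   # shape (1, L, 768): one h_k^N per token
h_cls = h[:, 0]                 # final-layer [CLS] embedding
```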
Figure 3b presents the architecture of a classification head. We use a fully connected neural network consisting of two linear layers and a softmax function to predict the probability of an input text belonging to a specific S2 cell. The LeakyReLU activation function connects the two linear layers.
We utilize the embedding $h_{[\mathrm{CLS}]}$ of the [CLS] token output by the final layer of the encoder as a classification feature for the entire sentence. This feature is separately fed into the three classification heads to predict the S2 cells at the different levels. This process is defined as follows:

$$z^l = \sigma\left(W_1^l h_{[\mathrm{CLS}]} + b_1^l\right), \qquad P^l = \mathrm{softmax}\left(W_2^l z^l + b_2^l\right)$$

where $z^l$ denotes the first-linear-layer output of the classifier of level $l$, and $P^l$ represents the probability that the query text belongs to each S2 cell of level $l$. $W_n^l$ and $b_n^l$ denote the parameters of the $n$-th linear layer in the classification head of level $l$, and $\sigma$ represents the activation function.
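A minimal PyTorch rendering of one such head under these equations (a sketch, not the released code; the 768-dimensional intermediate width is an assumption) might be:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Two linear layers joined by LeakyReLU, followed by softmax (Figure 3b)."""
    def __init__(self, hidden_dim: int, num_cells: int):
        super().__init__()
        self.fc1 = nn.Linear(hidden_dim, hidden_dim)  # W_1^l, b_1^l
        self.act = nn.LeakyReLU()                     # sigma
        self.fc2 = nn.Linear(hidden_dim, num_cells)   # W_2^l, b_2^l

    def forward(self, h_cls):
        z = self.act(self.fc1(h_cls))                 # z^l
        return torch.softmax(self.fc2(z), dim=-1)     # P^l over level-l S2 cells
```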
We optimized the PTMLG parameters by minimizing the cross-entropy loss, which was calculated using the following formula:
$$\mathcal{L} = -\sum_{n=1}^{C} q_n \log p_n$$

where $C$ represents the number of categories, $q_n$ represents the label of the $n$-th category, and $p_n$ represents the predicted probability of the $n$-th category.
During the training phase, we simultaneously predict the level-11, level-12, and level-13 S2 cells to which the input text belongs. The weighted sum of the cross-entropy losses at the different levels then serves as the overall loss function for the entire geodesic-grid-prediction network:

$$\mathcal{L}_{\mathrm{total}} = \alpha_{11}\mathcal{L}_{11} + \alpha_{12}\mathcal{L}_{12} + \alpha_{13}\mathcal{L}_{13}$$

Here, $\mathcal{L}_{11}$, $\mathcal{L}_{12}$, and $\mathcal{L}_{13}$ denote the cross-entropy losses at levels 11, 12, and 13, respectively, and $\alpha_{11}$, $\alpha_{12}$, and $\alpha_{13}$ are weight factors that are used to balance the contributions of the loss terms. Considering level 13 as the primary region and levels 11 and 12 as supporting regions, we assign weights of 0.1, 0.2, and 0.7 to levels 11, 12, and 13, respectively.
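In PyTorch, this combined objective could be sketched as follows (our sketch: we pass raw head logits to F.cross_entropy for numerical stability, rather than the softmax outputs themselves):

```python
import torch.nn.functional as F

def multilevel_loss(logits11, logits12, logits13, y11, y12, y13,
                    weights=(0.1, 0.2, 0.7)):
    """Weighted sum of per-level cross-entropy losses (levels 11, 12, 13)."""
    losses = (F.cross_entropy(logits11, y11),
              F.cross_entropy(logits12, y12),
              F.cross_entropy(logits13, y13))
    return sum(w * l for w, l in zip(weights, losses))
```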
3.2.3. Multi-Level Joint Inference
During the prediction phase, we use the aforementioned network to obtain the probability that the query text belongs to each grid at each level. To determine the final score for each level-13 S2 cell, we multiply its predicted probability by the predicted probabilities of its level-11 and level-12 parent cells:

$$P_{\mathrm{final}}^{13}(g) = P^{13}(g)\, P^{12}\big(\mathrm{par}_{12}(g)\big)\, P^{11}\big(\mathrm{par}_{11}(g)\big), \qquad \hat{g} = \arg\max_{g} P_{\mathrm{final}}^{13}(g)$$

Here, $P_{\mathrm{final}}^{13}$ denotes the final probability that the query text belongs to each grid of level 13, $\mathrm{par}_{l}(g)$ denotes the level-$l$ parent of cell $g$, and $\hat{g}$ denotes the index of the level-13 S2 cell to which the query text belongs.
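A sketch of this combination step, assuming parent11 and parent12 are precomputed index tensors that map each level-13 class to the class index of its S2 ancestor (obtainable, e.g., via s2sphere.CellId.parent):

```python
import torch

def joint_inference(p11, p12, p13, parent11, parent12):
    """p11/p12/p13: (B, C_l) per-level probabilities from the three heads.
    parent11/parent12: (C13,) LongTensors giving each level-13 cell's
    ancestor class index at levels 11 and 12."""
    scores = p13 * p12[:, parent12] * p11[:, parent11]  # (B, C13)
    return scores.argmax(dim=1)  # index of the predicted level-13 S2 cell
```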
During our experiments, we noticed that some samples located near grid boundaries were mistakenly classified into neighboring cells. Because the quality of the candidate POI set recalled during the region-proposal stage directly affects the accuracy of subsequent query matching, we aimed to maximize the inclusion of the true POIs corresponding to the queries in the candidate set. Therefore, after obtaining the prediction results for the level-13 S2 grid, we constructed a buffer zone with a 1000 m radius around the predicted cell to serve as the final candidate region. Subsequent experiments demonstrated that this step further improves the recall of POIs corresponding to a query.
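One way to realize such a buffer with s2sphere (our sketch; the mean-Earth-radius constant and the covering setup are our assumptions) is to cover a 1000 m spherical cap centered on the predicted cell with level-13 cells:

```python
import math
import s2sphere

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (assumption for this sketch)

def candidate_cells(pred_cell: s2sphere.CellId, radius_m: float = 1000.0):
    """Cover a radius_m buffer around the predicted level-13 cell with level-13 cells."""
    center = pred_cell.to_lat_lng().to_point()
    # Convert the metric radius to a central angle on the unit sphere.
    angle = s2sphere.Angle.from_degrees(math.degrees(radius_m / EARTH_RADIUS_M))
    cap = s2sphere.Cap.from_axis_angle(center, angle)
    coverer = s2sphere.RegionCoverer()
    coverer.min_level = coverer.max_level = 13
    return coverer.get_covering(cap)
```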