Article

A Stacking Ensemble Learning Method to Classify the Patterns of Complex Road Junctions

Min Yang, Lingya Cheng, Minjun Cao and Xiongfeng Yan
1 School of Resource and Environmental Sciences, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
2 College of Surveying and Geo-Informatics, Tongji University, 1239 Siping Road, Shanghai 200092, China
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2022, 11(10), 523; https://doi.org/10.3390/ijgi11100523
Submission received: 18 August 2022 / Revised: 4 October 2022 / Accepted: 14 October 2022 / Published: 17 October 2022

Abstract
Recognizing the patterns of road junctions in a road network plays a crucial role in various applications. Owing to the diversity and complexity of morphologies of road junctions, traditional methods that rely heavily on manual settings of features and rules are often problematic. In recent years, several studies have employed convolutional neural networks (CNNs) to classify complex junctions. These methods usually convert vector-based junctions into raster representations with a predefined sampling area coverage. However, a fixed sampling area coverage cannot ensure the integrity and clarity of each junction, which inevitably leads to misclassification. To overcome this drawback, this study proposes a stacking ensemble learning method for classifying the patterns of complex road junctions. In this method, each junction is first converted into raster images with multiple area coverages. Subsequently, several CNN-based base-classifiers are trained using raster images, and they output the probabilities of the junction belonging to different patterns. Finally, a meta-classifier based on random forest is used to combine the outputs of the base-classifiers and learn to arrive at the final classification. Experimental results show that the proposed method can improve the classification accuracy for complex road junctions compared to existing CNN-based classifiers that are trained using raster representations of junctions with a fixed sampling area coverage.

1. Introduction

As an essential part of road networks in urban areas, road junctions connect primary roads in various directions. Vehicles and pedestrians often turn at road junctions to reach their destinations. Therefore, road junctions play a critical role in modern transportation systems. However, the structures of road junctions are often not explicitly recorded in existing spatial databases; hence, locating road junctions and classifying their patterns is an important task in many applications. For example, Ulugtekin et al. [1] emphasized that road junctions should be represented in a suitable manner for car navigation. Mackaness and Mackechnie [2], Touya [3], and Yang et al. [4] highlighted that recognizing the structures of junctions is essential for pattern-preserving road-network generalization. In addition, the recognition of road junctions contributes to traffic analysis and management [5,6,7] as well as urban planning and landscape design [8].
Road junctions can generally be classified as simple or complex. Simple junctions, such as planar crossroads and T-shaped intersections, are places where roads meet directly, whereas complex junctions connect two or more primary roads through slip roads or ramps. Complex junctions involve two different scenarios: planar structures with slip roads for smooth turning, and grade-separated interchanges with ramps that allow vehicles to travel from one road to another without interruption [9,10,11]. In general, complex junctions are places where roads meet in a complex manner. They therefore have intricate internal structures and varied external forms, making it difficult to characterize them and identify their patterns. In practice, the recognition of complex junctions still relies on manual visual inspection, which incurs high labor and time costs.
Over the past few decades, several approaches have been developed to automate the recognition of complex junctions in road networks. For example, Mackaness and Mackechnie [2] proposed a method for locating complex junctions by searching for areas in which road nodes are densely distributed. Similar ideas were developed by Touya [3] and Zhou and Li [11], who detected complex junctions by clustering characteristic road nodes, including Y-shaped, y-shaped, fork-shaped, and multi-leg nodes. Aiming to preserve the integrity of complex junctions, Yang et al. [4] introduced road design principles to clarify the topological boundaries of complex junctions. Li et al. [9] utilized an object detection model, that is, the faster region-based convolutional neural network (Faster R-CNN), to detect the locations of interchanges based on raster representations of road networks. Yang et al. [10] developed a graph-based deep-learning approach to detect segments belonging to interchanges in road networks; interchange structures were then obtained by clustering the detected interchange segments.
The aforementioned approaches focused on identifying the locations of complex junctions in road networks. In comparison, few studies have addressed the pattern classification of the detected complex junctions. An early study was conducted by Xu and Yan [12], who developed a template-matching approach to classify complex junction patterns. In this approach, a road junction was first represented as a directed attribute-relationship graph, and its pattern was then classified by searching the template library for the most similar template based on the graph representations and predefined rules. Wang et al. [13] used topological features to describe junction structures and identified the matching template for each junction by calculating topological similarity. Owing to the diversity of complex junction patterns, these approaches have limitations. First, they relied heavily on manually designed features and pre-constructed template libraries, making it difficult to classify complex junctions with irregular structures. Second, they inevitably suffered from the problem of threshold setting when determining structural patterns.
In recent years, convolutional neural networks (CNNs) have developed rapidly and achieved great success in various fields, including object detection [14], natural language processing [15], and speech recognition [16]. Compared to the traditional machine learning methods, CNNs can capture high-level features from shallow features. This advantage has motivated researchers to apply CNNs and their variants to handle spatial data [17,18,19]. From the perspective of artificial visual cognition, the classification of road junctions is similar to image classification. Recent studies have explored the potential of CNNs for characterizing and classifying complex junctions. First, image processing was implemented to convert the vector-based junctions into image representations, and then a CNN-based classifier, such as AlexNet [20], U-net [21], or GoogLeNet [22], was constructed to learn the high-level features from image samples and classify the patterns of complex junctions. By overcoming the limitations of artificially designed features, CNN-based learning models can improve the performance of classification of complex junctions.
However, these approaches have drawbacks. A typical issue is that the existing approaches use a fixed geographical coverage to generate image samples. Because complex junctions vary considerably in size and shape, a fixed sampling area coverage cannot ensure the integrity and clarity of the original junctions, which may lead to incorrect classification results. More specifically, if the area coverage is too small for a complex junction, the generated image cannot fully represent the structure of the junction, making it difficult to capture its overall characteristics. Conversely, with a larger area coverage, the complex structure of the junction remains intact, but the image resolution is lower and much of the image is blank, which leads to the misclassification of the raster images.
To address this issue, this study proposes a stacking ensemble (SE) learning method for classifying the patterns of complex road junctions. By constructing and combining multiple machine learning models, ensemble learning can overcome the limited representation ability of a single learning model or a single data representation and thus improve classification performance [23,24,25,26]. Ensemble learning can be implemented in three ways: bagging, boosting, and stacking. Among them, the SE algorithm uses a learning strategy to combine the predictions of two or more models to achieve better performance [27], which offers advantages in stability over the averaging or voting strategies used in the other two ensemble algorithms. Specifically, the SE algorithm first processes the training data separately with multiple classifiers (i.e., base-classifiers), then uses a new classifier (i.e., a meta-classifier) that receives the outputs of the base-classifiers as input and is trained to produce the final classification result. For the classification of complex road junctions, considering the uncertainty of their spatial extents, we first converted each junction into images with multiple sampling area coverages. Subsequently, several base-classifiers were trained using the samples with different area coverages and were further used to predict the probabilities of each junction belonging to different patterns. Finally, a meta-classifier was designed to combine the outputs of the base-classifiers and learn the final classification. In this study, AlexNet [28] and GoogLeNet [29] were employed as the base-classifiers, and the random forest (RF) algorithm [30] was implemented as the meta-classifier. To verify the effectiveness of the proposed method, a new and publicly available dataset containing complex road junctions of seven pattern types was constructed to train and test the base- and meta-classifiers.
The remainder of this paper is organized as follows. The proposed method is explained in Section 2. Section 3 presents experimental results, analysis, and discussion. Section 4 presents the conclusions.

2. Methodology

This study proposes an SE learning method for classifying the patterns of complex road junctions. The ensemble learning strategy aims to find the optimal classification result by combining the outputs of two or more CNN-based base-classifiers, each of which is trained to classify the pattern types of road junctions using image samples generated with a different sampling area coverage. Figure 1 illustrates the overall framework of the proposed method, which comprises three main parts.
  • Sampling complex junctions with multiple area coverages: For each vector-based complex junction, several fixed-size raster images are generated with different area coverages to serve as the input samples for the base-classifiers.
  • Generating preliminary predictions using multiple base-classifiers: Based on the image samples obtained for each area coverage, a CNN-based base-classifier is constructed to predict the preliminary probabilities of each complex junction belonging to the different pattern types.
  • Obtaining classification results using a meta-classifier: The preliminary probabilities for each complex junction generated by multiple base-classifiers are combined as a new feature vector, which serves as the input to an RF-based meta-classifier to output the final classification result.

2.1. Sampling Complex Junctions with Different Area Coverages

Figure 2 illustrates the process of generating a raster image of a vector-based road junction. First, for a complex road junction extracted from the vector road network, the X- and Y- coordinates of the midpoint of each associated road segment were computed. Second, the geometric center of the structure of the junction was calculated by averaging the coordinates of the midpoints of all road segments. Based on the geometric center, a square area with a given length, i.e., sampling area coverage, was cropped to sample this junction. Finally, a binary raster image with the background in black and road segments in white was generated to realize the rasterization of the road junction.
For each complex junction, multiple raster images were generated by varying the sampling area coverage. All the generated raster images have the same pixel size. During the rasterization process, it is important to note that the line width of the junction segments significantly affects the generated raster images. A smaller width may lead to an indistinct representation of the junction structure, whereas a larger width may cause the junction segments to adhere to each other and lose structural information. To illustrate this problem, five raster images generated with different line widths are shown in Figure 3. In this study, the size of the raster images was set to 250 × 250 pixels by jointly considering the clarity of the raster images and the data volume required for model input, and the line width of the road segments was empirically set to two pixels with image clarity in mind.
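To make the rasterization step concrete, the following Python sketch converts a vector junction into a binary raster image under stated assumptions: the junction segments are given as lists of projected (x, y) coordinates in meters, and NumPy and Pillow are used for drawing (the paper does not name the tools actually used).

```python
# Minimal rasterization sketch (an assumed helper, not the authors' code).
import numpy as np
from PIL import Image, ImageDraw

def rasterize_junction(segments, coverage_m=1000.0, image_size=250, line_width=2):
    """segments: list of [(x, y), ...] polylines of one junction, in meters."""
    # 1. Geometric center = mean of the midpoints of all road segments.
    midpoints = np.array([np.mean(seg, axis=0) for seg in segments])
    cx, cy = midpoints.mean(axis=0)

    # 2. Square sampling window of side `coverage_m` centered on the geometric center.
    half = coverage_m / 2.0
    x_min, y_min = cx - half, cy - half
    scale = image_size / coverage_m          # meters -> pixels

    # 3. Draw the road segments in white on a black background.
    img = Image.new("L", (image_size, image_size), color=0)
    draw = ImageDraw.Draw(img)
    for seg in segments:
        pixels = [((x - x_min) * scale, image_size - (y - y_min) * scale) for x, y in seg]
        draw.line(pixels, fill=255, width=line_width)
    return np.array(img)

# One junction, three coverages -> three inputs, one per base-classifier (see Section 3.2):
# images = [rasterize_junction(segments, c) for c in (500.0, 1000.0, 1500.0)]
```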

2.2. Generating Preliminary Predictions Using Base-Classifiers

Based on the raster images obtained with multiple area coverages, multiple base-classifiers were constructed to predict the probabilities of each junction belonging to different pattern types. As a type of feedforward neural network with a deep architecture [31], a CNN produces the features of each layer from local regions of the previous layer through convolution kernels with shared weights, which makes it suitable for learning and representing image features. Therefore, this study employed CNNs as the base-classifiers. Two well-known CNN architectures, AlexNet [28] and GoogLeNet [29], were used as base-classifiers to obtain preliminary predictions of the patterns of road junctions. The following sections describe the main operations in CNNs and the two architectures.

2.2.1. Basic Operations in CNNs

In addition to the input and output layers, a typical CNN architecture contains multiple convolutional and pooling layers, and one or more fully connected layers.
(1) Convolutional layers
The convolutional layers extract various features from raw image data through convolution operations. For example, lower convolutional layers extract obvious low-level features, such as edges, corners, and lines, whereas higher convolutional layers extract hidden high-level features. In a convolutional layer, the feature map of the previous layer is convolved with a sliding convolutional kernel, and a new feature map is generated using a nonlinear activation function as follows:
$$g_{i,j} = f\left( \sum_{k=0}^{K-1} \sum_{l=0}^{L-1} h_{i+k,\,j+l} \times w_{k,l} + b \right)$$
where $f(\cdot)$ denotes the activation function, for example, the ReLU function; $g_{i,j}$ and $h_{i,j}$ represent the feature values at position $(i,j)$ in the new feature map and in the previous layer, respectively; and $w_{k,l}$ and $b$ are the convolutional kernel of size $K \times L$ and the bias, respectively.
(2) Pooling layer
The pooling layer aims to obtain spatially invariant features and compress them by reducing the resolution of the feature maps. It is implemented by representing the data in a region, that is, the pooling window, as a single value. Max-pooling is widely used in CNN architectures. This operation selects the maximum value of the region as the value after pooling, which helps filter out noise and irrelevant background information when only part of the feature map carries useful information. The max-pooling result is calculated as follows:
$$g_{i,j} = \max_{0 \le k \le K-1,\; 0 \le l \le L-1} h_{M \times i + k,\; N \times j + l}$$
where $g_{i,j}$ and $h_{i,j}$ represent the feature values at position $(i,j)$ in the new feature map after max-pooling and in the previous layer, respectively; $K$ and $L$ denote the height and width of the pooling window; and $M$ and $N$ are the step sizes.
(3) Softmax function
To ensure that the final output conforms to a probability distribution, that is, each value ranges from 0 to 1 and the values sum to 1, a Softmax layer is used. The adjusted probability $P(S_i)$ for type $S_i$ $(1 \le i \le n)$ is computed as follows:
$$P(S_i) = \frac{e^{g_i}}{\sum_{j=1}^{n} e^{g_j}}$$
where $g_i$ and $g_j$ denote the raw output values of the previous layer for types $S_i$ and $S_j$, and $n$ denotes the number of types. In CNN classifiers, the Softmax function is typically applied to the output of the last fully connected layer.
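For illustration, the three operations described above can be written compactly in NumPy as follows; this is an explanatory sketch rather than the authors' implementation, which would rely on an optimized deep learning framework.

```python
# Explanatory NumPy sketches of the convolution, max-pooling, and Softmax operations
# described above (real models use optimized framework implementations).
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv2d(h, w, b=0.0, f=relu):
    """Convolve feature map h with a K x L kernel w (cross-correlation form), then apply f."""
    K, L = w.shape
    H, W = h.shape
    g = np.zeros((H - K + 1, W - L + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = f(np.sum(h[i:i + K, j:j + L] * w) + b)
    return g

def max_pool2d(h, K=2, L=2, M=2, N=2):
    """Max-pooling with a K x L window and step sizes M and N."""
    H, W = h.shape
    g = np.zeros(((H - K) // M + 1, (W - L) // N + 1))
    for i in range(g.shape[0]):
        for j in range(g.shape[1]):
            g[i, j] = h[M * i:M * i + K, N * j:N * j + L].max()
    return g

def softmax(g):
    """Turn raw class scores into probabilities that lie in [0, 1] and sum to 1."""
    e = np.exp(g - g.max())   # subtracting the maximum improves numerical stability
    return e / e.sum()
```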

2.2.2. AlexNet

AlexNet was proposed by Krizhevsky et al. [28], and it won the 2012 ImageNet competition by a large margin over non-neural-network algorithms. AlexNet is an 8-layer CNN, as shown in Figure 4. The first five layers are convolutional layers, the last three layers are fully connected layers, and Softmax is applied to the output of the last layer.
AlexNet marked the boundary between shallow and deep neural networks. Compared with traditional neural networks, it was among the first to combine several techniques, including rectified linear units (ReLU) and Dropout, to accelerate model training and improve representation capability. It also introduced a local response normalization (LRN) layer, which normalizes each ReLU output against the outputs of neighboring kernels within a certain range. The normalized values $b_{x,y}^{i}$ are computed as follows:
$$b_{x,y}^{i} = \frac{a_{x,y}^{i}}{\left( k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} \left( a_{x,y}^{j} \right)^{2} \right)^{\beta}}$$
where $a_{x,y}^{i}$ denotes the output at position $(x,y)$ of the $i$-th convolution kernel after the ReLU function; $n$ is the user-defined number of neighboring kernels considered; $N$ is the total number of convolution kernels; and $\alpha$, $\beta$, and $k$ are user-defined coefficients. The introduction of LRN contributes to rapid convergence and improves the generalization ability of the model for feature learning.
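A minimal NumPy sketch of the LRN computation defined above is given below; the coefficient values are illustrative defaults rather than settings reported in the paper.

```python
# Minimal sketch of local response normalization (LRN); `a` holds the ReLU outputs of
# all convolution kernels with shape (N, height, width). Coefficients are illustrative.
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    N = a.shape[0]                       # total number of convolution kernels
    b = np.empty_like(a)
    for i in range(N):                   # normalize kernel i against its n neighbors
        lo, hi = max(0, i - n // 2), min(N - 1, i + n // 2)
        b[i] = a[i] / (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
    return b
```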

2.2.3. GoogLeNet

GoogLeNet is a deep architecture proposed by Szegedy et al. [29], which won first place in the 2014 ImageNet competition. In GoogLeNet, a basic building block, the Inception unit, was designed to construct a sparse yet computationally efficient network, as illustrated in Figure 5. The Inception unit makes more efficient use of computational resources and extracts more features for the same amount of computation, thereby improving training results.
The Inception unit consists of a parallel structure with four branches. Through these four branches, feature maps of different scales are obtained and then concatenated along the depth direction to form a new feature map. A 22-layer GoogLeNet model was constructed based on the Inception unit: the input image passes through two convolutional layers and max-pooling layers before being fed into nine Inception units. GoogLeNet is easy to modify owing to the modular design of the Inception units. The network replaces the final fully connected layer with average pooling but still uses Dropout. Additionally, to mitigate vanishing gradients, two auxiliary Softmax classifiers were added to inject additional gradient signals into the intermediate layers.
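The following PyTorch sketch shows the structure of one Inception unit with its four parallel branches; the framework choice and the channel sizes are assumptions made for illustration, as the paper does not specify its implementation.

```python
# Sketch of one Inception unit (GoogLeNet) in PyTorch; framework and channel numbers
# are illustrative assumptions, not taken from the paper.
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch, c1, c3_reduce, c3, c5_reduce, c5, pool_proj):
        super().__init__()
        # Branch 1: 1x1 convolution.
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU(inplace=True))
        # Branch 2: 1x1 reduction followed by 3x3 convolution.
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3_reduce, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c3_reduce, c3, 3, padding=1), nn.ReLU(inplace=True))
        # Branch 3: 1x1 reduction followed by 5x5 convolution.
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5_reduce, 1), nn.ReLU(inplace=True),
                                nn.Conv2d(c5_reduce, c5, 5, padding=2), nn.ReLU(inplace=True))
        # Branch 4: 3x3 max-pooling followed by 1x1 projection.
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate the four branches along the channel (depth) dimension.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

# Example: the first Inception unit of GoogLeNet (3a) uses channels 64, 96, 128, 16, 32, 32.
# block = Inception(192, 64, 96, 128, 16, 32, 32)
```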

2.3. Obtaining Final Classification Results Using a Meta-Classifier

As illustrated in Figure 6, the trained base-classifiers were applied to predict the type of an input junction. The outputs of these base-classifiers, indicating the probability that the input junction belongs to each pattern type, were then combined as the feature input of the meta-classifier to obtain the final classification result. In this study, a stacked ensemble using the RF model was adopted to fuse the prediction results of the individual base-classifiers for the final decision.
As a flexible and efficient machine learning algorithm, the RF model produces satisfactory results without complex hyper-parameter tuning. The RF model contains multiple decision trees, and its output class is determined by the mode of the classes predicted by the individual trees. As the decision trees integrated in the RF model are independent of each other, training is fast and easy to parallelize, making the RF model suitable for classification tasks. The performance of the RF model is affected by several parameters, including n_estimators, which denotes the number of decision trees, and max_depth, which denotes the maximum depth of the decision trees. During training, these parameters should be set appropriately according to the dataset to improve the accuracy of the model.
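A minimal sketch of the stacking step is shown below, assuming each trained base-classifier exposes a predict_proba-style call that returns a 7-dimensional probability vector per junction; the actual inference interface depends on the deep learning framework used.

```python
# Sketch of the stacking step (assumed interface): concatenate the base-classifiers'
# probability vectors per junction and feed them to the RF meta-classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stacked_features(base_classifiers, images_per_coverage):
    """images_per_coverage[j]: raster images of all junctions at the j-th area coverage."""
    # predict_proba stands in for whatever inference call the CNN framework provides.
    probs = [clf.predict_proba(imgs) for clf, imgs in zip(base_classifiers, images_per_coverage)]
    return np.hstack(probs)  # shape: (num_junctions, num_base_classifiers * num_types)

# Training the meta-classifier (see Section 3.2 for the hyperparameters used):
# X_meta = stacked_features(base_classifiers, train_images)   # assumed variables
# meta = RandomForestClassifier().fit(X_meta, y_train)
# Final classification of new junctions:
# y_pred = meta.predict(stacked_features(base_classifiers, test_images))
```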

3. Experiments

Experiments were conducted to validate the proposed method. This section describes the experimental datasets, settings, results, analyses, and discussion.

3.1. Experimental Data and Preprocessing

3.1.1. Experimental Data

Experimental data were manually extracted from the road networks of 30 cities in China, including Wuhan, Shanghai, Hangzhou, and Nanjing, which were downloaded from OpenStreetMap (OSM) (www.openstreetmap.org, accessed on 20 March 2021). These cities have relatively large areas, abundant road transportation facilities, and road junctions with diverse types and structures. According to their morphological and shape characteristics, the collected complex road junctions were classified into seven typical types, namely Butterfly, Cloverleaf, Diamond, T-shape, Trumpet, Turbine, and Other, as illustrated in Figure 7. A total of 150 samples were collected for each junction type, giving 1050 samples in all.

3.1.2. Sample Augmentation

An adequate number of samples is important for ensuring training stability and improving the classification accuracy of the supervised CNN models. Therefore, two data augmentation methods were employed to increase the number of collected junctions. Because a change in the orientation of a road junction does not affect pattern recognition, a rotation method was first implemented: each vector-based junction sample was rotated by 90°, 180°, and 270°. Second, each sample was mirrored in the up–down and left–right directions. Consequently, the number of junction samples reached 6300, six times the original number. Subsequently, all samples were divided into training, validation, and test sets at a ratio of 6:2:2, yielding 3780, 1260, and 1260 samples, respectively.
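A simple sketch of this augmentation scheme is given below, assuming each junction is stored as an (n, 2) array of coordinates centered on its geometric center.

```python
# Sketch of the 6x sample augmentation: three rotations plus up-down and left-right mirrors.
import numpy as np

def augment_junction(points):
    """points: (n, 2) array of junction coordinates, assumed centered on the geometric center."""
    augmented = [points]
    for angle in (90, 180, 270):                      # rotations about the center
        theta = np.radians(angle)
        rot = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
        augmented.append(points @ rot.T)
    augmented.append(points * np.array([1, -1]))      # up-down mirror
    augmented.append(points * np.array([-1, 1]))      # left-right mirror
    return augmented                                  # 6 variants including the original
```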

3.2. Parameter Settings

To determine the appropriate sampling area coverage as well as the number of base-classifiers in the SE models, the sizes of the minimum bounding squares [22] for all collected road junctions were counted, and the distributions are shown in Figure 8.
The area coverages of the collected junction samples varied greatly, with the maximum exceeding 2000 × 2000 m2 and the minimum below 250 × 250 m2. Based on these statistics, three different area coverages, namely, 500 × 500 m2, 1000 × 1000 m2, and 1500 × 1500 m2, were set to generate raster images for each junction sample. Examples of the seven types of junction samples under the three area coverages are listed in Table 1.
In the training of the GoogLeNet model, some parameters were adjusted several times in accordance with the characteristics of the collected junctions. Finally, the learning rate was set to 0.02, the batch size to 16, the maximum number of iterations to 30,000, the Dropout value to 0.8, and the weight decay to 0.00004 to ensure the convergence of the model. For the AlexNet model, the learning rate, batch size, and Dropout were set to the same values as those of the GoogLeNet model, the maximum number of iterations was set to 20,000, and the weight decay was set to 0.0005. For the RF model, the parameters n_estimators and max_depth were set to 200 and 10, respectively. Moreover, a node must contain at least two training samples before it can branch, and each child of a node after a branch must contain at least one sample, to ensure training efficiency and accuracy.
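Assuming a scikit-learn implementation of the RF meta-classifier (the paper does not name the library), the settings reported above map to a configuration such as the following.

```python
# Assumed scikit-learn configuration matching the RF settings reported above.
from sklearn.ensemble import RandomForestClassifier

meta_classifier = RandomForestClassifier(
    n_estimators=200,      # number of decision trees
    max_depth=10,          # maximum depth of each tree
    min_samples_split=2,   # a node needs at least two training samples to branch
    min_samples_leaf=1,    # each child after a branch holds at least one sample
    n_jobs=-1,             # independent trees allow parallel training
)
```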

3.3. Evaluation Metrics

The overall classification performance of the test samples was quantitatively evaluated using the accuracy metric, which is defined as the ratio of the number of correctly classified samples to the total number of samples. In addition, for each pattern type, three metrics, that is, the precision, recall, and F1, were employed to evaluate the classification results. The three metrics were computed as follows:
$$\mathrm{precision} = \frac{TP}{TP + FP} \times 100\%$$
$$\mathrm{recall} = \frac{TP}{TP + FN} \times 100\%$$
$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative classification results, respectively.
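As a quick check of these definitions, the short sketch below computes the three per-class metrics from raw counts; the example values are consistent with the Butterfly row of Table 5.

```python
# Per-class metrics from raw counts taken from one row/column of a confusion matrix
# such as Tables 4 and 5.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) * 100.0                               # in percent
    recall = tp / (tp + fn) * 100.0                                  # in percent
    f1 = 2 * precision * recall / (precision + recall) / 100.0       # F1 on a 0-1 scale
    return precision, recall, f1

# Butterfly row of Table 5: TP = 179, FP = 19, FN = 1
# -> precision ~= 90.4%, recall ~= 99.4%, F1 ~= 0.95
print(precision_recall_f1(179, 19, 1))
```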

3.4. Results and Analysis

The classification results for the test set using the different base-classifiers and the SE method are listed in Table 2. First, the classification accuracies of the GoogLeNet-based base-classifiers and SE model were significantly higher than those of the AlexNet-based base-classifiers and SE model. This result may be attributed to the stronger representation capability of the deeper GoogLeNet architecture. More importantly, the SE models significantly outperformed the base-classifiers trained using samples with a fixed sampling area coverage. The accuracy of the AlexNet-based SE model reached 78.9%, which was 5~6% higher than that of the base-classifiers with different area coverages. For the GoogLeNet-based SE model, the accuracy reached 92.4%, which was 6~14% higher than that of the base-classifiers.
Three examples of the classification results are presented in Table 3. Through ensemble learning, junctions that were misclassified at some sampling area coverages because their extents are very small or very large could be corrected (cases 1 and 2). For junctions whose overall form and structure resemble those of other junction types (case 3), the local characteristics captured at a small area coverage are important for identifying their patterns, and the ensemble learning can exploit these slight local differences to obtain correct classification results. These results indicate that ensemble learning retains the advantages of the individual base-classifiers while compensating for their shortcomings on certain junctions. As a result, the classification performance was significantly improved, which verifies the effectiveness of the proposed method.
Table 4 and Table 5 list the confusion matrices and the three metrics of the classification results for the test road junctions using the AlexNet-based and GoogLeNet-based SE models, respectively. The precision and recall of the Cloverleaf-type junctions were high in both models because they contain four road segments, which makes their visual characteristics distinctive. For some Butterfly-type and Turbine-type junctions, both models produced misclassifications, with the AlexNet-based SE model producing more. The GoogLeNet-based SE model performed relatively well in classifying the Diamond- and Trumpet-type junctions, whereas the AlexNet-based SE model identified some Diamond-type junctions as T-shape- or Trumpet-type junctions and misclassified some T-shape- and Trumpet-type junctions. In addition, there was some confusion between Other-type junctions and the remaining six types, which was more pronounced in the AlexNet-based model.
Furthermore, some examples of misclassified junctions were analyzed to explore the possible reasons, as shown in Figure 9. It can be seen that if the morphology of a junction differs significantly from the typical junctions of its type, the model is likely to produce an incorrect classification despite the fact that the connections between their road segments are almost the same. This result indicates that the proposed models focus more on the overall morphology and structure of the junctions during classification. However, the connections between road segments are also important criteria for manual classification, although they are weakened in the process of image-based machine learning models. Under these circumstances, the topological relationship between road segments should be further considered, and a graph-based deep-learning model may be a potential solution for this purpose.

3.5. Discussion

To explore the effect of different numbers of base-classifiers on stacking ensemble learning, raster images for junctions with different sampling area coverages were generated to train the base- and meta-classifiers. Specifically, we designed two area coverages of 500 × 500 m2 and 1500 × 1500 m2 to correspond to two base-classifiers; four area coverages of 500 × 500 m2, 750 × 750 m2, 1000 × 1000 m2, and 1500 × 1500 m2 corresponding to four base-classifiers; and five sampling area coverages of 500 × 500 m2, 750 × 750 m2, 1000 × 1000 m2, 1250 × 1250 m2, and 1500 × 1500 m2 corresponding to five base-classifiers. The classification accuracies of the SE models with different numbers of base-classifiers are presented in Table 6.
It was observed that the classification accuracies of the AlexNet- and GoogLeNet-based SE models tended to increase initially and then decrease with an increase in the number of base-classifiers, reaching the highest when the number of base-classifiers was three. This result indicates that the number of base-classifiers affects the classification accuracy of the ensemble learning. An appropriate increase in the number of base-classifiers can effectively improve the accuracy of the ensemble learning; however, it is not necessarily better. One possible reason for this result is that an increase in the number of base-classifiers leads to a rapid increase in the feature dimension of the input to the meta-classifier, thus affecting the classification performance of the meta-classifier. In this study, considering model accuracy and experimental efficiency, three base-classifiers were used to construct the SE models.

3.6. Tests on Other Cities

To further verify the generalization ability of the ensemble learning method, the road junctions from four other cities, i.e., Beijing, Chengdu, Guangzhou, and Chongqing, were used to conduct an additional experiment. An approach introduced in [10] was used to detect complex road junctions from the road networks of these cities. As a result, 217, 90, 109, and 156 complex junctions were identified in Beijing, Chengdu, Guangzhou, and Chongqing, respectively. The distribution of junction types in each city is presented in Table 7.
Classification results of the complex junctions in the four cities using AlexNet- and GoogLeNet-based SE models are shown in Figure 10, wherein different colors are used to mark different types of junctions. The classification accuracies of the junctions in the four cities using the SE models are listed in Table 8.
As shown in Table 8, although the distribution of junction types varies greatly among the cities, over 70% of the junctions in each of the four cities were classified correctly. The classification accuracies of the GoogLeNet-based SE model for the four cities are higher than those of the AlexNet-based SE model, which is consistent with the results on the test set reported above. Overall, these results verify that the proposed method classifies junctions in different cities well and demonstrate the generalization ability of the model.

4. Conclusions

This study presents an ensemble learning method for classifying patterns of complex road junctions. In contrast to the existing deep-learning-based classification methods, the proposed method consists of multiple base-classifiers and one meta-classifier. Each base-classifier predicts the type of each complex road junction by learning the features of the raster images generated from the vector-based junctions with different sampling area coverages. Then, the meta-classifier combines the preliminary predictions of these base-classifiers to obtain the final prediction. Two popular deep learning architectures, AlexNet and GoogLeNet, were employed to construct the base-classifiers, and the RF model was used as the meta-classifier.
Experimental results show that the classification performance improved after using the SE method. The accuracy metric of the GoogLeNet-based SE model reached 92.4%, which was 6~14% higher than that of the base-classifiers; that is, the GoogLeNet models trained using raster images from a single sampling area coverage. The accuracy metric of the AlexNet-based SE model reached 78.9%, which is 5~6% higher than that of the base-classifiers. These results demonstrate that the proposed SE method can effectively combine the contributions of different classifiers that are trained based on raster representations of complex junctions with different area coverages. In addition, this method has been proven to have a good generalization ability and can be applied for the classification of road junctions in different cities.
Future work will focus on increasing the diversity of samples to improve the universality of the proposed method for complex road junction classification. Further, the topological connections between road segments could be considered to address the deficiencies of the raster-based representation, for example, by constructing a graph-based deep learning model as an additional base-classifier in the ensemble. In addition, other ensemble learning strategies deserve consideration, such as integrating multi-source data (e.g., trajectories), to further improve the classification accuracy.

Author Contributions

Conceptualization, Min Yang and Xiongfeng Yan; methodology, Min Yang and Xiongfeng Yan; software, Lingya Cheng and Minjun Cao; formal analysis, Min Yang, Lingya Cheng, Minjun Cao, and Xiongfeng Yan; data curation, Lingya Cheng and Minjun Cao; writing—original draft preparation, Min Yang, Lingya Cheng, Minjun Cao, and Xiongfeng Yan; writing—review and editing, Min Yang and Xiongfeng Yan; supervision, Min Yang and Xiongfeng Yan; funding acquisition, Min Yang and Xiongfeng Yan. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China [grant numbers 42071450, 42001415], and the Key Basic Research Projects of the Foundation Plan of China [grant number 2020-JCJQ-ZD-087].

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ulugtekin, N.; Dogru, A.O.; Thomson, R.C. Modelling urban road networks integrating multiple representations of complex road and junction structures. In Proceedings of the 12th International Conferences on Geoinformatics, Gävle, Sweden, 7–9 June 2004. [Google Scholar]
  2. Mackaness, W.A.; Mackechnie, G.A. Automating the detection and simplification of junctions in road networks. Geoinf. Int. J. Adv. Comput. Sci. Geogr. 1999, 3, 185–200. [Google Scholar] [CrossRef]
  3. Touya, G. A Road network selection process based on data enrichment and structure detection. Trans. GIS 2010, 14, 595–614. [Google Scholar] [CrossRef] [Green Version]
  4. Yang, J.; Zhao, K.; Li, M.; Xu, Z.; Li, Z. Identifying complex junctions in a road network. ISPRS Int. J. Geo-Inf. 2020, 10, 4. [Google Scholar] [CrossRef]
  5. Zhou, Y.; Chung, E.; Bhaskar, A.; Cholette, M.E. A state-constrained optimal control based trajectory planning strategy for cooperative freeway mainline facilitating and on-ramp merging maneuvers under congested traffic. Transp. Res. Part C-Emerg. Technol. 2019, 109, 321–342. [Google Scholar] [CrossRef]
  6. Jiang, B.; Liu, C. Street-based topological representations and analyses for predicting traffic flow in GIS. Int. J. Geogr. Inf. Sci. 2009, 23, 1119–1137. [Google Scholar] [CrossRef] [Green Version]
  7. Liu, B.; Dong, W.; Zhan, Z.; Wang, S.; Meng, L. Differences in the gaze behaviours of pedestrians navigating between regular and irregular road patterns. ISPRS Int. J. Geo-Inf. 2020, 9, 45. [Google Scholar] [CrossRef] [Green Version]
  8. Habermann, D.; Vido, C.; Osorio, F.S.; Ramos, F. Road junction detection from 3D point clouds. In Proceedings of the International Joint Conference on Neural Networks, Vancouver, BC, Canada, 24–29 July 2016; pp. 4934–4940. [Google Scholar]
  9. Li, H.; Hu, M.; Huang, Y. Automatic identification of overpass structures: A method of deep learning. ISPRS Int. J. Geo-Inf. 2019, 8, 421. [Google Scholar] [CrossRef] [Green Version]
  10. Yang, M.; Jiang, C.; Yan, X.; Ai, T.; Cao, M.; Chen, W. Detecting interchanges in road networks using a graph convolutional network approach. Int. J. Geogr. Inf. Sci. 2022, 36, 1119–1139. [Google Scholar] [CrossRef]
  11. Zhou, Q.; Li, Z. Experimental analysis of various types of road intersections for interchange detection. Trans. GIS 2015, 19, 19–41. [Google Scholar] [CrossRef]
  12. Xu, Z.; Meng, Y.; Li, Z.; Li, M. Identification method of typical road junctions based on directed attribute relation graph. Acta Geod. Cartogr. Sin. 2011, 40, 125–131. [Google Scholar]
  13. Wang, X.; Qian, H.; Ding, Y.; Zhang, X.; Liu, R. Recognition method of overall interchanges based on topological relationship and road classification. J. Geomat. Sci. Technol. 2013, 30, 324–328. [Google Scholar]
  14. Zhao, Z.; Zheng, P.; Xu, S.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef]
  15. Otter, D.W.; Medina, J.R.; Kalita, J.K. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 604–624. [Google Scholar] [CrossRef] [Green Version]
  16. Nassif, A.B.; Shahin, I.; Attili, I.; Azzeh, M.; Shaalan, K. Speech recognition using deep neural networks: A systematic review. IEEE Access 2019, 7, 19143–19165. [Google Scholar] [CrossRef]
  17. Yan, X.; Ai, T.; Yang, M.; Yin, H. A graph convolutional neural network for classification of building patterns using spatial vector data. ISPRS-J. Photogramm. Remote Sens. 2019, 150, 259–273. [Google Scholar] [CrossRef]
  18. Yan, X.; Ai, T.; Yang, M.; Tong, X. Graph convolutional autoencoder model for the shape coding and cognition of buildings in maps. Int. J. Geogr. Inf. Sci. 2021, 35, 490–512. [Google Scholar] [CrossRef]
  19. Michael, M.K.; Thirumalai, S.V.J.; Sureshkanna, P. RBorderNet: Rider border collie optimization-based deep convolutional neural network for road scene segmentation and road intersection classification. Digit. Signal Process. 2022, 129, 103626. [Google Scholar] [CrossRef]
  20. He, H.; Qian, H.; Xie, L.; Duan, P. Interchange recognition method based on CNN. Acta Geod. Cartogr. Sin. 2018, 47, 385–395. [Google Scholar]
  21. Touya, G.; Lokhat, I. Deep learning for enrichment of vector spatial databases. ACM Trans. Spat. Algorithms Syst. 2020, 6, 1–21. [Google Scholar] [CrossRef] [Green Version]
  22. Li, C.; Zhang, H.; Wu, P.; Yin, Y.; Liu, S. A complex junction recognition method based on GoogLeNet model. Trans. GIS 2020, 24, 1756–1778. [Google Scholar] [CrossRef]
  23. Yang, M.; Kong, B.; Dang, R.; Yan, X. Classifying urban functional regions by integrating buildings and points-of-interest using a stacking ensemble method. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102753. [Google Scholar] [CrossRef]
  24. Fatemeh, H.; Hesam, O. Stacking ensemble model of deep learning and its application to Persian/Arabic handwritten digits recognition. Knowl. Based Syst. 2021, 220, 106940. [Google Scholar] [CrossRef]
  25. Cao, D.; Xing, H.; Sing, W.M.; MeiPo, K.; Xing, H.; Meng, Y. A stacking ensemble deep learning model for building extraction from remote sensing images. Remote Sens. 2021, 13, 3898. [Google Scholar] [CrossRef]
  26. Cheng, X.; Lei, H. Remote sensing scene image classification based on mmsCNN–HMM with stacking ensemble model. Remote Sens. 2022, 14, 4423. [Google Scholar] [CrossRef]
  27. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  28. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef] [Green Version]
  29. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  30. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  31. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
Figure 1. Overall framework of the proposed stacking ensemble (SE) learning method for classifying the patterns of complex road junctions.
Figure 2. Illustration of the process of generating the raster image for a vector-based complex road junction.
Figure 3. Comparison of the raster images generated with different line widths.
Figure 4. Simplified illustration of the AlexNet architecture.
Figure 5. Simplified illustration of the GoogLeNet architecture.
Figure 6. Details of the stacking ensemble learning process. $p_j^i$ represents the probability that the pattern of the input sample belongs to the i-th type as predicted by the j-th base-classifier, and m and n denote the number of base-classifiers and the number of junction pattern types, respectively.
Figure 7. Examples for the seven types of complex road junctions.
Figure 8. Size distribution of the minimum bounding squares of the collected road junction samples.
Figure 9. Some misclassified junctions and the typical junctions of their types: (a) misclassified and typical Trumpet-type junctions, respectively, (b) misclassified and typical Diamond-type junctions, respectively.
Figure 10. Classification results for the complex road junctions in different cities using the proposed stacking ensemble (SE) method: (a–d) Beijing, Chengdu, Chongqing, and Guangzhou, respectively, using the AlexNet-based SE model, (e–h) Beijing, Chengdu, Chongqing, and Guangzhou, respectively, using the GoogLeNet-based SE model.
Table 1. Examples of the seven types of junction samples (raster images) under three sampling area coverages (500 × 500 m2, 1000 × 1000 m2, and 1500 × 1500 m2).
Table 2. Classification accuracies of the AlexNet-based and GoogLeNet-based base-classifiers and stacking ensemble (SE) models for test data.

Model | AlexNet-Based SE Model, Accuracy (%) | GoogLeNet-Based SE Model, Accuracy (%)
Base-classifier 1 (area coverage 500 × 500 m2) | 73.4 | 86.4
Base-classifier 2 (area coverage 1000 × 1000 m2) | 73.4 | 82.4
Base-classifier 3 (area coverage 1500 × 1500 m2) | 72.4 | 78.3
SE model | 78.9 | 92.4
Table 3. Comparison of classification results using the base-classifiers and stacking ensemble (SE) model.

Case | Manual Label | Base-Classifier 1 (500 × 500 m2) | Base-Classifier 2 (1000 × 1000 m2) | Base-Classifier 3 (1500 × 1500 m2) | SE Model
1 | Diamond type | Diamond type | Diamond type | Other type | Diamond type
2 | T-shape type | Other type | T-shape type | T-shape type | T-shape type
3 | Turbine type | Turbine type | Turbine type | Butterfly type | Turbine type
Table 4. Confusion matrix and evaluation metrics of the classification results of complex junctions using the AlexNet-based SE model.

Manual Label | Butterfly | Cloverleaf | Diamond | T-shape | Trumpet | Turbine | Other | Precision (%) | Recall (%) | F1
Butterfly | 134 | 12 | 0 | 0 | 0 | 32 | 2 | 68.7 | 74.4 | 0.71
Cloverleaf | 0 | 179 | 0 | 0 | 0 | 0 | 1 | 90.9 | 99.4 | 0.95
Diamond | 0 | 0 | 137 | 20 | 5 | 0 | 18 | 78.3 | 76.1 | 0.77
T-shape | 0 | 0 | 7 | 151 | 10 | 0 | 12 | 73.3 | 83.9 | 0.78
Trumpet | 0 | 0 | 0 | 8 | 167 | 0 | 5 | 91.8 | 92.8 | 0.92
Turbine | 48 | 6 | 0 | 0 | 0 | 126 | 0 | 75.5 | 70.0 | 0.73
Other | 13 | 0 | 31 | 27 | 0 | 9 | 100 | 72.5 | 55.6 | 0.63
Table 5. Confusion matrix and evaluation metrics of the classification results of complex junctions using the GoogLeNet-based SE model.

Manual Label | Butterfly | Cloverleaf | Diamond | T-shape | Trumpet | Turbine | Other | Precision (%) | Recall (%) | F1
Butterfly | 179 | 0 | 0 | 0 | 0 | 0 | 1 | 90.4 | 99.4 | 0.95
Cloverleaf | 0 | 174 | 0 | 0 | 0 | 0 | 6 | 99.4 | 96.7 | 0.98
Diamond | 0 | 0 | 174 | 1 | 0 | 0 | 5 | 91.6 | 96.7 | 0.94
T-shape | 0 | 0 | 10 | 160 | 0 | 0 | 10 | 92.0 | 88.9 | 0.90
Trumpet | 0 | 0 | 0 | 0 | 178 | 0 | 2 | 98.9 | 98.9 | 0.99
Turbine | 16 | 0 | 0 | 0 | 0 | 157 | 7 | 92.4 | 87.2 | 0.90
Other | 3 | 1 | 6 | 13 | 2 | 13 | 142 | 82.1 | 78.9 | 0.80
Table 6. Classification accuracies of the stacking ensemble (SE) models with different numbers of base-classifiers for test data.

Model | AlexNet-Based SE Model, Accuracy (%) | GoogLeNet-Based SE Model, Accuracy (%)
Two base-classifiers | 76.8 | 88.0
Three base-classifiers | 78.9 | 92.4
Four base-classifiers | 77.1 | 92.1
Five base-classifiers | 77.5 | 90.7
Table 7. Statistics of the detected complex junctions in the four cities.
CityButterflyCloverleafDiamondT-ShapeTrumpetTurbineOther
Beijing2115138772855
Chengdu23391020115
Guangzhou3312313711626
Chongqing1911624151519
Table 8. Classification accuracies of the AlexNet- and GoogLeNet-based stacking ensemble (SE) models for the complex road junctions in the four cities.

City | AlexNet-Based SE Model, Accuracy (%) | GoogLeNet-Based SE Model, Accuracy (%)
Beijing | 75.6 | 77.9
Chengdu | 64.4 | 74.4
Chongqing | 75.0 | 76.9
Guangzhou | 73.4 | 83.5
