Article

Investigating Green View Perception in Non-Street Areas by Combining Baidu Street View and Sentinel-2 Images

1 Chinese Academy of Surveying and Mapping, Beijing 100036, China
2 SpaceTellan Aerospace Spatiotemporal Information Technology (Chongqing) Co., Ltd., Chongqing 401135, China
3 Faculty of Geosciences and Engineering, Southwest Jiaotong University, Chengdu 610031, China
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(16), 7485; https://doi.org/10.3390/su17167485
Submission received: 20 June 2025 / Revised: 3 August 2025 / Accepted: 15 August 2025 / Published: 19 August 2025
(This article belongs to the Special Issue Remote Sensing in Landscape Quality Assessment)

Abstract

Urban greening distribution critically impacts residents’ quality of life and environmental sustainability. While the Green View Index (GVI), derived from street view imagery, is widely adopted for urban green space assessment, its limitation lies in its inability to capture non-street-area vegetation. Remote sensing imagery, conversely, provides full-coverage urban vegetation data. This study focuses on Beijing’s Third Ring Road area, employing DeepLabv3+ to calculate a street-view-based GVI as a predictor. Correlations between the GVI and the Sentinel-2 spectral bands, along with two vegetation indices, the Normalized Difference Vegetation Index (NDVI) and Fractional Vegetation Cover (FVC), were analyzed under varying buffer radii. Regression and classification models were subsequently developed for GVI prediction. The optimal classifier was then applied to estimate green perception levels in non-street zones. The results demonstrated the following: (1) At a 25 m buffer radius, the near-infrared band, NDVI, and FVC exhibited the highest correlations with the GVI, reaching 0.553, 0.750, and 0.752, respectively. (2) Among the five machine learning regression models evaluated, the random forest algorithm demonstrated superior performance in GVI estimation, achieving a coefficient of determination (R2) of 0.787, a root mean square error (RMSE) of 0.063, and a mean absolute error (MAE) of 0.045. (3) For categorical perception levels of urban greenery, the Extremely Randomized Trees (Extra Trees) classifier performed best, achieving an accuracy (ACC) of 0.652. (4) In 56.8% of the non-road areas within Beijing’s Third Ring Road, green perception is relatively poor (GVI below 0.15); moreover, green perception within the Second Ring Road is even lower than in the area between the Second and Third Ring Roads.
This study is expected to provide valuable insights and references for the adjustment and optimization of green perception distribution in Beijing, thereby supporting more informed urban planning and the development of sustainable, human-centered green spaces across the city.

1. Introduction

The optimization of urban spatial planning to enhance livability has emerged as a critical focus in contemporary urban research, driven by escalating societal demands for improved quality of life. A systematic assessment of urban spatial configurations through multidimensional evaluation frameworks provides essential empirical foundations for evidence-based urban management [1]. Within urban ecosystem planning paradigms, green infrastructure development constitutes a vital component of sustainable environmental systems, requiring integrated approaches that balance ecological functionality with human-centric design principles [2]. The investigation of urban green space (UGS) in Chinese megacities has traditionally leveraged multi-source social media datasets, including points-of-interest (POI) mapping, geotagged Weibo check-ins, and Dazhong Dianping data [3,4,5]. Satellite remote sensing historically dominated UGS research prior to street view imagery, offering broad spatial coverage and temporal resolution [6,7]. Core vegetation indices included the NDVI for spectral analysis and FVC for structural assessment [8,9]. Zhao et al. analyzed green space evolution in Nanjing and Greater Manchester using remote sensing-derived land use data and spatial statistics [10]. Li et al. investigated NDVI dynamics in Beijing through multispectral processing and geospatial modeling, identifying key drivers of green space variations [11].
Research has shown that approximately 90% of the environmental information perceived by humans is acquired through visual channels [12]. With advancements in street view imagery technology and the comprehensive studies conducted by Japanese scholars on the Green View Index (GVI), including quantitative assessments of environmental greenery, psychological response mechanisms, and landscape perception evaluations [13,14], the GVI has evolved into a fundamental three-dimensional metric for assessing urban green spaces. The GVI quantifies pedestrian-level environmental perceptions, capturing spatiotemporal variations and the three-dimensional composition of community greening [15,16]. Compared to traditional indices such as the Greening Rate (GR) and Green Space Ratio (GSR), studies demonstrate that the GVI more accurately reflects public green space quality and aligns with daily human activities [17]. Villeneuve et al. identified statistically significant associations between the GVI and summer recreational time, outperforming the NDVI [18]. As a robust three-dimensional metric, the GVI addresses critical limitations of conventional two-dimensional indices by quantifying pedestrian-scale vegetation exposure with high precision [19]. Its demonstrated correlations with human activity patterns and recreational behaviors align with contemporary demands for human-centric urban design, while complementing the traditional remote sensing approaches that have dominated urban green space research. Current methodologies for GVI extraction and calculation primarily involve HSV color space analysis, semantic segmentation, and supervised classification techniques. Zheng et al. integrated an HSV-based GVI and sky view factor (SVF) with population heatmaps, revealing significant correlations between street hierarchy and GVI values in historic districts [20]. Ye et al.
combined SegNet for vegetation extraction from Google Street View (GSV) imagery with spatial design network analysis (sDNA) using OpenStreetMap (OSM) data to calculate street accessibility metrics [21]. Comparative analyses by Feng et al. demonstrated street view imagery’s superiority over multispectral remote sensing in pedestrian-perspective greenery monitoring [22]. These multimodal approaches enable comprehensive urban planning through complementary spatiotemporal insights.
Recent advancements in urban informatics have demonstrated the efficacy of integrating deep learning with geospatial analysis. Guo Jinhuan et al. used the DeepLabv3+ model to process street view images [23,24]; based on the semantic segmentation results, they combined multi-source data to analyze the impact of environmental, architectural, and neighborhood characteristics on housing prices on Xiamen Island. Concurrently, Hu et al. conducted comparative evaluations of segmentation performance on multispectral remote sensing datasets, with empirical results demonstrating the superior Intersection over Union (IoU) and mean IoU (mIoU) metrics achieved by DeepLabv3+ across all land cover categories [25]. Within urban greening assessment frameworks, street view imagery and remote sensing data exhibit complementary strengths: pedestrian-centric visual exposure metrics derived from street-level perspectives contrast with the NDVI and FVC obtained from aerial platforms, collectively enabling the multi-scale urban ecosystem monitoring that is essential for sustainable planning. Li Miaoyi et al. calculated the GVI from the greenery identified by SegNet in street view images and the NDVI from Landsat 8 Operational Land Imager (OLI) images of the same month to clarify the greening differences between the satellite scale and the human scale, concluding that human-scale greening evaluation is more promising [26]. However, street view images can only be captured along streets, making it difficult to evaluate urban green space in non-street areas. As an index describing urban green environments, the GVI is bound to correlate with the NDVI and similar measures. Ming Tong et al. used Nanjing as a case study to explore the relationship between the GVI and NDVI, showing strong correlations between the two [27].
Limited research has addressed how to exploit their relationship to overcome the spatial constraints of street view imagery collection.
Given the temporal inconsistency and spatial sparsity of street view image acquisition, this study first explored the correlations between remote sensing-derived vegetation indices and street-level GVI values. Subsequently, predictive models were established to estimate GVI-equivalent metrics in non-street areas. These methodological advancements provided critical data support for evidence-based urban green space planning and evaluation.

2. Study Area and Data Source

2.1. Study Area

Street view imagery, a widely used data source in recent years, suffers from limited spatial coverage. Given that some cities cannot obtain complete street view image coverage, particular attention was paid to ensuring temporal consistency between street view imagery and remotely sensed data during the experimental design. This study focuses on the main road network within Beijing’s Third Ring Road area. Encompassing 159 square kilometers, the study area primarily includes Dongcheng District, Xicheng District, and portions of Haidian, Fengtai, and Chaoyang districts. The selected road network comprises 231 principal roads with a total length of approximately 309.06 km (Figure 1).

2.2. Street View Data

Baidu Street View data were utilized in this study. Road network data were initially retrieved through the Overpass API from the OpenStreetMap (OSM) platform (https://www.openstreetmap.org). The GeoConverter tool (https://geoconverter.infs.ch/) was then employed to convert these data into vector format under the World Geodetic System 1984 (WGS-1984) geographic coordinate system, which served as the geospatial reference framework for street view sampling. Using ArcGIS 10.7 software, we extracted road segments within the study area. For dual-carriageway roads presenting symmetrical alignment, their widths were measured through manual quantification. Width measurements were conducted for different road classifications separately, with the average value assigned as the representative width for each road category.
Based on the measured widths, buffer zones were created around the road centerlines, from which refined vector network data were derived by extracting the central axes of the buffers. To avoid sampling duplication, road segments were truncated at intersections. Sampling points were generated along the centerlines at 50 m intervals, with the corresponding geographic coordinates calculated using geodetic formulas. A Python 3.8 script was developed to convert the WGS-1984 coordinates to the Baidu coordinate system (BD-09). Through the Baidu Street View API, panoramic images within a 50 m vicinity of the sampling points were acquired, recording metadata that included unique image identifiers, captured entities, and actual geographic coordinates. Figure 2 illustrates the acquisition workflow of the street view imagery data.
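The 50 m interval sampling along a centerline can be sketched as follows. `sample_points` is a hypothetical helper, not the authors' script; it assumes coordinates already projected to a metric system, and the WGS-1984 to BD-09 conversion described above is a separate step omitted here.

```python
# Sketch: generate sampling points every 50 m along a road centreline.
# Assumes planar coordinates in metres, so distances are Euclidean.
import math

def sample_points(centreline, interval=50.0):
    """Return points spaced `interval` metres along a polyline."""
    points = [centreline[0]]
    carried = 0.0  # distance covered since the last sample point
    for (x0, y0), (x1, y1) in zip(centreline, centreline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)  # length of this segment
        d = interval - carried
        while d <= seg:
            t = d / seg  # linear interpolation along the segment
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += interval
        carried = (carried + seg) % interval
    return points

print(sample_points([(0, 0), (100, 0)]))
```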
Baidu Street View images are 360° panoramic scenes that are captured by vehicle-mounted cameras at predefined sampling locations. To address the constrained vertical field of view in human visual perception and correct the lens distortions that are prevalent in the upper and lower portions of these street view images, we implemented the preprocessing framework that was established by Xiang Jing et al. [28]. The original images with a native resolution of 4096 × 2048 pixels underwent marginal cropping, which removed 320 pixels from both the top and bottom edges, resulting in rectified imagery at 4096 × 1408 pixels resolution (Figure 3).
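The marginal cropping step is a single slice of pixel rows; a minimal sketch is shown below. In practice an imaging library such as Pillow would operate on the image files, but the arithmetic (2048 − 2 × 320 = 1408) is the same.

```python
# Minimal sketch of the marginal crop described above, with the panorama
# held as a height x width list of pixel rows (stand-in for real imagery).
def crop_margins(rows, margin=320):
    """Drop `margin` pixel rows from the top and bottom of the image."""
    return rows[margin:len(rows) - margin]

panorama = [[0] * 4096 for _ in range(2048)]  # stand-in 4096 x 2048 image
cropped = crop_margins(panorama)
print(len(cropped), len(cropped[0]))  # height 1408, width 4096
```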
To ensure temporal consistency across the remote sensing and street view datasets, we performed temporal metadata analysis of all street view images by using their acquisition timestamps. The statistical results revealed that June–August 2017 accounted for the peak collection period, constituting 61% of the total acquired images (Figure 4). Accordingly, the final dataset prioritized images from this three-month period, yielding a total of 4070 qualified street view samples.

2.3. Remote Sensing Data

Considering the temporal distribution of the street view images, we downloaded the Sentinel-2 Level-1C data for the period from June to August 2017. Sentinel-2 is a medium-to-high-resolution multispectral imaging satellite that provides vegetation, soil, and coastal area monitoring data for terrestrial environment and natural disaster observation. The Sentinel-2 constellation includes two satellites (Sentinel-2A and Sentinel-2B), launched in 2015 and 2017, respectively, which together achieve a 3–5-day global revisit cycle (URL: https://dataspace.copernicus.eu/).
We used blue, green, red, and near-infrared bands with a 10 m spatial resolution. Through visual inspection, three cloud-free Sentinel-2 Level-1C images from June, July, and August 2017 were selected. These orthorectified images are atmospheric-apparent reflectance products with sub-pixel geometric precision correction. The Sen2Cor atmospheric correction tool from the European Space Agency was applied for atmospheric, topographic, and cirrus corrections. Finally, the three Sentinel-2 images were resampled and cropped using SNAP 8.0 software. Figure 5 displays the false-color composite image of the study area, where the brighter yellow hues indicate higher vegetation content.

3. Methodology

Street view imagery and remote sensing imagery characterize the distribution of urban greenery from different perspectives. Considering the inaccessibility of street view images in non-road areas, GVI values derived from the street view images in road-adjacent zones were analyzed to predict non-road GVIs through Sentinel-2 data integration. First, the GVI values in the road areas were quantified using the DeepLabv3+ semantic segmentation model. Correlation analyses were conducted between Sentinel-2 spectral bands (blue, green, red, and near-infrared), vegetation indices (the NDVI and FVC), and the GVI to identify key predictors and to determine the optimal buffering radii for remote sensing indices. Multiple machine learning regression and classification models were evaluated to establish the optimal predictor for the non-road GVI and green vision perception levels. Finally, the optimal model was selected to predict spatial green coverage in non-road areas within Beijing’s Third Ring Road.

3.1. Street View-Based GVI Calculation

In this study, the DeepLabv3+ model was selected for binary semantic segmentation of the street view images. DeepLabv3+ is an advanced neural network architecture designed for semantic image segmentation, combining atrous convolution with an Atrous Spatial Pyramid Pooling (ASPP) module to capture multi-scale contextual information. It enhances the previous DeepLab architectures by introducing an encoder–decoder structure, where the encoder extracts features and the decoder refines the segmentation maps for more precise results. The model is highly effective in applications requiring detailed image segmentation, such as autonomous driving and medical imaging, owing to its ability to accurately segment objects of varying sizes and shapes [29]. To reduce the computational cost, the Xception backbone was replaced by MobileNet v2 as the feature extraction network. A DeepLabv3+ pre-trained model built on the Cityscapes dataset was retrained with manually labeled greening samples from the street view images, using a grid search strategy to optimize hyperparameters such as the learning rate, output size, and batch size. Readers are referred to Wang et al. for detailed segmentation descriptions [30].
The GVI of each street view image is calculated as the ratio of green-classified pixels to the total pixel count, based on the binary semantic segmentation of the green elements (Equation (1)). The GVI ranges from 0 to 1 (equivalently, 0–100%); increasing values indicate a greater proportion of greenery in the street view perspective.
GVI = (N_green / N_pixel) × 100%
In the formula, N_pixel represents the total number of pixels in the image, obtained by multiplying the number of pixel rows by the number of pixel columns (here a constant, 4096 × 1408 pixels), and N_green represents the number of pixels occupied by the green elements extracted from the image.
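Equation (1) amounts to a single ratio over the binary segmentation mask; a minimal NumPy sketch (illustrative, not the authors' code):

```python
# GVI of one segmented street view image (Equation (1)): the share of
# pixels labelled "green" in the binary segmentation mask.
import numpy as np

def green_view_index(mask):
    """mask: 2-D array, 1 = green pixel, 0 = other. Returns GVI in [0, 1]."""
    return mask.sum() / mask.size

mask = np.zeros((1408, 4096), dtype=np.uint8)  # cropped panorama size
mask[:352, :] = 1  # pretend the top quarter of the mask is vegetation
print(green_view_index(mask))
```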

3.2. Correlation Analysis

The Sentinel-2 10 m resolution red, green, blue, and near-infrared bands, along with the green-related NDVI and FVC indices (six variables in total), were analyzed using 4070 GVI values from street view images to explore the correlations. The NDVI is a widely used remote sensing index that measures vegetation health and density via satellite imagery (Equation (2)) [8], while the FVC refers to the proportion of green vegetation coverage relative to total ground area (Equation (3)) [31]. It is a key biophysical parameter for assessing vegetation density and distribution in ecological studies. The FVC is expressed as a percentage that represents live vegetation coverage.
Since a GVI value represents green vision perception within a specific area, pixel-based band values (four bands) and indices (two indices) were aggregated to the corresponding spatial units. To account for varying street widths, three circular buffering strategies were applied to GVI locations as follows: fixed 25 m, fixed 45 m, and dynamic radii (45 m for the ring streets, 35 m for the main streets, and 25 m for the secondary streets; Figure 6). ArcGIS software was used to create buffers around each street view image location, within which the red, green, blue, near-infrared band, NDVI, and FVC values were averaged.
NDVI = (NIR − RED) / (NIR + RED)
where NIR and RED represent the spectral reflectance of the near-infrared and red bands, respectively.
FVC = (NDVI − NDVI_b) / (NDVI_v − NDVI_b)
where FVC is the fractional vegetation coverage; NDVI is the normalized difference vegetation index value of each pixel; NDVI_b is the NDVI value of fully bare-soil pixels, taken as the 5th percentile of the image’s NDVI values in ascending order; and NDVI_v is the NDVI value of pure vegetation pixels, taken as the 95th percentile of the image’s NDVI values in ascending order.
Prior to analyzing the correlation between the GVI and the Sentinel-2 variables, we tested whether each Sentinel-2 variable followed a normal distribution by assessing the goodness-of-fit between the sample distribution and a normal distribution. Given the sample size (<5000), the Shapiro–Wilk (S-W) test was performed to analyze the degree of deviation from normality, with p-values calculated using SPSS 28.0 [32]. The p-value of a Shapiro–Wilk test ranges from 0 to 1; the closer it is to 1, the more closely the data approximate a normal distribution. Subsequently, the Pearson correlation coefficient [33] was calculated to measure the degree of correlation between the GVI and each Sentinel-2 variable. The Pearson correlation coefficient measures the linear relationship between two variables by dividing their covariance by the product of their standard deviations; it always lies between −1 and 1. Coefficients closer to 1 or −1 indicate a stronger linear relationship, while values near 0 suggest a weak or no linear relationship.
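The Pearson coefficient as defined above (covariance divided by the product of the standard deviations) can be computed directly; this small function is an illustrative stand-in for SPSS or `scipy.stats.pearsonr`:

```python
# Pearson correlation coefficient: covariance of x and y divided by the
# product of their standard deviations (the n factors cancel out).
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear pair gives a coefficient of 1.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```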

3.3. GVI Regression and Classification Prediction

The GVI, as a key three-dimensional metric for urban greenery assessment, is primarily derived from street view imagery. However, existing street view data are limited by temporal inconsistencies and insufficient spatial coverage. To address these limitations, we integrated the street view imagery with remote sensing data and employed both regression and classification prediction models. The classification model directly estimates urban residents’ green perception levels in their living environments, providing data support for urban planning and greenery assessment. The regression model establishes quantitative relationships between the GVI and remote sensing vegetation indices, overcoming the spatiotemporal constraints of street view imagery to deliver more accurate GVI estimations. It should be emphasized that, given the current scarcity of experimental studies in this field, we conducted comprehensive experiments using both approaches.
Based on the averaged Sentinel-2 data variables and corresponding GVI values, machine learning regression models were employed to predict specific GVI values. Five models were compared to determine the optimal regression prediction model, including K-Nearest Neighbor (KNN) regression, linear regression, decision tree (DT) regression, random forest (RF) regression, and Extra Trees Regressor. Based on the results of the regression prediction models, we selected the most stable performing models for classification prediction. We optimized the model hyperparameters using a grid search approach to ensure maximum prediction accuracy. The 4070 samples were randomly split into training and testing sets with a ratio of 4:1. The models were constructed using the open-source scikit-learn library (https://scikit-learn.org/stable/api/sklearn.ensemble.html accessed on 19 June 2025).
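An illustrative scikit-learn sketch (not the authors' exact pipeline) of comparing the five named regressors with the same 4:1 train/test split; the synthetic predictors and GVI target below stand in for the real Sentinel-2 variables:

```python
# Compare the five regression models on synthetic stand-ins for the six
# Sentinel-2 predictors (B2, B3, B4, B8, NDVI, FVC) and the GVI target.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((500, 6))                                     # six predictors
y = 0.5 * X[:, 4] + 0.3 * X[:, 5] + 0.05 * rng.random(500)   # synthetic GVI

# 4:1 split, as in the study (test_size=0.2).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNN": KNeighborsRegressor(),
    "Linear": LinearRegression(),
    "DT": DecisionTreeRegressor(random_state=0),
    "RF": RandomForestRegressor(random_state=0),
    "ExtraTrees": ExtraTreesRegressor(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, round(r2_score(y_te, model.predict(X_te)), 3))
```

In the real pipeline, grid-searched hyperparameters would replace the defaults used here.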
Three accuracy evaluation indicators of the regression prediction model were utilized, which included the root mean square error (RMSE, Equation (4)), mean absolute error (MAE, Equation (5)), and determination coefficient (R2, Equation (6)). RMSE is the square root of the expected squared difference between predicted and actual values. MAE represents the average absolute error between predictions and observations, reflecting the magnitude of prediction errors. Lower RMSE and MAE values indicate higher model accuracy. R2 quantifies the proportion of variance in the dependent GVI variable explained by the independent Sentinel-2 variables. R2 values closer to 1 demonstrate higher model accuracy [34].
RMSE = sqrt( (1/n) Σ_{i=1}^{n} (ŷ_i − y_i)² )
MAE = (1/n) Σ_{i=1}^{n} |ŷ_i − y_i|
R² = 1 − Σ_{i=1}^{n} (y_i − ŷ_i)² / Σ_{i=1}^{n} (y_i − ȳ)²
where n is the number of test samples, y_i is the actual GVI value from the street view images, ŷ_i is the predicted value from the models, and ȳ is the mean of the actual values of the test samples.
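Equations (4)-(6) can be checked with direct implementations on a small worked example (illustrative values, not the paper's data):

```python
# Direct implementations of Equations (4)-(6).
import math

def rmse(y, yhat):
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(p - a) for a, p in zip(y, yhat)) / len(y)

def r2(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - p) ** 2 for a, p in zip(y, yhat))  # residual sum of squares
    ss_tot = sum((a - ybar) ** 2 for a in y)             # total sum of squares
    return 1 - ss_res / ss_tot

y_true = [0.10, 0.20, 0.30, 0.40]
y_pred = [0.12, 0.18, 0.33, 0.38]
print(rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```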
One goal of this study was to explore urban green perception levels. Machine learning classification models were developed to directly predict green vision perception in non-road areas, rather than categorizing the levels based on regressed GVI values. It was hypothesized that classification accuracy could be improved through this approach.
Following Japan’s National Institute for Environmental Studies, the green perception levels demonstrating positive psychological effects were defined as follows: poor [0–0.15), general [0.15–0.25), and good [0.25–1.00] [35]. Given that GVI values in [0–0.1) were primarily attributable to sparse vegetation or image obstructions, the [0–0.15) range was subdivided into [0–0.1) and [0.1–0.15). Decision tree (DT) and random forest (RF) classifiers were implemented using the same training samples and scikit-learn library as the regression models, categorizing the predictions into four green levels. The classification accuracy was evaluated using the confusion matrix, which tabulates correct versus incorrect classifications [36].
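Assigning the four green levels is a threshold lookup; a sketch using `numpy.digitize` (an implementation choice assumed here, not stated in the paper):

```python
# Map continuous GVI values to the four perception-level bins defined
# above: [0, 0.10), [0.10, 0.15), [0.15, 0.25), [0.25, 1.00].
import numpy as np

THRESHOLDS = [0.10, 0.15, 0.25]

def perception_level(gvi):
    """Return bin index 0-3; digitize uses bins[i-1] <= x < bins[i]."""
    return int(np.digitize(gvi, THRESHOLDS))

print([perception_level(v) for v in (0.05, 0.12, 0.20, 0.40)])
```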
Precision (P) and recall (R), which exhibit a trade-off relationship, were both calculated to evaluate the GVI level accuracy. The F1-score, defined as the harmonic mean of precision and recall, serves as a comprehensive evaluation index; higher F1-scores indicate better classification performance. Accuracy (ACC) measures the overall prediction correctness of the classification models, defined as the proportion of correctly predicted instances in the dataset. ACC ranges from 0 to 1, with values approaching 1 signifying higher model accuracy.
P = TP / (TP + FP)
R = TP / (TP + FN)
F1 = (2 × P × R) / (P + R)
ACC = (TP + TN) / (TP + FP + FN + TN)
where TP (True Positive) is the number of pixels that belong to the specific green level and are correctly classified as such; FN (False Negative) is the number of pixels that belong to the specific green level but are classified otherwise; FP (False Positive) is the number of pixels that do not belong to the specific green level but are classified as belonging to it; and TN (True Negative) is the number of pixels that do not belong to the specific green level and are correctly classified as such.
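The four metrics follow directly from the confusion counts; a worked sketch with illustrative numbers (not the paper's results):

```python
# P, R, F1, and ACC computed from raw confusion counts for one class.
def prf_acc(tp, fp, fn, tn):
    p = tp / (tp + fp)                    # precision
    r = tp / (tp + fn)                    # recall
    f1 = 2 * p * r / (p + r)              # harmonic mean of P and R
    acc = (tp + tn) / (tp + fp + fn + tn) # overall accuracy
    return p, r, f1, acc

p, r, f1, acc = prf_acc(tp=60, fp=20, fn=15, tn=105)
print(round(p, 3), round(r, 3), round(f1, 3), round(acc, 3))
```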

4. Results and Analysis

4.1. Sentinel-2 Variables Correlation Analysis

Table 1 presents the p-values from the Shapiro–Wilk (S-W) test for the Sentinel-2 variables across the different buffer radii. The p-values for B2, B3, B4, B8, the NDVI, and the FVC were all greater than 0.9, indicating adherence to a normal distribution regardless of the buffer radius.
Table 2 presents the correlation coefficient of each variable under the different buffering radii, showing that the 25 m radius was optimal, with the highest average absolute correlation coefficient of 0.560, consistent with the study of Su et al. [37]. The dynamic radius ranked second, at 0.549, while the 45 m radius had the weakest correlations, at 0.424. More specifically, for the optimal 25 m radius, the FVC showed the strongest positive correlation with the GVI, at 0.752, followed by the NDVI, with a coefficient of 0.750. B8 also had a relatively strong positive correlation with the GVI, at 0.553. In contrast, B2, B3, and B4 correlated negatively with the GVI, with B3 showing the most pronounced negative correlation, at −0.514. Compared with the results for the 25 m radius, all the dynamic-radius variables except B4 had weaker correlations with the GVI, and, as expected, the 45 m variables had the lowest correlations of all.
Remote sensing observes the Earth’s surface from a vertical viewing angle, with no observation bias for vegetation information within a region [38]. In contrast, street view cameras capture street environments from the pedestrian perspective, which can magnify the weight of vegetation near the camera owing to depth-of-field (DOF) effects [39]. Street view images on narrower streets therefore tend to overestimate the GVI, because nearby greenery fills more of the view, while images on wider streets tend to underestimate it, because distant vegetation contributes less green information. Our correlation analysis demonstrated that the fixed 25 m radius is optimal among the three buffering strategies, which may indicate that the 35 m and 45 m radii used in the dynamic strategy still exceed the effective range over which street view collection equipment captures greening space.

4.2. GVI Prediction Results

Based on the results of the regression prediction models, we eliminated two unsuitable prediction approaches—KNN and linear regression—and selected the DT, RF, and Extra Trees models for GVI classification prediction. Table 3 presents the final model parameters of the regression and classification prediction models that were optimized through the grid search method.
Figure 7 presents a comparison of five GVI regression models, where the RF model had the best prediction performance, achieving the lowest RMSE and MAE of 0.063 and 0.045, as well as the highest R2 of 0.787. The DT model ranked second, while the linear regression model had the worst performance with the corresponding values of 0.088, 0.061, and 0.590, respectively.
Table 4 presents the classification performance metrics, revealing that the Extra Trees classifier attained optimal results, with an F1-score of 0.735, demonstrating balanced performance with a 0.745 recall and 0.737 precision.
Based on the evaluation results from five regression models, this study conducted a comparative analysis of the decision tree, random forest, and Extra Trees models using both regression and classification approaches. We discretized the regression outputs from the decision tree, random forest, and Extra Trees models into three green perception levels—[0–0.15), [0.15–0.25), and [0.25–1.00]—and compared them with the corresponding classification results (Table 5). In the classification prediction model, the accuracy evaluation metrics for the two classified intervals [0–0.10) and [0.10–0.15) were combined and averaged.
The Extra Trees classification model achieved the best performance for green visual perception level prediction, demonstrating that direct classification outperforms regression followed by discretization. Table 5 presents the R, P, F1, and ACC metrics for both the regression and classification prediction approaches using the decision tree, random forest, and Extra Trees models on the test set. In the regression predictions, all models showed higher precision than recall. Conversely, the classification predictions consistently showed higher recall than precision, indicating that the regression predictions achieved a higher proportion of correct predictions among the predicted positive samples, while the classification predictions were more effective at identifying the positive samples. Considering both the comprehensive F1-score and the ACC, the Extra Trees classification model exhibited superior performance. Notably, the classification predictions generally achieved higher ACCs than the regression predictions, with the Extra Trees classifier reaching an ACC of 0.652.
Comparing the regression and classification prediction models, the highest R2 among the regression models was 0.787, while the highest F1-score among the classification models was 0.717. After standardizing the evaluation metrics between the two approaches, we found that the classification models performed significantly better in both accuracy and overall precision than the regression models. Analyzing these results, we identified two main factors. First, erroneous data in the training dataset were a key reason for the relatively low model accuracy. Second, the high density of samples near the thresholds between the green perception levels strongly influenced the models’ predictive judgments and thus largely determined the classification models’ accuracy.

4.3. GVI Distribution Analysis of the Study Area

Since the relationship between the GVI and the Sentinel-2 data at the pixel scale remains unclear, and considering that the Extra Trees classification model demonstrated superior performance in predicting the green perception levels, this study adopted the Extra Trees classification model to predict GVI levels in non-street areas.
The prediction scope covers the area within Beijing's Third Ring Road, where the reasonableness and reliability of the model's green perception predictions for non-road areas were evaluated. Within this study area, the non-street regions were systematically divided into 50,480 circular buffer zones, each with the optimal 25 m buffer radius. We calculated the mean values of the pixel-based metrics (Band 8, the NDVI, and the FVC) within each circle, which were then passed to the Extra Trees classification model for direct prediction. To improve the visualization of the GVI's spatial distribution, we generated 12,620 circular zones with a 50 m radius and applied a majority-vote rule to aggregate the predicted values from the 25 m buffer zones (see Figure 8). Figure 8 presents the predicted green perception levels in non-road areas. Within the study area, a four-tier color gradient from red to green represents the classification levels as follows: [0–0.10), very poor green vision perception (red); [0.10–0.15), poor (orange); [0.15–0.25), moderate (yellow); and [0.25–1.00], good (green). The results reveal markedly inadequate green visual perception in non-street areas within the Third Ring Road, with most regions (marked in red) exhibiting low GVI values between 0 and 0.1.
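The majority-vote aggregation described above can be written as a short helper. This is a minimal sketch under the assumption that the assignment of 25 m buffers to 50 m zones is already known; the function and identifiers are illustrative, not from the paper.

```python
# Minimal sketch: each 50 m zone takes the most frequent predicted level
# among the 25 m buffer predictions that fall inside it (majority vote).
from collections import Counter

def aggregate_majority(level_by_buffer, buffers_by_zone):
    """level_by_buffer: {buffer_id: predicted level 0-3};
    buffers_by_zone: {zone_id: [buffer_id, ...]}."""
    zone_level = {}
    for zone, buffers in buffers_by_zone.items():
        votes = Counter(level_by_buffer[b] for b in buffers)
        zone_level[zone] = votes.most_common(1)[0][0]   # majority rule
    return zone_level

levels = {1: 0, 2: 0, 3: 2, 4: 3}          # predictions for four 25 m buffers
zones = {"A": [1, 2, 3], "B": [4]}         # two 50 m zones
print(aggregate_majority(levels, zones))   # {'A': 0, 'B': 3}
```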
The frequency and distribution of the green vision perception levels in non-street areas are shown in Figure 9, which quantitatively confirms that the green vision perception level within Beijing's Third Ring Road is poor. The levels from 0 to 0.15 account for more than half of the area (56.8%), with the ranges 0–0.1 and 0.1–0.15 representing 44.6% and 12.2%, respectively. The moderate level (0.15–0.25) is about two percentage points higher than the good level (0.25–1.00).

4.4. Spatial Patterns of Green Perception Distribution Within Beijing’s Third Ring Road

Figure 10 identifies four green perception areas: (a) Area B of Fuli City Community in Shuangjing Street, (b) Fangcheng Community in the Fangzhuang area, (c) the area around the Bridge of Fenzhong Temple in the Shibalidian area, and (d) a mixed residential area on Dashilar Street. Areas (a)–(c) are ordered from the highest to the lowest green perception level in the prediction results, with orange-yellow denoting poor green perception zones and blue indicating better ones. Examining the corresponding real-world conditions shows that (a) and (b) are both residential communities between the Second and Third Ring roads; (a), a recently developed neighborhood with higher economic value, exhibits better greenery than (b). Area (c), situated near a major Third Ring Road transportation hub and surrounded by industrial facilities, shows lower greening levels. Area (d), which contains cultural landmarks and an extensive hutong alley network, displays particularly low green perception within the Second Ring Road, because building typologies and area-specific constraints make greening measures difficult to implement. A comparison between the predicted results and the actual conditions across these four areas, which represent varying green perception levels, shows fundamental consistency, providing substantial validation of our experimental outcomes.
Based on the comparative analysis above, we examined the spatial distribution of green perception levels in the non-road areas within Beijing's Third Ring Road. Figure 11 presents the community distribution within the Third Ring Road in combination with the prediction results. According to Figure 11a, the green vision perception levels within the Second Ring Road were lower than those in the areas between the Second and Third Ring roads. The region within the Second Ring Road primarily comprises the Xicheng and Dongcheng districts, which are characterized by traditional alleyways and narrow lanes. Owing to rapid population growth, these areas have become densely built up, and the historical character of the buildings together with the spatial constraints of the region makes greening efforts challenging. Consequently, the green vision perception within the Second Ring Road remains relatively low [40].
In detail, areas with moderate to good green vision perception levels were primarily located in parks, universities, and some subdistrict communities. For example, parks along the southeast corner of Beijing's Second Ring Road, such as the Temple of Heaven Park, Longtan Park, Taoranting Park, and Beijing Grand View Garden, exhibited good green vision perception levels, ranging from 0.25 to 1.0. At the northwest corner of the Third Ring Road, in communities such as Zizhuyuan, Beixiaguan, and Beitaiping, many universities (including Beijing Institute of Technology, Beijing Foreign Studies University, Beijing Normal University, and Beijing Jiaotong University) displayed moderate to good green vision perception levels exceeding 0.15. Another area with a good green vision perception level was the Ganjiakou community to the west of the Third Ring Road, where Yuyuantan Park, Zizhuyuan Park, and the zoo area contain abundant vegetation.
Large areas with relatively poor green vision perception levels, ranging from 0 to 0.1, were primarily located within Beijing's Second Ring Road. These areas included high-speed railway stations, commercial and cultural districts, distinctive cultural landmarks, and hutong residential areas within the main urban areas. For example, the area within the Second Ring Road, as the cultural center of the Chinese capital, encompasses most of the commercial and cultural blocks as well as characteristic cultural attractions. Notable locations include Liulichang Ancient Cultural Street in the Dashilan community, Wangfujing Commercial Pedestrian Street in the Dongsi community, and the Qianmen community, which is primarily composed of guild halls and museums, such as Madame Tussauds Beijing, the Red Star Yuanshenghao Museum, and the Pigment Guild Hall. Additionally, Beijing's hutong residential areas, such as the Xichangan, Financial Street, and Xinjiekou communities, also exhibited low green vision perception levels.

5. Discussion

We integrated street view imagery and remote sensing data to assess urban green space distribution. To address the lack of street view data in non-road areas, the GVI derived from roadside street view images was combined with Sentinel-2 data to predict the GVI in non-road areas. This section discusses the research from three aspects: innovation, methodological analysis, and research limitations.

5.1. Innovations

In urban green space assessment, many researchers consider the GVI the most suitable three-dimensional evaluation metric [17,18,19]. However, street view imagery, the primary data source, suffers from temporal inconsistencies in data collection and is spatially confined to road networks. These constraints hinder progress in urban GVI research. Remote sensing imagery, by contrast, offers superior temporal and spatial coverage because its data are acquired systematically, providing a new perspective for three-dimensional urban greenery assessment. Through multi-source data integration and cross-scale modeling, we combined the 2D perspective of remote sensing with the ground-level 3D perception of street views, enabling a more comprehensive urban greenery evaluation and offering a novel technical pathway for urban planning and human settlement enhancement. The key innovations are twofold. First, we integrated street view imagery with Sentinel-2 remote sensing data: by combining the pedestrian-perspective GVI with regional vegetation coverage metrics, we developed a green space prediction method that extends from road areas to non-road areas, effectively overcoming the coverage limitations of street view data in non-street zones. Second, leveraging the strong correlations among the NDVI, FVC, and GVI, we used machine learning to establish quantitative relationships between remote sensing indices and pedestrian visual perception, thereby addressing the perspective biases inherent in conventional two-dimensional vegetation indicators.

5.2. Green Perception Difference Analysis Between Street View and Satellite Imagery

5.2.1. Cross-Scale Data Alignment Between Street View and Satellite Imagery

Street view imagery, captured from a pedestrian perspective, simulates human visual perception on urban streets. It embodies characteristics of visual depth and field of view, emphasizing close-range environmental details while diminishing the perceptual prominence of distant objects [41]. In contrast, Sentinel-2 remote sensing imagery, acquired from a vertical perspective, offers a macroscopic capability for quantifying greenery but lacks the three-dimensional spatial hierarchy and rich micro-scale environmental details inherent in human visual perception. Figure 12 illustrates the primary differences in spatial scale between street view and remote sensing imagery. In Figure 12a, the remote sensing imagery is annotated with distances of 20 to 30 m to simulate the range of pedestrian visual perception, which is further segmented into four sub-regions (Regions 1–4). Figure 12b and Figure 12c, respectively, present the undistorted street view imagery after correcting for lens distortion, and the corresponding visual fields of Regions 1–4, as mapped in Figure 12a. The red bounding boxes in Region 1 of Figure 12c clearly demonstrate the perspective distortion characteristic of street view imagery, where nearer objects appear larger and more distant objects appear smaller due to the distance-dependent visual scaling.
Jiang Feng et al. [42] conducted a visual attention study using an eye movement experiment, concluding that pedestrians in urban street environments generally focus their attention within a 20–30 m range. Beyond this threshold, visual perception decreases significantly. In Figure 12, the core perceptual range of human vision in remote sensing imagery is delineated using red (30 m) and green (20 m) circles. Integrating the pedestrian perspective with street view imagery, the remote sensing image is subdivided into four fan-shaped sub-regions: Region 1 represents the front field of view, Region 2 the left side, Region 3 the rear, and Region 4 the right side. Due to greater visual depth in the front and rear views, distant objects appear compressed in street view imagery, aligning with how pedestrians perceive distant elements along a street. For example, in Region 1, trees closer to the observer (highlighted by a red bounding box) appear visually larger than those farther away. In this study, correlation analyses across different buffer radii further confirm these visual-perceptual observations. Under a 25 m buffer radius, the correlation coefficients between the Sentinel-2 near-infrared band, NDVI, FVC, and GVI reach 0.553, 0.75, and 0.752, respectively, which are substantially higher than those observed with a 45 m buffer or a dynamic buffer. Therefore, we determined that a 25 m radius represents the most reasonable spatial scale for aligning street view and remote sensing data analyses. Similarly, Feng Siyuan et al. [22] extracted lateral vegetation information from pedestrian perspectives using Baidu Street View imagery and observed a declining trend in the correlation between street-level greenery indicators and remote sensing-based vegetation metrics with increasing buffer distances. This finding further supports the validity of the spatial scale adopted in the present study.
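The buffer-radius comparison above amounts to computing Pearson correlations between the street-view GVI and buffer-averaged Sentinel-2 variables for each candidate radius. The sketch below illustrates this with synthetic stand-in arrays (the real inputs would be zonal means extracted from the imagery); the noise levels are assumptions chosen only to mimic weaker alignment at larger buffers.

```python
# Hedged sketch of the buffer-radius correlation analysis: Pearson r
# between the sample-point GVI and buffer-averaged NDVI, per radius.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
gvi = rng.random(500)                       # GVI at each street-view sample point

# Buffer-mean NDVI under three buffering strategies (assumed precomputed);
# larger buffers get more noise here to mimic weaker spatial alignment.
ndvi_by_radius = {
    "25 m": np.clip(gvi + 0.2 * rng.normal(size=500), 0, 1),
    "45 m": np.clip(gvi + 0.5 * rng.normal(size=500), 0, 1),
    "dynamic": np.clip(gvi + 0.3 * rng.normal(size=500), 0, 1),
}

for radius, ndvi in ndvi_by_radius.items():
    r, p = pearsonr(gvi, ndvi)              # correlation and its p-value
    print(f"{radius}: r = {r:.3f} (p = {p:.3g})")
```

The radius whose aggregated variables maximize r (25 m in the paper) is then used for cross-source alignment.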

5.2.2. Evaluating the Rationality of GVI Classification Predictions Using FVC Data

Figure 13 presents a distribution map of the GVI levels and a 25 m FVC map. The GVI map (a) was produced by predicting the GVI for non-road areas from spectral indices and band information in the remote sensing imagery, followed by classification. The FVC map (b) is classified into five levels following the methodology of Peng et al. [43]. The GVI and FVC exhibit a high degree of spatial consistency, which supports the reliability of the GVI results. As observed in Figure 13a,b, the areas with the highest and lowest levels of green vision perception overlap substantially, for instance in the high-greenery areas of the Temple of Heaven Park and Longtan Park (black circles) and the low-greenery areas of the Forbidden City and Tiananmen Square (blue circles). Moreover, the differences in visual presentation and local detail between the GVI and FVC are not deficiencies of the GVI model; rather, they stem from fundamental differences in the two metrics' research dimensions and modeling methodologies. The FVC is derived from a top-down, 2D remote sensing perspective, quantifying the proportion of the land surface covered by the vertical projection of vegetation as a continuous value (0 to 1) for each 25 m × 25 m pixel. In contrast, the GVI simulates the amount of greenery perceived by the human eye at street level and in its vicinity, and the classification model categorizes the results into four distinct perceptual levels. Furthermore, the comparison between these two metrics demonstrates the feasibility of using 2D remote sensing data to predict and simulate 3D green vision perception, particularly where comprehensive street view imagery is unavailable.
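The FVC computation referenced here (Formula (3) and the Table 2 note) can be sketched with the dimidiate pixel model, taking NDVI_b and NDVI_v as the 5th and 98th NDVI percentiles. This is a minimal illustration on synthetic pixels, not the authors' processing chain.

```python
# Hedged sketch of the dimidiate pixel model:
#   FVC = (NDVI - NDVI_b) / (NDVI_v - NDVI_b),
# with NDVI_b (bare soil) and NDVI_v (pure vegetation) taken as the
# 5th and 98th percentiles of the scene's NDVI distribution.
import numpy as np

def ndvi_to_fvc(ndvi, low_pct=5, high_pct=98):
    ndvi_b = np.percentile(ndvi, low_pct)    # bare-soil NDVI
    ndvi_v = np.percentile(ndvi, high_pct)   # pure-vegetation NDVI
    fvc = (ndvi - ndvi_b) / (ndvi_v - ndvi_b)
    return np.clip(fvc, 0.0, 1.0)            # FVC is bounded to [0, 1]

rng = np.random.default_rng(2)
ndvi = rng.uniform(-0.1, 0.9, size=10_000)   # synthetic NDVI pixels
fvc = ndvi_to_fvc(ndvi)
print(fvc.min(), fvc.max())                  # pixels beyond the percentiles clip to 0 and 1
```

The resulting continuous FVC raster is what Peng et al.'s [43] five-level scheme then discretizes for the map in Figure 13b.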

5.3. Limitations and Future Work

This study also has some limitations, chiefly concerning the temporal consistency of the street view imagery and the accuracy of spatial scale matching between the street view and remote sensing imagery. The first limitation is the inconsistency in street view image acquisition times, which may reduce the comparability of the GVI results. Because street view collection requires professional equipment or certified contributors, simultaneous data collection across locations was difficult to ensure; although acquisition times were filtered, not all images fell within a consecutive three-year period. For finer-grained analysis and clearer visualization, narrowing the study area could also be considered when examining the GVI green coverage prediction results. The second limitation is that only three buffering radii were used to calculate the Sentinel-2-GVI correlations; future research should explore additional radii to improve cross-source data alignment. To address these issues, we propose three research directions. First, street view data collection timing should be optimized to ensure temporal and seasonal consistency, improving the GVI's calculation accuracy, and stratified sampling of street view imagery under various conditions should be conducted to further analyze correlations with remote sensing vegetation indices [1]. Second, future analyses of street view images should incorporate human visual theory and eye-movement experiments, since depth-of-field processing of street view images can more accurately reflect residents' green visual perception [42]. Finally, advanced models such as the Multi-Layer Perceptron (MLP) and deeper neural networks (DNNs) should be explored, as they may capture non-linear relationships better than traditional models [44].
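As a sketch of the last direction, an MLP regressor can be swapped in for the tree ensembles with little code change. This is a hedged illustration on synthetic data, not a result of the paper; note that, unlike tree models, neural networks need feature scaling.

```python
# Hedged sketch: an MLP regressor as a drop-in alternative to the tree
# ensembles for GVI regression, on synthetic stand-in features.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.random((1000, 3))                            # B8, NDVI, FVC stand-ins
y = np.clip(0.6 * X[:, 2] + 0.05 * rng.normal(size=1000), 0, 1)

# Scaling matters for neural networks, unlike for tree-based models.
mlp = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
)
mlp.fit(X, y)
print(f"train R^2 = {mlp.score(X, y):.3f}")
```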

6. Conclusions

We estimated the green vision perception level in the non-street areas within Beijing's Third Ring Road by combining Baidu Map street view images, a correlation analysis between the street-view-based GVI and the Sentinel-2 B2, B3, B4, B8, NDVI, and FVC variables, and machine learning classification and regression prediction models. The correlation analysis demonstrated that the Sentinel-2 variables aggregated with a 25 m buffering radius correlated most closely with the street-view GVI, compared with the 45 m radius and the dynamic radius that varies with street width. Among the six Sentinel-2 variables, the FVC and NDVI showed strong positive correlations with the GVI, with coefficients of 0.752 and 0.750, respectively. Sentinel-2 B2, B3, and B4 correlated negatively with the GVI, with B3 showing the strongest negative correlation (−0.514). Among the five machine learning regression models for predicting the GVI, the random forest regression model performed best, with RMSE, MAE, and R2 values of 0.063, 0.045, and 0.787, respectively. However, the classification strategy outperformed the regression strategy in estimating green vision perception levels, improving the accuracy (ACC) by 5.5 percentage points. The spatial distribution of green vision perception levels in the non-street areas within Beijing's Third Ring Road showed that the level within the Second Ring Road was lower than in the area between the Second and Third Ring roads. Furthermore, 56.8% of the study area fell within the poor green perception range of 0 to 0.15, and the proportions of moderate and good green perception levels were similar, at 22.6% and 20.6%, respectively.

Author Contributions

Conceptualization, H.W. and X.C.; methodology, H.W. and X.C.; validation, H.W.; formal analysis, H.W. and X.C.; resources, H.W. and X.C.; data curation, H.W.; writing—original draft preparation, H.W. and X.Y.; writing—review and editing, H.W., X.Y. and X.C.; visualization, H.W., X.Y. and X.C.; supervision, X.C.; project administration, X.C.; funding acquisition, X.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key Research and Development Program of China (Grant No. 2022YFB3904202) and the Basic Scientific Research Operating Expenses of the Chinese Academy of Surveying and Mapping (No. AR2416).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

We would like to formally acknowledge the valuable contributions of Chi Jiang in the areas of data visualization and manuscript revision in response to the reviewers’ comments.

Conflicts of Interest

Author Hongyan Wang is employed by SpaceTellan Aerospace Spatiotemporal Information Technology (Chongqing) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. Zhang, T.; Wang, L.; Hu, Y.; Zhang, W.; Liu, Y. Measuring Urban Green Space Exposure Based on Street View Images and Machine Learning. Forests 2024, 15, 655. [Google Scholar] [CrossRef]
  2. Sousa-Silva, R.; Zanocco, C. Assessing public attitudes towards urban green spaces as a heat adaptation strategy: Insights from Germany. Landsc. Urban Plan. 2024, 245, 105013. [Google Scholar] [CrossRef]
  3. Li, J.; Gao, J.; Zhang, Z.; Fu, J.; Shao, G.; Zhao, Z.; Yang, P. Insights into citizens’ experiences of cultural ecosystem services in urban green spaces based on social media analytics. Landsc. Urban Plan. 2024, 244, 104999. [Google Scholar] [CrossRef]
  4. Qi, L.; Li, J.; Wang, Y.; Gao, X. Urban observation: Integration of remote sensing and social media data. Appl. Earth Obs. Remote Sens. 2019, 12, 4252–4264. [Google Scholar] [CrossRef]
  5. Hwang, J.; Dahir, N.; Sarukkai, M.; Wright, G. Curating Training Data for Reliable Large-Scale Visual Data Analysis: Lessons from Identifying Trash in Street View Imagery. Sociol. Methods Res. 2023, 52, 1155–1200. [Google Scholar] [CrossRef]
  6. Dang, H.; Li, J. The integration of urban streetscapes provides the possibility to fully quantify the ecological landscape of urban green spaces: A case study of Xi’an city. Ecol. Indic. 2021, 133, 108388. [Google Scholar] [CrossRef]
  7. Nouri, H.; Nagler, P.; Chavoshi Borujeni, S.; Barreto Munez, A.; Alaghmand, S.; Noori, B.; Galindo, A.; Didan, K. Effect of spatial resolution of satellite images on estimating the greenness and evapotranspiration of urban green spaces. Hydrol. Process. 2020, 34, 3183–3199. [Google Scholar] [CrossRef]
  8. Huang, S.; Tang, L.; Hupy, J.P.; Wang, Y.; Shao, G. A commentary review on the use of normalized difference vegetation index (NDVI) in the era of popular remote sensing. For. Res. 2021, 32, 1–6. [Google Scholar]
  9. Moser, D.; Zechmeister, H.G.; Plutzar, C.; Sauberer, N.; Wrbka, T.; Grabherr, G. Landscape patch shape complexity as an effective measure for plant species richness in rural landscapes. Landsc. Ecol. 2002, 17, 657–669. [Google Scholar] [CrossRef]
  10. Zhao, H.; Zhu, T.; Wang, S.; Lindley, S. Study on the Changes of Urban Green Space with Remote Sensing Data: A Comparison of Nanjing and Greater Manchester. Environ. Stud. 2022, 31, 461–474. [Google Scholar] [CrossRef]
  11. Li, F.; Xie, S.; Li, X. Spatiotemporal Evolution of Urban Green Space in Beijing City Center Based on Multi-Source Data (1992–2016). Landsc. Archit. 2018, 25, 46–51. [Google Scholar] [CrossRef]
  12. Xi, X.; Wei, Y.; Li, M. The Method of Measurement and Applications of Visible Green Index in Japan. Urban Plan. Int. 2018, 33, 98–103. [Google Scholar]
  13. Osamu, K.; Shunsuke, T. Encyclopedia of the Science of Thought; Keishosha: Takeo, Japan, 1969. Available online: https://ndlsearch.ndl.go.jp/books/R100000039-I12404497 (accessed on 19 June 2025).
  14. Motohiro, I. Considerations on Urban Issues. Road Constr. 1974, 320, 13–16. [Google Scholar]
  15. Huang, G.; Yu, Y.; Lyu, M.; Sun, D.; Zeng, Q.; Bart, D. Using google street view panoramas to investigate the influence of urban coastal street environment on visual walkability. Environ. Res. Commun. 2023, 5, 065017. [Google Scholar] [CrossRef]
  16. Rita, L.; Nathvani, R.; Peliteiro, M.; Bostan, T.C.; Muller, E.; Suel, E.; Metzler, A.; Tamagusko, T.; Ferreira, A. Using Deep Learning and Google Street View Imagery to Assess and Improve Cyclist Safety in London. Sustainability 2023, 15, 10270. [Google Scholar] [CrossRef]
  17. Aoki, Y. Relationship Between Visual Field Expansion and Perception of Green Volume. J. Jpn. Inst. Landsc. Archit. 1987, 51, 1–10. [Google Scholar] [CrossRef]
  18. Villeneuve, P.J.; Ysseldyk, R.L.; Root, A.; Ambrose, S.; DiMuzio, J.; Kumar, N.; Shehata, M.; Xi, M.; Seed, E.; Li, X. Comparing the Normalized Difference Vegetation Index with the Google Street View Measure of Vegetation to Assess Associations between Greenness, Walkability, Recreational Physical Activity, and Health in Ottawa, Canada. Int. J. Environ. Res. Public Health 2018, 15, 1719. [Google Scholar] [CrossRef] [PubMed]
  19. Li, X.; Zhang, C.; Li, W.; Ricard, R.; Meng, Q.; Zhang, W. Assessing street-level urban greenery using Google Street View and a modified green view index. Urban For. Urban Green. 2015, 14, 675–685. [Google Scholar] [CrossRef]
  20. Jie, Z.; Shan, L.; Zhiyuan, Z. Street Greening Space in Hefei’s Old Urban Area: A Multi-Source Data Approach. J. Beijing Univ. Civ. Eng. Archit. 2023, 39, 46–55. [Google Scholar]
  21. Ye, Y.; Richards, D.; Lu, Y.; Song, X.; Zhuang, Y.; Zeng, W.; Zhong, T. Measuring daily accessed street greenery: A human-scale approach for informing better urban planning practices. Landsc. Urban Plan. 2019, 191, 103434. [Google Scholar] [CrossRef]
  22. Feng, S.; Wei, Y.; Wang, Z.; Yu, X. Pedestrian-view urban street vegetation monitoring using Baidu Street View Images. Chin. J. Plant Ecol. 2020, 44, 205–213. [Google Scholar] [CrossRef]
  23. Guo, J.; Ma, Z.; Bian, J.; Jiang, C. Analysis on the Influence of Environmental Characteristics of Xiamen Island on Housing Price based on Street View Imagery. J. Geo-Inf. Sci. 2022, 24, 2128–2140. [Google Scholar]
  24. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018. [Google Scholar]
  25. Hu, G.; Yang, C.; Xu, L.; Shang, H.; Wang, Z.; Qin, Z. Improved U-Net remote sensing image semantic segmentation method. Acta Geod. Et Cartogr. Sin. 2023, 52, 980–989. [Google Scholar]
  26. Li, M.; Yang, Z.; Xue, F. Urban Street Greenery Quality Measurement, Planning and Design Promotion Strategies Based on Multi-Source Data: A Case Study of Fuzhou’s Main Urban Area. Landsc. Archit. 2021, 28, 62–68. [Google Scholar]
  27. Tong, M.; She, J.; Tan, J.; Li, M.; Ge, R.; Gao, Y. Evaluating Street Greenery by Multiple Indicators Using Street-Level Imagery and Satellite Images: A Case Study in Nanjing, China. Forests 2020, 11, 1347. [Google Scholar] [CrossRef]
  28. Jing, X.; Li, Z.; Chen, H.; Zhang, C. “Is What We See Always Real?” A Comparative Study of Two-Dimensional and Three-Dimensional Urban Green Spaces: The Case of Shenzhen’s Central District. Forests 2024, 15, 983. [Google Scholar] [CrossRef]
  29. de Andrade, R.B.; Mota, G.L.A.; da Costa, G.A.O.P. Deforestation Detection in the Amazon Using DeepLabv3+ Semantic Segmentation Model Variants. Remote Sens. 2022, 14, 4694. [Google Scholar] [CrossRef]
  30. Wang, H.; Che, X.; Xu, X.; Xu, S.; Li, H. Green Visible Index Extraction and Analysis of Street View Image using DeepLabv3+ Model: Taking within the Third Ring Road in Beijing as an Example. Bull. Surv. Mapp. 2024, 3, 88–94. [Google Scholar] [CrossRef]
  31. Zakaria, S.; Hafez, M.S.; Gad, A.M. A Latent Class Model for Multivariate Binary Data Subject to Missingness. Int. J. Adv. Sci. Eng. Inf. Technol. 2021, 11, 1832–1840. [Google Scholar]
  32. Shapiro, S.S.; Wilk, M.B. An analysis of variance test for normality (complete samples). Biometrika 1965, 52, 591–611. [Google Scholar] [CrossRef]
  33. Mukaka, M. A guide to appropriate use of Correlation coefficient in medical research. Malawi Med. J. 2012, 24, 69–71. [Google Scholar] [PubMed]
  34. Li, Y.; Liu, X.; Han, Z.; Dou, J. Spatial Proximity-Based Geographically Weighted Regression Model for Landslide Susceptibility Assessment: A Case Study of Qingchuan Area, China. Appl. Sci. 2020, 10, 1107. [Google Scholar] [CrossRef]
  35. Recommendations for Environmental Perception Research—Aiming for a Favorable Environment, Japan, (2007-07-25). 6 August 2015. Available online: https://www.nies.go.jp/kanko/kankyogi/25/25.pdf (accessed on 19 June 2025).
  36. Zhu, J.; Huang, H.; Jin, Z.; Lv, J. Assessing Sensor Reliability Using Confusion Matrix Based on the Belief Function Theory. J. Res. Sci. Eng. 2023, 5. [Google Scholar] [CrossRef]
  37. Su, L.; Chen, W.; Li, J.; Zhou, Y.; Fan, L. Analysis and Optimization of Urban Street Landscape Based on GVI and NDVI. J. Northwest For. Univ. 2024, 39, 256–264. [Google Scholar]
  38. Liu, H.; Ren, H.; Niu, X.; Xia, P. Sentinel-2-Based Algal Bloom Extraction in Chaohu Lake. Ecol. Environ. Sci. 2021, 30, 146–155. [Google Scholar] [CrossRef]
  39. Tang, J.; Wang, X.; Zhou, H.; Ji, J.; Li, Z.; Ye, Z. Multiimage-distance imaging system for extending depth-of-field. Optik 2023, 286, 170965. [Google Scholar] [CrossRef]
  40. Liang, J. Analysis of Green Vision Rate in Beijing Five Rings Based on Street View Image. Master’s Thesis, Beijing Forestry University, Beijing, China, 2019. [Google Scholar] [CrossRef]
  41. Li, Y.; Huang, J.; Liang, J.; Zhang, Y.; Chen, Y. Research on Visual Attraction and Influencing Factors of Perception of Commercial Street Space in Cultural Heritage Site: Taking Gulangyu Longtou Road as an Example. J. Hum. Settl. West China 2022, 37, 114–121. [Google Scholar] [CrossRef]
  42. Jiang, F.; Tang, L.; Lin, D.; Chen, X.; Feng, X.; Chen, C. Green View Index Estimation Method based on Three-dimensional Simulation of Urban Tree Landscape. J. Geo-Inf. Sci. 2021, 23, 2151–2162. [Google Scholar]
  43. Peng, W.; Wang, G.J.; Zhou, J.M.; Xu, X.; Luo, H.; Zhao, J.; Yang, C. Dynamic monitoring of fractional vegetation cover along Minjiang River from Wenchuan County to Dujiangyan City using multi-temporal landsat 5 and 8 images. Acta Ecol. Sin. 2016, 36, 1975–1988. [Google Scholar] [CrossRef]
  44. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
Figure 1. Beijing's third ring road inner area.
Figure 2. Data acquisition flow chart.
Figure 3. A comparative visualization of street panorama imagery pre- and post-preprocessing.
Figure 4. Temporal distribution of street view images.
Figure 5. The false-color composite image of the study area.
Figure 6. Three different circle radii buffering strategies.
Figure 7. Accuracy comparison of five regression prediction models using the training sample set.
Figure 8. Spatial distribution of regional GVI at the plane scale.
Figure 9. The proportion and distribution of a green vision perception level in a non-street area.
Figure 10. Comparisons of planting distribution in cold and hot areas.
Figure 11. Spatial distribution patterns of communities versus prediction results within Beijing's Third Ring Road.
Figure 12. Spatial perceptual correspondence between street view and remote sensing imagery under different perspectives.
Figure 13. Comparison of the spatial distribution of 25 m GVI and FVC.
Table 1. p-value of S-W test for each variable from Sentinel-2 data using different buffering.

Radius/Index | B2 | B3 | B4 | B5 | NDVI | FVC
25 m | 0.965 | 0.977 | 0.945 | 0.986 | 0.967 | 0.964
45 m | 0.978 | 0.986 | 0.970 | 0.990 | 0.987 | 0.985
Dynamic radius | 0.976 | 0.982 | 0.956 | 0.990 | 0.976 | 0.974
Table 2. Correlation coefficients between the GVI and Sentinel-2 variables under different buffer radii.

| Buffer Radius | B2 | B3 | B4 | B8 | NDVI | FVC | Average Absolute Coefficient |
|---|---|---|---|---|---|---|---|
| 25 m | −0.473 | −0.514 | −0.320 | 0.553 | 0.750 | 0.752 | 0.560 |
| 45 m | −0.314 | −0.335 | −0.226 | 0.464 | 0.601 | 0.602 | 0.424 |
| Dynamic radius | −0.473 | −0.504 | −0.330 | 0.545 | 0.722 | 0.724 | 0.549 |
Note: The NDVI confidence interval was set to 5–98% based on the actual distribution of NDVI values in the imagery. The FVC is calculated using Formula (3), where NDVI_b (bare-soil NDVI) corresponds to the 5th percentile of the NDVI within the confidence interval, and NDVI_v (pure-vegetation NDVI) corresponds to the 98th percentile of the NDVI within the confidence interval.
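The FVC computation described in the note above can be sketched as follows. This is a minimal illustration of the dimidiate pixel model: NDVI_b and NDVI_v are estimated from the 5th and 98th percentiles of the NDVI image, as in the note; the function name and percentile parameters are illustrative, not from the paper's code.

```python
import numpy as np

def fvc_from_ndvi(ndvi, low_pct=5, high_pct=98):
    """Fractional Vegetation Cover from NDVI via the dimidiate pixel model.

    NDVI_b (bare-soil NDVI) is taken as the low_pct percentile and
    NDVI_v (pure-vegetation NDVI) as the high_pct percentile of the
    NDVI image, matching the 5-98% confidence interval in the note.
    """
    ndvi = np.asarray(ndvi, dtype=float)
    ndvi_b = np.percentile(ndvi, low_pct)    # bare-soil endmember
    ndvi_v = np.percentile(ndvi, high_pct)   # pure-vegetation endmember
    fvc = (ndvi - ndvi_b) / (ndvi_v - ndvi_b)
    return np.clip(fvc, 0.0, 1.0)            # bound FVC to [0, 1]
```

Pixels below the bare-soil percentile clip to 0 and pixels above the vegetation percentile clip to 1, which is why FVC spans the full [0, 1] range regardless of the raw NDVI extremes.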
Table 3. Parameters of each model.

| Algorithm Model | Parameter Name | Value |
|---|---|---|
| KNeighbors Regressor | n_neighbors, leaf_size | 10, 30 |
| Linear Regression | Algorithm | Gradient-Descent |
| Decision Tree Regression | criterion, max_depth, max_leaf_nodes | friedman_mse, 20, 100 |
| Random Forest Regression | criterion, max_depth, max_leaf_nodes, n_estimators | friedman_mse, 20, 173, 80 |
| Extra Trees Regressor | criterion, max_depth, max_leaf_nodes, n_estimators | friedman_mse, 20, 173, 80 |
| Decision Tree Classification | criterion, max_depth, max_leaf_nodes | friedman_mse, 20, 173 |
| Random Forest Classification | max_depth, max_leaf_nodes, n_estimators | 20, 100, 100 |
| Extra Trees Classification | max_depth, max_leaf_nodes | 20, 173 |
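Assuming the parameter names in Table 3 follow scikit-learn conventions, the regression models could be instantiated as in the sketch below. The synthetic training data (six predictors standing in for B2, B3, B4, B8, NDVI, and FVC) is purely illustrative, not the paper's sample set; the sketch covers the tree-ensemble regressors and the KNN regressor, since `friedman_mse` is a regression-only criterion in scikit-learn and the classifiers would need `gini` or `entropy` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, ExtraTreesRegressor
from sklearn.neighbors import KNeighborsRegressor

# Hyperparameters as listed in Table 3 (scikit-learn naming assumed).
models = {
    "KNeighbors": KNeighborsRegressor(n_neighbors=10, leaf_size=30),
    "RandomForest": RandomForestRegressor(
        criterion="friedman_mse", max_depth=20,
        max_leaf_nodes=173, n_estimators=80, random_state=0),
    "ExtraTrees": ExtraTreesRegressor(
        criterion="friedman_mse", max_depth=20,
        max_leaf_nodes=173, n_estimators=80, random_state=0),
}

# Toy fit: predict GVI from six Sentinel-2 predictors (synthetic data).
rng = np.random.default_rng(0)
X = rng.random((200, 6))                    # B2, B3, B4, B8, NDVI, FVC
y = 0.7 * X[:, 4] + 0.1 * rng.random(200)   # GVI driven mainly by NDVI
for name, model in models.items():
    model.fit(X, y)
```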
Table 4. Accuracy evaluation index scores on the training sample sets of different prediction models.

| Model Type | Model | P | R | F1 |
|---|---|---|---|---|
| Classification prediction model | Decision Tree | 0.717 | 0.724 | 0.717 |
| Classification prediction model | Random Forest | 0.718 | 0.723 | 0.711 |
| Classification prediction model | Extra Trees | 0.737 | 0.745 | 0.735 |
Table 5. Accuracy evaluation index scores on the test sets of different prediction models.

| Model Type | Model | P | R | F1 | Test Set Accuracy |
|---|---|---|---|---|---|
| Regression prediction model | Decision Tree | 0.543 | 0.527 | 0.535 | 0.565 |
| Regression prediction model | Random Forest | 0.554 | 0.549 | 0.552 | 0.597 |
| Regression prediction model | Extra Trees | 0.569 | 0.558 | 0.564 | 0.603 |
| Classification prediction model | Decision Tree | 0.518 | 0.521 | 0.520 | 0.607 |
| Classification prediction model | Random Forest | 0.539 | 0.551 | 0.545 | 0.644 |
| Classification prediction model | Extra Trees | 0.561 | 0.569 | 0.565 | 0.652 |
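The P, R, F1, and accuracy columns in Tables 4 and 5 can be reproduced with standard scikit-learn metrics. The toy labels below are hypothetical five-level perception classes, and macro-averaging is assumed here since the paper does not state which averaging scheme was used for the multi-class scores.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Hypothetical true vs. predicted perception levels (5 ordinal classes).
y_true = np.array([0, 1, 2, 3, 4, 2, 1, 0, 3, 4, 2, 1])
y_pred = np.array([0, 1, 2, 3, 3, 2, 1, 1, 3, 4, 2, 2])

p = precision_score(y_true, y_pred, average="macro", zero_division=0)
r = recall_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
acc = accuracy_score(y_true, y_pred)   # fraction of exact matches
```

With micro-averaging, P, R, and F1 would all collapse to the accuracy value for single-label multi-class data, so macro- (or weighted-) averaging is the natural reading when the three columns differ, as they do in Tables 4 and 5.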
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Wang, H.; Che, X.; Yang, X. Investigating Green View Perception in Non-Street Areas by Combining Baidu Street View and Sentinel-2 Images. Sustainability 2025, 17, 7485. https://doi.org/10.3390/su17167485