1. Introduction
Digital footprints are an inevitable part of participating in today’s digitally connected world [1, 2, 3]. People leave digital footprints, both intentionally and unintentionally, through interactions with digital technologies and online platforms [4, 5, 6, 7, 8]. For example, individuals may unknowingly leave digital footprints when visiting a website via browser cookies or intentionally share their experiences by commenting on and rating a place on Google. Such footprints do not merely reflect an individual’s online activity; they also mirror broader interactions with the physical environment and local businesses. Urban dynamics and consumer behaviors are intricately connected; therefore, analyzing digital footprints provides an opportunity to optimize urban design and business operations, making it a pivotal aspect of sustainable urban development. Consequently, understanding how these digital traces interweave with spatial characteristics allows us to discern emerging patterns that influence both consumer decisions and urban vitality.
By focusing on the relationship between spatial factors, such as the location of a venue on a particular street, and digital footprints, this research contributes to a more comprehensive understanding of user behavior and venue attractiveness in urban spaces.
Specifically, we focus on three research objectives:
Centrality Impact: Determine to what extent the centrality of a specific location (in particular, the street on which an experience occurs) affects whether an experience is rated positively or negatively.
Pedestrian Flow Effect: Investigate how dynamic spatial data, including sensor-derived pedestrian counts, affects the volume and sentiment of online reviews.
Role of Spatial Attributes: Explore the contribution of various spatial factors in shaping digital consumer behavior.
Building on these objectives, we pose a question arising from this relationship between the spatial context and digital footprints: does the inherent quality of an experience alone dictate its ratings, or do contextual factors, such as location centrality and pedestrian flow, also significantly shape consumer feedback? To address this, the following research questions guide our inquiry: first, does the centrality of a specific location (e.g., the street where the experience occurred) influence whether it is rated positively or negatively? Second, does the number of people passing at the point of experience affect its ratings and reviews? Investigating these questions lies at the core of understanding digital footprints and their implications for urban design and visitor behavior.
To address these research questions, this study analyzes various factors that influence the digital footprints left by consumers. Specifically, we focus on the effect of sensor-derived pedestrian counts, the spatial distribution of stable city dwellings, the centrality metrics of streets in both primal and dual graph approaches, street sinuosity and functionality, the distance between places and the nearest street, the size of the neighboring cell of places, and place type on consumer reviews and ratings. Our research is centered on Melbourne’s city center, using Google’s POI data to study eating and drinking out (EDO) places and the associated digital footprints. To predict the review counts and ratings of places, categorized as low and high, we employed the random forest machine learning technique. Key features such as pedestrian count, city dwelling distribution, street type, sinuosity and venue proximity were influential in predicting both review volumes and ratings. The centrality metrics improved rating prediction accuracy but negatively affected review volume predictions.
The remainder of this paper is organized as follows:
Section 2 reviews the background literature on spatial context and digital footprints.
Section 3 describes the study area and details all of the data processes, including data integration, feature generation and the prediction model.
Section 4 presents the results, examining them from multiple perspectives.
Section 5 discusses the findings, and
Section 6 concludes the paper with future directions.
2. Background of the Study
This section examines the pivotal studies on the relationship between spatial context and the formation of digital footprints, thereby framing the scope of our research. Muhammad et al. [
9,
10] systematically reviewed the factors influencing customers’ willingness to leave big data digital footprints on social media platforms. They emphasized that frameworks like the technology acceptance model, theory of planned behavior and unified theory of acceptance and use of technology are widely used to explain user engagement with technology. These models consider factors such as the ease of use, usefulness and trust, which shape users’ behaviors and decision-making processes. The research also underlined the role of privacy and security issues in encouraging or discouraging users from sharing their digital footprints.
The relationship between Google ratings and spatial metrics was investigated by Hacar et al. [
11], demonstrating how similar spatial characteristics of different locations influence people online. Numerous studies have found that such relationships significantly impact spatial online behavior [
12,
13,
14]. The influence of online reviews and ratings on user behavior has emerged as a key area of research, particularly with the growing use of platforms such as Google Maps [
15]. The Google Maps API serves as a valuable resource, providing a vast database of geolocated reviews and ratings that reflect visitor impressions of specific points of interest (POIs) [
13,
16]. For instance, Li and Hecht [
17] compared restaurant ratings on all platforms and noted discrepancies that can influence local search outcomes and decision-making processes. Akkaya et al. [
18] utilized machine learning to explore how Google Maps reviews can be leveraged as participatory urban planning tools, enabling public involvement in decision-making through user-generated content analysis. The utility of social media for geospatial research was elaborated by Owuor and Hochmair [
19], who reviewed social media applications and highlighted the importance of multiple data sources to minimize biases and improve representativeness. Tang et al. [
20] examined the impact of the built environment on social media followings, finding significant correlations between urban infrastructure, such as pedestrian density, and social media engagement. Healthcare facilities have also been examined within the context of user-generated ratings [
21,
22]. Tse et al. [
23] conducted an analysis of Canadian hospitals using Google ratings, revealing correlations between ratings and hospital attributes, such as teaching status and patient experience surveys. Leiras and Eusébio [
24] used Google Maps reviews to assess accessible tourism destinations, identifying insights into accessibility and user satisfaction. Girardin et al. [
12] and Önder et al. [
25] highlight how georeferenced photos and mobile data reveal tourist movements and behaviors. Gavilan et al. [
26] examined the interaction between numerical ratings and the number of reviews in the decision-making process, focusing on their combined impact on trustworthiness perceptions. Their findings revealed an asymmetric effect, where the reliability of good ratings depended on the number of reviews, while bad ratings remained unaffected by review quantity. Studies using geo-tagged photos from platforms like Flickr demonstrate the potential of digital footprints in analyzing tourism demand [
27,
28,
29,
30,
31]. Similarly, Maldonado-Gil and Psarra [
32] analyzed Instagram images from London to understand social dynamics and spatial characteristics, showcasing the utility of social media data in urban analysis. In terms of managing public use in protected areas, Palacio Buendía et al. [
33] employed Public Participation GIS to understand visitor interactions and improve environmental management through user-generated geospatial data. Umer et al. [
34] proposed a machine learning model to predict app ratings based on user reviews, addressing challenges of biased feedback in user-generated content. Digital footprints also reveal a great deal about cultural and demographic diversity. Rabiei-Dastjerdi et al. [
35] analyzed Google reviews in Dublin to map cultural diversity, demonstrating the potential of user-generated data in cultural studies. The importance of such data in urban research is further emphasized by studies such as that of Bernabeu-Bautista et al. [
36], which highlighted the relationship between economic activity areas and location-based social network data. The ongoing research continues to explore these relationships, offering new insights into the spatial dimensions of visitor experiences [
11,
32,
37]. Moreover, the influence of Google reviews on behavioral patterns has been widely documented in the literature [
38,
39,
40,
41,
42,
43]. However, since these studies have primarily considered individuals as consumers, they focused only on the characteristics of the product itself or the business from which it is obtained.
Digital interactions, such as reviews and ratings on platforms like Google Maps, offer valuable insights into how urban environments influence consumer decisions. As Buchanan et al. [
44] indicated, the understanding and management of digital footprints are critical for both individuals and organizations in Melbourne. While the existing research has extensively explored the effect of product and business attributes on consumer decisions, the spatial context has often been overlooked. For instance, studies focusing on centrality metrics or pedestrian dynamics tend to emphasize static spatial characteristics rather than the dynamic interplay between urban geometry and consumer behavior. Furthermore, the use of Google reviews in prior research has primarily been centered on consumer perspectives, without adequately considering how the spatial configuration of the urban environment can shape digital footprints. This gap underscores the need for more integrative approaches that account for both micro- (e.g., street geometry and venue proximity) and macro-level (e.g., network centrality and pedestrian flow) spatial factors. Although numerous studies have investigated the effects of online reviews and ratings on consumer decisions, most of this research predominantly focuses on product or business attributes (e.g., pricing, quality and brand reputation), with limited consideration for the spatial characteristics of the urban environment. Indeed, the existing literature often treats location merely as a static factor rather than a dynamic urban component that can shape user behaviors, movement patterns, and ultimately, online engagement. By explicitly examining how varying street configurations, pedestrian densities and geometric street features influence review volumes and rating tendencies, this study addresses a critical gap in understanding the spatial drivers of digital consumer behavior. Moreover, most studies examining Google reviews and ratings, as well as user evaluations on social media platforms, leave a significant gap in understanding the influence of the spatial characteristics of the environment on user behavior.
This study aims to address this gap by investigating how the spatial features of urban environments impact user-generated reviews and ratings, thereby examining the influence of these reviews and ratings on behaviors within a spatial context.
This study addresses these limitations by integrating dynamic pedestrian data, detailed geometric street attributes and urban network metrics into a unified model. By explicitly examining the significance of urban spatial characteristics on digital consumer behaviors, this research provides a novel contribution to the field, bridging the gap between urban planning and consumer analytics.
3. Materials and Methods
3.1. Study Area and Data
The study area, Melbourne, is a major metropolitan city known for its diverse array of eating and drinking venues (
Figure 1). For this study, data integration is key to establishing a comprehensive model that associates various urban characteristics and their potential impact on digital consumer behaviors, specifically in terms of reviews and ratings. The datasets used in this research include three distinct but interrelated sources. The first dataset consists of Google POIs for eating and drinking out places, providing semantic attributes such as place type, review count, and rating [
45]. To improve the reliability of analyzing visitor behavior, only venues with a minimum of 10 reviews were selected. This data filter allowed us to focus on consumer-oriented places with sufficient review activity, allowing a more robust exploration of the relationships between place locations and the digital footprints of consumers.
The second dataset, extracted from OSM using the BBBike web service for city extraction [
46], includes road lines to define street networks. This dataset also includes highway tags, which classify roads by type [
47], offering information on how different types of streets can affect consumer behavior and the access to places. Centrality measures, which are critical to the analysis of the importance and connectivity of each street segment, were calculated using the Python NetworkX package (version 3.2.1) [
48].
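As an illustration of how street centralities of this kind can be obtained with NetworkX, the following minimal sketch builds a toy primal graph (intersections as nodes, street segments as edges) and a simplified dual representation (segments as nodes). The 3 × 3 grid, the use of the line graph as the dual representation, the Katz `alpha` value and the helper name `centrality_table` are assumptions made for this example only; they are not taken from the study's actual workflow.

```python
import networkx as nx

# Toy primal graph: nodes are intersections, edges are street segments.
# A 3x3 grid stands in for a real street network (assumption).
primal = nx.grid_2d_graph(3, 3)

# Simplified dual representation (assumption): every street segment becomes a
# node, and two segments are connected when they share an intersection.
dual = nx.line_graph(primal)


def centrality_table(graph):
    """Return {metric name: {node: value}} for several NetworkX centralities."""
    return {
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
        "degree": nx.degree_centrality(graph),
        "eigenvector": nx.eigenvector_centrality(graph, max_iter=1000),
        "harmonic": nx.harmonic_centrality(graph),
        # alpha must stay below 1 / (largest adjacency eigenvalue); 0.05 is
        # safe for these small graphs (illustrative choice).
        "katz": nx.katz_centrality(graph, alpha=0.05, max_iter=1000),
        "load": nx.load_centrality(graph),
        # PageRank can be added with nx.pagerank(graph); it is left out here
        # only to keep the sketch free of extra dependencies.
    }


primal_scores = centrality_table(primal)
dual_scores = centrality_table(dual)

# Intersection with the highest betweenness in the primal graph.
most_central = max(primal_scores["betweenness"], key=primal_scores["betweenness"].get)
print("Most central intersection:", most_central)
print("Segments in dual graph:", dual.number_of_nodes())
```

In the primal case the scores belong to intersections, whereas in the dual case they belong directly to street segments, which is what makes the two feature sets complementary.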
The third dataset, sourced from Melbourne Open Data, includes pedestrian count data and information on residential dwelling [
49,
50,
51]. Melbourne is uniquely positioned for this type of research due to its advanced pedestrian sensor network, which has been collecting hourly pedestrian count data since 2009. Additionally, the city provides open access to a wealth of high-resolution datasets, including pedestrian dynamics and other sensor-based urban metrics. This extensive data availability enables a detailed and comprehensive exploration of pedestrian behaviors and urban characteristics. Pedestrian count data, which are collected by sensors, reflect the intensity of movement throughout the urban landscape, while residential dwelling data provide insights into the spatial distribution of stable populations. To prepare these data for analysis, we used an Inverse Distance Weighting (IDW) interpolation with a power of 2 within a QGIS (version 3.28) environment to estimate the continuous surfaces of pedestrian and residential densities [
52].
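The principle behind IDW with a power of 2 can be sketched in a few lines: each estimate is a weighted mean of the sensor values, with weights equal to the inverse squared distance to each sensor. The function below is a simplified NumPy illustration and not the QGIS implementation; the function name `idw`, the sensor coordinates and the counts are hypothetical.

```python
import numpy as np


def idw(sample_xy, sample_values, query_xy, power=2.0):
    """Inverse Distance Weighting estimate at each query location."""
    sample_xy = np.asarray(sample_xy, dtype=float)
    sample_values = np.asarray(sample_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances with shape (n_queries, n_samples).
    dist = np.linalg.norm(query_xy[:, None, :] - sample_xy[None, :, :], axis=2)

    estimates = np.empty(len(query_xy))
    for i, row in enumerate(dist):
        exact = row == 0
        if exact.any():
            # A query that coincides with a sensor takes the sensor value.
            estimates[i] = sample_values[exact].mean()
        else:
            weights = 1.0 / row ** power
            estimates[i] = np.sum(weights * sample_values) / np.sum(weights)
    return estimates


# Two hypothetical sensors with mean hourly pedestrian counts.
sensors = [(0.0, 0.0), (2.0, 0.0)]
counts = [100.0, 300.0]
queries = [(1.0, 0.0), (0.5, 0.0), (0.0, 0.0)]

print(idw(sensors, counts, queries))  # approximately [200. 120. 100.]
```

The midpoint receives the plain average, while locations closer to one sensor are pulled strongly toward its value because the weights fall off with the square of the distance.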
3.2. Data Integration, Feature Generation, and Prediction Model
In this section, we present the process of integrating various datasets and generating features that describe the spatial, demographic and network-related characteristics relevant to consumer behavior in urban dining settings (
Figure 2). The primary goal is to predict two dependent variables: review counts (categorized as high or low) and ratings (high score or low score), both divided into binary classes based on quantiles. We used a random forest classifier to predict the high and low categories; this model effectively manages diverse feature sets (FSs) and can exploit the interconnections between spatial, demographic and network properties.
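A quantile-based split into binary classes can be sketched as follows. The text does not state which quantile was used as the cut-off, so the median (q = 0.5) is an assumption here, as are the function name `binarize_by_quantile` and the review counts.

```python
import numpy as np


def binarize_by_quantile(values, q=0.5):
    """Label values above the q-th quantile as 1 (high) and the rest as 0 (low)."""
    values = np.asarray(values, dtype=float)
    threshold = np.quantile(values, q)
    labels = (values > threshold).astype(int)
    return labels, threshold


# Hypothetical review counts for six venues.
review_counts = [12, 40, 95, 310, 1500, 27]
labels, threshold = binarize_by_quantile(review_counts)

print(threshold)  # 67.5 (median of the six counts)
print(labels)     # [0 0 1 1 1 0]
```

A median split yields balanced classes, which keeps the weighted F1-score easy to interpret; another quantile would trade this balance for a stricter definition of "high".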
The features are organized into six sets, each representing a distinct aspect of the urban environment and potential consumer interaction.
FS 1 (pedestrian count): This set consists of interpolated data for the annual (1) mean pedestrian counts per hour, (2) peak pedestrian count during the most crowded hour (17:00) in 2022 and 2023, and (3) number of residential dwellings in 2023. All values were processed with Inverse Distance Weighting (IDW), allowing a detailed spatial representation of pedestrian and resident density, which may impact both the review volume and ratings. To integrate these values with POI data, we created Voronoi polygons based on the POIs, defining spatial catchments for each place (
Figure 3). We then calculated the mean values of the pedestrian and residential dwelling data within each Voronoi cell, creating features that represent the localized movement and population density around each place.
FS 2 (distance to and shape of the street): This set includes (1) the Euclidean distance from each POI to its closest street and (2) the sinuosity of the closest street. These geometric measures describe the physical relationship between places and their surroundings, which may influence how consumers perceive and interact with these locations.
FSs 3 and 4 (centrality measures): Both sets focus on centrality metrics, calculated with primal and dual approaches to model the impact of the street network structure on the prominence of the place. In both the primal (FS 3) and dual (FS 4) approaches, we use eight metrics: betweenness (B), closeness (C), degree (D), eigenvector (E), harmonic (H), Katz (K), load (L) and PageRank (P) centralities, all of which assess the connectivity within the network. These metrics help describe the extent of the integration or separation of each place within the urban grid.
FS 5 (road functional class): This set is derived from OpenStreetMap highway tags, which categorize streets based on their functional hierarchy, from motorway roads to minor streets. The classification helps to distinguish the influence of different road types on consumer movement patterns around each place.
FS 6 (place-based features): This set includes the area of each Google POI’s Voronoi cell, defining its spatial reach and the type of place (e.g., bar, café, or restaurant).
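The two geometric measures in FS 2 can be illustrated with a short, dependency-free sketch. Sinuosity is taken here as the ratio of a street's path length to the straight-line distance between its endpoints, a common definition that the text does not spell out and is therefore an assumption. Coordinates are assumed to be in a projected (metric) system, and all function names and sample geometries are hypothetical.

```python
import math


def polyline_length(coords):
    """Total length of a polyline given as a list of (x, y) vertices."""
    return sum(math.dist(a, b) for a, b in zip(coords, coords[1:]))


def sinuosity(coords):
    """Path length divided by the straight-line distance between endpoints."""
    straight = math.dist(coords[0], coords[-1])
    if straight == 0:
        return math.inf  # closed loop: the ratio is undefined
    return polyline_length(coords) / straight


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.dist(p, a)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def distance_to_street(p, coords):
    """Shortest distance from point p to any segment of a street polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(coords, coords[1:]))


straight_street = [(0.0, 0.0), (10.0, 0.0)]
bent_street = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
poi = (5.0, 3.0)

print(sinuosity(straight_street))                    # 1.0
print(round(sinuosity(bent_street), 3))              # 1.667
print(distance_to_street(poi, straight_street))      # 3.0
```

A sinuosity of 1 corresponds to a perfectly straight street, and larger values indicate more winding geometry.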
Using the defined feature sets, a random forest classifier was trained. The random_state parameter of our prediction model is explicitly set to 42 to ensure reproducibility. The rest of the key hyperparameters are as follows: 100 trees (n_estimators = 100), the Gini impurity criterion and the square root of the total features considered at each split. The data, which consist of 1582 POIs, are split into training (1265) and testing (317) datasets using an 80–20% split, with the random seed set to 42 for consistent partitioning [
53].
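The configuration described above can be reproduced in scikit-learn roughly as follows. This is a sketch on synthetic data: the feature matrix, the labels and the number of features are placeholders, not the study's data, and whether the original split was stratified is not stated, so no stratification is applied here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the 1582 POIs (placeholder features and labels).
rng = np.random.default_rng(42)
n_pois, n_features = 1582, 6
X = rng.normal(size=(n_pois, n_features))
y = (X[:, 0] + 0.5 * rng.normal(size=n_pois) > 0).astype(int)

# 80-20% split with a fixed seed for consistent partitioning.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters named in the text: 100 trees, Gini impurity,
# sqrt(number of features) candidates per split, random_state = 42.
model = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_features="sqrt",
    random_state=42,
)
model.fit(X_train, y_train)

weighted_f1 = f1_score(y_test, model.predict(X_test), average="weighted")
print(len(X_train), len(X_test))  # 1265 317
print(f"Weighted F1 on synthetic data: {weighted_f1:.2f}")
```

With 1582 samples and a test size of 0.2, the split yields 1265 training and 317 testing samples, matching the figures reported above.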
4. Results
The results highlight how the urban configuration, density of residential areas and pedestrian flow pattern influence consumer interaction with Melbourne’s EDO establishments.
Figure 4 provides a detailed visualization of the features.
Figure 4a illustrates the distribution of centrality metrics, including B, C, D, E, H, K, L and P, in the study area. These metrics capture the structural importance of streets within the network, highlighting key routes and intersections that may influence accessibility and consumer behavior. The varying intensities of these metrics suggest differences in the functional and spatial roles of specific streets in the urban configuration.
Figure 4b shows the density of residential dwellings in 2023, with warmer colors indicating areas of higher population concentration. This layer provides the essential context for understanding the spatial distribution of stable populations, which significantly affects foot traffic and potential consumer engagement with nearby venues.
Figure 4c presents two visualizations of pedestrian movement: the average hourly pedestrian count for 2023 and the peak pedestrian count at 17:00 in the same year. These maps use gradient colors to represent the intensity of movement, with red areas indicating higher levels of activity. The data reveal key hotspots of pedestrian activity, reflecting the influence of urban design, connectivity and functional land use.
The graphs in Figure 5 illustrate the weighted F1-scores achieved by applying different combinations of FSs to EDO places in Melbourne. The last three bar graphs in Figure 5 show the performance when the model learns digital footprints only at a specific type of place (that is, bars, cafés or restaurants).
Figure 5a shows that the combination of FSs 1, 2, 5 and 6, which uses none of the centrality measures, achieved the highest F1-score (0.65) for general review volume prediction, while using all of the feature sets (FSs 1–6) gave the lowest score. This shows that integrating the primal and dual centrality features into our prediction model negatively affected review volume prediction. Cafés performed moderately well under the highest-scoring combination, with an F1-score of 0.56. However, the model’s best feature combination was unable to predict the review volumes at bars effectively. The second graph demonstrates that using all FSs 1–6 produced the highest F1-scores for predicting visitors’ scores, particularly for cafés (0.60) and general EDO venues (0.62) (
Figure 5b). Bars and restaurants showed slightly lower predictive performance in both cases, reflecting differences in consumer behaviors and review patterns between place types. In all cases, visitor scores were predicted with F1-scores higher than 0.54.
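For readers less familiar with the metric, the weighted F1-score averages the per-class F1-scores, weighting each class by its share of the true labels. The following dependency-free sketch computes it for a small hypothetical set of high (1) and low (0) labels; the function name and the label vectors are illustrative only.

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    score = 0.0
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        support = tp + fn  # number of true samples in this class
        score += (support / total) * f1
    return score


# Hypothetical labels: 4 "high" venues and 6 "low" venues.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

# Class 1: F1 = 6/9; class 0: F1 = 8/11; weights 0.4 and 0.6.
print(round(weighted_f1(y_true, y_pred), 3))  # 0.703
```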
We visualized the feature importances for the two best FS combinations predicting the volume of reviews (Figure 6a) and visitor scores (Figure 6b). In both models, FS 1, which consists of pedestrian and residential dwelling values, has the highest importance, while FS 5, which represents the functional road classification, has the least. Interestingly, while FS 2 appears more important than FS 6 in predicting review volume, the opposite holds for score prediction.
In Figure 6b, FS 3, which represents primal centrality metrics, has substantial importance, nearly matching the contribution of FS 1 (pedestrian and residential density) in predicting visitors’ scores.
Centrality metrics reveal a detailed impact on consumer behavior. Primal centrality metrics, such as betweenness and closeness, highlight the importance of well-connected urban nodes around the case places. These findings suggest that venues located in central, highly connected areas benefit from increased accessibility, which contributes to the distinction of consumer satisfaction levels. This finding underscores the dual role of local activity and broader network integration in shaping visitor perceptions of venues. Such metrics provide insights into how well-connected a place is within the street network, potentially enhancing its visibility and ease of access.
5. Discussion
The results of this study reveal a critical distinction between predicting review volumes and visitors’ scores when analyzing different types of POIs. Specifically, the variability in prediction accuracy is more pronounced for review volume when considering individual POI types compared to modeling all types together. This finding suggests that, for the purpose of classifying review capacity, it is more effective to incorporate a diverse range of POI types that serve similar consumer needs into a unified model, rather than analyzing them separately.
Review volumes appear to be influenced by shared characteristics across POI categories, such as visibility, accessibility and pedestrian movement, which transcend the distinctions between specific types of venues (e.g., cafés, bars or restaurants). By combining all POI types in a single model, these commonalities are better captured, enhancing the model’s ability to generalize and predict review volumes. In contrast, when POI types are modeled separately, the model’s performance may be limited by the reduced variability in the data, which restricts its ability to account for broader patterns. The estimations for both review volume and visitor score in café venues showed the highest performance, suggesting that consumer behaviors in these settings may be more predictable. These findings reveal that cafés consistently outperform other POI types in predictive performance. This can be attributed to their frequent use in high-density, central locations, which strongly correlate with the model’s key features. For review volume, restaurants demonstrated a higher prediction score than bars, possibly due to more consistent review patterns. In contrast, visitor score predictions were more accurate for bars than for restaurants, indicating that different factors can influence consumer ratings in these types of venues.
Figure 6 highlights that pedestrian and residential density, grouped under FS 1, are the most influential features in predicting both review volumes and visitor scores. The dominance of FS 1 in predicting both review volumes and visitor scores can be attributed to its role as a proxy for foot traffic and local consumer base stability. High pedestrian activity reflects increased consumer access, which naturally enhances review volumes. Similarly, residential density ensures a consistent local customer base that contributes to frequent interactions and reviews. These results align with urban density theories, which emphasize that concentrated urban activity drives economic and social engagement [
54]. This dominance can be explained by theories of urban density and accessibility, which posit that areas with higher population densities naturally attract greater consumer interaction due to the availability of potential customers within a close proximity [
55]. Pedestrian density reflects the movement and activity intensity in urban spaces, making it a direct proxy for visibility and foot traffic—two critical factors influencing venue success. Similarly, residential density provides a stable base of local customers who contribute to both frequent visits and reviews, particularly for venues embedded in dense neighborhoods. From an accessibility perspective, high-density areas often correspond to central locations within the urban network, characterized by their better connectivity and shorter travel times. This aligns with the classic central place theory [
54], which emphasizes the relationship between spatial accessibility and the concentration of economic activity. The findings from FS 1 suggest that the proximity to dense, active urban zones remains a key determinant of consumer engagement, supporting its strong predictive power in the model. In our study, the relationship between residential density and review volumes further illustrates how stable population clusters near venues ensure frequent consumer interaction and feedback. Furthermore, the alignment of pedestrian density with this theory highlights its role as a proxy for foot traffic, reinforcing the significance of dense, accessible urban zones in shaping digital consumer behavior.
The comparable importance of FS 3 (alone or together with FS 4) to FS 1 indicates that visitors’ scores are influenced not only by the immediate surrounding environment (e.g., population density and foot traffic) but also by the venue’s strategic position within the urban network. A place located at a highly central node may benefit from increased exposure and better accessibility, aligning with urban mobility theories that emphasize the importance of both local density and global connectivity.
Our findings reveal that pedestrian density plays a crucial role in shaping both review volumes and visitor scores, a result that aligns with Tang et al. [
20], who highlighted the significance of pedestrian activity in influencing social media engagement. However, our research extends this understanding by incorporating dynamic pedestrian data, such as peak-hour movement, and centrality features to evaluate their impact on consumer behavior. This dynamic, network-based approach provides a more granular understanding compared to previous studies.
A deeper analysis of centrality measures reveals nuanced impacts on prediction outcomes. Specifically, primal centrality metrics, such as betweenness and closeness, improved the prediction of visitors’ scores by highlighting well-connected nodes within the network. These measures reflect how accessible or integrated a venue is within the broader street network. Conversely, dual centrality measures, which evaluate segment-based connectivity, had limited effectiveness in predicting review volumes. This indicates that while network-level integration impacts the perceptions of venues, localized factors may drive review generation.
Moreover, our integration of geometric street features, such as sinuosity and the proximity to main roads, addresses a gap in the existing literature. Gavilan et al. [
26] suggested that review volumes depend primarily on trustworthiness, but our study shows that urban spatial configurations can also play a significant role, adding a new layer to the discourse on digital consumer behavior. Also, the findings of this study align with the analysis conducted in [
11], which demonstrated the influence of spatial correlation with digital footprints in the self-organized city of Matera, Italy. Our results suggest that the spatial variables emphasized in [
11] may have differing impacts in certain contexts; for instance, the negative effect of centrality metrics on review volumes in Melbourne implies that while broader network-level integration improves consumer experiences, the act of leaving a review is more affected by localized factors, such as the pedestrian density around the venue, the street geometry (e.g., sinuosity or proximity to main roads) or the venue’s specific spatial catchment area as defined by Voronoi polygons.
The study highlights the dual influence of the broader urban structure and immediate local factors on consumer behaviors. While centrality and pedestrian flow data inform network-level engagement trends, features like street sinuosity and proximity suggest that localized elements also shape consumer decision-making. This interplay suggests that strategies targeting venue success must consider both macro- and micro-scale dynamics. The analysis also underscores the importance of geometric measures: street sinuosity and the proximity to roads. Places located near streets with low sinuosity tended to attract higher review volumes, suggesting that straightforward access routes positively influence consumer engagement. However, in predicting visitor scores, this relationship was less pronounced, hinting at the more complex interplay of experiential factors in determining ratings.
One notable limitation of this study is the reliance on Google POI data, which excludes venues with fewer than 10 reviews to improve the quality of the study, as stated in
Section 3.1. This exclusion may introduce a bias toward well-established locations, potentially overlooking insights from smaller or emerging venues. Additionally, the use of digital footprints carries the risk of including fake or unreliable reviews, despite the algorithms that the Google Maps platform uses to detect such anomalies.
6. Conclusions
This study developed a prediction model to examine the influence of urban spatial characteristics on digital consumer behaviors. The model serves two primary purposes: assisting new EDO businesses in selecting optimal locations and providing existing EDO businesses with valuable insights for situation analysis. By applying this model, new businesses can make informed location decisions based on predicted consumer engagement levels, while established businesses gain a clearer understanding of their current standing in terms of consumer interactions.
The findings reveal that pedestrian flow and residential density are the most significant factors influencing review volumes and visitor scores, underscoring the importance of visibility and accessibility in consumer engagement. Centrality measures also played a critical role, positively impacting visitor scores but negatively influencing review volumes, suggesting that broader network connectivity enhances perceived satisfaction, while localized factors drive reviewing behavior. The geometric properties of streets, along with venue-specific characteristics, further contribute to predicting review volumes, highlighting the relationship between urban design and digital consumer interactions.
Academically, this study contributes to the literature by integrating dynamic pedestrian data and geometric street attributes into predictive modeling, offering a novel framework for analyzing the relationship between urban configurations and online consumer behavior. Practically, the research provides actionable insights for urban planners and business stakeholders. For instance, the findings emphasize the value of high-density pedestrian areas and well-connected locations in attracting consumer engagement. These implications can guide business owners in optimizing location selection and assist city planners in designing more consumer-friendly urban environments.
Future research could explore methods for validating data to improve the robustness of findings. Expanding the scope of dependent variables and incorporating environmental characteristics, such as sidewalk width, greenery and lighting, would provide a more refined understanding of digital consumer behaviors. Researchers might also consider segmenting pedestrian data by demographic or user type and analyzing potential temporal variations to capture more detailed mobility patterns. Considering these suggestions, a more comprehensive framework that supports data-driven urban planning and more effective business strategies can be developed.