1. Introduction
Rail transit space refers to the area within a certain range around a rail transit station exit, generally defined as a circle centered on the station and extending outward 500–800 m. In the field of transportation, scholars often use the concepts of TOD community, passenger attraction range, and rail transit station radiation area to describe this space [
1,
2,
3]. Against the backdrop of rapid rail transit construction, scholars worldwide continue to pay attention to the interaction between the built environment and urban rail transit [
4]. Noise and visual interference from rail transit affect the psychological perception of urban pedestrians [
5,
6,
7]. Noise generated by rail operations, including wheel–rail friction and train vibrations, can lead to auditory discomfort and stress, while the visual impact of elevated tracks and station structures often disrupts the aesthetic continuity of urban landscapes. These issues not only diminish the quality of life for residents but also hinder the creation of pedestrian-friendly environments. While existing research has extensively studied the functional and operational aspects of rail transit spaces [
3,
8,
9], a critical gap remains in understanding how the audiovisual environment of these spaces influences pedestrians’ psychological perception. This gap is particularly relevant given the rapid global expansion of urban rail systems and their growing role as public space hubs.
Delineating zones makes it possible to apply different design and management strategies at locations at different distances from the station. Current research on the spatial delineation of rail transit station areas has developed multiple zoning approaches, yet significant variations persist in their theoretical foundations and applications. TOD theory suggests that the vicinity of metro stations should be organized around the station as the core, forming a layered spatial structure within a walking distance of 400–800 m, or a walking time of 5–10 min [
4]. Alternative frameworks include pedestrian-shed analysis based on walking time thresholds (5/10/15-min isochrones) [
2,
3,
10] and morphological zoning tied to land use gradients (core–transition–peripheral layers) [
3]. Recent advancements have introduced behavioral-based segmentation using mobile phone data to map activity intensity patterns, while others employ space syntax to identify cognitive boundaries through visual graph analysis [
11]. However, these methods predominantly focus on either physical accessibility or economic functionality, with limited consideration of multisensory perception.
Research on the pedestrian environment around rail transit stations reveals a consensus that creating a pedestrian-friendly atmosphere depends not only on ensuring the accessibility of the road network [
12,
13], but also on improving residents’ psychological perception through planning and layout measures [
3]. However, existing assessment frameworks exhibit two critical shortcomings: (1) overreliance on static environmental indicators such as sound pressure level (SPL) without capturing dynamic human–environment interactions [
13,
14]; and (2) at the level of audiovisual interaction, the visual coherence between station buildings and their surroundings, and its coupling with noise perception, has not yet been fully analyzed.
The design of the audiovisual environment in urban rail transit space directly affects the physical and mental health and experience of passengers [
15]. Current research concentrates on the operational efficiency of rail transit, station layout, and similar topics, and lacks in-depth exploration of how the environment of rail transit station space affects pedestrians’ psychological perception. In terms of the acoustic environment, existing studies mainly address vehicle running noise [
16], wheel–rail friction noise [
17], and aerodynamic noise, emphasizing physical noise reduction while paying less attention to psychological perception. In terms of technology application, new environmental control measures such as intelligent noise reduction and visual optimization design suffer from high cost and difficult maintenance in practice, and their application also needs to be coordinated with the overall urban environment.
People’s perception of the urban environment comes from a combination of factors [
18], among which the auditory and visual environments are two important aspects [
19,
20], and the effects of vision and hearing on psychological perception are coupled [
21,
22]. Regarding the visual environment, the building volumes of aboveground and elevated stations often obscure the urban landscape and reduce visual permeability. Residents around aboveground stations report that station buildings restrict views and disrupt the continuity of the urban landscape. In addition, the harmony and integrity of station buildings with their surroundings also affect visual quality. However, studies on these visual interference factors remain scarce [
4].
In summary, current studies primarily focus on quantitative metrics such as accessibility, flow efficiency, and noise levels, often overlooking the human-centered experience of these spaces. Three key limitations persist in the literature: (1) zoning approaches remain rigidly distance-based (e.g., 500 m buffers), failing to account for variations in perceptual sensitivity across different spatial contexts; (2) most assessments rely on physical measurements or theoretical models, lacking empirical integration of environmental psychology and behavioral responses; (3) the psychological perception of rail transit environments is typically examined through isolated factors (e.g., noise or visual obstructions), with little consideration of how auditory and visual elements interact to shape pedestrian experience.
This study aims to address these gaps by establishing a perception-centered framework for evaluating and designing rail transit spaces. Specifically, we seek to: (1) redefine zoning boundaries based on psychological response; (2) develop an integrated classification system for audiovisual environmental factors; and (3) quantify the interaction effects between auditory and visual stimuli on pedestrian perception. By bridging the divide between physical design and psychological experience, our findings aim to inform more humane and perceptually optimized transit environments. Building on this foundation, the remainder of the paper is organized as follows. We begin by detailing our field methodology in
Section 2, where we capture real-world audiovisual data through systematic measurements and employ deep learning to extract essential environmental indicators.
Section 3 then reveals how these measurements translate into meaningful patterns, answering our core questions about perceptual zoning, environmental classification, and their combined effects on pedestrian experience. Finally,
Section 4 bridges research and practice, transforming these findings into design strategies while thoughtfully considering the study’s boundaries and future possibilities.
2. Materials and Methods
2.1. Survey Site
The selected rail transit line is located in Suzhou City, Jiangsu Province, China, where the rapid development of urban rail transit has caused corresponding noise problems. The study focuses on the elevated aboveground Yangcheng Lake Middle Road station of Suzhou Rail Transit Line 2; the main functional areas around the station are green areas, residential areas, and commercial areas. With the rail station as the center, a 30 m × 30 m grid was drawn. This resolution proved optimal for capturing fine-grained spatial variations in soundscape perception while maintaining measurement efficiency, as it adequately resolves the 15–20 dB(A) noise attenuation gradients observed around elevated rail structures [23] and aligns with the 25–35 m visual recognition thresholds for architectural elements in transit environments [24,25]. The equivalent continuous A-weighted sound level was measured for 3 min at each grid intersection, and a soundwalk study was conducted in which photographs were taken at these points and 3 min audio recordings were made both when a train was passing and when no train was passing. These materials were used to reproduce the audiovisual environment in the laboratory, to obtain the subjects’ subjective evaluations, and to analyze the visual and auditory environmental indicators. The survey was carried out in the morning from 9:00 to 10:00 and repeated at night from 20:00 to 21:00, when traffic was low.
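To illustrate the sampling layout, the following sketch (hypothetical local coordinates, not the original survey script) generates the 30 m × 30 m grid of measurement points, assuming a 300 m study radius consistent with the zoning reported later.

```python
import numpy as np

# Hypothetical station position in a local metric (east, north) frame, in meters.
STATION_XY = np.array([0.0, 0.0])
GRID_STEP = 30.0      # 30 m x 30 m grid resolution used in the field survey
MAX_RADIUS = 300.0    # assumed outer limit of the study area around the station

def grid_measurement_points(station_xy, step=GRID_STEP, radius=MAX_RADIUS):
    """Return grid intersections (meters, local frame) within `radius` of the station."""
    offsets = np.arange(-radius, radius + step, step)
    xs, ys = np.meshgrid(offsets, offsets)
    points = np.stack([xs.ravel(), ys.ravel()], axis=1) + station_xy
    # Keep only intersections inside the circular study area.
    keep = np.linalg.norm(points - station_xy, axis=1) <= radius
    return points[keep]

points = grid_measurement_points(STATION_XY)
print(f"{len(points)} candidate measurement points")  # Leq measured for 3 min at each point
```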
2.2. Panoramic Video
Panoramic video, binaural audio, and sound pressure level were measured simultaneously with the instrumentation arrangement shown in
Figure 1a. The panoramic video was shot with an Insta360 panoramic camera (manufactured in Shenzhen, China) at the corresponding time and point. The camera was oriented so that the angle of view was parallel to the railroad line, and the shooting range matched the range of human vision, in which the horizontal field of view is 120°; see
Figure 1b. Vision is sharp only in the limited region on which the eyes focus for a particular object; in the vertical plane, the eye can see 45 degrees upward and 65 degrees downward when necessary [
26]. The SQobold binaural recording system captured spatially accurate 3D audio samples at each measurement location. Its wide dynamic range (117 dB) and flat frequency response (10 Hz–20 kHz) preserved the spectral characteristics of rail noise essential for creating ecologically valid VR auditory stimuli, including directional cues and distance effects.
Table 1 details the key devices and their technical applications used for acoustic environment measurements and virtual reality (VR) simulations in this study, covering the entire tool chain from field data acquisition to laboratory simulation.
A total of 32 audiovisual segments were recorded, of which 12 were residential environments, 10 were commercial spaces, and 10 were green spaces. It should be noted that these spatial-functional categorizations represent only the functions of the sites around the points, and do not represent the main composition of the visual or auditory material.
Audio-video processing: From each recording, a segment of about 30 s containing a rail transit pass-by was extracted, together with a segment of the same length from the same point without a train passing. The audio and video were then combined into a single video file in Adobe Audition 2023. The videos used were recorded during the daytime; the nighttime recordings, made when there were fewer vehicles, served mainly as a control group.
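The clipping and muxing step could equally be scripted; the sketch below shows an equivalent operation with the ffmpeg command-line tool (file names and timestamps are hypothetical; the study itself used Adobe Audition 2023 for this step).

```python
import subprocess

def mux_clip(video_in, audio_in, start_s, duration_s, out_file):
    """Cut a clip of the panoramic video and replace its audio track with the
    binaural recording of the same time window (illustrative ffmpeg call)."""
    cmd = [
        "ffmpeg", "-y",
        "-ss", str(start_s), "-t", str(duration_s), "-i", video_in,   # video clip
        "-ss", str(start_s), "-t", str(duration_s), "-i", audio_in,   # matching binaural audio
        "-map", "0:v:0", "-map", "1:a:0",   # video from input 0, audio from input 1
        "-c:v", "copy", "-c:a", "aac",
        out_file,
    ]
    subprocess.run(cmd, check=True)

# Hypothetical file names: a ~30 s segment with a train pass-by at one grid point.
mux_clip("point12_pano.mp4", "point12_binaural.wav", start_s=95, duration_s=30,
         out_file="point12_train_passby.mp4")
```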
2.3. Acoustic Measurement
Sound pressure levels were measured using a BSWA 801 multifunction sound level meter (test accuracy: 0.1 dBA), which was calibrated before each test. In accordance with the outdoor sound environment test standard (ISO 10847:1997) [
27], the test was conducted on a sunny day with wind speed below 2 m/s; the sound level meter was mounted at a height of 1.5 m, and the measurement duration at each test point was 2 min. Prior to each measurement session, the BSWA CA111 acoustic calibrator was used to verify the meter’s performance by generating a reference 94 dB tone at 1 kHz; this ensured measurement consistency within a ±0.3 dB tolerance and maintained data reliability throughout the field study under varying environmental conditions. The equipment used for testing is shown in
Table 1. The two-minute equivalent continuous A-weighted sound level at each point is shown in
Figure 1f, with a sound pressure level distribution between 65.4 and 78.5 dB(A), where the sound pressure level is higher close to the railroad station and the main road on which it is located, and decreases gradually with the increase in distance towards the surrounding area [
28].
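For reference, the equivalent continuous A-weighted sound level reported here follows the standard definition (a textbook formulation added for clarity, not quoted from the cited standard):

```latex
L_{Aeq,T} = 10\,\log_{10}\!\left(\frac{1}{T}\int_{0}^{T}\frac{p_{A}^{2}(t)}{p_{0}^{2}}\,\mathrm{d}t\right)\ \text{dB(A)}, \qquad p_{0} = 20\ \mu\text{Pa}
```

where p_A(t) is the A-weighted instantaneous sound pressure and T is the measurement interval.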
2.4. Questionnaires
The experiment employed a systematically designed questionnaire comprising four components: (1) an audiovisual perception questionnaire, (2) an acoustic environment evaluation, (3) a visual environment evaluation, and (4) an EmojiGrid for holistic station assessment. The development process rigorously followed standardized protocols from ISO 12913-3 for soundscape evaluation [
29] and incorporated validated visual assessment frameworks from urban design studies [
30,
31].
Table 2 outlines the questionnaire components, specific questions, response scales, and their research objectives.
The audiovisual perception questionnaire for rail stations used an 11-point scale on which subjects rated the extent to which the rail line and rail noise were dominant in the environment, with 0 representing not seeing the rail line or hearing rail noise at all, and 10 representing the rail line or its noise being dominant.
The evaluation indexes of the soundscape evaluation questionnaire refer to the perceptual and semantic dimensions of the ISO 12913-3 soundscape evaluation. The auditory perception questions include acoustic comfort, rated on a 5-point scale with 1 being extremely uncomfortable and 5 being extremely comfortable, and perceived loudness, rated on a 5-point scale with 1 being extremely quiet and 5 being “deafening”. The semantic evaluation covered four dimensions: “eventfulness–uneventfulness”, “loudness–quietness”, “pleasantness–annoyance”, and “vitality–boringness”. These dimensions were rated on a 7-point scale, where −3 represents the closest match to the left-hand description, 0 represents neutral, and 3 represents the closest match to the right-hand description.
The visual evaluation covers overall quality (including “comfortable” and “beautiful”), spatial impression (including “open” and “depressing”), and richness (including “rich” and “boring”). A 5-point scale was used, with 1 being not at all consistent with the description and 5 being fully consistent with it.
For the overall evaluation of the audiovisual environment, a self-report tool was used to assess the subject’s valence and arousal. The EmojiGrid (
Figure 2) is a visual tool designed to assess emotions by using a coordinate system, which replaces the verbal labels with emoji that depict facial expressions [
32]. The advantage of this kind of evaluation is that it can avoid experimental errors caused by subjects’ biased understanding of the evaluation words [
33,
35]. In addition, since the semantic dimensions of the visual and auditory environments had already been evaluated in the experiment, evaluating the overall environment in a similar way would be susceptible to interference from the ratings already given; a differentiated evaluation method was therefore chosen to minimize such interference. The EmojiGrid consists of two dimensions, valence and arousal, which are widely recognized dimensions of human affect in psychology. In this experiment, to make the grid easier to understand for subjects without a psychology background, “valence” was replaced by “pleasantness”.
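As an illustration of how a marked EmojiGrid position can be converted into the two ratings used later in the analysis, the following sketch (hypothetical pixel coordinates and grid bounds) maps a marker to pleasantness and arousal values in [−1, 1].

```python
def emojigrid_to_scores(x_px, y_px, grid_left, grid_top, grid_size):
    """Map a marker position (pixels) on a square EmojiGrid image to
    (pleasantness, arousal) scores in [-1, 1].

    Assumes (grid_left, grid_top) is the grid's top-left corner, pleasantness
    increases to the right, and arousal increases upward."""
    px = (x_px - grid_left) / grid_size      # normalize to [0, 1]
    py = (y_px - grid_top) / grid_size
    pleasantness = 2.0 * px - 1.0            # rescale to [-1, 1]
    arousal = 1.0 - 2.0 * py                 # invert: pixel y grows downward
    clamp = lambda v: max(-1.0, min(1.0, v)) # marker may sit slightly outside the grid
    return clamp(pleasantness), clamp(arousal)

# Example: marker near the upper-right corner of a 600 px grid starting at (100, 50).
print(emojigrid_to_scores(650, 90, grid_left=100, grid_top=50, grid_size=600))
```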
2.5. VR Experiment
The participants were asked to complete an audiovisual perception experiment in the VR lab using HTC Vive Pro Eye VR systems with professional audio equipment. The lab configuration of VR simulations and a picture of a participant during VR simulation are shown in
Figure 3a and
Figure 3b respectively. A total of 42 participants evaluated the audiovisual environments under standardized lighting (5000 K, 300 lux) and background noise conditions (≤25 dB(A)) [
34], with the sample size determined through power analysis to ensure adequate statistical validity (power = 0.82 for η² ≥ 0.15) [
36]. A person’s age, gender, and education can influence audiovisual perception [
37,
38]. To ensure a representative sample for the VR-based audiovisual perception experiment, 42 participants (21 male, 21 female) were recruited through stratified sampling across age groups (18–25, 26–35, 36–45 years) and educational backgrounds (high school, undergraduate, postgraduate). All participants were residents living within 2 km of elevated rail transit stations, ensuring familiarity with the study context. Prior to selection, candidates were screened to ensure normal or corrected-to-normal vision and hearing (Snellen chart <20/40 [
39], pure-tone audiometry ≤25 dB HL [
40]).
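The reported power can be checked with a standard ANOVA power calculation; below is a minimal sketch using statsmodels, assuming for illustration a between-subjects one-way ANOVA with four groups (the four audiovisual space types identified later). The value obtained depends on these design assumptions and will not necessarily reproduce the reported figure.

```python
import numpy as np
from statsmodels.stats.power import FTestAnovaPower

eta_sq = 0.15                           # targeted effect size (eta squared)
f = np.sqrt(eta_sq / (1.0 - eta_sq))    # convert eta squared to Cohen's f

# Achieved power for N = 42 participants, alpha = 0.05, under the assumed
# four-group between-subjects design; the study's actual design may differ.
power = FTestAnovaPower().power(effect_size=f, nobs=42, alpha=0.05, k_groups=4)
print(f"Cohen's f = {f:.3f}, achieved power = {power:.2f}")
```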
AKG K712 monitoring headphones (10 Hz–39.8 kHz frequency response) were used to reproduce the binaural recordings during the VR experiments. The 32 recorded audiovisual environments at different distances from the rail transit station were recreated in VR, each video lasting 3 min. Subjects scanned a QR code to fill out the audiovisual perception questionnaire and marked their pleasantness and arousal perceptions of the overall audiovisual environment on the EmojiGrid provided by the experimenter. Depending on the position of the marker in the grid coordinates, values between −1 and 1 are output for both dimensions. The equipment used for testing is shown in
Table 1.
2.6. Extraction of Audiovisual Environment Indicators
In the extraction of audiovisual environment indicators, a multimodal research approach is used. The methods used for indicator extraction are shown in
Figure 4. For street view images, a Faster R-CNN (region-based convolutional neural network) model is used to extract key indicators by identifying and classifying relevant elements in the cityscape. To analyze the spatial context, PSPNet (Pyramid Scene Parsing Network) [
41] is used for semantic segmentation to delineate different objects and regions in the image. For the plan-view analysis, distance estimation is performed with a feedforward neural network (FNN) and semantic segmentation again with PSPNet. The FNN comprises two hidden layers and an output layer; its input features are the latitude and longitude of the measurement point and of the station, and its output is the distance from the point to the station. Mean squared error is used as the loss function, and the model is trained with the Adam optimizer.
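A minimal sketch of such a distance-prediction network is given below (PyTorch; the layer widths, learning rate, and example values are illustrative assumptions, as they are not reported here).

```python
import torch
from torch import nn

# Inputs: [point_lat, point_lon, station_lat, station_lon]; target: distance to station (m).
model = nn.Sequential(
    nn.Linear(4, 32), nn.ReLU(),   # hidden layer 1 (width assumed for illustration)
    nn.Linear(32, 16), nn.ReLU(),  # hidden layer 2
    nn.Linear(16, 1),              # output: predicted distance
)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(coords, distances):
    """One optimization step on a batch of (coordinates, distances)."""
    optimizer.zero_grad()
    pred = model(coords).squeeze(-1)
    loss = loss_fn(pred, distances)
    loss.backward()
    optimizer.step()
    return loss.item()

# Example batch with made-up coordinates (degrees) and distance (meters).
coords = torch.tensor([[31.321, 120.672, 31.320, 120.670]])
dist = torch.tensor([210.0])
print(train_step(coords, dist))
```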
For audio data processing, the recordings are classified by a convolutional neural network (CNN) classification model to identify and classify noise sources. Since the study addresses the rail transit space, the training dataset contains images with rail transit stations and recordings containing rail transit noise. This approach extracted audiovisual environment indicators and analyzed the audiovisual dynamics within the rail transit space.
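The audio classification step can be sketched as follows: a recording is converted to a log-mel spectrogram and passed through a small CNN. The architecture, label set, and file name below are illustrative assumptions rather than the study’s exact model.

```python
import librosa
import torch
from torch import nn

CLASSES = ["rail transit", "road traffic", "human activity", "natural"]  # assumed label set

def log_mel(path, sr=22050, n_mels=64):
    """Load a recording and return a (1, n_mels, frames) log-mel spectrogram tensor."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return torch.tensor(librosa.power_to_db(mel), dtype=torch.float32).unsqueeze(0)

# A small CNN classifier over the spectrogram (layer sizes are illustrative).
classifier = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, len(CLASSES)),
)

spec = log_mel("point12_train_passby.wav")                     # hypothetical file name
probs = torch.softmax(classifier(spec.unsqueeze(0)), dim=-1)   # add batch dimension
print(dict(zip(CLASSES, probs.squeeze(0).tolist())))
```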
The visual indicators used for the analysis are shown in
Table 3, which includes the subjects’ subjective evaluations (values from −3 to 3), semantic segmentation indicators (values from 0 to 1), and object detection indicators (values from 0 to 1). The auditory indicators used for the analysis are shown in
Table 4, which includes the subjects’ subjective evaluations (values from −3 to 3), audio classification indicators (values from 0 to 1), and sound level indicators, namely Overall_Leq, Lmin, and Lmax, normalized to values from 0 to 1.
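As a minimal illustration of how the 0–1 indicators can be derived, the sketch below computes per-class pixel proportions from a segmentation mask and min–max normalizes measured sound levels across points; the class IDs and level values are placeholders.

```python
import numpy as np

def class_pixel_ratios(seg_mask, class_ids):
    """Share of image pixels belonging to each semantic class (values in 0-1)."""
    return {name: float(np.mean(seg_mask == cid)) for name, cid in class_ids.items()}

def minmax_normalize(levels_db):
    """Rescale measured sound levels (dB) to 0-1 across all measurement points."""
    levels = np.asarray(levels_db, dtype=float)
    return (levels - levels.min()) / (levels.max() - levels.min())

# Placeholder class IDs standing in for the segmentation model's label map.
classes = {"sky": 2, "vegetation": 8, "building": 1, "rail structure": 12}
mask = np.random.randint(0, 20, size=(512, 1024))    # stand-in for a PSPNet output
print(class_pixel_ratios(mask, classes))
print(minmax_normalize([65.4, 70.2, 78.5]))          # e.g., Overall_Leq values
```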
4. Discussion
4.1. Comparative Analysis and Methodological Advancements
The advantages of this approach are threefold. First, it resolves the limitations of rigid distance-based zoning by identifying perceptually sensitive zones (0–50 m, 50–150 m, 150–300 m) grounded in empirical data. Second, it surpasses purely physical noise reduction strategies [
19,
20] by quantifying how visual design (e.g., VT2’s positive artificial environments) mitigates auditory discomfort—a linkage underexplored in prior work [
25,26]. Third, the use of VR-controlled experiments and cluster analysis overcomes the shortcomings of theoretical models [
12,49] by directly correlating environmental features with human perception. Our methodology integrates environmental psychology principles with behavioral surveys, addressing a critical gap in existing research that either isolates sensory factors [
23,
24] or prioritizes infrastructural efficiency over experiential quality [
48].
These findings call for a paradigm shift in urban transportation design. Design standards [
18] should prioritize integrated audiovisual solutions (such as aesthetically designed sound barriers with planting) over isolated noise-control measures, as demonstrated by the link between visual harmony and noise tolerance in Zones 1 and 2. Furthermore, the perceptual zoning concept gives planners evidence-based cutoff points for efficient resource allocation: landscape-based improvements in transitional areas (50–150 m) and strict interventions in high-impact zones (0–50 m). By bridging the gap between technical measures and human experience, this study enables policymakers to transform rail transit areas from functional hubs into psychologically optimized public spaces [
14].
4.2. Design Strategies
The effect of the type of auditory environment on visual indicators is smaller than the effect of the visual environment on auditory indicators, which is somewhat different from the research on other urban public spaces [
50]. As a special type of urban open space, rail transit space therefore requires targeted methods and strategies in design and research.
First, the noise generated by train operations (e.g., wheel–rail noise, whistles) is much higher in rail transit space than in other urban open spaces (e.g., parks, squares). This noise is persistent, high-frequency, and high-intensity, and has a significant negative impact on the auditory comfort of pedestrians; the problem is especially prominent in Zone 1 (within 50 m). Second, elements such as elevated tracks, train operations, and track facilities (e.g., power lines, signaling equipment) significantly interfere with the visual environment, and this visual interference may trigger a sense of oppression or insecurity in pedestrians, especially within Zones 1 and 2. Third, the rail transit space is highly dynamic, with train operations, pedestrian density, and environmental noise changing over time; in addition, it is usually closely intertwined with other functional spaces in the city (e.g., commercial and residential areas), which increases the complexity of the environment. Dynamic design strategies are therefore needed, such as combining real-time noise monitoring with adaptive sound barrier technologies and considering the needs of different time periods (e.g., the difference between morning and evening peaks and off-peak hours). Finally, rail transit space is both a functional transportation space and an urban public space, and must meet the dual needs of transportation efficiency (e.g., train operation, passenger gathering and dispersal) and public space comfort (e.g., pedestrian walking, leisure).
Zone division provides a clear reference basis for the design of rail transit space. For urban planners, the zone division framework offers a systematic approach to optimize rail transit space design. The distinct characteristics of each zone—from the high-impact Zone 1 (0–50 m) requiring intensive noise and visual mitigation, to the transitional Zone 2 (50–150 m) needing visual buffering, and the peripheral Zone 3 (150–300 m) focusing on aesthetic integration—provide clear spatial parameters for land use planning. Planners can utilize this zoning system to strategically allocate functions around stations, ensuring that high-density developments align with the acoustic and visual requirements of each zone while maintaining pedestrian comfort and urban design coherence.
Policymakers can leverage these zoning insights to develop more nuanced urban design guidelines and regulations. The framework suggests implementing tiered environmental standards, with Zone 1 requiring stringent noise control measures like mandatory sound barriers and restricted sensitive land uses, Zone 2 benefiting from landscape-based solutions such as minimum greening requirements, and Zone 3 focusing on visual harmony provisions. This graduated approach enables more targeted policy interventions that balance infrastructure needs with quality-of-life considerations, while providing measurable benchmarks for evaluating station area developments.
For engineers and designers, the zone-specific findings translate into practical technical solutions. In Zone 1, this means developing integrated systems combining advanced noise insulation materials (such as composite acoustic panels) with visually appealing treatments like artistic baffles or living walls. Zone 2 solutions might involve engineered landscape elements—water features with sound-masking properties, strategically planted vegetation belts for visual screening, and wayfinding elements that subtly redirect attention. Zone 3 implementations could focus on seamless transitions between station areas and surrounding neighborhoods using natural topography and view corridors. This zoned approach ensures technical solutions are appropriately scaled to their perceptual impact areas, optimizing both resource allocation and user experience.
4.3. Limitations
The limitations of this study are mainly as follows. First, there may be differences in visual detail, sound realism, and dynamic environmental change between the VR simulation and the real environment; further studies could conduct additional experiments in real environments to verify the accuracy of the results. Second, the study did not assess the effects of long-term exposure to the rail transit spatial environment; long-term follow-up studies could help reveal the lasting effects of the environment on pedestrians’ psychological perception. In addition, the study focused on audiovisual perception and did not cover the effects of other senses, such as touch and smell, on psychological perception; introducing multisensory interaction studies could provide a more comprehensive understanding of the integrated effects of the environment. Meanwhile, the study did not fully consider dynamic environmental factors (e.g., weather, time of day, crowd density), and including these factors would enhance the real-world applicability of the findings. Finally, the study may be limited by its specific cultural and social context (i.e., the urban environment in China), and the generalizability of its conclusions remains to be verified through comparative studies in different cultural and social contexts. Addressing these limitations would further enhance the scientific and practical value of the study.
5. Conclusions
The audiovisual environment of rail transit station space affects the psychological perception of pedestrians, but current research focuses on operational efficiency and physical noise reduction, leaving psychological perception insufficiently explored. Visual interference, the application of acoustic environment control technology, and lagging engineering design standards are the main problems. Multidisciplinary approaches are needed to optimize the design of audiovisual environments, enhance pedestrian friendliness and public space quality, and promote the construction of healthy cities. This study therefore explored the role of the visual and auditory environments of rail transit spaces in shaping pedestrians’ psychological perception from psychological and behavioral perspectives.
Firstly, through cluster analysis, the environment was divided into zones and categories according to the visual and auditory perception and evaluation of rail transit stations, and the interactive effects of audiovisual environmental factors on psychological perception within different zones were explored. The results show that, according to audiovisual perception, the space within 300 m of the rail transit station can be divided into three zones and four space types with different audiovisual perceptions. The ANOVA results can be summarized as follows: the effect of the visual environment on pleasantness and arousal varies with zone distance. In Zones 1 and 2, the visual environment had a significant effect on pleasantness, and railroad noise significantly reduced pleasantness; in Zone 2, the visual environment also significantly affected arousal. In Zone 3, only a positive visual environment had a moderate effect on pleasantness. Overall, the audiovisual interaction was significant at close range (Zones 1 and 2), whereas the visual environment at far range (Zone 3) enhanced pleasantness mainly indirectly through psychological perception. The effect of the type of auditory environment on visual indicators was smaller than the effect of the visual environment on auditory indicators, and the visual category had the greatest effect on the subjective auditory indicators within Zones 1 and 2.
This study reveals the critical role of audiovisual environments in shaping pedestrian psychological perception around rail transit stations. The zone-based framework provides actionable guidance for stakeholders: urban planners can strategically allocate land uses according to each zone’s acoustic and visual requirements (0–50 m for intensive mitigation, 50–150 m for transitional buffering, 150–300 m for aesthetic integration); policymakers can establish tiered regulations with strict noise controls in Zone 1 and landscape-based solutions in Zone 2; while engineers can implement zone-specific technical solutions ranging from advanced sound barriers in Zone 1 to natural topography treatments in Zone 3, ensuring optimal resource allocation and enhanced user experience throughout the station area.
The findings offer valuable applications across multiple domains. For public health, the demonstrated link between optimized audiovisual environments and reduced stress responses suggests station-area design could be incorporated into urban mental health initiatives, particularly near sensitive facilities like hospitals. In urban noise management, zone-specific impact data enable more precise interventions, from acoustic engineering solutions in high-noise zones to psychoacoustic approaches in transitional areas. Regarding pedestrian comfort, the visual dominance thresholds identified can guide wayfinding systems and amenity placement to minimize cognitive load while maximizing restorative qualities of station environments. These applications collectively contribute to creating transit spaces that support both functional mobility and psychological wellbeing.
Future research directions will focus on three key expansions of this work. First, the zoning framework will be validated and adapted across diverse rail transit environments, including underground stations and at-grade crossings, to test its universal applicability. Second, the Deep Learning models will be refined to incorporate dynamic predictors like real-time crowd flows and weather conditions, enhancing their precision for practical implementation. Third, building on the current audiovisual foundation, integrated multisensory assessment tools will be developed to account for tactile and olfactory dimensions of transit environments. These extensions, coupled with cross-cultural validation studies, will strengthen the model’s robustness while addressing the current limitations identified in VR simulation fidelity, long-term exposure effects, and contextual factors.