In the subjective questionnaire, the subjects reported positive emotions, higher comfort levels, and more focused attention during the VR experiment, which helped ensure the quality of the collected data.
3.1. Data from and Results of Experiment 1
The visualization results of the representative pictures are shown in Table 5. The G1 viewpoints were mainly concentrated on the city tower and the parapet (daughter wall). The G2 viewpoints focused on the plaque inscription and the parapet. The G3 viewpoints focused on the Great Wall and projection areas. The visual range of the G4 viewpoints was the most concentrated, with the focus on the central area of the gate tower. The G5 viewpoints focused on the word and symbol areas.
In general, the subjects’ visual ranges were concentrated, with a preference for word information, building form and structure, and architectural features.
To prevent viewing habits from affecting the experimental data and results, the different styles of transliteration symbol in the five groups of questions were presented in random order [50]. Higher AOI FC and AOI TFD values indicated that a region contained more information, or information that was more difficult to process, and that subjects needed longer to comprehend its content [51]. The eye-tracking data recorded during symbol selection (see Table 6) showed that the line and surface combination style had the highest overall AOI MFD (0.218 s), AOI FC (22.465%), and AOI TFD (23.941%), indicating that this style of symbol attracted sufficient attention and substantial interest. For the error-style transliteration symbols, AOI FF (0.216 s) was relatively long, but AOI FC (10.699%) and AOI TFD (10.618%) were low, indicating that subjects had more difficulty interpreting the information conveyed by this style and were reluctant to spend much time on it.
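As an illustration of how AOI-level indicators of this kind can be derived, the following minimal sketch computes a fixation-count share, a total-fixation-duration share, and a mean fixation duration per AOI from a list of fixations. It assumes that AOI FC and AOI TFD are expressed as percentages of all fixations and of all fixation time in the trial, which is one plausible reading of the percentages reported above. The function name, the input layout, and the AOI labels are hypothetical, and AOI FF is left out because this section does not define how it was measured.

```python
from collections import defaultdict


def aoi_metrics(fixations):
    """Summarize fixations per AOI.

    fixations: iterable of (aoi_label, duration_in_seconds) pairs.
    Returns {aoi_label: {"fc_pct", "tfd_pct", "mfd_s"}}.
    Assumption: FC and TFD are shares of the whole trial, in percent.
    """
    counts = defaultdict(int)     # number of fixations per AOI
    totals = defaultdict(float)   # summed fixation duration per AOI
    for aoi, duration in fixations:
        counts[aoi] += 1
        totals[aoi] += duration

    n_all = sum(counts.values())
    t_all = sum(totals.values())
    return {
        aoi: {
            "fc_pct": 100.0 * counts[aoi] / n_all,
            "tfd_pct": 100.0 * totals[aoi] / t_all,
            "mfd_s": totals[aoi] / counts[aoi],
        }
        for aoi in counts
    }


# Hypothetical fixation log for one trial (labels and durations are made up).
demo = [
    ("line_surface", 0.2),
    ("point", 0.1),
    ("line_surface", 0.3),
    ("line", 0.4),
]
metrics = aoi_metrics(demo)
```

In this made-up log, the `line_surface` AOI receives half of the fixations and half of the fixation time, with a mean fixation duration of 0.25 s.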
According to the AOI results and the final symbol selections (see Figure 5), the line and surface combination symbols received the largest share of first fixations (171 visits, accounting for 37% of the total) and were also the most frequently selected (145 visits, accounting for 31% of the total). In summary, the line and surface combinations agreed with the subjects’ visual perception and aesthetic preferences and conveyed the image information accurately; thus, this type was the most suitable for symbols.
3.2. Data from and Results of Experiment 2
The three scenes in Experiment 2 were all from the Taizicheng railway station: Scene 1 (S1), the transfer lane inside; Scene 2 (S2), the waiting hall inside; and Scene 3 (S3), the landscape pavilion outside. The visualization results for the subjects in terms of scene memory are shown in Table 7. The visual range of S1 was more concentrated, with the fixation points mainly distributed in a “+” shape centered on the middle of the picture. The visual range of S2 was more dispersed than that of S1, and the viewpoints were spread more widely in the horizontal direction than in the vertical direction. The visual range of S3 was also more dispersed, with the fixation points mainly distributed horizontally in a “−” shape.
This pattern is consistent with human visual characteristics: the horizontal field of view tends to be much greater than the vertical, and the eyes move faster and tire less in the horizontal direction than in the vertical direction [52]. Thus, given the limited time, subjects in the present research tended to prioritize their information search efforts in the horizontal direction, followed by the vertical direction.
Viewed in conjunction with the FC and SC indicators (see Table 8), the FCs in S1 were approximately twice those of S2 and S3, and the SCs were approximately 88% of those of S2 and S3. This indicated that the subjects had more difficulty finding the target information in S2 and S3 and needed to scan those scenes extensively, which was likely attributable to the complexity of the spatial structure.
The cognitive task data showed that the overall cognitive correctness of the three scenes was ranked as “S1 (81.104%) > S2 (64.566%) > S3 (49.566%).” In S1, the significance of the five digits was high and similar, with cognitive correctness ranked from highest to lowest as “5 > 2 > 3 > 1 = 4”; the number 5 was slightly more significant than the others (84.87%). In S2, the digits began to show larger differences, ranked as “4 > 2 > 3 > 1 > 5,” and the number 4 was the most significant (76.09%). In S3, the differences in numerical significance were the most prominent, with a ranking of “1 > 3 > 2 > 5 > 4,” and the number 1 was the most significant (77.17%) (see Figure 6). With the same observation time, subjects were able to complete their searches quickly and obtain more information about the scene in S1.
In summary, higher cognitive correctness corresponded to greater significance. Visual saliency was greater in spaces with simple environmental structures and lower in spaces with more complex environmental structures.
3.3. Data from and Results of Experiment 3
According to the related literature on the layout of symbol systems [53], as applied to the Taizicheng railway station (in terms of location, size, and number of signs), we concluded that image-type information was mostly arranged on the space facades as propaganda. Pattern types were mostly arranged along passenger travel routes as guidance signs, and word types were mostly used for crowd diversion in the middle and upper regions of the space. Color decorations were primarily applied to create atmosphere in the station, and sculptures were most often used to illustrate regional historical heritage.
Accordingly, the transliteration elements (images, patterns, words, colors, and sculptures) and the spatial layout of each scene in this experiment were marked with the corresponding AOIs (see Figure 7).
The heat maps (see Table 9) showed that in S1, the area of concentrated fixation in the optimized space was significantly larger than in the unoptimized space and fell largely within the optimized area. In S2, the area of concentrated fixation extended further in the vertical direction after optimization and fell largely within the optimized area (except for the dome). In S3, fixation after optimization was distributed more horizontally than before and fell primarily within the optimized area.
The track maps (see Table 10) showed that in S1, the fixation points after spatial optimization were more concentrated and the saccadic distance was shorter than in the unoptimized scene, especially on the ceiling. In S2, the fixation points after optimization were concentrated on both sides of the optimized area rather than in the unoptimized area, which lengthened the saccadic distance and caused the information in the middle section (i.e., the area from the center to both sides) to be more obviously neglected. In S3, the optimized design significantly reduced the fixation points in the unoptimized ground area, substantially increased the fixation points in the optimized ceiling area, and markedly shortened the saccadic distance.
Overall, spatial optimization began to shift the subjects’ viewing from scattered before optimization to concentrated and focused afterwards.
The analysis was performed in combination with the scene eye movement indices. The data were screened for extreme cases, and extreme values at both ends that affected data stability were removed using the truncated mean method, yielding average indices that more faithfully reflected the data [54,55,56]. According to the data (see Table 11), S1 showed an increase of approximately 40% in FCs and a decrease of approximately 3% in SCs as compared to the unoptimized scene. S2 showed an increase of approximately 9% in FCs and a decrease of approximately 2% in SCs, and S3 showed an increase of approximately 12% in FCs and a decrease of approximately 1% in SCs, each as compared to the corresponding unoptimized scene.
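The truncated mean and the before/after comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the proportion trimmed from each end (10% here) is not given in this section, the function names are hypothetical, and the sample values are invented rather than taken from Table 11.

```python
def truncated_mean(values, proportion=0.1):
    """Mean after dropping the lowest and highest `proportion` of the values.

    Assumption: the same share is cut from both ends; 0.1 is a placeholder,
    not the cut used in the study.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    data = sorted(values)
    k = int(len(data) * proportion)      # number of values cut at each end
    kept = data[k:len(data) - k]
    if not kept:
        raise ValueError("no values left after truncation")
    return sum(kept) / len(kept)


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Invented per-subject fixation counts with one extreme value at each end.
fc_unoptimized = [2, 47, 48, 49, 50, 50, 51, 52, 53, 200]
fc_optimized = [5, 66, 67, 69, 70, 70, 71, 73, 74, 310]

mean_before = truncated_mean(fc_unoptimized)   # extremes 2 and 200 dropped
mean_after = truncated_mean(fc_optimized)      # extremes 5 and 310 dropped
change = percent_change(mean_before, mean_after)
```

With these invented values, the two outliers in each list no longer dominate the averages, and the resulting change reflects the bulk of the subjects rather than the extremes.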
Overall, FCs increased and SCs decreased after spatial optimization, meaning that more saccadic time was converted into fixation time. This indicated that the subjects’ attention was more focused and that they paid greater visual attention in the spatially optimized scenes; the effect was most pronounced in S1.
The duration of fixation indicated not only the attractiveness of the target but also differences in cognitive load [57]. As shown in Table 12, the significance ranking of the transliteration elements in S1 was “Images > Words > Sculptures > Patterns > Colors.” In S2, the ranking was “Patterns > Colors > Words > Images > Sculptures.” In S3, the ranking was “Colors > Images > Words > Patterns > Sculptures.”
In summary, images had a high visual significance in S1 and a low visual significance in S2. Patterns had a high visual significance in S2 and a low visual significance in S3. Words were relatively significant in every scene, most of all in S1. Colors had a high significance in scenes with complex environment spaces and a low significance in scenes in which the environment space was simple. Sculptures were more significant in scenes with a simple environment space and had very low significance in scenes in which the environment space was complex. It is worth noting that the average duration of the fixation points in S1 was longer for sculptures, indicating that the transliteration of sculptures required a certain cognitive load and offered more readability and reflectivity. In S2, words were more important. In S3, a scene with more external influence and lower visual salience, sculptures were more important to people.