1. Introduction
The word is the smallest unit of language that can be utilized independently [1]. To comprehend sentences and larger textual structures, such as paragraphs and chapters, words must first be processed and recognized [2]. Words in texts must be segmented to achieve word processing and recognition; this process is referred to as word segmentation [3,4]. Word segmentation plays a significant role in information and cognitive processing during reading [5,6], and visual word segmentation cues are crucial in the word segmentation process [6,7].
Previous studies have shown that in languages with within-word segmentation cues, removing or substituting the original visual segmentation cues in a text increases lexical recognition time, decreases reading speed, adversely affects saccade target selection during reading, and causes the initial fixation position on words to shift from the preferred viewing location to the beginning of the word [7,8,9,10,11,12,13,14,15]. Some studies have examined the effects of inserting artificial visual segmentation cues, such as spaces, into languages that do not have within-word segmentation cues. In Thai texts, inserting spaces between words has been found to facilitate lexical recognition, although it reduces reading speed. Notably, this spacing does not influence saccade target selection during reading [16]. By contrast, in Japanese texts that mix hiragana, katakana, and kanji scripts, adding spaces between words has no significant impact on reading speed, lexical recognition, or saccade target selection [17]. In Chinese texts, inserting spaces between words does not interfere with the reading process [18,19]. However, it facilitates lexical recognition and aids in saccade target selection [19,20]. In summary, the effects of inserting spaces in text are inconsistent across languages. This inconsistency may be due to inherent differences between languages.
Furthermore, some researchers believe that inserting spaces in languages that lack visual word segmentation cues changes the original spatial distribution of sentences, extends the physical length of the sentences, and reduces the processing efficiency of the reader’s parafoveal region. The addition of spaces causes the words near the fixated word to be farther from the fixation, which may result in the words near the fixation falling into a visual area with lower visual acuity, thereby reducing the benefits of the parafoveal preview [21,22,23].
To eliminate the influence of this extraneous factor on experimental results, researchers in recent years have adopted color alternation markings as a visual word segmentation cue [21,22,23,24,25,26,27]. Unlike spaces, color alternation markings neither extend the physical length of the sentence nor change its spatial distribution, and they preserve readers’ parafoveal processing efficiency [21,22,23]. However, eye-tracking studies of university students who are native Chinese speakers have produced inconsistent results regarding the effect of color alternation markings as a visual word segmentation cue in reading. Some studies have shown that color alternation markings between words have no effect on reading but aid in saccade target selection and facilitate lexical recognition under certain conditions [23]. Other studies have indicated that color alternation markings between words hinder reading, do not facilitate lexical recognition, and do not affect saccade target selection [19]. Among languages that inherently lack visual word segmentation cues, the effect of color alternation markings as a visual word segmentation cue on reading, lexical recognition, and saccade target selection has been examined only in Chinese. Therefore, whether the results can be generalized to other languages that inherently lack visual word segmentation cues remains to be verified.
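The contrast between the two cue types can be made concrete with a minimal sketch. It assumes a sentence that has already been segmented into words; the function names, the placeholder words, and the two color names are hypothetical and are not taken from the studies cited above. The sketch only illustrates that inserting spaces lengthens the string, whereas color alternation attaches a display attribute to each word without adding characters.

```python
from itertools import cycle


def with_spaces(words, sep=" "):
    """Join words with an inserted separator; the resulting string is longer."""
    return sep.join(words)


def with_color_alternation(words, colors=("red", "blue")):
    """Pair each word with an alternating color; no characters are added.

    The color names are hypothetical placeholders for whatever two colors
    an experiment would actually use.
    """
    return list(zip(words, cycle(colors)))


# Hypothetical stand-ins for the words of an unspaced sentence.
words = ["AB", "CDE", "F"]

unspaced = "".join(words)
spaced = with_spaces(words)
colored = with_color_alternation(words)

# Spacing adds one character per word boundary.
print(len(unspaced), len(spaced))
# Color alternation leaves the character sequence unchanged.
print("".join(word for word, _ in colored) == unspaced)
```

In an actual experiment the relevant quantity would be the horizontal extent of the rendered sentence in degrees of visual angle rather than the character count, which is used here only as a simple proxy.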
Tibetan, which belongs to the Sino-Tibetan language family, is written in a phonetic script. The Tibetan writing system consists of thirty basic consonant letters and four basic vowel signs, which can be combined in various ways to represent different morphemes. For example, the morpheme “བོད” (meaning “Tibet” or “Tibetan”) consists of the consonant “བ” combined with the vowel sign “ོ” to form “བོ”, followed by the consonant “ད”. Tibetan text inherently possesses a unique visual morpheme segmentation cue in the form of tshegs (“་”). For example, the word “བོད་ཡིག” (meaning “Tibetan script”) consists of “བོད”, which means “Tibet” or “Tibetan”, and “ཡིག”, which means “script” or “writing”. The tsheg between these two morphemes acts as a delimiter, indicating that they are separate morphemes even though together they form a single unit of meaning. Thus, although Tibetan text contains morpheme segmentation cues, it contains no word segmentation cues [28,29,30]. Furthermore, the impact of artificially inserting visual word segmentation cues on the reading of Tibetan texts is currently unknown.
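The delimiting role of the tsheg described above can be sketched in a few lines. The example assumes that the tsheg is the Unicode character U+0F0B and that splitting on it is an adequate approximation of morpheme segmentation for this one word; the function name is hypothetical, and real Tibetan text contains other punctuation that this sketch does not handle.

```python
TSHEG = "\u0F0B"  # the Tibetan tsheg, "་"


def split_on_tsheg(text):
    """Split text into tsheg-delimited units, dropping empty pieces."""
    return [unit for unit in text.split(TSHEG) if unit]


# The word "བོད་ཡིག" ("Tibetan script") from the text, written with escapes
# so that the code points are unambiguous.
word = "\u0F56\u0F7C\u0F51" + TSHEG + "\u0F61\u0F72\u0F42"

morphemes = split_on_tsheg(word)
print(morphemes)
print(len(morphemes))
```

The split recovers the two morphemes, but nothing in the output marks them as belonging to one word. Applied to a whole sentence, the same function would return a flat list of morphemes, which is the sense in which Tibetan text provides morpheme boundaries but no word boundaries.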
Examining the effect of artificially inserting visual word segmentation cues in Tibetan reading can help address the theoretical issue of the basic information processing unit in Tibetan reading. In languages with inherent spaces between words, such as English, researchers generally consider the word a basic information processing unit [31,32,33,34,35]. Conversely, in other languages, such as Tibetan and Chinese, there are no visual word segmentation cues aside from punctuation marks denoting semantic units and pauses [36,37,38,39,40,41,42]. This lack of visual word segmentation cues has created controversy over the basic information processing unit in the reading process of these languages. Therefore, the basic information processing units in Tibetan reading remain unknown.
Wang et al. (2023) found that removing tshegs interferes with Tibetan reading [43]. This indicates that native Tibetan speakers rely on the visual morpheme segmentation cues provided by tshegs between morphemes to read words, suggesting that morphemes might be the basic information processing unit in Tibetan reading. Furthermore, the word frequency effect in Chinese reading provides strong evidence that words are the basic information processing units in Chinese reading [41]. Because Gao et al. (2020) and Li et al. (2022) identified a word frequency effect in Tibetan reading [44,45], we hypothesized that words may also be the basic information processing unit in Tibetan reading. However, previous studies have not determined whether words or morphemes are the basic information processing unit in Tibetan reading.
Therefore, this study investigated the effect of visual word segmentation cues in Tibetan reading to explore whether these cues facilitate reading and lexical recognition and aid in saccade target selection. Moreover, this study aimed to identify the basic information processing unit in Tibetan reading.
Based on previous research that has investigated languages that do not have within-word segmentation cues [16,17,18,19,20], this study proposes the following hypotheses:
- (1)
Interword spaces have no effect on Tibetan reading but facilitate lexical recognition and aid in saccade target selection. The existence of spaces between words does not affect reading metrics, including average fixation duration, average saccade amplitude, number of fixations, sentence reading time, number of forward saccades, and number of regressions. Conversely, interword spaces positively impact lexical recognition metrics, such as first fixation duration, gaze duration, total fixation duration, number of first-pass fixations, total number of fixations, and refixation probability. Additionally, interword spaces enhance the metric of saccade target selection, specifically the average initial fixation position.
- (2)
Color alternation markings have no effect on Tibetan reading but facilitate lexical recognition and aid in saccade target selection. Color alternation markings between words do not influence reading metrics, including average fixation duration, average saccade amplitude, number of fixations, sentence reading time, number of forward saccades, and number of regressions. Conversely, color alternation markings positively impact lexical recognition metrics, such as first fixation duration, gaze duration, total fixation duration, number of first-pass fixations, total number of fixations, and refixation probability. Additionally, color alternation markings enhance the measure of saccade target selection, specifically the average initial fixation position.
- (3)
Words are more likely to be the basic information processing unit in Tibetan reading than morphemes, and words possess greater psychological reality. In other words, readers demonstrate superior performance in the areas of reading, lexical recognition, and saccade target selection when exposed to the word spacing condition and the word boundary color alternation marking condition, in contrast to their performance under the morpheme spacing condition and morpheme boundary color alternation marking condition.
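The lexical recognition measures named in the hypotheses can be illustrated with a minimal sketch. It uses the definitions that are conventional in eye-movement research (first fixation duration, gaze duration as the sum of first-pass fixations, total fixation duration as the sum of all fixations on the word); these definitions, the function name, and the fixation data are assumptions for illustration and are not taken from this study’s analysis pipeline. As a simplification, the first pass is taken to begin at the first fixation on the target word, regardless of whether later words were fixated beforehand.

```python
def word_measures(fixations, target):
    """Compute word-level measures for one target word in one trial.

    fixations: list of (word_index, duration_ms) tuples in temporal order.
    Returns None if the target word was never fixated (skipped).
    """
    on_target = [i for i, (w, _) in enumerate(fixations) if w == target]
    if not on_target:
        return None

    # First pass: consecutive fixations on the target, starting at the
    # first fixation that lands on it and ending when the eyes leave.
    first_pass = []
    for w, duration in fixations[on_target[0]:]:
        if w != target:
            break
        first_pass.append(duration)

    return {
        "first_fixation_duration": first_pass[0],
        "gaze_duration": sum(first_pass),
        "total_fixation_duration": sum(d for w, d in fixations if w == target),
        "first_pass_fixations": len(first_pass),
        "total_fixations": len(on_target),
        "refixated": len(first_pass) > 1,
    }


# Hypothetical trial: word 1 is fixated twice, left, then revisited once.
fixations = [(0, 210), (1, 180), (1, 150), (2, 230), (1, 200)]
measures = word_measures(fixations, target=1)
print(measures)
```

Under these assumptions, refixation probability would be the proportion of trials in which `refixated` is true, and the average initial fixation position would require the landing position of the first fixation within the word, which this sketch does not model.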
4. General Discussion
This study investigated whether visual word segmentation cues facilitate Tibetan reading, lexical recognition, and saccade target selection and whether the basic processing unit in Tibetan reading is the morpheme or the word. The results demonstrated that both interword spaces and color alternation markings facilitated reading and lexical recognition and that word spacing aided saccade target selection. Words were more likely than morphemes to be the basic processing unit and had greater psychological reality.
4.1. The Effect of Interword Spaces in Tibetan Reading
Interword spaces facilitated Tibetan reading. This result differed from both our experimental hypothesis and the results of previous studies, which found that interword spaces reduced reading speed in Thai and did not affect reading in Japanese and Chinese [16,17,18,19].
We propose two reasons for our finding. First, according to the hypothesis proposed by Bai et al. (2008), there is a trade-off between the interfering effect caused by readers’ familiarity with the usual text presentation and the promoting effect of visual word segmentation cues [18]. We speculate that Tibetan university students may not be highly familiar with the presentation of Tibetan text; therefore, inserting interword spaces is unlikely to cause much interference arising from an unfamiliar presentation format. One important reason for this low familiarity may be that Tibetan university students experience language attrition. Language attrition is the reverse process of language acquisition, referring to the gradual decline in users’ ability to use a language over time as a result of reduced or discontinued use of that language [62,63]. Minority students living in China not only have to master their native languages but also learn Mandarin and foreign languages, making language attrition a very common phenomenon [62]. In Tibet, Mandarin is used much more frequently than Tibetan, even in areas where Tibetan people live.
The objective reasons for native language attrition include the characteristics of the linguistic environment [62]. The participants in this study were Tibetan university students whose first language was Tibetan and whose second language was Chinese, making them Tibetan–Chinese bilinguals. If the native and second languages are from the same language family, the probability of native language attrition is greater. Because Tibetan and Chinese belong to the same language family (the Sino-Tibetan language family), researchers believe that the two languages are more likely to be confused [62], leading to a high probability of language attrition among Tibetan–Chinese bilinguals. In addition, Tibetan university students are often immersed in a Chinese linguistic environment in their daily studies and life. Apart from students majoring in Tibetan, students in other majors rarely read books in the Tibetan language. Moreover, the Tibetan language is divided into three major dialects: Weizang, Kang, and Amdo. Although all three dialects use the same script, their pronunciations differ vastly [64,65]. Therefore, many Tibetan university students from different dialectal backgrounds have to use Chinese for daily communication and reading.
These reasons may have led to a certain degree of language attrition among the participants, resulting in their insufficient experience in reading Tibetan and lower familiarity with the presentation of Tibetan texts. Consequently, the promoting effect of interword spaces on Tibetan reading outweighed the interfering effect caused by familiarity with the text presentation, leading to interword spaces facilitating Tibetan reading.
Second, the promoting effect of interword spaces on Tibetan reading may be due to the characteristics of the Tibetan text. Although Tibetan, Chinese, Japanese, and Thai texts do not have spaces as visual cues for word segmentation, Tibetan text is unique in that it contains an explicit visual segmentation cue, the tsheg, which the other languages lack. Existing research indicates that tshegs ensure the normal process of Tibetan reading, and that replacing tshegs with another visual segmentation cue, such as a space, can facilitate Tibetan reading. This suggests that Tibetan university students rely on explicit visual segmentation cues (spaces and tshegs) to segment Tibetan texts [43]. Thus, Tibetan university students may be more accustomed to using explicit visual segmentation cues to segment text than Chinese, Japanese, and Thai students, which would explain why the artificial addition of interword spaces facilitated Tibetan reading.
Regarding the local analysis, indicators on the temporal dimension revealed that interword spaces facilitated lexical recognition in Tibetan reading. Specifically, gaze duration and total fixation time were shorter, and the number of first-pass fixations and total number of fixations were lower, under the word spacing condition than under the normal sentence condition. This result is consistent with those of studies on Chinese and Thai reading [16,18,19] and supported the hypothesis that interword spaces facilitate lexical recognition.
In addition, the refixation probability was smaller, and the initial fixation position on words was closer to the word center under the word spacing condition than under the normal sentence condition. This indicated that the participants used word spacing for saccade target selection. Word spacing helped readers locate saccades in an optimal viewing position on a word, thereby facilitating lexical recognition. This result is consistent with the findings of Zang et al. (2013) and Ma et al. (2019) for Chinese reading [19,20]. This result also supported our hypothesis.
This result can be explained using the E-Z Reader model. The E-Z Reader model posits that readers can use low-spatial-frequency information during the pre-attentive visual processing stage (V stage) to guide saccade target selection [66,67], and that interword spaces provide low-spatial-frequency information that indicates word boundaries [66]. Interword spaces bring the letters that comprise a word spatially closer to one another than to the letters of neighboring words, allowing readers to group spatially close letters into words through a parafoveal preview in the Tibetan text. This provides word boundary information, which helps readers locate their saccades at the optimal viewing position. The process by which readers group letters through a parafoveal preview may be based on the principle of proximity of the Gestalt laws of organization [68].
4.2. The Effect of Color Alternation Markings in Tibetan Reading
The interword spaces altered the spatial distribution of Tibetan sentences, elongated the physical length of the sentences, and reduced the processing efficiency of readers’ parafoveal vision. The insertion of spaces caused words near the fixation word to move away from the fixation, which may have resulted in words near the fixation falling into a visual area with lower visual acuity, thereby reducing the benefits of the parafoveal preview [21,22,23]. To eliminate the influence of this irrelevant factor on the results, Experiment 2 adopted the color alternation marking method to further investigate the effect of visual word segmentation cues in Tibetan reading.
The average fixation duration and sentence reading time were shorter, and fewer fixations and forward saccades were present under the word boundary color alternation marking condition than the normal sentence condition. This indicated that word boundary color alternation marking facilitated Tibetan reading. Furthermore, visual word segmentation cues promoted the reading of Tibetan sentences when controlling for their spatial distribution and physical length without changing the processing efficiency of the reader’s parafovea.
This result differs from our experimental hypothesis and diverges from the results of past studies. Some previous studies have found that word boundary color alternation marking did not affect Chinese reading [23], whereas others have found that it impeded Chinese reading [19].
The reason for this may relate to the two speculations discussed above. First, Tibetan university students may have experienced language attrition, and their familiarity with the presentation of Tibetan texts may be low. This would lead to the promoting effect of word boundary color alternation marking on Tibetan reading being greater than the interfering effect caused by familiarity with text presentation, resulting in word boundary color alternation marking facilitating Tibetan reading. Second, Tibetan university students may be more accustomed than Chinese readers to using low-level visual segmentation cues for word segmentation, so that such cues aid Tibetan reading.
Word boundary color alternation marking facilitated lexical recognition in Tibetan reading. This result is consistent with the findings of Pan et al. (2021) for Chinese reading and those of Zhou et al. (2018) among native speakers of Chinese [23,24]. This supported our hypothesis that color alternation markings facilitate local lexical recognition. However, the results differed from those of Ma et al. (2019), who found that color alternation markings did not facilitate Chinese lexical recognition [19]. Nevertheless, as the global analysis of Experiment 2 showed that word boundary color alternation marking facilitated Tibetan reading, the finding in the local analysis that color alternation markings promoted lexical recognition is reasonable, indicating that color alternation markings facilitated Tibetan reading by promoting local lexical recognition.
Furthermore, color alternation markings reduced readers’ refixation probability. This was consistent with the results of Zhou et al. (2018) for Chinese reading [23]. However, word boundary color alternation marking did not affect readers’ saccade target selection. This was consistent with the results of Ma et al. (2019) for Chinese reading but inconsistent with those of Zhou et al. (2018) and with our hypothesis [19,23]. This could be because Tibetan readers tended to use the Gestalt principle of proximity, rather than the principle of similarity (whereby letters of the same color are perceived as a whole), when grouping adjacent letters into words through a parafoveal preview [68,69].
4.3. The Basic Information Processing Units in Tibetan Reading
Tibetan texts contain tshegs. Wang et al. (2023) found that removing tshegs decreased the efficiency of Tibetan reading. This indicated that tshegs are an effective visual segmentation cue that ensures the normal progression of Tibetan reading [43]. Thus, Tibetan university students may be accustomed to relying on tshegs to read morpheme by morpheme, and morphemes may be the basic information processing unit in Tibetan reading.
Moreover, the attributes of words in Tibetan reading affected readers’ fixation duration. For instance, Tibetan readers spent significantly less time fixating on high- than low-frequency words [44,45]. Thus, words may be the basic information processing unit in Tibetan reading.
In the present study, relative to the normal sentence condition, both the word spacing condition and the word boundary color alternation marking condition produced fewer fixations, fewer forward saccades, and shorter sentence reading times. By contrast, both the morpheme spacing condition and the morpheme color alternation marking condition produced more fixations, more forward saccades, more regressions, and longer sentence reading times than the normal sentence condition. In direct comparisons, the word spacing condition produced fewer fixations, forward saccades, and regressions and shorter sentence reading times than the morpheme spacing condition, and the word boundary color alternation marking condition showed the same advantage over the morpheme color alternation marking condition.
Thus, words were more likely than morphemes to be the basic unit of information processing in Tibetan reading. This result was in line with those of studies that considered words to be the basic unit of information processing in Chinese reading [31,32,33,34,35]. However, the basic unit of information processing in Tibetan reading may also be influenced by various factors, which will be discussed in detail in the Limitations and Prospects section.
This study sheds light on the word segmentation mechanism in Tibetan reading, reveals the nature and rules of Tibetan reading activities from the perspectives of reading psychology and cognitive psychology, and provides basic data for the improvement of eye movement control models in alphabetic writing systems. Moreover, the findings provide practical guidance for exploring ways to improve the efficiency of Tibetan reading and a scientific basis for word segmentation in Tibetan information processing and modern Tibetan intelligent systems.
4.4. Limitations and Prospects
Yan et al. (2013) investigated the impact of interword spaces in Chinese texts on the reading of second-grade elementary students and found no significant difference in sentence reading time when second-grade students read normal sentences and those under character spacing and word spacing conditions [70]. This result suggests that both characters and words may be the basic units of information processing in Chinese reading. Therefore, future research should investigate the basic units of information processing in Tibetan reading among elementary school students.
This study examined only four conditions, namely normal sentences, word spacing, morpheme spacing, and non-word spacing. Words were more likely than morphemes to be the basic unit of information processing in Tibetan reading. However, some studies suggest that chunks, psycholinguistic words, and prosodic words may be the basic units of information processing in Chinese reading [71,72,73,74,75]. Whether these conclusions also apply to Tibetan has not yet been studied. Therefore, future research could add chunks, psycholinguistic words, and prosodic words as experimental conditions alongside the original conditions to further investigate the basic units of information processing in Tibetan reading.
This study found that adding spaces as low-level visual word segmentation cues to Tibetan texts helped readers segment words, thereby promoting reading. However, although Tibetan text does not have spaces as inherent low-level visual word segmentation cues, reading proceeds normally. Therefore, we speculate that Tibetan readers may use high-level linguistic word segmentation cues to segment words, thereby ensuring normal reading progress. Studies have found that character positional frequency information and word formation probabilities in Chinese reading are effective linguistic word segmentation cues [76,77]. Furthermore, character positional frequency information is an effective linguistic word segmentation cue in Thai reading [78]. Therefore, future research should examine the effect of high-level linguistic word segmentation cues in Tibetan reading based on comprehensive corpus data, thereby revealing the word segmentation mechanism in Tibetan reading.