#### *2.2. Shannon's Method*

If a subject guesses a character given the preceding string of length *n* − 1, the answer will be either correct or incorrect. In Shannon's setting and ours, the *prediction* of *X<sub>n</sub>* by a subject is accomplished by *making multiple guesses*, one character at a time, until he/she reaches the correct answer. In other words, a prediction for character *X<sub>n</sub>* in this setting consists of a series of guesses.

The number of guesses required to reach the correct answer reflects the predictability of that character and should relate to the probability of the character *X<sub>n</sub>* appearing after *X<sub>1</sub><sup>n−1</sup>*. Let *q<sup>n</sup><sub>i</sub>* denote the probability that a subject requires *i* guesses in a prediction to find the correct letter following a block of length *n* − 1.
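
For concreteness, the following minimal sketch in Python (the function name and data are hypothetical, not part of the original experiments) shows how *q<sup>n</sup><sub>i</sub>* could be estimated empirically from recorded guess counts at a fixed context length *n*:

```python
from collections import Counter

def estimate_q(guess_counts, K=27):
    """Estimate q_i^n from recorded guess counts at a fixed context length n.

    guess_counts[j] is the number of guesses the subject needed in the
    j-th prediction of the character following a block of length n - 1.
    Returns q, where q[i - 1] estimates the probability that exactly
    i guesses were required (i = 1, ..., K).
    """
    counts = Counter(guess_counts)
    total = len(guess_counts)
    return [counts.get(i, 0) / total for i in range(1, K + 1)]

# Hypothetical guess counts from 10 predictions at one context length
q = estimate_q([1, 1, 2, 1, 3, 1, 1, 2, 5, 1])
print(q[:5])  # [0.6, 0.2, 0.1, 0.0, 0.1]
```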

Shannon deduced the following inequality [1]:

$$\sum_{i=1}^{K} i \left( q_i^n - q_{i+1}^n \right) \log i \le F_n \le -\sum_{i=1}^{K} q_i^n \log q_i^n. \tag{5}$$

Here, *K* is the number of characters in the set; in this work, *K* = 27, since the English alphabet consists of 26 letters plus the space symbol, and *q<sup>n</sup><sub>K+1</sub>* = 0, since at most *K* guesses are ever needed. This setting corresponds to those used in previous works [9,11] that applied the cognitive approach to acquire the entropy rate, so that our results are comparable with the ones reported there. Note that the leftmost term is a lower bound of *F<sub>n</sub>*, which is itself an upper bound of *h*; it is not a direct lower bound of *h*. For each context length *n*, the probability *q<sup>n</sup><sub>i</sub>* can be estimated from a set of samples.
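
Given such an estimate of *q<sup>n</sup><sub>i</sub>*, both sides of inequality (5) are straightforward to evaluate. The sketch below is an illustration rather than the original authors' code; logarithms are taken base 2 so that the bounds are in bits per character (bpc), and *q<sup>n</sup><sub>K+1</sub>* is taken as 0:

```python
import math

def shannon_bounds(q):
    """Lower and upper bounds on F_n from inequality (5), in bpc.

    q[i - 1] is the estimated probability q_i^n that exactly i guesses
    were required; q_{K+1} is taken to be 0.
    """
    K = len(q)
    q_ext = list(q) + [0.0]  # append q_{K+1} = 0
    lower = sum(i * (q_ext[i - 1] - q_ext[i]) * math.log2(i)
                for i in range(1, K + 1))
    upper = -sum(p * math.log2(p) for p in q if p > 0)
    return lower, upper

# e.g., using the estimate from the previous sketch, padded to K = 27 entries
q = [0.6, 0.2, 0.1, 0.0, 0.1] + [0.0] * 22
lo, hi = shannon_bounds(q)
print(f"{lo:.3f} <= F_n <= {hi:.3f}")  # 1.036 <= F_n <= 1.571
```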

In Shannon's original experiment, 100 phrases of length 100 were taken from *Jefferson the Virginian*, a biography of former US President Thomas Jefferson authored by Dumas Malone. In each experimental session, the subject (a single person, his spouse, according to [11]) was asked to predict the next character given a block of length *n* − 1. She continued in this manner for *n* = 1, 2, ... , 15, and 100 for each phrase; consequently, Shannon acquired 16 observations for each phrase. He used 100 different phrases; therefore, he collected 16 × 100 = 1600 observations in total. He then calculated *q<sup>n</sup><sub>i</sub>* for *n* = 1, 2, ... , 15, and 100, each based on 100 observations, and the lower and upper bounds of *h* were computed from the leftmost and rightmost terms of inequality (5), respectively. Shannon observed a decrease in the bounds with respect to *n* and obtained an upper bound of *h* = 1.3 bpc at *n* = 100.

Moradi et al. [11] conducted Shannon's experiment under two different settings. In the first setting, they used 100 phrases of length *n* = 64 from *Scruples II*, a romance novel authored by Judith Krantz; a single subject participated, and the upper bounds were calculated from *n* = 1 to *n* = 64 based on 100 observations each. They reported that the entropy rate reached *h* ≈ 1.6 bpc at *n* = 32 and that larger values of *n* did not contribute to decreasing the upper bound. In the second setting, eight participants were given phrases extracted from four different books, and the values of the upper bound at *n* = 32 were reported, ranging between *h* = 1.62 and *h* = 3.00 bpc.

Jamison and Jamison [9] used 50 and 40 phrases, respectively, for each of two subjects, taken from an unspecified source. They conducted the experiment for *n* = 4, 8, 12, and 100 and obtained *h* = 1.63 and *h* = 1.67 bpc for the two subjects at *n* = 100, based on the 50 and 40 phrase samples, respectively.

Note how these reported values deviate greatly from Shannon's *h* = 1.3 bpc. In all of these experiments, the number of subjects was small and thus the number of observations was limited, making the statistical validity of the results questionable.

#### *2.3. Cover and King's Method*

While Shannon's method only considers the likelihood of the correct answer for each *X<sub>n</sub>*, Cover and King wanted to collect the full distribution of each *X<sub>n</sub>*. Hence, instead of counting the number of guesses required, a subject was asked to assign a probability distribution to the *n*th character given the preceding string of length *n* − 1. Precisely, in Cover and King [10], a *prediction by a subject is the character distribution* of *X<sub>n</sub>*.

They designed this experiment within a *gambling* framework, following their theory of information in gambling [13,14]. A subject assigned odds to every character that could occur as *X<sub>n</sub>*, i.e., a probability distribution.
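
Cover and King's exact estimator is not reproduced here; the following simplified sketch (with hypothetical function and data, assuming proportional betting at fair 27-for-1 odds) shows how an entropy estimate follows from the growth of a gambler's capital, reducing to the average log-loss of the subject's assigned probabilities:

```python
import math

def gambling_entropy_estimate(assigned_probs, K=27):
    """Entropy-rate estimate from a sequence of gambles (simplified sketch).

    assigned_probs[i] is the probability the subject assigned to the
    character that actually occurred at position i. With proportional
    betting at fair K-for-1 odds, the capital after n bets is
    S_n = prod(K * p_i), so the estimate
    log2(K) - (1/n) * log2(S_n) equals the average log-loss.
    """
    n = len(assigned_probs)
    log_capital = sum(math.log2(K * p) for p in assigned_probs)
    return math.log2(K) - log_capital / n

# Hypothetical probabilities a subject assigned to the realized characters
print(gambling_entropy_estimate([0.5, 0.9, 0.2, 0.7]))  # ≈ 1.0 bpc
```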

Cover and King [10] conducted two separate experiments. In the first, phrases were extracted from *Jefferson the Virginian* for 12 subjects, with the maximum length of a phrase set to *n* = 75. The estimated upper bound of *h* for the 12 subjects ranged between *h* = 1.29 bpc and *h* = 1.90 bpc. In the second experiment, phrases of length *n* = 220 were taken from *Contact: The First Four Minutes* (a science book on psychology authored by Leonard M. Zunin), and two subjects participated. The estimated values of *h* produced by the two subjects were *h* = 1.26 bpc and *h* = 1.30 bpc.

We conducted Cover and King's experiment using a similar framework, as explained in detail in the following section. Compared with the experiment proposed by Shannon, however, their experiment demanded too much from each subject, since he/she had to set the odds for all 27 characters at every step. The majority of the subjects abandoned the experiment before completing the assignment, and it was difficult to collect a large number of reliable observations. Therefore, we could not utilize this method effectively and focused on Shannon's framework instead.

#### *2.4. Summary of the Scales Used in Previous Studies*

**Table 1.** Comparison of the scales of the cognitive experiments undertaken in previous works.

Table 1 summarizes the experimental settings of the previous reports [1,9–11]. The total number of observations refers to the sum of all predictions made by the subjects across different phrases and context lengths. For example, in Shannon's case, the total number of observations was 1600, as one subject was asked to make predictions for 16 different context lengths (i.e., *n* = 1, 2, ... , 15, and 100) for each of 100 different phrases. The third and fourth columns of the table list the numbers of distinct subjects and phrases used in each study, respectively. Note that a phrase could be tested by multiple subjects, or a subject could test multiple phrases, depending on the experimental setting.


The fifth and sixth columns present the average maximum value of *n* obtained in one session and the mean number of observations per *n*, respectively, where *n* represents the offset of a character from the beginning of a phrase. Both of these values were fixed in the previous works.

#### **3. Cognitive Experiment Using Mechanical Turk**

#### *3.1. The Mechanical Turk Framework*

Our experimental framework was implemented on Amazon Mechanical Turk (AMT), a crowdsourcing marketplace offered by *Amazon*. On AMT, requesters post tasks called HITs (human intelligence tasks), which *workers* then complete. AMT has previously been used as a research tool for conducting large-scale investigations that require human judgment, ranging from annotating image data [15,16] to collecting text and speech data [17,18], behavioral research [19], judging music and documents [20,21], and identifying complex patterns in brain activity [22].

With AMT, the experimenter is able to collect a large number of observations on a wide range of topics. Compared with standard in-laboratory studies, however, such an experiment is open to anonymous subjects, and thus, experimental control is limited. For example, in our case, a subject could use any external information to predict the next character; in particular, we were unable to prohibit subjects from searching for the given *n* − 1 characters to obtain the answer for the next character. Furthermore, the English fluency of the subjects was unknown. Thus, the results should be examined from this perspective as well; see Section 5.2.

An experimental user interface based on Shannon's original proposal was developed. The most important design requirement was an adequate task load, since a subject could easily lose concentration and abandon a prediction during the experiment. We therefore designed the user interface to be as simple as possible, so as to lessen the psychological demand on the subjects.
