1. Introduction
Unanimously, in a large number of papers—some of which are recalled here [1,2,3,4,5,6,7,8] from the vast literature on the topic—scholars of English Literature state that J.R.R. Tolkien influenced C.S. Lewis's writings. The purpose of the present paper is not to review this large wealth of literature according to the typical approach of literary scholarship—which is not our specialty—but to investigate the issue mathematically and statistically—a study that has never been conducted before—by using methods we recently devised to study the impact of the surface deep language variables [9,10] and linguistic channels [11] in literary texts. Since scholars mention the influence of George MacDonald on both authors, we consider some novels written by this earlier author. To set all these novels in the framework of English Literature, we also consider some novels written by other earlier authors, such as Charles Dickens.
After this introduction, in Section 2 we introduce the literary texts (novels) considered. In Section 3, we report the series of words, sentences and interpunctions versus chapters for some novels, and define an index useful to synthetically describe the regularity due to what we think is a conscious design by authors. In Section 4, we start exploring the four deep language variables; to avoid misunderstanding, these variables, and the linguistic channels derived from them, refer to the “surface” structure of texts, not to the “deep” structure mentioned in cognitive theory. In Section 5, we report results concerning the extended short-term memory and a universal readability index; both topics address human short-term memory buffers. In Section 6, we represent literary texts geometrically in the Cartesian plane by defining linear combinations of deep language variables and calculate the probability that a text can be confused with another. In Section 7, we show the linear relationships existing between linguistic variables in the novels considered. In Section 8, we report the theory of linguistic channels. In Section 9, we apply it to the novels presently studied. Finally, in Section 10, we summarize the main findings and conclude. Several Appendices report numerical data.
2. Database of Literary Texts (Novels)
Let us first introduce the database of literary texts used in the present paper.
Table 1 lists some basic statistics of the novels by Tolkien, Lewis and MacDonald. To set these texts in the framework of earlier English Literature, we consider novels by Charles Dickens (Table 2) and other authors (Table 3).
We have used the digital text of each novel (a WinWord file) and counted, for each chapter, the number of characters, words, sentences and interpunctions (punctuation marks). Before doing so, we deleted the titles, footnotes and other extraneous material present in the digital texts, a burdensome task. The count itself is very simple, although time-consuming. WinWord directly provides the number of characters and words. The number of sentences was calculated by using WinWord to replace every full stop with a full stop: of course, this action does not change the text, but WinWord reports the number of substitutions and therefore the number of full stops. The same procedure was repeated for question marks and exclamation marks. The sum of the three totals gives the total number of sentences in the text analyzed. The same procedure gives the total number of commas, colons and semicolons. The sum of these latter values with the total number of sentences gives the total number of interpunctions.
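For readers who wish to reproduce these counts without WinWord, the following minimal Python sketch applies the same counting rules to a plain-text chapter. The script is ours and is offered only as an illustration; it is not the procedure used to build the database.

```python
# Minimal sketch (not the WinWord procedure used in the paper): it applies
# the same counting rules to a plain-text chapter already stripped of
# titles, footnotes and other extraneous material.

def chapter_counts(text: str) -> dict:
    words = text.split()                       # words = blank-separated tokens
    n_characters = sum(len(w) for w in words)  # characters contained in words
    # Sentence-ending marks, counted separately as with WinWord replacements.
    n_sentences = text.count(".") + text.count("?") + text.count("!")
    # Remaining interpunctions: commas, colons and semicolons.
    n_other = text.count(",") + text.count(":") + text.count(";")
    return {"characters": n_characters, "words": len(words),
            "sentences": n_sentences, "interpunctions": n_sentences + n_other}

print(chapter_counts("Call me Ishmael. Some years ago, I went to sea; why not?"))
# {'characters': 45, 'words': 12, 'sentences': 2, 'interpunctions': 4}
```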
Some homogeneity can be noted in novels of the same author. The stories in The Space Trilogy and The Chronicles of Narnia, by Lewis, are told with about the same number of chapters, words and sentences, as is also the case for a couple of MacDonald's novels, such as At the Back of the North Wind and Lilith: A Romance. Some homogeneity can be found in David Copperfield, Bleak House and Our Mutual Friend (by Dickens) and in The Adventures of Oliver Twist and A Tale of Two Cities. These numerical values, we think, are not due to chance but consciously managed by the authors, a topic we pursue further in the next section.
3. Conscious Design of Texts: Words, Sentences and Interpunctions versus Chapters
First, we study the linguistic variables which we think the authors deliberately designed. Specifically, we show the series of words, sentences and interpunctions versus chapters.
Let us consider a literary work (a novel) and its subdivision into disjointed blocks of text long enough to give reliable average values. Let $n_S$ be the number of sentences contained in a text block, $n_W$ the number of words contained in the $n_S$ sentences, $n_C$ the number of characters contained in the $n_W$ words and $n_I$ the number of punctuation marks (interpunctions) contained in the $n_S$ sentences.
Figure 1 shows the series of $n_W$, $n_S$ and $n_I$ versus the normalized chapter number for The Lord of the Rings, The Chronicles of Narnia and The Space Trilogy. For example, chapter 10 of The Chronicles of Narnia is mapped to 10 divided by the total number of chapters on the 0–1 scale of Figure 1. This normalization allows the synoptic showing of novels with a different number of chapters.
In The Chronicles of Narnia (in the following, Narnia, for brevity), we can notice practically constant series, compared to The Lord of the Rings (Lord) and The Space Trilogy (Trilogy).
Let us define a synthetic index to describe the series drawn in Figure 1, namely the coefficient of variation $c_v$, given by the standard deviation $\sigma$ divided by the mean value $\mu$:

$$c_v = \frac{\sigma}{\mu} \quad (1)$$

Table 4 and Table 5 report $c_v$ for $n_W$, $n_S$ and $n_I$. Since $n_S$ and $n_I$ are very well correlated with $n_W$, the three coefficients of dispersion are about the same.
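The following short Python sketch, an illustration we add here with hypothetical word counts, shows how the coefficient of variation separates a nearly uniform chapter design from a more varied one:

```python
import statistics

def coefficient_of_variation(series: list[float]) -> float:
    """c_v = sigma / mu of a per-chapter series; each chapter counts
    equally, to agree with the series drawn in Figure 1."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)  # population standard deviation
    return sigma / mu

# Hypothetical per-chapter word counts: a nearly uniform design (low c_v)
# versus a more varied one (high c_v).
uniform_design = [5100, 4950, 5050, 5000, 4900]
varied_design = [2500, 8000, 4200, 9800, 3100]
print(round(coefficient_of_variation(uniform_design), 3))  # ~0.014
print(round(coefficient_of_variation(varied_design), 3))   # ~0.52
```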
In Narnia, $c_v$ is distinctly smaller than in Lord and in Trilogy. Let us also notice the minimum value in The Screwtape Letters (Screwtape).
The overall mean value and standard deviation (words, sentences and interpunctions mixed together) show that Screwtape lies well below the mean, as Silmarillion lies well above it on the other side, and that Narnia also departs noticeably from the mean. In contrast, Trilogy, Lord and The Hobbit (Hobbit) lie close to it.
From these results, it seems that Lewis designed the chapters of Narnia and Screwtape with an almost uniform distribution of words, sentences and interpunctions, very likely because of the intended audience of Narnia (i.e., kids) and the “letters” fiction tool used in Screwtape. In Trilogy, the design seems very different ($c_v$ well within the overall spread), likely due to the development of the science fiction story narrated.
Tolkien acted differently from Lewis: he seems to have designed chapters more randomly, although still within the overall spread, as Hobbit and Lord show. An exception is The Silmarillion, published posthumously, which is a text far from being a “novel”.
Finally, notice that the novels by MacDonald show more homogeneous values, very similar to Hobbit and Trilogy and to the other novels listed in Table 5.
In conclusion, the analysis of series of words, sentences and interpunctions per chapter does not indicate likely connections between Tolkien, Lewis and MacDonald. Each author structured their use of words, sentences and punctuation according to distinct plans, which varied not only between authors but also between different novels by the same author.
There are, however, linguistic variables that—as we have reported for modern and ancient literary texts—are not consciously designed/managed by authors; therefore, these variables are the best candidates to reveal hidden mathematical/statistical connections between texts. In the next section, we start dealing with these variables, with the specific purpose of comparing Tolkien and Lewis, although this comparison is set in the more general framework of the authors mentioned in Section 2.
4. Surface Deep Language Variables
We start exploring the four stochastic variables we called deep language variables, following our general statistical theory on alphabetical languages [9,10,11]. To avoid possible misunderstandings, these variables, and the linguistic channels derived from them, refer to the “surface” structure of texts, not to the “deep” structure mentioned in cognitive theory.
Contrary to the variables studied in Section 3, the deep language variables are likely due to unconscious design. As shown in [9,10,11], they reveal connections between texts far beyond writers' awareness; therefore, the geometrical representation of texts [10] and the fine tuning of linguistic channels [11] are tools better suited to reveal connections. They can also likely indicate the influence of one author on another.
We defined the number of characters per chapter $n_C$ and the number of interpunctions per chapter $n_I$ (Section 3); the four deep language variables are [9]: the number of characters per word $C_P$:

$$C_P = \frac{n_C}{n_W} \quad (2)$$

the number of words per sentence $P_F$:

$$P_F = \frac{n_W}{n_S} \quad (3)$$

the number of words per interpunction, referred to as the word interval, $I_P$:

$$I_P = \frac{n_W}{n_I} \quad (4)$$

and the number of word intervals per sentence $M_F$:

$$M_F = \frac{n_I}{n_S} \quad (5)$$

Equation (5) can be written also as $M_F = P_F/I_P$.
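As an illustration, the following Python sketch computes the four deep language variables of a single chapter from its counts, using Equations (2)–(5); the counts in the example are hypothetical:

```python
# Deep language variables of one text block (chapter), following
# Equations (2)-(5); counts as computed in Section 2.

def deep_language_variables(n_C: int, n_W: int, n_S: int, n_I: int) -> dict:
    C_P = n_C / n_W   # characters per word, Eq. (2)
    P_F = n_W / n_S   # words per sentence, Eq. (3)
    I_P = n_W / n_I   # word interval (words per interpunction), Eq. (4)
    M_F = n_I / n_S   # word intervals per sentence, Eq. (5)
    assert abs(M_F - P_F / I_P) < 1e-12  # Eq. (5) rewritten as M_F = P_F/I_P
    return {"C_P": C_P, "P_F": P_F, "I_P": I_P, "M_F": M_F}

# Hypothetical chapter: 9000 characters, 2000 words, 100 sentences,
# 400 interpunctions.
print(deep_language_variables(9000, 2000, 100, 400))
# {'C_P': 4.5, 'P_F': 20.0, 'I_P': 5.0, 'M_F': 4.0}
```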
Table 6, Table 7, Table 8 and Table 9 report the mean and standard deviation of these variables. Notice that these values have been calculated by weighing each chapter with its number of words, to avoid short chapters weighing as much as long ones. For example, chapter 1 of Lord counts not as 1 divided by the number of chapters, but according to its share of the total number of words. Notice, also, that the coefficient of dispersion used in Section 3 was calculated by weighing each chapter equally, not with its number of words, to visually agree with the series drawn in Figure 1.
Specifically, let $N$ be the number of samples (i.e., chapters); then, the mean value of, e.g., the words per sentence is given by

$$\langle P_F \rangle = \sum_{k=1}^{N} w_k\, P_{F,k}, \qquad w_k = \frac{n_{W,k}}{\sum_{j=1}^{N} n_{W,j}} \quad (6)$$

Therefore, notice, for not being misled, that $\langle P_F \rangle \neq n_W/n_S$. In other words, $\langle P_F \rangle$ is not given by the total number of words divided by the total number of sentences, nor by assigning the weight $1/N$ to every chapter. The three values coincide only if all the text blocks contain the same number of words and the same number of sentences, which did not occur. The same observations apply to all the other variables.
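The following Python sketch illustrates Equation (6) and the caveat above on three hypothetical chapters: the word-weighted mean of $P_F$ differs both from the pooled ratio (total words divided by total sentences) and from the unweighted mean of the chapter values.

```python
# Word-weighted mean of a deep language variable, Equation (6): chapter k
# has weight w_k = n_W,k / sum_j n_W,j. Hypothetical data for three chapters.
chapters = [
    {"n_W": 5000, "n_S": 250},   # P_F = 20
    {"n_W": 1000, "n_S": 100},   # P_F = 10
    {"n_W": 4000, "n_S": 160},   # P_F = 25
]

total_words = sum(ch["n_W"] for ch in chapters)
P_F_weighted = sum((ch["n_W"] / total_words) * (ch["n_W"] / ch["n_S"])
                   for ch in chapters)

# Two tempting but different estimates, which Equation (6) does NOT use:
P_F_pooled = total_words / sum(ch["n_S"] for ch in chapters)
P_F_unweighted = sum(ch["n_W"] / ch["n_S"] for ch in chapters) / len(chapters)

print(P_F_weighted, P_F_pooled, P_F_unweighted)  # 21.0, ~19.6, ~18.3
```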
The following characteristics can be observed from Table 6, Table 7, Table 8 and Table 9. Lord and Narnia share practically the same $\langle P_F \rangle$. Silmarillion is distinctly different from Lord and Hobbit, in agreement with its different coefficient of dispersion. Screwtape is distinctly different from Narnia and Trilogy. There is a great homogeneity in Dickens's novels and a large homogeneity of $\langle C_P \rangle$ in all novels.
In the next sections, we use $C_P$, $P_F$, $I_P$ and $M_F$ to calculate interesting indices connected to the short-term memory of readers.
5. Extended Short-Term Memory of Writers/Readers and Universal Readability Index
In this section, we deal with the linguistic variables that, very likely, are not consciously managed by writers who, of course, act also as readers of their own text. We first report findings concerning the extended short-term memory and then those concerning a universal readability index. Both topics address human short-term memory buffers.
5.1. Extended Short-Term Memory and Multiplicity Factor
In [12,13], we have conjectured that the human short-term memory is sensitive to two independent variables, which apparently engage two short-term memory buffers in series, constituents of what we have called the extended short-term memory (E–STM). The first buffer is modeled according to the number of words between two consecutive interpunctions, i.e., the word interval $I_P$, which follows Miller's $7 \pm 2$ law [14]; the second buffer is modeled according to the number of word intervals contained in a sentence—i.e., the variable $M_F$—ranging approximately from 1 to 7.
In [13], we studied the patterns (which depend on the size of the two buffers) that determine the number of sentences that theoretically can be recorded in the E–STM of a given capacity. These patterns were then compared with the number of sentences actually found in novels of Italian and English literature. We have found that most authors write for readers with short memory buffers and, consequently, are forced to reuse sentence patterns to convey multiple meanings. This behavior is quantified by the multiplicity factor $M$, defined as the ratio between the number of sentences in a novel and the number of sentences theoretically allowed by the two buffers, a function of $I_P$ and $M_F$. We found that $M > 1$ is more likely than $M < 1$, and often $M \gg 1$. In the latter case, writers reuse many times the same pattern of number of words. Few novels show $M < 1$; in this case, writers do not use some or most of the theoretically available patterns. The values of $M$ found in the novels presently studied are reported in Table 10 and Table 11.
5.2. Universal Readability Index
In Reference [14], we have proposed a universal readability index given by

$$G_U = 89 + \frac{300}{P_F} - 10\,k\,C_P \quad (7)$$

In Equation (7), the constant $k$ scales $C_P$ to the average value found in Italian:

$$k = \frac{\langle C_P \rangle_{\text{Italian}}}{\langle C_P \rangle} \quad (8)$$

By using Equations (7) and (8), the average value of $C_P$ in any language is forced to be equal to that found in Italian. The rationale for this choice is that $C_P$ is a parameter typical of a language which, if not scaled, would bias $G_U$ without really quantifying the reading difficulty for readers, who in their language are used, on average, to reading shorter or longer words than in Italian. This scaling, therefore, avoids changing $G_U$ for the only reason that a language has, on average, words shorter (as English) or longer than Italian. In any case, $C_P$ affects Equation (7) much less than $P_F$.
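For illustration, the index can be computed as follows; this is a minimal sketch of Equations (7) and (8), and the Italian reference value of $\langle C_P \rangle$ is left as an explicit parameter because its numerical value is not restated here.

```python
# Sketch of the universal readability index, Equations (7) and (8).
# C_P_italian is the reference mean characters per word of Italian,
# left as a parameter here.

def universal_readability(C_P: float, P_F: float,
                          C_P_mean: float, C_P_italian: float) -> float:
    k = C_P_italian / C_P_mean                 # Eq. (8): scales C_P to Italian
    return 89.0 + 300.0 / P_F - 10.0 * k * C_P # Eq. (7)
```

Note that if $C_P$ equals the language mean $\langle C_P \rangle$, the scaled term $k\,C_P$ reduces to the Italian reference value, which is exactly the forcing described above.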
The values of $G_U$—calculated as the other linguistic variables, i.e., by weighing chapters (samples) according to their number of words—are reported in Table 10 and Table 11. The reader may be tempted to calculate Equation (7) by introducing the mean values reported in Table 6, Table 7, Table 8 and Table 9. This, of course, can be performed, but it should be noted that the values so obtained are always less than or equal to the means calculated from the samples, hence they are lower bounds: the term $300/P_F$ is convex in $P_F$ and, by Jensen's inequality, its average over chapters is not smaller than its value at the average $P_F$ (see Appendix A). For example, for Lord, instead of 64.9, we would obtain 61.9.
It is interesting to “decode” these mean values into the minimum number of school years necessary to make a novel “easy” to read, according to the Italian school system, which is assumed as the reference; see Figure 1 of [15]. The results are also listed in Table 10 and Table 11.
5.3. Discussion
Several intriguing observations can be drawn from the results presented in the preceding subsections.
- (a).
Silmarillion is quite diverse from Tolkien's other writings. Mathematically, this is due to its large $I_P$ and $M_F$. In practice, the number of theoretical sentences allowed by the E–STM to read this text is only a few times the number of sentence patterns actually used in the text. The reader needs a powerful E–STM and reading ability, since its readability index is low and the required school years are many (Table 10). This does not occur for Hobbit and Lord, in which Tolkien reuses patterns many times, especially in Lord.
- (b).
Lord and Narnia show very large values of the multiplicity factor and very similar $G_U$ and school years (Table 10 and Table 11). Sentence patterns are reused many times by Lewis in Narnia, but not in Screwtape, which is more difficult to read and requires more years of schooling. Moreover, Lord and Narnia have practically the same $\langle P_F \rangle$.
- (c).
In general, Narnia is closer to Lord than to Trilogy, although the numbers of words and sentences in Trilogy and Narnia are quite similar (Table 1). This difference between Trilogy and Narnia might depend on the different readers addressed, kids for Narnia and adults for Trilogy, with different reading ability, as $G_U$ indicates.
- (d).
The novels by MacDonald show values of these indices very similar to those of the other English novels.
- (e).
Notice the homogeneity in Dickens's novels, which require about the same number of school years and show very similar readability indices (Table 11).
In conclusion, among the novels by Tolkien and Lewis, Lord and Narnia are the two that address readers with the most similar E–STM buffers, reuse sentence patterns in similar ways, contain the same number of words per sentence, and require the same reading ability and school years. The mathematical connections between Lord and Narnia will be further pursued in the next section, where the four deep language variables are used to represent texts geometrically.
6. Geometrical Representation of Texts
The mean values of Table 6, Table 7, Table 8 and Table 9 can be used to assess how “close”, or mathematically similar, texts are in the Cartesian coordinate plane, by defining linear combinations of deep language variables. Texts are then modeled as vectors; the representation is discussed in detail in [9,10] and briefly recalled here. An extension of this geometrical representation of texts allows the calculation of the probability that a text may be confused with another one, an extension in two dimensions of the problem discussed in [16]. The values of the conditional probability between two texts (authors) can be considered an index indicating who influenced whom.
6.1. Vector Representation of Texts
Let us consider six vectors $\mathbf{R}_1, \mathbf{R}_2, \ldots, \mathbf{R}_6$, whose components are pairs of the deep language variables $C_P$, $P_F$, $I_P$ and $M_F$, and their resulting vector sum:

$$\mathbf{R} = \sum_{k=1}^{6} \mathbf{R}_k \quad (9)$$
The choice of which parameter represents the abscissa and which the ordinate component is not important: once the choice is made, the numerical results will depend on it, but the relative comparisons and general conclusions will not.
In the first quadrant of the Cartesian coordinate plane, two texts are likely mathematically connected—they show close ending points of vector (9)—if their relative Pythagorean distance is small. A small distance means that texts share a similar mathematical structure, according to the four deep language variables.
By considering the components $x$ and $y$ of the vector $\mathbf{R}$ of Equation (9), we obtain the scatterplot shown in Figure 2, where $X$ and $Y$ are normalized coordinates calculated by setting Lord at the origin $(0,0)$ and Silmarillion at $(1,1)$, according to the linear transformations:

$$X = \frac{x - x_{Lord}}{x_{Silmarillion} - x_{Lord}} \quad (10)$$

$$Y = \frac{y - y_{Lord}}{y_{Silmarillion} - y_{Lord}} \quad (11)$$
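A minimal Python sketch of the normalization of Equations (10) and (11) follows; the coordinates used in the example are hypothetical, for illustration only.

```python
# Normalized coordinates of Equations (10) and (11): a text's vector-sum
# ending point (x, y) is mapped so that Lord is at (0,0) and Silmarillion
# at (1,1). All coordinates below are hypothetical.

def normalize(p, lord, silmarillion):
    X = (p[0] - lord[0]) / (silmarillion[0] - lord[0])   # Eq. (10)
    Y = (p[1] - lord[1]) / (silmarillion[1] - lord[1])   # Eq. (11)
    return X, Y

lord, silmarillion = (30.2, 45.7), (38.9, 61.3)
print(normalize((33.1, 50.4), lord, silmarillion))  # ~ (0.333, 0.301)
```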
From Figure 2, we can notice that Silmarillion and Screwtape are distinctly very far from all the other texts examined, marking their striking diversity, as already remarked; therefore, in the following analyses, we neglect them. Moreover, Pride, Vanity, Moby and Floss are grouped together and far from Trilogy, Narnia and Lord; therefore, in the following analyses, we will not consider them further.
The complete set of the Pythagorean distances between pairs of texts is reported in Appendix B. These data synthetically describe the proximity of texts and may indicate to scholars of literature connections between texts not considered before.
Figure 3 shows examples of these distances concerning Lord, Narnia and Trilogy. By referring to the cases in which the distance is smallest, we can observe the following:
- (a).
The closest texts to Lord are Narnia, Back, Lilith, Mutual and Peter.
- (b).
The closest texts to Narnia are Lord, Lilith, Bleak, Martin and Peter.
- (c).
The closest texts to Trilogy are Hobbit, Martin and Peter.
Besides the proximity with earlier novels, Lord and Narnia show close proximity with each other and with two novels by MacDonald.
These remarks, however, refer to the “average” display of vectors, whose ending points depend only on mean values. The standard deviations of the four deep language variables, reported in Table 6, Table 7, Table 8 and Table 9, do introduce data scattering; therefore, in the next subsection, we study and discuss this issue by calculating the probability (called “error” probability) that a text may be mathematically confused with another one.
6.2. Error Probability: An Index to Assess Who Influenced Whom
Besides the vector $\mathbf{R}$ of Equation (9)—due to mean values—we can consider another vector $\mathbf{R}_\sigma$, due to the standard deviation of the four deep language variables, which adds to $\mathbf{R}$. In this case, the final random vector describing a text is given by

$$\mathbf{T} = \mathbf{R} + \mathbf{R}_\sigma \quad (12)$$
Now, to obtain some insight into this new description, we consider the area of a circle centered at the ending point of $\mathbf{R}$. We fix the magnitude (radius) $\rho$ as follows. First, we add the variances of the deep language variables that determine the components $x$ and $y$ of $\mathbf{R}$; let them be $\sigma_x^2$ and $\sigma_y^2$. Then, we calculate the average value $\sigma^2 = (\sigma_x^2 + \sigma_y^2)/2$ and, finally, we set

$$\rho = \sigma \quad (13)$$
Now, since in calculating the coordinates $x$ and $y$ of $\mathbf{R}$ a deep language variable can be summed twice or more, we add its standard deviation (referred to as sigma) twice or more times before squaring. For example, if a variable appears three times in the $y$ component, its contribution to the total variance of $y$ is 9 times the variance calculated from the standard deviation reported in Table 6, Table 7, Table 8 and Table 9. After these calculations, the values of the 1–sigma circle are transformed into the normalized coordinates $(X, Y)$ according to Equations (10) and (11).
Figure 4 shows a significant example involving Lord, Narnia, Trilogy, Back and Peter. We see that Lord can be almost fully confused with Narnia, and partially with Trilogy, but not vice versa. Lord can also be confused with Peter and Back, therefore indicating strong connections with these earlier novels.
Now, we can estimate the (conditional) probability that a text is confused with another by calculating the ratio of areas. This procedure is correct if we assume that the bivariate density of the normalized coordinates, centered at the ending point of $\mathbf{R}$, is uniform. By assuming this hypothesis, we can calculate probabilities as ratios of areas [17,18].
The hypothesis of substantial uniformity around the ending point should be justified by noting that the coordinates are likely distributed according to a log-normal bivariate density, because the logarithms of the four deep language variables, which combine linearly in Equation (9), can be modeled as Gaussian. By the central limit theorem, we should expect an approximately Gaussian model on the linear values, but with a significantly larger standard deviation than that of the single variables. Therefore, in the area close to the ending point, the bivariate density function should not be peaked; hence, the uniform density modeling.
Now, we can calculate the following probabilities. Let $A_{12}$ be the common area of two 1–sigma circles (i.e., the area proportional to the joint probability of two texts), let $A_1$ be the area of the 1–sigma circle of text 1 and $A_2$ the area of the 1–sigma circle of text 2. Now, since probabilities are proportional to areas, we obtain the following relationships:

$$P_{21} = \frac{A_{12}}{A_2} \quad (14)$$

$$P_{12} = \frac{A_{12}}{A_1} \quad (15)$$

In other words, Equation (14) gives the conditional probability $P_{21}$ that part of text 2 can be confused with (or “contained” in) text 1; Equation (15) gives the conditional probability $P_{12}$ that part of text 1 can be confused with text 2. Notice that these conditional probabilities depend on the distance between two texts and on the 1–sigma radii (Appendix C).
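Under the uniform-density assumption, Equations (14) and (15) reduce to elementary geometry: the common area $A_{12}$ is the lens-shaped intersection of two circles. The following Python sketch, our illustration with hypothetical radii and distance, computes it with the standard circle–circle intersection formula.

```python
import math

def lens_area(d: float, r1: float, r2: float) -> float:
    """Intersection area A_12 of two circles with radii r1, r2 and
    center distance d (the 1-sigma circles of two texts)."""
    if d >= r1 + r2:                       # disjoint circles
        return 0.0
    if d <= abs(r1 - r2):                  # one circle inside the other
        return math.pi * min(r1, r2) ** 2
    a1 = r1**2 * math.acos((d**2 + r1**2 - r2**2) / (2 * d * r1))
    a2 = r2**2 * math.acos((d**2 + r2**2 - r1**2) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                          * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri

def conditional_probabilities(d, r1, r2):
    """Equations (14) and (15): P_21 = A_12/A_2 and P_12 = A_12/A_1."""
    A12 = lens_area(d, r1, r2)
    return A12 / (math.pi * r2**2), A12 / (math.pi * r1**2)

# Hypothetical example: text 1's circle lies inside text 2's, so P_12 = 1
# (text 1 fully confused with text 2) while P_21 = 0.25.
print(conditional_probabilities(d=0.05, r1=0.10, r2=0.20))  # ~ (0.25, 1.0)
```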
Of course, these joint probabilities can be extended to three or more texts; e.g., in Figure 4, we could calculate the area shared by Lord, Narnia and Trilogy and the corresponding joint probability, a calculation not conducted in the present paper.
We think that the conditional probabilities and the visual display of 1–sigma circles give useful clues to establish possible hidden connections between texts and, maybe, even between authors, because the variables involved are not consciously managed by them.
In Table 12, the conditional probability $P_{12}$ is reported in the columns; therefore, text 1 is the text indicated in the upper row. $P_{21}$ is reported in the rows; therefore, text 2 is the text indicated in the left column. Notice that $P_{12} = 1$ means $A_{12} = A_1$; therefore, text 1 can be fully confused with text 2. $P_{21} = 1$ means $A_{12} = A_2$; therefore, text 2 can be fully confused with text 1.
For example, assuming Lord as text 1 (column 1 of Table 12) and Narnia as text 2 (row 3), and, vice versa, Narnia as text 1 (column 3) and Lord as text 2 (row 1), we find that Lord can be confused with Narnia with a probability close to 1, but not vice versa. In other words, in the data bank considered in this paper, if a machine randomly extracts a chapter from Lord, another machine, unaware of this choice, could attribute it to Lord, but also, with decreasing probability, to Back, Peter, Narnia and Lilith. On the contrary, if the text is extracted from Narnia, then it is more likely attributed to Peter or Trilogy than to Lord or other texts.
We think that these conditional probabilities indicate who influenced whom more. In other words, Tolkien influenced Lewis more than the opposite.
Now, we can define a synthetic parameter which highlights how much, on average, two texts can be erroneously confused with each other. The parameter is the average conditional probability (see [16] for a similar problem):

$$\bar{P} = P_{12}\,P_1 + P_{21}\,P_2 \quad (16)$$

Now, since in comparing two texts we can assume $P_1 = P_2 = 1/2$, we obtain

$$\bar{P} = \frac{P_{12} + P_{21}}{2} \quad (17)$$
If $\bar{P} = 0$, there is no intersection between the two 1–sigma circles. The two texts cannot be confused with each other; therefore, there is no mathematical connection involving the deep language variables (this happens for Screwtape and Silmarillion, which can be confused with each other, but not with the other texts). If $\bar{P} = 1$, the two texts can be totally confused, and the two 1–sigma circles coincide.
Appendix D reports the values of $\bar{P}$ for all the pairs of novels.
Now, just to allow some rough analysis, it is reasonable to assume $\bar{P} = 0.5$ as a reference threshold, i.e., the probability of obtaining heads or tails in flipping a fair coin. If $\bar{P} > 0.5$, then two texts can be confused not by chance; if $\bar{P} < 0.5$, then two texts cannot likely be confused.
To visualize $\bar{P}$, Figure 5 draws $\bar{P}$ when text 1 is Lord (column 1 of Table 12), Narnia (column 3) or Trilogy (column 4). We notice that $\bar{P} > 0.5$ in the following cases:
- (a).
Lord as text 1: Narnia, Back, Lilith, Mutual, Peter.
- (b).
Narnia as text 1: Lord, Trilogy, Back, Lilith, Bleak, Mutual, Martin, Peter.
- (c).
Trilogy as text 1: Hobbit, Narnia, Bleak, Martin, Back.
We can reiterate that Tolkien (Lord) appears significantly connected to Lewis (Narnia), to MacDonald (Back, Lilith) and to Barrie (Peter), but not to Dickens's novels, to which, on the contrary, Lewis appears connected.
In the next section, the four deep language variables are singled out to consider linguistic channels existing in texts. This is the analysis we have called the “fine tuning” of texts [11].
7. Linear Relationships in Literary Texts
The theory of linguistic channels, which will be revisited in the next section, is based on the regression line between linguistic variables:

$$y = m\,x \quad (18)$$

Therefore, we show examples of these linear relationships found in Lord and Narnia.
Figure 6a shows the scatterplot of the number of sentences versus the number of words per chapter in Lord and Narnia. The slopes of the two regression lines are practically identical, while the correlation coefficients are not. Since the average relationships—i.e., Equation (18)—are practically identical (see also the values of $\langle P_F \rangle$ in Table 6 and Table 7) while the correlation coefficients—i.e., the scattering of the data—are not, this fact will impact the sentence channel discussed in Section 9.
Similar observations can be carried out for Figure 6b, which shows a second pair of linguistic variables in Lord and Narnia: again the regression lines are very similar, while the scattering of the data differs (in Narnia, the correlation coefficient is 0.9384).
Appendix E reports the complete set of these parameters.
Figure 7 shows the corresponding scatterplots of Lord and Trilogy; the slopes and correlation coefficients of Trilogy are close to those of Lord (Appendix E).
Figure 8 shows the scatterplots for Lord and Back or Lilith. We see similar regression lines and data scattering. In Back (left panel) and in Lilith (right panel), the regression lines and correlation coefficients are close to those of Lord (in Lilith, the correlation coefficient is 0.8890; see Appendix E). These results likely indicate the influence of MacDonald on Tolkien's writings because they are different from most other novels.
In conclusion, the regression lines of Lord, Narnia and Trilogy are very similar, but they can differ in the scattering of the data. Regression lines, however, describe only one aspect of the relationship, namely the relationship between conditional average values in Equation (18); they do not consider the other aspect of the relationship, namely the scattering of data, which may not be the same even when two regression lines almost coincide, as shown above. The theory of linguistic channels, discussed in the next section, on the contrary, considers both slopes and correlation coefficients and provides a “fine tuning” tool to compare two sets of data by singling out each of the four deep language parameters.
8. Theory of Linguistic Channels
In this section, we recall the general theory of linguistic channels [11]. In a literary work, an independent (reference) variable $x$ (e.g., the number of words per chapter) and a dependent variable $y$ (e.g., the number of sentences per chapter) can be related by the regression line given by Equation (18). Let us consider two different text blocks $Y_1$ and $Y_2$, e.g., the chapters of work 1 and work 2. Equation (18) does not give the full relationship between two variables because it links only conditional average values. We can write more general linear relationships, which take care of the scattering of the data—measured by the correlation coefficients $r_1$ and $r_2$, respectively—around the average values (measured by the slopes $m_1$ and $m_2$):

$$y_1 = m_1 x + \epsilon_1 \quad (19)$$

$$y_2 = m_2 x + \epsilon_2 \quad (20)$$
The linear models of Equations (19) and (20) introduce additive “noise” through the stochastic variables $\epsilon_1$ and $\epsilon_2$, with zero mean value [9,11,15]. The noise is due to correlation coefficients smaller than 1.
We can compare two literary works by eliminating $x$; therefore, we compare the output variable $y_2$ with the input variable $y_1$ for the same value of $x$. For example, we can compare the number of sentences in two novels—for an equal number of words—by considering not only the average relationship, Equation (18), but also the scattering of the data, measured by the correlation coefficient, Equations (19) and (20). We refer to this communication channel as the “sentences channel”, S–channel, and to this processing as “fine tuning” because it deepens the analysis of the data and can provide more insight into the relationship between two literary works or any other texts.
By eliminating $x$ from Equations (19) and (20), we obtain the linear relationship between the number of sentences in work 1 (now the reference, input text) and the number of sentences in work 2 (now the output text):

$$y_2 = m_{21}\,y_1 + \epsilon_{21} \quad (21)$$

Compared to the new reference work 1, the slope $m_{21}$ is given by

$$m_{21} = \frac{m_2}{m_1} \quad (22)$$

The noise source that produces the correlation coefficient between $y_1$ and $y_2$ is given by

$$\epsilon_{21} = \epsilon_2 - m_{21}\,\epsilon_1 \quad (23)$$

The “regression noise–to–signal ratio”, due to $m_{21}$, of the new channel is given by

$$\left(\frac{N}{S}\right)_m = (1 - m_{21})^2 \quad (24)$$

The unknown correlation coefficient $r_{21}$ between $y_1$ and $y_2$ is given by

$$r_{21} = r_1\,r_2 \quad (25)$$

The “correlation noise–to–signal ratio”, due to $r_{21}$, of the new channel from text 1 to text 2 is given by

$$\left(\frac{N}{S}\right)_r = m_{21}^2\,\frac{1 - r_{21}^2}{r_{21}^2} \quad (26)$$

Because the two noise sources are disjoint and additive, the total noise-to-signal ratio of the channel connecting text 1 to text 2 is given by

$$\frac{N}{S} = \left(\frac{N}{S}\right)_m + \left(\frac{N}{S}\right)_r \quad (27)$$

Notice that Equation (27) can be represented graphically [10]. Finally, the total and the partial signal-to-noise ratios, usually expressed in decibels, are given by

$$\frac{S}{N} = \frac{1}{N/S}, \qquad \left(\frac{S}{N}\right)_m = \frac{1}{(N/S)_m}, \qquad \left(\frac{S}{N}\right)_r = \frac{1}{(N/S)_r} \quad (28)$$
Of course, we expect that no channel can yield $m_{21} = 1$ and $r_{21} = 1$; therefore, $S/N \to \infty$, a case referred to as the ideal channel, cannot occur unless a text is compared with itself. In practice, we always find $m_{21} \neq 1$ and $r_{21} < 1$. The slope $m_{21}$ measures the multiplicative “bias” of the dependent variable compared to the independent variable; the correlation coefficient measures how “precise” the linear best fit is.
In conclusion, the slope $m_{21}$ is the source of the regression noise, Equation (24), and the correlation coefficient $r_{21}$ is mostly the source of the correlation noise, Equation (26), of the channel.
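The following Python sketch, our illustration based on Equations (22)–(28) as given above, computes the partial and total signal-to-noise ratios of a channel from the slopes and correlation coefficients of two texts; the input values are hypothetical.

```python
import math

def channel_snr(m1: float, r1: float, m2: float, r2: float) -> dict:
    """Signal-to-noise ratios of the channel from text 1 (reference) to
    text 2, following Equations (22)-(28)."""
    m21 = m2 / m1                                  # Eq. (22)
    r21 = r1 * r2                                  # Eq. (25)
    ns_m = (1.0 - m21) ** 2                        # Eq. (24), regression noise
    ns_r = m21**2 * (1.0 - r21**2) / r21**2        # Eq. (26), correlation noise
    ns = ns_m + ns_r                               # Eq. (27)
    to_db = lambda x: 10.0 * math.log10(1.0 / x)   # Eq. (28), in decibels
    return {"SN_m_dB": to_db(ns_m), "SN_r_dB": to_db(ns_r), "SN_dB": to_db(ns)}

# Hypothetical S-channel: nearly identical regression lines (m2 ~ m1)
# but more scattered data in text 2 (r2 < r1) lower the total S/N.
print(channel_snr(m1=0.050, r1=0.99, m2=0.051, r2=0.93))
# SN_m ~ 34.0 dB, SN_r ~ 7.3 dB, SN ~ 7.3 dB
```

The example reproduces the behavior discussed in Section 9: when the slopes almost coincide, the total signal-to-noise ratio is dominated by the correlation noise.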
9. Linguistic Channels
In long texts (such as novels, essays, etc.), we can define at least four linguistic linear channels [11], namely:
- (a).
Sentence channel (S–channel)
- (b).
Interpunctions channel (I–channel)
- (c).
Word interval channel (WI–channel)
- (d).
Characters channel (C–channel).
In S–channels, the number of sentences of two texts is compared for the same number of words. These channels describe how many sentences the author of text 2 writes, compared to the author of text 1 (the reference text), by using the same number of words. Therefore, these channels are more linked to $P_F$ than to the other variables. It is very likely that they reflect the style of the writer.
In I–channels, the number of word intervals of two texts is compared for the same number of sentences. These channels describe how many short texts between two contiguous punctuation marks (of length $I_P$) two authors use; therefore, these channels are more linked to $M_F$ than to the other variables. Since $M_F$ is very likely connected with the E–STM, I–channels are more related to the second buffer of readers' E–STM than to the style of the writer.
In WI–channels, the number of words contained in a word interval (i.e., $I_P$) is compared for the same number of interpunctions. These channels are more linked to $I_P$ than to the other variables. Since $I_P$ is very likely connected with the E–STM, WI–channels are more related to the first buffer of readers' E–STM than to the style of the writer.
In C–channels, the number of characters of two texts is compared for the same number of words. They are more related to the language used, e.g., English, than to the other variables, unless essays or scientific/academic texts are considered, because these latter texts use, on average, longer words [9].
As an example, Table 13 reports the total and the partial signal-to-noise ratios $S/N$, $(S/N)_m$ and $(S/N)_r$ in the four channels by considering Lord as the reference (input) text. In other words, each novel plays the role of text 2, compared to text 1 (the reference text, i.e., Lord). Appendix F reports the results for all the novels considered in the paper.
Let us make some fundamental remarks on Table 13, applicable whichever the reference text is. The signal-to-noise ratios of C–channels are practically the largest ones, ranging from 19.17 dB (Lilith) to 31.19 dB (Back). These results are simply saying that all authors use the same language and write texts of the same kind, namely novels, not essays or scientific/academic papers. These channels are not apt to distinguish or assess large differences between texts or authors.
In the three other channels, we can notice that Trilogy, Back and Lilith have the largest signal-to-noise ratios; therefore, these novels are very similar to Lord. In other words, these channels seem to confirm the likely influence of MacDonald on both Lord and Trilogy and the connection between Lord and Trilogy.
On the contrary, Narnia shows poor values in the S–channel (10.12 dB) and in the WI–channel (7.94 dB). These low values are determined by the correlation noise, because the correlation coefficients of Narnia are smaller than those of Lord. If we consider only $(S/N)_m$—i.e., only the regression line—then we notice a strong connection with Lord, since $(S/N)_m$ is large. As we have already observed regarding Figure 6, the regression lines are practically identical, but the spreading of the data is not. Lewis, in Narnia, is less “regular” than in Trilogy, or than Tolkien in Lord, in shaping (unconsciously) these two linguistic channels.
10. Summary and Conclusions
Scholars of English Literature unanimously say that J.R.R. Tolkien influenced C.S. Lewis’s writings. For the first time, we have investigated this issue mathematically by using an original multi-dimensional analysis of linguistic parameters, based on the surface deep language variables and linguistic channels.
To set our investigation in the framework of English Literature, we have also considered some novels written by earlier authors, such as Charles Dickens and others, including George MacDonald, because scholars mention his likely influence on Tolkien and Lewis.
In our multi-dimensional analysis, only the series of words, sentences and interpunctions per chapter, in our opinion, were consciously planned by the authors, and, specifically, they do not indicate strong connections between Tolkien, Lewis and MacDonald. Each author distributed words, sentences and interpunctions according to distinct plans, which varied not only from author to author but, sometimes, even from novel to novel by the same author.
On the contrary, the deep language variables and the linguistic channels, discussed in the paper, are likely due to unconscious design and can reveal connections between texts far beyond writers’ awareness.
In summary, the buffers of the extended short-term memory required of readers, the universal readability index of texts, the geometrical representation of texts and the fine tuning of linguistic channels—all tools largely discussed in the paper—have revealed strong connections between The Lord of the Rings (Tolkien) and The Chronicles of Narnia and The Space Trilogy (Lewis), on one side, and strong connections of these novels with some novels by MacDonald, on the other side, therefore substantially agreeing with what scholars of English Literature say.