2. Fibonacci-like Sequences
In this section, we briefly summarize the essential elements of a set of Fibonacci-like sequences, the same as those used in our reference [
1], which we shall use again in this paper for new applications. These sequences are defined, in terms of the ordinary Fibonacci sequence, by the recurrence relation (
)
where
denotes collectively the five sequences, named in the sequel
, and
. In
Table 4 below, the first few terms are given.
The choice of the “seeds”, or
initial conditions (p, q) of these sequences, has been shown to be especially appropriate and very useful in their consequences in [
1]; see, on this subject, Sections 3 and 4.2.5 of the latter reference. As we shall see in this study, these sequences will also be crucial in opening up new application possibilities. It is important to note that the Fibonacci and Lucas sequences can be obtained as a secondary product of the sequences
and
. The difference
gives the (slightly modified) Fibonacci sequence denoted
,
in an unusual but interesting form: its “seeds”, here, are inverted with respect to the usual Fibonacci sequence. Also, the sum of any of its first members until a certain index gives a Fibonacci number, exactly, contrary to the ordinary Fibonacci sequence with seeds 0, 1 which always gives one unit less than a Fibonacci number. For example, in our case, for
, we obtain
. Moreover, the relation
gives the Lucas sequence:
It is important to note that the sequences in Table 4 are intertwined by a (large) number of identities connecting them (see Equation (2) in [1] for some of them). The reader can consult Appendix C of the latter reference to see how these identities can be checked for large or very large values of the index n by using a computer with mathematical software containing a built-in Fibonacci function. For the low values of the index n in Table 4, the verification can be carried out immediately by hand or with a pocket calculator. We shall also use some of these identities in our applications in this paper, as we did in [1]. The identities we need will be presented as we go along, where we use them for the first time.
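The remark above about partial sums can be checked numerically. The short sketch below is only an illustration: it assumes that the "inverted" seeds are (1, 0), as opposed to the usual (0, 1), and the helper names `fib_like` and `partial_sums` are introduced here for convenience; they do not come from [1].

```python
def fib_like(p, q, count):
    """Return the first `count` terms of x(n) = x(n-1) + x(n-2) with seeds (p, q)."""
    terms = [p, q]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2])
    return terms[:count]


def partial_sums(terms):
    """Running totals: s[k] = terms[0] + ... + terms[k]."""
    sums, total = [], 0
    for t in terms:
        total += t
        sums.append(total)
    return sums


# Reference set of ordinary Fibonacci numbers, large enough for this check.
fibonacci_numbers = set(fib_like(0, 1, 40))

inverted_sums = partial_sums(fib_like(1, 0, 12))  # assumed inverted seeds (1, 0)
ordinary_sums = partial_sums(fib_like(0, 1, 12))  # usual seeds (0, 1)

# Every partial sum of the inverted-seed sequence is itself a Fibonacci number.
all_fibonacci = all(s in fibonacci_numbers for s in inverted_sums)
# Every partial sum of the ordinary sequence is one unit below a Fibonacci number.
one_less = all(s + 1 in fibonacci_numbers for s in ordinary_sums)

print(inverted_sums[:8])
print(ordinary_sums[:8])
print(all_fibonacci, one_less)
```

The same pattern (generate the terms, then compare both sides of an identity for a range of indices) could be used to test any of the identities of Table 4 once the seeds of the corresponding sequences are supplied.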
5. The 3rd-Base Symmetry Classification
By considering the genetic code as an f-mapping, Findley et al. [7] extracted a basic symmetry for the doubly degenerate codons (group-II). Some excerpts from the aforementioned reference are in order for understanding what an f-mapping is. The first, second, and third bases in a codon are denoted by the letters i, j, and k (B stands for bases U, C, A, and G). The authors consider the 64-codon set,
and define
where i, j, k designate the first, second, and third bases in the codon
(B is for bases U, C, A, G).
, k
, partitions
into four separate subsets where each subset contains only codons having the
same third base. Each of these subsets is mapped by f onto members of the amino acids set
, with the image being denoted
this is shown in
Table 6, below.
Therefore
and
. With this f-mapping, a one-to-one correspondence is established between one member of a doubly degenerate codon pair and the other member. Equivalently, these relationships could be rephrased as follows: (i) if a codon for an amino acid has third base U, then there is a codon for the same amino acid having third base C, and vice versa, or (ii) if a codon for an amino acid has third base A, then there is a codon for the same amino acid having third base G, and vice versa. For a doubly degenerate codon pair, (i) and (ii) are mutually exclusive. For the quartets (group-IV), (i) and (ii) hold simultaneously. For the sextets (group-VI), the quartet part obeys (i) and (ii) and, for the doublet part, one has (i) or (ii). For the odd-order degenerate codons (group-I and group-III), however, there is a small deviation from symmetry. In Table 6, we show this classification. In the last two rows of this table, we have calculated, from Table 3, the hydrogen atom content and the atom content in the side chains of the amino acids in the four columns, in the two views “on” and “off” (see Section 1.2). Note the hydrogen atom balances (
) and atom number
balances (
) in the last two rows in Table 6. These express the exact one-to-one correspondence mentioned above (here, the two codons of isoleucine, AUU and AUC, constitute an order-2 doublet). These balances will be established from our Fibonacci-like sequences below in this section.
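The correspondences (i) and (ii) can be checked directly against the standard genetic code. The sketch below is only an illustration: the one-letter table is the standard code written in UCAG order ("*" marks a stop codon), and the names `CODE`, `by_third_base` and `broken_pairs` are introduced here; they are not taken from [7].

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Partition of the 64 codons into four subsets sharing the same third base k.
by_third_base = {k: sorted(c for c in CODE if c[2] == k) for k in BASES}


def broken_pairs(x, y):
    """Codon pairs (ijx, ijy) that are NOT assigned to the same amino acid or stop."""
    prefixes = ["".join(p) for p in product(BASES, repeat=2)]
    return sorted((p + x, p + y) for p in prefixes if CODE[p + x] != CODE[p + y])


uc_exceptions = broken_pairs("U", "C")  # relationship (i)
ag_exceptions = broken_pairs("A", "G")  # relationship (ii)

print({k: len(v) for k, v in by_third_base.items()})
print(uc_exceptions)
print(ag_exceptions)
```

In this sketch, the U/C correspondence holds for all 16 pairs, while the A/G correspondence fails only for the pairs AUA/AUG and UGA/UGG, that is, where the odd-order classes (isoleucine, methionine, tryptophan and the stops) are involved, in line with the "small deviation from symmetry" mentioned above.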
5.1. The Hydrogen Atom Content
5.1.1. “Activation Key” On
In the U/C third-base set, there are
hydrogen atoms. In the A/G third-base set there are, respectively,
and
hydrogen atoms (grand total of
, see Table 6 above). To describe this pattern using our Fibonacci-like sequences, let us start again from Equation (24) of Section 4.1.1 and write it in the following form, writing out the sum explicitly:
Note that we have included the sixth term of the sequence , in the sum , in the second parentheses. In this way, we reach the correct hydrogen atom pattern.
5.1.2. “Activation Key” Off
In this case, let us recall Equation (27) of Section 4.1.2 (or Equation (12) of Section 3.2, which is the same)
and use the following identity linking the sequences
and
which, for
, is written
. By inserting this last number, 31, in the above equation and arranging, in a first step, we have
The second parentheses on the left-hand side can be written as
. This is the correct pattern for the U/C third-base set and the other part in the above equation remains to be handled. A quick way consists in writing the factor
above as
as 8 is a Fibonacci number. All this allows us to put the above equation in the following form:
which could be compared with the data in
Table 6 (case “
off”).
5.2. The Atom Content
5.2.1. “Activation Key” On
Let us, here, start from Equation (30) in
Section 4.2.1, written as
and use, first,
in cascade the recurrence relation of the sequence
:
Now, we arrange this relation as follows:
To obtain the correct atom number pattern, we note that because of the following identity of the sequence
:
we can, for
, write
or
. By inserting this latter value in Equation (43) above, we obtain
We recognize here the correct atom number pattern (see
Table 6).
5.2.2. “Activation Key” Off
This case is easily handled by starting from Equation (34) of
Section 4.2.2. Using the recurrence relation of the sequence
(
), we write it as
Next, we use, again, the identity
, already considered in
Section 5.2.1, but now for
:
. By inserting this relation in the equation above, we have
As the first term is already correct, we examine the second. Using the recurrence relations of both sequence
and
, we can write
and
. By inserting these values in the equation above, we end up with
which is the correct answer.
6. The “Ideal” Symmetry and the “Supersymmetry” Classification Schemes
In the “ideal” symmetry classification scheme [9], the three sextets serine, arginine, and leucine, each of them encoded by six codons, are used as “generators”, with serine playing the central role. These three objects are underlined in Table 7 below. This approach separates the 64-codon matrix into two groups, the “leading” group and the “non-leading” group, each of which has 32 codons. Each group is made up of (equal) A+U-rich and G+C-rich parts. The “ideal” classification scheme is generated by combining the six codons of serine, arginine, and leucine in the following way. The entire “leading” group (32 codons) is defined by the initial generator, serine, with its six codons; arginine, which also has six codons; and leucine, with only the quartet part of its six codons. The leftover doublet part of leucine, on the other hand, serves as a “seed” for the creation of the 32-codon “non-leading” group. According to this scheme, the genetic code table is produced by codon sextets based on exact purine/pyrimidine symmetries, A+U-rich/C+G-rich symmetries, and direct/complement symmetries (see [9]). The table below shows these groups.
Soon after the publication of [9], the authors postulated, in [10], the existence of what they call a “supersymmetric” genetic code table, derived from the “ideal” symmetry genetic code table, now having five symmetries between bases, codons, and amino acids. These are purine–pyrimidine symmetry between bases and codons, direct–complement symmetry of codons between boxes, A+U-rich and C+G-rich symmetry of codons between two columns, and mirror symmetry between all purines and pyrimidines of the whole code and between second and third bases of codons (see [10]). This “supersymmetry” genetic code table is shown in Table 8.
6.1. Hydrogen Atom Content
6.1.1. “Activation Key” On
The hydrogen atom count is as follows, from Table 3 and Table 8: leading group (in yellow and orange, as in Table 7): 192; non-leading group (in light grey and light blue, as in Table 7): 170. To derive this hydrogen atom pattern, let us start from Equation (25) of Section 4.1.1 and use again the equality
(from the identity in Equation (16) of
Section 3.2 for
) to obtain, after arranging,
which is the correct result.
6.1.2. “Activation Key” Off
In this case, the hydrogen atom count is as follows: leading group: 192; non-leading group: 174. Here, we start from Equation (27) of Section 4.1.2:
In this case, we consider, first, the number 8 and use the recurrence relation of the sequence
to write it as
and, next, use the recurrence relation of
. With these elements, we could write Equation (50) as follows:
This is the correct result.
6.2. Atom Content
6.2.1. “Activation Key” On
From Table 3 and Table 8, we have 316 atoms in the leading group and 282 atoms in the non-leading group. Here, we start from the relation
, which led to Equation (31) of
Section 4.2.1 but, this time, we add and subtract the quantity
, see
Table 4, to obtain the correct result:
6.2.2. “Activation Key” Off
In this case, the atom number in the leading group is the same as before (316), but the atom number in the non-leading group is now equal to 286. This case could be handled by appealing to the identity in Equation (33) of Section 4.2.2, which is again written for
We first write
as
, as in
Section 4.2.2, but we now (i) select
one copy of the number 61 in the above relation and write it as
, by virtue of the recurrence relation of the sequence
, and (ii) use the identity in Equation (16) (
) for
, that is,
. This allows us to put Equation (53a) above in the form:
which is the correct result.
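As a simple cross-check of the leading/non-leading counts quoted in Sections 6.1 and 6.2, the following sketch adds them up and compares the two views. It only uses the numbers stated in the text. The dictionary layout and the variable names are illustrative, and the reading of the difference between the two views as one hydrogen atom per proline codon is an interpretation made here, based on the description of the two views, rather than a statement taken from the tables.

```python
# (leading group, non-leading group) counts as quoted in Sections 6.1 and 6.2.
counts = {
    "hydrogen": {"on": (192, 170), "off": (192, 174)},
    "atoms": {"on": (316, 282), "off": (316, 286)},
}

# Grand totals for each quantity and each view.
totals = {
    quantity: {view: sum(groups) for view, groups in views.items()}
    for quantity, views in counts.items()
}

# Shift between the two views, for each quantity.
shift = {q: totals[q]["off"] - totals[q]["on"] for q in totals}

# Proline is encoded by the four codons CCN; if the two views differ by one
# hydrogen atom in its side chain (assumption), the shift should equal 4.
PROLINE_CODONS = 4

print(totals)
print(shift, all(s == PROLINE_CODONS for s in shift.values()))
```

The atom totals obtained in this way, 598 and 602, agree with the totals quoted later in Section 6.3.2.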
6.3. The “Supersymmetry” Genetic Code Table
As the case of the “supersymmetry” genetic code table [10] was not considered in [1], where the 20 amino acids were all taken in their uncharged state and proline’s side chain was considered in shCherbak’s view (5 hydrogen atoms, 8 atoms, and 41 nucleons), we give here the corresponding results and then consider the case where the four amino acids mentioned earlier are charged and proline is taken in its two views, on and off.
6.3.1. Uncharged Amino Acid Case and “Activation Key” On Only
Consider, first, the identity
where we have added to both sides the same quantity
. For
, we have from
Table 4
The sum
, describing the leading group/non-leading group hydrogen atom pattern, has already been obtained in [
1] but the (new) quantity
will be useful in what follows. Using again the identity in Equation (16) for
(
) and next the identity in Equation (7) of
Section 3.1 for
, which gives
, we can put the left-hand side of Equation (55) in the form
If we take the number 91, the 7th term of the sequence
,
, and write it as
, because
in the same sequence, we then have, from Equation (56),
This is the direct box/complement box hydrogen atom pattern, respectively (see Table 8). (The calculations from this table go along the same lines as in the above sections. For the direct boxes, for example, take all the amino acids inside them and, taking into account their numbers of codons, compute the number of hydrogen atoms; proceed in the same way for the complement boxes.) To derive the hydrogen atom pattern for the mirror symmetry, a quicker and more elegant way is as follows. Consider the identity
For
, we have
(see
Table 4). By inserting this last relation in Equation (56) above, we obtain
This is the hydrogen atom pattern for the “mirror” symmetry (see Table 8 above; see also Figure 2 in [10] and the detailed explanations therein about this beautiful symmetry).
6.3.2. Charged Amino Acid Case, “Activation Key” On and Off
Now, we consider the case where four amino acids are in their (physiological) charged state, which is the main subject of this paper.
Hydrogen Atom Content
In the case of “activation key”
on, there are
hydrogen atoms in the direct boxes and
hydrogen atoms in the complement boxes (from
Table 3 and
Table 8). Here, we recall Equation (25) of
Section 4.1.1:
By using again the identity in Equation (16) for
,
, once, and arranging, we obtain
which is the correct result. In the case of “activation key”
off, there are
hydrogen atoms in the direct boxes and
hydrogen atoms in the complement boxes. Here, we start from Equation (12) of
Section 3.2 and write it as
where
from the recurrence relation of the sequence
. Next, we use the same identity in Equation (38) of
Section 5.1.2, again for
(
), to rewrite (one copy) of the number
above:
These are the correct hydrogen atom numbers mentioned above. Now, we look at the “mirror” symmetry. In the case of “activation key”
on, there are
hydrogen atoms in column 1 and
hydrogen atoms in column 2 of
Table 8, using the data of
Table 4. Here, we start from Equation (60) above and put it in the following correct form:
where we have used the recurrence relation
of the sequence
and, next, replaced the number 53 of the latter sequence by the same number 53 of the sequence
which is equal to
. (Recall that, from Equation (16), one has
)
In the case of “activation key”
off, there are
hydrogen atoms in column 1 and
hydrogen atoms in column 2 (see
Table 8, data from
Table 4). Consider again Equation (60) above:
By using, repetitively, the recurrence relation of the sequence
and also the following relation
, from the identity
for
, we can put the equation above into the form:
which is the correct answer.
Atom Content
In the case of “activation key”
on, there are
atoms in the direct boxes and
atoms in the complement boxes with a total of 598 (see
Table 8 and data from
Table 4). In this case, we start from the relation
(see Equation (30) and below,
). It is now enough to write
, as a Lucas number, for example, and rewrite the above equation in the form
which correctly describes the above atom content numbers. In the case of “activation key”
on, there are 348 atoms in column 1 and
atoms in column 2 (see
Table 8, data from
Table 4). Here, we start from Equation (66) above and use the identity in Equation (11),
with
(
. We have
By introducing the identity in Equation (16) with
,
and arranging, we finally obtain the above correct atom numbers:
In the case of “activation key”
off, there are
atoms in the direct boxes and
atoms in the complement boxes, with a total of 602 atoms (see
Table 8, data from
Table 4). To describe this case, we start by writing Equation (34) of
Section 4.2.2 as follows:
Now we, first, take one copy of the number 61 and write it as
, using the identity
with
(
. Second, we write each of the other three copies of 61 using the recurrence relation
. Inserting these values in Equation (71), we obtain
which is what we are looking for.
In the case of “activation key”
off there are
atoms in column 1 and
atoms in column 2 (see
Table 8, data from
Table 4). It is possible to show that this case follows from the preceding one by noticing, as we did in the derivation of Equation (64) above, that the number
is equal to
(these sequences are linked, see Equation (16)). By using the recurrence relation
and arranging, we finally obtain the correct result:
7. More on shCherbak’s Theory
In [1], we derived the relation
This describes proline’s singularity (see [3,4]). Here, in this section, we go much further by presenting completely new results. First, consider, once again, the sequence
, more exactly
. We have, by squaring:
It is not difficult to see, from Table 3, that this number corresponds to the number of nucleons (or integer molecular mass) in the side chains of the amino acids coded by 23 codons, where the sextets are counted twice and proline has 42 nucleons in its side chain and only 73 nucleons in its backbone, contrary to the other 19 amino acids, which have 74 nucleons in their backbones (see Equation (74) above and Section 1.2). Second, from the identity
, already considered in the sections above, we can write Equation (75) as follows, using
twice:
We recognize here the unit corresponding to the “singular” nucleon and the 1443 nucleons where proline, now, has 41 nucleons in its side chain and 74 nucleons in its backbone, as do the 19 other amino acids. Third, we can indeed derive the very molecular mass of proline from the above numbers of nucleons, 1443 and 1444. To see this, we use another tool from number theory, i.e., modular arithmetic, which has many applications in mathematics (group theory, knot theory, ring theory) and computer science (computer algebra, coding theory, cryptography, and so on); see, for example, [11]. Also, several kinds of moduli are used in applications, for example, modulo 11 in international standard book numbers (ISBNs) or mod 37 and mod 97 arithmetic in error detection in bank account numbers. We shall, here, take as moduli the integers 99 and 999. (This is equivalent to summing the “digits” in base-100 and base-1000, respectively.) We have
The reader could use, if desired, quick online calculators for the modulo function, for example, [
12]. Using the trick of the digit summation mentioned above (
and
, we can arrange the above relation as
. In what follows, we shall use two functions from elementary number theory: Euler’s φ-function of an integer n (also known as Euler’s totient function), which counts the number of positive integers less than or equal to n that are relatively prime to n [13], and the σ-function, which gives the sum of the divisors of an integer n [14]. In the case where the integer is a prime number p, these functions simplify greatly and one has simply $\varphi(p) = p - 1$ and $\sigma(p) = p + 1$. Noting that 43 above is the only odd number out of four (43, 14, 14, 44) and, furthermore, a prime “digit” (remember we are in base-100), we obtain, by calling its φ-function ($\varphi(43) = 42$):
We have also
if we use
. These are the same relations as in Equation (74) above. The numbers
and
are useful, as explained above, but there is also a third number which will not only play a role together with the other two but also has a meaningful interpretation. It is given by the following relation:
This number corresponds to the number of nucleons in the side chains of the amino acids encoded by 23 codons (the sextets counted twice), with proline’s side chain having 42 nucleons and four amino acids in their charged state (see Section 1.2, Table 3 and above it):
In the first parentheses, 1 corresponds to the supplementary nucleon in proline’s side chain. In the second parentheses, 1 corresponds to the charged arginine. In the third parentheses, the units correspond respectively to lysine (charge +1), aspartic acid (charge −1), and glutamic acid (charge −1). We have therefore three meaningful numbers:
,
, and
. From these, we consider the following expression:
and take its
-function, the sum of its prime factors (
), see below about this function.
This number is equal to the number of nucleons (or molecular mass) of the residue of proline (see [5], Table 1). When two (or more) amino acids combine to form a peptide, a water molecule (two hydrogen atoms and one oxygen atom) is released and what remains of each amino acid is called a residue. Here, we have
, which is the molecular mass of the water molecule. Note that we have also, using two of the above numbers, 444 and 445:
Both relations give the same result, 97. From Equations (81) and (82), we have the two-fold result
Finally, it is also possible to derive the detailed atomic composition of the (whole) molecule of proline, $\mathrm{C_5H_9NO_2}$. Starting from Equation (81) and then adding the quantity
,
Now,
, as a Fibonacci number, could be decomposed successively as
and
. By inserting this decomposition in the above equation and arranging, we have
This is the correct result. The number 60 has the prime factorization $2^2 \times 3 \times 5$ and gives five carbon atoms (carbon nucleus: six protons, six neutrons). The number 14 has the prime factorization $2 \times 7$ and corresponds to one nitrogen atom (nitrogen nucleus: seven protons, seven neutrons). The number 32 has the prime factorization $2^5$ and corresponds to two oxygen atoms (oxygen nucleus: eight protons, eight neutrons). The last number, 9, corresponds to nine hydrogen atoms.
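The arithmetic used in this section is easy to reproduce. The sketch below is an illustration under stated assumptions: the moduli 99 and 999 are inferred from the remark on base-100 and base-1000 digit sums and from the values 444 and 445 quoted above, the nucleon numbers per atom are those given in the text (C: 12, N: 14, O: 16, H: 1), and all function and variable names are introduced here.

```python
from math import gcd, isqrt


def digit_sum(n, base):
    """Sum of the 'digits' of n written in the given base."""
    total = 0
    while n:
        n, digit = divmod(n, base)
        total += digit
    return total


def phi(n):
    """Euler's totient: how many k in 1..n are relatively prime to n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def sigma(n):
    """Sum of all positive divisors of n."""
    return sum(d for d in range(1, n + 1) if n % d == 0)


NUCLEONS = (1443, 1444)
MODULI = {99: 100, 999: 1000}  # assumed moduli -> base whose digits are summed

residues = {(n, m): n % m for n in NUCLEONS for m in MODULI}
digit_sums = {(n, m): digit_sum(n, b) for n in NUCLEONS for m, b in MODULI.items()}

# Integer (nucleon) masses and the composition of the whole proline molecule.
ATOMIC_MASS = {"C": 12, "H": 1, "N": 14, "O": 16}
PROLINE = {"C": 5, "H": 9, "N": 1, "O": 2}

proline_mass = sum(ATOMIC_MASS[a] * k for a, k in PROLINE.items())
water_mass = 2 * ATOMIC_MASS["H"] + ATOMIC_MASS["O"]
residue_mass = proline_mass - water_mass

print(residues)
print(phi(43), sigma(43), isqrt(1444))
print(proline_mass, water_mass, residue_mass)
```

For these four cases the base-100 and base-1000 digit sums coincide with the residues modulo 99 and 999; in general the two are only congruent, and the digit summation has to be repeated when the sum exceeds the modulus.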
In order to fully understand the reasoning presented below, it is important for the reader to keep in mind that, when looking at Equations (77) and (80), 1443 represents the number of nucleons in the side chains of the amino acids coded by 23 codons with the sextets counted twice and proline having 41 nucleons in its side chain, while 1444 represents the number of nucleons in the side chains of the amino acids coded by 23 codons with the sextets counted twice and proline now having 42 nucleons in its side chain. In fact, it appears that there is compelling evidence that the calculations performed here are “locked” technically. Below, we show why but, before doing that, let us recall, briefly, a few elements of our helpful arithmetic function
(see Appendix B in [
1]). From the Fundamental Theorem of Arithmetic, an integer n can be represented, uniquely, as a product of prime numbers irrespective of their order:
. The function
is defined by the formula
where
is the sum of the prime factors (including the multiplicities)
,
is the sum of the prime indices of the prime factors (including the multiplicities)
, and
, the so-called Big Omega function, is the number of prime factors
. The portion
of this function was already involved in the derivation of Equation (81) above.
Now, let us look at the moduli 99 and 999, which were,
together with the numbers
and
and
critical in the derivation of Equations (77), (80), and (82). Their prime factorizations are $99 = 3^2 \times 11$ and $999 = 3^3 \times 37$. We have
and
and, therefore,
. This is nothing but,
again, the integer molecular mass of proline’s residue, see Equations (81) and (82). Also, by isolating the two terms
and
, in
, and including them in
, we obtain
. This is a more accurate description of proline’s
residue (see [
5],
Table 1), which could also be seen from Equation (81) above, remembering that 89 is a Fibonacci number,
By pushing the precision to the extreme, we can arrange the side chain part as follows:
, where we have made explicit the portions of
. We have three carbon atoms (atomic mass 12) and six hydrogen atoms, see the side chain in
Figure 1 below. The last term is interpreted as six hydrogen atoms in the side chain, (
), with one hydrogen atom susceptible to being “transferred” from the side chain to the backbone (shCherbak’s “borrowing”, see above and
Table 3). Of course, one has to add
, from Equation (83), the water molecule, to obtain the whole molecule of proline. Below, in
Figure 1, we show it with the side chain boxed.
The unique charm and covert attraction of proline’s structure are concealed inside the integer molecular masses, just waiting to be gently revealed through the use of modular arithmetic.
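The arithmetic function described above (sum of the prime factors, plus sum of their prime indices, plus the number of prime factors, all with multiplicities) can be sketched in a few lines. The names `a0`, `a1`, `big_omega` and `b0` are placeholders chosen here, since the original symbols are not reproduced in this text, and the moduli 99 and 999 are assumed as discussed above. Prime indices are counted from 2 as the first prime.

```python
def prime_factors(n):
    """Prime factors of n with multiplicity, by trial division (small n only)."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def prime_index(p):
    """Position of the prime p in the list 2, 3, 5, 7, ... (2 -> 1, 3 -> 2)."""
    return sum(1 for k in range(2, p + 1) if prime_factors(k) == [k])


def a0(n):
    """Sum of the prime factors of n, with multiplicity."""
    return sum(prime_factors(n))


def a1(n):
    """Sum of the prime indices of the prime factors of n, with multiplicity."""
    return sum(prime_index(p) for p in prime_factors(n))


def big_omega(n):
    """Number of prime factors of n, with multiplicity."""
    return len(prime_factors(n))


def b0(n):
    """Hypothetical name for the combined function a0 + a1 + Omega."""
    return a0(n) + a1(n) + big_omega(n)


for m in (99, 999):
    print(m, prime_factors(m), a0(m), a1(m), big_omega(m), b0(m))
print(b0(99) + b0(999))
```

With these definitions the two moduli contribute 29 and 68, respectively, and their sum is 97, the integer molecular mass of proline’s residue quoted in the text.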
8. Multiplet Structures
This section deals with another application of our Fibonacci-like sequences, more precisely, the sequences
and
. In [
15], we have derived the exact multiplet structure of the genetic code, starting from the total number of codons, 64, expressed from the beginning as
and using Fibonacci/Lucas decompositions. We subsequently used either a property of “superperfect” numbers or the relation between Fibonacci and Lucas numbers to write one factor 8 as
and next 7 as 3 + 4 to derive the above-mentioned multiplet structure. Here, we show that all the ingredients of this derivation are, in fact, already
ostensibly embedded in our Fibonacci-like sequences. Taking
(see
Table 4), first, there is the recurrence relation
. This is the decomposition of the number 8 mentioned above, obtained here without recourse to “superperfect” numbers, for example. Next, from the Lucas sequence in Equation (4),
, which is derived from the Fibonacci sequence
in Equation (3), is itself derived from the sequences
and
in Equation (2), and we have
. This is all we need to write
which leads, after writing the Fibonacci number 8 as
, to the following multiplet structure of the (standard) genetic code which could be expressed in two equivalent forms, Equations (87) and (88):
The form in Equation (87) describes Rumer’s division (see
Section 4): five quartets (four codons each) and three quartet parts of the three sextets (four codons each) in the first parentheses (set
), and nine doublets (two codons each), three doublet parts of the three sextets (two codons each), one triplet (three codons), two singlets (one codon each), and three stops (three codons) in the second parentheses (set
). The form in Equation (88) describes the usual multiplet structure: five quartets, three sextets (six codons each,
), nine doublets, one triplet, two singlets, and three stops. The vertebrate mitochondrial genetic code could also be easily derived from Equation (88); see [1]. In fact, in unpublished notes, we have also derived from Equation (86), with a little work, several other multiplet structures of the (non-standard) genetic codes. Let us give here only one example: the Alternative Yeast Nuclear Code (#12 in the database [16]). In this code, shown in Table 9 below, the only change concerns the reassignment of the codon CUG of leucine, which now codes for serine. We have therefore five quartets (V, A, T, P, G), one sextet (R), one quintet (L: UUR, CUY, CUA), one septet (S: UCN, AGY, CUG), nine doublets (F, Y, C, H, Q, D, E, N, K), one triplet (I), two singlets (M, W), and three stops. To describe this code, let us start from Equation (88) and rewrite it in the form
by selecting a factor
and developing it as
. Now, we write the Fibonacci number 8 as
and insert it in Equation (88). We have, writing
again,
This relation describes this code. Arginine, the term , is now the only sextet left. The term is suitable for the quintet leucine coded now by five codons: CUA (one codon), CUY (two codons), and UUR (two codons). The term describes the septet serine coded now by seven codons: UCN (four codons), AGY (two codons), and CUG (one codon). The remaining terms are the usual ones (see above). The case of the other non-standard genetic codes could be handled along the same lines with, of course, some additional work.
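The two multiplet structures discussed in this section can be recovered by simply counting codons per amino acid. The sketch below is an illustration: it uses the standard genetic code written in UCAG order ("*" marks a stop codon) and obtains the Alternative Yeast Nuclear Code by reassigning CUG to serine, as described in the text. The function name `multiplets` is introduced here.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

standard = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Alternative Yeast Nuclear Code: the only change is CUG (leucine -> serine).
yeast = dict(standard)
yeast["CUG"] = "S"


def multiplets(code):
    """Return ({degeneracy: number of amino acids}, number of stop codons)."""
    codons_per_aa = Counter(code.values())
    stops = codons_per_aa.pop("*")
    return dict(Counter(codons_per_aa.values())), stops


standard_structure, standard_stops = multiplets(standard)
yeast_structure, yeast_stops = multiplets(yeast)

for name, (structure, stops) in {
    "standard": (standard_structure, standard_stops),
    "yeast": (yeast_structure, yeast_stops),
}.items():
    n_amino_acids = sum(structure.values())
    n_codons = sum(deg * count for deg, count in structure.items()) + stops
    print(name, sorted(structure.items()), stops, n_amino_acids, n_codons)
```

Both codes give 20 amino acids and 64 codons; only the sextet, quintet and septet classes differ between them.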
9. Conclusions
We have once again studied the genetic code symmetries by taking an unexplored route. As previously mentioned, we recently used a small set of Fibonacci-like sequences that we designed to describe the symmetries of the genetic code [1]. This time, however, we considered the amino acids as immersed in a physiological environment (neutral pH), where four of them carry a charge, either −1 (aspartic acid and glutamic acid) or +1 (arginine and lysine). This is the same option as the one examined in [4,5]. Additionally, and this is just as novel, we have examined two possible views of the unique amino acid proline, whose side chain is connected to its backbone twice: shCherbak’s view and the Downes–Richardson view (see Section 1.2). With these two newly considered components, we have established the patterns for the hydrogen atom content and the atom content for Rumer’s symmetry, in both views (referred to as “on” and “off” in the text), in Section 4.1 and Section 4.2. The same work has been carried out for the third-base symmetry in Section 5.1 and Section 5.2 and for the “ideal” symmetry as well as the more complex “supersymmetry” genetic code table in Section 6.1, Section 6.2 and Section 6.3. In Section 7, we have uncovered the remarkably unique chemical structure of proline along with its corresponding “activation” key, all with a basic application of modular arithmetic. Finally, we used our Fibonacci-like sequence
once more in Section 8 to derive, in a new way, not only the exact number of amino acids, 20, and the multiplet structure of the standard genetic code (five quartets, three sextets, nine doublets, one triplet, two singlets, and three stops) but also, through an example, the exact multiplet structure of a non-standard variant of the genetic code, the Alternative Yeast Nuclear Code. For the other non-standard genetic codes, the strategy is analogous to the one adopted here, i.e., starting from Equation (87), or one of its variants obtained while treating a given non-standard version of the genetic code, and applying Fibonacci/Lucas decompositions and/or regrouping of the numeric factors. All the known non-standard versions of the genetic code, treated in this way, will be the subject of a future publication, as another (new) practical application.