Next Article in Journal
Nitrogen Substituted Phenothiazine Derivatives: Modelling of Molecular Self-Assembling
Next Article in Special Issue
Supernovae, Neutrinos and the Chirality of Amino Acids
Previous Article in Journal
Antioxidant Properties of Aminoethylcysteine Ketimine Decarboxylated Dimer: A Review
Previous Article in Special Issue
The Composition and Organization of Cytoplasm in Prebiotic Cells
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Required Levels of Catalysis for Emergence of Autocatalytic Sets in Models of Chemical Reaction Systems

1
Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
2
Tampere University of Technology, Finland
3
University of Vermont, 85 South Prospect Street Burlington, VT 05405, USA
4
Santa Fe Institute, 1399 Hyde Park Road Santa Fe, NM 87501, USA
5
Department of Mathematics and Statistics, University of Canterbury, Private Bag 4800, Christchurch, New Zealand
*
Author to whom correspondence should be addressed.
Int. J. Mol. Sci. 2011, 12(5), 3085-3101; https://doi.org/10.3390/ijms12053085
Submission received: 18 February 2011 / Revised: 18 April 2011 / Accepted: 5 May 2011 / Published: 12 May 2011
(This article belongs to the Special Issue Origin of Life 2011)

Abstract

:
The formation of a self-sustaining autocatalytic chemical network is a necessary but not sufficient condition for the origin of life. The question of whether such a network could form “by chance” within a sufficiently complex suite of molecules and reactions is one that we have investigated for a simple chemical reaction model based on polymer ligation and cleavage. In this paper, we extend this work in several further directions. In particular, we investigate in more detail the levels of catalysis required for a self-sustaining autocatalytic network to form. We study the size of chemical networks within which we might expect to find such an autocatalytic subset, and we extend the theoretical and computational analyses to models in which catalysis requires template matching.

1. Introduction

In previous work we introduced and investigated a mathematical model of catalytic reaction systems and autocatalytic sets [1,2]. It was shown, both theoretically and computationally, that a linear growth rate in the level of catalysis (with increasing length n of the largest molecules in the system) is sufficient for autocatalytic sets to arise spontaneously [2,3] in a well-known binary polymer model of catalytic reaction systems [4,5].
In this paper we take a closer and more detailed look at our model and its results. First, we introduce a small modification to our mathematical definition of autocatalytic sets and the corresponding algorithm for finding them in general catalytic reaction systems (Section 3). This modification makes both the definition and the algorithm slightly simpler, and includes some specific (although probably rare) cases of autocatalytic sets which were previously left out. However, we show (formally, and in simulations) that this modified algorithm does not invalidate any previous results or conclusions.
Second, we show that there is a discrepancy between the theoretical and simulation results (Section 4). Both results show that a linear growth rate in level of catalysis is sufficient for the emergence of autocatalytic sets. However, there is a difference in the parameter values of these linear relations. Here, we recalculate and compare the required levels of catalysis in more detail and under different scenarios.
Third, we show how our model and algorithm can be used to answer other interesting questions relating to the emergence of autocatalytic sets (Section 5). In particular: What is the minimum required size of the molecule set for autocatalytic sets to emerge given a fixed (known) probability of a molecule catalyzing an arbitrary reaction?
Fourth, we show how more chemical realism can be included in our model, for example by considering template-based catalysis (Section 6). Even though this makes the model harder to analyze, it still generates interesting and useful results.
The next section briefly reviews our previously introduced model and definitions. The four sections following it will present the model modifications, extensions, and additional results mentioned above. The final section summarizes the main conclusions and discusses future directions. Mathematical proofs are provided in an Appendix.
Our study fits within a large and growing body of work that aims to formally model how self-sustaining biochemical systems necessary for life might have emerged. This is an area that has been investigated from many angles over the last three decades. Some approaches that are similar in scope but different in their specific details from the one we study here include models based on Petri-nets [6], algebraic approaches based on metabolic closure (such as Rosen’s “(M,R) systems”, [79]), computer simulations of autocatalytic networks involving artificial chemistry [10] or metabolic networks [11], differential equation modeling [1215], and Erdös-Renyi style random graph theory [16]. The idea of autocatalytic sets as a precursor to life has certainly not been without criticism [1719], but recent (exciting) experimental evidence shows that they are a real possibility [2024]. Thus, we believe that our theoretical and computational studies and results are of direct relevance in the larger context of the origin of life [5,25,26].

2. RAF Sets

Autocatalytic sets may have played an important role in the origin of life [5,15,26,27], and are a necessary, although not sufficient, condition for life. Here, autocatalytic sets are defined more formally as RAF sets as follows (see [1,2] for the full mathematical definition and notations). Given a catalytic reaction system (CRS), i.e., a network of molecule types and catalyzed chemical reactions, a (sub)set of such reactions (plus the molecules involved in the reactions in ) is called:
  • Reflexively autocatalytic (RA) if every reaction in R is catalyzed by at least one molecule involved in any of the reactions in ;
  • F-generated (F) if every reactant in can be constructed from a small “food set” F by successive applications of reactions from ;
  • Reflexively autocatalytic and F-generated (RAF) if it is both RA and F.
The food set F contains molecules that are assumed to be freely available in the environment. Thus, an RAF set formally captures the notion of “catalytic closure”, i.e., a self-sustaining set supported by a steady supply of (simple) molecules from some food set. Figure 1 shows a simple example. In [2], we also introduced a polynomial-time algorithm to find RAF sets in general catalytic reaction systems.
Note (as already stated earlier [26]) that this notion of an autocatalytic set is somewhat different from the (chemical) term autocatalytic reaction in which a molecule directly catalyzes its own production. With an autocatalytic set we do not mean a set of autocatalytic reactions, but rather a set of molecules and reactions which is collectively autocatalytic in the sense that all molecules help in producing each other (through mutual catalysis, and supported by a food set). Because of this confusion in terminology, we prefer to use the term “RAF set”.

A Model of Catalytic Reaction Systems

In [4,5], a random CRS model was introduced using binary polymers (bit strings) of length at most n as molecule types, ligation and cleavage reactions, and a probability p of any molecule catalyzing any reaction (n and p are parameters of the model). It was argued that in a “sufficiently complex” system (i.e., large enough diversity of molecule types), autocatalytic sets would appear “spontaneously”. This was later criticized for requiring an exponential increase (with n) in the (average) number of reactions catalyzed by any one molecule [17], which is chemically unrealistic. However, in [2] it was shown computationally (by applying the RAF algorithm to instances of the random CRS model), and then confirmed theoretically in [3], that only a linear growth rate (with n) in the level of catalysis is sufficient for RAF sets to appear with high probability. Furthermore, with these results it is possible to quantify “sufficiently complex” in terms of the size of the molecule set (or maximum molecule length n) and the level of catalysis (average number of reactions catalyzed by any molecule). Here, we continue our investigations of this random CRS model, which we will refer to throughout as the binary polymer model.

3. Modification of the Original Definition and Algorithm

In the original (mathematical) definition of RAF sets [2], the “RA” part is stated in terms of the support supp() of a set of reactions , which is defined as all the molecules (reactants and products) that are involved in at least one of the reactions in . For a set of reactions to be reflexively autocatalytic (RA), it is required that every reaction in is catalyzed by at least one molecule in supp().
Similarly, the “F” part of the definition is stated in terms of the closure cl(F) of the food set F relative to the set of reactions , which is defined as F together with all molecules that can be constructed from F by repeated applications of reactions in . For a set of reactions to be F-generated (F), it is required that the reactants of every reaction in are in cl(F).
Now imagine a situation where all reactions in conform to the above two requirements, except for one reaction r which is catalyzed by a molecule xF which is not involved in any other way in any reaction in . So, x ∈ cl(F), but x ∉ supp(). In words, the molecule x is in the closure of the food set (by default, as it is part of the food set), but it is not in the support of the reaction set. So, according to the original definition, this set is not RAF (it does not conform to the RA part of the definition), even though logically one would consider this case to be a proper autocatalytic set.
To remedy this, and make sure these (probably rare but relevant) cases are also included, we propose a slight modification to our original definition of RAF sets as follows:

3.1. RAF Definition

Given a catalytic reaction system 𝒬 = (X, , C), with molecule set X, reaction set , catalyzation C (a set of molecule-reaction pairs indicating which molecules catalyze which reactions), a nonempty subset of is said to be:
  • Reflexively autocatalytic (RA) if for all reactions r there exists a molecule x ∈ cl (F) such that (x, r) ∈ C;
  • F-generated if ρ() ⊆ cl (F), where ρ() is the set of all reactants in ′;
  • Reflexively autocatalytic and F -generated (RAF) if is both RA and F.
In other words, a set of reactions is RAF if each reaction in is catalyzed by at least one molecule that can be produced by the set itself (starting from the food molecules), and all reactants (of all reactions in ) can also be produced by the set itself. So, instead of using supp() in the RA part of the definition and cl(F) in the F part, as in the original definition, the modified definition uses cl(F) in both the RA and the F part, thus simplifying it slightly. Note that an RAF in the original setting of [2] is still an RAF under this new definition by virtue of the following result, a proof of which is provided in the Appendix.
Lemma 3.1 Any set of reactions that forms an RAF under the earlier definition of [2] is an RAF in the modified definition above.
This may seem like a minor point, but it could be an important one. Consider, for example, the (reverse) citric acid cycle, which has been argued to have (possibly) been a major step in the origin of life by synthesizing the basic building blocks of organic molecules [28,29]. Even though at present the catalysts that drive the reactions in this cycle are enzymes (proteins), at the early stages these reactions could very well have been catalyzed by much simpler, naturally occurring elements, or ones that are easily synthesized (by basic chemistry) from freely available inorganic molecules [30]. Thus, these original catalysts can be considered food molecules, but they are not involved (as reactants or products) in any of the reactions in the (reverse) citric acid cycle (i.e., they are not in the support). As a consequence, this important metabolic network of reactions would not be classified as an RAF with our original definition, but it is indeed a true RAF according to our modified definition.
Lemma 3.1 has several other desirable properties. First, it ensures that the recent result that Rosen’s (M,R) systems can be viewed and studied within the RAF framework [9], still holds. Second, it tells us that if a system has no RAF under the new definition, then it clearly has none under the original one, nor, indeed, under more embellished definitions of an RAF that impose further conditions so as to avoid “trivialities”. One such condition (from [1]) would be to require that not all reactions in an RAF are catalyzed by molecules in F (or in some larger subset S of molecules that can be generated from F by catalysis from F and resulting molecules). However, we can accommodate this additional condition within the new RAF framework as follows. The algorithm we will describes below constructs a unique maximal RAF (under the new definition), and from this one can easily check whether an RAF exists that satisfies the additional non-triviality condition (simply check whether this maximal RAF contains a reaction that does not have all its catalysts in F, or the larger set S). Third, spontaneous reactions (those that can proceed without any catalyst) can be allowed (and form part of an RAF) if we formally extend F by an extra “token” element with directed (catalysis) arrows from that token to all such spontaneous reactions. And there are various other extensions, restrictions, or variations that can be imposed on our more general framework.
Of course there are many more properties relevant to the origin of self-sustaining biochemistry beyond such ad-hoc conditions to exclude trivialities—for example, the dynamics of the reactions (the quantity of reagents and products (stoichiometry) along with thermodynamic considerations) and the effect of inhibition and degrading side reactions [18,19]. However, our view here is that RAF sets should be regarded very much as minimal necessary conditions for such systems, rather than in any sense sufficient. Identifying RAF sets, and establishing conditions for their existence is thus a natural question on the road to finding viable candidates for the origin of early biochemistry.
An additional benefit of our modified RAF definition is that it significantly simplifies the corresponding algorithm for finding RAF sets in general catalytic reaction systems, and its correctness proof. The original algorithm is based on repeatedly (and alternately) applying two reduction steps (starting from the full reaction set) [2]:
  • Remove all reactions that do not conform to the RA requirement;
  • Remove all reactions that do not conform to the F requirement.
However, since in our modified definition both the RA part and the F part are stated in terms of the closure, these two reduction steps can now be merged into one. As a consequence, the algorithm can be simplified to the following:

3.2. RAF Algorithm

  • Start with the complete set of reactions and the food set F;
  • Compute the closure of the food set cl(F) relative to the current set of reactions ;
  • For each reaction r for which (1) all catalysts, or (2) one or more reactants, are not in cl(F), remove r from ;
  • Repeat steps 2 and 3 until no more reactions can be removed.
The resulting (reduced) reaction set is either the (maximal) RAF set contained in the given catalytic reaction system, or it is empty, in which case there is no RAF. This was already proved for the original algorithm in [2], and a (simpler) proof for the modified version is provided in the appendix. Note, however, that the overall running time of this modified algorithm is still the same as in the original case (𝒪(||2 log ||) worst-case, but shown to be sub-quadratic on average in practice [2]), as it is dominated by step 2 (computing the closure of the food set). All results presented in this paper are generated with this new version of the RAF algorithm.

4. Required Levels of Catalysis

In [2], we showed through computer simulations that a linear growth rate in the level of catalysis (with n, the length of the largest molecules in the system) appears to be sufficient for RAF sets to occur with high probability in the binary polymer model with ligation and cleavage reactions and random catalysis. This was subsequently confirmed theoretically in [3]. However, even though both the computational and theoretical results give a linear relation for the required level of catalysis, there appears to be a significant difference in the values of the parameters of these linear functions.
This discrepancy can partly be explained by the fact that the theoretical analysis in [3] actually assumes RAF sets that involve all molecule types in the system, i.e., an RAF that contains the entire molecule set X (but not necessarily all reactions). This, of course, is a much stronger assumption than used in the simulation studies in [2], where the RAF algorithm was used to find any RAF set, regardless of how many molecules or reactions it contains. But the question remains whether the discrepancy can be explained entirely by this difference.
To answer this, we repeated the original simulations, using the RAF algorithm to find any RAF set, but this time with the modified RAF algorithm. Then we also applied the RAF algorithm to look for RAF sets that involve all molecule types in X. In both cases, we collected statistics for the average number f (n) of reactions catalyzed by any molecule (i.e., the level of catalysis) for which there is a probability Pn = 0.50 (or close to 0.50) of finding an RAF set in a number of instances of the random catalytic reaction model [31]. From these statistics, we then estimated a linear function f (n) = a + bn using an ordinary least squares regression. We compare these results with the theoretical linear relation which can be calculated from Theorem 4.1 (ii) in [3] (using Pn = 0.50 and k = t = 2).
So, in short, we compare the required levels of catalysis for RAF sets to occur with high probability for three cases:
  • Computational case for any RAF;
  • Computational case for all-molecule RAFs;
  • Theoretical case for all-molecule RAFs.
The computational values were calculated over 100 to 1000 (depending on the value of n) instances of the random catalytic reaction model, for n = 7, . . ., 20 (because of an exponential increase in the number of molecules |X| and reactions || with n, we are computationally limited to about n = 20 in these simulations).
Table 1 presents the linear relations estimated from the simulation data or calculated from the theoretical analysis. There is a difference of almost 2 orders of magnitude between the observed (simulation) slope of the required growth rate (case A) and the theoretical one (case C). However, as case B shows, this difference cannot be fully explained by the fact that the theoretical analysis assumes RAFs involving all molecules. Even with this stronger assumption, the theoretical slope (case C) is still more than twice as large as the one from the simulations (case B).
Figure 2 shows the complete data (dots for simulation values, lines for estimated or calculated linear relations). Clearly, even though the theoretically predicted value for f (n) grows quite fast with n (slope = 1.6339), the actual value grows only very slowly (slope = 0.0189). So, for example, for n = 20 the theoretically expected value for the average number of reactions that need to be catalyzed by any molecule to have RAF sets occurring with probability at least Pn = 0.50 is fC (20) = 32.678, which seems unrealistically high. However, the actual level of catalysis required is only fA(20) = 1.475, which is chemically much more plausible.

5. Required Size of the Molecule Set

In addition to the level of catalysis required for RAF sets to emerge, we can use the RAF algorithm to answer other interesting, and related, questions. For example, one could ask what the minimum required size of the molecule set is (or, in the binary polymer model, the minimum size n of the largest molecules in the system) to get RAF sets with high probability, given a fixed (known) probability p of a molecule catalyzing an (arbitrary) reaction.
Suppose we fix the probability of catalysis at p = 0.00001, or perhaps even p = 0.000001 (one in a million), which is, for example, roughly the probability in phage display of a random peptide binding an arbitrary ligand [5,32], and “the ease of polypeptide evolution with a small number of arbitrary sequences indicates that a significant fraction of all possible sequences may have functions, at least binding activity in correlation with catalytic activity” [33]. In the binary polymer model, with these probabilities of catalysis, what value of n is required to get a probability of, say, Pn ≥ 0.50 of RAF sets occurring?
Figure 3 shows the results of applying the (modified) RAF algorithm to instances of the random binary polymer model (Section 2) for the two given values of p and for n = 5, . . ., 20. With p = 0.00001 (left curve in Figure 3), Pn = 0 for n ≤ 12, but P13 = 0.982, and Pn = 1 for n ≥ 14. So, in this case a value of at least n = 13 is required to get RAF sets with high probability. For only a slightly higher probability of catalysis (p = 0.00002), a value of n = 12 would be sufficient (results not shown).
Similarly, with p = 0.000001 (right curve in Figure 3), Pn = 0 for n ≤ 15, P16 = 0.939, and Pn = 1 for n ≥ 17. So, in this case a value of at least n = 16 is required to get RAF sets with high probability. Again, for only a slightly higher probability of catalysis (p = 0.000002), a value of n = 15 would be sufficient (results not shown). The size of the molecule set in this case would be |X| = 65534.

6. Template-Based Catalysis

One could argue that the binary polymer model used in our studies so far is perhaps somewhat oversimplified to be biologically or chemically realistic. However, the model serves as a useful starting point with which precise mathematical statements can be formulated and proved, or at least tested computationally. Furthermore, our RAF definition and algorithm are independent of the particular model that is used, and can in principle also be applied to real catalytic reaction systems (for example metabolic networks, of which the already mentioned citric acid cycle is a core element). And, equally importantly, it is actually not very difficult to add more chemical realism into our mathematical models.
As one particular example, we have considered template-based catalysis [5]. The idea here is that, in order to act as a catalyst, a molecule must match at least a certain area around the reaction site according to some template-based matching rule. Similar to, for example, base-pair complementarity in RNA and DNA, we could require a catalyst to match the complement of at least four positions around the reaction site of a ligation or cleavage (two on either side). Consider the ligation reaction
00101 + 0011 > 001010011
The reaction site template in this case is 0100 (the last two bits of the first binary polymer, plus the first two bits of the second polymer). A given molecule can only act as a catalyst for this reaction if, somewhere along its binary string representation, it contains the complement of this template, i.e., 1011.
In line with the original random CRS model, and some initial simulations with such template-based catalysis [4,5], we have included this idea as follows. For each combination of a molecule xX and a reaction r, if x matches (anywhere along its length) the complement of the reaction site template of length four of r, then with probability p the pair (x, r) is included in the set of catalyzation C. We present analytical results for this model in Section 6.1 below. However, in our simulations (Section 6.2 below), we used a slightly less constrained version as follows: If a molecule happens to be shorter than length four, then it only has to match part of the complement of the reaction site template, but we require catalysts to be of at least length two. So, the molecule 01 could also catalyze the reaction in the above example. This is mainly done to give some of the food molecules (which are all bit strings up to length two in our simulations) also a chance to act as catalysts.
Note that this template matching requirement is almost the same as in the original simulations [5,3436], except that we do not allow partial matches here. So, we are considering a slightly “stronger”, or more constrained case. Some initial results on similar simulations were reported in [37].

6.1. Theoretical Results

Theorem 4.1 (ii) of Mossel and Steel (2005) shows that for polymers of length up to n over an alphabet of size κ ≥ 2, and with a food set of polymers of length up to t, the probability P (n) that there exists an all-molecule RAF is at least:
P ( n ) 1 κ ( κ e λ ) t 1 κ e λ
when each molecule catalyzes on average (at least) λn reactions (provided also that λ > loge(κ)). Notice that P (n) can be chosen as close to 1 as we wish by selecting λ large enough (and independently of n!). Thus, this result justifies the statement that the average number of reactions each molecule catalyzes needs to grow only linearly with n in order for there to be a given (high) probability of generating an RAF.
We now describe how this result modifies if catalysis is required to be template-based, as described above. Suppose that a polymer x can catalyze a given reaction only if x contains a substring of length s = s1 + s2 that is complementary to the end-segment (of length s1) and the initial segment (of length s2) of the two molecules involved in the cleavage or ligation. Thus, for the above set-up we have: s1 = s2 = 2 and so s = 4. We also assume that the probability that a molecule x catalyzes a reaction r with a complementary template just depends on x and not on r (this is the analogue of the template-free model assumption (R2) in [3]).
The following result shows that RAFs will still arise with high probability under linear growth in the average number of reactions each molecule catalyzes, provided the constant involved is increased by a factor of κs (this factor would be 16 for the binary model with template size four). More precisely we have the following result, whose proof is provided in the Appendix.
Theorem 6.1 Let Ps(n) be the probability that there exists an all-molecule RAF under this template matching model. Suppose that each molecule catalyzes (on average) at least λsn reactions, where λs = κsλ and λ > loge(κ). Then Ps(n) satisfies the same inequality as P (n), namely
P s ( n ) 1 κ ( κ e λ ) t 1 κ e λ

6.2. Computational Results

As with the original (random) model (see Section 4), we expect there to be a difference between the theoretically predicted required level of catalysis and the observed level (from simulations) in case of template-based catalysis. Unfortunately, we are computationally even more restricted with this template-based catalysis case as with the original model. The running time of our RAF algorithm is polynomial in the size of the reaction set ||, as was shown in [2]. However, since || is exponential in the size of the largest molecules n, i.e., || ∝ 2n, we can only go up to about n = 20 to get computational results in a reasonable amount of time (even on a parallel cluster) for the original model (Section 4). But for the template-based catalysis as described above, it is even worse. We now have to check for each pair of molecule xX and reaction r whether there is a template match between any part of the molecule and the complement of the (4-site) reaction template. Since both |X| and || are ∝ 2n, this means that |X × | ∝ 22n, and in practice this means that we can only go up to about n = 16 with our template-based catalysis simulations.
Figure 4 shows the results of these simulations, using our modified algorithm to find RAF sets (of any size) in the template-based catalysis case, compared to the original model (case A in Section 4). As can be expected, for smaller values of n, a higher level of catalysis is needed to find RAF sets with high probability (again, Pn = 0.50 is taken as the transition point) in the case of template-based catalysis compared to the purely random model. Since each molecule type xX is now restricted, to some extent, in terms of which reactions it can catalyze, the system as a whole is more constrained, and it will be harder to get RAF sets.
However, for longer and longer molecule types, this restriction becomes less of a problem, as a longer molecule has an increasingly higher probability of matching a given 4-site template somewhere along its length. Indeed, the required level of catalysis for the template-based case tapers off as n increases, and converges to that of the original (purely random) model, reaching the same level at n = 16. Given that the purely random model is the “limiting” case for the template-based model, we expect that the level of catalysis for this template-based model will follow the same pattern as the base model for n > 16, and follow a linear relation.

7. Conclusions

Building on our previous work, we have investigated in more detail the levels of catalysis required for the emergence of autocatalytic sets in models of chemical reaction systems. First, we have shown that there is a discrepancy between the theoretically predicted levels and the computationally observed ones. Although both results yield a linear relation between the required level of catalysis and the size n of the largest molecules in the system, in practice this required level is almost two orders of magnitude smaller than the predicted one. However, this discrepancy cannot be fully explained by the fact that the theoretical result is based on a much stronger assumption (requiring the RAF sets to contain all molecules in the system). Even for large systems (n = 20, containing several millions of molecule types), each molecule only needs to catalyze (on average) between one and two reactions to have RAF sets appear with high probability, which is chemically highly plausible.
Next, we looked at the minimum size of the molecule set (i.e., number of different types of molecules) necessary to get RAF sets given a fixed probability of catalysis. With a (realistic) probability on the order of one in a million of any molecule catalyzing any given reaction, we only need molecules up to length n = 15 or n = 16, or about 65,000 different molecule types. Again, this number is well within a plausible (experimental) range.
Finally, we studied an extension of the original model, including template-based catalysis. We established formally that in this case a linear growth rate in the level of catalysis also suffices for RAF sets to appear with arbitrary high probability. However, the simulations show that for smaller values of n (the length of the largest molecules in the system), this linear relation is not exact, and that a higher level of catalysis is necessary compared to the original (purely random) model to get RAF sets with a similar probability. But, as n increases, the template matching constraint becomes less of an issue, and the required levels of catalysis converge to those of the original model. This example shows how more chemical realism can be included in our model.
We intend to continue studying the emergence of autocatalytic sets in chemical reaction systems under various scenarios, models, and extensions. However, so far we have mainly studied the (static) underlying graph structures of such systems. One particularly important issue we hope to address in the future is the actual molecular dynamics in a given (catalytic) reaction system. Also, next to studying models of chemical reaction systems, we would like to apply our RAF framework to real (bio)chemical systems such as metabolic networks, or the collection of all known (organic) substrates and reactions. It is our hope that this line of work will help provide more insight into the (possible) origin of life in general.

Acknowledgments

The computations were performed at the Vital-IT ( http://www.vital-it.ch) center for high-performance computing of the Swiss Institute of Bioinformatics. We thank Jotun Hein for stimulating discussions. MS also thanks the Royal Society of New Zealand (James Cook fellowship).

Appendix

A1. Proof of Lemma 3.1

Let RA denote the original definition of “Reflexively Autocatalytic” from [2], and [RA1] the new one as given here in Section 3. We need to show that [RA] + [F-gen] implies [RA1], so suppose that rR′. By [RA] there exists xsupp() : (x, r) ∈ C. Either xρ(R′) or xπ(R′) (the molecules that are products of at least one reaction in ). In the first case, by [F-gen], we have x ⋲ cl (F) so [RA1] holds. In the second case, where xπ(R′) there exists (A, B) ∈ R′ for which xB. By [F-gen], A ⊆ cl (F) and so B ⊆ cl (F) and thus x ∈ cl (F). So once again [RA1] holds.

A2. Proof of RAF Algorithm Correctness

Let be the output of the RAF algorithm. We need to establish two claims, namely that (i) if is empty then contains no RAF, and (ii) if is nonempty then is an RAF, and indeed the unique maximal RAF present in .
To establish claim (i) we first justify a third claim (iii): If a reaction r lies in any RAF (say 1) that is contained within an arbitrary subset (say 2) of , then r is not eliminated if step 3 of the RAF algorithm is applied with 2 being taken as the “current set of reactions”. To see this, the fact that 1 is an RAF implies that there exists a molecule x ∈ cl1 (F) that catalyzes r, and that the reactants of r are contained within cl1 (F). Now, 1R2 and so cl1 (F) ⊆ cl2 (F); thus step 3 will not eliminate r from 2. This establishes claim (iii) which implies, by induction, the further claim (iv): If a reaction r lies within any RAF that is contained within , then r is never eliminated at step 3 of the RAF algorithm at any stage starting with . Claim (i) now follows immediately.
To establish claim (ii) we first note that, since the algorithm has terminated at some non-empty set of reactions , no reaction in can be further eliminated by step 3. By definition, this means that each reaction r in has the properties that (1) at least one catalyst of r is in cl (F) and (2) all reactants of r are in cl (F). Condition (1) implies that satisfies the (RA) condition, and property (2) shows that satisfies the F-generated condition. Thus is an RAF; moreover it is the unique maximal RAF by virtue of claim (iv) established in the previous paragraph.

A3. Proof of Theorem 6.1

We first recall two quantities (xm, rm) from [3]. Let
x m = κ m + 1 κ κ 1
denote the number of polymers of length at most m over an alphabet of size κ and let rm denote the number of forward (ligation) reactions that produce polymers of length at most m. We have:
r m m x m
For any polymer x of length at most n let Ts(x) denote the set of polymers of length s that occur within x, and for any polymer t of length s let Xn(t) denote the set of polymers of length at most n that contain at least one copy of t. Thus,
t T s ( x ) x T n ( t )
Clearly,
1 | T s ( x ) | κ s
Also, for n ≥ s, and t any polymer of length s we have:
| X n ( t ) | 1 + κ + κ 2 + + κ n s = κ n s + 1 1 κ 1
Given a (forward) reaction r, let tr denote the complement of the template polymer of length s = s1 + s2 (which is also polymer of length s). If we let Rn(t) denote the set of forward reactions r with tr = t, then:
| R n ( t ) | = r n s
In the template polymerization model, x can catalyze r only if xX(tr) (or, equivalently, if rRn(t) for some tTs(x)). Let p(x, r) be the probability that molecule x catalyzes reaction r. We have assumed that each molecule that could catalyze reaction r (by virtue of template matching) has the same probability of doing so, that is:
p ( x , r ) = { p x , if t r T n ( x ) 0 , otherwise
where px just depends on x and not r (the analogue of requirement (R2) in [3]).
Thus, if we let μx be the expected number of (forward) reactions (rR+(n)) that a molecule x catalyzes, then:
μ x = r R + ( n ) p ( x , r ) = p x · | S ( x ) |
where S(x) = {rR+(n) : xX (tr)}. Note that |S(x)| is bounded above by the number of pairs (t, r) where tTs(x) and rR+(n) : rt = t. By virtue of Inequality (3) and Equation (5) this set of pairs has size at most κsrn–s and so
| S ( x ) | κ s r n s
Now, Inequality (2) applied to m = n – s, and (7) gives:
| S ( x ) | κ s r ( n s ) κ s ( n s ) x n s
Thus, by the assumption that μxλsn and applying (6), we obtain:
λ s · n μ x p x κ s ( 1 s / n ) x n s · n
Since λs = κsλ this last inequality can be written as:
p x λ ( 1 s / n ) x n s
Now, following [3], for a forward reaction rR+(n) let q r * be the probability that r is not catalyzed by any molecule. Then
q r * = x X ( t r ) ( 1 p ( x , r ) ) = x X ( t r ) ( 1 p x )
By Inequalities (4) and (8) and the inequality (1 – x)y ≤ exp(–xy) for x, y > 0 we obtain:
q r * exp ( λ · κ n s + 1 1 ( 1 s / n ) x n s ( κ 1 ) ) < exp ( λ )
where the second inequality follows by replacing xn–s by the expression given by Equation (1) for m = ns and checking that the ratio that post-multiplies λ is greater than 1. The remainder of the proof now follows the argument in [3] for the proof of Theorem 4.1 (ii).

References and Notes

  1. Steel, M. The emergence of a self-catalysing structure in abstract origin-of-life models. Appl. Math. Lett 2000, 3, 91–95. [Google Scholar]
  2. Hordijk, W; Steel, M. Detecting autocatalytic, self-sustaining sets in chemical reaction systems. J. Theor. Biol 2004, 227, 451–461. [Google Scholar]
  3. Mossel, E; Steel, M. Random biochemical networks: The probability of self-sustaining autocatalysis. J. Theor. Biol 2005, 233, 327–336. [Google Scholar]
  4. Kauffman, SA. Autocatalytic sets of proteins. J. Theor. Biol 1986, 119, 1–24. [Google Scholar]
  5. Kauffman, SA. The Origins of Order; Oxford University Press: New York, NY, USA, 1993. [Google Scholar]
  6. Sharov, A. Self-reproducing systems: Structure, niche relations and evolution. BioSystems 1991, 25, 237–249. [Google Scholar]
  7. Letelier, JC; Soto-Andrade, J; Abarzúa, FG; Cornish-Bowden, A; Cárdenas, ML. Organizational invariance and metabolic closure: Analysis in terms of (M,R) systems. J. Theor. Biol 2006, 238, 949–961. [Google Scholar]
  8. Cornish-Bowden, A; Cárdenas, ML; Letelier, JC; Soto-Andrade, J. Beyond reductionism: Metabolic circularity as a guiding vision for a real biology of systems. Proteomics 2007, 7, 839–845. [Google Scholar]
  9. Jaramillo, S; Honorato-Zimmer, R; Pereira, U; Contreras, D; Reynaert, B; Hernández, V; Soto-Andrade, J; Cárdenas, M; Cornish-Bowden, A; Letelier, J. (M,R) Systems and RAF Sets: Common Ideas, Tools and Projections. Proceedings of the Alife XII Conference, Odense, Denmark, 19–23 August, 2010, 94–100.
  10. Flamm, C; Ullrich, A; Ekker, H; Mann, M; Hoegerl, D; Rohrschneider, M; Sauer, S; Scheuermann, G; Klemm, K; Hofacker, IL; et al. Evolution of metabolic networks: A computational framework. J Syst Chem 2010, 1. [Google Scholar] [CrossRef]
  11. Kun, A; Papp, B; Szathmáry, E. Computational identification of obligatorily autocatalytic replicators embedded in metabolic networks. Genome Biol 2008, 9. [Google Scholar] [CrossRef]
  12. Awazu, A; Kaneko, K. Discretness-induced transition in catalytic reaction networks. Phys Rev E 2007, 76, 041915:1–041915:8. [Google Scholar]
  13. Brogioli, D. Marginally stable chemical systems as precursors of life. Phys Rev Lett 2010, 105, 058102:1–058102:4. [Google Scholar]
  14. Bartsev, SI; Mezhevikin, VV. On initial steps of chemical prebiotic evolution: Triggering autocatalytic reaction of oligomerization. Adv. Space Res 2008, 42, 2008–2013. [Google Scholar]
  15. Dyson, FJ. A model for the origin of life. J. Mol. Evol 1982, 18, 344–350. [Google Scholar]
  16. Bollobas, B; Rasmussen, S. First cycles in random directed graph processes. Discret. Math 1989, 75, 55–68. [Google Scholar]
  17. Lifson, S. On the crucial stages in the origin of animate matter. J. Mol. Evol 1997, 44, 1–8. [Google Scholar]
  18. Szathmary, E. The evolution of replicators. Philos. Trans. R. Soc. Lond. B 2000, 355, 1669–1676. [Google Scholar]
  19. Orgel, LE. The implausibility of metabolic cycles on the prebiotic earth. PLoS Biol 2008, 6, 5–13. [Google Scholar]
  20. Sievers, D; von Kiedrowski, G. Self-replication of complementary nucleotide-based oligomers. Nature 1994, 369, 221–224. [Google Scholar]
  21. Lee, DH; Severin, K; Ghadiri, MR. Autocatalytic networks: The transition from molecular self-replication to molecular ecosystems. Curr. Opin. Chem. Biol 1997, 1, 491–496. [Google Scholar]
  22. Ashkenasy, G; Jegasia, R; Yadav, M; Ghadiri, MR. Design of a directed molecular network. Proc. Nat. Acad. Sci. USA 2004, 101, 10872–10877. [Google Scholar]
  23. Hayden, EJ; von Kieddrowski, G; Lehman, N. Systems chemistry on ribozyme self-construction: Evidence for anabolic autocatalysis in a recombination network. Angew. Chem. Int. Ed 2008, 120, 8552–8556. [Google Scholar]
  24. Lincoln, TA; Joyce, GF. Self-Sustained Replication of an RNA Enzyme. Science 2009, 323, 1229–1232. [Google Scholar]
  25. Penny, D. An interpretive review of the origin of life research. Biol. Philos 2005, 20, 633–671. [Google Scholar]
  26. Hordijk, W; Hein, J; Steel, M. Autocatalytic sets and the origin of life. Entropy 2010, 12, 1733–1742. [Google Scholar]
  27. Dyson, FJ. Origins of Life; Cambridge University Press: Cambridge, UK, 1985. [Google Scholar]
  28. Morowitz, HJ; Kostelnik, JD; Yang, J; Cody, GD. The origin of intermediary metabolism. Proc. Nat. Acad. Sci. USA 2000, 97, 7704–7708. [Google Scholar]
  29. Smith, E; Morowitz, HJ. Universality in intermediary metabolism. Proc. Nat. Acad. Sci. USA 2004, 101, 13168–13173. [Google Scholar]
  30. Morowitz, HJ; Srinivasan, V; Smith, E. Ligand field theory and the origin of life as an emergent feature of the periodic table of elements. Biol. Bull 2010, 219, 1–6. [Google Scholar]
  31. It was shown in [2] that Pn = 0.50 can be taken as the “transition point” between RAFs not existing at all and RAFs occurring with high probability.
  32. Scott, JK; Smith, GP. Searching for peptide ligands with an epitope library. Science 1990, 249, 286–390. [Google Scholar]
  33. Yamauchi, A; Nakashima, T; Tokuriki, N; Hosokawa, M; Nogami, H; Arioka, S; Urabe, I; Yomo, T. Evolvability of random polypeptides through functional selection within a small library. Protein Eng 2002, 15, 619–626. [Google Scholar]
  34. Bagley, RJ. A Model of Functional Self Organization; PhD Thesis; University of California, San Diego: CA, USA, 1991. [Google Scholar]
  35. Bagley, RJ; Farmer, JD. Spontaneous Emergence of a Metabolism. In Artificial Life II; Langton, CG, Taylor, C, Farmer, JD, Rasmussen, S, Eds.; Addison-Wesley: Upper Saddle River, NJ, USA, 1991; pp. 93–140. [Google Scholar]
  36. Bagley, RJ; Farmer, JD; Fontana, W. Evolution of a metabolism. In Artificial Life II; Langton, CG, Taylor, C, Farmer, JD, Rasmussen, S, Eds.; Addison-Wesley; Upper Saddle River, NJ, USA, 1991; pp. 141–158. [Google Scholar]
  37. Andersen, IT; Nan, L; Kjaersgaard, MIS. Search for life in catalytic reaction systems. Available online: http://www.stats.ox.ac.uk/research/genome/projects/pastprojects (accessed on 5 May 2011).
Figure 1. A simple example of a catalytic reaction system (CRS) with seven molecule types {a, b, c, d, e, f, g} (solid nodes) and four reactions {r1, r2, r3, r4} (open nodes). The food set is F = {a, b}. Solid arrows indicate reactants going into and products coming out of a reaction, dashed arrows indicate catalysis. The subset = {r1, r2} (shown with bold arrows) is RAF.
Figure 1. A simple example of a catalytic reaction system (CRS) with seven molecule types {a, b, c, d, e, f, g} (solid nodes) and four reactions {r1, r2, r3, r4} (open nodes). The food set is F = {a, b}. Solid arrows indicate reactants going into and products coming out of a reaction, dashed arrows indicate catalysis. The subset = {r1, r2} (shown with bold arrows) is RAF.
Ijms 12 03085f1
Figure 2. The linear relations for the three cases.
Figure 2. The linear relations for the three cases.
Ijms 12 03085f2
Figure 3. The probability Pn of finding RAF sets for different (fixed) catalysis probabilities p and values of n.
Figure 3. The probability Pn of finding RAF sets for different (fixed) catalysis probabilities p and values of n.
Ijms 12 03085f3
Figure 4. The level of catalysis f (n) required for the template-based catalysis case compared to the original (purely random) case, for different values of n.
Figure 4. The level of catalysis f (n) required for the template-based catalysis case compared to the original (purely random) case, for different values of n.
Ijms 12 03085f4
Table 1. The empirical (cases A and B) and theoretical (case C) linear relations.
Table 1. The empirical (cases A and B) and theoretical (case C) linear relations.
AfA(n) =1.0970 + 0.0189n
BfB (n) =–0.4736 + 0.7012n
CfC (n) =1.6339n

Share and Cite

MDPI and ACS Style

Hordijk, W.; Kauffman, S.A.; Steel, M. Required Levels of Catalysis for Emergence of Autocatalytic Sets in Models of Chemical Reaction Systems. Int. J. Mol. Sci. 2011, 12, 3085-3101. https://doi.org/10.3390/ijms12053085

AMA Style

Hordijk W, Kauffman SA, Steel M. Required Levels of Catalysis for Emergence of Autocatalytic Sets in Models of Chemical Reaction Systems. International Journal of Molecular Sciences. 2011; 12(5):3085-3101. https://doi.org/10.3390/ijms12053085

Chicago/Turabian Style

Hordijk, Wim, Stuart A. Kauffman, and Mike Steel. 2011. "Required Levels of Catalysis for Emergence of Autocatalytic Sets in Models of Chemical Reaction Systems" International Journal of Molecular Sciences 12, no. 5: 3085-3101. https://doi.org/10.3390/ijms12053085

Article Metrics

Back to TopTop