The method shown in this analysis is based on two theoretical cornerstones: Nested Canalising Boolean Functions (NCBF) and Conflicts Strategy (CS).
2.1. Nested Canalising Boolean Functions
In the 1940s, C. H. Waddington observed that, despite great genetic and environmental variability, individuals tend to manifest highly specialised characteristics [18]. He introduced the concept of canalisation: a property of systems to buffer variability [15]. The idea is that variability is a key concept for life, one that species benefit from throughout evolution to achieve the unique features and skills needed to succeed in their respective environments. However, once a useful feature has been acquired, variability could provoke its loss, e.g., a mutation could erase the effects of another mutation. It follows that variability alone is not enough: a mechanism is needed to prioritise those variations worth obtaining and discard the rest. Canalisation is such a mechanism, operating as a selection process. It permits the development of sophisticated structures, such as fins or teeth, by filtering the alterations worth preserving. The development of these structures would be impossible without this phenomenon because they require many changes in the same direction.
In a mathematical context, canalisation can be defined through Canalised Boolean Functions (CBF) [19]. A CBF is a function f with n variables in which at least one variable is capable of imposing a certain result. For example, $f(x_1 = a, x_2, \ldots, x_n) = b$ independent of the remaining variables. In other words, the individual f with genotype $x_1 = a$ is bound to manifest the phenotype b, regardless of any perturbation of the system (other variables $x_2, \ldots, x_n$ in f). In this case, $x_1$ is a canalising variable because it imposes a value (b) on f. In this sense, the canalising depth is the number of canalising variables in f. It has been proven that non-canalising functions, whose depth equals zero, are notably more unstable than those whose canalising depth is greater than 0 [20], which is coherent with Waddington’s notion of canalisation.
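The definition of a canalising variable can be checked by brute force on small functions. The following is a minimal sketch, not part of the method itself: the helper name `canalising_pairs` and the two test functions are hypothetical, and the check only finds variables that are canalising on their own (it does not compute the nested depth).

```python
from itertools import product
from typing import Callable, Dict, List, Tuple


def canalising_pairs(f: Callable[..., int], n: int) -> Dict[int, List[Tuple[int, int]]]:
    """Map each canalising variable index to its (canalising value, canalised value) pairs.

    Variable i is canalising with pair (a, b) if fixing x_i = a forces f = b
    for every assignment of the remaining variables.
    """
    found: Dict[int, List[Tuple[int, int]]] = {}
    for i in range(n):
        for a in (0, 1):
            # Collect every output reachable while x_i is held at a.
            outputs = {f(*x) for x in product((0, 1), repeat=n) if x[i] == a}
            if len(outputs) == 1:
                found.setdefault(i, []).append((a, outputs.pop()))
    return found


def f_or(x1: int, x2: int, x3: int) -> int:
    # x1 = 1 imposes the output 1 whatever x2 and x3 are.
    return x1 | (x2 & x3)


def f_xor(x1: int, x2: int, x3: int) -> int:
    # Parity: no single variable can impose the output.
    return x1 ^ x2 ^ x3


print(canalising_pairs(f_or, 3))   # {0: [(1, 1)]}
print(canalising_pairs(f_xor, 3))  # {}
```

The parity function has no canalising variable at all, which makes it an example of a function with canalising depth zero.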
An NCBF is a CBF with maximum canalising depth, that is, a function in which all variables are canalising, as in Equation (1) [19].
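As a point of reference, nested canalising functions are commonly written in the literature in the piecewise form below. The symbols $a_i$ (canalising value of $x_i$) and $b_i$ (corresponding canalised value) are names assumed for this sketch, and the variables are taken to be already ordered by priority:

```latex
f(x_1, \ldots, x_n) =
\begin{cases}
b_1 & \text{if } x_1 = a_1, \\
b_2 & \text{if } x_1 \neq a_1 \text{ and } x_2 = a_2, \\
\;\vdots & \\
b_n & \text{if } x_1 \neq a_1, \ldots, x_{n-1} \neq a_{n-1} \text{ and } x_n = a_n, \\
\neg b_n & \text{if } x_1 \neq a_1, \ldots, x_n \neq a_n.
\end{cases}
```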
Genetic systems exhibit canalising behaviour in line with this buffering of variability. Consequently, NCBF have been applied successfully in the modelling of Gene Regulatory Networks (GRN) [4,5,12]. According to [19], every NCBF can be uniquely represented in the form of Equation (2). The relationship between Equations (1) and (2) is illustrated in the example of canalisation in Section 2.1.
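For orientation, the layered polynomial form of a nested canalising function is usually written in the literature as below. This is a sketch under the assumption that Equation (2) follows that standard form; each $M_i$ is a product of factors of the type (variable ⊕ canalising value), ⊕ is addition modulo 2, and b is the canalised value of the first layer:

```latex
f(x_1, \ldots, x_n) =
M_1\Bigl(M_2\bigl(\cdots\bigl(M_{r-1}(M_r \oplus 1) \oplus 1\bigr)\cdots\bigr) \oplus 1\Bigr) \oplus b
```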
where $M_i = \prod_{j=1}^{k_i} (x_{i_j} \oplus a_{i_j})$. Note that Equation (2) is structured in r layers. Every layer consists of a product ($M_i$) and the other operand of the modulo 2 addition denoted by the symbol ⊕ (where, for example, $1 \oplus 1 = 0$). For instance, the first layer is made of the terms $M_1$ and b. The index i denotes the layer, j indicates the variable within layer i, $k_i$ is the number of variables in each product, and r is the number of products. Every variable $x_{i_j}$ has a canalising value $a_{i_j}$ and a canalised value. For example, the canalised value of the canalising variables in $M_1$ is b. If a variable is thought of as an action or phenomenon, the canalising value is the trigger of the action whilst the canalised value is the effect of the action. For example, in the expression
, the canalising and canalised values for the variable
are 0 and 1 respectively.
Consequently, the variables in the products of outer layers have priority over the variables in the products of inner layers. This is because, as soon as any variable equals its canalising value, its product becomes 0 along with all the nested products, regardless of the values of their variables. This is how the canalising behaviour of Equation (1) is represented in Equation (2).
On the other hand, the number of layers (r) will not necessarily equal the number of variables (n); this happens only when every layer is made of one variable ($k_i = 1$). It can be seen in Equation (2) that all the variables in the same layer have the same canalised value. Moreover, the only canalised value that is specified is the value for the first layer ($i = 1$), that is, b. It is not necessary to specify more in Equation (2) because in boolean algebra $x \oplus 1 = \neg x$; they are just different notations. Consequently, the nested negations in Equation (2) cause the canalised values to alternate along the layers. In other words, the canalised value is b in the first layer, $\neg b$ in the second, b in the third, and so on. In a mathematical context, variables must be ordered by their priority according to this pattern.
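The priority of outer layers and the alternation of canalised values can be observed by evaluating a layered function directly. The sketch below assumes the standard layered form $M_1(M_2(\cdots(M_r \oplus 1)\cdots) \oplus 1) \oplus b$ with $M_i$ a product of terms (variable ⊕ canalising value); the function name `evaluate_layered`, the variable names, and the example layers are all hypothetical.

```python
from typing import Dict, List, Tuple

# A layer is a list of (variable name, canalising value) pairs.
Layer = List[Tuple[str, int]]


def evaluate_layered(layers: List[Layer], b: int, x: Dict[str, int]) -> int:
    """Evaluate M_1(M_2(...(M_r XOR 1)...) XOR 1) XOR b over GF(2)."""

    def layer_product(layer: Layer) -> int:
        # The product is 0 as soon as one variable equals its canalising value.
        value = 1
        for name, a in layer:
            value &= x[name] ^ a
        return value

    # Start from the innermost layer and wrap outwards.
    acc = layer_product(layers[-1])
    for layer in reversed(layers[:-1]):
        acc = layer_product(layer) & (acc ^ 1)
    return acc ^ b


# Hypothetical three-layer function with b = 0.
layers = [[("x1", 1)], [("x2", 0)], [("x3", 1), ("x4", 1)]]

# Layer 1 fires (x1 = 1): output is b = 0 whatever the other variables are.
print(evaluate_layered(layers, 0, {"x1": 1, "x2": 0, "x3": 1, "x4": 0}))  # 0
# Layer 1 silent, layer 2 fires (x2 = 0): output is NOT b = 1.
print(evaluate_layered(layers, 0, {"x1": 0, "x2": 0, "x3": 1, "x4": 1}))  # 1
# Layers 1 and 2 silent, layer 3 fires (x3 = 1): output is b = 0 again.
print(evaluate_layered(layers, 0, {"x1": 0, "x2": 1, "x3": 1, "x4": 0}))  # 0
```

Evaluating from the innermost layer outwards mirrors the nesting of the parentheses, so a product that collapses to 0 in an outer layer discards everything computed inside it.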
In practice, the advantage of employing NCBF over other approaches is that the search for boolean functions is restricted to NCBF only. This strategy is based on combinatorics restricted by biological information and by models that depict the phenomenon of canalisation, which greatly reduces computation time. However, the total time needed to obtain and validate networks is still very large.
Example of Canalisation
To depict this phenomenon, a simplified example of eye colour inheritance is considered. Let us consider a couple of individuals who are homozygous for a particular gene, for example the gene responsible for eye colour. Assuming that one individual has blue eyes and the other brown eyes, all their descendants will be heterozygous for this gene, holding two different alleles, one from each parent. The brown allele (B) dominates over the blue allele (L) because, when an individual is heterozygous, the exhibited phenotype is the one determined by the brown allele. Nature allows variability through mutations (blue allele), although it prioritises those features with better adaptive outcomes (brown allele). Thus, only a few variants (phenotypes) are exhibited in the majority of individuals despite there being numerous alternatives (alleles) for each feature.
This is a simplified example of canalisation. In order to model it, a function is defined to answer the question: Will the phenotype manifest a brown colour? The answer can be either yes (1) or no (0). The behaviour of this function is described in Equation (3), following Equation (1). An NCBF with this behaviour is shown in Equation (4), following the structure of Equation (2):
According to Equation (
4), possessing the brown allele implies that
and, similarly, possessing the blue allele implies that
. Consequently, the only case in which the blue allele manifests is in homozygosity, that is to say, 25% of all the possible cases, whereas the prioritised allele will determine the rest (75%). Note that, in case that
and
,
and, although this is mathematically correct, such a combination of arguments is impossible because every individual must hold at least one type of allele. Thus, canalisation can be expressed through NCBF.
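The eye colour example can be enumerated to recover the 25%/75% split mentioned above. The sketch below makes several assumptions that are not taken from the equations of this section: the argument names `has_brown` and `has_blue`, the encoding 1 = allele present, and the output 1 for the biologically impossible case with no allele (the value the standard layered form would give). It compares a nested "if" rule with an equivalent modulo 2 expression.

```python
from itertools import product


def brown_nested(has_brown: int, has_blue: int) -> int:
    """Nested rule: the brown allele is checked first, so it has priority."""
    if has_brown == 1:
        return 1  # brown allele present: brown phenotype
    if has_blue == 1:
        return 0  # only the blue allele present: blue phenotype
    return 1      # assumed default; no allele at all cannot occur biologically


def brown_layered(has_brown: int, has_blue: int) -> int:
    """Same behaviour as a two-layer modulo 2 expression (assumed form)."""
    return ((has_brown ^ 1) & has_blue) ^ 1


cases = list(product((0, 1), repeat=2))
# Both formulations agree on every argument combination.
assert all(brown_nested(*c) == brown_layered(*c) for c in cases)

blue_share = sum(1 for c in cases if brown_nested(*c) == 0) / len(cases)
print(blue_share)  # 0.25
```

Only the combination holding the blue allele alone yields the blue phenotype, which is one of the four argument combinations.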
2.2. Conflicts Strategy
In this work, Conflicts Strategy (CS) is the name given to the theory developed in [8]. In CS, the basic unit is what we will refer to as a pathway: the relationship between two nodes of a directed graph. These relationships establish either activation or inhibition dependencies among nodes. This definition of pathway is mathematically represented by Equation (5):
Here, Q and P are the functions that drive the behaviour of two nodes in a given graph, u represents the time steps required to meet the relation between Q and P, v is the value that Q must take to trigger the action, and b is the value that P must then take, that is, the effect of the action. Thus, the pathways used in CS are like the ones shown in Equation (6), where D, H, T, and R are graph nodes, and ∨, ∧, and ¬ denote the OR, AND, and NOT operators respectively:
The objective of the inference process is to obtain the expressions that best describe the nodes' behaviour; in other words, the node expressions are unknown during the inference. For instance, the graph in Figure 1a can be separated into the pathways of Figure 1b. In the second pathway of the set, the dependency conveyed is as follows: “If the function which controls the node X equals 1, after 1 time step the function which controls the node Y must equal 1”.
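A pathway, as described above, can be held in a small record with the four ingredients Q, v, P, b plus the delay u. The following is a minimal sketch rather than the representation used in [8]: the class name `Pathway`, its field names, and the `holds_on` check are hypothetical, and the consequent is simplified to the value of a target node instead of an arbitrary function P.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, int]  # node name -> 0 (inactive) or 1 (active)


@dataclass(frozen=True)
class Pathway:
    antecedent: Callable[[State], int]  # Q, evaluated on the current state
    v: int                              # value Q must take to trigger the action
    target: str                         # node whose value stands in for P (assumption)
    b: int                              # value the target must take afterwards
    u: int = 1                          # number of time steps between cause and effect

    def holds_on(self, trajectory: List[State]) -> bool:
        """Check that every time Q = v at step t, the target equals b at step t + u."""
        for t in range(len(trajectory) - self.u):
            if self.antecedent(trajectory[t]) == self.v:
                if trajectory[t + self.u][self.target] != self.b:
                    return False
        return True


# "If X equals 1, after 1 time step Y must equal 1."
x_activates_y = Pathway(antecedent=lambda s: s["X"], v=1, target="Y", b=1, u=1)

consistent = [{"X": 1, "Y": 0}, {"X": 0, "Y": 1}, {"X": 0, "Y": 0}]
violating = [{"X": 1, "Y": 0}, {"X": 1, "Y": 0}]
print(x_activates_y.holds_on(consistent))  # True
print(x_activates_y.holds_on(violating))   # False
```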
Since boolean logic admits only 2 values, nodes can be either active (1) or inactive (0). In the same way, every node in a graph divides its linked nodes into activators and inhibitors. For example, in the graph in Figure 1, the node Y has one activator (node X) and one inhibitor (node Y itself). The reason is that the pathway linking X to Y establishes an activation dependency from X to Y. In other words, X activates Y because it makes Y equal to 1. Thus, X is an activator of Y and, by the same reasoning, Y is an inhibitor of Y.
What, then, does a pathway depict when both its triggering value and its resulting value are 0? It states that when there is no concentration of X, there will be no concentration of X at the following time step. Consequently, according to this pathway, X is a necessary condition for X, which is coherent with the behaviour of an activator. Therefore, throughout this work, a node is considered an activator when it manifests one of the pairs {(0, 0), (1, 1)} through at least one pathway; otherwise it is considered an inhibitor. As a consequence, the same graph can give rise to several combinations of pathways, which influences the obtained model. It is important to emphasise that this is the definition of activator and inhibitor used in this analysis for both CS and NCBF.
What happens, then, when one activator and one inhibitor coincide in the same node? This is what occurs in Figure 1a for node Y when X and Y are simultaneously active ($X = 1$, $Y = 1$). According to the pathways, there is a contradiction because Y would have to be both active ($Y = 1$) and inactive ($Y = 0$) at the following time step. These contradictions are called conflicts. In this case, the conditions (left sides) of the pathways of Equation (7) overlap in the region where both conditions hold, and the conflict emerges in that region:
Since conflicts must be solved in order to build BN models, a method of conflict resolution is proposed. This procedure is a modification of the one explained in [8], as will be made explicit later.
For this purpose, two concepts are introduced. The first is the support (supp) which, according to [8], is the set of argument combinations that make a boolean function f equal to 1. For example, given the function
(Equation (
6)),
, where the order of the arguments in the tuples is
D, H. The second concept is that of the antecedent and consequent of a pathway, which are its left and right sides respectively.
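With the support defined, a conflict between two pathways can be detected as a non-empty intersection of the supports of their antecedents when their consequents demand different values. The sketch below is only illustrative: the helper `supp` enumerates all argument tuples by brute force, and the two antecedents encode the activator/inhibitor situation described for node Y (X activates Y, Y inhibits itself) with the assumed argument order (X, Y).

```python
from itertools import product
from typing import Callable, Set, Tuple


def supp(f: Callable[..., int], n: int) -> Set[Tuple[int, ...]]:
    """Support of f: every argument tuple of length n for which f equals 1."""
    return {x for x in product((0, 1), repeat=n) if f(*x) == 1}


def activation_antecedent(x: int, y: int) -> int:
    # Condition of the pathway "X = 1 leads to Y = 1".
    return x


def inhibition_antecedent(x: int, y: int) -> int:
    # Condition of the pathway "Y = 1 leads to Y = 0".
    return y


# Both conditions hold at once only when X and Y are simultaneously active.
overlap = supp(activation_antecedent, 2) & supp(inhibition_antecedent, 2)
print(overlap)  # {(1, 1)}
```

Because the two consequents require Y = 1 and Y = 0 respectively, any tuple in `overlap` is a state in which the pathways contradict each other.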
In order to avoid the conflict, the idea is to prioritise one pathway over the other. Note that priority between pathways is very close to the concept of canalisation: canalisation is the prioritised expression of certain genes over others and, consequently, it can be obtained through CS.
Let us consider Figure 2a, which represents the supports of the antecedents of both pathways before any modification. It can be observed that both supports overlap in a region that corresponds to the conditions under which the system manifests the conflict.
- Step 1. Prioritisation of one pathway over the other. In this example, the first pathway is arbitrarily prioritised, although a priority criterion will be explained in the next section.
- Step 2. Modification of the non-prioritised pathway to avoid the conflict. The modified expression is obtained through the intersection of the original support of the non-prioritised pathway with the region of the space that does not belong to the prioritised pathway:
, what is equivalent to
(green regions in
Figure 2). Then, the modification consists of multiplying the negated antecedent of the prioritised pathway by the antecedent of the non-prioritised pathway. In our example, since
and
, the second pathway becomes
, so there is no overlapping region.
- Step 3. Introduction of a new pathway. Finally, a pathway is introduced to conserve the dynamics of the non-prioritised pathway, whose effect cannot be ignored due to its biological meaning. To preserve this dynamic, the new pathway obtains the non-prioritised value in the node in two time steps, whereas the prioritised value is obtained in one time step. In our example the new pathway is: .
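Step 2 above can be sketched on supports: multiplying the negated antecedent of the prioritised pathway by the antecedent of the other pathway removes exactly the overlapping region. The code below is a minimal illustration, not the procedure of [8]; the names `supp` and `remove_overlap` are hypothetical, the antecedents X and Y (argument order (X, Y)) are an assumed example, and Step 3 is not covered.

```python
from itertools import product
from typing import Callable, Set, Tuple

BoolFn = Callable[..., int]


def supp(f: BoolFn, n: int) -> Set[Tuple[int, ...]]:
    """Support of f: every argument tuple of length n for which f equals 1."""
    return {x for x in product((0, 1), repeat=n) if f(*x) == 1}


def remove_overlap(prioritised: BoolFn, other: BoolFn) -> BoolFn:
    """Return the modified antecedent (NOT prioritised) AND other."""

    def modified(*x: int) -> int:
        return (1 - prioritised(*x)) & other(*x)

    return modified


def first(x: int, y: int) -> int:
    return x  # antecedent of the prioritised pathway (assumed example)


def second(x: int, y: int) -> int:
    return y  # antecedent of the non-prioritised pathway (assumed example)


second_modified = remove_overlap(first, second)

# The new support is the old one minus the region claimed by the prioritised pathway.
print(supp(second_modified, 2))                   # {(0, 1)}
print(supp(second_modified, 2) & supp(first, 2))  # set()
```

The empty intersection confirms that, after the modification, no state triggers both pathways at once.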
It is important to mention that, as noted in [8], CS does not always find a solution to the problem. One reason is that the modifications to the pathways are performed according to the information already known about the BN, namely, those same pathways. Since they form a partial representation of the system, the inference process is limited to a fraction of all the solutions that could be obtained. Hence, the methodology described in [8] misses potential solutions as a consequence of this problem of partial information.