1. Introduction
The Topological Hidden Markov Model (THMM) from [1] is constructed from a mixture of normal distributions, which is then extended to a mixture in locally convex topological spaces in order to comprehend probability density functions in infinite-dimensional spaces. Differently, in the present work, the methods of sheaf cohomology are used instead to construct the chain for deep-machine-learning multiple sequencing.
The present paper is aimed at proving how to construct a rectangular-matrix chain for multiple sequencing starting from a square two-dimensional-matrix Markov chain after adding Morse complexes (i.e., as from [2]).
For these purposes, the two-dimensional Markov models are considered. The more general context of rectangular chains is preferred, in which the compatibility with larger Markov models is ensured after the singular values. The corresponding probability spaces are constructed. A sheaf of constants is chosen in order to extend the two-state Markov models. The blocks to construct the (also rectangular) new chain are added as Morse complexes, whose compatibility is proven with the simplices of the corresponding graph, i.e., with the filtration(s). The deep Markov models are constructed this way. The more general construction of rectangular-matrix Markov models is presented. The applications to develop sequencing are newly analytically modelized; this way, the dependence of the probabilities of the nucleotides in the sequences is explained.
In the present paper, the Jukes–Cantor model [3] is considered, according to which the amino-acids in a sequence have equal probability; a Poisson model was proposed in [4] and applied in [5] in order to write the expression of the probability matrix of a Markov chain.
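For reference, the one-parameter Jukes–Cantor transition probabilities take the standard textbook form (our rendering, not necessarily the notation of [3]):
\[
P(\text{same state after time } t) = \tfrac{1}{4} + \tfrac{3}{4}\, e^{-4\alpha t},
\qquad
P(\text{a given different state after time } t) = \tfrac{1}{4} - \tfrac{1}{4}\, e^{-4\alpha t},
\]
with \(\alpha\) the uniform substitution rate: all the off-diagonal entries of the transition matrix are equal, which encodes the equal-probability hypothesis recalled above.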
As an application, the profile Markov model, which can be extracted from [6], for sequencing is newly analytically written; the insertion of new sequences, i.e., that of a single amino-acid and that of a sequence, is newly analytically written in order to extend the profile Markov model to a Jukes–Cantor model [3] via the interrogation proposed in [5] (where the interrogation from [5] follows the long-standing problem after [4]). After the review of several two-dimensional square-matrix Markov chain sequencing models, the limitations of the deep Markov models are outlined. The way to analytically manage insertions and deletions in the sequences is newly proven.
In more detail, the methods of sheaf cohomology are used in order to analytically add blocks to a matrix and to analytically delete blocks from a matrix, whose action corresponds to adding insertions to a sequence and to performing deletions from it. The partial matchings that are to be managed are constructed starting from a chosen sequence. In particular, the Morse simplicial complexes that are treated are those that allow us to reconduct to the wanted filtration of the probability space.
The paradigm developed at the initial stage defined in the above can be of use to shape the profile Hidden Markov Model version treated in [7]. This item of the paradigm allows one to define the probability spaces of the two-state pair MM and of the profile HMM.
The applications to computation in biology are discussed in [8].
The measure of the probability space is the one to which the properties of the Markov chain allow one to upgrade, starting from the sigma-algebra of the Borel (sub-)sets that constitute the state space of the models.
From the probability space, a metric space can be defined, therefore endowed with a metric, on which there lives the manifold on which the graphs associated with the chain are defined: it is one of the aims of the present paper to discuss and to prove the well-posedness of the concept of distance within the metric spaces, such that the distance is well-posed if two chains are obtained from the initial two-item sequences, to which the opportune sheaf (or sheaves) are applied.
The possibility to construct rectangular-matrix chains is considered as well; this construction allows one to schematize the existence of latent states [9,10]. The sheaf-cohomology techniques here selected also apply to rectangular matrices; the analytical managing of the Morse simplicial complexes allows one to reconduct to the wanted filtration.
As a further advantage, it is possible to reproduce the site-dependence of the probability of finding the items in a sequence.
Therefore, the model is apt to be applied to the analysis of sequences of amino-acids, as outlined in [11].
In more detail, in the present paper, it is possible to analytically write the probabilities of finding a certain number of items in a selected order. Accordingly, the new quality of the amino-acids of exhibiting different probabilities of being found in a sequence, as very recently found in [12], is analytically modelized.
As an application of the methods developed here, two nucleotide-substitution models are compared: the method from [13] is proven to be obtained after the addition of the blocks of the sheaf needed to reproduce the wanted probability space, while the model from [14] is proven inconsistent with the presented protocol.
Therefore, the present paper contains answers to the interrogations delivered in [15] and in [16] as far as the comparison of methods of nucleotide substitutions is concerned. An illustrative example is contained in which two methods are compared: one is found to be an extension of the two-state Jukes–Cantor model, while the other one is proven as not being obtainable from the Jukes–Cantor model. The relevance of constructing sequences containing pairwise repetitions of the same pattern as far as timely requests are concerned about the evolution of species is outlined in [17].
The prospective studies to be envisaged are the recognition of patterns in the sequences and the outlining of the patterns from the experimental errors, as requested in [18]: the interrogation can be solved as indicated in [19] and applied as in [20]. The patterns are ibidem shown to be contained in rectangular-matrix constructions.
The implementation of the deep Markov models can therefore be argued to be apt for development with the 'context-sensitive' Markov models, i.e., from [21], where the rendering of the wanted states as 'context-sensitive' can be achieved using the substitution techniques developed in the present paper.
To apply the methods of [19], the singular-value decomposition is requested in order to control the filtration of the probability spaces.
The paper is organized as follows:
In Section 2, several models of sequencing are discussed, each of which exhibits stringent peculiarities for application.
In Section 3, the ergodicity of HMMs with a 'generalized observation structure' is ensured.
In Section 4, the deep Markov models are reviewed and their limitations are outlined.
In Section 5, the probability spaces are constructed, whose features complete the limitations of the deep Markov models.
In Section 6, the filtration of the probability space is modelized according to the choice of a (constant) sheaf; furthermore, the way to add insertions is proven to be by adding Morse complexes, for which the compatibility with the simplices of the graph holds: the method to perform deletions is, in this way, also delineated.
In Section 7, the outlook is proposed.
In Appendix A, further complements of the cochain theory are studied in detail in order to illustrate the proof of the new Theorem 1.
In Appendix B, the method is demonstrated to be of use to develop the multiple sequencing of amino-acids: the chains for sequencing are analytically constructed from the profile Markov model; the metric of the corresponding manifold is discussed.
In Appendix C, two methods of amino-acid substitutions are compared: one of them is proven to be obtained from the Jukes–Cantor model after application of the Morse operator, while the other is demonstrated as consisting of different hypotheses.
2. Introductory Material
2.1. Profile-Hidden Markov Model
As described in [22], the construction of a Profile-hidden Markov Model (pHMM) starts with the consideration of a 'multiple sequence alignment'.
Insertions and deletions are modelized.
The occupancy of a position and the probability at each position of the alignment are encoded in the pHMM.
In [23], the pHMM is commented to apply to large-scale sequence analyses.
Profile Hidden Markov State models are introduced in [24]. The pHMM is constructed from an opportune HMM of an MSM. While the HMM consists of 'training sets' of unaligned sequences, the pHMM allows one to obtain the multiple alignment of all the training sequences.
In [8], the pHMM is analyzed to exhibit higher sensitivity and higher 'recall' rates compared to 'pairwise alignment methods'.
The software implementation of the pHMM is reviewed in [23].
From [7,22], a profile for an alignment is understood as a trivial HMM with one 'match state' for each column. The consecutive match states have consecutive probabilities.
The 'output probabilities' are defined as the probabilities given after the probabilities of finding one particular match state in the corresponding column.
'Insertions' are defined as 'portions of sequences that do not overlap those found in the previous constructions'. They are represented after the corresponding state space, whose states modelize the inserts of the j-th column of the alignment. The 'output probabilities' of the insert states are requested to equal the background probabilities.
The probabilities are given; such probabilities affect the pHMM such that the insertions can take place in different portions of the alignment. These probabilities can be time-dependent as well: the 'affine gap penalties' are this way defined.
From [22], the construction of a pHMM starts with the consideration of a 'multiple alignment'. Insertions and deletions can be modelized.
The occupancy of a position and the probability at each position are encoded in the 'profile'.
Given the sample, the wanted set of states is chosen: the pertinent transition probabilities are specified under the initial condition, and the 'emission probabilities' are specified accordingly; the notation employs the row–column description of the entries of the matrix. The corresponding pair HMM relates the probability distributions over certain sequences of pairs of observations.
These probabilities can be time-dependent as well and are a priori different from the probabilities found in the previous calculations: the 'affine gap penalties' are this way defined.
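As a minimal illustrative sketch (a toy example of ours; all names and parameter values are hypothetical and do not reproduce the construction of [22]), the per-column match/insert structure described above can be encoded as follows:

```python
import numpy as np

# Toy profile-HMM layer structure (hypothetical example, not from [22]).
# Per alignment column j: a match state M_j with its own emission
# distribution; insert states emit with the background distribution.
n_columns = 3          # length of the (toy) alignment profile
alphabet = "ACGT"      # nucleotide alphabet for concreteness

rng = np.random.default_rng(0)

def random_distribution(size):
    """Return a normalized random vector, i.e., a discrete distribution."""
    v = rng.random(size)
    return v / v.sum()

# Emission probabilities: one distribution per match state;
# insert states emit with the (uniform) background distribution.
match_emissions = [random_distribution(len(alphabet)) for _ in range(n_columns)]
background = np.full(len(alphabet), 1.0 / len(alphabet))

# Transition probabilities out of each match state:
# (to next match, to insert, to delete); each row sums to 1.
transitions = [random_distribution(3) for _ in range(n_columns)]

for j in range(n_columns):
    print(f"column {j}: P(M->M, M->I, M->D) = {np.round(transitions[j], 3)}")
```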
2.2. The Context-Sensitive Hidden Markov Models
The context-sensitive Markov Models (csMMs) are presented in [21]. Context-sensitive HMMs are used to model long-range dependencies in symbol sequences, a case to which the study of amino-acids can be reconducted.
csMMs are apt to describe ‘nested dependences’ between symbols.
The csMMs are developed in order to study long-range correlations after ‘rendering’ some of the states in the model as ‘context-sensitive’.
2.3. Pair Hidden Markov Models
From [25], the Pair hidden Markov Models (pair HMMs) are constructed. The sample is given, together with the wanted set of pairs of states; the transition probabilities are written, specified under the initial conditions. The 'emission probabilities' of the pair HMM are different from those obtained in the profile HMM. The pair HMM relates the probability distributions over certain sequences of pairs of observations.
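As a hedged sketch of the state space just described (the gap-open and gap-extend parameters delta and eps below are our own illustrative choices, not the parameters of [25]), the commonly used three-state pair-HMM transition structure reads:

```python
import numpy as np

# Toy pair-HMM state space: state M emits an aligned pair of symbols;
# states X and Y emit a symbol against a gap in one of the two sequences.
states = ["M", "X", "Y"]
delta, eps = 0.1, 0.3   # gap-open and gap-extend probabilities (illustrative)

# Row-stochastic transition matrix over (M, X, Y).
T = np.array([[1 - 2 * delta, delta, delta],
              [1 - eps,       eps,   0.0  ],
              [1 - eps,       0.0,   eps  ]])

assert np.allclose(T.sum(axis=1), 1.0)   # each row is a distribution

# Sample a short path of hidden states.
rng = np.random.default_rng(2)
s = 0
for _ in range(5):
    print(states[s], end=" ")
    s = rng.choice(3, p=T[s])
print()
```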
From [26], the use of conditional random fields in pair HMMs is discussed.
After the use of conditional random fields (CRFs) for time-series data, linear chains are obtained [27].
2.4. Deep Markov Models of Sequential Data
The software implementations featuring the deep Markov models of sequential data after [28] are written in [29].
From the analysis ibidem, the following developments are constructed.
The chain of latent variables is introduced: each latent variable from the chain is conditioned after the previous latent variables.
Potential highly non-linear dynamics are prospected.
Definition 1. The wanted transition probabilities, which rule the dynamics of the latent variables, and the ‘emission probabilities’, which govern the observations, should be parameterized after non-linear neural networks.
Example 1. The sequence of observations is of length 3; there exists a corresponding sequence of coherent (latent) random variables.
The implementation of the Markov properties of the model is to be studied.
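A minimal sketch of Definition 1 and Example 1, assuming small tanh multi-layer perceptrons for the transition and emission means and unit-variance Gaussian noise (all names and sizes are our own illustrative assumptions, not the architecture of [28,29]):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(sizes):
    """Create random weights for a small MLP (illustrative only)."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """tanh MLP forward pass; the last layer is linear."""
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

z_dim, x_dim = 2, 3
transition_net = mlp([z_dim, 8, z_dim])   # mean of p(z_t | z_{t-1})
emission_net = mlp([z_dim, 8, x_dim])     # mean of p(x_t | z_t)

# Generate a length-3 sequence of latents and observations,
# with unit-variance Gaussian noise for both distributions.
z = rng.standard_normal(z_dim)
for t in range(3):
    x = forward(emission_net, z) + rng.standard_normal(x_dim)
    print(f"t={t}: x = {np.round(x, 3)}")
    z = forward(transition_net, z) + rng.standard_normal(z_dim)
```

The Markov property is visible in the loop: each latent state is sampled from the previous latent state only.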
3. About Ergodicity of HMMs with a ‘Generalized Observation Structure’
The conditions of ergodicity of HMMs with a 'generalized observation structure' are addressed in [30].
Under the opportune set of hypotheses, which will be tested in the following for the verification of the extent to which the work [30] is useful in the present approach, the existence of a unique invariant measure of the pair (hidden process, observation process) has to be verified.
The corresponding triple is written within the asymptotic stability of probability.
The problem of ‘incorrectly initialized filters’ will be further addressed in the present work.
The asymptotic properties of the filter and the state estimator are founded on both the observations and on the (knowledge of the) initial data.
A minimal invariant measure and a maximal invariant measure can be defined.
Let a discrete-time process on a probability space be given, together with its transition kernel.
The second component is called the 'observation process'; its transition kernel is parameterized after the current values of the hidden Markov process.
Let the set of states be a Borel subset endowed with a σ-algebra: here, it is necessary to remark that the implementation from the hidden process to the observation process is non-trivial within the analysis of the ergodicity.
The transition probabilities of the hidden process and of the observation process are written accordingly. It is crucial to remark in the present work that, from [30], for fixed arguments, the two pertinent set functions must be defined; in the case where they are defined, they are probability measures on the hidden state space and on E, respectively.
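As a hedged reading of this passage, the joint transition kernel of the pair process takes, under the usual conditional-independence assumptions, the standard form (our notation, not necessarily that of [30]):
\[
T\big((x,y),\, A \times B\big) \;=\; \int_{A} Q(x', B)\, P(x, \mathrm{d}x'),
\]
where \(P\) is the kernel of the hidden process and \(Q\) the observation kernel: for fixed \((x,y)\), the map is a probability measure, and for a fixed rectangle \(A \times B\) it is a measurable function.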
It is further crucial to remark from [30] that, for a fixed Borel subset A of the hidden state space and for a fixed Borel subset B of E, the application that acts by sending the pair of subsets to the corresponding probabilities has to be demonstrated to exist, as far as the new probabilities p admit a measure on the suitable Borel σ-fields.
The hypothesis is made in [30] that an opportune definition holds on the kernel, in which the relevant object from the probability space is the space of probability measures.
The observation structure is claimed to be admitted in [30] for the models in which the functions h are diffeomorphisms on the state space.
It is one of the aims of the present work to analyze the construction.
4. About the Proposal of Deep Markov Models
In [28], the use of Gaussian state space models in the 'generative modeling of sequential data' is recalled as a long-standing attribution.
Algorithms to learn a wide class of linear state space models and of non-linear state space models have been developed.
Developments were also implemented in which the emission distributions and the transition distributions were molded after neural networks.
The posterior distributions were obtained so as to be compared with the outcomes of a 'learning algorithm', in which both the properties of a 'compiled inference network' and those of the 'generative model' are expanded, where room is left for a parameterized variational approximation.
The scalability properties were exhibited.
The ‘structured approximation’ is shown to produce ‘posterior results’ with ‘significantly higher held-out distributions’. It is our aim to verify the statement analytically.
HMMs and Recurrent Neural Networks (RNNs) can be implemented with linear state space models or with Gaussian state space models.
Definition 2. A deep Markov model (DMM) is a ‘class of generative models’ in which the ‘classic’ linear emission and the classic transition distribution are substituted with complex multi-layer perceptrons (CMLPs).
There exist general state space models (GSSMs) that are apt to keep the Markovian structure of the HMMs; in this case, however, the 'representational power of the deep neural network to treat high-dimensional data is leveraged'.
If the DMM is 'improved' after increasing the latent states of the successive time step, the DMM is ibidem hypothesized to be interpreted as a (possibly 'more restrictive') stochastic RNN [31], as far as variational RNNs are concerned [32].
The linear algorithm proposed ibidem is valid for executing 'stochastic gradient ascent' of the variational lower bound of the 'likelihood' [33].
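For concreteness, the variational lower bound in question is, in common notation (our rendering of the standard ELBO, after [33]):
\[
\log p_\theta(x_{1:T}) \;\geq\; \mathbb{E}_{q_\phi(z_{1:T} \mid x_{1:T})}\!\big[\log p_\theta(x_{1:T} \mid z_{1:T})\big] \;-\; \mathrm{KL}\!\big(q_\phi(z_{1:T} \mid x_{1:T}) \,\big\|\, p_\theta(z_{1:T})\big),
\]
which is maximized by stochastic gradient ascent in the generative parameters \(\theta\) and the variational parameters \(\phi\).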
A new family of 'structured inference networks' is assessed in [28]; it is parameterized after 'recurrent neural networks', and the following three possibilities are comprehended:
- (i)
The ‘generative model’ is already known and assessed;
- (ii)
The functional form of the parameters estimation is known;
- (iii)
The learning deep Markov model is known.
5. The Probability Spaces
The proposed HMM is prepared from the complete model.
From the complete probability space for the state space, the probability is defined.
The HMM is defined as a probability space with its probability distribution, together with a measurable space with its own probability distribution.
The existence of the HMM is based on the verification that the map of Equation (8) is still a measurable function. In the case where the requests after Equation (8) are verified, the section (product of σ-fields) F defines a unique measure.
Not every choice allows for such a construction: this is established after the verification of the measure. From this, it follows that the two models are on different manifolds.
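For orientation, the unique measure in question extends, in the standard construction (our notation), the finite-dimensional distributions
\[
\mathbb{P}\big(x_{1:n},\, y_{1:n}\big) \;=\; \mu(x_1)\, g(y_1 \mid x_1) \prod_{t=2}^{n} p(x_t \mid x_{t-1})\, g(y_t \mid x_t),
\]
with \(\mu\) the initial distribution, \(p\) the transition probabilities, and \(g\) the emission probabilities; the measurability of the map in Equation (8) is what permits the extension to the product σ-field.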
6. Sheaf Cohomology of Models of Sequential Data
The two-dimensional matrices from [11,34,35] have an eigenvalue equal to 1, i.e., they are stochastic matrices.
The construction of the rectangular matrix \(M\) in the thin singular-value decomposition is achieved as
\[
M = U \Sigma V^{T},
\]
where the apex \(T\) indicates the transpose. In the present case, the wanted vectors are chosen for the purposes of Appendix A. For the present purposes, it is even more conformable to investigate the compact singular-value decomposition
\[
M = U_{r} \Sigma_{r} V_{r}^{T},
\]
in which only the \(r\) non-vanishing singular values are considered.
The singular values of \(\Sigma\) allow for a decomposition in which the columns of \(U\) and of \(V\) are taken from, in general, two sets of orthonormal bases.
The request that the factors be unitary would lead to the definition of one corresponding normalized orthonormal basis.
It will be interesting to verify whether the matrix of the decomposition is a partial isometry.
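As a concrete check (our own toy matrix, chosen only for illustration), the thin and compact singular-value decompositions of a rectangular row-stochastic matrix can be computed and verified as follows:

```python
import numpy as np

# Toy 2 x 3 row-stochastic matrix: each row sums to 1.
M = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.3, 0.4]])

U, s, Vt = np.linalg.svd(M, full_matrices=False)   # thin SVD: M = U diag(s) Vt
r = int(np.sum(s > 1e-12))                         # number of non-vanishing singular values

# Compact SVD keeps only the r non-zero singular values.
Ur, sr, Vtr = U[:, :r], s[:r], Vt[:r, :]
assert np.allclose(M, Ur @ np.diag(sr) @ Vtr)

# Ur has orthonormal columns, i.e., it acts as a partial isometry.
print("singular values:", np.round(sr, 4))
print("Ur^T Ur = I:", np.allclose(Ur.T @ Ur, np.eye(r)))
```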
From [2], the procedures are extended here in order to add at least one block on the diagonal of the matrix, to discuss whether it is possible to pass from the two-state Markov models to the deep Markov model of sequential data [28,29].
The wanted chain can be put on a manifold, on which a graph is constructed after the path of simplices implied by the matching. A sheaf is fixed on a simplicial complex L.
The choice of the sheaf will be specified in the following as a sheaf of constants.
The following new definition is obtained starting from Definition 5.5 of [2]:
Definition 3. For every dimension k, the vector space of k-cochains of L with coefficients in the sheaf is written as a product over the k-simplices.

Here, one newly chooses from [2] to develop the case in which the vertices of L are ordered. This way, it is possible to define the topological models (of which the 3-state model is that with the lowest dimensionality). This way, the following new definition is given:

Definition 4. For each k, the k-th coboundary map of L with coefficients in the sheaf is the linear map from the k-cochains to the (k+1)-cochains, with coboundary operator \(\delta^{k}\).

Remark 1. The arrow in Equation (15) is defined as a 'block action'.

For each pair of simplices \(\sigma \subseteq \tau\) such that \(\dim \tau = \dim \sigma + 1\), the component of \(\delta^{k}\) is written entrywise.
The sequence of cochain spaces forms a cochain complex over L.
The composition of two consecutive coboundary maps vanishes, \(\delta^{k+1} \circ \delta^{k} = 0\), for every sheaf and, therefore, in particular, also for the sheaf of constants.
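Schematically, the resulting cochain complex has the standard shape (the symbols are ours):
\[
0 \longrightarrow C^{0}(L) \xrightarrow{\;\delta^{0}\;} C^{1}(L) \xrightarrow{\;\delta^{1}\;} C^{2}(L) \longrightarrow \cdots, \qquad \delta^{k+1} \circ \delta^{k} = 0 .
\]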
The Morse chain complex is now discussed.
Let K be a simplicial complex with ordered vertices.
Let \(\sigma\) and \(\tau\) be any simplices of K.
The coefficient of \(\sigma\) in the boundary of \(\tau\) is then defined. The coefficient is non-vanishing iff \(\sigma\) is a codimension-one face of \(\tau\). Here, we are interested in this determination and define the coefficient as 1.
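For reference, with ordered vertices the standard simplicial boundary operator reads (our notation; \(\widehat{v_i}\) denotes omission of the i-th vertex):
\[
\partial_k\, [v_0, \dots, v_k] \;=\; \sum_{i=0}^{k} (-1)^{i}\, [v_0, \dots, \widehat{v_i}, \dots, v_k],
\]
so that the coefficient of a codimension-one face is \(\pm 1\), consistently with the normalization to 1 chosen above.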
In [2], page 98, an acyclic partial matching is fixed: there, an acyclic partial matching is defined as one whose G-paths are all gradient.
The definition of 'chain complex' therefore follows Theorem 8.4 ibidem.
From Definition 8.6 ibidem, which is aimed at associating an algebraic contribution to each G-path, the following new definition is given:
Definition 5. The weight of the G-path is defined as the product of the algebraic contributions of its steps.

Differently from [2], it is our aim to express the 'collected' version of the weight of the G-path, to then choose the algebraic weights all equal to 1, and to study the k-th coboundary map.
The weights are also written in the collected form of Equation (22).
We need to specify Equation (22) for the present case for the purposes of the sequencing procedures.
The constraints imply the looked-for entries of the singular-value-decomposition form of the wanted vectors. To this aim, the following new definition is given:
Definition 6. For each k, the k-th Morse boundary operator is the linear map on the critical simplices, which admits a matrix representation.

In the matrix from Definition 6, the columns are indexed by the critical k-simplices and the rows by the critical (k−1)-simplices; the entries are given by the collected weights.
We specify the choice of the orthonormal basis with all the weights of the G-path equal to 1.
The only paths that result in a non-zero contribution are those paths that can be written in the collected form, where the role of the critical simplices is outlined.
The choice of the path with weight equal to 1 is thus made.
For the sake of the present purposes of sequencing, it is necessary to use Proposition 8.8 from [2].
It is necessary to control the number of the connecting G-paths.
The definition of a chain complex allows one to analyze the outcomes of the measurements of the sequencing resulting from the chain complex.
It is proven by induction that the wanted result is obtained when G consists of a single pair of simplices in K.
Complements for the following matter are given in Appendix A.
The wanted set of critical simplices is defined as follows:
Theorem 1. The set of critical simplices for sequencing is defined as in Equation (27).
Proof. Equation (27) allows one to define the new boundary operator, which defines the chain complex. □
It is crucial to comment on the following:
Remark 2. The chain complex is associated with an ‘acyclic partial matching’ G.
The following is true:
Theorem 2. The chain complex of the partial matching has the same homology as the chain complex of the boundary operator.
Proof. From Proposition 8.10 in [2], the Morse chain complex is, in this case, chain-homotopy equivalent to the standard simplicial chain complex. □
The wanted HMMs are, therefore, this way constructed.
Here, one needs to take a sheaf of constants.
From Proposition 8.8 in [2], the Morse complex is a chain complex. From Proposition 8.10 in [2], the Morse chain complex is 'chain homotopy equivalent' to the standard simplicial chain complex.
The two chain maps are described with a pair of chain homotopies.
To construct the chain homotopies, one processes the simplex pairs in G one at a time. This means that the blocks of the matrices can be added one at a time.
For this purpose, two more arbitrary simplices can be considered, as there is only one 'path' between them.
The entries of the new blocks of the matrices to be inserted are this way found.
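As an illustration of the block-by-block procedure (a toy example of ours; the coupling value 0.05 and the block sizes are hypothetical), one block at a time can be inserted on the diagonal of a transition matrix, re-normalizing so that the result stays stochastic:

```python
import numpy as np
from scipy.linalg import block_diag

# Enlarge a two-state transition matrix by inserting one block at a time
# on the diagonal (the blocks play the role of the contributions added
# one at a time in the text).
P2 = np.array([[0.9, 0.1],
               [0.2, 0.8]])              # two-state Markov chain

B = np.array([[1.0]])                    # a new 1 x 1 block (a new state)
P3 = block_diag(P2, B)                   # 3 x 3 block matrix

# Couple the new state weakly to the old ones and re-normalize the rows
# so that the enlarged matrix is again stochastic.
P3[0, 2] = 0.05
P3 = P3 / P3.sum(axis=1, keepdims=True)

assert np.allclose(P3.sum(axis=1), 1.0)  # rows sum to 1: stochastic matrix
print(np.round(P3, 3))
```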
Construction of the Probability Space of the HMM
The chain homotopy is defined after the pertinent linear maps.
The difference between the 'new' entries and the previous ones is taken, where the apex ′ denotes the new entries.
Given a filtration of a simplicial complex, an acyclic 'partial matching' on the complex is compatible with the filtration if there exists a function b associated with the map of the filtration.
The blocks can, therefore, be inserted one at a time; nevertheless, the blocks are different from each other. It is nevertheless ensured that the effects in determining the sequencing are the wanted ones. Indeed, a nested sequence of Morse complexes is found. From Theorem 8.25 in [2], there exist isomorphisms of persistent homology groups ensuring that the inclusions of Morse complexes are equal.
7. Discussion
The present work is aimed at analyzing the DMM in deep machine learning as far as the dimensionality of the kernels is concerned for the comparison of the different models.
It is proven by induction that the topological features of the HMM eventually descending from the latent states of a two-state MSM are not comparable with those of an N-state MSM with N ≥ 3, because of the different properties of the manifold on which the corresponding graphs stay, according to the analytical expressions of the eigenvalues; accordingly, not even a concept of distance between the models is well-posed.
In the present work, the dimensions of the possible matrices of the chains originating the transition probabilities of sequencing as 'partial matchings' are achieved by adding the needed blocks one at a time as a nested sequence of Morse complexes; in more detail, the order in which the Morse complexes are added is proven irrelevant for the construction of the block matrix: indeed, not only one square matrix can be taken as originating the sequencing. In more detail, there exists an infinite set of rectangular matrices, discriminated only according to their singular values, which can serve for the scope of sequencing.
The definition of the probability space allows one to perform all of the requested measurements. In more detail, the measure of the probability spaces allows one to calculate the distances; the choice of the sheaf allows one to qualify the filtration of the partial matchings.
As a result, the new qualities of the DNA sequences [12] can be analytically modelized, according to which the probability to find an amino-acid in a sequence depends on the site; in more detail, a further characterization is possible, according to which the probability to find a partial matching of amino-acids depends on the distances among the other amino-acids in the complete sequence.
Accordingly, the construction of the complete most-general rectangular-matrix chain is possible. The construction of the chain is demonstrated from the proof of Theorem 1: the partial matching is chosen from a profile Markov model, and the further items are added, which correspond to blocks of the matrix. The analytical managing of deletions is also performed. The analytical paradigm here developed is now apt for several analytical applications.
As a first instance, it is possible to compare Markov models for sequencing that are aimed at generalizing the two-item sequence and that have the same dimensions: as a result, it is possible to prove which ones originate from the Jukes–Cantor model and which do not, the proof being based on the analysis of the corresponding probability space.
As a further result, from the probability space, the metric space on which there lives the manifold of the graph associated with the Markov models can be determined; the determination of the metric space allows one to discern about the well-posedness of distances, i.e., as interrogated in [16]: the answer to the interrogation allows one to find the structures of the sequencing.
As an additional advantage, the search for hidden structures in the sequences is therefore possible; the relevance of the comparison with the Jukes–Cantor model is a long-standing interrogation, which is answered here after defining the suitable topology paradigm to be applied to extend the several two-state Markov models to a model with a higher number of states, i.e., starting from the three states for deep Markov models.
The data simulations for the papers analyzed in Appendix C can be described in a timely manner according to the newly introduced concepts of scalability and computational complexity, as introduced in [36,37]. A definition for scalability in this regard has been pointed out as not yet complete. The notions are not used here, because all the computations performed are analytical.
The following considerations can be developed.
In [38], 'transitions in the space of data' are made to correspond to 'transitions in the space of models': in this context, the notion of 'distance distribution functions' is introduced. In the present paper, the well-posedness of distances in the metric spaces worked out of the probability spaces defined after the filtrations avoids this misconception, according to our analytical derivations. In the case where the assistance of computer-based techniques is of interest, the pioneering work of [39] can be considered for further developments.
According to all the statements, the issue of inquiring about the stochastic stability of deep Markov models can be addressed. From [40], the deep Markov models from a multi-layer deep neural network are analyzed; it is observed that there exists a map between the transition probabilities and the emission probabilities: in the present work, the problem is overcome after the analytical definitions of the metric spaces, after which the 'stability' of the system is not stochastic.