Article

Möbius Transformation-Induced Distributions Provide Better Modelling for Protein Architecture

by Mohammad Arashi 1,2,*, Najmeh Nakhaei Rad 2,3, Andriette Bekker 2 and Wolf-Dieter Schubert 4

1 Department of Statistics, Faculty of Mathematical Sciences, Ferdowsi University of Mashhad, Mashhad 4897, Iran
2 Department of Statistics, University of Pretoria, Pretoria 0002, South Africa
3 DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), Johannesburg 2000, South Africa
4 Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria 0002, South Africa
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(21), 2749; https://doi.org/10.3390/math9212749
Submission received: 16 September 2021 / Revised: 22 October 2021 / Accepted: 24 October 2021 / Published: 29 October 2021

Abstract

Proteins are found in all living organisms and constitute a large group of macromolecules with many functions. Proteins achieve their operations by adopting distinct three-dimensional structures encoded within the sequence of the constituent amino acids in one or more polypeptides. New, more flexible distributions are proposed for the MCMC sampling method for predicting protein 3D structures by applying a Möbius transformation to the bivariate von Mises distribution. In addition to this, sine-skewed versions of the proposed models are introduced to meet the increasing demand for modelling asymmetric toroidal data. Interestingly, the marginals of the new models lead to new multimodal circular distributions. We analysed three big datasets consisting of bivariate information about protein domains to illustrate the efficiency and behaviour of the proposed models. These newly proposed models outperformed mixtures of well-known models for modelling toroidal data. A simulation study was carried out to find the best method for generating samples from the proposed models. Our results shed new light on proposal distributions in the MCMC sampling method for predicting the protein structure environment.

1. Introduction

Proteins constitute a diverse set of biological macromolecules that are often referred to as the workhorses of cells because of their central role in most biological processes. Chemically, proteins are biopolymers consisting of linear sequences of amino acids covalently linked by peptide bonds, such that each polypeptide is a single large molecule. Nineteen of the natural amino acids (all but proline) have an amino group (–NH$_2$), a carboxylic acid group (–COOH), an amino acid-specific side-chain, and a hydrogen atom attached to a central carbon atom ($\mathrm{C}_\alpha$). Each peptide bond links the carboxylate group of one amino acid to the amino group of the next. Protein structure is often described in terms of four levels of organisation. The primary structure is the sequence of amino acids. The secondary structure refers to the local folding of the polypeptide backbone into helices, strands, or loops. The tertiary structure describes the complex three-dimensional folding of a polypeptide. Finally, the quaternary structure describes the involvement of one or more polypeptides in creating a functional protein. The amino nitrogen, $\mathrm{C}_\alpha$, and the carbonyl carbon of all residues constitute the protein backbone.
The 3D coordinates of proteins, as provided by electron microscopy, NMR, or X-ray crystallography, directly reveal the conformation of the backbone atoms, with knowledge of standard chemical bond angles and lengths incorporated during the refinement process. Generally, the backbone conformation is analysed using the backbone torsion or dihedral angles, denoted by $\phi$, $\psi$, and $\omega$, as introduced by Ramachandran [1] (Figure 1A), where $\omega$ is usually close to 180° or occasionally 0°. Alternatively, virtual bond and torsion angles $\theta$ and $\tau$ may be used to describe a protein backbone representation based on only $\mathrm{C}_\alpha$ positions (Figure 1B).
A major challenge in molecular biology and computational biochemistry involves predicting protein 3D structure. The encoding gene provides the primary structure of a protein, and the secondary structure may be predicted computationally with high reliability using artificial neural networks [2], based on the propensity of amino acids to form different secondary structures.
However, predicting the 3D structure of a protein, especially if it is larger than 100 amino acids or if a homologue with a known structure and significant sequence identity is not available, remains challenging. This challenge is addressed by de novo structure prediction, which requires parametrized physical force fields. The probability of observing a particular conformation $x$ of the molecule, $p(x \mid \beta)$, is expressed as the Boltzmann distribution:
$$p(x \mid \beta) = \frac{\exp(-\beta U(x))}{Z_\beta},$$
where $Z_\beta$ is the normalization constant, $U(x)$ is the potential energy of the molecule, $\beta = (k_b T)^{-1}$ is the thermodynamic beta, $k_b$ is the Boltzmann constant, and $T$ is the temperature. The 3D structure of a molecule can be derived from $p(x \mid \beta)$ by determining the mode of the distribution. Molecular dynamics (MD) is a simulation-based method used to probe for the mode of the distribution. However, many millions to trillions of steps are required to simulate a single folding event. By contrast with MD, Monte Carlo (MC)-based methods are more time-efficient. In the Markov Chain Monte Carlo (MCMC) method, a Markov chain is constructed using the Metropolis–Hastings (MH) algorithm ([3,4]), with $p(x \mid \beta)$ as the stationary distribution. A symmetric proposal distribution is utilized in the MH algorithm.
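The MH construction described above can be illustrated with a minimal sketch. It assumes a one-dimensional toy energy (the double-well `toy_energy` is hypothetical and unrelated to any protein force field) and a Gaussian random-walk proposal; because the proposal is symmetric, the acceptance probability depends only on the energy difference, and $Z_\beta$ is never needed.

```python
import math
import random

def metropolis(energy, x0, beta, step, n_steps, rng):
    """Random-walk Metropolis sampler targeting p(x | beta) proportional to exp(-beta * U(x))."""
    x, u = x0, energy(x0)
    samples = []
    for _ in range(n_steps):
        # Symmetric Gaussian perturbation, so the Hastings ratio reduces
        # to the ratio of the target densities.
        y = x + rng.gauss(0.0, step)
        u_new = energy(y)
        # Accept with probability min(1, exp(-beta * (U(y) - U(x)))).
        if u_new <= u or rng.random() < math.exp(-beta * (u_new - u)):
            x, u = y, u_new
        samples.append(x)
    return samples

def toy_energy(x):
    # Hypothetical double-well potential with minima at x = -1 and x = +1.
    return (x * x - 1.0) ** 2

rng = random.Random(0)
chain = metropolis(toy_energy, x0=0.0, beta=5.0, step=0.5, n_steps=20000, rng=rng)
```

Replacing the Gaussian perturbation with draws from a distribution that already resembles the target is what motivates the structure-informed proposal distributions discussed next; an asymmetric proposal would additionally require the proposal-density ratio in the acceptance step.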
Choosing a good proposal distribution is one of the challenges in MCMC-based simulation. Gaussian perturbations are the most straightforward proposal distributions that can be used [5]. The results are more accurate when the proposal distribution is closer to the stationary distribution; therefore, protein structural information is incorporated into most proposal distributions. Using the information on angles and bond lengths observed in real proteins is a simple way to define a suitable proposal distribution. Fragment libraries for backbone angles and rotamer libraries for side-chain angles can be selected as default choices for proposal distributions [6,7,8].
Various tractable statistical distributions for modelling protein dihedral angles are briefly reviewed. These models can be used as proposal distributions for MCMC protein sampling. They can also be utilized as a prior for determining a protein structure from data. However, these models do not generate folded proteins because they work under some simplifying assumptions, both in terms of their functional form and dependency structure (see [9]). The ultimate goal of our contribution is to propose more flexible models for the proposal distribution.

1.1. Brief Overview

This section gives an overview of the available models for toroidal data, which form the point of departure for the investigation in this paper (see [10]).
The first probability distribution on the torus was proposed by Mardia in [11]. It is the bivariate von Mises distribution:
$$f(\theta_1,\theta_2) = C \exp\Big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) + \big(\cos(\theta_1-\iota_1), \sin(\theta_1-\iota_1)\big)\, A\, \big(\cos(\theta_2-\iota_2), \sin(\theta_2-\iota_2)\big)^{T}\Big\},$$
where $C$ is the normalizing constant, $\iota_1, \iota_2 \in [-\pi,\pi)$ are location parameters, $\kappa_1, \kappa_2 \ge 0$ are concentration parameters, and the $2 \times 2$ matrix $A$ is the circular–circular dependence parameter. To move beyond the complexity created by the large number of parameters in this founding distribution, a few special cases have been considered in the literature. Rivest in [12] introduced the subclass:
$$f(\theta_1,\theta_2) \propto \exp\big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) + \alpha\cos(\theta_1-\iota_1)\cos(\theta_2-\iota_2) + \beta\sin(\theta_1-\iota_1)\sin(\theta_2-\iota_2)\big\}, \tag{1}$$
where $\alpha, \beta \in \mathbb{R}$. Singh et al. in [13] proposed the sine model as a special case of (1) with one less parameter, letting $\alpha = 0$ and $\beta = \kappa_3$:
$$f(\theta_1,\theta_2) = C \exp\big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) + \kappa_3\sin(\theta_1-\iota_1)\sin(\theta_2-\iota_2)\big\}, \tag{2}$$
where
$$C^{-1} = 4\pi^2 \sum_{i=0}^{\infty} \binom{2i}{i} \left(\frac{\kappa_3^2}{4\kappa_1\kappa_2}\right)^{i} I_i(\kappa_1)\, I_i(\kappa_2), \tag{3}$$
where $I_\alpha(z)$ is the modified Bessel function of the first kind of order $\alpha$. Another submodel of (1), the cosine model, was introduced by Mardia et al. in [14] by setting $\alpha = \beta = -\kappa_3$:
$$f(\theta_1,\theta_2) = C \exp\big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) - \kappa_3\cos(\theta_1-\iota_1-\theta_2+\iota_2)\big\}, \tag{4}$$
where
$$C^{-1} = 4\pi^2 \left\{ I_0(\kappa_1) I_0(\kappa_2) I_0(\kappa_3) + 2\sum_{i=1}^{\infty} I_i(\kappa_1) I_i(\kappa_2) I_i(\kappa_3) \right\}. \tag{5}$$
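As a numerical illustration of the series in (3), the sketch below evaluates the inverse normalizing constant of the sine model and compares it with a direct midpoint-rule integration over the torus. It is only a sketch under stated assumptions: the function names are hypothetical, the Bessel function is computed from its power series (adequate for moderate arguments), the series is truncated after a fixed number of terms, and $\kappa_1, \kappa_2 > 0$ is required to avoid division by zero.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), via its power series."""
    half = x / 2.0
    return sum(half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def sine_model_inv_const(k1, k2, k3, terms=40):
    """C^{-1} of the sine model from the truncated series (3); assumes k1, k2 > 0."""
    ratio = k3 * k3 / (4.0 * k1 * k2)
    series = sum(math.comb(2 * i, i) * ratio ** i * bessel_i(i, k1) * bessel_i(i, k2)
                 for i in range(terms))
    return 4.0 * math.pi ** 2 * series

def sine_model_inv_const_grid(k1, k2, k3, n=200):
    """The same constant by a midpoint rule on an n x n grid over the torus."""
    step = 2.0 * math.pi / n
    nodes = [-math.pi + (j + 0.5) * step for j in range(n)]
    total = 0.0
    for a in nodes:
        for b in nodes:
            total += math.exp(k1 * math.cos(a) + k2 * math.cos(b)
                              + k3 * math.sin(a) * math.sin(b))
    return total * step * step
```

For $\kappa_3 = 0$ the series collapses to its first term, $4\pi^2 I_0(\kappa_1) I_0(\kappa_2)$, the constant of two independent von Mises components.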
It is worth noting that Kent et al. in [15] introduced another version of the cosine model, with a negative interaction given by:
$$f(\theta_1,\theta_2) = C \exp\big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) - \kappa_3\cos(\theta_1-\iota_1+\theta_2-\iota_2)\big\},$$
with the same normalizing constant as for the model with a positive interaction in (4). Kent et al. in [15] also introduced a submodel of (1), which is a hybrid between the sine and cosine models, given by:
$$f(\theta_1,\theta_2) \propto \exp\Big\{\kappa_1\cos(\theta_1-\iota_1) + \kappa_2\cos(\theta_2-\iota_2) + \beta\big[(\cosh\lambda - 1)\cos(\theta_1-\iota_1)\cos(\theta_2-\iota_2) + \sinh\lambda\,\sin(\theta_1-\iota_1)\sin(\theta_2-\iota_2)\big]\Big\},$$
where $\kappa_1, \kappa_2 \ge 0$, $\lambda \in \mathbb{R}$, and for simplicity, $\beta = 1$. Mardia and Frellsen in [16] compared the properties of these three submodels in (2), (4), and (6). The multivariate extensions of the sine model can be found in [17]. In another attempt to expand the platform of toroidal distributions, Wehrly and Johnson in [18] used a marginal specification approach to construct bivariate models with more flexible specified circular marginals. Later, Jones et al. in [19] obtained various toroidal models using the general form in [18]. Along the same lines, Fernández-Durán in [20] proposed another general toroidal model by using a copula pdf, on which García-Portugués imposed periodic restrictions in [21]; Jones et al. [19] defined it as a circula pdf, arguing that it is characterised by a circular uniform distribution. For more details, see [22].
The main incentive for defining toroidal models in recent years has been the demand from other sciences, especially bioinformatics, to model dihedral angles in order to analyse protein structures ([13,14,23,24]). However, toroidal data can also be observed in other fields, for example, in meteorology (wind directions at two different times of day) and medicine (peak systolic blood pressure during two separate time periods). For the interested reader, some applications of toroidal models can be found in [25,26,27,28,29].
Most of the proposed toroidal models are pointwise symmetric, whereas the data that they model usually exhibit asymmetric patterns. This inspired Ameijeiras-Alonso and Ley [24] to introduce bivariate sine-skewed (BSS) distributions:
$$f_{BSS}(\theta_1,\theta_2) = f(\theta_1-\mu_1, \theta_2-\mu_2)\,\big(1 + \lambda_1\sin(\theta_1-\mu_1) + \lambda_2\sin(\theta_2-\mu_2)\big), \tag{7}$$
where $f(\cdot,\cdot)$ is a pointwise-symmetric toroidal density, $-\pi \le \mu_1, \mu_2 < \pi$ are location parameters, and the skewness parameters $-1 \le \lambda_i \le 1$, $i = 1, 2$, satisfy $|\lambda_1| + |\lambda_2| \le 1$.
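The sine-skewing construction in (7) is a simple wrapper around a symmetric base density, as the following sketch illustrates. The helper names and the unnormalised base kernel `symmetric_base` are hypothetical and chosen for illustration only. Because the base is pointwise symmetric, the two sine terms integrate to zero, so skewing leaves the total mass unchanged; the constraint on $\lambda_1$ and $\lambda_2$ keeps the skewing factor non-negative.

```python
import math

def sine_skew(base_pdf, mu1, mu2, lam1, lam2):
    """Return the BSS density (7) built from a pointwise-symmetric base_pdf."""
    if abs(lam1) + abs(lam2) > 1.0:
        raise ValueError("need |lam1| + |lam2| <= 1 for a non-negative density")

    def pdf(t1, t2):
        skew = 1.0 + lam1 * math.sin(t1 - mu1) + lam2 * math.sin(t2 - mu2)
        return base_pdf(t1 - mu1, t2 - mu2) * skew

    return pdf

def symmetric_base(a, b):
    # Hypothetical unnormalised base: product of two von Mises kernels,
    # pointwise symmetric about (0, 0).
    return math.exp(1.0 * math.cos(a) + 2.0 * math.cos(b))

skewed = sine_skew(symmetric_base, mu1=0.4, mu2=-1.0, lam1=0.5, lam2=-0.3)
```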
In this paper, the Möbius transformation forms the foundation for the construction of competitive models. A map $T: \mathbb{C} \to \mathbb{C}$ is a Möbius transformation if it has the following form:
$$T(z) = \frac{az + b}{cz + d},$$
where $\mathbb{C}$ is the set of complex numbers, $a, b, c, d \in \mathbb{C}$, and $ad - bc \ne 0$. Let $S \subset \mathbb{C}$ be the unit circle; then the Möbius transformation maps a point $\theta$ on the unit circle onto another point $\tilde\theta$. Jones in [30] subsequently applied the Möbius transformation to introduce a new family of distributions on the disc. Kato and Jones in [31] used the Möbius transformation to introduce a new distribution on the circle by transforming the von Mises distribution. Wang and Shimizu in [32] applied the Möbius transformation to cardioid random variables. Kato and Pewsey in [33] employed this transformation to define the unimodal bivariate wrapped Cauchy distribution by transforming the bivariate circular distribution in [34]:
$$f(\theta_1,\theta_2) = c\,\big\{c_0 - c_1\cos(\theta_1-\mu_1) - c_2\cos(\theta_2-\mu_2) - c_3\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2) - c_4\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\big\}^{-1}, \tag{8}$$
where $c = (1-\rho^2)(1-r_1^2)(1-r_2^2)/4\pi^2$, $c_0 = (1+\rho^2)(1+r_1^2)(1+r_2^2) - 8|\rho| r_1 r_2$, $c_1 = 2(1+\rho^2) r_1 (1+r_2^2) - 4|\rho|(1+r_1^2) r_2$, $c_2 = 2(1+\rho^2) r_2 (1+r_1^2) - 4|\rho|(1+r_2^2) r_1$, $c_3 = -4(1+\rho^2) r_1 r_2 + 2|\rho|(1+r_1^2)(1+r_2^2)$, $c_4 = 2\rho(1-r_1^2)(1-r_2^2)$, $\mu_1, \mu_2 \in [-\pi,\pi)$, $r_1, r_2 \in [0,1)$, and $-1 < \rho < 1$. Kato and McCullagh in [35] introduced the Cauchy distribution on the sphere by using a Möbius transformation.

1.2. Our Contribution

In this paper, two new distributions are introduced on the torus by applying a restricted version of the Möbius transformation developed by Kato and Pewsey in [33], namely the circular Möbius transformation, which relates $\theta$ and $\tilde\theta$ through the following mapping:
$$\theta = \mathcal{M}(\tilde\theta, \mu, \nu, r) = \mu + \nu + 2\arctan\left\{\frac{1-r}{1+r}\tan\left(\frac{\tilde\theta-\nu}{2}\right)\right\}, \tag{9}$$
where $-\pi \le \mu, \nu \le \pi$, $r \in [0,1)$, and $\mu$ is the rotation parameter. When $\mu = 0$, $\nu$ and $r$ attract the point $\theta$ towards $\nu$. By increasing $r$, the concentration of the points around $\nu$ increases. If $r = 0$, the transformation is the identity mapping, and when $r \to 1$, $\mathcal{M}(\tilde\theta, \mu, \nu, r)$ tends to $\nu$. More details about the circular Möbius transformation can be found in [29,36]. The inverse of (9) can be obtained as follows:
$$\tilde\theta = \nu + 2\arctan\left\{\frac{1+r}{1-r}\tan\left(\frac{\theta-\mu-\nu}{2}\right)\right\}.$$
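The mapping in (9) and its inverse translate directly into code. The following sketch uses hypothetical function names and returns angles only up to multiples of $2\pi$, so comparisons are made after wrapping; it can be used to check that the two maps undo each other and that, for $\mu = 0$, larger $r$ pulls points towards $\nu$.

```python
import math

def mobius(theta_tilde, mu, nu, r):
    """Circular Möbius map (9): theta_tilde -> theta, for 0 <= r < 1."""
    scale = (1.0 - r) / (1.0 + r)
    return mu + nu + 2.0 * math.atan(scale * math.tan((theta_tilde - nu) / 2.0))

def mobius_inverse(theta, mu, nu, r):
    """Inverse of (9): theta -> theta_tilde."""
    scale = (1.0 + r) / (1.0 - r)
    return nu + 2.0 * math.atan(scale * math.tan((theta - mu - nu) / 2.0))

def wrap(angle):
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi
```

For example, with $\mu = 0$, $\nu = 0.5$ and $r = 0.9$, the point $\tilde\theta = 2.0$ is mapped to within about 0.1 radians of $\nu$.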
More specifically, our novel contribution includes the following highlights:
  • New Möbius transformation-induced toroidal distributions are developed, acting as alternatives for existing models and efficiently outperforming them in the data application in this paper;
  • The proposed distributions reflect the protein structure more accurately than the existing models and can serve as proposal distributions for MCMC sampling of proteins since we should incorporate protein structure information into proposal distributions to obtain more accurate results;
  • Sine-skewed versions of these proposed models are introduced to meet the increasing demand for the modelling of asymmetric toroidal data;
  • The marginals of the new models lead to new multimodal circular distributions.
The remainder of this paper is organised as follows. Section 2 introduces two new distributions emanating from the sine and cosine models in (2) and (4), respectively. Section 3 introduces the sine-skewed versions of the newly proposed transformed sine and cosine models. Section 4 outlines the maximum likelihood method for obtaining the parameter estimates for the proposed models. Three real datasets, including information on angles in protein structures, are analysed in Section 5 to assess the performance of the proposed models relative to known competitors and to demonstrate their suitability as models for toroidal data. In Section 6, a simulation study is conducted for two reasons: (1) to explore the best method of generating samples from the newly transformed sine and cosine models, and (2) to evaluate the numerical method used to obtain the maximum likelihood estimates (MLEs) of the parameters.

2. Two New Models on the Torus

This section highlights two new flexible models for toroidal data, obtained by transforming the sine and cosine models in (2) and (4) via a Möbius transformation.

2.1. Transformed Cosine Model

Let $(\tilde\Theta_1, \tilde\Theta_2)$ have pdf (4) with $\iota_1 = \iota_2 = 0$. Suppose that
$$(\Theta_1, \Theta_2) = \big(\mathcal{M}(\tilde\Theta_1, \mu_1, \nu_1, r_1),\, \mathcal{M}(\tilde\Theta_2, \mu_2, \nu_2, r_2)\big),$$
where $\mathcal{M}(\cdot)$ is defined in (9), $\mu_1, \mu_2, \nu_1, \nu_2 \in (-\pi, \pi]$, $r_1, r_2 \in [0,1)$ and, without loss of generality, $\nu_1 = \nu_2 = 0$. Then, $(\Theta_1, \Theta_2)$ has a pdf of
$$f(\theta_1,\theta_2) = \frac{C(1-r_1^2)(1-r_2^2)}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))} \times \exp\bigg\{\frac{1}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))}\Big(C_0 + C_1\cos(\theta_1-\mu_1) + C_2\cos(\theta_2-\mu_2) + C_3\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2) + C_4\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\Big)\bigg\}, \tag{10}$$
where $\kappa_1, \kappa_2 \ge 0$, $\kappa_3 \in \mathbb{R}$, $C$ is defined in (5), and
$$\begin{aligned}
C_0 &= -2\kappa_1 r_1(1+r_2^2) - 2\kappa_2 r_2(1+r_1^2) - 4\kappa_3 r_1 r_2,\\
C_1 &= \kappa_1(1+r_1^2)(1+r_2^2) + 2\kappa_3 r_2(1+r_1^2) + 4\kappa_2 r_1 r_2,\\
C_2 &= \kappa_2(1+r_1^2)(1+r_2^2) + 2\kappa_3 r_1(1+r_2^2) + 4\kappa_1 r_1 r_2,\\
C_3 &= -2\kappa_1 r_2(1+r_1^2) - 2\kappa_2 r_1(1+r_2^2) - \kappa_3(1+r_1^2)(1+r_2^2),\\
C_4 &= -\kappa_3(1-r_1^2)(1-r_2^2),
\end{aligned} \tag{11}$$
where $\mu_1, \mu_2 \in [-\pi,\pi)$ are location parameters, $\kappa_1, \kappa_2 \ge 0$ are concentration parameters, $\kappa_3$ is the circular–circular dependence parameter, and $r_1$ and $r_2$ regulate the concentrations of the marginal distributions. In (10), when $r_1 = r_2 = 0$, the cosine model (4) is obtained. If $\kappa_1 = \kappa_2 = \kappa_3 = 0$, (10) yields the bivariate wrapped Cauchy distribution with $\theta_1$ and $\theta_2$ independent. The pdf and contour plots of (10) are shown in Figure 2 for $\mu_1 = \mu_2 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$ and $r_2$, and reveal unimodal and bimodal behaviour.
Proposition 1.
Assuming the transformed cosine model (10), when $r_1, r_2 \to 0$, then $(\Theta_1, \Theta_2)$ has approximately a bivariate normal distribution if and only if $\kappa_3 \le \dfrac{\kappa_1\kappa_2}{\kappa_1+\kappa_2}$.
Proof. 
See Appendix A. □
In the following, the marginal and conditional pdfs of the transformed cosine model (10) and their properties are discussed. The marginal pdf of $\Theta_1$ for the transformed cosine model in (10) is as follows:
$$f_{\Theta_1}(\theta_1) = \frac{2\pi C(1-r_1^2)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\, I_0(h(\theta_1)) \exp\left\{\frac{\kappa_1(1+r_1^2)\cos(\theta_1-\mu_1) - 2\kappa_1 r_1}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right\}, \tag{12}$$
where
$$h(\theta_1) = \left\{\kappa_2^2 + \kappa_3^2 - 2\kappa_2\kappa_3\, \frac{(1+r_1^2)\cos(\theta_1-\mu_1) - 2r_1}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right\}^{1/2}, \tag{13}$$
and $C$ is as defined in (5). The marginal pdf of $\Theta_1$ in (12) is symmetric about $\mu_1$. For small values of $\kappa_3$ it approximates the transformed von Mises distribution [31], and for $r_1 = 0$ it simplifies to the marginal pdf of the cosine model [14]. It follows that for $r_1 = 0$ and small values of $\kappa_3$, the von Mises distribution is approximated. If $\kappa_1 = \kappa_2 = \kappa_3 = 0$ in (12), then the Möbius-transformed uniform distribution is obtained. For $\kappa_1 = \kappa_2 = \kappa_3 = r_1 = 0$, the distribution is uniform. When $\kappa_1 = \kappa_2 = 0$ in (12), the distribution is the transformed von Mises distribution [31], and when $\kappa_1 = \kappa_2 = r_1 = 0$, the von Mises distribution is obtained. The plots of this generalized marginal pdf of $\Theta_1$ are shown in Figure 3 (left) for $\mu_1 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$ and $r_1$, reflecting unimodal and bimodal shapes. In the following corollary, the modality of the marginal density function of $\Theta_1$ is addressed.
Corollary 1.
The marginal distribution of $\Theta_1$ in (12) is symmetric around $\theta_1 = \mu_1$ and unimodal (with mode at $\mu_1$) if and only if
$$\frac{A(\kappa_2-\kappa_3)}{\kappa_2-\kappa_3} \le \left(\frac{2r_1(1-r_1)^2}{(1-r_1^2)^2} + \kappa_1\right) \Big/ \kappa_2\kappa_3,$$
where $A(\kappa) = I_1(\kappa)/I_0(\kappa)$. Moreover, the marginal distribution of $\Theta_1$ in (12) is bimodal (with the modes at $\mu_1 - \theta_1^*$ and $\mu_1 + \theta_1^*$) if and only if
$$\frac{A(\kappa_2-\kappa_3)}{\kappa_2-\kappa_3} > \left(\frac{2r_1(1-r_1)^2}{(1-r_1^2)^2} + \kappa_1\right) \Big/ \kappa_2\kappa_3,$$
and $\theta_1^*$ is the root of
$$\kappa_2\kappa_3(1-r_1^2)^2\, \frac{A(h(\theta_1^*))}{h(\theta_1^*)} - 2r_1\big(1+r_1^2-2r_1\cos(\theta_1^*-\mu_1)\big) - \kappa_1(1-r_1^2)^2 = 0,$$
where $h(\theta)$ is as defined in (13).
Proof. 
See Appendix A. □
The conditional pdf $f(\theta_2 \mid \Theta_1 = \theta_1)$ is that of the transformed von Mises distribution [31], given by the following:
$$f(\theta_2 \mid \Theta_1 = \theta_1) = \frac{1-r_2^2}{2\pi I_0(h(\theta_1))}\, \frac{1}{1+r_2^2-2r_2\cos(\theta_2-\mu_2)} \times \exp\left\{\frac{h(\theta_1)\cos\tau\,\big((1+r_2^2)\cos(\theta_2-\mu_2) - 2r_2\big) + h(\theta_1)\sin\tau\,(1-r_2^2)\sin(\theta_2-\mu_2)}{1+r_2^2-2r_2\cos(\theta_2-\mu_2)}\right\}, \tag{14}$$
where $h(\theta_1)$ is as defined in (13), and
$$\tan\tau = \frac{-\kappa_3(1-r_1^2)\sin(\theta_1-\mu_1)}{\kappa_2\big(1+r_1^2-2r_1\cos(\theta_1-\mu_1)\big) - \kappa_3\big((1+r_1^2)\cos(\theta_1-\mu_1) - 2r_1\big)}. \tag{15}$$
Note that for $r_1 = r_2 = 0$, (14) simplifies to the von Mises distribution with the parameters $\tau$ and $h(\theta_1)$ in (13).

2.2. Transformed Sine Model

Let $(\tilde\Theta_1, \tilde\Theta_2)$ have the bivariate pdf (2), with $\iota_1 = \iota_2 = 0$. Suppose that
$$(\Theta_1, \Theta_2) = \big(\mathcal{M}(\tilde\Theta_1, \mu_1, \nu_1, r_1),\, \mathcal{M}(\tilde\Theta_2, \mu_2, \nu_2, r_2)\big),$$
where $\mathcal{M}(\cdot)$ is as defined in (9), $\mu_1, \mu_2, \nu_1, \nu_2 \in (-\pi, \pi]$, $r_1, r_2 \in [0,1)$, and, without loss of generality, $\nu_1 = \nu_2 = 0$. Then, $(\Theta_1, \Theta_2)$ has a pdf as follows:
$$f(\theta_1,\theta_2) = \frac{C(1-r_1^2)(1-r_2^2)}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))} \times \exp\bigg\{\frac{1}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))}\Big(C_0 + C_1\cos(\theta_1-\mu_1) + C_2\cos(\theta_2-\mu_2) + C_3\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2) + C_4\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\Big)\bigg\}, \tag{16}$$
where $\kappa_1, \kappa_2 \ge 0$, $\kappa_3 \in \mathbb{R}$, $C$ is as defined in (3), and
$$\begin{aligned}
C_0 &= -2\kappa_1 r_1(1+r_2^2) - 2\kappa_2 r_2(1+r_1^2),\\
C_1 &= \kappa_1(1+r_1^2)(1+r_2^2) + 4\kappa_2 r_1 r_2,\\
C_2 &= \kappa_2(1+r_1^2)(1+r_2^2) + 4\kappa_1 r_1 r_2,\\
C_3 &= -2\kappa_1 r_2(1+r_1^2) - 2\kappa_2 r_1(1+r_2^2),\\
C_4 &= \kappa_3(1-r_1^2)(1-r_2^2),
\end{aligned} \tag{17}$$
where $\mu_1, \mu_2 \in [-\pi,\pi)$ are location parameters, $\kappa_1, \kappa_2 \ge 0$ are concentration parameters, $\kappa_3$ is the circular–circular dependence parameter, and $r_1$ and $r_2$ regulate the concentrations of the marginal distributions. If $r_1 = r_2 = 0$ in (16), then the sine model in (2) follows. The pdf and contour plots of (16) are shown in Figure 4 for $\mu_1 = \mu_2 = 0$ and for different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$ and $r_2$. As can be seen, this transformed sine pdf (16) can have both unimodal and bimodal forms.
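To make (16) and (17) concrete, the sketch below evaluates the transformed sine density. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the constant $C$ is taken from a truncated version of the series (3) with the Bessel functions computed from their power series, and $\kappa_1, \kappa_2 > 0$ is assumed so that the series is well defined.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), via its power series."""
    half = x / 2.0
    return sum(half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def sine_const(k1, k2, k3, terms=40):
    """Normalizing constant C of the sine model from (3); assumes k1, k2 > 0."""
    ratio = k3 * k3 / (4.0 * k1 * k2)
    series = sum(math.comb(2 * i, i) * ratio ** i * bessel_i(i, k1) * bessel_i(i, k2)
                 for i in range(terms))
    return 1.0 / (4.0 * math.pi ** 2 * series)

def make_transformed_sine_pdf(mu1, mu2, k1, k2, k3, r1, r2):
    """Return a function that evaluates the transformed sine density (16)."""
    const = sine_const(k1, k2, k3)  # computed once and reused for every call
    q1, q2 = 1.0 + r1 * r1, 1.0 + r2 * r2
    p1, p2 = 1.0 - r1 * r1, 1.0 - r2 * r2
    # Coefficients C_0, ..., C_4 from (17).
    c0 = -2.0 * k1 * r1 * q2 - 2.0 * k2 * r2 * q1
    c1 = k1 * q1 * q2 + 4.0 * k2 * r1 * r2
    c2 = k2 * q1 * q2 + 4.0 * k1 * r1 * r2
    c3 = -2.0 * k1 * r2 * q1 - 2.0 * k2 * r1 * q2
    c4 = k3 * p1 * p2

    def pdf(t1, t2):
        cos1, sin1 = math.cos(t1 - mu1), math.sin(t1 - mu1)
        cos2, sin2 = math.cos(t2 - mu2), math.sin(t2 - mu2)
        d1 = q1 - 2.0 * r1 * cos1
        d2 = q2 - 2.0 * r2 * cos2
        expo = (c0 + c1 * cos1 + c2 * cos2 + c3 * cos1 * cos2 + c4 * sin1 * sin2) / (d1 * d2)
        return const * p1 * p2 / (d1 * d2) * math.exp(expo)

    return pdf
```

With $r_1 = r_2 = 0$ the coefficients reduce to $C_1 = \kappa_1$, $C_2 = \kappa_2$, $C_4 = \kappa_3$ and $C_0 = C_3 = 0$, so the function returns the sine model (2), which gives a quick consistency check.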
Proposition 2.
Assuming the transformed sine model in (16), when $r_1, r_2 \to 0$, then $(\Theta_1, \Theta_2)$ has approximately a bivariate normal distribution if and only if $\kappa_3^2 < \kappa_1\kappa_2$.
Proof. 
The proof is similar to that of Proposition 1, using the results in [13]. □
In this case, the marginal pdf of $\Theta_1$ for the transformed sine model in (16) is as follows:
$$f_{\Theta_1}(\theta_1) = \frac{2\pi C(1-r_1^2)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\, I_0(h(\theta_1)) \exp\left\{\frac{\kappa_1(1+r_1^2)\cos(\theta_1-\mu_1) - 2\kappa_1 r_1}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right\}, \tag{18}$$
where
$$h(\theta_1) = \left\{\kappa_2^2 + \left(\frac{\kappa_3(1-r_1^2)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right)^{2} \sin^2(\theta_1-\mu_1)\right\}^{1/2}, \tag{19}$$
and $C$ is as defined in (3). The marginal pdf of $\Theta_1$ is symmetric around $\mu_1$. If $\kappa_3 = 0$, the distribution is the transformed von Mises distribution [31]. If $r_1 = 0$ in (18), the marginal distribution of the sine model [13] is obtained. The plots of the marginal pdf of $\Theta_1$ in (18) are shown in Figure 3 (right) for $\mu_1 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, and $r_1$. As can be seen, the distribution can be both unimodal and bimodal. In the following corollary, the modality of the marginal pdf of $\Theta_1$ in (18) is explored.
Corollary 2.
The marginal distribution of $\Theta_1$ in (18) is symmetric around $\theta_1 = \mu_1$ and unimodal (with mode at $\mu_1$) if and only if
$$\frac{A(\kappa_2)}{\kappa_2} \le \left(\frac{2r_1(1-r_1)^2}{(1-r_1^2)^2} + \kappa_1\right) \Big/ \kappa_3^2,$$
where $A(\kappa) = I_1(\kappa)/I_0(\kappa)$. Moreover, the marginal distribution of $\Theta_1$ in (18) is bimodal (with the modes at $\mu_1 - \theta_1^*$ and $\mu_1 + \theta_1^*$) if and only if
$$\frac{A(\kappa_2)}{\kappa_2} > \left(\frac{2r_1(1-r_1)^2}{(1-r_1^2)^2} + \kappa_1\right) \Big/ \kappa_3^2,$$
and $\theta_1^*$ is the root of
$$\kappa_3^2(1-r_1^2)^2 \cos\theta_1^*\, \frac{A(h(\theta_1^*))}{h(\theta_1^*)} - 2r_1\big(1+r_1^2-2r_1\cos(\theta_1^*-\mu_1)\big) - \kappa_1(1-r_1^2)^2 = 0,$$
where $h(\theta)$ is as defined in (19).
Proof. 
See Appendix A. □
The conditional pdf $f(\theta_2 \mid \Theta_1 = \theta_1)$ is given by:
$$f(\theta_2 \mid \Theta_1 = \theta_1) = \frac{1-r_2^2}{2\pi I_0(h(\theta_1))}\, \frac{1}{1+r_2^2-2r_2\cos(\theta_2-\mu_2)} \times \exp\left\{\frac{h(\theta_1)\cos\tau\,\big((1+r_2^2)\cos(\theta_2-\mu_2) - 2r_2\big) + h(\theta_1)\sin\tau\,(1-r_2^2)\sin(\theta_2-\mu_2)}{1+r_2^2-2r_2\cos(\theta_2-\mu_2)}\right\}, \tag{20}$$
where $h(\theta_1)$ is as defined in (19), and
$$\tan\tau = \frac{\kappa_3}{\kappa_2}\, \frac{(1-r_1^2)\sin(\theta_1-\mu_1)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}. \tag{21}$$
Interestingly, the conditional distribution is the transformed von Mises distribution [31]. When $r_1 = r_2 = 0$ in (20), the von Mises distribution with parameters $\tau$ and $h(\theta_1)$ is obtained.
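Because the conditional distribution in (20) is a Möbius-transformed von Mises distribution, one way to draw from it is to sample a von Mises angle with mean direction $\tau$ and concentration $h(\theta_1)$ and then apply the map (9) with $\nu = 0$. The sketch below illustrates this idea only; it is not the sampling scheme studied in the paper's simulation section, the function names are hypothetical, and it relies on `random.vonmisesvariate` from the Python standard library.

```python
import math
import random

def mobius(theta_tilde, mu, r):
    """Circular Möbius map (9) with nu = 0."""
    return mu + 2.0 * math.atan((1.0 - r) / (1.0 + r) * math.tan(theta_tilde / 2.0))

def sample_theta2_given_theta1(theta1, mu1, mu2, k2, k3, r1, r2, rng):
    """Draw Theta_2 given Theta_1 = theta1 under the transformed sine model (16)."""
    d1 = 1.0 + r1 * r1 - 2.0 * r1 * math.cos(theta1 - mu1)
    # Ratio that appears in both (19) and (21).
    s = (1.0 - r1 * r1) * math.sin(theta1 - mu1) / d1
    h = math.hypot(k2, k3 * s)    # concentration h(theta1) from (19)
    tau = math.atan2(k3 * s, k2)  # direction tau from (21)
    # von Mises draw on the untransformed scale, then Möbius-transform it.
    theta2_tilde = rng.vonmisesvariate(tau % (2.0 * math.pi), h)
    return mobius(theta2_tilde, mu2, r2)

rng = random.Random(1)
draws = [sample_theta2_given_theta1(0.2, 0.2, 1.0, 3.0, 1.0, 0.3, 0.5, rng)
         for _ in range(5000)]
```

In the usage line, $\theta_1 = \mu_1$, so $\tau = 0$ and the draws are symmetric about $\mu_2 = 1.0$.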

3. Sine-Skewed Transformed Sine and Cosine Distributions

In practice, it is possible to have skewed toroidal datasets, despite the well-known toroidal distributions being pointwise symmetric. Therefore, it would be interesting to extend this methodology to the recent model of Ameijeiras-Alonso and Ley in [24]. In this section, the skewed versions of the proposed transformed sine and cosine models in (16) and (10) are introduced. In addition, Abe and Pewsey’s skew model in [37] is applied to extend models on the circle manifold using marginal density functions.
By substituting (10) in (7), the sine-skewed transformed cosine (BSSTC) distribution can be defined as follows:
$$f_{BSSTC}(\theta_1,\theta_2) = \frac{C(1-r_1^2)(1-r_2^2)}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))} \times \exp\bigg\{\frac{1}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))}\Big(C_0 + C_1\cos(\theta_1-\mu_1) + C_2\cos(\theta_2-\mu_2) + C_3\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2) + C_4\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\Big)\bigg\} \times \big(1 + \lambda_1\sin(\theta_1-\mu_1) + \lambda_2\sin(\theta_2-\mu_2)\big), \tag{22}$$
where $\kappa_1, \kappa_2 \ge 0$, $\kappa_3 \in \mathbb{R}$, $C$ is as defined in (5), and $C_0, \ldots, C_4$ are as defined in (11). The pdf and contour plots of the sine-skewed transformed cosine model for $\kappa_1 = 0.2$, $\kappa_2 = 0.3$, $\kappa_3 = 0.2$, $r_1 = 0.2$, $r_2 = 0.1$, $\mu_1 = \mu_2 = 0$, and different values of $\lambda_1$ and $\lambda_2$ are shown in Figure 5 (top).
The marginal pdf of $\Theta_1$ for BSSTC in (22) is as follows:
$$f_{\Theta_1;BSS}(\theta_1) = \frac{C(1-r_1^2)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)} \exp\left\{\frac{\kappa_1(1+r_1^2)\cos(\theta_1-\mu_1) - 2\kappa_1 r_1}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right\} \times 2\pi I_0(h(\theta_1)) \left\{1 + \frac{\lambda_1(1-r_1^2)\sin(\theta_1-\mu_1)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)} + \lambda_2 A(h(\theta_1))\cos(\tau+\mu_2)\right\}, \tag{23}$$
where $h(\theta_1)$ and $\tau$ are obtained from (13) and (15), respectively. When $\lambda_2 = 0$, $f_{\Theta_1;BSS}(\theta_1)$ is the Möbius-transformed sine-skewed version [37] of the marginal pdf of the cosine model. The plots of the skewed pdf in (23) are shown in Figure 6 (left) for $\mu_1 = \mu_2 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, $\lambda_1$, and $\lambda_2$. As can be observed, the distribution can be both unimodal and bimodal.
Similarly, from (16) and (7), the sine-skewed transformed sine (BSSTS) distribution can be obtained as follows:
$$f_{BSSTS}(\theta_1,\theta_2) = \frac{C(1-r_1^2)(1-r_2^2)}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))} \times \exp\bigg\{\frac{1}{(1+r_1^2-2r_1\cos(\theta_1-\mu_1))(1+r_2^2-2r_2\cos(\theta_2-\mu_2))}\Big(C_0 + C_1\cos(\theta_1-\mu_1) + C_2\cos(\theta_2-\mu_2) + C_3\cos(\theta_1-\mu_1)\cos(\theta_2-\mu_2) + C_4\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\Big)\bigg\} \times \big(1 + \lambda_1\sin(\theta_1-\mu_1) + \lambda_2\sin(\theta_2-\mu_2)\big), \tag{24}$$
where $\kappa_1, \kappa_2 \ge 0$, $\kappa_3 \in \mathbb{R}$, $C$ is as defined in (3), and $C_0, \ldots, C_4$ are defined in (17). The pdf and contour plots of the sine-skewed transformed sine model for $\kappa_1 = 2$, $\kappa_2 = 0.6$, $\kappa_3 = 2$, $r_1 = 0.1$, $r_2 = 0.1$, $\mu_1 = \mu_2 = 0$, and different values of $\lambda_1$ and $\lambda_2$ are shown in Figure 5 (bottom).
The marginal pdf of $\Theta_1$ for BSSTS has the same form as in (23), where $h(\theta_1)$ and $\tau$ are obtained from (19) and (21). When $\lambda_2 = 0$, $f_{\Theta_1;BSS}(\theta_1)$ is the Möbius-transformed sine-skewed version [37] of the marginal pdf of the sine model. The plots of the skewed pdf in (23) are shown in Figure 6 (right) for $\mu_1 = \mu_2 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, $\lambda_1$, and $\lambda_2$. Figure 6 illustrates that the distribution can have both unimodal and bimodal forms.
To expand the skewed circular models, the following models are introduced based on the $k$ sine-skewed model of [37]. The skewed version of the marginal distribution of $\Theta_1$ in (12) is the following:
$$f_{SS}(\theta_1) = \frac{2\pi C(1-r_1^2)}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\, I_0(h(\theta_1)) \exp\left\{\frac{\kappa_1(1+r_1^2)\cos(\theta_1-\mu_1) - 2\kappa_1 r_1}{1+r_1^2-2r_1\cos(\theta_1-\mu_1)}\right\} \times \big(1 + \lambda\sin(k(\theta_1-\mu_1))\big), \tag{25}$$
where $C$ is as defined in (5), $h(\theta_1)$ is as defined in (13), and $-1 \le \lambda \le 1$. Here, $\lambda > 0$ leads to left-skewed distributions, and $\lambda < 0$ provides right-skewed distributions. The plots of the skewed pdf in (25) are shown in Figure 7 (left) for $k = 1$, $\mu_1 = 0$, and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, and $\lambda$.
Similarly, the sine-skewed version [37] of the marginal pdf of $\Theta_1$ in (18) has the same form as in (25), where $C$ is as defined in (3), $h(\theta_1)$ is as defined in (19), and $-1 \le \lambda \le 1$. The plots of the sine-skewed version of the marginal pdf in (18) are shown in Figure 7 (right) for $k = 1$, $\mu_1 = 0$, and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, and $\lambda$. As can be seen, the distribution can be unimodal or bimodal, and multimodal densities result for $k > 1$.

4. Maximum Likelihood Estimation

In this section, the maximum likelihood method is outlined to obtain the estimates of parameters for both the transformed cosine and sine models. Suppose that $\zeta = (\mu_1, \mu_2, \kappa_1, \kappa_2, \kappa_3, r_1, r_2)^T$ are the parameters associated with the transformed cosine model (10). The log-likelihood function of the transformed cosine model is represented as follows:
$$l(\zeta) = n\log C + n\log(1-r_1^2) + n\log(1-r_2^2) - \sum_{i=1}^{n}\log\big(1+r_1^2-2r_1\cos(\theta_{1i}-\mu_1)\big) - \sum_{i=1}^{n}\log\big(1+r_2^2-2r_2\cos(\theta_{2i}-\mu_2)\big) + \sum_{i=1}^{n}\frac{1}{(1+r_1^2-2r_1\cos(\theta_{1i}-\mu_1))(1+r_2^2-2r_2\cos(\theta_{2i}-\mu_2))} \times \Big(C_0 + C_1\cos(\theta_{1i}-\mu_1) + C_2\cos(\theta_{2i}-\mu_2) + C_3\cos(\theta_{1i}-\mu_1)\cos(\theta_{2i}-\mu_2) + C_4\sin(\theta_{1i}-\mu_1)\sin(\theta_{2i}-\mu_2)\Big), \tag{26}$$
where $C$ is as defined in (5), and $C_0, \ldots, C_4$ are as defined in (11). The MLE of the parameters, $\hat\zeta = (\hat\mu_1, \hat\mu_2, \hat\kappa_1, \hat\kappa_2, \hat\kappa_3, \hat r_1, \hat r_2)^T$, can be determined by maximizing (26) with respect to $\zeta$.
Supposing that $\zeta = (\mu_1, \mu_2, \kappa_1, \kappa_2, \kappa_3, r_1, r_2)^T$ are the parameters associated with the transformed sine model (16), the log-likelihood function of the transformed sine model can be represented as follows:
$$l(\zeta) = n\log C + n\log(1-r_1^2) + n\log(1-r_2^2) - \sum_{i=1}^{n}\log\big(1+r_1^2-2r_1\cos(\theta_{1i}-\mu_1)\big) - \sum_{i=1}^{n}\log\big(1+r_2^2-2r_2\cos(\theta_{2i}-\mu_2)\big) + \sum_{i=1}^{n}\frac{1}{(1+r_1^2-2r_1\cos(\theta_{1i}-\mu_1))(1+r_2^2-2r_2\cos(\theta_{2i}-\mu_2))} \times \Big(C_0 + C_1\cos(\theta_{1i}-\mu_1) + C_2\cos(\theta_{2i}-\mu_2) + C_3\cos(\theta_{1i}-\mu_1)\cos(\theta_{2i}-\mu_2) + C_4\sin(\theta_{1i}-\mu_1)\sin(\theta_{2i}-\mu_2)\Big), \tag{27}$$
where $C$ is as defined in (3), and $C_0, \ldots, C_4$ are as defined in (17). The maximization of (27) with respect to $\zeta$ results in the MLE of the parameters, $\hat\zeta = (\hat\mu_1, \hat\mu_2, \hat\kappa_1, \hat\kappa_2, \hat\kappa_3, \hat r_1, \hat r_2)^T$.
By setting the partial derivatives of the log-likelihood functions in (26) and (27) with respect to $\zeta$ to zero, the MLEs of $\zeta$ can be derived for the transformed cosine and sine models. Because no closed-form expressions exist, it is necessary to use numerical methods to obtain the MLEs. Operationally, the maximization of (26) and (27) with respect to $\zeta$ is carried out with the DEoptim package in the R software [38], which is based on the differential evolution (DE) algorithm [39]. Extensive studies have validated its performance as a global optimization algorithm for continuous numerical minimization problems [40]. It is worth noting that this package was also used to obtain the MLEs of the parameters for sine-skewed versions and mixtures of transformed cosine and sine models.
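The objective in (27) is straightforward to code, which may help when experimenting with optimisers other than DEoptim. The following Python sketch is an assumption-laden illustration, not the authors' R code: the function names are hypothetical, $C$ comes from a truncated version of the series (3), and $\kappa_1, \kappa_2 > 0$ and $0 \le r_1, r_2 < 1$ are assumed. A global optimiser would be applied to the negated function within box constraints on $\zeta$.

```python
import math

def bessel_i(order, x, terms=40):
    """Modified Bessel function of the first kind, I_order(x), via its power series."""
    half = x / 2.0
    return sum(half ** (2 * k + order) / (math.factorial(k) * math.factorial(k + order))
               for k in range(terms))

def sine_const(k1, k2, k3, terms=40):
    """Normalizing constant C of the sine model from (3); assumes k1, k2 > 0."""
    ratio = k3 * k3 / (4.0 * k1 * k2)
    series = sum(math.comb(2 * i, i) * ratio ** i * bessel_i(i, k1) * bessel_i(i, k2)
                 for i in range(terms))
    return 1.0 / (4.0 * math.pi ** 2 * series)

def loglik_transformed_sine(zeta, data):
    """Log-likelihood (27); zeta = (mu1, mu2, k1, k2, k3, r1, r2), data = [(theta1, theta2), ...]."""
    mu1, mu2, k1, k2, k3, r1, r2 = zeta
    q1, q2 = 1.0 + r1 * r1, 1.0 + r2 * r2
    p1, p2 = 1.0 - r1 * r1, 1.0 - r2 * r2
    # Coefficients C_0, ..., C_4 from (17).
    c0 = -2.0 * k1 * r1 * q2 - 2.0 * k2 * r2 * q1
    c1 = k1 * q1 * q2 + 4.0 * k2 * r1 * r2
    c2 = k2 * q1 * q2 + 4.0 * k1 * r1 * r2
    c3 = -2.0 * k1 * r2 * q1 - 2.0 * k2 * r1 * q2
    c4 = k3 * p1 * p2
    # Terms that do not depend on the individual observations.
    total = len(data) * (math.log(sine_const(k1, k2, k3)) + math.log(p1) + math.log(p2))
    for t1, t2 in data:
        cos1, sin1 = math.cos(t1 - mu1), math.sin(t1 - mu1)
        cos2, sin2 = math.cos(t2 - mu2), math.sin(t2 - mu2)
        d1 = q1 - 2.0 * r1 * cos1
        d2 = q2 - 2.0 * r2 * cos2
        total += (-math.log(d1) - math.log(d2)
                  + (c0 + c1 * cos1 + c2 * cos2 + c3 * cos1 * cos2 + c4 * sin1 * sin2) / (d1 * d2))
    return total
```

The log-likelihood (26) for the transformed cosine model has the same structure, with $C$ from (5) and the coefficients from (11) in place of those used here.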

5. Protein Structure Application

To demonstrate the performance of the proposed models in modelling the dihedral angles and the planar and torsion angles in a protein structure, three datasets are considered, which are available at http://scop.mrc-lmb.cam.ac.uk/scop/. SCOP.1 contains 10,188 planar and torsion angles $(\theta, \tau)$ (see Figure 1B) for about 63 protein domains that were randomly selected from three remote protein classes in the structural classification of proteins (SCOP). SCOP.3 includes 4607 planar and torsion angles $(\theta, \tau)$ from approximately 40 protein chains, and the TCBIG.VAL.right set consists of 2673 dihedral angles $(\phi, \psi)$ (see Figure 1A) [41]. The Ramachandran plots [1] for each dataset are presented in Figure 8. As can be seen, the datasets are at least bimodal, so bimodal or mixture distributions are natural candidates for fitting.
The transformed sine and cosine models in (16) and (10) were fitted to the SCOP.1 and SCOP.3 datasets along with their competitors: the sine model and a mixture of sine models (see (2); [13]), the cosine model and a mixture of cosine models (see (4); [14]), and a mixture of bivariate wrapped Cauchy models (see (8); [33]). A mixture distribution with two components was investigated as follows:
$$
g_M(\theta_1, \theta_2) = p\, f_1(\theta_1, \theta_2) + (1 - p)\, f_2(\theta_1, \theta_2),
$$
where $p \in [0, 1]$ and $f_1(\cdot, \cdot)$ and $f_2(\cdot, \cdot)$ are two toroidal distributions. Parameter estimation, identifiability, and the choice of the number of mixing components and parameters are among the well-known challenges in applying mixture distributions. Furthermore, when the empirical density of the data is highly asymmetric, a mixture can lead to misleading statistical inference about the parameters [42]. Multimodal distributions, which capture the random behaviour of data with several modes in a single component, can provide a better model fit. This is observed here for the bimodal transformed sine model.
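As a rough illustration of how the two-component mixture density and its log-likelihood can be evaluated, the following Python sketch uses the log-sum-exp trick for numerical stability. The component density here is a hypothetical stand-in (two independent wrapped Cauchy angles), not one of the toroidal models fitted in the paper, and the data points and parameter values are made up for the example.

```python
import math


def log_wrapped_cauchy(theta, mu, r):
    """Log-density of a wrapped Cauchy distribution on the circle."""
    return (math.log(1.0 - r * r) - math.log(2.0 * math.pi)
            - math.log(1.0 + r * r - 2.0 * r * math.cos(theta - mu)))


def log_component(theta1, theta2, params):
    """Stand-in toroidal component: two independent wrapped Cauchy angles."""
    mu1, mu2, r1, r2 = params
    return log_wrapped_cauchy(theta1, mu1, r1) + log_wrapped_cauchy(theta2, mu2, r2)


def mixture_log_density(theta1, theta2, p, params1, params2):
    """log g_M = log(p f1 + (1 - p) f2), computed with the log-sum-exp trick."""
    terms = []
    if p > 0.0:
        terms.append(math.log(p) + log_component(theta1, theta2, params1))
    if p < 1.0:
        terms.append(math.log(1.0 - p) + log_component(theta1, theta2, params2))
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def mixture_log_likelihood(data, p, params1, params2):
    """Sum of mixture log-densities over all (theta1, theta2) pairs."""
    return sum(mixture_log_density(t1, t2, p, params1, params2) for t1, t2 in data)


data = [(1.8, 2.4), (1.9, 2.5), (-1.0, -0.5), (2.0, 2.3)]  # made-up angle pairs
comp1 = (1.9, 2.4, 0.8, 0.7)    # (mu1, mu2, r1, r2), illustrative
comp2 = (-1.0, -0.4, 0.5, 0.5)
print(mixture_log_likelihood(data, 0.7, comp1, comp2))
print(mixture_log_likelihood(data, 1.0, comp1, comp2))  # collapses to component 1
```

The second call shows the degenerate case in which the mixing proportion reaches the boundary: the mixture then carries no more information than its first component, which is why such a fit adds parameters without adding flexibility.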
The sine-skewed versions of the aforementioned distributions [24] form part of these evaluations. The results, including the MLEs of the parameters, the log-likelihood, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC), are shown in Table 1 and Table 2. Based on these results, the bimodal transformed sine model in (16) provides the best fit for the data, and it performs better than the mixture models for these datasets. Based on the symmetry test of Ameijeiras-Alonso and Ley [24] and the log-likelihood values in Table 1 and Table 2, there is no evidence against pointwise symmetry of the underlying distributions for SCOP.1 and SCOP.3. The results for the mixture of transformed sine models and the mixture of transformed cosine models are not reported in Table 1 and Table 2 because $\hat{p} \approx 1$. Scatter plots of the data, together with contour plots of the fitted distributions, are provided in Figure 9 and Figure 10.
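The two information criteria used for model comparison follow the standard definitions $\mathrm{AIC} = 2k - 2l$ and $\mathrm{BIC} = k \log n - 2l$, where $l$ is the maximized log-likelihood, $k$ the number of free parameters, and $n$ the sample size. The short Python sketch below applies them to the transformed sine fit on SCOP.1, taking $k = 7$ from the parameter vector $\zeta$ and the log-likelihood and $n$ from Table 1; the function names are illustrative.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2l."""
    return 2.0 * n_params - 2.0 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: k log(n) - 2l."""
    return n_params * math.log(n_obs) - 2.0 * log_lik


# Transformed sine model on SCOP.1: 7 parameters, n = 10,188 (Table 1).
log_lik, k, n = -15558.98, 7, 10188
print(round(aic(log_lik, k), 2), round(bic(log_lik, k, n), 2))
```

The computed values agree with the AIC and BIC reported for this model in Table 1 up to rounding of the log-likelihood. Because BIC penalizes each parameter by $\log n$ rather than 2, it favours single-component models over mixtures more strongly at these sample sizes.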
For the last dataset, TCBIG.VAL.right, the single-component distributions did not fit well, so only mixtures of the aforementioned distributions were considered. For comparison, goodness of fit was evaluated for mixtures of transformed sine and cosine models and for mixtures of the existing models. The results are listed in Table 3. The mixture of transformed sine models provides the best fit to the data. Scatter plots of the data and contour plots of the fitted distributions are shown in Figure 11.
The kernel density plots of the three datasets and the best-fit models obtained for each dataset are shown in Figure 12. The contour levels of the fitted densities closely match those of the kernel density estimates of the data, which indicates that the proposed models provide an accurate fit.

6. Simulation Study

The authors of Ref. [16] explored suitable methods for generating samples from the cosine (with positive interaction) and sine models. They found that both Gibbs and rejection sampling approaches performed well, but that the latter was more efficient. To simulate samples from the newly proposed transformed sine and transformed cosine distributions in (16) and (10), four R packages, which are generally based on rejection sampling, were used and their results compared: MCMCpack [43], gibbs.met [44], LearnBayes [45], and MHadaptive [46]. These packages implement Metropolis sampling, random walk Metropolis sampling, Metropolis-Hastings MCMC sampling, and Gibbs sampling with Metropolis steps. First, a sample of size $n = 1000$ was generated with each package from the transformed sine model in (16), with parameters $\kappa_1 = 2.1585$, $\kappa_2 = 0.3489$, $\kappa_3 = 3.1712$, $r_1 = 0.6036$, $r_2 = 0.0131$, $\mu_1 = 1.8573$, and $\mu_2 = 2.4321$ (the best-fit model for the SCOP.1 dataset in the previous section). The results, including scatter plots of the simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], which use the last 10 percent of the chain, are shown in Figure 13. The runtime of each method for a sample of size $n = 100$, measured with the microbenchmark package [48], is shown in Figure 14 (left) (system: Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz, 8.00 GB RAM). Second, the MLEs of the parameters, together with the bias and mean squared error (MSE) of the estimates, were calculated for each method by Monte Carlo simulation with 500 replications and $n = 100$ and $n = 1000$. The results are listed in Table 4.
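To make the sampling step concrete, here is a minimal random walk Metropolis sampler on the torus written in plain Python. It is a sketch under stated assumptions rather than a reimplementation of any of the R functions compared in this section: the target is the unnormalized bivariate von Mises sine density, $\exp\{\kappa_1\cos(\theta_1-\mu_1)+\kappa_2\cos(\theta_2-\mu_2)+\kappa_3\sin(\theta_1-\mu_1)\sin(\theta_2-\mu_2)\}$, used as a stand-in for the transformed models, and the step size, burn-in length, and parameter values are illustrative choices.

```python
import math
import random


def log_sine_kernel(theta1, theta2, kappa1, kappa2, kappa3, mu1, mu2):
    """Unnormalised log-density of the bivariate von Mises sine model."""
    return (kappa1 * math.cos(theta1 - mu1) + kappa2 * math.cos(theta2 - mu2)
            + kappa3 * math.sin(theta1 - mu1) * math.sin(theta2 - mu2))


def wrap(angle):
    """Map an angle to [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def random_walk_metropolis(log_target, start, n_samples, step, rng, burn_in=1000):
    """Random walk Metropolis on the torus with wrapped Gaussian proposals."""
    current = start
    current_log = log_target(*current)
    samples, accepted = [], 0
    for it in range(burn_in + n_samples):
        proposal = (wrap(current[0] + rng.gauss(0.0, step)),
                    wrap(current[1] + rng.gauss(0.0, step)))
        proposal_log = log_target(*proposal)
        # The proposal is symmetric, so the acceptance ratio is the target ratio.
        if math.log(1.0 - rng.random()) < proposal_log - current_log:
            current, current_log = proposal, proposal_log
            accepted += 1
        if it >= burn_in:
            samples.append(current)
    return samples, accepted / (burn_in + n_samples)


rng = random.Random(7)
kappa = dict(kappa1=3.0, kappa2=2.0, kappa3=1.0, mu1=1.0, mu2=-0.5)  # illustrative values
target = lambda t1, t2: log_sine_kernel(t1, t2, **kappa)
samples, rate = random_walk_metropolis(target, (0.0, 0.0), 5000, 0.8, rng)
mean_dir1 = math.atan2(sum(math.sin(s[0]) for s in samples),
                       sum(math.cos(s[0]) for s in samples))
print("acceptance rate = %.2f, circular mean of theta1 = %.3f" % (rate, mean_dir1))
```

Only the unnormalized density is needed, which is convenient for the transformed models because their normalizing constants never have to be evaluated inside the sampler. Wrapping the proposal keeps the chain on $[-\pi, \pi)^2$ without breaking the symmetry of the proposal.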
Similarly, for the transformed cosine model in (10) with parameters $\kappa_1 = 3.9891$, $\kappa_2 = 0.6532$, $\kappa_3 = 1.7911$, $r_1 = 0.2305$, $r_2 = 0.5046$, $\mu_1 = 1.5651$, and $\mu_2 = 0.9878$, the aforementioned R packages were first applied to generate a sample of size $n = 1000$. The results, including scatter plots of the simulated samples with contour plots of the distribution, trace plots, and compare-partial plots [47], are shown in Figure 15. The runtime of each method for a sample of size $n = 100$ [48] is presented in Figure 14 (right). Then, the MLEs of the parameters, together with the bias and MSE of the estimates, were calculated for each method by Monte Carlo simulation with 500 replications and $n = 100$ and $n = 1000$. The results are listed in Table 4 and support the performance of the selected approach for obtaining the MLEs of the parameters. As shown in Figure 14, MCMCmetrop1R is the fastest method and gibbs_met is the slowest. According to the results in Table 4, rejection sampling provides accurate results. Gibbs sampling with Metropolis steps (gibbs_met) is also precise despite its low speed. As $n$ increases, the bias and MSE decrease.
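The Monte Carlo summary used in Table 4 can be sketched as follows: for each replication an estimate is computed from a simulated sample, and the bias and MSE are the averages of $\hat{\vartheta} - \vartheta$ and $(\hat{\vartheta} - \vartheta)^2$ over replications. The Python example below is illustrative only. It uses a wrapped Cauchy sample and the mean resultant length as a simple moment estimator of the concentration $r$ (whose first trigonometric moment has modulus $r$), not the MLEs of the transformed models; all names and values are assumptions.

```python
import math
import random


def sample_wrapped_cauchy(mu, r, n, rng):
    """Draw n wrapped Cauchy angles by wrapping Cauchy variates with scale -log(r)."""
    scale = -math.log(r)
    return [(mu + scale * math.tan(math.pi * (rng.random() - 0.5))) % (2.0 * math.pi)
            for _ in range(n)]


def mean_resultant_length(angles):
    """Moment estimator of r: length of the average unit vector."""
    c = sum(math.cos(a) for a in angles) / len(angles)
    s = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(c, s)


def bias_and_mse(estimates, truth):
    """Monte Carlo bias and mean squared error of a list of estimates."""
    bias = sum(estimates) / len(estimates) - truth
    mse = sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return bias, mse


rng = random.Random(2021)
true_mu, true_r, replications = 1.0, 0.6, 500  # illustrative values
results = {}
for n in (100, 1000):
    estimates = [mean_resultant_length(sample_wrapped_cauchy(true_mu, true_r, n, rng))
                 for _ in range(replications)]
    results[n] = bias_and_mse(estimates, true_r)
    print("n = %4d: bias = %+.4f, MSE = %.5f" % (n, results[n][0], results[n][1]))
```

The same loop structure carries over to the study in this section by replacing the sampler with one of the Metropolis-type samplers and the moment estimator with the numerical MLE of $\zeta$.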

7. Conclusions

In MCMC protein sampling for predicting the 3D structure, the results are more accurate when the proposal distribution is closer to the stationary distribution. A suitable proposal distribution can therefore be defined using the angles and bond lengths observed in natural proteins, and statistical distributions for modelling protein dihedral angles can serve as proposal distributions for MCMC protein sampling. We gave a brief overview of the existing symmetric models that form the basis of the models proposed in this paper ((2) and (4)). In addition, new Möbius transformation-induced toroidal distributions, together with skewed versions, were developed in this study as alternative proposal distributions for the MCMC sampling of proteins. We demonstrated their performance on three protein datasets of toroidal nature and graphically illustrated their flexible behaviour. The AIC and BIC confirmed the better performance of our proposed models in comparison with the existing models. The newly proposed models even outperformed mixtures of well-known models for toroidal data. In comparison with the existing toroidal models, the proposed models reflect the protein structural information better and should be incorporated into proposal distributions. Lastly, to meet the need for sampling from the proposal distribution in the MCMC algorithm, suitable methods for generating samples from the new models were explored using different types of Metropolis sampling. In the future, the Möbius transformation could be investigated as a means of obtaining new cylindrical distributions.

Author Contributions

Conceptualization, M.A. and A.B.; methodology, M.A., N.N.R., A.B. and W.-D.S.; validation, M.A., N.N.R. and A.B.; formal analysis, M.A., N.N.R. and A.B.; investigation, M.A., N.N.R., A.B. and W.-D.S.; writing—original draft preparation, N.N.R.; writing—review and editing, M.A., N.N.R. and A.B.; visualization, M.A., N.N.R., A.B. and W.-D.S.; supervision, M.A. and A.B.; project administration, N.N.R.; funding acquisition, N.N.R. and A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Research Foundation, grant numbers 71199, 109214, and 120839.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are available at http://scop.mrc-lmb.cam.ac.uk/scop/ (accessed on 20 August 2020).

Acknowledgments

We would like to sincerely thank the three anonymous reviewers for their constructive comments that improved the paper. This work was based on research supported in part by the National Research Foundation (NRF) of South Africa, SARChI Research Chair UID: 71199; Ref.: IFR170227223754 grant No. 109214; Ref.: SRUG190308422768 grant No. 120839, STATOMET at the Department of Statistics at the University of Pretoria, and DSI-NRF Centre of Excellence in Mathematical and Statistical Sciences (CoE-MaSS), South Africa. The opinions expressed and conclusions arrived at are those of the authors and are not necessarily attributed to the CoE-MaSS or the NRF.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Proposition 1

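When $r_1, r_2 \to 0$, pdf (10) tends to the cosine distribution, which for large values of $\kappa_1$ and $\kappa_2$ is concentrated near 0. Suppose $\mu_1 = \mu_2 = 0$ (without loss of generality). According to Theorem 1 in [14] and using Taylor expansions, $(\Theta_1, \Theta_2)$ is approximately distributed as $N_2(0, \Sigma)$, where
$$
\Sigma^{-1} = \begin{pmatrix} \kappa_1 - \kappa_3 & \kappa_3 \\ \kappa_3 & \kappa_2 - \kappa_3 \end{pmatrix},
\qquad \text{with } \kappa_3 < \frac{\kappa_1 \kappa_2}{\kappa_1 + \kappa_2}.
$$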
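The Taylor expansion step in the proof above can be written out explicitly. The following sketch assumes that the exponent of the cosine model in (4) has the positive-interaction form $\kappa_1\cos(\theta_1-\mu_1)+\kappa_2\cos(\theta_2-\mu_2)-\kappa_3\cos(\theta_1-\mu_1-\theta_2+\mu_2)$ used in [14], with $\mu_1=\mu_2=0$, and uses $\cos x \approx 1 - x^2/2$ for angles concentrated near 0.

```latex
\begin{aligned}
&\kappa_1\cos\theta_1 + \kappa_2\cos\theta_2 - \kappa_3\cos(\theta_1-\theta_2) \\
&\quad\approx \kappa_1\Bigl(1-\tfrac{\theta_1^2}{2}\Bigr) + \kappa_2\Bigl(1-\tfrac{\theta_2^2}{2}\Bigr)
   - \kappa_3\Bigl(1-\tfrac{(\theta_1-\theta_2)^2}{2}\Bigr) \\
&\quad= (\kappa_1+\kappa_2-\kappa_3)
   - \tfrac12\bigl[(\kappa_1-\kappa_3)\theta_1^2 + 2\kappa_3\theta_1\theta_2 + (\kappa_2-\kappa_3)\theta_2^2\bigr] \\
&\quad= \text{const} - \tfrac12\,(\theta_1,\theta_2)\,\Sigma^{-1}\,(\theta_1,\theta_2)^{T},
\qquad
\Sigma^{-1} = \begin{pmatrix}\kappa_1-\kappa_3 & \kappa_3\\ \kappa_3 & \kappa_2-\kappa_3\end{pmatrix}, \\
&\det\Sigma^{-1} = \kappa_1\kappa_2 - \kappa_3(\kappa_1+\kappa_2) > 0
\iff \kappa_3 < \frac{\kappa_1\kappa_2}{\kappa_1+\kappa_2}.
\end{aligned}
```

For $\kappa_1, \kappa_2 > 0$, the determinant condition also implies $\kappa_3 < \kappa_1$, so both leading principal minors are positive and $\Sigma^{-1}$ is positive definite, which is what the bivariate normal approximation requires.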

Appendix A.2. Proof of Corollary 1

Without loss of generality, we consider $\mu_1 = 0$. According to (12), we conclude that:
$$
\frac{f'_{\Theta_1}(\theta_1)}{f_{\Theta_1}(\theta_1)} = \left\{ -\frac{2 r_1}{1 + r_1^2 - 2 r_1 \cos\theta_1} + \frac{\kappa_2 \kappa_3 (1 - r_1^2)^2\, A(h(\theta_1))}{h(\theta_1)\,(1 + r_1^2 - 2 r_1 \cos\theta_1)^2} - \frac{\kappa_1 (1 - r_1^2)^2}{(1 + r_1^2 - 2 r_1 \cos\theta_1)^2} \right\} \sin\theta_1 = g(\theta_1) \sin\theta_1. \tag{A1}
$$
In (A1), if $\kappa_3 < 0$, then $g(\theta_1) < 0$. Therefore, for $\theta_1 \in [0, \pi)$, $f'_{\Theta_1}(\theta_1) < 0$, and for $\theta_1 \in [-\pi, 0)$, $f'_{\Theta_1}(\theta_1) \geq 0$. Thus, $f_{\Theta_1}(\theta_1)$ is increasing in $[-\pi, 0)$ and decreases from 0 to $\pi$. In addition, $f_{\Theta_1}(-\theta_1) = f_{\Theta_1}(\theta_1)$, which means that $f_{\Theta_1}(\theta_1)$ is symmetric around 0; thus, for $\kappa_3 < 0$, $f_{\Theta_1}(\theta_1)$ is unimodal.

If $\kappa_3 > 0$, $h(\theta_1)$ decreases from $-\pi$ to 0 and increases from 0 to $\pi$, with $h(0) = \kappa_2 - \kappa_3$ and $h(\pi) = \kappa_2 + \kappa_3$. From Lemma 1 in [13], $A(t)/t$ is a decreasing function of $t$; therefore, $A(h(\theta_1))/h(\theta_1)$ is increasing in $[-\pi, 0)$ and decreases from 0 to $\pi$. It can be concluded that $g(\theta_1)$ is decreasing in $[0, \pi)$ and increasing in $[-\pi, 0)$. Hence, if
$$
-2 r_1 + \kappa_2 \kappa_3 \frac{A(\kappa_2 - \kappa_3)}{\kappa_2 - \kappa_3} \frac{(1 - r_1^2)^2}{(1 - r_1)^2} - \kappa_1 \frac{(1 - r_1^2)^2}{(1 - r_1)^2} < 0,
$$
then $f'_{\Theta_1}(\theta_1) \geq 0$ for $\theta_1 \in [-\pi, 0)$ and $f'_{\Theta_1}(\theta_1) < 0$ for $\theta_1 \in [0, \pi)$, which means that $f_{\Theta_1}(\theta_1)$ is unimodal. If
$$
-2 r_1 + \kappa_2 \kappa_3 \frac{A(\kappa_2 - \kappa_3)}{\kappa_2 - \kappa_3} \frac{(1 - r_1^2)^2}{(1 - r_1)^2} - \kappa_1 \frac{(1 - r_1^2)^2}{(1 - r_1)^2} > 0
\quad\text{and}\quad
-2 r_1 + \kappa_2 \kappa_3 \frac{A(\kappa_2 + \kappa_3)}{\kappa_2 + \kappa_3} \frac{(1 - r_1^2)^2}{(1 + r_1)^2} - \kappa_1 \frac{(1 - r_1^2)^2}{(1 + r_1)^2} \leq 0,
$$
then $f_{\Theta_1}(\theta_1)$ is first increasing and then decreasing in $[-\pi, 0)$, which means that $f_{\Theta_1}(\theta_1)$ is bimodal. A more detailed proof is provided by the authors upon request.

Appendix A.3. Proof of Corollary 2

Suppose $\mu_1 = 0$ (without loss of generality). According to (18), the following result can be obtained:
$$
\begin{aligned}
\frac{f'_{\Theta_1}(\theta_1)}{f_{\Theta_1}(\theta_1)} ={}& \Biggl\{ -\frac{2 r_1}{1 + r_1^2 - 2 r_1 \cos\theta_1} - \frac{\kappa_1 (1 - r_1^2)^2}{(1 + r_1^2 - 2 r_1 \cos\theta_1)^2} + \frac{\kappa_3^2 (1 - r_1^2)^2\, A(h(\theta_1))}{h(\theta_1)\,(1 + r_1^2 - 2 r_1 \cos\theta_1)^4} \\
&\times \Bigl[ \bigl((1 + r_1^2)^2 + 4 r_1^2\bigr) \cos\theta_1 - 4 r_1 (1 + r_1^2) \cos^2\theta_1 - 2 r_1 (1 + r_1^2) \sin^2\theta_1 \Bigr] \Biggr\} \sin\theta_1 = g(\theta_1) \sin\theta_1.
\end{aligned}
\tag{A2}
$$
In (A2), if $\cos\theta_1 \leq 0$, then $g(\theta_1) < 0$ and the sign of (A2) depends on the sign of $\sin\theta_1$. Hence, for $\theta_1 \in (-\pi, -\pi/2]$, $f'_{\Theta_1}(\theta_1) > 0$, and for $\theta_1 \in [\pi/2, \pi]$, $f'_{\Theta_1}(\theta_1) \leq 0$. Thus, $f_{\Theta_1}(\theta_1)$ is increasing in $(-\pi, -\pi/2]$ and decreasing from $\pi/2$ to $\pi$. In addition, $f_{\Theta_1}(-\theta_1) = f_{\Theta_1}(\theta_1)$, which means that $f_{\Theta_1}(\theta_1)$ is symmetric around 0; therefore, $f_{\Theta_1}(\theta_1)$ is unimodal. For $\theta_1 \in [0, \pi/2]$, $h(\theta_1)$ is an increasing function of $\theta_1$, and according to Lemma 1 in [13], $A(h(\theta_1))/h(\theta_1)$ is a decreasing function of $\theta_1$. We can conclude that if
$$
-2 r_1 (1 - r_1)^2 - \kappa_1 (1 - r_1^2)^2 + \kappa_3^2 (1 - r_1^2)^2 \frac{A(\kappa_2)}{\kappa_2} < 0,
$$
then $f_{\Theta_1}(\theta_1)$ is a decreasing function from 0 to $\pi/2$, and because $f_{\Theta_1}(\theta_1)$ is symmetric around 0, it increases from $-\pi/2$ to 0. If
$$
-2 r_1 (1 - r_1)^2 - \kappa_1 (1 - r_1^2)^2 + \kappa_3^2 (1 - r_1^2)^2 \frac{A(\kappa_2)}{\kappa_2} > 0,
$$
then $f_{\Theta_1}(\theta_1)$ first increases and then decreases in $[0, \pi/2]$, and by symmetry around 0 has the mirrored behaviour in $[-\pi/2, 0]$, which shows that $f_{\Theta_1}(\theta_1)$ is bimodal.

References

  1. Ramachandran, G.T.; Sasisekharan, V. Conformation of polypeptides and proteins. Adv. Protein Chem. 1968, 23, 283–437.
  2. Holley, L.H.; Karplus, M. Protein secondary structure prediction with a neural network. Proc. Natl. Acad. Sci. USA 1989, 86, 152–156.
  3. Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. Equation of state calculations by fast computing machines. J. Chem. Phys. 1953, 21, 1087–1092.
  4. Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109.
  5. Irbäck, A.; Mohanty, S. PROFASI: A Monte Carlo simulation package for protein folding and aggregation. J. Comput. Chem. 2006, 27, 1548–1555.
  6. Jones, D.T. Successful ab initio prediction of the tertiary structure of NK-lysin using multiple sequences and recognized supersecondary structural motifs. Proteins Struct. Funct. Bioinform. 1997, 29, 185–191.
  7. Jones, T.A.; Thirup, S. Using known substructures in protein model building and crystallography. Embo J. 1986, 5, 819–822.
  8. Simons, K.T.; Kooperberg, C.; Huang, E.; Baker, D. Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions. J. Mol. Biol. 1997, 268, 209–225.
  9. Ley, C.; Verdebout, T. Applied Directional Statistics: Modern Methods and Case Studies; CRC Press: Boca Raton, FL, USA, 2018.
  10. Ley, C.; Verdebout, T. Modern Directional Statistics; CRC Press: Boca Raton, FL, USA, 2017.
  11. Mardia, K.V. Statistics of directional data. J. R. Stat. Soc. Ser. B (Methodol.) 1975, 37, 349–371.
  12. Rivest, L.P. A distribution for dependent unit vectors. Commun. Stat.-Theory Methods 1988, 17, 461–483.
  13. Singh, H.; Hnizdo, V.; Demchuk, E. Probabilistic model for two dependent circular variables. Biometrika 2002, 89, 719–723.
  14. Mardia, K.V.; Taylor, C.C.; Subramaniam, G.K. Protein bioinformatics and mixtures of bivariate von-Mises distributions for angular data. Biometrics 2007, 63, 505–512.
  15. Kent, J.T.; Mardia, K.V.; Taylor, C.C. Modelling strategies for bivariate circular data. In Proceedings of the Leeds Annual Statistical Research Conference; The Art and Science of Statistical Bioinformatics, Leeds University Press: Leeds, UK, 2008; pp. 70–73.
  16. Mardia, K.V.; Frellsen, J. Statistics of bivariate von Mises distributions. In Bayesian Methods in Structural Bioinformatics; Springer: Berlin/Heidelberg, Germany, 2012; pp. 159–178.
  17. Mardia, K.V.; Hughes, G.; Taylor, C.C.; Singh, H. A multivariate von Mises distribution with applications to bioinformatics. Can. J. Stat. 2008, 36, 99–109.
  18. Wehrly, T.E.; Johnson, R.A. Bivariate models for dependence of angular observations and a related Markov process. Biometrika 1980, 67, 255–256.
  19. Jones, M.C.; Pewsey, A.; Kato, S. On a class of circulas: Copulas for circular distributions. Ann. Inst. Stat. Math. 2015, 67, 843–862.
  20. Fernández-Durán, J.J. Models for circular–linear and circular–circular data constructed from circular distributions based on nonnegative trigonometric sums. Biometrics 2007, 63, 579–585.
  21. García-Portugués, E.; Crujeiras, R.M.; González-Manteiga, W. Exploring wind direction and SO2 concentration by circular–linear density estimation. Stoch. Environ. Res. Risk Assess. 2013, 27, 1055–1067.
  22. Pewsey, A.; García-Portugués, E. Recent advances in directional statistics. TEST 2021, 30, 1–58.
  23. Di Marzio, M.; Panzera, A.; Taylor, C.C. Kernel density estimation on the torus. J. Stat. Plan. Inference 2011, 141, 2156–2173.
  24. Ameijeiras-Alonso, J.; Ley, C. Sine-skewed toroidal distributions and their application in protein bioinformatics. Biostatistics 2020. Available online: https://doi.org/10.1093/biostatistics/kxaa039 (accessed on 20 January 2021).
  25. Kato, S.; Shimizu, K.; Shieh, G.S. A circular–circular regression model. Stat. Sin. 2008, 18, 633–645.
  26. Shieh, G.S.; Johnson, R.A. Inferences based on a bivariate distribution with von-Mises marginals. Ann. Inst. Stat. Math. 2005, 57, 789–802.
  27. Shieh, G.S.; Zheng, S.; Johnson, R.A.; Chang, Y.F.; Shimizu, K.; Wang, C.C.; Tang, S.L. Modeling and comparing the organization of circular genomes. Bioinformatics 2011, 27, 912–918.
  28. Liu, D.; Peddada, S.D.; Li, L.; Weinberg, C.R. Phase analysis of circadian-related genes in two tissues. BMC Bioinform. 2006, 7, 87.
  29. Downs, T.D.; Mardia, K.V. Circular regression. Biometrika 2002, 89, 683–697.
  30. Jones, M.C. The Möbius distribution on the disc. Ann. Inst. Stat. Math. 2004, 56, 733–742.
  31. Kato, S.; Jones, M.C. A family of distributions on the circle with links to, and applications arising from, Möbius transformation. J. Am. Stat. Assoc. 2010, 105, 249–262.
  32. Wang, M.Z.; Shimizu, K. On applying Möbius transformation to cardioid random variables. Stat. Methodol. 2012, 9, 604–614.
  33. Kato, S.; Pewsey, A. A Möbius transformation-induced distribution on the torus. Biometrika 2015, 102, 359–370.
  34. Kato, S. A distribution for a pair of unit vectors generated by Brownian motion. Bernoulli 2009, 15, 898–921.
  35. Kato, S.; McCullagh, P. Some properties of a Cauchy family on the sphere derived from the Möbius transformation. Bernoulli 2020, 26, 3224–3248.
  36. McCullagh, P. Möbius transformation and Cauchy parameter estimation. Ann. Stat. 1996, 24, 787–808.
  37. Abe, T.; Pewsey, A. Sine-skewed circular distributions. Stat. Pap. 2011, 52, 683–707.
  38. Mullen, K.; Ardia, D.; Gil, D.L.; Windover, D.; Cline, J. DEoptim: An R package for global optimization by differential evolution. J. Stat. Softw. 2011, 40, 1–26.
  39. Storn, R.; Price, K. Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J. Glob. Optim. 1997, 11, 341–359.
  40. Price, K.; Storn, R.M.; Lampinen, J.A. Differential Evolution: A Practical Approach to Global Optimization; Springer: Berlin/Heidelberg, Germany, 2006.
  41. Najibi, S.M.; Maadooliat, M.; Zhou, L.; Huang, J.Z.; Gao, X. Protein structure classification and loop modeling using multiple Ramachandran distributions. Comput. Struct. Biotechnol. J. 2017, 15, 243–254.
  42. Moghimbeygi, M.; Golalizadeh, M. Spherical logistic distribution. Commun. Math. Stat. 2020, 8, 151–166.
  43. Martin, A.D.; Quinn, K.M.; Park, J.H.; Park, M.J.H. MCMCpack: Markov Chain Monte Carlo (MCMC) Package; Version 1.5-0; R Package: Vienna, Austria, 2020; Available online: https://cran.r-project.org/web/packages/MCMCpack/index.html (accessed on 25 August 2020).
  44. Li, L. gibbs.met: Naive Gibbs Sampling with Metropolis Steps; Version 1.1-3; R Package: Vienna, Austria, 2015; Available online: https://cran.r-project.org/web/packages/gibbs.met/index.html (accessed on 25 August 2020).
  45. Albert, J. LearnBayes: Functions for Learning Bayesian Inference; Version 2.15.1; R Package: Vienna, Austria, 2018; Available online: https://cran.r-project.org/web/packages/LearnBayes/index.html (accessed on 25 August 2020).
  46. Chivers, C.; Chivers, M.C. MHadaptive: General Markov Chain Monte Carlo for Bayesian Inference Using Adaptive Metropolis-Hastings Sampling; Version 1.1-8; R Package: Vienna, Austria, 2015; Available online: https://cran.r-project.org/web/packages/MHadaptive/index.html (accessed on 25 August 2020).
  47. Fernández-i-Marín, X. ggmcmc: Analysis of MCMC samples and Bayesian inference. J. Stat. Softw. 2016, 70, 1–20.
  48. Mersmann, O. Microbenchmark: Accurate Timing Functions; Version 1.4-7; R Package: Vienna, Austria, 2019; Available online: https://www.rdocumentation.org/packages/microbenchmark/versions/1.4-7/topics/microbenchmark (accessed on 25 August 2020).
Figure 1. Two representations of protein backbone structures based on torsion or pseudo-torsion angles.
Figure 2. Pdf and contour plots of the transformed cosine model (10) for $\mu_1 = \mu_2 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, and $r_2$.
Figure 3. Plots of the marginal pdf of $\Theta_1$ in (12) (left) and in (18) (right) for $\mu_1 = 0$ and different parameter values.
Figure 4. Pdf and contour plots of the transformed sine model (16) for $\mu_1 = \mu_2 = 0$ and different values of $\kappa_1$, $\kappa_2$, $\kappa_3$, $r_1$, and $r_2$.
Figure 5. Pdf and contour plots of the sine-skewed transformed cosine model in (22) (top) and the sine-skewed transformed sine model in (24) (bottom) for different values of $\lambda_1$ and $\lambda_2$.
Figure 6. Plots of the marginal pdf of $\Theta_1$ for BSSTC (left) and BSSTS (right) for $\mu_1 = \mu_2 = 0$ and different parameter values.
Figure 7. Plots of the sine-skewed versions of the marginal pdfs of $\Theta_1$ in (12) (left) and (18) (right) for $k = 1$, $\mu_1 = 0$, and different parameter values.
Figure 8. Ramachandran plots for each dataset.
Figure 9. Contour plots of fitted pdfs together with scatter plot for SCOP.1 ($n = 10{,}188$). The last row includes the proposed models.
Figure 10. Contour plots of fitted pdfs together with scatter plot for SCOP.3 ($n = 4607$). The last row includes the proposed models.
Figure 11. Contour plots of fitted pdfs together with scatter plots for TCBIG.VAL.right ($n = 2673$). The last row includes the proposed models.
Figure 12. Kernel density plots of the data, and the best-fit models.
Figure 13. Scatter, trace, and compare-partial plots of the simulated data from the transformed sine model using “gibbs_met” in the “gibbs.met” package (first row), “MCMCmetrop1R” in the “MCMCpack” package (second row), “met_gaussian” in the “gibbs.met” package (third row), “Metro_Hastings” in the “MHadaptive” package (fourth row), and “rwmetrop” in the “LearnBayes” package (fifth row).
Figure 14. Execution times for generating a sample size of n = 100 from a transformed sine model (left) and a transformed cosine model (right) for each method.
Figure 15. Scatter, trace, and compare-partial plots of the simulated data from the transformed cosine model using “gibbs_met” in the “gibbs.met” package (first row), “MCMCmetrop1R” in the “MCMCpack” package (second row), “met_gaussian” in the “gibbs.met” package (third row), “Metro_Hastings” in the “MHadaptive” package (fourth row), and “rwmetrop” in the “LearnBayes” package (fifth row).
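Table 1. Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for SCOP.1 ($n = 10{,}188$).

| Model | $\hat{\rho}$ | $\hat{\kappa}_1$ | $\hat{\kappa}_2$ | $\hat{\kappa}_3$ | $\hat{r}_1$ | $\hat{r}_2$ | $\hat{\mu}_1$ | $\hat{\mu}_2$ | $\hat{\lambda}_1$ | $\hat{\lambda}_2$ | $\hat{p}$ | Log-likelihood | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sine [13] | | 25.2085 | 0.3679 | 7.3700 | | | 1.8976 | 2.4624 | | | | −15,890.80 | 31,790.16 | 31,827.74 |
| Sine-skewed sine [24] | | 18.8058 | 0.0852 | 4.8449 | | | 1.8701 | 3.1415 | 0.4051 | 0.3718 | | −18,089.71 | 36,193.42 | 36,244.02 |
| Mixture of sine, component 1 | | 4.9938 | 0.4603 | 2.3512 | | | 2.0560 | 2.5011 | | | 0.3476 | −15,719.26 | 31,460.22 | 31,540.04 |
| Mixture of sine, component 2 | | 0.0217 | 0.0413 | 4.4594 | | | 1.0912 | 1.8997 | | | 0.6524 | | | |
| Cosine [14] | | 11.6274 | $6.7 \times 10^{-17}$ | 0.6507 | | | 1.8807 | 0.8652 | | | | −19,919.04 | 39,848.07 | 39,884.22 |
| Sine-skewed cosine [24] | | 11.6274 | $1.7 \times 10^{-8}$ | 0.6507 | | | 1.8807 | 0.8651 | 0.7557 | 0.0789 | | −19,919.04 | 39,852.07 | 39,902.68 |
| Mixture of cosine [14], component 1 | | 9.6015 | 2.6459 | 0.0087 | | | 1.7967 | 0.8676 | | | 0.5266 | −18,120.09 | 36,262.18 | 36,341.70 |
| Mixture of cosine [14], component 2 | | 8.4761 | 0.0820 | 2.3228 | | | 2.1309 | 0.9647 | | | 0.4734 | | | |
| Mixture of bivariate wrapped Cauchy [33], component 1 | 0.2892 | | | | 0.9551 | 0.5649 | 1.6129 | 1.5337 | | | 0.4463 | −17,099.36 | 34,220.72 | 34,300.24 |
| Mixture of bivariate wrapped Cauchy [33], component 2 | 0.1289 | | | | 0.8513 | 0.5433 | 2.1128 | 2.6980 | | | 0.5537 | | | |
| Transformed sine | | 2.1585 | 0.3489 | 3.1712 | 0.6036 | 0.0131 | 1.8573 | 2.4321 | | | | −15,558.98 | 31,131.97 | 31,182.56 |
| Sine-skewed transformed sine | | 2.1582 | 0.3487 | 3.1712 | 0.6037 | 0.0131 | 1.8573 | 2.4321 | 0.1894 | 0.0556 | | −15,558.98 | 31,135.97 | 31,201.02 |
| Transformed cosine | | 4.5122 | $1.9 \times 10^{-16}$ | 2.7905 | 0.2632 | 0.4164 | 1.8806 | 0.6888 | | | | −16,920.43 | 33,854.86 | 33,905.46 |
| Sine-skewed transformed cosine | | 4.4704 | $4.2 \times 10^{-5}$ | 2.8185 | 0.2656 | 0.4228 | 1.8805 | −0.6871 | 0.6225 | 0.1849 | | −16,920.43 | 33,858.86 | 33,923.92 |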
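Table 2. Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for SCOP.3 ($n = 4607$).

| Model | $\hat{\rho}$ | $\hat{\kappa}_1$ | $\hat{\kappa}_2$ | $\hat{\kappa}_3$ | $\hat{r}_1$ | $\hat{r}_2$ | $\hat{\mu}_1$ | $\hat{\mu}_2$ | $\hat{\lambda}_1$ | $\hat{\lambda}_2$ | $\hat{p}$ | Log-likelihood | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sine [13] | | 27.0312 | 0.3243 | 8.0789 | | | 1.8810 | 2.4618 | | | | −6970.09 | 13,950.18 | 13,982.36 |
| Sine-skewed sine [24] | | 26.8304 | 0.3224 | 8.0732 | | | 1.8960 | 2.4724 | 0.4124 | 0.0159 | | −6941.31 | 13,896.62 | 13,941.67 |
| Mixture of sine, component 1 | | 7.3842 | 2.0013 | 6.3567 | | | 2.0918 | 1.4321 | | | 0.6632 | −6893.41 | 13,901.15 | 13,879.61 |
| Mixture of sine, component 2 | | 2.8774 | 0.0347 | 1.7125 | | | 1.9306 | 1.1124 | | | 0.3368 | | | |
| Cosine [14] | | 11.5883 | $3.8 \times 10^{-16}$ | 0.6404 | | | 1.8537 | 0.9851 | | | | −9028.72 | 18,067.45 | 18,099.62 |
| Sine-skewed cosine [24] | | 11.5883 | $5.0 \times 10^{-9}$ | 0.6404 | | | 1.8537 | 0.9850 | 0.0168 | 0.6387 | | −9028.72 | 18,071.45 | 18,116.49 |
| Mixture of cosine [14], component 1 | | 29.9375 | 1.9210 | 0.0213 | | | 1.6840 | 0.8043 | | | 0.5648 | −6959.76 | 13,941.52 | 14,012.31 |
| Mixture of cosine [14], component 2 | | 17.3302 | 0.0211 | 1.9456 | | | 2.0575 | 0.8866 | | | 0.4352 | | | |
| Mixture of bivariate wrapped Cauchy [33], component 1 | 0.2347 | | | | 0.9169 | 0.5546 | 1.5969 | 1.1037 | | | 0.4712 | −7137.52 | 14,297.04 | 14,367.83 |
| Mixture of bivariate wrapped Cauchy [33], component 2 | 0.1279 | | | | 0.8388 | 0.5100 | 1.9792 | 2.0869 | | | 0.5288 | | | |
| Transformed sine | | 3.8755 | 0.3414 | 3.6786 | 0.4950 | $1.3 \times 10^{-9}$ | 1.8589 | 2.4490 | | | | −6905.08 | 13,824.17 | 13,869.22 |
| Sine-skewed transformed sine | | 3.8764 | 0.3415 | 3.7066 | 0.4883 | $2.6 \times 10^{-8}$ | 1.8591 | 2.4491 | 0.1544 | 0.0796 | | −6905.08 | 13,828.17 | 13,886.08 |
| Transformed cosine | | 4.1351 | $2.4 \times 10^{-16}$ | 2.8283 | 0.2884 | 0.4183 | 1.8604 | 0.6560 | | | | −7567.28 | 15,148.56 | 15,193.61 |
| Sine-skewed transformed cosine | | 4.1350 | $6.4 \times 10^{-10}$ | 2.8283 | 0.2884 | 0.4183 | 1.8604 | 0.6560 | 0.6868 | 0.1567 | | −7567.27 | 15,152.56 | 15,210.46 |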
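Table 3. Maximum likelihood estimates and corresponding log-likelihood, AIC, and BIC for TCBIG.VAL.right ($n = 2673$).

| Model | $\hat{\rho}$ | $\hat{\kappa}_1$ | $\hat{\kappa}_2$ | $\hat{\kappa}_3$ | $\hat{r}_1$ | $\hat{r}_2$ | $\hat{\mu}_1$ | $\hat{\mu}_2$ | $\hat{p}$ | Log-likelihood | AIC | BIC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mixture of sine, component 1 | | 4.4364 | 7.6222 | 1.2187 | | | 1.7736 | 2.3336 | 0.6239 | −4901.12 | 9824.25 | 9889.04 |
| Mixture of sine, component 2 | | 5.4606 | 7.7290 | 3.1154 | | | 1.4197 | 0.4111 | 0.3761 | | | |
| Mixture of cosine [14], component 1 | | 4.2849 | 6.1824 | $4.6 \times 10^{-6}$ | | | 1.4003 | 0.4191 | 0.3787 | −5005.01 | 10,032.03 | 10,096.82 |
| Mixture of cosine [14], component 2 | | 4.6268 | 7.8256 | $9.1 \times 10^{-6}$ | | | 1.7785 | 2.3358 | 0.6213 | | | |
| Mixture of bivariate wrapped Cauchy [33], component 1 | 0.3805 | | | | 0.8545 | 0.8118 | 1.1194 | 0.4710 | 0.3108 | −5283.42 | 10,588.85 | 10,653.64 |
| Mixture of bivariate wrapped Cauchy [33], component 2 | 0.0294 | | | | 0.7038 | 0.7778 | 1.8772 | 2.3037 | 0.6892 | | | |
| Mixture of transformed sine, component 1 | | 2.8274 | 7.3718 | 3.0328 | 0.2930 | 0.0872 | 1.3860 | 0.4040 | 0.3515 | −4826.80 | 9683.61 | 9771.96 |
| Mixture of transformed sine, component 2 | | 4.0949 | 1.6545 | 0.7133 | 0.02450 | 0.4387 | 1.7802 | 2.3495 | 0.6485 | | | |
| Mixture of transformed cosine, component 1 | | 2.3349 | 9.3385 | 0.0063 | 0.4909 | 0.1287 | 1.1835 | 0.5358 | 0.2645 | −4882.14 | 9794.28 | 9882.64 |
| Mixture of transformed cosine, component 2 | | 3.9496 | 0.0117 | 0.8886 | 0.0553 | 0.8668 | 1.8501 | 2.3841 | 0.7355 | | | |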
Table 4. Maximum likelihood estimates of parameters and bias, and the MSE of the estimates for the simulated data obtained from each method.
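| Method | Distribution | n | Statistic | $\kappa_1$ | $\kappa_2$ | $\kappa_3$ | $r_1$ | $r_2$ | $\mu_1$ | $\mu_2$ |
|---|---|---|---|---|---|---|---|---|---|---|
| MCMCmetrop1R | Transformed sine | 100 | MLE | 1.8450 | 0.2876 | 2.8360 | 0.6001 | 0.0063 | 1.7744 | 2.6177 |
| | | | Bias | 0.0542 | 0.1932 | 0.1531 | 0.0089 | 0.0521 | 0.0815 | 0.1185 |
| | | | MSE | 0.0879 | 0.0921 | 0.1864 | 0.0083 | 0.0044 | 0.0087 | 0.0344 |
| | | 1000 | MLE | 2.1311 | 0.2649 | 3.1546 | 0.6243 | 0.0395 | 1.8491 | 2.4263 |
| | | | Bias | 0.0298 | 0.0900 | 0.0135 | 0.0142 | 0.0130 | 0.0018 | 0.0046 |
| | | | MSE | 0.0527 | 0.0588 | 0.0002 | 0.0018 | 0.0008 | 0.0005 | $2.5 \times 10^{-5}$ |
| | Transformed cosine | 100 | MLE | 3.7753 | 0.5985 | 1.5693 | 0.2564 | 0.4660 | 1.4684 | 1.0937 |
| | | | Bias | 0.1137 | 0.0572 | 0.1022 | 0.0143 | 0.0376 | 0.0957 | 0.0421 |
| | | | MSE | 0.0998 | 0.0932 | 0.1007 | 0.0028 | 0.0116 | 0.0135 | 0.0275 |
| | | 1000 | MLE | 4.0693 | 0.6305 | 1.8210 | 0.2546 | 0.5557 | 1.5613 | 1.0116 |
| | | | Bias | 0.0758 | 0.0319 | 0.0305 | 0.0241 | 0.0329 | 0.0037 | 0.0238 |
| | | | MSE | 0.0065 | 0.0426 | 0.0085 | 0.0007 | 0.0014 | 0.0007 | 0.0011 |
| rwmetrop | Transformed sine | 100 | MLE | 1.8186 | 0.2260 | 3.2619 | 0.6444 | 0.0359 | 1.8815 | 2.6027 |
| | | | Bias | 0.2337 | 0.0867 | 0.3837 | 0.0398 | 0.0205 | 0.0370 | 0.1006 |
| | | | MSE | 0.1094 | 0.0350 | 0.0726 | 0.0016 | 0.0406 | 0.0022 | 0.0291 |
| | | 1000 | MLE | 1.9735 | 0.3887 | 3.1546 | 0.6204 | 0.0159 | 1.8701 | 2.4270 |
| | | | Bias | 0.0979 | 0.0360 | 0.0165 | 0.0131 | 0.0046 | 0.0129 | 0.0046 |
| | | | MSE | 0.0340 | 0.0482 | 0.0003 | 0.0004 | 0.0001 | 0.0005 | $2.1 \times 10^{-5}$ |
| | Transformed cosine | 100 | MLE | 3.8086 | 0.5397 | 1.4956 | 0.1726 | 0.5929 | 1.5746 | 1.0417 |
| | | | Bias | 0.0984 | 0.0940 | 0.2954 | 0.0512 | 0.0883 | 0.0115 | 0.0524 |
| | | | MSE | 0.0902 | 0.0893 | 0.1003 | 0.0033 | 0.0088 | 0.0009 | 0.0037 |
| | | 1000 | MLE | 3.9139 | 0.7320 | 1.8188 | 0.2546 | 0.4667 | 1.5766 | 0.9962 |
| | | | Bias | 0.0758 | 0.0703 | 0.0166 | 0.0241 | 0.0386 | 0.0095 | 0.0084 |
| | | | MSE | 0.0056 | 0.0674 | 0.0007 | 0.0013 | 0.0014 | 0.0007 | 0.0018 |
| met_gaussian | Transformed sine | 100 | MLE | 2.5400 | 0.4480 | 3.1322 | 0.6963 | 0.1087 | 1.8321 | 2.3133 |
| | | | Bias | 0.3469 | 0.0904 | 0.0328 | 0.0837 | 0.0916 | 0.0294 | 0.0908 |
| | | | MSE | 0.1773 | 0.0583 | 0.0117 | 0.0092 | 0.0091 | 0.0009 | 0.0141 |
| | | 1000 | MLE | 1.9022 | 0.2894 | 3.3403 | 0.6649 | 0.0016 | 1.8619 | 2.3822 |
| | | | Bias | 0.2223 | 0.0658 | 0.1137 | 0.0607 | 0.0123 | 0.0042 | 0.0481 |
| | | | MSE | 0.1268 | 0.0042 | 0.0286 | 0.0043 | 0.0025 | $2.1 \times 10^{-5}$ | 0.0024 |
| | Transformed cosine | 100 | MLE | 3.4389 | 0.5409 | 1.4373 | 0.2530 | 0.5214 | 1.5411 | 1.0526 |
| | | | Bias | 0.3642 | 0.1187 | 0.2970 | 0.0143 | 0.0113 | 0.0210 | 0.0623 |
| | | | MSE | 0.1927 | 0.0993 | 0.1251 | 0.0853 | 0.0082 | 0.0005 | 0.0041 |
| | | 1000 | MLE | 3.6491 | 0.5341 | 1.6260 | 0.2420 | 0.5134 | 1.5720 | 1.0158 |
| | | | Bias | 0.2978 | 0.1049 | 0.0999 | 0.0195 | 0.0027 | 0.0105 | 0.0126 |
| | | | MSE | 0.1214 | 0.0915 | 0.0875 | 0.0019 | 0.0032 | 0.0005 | 0.0024 |
| Metro_Hastings | Transformed sine | 100 | MLE | 2.1465 | 0.2728 | 2.6713 | 0.6912 | 0.0010 | 1.8107 | 2.2024 |
| | | | Bias | 0.0197 | 0.0784 | 0.4598 | 0.0856 | 0.0770 | 0.0603 | 0.1029 |
| | | | MSE | 0.0434 | 0.0883 | 0.2498 | 0.0073 | 0.0059 | 0.0046 | 0.0527 |
| | | 1000 | MLE | 2.1657 | 0.2743 | 3.2124 | 0.5813 | 0.0762 | 1.8487 | 2.4826 |
| | | | Bias | 0.0362 | 0.0456 | 0.0433 | 0.0262 | 0.0531 | 0.0081 | 0.0404 |
| | | | MSE | 0.0246 | 0.0556 | 0.0027 | 0.0011 | 0.0039 | $7.3 \times 10^{-5}$ | 0.0025 |
| | Transformed cosine | 100 | MLE | 3.8290 | 0.5903 | 2.0419 | 0.2857 | 0.5753 | 1.5434 | 0.8373 |
| | | | Bias | 0.1600 | 0.0582 | 0.2407 | 0.0577 | 0.0709 | 0.0172 | 0.1202 |
| | | | MSE | 0.1998 | 0.0944 | 0.1208 | 0.0061 | 0.0059 | 0.0006 | 0.0266 |
| | | 1000 | MLE | 3.8936 | 0.6747 | 1.5822 | 0.2775 | 0.4859 | 1.5676 | 0.9935 |
| | | | Bias | 0.0961 | 0.0298 | 0.2317 | 0.0470 | 0.0196 | 0.0025 | 0.0057 |
| | | | MSE | 0.0091 | 0.0847 | 0.0829 | 0.0030 | 0.0003 | 0.0002 | 0.0051 |
| gibbs_met | Transformed sine | 100 | MLE | 2.2340 | 0.2728 | 2.8277 | 0.6198 | 0.1398 | 1.8437 | 2.1688 |
| | | | Bias | 0.0712 | 0.0784 | 0.3884 | 0.0143 | 0.1220 | 0.0129 | 0.2598 |
| | | | MSE | 0.1171 | 0.0583 | 0.1179 | 0.0092 | 0.0160 | 0.0003 | 0.0692 |
| | | 1000 | MLE | 2.1901 | 0.3555 | 3.1760 | 0.6016 | 0.0156 | 1.8573 | 2.4342 |
| | | | Bias | 0.0315 | 0.0066 | 0.0048 | 0.0020 | 0.0024 | $6.1 \times 10^{-5}$ | 0.0021 |
| | | | MSE | 0.0963 | 0.0221 | 0.0464 | 0.0008 | 0.0003 | $1.7 \times 10^{-5}$ | 0.0018 |
| | Transformed cosine | 100 | MLE | 3.6123 | 0.5732 | 1.5193 | 0.2006 | 0.5852 | 1.5886 | 0.8808 |
| | | | Bias | 0.2842 | 0.0853 | 0.2717 | 0.0276 | 0.0869 | 0.0238 | 0.0758 |
| | | | MSE | 0.1419 | 0.0893 | 0.1067 | 0.0082 | 0.0087 | 0.0005 | 0.0187 |
| | | 1000 | MLE | 3.6322 | 0.5978 | 1.7613 | 0.2626 | 0.5793 | 1.5685 | 0.9668 |
| | | | Bias | 0.2568 | 0.0653 | 0.0232 | 0.0221 | 0.0648 | 0.0034 | 0.0209 |
| | | | MSE | 0.0915 | 0.0726 | 0.0008 | 0.0031 | 0.0074 | 0.0002 | 0.0046 |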
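The bias and MSE rows of Table 4 summarise how far the estimates obtained from simulated samples fall from the parameter value used to generate them. The sketch below shows one common way to compute both quantities in Python, assuming the usual definitions: bias as the mean estimation error and MSE as the mean squared estimation error over replicates. The function name `bias_and_mse`, the replicate estimates, and the true value are hypothetical and are not taken from the article.

```python
from statistics import fmean


def bias_and_mse(estimates, true_value):
    """Return (bias, MSE) of replicate estimates of a single parameter.

    bias = mean(estimate - true_value)
    MSE  = mean((estimate - true_value) ** 2)
    """
    errors = [est - true_value for est in estimates]
    bias = fmean(errors)
    mse = fmean([err * err for err in errors])
    return bias, mse


# HYPOTHETICAL replicate estimates of one parameter with true value 2.0;
# these numbers are illustrative only and do not come from Table 4.
replicates = [1.90, 2.10, 2.05, 1.95]
bias, mse = bias_and_mse(replicates, true_value=2.0)
print(f"bias = {bias:.4f}, MSE = {mse:.5f}")
```

Because MSE equals the variance of the estimates plus the squared bias, an MSE entry can never be smaller than the square of the matching bias entry, which gives a quick sanity check when reading such a table.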
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Arashi, M.; Nakhaei Rad, N.; Bekker, A.; Schubert, W.-D. Möbius Transformation-Induced Distributions Provide Better Modelling for Protein Architecture. Mathematics 2021, 9, 2749. https://doi.org/10.3390/math9212749
