Article

Superstatistics Applied to Cucurbitaceae DNA Sequences

1
Departamento de Física, Universidade Federal do Rio Grande do Norte, Natal 59072-970, Brazil
2
Departamento de Física, Universidade do Estado do Rio Grande do Norte, Mossoró 59610-210, Brazil
3
Departamento de Ciências Vegetais, Universidade Federal Rural do Semi-Árido, Mossoró 59625-900, Brazil
*
Author to whom correspondence should be addressed.
Entropy 2024, 26(10), 819; https://doi.org/10.3390/e26100819
Submission received: 11 June 2024 / Revised: 17 September 2024 / Accepted: 23 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue New Challenges in Contemporary Statistical Physics)

Abstract

Short- and long-range statistical correlations are an essential feature of genomic sequences: they are long-range for introns and short-range for exons. In this study, we employed superstatistics to investigate correlations and fluctuations in the distribution of nucleotide sequence lengths in the Cucurbitaceae family. We built a time series of exon sizes to probe these correlations and fluctuations, using data from the National Center for Biotechnology Information (NCBI) gene database to extract the temporal evolution of exon sizes, measured in base pairs (bp). To assess the model's viability, we used a timescale extraction method to determine the statistical properties of the time series, including the local distribution and its fluctuations, which yield exon size distributions of the q-Gamma and inverse q-Gamma type. From the standpoint of Bayesian statistics, both distributions capture the correlations and fluctuations in the data equally well.

1. Introduction

The dynamics of complex systems generally arise from a superposition of many dynamics on different time scales. From a statistical point of view, an approach that naturally captures such a superposition is the so-called superstatistics [1]. Specifically, the main idea of this formalism is to decompose the dynamics of the system into different scales, so that its statistical properties follow from a superposition of statistics: a Boltzmann factor $e^{-\beta E}$, where $E$ is the effective energy of each subsystem, is weighted by a probability density $p(\beta)$ of the fluctuating intensive parameter $\beta$. Moreover, one assumes local equilibrium on each scale, achieved at distinct values of $\beta$; the energy dissipation and the inverse temperature are examples of such parameters [1,2]. Applications of the superstatistics formalism range from econophysics [3,4], geophysics [5,6], turbulence [7,8,9], plasmas [10,11,12], and ultra-cold gases [13,14] to high-energy scattering processes [15,16], spin systems [17,18], cosmology [19,20], and stellar systems [21].
In systems of interest in biophysics, there are other connections, such as a superstatistical model (and its corresponding DNA generation algorithm), emulating the rules that dictate the (empirical) nucleotide arrangement properties of some DNA sequences [22]. In Ref. [23], the authors reported an analysis of DNA-binding proteins that exhibit heterogeneous diffusion processes in bacteria [24]. More recently, considering the sequence coding of the human genome, fluctuations in the distribution of the lengths of nucleotide strings have been developed using a stochastic model that provides the distributions of exon size [25].
In the context of the plant genome, on the other hand, there are more than 825 species in the Cucurbitaceae family, which is divided into 118 genera. This family contains many plant traits, sexual expression, fruits, and seeds. It has a significant impact as a source of nutrients and fiber [26,27]. The Cucumis genus stands out among the well-known genera in this family because it includes two species of commercially significant vegetables that are grown all over the world: cucumber (Cucumis sativus) and melon (Cucumis melo). As for the Cucurbita genus, it has three species that we consider to be the main ones: Cucurbita maxima, Cucurbita moschata, and Cucurbita pepo [28].
The basic chromosome number in the Cucumis genus varies from $x = 7$ to $x = 12$; the latter corresponds to the base number of the Cucurbitaceae family. There are three types of species: diploid ($2x = 14$ or 24 chromosomes), tetraploid ($4x = 48$ chromosomes), and hexaploid ($6x = 72$ chromosomes) [29,30]. Melon ($2x = 24$ chromosomes) and cucumber ($2x = 14$ chromosomes) are diploid species. According to karyotyping research, the reported length of the metaphase chromosomes in cucumber ranges from 1.48 μm to 2.31 μm [31] or from 1.65 μm to 3.40 μm [32], with the first six chromosomes being metacentric and the smallest submetacentric. For melon, the reported ranges are 1.44 μm to 2.40 μm [33], 1.20 μm to 2.50 μm [34], and 1.1 μm to 1.9 μm [35].
The complete sequencing of the cucumber and melon genomes was made possible by the development of high-throughput sequencing technology. The cucumber genome is estimated at 367 Mbp [36], whereas the melon genome is estimated at 450–454 Mbp. The estimated gene count for cucumber is 26,682, with an average gene size of 1046 bp and an average of 4.39 exons per gene. Melon has an estimated 27,427 genes, with an average size of 2776 bp and 5.85 exons per gene [37]. In contrast, every Cucurbita species is diploid, with 20 chromosome pairs ($2x = 40$). The species of the tribe Cucurbiteae, which include C. pepo, C. moschata, and C. maxima, have undergone genome-doubling events [38,39,40]. There are few estimates of the size of the Cucurbita genome; the available studies indicate a relatively small genome. The estimated genome sizes of C. maxima, C. moschata, and C. pepo are 263.0 Mb, 271.40 Mb, and 269.90 Mb, respectively, and the corresponding numbers of genes are 32,076, 32,205, and 27,868 [38,39].
The main goal of this paper is to establish a connection between the formalism of superstatistics and the encoding of plant genome sequences. Using this approach, we investigated short-range correlations (SRCs) and fluctuations in the distribution of nucleotide chain lengths based on the temporal scale extraction method, which provides a detailed view of the statistical properties of the time series associated with exon sizes. From this analysis, we observed that local distributions and the local mean fluctuations ξ are well described by q-Gamma and inverse q-Gamma distributions, supporting the validity of the superstatistics concept in this context. Additionally, to determine which of these distributions offers the best description of the statistical behavior of the time series, we applied Bayesian analysis, which is known for its robustness in various fields. This method has been widely used in areas such as Cosmology [41,42,43,44,45,46], Biophysics [25,47,48,49], and Ecology [50], highlighting its effectiveness for model comparison.

2. Materials and Methods

The central premise of the superstatistical approach lies in describing a non-equilibrium heterogeneous system composed of subsystems subject to fluctuations in an intensive parameter. Each subsystem rapidly reaches local equilibrium, characterized by a relaxation time $\gamma^{-1}$, and maintains this state for a time interval $T$ before transitioning to a new value. The superstatistical description can be constructed through a stochastic differential equation of the Langevin type [1], given by
$$\frac{dx}{dt} = \gamma F(x) + \zeta L(t), \tag{1}$$
where $L(t)$ denotes the (white Gaussian) noise, $\gamma > 0$ is the friction constant, $\zeta$ is the noise intensity, and $F(x) = -dV(x)/dx$ is the drift force, with $V(x)$ the potential. Parameters $\gamma$ and $\zeta$ can fluctuate in this theoretical framework so that $\beta = \gamma/\zeta^{2}$ has a probability density function (PDF) given by $p(\beta)$.
Based on this, we obtain the conditional probability $f(x|\beta)$ as:
$$f(x|\beta) = \frac{1}{Z(\beta)} \exp\left[-\beta V(x)\right], \tag{2}$$
where $Z(\beta)$ is the normalization constant of $\exp\left[-\beta V(x)\right]$ for a given $\beta$. The marginal probability $p(x)$ is defined by:
$$p(x) = \int f(x|\beta)\, p(\beta)\, d\beta. \tag{3}$$
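As a brief illustration of how the marginalization over $\beta$ produces q-type statistics, consider the standard textbook case from the superstatistics literature (it is not specific to the exon data, and the symbols $\beta_0$ and $n$ are introduced here only for this example). If $\beta$ follows a gamma density with mean $\beta_0$ and shape $n/2$, integrating the Boltzmann factor over $\beta$ gives a q-exponential:

$$p(\beta) = \frac{1}{\Gamma(n/2)} \left(\frac{n}{2\beta_0}\right)^{n/2} \beta^{n/2-1} \exp\left(-\frac{n\beta}{2\beta_0}\right),$$

$$\int_0^{\infty} e^{-\beta E}\, p(\beta)\, d\beta = \left(1 + \frac{2\beta_0 E}{n}\right)^{-n/2} = \left[1 + (q-1)\beta_0 E\right]^{-1/(q-1)}, \qquad q = 1 + \frac{2}{n}.$$

The limit $n \to \infty$ (no fluctuations in $\beta$) gives $q \to 1$ and recovers the ordinary Boltzmann factor $e^{-\beta_0 E}$.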
In our analysis, to test the superstatistical formalism, it is necessary to extract two time scales, $\gamma^{-1}$ and $T$. Initially, we constructed a set of time series using the complete genomic data of five species from the Cucurbitaceae family: C. melo, C. sativus, C. maxima, C. moschata, and C. pepo, corresponding to the exon lengths along the DNA sequence:
$$l = l(t), \tag{4}$$
where the time series is defined as random variables $l_1, \ldots, l_i, \ldots, l_n$ indexed in discrete time $t_1, \ldots, t_i, \ldots, t_n$, with the subscript $i = 1, \ldots, n$. Thus, the first value $t_1$ can be interpreted as the position of the first exon of length $l_1$ on the DNA strand. Similarly, $t_2$ represents the position of the second exon of length $l_2$. Finally, the $n$-th exon of length $l_n$ will have the position $t_n$. We utilized the publicly available database provided by the National Center for Biotechnology Information (NCBI) [51]. The time series for chromosome 01 of the C. melo species is illustrated in Figure 1a.
To determine statistical information for the time series, we constructed the probability density $p(l)$, as depicted in Figure 1b, using a linear bin scale. In this method, the bin sizes for the probability density are incremented using the relation $b_i = 25 \times i$, where $b_i$ represents the size of the $i$-th bin for the probability density, with $i = 1, 2, 3, \ldots, n$. This approach allowed us to smooth the fluctuations in the size distributions of exons while preserving the overall shape of the distribution curve.
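The binning step can be sketched as follows. This is a minimal illustration that assumes the relation $b_i = 25 \times i$ defines bin edges spaced 25 bp apart; the function name `exon_size_density` and the sample lengths are hypothetical and are not taken from the NCBI data.

```python
from collections import Counter

def exon_size_density(lengths, bin_width=25):
    """Normalized histogram of exon lengths with equal-width bins.

    Returns (bin_start, density) pairs for the non-empty bins; the
    densities multiplied by the bin width sum to 1.
    """
    counts = Counter(int(l // bin_width) for l in lengths)
    n = len(lengths)
    return [(k * bin_width, c / (n * bin_width))
            for k, c in sorted(counts.items())]

# Hypothetical exon lengths in bp (illustrative only).
lengths = [12, 48, 95, 101, 130, 133, 160, 210, 640]
density = exon_size_density(lengths)
```

Each pair in `density` gives the left edge of a bin and the estimated probability density in that bin, which is the quantity plotted against exon size.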
To ascertain the time scales $\gamma^{-1}$ and $T$, we adopted an approach that analyzes the exponential decay of the autocorrelation function of a time series [4]. In this method, the autocorrelation function is modeled as a combination of two exponential decay functions, as illustrated in the following equation:
$$C(\tau) = a \exp(-\tau/t_1) + b \exp(-\tau/t_2). \tag{5}$$
Here, parameter $a$ indicates the maximum amplitude of the autocorrelation function, and parameter $b$ is related to the initial decay rate of the autocorrelation function, with $t_1$ related to the time scale $\gamma^{-1}$, $t_2$ related to the time scale $T$, and $\tau$ the lag time. Therefore, the time scale $\gamma^{-1}$ is compared to the time scale $T$ through the ratio $t_2/t_1$. The autocorrelation function plot for chromosome 01 of the C. melo species is presented in Figure 2a. In this case, the obtained time scales are $t_1 = \gamma^{-1} \approx 1.27$ and $t_2 = T \approx 121$. This demonstrates the existence of two well-separated time scales, with the ratio $t_2/t_1 \approx 95$.
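A minimal sketch of the two ingredients of this step is given below: the sample autocorrelation of a series and the two-exponential model, read here as $C(\tau) = a\,e^{-\tau/t_1} + b\,e^{-\tau/t_2}$. The function names are hypothetical, and the fitting itself (for example by nonlinear least squares) is not shown. The parameter values are those listed for chromosome 01 of C. melo in Table A1.

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelation C(tau) for tau = 0..max_lag, with C(0) = 1."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    acf = []
    for tau in range(max_lag + 1):
        cov = sum((series[i] - mean) * (series[i + tau] - mean)
                  for i in range(n - tau))
        acf.append(cov / var)
    return acf

def double_exponential(tau, a, t1, b, t2):
    """Two-scale model: fast decay on t1 plus slow decay on t2."""
    return a * math.exp(-tau / t1) + b * math.exp(-tau / t2)

# Values listed for chromosome 01 of C. melo in Table A1.
a, t1, b, t2 = 0.9854, 1.27, 0.0081, 120.50
model = [double_exponential(tau, a, t1, b, t2) for tau in range(5)]
ratio = t2 / t1  # separation between the slow and fast time scales
```

A large `ratio` means the slow scale $T$ is much longer than the relaxation time $\gamma^{-1}$, which is the condition under which a local-equilibrium description within windows of size $T$ is reasonable.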
When analyzing the time series presented in Figure 1a and windows of size T, we observed that the local distribution can be modeled by two distinct distributions: the gamma distribution and the inverse gamma distribution, as depicted in Figure 2b. The following probability density function characterizes the gamma distribution:
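$$f_G(l) = \frac{1}{\Gamma(k)} \left(\frac{k}{\xi}\right)^{k} l^{k-1} \exp\left(-\frac{k}{\xi}\, l\right), \tag{6}$$
and the inverse gamma distribution is described by the following equation:
$$f_{IG}(l) = \frac{\alpha^{\alpha+1}}{\xi\, \Gamma(\alpha+1)} \left(\frac{l}{\xi}\right)^{-\alpha-2} \exp\left(-\frac{\alpha \xi}{l}\right), \tag{7}$$
where $k$, $\xi$, and $\alpha$ are adjustment parameters, all of which are positive real values.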
Notably, the other chromosomes of the examined species had patterns similar to those reported in chromosome 01 of the C. melo species. To obtain more details about the temporal durations of these chromosomes, kindly see the provided Table A1 and Table A2 (please see Appendix A).
It is essential to note that our present approach follows the timescale extraction method and is, therefore, different from the one used in Ref. [25], which is based on the Fokker–Planck and Langevin equations, to obtain the q-Gamma and inverse q-Gamma distributions. The choice of the distribution $p(\xi)$ is a crucial step in the superstatistical formalism. To determine this distribution, we followed the method described in Ref. [52] for the local mean $\xi$. We employed the local variance for the gamma and inverse gamma distributions to ascertain the behavior of $\xi(t)$. For the gamma distribution, the local variance is expressed as
$$\xi_G(t_0) \equiv \frac{1}{\langle l^{2} \rangle_{t_0,T} - \langle l \rangle_{t_0,T}^{2}}, \tag{8}$$
while for the inverse gamma distribution, the local variance is given by
$$\xi_{IG}(t_0) \equiv \langle l^{2} \rangle_{t_0,T} - \langle l \rangle_{t_0,T}^{2}. \tag{9}$$
Here, $\langle \cdot \rangle_{t_0,\Delta t} = \frac{1}{\Delta t} \int_{t_0}^{t_0+\Delta t} \cdot \; dt$ represents integration over an interval of size $\Delta t$ starting at $t_0$. In Figure 3a,c, we depict the behavior of the time series for $\xi(t)$ for chromosome 01 of the C. melo species, obtained through the application of Equations (8) and (9). Subsequently, we constructed the probability density histogram for $p(\xi)$, as shown in Figure 3b,d. For this analysis, we compared the histogram of the local variance of the gamma distribution, Figure 3b, with the inverse gamma distribution, which has parameters $\mu = 15.3237$ and $\omega = 3989.4215$. Regarding the local variance of the inverse gamma distribution, Figure 3d, we compared it with the gamma distribution that has parameters $\delta = 18.3461$ and $\omega = 18.0439$, defined as:
$$p(\xi; \delta, \omega) = \frac{1}{\omega^{\delta}\, \Gamma(\delta)}\, \xi^{\delta-1} \exp\left(-\xi/\omega\right), \tag{10}$$
and
$$p(\xi; \mu, \omega) = \frac{\omega^{\mu}}{\Gamma(\mu)}\, \xi^{-\mu-1} \exp\left(-\omega/\xi\right). \tag{11}$$
The fittings of the gamma and inverse gamma distributions provided a meaningful description of the histogram data. These results suggest that both gamma-type and inverse gamma-type statistical distributions are suitable models for describing the data generated by our timescale extraction process. It is important to note that we performed this analysis for all species’ chromosomes.
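The windowed estimates of $\xi$ can be sketched as follows. This is a minimal illustration that reads Equation (9) as the variance of $l$ inside a window of length $T$ and Equation (8) as its reciprocal. The use of non-overlapping windows and the function name `local_xi` are assumptions made for this example, not details given in the text.

```python
def local_xi(series, window):
    """Windowed estimates of xi from a series of exon lengths.

    Splits the series into non-overlapping windows of length `window`
    and returns two lists (xi_G, xi_IG): xi_IG is the local variance
    <l^2> - <l>^2 in each window and xi_G is its reciprocal.
    Windows with zero variance are skipped.
    """
    xi_g, xi_ig = [], []
    for start in range(0, len(series) - window + 1, window):
        chunk = series[start:start + window]
        mean = sum(chunk) / window
        mean_sq = sum(x * x for x in chunk) / window
        var = mean_sq - mean * mean
        if var > 0:
            xi_g.append(1.0 / var)
            xi_ig.append(var)
    return xi_g, xi_ig

# Hypothetical short series with a window of four points.
xi_g, xi_ig = local_xi([1, 3, 1, 3, 2, 2, 2, 6], window=4)
```

The histograms of `xi_g` and `xi_ig` over all windows are the empirical counterparts of $p(\xi)$ that are compared with the inverse gamma and gamma densities.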
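Then, we proceeded to obtain the q-Gamma and inverse q-Gamma superstatistics. The mathematical details are given in Appendix B; here, we present only the resulting expressions. The inverse gamma superstatistics is described by the q-Gamma probability density function, which reads:
$$p_G(l) = A_G \left(\frac{l}{\sigma}\right)^{a} \exp_q\left(-\frac{l}{\sigma}\right), \tag{12}$$
where $\exp_q(x) = \left[1 + (1-q)\,x\right]^{1/(1-q)}$ is the q-exponential, and $A_G$ is defined as:
$$A_G = \frac{(q-1)^{a+1}\, \Gamma\left(\frac{1}{q-1}\right)}{\sigma\, \Gamma\left(\frac{1}{q-1} - a - 1\right) \Gamma(a+1)}. \tag{13}$$
Similarly, the gamma superstatistics is described by the inverse q-Gamma distribution (we call attention to the reciprocal, inverse nomenclature):
$$p_{IG}(l) = A_{IG} \left(\frac{l}{\sigma}\right)^{-\alpha-2} \exp_q\left(-\frac{\sigma}{l}\right), \tag{14}$$
and $A_{IG}$ is given by
$$A_{IG} = \frac{\Gamma\left(\frac{1}{q-1}\right)}{\sigma\, \Gamma(\alpha+1)\, \Gamma\left(\frac{1}{q-1} - \alpha - 1\right)} \left(\frac{1}{q-1}\right)^{-\alpha-1}. \tag{15}$$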
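As a numerical sanity check, the two densities can be coded directly and integrated to confirm that the normalization constants give unit area. The sketch below assumes the forms $p_G(l) = A_G (l/\sigma)^a \exp_q(-l/\sigma)$ and $p_{IG}(l) = A_{IG} (l/\sigma)^{-\alpha-2} \exp_q(-\sigma/l)$ with $\exp_q(x) = [1+(1-q)x]^{1/(1-q)}$. The function names and the test parameters ($a = \alpha = 1$, $\sigma = 1$, $q = 1.2$) are illustrative choices, not fitted values.

```python
import math

def exp_q(x, q):
    """Tsallis q-exponential [1 + (1 - q) x]^(1 / (1 - q)), for q != 1."""
    base = 1.0 + (1.0 - q) * x
    return base ** (1.0 / (1.0 - q)) if base > 0 else 0.0

def _log_norm(shape, sigma, q):
    """Logarithm of the normalization constant shared by both densities."""
    n = 1.0 / (q - 1.0)
    return ((shape + 1.0) * math.log(q - 1.0) + math.lgamma(n)
            - math.log(sigma) - math.lgamma(n - shape - 1.0)
            - math.lgamma(shape + 1.0))

def q_gamma_pdf(l, a, sigma, q):
    """q-Gamma density; requires 1 < q < 1 + 1/(a + 1)."""
    return (math.exp(_log_norm(a, sigma, q))
            * (l / sigma) ** a * exp_q(-l / sigma, q))

def inv_q_gamma_pdf(l, alpha, sigma, q):
    """Inverse q-Gamma density; requires 1 < q < 1 + 1/(alpha + 1)."""
    return (math.exp(_log_norm(alpha, sigma, q))
            * (l / sigma) ** (-alpha - 2.0) * exp_q(-sigma / l, q))

def integrate(f, lo, hi, steps):
    """Midpoint-rule integral of f on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

area_g = integrate(lambda l: q_gamma_pdf(l, 1.0, 1.0, 1.2), 0.0, 400.0, 40000)
area_ig = integrate(lambda l: inv_q_gamma_pdf(l, 1.0, 1.0, 1.2), 0.0, 400.0, 40000)
```

Both areas should come out close to 1, up to the truncation of the power-law tails at the upper integration limit. The constraint $q < 1 + 1/(a+1)$ is what keeps the gamma-function argument $1/(q-1) - a - 1$ positive.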

3. Results and Discussion

To capture short-range characteristics in the exon size distributions, we used the q-Gamma and inverse q-Gamma distributions described by Equations (12) and (14) presented in the previous section, fitting them to the data using the Levenberg–Marquardt (LM) algorithm [53]. The results for the Cucumis and Cucurbita genera are presented in Table A3, Table A4, Table A5, Table A6 and Table A7.
Figure 4a,b illustrate the distributions for chromosomes 08 and 11 in the C. melo species, while Figure 4c,d show the distributions for chromosomes 02 and 07 in the C. sativus species. An analysis of Figure 4 reveals that the peak of the distributions in coding regions is around $10^{2}$ base pairs (bp), which is consistent with previous findings in the literature for higher eukaryotes [25]. This pattern is persistent across all chromosomes of the studied species within the Cucurbitaceae family. Within the genus Cucumis, C. melo (Figure 4a,b) presents small discrepancies between the theoretical q-Gamma (blue curve) and inverse q-Gamma (red curve) distributions and the probability distribution observed from the data (black dots) for values below 50 bp. The same behavior was reported by Costa et al. (2022) [25] in the context of the Homo sapiens genome. However, for the species C. sativus (Figure 4c,d), we did not observe this discrepancy, and the proposed distributions provide a very accurate description of the probability density of the data.
Figure 5a–f shows how the theoretical probability density functions fit the probability density curves observed for some chromosomes of the species C. maxima, C. moschata, and C. pepo, with similar behavior being observed for the other chromosomes. For these species, we can also observe a slight discrepancy between the data (black dots) and the theoretical q-Gamma (blue line) and inverse q-Gamma (red line) distributions. It is worth stressing that these discrepancies arise from differences between the behavior of the theoretical distributions and that of the observed data. Our intention here is to identify which of the distributions is most suitable for explaining the data, so these discrepancies certainly influence the effectiveness of the evaluated model. However, they do not appear to compromise the quality of the fit to the data.
This work uses Bayesian inference to compare the q-Gamma and inverse q-Gamma distributions. Bayesian inference is a technique for updating knowledge about the parameters of a model based on new information. This process relates the dataset $D$ to a probabilistic model for a given distribution, $L(D|\theta)$, known as the likelihood function [54,55], conditioned on the free parameter set $\theta$. Our understanding of $\theta$ is quantified by the prior distribution, $f(\theta)$. These functions are connected through Bayes' theorem:
$$P(\theta|D) = \frac{L(D|\theta)\, f(\theta)}{\int L(D|\theta)\, f(\theta)\, d\theta}. \tag{16}$$
The primary method used in Bayesian inference to decide which model better quantifies the data is to determine the marginal likelihood function. The marginal likelihood is obtained by integrating the likelihood function, weighted by the prior, over the parameter space, and it can be expressed as:
$$\epsilon(D) = \int L(D|\theta)\, f(\theta)\, d\theta. \tag{17}$$
Therefore, to perform Bayesian analysis, we will consider that the genomes for the Cucurbitaceae family are described by the likelihood functions for the q-Gamma distribution,
$$L_G(l) = \prod_{i=1}^{n} p_G(l_i) = \prod_{i=1}^{n} A_G \left(\frac{l_i}{\sigma}\right)^{a} \exp_q\left(-\frac{l_i}{\sigma}\right), \tag{18}$$
and the inverse q-Gamma distribution, given by
$$L_{IG}(l) = \prod_{i=1}^{n} p_{IG}(l_i) = \prod_{i=1}^{n} A_{IG} \left(\frac{l_i}{\sigma}\right)^{-\alpha-2} \exp_q\left(-\frac{\sigma}{l_i}\right). \tag{19}$$
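In practice, a product of densities such as the q-Gamma likelihood is evaluated as a sum of logarithms to avoid numerical underflow. The sketch below shows one way to do this, assuming the q-Gamma density $A_G (l/\sigma)^a \exp_q(-l/\sigma)$ with $A_G$ fixed by normalization. The function name `q_gamma_loglike` and the choice to return negative infinity outside the valid parameter region are assumptions of this example.

```python
import math

def q_gamma_loglike(lengths, a, sigma, q):
    """Log-likelihood sum_i ln p_G(l_i) for the q-Gamma density.

    Returns -inf when the parameters fall outside the region where the
    density is normalizable (q > 1, sigma > 0, a > -1, 1/(q-1) > a + 1).
    """
    if q <= 1.0 or sigma <= 0.0 or a <= -1.0:
        return -math.inf
    n = 1.0 / (q - 1.0)
    if n - a - 1.0 <= 0.0:
        return -math.inf
    log_norm = ((a + 1.0) * math.log(q - 1.0) + math.lgamma(n)
                - math.log(sigma) - math.lgamma(n - a - 1.0)
                - math.lgamma(a + 1.0))
    total = 0.0
    for l in lengths:
        x = l / sigma
        # ln exp_q(-x) = -(1/(q-1)) * ln(1 + (q-1) x)
        total += log_norm + a * math.log(x) - n * math.log1p((q - 1.0) * x)
    return total

# Single hypothetical observation, for a quick hand check.
value = q_gamma_loglike([1.0], a=1.0, sigma=1.0, q=1.2)
```

For the single observation above, the result equals $\ln(0.48 / 1.2^{5})$, which can be checked by hand from the normalization constant. The inverse q-Gamma log-likelihood follows the same pattern with the exponent $-\alpha-2$ and the argument $\sigma/l_i$.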
Thus, when comparing the two models, we used the Bayes factor, defined as the ratio of the marginal likelihood functions:
$$B_{1,2} = \frac{\epsilon_1}{\epsilon_2}, \tag{20}$$
where $\epsilon_1$ is the marginal likelihood for the q-Gamma distribution and $\epsilon_2$ is the marginal likelihood for the inverse q-Gamma distribution.
To quantify the strength of the evidence, we used Jeffreys' scale of evidence for the Bayes factor [56], presented in Table 1. This table gives an empirically calibrated scale for values of $\ln(B_{1,2})$. Since $B_{1,2} = \epsilon_1/\epsilon_2$, $\ln(B_{1,2}) > 1$ indicates favorable evidence for the q-Gamma distribution; $\ln(B_{1,2}) < -1$ indicates favorable evidence for the inverse q-Gamma distribution; and $-1 \leq \ln(B_{1,2}) \leq 1$ represents inconclusive evidence, making it impossible to determine which model best describes the data set. This work uses the inverse q-Gamma distribution as the reference model.
To determine the Bayesian evidence associated with the theoretical probability distributions studied, whether Equation (12) (q-Gamma) or Equation (14) (inverse q-Gamma), we used the Ultranest [57] package, a Bayesian analysis tool that employs the concept of nested sampling to calculate Bayesian evidence and posterior distributions [58,59,60]. We constructed scatter plots to visualize the relationships between the parameters of the proposed theoretical models. Figure 6 shows the interaction between the parameters of Equation (12) (panel a) and Equation (14) (panel b), for the species C. melo. The same behavior can be observed for all species studied (see Figure A3 in Appendix A).
The prior distribution encodes what is known about the adjustable parameters of the model under study. This phase of the analysis is therefore critical, since it sets the region in which the search for the optimal parameters takes place and can directly affect the model's predictive capacity. When little prior knowledge is available, priors can be chosen in several ways. A uniform prior specifies minimum and maximum values for each parameter and assigns every value between them an equal chance of being the one that best describes the data. A normal (Gaussian) prior requires the most likely value of the parameter (the Gaussian peak) and a standard deviation, which together determine the range explored by the algorithm; in this case, the probability of finding the optimal parameter decreases with increasing distance from the given mean. Other prior distributions can also be used; see [61]. In this study, we used normal distributions as priors. The values determined for each parameter of the investigated models are detailed in Table 2, and Figure 6 shows the distributions of the parameters in question.
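Nested-sampling codes typically receive the prior as a transform from the unit hypercube to parameter space. The sketch below illustrates this for Gaussian and uniform priors using only the Python standard library. The prior centers and widths in `PRIORS` are hypothetical placeholders (the values actually used are those of Table 2), and the function names are illustrative rather than part of any package.

```python
from statistics import NormalDist

# Hypothetical (mean, standard deviation) pairs for the q-Gamma parameters.
PRIORS = {"a": (3.0, 0.15), "sigma": (12.0, 0.6), "q": (1.17, 0.05)}

def gaussian_prior_transform(cube):
    """Map u in (0, 1)^3 to (a, sigma, q) via the Gaussian inverse CDF."""
    return [NormalDist(mu, sd).inv_cdf(u)
            for u, (mu, sd) in zip(cube, PRIORS.values())]

def uniform_prior_transform(cube, bounds):
    """Map u in [0, 1]^d to parameters with flat priors on (lo, hi) ranges."""
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(cube, bounds)]

center = gaussian_prior_transform([0.5, 0.5, 0.5])
edges = uniform_prior_transform([0.0, 1.0, 0.5], [(0.0, 10.0)] * 3)
```

The midpoint of the unit cube maps to the prior means in the Gaussian case, and values of `u` near 0 or 1 map to the tails, so parameter values far from the mean are proposed less often. In the uniform case every value inside the bounds is equally likely.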
The next stage of the analysis evaluates how well the models match the observational data, using the Bayes factor to compare the Bayesian evidence of the two theoretical models. Taking the logarithm of both sides of Equation (20), we obtain $\ln B_{1,2} = \ln \epsilon_1 - \ln \epsilon_2$. Consequently, all that is left to do is compute the difference between the log-evidences of the models and apply the interpretation in Table 1. The Ultranest package provides the logarithm of the evidence for each model. Table 3 shows the results for all chromosomes of C. melo and C. sativus. All other species showed similar behavior (see Table A8).
For chromosome 1 of the species C. melo, we computed the difference between the log-evidences $\ln \epsilon_1$ and $\ln \epsilon_2$ (see Table 3), which yielded $\ln(B_{1,2}) = 0.065 \pm 0.022$. Based on Jeffreys' interpretation of the Bayes factor, the evidence is inconclusive: both models explain the behavior of the data equally well, and neither is statistically preferred. Similar behavior was observed for the other chromosomes of C. melo as well as those of C. sativus, C. maxima, C. moschata, and C. pepo. Therefore, our results indicate that both the q-Gamma and inverse q-Gamma distributions can be used to study the exon length distributions of the species analyzed here.
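The comparison step reduces to a subtraction of log-evidences followed by a threshold test. A minimal sketch is given below, assuming the simplified three-way reading of Jeffreys' scale with a threshold of 1 on $|\ln B_{1,2}|$. Combining the two evidence uncertainties in quadrature assumes independent sampling runs, and the function names are hypothetical.

```python
import math

def log_bayes_factor(ln_ev1, err1, ln_ev2, err2):
    """Return ln B_12 = ln e1 - ln e2 and its uncertainty.

    Uncertainties are added in quadrature, which assumes the two
    evidence estimates come from independent runs.
    """
    return ln_ev1 - ln_ev2, math.hypot(err1, err2)

def jeffreys_verdict(ln_b, threshold=1.0):
    """Classify ln B_12; model 1 is the numerator of the Bayes factor."""
    if ln_b > threshold:
        return "favors model 1"
    if ln_b < -threshold:
        return "favors model 2"
    return "inconclusive"

# Value quoted in the text for chromosome 1 of C. melo.
verdict = jeffreys_verdict(0.065)
```

With $\ln(B_{1,2}) = 0.065$ the verdict is "inconclusive", since the value lies well inside the interval $[-1, 1]$.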

4. Conclusions

In this article, the length distributions of exon sequences of five species of the Cucurbitaceae family are analyzed: C. melo, C. sativus, C. maxima, C. moschata, and C. pepo. To perform this analysis, we implemented the timescale extraction method, which provided the distributions of exon sizes by considering the time series’ statistical properties. Based on the local distribution and fluctuations in the intensive parameter ξ , it was possible to show that the exon size distributions followed the q-Gamma and inverse q-Gamma distributions. Both distributions were obtained by analyzing data windows proportionate in size to the time scale T.
We used each chromosome belonging to the species studied in this research, constructed a time series from the datasets, and performed analyses. Specifically, we examined the exon size distributions, expressed in base pairs (bp), using the q-Gamma and inverse q-Gamma functions. As reported by Costa et al. [25], these functions also exhibited a range in which they deviated from the data when the values were small, below 50 bp. These distributions were able to capture low-order correlations in all investigated species.
The two probability distributions proposed in this study naturally incorporate the entropic index $q$, which, in the statistical generalization proposed by Tsallis, quantifies the correlations present in the system under investigation. When $q \to 1$, conventional statistics is recovered and each part of the system is independent [62]. For the genus Cucumis, the studied species exhibited an average $q_G$ of $1.1726 \pm 0.0064$ for the q-Gamma distribution and an average $q_{IG}$ of $1.0919 \pm 0.0032$ for the inverse q-Gamma distribution. In the case of the genus Cucurbita, the average $q_G$ for the q-Gamma distribution was $1.1727 \pm 0.0071$, while the average $q_{IG}$ for the inverse q-Gamma distribution was $1.1316 \pm 0.0216$. These results are consistent with short-range correlations, as described in the literature for the coding part of the genome of higher eukaryotes [25,47,49].
Still in the context of plant genome lengths, our results point to a well-defined value of the entropic index $q$ across all chromosomes and all species studied, which may indicate a common biological signature among plants. Similar behavior has already been observed for plant, animal, and viral genomes in other generalized statistical contexts, for sequences of exons [47,63], introns [48], and proteins [64,65].
We implemented a Bayesian inference selection method to compare the proposed distributions, q-Gamma and inverse q-Gamma. In Figure 6 and Figure A3, we present triangle plots constructed with confidence regions for the parameters. Based on the data provided in Table 3 and Table A8, it is evident that the q-Gamma and inverse q-Gamma distributions share equal statistical weight in modeling the size distribution for the investigated species within the Cucurbitaceae family. It is worth noting that the investigated distributions emerge naturally from the proposed model; moreover, when we used other priors, the results differed from those presented here.

Author Contributions

M.O.C.: Methodology, writing—original draft preparation, software, visualization, investigation. R.S. conceptualization, methodology, writing—original draft preparation, writing—reviewing and editing. M.M.F.d.L. visualization, writing—original draft, writing—reviewing and editing. D.H.A.L.A. validation, writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The DNA code data that support the results of this study are available at the National Center for Biotechnology Information—NCBI [51] (https://www.ncbi.nlm.nih.gov/, accessed on 22 October 2023).

Acknowledgments

This study was financed in part by CNPq (Conselho Nacional de Desenvolvimento Científico e tecnológico), and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES). R. Silva thanks CNPq (Grant No. 307620/2019-0) for financial support. D. H. A. L. Anselmo acknowledges CNPq for financial support (CNPq grant No. 317464/2021-3). M. M. F. Lima acknowledges CNPq for financial support (CNPq grant No. 152681/2024-8).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

In the main text, to determine the time scales $\gamma^{-1}$ and $T$, we analyzed the exponential decay of the autocorrelation function of a time series, modeling it as a combination of two exponential functions (Equation (5)). Here, $t_1$ is related to the scale $\gamma^{-1}$ and $t_2$ to the scale $T$, with $\tau$ being the lag time. In chromosome 01 of C. melo, we obtained $t_1 \approx 1.27$ and $t_2 \approx 121$, showing two time scales with the ratio $t_2/t_1 \approx 95$. The additional chromosomes of this species and every other species included in the research are detailed in Table A1 and Table A2.
Table A1. The autocorrelation analysis described by Equation (5) applied to the C. melo and C. sativus species.
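| CHR | a (C. melo) | t_1 | b | t_2 | T | t_2/t_1 | a (C. sativus) | t_1 | b | t_2 | T | t_2/t_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.9854 | 1.27 | 0.0081 | 120.50 | 121 | 95.28 | 0.9940 | 1.72 | 0.0039 | 249.70 | 250 | 145.35 |
| 2 | 0.9933 | 1.38 | 0.0029 | 183.25 | 183 | 132.32 | 0.9857 | 1.60 | 0.0116 | 160.38 | 160 | 100.31 |
| 3 | 0.9839 | 1.27 | 0.0106 | 116.64 | 117 | 92.13 | 0.9941 | 1.50 | 0.0027 | 156.62 | 157 | 104.74 |
| 4 | 0.9865 | 1.25 | 0.0052 | 145.34 | 145 | 116.28 | 0.9932 | 1.64 | 0.0043 | 138.75 | 139 | 84.60 |
| 5 | 0.9877 | 1.34 | 0.0072 | 179.80 | 180 | 133.93 | 0.9898 | 1.54 | 0.0065 | 123.02 | 123 | 79.97 |
| 6 | 0.9868 | 1.22 | 0.0054 | 137.93 | 138 | 113.21 | 0.9908 | 1.58 | 0.0068 | 114.65 | 115 | 72.60 |
| 7 | 0.9863 | 1.16 | 0.0031 | 173.42 | 173 | 149.27 | 0.9948 | 1.64 | 0.0034 | 155.60 | 156 | 95.30 |
| 8 | 0.9858 | 1.32 | 0.0081 | 122.67 | 123 | 93.04 | - | - | - | - | - | - |
| 9 | 0.9905 | 1.30 | 0.0030 | 168.08 | 168 | 129.23 | - | - | - | - | - | - |
| 10 | 0.9940 | 1.22 | 0.0008 | 280.17 | 280 | 230.45 | - | - | - | - | - | - |
| 11 | 0.9869 | 1.23 | 0.0067 | 154.90 | 155 | 126.32 | - | - | - | - | - | - |
| 12 | 0.9886 | 1.27 | 0.0040 | 160.76 | 161 | 126.67 | - | - | - | - | - | - |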
Table A2. The autocorrelation analysis described by Equation (5) applied to the C. maxima, C. moschata, and C. pepo species.
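| CHR | a (C. maxima) | t_1 | b | t_2 | T | t_2/t_1 | a (C. moschata) | t_1 | b | t_2 | T | t_2/t_1 | a (C. pepo) | t_1 | b | t_2 | T | t_2/t_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.9838 | 2.10 | 0.0151 | 87.18 | 87 | 41.45 | 0.9870 | 1.68 | 0.0127 | 66.08 | 66 | 39.29 | 0.9637 | 1.79 | 0.0345 | 144.16 | 144 | 80.27 |
| 2 | 1.0010 | 1.62 | −0.0025 | 120.33 | 120 | 74.17 | 0.9835 | 1.62 | 0.0155 | 63.86 | 64 | 39.51 | 0.9690 | 1.50 | 0.0272 | 57.78 | 58 | 38.67 |
| 3 | 0.9876 | 1.72 | 0.0111 | 68.62 | 69 | 40.05 | 0.9951 | 1.50 | 0.0035 | 81.42 | 81 | 53.96 | 0.8891 | 1.54 | 0.1083 | 74.48 | 74 | 48.08 |
| 4 | 0.9975 | 1.94 | 0.0021 | 84.08 | 84 | 43.23 | 0.9891 | 2.09 | 0.0108 | 65.49 | 65 | 31.04 | 0.9620 | 1.94 | 0.0365 | 79.26 | 79 | 40.81 |
| 5 | 0.9943 | 2.03 | 0.0049 | 76.55 | 77 | 37.88 | 0.9970 | 1.82 | 0.0016 | 89.11 | 89 | 48.98 | 0.9893 | 1.79 | 0.0099 | 70.03 | 70 | 39.17 |
| 6 | 0.9960 | 1.84 | 0.0033 | 82.88 | 83 | 45.18 | 1.0050 | 1.43 | −0.0054 | 80.81 | 81 | 56.68 | 0.9605 | 1.25 | 0.0360 | 56.63 | 57 | 45.56 |
| 7 | 0.9834 | 1.66 | 0.0155 | 66.41 | 66 | 39.66 | 0.9821 | 1.25 | 0.0095 | 71.70 | 72 | 57.78 | 0.9467 | 1.04 | 0.0326 | 57.26 | 57 | 55.07 |
| 8 | 0.9891 | 1.74 | 0.0097 | 71.66 | 72 | 41.45 | 0.9994 | 1.54 | −0.0028 | 120.01 | 120 | 77.82 | 0.9228 | 0.90 | −0.0143 | 76.19 | 76 | 84.44 |
| 9 | 0.9897 | 2.12 | 0.0099 | 70.76 | 71 | 33.52 | 1.0010 | 1.83 | −0.0022 | 119.90 | 120 | 65.43 | 0.9496 | 1.97 | 0.0487 | 54.24 | 54 | 27.44 |
| 10 | 0.9925 | 1.95 | 0.0066 | 73.13 | 73 | 37.49 | 0.9845 | 1.62 | 0.0155 | 65.15 | 65 | 40.07 | 0.9903 | 1.64 | 0.0062 | 85.31 | 85 | 51.99 |
| 11 | 0.9945 | 2.17 | 0.0043 | 79.83 | 80 | 36.88 | 0.9927 | 1.70 | 0.0036 | 79.58 | 80 | 47.00 | 0.9424 | 1.60 | 0.0545 | 61.27 | 61 | 38.10 |
| 12 | 0.9900 | 1.29 | 0.0014 | 86.92 | 87 | 67.44 | 0.9832 | 1.60 | 0.0130 | 68.19 | 68 | 42.39 | 0.9696 | 1.78 | 0.0271 | 57.90 | 58 | 32.60 |
| 13 | 0.9889 | 1.78 | 0.0080 | 71.82 | 72 | 40.36 | 0.9233 | 0.90 | −0.0154 | 67.00 | 67 | 74.44 | 0.9874 | 1.62 | 0.0079 | 72.82 | 73 | 44.95 |
| 14 | 0.9924 | 1.78 | 0.0073 | 72.33 | 72 | 40.40 | 0.9897 | 1.73 | 0.0099 | 68.65 | 69 | 39.93 | 0.9579 | 1.82 | 0.0421 | 54.78 | 55 | 30.22 |
| 15 | 0.9946 | 2.25 | 0.0062 | 75.28 | 75 | 33.27 | 0.9835 | 2.14 | 0.0157 | 63.94 | 64 | 29.85 | 0.9917 | 1.74 | 0.0061 | 74.88 | 75 | 43.23 |
| 16 | 0.7720 | 1.99 | 0.0217 | 60.62 | 61 | 30.61 | 0.9929 | 1.98 | 0.0075 | 71.98 | 72 | 36.38 | 0.9808 | 1.52 | 0.0126 | 68.38 | 68 | 44.65 |
| 17 | 0.9868 | 2.14 | 0.0146 | 63.51 | 64 | 29.88 | 0.9689 | 1.90 | 0.0300 | 120.40 | 120 | 63.26 | 0.9879 | 1.45 | 0.0072 | 120.03 | 120 | 82.99 |
| 18 | 0.9853 | 2.15 | 0.0143 | 65.26 | 65 | 30.30 | 0.9802 | 1.84 | 0.0202 | 93.33 | 93 | 50.43 | 0.9422 | 2.18 | 0.0571 | 54.74 | 55 | 25.23 |
| 19 | 0.9809 | 2.07 | 0.0188 | 61.13 | 61 | 29.44 | 0.9700 | 1.63 | 0.0272 | 71.94 | 72 | 44.20 | 0.9340 | 1.58 | 0.0640 | 53.82 | 54 | 34.13 |
| 20 | 0.9903 | 2.05 | 0.0102 | 70.69 | 71 | 34.58 | 0.9804 | 1.61 | 0.0161 | 65.08 | 65 | 40.40 | 0.9908 | 1.82 | 0.0076 | 75.82 | 76 | 41.80 |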
The optimal parameters found for the q-Gamma and inverse q-Gamma distributions via Equations (A7) and (A16), respectively, are listed in Table A3, Table A4, Table A5, Table A6 and Table A7 for all chromosomes of the five species studied. These values provide important information about the behavior of the observational data.
Figure A1 and Figure A2 present the quantile–quantile analysis applied to species belonging to the genera Cucumis and Cucurbita. Each subplot compares theoretical distributions, represented on the horizontal axis, and empirical distributions, observed in the data, represented on the vertical axis. In the context of the analyses performed, the data exhibit behavior that can be adequately modeled by q-Gamma and inverse q-Gamma distributions, represented by the blue and red curves, respectively. It highlights that both distributions are capable of reproducing the observed behavior.
Figure A3 shows the interaction between the parameters of Equations (12) and (14). It presents the projections of the posterior distributions for the free parameters of the distributions mentioned above for the genus Cucurbita. Panels (a,b) present the distributions for chromosomes 11 and 15 of the species C. maxima, respectively; (c,d) show the distributions for chromosomes 08 and 15 of the species C. moschata, respectively; and (e,f) show the distributions for chromosomes 04 and 06 of the species C. pepo, respectively. All other chromosomes present the same behavior.
Table A3. Best-fit values of the parameters $A_G$, $a_G$, $\sigma_G$, and $q_G$ for the q-Gamma distribution of Equation (12), and of $A_{IG}$, $\alpha_{IG}$, $\sigma_{IG}$, and $q_{IG}$ for the inverse q-Gamma distribution of Equation (14), applied to the C. melo species.
CHR | q-Gamma: $A_G$ | $a_G$ | $\sigma_G$ | $q_G$ | Inverse q-Gamma: $A_{IG}$ | $\alpha_{IG}$ | $\sigma_{IG}$ | $q_{IG}$
1 | 5.865 × 10⁻⁷ ± 2.881 × 10⁻⁸ | 2.982 ± 0.142 | 12.318 ± 0.620 | 1.172 ± 0.053 | 4.135 × 10³ ± 2.056 × 10² | 0.528 ± 0.026 | 243.701 ± 12.038 | 1.090 ± 0.057
2 | 5.809 × 10⁻⁷ ± 2.868 × 10⁻⁸ | 2.986 ± 0.149 | 12.344 ± 0.590 | 1.177 ± 0.053 | 4.526 × 10³ ± 2.242 × 10² | 0.524 ± 0.026 | 257.184 ± 12.575 | 1.091 ± 0.052
3 | 5.928 × 10⁻⁷ ± 2.861 × 10⁻⁸ | 2.975 ± 0.151 | 12.329 ± 0.608 | 1.169 ± 0.051 | 3.931 × 10³ ± 1.947 × 10² | 0.525 ± 0.026 | 235.742 ± 11.563 | 1.097 ± 0.053
4 | 5.806 × 10⁻⁷ ± 2.896 × 10⁻⁸ | 2.983 ± 0.145 | 12.352 ± 0.605 | 1.170 ± 0.056 | 4.389 × 10³ ± 2.166 × 10² | 0.525 ± 0.026 | 250.523 ± 12.558 | 1.091 ± 0.054
5 | 5.985 × 10⁻⁷ ± 3.022 × 10⁻⁸ | 2.984 ± 0.151 | 12.365 ± 0.580 | 1.165 ± 0.055 | 3.849 × 10³ ± 1.979 × 10² | 0.525 ± 0.026 | 232.731 ± 11.460 | 1.092 ± 0.054
6 | 5.929 × 10⁻⁷ ± 2.992 × 10⁻⁸ | 2.990 ± 0.149 | 12.296 ± 0.625 | 1.169 ± 0.053 | 3.819 × 10³ ± 1.863 × 10² | 0.524 ± 0.024 | 232.298 ± 11.564 | 1.092 ± 0.053
7 | 5.922 × 10⁻⁷ ± 2.853 × 10⁻⁸ | 2.986 ± 0.140 | 12.361 ± 0.622 | 1.172 ± 0.055 | 3.828 × 10³ ± 1.903 × 10² | 0.527 ± 0.025 | 232.823 ± 11.860 | 1.096 ± 0.057
8 | 5.959 × 10⁻⁷ ± 2.865 × 10⁻⁸ | 2.986 ± 0.143 | 12.319 ± 0.603 | 1.171 ± 0.054 | 3.899 × 10³ ± 1.999 × 10² | 0.526 ± 0.027 | 233.974 ± 11.753 | 1.083 ± 0.053
9 | 5.868 × 10⁻⁷ ± 2.902 × 10⁻⁸ | 2.989 ± 0.146 | 12.337 ± 0.626 | 1.171 ± 0.054 | 3.993 × 10³ ± 1.949 × 10² | 0.526 ± 0.027 | 239.356 ± 12.004 | 1.088 ± 0.054
10 | 5.906 × 10⁻⁷ ± 2.817 × 10⁻⁸ | 2.979 ± 0.139 | 12.320 ± 0.614 | 1.170 ± 0.055 | 3.739 × 10³ ± 1.844 × 10² | 0.524 ± 0.027 | 229.712 ± 11.908 | 1.094 ± 0.054
11 | 5.843 × 10⁻⁷ ± 2.860 × 10⁻⁸ | 2.983 ± 0.149 | 12.341 ± 0.631 | 1.170 ± 0.056 | 4.068 × 10³ ± 1.984 × 10² | 0.524 ± 0.026 | 241.484 ± 12.251 | 1.098 ± 0.055
12 | 5.836 × 10⁻⁷ ± 2.805 × 10⁻⁸ | 2.970 ± 0.139 | 12.325 ± 0.613 | 1.170 ± 0.055 | 4.145 × 10³ ± 2.126 × 10² | 0.527 ± 0.026 | 244.823 ± 12.665 | 1.093 ± 0.054
Table A4. The same procedures as in Table A3 were applied to the C. sativus species.
CHR | q-Gamma: $A_G$ | $a_G$ | $\sigma_G$ | $q_G$ | Inverse q-Gamma: $A_{IG}$ | $\alpha_{IG}$ | $\sigma_{IG}$ | $q_{IG}$
1 | 5.719 × 10⁻⁷ ± 2.955 × 10⁻⁸ | 2.990 ± 0.146 | 12.282 ± 0.563 | 1.175 ± 0.054 | 3.255 × 10³ ± 1.651 × 10² | 0.527 ± 0.026 | 234.521 ± 11.537 | 1.090 ± 0.054
2 | 5.714 × 10⁻⁷ ± 2.835 × 10⁻⁸ | 2.990 ± 0.142 | 12.286 ± 0.596 | 1.176 ± 0.054 | 3.554 × 10³ ± 1.777 × 10² | 0.525 ± 0.025 | 241.540 ± 12.020 | 1.089 ± 0.053
3 | 5.696 × 10⁻⁷ ± 2.869 × 10⁻⁸ | 2.986 ± 0.143 | 12.307 ± 0.612 | 1.176 ± 0.054 | 3.583 × 10³ ± 1.861 × 10² | 0.524 ± 0.025 | 244.520 ± 12.068 | 1.090 ± 0.055
4 | 5.735 × 10⁻⁷ ± 2.794 × 10⁻⁸ | 2.989 ± 0.146 | 12.277 ± 0.637 | 1.168 ± 0.053 | 3.252 × 10³ ± 1.650 × 10² | 0.525 ± 0.026 | 233.304 ± 11.927 | 1.092 ± 0.054
5 | 5.724 × 10⁻⁷ ± 2.756 × 10⁻⁸ | 3.003 ± 0.146 | 12.277 ± 0.592 | 1.169 ± 0.054 | 3.551 × 10³ ± 1.807 × 10² | 0.524 ± 0.026 | 243.650 ± 12.022 | 1.091 ± 0.053
6 | 5.716 × 10⁻⁷ ± 2.868 × 10⁻⁸ | 2.973 ± 0.151 | 12.290 ± 0.615 | 1.172 ± 0.056 | 3.269 × 10³ ± 1.652 × 10² | 0.524 ± 0.025 | 233.519 ± 11.283 | 1.091 ± 0.051
7 | 5.696 × 10⁻⁷ ± 2.894 × 10⁻⁸ | 2.983 ± 0.144 | 12.262 ± 0.617 | 1.179 ± 0.056 | 3.693 × 10³ ± 1.813 × 10² | 0.522 ± 0.026 | 245.726 ± 11.915 | 1.092 ± 0.055
Table A5. The same procedures as in Table A3 were applied to the C. maxima species.
CHR | q-Gamma: $A_G$ | $a_G$ | $\sigma_G$ | $q_G$ | Inverse q-Gamma: $A_{IG}$ | $\alpha_{IG}$ | $\sigma_{IG}$ | $q_{IG}$
1 | 1.456 × 10⁻⁶ ± 7.438 × 10⁻⁸ | 2.749 ± 0.144 | 13.207 ± 0.706 | 1.175 ± 0.055 | 3.361 × 10³ ± 1.596 × 10² | 0.527 ± 0.026 | 237.645 ± 11.970 | 1.158 ± 0.056
2 | 6.285 × 10⁻⁷ ± 3.048 × 10⁻⁸ | 2.984 ± 0.147 | 12.270 ± 0.621 | 1.169 ± 0.053 | 3.596 × 10³ ± 1.768 × 10² | 0.524 ± 0.026 | 254.482 ± 12.562 | 1.164 ± 0.058
3 | 6.220 × 10⁻⁷ ± 3.159 × 10⁻⁸ | 2.985 ± 0.144 | 12.289 ± 0.630 | 1.174 ± 0.053 | 3.001 × 10³ ± 1.612 × 10² | 0.523 ± 0.026 | 216.471 ± 10.948 | 1.137 ± 0.054
4 | 6.231 × 10⁻⁷ ± 3.137 × 10⁻⁸ | 2.983 ± 0.143 | 12.277 ± 0.614 | 1.174 ± 0.055 | 3.047 × 10³ ± 1.387 × 10² | 0.528 ± 0.025 | 215.570 ± 10.733 | 1.134 ± 0.056
5 | 1.468 × 10⁻⁶ ± 7.270 × 10⁻⁸ | 2.606 ± 0.133 | 18.068 ± 0.924 | 1.172 ± 0.057 | 3.569 × 10³ ± 1.833 × 10² | 0.525 ± 0.028 | 241.221 ± 12.369 | 1.148 ± 0.058
6 | 6.415 × 10⁻⁷ ± 3.254 × 10⁻⁸ | 2.974 ± 0.153 | 12.294 ± 0.614 | 1.173 ± 0.055 | 3.014 × 10³ ± 1.514 × 10² | 0.526 ± 0.026 | 208.958 ± 10.860 | 1.121 ± 0.055
7 | 6.243 × 10⁻⁷ ± 3.146 × 10⁻⁸ | 2.976 ± 0.146 | 12.302 ± 0.606 | 1.176 ± 0.055 | 2.959 × 10³ ± 1.476 × 10² | 0.527 ± 0.026 | 212.403 ± 11.015 | 1.124 ± 0.056
8 | 6.593 × 10⁻⁷ ± 3.283 × 10⁻⁸ | 2.988 ± 0.143 | 12.231 ± 0.601 | 1.170 ± 0.055 | 3.071 × 10³ ± 1.503 × 10² | 0.522 ± 0.026 | 208.611 ± 10.280 | 1.123 ± 0.055
9 | 6.020 × 10⁻⁷ ± 3.110 × 10⁻⁸ | 2.983 ± 0.148 | 12.290 ± 0.607 | 1.171 ± 0.053 | 3.607 × 10³ ± 1.795 × 10² | 0.525 ± 0.025 | 254.313 ± 12.365 | 1.163 ± 0.058
10 | 1.462 × 10⁻⁶ ± 7.095 × 10⁻⁸ | 2.696 ± 0.134 | 12.306 ± 0.622 | 1.208 ± 0.055 | 4.001 × 10³ ± 1.899 × 10² | 0.525 ± 0.025 | 278.209 ± 13.770 | 1.187 ± 0.060
11 | 6.142 × 10⁻⁷ ± 2.975 × 10⁻⁸ | 2.989 ± 0.149 | 12.294 ± 0.602 | 1.175 ± 0.054 | 2.945 × 10³ ± 1.462 × 10² | 0.526 ± 0.026 | 208.552 ± 10.454 | 1.121 ± 0.058
12 | 6.655 × 10⁻⁷ ± 3.264 × 10⁻⁸ | 2.986 ± 0.153 | 12.255 ± 0.634 | 1.171 ± 0.053 | 3.008 × 10³ ± 1.557 × 10² | 0.526 ± 0.027 | 209.127 ± 10.545 | 1.123 ± 0.057
13 | 5.967 × 10⁻⁷ ± 2.993 × 10⁻⁸ | 2.995 ± 0.147 | 12.296 ± 0.617 | 1.171 ± 0.055 | 3.068 × 10³ ± 1.532 × 10² | 0.527 ± 0.026 | 223.678 ± 10.903 | 1.138 ± 0.057
14 | 6.366 × 10⁻⁷ ± 3.207 × 10⁻⁸ | 2.979 ± 0.144 | 12.300 ± 0.598 | 1.169 ± 0.054 | 2.977 × 10³ ± 1.450 × 10² | 0.527 ± 0.027 | 209.614 ± 9.993 | 1.117 ± 0.056
15 | 5.948 × 10⁻⁷ ± 3.064 × 10⁻⁸ | 2.987 ± 0.149 | 12.315 ± 0.615 | 1.168 ± 0.055 | 2.922 × 10³ ± 1.413 × 10² | 0.527 ± 0.026 | 222.062 ± 10.873 | 1.154 ± 0.057
16 | 6.338 × 10⁻⁷ ± 3.098 × 10⁻⁸ | 2.977 ± 0.143 | 12.303 ± 0.627 | 1.173 ± 0.052 | 3.208 × 10³ ± 1.620 × 10² | 0.524 ± 0.026 | 224.222 ± 10.977 | 1.135 ± 0.057
17 | 6.049 × 10⁻⁷ ± 2.971 × 10⁻⁸ | 2.997 ± 0.147 | 12.274 ± 0.588 | 1.173 ± 0.053 | 3.547 × 10³ ± 1.751 × 10² | 0.525 ± 0.026 | 244.535 ± 11.554 | 1.156 ± 0.055
18 | 6.458 × 10⁻⁷ ± 3.237 × 10⁻⁸ | 2.975 ± 0.145 | 12.314 ± 0.620 | 1.173 ± 0.055 | 2.985 × 10³ ± 1.552 × 10² | 0.522 ± 0.025 | 209.307 ± 10.592 | 1.123 ± 0.054
19 | 5.852 × 10⁻⁷ ± 2.874 × 10⁻⁸ | 2.982 ± 0.143 | 12.312 ± 0.593 | 1.172 ± 0.054 | 3.192 × 10³ ± 1.585 × 10² | 0.524 ± 0.025 | 224.393 ± 11.033 | 1.094 ± 0.054
20 | 6.028 × 10⁻⁷ ± 2.983 × 10⁻⁸ | 2.994 ± 0.144 | 12.295 ± 0.614 | 1.167 ± 0.055 | 3.532 × 10³ ± 1.763 × 10² | 0.527 ± 0.025 | 248.378 ± 12.065 | 1.154 ± 0.056
Table A6. The same procedures as in Table A3 were applied to the C. moschata species.
CHR | q-Gamma: $A_G$ | $a_G$ | $\sigma_G$ | $q_G$ | Inverse q-Gamma: $A_{IG}$ | $\alpha_{IG}$ | $\sigma_{IG}$ | $q_{IG}$
1 | 6.508 × 10⁻⁷ ± 3.261 × 10⁻⁸ | 2.981 ± 0.146 | 12.318 ± 0.614 | 1.170 ± 0.055 | 3.014 × 10³ ± 1.528 × 10² | 0.525 ± 0.026 | 208.699 ± 10.713 | 1.117 ± 0.055
2 | 5.945 × 10⁻⁷ ± 3.004 × 10⁻⁸ | 2.983 ± 0.150 | 12.255 ± 0.627 | 1.171 ± 0.055 | 2.810 × 10³ ± 1.427 × 10² | 0.524 ± 0.025 | 209.805 ± 10.839 | 1.126 ± 0.055
3 | 6.775 × 10⁻⁷ ± 3.310 × 10⁻⁸ | 2.988 ± 0.147 | 12.267 ± 0.611 | 1.170 ± 0.052 | 3.367 × 10³ ± 1.706 × 10² | 0.524 ± 0.027 | 242.470 ± 12.013 | 1.159 ± 0.057
4 | 6.425 × 10⁻⁷ ± 3.298 × 10⁻⁸ | 2.992 ± 0.141 | 12.256 ± 0.638 | 1.170 ± 0.054 | 2.946 × 10³ ± 1.399 × 10² | 0.526 ± 0.026 | 209.261 ± 10.465 | 1.124 ± 0.054
5 | 6.306 × 10⁻⁷ ± 3.181 × 10⁻⁸ | 2.976 ± 0.147 | 12.276 ± 0.614 | 1.170 ± 0.055 | 3.332 × 10³ ± 1.636 × 10² | 0.523 ± 0.026 | 225.113 ± 11.021 | 1.127 ± 0.055
6 | 6.020 × 10⁻⁷ ± 2.931 × 10⁻⁸ | 2.992 ± 0.148 | 12.325 ± 0.631 | 1.174 ± 0.054 | 3.316 × 10³ ± 1.638 × 10² | 0.526 ± 0.026 | 241.424 ± 11.991 | 1.160 ± 0.057
7 | 6.456 × 10⁻⁷ ± 3.202 × 10⁻⁸ | 2.988 ± 0.146 | 12.295 ± 0.622 | 1.168 ± 0.052 | 3.138 × 10³ ± 1.619 × 10² | 0.528 ± 0.025 | 208.060 ± 10.257 | 1.116 ± 0.054
8 | 6.267 × 10⁻⁷ ± 3.003 × 10⁻⁸ | 2.985 ± 0.143 | 12.266 ± 0.629 | 1.171 ± 0.054 | 2.940 × 10³ ± 1.505 × 10² | 0.526 ± 0.026 | 209.010 ± 10.454 | 1.128 ± 0.055
9 | 5.722 × 10⁻⁷ ± 2.943 × 10⁻⁸ | 3.001 ± 0.146 | 12.290 ± 0.591 | 1.169 ± 0.052 | 3.329 × 10³ ± 1.640 × 10² | 0.526 ± 0.025 | 253.320 ± 12.660 | 1.177 ± 0.059
10 | 6.285 × 10⁻⁷ ± 3.179 × 10⁻⁸ | 2.985 ± 0.151 | 12.301 ± 0.610 | 1.168 ± 0.054 | 3.160 × 10³ ± 1.631 × 10² | 0.524 ± 0.027 | 209.186 ± 10.450 | 1.101 ± 0.054
11 | 5.913 × 10⁻⁷ ± 2.892 × 10⁻⁸ | 2.979 ± 0.145 | 12.240 ± 0.614 | 1.168 ± 0.054 | 3.119 × 10³ ± 1.519 × 10² | 0.523 ± 0.025 | 228.948 ± 11.847 | 1.140 ± 0.057
12 | 6.453 × 10⁻⁷ ± 3.262 × 10⁻⁸ | 2.982 ± 0.149 | 12.301 ± 0.620 | 1.171 ± 0.054 | 3.123 × 10³ ± 1.541 × 10² | 0.525 ± 0.026 | 211.091 ± 10.676 | 1.110 ± 0.054
13 | 5.884 × 10⁻⁷ ± 2.832 × 10⁻⁸ | 2.970 ± 0.151 | 12.267 ± 0.642 | 1.170 ± 0.054 | 3.555 × 10³ ± 1.805 × 10² | 0.524 ± 0.026 | 245.276 ± 12.358 | 1.144 ± 0.059
14 | 6.144 × 10⁻⁷ ± 3.129 × 10⁻⁸ | 2.978 ± 0.146 | 12.245 ± 0.603 | 1.168 ± 0.055 | 2.919 × 10³ ± 1.412 × 10² | 0.524 ± 0.025 | 208.038 ± 10.411 | 1.122 ± 0.056
15 | 5.996 × 10⁻⁷ ± 3.118 × 10⁻⁸ | 2.982 ± 0.153 | 12.313 ± 0.611 | 1.168 ± 0.054 | 2.749 × 10³ ± 1.424 × 10² | 0.524 ± 0.026 | 209.581 ± 10.296 | 1.142 ± 0.057
16 | 6.098 × 10⁻⁷ ± 3.227 × 10⁻⁸ | 2.995 ± 0.147 | 12.299 ± 0.617 | 1.172 ± 0.056 | 3.433 × 10³ ± 1.748 × 10² | 0.524 ± 0.026 | 233.568 ± 12.013 | 1.130 ± 0.055
17 | 6.325 × 10⁻⁷ ± 2.996 × 10⁻⁸ | 2.980 ± 0.142 | 12.284 ± 0.615 | 1.171 ± 0.053 | 3.297 × 10³ ± 1.663 × 10² | 0.525 ± 0.025 | 221.457 ± 10.878 | 1.118 ± 0.056
18 | 6.389 × 10⁻⁷ ± 3.120 × 10⁻⁸ | 2.981 ± 0.144 | 12.308 ± 0.617 | 1.171 ± 0.055 | 3.013 × 10³ ± 1.516 × 10² | 0.523 ± 0.026 | 208.723 ± 10.208 | 1.114 ± 0.054
19 | 5.940 × 10⁻⁷ ± 3.001 × 10⁻⁸ | 2.984 ± 0.150 | 12.278 ± 0.609 | 1.174 ± 0.052 | 3.070 × 10³ ± 1.541 × 10² | 0.526 ± 0.025 | 222.439 ± 10.900 | 1.134 ± 0.056
20 | 1.455 × 10⁻⁶ ± 7.255 × 10⁻⁸ | 2.686 ± 0.134 | 12.301 ± 0.606 | 1.207 ± 0.059 | 4.365 × 10³ ± 2.241 × 10² | 0.528 ± 0.024 | 296.115 ± 15.331 | 1.189 ± 0.057
Table A7. The same procedures as in Table A3 were applied to the Cucurbita pepo species.
CHR | q-Gamma: $A_G$ | $a_G$ | $\sigma_G$ | $q_G$ | Inverse q-Gamma: $A_{IG}$ | $\alpha_{IG}$ | $\sigma_{IG}$ | $q_{IG}$
1 | 6.171 × 10⁻⁷ ± 3.091 × 10⁻⁸ | 2.980 ± 0.141 | 12.262 ± 0.617 | 1.175 ± 0.055 | 3.063 × 10³ ± 1.541 × 10² | 0.525 ± 0.025 | 209.370 ± 10.632 | 1.120 ± 0.057
2 | 5.976 × 10⁻⁷ ± 3.004 × 10⁻⁸ | 2.974 ± 0.139 | 12.269 ± 0.597 | 1.170 ± 0.055 | 2.990 × 10³ ± 1.479 × 10² | 0.522 ± 0.026 | 208.945 ± 10.598 | 1.112 ± 0.056
3 | 6.206 × 10⁻⁷ ± 2.988 × 10⁻⁸ | 2.983 ± 0.137 | 12.255 ± 0.603 | 1.173 ± 0.053 | 3.016 × 10³ ± 1.519 × 10² | 0.527 ± 0.025 | 208.831 ± 10.467 | 1.130 ± 0.055
4 | 5.776 × 10⁻⁷ ± 2.841 × 10⁻⁸ | 2.986 ± 0.144 | 12.286 ± 0.624 | 1.170 ± 0.054 | 2.981 × 10³ ± 1.487 × 10² | 0.526 ± 0.025 | 209.237 ± 10.066 | 1.115 ± 0.055
5 | 6.014 × 10⁻⁷ ± 3.191 × 10⁻⁸ | 2.981 ± 0.148 | 12.288 ± 0.609 | 1.170 ± 0.055 | 2.950 × 10³ ± 1.428 × 10² | 0.524 ± 0.025 | 209.034 ± 10.637 | 1.118 ± 0.052
6 | 6.753 × 10⁻⁷ ± 3.312 × 10⁻⁸ | 2.988 ± 0.148 | 12.278 ± 0.605 | 1.166 ± 0.055 | 3.504 × 10³ ± 1.745 × 10² | 0.528 ± 0.025 | 209.254 ± 10.435 | 1.113 ± 0.058
7 | 6.694 × 10⁻⁷ ± 3.220 × 10⁻⁸ | 2.974 ± 0.150 | 12.302 ± 0.616 | 1.168 ± 0.054 | 3.289 × 10³ ± 1.591 × 10² | 0.524 ± 0.024 | 208.556 ± 10.929 | 1.107 ± 0.057
8 | 5.717 × 10⁻⁷ ± 2.892 × 10⁻⁸ | 2.975 ± 0.149 | 12.285 ± 0.622 | 1.175 ± 0.055 | 2.611 × 10³ ± 1.348 × 10² | 0.524 ± 0.025 | 209.373 ± 9.912 | 1.138 ± 0.056
9 | 7.129 × 10⁻⁷ ± 3.538 × 10⁻⁸ | 2.984 ± 0.146 | 12.271 ± 0.643 | 1.171 ± 0.054 | 3.458 × 10³ ± 1.749 × 10² | 0.526 ± 0.026 | 209.003 ± 10.615 | 1.116 ± 0.057
10 | 5.951 × 10⁻⁷ ± 3.026 × 10⁻⁸ | 2.985 ± 0.146 | 12.345 ± 0.614 | 1.169 ± 0.055 | 2.968 × 10³ ± 1.460 × 10² | 0.524 ± 0.026 | 209.462 ± 10.334 | 1.116 ± 0.055
11 | 1.464 × 10⁻⁶ ± 7.125 × 10⁻⁸ | 2.767 ± 0.136 | 12.319 ± 0.613 | 1.173 ± 0.053 | 3.445 × 10³ ± 1.794 × 10² | 0.526 ± 0.026 | 231.581 ± 11.564 | 1.091 ± 0.054
12 | 6.232 × 10⁻⁷ ± 3.308 × 10⁻⁸ | 2.997 ± 0.150 | 12.308 ± 0.613 | 1.168 ± 0.053 | 3.050 × 10³ ± 1.458 × 10² | 0.527 ± 0.026 | 208.665 ± 10.225 | 1.113 ± 0.056
13 | 6.060 × 10⁻⁷ ± 3.198 × 10⁻⁸ | 2.979 ± 0.150 | 12.268 ± 0.596 | 1.178 ± 0.057 | 2.928 × 10³ ± 1.463 × 10² | 0.522 ± 0.025 | 209.873 ± 10.115 | 1.126 ± 0.055
14 | 6.588 × 10⁻⁷ ± 3.406 × 10⁻⁸ | 2.966 ± 0.145 | 12.338 ± 0.631 | 1.171 ± 0.054 | 3.264 × 10³ ± 1.620 × 10² | 0.527 ± 0.026 | 210.521 ± 10.453 | 1.107 ± 0.055
15 | 6.099 × 10⁻⁷ ± 2.902 × 10⁻⁸ | 2.984 ± 0.142 | 12.290 ± 0.613 | 1.175 ± 0.055 | 3.004 × 10³ ± 1.460 × 10² | 0.525 ± 0.026 | 209.403 ± 10.791 | 1.119 ± 0.056
16 | 6.112 × 10⁻⁷ ± 2.964 × 10⁻⁸ | 2.988 ± 0.143 | 12.312 ± 0.585 | 1.167 ± 0.053 | 2.997 × 10³ ± 1.481 × 10² | 0.525 ± 0.025 | 208.459 ± 10.559 | 1.120 ± 0.057
17 | 5.964 × 10⁻⁷ ± 2.999 × 10⁻⁸ | 2.989 ± 0.151 | 12.296 ± 0.640 | 1.169 ± 0.054 | 2.922 × 10³ ± 1.498 × 10² | 0.524 ± 0.025 | 208.371 ± 10.415 | 1.116 ± 0.055
18 | 6.194 × 10⁻⁷ ± 3.193 × 10⁻⁸ | 2.967 ± 0.147 | 12.332 ± 0.620 | 1.167 ± 0.056 | 3.226 × 10³ ± 1.574 × 10² | 0.522 ± 0.027 | 211.364 ± 10.848 | 1.123 ± 0.056
19 | 5.717 × 10⁻⁷ ± 2.802 × 10⁻⁸ | 2.979 ± 0.153 | 12.323 ± 0.628 | 1.174 ± 0.055 | 2.596 × 10³ ± 1.303 × 10² | 0.526 ± 0.025 | 210.284 ± 10.298 | 1.139 ± 0.056
20 | 5.720 × 10⁻⁷ ± 2.901 × 10⁻⁸ | 2.980 ± 0.147 | 12.340 ± 0.599 | 1.172 ± 0.056 | 3.645 × 10³ ± 1.794 × 10² | 0.525 ± 0.026 | 273.876 ± 13.334 | 1.177 ± 0.060
Figure A1. Quantile–quantile representation of the superstatistical distributions for nucleotide chain lengths in the chromosomes of the family Cucurbitaceae, genus Cucumis. The black dots represent the observed data, while the red line illustrates the theoretical distribution based on the inverse q-Gamma. In contrast, the blue line represents the theoretical q-Gamma distribution. For better visualization, the values of the q-Gamma distribution were shifted by a factor of 10 2 . (a,b) show chromosomes 08 and 11, respectively, for the species C. melo, while (c,d) correspond to chromosomes 02 and 07, respectively, for the species C. sativus.
The Bayes factors and evidence values for each chromosome of the species C. maxima, C. moschata, and C. pepo are presented in Table A8. The column $\ln(\epsilon_1)$ gives the Bayesian evidence for the q-Gamma distribution, the column $\ln(\epsilon_2)$ gives the Bayesian evidence for the inverse q-Gamma distribution, and the column $\ln(B_{1,2})$ gives the Bayes factor. These values can be interpreted using Jeffreys' scale, given in Table 1 of the main text. The results do not favor either of the proposed distributions over the other. Consequently, either can be used to study the exon length distributions of the species analyzed here without bias.
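As a small illustration of how one row of Table A8 is read, the sketch below computes $\ln(B_{1,2}) = \ln(\epsilon_1) - \ln(\epsilon_2)$ and attaches a qualitative label. The function names are hypothetical, and the thresholds are an assumed, commonly used variant of Jeffreys' scale; they may differ from the ones in Table 1 of the main text.

```python
def ln_bayes_factor(ln_ev1, ln_ev2):
    """Return ln B_{1,2} = ln(eps_1) - ln(eps_2)."""
    return ln_ev1 - ln_ev2


def jeffreys_label(ln_b):
    """Qualitative reading of |ln B|.

    ASSUMPTION: these thresholds are a commonly used variant of
    Jeffreys' scale, not necessarily those of Table 1 in the main text.
    """
    strength = abs(ln_b)
    if strength < 1.0:
        return "inconclusive"
    if strength < 2.5:
        return "weak"
    if strength < 5.0:
        return "moderate"
    return "strong"


# C. maxima, chromosome 1 (Table A8): ln(eps_1) = 0.065, ln(eps_2) = 0.020
ln_b = ln_bayes_factor(0.065, 0.020)
print(round(ln_b, 3), jeffreys_label(ln_b))
```

The difference of the rounded evidences is 0.045, which agrees with the tabulated 0.044 to within rounding, and under the assumed thresholds it falls in the inconclusive range.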
Figure A2. Quantile–quantile representation of the superstatistical distributions for nucleotide chain lengths in the chromosomes of the family Cucurbitaceae, genus Cucurbita. The black dots represent the observed data, while the red line illustrates the theoretical distribution based on the inverse q-Gamma. In contrast, the blue line represents the theoretical q-Gamma distribution. For better visualization, the values of the q-Gamma distribution were shifted by a factor of 10 2 . (a,b) show chromosomes 11 and 15, respectively, for the species C. maxima. (c,d) show chromosomes 08 and 15, respectively, for the species C. moschata. (e,f) show chromosomes 04 and 06, respectively, for the species C. pepo.
Table A8. Bayesian evidence and Bayes factors for each chromosome of the C. maxima, C. moschata, and C. pepo species. The column $\ln(\epsilon_1)$ presents the Bayesian evidence for the q-Gamma distribution, $\ln(\epsilon_2)$ presents the Bayesian evidence for the inverse q-Gamma distribution, and $\ln(B_{1,2})$ presents the Bayes factor.
CHR | C. maxima: $\ln \epsilon_1$ | $\ln \epsilon_2$ | $\ln B_{1,2}$ | C. moschata: $\ln \epsilon_1$ | $\ln \epsilon_2$ | $\ln B_{1,2}$ | C. pepo: $\ln \epsilon_1$ | $\ln \epsilon_2$ | $\ln B_{1,2}$
1 | 0.065 ± 0.016 | 0.020 ± 0.010 | 0.044 ± 0.006 | 0.095 ± 0.024 | 0.020 ± 0.010 | 0.074 ± 0.014 | 0.101 ± 0.037 | 0.020 ± 0.010 | 0.081 ± 0.027
2 | 0.076 ± 0.027 | 0.020 ± 0.010 | 0.055 ± 0.017 | 0.090 ± 0.022 | 0.019 ± 0.009 | 0.070 ± 0.012 | 0.079 ± 0.020 | 0.020 ± 0.010 | 0.058 ± 0.010
3 | 0.086 ± 0.022 | 0.020 ± 0.010 | 0.065 ± 0.012 | 0.083 ± 0.020 | 0.020 ± 0.010 | 0.062 ± 0.010 | 0.086 ± 0.025 | 0.021 ± 0.010 | 0.065 ± 0.015
4 | 0.091 ± 0.029 | 0.020 ± 0.010 | 0.071 ± 0.019 | 0.091 ± 0.019 | 0.020 ± 0.010 | 0.071 ± 0.009 | 0.077 ± 0.023 | 0.020 ± 0.010 | 0.057 ± 0.013
5 | 0.085 ± 0.025 | 0.021 ± 0.010 | 0.064 ± 0.015 | 0.074 ± 0.017 | 0.020 ± 0.010 | 0.054 ± 0.007 | 0.087 ± 0.025 | 0.019 ± 0.010 | 0.068 ± 0.015
6 | 0.090 ± 0.027 | 0.020 ± 0.009 | 0.069 ± 0.017 | 0.119 ± 0.028 | 0.020 ± 0.010 | 0.099 ± 0.018 | 0.095 ± 0.026 | 0.023 ± 0.010 | 0.072 ± 0.016
7 | 0.092 ± 0.031 | 0.020 ± 0.010 | 0.072 ± 0.021 | 0.089 ± 0.020 | 0.021 ± 0.010 | 0.068 ± 0.010 | 0.089 ± 0.030 | 0.022 ± 0.010 | 0.067 ± 0.020
8 | 0.064 ± 0.019 | 0.021 ± 0.010 | 0.043 ± 0.009 | 0.085 ± 0.024 | 0.020 ± 0.010 | 0.065 ± 0.014 | 0.077 ± 0.022 | 0.018 ± 0.009 | 0.058 ± 0.012
9 | 0.074 ± 0.026 | 0.020 ± 0.010 | 0.053 ± 0.016 | 0.084 ± 0.024 | 0.019 ± 0.010 | 0.065 ± 0.014 | 0.094 ± 0.016 | 0.023 ± 0.010 | 0.070 ± 0.006
10 | 0.074 ± 0.013 | 0.021 ± 0.009 | 0.053 ± 0.003 | 0.084 ± 0.019 | 0.021 ± 0.010 | 0.063 ± 0.009 | 0.082 ± 0.016 | 0.020 ± 0.010 | 0.062 ± 0.006
11 | 0.093 ± 0.041 | 0.020 ± 0.010 | 0.073 ± 0.031 | 0.094 ± 0.023 | 0.020 ± 0.010 | 0.074 ± 0.013 | 0.108 ± 0.024 | 0.046 ± 0.010 | 0.062 ± 0.014
12 | 0.115 ± 0.023 | 0.021 ± 0.010 | 0.094 ± 0.013 | 0.086 ± 0.022 | 0.020 ± 0.009 | 0.065 ± 0.012 | 0.087 ± 0.025 | 0.020 ± 0.010 | 0.067 ± 0.015
13 | 0.088 ± 0.026 | 0.019 ± 0.010 | 0.069 ± 0.016 | 0.076 ± 0.025 | 0.020 ± 0.010 | 0.055 ± 0.015 | 0.093 ± 0.022 | 0.020 ± 0.010 | 0.073 ± 0.012
14 | 0.104 ± 0.016 | 0.020 ± 0.010 | 0.084 ± 0.007 | 0.099 ± 0.025 | 0.020 ± 0.010 | 0.078 ± 0.015 | 0.090 ± 0.017 | 0.021 ± 0.010 | 0.069 ± 0.007
15 | 0.072 ± 0.021 | 0.019 ± 0.010 | 0.052 ± 0.011 | 0.081 ± 0.020 | 0.020 ± 0.010 | 0.061 ± 0.010 | 0.089 ± 0.026 | 0.020 ± 0.010 | 0.068 ± 0.016
16 | 0.086 ± 0.020 | 0.020 ± 0.010 | 0.065 ± 0.010 | 0.104 ± 0.026 | 0.020 ± 0.010 | 0.084 ± 0.016 | 0.082 ± 0.028 | 0.020 ± 0.010 | 0.062 ± 0.018
17 | 0.083 ± 0.022 | 0.020 ± 0.010 | 0.063 ± 0.012 | 0.097 ± 0.033 | 0.020 ± 0.010 | 0.076 ± 0.023 | 0.070 ± 0.021 | 0.019 ± 0.010 | 0.050 ± 0.011
18 | 0.093 ± 0.021 | 0.020 ± 0.010 | 0.072 ± 0.011 | 0.095 ± 0.025 | 0.020 ± 0.010 | 0.075 ± 0.015 | 0.083 ± 0.021 | 0.022 ± 0.010 | 0.061 ± 0.011
19 | 0.061 ± 0.032 | 0.010 ± 0.010 | 0.051 ± 0.022 | 0.076 ± 0.025 | 0.019 ± 0.010 | 0.056 ± 0.015 | 0.077 ± 0.022 | 0.018 ± 0.009 | 0.058 ± 0.012
20 | 0.086 ± 0.025 | 0.020 ± 0.010 | 0.066 ± 0.015 | 0.099 ± 0.024 | 0.020 ± 0.010 | 0.078 ± 0.014 | 0.083 ± 0.018 | 0.019 ± 0.010 | 0.064 ± 0.008
Figure A3. Results of the Bayesian inference process, showing projections of the posterior distributions for the free parameters of the q-Gamma and inverse q-Gamma distributions for the Cucurbita genus. (a,b) present the distributions for chromosomes 11 and 15 of the C. maxima species, respectively. (c,d) show the distributions for chromosomes 08 and 15 of the C. moschata species, respectively. (e,f) show the distributions for chromosomes 04 and 06 of the C. pepo species, respectively.

Appendix B. Deduction of the Inverse q-Gamma and q-Gamma Distributions

Appendix B.1. Inverse Gamma Superstatistics

Based on the results of the time series analysis, we derive the size distribution from the temporal evolution of exon sizes. We start from the assumption that the local distribution within a specific time interval $T$ is given by the conditional distribution of $l$ for a given value of $\xi$, namely:
$$f(l\,|\,\xi) = \frac{1}{\Gamma(k)} \left(\frac{k}{\xi}\right)^{k} l^{\,k-1} \exp\left(-\frac{k}{\xi}\, l\right). \tag{A1}$$
Next, we introduce the stationary inverse gamma distribution to incorporate fluctuations in the local mean ξ of the exon sizes:
$$p(\xi) = \frac{\omega^{\mu}}{\Gamma(\mu)}\, \xi^{-\mu-1} \exp\left(-\frac{\omega}{\xi}\right). \tag{A2}$$
Consequently, the joint probability of obtaining a specific value of l and ξ , denoted as P ( l , ξ ) , is given by:
$$P(l,\xi) = f(l\,|\,\xi)\, p(\xi). \tag{A3}$$
Moreover, the marginal probability of obtaining a given value of l, independent of ξ , is expressed as:
$$p(l) = \int_{0}^{\infty} f(l\,|\,\xi)\, p(\xi)\, d\xi. \tag{A4}$$
By applying Equations (A1) and (A2) to Equation (A4) and performing the integration, we obtain the following expression:
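$$p(l) = \frac{\Gamma(k+\mu)}{\Gamma(k)\,\Gamma(\mu)}\, \frac{k}{\omega} \left(\frac{k}{\omega}\, l\right)^{k-1} \left(1 + \frac{k}{\omega}\, l\right)^{-k-\mu}. \tag{A5}$$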
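The integration step can be made explicit. The following is a sketch that relies only on the standard identity $\int_0^\infty \xi^{-s-1} e^{-b/\xi}\, d\xi = \Gamma(s)\, b^{-s}$ for $s, b > 0$ (obtained with the substitution $t = b/\xi$), applied with $s = k + \mu$ and $b = kl + \omega$:

```latex
\begin{aligned}
p(l) &= \frac{k^{k}\,\omega^{\mu}\, l^{\,k-1}}{\Gamma(k)\,\Gamma(\mu)}
        \int_{0}^{\infty} \xi^{-(k+\mu)-1}
        \exp\left(-\frac{kl+\omega}{\xi}\right) d\xi \\
     &= \frac{\Gamma(k+\mu)}{\Gamma(k)\,\Gamma(\mu)}\,
        k^{k}\,\omega^{\mu}\, l^{\,k-1}\, (kl+\omega)^{-(k+\mu)} \\
     &= \frac{\Gamma(k+\mu)}{\Gamma(k)\,\Gamma(\mu)}\,
        \frac{k}{\omega}\left(\frac{k}{\omega}\, l\right)^{k-1}
        \left(1+\frac{k}{\omega}\, l\right)^{-k-\mu}.
\end{aligned}
```

The last line follows by factoring $\omega^{-(k+\mu)}$ out of $(kl+\omega)^{-(k+\mu)}$, which leaves $k^{k}\,\omega^{-k}\, l^{\,k-1} = (k/\omega)\,(kl/\omega)^{k-1}$.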
Now, it is appropriate to introduce some changes of variables as follows:
$$\frac{k}{\omega} = \frac{q-1}{\sigma}, \qquad k = \frac{1}{q-1} - \mu, \qquad a = k - 1. \tag{A6}$$
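Therefore, Equation (A5) can be expressed as $p_G(l)$:
$$p_G(l) = A_G \left(\frac{l}{\sigma}\right)^{a} \left[1 - (1-q)\,\frac{l}{\sigma}\right]^{\frac{1}{1-q}}. \tag{A7}$$
Equation (A7) can be expressed in terms of the q-exponential as follows:
$$p_G(l) = A_G \left(\frac{l}{\sigma}\right)^{a} \exp_q\left(-\frac{l}{\sigma}\right). \tag{A8}$$
Here, the q-exponential is defined as:
$$\exp_q\left(-\frac{l}{\sigma}\right) = \left[1 - (1-q)\,\frac{l}{\sigma}\right]^{\frac{1}{1-q}}. \tag{A9}$$
In these expressions, $p_G(l)$ represents the q-Gamma probability density, and $A_G$ is defined as:
$$A_G = \frac{(q-1)^{a+1}\, \Gamma\left(\frac{1}{q-1}\right)}{\sigma\, \Gamma\left(\frac{1}{q-1} - a - 1\right) \Gamma(a+1)}. \tag{A10}$$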
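The derivation in this appendix can be checked numerically. The sketch below uses only the Python standard library: it evaluates the q-Gamma density in the equivalent form $A_G\,(l/\sigma)^{a}\,[1 + (q-1)\,l/\sigma]^{-1/(q-1)}$, with $A_G$ computed from Equation (A10), and compares it with a direct numerical integration of Equation (A4). The function names, the log-grid integrator, and the parameter values (chosen to be of the same order as the fitted values in Table A3) are illustrative assumptions, not part of the original analysis.

```python
import math


def q_gamma_pdf(l, a, sigma, q):
    """q-Gamma density, Equation (A7), with A_G taken from Equation (A10).

    Evaluated in log space; requires q > 1 and 1/(q-1) > a + 1.
    """
    n = 1.0 / (q - 1.0)
    ln_A = ((a + 1.0) * math.log(q - 1.0) + math.lgamma(n) - math.log(sigma)
            - math.lgamma(n - a - 1.0) - math.lgamma(a + 1.0))
    x = l / sigma
    return math.exp(ln_A + a * math.log(x) - n * math.log1p((q - 1.0) * x))


def mixture_integrand(xi, l, k, mu, omega):
    """f(l|xi) * p(xi): local density (A1) times inverse-gamma weight (A2)."""
    ln_f = (k * math.log(k / xi) + (k - 1.0) * math.log(l)
            - k * l / xi - math.lgamma(k))
    ln_p = (mu * math.log(omega) - math.lgamma(mu)
            - (mu + 1.0) * math.log(xi) - omega / xi)
    return math.exp(ln_f + ln_p)


def integrate_log_grid(f, lo, hi, n=20000):
    """Trapezoidal rule in u = ln(x), for integrands spanning many decades."""
    h = (math.log(hi) - math.log(lo)) / n
    total = 0.0
    for i in range(n + 1):
        x = lo * math.exp(i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * x * h
    return total


# Illustrative parameters (assumed), of the same order as those in Table A3.
a, sigma, q = 2.98, 12.3, 1.17
k = a + 1.0                     # from a = k - 1 in Equation (A6)
mu = 1.0 / (q - 1.0) - k        # from k = 1/(q-1) - mu
omega = k * sigma / (q - 1.0)   # from k/omega = (q-1)/sigma

norm = integrate_log_grid(lambda l: q_gamma_pdf(l, a, sigma, q), 1e-6, 1e9)
l0 = 100.0
marginal = integrate_log_grid(
    lambda xi: mixture_integrand(xi, l0, k, mu, omega), 1e-3, 1e9)
closed = q_gamma_pdf(l0, a, sigma, q)
print(norm, marginal / closed)  # both should be close to 1
```

The first number checks that $A_G$ normalizes the density; the second checks that the closed form agrees with the superstatistical mixture at the test point $l = 100$ bp.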

Appendix B.2. Gamma Superstatistics

We can construct the discrete-time “temporal” evolution of the exon size distribution by following the procedure described in Appendix B.1. Within a data window of size $T$, the local distribution of $l$ for a given $\xi$ is assumed to be
$$f(l\,|\,\xi) = \frac{\alpha^{\alpha+1}}{\xi\, \Gamma(\alpha+1)} \left(\frac{l}{\xi}\right)^{-\alpha-2} \exp\left(-\frac{\alpha\, \xi}{l}\right). \tag{A11}$$
By following a rationale similar to the one described in Appendix B.1, to introduce the fluctuations in the local mean ξ of the exon sizes, we assume that ξ obeys the gamma distribution
$$p(\xi) = \frac{1}{\omega^{\delta}\, \Gamma(\delta)}\, \xi^{\delta-1} \exp\left(-\frac{\xi}{\omega}\right). \tag{A12}$$
Substituting Equations (A11) and (A12) into the marginal probability given by Equation (A4) and carrying out the integration, we obtain
$$p_{IG}(l) = \frac{\Gamma(\alpha+\delta+1)}{\Gamma(\alpha+1)\,\Gamma(\delta)}\, (\alpha\omega)^{\alpha+1}\, l^{-\alpha-2} \left(1 + \frac{\alpha\omega}{l}\right)^{-\alpha-\delta-1}. \tag{A13}$$
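The integration can be written out step by step. This sketch uses the standard identity $\int_0^\infty \xi^{s-1} e^{-b\xi}\, d\xi = \Gamma(s)\, b^{-s}$ with $s = \alpha + \delta + 1$ and $b = \alpha/l + 1/\omega$. Note that the prefactor $\omega^{-\delta}$ coming from Equation (A12) combines with $\omega^{\alpha+\delta+1}$ from the integral, so that only $\omega^{\alpha+1}$ remains:

```latex
\begin{aligned}
p_{IG}(l) &= \frac{\alpha^{\alpha+1}\, l^{-\alpha-2}}
                  {\Gamma(\alpha+1)\,\omega^{\delta}\,\Gamma(\delta)}
             \int_{0}^{\infty} \xi^{\alpha+\delta}
             \exp\left[-\xi\left(\frac{\alpha}{l}+\frac{1}{\omega}\right)\right] d\xi \\
          &= \frac{\Gamma(\alpha+\delta+1)}{\Gamma(\alpha+1)\,\Gamma(\delta)}\,
             \frac{\alpha^{\alpha+1}\, l^{-\alpha-2}}{\omega^{\delta}}
             \left(\frac{\alpha}{l}+\frac{1}{\omega}\right)^{-\alpha-\delta-1} \\
          &= \frac{\Gamma(\alpha+\delta+1)}{\Gamma(\alpha+1)\,\Gamma(\delta)}\,
             (\alpha\omega)^{\alpha+1}\, l^{-\alpha-2}
             \left(1+\frac{\alpha\omega}{l}\right)^{-\alpha-\delta-1}.
\end{aligned}
```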
Using the change of variables
$$\alpha\omega = \sigma (q-1), \qquad \delta = \frac{1}{q-1} - \alpha - 1, \tag{A14}$$
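Equation (A13) can be rewritten in the form
$$p_{IG}(l) = A_{IG} \left(\frac{l}{\sigma}\right)^{-\alpha-2} \left[1 - (1-q)\,\frac{\sigma}{l}\right]^{\frac{1}{1-q}}. \tag{A15}$$
Here, $p_{IG}(l)$ is the inverse q-Gamma probability density function. It is worth noting that Equation (A15) can be represented in terms of the q-exponential function in the form
$$p_{IG}(l) = A_{IG} \left(\frac{l}{\sigma}\right)^{-\alpha-2} \exp_q\left(-\frac{\sigma}{l}\right). \tag{A16}$$
The q-exponential is defined as
$$\exp_q\left(-\frac{\sigma}{l}\right) = \left[1 - (1-q)\,\frac{\sigma}{l}\right]^{\frac{1}{1-q}}, \tag{A17}$$
and $A_{IG}$ is given by
$$A_{IG} = \frac{\Gamma\left(\frac{1}{q-1}\right)}{\sigma\, \Gamma(\alpha+1)\, \Gamma\left(\frac{1}{q-1} - \alpha - 1\right)} \left(\frac{1}{q-1}\right)^{-\alpha-1}. \tag{A18}$$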
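As with the q-Gamma case, the result can be checked numerically. The sketch below, again restricted to the Python standard library, evaluates the inverse q-Gamma density in the equivalent form $A_{IG}\,(l/\sigma)^{-\alpha-2}\,[1 + (q-1)\,\sigma/l]^{-1/(q-1)}$, with $A_{IG}$ from Equation (A18), and compares it with the mixture integral of Equation (A4) built from Equations (A11) and (A12). Function names, the integrator, and the parameter values (of the same order as the fitted values in Table A3) are assumptions made for illustration.

```python
import math


def inverse_q_gamma_pdf(l, alpha, sigma, q):
    """Inverse q-Gamma density, Equation (A15), with A_IG from Equation (A18).

    Evaluated in log space; requires q > 1 and 1/(q-1) > alpha + 1.
    """
    n = 1.0 / (q - 1.0)
    ln_A = ((alpha + 1.0) * math.log(q - 1.0) + math.lgamma(n) - math.log(sigma)
            - math.lgamma(alpha + 1.0) - math.lgamma(n - alpha - 1.0))
    x = l / sigma
    return math.exp(ln_A - (alpha + 2.0) * math.log(x)
                    - n * math.log1p((q - 1.0) / x))


def mixture_integrand(xi, l, alpha, delta, omega):
    """f(l|xi) * p(xi): local density (A11) times gamma weight (A12)."""
    ln_f = ((alpha + 1.0) * math.log(alpha) - math.log(xi)
            - math.lgamma(alpha + 1.0)
            - (alpha + 2.0) * (math.log(l) - math.log(xi)) - alpha * xi / l)
    ln_p = (-delta * math.log(omega) - math.lgamma(delta)
            + (delta - 1.0) * math.log(xi) - xi / omega)
    return math.exp(ln_f + ln_p)


def integrate_log_grid(f, lo, hi, n=20000):
    """Trapezoidal rule in u = ln(x), for integrands spanning many decades."""
    h = (math.log(hi) - math.log(lo)) / n
    total = 0.0
    for i in range(n + 1):
        x = lo * math.exp(i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * f(x) * x * h
    return total


# Illustrative parameters (assumed), of the same order as those in Table A3.
alpha, sigma, q = 0.525, 240.0, 1.09
delta = 1.0 / (q - 1.0) - alpha - 1.0   # Equation (A14)
omega = sigma * (q - 1.0) / alpha       # from alpha * omega = sigma * (q - 1)

norm = integrate_log_grid(
    lambda l: inverse_q_gamma_pdf(l, alpha, sigma, q), 1e-6, 1e9)
l0 = 100.0
marginal = integrate_log_grid(
    lambda xi: mixture_integrand(xi, l0, alpha, delta, omega), 1e-6, 1e6)
closed = inverse_q_gamma_pdf(l0, alpha, sigma, q)
print(norm, marginal / closed)  # both should be close to 1
```

Working in log space with `math.lgamma` and `math.log1p` avoids overflow in the Gamma functions and loss of precision when $(q-1)\,\sigma/l$ is small.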

References

  1. Beck, C.; Cohen, E.G.D. Superstatistics. Phys. A 2003, 322, 267–275. [Google Scholar] [CrossRef]
  2. Beck, C. Dynamical Foundations of Nonextensive Statistical Mechanics. Phys. Rev. Lett. 2001, 87, 180601. [Google Scholar] [CrossRef]
  3. Duarte Queirós, S.M. On the emergence of a generalised Gamma distribution. Application to traded volume in financial markets. Europhys. Lett. 2005, 71, 339–345. [Google Scholar] [CrossRef]
  4. de Souza, J.; Moyano, L.G.; Duarte Queirós, S.M. On statistical properties of traded volume in financial markets. Eur. Phys. J. B 2006, 50, 165–168. [Google Scholar] [CrossRef]
  5. Michas, G.; Vallianatos, F. Stochastic modeling of nonstationary earthquake time series with long-term clustering effects. Phys. Rev. E 2018, 98, 042107. [Google Scholar] [CrossRef]
  6. Iliopoulos, A.; Chorozoglou, D.; Kourouklas, C.; Mangira, O.; Papadimitriou, E. Superstatistics, complexity and earthquakes: A brief review and application on Hellenic seismicity. Boll. Geofis. Teor. Appl. 2019, 60, 531–548. [Google Scholar] [CrossRef]
  7. Beck, C. Lagrangian acceleration statistics in turbulent flows. Europhys. Lett. 2003, 64, 151–157. [Google Scholar] [CrossRef]
  8. Reynolds, A.M. Superstatistical Mechanics of Tracer-Particle Motions in Turbulence. Phys. Rev. Lett. 2003, 91, 084503. [Google Scholar] [CrossRef]
  9. Jung, S.; Swinney, H.L. Velocity difference statistics in turbulence. Phys. Rev. E 2005, 72, 026304. [Google Scholar] [CrossRef]
  10. Ourabah, K.; Ait Gougam, L.; Tribeche, M. Nonthermal and suprathermal distributions as a consequence of superstatistics. Phys. Rev. E 2015, 91, 012133. [Google Scholar] [CrossRef]
  11. Davis, S.; Avaria, G.; Bora, B.; Jain, J.; Moreno, J.; Pavez, C.; Soto, L. Single-particle velocity distributions of collisionless, steady-state plasmas must follow superstatistics. Phys. Rev. E 2019, 100, 023205. [Google Scholar] [CrossRef] [PubMed]
  12. Ourabah, K. Demystifying the success of empirical distributions in space plasmas. Phys. Rev. Res. 2020, 2, 023121. [Google Scholar] [CrossRef]
  13. Rouse, I.; Willitsch, S. Superstatistical Energy Distributions of an Ion in an Ultracold Buffer Gas. Phys. Rev. Lett. 2017, 118, 143401. [Google Scholar] [CrossRef] [PubMed]
  14. Ourabah, K. Fingerprints of nonequilibrium stationary distributions in dispersion relations. Sci. Rep. 2021, 11, 12103. [Google Scholar] [CrossRef] [PubMed]
  15. Jizba, P.; Kleinert, H. Superstatistics approach to path integral for a relativistic particle. Phys. Rev. D 2010, 82, 085016. [Google Scholar] [CrossRef]
  16. Ayala, A.; Hentschinski, M.; Hernández, L.A.; Loewe, M.; Zamora, R. Superstatistics and the effective QCD phase diagram. Phys. Rev. D 2018, 98, 114002. [Google Scholar] [CrossRef]
  17. Ourabah, K.; Tribeche, M. Quantum entanglement and temperature fluctuations. Phys. Rev. E 2017, 95, 042111. [Google Scholar] [CrossRef]
  18. Cheraghalizadeh, J.; Seifi, M.; Ebadi, Z.; Mohammadzadeh, H.; Najafi, M.N. Superstatistical two-temperature Ising model. Phys. Rev. E 2021, 103, 032104. [Google Scholar] [CrossRef]
  19. Jizba, P.; Scardigli, F. Special relativity induced by granular space. Eur. Phys. J. C 2013, 73, 2491. [Google Scholar] [CrossRef]
  20. Ourabah, K.; Barboza, E.M.; Abreu, E.M.C.; Neto, J.A. Superstatistics: Consequences on gravitation and cosmology. Phys. Rev. D 2019, 100, 103516. [Google Scholar] [CrossRef]
  21. Ourabah, K. Generalized statistical mechanics of stellar systems. Phys. Rev. E 2022, 105, 064108. [Google Scholar] [CrossRef] [PubMed]
  22. Bogachev, M.I.; Markelov, O.A.; Kayumov, A.R.; Bunde, A. Superstatistical model of bacterial DNA architecture. Sci. Rep. 2017, 7, 43034, Erratum in Sci. Rep. 2017, 7, 46917. [Google Scholar] [CrossRef] [PubMed]
  23. Itto, Y.; Beck, C. Superstatistical modelling of protein diffusion dynamics in bacteria. J. R. Soc. Interface 2021, 18, 20200927. [Google Scholar] [CrossRef] [PubMed]
  24. Sadoon, A.A.; Wang, Y. Anomalous, non-Gaussian, viscoelastic, and age-dependent dynamics of histonelike nucleoid-structuring proteins in live Escherichia coli. Phys. Rev. E 2018, 98, 042411. [Google Scholar] [CrossRef]
  25. Costa, M.O.; Silva, R.; Anselmo, D.H.A.L. Superstatistical and DNA sequence coding of the human genome. Phys. Rev. E 2022, 106. [Google Scholar] [CrossRef] [PubMed]
  26. Jeffrey, H.J. Chaos game representation of gene structure. Nucleic Acids Res. 1990, 18, 2163–2170. [Google Scholar] [CrossRef]
  27. Paris, H.S. Genetic resources of pumpkins and squash, Cucurbita spp. In Genetics and Genomics of Cucurbitaceae; Springer: Berlin/Heidelberg, Germany, 2016; pp. 111–154. [Google Scholar] [CrossRef]
  28. Chomicki, G.; Schaefer, H.; Renner, S.S. Origin and domestication of Cucurbitaceae crops: Insights from phylogenies, genomics and archaeology. New Phytol. 2020, 226, 1240–1255. [Google Scholar] [CrossRef]
  29. TATLIOGLU, T. 13—Cucumber: Cucumis sativus L. In Genetic Improvement of Vegetable Crops; Kalloo, G., Bergh, B., Eds.; Pergamon: Amsterdam, The Netherlands, 1993; pp. 197–234. [Google Scholar] [CrossRef]
  30. McCreight, J.D.; Nerson, H.; Grumet, R. 20—Melon: Cucumis melo L. In Genetic Improvement of Vegetable Crops; Kalloo, G., Bergh, B., Eds.; Pergamon: Amsterdam, The Netherlands, 1993; pp. 267–294. [Google Scholar] [CrossRef]
  31. Chen, J.f.; Staub, J.E.; Jiang, J. A reevaluation of karyotype in cucumber (Cucumis sativus L.). Genet. Resour. Crop Evol. 1998, 45, 301–305. [Google Scholar] [CrossRef]
  32. Koo, D.H.; Hur, Y.; Jin, D.C.; Bang, J.W. Karyotype analysis of a Korean cucumber cultivar (Cucumis sativus L. cv. Winter Long) using C-banding and bicolor fluorescence in situ hybridization. Mol. Cells 2002, 13, 413–418. [Google Scholar] [CrossRef]
  33. Singh, A.; Roy, R. Karyological studies in Cucumis (L.). Caryologia 1974, 27, 153–160. [Google Scholar] [CrossRef]
  34. RP, R. Cytological studies in Cucumis and Citrullus. Cytologia 1970, 35, 561–569. [Google Scholar] [CrossRef]
  35. Ramachandran, A.; Seshadri, P. Multiple relapses in borderline leprosy—A case report. Indian J. Lepr. 1986, 58, 623–625.
  36. Cavagnaro, P.F.; Senalik, D.A.; Yang, L.; Simon, P.W.; Harkins, T.T.; Kodira, C.D.; Huang, S.; Weng, Y. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genom. 2010, 11, 1–18.
  37. Garcia-Mas, J.; Benjak, A.; Sanseverino, W.; Bourgeois, M.; Mir, G.; González, V.M.; Hénaff, E.; Câmara, F.; Cozzuto, L.; Lowy, E.; et al. The genome of melon (Cucumis melo L.). PNAS 2012, 109, 11872–11877.
  38. Sun, H.; Wu, S.; Zhang, G.; Jiao, C.; Guo, S.; Ren, Y.; Zhang, J.; Zhang, H.; Gong, G.; Jia, Z.; et al. Karyotype stability and unbiased fractionation in the paleo-allotetraploid Cucurbita genomes. Mol. Plant 2017, 10, 1293–1306.
  39. Montero-Pau, J.; Blanca, J.; Bombarely, A.; Ziarsolo, P.; Esteras, C.; Martí-Gómez, C.; Ferriol, M.; Gómez, P.; Jamilena, M.; Mueller, L.; et al. De novo assembly of the zucchini genome reveals a whole-genome duplication associated with the origin of the Cucurbita genus. Plant Biotechnol. J. 2018, 16, 1161–1171.
  40. Barrera-Redondo, J.; Ibarra-Laclette, E.; Vázquez-Lobo, A.; Gutiérrez-Guerrero, Y.T.; de la Vega, G.S.; Piñero, D.; Montes-Hernández, S.; Lira-Saade, R.; Eguiarte, L.E. The genome of Cucurbita argyrosperma (silver-seed gourd) reveals faster rates of protein-coding gene and long noncoding RNA turnover and neofunctionalization within Cucurbita. Mol. Plant 2019, 12, 506–520.
  41. da Silva, W.; Silva, R. Cosmological perturbations in the Tsallis holographic dark energy scenarios. Eur. Phys. J. Plus 2021, 136, 1–19.
  42. da Silva, W.; Gonzalez, J.; Silva, R.; Alcaniz, J. Thermodynamic constraints on the dark sector. Eur. Phys. J. Plus 2020, 135, 1–11.
  43. Holanda, R.; da Silva, W. On a possible cosmological evolution of galaxy cluster YX-YSZE scaling relation. J. Cosmol. Astropart. Phys. 2020, 2020, 027.
  44. da Silva, W.; Holanda, R.; Silva, R. Bayesian comparison of the cosmic duality scenarios. Phys. Rev. D 2020, 102, 063513.
  45. da Silva, W.; Silva, R. Extended ΛCDM model and viscous dark energy: A Bayesian analysis. J. Cosmol. Astropart. Phys. 2019, 2019, 036.
  46. da Silva, W.; Gimenes, H.; Silva, R. Extended ΛCDM model. Astropart. Phys. 2019, 105, 37–43.
  47. de Lima, M.M.F.; Silva, R.; Fulco, U.L.; Mello, V.D.; Anselmo, D.H.A.L. Bayesian analysis of plant DNA size distribution via non-additive statistics. Eur. Phys. J. Plus 2022, 137, 1–8.
  48. Costa, M.O.; Silva, R.; Anselmo, D.H.A.L.; Silva, J.R.P. Analysis of human DNA through power-law statistics. Phys. Rev. E 2019, 99, 022112.
  49. Silva, R.; Silva, J.R.P.; Anselmo, D.H.A.L.; Alcaniz, J.S.; da Silva, W.J.C.; Costa, M.O. An alternative description of power law correlations in DNA sequences. Phys. A Stat. Mech. Appl. 2020, 545, 123735.
  50. Ellison, A.M. Bayesian inference in ecology. Ecol. Lett. 2004, 7, 509–520.
  51. Sayers, E.W.; Bolton, E.E.; Brister, J.R.; Canese, K.; Chan, J.; Comeau, D.C.; Connor, R.; Funk, K.; Kelly, C.; Kim, S.; et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022, 50, D20–D26.
  52. Beck, C.; Cohen, E.G.D.; Swinney, H.L. From time series to superstatistics. Phys. Rev. E 2005, 72, 056133, Erratum in Phys. Rev. E 2006, 73, 049905.
  53. Moré, J.J. The Levenberg-Marquardt algorithm: Implementation and theory. In Proceedings of the Numerical Analysis; Watson, G.A., Ed.; Springer: Berlin/Heidelberg, Germany, 1978; pp. 105–116.
  54. Kalbfleisch, J.D.; Sprott, D.A. Application of Likelihood Methods to Models Involving Large Numbers of Parameters. J. R. Stat. Soc. Ser. B Methodol. 1970, 32, 175–208.
  55. Smith, R.L.; Naylor, J.C. A Comparison of Maximum Likelihood and Bayesian Estimators for the Three-Parameter Weibull Distribution. J. R. Stat. Soc. Ser. C Appl. Stat. 1987, 36, 358–369. Available online: https://academic.oup.com/jrsssc/article-pdf/36/3/358/48622158/jrsssc_36_3_358.pdf (accessed on 2 February 2024).
  56. Jeffreys, H. The Theory of Probability; OUP: Oxford, UK, 1998.
  57. Buchner, J. UltraNest—A robust, general purpose Bayesian inference engine. J. Open Source Softw. 2021, 6, 3001.
  58. Feroz, F.; Hobson, M.P.; Bridges, M. MultiNest: An efficient and robust Bayesian inference tool for cosmology and particle physics. Mon. Not. R. Astron. Soc. 2009, 398, 1601–1614.
  59. Feroz, F.; Hobson, M.P.; Cameron, E.; Pettitt, A.N. Importance Nested Sampling and the MultiNest Algorithm. Open J. Astrophys. 2019, 2, 1.
  60. Buchner, J. Collaborative Nested Sampling: Big Data versus Complex Physical Models. Publ. Astron. Soc. Pac. 2019, 131, 108005.
  61. Hines, K.E. A Primer on Bayesian Inference for Biophysical Systems. Biophys. J. 2015, 108, 2103–2113.
  62. Tsallis, C. Introduction to Nonextensive Statistical Mechanics: Approaching a Complex World; Springer: Berlin/Heidelberg, Germany, 2009; Volume 1, pp. 1–2.
  63. de Lima, M.M.; Anselmo, D.H.; Silva, R.; Nunes, G.H.; Fulco, U.L.; Vasconcelos, M.S.; Mello, V.D. A Bayesian Analysis of Plant DNA Length Distribution via κ-Statistics. Entropy 2022, 24, 1225.
  64. de Lima, M.; Nunes, G.; Fulco, U.; Silva, R.; Vasconcelos, M.; Anselmo, D. Range of correlations in the size distributions of plant proteins. Eur. Phys. J. Plus 2023, 138, 1132.
  65. de Lima, M.; Costa, M.; Silva, R.; Fulco, U.; Oliveira, J.; Vasconcelos, M.; Anselmo, D. Viral proteins length distributions: A comparative analysis. Phys. A Stat. Mech. Appl. 2024, 633, 129367.
Figure 1. Time series representation and statistical evaluation method. (a) Time series for chromosome 01, created using data from the National Center for Biotechnology Information (NCBI) [51]. The “spatial” series $l$ represents the spatial displacement along the DNA sequence at “time” $t$. Each coordinate position $t_i$ is also linked to a “temporal” index $i = 1, \ldots, n$. (b) The probability density derived from the “time” series of chromosome 01 in the C. melo species.
Figure 2. Autocorrelation analysis. (a) The black points correspond to the autocorrelation of the time series of chromosome 01 in the C. melo species. The red line indicates a double exponential fit as defined by Equation (5), with characteristic times $t_1 = \gamma^{-1} \approx 1.27$ and $t_2 = T \approx 121$. (b) Histogram for the local distribution within a window of size $T$. The blue and red curves correspond to the gamma distribution with fitting parameters $k = 0.3785$ and $\xi = 768.3620$ and the inverse gamma distribution with parameters $\alpha = 1.1572$ and $\xi = 262.2571$, respectively.
Figure 3. Graphical analysis of the time series $\xi(t)$ for chromosome 01 of the C. melo species. (a,c) depict the time series for $\xi(t)$ obtained from the relationships presented in Equations (8) and (9), respectively. (b,d) show the histograms describing the distribution $p(\xi)$. In (b), the red curve represents the inverse gamma distribution with parameters $\mu = 15.3237$ and $\omega = 3989.4215$, while in (d), the blue curve represents the gamma distribution with parameters $\delta = 18.3461$ and $\omega = 18.0439$.
Figure 4. Superstatistical distribution for the string size of chromosomes in the Cucurbitaceae family and Cucumis genus. The probability distribution of the time series is visually represented by black dots. The red curve illustrates the best fit for the inverse q-Gamma distribution. In contrast, the blue line corresponds to the best fit for the q-Gamma distribution. (a,b) show chromosomes 08 and 11, respectively, for the C. melo species. (c,d) show chromosomes 02 and 07, respectively, for the C. sativus species.
Figure 5. Superstatistical distribution for the string size of chromosomes in the Cucurbitaceae family and Cucurbita genus. The probability distribution of the time series is visually represented by black dots. The red curve illustrates the best fit for the inverse q-Gamma distribution. In contrast, the blue line corresponds to the best fit for the q-Gamma distribution. (a,b) show chromosomes 11 and 15, respectively, for the C. maxima species. (c,d) show chromosomes 08 and 15, respectively, for the C. moschata species. (e,f) show chromosomes 04 and 06, respectively, for the C. pepo species.
Figure 6. Results of the Bayesian inference process, showing projections of the posterior distributions for the free parameters of the q-Gamma and inverse q-Gamma distributions for the Cucumis genus. (a,b) present the distributions for chromosomes 08 and 11 of the C. melo species, respectively.
Table 1. The Jeffreys’ scale for interpreting the Bayes factor. The first column represents the logarithm of the Bayes factor limit values, while the second column is the interpretation of the evidence’s strength over the appropriate threshold.
| $\ln B_{1,2}$ | Interpretation |
|---|---|
| Greater than 5 | Strong evidence for model 01 |
| (2.5, 5) | Moderate evidence for model 01 |
| (1, 2.5) | Weak evidence for model 01 |
| (−1, 1) | Inconclusive |
| (−2.5, −1) | Weak evidence for model 02 |
| (−5, −2.5) | Moderate evidence for model 02 |
| Less than −5 | Strong evidence for model 02 |
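
The scale in Table 1 can be read as a simple lookup on the value of $\ln B_{1,2}$. The following is a minimal sketch of that lookup, not code from the paper: the function name `jeffreys_interpretation` is hypothetical, and because the table lists open intervals, the treatment of the exact boundary values (1, 2.5 and 5) is an assumption, here assigned to the stronger category.

```python
def jeffreys_interpretation(ln_b12: float) -> str:
    """Map ln B_{1,2} to the verbal category of Table 1.

    Assumption: values exactly on a threshold (1, 2.5, 5) go to the
    stronger category, since the table only lists open intervals.
    """
    strength = abs(ln_b12)
    if strength < 1:
        return "Inconclusive"
    # Positive values favour model 01, negative values favour model 02.
    favoured = "model 01" if ln_b12 > 0 else "model 02"
    if strength < 2.5:
        label = "Weak"
    elif strength < 5:
        label = "Moderate"
    else:
        label = "Strong"
    return f"{label} evidence for {favoured}"


if __name__ == "__main__":
    for value in (0.065, 1.8, -3.0, 7.2):
        print(value, "->", jeffreys_interpretation(value))
```

The scale is symmetric about zero, so the sketch classifies on the absolute value and uses the sign only to decide which model is favoured.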
Table 2. Normal priors used for the adjustment parameters in the models described by Equations (12) and (14). The first column indicates the parameter; the remaining columns give the mean and standard deviation of the prior in the model to which the parameter belongs, i.e., q-Gamma or inverse q-Gamma.
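| Parameter | q-Gamma: Mean | q-Gamma: Standard Deviation | Inverse q-Gamma: Mean | Inverse q-Gamma: Standard Deviation |
|---|---|---|---|---|
| $A_G$ | 6.05 × 10 7 | 7 × 10 8 | - | - |
| $\alpha_G$ | 2.99 | 0.30 | - | - |
| $\sigma_G$ | 12.29 | 1.23 | - | - |
| $q_G$ | 1.17 | 0.12 | - | - |
| $A_{IG}$ | - | - | 3.25 × 10 3 | 3.25 × 10 2 |
| $\alpha_{IG}$ | - | - | 0.52 | 0.05 |
| $\sigma_{IG}$ | - | - | 222.67 | 22.30 |
| $q_{IG}$ | - | - | 1.12 | 0.13 |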
Table 3. Bayesian evidence and Bayes factors for each chromosome (CHR) of the C. melo and C. sativus species. The column $\ln \epsilon_1$ presents the Bayesian evidence for the q-Gamma distribution, $\ln \epsilon_2$ presents the Bayesian evidence for the inverse q-Gamma distribution, and $\ln B_{1,2}$ presents the Bayes factor.
| CHR | C. melo $\ln \epsilon_1$ | C. melo $\ln \epsilon_2$ | C. melo $\ln B_{1,2}$ | C. sativus $\ln \epsilon_1$ | C. sativus $\ln \epsilon_2$ | C. sativus $\ln B_{1,2}$ |
|---|---|---|---|---|---|---|
| 1 | 0.085 ± 0.019 | 0.020 ± 0.010 | 0.065 ± 0.009 | 0.061 ± 0.040 | 0.009 ± 0.009 | 0.052 ± 0.030 |
| 2 | 0.105 ± 0.026 | 0.021 ± 0.010 | 0.084 ± 0.016 | 0.062 ± 0.040 | 0.009 ± 0.009 | 0.052 ± 0.030 |
| 3 | 0.077 ± 0.024 | 0.020 ± 0.009 | 0.056 ± 0.014 | 0.052 ± 0.032 | 0.009 ± 0.009 | 0.042 ± 0.022 |
| 4 | 0.106 ± 0.036 | 0.021 ± 0.009 | 0.085 ± 0.026 | 0.061 ± 0.046 | 0.009 ± 0.009 | 0.051 ± 0.036 |
| 5 | 0.087 ± 0.026 | 0.020 ± 0.010 | 0.067 ± 0.016 | 0.068 ± 0.034 | 0.009 ± 0.009 | 0.059 ± 0.024 |
| 6 | 0.095 ± 0.023 | 0.020 ± 0.010 | 0.074 ± 0.013 | 0.069 ± 0.039 | 0.009 ± 0.009 | 0.059 ± 0.030 |
| 7 | 0.107 ± 0.029 | 0.020 ± 0.010 | 0.087 ± 0.019 | 0.068 ± 0.035 | 0.010 ± 0.009 | 0.058 ± 0.025 |
| 8 | 0.097 ± 0.025 | 0.020 ± 0.010 | 0.077 ± 0.015 | - | - | - |
| 9 | 0.091 ± 0.024 | 0.020 ± 0.010 | 0.071 ± 0.014 | - | - | - |
| 10 | 0.092 ± 0.028 | 0.020 ± 0.010 | 0.071 ± 0.019 | - | - | - |
| 11 | 0.081 ± 0.027 | 0.021 ± 0.010 | 0.060 ± 0.017 | - | - | - |
| 12 | 0.077 ± 0.023 | 0.020 ± 0.010 | 0.057 ± 0.013 | - | - | - |
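
The Bayes factor column of Table 3 is the difference of the two log-evidence columns, $\ln B_{1,2} = \ln \epsilon_1 - \ln \epsilon_2$. As a rough consistency check, the sketch below recomputes that difference for the first three C. melo rows, using the central values exactly as printed in the table. The names `ROWS`, `ln_bayes_factor` and `max_rounding_gap` are hypothetical, and the quoted uncertainties are deliberately left out.

```python
# Central values as printed in Table 3 (C. melo, chromosomes 1-3):
# (chromosome, ln evidence q-Gamma, ln evidence inverse q-Gamma, printed ln B_{1,2})
ROWS = [
    (1, 0.085, 0.020, 0.065),
    (2, 0.105, 0.021, 0.084),
    (3, 0.077, 0.020, 0.056),
]


def ln_bayes_factor(ln_evidence_1: float, ln_evidence_2: float) -> float:
    """Log Bayes factor of model 1 over model 2 from the two log-evidences."""
    return ln_evidence_1 - ln_evidence_2


def max_rounding_gap(rows) -> float:
    """Largest absolute gap between the recomputed and the printed ln B_{1,2}."""
    return max(
        abs(ln_bayes_factor(e1, e2) - printed) for _, e1, e2, printed in rows
    )


if __name__ == "__main__":
    for chromosome, e1, e2, printed in ROWS:
        print(chromosome, round(ln_bayes_factor(e1, e2), 3), printed)
```

Because the table rounds every entry to three decimals, a recomputed difference may deviate from the printed one in the last digit, which is why the check compares against a tolerance rather than testing for equality.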
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Costa, M.O.; Silva, R.; de Lima, M.M.F.; Anselmo, D.H.A.L. Superstatistics Applied to Cucurbitaceae DNA Sequences. Entropy 2024, 26, 819. https://doi.org/10.3390/e26100819

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.