Article

A Short Note on Generating a Random Sample from Finite Mixture Distributions

Department of Mathematical & Computational Sciences, University of Toronto Mississauga, Mississauga, ON L5L 1C6, Canada
* Author to whom correspondence should be addressed.
Axioms 2024, 13(5), 307; https://doi.org/10.3390/axioms13050307
Submission received: 6 March 2024 / Revised: 18 April 2024 / Accepted: 7 May 2024 / Published: 8 May 2024

Abstract
Computational statistics is a critical skill for professionals in fields such as data science, statistics, and related disciplines. One essential aspect of computational statistics is the ability to simulate random variables from specified probability distributions. Commonly employed techniques for sampling random variables include the inverse transform method, the acceptance–rejection method, and the Box–Muller transformation, all of which rely on sampling from the uniform(0, 1) distribution. A significant concept in statistics is the finite mixture model, characterized by a convex combination of multiple probability density functions. In this paper, we introduce a modified version of the composition method, a standard approach for sampling finite mixture models. Our modification has the advantage of relying on sampling from the uniform(0, 1) distribution, aligning it with the prevalent methods in computational statistics. This alignment simplifies the teaching of computational statistics courses, among other benefits. We offer several examples to illustrate the approach.

1. Introduction

Computational statistics has gained significant importance in recent years due to the exponential growth of data and the increasing complexity of data-driven problems. Within computational statistics, the ability to simulate or generate random samples from a probability distribution is fundamental. These generated samples are used for estimating probabilities and expectations and for testing hypotheses. The inverse transform method and the acceptance–rejection method are two of the most fundamental techniques for generating random samples, and both can be found in well-known computational statistics textbooks such as Statistical Computing with R by [1]. These methods rely on generating numbers from the uniform(0, 1) distribution. The choice of method depends on the specific distribution being generated and the desired properties of the generated sample, such as efficiency or accuracy.
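To make the uniform(0, 1) connection concrete, the inverse transform method can be sketched in a few lines. The following Python illustration is our own (the paper's supplementary code is in R, and the function name here is a placeholder); it samples an exponential(λ) variate from a single uniform(0, 1) draw.

```python
import math
import random

def exponential_inverse_transform(lam, n, seed=None):
    """Sample n variates from an exponential(lam) distribution via the
    inverse transform method: if U ~ uniform(0, 1), then
    X = -ln(1 - U) / lam has CDF F(x) = 1 - exp(-lam * x)."""
    rng = random.Random(seed)
    # rng.random() returns u in [0, 1), so 1 - u > 0 and the log is safe
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

sample = exponential_inverse_transform(lam=2.0, n=100_000, seed=1)
print(sum(sample) / len(sample))  # should be close to 1 / lam = 0.5
```

The same template works for any distribution with an invertible CDF, which is why the uniform(0, 1) draw is the common building block of these algorithms.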
In certain cases, the data may not conform to commonly known distributions such as the normal or exponential distributions. Instead, they can be represented as a finite mixture model, which combines multiple probability density functions in a convex manner. These models find applications in various scientific domains. For instance, normal mixture distributions are used as parametric density estimators [2], whereas finite mixture models are employed in medical studies [3] and financial analyses [4]. Finite mixture models have also been used by [5] in the analysis of wind speeds, and Ref. [6] have demonstrated their usefulness in Bayesian density estimation. Furthermore, Ref. [7] provide a comprehensive overview of the different applications of mixture models.
Sampling from finite mixture models is a standard topic covered in many computational statistics textbooks, including works by [1,8], among others. In these texts, the primary approach for sampling from finite mixture models is the composition method. Although the composition method is effective, it does not directly use the uniform(0, 1) distribution.
The goal of this paper is to modify the standard composition algorithm so that it samples from the uniform(0, 1) distribution, ensuring consistency with primary sampling algorithms such as the inverse transform method and the acceptance–rejection method. This could prove beneficial in teaching computational statistics courses, as sampling from the uniform(0, 1) distribution becomes a standard step across sampling algorithms.
The remainder of this paper is organized as follows. Section 2 provides a relevant background on finite mixture models and discusses the proposed modification. Section 3 presents several examples demonstrating the effectiveness of the proposed method. Finally, Section 4 offers concluding remarks.

2. Finite Mixture Models and Simulation Theorem

In this section, we define a finite mixture model and introduce a theorem for sampling this model via an adaptation of the composition method. The proof of this theorem is also included.
A finite mixture model is a statistical model that represents a probability distribution as a mixture of several component distributions. Mathematically, given k component densities f_1(x), …, f_k(x) with associated mixing probabilities (also known as mixing weights) π_1, …, π_k, a finite mixture model f(x) is defined as
f(x) = ∑_{i=1}^{k} π_i f_i(x),  (1)
where 0 ≤ π_i ≤ 1 and ∑_{i=1}^{k} π_i = 1. Further insights into Equation (1) can be found in studies by [9,10].
In the literature, simulating a variable from a finite k-mixture distribution is typically carried out by the composition method [1,11]:
  • Generate an integer I ∈ {1, …, k} such that
    P(I = i) = π_i, for i = 1, …, k;
  • Deliver X with cumulative distribution function F_I.
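The two steps above can be sketched as follows. This is a minimal Python illustration of the composition method (the function name and the two-component normal mixture are ours, chosen for demonstration; the paper's supplementary code is in R).

```python
import random

def sample_composition(weights, samplers, n, seed=None):
    """Composition method: draw a component index I with
    P(I = i) = weights[i], then draw X from component I."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        # step 1: pick the component index according to the mixing weights
        i = rng.choices(range(len(weights)), weights=weights)[0]
        # step 2: deliver a draw from that component's distribution
        out.append(samplers[i](rng))
    return out

# hypothetical two-component normal mixture 0.3 N(0, 1) + 0.7 N(5, 1)
draws = sample_composition(
    weights=[0.3, 0.7],
    samplers=[lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 1.0)],
    n=100_000,
    seed=1,
)
print(sum(draws) / len(draws))  # close to 0.3 * 0 + 0.7 * 5 = 3.5
```

Note that the component selection in step 1 is delegated to a weighted discrete sampler rather than performed with an explicit uniform(0, 1) draw; making that draw explicit is precisely the modification of the next theorem.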
The following theorem introduces an algorithm for generating a sample from (1). It presents a modified version of the composition method that utilizes the uniform(0, 1) distribution. Aligning with well-established algorithms such as the inverse transform and acceptance–rejection methods enhances accessibility for learners.
Theorem 1.
Consider F(x) as defined in (1). The following algorithm generates a random variate from X with cumulative distribution function F(x):
1. Generate a random u from the uniform(0, 1) distribution;
2. If ∑_{i=1}^{l−1} π_i ≤ u < ∑_{i=1}^{l} π_i, generate a random x from F_l(x), where l = 1, …, k, with the convention that ∑_{i=1}^{0} π_i = 0.
Proof. 
We show that the generated sample has the same distribution as X. By the law of total probability, we have
P(X ≤ x) = ∫_0^1 P(X ≤ x | U = u) du
         = ∫_0^{π_1} P(X ≤ x | U = u) du + ∫_{π_1}^{π_1+π_2} P(X ≤ x | U = u) du + ⋯ + ∫_{∑_{i=1}^{l−1} π_i}^{∑_{i=1}^{l} π_i} P(X ≤ x | U = u) du + ⋯ + ∫_{∑_{i=1}^{k−1} π_i}^{1} P(X ≤ x | U = u) du
         = ∫_0^{π_1} F_1(x) du + ∫_{π_1}^{π_1+π_2} F_2(x) du + ⋯ + ∫_{∑_{i=1}^{l−1} π_i}^{∑_{i=1}^{l} π_i} F_l(x) du + ⋯ + ∫_{∑_{i=1}^{k−1} π_i}^{1} F_k(x) du
         = π_1 F_1(x) + π_2 F_2(x) + ⋯ + π_l F_l(x) + ⋯ + π_k F_k(x)
         = ∑_{i=1}^{k} π_i F_i(x) = F(x).
   □
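The two-step algorithm of Theorem 1 can be sketched as follows. This Python illustration is ours (the paper's supplementary code is in R); it is shown on the three-component normal mixture used later in Example 1.

```python
import random
from itertools import accumulate

def sample_mixture_uniform(weights, samplers, n, seed=None):
    """Theorem 1: draw u ~ uniform(0, 1), pick the component l with
    sum(weights[:l]) <= u < sum(weights[:l+1]), then sample component l."""
    rng = random.Random(seed)
    cum = list(accumulate(weights))  # partial sums of the mixing weights
    out = []
    for _ in range(n):
        u = rng.random()
        # first index whose cumulative weight exceeds u; the fallback guards
        # against cum[-1] landing slightly below 1 from rounding
        l = next((j for j, c in enumerate(cum) if u < c), len(cum) - 1)
        out.append(samplers[l](rng))
    return out

# mixture 0.3 N(0, 1) + 0.5 N(5, 0.25) + 0.2 N(2, 9), as in Example 1
draws = sample_mixture_uniform(
    weights=[0.3, 0.5, 0.2],
    samplers=[
        lambda r: r.gauss(0.0, 1.0),
        lambda r: r.gauss(5.0, 0.5),  # sd = sqrt(0.25)
        lambda r: r.gauss(2.0, 3.0),  # sd = sqrt(9)
    ],
    n=100_000,
    seed=1,
)
print(sum(draws) / len(draws))  # close to 0.3*0 + 0.5*5 + 0.2*2 = 2.9
```

The only difference from the composition sketch is that the component selection is now an explicit uniform(0, 1) draw compared against the cumulative mixing weights, mirroring the inverse transform method applied to the discrete index I.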
The proof of Theorem 1 reveals that the approach is quite general: it covers not only mixtures of continuous distributions but also mixtures of continuous and discrete distributions, as well as mixtures comprising only discrete distributions. Additionally, the framework can be extended to sample mixtures of multivariate distributions. In the following section, we explore specific examples that illustrate these various cases.

3. Examples

In this section, we demonstrate the proposed algorithm outlined in Theorem 1 with six illustrative examples. The R code is provided in the Supplementary Materials.
Example 1.
Mixture of three normal distributions [10].
Suppose X_1 ∼ N(μ = 0, σ² = 1), X_2 ∼ N(μ = 5, σ² = 0.25), and X_3 ∼ N(μ = 2, σ² = 9) are independent. Let
F(x) = 0.3 F_1(x) + 0.5 F_2(x) + 0.2 F_3(x).
Using Theorem 1, we generated a sample of size 10^6 from F(x). Figure 1 shows the histogram of the generated sample with the true density superimposed. It is evident from Figure 1 that the proposed method performs exceptionally well in this example.
Example 2.
Mixture of five gamma distributions: different shapes with same scale parameters [1].
Consider F(x) = ∑_{i=1}^{5} π_i F_i(x), where X_i ∼ gamma(r = 3, β_i = i) are independent and the mixing probabilities are π_i = i/15, i = 1, …, 5. Using Theorem 1, we generated a sample of size 10^6 from F(x). Figure 2 displays the histogram of the generated sample with the true density superimposed. The proposed procedure also performs well in this example.
Example 3.
Mixture of five gamma distributions: different scale with same shape parameters.
Let F(x) be as described in Example 2, with X_i ∼ gamma(r_i = i, β_i = 3). Employing Theorem 1, we generated a sample of size 10^6 from F(x). Figure 3 presents the histogram of the generated sample with the true density superimposed. The proposed procedure demonstrates effective performance in this example as well.
Example 4.
Comparing empirical and true mixed distributions.
In this example, we compare F n ( x ) , the empirical cumulative distribution function (ECDF) of the simulated data, with the true mixed distribution
F(x) = ∑_{i=1}^{3} π_i F_i(x),
where F i represents three cases:
  • Case 1: X_1 ∼ t(5), X_2 ∼ t(10), and X_3 ∼ t(15). Here, t(ν) denotes the t distribution with ν degrees of freedom;
  • Case 2: X_1 ∼ beta(2, 5), X_2 ∼ beta(2, 10), and X_3 ∼ beta(2, 15);
  • Case 3: X_1 ∼ Pareto(1, 1), X_2 ∼ Pareto(2, 2.5), and X_3 ∼ Pareto(3, 3). Here, Pareto(x_m, α) is the Pareto distribution with scale parameter x_m (the minimum possible value) and shape parameter α.
In all three cases, we let π_1 = 9/20, π_2 = 9/20, and π_3 = 1/10. As a measure of proximity, we utilize the Cramér–von Mises distance defined as
D = ∫ [F_n(x) − F(x)]² dF(x).
We examine sample sizes n ∈ {20, 40, 60, 80, 100}. For each generated sample X_1, …, X_n, we estimate D using
D̂ = (1/n) ∑_{i=1}^{n} [F_n(X_i) − F(X_i)]².
For each sample size, we compute 10^4 values of D̂ and report mean(D̂) and sd(D̂), the mean and standard deviation of those 10^4 values. For comparison, we also include results obtained using samples generated from the composition method described in Section 2. The results are reported in Table 1. Both simulation algorithms work well: mean(D̂) and sd(D̂) approach zero, especially as the sample size increases.
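The estimator D̂ above is straightforward to compute once the sample is sorted, since the ECDF at the i-th order statistic is i/n (for a continuous sample without ties). The following Python sketch is ours (helper names are placeholders; the paper's code is in R) and applies it to a simple two-component normal mixture sampled with the uniform(0, 1) draw of Theorem 1.

```python
import math
import random

def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def cvm_stat(sample, mixture_cdf):
    """Estimate D = integral of (F_n - F)^2 dF by
    (1/n) * sum_i (F_n(X_(i)) - F(X_(i)))^2, with F_n(X_(i)) = i/n."""
    xs = sorted(sample)
    n = len(xs)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        total += (i / n - mixture_cdf(x)) ** 2
    return total / n

# sample 0.4 N(0, 1) + 0.6 N(3, 1) using an explicit uniform(0, 1) draw
rng = random.Random(1)
sample = []
for _ in range(1000):
    u = rng.random()
    sample.append(rng.gauss(0.0, 1.0) if u < 0.4 else rng.gauss(3.0, 1.0))

F = lambda x: 0.4 * normal_cdf(x, 0.0, 1.0) + 0.6 * normal_cdf(x, 3.0, 1.0)
print(cvm_stat(sample, F))  # small: the ECDF tracks the true mixture CDF
```

As in Table 1, the statistic shrinks toward zero as n grows, since the ECDF converges uniformly to the true mixture CDF.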
Example 5.
Mixture of four binomial distributions [12].
Consider
F(x) = ∑_{i=1}^{4} π_i F_i(x),
where X_i ∼ binomial(m = 30, θ_i) are independent, with θ_1 = 0.1, θ_2 = 0.2, θ_3 = 0.6, and θ_4 = 0.9. The mixing probabilities are π_1 = π_2 = π_3 = 0.2 and π_4 = 0.4. Using Theorem 1, a sample of size 10^6 was generated from F(x). For comparison, we analyzed the theoretical mean and variance alongside the sample mean and variance. As stated by [9], we have E[X] = μ = ∑_{i=1}^{4} π_i μ_i and V[X] = ∑_{i=1}^{4} π_i σ_i² + ∑_{i=1}^{4} π_i (μ_i − μ)². In this example, μ_i = mθ_i and σ_i² = mθ_i(1 − θ_i). Thus, μ = 16.20 and σ² = 106.98. The sample mean and variance are 16.2050 and 106.9789, respectively, indicating a close correspondence between the theoretical and sample statistics.
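The mixture moment formulas can be checked directly. The sketch below is ours; it takes m = 30 trials per component, the value consistent with the reported moments μ = 16.20 and σ² = 106.98 (with m = 10 the mixture mean could not exceed 10).

```python
def mixture_moments(weights, means, variances):
    """Exact mean and variance of a finite mixture:
    E[X] = sum_i pi_i * mu_i,
    V[X] = sum_i pi_i * sigma_i^2 + sum_i pi_i * (mu_i - mu)^2."""
    mu = sum(p * m for p, m in zip(weights, means))
    var = sum(p * v for p, v in zip(weights, variances))
    var += sum(p * (m - mu) ** 2 for p, m in zip(weights, means))
    return mu, var

# binomial(m, theta_i) components as in Example 5
m = 30
thetas = [0.1, 0.2, 0.6, 0.9]
pis = [0.2, 0.2, 0.2, 0.4]
means = [m * t for t in thetas]                # mu_i = m * theta_i
variances = [m * t * (1 - t) for t in thetas]  # sigma_i^2 = m*theta*(1-theta)
mu, var = mixture_moments(pis, means, variances)
print(round(mu, 2), round(var, 2))  # ≈ 16.2 and 106.98
```

The same helper reproduces the moments of any finite mixture, including the normal–Poisson mixture of Example 6.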
Example 6.
Mixture of normal and Poisson distributions.
Consider the mixture distribution given by
F(x) = 0.7 F_1(x) + 0.3 F_2(x),
where X_1 follows a normal distribution with mean 10 and variance 4, and X_2 follows a Poisson distribution with mean 5; X_1 and X_2 are independent. Utilizing Theorem 1, a sample of size 10^6 was generated from F(x). As in Example 5, the exact mean and variance of the mixture distribution are μ = 0.7 × 10 + 0.3 × 5 = 8.50 and σ² = 0.7(4 + (10 − μ)²) + 0.3(5 + (5 − μ)²) = 9.55. The simulated mean and variance are 8.4963 and 9.5485, respectively, demonstrating a close correspondence between the theoretical and sample statistics.

4. Conclusions

This paper introduces a modified version of the composition method for sampling finite mixture distributions. By incorporating sampling from the uniform(0, 1) distribution, our modification aligns with prevalent methods in computational statistics, such as the inverse transform and acceptance–rejection methods. This modification not only brings consistency to sampling procedures but also simplifies the teaching of computational statistics courses, where sampling from the uniform(0, 1) distribution is a common step in various algorithms.
The effectiveness of the proposed modification is demonstrated through several illustrative examples, showcasing its robust performance across different scenarios. From mixtures of normal and gamma distributions to binomial and Poisson mixtures, the proposed algorithm consistently generates samples that closely match the theoretical distributions. Moreover, comparison metrics such as the Cramér–von Mises distance provide quantitative evidence of the algorithm’s efficiency and accuracy, especially as sample sizes increase.
Overall, the modified composition method presented in this paper offers a valuable addition to the toolkit of computational statisticians and educators alike. Its simplicity, consistency, and performance make it a practical choice for sampling finite mixture distributions in various applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/axioms13050307/s1. R Code: A Short Note on Generating a Random Sample from Finite Mixture Distributions.

Author Contributions

Methodology, L.A.-L.; Software, A.L.; Writing—original draft, A.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Rizzo, M. Statistical Computing with R; CRC Press: Boca Raton, FL, USA; Taylor & Francis: Abingdon, UK, 2019. [Google Scholar]
  2. Hothorn, T.; Everitt, B.S. A Handbook of Statistical Analyses Using R, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2009. [Google Scholar]
  3. Everitt, B.S. An introduction to finite mixture distributions. Stat. Methods Med. Res. 1996, 5, 107–127. [Google Scholar] [CrossRef] [PubMed]
  4. Lin, W.C.; Emura, T.; Sun, L.H. Estimation under copula-based Markov normal mixture models for serially correlated data. Commun. Stat.-Simul. Comput. 2021, 50, 4483–4515. [Google Scholar] [CrossRef]
  5. Cai, J.; Xu, Q.; Cao, M.; Yang, Y. Capacity credit evaluation of correlated wind resources using vine copula and improved importance sampling. Appl. Sci. 2019, 9, 199. [Google Scholar] [CrossRef]
  6. Escobar, M.D.; West, M. Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc. 1995, 90, 577–588. [Google Scholar] [CrossRef]
  7. Titterington, D.M.; Smith, A.F.M.; Makov, U.E. Statistical Analysis of Finite Mixture Distributions; John Wiley & Sons: New York, NY, USA, 1985. [Google Scholar]
  8. Tanizaki, H. Computational Methods in Statistics and Econometrics, 1st ed.; Marcel Dekker: New York, NY, USA, 2004. [Google Scholar]
  9. Hogg, R.V.; McKean, J.W.; Craig, A.T. Introduction to Mathematical Statistics, 8th ed.; Pearson: Boston, MA, USA, 2019. [Google Scholar]
  10. McLachlan, G.; Peel, D. Finite Mixture Models; John Wiley & Sons, Inc.: New York, NY, USA, 2000. [Google Scholar]
  11. Ghorbanzadeh, D.; Dur, P.; Jaupi, L. A method for the generate a random sample from a finite mixture distributions. In Proceedings of the 6th Annual International Conference on Computational Mathematics, Computational Geometry & Statistics (CMCGS 2017), Singapore, 6–7 March 2017. [Google Scholar] [CrossRef]
  12. Everitt, B.S.; Hand, D.J. Finite Mixture Distributions; Chapman & Hall: New York, NY, USA, 1981. [Google Scholar]
Figure 1. Mixture of three normal distributions in Example 1.
Figure 2. Mixture of five gamma distributions with different shapes and same scale parameters in Example 2.
Figure 3. Mixture of five gamma distributions with different scale parameters and same shape parameters in Example 3.
Table 1. Comparison of proposed and composition methods.
                    Proposed                 Composition
F(x)      n    mean(D̂)    sd(D̂)       mean(D̂)    sd(D̂)
Case 1    20   0.008771   0.007934     0.008814   0.008036
          40   0.004291   0.003903     0.004295   0.003871
          60   0.002812   0.002515     0.002851   0.002566
          80   0.002126   0.001880     0.002109   0.001861
          100  0.001687   0.001539     0.001674   0.001517
Case 2    20   0.008795   0.008048     0.008814   0.008207
          40   0.004269   0.003854     0.004325   0.003945
          60   0.002822   0.002560     0.002842   0.002528
          80   0.002087   0.001890     0.002108   0.001951
          100  0.001692   0.001546     0.001700   0.001518
Case 3    20   0.008813   0.008194     0.008857   0.008049
          40   0.004299   0.003889     0.004293   0.003844
          60   0.002842   0.002514     0.002850   0.002601
          80   0.002148   0.001903     0.002173   0.002008
          100  0.001678   0.001485     0.001711   0.001569

Share and Cite

MDPI and ACS Style

Al-Labadi, L.; Ly, A. A Short Note on Generating a Random Sample from Finite Mixture Distributions. Axioms 2024, 13, 307. https://doi.org/10.3390/axioms13050307
