Deterministic Approximate EM Algorithm; Application to the Riemann Approximation EM and the Tempered EM
Abstract
1. Introduction
2. Materials and Methods
2.1. Deterministic Approximate EM Algorithm and Its Convergence for the Exponential Family
2.1.1. Context and Motivation
- (E) With the current parameter $\theta_n$, calculate the conditional probability $p_{\theta_n}(z \mid x)$;
- (M) To get $\theta_{n+1}$, maximise in $\theta$ the function $\theta \mapsto \mathbb{E}_{z \sim p_{\theta_n}(\cdot \mid x)}\left[\ln p_{\theta}(x, z)\right]$ (a minimal sketch of both steps is given below);
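For concreteness, here is a minimal sketch of these two steps for a two-component univariate Gaussian mixture, the kind of model used in the experiments of Section 3.2. This is an illustrative implementation in our own notation, not the authors' code; the function name, the initialisation and the fixed number of iterations are our choices.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=50):
    """Plain EM for a two-component univariate Gaussian mixture.

    E step: posterior responsibilities p_{theta_n}(z | x).
    M step: closed-form maximiser of the expected complete log-likelihood.
    """
    # crude initialisation around the empirical quartiles (illustrative choice)
    pi, mu, sigma = 0.5, np.quantile(x, [0.25, 0.75]), np.array([x.std(), x.std()])
    for _ in range(n_iter):
        # E step: responsibilities (conditional probability of z given x)
        w = np.stack([pi * norm.pdf(x, mu[0], sigma[0]),
                      (1 - pi) * norm.pdf(x, mu[1], sigma[1])])
        r = w / w.sum(axis=0)                              # shape (2, n)
        # M step: weighted maximum-likelihood updates
        n_k = r.sum(axis=1)
        pi = n_k[0] / len(x)
        mu = (r @ x) / n_k
        sigma = np.sqrt((r * (x - mu[:, None]) ** 2).sum(axis=1) / n_k)
    return pi, mu, sigma

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 0.5, 700)])
print(em_gmm_1d(sample))
```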
2.1.2. Theorem
- Conditions M1–3 on the model.
- M1. The parameter set $\Theta$ is a subset of $\mathbb{R}^{d_\theta}$, and $\mu$ is a $\sigma$-finite positive Borel measure on the latent space $\mathbb{R}^{d_z}$. Let $\mathcal{S} \subseteq \mathbb{R}^{d_S}$ be an open set, $\phi : \Theta \to \mathbb{R}^{d_S}$, $\psi : \Theta \to \mathbb{R}$ and $S : \mathbb{R}^{d_z} \to \mathcal{S}$. Define $f$, $L$ and $g$ as the complete-data likelihood, of exponential-family form $f(z; \theta) := \exp\big(-\psi(\theta) + \langle S(z), \phi(\theta) \rangle\big)$; the function $L(s; \theta) := -\psi(\theta) + \langle s, \phi(\theta) \rangle$; and the observed-data likelihood $g(\theta) := \int f(z; \theta)\, \mu(dz)$.
- M2. Assume that
- (a*) $\psi$ and $\phi$ are continuous on $\Theta$;
- (b) for all $\theta \in \Theta$, $\bar{S}(\theta) := \int S(z)\, p_{\theta}(z \mid x)\, \mu(dz)$ is finite and continuous on $\Theta$;
- (c) there exists a continuous function $\hat{\theta} : \mathcal{S} \to \Theta$ such that $L(s; \hat{\theta}(s)) \geq L(s; \theta)$ for all $(s, \theta) \in \mathcal{S} \times \Theta$;
- (d) g is positive, finite and continuous on $\Theta$ and the level set $\{\theta \in \Theta : g(\theta) \geq M\}$ is compact for any $M > 0$.
- M3. Define $\mathcal{L} := \{\theta \in \Theta : \hat{\theta}(\bar{S}(\theta)) = \theta\}$, the set of fixed points of the EM map, and, for any $g^* \in \mathbb{R}$, $\mathcal{L}_{g^*} := \{\theta \in \mathcal{L} : g(\theta) = g^*\}$. Assume either that:
- (a) The set $\mathcal{L}$ is compact, or
- (b) $g(\mathcal{L} \cap K)$ is finite for all compact sets $K \subseteq \Theta$.
- The conditions on the approximation. Assume that the approximation is deterministic. For all indices u, for any compact set $K \subseteq \Theta$, one of the two following configurations, (4) or (5), holds; each expresses a form of convergence of the approximated conditional distribution towards the exact one (see the remarks below).
- Then:
- (i)
- (a) the sequence $(\theta_n)_n$ remains within a compact subset of $\Theta$ and $d(\theta_n, \mathcal{L}) \xrightarrow[n \to \infty]{} 0$, with probability 1;
- (b) $(g(\theta_n))_n$ converges towards a connected component of $g(\mathcal{L})$.
- (ii)
- Let $\mathrm{cl}(\cdot)$ be the set closure function and $d(\cdot, \cdot)$ be any point-to-set distance function within $\Theta$. If $g(\mathcal{L})$ has an empty interior, then there exists $g^* \in g(\mathcal{L})$ such that $(g(\theta_n))_n$ converges towards $g^*$ and $d\big(\theta_n, \mathrm{cl}(\mathcal{L}_{g^*})\big) \to 0$.
- When g is differentiable on $\Theta$, its stationary points are the stable points of the EM, i.e., $\mathcal{L} = \{\theta \in \Theta : \nabla_\theta g(\theta) = 0\}$.
- The M1–3 conditions in Theorem 1 are almost identical to the similarly named M1–3 conditions in [7]. The only difference is in condition (a*), where we remove the hypothesis that S has to be a continuous function of z, since it is not needed when the approximation is not stochastic.
- The property required by condition (4) can seem hard to verify, since S is not integrated here against a probability density function. However, when z is a finite variable, as is the case for finite mixtures, this integral becomes a finite sum. Hence, condition (4) is better adapted to finite mixtures, while condition (5) is better adapted to continuous hidden variables.
- The two sufficient conditions (4) and (5) involve a certain form of convergence of the approximated conditional distribution towards the exact one. If the latent variable z is continuous, this excludes countable (and finite) approximations such as sums of Dirac functions, since they have a measure of zero. In particular, this excludes Quasi-Monte Carlo approximations. However, one look at the proof of Theorem 1 (in Appendix A.1) or at the following sketch of proof reveals that it is actually sufficient to verify, for any compact set K, the corresponding convergence of the approximated E step used in the Stable Approximate EM. This weaker condition can be verified by finite approximations; a schematic example of such a deterministic approximate E step is sketched below.
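To illustrate the kind of deterministic, finite approximation discussed above (and anticipating the Riemann approximation EM of Section 2.2), the sketch below replaces the integral defining the conditional expectation of the sufficient statistic by a normalised sum over a fixed grid of latent values. The model, the grid and all names (`riemann_e_step`, `log_joint`, `s_stat`) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def riemann_e_step(x, theta, log_joint, s_stat, grid):
    """Deterministic approximation of the E step.

    Replaces S_bar(theta) = integral of S(z) p_theta(z | x) dz by a normalised
    sum over a fixed grid of latent values: weights proportional to the
    complete-data likelihood evaluated on the grid.
    """
    log_w = np.array([log_joint(x, z, theta) for z in grid])
    w = np.exp(log_w - log_w.max())           # stabilised unnormalised posterior
    w /= w.sum()                              # grid spacing cancels in the ratio
    stats = np.array([s_stat(z) for z in grid])
    return (w[:, None] * stats).sum(axis=0)   # approximate conditional expectation

# Illustrative model: z ~ N(0, 1) restricted to [-3, 3], x | z ~ N(z, 0.5^2)
def log_joint(x, z, theta):
    mu0, s0, s = theta                        # prior mean/std, noise std (assumed names)
    return -0.5 * ((z - mu0) / s0) ** 2 - 0.5 * np.sum(((x - z) / s) ** 2)

grid = np.linspace(-3.0, 3.0, 200)            # a finer grid can be used as iterations proceed
x_obs = np.array([0.8, 1.1, 0.9])
print(riemann_e_step(x_obs, (0.0, 1.0, 0.5), log_joint, lambda z: np.array([z, z ** 2]), grid))
```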
2.1.3. Sketch of Proof
- $(g(\theta_n))_n$ converges towards a connected component of $g(\mathcal{L})$.
- If $g(\mathcal{L})$ has an empty interior, then there exists a $g^*$ such that $(g(\theta_n))_n$ converges towards $g^*$. Moreover, $(\theta_n)_n$ converges towards the set $\mathcal{L}_{g^*}$, where $\mathcal{L}_{g^*} := \{\theta \in \mathcal{L} : g(\theta) = g^*\}$.
- (a)
- the C1–2 conditions of Proposition 10 of [7].
- (C1) There exists g, a Lyapunov function relative to the EM map and the set $\mathcal{L}$, such that for all $M > 0$, the level set $\{\theta \in \Theta : g(\theta) \geq M\}$ is compact (the generic meaning of this Lyapunov property is recalled after this list).
- (C2) $\mathcal{L}$ is compact OR (C2’) $g(\mathcal{L} \cap K)$ is finite for all compact $K \subseteq \Theta$.
- (b)
- (c)
- ∀ compact $K \subseteq \Theta$: the approximated E step converges towards the exact one uniformly over K.
- $(g(\theta_n))_n$ converges towards a connected component of $g(\mathcal{L})$.
- If $g(\mathcal{L})$ has an empty interior, then the sequence $(g(\theta_n))_n$ converges towards a $g^*$ and $(\theta_n)_n$ converges towards the set $\mathcal{L}_{g^*}$.
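For readability, the display below recalls the generic meaning of a Lyapunov function relative to a map T and a set $\mathcal{L}$, which is the property invoked in (C1); the notation is standard and may differ from the authors' exact symbols.

```latex
% Generic Lyapunov property (standard notation; the paper's symbols may differ).
g \text{ is a Lyapunov function relative to } (T, \mathcal{L})
\quad\Longleftrightarrow\quad
\begin{cases}
g\big(T(\theta)\big) \,\ge\, g(\theta) & \text{for all } \theta \in \Theta,\\
g\big(T(\theta)\big) \,=\, g(\theta) & \text{if and only if } \theta \in \mathcal{L}.
\end{cases}
```

For the exact EM, g is the observed likelihood and the monotonicity is the classical ascent property; the approximated versions additionally require the E step perturbation to vanish on compact sets, as in item (c) above.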
2.2. Riemann Approximation EM
2.2.1. Context and Motivation
2.2.2. Theorem and Proof
2.3. Tempered EM
2.3.1. Context and Motivation
2.3.2. Theorem
- (T1)
- (T2)
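Although the exact statements of (T1) and (T2) bear on the temperature profile, the mechanism they govern can be illustrated with a small sketch of a tempered E step for a Gaussian mixture: the responsibilities are raised to the power $1/T_n$ and renormalised, with $T_n \to 1$, in the spirit of the deterministic annealing EM [31]. The profile below is only an illustrative choice, not one of the profiles studied in the paper.

```python
import numpy as np
from scipy.stats import norm

def tempered_responsibilities(x, pi, mu, sigma, T):
    """E step of a tempered EM for a Gaussian mixture.

    The usual responsibilities are raised to the power 1/T and renormalised;
    T > 1 flattens the posterior (more exploration), T -> 1 recovers plain EM.
    """
    w = np.stack([p * norm.pdf(x, m, s) for p, m, s in zip(pi, mu, sigma)])
    w = w ** (1.0 / T)                     # tempering
    return w / w.sum(axis=0)

# Illustrative temperature profile decreasing towards 1 (an assumption, not (T1)-(T2) themselves)
def temperature(n, T0=5.0, rate=0.2):
    return 1.0 + (T0 - 1.0) * np.exp(-rate * n)
```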
2.3.3. Sketch of Proof
2.3.4. Examples of Models That Verify the Conditions
Gaussian Mixture Model
Poisson Count with Random Effect
- is measurable on .
- is continuous on K (and on ).
- With , then
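As a concrete illustration (with our own parameterisation, not necessarily the one used in the paper), a Poisson count with a Gaussian random effect can be written as follows; the point is that the conditional distribution of the random effect has no closed form, so the E step must be approximated.

```latex
% Illustrative Poisson count with a Gaussian random effect (our parameterisation).
z_i \sim \mathcal{N}(0, \sigma^2), \qquad
y_i \mid z_i \sim \mathrm{Poisson}\!\big(e^{\beta + z_i}\big), \qquad
p_\theta(z_i \mid y_i) \,\propto\, \exp\!\Big(y_i\,(\beta + z_i) - e^{\beta + z_i} - \tfrac{z_i^2}{2\sigma^2}\Big),
```

whose normalising constant and conditional moments have no closed form, which is the kind of situation where the deterministic approximations of the E step studied in this paper apply.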
2.4. Tempered Riemann Approximation EM
Context, Theorem and Proof
3. Results
3.1. Riemann Approximation EM: Two Applications
3.1.1. Application to a Gaussian Model with the Beta Prior
3.1.2. Application in Two Dimensions
3.2. Tempered EM: Application to Mixtures of Gaussian
3.2.1. Context and Experimental Protocol
3.2.2. Experimental Results Analysis
3.3. Tempered Riemann Approximation EM: Application to a Gaussian Model with Beta Prior
4. Discussion and Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Proofs of the Two Main Theorems
Appendix A.1. Proof of the General Theorem
Appendix A.1.1. Verifying the Conditions of Proposition 2
Equivalent Formulation of the Convergence
Use the Uniform Continuity
Sufficient Condition for Convergence
Further Simplifications of the Desired Result with Successive Sufficient Conditions
Conclusions
Appendix A.1.2. Applying Proposition 2
Appendix A.1.3. Verifying the Conditions of Proposition 1
- (set closure) plays the part of the compact K
- plays the part of the set
- is also compact thanks to hypothesis
- The likelihood g is the Lyapunov function with regard to
- is the K-valued sequence (since K is ).
Appendix A.1.4. Applying Proposition 1
- $(g(\theta_n))_n$ converges towards a connected component of $g(\mathcal{L})$.
- If $g(\mathcal{L})$ has an empty interior, then $(g(\theta_n))_n$ converges towards a $g^*$ and $(\theta_n)_n$ converges towards $\mathcal{L}_{g^*}$, where $\mathcal{L}_{g^*} := \{\theta \in \mathcal{L} : g(\theta) = g^*\}$.
Appendix A.2. Proof of the Tempering Theorem
Appendix A.2.1. Taylor Development
Appendix A.2.2. Two Intermediary Lemmas
- (i)
- (ii)
- (iii)
- (iv)
- (v)
Appendix A.2.3. Verifying the Conditions of Lemma A2
- (i) holds by hypothesis.
- (ii) and (iii) are implied by the continuity of the functions involved.
- (iv) and (v) are also hypotheses of the theorem. Hence, we can apply Lemma A2. This means that:
References
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
- Wu, C.J. On the convergence properties of the EM algorithm. Ann. Stat. 1983, 11, 95–103.
- Boyles, R.A. On the convergence of the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1983, 45, 47–50.
- Lange, K. A gradient algorithm locally equivalent to the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1995, 57, 425–437.
- Delyon, B.; Lavielle, M.; Moulines, E. Convergence of a stochastic approximation version of the EM algorithm. Ann. Stat. 1999, 27, 94–128.
- Wei, G.C.; Tanner, M.A. A Monte Carlo implementation of the EM algorithm and the poor man’s data augmentation algorithms. J. Am. Stat. Assoc. 1990, 85, 699–704.
- Fort, G.; Moulines, E. Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Stat. 2003, 31, 1220–1259.
- Kuhn, E.; Lavielle, M. Maximum likelihood estimation in nonlinear mixed effects models. Comput. Stat. Data Anal. 2005, 49, 1020–1038.
- Allassonnière, S.; Kuhn, E.; Trouvé, A. Construction of Bayesian deformable models via a stochastic approximation algorithm: A convergence study. Bernoulli 2010, 16, 641–678.
- Allassonnière, S.; Chevallier, J. A New Class of EM Algorithms. Escaping Local Maxima and Handling Intractable Sampling. Comput. Stat. Data Anal. 2021, 159, 107159.
- Neal, R.M.; Hinton, G.E. A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models; Springer: Dordrecht, The Netherlands, 1998; pp. 355–368.
- Ng, S.K.; McLachlan, G.J. On the choice of the number of blocks with the incremental EM algorithm for the fitting of normal mixtures. Stat. Comput. 2003, 13, 45–55.
- Cappé, O.; Moulines, E. On-line expectation–maximization algorithm for latent data models. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 2009, 71, 593–613.
- Chen, J.; Zhu, J.; Teh, Y.W.; Zhang, T. Stochastic expectation maximization with variance reduction. Adv. Neural Inf. Process. Syst. 2018, 31, 7967–7977.
- Karimi, B.; Wai, H.T.; Moulines, E.; Lavielle, M. On the global convergence of (fast) incremental expectation maximization methods. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 2837–2847.
- Fort, G.; Moulines, E.; Wai, H.T. A Stochastic Path Integral Differential EstimatoR Expectation Maximization Algorithm. Adv. Neural Inf. Process. Syst. 2020, 34, 16972–16982.
- Kuhn, E.; Matias, C.; Rebafka, T. Properties of the stochastic approximation EM algorithm with mini-batch sampling. Stat. Comput. 2020, 30, 1725–1739.
- Balakrishnan, S.; Wainwright, M.J.; Yu, B. Statistical guarantees for the EM algorithm: From population to sample-based analysis. Ann. Stat. 2017, 45, 77–120.
- Dwivedi, R.; Ho, N.; Khamaru, K.; Wainwright, M.J.; Jordan, M.I.; Yu, B. Singularity, misspecification and the convergence rate of EM. Ann. Stat. 2020, 48, 3161–3182.
- Booth, J.G.; Hobert, J.P. Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 1999, 61, 265–285.
- Levine, R.A.; Casella, G. Implementations of the Monte Carlo EM algorithm. J. Comput. Graph. Stat. 2001, 10, 422–439.
- Levine, R.A.; Fan, J. An automated (Markov chain) Monte Carlo EM algorithm. J. Stat. Comput. Simul. 2004, 74, 349–360.
- Pan, J.X.; Thompson, R. Quasi-Monte Carlo EM algorithm for MLEs in generalized linear mixed models. In COMPSTAT; Physica-Verlag: Heidelberg, Germany, 1998; pp. 419–424.
- Jank, W. Quasi-Monte Carlo sampling to improve the efficiency of Monte Carlo EM. Comput. Stat. Data Anal. 2005, 48, 685–701.
- Attias, H. Inferring Parameters and Structure of Latent Variable Models by Variational Bayes. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1999; pp. 21–30.
- Bishop, C. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006.
- Tzikas, D.G.; Likas, A.C.; Galatsanos, N.P. The variational approximation for Bayesian inference. IEEE Signal Process. Mag. 2008, 25, 131–146.
- Kirkpatrick, S.; Gelatt, C.D.; Vecchi, M.P. Optimization by simulated annealing. Science 1983, 220, 671–680.
- Swendsen, R.H.; Wang, J.S. Replica Monte Carlo simulation of spin-glasses. Phys. Rev. Lett. 1986, 57, 2607.
- Geyer, C.J.; Thompson, E.A. Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Am. Stat. Assoc. 1995, 90, 909–920.
- Ueda, N.; Nakano, R. Deterministic annealing EM algorithm. Neural Netw. 1998, 11, 271–282.
- Naim, I.; Gildea, D. Convergence of the EM algorithm for Gaussian mixtures with unbalanced mixing coefficients. In Proceedings of the 29th International Conference on Machine Learning; Omnipress: Madison, WI, USA, 2012; pp. 1655–1662.
- Chen, H.F.; Guo, L.; Gao, A.J. Convergence and robustness of the Robbins-Monro algorithm truncated at randomly varying bounds. Stoch. Process. Their Appl. 1987, 27, 217–231.
- Van Laarhoven, P.J.; Aarts, E.H. Simulated annealing. In Simulated Annealing: Theory and Applications; Springer: Dordrecht, The Netherlands, 1987; pp. 7–15.
- Aarts, E.; Korst, J. Simulated Annealing and Boltzmann Machines; John Wiley and Sons Inc.: New York, NY, USA, 1988.
- Hukushima, K.; Nemoto, K. Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Jpn. 1996, 65, 1604–1608.
- Titterington, D.; Smith, A.; Makov, U. Statistical Analysis of Finite Mixture Distributions; Wiley: New York, NY, USA, 1985.
- Ho, N.; Nguyen, X. Convergence rates of parameter estimation for some weakly identifiable finite mixtures. Ann. Stat. 2016, 44, 2726–2755.
- Dwivedi, R.; Ho, N.; Khamaru, K.; Wainwright, M.; Jordan, M.; Yu, B. Sharp Analysis of Expectation-Maximization for Weakly Identifiable Models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Online, 26–28 August 2020; pp. 1866–1876.
- Winkelbauer, A. Moments and absolute moments of the normal distribution. arXiv 2012, arXiv:1209.4340.
| Family | cl. | EM (barycenter) | EM (2v1) | tmp-EM, Decreasing T (barycenter) | tmp-EM, Decreasing T (2v1) | tmp-EM, Decreasing Oscillating T (barycenter) | tmp-EM, Decreasing Oscillating T (2v1) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | 0.52 (1.01) | 1.52 (1.24) | 0.60 (1.08) | 0.87 (1.20) | 0.04 (0.26) | 0.29 (0.64) |
| 1 | 2 | 0.55 (1.05) | 1.53 (1.25) | 0.64 (1.10) | 0.96 (1.25) | 0.05 (0.31) | 0.30 (0.64) |
| 1 | 3 | 0.01 (0.06) | 0.01 (0.03) | 0.01 (0.10) | 0.01 (0.02) | 0.03 (0.17) | 0.03 (0.19) |
| 2 | 1 | 1.00 (1.42) | 1.69 (1.51) | 0.96 (1.41) | 1.10 (1.46) | 0.09 (0.47) | 0.37 (0.86) |
| 2 | 2 | 1.03 (1.44) | 1.71 (1.52) | 1.08 (1.46) | 1.11 (1.46) | 0.12 (0.57) | 0.32 (0.79) |
| 2 | 3 | 0.01 (0.05) | 0.02 (0.03) | 0.01 (0.03) | 0.01 (0.04) | 0.01 (0.05) | 0.04 (0.22) |
| 3 | 1 | 1.56 (1.75) | 1.79 (1.77) | 1.63 (1.76) | 1.38 (1.71) | 0.31 (0.97) | 0.39 (0.98) |
| 3 | 2 | 1.51 (1.74) | 1.88 (1.76) | 1.52 (1.74) | 1.30 (1.68) | 0.30 (0.93) | 0.39 (0.97) |
| 3 | 3 | 0.02 (0.04) | 0.02 (0.04) | 0.01 (0.03) | 0.02 (0.06) | 0.01 (0.04) | 0.07 (0.30) |