MDPI - Publisher of Open Access Journals

12 pages, 5618 KB

Open Access | Feature Paper | Article

An Algorithm for the Conditional Distribution of Independent Binomial Random Variables Given the Sum

by Kelly Ayres and Steven E. Rigdon

Mathematics 2025, 13(13), 2155; https://doi.org/10.3390/math13132155 - 30 Jun 2025

Viewed by 1013

We investigate Metropolis–Hastings (MH) algorithms to approximate the distribution of independent binomial random variables conditioned on their sum. Let $X_{i} \sim \mathrm{BIN}(n_{i}, p_{i})$, $i = 1, \dots, k$. We want the distribution of $[X_{1}, \dots, X_{k}]$ conditioned on $X_{1} + \dots + X_{k} = n$. We propose both a random walk MH algorithm and an independence sampling MH algorithm for simulating from this conditional distribution. The acceptance probability in the MH algorithm always involves the probability mass function of the proposal distribution. For the random walk MH algorithm, we take this distribution to be uniform across all possible proposals. There is an inherent asymmetry: the number of moves out of one state is not, in general, equal to the number of moves out of another, so the proposal probability of going from one state to another differs from that of the reverse move. This requires a careful count of the number of possible moves out of each state. The independence sampler proposes a move based on the Poisson approximation to the binomial. While random walk MH algorithms generally tend to outperform independence samplers, we find that in this case the independence sampler is more efficient.
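A minimal Python sketch of the independence sampler follows. It is one plausible reading of the abstract, not the authors' code: it assumes the Poisson-based proposal is the multinomial distribution with cell probabilities $\lambda_i / \sum_j \lambda_j$, where $\lambda_i = n_i p_i$ (the law of independent Poisson variables conditioned on their sum), that proposals exceeding some $n_i$ are rejected because the target gives them zero mass, and that $0 < p_i < 1$. The function names are hypothetical, and the random walk variant with its move counting is not shown.

```python
import math
import random


def log_target(x, ns, ps):
    """Unnormalized log pmf of independent BIN(n_i, p_i) evaluated at x.

    The sum constraint is enforced by the proposal, so only the box
    constraint 0 <= x_i <= n_i is checked here.
    """
    total = 0.0
    for xi, ni, pi in zip(x, ns, ps):
        if xi < 0 or xi > ni:
            return -math.inf
        total += (math.lgamma(ni + 1) - math.lgamma(xi + 1) - math.lgamma(ni - xi + 1)
                  + xi * math.log(pi) + (ni - xi) * math.log1p(-pi))
    return total


def log_proposal(x, thetas):
    """Log multinomial pmf, dropping the constant n! that cancels in the MH ratio."""
    return sum(xi * math.log(t) - math.lgamma(xi + 1) for xi, t in zip(x, thetas))


def independence_mh(ns, ps, n, iters, seed=0):
    """Sample [X_1, ..., X_k] given X_1 + ... + X_k = n (hypothetical sketch)."""
    if not 0 <= n <= sum(ns):
        raise ValueError("n must lie between 0 and sum(ns)")
    rng = random.Random(seed)
    m = len(ns)
    lams = [ni * pi for ni, pi in zip(ns, ps)]   # Poisson means n_i * p_i
    total = sum(lams)
    thetas = [lam / total for lam in lams]        # multinomial cell probabilities
    # Greedy feasible starting state: fill cells in order up to their capacity.
    x, remaining = [], n
    for ni in ns:
        x.append(min(ni, remaining))
        remaining -= x[-1]
    log_w = log_target(x, ns, ps) - log_proposal(x, thetas)
    samples = []
    for _ in range(iters):
        # n categorical draws give one Multinomial(n, thetas) proposal.
        draws = rng.choices(range(m), weights=thetas, k=n)
        y = [draws.count(i) for i in range(m)]
        log_w_y = log_target(y, ns, ps) - log_proposal(y, thetas)
        # Independence-sampler acceptance: min(1, w(y) / w(x)) with w = target / proposal.
        if rng.random() < math.exp(min(0.0, log_w_y - log_w)):
            x, log_w = y, log_w_y
        samples.append(list(x))
    return samples
```

For example, `independence_mh([3, 4], [0.3, 0.6], n=4, iters=20000)` returns a chain of states that all sum to 4; averaging the first coordinate estimates $E[X_1 \mid X_1 + X_2 = 4]$.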

25 pages, 3590 KB

Open Access | Article

Predictive Modeling of Urban Travel Demand Using Neural Networks and Regression Analysis

by Muhammed Ali Çolak and Osman Ünsal Bayrak

Urban Sci. 2025, 9(6), 195; https://doi.org/10.3390/urbansci9060195 - 28 May 2025

Cited by 2 | Viewed by 8932

Abstract

Urban transportation systems are increasingly strained by population growth, changing mobility patterns, and the need for sustainable infrastructure planning. The accurate modeling of urban trip generation is critical for effective and sustainable transportation planning, especially in the context of rapidly growing urban populations and evolving travel behaviors. This study investigated the application of advanced statistical methods and artificial intelligence-based techniques for forecasting urban travel demand. Erzincan, with a population of approximately 200,000, serves as a representative mid-sized city, offering valuable insights for transportation planning and traffic management. Data collected from various user groups, including households and university students, provide a comprehensive understanding of local travel behavior. Four predictive modeling techniques, namely linear regression, Poisson regression, negative binomial regression, and artificial neural networks (ANNs), were applied to the dataset, followed by a comparative performance evaluation. Additionally, a macro-level simulation was conducted using VISUM (Release 18.2.22) software to evaluate the current transportation network and assess the potential impacts of proposed improvement scenarios. The results show that the ANN model provided the highest predictive accuracy for household-based data (R² = 0.62), while the linear regression model yielded the best results for dormitory-based data (R² = 0.95). Furthermore, Poisson regression proved most effective in estimating the minimum trip generation time, which was 22.77 min under simulated conditions. The study offers practical insights for transport planners and policymakers by demonstrating how predictive analytics and simulation tools can be integrated to address urban mobility challenges.
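To make the Poisson regression option concrete, here is a minimal Python sketch that fits $\log E[y] = a + b x$ by Newton-Raphson on the Poisson log-likelihood. It assumes a single covariate and no offset; it is not the study's model specification or software, and the function name is hypothetical.

```python
import math


def fit_poisson_regression(xs, ys, iters=50, tol=1e-12):
    """Maximum likelihood fit of log E[y] = a + b*x by Newton-Raphson."""
    a, b = math.log(sum(ys) / len(ys)), 0.0   # start at the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(a + b * x) for x in xs]
        # Score (gradient of the Poisson log-likelihood).
        g_a = sum(y - m for y, m in zip(ys, mu))
        g_b = sum((y - m) * x for x, y, m in zip(xs, ys, mu))
        # Information matrix [[h_aa, h_ab], [h_ab, h_bb]] (negative Hessian).
        h_aa = sum(mu)
        h_ab = sum(m * x for x, m in zip(xs, mu))
        h_bb = sum(m * x * x for x, m in zip(xs, mu))
        det = h_aa * h_bb - h_ab * h_ab       # zero only if all xs are equal
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b
```

Unlike linear regression, the fitted mean $e^{a + b x}$ is always positive, which suits trip counts; the exponential link is the main design difference from the linear model compared in the study.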

16 pages, 11181 KB

Open Access | Article

Lung Cancer Prevalence in Virginia: A Spatial Zipcode-Level Analysis via INLA

by Indranil Sahoo, Jinlei Zhao, Xiaoyan Deng, Myles Gordon Cockburn, Kathy Tossas, Robert Winn and Dipankar Bandyopadhyay

Curr. Oncol. 2024, 31(3), 1129-1144; https://doi.org/10.3390/curroncol31030084 - 20 Feb 2024

Cited by 3 | Viewed by 2905

Abstract

Background: Examining lung cancer (LC) cases in Virginia (VA) is essential due to its significant public health implications. By studying demographic, environmental, and socioeconomic variables, this paper aims to provide insights into the underlying drivers of LC prevalence in the state adjusted for spatial associations at the zipcode level. Methods: We model the available VA zipcode-level LC counts via (spatial) Poisson and negative binomial regression models, taking into account missing covariate data and zipcode-level spatial association, and allowing for overdispersion. Under latent Gaussian Markov Random Field (GMRF) assumptions, our Bayesian hierarchical model powered by Integrated Nested Laplace Approximation (INLA) performs simultaneous (spatial) imputation of all missing covariates through prediction. The spatial random effect across zip codes follows a Conditional Autoregressive (CAR) prior. Results: Zip codes with elevated smoking indices showed correspondingly higher LC counts, underscoring the well-established connection between smoking and LC. Additionally, we observed a notable correlation between higher Social Deprivation Index (SDI) scores and increased LC counts, aligning with the prevalent pattern of heightened LC prevalence in regions characterized by lower income and education levels. On the demographic level, our findings indicated higher LC counts in zip codes with larger White and Black populations (with Whites having higher prevalence than Blacks), lower counts in zip codes with higher Hispanic populations (compared to non-Hispanics), and higher prevalence among women compared to men. Furthermore, zip codes with a larger population of elderly people (age ≥ 65 years) exhibited higher LC prevalence, consistent with established national patterns.
Conclusions: This comprehensive analysis contributes to our understanding of the complex interplay of demographic and socioeconomic factors influencing LC disparities in VA at the zip code level, providing valuable information for targeted public health interventions and resource allocation. Implementation code is available at GitHub.
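A CAR prior is easiest to understand through its full conditionals. The Python sketch below assumes the intrinsic (Besag) CAR variant with precision parameter $\tau$, under which each zip code's spatial effect is centered on the average of its neighbors' effects; the abstract does not say which CAR form the authors used, and the function name and adjacency structure are hypothetical.

```python
def icar_conditional(phi, neighbors, i, tau):
    """Full conditional of phi_i under an intrinsic CAR prior (assumed variant).

    phi       : dict mapping area id -> current spatial effect
    neighbors : dict mapping area id -> list of adjacent area ids
    tau       : precision parameter of the CAR prior
    Returns (mean, variance) of phi_i given all other phi_j.
    """
    nbrs = neighbors[i]
    if not nbrs:
        raise ValueError(f"area {i} has no neighbors")
    mean = sum(phi[j] for j in nbrs) / len(nbrs)   # average of neighboring effects
    variance = 1.0 / (tau * len(nbrs))              # more neighbors, tighter conditional
    return mean, variance
```

In the count model, such an effect would typically enter the log mean of each zip code's LC count alongside the covariates, so neighboring zip codes borrow strength from one another.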

24 pages, 684 KB

Open Access | Article

An Exact and an Approximation Method to Compute the Degree Distribution of Inhomogeneous Random Graph Using Poisson Binomial Distribution

by Róbert Pethes and Levente Kovács

Mathematics 2023, 11(6), 1441; https://doi.org/10.3390/math11061441 - 16 Mar 2023

Cited by 2 | Viewed by 2521

Abstract

Inhomogeneous random graphs are commonly used models for complex networks where nodes have varying degrees of connectivity. Computing the degree distribution of such networks is a fundamental problem and has important applications in various fields. We define the inhomogeneous random graph as a random graph model where the edges are drawn independently and the probability of a link between any two vertices can be different for each node pair. In this paper, we present an exact and an approximation method to compute the degree distribution of inhomogeneous random graphs using the Poisson binomial distribution. The exact algorithm uses the DFT-CF method (discrete Fourier transform of the characteristic function) to compute the distribution of a Poisson binomial random variable. The approximation method uses the Poisson, binomial, and Gaussian distributions to approximate the Poisson binomial distribution.
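The DFT-CF idea can be sketched in a few lines of Python: evaluate the characteristic function $\prod_m (1 - p_m + p_m e^{i\omega l})$ at the $n + 1$ frequencies $\omega l$, with $\omega = 2\pi/(n+1)$, and invert with a discrete Fourier transform. A node's degree is Poisson binomial over its edge probabilities. This is a plain $O(n^2)$ sketch under those assumptions (a production version would use an FFT), the function names are hypothetical, and the Poisson approximation shown alongside simply uses $\lambda = \sum_m p_m$.

```python
import cmath
import math


def poisson_binomial_pmf(ps):
    """PMF of a sum of independent Bernoulli(p_m) variables via the DFT of its CF."""
    n = len(ps)
    omega = 2 * math.pi / (n + 1)
    # Characteristic function at the n + 1 Fourier frequencies omega * l.
    cf = []
    for l in range(n + 1):
        z = cmath.exp(1j * omega * l)
        prod = 1 + 0j
        for p in ps:
            prod *= 1 - p + p * z
        cf.append(prod)
    # Inverse DFT; clip tiny negative round-off values to zero.
    pmf = []
    for k in range(n + 1):
        s = sum(cf[l] * cmath.exp(-1j * omega * l * k) for l in range(n + 1))
        pmf.append(max(s.real / (n + 1), 0.0))
    return pmf


def degree_distribution(prob_matrix, node):
    """Degree PMF of `node` when edge (node, j) appears independently with prob_matrix[node][j]."""
    ps = [p for j, p in enumerate(prob_matrix[node]) if j != node]
    return poisson_binomial_pmf(ps)


def poisson_approx(ps, kmax):
    """Poisson approximation with lambda = sum(ps), evaluated for k = 0..kmax."""
    lam = sum(ps)
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax + 1)]
```

Because the degree is supported on $\{0, \dots, n\}$, exactly $n + 1$ frequencies suffice for the inversion to be exact up to floating-point error.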

14 pages, 501 KB

Open Access | Article

Accident Frequency Prediction Model for Flat Rural Roads in Serbia

by Spasoje Mićić, Radoje Vujadinović, Goran Amidžić, Milanko Damjanović and Boško Matović

Sustainability 2022, 14(13), 7704; https://doi.org/10.3390/su14137704 - 24 Jun 2022

Cited by 7 | Viewed by 2978

Abstract

Traffic accidents are, by their nature, random events; therefore, it is difficult to estimate the exact places and times of their occurrence and the true nature of their impacts. Although accidents are hard to predict precisely, preventive actions can be taken, and the number of accidents in a given period can be approximately predicted. In this study, we investigated the relationship between accident frequency and the factors that affect it, using data on accidents that occurred on a flat rural state road in Serbia. The analysis was conducted using five statistical models, i.e., Poisson, negative binomial, random effect negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. The results indicated that the random effect negative binomial model outperformed the other models in terms of goodness-of-fit measures, so it was chosen as the accident prediction model for flat rural roads. Four explanatory variables, namely annual average daily traffic, segment length, number of horizontal curves, and access road density, were found to significantly affect accident frequency. The results of this research can help road authorities make decisions about interventions and investments in road networks, the design of new roads, and the reconstruction of existing roads.

(This article belongs to the Special Issue Road Traffic Engineering and Sustainable Transportation - The Second Edition)

24 pages, 2248 KB

Open Access | Article

Measuring the Effectiveness of Adaptive Random Forest for Handling Concept Drift in Big Data Streams

by Abdulaziz O. AlQabbany and Aqil M. Azmi

Entropy 2021, 23(7), 859; https://doi.org/10.3390/e23070859 - 4 Jul 2021

Cited by 13 | Viewed by 4858

Abstract

We are living in the age of big data, a majority of which is stream data. The real-time processing of this data requires careful consideration from different perspectives. Concept drift, a change in the data's underlying distribution, is a significant issue, especially when learning from data streams. It requires learners to adapt to dynamic changes. Random forest is an ensemble approach that is widely used in classical non-streaming settings of machine learning applications. Meanwhile, the Adaptive Random Forest (ARF) is a stream learning algorithm that has shown promising results in terms of its accuracy and its ability to deal with various types of drift. The continuity of incoming instances allows their binomial distribution to be approximated by a Poisson(1) distribution. In this study, we propose a mechanism to increase such streaming algorithms' efficiency by focusing on resampling. Our measure, resampling effectiveness ($\rho$), fuses the two most essential aspects of online learning: accuracy and execution time. We use six different synthetic data sets, each having a different type of drift, to empirically select the parameter $\lambda$ of the Poisson distribution that yields the best value for $\rho$. By comparing the standard ARF with its tuned variations, we show that ARF performance can be enhanced by tackling this important aspect. Finally, we present three case studies from different contexts to test our proposed enhancement method and demonstrate its effectiveness in processing large data sets: (a) Amazon customer reviews (written in English), (b) hotel reviews (in Arabic), and (c) real-time aspect-based sentiment analysis of COVID-19-related tweets in the United States during April 2020. Results indicate that our proposed enhancement method yielded considerable improvement in most situations.
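In online bagging, which ARF builds on, each base learner trains on each arriving instance $k$ times, with $k$ drawn from a Poisson($\lambda$) distribution; $\lambda = 1$ is the Poisson(1) case described above, and $\lambda$ is the parameter the study tunes. The Python sketch below only generates those per-learner weights, using Knuth's multiplication method because the standard library has no Poisson sampler. The function names are hypothetical and no learner is actually trained.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def resampling_weights(n_instances, n_learners, lam, seed=0):
    """Per-instance, per-learner training weights for Poisson online bagging.

    weights[t][m] is how many times learner m trains on instance t
    (0 means the learner skips that instance).
    """
    rng = random.Random(seed)
    return [[poisson_knuth(lam, rng) for _ in range(n_learners)]
            for _ in range(n_instances)]
```

A larger $\lambda$ makes each learner reuse each instance more often, which tends to increase training work per instance; that accuracy versus execution-time trade-off is what $\rho$ is meant to summarize.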

(This article belongs to the Section Information Theory, Probability and Statistics)

21 pages, 384 KB

Open Access | Article

On Approximation of the Tails of the Binomial Distribution with These of the Poisson Law

by Sergei Nagaev and Vladimir Chebotarev

Mathematics 2021, 9(8), 845; https://doi.org/10.3390/math9080845 - 13 Apr 2021

Cited by 1 | Viewed by 3321

Abstract

The subject of this study is the behavior of the tail of the binomial distribution under the Poisson approximation. The deviation from unity of the ratio of the tail of the binomial distribution to that of the Poisson distribution, multiplied by a correction factor, is estimated. A new type of approximation is introduced in which the parameter of the approximating Poisson law depends on the point at which the approximation is performed. Then the transition is made to approximation by the Poisson law whose parameter equals the mathematical expectation of the approximated binomial law. In both cases, error estimates are obtained. A number of conjectures are made about refining the known estimates of the Kolmogorov distance between the binomial and Poisson distributions.
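The quantity under study can be computed directly for moderate $n$. The Python sketch below evaluates the deviation from unity of $P(B \ge k) / P(\Pi \ge k)$ for $B \sim \mathrm{BIN}(n, p)$ and $\Pi \sim \mathrm{Poisson}(np)$, i.e. the mean-matched approximation described above. It omits the paper's correction factor, which the abstract does not specify, and the function names are hypothetical.

```python
import math


def binom_tail(k, n, p):
    """P(B >= k) for B ~ BIN(n, p), summed term by term (moderate n only)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


def poisson_tail(k, lam):
    """P(Pi >= k) for Pi ~ Poisson(lam), computed as 1 - P(Pi <= k - 1)."""
    cdf = sum(math.exp(-lam) * lam**j / math.factorial(j) for j in range(k))
    return 1.0 - cdf


def tail_ratio_deviation(k, n, p):
    """Deviation from unity of the binomial/Poisson tail ratio, with lam = n * p."""
    return binom_tail(k, n, p) / poisson_tail(k, n * p) - 1.0
```

Holding $np = 2$ fixed and comparing $n = 20$ with $n = 1000$ at $k = 5$ shows the deviation shrinking as $p$ decreases. For tails far beyond the mean, `1 - cdf` loses precision and the Poisson tail should be summed directly instead.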

(This article belongs to the Special Issue Analytical Methods and Convergence in Probability with Applications)

23 pages, 673 KB

Open Access | Editor’s Choice | Article

EM Estimation for the Poisson-Inverse Gamma Regression Model with Varying Dispersion: An Application to Insurance Ratemaking

by George Tzougas

Risks 2020, 8(3), 97; https://doi.org/10.3390/risks8030097 - 11 Sep 2020

Cited by 16 | Viewed by 6099

Abstract

This article presents the Poisson-Inverse Gamma regression model with varying dispersion for approximating heavy-tailed and overdispersed claim counts. Our main contribution is that we develop an Expectation-Maximization (EM) type algorithm for maximum likelihood (ML) estimation of the Poisson-Inverse Gamma regression model with varying dispersion. The empirical analysis examines a portfolio of motor insurance data in order to investigate the efficiency of the proposed algorithm. Finally, both the a priori and a posteriori, or Bonus-Malus, premium rates that are determined by the Poisson-Inverse Gamma model are compared to those that result from the classic Negative Binomial Type I and the Poisson-Inverse Gaussian distributions with regression structures for their mean and dispersion parameters.

30 pages, 5453 KB

Open Access | Article

Probability Models and Statistical Tests for Extreme Precipitation Based on Generalized Negative Binomial Distributions

by Victor Korolev and Andrey Gorshenin

Mathematics 2020, 8(4), 604; https://doi.org/10.3390/math8040604 - 16 Apr 2020

Cited by 19 | Viewed by 4131

Abstract

Mathematical models are proposed for the statistical regularities of maximum daily precipitation within a wet period and total precipitation volume per wet period. The proposed models are based on the generalized negative binomial (GNB) distribution of the duration of a wet period. The GNB distribution is a mixed Poisson distribution whose mixing distribution is generalized gamma (GG). The GNB distribution demonstrates an excellent fit to real data on the durations of wet periods measured in days. By means of limit theorems for statistics constructed from samples whose random sizes follow the GNB distribution, asymptotic approximations are proposed for the distributions of the maximum daily precipitation volume within a wet period and the total precipitation volume for a wet period. It is shown that the exponent power parameter in the mixing GG distribution matches slow global climate trends. Bounds on the accuracy of the proposed approximations are presented. Several tests for whether daily precipitation, total precipitation volume, and precipitation intensities are abnormally extreme are proposed and compared with the traditional peaks-over-threshold (PoT) method. The results of applying these tests to real data are presented.
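The mixed Poisson construction of the GNB law translates directly into a simulator. The Python sketch below assumes the generalized gamma density proportional to $x^{\gamma r - 1} e^{-\mu x^{\gamma}}$ with $\gamma > 0$, under which $X^{\gamma}$ is gamma distributed with shape $r$ and rate $\mu$; this parameterization and the function names are assumptions, not taken from the paper. With $\gamma = 1$ the mixing law is an ordinary gamma distribution and the GNB reduces to the classical negative binomial.

```python
import math
import random


def poisson_knuth(lam, rng):
    """One Poisson(lam) draw by Knuth's method (adequate for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gnb(r, gamma, mu, size, seed=0):
    """Draw GNB(r, gamma, mu) counts as a mixed Poisson (hypothetical sketch).

    Step 1: X ~ GG(r, gamma, mu), obtained as G**(1/gamma) with G ~ Gamma(shape r, rate mu).
    Step 2: N | X ~ Poisson(X).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(size):
        g = rng.gammavariate(r, 1.0 / mu)   # gammavariate takes a scale, so scale = 1 / rate
        x = g ** (1.0 / gamma)
        out.append(poisson_knuth(x, rng))
    return out
```

With $\gamma = 1$, $r = 2$ and $\mu = 0.5$, the sample mean should be close to $r/\mu = 4$. Knuth's method becomes slow for large Poisson means, so small $\gamma$ (which produces large mixing values) would call for a different Poisson sampler.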

(This article belongs to the Special Issue Stability Problems for Stochastic Models: Theory and Applications)

22 pages, 6066 KB

Open Access | Article

Relationship Between Traffic Volume and Accident Frequency at Intersections

by Angus Eugene Retallack and Bertram Ostendorf

Int. J. Environ. Res. Public Health 2020, 17(4), 1393; https://doi.org/10.3390/ijerph17041393 - 21 Feb 2020

Cited by 73 | Viewed by 10723

Abstract

Driven by the high social costs and emotional trauma that result from traffic accidents around the world, research into understanding the factors that influence accident occurrence is critical. There is a lack of consensus about how the management of congestion may affect traffic accidents. This paper aims to improve our understanding of this relationship by analysing accidents at 120 intersections in Adelaide, Australia. The data comprised 1629 motor vehicle accidents, together with traffic volumes from a dataset of more than five million hourly measurements. The effect of rainfall was also examined. Results showed an approximately linear relationship between traffic volume and accident frequency at lower traffic volumes. At the highest traffic volumes, Poisson and negative binomial models showed a significant quadratic explanatory term, as accident frequency increases at a higher rate. This implies that focusing management efforts on avoiding these conditions would be most effective in reducing accident frequency. The relative risk of rainfall on accident frequency decreases with increasing congestion index: accident risk is five times greater during rain at low congestion levels, successively decreasing to no elevated risk at the highest congestion level. No significant effect of congestion index on accident severity was detected.

(This article belongs to the Special Issue Traffic Accident Control and Prevention)

18 pages, 721 KB

Open Access | Article

Study on the Appraisal of Tourism Demands and Recreation Benefits for Nanwan Beach, Kenting, Taiwan

by Chih-Ming Dong, Chien-Chi Lin and Shu-Ping Lin

Environments 2018, 5(9), 97; https://doi.org/10.3390/environments5090097 - 25 Aug 2018

Cited by 5 | Viewed by 5326

Abstract

This study conducted a questionnaire survey of tourists visiting Nanwan Beach, Kenting, Taiwan, and then applied the travel cost method to appraise the recreation benefits of Nanwan Beach. The truncated Poisson model (TPOIS), the truncated negative binomial distribution model, and the on-site Poisson model were applied to account for the errors caused by truncated samples and endogenous stratification, and the results indicated that: (1) the on-site Poisson model was more suitable than the other two models for estimating the recreation benefits of Nanwan; (2) the three recreational benefit indicators (consumer surplus, compensating variation, and equivalent variation) estimated using the TPOIS model were all significantly greater than those of the on-site Poisson model; (3) the on-site Poisson model estimated the price elasticity and income elasticity of the tourism demand for Nanwan as −0.329 and 0.187, respectively; and (4) on the basis of the on-site Poisson model, the consumer surplus for Nanwan was NT$9639 (approximately US$289) per person per visit, and the annual gross recreation benefits were approximately NT$8.022 billion. The results are expected to provide a valuable reference for the management and planning policies of Kenting National Park.
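Three building blocks of an on-site count-data travel cost analysis can be sketched compactly. Under the standard on-site Poisson correction for truncation and endogenous stratification, a visitor sampled on site whose demand is Poisson with mean $\lambda$ reports $y \ge 1$ trips with $y - 1 \sim \mathrm{Poisson}(\lambda)$. With the usual semi-log demand $\lambda = \exp(\beta_0 + \beta_{tc}\, tc + \dots)$, consumer surplus per trip is $-1/\beta_{tc}$ and the travel-cost elasticity is $\beta_{tc}\, tc$. The Python below assumes exactly those textbook forms, the function names are hypothetical, and it reproduces none of the study's estimates.

```python
import math


def onsite_poisson_pmf(y, lam):
    """P(Y = y) for an on-site sample: Y - 1 ~ Poisson(lam), support y >= 1."""
    if y < 1:
        return 0.0
    return math.exp(-lam) * lam ** (y - 1) / math.factorial(y - 1)


def consumer_surplus_per_trip(beta_tc):
    """Per-trip consumer surplus -1/beta_tc for a semi-log (log-linear) count demand."""
    if beta_tc >= 0:
        raise ValueError("the travel-cost coefficient should be negative")
    return -1.0 / beta_tc


def price_elasticity(beta_tc, mean_tc):
    """Travel-cost (price) elasticity of demand, beta_tc * tc, at the mean travel cost."""
    return beta_tc * mean_tc
```

The on-site pmf has mean $\lambda + 1$, one trip more than the population Poisson model; ignoring this shift is one source of the gap between truncated and on-site estimates that the abstract reports.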

13 pages, 1079 KB

Open Access | Article

Revealed Preference and Effectiveness of Public Investment in Ecological River Restoration Projects: An Application of the Count Data Model

by Yoon Lee, Hwansuk Kim and Yongsuk Hong

Sustainability 2016, 8(4), 353; https://doi.org/10.3390/su8040353 - 12 Apr 2016

Cited by 5 | Viewed by 5069

Abstract

Ecological river restoration projects aim to revitalize healthy and self-sustaining river systems that can provide irreplaceable benefits to human society. Cheonggyecheon and Anyangcheon are two sites of recent river restoration projects in Korea. To assess the economic value of the two rivers, this study collected count data and applied the individual travel cost method (ITCM). Because of the nature of count data, five statistical models were used in the analysis: the Poisson, the negative binomial, the zero-truncated Poisson, the zero-truncated negative binomial, and the negative binomial model adjusted for both truncation and endogenous stratification. Empirical results showed that the regressors were statistically significant and consistent with conventional consumer theory. Since the collected count data exhibited over-dispersion and endogenous stratification, the adjusted negative binomial model was selected as the optimal model for analyzing the recreational value of Cheonggyecheon and Anyangcheon. The estimated annual economic values of the two river restoration projects were approximately US $170.1 million and US $50.5 million, respectively.
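The selected model, the negative binomial adjusted for truncation and endogenous stratification, has a closed-form pmf in its commonly used form: the size-biased negative binomial $P(Y = y) = y f(y)/\lambda$ for $y \ge 1$, where $f$ is the negative binomial pmf with mean $\lambda$ and variance $\lambda + \alpha\lambda^2$. The Python sketch below assumes that form; the function names are hypothetical and no estimation is performed.

```python
import math


def nb_logpmf(y, lam, alpha):
    """Log pmf of the negative binomial with mean lam and variance lam + alpha * lam**2."""
    r = 1.0 / alpha
    return (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
            + y * math.log(alpha * lam) - (y + r) * math.log1p(alpha * lam))


def adjusted_nb_pmf(y, lam, alpha):
    """On-site (truncated and endogenously stratified) NB: P(Y = y) = y * f(y) / lam, y >= 1."""
    if y < 1:
        return 0.0
    return y * math.exp(nb_logpmf(y, lam, alpha)) / lam
```

Its mean is $1 + \lambda + \alpha\lambda$, and as $\alpha \to 0$ it collapses to the on-site Poisson model, so the overdispersion parameter $\alpha$ is what separates the two.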

Search Results (12)
