Search Results (150)

Search Parameters:
Keywords = Bayes inference

18 pages, 907 KB  
Article
Bayesian Estimation of Multicomponent Stress–Strength Model Using Progressively Censored Data from the Inverse Rayleigh Distribution
by Asuman Yılmaz
Entropy 2025, 27(11), 1095; https://doi.org/10.3390/e27111095 - 23 Oct 2025
Viewed by 166
Abstract
This paper presents a comprehensive study on the estimation of multicomponent stress–strength reliability under progressively censored data, assuming the inverse Rayleigh distribution. Both maximum likelihood estimation and Bayesian estimation methods are considered. The loss function and prior distribution play crucial roles in Bayesian inference. Therefore, Bayes estimators of the unknown model parameters are obtained under symmetric (squared error loss function) and asymmetric (linear exponential and general entropy) loss functions using gamma priors. Lindley and MCMC approximation methods are used for Bayesian calculations. Additionally, asymptotic confidence intervals based on maximum likelihood estimators and Bayesian credible intervals constructed via Markov Chain Monte Carlo methods are presented. An extensive Monte Carlo simulation study compares the efficiencies of classical and Bayesian estimators, revealing that Bayesian estimators outperform classical ones. Finally, a real-life data example is provided to illustrate the practical applicability of the proposed methods.
(This article belongs to the Section Information Theory, Probability and Statistics)
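The Bayes estimators under symmetric and asymmetric losses described in this abstract can be sketched for a simplified case. The following is a hedged illustration, not the paper's code: it treats complete (uncensored) data and only the scale parameter theta of an inverse Rayleigh distribution, assuming the common parametrization f(x; theta) = (2*theta/x^3)*exp(-theta/x^2), under which a Gamma(a, b) prior is conjugate; the hyperparameters a, b and the loss constants c are arbitrary illustrative choices.

```python
import math, random

# Hedged sketch (not the paper's code): Bayes estimators for the inverse
# Rayleigh scale parameter theta under a Gamma(a, b) prior. With
# f(x; theta) = (2*theta/x**3) * exp(-theta/x**2), the likelihood is
# proportional to theta**n * exp(-theta * S), S = sum(1/x_i**2), so the
# Gamma prior is conjugate: theta | data ~ Gamma(a + n, b + S) (shape, rate).

def posterior_params(data, a=2.0, b=1.0):
    n = len(data)
    S = sum(1.0 / x**2 for x in data)
    return a + n, b + S  # shape, rate

def bayes_sel(shape, rate):
    # squared-error loss -> posterior mean
    return shape / rate

def bayes_linex(shape, rate, c=0.5):
    # LINEX loss -> -(1/c) * log E[exp(-c*theta)] = (shape/c)*log(1 + c/rate)
    return (shape / c) * math.log(1.0 + c / rate)

def bayes_gel(shape, rate, c=0.5):
    # general entropy loss -> (E[theta**-c])**(-1/c); for a Gamma(shape, rate)
    # posterior, E[theta**-c] = rate**c * Gamma(shape - c) / Gamma(shape)
    log_e = c * math.log(rate) + math.lgamma(shape - c) - math.lgamma(shape)
    return math.exp(-log_e / c)

random.seed(1)
theta_true = 2.0
# inverse transform: CDF F(x) = exp(-theta/x**2), so X = sqrt(-theta/log(U))
data = [math.sqrt(-theta_true / math.log(random.random())) for _ in range(200)]
shape, rate = posterior_params(data)
print(bayes_sel(shape, rate), bayes_linex(shape, rate), bayes_gel(shape, rate))
```

Under this conjugacy the squared-error, LINEX, and general entropy estimators are all available in closed form; the paper's progressively censored, multicomponent setting is what makes Lindley or MCMC approximations necessary.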

28 pages, 7150 KB  
Article
Distress-Level Prediction of Pavement Deterioration with Causal Analysis and Uncertainty Quantification
by Yifan Sun, Qian Gao, Feng Li and Yuchuan Du
Appl. Sci. 2025, 15(20), 11250; https://doi.org/10.3390/app152011250 - 21 Oct 2025
Viewed by 369
Abstract
Pavement performance prediction serves as a core basis for maintenance decision-making. Although numerous studies have been conducted, most focus on road segments and aggregate indicators such as IRI and PCI, with limited attention to the daily deterioration of individual distresses. Subject to the combined influence of multiple factors, pavement distress deterioration exhibits pronounced nonlinear and time-lag characteristics, making distress-level predictions prone to disturbances and highly uncertain. To address this challenge, this study investigates the distress-level deterioration of three representative distresses—transverse cracks, alligator cracks, and potholes—with causal analysis and uncertainty quantification. Based on two years of high-frequency road inspection data, a continuous tracking dataset comprising 164 distress sites and 9038 records was established using a three-step matching algorithm. Convergent cross mapping was applied to quantify the causal strength and lag days of environmental factors, which were subsequently embedded into an encoder–decoder framework to construct a BayesLSTM model. Monte Carlo Dropout was employed to approximate Bayesian inference, enabling probabilistic characterization of predictive uncertainty and the construction of prediction intervals. Results indicate that integrating causal and time-lag characteristics improves the model’s capacity to identify key drivers and anticipate deterioration inflection points. The proposed BayesLSTM achieved high predictive accuracy across all three distress types, with a prediction interval coverage of 100%, thereby enhancing the reliability of prediction by providing both deterministic results and interval estimates. These findings facilitate the identification of high-risk distresses and their underlying mechanisms, offering support for rational allocation of maintenance resources.
(This article belongs to the Special Issue New Technology for Road Surface Detection, 2nd Edition)
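Monte Carlo Dropout, as used by the BayesLSTM above, approximates Bayesian inference by keeping dropout active at prediction time and treating repeated stochastic forward passes as posterior predictive draws. A minimal sketch follows, using a toy untrained feed-forward network with random weights rather than the paper's LSTM; the layer shapes, dropout rate, and interval level are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of Monte Carlo Dropout for predictive uncertainty
# (toy feed-forward net, not the paper's model; weights are random).
rng = np.random.default_rng(0)

W1 = rng.normal(size=(1, 32)); b1 = np.zeros(32)
W2 = rng.normal(size=(32, 1)); b2 = np.zeros(1)

def predict_stochastic(x, p_drop=0.2):
    # keep dropout ACTIVE at prediction time: each call samples a fresh
    # mask, so repeated calls approximate posterior predictive draws
    h = np.maximum(x @ W1 + b1, 0.0)
    mask = rng.random(h.shape) > p_drop
    h = h * mask / (1.0 - p_drop)  # inverted dropout scaling
    return (h @ W2 + b2).ravel()

x = np.array([[0.5]])
samples = np.array([predict_stochastic(x)[0] for _ in range(500)])
mean = samples.mean()
lo, hi = np.quantile(samples, [0.05, 0.95])  # 90% prediction interval
print(mean, lo, hi)
```

The spread of the sampled predictions supplies both the point estimate and the prediction interval whose empirical coverage the paper evaluates.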

24 pages, 518 KB  
Article
Bayesian Inference on Stress–Strength Reliability with Geometric Distributions
by Mohammed K. Shakhatreh
Symmetry 2025, 17(10), 1723; https://doi.org/10.3390/sym17101723 - 13 Oct 2025
Viewed by 249
Abstract
This paper investigates the estimation of the stress–strength reliability parameter ρ = P(X < Y), where stress (X) and strength (Y) are independently modeled by geometric distributions. Objective Bayesian approaches are employed by developing Jeffreys, reference, and probability-matching priors for ρ, and their effects on the resulting Bayes estimates are examined. Posterior inference is carried out using the random-walk Metropolis–Hastings algorithm. The performance of the proposed Bayesian estimators is assessed through extensive Monte Carlo simulations based on average estimates, root mean squared errors, and frequentist coverage probabilities of the highest posterior density credible intervals. Furthermore, the applicability of the methodology is demonstrated using two real data sets.
(This article belongs to the Section Mathematics)
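For geometric stress and strength, the reliability parameter has a closed form that a simulation can cross-check. This sketch assumes the convention ρ = P(X < Y), with both variables counting failures before the first success on {0, 1, 2, ...}; the paper's exact convention may differ.

```python
import random

# Hedged sketch: stress-strength reliability for geometric stress X and
# strength Y (support {0, 1, 2, ...}); the convention rho = P(X < Y) is
# assumed here and need not match the paper's definition.
def rho_closed_form(p1, p2):
    q1, q2 = 1 - p1, 1 - p2
    # P(X < Y) = sum_k p1*q1**k * q2**(k+1) = p1*q2 / (1 - q1*q2)
    return p1 * q2 / (1 - q1 * q2)

def rho_monte_carlo(p1, p2, n=200_000, seed=42):
    rng = random.Random(seed)
    def geom(p):  # number of failures before the first success
        k = 0
        while rng.random() >= p:
            k += 1
        return k
    return sum(geom(p1) < geom(p2) for _ in range(n)) / n

print(rho_closed_form(0.4, 0.2))  # exact value
print(rho_monte_carlo(0.4, 0.2))  # simulation estimate
```

The closed form is what a posterior sampler for (p1, p2), such as the random-walk Metropolis–Hastings algorithm the abstract mentions, would evaluate at each draw to obtain the posterior of ρ.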

10 pages, 2337 KB  
Article
Neutral Impact of SARS-CoV-2 Coinfection on the Recombination-Driven Evolution of Endemic HCoV-OC43
by Xueling Zheng, Yinyan Zhou, Yue Yu, Shi Cheng, Feifei Cao, Zhou Sun, Jun Li and Xinfen Yu
Viruses 2025, 17(9), 1263; https://doi.org/10.3390/v17091263 - 18 Sep 2025
Viewed by 400
Abstract
Knowledge gaps exist on whether SARS-CoV-2 co-infection alters recombination frequency or induces phylogenetic incongruities in endemic β-coronaviruses (HCoV-OC43, HCoV-HKU1), limiting our understanding of cross-species evolution. Among 7213 COVID-19 and 1590 non-COVID-19 acute respiratory cases (2021–2022) screened via multiplex PCR, β-coronavirus co-infections (SARS-CoV-2 + HCoV-OC43/HKU1) and single HCoV-OC43/HKU1 infections were identified. Whole-genome sequencing (Illumina NovaSeq) was performed. Phylogenies were reconstructed using Bayesian inference (MrBayes). Recombination was assessed via Bootscan analysis (SimPlot). Co-infection prevalence was low (0.51%, mainly HCoV-HKU1: 0.28%, HCoV-OC43: 0.11%). HCoV-OC43 diverged into lineage 1 (genotype K) and a novel recombinant lineage 2 (genotypes F/J/G/I segments), exhibiting accelerated evolution. HCoV-HKU1 remained genetically stable (genotype B). Co-infection status did not influence evolutionary outcomes. While SARS-CoV-2 co-infection may favor transmission of endemic HCoVs, their evolution appears driven by population-level selection, not co-infection. HCoV-OC43 underwent recombination-driven diversification, contrasting sharply with HCoV-HKU1’s stasis, highlighting distinct evolutionary strategies. Integrated genomic and clinical surveillance is critical for tracking coronavirus adaptation.
(This article belongs to the Special Issue COVID-19 Complications and Co-infections)

20 pages, 10452 KB  
Article
Nonlocal Prior Mixture-Based Bayesian Wavelet Regression with Application to Noisy Imaging and Audio Data
by Nilotpal Sanyal
Mathematics 2025, 13(16), 2642; https://doi.org/10.3390/math13162642 - 17 Aug 2025
Viewed by 362
Abstract
We propose a novel Bayesian wavelet regression approach using a three-component spike-and-slab prior for wavelet coefficients, combining a point mass at zero, a moment (MOM) prior, and an inverse moment (IMOM) prior. This flexible prior supports small and large coefficients differently, offering advantages for highly dispersed data where wavelet coefficients span multiple scales. The IMOM prior’s heavy tails capture large coefficients, while the MOM prior is better suited for smaller non-zero coefficients. Further, our method introduces innovative hyperparameter specifications for mixture probabilities and scale parameters, including generalized logit, hyperbolic secant, and generalized normal decay for probabilities, and double exponential decay for scaling. Hyperparameters are estimated via an empirical Bayes approach, enabling posterior inference tailored to the data. Extensive simulations demonstrate significant performance gains over two-component wavelet methods. Applications to electroencephalography and noisy audio data illustrate the method’s utility in capturing complex signal characteristics. We implement our method in an R package, NLPwavelet (≥1.1).
(This article belongs to the Special Issue Bayesian Statistics and Applications)
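The shrinkage behavior of spike-and-slab priors on wavelet coefficients can be illustrated with a stripped-down two-component version: a point mass at zero plus a single Gaussian slab standing in for the MOM/IMOM components of the paper's three-part prior. The noise level σ, slab scale τ, and slab probability below are illustrative assumptions.

```python
import math

# Hedged sketch: posterior shrinkage of ONE wavelet coefficient under a
# two-component spike-and-slab prior (point mass at 0 plus a normal slab);
# the paper's three-component MOM/IMOM mixture is more elaborate.
def shrink(d, sigma=1.0, tau=3.0, pi_slab=0.5):
    # marginal densities of the observed coefficient d under each component
    spike = math.exp(-d**2 / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
    v = sigma**2 + tau**2
    slab = math.exp(-d**2 / (2 * v)) / math.sqrt(2 * math.pi * v)
    w = pi_slab * slab / (pi_slab * slab + (1 - pi_slab) * spike)  # P(slab | d)
    return w * (tau**2 / v) * d  # posterior mean of the true coefficient

for d in (0.5, 3.0, 8.0):
    print(d, shrink(d))
```

Small observed coefficients are pulled strongly toward zero while large ones are nearly retained, which is the qualitative behavior the heavy-tailed IMOM component is designed to improve for very large coefficients.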

49 pages, 14879 KB  
Article
Fully Bayesian Inference for Meta-Analytic Deconvolution Using Efron’s Log-Spline Prior
by JoonHo Lee and Daihe Sui
Mathematics 2025, 13(16), 2639; https://doi.org/10.3390/math13162639 - 17 Aug 2025
Viewed by 672
Abstract
Meta-analytic deconvolution seeks to recover the distribution of true effects from noisy site-specific estimates. While Efron’s log-spline prior provides an elegant empirical Bayes solution with excellent point estimation properties, its plug-in nature yields severely anti-conservative uncertainty quantification for individual site effects—a critical limitation for what Efron terms “finite-Bayes inference.” We develop a fully Bayesian extension that preserves the computational advantages of the log-spline framework while properly propagating hyperparameter uncertainty into site-level posteriors. Our approach embeds the log-spline prior within a hierarchical model with adaptive regularization, enabling exact finite-sample inference without asymptotic approximations. Through simulation studies calibrated to realistic meta-analytic scenarios, we demonstrate that our method achieves near-nominal coverage (88–91%) for 90% credible intervals while matching empirical Bayes point estimation accuracy. We provide a complete Stan implementation handling heteroscedastic observations—a critical feature absent from existing software. The method enables principled uncertainty quantification for individual effects at modest computational cost, making it particularly valuable for applications requiring accurate site-specific inference, such as multisite trials and institutional performance assessment.
(This article belongs to the Section D1: Probability and Statistics)

17 pages, 572 KB  
Article
Statistical Analysis Under a Random Censoring Scheme with Applications
by Mustafa M. Hasaballah and Mahmoud M. Abdelwahab
Symmetry 2025, 17(7), 1048; https://doi.org/10.3390/sym17071048 - 3 Jul 2025
Cited by 1 | Viewed by 398
Abstract
The Gumbel Type-II distribution is a widely recognized and frequently utilized lifetime distribution, playing a crucial role in reliability engineering. This paper focuses on the statistical inference of the Gumbel Type-II distribution under a random censoring scheme. From a frequentist perspective, point estimates for the unknown parameters are derived using the maximum likelihood estimation method, and confidence intervals are constructed based on the Fisher information matrix. From a Bayesian perspective, Bayes estimates of the parameters are obtained using the Markov Chain Monte Carlo method, and the average lengths of credible intervals are calculated. The Bayesian inference is performed under both the squared error loss function and the general entropy loss function. Additionally, a numerical simulation is conducted to evaluate the performance of the proposed methods. To demonstrate their practical applicability, a real-world example is provided, illustrating the application and development of these inference techniques. In conclusion, the Bayesian method appears to outperform other approaches, although each method offers unique advantages.

29 pages, 3774 KB  
Article
Improving the Minimum Free Energy Principle to the Maximum Information Efficiency Principle
by Chenguang Lu
Entropy 2025, 27(7), 684; https://doi.org/10.3390/e27070684 - 26 Jun 2025
Viewed by 1589
Abstract
Friston proposed the Minimum Free Energy Principle (FEP) based on the Variational Bayesian (VB) method. This principle emphasizes that the brain and behavior coordinate with the environment, promoting self-organization. However, it has a theoretical flaw, a possibility of being misunderstood, and a limitation (only likelihood functions are used as constraints). This paper first introduces the semantic information G theory and the R(G) function (where R is the minimum mutual information for the given semantic mutual information G). The G theory is based on the P-T probability framework and, therefore, allows for the use of truth, membership, similarity, and distortion functions (related to semantics) as constraints. Based on the study of the R(G) function and Logical Bayesian Inference, this paper proposes the Semantic Variational Bayesian (SVB) method and the Maximum Information Efficiency (MIE) principle. Theoretical analysis and computational experiments prove that R − G = F − H(X|Y) (where F denotes the VFE and H(X|Y) is the Shannon conditional entropy), rather than F, continues to decrease when optimizing latent variables; SVB is a reliable and straightforward approach for latent variables and active inference. This paper also explains the relationship between information, entropy, free energy, and VFE in local non-equilibrium and equilibrium systems, concluding that Shannon information, semantic information, and VFE are analogous to the increment of free energy, the increment of exergy, and physical conditional entropy, respectively. The MIE principle builds upon the fundamental ideas of the FEP, making them easier to understand and apply, but it needs to be combined with deep learning methods for wider applications.
(This article belongs to the Special Issue Information-Theoretic Approaches for Machine Learning and AI)

24 pages, 23057 KB  
Article
On the Potential of Bayesian Neural Networks for Estimating Chlorophyll-a Concentration from Satellite Data
by Mohamad Abed El Rahman Hammoud, Nikolaos Papagiannopoulos, George Krokos, Robert J. W. Brewin, Dionysios E. Raitsos, Omar Knio and Ibrahim Hoteit
Remote Sens. 2025, 17(11), 1826; https://doi.org/10.3390/rs17111826 - 23 May 2025
Viewed by 1141
Abstract
This work introduces the use of Bayesian Neural Networks (BNNs) for inferring chlorophyll-a concentration ([CHL-a]) from remotely sensed data. BNNs are probabilistic models that associate a probability distribution to the neural network parameters and rely on Bayes’ rule for training. The performance of the proposed probabilistic model is compared to that of standard ocean color algorithms, namely ocean color 4 (OC4) and ocean color index (OCI). An extensive in situ bio-optical dataset was used to train and validate the ocean color models. In contrast to established methods, the BNN allows for enhanced modeling flexibility, where different variables that affect phytoplankton phenology or describe the state of the ocean can be used as additional input for enhanced performance. Our results suggest that BNNs perform at least as well as established methods, and they could achieve 20–40% lower mean squared errors when additional input variables are included, such as the sea surface temperature and its climatological mean alongside the coordinates of the prediction. The BNNs offer means for uncertainty quantification by estimating the probability distribution of [CHL-a], building confidence in the [CHL-a] predictions through the variance of the predictions. Furthermore, the output probability distribution can be used for risk assessment and decision making through analyzing the quantiles and shape of the predicted distribution.
(This article belongs to the Special Issue Recent Advances in Water Quality Monitoring)

24 pages, 3798 KB  
Article
Stochastic Optimal Control for Uncertain Structural Systems Under Random Excitations Based on Bayes Optimal Estimation
by Hua Lei, Zhao-Zhong Ying and Zu-Guang Ying
Buildings 2025, 15(9), 1579; https://doi.org/10.3390/buildings15091579 - 7 May 2025
Cited by 1 | Viewed by 549
Abstract
Stochastic vibration control of uncertain structures under random loading is an important problem, and its minimax optimal control strategy remains to be developed. In this paper, a stochastic optimal control strategy for uncertain structural systems under random excitations is proposed, based on the minimax stochastic dynamical programming principle and the Bayes optimal estimation method with the combination of stochastic dynamics and Bayes inference. The general description of the stochastic optimal control problem is presented, including optimal parameter estimation and optimal state control. For the estimation, the posterior probability density conditional on observation states is expressed using the likelihood function conditional on system parameters according to Bayes’ theorem. The likelihood is replaced by the geometrically averaged likelihood, and the posterior is converted into its logarithmic expression to avoid numerical singularity. The expressions of state statistics are derived based on stochastic dynamics. The statistics are further transformed into those conditional on observation states based on optimal state estimation. Then, the obtained posterior will be more reliable and accurate, and the optimal estimation will greatly reduce uncertain parameter domains. For the control, the minimax strategy is designed by minimizing the performance index for the worst-parameter system, which is obtained by maximizing the performance index based on game theory. The dynamical programming equation for the uncertain system is derived according to the minimax stochastic dynamical programming principle. The worst parameters are determined by the maximization of the equation, and the optimal control is determined by the minimization of the resulting equation. The minimax optimal control combining the Bayes optimal estimation and minimax stochastic dynamical programming will be more effective and robust. Finally, numerical results for a five-story frame structure under random excitations show the control effectiveness of the proposed strategy.
(This article belongs to the Special Issue The Vibration Control of Building Structures)

13 pages, 6996 KB  
Article
Decoding the Mitochondrial Genome of the Tiger Shrimp: Comparative Genomics and Phylogenetic Placement Within Caridean Shrimps
by Zhengfei Wang, Weijie Jiang, Jingxue Ye, Huiwen Wu, Yan Wang and Fei Xiong
Genes 2025, 16(4), 457; https://doi.org/10.3390/genes16040457 - 16 Apr 2025
Cited by 1 | Viewed by 882
Abstract
Background/Objectives: Freshwater shrimps of the family Atyidae, particularly the hyperdiverse genus Caridina, are keystone decomposers in tropical aquatic ecosystems and valuable aquaculture resources. However, their evolutionary relationships remain unresolved due to conflicting morphological and molecular evidence. Here, we sequenced and characterized the complete mitochondrial genome of Caridina mariae (Tiger Shrimp), aiming to (1) elucidate its genomic architecture, and (2) reconstruct a robust phylogeny of Caridea using 155 decapod species to address long-standing taxonomic uncertainties. Methods: Muscle tissue from wild-caught C. mariae (voucher ID: KIZ-2023-001, Guangdong, China) was subjected to Illumina NovaSeq 6000 sequencing (150 bp paired-end). The mitogenome was assembled using MITObim v1.9, annotated via MITOS2, and validated by PCR. Phylogenetic analyses employed 13 protein-coding genes under Bayesian inference (MrBayes v3.2.7; 10^6 generations, ESS > 200) and maximum likelihood (RAxML v8.2.12; 1000 bootstraps), with Harpiosquilla harpax as the outgroup. The best-fit substitution model (MtZoa + F + I + G4) was selected via jModelTest v2.1.10. Results: The 15,581 bp circular mitogenome encodes 37 genes (13 PCGs, 22 tRNAs, and 2 rRNAs) and an A + T-rich control region (86.7%). Notably, trnS1 lacks the dihydrouracil arm—a rare structural deviation in Decapoda. The 13 PCGs exhibit moderate nucleotide skew (AT-skew = 0.030; GC-skew = −0.214), while nad5, nad4, and nad6 show significant GC-skew. Phylogenomic analyses strongly support (PP = 1.0; BS = 95) a novel sister-group relationship between Halocaridinidae and Typhlatyinae, contradicting prior morphology-based classifications. The monophyly of Penaeoidea, Astacidea, and Caridea was confirmed, but Eryonoidea and Crangonoidea formed an unexpected clade. Conclusions: This study provides the first mitogenomic framework for C. mariae, revealing both conserved features (e.g., PCG content) and lineage-specific innovations (e.g., tRNA truncation). The resolved phylogeny challenges traditional Caridea classifications and highlights convergent adaptation in freshwater lineages. These findings offer molecular tools for the conservation prioritization of threatened Caridina species and underscore the utility of mitogenomics in decapod systematics.

8 pages, 252 KB  
Article
On the Bayesian Two-Sample Problem for Ranking Data
by Mayer Alvo
Axioms 2025, 14(4), 292; https://doi.org/10.3390/axioms14040292 - 14 Apr 2025
Viewed by 363
Abstract
We consider the two-sample problem involving a new class of angle-based models for ranking data. These models are functions of the cosine of the angle between a ranking and a consensus vector. A Bayesian approach is employed to determine the corresponding predictive densities. Two competing hypotheses are considered, and we compute the Bayes factor to quantify the evidence provided by the observed data under each hypothesis. We apply the results to a real data set.
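The Bayes-factor comparison of two competing hypotheses can be illustrated with a simpler conjugate Bernoulli model in place of the paper's angle-based ranking models: H1 pools the two samples under one success probability, H2 gives each sample its own, and the Bayes factor is a ratio of closed-form marginal likelihoods. The counts below are invented for illustration.

```python
import math

# Hedged illustration of a two-sample Bayes factor with a conjugate
# Bernoulli model (NOT the paper's angle-based ranking model): H1 says the
# two samples share one success probability, H2 gives each its own.
def log_marg(successes, n, a=1.0, b=1.0):
    # log marginal likelihood of a Bernoulli sample under a Beta(a, b) prior
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + math.lgamma(a + successes) + math.lgamma(b + n - successes)
            - math.lgamma(a + b + n))

s1, n1 = 30, 50  # sample 1: 30 successes out of 50
s2, n2 = 12, 50  # sample 2: 12 successes out of 50
log_bf_21 = (log_marg(s1, n1) + log_marg(s2, n2)) - log_marg(s1 + s2, n1 + n2)
print(log_bf_21)  # > 0 favors separate parameters (H2)
```

With identical observed proportions the pooled model wins instead, which is the Occam-factor behavior that makes Bayes factors useful for two-sample problems.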

24 pages, 755 KB  
Article
Inference for Dependent Competing Risks with Partially Observed Causes from Bivariate Inverted Exponentiated Pareto Distribution Under Generalized Progressive Hybrid Censoring
by Rani Kumari, Yogesh Mani Tripathi, Rajesh Kumar Sinha and Liang Wang
Axioms 2025, 14(3), 217; https://doi.org/10.3390/axioms14030217 - 16 Mar 2025
Viewed by 558
Abstract
In this paper, inference under dependent competing risk data is considered with multiple causes of failure. We discuss both classical and Bayesian methods for estimating model parameters under the assumption that data are observed under generalized progressive hybrid censoring. The maximum likelihood estimators of model parameters are obtained when occurrences of latent failure follow a bivariate inverted exponentiated Pareto distribution. The associated existence and uniqueness properties of these estimators are established. The asymptotic interval estimators are also constructed. Further, Bayes estimates and highest posterior density intervals are derived using flexible priors. A Monte Carlo sampling algorithm is proposed for posterior computations. The performance of all proposed methods is evaluated through extensive simulations. Moreover, a real-life example is also presented to illustrate the practical applications of our inferential procedures.

18 pages, 418 KB  
Article
Inference with Pólya-Gamma Augmentation for US Election Law
by Adam C. Hall and Joseph Kang
Mathematics 2025, 13(6), 945; https://doi.org/10.3390/math13060945 - 13 Mar 2025
Viewed by 912
Abstract
Pólya-gamma (PG) augmentation has proven to be highly effective for Bayesian MCMC simulation, particularly for models with binomial likelihoods. This data augmentation strategy offers two key advantages. First, the method circumvents the need for analytic approximations or Metropolis–Hastings algorithms, which leads to simpler and more computationally efficient posterior inference. Second, the approach can be successfully applied to several types of models, including nonlinear mixed-effects models for count data. The effectiveness of PG augmentation has led to its widespread adoption and implementation in statistical software packages, such as version 2.1 of the R package BayesLogit. This success has inspired us to apply this method to the implementation of Section 203 of the Voting Rights Act (VRA), a US law that requires certain jurisdictions to provide non-English voting materials for specific language minority groups (LMGs). In this paper, we show how PG augmentation can be used to fit a Bayesian model that estimates the prevalence of each LMG in each US voting jurisdiction, and that uses a variable selection technique called stochastic search variable selection. We demonstrate that this new model outperforms the previous model used for 2021 VRA data with respect to model diagnostic measures.
(This article belongs to the Special Issue Statistical Simulation and Computation: 3rd Edition)
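A hedged sketch of the Pólya-gamma building block: PG(1, c) can be sampled from a truncated version of its infinite sum-of-gammas representation, and its expectation has the closed form tanh(c/2)/(2c), which makes the sampler easy to sanity-check. Production code would use a dedicated sampler such as the one in BayesLogit; the truncation length here is an illustrative choice.

```python
import math, random

# Hedged sketch: sample PG(1, c) via a truncated sum-of-gammas
# representation (Polson, Scott & Windle). In a Gibbs sampler for logistic
# models, one such draw per observation yields conditionally Gaussian
# updates for the regression coefficients.
def sample_pg1(c, rng, n_terms=200):
    total = 0.0
    for k in range(1, n_terms + 1):
        g = rng.expovariate(1.0)  # Gamma(1, 1) is Exponential(1)
        total += g / ((k - 0.5) ** 2 + (c / (2 * math.pi)) ** 2)
    return total / (2 * math.pi ** 2)

rng = random.Random(7)
c = 1.0
draws = [sample_pg1(c, rng) for _ in range(5000)]
mc_mean = sum(draws) / len(draws)
exact_mean = math.tanh(c / 2) / (2 * c)  # E[PG(1, c)]
print(mc_mean, exact_mean)
```

The agreement between the Monte Carlo mean and the closed-form expectation is the standard check that the augmentation variable is being drawn correctly.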

23 pages, 522 KB  
Article
ORUD-Detect: A Comprehensive Approach to Offensive Language Detection in Roman Urdu Using Hybrid Machine Learning–Deep Learning Models with Embedding Techniques
by Nisar Hussain, Amna Qasim, Gull Mehak, Olga Kolesnikova, Alexander Gelbukh and Grigori Sidorov
Information 2025, 16(2), 139; https://doi.org/10.3390/info16020139 - 13 Feb 2025
Cited by 4 | Viewed by 1961
Abstract
With the rapid expansion of social media, detecting offensive language has become critically important for healthy online interactions. This poses a considerable challenge for low-resource languages such as Roman Urdu, which is widely spoken on platforms like Facebook. In this paper, we perform a comprehensive study of offensive language detection models on Roman Urdu datasets using both Machine Learning (ML) and Deep Learning (DL) approaches. We present a dataset of 89,968 Facebook comments and extensive preprocessing techniques such as TF-IDF features, Word2Vec, and fastText embeddings to address linguistic idiosyncrasies and code-mixed aspects of Roman Urdu. Among the ML models, a linear kernel Support Vector Machine (SVM) model scored the best performance, with an F1 score of 94.76, followed by SVM models with radial and polynomial kernels. Even the use of BoW uni-gram features with naive Bayes produced competitive results, with an F1 score of 94.26. The DL models performed well, with Bi-LSTM returning an F1 score of 98.00 with Word2Vec embeddings and fastText-based Bi-RNN performing at 97.00, demonstrating the value of contextual embeddings and soft similarity. The CNN model also achieved an F1 score of 96.00. This study presents hybrid ML and DL approaches to improve offensive language detection for low-resource languages. This research opens up new doors to providing safer online environments for widespread Roman Urdu users.
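The naive Bayes baseline mentioned above (bag-of-words uni-grams with Laplace smoothing) is compact enough to sketch from scratch; the four toy "offensive"/"neutral" comments below are invented English placeholders, not drawn from the paper's Roman Urdu dataset.

```python
import math
from collections import Counter

# Hedged sketch: a bag-of-words multinomial naive Bayes classifier of the
# kind the study uses as a baseline (toy data, invented labels).
def train_nb(docs, labels):
    vocab = set(w for d in docs for w in d.split())
    counts = {c: Counter() for c in set(labels)}
    priors = Counter(labels)
    for d, c in zip(docs, labels):
        counts[c].update(d.split())
    n = len(docs)
    def predict(doc):
        scores = {}
        for c in counts:
            total = sum(counts[c].values())
            s = math.log(priors[c] / n)  # log class prior
            for w in doc.split():
                # Laplace (add-one) smoothing over the shared vocabulary
                s += math.log((counts[c][w] + 1) / (total + len(vocab)))
            scores[c] = s
        return max(scores, key=scores.get)
    return predict

docs = ["you are awful trash", "have a nice day",
        "awful hateful trash talk", "nice kind words"]
labels = ["off", "ok", "off", "ok"]
predict = train_nb(docs, labels)
print(predict("awful trash"), predict("nice day"))
```

Swapping the raw counts for TF-IDF weights, or the classifier for an SVM, gives the other classical baselines the study compares.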
