MDPI - Publisher of Open Access Journals

32 pages, 594 KB

Open Access Article

Design-Aware Predictive and Causal Modeling of Cardiovascular Risk in Chronic Kidney Disease Using Penalized and Double Machine Learning Approaches

by Fernando Rojas, Axa Tapia and Hilda Espinoza

Mathematics 2026, 14(9), 1554; https://doi.org/10.3390/math14091554 - 4 May 2026

Viewed by 219

Abstract

We develop a design-aware framework that combines penalized prediction and causal inference for finite populations observed through complex survey designs. The framework integrates survey-weighted pseudo-likelihoods, $\ell_1$-penalized estimation, Neyman-orthogonal moment functions, and a bootstrap procedure that resamples primary sampling units within strata. Methodologically, the contribution is an explicit pipeline that supports design-based inference while separating predictive associations from structurally adjusted effects in high-dimensional, clustered data. We illustrate the framework using data from the Chilean National Health Survey (ENS) 2016–2017 to study the relationship between chronic kidney disease (CKD) and high cardiovascular (CV) risk. In the ENS adult population, the survey-weighted prevalence of CKD was 3.1% (95% CI: 2.4–3.8), and the prevalence of high CV risk was 23.9% (95% CI: 21.5–26.3). High CV risk was markedly more frequent among individuals with CKD than among those without CKD (90.9% versus 21.5%). Predictive and associational analyses combined survey-weighted penalized logistic regression (LASSO) with refitted unpenalized models. In conventional survey-weighted logistic regressions, CKD showed a strong association with high CV risk (odds ratio = 5.66; 95% CI: 2.71–11.82; $p < 0.001$), and effect sizes remained stable after LASSO-based variable selection. To assess causal relevance under confounding and potential endogeneity, we implemented two endogeneity-aware estimators: two-stage residual inclusion (2SRI) and double/debiased machine learning (DML). The DML estimator, defined as the primary causal estimand, reports an orthogonalized estimate of the average treatment effect of CKD on the probability of high CV risk. After adjustment for age and major cardiometabolic comorbidities, the DML estimate was attenuated and statistically non-significant (average treatment effect = $-0.094$; 95% CI: $[-0.409, 0.220]$). The 2SRI approach yielded unstable estimates with wide confidence intervals, consistent with the limited effective sample size of CKD cases ($n_{\mathrm{CKD}} \approx 190$ in a sample with $n \approx 6233$) and weak identification conditions under low-prevalence settings. Simulation experiments under ENS-like complex sampling suggest that naive predictive associations may overestimate the structural contribution of CKD under confounding, whereas orthogonalized estimators yield more conservative estimates when identification holds. The causal interpretation relies on a conditional mean independence assumption given observed covariates and survey design, while control-function specifications are treated as diagnostic sensitivity analyses due to the absence of credible exclusion-based instruments. Overall, the results demonstrate a fundamental divergence between predictive relevance and causal importance in finite-population settings, underscoring the need for design-aware and endogeneity-robust methods in statistical modeling.
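As a rough illustration of the bootstrap step named in this abstract (resampling primary sampling units within strata), the sketch below draws one replicate. The function name, the stratum labels, and the PSU identifiers are hypothetical, and drawing as many PSUs as each stratum contains is an assumption: the abstract does not state the number of draws, and rescaled survey bootstraps often draw one fewer PSU per stratum and adjust the weights.

```python
import random

def resample_psus_within_strata(psus_by_stratum, rng):
    """Draw one bootstrap replicate: sample PSUs with replacement inside each stratum."""
    replicate = {}
    for stratum, psus in psus_by_stratum.items():
        # Resampling whole PSUs (not individuals) keeps within-cluster dependence intact.
        replicate[stratum] = [rng.choice(psus) for _ in psus]
    return replicate

# Hypothetical design: two strata containing three and two PSUs.
design = {"north": ["psu1", "psu2", "psu3"], "south": ["psu4", "psu5"]}
replicate = resample_psus_within_strata(design, random.Random(0))
```

An estimator would then be recomputed on each replicate, and the spread of the replicate estimates used for interval construction.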

(This article belongs to the Special Issue Applied Probability and Statistics: Theory, Methods, and Applications)

24 pages, 36350 KB

Open Access Article

Partial Multi-Label Feature Selection via Entropy-Weighted Multi-Scale Neighborhood Granular Label Distribution Learning

by Yifan Cao, Mao Li, Cong Wang, Shuyu Fan, Ziqiao Yin and Binghui Guo

Entropy 2026, 28(4), 422; https://doi.org/10.3390/e28040422 - 9 Apr 2026

Viewed by 342

Abstract

Partial multi-label feature selection aims to identify discriminative features from data where each instance is associated with an ambiguous candidate label set. Existing methods are typically built upon single-scale modeling assumptions and may fail to fully exploit the multi-granularity structure underlying instance–label relationships. To address this limitation, we propose a novel framework termed PML-FSMNG, which integrates entropy-weighted multi-scale neighborhood granules with label distribution learning. Specifically, multi-scale neighborhood systems are constructed to estimate label distinguishability at multiple structural scales, and Shannon entropy is employed to adaptively fuse scale-specific label distributions into a robust soft supervisory signal. Based on the learned label distribution, an embedded sparse regression model with $\ell_{2,1}$-norm regularization is developed for discriminative feature selection, together with an entropy-regularized adaptive graph learning mechanism to preserve intrinsic geometric structure. Extensive experiments on benchmark datasets demonstrate that the proposed method consistently outperforms several state-of-the-art approaches, validating the effectiveness of multi-scale modeling and entropy-guided adaptive learning under label ambiguity.

35 pages, 20162 KB

Open Access Article

An Efficient and Sparse Kernelized Grey RVFL Network for Energy Forecasting

by Wenkang Gong and Gaofeng Zong

Systems 2026, 14(3), 257; https://doi.org/10.3390/systems14030257 - 28 Feb 2026

Viewed by 335

Abstract

Reliable energy forecasting is essential for the planning and dispatch of power and fuel systems; however, energy series are often short and exhibit pronounced nonlinearity. To tackle this small sample setting, we propose a gray random vector functional link (GRVFL) framework and further derive a kernelized variant (KGRVFL). In GRVFL, an RVFL network is integrated into gray system modeling, and the parameters are learned via sparsity-regularized regression, enabling stable and reproducible training without backpropagation or evolutionary optimization. Hyperparameters are tuned using Bayesian optimization driven by a Top-k mean absolute percentage error (Top-k MAPE) criterion to improve robustness. To further promote compactness, we introduce a fractional ratio-type Fr-$\ell_1$ penalty and solve the resulting problem efficiently using a fractional coordinate descent (FCD) algorithm. The proposed methods are assessed on six real-world energy datasets using eight evaluation metrics. Comparisons with nine gray model baselines and six machine learning forecasters demonstrate that the sparse KGRVFL (SKGRVFL) achieves higher predictive accuracy and improved training stability under small sample conditions.

(This article belongs to the Section Systems Engineering)

29 pages, 1017 KB

Open Access Article

Bayesian Elastic Net Cox Models for Time-to-Event Prediction: Application to a Breast Cancer Cohort

by Ersin Yılmaz, Syed Ejaz Ahmed and Dursun Aydın

Entropy 2026, 28(3), 264; https://doi.org/10.3390/e28030264 - 27 Feb 2026

Viewed by 662

Abstract

High-dimensional survival analyses require calibrated risk and measurable uncertainty, but standard elastic net Cox models provide only point estimates. We develop a Bayesian elastic net Cox (BEN–Cox) model for high-dimensional proportional hazards regression that places a hierarchical global–local shrinkage prior on coefficients and performs full Bayesian inference via Hamiltonian Monte Carlo (HMC). We represent the elastic net penalty as a global–local Gaussian scale mixture with hyperpriors that learn the $\ell_1/\ell_2$ trade-off, enabling adaptive sparsity that preserves correlated gene groups; using HMC with the Cox partial likelihood, we obtain full posterior distributions for hazard ratios and patient-level survival curves. Methodologically, we formalize a Bayesian analogue of the elastic net grouping effect at the posterior mode and establish posterior contraction under sparsity for the Cox partial likelihood, supporting the stability of the resulting risk scores. On the METABRIC breast cancer cohort ($n = 1903$; $p = 440$ gene-level features after preprocessing, derived from an Illumina HT-12 array with ≈24,000 probes at the raw feature level), BEN–Cox achieves slightly lower prediction error, higher discrimination, and better global calibration than tuned ridge Cox, lasso Cox, and elastic net Cox baselines on a held-out test set. Posterior summaries provide credible intervals for hazard ratios and identify a compact gene panel that remains biologically plausible. BEN–Cox provides an uncertainty-aware alternative to tuned penalized Cox models with theoretical support, offering modest improvements in calibration and providing an interpretable sparse signature in highly correlated survival data.

(This article belongs to the Special Issue Statistical Planning, Inference, and Decision Making in High-Dimensional Data Analysis)

25 pages, 16590 KB

Open Access Article

Adaptive Bayesian System Identification for Long-Term Forecasting of Industrial Load and Renewables Generation

by Lina Sheng, Zhixian Wang, Xiaowen Wang and Linglong Zhu

Electronics 2026, 15(3), 530; https://doi.org/10.3390/electronics15030530 - 26 Jan 2026

Viewed by 378

Abstract

The expansion of renewables in modern power systems and the coordinated development of upstream and downstream industrial chains are promoting a shift on the utility side from traditional settlement by energy toward operation driven by data and models. Industrial electricity consumption data exhibit pronounced multi-scale temporal structures and sectoral heterogeneity, which makes it challenging to unify long-term load and generation forecasting while maintaining accuracy, interpretability, and scalability. From a modern system identification perspective, this paper proposes a System Identification in Adaptive Bayesian Framework (SIABF) for medium- and long-term industrial load forecasting based on daily freeze electricity time series. By combining daily aggregation of high-frequency data, frequency domain analysis, sparse identification, and long-term extrapolation, we first construct daily freeze series from 15 min measurements, and then we apply discrete Fourier transforms and a spectral complexity index to extract dominant periodic components and build an interpretable sinusoidal basis library. A sparse regression formulation with $\ell_1$ regularization is employed to select a compact set of key basis functions, yielding concise representations of sector and enterprise load profiles and naturally supporting multivariate and joint multi-sector modeling. Building on this structure, we implement a state-space-implicit physics-informed Bayesian forecasting model and evaluate it on real data from three representative sectors, namely, steel, photovoltaics, and chemical, using one year of 15 min measurements. Under a one-month-ahead evaluation, the proposed framework attains a Mean Absolute Percentage Error (MAPE) of 4.5% for a representative PV-related customer case and achieves low single-digit MAPE for high-inertia sectors, often outperforming classical statistical models, sparse learning baselines, and deep learning architectures. These results should be interpreted as indicative given the limited time span and sample size, and broader multi-year, population-level validation is warranted.

(This article belongs to the Section Systems & Control Engineering)

14 pages, 891 KB

Open Access Article

A Multi-Task Ensemble Strategy for Gene Selection and Cancer Classification

by Suli Lin, Zhizhe Lin, Jin Zhang and Man-Fai Leung

Bioengineering 2025, 12(11), 1245; https://doi.org/10.3390/bioengineering12111245 - 13 Nov 2025

Viewed by 779

Abstract

Gene expression-based tumor classification aims to distinguish tumor types based on gene expression profiles. This task is difficult due to the high dimensionality of gene expression data and limited sample sizes. Most datasets contain tens of thousands of genes but only a small number of samples. As a result, selecting informative genes is necessary to improve classification performance and model interpretability. Many existing gene selection methods fail to produce stable and consistent results, especially when training data are limited. To address this, we propose a multi-task ensemble strategy that combines repeated sampling with joint feature selection and classification. The method generates multiple training subsets and applies multi-task logistic regression with $\ell_{2,1}$ group sparsity regularization to select a subset of genes that appears consistently across tasks. This promotes stability and reduces redundancy. The framework supports integration with standard classifiers such as logistic regression and support vector machines. It performs both gene selection and classification in a single process. We evaluate the method on simulated and real gene expression datasets. The results show that it outperforms several baseline methods in classification accuracy and the consistency of selected genes.
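To make the $\ell_{2,1}$ group sparsity idea concrete, the sketch below computes the norm of a small coefficient matrix. It assumes the common convention in which rows index features (genes) and columns index tasks, so the penalty is the sum of row-wise Euclidean norms; the matrix values and the function name are illustrative and not taken from the article.

```python
import math

def l21_norm(weights):
    """Sum of the Euclidean norms of the rows of a coefficient matrix."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)

# Hypothetical coefficients: 3 genes (rows) x 2 tasks (columns).
W = [
    [3.0, 4.0],  # gene used by both tasks: row norm 5
    [0.0, 0.0],  # gene dropped in every task: row norm 0
    [1.0, 0.0],  # gene used by one task: row norm 1
]
penalty = l21_norm(W)
```

Because each row contributes its unsquared norm, the penalty tends to zero out entire rows at once, which is why a gene is either kept for all tasks or removed from all of them.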

(This article belongs to the Section Biosignal Processing)

25 pages, 492 KB

Open Access Article

Federated Logistic Regression with Enhanced Privacy: A Dynamic Gaussian Perturbation Approach via ADMM from an Information-Theoretic Perspective

by Jie Yuan, Yue Wang, Hao Ma and Wentao Liu

Entropy 2025, 27(11), 1148; https://doi.org/10.3390/e27111148 - 12 Nov 2025

Cited by 1 | Viewed by 737

Abstract

Federated learning enables distributed model training across edge nodes without direct raw data sharing, but model parameter transmission still poses significant privacy risks. To address this vulnerability, a Distributed Logistic Regression Gaussian Perturbation (DLGP) algorithm is proposed, which integrates the Alternating Direction Method of Multipliers (ADMM) with a calibrated differential privacy mechanism. The centralized logistic regression problem is decomposed into local subproblems that are solved independently on edge nodes, where only perturbed model parameters are shared with a central server. The Gaussian noise injection mechanism is designed to optimize the privacy–utility trade-off by introducing calibrated uncertainty into parameter updates, effectively obscuring sensitive information while preserving essential model characteristics. The $\ell_2$-sensitivity of local updates is derived, and a rigorous $(\epsilon, \delta)$-differential privacy guarantee is provided. Evaluations are conducted on a real-world dataset, and it is demonstrated that DLGP maintains favorable performance across varying privacy budgets, numbers of nodes, and penalty parameters.
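For readers unfamiliar with how an $\ell_2$-sensitivity translates into Gaussian noise, the sketch below uses the classical Gaussian-mechanism calibration $\sigma = \Delta_2 \sqrt{2 \ln(1.25/\delta)} / \epsilon$, which is valid for $0 < \epsilon < 1$. This is the textbook rule, not the dynamic perturbation scheme of the article, and the function names and numeric values are assumptions chosen for illustration.

```python
import math
import random

def gaussian_sigma(l2_sensitivity, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (requires 0 < epsilon < 1)."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("the classical calibration assumes 0 < epsilon < 1")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie in (0, 1)")
    return l2_sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon

def perturb(params, sigma, rng):
    """Add independent zero-mean Gaussian noise to every parameter before sharing."""
    return [p + rng.gauss(0.0, sigma) for p in params]

# Hypothetical local update with unit sensitivity and a (0.5, 1e-5) privacy budget.
sigma = gaussian_sigma(l2_sensitivity=1.0, epsilon=0.5, delta=1e-5)
shared = perturb([0.2, -1.3, 0.7], sigma, random.Random(0))
```

A smaller privacy budget $\epsilon$ or a larger sensitivity increases $\sigma$, which is the privacy–utility trade-off the abstract refers to.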

(This article belongs to the Section Information Theory, Probability and Statistics)

23 pages, 21197 KB

Open Access Article

DLPLSR: Dual Label Propagation-Driven Least Squares Regression with Feature Selection for Semi-Supervised Learning

by Shuanghao Zhang, Zhengtong Yang and Zhaoyin Shi

Mathematics 2025, 13(14), 2290; https://doi.org/10.3390/math13142290 - 16 Jul 2025

Cited by 1 | Viewed by 1070

Abstract

In the real world, most data are unlabeled, which drives the development of semi-supervised learning (SSL). Among SSL methods, least squares regression (LSR) has attracted attention for its simplicity and efficiency. However, existing semi-supervised LSR approaches suffer from challenges such as the insufficient use of unlabeled data, low pseudo-label accuracy, and inefficient label propagation. To address these issues, this paper proposes dual label propagation-driven least squares regression with feature selection, named DLPLSR, which is a pseudo-label-free SSL framework. DLPLSR employs a fuzzy-graph-based clustering strategy to capture global relationships among all samples, and manifold regularization preserves local geometric consistency, so that it implements the dual label propagation mechanism for comprehensive utilization of unlabeled data. Meanwhile, a dual-feature selection mechanism is established by integrating orthogonal projection for maximizing feature information with an $\ell_{2,1}$-norm regularization for eliminating redundancy, thereby jointly enhancing the discriminative power. Benefiting from these two designs, DLPLSR boosts learning performance without pseudo-labeling. Finally, the objective function admits an efficient closed-form solution solvable via an alternating optimization strategy. Extensive experiments on multiple benchmark datasets show the superiority of DLPLSR compared to state-of-the-art LSR-based SSL methods.

(This article belongs to the Special Issue Machine Learning and Optimization for Clustering Algorithms)

40 pages, 12261 KB

Open Access Article

Integrating Reliability, Uncertainty, and Subjectivity in Design Knowledge Flow: A CMZ-BENR Augmented Framework for Kansei Engineering

by Haoyi Lin, Pohsun Wang, Jing Liu and Chiawei Chu

Symmetry 2025, 17(5), 758; https://doi.org/10.3390/sym17050758 - 14 May 2025

Viewed by 1477

Abstract

As a knowledge-intensive activity, the Kansei engineering (KE) process encounters numerous challenges in the design knowledge flow, primarily due to issues related to information reliability, uncertainty, and subjectivity. Bridging this gap, this study introduces an advanced KE framework integrating a cloud model with Z-numbers (CMZ) and Bayesian elastic net regression (BENR). In stage-I of this KE, data mining techniques are employed to process online user reviews, coupled with a similarity analysis of affective word clusters to identify representative emotional descriptors. During stage-II, the CMZ algorithm refines K-means clustering outcomes for market-representative product forms, enabling precise feature characterization and experimental prototype development. Stage-III addresses linguistic uncertainties in affective modeling through CMZ-augmented semantic differential questionnaires, achieving a multi-granular representation of subjective evaluations. Subsequently, stage-IV employs BENR for automated hyperparameter optimization in design knowledge inference, eliminating manual intervention. The framework’s efficacy is empirically validated through a domestic cleaning robot case study, demonstrating superior performance in resolving multiple information processing challenges via comparative experiments. Results confirm that this KE framework significantly improves uncertainty management in design knowledge flow compared to conventional implementations. Furthermore, by leveraging the intrinsic symmetry of the normal cloud model with Z-numbers distributions and the balanced ℓ₁/ℓ₂ regularization of BENR, the CMZ–BENR framework embodies the principle of structural harmony.

(This article belongs to the Special Issue Fuzzy Set Theory and Uncertainty Theory—3rd Edition)

21 pages, 4425 KB

Open Access Article

The Prediction Performance Analysis of the Lasso Model with Convex Non-Convex Sparse Regularization

by Wei Chen, Qiuyue Liu, Hancong Li and Jian Zou

Algorithms 2025, 18(4), 195; https://doi.org/10.3390/a18040195 - 1 Apr 2025

Viewed by 1460

Abstract

The incorporation of $\ell_1$ regularization in Lasso regression plays a crucial role by inducing convexity to the objective function, thereby facilitating its minimization; when compared to non-convex regularization, the utilization of $\ell_1$ regularization introduces bias through artificial coefficient shrinkage towards zero. Recently, the convex non-convex (CNC) regularization framework has emerged as a powerful technique that enables the incorporation of non-convex regularization terms while maintaining the overall convexity of the optimization problem. Although this method has shown remarkable performance in various empirical studies, its theoretical understanding is still relatively limited. In this paper, we provide a theoretical investigation into the prediction performance of the Lasso model with CNC sparse regularization. By leveraging oracle inequalities, we establish a tighter upper bound on prediction performance compared to the traditional $\ell_1$ regularizer. Additionally, we propose an alternating direction method of multipliers (ADMM) algorithm to efficiently solve the proposed model and rigorously analyze its convergence property. Our numerical results, evaluated on both synthetic data and real-world magnetic resonance imaging (MRI) reconstruction tasks, confirm the superior effectiveness of our proposed approach.
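The shrinkage bias attributed to $\ell_1$ regularization in this abstract can be seen in the scalar soft-thresholding operator, which is the proximal map of $\lambda |x|$. The sketch below contrasts it with plain hard thresholding as one simple non-convex comparison; it does not implement the CNC penalty studied in the article, and the function names and inputs are illustrative assumptions.

```python
def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: move x toward zero by lam, clipping at zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def hard_threshold(x, lam):
    """Keep x unchanged when |x| exceeds lam, otherwise set it to zero."""
    return x if abs(x) > lam else 0.0

# A large coefficient is biased by lam under soft thresholding, but not under hard thresholding.
large_soft = soft_threshold(5.0, 1.0)
large_hard = hard_threshold(5.0, 1.0)
small_soft = soft_threshold(-0.5, 1.0)
```

Both operators zero out the small input, but only soft thresholding also pulls the large coefficient toward zero, which is the bias that non-convex and CNC penalties aim to reduce.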

(This article belongs to the Section Analysis of Algorithms and Complexity Theory)

16 pages, 470 KB

Open Access Feature Paper Article

Distributed Estimation for ℓ₀-Constrained Quantile Regression Using Iterative Hard Thresholding

by Zhihe Zhao and Heng Lian

Mathematics 2025, 13(4), 669; https://doi.org/10.3390/math13040669 - 18 Feb 2025

Cited by 1 | Viewed by 1183

Abstract

Distributed frameworks for statistical estimation and inference have become a critical toolkit for analyzing massive data efficiently. In this paper, we present distributed estimation for high-dimensional quantile regression with $\ell_0$ constraint using iterative hard thresholding (IHT). We propose a communication-efficient distributed estimator which is linearly convergent to the true parameter up to the statistical precision of the model, despite the fact that the check loss minimization problem with an $\ell_0$ constraint is neither strongly smooth nor convex. The distributed estimator we develop can achieve the same convergence rate as the estimator based on the whole data set under suitable assumptions. In our simulations, we illustrate the convergence of the estimators under different settings and also demonstrate the accuracy of nonzero parameter identification.
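The two ingredients named in this abstract can be sketched in a few lines: the quantile check loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$ and the hard-thresholding projection that keeps the $s$ largest-magnitude entries of a vector. This is only an illustration of those building blocks under standard definitions; the distributed update rule of the article is not reproduced, and the function names and inputs are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile check loss: tau * u for u >= 0 and (tau - 1) * u for u < 0."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)

def hard_threshold(vector, s):
    """Project onto s-sparse vectors by keeping the s entries of largest magnitude."""
    order = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    kept = set(order[:s])
    return [vector[i] if i in kept else 0.0 for i in range(len(vector))]

# Hypothetical values: asymmetric loss at tau = 0.25 and a 2-sparse projection.
loss_pos = check_loss(2.0, 0.25)
loss_neg = check_loss(-2.0, 0.25)
sparse = hard_threshold([0.1, -3.0, 2.0, 0.5], 2)
```

An IHT iteration alternates a descent step on the loss with this projection, so every iterate satisfies the $\ell_0$ constraint exactly.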

(This article belongs to the Section D1: Probability and Statistics)

16 pages, 293 KB

Open Access Article

Adaptive CoCoLasso for High-Dimensional Measurement Error Models

by Qin Yu

Entropy 2025, 27(2), 97; https://doi.org/10.3390/e27020097 - 21 Jan 2025

Cited by 1 | Viewed by 1421

Abstract

A significant portion of theoretical and empirical studies in high-dimensional regression have primarily concentrated on clean datasets. However, in numerous practical scenarios, data are often corrupted by missing values and measurement errors, which cannot be ignored. Despite the substantial progress in high-dimensional regression with contaminated covariates, methods that achieve an effective trade-off among prediction accuracy, feature selection, and computational efficiency remain significantly underexplored. We introduce adaptive convex conditioned Lasso (Adaptive CoCoLasso), offering a new approach that can handle high-dimensional linear models with error-prone measurements. This estimator combines a projection onto the nearest positive semi-definite matrix with an adaptively weighted $\ell_1$ penalty. Theoretical guarantees are provided by establishing error bounds for the estimators. The results from the synthetic data analysis indicate that the Adaptive CoCoLasso performs strongly in prediction accuracy and mean squared error, particularly in scenarios involving both additive and multiplicative noise in measurements. While the Adaptive CoCoLasso estimator performs comparably to, or is slightly outperformed by, certain methods, such as Hard, in reducing the number of incorrectly identified covariates, its strength lies in offering a more favorable trade-off between prediction accuracy and sparse modeling.

(This article belongs to the Special Issue Information-Theoretic Methods in Data Analytics)

14 pages, 406 KB

Open Access Article

On the Adaptive Penalty Parameter Selection in ADMM

by Serena Crisci, Valentina De Simone and Marco Viola

Algorithms 2023, 16(6), 264; https://doi.org/10.3390/a16060264 - 25 May 2023

Cited by 7 | Viewed by 5289

Abstract

Many data analysis problems can be modeled as a constrained optimization problem characterized by nonsmooth functionals, often because of the presence of $\ell_1$-regularization terms. One of the most effective ways to solve such problems is through the Alternate Direction Method of Multipliers (ADMM), which has been proved to have good theoretical convergence properties even if the arising subproblems are solved inexactly. Nevertheless, experience shows that the choice of the parameter $\tau$ penalizing the constraint violation in the Augmented Lagrangian underlying ADMM affects the method’s performance. To this end, strategies for the adaptive selection of such parameter have been analyzed in the literature and are still of great interest. In this paper, starting from an adaptive spectral strategy recently proposed in the literature, we investigate the use of different strategies based on Barzilai–Borwein-like stepsize rules. We test the effectiveness of the proposed strategies in the solution of real-life consensus logistic regression and portfolio optimization problems.
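To show what adaptive selection of the penalty parameter $\tau$ can look like in practice, the sketch below implements the widely used residual-balancing heuristic: increase $\tau$ when the primal residual dominates, decrease it when the dual residual dominates. This is a simpler rule than the spectral and Barzilai–Borwein-like strategies investigated in the article, and the function name and the default values `mu=10.0` and `factor=2.0` are assumptions reflecting common choices.

```python
def update_penalty(tau, primal_res, dual_res, mu=10.0, factor=2.0):
    """Residual balancing: keep primal and dual residual norms within a factor mu."""
    if primal_res > mu * dual_res:
        # Constraint violation is too large: penalize it more strongly.
        return tau * factor
    if dual_res > mu * primal_res:
        # Dual residual is too large: relax the penalty.
        return tau / factor
    return tau

# Hypothetical residual norms from three different ADMM iterations.
raised = update_penalty(1.0, primal_res=100.0, dual_res=1.0)
lowered = update_penalty(1.0, primal_res=1.0, dual_res=100.0)
unchanged = update_penalty(1.0, primal_res=1.0, dual_res=1.0)
```

In scaled-form ADMM implementations the scaled dual variable must also be rescaled whenever the penalty changes, a detail omitted here.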

(This article belongs to the Special Issue Recent Advances in Nonsmooth Optimization and Analysis)

24 pages, 418 KB

Open Access Article

Robust Variable Selection and Regularization in Quantile Regression Based on Adaptive-LASSO and Adaptive E-NET

by Innocent Mudhombo and Edmore Ranganai

Computation 2022, 10(11), 203; https://doi.org/10.3390/computation10110203 - 21 Nov 2022

Cited by 2 | Viewed by 2654

Abstract

Although variable selection and regularization procedures have been extensively considered in the literature for the quantile regression (QR) scenario via penalization, many such procedures fail to deal simultaneously with data aberrations in the design space, namely, high leverage points (X-space outliers) and collinearity challenges. Some high leverage points, referred to as collinearity influential observations, tend to adversely alter the eigenstructure of the design matrix by inducing or masking collinearity. Therefore, the literature recommends that the problems of collinearity and high leverage points be dealt with simultaneously. In this article, we suggest adaptive LASSO and adaptive E-NET penalized QR (QR-ALASSO and QR-AE-NET) procedures, where the weights are based on a QR estimator, as remedies. We extend this methodology to the penalized weighted QR versions, the WQR-LASSO and WQR-E-NET procedures we had suggested earlier. In the literature, adaptive weights are based on the RIDGE regression (RR) parameter estimator. Although the use of this estimator may be plausible at the ℓ_{1} estimator (QR at τ = 0.5) for a symmetrical distribution, it may not be so at extreme quantile levels. Therefore, we use a QR-based estimator to derive adaptive weights. We carried out a comparative study of QR-LASSO, QR-E-NET, and the ones we suggest here, viz., QR-ALASSO, QR-AE-NET, and the weighted QR ALASSO penalized and weighted QR AE-NET penalized (WQR-ALASSO and WQR-AE-NET) procedures. The simulation study results show that QR-ALASSO, QR-AE-NET, WQR-ALASSO and WQR-AE-NET generally outperform their non-adaptive counterparts. At predictor matrices with collinearity-inducing points under normality, the QR-ALASSO and QR-AE-NET, respectively, outperform the non-adaptive procedures in the unweighted scenarios as follows: in all 16 cases (100%) with respect to correctly selected (shrunk) zero coefficients; in 88% with respect to correctly fitted models; and in 81% with respect to prediction. In the weighted penalized WQR scenarios, WQR-ALASSO and WQR-AE-NET outperform their non-adaptive versions as follows: 75% of the time with respect to both correctly fitted models and correctly shrunk zero coefficients, and 63% of the time with respect to prediction. At predictor matrices with collinearity-masking points under normality, the QR-ALASSO and QR-AE-NET, respectively, outperform the non-adaptive procedures in the unweighted scenarios as follows: in prediction, 100% and 88% of the time; with respect to correctly fitted models, 100% and 50% of the time (while performing equally in the other 50%); and with respect to correctly shrunk zero coefficients, 100% of the time. In the weighted scenario, WQR-ALASSO and WQR-AE-NET outperform their respective non-adaptive versions as follows: with respect to prediction, both 63% of the time; with respect to correctly fitted models, 88% of the time; and with respect to correctly shrunk zero coefficients, 100% of the time. At predictor matrices with collinearity-inducing points under the t-distribution, the QR-ALASSO and QR-AE-NET procedures outperform their respective non-adaptive procedures in the unweighted scenarios as follows: in prediction, 100% and 75% of the time; with respect to correctly fitted models, 88% of the time each; and with respect to correctly shrunk zero coefficients, 88% and 100% of the time. Additionally, comparing the WQR-ALASSO and WQR-AE-NET procedures with their unweighted versions, the former outperform the latter in all respective cases with respect to prediction, whilst there is no clear "winner" with respect to the other two measures. Overall, the WQR-ALASSO generally outperforms all other models with respect to all measures. At the predictor matrix with collinearity-masking points under the t-distribution, all adaptive versions outperformed their respective non-adaptive versions with respect to all metrics. In the unweighted scenarios, the QR-ALASSO and QR-AE-NET dominate their non-adaptive versions as follows: in prediction, 63% and 75% of the time; with respect to correctly fitted models, 100% and 38% of the time (while performing equally in the other 62%); and with respect to correctly shrunk zero coefficients, 100% of the time. In the weighted scenarios, all adaptive versions outperformed their non-adaptive versions as follows: 62% of the time in both respective cases with respect to prediction, while it is vice-versa with respect to correctly fitted models and with respect to correctly shrunk zero coefficients. In the weighted scenarios, WQR-ALASSO and WQR-AE-NET dominate their respective non-adaptive versions as follows: with respect to correctly fitted models, 62% of the time, and with respect to correctly shrunk zero coefficients, 100% of the time in both cases. At the design matrix with both collinearity and high leverage points under the heavy-tailed distribution scenarios (t-distributions with d ∈ (1; 6) degrees of freedom), the dominance of the adaptive procedures over the non-adaptive ones is again evident. In the unweighted scenarios, the QR-ALASSO and QR-AE-NET procedures outperform their non-adaptive versions as follows: in prediction, 75% and 62% of the time; with respect to correctly fitted models, 100% and 88% of the time; and with respect to correctly shrunk zero coefficients, 100% of the time in both cases. In the weighted scenarios, WQR-ALASSO and WQR-AE-NET dominate their non-adaptive versions as follows: with respect to prediction, 100% of the time in both cases; and with respect to both correctly fitted models and correctly shrunk zero coefficients, 88% of the time each. Results from applications of the suggested procedures to real-life data sets are broadly in line with the simulation results.
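
For orientation, an adaptive LASSO penalized QR estimator of the kind this abstract describes is commonly written in the form below. This is a sketch in generic notation, not necessarily the authors' exact formulation: the symbols n, p, x_i, y_i, λ, γ, the check function ρ_τ and the initial estimate β̃ are assumptions introduced here for illustration.

```latex
\[
\begin{aligned}
\hat{\beta}(\tau) &= \arg\min_{\beta \in \mathbb{R}^{p}}
  \sum_{i=1}^{n} \rho_{\tau}\!\left(y_i - x_i^{\top}\beta\right)
  + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert, \\
\rho_{\tau}(u) &= u\,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr), \\
w_j &= \lvert \tilde{\beta}_j(\tau) \rvert^{-\gamma}, \qquad \gamma > 0 .
\end{aligned}
\]
```

Here β̃(τ) stands for an initial QR-based estimate at the same quantile level, in place of a ridge-based one, and setting every w_j = 1 recovers the non-adaptive QR-LASSO. At τ = 0.5 the check loss reduces to ρ_{0.5}(u) = |u|/2, which is the ℓ_{1} estimator the abstract refers to.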

(This article belongs to the Section Computational Engineering)

18 pages, 2004 KB

Open Access Article

A Generalized Linear Joint Trained Framework for Semi-Supervised Learning of Sparse Features

by Juan Carlos Laria, Line H. Clemmensen, Bjarne K. Ersbøll and David Delgado-Gómez

Mathematics 2022, 10(16), 3001; https://doi.org/10.3390/math10163001 - 19 Aug 2022

Cited by 3 | Viewed by 2277

Abstract

The elastic net is among the most widely used types of regularization algorithms, commonly associated with the problem of supervised generalized linear model estimation via penalized maximum likelihood. Its attractive properties, originating from a combination of ℓ_{1} and ℓ_{2} norms, endow this method with the ability to select variables, taking into account the correlations between them. In the last few years, semi-supervised approaches that use both labeled and unlabeled data have become an important component in statistical research. Despite this interest, few researchers have investigated semi-supervised elastic net extensions. This paper introduces a novel solution for semi-supervised learning of sparse features in the context of generalized linear model estimation: the generalized semi-supervised elastic net (s²net), which extends the supervised elastic net method, with a general mathematical formulation that covers, but is not limited to, both regression and classification problems. In addition, a flexible and fast implementation for s²net is provided. Its advantages are illustrated in different experiments using real and synthetic data sets. They show how s²net improves on the performance of other techniques that have been proposed for both supervised and semi-supervised learning.
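
As a reminder of the ℓ_{1} and ℓ_{2} combination mentioned above, the supervised elastic net for a generalized linear model is often written as below. This is a sketch using one common parameterization; the symbols n, x_i, y_i, λ, α and the per-observation log-likelihood ℓ are assumptions introduced here, and the additional term that s²net builds from unlabeled data is not reproduced.

```latex
\[
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^{\top}\beta\right)
  + \lambda \left[ \alpha \lVert \beta \rVert_{1}
  + \frac{1-\alpha}{2} \lVert \beta \rVert_{2}^{2} \right],
  \qquad \lambda \ge 0,\; \alpha \in [0, 1].
\]
```

With α = 1 the penalty is the LASSO, which can set coefficients exactly to zero; with α = 0 it is ridge regression, which shrinks correlated predictors together. Intermediate values trade off the two behaviours.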

(This article belongs to the Section E: Applied Mathematics)

Search Results (28)
