Statistical Modeling for Analyzing Data with Complex Structures

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (30 April 2024) | Viewed by 18902

Special Issue Editor

Dr. Chao Huang
Department of Statistics, College of Arts and Sciences, Florida State University, Tallahassee, FL 32306, USA
Interests: medical image data analysis; manifold-valued data analysis; functional data analysis; data integration; imaging genetics; machine learning; deep learning

Special Issue Information

Dear Colleagues,

Big Data challenges have been observed in recent years in various fields including medicine, public health, epidemiology, social science, economics, and finance. Typically, the datasets are of complex structure, since they are collected from multiple sources (multiple sites, cities, or countries) and different domains (Euclidean space, functional space, or Riemannian manifold). Drawing insights from such large and complex datasets requires new statistical modeling tools and expertise in statistical computing and data science.

This Special Issue of Mathematics is dedicated to collecting papers on cutting-edge methodological developments and unique applications for analyzing studies with complex data structures. Contributions proposing advanced statistical models to deal with Big Data with messy sources and/or nonstandard domains are welcome. 

Topics of interest include but are not limited to the following: Big Data integration, functional data analysis, manifold data analysis, tensor data analysis, shape data analysis, time series data analysis, and applications in health sciences and social sciences.

Dr. Chao Huang 
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)

Research

32 pages, 469 KiB  
Article
Estimation and Inference for Spatio-Temporal Single-Index Models
by Hongxia Wang, Zihan Zhao, Hongxia Hao and Chao Huang
Mathematics 2023, 11(20), 4289; https://doi.org/10.3390/math11204289 - 14 Oct 2023
Viewed by 970
Abstract
To better fit the actual data, this paper will consider both spatio-temporal correlation and heterogeneity to build the model. In order to overcome the “curse of dimensionality” problem in the nonparametric method, we improve the estimation method of the single-index model and combine it with the correlation and heterogeneity of the spatio-temporal model to obtain a good estimation method. In this paper, assuming that the spatio-temporal process obeys the α mixing condition, a nonparametric procedure is developed for estimating the variance function based on a fully nonparametric function or dimensional reduction structure, and the resulting estimator is consistent. Then, a reweighting estimation of the parametric component can be obtained via taking the estimated variance function into account. The rate of convergence and the asymptotic normality of the new estimators are established under mild conditions. Simulation studies are conducted to evaluate the efficacy of the proposed methodologies, and a case study about the estimation of the air quality evaluation index in Nanjing is provided for illustration. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

18 pages, 869 KiB  
Article
Partially Functional Linear Models with Linear Process Errors
by Yanping Hu and Zhongqi Pang
Mathematics 2023, 11(16), 3581; https://doi.org/10.3390/math11163581 - 18 Aug 2023
Viewed by 1026
Abstract
In this paper, we focus on the partial functional linear model with linear process errors deduced by not necessarily independent random variables. Based on Mercer’s theorem and Karhunen–Loève expansion, we give the estimators of the slope parameter and coefficient function in the model, establish the asymptotic normality of the estimator for the parameter and discuss the weak convergence with rates of the proposed estimators. Meanwhile, the penalized estimator of the parameter is defined by the SCAD penalty and its oracle property is investigated. Finite sample behavior of the proposed estimators is also analysed via simulations. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

13 pages, 1884 KiB  
Article
Modeling the Cigarette Consumption of Poor Households Using Penalized Zero-Inflated Negative Binomial Regression with Minimax Concave Penalty
by Yudhie Andriyana, Rinda Fitriani, Bertho Tantular, Neneng Sunengsih, Kurnia Wahyudi, I Gede Nyoman Mindra Jaya and Annisa Nur Falah
Mathematics 2023, 11(14), 3192; https://doi.org/10.3390/math11143192 - 20 Jul 2023
Cited by 1 | Viewed by 1227
Abstract
The cigarette commodity is the second largest contributor to the food poverty line. Several aspects imply that poor people consume cigarettes despite having a minimal income. In this study, we are interested in investigating factors influencing poor people to be active smokers. Since the consumption number is a set of count data with zero excess, we have an overdispersion problem. This implies that a standard Poisson regression technique cannot be implemented. On the other hand, the factors involved in the model need to be selected simultaneously. Therefore, we propose to use a zero-inflated negative binomial (ZINB) regression with a minimax concave penalty (MCP) to determine the dominant factors influencing cigarette consumption in poor households. The data used in this study were microdata from the National Socioeconomic Survey (SUSENAS) conducted in March 2019 in East Java Province, Indonesia. The result shows that poor households with a male head of household, having no education, working in the informal sector, having many adult household members, and receiving social assistance tend to consume more cigarettes than others. Additionally, cigarette consumption decreases with the increasing age of the head of household. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

28 pages, 753 KiB  
Article
ADMM-Based Differential Privacy Learning for Penalized Quantile Regression on Distributed Functional Data
by Xingcai Zhou and Yu Xiang
Mathematics 2022, 10(16), 2954; https://doi.org/10.3390/math10162954 - 16 Aug 2022
Cited by 2 | Viewed by 1829
Abstract
Alternating Direction Method of Multipliers (ADMM) is a widely used machine learning tool in distributed environments. In the paper, we propose an ADMM-based differential privacy learning algorithm (FDP-ADMM) on penalized quantile regression for distributed functional data. The FDP-ADMM algorithm can resist adversary attacks to avoid the possible privacy leakage in distributed networks, which is designed by functional principal analysis, an approximate augmented Lagrange function, ADMM algorithm, and privacy policy via Gaussian mechanism with time-varying variance. It is also a noise-resilient, convergent, and computationally effective distributed learning algorithm, even if for high privacy protection. The theoretical analysis on privacy and convergence guarantees is derived and offers a privacy–utility trade-off: a weaker privacy guarantee would result in better utility. The evaluations on simulation-distributed functional datasets have demonstrated the effectiveness of the FDP-ADMM algorithm even if under high privacy guarantee. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

27 pages, 2037 KiB  
Article
Deep Neural Networks for Form-Finding of Tensegrity Structures
by Seunghye Lee, Qui X. Lieu, Thuc P. Vo and Jaehong Lee
Mathematics 2022, 10(11), 1822; https://doi.org/10.3390/math10111822 - 25 May 2022
Cited by 13 | Viewed by 2994
Abstract
Analytical paradigms have limited conventional form-finding methods of tensegrities; therefore, an innovative approach is urgently needed. This paper proposes a new form-finding method based on state-of-the-art deep learning techniques. One of the statical paradigms, a force density method, is substituted for trained deep neural networks to obtain necessary information of tensegrities. It is based on the differential evolution algorithm, where the eigenvalue decomposition process of the force density matrix and the process of the equilibrium matrix are not needed to find the feasible sets of nodal coordinates. Three well-known tensegrity examples including a 2D two-strut, a 3D-truncated tetrahedron and an icosahedron tensegrity are presented for numerical verifications. The cases of the ReLU and Leaky ReLU activation functions show better results than those of the ELU and SELU. Moreover, the results of the proposed method are in good agreement with the analytical super-stable lines. Three examples show that the proposed method exhibits more uniform final shapes of tensegrity, and much faster convergence history than those of the conventional one. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

15 pages, 1373 KiB  
Article
A New Case-Mix Classification Method for Medical Insurance Payment
by Hongliang Liu, Jinpeng Tan, Kyongson Jon and Wensheng Zhu
Mathematics 2022, 10(10), 1640; https://doi.org/10.3390/math10101640 - 11 May 2022
Viewed by 2076
Abstract
Rapidly rising medical expenses can be controlled by a well-designed medical insurance payment system with the ability to ensure the stability and development of medical insurance funds. At present, China is in the stage of exploring the reform of the medical insurance payment system. One of the significant tasks is to establish an appropriate reimbursement model for disease treatment expenses, so as to meet the needs of patients for medical services. In this paper, we propose a case-mixed decision tree method that considers the homogeneity within the same case subgroup as well as the heterogeneity between different case subgroups. The optimal case mix is determined by maximizing the inter-group difference and minimizing the intra-group difference. In order to handle the instability of the tree-based method with a small amount of data, we propose a multi-model ensemble decision tree method. This method first extracts and merges the inherent rules of the data by the stacking-based ensemble learning method, then generates a new sample set by aggregating the original data with the additional samples obtained by applying these rules, and finally trains the case-mix decision tree with the augmented dataset. The proposed method ensures the interpretability of the grouping rules and the stability of the grouping at the same time. The experimental results on real-world data demonstrate that our case-mix method can provide reasonable medical insurance payment standards and the appropriate medical insurance compensation payment for different patient groups. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

13 pages, 549 KiB  
Article
GR-GNN: Gated Recursion-Based Graph Neural Network Algorithm
by Kao Ge, Jian-Qiang Zhao and Yan-Yong Zhao
Mathematics 2022, 10(7), 1171; https://doi.org/10.3390/math10071171 - 4 Apr 2022
Cited by 5 | Viewed by 2655
Abstract
Under an internet background involving artificial intelligence and big data—unstructured, materialized, network graph-structured data, such as social networks, knowledge graphs, and compound molecules, have gradually entered into various specific business scenarios. One problem that urgently needs to be solved in the industry involves how to perform feature extractions, transformations, and operations in graph-structured data to solve downstream tasks, such as node classifications and graph classifications in actual business scenarios. Therefore, this paper proposes a gated recursion-based graph neural network (GR-GNN) algorithm to solve tasks such as node depth-dependent feature extractions and node classifications for graph-structured data. The GRU neural network unit was used to complete the node classification task and, thereby, construct the GR-GNN model. In order to verify the accuracy, effectiveness, and superiority of the algorithm on the open datasets Cora, CiteseerX, and PubMed, the algorithm was used to compare the operation results with the classical graph neural network baseline algorithms GCN, GAT, and GraphSAGE, respectively. The experimental results show that, on the validation set, the accuracy and target loss of the GR-GNN algorithm are better than or equal to other baseline algorithms; in terms of algorithm convergence speed, the performance of the GR-GNN algorithm is comparable to that of the GCN algorithm, which is higher than other algorithms. The research results show that the GR-GNN algorithm proposed in this paper has high accuracy and computational efficiency, and very wide application significance. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

21 pages, 583 KiB  
Article
Communication-Efficient Distributed Learning for High-Dimensional Support Vector Machines
by Xingcai Zhou and Hao Shen
Mathematics 2022, 10(7), 1029; https://doi.org/10.3390/math10071029 - 23 Mar 2022
Cited by 2 | Viewed by 1832
Abstract
Distributed learning has received increasing attention in recent years and is a special need for the era of big data. For a support vector machine (SVM), a powerful binary classification tool, we proposed a novel efficient distributed sparse learning algorithm, the communication-efficient surrogate likelihood support vector machine (CSLSVM), in high-dimensions with convex or nonconvex penalties, based on a communication-efficient surrogate likelihood (CSL) framework. We extended the CSL for distributed SVMs without the need to smooth the hinge loss or the gradient of the loss. For a CSLSVM with lasso penalty, we proved that its estimator could achieve a near-oracle property for l1 penalized SVM estimators on whole datasets. For a CSLSVM with smoothly clipped absolute deviation penalty, we showed that its estimator enjoyed the oracle property, and that it used local linear approximation (LLA) to solve the optimization problem. Furthermore, we showed that the LLA was guaranteed to converge to the oracle estimator, even in our distributed framework and the ultrahigh-dimensional setting, if an appropriate initial estimator was available. The proposed approach is highly competitive with the centralized method within a few rounds of communications. Numerical experiments provided supportive evidence. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)

14 pages, 393 KiB  
Article
Dynamic Analysis of a Stochastic Rumor Propagation Model with Regime Switching
by Fangju Jia and Chunzheng Cao
Mathematics 2021, 9(24), 3277; https://doi.org/10.3390/math9243277 - 16 Dec 2021
Cited by 3 | Viewed by 2591
Abstract
We study the rumor propagation model with regime switching considering both colored and white noises. Firstly, by constructing suitable Lyapunov functions, the sufficient conditions for ergodic stationary distribution and extinction are obtained. Then we obtain the threshold Rs which guarantees the extinction and the existence of the stationary distribution of the rumor. Finally, numerical simulations are performed to verify our model. The results indicated that there is a unique ergodic stationary distribution when Rs>1. The rumor becomes extinct exponentially with probability one when Rs<1. Full article
(This article belongs to the Special Issue Statistical Modeling for Analyzing Data with Complex Structures)