Machine Learning, Statistics and Big Data

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "D1: Probability and Statistics".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 35849

Special Issue Editors

Dr. Codruta Mare

Guest Editor

Department of Statistics-Forecasts-Mathematics, Faculty of Economics and Business Administration & the Interdisciplinary Centre for Data Science, Babeș-Bolyai University, Cluj, Romania
Interests: spatial econometrics; economic forecasting; econometrics; statistics

Dr. Ioana Florina Coita

Guest Editor

Department of Finance and Accounting, Faculty of Economics, University of Oradea, Oradea, Romania
Interests: machine learning; sentiment analysis; AI in finance; behavioural finance; statistics

Special Issue Information

Dear Colleagues,

Rapid technological progress has led to a significant increase in data production and in the importance of evaluating these data. Algorithms have been constructed to analyze and predict data for decision-making purposes. Classical econometric methods are increasingly being compared to, or even replaced by, machine learning methods for data analysis. Special analytical procedures are being developed for big data settings, which arise in all fields of human activity, from finance to transportation. As the European Commission aims to sustain innovation in machine learning and artificial intelligence techniques across different sectors, the main goal of this Special Issue is to bring together researchers in the fields of statistics, econometrics, machine learning and big data. Contributions in the form of theoretical developments, procedure constructions, or applications of such methods are welcome. Submissions on FinTech and artificial intelligence methods applied in finance are encouraged.

This Special Issue is developed under the auspices of COST CA 19130 “Fintech and Artificial Intelligence in Finance”, supported by COST (European Cooperation in Science and Technology); www.cost.eu, https://fin-ai.eu/

Dr. Codruta Mare
Dr. Ioana Florina Coita
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

statistics
econometrics
machine learning
big data
financial econometrics
spatial econometrics
spatial machine learning
sentiment analysis
FinTech
digital finance
artificial intelligence
supervised vs. unsupervised learning
forecasting methods
IoT
cloud
blockchain
architecture for big data
big data analytics
data mining
cyberspace

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Related Special Issue

Machine Learning, Statistics and Big Data, 2nd Edition in Mathematics (1 article)

Published Papers (10 papers)

Research

21 pages, 7079 KB

Open Access Article

An Expanded Spatial Durbin Model with Ordinary Kriging of Unobserved Big Climate Data

by Annisa Nur Falah, Yudhie Andriyana, Budi Nurani Ruchjana, Eddy Hermawan, Teguh Harjana, Edy Maryadi, Risyanto, Haries Satyawardhana and Sinta Berliana Sipayung

Mathematics 2024, 12(16), 2447; https://doi.org/10.3390/math12162447 - 7 Aug 2024

Cited by 7 | Viewed by 2492

Abstract

Spatial models are essential in the prediction of climate phenomena because they can model the complex relationships between different locations. In this study, we discuss an expanded spatial Durbin model with ordinary kriging on unobserved locations (ESDMOK) to predict rainfall patterns in Java Island. The classical spatial Durbin model needed to be expanded to obtain a parameter estimation for each location. We combined this with ordinary kriging because the data were not available in some locations. The data were taken from the National Aeronautics and Space Administration Prediction of Worldwide Energy Resources (NASA POWER) website. Since climate data are big data, we implement a big data analytics approach, namely the data analytics life cycle method. As the exogenous variables, we used air temperature, humidity, solar irradiation, wind speed, and surface pressure. We developed an R-Shiny web application to implement our proposed technique. Using our proposed technique, we obtained more accurate and reliable climate data prediction, indicated by the mean absolute percentage error (MAPE), which was equal to 1.956%. The greatest effect on rainfall was given by the surface pressure variable, and the smallest was wind speed. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)
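
For readers unfamiliar with the accuracy measure quoted in the abstract above, the following is a minimal sketch of how a mean absolute percentage error (MAPE) is usually computed. It is not taken from the paper: the function name `mape` and the sample values are hypothetical, chosen only for illustration.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # The percentage error divides by the observed value.
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical observed and predicted values, for illustration only.
observed = [100.0, 200.0, 400.0]
predicted = [98.0, 204.0, 392.0]
error_pct = mape(observed, predicted)  # each point is off by 2%
```

Because the measure divides by the observed value, it cannot be evaluated at observations equal to zero; the sketch raises an error in that case rather than guessing a convention.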

17 pages, 1903 KB

Open Access Article

Spillover Effect of Network Public Opinion on Market Prices of Small-Scale Agricultural Products

by Xingchen Lv, Weijun Lin, Jun Meng and Linan Mo

Mathematics 2024, 12(4), 539; https://doi.org/10.3390/math12040539 - 8 Feb 2024

Cited by 2 | Viewed by 1671

Abstract

Network public opinion plays a crucial role in the behavior and decision making of various stakeholders, including farmers, middlemen, and consumers. It also affects the price fluctuations of small-scale agricultural products. Understanding the transmission path and spillover effect of network public opinion on the price fluctuations of these products is essential for ensuring their sustainable development and price stability. This paper selects the monthly data of network public opinion and related market prices of small-scale agricultural products from January 2014 to December 2021, constructs a network public opinion value through the sentiment classification results of deep learning models, and uses the trivariate VAR-BEKK-GARCH(1,1) model and spillover index model to study the spillover effect and spillover index of network public opinion on the market prices of small-scale agricultural products (national average price and origin price). The results show that: (1) There is a bidirectional volatility spillover effect between public opinion sentiment and the market prices of small-scale agricultural products. Additionally, this two-way volatility spillover effect is also evident between the average market prices and the origin prices of these commodities. (2) The influence of network public opinion on the market prices of small-scale agricultural products is substantial, with the spillover index being more pronounced for origin prices than for national average prices and reaching its zenith earlier. Consequently, based on these results, recommendations are provided to adapt planting and inventory strategies, enhance vigilance towards price risk transmission amongst small-scale agricultural product markets, and improve the comprehensive information platform encompassing the entire industry chain. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

21 pages, 1183 KB

Open Access Article

Mastery of “Monthly Effects”: Big Data Insights into Contrarian Strategies for DJI 30 and NDX 100 Stocks over a Two-Decade Period

by Chien-Liang Chiu, Paoyu Huang, Min-Yuh Day, Yensen Ni and Yuhsin Chen

Mathematics 2024, 12(2), 356; https://doi.org/10.3390/math12020356 - 22 Jan 2024

Cited by 3 | Viewed by 4272

Abstract

Whereas prior work has extensively studied whether stocks perform better in a specific month, such as the January effect (i.e., better stock price performance in January as opposed to other months), the goal of this study is to determine whether investors would obtain better subsequent performance when technical trading signals are emitted in a specific month, because, from the investment perspective, investors purchasing stocks now would not know their performance until later. We contend that this question plays a critical role in steering investment decisions and enhancing profitability; nonetheless, it appears to be overlooked in the relevant literature. As such, utilizing big data to analyze the constituent stocks of the DJI 30 and NDX 100 indices from 2003 to 2022 (i.e., two decades of data), this study investigates whether trading these stocks when trading signals are emitted via contrarian regulation of stochastic oscillator indicators (SOIs) and the relative strength index (RSI) in specific months would result in superior subsequent performance (hereafter referred to as “monthly effects”). This study finds that the oversold signals generated by these two contrarian regulations in March were associated with higher subsequent performance for holding 100 to 250 trading days (roughly one year) than those generated in other months. These findings highlight the importance of the trading time and the superiority of the RSI over SOIs in generating profits. This study sheds light on the significance of oversold trading signals and suggests that the “monthly effect” is crucial for achieving higher returns. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

26 pages, 5669 KB

Open Access Article

A Natural-Language-Processing-Based Method for the Clustering and Analysis of Movie Reviews and Classification by Genre

by Fernando González, Miguel Torres-Ruiz, Guadalupe Rivera-Torruco, Liliana Chonona-Hernández and Rolando Quintero

Mathematics 2023, 11(23), 4735; https://doi.org/10.3390/math11234735 - 22 Nov 2023

Cited by 16 | Viewed by 5153

Abstract

Reclassification of massive datasets acquired through different approaches, such as web scraping, is a big challenge to demonstrate the effectiveness of a machine learning model. Notably, there is a strong influence of the quality of the dataset used for training those models. Thus, we propose a threshold algorithm as an efficient method to remove stopwords. This method employs an unsupervised classification technique, such as K-means, to accurately categorize user reviews from the IMDb dataset into their most suitable categories, generating a well-balanced dataset. Analysis of the performance of the algorithm revealed a notable influence of the text vectorization method used concerning the generation of clusters when assessing various preprocessing approaches. Moreover, the algorithm demonstrated that the word embedding technique and the removal of stopwords to retrieve the clustered text significantly impacted the categorization. The proposed method involves confirming the presence of a suggested stopword within each review across various genres. Upon satisfying this condition, the method assesses if the word’s frequency exceeds a predefined threshold. The threshold algorithm yielded a mapping genre success above 80% compared to precompiled lists and a Zipf’s law-based method. In addition, we employed the mini-batch K-means method for the clustering formation of each differently preprocessed dataset. This approach enabled us to reclassify reviews more coherently. Summing up, our methodology categorizes sparsely labeled data into meaningful clusters, in particular, by using a combination of the proposed stopword removal method and TF-IDF. The reclassified and balanced datasets showed a significant improvement, achieving 94% accuracy compared to the original dataset. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

11 pages, 436 KB

Open Access Article

Efficient Estimation and Validation of Shrinkage Estimators in Big Data Analytics

by Salomi du Plessis, Mohammad Arashi, Gaonyalelwe Maribe and Salomon M. Millard

Mathematics 2023, 11(22), 4632; https://doi.org/10.3390/math11224632 - 13 Nov 2023

Cited by 3 | Viewed by 1980

Abstract

Shrinkage estimators are often used to mitigate the consequences of multicollinearity in linear regression models. Despite the ease with which these techniques can be applied to small- or moderate-size datasets, they encounter significant challenges in the big data domain. Some of these challenges are that the volume of data often exceeds the storage capacity of a single computer and that the time required to obtain results becomes infeasible due to the computational burden of a high volume of data. We propose an algorithm for the efficient model estimation and validation of various well-known shrinkage estimators to be used in scenarios where the volume of the data is large. Our proposed algorithm utilises sufficient statistics that can be computed and updated at the row level, thus minimizing access to the entire dataset. A simulation study, as well as an application on a real-world dataset, illustrates the efficiency of the proposed approach. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)
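
The idea of sufficient statistics that are updated at the row level, mentioned in the abstract above, can be illustrated with a small sketch. This is not the authors' algorithm: it assumes ordinary ridge regression as a representative shrinkage estimator, and the helper names (`update`, `solve`, `ridge_from_stats`) and the toy data are hypothetical. The point is only that X'X and X'y can be accumulated one observation at a time, so the full dataset never has to be held in memory.

```python
def update(stats, x_row, y):
    """Add one observation to the running statistics X'X and X'y."""
    xtx, xty = stats
    for i, xi in enumerate(x_row):
        xty[i] += xi * y
        for j, xj in enumerate(x_row):
            xtx[i][j] += xi * xj


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ridge_from_stats(stats, k):
    """Ridge estimate (X'X + kI)^(-1) X'y from the accumulated statistics."""
    xtx, xty = stats
    p = len(xty)
    penalized = [[xtx[i][j] + (k if i == j else 0.0) for j in range(p)]
                 for i in range(p)]
    return solve(penalized, xty)


# Toy data (hypothetical), streamed one row at a time: y = 2*x1 + 3*x2.
p = 2
stats = ([[0.0] * p for _ in range(p)], [0.0] * p)
for x_row, y in [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0)]:
    update(stats, x_row, y)

b_ols = ridge_from_stats(stats, k=0.0)    # no shrinkage
b_ridge = ridge_from_stats(stats, k=1.0)  # coefficients shrink toward zero
```

Once the statistics are accumulated, estimates for several values of the tuning parameter can be obtained without another pass over the rows, which is one plausible reason such statistics are attractive in large-data settings.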

20 pages, 2643 KB

Open Access Article

JQPro: Join Query Processing in a Distributed System for Big RDF Data Using the Hash-Merge Join Technique

by Nahla Mohammed Elzein, Mazlina Abdul Majid, Ibrahim Abaker Targio Hashem, Ashraf Osman Ibrahim, Anas W. Abulfaraj and Faisal Binzagr

Mathematics 2023, 11(5), 1275; https://doi.org/10.3390/math11051275 - 6 Mar 2023

Cited by 2 | Viewed by 3208

Abstract

In the last decade, the volume of semantic data has increased exponentially, with the number of Resource Description Framework (RDF) datasets exceeding trillions of triples in RDF repositories. Hence, the size of RDF datasets continues to grow. However, with the increasing number of RDF triples, complex multiple RDF queries are becoming a significant demand. Sometimes, such complex queries produce many common sub-expressions in a single query or over multiple queries running as a batch. In addition, it is also difficult to minimize the number of RDF queries and the processing time for a large amount of related data in a typical distributed environment. To address this complication, we introduce a join query processing model for big RDF data, called JQPro. By adopting a MapReduce framework in JQPro, we developed three new algorithms, which are hash-join, sort-merge, and enhanced MapReduce-join, for join query processing of RDF data. Based on an experiment conducted, the result showed that the JQPro model outperformed the two popular algorithms, gStore and RDF-3X, with respect to the average execution time. Furthermore, the JQPro model was also tested against RDF-3X, RDFox, and PARJs using the LUBM benchmark. The result showed that the JQPro model had better performance in comparison with the other models. In conclusion, the findings showed that JQPro achieved a performance improvement of 87.77% in terms of execution time. Hence, in comparison with the selected models, JQPro performs better. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

19 pages, 3590 KB

Open Access Article

Blockchain-Based Distributed Federated Learning in Smart Grid

by Marcel Antal, Vlad Mihailescu, Tudor Cioara and Ionut Anghel

Mathematics 2022, 10(23), 4499; https://doi.org/10.3390/math10234499 - 29 Nov 2022

Cited by 26 | Viewed by 4424

Abstract

The participation of prosumers in demand-response programs is essential for the success of demand-side management in renewable-powered energy grids. Unfortunately, the engagement is still low due to concerns related to the privacy of their energy data used in the prediction processes. In this paper, we propose a blockchain-based distributed federated learning (FL) technique for energy-demand prediction that combines FL with blockchain to provide data privacy and trust features for energy prosumers. The privacy-sensitive energy data are stored locally at edge prosumer nodes without being revealed to third parties, with only the learned local model weights being shared using a blockchain network. The global federated model is not centralized but distributed and replicated over the blockchain overlay, ensuring the model immutability and provenance of parameter updates. We propose smart contracts to deal with the integration of local machine-learning prediction models with the blockchain, defining functions for the model parameters’ scaling and reduction of blockchain overhead. The centralized, local-edge, and blockchain-integrated models are comparatively evaluated for prediction of energy demand 24 h ahead using a multi-layer perceptron model and the monitored energy data of several prosumers. The results show only a slight decrease in prediction accuracy in the case of blockchain-based distributed FL with reliable data privacy support compared with the centralized learning solution. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

13 pages, 936 KB

Open Access Article

Machine Learning Models for Predicting Romanian Farmers’ Purchase of Crop Insurance

by Codruţa Mare, Daniela Manaţe, Gabriela-Mihaela Mureşan, Simona Laura Dragoş, Cristian Mihai Dragoş and Alexandra-Anca Purcel

Mathematics 2022, 10(19), 3625; https://doi.org/10.3390/math10193625 - 3 Oct 2022

Cited by 9 | Viewed by 3306

Abstract

Considering the large size of the agricultural sector in Romania, increasing the crop insurance adoption rate and identifying the factors that drive adoption can present a real interest in the Romanian market. The main objective of this research was to identify the performance of machine learning (ML) models in predicting Romanian farmers’ purchase of crop insurance based on crop-level and farmer-level characteristics. The data set used contains 721 responses to a survey administered to Romanian farmers in September 2021, and includes both characteristics related to the crop as well as farmer-level socio-demographic attributes, perception about risk, perception about insurers and knowledge about agricultural insurance. Various ML algorithms have been implemented, and among the approaches developed, the Multi-Layer Perceptron Classifier (MLP) and the Linear Support Vector Classifier (SVC) outperform the other algorithms in terms of overall accuracy. Tree-based ensembles were used to identify the most prominent features, which included the farmer’s general perception of risk, their likelihood of engaging in risky behaviour, as well as their level of knowledge about crop insurance. The models implemented in this study could be a useful tool for insurers and policymakers for predicting potential crop insurance ownership. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)

27 pages, 420 KB

Open Access Article

Ridge Regression and the Elastic Net: How Do They Do as Finders of True Regressors and Their Coefficients?

by Rajaram Gana

Mathematics 2022, 10(17), 3057; https://doi.org/10.3390/math10173057 - 24 Aug 2022

Cited by 7 | Viewed by 3481

Abstract

For the linear model $Y = Xb + \text{error}$, where the number of regressors ($p$) exceeds the number of observations ($n$), the Elastic Net (EN) was proposed, in 2005, to estimate $b$. The EN uses both the Lasso, proposed in 1996, and ordinary Ridge Regression (RR), proposed in 1970, to estimate $b$. However, when $p > n$, using only RR to estimate $b$ has not been considered in the literature thus far. Because RR is based on the least-squares framework, only using RR to estimate $b$ is computationally much simpler than using the EN. We propose a generalized ridge regression (GRR) algorithm, a superior alternative to the EN, for estimating $b$ as follows: partition $X$ from left to right so that every partition, but the last one, has 3 observations per regressor; for each partition, we estimate $Y$ with the regressors in that partition using ordinary RR; retain the regressors with statistically significant $t$-ratios and the corresponding RR tuning parameter $k$, by partition; use the retained regressors and $k$ values to re-estimate $Y$ by GRR across all partitions, which yields $b$. Algorithmic efficacy is compared using 4 metrics by simulation, because the algorithm is mathematically intractable. Three metrics, with their probabilities of RR’s superiority over EN in parentheses, are: the proportion of true regressors discovered (99%); the squared distance, from the true coefficients, of the significant coefficients (86%); and the squared distance, from the true coefficients, of estimated coefficients that are both significant and true (74%). The fourth metric is the probability that none of the regressors discovered are true, which for RR and EN is 4% and 25%, respectively. This indicates the additional advantage RR has over the EN in terms of discovering causal regressors. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)
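
As background for the estimators named in the abstract above, the usual textbook forms of ordinary and generalized ridge regression are sketched below. The abstract itself does not state these formulas, so the notation ($\hat{b}_{\mathrm{RR}}$, $\hat{b}_{\mathrm{GRR}}$, $K$, $I_p$) is an assumption made here for illustration; only $Y$, $X$, $b$, $k$ and $p$ come from the abstract.

$$
\hat{b}_{\mathrm{RR}} = \left(X^{\top}X + kI_p\right)^{-1}X^{\top}Y,
\qquad
\hat{b}_{\mathrm{GRR}} = \left(X^{\top}X + K\right)^{-1}X^{\top}Y,
\quad K = \operatorname{diag}(k_1, \dots, k_p).
$$

In this form, ordinary RR applies a single tuning parameter $k \ge 0$ to all regressors, while GRR allows a separate $k_j \ge 0$ for each regressor, which is consistent with the abstract's description of carrying partition-specific $k$ values into a final GRR fit.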

22 pages, 483 KB

Open Access Article

Efficient Mining Support-Confidence Based Framework Generalized Association Rules

by Amira Mouakher, Fahima Hajjej and Sarra Ayouni

Mathematics 2022, 10(7), 1163; https://doi.org/10.3390/math10071163 - 3 Apr 2022

Cited by 4 | Viewed by 3111

Abstract

Mining association rules is one of the most critical data mining problems, intensively studied since its inception. Several approaches have been proposed in the literature to extend the basic association rule framework to extract more general rules, including the negation operator. Thereby, this extension is expected to bring valuable knowledge about an examined dataset to the user. However, the efficient extraction of such rules is challenging, especially for sparse datasets. This paper focuses on the extraction of literalsets, i.e., sets of present and absent items. Consequently, generalized association rules can be straightforwardly derived from these literalsets. To this end, we introduce and prove the soundness of a theorem that paves the way to speed up the costly computation of the support of a literalset. Furthermore, we introduce FasterIE, an efficient algorithm that puts the proved theorem to work to efficiently extract the whole set of frequent literalsets. Thus, the FasterIE algorithm is shown to devise very efficient strategies, which minimize as far as possible the number of node visits in the explored search space. Finally, we have carried out experiments on benchmark datasets to back the effectiveness claim of the proposed algorithm versus its competitors. Full article

(This article belongs to the Special Issue Machine Learning, Statistics and Big Data)
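
To make the notion of a literalset's support concrete, here is a minimal sketch that computes it by a direct scan of the transactions. This is a naive baseline and not the FasterIE algorithm described in the abstract above; the function name `literalset_support` and the toy transactions are hypothetical.

```python
def literalset_support(transactions, present, absent):
    """Fraction of transactions that contain every item in `present`
    and none of the items in `absent`."""
    present, absent = set(present), set(absent)
    if not transactions:
        return 0.0
    hits = sum(
        1 for t in transactions
        if present <= set(t) and absent.isdisjoint(t)
    )
    return hits / len(transactions)


# Toy transaction database (hypothetical items).
transactions = [{"a", "b"}, {"a"}, {"a", "c"}, {"b", "c"}]

# Support of the literalset "a present, b absent".
support = literalset_support(transactions, present={"a"}, absent={"b"})
```

A full scan like this touches every transaction for every candidate literalset, which gives some sense of why the abstract describes support computation as costly.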
