Advances in Machine Learning Prediction Models

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (15 December 2020) | Viewed by 61018

Special Issue Editors

1. Institute of Structural Mechanics, Bauhaus University Weimar, Weimar, Germany
2. School of the Built Environment, Oxford Brookes University, Oxford, UK
3. Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary
4. Queensland University of Technology, 130 Victoria Park Road, Queensland, Australia
Interests: machine learning; deep learning; ensemble models; hybrid models; applied mathematics; soft computing; deep reinforcement learning; machine learning for big data; mathematical IT; hydropower modeling; prediction models; time series prediction; business intelligence; climate models; machine learning for remote sensing; hazard models; extreme events; atmospheric model; forecasting models; predictive analytics; meta-heuristic techniques

Special Issue Information

Dear Colleagues,

This Special Issue is devoted to recent advances in prediction models. Novel methods, new applications, comparative analyses of models, case studies, and state-of-the-art review papers are particularly welcome. Prediction models are essential to many scientific domains and are gaining widespread popularity. Health care, cybersecurity, education, credit card fraud detection, social media, cloud computing, software measurement, quality and defect simulation, cost and effort estimation, software reuse and evaluation, computational mechanics, theoretical physics, astrophysics, materials design innovation, disease diagnosis, hydrological modeling, earth systems, atmospheric sciences, weather and extreme event prediction, hazard mapping, natural disaster warning systems, policy-making, energy systems, time-series forecasting, and climate change modeling are among the popular applications of prediction models in the literature. The benefits and generalizability of prediction models across technological and scientific domains have greatly boosted the progress, competitiveness, and research impact of many fields.

Very recently, prediction models have been fundamentally revolutionized by the availability of massive computational power, big data technologies, efficient data handling and preprocessing methods, and, most importantly, intelligent learning algorithms. Novel machine learning methods integrated with intelligent optimization and various soft computing techniques, hybrid and deep learning methods, and ensemble techniques are emerging fast and deliver models with higher accuracy. In response to these recent advances, the objective of this Special Issue is to present a collection of notable methods and applications of prediction models. We invite scientists from around the world to contribute to a comprehensive collection of papers on the progress and high impact of prediction models.

Prof. Dr. Timon Rabczuk
Prof. Amir Mosavi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Prediction models
  • Machine learning
  • Predictive analytics
  • Prescriptive analytics
  • Deep learning
  • Deep reinforcement learning
  • Hybrid models
  • Ensemble models
  • Soft computing
  • Machine learning for big data
  • Advanced statistical learning
  • Short-term prediction models
  • Long-term prediction models
  • Business intelligence
  • Intelligent optimization
  • Data-driven models
  • Feature selection
  • Meta-heuristic techniques
  • Reinforcement learning schemes
  • Preprocessing methods
  • Energy prediction models
  • Climate prediction models
  • Time series forecasting
  • Advanced regression techniques
  • Advanced classification techniques

Published Papers (11 papers)

Research

15 pages, 519 KiB  
Article
Cost-Sensitive Variable Selection for Multi-Class Imbalanced Datasets Using Bayesian Networks
by Darío Ramos-López and Ana D. Maldonado
Mathematics 2021, 9(2), 156; https://doi.org/10.3390/math9020156 - 13 Jan 2021
Cited by 5 | Viewed by 2626
Abstract
Multi-class classification in imbalanced datasets is a challenging problem. In these cases, common validation metrics (such as accuracy or recall) are often not suitable. In many of these problems, particularly real-world problems related to health, some classification errors may be tolerated, whereas others are to be avoided completely. Therefore, a cost-sensitive variable selection procedure for building a Bayesian network classifier is proposed. In it, a flexible validation metric (cost/loss function) encoding the impact of the different classification errors is employed. Thus, the model is trained to optimize the a priori specified cost function. The proposed approach was applied to forecasting an air quality index using current levels of air pollutants and climatic variables from a highly imbalanced dataset. For this problem, the method yielded better results than other standard validation metrics in the less frequent class states. The possibility of fine-tuning the objective validation function can improve the prediction quality in imbalanced data or when asymmetric misclassification costs have to be considered.
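The abstract above turns on one idea: score a classifier by the cost of its errors rather than by its accuracy. The Python sketch below is not the authors' Bayesian-network variable-selection procedure; it shows only that evaluation step. The three air-quality classes, the cost matrix, and both confusion matrices are hypothetical values chosen for illustration.

```python
# Minimal sketch of cost-sensitive evaluation (not the paper's implementation).
# Rows of each matrix are true classes, columns are predicted classes.

def accuracy(confusion):
    """Fraction of instances on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total


def expected_cost(confusion, cost):
    """Average misclassification cost per instance under a cost matrix."""
    total = sum(sum(row) for row in confusion)
    weighted = sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )
    return weighted / total


# Hypothetical classes: 0 = good, 1 = moderate, 2 = unhealthy (rare).
# Predicting "good" when the air is truly "unhealthy" is the costliest error.
COST = [
    [0, 1, 2],
    [1, 0, 1],
    [10, 5, 0],
]

# Hypothetical confusion matrices for two classifiers on the same 100 cases.
model_a = [[80, 5, 0], [5, 5, 0], [4, 0, 1]]
model_b = [[70, 10, 5], [5, 5, 0], [0, 1, 4]]

for name, cm in [("A", model_a), ("B", model_b)]:
    print(name, "accuracy:", accuracy(cm), "expected cost:", expected_cost(cm, COST))
```

Model A is more accurate (0.86 vs. 0.79) but has the higher expected cost (0.50 vs. 0.30) because it misses most of the rare unhealthy cases, so a procedure that minimizes this cost function would prefer model B.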
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

19 pages, 19318 KiB  
Article
Learning Medical Image Denoising with Deep Dynamic Residual Attention Network
by S M A Sharif, Rizwan Ali Naqvi and Mithun Biswas
Mathematics 2020, 8(12), 2192; https://doi.org/10.3390/math8122192 - 9 Dec 2020
Cited by 32 | Viewed by 5502
Abstract
Image denoising plays a prominent role in medical image analysis. In many cases, it can drastically accelerate the diagnostic process by enhancing the perceptual quality of noisy image samples. However, despite the extensive practicability of medical image denoising, existing denoising methods show deficiencies in addressing the diverse range of noise that appears in multidisciplinary medical images. This study addresses this challenging denoising task by learning residual noise from a substantial number of data samples. Additionally, the proposed method accelerates the learning process by introducing a novel deep network whose architecture exploits feature correlation, known as the attention mechanism, and combines it with spatially refined residual features. The experimental results show that the proposed method can outperform existing works by a substantial margin in both quantitative and qualitative comparisons. The proposed method can also handle real-world image noise and can improve the performance of different medical image analysis tasks without producing visually disturbing artefacts.
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

22 pages, 354 KiB  
Article
Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM
by Xin Liu, Bangxin Zhao and Wenqing He
Mathematics 2020, 8(10), 1846; https://doi.org/10.3390/math8101846 - 20 Oct 2020
Cited by 6 | Viewed by 2288
Abstract
Simultaneous feature selection and classification has been explored in the literature to extend support vector machine (SVM) techniques by adding penalty terms directly to the loss function. However, it is the kernel function that controls the performance of the SVM, and an imbalance in the data will deteriorate the performance of an SVM. In this paper, we examine a new method of simultaneous feature selection and binary classification. Instead of adding the penalty to the standard loss function of the SVM, a penalty is added directly to the data-adaptive kernel function to control the performance of the SVM, by first conformally transforming the kernel functions of the SVM and then re-conducting an SVM classifier based on the sparse features selected. Both convex and non-convex penalties, such as the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP), are explored, and the oracle property of the estimator is established accordingly. An iterative optimization procedure is applied, as no analytic form of the estimated coefficients is available. Numerical comparisons show that the proposed method outperforms the competitors considered when data are imbalanced, and it performs similarly to the competitors when data are balanced. The method can be easily applied to medical images from different platforms.
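The three penalties named above have standard closed forms. The Python sketch below writes them out for a single coefficient; the default shape parameters (a = 3.7 for SCAD, gamma = 3.0 for MCP) are conventional choices assumed here, not values taken from the paper, and the data-adaptive kernel transformation itself is not reproduced.

```python
# Standard penalty functions for one coefficient beta with tuning parameter lam.

def lasso_penalty(beta, lam):
    """L1 (LASSO) penalty: lam * |beta|, convex and linear in |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty (requires a > 2)."""
    b = abs(beta)
    if b <= lam:
        return lam * b  # behaves like LASSO near zero
    if b <= a * lam:
        return (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))  # quadratic blend
    return lam ** 2 * (a + 1) / 2  # constant: large coefficients are not shrunk further


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (requires gamma > 1)."""
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2  # constant beyond gamma * lam


for beta in (0.5, 2.0, 10.0):
    print(beta, lasso_penalty(beta, 1.0), scad_penalty(beta, 1.0), mcp_penalty(beta, 1.0))
```

LASSO keeps growing linearly in |beta|, whereas SCAD and MCP level off at a constant, so large coefficients are not over-shrunk; this reduced bias for large coefficients is the usual motivation for the non-convex penalties.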
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

17 pages, 4224 KiB  
Article
Kidney and Renal Tumor Segmentation Using a Hybrid V-Net-Based Model
by Fuat Türk, Murat Lüy and Necaattin Barışçı
Mathematics 2020, 8(10), 1772; https://doi.org/10.3390/math8101772 - 14 Oct 2020
Cited by 49 | Viewed by 6092
Abstract
Kidney tumors represent a type of cancer that people of advanced age are more likely to develop. For this reason, it is important to exercise caution and provide diagnostic tests in the later stages of life. Medical imaging and deep learning methods are becoming increasingly attractive in this sense. Developing deep learning models to help physicians identify tumors with successful segmentation is of great importance. However, few successful systems exist for soft tissue organs, such as the kidneys and the prostate, whose segmentation is relatively difficult. In such cases, V-Net-based models are mostly used. This paper proposes a new hybrid model using the superior features of existing V-Net models. The model represents a more successful system, with improvements in the encoder and decoder phases not previously applied. We believe that this new hybrid V-Net model could help the majority of physicians, particularly those focused on kidney and kidney tumor segmentation. The proposed model showed better segmentation performance than existing imaging models and can be easily integrated into all systems due to its flexible structure and applicability. The hybrid V-Net model exhibited average Dice coefficients of 97.7% and 86.5% for kidney and tumor segmentation, respectively, and therefore could be used as a reliable method for soft tissue organ segmentation.
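The Dice coefficients reported above measure the overlap between a predicted mask A and the ground-truth mask B, Dice = 2|A ∩ B| / (|A| + |B|). A minimal Python sketch over flattened binary masks follows; the toy masks are hypothetical, and returning 1.0 when both masks are empty is a common convention assumed here.

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two flattened binary masks (sequences of 0/1)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total


# Hypothetical 7-voxel masks: 4 predicted voxels, 5 true voxels, 3 shared.
predicted_mask = [0, 1, 1, 1, 1, 0, 0]
true_mask = [0, 0, 1, 1, 1, 1, 1]
print(dice_coefficient(predicted_mask, true_mask))  # 2*3 / (4+5)
```

Unlike pixel accuracy, Dice ignores the large background region, which is why it is the usual choice when a small structure such as a tumor must be segmented.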
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

11 pages, 1210 KiB  
Article
Attention-Enhanced Graph Neural Networks for Session-Based Recommendation
by Baocheng Wang and Wentao Cai
Mathematics 2020, 8(9), 1607; https://doi.org/10.3390/math8091607 - 18 Sep 2020
Cited by 7 | Viewed by 5006
Abstract
Session-based recommendation, which aims to match user needs with rich resources based on anonymous sessions, nowadays plays a critical role in various online platforms (e.g., media streaming sites, search, and e-commerce). Existing recommendation algorithms usually model a session as a sequence or a session graph to model transitions between items. Despite their effectiveness, we argue that these methods still have two shortcomings: (1) they use only fixed session item embeddings without considering the diversity of users' interests and target items; and (2) for users' long-term interests, they struggle to accurately capture the different priorities of different items. To tackle these defects, we propose a novel model that leverages both a target attentive network and a self-attention network to improve the graph neural network (GNN)-based recommender. In our model, we first model users' interaction sequences as session graphs, which serve as the input of the GNN, and each node vector in a session graph is obtained via the GNN. Next, the target attentive network activates different user interests corresponding to varied target items (i.e., the learned session embedding varies with different target items), which can reveal the relevance between users' interests and target items. Finally, after applying the self-attention mechanism, the different priorities of different items can be captured to improve the precision of the long-term session representation. By using a hybrid of long-term and short-term session representations, we can capture users' comprehensive interests at multiple levels. Extensive experiments on two real-world datasets demonstrate the effectiveness of our algorithm for session-based recommendation.
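The target attentive step can be illustrated with a small NumPy sketch: each item vector in the session is weighted by a softmax over its relevance to a candidate target item, so the resulting session embedding changes with the target. This is a simplified, assumed form (a plain dot product, or an optional scoring matrix, stands in for whatever learned scoring function the model uses), not the authors' code, and the item vectors are hypothetical stand-ins for GNN outputs.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - np.max(x))
    return z / z.sum()


def target_aware_session_embedding(item_vectors, target_vector, weight=None):
    """Attend over session items with respect to one candidate target item.

    item_vectors: (n_items, d) array, e.g. node vectors produced by a GNN.
    target_vector: (d,) embedding of the candidate target item.
    weight: optional (d, d) scoring matrix; identity (plain dot product) if None.
    """
    if weight is None:
        weight = np.eye(item_vectors.shape[1])
    scores = item_vectors @ weight @ target_vector  # one relevance score per item
    alphas = softmax(scores)                        # attention weights, sum to 1
    return alphas @ item_vectors, alphas            # weighted sum of item vectors


# Hypothetical 2-D item vectors for a three-item session.
session = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
for target in (np.array([5.0, 0.0]), np.array([0.0, 5.0])):
    embedding, alphas = target_aware_session_embedding(session, target)
    print(target, np.round(alphas, 3), np.round(embedding, 3))
```

With the first target the first item dominates the attention weights, and with the second target the second item does, so the same session yields two different embeddings.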
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

15 pages, 3553 KiB  
Article
Dimension Reduction of Machine Learning-Based Forecasting Models Employing Principal Component Analysis
by Yinghui Meng, Sultan Noman Qasem, Manouchehr Shokri and Shahab S
Mathematics 2020, 8(8), 1233; https://doi.org/10.3390/math8081233 - 27 Jul 2020
Cited by 12 | Viewed by 2533
Abstract
In this research, an attempt was made to reduce the dimension of wavelet-ANFIS/ANN (adaptive neuro-fuzzy inference system/artificial neural network) models toward reliable forecasts, as well as to decrease computational cost. To this end, principal component analysis (PCA) was performed on the input time series decomposed by a discrete wavelet transform to feed the ANN/ANFIS models. The models were applied to forecasting dissolved oxygen (DO) in rivers, an important variable affecting aquatic life and water quality. The current values of DO, water surface temperature, salinity, and turbidity were considered as the input variables to forecast DO three time steps ahead. The results of the study revealed that PCA can be employed as a powerful tool for dimension reduction of input variables and also to detect inter-correlation of input variables. Results of the PCA-wavelet-ANN models are compared with those obtained from wavelet-ANN models; the former have the advantage of less computational time than the latter. For ANFIS models, PCA is more beneficial because it prevents wavelet-ANFIS models from creating too many rules, which deteriorate the efficiency of the ANFIS models. Moreover, manipulating the wavelet-ANFIS models utilizing PCA leads to a significant decrease in computational time. Finally, it was found that the PCA-wavelet-ANN/ANFIS models can provide reliable forecasts of dissolved oxygen as an important water quality indicator in rivers.
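The PCA step can be sketched in a few lines of NumPy: center the input matrix, take its singular value decomposition, and keep the smallest number of components whose cumulative explained variance reaches a threshold. The 95% threshold and the synthetic, deliberately inter-correlated inputs are assumptions for illustration; the wavelet decomposition and the ANN/ANFIS models are not reproduced.

```python
import numpy as np


def pca_reduce(X, variance_to_keep=0.95):
    """Project X (n_samples, n_features) onto its leading principal components.

    Returns the component scores, the explained-variance ratio of the kept
    components, and the component loadings.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each input variable
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)          # variance ratio per component
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    k = min(k, len(s))                           # guard against rounding near 1.0
    scores = Xc @ vt[:k].T                       # reduced inputs for the model
    return scores, explained[:k], vt[:k]


# Synthetic stand-in for decomposed sub-series: two strongly correlated inputs
# plus one low-variance input.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([
    base,
    2.0 * base + 0.05 * rng.normal(size=(200, 1)),
    0.05 * rng.normal(size=(200, 1)),
])

scores, ratios, loadings = pca_reduce(X)
print(scores.shape, np.round(ratios, 4))
```

Three correlated inputs collapse to a single component here, which mirrors the abstract's point that PCA both shrinks the model input and exposes inter-correlation among the input variables.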
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

17 pages, 3189 KiB  
Article
Machine Learning Modeling of Aerobic Biodegradation for Azo Dyes and Hexavalent Chromium
by Zulfiqar Ahmad, Hua Zhong, Amir Mosavi, Mehreen Sadiq, Hira Saleem, Azeem Khalid, Shahid Mahmood and Narjes Nabipour
Mathematics 2020, 8(6), 913; https://doi.org/10.3390/math8060913 - 4 Jun 2020
Cited by 16 | Viewed by 2991
Abstract
The present study emphasizes the efficacy of a biosurfactant-producing bacterial strain, Klebsiella sp. KOD36, in the biodegradation of azo dyes and hexavalent chromium, both individually and in a simultaneous system. The bacterial strain exhibited considerable potential for biodegradation of chromium and azo dyes in single and combined systems (maximum 97% and 94% in the individual and combined systems, respectively). Simultaneous aerobic biodegradation of azo dyes and hexavalent chromium (SBAHC) was modeled using machine learning methods, including gene expression programming, random forest, support vector regression, and support vector regression-fruit fly optimization algorithm. The correlation coefficient, the scatter index, and Willmott's index of agreement were employed as statistical metrics to assess the performance of each model separately. In addition, the Taylor diagram was used to further investigate the methods used. The study found that the support vector regression-fruit fly optimization algorithm (SVR-FOA), with a correlation coefficient (CC) of 0.644, scatter index (SI) of 0.374, and Willmott's index of agreement (WI) of 0.607, performed better than the standalone support vector regression (SVR), gene expression programming (GEP), and random forest (RF) methods. The standalone SVR model, with a CC of 0.146, SI of 0.473, and WI of 0.408, ranked second best. In summary, SBAHC can be accurately estimated using the hybrid SVR-FOA method. In other words, FOA has proven to be a powerful optimization algorithm for increasing the accuracy of the SVR method.
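The three statistics used to rank the models can be written directly. The Python sketch below uses common textbook definitions: Pearson's correlation coefficient, the scatter index as RMSE divided by the mean of the observations, and Willmott's index of agreement. The paper may use slightly different variants, and the sample values are hypothetical.

```python
from math import sqrt


def correlation_coefficient(obs, pred):
    """Pearson correlation coefficient (CC) between observations and predictions."""
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = sqrt(sum((o - mo) ** 2 for o in obs))
    sp = sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def scatter_index(obs, pred):
    """Scatter index (SI): RMSE normalized by the mean observation (lower is better)."""
    rmse = sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return rmse / (sum(obs) / len(obs))


def willmott_index(obs, pred):
    """Willmott's index of agreement (WI), between 0 and 1 (higher is better)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(correlation_coefficient(observed, predicted),
      scatter_index(observed, predicted),
      willmott_index(observed, predicted))
```

A perfect model gives CC = 1, SI = 0, and WI = 1, so the SVR-FOA scores quoted above (CC 0.644, SI 0.374, WI 0.607) can be read against those ideal values.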
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

46 pages, 4225 KiB  
Article
A Novel and Simple Mathematical Transform Improves the Performance of Lernmatrix in Pattern Classification
by José-Luis Velázquez-Rodríguez, Yenny Villuendas-Rey, Oscar Camacho-Nieto and Cornelio Yáñez-Márquez
Mathematics 2020, 8(5), 732; https://doi.org/10.3390/math8050732 - 6 May 2020
Cited by 3 | Viewed by 2913
Abstract
The Lernmatrix is a classic associative memory model. It is capable of executing the pattern classification task, but its performance is not competitive compared with state-of-the-art classifiers. The main contribution of this paper is the proposal of a simple mathematical transform whose application eliminates the subtractive alterations between patterns. As a consequence, the Lernmatrix performance is significantly improved. To perform the experiments, we selected 20 datasets that are challenging for any classifier, as they exhibit class imbalance. The effectiveness of our proposal was compared against seven supervised classifiers representing the most important approaches (Bayes, nearest neighbors, decision trees, logistic function, support vector machines, and neural networks). Using balanced accuracy as the performance measure, our proposal obtained the best results on 10 datasets. The elimination of subtractive alterations makes the new model competitive against the best classifiers, and it sometimes beats them. After applying the Friedman test and the Holm post hoc test, we can conclude with 95% confidence that our proposal competes successfully with the most effective state-of-the-art classifiers.
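Balanced accuracy, the measure chosen above, is the mean of the per-class recalls, so a classifier cannot score well on an imbalanced dataset simply by always predicting the majority class. A short Python sketch with hypothetical labels:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))  # recall of class c
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: eight majority-class and two minority-class cases.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10
plain_accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
print(plain_accuracy, balanced_accuracy(y_true, always_majority))  # 0.8 0.5
```

The trivial majority-class predictor looks good under plain accuracy (0.8) but drops to chance level (0.5) under balanced accuracy, which is why the latter suits the 20 imbalanced datasets.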
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

18 pages, 4078 KiB  
Article
Improving the Accuracy of Convolutional Neural Networks by Identifying and Removing Outlier Images in Datasets Using t-SNE
by Husein Perez and Joseph H. M. Tah
Mathematics 2020, 8(5), 662; https://doi.org/10.3390/math8050662 - 27 Apr 2020
Cited by 46 | Viewed by 6895
Abstract
In the field of supervised machine learning, the quality of a classifier model is directly correlated with the quality of the data used to train it. The presence of unwanted outliers in the data could significantly reduce the accuracy of a model or, even worse, result in a biased model leading to inaccurate classification. Identifying and eliminating outliers is, therefore, crucial for building good-quality training datasets. Pre-processing procedures for dealing with missing and outlier data, commonly known as feature engineering, are standard practice in machine learning problems. They help to make better assumptions about the data and also prepare datasets in a way that best exposes the underlying problem to the machine learning algorithms. In this work, we propose a multistage method for detecting and removing outliers in high-dimensional data. Our proposed method uses t-distributed stochastic neighbour embedding (t-SNE) to reduce a high-dimensional map of features into a lower, two-dimensional probability density distribution, and then uses a simple descriptive statistical method, the interquartile range (IQR), to identify outlier values in the density distribution of the features. t-SNE is a machine learning algorithm and a nonlinear dimensionality reduction technique well suited for embedding high-dimensional data for visualisation in a low-dimensional space of two or three dimensions. We applied this method to a dataset of images used to train a convolutional neural network model (ConvNet) for an image classification problem. The dataset contains four classes of images: three classes contain construction defects (mould, stain, and paint deterioration), and one is a no-defect class (normal). We used transfer learning to modify a pre-trained VGG-16 model, which served both as a feature extractor and as a benchmark to evaluate our method.
We have shown that this method can identify and remove the outlier images in the dataset. After removing the outlier images and re-training the VGG-16 model, the results also showed that classification accuracy improved significantly and the number of misclassified cases dropped. While many feature engineering techniques for handling missing and outlier data are common in predictive machine learning problems involving numerical or categorical data, there is little work on techniques for handling outliers in high-dimensional data that could improve machine learning problems involving images, such as ConvNet models for image classification and object detection.
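The IQR rule in the second stage flags a value as an outlier if it falls below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The Python sketch below applies that rule to each axis of a two-dimensional embedding. The points are hypothetical stand-ins for t-SNE output (computing t-SNE itself is left to a library), the 1.5 multiplier is the conventional default rather than a value from the paper, and flagging a point when either coordinate is out of range is a simplification of the density-based procedure described above.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) IQR fences for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles Q1, Q2, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers_2d(points, k=1.5):
    """Indices of 2-D points lying outside the IQR fences on either axis."""
    x_lo, x_hi = iqr_bounds([p[0] for p in points], k)
    y_lo, y_hi = iqr_bounds([p[1] for p in points], k)
    return [
        i for i, (x, y) in enumerate(points)
        if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi)
    ]


# Hypothetical 2-D embedding of seven images; the last one sits far from the rest.
embedding = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (0.1, 0.1),
             (-0.1, 0.0), (0.0, -0.1), (8.0, 9.0)]
print(iqr_outliers_2d(embedding))
```

The flagged indices would then map back to the corresponding images, which are removed before the ConvNet is re-trained.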
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

17 pages, 3296 KiB  
Article
Deep Hybrid Model Based on EMD with Classification by Frequency Characteristics for Long-Term Air Quality Prediction
by Xue-Bo Jin, Nian-Xiang Yang, Xiao-Yi Wang, Yu-Ting Bai, Ting-Li Su and Jian-Lei Kong
Mathematics 2020, 8(2), 214; https://doi.org/10.3390/math8020214 - 7 Feb 2020
Cited by 70 | Viewed by 4895
Abstract
Air pollution (mainly PM2.5) is one of the main environmental problems affecting air quality. Air pollution prediction and early warning are prerequisites for air pollution prevention and control. However, it is not easy to accurately predict the long-term trend because the collected PM2.5 data exhibit complex nonlinearity with multiple components of different frequency characteristics. This study proposes a hybrid deep learning predictor in which the PM2.5 data are first decomposed into components by empirical mode decomposition (EMD), and a convolutional neural network (CNN) is built to classify all the components into a fixed number of groups based on their frequency characteristics. Then, a gated recurrent unit (GRU) network is trained for each group as a sub-predictor, and the results from the three GRUs are fused to obtain the prediction result. Experiments based on PM2.5 data from Beijing verify the proposed model, and the prediction results show that the decomposition and classification greatly improve the accuracy of the proposed predictor for air pollution prediction.
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)

Review

25 pages, 5206 KiB  
Review
Data Science in Economics: Comprehensive Review of Advanced Machine Learning and Deep Learning Methods
by Saeed Nosratabadi, Amirhosein Mosavi, Puhong Duan, Pedram Ghamisi, Ferdinand Filip, Shahab S. Band, Uwe Reuter, Joao Gama and Amir H. Gandomi
Mathematics 2020, 8(10), 1799; https://doi.org/10.3390/math8101799 - 16 Oct 2020
Cited by 90 | Viewed by 17345
Abstract
This paper provides a comprehensive state-of-the-art investigation of recent advances in data science for emerging economic applications. The analysis covers novel data science methods in four classes: deep learning models, hybrid deep learning models, hybrid machine learning, and ensemble models. Application domains include a broad and diverse range of economics research, from the stock market, marketing, and e-commerce to corporate banking and cryptocurrency. The PRISMA method, a systematic literature review methodology, is used to ensure the quality of the survey. The findings reveal that the trends follow the advancement of hybrid models, which outperform other learning algorithms. It is further expected that the trends will converge toward the evolution of sophisticated hybrid deep learning models.
(This article belongs to the Special Issue Advances in Machine Learning Prediction Models)
