Second Edition of Predictive Analytics and Data Science

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information and Communications Technology".

Deadline for manuscript submissions: closed (30 June 2024) | Viewed by 16735

Special Issue Editors


Guest Editor
Department of Computer Science and Systems Technology, University of Pannonia, 8200 Veszprém, Hungary
Interests: artificial intelligence; machine learning; data mining; health informatics; network analysis

Guest Editor
MTA-PE Lendület Complex Systems Monitoring Research Group, Department of Process Engineering, University of Pannonia, H-8200 Veszprém, Hungary
Interests: chemical engineering; complex systems; computational intelligence; network science; process engineering

Special Issue Information

Dear Colleagues,

The development and maintenance of predictive data-driven models pose several challenges, such as feature selection, model structure optimisation, sensitivity analysis, model validation, model maintenance, transfer learning and adaptation, model deployment, and evaluation of the benefit of applying the models.

This Special Issue solicits papers covering the development, validation, application, and maintenance of predictive analytics models and presenting real-life applications. The potential topics include, but are not limited to:

  • Classification-based prediction models;
  • Regression-based prediction models;
  • Forecasting using deep learning methods and algorithms;
  • Managing uncertainty and missing data in forecasting;
  • The life cycle of predictive models, and maintaining predictive models;
  • Development and validation of online predictive models;
  • Self-learning predictive models;
  • Predictive analytics in Industry 4.0 (application of sensors, historical experience);
  • Predictive analysis in healthcare and the economy (e.g., patient pathway prediction, predicting complications, customer relationship management, risk reduction, churn prevention, market trend analysis, credit scoring);
  • Social media and text-analysis-based predictive models and systems.

Dr. Agnes Vathy-Fogarassy
Prof. Dr. János Abonyi
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • classification
  • regression
  • deep learning
  • uncertainty
  • validation and maintenance
  • self-learning
  • real-life applications

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (11 papers)


Research

14 pages, 2317 KiB  
Article
Gender Prediction of Generated Tweets Using Generative AI
by Jalal S. Alowibdi
Information 2024, 15(8), 452; https://doi.org/10.3390/info15080452 - 1 Aug 2024
Viewed by 398
Abstract
With the use of Generative AI (GenAI), Online Social Networks (OSNs) now generate a huge volume of content data. Yet, user-generated content on OSNs, aided by GenAI, presents challenges in analyzing and understanding its characteristics. In particular, tweets generated by GenAI at the request of authentic human users present difficulties in determining the gendered variation of the content. The vast amount of data generated from tweets’ content necessitates a thorough investigation into the gender-specific language used in these tweets. This study explores the task of predicting the gender of text content in tweets generated by GenAI. Through our analysis and experimentation, we have achieved a remarkable 90% accuracy in attributing gender-specific language to these tweets. Our research not only highlights the potential of GenAI in gender prediction but also underscores the sophisticated techniques employed to decipher the refined linguistic cues that differentiate male and female language in GenAI-generated content. This advancement in understanding and predicting gender-specific language in GenAI-generated tweets paves the way for more refined and accurate content analysis in the evolving landscape of OSNs.
(This article belongs to the Special Issue Second Edition of Predictive Analytics and Data Science)

20 pages, 7704 KiB  
Article
Anomaly Prediction in Solar Photovoltaic (PV) Systems via Rayleigh Distribution with Integrated Internet of Sensing Things (IoST) Monitoring and Dynamic Sun-Tracking
by Tajim Md. Niamat Ullah Akhund, Nafisha Tamanna Nice, Muftain Ahmed Joy, Tanvir Ahmed and Md Whaiduzzaman
Information 2024, 15(8), 451; https://doi.org/10.3390/info15080451 - 1 Aug 2024
Viewed by 550
Abstract
The proliferation of solar panel installations presents significant societal and environmental advantages. However, many panels are situated in remote or inaccessible locations, like rooftops or vast desert expanses. Moreover, monitoring individual panel performance in large-scale systems poses a logistical challenge. Addressing this issue necessitates an efficient surveillance system leveraging wide area networks. This paper introduces an Internet of Sensing Things (IoST)-based monitoring system integrated with sun-tracking capabilities for solar panels. Cutting-edge sensors and microcontrollers collect real-time data and securely store it in a cloud-based server infrastructure, enabling global accessibility and comprehensive analysis for future optimization. Innovative techniques are proposed to maximize power generation from sunlight radiation, achieved through continuous panel alignment with the sun’s position throughout the day. A solar tracking mechanism, utilizing light-dependent sensors and servo motors, dynamically adjusts panel orientation based on the sun’s angle of elevation and direction. This research contributes to the advancement of efficient and sustainable solar energy systems. Integrating state-of-the-art technologies ensures reliability and effectiveness, paving the way for enhanced performance and the widespread adoption of solar energy. Additionally, the paper explores anomaly prediction using Rayleigh distribution, offering insights into potential irregularities in solar panel performance.
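The abstract does not say how the Rayleigh distribution is applied, so the following is only a minimal sketch of one common approach, not the authors' method: fit the Rayleigh scale parameter to baseline readings by maximum likelihood, then flag new readings that fall in the upper tail. The function names, the sample values, and the `tail_prob` threshold are all hypothetical.

```python
import math


def fit_rayleigh_scale(samples):
    """Maximum-likelihood estimate of the Rayleigh scale parameter sigma."""
    if not samples:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(x * x for x in samples) / (2 * len(samples)))


def rayleigh_cdf(x, sigma):
    """P(X <= x) for a Rayleigh distribution with scale sigma."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-(x * x) / (2 * sigma * sigma))


def flag_anomalies(readings, sigma, tail_prob=0.01):
    """Return indices of readings whose upper-tail probability is below tail_prob."""
    return [
        i for i, x in enumerate(readings)
        if 1.0 - rayleigh_cdf(x, sigma) < tail_prob
    ]


# Hypothetical deviation magnitudes recorded during normal operation.
baseline = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2]
sigma = fit_rayleigh_scale(baseline)

# Hypothetical new readings; only the last one is far out in the tail.
new_readings = [0.9, 1.4, 4.0]
print(flag_anomalies(new_readings, sigma))  # prints [2]
```

A fixed tail probability keeps the rule easy to interpret; in practice the threshold would need to be tuned against labelled fault data.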

23 pages, 1194 KiB  
Article
A Data-Driven Approach to Set-Theoretic Model Predictive Control for Nonlinear Systems
by Francesco Giannini and Domenico Famularo
Information 2024, 15(7), 369; https://doi.org/10.3390/info15070369 - 23 Jun 2024
Viewed by 753
Abstract
In this paper, we present a data-driven model predictive control (DDMPC) framework specifically designed for constrained single-input single-output (SISO) nonlinear systems. Our approach involves customizing a set-theoretic receding horizon controller within a data-driven context. To achieve this, we translate model-based conditions into data series of available input and output signals. This translation process leverages recent advances in data-driven control theory, enabling the controller to operate effectively without relying on explicit system models. The proposed framework incorporates a robust methodology for managing system constraints, ensuring that the control actions remain within predefined bounds. By means of time sequences, the controller learns the underlying system dynamics and adapts to changes in real time, providing enhanced performance and reliability. The integration of set-theoretic methods allows for the systematic handling of uncertainties and disturbances, which are common when the trajectory of a nonlinear system is embedded inside a linear trajectory state tube. To validate the effectiveness of our DDMPC framework, we conduct extensive simulations on a nonlinear DC motor system. The results demonstrate significant improvements in control performance, highlighting the robustness and adaptability of our approach compared to traditional model-based MPC techniques.

9 pages, 779 KiB  
Article
Clustering Offensive Strategies in Australian-Rules Football Using Social Network Analysis
by Zachery Born, Marion Mundt, Ajmal Mian, Jason Weber and Jacqueline Alderson
Information 2024, 15(6), 364; https://doi.org/10.3390/info15060364 - 20 Jun 2024
Viewed by 779
Abstract
Sports teams aim to understand the tactical behaviour of their opposition to gain a competitive advantage. Prior research of tactical behaviour in team sports has predominantly focused on the relationship between key performance indicators and match outcomes. However, key performance indicators fail to capture the patterns of ball movement deployed by teams, which provide deeper insight into a team’s playing style. The purpose of this study was to quantify existing ball movement strategies in Australian-rules Football (AF) using detailed descriptions of possession types from 396 matches of the 2019 season. Ball movement patterns were measured by social network analysis for each team during offensive phases of play. K-means clustering identified four unique offensive strategies. The most successful offensive strategy, defined by the number of matches won (83/396), achieved a win/loss ratio of 1.69 and was characterised by low ball movement predictability, low reliance on well-connected athletes, and a high number of passes. This study’s insights into offensive strategy are instructional to AF coaches and high-performance support staff. The outcomes of this study can be used to support the design of tactical training and inform match-day decisions surrounding optimal offensive strategies.

24 pages, 517 KiB  
Article
A Comparison of Mixed and Partial Membership Diagnostic Classification Models with Multidimensional Item Response Models
by Alexander Robitzsch 
Information 2024, 15(6), 331; https://doi.org/10.3390/info15060331 - 5 Jun 2024
Viewed by 601
Abstract
Diagnostic classification models (DCMs) are latent structure models with discrete multivariate latent variables. Recently, extensions of DCMs to mixed membership have been proposed. In this article, ordinary DCMs, mixed and partial membership models, and multidimensional item response theory (IRT) models are compared through analytical derivations, three example datasets, and a simulation study. It is concluded that partial membership DCMs are similar, if not structurally equivalent, to sufficiently complex multidimensional IRT models.

38 pages, 6061 KiB
Article
Advanced Machine Learning Techniques for Predictive Modeling of Property Prices
by Kanchana Vishwanadee Mathotaarachchi, Raza Hasan and Salman Mahmood
Information 2024, 15(6), 295; https://doi.org/10.3390/info15060295 - 22 May 2024
Viewed by 1213
Abstract
Real estate price prediction is crucial for informed decision making in the dynamic real estate sector. In recent years, machine learning (ML) techniques have emerged as powerful tools for enhancing prediction accuracy and data-driven decision making. However, the existing literature lacks a cohesive synthesis of methodologies, findings, and research gaps in ML-based real estate price prediction. This study addresses this gap through a comprehensive literature review, examining various ML approaches, including neural networks, ensemble methods, and advanced regression techniques. We identify key research gaps, such as the limited exploration of hybrid ML-econometric models and the interpretability of ML predictions. To validate the robustness of regression models, we conduct generalization testing on an independent dataset. Results demonstrate the applicability of regression models in predicting real estate prices across diverse markets. Our findings underscore the importance of addressing research gaps to advance the field and enhance the practical applicability of ML techniques in real estate price prediction. This study contributes to a deeper understanding of ML’s role in real estate forecasting and provides insights for future research and practical implementation in the real estate industry.

13 pages, 449 KiB  
Article
A Proactive Decision-Making Model for Evaluating the Reliability of Infrastructure Assets of a Railway System
by Daniel O. Aikhuele and Shahryar Sorooshian
Information 2024, 15(4), 219; https://doi.org/10.3390/info15040219 - 13 Apr 2024
Viewed by 963
Abstract
Railway infrastructure is generally classified as either fixed or movable infrastructure assets. Failure in any of the assets could lead to the complete shutdown and disruption of the entire system, economic loss, inconvenience to passengers and the train operating company(s), and can sometimes result in death or injury in the event of the derailment of the rolling stock. Considering the importance of the railway infrastructure assets, it is necessary to continuously explore their behavior, reliability, and safety. In this paper, a proactive multi-criteria decision-making model that is based on an interval-valued intuitionistic fuzzy set and some quantitative reliability parameters has been proposed for the evaluation of the reliability of the infrastructure assets. Results from the evaluation show that the failure mode ‘Broken and defective rails’ has the most risk and reliability concerns. Hence, priority should be given to this failure mode to avoid a total system collapse.

19 pages, 574 KiB  
Article
Generally Applicable Q-Table Compression Method and Its Application for Constrained Stochastic Graph Traversal Optimization Problems
by Tamás Kegyes, Alex Kummer, Zoltán Süle and János Abonyi
Information 2024, 15(4), 193; https://doi.org/10.3390/info15040193 - 31 Mar 2024
Viewed by 868
Abstract
We analyzed a special class of graph traversal problems, where the distances are stochastic, and the agent is restricted to take a limited range in one go. We showed that both constrained shortest Hamiltonian pathfinding problems and disassembly line balancing problems belong to the class of constrained shortest pathfinding problems, which can be represented as mixed-integer optimization problems. Reinforcement learning (RL) methods have proven their efficiency in multiple complex problems. However, researchers concluded that the learning time increases radically as the state and action spaces grow. In continuous cases, approximation techniques are used, but these methods have several limitations in mixed-integer searching spaces. We present the Q-table compression method as a multistep method with dimension reduction, state fusion, and space compression techniques that project a mixed-integer optimization problem into a discrete one. The RL agent is then trained using an extended Q-value-based method to deliver a human-interpretable model for optimal action selection. Our approach was tested in selected constrained stochastic graph traversal use cases, and the results are compared with those of the simple grid-based discretization method.

32 pages, 1285 KiB  
Article
Comparative Analysis of NLP-Based Models for Company Classification
by Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov
Information 2024, 15(2), 77; https://doi.org/10.3390/info15020077 - 31 Jan 2024
Cited by 1 | Viewed by 2958
Abstract
The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow, costly, and vendor-specific assignments. Therefore, we investigate recent natural language processing (NLP) advancements to automate the company classification process. In particular, we employ and evaluate various NLP-based models, including zero-shot learning, One-vs-Rest classification, multi-class classifiers, and ChatGPT-aided classification. We conduct a comprehensive comparison among these models to assess their effectiveness in the company classification task. The evaluation uses the Wharton Research Data Services (WRDS) dataset, consisting of textual descriptions of publicly traded companies. Our findings reveal that the RoBERTa and One-vs-Rest classifiers surpass the other methods, achieving F1 scores of 0.81 and 0.80 on the WRDS dataset, respectively. These results demonstrate that deep learning algorithms offer the potential to automate, standardize, and continuously update classification systems in an efficient and cost-effective way. In addition, we introduce several improvements to the multi-class classification techniques: (1) in the zero-shot methodology, we use TF-IDF to enhance sector representation, yielding improved accuracy in comparison to standard zero-shot classifiers; (2) next, we use ChatGPT for dataset generation, revealing potential in scenarios where datasets of company descriptions are lacking; and (3) we also employ K-Fold to reduce noise in the WRDS dataset, followed by conducting experiments to assess the impact of noise reduction on the company classification results.

20 pages, 8983 KiB  
Article
An Effective Ensemble Convolutional Learning Model with Fine-Tuning for Medicinal Plant Leaf Identification
by Mohd Asif Hajam, Tasleem Arif, Akib Mohi Ud Din Khanday and Mehdi Neshat
Information 2023, 14(11), 618; https://doi.org/10.3390/info14110618 - 18 Nov 2023
Cited by 6 | Viewed by 3728
Abstract
Accurate and efficient medicinal plant image classification is of utmost importance as these plants produce a wide variety of bioactive compounds that offer therapeutic benefits. With a long history of medicinal plant usage, different parts of plants, such as flowers, leaves, and roots, have been recognized for their medicinal properties and are used for plant identification. However, leaf images are extensively used due to their convenient accessibility and are a major source of information. In recent years, transfer learning and fine-tuning, which use pre-trained deep convolutional networks to extract pertinent features, have emerged as an extremely effective approach for image-identification problems. This study leveraged the power of three component deep convolutional neural networks, namely VGG16, VGG19, and DenseNet201, to derive features from the input images of the medicinal plant dataset, containing leaf images of 30 classes. The models were compared and ensembled to make four hybrid models to enhance the predictive performance by utilizing the averaging and weighted averaging strategies. Quantitative experiments were carried out to evaluate the models on the Mendeley Medicinal Leaf Dataset. The resultant ensemble of VGG19+DenseNet201 with fine-tuning showcased an enhanced capability in identifying medicinal plant images with an improvement of 7.43% and 5.8% compared with VGG19 and VGG16. Furthermore, VGG19+DenseNet201 can outperform its standalone counterparts by achieving an accuracy of 99.12% on the test set. A thorough assessment with metrics such as accuracy, recall, precision, and the F1-score firmly established the effectiveness of the ensemble strategy.
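The averaging and weighted averaging strategies named in the abstract can be illustrated with a small sketch. It assumes each model outputs a probability per class; the function names, the probabilities, and the weights are hypothetical and are not taken from the paper.

```python
def weighted_average_ensemble(prob_sets, weights):
    """Combine per-model class probabilities with a normalized weighted average."""
    if len(prob_sets) != len(weights):
        raise ValueError("one weight is required per model")
    total = sum(weights)
    n_classes = len(prob_sets[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, prob_sets)) / total
        for c in range(n_classes)
    ]


def predict(prob_sets, weights):
    """Return the index of the class with the highest combined probability."""
    combined = weighted_average_ensemble(prob_sets, weights)
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical softmax outputs of two models for one image over three classes.
model_a = [0.50, 0.30, 0.20]
model_b = [0.20, 0.60, 0.20]

# Equal weights reduce to simple averaging.
print(predict([model_a, model_b], [1.0, 1.0]))  # prints 1

# Weighting model_a three times as heavily changes the decision.
print(predict([model_a, model_b], [3.0, 1.0]))  # prints 0
```

Weights are typically chosen from validation performance; the 3:1 ratio here is purely illustrative.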

30 pages, 2295 KiB  
Article
An Integrated GIS-Based Reinforcement Learning Approach for Efficient Prediction of Disease Transmission in Aquaculture
by Aristeidis Karras, Christos Karras, Spyros Sioutas, Christos Makris, George Katselis, Ioannis Hatzilygeroudis, John A. Theodorou and Dimitrios Tsolis
Information 2023, 14(11), 583; https://doi.org/10.3390/info14110583 - 24 Oct 2023
Cited by 1 | Viewed by 2695
Abstract
This study explores the design and capabilities of a Geographic Information System (GIS) incorporated with an expert knowledge system, tailored for tracking and monitoring the spread of dangerous diseases across a collection of fish farms. Specifically targeting the aquacultural regions of Greece, the system captures geographical and climatic data pertinent to these farms. A feature of this system is its ability to calculate disease transmission intervals between individual cages and broader fish farm entities, providing crucial insights into the spread dynamics. These data then act as an entry point to our expert system. To enhance the predictive precision, we employed various machine learning strategies, ultimately focusing on a reinforcement learning (RL) environment. This RL framework, enhanced by the Multi-Armed Bandit (MAB) technique, stands out as a powerful mechanism for effectively managing the flow of virus transmissions within farms. Empirical tests highlight the efficiency of the MAB approach, which, in direct comparisons, consistently outperformed other algorithmic options, achieving an impressive accuracy rate of 96%. Looking ahead to future work, we plan to integrate buffer techniques and delve deeper into advanced RL models to enhance our current system. The results set the stage for future research in predictive modeling within aquaculture health management, and we aim to extend our research even further.