Analytics

24 pages, 1216 KB

Open Access | Feature Paper | Editor’s Choice | Article

Traffic Prediction with Data Fusion and Machine Learning

by Juntao Qiu and Yaping Zhao

Analytics 2025, 4(2), 12; https://doi.org/10.3390/analytics4020012 - 9 Apr 2025

Cited by 9 | Viewed by 4876

Traffic prediction is a core task for alleviating urban congestion and optimizing transport systems, but current approaches integrate multimodal data poorly, which makes it difficult to capture the complex spatio-temporal characteristics of the transport system. Although some studies have attempted to introduce multimodal data, they mostly rely on resource-intensive deep neural network architectures, which have difficulty meeting the demands of practical applications. To this end, we propose a traffic prediction framework based on simple machine learning techniques that effectively integrates property features, amenity features, and emotion features (PAE features). Validated with large-scale real datasets, the method demonstrates excellent prediction performance while significantly reducing computational complexity and deployment costs. This study demonstrates the great potential of simple machine learning techniques in multimodal data fusion, provides an efficient and practical solution for traffic prediction, and offers an effective alternative to resource-intensive deep learning methods, opening up new paths for building scalable traffic prediction systems.

32 pages, 3163 KB

Open Access | Editor’s Choice | Article

Unveiling the Impact of Socioeconomic and Demographic Factors on Graduate Salaries: A Machine Learning Explanatory Analytical Approach Using Higher Education Statistical Agency Data

by Bassey Henshaw, Bhupesh Kumar Mishra, William Sayers and Zeeshan Pervez

Analytics 2025, 4(1), 10; https://doi.org/10.3390/analytics4010010 - 11 Mar 2025

Cited by 1 | Viewed by 3143

Abstract

Graduate salaries are a significant concern for graduates, employers, and policymakers, as various factors influence them. This study investigates determinants of graduate salaries in the UK, utilising survey data from HESA (Higher Education Statistical Agency) and integrating advanced machine learning (ML) explanatory techniques with statistical analytical methodologies. By employing multi-stage analyses alongside machine learning models such as decision trees and random forests, together with SHAP (SHapley Additive exPlanations) for explainability, this study investigates the influence of 21 socioeconomic and demographic variables on graduate salary outcomes. Key variables, including institutional reputation, age at graduation, socioeconomic classification, job qualification requirements, and domicile, emerged as critical determinants, with institutional reputation proving the most significant. Among ML methods, the decision tree stood out with the highest accuracy after rigorous optimisation techniques, including oversampling and undersampling. SHAP highlighted the top 12 influential variables, providing actionable insights into the interplay between individual and systemic factors. Furthermore, the statistical analysis using ANOVA (Analysis of Variance) validated the significance of these variables, revealing intricate interactions that shape graduate salary dynamics. Domain experts’ opinions are also analysed to authenticate the findings. This research makes a unique contribution by combining qualitative contextual analysis with quantitative methodologies, machine learning explainability and domain experts’ views to address gaps in the existing identification of graduate salary predicting components. Additionally, the findings inform policy and educational interventions to reduce wage inequalities and promote equitable career opportunities. Despite limitations, such as the UK-specific dataset and the focus on socioeconomic and demographic variables, this study lays a robust foundation for future research in predictive modelling and graduate outcomes.

30 pages, 1939 KB

Open Access | Editor’s Choice | Article

Towards Visual Analytics for Explainable AI in Industrial Applications

by Kostiantyn Kucher, Elmira Zohrevandi and Carl A. L. Westin

Analytics 2025, 4(1), 7; https://doi.org/10.3390/analytics4010007 - 12 Feb 2025

Cited by 2 | Viewed by 4962

Abstract

As the levels of automation and reliance on modern artificial intelligence (AI) approaches increase across multiple industries, the importance of the human-centered perspective becomes more evident. Various actors in such industrial applications, including equipment operators and decision makers, have their needs and preferences that often do not align with the decisions produced by black-box models, potentially leading to mistrust and wasted productivity gain opportunities. In this paper, we examine these issues through the lenses of visual analytics and, more broadly, interactive visualization, and we argue that the methods and techniques from these fields can lead to advances in both academic research and industrial innovations concerning the explainability of AI models. To address the existing gap within and across the research and application fields, we propose a conceptual framework for visual analytics design and evaluation for such scenarios, followed by a preliminary roadmap and call to action for the respective communities.

(This article belongs to the Special Issue Visual Analytics: Techniques and Applications)

14 pages, 2091 KB

Open Access | Editor’s Choice | Article

Personalizing Multimedia Content Recommendations for Intelligent Vehicles Through Text–Image Embedding Approaches

by Jin-A Choi, Taekeun Hong and Kiho Lim

Analytics 2025, 4(1), 4; https://doi.org/10.3390/analytics4010004 - 5 Feb 2025

Cited by 1 | Viewed by 1340

Abstract

The ability to automate and personalize the recommendation of multimedia content to consumers has been gaining significant attention recently. The burgeoning demand for digitization and automation of formerly analog communication processes has caught the attention of researchers and professionals alike. In light of the recent interest in, and anticipated transition to, fully autonomous vehicles, this study proposes a recommender system based on text–image embedding to optimize personalized multimedia content for in-vehicle infotainment. This study leverages existing pre-trained text embedding models and pre-trained image feature extraction methods. Previous research has focused mainly on text-only or image-only analyses. By employing similarity measurements, this study demonstrates how recommendation of the most relevant multimedia content to consumers is enhanced through text–image embedding.
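The similarity-based recommendation described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the text and image embeddings are combined or which similarity measure is used, so the concatenation step, the cosine similarity, the toy vectors, and the names `fuse`, `recommend`, `jazz_playlist`, and `news_podcast` are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def fuse(text_vec, image_vec):
    """Assumed fusion step: concatenate a text embedding and an image embedding."""
    return list(text_vec) + list(image_vec)

def recommend(profile, catalog, top_k=1):
    """Return the names of the top_k catalog items most similar to the profile."""
    ranked = sorted(catalog, key=lambda name: cosine(profile, catalog[name]), reverse=True)
    return ranked[:top_k]

# Hypothetical toy embeddings; real ones would come from pre-trained models.
catalog = {
    "jazz_playlist": fuse([1.0, 0.0], [0.9, 0.1]),
    "news_podcast": fuse([0.0, 1.0], [0.1, 0.9]),
}
profile = fuse([0.8, 0.2], [1.0, 0.0])

print(recommend(profile, catalog))
```

In this toy setup the profile lies closer to the first item in both the text and image halves of the fused vector, so that item is ranked first.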

26 pages, 15401 KB

Open Access | Editor’s Choice | Article

Uncovering Patterns and Trends in Big Data-Driven Research Through Text Mining of NSF Award Synopses

by Arielle King and Sayed A. Mostafa

Analytics 2025, 4(1), 1; https://doi.org/10.3390/analytics4010001 - 6 Jan 2025

Viewed by 3345

Abstract

The rapid expansion of big data has transformed research practices across disciplines, yet disparities exist in its adoption among U.S. institutions of higher education. This study examines trends in NSF-funded big data-driven research across research domains, institutional classifications, and directorates. Using a quantitative approach and natural language processing (NLP) techniques, we analyzed NSF awards from 2006 to 2022, focusing on seven NSF research areas: Biological Sciences, Computer and Information Science and Engineering, Engineering, Geosciences, Mathematical and Physical Sciences, Social, Behavioral and Economic Sciences, and STEM Education (formerly known as Education and Human Resources). Findings indicate a significant increase in big data-related awards over time, with CISE (Computer and Information Science and Engineering) leading in funding. Machine learning and artificial intelligence are dominant themes across all institutional classifications. Results show that R1 and non-minority-serving institutions receive the majority of big data-driven research funding, though HBCUs have seen recent growth due to national diversity initiatives. Topic modeling reveals key subdomains such as cybersecurity and bioinformatics benefiting from big data, while areas like Biological Sciences and Social Sciences engage less with these methods. These findings suggest the need for broader support and funding to foster equitable adoption of big data methods across institutions and disciplines.

15 pages, 3294 KB

Open Access | Editor’s Choice | Article

An Exploration of Clustering Algorithms for Customer Segmentation in the UK Retail Market

by Jeen Mary John, Olamilekan Shobayo and Bayode Ogunleye

Analytics 2023, 2(4), 809-823; https://doi.org/10.3390/analytics2040042 - 12 Oct 2023

Cited by 47 | Viewed by 28715

Abstract

Recently, people’s awareness of online purchases has significantly risen. This has given rise to online retail platforms and the need for a better understanding of customer purchasing behaviour. Retail companies are under pressure to deal with a high volume of customer purchases, which requires sophisticated approaches to perform more accurate and efficient customer segmentation. Customer segmentation is a marketing analytical tool that aids customer-centric service and thus enhances profitability. In this paper, we aim to develop a customer segmentation model to improve decision-making processes in the retail market industry. To achieve this, we employed a UK-based online retail dataset obtained from the UCI machine learning repository. The retail dataset consists of 541,909 customer records and eight features. Our study adopted the RFM (recency, frequency, and monetary) framework to quantify customer values. Thereafter, we compared several state-of-the-art (SOTA) clustering algorithms, namely, K-means clustering, the Gaussian mixture model (GMM), density-based spatial clustering of applications with noise (DBSCAN), agglomerative clustering, and balanced iterative reducing and clustering using hierarchies (BIRCH). The results showed the GMM outperformed other approaches, with a Silhouette Score of 0.80.
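The RFM framework mentioned in this abstract can be sketched on a toy example. The abstract does not give the exact definitions the authors used, so this sketch assumes a common convention: recency is the number of days between a customer's last purchase and a chosen snapshot date, frequency is the number of transactions, and monetary is total spend. The transaction records and the function name `rfm` are hypothetical.

```python
from datetime import date

# Hypothetical toy transactions: (customer_id, invoice_date, amount).
transactions = [
    ("C1", date(2011, 11, 1), 20.0),
    ("C1", date(2011, 12, 1), 35.0),
    ("C2", date(2011, 6, 15), 250.0),
]

def rfm(transactions, snapshot):
    """Return {customer: (recency_days, frequency, monetary)} under the assumed definitions."""
    table = {}
    for customer, day, amount in transactions:
        last, freq, total = table.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date seen so far for this customer.
        last = day if last is None or day > last else last
        table[customer] = (last, freq + 1, total + amount)
    return {
        customer: ((snapshot - last).days, freq, total)
        for customer, (last, freq, total) in table.items()
    }

scores = rfm(transactions, snapshot=date(2011, 12, 10))
for customer, values in scores.items():
    print(customer, values)
```

The resulting three-column table is the kind of input a clustering algorithm would then segment, typically after scaling the columns to comparable ranges.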

18 pages, 462 KB

Open Access | Editor’s Choice | Article

Heterogeneous Ensemble for Medical Data Classification

by Loris Nanni, Sheryl Brahnam, Andrea Loreggia and Leonardo Barcellona

Analytics 2023, 2(3), 676-693; https://doi.org/10.3390/analytics2030037 - 4 Sep 2023

Cited by 7 | Viewed by 2521

Abstract

For robust classification, selecting a proper classifier is of primary importance. However, selecting the best classifier depends on the problem, as some classifiers work better at some tasks than at others. Despite the many results collected in the literature, the support vector machine (SVM) remains the leading adopted solution in many domains, thanks to its ease of use. In this paper, we propose a new method based on convolutional neural networks (CNNs) as an alternative to SVM. CNNs are specialized in processing data in a grid-like topology that usually represents images. To enable CNNs to work on different data types, we investigate reshaping one-dimensional vector representations into two-dimensional matrices and compare different approaches for feeding standard CNNs using two-dimensional feature vector representations. We evaluate the different techniques by proposing a heterogeneous ensemble based on three classifiers: an SVM, a model based on random subspace of rotation boosting (RB), and a CNN. The robustness of our approach is tested across a set of benchmark datasets that represent a wide range of medical classification tasks. The proposed ensembles provide promising performance on all datasets.
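The idea of reshaping a one-dimensional feature vector into a two-dimensional matrix can be sketched as follows. The abstract compares several reshaping approaches without describing them, so this is only the simplest conceivable variant, assumed here for illustration: fill the smallest square matrix row by row and pad the remainder with zeros. The function name `reshape_to_square` and the padding choice are hypothetical.

```python
import math

def reshape_to_square(vector, pad_value=0.0):
    """Lay a 1-D feature vector out row by row in the smallest square matrix that holds it."""
    n = len(vector)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # round up so that side * side >= n
    padded = list(vector) + [pad_value] * (side * side - n)
    return [padded[i * side:(i + 1) * side] for i in range(side)]

# A 7-feature vector becomes a 3x3 matrix with two padded cells.
matrix = reshape_to_square([1, 2, 3, 4, 5, 6, 7])
for row in matrix:
    print(row)
```

Row-major filling places features that are adjacent in the vector next to each other in the matrix, which matters because convolutional filters respond to local neighbourhoods; other orderings would give the network a different view of the same data.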

14 pages, 3918 KB

Open Access | Editor’s Choice | Article

Prediction of Stroke Disease with Demographic and Behavioural Data Using Random Forest Algorithm

by Olamilekan Shobayo, Oluwafemi Zachariah, Modupe Olufunke Odusami and Bayode Ogunleye

Analytics 2023, 2(3), 604-617; https://doi.org/10.3390/analytics2030034 - 2 Aug 2023

Cited by 23 | Viewed by 6644

Abstract

Stroke is a major cause of death worldwide, resulting from a blockage in the flow of blood to different parts of the brain. Many studies have proposed a stroke disease prediction model using medical features applied to deep learning (DL) algorithms to reduce its occurrence. However, these studies pay less attention to the predictors (both demographic and behavioural). Our study considers interpretability, robustness, and generalisation as key themes for deploying algorithms in the medical domain. Based on this background, we propose the use of random forest for stroke incidence prediction. Results from our experiment showed that random forest (RF) outperformed decision tree (DT) and logistic regression (LR) with a macro F1 score of 94%. Our findings indicated age and body mass index (BMI) as the most significant predictors of stroke disease incidence.
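The macro F1 score reported in this abstract is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a common one. The sketch below computes it from scratch on made-up labels; the data are hypothetical and unrelated to the study's dataset or results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, where F1 = 2*TP / (2*TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical imbalanced labels: eight negatives (0) and two positives (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                  # 0.8
print(macro_f1(y_true, y_pred))  # 0.6875
```

Here the majority class scores 0.875 and the minority class 0.5, so the macro average (0.6875) is well below the accuracy (0.8). This is why macro F1 is often preferred for imbalanced problems such as disease prediction.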

17 pages, 635 KB

Open Access | Feature Paper | Editor’s Choice | Article

Building Neural Machine Translation Systems for Multilingual Participatory Spaces

by Pintu Lohar, Guodong Xie, Daniel Gallagher and Andy Way

Analytics 2023, 2(2), 393-409; https://doi.org/10.3390/analytics2020022 - 1 May 2023

Cited by 5 | Viewed by 4252

Abstract

This work presents the development of the translation component in a multistage, multilevel, multimode, multilingual and dynamic deliberative (M4D2) system, built to facilitate automated moderation and translation in the languages of five European countries: Italy, Ireland, Germany, France and Poland. Two main topics were to be addressed in the deliberation process: (i) the environment and climate change; and (ii) the economy and inequality. In this work, we describe the development of neural machine translation (NMT) models for these domains for six European languages: Italian, English (included as the second official language of Ireland), Irish, German, French and Polish. As a result, we generate 30 NMT models, initially baseline systems built using freely available online data, which are then adapted to the domains of interest in the project by (i) filtering the corpora, (ii) tuning the systems with automatically extracted in-domain development datasets and (iii) using corpus concatenation techniques to expand the amount of data available. We compare our results produced by the domain-adapted systems with those produced by Google Translate, and demonstrate that fast, high-quality systems can be produced that facilitate multilingual deliberation in a secure environment.
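The figure of 30 NMT models is consistent with one model per translation direction among the six languages, since 6 × 5 = 30 ordered source-target pairs. The abstract does not state this breakdown explicitly, so the one-model-per-direction reading is an assumption. A short sketch enumerates the pairs:

```python
from itertools import permutations

# The six languages named in the abstract.
LANGUAGES = ["Italian", "English", "Irish", "German", "French", "Polish"]

def translation_directions(languages):
    """Return every ordered (source, target) pair with distinct source and target."""
    return list(permutations(languages, 2))

directions = translation_directions(LANGUAGES)
print(len(directions))  # 6 * 5 = 30
print(directions[:3])
```

Each direction is counted separately because a translation model trained from Irish to Polish, for example, is distinct from one trained from Polish to Irish.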

27 pages, 7073 KB

Open Access | Editor’s Choice | Article

Theory-Guided Analytics Process: Using Theories to Underpin an Analytics Process for New Banking Product Development Using Segmentation-Based Marketing Analytics Leveraging on Marketing Intelligence

by Tristan Lim, Tao Pan, Chin Sin Ong, Shuaiwei Chen and Jie Jun Jeremy Chia

Analytics 2023, 2(1), 105-131; https://doi.org/10.3390/analytics2010007 - 1 Feb 2023

Cited by 2 | Viewed by 4744

Abstract

Retail banking is undergoing considerable product competition and disruption. New product development (NPD) is necessary to tackle such challenges and reinvigorate product lines. This study presents an instrumental real-life banking case study, where marketing analytics was utilized to drive a product differentiation strategy. In particular, the study applied the unsupervised machine learning techniques of link analysis, latent class analysis, and association analysis to undertake behavior-based market segmentation, with a view to attaining a profitable competitive advantage. To underpin the product development process with well-grounded theoretical framing, this study asked the research question: “How may we establish a theory-driven approach for an analytics-driven process?” Findings of this study include a theoretical conceptual framework that underpinned the end-to-end segmentation-driven new product development process, backed by the empirical literature. The study hopes to provide: (i) for managerial practitioners, the use of case-based reasoning for practice-oriented new product development design, planning, and diagnosis efforts, and (ii) for researchers, the opportunity to test the validity and robustness of an analytics-driven NPD process. The study also hopes to encourage wider research interest in theory-driven approaches for analytics-driven processes.
