Journal Description
Analytics is an international, peer-reviewed, open access journal on methodologies, technologies, and applications of analytics, published quarterly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus and other databases.
- Rapid Publication: manuscripts are peer-reviewed, with a first decision provided to authors approximately 20.6 days after submission and acceptance to publication in 5.5 days (median values for papers published in this journal in the second half of 2025).
- Recognition of Reviewers: APC discount vouchers, optional signed peer review, and reviewer names published annually in the journal.
- Analytics is a companion journal of Mathematics.
- Journal Cluster of Information Systems and Technology: Analytics, Applied System Innovation, Cryptography, Data, Digital, Informatics, Information, Journal of Cybersecurity and Privacy, and Multimedia.
Latest Articles
Integrating Deep Learning Nodes into an Augmented Decision Tree for Automated Medical Coding
Analytics 2026, 5(1), 11; https://doi.org/10.3390/analytics5010011 - 12 Feb 2026
Abstract
Accurate assignment of International Classification of Diseases (ICD) codes is essential for healthcare analytics, billing, and clinical research. However, manual coding remains time-consuming and error-prone due to the scale and complexity of the ICD taxonomy. While hierarchical deep learning approaches have improved automated coding, their deployment across large taxonomies raises scalability and efficiency concerns. To address these limitations, we introduce the Augmented Decision Tree (ADT) framework, which integrates deep learning with symbolic rule-based logic for automated medical coding. ADT employs an automated lexical screening mechanism to dynamically select the most appropriate modeling strategy for each decision node, thereby minimizing manual configuration. Nodes with high keyword distinctiveness are handled by symbolic rules, while semantically ambiguous nodes are assigned to deep contextual models fine-tuned from PubMedBERT. This selective design eliminates the need to train a deep learning model at every node, significantly reducing computational cost. A case study demonstrates that this hybrid and adaptive ADT approach supports scalable and efficient ICD coding. Experimental results show that ADT outperforms a pure decision tree baseline and achieves accuracy comparable to that of a full deep learning-based decision tree, while requiring substantially less training time and computational resources.
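As a rough illustration of the node-routing idea in this abstract, the sketch below scores a node's keyword distinctiveness and routes it either to symbolic rules or to a deep model. The scoring rule, threshold, and function names are illustrative assumptions, not the authors' actual lexical screening mechanism.

```python
# Minimal sketch of ADT-style node routing. The scoring rule and
# threshold are illustrative assumptions, not the paper's mechanism.

def keyword_distinctiveness(node_keywords, sibling_keywords):
    """Fraction of a node's keywords not shared with its siblings."""
    overlap = node_keywords & sibling_keywords
    return 1.0 - len(overlap) / max(len(node_keywords), 1)

def choose_strategy(node_keywords, sibling_keywords, threshold=0.8):
    """Route distinctive nodes to symbolic rules; ambiguous nodes go to
    a deep contextual model (e.g., one fine-tuned from PubMedBERT)."""
    score = keyword_distinctiveness(node_keywords, sibling_keywords)
    return "symbolic_rules" if score >= threshold else "deep_model"

# Example: a node whose keywords overlap heavily with its siblings'
print(choose_strategy({"fracture", "femur"}, {"fracture", "tibia", "ankle"}))
```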
Full article
Open Access Article
Site Selection for Solar Photovoltaic Power Plant Using MCDM Method with New De-i-Fuzzification Technique
by
Kamal Hossain Gazi, Asesh Kumar Mukherjee, Shashi Bajaj Mukherjee, Sankar Prasad Mondal, Soheil Salahshour and Arijit Ghosh
Analytics 2026, 5(1), 10; https://doi.org/10.3390/analytics5010010 - 9 Feb 2026
Abstract
Choosing sites for solar photovoltaic (PV) power plants in developing countries like India is a crucial task when multiple conflicting factors and sub-factors must be considered simultaneously. Multi-criteria decision-making (MCDM) is an optimisation method that provides a framework for handling such situations in an intuitionistic fuzzy environment. The complexity and uncertainty associated with the site selection model are handled systematically within this framework. The Criteria Importance Through Intercriteria Correlation (CRITIC) method is applied to determine the relative importance of the criteria, identifying airflow speed as the most influential factor, followed by humidity ratio, level of dust haze, availability of labour and resources, and ecological effects. This shows that airflow speed plays an important role in the power plant’s efficiency and performance. The Vlse Kriterijumska Optimizacija I Kompromisno Rešenje (VIKOR) method is then used to prioritise the alternatives as potential locations for setting up a solar PV power plant in India. A new de-i-fuzzification method based on the relative difference between two real numbers is also proposed. Sensitivity analyses and comparative studies are conducted to assess the robustness and effectiveness of the framework. Overall, the results demonstrate that the proposed framework is useful and effective for optimising site selection for solar power plants in India.
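For readers unfamiliar with CRITIC, the following minimal sketch computes classical (crisp) CRITIC weights from a toy decision matrix; the paper's intuitionistic fuzzy treatment and de-i-fuzzification step are not reproduced.

```python
# Sketch of the crisp CRITIC weighting step on a decision matrix
# (rows = candidate sites, columns = criteria). Data are illustrative.
import numpy as np

X = np.array([[6.2, 0.4, 3.1],
              [5.1, 0.7, 2.8],
              [7.0, 0.5, 3.6]])

# Min-max normalise each criterion to [0, 1]
Z = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

sigma = Z.std(axis=0, ddof=1)       # contrast intensity per criterion
R = np.corrcoef(Z, rowvar=False)    # pairwise criterion correlations
C = sigma * (1.0 - R).sum(axis=0)   # information content c_j
weights = C / C.sum()               # CRITIC weights
print(weights)
```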
Full article
(This article belongs to the Topic Data Intelligence and Computational Analytics)
Open Access Article
Denoising Stock Price Time Series with Singular Spectrum Analysis for Enhanced Deep Learning Forecasting
by
Carol Anne Hargreaves and Zixian Fan
Analytics 2026, 5(1), 9; https://doi.org/10.3390/analytics5010009 - 27 Jan 2026
Abstract
Aim: Stock price prediction remains a highly challenging task due to the complex and nonlinear nature of financial time series data. While deep learning (DL) has shown promise in capturing these nonlinear patterns, its effectiveness is often hindered by the low signal-to-noise ratio inherent in market data. This study aims to enhance the stock predictive performance and trading outcomes by integrating Singular Spectrum Analysis (SSA) with deep learning models for stock price forecasting and strategy development on the Australian Securities Exchange (ASX)50 index. Method: The proposed framework begins by applying SSA to decompose raw stock price time series into interpretable components, effectively isolating meaningful trends and eliminating noise. The denoised sequences are then used to train a suite of deep learning architectures, including Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM), and hybrid CNN-LSTM models. These models are evaluated based on their forecasting accuracy and the profitability of the trading strategies derived from their predictions. Results: Experimental results demonstrated that the SSA-DL framework significantly improved the prediction accuracy and trading performance compared to baseline DL models trained on raw data. The best-performing model, SSA-CNN-LSTM, achieved a Sharpe Ratio of 1.88 and a return on investment (ROI) of 67%, indicating robust risk-adjusted returns and effective exploitation of the underlying market conditions. Conclusions: The integration of Singular Spectrum Analysis with deep learning offers a powerful approach to stock price prediction in noisy financial environments. By denoising input data prior to model training, the SSA-DL framework enhanced signal clarity, improved forecast reliability, and enabled the construction of profitable trading strategies. These findings suggested a strong potential for SSA-based preprocessing in financial time series modeling.
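A minimal sketch of basic SSA denoising follows, assuming an illustrative window length and rank; the paper's component-selection procedure may differ.

```python
# Basic SSA sketch: embed the series in a trajectory matrix, take its
# SVD, and reconstruct the leading components by diagonal averaging.
import numpy as np

def ssa_denoise(x, window=30, rank=3):
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: columns are lagged windows
    T = np.column_stack([x[i:i + window] for i in range(k)])
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Keep leading components, then Hankelise by anti-diagonal averages
    T_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    out = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        for j in range(k):
            out[i + j] += T_low[i, j]
            counts[i + j] += 1
    return out / counts

prices = np.cumsum(np.random.randn(300)) + 100  # toy price path
smooth = ssa_denoise(prices)                    # feed this to the DL models
```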
Full article
Open Access Article
From Models to Metrics: A Governance Framework for Large Language Models in Enterprise AI and Analytics
by
Darshan Desai and Ashish Desai
Analytics 2026, 5(1), 8; https://doi.org/10.3390/analytics5010008 - 11 Jan 2026
Abstract
Large language models (LLMs) and other foundation models are rapidly being woven into enterprise analytics workflows, where they assist with data exploration, forecasting, decision support, and automation. These systems can feel like powerful new teammates: creative, scalable, and tireless. Yet they also introduce distinctive risks related to opacity, brittleness, bias, and misalignment with organizational goals. Existing work on AI ethics, alignment, and governance provides valuable principles and technical safeguards, but enterprises still lack practical frameworks that connect these ideas to the specific metrics, controls, and workflows by which analytics teams design, deploy, and monitor LLM-powered systems. This paper proposes a conceptual governance framework for enterprise AI and analytics that is explicitly centered on LLMs embedded in analytics pipelines. The framework adopts a three-layered perspective—model and data alignment, system and workflow alignment, and ecosystem and governance alignment—that links technical properties of models to enterprise analytics practices, performance indicators, and oversight mechanisms. In practical terms, the framework shows how model and workflow choices translate into concrete metrics and inform real deployment, monitoring, and scaling decisions for LLM-powered analytics. We also illustrate how this framework can guide the design of controls for metrics, monitoring, human-in-the-loop structures, and incident response in LLM-driven analytics. The paper concludes with implications for analytics leaders and governance teams seeking to operationalize responsible, scalable use of LLMs in enterprise settings.
Full article
(This article belongs to the Special Issue Critical Challenges in Large Language Models and Data Analytics: Trustworthiness, Scalability, and Societal Impact)
Open Access Article
Predicting ESG Scores Using Machine Learning for Data-Driven Sustainable Investment
by
Sanskruti Patel, Abhay Nath and Pranav Desai
Analytics 2026, 5(1), 7; https://doi.org/10.3390/analytics5010007 - 9 Jan 2026
Abstract
Environmental, social and governance (ESG) metrics increasingly inform sustainable investment yet suffer from inter-rater heterogeneity and incomplete reporting, limiting their utility for forward-looking allocation. In this study, we developed and validated a two-level stacked-ensemble machine-learning framework to predict total ESG risk scores for S&P 500 firms using a comprehensive feature set comprising pillar sub-scores, controversy measures, firm financials, categorical descriptors and geospatial environmental indicators. Data pre-processing combined median/mean imputation, one-hot encoding, normalization and rigorous feature engineering; models were trained with an 80:20 train–test split and hyperparameters tuned by k-fold cross-validation. The stacked ensemble substantially outperformed single-model baselines (RMSE = 1.006, MAE = 0.664, MAPE = 3.13%, R² = 0.979, CV_RMSE_Mean = 1.383, CV_R2_Mean = 0.957), with LightGBM and gradient boosting as competitive comparators. Permutation importance and correlation analysis identified environmental and social components as primary drivers (environmental importance = 0.41; social = 0.32), with potential multicollinearity between component and aggregate scores. This study concludes that ensemble-based predictive analytics can produce reliable, actionable ESG estimates to enhance screening and prioritization in sustainable investment, while recommending human review for extreme predictions and further work to harmonize cross-provider score divergence.
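A minimal sketch of a two-level stacked ensemble in scikit-learn, with illustrative base learners and simulated data standing in for the ESG feature set; the paper's exact model lineup and tuning are not reproduced.

```python
# Two-level stacking sketch: base learners feed a ridge meta-learner.
from sklearn.datasets import make_regression
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X, y = make_regression(n_samples=500, n_features=20, noise=5.0,
                       random_state=0)          # stand-in for ESG features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)  # 80:20 split

stack = StackingRegressor(
    estimators=[("gb", GradientBoostingRegressor(random_state=0)),
                ("rf", RandomForestRegressor(random_state=0))],
    final_estimator=RidgeCV(),                  # level-2 meta-learner
    cv=5)                                       # k-fold stacking
stack.fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, stack.predict(X_te)))
```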
Full article
Open Access Article
Interference-Driven Scaling Variability in Burst-Based Loopless Invasion Percolation Models of Induced Seismicity
by
Ian Baughman and John B. Rundle
Analytics 2026, 5(1), 6; https://doi.org/10.3390/analytics5010006 - 6 Jan 2026
Abstract
Many fluid-injection sequences display burst-like seismicity with approximate power-law event-size distributions whose exponents drift between catalogs. Classical percolation models instead predict fixed, dimension-dependent exponents and do not specify which geometric mechanisms could underlie such b-value variability. We address this gap using two loopless invasion percolation variants—the constrained Leath invasion percolation (CLIP) and avalanche invasion percolation (AIP) models—to generate synthetic burst catalogs and quantify how burst geometry modifies size–frequency statistics. For each model we measure burst-size distributions and an interference fraction, defined as the proportion of attempted growth steps that terminate on previously activated bonds. Single-burst clusters recover the Fisher exponent of classical percolation, whereas multi-burst sequences show systematic, dimension-dependent drift of the effective exponent with a burst number that is strongly correlated with the interference fraction. CLIP and AIP are indistinguishable under these diagnostics, indicating that interference-driven exponent drift is a generic feature of burst growth rather than a model-specific artifact. Mapping the size-distribution exponent to an equivalent Gutenberg–Richter b-value shows that increasing interference suppresses large bursts and produces b value ranges comparable to those reported for injection-induced seismicity, supporting the interpretation of interference as a geometric proxy for mechanical inhibition that limits the growth of large events in real fracture networks.
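As a pointer to how such exponents are typically measured, the sketch below applies the standard continuous maximum-likelihood estimator for a power-law exponent to synthetic burst sizes; the paper's actual burst catalogs and diagnostics are not reproduced.

```python
# Continuous MLE for a power-law exponent (Clauset-style) on synthetic
# burst sizes; x_min is assumed known here.
import numpy as np

rng = np.random.default_rng(0)
x_min, alpha_true = 1.0, 2.05
u = rng.random(5000)
bursts = x_min * (1 - u) ** (-1 / (alpha_true - 1))  # power-law samples

alpha_hat = 1 + len(bursts) / np.log(bursts / x_min).sum()
print(f"estimated size-distribution exponent: {alpha_hat:.3f}")
```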
Full article
Open Access Article
PSYCH—Psychometric Assessment of Large Language Model Characters: An Exploration of the German Language
by
Nane Kratzke, Niklas Beuter, André Drews and Monique Janneck
Analytics 2026, 5(1), 5; https://doi.org/10.3390/analytics5010005 - 6 Jan 2026
Abstract
Background: Existing evaluations of large language models (LLMs) largely emphasize linguistic and factual performance, while their psychometric characteristics and behavioral biases remain insufficiently examined, particularly beyond English-language contexts. This study presents a systematic psychometric screening of LLMs in German using the validated Big Five Inventory-2 (BFI-2). Methods: Thirty-two contemporary commercial and open-source LLMs completed all 60 BFI-2 items 60 times each (once with and once without having to justify their answers), yielding over 330,000 responses. Models answered independently, under male and female impersonation, and with and without required justifications. Responses were compared to German human reference data using Welch’s t-tests to assess deviations, response stability, justification effects, and gender differences. Results: At the domain level, LLM personality profiles broadly align with human means. Facet-level analyses, however, reveal systematic deviations, including inflated agreement—especially in Agreeableness and Aesthetic Sensitivity—and reduced Negative Emotionality. Only a few models show minimal deviations. Justification prompts significantly altered responses in 56% of models, often increasing variability. Commercial models exhibited substantially higher response stability than open-source models. Gender impersonation affected up to 25% of BFI-2 items, reflecting and occasionally amplifying human gender differences. Conclusions: This study introduces a reproducible psychometric framework for benchmarking LLM behavior against validated human norms and shows that LLMs produce stable yet systematically biased personality-like response patterns. Psychometric screening could therefore complement traditional LLM evaluation in sensitive applications.
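A minimal sketch of the per-item statistical comparison, assuming simulated stand-ins for the LLM and human samples.

```python
# Welch's t-test between repeated LLM responses and a human reference
# sample for one BFI-2 item; the data are simulated stand-ins.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
llm_scores = rng.normal(3.9, 0.4, 60)      # 60 repeated 5-point answers
human_scores = rng.normal(3.5, 0.9, 1000)  # human norm sample for the item

t, p = stats.ttest_ind(llm_scores, human_scores, equal_var=False)  # Welch
print(f"t = {t:.2f}, p = {p:.4f}")
```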
Full article
(This article belongs to the Special Issue Critical Challenges in Large Language Models and Data Analytics: Trustworthiness, Scalability, and Societal Impact)
Open Access Article
GSM: An Integrated GAM–SHAP–MCDA Framework for Stroke Risk Assessment
by
Rilwan Mustapha, Ashiribo Wusu, Olusola Olabanjo and Bamidele Adetunji
Analytics 2026, 5(1), 4; https://doi.org/10.3390/analytics5010004 - 29 Dec 2025
Abstract
This study proposes GSM, an interpretable and operational GAM-SHAP-MCDA framework for stroke risk stratification by integrating generalized additive models (GAMs), a point-based clinical scoring system, SHAP-based explainability, and multi-criteria decision analysis (MCDA). Using a publicly available dataset of individuals ( stroke prevalence), a GAM was fitted to capture nonlinear effects of key physiological predictors, including age, average blood glucose level, and body mass index (BMI), together with linear effects for hypertension, heart disease, and categorical covariates. The estimated smooth functions revealed strong age-related risk acceleration beyond 60 years, threshold behavior for glucose levels above approximately , and a non-monotonic BMI association with peak risk at moderate BMI ranges. In a comparative evaluation, the GAM achieved superior discrimination and calibration relative to classical logistic regression, with a mean AUC of versus and a lower Brier score ( vs. ). A calibration analysis yielded an intercept of and a slope of , indicating near-ideal agreement between the predicted and observed risks. While high-capacity ensemble models such as XGBoost achieved slightly higher AUC values ( ), the GAM attained near-upper-bound performance while retaining full interpretability. To enhance clinical usability, the GAM smooth effects were discretized into clinically interpretable bands and converted into an additive point-based risk score ranging from 0 to 42, which was subsequently calibrated to absolute stroke probability. The calibrated probabilities were incorporated into the TOPSIS and VIKOR MCDA frameworks, producing transparent and robust patient prioritization rankings. A SHAP analysis confirmed age, glucose, and cardiometabolic factors as dominant global contributors, aligning with the learned GAM structure. Overall, the proposed GAM–SHAP–MCDA framework demonstrates that near-state-of-the-art predictive performance can be achieved alongside transparency, calibration, and decision-oriented interpretability, supporting ethical and practical deployment of medical artificial intelligence for stroke risk assessment.
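A minimal sketch of a logistic GAM of the kind described, using the pyGAM package with assumed column indices and simulated data; the paper's exact smooths, covariates, scoring bands, and calibration are not reproduced.

```python
# Logistic GAM sketch: smooth terms for age, glucose, and BMI plus a
# linear term for a binary comorbidity. Data are simulated stand-ins.
import numpy as np
from pygam import LogisticGAM, s, l

rng = np.random.default_rng(2)
n = 2000
X = np.column_stack([
    rng.uniform(20, 90, n),    # 0: age
    rng.uniform(60, 250, n),   # 1: average glucose level
    rng.uniform(16, 45, n),    # 2: BMI
    rng.integers(0, 2, n),     # 3: hypertension (binary)
])
logit = -7 + 0.06 * X[:, 0] + 0.01 * X[:, 1] + 0.8 * X[:, 3]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

gam = LogisticGAM(s(0) + s(1) + s(2) + l(3)).fit(X, y)
risk = gam.predict_proba(X)    # predicted stroke probabilities
```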
Full article
Open Access Article
Can Length Limit for App Titles Benefit Consumers?
by
Saori Chiba, Yu-Hsi Liu, Chien-Yuan Sher and Min-Hsueh Tsai
Analytics 2026, 5(1), 3; https://doi.org/10.3390/analytics5010003 - 29 Dec 2025
Abstract
The App Store introduced a title-length limit for mobile apps in 2016, and similar policies were later adopted across the industry. This issue drew considerable attention from industry practitioners in the 2010s. Using both empirical and theoretical approaches, this paper examines the effectiveness of this policy and its welfare implications. Title length became an issue because some sellers assemble meaningful keywords in the app title to convey information to consumers, while others combine irrelevant yet popular keywords in an attempt to increase their app’s downloads. We hypothesize that when titles are short, title length is positively associated with an app’s performance because both honest and opportunistic sellers coexist in the market. However, due to the presence of opportunistic sellers, once titles become too long, this positive relationship disappears. We examine this hypothesis using a random sample of 1998 apps from the App Store in 2015. Our results show that for apps with titles longer than 30 characters, title length remains positively associated with app performance. However, for titles exceeding 50 characters, we do not have sufficient evidence to conclude that further increases in length continue to generate additional downloads. To interpret our empirical findings, we construct communication games between an app seller and a consumer, in which the equilibrium is characterized by a threshold. Based on our model and empirical observations, the 30-character limit might hurt consumers.
Full article
Open Access Article
A Threshold Selection Method in Code Plagiarism Checking Function for Code Writing Problem in Java Programming Learning Assistant System Considering AI-Generated Codes
by
Perwira Annissa Dyah Permatasari, Mustika Mentari, Safira Adine Kinari, Soe Thandar Aung, Nobuo Funabiki, Htoo Htoo Sandi Kyaw and Khaing Hsu Wai
Analytics 2026, 5(1), 2; https://doi.org/10.3390/analytics5010002 - 26 Dec 2025
Abstract
To support novice learners, the Java programming learning assistant system (JPLAS) has been developed with various features. Among them, the code writing problem (CWP) assigns writing an answer code that passes a given test code. The correctness of an answer code is validated by running it on JUnit. In previous works, we implemented a code plagiarism checking function that calculates the similarity score for each pair of answer codes based on the Levenshtein distance. When the score is higher than a given threshold, the pair is regarded as plagiarism. However, a method for finding the proper threshold has not been studied. In addition, as generative AI has grown in popularity, AI-generated codes have become a plagiarism threat that should be investigated. In this paper, we propose a threshold selection method based on Tukey’s IQR fences. It uses a custom upper threshold derived from the statistical distribution of similarity scores for each assignment. To better accommodate skewed similarity distributions, the method introduces a simple percentile-based adjustment for determining the upper threshold. We also design prompts to generate answer codes using generative AI and apply them to four AI models. For evaluation, we used a total of 745 source codes from two datasets. The first dataset consists of 420 answer codes across 12 CWP instances from 35 first-year undergraduate students at the State Polytechnic of Malang, Indonesia (POLINEMA). The second dataset includes 325 answer codes across five CWP assignments from 65 third-year undergraduate students at Okayama University, Japan. Applying our proposals, we found the following: (1) any pair of student codes whose score is higher than the selected threshold shows some evidence of plagiarism, (2) some student codes have a higher similarity than the threshold with AI-generated codes, indicating the use of generative AI, and (3) multiple AI models can generate code that resembles student-written code, despite adopting different implementations. The validity of our proposal is confirmed.
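A minimal sketch of the threshold-selection idea on toy data: score all answer-code pairs and flag those above Tukey's upper IQR fence. difflib's ratio stands in here for the Levenshtein-based similarity score, and the paper's percentile-based adjustment is omitted.

```python
# Flag suspiciously similar answer-code pairs via Tukey's upper fence.
from difflib import SequenceMatcher
from itertools import combinations
import numpy as np

codes = {  # toy stand-ins for submitted answer codes
    "s1": "public class A { int add(int a, int b) { return a + b; } }",
    "s2": "public class A { int add(int x, int y) { return x + y; } }",
    "s3": "public class B { void run() { System.out.println(1); } }",
}

scores = {(a, b): SequenceMatcher(None, codes[a], codes[b]).ratio()
          for a, b in combinations(codes, 2)}

q1, q3 = np.percentile(list(scores.values()), [25, 75])
threshold = q3 + 1.5 * (q3 - q1)      # Tukey's upper IQR fence
flagged = [pair for pair, v in scores.items() if v > threshold]
print(threshold, flagged)
```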
Full article
(This article belongs to the Special Issue Critical Challenges in Large Language Models and Data Analytics: Trustworthiness, Scalability, and Societal Impact)
Open Access Article
A Novel Magnificent Frigatebird Optimization Algorithm with Proposed Movement Strategies for Enhanced Global Search
by
Glykeria Kyrou, Vasileios Charilogis and Ioannis G. Tsoulos
Analytics 2026, 5(1), 1; https://doi.org/10.3390/analytics5010001 - 23 Dec 2025
Abstract
Global optimization is a fundamental tool for addressing complex and nonlinear problems across scientific and technological domains. The primary objective of this work is to enhance the efficiency, stability, and convergence speed of the Magnificent Frigatebird Optimization (MFO) algorithm by introducing new strategies that strengthen both global exploration and local exploitation. To this end, we propose an improved version of MFO that incorporates three novel movement strategies (aggressive, conservative, and mixed), a BFGS-based local search procedure for more accurate solution refinement, and a dynamic termination criterion capable of detecting stagnation and reducing unnecessary function evaluations. The algorithm is extensively evaluated on a diverse set of benchmark functions, demonstrating substantially lower computational cost and higher reliability compared to classical evolutionary and swarm-based methods. The results confirm the effectiveness of the proposed modifications and highlight the potential of the enhanced MFO for application to demanding real-world optimization problems.
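A minimal sketch of the BFGS refinement step on a standard benchmark, assuming the population search has already produced a candidate; scipy's general-purpose BFGS stands in for the authors' local search procedure.

```python
# Polish the best candidate from the population search with BFGS.
import numpy as np
from scipy.optimize import minimize

def rastrigin(x):                       # common global-optimisation benchmark
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

best_candidate = np.array([0.9, -1.1])  # e.g., best frigatebird position
result = minimize(rastrigin, best_candidate, method="BFGS")
print(result.x, result.fun)
```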
Full article
Open Access Article
Assessing the Impact of Capital Expenditure on Corporate Profitability in South Korea’s Electronics Industry: A Regression Analysis Approach
by
Bomee Park and Tetiana Paientko
Analytics 2025, 4(4), 36; https://doi.org/10.3390/analytics4040036 - 10 Dec 2025
Abstract
This study investigates the relationship between capital expenditure (CAPEX) and long-term corporate profitability in South Korea’s electronics industry. Using panel data from 126 listed electronics firms covering 2005–2019, the research applies fixed-effects regression analysis to examine how CAPEX influences profitability, measured by EBITDA/total assets. The results confirm that CAPEX exerts a positive and statistically significant long-term effect on profitability, with stronger but not significantly different impacts for large firms compared to SMEs. The findings contribute to empirical evidence on capital investment efficiency and the implications of economies and diseconomies of scale in capital-intensive industries.
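A minimal sketch of a two-way fixed-effects specification via firm and year dummies in statsmodels, with simulated data and assumed variable names; the paper's controls and estimator details are not reproduced.

```python
# Fixed-effects regression of profitability on CAPEX with firm and
# year dummies. Variable names and data are illustrative assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "firm": np.repeat(np.arange(50), 15),        # 50 firms
    "year": np.tile(np.arange(2005, 2020), 50),  # 2005-2019
})
df["capex"] = rng.gamma(2.0, 1.0, len(df))
df["ebitda_ta"] = 0.05 + 0.02 * df["capex"] + rng.normal(0, 0.05, len(df))

fe = smf.ols("ebitda_ta ~ capex + C(firm) + C(year)", data=df).fit()
print(fe.params["capex"])
```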
Full article
Open Access Article
Option Pricing in the Approach of Integrating Market Risk Premium: Application to OTM Options
by
David Liu
Analytics 2025, 4(4), 35; https://doi.org/10.3390/analytics4040035 - 21 Nov 2025
Abstract
In this research, we summarize the results of implementing the market risk premium into the option valuation formulas of the Black–Scholes–Merton model for out-of-the-money (OTM) options. We show that derivative prices can partly depend on systematic market risk, which the BSM model ignores by construction. Specifically, empirical studies are conducted using 50ETF options obtained from the Shanghai Stock Exchange, covering the periods from January 2018 to September 2022 and from December 2023 to October 2025. The pricing of the OTM options shows that the adjusted BSM formulas exhibit better pricing performance compared with the market prices of the OTM options tested. Furthermore, a framework for the empirical analysis of option prices based on the Capital Asset Pricing Model (CAPM) or factor models is discussed, which may lead to option formulas using non-homogeneous heat equations. The latter proposal requires further statistical testing using real market data but offers an alternative to the existing risk-neutral valuation of options.
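For reference, a sketch of the standard Black–Scholes–Merton call price that serves as the paper's baseline; the risk-premium-adjusted formulas themselves are not reproduced here.

```python
# Standard BSM European call price (the unadjusted baseline).
import numpy as np
from scipy.stats import norm

def bsm_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

# An out-of-the-money call: spot 2.8, strike 3.2 (50ETF-like scale)
print(bsm_call(S=2.8, K=3.2, T=0.25, r=0.02, sigma=0.22))
```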
Full article
Open Access Article
Fan Loyalty and Price Elasticity in Sport: Insights from Major League Baseball’s Post-Pandemic Recovery
by
Soojin Choi, Fang Zheng and Seung-Man Lee
Analytics 2025, 4(4), 34; https://doi.org/10.3390/analytics4040034 - 21 Nov 2025
Abstract
The COVID-19 pandemic disrupted traditional patterns of sport consumption, raising questions about whether fans would return to stadiums and how sensitive they would be to ticket prices in the recovery period. This study reconceptualizes ticket price elasticity as a market-based indicator of fan loyalty and applies it to Major League Baseball (MLB) during 2021–2023. Using team–season attendance data from Baseball-Reference, primary-market ticket prices from the Team Marketing Report Fan Cost Index, and secondary-market prices from TicketIQ, we estimate log–log fixed-effects panel models to separate causal price responses from popularity-driven correlations. The results show a strongly negative elasticity of attendance with respect to primary-market prices (β ≈ −7.93, p < 0.001), indicating that higher ticket prices substantially reduce attendance, while secondary-market prices are positively associated with attendance, reflecting demand shocks rather than causal effects. Heterogeneity analyses reveal that brand strength, team performance, and game salience significantly moderate elasticity, supporting the interpretation of inelastic demand as revealed loyalty. These findings highlight the potential of elasticity as a Fan Loyalty Index, providing a replicable framework for measuring consumer resilience. The study offers practical insights for pricing strategy, fan segmentation, and engagement, while emphasizing the broader social role of sport in restoring community identity during post-pandemic recovery.
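A worked reading of the headline estimate, with assumed notation (attendance A, price P, controls and fixed effects in X):

```latex
\ln A_{it} = \alpha_i + \beta \ln P_{it} + \gamma' X_{it} + \varepsilon_{it},
\qquad
\frac{\partial \ln A_{it}}{\partial \ln P_{it}} = \beta \approx -7.93
```

Under this log–log form, a 1% rise in primary-market ticket prices is associated with roughly a 7.9% fall in attendance, holding team and season effects fixed.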
Full article
Open Access Article
AI-Powered Chatbot for FDA Drug Labeling Information Retrieval: OpenAI GPT for Grounded Question Answering
by
Manasa Koppula, Fnu Madhulika, Navya Sreeramoju and Praveen Kolimi
Analytics 2025, 4(4), 33; https://doi.org/10.3390/analytics4040033 - 17 Nov 2025
Abstract
This study presents the development of an AI-powered chatbot designed to facilitate accurate and efficient retrieval of information from FDA drug labeling documents. Leveraging OpenAI’s GPT-3.5-turbo model within a controlled, document-grounded question–answering framework, a chatbot was created that provides users with answers strictly limited to the content of the uploaded drug label, thereby minimizing hallucinations and enhancing traceability. A user-friendly interface built with Streamlit allows users to upload FDA labeling PDFs and pose natural language queries. The chatbot extracts relevant sections using PyMuPDF and regex-based segmentation and generates responses constrained to those sections. To evaluate performance, semantic similarity scores were computed between generated answers and ground truth text using Sentence Transformers. Results across 10 breast cancer drug labels demonstrate high semantic alignment, with most scores ranging from 0.7 to 0.9, indicating reliable summarization and contextual fidelity. The chatbot achieved high semantic similarity scores (≥0.95 for concise sections) and ROUGE scores, confirming strong semantic and textual alignment. Comparative analysis with GPT-5-chat and NotebookLM demonstrated that our approach maintains accuracy and section-specific fidelity across models. The current work is limited to a small dataset, focused on breast cancer drugs. Future work will expand to diverse therapeutic areas and incorporate BERTScore and expert-based validation.
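A minimal sketch of the semantic-similarity scoring step with Sentence Transformers; the model name is an illustrative default, not necessarily the one used in the study.

```python
# Embed a generated answer and its ground-truth label text, then
# compare by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

generated = "Monitor patients for hepatotoxicity during treatment."
reference = "Patients should be monitored for signs of liver toxicity."

emb = model.encode([generated, reference], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()
print(f"semantic similarity: {score:.3f}")
```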
Full article
Open Access Review
Scale-Invariant Correspondence Analysis of Compositional Data
by
Vartan Choulakian and Jacques Allard
Analytics 2025, 4(4), 32; https://doi.org/10.3390/analytics4040032 - 12 Nov 2025
Abstract
Correspondence analysis is a dimension reduction technique for visualizing a non-negative matrix, particularly contingency tables or compositional datasets, but it depends on the row and column marginals of the matrix. Three complementary transformations of the data render CA scale-invariant: first, Greenacre’s scale-invariant approach, valid for positive data; second, Goodman’s marginal-free correspondence analysis, valid for positive or moderately sparse data; third, correspondence analysis of the sign-transformed matrix, valid for sparse or extremely sparse data. We demonstrate these three methods on four real-world datasets with varying levels of sparsity to compare their exploratory performance.
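For contrast with the invariant variants, a minimal sketch of plain CA via SVD of the standardised residual matrix on a toy table; this is the marginal-dependent baseline that the three transformations modify.

```python
# Plain correspondence analysis via SVD of standardised residuals.
import numpy as np

X = np.array([[12, 4, 7],        # toy non-negative contingency table
              [3, 15, 5],
              [6, 2, 9.0]])

P = X / X.sum()                   # correspondence matrix
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardised residuals
U, sv, Vt = np.linalg.svd(S)
row_coords = (U[:, :2] * sv[:2]) / np.sqrt(r)[:, None]  # principal coords
print(row_coords)
```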
Full article
Open Access Article
PlayMyData: A Statistical Analysis of a Video Game Dataset on Review Scores and Gaming Platforms
by
Christian Ellington, Paramahansa Pramanik and Haley K. Robinson
Analytics 2025, 4(4), 31; https://doi.org/10.3390/analytics4040031 - 11 Nov 2025
Abstract
In recent years, video games have become an increasingly popular form of entertainment for consumers of all ages. Given their rapid rise in production, projects such as PlayMyData aim to organize the immense amounts of data that accompany these games into datasets for public research use, primarily covering games bound to modern platforms that are still actively developed and improved. This study examines differences in video game review scores across the four listed platforms—Nintendo, Xbox, PlayStation, and PC—for titles on each platform. Through analysis of variance (ANOVA) testing and several other statistical analyses, significant differences between the platforms were observed, with PC games receiving the most positive scores and consistently outperforming the other three platforms, Xbox and PlayStation trailing PC, and Nintendo receiving the lowest review scores overall. These results illustrate the influence of platforms on player ratings and provide insight for developers and market analysts seeking to develop and invest in console platform video games.
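A minimal sketch of the core one-way ANOVA comparison, with simulated review-score samples standing in for the PlayMyData records.

```python
# One-way ANOVA on review scores grouped by platform.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
pc = rng.normal(78, 10, 200)          # illustrative review-score samples
xbox = rng.normal(74, 10, 200)
playstation = rng.normal(74, 10, 200)
nintendo = rng.normal(70, 10, 200)

f_stat, p_value = stats.f_oneway(pc, xbox, playstation, nintendo)
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```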
Full article
Open Access Article
System Inertia Cost Forecasting Using Machine Learning: A Data-Driven Approach for Grid Energy Trading in Great Britain
by
Maitreyee Dey, Soumya Prakash Rana and Preeti Patel
Analytics 2025, 4(4), 30; https://doi.org/10.3390/analytics4040030 - 23 Oct 2025
Abstract
As modern power systems integrate more renewable and decentralised generation, maintaining grid stability has become increasingly challenging. This study proposes a data-driven machine learning framework for forecasting system inertia service costs—a key yet underexplored variable influencing energy trading and frequency stability in Great Britain. Using eight years (2017–2024) of National Energy System Operator (NESO) data, four models—Long Short-Term Memory (LSTM), Residual LSTM, eXtreme Gradient Boosting (XGBoost), and Light Gradient-Boosting Machine (LightGBM)—are comparatively analysed. LSTM-based models capture temporal dependencies, while ensemble methods effectively handle nonlinear feature relationships. Results demonstrate that LightGBM achieves the highest predictive accuracy, offering a robust method for inertia cost estimation and market intelligence. The framework contributes to strategic procurement planning and supports market design for a more resilient, cost-effective grid.
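A minimal sketch of the LightGBM baseline with its scikit-learn API, using simulated features standing in for the NESO series; the paper's feature engineering and tuning are not reproduced.

```python
# Gradient-boosting regression sketch for a cost-forecasting target.
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(5)
X = rng.normal(size=(2000, 8))            # e.g., lagged demand, frequency
y = X[:, 0] * 3 + np.sin(X[:, 1]) + rng.normal(0, 0.3, 2000)  # inertia cost

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=0)
model = lgb.LGBMRegressor(n_estimators=400, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("MAE:", mean_absolute_error(y_te, model.predict(X_te)))
```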
Full article
(This article belongs to the Special Issue Business Analytics and Applications)
Open Access Article
Distributional CNN-LSTM, KDE, and Copula Approaches for Multimodal Multivariate Data: Assessing Conditional Treatment Effects
by
Jong-Min Kim
Analytics 2025, 4(4), 29; https://doi.org/10.3390/analytics4040029 - 21 Oct 2025
Abstract
We introduce a distributional CNN-LSTM framework for probabilistic multivariate modeling and heterogeneous treatment effect (HTE) estimation. The model jointly captures complex dependencies among multiple outcomes and enables precise estimation of individual-level conditional average treatment effects (CATEs). In simulation studies with multivariate Gaussian mixtures, the CNN-LSTM demonstrates robust density estimation and strong CATE recovery, particularly as mixture complexity increases, while classical methods such as Kernel Density Estimation (KDE) and Gaussian Copulas may achieve higher log-likelihood or coverage in simpler scenarios. On real-world datasets, including Iris and Criteo Uplift, the CNN-LSTM achieves the lowest CATE RMSE, confirming its practical utility for individualized prediction, although KDE and Gaussian Copula approaches may perform better on global likelihood or coverage metrics. These results indicate that the CNN-LSTM can be trained efficiently on moderate-sized datasets while maintaining stable predictive performance. Overall, the framework is particularly valuable in applications requiring accurate individual-level effect estimation and handling of multimodal heterogeneity—such as personalized medicine, economic policy evaluation, and environmental risk assessment—with its primary strength being superior CATE recovery under complex outcome distributions, even when likelihood-based metrics favor simpler baselines.
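A minimal sketch of the CATE evaluation used to compare the methods: the treatment-effect estimate is the difference between predicted treated and control outcomes, scored by RMSE against a simulated truth; the CNN-LSTM itself is not reproduced.

```python
# CATE-recovery scoring sketch on simulated potential outcomes.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=1000)
true_cate = 1.0 + 0.5 * x                       # heterogeneous effect
mu0_hat = 0.2 * x + rng.normal(0, 0.1, 1000)    # any model's predictions
mu1_hat = mu0_hat + true_cate + rng.normal(0, 0.2, 1000)

cate_hat = mu1_hat - mu0_hat
rmse = np.sqrt(np.mean((cate_hat - true_cate) ** 2))
print(f"CATE RMSE: {rmse:.3f}")
```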
Full article
Open Access Article
Reservoir Computation with Networks of Differentiating Neuron Ring Oscillators
by
Alexander Yeung, Peter DelMastro, Arjun Karuvally, Hava Siegelmann, Edward Rietman and Hananel Hazan
Analytics 2025, 4(4), 28; https://doi.org/10.3390/analytics4040028 - 20 Oct 2025
Abstract
Reservoir computing is an approach to machine learning that leverages the dynamics of a complex system alongside a simple, often linear, machine learning model for a designated task. While many efforts have previously focused their attention on integrating neurons, which produce an output in response to large, sustained inputs, we focus on using differentiating neurons, which produce an output in response to large changes in input. Here, we introduce a small-world graph built from rings of differentiating neurons as a Reservoir Computing substrate. We find the coupling strength and network topology that enable these small-world networks to function as an effective reservoir. The dynamics of differentiating neurons naturally give rise to oscillatory dynamics when arranged in rings, where we study their computational use in the Reservoir Computing setting. We demonstrate the efficacy of these networks in the MNIST digit recognition task, achieving performance of 90.65%, comparable to existing Reservoir Computing approaches. Beyond accuracy, we conduct systematic analysis of our reservoir’s internal dynamics using three complementary complexity measures that quantify neuronal activity balance, input dependence, and effective dimensionality. Our analysis reveals that optimal performance emerges when the reservoir operates with intermediate levels of neural entropy and input sensitivity, consistent with the edge-of-chaos hypothesis, where the system balances stability and responsiveness. The findings suggest that differentiating neurons can be a potential alternative to integrating neurons and can provide a sustainable future alternative for power-hungry AI applications.
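A minimal sketch of the reservoir-computing recipe itself: a fixed random recurrent network driven by the input, with only a linear readout trained. Standard tanh (echo-state) units stand in for the paper's differentiating-neuron rings.

```python
# Echo-state-style reservoir with a ridge-regression readout.
import numpy as np

rng = np.random.default_rng(7)
n_res, n_in = 200, 1
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(0, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # scale spectral radius

u = np.sin(np.linspace(0, 20 * np.pi, 2000))[:, None]   # toy input signal
y_target = np.roll(u[:, 0], -1)                         # predict next step

states = np.zeros((len(u), n_res))
x = np.zeros(n_res)
for t in range(len(u)):
    x = np.tanh(W @ x + W_in @ u[t])   # reservoir state update
    states[t] = x

# Train only the linear readout on the collected states
ridge = 1e-6 * np.eye(n_res)
W_out = np.linalg.solve(states.T @ states + ridge, states.T @ y_target)
pred = states @ W_out
```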
Full article
Topics
Topic in Applied Sciences, Future Internet, AI, Analytics, BDCC
Data Intelligence and Computational Analytics
Topic Editors: Carson K. Leung, Fei Hao, Xiaokang Zhou. Deadline: 30 November 2026
Special Issues
Special Issue in Analytics
Reviews on Data Analytics and Its Applications
Guest Editor: Carson K. Leung. Deadline: 31 March 2026
Special Issue in Analytics
Critical Challenges in Large Language Models and Data Analytics: Trustworthiness, Scalability, and Societal Impact
Guest Editors: Oluwaseun Ajao, Bayode Ogunleye, Hemlata Sharma. Deadline: 31 July 2026
Special Issue in Analytics
Business Analytics and Applications, 2nd Edition
Guest Editor: Tatiana Ermakova. Deadline: 30 September 2026