Journal Description
Big Data and Cognitive Computing
Big Data and Cognitive Computing is an international, peer-reviewed, open access journal on big data and cognitive computing published monthly online by MDPI.
- Open Access: free for readers, with article processing charges (APC) paid by authors or their institutions.
- High Visibility: indexed within Scopus, ESCI (Web of Science), dblp, Inspec, Ei Compendex, and other databases.
- Journal Rank: CiteScore - Q1 (Management Information Systems)
- Rapid Publication: manuscripts are peer-reviewed and a first decision is provided to authors approximately 18.2 days after submission; acceptance to publication takes 3.9 days (median values for papers published in this journal in the second half of 2023).
- Recognition of Reviewers: reviewers who provide timely, thorough peer-review reports receive vouchers entitling them to a discount on the APC of their next publication in any MDPI journal, in appreciation of the work done.
Impact Factor: 3.7 (2022)
Latest Articles
Cancer Detection Using a New Hybrid Method Based on Pattern Recognition in MicroRNAs Combining Particle Swarm Optimization Algorithm and Artificial Neural Network
Big Data Cogn. Comput. 2024, 8(3), 33; https://doi.org/10.3390/bdcc8030033 - 19 Mar 2024
Abstract
MicroRNAs (miRNAs) play a crucial role in cancer development, but not all miRNAs are equally significant in cancer detection. Traditional methods face challenges in effectively identifying cancer-associated miRNAs due to data complexity and volume. This study introduces a novel, feature-based technique for detecting attributes related to cancer-affecting microRNAs. It aims to enhance cancer diagnosis accuracy by identifying the most relevant miRNAs for various cancer types using a hybrid approach. In particular, we used a combination of particle swarm optimization (PSO) and artificial neural networks (ANNs) for this purpose. PSO was employed for feature selection, focusing on identifying the most informative miRNAs, while ANNs were used for recognizing patterns within the miRNA data. This hybrid method aims to overcome limitations in traditional miRNA analysis by reducing data redundancy and focusing on key genetic markers. The application of this method showed a significant improvement in the detection accuracy for various cancers, including breast and lung cancer and melanoma. Our approach demonstrated a higher precision in identifying relevant miRNAs compared to existing methods, as evidenced by the analysis of different datasets. The study concludes that the integration of PSO and ANNs provides a more efficient, cost-effective, and accurate method for cancer detection via miRNA analysis. This method can serve as a supplementary tool for cancer diagnosis and potentially aid in developing personalized cancer treatments.
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
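The PSO-based feature-selection step described in this abstract can be illustrated with a toy binary particle swarm over hypothetical miRNA relevance scores. The relevance values, the fitness function, and the update probabilities below are illustrative assumptions, not details from the paper:

```python
import random

# Toy "relevance" of 8 hypothetical miRNA features (assumed values);
# the fitness rewards relevant features and penalises subset size,
# mimicking PSO-driven selection of informative miRNAs.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.2, 0.85, 0.1]

def fitness(mask):
    gain = sum(r for r, m in zip(RELEVANCE, mask) if m)
    return gain - 0.15 * sum(mask)  # penalise large subsets

def binary_pso(n_particles=20, n_iter=50, seed=0):
    rng = random.Random(seed)
    dim = len(RELEVANCE)
    swarm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    best = max(swarm, key=fitness)[:]
    for _ in range(n_iter):
        for p in swarm:
            for d in range(dim):
                if rng.random() < 0.7:
                    p[d] = best[d]       # attraction toward the global best
                elif rng.random() < 0.1:
                    p[d] = 1 - p[d]      # random exploration (bit flip)
            if fitness(p) > fitness(best):
                best = p[:]
    return best

best_mask = binary_pso()
print(best_mask, round(fitness(best_mask), 2))
```

In the paper's pipeline, the fitness would instead come from how well an ANN classifies cancer types using the selected miRNAs.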
Open Access Article
AI-Generated Text Detector for Arabic Language Using Encoder-Based Transformer Architecture
by
Hamed Alshammari, Ahmed El-Sayed and Khaled Elleithy
Big Data Cogn. Comput. 2024, 8(3), 32; https://doi.org/10.3390/bdcc8030032 - 18 Mar 2024
Abstract
The effectiveness of existing AI detectors is notably hampered when processing Arabic texts. This study introduces a novel AI text classifier designed specifically for Arabic, tackling the distinct challenges inherent in processing this language. A particular focus is placed on accurately recognizing human-written texts (HWTs), an area where existing AI detectors have demonstrated significant limitations. To achieve this goal, this paper utilized and fine-tuned two Transformer-based models, AraELECTRA and XLM-R, by training them on two distinct datasets: a large dataset comprising 43,958 examples and a custom dataset with 3078 examples that contain HWT and AI-generated texts (AIGTs) from various sources, including ChatGPT 3.5, ChatGPT-4, and BARD. The proposed architecture is adaptable to any language, but this work evaluates these models’ efficiency in recognizing HWTs versus AIGTs in Arabic as an example of Semitic languages. The performance of the proposed models has been compared against the two prominent existing AI detectors, GPTZero and OpenAI Text Classifier, particularly on the AIRABIC benchmark dataset. The results reveal that the proposed classifiers outperform both GPTZero and OpenAI Text Classifier with 81% accuracy compared to 63% and 50% for GPTZero and OpenAI Text Classifier, respectively. Furthermore, integrating a Dediacritization Layer prior to the classification model demonstrated a significant enhancement in the detection accuracy of both HWTs and AIGTs. This Dediacritization step markedly improved the classification accuracy, elevating it from 81% to as high as 99% and, in some instances, even achieving 100%.
Full article
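The Dediacritization Layer the authors insert before classification can be approximated, under the assumption that it strips Arabic diacritic marks (tashkeel), as a simple regex pre-processing step:

```python
import re

# Arabic diacritics (tashkeel) occupy the Unicode range U+064B..U+0652.
# Removing them is one plausible reading of the paper's "Dediacritization
# Layer" (an assumption about its implementation, not a confirmed detail).
DIACRITICS = re.compile(r"[\u064B-\u0652]")

def dediacritize(text: str) -> str:
    return DIACRITICS.sub("", text)

print(dediacritize("كَتَبَ"))  # -> "كتب"
```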
Open Access Article
Machine Learning Approaches for Predicting Risk of Cardiometabolic Disease among University Students
by
Dhiaa Musleh, Ali Alkhwaja, Ibrahim Alkhwaja, Mohammed Alghamdi, Hussam Abahussain, Mohammed Albugami, Faisal Alfawaz, Said El-Ashker and Mohammed Al-Hariri
Big Data Cogn. Comput. 2024, 8(3), 31; https://doi.org/10.3390/bdcc8030031 - 13 Mar 2024
Abstract
Obesity is increasingly becoming a prevalent health concern among adolescents, leading to significant risks like cardiometabolic diseases (CMDs). The early discovery and diagnosis of CMDs are essential for better outcomes. This study aims to build a reliable artificial intelligence model that can predict CMD using various machine learning techniques. Support vector machines (SVMs), k-nearest neighbor (KNN), logistic regression (LR), random forest (RF), and gradient boosting are the five robust classifiers compared in this study. A novel “risk level” feature, previously unused and derived through fuzzy logic applied to the Conicity Index, is introduced to enhance the interpretability and discriminatory properties of the proposed models. As the Conicity Index scores indicate CMD risk, two separate models are developed to address each gender individually. The performance of the proposed models is assessed using two datasets obtained from 295 records of undergraduate students in Saudi Arabia. The dataset comprises 121 male and 174 female students with diverse risk levels. Notably, logistic regression emerges as the top performer among males, achieving an accuracy score of 91%, while gradient boosting lags with a score of 72%. Among females, both the support vector machine and logistic regression lead with an accuracy score of 87%, while random forest performs least well with a score of 80%.
Full article
(This article belongs to the Special Issue Revolutionizing Healthcare: Exploring the Latest Advances in Digital Health Technology)
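The Conicity Index underlying the fuzzy “risk level” feature is conventionally computed from waist circumference, weight, and height. A minimal sketch using the standard formula (the input values below are illustrative, not from the study's dataset):

```python
import math

def conicity_index(waist_m, weight_kg, height_m):
    # Standard Conicity Index: waist circumference relative to the
    # circumference of a cylinder with the same weight and height.
    return waist_m / (0.109 * math.sqrt(weight_kg / height_m))

# Illustrative measurements (not from the study's 295 records):
ci = conicity_index(waist_m=0.80, weight_kg=70, height_m=1.75)
print(round(ci, 3))
```

In the paper, fuzzy membership functions over scores like this one would map each student to a graded risk level.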
Open Access Article
Proposal of a Service Model for Blockchain-Based Security Tokens
by
Keundug Park and Heung-Youl Youm
Big Data Cogn. Comput. 2024, 8(3), 30; https://doi.org/10.3390/bdcc8030030 - 12 Mar 2024
Abstract
The volume of the asset investment and trading market can be expanded through the issuance and management of blockchain-based security tokens that logically divide the value of assets and guarantee ownership. This paper proposes a service model to solve a problem with the existing investment service model, identifies security threats to the service model, and specifies security requirements countering the identified security threats for privacy protection and anti-money laundering (AML) involving security tokens. The identified security threats and specified security requirements should be taken into consideration when implementing the proposed service model. The proposed service model allows users to invest in tokenized tangible and intangible assets and trade in blockchain-based security tokens. This paper discusses considerations to prevent excessive regulation and market monopoly in the issuance of and trading in security tokens when implementing the proposed service model and concludes with future works.
Full article
(This article belongs to the Special Issue Blockchain Meets IoT for Big Data)
Open Access Article
The Distribution and Accessibility of Elements of Tourism in Historic and Cultural Cities
by
Wei-Ling Hsu, Yi-Jheng Chang, Lin Mou, Juan-Wen Huang and Hsin-Lung Liu
Big Data Cogn. Comput. 2024, 8(3), 29; https://doi.org/10.3390/bdcc8030029 - 11 Mar 2024
Abstract
Historic urban areas are the foundations of urban development. Due to rapid urbanization, the sustainable development of historic urban areas has become challenging for many cities. Elements of tourism and tourism service facilities play an important role in the sustainable development of historic areas. This study analyzed policies related to tourism in Panguifang and Meixian districts in Meizhou, Guangdong, China. Kernel density estimation was used to study the clustering characteristics of tourism elements through point of interest (POI) data, while space syntax was used to study the accessibility of roads. In addition, the Pearson correlation coefficient and regression were used to analyze the correlation between the elements and accessibility. The results show the following: (1) the overall number of tourism elements was high on the western side of the districts and low on the eastern one, and the elements were predominantly distributed along the main transportation arteries; (2) according to the integration degree and depth value, the western side was easier to access than the eastern one; and (3) the depth value of the area negatively correlated with kernel density, while the degree of integration positively correlated with it. Based on the results, the study put forward measures for optimizing the elements of tourism in Meizhou’s historic urban area to improve cultural tourism and emphasize the importance of the elements.
Full article
(This article belongs to the Special Issue Big Data Analytics for Cultural Heritage 2nd Edition)
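The kernel density estimation and Pearson correlation steps of this study can be sketched in miniature; the POI coordinates and bandwidth below are illustrative assumptions:

```python
import math

def kernel_density(x, y, points, bandwidth=1.0):
    # 2-D Gaussian kernel density estimate at (x, y) from POI coordinates,
    # a minimal stand-in for the study's KDE over tourism elements.
    n = len(points)
    s = sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
            for px, py in points)
    return s / (n * 2 * math.pi * bandwidth ** 2)

def pearson(a, b):
    # Pearson correlation coefficient, as used to relate density
    # to the space-syntax accessibility measures.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) *
                           sum((y - mb) ** 2 for y in b))

pois = [(0, 0), (0.5, 0.2), (3, 3)]
print(round(kernel_density(0, 0, pois), 4))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # perfectly correlated -> 1.0
```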
Open Access Article
Enhancing Supervised Model Performance in Credit Risk Classification Using Sampling Strategies and Feature Ranking
by
Niwan Wattanakitrungroj, Pimchanok Wijitkajee, Saichon Jaiyen, Sunisa Sathapornvajana and Sasiporn Tongman
Big Data Cogn. Comput. 2024, 8(3), 28; https://doi.org/10.3390/bdcc8030028 - 06 Mar 2024
Abstract
For the financial health of lenders and institutions, one important risk assessment, called credit risk, is about correctly deciding whether or not a borrower will fail to repay a loan. It not only helps in the approval or denial of loan applications but also aids in managing the non-performing loan (NPL) trend. In this study, a dataset provided by the LendingClub company, based in San Francisco, CA, USA, covering 2007 to 2020 and consisting of 2,925,492 records and 141 attributes, was experimented with. The loan status was categorized as “Good” or “Risk”. To yield highly effective results, experiments on credit risk prediction were performed using three widely adopted supervised machine learning techniques: logistic regression, random forest, and gradient boosting. In addition, to solve the imbalanced data problem, three sampling algorithms, including under-sampling, over-sampling, and combined sampling, were employed. The results show that the gradient boosting technique achieves nearly perfect values, better than 99.92%, on most of the evaluation metrics, and greater than 99.77% on the remaining one. All three imbalanced-data handling approaches enhanced the performance of models trained with the three algorithms. Moreover, the experiment of reducing the number of features based on mutual information calculation revealed slightly decreasing performance for 50 data features, with values greater than 99.86%. For 25 data features, the smallest size tested, the random forest supervised model still yielded 99.15%. Both sampling strategies and feature selection help to improve the supervised model for accurately predicting credit risk, which may be beneficial in the lending business.
Full article
(This article belongs to the Topic Big Data and Artificial Intelligence, 2nd Volume)
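Of the three sampling strategies mentioned, random under-sampling is the simplest to sketch. The records and label key below are hypothetical, standing in for the LendingClub rows:

```python
import random

def undersample(records, label_key, seed=0):
    # Random under-sampling: shrink every class to the minority-class
    # size. Over-sampling and combined sampling follow the same shape
    # but duplicate minority records instead of (or as well as) dropping.
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label_key], []).append(r)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for v in by_class.values():
        balanced.extend(rng.sample(v, n_min))
    rng.shuffle(balanced)
    return balanced

data = [{"status": "Good"}] * 90 + [{"status": "Risk"}] * 10
balanced = undersample(data, "status")
print(len(balanced))  # -> 20 (10 per class)
```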
Open Access Article
Temporal Dynamics of Citizen-Reported Urban Challenges: A Comprehensive Time Series Analysis
by
Andreas F. Gkontzis, Sotiris Kotsiantis, Georgios Feretzakis and Vassilios S. Verykios
Big Data Cogn. Comput. 2024, 8(3), 27; https://doi.org/10.3390/bdcc8030027 - 04 Mar 2024
Abstract
In an epoch characterized by the swift pace of digitalization and urbanization, the essence of community well-being hinges on the efficacy of urban management. As cities burgeon and transform, the need for astute strategies to navigate the complexities of urban life becomes increasingly paramount. This study employs time series analysis to scrutinize citizen interactions with the coordinate-based problem mapping platform in the Municipality of Patras in Greece. The research explores the temporal dynamics of reported urban issues, with a specific focus on identifying recurring patterns through the lens of seasonality. The analysis, employing the seasonal decomposition technique, dissects time series data to expose trends in reported issues and areas of the city that might be obscured in raw big data. It accentuates a distinct seasonal pattern, with concentrations peaking during the summer months. The study extends its approach to forecasting, providing insights into the anticipated evolution of urban issues over time. Projections for the coming years show a consistent upward trend in both overall city issues and those reported in specific areas, with distinct seasonal variations. This comprehensive exploration of time series analysis and seasonality provides valuable insights for city stakeholders, enabling informed decision-making and predictions regarding future urban challenges.
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
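The seasonal decomposition idea can be illustrated with a minimal additive model over synthetic monthly issue counts; the numbers are invented to show a summer peak and are not the Patras platform data:

```python
# Minimal additive seasonal decomposition (period = 12 months): the
# seasonal component of each month is its mean deviation from the
# overall mean, in the spirit of the decomposition the study applies.
def seasonal_means(series, period=12):
    overall = sum(series) / len(series)
    seasonal = []
    for m in range(period):
        vals = series[m::period]
        seasonal.append(sum(vals) / len(vals) - overall)
    return seasonal

# Synthetic monthly issue counts over 3 years with a summer peak:
base = [10, 10, 12, 14, 16, 20, 28, 30, 22, 15, 12, 10]
series = base * 3
s = seasonal_means(series)
print(max(range(12), key=lambda m: s[m]))  # month index of the seasonal peak
```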
Open Access Article
Democratic Erosion of Data-Opolies: Decentralized Web3 Technological Paradigm Shift Amidst AI Disruption
by
Igor Calzada
Big Data Cogn. Comput. 2024, 8(3), 26; https://doi.org/10.3390/bdcc8030026 - 26 Feb 2024
Abstract
This article investigates the intricate dynamics of data monopolies, referred to as “data-opolies”, and their implications for democratic erosion. Data-opolies, typically embodied by large technology corporations, accumulate extensive datasets, affording them significant influence. The sustainability of such data practices is critically examined within the context of decentralized Web3 technologies amidst Artificial Intelligence (AI) disruption. Additionally, the article explores emancipatory datafication strategies to counterbalance the dominance of data-opolies. It presents an in-depth analysis of two emergent phenomena within the decentralized Web3 emerging landscape: People-Centered Smart Cities and Datafied Network States. The article investigates a paradigm shift in data governance and advocates for joint efforts to establish equitable data ecosystems, with an emphasis on prioritizing data sovereignty and achieving digital self-governance. It elucidates the remarkable roles of (i) blockchain, (ii) decentralized autonomous organizations (DAOs), and (iii) data cooperatives in empowering citizens to have control over their personal data. In conclusion, the article introduces a forward-looking examination of Web3 decentralized technologies, outlining a timely path toward a more transparent, inclusive, and emancipatory data-driven democracy. This approach challenges the prevailing dominance of data-opolies and offers a framework for regenerating datafied democracies through decentralized and emerging Web3 technologies.
Full article
Open Access Article
Sign-to-Text Translation from Panamanian Sign Language to Spanish in Continuous Capture Mode with Deep Neural Networks
by
Alvaro A. Teran-Quezada, Victor Lopez-Cabrera, Jose Carlos Rangel and Javier E. Sanchez-Galan
Big Data Cogn. Comput. 2024, 8(3), 25; https://doi.org/10.3390/bdcc8030025 - 26 Feb 2024
Abstract
Convolutional neural networks (CNNs) have provided great advances for the task of sign language recognition (SLR). However, recurrent neural networks (RNNs) in the form of long short-term memory (LSTM) have become a means of solving problems involving sequential data. This research proposes the development of a sign language translation system that converts Panamanian Sign Language (PSL) signs into Spanish text using an LSTM model that, among other things, makes it possible to work with non-static signs (as sequential data). The deep learning model presented focuses on action detection, in this case, the execution of the signs. This involves precisely processing the frames in which a sign language gesture is made. The proposal is a holistic solution that considers, in addition to tracking the signer's hands, face and pose determinants. These were added because, when communicating through sign languages, other visual characteristics matter beyond hand gestures. For the training of this system, a dataset of 330 videos (of 30 frames each) for five possible classes (different signs considered) was created. The model was tested, achieving an accuracy of 98.8%, making this a valuable base system for effective communication between PSL users and Spanish speakers. In conclusion, this work improves the state of the art for PSL–Spanish translation by exploiting the possibilities of translatable signs via deep learning.
Full article
(This article belongs to the Special Issue Advances and Applications of Deep Learning Methods and Image Processing)
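The LSTM processing of 30-frame sign clips can be sketched as a single pure-NumPy LSTM cell pass. The per-frame feature size, hidden size, and random weights are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

# One LSTM cell unrolled over a 30-frame clip of pose/hand keypoints,
# mirroring the shape of the paper's setup (30 frames per sign video).
rng = np.random.default_rng(0)
T, F, H = 30, 8, 16            # frames, features per frame, hidden units

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W = rng.normal(0, 0.1, (4 * H, F + H))   # gates: input, forget, cell, output
b = np.zeros(4 * H)
x = rng.normal(0, 1, (T, F))             # one synthetic "sign" clip

h, c = np.zeros(H), np.zeros(H)
for t in range(T):
    z = W @ np.concatenate([x[t], h]) + b
    i, f, g, o = z[:H], z[H:2*H], z[2*H:3*H], z[3*H:]
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update cell state
    h = sigmoid(o) * np.tanh(c)                     # emit hidden state

print(h.shape)  # final hidden state, fed to a dense softmax over sign classes
```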
Open Access Article
Experimental Evaluation: Can Humans Recognise Social Media Bots?
by
Maxim Kolomeets, Olga Tushkanova, Vasily Desnitsky, Lidia Vitkova and Andrey Chechulin
Big Data Cogn. Comput. 2024, 8(3), 24; https://doi.org/10.3390/bdcc8030024 - 26 Feb 2024
Abstract
This paper aims to test the hypothesis that the quality of social media bot detection systems based on supervised machine learning may not be as accurate as researchers claim, given that bots have become increasingly sophisticated, making it difficult for human annotators to detect them better than random selection. As a result, obtaining a ground-truth dataset with human annotation is not possible, which leads to supervised machine-learning models inheriting annotation errors. To test this hypothesis, we conducted an experiment where humans were tasked with recognizing malicious bots on the VKontakte social network. We then compared the “human” answers with the “ground-truth” bot labels (‘a bot’/‘not a bot’). Based on the experiment, we evaluated the bot detection efficiency of annotators in three scenarios typical for cybersecurity but differing in their detection difficulty as follows: (1) detection among random accounts, (2) detection among accounts of a social network ‘community’, and (3) detection among verified accounts. The study showed that humans could only detect simple bots in all three scenarios but could not detect more sophisticated ones (p-value = 0.05). The study also evaluates the limits of hypothetical and existing bot detection systems that leverage non-expert-labelled datasets as follows: the balanced accuracy of such systems can drop to 0.5 and lower, depending on bot complexity and detection scenario. The paper also describes the experiment design, collected datasets, statistical evaluation, and machine learning accuracy measures applied to support the results. In the discussion, we raise the question of using human labelling in bot detection systems and its potential cybersecurity issues. We also provide open access to the datasets used, experiment results, and software code for evaluating statistical and machine learning accuracy metrics used in this paper on GitHub.
Full article
(This article belongs to the Special Issue Security, Privacy, and Trust in Artificial Intelligence Applications)
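The balanced accuracy figure the study reports (dropping to 0.5 and lower) is the mean of per-class recalls; a minimal implementation with toy bot/human labels:

```python
def balanced_accuracy(y_true, y_pred):
    # Balanced accuracy = mean of per-class recalls; with annotators
    # guessing at random in a two-class bot/not-bot task it sits near 0.5.
    classes = set(y_true)
    recalls = []
    for cls in classes:
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

y_true = ["bot", "bot", "human", "human"]
print(balanced_accuracy(y_true, ["bot", "human", "human", "bot"]))  # -> 0.5
```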
Open Access Article
Solar and Wind Data Recognition: Fourier Regression for Robust Recovery
by
Abdullah F. Al-Aboosi, Aldo Jonathan Muñoz Vazquez, Fadhil Y. Al-Aboosi, Mahmoud El-Halwagi and Wei Zhan
Big Data Cogn. Comput. 2024, 8(3), 23; https://doi.org/10.3390/bdcc8030023 - 24 Feb 2024
Abstract
Accurate prediction of renewable energy output is essential for integrating sustainable energy sources into the grid, facilitating a transition towards a more resilient energy infrastructure. Novel applications of machine learning and artificial intelligence are being leveraged to enhance forecasting methodologies, enabling more accurate predictions and optimized decision-making capabilities. Integrating these novel paradigms improves forecasting accuracy, fostering a more efficient and reliable energy grid. These advancements allow better demand management, optimize resource allocation, and improve robustness to potential disruptions. Solar intensity and wind speed data are often recorded by sensor-equipped instruments, which may encounter intermittent or permanent faults. Hence, this paper proposes a novel Fourier network regression model to process solar irradiance and wind speed data. The proposed approach enables accurate prediction of the underlying smooth components, facilitating effective reconstruction of missing data and enhancing the overall forecasting performance. The present study focuses on Midland, Texas, as a case study to assess direct normal irradiance (DNI), diffuse horizontal irradiance (DHI), and wind speed. Remarkably, the model exhibits a correlation of 1 with a minimal RMSE (root mean square error) of 0.0007555. This study leverages Fourier analysis for renewable energy applications, with the aim of establishing a methodology that can be applied to a novel geographic context.
Full article
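Fourier-based recovery of faulty sensor data can be sketched as least-squares fitting of a truncated Fourier basis; the harmonic count and the synthetic daily irradiance-like signal are assumptions, not the paper's exact model:

```python
import numpy as np

# Fit a truncated Fourier series by least squares, then use it to
# reconstruct values at time stamps where the sensor "failed".
def fourier_design(t, period, n_harmonics):
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t / period))
        cols.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(cols)

t = np.arange(0.0, 48.0, 0.5)                    # two days, half-hour steps
signal = 500 + 300 * np.sin(2 * np.pi * t / 24)  # synthetic daily curve
keep = np.ones_like(t, dtype=bool)
keep[40:60] = False                              # simulate a sensor outage

X = fourier_design(t[keep], period=24, n_harmonics=3)
coef, *_ = np.linalg.lstsq(X, signal[keep], rcond=None)
recovered = fourier_design(t, 24, 3) @ coef

err = float(np.max(np.abs(recovered - signal)))
print(err)  # near-zero: the gap is reconstructed almost exactly
```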
Open Access Article
Comparison of Bagging and Sparsity Methods for Connectivity Reduction in Spiking Neural Networks with Memristive Plasticity
by
Roman Rybka, Yury Davydov, Danila Vlasov, Alexey Serenko, Alexander Sboev and Vyacheslav Ilyin
Big Data Cogn. Comput. 2024, 8(3), 22; https://doi.org/10.3390/bdcc8030022 - 23 Feb 2024
Abstract
Developing a spiking neural network architecture that could prospectively be trained on energy-efficient neuromorphic hardware to solve various data analysis tasks requires satisfying the limitations of prospective analog or digital hardware, i.e., local learning and limited numbers of connections, respectively. In this work, we compare two methods of connectivity reduction that are applicable to spiking networks with local plasticity: instead of a large fully connected network (used as the baseline for comparison), we employ either an ensemble of independent small networks or a network with probabilistic sparse connectivity. We evaluate both methods with a three-layer spiking neural network applied to handwritten and spoken digit classification tasks, using two memristive plasticity models and the classical spike-timing-dependent plasticity (STDP) rule. Both methods achieve an F1-score of 0.93–0.95 on the handwritten digit recognition task and 0.85–0.93 on the spoken digit recognition task. Applying a combination of both methods made it possible to obtain highly accurate models while reducing the number of connections by more than three times compared to the basic model.
Full article
(This article belongs to the Special Issue Computational Intelligence: Spiking Neural Networks)
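The probabilistic sparse connectivity method can be sketched as a Bernoulli mask over synapses; the layer sizes and connection probability below are illustrative, chosen so the reduction lands near the reported three-fold figure:

```python
import random

def sparse_connections(n_pre, n_post, p, seed=0):
    # Probabilistic sparse connectivity: each pre->post synapse exists
    # with probability p, cutting connections roughly (1/p)-fold versus
    # a fully connected layer.
    rng = random.Random(seed)
    return [(i, j) for i in range(n_pre) for j in range(n_post)
            if rng.random() < p]

full = 784 * 100                       # fully connected baseline
conns = sparse_connections(784, 100, p=0.3)
print(full / len(conns))               # roughly a 3.3x reduction
```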
Open Access Article
Anomaly Detection of IoT Cyberattacks in Smart Cities Using Federated Learning and Split Learning
by
Ishaani Priyadarshini
Big Data Cogn. Comput. 2024, 8(3), 21; https://doi.org/10.3390/bdcc8030021 - 22 Feb 2024
Abstract
The swift proliferation of the Internet of Things (IoT) devices in smart city infrastructures has created an urgent demand for robust cybersecurity measures. These devices are susceptible to various cyberattacks that can jeopardize the security and functionality of urban systems. This research presents an innovative approach to identifying anomalies caused by IoT cyberattacks in smart cities. The proposed method harnesses federated and split learning and addresses the dual challenge of enhancing IoT network security while preserving data privacy. This study conducts extensive experiments using authentic datasets from smart cities. To compare the performance of classical machine learning algorithms and deep learning models for detecting anomalies, model effectiveness is assessed using precision, recall, F-1 score, accuracy, and training/deployment time. The findings demonstrate that federated learning and split learning have the potential to balance data privacy concerns with competitive performance, providing robust solutions for detecting IoT cyberattacks. This study contributes to the ongoing discussion about securing IoT deployments in urban settings. It lays the groundwork for scalable and privacy-conscious cybersecurity strategies. The results underscore the vital role of these techniques in fortifying smart cities and promoting the development of adaptable and resilient cybersecurity measures in the IoT era.
Full article
(This article belongs to the Special Issue Deep Network Learning and Its Applications)
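The federated learning side of the approach rests on aggregating client models without sharing raw data; a minimal federated-averaging (FedAvg) sketch over flat weight vectors, with toy weights and client sizes:

```python
def fed_avg(client_weights, client_sizes):
    # Federated averaging: combine per-client model weights, weighted by
    # local dataset size, so raw (privacy-sensitive) IoT data never
    # leaves the clients. Weights are flat lists here for simplicity.
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[d] * n for w, n in zip(client_weights, client_sizes)) / total
            for d in range(dim)]

w_global = fed_avg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[1, 3])
print(w_global)  # -> [2.5, 3.5]
```

Split learning differs in that the network itself, rather than the data, is partitioned between client and server.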
Open Access Article
A Machine Learning-Based Pipeline for the Extraction of Insights from Customer Reviews
by
Róbert Lakatos, Gergő Bogacsovics, Balázs Harangi, István Lakatos, Attila Tiba, János Tóth, Marianna Szabó and András Hajdu
Big Data Cogn. Comput. 2024, 8(3), 20; https://doi.org/10.3390/bdcc8030020 - 22 Feb 2024
Abstract
The efficiency of natural language processing has improved dramatically with the advent of machine learning models, particularly neural network-based solutions. However, some tasks are still challenging, especially when considering specific domains. This paper presents a model that can extract insights from customer reviews using machine learning methods integrated into a pipeline. For topic modeling, our composite model uses transformer-based neural networks designed for natural language processing, vector-embedding-based keyword extraction, and clustering. The elements of our model have been integrated and tailored to better meet the requirements of efficient information extraction and topic modeling of the extracted information for opinion mining. Our approach was validated and compared with other state-of-the-art methods using publicly available benchmark datasets. The results show that our system performs better than existing topic modeling and keyword extraction methods in this task.
Full article
(This article belongs to the Special Issue Artificial Intelligence and Natural Language Processing)
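The keyword-extraction stage of such a pipeline can be reduced to a runnable toy that scores terms within a review cluster; in the paper's pipeline, transformer embeddings and clustering replace the raw counts, and the stop list and reviews below are assumptions:

```python
from collections import Counter

# Toy keyword extraction for one cluster of reviews: count terms,
# filter a stop list, report the most frequent as cluster keywords.
STOP = {"the", "a", "is", "was", "and", "it", "very"}

def keywords(reviews, k=3):
    counts = Counter(w for r in reviews
                     for w in r.lower().split() if w not in STOP)
    return [w for w, _ in counts.most_common(k)]

cluster = ["the battery life is great", "great battery and screen",
           "battery drains fast", "screen is very sharp"]
print(keywords(cluster))  # "battery" dominates this cluster
```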
Open Access Article
A Novel Algorithm for Multi-Criteria Ontology Merging through Iterative Update of RDF Graph
by
Mohammed Suleiman Mohammed Rudwan and Jean Vincent Fonou-Dombeu
Big Data Cogn. Comput. 2024, 8(3), 19; https://doi.org/10.3390/bdcc8030019 - 21 Feb 2024
Abstract
Ontology merging remains an important task in ontology engineering. However, despite the effort devoted to it, incorporating relevant ontology features such as axioms, individuals, and annotations into the output ontologies remains challenging. Consequently, existing ontology-merging solutions produce new ontologies that do not include all the relevant semantic features from the candidate ontologies. To address these limitations, this paper proposes a novel algorithm for multi-criteria ontology merging that automatically builds a new ontology from candidate ontologies by iteratively updating an RDF graph in memory. The proposed algorithm leverages state-of-the-art Natural Language Processing tools as well as a Machine Learning-based framework to assess similarities and merge various criteria into the resulting output ontology. Its key contribution lies in its ability to merge relevant features from the candidate ontologies to build a more accurate, integrated, and cohesive output ontology. The algorithm is tested with five ontologies from different computing domains and evaluated in terms of its asymptotic behavior, quality, and computational performance. The experimental results indicate that it produces output ontologies that meet the integrity, accuracy, and cohesion quality criteria better than related studies, demonstrating its effectiveness and superior capabilities. Furthermore, the algorithm enables iterative in-memory updating and building of the RDF graph of the resulting output ontology, which enhances processing speed and computational efficiency, making it well suited to big data applications.
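The iterative in-memory update idea can be sketched as follows. This is a minimal illustration under our own assumptions (triples as Python tuples, similarity reduced to lexical normalization); the paper's actual similarity assessment is NLP- and ML-based.

```python
# Hedged sketch: iteratively merge RDF-style triple sets into one in-memory
# graph, unifying terms that a (here: trivial) similarity test declares equal.

def normalize(term: str) -> str:
    """Crude lexical normalization standing in for the NLP similarity step."""
    return term.lower().replace("_", " ").strip()

def merge_ontologies(candidates):
    merged = set()            # the output RDF graph, updated iteratively
    seen = {}                 # normalized form -> first (canonical) spelling
    for graph in candidates:  # each graph is an iterable of (s, p, o) triples
        for s, p, o in graph:
            canon = lambda t: seen.setdefault(normalize(t), t)
            merged.add((canon(s), canon(p), canon(o)))
    return merged

g1 = {("Laptop", "subClassOf", "Computer")}
g2 = {("laptop", "subClassOf", "Device")}
out = merge_ontologies([g1, g2])  # "laptop" unifies with "Laptop"
```

In a real implementation the triples would live in an RDF store (e.g. an `rdflib.Graph`) and the `normalize` step would be replaced by the learned similarity model.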
Full article
Open Access Feature Paper Article
Inverse Firefly-Based Search Algorithms for Multi-Target Search Problem
by
Ouarda Zedadra, Antonio Guerrieri, Hamid Seridi, Aymen Benzaid and Giancarlo Fortino
Big Data Cogn. Comput. 2024, 8(2), 18; https://doi.org/10.3390/bdcc8020018 - 19 Feb 2024
Abstract
Efficiently searching for multiple targets in complex environments with limited perception and computational capabilities is challenging for multiple robots, which can coordinate their actions indirectly through their environment. In this context, swarm intelligence has been a source of inspiration for addressing multi-target search problems in the literature. Several algorithms have been proposed to solve this problem, and in this study we propose two novel multi-target search algorithms inspired by the Firefly algorithm. Unlike the conventional Firefly algorithm, where light is an attractor, light has a repulsive effect in our proposed algorithms: upon discovering targets, robots emit light to repel other robots from that region. This repulsive behavior serves several objectives: (1) partitioning the search space among different robots, (2) expanding the search region by avoiding areas already explored, and (3) preventing congestion among robots. The proposed algorithms, named the Global Lawnmower Firefly Algorithm (GLFA) and the Random Bounce Firefly Algorithm (RBFA), integrate inverse light-based behavior with two random walks: random bounce and global lawnmower. Both algorithms were implemented and evaluated using the ARGoS simulator, demonstrating promising performance compared to existing approaches.
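The inverse-attraction idea can be sketched in a few lines. This is our own illustrative model, not the authors' controller: step size, sensing radius, and the fallback random move are all assumptions.

```python
import math
import random

# Illustrative sketch of the inverse Firefly scheme: light emitted at
# discovered targets repels robots instead of attracting them.

def repulsion_step(robot, lights, step=1.0, radius=5.0):
    """Move `robot` one step away from the net pull of nearby lights;
    fall back to a random (bounce/lawnmower-like) move when none is sensed."""
    x, y = robot
    dx = dy = 0.0
    for lx, ly in lights:
        d = math.hypot(x - lx, y - ly)
        if 0 < d < radius:          # only lights within sensing range matter
            dx += (x - lx) / d      # unit vector pointing away from the light
            dy += (y - ly) / d
    norm = math.hypot(dx, dy)
    if norm == 0:                   # no repulsion felt: random exploration
        angle = random.uniform(0, 2 * math.pi)
        return (x + step * math.cos(angle), y + step * math.sin(angle))
    return (x + step * dx / norm, y + step * dy / norm)

# A robot at (1, 0) with a light at the origin is pushed further right.
new_pos = repulsion_step((1.0, 0.0), [(0.0, 0.0)])
```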
Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
Open Access Article
A Model for Enhancing Unstructured Big Data Warehouse Execution Time
by
Marwa Salah Farhan, Amira Youssef and Laila Abdelhamid
Big Data Cogn. Comput. 2024, 8(2), 17; https://doi.org/10.3390/bdcc8020017 - 6 Feb 2024
Abstract
Traditional data warehouses (DWs) have played a key role in business intelligence and decision support systems. However, the rapid growth of the data generated by current applications requires new data warehousing systems. In big data settings, it is important to adapt existing warehouse systems to overcome new issues and limitations. The main drawbacks of traditional Extract–Transform–Load (ETL) are that huge volumes of data cannot be processed over ETL and that execution time is very high when the data are unstructured. This paper presents a new four-layer model, Extract–Clean–Load–Transform (ECLT), designed for processing unstructured big data, with specific emphasis on text, and aimed at reducing execution time. ECLT is implemented and tested using Apache Spark via its Python API. Finally, the paper compares the execution time of ECLT with that of other models on two datasets. Experimental results showed that for a data size of 1 TB, the execution time of ECLT is 41.8 s; when the data size increases to 1 million articles, the execution time is 119.6 s. These findings demonstrate that ECLT outperforms ETL, ELT, DELT, ELTL, and ELTA in terms of execution time.
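The ECLT ordering, with cleaning moved before loading and the costly transform deferred until the data are already in the warehouse, can be sketched as follows. The function bodies are illustrative placeholders of our own, not the paper's Spark jobs.

```python
import re

# Hedged sketch of the Extract-Clean-Load-Transform (ECLT) ordering for
# unstructured text. Each stage is a stand-in for a distributed Spark job.

def extract(sources):
    """Gather raw documents from all sources."""
    return [doc for src in sources for doc in src]

def clean(docs):
    """Drop empty documents and normalize whitespace/case before loading."""
    return [re.sub(r"\s+", " ", d).strip().lower() for d in docs if d.strip()]

def load(docs, warehouse):
    """Stand-in for a warehouse write: append cleaned documents."""
    warehouse.extend(docs)
    return warehouse

def transform(warehouse):
    """Run the expensive transformation last, over already-loaded data."""
    return [{"text": d, "tokens": d.split()} for d in warehouse]

warehouse = []
docs = extract([["  Big   Data ", ""], ["Cognitive Computing"]])
records = transform(load(clean(docs), warehouse))
```

Contrast with classic ETL, where the transform sits between extraction and loading and therefore blocks the load on the slowest stage.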
Full article
(This article belongs to the Special Issue Big Data and Information Science Technology)
Open Access Article
Fair-CMNB: Advancing Fairness-Aware Stream Learning with Naïve Bayes and Multi-Objective Optimization
by
Maryam Badar and Marco Fisichella
Big Data Cogn. Comput. 2024, 8(2), 16; https://doi.org/10.3390/bdcc8020016 - 31 Jan 2024
Abstract
Fairness-aware mining of data streams is a challenging concern in the contemporary domain of machine learning. Many stream learning algorithms are used to replace humans in critical decision-making processes, e.g., hiring staff, assessing credit risk, etc. This calls for handling massive amounts of incoming information with minimal response delay while ensuring fair and high-quality decisions. Although deep learning has achieved success in various domains, its computational complexity may hinder real-time processing, making traditional algorithms more suitable. In this context, we propose a novel adaptation of Naïve Bayes to mitigate discrimination embedded in the streams while maintaining high predictive performance through multi-objective optimization (MOO). Class imbalance is an inherent problem in discrimination-aware learning paradigms. To deal with class imbalance, we propose a dynamic instance weighting module that gives more importance to new instances and less importance to obsolete instances based on their membership in a minority or majority class. We have conducted experiments on a range of streaming and static datasets and concluded that our proposed methodology outperforms existing state-of-the-art (SoTA) fairness-aware methods in terms of both discrimination score and balanced accuracy.
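The dynamic instance-weighting idea can be made concrete with a small sketch. The decay rate and minority boost below are our own assumed parameters, not values from the paper.

```python
import math

# Illustrative sketch of dynamic instance weighting for stream learning:
# newer instances weigh more (exponential recency decay) and minority-class
# instances get a boost so the skewed class is not drowned out.

def instance_weight(age, is_minority, decay=0.1, minority_boost=2.0):
    """age: instances seen since this one arrived (0 = newest)."""
    w = math.exp(-decay * age)       # recency: obsolete instances fade away
    if is_minority:
        w *= minority_boost          # rebalance the class distribution
    return w

w_new_minority = instance_weight(0, True)    # fresh minority-class instance
w_old_majority = instance_weight(30, False)  # stale majority-class instance
```

In a Naïve Bayes learner these weights would scale each instance's contribution to the class-conditional counts.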
Full article
(This article belongs to the Special Issue Big Data and Cognitive Computing in 2023)
Open Access Article
A Simultaneous Wireless Information and Power Transfer-Based Multi-Hop Uneven Clustering Routing Protocol for EH-Cognitive Radio Sensor Networks
by
Jihong Wang, Zhuo Wang and Lidong Zhang
Big Data Cogn. Comput. 2024, 8(2), 15; https://doi.org/10.3390/bdcc8020015 - 31 Jan 2024
Abstract
Clustering protocols and simultaneous wireless information and power transfer (SWIPT) technology can solve the issue of imbalanced energy consumption among nodes in energy harvesting-cognitive radio sensor networks (EH-CRSNs). However, dynamic energy changes caused by EH/SWIPT and dynamic spectrum availability prevent existing clustering routing protocols from fully leveraging the advantages of EH and SWIPT. Therefore, this paper proposes a multi-hop uneven clustering routing protocol for EH-CRSNs utilizing SWIPT technology. Specifically, an EH-based energy state function is proposed to accurately track the dynamic energy variations in nodes. Utilizing this function, dynamic spectrum availability, neighbor count, and other information are integrated to design the criteria for selecting high-quality cluster heads (CHs) and relays, thereby facilitating effective data transfer to the sink. Intra-cluster and inter-cluster SWIPT mechanisms are incorporated to allow immediate energy replenishment of CHs or relays with insufficient energy while they transmit data, preventing data transmission failures due to energy depletion. An energy status control mechanism is introduced to avoid the energy waste caused by excessive activation of the SWIPT mechanism. Simulation results indicate that the proposed protocol markedly improves the balance of energy consumption among nodes and enhances network surveillance capabilities compared to existing clustering routing protocols.
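The energy-state bookkeeping and the activation floor for SWIPT can be sketched as follows. All thresholds and the battery capacity are assumptions of ours; the paper's energy state function is richer than this.

```python
# Illustrative energy-state sketch for an EH node: residual energy rises
# with harvesting/SWIPT and falls with sensing and transmission, and SWIPT
# top-ups trigger only below a floor to avoid wasteful over-activation.

def update_energy(residual, harvested, consumed, capacity=100.0):
    """Clamp the node's residual energy to [0, capacity] after one round."""
    return min(capacity, max(0.0, residual + harvested - consumed))

def needs_swipt(residual, floor=20.0):
    """Energy status control: request SWIPT only when genuinely low."""
    return residual < floor

e = update_energy(50.0, 10.0, 15.0)  # one round: +10 harvested, -15 spent
low = needs_swipt(e)                 # above the floor, so no SWIPT request
```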
Full article
Open Access Article
Mixture of Attention Variants for Modal Fusion in Multi-Modal Sentiment Analysis
by
Chao He, Xinghua Zhang, Dongqing Song, Yingshan Shen, Chengjie Mao, Huosheng Wen, Dingju Zhu and Lihua Cai
Big Data Cogn. Comput. 2024, 8(2), 14; https://doi.org/10.3390/bdcc8020014 - 29 Jan 2024
Abstract
With the popularization of better network access and the penetration of personal smartphones in today’s world, the explosion of multi-modal data, particularly opinionated video messages, has created urgent demands and immense opportunities for Multi-Modal Sentiment Analysis (MSA). Deep learning with the attention mechanism has served as the foundation technique for most state-of-the-art MSA models due to its ability to learn complex inter- and intra-relationships among different modalities embedded in video messages, both temporally and spatially. However, modal fusion is still a major challenge due to the vast feature space created by the interactions among different data modalities. To address the modal fusion challenge, we propose an MSA algorithm based on deep learning and the attention mechanism, namely the Mixture of Attention Variants for Modal Fusion (MAVMF). The MAVMF algorithm is a two-stage process: in stage one, self-attention is applied to effectively extract image and text features, and the dependency relationships in the context of video discourse are captured by a bidirectional gated recurrent neural module; in stage two, four multi-modal attention variants are leveraged to learn the emotional contributions of important features from different modalities. Our proposed approach is end-to-end and achieves superior performance to state-of-the-art algorithms when tested on the two largest public datasets, CMU-MOSI and CMU-MOSEI.
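For readers unfamiliar with the building block behind such attention variants, here is generic scaled dot-product attention in plain Python; it is only the textbook primitive, shown to make "attention" concrete, and not the MAVMF model itself.

```python
import math

# Generic scaled dot-product attention: each query produces a softmax
# distribution over keys, and the output is the weighted sum of values.

def attention(Q, K, V):
    d = len(K[0])                    # key dimension for the 1/sqrt(d) scaling
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in K]
        m = max(scores)              # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# One query attending over two key/value pairs: the output leans toward
# the value whose key is more similar to the query.
ctx = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
```

Self-attention is the special case where Q, K, and V are all projections of the same feature sequence.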
Full article
(This article belongs to the Special Issue Research Progress in Artificial Intelligence and Social Network Analysis)
Topics
Topic in
BDCC, Economies, Information, Remote Sensing, Sustainability
Big Data and Artificial Intelligence, 2nd Volume
Topic Editors: Miltiadis D. Lytras, Andreea Claudia Serban
Deadline: 31 March 2024
Topic in
AI, Algorithms, BDCC, Future Internet, Informatics, Information, Languages, Publications
AI Chatbots: Threat or Opportunity?
Topic Editors: Antony Bryant, Roberto Montemanni, Min Chen, Paolo Bellavista, Kenji Suzuki, Jeanine Treffers-Daller
Deadline: 30 April 2024
Topic in
Algorithms, BDCC, BioMedInformatics, Information, Mathematics
Machine Learning Empowered Drug Screen
Topic Editors: Teng Zhou, Jiaqi Wang, Youyi Song
Deadline: 31 August 2024
Topic in
BDCC, Entropy, Information, MCA, Mathematics
New Advances in Granular Computing and Data Mining
Topic Editors: Xibei Yang, Bin Xie, Pingxin Wang, Hengrong Ju
Deadline: 30 October 2024
Special Issues
Special Issue in
BDCC
Privacy-Enhancing Technologies of Data for Sustainable and Secure Cooperation
Guest Editors: Yi Sun, Shujie Yang
Deadline: 30 March 2024
Special Issue in
BDCC
Smarter Healthcare via Big Data and Machine Learning
Guest Editors: Maryam S. Mirian, Abdol-Hossein Vahabie, Reyhaneh Bakhtiari
Deadline: 31 March 2024
Special Issue in
BDCC
Machine Learning for Dependable Edge Computing Systems and Services
Guest Editors: Renyu Yang, Zhenyu Wen, Xu Wang, Prosanta Gope, Bin Shi
Deadline: 30 April 2024
Special Issue in
BDCC
Multimedia Systems for Multimedia Big Data
Guest Editors: Michael Alexander Riegler, Pål Halvorsen
Deadline: 31 May 2024
Topical Collections
Topical Collection in
BDCC
Machine Learning and Artificial Intelligence for Health Applications on Social Networks
Collection Editor: Carmela Comito