Big Data Cogn. Comput., Volume 10, Issue 2 (February 2026) – 28 articles

Cover Story (view full-size image): News data powers research in economics, social science, and NLP, yet full-text corpora are often expensive or hard to access. We introduce gdeltnews (https://github.com/iandreafc/gdeltnews), an open-source Python package that reconstructs near-complete online news articles from the GDELT Web News NGrams 3.0 dataset by assembling overlapping fragments with positional constraints. Validated on 2211 URL-matched articles from major U.S. outlets, the reconstructions achieve up to ~95% similarity. The tool enables scalable, reproducible, near-zero-cost access to global news text for custom analysis. View this paper
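The assembly idea behind gdeltnews can be illustrated with a toy sketch (not the package's actual implementation): given word n-gram fragments tagged with approximate positions, sort them and stitch overlapping word sequences back together. The `reconstruct` helper and the sample fragments below are hypothetical.

```python
def reconstruct(fragments):
    """Stitch positionally ordered, overlapping word n-grams into one text.

    fragments: list of (position, ngram_words) pairs. Toy sketch only;
    the real gdeltnews pipeline must also handle noise and duplicates.
    """
    words = []
    for _, ngram in sorted(fragments):
        # Find the largest suffix of `words` matching a prefix of `ngram`.
        overlap = 0
        for k in range(min(len(words), len(ngram)), 0, -1):
            if words[-k:] == ngram[:k]:
                overlap = k
                break
        words.extend(ngram[overlap:])
    return " ".join(words)


frags = [
    (0, ["the", "quick", "brown"]),
    (1, ["quick", "brown", "fox"]),
    (2, ["brown", "fox", "jumps"]),
]
print(reconstruct(frags))  # the quick brown fox jumps
```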
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive table of contents of newly released issues.
  • Papers are published in both HTML and PDF forms; PDF is the official format. To view a paper in PDF, click the "PDF Full-text" link and open it with the free Adobe Reader.
23 pages, 1201 KB  
Article
Comparative Read Performance Analysis of PostgreSQL and MongoDB in E-Commerce: An Empirical Study of Filtering and Analytical Queries
by Jovita Urnikienė, Vaida Steponavičienė and Svetoslav Atanasov
Big Data Cogn. Comput. 2026, 10(2), 66; https://doi.org/10.3390/bdcc10020066 - 19 Feb 2026
Viewed by 996
Abstract
This paper presents a comparative analysis of read performance for PostgreSQL and MongoDB in e-commerce scenarios, using identical datasets in a resource-constrained single-host environment. The results demonstrate that PostgreSQL executes complex analytical queries 1.6–15.1 times faster, depending on the query type and data volume. The study employed synthetic data generation with the Faker library across three stages, processing up to 300,000 products and executing each of 6 query types 15 times. Both filtering and analytical queries were tested on non-indexed data in a controlled localhost environment with PostgreSQL 17.5 and MongoDB 7.0.14, using default configurations. PostgreSQL showed 65–80% shorter execution times for multi-criteria queries, while MongoDB required approximately 33% less disk space. These findings suggest that normalized relational schemas are advantageous for transactional e-commerce systems where analytical queries dominate the workload. The results are directly applicable to small and medium e-commerce developers operating in budget-constrained, single-host deployment environments when choosing between relational and document-oriented databases for structured transactional data with read-heavy analytical workloads. A minimal indexed validation confirms that the baseline trends remain consistent under a simple indexing configuration. Future work will examine broader indexing strategies, write-intensive workloads, and distributed deployment scenarios. Full article
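The study's benchmarking protocol (each query executed 15 times, latency aggregated per run) can be sketched with the standard-library sqlite3 standing in for PostgreSQL/MongoDB; the table schema, query text, and row count below are illustrative, not the paper's.

```python
import sqlite3
import statistics
import time

# In-memory stand-in for the benchmarked databases (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(i, f"cat{i % 10}", i * 0.5) for i in range(10_000)],
)

def bench(sql, runs=15):
    """Run a query `runs` times and report the median wall-clock latency."""
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        conn.execute(sql).fetchall()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

# A filtering query vs. an analytical (aggregation) query, 15 runs each.
filt = bench("SELECT * FROM products WHERE category = 'cat3' AND price > 100")
anal = bench("SELECT category, AVG(price) FROM products GROUP BY category")
print(f"filter median {filt:.4f}s, analytical median {anal:.4f}s")
```

Using the median rather than the mean limits the influence of warm-up and cache effects across the repeated runs.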

74 pages, 45992 KB  
Perspective
Integration of Lean Analytics and Industry 6.0: A Novel Meta-Theoretical Framework for Antifragile, Generative AI-Orchestrated, Circular–Regenerative, and Hyper-Connected Manufacturing Ecosystems
by Mohammad Shahin, Mazdak Maghanaki and F. Frank Chen
Big Data Cogn. Comput. 2026, 10(2), 65; https://doi.org/10.3390/bdcc10020065 - 17 Feb 2026
Cited by 3 | Viewed by 697
Abstract
The convergence of Lean manufacturing principles with Industry 4.0 has yielded significant operational improvements, yet the emerging paradigm of Industry 6.0—characterized by antifragile, autonomous, and sustainable systems—demands a fundamental rethinking of existing analytical frameworks. This paper introduces the Industry 6.0 Lean Analytics (I6LA) Framework, a novel meta-theoretical approach that integrates Lean principles with the core concepts of Industry 6.0. By systematically analyzing the limitations of current Lean analytics in the context of Industry 6.0 requirements, we identify critical gaps in areas such as system resilience, AI-driven autonomy, and circular economy integration. The I6LA Framework addresses these gaps through four new theoretical pillars: Antifragile Lean Systems Theory, generative AI-Orchestrated Value Streams, Circular–Regenerative Analytics, and Hyper-Connected Ecosystem Integration. This research provides a new set of mathematical models for measuring antifragility, generative orchestration efficiency, and circularity, offering a comprehensive analytical toolkit for the next generation of manufacturing. The framework’s primary contribution is a paradigm shift from optimizing stable, human-in-the-loop systems to managing dynamic, autonomous ecosystems that thrive on volatility and are regenerative by design. This paper provides both a robust theoretical foundation and practical implementation guidance for organizations navigating the transition to Industry 6.0. Full article
(This article belongs to the Section Cognitive System)

38 pages, 2848 KB  
Article
Efficient Time Series Visual Exploration for Insight Discovery
by Heba Helal and Mohamed A. Sharaf
Big Data Cogn. Comput. 2026, 10(2), 64; https://doi.org/10.3390/bdcc10020064 - 16 Feb 2026
Viewed by 366
Abstract
Visual exploration of time series data is essential for uncovering meaningful insights in domains such as healthcare monitoring and financial analysis, yet it remains computationally challenging due to the combinatorial explosion of potential subsequence comparisons. For long time series, an exhaustive comparison of all possible subsequence pairs becomes prohibitively expensive, limiting interactive exploration. This paper presents the TiVEx (Time Series Visual Exploration) family of algorithms for efficiently discovering the top-k most dissimilar subsequence pairs in comparative time series analysis. TiVEx achieves scalability through three complementary strategies: TiVEx-sharing exploits computational reuse across overlapping subsequence windows, eliminating redundant distance calculations; TiVEx-pruning employs distance-based upper bounds to eliminate unpromising candidates without exhaustive evaluation; and TiVEx-hybrid integrates both mechanisms to maximize efficiency gains. The key observation is that overlapping subsequences share a substantial computational structure, which can be systematically exploited while maintaining result optimality through provably correct pruning bounds. Extensive experiments on six diverse datasets demonstrate that TiVEx-hybrid achieves up to 84% reduction in distance calculations compared to exhaustive search while producing identical top-k results. Compared to state-of-the-art subsequence comparison methods, TiVEx-hybrid achieves 2.3× improvement in computational efficiency. Our effectiveness analysis confirms that TiVEx achieves result quality within 5% of exhaustive search even when exploring only a subset of candidate positions, enabling scalable visual exploration without compromising insight quality. Full article
(This article belongs to the Special Issue Application of Pattern Recognition and Machine Learning)
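The underlying task can be sketched as the exhaustive baseline that TiVEx improves upon: score every pair of non-overlapping subsequence windows and keep the k most dissimilar. The window length, Euclidean distance, and sample series below are illustrative; TiVEx's sharing and pruning strategies avoid most of these computations while provably returning the same result.

```python
import heapq
import math

def topk_dissimilar(series, w, k):
    """Exhaustively score all non-overlapping length-w subsequence pairs by
    Euclidean distance and keep the k most dissimilar via a size-k min-heap.
    This is the O(n^2) baseline; TiVEx-sharing/pruning cut the work."""
    heap = []  # (distance, i, j); the smallest retained distance sits on top
    n = len(series) - w + 1
    for i in range(n):
        for j in range(i + w, n):  # skip overlapping windows
            d = math.dist(series[i:i + w], series[j:j + w])
            if len(heap) < k:
                heapq.heappush(heap, (d, i, j))
            elif d > heap[0][0]:
                heapq.heapreplace(heap, (d, i, j))
    return sorted(heap, reverse=True)

series = [0, 0, 0, 5, 9, 5, 0, 0, 0]
print(topk_dissimilar(series, 3, 2))
```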

23 pages, 291 KB  
Review
Cognitive Assemblages: Living with Algorithms
by Stéphane Grumbach
Big Data Cogn. Comput. 2026, 10(2), 63; https://doi.org/10.3390/bdcc10020063 - 16 Feb 2026
Cited by 1 | Viewed by 734
Abstract
The rapid expansion of algorithmic systems has transformed cognition into an increasingly distributed and collective enterprise, giving rise to what can be described as cognitive assemblages, dynamic constellations of humans, institutions, data infrastructures, and artificial agents. This paper traces the historical and conceptual evolution that has led to this shift. First, we show how cognition, once conceived as the property of autonomous individuals, has progressively become embedded in socio-technical networks in which algorithmic processes participate as co-agents. Second, we revisit the progressive awareness of human cognitive limits, from bounded rationality to contemporary theories of extended mind. These frameworks anticipate and help explain today’s hybrid cognitive ecologies. Third, we assess the philosophical implications for Enlightenment ideals of autonomy, rationality, and self-governance, showing how these concepts must be reinterpreted in light of pervasive algorithmic intermediation. Finally, we examine global initiatives that seek to integrate augmented cognitive capacities into large-scale cybernetic forms of societal coordination, ranging from digital platforms and data spaces to AI-driven governance systems. These developments offer new opportunities for steering complex societies under conditions of globalization, environmental disruption, and the rise of autonomous intelligent systems, yet they also raise profound questions regarding control, accountability, and democratic legitimacy. We argue that understanding cognitive assemblages is essential to designing socio-technical systems capable of supporting collective intelligence while preserving human values in an era of accelerating complexity. Full article

17 pages, 1902 KB  
Article
Skill Classification of Youth Table Tennis Players Using Sensor Fusion and the Random Forest Algorithm
by Yung-Hoh Sheu, Cheng-Yu Huang, Li-Wei Tai, Tzu-Hsuan Tai and Sheng K. Wu
Big Data Cogn. Comput. 2026, 10(2), 62; https://doi.org/10.3390/bdcc10020062 - 15 Feb 2026
Viewed by 543
Abstract
This study addresses the issue of inaccurate results in traditional table tennis player classification, which is often influenced by subjective judgment and environmental factors, by proposing a youth table tennis player classification system based on sensor fusion and the random forest algorithm. The system utilizes an embedded intelligent table tennis racket equipped with an ICM20948 nine-axis sensor and a wireless transmission module to capture real-time acceleration and angular velocity data during players’ strokes while synchronously employing a camera with OpenPose to extract joint angle variations. A total of 40 players’ stroke data were collected. Due to the limited sample size of top-tier players, the Synthetic Minority Over-sampling Technique (SMOTE) was applied, resulting in a final dataset of 360 records. Multiple key motion indicators were then computed and stored in a dedicated database. Experimental results showed that the proposed system, powered by the random forest algorithm, achieved a classification accuracy of 91.3% under conventional cross-validation, while subject-independent LOSO validation yielded a more conservative accuracy of 70.89%, making it a valuable reference for coaches and referees in conducting objective player classification. Future work will focus on expanding the dataset of domestic high-performance athletes and integrating precise sports science resources to further enhance the system’s performance and algorithmic models, thereby promoting the scientific selection of national team players and advancing the intelligent development of table tennis. Full article
(This article belongs to the Section Artificial Intelligence and Multi-Agent Systems)
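SMOTE's core step, interpolating between a minority-class sample and one of its nearest minority neighbors, can be sketched in plain Python (a toy version for illustration; the study presumably used a library implementation with its own parameters):

```python
import math
import random

def smote_sample(minority, k=3, rng=random.Random(0)):
    """Generate one synthetic minority point: pick a real point, pick one
    of its k nearest minority neighbors, and interpolate between them."""
    x = rng.choice(minority)
    neighbors = sorted(
        (p for p in minority if p is not x),
        key=lambda p: math.dist(x, p),
    )[:k]
    nb = rng.choice(neighbors)
    lam = rng.random()  # interpolation factor in [0, 1)
    return tuple(a + lam * (b - a) for a, b in zip(x, nb))

# Hypothetical 2-D feature vectors for the under-represented class:
minority = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (5.0, 5.0)]
print(smote_sample(minority))
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the convex hull of the original samples.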

25 pages, 4445 KB  
Article
Underwater Visual-Servo Alignment Control Integrating Geometric Cognition Compensation and Confidence Assessment
by Jinkun Li, Lingyu Sun, Minglu Zhang and Xinbao Li
Big Data Cogn. Comput. 2026, 10(2), 61; https://doi.org/10.3390/bdcc10020061 - 14 Feb 2026
Viewed by 428
Abstract
To meet the requirements for the automatic alignment, insertion, and inspection of guide-tube opening pins on the upper core plate in a component pool during refueling outages of nuclear power units, this paper proposes a cognition-enhanced visual-servoing framework that integrates geometric cognition-based compensation, observation-confidence modeling, and constraint-aware optimal control. The framework addresses the key challenge posed by the coexistence of long-term geometric drift and underwater observation uncertainty. Specifically, historical closed-loop data are leveraged to learn and compensate for systematic geometric errors online, substantially improving coarse-positioning accuracy. In addition, an explicit confidence model is introduced to quantitatively assess the reliability of visual measurements. Building on these components, a confidence-driven, finite-horizon, constrained model predictive control strategy is designed to achieve safe and efficient finite-step convergence while strictly respecting actuator physical constraints. Ground experiments and deep-water component-pool validations demonstrate that the proposed method reduces coarse-positioning error by approximately 75%, achieves stable sub-millimeter alignment with an ample engineering safety margin, and effectively decreases erroneous insertions and the need for manual intervention. These results confirm the engineering applicability and safety advantages of the proposed cognition-enhanced visual-servoing framework for underwater alignment tasks in nuclear component pools. Full article
(This article belongs to the Special Issue Field Robotics and Artificial Intelligence (AI))

17 pages, 2120 KB  
Article
Reliability of LLM Inference Engines from a Static Perspective: Root Cause Analysis and Repair Suggestion via Natural Language Reports
by Hongwei Li and Yongjun Wang
Big Data Cogn. Comput. 2026, 10(2), 60; https://doi.org/10.3390/bdcc10020060 - 13 Feb 2026
Viewed by 554
Abstract
Large Language Model (LLM) inference engines are becoming critical system infrastructure, yet their increasing architectural complexity makes defects difficult to diagnose and repair. Existing reliability studies predominantly focus on model behavior or training frameworks, leaving inference engine bugs underexplored, especially in settings where execution-based debugging is impractical. We present a static, issue-centric approach for automated root cause analysis and repair suggestion generation for LLM inference engines. Based solely on issue reports and developer discussions, we construct a real-world defect dataset and annotate each issue with a semantic root cause category and affected system module. Leveraging text-based representations, our framework performs root cause classification and coarse-grained module localization without requiring code execution or specialized runtime environments. We further integrate structured repair patterns with a large language model to generate interpretable and actionable repair suggestions. Experiments on real-world vLLM issues demonstrate that our approach achieves effective root cause identification and module localization under limited and imbalanced data. A cross-engine evaluation further shows promising generalization to TensorRT-LLM. Human evaluation confirms that the generated repair suggestions are correct, useful, and clearly expressed. Our results indicate that static, issue-level analysis is a viable foundation for scalable debugging assistance in LLM inference engines. This work highlights the feasibility of static, issue-level defect analysis for complex LLM inference engines and explores automated debugging assistance techniques. The dataset and implementation will be publicly released to facilitate future research. Full article

14 pages, 725 KB  
Article
PLTA-FinBERT: Pseudo-Label Generation-Based Test-Time Adaptation for Financial Sentiment Analysis
by Hai Yang, Hainan Chen, Chang Jiang, Juntao He and Pengyang Li
Big Data Cogn. Comput. 2026, 10(2), 59; https://doi.org/10.3390/bdcc10020059 - 11 Feb 2026
Viewed by 689
Abstract
Financial sentiment analysis leverages natural language processing techniques to quantitatively assess sentiment polarity and emotional tendencies in financial texts. Its practical application in investment decision-making and risk management faces two major challenges: the scarcity of high-quality labeled data due to expert annotation costs, and semantic drift caused by the continuous evolution of market language. To address these issues, this study proposes PLTA-FinBERT, a pseudo-label generation-based test-time adaptation framework that enables dynamic self-learning without requiring additional labeled data. The framework consists of two modules: a multi-perturbation pseudo-label generation mechanism that enhances label reliability through consistency voting and confidence-based filtering, and a test-time dynamic adaptation strategy that iteratively updates model parameters based on high-confidence pseudo-labels, allowing the model to continuously adapt to new linguistic patterns. PLTA-FinBERT achieves 0.8288 accuracy on the sentiment classification dataset of financial sentiment analysis, representing an absolute improvement of 2.37 percentage points over the benchmark. On the FiQA sentiment intensity prediction task, it obtains an R² of 0.58, surpassing the previous state-of-the-art by 3 percentage points. Full article
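The consistency-voting and confidence-filtering step can be sketched generically: predictions across several perturbed views of the same input yield a pseudo-label only when they agree strongly enough. The agreement threshold and label names below are illustrative, not the paper's values.

```python
from collections import Counter

def pseudo_label(perturbed_preds, min_agreement=0.8):
    """Keep a pseudo-label only if predictions across perturbed views of
    the same input agree strongly enough; otherwise discard the sample
    so that only high-confidence labels drive test-time adaptation."""
    label, votes = Counter(perturbed_preds).most_common(1)[0]
    agreement = votes / len(perturbed_preds)
    return label if agreement >= min_agreement else None

# Five perturbed views of one financial headline, hypothetical outputs:
print(pseudo_label(["pos", "pos", "pos", "pos", "neg"]))  # pos (0.8 agreement)
print(pseudo_label(["pos", "neg", "neu", "pos", "neg"]))  # None (discarded)
```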

22 pages, 1378 KB  
Article
Bias Correction and Explainability Framework for Large Language Models: A Knowledge-Driven Approach
by Xianming Yang, Qi Li, Chengdong Qian, Haitao Wang, Yonghui Wu and Wei Wang
Big Data Cogn. Comput. 2026, 10(2), 58; https://doi.org/10.3390/bdcc10020058 - 10 Feb 2026
Viewed by 676
Abstract
Large Language Models (LLMs) have demonstrated extraordinary capabilities in natural language generation; however, their real-world deployment is frequently hindered by the generation of factually incorrect or biased content, along with an inherent deficiency in transparency. To address these critical limitations and thereby enhance the reliability and explainability of LLM outputs, this study proposes a novel integrated framework, namely the Adaptive Knowledge-Driven Correction Network (AKDC-Net), which incorporates three core algorithmic innovations. Firstly, the Hierarchical Uncertainty-Aware Bias Detector (HUABD) performs multi-level linguistic analysis (lexical, syntactic, semantic, and pragmatic) and, for the first time, decomposes predictive uncertainty into epistemic and aleatoric components. This decomposition enables principled, interpretable bias detection with clear theoretical underpinnings. Secondly, the Neural-Symbolic Knowledge Graph Enhanced Corrector (NSKGEC) integrates a temporal graph neural network with a differentiable symbolic reasoning module, facilitating logically consistent and factually grounded corrections based on dynamically updated knowledge sources. Thirdly, the Contrastive Learning-driven Multimodal Explanation Generator (CLMEG) leverages a cross-modal attention mechanism within a contrastive learning paradigm to generate coherent, high-quality textual and visual explanations that enhance the interpretability of LLM outputs. Extensive evaluations were conducted on a challenging medical domain dataset to validate the effectiveness of the proposed AKDC-Net framework. Experimental results demonstrate significant improvements over state-of-the-art baselines: specifically, a 14.1% increase in the F1-score for bias detection, a 19.4% enhancement in correction quality, and a 31.4% rise in user trust scores. These findings establish a new benchmark for the development of more trustworthy and transparent artificial intelligence (AI) systems, laying a solid foundation for the broader and more reliable application of LLMs in high-stakes domains. Full article
(This article belongs to the Special Issue Enhancement Optimization Techniques on Large Language Model)
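One standard way to split predictive uncertainty into epistemic and aleatoric parts (the abstract does not state that HUABD uses exactly this construction) is the ensemble-based decomposition: total entropy of the mean distribution equals the mean member entropy (aleatoric) plus the mutual information between prediction and ensemble member (epistemic). A minimal sketch:

```python
import math

def entropy(p):
    """Shannon entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def decompose(member_dists):
    """Ensemble decomposition: aleatoric = mean member entropy,
    epistemic = entropy of the mean distribution minus aleatoric."""
    n, c = len(member_dists), len(member_dists[0])
    mean = [sum(d[j] for d in member_dists) / n for j in range(c)]
    aleatoric = sum(entropy(d) for d in member_dists) / n
    epistemic = entropy(mean) - aleatoric
    return epistemic, aleatoric

# Members agree -> near-zero epistemic; members disagree -> high epistemic.
agree = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
disagree = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]
print(decompose(agree))
print(decompose(disagree))
```

Intuitively, disagreement among ensemble members signals reducible (epistemic) uncertainty, while confident-but-flat individual predictions signal irreducible (aleatoric) noise in the data.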

24 pages, 2712 KB  
Article
Enhancing the Artificial Rabbit Optimizer Using Fuzzy Rule Interpolation
by Mohammad Almseidin
Big Data Cogn. Comput. 2026, 10(2), 57; https://doi.org/10.3390/bdcc10020057 - 10 Feb 2026
Viewed by 357
Abstract
Metaheuristic optimization algorithms have demonstrated their effectiveness in solving complex optimization tasks, such as those related to Intrusion Detection Systems (IDSs). They are widely used to enhance the detection rate of various types of cyber attacks by reducing the feature space or tuning a model's hyperparameters. The Artificial Rabbit Optimizer (ARO) mimics rabbits' intelligent foraging and hiding behavior and has seen widespread adoption in the optimization field, owing to its simple design and ease of implementation. However, ARO can become trapped in local optima due to limited diversity in its population dynamics. Although the transition between phases is managed via an energy shrink factor, fine-tuning this balance remains challenging and underexplored. These limitations can reduce the ARO algorithm's effectiveness in high-dimensional spaces such as IDS applications. This paper introduces a novel enhancement of the original ARO that integrates Fuzzy Rule Interpolation (FRI) to dynamically compute the energy factor during optimization. We integrate FRI with the ARO algorithm to improve solution accuracy, maintain population diversity, and accelerate convergence, particularly for high-dimensional and complex problems such as IDS. The integration of FRI and ARO aims to control the exploration-exploitation balance in the IDS application area. To validate the proposed hybrid approach, we tested it on a diverse set of eight benchmark intrusion detection datasets. The hybrid approach proved effective across various intrusion classification tasks: for binary classification it achieved accuracy rates from 96% to 99.9%, while for multiclass classification accuracy was slightly more consistent, falling between 91.6% and 98.9%. The approach also effectively reduced the feature space, achieving reduction rates of 56% to 96%. Furthermore, it outperformed other state-of-the-art methods in terms of detection rate. Full article
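In the original ARO, the energy shrink factor that governs the exploration-exploitation switch is commonly written as A(t) = 4 (1 - t/T) ln(1/r) with r drawn uniformly from (0, 1]; this is the fixed schedule that the FRI-based variant would replace with interpolated fuzzy rules. A minimal sketch of the baseline schedule (the formula is quoted from the ARO literature, not from this paper):

```python
import math
import random

def aro_energy(t, T, rng=random.Random(42)):
    """Baseline ARO energy factor A(t) = 4 * (1 - t/T) * ln(1/r).
    A > 1 -> exploration (detour foraging); A <= 1 -> exploitation
    (random hiding). This fixed schedule is what FRI would replace."""
    r = 1 - rng.random()  # uniform in (0, 1], so ln(1/r) is finite
    return 4 * (1 - t / T) * math.log(1 / r)

T = 100
for t in (1, 50, 99):
    a = aro_energy(t, T)
    phase = "explore" if a > 1 else "exploit"
    print(f"t={t:3d}  A={a:6.3f}  {phase}")
```

The shrinking envelope 4(1 - t/T) biases early iterations toward exploration and late iterations toward exploitation, which is exactly the balance the paper argues is hard to fine-tune with a fixed schedule.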

20 pages, 2437 KB  
Article
ISFJ-RAG: Interventional Suppression of Hallucinations via Counter-Factual Joint Decoding Retrieval-Augment Generation
by Yuezhao Liu, Wei Li, Yijie Wang, Ningtong Chen and Min Chen
Big Data Cogn. Comput. 2026, 10(2), 56; https://doi.org/10.3390/bdcc10020056 - 9 Feb 2026
Viewed by 616
Abstract
Although retrieval-augmented generation (RAG) technology mitigates the hallucination issue in large language models (LLMs) by incorporating external knowledge, and combining reasoning models can further enhance RAG system performance, retrieval noise and attention bias still lead to the diffusion of factual errors in problems such as factual queries, multi-hop questions, and unanswerable questions. Existing methods struggle to effectively suppress “high-confidence hallucinations” in long-chain reasoning due to their failure to decouple knowledge bias effects from causal reasoning paths. To address this, this paper proposes the ISFJ-RAG framework, which dynamically intervenes in hallucinations through counterfactual joint decoding. First, a structural causal model (SCM) reveals three root causes of hallucinations in RAG systems: irrelevant knowledge interference, reasoning path bias, and spurious correlations in self-attention mechanisms. A dual-decoder architecture is further designed: the total causal effect decoder models the global relationship between user queries and knowledge, while the knowledge bias effect decoder captures potential biases induced by external knowledge. A dynamic modulation module converts the latter’s output into a proxy measure of hallucination bias. By computing individual treatment effects (ITEs), the bias component is removed from the full generation distribution, achieving simultaneous suppression of knowledge-irrelevant and reasoning-irrelevant hallucinations. Ablation experiments validate the robustness of average token log-probability as a confidence metric. Experiments demonstrate that on the RAGEval benchmark, ISFJ-RAG improves generation completeness to 86.89% (+5.49%) while reducing hallucination rates to 10.39% (−2.5%) and irrelevance rates to 4.44% (−2.99%). Full article
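The debiasing step, removing the knowledge-bias decoder's contribution from the full generation distribution, can be sketched at the logit level. The token names, logit values, and scaling weight below are illustrative, not taken from the paper.

```python
import math

def debias(total_logits, bias_logits, lam=1.0):
    """Counterfactual-style correction: subtract a scaled bias effect from
    the total-effect logits, then renormalize with a softmax."""
    adj = {t: total_logits[t] - lam * bias_logits.get(t, 0.0)
           for t in total_logits}
    z = sum(math.exp(v) for v in adj.values())
    return {t: math.exp(v) / z for t, v in adj.items()}

# Hypothetical next-token logits: the bias decoder inflates "Paris" even
# though the retrieved context does not support it.
total = {"Paris": 3.0, "Lyon": 2.5, "unknown": 1.0}
bias = {"Paris": 1.5, "Lyon": 0.2, "unknown": 0.0}
probs = debias(total, bias)
print(max(probs, key=probs.get))  # Lyon after bias removal
```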

17 pages, 698 KB  
Review
What Distinguishes AI-Generated from Human Writing? A Rapid Review of the Literature
by Georgios P. Georgiou
Big Data Cogn. Comput. 2026, 10(2), 55; https://doi.org/10.3390/bdcc10020055 - 8 Feb 2026
Viewed by 2116
Abstract
Large language models (LLMs) are now routine writing tools across various domains, intensifying questions about when text should be treated as human-authored, artificial intelligence (AI)-generated, or collaboratively produced. This rapid review aims to identify cue families reported in empirical studies as distinguishing AI from human-authored text and to assess how stable these cues are across genres/tasks, text lengths, and revision conditions. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analysis (PRISMA) guidelines, we searched four online databases for peer-reviewed empirical articles (1 January 2022–1 January 2026). After deduplication and screening, 40 studies were included. Evidence converged on five cue families: surface, discourse/pragmatic, epistemic/content, predictability/probabilistic, and provenance. Surface cues dominated the literature and were the most consistently operationalized. Discourse/pragmatic cues followed, particularly in discipline-bound academic genres where stance and metadiscourse differentiated AI from human writing. Predictability/probabilistic cues were central in detector-focused studies, while epistemic/content cues emerged primarily in tasks where grounding and authenticity were salient. Provenance cues were concentrated in watermarking research. Across studies, cue stability was consistently conditional rather than universal. Specifically, surface and discourse cues often remained discriminative within constrained genres, but shifted with register and discipline; probabilistic cues were powerful yet fragile under paraphrasing, post-editing, and evasion; and provenance signals required robustness to editing, mixing, and span localization. Overall, the literature indicates that AI–human distinction emerges from layered and context-dependent cue profiles rather than from any single reliable marker. High-stakes decisions, therefore, require condition-aware interpretation, triangulation across multiple cue families, and human oversight rather than automated classification in isolation. Full article
(This article belongs to the Special Issue Machine Learning Applications in Natural Language Processing)
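Two of the simplest surface cues studied in this literature, lexical diversity and sentence-length variability ("burstiness"), can be computed directly. This is a toy sketch with hypothetical sample texts; the reviewed studies use far richer feature sets, and the review stresses that such cues are genre-dependent.

```python
import re
import statistics

def surface_cues(text):
    """Type-token ratio (lexical diversity) and the population standard
    deviation of sentence lengths in words (a crude burstiness proxy)."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sents = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    ttr = len(set(words)) / len(words)
    lens = [len(s.split()) for s in sents]
    burst = statistics.pstdev(lens) if len(lens) > 1 else 0.0
    return ttr, burst

uniform = "The cat sat. The dog sat. The bird sat."
varied = ("Stop. The weathered lighthouse keeper, against all advice, "
          "rowed out alone that night.")
print(surface_cues(uniform))
print(surface_cues(varied))
```

The uniform sample scores low on both measures (repetitive vocabulary, identical sentence lengths), while the varied sample scores high, illustrating the kind of separation surface-cue detectors look for.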

32 pages, 7709 KB  
Article
Research on Modeling Method of eLoran Signal Propagation Delay Prediction Model: Integrating Path-Weighted Meteorological Data and Propagation Delay Data in Long-Distance Scenarios
by Tao Jin, Shiyao Liu, Baorong Yan, Xiang Jiang, Wei Guo, Yu Hua, Shougang Zhang and Lu Xu
Big Data Cogn. Comput. 2026, 10(2), 54; https://doi.org/10.3390/bdcc10020054 - 7 Feb 2026
Viewed by 347
Abstract
The enhanced long-range navigation (eLoran) system serves as an important backup to the global navigation satellite system (GNSS). In long-distance transmission scenarios, the signal propagation delay of the eLoran system is affected by fluctuations in meteorological factors along the path, which can introduce timing errors and limit the accuracy of the timing system. To address these issues, this paper proposes an innovative prediction model that predicts propagation delay by fusing propagation delay data from multiple differential reference stations along the path with path-weighted meteorological data. Actual data were collected and processed, four types of prediction tasks were designed, and the prediction performance of eight common models was compared on a unified dataset. The results show that the Pucheng–Zhengzhou path-weighted ten-factor back-propagation neural network (PZWT-BPNN) model performs best, balancing prediction accuracy and training efficiency. This model effectively suppresses timing errors caused by meteorological fluctuations and improves the prediction accuracy of the system's propagation delay, providing technical support for key fields such as the low-altitude economy and transportation. Full article
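The path-weighting idea, combining meteorological readings from stations along the propagation path weighted by the share of the path each station represents, can be sketched as a weighted mean. The station names, segment lengths, and readings below are hypothetical.

```python
def path_weighted(readings, segment_km):
    """Weighted mean of per-station meteorological readings, with each
    station weighted by the fraction of the propagation path it covers."""
    total = sum(segment_km.values())
    return {
        factor: sum(vals[s] * segment_km[s] / total for s in segment_km)
        for factor, vals in readings.items()
    }

# Hypothetical stations along the propagation path:
segment_km = {"A": 120.0, "B": 300.0, "C": 80.0}
readings = {
    "temperature_C": {"A": 12.0, "B": 8.0, "C": 15.0},
    "humidity_pct": {"A": 60.0, "B": 75.0, "C": 55.0},
}
print(path_weighted(readings, segment_km))
```

Weighting by segment length lets the station covering most of the path (B here) dominate the fused meteorological factor, mirroring its larger influence on the accumulated propagation delay.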

28 pages, 3555 KB  
Article
Modern ICT Tools and Video Content in Athletes’ Education—Inspiration from Corporate Learning and Development
by Martin Mičiak, Dominika Toman, Milan Kubina, Tatiana Poljaková, Klaudia Ivanovič, Kvetoslava Šimová, Anna Majchráková, Ivana Bystrická, Linda Kováčik and Tibor Furmánek
Big Data Cogn. Comput. 2026, 10(2), 53; https://doi.org/10.3390/bdcc10020053 - 6 Feb 2026
Viewed by 856
Abstract
Active athletes represent a specific target group for learning and development. Their schedules, including training sessions and competitions, leave little time for education. However, athletes still need skills beyond sports to ensure they are prepared for future employment. Our study approaches this issue by identifying appropriate settings for athletes’ learning and development. (1) Against the background of current athletes’ education, it addresses the gap of insufficient attention being paid to practices transferable from corporate learning and development. (2) The study’s methodology primarily uses the case study concept, as this conveys the video content we created for athletes’ learning and development. This is combined with content analysis of selected examples from corporate learning and development and a design thinking workshop engaging important stakeholder groups: athletes (2 participants), lecturers (2 participants), and representatives of sports organizations (1 participant). The other 9 workshop participants were master’s students in a managerial study programme, chosen for their similarity in age to current athletes and the applicability of their courses to athletes’ education. (3) The designed process was created as a digital twin using haptic artefacts and the S2M technology (version 1.0) within the OMiLAB platform (version 1.6). Our results show that video content tailored to the athletes’ constraints is a viable solution that improves their career prospects. (4) The study’s practical implications are supported by expert validation of the model from within the management of large sports organizations. Full article

16 pages, 2545 KB  
Article
CCTD-MARL: Coupled Communication-Task Decoupling Framework for Multi-Agent Systems Under Partial Observability
by Kehan Li, Zhenya Wang, Xin Tang, Heng You, Long Hu, Haidong Xie and Min Chen
Big Data Cogn. Comput. 2026, 10(2), 52; https://doi.org/10.3390/bdcc10020052 - 5 Feb 2026
Viewed by 433
Abstract
Although multi-agent reinforcement learning (MARL) has achieved significant success in various domains, its deployment in real-world scenarios remains challenging, particularly in communication-constrained environments involving multi-task coupling. Existing methods suffer from two limitations: (1) the inability to effectively integrate and process incomplete state information from disparate agents, and (2) a lack of robust mechanisms for handling complex multi-task coupling. To address these challenges, we propose the Coupled Communication-Task Decoupling (CCTD) framework. CCTD introduces two critical innovations: first, a distributed state compensation mechanism that processes historical data to reconstruct accurate global states from partial observations; second, a hierarchical architecture that systematically decomposes complex tasks into manageable subtasks while preserving their interdependencies. Thanks to its modular design, CCTD can integrate with existing MARL algorithms and allows flexible combination of subtasks. Extensive experiments demonstrate that CCTD outperforms baseline methods, achieving a 10% improvement in communication reception rate and superior performance across all subtasks in multi-task environments. Full article

27 pages, 3371 KB  
Article
Hybrid Method of Organizing Information Search in Logistics Systems Based on Vector-Graph Structure and Large Language Models
by Vadim Voloshchuk, Yaroslav Melnik, Irina Safronenkova, Egor Lishchenko, Oleg Kartashov and Alexander Kozlovskiy
Big Data Cogn. Comput. 2026, 10(2), 51; https://doi.org/10.3390/bdcc10020051 - 5 Feb 2026
Viewed by 556
Abstract
In logistics systems, the organization of information retrieval plays a key role in human interaction with technical systems, underpinning decision-making speed, route optimization, planning, and resource allocation. At the same time, the efficiency of a logistics system that simultaneously processes large volumes of constantly updated data is determined by the speed of processing user requests and the accuracy of the responses the system provides. Within a retrieval-augmented generation architecture, a hybrid information retrieval method is proposed, based on the combined use of a vector-graph data representation structure and a large language model. Experiments showed that the hybrid method achieved the best accuracy rates, 0.24–0.25, among all considered methods, with enhanced scalability: when the number of nodes increases fourfold, the query time only doubles, from 0.09 s to 0.20 s, owing to the limitation of the graph traversal area in the graph component of the hybrid search. An optimal range of 30–50 traversed nodes was also identified, balancing precision and query processing speed. The findings are of practical value to logistics system developers and supply chain managers aiming to implement high-precision, natural language-based information retrieval in dynamic operational environments. Full article
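The hybrid search described above can be sketched in miniature: a vector match selects an entry node, a bounded graph traversal (the 50-node cap below mirrors the reported 30–50-node optimum) collects candidates, and the visited subgraph is re-ranked by similarity. The graph layout, cosine scoring, and breadth-first expansion are invented stand-ins, not the authors' implementation.

```python
from collections import deque

# Toy hybrid vector-graph retrieval; all data and scoring choices here
# are illustrative assumptions.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(query_vec, node_vecs, edges, max_nodes=50):
    # 1. Vector component: best-matching node becomes the entry point.
    entry = max(node_vecs, key=lambda n: cosine(query_vec, node_vecs[n]))
    # 2. Graph component: bounded breadth-first expansion around it,
    #    capping the traversal area to keep query time low.
    seen, queue = {entry}, deque([entry])
    while queue and len(seen) < max_nodes:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen and len(seen) < max_nodes:
                seen.add(nxt)
                queue.append(nxt)
    # 3. Re-rank the visited subgraph by vector similarity.
    return sorted(seen, key=lambda n: cosine(query_vec, node_vecs[n]), reverse=True)
```

Capping the traversal is what gives the sublinear scaling claimed in the abstract: the expansion cost depends on `max_nodes`, not the full graph size.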

14 pages, 8775 KB  
Article
Improving Transferability of Adversarial Attacks via Maximization and Targeting from Image to Video Quality Assessment
by Georgii Gotin, Ekaterina Shumitskaya, Dmitriy Vatolin and Anastasia Antsiferova
Big Data Cogn. Comput. 2026, 10(2), 50; https://doi.org/10.3390/bdcc10020050 - 5 Feb 2026
Viewed by 472
Abstract
This paper proposes a novel method for transferable adversarial attacks from Image Quality Assessment (IQA) to Video Quality Assessment (VQA) models. Attacking modern VQA models is challenging due to their high complexity and the temporal nature of video content. Since IQA and VQA models share similar low- and mid-level feature representations, and IQA models are substantially cheaper and faster to run, we leverage them as surrogates to generate transferable adversarial perturbations. Our method, MaxT-I2VQA, jointly Maximizes IQA scores and Targets IQA feature activations to improve transferability from IQA to VQA models. We first analyze the correlation between IQA and VQA internal features and use these insights to design a feature-targeting loss. We evaluate MaxT-I2VQA by transferring attacks from four state-of-the-art IQA models to four recent VQA models and compare against three competitive baselines. Compared to prior methods, MaxT-I2VQA increases the transferred attack success rate by 7.9% and reduces per-example attack runtime by a factor of 8. Our experiments confirm that IQA and VQA feature spaces are sufficiently aligned to enable effective cross-task transfer. Full article

25 pages, 2294 KB  
Article
SiAraSent: From Features to Deep Transformers for Large-Scale Arabic Sentiment Analysis
by Omar Almousa, Yahya Tashtoush, Anas AlSobeh, Plamen Zahariev and Omar Darwish
Big Data Cogn. Comput. 2026, 10(2), 49; https://doi.org/10.3390/bdcc10020049 - 3 Feb 2026
Viewed by 667
Abstract
Sentiment analysis of Arabic text, particularly on social media platforms, presents a formidable set of unique challenges that stem from the language’s complex morphology, its numerous dialectal variations, and the frequent and nuanced use of emojis to convey emotional context. This paper presents SiAraSent, a hybrid framework that integrates traditional text representations, emoji-aware features, and deep contextual embeddings based on Arabic transformers. Starting from a strong and fully interpretable baseline built on Term Frequency–Inverse Document Frequency (TF–IDF)-weighted character and word N-grams combined with emoji embeddings, we progressively incorporate SinaTools for linguistically informed preprocessing and AraBERT for contextualized encodings. The framework is evaluated on a large-scale dataset of 58,751 Arabic tweets labeled for sentiment polarity. We evaluate four experimental configurations: (1) a baseline traditional machine learning architecture that employs TF-IDF, N-grams, and emoji features with a Support Vector Machine (SVM) classifier; (2) a large language model (LLM) feature extraction approach that leverages deep contextual embeddings from the pre-trained AraBERT model; (3) a novel hybrid fusion model that concatenates traditional morphological features, AraBERT embeddings, and emoji-based features into a high-dimensional vector; and (4) a fully fine-tuned AraBERT model specifically adapted for the sentiment classification task. Our experiments demonstrate the efficacy of the proposed framework, with the fine-tuned AraBERT architecture achieving an accuracy of 93.45%, a 10.89% improvement over the best traditional baseline. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining: 2nd Edition)
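Configuration (1) above, TF-IDF word and character n-grams feeding a linear SVM, can be approximated in a few lines with scikit-learn. The tiny English corpus, n-gram ranges, and labels below are placeholders standing in for the 58,751-tweet Arabic dataset and the paper's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import LinearSVC

# Toy sketch of a TF-IDF + SVM baseline; corpus and hyperparameters are
# invented, not the paper's.
model = Pipeline([
    ("features", FeatureUnion([
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 4))),
    ])),
    ("clf", LinearSVC()),
])

texts = ["great service 😊", "awful delay 😠", "love this 😊", "hate waiting 😠"]
labels = ["pos", "neg", "pos", "neg"]
model.fit(texts, labels)
pred = model.predict(["great service, love it 😊"])[0]
```

Character n-grams are what let such a baseline pick up emoji and sub-word morphology cues that a word tokenizer would drop.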

26 pages, 4979 KB  
Article
AMPS: A Direction-Aware Adaptive Multi-Scale Potential Model for Link Prediction in Complex Networks
by Xinghua Qin, Sizheng Liu, Mengmeng Zhang, Jun Tang and Yirun Ruan
Big Data Cogn. Comput. 2026, 10(2), 48; https://doi.org/10.3390/bdcc10020048 - 3 Feb 2026
Viewed by 463
Abstract
To overcome the limitations of current link prediction methods in effectively leveraging topological information and node importance, this paper introduces a new model called AMPS (Adaptive Multi-scale Potential-enhanced Path Similarity). The model is built on a hierarchical structure that captures both global network topology and local interaction patterns, with full compatibility for directed and undirected networks. This is achieved through a process that quantifies node potential fields, enhances multi-scale similarity, and fuses information across scales. Specifically, we define three types of potential field models, global, local, and k-hop, to flexibly measure node importance. We also introduce two complementary prediction modules: an enhanced common neighbor matrix (PCN), which uses potential fields to refine local structural details, and a feature-weighted generalized path similarity (GLP), which integrates node importance into path evaluation. The final similarity score is obtained by adaptively combining the outputs of PCN and GLP. Experiments on 12 undirected datasets and 9 directed datasets demonstrate that AMPS significantly outperforms other mainstream algorithms in terms of the AUC metric. It also exhibits strong robustness under varying training set ratios, maintaining stable advantages in both directed and undirected scenarios. This framework provides a physically intuitive, topology-aware, and high-precision solution for link prediction across various types of networks. Full article
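As a toy rendering of the potential-enhanced common-neighbor (PCN) idea, each common neighbor can be weighted by a node "potential". Here plain degree stands in for the paper's global/local/k-hop potential field models, and the additive combination is an assumption; the real AMPS further fuses PCN with the GLP path term adaptively.

```python
# Degree-as-potential is an illustrative stand-in, not AMPS's definition.

def pcn_score(adj, u, v):
    """Sum the potential (here: degree) of every common neighbor of u and v."""
    common = adj[u] & adj[v]
    return sum(len(adj[w]) for w in common)

# Tiny undirected graph as an adjacency dict of sets:
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

Compared with a raw common-neighbor count, the score now rewards links mediated by structurally important nodes.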

20 pages, 5999 KB  
Article
Lithology Identification from Well Logs via Meta-Information Tensors and Quality-Aware Weighting
by Wenxuan Chen, Guoyun Zhong, Fan Diao, Peng Ding and Jianfeng He
Big Data Cogn. Comput. 2026, 10(2), 47; https://doi.org/10.3390/bdcc10020047 - 2 Feb 2026
Viewed by 682
Abstract
In practical well-logging datasets, severe missing values, anomalous disturbances, and highly imbalanced lithology classes are pervasive. To address these challenges, this study proposes a well-logging lithology identification framework that combines Robust Feature Engineering (RFE) with quality-aware XGBoost. Instead of relying on interpolation-based data cleaning, RFE uses sentinel values and a meta-information tensor to explicitly encode patterns of missingness and anomalies, and incorporates sliding-window context to transform data defects into discriminative auxiliary features. In parallel, a quality-aware sample-weighting strategy is introduced that jointly accounts for formation boundary locations and label confidence, thereby mitigating training bias induced by long-tailed class distributions. Experiments on the FORCE 2020 lithology prediction dataset demonstrate that, relative to baseline models, the proposed method improves the weighted F1 score from 0.66 to 0.73, while Boundary F1 and the geological penalty score are also consistently enhanced. These results indicate that, compared with traditional workflows that rely solely on data cleaning, explicit modeling of data incompleteness provides more pronounced advantages in terms of robustness and engineering applicability. Full article
(This article belongs to the Section Data Mining and Machine Learning)
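The sentinel-plus-meta-tensor idea can be illustrated on a toy log: missingness is flagged rather than interpolated away, and a sliding window summarizes local defect density as an auxiliary feature. The sentinel value, column name, and window size below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
import pandas as pd

# Hedged sketch of keeping data defects visible as features.
SENTINEL = -999.0

def add_meta_features(df, cols, window=3):
    out = df.copy()
    for c in cols:
        missing = out[c].isna()
        out[f"{c}_missing"] = missing.astype(int)   # meta-information flag
        out[c] = out[c].fillna(SENTINEL)            # sentinel, not interpolation
        # Sliding-window context: how defect-dense is the neighborhood?
        out[f"{c}_miss_rate_w{window}"] = (
            out[f"{c}_missing"].rolling(window, center=True, min_periods=1).mean()
        )
    return out

logs = pd.DataFrame({"GR": [80.0, np.nan, 95.0, np.nan, np.nan]})
feat = add_meta_features(logs, ["GR"])
```

A tree ensemble such as XGBoost can split on the sentinel and the flags directly, which is why this style of encoding avoids the bias that imputation would introduce.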

34 pages, 2216 KB  
Review
Big Data Analytics and AI for Consumer Behavior in Digital Marketing: Applications, Synthetic and Dark Data, and Future Directions
by Leonidas Theodorakopoulos, Alexandra Theodoropoulou and Christos Klavdianos
Big Data Cogn. Comput. 2026, 10(2), 46; https://doi.org/10.3390/bdcc10020046 - 2 Feb 2026
Cited by 1 | Viewed by 3892
Abstract
In the big data era, understanding and influencing consumer behavior in digital marketing increasingly relies on large-scale data and AI-driven analytics. This narrative, concept-driven review examines how big data technologies and machine learning reshape consumer behavior analysis across key decision-making areas. After outlining the theoretical foundations of consumer behavior in digital settings and the main data and AI capabilities available to marketers, this paper discusses five application domains: personalized marketing and recommender systems, dynamic pricing, customer relationship management, data-driven product development and fraud detection. For each domain, it highlights how algorithmic models affect targeting, prediction, consumer experience and perceived fairness. This review then turns to synthetic data as a privacy-oriented way to support model development, experimentation and scenario analysis, and to dark data as a largely underused source of behavioral insight in the form of logs, service interactions and other unstructured records. A discussion section integrates these strands, outlines implications for digital marketing practice and identifies research needs related to validation, governance and consumer trust. Finally, this paper sketches future directions, including deeper integration of AI in real-time decision systems, increased use of edge computing, stronger consumer participation in data use, clearer ethical frameworks and exploratory work on quantum methods. Full article
(This article belongs to the Section Big Data)

18 pages, 800 KB  
Article
Free Access to World News: Reconstructing Full-Text Articles from GDELT
by Andrea Fronzetti Colladon and Roberto Vestrelli
Big Data Cogn. Comput. 2026, 10(2), 45; https://doi.org/10.3390/bdcc10020045 - 2 Feb 2026
Viewed by 1289
Abstract
News data have become essential resources across various disciplines. Still, access to full-text news corpora remains challenging due to high costs and the limited availability of free alternatives. This paper presents a novel Python package (gdeltnews) that reconstructs full-text newspaper articles at near-zero cost by leveraging the Global Database of Events, Language, and Tone (GDELT) Web News NGrams 3.0 dataset. Our method merges overlapping n-grams extracted from global online news to rebuild complete articles. We validate the approach on a benchmark set of 2211 articles from major U.S. news outlets, achieving up to 95% text similarity against original articles based on Levenshtein and SequenceMatcher metrics. Our tool facilitates economic forecasting, computational social science, information science, and natural language processing applications by enabling free and large-scale access to full-text news data. Full article
(This article belongs to the Section Big Data)
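The core reconstruction step, gluing overlapping n-gram fragments back into running text, can be shown with a toy suffix/prefix merge. gdeltnews additionally exploits positional constraints and handles ambiguity, so this is a simplified sketch, not the package's algorithm.

```python
# Toy suffix/prefix assembly of ordered fragments; real n-gram
# reconstruction must also resolve ambiguous and missing overlaps.

def merge_fragments(fragments):
    """Fragments arrive in document order; glue each one onto the text
    at its largest suffix/prefix overlap."""
    text = fragments[0]
    for frag in fragments[1:]:
        best = 0
        for k in range(min(len(text), len(frag)), 0, -1):
            if text.endswith(frag[:k]):
                best = k
                break
        text += frag[best:]
    return text

article = merge_fragments([
    "news data have become",
    "have become essential resources",
    "essential resources across disciplines",
])
```

Because consecutive n-grams overlap by construction, chaining these merges recovers near-complete text, which is what the ~95% similarity figures measure.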

39 pages, 12238 KB  
Article
Fusing Dynamic Bayesian Network for Explainable Decision with Optimal Control for Occupancy Guidance in Autonomous Air Combat
by Mingzhe Zhou, Guanglei Meng, Biao Wang and Tiankuo Meng
Big Data Cogn. Comput. 2026, 10(2), 44; https://doi.org/10.3390/bdcc10020044 - 29 Jan 2026
Viewed by 445
Abstract
In this paper, an explainable decision-making and guidance integration method is developed based on a dynamic Bayesian network and optimal control. The proposed method applies to autonomous decision-making and guidance in the attack-and-defense game of unmanned combat aerial vehicles in close air combat. Firstly, target maneuver recognition and target trajectory prediction are carried out from the target information detected by the sensor. Then, a dynamic Bayesian network model for close combat decisions is established by combining the space occupancy situation and equipment performance information with the target maneuver identification results. The decision model enables intelligent selection of the optimization index function for the maneuver. The constrained gradient method of optimal control is adopted to compute the optimal occupancy guidance quantity for the unmanned combat aerial vehicle while respecting its flight performance constraints. Simulation results for several typical close air combat scenarios show that the proposed method achieves rational autonomous decision-making and space occupancy guidance for unmanned combat aerial vehicles, avoids the rigid maneuver patterns of traditional methods, and offers better real-time and optimization performance. Full article

24 pages, 861 KB  
Article
Distinguishability-Driven Voice Generation for Speaker Anonymization via Random Projection and GMM
by Chunxia Wang, Qiuyu Zhang, Yingjie Hu and Huiyi Wei
Big Data Cogn. Comput. 2026, 10(2), 43; https://doi.org/10.3390/bdcc10020043 - 29 Jan 2026
Viewed by 457
Abstract
Speaker anonymization effectively conceals speaker identity in speech signals to protect privacy. To address issues in existing anonymization systems, including reduced voice distinguishability, a limited number of anonymized voices, reliance on an external speaker pool, and vulnerability to privacy leakage against strong attackers, a novel distinguishability-driven voice generation method for speaker anonymization via random projection and the Gaussian Mixture Model (GMM) is proposed. This method first applies random projection to lower the dimensionality of the X-vectors from an external speaker pool, and then constructs a GMM in the reduced-dimensional space as a generative model. By sampling from this generative model, anonymous speaker identity representations are generated, ultimately synthesizing anonymized speech that maintains both intelligibility and distinguishability. To ensure the anonymized speech remains sufficiently distinguishable from the original and to prevent excessive similarity, a cosine similarity check is performed between the original X-vector and the pseudo-X-vector. Experimental results on the VoicePrivacy Challenge datasets demonstrate that the proposed method not only effectively protects speaker privacy across different attack scenarios but also preserves speech content integrity while significantly enhancing speaker distinguishability between original speakers and their corresponding pseudo-speakers, as well as among different pseudo-speakers. Full article
(This article belongs to the Topic Generative AI and Interdisciplinary Applications)
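The generation pipeline in the abstract maps onto standard components: random projection to a low-dimensional space, a GMM fitted there, sampling for pseudo-identities, and a cosine check. Random 512-dimensional vectors stand in for a real X-vector pool; the pool size, projected dimension, mixture size, and similarity threshold are all illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.random_projection import GaussianRandomProjection

# Random vectors stand in for an X-vector speaker pool; every dimension
# and threshold here is an assumption, not the paper's setting.
rng = np.random.default_rng(0)
pool = rng.normal(size=(200, 512))

proj = GaussianRandomProjection(n_components=32, random_state=0)
low = proj.fit_transform(pool)                 # dimensionality reduction

gmm = GaussianMixture(n_components=4, random_state=0).fit(low)
pseudo, _ = gmm.sample(1)                      # pseudo speaker identity

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Reject pseudo identities that land too close to the original speaker.
too_similar = cosine(low[0], pseudo[0]) > 0.5
```

Sampling from the fitted mixture rather than picking pool members is what removes the dependence on a finite set of anonymized voices.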

15 pages, 661 KB  
Article
Assessing the Determinants of Behavioural Cybersecurity in Healthcare: A Study of Patient Health Application Users in Saudi Arabia
by Alghaliyah Alharbi, Hasan Mansur, Manahil Alfuraydan and Thabit Atobishi
Big Data Cogn. Comput. 2026, 10(2), 42; https://doi.org/10.3390/bdcc10020042 - 29 Jan 2026
Cited by 1 | Viewed by 564
Abstract
Cybersecurity has become one of the top priorities in Saudi Arabia, playing a key role in achieving Vision 2030 and advancing the kingdom’s position in digital transformation. This study investigates how cybersecurity knowledge, attitudes, and awareness influence user behaviours in health applications within Saudi Arabia. An online cross-sectional survey was distributed between March and April 2025 among Saudi Arabian residents. The collected data (n = 629) were analyzed using Smart PLS Structural Equation Modelling (SEM) to assess the relationships among the study constructs. The majority of the participants (61.4%) were between the ages of 18 and 24, and 87.6% reported using health applications such as Sehhaty or Labayh to manage their health information. Results demonstrated that all three constructs significantly predicted cybersecurity behaviours: knowledge showed the strongest influence (β = 0.372), followed by attitude (β = 0.343) and awareness (β = 0.199), with all paths being statistically significant (p < 0.05). The model explained substantial variance in cybersecurity behaviours. Knowledge, attitude, and awareness significantly predict cybersecurity practices in healthcare application contexts. Findings highlight the critical need for targeted educational interventions focusing on cybersecurity knowledge enhancement and awareness programmes to promote safer digital health behaviours and strengthen patient data protection in Saudi Arabia’s healthcare system. Full article
(This article belongs to the Special Issue Big Data Analytics with Machine Learning for Cyber Security)

16 pages, 10849 KB  
Article
LLM4ATS: Applying Large Language Models for Auto-Testing Scripts in Automobiles
by Zeyuan Li, Wei Li, Yuezhao Liu, Wenhao Li and Min Chen
Big Data Cogn. Comput. 2026, 10(2), 41; https://doi.org/10.3390/bdcc10020041 - 28 Jan 2026
Viewed by 402
Abstract
This paper introduces LLM4ATS, a framework integrating large language models, RAG, and closed-loop verification to automatically generate highly reliable automotive automated test scripts from natural language descriptions. Addressing the complex linguistic structure, strict rules, and strong dependency on the in-vehicle communication database inherent in ATS scripts, LLM4ATS innovatively employs fine-grained line-level generation and a rule-guided iterative refinement mechanism. The framework first enhances prompt context by retrieving relevant information from constructed syntax and case knowledge bases via RAG. Subsequently, each generated script line undergoes rigorous verification through a two-stage validator: initial syntax validation followed by semantic compliance checks against the communication database for signal paths and value domains. Any errors trigger structured feedback, driving iterative refinement by the large language model until fully compliant scripts are produced. This paper evaluated the framework’s effectiveness on real ATS datasets, testing models including GPT-3.5, GPT-4, Qwen2.5-7B, and Qwen2.5-72B-Instruct. Experimental results demonstrate that compared to zero-shot and few-shot baseline methods, the LLM4ATS framework significantly improves generation quality and pass rates across all models. Notably, the strongest GPT-4 model achieved a script pass rate of 91% with LLM4ATS, up from 42% in zero-shot mode, and validated functional effectiveness on a specified in-vehicle hardware platform (Chery Fengyun T28 dashboard). At the same time, expert manual evaluations confirmed the superior performance of the generated scripts in correctness, readability, and compliance with industry standards. Full article

19 pages, 4487 KB  
Article
Research on Emerging Technology Identification Methods Based on a Knowledge Graph of High-Value Patents
by Chuan Zhan, Yang Zhou and Yanping Huang
Big Data Cogn. Comput. 2026, 10(2), 40; https://doi.org/10.3390/bdcc10020040 - 28 Jan 2026
Viewed by 736
Abstract
In the context of a new wave of scientific and technological revolution and industrial transformation, this study proposes an emerging technology identification framework that integrates a High-Value Patent Knowledge Graph with Social Network Analysis, aiming to systematically uncover the semantic and structural relationships embedded in patent data and to support national efforts to secure strategic technological advantages. First, patent textual feature scores are extracted using the Doc2Vec model, while indicator feature scores are calculated across the technical, legal, and economic dimensions using the CRITIC weighting method. These two types of scores are then integrated to derive a comprehensive patent value score, and high-value patents are screened according to the Pareto principle. Subsequently, a High-Value Patent Knowledge Graph is constructed based on entity extraction using the BERT-BiLSTM-CRF model and relationship matching techniques. Building upon this graph, centrality analysis is conducted on the nodes, and the results are combined with the rich semantic relationships represented in the knowledge graph to further identify emerging technologies. Taking the New Energy Vehicle domain as an empirical case, a High-Value Patent Knowledge Graph comprising seven types of entities, six types of relationships, and 25,611 triplets is developed, through which six key emerging sub-technology directions are identified. The empirical findings demonstrate the effectiveness and robustness of the proposed approach for emerging technology identification. Full article
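Among the steps above, the CRITIC weighting is mechanical enough to show numerically: each indicator's weight comes from its contrast intensity (the column's standard deviation) times its conflict with the other indicators (the summed 1 − correlation). The 3 × 3 matrix of patent indicator scores below is invented for illustration.

```python
import numpy as np

# Worked CRITIC weighting on made-up indicator scores (3 patents x 3
# indicators, e.g. technical / legal / economic); data is invented.
X = np.array([[0.2, 0.8, 0.5],
              [0.9, 0.4, 0.6],
              [0.5, 0.6, 0.9]])

# 1. Min-max normalize each indicator column.
Xn = (X - X.min(0)) / (X.max(0) - X.min(0))
# 2. Contrast intensity: column standard deviation.
sigma = Xn.std(0, ddof=1)
# 3. Conflict: summed (1 - correlation) with the other indicators.
conflict = (1 - np.corrcoef(Xn, rowvar=False)).sum(0)
# 4. Information content and normalized weights.
C = sigma * conflict
weights = C / C.sum()
```

Indicators that vary a lot and correlate little with the rest thus receive the largest weights, without any subjective weighting input.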

20 pages, 2437 KB  
Article
Regression-Based Small Language Models for DER Trust Metric Extraction from Structured and Semi-Structured Data
by Nathan Hamill and Razi Iqbal
Big Data Cogn. Comput. 2026, 10(2), 39; https://doi.org/10.3390/bdcc10020039 - 24 Jan 2026
Viewed by 590
Abstract
Renewable energy sources like wind turbines and solar panels are integrated into modern power grids as Distributed Energy Resources (DERs). These DERs can operate independently or as part of microgrids. Interconnecting multiple microgrids creates Networked Microgrids (NMGs) that increase reliability, resilience, and independent power generation. However, the trustworthiness of individual DERs remains a critical challenge in NMGs, particularly when integrating previously deployed or geographically distributed units managed by entities with varying expertise. Assessing DER trustworthiness to ensure reliability and security is essential to prevent system-wide instability. This research addresses the challenge by proposing a lightweight trust metric generation system capable of processing structured and semi-structured DER data to produce key trust indicators. The system employs a Small Language Model (SLM) with approximately 16 million parameters for textual data understanding and metric extraction, followed by a regression head that outputs bounded trust scores. Designed for deployment in computationally constrained environments, the SLM requires only 64.6 MB of disk space and 200–250 MB of memory, significantly less than larger models such as DeepSeek R1, Gemma-2, and Phi-3, which demand 3–12 GB. Experimental results demonstrate that the SLM achieves high correlation and low mean error across all trust metrics while outperforming larger models in efficiency. When integrated into a full neural network-based trust framework, the generated metrics enable accurate prediction of DER trustworthiness. These findings highlight the potential of lightweight SLMs for reliable and resource-efficient trust assessment in NMGs, supporting resilient and sustainable energy systems in smart cities. Full article
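The "regression head producing bounded trust scores" can be pictured as a logistic unit over extracted features. The feature values, weights, and bias below are invented, and the real head sits on top of the SLM's learned representations rather than hand-set numbers.

```python
import math

# Toy regression head: squash an unbounded weighted feature sum into a
# bounded trust score in (0, 1). All numbers are invented placeholders.

def trust_head(features, weights, bias):
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

score = trust_head([0.8, -0.2, 1.5], [0.6, 0.4, 0.3], -0.1)
```

Bounding the output this way keeps trust scores directly comparable across DERs regardless of the scale of the extracted metrics.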
