Big Data Cogn. Comput., Volume 10, Issue 3 (March 2026) – 33 articles

Cover Story: This study evaluates SegFormer for practical urban scene segmentation deployment, focusing on scalability, transferability, and interpretability. Results show that SegFormer-B5 achieves top accuracy (82.4% mIoU), but SegFormer-B3 provides a better efficiency–performance balance for real-time systems with limited compute. Transfer learning from CamVid significantly boosts performance on KITTI and IDD, with gains up to 72.74% for rare classes while reducing training time by 61.1%, enabling faster adaptation across regions. Interpretability tools such as confidence heatmaps and Grad-CAM improve transparency, supporting safer deployment in autonomous driving. These findings demonstrate SegFormer’s readiness for real-world applications while highlighting future needs in embedded optimization and multimodal perception.
  • Issues are regarded as officially published after their release is announced to the table of contents alert mailing list.
  • You may sign up for e-mail alerts to receive the tables of contents of newly released issues.
  • Papers are published in both HTML and PDF forms; PDF is the official format. To view a paper in PDF form, click the "PDF Full-text" link and open it with the free Adobe Reader.
26 pages, 5635 KB  
Article
A Multi-Feature Transition-Aware Framework for Next POI Recommendation
by Oraya Sooknit, Jakkarin Suksawatchon and Ureerat Suksawatchon
Big Data Cogn. Comput. 2026, 10(3), 99; https://doi.org/10.3390/bdcc10030099 - 23 Mar 2026
Abstract
Next Point-of-Interest (POI) recommendation focuses on predicting a user’s subsequent location based on historical check-in data. In practice, however, check-in logs frequently contain uncertain records in which ambiguous spatial, temporal, or behavioral information obscures the underlying mobility regularities, thereby degrading prediction performance. To address this challenge, this study first infers user preferences from historical trajectories and reweights transition importance based on temporal and spatial proximity. It then models transition relationships using three complementary feature dimensions: POI category, spatial area, and routine versus non-routine behavioral patterns. Using transition probability analysis, feature-level dependencies in user mobility are systematically investigated. The findings demonstrate that these transition features contribute unevenly to predictive performance, with area-based transitions yielding the strongest results when used in isolation. Nonetheless, their joint integration consistently achieves the highest accuracy, underscoring the critical role of transition-aware modeling. Across two real-world datasets, the proposed framework consistently achieves state-of-the-art performance in top-ranked accuracy (Recall@1) and ranking quality (NDCG@1), while delivering competitive effectiveness at higher cutoff values (k=3 and k=5). Notably, on the NYC dataset, MTF-POI achieves the highest Recall@1 (+19.01% over the strongest baseline) with a marginal trade-off at Recall@3, reflecting the framework’s design emphasis on precise next-step prediction. Full article
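The paper's exact reweighting and fusion formulas are not reproduced in this abstract. As a rough sketch of the general idea it describes — transition counts down-weighted by temporal and spatial distance, then row-normalized into probabilities — one might write the following (function name, tuple layout, and decay constants `tau`/`sigma` are all illustrative, not the paper's):

```python
import math
from collections import defaultdict

def weighted_transition_matrix(checkins, tau=3600.0, sigma=1.0):
    """Category-level transition probabilities from (category, time, x, y)
    check-ins, where each observed transition is weighted by temporal and
    spatial proximity (closer in time/space -> higher weight).
    Hypothetical scheme, not the paper's exact formulation."""
    counts = defaultdict(lambda: defaultdict(float))
    for (c1, t1, x1, y1), (c2, t2, x2, y2) in zip(checkins, checkins[1:]):
        dt = abs(t2 - t1)
        dist = math.hypot(x2 - x1, y2 - y1)
        w = math.exp(-dt / tau) * math.exp(-dist / sigma)
        counts[c1][c2] += w
    # Row-normalize weighted counts into transition probabilities
    probs = {}
    for c1, row in counts.items():
        total = sum(row.values())
        probs[c1] = {c2: w / total for c2, w in row.items()}
    return probs
```

The same pattern extends to the paper's other feature dimensions (spatial area, routine vs. non-routine) by swapping the key used for `c1`/`c2`.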

18 pages, 7435 KB  
Article
A Comparative Analysis of Deep-Learning-Based Speech Enhancement Models: Assessing Biometric Speaker Verification in Real-World Noisy Environments
by Md Jahangir Alam Khondkar, Ajan Ahmed, Stephanie Schuckers and Masudul H. Imtiaz
Big Data Cogn. Comput. 2026, 10(3), 98; https://doi.org/10.3390/bdcc10030098 - 23 Mar 2026
Abstract
Speech enhancement through denoising is essential for maintaining signal intelligibility and quality in biometric speaker verification pipelines that operate in acoustically adverse conditions. Despite the proliferation of deep learning (DL) architectures for speech denoising, simultaneously optimizing noise attenuation, perceptual fidelity, and speaker-identity preservation remains an open problem. We address this gap by benchmarking three architecturally distinct DL-based enhancement models—Wave-U-Net, CMGAN, and U-Net—on three independent, domain-diverse corpora (SpEAR, VPQAD, and Clarkson) that the models never encountered during training and by introducing commercial-grade VeriSpeak speaker-verification scores as a biometric evaluation dimension absent from prior comparative studies. Our experiments reveal a clear three-way trade-off: U-Net achieves the highest signal-to-noise ratio (SNR) gains (+61.44% on SpEAR, +67.05% on VPQAD, +235.3% on Clarkson) but sacrifices naturalness; CMGAN yields the best perceptual evaluation of speech quality (PESQ) values (3.33, 1.35, and 2.50, respectively), favoring listening-comfort applications; and Wave-U-Net delivers the strongest biometric fidelity (VeriSpeak improvements of +11.63%, +30.22%, and +29.24%) while offering competitive perceptual quality. These results highlight that model selection must be driven by the target deployment scenario and provide actionable guidance for improving biometric verification robustness under real-world noise. Full article
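The percentage gains quoted above depend on how SNR is measured before and after enhancement, which the abstract does not spell out. A standard reference-based SNR in decibels — one plausible ingredient of such gains — is simply:

```python
import numpy as np

def snr_db(clean, processed):
    """SNR in dB of a processed signal against a clean reference:
    10 * log10(signal power / residual-noise power)."""
    noise = processed - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(noise ** 2))
```

Comparing `snr_db(clean, noisy)` with `snr_db(clean, denoised)` gives the per-utterance improvement a benchmark like this would aggregate.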

17 pages, 746 KB  
Article
Predicting Mortality and Readmission in Obstructive Sleep Apnea via LLM-Expanded Clinical Concepts
by Awwal Ahmed, Anthony Rispoli, Carrie Wasieloski, Ifrah Khurram, Rafael Zamora-Resendiz, Destinee Morrow, Aijuan Dong and Silvia Crivelli
Big Data Cogn. Comput. 2026, 10(3), 97; https://doi.org/10.3390/bdcc10030097 - 21 Mar 2026
Abstract
Obstructive Sleep Apnea (OSA) is a common sleep disorder associated with serious health risks. This study leverages large language models (LLMs) to process and interpret clinical narratives in electronic health records. It develops clinically meaningful lexicons for predicting mortality and readmission risk, as well as for multiclass diagnostic classification in OSA patients. Using LLM-expanded lexicons, logistic regression models achieved ROC–AUC scores of 0.844 for 6-month all-cause post-discharge mortality, 0.817 for 1-year all-cause post-discharge mortality, and 0.729 for all-cause hospital readmissions following the first discharge. Diagnostic performance was highest with smaller n-gram representations, indicating that additional contextual length did not improve performance. Compared with frequency-based n-gram models, LLM-expanded lexicons yielded sparser feature sets with lower computational cost and comparable performance. Our findings highlight the potential of LLM-expanded lexicons to enhance OSA diagnosis and clinical risk stratification. Full article

19 pages, 992 KB  
Article
Hybrid Music Similarity with Hypergraph and Siamese Network
by Sera Kim, Youngjun Kim, Jaewon Lee and Dalwon Jang
Big Data Cogn. Comput. 2026, 10(3), 96; https://doi.org/10.3390/bdcc10030096 - 21 Mar 2026
Abstract
This paper proposes a novel method for measuring music similarity. Existing music similarity measurements have often been used for music appreciation; this paper instead measures the similarity between music samples that are used for music production. Conventional music recommendation approaches often rely on either metadata-based similarity or audio-based feature similarity in isolation, which limits their effectiveness in sample-based recommendation scenarios where both compositional context and acoustic characteristics are important. To address this limitation, the proposed framework combines a hypergraph-based information similarity module with a feature-based similarity module learned using Siamese networks and triplet loss. In the information-based module, metadata attributes such as beats per minute (BPM), genre, chord, key, and instrument are modeled as vertices in a hypergraph, and Random Walk–Word2Vec embeddings are learned to capture structural relationships between music samples and their attributes. In parallel, the feature-based module employs vertex-specific Siamese networks trained on instrument and key classification tasks to learn perceptual similarity directly from audio signals. The two modules are trained independently and jointly utilized at the recommendation stage to provide attribute-specific similarity results for a given query sample. Results show that the proposed system achieves high Precision@k across multiple attributes and forms stable similarity structures in the embedding space, even without relying on user interaction data. These results reflect embedding consistency evaluated over the entire dataset where training and retrieval are performed on the same sample pool, rather than generalization to unseen samples. These results demonstrate that the proposed hybrid framework effectively captures both structural and perceptual similarity among music samples and is well suited for sample-based music recommendation in music production environments. Full article
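Precision@k, the retrieval metric reported above, is standard and easy to state concretely; a minimal version (names hypothetical) is:

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that appear in the
    set of relevant items."""
    top_k = retrieved[:k]
    return sum(1 for item in top_k if item in relevant) / k
```

In a setup like the paper's, `retrieved` would be the ranked neighbors of a query sample under one attribute-specific similarity, and `relevant` the samples sharing that attribute.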

25 pages, 6493 KB  
Article
A Dynamic Prompt-Based Logic-Aided Compliance Checker
by Wenxi Sheng, Chi Wei, Yinuo Zhang, Bowen Zhang and Jingyun Sun
Big Data Cogn. Comput. 2026, 10(3), 95; https://doi.org/10.3390/bdcc10030095 - 21 Mar 2026
Abstract
Text-based automatic compliance checking (ACC) employs natural language processing technologies to scrutinize a corporation’s business documents, ensuring adherence to related normative texts. The current methods fall into two primary categories: symbol-based and embedding-based approaches. Symbol-based methods, noted for their accuracy and transparent processing, suffer from limited versatility. Conversely, embedding-based methods operate independently of expert knowledge yet often yield challenging-to-interpret results and require substantial volumes of annotated data. While both types of methods exhibit advantages in different aspects, the current research fails to combine these advantages effectively. Therefore, the existing methods fail to balance interpretability, generalization ability, and accuracy, which are key requirements for practical compliance systems. To address this problem, we introduce a novel approach termed the Dynamic Prompt-based Logic-Aided Compliance Checker (DPLACC), which is grounded in the prompt learning framework. This method initially parses target texts, transforming the results into first-order logical expressions. It subsequently retrieves pertinent knowledge from a knowledge graph, converting the knowledge into analogous first-order logical expressions. These expressions are then encoded into a global semantic vector via a pre-trained first-order logic encoder. Ultimately, the semantics of expressions and initial texts are amalgamated within the prompt template, facilitating the logical knowledge enhancement of model reasoning. Experiments on Chinese and English datasets demonstrate that DPLACC comprehensively outperforms existing methods based solely on symbols or embeddings in terms of accuracy, precision, recall, and F1 score and significantly surpasses current mainstream large language models. Furthermore, DPLACC exhibits enhanced interpretability and reduced data dependence, maintaining 70% checking accuracy with as few as ten training samples. This capability allows DPLACC to be rapidly deployed in data-scarce real-world scenarios with minimal annotation overhead, thus offering a practical pathway toward the scalable implementation of compliance inspection systems. Full article
(This article belongs to the Special Issue Artificial Intelligence (AI) and Natural Language Processing (NLP))

46 pages, 2822 KB  
Review
Generative AI and the Foundation Model Era: A Comprehensive Review
by Abdussalam Elhanashi, Siham Essahraui, Pierpaolo Dini, Davide Paolini, Qinghe Zheng and Sergio Saponara
Big Data Cogn. Comput. 2026, 10(3), 94; https://doi.org/10.3390/bdcc10030094 - 20 Mar 2026
Abstract
Generative artificial intelligence and foundation models have changed machine learning by allowing systems to produce readable text, realistic images, and other multimodal content with little direct input from a user. Foundation models are large neural networks trained on very large and varied datasets, and they form the core of many current generative AI (GenAI) systems. Their rapid development has led to major advances in areas like natural language processing, computer vision, multimodal learning, and robotics. Examples include GPT, LLaMA, and diffusion-based architectures, such as models often used for image generation. Systems such as Stable Diffusion show this shift by illustrating how AI can interpret information, draw basic inferences, and produce new outputs using more than one type of data. This review surveys common foundation model architectures and examines what they can do in generative tasks. It reviews Transformer, diffusion, and multimodal architectures, focusing on methods that support scaling and transfer across domains. The paper also reviews key approaches to pretraining and fine-tuning, including self-supervised learning, instruction tuning, and parameter-efficient adaptation, which support these systems’ ability to generalize across tasks. In addition to the technical details, this review discusses how GenAI is being used for text generation, image synthesis, robotics, and biomedical research. The study also notes continuing challenges, such as the high computing and energy demands of large models, ethical concerns about data bias and misinformation, and worries about privacy, reliability, and responsible use of AI in real settings. This review brings together ideas about model design, training methods, and social implications to point future research toward GenAI systems that are efficient, easy to interpret, and reliable, while supporting scientific progress and ethical responsibility. Full article
(This article belongs to the Special Issue Multimodal Deep Learning and Its Applications)

42 pages, 1779 KB  
Article
Uncertainty-First Forecasting of the South African Equity Market Using Deep Learning and Temporal Conformal Prediction
by Phumudzo Lloyd Seabe, Claude Rodrigue Bambe Moutsinga and Maggie Aphane
Big Data Cogn. Comput. 2026, 10(3), 93; https://doi.org/10.3390/bdcc10030093 - 20 Mar 2026
Abstract
Accurate forecasting of equity returns remains fundamentally constrained by weak short-horizon predictability, pronounced noise, and structural non-stationarity. While deep learning models have been widely applied to financial time series, most studies prioritize point prediction and provide limited guidance on reliable uncertainty quantification, particularly in emerging markets. This study developed an uncertainty-aware forecasting framework for the South African equity market by integrating variational mode decomposition (VMD), gated recurrent units (GRUs), and temporal conformal prediction (TCP) to construct distribution-free prediction intervals with finite-sample coverage guarantees. Using daily returns from the FTSE/JSE All Share Index, we first confirmed that baseline recurrent models applied directly to raw returns exhibited negligible out-of-sample explanatory power, consistent with weak-form market efficiency. Incorporating VMD enhanced representation learning and improved point forecast accuracy by isolating latent frequency components. However, model-based predictive variance alone proved insufficient for reliable calibration. Embedding the models within a rolling conformal prediction framework restored near-nominal coverage across multiple confidence levels while allowing interval widths to adapt dynamically to changing volatility regimes. Robustness analyses, including walk-forward validation, stress-regime evaluation, and block permutation negative control experiments, indicated that the observed performance was not driven by temporal leakage or alignment artifacts. The results further highlight a trade-off between interval sharpness and tail-risk protection, particularly during extreme market events. Overall, the findings support a shift from return-level prediction toward calibrated uncertainty estimation as a more stable and economically meaningful objective in non-stationary financial environments. Full article
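The paper's TCP construction has its own details; the generic split-conformal step it builds on can be illustrated in a few lines. Given a rolling window of calibration residuals, a finite-sample-corrected quantile of their absolute values yields a distribution-free interval (symbols and defaults here are generic, not the paper's):

```python
import numpy as np

def conformal_interval(point_forecast, residuals, alpha=0.1):
    """Split-conformal prediction interval from a window of calibration
    residuals; marginal coverage ~ (1 - alpha) in finite samples."""
    n = len(residuals)
    # Finite-sample-corrected quantile level ceil((n+1)(1-alpha))/n, capped at 1
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.abs(residuals), q_level)
    return point_forecast - q, point_forecast + q
```

Re-running this on a rolling window at each step is what lets interval widths adapt to changing volatility regimes, as the abstract describes.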

37 pages, 3831 KB  
Article
A Hybrid NER–Sentiment Model for Uzbek Texts: Integrating Lexical, Deep Learning, and Entity-Based Approaches
by Bobur Saidov, Vladimir Barakhnin, Rakhmon Saparbaev, Zayniddin Narmuratov, Rustamova Manzura, Ruzmetova Zilolakhon and Anorgul Atajanova
Big Data Cogn. Comput. 2026, 10(3), 92; https://doi.org/10.3390/bdcc10030092 - 19 Mar 2026
Abstract
This work proposes a hybrid Uzbek sentiment analysis model (sometimes referred to as tonality analysis in the local literature) that integrates contextual text representations with named-entity information from an NER module and emoji-based emotional cues that are common in short online messages. To provide a comprehensive baseline comparison, we evaluate seven approaches—SVM, LSTM, mBERT, XLM-RoBERTa-base, mDeBERTa-v3, LaBSE, and the proposed hybrid model—covering both classical machine learning and modern multilingual transformer architectures for low-resource sentiment tasks. The overall pipeline begins with Uzbek-specific text normalization to reduce noise from informal spellings, transliteration variants, and inconsistent apostrophe usage. In parallel, the system performs explicit emoji extraction to capture affective signals that are often expressed non-verbally in social media texts. Next, we construct three complementary feature streams: a context encoder for sentence-level semantics, NER-driven entity features that encode entity mentions and types, and an emotion module that models emoji priors and their interaction with contextual meaning. These streams are fused into a unified representation and fed to a final classifier to predict sentiment polarity. Experiments on an Uzbek test set demonstrate that the hybrid model reaches an F1-score of 0.92, consistently outperforming text-only baselines. The results indicate that entity-aware and emoji-informed features improve robustness under sarcasm/irony, mixed sentiment with multiple targets, and orthographic noise, making the approach suitable for social media analytics, public opinion monitoring, customer feedback triage, and recommendation-oriented text mining. Full article
(This article belongs to the Section Data Mining and Machine Learning)

19 pages, 1537 KB  
Article
Data-Driven Cognitive Early Warning for Goaf Spontaneous Combustion: An Edge-Deployed RBF Network with Real-Time Multisensor Analytics
by Gang Cheng, Hailin Pei, Xiaokang Chen, Xiaorong Pang and Renzheng Sun
Big Data Cogn. Comput. 2026, 10(3), 91; https://doi.org/10.3390/bdcc10030091 - 19 Mar 2026
Abstract
Spontaneous combustion in goaf areas poses a significant threat to coal mine safety. Traditional safety management systems, reliant on passive response and single-indicator thresholds, often suffer from delayed warnings and lack cognitive decision support. To address this challenge, this study proposes a big-data-driven cognitive computing framework for dynamic risk prediction of goaf spontaneous combustion, based on a “Cloud-Edge-End” collaborative architecture. The method leverages multi-sensor big data streams (CO, C2H4, O2, etc.) and deploys a lightweight Radial Basis Function (RBF) neural network on underground edge computing nodes (STM32) for real-time analytics. The model demonstrates excellent predictive performance on imbalanced datasets, with a PR-AUC of 0.910 and a recall of 99.7%. The edge-deployed RBF model achieves a single-pass inference time of only 0.62 ms, enabling real-time cognitive risk mapping. Field application at Z Coal Mine validated the system’s effectiveness, providing an average pre-warning time of 48.5 h, achieving zero spontaneous combustion accidents, and reducing the Total Recordable Injury Rate (TRIR) by 15.2%. This work illustrates how edge-based cognitive computing can transform safety management from passive response to proactive prevention, offering a scalable and interpretable framework for intelligent mine safety. Full article
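The paper's trained parameters are not given here, but the reason a Gaussian RBF network fits on an STM32-class edge node is visible from its forward pass: one distance computation and one weighted sum (shapes and values below are illustrative):

```python
import numpy as np

def rbf_forward(x, centers, widths, weights, bias=0.0):
    """Forward pass of a Gaussian RBF network: hidden activations
    exp(-||x - c_j||^2 / (2 s_j^2)), output = weighted sum + bias."""
    d2 = np.sum((centers - x) ** 2, axis=1)   # squared distance to each center
    h = np.exp(-d2 / (2.0 * widths ** 2))     # Gaussian hidden activations
    return float(h @ weights + bias)
```

On the microcontroller this reduces to a fixed number of multiply-accumulates per sensor frame, consistent with sub-millisecond inference times.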

23 pages, 5079 KB  
Article
Dual-Stream Transformer with Kalman-Based Sensor Fusion for Wearable Fall Detection
by Abheek Pradhan, Sana Alamgeer, Rakesh Suvvari, Syed Tousiful Haque and Anne H. H. Ngu
Big Data Cogn. Comput. 2026, 10(3), 90; https://doi.org/10.3390/bdcc10030090 - 17 Mar 2026
Abstract
Wearable fall detection systems face a fundamental challenge: while gyroscope data provide valuable orientation cues, naively combining raw gyroscope and accelerometer signals can degrade performance due to noise contamination. To overcome this challenge, we present a dual-stream transformer architecture that incorporates (i) Kalman-based sensor fusion to convert noisy gyroscope angular velocities into stable orientation estimates (roll, pitch, yaw), maintaining an internal state of body pose, and (ii) processing accelerometer and orientation streams in separate encoder pathways before fusion to prevent cross-modal interference. Our architecture further integrates Squeeze-and-Excitation channel attention and Temporal Attention Pooling to focus on fall-critical temporal patterns. Evaluated on the SmartFallMM dataset using 21-fold leave-one-subject-out cross-validation, the dual-stream Kalman transformer achieves 91.10% F1, outperforming single-stream Kalman transformers (89.80% F1) by 1.30% and single-stream baseline transformers (88.96% F1) by 2.14%. We further evaluate the model in real time using a watch-based SmartFall App on five participants, maintaining an average F1 score of 83% and an accuracy of 90%. These results indicate robust performance in both offline and real-world deployment settings, establishing a new state-of-the-art for inertial-measurement-unit-based fall detection on commodity smartwatch devices. Full article
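The dual-stream architecture is the paper's; the Kalman-based orientation step it relies on can be illustrated with a minimal one-angle filter that integrates the gyro rate in the predict step and corrects with a noisy accelerometer-derived angle in the update step (noise variances `q`, `r` and the time step are illustrative):

```python
def kalman_angle(gyro_rates, accel_angles, dt=0.01, q=0.001, r=0.03):
    """Minimal 1-D Kalman filter for a single orientation angle.
    Predict: integrate the gyro angular rate. Update: blend in an
    accelerometer-derived angle measurement. Illustrative sketch."""
    angle, p = accel_angles[0], 1.0
    estimates = []
    for rate, measured in zip(gyro_rates, accel_angles):
        angle += rate * dt               # predict: dead-reckon from the gyro
        p += q                           # estimate variance grows
        k = p / (p + r)                  # Kalman gain
        angle += k * (measured - angle)  # update with accelerometer angle
        p *= 1.0 - k                     # variance shrinks after update
        estimates.append(angle)
    return estimates
```

The resulting roll/pitch/yaw estimates, rather than raw angular velocities, are what the orientation stream of such a model would consume.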

26 pages, 12081 KB  
Article
DEPART: Multi-Task Interpretable Depression and Parkinson’s Disease Detection from In-the-Wild Video Data
by Elena Ryumina, Alexandr Axyonov, Mikhail Dolgushin, Dmitry Ryumin and Alexey Karpov
Big Data Cogn. Comput. 2026, 10(3), 89; https://doi.org/10.3390/bdcc10030089 - 16 Mar 2026
Abstract
Automated video-based detection of cognitive disorders can enable scalable, non-invasive health monitoring. However, existing methods focus on a single disease and provide limited interpretability, whereas real-world videos often contain co-occurring conditions. We propose a novel unified multi-task method to detect depression and Parkinson’s disease (PD) from in-the-wild video data called DEPART (DEpression and PArkinson’s Recognition Technique). It performs body region extraction, Contrastive Language-Image Pre-training (CLIP)-based visual encoding, Transformer-based temporal modeling, and prototype-aware classification with a gated fusion technique. Gradient-based attention maps are used to visualize task-specific regions that drive predictions. Experiments on the In-the-Wild Speech Medical (WSM) corpus demonstrate competitive performance: the multi-task model achieves Recall of 82.39% for depression and 78.20% for PD, compared with 87.76% and 78.20% for the best single-task models. Multi-task learning initially increases false positives for healthy persons in the PD subset, mainly due to annotation–modality mismatches, static visual content misinterpreted as motor impairments, and occasional body detection failures. After cleaning the test data, Recall for healthy individuals becomes comparable across models; the multi-task model improves Recall for both depression (from 82.39% to 87.50%) and PD (from 78.20% to 86.14%), suggesting better robustness for real-life clinical applications. Full article

21 pages, 891 KB  
Article
Unified Visual Synchrony: A Framework for Face–Gesture Coherence in Multimodal Human–AI Interaction
by Saule Kudubayeva, Yernar Seksenbayev, Aigerim Yerimbetova, Elmira Daiyrbayeva, Bakzhan Sakenov, Duman Telman and Mussa Turdalyuly
Big Data Cogn. Comput. 2026, 10(3), 88; https://doi.org/10.3390/bdcc10030088 - 12 Mar 2026
Abstract
Multimodal human–AI systems generally consider facial expressions and body motions as separate input streams, leading to disjointed interpretations and diminished emotional coherence. To overcome this issue, we offer the Engagement-Safe Expressive Alignment (ESEA) paradigm and the Unified Visual Synchrony (UVS) framework as its computational implementation. UVS models the coherence between facial expressions and gestures, offering an interpretable visual synchrony signal that can function as adaptive feedback in human–AI interactions. The framework’s key component is the Consistency Index for Affective Synchrony (CIAS), which correlates brief visual segments with scalar synchrony scores through a common latent representation. Facial and gestural signals are processed by modality-specific projection networks into a unified latent space, and CIAS is derived from the similarity and short-term temporal consistency of these latent trajectories. The synchrony index is regarded as an estimation of affective visual coherence within the ESEA paradigm. We formalize the UVS/CIAS framework and conduct a comparative experimental evaluation utilizing matched and mismatched face–gesture segments derived from rendered dialog footage. Utilizing ROC analysis, score distribution comparisons, temporal visualizations, and negative control tests, we illustrate that CIAS effectively captures structured face–gesture alignment that surpasses similarity-based baselines, while also delivering a persistent, time-resolved synchronization signal. These findings establish CIAS as a principled and interpretable feedback signal for future affect-aware, engagement-focused multimodal agents. Full article
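CIAS itself is defined in the paper; a toy stand-in for the underlying idea — per-frame cosine similarity between the two projected latent trajectories, smoothed over a short temporal window — might look like this (function name, window size, and the plain moving average are assumptions of this sketch):

```python
import numpy as np

def synchrony_score(face_latents, gesture_latents, win=3):
    """Toy synchrony index: mean cosine similarity between time-aligned
    face and gesture latent vectors, smoothed over a short window."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    sims = np.array([cos(f, g) for f, g in zip(face_latents, gesture_latents)])
    kernel = np.ones(win) / win               # short-term temporal smoothing
    smoothed = np.convolve(sims, kernel, mode="valid")
    return float(smoothed.mean())
```

Matched face–gesture segments should score higher than mismatched ones under such an index, which is the contrast the paper's ROC analysis evaluates.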

21 pages, 6001 KB  
Article
An Intelligent Evaluation Method for Slope Stability Based on a Database Integrating Real Cases and Numerical Simulations
by Junyi Jiang, Dong Li, Qingyi Yang, Zhenhua Zhang, Lei Wang, Wenru Zhao and Mingliang Chen
Big Data Cogn. Comput. 2026, 10(3), 87; https://doi.org/10.3390/bdcc10030087 - 12 Mar 2026
Abstract
Slope instability can cause severe disasters, making stability prediction essential. Machine learning has become a key tool for this purpose, as it avoids complex mechanical calculations and efficiently handles high-dimensional data. Currently, the data used in machine learning primarily originate from real-world cases. However, such cases are inherently limited in quantity and often fail to comprehensively represent all potential slope conditions. To address these limitations, this study proposes a method for constructing numerical simulation databases. Based on this, we develop a model establishment method for rapid evaluation of slope stability integrating numerical simulation with engineering cases. This study uses six characteristic parameters to assess slope stability, including unit weight γ, cohesion c, internal friction angle φ, slope angle α, slope height H, and pore pressure ratio ru. Through extensive literature mining, we established a database of 684 engineering cases. Based on statistical analysis of input parameters, a numerical simulation scheme was designed. Batch calculations were performed using MATLAB to determine simulation results. The engineering case database was then partitioned into training and testing sets for model development and validation. Subsequently, the numerical simulation database was incorporated into the training set for retesting. Results demonstrate that when considering all predictive indicators, the prediction accuracy of the GRNN-based model improved from 85% to 88.3%, while the PNN-based model showed an increase from 69% to 88.3%. This study offers new insights for optimizing numerical simulation design and enhancing machine learning performance in slope stability prediction. Full article

21 pages, 709 KB  
Article
SBT-Rec: A Structured Behavioral Tokenization Framework for LLM-Based Sequential Recommendation
by Langgao Cheng, Yanying Mao, Guowang Li and Honghui Chen
Big Data Cogn. Comput. 2026, 10(3), 86; https://doi.org/10.3390/bdcc10030086 - 10 Mar 2026
Viewed by 435
Abstract
Generative recommendation systems based on Large Language Models leverage their reasoning capabilities to capture users’ latent interests. However, aligning continuous user behavioral embeddings with the discrete semantic space of LLMs remains a challenge. Direct alignment often leads to semantic mismatch and hallucination issues. [...] Read more.
Generative recommendation systems based on Large Language Models leverage their reasoning capabilities to capture users’ latent interests. However, aligning continuous user behavioral embeddings with the discrete semantic space of LLMs remains a challenge. Direct alignment often leads to semantic mismatch and hallucination issues. Furthermore, existing methods typically rely on multi-stage training strategies to adapt to variations in feature distributions, thereby limiting training efficiency. To address the aforementioned issues, we propose SBT-Rec, a structured behavioral tokenization framework. Specifically, we first design a hierarchical discrete structure discovery module, utilizing a recursive residual quantization mechanism to decompose continuous behavioral vectors into discrete behavioral atoms to resolve modality discrepancies. Second, the multi-scale behavioral semantic reconstruction module reconstructs behavioral representations via residual superposition, thereby reducing data noise. Third, a residual-aware modality distribution aligner is introduced to transform behavioral features into input tokens compatible with the LLM via non-linear mapping. Finally, based on structured discrete representations, we propose a single-stage behavioral-semantic adaptive optimization strategy, achieving end-to-end parameter-efficient fine-tuning. Experiments on the MovieLens, LastFM, and Steam datasets demonstrate that SBT-Rec outperforms existing baseline models in terms of recommendation accuracy, training efficiency, and noise robustness. Full article
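The recursive residual quantization described above snaps a continuous vector to its nearest codeword at each level and passes the remainder down to the next, yielding one discrete "behavioral atom" per level. A minimal sketch (the two tiny codebooks and the input vector are illustrative stand-ins for learned behavioral codebooks):

```python
def residual_quantize(vec, codebooks):
    """Recursive residual quantization: at each level, pick the nearest
    codeword for the current residual and carry the remainder onward."""
    residual = list(vec)
    tokens = []
    for book in codebooks:
        idx = min(range(len(book)),
                  key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, book[i])))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return tokens, residual

books = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # coarse level
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]],  # fine level
]
tokens, rem = residual_quantize([1.08, 0.02], books)
print(tokens)  # level-wise discrete tokens, e.g. [1, 1]
```

Summing the chosen codewords reconstructs the vector up to the final residual, which is why stacking levels ("residual superposition" in the abstract) trades off token count against reconstruction error.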
(This article belongs to the Special Issue Multimodal Deep Learning and Its Applications)

25 pages, 6489 KB  
Article
A Reference Model for the Analysis and Indexing of Metaverse Recordings for Information Retrieval
by Patrick Steinert, Stefan Wagenpfeil, Ingo Frommholz and Matthias L. Hemmje
Big Data Cogn. Comput. 2026, 10(3), 85; https://doi.org/10.3390/bdcc10030085 - 9 Mar 2026
Viewed by 423
Abstract
After the peak of the recent hype wave of interest surrounding the metaverse, virtual world applications remained in areas such as gaming, VR training, simulations, and collaboration. In this context, recordings are created which subsequently evolve into extensive collections that users may wish [...] Read more.
After the peak of the recent hype wave of interest surrounding the metaverse, virtual world applications remained in areas such as gaming, VR training, simulations, and collaboration. In this context, recordings are created which subsequently evolve into extensive collections that users may wish to access, search through, and retrieve items from. In order to facilitate searchability of metaverse recordings, it is necessary to adapt content analysis and indexing techniques to the specific characteristics of these recordings. This paper presents a reference model, the Processing Framework for Metaverse Recordings (PFMR), which details the phases of structural analysis, feature extraction, data mining, and feature fusion. The objective is to facilitate efficient retrieval of metaverse content. Our evaluation, based on a prototypical implementation, demonstrates the applicability and effectiveness of PFMR. This lays the groundwork for further integration of metaverse-specific content into Multimedia Information Retrieval systems. The evaluation on the 256 Metaverse Recording dataset shows that PFMR's domain-specific adaptability and integrability allow effective metaverse recording information retrieval for metaverse-specific features such as avatar detection, dialog mining, and toxicity classification. Full article

34 pages, 5422 KB  
Article
Home-Based Telerehabilitation Through a Modular, Sensor-Integrated Virtual Monitoring System
by Zoltán Mészáros, M. A. Hannan Bin Azhar, Tasmina Islam and Soumya Kanti Manna
Big Data Cogn. Comput. 2026, 10(3), 84; https://doi.org/10.3390/bdcc10030084 - 8 Mar 2026
Viewed by 526
Abstract
Home-based telerehabilitation has expanded after COVID-19, but delivering timely guidance and monitoring exercise performance outside the clinic remains difficult. Traditional physiotherapy often relies on repeated execution of simple routines, yet clinicians have limited visibility into adherence and movement quality during unsupervised sessions. [...] Read more.
Home-based telerehabilitation has expanded after COVID-19, but delivering timely guidance and monitoring exercise performance outside the clinic remains difficult. Traditional physiotherapy often relies on repeated execution of simple routines, yet clinicians have limited visibility into adherence and movement quality during unsupervised sessions. From a systems perspective, many telerehabilitation approaches also face constraints in accessibility, bandwidth, and computational cost that can limit practical deployment. This paper presents a modular telerehabilitation framework and prototype that captures and records rehabilitation exercise sessions for asynchronous clinician review in a 3D visualisation environment. The system integrates skeletal motion capture with plantar pressure sensing, and stores sessions as portable artefacts to support replay, inspection, and downstream analysis. A connector-based architecture enables extension to additional sensors without redesigning the core application, and the design aims to support deployment under constrained home computing and networking conditions. The manuscript contributes an implementation blueprint and reference architecture for multimodal capture and replay. Clinical effectiveness, usability outcomes, and quantitative sensor accuracy benchmarking are outside the scope of this work and are identified as necessary future evaluation. Full article

44 pages, 1099 KB  
Systematic Review
Sound Event Detection in Smart Cities: A Systematic Review of Methods, Datasets, and Applications
by Giuseppe Ciaburro and Virginia Puyana-Romero
Big Data Cogn. Comput. 2026, 10(3), 83; https://doi.org/10.3390/bdcc10030083 - 8 Mar 2026
Viewed by 573
Abstract
Sound Event Detection (SED) is a growing area with vast prospects for understanding and designing the sonic fabric of smart cities. In this paper, the latest advances in SED are summarized, focusing on models, datasets, and applications from scientific papers listed on Scopus [...] Read more.
Sound Event Detection (SED) is a growing area with vast prospects for understanding and designing the sonic fabric of smart cities. In this paper, the latest advances in SED are summarized, focusing on models, datasets, and applications from scientific papers listed on Scopus and Web of Science. The paper provides a clear view of how SED is being used in smart cities, public safety, environment monitoring, and home security. It also addresses the challenges of SED, including dataset representativeness, model robustness under noisy or complex acoustic scenes, rare-event detection, and the ethics of automatic listening. Future directions are outlined as well, with a focus on self-supervised learning, multi-modal fusion, neuro-inspired approaches, and privacy-preserving analytics. Overall, the review positions SED as a key enabling technology for artificial perception in smart cities, making them safer, more secure, and more sustainable. Full article

23 pages, 2046 KB  
Article
Carbon Price Forecasting via a CNN-BiLSTM Model Integrating VMD and Classified News Sentiment
by Xiyun Yang, Han Chen, Xiangjun Li and Xiaoyu Liu
Big Data Cogn. Comput. 2026, 10(3), 82; https://doi.org/10.3390/bdcc10030082 - 6 Mar 2026
Viewed by 318
Abstract
Accurate carbon price forecasting is vital for risk management but is hindered by high volatility and sensitivity to external shocks. Existing multivariate models typically overlook unstructured news sentiment, failing to capture irrational fluctuations driven by market public opinion. To address this, this paper [...] Read more.
Accurate carbon price forecasting is vital for risk management but is hindered by high volatility and sensitivity to external shocks. Existing multivariate models typically overlook unstructured news sentiment, failing to capture irrational fluctuations driven by market public opinion. To address this, this paper proposes VBN-Net, a hybrid model integrating carbon-specific news sentiment with Variational Mode Decomposition (VMD). Two core innovations are presented: First, a multi-modal input mechanism combines structured financial data with unstructured carbon news sentiment to effectively capture policy-driven shocks. Second, a Sequential Beluga Whale Optimization strategy is designed to adaptively optimize feature engineering in steps. Unlike conventional approaches, the VBN-Net first employs VMD for denoising and frequency decomposition, and then optimizes the fusion weights of news sentiment across different frequency components derived from multi-source news. This strategy effectively overcomes the subjectivity of manual parameter selection, providing high-quality features for a fixed CNN-BiLSTM backbone. By integrating VMD-based denoising with optimized multi-source news fusion, the model achieves consistent performance improvements across multiple evaluation metrics. The empirical findings validate the effectiveness of the proposed model in enhancing forecasting performance, thereby providing a reliable analytical tool for participants in the carbon market. Full article

30 pages, 2628 KB  
Article
Predicting Bond Defaults in China: A Double-Ensemble Model Leveraging SMOTE for Class Imbalance
by Chongwen Tian and Rong Li
Big Data Cogn. Comput. 2026, 10(3), 81; https://doi.org/10.3390/bdcc10030081 - 6 Mar 2026
Viewed by 402
Abstract
This study proposes the Double-Ensemble Learning Classification with SMOTE (DELC-SMOTE), a novel hierarchical framework designed to address the critical challenge of severe class imbalance in financial bond default prediction. The model integrates the Synthetic Minority Over-sampling Technique (SMOTE) into a two-phase ensemble architecture. [...] Read more.
This study proposes the Double-Ensemble Learning Classification with SMOTE (DELC-SMOTE), a novel hierarchical framework designed to address the critical challenge of severe class imbalance in financial bond default prediction. The model integrates the Synthetic Minority Over-sampling Technique (SMOTE) into a two-phase ensemble architecture. The first phase employs introspective stacking, where six heterogeneous base learners are individually enhanced through algorithm-specific balancing and meta-learning. The second phase fuses these optimized experts via performance-weighted voting. Empirical analysis utilizes a comprehensive dataset of 10,440 Chinese corporate bonds (522 defaults, ~5% default rate) sourced from Wind and CSMAR databases. Given the high cost of both false negatives and false positives in risk assessment, the Geometric Mean (G-mean) and Specificity are employed as primary evaluation metrics. Results demonstrate that the proposed DELC-SMOTE model significantly outperforms individual base classifiers and benchmark ensemble variants, achieving a G-mean of 0.9152 and a Specificity of 0.8715 under the primary experimental setting. The model exhibits robust performance across varying imbalance ratios (2%, 10%, 20%) and strong resilience against data noise, perturbations, and outliers. These findings indicate that the synergistic integration of data-level resampling within a diversified, two-tiered ensemble structure effectively mitigates class imbalance bias and enhances predictive reliability. The framework offers a robust and generalizable tool for actionable default risk assessment in imbalanced financial datasets. Full article
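The two building blocks named above are simple to state: SMOTE synthesizes a minority-class sample by interpolating between a real sample and one of its minority-class neighbours, and the G-mean is the geometric mean of sensitivity and specificity. A minimal sketch (the feature vectors and confusion counts are toy values, not the paper's data):

```python
import math
import random

def smote_sample(x, neighbor, rng):
    """SMOTE's core step: a synthetic point on the segment between a
    minority sample and one of its minority-class neighbours."""
    u = rng.random()
    return [a + u * (b - a) for a, b in zip(x, neighbor)]

def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on defaults) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

rng = random.Random(0)
synthetic = smote_sample([1.0, 2.0], [3.0, 4.0], rng)
print(synthetic)  # lies on the segment between the two minority samples
print(round(g_mean(45, 5, 90, 10), 4))
```

The G-mean collapses toward zero if either class is neglected, which is why it is preferred over plain accuracy at a ~5% default rate.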
(This article belongs to the Section Data Mining and Machine Learning)

25 pages, 16570 KB  
Article
Effective Flow Ratio: A Novel Efficiency Metric for Heterogeneous Traffic in a Signalized Urban Intersection with Aerial Computer Vision
by Abu Anas Ibn Samad, Tanvir Ahmed and Md Nazmul Huda
Big Data Cogn. Comput. 2026, 10(3), 80; https://doi.org/10.3390/bdcc10030080 - 6 Mar 2026
Viewed by 507
Abstract
Intelligent Transportation Systems (ITS) primarily rely on flow rate and occupancy to estimate traffic states. However, in heterogeneous traffic conditions characterized by weak lane discipline and diverse vehicle classes, these conventional metrics fail to capture the true operational efficiency of signalized intersections. High [...] Read more.
Intelligent Transportation Systems (ITS) primarily rely on flow rate and occupancy to estimate traffic states. However, in heterogeneous traffic conditions characterized by weak lane discipline and diverse vehicle classes, these conventional metrics fail to capture the true operational efficiency of signalized intersections. High flow rates can mask underlying inefficiencies, while low flow rates do not necessarily indicate free-flow conditions. This paper introduces a novel computer vision-based metric, the Effective Flow Ratio (EFR), designed to quantify the actual discharge efficiency of mixed traffic. By leveraging Bird’s-Eye View (BEV) vehicle tracking using You Only Look Once version 11 (YOLOv11) and ByteTrack, EFR distinguishes between kinematic movement and effective discharge, resolving the ambiguity of “moving but not clearing” states. We analyze 21 days of continuous footage from a rooftop-mounted camera overlooking a congested intersection in Dhaka, Bangladesh, and find that EFR exhibits distinctly non-linear behavior compared to raw flow counts. Our results demonstrate that: (i) Flow rate and discharge efficiency are dynamically decoupled, evidenced by significant variance in EFR within identical flow bins; (ii) Temporal rolling correlations reveal transient regimes where traditional signal control logic would misinterpret congestion severity; and (iii) EFR provides a more robust proxy for intersection performance than occupancy or volume alone. The proposed metric offers a granular, physics-informed input for next-generation adaptive traffic signal control in developing urban environments. Full article
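The paper's exact EFR formula is not given in the abstract; one plausible reading of "moving but not clearing" is the fraction of tracked vehicles in motion during a green phase that actually discharge through the stop line. A hypothetical sketch under that assumption (the function and the track IDs are illustrative, not the authors' definition):

```python
def effective_flow_ratio(moved_ids, cleared_ids):
    """Hypothetical formalization: of the vehicle tracks that were moving
    during a phase, the fraction that actually cleared the intersection."""
    moved = set(moved_ids)
    if not moved:
        return 0.0
    return len(moved & set(cleared_ids)) / len(moved)

# 12 tracked vehicles moved during the green phase, but only 8 crossed
# the stop line; the rest crept forward without clearing.
efr = effective_flow_ratio(range(12), range(8))
print(round(efr, 4))
```

A ratio like this is dimensionless and bounded in [0, 1], which is consistent with the abstract's observation that high raw flow can coexist with low discharge efficiency.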
(This article belongs to the Special Issue AI, Computer Vision and Human–Robot Interaction)

24 pages, 504 KB  
Article
Feasibility Study of CUDA-Accelerated Homomorphic Encryption and Benchmarking on Consumer-Grade and Embedded GPUs
by Volodymyr Dubetskyy and Maria-Dolores Cano
Big Data Cogn. Comput. 2026, 10(3), 79; https://doi.org/10.3390/bdcc10030079 - 6 Mar 2026
Viewed by 636
Abstract
Fully Homomorphic Encryption (FHE) provides strong data confidentiality during computation but often suffers from high latency on Central Processing Units (CPUs). This study evaluates Graphics Processing Unit (GPU) acceleration for modern FHE libraries across a laptop (NVIDIA GTX 1650 Ti), a server (NVIDIA [...] Read more.
Fully Homomorphic Encryption (FHE) provides strong data confidentiality during computation but often suffers from high latency on Central Processing Units (CPUs). This study evaluates Graphics Processing Unit (GPU) acceleration for modern FHE libraries across a laptop (NVIDIA GTX 1650 Ti), a server (NVIDIA RTX 4060), and a Jetson Nano 2 GB embedded GPU. We benchmark key generation, arithmetic operations, Boolean-gate evaluation and scheme-specific tasks such as relinearization and key switching, using library-provided benchmarks with an explicit baseline (operation scope, timing boundaries, and parameter tuples). Moreover, we compare GPU-native libraries (NuFHE, Phantom-FHE, and Troy-Nova) with CPU-oriented ones (Microsoft SEAL, HElib, OpenFHE, Cupcake, and TFHE-rs). Results show GPUs deliver significant speedups for targeted operations. For example, NuFHE’s NVIDIA CUDA (Compute Unified Device Architecture) backend achieves about 1.4× faster Boolean-gate evaluation on the laptop and 3.4× faster on the server compared to its OpenCL backend. Likewise, RLWE (Ring Learning With Errors)-based schemes (BFV, CKKS, and BGV) see marked gains for polynomial arithmetic such as Number Theoretic Transform (NTT) when executed via Phantom-FHE. However, attempts to add CUDA support to Microsoft SEAL reveal four main challenges: high-precision modular arithmetic on GPUs, sequential dependencies in SEAL’s design, limited GPU memory and complex build-system changes. In light of these findings, we propose revised guidelines for GPU-first FHE libraries and practical recommendations for deploying high-throughput, privacy-preserving solutions on modern GPUs. Full article
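The NTT mentioned above is a discrete Fourier transform over the integers modulo a prime, which is what lets RLWE polynomial multiplication become cheap pointwise products and parallelize well on GPUs. A naive O(n²) reference version over a toy modulus (P=17, N=4, and the root G=4 are illustrative; production FHE parameters are vastly larger):

```python
P, N, G = 17, 4, 4  # toy prime, transform size, and an N-th root of unity mod P

def ntt(vec, root):
    """Naive O(n^2) Number Theoretic Transform: a DFT over Z_P."""
    return [sum(v * pow(root, j * k, P) for j, v in enumerate(vec)) % P
            for k in range(N)]

def intt(vec, root):
    """Inverse NTT: transform with the inverse root, then scale by 1/N."""
    inv_n = pow(N, P - 2, P)        # modular inverses via Fermat's little theorem
    inv_root = pow(root, P - 2, P)
    return [(inv_n * s) % P for s in ntt(vec, inv_root)]

# Cyclic polynomial multiplication via transform -> pointwise product -> inverse:
a, b = [1, 2, 0, 0], [3, 4, 0, 0]
A, B = ntt(a, G), ntt(b, G)
prod = intt([(x * y) % P for x, y in zip(A, B)], G)
print(prod)  # cyclic convolution of a and b mod 17
```

Each output coefficient of the forward transform is independent, which is exactly the structure CUDA kernels exploit; fast libraries replace the O(n²) loops with an O(n log n) butterfly network.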
(This article belongs to the Section Big Data)

22 pages, 1359 KB  
Article
Kernel VICReg for Self-Supervised Learning in Reproducing Kernel Hilbert Space
by M. Hadi Sepanj, Benyamin Ghojogh, Saed Moradi and Paul Fieguth
Big Data Cogn. Comput. 2026, 10(3), 78; https://doi.org/10.3390/bdcc10030078 - 5 Mar 2026
Viewed by 381
Abstract
Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives, such as invariance to augmentations, variance preservation, and feature decorrelation, without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear [...] Read more.
Self-supervised learning (SSL) has emerged as a powerful paradigm for representation learning by optimizing geometric objectives, such as invariance to augmentations, variance preservation, and feature decorrelation, without requiring labels. However, most existing methods operate in Euclidean space, limiting their ability to capture nonlinear dependencies and geometric structures. In this work, we propose Kernel VICReg, a novel self-supervised learning framework that pulls the VICReg objective into a Reproducing Kernel Hilbert Space (RKHS). By kernelizing each term of the loss (variance, invariance, and covariance), we obtain a general formulation that operates on double-centered kernel matrices and Hilbert–Schmidt norms, enabling nonlinear feature learning without explicit mappings. We demonstrate that Kernel VICReg mitigates the risk of representational collapse under challenging conditions and improves performance on datasets exhibiting nonlinear structure or limited sample regimes. Empirical evaluations across MNIST, CIFAR-10, STL-10, TinyImageNet, and ImageNet100 show consistent gains over Euclidean VICReg, with particularly strong improvements on datasets where nonlinear structures are prominent. UMAP visualizations are provided only as a qualitative illustration of embedding geometry and are not used as a calibration or statistical validation. Our results suggest that kernelizing SSL objectives is a promising direction for bridging classical kernel methods with modern representation learning. Full article
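The double-centered kernel matrices at the core of the formulation come from applying the centering map H = I - (1/n)11^T on both sides of the Gram matrix; equivalently, subtract each row mean and column mean and add back the grand mean. A minimal sketch (the 2x2 Gram matrix is illustrative):

```python
def double_center(K):
    """Double-centre a Gram matrix: K_c = H K H with H = I - (1/n) 11^T,
    i.e. subtract row means and column means, then add the grand mean."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]

K = [[2.0, 1.0], [1.0, 2.0]]
Kc = double_center(K)
print(Kc)
# every row and every column of a double-centred matrix sums to zero
```

Double-centering is what makes kernel statistics (such as Hilbert-Schmidt norms of centred kernels) behave like covariances of the implicit feature map without ever computing that map explicitly.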
(This article belongs to the Section Artificial Intelligence and Multi-Agent Systems)

24 pages, 2254 KB  
Article
Exploring Public Health Perspectives on Travel Behavior Using a Machine Learning Approach: Thailand Case Study
by Manlika Seefong, Panuwat Wisutwattanasak, Kestsirin Theerathitichaipa, Pattarawadee Prasomsab, Nisa Dackuntod, Thanapong Champahom and Rattanaporn Kasemsri
Big Data Cogn. Comput. 2026, 10(3), 77; https://doi.org/10.3390/bdcc10030077 - 5 Mar 2026
Viewed by 276
Abstract
Hospital transport services represent a vital alternative for addressing inequities in access to medical care, particularly in countries where public transportation systems are inadequate, such as Thailand. This approach enables equitable and widespread access to healthcare services for residents in underserved areas. The [...] Read more.
Hospital transport services represent a vital alternative for addressing inequities in access to medical care, particularly in countries where public transportation systems are inadequate, such as Thailand. This approach enables equitable and widespread access to healthcare services for residents in underserved areas. The objective of this study is to analyze the factors influencing the choice of hospital transport travel mode by comparing various machine learning algorithms. The findings reveal that the categorical boosting model outperformed the other models across all performance metrics. The model results indicate that waiting time, travel time, travel cost, and comfortability significantly influence the decision to use hospital transport services. Furthermore, demographic data analysis highlights critical factors such as age, gender, income, travel frequency, occupation, and time of travel, all of which significantly affect the choice of hospital transport service. To maximize the practical implications of this study, policy recommendations and implementation strategies are proposed to support decision-makers in promoting equitable travel options and eliminating barriers to fair access to healthcare services. Full article

34 pages, 1394 KB  
Systematic Review
A Systematic Review of Cross-Population Shifts in Medical Imaging Analysis with Deep Learning
by Aminu Musa, Rajesh Prasad, Peter Onwualu and Monica Hernandez
Big Data Cogn. Comput. 2026, 10(3), 76; https://doi.org/10.3390/bdcc10030076 - 4 Mar 2026
Viewed by 802
Abstract
Deep learning has achieved expert-level performance in medical imaging analysis. However, models often fail to generalize across patient populations due to cross-population domain shifts, distributional differences arising from demographic variability, variations in imaging protocols, scanner hardware, and differences in disease prevalence. This challenge [...] Read more.
Deep learning has achieved expert-level performance in medical imaging analysis. However, models often fail to generalize across patient populations due to cross-population domain shifts: distributional differences arising from demographic variability, variations in imaging protocols, scanner hardware, and differences in disease prevalence. This challenge limits real-world deployment and can increase health inequities. This review systematically examines the nature, causes, and impact of cross-population domain shift in deep learning-based medical imaging analysis. We analyzed 50 peer-reviewed studies from 2020 to 2025, evaluating the proposed methodologies for handling population shifts, the datasets employed, and the metrics used to assess performance. Our findings demonstrate that performance degradation ranged from 10% to 25% when models were tested on unseen populations, emphasizing the substantial impact of domain shifts on model generalizability. The literature reveals that mitigation strategies broadly fall into two categories: data-centric approaches, such as augmentation and harmonization, and model-centric approaches, including domain adaptation, transfer learning, adversarial learning, multi-task learning, and continual learning. While domain adaptation and transfer learning are the most widely used, their performance gains across populations remain modest, ranging from 5% to 15%, and are not supported by external validation. Our synthesis reveals a significant reliance on large, publicly available datasets from limited regions, with an underrepresentation of data from low- and middle-income countries. Evaluation practices are inconsistent, with few studies employing standardized external test sets. This review provides a structured taxonomy of mitigation techniques, a refined analysis of domain shift characteristics, and an in-depth critique of methodological challenges. We highlight the urgent need for more geographically and demographically inclusive datasets, adaptable modeling techniques, and standardized evaluation protocols to enable accurate and equitable AI-driven diagnostics across diverse populations. Finally, we outline future research directions to guide the development of robust, generalizable, and fair models for medical imaging analysis. Full article

16 pages, 36949 KB  
Article
Evaluating Architecture Scalability and Transfer Learning in Urban Scene Segmentation Using Explainable AI
by Tanmay Sunil Hatkar, Abhinav Pandey and Saad B. Ahmed
Big Data Cogn. Comput. 2026, 10(3), 75; https://doi.org/10.3390/bdcc10030075 - 1 Mar 2026
Viewed by 310
Abstract
Semantic segmentation plays a pivotal role in autonomous driving, enabling pixel-level understanding of road scenes. Although transformer-based models such as SegFormer have shown exceptional performance on large datasets, their generalization to smaller and geographically diverse datasets remains underexplored. In this work, we analyze [...] Read more.
Semantic segmentation plays a pivotal role in autonomous driving, enabling pixel-level understanding of road scenes. Although transformer-based models such as SegFormer have shown exceptional performance on large datasets, their generalization to smaller and geographically diverse datasets remains underexplored. In this work, we analyze the scalability and transferability of SegFormer variants (B3, B4, B5) using CamVid as the base dataset. We perform cross-dataset transfer learning to KITTI and IDD, evaluate class-level performance, and explore explainable AI via confidence heatmaps. Our findings show that SegFormer-B5 achieves the highest accuracy (82.4% mIoU) on CamVid, while transfer learning from CamVid improves mIoU on KITTI by 2.57% and enhances class-specific predictions in IDD by over 70%. These results highlight the practical potential of SegFormer in real-world segmentation systems and the interpretability benefits of confidence-based visual analysis. Full article
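The mIoU figures quoted above follow the standard definition: per-class intersection over union computed from the confusion matrix, then averaged over classes. A minimal sketch with an illustrative 2-class confusion matrix (not the paper's results):

```python
def mean_iou(conf):
    """Mean Intersection-over-Union from a confusion matrix:
    per class, IoU = TP / (TP + FP + FN), averaged over present classes."""
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(conf[c]) - tp                       # truly c, predicted other
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

# Toy 2-class confusion matrix (rows = ground truth, cols = prediction)
conf = [[8, 2],
        [1, 9]]
print(round(mean_iou(conf), 4))
```

Because every class contributes equally regardless of pixel count, mIoU is sensitive to rare classes, which is why the transfer-learning gains on rare classes reported above move the aggregate score.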

34 pages, 463 KB  
Article
Data-Driven Ergonomic Load Dynamics for Human–Autonomy Teams
by Nikitas Gerolimos, Vasileios Alevizos and Georgios Priniotakis
Big Data Cogn. Comput. 2026, 10(3), 74; https://doi.org/10.3390/bdcc10030074 - 28 Feb 2026
Viewed by 335
Abstract
Ergonomic load in human–autonomy teams is commonly treated as a static score or a post-hoc audit, even though modern sensing and communication enable real-time regulation of operator effort. We model ergonomic load as a dissipative dynamical state inferred online from multimodal effort proxies [...] Read more.
Ergonomic load in human–autonomy teams is commonly treated as a static score or a post-hoc audit, even though modern sensing and communication enable real-time regulation of operator effort. We model ergonomic load as a dissipative dynamical state inferred online from multimodal effort proxies and task context, and couple it to autonomy through load-dependent gain moderation and compliance shaping. The method is evaluated on public human–swarm and human–robot interaction traces together with effort-proximal wearable and myographic datasets using a unified, windowed pipeline and controlled stress tests that emulate latency, downsampling, packet loss, and channel dropouts. On a large human–swarm benchmark, the estimator achieves strong discrimination and calibration for rare high-load events (up to AUROC 0.87, AUPRC 0.41, ECE 0.031 at q=0.90) and degrades predictably under delay, with a knee around 300–400 ms (AUROC 0.87→0.80, ECE 0.031→0.061 at 500 ms). Embedding the estimate in the adaptation schedule reduces overload incidence and oscillatory redistribution while preserving coordination proxies in surrogate closed-loop simulation: overload time drops from 7.8% to 4.1% (a relative reduction of ≈47%) with throughput maintained near baseline (1.00→0.97) and oscillation power reduced (0.26→0.14) under nominal timing. These results provide a reproducible pathway for making ergonomics a control-relevant feedback signal, together with explicit operational constraints on estimator calibration (target ECE ≤ 0.05) and end-to-end latency (effective τ ≤ 300 ms) required to avoid regime switching and maintain stable, interpretable adaptation. Full article
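The ECE values reported above follow the usual binned estimator: group predictions by confidence, then average the per-bin gap between empirical accuracy and mean confidence, weighted by bin mass. A minimal sketch (the bin count and toy predictions are illustrative):

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Expected Calibration Error: bin predictions by confidence and
    average the |accuracy - confidence| gap, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into the top bin
        bins[idx].append((p, y))
    ece, n = 0.0, len(probs)
    for b in bins:
        if b:
            conf = sum(p for p, _ in b) / len(b)
            acc = sum(y for _, y in b) / len(b)
            ece += (len(b) / n) * abs(acc - conf)
    return ece

# Perfectly calibrated toy predictions give an ECE of zero:
probs = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
labels = [1, 0, 0, 0, 1, 1, 1, 0]
print(expected_calibration_error(probs, labels, n_bins=4))
```

A calibration target such as ECE ≤ 0.05 then simply bounds this weighted gap, so downstream controllers can trust the estimator's probabilities as rates.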

20 pages, 1419 KB  
Article
Building Prototype Evolution Pathway for Emotion Recognition in User-Generated Videos
by Yujie Liu, Zhenyang Dong, Yante Li and Guoying Zhao
Big Data Cogn. Comput. 2026, 10(3), 73; https://doi.org/10.3390/bdcc10030073 - 28 Feb 2026
Viewed by 429
Abstract
Large-scale pretrained foundation models are increasingly essential for affective analysis in user-generated videos. However, current approaches typically reuse generic multi-modal representations directly with task-specific adapters learned from scratch, and their performance is limited by the large affective domain gap and scarce emotion annotations. [...] Read more.
Large-scale pretrained foundation models are increasingly essential for affective analysis in user-generated videos. However, current approaches typically reuse generic multi-modal representations directly with task-specific adapters learned from scratch, and their performance is limited by the large affective domain gap and scarce emotion annotations. To address these issues, we introduce a novel paradigm that leverages auxiliary cross-modal priors to enhance unimodal emotion modeling, effectively exploiting modality-shared semantics and modality-specific inductive biases. Specifically, we propose a progressive prototype evolution framework that gradually transforms a neutral prototype into discriminative emotional representations through fine-grained cross-modal interactions with visual cues. The auxiliary prior serves as a structural constraint, reframing the adaptation challenge from a difficult domain shift problem into a more tractable prototype shift within the affective space. To ensure robust prototype construction and guided evolution, we further design category-aggregated prompting and bidirectional supervision mechanisms. Extensive experiments on VideoEmotion-8, Ekman-6, and MusicVideo-6 validate the superiority of our approach, achieving state-of-the-art results and demonstrating the effectiveness of leveraging auxiliary modality priors for foundation-model-based emotion recognition. Full article
(This article belongs to the Special Issue Sentiment Analysis in the Context of Big Data)

24 pages, 3755 KB  
Article
Automating Data Product Discovery with Large Language Models and Metadata Reasoning
by Michalis Pingos, Artemis Photiou and Andreas S. Andreou
Big Data Cogn. Comput. 2026, 10(3), 72; https://doi.org/10.3390/bdcc10030072 - 28 Feb 2026
Viewed by 1448
Abstract
The exponential growth of data over the past decade has created new challenges in transforming raw information into actionable knowledge, particularly through the development of data products. The latter is essentially the result of querying and retrieving specific portions of data from a [...] Read more.
The exponential growth of data over the past decade has created new challenges in transforming raw information into actionable knowledge, particularly through the development of data products. The latter is essentially the result of querying and retrieving specific portions of data from a data storage architecture at various levels of granularity. Traditionally, this transformation depends on domain experts manually analyzing datasets and providing feedback to effectively describe or annotate data in ways that facilitate retrieval. Nevertheless, this is a time-consuming process, which highlights the need for automation. To address this challenge, the present paper proposes a framework which utilizes Large Language Models to support data product discovery through semantic metadata reasoning and executable query prototyping. The framework is evaluated across two domains and three levels of concept complexity to assess the LLM’s ability to identify relevant datasets and generate executable data product queries under varying analytical demands. The findings indicate that LLMs perform effectively in simpler scenarios, but their performance declines as conceptual complexity and dataset volume increase. Full article
27 pages, 6857 KB  
Article
A Convergent Method for Energy Optimization in Modern Hopfield Networks
by Yida Bao, Mohammad Arifuzzaman, Tran Duc Le, Tao Jiang, Jing Hou, Yuan Xing and Dongfang Hou
Big Data Cogn. Comput. 2026, 10(3), 71; https://doi.org/10.3390/bdcc10030071 - 28 Feb 2026
Viewed by 314
Abstract
Modern Hopfield networks are energy-based associative memory models whose performance critically depends on the structure and optimization of their energy functions. While recent formulations substantially improve storage capacity, the resulting non-convex energy landscapes are often optimized using heuristic update rules that can be sensitive to initialization and may not provide monotonic energy descent or rigorous convergence guarantees. In this work, we propose a new energy formulation for modern Hopfield networks together with a principled iterative optimization scheme. The proposed energy admits a natural decomposition that allows optimization via the concave–convex procedure (CCCP), yielding well-defined network dynamics with guaranteed energy descent beyond classical Hopfield updates. We establish fundamental theoretical properties of the proposed framework, including non-negativity, boundedness, and monotonic decrease of the energy across iterations. In particular, we prove that the induced dynamics converge to a stationary point of the energy function, providing explicit convergence guarantees for the resulting Hopfield-type model. We further evaluate the proposed approach on synthetic classification tasks and compare its optimization behavior with that of the original Hopfield network and several standard machine learning baselines. Experimental results demonstrate improved stability, convergence behavior, and competitive classification performance. We also validate the approach on real-world benchmark datasets to demonstrate utility beyond controlled experiments. Overall, this work provides a theoretically grounded energy-based optimization framework for modern Hopfield networks, clarifying the role of principled optimization in achieving stable and convergent associative memory dynamics.
(This article belongs to the Special Issue Application of Pattern Recognition and Machine Learning)
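The CCCP recipe the abstract relies on can be made concrete on the *standard* modern Hopfield energy (not the paper's new formulation, which the abstract does not give): E(x) = ½‖x‖² − β⁻¹ logsumexp(βXᵀx) splits into a convex quadratic plus a concave term, and linearizing the concave part at the current iterate and minimizing yields the update x ← X·softmax(βXᵀx), which provably never increases the energy.

```python
import numpy as np

def cccp_hopfield_retrieval(X, x0, beta=4.0, n_iters=50, tol=1e-8):
    """Illustrative CCCP iteration for the standard modern Hopfield energy
    E(x) = 0.5*||x||^2 - (1/beta)*logsumexp(beta * X^T x), where X holds one
    stored pattern per column. The concave logsumexp term is linearized at the
    current iterate; minimizing the resulting convex surrogate gives
    x_{t+1} = X @ softmax(beta * X^T x_t), a monotone energy-descent step."""
    def energy(x):
        s = beta * (X.T @ x)
        lse = s.max() + np.log(np.sum(np.exp(s - s.max())))  # stable logsumexp
        return 0.5 * (x @ x) - lse / beta

    x = np.asarray(x0, dtype=float).copy()
    energies = [energy(x)]
    for _ in range(n_iters):
        s = beta * (X.T @ x)
        p = np.exp(s - s.max())
        p /= p.sum()                      # softmax(beta * X^T x)
        x_next = X @ p                    # CCCP / concave-linearization step
        energies.append(energy(x_next))
        if np.linalg.norm(x_next - x) < tol:
            x = x_next
            break
        x = x_next
    return x, energies
```

Running this from a corrupted pattern produces a non-increasing energy trace, which is exactly the kind of guarantee the paper extends to its own energy formulation.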
18 pages, 2369 KB  
Article
TransGoT: Structured Graph-of-Thoughts Reasoning for Machine Translation with Large Language Models
by Danying Zhang, Yixin Liu, Jie Zhao and Cai Xu
Big Data Cogn. Comput. 2026, 10(3), 70; https://doi.org/10.3390/bdcc10030070 - 27 Feb 2026
Viewed by 498
Abstract
Machine translation with large language models has recently attracted growing attention due to its flexibility and strong zero-shot and few-shot capabilities. However, most prompt-based LLM translation methods rely on linear generation or shallow self-refinement, implicitly committing to a single reasoning path. Such designs are brittle when translating long and syntactically complex sources, where reliable translation often requires structured planning and hypothesis exploration. In this paper, we propose TransGoT, a novel machine translation framework inspired by the graph-of-thoughts paradigm, which formulates translation as a structured, multi-stage reasoning process over a graph of intermediate thoughts. TransGoT explicitly decomposes translation into constraint identification, draft generation, and culture- and style-aware refinement, enabling systematic exploration and aggregation of alternative translation hypotheses. To better adapt graph-based reasoning to translation, we design two key mechanisms: (1) Uncertainty-driven thought transformation. Unlike general reasoning tasks, translation uncertainty is often localized and unevenly distributed across tokens, making holistic regeneration inefficient; this mechanism leverages model-internal confidence signals to guide targeted token-level revision. (2) Dispersion-adaptive thought scoring, which emphasizes evaluation criteria with stronger inter-candidate variance to enable robust multi-criteria thought selection. We evaluate TransGoT on the WMT22 benchmarks, and experimental results demonstrate that TransGoT consistently outperforms strong LLM-based translation baselines, validating the effectiveness of structured graph-based reasoning for machine translation.
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
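The dispersion-adaptive scoring idea can be sketched in a few lines. The abstract does not give the exact formulation, so this is one plausible reading with invented names: given a candidate-by-criterion score matrix, weight each criterion by its inter-candidate variance, so that criteria on which all candidate translations agree contribute nothing to the final ranking.

```python
import numpy as np

def dispersion_adaptive_score(scores):
    """Toy sketch of dispersion-adaptive thought scoring (formulation assumed,
    not taken from the paper): given a (num_candidates, num_criteria) score
    matrix, weight each criterion by its variance across candidates so that
    criteria which actually discriminate between candidates dominate the
    aggregate score. Returns one aggregate score per candidate."""
    scores = np.asarray(scores, dtype=float)
    var = scores.var(axis=0)                                  # dispersion per criterion
    if var.sum() > 0:
        weights = var / var.sum()
    else:                                                     # no criterion discriminates
        weights = np.full(scores.shape[1], 1.0 / scores.shape[1])
    return scores @ weights

# Criterion 1 is identical across candidates, so only criterion 2 drives selection.
candidate_scores = [[0.9, 0.5], [0.9, 0.9], [0.9, 0.7]]
best = int(np.argmax(dispersion_adaptive_score(candidate_scores)))
```

In this toy example the second candidate wins on the only informative criterion; a uniform weighting would rank the candidates the same way here, but variance weighting prevents a flat, uninformative criterion from diluting genuinely discriminative ones.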
Show Figures

Figure 1
