Search Results (11)

Search Parameters:
Keywords = large-scale video analytics

40 pages, 1540 KB  
Review
A Survey on Video Big Data Analytics: Architecture, Technologies, and Open Research Challenges
by Thi-Thu-Trang Do, Quyet-Thang Huynh, Kyungbaek Kim and Van-Quyet Nguyen
Appl. Sci. 2025, 15(14), 8089; https://doi.org/10.3390/app15148089 - 21 Jul 2025
Viewed by 2164
Abstract
The exponential growth of video data across domains such as surveillance, transportation, and healthcare has raised critical challenges in scalability, real-time processing, and privacy preservation. While existing studies have addressed individual aspects of Video Big Data Analytics (VBDA), an integrated, up-to-date perspective remains limited. This paper presents a comprehensive survey of system architectures and enabling technologies in VBDA. It categorizes system architectures into four primary types: centralized, cloud-based, edge computing, and hybrid cloud–edge. It also analyzes key enabling technologies, including real-time streaming, scalable distributed processing, intelligent AI models, and advanced storage for managing large-scale multimodal video data. In addition, the study provides a functional taxonomy of core video processing tasks, including object detection, anomaly recognition, and semantic retrieval, and maps these tasks to real-world applications. Based on the survey findings, the paper proposes ViMindXAI, a hybrid AI-driven platform that combines edge and cloud orchestration, adaptive storage, and privacy-aware learning to support scalable and trustworthy video analytics. Our analysis highlights emerging trends such as the shift toward hybrid cloud–edge architectures, the growing importance of explainable AI and federated learning, and the urgent need for secure and efficient video data management. These findings point to key directions for designing next-generation VBDA platforms that enhance real-time, data-driven decision-making in domains such as public safety, transportation, and healthcare, where such platforms facilitate timely insights, rapid response, and regulatory alignment through scalable and explainable analytics. This work provides a robust conceptual foundation for future research on adaptive and efficient decision-support systems in video-intensive environments.

24 pages, 19550 KB  
Article
TMTS: A Physics-Based Turbulence Mitigation Network Guided by Turbulence Signatures for Satellite Video
by Jie Yin, Tao Sun, Xiao Zhang, Guorong Zhang, Xue Wan and Jianjun He
Remote Sens. 2025, 17(14), 2422; https://doi.org/10.3390/rs17142422 - 12 Jul 2025
Viewed by 506
Abstract
Atmospheric turbulence severely degrades high-resolution satellite videos through spatiotemporally coupled distortions, including temporal jitter, spatially variant blur, deformation, and scintillation, thereby constraining downstream analytical capabilities. Restoring turbulence-corrupted videos poses a challenging ill-posed inverse problem due to the inherent randomness of turbulent fluctuations. While existing turbulence mitigation methods for long-range imaging demonstrate partial success, they exhibit limited generalizability and interpretability in large-scale satellite scenarios. Inspired by refractive-index structure constant (Cn²) estimation from degraded sequences, we propose a physics-informed turbulence signature (TS) prior that explicitly captures spatiotemporal distortion patterns to enhance model transparency. Integrating this prior into a lucky imaging framework, we develop a Physics-Based Turbulence Mitigation Network guided by the Turbulence Signature (TMTS) to disentangle atmospheric disturbances from satellite videos. The framework employs deformable attention modules guided by turbulence signatures to correct geometric distortions, iterative gated mechanisms for temporal alignment stability, and adaptive multi-frame aggregation to address spatially varying blur. Comprehensive experiments on synthetic and real-world turbulence-degraded satellite videos demonstrate TMTS's superiority, achieving 0.27 dB PSNR and 0.0015 SSIM improvements over the DATUM baseline while maintaining practical computational efficiency. By bridging turbulence physics with deep learning, our approach provides both performance gains and interpretable restoration mechanisms, offering a viable solution for operational satellite video processing under atmospheric disturbances.
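
As a loose illustration of a turbulence-signature-style prior (not the authors' Cn²-based formulation), per-pixel temporal statistics of a degraded frame stack can serve as crude proxies for scintillation and jitter:

```python
import numpy as np

def turbulence_signature(frames: np.ndarray) -> np.ndarray:
    """Per-pixel temporal statistics of a grayscale frame stack (T, H, W).

    Returns a 2-channel map: temporal intensity variance (a scintillation/
    blur proxy) and mean absolute frame-to-frame difference (a jitter proxy).
    """
    var_map = frames.var(axis=0)
    jitter_map = np.abs(np.diff(frames, axis=0)).mean(axis=0)
    return np.stack([var_map, jitter_map], axis=0)

# Toy usage: 16 frames of 64x64 noise standing in for a degraded sequence.
frames = np.random.rand(16, 64, 64).astype(np.float32)
ts = turbulence_signature(frames)
print(ts.shape)  # (2, 64, 64)
```

Such a map could then condition a restoration network on where distortions concentrate, which is the role the paper assigns to its TS prior.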

16 pages, 13461 KB  
Article
Wi-Filter: WiFi-Assisted Frame Filtering on the Edge for Scalable and Resource-Efficient Video Analytics
by Lawrence Lubwama, Jungik Jang, Jisung Pyo, Joon Yoo and Jaehyuk Choi
Sensors 2025, 25(3), 701; https://doi.org/10.3390/s25030701 - 24 Jan 2025
Viewed by 1154
Abstract
With the growing prevalence of large-scale intelligent surveillance camera systems, the burden on real-time video analytics pipelines has increased significantly due to continuous video transmission from numerous cameras. To mitigate this strain, recent approaches filter out irrelevant video frames early in the pipeline, at the camera or edge device level. In this paper, we propose Wi-Filter (Wi-Fi-assisted Filter), a filtering method that leverages Wi-Fi signals from wireless edge devices, such as Wi-Fi-enabled cameras, to optimize filtering decisions dynamically. Wi-Filter utilizes channel state information (CSI), readily available from these wireless cameras, to detect human motion within the field of view and adjusts the filtering threshold accordingly. The motion-sensing models in Wi-Filter are trained using a self-supervised approach, in which CSI data are automatically annotated via synchronized camera feeds. We demonstrate the effectiveness of Wi-Filter through real-world experiments and a prototype implementation. Wi-Filter achieves motion detection accuracy exceeding 97.2% and reduces false positive rates by up to 60% while maintaining a high detection rate, even in challenging environments, showing its potential to enhance the efficiency of video analytics pipelines.
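
A minimal sketch of the core idea, assuming a simple CSI amplitude-variance threshold stands in for Wi-Filter's learned motion-sensing model (the function and threshold here are illustrative, not the paper's):

```python
import numpy as np

def motion_detected(csi_window: np.ndarray, threshold: float) -> bool:
    """Flag motion when CSI amplitude variance over a time window exceeds
    a threshold. csi_window: (time, subcarriers) CSI samples.

    Human motion perturbs multipath propagation, inflating the temporal
    variance of per-subcarrier amplitudes.
    """
    amp = np.abs(csi_window)          # amplitude per subcarrier
    score = amp.var(axis=0).mean()    # average temporal variance
    return score > threshold

# Toy usage: a static channel vs. a channel with simulated motion.
rng = np.random.default_rng(0)
static = np.ones((100, 56)) + 0.01 * rng.standard_normal((100, 56))
moving = static + 0.3 * np.sin(np.linspace(0, 20, 100))[:, None]
print(motion_detected(static, 0.005), motion_detected(moving, 0.005))
# False True
```

In the paper, this decision gates which frames are transmitted upstream, so a camera whose field of view is static sends almost nothing.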

17 pages, 4619 KB  
Article
Efficient Video Compression Using Afterimage Representation
by Minseong Jeon and Kyungjoo Cheoi
Sensors 2024, 24(22), 7398; https://doi.org/10.3390/s24227398 - 20 Nov 2024
Viewed by 1720
Abstract
The rapid growth of large-scale video data has highlighted the need for efficient compression techniques to enhance video processing performance. In this paper, we propose an afterimage-based video compression method that significantly reduces video data volume while maintaining analytical performance. The proposed approach utilizes optical flow to adaptively select the number of keyframes based on scene complexity, optimizing compression efficiency. Additionally, object movement masks extracted from keyframes are accumulated over time using alpha blending to generate the final afterimage. Experiments on the UCF-Crime dataset demonstrated that the proposed method achieves a 95.97% compression ratio. In binary classification experiments on normal/abnormal behaviors, the compressed videos maintained performance comparable to the original videos, while in multi-class classification they outperformed the originals. Notably, classification experiments focused exclusively on abnormal behaviors showed a significant 4.25% performance improvement. Moreover, further experiments showed that large language models (LLMs) can interpret the temporal context of original videos from single afterimages. These findings confirm that the proposed afterimage-based compression technique effectively preserves spatiotemporal information while significantly reducing data size.
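
A minimal sketch of the accumulation step, assuming simple exponential alpha blending of per-keyframe motion masks (the optical-flow-based keyframe selection is omitted, and the parameters are illustrative):

```python
import numpy as np

def build_afterimage(masks: list[np.ndarray], alpha: float = 0.2) -> np.ndarray:
    """Accumulate per-keyframe motion masks into one afterimage by repeated
    alpha blending, so older motion fades and recent motion dominates.
    masks: list of (H, W) float arrays in [0, 1], oldest first.
    """
    after = np.zeros_like(masks[0], dtype=np.float32)
    for m in masks:
        after = (1.0 - alpha) * after + alpha * m.astype(np.float32)
    return after

# Toy usage: a 4-pixel blob "moving" across five 32x32 keyframe masks.
masks = []
for t in range(5):
    m = np.zeros((32, 32), dtype=np.float32)
    m[10:14, 4 * t:4 * t + 4] = 1.0
    masks.append(m)
afterimage = build_afterimage(masks)
print(afterimage.max())  # the most recent position leaves the strongest trace
```

The fading trail is what lets a single image encode motion direction and recency, which is presumably what the LLM experiments exploit.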

16 pages, 8247 KB  
Article
Integration of Multi-Head Self-Attention and Convolution for Person Re-Identification
by Yalei Zhou, Peng Liu, Yue Cui, Chunguang Liu and Wenli Duan
Sensors 2022, 22(16), 6293; https://doi.org/10.3390/s22166293 - 21 Aug 2022
Cited by 10 | Viewed by 4378
Abstract
Person re-identification is essential to intelligent video analytics; its results affect downstream tasks such as behavior and event analysis. However, most existing models consider only accuracy rather than computational complexity, which also matters in practical deployment. We note that self-attention is a powerful technique for representation learning that can work with convolution to learn more discriminative feature representations for re-identification. We propose an improved multi-scale feature learning structure, DM-OSNet, with better performance than the original OSNet. Our DM-OSNet replaces the 9×9 convolutional stream in OSNet with multi-head self-attention. To maintain model efficiency, we use double-layer multi-head self-attention, which reduces the computational complexity from the original O((H×W)²) to O(H×W×G²). To further improve model performance, we use SpCL to perform unsupervised pre-training on the large-scale unlabeled pedestrian dataset LUPerson. Finally, our DM-OSNet achieves mAP scores of 87.36%, 78.26%, 72.96%, and 57.13% on the Market1501, DukeMTMC-reID, CUHK03, and MSMT17 datasets, respectively.
(This article belongs to the Section Intelligent Sensors)
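
A back-of-the-envelope comparison of the two complexity classes from the abstract, with assumed feature-map dimensions (H, W), head dimension d, and group size G; this is illustrative arithmetic, not the authors' layer implementation:

```python
def attn_flops_global(H: int, W: int, d: int) -> int:
    """Pairwise-attention cost of full self-attention: O((H*W)^2 * d)."""
    n = H * W
    return n * n * d

def attn_flops_grouped(H: int, W: int, d: int, G: int) -> int:
    """Grouped / double-layer variant: each of the H*W tokens attends
    within a G*G group, giving the abstract's O(H*W*G^2) scaling (times d)."""
    return H * W * G * G * d

# Assumed 32x16 feature map, 64-dim heads, 7x7 groups:
print(attn_flops_global(32, 16, 64))      # 16,777,216
print(attn_flops_grouped(32, 16, 64, 7))  # 1,605,632 -- roughly 10x cheaper
```

The gap widens quadratically with feature-map size, which is why the grouped form is attractive for deployment-oriented re-identification.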

18 pages, 1468 KB  
Article
Inference Acceleration with Adaptive Distributed DNN Partition over Dynamic Video Stream
by Jin Cao, Bo Li, Mengni Fan and Huiyu Liu
Algorithms 2022, 15(7), 244; https://doi.org/10.3390/a15070244 - 13 Jul 2022
Cited by 1 | Viewed by 3050
Abstract
Deep neural network-based computer vision applications have proliferated and are widely used in intelligent services for IoT devices. Due to the computationally intensive nature of DNNs, the deployment and execution of intelligent applications in smart scenarios face the challenge of limited device resources. Existing job scheduling strategies are narrowly focused and offer limited support for large-scale end-device scenarios. In this paper, we present ADDP, an adaptive distributed DNN partition method that supports video analysis on large numbers of smart cameras. ADDP applies to commonly used computer vision DNN models and contains a feature-map layer partition (FLP) module supporting edge-to-end collaborative model partition and a feature-map size partition (FSP) module supporting multi-device parallel inference. Driven by an inference-delay minimization objective, FLP and FSP trade off the compute and communication resources of different devices. We validate ADDP on heterogeneous devices and show that both the FLP and FSP modules outperform existing approaches, reducing single-frame response latency by 10–25% compared to pure on-device processing.
(This article belongs to the Special Issue Deep Learning for Internet of Things)
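
A minimal sketch of what a feature-map size partition might look like, assuming strips are sized by relative device throughput (the function and ratios are hypothetical; a real FSP also needs overlapping halo rows to cover convolution receptive fields, which this omits):

```python
import numpy as np

def partition_rows(feature_map: np.ndarray, speeds: list[float]) -> list[np.ndarray]:
    """Split a (H, W, C) feature map into horizontal strips sized in
    proportion to each device's relative compute speed, so parallel
    per-strip inference finishes at roughly the same time.
    """
    H = feature_map.shape[0]
    total = sum(speeds)
    cuts, acc = [], 0.0
    for s in speeds[:-1]:
        acc += s / total * H
        cuts.append(int(round(acc)))
    return np.split(feature_map, cuts, axis=0)

# Toy usage: three devices with 4:2:1 relative throughput.
fmap = np.zeros((224, 224, 64), dtype=np.float32)
strips = partition_rows(fmap, [4.0, 2.0, 1.0])
print([s.shape[0] for s in strips])  # [128, 64, 32]
```

Balancing strip sizes against device speed is the essence of the delay-minimization tradeoff the abstract describes; the communication cost of shipping strips is the other half.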

16 pages, 4055 KB  
Article
The Usefulness of Video Learning Analytics in Small Scale E-Learning Scenarios
by César Córcoles, Germán Cobo and Ana-Elena Guerrero-Roldán
Appl. Sci. 2021, 11(21), 10366; https://doi.org/10.3390/app112110366 - 4 Nov 2021
Cited by 4 | Viewed by 2894
Abstract
A variety of tools are available to collect, process and analyse learning data obtained from the clickstream generated by students watching learning resources in video format. There is also some literature on using such data to better understand and improve the teaching-learning process. Most of it focuses on large-scale learning scenarios, such as MOOCs, where videos are watched hundreds or thousands of times. We have developed a solution to collect clickstream analytics data in smaller scenarios, much more common in primary, secondary and higher education, where videos are watched tens or hundreds of times, and have studied whether it helps teachers improve the learning process. We deployed the solution in a real scenario and collected real data. Furthermore, we processed the data, presented it visually to teachers, and collected and analysed their perception of its usefulness. We conclude that teachers perceive the collected data as useful for improving the teaching and learning process.
(This article belongs to the Collection The Application and Development of E-learning)
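
A minimal sketch of the kind of per-second watch-count aggregation such a solution might compute from player clickstream events (the event schema here is assumed, not the authors'):

```python
from collections import Counter

def watch_heatmap(events: list[dict], duration: int) -> list[int]:
    """Turn player clickstream events into per-second view counts.

    events: time-ordered dicts with 'type' ('play' or 'pause') and 'pos'
    (video position in seconds). A seek while playing is assumed to be
    logged as a pause at the old position plus a play at the new one.
    """
    counts = Counter()
    playing_from = None
    for ev in events:
        if ev["type"] == "play":
            playing_from = ev["pos"]
        elif ev["type"] == "pause" and playing_from is not None:
            for second in range(int(playing_from), int(ev["pos"])):
                counts[second] += 1
            playing_from = None
    return [counts[s] for s in range(duration)]

# One learner watches 0-20s, rewinds to 10s, watches again to 30s.
session = [
    {"type": "play", "pos": 0}, {"type": "pause", "pos": 20},
    {"type": "play", "pos": 10}, {"type": "pause", "pos": 30},
]
heat = watch_heatmap(session, 40)
print(heat[15], heat[35])  # 2 (watched twice), 0 (never reached)
```

Even with tens of viewers, rewatch spikes in such a heatmap can point teachers at confusing passages, which is the paper's small-scale use case.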

8 pages, 2364 KB  
Proceeding Paper
Computer Vision Technique for Blind Identification of Modal Frequency of Structures from Video Measurements
by Vishal Allada and Thiyagarajan Jothi Saravanan
Eng. Proc. 2021, 10(1), 12; https://doi.org/10.3390/ecsa-8-11298 - 1 Nov 2021
Cited by 3 | Viewed by 1581
Abstract
Operational modal analysis (OMA) is required for the maintenance of large-scale civil structures. This paper develops a novel methodology for non-contact blind identification of the modal frequency of a vibrating structure from its video measurement. The proposed methodology has two stages. The first stage extracts the motion data of the vibrating structure from its video using a complex steerable pyramid. In the second stage, principal component analysis combined with analytical mode decomposition separates the modal frequencies from the motion data. Numerical validation of the methodology on a 10-DOF model is presented, along with an application to the London Millennium Bridge.
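
A minimal sketch of the second stage's spirit, assuming PCA plus an FFT peak substitutes for the paper's PCA-with-analytical-mode-decomposition pipeline, and synthetic signals stand in for pyramid-extracted motion:

```python
import numpy as np

def dominant_frequency(motion: np.ndarray, fs: float) -> float:
    """Estimate the dominant modal frequency from multi-point motion data.

    motion: (T, P) per-frame displacement signals for P tracked points.
    PCA (via SVD) reduces the points to one dominant temporal signal;
    an FFT peak gives its frequency.
    """
    centered = motion - motion.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    signal = centered @ vt[0]           # projection on first component
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    return freqs[spectrum[1:].argmax() + 1]  # skip the DC bin

# Toy structure: 3 points vibrating at 2.5 Hz, sampled at 30 fps video rate.
t = np.arange(0, 10, 1 / 30)
motion = np.outer(np.sin(2 * np.pi * 2.5 * t), [1.0, 0.7, 0.4])
motion += 0.05 * np.random.default_rng(1).standard_normal(motion.shape)
print(round(dominant_frequency(motion, fs=30.0), 1))  # ~2.5
```

Analytical mode decomposition additionally separates closely spaced modes, which a single FFT peak cannot; this sketch only recovers the dominant one.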

26 pages, 872 KB  
Article
Leveraging Edge Intelligence for Video Analytics in Smart City Applications
by Aluizio Rocha Neto, Thiago P. Silva, Thais Batista, Flávia C. Delicato, Paulo F. Pires and Frederico Lopes
Information 2021, 12(1), 14; https://doi.org/10.3390/info12010014 - 31 Dec 2020
Cited by 23 | Viewed by 4884
Abstract
In smart city scenarios, the huge proliferation of monitoring cameras scattered in public spaces has posed many challenges to network and processing infrastructure: a few dozen cameras are enough to saturate a city's backbone. In addition, most smart city applications require a real-time response from the system in charge of processing such large-scale video streams. Finding a missing person using facial recognition technology is one such application, requiring immediate action at the place where that person is spotted. In this paper, we tackle these challenges by presenting a distributed system for video analytics designed to leverage edge computing capabilities. Our approach encompasses architecture, methods, and algorithms for (i) dividing the burdensome processing of large-scale video streams into various machine learning tasks and (ii) deploying these tasks as a workflow of data processing on edge devices equipped with hardware accelerators for neural networks. We also propose reusing nodes that run tasks shared by multiple applications, e.g., facial recognition, thus improving the system's processing throughput. Simulations showed that, with our workload-distribution algorithm, a workflow is processed about 33% faster than with a naive approach.
(This article belongs to the Special Issue Smart Cyberphysical Systems and Cloud–Edge Engineering)
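
A minimal sketch of workload distribution in this spirit, assuming a greedy earliest-finish placement of per-task costs onto heterogeneous edge nodes (task names and throughputs are hypothetical, not the paper's algorithm):

```python
def assign_tasks(tasks: dict[str, float], nodes: dict[str, float]) -> dict[str, str]:
    """Greedy longest-task-first placement of ML workflow tasks onto edge
    nodes: each task (cost in GFLOPs) goes to the node whose queue would
    finish earliest, given per-node throughput (GFLOP/s).
    """
    finish = {n: 0.0 for n in nodes}      # projected completion time per node
    placement = {}
    for task, cost in sorted(tasks.items(), key=lambda kv: -kv[1]):
        node = min(nodes, key=lambda n: finish[n] + cost / nodes[n])
        finish[node] += cost / nodes[node]
        placement[task] = node
    return placement

# Toy workflow echoing the paper's setting: detection, embedding, matching.
tasks = {"face_detect": 8.0, "face_embed": 4.0, "db_match": 1.0}
nodes = {"edge_tpu": 4.0, "edge_gpu": 8.0}   # throughput in GFLOP/s
print(assign_tasks(tasks, nodes))
```

The paper's node-reuse idea goes further: a detection task already running for one application serves other applications too, so its cost is paid once.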

20 pages, 1933 KB  
Article
SIAT: A Distributed Video Analytics Framework for Intelligent Video Surveillance
by Md Azher Uddin, Aftab Alam, Nguyen Anh Tu, Md Siyamul Islam and Young-Koo Lee
Symmetry 2019, 11(7), 911; https://doi.org/10.3390/sym11070911 - 12 Jul 2019
Cited by 32 | Viewed by 7944
Abstract
In recent years, the number of intelligent CCTV cameras installed in public places for surveillance has increased enormously, and as a result a large amount of video data is produced every moment. This has created an increasing demand for distributed processing of large-scale video data. In an intelligent video analytics platform, a submitted unstructured video passes through several multidisciplinary algorithms that extract insights and make them searchable and understandable for both humans and machines. Video analytics has applications ranging from surveillance to video content management, and various industrial and scholarly solutions exist in this context. However, most existing solutions rely on a traditional client/server framework to perform face and object recognition, lack support for more complex application scenarios, and rarely scale using distributed computing. Moreover, existing works provide no low-level distributed video processing APIs (Application Programming Interfaces), nor a complete service-oriented ecosystem that meets the growing demands of consumers, researchers, and developers. To overcome these issues, in this paper we propose SIAT, a distributed video analytics framework for intelligent video surveillance. The proposed framework can process both real-time video streams and batch video analytics; each real-time stream also corresponds to batch processing data, which is how this work relates to the symmetry concept. Furthermore, we introduce a distributed video processing library on top of Spark. SIAT exploits state-of-the-art distributed computing technologies to ensure scalability, effectiveness, and fault tolerance. Lastly, we implement and evaluate the proposed framework to validate our claims.
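
A minimal sketch, assuming PySpark stands in for the paper's Spark-based video processing library (the detector and data here are placeholders, not SIAT's API):

```python
# Frames are sharded across executors and a per-partition detector is
# applied, mirroring the batch side of a distributed analytics framework.
from pyspark import SparkContext

def detect_objects(frames):
    # Placeholder detector: a real pipeline would run a DNN per frame here.
    for frame_id, pixels in frames:
        yield frame_id, ("person" if sum(pixels) > 10 else "background")

sc = SparkContext("local[*]", "siat-sketch")
frames = [(i, [i % 4] * 8) for i in range(100)]   # (frame_id, fake pixels)
results = (sc.parallelize(frames, numSlices=8)    # shard across workers
             .mapPartitions(detect_objects)
             .collect())
print(results[:4])
sc.stop()
```

The streaming side would feed the same per-partition logic from a live source instead of a static list, which is the stream/batch symmetry the abstract alludes to.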

19 pages, 2846 KB  
Article
ICE-MoCha: Intelligent Crowd Engineering using Mobility Characterization and Analytics
by Abdoh Jabbari, Khalid J. Almalki, Baek-Young Choi and Sejun Song
Sensors 2019, 19(5), 1025; https://doi.org/10.3390/s19051025 - 28 Feb 2019
Cited by 11 | Viewed by 4852
Abstract
Human injuries and casualties at entertainment, religious, or political crowd events often occur due to the lack of proper crowd safety management. For instance, in a large-scale moving crowd, a minor accident can trigger panic and a stampede. Although many smart video surveillance tools, built on recent artificial intelligence (AI) technology and machine learning (ML) algorithms, enable object detection and identification, it is still challenging to predict crowd mobility in real time to prevent potential disasters. In this paper, we propose ICE-MoCha, an intelligent crowd engineering platform using mobility characterization and analytics. ICE-MoCha assists safety management at mobile crowd events by predicting, and thus helping to prevent, potential disasters through real-time radio frequency (RF) data characterization and analysis. Existing video-surveillance-based approaches lack scalability and are thus limited in the wide open areas typical of crowd events; by integrating RF signal analysis, our approach enhances safety management for mobile crowds. Among the various crowd mobility characteristics, we particularly tackle the problems of identification, speed, and direction detection for a mobile group. We then apply these group semantics to track crowd status and predict potential accidents and disasters. Taking advantage of its power efficiency, cost effectiveness, and ubiquitous availability, we specifically use and analyze Bluetooth Low Energy (BLE) signals. We have conducted experiments with ICE-MoCha at a real crowd event as well as in controlled indoor and outdoor lab environments. The results show that ICE-MoCha can detect mobile crowd characteristics in real time, indicating that it can effectively support crowd management and help avoid movement-related incidents.
(This article belongs to the Special Issue Selected Papers from ISC2 2018)
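
A minimal sketch of RF-based speed estimation, assuming a log-distance path-loss model converts BLE RSSI to range (the constants are illustrative defaults, not ICE-MoCha's actual characterization):

```python
def rssi_to_distance(rssi: float, tx_power: float = -59.0, n: float = 2.0) -> float:
    """Log-distance path-loss model: distance in meters from a BLE RSSI
    reading, given the calibrated RSSI at 1 m (tx_power) and the
    path-loss exponent n. A common first-order model.
    """
    return 10 ** ((tx_power - rssi) / (10 * n))

def group_speed(rssi_series: list[float], dt: float) -> float:
    """Average radial speed (m/s) of a beacon-carrying group relative to a
    fixed scanner, from consecutive RSSI samples dt seconds apart."""
    d = [rssi_to_distance(r) for r in rssi_series]
    steps = [(d[i + 1] - d[i]) / dt for i in range(len(d) - 1)]
    return sum(steps) / len(steps)

# A group walking away: RSSI weakening by ~2 dB per second.
samples = [-59.0, -61.0, -63.0, -65.0]
print(round(group_speed(samples, dt=1.0), 2))  # positive => moving away
```

Combining such radial estimates from several scanners is one way direction could be recovered; the paper's group-level semantics build on exactly these speed and direction primitives.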
