Search Results (8)

Search Parameters:
Keywords = VidBlock

19 pages, 2647 KB  
Article
FDI-VSR: Video Super-Resolution Through Frequency-Domain Integration and Dynamic Offset Estimation
by Donghun Lim and Janghoon Choi
Sensors 2025, 25(8), 2402; https://doi.org/10.3390/s25082402 - 10 Apr 2025
Cited by 1 | Viewed by 1124
Abstract
The increasing adoption of high-resolution imaging sensors across various fields has led to a growing demand for techniques to enhance video quality. Video super-resolution (VSR) addresses this need by reconstructing high-resolution videos from lower-resolution inputs; however, directly applying single-image super-resolution (SISR) methods to video sequences neglects temporal information, resulting in inconsistent and unnatural outputs. In this paper, we propose FDI-VSR, a novel framework that integrates spatiotemporal dynamics and frequency-domain analysis into conventional SISR models without extensive modifications. We introduce two key modules: the Spatiotemporal Feature Extraction Module (STFEM), which employs dynamic offset estimation, spatial alignment, and multi-stage temporal aggregation using residual channel attention blocks (RCABs); and the Frequency–Spatial Integration Module (FSIM), which transforms deep features into the frequency domain to effectively capture global context beyond the limited receptive field of standard convolutions. Extensive experiments on the Vid4, SPMCs, REDS4, and UDM10 benchmarks, supported by detailed ablation studies, demonstrate that FDI-VSR not only surpasses conventional VSR methods but also achieves competitive results compared to recent state-of-the-art methods, with improvements of up to 0.82 dB in PSNR on the SPMCs benchmark and notable reductions in visual artifacts, all while maintaining lower computational complexity and faster inference. Full article
(This article belongs to the Section Sensing and Imaging)
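As a rough illustration of the frequency-domain integration idea described in this abstract, the sketch below mixes a Fourier-domain branch (global context) with an ordinary convolutional branch; the module name, layer sizes, and fusion scheme are assumptions, not the authors' FSIM implementation.

```python
import torch
import torch.nn as nn

class FrequencySpatialBlock(nn.Module):
    """Illustrative frequency-spatial fusion: process features in the Fourier
    domain to capture global context, then merge with a local spatial branch.
    Layer names and sizes are hypothetical."""
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convs acting on the real/imaginary parts of the spectrum
        self.freq_conv = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.spatial_conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Global branch: 2D FFT over the spatial dims, so every output pixel
        # of freq_conv sees the whole frame through the spectrum.
        spec = torch.fft.rfft2(x, norm="ortho")
        spec = self.freq_conv(torch.cat([spec.real, spec.imag], dim=1))
        real, imag = spec.chunk(2, dim=1)
        global_feat = torch.fft.irfft2(torch.complex(real, imag),
                                       s=x.shape[-2:], norm="ortho")
        # Local branch: ordinary convolution with a limited receptive field.
        local_feat = self.spatial_conv(x)
        return x + self.fuse(torch.cat([global_feat, local_feat], dim=1))
```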

32 pages, 9318 KB  
Article
VidBlock: A Web3.0-Enabled Decentralized Blockchain Architecture for Live Video Streaming
by Hyunjoo Yang and Sejin Park
Appl. Sci. 2025, 15(3), 1289; https://doi.org/10.3390/app15031289 - 26 Jan 2025
Cited by 1 | Viewed by 2511
Abstract
In the digital era, the demand for real-time streaming services highlights the scalability, data sovereignty, and privacy limitations of traditional centralized systems. VidBlock introduces a novel decentralized blockchain architecture that leverages the blockchain’s immutable and transparent characteristics along with direct communication capabilities. This ecosystem revolutionizes content delivery and storage, ensuring high data integrity and user trust. VidBlock’s architecture emphasizes serverless operation, aligning with the principles of decentralization to enhance efficiency and reduce costs. Our contributions include decentralized data management, user-controlled privacy, cost reduction through a serverless architecture, and improved global accessibility. Experiments show that VidBlock is superior in reducing latency and utilizing bandwidth, demonstrating its potential to redefine live video streaming in the Web3.0 era. Full article
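The abstract does not specify VidBlock's on-chain data format, but a minimal sketch of the general pattern, hash-linked blocks that anchor the integrity of peer-delivered video segments, might look like this (all field and function names are hypothetical):

```python
import hashlib
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class StreamBlock:
    """Hypothetical ledger block anchoring live-stream segments; VidBlock's
    actual on-chain format is not described in the abstract."""
    index: int
    prev_hash: str
    segment_hashes: list[str] = field(default_factory=list)  # SHA-256 of video chunks served peer-to-peer
    timestamp: float = field(default_factory=time.time)

    def hash(self) -> str:
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

def verify_chain(chain: list[StreamBlock]) -> bool:
    """A viewer can check that the segment index received from peers has not
    been tampered with by replaying the hash links."""
    return all(chain[i].prev_hash == chain[i - 1].hash()
               for i in range(1, len(chain)))
```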

19 pages, 5481 KB  
Article
Real-Time Semantic Segmentation Algorithm for Street Scenes Based on Attention Mechanism and Feature Fusion
by Bao Wu, Xingzhong Xiong and Yong Wang
Electronics 2024, 13(18), 3699; https://doi.org/10.3390/electronics13183699 - 18 Sep 2024
Cited by 3 | Viewed by 2237
Abstract
In computer vision, the task of semantic segmentation is crucial for applications such as autonomous driving and intelligent surveillance. However, achieving a balance between real-time performance and segmentation accuracy remains a significant challenge. Although Fast-SCNN is favored for its efficiency and low computational complexity, it still faces difficulties when handling complex street scene images. To address this issue, this paper presents an improved Fast-SCNN, aiming to enhance the accuracy and efficiency of semantic segmentation by incorporating a novel attention mechanism and an enhanced feature extraction module. Firstly, the integrated SimAM (Simple, Parameter-Free Attention Module) increases the network’s sensitivity to critical regions of the image and effectively adjusts the feature space weights across channels. Additionally, the refined pyramid pooling module in the global feature extraction module captures a broader range of contextual information through refined pooling levels. During the feature fusion stage, the introduction of an enhanced DAB (Depthwise Asymmetric Bottleneck) block and SE (Squeeze-and-Excitation) attention optimizes the network’s ability to process multi-scale information. Furthermore, the classifier module is extended by incorporating deeper convolutions and more complex convolutional structures, leading to a further improvement in model performance. These enhancements significantly improve the model’s ability to capture details and overall segmentation performance. Experimental results demonstrate that the proposed method excels in processing complex street scene images, achieving a mean Intersection over Union (mIoU) of 71.7% and 69.4% on the Cityscapes and CamVid datasets, respectively, while maintaining inference speeds of 81.4 fps and 113.6 fps. These results indicate that the proposed model effectively improves segmentation quality in complex street scenes while ensuring real-time processing capabilities. Full article
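The SimAM module named in the abstract is parameter-free and fits in a few lines; the sketch below follows the original SimAM energy formulation, though the exact variant integrated into this improved Fast-SCNN may differ.

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """Parameter-free SimAM attention (sketch of the original formulation).
    Each neuron is weighted by an energy term measuring how much it stands
    out from the other neurons in its channel. x: (N, C, H, W)."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation per neuron
    v = d.sum(dim=(2, 3), keepdim=True) / n             # channel-wise variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5
    return x * torch.sigmoid(e_inv)
```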

21 pages, 6785 KB  
Article
Multi-Granularity Aggregation with Spatiotemporal Consistency for Video-Based Person Re-Identification
by Hean Sung Lee, Minjung Kim, Sungjun Jang, Han Byeol Bae and Sangyoun Lee
Sensors 2024, 24(7), 2229; https://doi.org/10.3390/s24072229 - 30 Mar 2024
Cited by 2 | Viewed by 1780
Abstract
Video-based person re-identification (ReID) aims to exploit relevant features from spatial and temporal knowledge. Widely used methods include the part- and attention-based approaches for suppressing irrelevant spatial–temporal features. However, it is still challenging to overcome inconsistencies across video frames due to occlusion and imperfect detection. These mismatches make temporal processing ineffective and create an imbalance of crucial spatial information. To address these problems, we propose the Spatiotemporal Multi-Granularity Aggregation (ST-MGA) method, which is specifically designed to accumulate relevant features with spatiotemporally consistent cues. The proposed framework consists of three main stages: extraction, which extracts spatiotemporally consistent partial information; augmentation, which augments the partial information with different granularity levels; and aggregation, which effectively aggregates the augmented spatiotemporal information. We first introduce the consistent part-attention (CPA) module, which extracts spatiotemporally consistent and well-aligned attentive parts. Sub-parts derived from CPA provide temporally consistent semantic information, solving misalignment problems in videos due to occlusion or inaccurate detection, and maximize the efficiency of aggregation through uniform partial information. To enhance the diversity of spatial and temporal cues, we introduce the Multi-Attention Part Augmentation (MA-PA) block, which incorporates fine parts at various granular levels, and the Long-/Short-term Temporal Augmentation (LS-TA) block, designed to capture both long- and short-term temporal relations. Using densely separated part cues, ST-MGA fully exploits and aggregates the spatiotemporal multi-granular patterns by comparing relations between parts and scales. In the experiments, the proposed ST-MGA renders state-of-the-art performance on several video-based ReID benchmarks (i.e., MARS, DukeMTMC-VideoReID, and LS-VID). Full article
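As a loose illustration of pooling parts at several granularities from video features, a sketch along these lines could be used; the stripe-based split and granularity levels are assumptions and do not reproduce the paper's CPA or MA-PA modules.

```python
import torch

def multi_granularity_parts(feat: torch.Tensor,
                            granularities=(1, 2, 4)) -> list[torch.Tensor]:
    """Illustrative multi-granularity part pooling for video ReID features.
    feat: (B, T, C, H, W). For each granularity g, frame features are split
    into g horizontal stripes and average-pooled into per-part descriptors."""
    b, t, c, h, w = feat.shape
    parts = []
    for g in granularities:
        assert h % g == 0, "feature height must be divisible by granularity"
        stripes = feat.reshape(b, t, c, g, h // g, w)   # split height into g stripes
        parts.append(stripes.mean(dim=(4, 5)))          # (B, T, C, g) per-stripe descriptor
    return parts
```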

16 pages, 2701 KB  
Article
Muti-Frame Point Cloud Feature Fusion Based on Attention Mechanisms for 3D Object Detection
by Zhenyu Zhai, Qiantong Wang, Zongxu Pan, Zhentong Gao and Wenlong Hu
Sensors 2022, 22(19), 7473; https://doi.org/10.3390/s22197473 - 2 Oct 2022
Cited by 12 | Viewed by 4113
Abstract
Object detection from continuous frames of point clouds is a new research direction. Currently, most studies fuse multi-frame point clouds with concatenation-based methods, aligning the frames using GPS, IMU, and similar information. However, this kind of fusion can align only static objects, not moving ones. In this paper, we propose a non-local-based multi-scale feature fusion method that handles both moving and static objects without GPS- or IMU-based registration. Because non-local methods are resource-consuming, we also propose a novel simplified non-local block that exploits the sparsity of the point cloud; by filtering out empty units, memory consumption decreases by 99.93%. In addition, triple attention is adopted to enhance key information on the object and suppress background noise, further benefiting non-local feature fusion. Finally, we verify the method on PointPillars and CenterPoint. Experimental results show that the proposed method improves mAP by 3.9% and 4.1% compared with the concatenation-based fusion baselines PointPillars-2 and CenterPoint-2, respectively, and outperforms the powerful 3D-VID by 1.2% mAP. Full article
(This article belongs to the Special Issue Artificial Intelligence and Smart Sensors for Autonomous Driving)
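The abstract's key memory saving comes from restricting non-local attention to non-empty cells of the sparse point-cloud grid; a minimal sketch of that idea (the threshold, feature layout, and normalization are assumptions) is shown below.

```python
import torch

def sparse_non_local(feat: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sketch of a non-local block restricted to non-empty BEV cells, in the
    spirit of the simplified block described in the abstract.
    feat: (C, H, W) pillar/BEV features."""
    c, h, w = feat.shape
    flat = feat.reshape(c, h * w)                        # (C, N)
    occupied = flat.abs().sum(dim=0) > eps               # keep only non-empty cells
    x = flat[:, occupied]                                # (C, M), M << N in sparse scenes
    attn = torch.softmax(x.t() @ x / c ** 0.5, dim=-1)   # (M, M) affinity among occupied cells only
    out = flat.clone()
    out[:, occupied] = x + x @ attn.t()                  # residual non-local aggregation
    return out.reshape(c, h, w)
```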

18 pages, 2706 KB  
Article
An Intelligent Tracking System for Moving Objects in Dynamic Environments
by Nada Ali Hakami, Hanan Ahmed Hosni Mahmoud and Abeer Abdulaziz AlArfaj
Actuators 2022, 11(10), 274; https://doi.org/10.3390/act11100274 - 25 Sep 2022
Cited by 2 | Viewed by 2189
Abstract
Localization of suspicious moving objects in dynamic environments requires high-accuracy mapping. A deep learning model is proposed to track moving objects that cross in the opposite direction. Trajectory measurements of moving objects are computed from the space within the image boundaries of the intersecting cameras, and object appearance is described by color and texture histograms in the intersecting camera views. Incorrect mapping of moving objects through synchronized localization can increase considerably in complex areas because of unfit points triggered by moving targets. To address this problem, a robust model using the dynamic province rejection (DPR) technique is presented. The proposed model combines a deep learning method with a tracking system that rejects dynamic areas lying outside the environment boundary of interest. The technique detects dynamic points from sequential video images, partitions the current image into super blocks, and tags the border differences. In the last stage, dynamic areas are computed from the dynamic points and super-block boundaries, and the static regions are used to compute positions, enhancing the path computation precision of the model. Simulation results show that the introduced model outperforms comparable state-of-the-art models on both the VID and MOVSD4 datasets and surpasses state-of-the-art tracking systems in speed. The experiments show that the computed path error in the dynamic setting can be decreased by 81%. Full article
(This article belongs to the Special Issue Advanced Technologies and Applications in Robotics)
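A minimal sketch of the super-block-based dynamic-region rejection described in the abstract might look like the following; the block size and difference threshold are hypothetical, not the paper's tuned values.

```python
import numpy as np

def reject_dynamic_blocks(prev_gray: np.ndarray, curr_gray: np.ndarray,
                          block: int = 32, thresh: float = 12.0) -> np.ndarray:
    """Illustrative dynamic-region rejection: tile the frame into super blocks,
    flag blocks whose mean inter-frame difference exceeds a threshold, and
    return a boolean mask of static pixels to keep for localization."""
    h, w = curr_gray.shape
    diff = np.abs(curr_gray.astype(np.float32) - prev_gray.astype(np.float32))
    keep = np.ones((h, w), dtype=bool)
    for y in range(0, h, block):
        for x in range(0, w, block):
            if diff[y:y + block, x:x + block].mean() > thresh:
                keep[y:y + block, x:x + block] = False   # dynamic block: excluded
    return keep
```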

12 pages, 268 KB  
Article
Productivity and Quality of Garlic Produced Using Below-Zero Temperatures When Treating Seed Cloves
by José Magno Queiroz Luz, Breno Nunes Rodrigues de Azevedo, Sérgio Macedo Silva, Carlos Inácio Garcia de Oliveira, Túlio Garcia de Oliveira, Roberta Camargos de Oliveira and Renata Castoldi
Horticulturae 2022, 8(2), 96; https://doi.org/10.3390/horticulturae8020096 - 21 Jan 2022
Cited by 9 | Viewed by 4007
Abstract
Garlic cultivation has increased in Brazil in recent years primarily due to the adoption of appropriate technologies, such as the use of low temperatures during the maintenance of garlic seeds to overcome dormancy. However, there is no information on the effects of below-zero temperatures when treating seed cloves on garlic development. Therefore, this study’s objective was to evaluate the effects of below-zero temperatures and different visual indices of overcoming dormancy (VIDs) on garlic performance in Cristalina County, Goias State, Brazil. The experiment was conducted in a randomized block design with four replicates in a 2 × 3 factorial scheme: with two VIDs (40% and 60%), and three temperature ranges (−1 to −3 °C, 1 to 3 °C, and 2 to 4 °C). Vegetative characteristics, bulbar ratios, and commercial bulb yields were evaluated. The results showed that below-zero temperatures resulted in better vegetative characteristics. The yield increased after using below-zero temperatures to treat seed cloves with a VID of 60%. The garlic produced had a higher market value. We concluded that there is an enormous potential for using below-zero temperatures to improve the performance of the “Ito” garlic variety, and more studies should be conducted with other varieties of economic importance to enhance Brazilian garlic production. Full article
15 pages, 854 KB  
Article
DSTnet: Deformable Spatio-Temporal Convolutional Residual Network for Video Super-Resolution
by Anusha Khan, Allah Bux Sargano and Zulfiqar Habib
Mathematics 2021, 9(22), 2873; https://doi.org/10.3390/math9222873 - 12 Nov 2021
Cited by 1 | Viewed by 2839
Abstract
Video super-resolution (VSR) aims at generating high-resolution (HR) video frames with plausible and temporally consistent details using their low-resolution (LR) counterparts, and neighboring frames. The key challenge for VSR lies in the effective exploitation of intra-frame spatial relation and temporal dependency between consecutive frames. Many existing techniques utilize spatial and temporal information separately and compensate motion via alignment. These methods cannot fully exploit the spatio-temporal information that significantly affects the quality of resultant HR videos. In this work, a novel deformable spatio-temporal convolutional residual network (DSTnet) is proposed to overcome the issues of separate motion estimation and compensation methods for VSR. The proposed framework consists of 3D convolutional residual blocks decomposed into spatial and temporal (2+1) D streams. This decomposition can simultaneously utilize input video’s spatial and temporal features without a separate motion estimation and compensation module. Furthermore, the deformable convolution layers have been used in the proposed model that enhances its motion-awareness capability. Our contribution is twofold; firstly, the proposed approach can overcome the challenges in modeling complex motions by efficiently using spatio-temporal information. Secondly, the proposed model has fewer parameters to learn than state-of-the-art methods, making it a computationally lean and efficient framework for VSR. Experiments are conducted on a benchmark Vid4 dataset to evaluate the efficacy of the proposed approach. The results demonstrate that the proposed approach achieves superior quantitative and qualitative performance compared to the state-of-the-art methods. Full article
(This article belongs to the Special Issue Computer Graphics, Image Processing and Artificial Intelligence)
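The (2+1)D decomposition mentioned in the abstract factorizes a 3D convolution into a spatial and a temporal convolution; a minimal residual-block sketch (channel widths and the placement of DSTnet's deformable layers are assumptions) follows.

```python
import torch.nn as nn

class R2Plus1DBlock(nn.Module):
    """Sketch of a (2+1)D residual block: a 3D convolution factorized into a
    spatial (1x3x3) conv followed by a temporal (3x1x1) conv, so spatial and
    temporal features are modeled without a separate motion-compensation step."""
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv3d(channels, channels, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.temporal = nn.Conv3d(channels, channels, kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):  # x: (B, C, T, H, W)
        out = self.temporal(self.act(self.spatial(x)))
        return self.act(x + out)
```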
