Search Results (5)

Search Parameters:
Keywords = portrait matting

17 pages, 6315 KB  
Article
RVM+: An AI-Driven Vision Sensor Framework for High-Precision, Real-Time Video Portrait Segmentation with Enhanced Temporal Consistency and Optimized Model Design
by Na Tang, Yuehui Liao, Yu Chen, Guang Yang, Xiaobo Lai and Jing Chen
Sensors 2025, 25(5), 1278; https://doi.org/10.3390/s25051278 - 20 Feb 2025
Viewed by 2123
Abstract
Video portrait segmentation is essential for intelligent sensing systems, including human-computer interaction, autonomous navigation, and augmented reality. However, dynamic video environments introduce significant challenges, such as temporal variations, occlusions, and computational constraints. This study introduces RVM+, an enhanced video segmentation framework based on the Robust Video Matting (RVM) architecture. By incorporating Convolutional Gated Recurrent Units (ConvGRU), RVM+ improves temporal consistency and captures intricate temporal dynamics across video frames. Additionally, a novel knowledge distillation strategy reduces computational demands while maintaining high segmentation accuracy, making the framework well suited to real-time applications in resource-constrained environments. Comprehensive evaluations on challenging datasets show that RVM+ outperforms state-of-the-art methods in both segmentation accuracy and temporal consistency, with key performance indicators such as MIoU, SAD, and dtSSD verifying the robustness and efficiency of the model. The integration of knowledge distillation yields a streamlined design with negligible accuracy trade-offs, highlighting its suitability for practical deployment. This study makes significant strides in intelligent sensor technology, providing a high-performance, efficient, and scalable solution for video segmentation. RVM+ offers potential for applications in fields such as augmented reality, robotics, and real-time video analysis, while also advancing the development of AI-enabled vision sensors.
(This article belongs to the Section Intelligent Sensors)
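
For readers unfamiliar with the recurrent unit the abstract credits for temporal consistency, here is a minimal PyTorch sketch of a ConvGRU cell using the standard gate formulation; all names are illustrative and not taken from the paper's code.

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Minimal convolutional GRU cell: convolutions replace the dense
    matrix products of a standard GRU, so the hidden state keeps its
    spatial layout across video frames."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        padding = kernel_size // 2
        # Update and reset gates, computed jointly from [input, hidden].
        self.gates = nn.Conv2d(2 * channels, 2 * channels, kernel_size, padding=padding)
        # Candidate hidden state.
        self.candidate = nn.Conv2d(2 * channels, channels, kernel_size, padding=padding)

    def forward(self, x, h=None):
        if h is None:
            h = torch.zeros_like(x)
        z, r = torch.sigmoid(self.gates(torch.cat([x, h], dim=1))).chunk(2, dim=1)
        h_tilde = torch.tanh(self.candidate(torch.cat([x, r * h], dim=1)))
        # Blend previous state and candidate under the update gate.
        return (1 - z) * h + z * h_tilde
```

Applied per frame, the cell carries a hidden feature map forward in time, which is the mechanism that lets such a model smooth predictions across frames rather than segmenting each frame independently.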

13 pages, 2924 KB  
Article
Matting Algorithm with Improved Portrait Details for Images with Complex Backgrounds
by Rui Li, Dan Zhang, Sheng-Ling Geng and Ming-Quan Zhou
Appl. Sci. 2024, 14(5), 1942; https://doi.org/10.3390/app14051942 - 27 Feb 2024
Cited by 1 | Viewed by 4145
Abstract
With the continuous development of virtual reality and digital image applications, video of complex scenes has proliferated, and portrait matting has accordingly become a popular topic. In this paper, a new matting algorithm with improved portrait details for images with complex backgrounds (MORLIPO) is proposed. This work combines a background restoration module (BRM) and a fine-grained matting module (FGMatting) to achieve high-detail matting for images with complex backgrounds. The background is recovered from a single input image or video and serves as a prior that aids in generating a more accurate alpha matte. The main framework combines the image matting model MODNet, the lightweight MobileNetV2 network, and the background restoration module, which preserves the background information of the current image and, for video, supplies the background prior of the previous frame to predict the alpha matte of the current frame more accurately. The fine-grained matting module extracts and retains fine-grained foreground details and is combined with the semantic module to achieve more accurate matting. The design allows end-to-end training on a single NVIDIA 3090 GPU, and experiments on publicly available datasets show that the method performs well on both visual quality and objective evaluation metrics.
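
The central idea of the abstract, conditioning the matting network on a recovered background, can be sketched as follows. This is a minimal illustration, not MORLIPO's actual interface: `BackgroundPriorMatting`, the injected backbone, and the 6-channel layout are all assumptions.

```python
import torch
import torch.nn as nn

class BackgroundPriorMatting(nn.Module):
    """Sketch: predict an alpha matte from the current frame plus a
    recovered background prior. The backbone is a stand-in for the
    MODNet/MobileNetV2 stack described in the abstract."""
    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone  # maps 6-channel input -> 1-channel logits

    def forward(self, frame, recovered_bg):
        # Stack the frame with the background prior along channels, so the
        # network can treat the foreground as "what differs from background".
        x = torch.cat([frame, recovered_bg], dim=1)  # (N, 6, H, W)
        return torch.sigmoid(self.backbone(x))       # alpha in [0, 1]

# Trivial stand-in backbone, only to make the sketch runnable:
model = BackgroundPriorMatting(nn.Conv2d(6, 1, 3, padding=1))
alpha = model(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
```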

15 pages, 2064 KB  
Article
Portrait Semantic Segmentation Method Based on Dual Modal Information Complementarity
by Guang Feng and Chong Tang
Appl. Sci. 2024, 14(4), 1439; https://doi.org/10.3390/app14041439 - 9 Feb 2024
Viewed by 1585
Abstract
Semantic segmentation of human images is a research hotspot in the field of computer vision. At present, semantic segmentation models based on U-net generally lack the ability to capture the spatial information of images, and semantic incompatibility arises because the feature maps of the encoder and decoder are directly connected in the skip connection stage. In addition, in low-light scenes such as at night, false segmentation occurs easily and segmentation accuracy degrades. To solve these problems, a portrait semantic segmentation method based on dual-modal information complementarity is proposed. The encoder adopts a dual-branch structure and uses an SK-ASSP module that can adaptively adjust the convolution weights of different receptive fields to extract features from the RGB and grayscale image modalities respectively, carrying out cross-modal information complementation and feature fusion. A hybrid attention mechanism is used in the skip connection stage to capture both the channel and coordinate context information of the image. Experiments on a human matting dataset show that the PA and MIoU of this model reach 96.58% and 94.48% respectively, better than the U-net baseline and other mainstream semantic segmentation models.
(This article belongs to the Section Computing and Artificial Intelligence)
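
The dual-branch idea can be illustrated with a small sketch: encode the RGB and grayscale views separately, then fuse the features so each modality complements the other. The plain convolutions below are simplified stand-ins for the paper's SK-ASSP blocks and hybrid attention; only the two-branch structure is taken from the abstract.

```python
import torch
import torch.nn as nn

class DualModalFusion(nn.Module):
    """Sketch of a dual-branch encoder stage: separate RGB and grayscale
    branches whose features are concatenated and fused."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.rgb_branch = nn.Conv2d(3, channels, 3, padding=1)
        self.gray_branch = nn.Conv2d(1, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb):
        # Derive the grayscale modality from RGB (ITU-R BT.601 weights).
        gray = 0.299 * rgb[:, 0:1] + 0.587 * rgb[:, 1:2] + 0.114 * rgb[:, 2:3]
        f_rgb = torch.relu(self.rgb_branch(rgb))
        f_gray = torch.relu(self.gray_branch(gray))
        # Cross-modal fusion: both feature sets contribute to the output.
        return self.fuse(torch.cat([f_rgb, f_gray], dim=1))
```

The grayscale branch is less sensitive to color shifts, which is a plausible reason a luminance-driven branch helps in low-light scenes where RGB cues degrade.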

12 pages, 3600 KB  
Article
Bust Portraits Matting Based on Improved U-Net
by Honggang Xie, Kaiyuan Hou, Di Jiang and Wanjie Ma
Electronics 2023, 12(6), 1378; https://doi.org/10.3390/electronics12061378 - 14 Mar 2023
Cited by 4 | Viewed by 3142
Abstract
Extracting complete portrait foregrounds from natural images is widely used in image editing and high-definition map generation. When making high-definition maps, it is often necessary to matte out passers-by to guarantee their privacy. Current matting methods that do not require additional trimap inputs often suffer from inaccurate global predictions or blurred local details. As a soft segmentation method, portrait matting can create excess regions during segmentation, which inevitably introduces noise and superfluous foreground information into the resulting alpha image, so not all of these regions need to be kept. To overcome these problems, this paper designs a contour sharpness refining network (CSRN) that modifies the weight of the alpha values in uncertain regions of the prediction map, and combines it with an end-to-end bust-matting network based on a U-Net architecture containing Residual U-blocks. The network effectively reduces image noise without affecting the complete foreground information obtained by the deeper network, thus producing foreground images with fine edge details. The network has been tested on PPM-100, RealWorldPortrait-636, and a self-built dataset, showing excellent performance in both edge refinement and global prediction for half-figure portraits.
(This article belongs to the Special Issue Computer Vision for Modern Vehicles)
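
To make the notion of "re-weighting alpha values in uncertain regions" concrete, here is a hand-written sketch: pixels whose alpha sits far from 0 or 1 are pushed toward the nearer extreme to sharpen contours. The thresholds and power-law rule are illustrative assumptions, not CSRN's learned refinement.

```python
import torch

def refine_uncertain_alpha(alpha, low=0.05, high=0.95, gamma=2.0):
    """Sharpen an alpha matte only inside its uncertain band.

    alpha: tensor of alpha values in [0, 1].
    low/high: bounds of the "uncertain" band (hypothetical values).
    gamma: sharpening exponent; larger pushes harder toward 0 or 1.
    """
    uncertain = (alpha > low) & (alpha < high)
    # Power-law sharpening, symmetric about 0.5.
    sharpened = torch.where(
        alpha < 0.5,
        0.5 * (2 * alpha) ** gamma,            # push low values toward 0
        1 - 0.5 * (2 * (1 - alpha)) ** gamma,  # push high values toward 1
    )
    # Confident pixels (near 0 or 1) pass through unchanged.
    return torch.where(uncertain, sharpened, alpha)
```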

18 pages, 21778 KB  
Article
Semi-Supervised Portrait Matting via the Collaboration of Teacher–Student Network and Adaptive Strategies
by Xinyue Zhang, Guodong Wang, Chenglizhao Chen, Hao Dong and Mingju Shao
Electronics 2022, 11(24), 4080; https://doi.org/10.3390/electronics11244080 - 8 Dec 2022
Cited by 1 | Viewed by 1942
Abstract
In the portrait matting domain, existing methods rely entirely on annotated images for learning. However, delicate manual annotations are time-consuming, and few detailed datasets are available. To reduce complete dependency on labeled datasets, we design a semi-supervised network (ASSN) with two kinds of innovative adaptive strategies for portrait matting. Three pivotal sub-modules are embedded in our architecture: a static teacher network (S-TN), a static student network (S-SN), and an adaptive student network (A-SN). S-TN and S-SN are trained with a small number of high-quality labeled samples, and A-SN shares its module parameters with S-SN. When processing unlabeled datasets, A-SN adopts the proposed adaptive strategies to discard the dependence on labeled data. The adaptive strategies include: (i) an auxiliary adaptation, in which the more elaborate teacher network not only provides alpha mattes for the adaptive student network but also transmits rough segmentation results and edge maps as optimization reference standards; and (ii) a self-adjusting adaptation, in which the adaptive network applies self-supervision suited to the characteristics of its different layers. In addition, we have produced a finely annotated dataset for scholars in the field. Compared with existing datasets, ours adds two types of data neglected previously: (i) images containing multiple people and (ii) images under low-light conditions.
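
The core teacher-student signal on unlabeled data can be sketched in a few lines: the frozen teacher's alpha matte acts as a pseudo-label for the student. This is a generic semi-supervised pattern, with ASSN's auxiliary cues (rough segmentation, edge maps) and adaptive weighting omitted; `teacher` and `student` stand for any matting networks mapping images to alpha mattes.

```python
import torch
import torch.nn.functional as F

def unlabeled_consistency_loss(teacher, student, images):
    """Teacher-student loss on unlabeled images.

    The teacher runs without gradients, so only the student is updated
    toward the teacher's predictions (pseudo-labels).
    """
    with torch.no_grad():
        pseudo_alpha = teacher(images)   # pseudo-label, no gradient flow
    student_alpha = student(images)
    return F.l1_loss(student_alpha, pseudo_alpha)
```

In a full training loop this term would be added to the supervised loss computed on the small labeled set, so the student benefits from both sources.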
