Search Results (20)

Search Parameters:
Keywords = pedestrian attribute recognition

17 pages, 1798 KB  
Article
From One Domain to Another: The Pitfalls of Gender Recognition in Unseen Environments
by Nzakiese Mbongo, Kailash A. Hambarde and Hugo Proença
Sensors 2025, 25(13), 4161; https://doi.org/10.3390/s25134161 - 4 Jul 2025
Viewed by 386
Abstract
Gender recognition from pedestrian imagery is acknowledged by many as a quasi-solved problem, yet most existing approaches evaluate performance in a within-domain setting, i.e., when the test and training data, though disjoint, closely resemble each other. This work provides the first exhaustive cross-domain assessment of six architectures considered to represent the state of the art: ALM, VAC, Rethinking, LML, YinYang-Net, and MAMBA, across three widely known benchmarks: PA-100K, PETA, and RAP. All train/test combinations between datasets were evaluated, yielding 54 comparable experiments. The results revealed a performance split: median in-domain F1 approached 90% in most models, while the average drop under domain shift reached up to 16.4 percentage points, with the most recent approaches degrading the most. The adaptive-masking ALM achieved an F1 above 80% in most transfer scenarios, particularly those involving high-resolution or pose-stable domains, highlighting the importance of strong inductive biases over architectural novelty alone. Further, to characterize robustness quantitatively, we introduced the Unified Robustness Metric (URM), which integrates the average cross-domain degradation into a single score. A qualitative saliency analysis also corroborated the numerical findings by exposing over-confidence and contextual bias in misclassifications. Overall, this study suggests that challenges in gender recognition are much more evident in cross-domain settings than under the commonly reported within-domain context. Finally, we formalize an open evaluation protocol that can serve as a baseline for future work of this kind.
(This article belongs to the Section Intelligent Sensors)
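The paper defines URM precisely; the following is only a rough sketch of the idea it names (folding average cross-domain F1 degradation into a single score), with the aggregation rule being an assumption rather than the authors' formula:

```python
import numpy as np

# f1[i][j] = F1 of a model trained on dataset i and tested on dataset j
# (e.g. PA-100K, PETA, RAP). Values here are made up for illustration.
f1 = np.array([
    [0.90, 0.74, 0.71],
    [0.78, 0.89, 0.75],
    [0.72, 0.70, 0.91],
])

in_domain = np.diag(f1)                      # train/test on the same dataset
# Average relative degradation for every off-diagonal (cross-domain) pair.
drops = [(in_domain[i] - f1[i, j]) / in_domain[i]
         for i in range(len(f1)) for j in range(len(f1)) if i != j]
urm_like = 1.0 - np.mean(drops)              # 1.0 = perfectly robust
print(f"URM-like robustness score: {urm_like:.3f}")
```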

15 pages, 9556 KB  
Article
An Experimental Evaluation of Smart Sensors for Pedestrian Attribute Recognition Using Multi-Task Learning and Vision Language Models
by Antonio Greco, Alessia Saggese, Carlo Sansone and Bruno Vento
Sensors 2025, 25(6), 1736; https://doi.org/10.3390/s25061736 - 11 Mar 2025
Viewed by 902
Abstract
This paper presents the experimental evaluation and analyzes the results of the first edition of the pedestrian attribute recognition (PAR) contest, an international competition focused on smart visual sensors based on multi-task computer vision methods for the recognition of binary and multi-class pedestrian attributes from images. The participating teams designed intelligent sensors based on vision-language models, transformers, and convolutional neural networks that address the multi-label recognition problem by leveraging task interdependencies to enhance model efficiency and effectiveness. Participants were provided with the MIVIA PAR Dataset, containing 105,244 annotated pedestrian images for training and validation, and their methods were evaluated on a private test set of over 20,000 images. In the paper, we analyze the smart visual sensors proposed by the participating teams, examining the results in terms of accuracy, standard deviation, and confusion matrices, and highlighting the correlations between design choices and performance. The results of this experimental evaluation, conducted in a challenging and realistic framework, suggest possible directions for future improvements in these smart sensors, which are thoroughly discussed in the paper.
(This article belongs to the Section Intelligent Sensors)
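A minimal sketch of the label-based mean accuracy (mA) metric commonly reported in PAR evaluations like this contest; the organizers' exact protocol may differ, and the data below are synthetic:

```python
import numpy as np

def mean_accuracy(y_true, y_pred):
    # y_true, y_pred: (n_samples, n_attributes) binary arrays.
    # For each attribute, average the recall of positives and negatives,
    # then average (and spread) over attributes.
    accs = []
    for k in range(y_true.shape[1]):
        t, p = y_true[:, k], y_pred[:, k]
        tpr = np.mean(p[t == 1] == 1) if (t == 1).any() else 0.0
        tnr = np.mean(p[t == 0] == 0) if (t == 0).any() else 0.0
        accs.append((tpr + tnr) / 2)
    return float(np.mean(accs)), float(np.std(accs))

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(1000, 10))
y_pred = np.where(rng.random((1000, 10)) < 0.9, y_true, 1 - y_true)
ma, std = mean_accuracy(y_true, y_pred)
print(f"mA = {ma:.4f} +/- {std:.4f}")
```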

14 pages, 3377 KB  
Article
Enhancing Person Re-Identification through Attention-Driven Global Features and Angular Loss Optimization
by Yihan Bi, Rong Wang, Qianli Zhou, Ronghui Lin and Mingjie Wang
Entropy 2024, 26(6), 436; https://doi.org/10.3390/e26060436 - 21 May 2024
Viewed by 1757
Abstract
To address challenges related to the inadequate representation and inaccurate discrimination of pedestrian attributes, we propose a novel method for person re-identification, which leverages global feature learning and classification optimization. Specifically, this approach integrates a Normalization-based Channel Attention Module into the fundamental ResNet50 backbone, utilizing a scaling factor to prioritize and enhance key pedestrian feature information. Furthermore, dynamic activation functions are employed to adaptively modulate the parameters of ReLU based on the input convolutional feature maps, thereby bolstering the nonlinear expression capabilities of the network model. By incorporating ArcFace loss into the cross-entropy loss, the supervised model is trained to learn pedestrian features that exhibit significant inter-class variance while maintaining tight intra-class coherence. The evaluation of the enhanced model on two popular datasets, Market1501 and DukeMTMC-ReID, reveals improvements in Rank-1 accuracy of 1.28% and 1.40%, respectively, along with corresponding gains in mean average precision (mAP) of 1.93% and 1.84%. These findings indicate that the proposed model is capable of extracting more robust pedestrian features, enhancing feature discriminability, and ultimately achieving superior recognition accuracy.
(This article belongs to the Section Multidisciplinary Applications)
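The angular-margin loss named here is the published ArcFace formulation; a compact PyTorch sketch (the scale s and margin m are common defaults, not necessarily this paper's settings):

```python
import torch
import torch.nn.functional as F

def arcface_logits(features, weight, labels, s=30.0, m=0.50):
    # ArcFace: add angular margin m to the target-class angle, then scale.
    # Features (N, d) and class weights (C, d) are L2-normalized so their
    # inner product equals cos(theta).
    cosine = F.normalize(features) @ F.normalize(weight).t()     # (N, C)
    theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
    target = F.one_hot(labels, weight.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cosine)
    return s * logits

# Combined objective as described: cross-entropy over ArcFace logits.
feats = torch.randn(8, 512)
w = torch.randn(751, 512)            # e.g. 751 identities in Market-1501
y = torch.randint(0, 751, (8,))
loss = F.cross_entropy(arcface_logits(feats, w, y), y)
```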

23 pages, 2148 KB  
Article
MRG-T: Mask-Relation-Guided Transformer for Remote Vision-Based Pedestrian Attribute Recognition in Aerial Imagery
by Shun Zhang, Yupeng Li, Xiao Wu, Zunheng Chu and Lingfei Li
Remote Sens. 2024, 16(7), 1216; https://doi.org/10.3390/rs16071216 - 29 Mar 2024
Cited by 2 | Viewed by 1756
Abstract
Nowadays, with the rapid development of consumer Unmanned Aerial Vehicles (UAVs), utilizing UAV platforms for visual surveillance has become very attractive, and a key part of this is remote vision-based pedestrian attribute recognition. Pedestrian Attribute Recognition (PAR) is dedicated to predicting multiple attribute labels of a single pedestrian image extracted from surveillance videos and aerial imagery, and it presents significant challenges to the computer vision community due to factors such as poor imaging quality and substantial pose variations. Despite recent studies demonstrating impressive advancements with complicated architectures and relation exploration, most fail to fully and systematically consider the inter-region, inter-attribute, and region-attribute mapping relations simultaneously, and remain stuck in the dilemma of information redundancy, leading to degraded recognition accuracy. To address these issues, we construct a novel Mask-Relation-Guided Transformer (MRG-T) framework that consists of three relation modeling modules to fully exploit spatial and semantic relations during model learning. Specifically, we first propose a Masked Region Relation Module (MRRM) that focuses on precise spatial attention regions to extract more robust features with masked random patch training. To explore the semantic association of attributes, we further present a Masked Attribute Relation Module (MARM) to extract intrinsic and semantic inter-attribute relations with an attribute label masking strategy. Based on the cross-attention mechanism, we finally design a Region and Attribute Mapping Module (RAMM) to learn the cross-modal alignment between spatial regions and semantic attributes. We conduct comprehensive experiments on three public benchmarks, namely PETA, PA-100K, and RAPv1, and conduct inference on a large-scale airborne person dataset named PRAI-1581. The extensive experimental results demonstrate the superior performance of our method compared to state-of-the-art approaches and validate the effectiveness of mask-relation-guided modeling in the remote vision-based PAR task.
(This article belongs to the Special Issue Signal Processing Theory and Methods in Remote Sensing)
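A toy cross-attention block in the spirit of the RAMM described above: learnable attribute queries attend over spatial region tokens. All layer sizes, and the omission of the masking strategy, are assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn

class RegionAttributeMapping(nn.Module):
    def __init__(self, n_attrs=35, dim=256, heads=8):
        super().__init__()
        # One learnable query per semantic attribute.
        self.attr_queries = nn.Parameter(torch.randn(n_attrs, dim))
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Linear(dim, 1)          # one binary logit per attribute

    def forward(self, region_tokens):          # (B, n_regions, dim)
        q = self.attr_queries.expand(region_tokens.size(0), -1, -1)
        # Attribute queries attend over region tokens: the attention map is
        # the learned region-attribute alignment.
        attended, attn_map = self.cross_attn(q, region_tokens, region_tokens)
        return self.head(attended).squeeze(-1), attn_map   # (B, n_attrs)

logits, attn = RegionAttributeMapping()(torch.randn(2, 196, 256))
```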

16 pages, 7134 KB  
Article
Dual-Stage Attribute Embedding and Modality Consistency Learning-Based Visible–Infrared Person Re-Identification
by Zhuxuan Cheng, Huijie Fan, Qiang Wang, Shiben Liu and Yandong Tang
Electronics 2023, 12(24), 4892; https://doi.org/10.3390/electronics12244892 - 5 Dec 2023
Cited by 2 | Viewed by 1568
Abstract
Visible–infrared person re-identification (VI-ReID) is an emerging technology for realizing all-weather smart surveillance systems. To address the difficulty of obtaining, and the ease of losing, discriminative pedestrian information, as well as the wide modality difference in the VI-ReID task, in this paper we propose a dual-stage attribute embedding and modality consistency learning-based VI-ReID method. First, the attribute information embedding module introduces the fine-grained pedestrian information in the attribute labels into the transformer backbone, enabling the backbone to extract identity-discriminative pedestrian features. After obtaining the pedestrian features, the attribute embedding enhancement module realizes the second-stage attribute information embedding, which reduces the adverse effect of losing discriminative person information as the network deepens. Finally, the modality consistency learning loss is designed to constrain the network to mine consistency information between the two modalities in order to reduce the impact of the modality difference on recognition results. The results show that our method reaches 74.57% mAP on the SYSU-MM01 dataset in All Search mode and 87.02% mAP on the RegDB dataset in IR-to-VIS mode, with performance improvements of 6.00% and 2.56%, respectively, demonstrating that our proposed method achieves optimal performance compared to existing state-of-the-art methods.
(This article belongs to the Special Issue Lifelong Machine Learning-Based Efficient Robotic Object Perception)
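A generic sketch of a cross-modality consistency constraint of the kind described, pulling the visible and infrared features of each identity together; the paper's actual loss formulation is likely more elaborate:

```python
import torch
import torch.nn.functional as F

def modality_consistency(vis_feats, ir_feats, labels):
    # For each identity, penalize the distance between its mean visible
    # feature and its mean infrared feature.
    loss = 0.0
    for pid in labels.unique():
        v = vis_feats[labels == pid].mean(dim=0)
        r = ir_feats[labels == pid].mean(dim=0)
        loss = loss + F.mse_loss(v, r)
    return loss / labels.unique().numel()

vis = torch.randn(16, 768)    # features from visible images
ir = torch.randn(16, 768)     # features from infrared images of same batch
pids = torch.randint(0, 4, (16,))
print(modality_consistency(vis, ir, pids))
```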

24 pages, 22680 KB  
Article
A Computer Vision-Based Algorithm for Detecting Vehicle Yielding to Pedestrians
by Yanqi Wan, Yaqi Xu, Yi Xu, Heyi Wang, Jian Wang and Mingzheng Liu
Sustainability 2023, 15(22), 15714; https://doi.org/10.3390/su152215714 - 7 Nov 2023
Cited by 2 | Viewed by 2797
Abstract
Computer vision has made remarkable progress in traffic surveillance, but determining whether a motor vehicle yields to pedestrians still requires considerable human effort. This study proposes an automated method for detecting whether a vehicle yields to pedestrians in intelligent transportation systems. The method employs a target-tracking algorithm that uses feature maps and license plate IDs to track the motion of relevant elements in the camera's field of view. By analyzing the positions of motor vehicles and pedestrians over time, we predict the warning points of pedestrians and the hazardous areas in front of vehicles to determine whether the vehicles yield to pedestrians. Extensive experiments conducted on the MOT16 dataset, a real traffic street scene video dataset, and a Unity3D virtual simulation scene dataset combined with SUMO demonstrate the superiority of the tracking algorithm. Compared to current state-of-the-art methods, this method achieves significant improvements in processing speed without compromising accuracy, substantially outperforming them in operational efficiency and thus catering to real-time recognition requirements. The experiments also reveal a marked reduction in ID switches, enhancing the reliability of violation attributions to the correct vehicles. Such enhancement is crucial in practical urban settings characterized by dynamic interactions and variable conditions. The approach can be applied in various weather, time, and road conditions, achieving high predictive accuracy and interpretability in detecting vehicle–pedestrian interactions, and it illuminates viable pathways for integrating technological innovation and sustainability, paving the way for more resilient and intelligent urban ecosystems.
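A toy version of the yielding test described above: extrapolate both trajectories and flag a conflict when the pedestrian's predicted warning point enters the hazardous area ahead of the vehicle. The constant-velocity model, the box geometry, and the assumed +x vehicle heading are illustrative, not the paper's method:

```python
def yields_to_pedestrian(vehicle_track, pedestrian_track, horizon=15):
    (vx, vy), (vdx, vdy) = vehicle_track     # position, per-frame velocity
    (px, py), (pdx, pdy) = pedestrian_track
    for t in range(1, horizon + 1):
        # Hazardous area: box ahead of the vehicle (assumes +x heading),
        # 4 m long and 2 m wide.
        hx, hy = vx + vdx * t, vy + vdy * t
        hazard = (hx, hx + 4.0, hy - 1.0, hy + 1.0)
        wx, wy = px + pdx * t, py + pdy * t  # pedestrian warning point
        if hazard[0] <= wx <= hazard[1] and hazard[2] <= wy <= hazard[3]:
            return False                     # predicted conflict: must yield
    return True

# Vehicle heading +x at 1 m/frame; pedestrian crossing toward the lane.
print(yields_to_pedestrian(((0, 0), (1.0, 0)), ((8, 5), (0, -0.5))))  # False
```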

18 pages, 4094 KB  
Article
A 6G-Enabled Lightweight Framework for Person Re-Identification on Distributed Edges
by Xiting Peng, Yichao Wang, Xiaoyu Zhang, Haibo Yang, Xiongyan Tang and Shi Bai
Electronics 2023, 12(10), 2266; https://doi.org/10.3390/electronics12102266 - 17 May 2023
Cited by 4 | Viewed by 2081
Abstract
In the upcoming 6G era, edge artificial intelligence (AI), as a key technology, will deliver AI processes anytime and anywhere by deploying AI models on edge devices. As a hot issue in public safety, person re-identification (Re-ID) also urgently needs its models deployed on edge devices to realize real-time and accurate recognition. However, due to complex scenarios and other practical constraints, re-identification models perform poorly in practice. This is especially true in public places, where many people share similar characteristics and environmental differences further complicate identification, making it difficult to search for suspicious persons. Therefore, this paper proposes a novel end-to-end suspicious-person re-identification framework deployed on edge devices and focused on real public scenarios. In our framework, the video data are cut into images and fed into the You Only Look Once (YOLOv5) detector to obtain pedestrian position information. An omni-scale network (OSNet) is then applied to conduct pedestrian attribute recognition and re-identification. Broad learning systems (BLSs) and cycle-consistent adversarial networks (CycleGAN) are used to remove noisy data and unify the style of data obtained under different shooting environments, thus improving re-identification performance. In addition, a real-world dataset of a railway station and actual problem requirements serve as our experimental targets. The HUAWEI Atlas 500 was used as the edge equipment for the testing phase. The experimental results indicate that our framework is effective and lightweight, can be deployed on edge devices, and can be applied to suspicious-person re-identification in public places.
(This article belongs to the Special Issue Edge AI for 6G and Internet of Things)
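Hypothetical glue code for the detect-then-re-identify pipeline described: the ultralytics/yolov5 hub model and torchreid's OSNet are stand-ins, not necessarily the authors' implementations, and both download pretrained weights on first use:

```python
import torch
import torch.nn.functional as F
import torchreid

detector = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
reid = torchreid.models.build_model('osnet_x1_0', num_classes=751,
                                    pretrained=True)   # class count is a placeholder
reid.eval()

frame = 'street.jpg'                       # one frame cut from the video
dets = detector(frame).crop(save=False)    # per-detection crops + metadata
with torch.no_grad():
    for det in dets:
        if det['label'].startswith('person'):
            crop = torch.from_numpy(det['im']).permute(2, 0, 1)[None].float()
            crop = F.interpolate(crop / 255.0, size=(256, 128))  # OSNet input
            feat = reid(crop)              # embedding for gallery matching
```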

21 pages, 31152 KB  
Article
StreetAware: A High-Resolution Synchronized Multimodal Urban Scene Dataset
by Yurii Piadyk, Joao Rulff, Ethan Brewer, Maryam Hosseini, Kaan Ozbay, Murugan Sankaradas, Srimat Chakradhar and Claudio Silva
Sensors 2023, 23(7), 3710; https://doi.org/10.3390/s23073710 - 3 Apr 2023
Cited by 13 | Viewed by 5586
Abstract
Limited access to high-quality data is an important barrier to the digital analysis of urban settings, including applications within computer vision and urban design. Diverse forms of data collected from sensors in areas of high activity in the urban environment, particularly at street intersections, are valuable resources for researchers interpreting the dynamics between vehicles, pedestrians, and the built environment. In this paper, we present a high-resolution audio, video, and LiDAR dataset of three urban intersections in Brooklyn, New York, totaling almost 8 unique hours. The data were collected with custom Reconfigurable Environmental Intelligence Platform (REIP) sensors designed to accurately synchronize multiple video and audio inputs. The resulting data are novel in that they are inclusively multimodal, multi-angular, high-resolution, and synchronized. We demonstrate four ways the data could be utilized: (1) to discover and locate occluded objects using multiple sensors and modalities, (2) to associate audio events with their respective visual representations using both video and audio modes, (3) to track the number of each type of object in a scene over time, and (4) to measure pedestrian speed using multiple synchronized camera views. In addition to these use cases, our data are available for other researchers to carry out analyses related to applying machine learning to understanding the urban environment (in which existing datasets may be inadequate), such as pedestrian–vehicle interaction modeling and pedestrian attribute recognition. Such analyses can help inform decisions made in the context of urban sensing and smart cities, including accessibility-aware urban design and Vision Zero initiatives.
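For use case (4), a minimal single-view sketch: map pixel positions to the ground plane with a homography and differentiate; with synchronized views the estimate can be repeated per camera and averaged. The homography values below are made up:

```python
import numpy as np

def pedestrian_speed(pixel_track, homography, fps):
    # Map pixel positions to ground-plane coordinates (meters), then
    # differentiate along the track.
    pts = np.hstack([pixel_track, np.ones((len(pixel_track), 1))])
    ground = (homography @ pts.T).T
    ground = ground[:, :2] / ground[:, 2:3]          # de-homogenize
    steps = np.linalg.norm(np.diff(ground, axis=0), axis=1)
    return steps.mean() * fps                        # meters per second

H = np.array([[0.02, 0.0, -5.0], [0.0, 0.025, -8.0], [0.0, 0.0, 1.0]])
track = np.array([[320.0, 400.0], [322.5, 400.0], [325.0, 400.0]])
print(f"{pedestrian_speed(track, H, fps=30):.2f} m/s")   # ~1.5 m/s, a walk
```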

19 pages, 4119 KB  
Article
Real-Time 3D Object Detection and Classification in Autonomous Driving Environment Using 3D LiDAR and Camera Sensors
by K. S. Arikumar, A. Deepak Kumar, Thippa Reddy Gadekallu, Sahaya Beni Prathiba and K. Tamilarasi
Electronics 2022, 11(24), 4203; https://doi.org/10.3390/electronics11244203 - 16 Dec 2022
Cited by 38 | Viewed by 7794
Abstract
The rapid development of Autonomous Vehicles (AVs) increases the requirement for accurate prediction of nearby objects to guarantee safer journeys. For effectively predicting objects, sensors such as Three-Dimensional Light Detection and Ranging (3D LiDAR) and cameras can be used. The 3D LiDAR sensor captures the 3D shape of an object and produces point cloud data that describe its geometrical structure. LiDAR-only detectors, however, are prone to false detections, or even non-detections, of objects located at long distances. The camera sensor captures RGB images with attributes sufficient to distinctly identify objects, and its high-resolution images benefit precise classification. However, hindrances such as the absence of depth information in images, unstructured point clouds, and cross-modality mismatches degrade environmental perception. To this end, this paper proposes an object detection mechanism that fuses the data received from the camera sensor and the 3D LiDAR sensor (OD-C3DL). The 3D LiDAR sensor obtains point clouds describing the object's distance, position, and geometric shape. OD-C3DL employs Convolutional Neural Networks (CNNs) to further process the point clouds obtained from the 3D LiDAR sensor and the camera sensor to recognize objects effectively. The LiDAR point cloud is enhanced and fused with the image space over the Regions of Interest (ROI) for easy recognition of the objects. The evaluation results show that OD-C3DL can provide an average of 89 real-time objects per frame and reduces extraction time, with a recall rate of 94%. The average processing time is 65 ms, which makes the OD-C3DL model well suited to AV perception. Furthermore, OD-C3DL achieves mean accuracy of 79.13% and 88.76% for identifying automobiles and pedestrians, respectively, at a moderate degree of difficulty, higher than previous models.
(This article belongs to the Special Issue Artificial Intelligence Technologies and Applications)
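The LiDAR-to-image step behind this kind of ROI fusion is the standard pinhole projection; a sketch with KITTI-like intrinsics (the paper's own calibration will differ):

```python
import numpy as np

def project_lidar_to_image(points, K, R, t):
    # Transform LiDAR points into the camera frame with extrinsics (R, t),
    # keep points in front of the camera, and project with intrinsics K.
    cam = R @ points.T + t.reshape(3, 1)          # (3, N) camera coordinates
    cam = cam[:, cam[2] > 0.1]
    pix = K @ cam
    pix = pix[:2] / pix[2]                        # (2, M) pixel coordinates
    return pix.T, cam[2]                          # pixels and their depths

K = np.array([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)                     # identity extrinsics (toy)
pts = np.random.rand(100, 3) * [10, 2, 30]        # synthetic LiDAR points
pix, depth = project_lidar_to_image(pts, K, R, t)
# An ROI-fusion step would now sample image features at `pix`.
```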

15 pages, 2697 KB  
Article
Binary Dense SIFT Flow Based Position-Information Added Two-Stream CNN for Pedestrian Action Recognition
by Sang Kyoo Park, Jun Ho Chung, Dong Sung Pae and Myo Taeg Lim
Appl. Sci. 2022, 12(20), 10445; https://doi.org/10.3390/app122010445 - 17 Oct 2022
Cited by 6 | Viewed by 2404
Abstract
Pedestrian behavior recognition in the driving environment is an important technology for preventing pedestrian accidents by predicting the next movement: recognizing current pedestrian behavior is necessary to predict future behavior. However, many studies have recognized visible human characteristics such as faces, body parts, or clothes, while few have recognized pedestrian behavior. Recognizing pedestrian behavior in the driving environment is challenging due to changes in the camera field of view caused by outdoor illumination conditions and vehicle movement. In this paper, to predict pedestrian behavior, we introduce a position-information-added two-stream convolutional neural network (CNN) with multi-task learning that is robust to the limited conditions of the outdoor driving environment. The conventional two-stream CNN is the most widely used model for human-action recognition, but its optical-flow variant has limitations for pedestrian behavior recognition from a moving vehicle because of the assumptions of brightness constancy and piecewise smoothness. The binary-descriptor dense scale-invariant feature transform (SIFT) flow, a feature-based matching algorithm, is robust for recognizing moving-pedestrian behavior, such as walking and standing, from a moving vehicle. However, recognizing cross attributes, such as crossing or not crossing the street, is challenging with binary-descriptor dense SIFT flow, because people who cross the road and people who do not perform the same walking action; only their location in the image differs. Therefore, pedestrian position information is added to the conventional binary-descriptor dense SIFT flow two-stream CNN, so that learning otherwise biased toward action attributes is balanced across action and cross attributes. In addition, YOLO detection and a Siamese tracker are used instead of ground-truth bounding boxes to demonstrate robustness in action- and cross-attribute recognition from a moving vehicle. The JAAD and PIE datasets were used for training, and only the JAAD dataset was used for testing in comparisons with other state-of-the-art research on multi-task and single-task learning.
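One plausible way to "add position information" to the motion stream, shown as a sketch: append constant channels holding the normalized bounding-box center, so "walking at the curb" and "walking on the road" become separable. How the paper actually encodes position may differ:

```python
import torch

def add_position_channels(flow, bbox, frame_size):
    # flow: (B, C, H, W) motion input (e.g. a dense SIFT-flow field).
    # bbox: (B, 4) pedestrian boxes as (x1, y1, x2, y2) in frame pixels.
    b, _, h, w = flow.shape
    fw, fh = frame_size
    cx = (bbox[:, 0] + bbox[:, 2]) / (2 * fw)     # normalized center x
    cy = (bbox[:, 1] + bbox[:, 3]) / (2 * fh)     # normalized center y
    pos = torch.stack([cx, cy], dim=1)[:, :, None, None].expand(b, 2, h, w)
    return torch.cat([flow, pos], dim=1)          # (B, C+2, H, W)

flow = torch.randn(4, 2, 224, 224)
boxes = torch.tensor([[100., 80., 180., 300.]]).repeat(4, 1)
x = add_position_channels(flow, boxes, frame_size=(1920, 1080))
```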

16 pages, 1007 KB  
Article
Occluded Pedestrian-Attribute Recognition for Video Sensors Using Group Sparsity
by Geonu Lee, Kimin Yun and Jungchan Cho
Sensors 2022, 22(17), 6626; https://doi.org/10.3390/s22176626 - 1 Sep 2022
Cited by 1 | Viewed by 3427
Abstract
Pedestrians are often obstructed by other objects or people in real-world vision sensors. These obstacles make pedestrian-attribute recognition (PAR) difficult; hence, occlusion processing for visual sensing is a key issue in PAR. To address this problem, we first formulate the identification of non-occluded frames as temporal attention based on the sparsity of a crowded video. In other words, a model for PAR is guided to avoid paying attention to occluded frames. However, we deduced that this approach cannot capture correlations between attributes when occlusion occurs. For example, "boots" and "shoe color" cannot be recognized simultaneously when the foot is invisible. To address this uncorrelated-attention issue, we propose a novel temporal-attention module based on group sparsity, applied across the attention weights of correlated attributes. Accordingly, physically adjacent pedestrian attributes are grouped, and the attention weights of a group are forced to focus on the same frames. Experimental results indicate that the proposed method achieved 1.18% and 6.21% higher F1-scores than the advanced baseline method on the occlusion samples of the DukeMTMC-VideoReID and MARS video-based PAR datasets, respectively.
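A sketch of the group-sparsity idea: an L2,1-style penalty over grouped attention rows, so that a group's attention weights concentrate on the same frames. The grouping and weighting here are illustrative assumptions:

```python
import torch

def group_sparse_penalty(attn, groups):
    # attn: (n_attributes, n_frames) temporal-attention weights.
    # For each group of physically adjacent attributes, take the L2 norm
    # over group members per frame and sum over frames (L2,1 norm): frames
    # are then kept or dropped jointly for the whole group.
    penalty = 0.0
    for g in groups:
        penalty = penalty + attn[g].norm(p=2, dim=0).sum()
    return penalty

attn = torch.rand(8, 12, requires_grad=True)   # 8 attributes, 12 frames
groups = [[0, 1], [2, 3, 4], [5], [6, 7]]      # e.g. {boots, shoe color}
loss = group_sparse_penalty(torch.softmax(attn, dim=1), groups)
```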

22 pages, 52325 KB  
Article
Multi-Level Fusion Model for Person Re-Identification by Attribute Awareness
by Shengyu Pei and Xiaoping Fan
Algorithms 2022, 15(4), 120; https://doi.org/10.3390/a15040120 - 30 Mar 2022
Viewed by 3038
Abstract
Existing person re-identification (Re-ID) methods usually suffer from poor generalization and over-fitting caused by insufficient training samples. We find that high-level attributes, semantic information, and part-based local information alignment are useful for person Re-ID networks. In this study, we propose a person re-identification network with part-based attribute-enhanced features. The model includes a multi-task learning module, a local information alignment module, and a global information learning module. A ResNet based on non-local attention and instance-batch normalization (IBN) learns more discriminative feature representations. The multi-task, local, and global modules are used in parallel for feature extraction. To better prevent over-fitting, the local information alignment module transforms pedestrian pose alignment into local information alignment to assist attribute recognition. Extensive experiments carried out on the Market-1501 and DukeMTMC-reID datasets demonstrate that the method outperforms most current algorithms.
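The IBN building block mentioned above is well defined in the IBN-Net literature; a compact sketch (the 50/50 channel split is the common choice, not necessarily this paper's):

```python
import torch
import torch.nn as nn

class IBN(nn.Module):
    # Instance-normalize half the channels (appearance invariance) and
    # batch-normalize the rest (content discrimination).
    def __init__(self, channels, ratio=0.5):
        super().__init__()
        self.half = int(channels * ratio)
        self.inorm = nn.InstanceNorm2d(self.half, affine=True)
        self.bnorm = nn.BatchNorm2d(channels - self.half)

    def forward(self, x):
        a, b = torch.split(x, [self.half, x.size(1) - self.half], dim=1)
        return torch.cat([self.inorm(a), self.bnorm(b)], dim=1)

y = IBN(64)(torch.randn(2, 64, 56, 56))
```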

25 pages, 8623 KB  
Article
A Framework for Pedestrian Attribute Recognition Using Deep Learning
by Saadman Sakib, Kaushik Deb, Pranab Kumar Dhar and Oh-Jin Kwon
Appl. Sci. 2022, 12(2), 622; https://doi.org/10.3390/app12020622 - 10 Jan 2022
Cited by 7 | Viewed by 5716
Abstract
The pedestrian attribute recognition task is becoming more popular daily because of its significant role in surveillance scenarios. With rapid technological advances, deep learning has come to the forefront of computer vision. Previous works have applied deep learning in different ways to recognize pedestrian attributes, with satisfactory results that still leave scope for improvement. The transfer learning technique is increasingly popular for its effectiveness in reducing computation cost and coping with data scarcity. This paper proposes a framework that can work in surveillance scenarios to recognize pedestrian attributes. A Mask R-CNN object detector extracts the pedestrians. Additionally, we applied transfer learning to different CNN architectures, i.e., Inception ResNet v2, Xception, ResNet 101 v2, and ResNet 152 v2. The main contribution of this paper is fine-tuning the ResNet 152 v2 architecture by freezing different sets of layers (the last 4, 8, 12, 14, 20, none, and all). Moreover, a data balancing technique, oversampling, is applied to resolve the class imbalance problem of the dataset, and the usefulness of this technique is analyzed in the paper. Our proposed framework outperforms state-of-the-art methods, providing 93.41% mA and 89.24% mA on the RAP v2 and PARSE100K datasets, respectively.
(This article belongs to the Special Issue Deep Vision Algorithms and Applications)
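One plausible reading of the freezing regime, sketched in Keras: freeze everything except the last N layers and train a multi-label sigmoid head. The head design, input size, optimizer, and attribute count are assumptions, not the paper's exact configuration:

```python
import tensorflow as tf

def build_finetune_model(n_attrs, unfreeze_last=14):
    # ResNet152V2 with ImageNet weights; only the last `unfreeze_last`
    # layers stay trainable (the paper sweeps 4/8/12/14/20/none/all).
    base = tf.keras.applications.ResNet152V2(
        include_top=False, weights='imagenet', pooling='avg',
        input_shape=(224, 224, 3))
    for layer in base.layers[:-unfreeze_last]:
        layer.trainable = False
    out = tf.keras.layers.Dense(n_attrs, activation='sigmoid')(base.output)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer='adam', loss='binary_crossentropy')
    return model

model = build_finetune_model(n_attrs=54)   # attribute count is a placeholder
```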

16 pages, 1297 KB  
Article
Skeleton-Based Attention Mask for Pedestrian Attribute Recognition Network
by Sorn Sooksatra and Sitapa Rujikietgumjorn
J. Imaging 2021, 7(12), 264; https://doi.org/10.3390/jimaging7120264 - 4 Dec 2021
Cited by 4 | Viewed by 3098
Abstract
This paper presents an extended model for a pedestrian attribute recognition network that utilizes skeleton data as a soft attention model to extract local features corresponding to specific attributes. This technique helps retain valuable information surrounding the target area and handles variations in human posture. The attention masks were designed to focus on partial and whole-body regions. This research utilized an augmentation layer for data augmentation inside the network to reduce over-fitting errors. Our network was evaluated on two datasets (RAP and PETA) with various backbone networks (ResNet-50, Inception V3, and Inception-ResNet V2). The experimental results show that our network improves overall classification performance by about 2–3% in mean accuracy over the same backbone network, especially for local attributes and varied human postures.
(This article belongs to the Special Issue Advances in Human Action Recognition Using Deep Learning)
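A sketch of a skeleton-driven soft attention mask: Gaussian bumps at keypoint locations, rendered at feature-map resolution and multiplied onto the backbone features. The sigma and normalization are illustrative choices; the paper defines its own partial- and whole-body mask variants:

```python
import torch

def skeleton_attention_mask(keypoints, feat_hw, sigma=0.1):
    # keypoints: list of (x, y) joint positions normalized to [0, 1].
    h, w = feat_hw
    ys = torch.linspace(0, 1, h)[:, None].expand(h, w)
    xs = torch.linspace(0, 1, w)[None, :].expand(h, w)
    mask = torch.zeros(h, w)
    for x, y in keypoints:
        mask += torch.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return (mask / mask.max()).clamp(0, 1)        # (H, W) soft attention

upper_body = [(0.5, 0.15), (0.35, 0.3), (0.65, 0.3)]   # head and shoulders
mask = skeleton_attention_mask(upper_body, feat_hw=(24, 12))
features = torch.randn(1, 2048, 24, 12) * mask          # apply attention
```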

44 pages, 985 KB  
Article
Human Attribute Recognition—A Comprehensive Survey
by Ehsan Yaghoubi, Farhad Khezeli, Diana Borza, SV Aruna Kumar, João Neves and Hugo Proença
Appl. Sci. 2020, 10(16), 5608; https://doi.org/10.3390/app10165608 - 13 Aug 2020
Cited by 14 | Viewed by 7263
Abstract
Human Attribute Recognition (HAR) is a highly active research field in the computer vision and pattern recognition domains, with applications such as surveillance and fashion. Several approaches have been proposed to tackle the particular challenges of HAR, but these approaches have changed dramatically over the last decade, mainly due to improvements brought by deep learning solutions. To provide insights for future algorithm design and dataset collection, in this survey, (1) we provide an in-depth analysis of existing HAR techniques, concerning the advances proposed to address HAR's main challenges; (2) we provide a comprehensive discussion of the publicly available datasets for the development and evaluation of novel HAR approaches; and (3) we outline the applications and typical evaluation metrics used in the HAR context.
(This article belongs to the Section Computing and Artificial Intelligence)