**Xiangpeng Song \*, Hongbin Yang and Congcong Zhou**

School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; hbyoungshu@staff.shu.edu.cn (H.Y.); zhoucongcong@shu.edu.cn (C.Z.)

**\*** Correspondence: sxptom@shu.edu.cn; Tel.: +86-18019118350

Received: 23 October 2019; Accepted: 15 November 2019; Published: 19 November 2019

**Abstract:** Pedestrian attribute recognition is to predict a set of attribute labels of the pedestrian from surveillance scenarios, which is a very challenging task for computer vision due to poor image quality, continual appearance variations, as well as diverse spatial distribution of imbalanced attributes. It is desirable to model the label dependencies between different attributes to improve the recognition performance as each pedestrian normally possesses many attributes. In this paper, we treat pedestrian attribute recognition as multi-label classification and propose a novel model based on the graph convolutional network (GCN). The model is mainly divided into two parts, we first use convolutional neural network (CNN) to extract pedestrian feature, which is a normal operation processing image in deep learning, then we transfer attribute labels to word embedding and construct a correlation matrix between labels to help GCN propagate information between nodes. This paper applies the object classifiers learned by GCN to the image representation extracted by CNN to enable the model to have the ability to be end-to-end trainable. Experiments on pedestrian attribute recognition dataset show that the approach obviously outperforms other existing state-of-the-art methods.

**Keywords:** pedestrian attribute recognition; graph convolutional network; multi-label learning

#### **1. Introduction**

Video surveillance is a part of our daily life, with the advent of the era of artificial intelligence, intelligent video analytics attaches great importance to the modern city since it can pre-alarm abnormal behaviors or events [1]. In this paper, we mainly focus on the pedestrian in the surveillance system. Human attribute analysis has recently drawn a remarkable amount of attentions by researchers for person detection and re-identification, as well as widely applied in plenty of aspects [2]; besides, pedestrian structured representation can obviously reduce surveillance video storage and improve pedestrian retrieve speed in the surveillance system.

However, there are still plenty of challenges. For one thing, human attributes naturally involve largely imbalanced data distribution. For example, when collecting the attribute "Bald", most of them will be labeled as "No Bald" and its imbalanced ratio to the "Bald" class is usually very large [3]. For another, there are plenty of uncertainties in practical scenarios, the resolution of images may be lower and the human body may be blocked by other things [4]. Furthermore, the collecting of labeled samples is labor-consuming.

Extreme learning machine (ELM) has gained increasing interest from various research fields for certain years. Apart from classification and regression, ELM has recently been extended for clustering, feature selection, representational learning, and many other learning tasks [5]. As an efficient single-hidden-layer feed forward neural network, which has generally good performance and fast learning speed, ELM has been applied in a variety of domains, such as computer vision, energy disaggregation [6], and speech enhancement [7]. In [8], authors proposed a novel pedestrian

detection method using multimodal Histogram of Oriented Gradient for pedestrian feature extraction and extreme learning machine for classification to reduce the detection rate of false positives and accelerate processing speed. The experimental results have proved the efficiency of the ELM based method.

Recently, researchers mostly used convolutional neural network to extract image features due to the rapid development of deep learning. As for multi-label classification, researchers have proposed some approaches based on the probabilistic graph model or recurrent attention model to deal with the problem. It is worth mentioning that attention mechanisms is also a popular method. Inspired by [9], we propose a novel model based on the graph convolutional network to model the correlations between labels. For example, when the label "Long Hair" occurs, the label "Female" is much more likely to show up than the label "Male". Following this idea, we construct an adjacency matrix between labels to deliver this correlation into classifiers, then combine the image representation to produce multi-label loss. Our code is hosted at https://github.com/2014gaokao/pedestrian-attribute-recognition-with-GCN. The contributions of the paper are as follows:


The result of the paper is organized as follows: Section 2 comprehensively introduces related work in pedestrian attribute recognition. Section 3 describes the overall architecture of our model in detail. Section 4 verified the superiority of our method using experiments. Section 5 concludes the paper and states future research directions.
