

Computer Vision in Human Analysis: From Face and Body to Clothes

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (30 November 2022) | Viewed by 31236

Special Issue Editors


Prof. Dr. Mohamed Daoudi
Guest Editor
IMT Lille Douai, Institut Mines-Télécom, Centre for Digital Systems, F-59000 Lille, France
Interests: computer vision; pattern recognition; face and facial expression recognition; action recognition

Prof. Dr. Roberto Vezzani
Guest Editor
Department of Engineering (DIEF), University of Modena and Reggio Emilia, 41125 Modena, Italy
Interests: computer vision; deep learning; vision-based HCI; IoT

Dr. Guido Borghi
Guest Editor
Department of Computer Science and Engineering (DISI), University of Bologna, 40126 Bologna, Italy
Interests: computer vision; deep learning; face analysis; biometrics

Dr. Claudio Ferrari
Guest Editor
Department of Architecture and Engineering, University of Parma, Parco Area delle Scienze 181/A, Parma, Italy
Interests: computer vision; pattern recognition; machine learning; artificial intelligence

Dr. Marcella Cornia
Guest Editor
Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, Via P. Vivarelli 10, 41125 Modena, Italy
Interests: image captioning; saliency prediction; vision and language; computer vision; embodied AI; deep learning; artificial intelligence

Dr. Federico Becattini
Guest Editor
Media Integration and Communication Center (MICC), University of Florence, 50121 Florence, Italy
Interests: computer vision; object detection; semantic annotation; action recognition

Dr. Andrea Pilzer
Guest Editor
Department of Computer Science, Aalto University, 02150 Espoo, Finland
Interests: computer vision; deep learning

Special Issue Information

Dear Colleagues,

Human-centered data are extremely widespread and have been intensively investigated by researchers belonging to different fields, including computer vision, machine learning, and artificial intelligence. These research efforts are motivated by the many highly informative aspects of humans that can be analyzed, ranging from corporal elements (e.g., bodies, faces, hands, anthropometric measurements) to emotions and outward appearance (e.g., human garments and accessories).

We encourage submissions from all areas of computer vision, focusing on the analysis of humans. More general contributions such as novel theories, frameworks, architectures, and datasets are also welcome. The topics of interest include, but are not limited to, the following:

Human Body

  • People detection and tracking;
  • 2D/3D human pose estimation;
  • Action and gesture recognition;
  • Anthropometric measurement estimation;
  • Gait analysis;
  • Person re-identification;
  • 3D body reconstruction.

Human Face

  • Facial landmark detection;
  • Head pose estimation;
  • Facial expression and emotion recognition.

Outward Appearance

  • Garment-based virtual try-on;
  • Human-centered image and video synthesis;
  • Generative clothing;
  • Human clothing and attribute recognition;
  • Fashion image manipulation;
  • Outfit recommendation.

Human-Centered Data

  • Novel datasets with human data;
  • Fairness and biases in human analysis;
  • Privacy preservation and data anonymization;
  • First-person vision for human behavior understanding;
  • Multimodal data fusion for human analysis;
  • Computational issues in human analysis architectures.

Biometrics

  • Face recognition and verification;
  • Fingerprint and iris recognition;
  • Morphing attack detection.

Prof. Dr. Mohamed Daoudi
Prof. Dr. Roberto Vezzani
Dr. Guido Borghi
Dr. Claudio Ferrari
Dr. Marcella Cornia
Dr. Federico Becattini
Dr. Andrea Pilzer
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (14 papers)


Editorial


3 pages, 178 KiB  
Editorial
Computer Vision in Human Analysis: From Face and Body to Clothes
by Mohamed Daoudi, Roberto Vezzani, Guido Borghi, Claudio Ferrari, Marcella Cornia, Federico Becattini and Andrea Pilzer
Sensors 2023, 23(12), 5378; https://doi.org/10.3390/s23125378 - 06 Jun 2023
Viewed by 1080
Abstract
For decades, researchers of different areas, ranging from artificial intelligence to computer vision, have intensively investigated human-centered data, i [...] Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)

Research


21 pages, 2571 KiB  
Article
Weakly Supervised 2D Pose Adaptation and Body Part Segmentation for Concealed Object Detection
by Lawrence Amadi and Gady Agam
Sensors 2023, 23(4), 2005; https://doi.org/10.3390/s23042005 - 10 Feb 2023
Cited by 5 | Viewed by 1645
Abstract
Weakly supervised pose estimation can be used to assist unsupervised body part segmentation and concealed item detection. The accuracy of pose estimation is essential for precise body part segmentation and accurate concealed item detection. In this paper, we show how poses obtained from an RGB pretrained 2D pose detector can be modified for the backscatter image domain. The 2D poses are refined using RANSAC bundle adjustment to minimize the projection loss in 3D. Furthermore, we show how 2D poses can be optimized using a newly proposed 3D-to-2D pose correction network weakly supervised with pose prior regularizers and multi-view pose and posture consistency losses. The optimized 2D poses are used to segment human body parts. We then train a body-part-aware anomaly detection network to detect foreign (concealed threat) objects on segmented body parts. Our work is applied to the TSA passenger screening dataset containing millimeter wave scan images of airport travelers annotated with only binary labels that indicate whether a foreign object is concealed on a body part. Our proposed approach significantly improves detection accuracy on TSA 2D backscatter images over existing works, achieving state-of-the-art performance of a 97% F1-score and 0.0559 log-loss on the TSA-PSD test set, as well as a 74% reduction in 2D pose error. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
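To make the refinement step above concrete, here is a minimal Python sketch (not the authors' code) of the multi-view reprojection error that a RANSAC-style bundle adjustment of 2D joint detections against a shared 3D pose hypothesis would minimize; all function and variable names are illustrative assumptions.

    # Minimal sketch: mean multi-view reprojection error of 3D joints against 2D detections.
    import numpy as np

    def reprojection_error(joints_3d, joints_2d_per_view, projections):
        """joints_3d: (J, 3); joints_2d_per_view: list of (J, 2); projections: list of (3, 4)."""
        homo = np.hstack([joints_3d, np.ones((joints_3d.shape[0], 1))])  # (J, 4) homogeneous points
        total = 0.0
        for P, joints_2d in zip(projections, joints_2d_per_view):
            proj = homo @ P.T                    # (J, 3) projected homogeneous image points
            proj = proj[:, :2] / proj[:, 2:3]    # perspective divide
            total += np.linalg.norm(proj - joints_2d, axis=1).mean()
        return total / len(projections)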

16 pages, 1409 KiB  
Article
Fashion-Oriented Image Captioning with External Knowledge Retrieval and Fully Attentive Gates
by Nicholas Moratelli, Manuele Barraco, Davide Morelli, Marcella Cornia, Lorenzo Baraldi and Rita Cucchiara
Sensors 2023, 23(3), 1286; https://doi.org/10.3390/s23031286 - 23 Jan 2023
Cited by 8 | Viewed by 2656
Abstract
Research related to fashion and e-commerce domains is gaining attention in computer vision and multimedia communities. Following this trend, this article tackles the task of generating fine-grained and accurate natural language descriptions of fashion items, a recently proposed and under-explored challenge that is still far from being solved. To overcome the limitations of previous approaches, a transformer-based captioning model was designed with the integration of external textual memory that could be accessed through k-nearest neighbor (kNN) searches. From an architectural point of view, the proposed transformer model can read and retrieve items from the external memory through cross-attention operations, and tune the flow of information coming from the external memory thanks to a novel fully attentive gate. Experimental analyses were carried out on the fashion captioning dataset (FACAD) for fashion image captioning, which contains more than 130k fine-grained descriptions, validating the effectiveness of the proposed approach and the proposed architectural strategies in comparison with carefully designed baselines and state-of-the-art approaches. The presented method consistently outperforms all compared approaches, demonstrating its effectiveness for fashion image captioning. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
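As a hedged illustration of the retrieval-augmented design described in the abstract, the sketch below shows one plausible way to combine cross-attention over encoded kNN memory entries with a learned gate in PyTorch; the module name, layer sizes, and gating formulation are assumptions rather than the paper's exact architecture.

    # Illustrative sketch: cross-attention over retrieved textual memory, modulated by a learned gate.
    import torch
    import torch.nn as nn

    class GatedMemoryCrossAttention(nn.Module):
        def __init__(self, d_model=512, n_heads=8):
            super().__init__()
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())

        def forward(self, decoder_states, memory):
            # decoder_states: (B, T, D) caption token states; memory: (B, K, D) encoded kNN retrievals
            retrieved, _ = self.attn(decoder_states, memory, memory)
            g = self.gate(torch.cat([decoder_states, retrieved], dim=-1))  # per-token gate in (0, 1)
            return decoder_states + g * retrieved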

13 pages, 4267 KiB  
Article
Generating High-Resolution 3D Faces and Bodies Using VQ-VAE-2 with PixelSNAIL Networks on 2D Representations
by Alessio Gallucci, Dmitry Znamenskiy, Yuxuan Long, Nicola Pezzotti and Milan Petkovic
Sensors 2023, 23(3), 1168; https://doi.org/10.3390/s23031168 - 19 Jan 2023
Cited by 2 | Viewed by 2710
Abstract
Modeling and representing 3D shapes of the human body and face is a prominent field due to its applications in the healthcare, clothes, and movie industry. In our work, we tackled the problem of 3D face and body synthesis by reducing 3D meshes to 2D image representations. We show that the face can naturally be modeled on a 2D grid. At the same time, for more challenging 3D body geometries, we proposed a novel non-bijective 3D–2D conversion method representing the 3D body mesh as a plurality of rendered projections on the 2D grid. Then, we trained a state-of-the-art vector-quantized variational autoencoder (VQ-VAE-2) to learn a latent representation of 2D images and fit a PixelSNAIL autoregressive model to sample novel synthetic meshes. We evaluated our method versus a classical one based on principal component analysis (PCA) by sampling from the empirical cumulative distribution of the PCA scores. We used the empirical distributions of two commonly used metrics, specificity and diversity, to quantitatively demonstrate that the synthetic faces generated with our method are statistically closer to real faces when compared with the PCA ones. Our experiment on the 3D body geometry requires further research to match the test set statistics but shows promising results. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
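For readers unfamiliar with the two evaluation metrics mentioned above, the following sketch computes them under their common definitions for generative shape models (the paper's exact formulation may differ): specificity as the mean distance from each synthetic mesh to its nearest real mesh, and diversity as the mean pairwise distance among synthetic meshes.

    # Rough sketch of specificity and diversity on flattened vertex arrays; names are illustrative.
    import numpy as np

    def specificity(synthetic, real):
        # synthetic: (S, V*3) flattened vertices; real: (R, V*3)
        d = np.linalg.norm(synthetic[:, None, :] - real[None, :, :], axis=-1)  # (S, R) distances
        return d.min(axis=1).mean()   # mean distance to the closest real sample

    def diversity(synthetic):
        d = np.linalg.norm(synthetic[:, None, :] - synthetic[None, :, :], axis=-1)
        return d[np.triu_indices(len(synthetic), k=1)].mean()  # mean pairwise distance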

19 pages, 26861 KiB  
Article
Can Hierarchical Transformers Learn Facial Geometry?
by Paul Young, Nima Ebadi, Arun Das, Mazal Bethany, Kevin Desai and Peyman Najafirad
Sensors 2023, 23(2), 929; https://doi.org/10.3390/s23020929 - 13 Jan 2023
Cited by 1 | Viewed by 1744
Abstract
Human faces are a core part of our identity and expression, and thus, understanding facial geometry is key to capturing this information. Automated systems that seek to make use of this information must have a way of modeling facial features in a way that makes them accessible. Hierarchical, multi-level architectures have the capability of capturing the different resolutions of representation involved. In this work, we propose using a hierarchical transformer architecture as a means of capturing a robust representation of facial geometry. We further demonstrate the versatility of our approach by using this transformer as a backbone to support three facial representation problems: face anti-spoofing, facial expression representation, and deepfake detection. The combination of effective fine-grained details alongside global attention representations makes this architecture an excellent candidate for these facial representation problems. We conduct numerous experiments first showcasing the ability of our approach to address common issues in facial modeling (pose, occlusions, and background variation) and capture facial symmetry, then demonstrating its effectiveness on three supplemental tasks. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)

17 pages, 2397 KiB  
Article
Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation
by Tomohiro Fujita and Yasutomo Kawanishi
Sensors 2023, 23(2), 876; https://doi.org/10.3390/s23020876 - 12 Jan 2023
Cited by 3 | Viewed by 2414
Abstract
Human pose prediction is vital for robot applications such as human–robot interaction and autonomous control of robots. Recent prediction methods often use deep learning and are based on a 3D human skeleton sequence to predict future poses. Even if the starting motions of 3D human skeleton sequences are very similar, their future poses can vary widely. This makes it difficult to predict future poses from a given human skeleton sequence alone. Meanwhile, when carefully observing human motions, we find that they are often affected by objects or other people around the target person. We consider the presence of surrounding objects to be an important clue for the prediction. This paper proposes a method for predicting the future skeleton sequence by incorporating the surrounding situation into the prediction model. The proposed method uses a feature of an image around the target person as the surrounding information. We confirmed the performance improvement of the proposed method through evaluations on publicly available datasets. As a result, the prediction accuracy was improved for object-related and human-related motions. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
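A minimal sketch of the fusion idea described above, not the paper's model: a recurrent encoder summarizes the past skeleton sequence, an image feature of the surroundings is concatenated to it, and a linear head predicts the future joint positions. Layer types and sizes are assumptions.

    # Illustrative sketch: skeleton-sequence encoding fused with a surrounding-image feature.
    import torch
    import torch.nn as nn

    class ContextAwarePosePredictor(nn.Module):
        def __init__(self, n_joints=17, d_hidden=256, d_context=512, future_len=10):
            super().__init__()
            self.encoder = nn.GRU(n_joints * 3, d_hidden, batch_first=True)
            self.head = nn.Linear(d_hidden + d_context, future_len * n_joints * 3)
            self.future_len, self.n_joints = future_len, n_joints

        def forward(self, skeleton_seq, context_feat):
            # skeleton_seq: (B, T, J*3) past 3D joints; context_feat: (B, d_context) image feature
            _, h = self.encoder(skeleton_seq)
            fused = torch.cat([h[-1], context_feat], dim=-1)
            out = self.head(fused)
            return out.view(-1, self.future_len, self.n_joints, 3)  # (B, future_len, J, 3)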

15 pages, 2857 KiB  
Article
Joint-Based Action Progress Prediction
by Davide Pucci, Federico Becattini and Alberto Del Bimbo
Sensors 2023, 23(1), 520; https://doi.org/10.3390/s23010520 - 03 Jan 2023
Cited by 2 | Viewed by 2222
Abstract
Action understanding is a fundamental computer vision branch for several applications, ranging from surveillance to robotics. Most works deal with localizing and recognizing the action in both time and space, without providing a characterization of its evolution. Recent works have addressed the prediction of action progress, which is an estimate of how far the action has advanced as it is performed. In this paper, we propose to predict action progress using a different modality compared to previous methods: body joints. Human body joints carry very precise information about human poses, which we believe are a much more lightweight and effective way of characterizing actions and therefore their execution. Action progress can in fact be estimated by understanding how key poses follow each other during the development of an activity. We show how an action progress prediction model can exploit body joints and be integrated with modules providing keypoint and action information so that it can run directly from raw pixels. The proposed method is experimentally validated on the Penn Action Dataset. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
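As an illustrative sketch of joint-based progress prediction (not the proposed model), the snippet below regresses a per-frame progress value in [0, 1] from a sequence of 2D body joints; the architecture and the default of 13 joints (as in Penn Action annotations) are assumptions.

    # Rough sketch: per-frame action progress regression from 2D joint sequences.
    import torch
    import torch.nn as nn

    class JointProgressRegressor(nn.Module):
        def __init__(self, n_joints=13, d_hidden=128):
            super().__init__()
            self.rnn = nn.LSTM(n_joints * 2, d_hidden, batch_first=True)
            self.head = nn.Sequential(nn.Linear(d_hidden, 1), nn.Sigmoid())

        def forward(self, joints):
            # joints: (B, T, J*2) keypoint coordinates from a pose estimator
            out, _ = self.rnn(joints)
            return self.head(out).squeeze(-1)  # (B, T) estimated progress per frame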

16 pages, 1264 KiB  
Article
Towards Context-Aware Facial Emotion Reaction Database for Dyadic Interaction Settings
by Abdallah Hussein Sham, Amna Khan, David Lamas, Pia Tikka and Gholamreza Anbarjafari
Sensors 2023, 23(1), 458; https://doi.org/10.3390/s23010458 - 01 Jan 2023
Cited by 2 | Viewed by 2044
Abstract
Emotion recognition is a significant issue in many sectors that use human emotion reactions as communication for marketing, technological equipment, or human–robot interaction. The realistic facial behavior of social robots and artificial agents is still a challenge, limiting their emotional credibility in dyadic face-to-face situations with humans. One obstacle is the lack of appropriate training data on how humans typically interact in such settings. This article focuses on collecting the facial behavior of 60 participants to create a new type of dyadic emotion reaction database. For this purpose, we propose a methodology that automatically captures the facial expressions of participants via webcam while they are engaged with other people (facial videos) in emotionally primed contexts. The data were then analyzed using three different Facial Expression Analysis (FEA) tools: iMotions, the Mini-Xception model, and the Py-Feat FEA toolkit. Although the emotion reactions were reported as genuine, the comparative analysis showed that the aforementioned models did not agree on a single emotion reaction prediction. Based on this result, a more robust and effective model for emotion reaction prediction is needed. The relevance of this work for human–computer interaction studies lies in its novel approach to developing adaptive behaviors for synthetic human-like beings (virtual or robotic), allowing them to simulate human facial interaction behavior in contextually varying dyadic situations with humans. This article should be useful for researchers using human emotion analysis while deciding on a suitable methodology to collect facial expression reactions in a dyadic setting. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)

19 pages, 1697 KiB  
Article
Vision-Based Eye Image Classification for Ophthalmic Measurement Systems
by Giovanni Gibertoni, Guido Borghi and Luigi Rovati
Sensors 2023, 23(1), 386; https://doi.org/10.3390/s23010386 - 29 Dec 2022
Cited by 5 | Viewed by 2687
Abstract
The accuracy and the overall performances of ophthalmic instrumentation, where specific analysis of eye images is involved, can be negatively influenced by invalid or incorrect frames acquired during everyday measurements of unaware or non-collaborative human patients and non-technical operators. Therefore, in this paper, we investigate and compare the adoption of several vision-based classification algorithms belonging to different fields, i.e., Machine Learning, Deep Learning, and Expert Systems, in order to improve the performance of an ophthalmic instrument designed for the Pupillary Light Reflex measurement. To test the implemented solutions, we collected and publicly released PopEYE as one of the first datasets consisting of 15 k eye images belonging to 22 different subjects acquired through the aforementioned specialized ophthalmic device. Finally, we discuss the experimental results in terms of classification accuracy of the eye status, as well as computational load analysis, since the proposed solution is designed to be implemented in embedded boards, which have limited hardware resources in computational power and memory size. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)

19 pages, 11413 KiB  
Article
3D Gaze Estimation Using RGB-IR Cameras
by Moayad Mokatren, Tsvi Kuflik and Ilan Shimshoni
Sensors 2023, 23(1), 381; https://doi.org/10.3390/s23010381 - 29 Dec 2022
Cited by 4 | Viewed by 2114
Abstract
In this paper, we present a framework for 3D gaze estimation intended to identify the user’s focus of attention in a corneal imaging system. The framework uses a headset that consists of three cameras, a scene camera and two eye cameras: an IR camera and an RGB camera. The IR camera is used to continuously and reliably track the pupil and the RGB camera is used to acquire corneal images of the same eye. Deep learning algorithms are trained to detect the pupil in IR and RGB images and to compute a per user 3D model of the eye in real time. Once the 3D model is built, the 3D gaze direction is computed starting from the eyeball center and passing through the pupil center to the outside world. This model can also be used to transform the pupil position detected in the IR image into its corresponding position in the RGB image and to detect the gaze direction in the corneal image. This technique circumvents the problem of pupil detection in RGB images, which is especially difficult and unreliable when the scene is reflected in the corneal images. In our approach, the auto-calibration process is transparent and unobtrusive. Users do not have to be instructed to look at specific objects to calibrate the eye tracker. They need only to act and gaze normally. The framework was evaluated in a user study in realistic settings and the results are promising. It achieved a very low 3D gaze error (2.12°) and very high accuracy in acquiring corneal images (intersection over union—IoU = 0.71). The framework may be used in a variety of real-world mobile scenarios (indoors, indoors near windows and outdoors) with high accuracy. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
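The geometric core of the gaze computation described above can be written in a few lines; the sketch below assumes both points are expressed in the same headset coordinate frame and also shows the angular error measure typically behind results such as the reported 2.12°.

    # Minimal geometric sketch: gaze ray from the eyeball center through the pupil center.
    import numpy as np

    def gaze_direction(eyeball_center, pupil_center):
        """Both arguments are 3D points in the same coordinate frame."""
        v = np.asarray(pupil_center, dtype=float) - np.asarray(eyeball_center, dtype=float)
        return v / np.linalg.norm(v)  # unit gaze direction

    def angular_error_deg(g_est, g_gt):
        # Angle in degrees between estimated and ground-truth unit gaze directions.
        cosang = np.clip(np.dot(g_est, g_gt), -1.0, 1.0)
        return np.degrees(np.arccos(cosang))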

26 pages, 14037 KiB  
Article
Low-Cost Human–Machine Interface for Computer Control with Facial Landmark Detection and Voice Commands
by Pablo Ramos, Mireya Zapata, Kevin Valencia, Vanessa Vargas and Carlos Ramos-Galarza
Sensors 2022, 22(23), 9279; https://doi.org/10.3390/s22239279 - 29 Nov 2022
Cited by 4 | Viewed by 2934
Abstract
Nowadays, daily life involves the extensive use of computers, since human beings are immersed in a technological society. Therefore, it is mandatory to interact with computers, which represents a true disadvantage for people with upper limb disabilities. In this context, this work aims to develop an interface for emulating mouse and keyboard functions (EMKEY) by applying concepts of artificial vision and voice recognition to replace the use of hands. Pointer control is achieved by head movement, whereas voice recognition is used to perform interface functionalities, including speech-to-text transcription. To evaluate the interface’s usability and usefulness, two studies were carried out. The first study was performed with 30 participants without physical disabilities. Throughout this study, there were significant correlations found between the emulator’s usability and aspects such as adaptability, execution time, and the participant’s age. In the second study, the use of the emulator was analyzed by four participants with motor disabilities. It was found that the interface was best used by the participant with cerebral palsy, followed by the participants with upper limb paralysis, spina bifida, and muscular dystrophy. In general, the results show that the proposed interface is easy to use, practical, fairly accurate, and works on a wide range of computers. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
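As a rough, hypothetical illustration of the head-movement pointer control described above (not EMKEY's implementation), the sketch maps the frame-to-frame displacement of a tracked facial landmark, such as a nose-tip point from any landmark detector, to relative cursor motion; the pyautogui call, gain, and dead-zone values are assumptions.

    # Hypothetical sketch: translate facial-landmark displacement into relative cursor movement.
    import numpy as np
    import pyautogui

    def move_cursor_from_landmark(prev_landmark, curr_landmark, gain=8.0, dead_zone=1.0):
        # prev_landmark, curr_landmark: (x, y) pixel position of a tracked facial landmark
        dx, dy = np.subtract(curr_landmark, prev_landmark)
        if np.hypot(dx, dy) > dead_zone:          # ignore small jitter around the rest position
            pyautogui.moveRel(gain * dx, gain * dy)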

19 pages, 5149 KiB  
Article
Capturing Conversational Gestures for Embodied Conversational Agents Using an Optimized Kanade–Lucas–Tomasi Tracker and Denavit–Hartenberg-Based Kinematic Model
by Grega Močnik, Zdravko Kačič, Riko Šafarič and Izidor Mlakar
Sensors 2022, 22(21), 8318; https://doi.org/10.3390/s22218318 - 29 Oct 2022
Cited by 5 | Viewed by 1959
Abstract
In order to recreate viable and human-like conversational responses, the artificial entity, i.e., an embodied conversational agent, must express correlated speech (verbal) and gestures (non-verbal) responses in spoken social interaction. Most of the existing frameworks focus on intent planning and behavior planning. The realization, however, is left to a limited set of static 3D representations of conversational expressions. In addition to functional and semantic synchrony between verbal and non-verbal signals, the final believability of the displayed expression is sculpted by the physical realization of non-verbal expressions. A major challenge of most conversational systems capable of reproducing gestures is the diversity in expressiveness. In this paper, we propose a method for capturing gestures automatically from videos and transforming them into 3D representations stored as part of the conversational agent’s repository of motor skills. The main advantage of the proposed method is ensuring the naturalness of the embodied conversational agent’s gestures, which results in a higher quality of human-computer interaction. The method is based on a Kanade–Lucas–Tomasi tracker, a Savitzky–Golay filter, a Denavit–Hartenberg-based kinematic model and the EVA framework. Furthermore, we designed an objective method based on cosine similarity instead of a subjective evaluation of synthesized movement. The proposed method resulted in a 96% similarity. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
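A compressed sketch of the signal-processing side of this pipeline, with illustrative parameters only: Kanade–Lucas–Tomasi point tracking between frames, Savitzky–Golay smoothing of the resulting trajectories, and cosine similarity as an objective comparison between captured and synthesized motion.

    # Illustrative sketch of KLT tracking, trajectory smoothing, and cosine-similarity comparison.
    import cv2
    import numpy as np
    from scipy.signal import savgol_filter

    def track_points(prev_gray, next_gray, prev_pts):
        # prev_pts: float32 array of shape (N, 1, 2), e.g., from cv2.goodFeaturesToTrack
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, prev_pts, None)
        return next_pts[status.ravel() == 1]

    def smooth_trajectory(traj, window=9, polyorder=3):
        # traj: (T, 2) tracked point positions over time; window must be odd and <= T
        return savgol_filter(traj, window_length=window, polyorder=polyorder, axis=0)

    def cosine_similarity(a, b):
        a, b = a.ravel(), b.ravel()
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))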

18 pages, 2019 KiB  
Article
Camera Motion Agnostic Method for Estimating 3D Human Poses
by Seong Hyun Kim, Sunwon Jeong, Sungbum Park and Ju Yong Chang
Sensors 2022, 22(20), 7975; https://doi.org/10.3390/s22207975 - 19 Oct 2022
Cited by 2 | Viewed by 1468
Abstract
Although the performance of 3D human pose and shape estimation methods has improved considerably in recent years, existing approaches typically generate 3D poses defined in a camera or human-centered coordinate system. This makes it difficult to estimate a person’s pure pose and motion in a world coordinate system for a video captured using a moving camera. To address this issue, this paper presents a camera motion agnostic approach for predicting 3D human pose and mesh defined in the world coordinate system. The core idea of the proposed approach is to estimate the difference between two adjacent global poses (i.e., global motion) that is invariant to selecting the coordinate system, instead of the global pose coupled to the camera motion. To this end, we propose a network based on bidirectional gated recurrent units (GRUs) that predicts the global motion sequence from the local pose sequence consisting of relative rotations of joints called global motion regressor (GMR). We use 3DPW and synthetic datasets, which are constructed in a moving-camera environment, for evaluation. We conduct extensive experiments and prove the effectiveness of the proposed method empirically. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
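To clarify the regressor described above, here is a rough sketch of a bidirectional GRU that maps a sequence of local joint rotations to per-frame global motion deltas; the layer sizes and the 6D rotation parameterization are assumptions, not the paper's exact design.

    # Rough sketch of a global motion regressor: local pose sequence in, per-frame motion deltas out.
    import torch
    import torch.nn as nn

    class GlobalMotionRegressor(nn.Module):
        def __init__(self, n_joints=24, d_rot=6, d_hidden=512):
            super().__init__()
            self.gru = nn.GRU(n_joints * d_rot, d_hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * d_hidden, d_rot + 3)  # rotation delta + translation delta

        def forward(self, local_pose_seq):
            # local_pose_seq: (B, T, J*d_rot) relative joint rotations
            out, _ = self.gru(local_pose_seq)
            return self.head(out)  # (B, T, d_rot + 3) global motion between adjacent frames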

14 pages, 27078 KiB  
Article
Would Your Clothes Look Good on Me? Towards Transferring Clothing Styles with Adaptive Instance Normalization
by Tomaso Fontanini and Claudio Ferrari
Sensors 2022, 22(13), 5002; https://doi.org/10.3390/s22135002 - 02 Jul 2022
Cited by 2 | Viewed by 1779
Abstract
Several applications of deep learning, such as image classification and retrieval, recommendation systems, and especially image synthesis, are of great interest to the fashion industry. Recently, image generation of clothes has gained a lot of popularity, as it is a very challenging task that is far from being solved. Additionally, it would open up many possibilities for designers and stylists, enhancing their creativity. For this reason, in this paper we propose to tackle the problem of style transfer between two different people wearing different clothes. We draw inspiration from the recent StarGANv2 architecture, which achieved impressive results in transferring a target domain to a source image, and we adapted it to work with fashion images and to transfer clothing styles. In more detail, we modified the architecture to work without the need for a clear separation between multiple domains, added a perceptual loss between the target and the source clothes, and edited the style encoder to better represent the style information of target clothes. We performed both qualitative and quantitative experiments with the recent DeepFashion2 dataset and proved the efficacy and novelty of our method. Full article
(This article belongs to the Special Issue Computer Vision in Human Analysis: From Face and Body to Clothes)
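Since the title centers on adaptive instance normalization (AdaIN), a minimal sketch of that operation may help: content features are re-normalized so that their channel-wise statistics match those of the style features. This follows the standard AdaIN definition and is not the paper's full style-transfer pipeline.

    # Minimal sketch of adaptive instance normalization (AdaIN) on feature maps.
    import torch

    def adain(content, style, eps=1e-5):
        # content, style: (B, C, H, W) feature maps
        c_mean = content.mean(dim=(2, 3), keepdim=True)
        c_std = content.std(dim=(2, 3), keepdim=True) + eps
        s_mean = style.mean(dim=(2, 3), keepdim=True)
        s_std = style.std(dim=(2, 3), keepdim=True) + eps
        return s_std * (content - c_mean) / c_std + s_mean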
