Search Results (13)

Search Parameters:
Keywords = egocentric activity recognition

16 pages, 3440 KiB  
Article
Towards Automatic Object Detection and Activity Recognition in Indoor Climbing
by Hana Vrzáková, Jani Koskinen, Sami Andberg, Ahreum Lee and Mary Jean Amon
Sensors 2024, 24(19), 6479; https://doi.org/10.3390/s24196479 - 8 Oct 2024
Cited by 1 | Viewed by 1535
Abstract
Rock climbing has grown from a niche sport into a mainstream free-time activity and an Olympic sport. Moreover, climbing can be studied as an example of a high-stakes perception-action task. However, understanding what constitutes an expert climber is not simple or straightforward. As a dynamic and high-risk activity, climbing requires a precise interplay between cognition, perception, and action execution. While prior research has predominantly focused on the movement aspect of climbing (i.e., skeletal posture and individual limb movements), recent studies have also examined the climber’s visual attention and its links to their performance. Associating the climber’s attention with their actions, however, has traditionally required frame-by-frame manual coding of the recorded eye-tracking videos. To overcome this challenge and automatically contextualize the analysis of eye movements in indoor climbing, we present deep learning-driven (YOLOv5) hold detection that facilitates automatic grasp recognition. To demonstrate the framework, we examined the expert climber’s eye movements and egocentric perspective acquired from eye-tracking glasses (SMI and Tobii Glasses 2). Using the framework, we observed that the expert climber’s grasping duration was positively correlated with total fixation duration (r = 0.807) and fixation count (r = 0.864); however, it was negatively correlated with the fixation rate (r = −0.402) and saccade rate (r = −0.344). The findings indicate the moments of cognitive processing and visual search that occurred during decision making and route prospecting. Our work contributes to research on eye–body performance and coordination in high-stakes contexts, informs sport science, and expands its applications, e.g., in training optimization, injury prevention, and coaching. Full article
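The reported effects reduce to per-grasp correlations between grasp duration and gaze metrics. Below is a minimal sketch of that analysis step, not the authors' code; the event values are hypothetical placeholders.

# Minimal sketch: correlate per-grasp duration with gaze metrics (hypothetical data).
import numpy as np
from scipy.stats import pearsonr

# One value per detected grasp event.
grasp_duration_s = np.array([1.2, 0.8, 2.5, 1.9, 3.1, 0.6])
total_fixation_s = np.array([0.9, 0.5, 2.1, 1.4, 2.8, 0.4])
fixation_count = np.array([3, 2, 7, 5, 9, 2])

for name, metric in [("total fixation duration", total_fixation_s),
                     ("fixation count", fixation_count)]:
    r, p = pearsonr(grasp_duration_s, metric)
    print(f"grasp duration vs {name}: r = {r:.3f}, p = {p:.3f}")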

15 pages, 641 KiB  
Article
A Multi-Modal Egocentric Activity Recognition Approach towards Video Domain Generalization
by Antonios Papadakis and Evaggelos Spyrou
Sensors 2024, 24(8), 2491; https://doi.org/10.3390/s24082491 - 12 Apr 2024
Cited by 3 | Viewed by 1895
Abstract
Egocentric activity recognition is a prominent computer vision task that is based on the use of wearable cameras. Since egocentric videos are captured through the perspective of the person wearing the camera, her/his body motions severely complicate the video content, imposing several challenges. In this work we propose a novel approach for domain-generalized egocentric human activity recognition. Typical approaches use a large amount of training data, aiming to cover all possible variants of each action. Moreover, several recent approaches have attempted to handle discrepancies between domains with a variety of costly and mostly unsupervised domain adaptation methods. In our approach we show that through simple manipulation of available source domain data and with minor involvement from the target domain, we are able to produce robust models, able to adequately predict human activity in egocentric video sequences. To this end, we introduce a novel three-stream deep neural network architecture combining elements of vision transformers and residual neural networks which are trained using multi-modal data. We evaluate the proposed approach using a challenging, egocentric video dataset and demonstrate its superiority over recent, state-of-the-art research works. Full article
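As a rough illustration of a three-stream design of the kind described above (not the paper's exact architecture), the sketch below fuses one vision-transformer stream with two residual streams; the stream assignments (RGB, optical flow, audio spectrogram) and feature sizes are assumptions.

import torch
import torch.nn as nn
from torchvision.models import resnet18, vit_b_16

class ThreeStreamNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.rgb_stream = vit_b_16(weights=None)      # transformer stream, 768-d features
        self.rgb_stream.heads = nn.Identity()
        self.flow_stream = resnet18(weights=None)     # residual stream, 512-d features
        self.flow_stream.fc = nn.Identity()
        self.audio_stream = resnet18(weights=None)    # residual stream, 512-d features
        self.audio_stream.fc = nn.Identity()
        self.classifier = nn.Linear(768 + 512 + 512, num_classes)

    def forward(self, rgb, flow, audio):
        fused = torch.cat([self.rgb_stream(rgb),
                           self.flow_stream(flow),
                           self.audio_stream(audio)], dim=1)
        return self.classifier(fused)

model = ThreeStreamNet(num_classes=8)
logits = model(torch.randn(2, 3, 224, 224),   # RGB frames
               torch.randn(2, 3, 224, 224),   # stacked optical flow as 3 channels
               torch.randn(2, 3, 224, 224))   # audio spectrogram rendered as an image
print(logits.shape)                           # torch.Size([2, 8])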

20 pages, 2103 KiB  
Article
Fusion of Appearance and Motion Features for Daily Activity Recognition from Egocentric Perspective
by Mohd Haris Lye, Nouar AlDahoul and Hezerul Abdul Karim
Sensors 2023, 23(15), 6804; https://doi.org/10.3390/s23156804 - 30 Jul 2023
Cited by 2 | Viewed by 1234
Abstract
Videos from a first-person or egocentric perspective offer a promising tool for recognizing various activities related to daily living. In the egocentric perspective, the video is obtained from a wearable camera, and this enables the capture of the person’s activities in a consistent viewpoint. Recognition of activity using a wearable sensor is challenging for various reasons, such as motion blur and large variations. The existing methods are based on extracting handcrafted features from video frames to represent the contents. These features are domain-dependent, where features that are suitable for a specific dataset may not be suitable for others. In this paper, we propose a novel solution to recognize daily living activities from a pre-segmented video clip. The pre-trained convolutional neural network (CNN) model VGG16 is used to extract visual features from sampled video frames, which are then aggregated by the proposed pooling scheme. The proposed solution combines appearance and motion features extracted from video frames and optical flow images, respectively. The methods of mean and max spatial pooling (MMSP) and max mean temporal pyramid (TPMM) pooling are proposed to compose the final video descriptor. The feature is applied to a linear support vector machine (SVM) to recognize the type of activities observed in the video clip. The evaluation of the proposed solution was performed on three public benchmark datasets. We performed studies to show the advantage of aggregating appearance and motion features for daily activity recognition. The results show that the proposed solution is promising for recognizing activities of daily living. Compared to several methods on three public datasets, the proposed MMSP–TPMM method produces higher classification performance in terms of accuracy (90.38% with the LENA dataset, 75.37% with the ADL dataset, 96.08% with the FPPA dataset) and average per-class precision (AP) (58.42% with the ADL dataset and 96.11% with the FPPA dataset). Full article
(This article belongs to the Special Issue Applications of Body Worn Sensors and Wearables)
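A compressed sketch of the pipeline described above, under assumptions: VGG16 convolutional features per sampled frame, mean+max pooling over space and over frames, and a linear SVM on the clip descriptor. It illustrates the idea only and is not the authors' MMSP/TPMM implementation; clips and labels are random placeholders.

import numpy as np
import torch
from torchvision.models import vgg16
from sklearn.svm import LinearSVC

backbone = vgg16(weights=None).features.eval()    # conv feature maps (512 channels)

def clip_descriptor(frames: torch.Tensor) -> np.ndarray:
    """frames: (T, 3, 224, 224) sampled from one video clip."""
    with torch.no_grad():
        fmap = backbone(frames)                            # (T, 512, 7, 7)
    spatial = torch.cat([fmap.amax(dim=(2, 3)),            # max spatial pooling
                         fmap.mean(dim=(2, 3))], dim=1)    # mean spatial pooling
    temporal = torch.cat([spatial.amax(dim=0),             # max over frames
                          spatial.mean(dim=0)])            # mean over frames
    return temporal.numpy()

# Hypothetical clips (4 frames each) and labels, for shape illustration only.
X = np.stack([clip_descriptor(torch.randn(4, 3, 224, 224)) for _ in range(4)])
y = np.array([0, 1, 0, 1])
clf = LinearSVC().fit(X, y)
print(clf.predict(X[:2]))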

22 pages, 10163 KiB  
Article
Closed-Chain Inverse Dynamics for the Biomechanical Analysis of Manual Material Handling Tasks through a Deep Learning Assisted Wearable Sensor Network
by Riccardo Bezzini, Luca Crosato, Massimo Teppati Losè, Carlo Alberto Avizzano, Massimo Bergamasco and Alessandro Filippeschi
Sensors 2023, 23(13), 5885; https://doi.org/10.3390/s23135885 - 25 Jun 2023
Cited by 3 | Viewed by 2736
Abstract
Despite the automatization of many industrial and logistics processes, human workers are still often involved in the manual handling of loads. These activities lead to many work-related disorders that reduce the quality of life and the productivity of aged workers. A biomechanical analysis of such activities is the basis for a detailed estimation of the biomechanical overload, thus enabling focused prevention actions. Thanks to wearable sensor networks, it is now possible to analyze human biomechanics by an inverse dynamics approach in ecological conditions. The purposes of this study are the conceptualization, formulation, and implementation of a deep learning-assisted fully wearable sensor system for an online evaluation of the biomechanical effort that an operator exerts during a manual material handling task. In this paper, we show a novel, computationally efficient algorithm, implemented in ROS, to analyze the biomechanics of the human musculoskeletal systems by an inverse dynamics approach. We also propose a method for estimating the load and its distribution, relying on an egocentric camera and deep learning-based object recognition. This method is suitable for objects of known weight, as is often the case in logistics. Kinematic data, along with foot contact information, are provided by a fully wearable sensor network composed of inertial measurement units. The results show good accuracy and robustness of the system for object detection and grasp recognition, thus providing reliable load estimation for a high-impact field such as logistics. The outcome of the biomechanical analysis is consistent with the literature. However, improvements in gait segmentation are necessary to reduce discontinuities in the estimated lower limb articular wrenches. Full article
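The load-estimation idea relies on objects of known weight: once the egocentric detector names the object, its mass can be looked up and assigned to the grasping hand(s). A toy sketch of that lookup; the class names, weights, and symmetric split are invented for illustration and are not from the paper.

KNOWN_WEIGHTS_KG = {"crate_small": 5.0, "crate_large": 12.0, "paper_box": 2.5}

def estimate_hand_loads(detected_class: str, both_hands_grasping: bool):
    """Return (left_kg, right_kg), splitting the load evenly for two-handed grasps."""
    w = KNOWN_WEIGHTS_KG.get(detected_class, 0.0)
    return (w / 2, w / 2) if both_hands_grasping else (0.0, w)

print(estimate_hand_loads("crate_large", both_hands_grasping=True))   # (6.0, 6.0)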

24 pages, 14566 KiB  
Article
YOLO Series for Human Hand Action Detection and Classification from Egocentric Videos
by Hung-Cuong Nguyen, Thi-Hao Nguyen, Rafał Scherer and Van-Hung Le
Sensors 2023, 23(6), 3255; https://doi.org/10.3390/s23063255 - 20 Mar 2023
Cited by 17 | Viewed by 7384
Abstract
Hand detection and classification is a very important pre-processing step in building applications based on three-dimensional (3D) hand pose estimation and hand activity recognition. To automatically limit the hand data area on egocentric vision (EV) datasets, and especially to examine the development and performance of the “You Only Look Once” (YOLO) network over the past seven years, we propose a study comparing the efficiency of hand detection and classification based on the YOLO-family networks. This study is based on the following problems: (1) systematizing all architectures, advantages, and disadvantages of YOLO-family networks from version (v)1 to v7; (2) preparing ground-truth data for pre-trained models and evaluation models of hand detection and classification on EV datasets (FPHAB, HOI4D, RehabHand); (3) fine-tuning the hand detection and classification model based on the YOLO-family networks, hand detection, and classification evaluation on the EV datasets. Hand detection and classification results on the YOLOv7 network and its variations were the best across all three datasets. The results of the YOLOv7-w6 network are as follows: FPHAB is P = 97% with ThreshIOU = 0.5; HOI4D is P = 95% with ThreshIOU = 0.5; RehabHand is P > 95% with ThreshIOU = 0.5; the processing speed of YOLOv7-w6 is 60 fps with a resolution of 1280 × 1280 pixels and that of YOLOv7 is 133 fps with a resolution of 640 × 640 pixels. Full article
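The precision figures above are reported at an IoU threshold of 0.5. A small self-contained sketch of that metric with made-up boxes (not the paper's evaluation code):

def iou(a, b):
    """Boxes as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_at_iou(preds, gts, thresh=0.5):
    """Fraction of predicted hand boxes matching some ground-truth box at the threshold."""
    tp = sum(any(iou(p, g) >= thresh for g in gts) for p in preds)
    return tp / max(len(preds), 1)

preds = [(10, 10, 50, 50), (60, 60, 90, 90)]   # hypothetical detections
gts = [(12, 11, 48, 52)]                       # hypothetical ground truth
print(precision_at_iou(preds, gts))            # 0.5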

21 pages, 1481 KiB  
Article
Automatic Visual Attention Detection for Mobile Eye Tracking Using Pre-Trained Computer Vision Models and Human Gaze
by Michael Barz and Daniel Sonntag
Sensors 2021, 21(12), 4143; https://doi.org/10.3390/s21124143 - 16 Jun 2021
Cited by 23 | Viewed by 8565
Abstract
Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because it is a tedious manual annotation task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods. Full article
(This article belongs to the Special Issue Wearable Technologies and Applications for Eye Tracking)
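The core mapping step can be stated simply: visual attention to an AOI is declared when the gaze point falls inside the bounding box of an object detected in the egocentric frame. A simplified sketch under that reading; the data layout is invented for illustration and is not the authors' implementation.

from typing import List, Optional, Tuple

Box = Tuple[str, float, float, float, float]   # (label, x1, y1, x2, y2)

def attended_object(gaze_xy: Tuple[float, float],
                    detections: List[Box]) -> Optional[str]:
    """Return the label of the first detected object whose box contains the gaze point."""
    gx, gy = gaze_xy
    for label, x1, y1, x2, y2 in detections:
        if x1 <= gx <= x2 and y1 <= gy <= y2:
            return label
    return None

frame_detections = [("cup", 100, 120, 180, 220), ("laptop", 300, 80, 620, 400)]
print(attended_object((150, 200), frame_detections))   # "cup"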

16 pages, 1039 KiB  
Article
The Fear to Move in a Crowded Environment. Poor Spatial Memory Related to Agoraphobic Disorder
by Micaela Maria Zucchelli, Laura Piccardi and Raffaella Nori
Brain Sci. 2021, 11(6), 796; https://doi.org/10.3390/brainsci11060796 - 16 Jun 2021
Cited by 6 | Viewed by 3776
Abstract
Individuals with agoraphobia exhibit impaired exploratory activity when navigating unfamiliar environments. However, no studies have investigated the contribution of visuospatial working memory (VSWM) in these individuals’ ability to acquire and process spatial information while considering the use of egocentric and allocentric coordinates or environments with or without people. A total of 106 individuals (53 with agoraphobia and 53 controls) navigated in a virtual square to acquire spatial information that included the recognition of landmarks and the relationship between landmarks and themselves (egocentric coordinates) and independent of themselves (allocentric coordinates). Half of the participants in both groups navigated in a square without people, and half navigated in a crowded square. They completed a VSWM test in addition to tasks measuring landmark recognition and egocentric and allocentric judgements concerning the explored square. The results showed that individuals with agoraphobia had reduced working memory only when active processing of spatial elements was required, suggesting that they exhibit spatial difficulties particularly in complex spatial tasks requiring them to process information simultaneously. Specifically, VSWM deficits mediated the relationship between agoraphobia and performance in the allocentric judgements. The results are discussed considering the theoretical background of agoraphobia in order to provide useful elements for the early diagnosis of this disorder. Full article
(This article belongs to the Special Issue Application of Virtual Reality in Spatial Memory)

19 pages, 4192 KiB  
Article
STAC: Spatial-Temporal Attention on Compensation Information for Activity Recognition in FPV
by Yue Zhang, Shengli Sun, Linjian Lei, Huikai Liu and Hui Xie
Sensors 2021, 21(4), 1106; https://doi.org/10.3390/s21041106 - 5 Feb 2021
Cited by 7 | Viewed by 2142
Abstract
Egocentric activity recognition in first-person video (FPV) requires fine-grained matching of the camera wearer’s action and the objects being operated. The traditional method used for third-person action recognition does not suffice because of (1) the background ego-noise introduced by the unstructured movement of the wearable devices caused by body movement; (2) the small-sized and fine-grained objects with single scale in FPV. Size compensation is performed to augment the data. It generates a multi-scale set of regions, including multi-size objects, leading to superior performance. We compensate for the optical flow to eliminate the camera noise in motion. We developed a novel two-stream convolutional neural network-recurrent attention neural network (CNN-RAN) architecture: spatial temporal attention on compensation information (STAC), able to generate generic descriptors under weak supervision and focus on the locations of activated objects and the capture of effective motion. We encode the RGB features using a spatial location-aware attention mechanism to guide the representation of visual features. Similar location-aware channel attention is applied to the temporal stream in the form of stacked optical flow to implicitly select the relevant frames and pay attention to where the action occurs. The two streams are complementary since one is object-centric and the other focuses on the motion. We conducted extensive ablation analysis to validate the complementarity and effectiveness of our STAC model qualitatively and quantitatively. It achieved state-of-the-art performance on two egocentric datasets. Full article
(This article belongs to the Section Sensing and Imaging)
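A bare-bones sketch of a location-aware spatial attention gate over CNN feature maps, in the spirit of the attention described above; it is illustrative only, not the STAC model, and the channel size and map shape are assumptions.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)   # one score per location

    def forward(self, x):                                    # x: (B, C, H, W)
        b, _, h, w = x.shape
        attn = torch.softmax(self.score(x).view(b, 1, h * w), dim=-1)
        return x * attn.view(b, 1, h, w)                     # re-weight features by location

feats = torch.randn(2, 256, 14, 14)            # e.g., RGB-stream feature maps
print(SpatialAttention(256)(feats).shape)      # torch.Size([2, 256, 14, 14])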

17 pages, 1790 KiB  
Article
Performance Boosting of Scale and Rotation Invariant Human Activity Recognition (HAR) with LSTM Networks Using Low Dimensional 3D Posture Data in Egocentric Coordinates
by Ibrahim Furkan Ince
Appl. Sci. 2020, 10(23), 8474; https://doi.org/10.3390/app10238474 - 27 Nov 2020
Cited by 8 | Viewed by 3273
Abstract
Human activity recognition (HAR) has been an active area in computer vision with a broad range of applications, such as education, security surveillance, and healthcare. HAR is a general time series classification problem. LSTMs are widely used for time series classification tasks. However, they work well with high-dimensional feature vectors, which reduce the processing speed of the LSTM in real-time applications. Therefore, dimension reduction is required to create a low-dimensional feature space. As shown in a previous study, an LSTM with dimension reduction yielded the worst performance among the compared classifiers, which were not deep learning methods. Therefore, in this paper, a novel scale- and rotation-invariant human activity recognition system that can also work in a low-dimensional feature space is presented. For this purpose, a Kinect depth sensor is employed to obtain skeleton joints. Since angles are used, the proposed system is already scale invariant. In order to provide rotation invariance, the body-relative direction in egocentric coordinates is calculated. The 3D vector between the right hip and the left hip is used as the horizontal axis, and its cross product with the vertical axis of the global coordinate system is assumed to be the depth axis of the proposed local coordinate system. Instead of using 3D joint angles, eight limbs and their corresponding 3D angles with the X, Y, and Z axes of the proposed coordinate system are compressed with several dimension reduction methods, such as an averaging filter, the Haar wavelet transform (HWT), and the discrete cosine transform (DCT), and employed as the feature vector. Finally, the extracted features are trained and tested with a long short-term memory (LSTM) network, which is an artificial recurrent neural network (RNN) architecture. Experimental and benchmarking results indicate that the proposed framework boosts the performance of the LSTM by approximately 30% accuracy in the low-dimensional feature space. Full article
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)
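The geometric construction is concrete enough to sketch: the hip-to-hip vector gives the horizontal axis, the global vertical is kept, and their cross product gives the depth axis; a limb direction is then expressed as angles to these axes, and the per-frame angle track is compressed, e.g., with a DCT. The joint coordinates below are assumed values for illustration, not the paper's data or code.

import numpy as np
from scipy.fft import dct

def body_frame(right_hip, left_hip):
    x = right_hip - left_hip
    x = x / np.linalg.norm(x)              # horizontal (hip-to-hip) axis
    y = np.array([0.0, 1.0, 0.0])          # global vertical axis
    z = np.cross(x, y)
    z = z / np.linalg.norm(z)              # depth axis of the local frame
    return x, y, z

def limb_angles(limb_vec, frame):
    limb = limb_vec / np.linalg.norm(limb_vec)
    return np.array([np.degrees(np.arccos(np.clip(limb @ axis, -1.0, 1.0)))
                     for axis in frame])

frame = body_frame(np.array([0.2, 1.0, 0.0]), np.array([-0.2, 1.0, 0.05]))
print(limb_angles(np.array([0.1, -0.4, 0.1]), frame))   # angles to X, Y, Z in degrees

# Compress a 64-frame angle track to its first 8 DCT coefficients before the LSTM.
track = np.random.rand(64)
print(dct(track, norm="ortho")[:8])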

26 pages, 4408 KiB  
Article
Multi-View Hand-Hygiene Recognition for Food Safety
by Chengzhang Zhong, Amy R. Reibman, Hansel A. Mina and Amanda J. Deering
J. Imaging 2020, 6(11), 120; https://doi.org/10.3390/jimaging6110120 - 7 Nov 2020
Cited by 11 | Viewed by 7510
Abstract
A majority of foodborne illnesses result from inappropriate food handling practices. One proven practice to reduce pathogens is to perform effective hand-hygiene before all stages of food handling. In this paper, we design a multi-camera system that uses video analytics to recognize hand-hygiene actions, with the goal of improving hand-hygiene effectiveness. Our proposed two-stage system processes untrimmed video from both egocentric and third-person cameras. In the first stage, a low-cost coarse classifier efficiently localizes the hand-hygiene period; in the second stage, more complex refinement classifiers recognize seven specific actions within the hand-hygiene period. We demonstrate that our two-stage system has significantly lower computational requirements without a loss of recognition accuracy. Specifically, the computationally complex refinement classifiers process less than 68% of the untrimmed videos, and we anticipate further computational gains in videos that contain a larger fraction of non-hygiene actions. Our results demonstrate that a carefully designed video action recognition system can play an important role in improving hand hygiene for food safety. Full article
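The two-stage structure can be shown schematically: a cheap coarse classifier decides whether a frame lies in the hand-hygiene period, and only those frames reach the costlier seven-action refinement classifier. The stand-in classifiers below are placeholders, not the paper's models.

from typing import Callable, List

def two_stage(frames: List, coarse: Callable, refine: Callable) -> List[str]:
    labels = []
    for f in frames:
        if coarse(f):                  # low-cost check: hygiene frame or not?
            labels.append(refine(f))   # costly step: which of the seven actions?
        else:
            labels.append("non_hygiene")
    return labels

frames = list(range(10))                               # hypothetical frame indices
coarse = lambda f: 3 <= f <= 7                         # stand-in coarse classifier
refine = lambda f: "rub_palms" if f % 2 else "rinse"   # stand-in refinement classifier
print(two_stage(frames, coarse, refine))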

22 pages, 3171 KiB  
Article
Fusing Object Information and Inertial Data for Activity Recognition
by Alexander Diete and Heiner Stuckenschmidt
Sensors 2019, 19(19), 4119; https://doi.org/10.3390/s19194119 - 23 Sep 2019
Cited by 8 | Viewed by 3375
Abstract
In the field of pervasive computing, wearable devices have been widely used for recognizing human activities. One important area in this research is the recognition of activities of daily living, where especially inertial sensors and interaction sensors (like RFID tags with scanners) are popular choices as data sources. Using interaction sensors, however, has one drawback: they may not differentiate between proper interaction and simple touching of an object. A positive signal from an interaction sensor is not necessarily caused by a performed activity, e.g., when an object is only touched but no interaction occurred afterwards. There are, however, many scenarios, like medicine intake, that rely heavily on correctly recognized activities. In our work, we aim to address this limitation and present a multimodal egocentric-based activity recognition approach. Our solution relies on object detection that recognizes activity-critical objects in a frame. As it is infeasible to always expect a high-quality camera view, we enrich the vision features with inertial sensor data that monitors the users’ arm movement. This way we try to overcome the drawbacks of each respective sensor. We present our results of combining inertial and video features to recognize human activities on different types of scenarios, where we achieve an F1-measure of up to 79.6%. Full article
(This article belongs to the Special Issue Multi-Sensor Fusion in Body Sensor Networks)
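A minimal sketch of the fusion step under assumptions: per-window object-detection evidence and inertial statistics are concatenated into one feature vector and classified. The object vocabulary, window statistics, and classifier are hypothetical and are not the authors' pipeline.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

OBJECTS = ["cup", "pill_box", "phone"]

def fused_features(detected_objects, accel_window):
    """Concatenate object-presence flags with simple inertial statistics."""
    obj_vec = np.array([obj in detected_objects for obj in OBJECTS], dtype=float)
    inertial = np.array([accel_window.mean(), accel_window.std(),
                         np.abs(accel_window).max()])
    return np.concatenate([obj_vec, inertial])

# Hypothetical windows: (detected objects, acceleration-magnitude samples).
windows = [(["pill_box"], np.random.rand(50) + 1.0),
           (["cup"], np.random.rand(50)),
           (["phone"], np.random.rand(50) * 2.0)]
X = np.stack([fused_features(d, a) for d, a in windows])
y = np.array(["medicine_intake", "drinking", "phone_use"])
clf = RandomForestClassifier(n_estimators=10).fit(X, y)
print(clf.predict(X[:1]))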

28 pages, 14357 KiB  
Article
A Hierarchical Deep Fusion Framework for Egocentric Activity Recognition using a Wearable Hybrid Sensor System
by Haibin Yu, Guoxiong Pan, Mian Pan, Chong Li, Wenyan Jia, Li Zhang and Mingui Sun
Sensors 2019, 19(3), 546; https://doi.org/10.3390/s19030546 - 28 Jan 2019
Cited by 18 | Viewed by 4831
Abstract
Recently, egocentric activity recognition has attracted considerable attention in the pattern recognition and artificial intelligence communities because of its wide applicability in medical care, smart homes, and security monitoring. In this study, we developed and implemented a deep-learning-based hierarchical fusion framework for the recognition of egocentric activities of daily living (ADLs) in a wearable hybrid sensor system comprising motion sensors and cameras. Long short-term memory (LSTM) and a convolutional neural network are used to perform egocentric ADL recognition based on motion sensor data and photo streaming in different layers, respectively. The motion sensor data are used solely for activity classification according to motion state, while the photo stream is used for further specific activity recognition in the motion state groups. Thus, both motion sensor data and photo stream work in their most suitable classification mode to significantly reduce the negative influence of sensor differences on the fusion results. Experimental results show that the proposed method not only is more accurate than the existing direct fusion method (by up to 6%) but also avoids the time-consuming computation of optical flow in the existing method, which makes the proposed algorithm less complex and more suitable for practical application. Full article
(This article belongs to the Special Issue Computational Intelligence-Based Sensors)
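The hierarchy can be sketched schematically: the motion-sensor model first assigns a coarse motion state, and a state-specific photo classifier then names the concrete activity. The stand-in functions below are placeholders for the paper's LSTM and CNN.

def recognize_adl(motion_window, photo, motion_model, photo_models):
    state = motion_model(motion_window)      # coarse state, e.g., "stationary" or "walking"
    return photo_models[state](photo)        # fine-grained ADL within that state

# Hypothetical stand-ins for illustration only.
motion_model = lambda w: "stationary" if max(w) < 1.2 else "walking"
photo_models = {
    "stationary": lambda img: "eating" if img == "table_scene" else "reading",
    "walking": lambda img: "shopping",
}
print(recognize_adl([0.9, 1.0, 1.1], "table_scene", motion_model, photo_models))   # eating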

24 pages, 17045 KiB  
Review
Recognition of Activities of Daily Living with Egocentric Vision: A Review
by Thi-Hoa-Cuc Nguyen, Jean-Christophe Nebel and Francisco Florez-Revuelta
Sensors 2016, 16(1), 72; https://doi.org/10.3390/s16010072 - 7 Jan 2016
Cited by 96 | Viewed by 12429
Abstract
Video-based recognition of activities of daily living (ADLs) is being used in ambient assisted living systems in order to support the independent living of older people. However, current systems based on cameras located in the environment present a number of problems, such as occlusions and a limited field of view. Recently, wearable cameras have begun to be exploited. This paper presents a review of the state of the art of egocentric vision systems for the recognition of ADLs following a hierarchical structure: motion, action and activity levels, where each level provides higher semantic information and involves a longer time frame. The current egocentric vision literature suggests that ADLs recognition is mainly driven by the objects present in the scene, especially those associated with specific tasks. However, although object-based approaches have proven popular, object recognition remains a challenge due to the intra-class variations found in unconstrained scenarios. As a consequence, the performance of current systems is far from satisfactory. Full article
(This article belongs to the Section Physical Sensors)
