
Computer Vision in Human Activity Recognition and Behavior Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 January 2025 | Viewed by 9829

Special Issue Editor


Dr. Steven Davy
Guest Editor
Centre for Sustainable Digital Technologies, Technological University Dublin, D07 EWV4 Dublin, Ireland
Interests: learning (artificial intelligence); neural nets; quality of service; 6G mobile communication; Internet; Internet of Things; agriculture; autonomous aerial vehicles; blockchains; computer networks; computer vision; convolutional neural nets; data privacy; discrete wavelet transforms; feature extraction; gesture recognition; graph theory; handicapped aids; image capture; image classification; image motion analysis; mobile robots; multiprotocol label switching; object detection; optimization

Special Issue Information

Dear Colleagues,

With recent advances in computer vision, it has become common for intelligent monitoring systems to observe human activities and behaviors and to report abnormal situations. However, extracting accurate information about human activities and behaviors from videos or images remains a challenging task for computing systems.

The area of human behavior recognition integrates multi-disciplinary knowledge from fields such as artificial intelligence, deep learning, video/image processing, machine vision, sensors, and pattern recognition. This Special Issue aims to highlight state-of-the-art research in computer vision for human activity recognition and behavior analysis, especially trans-domain methods, built on different sensing and vision technologies, that are energy-efficient and make optimal use of compute resources. The topics in focus include, but are not limited to:

  • Human activity recognition;
  • Machine vision;
  • Deep learning;
  • Behavior pattern recognition;
  • Movement analysis;
  • Gesture recognition/analysis using smart devices;
  • Human-centric sensing;
  • Early action prediction.

Dr. Steven Davy
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • behavior analysis
  • machine learning
  • emotion recognition
  • wearable and pervasive sensing

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

19 pages, 6954 KiB  
Article
Improving Time Study Methods Using Deep Learning-Based Action Segmentation Models
by Mihael Gudlin, Miro Hegedić, Matija Golec and Davor Kolar
Appl. Sci. 2024, 14(3), 1185; https://doi.org/10.3390/app14031185 - 31 Jan 2024
Viewed by 981
Abstract
In the quest for industrial efficiency, human performance within manufacturing systems remains pivotal. Traditional time study methods, reliant on direct observation and manual video analysis, are increasingly inadequate, given technological advancements. This research explores the automation of time study methods by deploying deep learning models for action segmentation, scrutinizing the efficacy of various architectural strategies. A dataset, featuring nine work activities performed by four subjects on three product types, was collected from a real manufacturing assembly process. Our methodology hinged on a two-step video processing framework, capturing activities from two perspectives: overhead and hand-focused. Through experimentation with 27 distinctive models varying in viewpoint, feature extraction method, and the architecture of the segmentation model, we identified improvements in temporal segmentation precision measured with the F1@IoU metric. Our findings highlight the limitations of basic Transformer models in action segmentation tasks, due to their lack of inductive bias and the limitations of a smaller dataset scale. Conversely, the 1D CNN and biLSTM architectures demonstrated proficiency in temporal data modeling, advocating for architectural adaptability over mere scale. The results contribute to the field by underscoring the interplay between model architecture, feature extraction method, and viewpoint integration in refining time study methodologies.
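
The paper's code is not reproduced here, but the architecture family it favors is easy to picture. Below is a minimal sketch, in PyTorch, of a 1D-CNN temporal segmentation head of the kind compared in the study: per-frame feature vectors in, a per-frame activity label out. The layer widths and the 2048-dimensional input features are illustrative assumptions, not the authors' configuration; only the nine-activity output comes from the abstract.

```python
# Hypothetical sketch of a 1D-CNN action-segmentation head: per-frame
# features in, per-frame activity logits out. Layer widths are assumptions.
import torch
import torch.nn as nn

class TemporalSegmenter(nn.Module):
    def __init__(self, feat_dim=2048, hidden=256, num_activities=9):
        super().__init__()
        # Dilated 1D convolutions grow the temporal receptive field
        # without pooling, so the output keeps one prediction per frame.
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=3, padding=2, dilation=2),
            nn.ReLU(),
            nn.Conv1d(hidden, num_activities, kernel_size=1),
        )

    def forward(self, x):          # x: (batch, time, feat_dim)
        x = x.transpose(1, 2)      # Conv1d expects (batch, channels, time)
        return self.net(x).transpose(1, 2)  # (batch, time, num_activities)

features = torch.randn(1, 500, 2048)   # 500 frames of extracted CNN features
logits = TemporalSegmenter()(features)
labels = logits.argmax(dim=-1)          # per-frame activity index
```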

14 pages, 5540 KiB  
Article
Addressing Ergonomic Challenges in Agriculture through AI-Enabled Posture Classification
by Siddhant Kapse, Ruoxuan Wu and Ornwipa Thamsuwan
Appl. Sci. 2024, 14(2), 525; https://doi.org/10.3390/app14020525 - 7 Jan 2024
Cited by 2 | Viewed by 2486
Abstract
In this study, we explored the application of Artificial Intelligence (AI) for posture detection in the context of ergonomics in the agricultural field. Leveraging computer vision and machine learning, we aim to overcome limitations in accuracy, robustness, and real-time application found in traditional approaches such as observation and direct measurement. We first collected field videos to capture real-world scenarios of workers in an outdoor plant nursery. Next, we labeled workers’ trunk postures into three distinct categories: neutral, slight forward bending and full forward bending. Then, through CNNs, transfer learning, and MoveNet, we investigated the effectiveness of different approaches in accurately classifying trunk postures. Specifically, MoveNet was utilized to extract key anatomical features, which were then fed into various classification algorithms including DT, SVM, RF and ANN. The best performance was obtained using MoveNet together with ANN (accuracy = 87.80%, precision = 87.46%, recall = 87.52%, and F1-score = 87.41%). The findings of this research contributed to the integration of computer vision techniques with ergonomic assessments especially in the outdoor field settings. The results highlighted the potential of correct posture classification systems to enhance health and safety prevention practices in the agricultural industry.
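
As a rough illustration of the pipeline the abstract describes (MoveNet keypoints fed into a small ANN classifier), a hedged sketch follows. The MoveNet loading and invocation follow TensorFlow Hub's published usage for the singlepose-lightning model; the preprocessing, classifier size, and the random stand-in data are assumptions rather than the authors' setup.

```python
# Hedged sketch: MoveNet keypoints as features for a small ANN posture
# classifier. The three posture labels come from the abstract; everything
# else (classifier size, preprocessing) is an illustrative assumption.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
from sklearn.neural_network import MLPClassifier

movenet = hub.load(
    "https://tfhub.dev/google/movenet/singlepose/lightning/4"
).signatures["serving_default"]

def keypoint_features(frame):
    """frame: HxWx3 uint8 array -> flat vector of 17 (y, x, score) triples."""
    img = tf.image.resize_with_pad(tf.expand_dims(frame, 0), 192, 192)
    out = movenet(tf.cast(img, tf.int32))
    return out["output_0"].numpy().reshape(-1)   # 17 keypoints * 3 = 51 values

# Dummy data standing in for labeled field-video frames.
frames = np.random.randint(0, 255, (30, 480, 640, 3), dtype=np.uint8)
X = np.stack([keypoint_features(f) for f in frames])
y = np.random.choice(["neutral", "slight_bend", "full_bend"], size=30)

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500).fit(X, y)
print(clf.predict(X[:3]))
```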

14 pages, 5143 KiB  
Article
Detection of Anomalous Behavior of Manufacturing Workers Using Deep Learning-Based Recognition of Human–Object Interaction
by Rita Rijayanti, Mintae Hwang and Kyohong Jin
Appl. Sci. 2023, 13(15), 8584; https://doi.org/10.3390/app13158584 - 26 Jul 2023
Cited by 4 | Viewed by 2348
Abstract
The increasing demand for industrial products has expanded production quantities, leading to negative effects on product quality, worker productivity, and safety during working hours. Therefore, monitoring the conditions in manufacturing environments, particularly human workers, is crucial. Accordingly, this study presents a model that detects workers’ anomalous behavior in manufacturing environments. The objective is to determine worker movements, postures, and interactions with surrounding objects based on human–object interactions using a Mask R-CNN, MediaPipe Holistic, a long short-term memory (LSTM), and worker behavior description algorithm. The process begins by recognizing the objects within video frames using a Mask R-CNN. Afterward, worker poses are recognized and classified based on object positions using a deep learning-based approach. Next, we identified the patterns or characteristics that signified normal or anomalous behavior. In this case, anomalous behavior consists of anomalies correlated with human pose recognition (emergencies: worker falls, slips, or becomes ill) and human pose recognition with object positions (tool breakage and machine failure). The findings suggest that the model successfully distinguished anomalous behavior and attained the highest pose recognition accuracy (approximately 96%) for standing, touching, and holding, and the lowest accuracy (approximately 88%) for sitting. In addition, the model achieved an object detection accuracy of approximately 97%.
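
A hedged sketch of the pose-sequence half of such a pipeline is given below: MediaPipe Holistic landmarks per frame, classified over a clip window by an LSTM. The window length, hidden size, and number of behavior classes are assumptions, and the Mask R-CNN object-interaction stage is omitted here.

```python
# Hedged sketch: MediaPipe Holistic pose landmarks per frame, classified
# over time by an LSTM. Window length, hidden size, and class count are
# illustrative assumptions; object detection is not covered in this sketch.
import mediapipe as mp
import numpy as np
import torch
import torch.nn as nn

mp_holistic = mp.solutions.holistic

def pose_vector(rgb_frame, holistic):
    """33 pose landmarks -> flat (x, y, z, visibility) vector; zeros if absent."""
    res = holistic.process(rgb_frame)
    if res.pose_landmarks is None:
        return np.zeros(33 * 4, dtype=np.float32)
    return np.array(
        [[p.x, p.y, p.z, p.visibility] for p in res.pose_landmarks.landmark],
        dtype=np.float32,
    ).reshape(-1)

class BehaviorLSTM(nn.Module):
    def __init__(self, in_dim=132, hidden=64, num_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, seq):          # seq: (batch, frames, 132)
        _, (h, _) = self.lstm(seq)
        return self.head(h[-1])      # one behavior label per clip window

with mp_holistic.Holistic(static_image_mode=True) as holistic:
    frames = np.random.randint(0, 255, (16, 480, 640, 3), dtype=np.uint8)
    seq = np.stack([pose_vector(f, holistic) for f in frames])

logits = BehaviorLSTM()(torch.from_numpy(seq).unsqueeze(0))
```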

14 pages, 947 KiB  
Article
Enhanced Spatial Stream of Two-Stream Network Using Optical Flow for Human Action Recognition
by Shahbaz Khan, Ali Hassan, Farhan Hussain, Aqib Perwaiz, Farhan Riaz, Maazen Alsabaan and Wadood Abdul
Appl. Sci. 2023, 13(14), 8003; https://doi.org/10.3390/app13148003 - 8 Jul 2023
Cited by 3 | Viewed by 1679
Abstract
Introduction: Convolutional neural networks (CNNs) have maintained their dominance in deep learning methods for human action recognition (HAR) and other computer vision tasks. However, the need for a large amount of training data always restricts the performance of CNNs. Method: This paper is inspired by the two-stream network, where a CNN is deployed to train the network by using the spatial and temporal aspects of an activity, thus exploiting the strengths of both networks to achieve better accuracy. Contributions: Our contribution is twofold: first, we deploy an enhanced spatial stream, and it is demonstrated that models pre-trained on a larger dataset, when used in the spatial stream, yield good performance instead of training the entire model from scratch. Second, a dataset augmentation technique is presented to minimize overfitting of CNNs, where we increase the dataset size by performing various transformations on the images such as rotation and flipping, etc. Results: UCF101 is a standard benchmark dataset for action videos, and our architecture has been trained and validated on it. Compared with the other two-stream networks, our results outperformed them in terms of accuracy.
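
The two contributions translate naturally into a short sketch: initializing the spatial stream from a backbone pre-trained on a larger dataset instead of training from scratch, and augmenting frames with rotations and flips. The backbone choice and augmentation parameters below are assumptions; only the 101-class UCF101 output comes from the paper's benchmark.

```python
# Hedged sketch of (1) a spatial stream initialized from an ImageNet-
# pre-trained backbone and (2) rotation/flip augmentation to reduce
# overfitting. Backbone and transform parameters are assumptions.
import torch.nn as nn
from torchvision import models, transforms

NUM_CLASSES = 101  # UCF101 action classes

# (1) Spatial stream: pre-trained backbone, new classification head.
spatial_stream = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
spatial_stream.fc = nn.Linear(spatial_stream.fc.in_features, NUM_CLASSES)

# (2) Dataset augmentation applied to sampled RGB frames.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(degrees=15),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])
```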

15 pages, 596 KiB  
Article
Dilated Multi-Temporal Modeling for Action Recognition
by Tao Zhang, Yifan Wu and Xiaoqiang Li
Appl. Sci. 2023, 13(12), 6934; https://doi.org/10.3390/app13126934 - 8 Jun 2023
Viewed by 1154
Abstract
Action recognition involves capturing temporal information from video clips where the duration varies with videos for the same action. Due to the diverse scale of temporal context, uniform size kernels utilized in convolutional neural networks (CNNs) limit the capability of multiple-scale temporal modeling. In this paper, we propose a novel dilated multi-temporal (DMT) module that provides a solution for modeling multi-temporal information in action recognition. By using dilated convolutions with different dilation rates in different feature map channels, the DMT module captures information at multiple scales without the need for costly multi-branch networks, input-level frame pyramids, or feature map stacking that previous works have usually incurred. Therefore, this approach enables the integration of temporal information from multiple scales. In addition, the DMT module can be integrated into existing 2D CNNs, making it a straightforward and intuitive solution for addressing the challenge of multi-temporal modeling. Our proposed method has demonstrated promising results in performance and has achieved about 2% and 1% accuracy improvement on FineGym99 and SthV1. We conducted an empirical analysis that demonstrates how DMT improves the classification accuracy for action classes with varying durations.
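
The core DMT idea, different dilation rates applied to different channel groups within a single module, can be sketched as follows. This is not the authors' exact module; the group count, kernel size, and dilation rates are illustrative assumptions.

```python
# Hedged sketch of the DMT idea: split the channels into groups, give each
# group a temporal convolution with a different dilation rate, then
# concatenate, so one module sees several temporal scales at once.
import torch
import torch.nn as nn

class DilatedMultiTemporal(nn.Module):
    def __init__(self, channels, dilations=(1, 2, 3, 4)):
        super().__init__()
        assert channels % len(dilations) == 0
        group = channels // len(dilations)
        # One depthwise temporal conv per channel group; the padding keeps
        # the sequence length unchanged for every dilation rate.
        self.branches = nn.ModuleList(
            nn.Conv1d(group, group, kernel_size=3,
                      padding=d, dilation=d, groups=group)
            for d in dilations
        )

    def forward(self, x):            # x: (batch, channels, time)
        chunks = x.chunk(len(self.branches), dim=1)
        return torch.cat([b(c) for b, c in zip(self.branches, chunks)], dim=1)

feats = torch.randn(2, 256, 16)          # e.g. 16-frame clip features
out = DilatedMultiTemporal(256)(feats)   # same shape, multi-scale temporal view
```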
