Artificial Intelligence-Based Image Processing and Computer Vision

A special issue of AI (ISSN 2673-2688). This special issue belongs to the section "AI Systems: Theory and Applications".

Deadline for manuscript submissions: 31 July 2024 | Viewed by 8627

Special Issue Editor


E-Mail Website
Guest Editor
Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
Interests: artificial intelligence; computer vision; parallel computing; embedded systems; secure and trustworthy systems
Special Issues, Collections and Topics in MDPI journals

Special Issue Information

Dear Colleagues,

Modern image processing is a process of transforming an image into a digital form and using computing systems to process, manipulate, and/or enhance digital images through various algorithms. Image processing is also a requisite for many computer vision tasks as it helps to preprocess images and prepares data in a form suitable for various computer vision models. Computer vision generally refers to techniques that enable computers to understand and make sense of images. Computer vision enables machines to extract latent information from visual data and to mimic the human perception of sight with computational algorithms. Active research is ongoing on developing novel image processing and computer vision algorithms including artificial intelligence (AI), in particular, deep-learning-based algorithms to enable new and fascinating applications. Advances in AI-based image processing and computer vision have enabled many exciting new applications, such as autonomous vehicles, unmanned aerial vehicles, computational photography, augmented reality, surveillance, optical character recognition, machine inspection, autonomous package delivery, photogrammetry, biometrics, computer-aided inspection of medical images, and remote patient monitoring. Image processing and computer vision have applications in various domains including healthcare, transportation, retail, agriculture, business, manufacturing, construction, space, and military.

This Special Issue will explore algorithms and applications of image processing and computer vision inspired by AI. For this Special Issue, we welcome the submission of original research articles and reviews that relate to computing, architecture, algorithms, security, and applications of image processing and computer vision. Topics of interest include but are not limited to the following:

  • Image interpretation
  • Object detection and recognition
  • Spatial artificial intelligence
  • Event detection and activity recognition
  • Image segmentation
  • Video classification and analysis
  • Face and gesture recognition
  • Pose estimation
  • Computational photography
  • Image security
  • Vision hardware and/or software architectures
  • Image/vision acceleration techniques
  • Monitoring and surveillance
  • Situational awareness.

Dr. Arslan Munir
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. AI is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • image processing
  • computer vision
  • image fusion
  • vision algorithms
  • deep learning
  • stereo vision
  • activity recognition
  • image/video analysis
  • image encryption
  • computational photography
  • vision hardware/software
  • monitoring and surveillance
  • biometrics
  • robotics
  • augmented reality

Published Papers (6 papers)

Order results
Result details
Select all
Export citation of selected articles as:

Research

Jump to: Review

13 pages, 9978 KiB  
Article
The Eye in the Sky—A Method to Obtain On-Field Locations of Australian Rules Football Athletes
by Zachery Born, Marion Mundt, Ajmal Mian, Jason Weber and Jacqueline Alderson
AI 2024, 5(2), 733-745; https://doi.org/10.3390/ai5020038 - 16 May 2024
Viewed by 338
Abstract
The ability to overcome an opposition in team sports is reliant upon an understanding of the tactical behaviour of the opposing team members. Recent research is limited to a performance analysts’ own playing team members, as the required opposing team athletes’ geolocation (GPS) [...] Read more.
The ability to overcome an opposition in team sports is reliant upon an understanding of the tactical behaviour of the opposing team members. Recent research is limited to a performance analysts’ own playing team members, as the required opposing team athletes’ geolocation (GPS) data are unavailable. However, in professional Australian rules Football (AF), animations of athlete GPS data from all teams are commercially available. The purpose of this technical study was to obtain the on-field location of AF athletes from animations of the 2019 Australian Football League season to enable the examination of the tactical behaviour of any team. The pre-trained object detection model YOLOv4 was fine-tuned to detect players, and a custom convolutional neural network was trained to track numbers in the animations. The object detection and the athlete tracking achieved an accuracy of 0.94 and 0.98, respectively. Subsequent scaling and translation coefficients were determined through solving an optimisation problem to transform the pixel coordinate positions of a tracked player number to field-relative Cartesian coordinates. The derived equations achieved an average Euclidean distance from the athletes’ raw GPS data of 2.63 m. The proposed athlete detection and tracking approach is a novel methodology to obtain the on-field positions of AF athletes in the absence of direct measures, which may be used for the analysis of opposition collective team behaviour and in the development of interactive play sketching AF tools. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

18 pages, 1201 KiB  
Article
Efficient Paddy Grain Quality Assessment Approach Utilizing Affordable Sensors
by Aditya Singh, Kislay Raj, Teerath Meghwar and Arunabha M. Roy
AI 2024, 5(2), 686-703; https://doi.org/10.3390/ai5020036 - 14 May 2024
Viewed by 217
Abstract
Paddy (Oryza sativa) is one of the most consumed food grains in the world. The process from its sowing to consumption via harvesting, processing, storage and management require much effort and expertise. The grain quality of the product is heavily affected [...] Read more.
Paddy (Oryza sativa) is one of the most consumed food grains in the world. The process from its sowing to consumption via harvesting, processing, storage and management require much effort and expertise. The grain quality of the product is heavily affected by the weather conditions, irrigation frequency, and many other factors. However, quality control is of immense importance, and thus, the evaluation of grain quality is necessary. Since it is necessary and arduous, we try to overcome the limitations and shortcomings of grain quality evaluation using image processing and machine learning (ML) techniques. Most existing methods are designed for rice grain quality assessment, noting that the key characteristics of paddy and rice are different. In addition, they have complex and expensive setups and utilize black-box ML models. To handle these issues, in this paper, we propose a reliable ML-based IoT paddy grain quality assessment system utilizing affordable sensors. It involves a specific data collection procedure followed by image processing with an ML-based model to predict the quality. Different explainable features are used for classifying the grain quality of paddy grain, like the shape, size, moisture, and maturity of the grain. The precision of the system was tested in real-world scenarios. To our knowledge, it is the first automated system to precisely provide an overall quality metric. The main feature of our system is its explainability in terms of utilized features and fuzzy rules, which increases the confidence and trustworthiness of the public toward its use. The grain variety used for experiments majorly belonged to the Indian Subcontinent, but it covered a significant variation in the shape and size of the grain. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

20 pages, 6807 KiB  
Article
Single Image Super Resolution Using Deep Residual Learning
by Moiz Hassan, Kandasamy Illanko and Xavier N. Fernando
AI 2024, 5(1), 426-445; https://doi.org/10.3390/ai5010021 - 21 Mar 2024
Viewed by 1563
Abstract
Single Image Super Resolution (SSIR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SSIR has numerous applications in fields such as medical/satellite imaging, remote target identification and autonomous vehicles. [...] Read more.
Single Image Super Resolution (SSIR) is an intriguing research topic in computer vision where the goal is to create high-resolution images from low-resolution ones using innovative techniques. SSIR has numerous applications in fields such as medical/satellite imaging, remote target identification and autonomous vehicles. Compared to interpolation based traditional approaches, deep learning techniques have recently gained attention in SISR due to their superior performance and computational efficiency. This article proposes an Autoencoder based Deep Learning Model for SSIR. The down-sampling part of the Autoencoder mainly uses 3 by 3 convolution and has no subsampling layers. The up-sampling part uses transpose convolution and residual connections from the down sampling part. The model is trained using a subset of the VILRC ImageNet database as well as the RealSR database. Quantitative metrics such as PSNR and SSIM are found to be as high as 76.06 and 0.93 in our testing. We also used qualitative measures such as perceptual quality. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

31 pages, 4849 KiB  
Article
MultiWave-Net: An Optimized Spatiotemporal Network for Abnormal Action Recognition Using Wavelet-Based Channel Augmentation
by Ramez M. Elmasry, Mohamed A. Abd El Ghany, Mohammed A.-M. Salem and Omar M. Fahmy
AI 2024, 5(1), 259-289; https://doi.org/10.3390/ai5010014 - 24 Jan 2024
Viewed by 1213
Abstract
Human behavior is regarded as one of the most complex notions present nowadays, due to the large magnitude of possibilities. These behaviors and actions can be distinguished as normal and abnormal. However, abnormal behavior is a vast spectrum, so in this work, abnormal [...] Read more.
Human behavior is regarded as one of the most complex notions present nowadays, due to the large magnitude of possibilities. These behaviors and actions can be distinguished as normal and abnormal. However, abnormal behavior is a vast spectrum, so in this work, abnormal behavior is regarded as human aggression or in another context when car accidents occur on the road. As this behavior can negatively affect the surrounding traffic participants, such as vehicles and other pedestrians, it is crucial to monitor such behavior. Given the current prevalent spread of cameras everywhere with different types, they can be used to classify and monitor such behavior. Accordingly, this work proposes a new optimized model based on a novel integrated wavelet-based channel augmentation unit for classifying human behavior in various scenes, having a total number of trainable parameters of 5.3 m with an average inference time of 0.09 s. The model has been trained and evaluated on four public datasets: Real Live Violence Situations (RLVS), Highway Incident Detection (HWID), Movie Fights, and Hockey Fights. The proposed technique achieved accuracies in the range of 92% to 99.5% across the used benchmark datasets. Comprehensive analysis and comparisons between different versions of the model and the state-of-the-art have been performed to confirm the model’s performance in terms of accuracy and efficiency. The proposed model has higher accuracy with an average of 4.97%, and higher efficiency by reducing the number of parameters by around 139.1 m compared to other models trained and tested on the same benchmark datasets. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

23 pages, 9079 KiB  
Article
Deep Learning Performance Characterization on GPUs for Various Quantization Frameworks
by Muhammad Ali Shafique, Arslan Munir and Joonho Kong
AI 2023, 4(4), 926-948; https://doi.org/10.3390/ai4040047 - 18 Oct 2023
Viewed by 2128
Abstract
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, [...] Read more.
Deep learning is employed in many applications, such as computer vision, natural language processing, robotics, and recommender systems. Large and complex neural networks lead to high accuracy; however, they adversely affect many aspects of deep learning performance, such as training time, latency, throughput, energy consumption, and memory usage in the training and inference stages. To solve these challenges, various optimization techniques and frameworks have been developed for the efficient performance of deep learning models in the training and inference stages. Although optimization techniques such as quantization have been studied thoroughly in the past, less work has been done to study the performance of frameworks that provide quantization techniques. In this paper, we have used different performance metrics to study the performance of various quantization frameworks, including TensorFlow automatic mixed precision and TensorRT. These performance metrics include training time and memory utilization in the training stage along with latency and throughput for graphics processing units (GPUs) in the inference stage. We have applied the automatic mixed precision (AMP) technique during the training stage using the TensorFlow framework, while for inference we have utilized the TensorRT framework for the post-training quantization technique using the TensorFlow TensorRT (TF-TRT) application programming interface (API).We performed model profiling for different deep learning models, datasets, image sizes, and batch sizes for both the training and inference stages, the results of which can help developers and researchers to devise and deploy efficient deep learning models for GPUs. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

Review

Jump to: Research

21 pages, 683 KiB  
Review
Few-Shot Fine-Grained Image Classification: A Comprehensive Review
by Jie Ren, Changmiao Li, Yaohui An, Weichuan Zhang and Changming Sun
AI 2024, 5(1), 405-425; https://doi.org/10.3390/ai5010020 - 6 Mar 2024
Viewed by 1639
Abstract
Few-shot fine-grained image classification (FSFGIC) methods refer to the classification of images (e.g., birds, flowers, and airplanes) belonging to different subclasses of the same species by a small number of labeled samples. Through feature representation learning, FSFGIC methods can make better use of [...] Read more.
Few-shot fine-grained image classification (FSFGIC) methods refer to the classification of images (e.g., birds, flowers, and airplanes) belonging to different subclasses of the same species by a small number of labeled samples. Through feature representation learning, FSFGIC methods can make better use of limited sample information, learn more discriminative feature representations, greatly improve the classification accuracy and generalization ability, and thus achieve better results in FSFGIC tasks. In this paper, starting from the definition of FSFGIC, a taxonomy of feature representation learning for FSFGIC is proposed. According to this taxonomy, we discuss key issues on FSFGIC (including data augmentation, local and/or global deep feature representation learning, class representation learning, and task-specific feature representation learning). In addition, the existing popular datasets, current challenges and future development trends of feature representation learning on FSFGIC are also described. Full article
(This article belongs to the Special Issue Artificial Intelligence-Based Image Processing and Computer Vision)
Show Figures

Figure 1

Back to TopTop