Algorithms

Research

17 pages, 2144 KB

Open Access Article

DEPANet: A Differentiable Edge-Guided Pyramid Aggregation Network for Strip Steel Surface Defect Segmentation

by Yange Sun, Siyu Geng, Chengyi Zheng, Chenglong Xu, Huaping Guo and Yan Feng

Algorithms 2025, 18(5), 279; https://doi.org/10.3390/a18050279 - 9 May 2025

Viewed by 481

Abstract

The steel strip is an important and ideal material for the automotive and aerospace industries due to its superior machinability, cost efficiency, and flexibility. However, surface defects such as inclusions, spots, and scratches can significantly impact product performance and durability. Accurately identifying these defects remains challenging due to the complex texture structures and subtle variations in the material. To tackle this challenge, we propose a Differentiable Edge-guided Pyramid Aggregation Network (DEPANet) that uses edge information to improve segmentation performance. DEPANet adopts an end-to-end encoder-decoder framework, where the encoder consists of three key components: a backbone network, a Differentiable Edge Feature Pyramid network (DEFP), and Edge-aware Feature Aggregation Modules (EFAMs). The backbone network is designed to extract overall features from the strip steel surface, while the proposed DEFP utilizes learnable Laplacian operators to extract multiscale edge information of defects. In addition, the proposed EFAMs aggregate the overall features generated by the backbone and the edge information obtained from DEFP using the Convolutional Block Attention Module (CBAM), which combines channel attention and spatial attention mechanisms, to enhance feature expression. Finally, through the decoder, implemented as a Feature Pyramid Network (FPN), the multiscale edge-enhanced features are progressively upsampled and fused to reconstruct high-resolution segmentation maps, enabling precise defect localization and robust handling of defects across various sizes and shapes. DEPANet demonstrates superior segmentation accuracy, edge preservation, and feature representation on the SD-saliency-900 dataset, outperforming other state-of-the-art methods and delivering more precise and reliable defect segmentation.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
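
As a rough illustration of the Laplacian edge extraction mentioned in the DEPANet abstract, the sketch below applies a fixed 3x3 Laplacian kernel to a toy image in plain Python. It is not the authors' implementation: the kernel values, the helper name `filter2d`, and the toy image are assumptions made for illustration. In a learnable variant, the nine kernel weights would presumably be trainable parameters initialised to values like these.

```python
# Standard 4-neighbour Laplacian kernel (illustrative starting values only;
# a learnable operator would update these weights during training).
LAPLACIAN = [
    [0.0, 1.0, 0.0],
    [1.0, -4.0, 1.0],
    [0.0, 1.0, 0.0],
]


def filter2d(image, kernel):
    """Valid-mode 2D cross-correlation of a list-of-lists image with a 3x3 kernel."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


# Toy 5x5 "surface": dark on the left, bright on the right (a vertical step edge).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edges = filter2d(image, LAPLACIAN)
```

The response is non-zero only next to the step and zero in the flat region, which is the property that makes the operator useful as an edge cue.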

17 pages, 8061 KB

Open Access Article

Optimal View Estimation Algorithm and Evaluation with Deviation Angle Analysis

by Meng Yuan and Hongjun Li

Algorithms 2025, 18(4), 224; https://doi.org/10.3390/a18040224 - 12 Apr 2025

Viewed by 662

Abstract

Image-based viewpoint estimation is one task in image analysis; another is the inverse problem of selecting the best viewpoint for displaying a three-dimensional object. Currently, two issues need further exploration in image-based viewpoint estimation research: insufficient labeled data and a limited number of evaluation methods for estimation results. To address the first issue, this paper proposes a spherical viewpoint sampling method based on a combination of analytical methods and motion adjustment, and designs a viewpoint-based projection image acquisition algorithm. Considering the difference between viewpoint inference and image classification, we propose an accuracy evaluation method with deviation angle tolerance for viewpoint estimation. Using a newly constructed dataset with viewpoint labels, the new accuracy evaluation method has been validated through experiments. The experimental results show that its estimation accuracy can reach 89% according to the new estimation evaluation indicators. Additionally, we applied our method to estimate the viewpoints of images from a furniture website and analyzed the viewpoint preferences in its furniture displays.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
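
One plausible reading of "accuracy with deviation angle tolerance" is sketched below: a prediction counts as correct when the angle between the predicted and true viewing directions is within a tolerance. The abstract does not give the exact definition, so the (azimuth, elevation) parameterisation, the function names, and the 15-degree tolerance are all assumptions for illustration.

```python
import math


def view_direction(azimuth_deg, elevation_deg):
    """Unit vector on the viewing sphere for an (azimuth, elevation) pair in degrees."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def deviation_angle(view_a, view_b):
    """Angle in degrees between two viewpoints given as (azimuth, elevation)."""
    dot = sum(x * y for x, y in zip(view_direction(*view_a), view_direction(*view_b)))
    dot = max(-1.0, min(1.0, dot))  # guard against rounding slightly outside [-1, 1]
    return math.degrees(math.acos(dot))


def accuracy_with_tolerance(predicted, actual, tolerance_deg):
    """Fraction of predictions whose deviation angle is within the tolerance."""
    hits = sum(
        1 for p, t in zip(predicted, actual) if deviation_angle(p, t) <= tolerance_deg
    )
    return hits / len(actual)


predicted = [(0, 0), (10, 0), (90, 0)]
actual = [(0, 0), (0, 0), (0, 0)]
score = accuracy_with_tolerance(predicted, actual, tolerance_deg=15)
```

Unlike exact-match classification accuracy, this measure gives credit to the second prediction, which is only 10 degrees away from the labeled viewpoint.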

12 pages, 784 KB

Open Access Article

Image Deconvolution to Resolve Astronomical X-Ray Sources in Close Proximity: The NuSTAR Images of SXP 15.3 and SXP 305

by Sayantan Bhattacharya, Dimitris M. Christodoulou and Silas G. T. Laycock

Algorithms 2025, 18(4), 191; https://doi.org/10.3390/a18040191 - 27 Mar 2025

Viewed by 543

Abstract

The broad point spread function of the NuSTAR telescope makes resolving astronomical X-ray sources a challenging task, especially for off-axis observations. This limitation has affected the observations of the high-mass X-ray binary pulsars SXP 15.3 and SXP 305, in which pulsations are detected from nearly overlapping regions without spatially resolving these X-ray sources. To address this issue, we introduce a deconvolution algorithm designed to enhance NuSTAR’s spatial resolution for closely spaced X-ray sources. We apply this technique to archival data and simulations of synthetic point sources placed at varying separations and locations, testing the algorithm’s efficacy in source detection and differentiation. Our study confirms that on some occasions when SXP 305 is brighter, SXP 15.3 is also resolved, suggesting that some prior non-detections may have resulted from imaging limitations. This deconvolution technique represents a proof-of-concept test for analyzing crowded fields in the sky with closely spaced X-ray sources in future NuSTAR observations.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

28 pages, 1296 KB

Open Access Article

Fidex and FidexGlo: From Local Explanations to Global Explanations of Deep Models

by Guido Bologna, Jean-Marc Boutay, Damian Boquete, Quentin Leblanc, Deniz Köprülü and Ludovic Pfeiffer

Algorithms 2025, 18(3), 120; https://doi.org/10.3390/a18030120 - 20 Feb 2025

Viewed by 774

Abstract

Deep connectionist models are characterized by many neurons grouped together in many successive layers. As a result, their data classifications are difficult to understand. We present two novel algorithms which explain the responses of several black-box machine learning models. The first is Fidex, which is local and thus applied to a single sample. The second, called FidexGlo, is global and uses Fidex. Both algorithms generate explanations by means of propositional rules. In our framework, the discriminative boundaries are parallel to the input variables and their location is precisely determined. Fidex is a heuristic algorithm that, at each step, selects the hyperplane that increases fidelity the most. The algorithmic complexity of Fidex is proportional to the maximum number of steps, the number of possible hyperplanes, which is finite, and the number of samples. We first used FidexGlo with ensembles and support vector machines (SVMs) to show that its performance on three benchmark problems is competitive in terms of complexity, fidelity and accuracy. The most challenging part was then to apply it to convolutional neural networks. We achieved this with three classification problems based on images. We obtained accurate results and described the characteristics of the rules generated, as well as several examples of explanations illustrated with their corresponding images. To the best of our knowledge, this is one of the few works showing a global rule extraction technique applied to ensembles, SVMs and deep neural networks.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

16 pages, 2204 KB

Open Access Article

EGSDK-Net: Edge-Guided Stepwise Dual Kernel Update Network for Panoptic Segmentation

by Pengyu Mu, Hongwei Zhao and Ke Ma

Algorithms 2025, 18(2), 71; https://doi.org/10.3390/a18020071 - 1 Feb 2025

Cited by 1 | Viewed by 872

Abstract

In recent years, panoptic segmentation has garnered increasing attention from researchers aiming to better understand scenes in images. Although many excellent studies have been proposed, they share some common unresolved issues. Firstly, panoptic segmentation, as a novel task, is still confined within inherent frameworks. Secondly, the prevalent kernel update strategies do not adequately utilize the information from each stage. To address these two issues, we propose an edge-guided stepwise dual kernel update network (EGSDK-Net) for panoptic segmentation; the core components are the real-time edge guidance module and the stepwise dual kernel update module. The first component, after extracting and positioning edge features through an extra branch, applies these features to the normally transmitted feature maps within the network to highlight the edges. The input image is initially processed with the Canny edge detector to generate and store the predicted edge map, which acts as the ground truth for supervising the extracted edge feature map. The stepwise dual kernel update module enhances the utilization of information by allowing each stage to update both its own kernel and that of the subsequent stage, thereby improving the judgment capabilities of the kernels. EGSDK-Net achieves a PQ of 60.6, representing a 2.19% improvement over RT-K-Net.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

18 pages, 1508 KB

Open Access Article

Adversarial Validation in Image Classification Datasets by Means of Cumulative Spectral Gradient

by Diego Renza, Ernesto Moya-Albor and Adrian Chavarro

Algorithms 2024, 17(11), 531; https://doi.org/10.3390/a17110531 - 19 Nov 2024

Cited by 1 | Viewed by 1260

Abstract

The main objective of a machine learning (ML) system is to obtain a trained model from input data in such a way that it allows predictions to be made on new independent and identically distributed (i.i.d.) data with the lowest possible error. However, how can we assess whether the training and test data have a similar distribution? To answer this question, this paper presents a proposal to determine the degree of distribution shift of two datasets. To this end, a metric for evaluating complexity in datasets is used, which can be applied in multi-class problems, comparing each pair of classes of the two sets. The proposed methodology has been applied to three well-known datasets: MNIST, CIFAR-10 and CIFAR-100, together with corrupted versions of these. Through this methodology, it is possible to evaluate which types of modification have a greater impact on the generalization of the models without the need to train multiple models multiple times, also allowing us to determine which classes are more affected by corruption.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

21 pages, 3213 KB

Open Access Article

An Autoencoder-Based Task-Oriented Semantic Communication System for M2M Communication

by Prabhath Samarathunga, Hossein Rezaei, Maheshi Lokumarambage, Thushan Sivalingam, Nandana Rajatheva and Anil Fernando

Algorithms 2024, 17(11), 492; https://doi.org/10.3390/a17110492 - 2 Nov 2024

Cited by 1 | Viewed by 1597

Abstract

Semantic communication (SC) is a communication paradigm that has gained significant attention, as it offers a potential solution to move beyond Shannon’s formulation in bandwidth-limited communication channels by delivering the semantic meaning of the message rather than its exact form. In this paper, we propose an autoencoder-based SC system for transmitting images between two machines over error-prone channels to support emerging applications such as VIoT, XR, M2M, and M2H communications. The proposed autoencoder architecture, with a semantically modeled encoder and decoder, transmits image data as a reduced-dimension vector (latent vector) through an error-prone channel. The decoder then reconstructs the image to determine its M2M implications. The autoencoder is trained for different noise levels under various channel conditions, and both image quality and classification accuracy are used to evaluate the system’s efficacy. A CNN image classifier measures accuracy, as no image quality metric is available for SC yet. The simulation results show that all proposed autoencoders maintain high image quality and classification accuracy at high SNRs, while the autoencoder trained with zero noise underperforms other trained autoencoders at moderate SNRs. The results further indicate that all other proposed autoencoders trained under different noise levels are highly robust against channel impairments. We compare the proposed system against a comparable JPEG transmission system, and results reveal that the proposed system outperforms the JPEG system in compression efficiency by up to 50% and in received image quality with an image coding gain of up to 17 dB.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

28 pages, 5276 KB

Open Access Article

Frequency-Domain and Spatial-Domain MLMVN-Based Convolutional Neural Networks

by Igor Aizenberg and Alexander Vasko

Algorithms 2024, 17(8), 361; https://doi.org/10.3390/a17080361 - 17 Aug 2024

Cited by 3 | Viewed by 1608

Abstract

This paper presents a detailed analysis of a convolutional neural network based on multi-valued neurons (CNNMVN) and a fully connected multilayer neural network based on multi-valued neurons (MLMVN), employed here as a convolutional neural network in the frequency domain. We begin by providing an overview of the fundamental concepts underlying CNNMVN, focusing on the organization of convolutional layers and the CNNMVN learning algorithm. The error backpropagation rule for this network is justified and presented in detail. Subsequently, we consider how MLMVN can be used as a convolutional neural network in the frequency domain. It is shown that each neuron in the first hidden layer of MLMVN may work as a frequency-domain convolutional kernel, utilizing the Convolution Theorem. Essentially, these neurons create Fourier transforms of the feature maps that would have resulted from the convolutions in the spatial domain performed in regular convolutional neural networks. Furthermore, we discuss optimization techniques for both networks and compare the resulting convolutions to explore which features they extract from images. Finally, we present experimental results showing that both approaches can achieve high accuracy in image recognition.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
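
The Convolution Theorem invoked in the abstract above can be checked numerically. The following one-dimensional sketch, written with a naive DFT from the standard library, shows that a circular convolution in the spatial domain equals an element-wise product in the frequency domain. The signal, kernel, and function names are illustrative assumptions and are unrelated to the MLMVN code itself.

```python
import cmath


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform (or its inverse)."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = sum(
            values[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n)
        )
        out.append(total / n if inverse else total)
    return out


def circular_convolve(signal, kernel):
    """Direct circular convolution of two equal-length sequences."""
    n = len(signal)
    return [
        sum(signal[m] * kernel[(t - m) % n] for m in range(n)) for t in range(n)
    ]


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [1.0, 0.0, -1.0, 0.0]

# Spatial-domain route: convolve directly.
spatial = circular_convolve(signal, kernel)

# Frequency-domain route: multiply the spectra element-wise, then invert.
product = [a * b for a, b in zip(dft(signal), dft(kernel))]
frequency = [value.real for value in dft(product, inverse=True)]
```

Both routes give the same sequence up to rounding error, which is why a set of per-frequency weights can play the role of a convolution kernel.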

26 pages, 501 KB

Open Access Article

In-Depth Analysis of GAF-Net: Comparative Fusion Approaches in Video-Based Person Re-Identification

by Moncef Boujou, Rabah Iguernaissi, Lionel Nicod, Djamal Merad and Séverine Dubuisson

Algorithms 2024, 17(8), 352; https://doi.org/10.3390/a17080352 - 11 Aug 2024

Cited by 3 | Viewed by 1885

Abstract

This study provides an in-depth analysis of GAF-Net, a novel model for video-based person re-identification (Re-ID) that matches individuals across different video sequences. GAF-Net combines appearance-based features with gait-based features derived from skeletal data, offering a new approach that diverges from traditional silhouette-based methods. We thoroughly examine each module of GAF-Net and explore various fusion methods at both the score and feature levels, extending beyond initial simple concatenation. Comprehensive evaluations on the iLIDS-VID and MARS datasets demonstrate GAF-Net’s effectiveness across scenarios. GAF-Net achieves state-of-the-art 93.2% rank-1 accuracy on iLIDS-VID’s long sequences, while MARS results (86.09% mAP, 89.78% rank-1) reveal challenges with shorter, variable sequences in complex real-world settings. We demonstrate that integrating skeleton-based gait features consistently improves Re-ID performance, particularly with long, more informative sequences. This research provides crucial insights into multi-modal feature integration in Re-ID tasks, laying a foundation for the advancement of multi-modal biometric systems for diverse computer vision applications.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

27 pages, 2251 KB

Open Access Article

Threshold Active Learning Approach for Physical Violence Detection on Images Obtained from Video (Frame-Level) Using Pre-Trained Deep Learning Neural Network Models

by Itzel M. Abundez, Roberto Alejo, Francisco Primero Primero, Everardo E. Granda-Gutiérrez, Otniel Portillo-Rodríguez and Juan Alberto Antonio Velázquez

Algorithms 2024, 17(7), 316; https://doi.org/10.3390/a17070316 - 18 Jul 2024

Cited by 1 | Viewed by 3242

Abstract

Public authorities and private companies have used video cameras as part of surveillance systems, and one of their objectives is the rapid detection of physically violent actions. This task is usually performed by human visual inspection, which is labor-intensive. For this reason, different deep learning models have been implemented to remove the human eye from this task, yielding positive results. One of the main problems in detecting physical violence in videos is the variety of scenarios that can exist; models trained on a particular dataset tend to detect physical violence in only one or a few types of videos. In this work, we present an approach for physical violence detection on images obtained from video based on threshold active learning, which increases the classifier’s robustness in environments where it was not trained. The proposed approach consists of two stages: In the first stage, pre-trained neural network models are trained on initial datasets, and we use a threshold (μ) to identify those images that the classifier considers ambiguous or hard to classify. Then, they are included in the training dataset, and the model is retrained to improve its classification performance. In the second stage, we test the model with video images from other environments, and we again employ μ to detect ambiguous images that a human expert analyzes to determine the true class or resolve the ambiguity. After that, the ambiguous images are added to the original training set and the classifier is retrained; this process is repeated while ambiguous images exist. The model is a hybrid neural network that uses transfer learning and a threshold μ to detect physical violence on images obtained from video files successfully. In this active learning process, the classifier can detect physical violence in different environments, where the main contribution is the method used to obtain a threshold μ (which is based on the neural network output) that allows human experts to contribute to the classification process to obtain more robust neural networks and high-quality datasets. The experimental results show the proposed approach’s effectiveness in detecting physical violence, where it is trained using an initial dataset, and new images are added to improve its robustness in diverse environments.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
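
The threshold step described in the abstract above can be sketched as a simple filter over classifier outputs. The abstract only states that μ is derived from the neural network output, so the rule used here (a frame is ambiguous when its highest class probability falls below μ), the value μ = 0.8, and the function name are hypothetical choices for illustration, not the authors' method.

```python
def split_by_ambiguity(probabilities, mu):
    """Split sample indices into (confident, ambiguous) lists.

    A sample is treated as ambiguous when its highest class probability is
    below the threshold mu (an assumed rule, chosen only for illustration).
    """
    confident, ambiguous = [], []
    for index, class_probs in enumerate(probabilities):
        if max(class_probs) < mu:
            ambiguous.append(index)
        else:
            confident.append(index)
    return confident, ambiguous


# Hypothetical per-frame outputs: [P(no violence), P(violence)].
outputs = [[0.97, 0.03], [0.55, 0.45], [0.10, 0.90], [0.48, 0.52]]
confident, ambiguous = split_by_ambiguity(outputs, mu=0.8)
```

In the loop the abstract describes, the frames listed in `ambiguous` would go to a human expert for labeling before being added to the training set, and retraining would repeat until no ambiguous frames remain.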

20 pages, 1373 KB

Open Access Article

A Sparsity-Invariant Model via Unifying Depth Prediction and Completion

by Shuling Wang, Fengze Jiang and Xiaojin Gong

Algorithms 2024, 17(7), 298; https://doi.org/10.3390/a17070298 - 6 Jul 2024

Cited by 1 | Viewed by 1203

Abstract

The development of a sparse-invariant depth completion model capable of handling varying levels of input depth sparsity is highly desirable in real-world applications. However, existing sparse-invariant models tend to degrade when the input depth points are extremely sparse. In this paper, we propose a new model that combines the advantageous designs of depth completion and monocular depth estimation tasks to achieve sparse invariance. Specifically, we construct a dual-branch architecture with one branch dedicated to depth prediction and the other to depth completion. Additionally, we integrate the multi-scale local planar module in the decoders of both branches. Experimental results on the NYU Depth V2 benchmark and the OPPO prototype dataset equipped with the Spot-iToF316 sensor demonstrate that our model achieves reliable results even in cases with irregularly distributed, limited or absent depth information.

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)
