
Image Processing and Pattern Recognition Based on Deep Learning—2nd Edition

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Sensing and Imaging".

Deadline for manuscript submissions: closed (25 November 2024) | Viewed by 34114

Special Issue Editors


Prof. Dr. Dan Popescu
Guest Editor
Department of Automation and Industrial Informatics, Faculty of Automatic Control and Computer Science, University POLITEHNICA of Bucharest, 060042 Bucharest, Romania
Interests: image acquisition; image processing; feature extraction; image classification; image segmentation; artificial neural networks; deep learning; wireless sensor networks; unmanned aerial vehicles; data fusion; data processing in medicine; data processing in agriculture

Prof. Dr. Loretta Ichim
Guest Editor
Department of Automation and Industrial Informatics, Faculty of Automatic Control and Computer Science, University POLITEHNICA of Bucharest, 060042 Bucharest, Romania
Interests: convolutional neural networks; artificial intelligence; medical image processing; biomedical optical imaging; computer vision; computerised monitoring; data acquisition; image colour analysis; texture analysis; cloud computing

Special Issue Information

Dear Colleagues,

Pattern recognition applied to the analysis and interpretation of regions of interest in images is today closely tied to the use of artificial intelligence, and in particular to neural networks based on deep learning. Current trends in the use of neural networks include modifying networks from established families to improve statistical or runtime performance, transfer learning, using multiple networks in more complex systems, merging the decisions of individual networks, and combining efficient features with neural networks for higher-performance detection or classification. In addition, combination with other classifiers based on artificial intelligence is a possible avenue.

The aim of this Special Issue is to publish original research contributions concerning new neural-network-based approaches to image processing and pattern recognition with direct applications in different domains, such as remote sensing, crop monitoring, border monitoring, support systems for medical diagnosis, emotion detection, and so on.

The scope of this Special Issue includes (but is not limited to) the following research areas concerning image processing and pattern recognition with the aid of new artificial intelligence techniques:

  • Image processing;
  • Pattern recognition;
  • Image segmentation;
  • Object classification;
  • Neural networks;
  • Deep learning;
  • Decision fusion;
  • Systems based on multiple neural networks;
  • The detection of regions of interest from remote sensing images;
  • Industry applications;
  • Precision agriculture applications;
  • Medical applications;
  • The monitoring of protected areas;
  • Disaster monitoring and assessment.

Prof. Dr. Dan Popescu
Prof. Dr. Loretta Ichim
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (18 papers)


Research

27 pages, 8017 KiB  
Article
Quantum Variational vs. Quantum Kernel Machine Learning Models for Partial Discharge Classification in Dielectric Oils
by José Miguel Monzón-Verona, Santiago García-Alonso and Francisco Jorge Santana-Martín
Sensors 2025, 25(4), 1277; https://doi.org/10.3390/s25041277 - 19 Feb 2025
Viewed by 611
Abstract
In this paper, electrical discharge images are classified using AI with quantum machine learning techniques. These discharges originated in dielectric mineral oils and were detected by a high-resolution optical sensor. The captured images were processed in a Scikit-image environment to obtain a reduced number of features or qubits for later training of quantum circuits. Two quantum binary classification models were developed and compared in the Qiskit environment for four discharge binary combinations. The first was a quantum variational model (QVM), and the second was a conventional support vector machine (SVM) with a quantum kernel model (QKM). Both models were executed on three physical fault-tolerant IBM quantum computers. The novelty of this article lies in its application to a real problem, unlike other studies that focus on simulated or theoretical data sets. In addition, a study is carried out on the impact of the number of qubits in QKM, and it is shown that increasing the number of qubits in this model significantly improves the accuracy in the classification of the four binary combinations studied. In the QVM, with two qubits, an accuracy of 92% was observed for the first discharge combination on the three quantum computers used, with a margin of error of 1% compared to the simulation obtained on classical computers.
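
The QKM pipeline described here follows the standard pattern of plugging a quantum kernel into a classical SVM. The sketch below illustrates that pattern under stated assumptions: a toy angle-encoding feature map computed classically with NumPy stands in for the Qiskit circuits, and the squared state fidelity serves as the kernel entry. It is not the authors' code.

```python
# Sketch of the quantum-kernel SVM (QKM) idea; not the authors' code.
# Features are angle-encoded into single-qubit states, the kernel entry is
# the squared state fidelity |<phi(x)|phi(y)>|^2, and the resulting Gram
# matrix feeds a classical SVM as a precomputed kernel.
import numpy as np
from sklearn.svm import SVC

def encode(x):
    """Angle-encode a feature vector as a product state, one qubit per feature."""
    state = np.array([1.0])
    for xi in x:
        state = np.kron(state, np.array([np.cos(xi / 2), np.sin(xi / 2)]))
    return state

def quantum_kernel(A, B):
    """Gram matrix of squared state fidelities between two sample sets."""
    SA = np.array([encode(a) for a in A])
    SB = np.array([encode(b) for b in B])
    return np.abs(SA @ SB.T) ** 2

# Toy stand-ins for the reduced image features (qubits) used in the paper.
rng = np.random.default_rng(0)
X_train, y_train = rng.uniform(0, np.pi, (40, 4)), rng.integers(0, 2, 40)
X_test = rng.uniform(0, np.pi, (10, 4))

svm = SVC(kernel="precomputed").fit(quantum_kernel(X_train, X_train), y_train)
pred = svm.predict(quantum_kernel(X_test, X_train))
```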

15 pages, 3484 KiB  
Article
PC-CS-YOLO: High-Precision Obstacle Detection for Visually Impaired Safety
by Jincheng Li, Menglin Zheng, Danyang Dong and Xing Xie
Sensors 2025, 25(2), 534; https://doi.org/10.3390/s25020534 - 17 Jan 2025
Viewed by 967
Abstract
The issue of obstacle avoidance and safety for visually impaired individuals has been a major topic of research. However, complex street environments still pose significant challenges for blind obstacle detection systems. Existing solutions often fail to provide real-time, accurate obstacle avoidance decisions. In [...] Read more.
The issue of obstacle avoidance and safety for visually impaired individuals has been a major topic of research. However, complex street environments still pose significant challenges for blind obstacle detection systems. Existing solutions often fail to provide real-time, accurate obstacle avoidance decisions. In this study, we propose a blind obstacle detection system based on the PC-CS-YOLO model. The system improves the backbone network by adopting the partial convolutional feed-forward network (PCFN) to reduce computational redundancy. Additionally, to enhance the network’s robustness in multi-scale feature fusion, we introduce the Cross-Scale Attention Fusion (CSAF) mechanism, which integrates features from different sensory domains to achieve superior performance. Compared to state-of-the-art networks, our system shows improvements of 2.0%, 3.9%, and 1.5% in precision, recall, and mAP50, respectively. When evaluated on a GPU, the inference speed is 20.6 ms, which is 15.3 ms faster than YOLO11, meeting the real-time requirements for blind obstacle avoidance systems. Full article
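
As a concrete reference point for the partial-convolution idea that a PCFN builds on, here is a minimal sketch: a spatial convolution touches only a fraction of the channels while the rest pass through untouched, which is what cuts redundant computation. This is our generic reading (the channel ratio and kernel size are assumptions), not the paper's exact module.

```python
# Generic partial convolution (PConv) block: convolve a subset of channels,
# pass the rest through unchanged. A sketch, not PC-CS-YOLO's exact layer.
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    def __init__(self, channels, ratio=0.25, k=3):
        super().__init__()
        self.c_conv = int(channels * ratio)  # channels that get convolved
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, k, padding=k // 2, bias=False)

    def forward(self, x):
        a, b = x[:, :self.c_conv], x[:, self.c_conv:]  # split channels
        return torch.cat([self.conv(a), b], dim=1)     # rest is identity

y = PartialConv(64)(torch.randn(1, 64, 80, 80))  # same shape out
```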

26 pages, 12260 KiB  
Article
Deep Learning-Based Pointer Meter Reading Recognition for Advancing Manufacturing Digital Transformation Research
by Xiang Li, Jun Zhao, Changchang Zeng, Yong Yao, Sen Zhang and Suixian Yang
Sensors 2025, 25(1), 244; https://doi.org/10.3390/s25010244 - 3 Jan 2025
Viewed by 910
Abstract
With the digital transformation of the manufacturing industry, data monitoring and collecting in the manufacturing process become essential. Pointer meter reading recognition (PMRR) is a key element in data monitoring throughout the manufacturing process. However, existing PMRR methods have low accuracy and insufficient robustness due to issues such as blur, uneven illumination, tilt, and complex backgrounds in meter images. To address these challenges, we propose an end-to-end PMRR method based on a decoupled circle head detection algorithm (YOLOX-DC) and a Unet-like pure Transformer segmentation network (PM-SwinUnet). First, according to the characteristics of the pointer dial, the YOLOX-DC detection algorithm is designed based on the exceeding you only look once detector (YOLOX). The decoupled circle head of YOLOX-DC detects the pointer meter dial more accurately than the commonly used rectangular detection head. Second, the window multi-head attention of the PM-SwinUnet network enhances the feature extraction ability of pointer meter images and solves problems of missed scale detection and incomplete pointer segmentation. Additionally, the scale and pointer fitting module is introduced into the PM-SwinUnet to locate the accurate position of the scale and pointer. Finally, through the angle relationship between the pointer and the first two main scale lines, the pointer meter reading is accurately calculated by the improved angle method. Experimental results demonstrate the effectiveness and superiority of the proposed end-to-end method across three pointer meter datasets. Furthermore, it provides a rapid and robust approach to the digital transformation of manufacturing systems.
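
The final reading step lends itself to a worked example. Below is a hedged sketch of an angle-based reading computation of the kind the abstract describes: the value is interpolated from the pointer's angle relative to the two nearest main scale lines. The exact formulation of the paper's improved angle method may differ.

```python
# Angle-method sketch: interpolate the dial value from the pointer angle and
# the two adjacent main scale lines (all angles in degrees).
def meter_reading(pointer_deg, scale1_deg, scale2_deg, value1, value2):
    span = (scale2_deg - scale1_deg) % 360     # angular gap between the marks
    offset = (pointer_deg - scale1_deg) % 360  # pointer position in that gap
    return value1 + (value2 - value1) * offset / span

# Pointer at 57 deg between the 0.2 mark (45 deg) and the 0.4 mark (75 deg):
print(meter_reading(57, 45, 75, 0.2, 0.4))  # -> 0.28
```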

17 pages, 7156 KiB  
Article
Advancing a Vision Foundation Model for Ming-Style Furniture Image Segmentation: A New Dataset and Method
by Yingtong Wan, Wanru Wang, Meng Zhang, Wei Peng and He Tang
Sensors 2025, 25(1), 96; https://doi.org/10.3390/s25010096 - 27 Dec 2024
Viewed by 781
Abstract
This paper tackles the challenge of accurately segmenting images of Ming-style furniture, an important aspect of China’s cultural heritage, to aid in its preservation and analysis. Existing vision foundation models, like the segment anything model (SAM), struggle with the complex structures of Ming furniture due to the need for manual prompts and imprecise segmentation outputs. To address these limitations, we introduce two key innovations: the material attribute prompter (MAP), which automatically generates prompts based on the furniture’s material properties, and the structure refinement module (SRM), which enhances segmentation by combining high- and low-level features. Additionally, we present the MF2K dataset, which includes 2073 images annotated with pixel-level masks across eight materials and environments. Our experiments demonstrate that the proposed method significantly improves the segmentation accuracy, outperforming state-of-the-art models in terms of the mean intersection over union (mIoU). Ablation studies highlight the contributions of the MAP and SRM to both the performance and computational efficiency. This work offers a powerful automated solution for segmenting intricate furniture structures, facilitating digital preservation and in-depth analysis of Ming-style furniture.

16 pages, 1698 KiB  
Article
Probabilistic Attention Map: A Probabilistic Attention Mechanism for Convolutional Neural Networks
by Yifeng Liu and Jing Tian
Sensors 2024, 24(24), 8187; https://doi.org/10.3390/s24248187 - 22 Dec 2024
Cited by 1 | Viewed by 1093
Abstract
The attention mechanism is essential to convolutional neural network (CNN) vision backbones used for sensing and imaging systems. Conventional attention modules are designed heuristically, relying heavily on empirical tuning. To tackle the challenge of designing attention mechanisms, this paper proposes a novel probabilistic attention mechanism. The key idea is to estimate the probabilistic distribution of activation maps within CNNs and construct probabilistic attention maps based on the correlation between attention weights and the estimated probabilistic distribution. The proposed approach consists of two main components: (i) the calculation of the probabilistic attention map and (ii) its integration into existing CNN architectures. In the first stage, the activation values generated at each CNN layer are modeled by using a Laplace distribution, which assigns probability values to each activation, representing its relative importance. Next, the probabilistic attention map is applied to the feature maps via element-wise multiplication and is seamlessly integrated as a plug-and-play module into existing CNN architectures. The experimental results show that the proposed probabilistic attention mechanism effectively boosts image classification accuracy performance across various CNN backbone models, outperforming both baseline and other attention mechanisms.
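
The two-stage recipe (fit a Laplace distribution to the activations, then reweight the feature map by the resulting densities) can be sketched compactly. The PyTorch module below is our reading of that recipe, with the mean used as a simple location estimate; the authors' implementation may differ.

```python
# Hedged sketch of a probabilistic attention map: model activations with a
# Laplace distribution, turn per-activation densities into attention weights,
# and apply them by element-wise multiplication. Not the authors' release.
import torch
import torch.nn as nn

class ProbabilisticAttention(nn.Module):
    def forward(self, x):  # x: (N, C, H, W) activation map
        mu = x.mean(dim=(2, 3), keepdim=True)  # simple location estimate
        b = (x - mu).abs().mean(dim=(2, 3), keepdim=True).clamp_min(1e-6)  # scale
        density = torch.exp(-(x - mu).abs() / b) / (2 * b)  # Laplace pdf values
        attn = density / density.amax(dim=(2, 3), keepdim=True)  # scale to (0, 1]
        return x * attn  # plug-and-play reweighting of the feature map

out = ProbabilisticAttention()(torch.randn(2, 64, 32, 32))  # shape preserved
```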

19 pages, 2825 KiB  
Article
Style Transfer of Chinese Wuhu Iron Paintings Using Hierarchical Visual Transformer
by Yuying Zhou, Yao Ren, Chao Wu and Minglong Xue
Sensors 2024, 24(24), 8103; https://doi.org/10.3390/s24248103 - 19 Dec 2024
Viewed by 743
Abstract
Within the domain of traditional art, Chinese Wuhu Iron Painting distinguishes itself through its distinctive craftsmanship, aesthetic expressiveness, and choice of materials, presenting a formidable challenge in the arena of stylistic transformation. This paper introduces an innovative Hierarchical Visual Transformer (HVT) framework aimed at achieving effectiveness and precision in the style transfer of Wuhu Iron Paintings. The study begins with an in-depth analysis of the artistic style of Wuhu Iron Paintings, extracting key stylistic elements that meet technical requirements for style conversion. Furthermore, in response to the unique artistic characteristics of Wuhu Iron Paintings, this research constructs a multi-layered network structure capable of effectively capturing and parsing style and content features. Building on this, we have designed an Efficient Local Attention Decoder (ELA-Decoder) that adaptively decodes the style and content features through correlation, significantly enhancing the modeling of long-range dependencies between local and global information. Additionally, this paper proposes a Content Correction Module (CCM) to eliminate redundant features generated during the style transfer process, further optimizing the migration results. In light of the scarcity of existing datasets for Wuhu Iron Paintings, this study also collects and constructs a dedicated dataset for the style transfer of Wuhu Iron Paintings. Our method achieves optimal performance in terms of loss metrics, with a reduction of at least 4% in style loss and 5% in content loss compared to other advanced methods. Moreover, expert evaluations were conducted to validate the effectiveness of our approach, and the results show that our method received the highest number of votes, further demonstrating its superiority.

27 pages, 11281 KiB  
Article
Text Font Correction and Alignment Method for Scene Text Recognition
by Liuxu Ding, Yuefeng Liu, Qiyan Zhao and Yunong Liu
Sensors 2024, 24(24), 7917; https://doi.org/10.3390/s24247917 - 11 Dec 2024
Viewed by 801
Abstract
Text recognition is a rapidly evolving task with broad practical applications across multiple industries. However, due to the arbitrary-shape text arrangement, irregular text font, and unintended occlusion of font, this remains a challenging task. To handle images with arbitrary-shape text arrangement and irregular text font, we designed the Discriminative Standard Text Font (DSTF) and the Feature Alignment and Complementary Fusion (FACF). To address the unintended occlusion of font, we propose a Dual Attention Serial Module (DASM), which is integrated between residual modules to enhance the focus on text texture. These components improve text recognition by correcting irregular text and aligning it with the original feature extraction, thus complementing the overall recognition process. Additionally, to enhance the study of text recognition in natural scenes, we developed the VBC Chinese dataset under varying lighting conditions, including strong light, weak light, darkness, and other natural environments. Experimental results show that our method achieves competitive performance on the VBC dataset with an accuracy of 90.8% and an overall average accuracy of 93.8%.

27 pages, 4935 KiB  
Article
Diverse Dataset for Eyeglasses Detection: Extending the Flickr-Faces-HQ (FFHQ) Dataset
by Dalius Matuzevičius
Sensors 2024, 24(23), 7697; https://doi.org/10.3390/s24237697 - 1 Dec 2024
Viewed by 1404
Abstract
Facial analysis is an important area of research in computer vision and machine learning, with applications spanning security, healthcare, and user interaction systems. The data-centric AI approach emphasizes the importance of high-quality, diverse, and well-annotated datasets in driving advancements in this field. However, current facial datasets, such as Flickr-Faces-HQ (FFHQ), lack detailed annotations for detecting facial accessories, particularly eyeglasses. This work addresses this limitation by extending the FFHQ dataset with precise bounding box annotations for eyeglasses detection, enhancing its utility for data-centric AI applications. The extended dataset comprises 70,000 images, including over 16,000 images containing eyewear, and it exceeds the CelebAMask-HQ dataset in size and diversity. A semi-automated protocol was employed to efficiently generate accurate bounding box annotations, minimizing the demand for extensive manual labeling. This enriched dataset serves as a valuable resource for training and benchmarking eyewear detection models. Additionally, baseline benchmark results for eyeglasses detection were presented using deep learning methods, including YOLOv8 and MobileNetV3. The evaluation, conducted through cross-dataset validation, demonstrated the robustness of models trained on the extended FFHQ dataset, which outperformed models trained on the existing CelebAMask-HQ alternative. The extended dataset, which has been made publicly available, is expected to support future research and development in eyewear detection, contributing to advancements in facial analysis and related fields.

23 pages, 6670 KiB  
Article
DRBD-YOLOv8: A Lightweight and Efficient Anti-UAV Detection Model
by Panpan Jiang, Xiaohua Yang, Yaping Wan, Tiejun Zeng, Mingxing Nie and Zhenghai Liu
Sensors 2024, 24(22), 7148; https://doi.org/10.3390/s24227148 - 7 Nov 2024
Cited by 2 | Viewed by 1530
Abstract
Interest in anti-UAV detection systems has increased due to growing concerns about the security and privacy issues associated with unmanned aerial vehicles (UAVs). Achieving real-time detection with high accuracy while accommodating the limited resources of edge-computing devices poses a significant challenge for anti-UAV detection. Existing deep learning-based models for anti-UAV detection often cannot balance accuracy, processing speed, model size, and computational efficiency. To address these limitations, a lightweight and efficient anti-UAV detection model, DRBD-YOLOv8, is proposed in this paper. The model integrates several innovations, including the application of a Re-parameterization Cross-Stage Efficient Layered Attention Network (RCELAN) and a Bidirectional Feature Pyramid Network (BiFPN), to enhance feature processing capabilities while maintaining a lightweight design. Furthermore, DN-ShapeIoU, a novel loss function, has been established to enhance detection accuracy, and depthwise separable convolutions have been included to decrease computational complexity. The experimental results showed that the proposed model outperformed YOLOv8n in terms of mAP50, mAP95, precision, and FPS while reducing GFLOPs and parameter count. The DRBD-YOLOv8 model is almost half the size of the YOLOv8n model, measuring 3.25 M. Its small size, fast speed, and high accuracy make it an excellent choice for real-time anti-UAV detection on edge-computing devices.
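
Among the listed ingredients, the depthwise separable convolution is the standard complexity-reduction block and is easy to make concrete. The sketch below shows its generic form (a per-channel spatial convolution followed by a 1x1 pointwise convolution); the layer sizes and SiLU activation are assumptions, not DRBD-YOLOv8's exact layer.

```python
# Generic depthwise separable convolution: per-pixel multiply count drops from
# k*k*c_in*c_out (standard conv) to k*k*c_in + c_in*c_out.
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, c_in, c_out, k=3, stride=1):
        super().__init__()
        # Depthwise: one k x k filter per input channel (groups=c_in).
        self.dw = nn.Conv2d(c_in, c_in, k, stride, k // 2, groups=c_in, bias=False)
        # Pointwise: 1 x 1 conv mixes channels and sets the output width.
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

y = DepthwiseSeparableConv(64, 128)(torch.randn(1, 64, 80, 80))
```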

23 pages, 14242 KiB  
Article
EHNet: Efficient Hybrid Network with Dual Attention for Image Deblurring
by Quoc-Thien Ho, Minh-Thien Duong, Seongsoo Lee and Min-Cheol Hong
Sensors 2024, 24(20), 6545; https://doi.org/10.3390/s24206545 - 10 Oct 2024
Cited by 1 | Viewed by 1547
Abstract
The motion of an object or camera platform blurs the acquired image. This degradation is a major cause of poor-quality images from imaging sensors. Therefore, developing an efficient deep-learning-based image processing method to remove the blur artifact is desirable. Deep learning has recently demonstrated significant efficacy in image deblurring, primarily through convolutional neural networks (CNNs) and Transformers. However, the limited receptive fields of CNNs restrict their ability to capture long-range structural dependencies. In contrast, Transformers excel at modeling these dependencies, but they are computationally expensive for high-resolution inputs and lack the appropriate inductive bias. To overcome these challenges, we propose an Efficient Hybrid Network (EHNet) that employs CNN encoders for local feature extraction and Transformer decoders with a dual-attention module to capture spatial and channel-wise dependencies. This synergy facilitates the acquisition of rich contextual information for high-quality image deblurring. Additionally, we introduce the Simple Feature-Embedding Module (SFEM) to replace the pointwise and depthwise convolutions to generate simplified embedding features in the self-attention mechanism. This innovation substantially reduces computational complexity and memory usage while maintaining overall performance. Finally, through comprehensive experiments, our compact model yields promising quantitative and qualitative results for image deblurring on various benchmark datasets.

19 pages, 5950 KiB  
Article
Ancient Chinese Character Recognition with Improved Swin-Transformer and Flexible Data Enhancement Strategies
by Yi Zheng, Yi Chen, Xianbo Wang, Donglian Qi and Yunfeng Yan
Sensors 2024, 24(7), 2182; https://doi.org/10.3390/s24072182 - 28 Mar 2024
Cited by 3 | Viewed by 2683
Abstract
The decipherment of ancient Chinese scripts, such as oracle bone and bronze inscriptions, holds immense significance for understanding ancient Chinese history, culture, and civilization. Despite substantial progress in recognizing oracle bone script, research on the overall recognition of ancient Chinese characters remains somewhat lacking. To tackle this issue, we pioneered the construction of a large-scale image dataset comprising 9233 distinct ancient Chinese characters sourced from images obtained through archaeological excavations. We propose the first model for recognizing common ancient Chinese characters. This model consists of four stages with Linear Embedding and Swin-Transformer blocks, each supplemented by a CoT Block to enhance local feature extraction. We also advocate for an enhancement strategy, which involves two steps: first, conducting adaptive data enhancement on the original data, and second, randomly resampling the data. The experimental results, with a top-one accuracy of 87.25% and a top-five accuracy of 95.81%, demonstrate that our proposed method achieves remarkable performance. Furthermore, by visualizing the model's attention, it can be observed that the proposed model, trained on a large number of images, is able to capture the morphological characteristics of ancient Chinese characters to a certain extent.

17 pages, 6501 KiB  
Article
Unsupervised Conditional Diffusion Models in Video Anomaly Detection for Monitoring Dust Pollution
by Limin Cai, Mofei Li and Dianpeng Wang
Sensors 2024, 24(5), 1464; https://doi.org/10.3390/s24051464 - 23 Feb 2024
Viewed by 1739
Abstract
Video surveillance is widely used in monitoring environmental pollution, particularly harmful dust. Currently, manual video monitoring remains the predominant method for analyzing potential pollution, which is inefficient and prone to errors. In this paper, we introduce a new unsupervised method based on latent diffusion models. Specifically, we propose a spatio-temporal network structure, which better integrates the spatial and temporal features of videos. Our conditional guidance mechanism samples frames of input videos to guide high-quality generation and obtains frame-level anomaly scores by comparing generated videos with original ones. We also propose an efficient compression strategy to reduce computational costs, allowing the model to perform in a latent space. The superiority of our method over previous SOTA methods was demonstrated by numerical experiments on three public benchmarks and by practical application analysis in coal mining, with AUC improvements of up to 3%. Our method accurately detects abnormal patterns in multiple challenging environmental monitoring scenarios, illustrating potential applications in the environmental protection domain and beyond.
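
The frame-level scoring step (comparing generated frames against the originals) is simple enough to sketch. Below is a minimal, hedged version in which per-frame reconstruction error stands in for the paper's score; the conditional latent diffusion model itself is mocked here with a noisy copy of the input.

```python
# Frame-level anomaly scoring sketch: per-frame error between the original
# video and its model reconstruction, normalized to [0, 1]. The diffusion
# model is mocked; this is not the paper's implementation.
import numpy as np

def frame_anomaly_scores(original, generated):
    """original, generated: arrays of shape (T, H, W, C); one score per frame."""
    err = ((original - generated) ** 2).mean(axis=(1, 2, 3))
    return (err - err.min()) / (err.max() - err.min() + 1e-8)

video = np.random.rand(16, 64, 64, 3)
recon = video + 0.05 * np.random.rand(16, 64, 64, 3)  # stand-in for model output
scores = frame_anomaly_scores(video, recon)           # high score = likely anomaly
```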

35 pages, 29002 KiB  
Article
Characterization of Partial Discharges in Dielectric Oils Using High-Resolution CMOS Image Sensor and Convolutional Neural Networks
by José Miguel Monzón-Verona, Pablo González-Domínguez and Santiago García-Alonso
Sensors 2024, 24(4), 1317; https://doi.org/10.3390/s24041317 - 18 Feb 2024
Cited by 5 | Viewed by 1704
Abstract
In this work, an exhaustive analysis of the partial discharges that originate in the bubbles present in dielectric mineral oils is carried out. To achieve this, a low-cost, high-resolution CMOS image sensor is used. Partial discharge measurements using that image sensor are validated by a standard electrical detection system that uses a discharge capacitor. In order to accurately identify the images corresponding to partial discharges, a convolutional neural network is trained using a large set of images captured by the image sensor. An image classification model is also developed using deep learning with a convolutional network based on a TensorFlow and Keras model. The classification results of the experiments show that the accuracy achieved by our model is around 95% on the validation set and 82% on the test set. As a result of this work, a non-destructive diagnosis method has been developed that is based on the use of an image sensor and the design of a convolutional neural network. This approach allows us to obtain information about the state of mineral oils before breakdown occurs, providing a valuable tool for the evaluation and maintenance of these dielectric oils.
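
Since the abstract names a TensorFlow/Keras convolutional classifier, a minimal sketch of that kind of model may be useful; the architecture, input resolution, and training settings below are assumptions for illustration, not the authors' network.

```python
# Hedged sketch of a small Keras binary classifier for discharge images.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),               # grayscale sensor image
    layers.Conv2D(16, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),           # discharge / no discharge
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_images, train_labels, validation_data=val_data, epochs=20)
```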

21 pages, 18333 KiB  
Article
Parsing Netlists of Integrated Circuits from Images via Graph Attention Network
by Wenxing Hu, Xianke Zhan and Minglei Tong
Sensors 2024, 24(1), 227; https://doi.org/10.3390/s24010227 - 30 Dec 2023
Cited by 1 | Viewed by 2311
Abstract
A massive number of paper documents that include important information such as circuit schematics can be converted into digital documents by optical sensors like scanners or digital cameras. However, extracting the netlists of analog circuits from digital documents is an exceptionally challenging task. This process aids enterprises in digitizing paper-based circuit diagrams, enabling the reuse of analog circuit designs and the automatic generation of datasets required for intelligent design models in this domain. This paper introduces a bottom-up graph encoding model aimed at automatically parsing the circuit topology of analog integrated circuits from images. The model comprises an improved electronic component detection network based on the Swin Transformer, an algorithm for component port localization, and a graph encoding model. The objective of the detection network is to accurately identify component positions and types, followed by automatic dataset generation through port localization, and finally, utilizing the graph encoding model to predict potential connections between circuit components. To validate the model’s performance, we annotated an electronic component detection dataset and a circuit diagram dataset, comprising 1200 and 3552 training samples, respectively. Detailed experimentation results demonstrate the superiority of our proposed enhanced algorithm over comparative algorithms across custom and public datasets. Furthermore, our proposed port localization algorithm significantly accelerates the annotation speed of circuit diagram datasets.
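
To make the graph-encoding step concrete, here is a minimal single-head graph-attention layer in plain PyTorch: detected components are nodes, candidate connections form an adjacency mask, and attention scores weight neighbor features. This is the generic GAT formulation, not the paper's exact network.

```python
# Minimal single-head graph attention layer (generic GAT, dense adjacency).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, f_in, f_out):
        super().__init__()
        self.W = nn.Linear(f_in, f_out, bias=False)  # shared node projection
        self.a = nn.Linear(2 * f_out, 1, bias=False) # attention scorer

    def forward(self, h, adj):
        # h: (N, f_in) component features; adj: (N, N) candidate-connection mask
        z = self.W(h)
        n = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(-1, n, -1),
                           z.unsqueeze(0).expand(n, -1, -1)], dim=-1)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))   # (N, N) raw pair scores
        e = e.masked_fill(adj == 0, float("-inf"))    # keep only candidates
        alpha = torch.softmax(e, dim=-1)              # attention over neighbors
        return alpha @ z                              # aggregated embeddings

h = torch.randn(5, 16)        # 5 detected components
adj = torch.ones(5, 5)        # fully connected candidate graph
out = GATLayer(16, 8)(h, adj)
```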

21 pages, 8353 KiB  
Article
Velocity and Color Estimation Using Event-Based Clustering
by Xavier Lesage, Rosalie Tran, Stéphane Mancini and Laurent Fesquet
Sensors 2023, 23(24), 9768; https://doi.org/10.3390/s23249768 - 11 Dec 2023
Cited by 1 | Viewed by 1456
Abstract
Event-based clustering provides a low-power embedded solution for low-level feature extraction in a scene. The algorithm utilizes the non-uniform sampling capability of event-based image sensors to measure local intensity variations within a scene. Consequently, the clustering algorithm forms similar event groups while simultaneously estimating their attributes. This work proposes taking advantage of additional event information in order to provide new attributes for further processing. We elaborate on the estimation of the object velocity using the mean motion of the cluster. Next, we examine a novel form of events, which includes an intensity measurement of the color at the concerned pixel. These events may be processed to estimate the rough color of a cluster, or the color distribution in a cluster. Lastly, this paper presents some applications that utilize these features. The resulting algorithms were applied and exercised using a custom event-based simulator, which generates videos of outdoor scenes. The velocity estimation methods provide satisfactory results with a trade-off between accuracy and convergence speed. Regarding color estimation, the luminance estimation is challenging in the test cases, while the chrominance is precisely estimated. The estimated quantities are adequate for accurately classifying objects into predefined categories.
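
A compact way to picture velocity-from-mean-motion is an exponential moving average over event positions per cluster, with velocity estimated from centroid displacement between event timestamps. The sketch below is our reading of that idea (the smoothing scheme is an assumption), not the paper's algorithm.

```python
# Sketch of cluster velocity estimation from event streams: each event nudges
# its cluster centroid (EMA), and velocity is the smoothed centroid motion.
class EventCluster:
    def __init__(self, x, y, t, alpha=0.1):
        self.x, self.y, self.t = x, y, t   # centroid and last update time
        self.vx = self.vy = 0.0            # estimated velocity (px/s)
        self.alpha = alpha                 # EMA smoothing factor

    def update(self, x, y, t):
        dt = max(t - self.t, 1e-9)
        nx = self.x + self.alpha * (x - self.x)  # EMA centroid update
        ny = self.y + self.alpha * (y - self.y)
        # Blend instantaneous centroid motion into the velocity estimate.
        self.vx += self.alpha * ((nx - self.x) / dt - self.vx)
        self.vy += self.alpha * ((ny - self.y) / dt - self.vy)
        self.x, self.y, self.t = nx, ny, t

c = EventCluster(10.0, 10.0, 0.0)
for i in range(1, 100):                      # events from an object moving right
    c.update(10.0 + 0.5 * i, 10.0, 0.01 * i) # true motion: 50 px/s along x
print(round(c.vx, 1), round(c.vy, 1))        # converges toward (50.0, 0.0)
```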

15 pages, 7876 KiB  
Article
Detection of AI-Created Images Using Pixel-Wise Feature Extraction and Convolutional Neural Networks
by Fernando Martin-Rodriguez, Rocio Garcia-Mojon and Monica Fernandez-Barciela
Sensors 2023, 23(22), 9037; https://doi.org/10.3390/s23229037 - 8 Nov 2023
Cited by 5 | Viewed by 7488
Abstract
Generative AI has gained enormous interest nowadays due to new applications like ChatGPT, DALL·E, Stable Diffusion, and Deepfake. In particular, DALL·E, Stable Diffusion, and others (Adobe Firefly, ImagineArt, etc.) can create images from a text prompt and are even able to create photorealistic images. Due to this fact, intense research has been performed to create new image forensics applications able to distinguish between real captured images and videos and artificial ones. Detecting forgeries made with Deepfake is one of the most researched issues. This paper is about another kind of forgery detection. The purpose of this research is to detect photorealistic AI-created images versus real photos coming from a physical camera, that is, making a binary decision over an image: whether it is artificially or naturally created. Artificial images do not need to try to represent any real object, person, or place. For this purpose, techniques that perform a pixel-level feature extraction are used. The first one is Photo Response Non-Uniformity (PRNU). PRNU is a characteristic noise pattern, caused by imperfections in the camera sensor, that is used for source camera identification. The underlying idea is that AI images will have a different PRNU pattern. The second one is error level analysis (ELA). This is another type of feature extraction traditionally used for detecting image editing. ELA is used nowadays by photographers for the manual detection of AI-created images. Both kinds of features are used to train convolutional neural networks to differentiate between AI images and real photographs. Good results are obtained, achieving accuracy rates of over 95%. Both extraction methods are carefully assessed by computing precision/recall and F1-score measurements.
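
Of the two features, ELA is the easier to show in code: recompress the image as JPEG and amplify the difference, since edited or synthetic regions tend to exhibit inconsistent error levels. The sketch below is the standard formulation using Pillow; the recompression quality of 90 is an assumption, not necessarily the authors' setting.

```python
# Standard error level analysis (ELA) sketch with Pillow.
from io import BytesIO
from PIL import Image, ImageChops

def ela(image_path, quality=90):
    """Recompress as JPEG and return the amplified difference image."""
    original = Image.open(image_path).convert("RGB")
    buf = BytesIO()
    original.save(buf, "JPEG", quality=quality)      # lossy round trip
    recompressed = Image.open(buf)
    diff = ImageChops.difference(original, recompressed)
    extrema = diff.getextrema()                      # per-channel (min, max)
    peak = max(px_max for _, px_max in extrema) or 1
    return diff.point(lambda p: p * 255 / peak)      # stretch for visibility

# ela("photo.jpg").save("photo_ela.png")  # bright areas: inconsistent error levels
```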

16 pages, 5659 KiB  
Article
Research on Fine-Grained Image Recognition of Birds Based on Improved YOLOv5
by Xiaomei Yi, Cheng Qian, Peng Wu, Brian Tapiwanashe Maponde, Tengteng Jiang and Wenying Ge
Sensors 2023, 23(19), 8204; https://doi.org/10.3390/s23198204 - 30 Sep 2023
Cited by 8 | Viewed by 2590
Abstract
Birds play a vital role in maintaining biodiversity. Accurate identification of bird species is essential for conducting biodiversity surveys. However, fine-grained image recognition of birds encounters challenges due to large within-class differences and small inter-class differences. To solve this problem, our study took a part-based approach, dividing the identification task into two parts: part detection and identification classification. We propose an improved bird part detection algorithm based on YOLOv5, which can handle partial overlap between part objects and complex environmental conditions. The backbone network incorporates the Res2Net-CBAM module to enhance the receptive fields of each network layer, strengthen the channel characteristics, and improve the sensitivity of the model to important information. Additionally, to boost feature extraction and channel self-regulation, we integrated CBAM attention mechanisms into the neck. According to experimental findings, the accuracy of our proposed model is 86.6%, 1.2% higher than that of the original model. Furthermore, when compared with other algorithms, our model's accuracy shows noticeable improvement. These results demonstrate how useful the proposed method is for quickly and precisely recognizing different bird species.
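
CBAM, the attention block inserted into the backbone and neck here, has a well-known generic form: channel attention computed from pooled descriptors, followed by spatial attention computed from channel-wise statistics. The PyTorch sketch below shows that generic form (the reduction ratio and kernel size are conventional defaults), not the authors' exact integration into YOLOv5.

```python
# Generic CBAM block: channel attention, then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, spatial_kernel=7):
        super().__init__()
        self.mlp = nn.Sequential(                     # shared channel MLP
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel, padding=spatial_kernel // 2)

    def forward(self, x):
        # Channel attention from avg- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True))
                           + self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        # Spatial attention from channel-wise avg and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

y = CBAM(64)(torch.randn(1, 64, 40, 40))  # drop-in: output shape equals input
```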

13 pages, 300 KiB  
Article
Display-Semantic Transformer for Scene Text Recognition
by Xinqi Yang, Wushour Silamu, Miaomiao Xu and Yanbing Li
Sensors 2023, 23(19), 8159; https://doi.org/10.3390/s23198159 - 28 Sep 2023
Cited by 1 | Viewed by 1983
Abstract
Linguistic knowledge helps a lot in scene text recognition by providing semantic information to refine the character sequence. The visual model only focuses on the visual texture of characters without actively learning linguistic information, which leads to poor model recognition rates for some noisy (distorted, blurry, etc.) images. In order to address the aforementioned issues, this study builds upon the most recent findings on the Vision Transformer, and our approach (called Display-Semantic Transformer, or DST for short) constructs a masked language model and a semantic visual interaction module. The model can mine deep semantic information from images to assist scene text recognition and improve the robustness of the model. The semantic visual interaction module can better realize the interaction between semantic information and visual features. In this way, the visual features can be enhanced by the semantic information so that the model can achieve a better recognition effect. The experimental results show that our model improves the average recognition accuracy on six benchmark test sets by nearly 2% compared to the baseline. Our model retains the benefit of a small number of parameters and allows for fast inference. Additionally, it attains a better balance between accuracy and speed.
