Search Results (1,354)

Search Parameters:
Keywords = construction image classification

19 pages, 14266 KiB  
Article
Predictive Capability Evaluation of Micrograph-Driven Deep Learning for Ti6Al4V Alloy Tensile Strength Under Varied Preprocessing Strategies
by Yuqi Xiong and Wei Duan
Metals 2025, 15(6), 586; https://doi.org/10.3390/met15060586 - 24 May 2025
Viewed by 199
Abstract
The purpose of this study is to develop a micrograph-driven model for Ti6Al4V mechanical property prediction through integrated image preprocessing and deep learning, reducing the reliance on manually extracted features and process parameters. This paper systematically evaluates the capability of a CNN model using preprocessed micrographs to predict Ti6Al4V alloy ultimate tensile strength (UTS), while analyzing how different preprocessing combinations influence model performance. A total of 180 micrographs were selected from published literature to construct the dataset. After applying image standardization (grayscale transformation, resizing, and normalization) and image enhancement, a pre-trained ResNet34 model was employed with transfer learning to conduct strength grade classification (low, medium, high) and UTS regression. The results demonstrated that on highly heterogeneous micrograph datasets, the model exhibited moderate classification capability (maximum accuracy = 65.60% ± 1.22%) but negligible UTS regression capability (highest R² = 0.163 ± 0.020). Fine-tuning on subsets with consistent forming processes improved regression performance (highest R² = 0.360 ± 1.47 × 10⁻⁵), outperforming traditional predictive models (highest R² = 0.148). The classification model was insensitive to normalization methods, while min–max normalization with center-cropping showed optimal standardization for regression (R² = 0.111 ± 0.017). Gamma correction maximized classification accuracy, whereas histogram equalization achieved the highest improvement for regression. Full article
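
As a rough illustration of the transfer-learning setup described above, the sketch below loads an ImageNet-pretrained ResNet34 in PyTorch and swaps its head for three strength grades; the preprocessing chain and hyperparameters are assumptions, not the authors' exact values.

```python
# Hypothetical sketch of a ResNet34 transfer-learning setup for strength
# grading of micrographs. Normalization statistics and learning rate are
# assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # grayscale micrographs -> 3-channel input
    transforms.Resize((224, 224)),                # standardize size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),  # assumed statistics
])

model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 3)  # low / medium / high UTS grade

# For the UTS regression variant, a single continuous output instead:
# model.fc = nn.Linear(model.fc.in_features, 1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```
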
28 pages, 3777 KiB  
Article
Multisensor Fault Diagnosis of Rolling Bearing with Noisy Unbalanced Data via Intuitionistic Fuzzy Weighted Least Squares Twin Support Higher-Order Tensor Machine
by Shengli Dong, Yifang Zhang and Shengzheng Wang
Machines 2025, 13(6), 445; https://doi.org/10.3390/machines13060445 - 22 May 2025
Viewed by 115
Abstract
Aiming at the limitations of existing multisensor fault diagnosis methods for rolling bearings in real industrial scenarios, this paper proposes an innovative intuitionistic fuzzy weighted least squares twin support higher-order tensor machine (IFW-LSTSHTM) model, which achieves breakthroughs in noise robustness, adaptability to working conditions, and class-imbalance handling. First, a multimodal feature tensor is constructed: the Fourier synchro-squeezed transform converts the multisensor time-domain signals into time–frequency images, and the tensor is then reconstructed to retain the three-dimensional structural information of the sensor coupling relationships and time–frequency features. The nonlinear feature mapping strategy combined with Tucker decomposition effectively maintains the high-order correlation of the feature tensor. Second, an adaptive sample-weighting mechanism is developed: an intuitionistic fuzzy membership score assignment scheme with global–local information fusion is proposed. At the global level, class contribution is assessed from the relative position of samples to the classification boundary; at the local level, the topological structure of the sample distribution is captured by K-nearest-neighbor analysis. This mechanism significantly improves the recognition of noisy samples and the handling of class-imbalanced data. Finally, a dual-hyperplane classifier is constructed in tensor space: a structural risk regularization term enhances the model's generalization ability, and a dynamic penalty factor assigns adaptive weights to different categories. The nonparallel hyperplane optimization is converted into a system of linear equations solved by matrix operations, improving computational efficiency. Extensive experimental results on two rolling bearing datasets verify that the proposed method outperforms existing solutions in diagnostic accuracy and stability. Full article
(This article belongs to the Section Machines Testing and Maintenance)
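
The global–local weighting idea can be sketched in a few lines; the snippet below is a loose interpretation using distance-to-centroid for the global membership and K-nearest-neighbor label disagreement for the local non-membership. The exact intuitionistic fuzzy formulas in the paper differ, so treat this as an assumption-laden illustration.

```python
# Loose sketch of global-local intuitionistic fuzzy sample weighting.
# The paper's actual membership/non-membership definitions differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def fuzzy_weights(X, y, k=5):
    n = len(y)
    # Global term: closeness to the own-class centroid (membership mu).
    mu = np.empty(n)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        d = np.linalg.norm(X[idx] - X[idx].mean(axis=0), axis=1)
        mu[idx] = 1.0 - d / (d.max() + 1e-12)
    # Local term: fraction of k nearest neighbors with a different label
    # (non-membership nu), flagging likely noise or class overlap.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, nbrs = nn.kneighbors(X)
    nu = np.array([(y[nbrs[i, 1:]] != y[i]).mean() for i in range(n)])
    # Intuitionistic fuzzy score: high membership, low non-membership.
    return mu * (1.0 - nu)
```
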
25 pages, 33376 KiB  
Article
Spatial-Spectral Linear Extrapolation for Cross-Scene Hyperspectral Image Classification
by Lianlei Lin, Hanqing Zhao, Sheng Gao, Junkai Wang and Zongwei Zhang
Remote Sens. 2025, 17(11), 1816; https://doi.org/10.3390/rs17111816 - 22 May 2025
Viewed by 118
Abstract
In realistic hyperspectral image (HSI) cross-scene classification tasks, it is rarely feasible to obtain target domain samples during the training phase. Therefore, a model needs to be trained on one or more source domains (SD) and achieve robust domain generalization (DG) performance on an unknown target domain (TD). Popular DG strategies constrain the model’s predictive behavior in synthetic space through deep, nonlinear source expansion, and an HSI generation model is usually adopted to enrich the diversity of training samples. However, recent studies have shown that the activation functions of neurons in a network exhibit asymmetry for different categories, which results in the learning of task-irrelevant features while attempting to learn task-related features (called “feature contamination”). For example, even if some intrinsic features of HSIs (lighting conditions, atmospheric environment, etc.) are irrelevant to the label, the neural network still tends to learn them, yielding features that tie the classification to these spurious components. To alleviate this problem, this study replaces the common nonlinear generative network with a specific linear projection transformation, reducing the number of neurons activated nonlinearly during training and thus the learning of contaminated features. Specifically, this study proposes a dimensionally decoupled spatial-spectral linear extrapolation (SSLE) strategy to achieve sample augmentation. Inspired by the weakening effect of water vapor absorption and Rayleigh scattering on band reflectivity, we simulate a common spectral drift based on Markov random fields to achieve linear spectral augmentation. Further considering the common co-occurrence phenomenon of patch images in space, we design spatial weights combined with the label determinism of the center pixel to construct linear spatial enhancement. Finally, to ensure the cognitive unity of the discriminator’s high-level features in the sample space, we use inter-class contrastive learning to align the back-end feature representation. Extensive experiments were conducted on four datasets; an ablation study showed the effectiveness of the proposed modules, and a comparative analysis with advanced DG algorithms showed the superiority of our model in the face of various spectral and category shifts. In particular, on the Houston18/Shanghai datasets, its overall accuracy was 0.51%/0.83% higher than the best results of the other methods, and its Kappa coefficient was 0.78%/2.07% higher, respectively. Full article
(This article belongs to the Section Remote Sensing Image Processing)
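
A minimal sketch of what a linear, Markov-style spectral drift augmentation could look like, assuming a random-walk multiplicative gain along the band axis; the step size and clipping range are invented parameters, and the paper's MRF-based formulation is more elaborate.

```python
# Sketch of linear spectral augmentation: a band-wise multiplicative drift
# generated by a first-order random walk, mimicking smooth atmospheric
# attenuation. Purely linear, so no nonlinear neurons are involved.
import numpy as np

def spectral_drift(cube, step=0.01, clip=(0.8, 1.2), seed=None):
    """cube: (H, W, B) hyperspectral patch; returns a linearly drifted copy."""
    rng = np.random.default_rng(seed)
    bands = cube.shape[-1]
    # Random walk along the spectral axis -> spatially constant,
    # spectrally smooth gain curve.
    gain = 1.0 + np.cumsum(rng.normal(0.0, step, bands))
    gain = np.clip(gain, *clip)
    return cube * gain
```
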
18 pages, 1692 KiB  
Article
Multiple-Feature Construction for Image Segmentation Based on Genetic Programming
by David Herrera-Sánchez, José-Antonio Fuentes-Tomás, Héctor-Gabriel Acosta-Mesa, Efrén Mezura-Montes and José-Luis Morales-Reyes
Math. Comput. Appl. 2025, 30(3), 57; https://doi.org/10.3390/mca30030057 - 21 May 2025
Viewed by 73
Abstract
Within the medical field, computer vision plays an important role in tasks such as health anomaly detection, diagnosis, treatment, and the monitoring of medical conditions. Image segmentation is one of the most widely used techniques for identifying regions of interest in different organs. However, performing accurate segmentation is difficult due to image variations. This work therefore proposes an automated multiple-feature construction approach for image segmentation, working with magnetic resonance images, computed tomography, and RGB digital images. Genetic programming is used to automatically create and construct pipelines that extract meaningful features for segmentation tasks. Additionally, a co-evolution strategy is proposed within the evolution process to increase diversity without affecting segmentation performance. Segmentation is addressed as a pixel classification task under a wrapper approach, where the classification model’s segmentation performance determines the fitness. To validate the effectiveness of the proposed method, four datasets were used to measure its capability to deal with different types of medical images. The results demonstrate that the proposal achieves Dice similarity coefficient values above 0.6 on MRI and CT images. The proposal is also compared with state-of-the-art GP-based methods and convolutional neural networks used in the medical field, outperforming them with improvements greater than 20% in Dice, specificity, and sensitivity. Qualitative results further demonstrate that the proposal accurately identifies the region of interest. Full article
(This article belongs to the Special Issue Feature Papers in Mathematical and Computational Applications 2025)
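
To make the wrapper fitness concrete, here is a hedged sketch in which a fixed filter pipeline stands in for an evolved GP individual, and the Dice coefficient of a pixel classifier trained on the constructed features serves as fitness; the skimage/scikit-learn choices are illustrative, and a real evaluation would score held-out pixels rather than the training set.

```python
# Minimal wrapper-fitness sketch: a candidate individual is treated as a
# feature-construction pipeline; segmentation quality of a pixel classifier
# trained on those features is the fitness. The Gaussian/Sobel pipeline is
# a stand-in for an evolved GP program.
import numpy as np
from skimage.filters import gaussian, sobel
from sklearn.tree import DecisionTreeClassifier

def dice(pred, truth):
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum() + 1e-12)

def fitness(image, mask):
    # Stack constructed features per pixel (stand-in for a GP-built pipeline).
    feats = np.stack([image, gaussian(image, sigma=2), sobel(image)], axis=-1)
    X, y = feats.reshape(-1, feats.shape[-1]), mask.ravel()
    clf = DecisionTreeClassifier(max_depth=8).fit(X, y)
    pred = clf.predict(X).reshape(mask.shape).astype(bool)
    return dice(pred, mask.astype(bool))
```
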
24 pages, 1736 KiB  
Article
ProFusion: Multimodal Prototypical Networks for Few-Shot Learning with Feature Fusion
by Jia Zhao, Ziyang Cao, Huiling Wang, Xu Wang and Yingzhou Chen
Symmetry 2025, 17(5), 796; https://doi.org/10.3390/sym17050796 - 20 May 2025
Viewed by 105
Abstract
Existing few-shot learning models leverage vision-language pre-trained models to alleviate the data scarcity problem. However, such models usually process visual and text information separately, which leaves inherent disparities between cross-modal features. Therefore, we propose the ProFusion model, which leverages multimodal pre-trained models and prototypical networks to construct multiple prototypes. Specifically, ProFusion generates image and text prototypes symmetrically using the visual encoder and text encoder, while integrating visual and text information through a fusion module to create more expressive multimodal feature-fusion prototypes. Additionally, we introduce an alignment module to ensure consistency between image and text prototypes. During inference, ProFusion calculates the similarity of test images to the three types of prototypes separately and applies a weighted sum to generate the final prediction. Experiments demonstrate that ProFusion achieves outstanding classification performance on 15 benchmark datasets. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Computer Vision and Graphics)
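
The weighted-sum inference step can be written compactly; the sketch below assumes L2-normalized embeddings and placeholder weights rather than the paper's tuned values.

```python
# Sketch of ProFusion-style inference: cosine similarity of a test embedding
# to image, text, and fused prototypes, combined by a weighted sum.
import torch
import torch.nn.functional as F

def profusion_logits(img_emb, proto_img, proto_txt, proto_fused,
                     w=(0.4, 0.3, 0.3)):
    """img_emb: (D,) test embedding; each proto_*: (num_classes, D)."""
    z = F.normalize(img_emb, dim=-1)
    sims = [F.normalize(p, dim=-1) @ z
            for p in (proto_img, proto_txt, proto_fused)]
    return sum(wi * s for wi, s in zip(w, sims))  # (num_classes,) logits
```
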
26 pages, 10932 KiB  
Article
A Smartphone-Based Non-Destructive Multimodal Deep Learning Approach Using pH-Sensitive Pitaya Peel Films for Real-Time Fish Freshness Detection
by Yixuan Pan, Yujie Wang, Yuzhe Zhou, Jiacheng Zhou, Manxi Chen, Dongling Liu, Feier Li, Can Liu, Mingwan Zeng, Dongjing Jiang, Xiangyang Yuan and Hejun Wu
Foods 2025, 14(10), 1805; https://doi.org/10.3390/foods14101805 - 19 May 2025
Viewed by 215
Abstract
The detection of fish freshness is crucial for ensuring food safety. This study addresses the limitations of traditional detection methods, which rely on laboratory equipment and complex procedures, by proposing a smartphone-based detection method, termed FreshFusionNet, that utilizes a pitaya peel pH intelligent indicator film in conjunction with multimodal deep learning. The pitaya peel indicator film, prepared using high-pressure homogenization technology, demonstrates a significant color change from dark red to yellow in response to the volatile alkaline substances released during fish spoilage. To construct a multimodal dataset, 3600 images of the indicator film were captured using a smartphone under various conditions (natural light and indoor light) and from multiple angles (0° to 120°), while simultaneously recording pH values, total volatile basic nitrogen (TVB-N), and total viable count (TVC) data. Based on the lightweight MobileNetV2 network, a Multi-scale Dilated Fusion Attention module (MDFA) was designed to enhance the robustness of color feature extraction. A Temporal Convolutional Network (TCN) was then used to model dynamic patterns in chemical indicators across spoilage stages, combined with a Context-Aware Gated Fusion (CAG-Fusion) mechanism to adaptively integrate image and chemical temporal features. Experimental results indicate that the overall classification accuracy of FreshFusionNet reaches 99.61%, with a single inference time of only 142 ± 40 milliseconds (tested on Xiaomi 14). This method eliminates the need for professional equipment and enables real-time, non-destructive detection of fish spoilage through smartphones, providing consumers and the food supply chain with a low-cost, portable quality-monitoring tool, thereby promoting the intelligent and universal development of food safety detection technology. Full article
(This article belongs to the Special Issue Development and Application of Biosensors in the Food Field)
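
A possible shape for the context-aware gated fusion of image and chemical-sequence features, with a sigmoid gate over projected representations; the dimensions (1280 for MobileNetV2's pooled features) and the per-dimension gating are assumptions about the configuration, not the published CAG-Fusion module.

```python
# Hedged sketch of gated image/chemical feature fusion: a learned sigmoid
# gate decides, per dimension, how much of each modality enters the joint
# representation. Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, img_dim=1280, chem_dim=64, out_dim=256):
        super().__init__()
        self.proj_img = nn.Linear(img_dim, out_dim)
        self.proj_chem = nn.Linear(chem_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(2 * out_dim, out_dim), nn.Sigmoid())

    def forward(self, img_feat, chem_feat):
        a, b = self.proj_img(img_feat), self.proj_chem(chem_feat)
        g = self.gate(torch.cat([a, b], dim=-1))  # context-aware per-dim weights
        return g * a + (1.0 - g) * b
```
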
21 pages, 4104 KiB  
Article
Linkage Analysis Between Coastline Change and Both Sides of Coastal Ecological Spaces
by Xianchuang Fan, Chao Zhou, Tiejun Cui, Tong Wu, Qian Zhao and Mingming Jia
Water 2025, 17(10), 1505; https://doi.org/10.3390/w17101505 - 16 May 2025
Viewed by 120
Abstract
As the front line of the marine economy, the coastal zone is a complex and active ecosystem and an important resource breeding area. However, in the course of economic development, coastal zone resources have been severely exploited, leading to fragile ecology and frequent natural disasters. It is therefore imperative to analyze coastline changes and their correlation with coastal ecological space. Utilizing long time-series high-resolution remote sensing images, Google Earth images, and unmanned aerial vehicle (UAV) remote sensing monitoring data of key sea areas, this study selected the coastal zone of Ningbo City as the research area. Remote sensing interpretation databases for the coastline and typical coastal ecological spaces were established, and coastline extraction was completed by visual interpretation. Using the Modified Normalized Difference Water Index (MNDWI), the Normalized Difference Vegetation Index (NDVI), and maximum likelihood classification, a hierarchical classification process combined with visual interpretation was constructed to extract long time-series coastal ecological space information. The changes in, and the linkage between, coastlines and coastal ecological spaces were then analyzed. The results show that the extraction accuracy of ground objects based on the hierarchical classification process is high, and verification improves with the help of UAV remote sensing monitoring. Long time-series change monitoring revealed significant change in the transportation-related coastline. Changes in ecological spaces such as industrial zones, urban construction, agricultural flood wetlands, and irrigation land dominated the change in artificial shorelines, while the spread of Spartina alterniflora dominated the change in biological coastlines. Ecological space far from the coastline, on both the land and sea sides, had little influence on the coastline. The research shows that correlation analysis between the coastline and coastal ecological space provides a new perspective for coastal zone research and can support coastal zone protection, dynamic supervision, administration, and scientific research. Full article
(This article belongs to the Special Issue Advanced Remote Sensing for Coastal System Monitoring and Management)
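
The two spectral indices driving the hierarchical classification are standard formulas; assuming float reflectance arrays for the relevant bands (band choice depends on the sensor), they reduce to:

```python
# Standard water and vegetation indices used in the hierarchical
# classification. Inputs are per-band reflectance arrays of equal shape.
import numpy as np

def mndwi(green, swir):
    return (green - swir) / (green + swir + 1e-12)

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-12)

# A typical first-level split: water where MNDWI > 0, vegetation where
# NDVI > 0.3. These thresholds are scene-dependent assumptions that the
# study refines with visual interpretation.
```
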
26 pages, 13657 KiB  
Article
Multilevel Feature Cross-Fusion-Based High-Resolution Remote Sensing Wetland Landscape Classification and Landscape Pattern Evolution Analysis
by Sijia Sun, Biao Wang, Zhenghao Jiang, Ziyan Li, Sheng Xu, Chengrong Pan, Jun Qin, Yanlan Wu and Peng Zhang
Remote Sens. 2025, 17(10), 1740; https://doi.org/10.3390/rs17101740 - 16 May 2025
Viewed by 99
Abstract
Analyzing wetland landscape pattern evolution is crucial for managing wetland resources. High-resolution remote sensing serves as a primary method for monitoring wetland landscape patterns. However, the complex landscape types and spatial structures of wetlands pose challenges, including interclass similarity and intraclass spatial heterogeneity, leading to the low separability of landscapes and difficulties in identifying fragmented and small objects. To address these issues, this study proposes the multilevel feature cross-fusion wetland landscape classification network (MFCFNet), which combines the global modeling capability of the Swin Transformer with the local detail-capturing ability of convolutional neural networks (CNNs), helping the network discern intraclass consistency and interclass differences. To alleviate the semantic confusion caused when different-level features with semantic gaps are fused, we introduce a deep–shallow feature cross-fusion (DSFCF) module between the encoder and the decoder, and incorporate a global–local attention block (GLAB) to aggregate global contextual information and local detail. The constructed Shengjin Lake Wetland Gaofen Image Dataset (SLWGID) is used to evaluate the performance of MFCFNet, which achieves an OA, mIoU, and F1 score of 93.23%, 78.12%, and 87.05%, respectively. MFCFNet is then used to classify the wetland landscape of Shengjin Lake from 2013 to 2023, and a landscape pattern evolution analysis is conducted, focusing on landscape transitions, area changes, and pattern characteristic variations. The method proves effective for the dynamic monitoring of wetland landscape patterns, providing valuable insights for wetland conservation. Full article
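
One plausible reading of deep–shallow cross-fusion is sketched below: each level gates the other before a convolutional merge, so semantic and detail features temper one another rather than being naively concatenated. This is an assumption about the DSFCF idea, not the published module.

```python
# Illustrative deep-shallow cross-fusion: the deep (semantic) map gates the
# shallow (detail) map and vice versa before concatenation and merging.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossFusion(nn.Module):
    def __init__(self, deep_ch, shallow_ch, out_ch):
        super().__init__()
        self.att_deep = nn.Conv2d(deep_ch, shallow_ch, 1)     # deep -> gate for shallow
        self.att_shallow = nn.Conv2d(shallow_ch, deep_ch, 1)  # shallow -> gate for deep
        self.merge = nn.Conv2d(deep_ch + shallow_ch, out_ch, 3, padding=1)

    def forward(self, deep, shallow):
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        shallow_g = shallow * torch.sigmoid(self.att_deep(deep_up))
        deep_g = deep_up * torch.sigmoid(self.att_shallow(shallow))
        return self.merge(torch.cat([deep_g, shallow_g], dim=1))
```
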
21 pages, 3507 KiB  
Article
WSSGCN: Hyperspectral Forest Image Classification via Watershed Superpixel Segmentation and Sparse Graph Convolutional Networks
by Pingfei Chen, Xuyang Li, Yong Peng, Xiangsuo Fan and Qi Li
Forests 2025, 16(5), 827; https://doi.org/10.3390/f16050827 - 15 May 2025
Viewed by 190
Abstract
Hyperspectral image classification is crucial in remote sensing but faces challenges in forest ecosystem studies due to high-dimensional data, spectral variability, and spatial heterogeneity. This paper introduces WSSGCN (Watershed Superpixel Segmentation and Sparse Graph Convolutional Networks), a novel framework designed for efficient forest image classification. The method first applies watershed superpixel segmentation to divide hyperspectral images into semantically consistent regions, reducing computational complexity while preserving terrain boundary information. On this basis, a dual-branch model is designed: a local branch with multi-scale convolutional neural networks (CNNs) extracts spatial–spectral features, while a global branch constructs superpixel graphs and uses GCNs to model the global context. To enhance efficiency, a sparse tensor-based storage method for the adjacency matrix reduces complexity from quadratic to linear. Additionally, an attention-based adaptive fusion strategy dynamically balances local and global features. Experiments on multiple datasets show that WSSGCN outperforms mainstream methods in overall accuracy (OA), average accuracy (AA), and Kappa coefficient; notably, it achieves a 3.5% OA improvement and a 0.04 Kappa coefficient increase over SPEFORMER on the WHU-Hi-HongHu dataset. Sparse graph modeling keeps the method practical in resource-limited scenarios. This work offers an efficient solution for forest monitoring, supporting applications such as biodiversity assessment and deforestation tracking, and advances remote sensing-based forest ecosystem analysis, with strong potential for real-world ecological conservation and forest management. Full article
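
The linear-complexity storage claim follows from keeping only the adjacencies that actually occur between neighboring superpixels. A sketch of building such a sparse adjacency matrix from a watershed label map, assuming 4-connectivity defines the edges:

```python
# Build a sparse superpixel adjacency matrix from a label map: memory grows
# with the number of realized edges (linear) rather than n_superpixels**2.
import numpy as np
from scipy import sparse

def superpixel_adjacency(labels):
    """labels: (H, W) int array of superpixel ids from watershed segmentation."""
    right = np.stack([labels[:, :-1].ravel(), labels[:, 1:].ravel()])
    down = np.stack([labels[:-1, :].ravel(), labels[1:, :].ravel()])
    pairs = np.concatenate([right, down], axis=1)
    pairs = pairs[:, pairs[0] != pairs[1]]        # keep cross-boundary pairs only
    n = labels.max() + 1
    data = np.ones(pairs.shape[1])
    adj = sparse.coo_matrix((data, (pairs[0], pairs[1])), shape=(n, n))
    adj = ((adj + adj.T) > 0).astype(np.float32)  # symmetric, binary
    return adj.tocsr()
```
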
21 pages, 9991 KiB  
Article
Hypergraph Convolution Network Classification for Hyperspectral and LiDAR Data
by Lei Wang and Shiwen Deng
Sensors 2025, 25(10), 3092; https://doi.org/10.3390/s25103092 - 14 May 2025
Viewed by 213
Abstract
Conventional remote sensing classification approaches based on single-source data exhibit inherent limitations, driving significant research interest in improved multimodal data fusion techniques. Although deep learning methods based on convolutional neural networks (CNNs), transformers, and graph convolutional networks (GCNs) have demonstrated promising results in fusing complementary multi-source data, existing methodologies remain limited in capturing the intricate higher-order spatial–spectral dependencies among pixels. To overcome these limitations, we propose HGCN-HL, a novel multimodal deep learning framework that integrates hypergraph convolutional networks (HGCNs) with lightweight CNNs. Specifically, an adaptive weight mechanism first fuses the spectral features of hyperspectral imaging (HSI) and Light Detection and Ranging (LiDAR) data, enhancing feature representation. Then, superpixel-based dynamic hyperedge construction enables the joint characterization of homogeneous regions across both modalities, significantly boosting large-scale object recognition accuracy. Finally, local detail features are captured through a parallel CNN branch, complementing the global relationship modeling of the HGCN. Comprehensive experiments on three benchmark datasets demonstrate the superior performance of our method compared to existing state-of-the-art approaches. Notably, the proposed framework achieves significant improvements in both training efficiency and inference speed while maintaining competitive accuracy. Full article
(This article belongs to the Collection Machine Learning and AI for Sensors)
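
For reference, the standard HGNN-style hypergraph convolution that such frameworks build on is X' = Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2) X Θ, where H is the node–hyperedge incidence matrix and Dv, De are node and hyperedge degrees. The sketch below implements that formulation; whether HGCN-HL uses exactly this normalization is an assumption.

```python
# One hypergraph convolution layer in the standard HGNN formulation.
import torch
import torch.nn as nn

class HypergraphConv(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim)

    def forward(self, X, H, edge_w=None):
        """X: (N, F) node features; H: (N, E) incidence (1 if node in hyperedge)."""
        E = H.shape[1]
        W = torch.ones(E, device=X.device) if edge_w is None else edge_w
        Dv = (H * W).sum(dim=1).clamp(min=1e-12).pow(-0.5)  # node degrees^(-1/2)
        De = H.sum(dim=0).clamp(min=1e-12).reciprocal()      # hyperedge degrees^(-1)
        # Dv^(-1/2) H W De^(-1) H^T Dv^(-1/2) X Theta, via diagonal scalings:
        return Dv[:, None] * (H * W * De) @ (H.t() @ (Dv[:, None] * self.theta(X)))
```
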
28 pages, 9332 KiB  
Article
Contrastive Learning-Based Cross-Modal Fusion for Product Form Imagery Recognition: A Case Study on New Energy Vehicle Front-End Design
by Yutong Zhang, Jiantao Wu, Li Sun and Guoan Yang
Sustainability 2025, 17(10), 4432; https://doi.org/10.3390/su17104432 - 13 May 2025
Viewed by 251
Abstract
Fine-grained feature extraction and affective semantic mapping remain significant challenges in product form analysis. To address these issues, this study proposes a contrastive learning-based cross-modal fusion approach for product form imagery recognition, using the front-end design of new energy vehicles (NEVs) as a case study. The proposed method first employs the Biterm Topic Model (BTM) and Analytic Hierarchy Process (AHP) to extract thematic patterns and compute weight distributions from consumer review texts, thereby identifying key imagery style labels. These labels are then leveraged for image annotation, facilitating the construction of a multimodal dataset. Next, ResNet-50 and Transformer architectures serve as the image and text encoders, respectively, to extract and represent multimodal features. To ensure effective alignment and deep fusion of textual and visual representations in a shared embedding space, a contrastive learning mechanism is introduced, optimizing cosine similarity between positive and negative sample pairs. Finally, a fully connected multilayer network is integrated at the output of the Transformer and ResNet with Contrastive Learning (TRCL) model to enhance classification accuracy and reliability. Comparative experiments against various deep convolutional neural networks (DCNNs) demonstrate that the TRCL model effectively integrates semantic and visual information, significantly improving the accuracy and robustness of complex product form imagery recognition. These findings suggest that the proposed method holds substantial potential for large-scale product appearance evaluation and affective cognition research. Moreover, this data-driven fusion underpins sustainable product form design by streamlining evaluation and optimizing resource use. Full article
24 pages, 11112 KiB  
Article
Semantic Segmentation of Sika Deer Antler Image by U-Net Based on Two-Dimensional Discrete Wavelet Transform Fusion and Multi-Attention Mechanism
by Haotian Gong, Jinfan Wei, Yu Sun, Zhipeng Li, He Gong and Juanjuan Fan
Animals 2025, 15(10), 1388; https://doi.org/10.3390/ani15101388 - 11 May 2025
Viewed by 212
Abstract
At present, technology for monitoring the growth status of sika deer antlers faces many challenges in complex breeding environments (such as light change and object occlusion). More importantly, an effective method for segmenting sika deer antlers is still lacking, which hinders the development of subsequent antler quality classification. To fill this research gap and lay a foundation for future antler quality classification, this paper proposes an improved semantic segmentation model based on U-Net, named SDAS-Net. To improve segmentation accuracy and generalization in complex environments, we introduce a two-dimensional discrete wavelet transform (2D-DWT) module at the encoder head to reduce noise interference and enhance feature capture, and embed Star Blocks in the encoder to compensate for the feature information lost to the 2D-DWT. In addition, an efficient mixed channel attention (EMCA) module adaptively enhances key feature channels in the decoder, and a dual cross-attention (DCA) module fuses high-dimensional features in the skip connections. To verify the validity of the model, we constructed a 1055-image sika deer antler dataset (SDR). The experimental results show that, compared with the baseline model, SDAS-Net is significantly improved, reaching 92.12% MIoU and 93.63% PA, with only a 6.9% increase in parameter count. The results show that SDAS-Net can effectively handle sika deer antler segmentation in complex breeding environments while maintaining high precision. Full article
(This article belongs to the Section Animal System and Management)
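
A small sketch of the 2D-DWT front end using PyWavelets: `dwt2` splits an image into an approximation band (LL, noise-suppressed) and three detail bands. The Haar wavelet and the four-band stacking are assumptions about the configuration.

```python
# 2D discrete wavelet transform as an encoder-head preprocessing step.
import numpy as np
import pywt

def dwt_features(image):
    """image: (H, W) array; returns 4 half-resolution sub-bands stacked."""
    LL, (LH, HL, HH) = pywt.dwt2(image, "haar")
    # LL keeps low-frequency structure; the detail bands isolate edges and
    # noise, letting later layers downweight noise yet retain boundaries.
    return np.stack([LL, LH, HL, HH], axis=0)
```
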
23 pages, 10175 KiB  
Article
Feature-Guided Instance Mining and Task-Aligned Focal Loss for Weakly Supervised Object Detection in Remote Sensing Images
by Jinlin Tan, Chenhao Wang, Xiaomin Tan, Min Zhang and Hai Wang
Remote Sens. 2025, 17(10), 1673; https://doi.org/10.3390/rs17101673 - 9 May 2025
Viewed by 184
Abstract
Weakly supervised object detection (WSOD) in remote sensing images (RSIs) aims to achieve high-value object classification and localization using only image-level labels, and it has a wide range of applications. However, existing popular WSOD models still encounter two challenges. First, these WSOD models typically select the highest-scoring proposal as the seed instance while ignoring lower-scoring ones, resulting in some less-obvious objects being missed. Second, current models fail to ensure consistency between classification and regression, limiting the upper bound of WSOD performance. To address the first challenge, we propose a feature-guided seed instance mining (FGSIM) strategy to mine reliable seed instances. Specifically, FGSIM first selects multiple high-scoring proposals as seed instances and then leverages a feature similarity measure to mine additional seed instances among lower-scoring proposals. Furthermore, a contrastive loss is introduced to construct a credible similarity threshold for FGSIM by leveraging the consistent feature representations of instances within the same category. To address the second challenge, a task-aligned focal (TAF) loss is proposed to enforce consistency between classification and regression. Specifically, the localization difficulty score and classification difficulty score are used as weights for the regression and classification losses, respectively, thereby promoting their synchronous optimization by minimizing the TAF loss. Additionally, rotated images are incorporated into the baseline to encourage the model to make consistent predictions for objects with arbitrary orientations. Ablation studies validate the effectiveness of FGSIM, TAF loss, and their combination. Comparisons with popular models on two RSI datasets further demonstrate the superiority of our approach. Full article
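
A hedged sketch of the seed-mining step: take the top-scoring proposals as seeds, then admit lower-scoring proposals whose features are sufficiently similar to an existing seed. The fixed threshold below stands in for the contrastively learned threshold in the paper, and `top_k` is an invented parameter.

```python
# Feature-guided seed instance mining: rescue less-obvious objects among
# low-scoring proposals via feature similarity to high-scoring seeds.
import torch
import torch.nn.functional as F

def mine_seeds(scores, feats, top_k=3, sim_thresh=0.8):
    """scores: (P,) proposal class scores; feats: (P, D) proposal features."""
    order = scores.argsort(descending=True)
    seeds = order[:top_k].tolist()            # high-score seed instances
    f = F.normalize(feats, dim=-1)
    for i in order[top_k:].tolist():
        sim = (f[i] @ f[seeds].t()).max()     # best match to current seeds
        if sim > sim_thresh:
            seeds.append(i)                   # admit a less-obvious object
    return seeds
```
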
24 pages, 1224 KiB  
Article
MDFormer: Transformer-Based Multimodal Fusion for Robust Chest Disease Diagnosis
by Xinlong Liu, Fei Pan, Hainan Song, Siyi Cao, Chunping Li and Tanshi Li
Electronics 2025, 14(10), 1926; https://doi.org/10.3390/electronics14101926 - 9 May 2025
Viewed by 291
Abstract
The growing richness of medical images and clinical data provides abundant support for multimodal chest disease diagnosis. However, traditional multimodal fusion methods are often relatively simple, leading to insufficient exploitation of crossmodal complementary advantages. At the same time, existing multimodal chest disease diagnosis methods usually focus on two modalities and scale poorly to three or more. Moreover, in practical clinical scenarios, missing-modality problems often arise due to equipment limitations or incomplete data acquisition. To address these issues, this paper proposes a novel multimodal chest disease classification model, MDFormer. The model designs a crossmodal attention fusion mechanism, MFAttention, and combines it with the Transformer architecture to construct a multimodal fusion module, MFTrans, which effectively integrates medical imaging, clinical text, and vital-signs data. When extended to multiple modalities, MFTrans significantly reduces model parameters. This paper also proposes a two-stage masked enhancement classification and contrastive learning training framework, MECCL, which significantly improves the model’s robustness and transferability. Experimental results show that MDFormer achieves a classification precision of 0.8 on the MIMIC dataset, and when 50% of the modality data are missing, the AUC reaches 85% of that obtained with complete data, outperforming models trained without the two-stage framework. Full article
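
A generic crossmodal attention fusion in this spirit is sketched below, where image tokens attend over concatenated text and vital-sign tokens; the shared token dimension and the residual arrangement are illustrative assumptions, not the published MFAttention design.

```python
# Crossmodal attention fusion sketch: image tokens query a concatenated
# context of text and vital-sign tokens, with a residual connection.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_tok, txt_tok, vitals_tok):
        context = torch.cat([txt_tok, vitals_tok], dim=1)  # (B, Lt+Lv, dim)
        fused, _ = self.attn(query=img_tok, key=context, value=context)
        return self.norm(img_tok + fused)                  # residual fusion
```
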
15 pages, 5253 KiB  
Article
Detection of Tagosodes orizicolus in Aerial Images of Rice Crops Using Machine Learning
by Angig Rivera-Cartagena, Heber I. Mejia-Cabrera and Juan Arcila-Diaz
AgriEngineering 2025, 7(5), 147; https://doi.org/10.3390/agriengineering7050147 - 7 May 2025
Viewed by 229
Abstract
This study employs RGB imagery and machine learning techniques to detect Tagosodes orizicolus infestations in “Tinajones” rice crops during the flowering stage, a critical challenge for agriculture in northern Peru. High-resolution images were acquired using an unmanned aerial vehicle (UAV) and preprocessed by extracting 256 × 256-pixel segments, focusing on three classes: infested zones, non-cultivated areas, and healthy rice crops. A dataset of 1500 images was constructed and utilized to train deep learning models based on VGG16 and ResNet50. Both models exhibited highly comparable performance, with VGG16 attaining a precision of 98.274% and ResNet50 achieving a precision of 98.245%, demonstrating their effectiveness in identifying infestation patterns with high reliability. To automate the analysis of complete UAV-acquired images, a web-based application was developed. This system receives an image, segments it into grids, and preprocesses each section using resizing, normalization, and dimensional adjustments. The pretrained VGG16 model subsequently classifies each segment into one of three categories: infested zone, non-cultivated area, or healthy crop, overlaying the classification results onto the original image to generate an annotated visualization of detected areas. This research contributes to precision agriculture by providing an efficient and scalable computational tool for early infestation detection, thereby supporting timely intervention strategies to mitigate potential crop losses. Full article
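
The web application's grid pipeline can be approximated as follows: split a large UAV image into 256 × 256 tiles, preprocess, and classify each tile into one of the three classes. The class names and tile size mirror the abstract; file handling, normalization statistics, and the weight-loading step are assumptions.

```python
# Sketch of grid-wise tile classification for a UAV rice-field image.
from PIL import Image
import torch
from torchvision import models, transforms

CLASSES = ["infested", "non_cultivated", "healthy"]
prep = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def classify_tiles(path, model, tile=256):
    img = Image.open(path).convert("RGB")
    w, h = img.size
    grid = {}
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patch = prep(img.crop((x, y, x + tile, y + tile))).unsqueeze(0)
            with torch.no_grad():
                grid[(x, y)] = CLASSES[model(patch).argmax(1).item()]
    return grid  # coordinates -> predicted class, ready for overlay rendering

model = models.vgg16(weights=None)
model.classifier[6] = torch.nn.Linear(4096, len(CLASSES))  # 3-class head
# In practice, fine-tuned weights would be loaded via model.load_state_dict(...).
model.eval()
```
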