Search Results (54)

Search Parameters:
Keywords = vector orientation representation

24 pages, 8171 KB  
Article
Breast Cancer Image Classification Using Phase Features and Deep Ensemble Models
by Edgar Omar Molina Molina and Victor H. Diaz-Ramirez
Appl. Sci. 2025, 15(14), 7879; https://doi.org/10.3390/app15147879 - 15 Jul 2025
Viewed by 546
Abstract
Breast cancer is a leading cause of mortality among women worldwide. Early detection is crucial for increasing patient survival rates. Artificial intelligence, particularly convolutional neural networks (CNNs), has enabled the development of effective diagnostic systems by digitally processing mammograms. CNNs have been widely used for the classification of breast cancer in images, obtaining accurate results similar in many cases to those of medical specialists. This work presents a hybrid feature extraction approach for breast cancer detection that employs variants of the EfficientNetV2 network and a convenient image representation based on phase features. First, a region of interest (ROI) is extracted from the mammogram. Next, a three-channel image is created using the local phase, amplitude, and orientation features of the ROI. A feature vector is constructed for the processed mammogram using the developed CNN model. The size of the feature vector is reduced using simple statistics, achieving a redundancy suppression of 99.65%. The reduced feature vector is classified as either malignant or benign using a classifier ensemble. Experimental results using a training/testing ratio of 70/30 on 15,506 mammography images from three datasets produced an accuracy of 86.28%, a precision of 78.75%, a recall of 86.14%, and an F1-score of 80.09% with the modified EfficientNetV2 model and stacking classifier. However, an accuracy of 93.47%, a precision of 87.61%, a recall of 93.19%, and an F1-score of 90.32% were obtained using only images from the CSAW-M dataset. Full article
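The abstract above does not specify how the phase features are computed; one standard way to obtain local phase, amplitude, and orientation from a single grayscale ROI is the monogenic signal (a Riesz transform applied in the Fourier domain). The sketch below is an illustrative assumption, not the paper's actual pipeline.

```python
import numpy as np

def phase_feature_image(roi):
    """roi: 2D grayscale array -> (H, W, 3) stack of phase, amplitude, orientation."""
    f = np.fft.fft2(roi)
    h, w = roi.shape
    u = np.fft.fftfreq(w)[None, :]
    v = np.fft.fftfreq(h)[:, None]
    r = np.sqrt(u**2 + v**2)
    r[0, 0] = 1.0  # avoid division by zero at the DC component
    # Riesz transform pair (odd, direction-sensitive filters)
    r1 = np.real(np.fft.ifft2(f * (-1j * u / r)))
    r2 = np.real(np.fft.ifft2(f * (-1j * v / r)))
    even = roi - roi.mean()        # zero-mean even part of the signal
    odd = np.hypot(r1, r2)         # magnitude of the odd (Riesz) part
    amplitude = np.hypot(even, odd)
    phase = np.arctan2(odd, even)
    orientation = np.arctan2(r2, r1)
    return np.dstack([phase, amplitude, orientation])
```

The three channels can then be fed to the CNN in place of the raw ROI.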
(This article belongs to the Special Issue Object Detection and Image Processing Based on Computer Vision)

24 pages, 25747 KB  
Article
Infrared Small Target Detection Using Directional Derivative Correlation Filtering and a Relative Intensity Contrast Measure
by Feng Xie, Dongsheng Yang, Yao Yang, Tao Wang and Kai Zhang
Remote Sens. 2025, 17(11), 1921; https://doi.org/10.3390/rs17111921 - 31 May 2025
Viewed by 581
Abstract
Detecting small targets in infrared search and track (IRST) systems in complex backgrounds poses a significant challenge. This study introduces a novel detection framework that integrates directional derivative correlation filtering (DDCF) with a local relative intensity contrast measure (LRICM) to effectively handle diverse background disturbances, including cloud edges and structural corners. This approach involves converting the original infrared image into an infrared gradient vector field (IGVF) using a facet model. Exploiting the distinctive characteristics of small targets in second-order derivative computations, four directional filters are designed to emphasize target features while suppressing edge clutter. The DDCF map is then constructed by merging the results of the second-order derivative filters applied in four distinct orientations. Subsequently, the LRICM is determined by analyzing the gray-level contrast between the target and its immediate surroundings, effectively minimizing interference from background elements like corners. The final detection step involves fusing the DDCF and LRICM maps to generate a comprehensive saliency representation, which is then processed using an adaptive thresholding technique to extract small targets accurately. Experimental evaluations across multiple datasets verify that the proposed method substantially improves the signal-to-clutter ratio (SCR). Compared to existing advanced techniques, the proposed approach demonstrates superior detection reliability in challenging environments, including ground surfaces, cloudy conditions, forested areas, and urban structures. Moreover, the framework maintains low computational complexity, achieving a favorable balance between detection accuracy and efficiency, thereby demonstrating promising potential for deployment in practical IRST scenarios. Full article
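The core intuition of the directional filtering stage can be illustrated with a toy sketch: a compact blob-like target produces a strong negative second derivative in every direction, while an edge responds in at most one or two, so fusing responses across orientations suppresses edge clutter. The actual DDCF filters are derived from a facet model; the plain finite differences and min-fusion below are simplifying assumptions for illustration only.

```python
import numpy as np

def second_derivative(img, dy, dx):
    """f(p+d) - 2 f(p) + f(p-d) along the step (dy, dx), via array shifts."""
    fwd = np.roll(np.roll(img, -dy, axis=0), -dx, axis=1)
    bwd = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
    return fwd - 2.0 * img + bwd

def ddcf_like_map(img):
    # Four orientations: 0, 90, 45, 135 degrees
    dirs = [(0, 1), (1, 0), (1, 1), (1, -1)]
    responses = [np.maximum(-second_derivative(img, dy, dx), 0.0)
                 for dy, dx in dirs]
    # A true small target must respond in all four directions at once
    return np.minimum.reduce(responses)
```

An isolated bright pixel survives the fusion, while a straight line (zero second derivative along its own direction) is zeroed out.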

21 pages, 1875 KB  
Article
Direction-Aware Lightweight Framework for Traditional Mongolian Document Layout Analysis
by Chenyang Zhou, Monghjaya Ha and Licheng Wu
Appl. Sci. 2025, 15(8), 4594; https://doi.org/10.3390/app15084594 - 21 Apr 2025
Viewed by 565
Abstract
Traditional Mongolian document layout analysis faces unique challenges due to its vertical writing system and complex structural arrangements. Existing methods often struggle with the directional nature of traditional Mongolian text and require substantial computational resources. In this paper, we propose a direction-aware lightweight framework that effectively addresses these challenges. Our framework introduces three key innovations: a modified MobileNetV3 backbone with asymmetric convolutions for efficient vertical feature extraction, a dynamic feature enhancement module with channel attention for adaptive multi-scale information fusion, and a direction-aware detection head with (sinθ,cosθ) vector representation for accurate orientation modeling. We evaluate our method on TMDLAD, a newly constructed traditional Mongolian document layout analysis dataset, comparing it with both heavy ResNet-50-based models and lightweight alternatives. The experimental results demonstrate that our approach achieves state-of-the-art performance, with 0.715 mAP and 92.3% direction accuracy with a mean absolute error of only 2.5°, while maintaining high efficiency at 28.6 FPS using only 8.3 M parameters. Our model outperforms the best ResNet-50-based model by 3.6% in mAP and the best lightweight model by 4.3% in mAP, while uniquely providing direction prediction capability that other lightweight models lack. The proposed framework significantly outperforms existing methods in both accuracy and efficiency, providing a practical solution for traditional Mongolian document layout analysis that can be extended to other vertical writing systems. Full article
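The (sin θ, cos θ) representation mentioned above is a standard trick for regressing angles: encoding an angle as a point on the unit circle removes the wrap-around discontinuity at 0°/360°. A minimal sketch (function names are illustrative, not from the paper's code):

```python
import numpy as np

def encode_direction(theta_deg):
    """Map an angle in degrees to a (sin, cos) pair on the unit circle."""
    t = np.deg2rad(theta_deg)
    return np.array([np.sin(t), np.cos(t)])

def decode_direction(vec):
    """Recover the angle in [0, 360) degrees from a (sin, cos) pair."""
    s, c = vec
    return np.rad2deg(np.arctan2(s, c)) % 360.0

def angular_error(pred_deg, true_deg):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(pred_deg - true_deg) % 360.0
    return min(d, 360.0 - d)
```

A mean absolute error such as the paper's 2.5° would be computed with `angular_error` rather than a naive difference, so that 359° versus 1° counts as 2°, not 358°.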

33 pages, 36122 KB  
Article
Solar Flare Prediction Using Multivariate Time Series of Photospheric Magnetic Field Parameters: A Comparative Analysis of Vector, Time Series, and Graph Data Representations
by Onur Vural, Shah Muhammad Hamdi and Soukaina Filali Boubrahimi
Remote Sens. 2025, 17(6), 1075; https://doi.org/10.3390/rs17061075 - 18 Mar 2025
Viewed by 1518
Abstract
The purpose of this study is to provide a comprehensive resource for the selection of data representations for machine learning-oriented models and components in solar flare prediction tasks. Major solar flares occurring in the solar corona and heliosphere can bring potential destructive consequences, posing significant risks to astronauts, space stations, electronics, communication systems, and numerous technological infrastructures. For this reason, the accurate detection of major flares is essential for mitigating these hazards and ensuring the safety of our technology-dependent society. In response, leveraging machine learning techniques for predicting solar flares has emerged as a significant application within the realm of data science, relying on sensor data collected from solar active region photospheric magnetic fields by space- and ground-based observatories. In this research, three distinct solar flare prediction strategies utilizing the photospheric magnetic field parameter-based multivariate time series dataset are evaluated, with a focus on data representation techniques. Specifically, we examine vector-based, time series-based, and graph-based approaches to identify the most effective data representation for capturing key characteristics of the dataset. The vector-based approach condenses multivariate time series into a compressed vector form, the time series representation leverages temporal patterns, and the graph-based method models interdependencies between magnetic field parameters. The results demonstrate that the vector representation approach exhibits exceptional robustness in predicting solar flares, consistently yielding strong and reliable classification outcomes by effectively encapsulating the intricate relationships within photospheric magnetic field data when coupled with appropriate downstream machine learning classifiers. Full article
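The "vector-based" representation discussed above condenses each multivariate time series (T timesteps × P magnetic-field parameters) into a single fixed-length vector. The exact statistics used in the study are not given in the abstract; per-parameter mean/std/min/max is a common choice and is assumed in this sketch.

```python
import numpy as np

def to_vector(mts):
    """mts: array of shape (T, P) -> flat feature vector of length 4*P."""
    stats = [mts.mean(axis=0), mts.std(axis=0),
             mts.min(axis=0), mts.max(axis=0)]
    return np.concatenate(stats)
```

A downstream classifier (SVM, random forest, etc.) then consumes these fixed-length vectors in place of the raw time series, which is what makes the vector representation compatible with conventional machine learning models.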

29 pages, 10229 KB  
Article
End-to-End Vector Simplification for Building Contours via a Sequence Generation Model
by Longfei Cui, Junkui Xu, Lin Jiang and Haizhong Qian
ISPRS Int. J. Geo-Inf. 2025, 14(3), 124; https://doi.org/10.3390/ijgi14030124 - 9 Mar 2025
Viewed by 1055
Abstract
Simplifying building contours involves reducing data volume while preserving the continuity, accuracy, and essential characteristics of building shapes. This presents significant challenges for sequence representation and generation. Traditional methods often rely on complex rule design, feature engineering, and iterative optimization. To overcome these limitations, this study proposes a Transformer-based Polygon Simplification Model (TPSM) for the end-to-end vector simplification of building contours. TPSM processes ordered vertex coordinate sequences of building contours, leveraging the inherent sequence modeling capabilities of the Transformer architecture to directly generate simplified coordinate sequences. To enhance spatial understanding, positional encoding is embedded within the multihead self-attention mechanism, allowing the TPSM to effectively capture relative vertex positions. Additionally, a self-supervised reconstruction mechanism is introduced, where random perturbations are applied to input sequences, and the model learns to reconstruct the original contours. This mechanism enables TPSM to better understand underlying geometric relationships and implicit simplification rules. Experiments were conducted using a 1:10,000 building dataset from Shenzhen, China, targeting a simplification scale of 1:25,000. The results demonstrate that TPSM outperforms five established simplification algorithms in controlling changes to building area, orientation, and shape fidelity, achieving an average intersection over union (IoU) of 0.901 and a complexity-aware IoU (C-IoU) of 0.735. Full article

23 pages, 69279 KB  
Article
A Novel Equivariant Self-Supervised Vector Network for Three-Dimensional Point Clouds
by Kedi Shen, Jieyu Zhao and Min Xie
Algorithms 2025, 18(3), 152; https://doi.org/10.3390/a18030152 - 7 Mar 2025
Viewed by 1097
Abstract
For networks that process 3D data, estimating the orientation and position of 3D objects is a challenging task. This is because the traditional networks are not robust to the rotation of the data, and their internal workings are largely opaque and uninterpretable. To solve this problem, a novel equivariant self-supervised vector network for point clouds is proposed. The network can learn the rotation direction information of the 3D target and estimate the rotational pose change of the target, and the interpretability of the equivariant network is studied using information theory. The utilization of vector neurons within the network lifts the scalar data to vector representations, enabling the network to learn the pose information inherent in the 3D target. The network can perform complex rotation-equivariant tasks after pre-training, and it shows impressive performance in complex tasks like category-level pose change estimation and rotation-equivariant reconstruction. We demonstrate through experiments that our network can accurately detect the orientation and pose change of point clouds and visualize the latent features. Moreover, it performs well in invariant tasks such as classification and category-level segmentation. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)

14 pages, 4193 KB  
Article
Latent Space Representations for Marker-Less Realtime Hand–Eye Calibration
by Juan Camilo Martínez-Franco, Ariel Rojas-Álvarez, Alejandra Tabares, David Álvarez-Martínez and César Augusto Marín-Moreno
Sensors 2024, 24(14), 4662; https://doi.org/10.3390/s24144662 - 18 Jul 2024
Cited by 1 | Viewed by 1365
Abstract
Marker-less hand–eye calibration permits the acquisition of an accurate transformation between an optical sensor and a robot in unstructured environments. Single monocular cameras, despite their low cost and modest computation requirements, present difficulties for this purpose due to their incomplete correspondence of projected coordinates. In this work, we introduce a hand–eye calibration procedure based on the rotation representations inferred by an augmented autoencoder neural network. Learning-based models that attempt to directly regress the spatial transform of objects such as the links of robotic manipulators perform poorly in the orientation domain, but this can be overcome through the analysis of the latent space vectors constructed in the autoencoding process. This technique is computationally inexpensive and can be run in real time in markedly varied lighting and occlusion conditions. To evaluate the procedure, we use a color-depth camera and perform a registration step between the predicted and the captured point clouds to measure translation and orientation errors and compare the results to a baseline based on traditional checkerboard markers. Full article
(This article belongs to the Section Sensors and Robotics)

14 pages, 1329 KB  
Article
Crystallographic Quaternions
by Andrzej Katrusiak and Stiv Llenga
Symmetry 2024, 16(7), 818; https://doi.org/10.3390/sym16070818 - 29 Jun 2024
Cited by 1 | Viewed by 2315
Abstract
Symmetry transformations in crystallography are traditionally represented as equations and matrices, which can be suitable both for orthonormal and crystal reference systems. Quaternion representations, easily constructed for any orientations of symmetry operations, owing to the vector structure based on the direction of the rotation axes or of the normal vectors to the mirror plane, are known to be advantageous for optimizing numerical computing. However, quaternions are described in Cartesian coordinates only. Here, we present the quaternion representations of crystallographic point-group symmetry operations for the crystallographic reference coordinates in triclinic, monoclinic, orthorhombic, tetragonal, cubic and trigonal (in rhombohedral setting) systems. For these systems, all symmetry operations have been listed and their applications exemplified. Owing to their concise form, quaternions can be used as the symbols of symmetry operations, which contain information about both the orientation and the rotation angle. The shortcomings of quaternions, including different actions for rotations and improper symmetry operations, as well as inadequate representation of the point symmetry in the hexagonal setting, have been discussed. Full article
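The quaternion machinery referenced above can be illustrated numerically: a rotation by angle α about a unit axis n is the quaternion q = (cos(α/2), n·sin(α/2)), and a vector v is rotated via the sandwich product q (0, v) q⁻¹. Cartesian coordinates are assumed here, which is exactly the limitation for crystal reference systems that the article addresses.

```python
import numpy as np

def qmul(a, b):
    """Hamilton product of two quaternions (w, x, y, z)."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def rotation_quaternion(axis, angle_rad):
    """Unit quaternion for a rotation of angle_rad about the given axis."""
    n = np.asarray(axis, float)
    n = n / np.linalg.norm(n)
    return np.concatenate([[np.cos(angle_rad / 2)],
                           np.sin(angle_rad / 2) * n])

def rotate(v, q):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * q^-1."""
    qc = q * np.array([1.0, -1.0, -1.0, -1.0])  # conjugate = inverse for unit q
    return qmul(qmul(q, np.concatenate([[0.0], v])), qc)[1:]
```

For example, a 90° rotation about the z axis maps the x axis onto the y axis, matching the corresponding 4-fold symmetry operation.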
(This article belongs to the Special Issue Feature Papers in Section "Engineering and Materials" 2024)

23 pages, 7093 KB  
Article
Synthetic Aperture Radar Image Change Detection Based on Principal Component Analysis and Two-Level Clustering
by Liangliang Li, Hongbing Ma, Xueyu Zhang, Xiaobin Zhao, Ming Lv and Zhenhong Jia
Remote Sens. 2024, 16(11), 1861; https://doi.org/10.3390/rs16111861 - 23 May 2024
Cited by 31 | Viewed by 2776
Abstract
Synthetic aperture radar (SAR) change detection provides a powerful tool for continuous, reliable, and objective observation of the Earth, supporting a wide range of applications that require regular monitoring and assessment of changes in the natural and built environment. In this paper, we introduce a novel SAR image change detection method based on principal component analysis and two-level clustering. First, two difference images of the log-ratio and mean-ratio operators are computed, then the principal component analysis fusion model is used to fuse the two difference images, and a new difference image is generated. To incorporate contextual information during the feature extraction phase, Gabor wavelets are used to obtain the representation of the difference image across multiple scales and orientations. The maximum magnitude across all orientations at each scale is then concatenated to form the Gabor feature vector. Following this, a cascading clustering algorithm is developed within this discriminative feature space by merging the first-level fuzzy c-means clustering with the second-level neighbor rule. Ultimately, the two-level combination of the changed and unchanged results produces the final change map. Five SAR datasets are used for the experiment, and the results show that our algorithm has significant advantages in SAR change detection. Full article
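The first stage described above can be sketched with illustrative code: the two ratio-based difference images are computed from the co-registered SAR acquisitions, then fused by projecting each pixel's (log-ratio, mean-ratio) pair onto the first principal component of their joint covariance. The exact operators and PCA fusion model in the paper may differ in detail; this is a hedged approximation.

```python
import numpy as np

def difference_images(x, y, eps=1.0):
    """x, y: co-registered intensity images (2D arrays of the same shape)."""
    di_lr = np.abs(np.log((x + eps) / (y + eps)))              # log-ratio
    di_mr = 1.0 - np.minimum(x, y) / (np.maximum(x, y) + eps)  # mean-ratio style
    return di_lr, di_mr

def pca_fuse(di_lr, di_mr):
    """Fuse two difference images along their dominant principal component."""
    data = np.stack([di_lr.ravel(), di_mr.ravel()])            # 2 x N
    data = data - data.mean(axis=1, keepdims=True)
    w, v = np.linalg.eigh(np.cov(data))                        # ascending eigenvalues
    first_pc = v[:, -1]                                        # dominant component
    return (first_pc @ data).reshape(di_lr.shape)
```

The fused difference image then feeds the Gabor feature extraction and two-level clustering stages.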
(This article belongs to the Section Remote Sensing Image Processing)

13 pages, 286 KB  
Article
Transracial Adoption, Memory, and Mobile, Processual Identity in Jackie Kay’s Red Dust Road
by Pirjo Ahokas
Genealogy 2023, 7(4), 93; https://doi.org/10.3390/genealogy7040093 - 25 Nov 2023
Viewed by 2313
Abstract
Representations of adoptions tend to concentrate on normatively conceived forms of identity, which prioritize the genetic lineage of adoptees. In contrast, scholarship on autobiographical writing emphasizes that identities are not fixed but are always in process and intersectional because they are formed within unequal power relations. Kay’s experimental, autobiographical narrative Red Dust Road (2010) tackles the themes of adoption, the search for close relatives, and reunion. Many scholars of her autobiographical writings describe the fluidity of the diasporic adoptee identities created by her. My aim is more specific: I examine what I call Kay’s continuously mobile, processual identity construction as a transracial adoptee in Red Dust Road. I argue that her identity formation, which is also intersectional, is interconnected with her multidirectional networks of attachments and the experimental form of her adoption narrative. In addition to an intersectional approach and autobiographical studies, I draw on insights from adoption studies. In my reading of Kay’s work, I pay special attention to the inequalities derived from the intersecting vectors of adoption and race, which also intersect with other dimensions of difference, such as nation, gender, class, and sexual orientation. I employ the notion of the multidirectional in the sense in which McLeod applies it to the study of adoption writing. As I demonstrate, multidirectionality and the complex form of Red Dust Road provide versatile means of conveying Kay’s fragmented acts of memory, which assist her ongoing mobile, processual identity construction. Her multidirectional lines of transformative attachments finally bond her to her adoptive and biogenetic families as well as other affective connections. While Kay’s socially significant narrative indicates, amongst other adoption issues, that transracial adoptions can be successful, it is significant that it has no closure. The last chapter gestures toward potential new beginnings, which indicates that the story of adoption has no end. Full article
(This article belongs to the Special Issue Transnational and/or Transracial Adoption and Life Narratives)
20 pages, 4005 KB  
Article
WERECE: An Unsupervised Method for Educational Concept Extraction Based on Word Embedding Refinement
by Jingxiu Huang, Ruofei Ding, Xiaomin Wu, Shumin Chen, Jiale Zhang, Lixiang Liu and Yunxiang Zheng
Appl. Sci. 2023, 13(22), 12307; https://doi.org/10.3390/app132212307 - 14 Nov 2023
Cited by 4 | Viewed by 2166
Abstract
The era of educational big data has sparked growing interest in extracting and organizing educational concepts from massive amounts of information. Outcomes are of the utmost importance for artificial intelligence–empowered teaching and learning. Unsupervised educational concept extraction methods based on pre-trained models continue to proliferate due to ongoing advances in semantic representation. However, it remains challenging to directly apply pre-trained large language models to extract educational concepts; pre-trained models are built on extensive corpora and do not necessarily cover all subject-specific concepts. To address this gap, we propose a novel unsupervised method for educational concept extraction based on word embedding refinement (i.e., word embedding refinement–based educational concept extraction (WERECE)). It integrates a manifold learning algorithm to adapt a pre-trained model for extracting educational concepts while accounting for the geometric information in semantic computation. We further devise a discriminant function based on semantic clustering and Box–Cox transformation to enhance WERECE’s accuracy and reliability. We evaluate its performance on two newly constructed datasets, EDU-DT and EDUTECH-DT. Experimental results show that WERECE achieves an average precision up to 85.9%, recall up to 87.0%, and F1 scores up to 86.4%, which significantly outperforms baselines (TextRank, term frequency–inverse document frequency, isolation forest, K-means, and one-class support vector machine) on educational concept extraction. Notably, when WERECE is implemented with different parameter settings, its precision and recall sensitivity remain robust. WERECE also holds broad application prospects as a foundational technology, such as for building discipline-oriented knowledge graphs, enhancing learning assessment and feedback, predicting learning interests, and recommending learning resources. Full article
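The discriminant function above combines semantic clustering with a Box–Cox transformation. A minimal stand-alone sketch of the transform itself (λ is a tunable parameter; λ → 0 recovers log x), with the surrounding pipeline hedged as an assumption:

```python
import numpy as np

def box_cox(x, lam):
    """Box-Cox power transform for positive x: (x^lam - 1) / lam, or log x at lam = 0."""
    x = np.asarray(x, dtype=float)
    if lam == 0:
        return np.log(x)
    return (np.power(x, lam) - 1.0) / lam
```

In a WERECE-style pipeline one would apply this to a positive score, such as the distance of a candidate term's embedding to its cluster centroid, to make the score distribution closer to Gaussian before thresholding; SciPy's `scipy.stats.boxcox` additionally estimates λ by maximum likelihood.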

14 pages, 515 KB  
Article
Towards a Method to Enable the Selection of Physical Models within the Systems Engineering Process: A Case Study with Simulink Models
by Eduardo Cibrián, Jose María Álvarez-Rodríguez, Roy Mendieta and Juan Llorens
Appl. Sci. 2023, 13(21), 11999; https://doi.org/10.3390/app132111999 - 3 Nov 2023
Cited by 2 | Viewed by 1489
Abstract
The use of different techniques and tools is a common practice to cover all stages in the development life-cycle of systems, generating a significant number of work products. These artefacts are frequently encoded using diverse formats, and often require access through non-standard protocols and formats. In this context, Model-Based Systems Engineering (MBSE) emerges as a methodology to shift the paradigm of Systems Engineering practice from a document-oriented environment to a model-intensive environment. To achieve this major goal, a formalised application of modelling is employed throughout the life-cycle of systems to generate various system artefacts represented as models, such as requirements, logical models, and multi-physics models. However, the mere use of models does not by itself address one of the main challenges in the Systems Engineering discipline, namely, the reuse of system artefacts. Considering the fact that models are becoming the main type of system artefact, it is necessary to provide the capability to properly and efficiently represent and retrieve the generated models. In light of this, traditional information retrieval techniques have been widely studied to match existing software assets according to a set of capabilities or restrictions. However, there is much more at stake than the simple retrieval of models or even any piece of knowledge. An environment for model reuse must provide the proper mechanisms to (1) represent any piece of data, information, or knowledge under a common and shared data model, and (2) provide advanced retrieval mechanisms to elevate the meaning of information resources from text-based descriptions to concept-based ones. This need has led to novel methods using word embeddings and vector-based representations to semantically encode information. Such methods are applied to encode the information of physical models while preserving their underlying semantics. In this study, a text corpus from MATLAB Simulink models was preprocessed using Natural Language Processing (NLP) techniques and trained to generate word vector representations. Then, the presented method was validated using a testbed of MATLAB Simulink physical models in which verbalisations of models are transformed into vectors. The effectiveness of the proposed solution was assessed through a use case study. Evaluation of the results demonstrates a precision value of 0.925, a recall value of 0.865, and an F1 score of 0.884. Full article
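The retrieval idea described above can be sketched in a few lines: model verbalisations are encoded as vectors (here, toy averaged word vectors over a made-up vocabulary) and matched by cosine similarity. A real system would use embeddings trained on the Simulink corpus; all names and data below are illustrative.

```python
import numpy as np

def embed(text, word_vectors, dim=4):
    """Average the word vectors of known words; zero vector if none match."""
    vecs = [word_vectors[w] for w in text.lower().split() if w in word_vectors]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(a @ b / (na * nb))

def retrieve(query, corpus, word_vectors):
    """Return the corpus entry whose embedding is closest to the query's."""
    q = embed(query, word_vectors)
    scored = [(cosine(q, embed(doc, word_vectors)), doc) for doc in corpus]
    return max(scored)[1]
```

This is the sense in which vector representations elevate matching from text-based to concept-based: two verbalisations score as similar when their embeddings point the same way, not when they share literal strings.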

15 pages, 4273 KB  
Article
Fusion of Attention-Based Convolution Neural Network and HOG Features for Static Sign Language Recognition
by Diksha Kumari and Radhey Shyam Anand
Appl. Sci. 2023, 13(21), 11993; https://doi.org/10.3390/app132111993 - 3 Nov 2023
Cited by 8 | Viewed by 2979
Abstract
The deaf and hearing-impaired community expresses their emotions, communicates with society, and enhances the interaction between humans and computers using sign language gestures. This work presents a strategy for efficient feature extraction that combines two different methods: a convolutional block attention module (CBAM)-based convolutional neural network (CNN) and the standard handcrafted histogram of oriented gradients (HOG) feature descriptor. The proposed framework aims to enhance accuracy by extracting meaningful features and resolving issues like rotation, similar hand orientation, etc. The HOG feature extraction technique provides a compact feature representation that signifies meaningful information about sign gestures. The CBAM attention module is incorporated into the structure of CNN to enhance feature learning using spatial and channel attention mechanisms. Then, the final feature vector is formed by concatenating these features. This feature vector is provided to the classification layers to predict static sign gestures. The proposed approach is validated on two publicly available static Massey American Sign Language (ASL) and Indian Sign Language (ISL) databases. The model’s performance is evaluated using precision, recall, F1-score, and accuracy. Our proposed methodology achieved 99.22% and 99.79% accuracy for the ASL and ISL datasets. The acquired results signify the efficiency of the feature fusion and attention mechanism. Our network achieved better accuracy than earlier studies. Full article
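The handcrafted half of the fusion described above can be sketched as a minimal gradient-orientation histogram for a single cell (full HOG adds dense cells and block normalization; `skimage.feature.hog` provides a complete implementation). The CNN branch's feature vector is then simply concatenated with it, as the abstract states.

```python
import numpy as np

def hog_cell(patch, n_bins=9):
    """Unsigned gradient-orientation histogram for a 2D grayscale patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0          # unsigned, [0, 180)
    hist = np.zeros(n_bins)
    bin_idx = (ang / (180.0 / n_bins)).astype(int) % n_bins
    np.add.at(hist, bin_idx.ravel(), mag.ravel())         # magnitude-weighted votes
    return hist / (np.linalg.norm(hist) + 1e-9)

def fuse(cnn_features, hog_features):
    """Concatenate learned and handcrafted features into one vector."""
    return np.concatenate([cnn_features, hog_features])
```

A patch with a purely horizontal intensity ramp votes entirely into the 0° bin, which is the kind of compact orientation summary that makes HOG complementary to learned CNN features.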
(This article belongs to the Special Issue Research on Image Analysis and Computer Vision)

20 pages, 15885 KB  
Article
CapGAN: Text-to-Image Synthesis Using Capsule GANs
by Maryam Omar, Hafeez Ur Rehman, Omar Bin Samin, Moutaz Alazab, Gianfranco Politano and Alfredo Benso
Information 2023, 14(10), 552; https://doi.org/10.3390/info14100552 - 9 Oct 2023
Viewed by 3966
Abstract
Text-to-image synthesis is one of the most critical and challenging problems of generative modeling. It is of substantial importance in the area of automatic learning, especially for image creation, modification, analysis and optimization. A number of works have been proposed in the past to achieve this goal; however, current methods still lack scene understanding, especially when it comes to synthesizing coherent structures in complex scenes. In this work, we propose a model called CapGAN, to synthesize images from a given single text statement to resolve the problem of global coherent structures in complex scenes. For this purpose, skip-thought vectors are used to encode the given text into a vector representation. This encoded vector is used as an input for image synthesis using an adversarial process, in which two models are trained simultaneously, namely: generator (G) and discriminator (D). The model G generates fake images, while the model D tries to predict whether a sample comes from the training data or was generated by G. The conceptual novelty of this work lies in integrating capsules at the discriminator level to make the model understand the orientational and relative spatial relationships between different entities of an object in an image. The inception score (IS) along with the Fréchet inception distance (FID) are used as quantitative evaluation metrics for CapGAN. The IS recorded for images generated using CapGAN is 4.05 ± 0.050, which is around 34% higher than for images synthesized using traditional GANs, whereas the FID score calculated for synthesized images using CapGAN is 44.38, an almost 9% improvement over previous state-of-the-art models. The experimental results clearly demonstrate the effectiveness of the proposed CapGAN model, which is exceptionally proficient in generating images with complex scenes. Full article
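The adversarial process described above trains G and D against opposing objectives. A minimal sketch of the standard GAN losses follows; this illustrates the general adversarial objective only, not CapGAN's capsule-based discriminator, and the function names are hypothetical.

```python
import numpy as np

def discriminator_loss(d_real, d_fake, eps=1e-8):
    """D's objective: score real samples high and generated samples low,
    i.e. maximize log D(x) + log(1 - D(G(z))); returned as a loss to minimize."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def generator_loss(d_fake, eps=1e-8):
    """G's non-saturating objective: make D score generated samples high,
    i.e. maximize log D(G(z))."""
    return -np.mean(np.log(d_fake + eps))
```

In CapGAN the scalar scores `d_fake` and `d_real` would come from a capsule-based discriminator, whose routing is intended to preserve the orientational and spatial relationships that a plain convolutional discriminator discards.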
(This article belongs to the Special Issue Advances in Cybersecurity and Reliability)

24 pages, 51484 KB  
Article
Vector Decomposition-Based Arbitrary-Oriented Object Detection for Optical Remote Sensing Images
by Kexue Zhou, Min Zhang, Youqiang Dong, Jinlin Tan, Shaobo Zhao and Hai Wang
Remote Sens. 2023, 15(19), 4738; https://doi.org/10.3390/rs15194738 - 27 Sep 2023
Cited by 1 | Viewed by 1794
Abstract
Arbitrarily oriented object detection is one of the most-popular research fields in remote sensing image processing. In this paper, we propose an approach to predict object angles indirectly, thereby avoiding issues related to angular periodicity and boundary discontinuity. Our method involves representing the [...] Read more.
Arbitrarily oriented object detection is one of the most popular research fields in remote sensing image processing. In this paper, we propose an approach to predict object angles indirectly, thereby avoiding issues related to angular periodicity and boundary discontinuity. Our method involves representing the long edge and angle of an object as a vector, which we then decompose into horizontal and vertical components. By predicting the two components of the vector, we obtain the angle information of the object indirectly. To facilitate the transformation between the angle-based representation and the proposed vector-decomposition-based representation, we introduce two novel techniques: angle-to-vector encode (ATVEncode) and vector-to-angle decode (VTADecode). These techniques not only improve the efficiency of data processing, but also accelerate the training process. Furthermore, we propose an adaptive coarse-to-fine positive–negative-sample-selection (AdaCFPS) method based on the vector-decomposition-based representation of the object. This method uses the Kullback–Leibler divergence loss as a matching degree to dynamically select the most suitable positive samples. Finally, we modified the YOLOX model into an arbitrarily oriented object detector that aligns with our proposed vector-decomposition-based representation and positive–negative-sample-selection method. We refer to this redesigned model as the vector-decomposition-based object detector (VODet). In our experiments on the HRSC2016, DIOR-R, and DOTA datasets, VODet demonstrated notable advantages, including fewer parameters, faster processing speed, and higher precision. These results highlight the significant potential of VODet in the context of arbitrarily oriented object detection. Full article
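The encode/decode transformation described above, representing the long edge and angle as a vector and decomposing it into horizontal and vertical components, can be sketched as follows. This is a minimal illustration of the underlying trigonometry under the stated vector interpretation, not the authors' ATVEncode/VTADecode implementation; the function names are hypothetical.

```python
import math

def angle_to_vector(long_edge, angle):
    """ATVEncode-style sketch: treat the long edge and angle as a 2D vector
    and decompose it into horizontal and vertical components."""
    return long_edge * math.cos(angle), long_edge * math.sin(angle)

def vector_to_angle(vx, vy):
    """VTADecode-style sketch: recover the long-edge length and angle
    from the predicted components."""
    return math.hypot(vx, vy), math.atan2(vy, vx)
```

Because the network regresses the smooth components `(vx, vy)` rather than the angle itself, the periodic jump at the angular boundary never appears in the regression targets.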
