AI Synergy: Vision, Language, and Modality

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: 15 February 2025 | Viewed by 4964

Special Issue Editors


Dr. Taehyeon Kim
Guest Editor
Contents Convergence Research Center, Korea Electronics Technology Institute, Seoul 03924, Republic of Korea
Interests: artificial intelligence; machine learning; signal processing

Dr. KyungTaek Lee
Guest Editor
Contents Convergence Research Center, Korea Electronics Technology Institute, Seoul 03924, Republic of Korea
Interests: XR (eXtended Reality); artificial intelligence

Special Issue Information

Dear Colleagues,

This Special Issue, titled "AI Synergy: Vision, Language, and Modality", aims to explore the evolving intersections of computer vision, large language models, and multimodal data processing. It invites pioneering research that demonstrates novel integrations and applications across these dynamic fields. As artificial intelligence continues to break down barriers between data modalities, the synergy among visual, textual, and other data forms has unlocked remarkable advances and practical applications. We encourage submissions that push the boundaries of how these technologies can work in tandem to enhance understanding, generate new insights, and solve complex problems.

Contributions may include, but are not limited to, innovative algorithms, system designs, and application-oriented studies that leverage the strengths of computer vision, language processing, and multimodal interaction. We are particularly interested in research that addresses challenges such as data fusion, model interpretability, and scalability in real-world settings. This Special Issue aims to serve as a platform for researchers to share breakthroughs that drive forward the capabilities of AI systems, making them more intuitive, efficient, and accessible. Submissions that provide comprehensive experimental results, in-depth analyses, and discussions on the implications of integrating these technologies are highly encouraged.

Dr. Taehyeon Kim
Dr. KyungTaek Lee
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computer vision
  • large language model
  • multimodal data processing
  • AI integration
  • data fusion
  • neural networks
  • AI scalability
  • interpretability in AI
  • real-world applications
  • cross-modal analytics

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (6 papers)


Research

15 pages, 460 KiB  
Article
Unified Domain Adaptation for Specialized Indoor Scene Inpainting Using a Pre-Trained Model
by Asrafi Akter and Myungho Lee
Electronics 2024, 13(24), 4970; https://doi.org/10.3390/electronics13244970 - 17 Dec 2024
Viewed by 300
Abstract
Image inpainting for indoor environments presents unique challenges due to complex spatial relationships, diverse lighting conditions, and domain-specific object configurations. This paper introduces a resource-efficient post-processing framework that enhances domain-specific image inpainting through an adaptation mechanism. Our architecture integrates a convolutional neural network with residual connections optimized via a multi-term objective function combining perceptual losses and adaptive loss weighting. Experiments on our curated dataset of 4000 indoor household scenes demonstrate improved performance, with training completed in 20 min on commodity GPU hardware with 0.14 s of inference latency per image. The framework exhibits enhanced results across standard metrics (FID, SSIM, LPIPS, MAE, and PSNR), showing improvements in structural coherence and perceptual quality while preserving cross-domain generalization abilities. Our methodology offers a novel approach for efficient domain adaptation in image inpainting, particularly suitable for real-world applications under computational constraints. This work advances the development of domain-aware image restoration systems and provides architectural insights for specialized image processing frameworks.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
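
To make the post-processing idea in the abstract above concrete, the sketch below pairs a small residual refinement CNN with an uncertainty-weighted multi-term loss. It is a minimal illustration under assumed shapes and loss terms (a pixel term plus a stand-in for a perceptual term), not the authors' actual architecture or objective; names such as ResidualRefiner and AdaptiveMultiTermLoss are hypothetical.

    # Hypothetical sketch: residual post-processing refiner + adaptively weighted
    # multi-term loss. Loss terms and module names are assumptions, not the paper's design.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ResidualRefiner(nn.Module):
        """Small CNN that predicts a correction added to the coarse inpainting result."""
        def __init__(self, ch=64):
            super().__init__()
            self.head = nn.Conv2d(3, ch, 3, padding=1)
            self.body = nn.ModuleList([
                nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
                              nn.Conv2d(ch, ch, 3, padding=1))
                for _ in range(4)
            ])
            self.tail = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, x):
            f = self.head(x)
            for block in self.body:
                f = f + block(f)           # residual connections inside the body
            return x + self.tail(f)        # global residual: refine, don't repaint

    class AdaptiveMultiTermLoss(nn.Module):
        """Uncertainty-style weighting: one learned scale per loss term."""
        def __init__(self, n_terms=2):
            super().__init__()
            self.log_vars = nn.Parameter(torch.zeros(n_terms))

        def forward(self, terms):
            total = 0.0
            for i, t in enumerate(terms):
                total = total + torch.exp(-self.log_vars[i]) * t + self.log_vars[i]
            return total

    refiner, criterion = ResidualRefiner(), AdaptiveMultiTermLoss(n_terms=2)
    coarse = torch.rand(2, 3, 256, 256)    # output of a pre-trained inpainting model
    target = torch.rand(2, 3, 256, 256)
    pred = refiner(coarse)
    pixel_loss = F.l1_loss(pred, target)
    percept_loss = F.l1_loss(pred.mean(1), target.mean(1))  # placeholder for a perceptual term
    loss = criterion([pixel_loss, percept_loss])
    loss.backward()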

10 pages, 4572 KiB  
Article
Multimodal Food Image Classification with Large Language Models
by Jun-Hwa Kim, Nam-Ho Kim, Donghyeok Jo and Chee Sun Won
Electronics 2024, 13(22), 4552; https://doi.org/10.3390/electronics13224552 - 20 Nov 2024
Viewed by 625
Abstract
In this study, we leverage advancements in large language models (LLMs) for fine-grained food image classification. We achieve this by integrating textual features extracted from images using an LLM into a multimodal learning framework. Specifically, semantic textual descriptions generated by the LLM are encoded and combined with image features obtained from a transformer-based architecture to improve food image classification. Our approach employs a cross-attention mechanism to effectively fuse visual and textual modalities, enhancing the model’s ability to extract discriminative features beyond what can be achieved with visual features alone.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
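
For readers unfamiliar with cross-attention fusion of modalities, a minimal PyTorch sketch follows: image patch tokens attend to encoded LLM-generated descriptions (query = image, key/value = text), and the pooled fused tokens feed a classifier. The dimensions, number of classes, and the module name CrossModalFusion are illustrative assumptions rather than the paper's exact design.

    # Minimal sketch of cross-attention fusion between image tokens and text embeddings.
    import torch
    import torch.nn as nn

    class CrossModalFusion(nn.Module):
        def __init__(self, dim=256, heads=4, num_classes=101):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            self.norm = nn.LayerNorm(dim)
            self.cls = nn.Linear(dim, num_classes)

        def forward(self, img_tokens, txt_tokens):
            # image tokens attend to the textual description (query=image, key/value=text)
            fused, _ = self.attn(query=img_tokens, key=txt_tokens, value=txt_tokens)
            fused = self.norm(img_tokens + fused)      # residual connection + norm
            return self.cls(fused.mean(dim=1))         # pool tokens, then classify

    img_tokens = torch.rand(8, 196, 256)   # e.g. ViT patch features (assumed shape)
    txt_tokens = torch.rand(8, 32, 256)    # e.g. projected LLM description embeddings
    logits = CrossModalFusion()(img_tokens, txt_tokens)
    print(logits.shape)                    # torch.Size([8, 101])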

26 pages, 10600 KiB  
Article
Deep Learning-Based Stopped Vehicle Detection Method Utilizing In-Vehicle Dashcams
by Jinuk Park, Jaeyong Lee, Yongju Park and Yongseok Lim
Electronics 2024, 13(20), 4097; https://doi.org/10.3390/electronics13204097 - 17 Oct 2024
Viewed by 1001
Abstract
In complex urban road conditions, stationary or illegally parked vehicles present a considerable risk to the overall traffic system. In safety-critical applications like autonomous driving, the detection of stopped vehicles is of utmost importance. Previous methods for detecting stopped vehicles have been designed for stationary viewpoints, such as security cameras, which consistently monitor fixed locations. However, these methods for detecting stopped vehicles based on stationary views cannot address blind spots and are not applicable from driving vehicles. To address these limitations, we propose a novel deep learning-based framework for detecting stopped vehicles in dynamic environments, particularly those recorded by dashcams. The proposed framework integrates a deep learning-based object detector and tracker, along with movement estimation using the dense optical flow method. We also introduced additional centerline detection and inter-vehicle distance measurement. The experimental results demonstrate that the proposed framework can effectively identify stopped vehicles under real-world road conditions.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
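
The sketch below illustrates only the movement-estimation step mentioned in the abstract above: dense Farnebäck optical flow is computed between consecutive dashcam frames and averaged inside a tracked bounding box to flag a candidate stopped vehicle. The detector, tracker, centerline detection, and inter-vehicle distance measurement are omitted, and the file name, box, and threshold are placeholders.

    # Sketch of dense-optical-flow movement estimation inside a tracked box (OpenCV).
    import cv2
    import numpy as np

    def box_motion(prev_gray, cur_gray, box):
        """Mean optical-flow magnitude inside an (x1, y1, x2, y2) box."""
        flow = cv2.calcOpticalFlowFarneback(prev_gray, cur_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        x1, y1, x2, y2 = box
        mag = np.linalg.norm(flow[y1:y2, x1:x2], axis=2)
        return float(mag.mean())

    cap = cv2.VideoCapture("dashcam.mp4")          # assumed input clip
    ok, prev = cap.read()
    if not ok:
        raise SystemExit("could not read video")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    box = (100, 200, 300, 400)                     # placeholder tracked vehicle box
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Ego-motion makes raw flow non-zero even for parked cars; the paper's framework
        # handles this with tracking and additional cues, here a fixed threshold suffices.
        if box_motion(prev_gray, gray, box) < 0.5:
            print("candidate stopped vehicle")
        prev_gray = gray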

15 pages, 1785 KiB  
Article
FusionNetV2: Explicit Enhancement of Edge Features for 6D Object Pose Estimation
by Yuning Ye and Hanhoon Park
Electronics 2024, 13(18), 3736; https://doi.org/10.3390/electronics13183736 - 20 Sep 2024
Viewed by 744
Abstract
FusionNet is a hybrid model that incorporates convolutional neural networks and Transformers, achieving state-of-the-art performance in 6D object pose estimation while significantly reducing the number of model parameters. Our study reveals that FusionNet has local and global attention mechanisms for enhancing deep features in two paths and the attention mechanisms play a role in implicitly enhancing features around object edges. We found that enhancing the features around object edges was the main reason for the performance improvement in 6D object pose estimation. Therefore, in this study, we attempt to enhance the features around object edges explicitly and intuitively. To this end, an edge boosting block (EBB) is introduced that replaces the attention blocks responsible for local attention in FusionNet. EBB is lightweight and can be directly applied to FusionNet with minimal modifications. EBB significantly improved the performance of FusionNet in 6D object pose estimation in experiments on the LINEMOD dataset.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
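
As a hedged guess at what explicit edge enhancement could look like, the block below modulates a feature map with a Sobel edge magnitude computed from the channel-averaged features, amplifying responses near edges. It illustrates the general idea only and is not the authors' EBB; the class name and gating scheme are assumptions.

    # Illustrative edge-boosting-style block: Sobel edge magnitude gates the features.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class EdgeBoostingBlock(nn.Module):
        def __init__(self, channels):
            super().__init__()
            kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
            # two 3x3 kernels (x and y gradients), shape (2, 1, 3, 3)
            self.register_buffer("sobel", torch.stack([kx, kx.t()]).unsqueeze(1))
            self.gate = nn.Sequential(nn.Conv2d(1, channels, 1), nn.Sigmoid())

        def forward(self, feat):
            intensity = feat.mean(dim=1, keepdim=True)            # rough "image" of the features
            grads = F.conv2d(intensity, self.sobel, padding=1)    # x/y gradients
            edge = grads.pow(2).sum(dim=1, keepdim=True).sqrt()   # edge magnitude map
            return feat * (1.0 + self.gate(edge))                 # boost features near edges

    feat = torch.rand(2, 64, 32, 32)
    out = EdgeBoostingBlock(64)(feat)
    print(out.shape)   # torch.Size([2, 64, 32, 32])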

18 pages, 10444 KiB  
Article
Ancient Painting Inpainting Based on Multi-Layer Feature Enhancement and Frequency Perception
by Xiaotong Liu, Jin Wan, Nan Wang and Yuting Wang
Electronics 2024, 13(16), 3309; https://doi.org/10.3390/electronics13163309 - 21 Aug 2024
Viewed by 581
Abstract
Image inpainting aims to restore the damaged information in images, enhancing their readability and usability. Ancient paintings, as a vital component of traditional art, convey profound cultural and artistic value, yet often suffer from various forms of damage over time. Existing ancient painting inpainting methods are insufficient in extracting deep semantic information, resulting in the loss of high-frequency detail features of the reconstructed image and inconsistency between global and local semantic information. To address these issues, this paper proposes a Generative Adversarial Network (GAN)-based ancient painting inpainting method using multi-layer feature enhancement and frequency perception, named MFGAN. Firstly, we design a Residual Pyramid Encoder (RPE), which fully extracts the deep semantic features of ancient painting images and strengthens the processing of image details by effectively combining the deep feature extraction module and channel attention. Secondly, we propose a Frequency-Aware Mechanism (FAM) to obtain the high-frequency perceptual features by using the frequency attention module, which captures the high-frequency details and texture features of the ancient paintings by increasing the skip connections between the low-frequency and the high-frequency features, and provides more frequency perception information. Thirdly, a Dual Discriminator (DD) is designed to ensure the consistency of semantic information between global and local region images, while reducing the discontinuity and blurring differences at the boundary during image inpainting. Finally, extensive experiments on the proposed ancient painting and Huaniao datasets show that our proposed method outperforms competitive image inpainting methods and exhibits robust generalization capabilities.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
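
To illustrate the dual-discriminator idea from the abstract above in isolation, the sketch below trains one discriminator on full images and another on a local crop around the damaged region, with hinge losses. The architectures, crop selection, and loss form are placeholders rather than MFGAN's actual design.

    # Rough sketch of a global + local (dual) discriminator setup for inpainting GANs.
    import torch
    import torch.nn as nn

    def patch_disc():
        # tiny PatchGAN-style discriminator (placeholder architecture)
        return nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),
        )

    def crop(img, box):
        x1, y1, x2, y2 = box
        return img[:, :, y1:y2, x1:x2]

    d_global, d_local = patch_disc(), patch_disc()
    real = torch.rand(2, 3, 256, 256)
    fake = torch.rand(2, 3, 256, 256)          # generator output (placeholder)
    box = (64, 64, 192, 192)                   # local region around the damage mask

    # Hinge losses: the global branch judges whole-image consistency, while the local
    # branch penalises seams and blur at the inpainting boundary.
    d_loss = (torch.relu(1 - d_global(real)).mean()
              + torch.relu(1 + d_global(fake.detach())).mean()
              + torch.relu(1 - d_local(crop(real, box))).mean()
              + torch.relu(1 + d_local(crop(fake.detach(), box))).mean())
    g_loss = -(d_global(fake).mean() + d_local(crop(fake, box)).mean())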

12 pages, 1186 KiB  
Article
BSRT++: Improving BSRT with Feature Enhancement, Weighted Fusion, and Cyclic Sampling
by Suji Son and Hanhoon Park
Electronics 2024, 13(16), 3178; https://doi.org/10.3390/electronics13163178 - 11 Aug 2024
Viewed by 1135
Abstract
Multi-frame super-resolution (MFSR) generates a super-resolution (SR) image from a burst consisting of multiple low-resolution images. Burst Super-Resolution Transformer (BSRT) is a state-of-the-art deep learning model for MFSR. However, in this study, we show that there is room for further improvement of BSRT in the feature extraction and fusion process. Then, we propose a feature enhancement module (FEM), a cyclic sampling module (CSM), and a feature reweighting module (FRM) and integrate them into BSRT. Finally, we demonstrate that the modules can help recover the high-frequency information well, enhance inter-frame communication, and suppress misaligned features, thus significantly improving the SR performance and producing more visually plausible and pleasant results compared to other MFSR methods, including BSRT. On the SyntheticBurst and RealBurst datasets, the improved BSRT with the modules, dubbed BSRT++, achieved higher PSNR values of 1.15 dB and 1.31 dB than BSRT, respectively.
(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)
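
A minimal sketch of weighted burst-feature fusion follows: a small convolutional head scores each frame's aligned features, and a softmax-normalized, per-pixel weighted sum down-weights misaligned frames. The module name FeatureReweighting and the tensor shapes are illustrative assumptions, not the exact BSRT++ modules.

    # Illustrative per-pixel reweighting and fusion of aligned burst features.
    import torch
    import torch.nn as nn

    class FeatureReweighting(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            self.score = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, 1, 3, padding=1),
            )

        def forward(self, burst_feats):                 # (B, T, C, H, W) aligned features
            b, t, c, h, w = burst_feats.shape
            scores = self.score(burst_feats.flatten(0, 1)).view(b, t, 1, h, w)
            weights = torch.softmax(scores, dim=1)      # per-pixel weight across the T frames
            return (weights * burst_feats).sum(dim=1)   # (B, C, H, W) fused feature

    fused = FeatureReweighting()(torch.rand(2, 8, 64, 48, 48))
    print(fused.shape)   # torch.Size([2, 64, 48, 48])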
