Information

Research

14 pages, 1113 KB

Open Access Article

Image Captioning Using Topic Faster R-CNN-LSTM Networks

by Jui-Feng Yeh, Kuei-Mei Lin and Chun-Chieh Chen

Information 2025, 16(9), 726; https://doi.org/10.3390/info16090726 - 25 Aug 2025

Viewed by 790

Abstract

Image captioning is an important cross-modal research task with numerous applications. It aims to capture the semantic content of an image and express it in a linguistically and contextually appropriate sentence. However, existing models mostly tend to focus on a topic generated by the most conspicuous foreground objects, so other topics in the image are often ignored. To address this limitation, we propose a model that can generate richer semantic content and more diverse captions. The proposed model not only captures main topics using coarse-grained objects but also finds fine-grained visual information from background or minor foreground objects. Our image captioning system combines the ResNet, LSTM, and topic feature models. The ResNet model extracts fine-grained image features and enriches the description of objects. The LSTM model provides a longer context for semantics, increasing the fluency and semantic completeness of the generated sentences. The topic model determines multiple topics based on the image and text content. The topics provide directions for the model to generate different sentences. We evaluate our model on the MSCOCO dataset. The results show that, compared with other models, our model achieves some improvement in higher-order BLEU scores and a significant improvement in CIDEr score.
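For readers unfamiliar with the metric, "higher-order" BLEU scores are built from the clipped precision of longer n-grams (typically n = 3 or 4). The sketch below is a minimal illustration of that clipped n-gram precision only; it omits the brevity penalty and the geometric mean of full BLEU, and it is not the evaluation code used in the article. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate caption against reference captions."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        # The candidate is shorter than n tokens, so it has no n-grams.
        return 0.0
    # For each n-gram, keep the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    # Clip each candidate count so repeated n-grams are not over-rewarded.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["a man rides a horse on the beach".split()]
    cand = "a man rides a horse".split()
    for n in (1, 2, 3, 4):
        print(n, modified_precision(cand, refs, n))
```

A caption that only names the right objects scores well at n = 1, whereas the precision at n = 3 or 4 rewards longer matching phrases, which is why these orders are often read as a proxy for fluency.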

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

18 pages, 2558 KB

Open Access Article

Speech Emotion Recognition and Serious Games: An Entertaining Approach for Crowdsourcing Annotated Samples

by Lazaros Matsouliadis, Eleni Siamtanidou, Nikolaos Vryzas and Charalampos Dimoulas

Information 2025, 16(3), 238; https://doi.org/10.3390/info16030238 - 18 Mar 2025

Viewed by 1082

Abstract

Computer games have emerged as valuable tools for education and training. In particular, serious games, which combine learning with entertainment, offer unique potential for engaging users and enhancing knowledge acquisition. This paper presents a case study on the design, development, and evaluation of two serious games, “Silent Kingdom” and “Job Interview Simulator”, created using Unreal Engine 5 and incorporating speech emotion recognition (SER) technology. Through a systematic analysis of the existing research in SER and game development, these games were designed to elicit a wide range of emotional responses from players and to collect voice data for the enhancement of SER models. By evaluating player engagement, emotional expression, and overall user experience, this study investigates the effectiveness of serious games in collecting speech data and creating more immersive player experiences. The research also explores the technical limitations of real-time SER integration within game environments, as well as its impact on player enjoyment. Although there are some technological limitations due to the latency of real-time SER analysis, the results reveal that a properly developed game with integrated SER technology could become a more engaging and efficient tool for crowdsourcing speech data.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

23 pages, 12090 KB

Open Access Article

Smart Car Damage Assessment Using Enhanced YOLO Algorithm and Image Processing Techniques

by Muhammad Remzy Syah Ramazhan, Alhadi Bustamam and Rinaldi Anwar Buyung

Information 2025, 16(3), 211; https://doi.org/10.3390/info16030211 - 10 Mar 2025

Viewed by 2540

Abstract

Conventional inspections in car damage assessments depend on visual judgments by human inspectors, which are labor-intensive and prone to fraudulent practices through manipulating damages. Recent advancements in artificial intelligence have given rise to a state-of-the-art object detection algorithm, the You Only Look Once algorithm (YOLO), that sets a new standard in smart and automated damage assessment. This study proposes an enhanced YOLOv9 network tailored to detect six types of car damage. The enhancements include the convolutional block attention module (CBAM), applied to the backbone layer to enhance the model’s ability to focus on key damaged regions, and the SCYLLA-IoU (SIoU) loss function, introduced for bounding box regression. To assess damage severity comprehensively, we propose a novel formula named damage severity index (DSI) for quantifying damage severity directly from images, integrating multiple factors such as the number of detected damages, the ratio of damage to the image size, object detection confidence, and the type of damage. Experimental results on the CarDD dataset show that the proposed model outperforms state-of-the-art YOLO algorithms by 1.75% and that the proposed DSI provides an intuitive numerical assessment of damage severity, aiding repair decisions.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

17 pages, 6436 KB

Open Access Article

One-Shot Learning from Prototype Stock Keeping Unit Images

by Aleksandra Kowalczyk and Grzegorz Sarwas

Information 2024, 15(9), 526; https://doi.org/10.3390/info15090526 - 28 Aug 2024

Viewed by 1775

Abstract

This paper highlights the importance of one-shot learning from prototype Stock Keeping Unit (SKU) images for efficient product recognition in retail and inventory management. Traditional methods require large supervised datasets to train deep neural networks, which can be costly and impractical. One-shot learning techniques mitigate this issue by enabling classification from a single prototype image per product class, thus reducing data annotation efforts. We introduce the Variational Prototyping Encoder (VPE), a novel deep neural network for one-shot classification. Utilizing a support set of prototype SKU images, VPE learns to classify query images by capturing image similarity and prototypical concepts. Unlike metric learning-based approaches, VPE pre-learns image translation from real-world object images to prototype images as a meta-task, facilitating efficient one-shot classification with minimal supervision. Our research demonstrates that VPE effectively reduces the need for extensive datasets by utilizing a single image per class while accurately classifying query images into their respective categories, thus providing a practical solution for product classification tasks.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

26 pages, 10462 KB

Open Access Article

The Optimal Choice of the Encoder–Decoder Model Components for Image Captioning

by Mateusz Bartosiewicz and Marcin Iwanowski

Information 2024, 15(8), 504; https://doi.org/10.3390/info15080504 - 21 Aug 2024

Cited by 2 | Viewed by 2145

Abstract

Image captioning aims at generating meaningful verbal descriptions of a digital image. This domain is rapidly growing due to the enormous increase in available computational resources. The most advanced methods are, however, resource-demanding. In our paper, we return to the encoder–decoder deep-learning model and investigate how replacing its components with newer equivalents improves overall effectiveness. The primary motivation of our study is to obtain the highest possible level of improvement of classic methods, which remain applicable in computationally constrained environments where the most advanced models are too heavy to be applied efficiently. We investigate image feature extractors, recurrent neural networks, word embedding models, and word generation layers and discuss how each component influences the captioning model’s overall performance. Our experiments are performed on the MS COCO 2014 dataset. Our results show that replacing components improves the quality of the generated image captions. The results will help design efficient models with optimal combinations of their components.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

15 pages, 4913 KB

Open Access Article

Deep Learning-Based Monocular Estimation of Distance and Height for Edge Devices

by Jan Gąsienica-Józkowy, Bogusław Cyganek, Mateusz Knapik, Szymon Głogowski and Łukasz Przebinda

Information 2024, 15(8), 474; https://doi.org/10.3390/info15080474 - 9 Aug 2024

Cited by 4 | Viewed by 4164

Abstract

Accurately estimating the absolute distance and height of objects in open areas is quite challenging, especially when based solely on single images. In this paper, we tackle these issues and propose a new method that blends traditional computer vision techniques with advanced neural network-based solutions. Our approach combines object detection and segmentation, monocular depth estimation, and homography-based mapping to provide precise and efficient measurements of absolute height and distance. This solution is implemented on an edge device, allowing for real-time data processing using both visual and thermal data sources. Experimental tests on a height estimation dataset we created show an accuracy of 98.86%, confirming the effectiveness of our method.
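As background on the homography-based mapping mentioned in the abstract, the sketch below shows the general technique: a 3x3 homography maps an image pixel to coordinates on a plane through a projective division. The abstract does not say how the authors obtain or apply their homography, so everything here is an assumption for illustration: the matrix values are made up, the plane is assumed to be the ground plane in metres with the camera at the origin, and the function names are hypothetical.

```python
import math


def apply_homography(H, pixel):
    """Map an image pixel (u, v) to plane coordinates (x, y) with a 3x3 homography H."""
    u, v = pixel
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("pixel maps to a point at infinity under this homography")
    # Divide by the homogeneous coordinate to get Cartesian plane coordinates.
    return (x / w, y / w)


def ground_distance(H, pixel, camera_xy=(0.0, 0.0)):
    """Euclidean distance on the plane between the camera position and the mapped pixel."""
    x, y = apply_homography(H, pixel)
    return math.hypot(x - camera_xy[0], y - camera_xy[1])


if __name__ == "__main__":
    # Hypothetical calibration matrix; a real one would come from known
    # pixel-to-ground point correspondences.
    H = [
        [0.02, 0.0, -6.4],
        [0.0, 0.05, -2.0],
        [0.0, 0.001, 1.0],
    ]
    foot_pixel = (400, 300)  # assumed bottom-centre of a detected object's box
    print(apply_homography(H, foot_pixel))
    print(ground_distance(H, foot_pixel))
```

In such a setup the pixel passed in is usually the point where a detected object touches the ground, since the homography is only valid for points lying on the calibrated plane.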

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

17 pages, 1746 KB

Open Access Article

Examining the Roles, Sentiments, and Discourse of European Interest Groups in the Ukrainian War through X (Twitter)

by Aritz Gorostiza-Cerviño, Álvaro Serna-Ortega, Andrea Moreno-Cabanillas, Ana Almansa-Martínez and Antonio Castillo-Esparcia

Information 2024, 15(7), 422; https://doi.org/10.3390/info15070422 - 22 Jul 2024

Viewed by 1918

Abstract

This research focuses on examining the responses of interest groups listed in the European Transparency Register to the ongoing Russia–Ukraine war. Its aim is to investigate the nuanced reactions of 2579 commercial and business associations and 2957 companies and groups to the recent conflict, as expressed through their X (Twitter) activities. Utilizing advanced text mining, NLP, and LDA techniques, this study conducts a comprehensive analysis encompassing language dynamics, thematic shifts, sentiment variations, and activity levels exhibited by these entities both before and after the outbreak of the war. The results obtained reflect a gradual decrease in negative emotions regarding the conflict over time. Likewise, multiple forms of outside lobbying are identified in the communication strategies of interest groups. All in all, this empirical inquiry into how interest groups adapt their messaging in response to complex geopolitical events holds the potential to provide invaluable insights into the multifaceted role of lobbying in shaping public policies.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)

Review

52 pages, 2296 KB

Open Access Review

Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity

by Hannah Szmurlo and Zahid Akhtar

Information 2024, 15(8), 443; https://doi.org/10.3390/info15080443 - 29 Jul 2024

Cited by 3 | Viewed by 3973

Abstract

Advancements in artificial intelligence, machine learning, and natural language processing have culminated in sophisticated technologies such as transformer models, generative AI models, and chatbots. Chatbots are sophisticated software applications created to simulate conversation with human users. Chatbots have surged in popularity owing to their versatility and user-friendly nature, which have made them indispensable across a wide range of tasks. This article explores the dual nature of chatbots in the realm of cybersecurity and highlights their roles as both defensive tools and offensive tools. On the one hand, chatbots enhance organizational cyber defenses by providing real-time threat responses and fortifying existing security measures. On the other hand, adversaries exploit chatbots to perform advanced cyberattacks, since chatbots have lowered the technical barrier to generating phishing, malware, and other cyberthreats. Despite the implementation of censorship systems, malicious actors find ways to bypass these safeguards. Thus, this paper first provides an overview of the historical development of chatbots and large language models (LLMs), including their functionality, applications, and societal effects. Next, we explore the dualistic applications of chatbots in cybersecurity by surveying the most representative works on both attacks involving chatbots and chatbots’ defensive uses. We also present experimental analyses to illustrate and evaluate different offensive applications of chatbots. Finally, open issues and challenges regarding the duality of chatbots are highlighted and potential future research directions are discussed to promote responsible usage and enhance both offensive and defensive cybersecurity strategies.

(This article belongs to the Special Issue Information Processing in Multimedia Applications)
