Multimodal Sensing Technologies for IoT and AI-Enabled Systems

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Internet of Things".

Deadline for manuscript submissions: 20 June 2025

Special Issue Editors


Dr. Emmanouil Tsardoulias
Guest Editor
School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Interests: cyber-physical systems; Internet of Things; autonomous systems; AI for robotics; autonomous cars

Prof. Dr. Charalampos Dimoulas
Guest Editor
Laboratory of Electronic Media, School of Journalism and Mass Communications, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Interests: media technologies; audiovisual capturing; audiovisual signal processing; machine learning; multimedia semantics; cross-media authentication; digital audio and audiovisual forensics

Prof. Dr. Andreas L. Symeonidis
Guest Editor
School of Electrical and Computer Engineering, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece
Interests: software engineering processes; model-driven engineering; software quality and software analytics; middleware robotics and knowledge extraction from big data repositories

Special Issue Information

Dear Colleagues,

We are delighted to announce this Special Issue, entitled "Multimodal Sensing Technologies for IoT and AI-Enabled Systems", in the renowned international journal Sensors.

Multimodal data and sensing technologies have emerged as crucial components of the Internet of Things (IoT) and artificial intelligence (AI) paradigms, influencing fields ranging from healthcare to industry, media, education, robotics, transportation, and environmental monitoring, and shaping broader multidisciplinary research and application projects. By exploiting time, location, and contextual awareness, the integration of IoT with AI has produced smart systems capable of performing complex tasks autonomously, thereby contributing to the development of intelligent societies. This Special Issue aims to bring together cutting-edge research and the latest advancements in multimodal sensing technologies, IoT, and AI-enabled systems, combining imaging applications, audiovisual reaction monitoring, and broader sensing modalities (e.g., temperature, humidity, air pollution, and interaction recording) into multimodal fusion decision systems. The Special Issue is an excellent match for the objectives of Sensors and aligns well with the journal's multidisciplinary nature.

We encourage the submission of high-quality papers demonstrating the potential of these technologies to shape our future, drive innovation, and offer solutions to real-world problems. Authors are invited to submit original research works, viewpoint articles, case studies, reviews, and theoretical and critical perspectives.

Topics of interest may include, but are not limited to, the following:

  • Design and implementation of multimodal sensors for IoT.
  • AI techniques for multimodal sensor data analysis.
  • Integration of AI and IoT for smart system development.
  • Security and privacy in AI-enabled IoT systems.
  • Real-world applications and case studies of multimodal sensing technologies in IoT and AI-enabled systems.
  • Data analytics and intelligent content management systems.
  • Multimodal sensing and fused decision-making in robotics.
  • Data journalism/visualization and media automations using multimodal sensing with AI-enabled systems.
  • Environmental data-driven monitoring automations.
  • Educational and digital literacy applications of IoT and AI-enabled systems.
  • Biomedical engineering applications of IoT and AI-enabled systems.
  • Multimodal sensing for data crowdsourcing and dataset organization.
  • Sensing technology for cyber–physical systems.
  • Sensor technology for agile data retrieval and analytics.
  • Sensing technology for AI-enabled systems.
  • Adaptive/modular sensor technology for data management.
  • Model-driven engineering approaches for multimodal sensor systems.

Dr. Emmanouil Tsardoulias
Prof. Dr. Charalampos Dimoulas
Prof. Dr. Andreas L. Symeonidis
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence
  • AI-enabled systems
  • data-driven systems
  • Internet of Things
  • machine learning
  • multimodal decision making
  • multimodal sensing
  • smart systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

10 pages, 678 KiB  
Article
PolyMeme: Fine-Grained Internet Meme Sensing
by Vasileios Arailopoulos, Christos Koutlis, Symeon Papadopoulos and Panagiotis C. Petrantonakis
Sensors 2024, 24(17), 5456; https://doi.org/10.3390/s24175456 - 23 Aug 2024
Abstract
Internet memes are a special type of digital content shared through social media, and they have recently emerged as a popular new format of media communication. They are often multimodal, combining text with images, and aim to express humor, irony, or sarcasm, or sometimes to convey hatred and misinformation. Automatically detecting memes is important, since it enables the tracking of social and cultural trends, as well as of issues related to the spread of harmful content. While memes can take various forms and belong to different categories, such as image macros, memes with labeled objects, screenshots, memes with text outside the image, and funny images, existing datasets do not account for this diversity of meme formats, styles, and content. To bridge this gap, we present the PolyMeme dataset, which comprises approximately 27 K memes from four categories. The dataset was collected from Reddit, and a part of it was manually labelled into these categories. Using the manual labels, deep learning networks were trained to classify the unlabelled images with an estimated error rate of 7.35%. The introduced meme dataset, in combination with existing datasets of regular images, was used to train deep learning networks (ResNet, ViT) on meme detection, exhibiting very high accuracy levels (98% on the test set). In addition, no significant gains were identified from the use of regular images containing text.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
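As a rough illustration of the detection setup the abstract describes, the sketch below fine-tunes a pre-trained ResNet on a two-class meme-versus-regular-image task using PyTorch/torchvision. The directory layout, class split, and all hyperparameters are assumptions made for illustration, not the authors' configuration.

```python
# Hypothetical sketch: fine-tuning a ResNet for binary meme detection.
# Dataset paths and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: data/train/{meme,regular}/*.jpg -> two classes.
train_set = datasets.ImageFolder("data/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # meme vs. regular image

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(3):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```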

20 pages, 3663 KiB  
Article
A Multilayer Architecture towards the Development and Distribution of Multimodal Interface Applications on the Edge
by Nikolaos Malamas, Konstantinos Panayiotou, Apostolia Karabatea, Emmanouil Tsardoulias and Andreas L. Symeonidis
Sensors 2024, 24(16), 5199; https://doi.org/10.3390/s24165199 - 11 Aug 2024
Abstract
Today, Smart Assistants (SAs) are supported by significantly improved Natural Language Processing (NLP) and Natural Language Understanding (NLU) engines, as well as AI-enabled decision support, enabling efficient information communication, easy appliance/device control, and seamless access to entertainment services, among others. In fact, an increasing number of modern households are being equipped with SAs, which promise to enhance user experience in the context of smart environments through verbal interaction. Currently, the SA market is dominated by products manufactured by technology giants that provide well-designed off-the-shelf solutions. However, their simple setup and ease of use come with trade-offs, as these SAs abide by proprietary and/or closed-source architectures and offer limited functionality. Their enforced vendor lock-in does not provide (power) users with the ability to build custom conversational applications through their SAs. On the other hand, employing an open-source approach for building and deploying an SA (which comes with significant overhead) necessitates expertise in multiple domains and fluency in the multimodal technologies used to build the envisioned applications. In this context, this paper proposes a methodology for developing and deploying conversational applications on the edge on top of an open-source software and hardware infrastructure, via a multilayer architecture that abstracts away low-level complexity and reduces the learning overhead. The proposed approach facilitates the rapid development of applications by third-party developers, thereby enabling the establishment of a marketplace of customized applications aimed at the smart assisted living domain, among others. The framework supports application developers, device owners, and ecosystem administrators in building, testing, uploading, and deploying applications, remotely controlling devices, and monitoring device performance. A demonstration of this methodology is presented and discussed, focusing on health and assisted living applications for the elderly.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
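To make the layering idea concrete, here is a minimal, purely hypothetical sketch of how a multilayer SA framework might hide device-level details behind a skill API, so that third-party developers only write intent handlers. All class and method names are invented for illustration; this is not the paper's actual framework.

```python
# Hypothetical sketch of the layering idea: low-level device access is
# wrapped behind a simple skill API, so application developers never
# touch the hardware or transport layer directly.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Device:
    """Edge-device wrapper hiding transport/protocol details."""
    name: str

    def say(self, text: str) -> None:
        # Stand-in for a real text-to-speech call on the device.
        print(f"[{self.name}] TTS: {text}")


class SkillRuntime:
    """Application layer: developers register intent handlers only."""
    def __init__(self, device: Device):
        self.device = device
        self.handlers: Dict[str, Callable[[dict], str]] = {}

    def skill(self, intent: str):
        def register(fn: Callable[[dict], str]):
            self.handlers[intent] = fn
            return fn
        return register

    def handle(self, intent: str, slots: dict) -> None:
        reply = self.handlers[intent](slots)
        self.device.say(reply)


runtime = SkillRuntime(Device("kitchen-assistant"))

@runtime.skill("medication_reminder")
def remind(slots: dict) -> str:
    return f"Time to take your {slots['medicine']}."

runtime.handle("medication_reminder", {"medicine": "blood pressure pill"})
```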

14 pages, 1252 KiB  
Article
Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization
by Paris Xylogiannis, Nikolaos Vryzas, Lazaros Vrysis and Charalampos Dimoulas
Sensors 2024, 24(13), 4229; https://doi.org/10.3390/s24134229 - 29 Jun 2024
Abstract
Speaker diarization consists of answering the question of “who spoke when” in audio recordings. In meeting scenarios, the task of labeling audio with the corresponding speaker identities can be further assisted by exploiting spatial features. This work proposes a framework designed to assess the effectiveness of combining speaker embeddings with Time Difference of Arrival (TDOA) values from the microphone sensor arrays available in meetings. We extract speaker embeddings using two popular and robust pre-trained models, ECAPA-TDNN and X-vectors, and calculate the TDOA values via the Generalized Cross-Correlation (GCC) method with Phase Transform (PHAT) weighting. Although ECAPA-TDNN outperforms the X-vectors model, we utilize both speaker embedding models to explore the potential of employing a computationally lighter model when spatial information is exploited. Various techniques for combining the spatial–temporal information are examined in order to determine the best clustering method. The proposed framework is evaluated on two multichannel datasets: the AVLab Speaker Localization dataset and a multichannel dataset (SpeaD-M3C) enriched in the context of the present work with supplementary information from smartphone recordings. Our results strongly indicate that integrating spatial information can significantly improve the performance of state-of-the-art deep learning diarization models, yielding a 2–3% reduction in diarization error rate (DER) compared to the baseline approach on the evaluated datasets.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
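For readers unfamiliar with the spatial feature used above, the following is a textbook NumPy implementation of GCC-PHAT TDOA estimation between two microphone signals; it illustrates the general technique, not the authors' code.

```python
# Textbook GCC-PHAT: estimate the time delay of `sig` relative to `ref`.
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Return the estimated delay (seconds) of sig with respect to ref."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15          # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Re-center so index max_shift corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs


fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(fs)                       # broadband test signal
sig = np.concatenate([np.zeros(40), ref[:-40]])     # delay by 40 samples
print(gcc_phat(sig, ref, fs))                       # ~0.0025 s
```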

27 pages, 23020 KiB  
Article
Seamless Fusion: Multi-Modal Localization for First Responders in Challenging Environments
by Dennis Dahlke, Petros Drakoulis, Anaida Fernández García, Susanna Kaiser, Sotiris Karavarsamis, Michail Mallis, William Oliff, Georgia Sakellari, Alberto Belmonte-Hernández, Federico Alvarez and Dimitrios Zarpalas
Sensors 2024, 24(9), 2864; https://doi.org/10.3390/s24092864 - 30 Apr 2024
Abstract
In dynamic and unpredictable environments, the precise localization of first responders and rescuers is crucial for effective incident response. This paper introduces a novel approach leveraging three complementary localization modalities: visual-based, Galileo-based, and inertial-based. Each modality contributes uniquely to the final Fusion tool, facilitating seamless indoor and outdoor localization and offering a robust, accurate localization solution without reliance on pre-existing infrastructure, which is essential for maintaining responder safety and optimizing operational effectiveness. The visual-based localization method utilizes an RGB camera coupled with a modified implementation of the ORB-SLAM2 method, enabling operation with or without prior area scanning. The Galileo-based localization method employs a lightweight prototype equipped with a high-accuracy GNSS receiver board, tailored to meet the specific needs of first responders. The inertial-based localization method utilizes sensor fusion, primarily leveraging smartphone inertial measurement units, to predict and adjust first responders’ positions incrementally, compensating for GPS signal attenuation indoors. A comprehensive validation test involving various environmental conditions was carried out to demonstrate the efficacy of the proposed fused localization tool. Our results show that the proposed solution always provides a location regardless of the conditions (indoors, outdoors, etc.), with an overall mean error of 1.73 m.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
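A toy sketch of the fusion principle: each modality reports a position estimate with a confidence, and a fused location remains available as long as any single modality survives. The weighting scheme and numbers below are invented for illustration; the paper's actual Fusion tool is more sophisticated.

```python
# Toy confidence-weighted fusion of per-modality position estimates.
from typing import NamedTuple, Optional


class Estimate(NamedTuple):
    x: float
    y: float
    confidence: float  # 0 = unusable, 1 = fully trusted


def fuse(visual: Optional[Estimate],
         gnss: Optional[Estimate],
         inertial: Optional[Estimate]) -> Optional[Estimate]:
    """Confidence-weighted average over whichever modalities are live."""
    live = [e for e in (visual, gnss, inertial) if e and e.confidence > 0]
    if not live:
        return None
    w = sum(e.confidence for e in live)
    return Estimate(
        x=sum(e.x * e.confidence for e in live) / w,
        y=sum(e.y * e.confidence for e in live) / w,
        confidence=max(e.confidence for e in live),
    )


# Indoors: GNSS degraded, SLAM and inertial carry the estimate.
print(fuse(Estimate(10.2, 4.1, 0.9),
           Estimate(14.0, 6.0, 0.1),
           Estimate(10.8, 4.6, 0.6)))
```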

18 pages, 3646 KiB  
Article
Multimodal Environmental Sensing Using AI & IoT Solutions: A Cognitive Sound Analysis Perspective
by Alexandros Emvoliadis, Nikolaos Vryzas, Marina-Eirini Stamatiadou, Lazaros Vrysis and Charalampos Dimoulas
Sensors 2024, 24(9), 2755; https://doi.org/10.3390/s24092755 - 26 Apr 2024
Abstract
This study presents a novel audio compression technique tailored for environmental monitoring within multi-modal data processing pipelines. Considering the crucial role that audio data play in environmental evaluations, particularly in contexts with extreme resource limitations, our strategy substantially decreases bit rates to facilitate efficient data transfer and storage. This is accomplished without undermining the accuracy necessary for trustworthy air pollution analysis, while simultaneously minimizing processing expenses. More specifically, our approach fuses a deep-learning-based model, optimized for edge devices, with a conventional audio coding scheme. Once transmitted to the cloud, the compressed data undergo a decoding process, leveraging vast cloud computing resources for accurate reconstruction and classification. The experimental results indicate that our approach leads to a relatively minor decrease in accuracy, even at notably low bit rates, and demonstrates strong robustness in identifying data from labels not included in our training dataset.
(This article belongs to the Special Issue Multimodal Sensing Technologies for IoT and AI-Enabled Systems)
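As a simplified stand-in for the edge/cloud pipeline described above, the sketch below compresses an audio frame with a tiny (untrained) 1-D convolutional autoencoder, quantizes the latent to int8 for transmission, and reconstructs it on the "cloud" side. The architecture and sizes are illustrative assumptions, not the authors' model, which additionally fuses a conventional coding scheme.

```python
# Illustrative edge/cloud audio-compression data flow (untrained model;
# shown only for the pipeline shape, not for reconstruction quality).
import torch
import torch.nn as nn

FRAME = 16000  # one second at 16 kHz

encoder = nn.Sequential(            # edge side: runs on the device
    nn.Conv1d(1, 16, 64, stride=16), nn.ReLU(),
    nn.Conv1d(16, 4, 32, stride=8),
)
decoder = nn.Sequential(            # cloud side: reconstructs the frame
    nn.ConvTranspose1d(4, 16, 32, stride=8), nn.ReLU(),
    nn.ConvTranspose1d(16, 1, 64, stride=16),
)

audio = torch.randn(1, 1, FRAME)    # placeholder for a captured frame
with torch.no_grad():
    latent = encoder(audio)
    # 8-bit quantization before transmission (the "compression" step).
    scale = latent.abs().max() / 127
    payload = (latent / scale).round().to(torch.int8)
    restored = decoder(payload.float() * scale)

ratio = audio.numel() * 4 / (payload.numel() + 4)  # float32 in, int8 out
print(f"compression ratio ~ {ratio:.1f}x, reconstructed {restored.shape}")
```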
