Multimedia Content Analysis, Management and Retrieval: Trends and Challenges

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Electronic Multimedia".

Deadline for manuscript submissions: closed (15 October 2022) | Viewed by 21739

Special Issue Editors

Dr. Zheng Wang
School of Computer Science, Wuhan University, Wuhan 430072, China
Interests: multimedia content analysis; image retrieval; artificial intelligence
Dr. Jian Zhao
Institute of North Electronic Equipment, Beijing 100191, China
Interests: artificial intelligence; pattern recognition; machine learning; computer vision; multimedia analytics
Dr. Hong Liu
National Institute of Informatics, Tokyo 101-8430, Japan
Interests: machine learning; computer vision; ML safety/reliability
Dr. Zhun Zhong
Multimedia and Human Understanding Group, University of Trento, 38123 Povo-Trento, Italy
Interests: person re-identification; novel class discovery; domain adaptation

Special Issue Information

Dear Colleagues,

In recent years, we have witnessed rapid advances in computing, communication, and storage technologies. Multimedia technology has gained enormous potential to improve processes in a wide range of areas, such as advertising, education, entertainment, healthcare, surveillance, wearable computing, biometrics, and remote sensing. Huge quantities of multimedia data require new and innovative approaches to modelling, processing, mining, organizing, and indexing these data so that multimedia content can be searched, retrieved, delivered, managed, and shared effectively and efficiently, as required by applications in the aforementioned fields. The main objective of this Special Issue is to bring together researchers and professionals from academia and industry around the world to discuss the wide spectrum of technological opportunities, challenges, solutions, and emerging applications in multimedia content analysis, management, and retrieval. We particularly encourage original work based on interdisciplinary research, such as research spanning computer science and social science. Topics of interest include, but are not limited to, the following:

  • Multimedia annotation, search, and retrieval;
  • Multimedia signal processing and analysis;
  • Multimedia content analysis and event detection;
  • Content-based analysis for multimedia data;
  • Image and video indexing and classification;
  • Multimodal processing and analysis;
  • Multimedia applications in education, medicine, surveillance, and remote sensing;
  • Human-computer interaction.

Dr. Zheng Wang
Dr. Jian Zhao
Dr. Hong Liu
Dr. Zhun Zhong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Multimedia analysis
  • Multimedia management
  • Multimedia retrieval
  • Signal processing
  • Image and video understanding
  • Human-computer interaction
  • Multimodal processing
  • Multimedia applications

Published Papers (10 papers)


Research


14 pages, 8366 KiB  
Article
Transformer-Based Multimodal Infusion Dialogue Systems
by Bo Liu, Lejian He, Yafei Liu, Tianyao Yu, Yuejia Xiang, Li Zhu and Weijian Ruan
Electronics 2022, 11(20), 3409; https://doi.org/10.3390/electronics11203409 - 20 Oct 2022
Cited by 1 | Viewed by 1951
Abstract
Multimodal dialogue systems have been gaining importance in several domains, such as retail, travel, and fashion. Several existing works have improved the understanding and generation of multimodal dialogues. However, there is still considerable room to improve the quality of output textual responses due to insufficient information infusion between the visual and textual semantics. Moreover, existing dialogue systems often generate defective knowledge-aware responses for tasks such as providing product attributes and celebrity endorsements. To address these issues, we present a Transformer-based Multimodal Infusion Dialogue (TMID) system that extracts the visual and textual information from dialogues via a transformer-based multimodal context encoder and employs a cross-attention mechanism to achieve information infusion between images and texts for each utterance. Furthermore, TMID uses adaptive decoders to generate appropriate multimodal responses based on the user intentions it has determined using a state classifier, and it enriches the output responses by incorporating domain knowledge into the decoders. The results of extensive experiments on a multimodal dialogue dataset demonstrate that TMID achieves state-of-the-art performance, improving the BLEU-4 score by 13.03, NIST by 2.77, and image selection Recall@1 by 1.84%.
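As a rough illustration of the cross-attention infusion step described above, the sketch below lets text-token features attend to image-region features with a standard attention layer; the class name, dimensions, and data are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn

class CrossModalInfusion(nn.Module):
    """Minimal sketch: text tokens attend to image region features."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_feats, image_feats):
        # text_feats: (batch, n_tokens, dim); image_feats: (batch, n_regions, dim)
        fused, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(text_feats + fused)  # residual keeps the textual semantics

fusion = CrossModalInfusion()
text = torch.randn(2, 20, 512)    # toy per-utterance textual features
image = torch.randn(2, 49, 512)   # toy visual region features
print(fusion(text, image).shape)  # torch.Size([2, 20, 512])
```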

25 pages, 3087 KiB  
Article
Integrated Framework to Assess the Extent of the Pandemic Impact on the Size and Structure of the E-Commerce Retail Sales Sector and Forecast Retail Trade E-Commerce
by Cristiana Tudor
Electronics 2022, 11(19), 3194; https://doi.org/10.3390/electronics11193194 - 5 Oct 2022
Cited by 7 | Viewed by 3476
Abstract
With customers’ increasing reliance on e-commerce and multimedia content after the outbreak of COVID-19, it has become crucial for companies to digitize their business methods and models. Consequently, COVID-19 has highlighted the prominence of e-commerce and new business models while disrupting conventional business activities. Hence, assessing and forecasting e-commerce growth is currently paramount for e-market planners, market players, and policymakers alike. This study sources data for the global e-commerce market leader, the US, and proposes an integrated framework that encompasses automated algorithms able to estimate six statistical and machine-learning univariate methods in order to accomplish two main tasks: (i) to produce accurate forecasts for e-commerce retail sales (e-sales) and the share of e-commerce in total retail sales (e-share); and (ii) to assess in quantitative terms the pandemic impact on the size and structure of the e-commerce retail sales sector. The results confirm that COVID-19 has significantly impacted the trend and structure of the US retail sales sector, producing cumulative excess (or abnormal) retail e-sales of $227.820 billion and a cumulative additional e-share of 10.61 percent. Additionally, estimates indicate a continuation of the increasing trend, with point estimates of $378.691 billion for US e-commerce retail sales, projected to account for 16.72 percent of total US retail sales by the end of 2025. Nonetheless, the current findings also document that the growth of e-commerce is not a consequence of the COVID-19 crisis; rather, the pandemic has accelerated the evolution of the e-commerce sector by at least five years. Overall, the study concludes that the shift towards e-commerce is permanent and, thus, governments (especially in developing countries) should prioritize policies aimed at harnessing e-commerce for sustainable development. Furthermore, in light of the research findings, digital transformation should constitute a top management priority for retail businesses.
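The "excess sales" arithmetic can be illustrated with a toy counterfactual: fit a trend on pre-pandemic quarters only, extend it through the pandemic period, and sum the gap between actual and counterfactual values. The linear trend and all numbers below are illustrative stand-ins for the six statistical and machine-learning methods the study actually estimates.

```python
import numpy as np

# Quarterly e-commerce retail sales in $bn (toy numbers); first 8 quarters pre-pandemic
actual = np.array([130.0, 134, 139, 145, 150, 156, 161, 168,   # pre-pandemic
                   205, 212, 220, 215])                        # pandemic quarters
t = np.arange(len(actual))
pre = slice(0, 8)

# Counterfactual: linear trend fitted on pre-pandemic data only
coef = np.polyfit(t[pre], actual[pre], deg=1)
counterfactual = np.polyval(coef, t)

excess = (actual - counterfactual)[8:]   # abnormal e-sales per pandemic quarter
print(f"cumulative excess e-sales: ${excess.sum():.1f}bn")
```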

11 pages, 1330 KiB  
Article
SCA-MMA: Spatial and Channel-Aware Multi-Modal Adaptation for Robust RGB-T Object Tracking
by Run Shi, Chaoqun Wang, Gang Zhao and Chunyan Xu
Electronics 2022, 11(12), 1820; https://doi.org/10.3390/electronics11121820 - 8 Jun 2022
Cited by 1 | Viewed by 1252
Abstract
The RGB and thermal (RGB-T) object tracking task is challenging, especially with various target changes caused by deformation, abrupt motion, background clutter, and occlusion. It is critical to exploit the complementary nature of visual RGB and thermal infrared data. In this work, we address the RGB-T object tracking task with a novel spatial- and channel-aware multi-modal adaptation (SCA-MMA) framework, which builds an adaptive feature learning process to better mine object-aware information in a unified network. For each modality, a spatial-aware adaptation mechanism is introduced to dynamically learn the location-based characteristics of specific tracking objects at multiple convolution layers. Furthermore, a channel-aware multi-modal adaptation mechanism is proposed to adaptively learn the feature fusion/aggregation of the different modalities. To perform object tracking, we employ a binary classification module with two fully connected layers to predict the bounding boxes of specific targets. Comprehensive evaluations on the GTOT and RGBT234 datasets demonstrate the significant superiority of the proposed SCA-MMA for robust RGB-T object tracking. In particular, the precision rate (PR) and success rate (SR) on the GTOT and RGBT234 datasets reach 90.5%/73.2% and 80.2%/56.9%, respectively, significantly higher than those of state-of-the-art algorithms.
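One plausible reading of the channel-aware adaptation is a squeeze-and-excitation-style gate that weighs RGB against thermal channels before fusion; the sketch below is an assumption-laden stand-in for the paper's mechanism, with invented names and sizes.

```python
import torch
import torch.nn as nn

class ChannelAwareFusion(nn.Module):
    """Sketch: learn per-channel weights to fuse RGB and thermal feature maps."""
    def __init__(self, channels=256, reduction=16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(2 * channels, 2 * channels // reduction), nn.ReLU(),
            nn.Linear(2 * channels // reduction, 2 * channels), nn.Sigmoid(),
        )

    def forward(self, rgb, thermal):
        # rgb, thermal: (batch, C, H, W) feature maps from the two branches
        w = self.gate(torch.cat([rgb, thermal], dim=1))
        w = w.unsqueeze(-1).unsqueeze(-1)            # per-channel weights in [0, 1]
        w_rgb, w_th = w.chunk(2, dim=1)
        return w_rgb * rgb + w_th * thermal          # adaptively fused features

fuse = ChannelAwareFusion()
out = fuse(torch.randn(2, 256, 14, 14), torch.randn(2, 256, 14, 14))
print(out.shape)  # torch.Size([2, 256, 14, 14])
```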

20 pages, 6179 KiB  
Article
A Comparative Study of Reduction Methods Applied on a Convolutional Neural Network
by Aurélie Cools, Mohammed Amin Belarbi and Sidi Ahmed Mahmoudi
Electronics 2022, 11(9), 1422; https://doi.org/10.3390/electronics11091422 - 28 Apr 2022
Cited by 3 | Viewed by 1386
Abstract
With the emergence of smartphones, video surveillance cameras, social networks, and multimedia engines, as well as the development of the internet and connected objects (the Internet of Things, IoT), the number of available images is increasing very quickly. This creates the need to manage huge amounts of data using Big Data technologies. In this context, several sectors, such as security and medicine, need to extract image features (indexes) in order to find these data quickly and efficiently with high precision. To reach this goal, two main approaches exist in the literature. The first uses classical methods based on the extraction of visual features, such as color, texture, and shape, for indexing. The accuracy of these methods was acceptable until the early 2010s. The second approach is based on convolutional neural networks (CNNs), which offer better precision due to the large size of their descriptors, but this can increase search time and storage space. To decrease the search time, one needs to reduce the size of these vectors (descriptors) using dimensionality reduction methods. In this paper, we propose an approach that solves the “curse of dimensionality” problem through an efficient combination of convolutional neural networks and dimensionality reduction methods. Our contribution consists of defining the best combination of CNN layers and the regional maximum activation of convolutions (RMAC) method and its variants. Our combined approach provides reduced descriptors that accelerate the search and reduce storage space while maintaining precision. We conclude by proposing the best position of an RMAC layer, with an increase in accuracy ranging from 4.03% to 27.34%, a decrease in search time ranging from 89.66% to 98.14% depending on the CNN architecture, and a reduction in the size of the descriptor vector of 97.96% on the GHIM-10K benchmark database.
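For readers unfamiliar with RMAC, the sketch below shows the pooling idea in its simplest form: max-activation per channel over a few regions of a conv feature map, l2-normalized and summed into one compact descriptor. The grid-based region sampling here is a simplification, not the exact scheme used in the paper.

```python
import torch
import torch.nn.functional as F

def rmac_descriptor(fmap, grid=2):
    """fmap: (C, H, W) conv feature map -> (C,) compact global descriptor."""
    C, H, W = fmap.shape
    regions = [fmap]                              # whole-image region
    hs, ws = H // grid, W // grid
    for i in range(grid):                         # plus a simple grid of sub-regions
        for j in range(grid):
            regions.append(fmap[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws])
    desc = torch.zeros(C)
    for r in regions:
        v = r.amax(dim=(1, 2))                    # max-activation per channel
        desc += F.normalize(v, dim=0)             # l2-normalize, then sum-aggregate
    return F.normalize(desc, dim=0)               # final l2-normalized descriptor

d = rmac_descriptor(torch.randn(512, 14, 14))
print(d.shape)  # torch.Size([512]) -- far smaller than the flattened activations
```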

10 pages, 7456 KiB  
Article
Quality Assessment of View Synthesis Based on Visual Saliency and Texture Naturalness
by Lijuan Tang, Kezheng Sun, Shuaifeng Huang, Guangcheng Wang and Kui Jiang
Electronics 2022, 11(9), 1384; https://doi.org/10.3390/electronics11091384 - 26 Apr 2022
Cited by 2 | Viewed by 1594
Abstract
Depth-Image-Based Rendering (DIBR) is one of the core techniques for generating new views in 3D video applications. However, the distortion characteristics of DIBR-synthesized views differ from those of 2D images. It is therefore necessary to study the unique distortion characteristics of DIBR views and to design effective and efficient algorithms that evaluate DIBR-synthesized images and guide DIBR algorithms. In this work, visual saliency and texture naturalness features are extracted to evaluate the quality of DIBR views. After extracting these features, we adopt a machine learning method to map them to the quality scores of the DIBR views. Experiments conducted on two synthesized-view databases, IETR and IRCCyN/IVC, show that the proposed algorithm performs better than the compared synthesized-view quality evaluation methods.
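The "machine learning method" that maps features to a quality score is, in much image-quality work, a support vector regressor; here is a generic, hypothetical sketch of that mapping step with made-up feature vectors and scores.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))    # toy saliency + naturalness features per view
y = rng.uniform(1, 5, size=200)   # toy subjective quality scores (e.g., MOS)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))
model.fit(X[:150], y[:150])       # learn the feature-to-quality mapping
print(model.predict(X[150:155]))  # predicted quality scores for held-out views
```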

12 pages, 3207 KiB  
Article
Part-Aware Refinement Network for Occlusion Vehicle Detection
by Qifan Wang, Ning Xu, Baojin Huang and Guangcheng Wang
Electronics 2022, 11(9), 1375; https://doi.org/10.3390/electronics11091375 - 25 Apr 2022
Cited by 3 | Viewed by 1593
Abstract
Traditional machine learning approaches are susceptible to factors such as object scale and occlusion, leading to low detection efficiency and poor versatility in vehicle detection applications. To tackle this issue, we propose a part-aware refinement network that combines multi-scale training and component confidence generation strategies for vehicle detection. Specifically, we divide the original single-valued prediction confidence and adopt the confidence of the visible part of the vehicle to correct the absolute detection confidence of the vehicle, which reduces the impact of occlusion on detection performance. Simultaneously, we relabel the KITTI data, adding detailed occlusion information for the vehicles; the deep neural network model is then trained and tested on the new images. Our proposed method automatically extracts vehicle features and mitigates the large localization errors of traditional approaches. Extensive experimental results on the KITTI dataset show that our method significantly outperforms state-of-the-art methods while maintaining the detection time.
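The confidence-correction idea, tempering a box's whole-vehicle score with the score of its visible part, can be sketched as a simple re-scoring rule; the blending weight below is a hypothetical choice, not the paper's exact formula.

```python
def corrected_confidence(det_conf, visible_conf, alpha=0.5):
    """Blend whole-vehicle confidence with visible-part confidence.

    det_conf:     score for the full (possibly occluded) vehicle box
    visible_conf: score for the visible part of the vehicle
    alpha:        blending weight (hypothetical; the paper's rule may differ)
    """
    return alpha * det_conf + (1 - alpha) * visible_conf

# An occluded vehicle: low full-box score, but a confident visible part
print(corrected_confidence(0.35, 0.90))  # 0.625 -- rescued from suppression
```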

20 pages, 2735 KiB  
Article
Vehicle Re-Identification with Spatio-Temporal Model Leveraging by Pose View Embedding
by Wenxin Huang, Xian Zhong, Xuemei Jia, Wenxuan Liu, Meng Feng, Zheng Wang and Shin’ichi Satoh
Electronics 2022, 11(9), 1354; https://doi.org/10.3390/electronics11091354 - 24 Apr 2022
Cited by 3 | Viewed by 1448
Abstract
Vehicle re-identification (Re-ID) research has intensified as numerous advancements have been made alongside the rapid development of person Re-ID. In this paper, we tackle the vehicle Re-ID problem in open scenarios. This research differs from early-stage studies that focused on a fixed view, and it faces more challenges due to view variations, illumination changes, occlusions, etc. Inspired by person Re-ID research, we propose leveraging pose views to enhance the discrimination performance of visual features and utilizing keypoints to improve the accuracy of pose recognition. However, visual appearance information remains limited by changing surroundings and the extremely similar appearances of vehicles. To the best of our knowledge, few methods have exploited spatio-temporal information to supplement visual appearance information, and those that have neglect the influence of driving direction. Considering the peculiar characteristics of vehicle movement, we observe that vehicle poses in camera views, which indicate driving direction, are closely related to spatio-temporal cues. Consequently, we design a two-branch framework for vehicle Re-ID, consisting of a Keypoint-based Pose Embedding Visual (KPEV) model and a Keypoint-based Pose-Guided Spatio-Temporal (KPGST) model. These models are integrated into the framework, and the results of KPEV and KPGST are fused based on a Bayesian network. Extensive experiments performed on the VeRi-776 and VehicleID datasets, which relate to functional urban surveillance scenarios, demonstrate the competitive performance of our proposed approach.
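In its simplest naive form, fusing an appearance score with a spatio-temporal probability amounts to multiplying the two likelihoods and normalizing over gallery candidates; the paper's Bayesian network is richer than this sketch, and the numbers are invented.

```python
import numpy as np

def bayes_fuse(visual_sim, st_prob):
    """Naive fusion: joint score proportional to the product of the two cues."""
    joint = visual_sim * st_prob
    return joint / joint.sum()               # normalize over gallery candidates

visual_sim = np.array([0.70, 0.65, 0.20])    # KPEV-style appearance similarities
st_prob    = np.array([0.10, 0.80, 0.50])    # KPGST-style transition probabilities
print(bayes_fuse(visual_sim, st_prob))       # candidate 2 wins after fusion
```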

12 pages, 10733 KiB  
Article
PM2.5 Concentration Measurement Based on Image Perception
by Guangcheng Wang, Quan Shi and Kui Jiang
Electronics 2022, 11(9), 1298; https://doi.org/10.3390/electronics11091298 - 20 Apr 2022
Cited by 3 | Viewed by 2105
Abstract
PM2.5 in the atmosphere causes severe air pollution and dramatically affects the normal production and daily lives of residents. Real-time monitoring of PM2.5 concentrations has important practical significance for the construction of an ecological civilization. Mainstream PM2.5 concentration prediction algorithms based on electrochemical sensors have disadvantages such as high economic cost, high labor cost, and time delays. To this end, we propose a simple and effective PM2.5 concentration prediction algorithm based on image perception. Specifically, the proposed method develops a natural scene statistics prior to estimate the saturation loss caused by the ’haze’ formed by PM2.5. After extracting the prior features, we use a feedforward neural network to map the proposed prior features to PM2.5 concentration values. Experiments conducted on the public Air Quality Image Dataset (AQID) show the superiority of our proposed PM2.5 concentration measurement method compared to related state-of-the-art PM2.5 concentration monitoring methods.
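A toy rendering of the pipeline: haze depresses color saturation, so a statistic of the saturation channel can serve as a prior feature, regressed to PM2.5 with a small feedforward network. The single feature, network size, and data below are all illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def saturation_feature(img):
    """Mean HSV-style saturation of an RGB image in [0, 1]; haze pushes this down."""
    mx, mn = img.max(axis=-1), img.min(axis=-1)
    return np.mean((mx - mn) / (mx + 1e-8))

rng = np.random.default_rng(0)
imgs = rng.uniform(size=(100, 64, 64, 3))     # toy scene images
X = np.array([[saturation_feature(im)] for im in imgs])
y = rng.uniform(10, 200, size=100)            # toy PM2.5 labels (ug/m^3)

net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
net.fit(X[:80], y[:80])
print(net.predict(X[80:85]))                  # predicted concentrations
```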

27 pages, 1358 KiB  
Article
Bagged Tree and ResNet-Based Joint End-to-End Fast CTU Partition Decision Algorithm for Video Intra Coding
by Yixiao Li, Lixiang Li, Yuan Fang, Haipeng Peng and Nam Ling
Electronics 2022, 11(8), 1264; https://doi.org/10.3390/electronics11081264 - 16 Apr 2022
Cited by 8 | Viewed by 1907
Abstract
Video coding standards, such as high-efficiency video coding (HEVC), versatile video coding (VVC), and AOMedia Video 2 (AV2), achieve optimal encoding performance by traversing all possible combinations of coding unit (CU) partitions and selecting the combination with the minimum coding cost. It is still necessary to further reduce the encoding time of HEVC, because HEVC is one of the most widely used coding standards, and this search for the best partition is the source of most of the encoding complexity. To reduce the complexity of coding block partitioning in HEVC, a new end-to-end fast algorithm is presented to aid the partition structure decisions of the coding tree unit (CTU) in intra coding. In the proposed method, the partition structure decision problem of a CTU is solved with a novel two-stage strategy. In the first stage, a bagged tree model predicts the splitting of a CTU. In the second stage, the partition problem of a 32 × 32-sized CU is modeled as a 17-output classification task for the first time, so that it can be solved by a single prediction. To achieve high prediction accuracy, a residual network (ResNet) with 34 layers is employed. Jointly using the bagged tree and ResNet, the proposed fast CTU partition algorithm generates the partition quad-tree structure of a CTU through an end-to-end prediction process, abandoning the traditional scheme of making multiple decisions at various depth levels. In addition, several datasets are used in this paper to lay the foundation for high prediction accuracy. Compared with the original HM16.7 encoder, the experimental results show that the proposed algorithm reduces the encoding time by 60.29% on average, while the Bjøntegaard delta rate (BD-rate) loss is as low as 2.03%, outperforming most state-of-the-art approaches in the field of fast intra CU partitioning.
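The two-stage decision flow can be mocked up as follows: a bagged-tree classifier first decides whether the 64 × 64 CTU splits at all, and a second 17-way classifier (a ResNet-34 in the paper, replaced here by a placeholder) picks the partition pattern of each 32 × 32 CU. Features and labels are synthetic.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Stage 1: binary decision -- does the 64x64 CTU split into four 32x32 CUs?
X_ctu = rng.normal(size=(500, 10))       # toy CTU features (e.g., texture statistics)
y_ctu = rng.integers(0, 2, size=500)
stage1 = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50).fit(X_ctu, y_ctu)

def predict_partition(ctu_feat, cu_feats, classify_cu_17way):
    """End-to-end partition sketch for one CTU."""
    if stage1.predict(ctu_feat[None])[0] == 0:
        return "64x64, no split"
    # Stage 2: one 17-class prediction per 32x32 CU (ResNet-34 in the paper)
    return [classify_cu_17way(f) for f in cu_feats]

fake_resnet = lambda f: int(rng.integers(0, 17))   # placeholder for the CNN stage
print(predict_partition(rng.normal(size=10), rng.normal(size=(4, 10)), fake_resnet))
```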

Review


18 pages, 2168 KiB  
Review
Visible-Infrared Person Re-Identification: A Comprehensive Survey and a New Setting
by Huantao Zheng, Xian Zhong, Wenxin Huang, Kui Jiang, Wenxuan Liu and Zheng Wang
Electronics 2022, 11(3), 454; https://doi.org/10.3390/electronics11030454 - 3 Feb 2022
Cited by 10 | Viewed by 3462
Abstract
Person re-identification (ReID) plays a crucial role in video surveillance, with the aim of searching for a specific person across disjoint cameras, and it has progressed notably in recent years. However, visible cameras may not record enough information about a pedestrian’s appearance under low illumination. In contrast, thermal infrared images can significantly mitigate this issue. Combining visible images with infrared images is therefore a natural trend, even though the two are considerably heterogeneous modalities. Several recent attempts have been devoted to visible-infrared person re-identification (VI-ReID). This paper provides a complete overview of current VI-ReID approaches that employ deep learning algorithms. To align with practical application scenarios, we first propose a new testing setting and systematically evaluate state-of-the-art methods under it. Then, we compare ReID with VI-ReID in three aspects: data composition, challenges, and performance. Based on a summary of previous work, we classify the existing methods into two categories. Additionally, we elaborate on frequently used datasets and metrics for performance evaluation. We give insights into the historical development, summarize the limitations of off-the-shelf methods, and finally discuss the future directions of VI-ReID that the community should address.
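The comparisons the survey reports rest on the standard ReID metrics; as a reference point, here is a compact single-query sketch of rank-k accuracy and average precision given similarity scores against a gallery (synthetic data).

```python
import numpy as np

def rank_k(sim, gallery_ids, query_id, k=1):
    """1 if a correct match appears among the k most similar gallery items."""
    order = np.argsort(-sim)
    return int(query_id in gallery_ids[order[:k]])

def average_precision(sim, gallery_ids, query_id):
    order = np.argsort(-sim)
    rel = (gallery_ids[order] == query_id).astype(float)
    if rel.sum() == 0:
        return 0.0
    prec_at_hits = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((prec_at_hits * rel).sum() / rel.sum())

sim  = np.array([0.90, 0.40, 0.75, 0.60])  # query-to-gallery similarities
gids = np.array([3, 7, 3, 1])              # gallery identity labels
print(rank_k(sim, gids, query_id=3), average_precision(sim, gids, query_id=3))
```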
