Advances and Challenges in Multimodal Machine Learning

A special issue of Journal of Imaging (ISSN 2313-433X). This special issue belongs to the section "AI in Imaging".

Deadline for manuscript submissions: 30 June 2024 | Viewed by 9654

Special Issue Editor


Guest Editor: Dr. Georgina Cosma
Department of Computer Science, Loughborough University, Loughborough LE11 3TU, UK
Interests: cross-modal information retrieval; continual lifelong learning; explainable and ethical AI; sensitivity analysis in machine vision and text; natural language processing; machine vision

Special Issue Information

Dear Colleagues,

The emerging field of multimodal machine learning has seen much progress in the past few years; however, several core challenges remain. These challenges centre on learning how to represent and summarise multimodal data (representation); translating (mapping) data from one modality to another (translation); identifying direct relations between elements from different modalities (alignment); joining or fusing information from two or more modalities to perform a prediction task (fusion); and transferring knowledge between modalities, their representations, and predictive models (co-learning).
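
As a minimal, purely illustrative example of the fusion challenge, the sketch below concatenates pre-extracted image and text features and feeds them to a shared prediction head (late fusion). The module names, feature dimensions, and PyTorch framing are assumptions made for illustration, not a prescribed design.

    import torch
    import torch.nn as nn

    class LateFusionClassifier(nn.Module):
        """Toy late-fusion model: one projection per modality, fused by concatenation."""
        def __init__(self, img_dim=2048, txt_dim=768, hidden=256, num_classes=10):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, hidden)       # project image features
            self.txt_proj = nn.Linear(txt_dim, hidden)       # project text features
            self.head = nn.Linear(2 * hidden, num_classes)   # predict from the fused vector

        def forward(self, img_feat, txt_feat):
            fused = torch.cat([torch.relu(self.img_proj(img_feat)),
                               torch.relu(self.txt_proj(txt_feat))], dim=-1)
            return self.head(fused)

    # usage with random tensors standing in for pre-extracted image/text features
    model = LateFusionClassifier()
    logits = model(torch.randn(4, 2048), torch.randn(4, 768))  # -> shape (4, 10)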

Within the field of information retrieval, the enormous and continually growing volume of data has given rise to the need for retrieval solutions that can use one modality as a query to retrieve related information in another modality, a task known as cross-modal retrieval. In recent years, cross-modal retrieval methods have attracted considerable attention due to the learning capabilities of deep learning methods; however, most of these methods assume that data examples in different modalities are fully paired, whereas in reality such pairings are often incomplete or unavailable.
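
A minimal sketch of the retrieval step itself, assuming image and text embeddings have already been mapped into a common space by some learned model: the query embedding from one modality is ranked against a gallery of embeddings from the other modality by cosine similarity. All names and dimensions below are illustrative.

    import numpy as np

    def cosine_rank(query_vec, gallery, top_k=5):
        """Rank gallery items (e.g., text embeddings) against a query from another modality."""
        q = query_vec / np.linalg.norm(query_vec)
        g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
        scores = g @ q                       # cosine similarity per gallery item
        return np.argsort(-scores)[:top_k]   # indices of the most similar items

    # toy example: a 512-d image embedding queried against 1000 text embeddings
    rng = np.random.default_rng(0)
    image_query = rng.normal(size=512)
    text_gallery = rng.normal(size=(1000, 512))
    print(cosine_rank(image_query, text_gallery))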

Furthermore, the continually growing volume of data has given rise to the additional challenge of developing lifelong learning models that can continue to learn efficiently from new volumes of data. Lifelong learning remains a challenge for machine learning models, and most research on the topic focuses on classification tasks. There is a need to focus on lifelong learning for information retrieval and to propose methods for dealing with the continuous growth of information, which can otherwise lead to catastrophic forgetting or interference. This limitation is a major drawback for models that learn representations from fixed batches of training data, when in reality information becomes available incrementally over time. The challenge of lifelong learning increases further when dealing with cross-modal learning.
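
One generic ingredient often used to mitigate catastrophic forgetting is rehearsal: keeping a small memory of past examples and mixing them into each new training batch. The sketch below shows a reservoir-sampling replay buffer as an illustration of that idea only; it is not a complete continual-learning method and is not tied to any particular work.

    import random

    class ReservoirBuffer:
        """Fixed-size memory of past (x, y) examples, maintained by reservoir sampling."""
        def __init__(self, capacity=500):
            self.capacity = capacity
            self.memory = []
            self.seen = 0

        def add(self, example):
            self.seen += 1
            if len(self.memory) < self.capacity:
                self.memory.append(example)
            else:
                j = random.randrange(self.seen)
                if j < self.capacity:           # every example seen so far is kept with equal probability
                    self.memory[j] = example

        def sample(self, k):
            return random.sample(self.memory, min(k, len(self.memory)))

    # during continual training, each new batch would be augmented with replayed examples
    buffer = ReservoirBuffer()
    for example in [("x", 0)] * 10:
        buffer.add(example)
    replay = buffer.sample(4)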

We invite contributions presenting techniques that address the challenges of multimodal machine learning, and we strongly encourage contributions that propose advances in continual lifelong learning for multimodal machine learning applications.

Dr. Georgina Cosma
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • neural information retrieval
  • multi-modal and cross-modal information retrieval
  • relevance feedback and query expansion in multimodal retrieval
  • sensitivity analysis of images or multi-modal data
  • visual semantic embedding for information retrieval and other tasks
  • continual lifelong learning for information retrieval
  • temporal modelling of multi-modal data

Published Papers (3 papers)


Research

22 pages, 9214 KiB  
Article
A Framework for Enabling Unpaired Multi-Modal Learning for Deep Cross-Modal Hashing Retrieval
by Mikel Williams-Lekuona, Georgina Cosma and Iain Phillips
J. Imaging 2022, 8(12), 328; https://doi.org/10.3390/jimaging8120328 - 15 Dec 2022
Cited by 3 | Viewed by 1928
Abstract
Cross-Modal Hashing (CMH) retrieval methods have garnered increasing attention within the information retrieval research community due to their capability to deal with large amounts of data, thanks to the computational efficiency of hash-based methods. To date, the focus of cross-modal hashing methods has been on training with paired data. Paired data refers to samples with one-to-one correspondence across modalities, e.g., image and text pairs where the text sample describes the image. However, real-world applications produce unpaired data that cannot be utilised by most current CMH methods during the training process. Models that can learn from unpaired data are crucial for real-world applications such as cross-modal neural information retrieval, where paired data are limited or not available to train the model. This paper (1) provides an overview of CMH methods when applied to unpaired datasets, (2) proposes a framework that enables pairwise-constrained CMH methods to train with unpaired samples, and (3) evaluates the performance of state-of-the-art CMH methods across different pairing scenarios.
(This article belongs to the Special Issue Advances and Challenges in Multimodal Machine Learning)
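
For readers unfamiliar with cross-modal hashing, the sketch below illustrates why it scales well: items are represented by short binary codes and ranked by Hamming distance. Sign-thresholded random embeddings stand in for learned hash codes here; this is background illustration only, not the framework proposed in the paper above.

    import numpy as np

    def to_hash_codes(embeddings):
        """Binarise real-valued embeddings into {0, 1} codes by sign thresholding."""
        return (embeddings > 0).astype(np.uint8)

    def hamming_rank(query_code, gallery_codes, top_k=5):
        """Rank gallery codes (e.g., text) by Hamming distance to a query code (e.g., image)."""
        dists = np.count_nonzero(gallery_codes != query_code, axis=1)
        return np.argsort(dists)[:top_k]

    rng = np.random.default_rng(0)
    image_code = to_hash_codes(rng.normal(size=64))            # 64-bit query code
    text_codes = to_hash_codes(rng.normal(size=(10000, 64)))   # gallery of 10k codes
    print(hamming_rank(image_code, text_codes))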

14 pages, 1818 KiB  
Article
A Multimodal Knowledge-Based Deep Learning Approach for MGMT Promoter Methylation Identification
by Salvatore Capuozzo, Michela Gravina, Gianluca Gatta, Stefano Marrone and Carlo Sansone
J. Imaging 2022, 8(12), 321; https://doi.org/10.3390/jimaging8120321 - 03 Dec 2022
Cited by 1 | Viewed by 1819
Abstract
Glioblastoma Multiforme (GBM) is considered one of the most aggressive malignant tumors, characterized by a tremendously low survival rate. Although alkylating chemotherapy is typically adopted to fight this tumor, it is known that the repair abilities of the O(6)-methylguanine-DNA methyltransferase (MGMT) enzyme can antagonize the cytotoxic effects of alkylating agents, strongly limiting tumor cell destruction. However, it has been observed that MGMT promoter regions may be subject to methylation, a biological process preventing MGMT enzymes from removing the alkylating agents. As a consequence, the presence of the methylation process in GBM patients can be considered a predictive biomarker of response to therapy and a prognostic factor. Unfortunately, identifying signs of methylation is a non-trivial matter, often requiring expensive, time-consuming, and invasive procedures. In this work, we propose to address MGMT promoter methylation identification by analyzing Magnetic Resonance Imaging (MRI) data with a Deep Learning (DL) based approach. In particular, we propose a Convolutional Neural Network (CNN) operating on suspicious regions of the FLAIR series, pre-selected through an unsupervised Knowledge-Based filter leveraging both the FLAIR and T1-weighted series. The experiments, run on two different publicly available datasets, show that the proposed approach can obtain results comparable to (and in some cases better than) the considered competitor approach while using less than 0.29% of its parameters. Finally, we perform an eXplainable AI (XAI) analysis to take a further step toward the clinical usability of a DL-based approach for MGMT promoter methylation detection in brain MRI.
(This article belongs to the Special Issue Advances and Challenges in Multimodal Machine Learning)
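
The paper's knowledge-based filter is not reproduced here; purely to illustrate the general idea of pre-selecting suspicious regions from two co-registered MRI series before a CNN classifier sees them, the toy sketch below flags voxels that are bright on FLAIR and dark on T1 and crops a patch around them. The threshold and the flagging rule are illustrative assumptions, not the authors' method.

    import numpy as np

    def toy_suspicious_region(flair, t1, z_thresh=2.0):
        """Toy pre-selection: flag FLAIR-hyperintense / T1-hypointense voxels and
        return the bounding box of the flagged region (or None if nothing is flagged)."""
        f = (flair - flair.mean()) / (flair.std() + 1e-8)
        t = (t1 - t1.mean()) / (t1.std() + 1e-8)
        mask = (f > z_thresh) & (t < 0)
        if not mask.any():
            return None
        rows, cols = np.where(mask)
        return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1

    # stand-in slices; real use would load co-registered FLAIR and T1 volumes
    rng = np.random.default_rng(0)
    flair_slice = rng.normal(size=(240, 240))
    t1_slice = rng.normal(size=(240, 240))
    box = toy_suspicious_region(flair_slice, t1_slice)
    if box is not None:
        r0, r1, c0, c1 = box
        patch = flair_slice[r0:r1, c0:c1]   # patch that a downstream CNN would classify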

18 pages, 2967 KiB  
Article
SIFT-CNN: When Convolutional Neural Networks Meet Dense SIFT Descriptors for Image and Sequence Classification
by Dimitrios Tsourounis, Dimitris Kastaniotis, Christos Theoharatos, Andreas Kazantzidis and George Economou
J. Imaging 2022, 8(10), 256; https://doi.org/10.3390/jimaging8100256 - 21 Sep 2022
Cited by 8 | Viewed by 4976
Abstract
Despite the success of hand-crafted features in computer vision over many years, they have nowadays largely been replaced by end-to-end learnable features extracted from deep convolutional neural networks (CNNs). Whilst CNNs can learn robust features directly from image pixels, they require large amounts of samples and extensive augmentation. In contrast, hand-crafted features, like SIFT, exhibit several interesting properties, such as local rotation invariance. In this work, a novel scheme combining the strengths of SIFT descriptors with CNNs, namely SIFT-CNN, is presented. Given a single-channel image, one SIFT descriptor is computed for every pixel, and thus every pixel is represented as an M-dimensional histogram, which ultimately results in an M-channel image. Thus, the SIFT image is generated from the SIFT descriptors of all the pixels in a single-channel image, while the original spatial size is preserved. Next, a CNN is trained to utilize these M-channel images as inputs by operating directly on the multiscale SIFT images with regular convolution processes. Since these images incorporate the spatial relations between the histograms of the SIFT descriptors, the CNN is guided to learn features from local gradient information of images that would otherwise be neglected. In this manner, SIFT-CNN implicitly acquires a local rotation invariance property, which is desired for problems where local areas within the image can be rotated without affecting the overall classification result. Such problems include indirect immunofluorescence (IIF) cell image classification, ground-based all-sky image cloud classification, and human lip-reading. The results on popular datasets for these three problems indicate that the proposed SIFT-CNN improves performance and surpasses the corresponding CNNs trained directly on pixel values across various challenging tasks, owing to its robustness to local rotations. Our findings highlight the importance of the input image representation to the overall efficiency of a data-driven system.
(This article belongs to the Special Issue Advances and Challenges in Multimodal Machine Learning)
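
To make the SIFT-image construction concrete, the sketch below computes one SIFT descriptor per pixel with OpenCV and reshapes the result into a 128-channel image that a standard CNN could take as input. It is a simplified, unoptimised illustration (no multiscale handling), assuming opencv-python 4.4 or later with SIFT available; it is not the authors' implementation.

    import cv2
    import numpy as np

    def sift_image(gray, patch_size=8.0):
        """Compute a dense SIFT descriptor at every pixel of a uint8 grayscale image
        and return an (H, W, 128) 'SIFT image' that preserves the spatial layout."""
        h, w = gray.shape
        keypoints = [cv2.KeyPoint(float(x), float(y), patch_size)
                     for y in range(h) for x in range(w)]
        _, desc = cv2.SIFT_create().compute(gray, keypoints)   # one 128-D descriptor per keypoint
        return desc.reshape(h, w, -1).astype(np.float32)

    gray = np.random.randint(0, 256, (64, 64), dtype=np.uint8)  # stand-in single-channel image
    multi_channel = sift_image(gray)                            # (64, 64, 128), CNN-ready input
    print(multi_channel.shape)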
