Article

Disambiguity and Alignment: An Effective Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval

College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China
* Author to whom correspondence should be addressed.
Foods 2024, 13(11), 1628; https://doi.org/10.3390/foods13111628
Submission received: 30 April 2024 / Revised: 20 May 2024 / Accepted: 20 May 2024 / Published: 23 May 2024
(This article belongs to the Special Issue Applications of Artificial Intelligence in Food Industry)

Abstract

As a prominent topic in food computing, cross-modal recipe retrieval has garnered substantial attention. However, existing solutions lack intra-modal alignment, so the semantic alignment between food images and recipes cannot be strengthened further. In addition, a critical issue, food image ambiguity, is overlooked; it disrupts model convergence. To address these problems, we propose a novel Multi-Modal Alignment Method for Cross-Modal Recipe Retrieval (MMACMR). To account for inter-modal and intra-modal alignment jointly, the method measures the similarity of ambiguous food images under the guidance of their corresponding recipes. Additionally, we enhance recipe semantic representation learning with a cross-attention module between ingredients and instructions, which effectively supports the food image similarity measurement. Experiments on the challenging public dataset Recipe1M show that our method outperforms several state-of-the-art methods on commonly used evaluation criteria.
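The two components the abstract names, cross-attention between ingredients and instructions, and image similarity measured under the guidance of the corresponding recipes, can be made concrete. Below is a minimal PyTorch sketch of one plausible reading of that description; it is not the authors' released implementation, and the names (IngredientInstructionCrossAttention, recipe_guided_image_loss) and all hyperparameters are hypothetical.

```python
# Hypothetical sketch of the abstract's two ideas; not the MMACMR code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IngredientInstructionCrossAttention(nn.Module):
    """Let ingredient tokens attend to instruction tokens (and vice versa)
    to enrich the recipe representation."""

    def __init__(self, dim: int = 512, num_heads: int = 8):
        super().__init__()
        self.ing_to_ins = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ins_to_ing = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_ing = nn.LayerNorm(dim)
        self.norm_ins = nn.LayerNorm(dim)

    def forward(self, ing: torch.Tensor, ins: torch.Tensor) -> torch.Tensor:
        # ing: (B, N_ing, D) ingredient token embeddings
        # ins: (B, N_ins, D) instruction token embeddings
        ing_ctx, _ = self.ing_to_ins(query=ing, key=ins, value=ins)
        ins_ctx, _ = self.ins_to_ing(query=ins, key=ing, value=ing)
        ing = self.norm_ing(ing + ing_ctx)
        ins = self.norm_ins(ins + ins_ctx)
        # Pool both streams into a single recipe embedding.
        return torch.cat([ing.mean(dim=1), ins.mean(dim=1)], dim=-1)


def recipe_guided_image_loss(img_emb: torch.Tensor,
                             rec_emb: torch.Tensor,
                             tau: float = 0.1) -> torch.Tensor:
    """Intra-modal alignment: push the image-image similarity distribution
    toward the recipe-recipe similarity distribution, so that visually
    ambiguous images are disambiguated by their recipes."""
    img = F.normalize(img_emb, dim=-1)
    rec = F.normalize(rec_emb, dim=-1)
    img_sim = img @ img.t() / tau   # (B, B) image-image similarities
    rec_sim = rec @ rec.t() / tau   # (B, B) recipe-recipe similarities
    # KL divergence between row-wise distributions; the recipe side acts
    # as a detached teacher.
    p_rec = F.softmax(rec_sim, dim=-1).detach()
    log_p_img = F.log_softmax(img_sim, dim=-1)
    return F.kl_div(log_p_img, p_rec, reduction="batchmean")
```

Under this reading, recipe embeddings supervise the image-image similarity distribution within a batch: two visually similar images of different dishes are pushed apart when their recipes disagree, which is one way to counteract food image ambiguity while still training the usual inter-modal image-recipe objective.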
Keywords: cross-modal recipe retrieval; multi-modal alignment; food image ambiguity; deep learning

