Article

Multimodal Recipe Recommendation with Heterogeneous Graph Neural Networks

by Ruiqi Ouyang 1,†, Haodong Huang 2,†, Weihua Ou 1,2,* and Qilong Liu 1

1 School of Mathematical Sciences, Guizhou Normal University, Guiyang 550025, China
2 School of Bigdata and Computer Science, Guizhou Normal University, Guiyang 550025, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2024, 13(16), 3283; https://doi.org/10.3390/electronics13163283
Submission received: 3 June 2024 / Revised: 9 August 2024 / Accepted: 14 August 2024 / Published: 19 August 2024

Abstract
Recipe recommendation is the process of recommending suitable recipes to users based on factors such as user preferences and dietary needs. Recipes typically involve multiple modalities, text and images being the most common, yet most typical recipe recommendation methods rely on text alone. The expressiveness of a single modality is often insufficient, and images carry richer semantic information. Moreover, it is difficult to set the right feature fusion granularity for different kinds of modal information and to capture the relationship between recipes and users. To solve these problems, this paper proposes a Multimodal Heterogeneous Graph Neural Network Recipe Recommendation (MHGRR) architecture, which aims to fully fuse the various kinds of modal information of recipes and to model the relationship between users and recipes. We use embedding and shallow Convolutional Neural Networks (CNNs) to extract the original text and image information and unify the feature fusion granularity, and we use a Heterogeneous Graph Neural Network based on GraphSAGE to capture the complex relationship between users and recipes. To verify the effectiveness of our proposed model, we perform comparative experiments on a real dataset; the results show that our method outperforms most popular recipe recommendation methods. Through an ablation experiment, we found that adding image information to recipe recommendation is effective, and we additionally found that as the output dimension of GraphSAGE increases, the performance of the model varies little.

1. Introduction

With the development of the Internet, the number of users of various websites has gradually increased, and food-related websites are no exception. However, due to the vast number of available recipes, users often struggle to find those that suit them best. Effective recipe recommendations can enhance user satisfaction by providing personalized meal options that cater to individual dietary preferences, nutritional needs, and cooking skills. This not only saves users time and effort in meal planning but also promotes healthier eating habits and reduces food waste by suggesting recipes based on available ingredients. Therefore, how to recommend suitable recipes to users is a valuable problem [1,2,3,4].
Recipes often include text and images [5], and some also contain voice information. A good recipe typically features ingredients, images, and cooking procedures. According to [6,7,8], people usually choose recipes by first looking at the images, then checking the ingredients, and finally reviewing the cooking process. Thus, images and ingredients carry significant user preference information. However, most recipe recommendation methods rely on the text modality alone [9,10,11,12,13] and ignore ingredients. A single modality is less expressive than multiple modalities, and images carry richer semantic information. As shown in Figure 1, text conveys only the composition of the ingredients, whereas a picture conveys not only the ingredient composition but also much semantic information that text cannot express intuitively, such as food color and food shape. To achieve more accurate user preference prediction, we use both a text modality and an image modality as model input, where the text consists mainly of ingredients.
When merging multiple modalities, it is important to unify the feature granularity of the different modalities. As shown in Figure 1, the extracted text feature contains all the information about the recipe ingredients, but its values are only 0 and 1, unlike the image feature, whose values range from 0 to 255. The distributions of the text and image features therefore differ substantially. Before fusing the two, we use an embedding to change the distribution of the original text features, and we use a shallow CNN to obtain low-level image features, thereby unifying the granularity of feature fusion.
The relationship between users and recipes is complicated, influenced by factors such as personal preferences and eating habits. To make successful recipe recommendations, models must analyze this complexity and capture the subtle connections between users and recipes. Understanding these relationships is essential for providing personalized and accurate recommendations that meet diverse dietary needs. Although the Transformer is very popular and has achieved great success in many fields, it is not well suited to processing such complex relationships [14]. We therefore select Graph Neural Networks (GNNs), which are naturally adept at processing complex relationships, for recipe recommendation. We build a heterogeneous graph neural network based on GraphSAGE [15] to further extract user and recipe features, and we rely on the neighbor sampling operation of GraphSAGE to weaken the heterogeneity between users and recipes and better capture their relationship.
The main contributions of this paper are summarized as follows: (1) We propose a Multimodal Heterogeneous Graph Neural Network Recipe Recommendation (MHGRR) architecture. (2) We propose a new multimodal fusion method for recipe recommendation. (3) We recommend recipes to users using both the image and text modalities. (4) We found that as the output dimension of GraphSAGE increases, the performance of the model varies little.

2. Related Work

2.1. Typical Recommendation

Typical recommendation predicts a user’s preference for unknown items based on the user’s historical behavior data and item modalities, and recommends items that may interest the user. The approach is usually divided into two types: content-based recommendation and collaborative filtering (CF). Content-based recommendation [16,17,18,19,20] relies on the characteristics of the item itself and recommends items similar to those the user has liked, while collaborative filtering [21,22,23,24,25] recommends items by finding similarities between users or items. Although classical recommender methods have been widely used in the past decade, they have several disadvantages. First, these methods usually need a large amount of data to perform well; with insufficient data, their effectiveness may be poor. Second, typical recommendation has difficulty handling image information and understanding the deep semantic information of recipes. Third, since these methods only consider the similarity between users and items, other important information such as context and images is ignored. Moreover, most CF-based methods capture user preference through a single item modality, which is limiting: user preferences are spread across multiple modalities, and each modality carries different information. Our task is to recommend items to users using multiple modalities.

2.2. Recipe Recommendation

With the advancement of science and technology and the improvement in people’s living standards, people are paying increasing attention to food, and recipe recommendation has gradually received more attention. Many researchers have begun to use improved typical methods and machine learning to design personalized recipe recommendations that meet the nutritional needs and preferences of different people [11,12,13,26,27,28,29,30,31,32,33].
Reference [34] recommends recipes consumed by groups of users, rather than by individuals, based on collaborative filtering. Teng et al. proposed a recipe recommendation method based on ingredient networks [6], studying the connections between recipes with similar ingredient compositions. Soon after, Forbes et al. added ingredient information into Matrix Factorization (MF) [35], using the inherent relationship between ingredients and recipes to improve the expressiveness and explainability of the model. These methods, however, lack communication with the user; Bilgin et al. proposed a recipe recommendation method [36] that allows users to communicate easily with the hidden networked devices used for recommendation. Nutrition is also an important part of recipe recommendation: a nutritionally balanced recipe not only supplies necessary energy to the human body but also benefits physical and mental health, and references [13,37,38,39] all propose nutrition-based recipe recommendation. Machine learning is very popular nowadays, and many scholars have applied it to recipe recommendation; reference [40] explores the possibility of designing a personalized meal plan using a machine learning model trained on a dataset of over 10,000 recipes from publicly available databases. In reference [11], the authors use an attention-based Convolutional Neural Network for recipe recommendation, aiming to better understand and exploit complex data relationships to provide more accurate and personalized recommendations, and its performance exceeds many traditional methods. However, most recipe recommendation methods do not comprehensively consider the recipe modalities users care about most, such as text and images, and therefore cannot fit user preferences well. We thus propose a multimodal recipe recommendation that leverages both the image and text modalities of a recipe to fit user preferences.

3. Problem Description and Formulation

The purpose of food recommendation is to predict recipes that a user is likely to approve of based on recipe information such as images and ingredients. In the following methods and experiments, we study the impact of multimodal versus unimodal input on the final recommendation quality.
We use a heterogeneous graph G, which includes two types of nodes, to accurately denote the user–recipe relation based on user rating history; it is formulated as follows:
$$G = \{ (u, r, y) \mid u \in U, r \in D \}$$
where $U = \{u_1, u_2, \ldots, u_m\}$ and $D = \{r_1, r_2, \ldots, r_n\}$ denote the user set and the recipe set, respectively, and $y$ denotes the user–recipe interaction: if user $u$ rates recipe $r$, then $y = 1$; otherwise $y = 0$.
To encode the ingredient text information into recipes efficiently, we denote an ingredient set $I = \{i_1, i_2, \ldots, i_a\}$ and number each ingredient. Besides ingredients, we denote a recipe image set $P = \{p_1, p_2, \ldots, p_n\}$ for efficient querying.
Through our model, we output a score $S$ that predicts user preference from the ingredient text, the recipe image, and the user rating history.
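To make the graph construction concrete, the following minimal sketch (in PyTorch; not the authors' released code) builds the edge list of G from rating pairs, placing users and recipes in a single node id space:

```python
import torch

# A minimal sketch of building the user-recipe interaction graph G from
# rating history. Users occupy ids [0, m) and recipes ids [m, m + n) in
# one shared node space, so a rating (u, r) with y = 1 becomes one edge.
def build_graph(ratings, num_users, num_recipes):
    # ratings: list of (user_id, recipe_id) pairs that were rated
    src = torch.tensor([u for u, _ in ratings])
    dst = torch.tensor([num_users + r for _, r in ratings])
    # store both directions so messages flow user -> recipe and back
    edge_index = torch.stack([torch.cat([src, dst]),
                              torch.cat([dst, src])])
    return edge_index  # shape [2, 2 * |ratings|]

edge_index = build_graph([(0, 5), (0, 7), (1, 5)], num_users=2, num_recipes=10)
```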

4. Methodology

In this paper, we propose a Multimodal Heterogeneous Graph Neural Network Recipe Recommendation (MHGRR) architecture to fit user preference, which is illustrated in Figure 2.

4.1. Embedding Layer

Typically, recipes contain a variety of ingredient information that is usually discrete. To extract this discrete information and facilitate subsequent recommendation, we use an embedding layer to initialize users and recipes and transform user and recipe features into the same space. The embedding layer comprises an encoding operation and a linear layer: we encode ingredient information into recipes with multi-hot encoding, encode users with one-hot encoding, and then feed the original encoded user and recipe features into a linear layer to obtain the user and recipe embedding features.

4.1.1. Encoding for User and Recipe

According to the user’s position in U, we encode each user with one-hot encoding as an original user feature vector $u_m \in \{0, 1\}^{|U|}$. We adopt multi-hot encoding to encode the recipe ingredient text into the recipe as an original recipe feature: if a recipe contains ingredients from I, the corresponding positions of the original recipe feature vector $r_n \in \{0, 1\}^{|I|}$ are set to 1, according to the ingredients’ positions in I. To distinguish different recipes with the same ingredients, we add a recipe id to the original recipe feature with one-hot encoding. Finally, we pad the original encoding vectors with zeros to make the $u_m$ and $r_n$ dimensions the same.
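A small sketch of these encodings follows, under our reading of the text; the exact index layout (in particular, where the recipe-id one-hot sits) is an assumption, since the paper does not specify it:

```python
import torch

# Hedged sketch of the user/recipe encodings described above.
NUM_USERS, NUM_INGREDIENTS, NUM_RECIPES = 1575, 9211, 7280

def encode_user(user_idx: int) -> torch.Tensor:
    u = torch.zeros(NUM_USERS)
    u[user_idx] = 1.0                      # one-hot at the user's position in U
    return u

def encode_recipe(ingredient_ids: list, recipe_idx: int) -> torch.Tensor:
    r = torch.zeros(NUM_INGREDIENTS + NUM_RECIPES)
    r[torch.tensor(ingredient_ids)] = 1.0  # multi-hot over the ingredient set I
    r[NUM_INGREDIENTS + recipe_idx] = 1.0  # recipe-id one-hot to break ties
    return r

def pad_to(v: torch.Tensor, dim: int) -> torch.Tensor:
    # zero-pad so u_m and r_n share the same dimension
    return torch.cat([v, torch.zeros(dim - v.numel())])

dim = max(NUM_USERS, NUM_INGREDIENTS + NUM_RECIPES)
u0 = pad_to(encode_user(3), dim)                     # an original user feature
r0 = pad_to(encode_recipe([2, 17, 305], 42), dim)    # an original recipe feature
```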

4.1.2. Feature Transformation with Linear Layer

Numerous experiments showed that training with separate weight matrices for users and recipes leads to convergence difficulties and poor performance. Therefore, to map the original user and recipe features into the same feature space and capture latent user–recipe interactions, we adopt a linear layer that transforms user and recipe features with the same weight matrix, formulated as follows:
$$u_m^E = W^E u_m, \quad r_n^E = W^E r_n$$
where $u_m^E \in \mathbb{R}^d$ and $r_n^E \in \mathbb{R}^d$ are the embedding feature vectors of the user and recipe after the linear layer, respectively; $W^E \in \mathbb{R}^{d \times (|U| + |I|)}$ is the weight matrix of the embedding layer; and $u_m \in \mathbb{R}^{|U| + |I|}$ and $r_n \in \mathbb{R}^{|U| + |I|}$ are the original encoding vectors of the user and recipe, respectively.
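As an illustration, the shared-weight transformation can be realized with a single bias-free linear layer; the embedding dimension d = 64 below is an assumed value, since the paper varies this dimension in Section 5.4.2:

```python
import torch
import torch.nn as nn

# Sketch of the shared-weight transformation above: one bias-free linear
# layer maps both encoding vectors into the same d-dimensional space.
in_dim, d = 1575 + 9211, 64                 # |U| + |I| for MealRec; d is assumed
embed = nn.Linear(in_dim, d, bias=False)    # realizes the shared W^E

u0 = torch.zeros(in_dim); u0[3] = 1.0       # a one-hot user encoding u_m
r0 = torch.zeros(in_dim)
r0[torch.tensor([2, 17, 305])] = 1.0        # a multi-hot recipe encoding r_n (illustrative indices)

u_e, r_e = embed(u0), embed(r0)             # u_m^E and r_n^E, both in R^d
```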

4.2. Recipe Image Feature Extraction

The CNN is a popular deep learning model for processing images and spatial data. Owing to its excellent performance, the CNN is widely used in computer vision and has achieved remarkable success in tasks such as image classification, object detection, and image generation. Because recipe images contain a lot of fine-grained and contextual information, such as ingredients, food shape, food color, food appearance, and cooking state, we extract the local features and latent contextual features of the recipe image with a two-layer CNN, both to guide recipe recommendation more accurately and to unify the feature fusion granularity. We leverage a convolution layer to obtain the local features of the recipe image and a pooling layer to retain the latent contextual features. Each layer of the CNN is formulated as follows:
$$p_n^k = \mathrm{pool}\left(\sigma\left(\mathrm{conv}\left(p_n^{k-1}, W_c^k, b_c^k\right)\right)\right)$$
where $k$ is the layer index, $p_n^k \in \mathbb{R}^{C \times H \times W}$ is the recipe image feature, $C$ is the number of image channels, and $H$ and $W$ are the height and width of the image, respectively. $W_c^k \in \mathbb{R}^{3 \times 3}$ is the filter weight matrix, $b_c^k$ is the bias vector, $\mathrm{conv}(\cdot)$ is the convolution operation, $\mathrm{pool}(\cdot)$ is the pooling operation, and $\sigma(\cdot)$ is the ReLU activation function.
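A hedged sketch of such a two-layer CNN follows; the channel counts, pooling size, and input resolution are our assumptions, since the paper does not report these hyperparameters:

```python
import torch
import torch.nn as nn

# Minimal sketch of the two-layer CNN formula above: 3x3 convolutions,
# ReLU, and pooling, with the second layer collapsing to one channel so
# the output matches the 1 x H x W shape used in the fusion step.
cnn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),   # conv(p^0, W_c^1, b_c^1)
    nn.ReLU(),                                   # sigma(.)
    nn.MaxPool2d(2),                             # pool(.)
    nn.Conv2d(8, 1, kernel_size=3, padding=1),   # second layer, one output channel
    nn.ReLU(),
    nn.MaxPool2d(2),
)

img = torch.rand(1, 3, 64, 64)   # a recipe image (assumed 64x64 resolution)
p2 = cnn(img)                    # p_n^2 with shape [1, 1, 16, 16]
```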

4.3. Feature Fusion

In this part, we fuse the recipe text feature and image feature at a shallow feature level for better feature alignment. We first concatenate the recipe text feature $r_n^E$ and image feature $p_n^2$; then, we perform a coarse-grained fusion of these two features with a one-layer linear layer, formulated as follows:
$$r_n^F = W^F \mathrm{cat}\left(r_n^E, p_n^2\right)$$
where $r_n^F \in \mathbb{R}^d$ is the fused recipe feature, $W^F \in \mathbb{R}^{d \times (H \times W + d)}$ is the weight matrix of the linear layer, $p_n^2 \in \mathbb{R}^{1 \times H \times W}$ is the recipe image feature after the two-layer CNN, and $\mathrm{cat}(\cdot)$ is the concatenation operation.
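The fusion step can be sketched as follows; the dimension d = 64 and the 16 × 16 feature-map size are assumptions carried over from the CNN sketch above:

```python
import torch
import torch.nn as nn

# Sketch of the fusion formula above: concatenate the embedded text
# feature with the flattened shallow image feature, then one linear
# layer performs the coarse-grained fusion.
d, H, W = 64, 16, 16
fuse = nn.Linear(d + H * W, d, bias=False)       # realizes W^F

r_e = torch.rand(d)                              # r_n^E from the embedding layer
p2 = torch.rand(1, H, W)                         # p_n^2 from the two-layer CNN
r_f = fuse(torch.cat([r_e, p2.flatten()]))       # r_n^F in R^d
```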

4.4. GraphSAGE Layer

To better extract the connection features between users and recipes and their high-level features, and to weaken the heterogeneity in recipe recommendation through sampling, we adopt a two-layer GraphSAGE, composed of message propagation and feature aggregation, to extract user and recipe features. The input of GraphSAGE is a heterogeneous graph with two types of nodes; to better recommend recipes to users, the two node types need to contain each other’s information, so we do not distinguish between them during message propagation and feature aggregation.

4.4.1. Message Propagation

In GraphSAGE, the propagation route of node messages is determined by sampling. According to the node distribution, we sample a two-hop neighborhood for each node: 20 neighbors at the first hop and 10 neighbors at the second hop.
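Using PyG (which the implementation relies on; see Section 5.2.3), this sampling scheme corresponds to a NeighborLoader with fan-outs [20, 10]; the toy graph and batch size below are illustrative assumptions, not the MealRec graph:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.loader import NeighborLoader

# Sketch of the two-hop sampling described above: 20 neighbors at the
# first hop and 10 at the second hop for each seed node.
edge_index = torch.randint(0, 100, (2, 500))          # a random toy graph
data = Data(x=torch.rand(100, 64), edge_index=edge_index, num_nodes=100)

loader = NeighborLoader(
    data,
    num_neighbors=[20, 10],   # 20 neighbors at hop 1, 10 at hop 2
    batch_size=32,            # seed nodes per mini-batch (assumed value)
)
batch = next(iter(loader))    # a sampled subgraph around 32 seed nodes
```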

4.4.2. Feature Aggregation

To fuse local and global features, we aggregate the features of the neighbors of each target node. There are two types of nodes, users and recipes; we adopt the same aggregation strategy for both, but their computations differ in which neighbors they draw from, as formulated below:
$$u_m^k = W^k \mathrm{cat}\left(u_m^{k-1}, \mathrm{mean}\left(\left\{ r_n^{k-1} \mid r_n \in N(u_m) \right\}\right)\right), \quad r_n^k = W^k \mathrm{cat}\left(r_n^{k-1}, \mathrm{mean}\left(\left\{ u_m^{k-1} \mid u_m \in N(r_n) \right\}\right)\right), \quad u_m^0 = u_m^E, \quad r_n^0 = r_n^F$$
where $k$ is the depth, $u_m^k \in \mathbb{R}^d$ and $r_n^k \in \mathbb{R}^d$ are the user and recipe features after GraphSAGE, $W^k \in \mathbb{R}^{d \times 2d}$ is the weight matrix of layer $k$, $\mathrm{cat}(\cdot)$ is the concatenation operation, $N(u_m)$ is the neighbor set of $u_m$, and $\mathrm{mean}(\cdot)$ computes the average of all neighbor features.
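A manual sketch of one aggregation step from the formula above follows; PyG's SAGEConv realizes the same mean-aggregate-then-transform idea:

```python
import torch
import torch.nn as nn

# One aggregation step: mean-pool the sampled neighbors' features,
# concatenate with the target node's own feature, and apply the shared
# weight matrix W^k (the same matrix serves both node types).
d = 64
W = nn.Linear(2 * d, d, bias=False)        # W^k in R^{d x 2d}

def sage_step(h_self: torch.Tensor, h_neighbors: torch.Tensor) -> torch.Tensor:
    agg = h_neighbors.mean(dim=0)          # mean(.) over the sampled neighbor set
    return W(torch.cat([h_self, agg]))     # cat(.) then the linear transformation

h_user = sage_step(torch.rand(d), torch.rand(20, d))    # user update from recipe neighbors
h_recipe = sage_step(torch.rand(d), torch.rand(20, d))  # recipe update from user neighbors
```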

4.5. Prediction

To connect dense user and recipe vectors and output the user ratings of recipes, we use cosine similarity to predict the connection between users and recipes, which is formulated as follows:
$$S = \frac{u_m^k \cdot r_n^k}{\| u_m^k \| \, \| r_n^k \|}, \quad S \in \mathbb{R}$$
Loss. To obtain a more accurate score, we adopt MSE (Mean Squared Error) loss to train our model, which is formulated as follows:
$$l = \frac{1}{n} \sum_{i=1}^{n} \left( S_i - l_i \right)^2$$
where S i is the rating from our model and l i is the rating label from the dataset.
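The prediction and loss formulas above amount to the following short computation; all tensors below are random placeholders:

```python
import torch
import torch.nn.functional as F

# Cosine similarity between final user and recipe vectors gives the
# score S, which is trained against rating labels with MSE loss.
u_k = torch.rand(8, 64)                        # a batch of user features u_m^k
r_k = torch.rand(8, 64)                        # the paired recipe features r_n^k
labels = torch.rand(8)                         # rating labels l_i from the dataset

scores = F.cosine_similarity(u_k, r_k, dim=1)  # S_i for each (user, recipe) pair
loss = F.mse_loss(scores, labels)              # l = mean((S_i - l_i)^2)
```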

5. Experiments

5.1. Dataset

To verify the effectiveness of our proposed method, we use the MealRec [41] dataset, which is available at https://github.com/WUT-IDEA/MealRec (accessed on 23 May 2022). We adopt recipes, users, user ratings of recipes, recipe images, and recipe ingredients as model input. There are 1575 users, 7280 recipes, 9211 ingredients, 7280 recipe images, and 151,148 ratings, and each user has rated at least 50 recipes. The dataset fields we use are shown in Table 1. Since there are 151,148 ratings, to prevent the model from overfitting the training set, the data are split into training (60%), validation (20%), and test (20%) sets.
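For concreteness, a sketch of such a split is shown below; the uniform random permutation is our assumption, as the paper states only the proportions:

```python
import torch

# 60/20/20 split over the 151,148 ratings via a random permutation.
n = 151_148
perm = torch.randperm(n)
n_train, n_val = int(0.6 * n), int(0.2 * n)
train_idx = perm[:n_train]                # 60% for training
val_idx = perm[n_train:n_train + n_val]   # 20% for validation
test_idx = perm[n_train + n_val:]         # 20% for testing
```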

5.2. Experimental Settings

5.2.1. Evaluation Metrics

To assess the quality of all items rated by the MHGRR, we select three evaluation metrics for our recipe recommendation: ACC (accuracy), MAE (Mean Absolute Error), and RMSE (Root Mean Square Error); these are commonly used in the score-based literature [34,42,43,44,45,46,47,48].

5.2.2. Baselines

To verify our model’s effectiveness, we compare the MHGRR with other methods, introduced as follows:
  • CF [49]: Collaborative filtering is a common method used in recommendation systems to make personalized recommendations based on user behavior and feedback data. The core idea of this method is that if two users have similar behaviors or interests in some aspects, they may also have similar interests in other aspects.
  • Content-based Food Recommendation (CFR) [37]: This method is an improvement on collaborative filtering; it proposes multiple ways of calculating user and recipe similarity, of which we adopt the Pearson’s correlation variant, and it incorporates the relationships between recipes.
  • Content-boosted Matrix Factorization (CMF) [35]: Matrix factorization is an important technique in linear algebra and mathematical computing. It aims at splitting a complex matrix into the product of several simpler submatrices or vectors with good interpretability, and matrix factorization can effectively handle the sparse data problem in recipe recommendations. The method is an improvement on matrix factorization; it incorporates ingredient information into matrix factorization.
  • LightGCN [50]: A Graph Convolutional Network (GCN) is a deep learning model used for processing graph-structured data, such as user–item relationships found in social networks, recommendation systems, and molecular structures in bioinformatics. A GCN is capable of handling complex user–item relationships, provides good interpretability, and is suitable for large-scale recommendation systems. In this approach, the authors extend a GCN to develop a lightweight and effective recommendation model. They eliminate unnecessary elements and modify the approach to neighborhood aggregation and message propagation.
  • GTN [51]: Graph Neural Networks (GNNs) have been widely used in recommendation systems and have shown remarkable effectiveness. However, most current GNN-based recommendation systems tend to neglect interactions due to unreliable behavior (e.g., random/clickbait) and treat all interactions uniformly; this approach can lead to suboptimal and unstable performance. To overcome these limitations, the authors introduce a principled graph trend collaborative filtering method. They present Graph Trend Filtering Networks for Recommendations (GTNs), which are specifically designed to capture the adaptive reliability of interactions.
  • GraphDA [52]: Graph Collaborative Filtering (GCF) is widely used to capture complex collaborative signals in recommendation systems. However, GCF faces challenges with its bipartite adjacency matrix, especially for users/items with abundant or insufficient interactions; this matrix, which defines aggregated neighbors based on user–item interactions, can introduce noise. Moreover, it neglects user–user and item–item correlations, which limits the inclusion of useful neighbors. In this approach, the authors propose a new graph adjacency matrix that incorporates user–user and item–item correlations. They also introduce a carefully designed user–item interaction matrix that aims to balance the number of interactions across users.

5.2.3. Implementation Details

We take the ratings in the test set as the prediction targets. We adopt an embedding layer to extract the recipe text feature, use a 2-layer CNN to extract the recipe image feature, and then leverage a linear layer to fuse the coarse-grained text and image features; finally, we adopt a 2-layer GraphSAGE to extract the high-level user and recipe features based on the user rating history, the fused recipe feature, and the user feature, and we predict the user’s preference for a recipe with cosine similarity. The evaluation metrics are computed from the real ratings and the predicted ratings of our model. We implement our model in Python using the libraries PyTorch (https://pytorch.org/) (accessed on 17 May 2023) and PyG (https://pytorch-geometric.readthedocs.io/en/latest/index.html) (accessed on 11 July 2022). The learning rate is set to 0.001.

5.3. Comparative Experiments

To verify the effectiveness of the MHGRR and its capacity to handle sparse data, complex user–recipe interactions, and large data, we conducted comparative experiments. Table 2 compares the MHGRR with the baselines; we report the best performance we could obtain from each method. Collaborative filtering is a classic recommendation algorithm widely used in various fields; although effective in many situations, it faces challenges such as cold starts and data sparsity. As Table 2 shows, CF performs poorly: users have rated only some recipes, unrated recipes are filled with 0, and the rating data are therefore sparse. This further verifies that our model handles sparse data well. CFR, an improvement on CF, has the same data sparsity problem; although its similarity calculation is improved, its essence is the same, and Table 2 shows that its performance is also very poor, further illustrating that traditional algorithms such as collaborative filtering struggle with sparse data. To study the impact of sparse data on recipe recommendation, we experimented on the test set with the CMF, an improvement on matrix factorization that can handle data sparsity effectively; Table 2 shows a large improvement over the first two methods, indicating that sparse data strongly affect recipe recommendation. The performance of the CMF is nevertheless moderate, which relates to the complexity of user–recipe interactions, the large number of ingredients, and the limited capacity of matrix-decomposition-based methods. To study the impact of large data and complex user–recipe relationships on recipe recommendation, we experimented on the test set with LightGCN, a light and effective recommendation model; as Table 2 shows, its expressiveness is greatly improved and it performs well. To verify the advancement of our model, we also compared it with the recent GTN and GraphDA methods, and Table 2 shows that our method outperforms both. Overall, our model handles sparse data, large data, and complex relationships well.

5.4. Ablation Experiment

5.4.1. The Impact of Adding Image Modality to Recipe Recommendation

In our model, there are two types of input data: recipe text and recipe images. We fuse the two at the feature level, using images to improve model performance. To verify that recipe images improve model performance, we ran an experiment with only the recipe text modality; the results in Table 3 show that using recipe images to improve model performance is effective.

5.4.2. The Impact of Linear Transformation in GraphSAGE with Different Output Dimensions

We were inspired by the paper [50], whose authors report that the linear transformation in a GCN has little influence on recommendation. We wanted to know whether this finding applies to our model; however, from the perspective of computational effort, we do not abandon the linear transformation in GraphSAGE, but instead test the impact of different output dimensions of the linear transformation on model performance. The results are shown in Figure 3. The performance changes little across dimensions under the different metrics, so the output dimension of the linear transformation in GraphSAGE has little impact on performance; however, if the dimension is too low, the performance of the model fluctuates slightly, which may be because compressing large data into a low-dimensional space is difficult.

6. Conclusions

In the field of recipe recommendation, there are few studies on the image modality: most focus on the text modality, and modal fusion methods are lacking. In this paper, we present a novel method for recommending recipes to users. Unlike previous methods, we fuse the recipe image and text modalities at the feature level, using images to promote subsequent high-level feature extraction and aggregation, and then adopt GraphSAGE to extract high-level features; the results of the ablation experiment show that our method is effective. We also compared multimodal input with single-modality input and found that using multiple modalities for recipe recommendation outperforms using a single modality.

Author Contributions

Conceptualization and methodology, H.H. and R.O.; visualization, H.H.; writing—original draft preparation, H.H. and R.O.; modeling, R.O.; analysis of experimental results, R.O.; writing—review and editing, H.H., R.O. and W.O.; software, H.H. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the High-level Innovative Talents in Guizhou Province (Grant No. GCC[2023]033), the Natural Science Research Project of Department of Education of Guizhou Province (Grant No. QJJ[2024]009, Grant No. QJJ[2023]011).

Data Availability Statement

Datasets used in this paper are open source and publicly available. MealRec is openly available in Github at https://github.com/WUT-IDEA/MealRec, accessed on 29 September 2022.

Acknowledgments

The authors are thankful to the anonymous reviewers and editors for their valuable comments and suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Harvey, M.; Ludwig, B.; Elsweiler, D. Learning User Tastes: A First Step to Generating Healthy Meal Plans? In Proceedings of the First International Workshop on Recommendation Technologies for Lifestyle Change (Lifestyle 2012), Dublin, Ireland, 13 September 2012; Volume 18. [Google Scholar]
  2. Trattner, C.; Elsweiler, D. Investigating the Healthiness of Internet-Sourced Recipes: Implications for Meal Planning and Recommender Systems. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 May 2017; pp. 489–498. [Google Scholar]
  3. Ge, M.; Ricci, F.; Massimo, D. Health-aware Food Recommender System. In Proceedings of the 9th ACM Conference on Recommender Systems, Vienna, Austria, 16–20 September 2015; pp. 333–334. [Google Scholar]
  4. Marshall, J.; Jimenez-Pazmino, P.; Metoyer, R.; Chawla, N.V. A Survey on Healthy Food Decision Influences through Technological Innovations. ACM Trans. Comput. Healthc. 2022, 3, 1–27. [Google Scholar] [CrossRef]
  5. Harashima, J.; Ariga, M.; Murata, K.; Ioki, M. A Large-scale Recipe and Meal Data Collection as Infrastructure for Food Research. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 23–28 May 2016; Calzolari, N., Choukri, K., Declerck, T., Goggi, S., Grobelnik, M., Maegaard, B., Mariani, J., Mazo, H., Moreno, A., Odijk, J., et al., Eds.; European Language Resources Association: Paris, France, 2016. [Google Scholar]
  6. Teng, C.; Lin, Y.R.; Adamic, L.A. Recipe recommendation using ingredient networks. In Proceedings of the Web Science 2012, WebSci ‘12, Evanston, IL, USA, 22–24 June 2012; Contractor, N.S., Uzzi, B., Macy, M.W., Nejdl, W., Eds.; ACM: New York, NY, USA, 2012; pp. 298–307. [Google Scholar] [CrossRef]
  7. Min, W.; Bao, B.K.; Mei, S.; Zhu, Y.; Rui, Y.; Jiang, S. You are what you eat: Exploring rich recipe information for cross-region food analysis. IEEE Trans. Multimed. 2017, 20, 950–964. [Google Scholar] [CrossRef]
  8. Min, W.; Jiang, S.; Jain, R. Food Recommendation: Framework, Existing Solutions, and Challenges. IEEE Trans. Multimed. 2019, 22, 2659–2671. [Google Scholar] [CrossRef]
  9. Tian, Y.; Zhang, C.; Metoyer, R.A.; Chawla, N.V. Recipe Recommendation with Hierarchical Graph Attention Network. Front. Big Data 2021, 4, 778417. [Google Scholar] [CrossRef] [PubMed]
  10. Pacífico, L.D.S.; Britto, L.F.S.; Ludermir, T.B. Ingredient Substitute Recommendation Based on Collaborative Filtering and Recipe Context for Automatic Allergy-Safe Recipe Generation. In Proceedings of the WebMedia ‘21: Brazilian Symposium on Multimedia and the Web, Belo Horizonte, Minas Gerais, Brazil, 5–12 November 2021; Pereira, A.C.M., da Rocha, L.C.D., Eds.; ACM: New York, NY, USA, 2021; pp. 97–104. [Google Scholar] [CrossRef]
  11. Jia, N.; Chen, J.; Wang, R. An Attention-Based Convolutional Neural Network for Recipe Recommendation. Expert Syst. Appl. 2022, 201, 116979. [Google Scholar] [CrossRef]
  12. Tian, Y.; Zhang, C.; Guo, Z.; Huang, C.; Metoyer, R.A.; Chawla, N.V. RecipeRec: A Heterogeneous Graph Learning Model for Recipe Recommendation. In Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, Vienna, Austria, 23–29 July 2022. [Google Scholar]
  13. Chavan, P.; Thoms, B.; Isaacs, J. A Recommender System for Healthy Food Choices: Building a Hybrid Model for Recipe Recommendations using Big Data Sets. In Proceedings of the 54th Hawaii International Conference on System Sciences, HICSS 2021, Kauai, HI, USA, 5 January 2021; pp. 1–10. [Google Scholar]
  14. Ranaldi, L.; Pucci, G. Knowing knowledge: Epistemological study of knowledge in transformers. Appl. Sci. 2023, 13, 677. [Google Scholar] [CrossRef]
  15. Hamilton, W.L.; Ying, Z.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; Guyon, I., von Luxburg, U., Bengio, S., Wallach, H.M., Fergus, R., Vishwanathan, S.V.N., Garnett, R., Eds.; Curran Associates, Inc.: New York, NY, USA, 2017; pp. 1025–1035. [Google Scholar]
  16. Chum, O.; Zisserman, A. An Exemplar Model for Learning Object Classes. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8. [Google Scholar]
  17. Melville, P.; Mooney, R.J.; Nagarajan, R. Content-Boosted Collaborative Filtering for Improved Recommendations. Proc. Eighteenth Natl. Conf. Artif. Intell. (AAAI) 2002, 23, 187–192. [Google Scholar]
  18. Pazzani, M.J.; Billsus, D. Content-Based Recommendation Systems. In The Adaptive Web: Methods and Strategies of Web Personalization; Springer: Berlin/Heidelberg, Germany, 2007; pp. 325–341. [Google Scholar]
  19. Linden, G.; Smith, B.; York, J. Amazon.com Recommendations: Item-to-Item Collaborative Filtering. IEEE Internet Comput. 2003, 7, 76–80. [Google Scholar] [CrossRef]
  20. Adomavicius, G.; Tuzhilin, A. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng. 2005, 17, 734–749. [Google Scholar] [CrossRef]
  21. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001; pp. 285–295. [Google Scholar]
  22. Su, X.; Khoshgoftaar, T.M. A Survey of Collaborative Filtering Techniques. Adv. Artif. Intell. 2009, 2009, 421425. [Google Scholar] [CrossRef]
  23. Li, S.; Kawale, J.; Fu, Y. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, 19–23 October 2015; pp. 811–820. [Google Scholar]
  24. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  25. Andrade-Ruiz, G.; Carrasco, R.A.; Porcel, C.; Serrano-Guerrero, J.; Mata, F.; Arias-Oliva, M. Emerging Perspectives on the Application of Recommender Systems in Smart Cities. Electronics 2024, 13, 1249. [Google Scholar] [CrossRef]
  26. Pecune, F.; Callebert, L.; Marsella, S. A Recommender System for Healthy and Personalized Recipes Recommendations. In Proceedings of the 5th International Workshop on Health Recommender Systems Co-Located with the 14th ACM Conference on Recommender Systems 2020 (RecSys 2020), Online, 26 September 2020; pp. 15–20. [Google Scholar]
  27. Ma, X.; Gao, Z.; Hu, Q.; Abdelhady, M. Contrastive Knowledge Graph Attention Network for Request-Based Recipe Recommendation. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 7–13 May 2022; pp. 3278–3282. [Google Scholar]
  28. Morol, M.K.; Rokon, M.S.J.; Hasan, I.B.; Saif, A.; Khan, R.H.; Das, S.S. Food Recipe Recommendation Based on Ingredients Detection Using Deep Learning. In Proceedings of the 2nd International Conference on Computing Advancements, Dhaka, Bangladesh, 10–12 March 2022; pp. 191–198. [Google Scholar]
  29. Peng, J.; Gong, J.; Zhou, C.; Zang, Q.; Fang, X.; Yang, K.; Yu, J. KGCFRec: Improving Collaborative Filtering Recommendation with Knowledge Graph. Electronics 2024, 13, 1927. [Google Scholar] [CrossRef]
  30. Spoladore, D.; Colombo, V.; Arlati, S.; Mahroo, A.; Trombetta, A.; Sacco, M. An Ontology-Based Framework for a Telehealthcare System to Foster Healthy Nutrition and Active Lifestyle in Older Adults. Electronics 2021, 10, 2129. [Google Scholar] [CrossRef]
  31. Saad, M.H.M.; Hamdan, N.M.; Sarker, M.R. State of the Art of Urban Smart Vertical Farming Automation System: Advanced Topologies, Issues and Recommendations. Electronics 2021, 10, 1422. [Google Scholar] [CrossRef]
  32. Chen, S.; Cao, Q.; Cai, Y. Blockchain for Healthcare Games Management. Electronics 2023, 12, 3195. [Google Scholar] [CrossRef]
  33. Zhang, L.; Kim, D. A Peer-to-Peer Smart Food Delivery Platform Based on Smart Contract. Electronics 2022, 11, 1806. [Google Scholar] [CrossRef]
  34. Berkovsky, S.; Freyne, J. Group-Based Recipe Recommendations: Analysis of Data Aggregation Strategies. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; pp. 111–118. [Google Scholar]
  35. Forbes, P.; Zhu, M. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In Proceedings of the 2011 ACM Conference on Recommender Systems, RecSys 2011, Chicago, IL, USA, 23–27 October 2011; Mobasher, B., Burke, R.D., Jannach, D., Adomavicius, G., Eds.; ACM: Tokyo, Japan, 2011; pp. 261–264. [Google Scholar] [CrossRef]
  36. Bilgin, A.; Hagras, H.; van Helvert, J.; Al-Ghazzawi, D. A Linear General Type-2 Fuzzy-Logic-Based Computing with Words Approach for Realizing an Ambient Intelligent Platform for Cooking Recipe Recommendation. IEEE T. Fuzzy Syst. 2016, 24, 306–329. [Google Scholar] [CrossRef]
  37. Freyne, J.; Berkovsky, S. Intelligent Food Planning: Personalized Recipe Recommendation. In Proceedings of the 15th International Conference on Intelligent User Interfaces, Hong Kong, China, 7–10 February 2010; pp. 321–324. [Google Scholar]
  38. Elsweiler, D.; Trattner, C.; Harvey, M. Exploiting Food Choice Biases for Healthier Recipe Recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Shinjuku, Tokyo, Japan, 7–11 August 2017; Kando, N., Sakai, T., Joho, H., Li, H., de Vries, A.P., White, R.W., Eds.; ACM: Tokyo, Japan, 2017; pp. 575–584. [Google Scholar] [CrossRef]
  39. Ozeki, S.; Kotera, M.; Ishiguro, K.; Nishimura, T.; Higuchi, K. Recipe Recommendation for Balancing Ingredient Preference and Daily Nutrients. In Proceedings of the CEA++@MM 2022: Proceedings of the 1st International Workshop on Multimedia for Cooking, Eating, and Related APPlications, Lisboa, Portugal, 10 October 2022; Yamakata, Y., Hashimoto, A., Chen, J., Eds.; ACM: Tokyo, Japan, 2022; pp. 11–19. [Google Scholar] [CrossRef]
  40. Yuan, J.; Roy Chowdhury, P.K.; McKee, J.; Yang, H.L.; Weaver, J.; Bhaduri, B. Exploiting Deep Learning and Volunteered Geographic Information for Mapping Buildings in Kano, Nigeria. Sci. Data 2018, 5, 180217. [Google Scholar] [CrossRef]
  41. Li, M.; Li, L.; Xie, Q.; Yuan, J.; Tao, X. MealRec: A Meal Recommendation Dataset. arXiv 2022, arXiv:2205.12133. [Google Scholar]
  42. Lin, C.J.; Kuo, T.T.; Lin, S.D. A Content-Based Matrix Factorization Model for Recipe Recommendation. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 18th Pacific-Asia Conference, Tainan, Taiwan, 13–16 May 2014; pp. 560–571. [Google Scholar]
  43. Khan, M.A.; Rushe, E.; Smyth, B.; Coyle, D. Personalized, Health-Aware Recipe Recommendation: An Ensemble Topic Modeling Based Approach. arXiv 2019, arXiv:1908.00148. [Google Scholar]
  44. Freyne, J.; Berkovsky, S.; Smith, G. Recipe Recommendation: Accuracy and Reasoning. In Proceedings of the User Modeling, Adaption and Personalization: 19th International Conference, Girona, Spain, 11–15 July 2011; pp. 99–110. [Google Scholar]
  45. Starke, A.; Trattner, C.; Bakken, H.; Johannessen, M.; Solberg, V. The Cholesterol Factor: Balancing Accuracy and Health in Recipe Recommendation through a Nutrient-Specific Metric. In Proceedings of the 1st Workshop on Multi-Objective Recommender Systems, Amsterdam, The Netherlands, 25 September 2021. [Google Scholar]
  46. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  47. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 2004, 22, 5–53. [Google Scholar] [CrossRef]
  48. Wang, C.; Blei, D.M. Collaborative Topic Modeling for Recommending Scientific Articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 448–456. [Google Scholar]
  49. Konstan, J.A.; Miller, B.N.; Maltz, D.; Herlocker, J.L.; Gordon, L.R.; Riedl, J. GroupLens: Applying Collaborative Filtering to Usenet News. Commun. ACM 1997, 40, 77–87. [Google Scholar] [CrossRef]
  50. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.D.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2020, Virtual, 25–30 July 2020; Huang, J., Chang, Y., Cheng, X., Kamps, J., Murdock, V., Wen, J.R., Liu, Y., Eds.; ACM: Tokyo, Japan, 2020; pp. 639–648. [Google Scholar] [CrossRef]
  51. Fan, W.; Liu, X.; Jin, W.; Zhao, X.; Tang, J.; Li, Q. Graph Trend Filtering Networks for Recommendations. In Proceedings of the SIGIR ‘22: The 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Amigó, E., Castells, P., Gonzalo, J., Carterette, B., Culpepper, J.S., Kazai, G., Eds.; ACM: Tokyo, Japan, 2022; pp. 112–121. [Google Scholar] [CrossRef]
  52. Fan, Z.; Xu, K.; Dong, Z.; Peng, H.; Zhang, J.; Yu, P.S. Graph Collaborative Signals Denoising and Augmentation for Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2023, Taipei, Taiwan, 23–27 July 2023; Chen, H.H., Duh, W.J.E., Huang, H.H., Kato, M.P., Mothe, J., Poblete, B., Eds.; ACM: Tokyo, Japan, 2023; pp. 2037–2041. [Google Scholar] [CrossRef]
Figure 1. Motivation figure with a modal fusion example. Most multimodal fusion methods directly extract text and image features and fuse them without fully considering the granularity of the features, which makes it difficult to truly fuse the semantics of different modalities. In the text feature, 1 indicates that an ingredient is present and 0 that it is absent. In the image feature, the numbers represent pixel values.
Figure 2. The framework of the MHGRR. We use historical user ratings to construct a graph and input the graph and ingredient text into an embedding layer to obtain the recipe text feature and user feature. We then adopt a 2-layer CNN to obtain the recipe image feature; subsequently, we concatenate the recipe text feature and recipe image feature and fuse the concatenated features with a linear layer. Finally, a 2-layer GraphSAGE extracts the final user and recipe features.
Figure 3. Impact of the output dimension of the linear transformation in GraphSAGE. The three panels show the performance of GraphSAGE under ACC, MAE, and RMSE, respectively, for different output dimensions of the linear transformation.
Table 1. Description of the dataset.

Item | Description | Example
recipe name | name of the recipe | Coconut Poke Cake
image url | the url of the recipe image | images.media-allrecipes.com/334118.jpg
ingredients | recipe ingredients | white cake mix; cream of coconut…
rating | user ratings for recipes | user id: 19; recipe id: 59; rating: 4
Table 2. A comparison of the performance among the MHGRR and baselines.

Methods | ACC | MAE | RMSE
CF | 5.44% | 0.8827 | 0.8891
CFR | 4.62% | 0.8903 | 0.8964
CMF | 55.57% | 0.4166 | 0.4899
LightGCN | 85.59% | 0.1272 | 0.1990
GTN | 88.13% | 0.1102 | 0.1406
GraphDA | 86.96% | 0.1258 | 0.1443
MHGRR (Ours) | 90.73% | 0.0811 | 0.0923

The data in bold indicate the best performance among all methods.
Table 3. Performance of the MHGRR and the text-only MHGRR based on ACC, MAE, and RMSE.

Input Modal | ACC | MAE | RMSE
Only text modal | 89.59% | 0.0921 | 0.1448
Text modal and image modal | 90.73% | 0.0811 | 0.0923

The data in bold indicate the better performance between the two input modalities.