1. Introduction
With the exponential growth of Internet technology, e-commerce platforms have emerged as a fundamental component of the modern business landscape [1], establishing themselves as the dominant channel for consumer transactions. This digital transformation has profoundly revolutionized the conventional distribution paradigm of minor agricultural products. The increasing integration of agricultural commodities into digital marketplaces through e-commerce platforms has given rise to a novel distribution model: rural e-commerce. Functioning as an innovative distribution channel within the agricultural industry chain, rural e-commerce platforms have become instrumental in driving the comprehensive transformation of the agricultural value chain, enhancing industry development standards, and accelerating agricultural modernization processes.
The digital transformation of agricultural product distribution is evidenced by the substantial and consistent growth in online retail sales of agricultural products in China. Official data from the Ministry of Commerce indicate a notable upward trajectory, with agricultural product e-commerce retail sales reaching CNY 486.7 billion in 2021, subsequently expanding to CNY 531.38 billion in 2022, and further growing to CNY 587.03 billion in 2023. This progressive annual increase reflects not merely quantitative market expansion but also signifies the deepening integration of digital technologies into traditional agricultural value chains. The consistent growth pattern demonstrates the resilience and increasing market penetration of agricultural e-commerce, suggesting a structural shift in consumer purchasing behavior toward digital channels for agricultural products. This evolution presents both significant opportunities and unique challenges for stakeholders across the agricultural sector, particularly regarding product recommendation systems that must effectively accommodate the distinctive characteristics of agricultural commodities within digital marketplaces.
Rural e-commerce platforms serve as integrated hubs that connect various segments of the agricultural industry, encompassing production, processing, warehousing, and logistics, thereby establishing an efficient and streamlined marketplace for transactions between minor agricultural product suppliers (merchants) and consumers (users). However, these platforms face significant operational challenges due to the distinctive characteristics of minor agricultural product sales, particularly the pronounced seasonality in user demand and complex geographical distribution patterns. Previous research has demonstrated that the development of rural e-commerce is substantially influenced by multiple factors, including infrastructure conditions, individual characteristics, social networks, and resource endowments [2]. These multifaceted challenges have, to a considerable extent, impeded the broader adoption and advancement of rural e-commerce platforms.
To address the fundamental challenges impeding the development of rural e-commerce, it is essential to conduct a comprehensive analysis of user–merchant transaction patterns on rural e-commerce platforms. Karpushkin [3] demonstrates that consumer behavior prediction and personalized recommendation generation can be effectively achieved through the integration of user-generated content analysis and sociological survey data. Following this methodology, deep mining of users’ agricultural product consumption patterns enables the recommendation of products that not only align with individual preferences but also accommodate seasonal and regional characteristics, thereby facilitating precise minor agricultural product marketing and distribution. The implementation of such a system addresses an urgent need in rural e-commerce platforms and represents a critical factor in promoting the sustainable development of rural e-commerce ecosystems. As evidenced by Huang and Yu [4], consumer purchasing behavior is shaped by both internal and external factors, where internal factors constitute the foundation of purchase intention, while external factors act as catalytic conditions.
In the context of rural e-commerce platforms, this study specifically focuses on scallions and garlic as representative minor agricultural products to conduct a comprehensive analysis of their sales patterns and characteristics. These ingredients, being fundamental seasonings in Chinese cuisine, demonstrate not only extensive culinary applications but also maintain consistent and widespread market demand, thereby serving as ideal candidates for investigating the sales models of minor agricultural products in rural e-commerce. Through the development of a user–agricultural product preference model, this research aims to systematically analyze users’ consumption preferences and subsequently generate accurate, personalized product recommendations. This approach ultimately facilitates precise recommendation generation and sales optimization within the rural e-commerce ecosystem.
Recommender systems have emerged as one of the most effective tools for mining user preferences and implementing precise recommendations for online resources. These systems have been extensively deployed in information management systems across diverse domains, including movies, music, videos, and commercial products. In these applications, recommender systems have demonstrated remarkable capabilities in accurately extracting user preferences, efficiently managing vast resource repositories for recommendations, and optimizing marketing strategies. Therefore, the integration of recommender systems into rural e-commerce platforms for mining user preferences in minor agricultural products and implementing precise recommendations provides robust theoretical support for developing targeted sales and recommendation mechanisms in rural e-commerce ecosystems.
Nevertheless, the implementation of recommender systems in the agricultural domain, particularly for minor agricultural products, encounters several significant challenges for the following reasons:
Capturing Diverse User Preferences with Precision: Users exhibit significant variations in their preferences for minor agricultural products, influenced by factors such as dietary habits and cooking practices. Previous research has demonstrated that analyzing historical interaction patterns is an effective approach to modeling user preferences. For instance, He et al. [5] introduced FISMs (factored item similarity models), which leverage implicit item similarity relationships to capture user preferences and have demonstrated superior performance in top-N recommendation tasks. However, in the context of minor agricultural product recommendations, developing comprehensive and granular methods to capture and analyze such diverse preferences for more accurate preference modeling remains a significant research challenge.
Integration of Heterogeneous Information Poses Significant Challenges: The incorporation of contextual user information has emerged as a crucial strategy in delivering personalized minor agricultural product recommendations. However, the recommendation process inherently involves processing extensive heterogeneous data sources, encompassing both user-specific attributes and minor agricultural product characteristics. This heterogeneity substantially increases the complexity of the recommendation system. Consequently, the effective integration of such diverse information streams into a unified computational model remains a significant technical challenge in the field.
Temporal–Spatial Sensitivity and Dynamic Adjustment: The demand for minor agricultural products demonstrates distinctive temporal and spatial characteristics, manifesting in significant seasonal and regional variations. These characteristics can result in substantial fluctuations or temporary cessation of user demand for specific products during particular time periods. Consequently, the accurate integration of temporal and geographical information into the recommendation process, coupled with the ability to dynamically adjust recommendation results accordingly, presents a significant challenge in minor agricultural product recommendation systems.
Data Sparsity and Cold-Start Problems: Data sparsity and cold-start problems present significant challenges in minor agricultural product recommendation systems. The extremely limited user–item interaction data lead to severe sparsity issues in the data matrix. Furthermore, the cold-start problem, which occurs when new users enter the system or new products are introduced, compounds these challenges by lacking historical interaction data for accurate recommendations.
To address these challenges, we propose a minor agricultural product personalized recommendation framework integrating graph convolutional networks and collaborative filtering (APGCN-CF). This novel framework is specifically designed to generate personalized product recommendation lists that accurately reflect users’ preferences in minor agricultural product purchasing. By leveraging graph convolutional neural networks, our framework effectively aggregates and fuses latent features from users and their neighboring nodes in the user–product interaction graph. Through comprehensive analysis of historical user–product interaction data, the framework precisely captures and models user preferences for minor agricultural products, ultimately generating highly personalized recommendation lists that align with individual user preferences. The innovation of our approach lies not in any single component in isolation but in the synergistic integration of four carefully designed modules: graph structure construction, convolution-based feature learning, user similarity-based candidate generation, and preference probability-driven precise recommendation, creating a comprehensive pipeline specifically tailored to address the unique challenges in minor agricultural product recommendation.
Furthermore, we construct a comprehensive graph structure that incorporates the essential characteristics of minor agricultural products and user profiles and review behavioral patterns. This structure enables in-depth analysis of both seasonal and regional attributes of minor agricultural products, thereby effectively mitigating the impact of seasonality and geographical constraints on purchasing patterns. The proposed graph-based approach efficiently captures the temporal and spatial correlations between users and products, significantly enhancing the expressiveness of feature representations. By leveraging the rich interconnections between user preferences and product attributes within the graph structure, our method effectively addresses the data sparsity challenge inherent in agricultural recommendation systems. Moreover, through the integration of historical user reviews and multi-dimensional product feature information, the system successfully mitigates the cold-start problem for both newly registered users and newly listed minor agricultural products.
To evaluate the effectiveness of APGCN-CF, we conduct comprehensive experiments using real-world minor agricultural product datasets, specifically focusing on scallions and garlic as representative cases. These seasonings were selected for their distinctive characteristics in agricultural e-commerce: they exhibit significant seasonal harvesting patterns that directly impact availability and pricing, and their consumption shows strong regional preferences influenced by local culinary traditions. Unlike staple agricultural commodities with universal demand patterns, these minor products present unique recommendation challenges due to their heterogeneous user preference distributions. The experimental results demonstrate that our approach, which effectively captures these spatio-temporal dependencies and regional preference variations, significantly outperforms state-of-the-art baseline methods in terms of recommendation accuracy, recall rate, and cold-start problem mitigation.
The main contributions of this work can be summarized as follows:
The proposed APGCN-CF framework comprehensively incorporates both seasonal and geographical characteristics into minor agricultural product recommendations. We construct a graph structure that establishes connections between users and minor agricultural products, implementing an effective framework that mitigates the negative impacts of multi-source heterogeneous data on model performance.
In this study, we propose an innovative approach by incorporating graph convolutional networks (GCN) to process minor agricultural product recommendation data. Through effective feature aggregation mechanisms, we establish meaningful connections between user preferences and minor agricultural product characteristics, thereby significantly enhancing the accuracy and personalization of the recommendation system.
The proposed framework addresses conventional challenges inherent in recommender systems, particularly data sparsity and cold-start issues. Our approach serves as a versatile plugin module that can be integrated with existing collaborative filtering methodologies to enhance their performance. Moreover, by taking seasonal and geographical factors into account, the framework demonstrates substantial improvements in recommendation accuracy and effectiveness.
We conduct comprehensive experiments on real-world minor agricultural product datasets to evaluate the proposed APGCN-CF framework. The experimental results demonstrate that our method consistently achieves superior performance compared to state-of-the-art baseline approaches across multiple evaluation metrics, including precision, recall, and F1-score.
The remainder of this paper is organized as follows. Section 2 provides a comprehensive review of related work, focusing on content-based filtering, collaborative filtering, and minor agricultural product recommendation systems. Section 3 delineates the mathematical notation and formal problem definitions. Section 4 presents a detailed description of our proposed APGCN-CF framework, encompassing the model architecture and optimization methodology. In Section 5, we evaluate the framework through extensive experiments on real-world datasets, analyzing its overall performance with particular emphasis on addressing data sparsity and cold-start challenges. Finally, Section 6 concludes the paper by summarizing our contributions and discussing directions for future research.
4. Proposed Work
4.1. Framework
With the advancement of mobile internet technologies, rural e-commerce platforms have emerged as crucial channels for minor agricultural product distribution. However, conventional recommendation approaches face significant challenges in delivering precise, personalized recommendations due to the distinct characteristics of minor agricultural product sales, including strong seasonal and regional dependencies coupled with heterogeneous user preferences. As shown by Gao et al. [15], graph neural networks (GNNs) are particularly capable of capturing complex higher-order interaction patterns between users and items through iterative feature propagation mechanisms that aggregate embedding representations from neighboring nodes. Leveraging this advantage, this study presents a minor agricultural product personalized recommendation framework that integrates graph convolutional networks with collaborative filtering (APGCN-CF). The proposed framework adapts to minor agricultural product recommendation scenarios through the construction of unified dimensional homogeneous graph structures and the implementation of specialized similarity computation modules.
As shown in Figure 1, APGCN-CF consists of four closely integrated functional modules.
Graph Structure Construction Module: This module generates unified dimensional embedding representations by integrating diverse node and edge features. For user nodes, it incorporates geographic location encodings and historical review text embeddings. For item nodes, it processes item name embeddings and sales price information. Edge features encompass user–item interactions, including ratings and review timestamps. Through systematic feature dimension alignment, this module transforms heterogeneous information into a homogeneous graph structure amenable to graph convolution operations. This unified data representation framework establishes a robust foundation for subsequent deep feature learning processes.
Graph Convolutional Network Module: This module employs the constructed homogeneous graph as input to perform node representation learning through carefully designed multi-layer graph convolutional networks. During the convolution process, each layer iteratively updates and optimizes the hidden representations of nodes through sophisticated message passing and feature aggregation mechanisms. Following multiple layers of propagation, the module generates user and item node embedding vectors that effectively capture high-order interaction information within the graph structure. As demonstrated by Gao et al. [15], feature learning based on graph structures can significantly mitigate the data sparsity challenge, primarily because graph neural networks can comprehensively leverage high-order neighborhood information through their inherent message-passing mechanisms.
Candidate Set Generation Module Based on User Similarity: This module utilizes user embedding vectors derived from graph convolutional networks to construct high-quality candidate sets through inter-user similarity analysis. The generation process employs cosine similarity metrics to identify the most similar user clusters for each target user, subsequently aggregating historical interaction items from these similar users to form comprehensive candidate sets. By systematically mining interaction patterns from users with similar preferences, the module generates personalized candidate pools that effectively capture potential user interests. The implementation of computationally efficient cosine similarity calculations results in low computational overhead, enabling rapid preliminary screening of candidate items and establishing a robust foundation for subsequent refined personalization in recommendation modules.
Preference Probability-Based Precise Recommendation Module: This module employs a multi-layer perceptron (MLP) network that has been pre-trained on historical user–agricultural product interaction data to generate refined personalized recommendations. The network processes concatenated user–agricultural product embedding vectors as input and generates probability scores indicating user preferences for minor agricultural products. During the recommendation process, the module performs feature fusion by concatenating user and minor agricultural product embedding vectors, which are subsequently processed through the pre-trained MLP network to model the intricate matching relationships between users and minor agricultural products across multiple neural network layers. The module computes preference probabilities for all candidate minor agricultural products and implements a top-N selection strategy to generate personalized recommendation lists that accurately capture user interests. Through the integration of high-dimensional user and minor agricultural product feature representations, this module effectively encodes user preference patterns, yielding high-quality personalized recommendations. The MLP-based preference modeling architecture demonstrates superior capability in learning complex non-linear user–agricultural product interaction patterns that conventional matching methods struggle to capture, thereby substantially enhancing recommendation accuracy.
In summary, the four modules operate sequentially as a pipeline. The graph structure construction module integrates heterogeneous features from multiple sources, including user geographical locations, historical reviews, product nomenclature, and pricing information, to generate unified dimensional embeddings, thereby establishing a standardized data foundation for subsequent processing. The graph convolutional network module implements multi-layer convolutional architectures and message propagation mechanisms to extract deep feature representations of nodes, mitigating the data sparsity challenge. The candidate set generation module employs cosine similarity metrics to identify preference correlations between users, facilitating the rapid construction of personalized candidate pools based on user similarity. The precise recommendation module utilizes a pre-trained multi-layer perceptron network to integrate user–product embedding vectors, generating personalized recommendation lists through feature fusion. Specifically, the user similarity-based candidate generation model (USCG) and the preference probability-based precise recommendation model (PPPR) work together: USCG uses inexpensive cosine similarity computations to compress the entire product catalog into a candidate set as a preliminary screening step, so that PPPR only needs to conduct refined recommendations on this smaller candidate set. This design keeps computation efficient and avoids the high cost of scoring the full catalog in large-scale application scenarios, while fully leveraging the feature vectors output by the graph convolutional network to ensure recommendation accuracy.
This modular architecture leverages the inherent advantages of graph neural networks in feature learning and relationship modeling while achieving precise recommendations for minor agricultural products through specialized similarity computations and multi-layer neural networks. The detailed implementation of each module is elaborated on in the following subsections.
4.2. Graph Structure Construction Model
Let $U$ denote the set of users and $P$ denote the set of minor agricultural products in the recommendation system. The interaction relationships between users and products naturally form a bipartite graph structure $G = (V, E)$, where the nodes $V = U \cup P$ represent users and minor agricultural products, respectively, and the edges $E$ indicate the explicit interaction behaviors between them. The proposed model aims to transform heterogeneous information into a homogeneous graph representation through a unified feature processing framework.
For a user node $u \in U$, the initial feature representation vector $\mathbf{x}_u$ is constructed by incorporating multiple user attributes, including geographical location, historical behavioral patterns, etc. For a minor agricultural product node $p \in P$, the initial feature representation vector $\mathbf{x}_p$ comprises fundamental product characteristics, such as the product’s name, store affiliation, and price information. Regarding edge features, we encode user–product interaction data, specifically ratings, timestamps, and review content, into the edge feature vector $\mathbf{e}_{up}$. As illustrated in Figure 2, our feature transformation framework systematically processes these heterogeneous information sources into a unified graph structure.
As shown in the above figure, the graph structure construction model integrates multi-source heterogeneous information into a unified graph structure. This feature construction approach facilitates the conversion of heterogeneous information into a homogeneous graph structure, establishing a coherent computational foundation for subsequent graph convolutional network operations.
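The dimension-alignment idea can be sketched as follows. This is only an illustration under stated assumptions: the helper `align_features`, the shared dimension of 8, the toy feature widths, and the random linear projections are all hypothetical (in a trained model the projections would be learned), and edge features are omitted for brevity.

```python
import numpy as np

def align_features(raw, out_dim, rng):
    """Project a raw feature matrix (n x d_raw) to a shared width (n x out_dim).

    The projection is random here purely for illustration; a real model
    would learn it.
    """
    d_raw = raw.shape[1]
    weight = rng.normal(scale=1.0 / np.sqrt(d_raw), size=(d_raw, out_dim))
    return raw @ weight

rng = np.random.default_rng(0)
SHARED_DIM = 8  # hypothetical unified embedding size

# Toy raw features with different widths (all values are made up).
user_raw = rng.normal(size=(3, 5))     # e.g. region encoding + review embedding
product_raw = rng.normal(size=(4, 7))  # e.g. name embedding + store + price

user_x = align_features(user_raw, SHARED_DIM, rng)
product_x = align_features(product_raw, SHARED_DIM, rng)

# Users first, then products, in a single node-feature matrix.
node_x = np.vstack([user_x, product_x])

# Interactions as (user index, product index). Product indices are offset
# by the number of users so that all nodes share one index space.
interactions = [(0, 0), (0, 2), (1, 1), (2, 3)]
n_users = user_x.shape[0]
adj = np.zeros((node_x.shape[0], node_x.shape[0]))
for u, p in interactions:
    adj[u, n_users + p] = 1.0
    adj[n_users + p, u] = 1.0  # undirected user-product edge
```

Once every node has a feature vector of the same width, users and products can be treated as nodes of a single homogeneous graph, which is what a standard graph convolution expects as input.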
4.3. Graph Convolution Model
To effectively capture higher-order feature patterns in user–agricultural product interactions, this study proposes a multi-layer graph convolutional network (GCN) architecture for node representation learning. Let $G = (V, E)$ represent the user–agricultural product interaction graph, where nodes correspond to users and minor agricultural products, while edges indicate the interaction behaviors between them. In this framework, we define $\mathbf{x}_u$ as the initial feature representation vector of user $u$ and $\mathbf{x}_p$ as the feature representation vector of minor agricultural product $p$. The interaction between user $u$ and minor agricultural product $p$ is characterized by the edge feature vector $\mathbf{e}_{up}$.
In the $k$-th layer of graph convolution, node features are updated by aggregating information from neighboring nodes. Here, $i$ and $j$ denote arbitrary nodes, and $\mathcal{N}(i)$ represents the set of neighboring nodes directly connected to node $i$. For user nodes $u$, $\mathcal{N}(u)$ corresponds to the connected product nodes $p$, while for product nodes $p$, $\mathcal{N}(p)$ represents the connected user nodes $u$. The feature representation $\mathbf{h}_i^{(k)}$ at layer $k$ is updated as follows:
$$\mathbf{h}_i^{(k)} = \sigma\left( \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} \mathbf{W}^{(k)} \mathbf{h}_j^{(k-1)} \right)$$
where $\mathbf{h}_i^{(0)}$ is the initial feature vector of node $i$ and $\mathbf{W}^{(k)}$ represents the trainable weight matrix at layer $k$. The normalization factor $1/c_{ij}$ is employed to normalize the contributions from nodes with varying degrees of connectivity, and $\sigma$ represents a non-linear activation function.
Through the iterative stacking of $K$ graph convolutional layers, the final representation vectors $\mathbf{h}_u$ and $\mathbf{h}_p$ of user and minor agricultural product nodes capture both high-order interaction patterns and comprehensive node characteristics. These learned embeddings, which incorporate behavioral features and attribute information, serve as information-rich representations for subsequent personalized recommendation tasks.
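The layer-wise update described in this section can be sketched in a few lines of NumPy. The sketch makes two assumptions that the text does not fix: the normalization factor is taken to be the common symmetric degree normalization $1/\sqrt{|\mathcal{N}(i)||\mathcal{N}(j)|}$, and the activation is ReLU. The function name `gcn_layer`, the toy graph, and the random weights are illustrative only.

```python
import numpy as np

def gcn_layer(adj, h, weight):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 H W).

    adj    : (n x n) symmetric adjacency matrix without self-loops
    h      : (n x d_in) node features from the previous layer
    weight : (d_in x d_out) layer weight matrix
    """
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # isolated nodes keep a zero factor
    norm_adj = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    return np.maximum(norm_adj @ h @ weight, 0.0)

rng = np.random.default_rng(0)

# Toy bipartite graph: users are nodes 0-1, products are nodes 2-3.
adj = np.array(
    [[0, 0, 1, 1],
     [0, 0, 1, 0],
     [1, 1, 0, 0],
     [1, 0, 0, 0]],
    dtype=float,
)
h = rng.normal(size=(4, 8))  # stand-in for the aligned initial features
weights = [rng.normal(scale=0.3, size=(8, 8)) for _ in range(2)]  # two layers

for w in weights:
    h = gcn_layer(adj, h, w)

user_emb, product_emb = h[:2], h[2:]
```

After two layers, each user embedding already mixes information from other users who interacted with the same products, which is the high-order signal the later similarity step relies on.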
4.4. User Similarity-Based Candidate Generation Model (USCG)
To generate high-quality initial recommendation candidate sets, this section employs a candidate generation method based on user similarity analysis. While user similarity computation represents a classical approach in traditional collaborative filtering, the innovation of our research lies in its deep integration with graph convolutional networks. Unlike conventional methods that solely rely on direct user–item interaction data, our approach leverages high-order embedding representations learned from the user–item graph structure through graph convolutional networks. These representations not only encapsulate users’ historical preference information but also capture implicit connection patterns in the network topology through message-passing mechanisms. Through this sophisticated collaborative mechanism, user embedding vectors more comprehensively express users’ latent interests and seasonal and regional preferences, thereby enabling the identification of more precisely similar user groups when quantifying inter-user similarity relationships and subsequently generating more personalized and context-aware candidate item sets for target users.
It is important to highlight that the user similarity pattern generation process in our framework is entirely automated, requiring no manual intervention. The cosine similarity measure between users is formally calculated as follows:
$$\mathrm{sim}(u, v) = \frac{\mathbf{h}_u \cdot \mathbf{h}_v}{\lVert \mathbf{h}_u \rVert \, \lVert \mathbf{h}_v \rVert}$$
where $\mathbf{h}_u$ and $\mathbf{h}_v$ represent the embedding vectors of users $u$ and $v$, respectively, derived from the graph convolutional network module. These embedding vectors are particularly valuable as they encapsulate both explicit user preferences and implicit high-order connectivity patterns learned through the message-passing mechanism of GCNs.
The automation of similarity computation is implemented through vector operations that efficiently calculate pairwise similarities between the target user and all other users in the system. For each target user, the algorithm automatically identifies the k most similar users based on the computed similarity scores and aggregates their historical interactions to form the candidate set. This automated mechanism dynamically adapts to evolving user preferences as new interaction data become available in the system.
By utilizing GCN-derived embedding vectors for similarity computation, our approach captures nuanced preference correlations that conventional collaborative filtering methods might overlook. The resulting candidate set serves as a preliminary filtered pool that effectively narrows the search space for the subsequent preference probability-based precise recommendation module described in Section 4.5, thereby establishing a seamless pipeline from user embedding generation to the final personalized recommendation.
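The candidate generation step can be illustrated with a short sketch: compute cosine similarity between the target user and all other users, keep the k most similar ones, and take the union of their historical items. The function name `candidate_set`, the toy embeddings, and the item identifiers are hypothetical; whether items the target user has already bought are filtered out is not specified in the text, so this sketch leaves them in.

```python
import numpy as np

def candidate_set(target, user_emb, history, k):
    """Union of items interacted with by the k users most similar to `target`.

    user_emb : (n_users x d) embedding matrix
    history  : dict mapping user index -> set of item ids
    """
    norms = np.linalg.norm(user_emb, axis=1, keepdims=True)
    unit = user_emb / np.clip(norms, 1e-12, None)  # guard against zero vectors
    sims = unit @ unit[target]                     # cosine similarity to target
    sims[target] = -np.inf                         # never pick the target itself
    neighbours = np.argsort(-sims)[:k]
    candidates = set()
    for v in neighbours:
        candidates |= history[int(v)]
    return candidates, neighbours

# Toy data: four users with 2-dimensional embeddings (values are made up).
user_emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.3]])
history = {0: {"p1"}, 1: {"p2", "p3"}, 2: {"p4"}, 3: {"p3", "p5"}}

candidates, neighbours = candidate_set(0, user_emb, history, k=2)
```

In this toy case users 1 and 3 point in nearly the same direction as user 0, so their items form the candidate pool, while the item of the orthogonal user 2 is left out.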
4.5. Preference Probability-Based Precise Recommendation (PPPR)
To accurately identify items that optimally align with users’ preferences from the candidate pool, this section introduces a novel recommendation methodology based on preference probability estimation. The proposed approach leverages a pre-trained multi-layer perceptron (MLP) architecture, which learns from historical user–item interaction data to model the compatibility between user–item pairs, thereby generating personalized top-N recommendations with enhanced precision.
For a target user $u$ and items in the candidate set $C_u$, we construct the following preference prediction model:
$$\hat{y}_{up} = \mathrm{MLP}\left(\mathbf{h}_u \oplus \mathbf{h}_p\right), \quad p \in C_u$$
$$R_u = \operatorname{TopN}\left(\{\hat{y}_{up} : p \in C_u\}\right)$$
where $\mathbf{h}_u$ and $\mathbf{h}_p$ denote the embedding vectors of user $u$ and item $p$, respectively, and $\oplus$ represents the vector concatenation operation. The pre-trained MLP predictor processes the concatenated user–item vector pair to compute the preference probability $\hat{y}_{up}$ of user $u$ towards item $p$. Subsequently, the $\operatorname{TopN}$ function selects the $N$ items with the highest preference probabilities to generate the final recommendation list $R_u$. Through the incorporation of high-order feature representations for both users and items, this methodology effectively captures their intrinsic matching relationships, thereby producing high-quality personalized recommendations.
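A minimal sketch of the scoring and top-N selection follows. It assumes a small MLP with ReLU hidden layers and a sigmoid output so that scores can be read as probabilities; the layer sizes, the random weights (standing in for pre-trained parameters), and the names `mlp_probability` and `recommend_top_n` are illustrative assumptions rather than the architecture used in the paper.

```python
import numpy as np

def mlp_probability(pair_vec, layers):
    """Forward pass of a small MLP ending in a sigmoid; returns a value in [0, 1]."""
    hidden = pair_vec
    for weight, bias in layers[:-1]:
        hidden = np.maximum(hidden @ weight + bias, 0.0)  # ReLU hidden layers
    weight, bias = layers[-1]
    logit = (hidden @ weight + bias)[0]
    return float(1.0 / (1.0 + np.exp(-logit)))

def recommend_top_n(user_vec, item_vecs, candidates, layers, n):
    """Score every candidate on the concatenated user-item vector, keep the best n."""
    scores = {
        item: mlp_probability(np.concatenate([user_vec, item_vecs[item]]), layers)
        for item in candidates
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], scores

rng = np.random.default_rng(0)
dim = 4  # hypothetical embedding size

# Random weights stand in for a pre-trained network.
layers = [
    (rng.normal(size=(2 * dim, 6)), np.zeros(6)),
    (rng.normal(size=(6, 1)), np.zeros(1)),
]
user_vec = rng.normal(size=dim)
item_vecs = {name: rng.normal(size=dim) for name in ["p2", "p3", "p5"]}

top_items, scores = recommend_top_n(user_vec, item_vecs, ["p2", "p3", "p5"], layers, n=2)
```

Because only the candidate set is scored, the cost of this step grows with the number of candidates rather than with the size of the full product catalog.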
5. Experimental Evaluation
In this section, we conduct comprehensive experimental evaluations to assess the recommendation performance of our proposed method. Our experimental analysis specifically addresses two fundamental research questions: (i) How does our proposed method perform in terms of recommendation effectiveness compared to state-of-the-art baseline methods? (ii) What are the underlying mechanisms through which our approach enhances recommendation performance?
5.1. Experimental Datasets
To evaluate the effectiveness of APGCN-CF, we conducted comprehensive experiments utilizing two real-world minor agricultural product datasets, with a special focus on scallions and garlic as representative cases. The selection of these two minor agricultural products was based on their distinctive characteristics in agricultural e-commerce: they exhibit significant seasonal harvesting patterns and seasonal demand fluctuations that directly impact supply and pricing, and unlike traditional staple agricultural commodities (such as wheat, rice, and corn) with universal demand patterns, their consumption is heavily influenced by local dietary habits and culinary traditions, displaying pronounced regional characteristics. These minor products present unique recommendation challenges due to their heterogeneous user preference distributions.
Both datasets were obtained through web crawling techniques within the publicly accessible scope of the JD shopping platform. The garlic dataset includes 374 users and 816 products, with a total of 3162 interaction records (covering the period from 17 March 2023 to 29 May 2024), with a sparsity of approximately 98.96%. The scallion dataset includes 190 users and 370 products, with a total of 1629 interaction records (covering the period from 17 March 2023 to 29 May 2024), with a sparsity of approximately 97.68%. Each dataset contains four tables: a product table, a user table, a training set, and a test set.
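The reported sparsity values follow directly from the dataset sizes, taking sparsity as one minus the ratio of observed interactions to all possible user–product pairs. The short check below uses only the counts stated above; the function name `sparsity` is illustrative.

```python
def sparsity(n_users, n_items, n_interactions):
    """Fraction of the user-item matrix that has no recorded interaction."""
    return 1.0 - n_interactions / (n_users * n_items)

garlic = sparsity(374, 816, 3162)
scallion = sparsity(190, 370, 1629)

print(f"garlic: {garlic:.2%}, scallion: {scallion:.2%}")
# prints "garlic: 98.96%, scallion: 97.68%"
```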
The product table records key information about products (examples shown in Table 2), including the following fields: product ID, positive ratings, sales, prices, vectorized store names, vectorized product names, and vectorized comments. Among these, the product ID is the identifying ID of the product, positive ratings indicate the positive review rate of the product, sales indicate the sales volume of the product, and the price is the selling price of the product, while vectorized store names, vectorized product names, and vectorized comments are multi-dimensional vectors converted from textual information using natural language processing models, representing store names, product names, and all historical comments about the product, respectively. This vectorization method preserves semantic information while making it quantifiable.
The user table records key information about users (examples shown in Table 3), including the following fields: user ID, vectorized regions, average ratings, and vectorized comments. The user ID uniquely identifies the user, the vectorized region is the vectorized identifier of the user’s region (embedded uniformly based on geographical proximity relationships), the average rating is the mean of the user’s historical ratings, and vectorized comments are the vectorized representation of the user’s historical reviews (likewise converted from text to numerical vectors using natural language processing models).
The training set records interaction information between users and products through reviews (examples shown in Table 4), including the following fields: user ID, product ID, ratings, vectorized comments, and vectorized time. The user ID and product ID identify the user and the product, the rating is the rating given by the user to the product, and vectorized comments are the vectorized representation of the review content (likewise converted from text to numerical vectors using natural language processing models). Vectorized time is the review time processed with sine–cosine encoding, which naturally represents the periodic variations in time that are crucial for time-sensitive minor agricultural product recommendation. This encoding creates smooth temporal representations that avoid the “boundary problems” of discrete time encoding; for example, it helps the model recognize that December and January are seasonally adjacent.
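The boundary behavior of sine–cosine encoding can be illustrated with a small sketch. The text does not state the period or granularity used for the review time, so the month-of-year period of 12 below is an assumption, and the function names are illustrative.

```python
import math


def cyclical_encode(value: float, period: float) -> tuple:
    """Map a periodic quantity onto the unit circle as (sin, cos)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a: float, b: float, period: float) -> float:
    """Euclidean distance between two encoded values."""
    return math.dist(cyclical_encode(a, period), cyclical_encode(b, period))


# Months numbered 1..12; a period of 12 is assumed for illustration only.
dec_jan = encoded_distance(12, 1, period=12)
jan_feb = encoded_distance(1, 2, period=12)
dec_jun = encoded_distance(12, 6, period=12)

print(f"Dec-Jan: {dec_jan:.3f}  Jan-Feb: {jan_feb:.3f}  Dec-Jun: {dec_jun:.3f}")
```

Under this encoding, December and January are exactly as close as any other pair of consecutive months, whereas a raw month index would place them 11 units apart.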
The test set, which serves as the validation data, contains rating information for each user on two products (each record indicates that the user purchased the product and provided a rating; examples shown in Table 5). It is used to validate the recommendation performance of the model and includes the following fields: user ID, product ID, and ratings, each defined as in the training set.
5.2. Baseline Methods
We compare our method with the following recommendation approaches:
User-based collaborative filtering (User-CF) represents a classical paradigm in recommendation systems that generates personalized recommendations by leveraging user similarity patterns. The algorithm operates on a user–item rating matrix to identify users exhibiting similar preferences to the target user through similarity metrics, predominantly employing cosine similarity calculations. The system subsequently generates recommendations by suggesting items that have been positively rated by similar users but have not yet been interacted with by the target user.
Item-based collaborative filtering (Item-CF) represents an alternative collaborative filtering paradigm that generates recommendations by computing item-to-item similarities. This methodology calculates similarity metrics between items based on user interaction data, such as ratings and purchase history. The recommendation process involves analyzing the similarities between items that users have previously interacted with and potential items of interest, thereby generating personalized recommendations through item-to-item correlation patterns.
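The two neighborhood-based baselines can be sketched as follows. This is a minimal illustration using cosine similarity on a dense toy rating matrix, not the exact configuration used in the experiments; the matrix values and function names are assumptions.

```python
import numpy as np


def cosine_similarity_rows(M: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of M (zero rows give 0)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    safe_norms = np.where(norms == 0, 1.0, norms)
    unit = M / safe_norms
    return unit @ unit.T


def user_cf_scores(R: np.ndarray) -> np.ndarray:
    """Score items for each user from the ratings of similar users."""
    sim = cosine_similarity_rows(R)
    np.fill_diagonal(sim, 0.0)   # a user is not their own neighbor
    scores = sim @ R             # similarity-weighted sum of ratings
    scores[R > 0] = -np.inf      # exclude items already interacted with
    return scores


def item_cf_scores(R: np.ndarray) -> np.ndarray:
    """Score items for each user from similarity to items the user rated."""
    sim = cosine_similarity_rows(R.T)
    np.fill_diagonal(sim, 0.0)
    scores = R @ sim
    scores[R > 0] = -np.inf
    return scores


def top_n(scores: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n highest-scoring items for each user."""
    return np.argsort(-scores, axis=1)[:, :n]


# Toy rating matrix (rows: users, columns: items; 0 means no interaction).
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 3, 0],
    [0, 0, 4, 5],
], dtype=float)

user_cf_top1 = top_n(user_cf_scores(R), 1)
item_cf_top1 = top_n(item_cf_scores(R), 1)
```

The only structural difference between the two functions is whether similarity is computed between rows (users) or columns (items) of the rating matrix.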
Matrix factorization (MF) is a model-based collaborative filtering technique that decomposes the user–item rating matrix to predict users’ preferences for unrated items. Through this decomposition process, MF methods project both users and items into a shared low-dimensional latent space, generating dense vector representations that capture their inherent characteristics. These latent representations are then leveraged to estimate potential ratings for items that users have not yet interacted with, facilitating the generation of personalized recommendation lists.
Singular value decomposition (SVD), a specialized form of matrix factorization, is a fundamental technique for generating recommendations through the decomposition of user–item rating matrices. This method systematically decomposes the rating matrix into three constituent components: a user matrix, a diagonal singular value matrix, and an item matrix. By leveraging these decomposed matrices, the system can effectively predict users’ potential ratings for previously unrated items and subsequently generate personalized recommendation lists. The mathematical rigor of SVD enables accurate capture of latent relationships between users and items, facilitating precise recommendation generation.
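The three-factor decomposition and the low-rank prediction step can be sketched with NumPy. This is an illustration only: it treats missing ratings as zeros, which is a simplifying assumption, and the toy matrix and function name are hypothetical.

```python
import numpy as np


def truncated_svd_predict(R: np.ndarray, k: int) -> np.ndarray:
    """Rank-k reconstruction of R from its k largest singular values."""
    # U: user factors, s: singular values, Vt: item factors (transposed).
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Toy rating matrix (0 marks an unrated item and is treated as a value here).
R = np.array([
    [5, 4, 0, 0],
    [4, 5, 3, 0],
    [0, 0, 4, 5],
], dtype=float)

full = truncated_svd_predict(R, k=3)      # keeps every singular value
low_rank = truncated_svd_predict(R, k=2)  # smoothed estimate used for prediction

# Entries of low_rank at positions where R is 0 act as predicted scores.
print(np.round(low_rank, 2))
```

Keeping fewer singular values forces users and items into a shared low-dimensional space, which is what lets the reconstruction assign non-zero scores to unrated items.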
Neural graph collaborative filtering (NGCF) represents an advanced recommendation approach that integrates graph neural networks with collaborative filtering techniques to enhance embedding representation capabilities. The method explicitly models high-order connectivity patterns in user–item interactions, significantly improving the recommendation performance. NGCF initially constructs a bipartite graph structure to represent user–item interactions, followed by capturing collaborative signal propagation through carefully designed embedding propagation layers. Through the implementation of multi-layer embedding propagation mechanisms, NGCF effectively captures high-order connectivity information, thereby generating more comprehensive and expressive user and item representations. These refined embedding representations are subsequently utilized to predict user preferences for items with higher accuracy and generate personalized recommendation lists.
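A single embedding propagation step of this kind can be sketched in NumPy, following the commonly cited matrix form of NGCF with a symmetrically normalized bipartite adjacency matrix. The toy interaction matrix, embedding size, and random weights are assumptions for illustration; a real model would learn the weights by training.

```python
import numpy as np


def normalized_bipartite_adjacency(R: np.ndarray) -> np.ndarray:
    """Build D^{-1/2} A D^{-1/2} for the user-item bipartite graph of R."""
    n_users, n_items = R.shape
    n = n_users + n_items
    A = np.zeros((n, n))
    A[:n_users, n_users:] = R        # user -> item edges
    A[n_users:, :n_users] = R.T      # item -> user edges
    deg = A.sum(axis=1)
    inv_sqrt = np.zeros(n)
    inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]


def ngcf_layer(L, E, W1, W2, slope=0.2):
    """One propagation step: LeakyReLU((L + I) E W1 + (L E * E) W2)."""
    side = L @ E                                 # aggregated neighbor embeddings
    out = (side + E) @ W1 + (side * E) @ W2      # self/neighbor term + interaction term
    return np.where(out > 0, out, slope * out)   # LeakyReLU


# Toy binary interactions: 2 users x 3 items.
R = np.array([[1, 1, 0],
              [0, 1, 1]], dtype=float)
L = normalized_bipartite_adjacency(R)

rng = np.random.default_rng(0)
d = 4                                            # illustrative embedding size
E0 = rng.normal(scale=0.1, size=(L.shape[0], d))
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))

E1 = ngcf_layer(L, E0, W1, W2)
final = np.concatenate([E0, E1], axis=1)         # layer outputs are concatenated
```

Stacking several such layers lets each node receive signals from neighbors that are multiple hops away, which is the high-order connectivity referred to above.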
5.3. Evaluation Metrics
In our experimental setup, we partitioned the dataset into training and test sets. The training set was utilized to train the APGCN-CF model, while the test set was employed to evaluate its recommendation performance. During the test set construction, we retained two previously purchased items from each user as validation data. For performance evaluation, we generated personalized recommendation lists for target users and assessed them using two primary metrics: precision and recall. We implemented a top-N recommendation strategy, where N denotes the length of the recommendation list. Notably, precision@N and recall@N exhibit an inherent trade-off relationship, where improving one metric typically leads to a degradation in the other. To provide a comprehensive evaluation that balances both metrics, we adopted the F1-score as a unified measure of recommendation quality.
We utilized the following formula to calculate the recommendation precision (which measures the proportion of recommended items that users are actually interested in):
$$\mathrm{Precision@}N = \frac{|R(u) \cap T(u)|}{|R(u)|}$$
where $R(u)$ represents the set of items recommended for user $u$, and $T(u)$ denotes the set of items actually purchased by user $u$ (specifically, the set containing two items from the validation data). Consequently, $|R(u) \cap T(u)|$ represents the number of correctly recommended items within the recommended item set, while $|R(u)|$ denotes the total number of recommended items. Given that the recommended item set has a length of $N$, and each user in the test data possesses a set of two actually purchased items, the maximum achievable precision under these conditions is $2/N$.
We employ the following formula to calculate the recommendation recall (which measures the proportion of items that users are actually interested in that were successfully recommended):
$$\mathrm{Recall@}N = \frac{|R(u) \cap T(u)|}{|T(u)|}$$
where $|T(u)|$ represents the total number of items actually purchased by the user.
We utilize the following formula to calculate the recommendation F1-score (the harmonic mean of precision and recall, which comprehensively considers both metrics to provide a measure of overall performance):
$$\mathrm{F1@}N = \frac{2 \times \mathrm{Precision@}N \times \mathrm{Recall@}N}{\mathrm{Precision@}N + \mathrm{Recall@}N}$$
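The three metrics can be computed for a single user as in the following sketch. The item identifiers are hypothetical, and how per-user values are aggregated over all test users is not specified here, so the example stops at the per-user level.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall, and F1 for one user's top-N list."""
    rec, rel = set(recommended), set(relevant)
    hits = len(rec & rel)                      # correctly recommended items
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(rel) if rel else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Hypothetical item IDs: a top-5 list and the two held-out purchases.
top5 = ["a", "b", "c", "d", "e"]
held_out = ["b", "z"]
p, r, f1 = precision_recall_f1(top5, held_out)

# Best case for N = 5: both held-out items appear in the list.
p_max, r_max, _ = precision_recall_f1(top5, ["a", "b"])

print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f} max precision={p_max:.3f}")
```

With two held-out items per user, the best-case precision for a list of length 5 is 2/5, which is why absolute precision values in this setting are small even for a good recommender.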
5.4. Experimental Sets
Experiment 1: The recommendation performance of APGCN-CF is systematically evaluated against several baseline methods across two distinct datasets. A consistent set of evaluation metrics is employed to generate and assess personalized recommendation lists.
Experiment 2: To investigate the effectiveness of the graph convolutional network (GCN) component in APGCN-CF, which is utilized for capturing high-order node representations prior to collaborative filtering, we conduct an ablation study. Specifically, we construct a baseline variant, denoted as APGCN-CF-B, by removing the GCN component while maintaining all other architectural elements intact. The recommendation performance of APGCN-CF-B is then comprehensively compared against the complete APGCN-CF model through systematic evaluation of their generated recommendation lists.
Experiment 3: To evaluate the importance of spatio-temporal characteristics in minor agricultural product recommendation, we conduct an ablation study examining how temporal and geographical features influence recommendation quality. We design three variants of our model: APGCN-CF-C (retaining geographical information while removing temporal features), APGCN-CF-D (preserving temporal information while excluding geographical features), and APGCN-CF-E (removing both temporal and geographical components). By systematically comparing the performance of these variants against the complete APGCN-CF model across both datasets, we can quantitatively assess the individual and combined contributions of spatio-temporal features to recommendation effectiveness.
5.5. Performance Comparison
5.5.1. Experiment 1
In this section, we present the results of Experiment 1. We conducted a comprehensive evaluation of the proposed APGCN-CF model against the baseline methods on the two datasets, focusing on three key performance metrics: precision, recall, and F1-score.
Figure 3 presents a detailed comparative analysis of the recommendation performance, illustrating how APGCN-CF performs relative to the baseline methods across both datasets in terms of these evaluation metrics.
Our comprehensive experimental analysis reveals the superior performance of APGCN-CF compared to baseline approaches across multiple evaluation metrics. The performance comparison results can be analyzed from the following aspects:
First, compared to graph-based baseline NGCF, APGCN-CF demonstrates consistent superiority across top-5 and top-10 metrics on both datasets. While NGCF leverages graph neural architectures, its simple message-passing mechanism fails to capture the complex spatio-temporal dependencies in minor agricultural product recommendation scenarios. Specifically, NGCF’s embedding propagation layer shows limited capability in modeling seasonal and regional characteristics, resulting in substantially lower F1-scores on both garlic and scallion datasets.
Second, traditional matrix factorization-based methods (MF and SVD) exhibit inherent limitations in our experimental scenarios. MF’s inability to model high-order feature representations leads to suboptimal recall performance, particularly in top-5 and top-10 recommendations. Although SVD shows marginal improvements over MF in top-10 and top-20 scenarios through linear factorization, it still underperforms compared to graph-based approaches due to its limitations in modeling non-linear relationships and dynamic user preferences.
Third, conventional collaborative filtering approaches (item-CF and user-CF) demonstrate significant performance gaps. User-CF achieves notably low F1-scores (0.031 and 0.045 for garlic and scallion, respectively, in the top-5 scenario), primarily due to the high sparsity of user–item interactions impeding similar user identification. Item-CF’s elementary similarity calculation mechanism proves insufficient for capturing critical seasonal and regional patterns in minor agricultural product recommendations.
In contrast, APGCN-CF achieves superior performance through its sophisticated architecture for high-order feature extraction and dynamic recommendation generation. Quantitative results show that APGCN-CF attains remarkable recall scores (0.146 and 0.193 for garlic and 0.140 and 0.234 for scallion for the top-5 and top-10 scenarios, respectively), consistently outperforming all baselines. The integration of temporal and spatial dimensions enables APGCN-CF to effectively model seasonal variations and geographical attributes, thereby generating highly personalized recommendations aligned with users’ contextual preferences. This comprehensive modeling capability is particularly evident in top-20 metrics, where APGCN-CF demonstrates significant advantages in both precision and recall, validating its effectiveness in minor agricultural product recommendation scenarios.
5.5.2. Experiment 2
In this section, we present a systematic ablation study (Experiment 2) comparing the performance between APGCN-CF and its baseline variant APGCN-CF-B. The comparative analysis focuses on three key evaluation metrics: precision, recall, and F1-scores, assessed across two minor agricultural product datasets (garlic and scallion).
Figure 4 presents a comprehensive performance comparison of the recommendation results, demonstrating the significant impact of integrating graph convolutional networks into the recommendation framework.
The ablation study reveals the consistent superiority of APGCN-CF over APGCN-CF-B across both datasets, demonstrating the critical role of the graph convolutional component in enhancing recommendation performance.
For the garlic dataset, APGCN-CF exhibits remarkable improvements in top-15 recommendations, achieving a precision of 0.030 and recall of 0.226, compared to APGCN-CF-B’s 0.028 and 0.215, respectively. This performance gain can be attributed to the GCN module’s capability to capture higher-order user–item interaction patterns and propagate behavioral information through the graph structure, effectively modeling both explicit and implicit relationships.
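Because every test user has exactly two held-out items and every recommendation list has N items, precision and recall share the same numerator. Assuming both metrics are averaged over users in the same way, precision@N equals (2/N) times recall@N, which gives a rough consistency check on the top-15 garlic figures:

```latex
\mathrm{Precision@}15 = \frac{2}{15}\,\mathrm{Recall@}15:
\qquad \frac{2}{15} \times 0.226 \approx 0.030,
\qquad \frac{2}{15} \times 0.215 \approx 0.029
```

The first value matches the precision reported for APGCN-CF, and the second is close to the 0.028 reported for APGCN-CF-B, a gap of about 0.001.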
Similarly, in the scallion dataset evaluation, APGCN-CF demonstrates notable advantages in top-20 recommendations, with precision of 0.034 and an F1-score of 0.061, significantly outperforming APGCN-CF-B’s corresponding metrics of 0.031 and 0.044. These results underscore the effectiveness of graph convolution operations in addressing data sparsity through neighborhood information aggregation, leading to more refined node representations and, subsequently, more accurate recommendations.
The empirical evidence conclusively establishes that the integration of GCN architecture serves as a crucial enhancement to the recommendation framework, particularly in addressing the inherent challenges of sparse data and complex user–item relationships in minor agricultural product scenarios. This is attributed to GCN’s ability to capture higher-order connection patterns in user–item interactions through message-passing mechanisms, efficiently discovering implicit relationships that traditional methods struggle to identify; furthermore, GCN facilitates knowledge transfer from high-density regions to sparse areas through information propagation across the graph structure, generating more robust node representations that effectively resolve data sparsity issues in recommendation systems. GCN can also naturally incorporate multi-source heterogeneous information such as temporal and geographical data into a unified representation learning framework, which is essential for minor agricultural product recommendation scenarios characterized by strong spatio-temporal features.
5.5.3. Experiment 3
This section presents a comprehensive ablation study examining the influence of spatio-temporal features within the APGCN-CF framework. Given that minor agricultural products exhibit distinctive temporal and spatial characteristics, understanding the contribution of these features to recommendation performance is essential for developing effective recommendation systems in rural e-commerce platforms.
To systematically evaluate the impact of spatio-temporal components, we constructed three model variants: APGCN-CF-C (incorporating geographical information while excluding temporal features), APGCN-CF-D (retaining temporal patterns while eliminating geographical information), and APGCN-CF-E (removing both temporal and geographical components).
Figure 5 illustrates the comparative performance of these variants against the complete APGCN-CF model across both the garlic and scallion datasets.
The experimental analysis reveals the consistent superiority of the complete APGCN-CF model over all three variants across multiple evaluation metrics, thereby demonstrating the critical importance of integrating both temporal and geographical information for precise minor agricultural product recommendations. In the garlic dataset evaluation, APGCN-CF-C exhibited notably better performance than APGCN-CF-D, suggesting that geographical features may play a more significant role in modeling user preferences for this particular product. This pattern was similarly observed in the scallion dataset, albeit with reduced magnitude, indicating that the dependency on specific spatio-temporal features may vary across different categories of minor agricultural products.
Particularly noteworthy is the substantial performance degradation observed in APGCN-CF-E, where the simultaneous removal of both temporal and geographical components resulted in significantly reduced F1-scores for top-10 recommendations across both datasets. This finding provides compelling evidence for the synergistic effect of spatio-temporal features in enhancing recommendation quality for minor agricultural products.
The ablation study empirically validates our research hypothesis regarding the essential role of spatio-temporal characteristics in minor agricultural product recommendation. Temporal features facilitate the capture of seasonal variations in product availability and demand patterns, while geographical components enable the model to address regional preferences influenced by culinary traditions and agricultural production cycles. The superior performance demonstrated by the complete APGCN-CF model confirms that our framework effectively leverages these spatio-temporal dependencies to generate more accurate and contextually relevant recommendations for minor agricultural products in rural e-commerce environments, thereby addressing the unique challenges posed by the pronounced seasonality and regional specificity inherent to these products.
5.6. Impact of Hyperparameters on MAE and RMSE
This section analyzes how hyperparameter configurations influence the recommendation model’s performance, with particular emphasis on the evaluation metrics of mean absolute error (MAE) and root mean square error (RMSE). The investigation systematically examines the relationship between model performance and two key hyperparameters: the depth of the graph convolutional network (GCN) layers and the dimensionality of the hidden layer representations. These findings provide empirical guidance for balancing prediction accuracy and computational efficiency in the minor agricultural product recommendation system.
5.6.1. Error Metrics
In the training dataset, each user–item interaction edge $(u, i)$ incorporates a true preference signal $y_{ui}$, represented by user ratings. Following the processing through the graph convolutional network (GCN), we derive embedding vectors for users and items, denoted as $\mathbf{h}_u$ and $\mathbf{h}_i$, respectively. These embeddings are subsequently concatenated as $[\mathbf{h}_u \,\|\, \mathbf{h}_i]$ and input into a trained multi-layer perceptron (MLP) module to predict the preference score:
$$\hat{y}_{ui} = \mathrm{MLP}\left([\mathbf{h}_u \,\|\, \mathbf{h}_i]\right)$$
To evaluate the discrepancy between the predicted and true values, MAE and RMSE are defined as follows:
$$\mathrm{MAE} = \frac{1}{|\mathcal{E}|} \sum_{(u,i) \in \mathcal{E}} \left| y_{ui} - \hat{y}_{ui} \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{|\mathcal{E}|} \sum_{(u,i) \in \mathcal{E}} \left( y_{ui} - \hat{y}_{ui} \right)^{2}}$$
where $\mathcal{E}$ represents the set of user–item interaction pairs in the training and validation sets. The mean absolute error (MAE) metric quantifies the average magnitude of prediction deviations from ground truth values, whereas the root mean square error (RMSE) emphasizes larger prediction discrepancies by incorporating squared errors in its calculation.
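The difference between the two error metrics is easiest to see on a small example. The ratings and predictions below are hypothetical values chosen for illustration, not outputs of the model.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error over paired true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error over paired true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical ratings for four user-item pairs; the last prediction is far off.
y_true = [5.0, 4.0, 3.0, 5.0]
y_pred = [4.5, 4.0, 4.0, 3.0]

mae_value = mae(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(f"MAE={mae_value:.3f} RMSE={rmse_value:.3f}")
```

The single large error of 2.0 raises RMSE well above MAE, which is the sense in which RMSE emphasizes larger discrepancies.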
5.6.2. Experimental Setup
To systematically investigate the optimal model architecture and hyperparameter configuration, this study analyzes the influence of hyperparameters through a dual-perspective approach:
Number of GCN Layers: We systematically evaluate the effectiveness of graph feature extraction at varying depths by adjusting the number of graph convolutional layers L. The experimental investigation encompasses a range of .
Hidden Layer Dimensions: We analyze the influence of feature space capacity on model performance by modifying the dimensionality d of GCN hidden layers. The experimental investigation encompasses a range of .
In this experimental investigation, we maintained consistent values for key hyperparameters, including learning rate, batch size, and dropout rate, across all experimental sets. The evaluation was conducted on two distinct datasets: the garlic and scallion datasets. To effectively visualize the influence of hyperparameters on model performance metrics (MAE and RMSE), we generated three-dimensional visualization plots. In these plots, the X-axis represents the number of graph convolutional network (GCN) layers, the Y-axis denotes the dimensionality of hidden layers, and the Z-axis displays the corresponding MAE or RMSE values.
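A sweep of this kind can be organized as a simple grid over the two hyperparameters, with the resulting dictionary supplying the points for the three-dimensional plots. In the sketch below, the candidate values and the evaluation function are placeholders invented for illustration; they are not the ranges or results of the experiments, and a real run would train and score the model inside `evaluate`.

```python
from itertools import product


def grid_search(layer_options, dim_options, evaluate):
    """Evaluate every (layers, dim) pair; returns {(layers, dim): (mae, rmse)}."""
    return {
        (num_layers, dim): evaluate(num_layers, dim)
        for num_layers, dim in product(layer_options, dim_options)
    }


def best_config(results, metric_index=0):
    """Configuration with the lowest value of the chosen metric (0: MAE, 1: RMSE)."""
    return min(results, key=lambda cfg: results[cfg][metric_index])


def placeholder_evaluate(num_layers, hidden_dim):
    """Synthetic stand-in for training and scoring; values carry no meaning."""
    fake_mae = abs(num_layers - 2) * 0.1 + abs(hidden_dim - 64) / 640
    return fake_mae, fake_mae * 1.2


# Placeholder grids, chosen only to make the example runnable.
results = grid_search([1, 2, 3], [32, 64, 128], placeholder_evaluate)
best = best_config(results)

# Each entry gives one (x, y, z) point: layers, hidden dimension, metric value.
points = [(cfg[0], cfg[1], vals[0]) for cfg, vals in results.items()]
```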
5.6.3. Experimental Results and Analysis
Figure 6 illustrates the impact of varying GCN layer depths and hidden layer dimensions on the model’s MAE and RMSE performance for the garlic and scallion datasets.
The experimental analysis reveals distinct optimal hyperparameter configurations across different datasets. For the garlic dataset, as shown in Figure 6a,b, the model exhibits optimal performance with three GCN layers and 64-dimensional hidden representations, achieving minimal MAE and RMSE values. This configuration benefits from the garlic dataset’s larger interaction graph (3162 records, roughly twice as many as the scallion dataset), which supports the construction of comprehensive user–product relationship graphs. The three-layer architecture effectively captures high-order interaction patterns and uncovers latent semantic relationships, while the 64-dimensional embeddings provide adequate capacity for encoding complex feature interactions. However, deeper architectures (beyond three layers) lead to performance degradation due to redundant information and signal dilution, consistent with Kipf and Welling’s findings [21] regarding the homogenization of node features in deeper networks.
In contrast, the scallion dataset demonstrates optimal performance with shallower architectures (two GCN layers) and reduced dimensionality (32 or 64 dimensions), as evidenced by Figure 6c,d. This configuration suits the smaller scallion dataset (1629 records), where fewer user–product connections are available in total. The shallower architecture effectively captures local interaction patterns while avoiding over-smoothing issues characteristic of deeper networks. This observation aligns with He et al.’s findings in their LightGCN model [5], where simplified architectures demonstrated superior performance in sparse scenarios. The reduced dimensionality proves particularly effective by constraining the embedding space, thereby enhancing model generalization and robustness. The graph convolutional mechanism maintains recommendation reliability through effective neighborhood information aggregation, even under sparse conditions.
These findings underscore the critical role of GCN depth and hidden layer dimensionality in determining recommendation performance. When GCN layers are set to two or three, recommendation performance is superior, which closely relates to the working mechanism of GCN. With excessive GCN layers, the over-smoothing problem emerges, wherein node representations become increasingly homogeneous as layer count increases, causing the model to lose discriminative capacity. This aligns with findings from numerous graph-related research studies indicating that two to three layers of graph convolutional networks represent an optimal balance point. Hidden layer dimensionality performs best at 32 or 64 dimensions, as excessive dimensionality leads to overfitting issues, particularly when agricultural product datasets are not exceptionally large, while insufficient dimensionality fails to adequately express the feature relationships between users and products.
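The over-smoothing effect can be reproduced on a toy graph. The sketch below applies repeated mean aggregation (a row-normalized adjacency matrix with self-loops) without learned weights or nonlinearities, which is a deliberate simplification of a GCN layer; the six-node path graph and random features are assumptions for illustration.

```python
import numpy as np


def mean_aggregation_matrix(A: np.ndarray) -> np.ndarray:
    """Row-normalized adjacency with self-loops: each node averages itself and its neighbors."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)


def feature_spread(X: np.ndarray) -> float:
    """Largest per-feature range (max minus min) across all nodes."""
    return float((X.max(axis=0) - X.min(axis=0)).max())


# Small connected toy graph: a path of six nodes.
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

rng = np.random.default_rng(0)
X = rng.normal(size=(n, 4))          # random initial node features
P = mean_aggregation_matrix(A)

spreads = [feature_spread(X)]
for _ in range(8):
    X = P @ X                        # one parameter-free propagation step
    spreads.append(feature_spread(X))

print([round(s, 3) for s in spreads])
```

The spread between node representations can only shrink with each averaging step, so after enough layers the nodes become hard to tell apart, which is the loss of discriminative capacity described above.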
5.7. Discussion
The comprehensive experimental evaluation of APGCN-CF demonstrates its significant advantages over existing state-of-the-art methods in minor agricultural product recommendation scenarios. The superior performance can be attributed to several key factors: first, the integration of graph convolutional networks substantially enhances the model’s capability to capture high-order connectivity patterns in user–item interactions, as evidenced by the consistent improvements observed in ablation studies comparing APGCN-CF with APGCN-CF-B. The graph convolution operations effectively model complex relationships between users and agricultural products, leading to more accurate preference predictions. Second, our framework successfully addresses data sparsity challenges through neighborhood information aggregation mechanisms that leverage the structural properties of user–item interaction graphs. This is particularly valuable in minor agricultural product recommendation contexts where user–item interactions are typically limited. Finally, the proposed approach effectively incorporates spatio-temporal characteristics into the recommendation process, capturing the seasonal and regional dependencies inherent in agricultural product consumption patterns. This is reflected in the improved recommendation accuracy across both garlic and scallion datasets, where temporal and geographical factors significantly influence purchasing behaviors.
Our third experiment further validates the importance of spatio-temporal characteristics in the recommendation process. By systematically removing temporal features (APGCN-CF-C), geographical features (APGCN-CF-D), or both (APGCN-CF-E), we quantified the contribution of these components to recommendation performance. The experimental results definitively demonstrate that the complete APGCN-CF model, which considers both temporal and geographical information simultaneously, consistently outperforms all variants, confirming the efficacy of our proposed spatio-temporal fusion mechanism. Particularly significant is the observation that performance degradation is most severe when both temporal and geographical components are eliminated (APGCN-CF-E), providing compelling evidence for the critical synergistic role of spatio-temporal characteristics in minor agricultural product recommendation. The distinctive performance patterns exhibited by APGCN-CF-C and APGCN-CF-D across different datasets further suggest that the relative importance of temporal versus geographical features may vary according to product-specific characteristics, underscoring the necessity of our comprehensive modeling approach.
Despite these promising results, several limitations and challenges warrant acknowledgment. The computational complexity of graph convolutional networks may pose scalability constraints as datasets continue to expand. While our hyperparameter analysis provides guidance for optimizing model architecture, the trade-off between model expressiveness and computational efficiency remains a significant challenge, particularly for large-scale rural e-commerce platforms with limited computational resources. Additionally, the current implementation of temporal dynamics modeling may not fully capture the complex seasonal patterns in agricultural product demand, which can be influenced by multiple factors, including holidays, climate variations, and cultural practices. Furthermore, while our framework demonstrates the effective fusion of user and product features, the integration of additional heterogeneous information sources presents both opportunities and challenges. Incorporating contextual information such as detailed product attributes, user demographic data, and environmental factors could potentially enhance recommendation quality but would increase model complexity and computational requirements.
6. Conclusions and Discussion
Minor agricultural products, characterized by their strong seasonality, regional specificity, and heterogeneous user preferences, present significant challenges for recommendation systems in rural e-commerce platforms: highly sparse interaction data, complex spatio-temporal dependencies, and cold-start scenarios. These challenges have created notable research gaps in developing effective recommendation frameworks specifically tailored to these products. To address these challenges, this study introduces APGCN-CF, a novel framework that synergistically integrates graph convolutional networks with collaborative filtering. Leveraging the unique properties of graph structures, our approach captures high-order connectivity patterns in user–agricultural product interactions while effectively modeling the regional and temporal characteristics inherent to minor agricultural products as well as other key information that influences user purchasing tendencies. Comprehensive experimental evaluations conducted on real-world datasets featuring scallions and garlic demonstrate that our proposed approach consistently outperforms existing state-of-the-art baseline methods across multiple performance metrics, including precision, recall, and F1-score. The significant improvements observed in the ablation studies further validate the effectiveness of the graph convolutional architecture in enhancing node representations and addressing data sparsity challenges, thus providing a robust solution for optimizing minor agricultural product recommendations in rural e-commerce environments.
Despite these promising results, several limitations warrant further investigation and improvement in future work. First, the inherent computational complexity of graph convolutional networks may introduce significant performance bottlenecks as data scales increase, with high computational costs potentially limiting the model’s practical application in production environments when dealing with large-scale data. In future research, we will investigate graph sampling techniques and distributed computing frameworks to reduce computational costs by decreasing the number of nodes and edges that need to be processed, thereby improving computational efficiency while maintaining recommendation quality. Second, although our framework incorporates temporal information through graph structures, the seasonal patterns of minor agricultural products are influenced by multiple factors, including climate variations, cultural festivals, and agricultural cycles, and existing temporal encodings may not fully capture these subtle temporal dependencies. We plan to develop an adaptive temporal encoding mechanism that automatically adjusts according to the specific seasonal patterns of different agricultural products and integrates meteorological data and agricultural production cycle information to more precisely capture complex temporal dependencies. Third, the development of privacy-preserving recommendation techniques that protect user data while maintaining recommendation quality will also be prioritized, addressing the growing concerns regarding data privacy in rural e-commerce scenarios while ensuring the practical applicability of advanced recommendation systems in agricultural commerce platforms.