1. Introduction
Gastric cancer (GC) ranks as the fifth most common type of cancer and is a leading cause of cancer-specific mortality worldwide [1]. In 2020, GLOBOCAN estimated that the overall age-standardized incidence rate of GC was 11.1 per 100,000, while it was 15.8 and 7.0 per 100,000 for males and females, respectively [2]. In 2018, the Korean Central Cancer Registry reported that the age-adjusted incidence rates of GC for all registrants, males, and females were 30.4, 44.3, and 18.3 per 100,000, respectively [3].
People consume diverse foods and nutrients as part of meals that combine many dietary components [4,5]. Thus, assessing dietary intake as a pattern, rather than as a sum of single food items or nutrients, improves our understanding of the complexity of the diet [6]. The dietary patterns approach has been applied in several nutritional epidemiology studies to observe the association of diet with health, particularly GC [4,7,8,9,10,11,12]. Innovative exploratory methods such as Gaussian graphical models (GGMs) can derive dietary patterns and represent their internal structure as a graphical network [13]. Meaningfully correlated food groups, which are candidate variables for examining the relationship between diet and disease risk, can be recognized from the derived dietary pattern networks [14]. GGMs are useful as an exploratory data analysis method to identify the conditional independence structure of a given data set. They assess the pairwise correlation between two variables after controlling for the remaining variables, and the conditional dependence is quantified as a partial correlation coefficient. The precision matrix, which is the inverse of the covariance matrix, can be used to obtain the partial correlation coefficient of two random variables conditional on the other variables [13]. We previously applied the GGM approach to derive dietary patterns in a relatively large case-control study and further observed the association with GC risk in a Korean population; we found that the vegetable and seafood pattern network was significantly associated with a reduced risk of GC in the total and male populations [5].
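The relationship between the precision matrix and partial correlations described above can be sketched in a few lines. This is an illustrative example only, not the code used in the study: the function names `invert` and `partial_correlations` and the toy covariance matrix are assumptions introduced here. For a precision matrix with entries p_ij, the partial correlation between variables i and j given all others is -p_ij / sqrt(p_ii * p_jj).

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity matrix.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def partial_correlations(cov):
    """Partial correlation of each pair of variables given all remaining variables."""
    prec = invert(cov)  # precision matrix = inverse covariance matrix
    n = len(prec)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)] for i in range(n)]

# Toy covariance (hypothetical): three variables in a chain, so variables 0 and 2
# are correlated marginally (0.25) but only through variable 1.
cov = [[1.0, 0.5, 0.25],
       [0.5, 1.0, 0.5],
       [0.25, 0.5, 1.0]]
pcor = partial_correlations(cov)
```

In this toy case the marginal correlation between the first and third variables is 0.25, but their partial correlation is zero, so a GGM would draw no edge between them. A sparse estimator such as the graphical lasso additionally shrinks small precision entries to exactly zero.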
The human gut hosts over ten thousand species of microorganisms and supplies a wide array of energy and nutrient sources to facilitate the normal growth and functions of these microbes [15]. Recent epidemiological studies noted that the lack of a well-balanced microbial community in the stomach, known as gastric microbial dysbiosis, promotes gastric carcinogenesis through inflammation [16,17,18,19,20]. A study profiling the gastric microbial community revealed that gastric dysbiosis is directly associated with the risk of GC [21]. We also derived a microbial dysbiosis index (MDI) in a case-control study to observe associations between alterations in the gastric microbiome and GC risk, in which females with a higher MDI showed a significantly higher risk of GC [22].
Intestinal dysbiosis occurs due to the lack of substrate availability for the growth and functions of microbes [23]. Diet exerts a direct effect on the gut microbiome, particularly by providing substrates and promoting bacterial colonization [24]. A high intake of saturated fat increases bile acid secretion and the amount of bile acid in the intestine, which leads to the production of hydrophobic secondary bile acids; these eventually change the composition of the gut microbiome and are associated with metabolic diseases [25]. Few studies have examined the effect of dietary patterns on the gut microbiome [26]. Moreover, only a few dietary patterns, specifically Mediterranean and vegetarian diets, have been assessed for their effect on the gut microbiome [27,28,29]. Thus, in this study, we hypothesized that dietary patterns and the gastric microbiome interact in modulating the risk of GC. In this case-control study, we applied GGMs as an innovative approach to derive dietary patterns and subsequently observed the combined effect of GGM-derived dietary patterns and gastric microbial dysbiosis on the risk of GC in a Korean population.
2. Materials and Methods
2.1. Study Population
We recruited participants from the National Cancer Center Hospital in Korea from March 2011 to December 2014. Patients with GC were selected at the Center for Gastric Cancer if they had a histologically confirmed diagnosis of early GC within the preceding 3 months. Early GC was defined as an invasive carcinoma confined to the mucosa and/or submucosa, regardless of the lymph node metastasis status. We recruited healthy controls from health screening examinations at the Center for Cancer Prevention and Detection at the same hospital. The following exclusion criteria were applied to both patients with GC and controls at the point of investigation: a diagnosis of type 2 diabetes mellitus, a history of cancer in the last 5 years, advanced-stage GC, mental or severe systemic diseases, and pregnancy or current breastfeeding. Control participants who were diagnosed with gastric or duodenal ulcers during the examination were additionally excluded. None of the subjects had a previous history of treatment for Helicobacter pylori (HP).
Initially, 500 patients with GC and 1227 controls consented to participate in this study. Due to incomplete self-administered questionnaires and semiquantitative food frequency questionnaires (SQFFQ), 26 patients with GC and 30 controls were excluded. Those who had an implausible total energy intake (<500 kcal or ≥4000 kcal) were additionally excluded (five patients with GC and ten controls). Then, patients and controls were frequency matched by age (within five years) and sex at a ratio of 1:2 to select 415 patients with GC and 830 controls. Based on the availability of the metagenomics data, we further excluded 147 patients and 542 controls. Finally, 268 patients with GC and 288 healthy controls (men, 353; women, 203) were selected for the analysis (Supplemental Figure S1). The Institutional Review Board of the National Cancer Center [IRB Number: NCCNCS-11-438] approved this study. All study participants provided written informed consent.
2.2. Data Collection
After an endoscopic examination of the stomach during the period of data collection, five biopsy samples of the gastric mucosa were collected from each participant according to the Sydney system. For the metagenomics analysis, a biopsy sample at least 3 cm away from each tumor of the greater curvature was obtained [30]. Three tests were performed, namely, the rapid urease test, a serological test, and a histological evaluation, to determine the HP infection status. Three histological types were considered, namely, intestinal, diffuse and mixed. The intestinal type is a tumor that is well differentiated, grows slowly and tends to form glands, while the diffuse type is a tumor that is poorly differentiated, behaves aggressively and tends to scatter throughout the stomach. The mixed type is composed of both intestinal and diffuse types. One biopsy sample was obtained from the greater curvature of the corpus for the rapid urease test, while four biopsy samples were obtained from the lesser curvature of the corpus and antrum for the histological evaluation. A pathologist who specialized in GC carried out Wright–Giemsa staining of the biopsy specimens to determine the HP status. If at least one positive result was obtained from the rapid urease test or from the histological evaluation of four biopsy sites, the subject was classified as HP positive [30].
A self-administered questionnaire was distributed to each participant to collect data on demographics, lifestyle, regular exercise, and medical history. A previously validated SQFFQ was used to obtain dietary information [31]. The average frequency of food intake and portion sizes in the previous year were collected from all study participants. Nine categories regarding the frequency of food consumption were available in the SQFFQ (never or rarely, once a month, 2–3 times a month, once or twice a week, 3–4 times a week, 5–6 times a week, once a day, twice a day, and 3 times a day), along with three categories regarding portion sizes (small, medium, and large). For the regular exercise assessment, the self-administered questionnaire asked whether the participant engaged in regular exercise, with a response of either yes or no.
2.3. 16S rRNA Gene Sequencing
Biopsy samples were used to extract DNA with the MagAttract DNA Blood M48 kit (Qiagen, Hilden, Germany) and BioRobot M48 automatic extraction equipment (Qiagen) according to the manufacturer’s instructions. 16S rRNA gene V3–V4 primers were used to amplify input gDNA (12.5 ng), and multiplexing indices and Illumina sequencing adapters were added by performing a subsequent limited cycle amplification step. PicoGreen was used to normalize and pool the final products, and a LabChip GX HT DNA High Sensitivity Kit (PerkinElmer, Waltham, MA, USA) was used to verify library sizes. Then, the MiSeq™ platform (Illumina, San Diego, CA, USA) was used to perform sequencing. The Illumina 16S rRNA gene Metagenomic Sequencing Library protocols were followed to prepare each sequenced sample. PicoGreen and Nanodrop analyses were used to measure the DNA quantity and quality, respectively. The 16S rRNA genes from the 288 control samples and the 268 samples from patients with GC were amplified using 16S rRNA gene V3–V4 primers. The primer sequences were as follows: 16S rRNA gene V3–V4 primer:
16S rRNA gene Amplicon PCR Forward Primer: 5′TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG
16S rRNA gene Amplicon PCR Reverse Primer: 5′GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCTAATCC
QIIME2 artifact files were obtained after importing the already demultiplexed paired-end FASTQ files. After using Cutadapt to remove barcodes/adaptors, the DADA2 pipeline was used to exclude noise reads, dereplicate sequences, cluster sequences, and remove chimeras using QIIME2 (version 2019.7) [32]. After obtaining the amplicon sequence variant (ASV) table, the Ezbio database [33] was used to count the taxonomic abundance. Host mitochondria and chloroplasts, archaea, eukaryotes, and unassigned reads were filtered before the relative abundances were calculated. To normalize the microbial composition, the taxonomic abundance count was divided by the number of preprocessed reads in each sample to obtain relative abundances.
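The normalization step can be illustrated with a minimal sketch. It assumes that the denominator is the number of reads remaining after filtering; the labels in `EXCLUDED`, the function name `relative_abundance`, and the read counts are hypothetical and do not come from the study data.

```python
# Hypothetical labels for reads that are removed before normalization.
EXCLUDED = {"mitochondria", "chloroplast", "archaea", "eukaryota", "unassigned"}

def relative_abundance(taxon_counts):
    """Return the per-taxon relative abundance for one sample.

    taxon_counts maps a taxon label to its read count. Excluded categories
    are dropped first, then each count is divided by the remaining total.
    """
    kept = {t: c for t, c in taxon_counts.items() if t.lower() not in EXCLUDED}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no reads left after filtering")
    return {t: c / total for t, c in kept.items()}

# Made-up read counts for a single biopsy sample.
sample = {"Helicobacter": 600, "Streptococcus": 300, "Prevotella": 100, "Unassigned": 50}
abundance = relative_abundance(sample)
```

Because every sample is rescaled to sum to one, the resulting values are compositional, which is why a compositionality-aware method is needed for the downstream correlation analysis.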
2.4. Statistical Analysis
The chi-square test and Student’s t-test were applied to the categorical and continuous variables, respectively, to compare general characteristics between patients with GC and controls. The 410 types of food included in the 106 food items listed in the SQFFQ were grouped into 33 food groups based on the similarities of nutrient profiles and culinary usage (Supplemental Table S1). The means ± standard deviations (SD) for the intake of each food group were calculated, and the mean dietary intake was compared between patients and controls using Student’s t-test.
Dietary pattern networks were derived using GGMs from the dietary intake variables. The theoretical background of the GGMs was previously described [13]. A detailed description of the methods used in this approach is provided in our previously published study [5]. The approach is briefly summarized below. The log-transformed [log10 (g/d + 1)] values of dietary intake variables were calculated, and the inverse covariance matrix was obtained after conversion of the dietary data into a data matrix. The series of regularization parameter (λ) values (0.96–0.09) was obtained using the “huge” package in R. Graphical lasso (least absolute shrinkage and selection operator) in the “glasso” package in R was applied to obtain the sparse inverse covariance matrix (precision matrix) corresponding to the optimum λ (0.37). The precision matrix was imported into the yEd graph editor and visualized as a dietary network. Sex-specific pattern networks were derived to observe the sex-specific networks and their associations with GC in subgroup analyses. We calculated the strength value of each node in terms of centrality in the dietary network and combined it with the dietary intake data to calculate a network-specific score for each study participant: network-specific score = lg Σ (intake of food group × sum of the weights of edges connected to the node), where the sum runs over the food groups (nodes) in the network. The network-specific score was used as the exposure variable to observe the associations. The network-specific scores were categorized into tertiles according to the distribution of the controls. We used the lowest tertile of the pattern score as the reference group. Unconditional logistic regression models were used to calculate odds ratios (ORs) and 95% confidence intervals (CIs). The median values were calculated for each tertile category of the network-specific scores and used as continuous variables to test for trends. Three models were used to estimate ORs: model I was the crude model; model II was adjusted for age, sex, family history of GC, smoking status, regular exercise, education, occupation, income, and total energy intake; and model III was additionally adjusted for the HP infection status.
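The network-specific score can be sketched as follows. This is a minimal illustration under stated assumptions: "lg" is read as the base-10 logarithm, node strength is the plain sum of the weights of the edges touching a node, and the food groups, edge weights, and intake values are invented for the example. The helper names `node_strength` and `network_score` are hypothetical.

```python
import math

def node_strength(edges):
    """Sum the weights of all edges connected to each node.

    edges maps an undirected pair (node_a, node_b) to its edge weight,
    for example a partial correlation taken from the sparse precision matrix.
    """
    strength = {}
    for (a, b), weight in edges.items():
        strength[a] = strength.get(a, 0.0) + weight
        strength[b] = strength.get(b, 0.0) + weight
    return strength

def network_score(intake, strength):
    """Base-10 log of the strength-weighted sum of food group intakes (assumed form)."""
    total = sum(intake.get(node, 0.0) * s for node, s in strength.items())
    if total <= 0:
        raise ValueError("weighted intake must be positive to take the logarithm")
    return math.log10(total)

# Invented three-node network and one participant's intake in g/d.
edges = {("vegetables", "seafood"): 0.4, ("vegetables", "mushrooms"): 0.2}
strength = node_strength(edges)
intake = {"vegetables": 100.0, "seafood": 50.0, "mushrooms": 100.0}
score = network_score(intake, strength)
```

Weighting intake by node strength means that a food group with many or strong connections in the network contributes more to the score than a peripheral food group eaten in the same amount.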
A compositional analysis of the microbiome data was performed using compositionality corrected by renormalization and permutation (CCREPE). This approach derives accurate significance values for arbitrary association measures, such as correlations or other similarity scores, from compositional data. CCREPE is available as an R package (publicly available through R/Bioconductor; http://huttenhower.org/ccrepe, accessed on 1 September 2020), and it includes an N-dimensional checkerboard score (NC-score). The NC-score is a novel similarity measure specifically designed for the detection of association patterns in the human microbiome and other microbial communities. It extends the classical checkerboard score, which assesses the co-occurrence of species, to arbitrary nominal categories. For each pair of microbes m1 and m2, the NC-score counts the normalized number of covariations and coexclusions over all pairs of samples s1 and s2 [34]. After performing the CCREPE analysis using 74 genera, the subcorrelation matrix of the NC-score was extracted based on two criteria: an FDR-corrected Q-value < 0.05 and an absolute NC-score > 0.30 for the pair of genera. Based on these criteria, 64 genera were identified as candidate genera for further analysis. The mean abundance of each genus in patients with GC was divided by that in the controls to identify the genera that increased (fold change > 1) and decreased (fold change < 1) in abundance in patients with GC, and the results are presented as fold changes in the abundance of the selected genera. Although a specific definition for MDI is not available, it is a single number that represents or quantifies the imbalance of the microbial community using the abundance of groups of bacteria [35]. The MDI was calculated as the log of [the total abundance of genera increased in patients with GC] over [the total abundance of genera decreased in patients with GC] [21,36].
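The fold-change classification and the MDI formula can be sketched as below. The example assumes a natural logarithm, since the base is not stated, and uses invented genus labels and abundances. The function names `fold_changes` and `dysbiosis_index` are hypothetical.

```python
import math

def fold_changes(case_means, control_means):
    """Mean abundance in patients divided by mean abundance in controls, per genus."""
    return {g: case_means[g] / control_means[g] for g in case_means}

def dysbiosis_index(sample, increased, decreased):
    """Log ratio of the total abundance of increased genera to that of decreased genera.

    The natural logarithm is an assumption; the index is undefined when either
    total is zero, so that case raises an error instead of returning a value.
    """
    up = sum(sample.get(g, 0.0) for g in increased)
    down = sum(sample.get(g, 0.0) for g in decreased)
    if up <= 0 or down <= 0:
        raise ValueError("both totals must be positive")
    return math.log(up / down)

# Invented mean relative abundances for two placeholder genera.
case_means = {"genus_A": 0.20, "genus_B": 0.05}
control_means = {"genus_A": 0.10, "genus_B": 0.10}
fc = fold_changes(case_means, control_means)
increased = [g for g, v in fc.items() if v > 1]
decreased = [g for g, v in fc.items() if v < 1]

# One participant's relative abundances.
participant = {"genus_A": 0.30, "genus_B": 0.10}
mdi = dysbiosis_index(participant, increased, decreased)
```

A positive value indicates that genera enriched in patients dominate the sample, a negative value indicates that genera depleted in patients dominate, and zero indicates balance between the two groups.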
The MDI was categorized into tertiles based on the distribution of the controls. We used the lowest tertile of the MDI as the reference group. Unconditional logistic regression models were applied to estimate ORs and 95% CIs. The median values of the MDI in each category were used as continuous variables to test for trends. The OR estimates were calculated for the crude model (model I) and model II. Model II was adjusted for age, smoking, first-degree family history of GC, regular exercise, education, occupation, monthly income, and total energy intake. The interaction between GGM-derived dietary patterns and MDI in relation to GC was tested using logistic regression models via likelihood ratio tests. All analyses were performed using SAS version 9.4 software (SAS Inc., Cary, NC, USA).
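The tertile categorization and the crude (model I) estimate can be sketched as follows. This is not the SAS procedure used in the study. It relies on the fact that, for a single categorical exposure, the crude OR from unconditional logistic regression equals the cross-product ratio of the 2×2 table, with a Wald interval on the log scale. The quantile rule is that of Python's `statistics.quantiles`, which may differ slightly from the SAS default, and all numbers are invented.

```python
import math
from statistics import quantiles

def tertile_cutpoints(control_values):
    """Two cut points that split the control distribution into three groups."""
    low, high = quantiles(control_values, n=3)
    return low, high

def assign_tertile(value, cutpoints):
    """Return 1, 2, or 3 according to cut points derived from the controls."""
    low, high = cutpoints
    if value <= low:
        return 1
    if value <= high:
        return 2
    return 3

def crude_odds_ratio(cases_exposed, controls_exposed, cases_ref, controls_ref, z=1.96):
    """Crude OR for one tertile versus the reference tertile, with a 95% Wald CI."""
    odds_ratio = (cases_exposed * controls_ref) / (controls_exposed * cases_ref)
    se = math.sqrt(1 / cases_exposed + 1 / controls_exposed
                   + 1 / cases_ref + 1 / controls_ref)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Invented index values for nine controls, then classification of three participants.
cutpoints = tertile_cutpoints([1, 2, 3, 4, 5, 6, 7, 8, 9])
groups = [assign_tertile(v, cutpoints) for v in (3, 4, 7)]

# Invented counts: highest tertile versus lowest (reference) tertile.
odds_ratio, lower, upper = crude_odds_ratio(20, 10, 10, 20)
```

Deriving the cut points from controls only, and then applying them to patients, keeps the reference distribution independent of disease status. Adjusted models (model II and model III) require fitting a multivariable logistic regression and are not covered by this sketch.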
4. Discussion
To the best of our knowledge, this study constitutes the first investigation of the combined effect of dietary patterns and gastric microbial dysbiosis on the risk of GC in a Korean population. Subjects who were in the third tertile of the fruit pattern network showed a significantly decreased risk of GC in both the total and female populations. Females who had a higher MDI presented a significantly increased GC risk. Our novel findings indicated that a high vegetable and seafood pattern score and a low MDI interacted to reduce the risk of GC in males (OR: 0.44; 95% CI: 0.22–0.91; p-interaction = 0.021), whereas in females, a high dairy pattern score and a low MDI interacted to lower the risk of GC (OR: 0.23; 95% CI: 0.07–0.76; p-interaction = 0.018).
We used GGMs, which constitute one of the most powerful exploratory methods for the analysis of dietary patterns. This approach has been applied in previous studies to derive dietary patterns in a healthy German population, and the findings indicated that the pairwise correlation between two food groups can be assessed using GGMs, which help identify how various food groups are consumed with respect to each other [13]. We found that the main network derived for the total population (vegetable and seafood pattern) was composed of nine food groups. Most of the food groups in the vegetable and seafood networks were clustered around green/yellow vegetables and light-colored vegetables. The food groups clustered in the main network are mainly related to a healthy dietary pattern in a Korean population because the traditional Korean diet is largely composed of vegetable and seafood items, as we observed based on the GGM-derived dietary patterns in our previous, larger case-control study [5]. Interestingly, we observed that the fruit pattern was significantly inversely associated with the risk of GC in the total and female populations. According to the World Cancer Research Fund/American Institute for Cancer Research, the intake of fruits is a significant factor protecting against the development of GC [37]. Several antioxidant-related nutrients present in fruits, such as vitamin C and carotenoids, play a pivotal role in protecting against GC [38,39]. A case-control study in Sweden revealed that a healthy dietary pattern characterized by the consumption of vegetables, tomatoes, fruits, fish and poultry moderately reduces the risk of GC [40]. We converted the dietary patterns into pattern-specific scores by combining the food group intake values and the strengths of the nodes (food variables) clustered in the identified patterns. Since node strength represents the sum of the weights of edges connected to the node in the pattern network, this score represents both food group intake and the dependencies of the clustered food groups in a specific dietary pattern network. Thus, we propose that the pattern score provides a more accurate measure than food intake alone to assess the relationship between dietary patterns and the GC risk. Moreover, the use of GGM methodology to calculate quantitative scores or classifications of individuals based on identified networks remains a research gap and might be a topic for future research [13]. In addition to dietary patterns, several other risk factors associated with the risk of GC have been identified, and these specifically include the HP infection status and the gastric microbial community [41].
Recent microbiome studies have shown that microbial dysbiosis is a critical risk factor for the occurrence of GC [21,22,42,43,44]. We applied a novel statistical approach known as CCREPE to derive an MDI for a Korean population. Although a specific definition for MDI is not available, it is a single number that represents or quantifies the imbalance of the microbial community using the abundance of groups of bacteria [35]. Interestingly, females in the third tertile of the MDI showed a significantly increased risk of GC compared with females in the lowest tertile. A previous study evaluating the association between dysbiosis of the gastric microbiome and GC risk showed that patients with GC have a higher MDI than those with chronic gastritis [21]. In females, the gastric microbiota is a principal regulator of circulating estrogens, and the microbiota secretes the β-glucuronidase enzyme to deconjugate estrogens into their active forms [45]. However, the occurrence of gastric dysbiosis in association with a less diverse gastric microbial community can impair the deconjugation process, which reduces circulating estrogens. Alterations in circulating estrogen levels might facilitate the development of GC in females [45]. Among various factors that influence the gastric microbiome under different pathological conditions, particularly GC, dietary patterns exhibit an interaction with the gastric microbiota in modulating the risk of GC [46]. Thus, the combined effects of the gut microbiota and dietary patterns on human health, particularly on the risk of human gastrointestinal cancers, must be observed in an epidemiological context.
Our novel findings revealed an interaction between a high vegetable and seafood pattern score and a low MDI to attenuate the risk of GC in males (OR: 0.44; 95% CI: 0.22–0.91; p-interaction = 0.021). Previous studies reported a synergistic relationship among various dietary components, such as vegetables, fruits, pickled foods, soy products, and meats, in GC development [47,48,49]. Specifically, previous studies have revealed an interaction between HP infection and the consumption of cruciferous vegetables [48] and broccoli sprouts [50] in determining the risk of GC. Fresh vegetables contain various types of antioxidants functioning as protective agents that potentially ameliorate the effect of microbial dysbiosis. The gastric epithelium is protected by these antioxidants through different pathways, such as reducing chronic inflammation and lowering endogenous nitrosation by serving as nitrite scavengers, so that reactive nitrogen species (RNS) may not be created in the gastric lumen [46], resulting in a reduction in the potential risk of GC. Although the association was not significant, males who had a high meat and snack pattern score and a high MDI showed an increased risk of GC (OR: 1.20, 95% CI: 0.52–2.77; p-interaction = 0.089). Meat and snacks are commonly high-fat dietary components that can increase the abundance of bile-tolerant microorganisms and reduce the level of bacteria that metabolize plant polysaccharides, which potentially induces gastrointestinal carcinogenesis [51,52].
Furthermore, we observed a significant interaction between a high dairy pattern score and a low MDI to reduce the GC risk in females (OR: 0.23; 95% CI: 0.07–0.76; p-interaction = 0.018). Although data from epidemiological studies that investigated the interaction between dietary patterns and dysbiosis in terms of the GC risk are scarce, a recent review paper highlighted that probiotic-containing dairy foods may reduce the risk of various types of gastrointestinal cancers by modulating immune parameters [53]. From a biological perspective, microbial dysbiosis is directly associated with bacterial-induced inflammation because tumor necrosis factor-α (TNF-α), interleukin-6 (IL-6), and transforming growth factor-β (TGF-β) stimulate reactive oxygen species (ROS) and RNS production from epithelial and immune cells [54], and these processes are also activated by Toll-like receptor (TLR) and NOD-like receptor (NLR) signaling [55]. TLR signaling is transduced through various proteins, namely, myeloid differentiation primary response-88 (MyD88) and TIR-domain containing adapter-inducing interferon-β (TRIF). MyD88 and TRIF signaling induces the production of certain cytokines, such as TNF-α, interleukin-1 beta (IL-1β), IL-6, interferon gamma-induced protein 10 (IP-10), and interferon-γ (IFN-γ), by stimulating the transcription factors nuclear factor κB (NF-κB), activator protein 1 (AP-1), and interferon regulatory factor 3 (IRF-3). NLR activation triggers a structural rearrangement of the receptor, which induces widespread signaling; during this process, several signaling pathways are activated to induce the production of inflammasomes, NF-κB, stress kinases, IRFs, and inflammatory caspases [55]. Probiotic-containing dairy products reduce the levels of several cancer-related biomarkers produced in response to microbial dysbiosis and metabolic imbalances while increasing the production of IFN-γ, which exerts anticancer effects [56]. Furthermore, a previous study showed that IFN-γ might exert direct negative effects on the proliferation of GC cells by affecting the cell cycle, particularly arresting the cells at G1/S phase [57]. Thus, dairy products, specifically fermented milk products, should be included in a daily diet to reduce the risk of GC. Furthermore, these dietary habits might improve the bacterial diversity in the stomach to reduce the likelihood of dysbiosis due to their probiotic activity.
Our study has several strengths. To our knowledge, this study provides the first observation of the interaction effect between GGM-derived dietary patterns and gastric microbial dysbiosis on the risk of GC in a Korean population. The application of the sparsity method reduces the pattern to several foods, although actual consumption comprises a large number of foods and all of them must be retained in the dietary pattern. However, this approach may not always be correct for several possible reasons. First, dietary patterns lack a specific definition. The current definition is method-driven and can be defined operationally as data reduction [58]. Existing dietary patterns have limitations, and different approaches may identify dietary patterns using different methods irrespective of the sparsity assumption, although they can all be called dietary patterns [59]. The application of the sparsity assumption for the identification of patterns may vary with the research question and be associated with both pros and cons in those situations. For instance, a study applied sparsity to derive nutrient patterns associated with hormone receptor-defined breast cancer, where sparsity has advantages not only in identifying patterns but also in showing how foods are consumed in relation to each other [60]. Second, we included a relatively large sample to observe the associations and interaction effects with higher statistical power. Third, we considered several possible confounding variables that are potential risk factors for the association among dietary patterns, gastric microbiome and GC risk.
Our study has possible limitations. First, selection and recall biases must be considered due to the case-control design of the study. Second, because we did not perform a prospective study, the associations observed for gastric microbial dysbiosis and the GC risk might not represent a causal relationship: microbial profiles in patients with early GC might have changed because of premalignant lesions that had already progressed or because of changes in their dietary habits. Third, the potential bias in the exposure measures must be acknowledged since time elapsed between the assessment of the diet and the evaluation of the microbiome in the biopsy samples using a metagenomics analysis. Fourth, numerous tests were conducted without correction for multiple testing, which should be noted as a limitation.