*Article* **Bacteria-Derived Extracellular Vesicles in Urine as a Novel Biomarker for Gastric Cancer: Integration of Liquid Biopsy and Metagenome Analysis**

**Jae-Yong Park 1,†, Chil-Sung Kang 2,†, Ho-Chan Seo 2, Jin-Chul Shin 2, Sung-Min Kym 3, Young-Soo Park 4, Tae-Seop Shin 2, Jae-Gyu Kim 1,\* and Yoon-Keun Kim 2,\***


**Simple Summary:** Gastric cancer shows an improved prognosis when diagnosed in its early stage. However, non-invasive diagnostic markers for gastric cancer known to date have poor clinical efficacies. Many studies have shown that gastric cancer patients have distinct microbial changes compared to normal subjects. In the present study, we performed metagenome analysis using body fluid samples (gastric juice, blood, and urine) to investigate the distinct microbial composition using bacteria-derived EVs from gastric cancer patients. We could build diagnostic prediction models for gastric cancer with the metagenomic data and analyzed the accuracy of models. Although further validation is required to apply these findings to real clinical practice yet, our study showed the possibility of gastric cancer diagnosis with the integration of liquid biopsy and metagenome analysis.

**Abstract:** Early detection is crucial for improving the prognosis of gastric cancer, but there are no non-invasive markers for the early diagnosis of gastric cancer in real clinical settings. Recently, bacteria-derived extracellular vesicles (EVs) emerged as new biomarker resources. We aimed to evaluate the microbial composition in gastric cancer using bacteria-derived EVs and to build a diagnostic prediction model for gastric cancer with the metagenome data. Stool, urine, and serum samples were prospectively collected from 453 subjects (gastric cancer, 181; control, 272). EV portions were extracted from the samples for metagenome analysis. Differences in microbial diversity and composition were analyzed with 16S rRNA gene profiling, using the next-generation sequencing method. Biomarkers were selected using logistic regression models based on relative abundances at the genus level. The microbial composition of healthy groups and gastric cancer patient groups was significantly different in all sample types. The compositional differences of various bacteria, based on relative abundances, were identified at the genus level. Among the diagnostic prediction models for gastric cancer, the urine-based model showed the highest performance when compared to that of stool or serum. We suggest that bacteria-derived EVs in urine can be used as novel metagenomic markers for the non-invasive diagnosis of gastric cancer by integrating the liquid biopsy method and metagenome analysis.

**Keywords:** extracellular vesicles; gastric cancer; liquid biopsy; biomarker; microbiome; 16S rRNA amplicon; metagenomics

**Citation:** Park, J.-Y.; Kang, C.-S.; Seo, H.-C.; Shin, J.-C.; Kym, S.-M.; Park, Y.-S.; Shin, T.-S.; Kim, J.-G.; Kim, Y.-K. Bacteria-Derived Extracellular Vesicles in Urine as a Novel Biomarker for Gastric Cancer: Integration of Liquid Biopsy and Metagenome Analysis. *Cancers* **2021**, *13*, 4687. https://doi.org/10.3390/ cancers13184687

Academic Editor: Takaya Shimura

Received: 21 August 2021 Accepted: 16 September 2021 Published: 18 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

Although the incidence of gastric cancer has been steadily decreasing worldwide, it still remains one of the most common and fatal causes of cancer death in the world [1]. The 5-year survival rate of gastric cancer is particularly low in the advanced stage [2]. However, the survival rate is much higher in countries where the majority of gastric cancers are newly diagnosed in early stages. This finding is evident in Korea and Japan, where a national screening program for gastric cancer is provided [3,4].

Endoscopic examination with pathologic confirmation is the primary diagnostic modality for gastric cancer. Although relatively safe, endoscopy is an invasive procedure and can cause serious complications occasionally. It can also be a burden in terms of cost. Liquid biopsy is considered the most promising area for cancer diagnosis in that it can be an alternative to traditional tissue biopsy methods, taking advantage of the recent development of next-generation sequencing (NGS) technology [5]. The monitoring system using liquid biopsy was initially studied based on cell-free DNA in blood, but it has recently developed by using body fluids such as stool, urine, or saliva. Several studies have been conducted or are now underway for serological diagnosis of gastric cancer [6]. However, there are no non-invasive markers with significant accuracy in the early diagnosis of gastric cancer yet, which can be used in real clinical settings.

*Helicobacter pylori* infection is a well-known risk factor for gastric cancer development. Recently, the rapid development of microbiome research has focused much attention on the possibility that microbiome other than *H. pylori* may be involved in the gastric carcinogenesis process [7,8]. The human gastrointestinal tract is the most complex ecosystem well known. The commensal microorganisms in it affect not only the immune system but also numerous physiological and metabolic processes in the human body, suggesting their role in the pathogenesis of various diseases [9,10]. Many studies are underway to establish the relationship between gastric cancer and microbiomes [7,8,11]. Gut microbe-derived extracellular vesicles (EVs) are emerging as novel proof of the relationship between commensal bacteria and host health conditions [12,13]. The EVs can act as intercellular communication mediators carrying various cargoes, including signaling molecules and transcription factors. Many studies have shown that EVs are associated with various immune responses in humans, causing inflammation or inhibiting reactions [14,15]. However, in the context of gastric carcinogenesis, few studies have been reported focusing on microbiome-derived EVs. Studies linking the EVs to early diagnosis of gastric cancer are even more scarce.

In this context, we conducted a metagenome analysis using microbiome-derived EVs in stool, urine, and serum samples from a large number of prospective cohorts of gastric cancer patients and healthy controls. After identifying the potential microbial biomarkers, we developed diagnostic prediction models for gastric cancer with various types of liquid biopsy samples and validated the diagnostic performance of each model.

#### **2. Materials and Methods**

#### *2.1. Subjects and Sample Collection*

In total, 272 healthy people (159 males and 113 females) and 181 gastric cancer patients (122 males and 59 females) were enrolled from Haewoondae Baek Hospital (Busan, Korea), Chung-Ang University Hospital (Seoul, Korea), and Seoul National University Bundang Hospital (Seongnam, Gyeonggi-do, Korea) between December 2016 and December 2019. Among the patient's medical records, medical history, age, sex, endoscopic diagnosis, and pathologic results were reviewed. The inclusion criteria for the gastric cancer group were patients newly diagnosed with gastric cancer who did not undergo endoscopic or surgical resection or chemotherapy yet. The healthy control group included those without evidence of dysplasia or gastric cancer on the endoscopic examination. Patients were excluded if they had a previous history of gastrointestinal surgery, were pregnant, or were taking antibiotics, probiotics, or acid-suppressing drugs within the previous 3 months, as these conditions can temporarily alter the gut microbial composition. The minimum duration of drug cessation was determined by referring to previous literature [16–19]. This study

protocol was approved by the Institutional Review Board of Haewoondae Baek Hospital (IRB No. 129792-2015-064), Chung-Ang University Hospital (IRB No. 1772-001-290), and Seoul National University Bundang Hospital (IRB No. B-1708/412-301). Stool, serum, and urine samples were collected from the subjects for metagenomics analyses. All participants ate a regular Korean diet a day before sampling and did not smoke or drink alcohol. The regular Korean diet is characterized by high levels of whole grains, vegetables, and low levels of animal-derived foods and saturated fat, particularly in contrast to the Westernstyle diet [20]. The stool sample was collected from the center of the stool, and placed in a sterilized container, and stored at −20 ◦C. For serum collection, we drew 3 mL of blood from each subject in an SST tube. The tube was then centrifuged (3000× *g*, 15 min, 4 ◦C) immediately after collection, and the serum was extracted. The supernatant was stored in Eppendorf tube 1 mL each at −20 ◦C. For urine, 40 mL of midstream urine was collected at a clean urine container and transferred to a conical tube, which was kept frozen at −20 ◦C.

#### *2.2. Bacterial and EV Isolation and DNA Extraction from Clinical Samples*

Bacteria EVs were isolated from the urine and serum of individuals following the procedure described previously [21,22]. Briefly, each urine sample was centrifuged at 10,000× *g* for 10 min at 4 ◦C. The supernatant was taken and passed through a 0.22 μm membrane filter to eliminate foreign particles and then quantified based on protein concentration. For serum, after mixing 100 μL of serum and 900 μL PBS, it was centrifuged at 10,000× *g* for 10 min at 4 ◦C to eliminate other components. The supernatant was taken and passed through a 0.22 μm membrane filter to eliminate foreign particles, and it was quantified based on protein concentration. The stool sample was mixed with PBS for dilution in a 1:10 ratio (1 g:10 mL) and maintained at 4 ◦C for 24 h. After dilution, the sample was centrifuged (10,000× *g*, 10 min, 4 ◦C) to separate the bacteria portion and the EV portion. The rest of the procedure was carried out in the same way as urine and serum.

Bacterial DNA extraction from prepared EVs was performed as described previously [23,24]. Briefly, isolated EVs (1 μg by protein, each sample) were boiled at 100 ◦C for 40 min, centrifuged at 13,000× *g* for 30 min, and the supernatants were collected. Collected samples were then subjected to bacterial DNA extraction using a DNA extraction kit (PowerSoil DNA Isolation Kit, MO BIO, Carlsbad, CA, USA) following the manufacturer's instructions. Isolated DNA was quantified by using the QIAxpert system (QIAGEN, Hilden, Germany). In the case of stool, DNA was extracted by dividing the pellet (including bacteria portion) and the supernatant (including EV portion) to compare each.

#### *2.3. PCR Amplification, Library Construction, and Sequencing of 16S rRNA Gene Variable Regions*

To perform microbiome analysis, 16S rRNA gene amplicon metagenome analysis was conducted. Prepared bacterial DNA was used for PCR amplification of the V3-V4 hypervariable regions of the 16S ribosomal RNA genes using the primer set of 16S\_V3\_F (5 - TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCTACGGGNGGCWGCAG-3 ) and 16S\_V4\_R (5 -GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGACTACHVGGGTATCT AATCC-3 ). The PCR products were used for the construction of 16S rRNA gene libraries following the MiSeq System guidelines (Illumina Inc., San Diego, CA, USA). The 16S rRNA gene libraries for each sample were quantified using QIAxpert (QIAGEN, Germany), pooled at the equimolar ratio, and used for pyrosequencing with the MiSeq System (Illumina Inc., San Diego, CA, USA) according to the manufacturer's recommendations.

#### *2.4. Analysis of Bacterial Composition in the Microbiome*

Paired-end reads were trimmed by cutadapt (ver. 1.1.6). The resulting FASTQ files containing paired-end reads were merged using CASPER and then quality filtered by Phred (Q) score. After merging, any reads under 350 bp or over 550 bp were also discarded. Next, a reference-based chimera detection step was performed using VSEARCH against the SILVA gold database. The sequence reads were clustered into operational taxonomic units (OTUs) using the de novo clustering algorithm, and the threshold was 97% sequence

similarity. Finally, OTUs were classified using UCLUST under default parameters with SILVA 132 database.

#### *2.5. Diagnostic Prediction Models for Gastric Cancer*

The selection of biomarkers for the diagnostic model was based on relative abundances at the genus level. Candidates for bacterial biomarkers were chosen based on two criteria: the statistically significant difference (*p* < 0.05) between control and cancer subjects, the average relative abundance of 0.1% or more for each group. The whole data sets were randomly divided into training sets and test set in a ratio of 8:2. The prediction model was constructed using logistic regression models for each sample type, using the training set. The biomarkers to be used in the model were selected through a stepwise method by eliminating unnecessary variables, and Akaike's information criterion (AIC) was used as a criterion for selection variables in the prediction model. After the diagnostic prediction models were built, the performance was evaluated on the test set for validation.

#### *2.6. Statistical Analysis*

To clarify the species abundancy between healthy control and gastric cancer group, alpha diversity of the variance within each clinical sample was assessed using the alpha diversity test in the phyloseq package in R for the total observed OTUs, richness estimates Chao1, and the Shannon and Simpson diversity indices. In order to avoid the alpha diversity bias, we rarified with the minimum read value for each sample. Dimension reduction was conducted to assess the beta diversity between clinical samples based on the Bray–Curtis dissimilarity using principal coordinate analysis (PCoA) and multiple dimension scale (MDS) in the stats package in R. Permutational multivariate analysis of variance (PERMANOVA) was used to validate either the centroid or the spread of each sample are different between the groups. The significant difference between the control group and the gastric cancer group was determined using a *t*-test, and *p*-values were adjusted using the Benjamini–Hochberg method to reduce the false discovery rates. Receiver operating characteristic (ROC) curves of gastric cancer diagnostic prediction models were developed through stepwise selection of significantly altered genera. The performance of the models was evaluated by assessing the area under the ROC curve (AUC), sensitivity, specificity, and accuracy. *p* value < 0.05 was considered statistically significant.

#### **3. Results**

#### *3.1. Clinical Characteristics of Subjects*

A total of 453 subjects were registered, including 181 gastric cancer patients and 272 healthy controls. Age and sex were matched between each sample group. A total of 813 samples from enrolled subjects were used for analysis. There was no statistical difference in sex and age between the gastric cancer group and control group in all four sample types: ST-Bac (bacteria portion extracted by using centrifuged pellet in stool), ST-EV (EV portion extracted by using centrifuged supernatant in stool), urine, and serum (Table 1).

**Table 1.** Basal clinical characteristics of each sample group.


SD, standard deviation; M, male; F, female.

#### *3.2. Comparison of Alpha and Beta Diversity between Healthy Controls and Gastric Cancer Patients*

Read numbers of 16S rRNA amplicons and OTU counts derived from the NGS results are shown in Table 2. In stool samples, alpha diversity did not show significant differences between the two groups for all the diversity indices (Figure S1A,B). In urine and serum samples, Simpson index and Chao1 index showed significant differences between the two groups, respectively (Figure S1C,D). However, the differences in alpha diversity were generally not obvious. To evaluate the alpha diversity that includes all samples, a rarefaction curve analysis was performed with the Chao1 index to calculate species abundance per sequence. The slope of the rarefaction curve was steeper in the healthy control group than in the gastric cancer group in all sample types, indicating higher alpha diversity in the control group than in the gastric cancer group (Figure 1A–D).

**Table 2.** Read numbers of 16S rRNA amplicons and OTU counts from the NGS results.


OTU, operational taxonomic unit; SE, standard error.

**Figure 1.** Comparison of alpha diversity between healthy controls and gastric cancer patients in 4 sample types. Estimated species richness (Chao1 measure) is demonstrated for the two groups (**A**) in stool bacteria (ST-Bac) isolated from stool pellet, (**B**) in stool-derived extracellular vesicle (ST-EV) isolated from stool supernatant, (**C**) in urine, and (**D**) in serum.

At the phylum level, the beta diversity showed significant differences between the control group and the gastric cancer group in all sample types (Figure 2A). At the genus level, these differences were repeatedly confirmed, with even higher statistical significances (Figure 2B). Although we could not detect a significant reduction in microbial diversity

in gastric cancer, the microbial composition in the gastric cancer group was significantly different from that in healthy controls.

**Figure 2.** Comparison of beta diversity in phylum and genus levels between healthy controls and gastric cancer patients in 4 sample types. PCoA results based on Bray–Curtis similarity for beta diversity of bacteria are shown (**A**) in phylum level and (**B**) in genus level.

#### *3.3. Relative Abundance Differences between Healthy Controls and Gastric Cancer Patients*

We compared the relative abundances of microbiome between the control group and the gastric cancer group to identify the taxa that were differentially represented in the two groups of subjects. In the comparison of phylum levels, heatmaps using ST-Bac samples showed that *Bacteroidetes* were reduced, while *Firmicutes* were increased in gastric cancer (Figures 3A and S2A). In ST-EV samples, *Bacteroidetes* were reduced, while *Actinobacteria* were increased in gastric cancer (Figures 3B and S2B). In urine samples, *Bacteroidetes* and *Fusobacteria* were both increased in gastric cancer, differently from ST-Bac and ST-EV samples (Figures 3C and S2C). In serum samples, *Verrucomicrobia* were reduced, while *Actinobacteria* were increased in gastric cancer (Figures 3D and S2D).

**Figure 3.** Relative abundance differences in phylum level between healthy controls and gastric cancer patients in 4 sample types. Heatmaps representing the relative abundances of microbiome between the two groups (**A**) in stool bacteria (ST-Bac) isolated from stool pellet, (**B**) in stool-derived extracellular vesicle (ST-EV) isolated from stool supernatant, (**C**) in urine, and (**D**) in serum.

When we compared the relative abundances in genus level, more bacteria became candidates showing the differences between the control group and gastric cancer group. In ST-Bac samples, *Prevotella 9* was decreased, while *Streptococcus*, *Subdoligranulum, Enterobacter, Lactobacillus*, *Klebsiella*, *Ruminiclostridium 9* were increased in gastric cancer (Figures 4A and 5A). In ST-EV samples, *Acinetobacter* was increased in gastric cancer (Figures 4B and 5B). In urine samples, the largest number of bacterial candidates were detected, revealing that *Acinetobacter*, *Stayphylococcus*, *Bifidobacterium*, and *Sphingomonas* were decreased, while *Corynebacterium 1*, *Neisseria*, *Fusobacterium, Diaphorobacter, Actinomyces, Porphyromonas, Cloacibacterium,* and *Peptoniphilus* were increased in gastric cancer (Figures 4C and 5C). In serum samples, *Bacteroides*, *Akkermansia*, *Muribaculaceae(f)*, and *Lachnospiraceae NK4A136* were decreased, while *Corynebacterium 1, Rhodococcus*, *Diaphorobacter*, *Haemophilus*, and *Cloacibacterium* were increased in gastric cancer (Figures 4D and 5D).

**Figure 4.** Relative abundance differences in genus level between healthy controls and gastric cancer patients in 4 sample types. Heatmaps representing the relative abundances of microbiome between the two groups (**A**) in stool bacteria (ST-Bac) isolated from stool pellet, (**B**) in stool-derived extracellular vesicle (ST-EV) isolated from stool supernatant, (**C**) in urine, and (**D**) in serum.

**Figure 5.** Compositional differences in genus level between healthy controls and gastric cancer patients in 4 sample types. The bar plots highlight the average relative abundance of individual key taxa between the two groups (**A**) in stool bacteria (ST-Bac) isolated from stool pellet, (**B**) in stool-derived extracellular vesicle (ST-EV) isolated from stool supernatant, (**C**) in urine, and (**D**) in serum. \*, *p* < 0.05; \*\*, *p* < 0.01.

#### *3.4. Comparison of Diagnostic Prediction Models for Gastric Cancer between Healthy Controls and Gastric Cancer Patients*

To further define useful biomarkers from metagenomic biomarkers, optimal models were built using biomarkers to distinguish between gastric cancer group and control group using logistic regression analysis (Figure 6A–D). Selected variables included in the prediction models according to each sample type were as follows: ST-Bac-based model, *Klebsiella, Subdoligranulum, Prevotella 9, Streptococcus, Ruminiclostridium 9;* stool EV-based model, *Acinetobacter*; urine-based model, *Peptoniphilus, Diaphorobacter, Neisseria, Staphylococcus, Bifidobacterium, Corynebacterium 1, Actinomyces, Acinetobacter, Sphingomonas*; serumbased model, *Diaphorobacter, Bacteroides, Corynebacterium 1, Rhodococcus, Cloacibacterium, Haemophilus, Muribaculaceae(f), Akkermansia*.

**Figure 6.** Comparison of gastric cancer diagnostic prediction models between healthy controls and gastric cancer patients in 4 sample types. ROC curves of gastric cancer diagnostic prediction models were developed through stepwise selection of significantly altered genera. Models were validated by performance evaluation in the test set by assessing the area under the curve (AUC), sensitivity, specificity, and accuracy. Performance indices of each model are demonstrated: (**A**) stool bacteria (ST-Bac) isolated from stool pellet, (**B**) stool-derived extracellular vesicle (ST-EV) isolated from stool supernatant, (**C**) urine, and (**D**) serum. SEN, sensitivity; SPE, specificity; ACC, accuracy.

As a result, the model using the urine samples showed the highest AUC value of 0.823 (Figure 6C). Although the sensitivity was rather low (67.7%) in this model, the specificity (84.9%) and accuracy (76.1%) were high when compared to other models. The ST-Bac-based model showed an AUC score of 0.764, lower than the urine-based model, but their sensitivity (71.4%), specificity (76.9%), and accuracy (74.1%) were generally high (Figure 6A). The ST-EV-based model and serum-based model revealed lower AUC values and accuracies than the two models mentioned above (Figure 6B,D).

To further analyze the performance of prediction models according to disease stages, we divided the gastric cancer group into early gastric cancer (EGC) and advanced gastric cancer (AGC) groups. The performance of the prediction model was higher in the AGC group than in the EGC group in all sample types except for ST-Bac Figure S3A–H). For EGC, the ST-Bac prediction model showed the highest AUC value of 0.830.

#### **4. Discussion**

We performed microbiome profiling of gastric cancer using various types of samples through analyzing the bacteria-derived EVs to find out diagnostic biomarkers for gastric cancer in correlation with gut microbes. There were significant differences in microbial composition between the gastric cancer group and the control group in all of the sample types. Diagnostic prediction models for gastric cancer were generated based on this information on metagenomic biomarkers, and the model using the urine samples showed the highest performance for the diagnosis of gastric cancer.

In this study, although there seemed to be a trend for reduced alpha diversity in gastric cancer, further analysis with various indices showed that the differences in microbial diversity between the two groups were not consistent among various sample types. This could also be indirectly inferred from the finding that the OTU values of the gastric cancer group and control group showed high standard deviations, which means that the number of microbial taxa in the two groups overlapped each other to some extent. This finding is different from previous reports, which suggest that reduced microbial diversity is often a characteristic feature of diseased status [25,26]. Several studies showed dysbiotic cancer-associated microbiome in gastric cancer, implying the role of microbial dysbiosis in gastric carcinogenesis [8,25]. This discrepancy may be due to several reasons. We used samples from extragastric area that reflect systemic circulation in this study, whereas samples obtained from the stomach (gastric juice, gastric mucosa, etc.) were mainly used in previous studies. Indirect analysis of microbiome using bacteria-derived EVs rather than direct identification of microbiome might also be a possible explanation. However, a comparison of beta diversity between the gastric cancer group and the control group revealed that microbial composition in the gastric cancer group was significantly different from that in healthy controls, which was evident with all four types of samples.

The composition of microbiota was further investigated with a relative abundance of taxa. The five most dominant bacterial phyla in stool were *Bacteroidetes*, *Proteobacteria*, *Actinobacteria*, *Verrucomicrobia*, and *Firmicutes*, which are consistent with previous literature [27,28]. EVs from these phyla were also highly abundant in urine and serum samples. This finding implies that bacteria-derived EVs in systemic circulation can roughly reflect the microbial composition of gastrointestinal microbes. Interestingly, *Proteobacteria* were dominant in urine and serum EVs, while *Firmicutes* and *Bacteriodetes* were the most abundant in stool samples at the phylum level. *Firmicutes* and *Bacteriodetes* generally comprise more than 90% of the human gut microbiome, according to previous studies [28], similar to the results from our study. In addition, *Firmicutes* were increased in stool, while EVs from *Firmicutes* did not show a significant change in serum in gastric cancer. The relative abundances of *Bacteroidetes* in stool and urine EVs also showed conflicting results. The difference in microbial composition was even more evident at the genus level. *Acinetobacter* (phylum *Proteobacteria*) were the most dominant in urine and serum EVs identically, whereas *Prevotella* and *Bacteroides* (phylum *Bacteriodetes*) were the most dominant in stool samples. Representative strains showing changes in gastric cancer were largely different between stool and urine/serum samples. Moreover, the gross pattern of the microbiome from urine and serum samples was similar, which can easily be inferred from the fact that urine is basically filtrated from blood during systemic circulation. This result shows that the microbial composition directly identified from the bacteria present in the gut lumen differs from the composition of bacteria indirectly inferred from the EVs, which are absorbed through the gut mucosa into the systemic circulation. It is recently recognized that bacteria secrete EVs, which are intercellular communicasomes between the host and

commensal microbes, that can act as biologically active metabolites or as mediators of host-microbiome interaction [15,29,30]. Considering this, it can be inferred that there is a significant difference between the microbiome simply existing in the gastrointestinal tract and the core microbiome that are deeply involved in host-microbiome interaction, actually affecting the health of the host. In fact, gastric juice and feces contain lots of dead strains and strains with little clinical significance [31,32]. This might partly explain why the microbial diversity between the gastric cancer group and the control group was not consistent among various sample types. Although reduced microbial diversity in the gastrointestinal tract is often a characteristic feature of diseased status, this does not necessarily mean that the bacteria related to systemic circulation should also be less diverse in exactly the same manner. We do not yet fully understand which bacteria play major roles by host-microbiome interaction in certain diseases and how they systemically influence the disease pathogenesis. In fact, blood was known as a sterile specimen except for sepsis, but recent studies have shown that 16S rRNA gene-targeted NGS is possible with normal human blood, suggesting the presence of bacteria-derived EVs in blood. It is known that there is a difference in the EV composition in serum and plasma, and each of them has its inherent advantage according to the type of EVs and purpose of EV analysis [33–35]. Especially in terms of platelet richness in the plasma, there are reports that platelet-rich plasma (PRP) has some antibacterial effect [36,37]. Since 16s rRNA NGS analysis is based on bacteria, the platelet-poor plasma would be preferred over PRP in metagenome analysis using bacteria-derived EVs if plasma is used instead of serum. In brief, when performing microbiome analysis from the perspective of systemic effect on the host, it should be noted that circulating EVs found in blood or urine can have a greater meaning than analyzing samples directly obtained from the gastrointestinal tract.

We tried to explore the possibility of a prediction model that can be used for the early diagnosis of gastric cancer by using the microbiome data derived through EV analysis. Currently, some Asian countries, such as Korea and Japan, where the incidence of gastric cancer is very high, have been operating national cancer screening programs aiming to detect gastric cancer early. These programs increased the early detection of gastric cancer and lowered the mortality rate [3]. However, considering the socioeconomic costs of mass screening for a large number of people and the discomfort or complications from invasive procedures such as endoscopy, other new non-invasive diagnostic methods need to be developed. With the results of microbiome analysis, we investigate the possibility of liquid biopsy using diluted stool, urine, and serum.

In this study, prediction models were established for each sample type to distinguish gastric cancer patients from normal controls, and they were validated to evaluate the performance. The AUC value was higher than 0.7 in urine- and ST-Bac-based models, which is good performance according to the evaluation criteria (0.9–1.0 = excellent, 0.8–0.9 = very good, 0.7–0.8 = good, 0.6–0.7= sufficient, 0.5–0.6 = bad, <0.5). = not useful) [38], and the prediction model with urine showed the highest AUC value. However, although the specificity (84.9%) of the urine-based model was high, the sensitivity (67.7%) of this model was lower compared to that of the ST-Bac-based model (71.4%). When EGC and AGC groups were separately analyzed, although the ST-Bac-based prediction model showed the highest AUC value of 0.830 for EGC, the performances of prediction models were generally higher in the AGC group than in the EGC group except for ST-Bac. This indicates that the performances of prediction models differ by cancer stage and sample type. Accurate prediction of EGC seemed more difficult than AGC as one might easily expect, considering that the microbial changes would be more prominent in a more advanced state of the disease. In order to evaluate the suitability as a diagnostic test, it is necessary to consider various factors such as sensitivity, disease prevalence, cost-effectiveness, and convenience of sample collection. Especially, an ideal screening method should show adequate sensitivity and specificity, and there is typically a trade-off between these two performance metrics. High sensitivity is especially important in this situation, as a low value can lead to increased false-negative results, which means missing more cases with

the disease of interest. Although the specificity of the urine-based model in our study was about 85%, the low sensitivity of the model prevents it from being a suitable screening method as it stands. However, despite the relatively low sensitivity of the urine-based model, it should also be considered that urine is much easier to collect than stool samples. Recent studies have also shown that the bacteria-derived EVs in urine samples contain characteristic features in allergic airway disease with a significant correlation with total IgE and eosinophil count, which further supports the possible role of microbial EVs in the implication of microbiota in the diseased state [39,40]. To further investigate the effect of *H. pylori* on the performance of the prognostic models, we tried to use stool samples to detect *H. pylori* first. However, *H. pylori* was detected in ST-Bac samples in only 13.6% of the cases. Furthermore, among the *H. pylori*-positive gastric cancer samples, the relative abundance of *H. pylori* was lower than 0.1% in all but one case. These findings made us assume that the validity of the analysis according to *H. pylori* status might be quite limited, especially when the detection rate and relative abundance of *H. pylori* are low. It is reasonable to investigate the gastric microbiome as a whole, the members of which influence each other and change the adjacent environment. Therefore, we decided to assess the diagnostic ability of prediction models using the whole integrated metagenome data, including *H. pylori*, as a comprehensive biomarker.

This study has some limitations. First, we did not perform the microbiome analysis according to *H. pylori* status. As *H. pylori* is the single most important pathogen for the gastric carcinogenesis process, this might have influenced the interpretation of study results. As the control subjects were recruited from health check-up programs, detailed histological information on the degree of mucosal atrophy was not available. Hypoacidity due to atrophic gastritis could alter the gut microbial composition, which is important in regions with high *H. pylori* prevalence, such as Korea. Therefore, the findings of the present study do not necessarily separate gastric cancer from atrophic pangastritis without cancer. Nevertheless, we showed a possibility of discriminating between gastric cancer patients and the general population without gastric cancer, using metagenomic data from a relatively large number of subjects. Further biomarker studies are needed for the detection of gastric cancer, and mucosal atrophy and *H. pylori* infection should be considered. Second, the performances of diagnostic prediction models were suboptimal. The urine-based model, which had the highest AUC value, showed rather low sensitivity. The results of our study further need to be verified through external validation.

#### **5. Conclusions**

In this study, we have shown that bacteria-derived EVs in systemic circulation can be used for demonstrating the changes in microbial composition in gastric cancer. Bacteriaderived EVs in urine harbors a potential as a new diagnostic biomarker for gastric cancer, suggesting that EVs might be a new standard substance for cancer diagnosis. Although it would be difficult to directly apply the results from this study to real clinical practice yet due to the suboptimal sensitivity and specificity of the diagnostic prediction models, these findings still have great significance in that they showed the possibility of integrating a liquid biopsy method with metagenome analysis for gastric cancer diagnosis.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/cancers13184687/s1, Figure S1: Alpha diversity of microbiome defined by various indices in 4 sample types, Figure S2: Changes in the relative abundances of individual phyla in gastric cancer in 4 sample types, Figure S3: Comparison of gastric cancer diagnostic prediction models between healthy controls and EGC or AGC in 4 sample types.

**Author Contributions:** Conceptualization, J.-G.K. and Y.-K.K.; methodology, C.-S.K. and T.-S.S.; software, Y.-K.K.; validation, H.-C.S. and J.-C.S.; formal analysis, H.-C.S. and J.-C.S.; investigation, S.-M.K., Y.-S.P. and J.-G.K.; data curation, J.-Y.P. and C.-S.K.; writing—original draft preparation, J.-Y.P. and C.-S.K.; writing—review and editing, T.-S.S., Y.-K.K. and J.-G.K.; visualization, H.-C.S.; funding acquisition, Y.-S.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF), funded by the Ministry of Science & ICT under grant number NRF-2017M3A9F3047495.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board of Haewoondae Baek Hospital (IRB No. 129792-2015-064) on 22 June 2015, Chung-Ang University Hospital (IRB No. 1772-001-290) on 28 August 2017, and Seoul National University Bundang Hospital (IRB No. B-1708/412-301) on 16 July 2017.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** Special thanks to MD Healthcare Inc. for providing the sequencing and analysis services for this study.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study, in the collection, analyses, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

#### **References**


## *Article* **Mutations in Epigenetic Regulation Genes in Gastric Cancer**

**Marina V. Nemtsova 1,2, Alexey I. Kalinkin 2, Ekaterina B. Kuznetsova 1,2, Irina V. Bure 1, Ekaterina A. Alekseeva 1,2, Igor I. Bykov 3, Tatiana V. Khorobrykh 3, Dmitry S. Mikhaylenko 1,2, Alexander S. Tanas <sup>2</sup> and Vladimir V. Strelnikov 2,\***


**Simple Summary:** Epigenetic mechanisms, such as DNA methylation/demethylation, covalent modifications of histone proteins, and chromatin remodeling, create specific patterns of gene expression. Epigenetic deregulations are associated with oncogenesis, relapse of the disease and metastases, and can serve as a useful clinical marker. We assessed the clinical relevance of integrity of the genes coding for epigenetic regulator proteins by mutational profiling of 25 genes in 135 gastric cancer (GC) samples. Overall, mutations in the epigenetic regulation genes were found to be significantly associated with reduced overall survival of patients in the group with metastases and in the group with tumors with signet ring cells. We have also discovered mutual exclusivity of somatic mutations in the *KMT2D*, *KMT2C*, *ARID1A*, and *CHD7* genes in our cohort. Our results suggest that mutations in epigenetic regulation genes may be valuable clinical markers and deserve further exploration in independent cohorts.

**Abstract:** We have performed mutational profiling of 25 genes involved in epigenetic processes on 135 gastric cancer (GC) samples. In total, we identified 79 somatic mutations in 49/135 (36%) samples. The minority (*n* = 8) of mutations was identified in DNA methylation/demethylation genes, while the majority (*n* = 41), in histone modifier genes, among which mutations were most commonly found in *KMT2D* and *KMT2C*. Somatic mutations in *KMT2D*, *KMT2C*, *ARID1A* and *CHD7* were mutually exclusive (*p* = 0.038). Mutations in *ARID1A* were associated with distant metastases (*p* = 0.03). The overall survival of patients in the group with metastases and in the group with tumors with signet ring cells was significantly reduced in the presence of mutations in epigenetic regulation genes (*p* = 0.036 and *p* = 0.041, respectively). Separately, somatic mutations in chromatin remodeling genes correlate with low survival rate of patients without distant metastasis (*p* = 0.045) and in the presence of signet ring cells (*p* = 0.0014). Our results suggest that mutations in epigenetic regulation genes may be valuable clinical markers and deserve further exploration in independent cohorts.

**Keywords:** gastric cancer; epigenetic regulation genes; somatic mutations; molecular genetic markers

#### **1. Introduction**

Gastric cancer (GC) is the 5th most common tumor in the world, and is the 3rd leading cause of cancer-related deaths worldwide. In 2018, more than 1,000,000 new GC patients were identified [1].

Recently, knowledge about the molecular mechanisms of gastric carcinogenesis has been intensively expanded. By using genome-wide approaches, The Cancer Genome Atlas (TCGA) Research Network divided GC into four molecular subtypes: Epstein-Barr associated (EBV), microsatellite instability (MSI), genomically stable (GS), and chromosomal

**Citation:** Nemtsova, M.V.; Kalinkin, A.I.; Kuznetsova, E.B.; Bure, I.V.; Alekseeva, E.A.; Bykov, I.I.; Khorobrykh, T.V.; Mikhaylenko, D.S.; Tanas, A.S.; Strelnikov, V.V. Mutations in Epigenetic Regulation Genes in Gastric Cancer. *Cancers* **2021**, *13*, 4586. https://doi.org/10.3390/ cancers13184586

Academic Editor: Takaya Shimura

Received: 6 August 2021 Accepted: 9 September 2021 Published: 13 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

instable (CIN) [2]. Next-generation sequencing (NGS) technologies have allowed identification of genes with an increased frequency of somatic mutations in different types of tumors. Those are the driver genes of carcinogenesis. Being used as targets for a therapy, such genes allow effective treatment of patients. However, GC is not enriched with mutations in known driver genes. Therefore, the targeted drugs that are useful in the treatment of other types of tumors are not effective in GC. Despite the intensive search for new drugs for cancer therapy, only trastuzumab and ramucirumab targeting HER2 and VEGFR2, respectively, are currently approved for GC treatment. Therefore, the search for novel genes with an increased somatic mutation frequency in GC is urgent to identify new clinical and prognostic markers, as well as new targets for treatment.

Epigenetic mechanisms, including DNA methylation/demethylation, covalent modifications of histone proteins (methylation, adenylation, phosphorylation, etc.), chromatin remodeling, and the action of non-coding RNAs create stable and clear patterns of gene expression during cell life. Epigenetic mechanism deregulations are associated with carcinogenesis, relapse of the disease, and metastasis, and can also serve as a useful clinical marker and a marker of response to therapy [3]. Application of NGS allowed identification of tumors without mutations in the known cancer driver genes that are, however, characterized by mutations in genes encoding epigenetic factors and chromatin-modifying enzymes. Today, deregulation of epigenetic mechanisms in different types of tumors has been confirmed, but its causes are insufficiently studied [4,5].

Somatic mutation profiling of epigenetic regulation genes will help to identify causes of epigenetic deregulation in GC and to suggest potential targets for successful therapy.

Using an NGS panel of 25 genes (*DNMT1, MBD1, TET1, DNMT3A, DNMT3B, EZH2, KDM6A, EP300, JARID1B, CREBBP, HDAC2, SIRT1, SMARCB1, SMARCA2, SMARCA4, ARID1A, ARID2, BRD7, PBRM1, CHD5, CHD7, CHD4, KMT2A, KMT2D and KMT2C*), we performed somatic mutation profiling in 135 tumor samples obtained from patients with GC.

#### **2. Materials and Methods**

#### *2.1. Patients and Tumor Samples*

The study included 135 patients with locally advanced GC who were treated in N.N. Burdenko Faculty Surgery Clinic, I.M. Sechenov First Moscow State Medical University from 2007 to 2015. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Ethics Committee of I.M. Sechenov First Moscow State Medical University. Written informed consent was obtained from each participant in this study. All patients underwent surgical treatment, and resected tumor samples, as well as non-malignant gastric mucosa samples, were used in the study. GC was confirmed in all patients by morphological examination of the surgical material. For TNM staging, ESMO Clinical Practice Guidelines for diagnosis, treatment, and follow-up for gastric cancer [6] were used. The distribution of patients in clinical groups is presented in Table 2.

#### *2.2. Mutation Screening by NGS*

A total of 5 to 7, 10 μm paraffin sections were manually dissected to ensure that each sample contained at least 70% of neoplastic cells. Genomic DNA was isolated from archived samples using a QIAamp DNA FFPE Tissue Kit (Qiagen, Hilden, Germany), as recommended by the manufacturer.

Deep sequencing was performed using the Ion Torrent platform (ThermoFisher, Waltham, MA, USA) following established protocol [7]. The protocol includes the preparation of libraries of genomic DNA fragments, clonal emulsion PCR, sequencing, and bioinformatic analysis of obtained results. DNA fragment libraries were prepared using Ion Ampliseq ultra-multiplex PCR technology.

An epigenetic regulation genes panel with 1376 primer pairs was designed to amplify all coding regions, noncoding regions of the terminal exons, and putative splice site gene regions for 25 human genes: *DNMT1, MBD1, TET1, DNMT3A, DNMT3B, EZH2, KDM6A,* *EP300, JARID1B, CREBBP, HDAC2, SIRT1, SMARCB1, SMARCA2, SMARCA4, ARID1A, ARID2, BRD7, PBRM1, CHD5, CHD7, CHD4, KMT2A, KMT2D* and *KMT2C*. The panel was designed by using the Ion Ampliseq Designer v. 7.03 (ThermoFisher, Waltham, MA, USA). The total length of human genome sequences covered by the panel was 250,900 bp. The panel reached 98.09% coverage by design; this applies to exons and 25 bp flanking intron sequences. The information of the panel is shown in Tables S3 and S4. The selection of epigenetic regulation genes for the panel was based on the estimation of the frequency of their somatic mutations in GC, obtained from the COSMIC database and from the literature. Genes reported to be mutated in >3.5% of GC samples were included in the panel.

Multiplex PCR and subsequent stages of the fragment library preparation were performed using an Ion AmpliSeq Library Kit 2.0 (ThermoFisher, Waltham, MA, USA), according to the manufacturer's protocol. Aliquots from the prepared libraries were subjected to clonal amplification on microspheres in the emulsion on the Ion Chef Instrument (ThermoFisher, Waltham, MA, USA). Sequencing was performed on the Ion S5 genomic sequencer according to the manufacturer's protocol (ThermoFisher, Waltham, MA, USA) with the targeted sequencing depth of 1000×. The results were analyzed with Torrent Suite software consisting of Base Caller (the primary analysis of the sequencing results); Torrent Mapping Alignment Program—TMAP (alignment of the sequences to the reference genome GRCh37/hg19); and Torrent Variant Caller (analysis of variations in nucleotide sequences) with the cut-off for variant allele frequency set at 0.1, and minimum read depth of the variant allele set at 5. Genetic variants were annotated with ANNOVAR software [8]. Visual data analysis, manual filtering of sequencing artifacts, and sequence alignment were performed using the Integrative Genomics Viewer (IGV) [9].

#### *2.3. Sanger Sequencing*

Sanger sequencing was performed in order to (1) validate mutations detected by NGS screening and (2) distinguish somatic vs. germline mutations. For the second purpose, DNA samples extracted from archived non-malignant gastric mucosa of the same patients were used. The direct sequencing of individual PCR products from primers that flank areas of specific mutations were performed on the automatic genetic analyzer ABI PRISM 3500 (ThermoFisher, Waltham, MA, USA) according to the manufacturer's protocols.

#### *2.4. Statistical Analysis*

Samples were compared using Fisher's exact test. For more than 3 groups comparison Chi-squared test was used. Overall survival probability (OS) was calculated by the Kaplan– Meier product-limit method from the date of surgery till death by any cause and compared statistically using Mantel–Haenszel (log-rank) test. A groupwise mutual exclusivity test was carried out using the DISCOVER (Discrete Independence Statistic Controlling for Observations with Varying Rates) method, which is based on overall tumor-specific alteration rates to decide if alterations co-occur more or less than expected by chance and preventing spurious associations in co-occurrence detection with increasing statistical power to detect mutual exclusivities [10]. All calculations were conducted using R version 3.6.3 [R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/ accessed on 7 August 2021].

#### *2.5. Pathogenicity Prediction for Novel Mutations*

To predict the pathogenicity of identified novel missense variants, a combination of PolyPhen2, PROVEAN, SIFT, and MutPred2 tools was used. I-Mutant 3.0 software was used to calculate the stability of the mutant protein. Loss of protein function effects were assessed with MutPred-LOF software. The effect of nonsynonymous substitutions on the structure was illustrated using the Project HOPE3D portal.

#### **3. Results**

#### *3.1. The Spectrum of Detected Somatic Mutations*

Using a targeted NGS panel for 25 epigenetic regulation genes, we performed mutational profiling in 135 tumor samples obtained from patients with GC. Our panel included the *DNMT1, MBD1, TET1, DNMT3A, DNMT3B* genes that control DNA methylation/demethylation; the *EZH2, UTX, EP300, JARID1B, CREBBP, HDAC2, SIRT1, KMT2A, KMT2D,* and *KMT2C* genes encoding histone modifiers; and the *SMARCB1, SMARCA2, SMARCA4, ARID1A, ARID2, BRD7, PBRM1, CHD5, CHD7, CHD4* genes responsible for chromatin remodeling. Mapped data depth and coverage for each sample are presented in Table S5. For the analysis, we selected missense substitutions that were not annotated in the ClinVar, COSMIC, dbSNP databases and/or substitutions with a population frequency of MAF < 0.0005, as well as nonsense mutations and frameshift mutations. A total of 79 different mutations found in our cohort fulfilled the selection criteria. The variant allele frequency, total read depth, reference, and variant allele read depths, etc., for each of these mutations, are presented in Table S1. No appropriate mutations were found in the *DNMT1, DNMT3A, EZH2, UTX, SMARCB1* and *SIRT1* genes. The identified mutations and their characteristics are presented in Table 1.

**Table 1.** Somatic mutations detected in epigenetic regulation genes in 135 gastric tumors.



**Table 1.** *Cont.*


**Table 1.** *Cont.*

In total, we revealed 79 somatic mutations that fulfilled the selection criteria in 49/135 (36%) samples, and no mutations were found in the remaining samples. Among the identified variants, 29/79 were not annotated in dbSNP, 32/79 were not mentioned in gnomAD Exomes, and 68/79 were not mentioned in ClinVar.

The largest number of mutations was determined in histone modifier genes (41), and in chromatin remodeling genes (37). The smallest number was in DNA methylation/demethylation genes (8). Taking into consideration variation in the gene size, we normalized the mutation numbers in these three groups. Of the genes under study, histone modifier genes contained collectively 24,207 codons; chromatin remodeling genes, 16,891 codons, and DNA methylation/demethylation genes, 6188 codons. Thus, frequencies of mutations in these three groups were 0.0017, 0.0022, and 0.0013 per codon, respectively. These figures support a somewhat lower somatic mutation burden on the DNA methylation/demethylation genes, although the differences were not statistically significant. The distribution of variants in the epigenetic regulation genes in our patient samples was as follows: *KMT2D*-16, *ARID1A*-9, *KMT2A*-8, *KMT2C*-8, *CHD7*-7, *CHD5*-5, *CHD4, CREBBP*-4 each, *ARID2*, *SMARCA2, SMARCA4, DNMT3B* and *TET1*-3 each, *HDAC2, EP300, BRD7* and *MBD1*-2 each, and *PBRM1, JARID1B*-1. In 23/49 samples, a combination of more than one mutation in different genes was demonstrated, but mutations in *KMT2D, KMT2C, ARID1A,* and *CHD7* were significantly rarely found in one and the same sample (*p* = 0.038).

#### *3.2. Pathogenicity Analysis of the Detected Mutations by Prediction Programs*

For all novel mutations that fulfilled the selection criteria, pathogenicity analysis was performed by using prediction programs. By in silico analysis of pathogenicity for somatic alterations, we determined that 15/63 alterations were pathogenic according to more than two prediction tools. PolyPhen2-HumDiv predicted 26 of those as 'Probably damaging', and the other 15 were 'Possibly Damaging', whereas PolyPhen2-HumVar predicted 17 alterations as 'Probably damaging', 11 alterations as 'Possibly damaging', while other 35 alterations were 'Benign'. However, it should be noticed that PolyPhen2- HumVar is more effective in mutations pathogenicity prediction for Mendelian disorders. 26/63 somatic alterations were predicted as 'Deleterious' by PROVEAN prediction tool; 42/63 variants were indicated as 'Damaging' by SIFT.

MutPred2 and MutPred-LOF are machine learning approaches, which incorporate genetic and molecular information to predict whether the alteration is pathogenic or not. We assigned a threshold value of 0.68 for pathogenic, as recommended by developers, because it yields a false positive rate of 10%. With this assumption, 11/63 somatic missense variants were predicted as pathogenic, as well as and 10/16 nonsense and frameshift variants, by MutPred-LOF with a cut-off value of 0.50 (as recommended for MutPred-LOF).

I-Mutant 3.0 predicts protein stability changes based on a protein sequence or protein structure by using a support vector machine training algorithm. The I-Mutant 3.0 predicted a decrease in protein structure stability for 44 somatic alterations and an increase for the other 19 (Table S2).

*3.3. Analysis of Clinical Significance of Mutations in Epigenetic Regulation Genes*

The distribution of mutations in our patient cohort aligned to clinical features is shown in Figure 1.

**Figure 1.** Spectrum of epigenetic regulation genes somatic mutations in gastric cancer. Gene names are marked according to functions of the encoded proteins: histone modifiers, blue; chromatin remodeling, red; DNA methylation/demethylation, magenta.

> We found no associations of overall somatic mutation status (absence of mutations vs. presence of at least one mutation) of epigenetic regulation genes with gender, age, tumor size, lymph node metastases, stage, anatomical localization, Lauren type, distant metastases, and presence of signet ring cells (Table 2). As for individual genes, we have only discovered that mutations in *ARID1A* were associated with distant metastases (*p* = 0.03).


**Table 2.** Clinical characteristics of patients and their distribution by groups with mutations (mut+) and without mutations (mut−) in epigenetic regulation genes.

\* patients group with mutations in epigenetic regulation genes (mut+) vs. without mutations (mut−).

In the analysis of survival using the Kaplan–Meier method, we found that the overall survival of patients in the group with metastases and the group of tumors with signet ring cells was significantly reduced in the presence of mutations in the epigenetic regulation genes (*p* = 0.036 and *p* = 0.042, respectively) comparing with patients without mutations (Figure 2).

**Figure 2.** Overall survival in patients with and without somatic mutations of the epigenetic regulation genes in their gastric tumors, and (**a**) with distant metastases; (**b**) with the presence of signet ring cells in tumors.

Somatic mutations in the chromatin remodeling genes correlate with a low survival rate of patients in the absence of distant metastases (*p* = 0.045) and with the presence of signet ring cells in tumors (*p* = 0.0014) (Figure 3).

For the group of histone-modifying genes, no significant clinical correlations were found. The group with mutations in the DNA methylation/demethylation genes included only 8 patients and was too small to perform statistical analysis.

**Figure 3.** Overall survival in patients with and without somatic mutations of the chromatin remodeling genes in their gastric tumors, and (**a**) with distant metastases; (**b**) with the presence of signet ring cells in tumors.

#### **4. Discussion**

Somatic mutations in epigenetic regulation genes are not very common in GC and were determined only in 36% samples (49/135) in our study. Mutations were most rarely detected in genes regulating DNA methylation/demethylation. We have not found any somatic mutations in *DNMT1* and *DNMT3A*. Besides, the group of patients with mutations in the DNA methylation-related genes (*MBD1*, *TET1*, *DNMT3B*) was the smallest one with only 8 out of 135 patients. Such a low frequency may be a result of the cancer type being investigated. Chai-Jin Lee et al. demonstrated that frequencies of somatic mutations in genes associated with DNA methylation and demethylation (*DNMT1*, *DNMT3A*, *MBD1*, *MBD4*, *TET1*, *TET2* and *TET3*) significantly varied in different types of cancers. Thus, in myeloid leukemia samples, the frequency of *DNMT1* and *DNMT3A* mutations was high, whereas, in glioblastoma, renal cell carcinoma, and colon carcinoma, the total mutation rate was less than 9% [11]. The low frequency of mutations in the DNA methylation drivers in solid tumors is consistent with our results. Many studies have been published

demonstrating DNA methylation as a clinical marker of carcinogenesis; however, the role of somatic mutations in genes regulating methylation/demethylation in solid tumors has not yet been sufficiently investigated. Moreover, although we did not find any *DNMT3A* mutations in our samples, they were identified in other solid tumors. In 1.2% of papillary thyroid carcinoma cases, mutations and/or loss of *DNMT3A* expression were associated with aggressive clinical course and poor outcome [12].

In our work, the largest number of mutations was detected in histone modification genes (52%, 41/79), with 16 mutations in *KMT2D*, 8 in *KMT2C,* and 8 in *KMT2A*. The proteins encoded by these KMT2 (histone-lysine N-methyltransferases subclass 2) genes were components of a COMPASS-like complex that performs mono-, di-, and trimethylation of lysine 4 (H3K4) in histone 3 and is associated with transcription activation, facilitating access of transcription factors to the promoter and enhancer regions of genes [13]. The functions of COMPASS complexes are vitally important for the normal development of an organism, and mutations in genes encoding their protein components are associated with carcinogenesis [14]. KMT2C and KMT2D proteins restrain cell proliferation and could be considered tumor suppressors [15]. In addition to lysine methylation associated with transcription activation, methyltransferases KMT2C and KMT2D play an important role in the maintenance of genomic stability and DNA repair [16]. Besides, these proteins, together with PTIP (PAX transactivation-domain interacting protein), a subunit of the KMT2C/KMT2D complexes, were found to increase the instability and induce the degradation of the MRE11-dependent replication fork in BRCA-deficient cells [17].

The *KMT2D* and *KMT2C* genes are among the most frequently mutated in cancers, which is also confirmed by our study. Mutations were detected in various types of solid tumors, such as melanoma, urothelial carcinoma, lung cancer, as well as in esophageal and stomach cancers [18].

In our study, *KMT2D* mutations had the highest frequency of 12% and were distributed throughout the gene (Figure 4). Mutations of the *KMT2D* gene are mainly localized in the central part of the gene coding sequence, which corresponds to the protein region between the PHD-finger domain and the SET domain. This is also in concordance with the data obtained by other authors [19].

**Figure 4.** Distribution of the *KMT2D* mutations detected in this study, along the gene.

According to the analysis by pathogenicity prediction programs, one of the novel somatic missense mutations that we identified in the *KMT2D* gene, p.R3727C, was determined as pathogenic by almost all prediction tools. This substitution results in disruption of the leucine zipper motif, which was necessary for the protein–protein interactions or dimerization [20]. Disruption of the leucine zipper motif seriously alters the function of proteins, which leads to a deregulation of protein interactions and blocking transcription. Directed alterations of the leucine zipper motif are currently created in synthetic proteins that are used as antitumor drugs [21].

The analysis of pathogenicity of unannotated mutations identified by us in *KMT2C* revealed three mutations (p.R973G, p.M959I, p.C1953Y) that were pathogenic according to three or more prediction tools. The first two of them were located in the PHD-finger domain of the gene, and p.C1953Y was located in the disorder domain. Disorder domains are characterized by high instability, and substitutions in this region can change the protein conformation. Recent studies have demonstrated that around 20% of mutations in cancers are located in these regions, causing abnormalities of protein conformations and functions [22]. Mutations in *KMT2C* in diffuse GC are associated with epithelial–mesenchymal transition (EMT) and acquisition of the mesenchymal phenotype by cells and are also markers of a poor prognosis [23]. Mutation distribution along the *KMT2C* gene is shown in Figure 5.

**Figure 5.** Distribution of the *KMT2C* mutations detected in this study, along the gene.

In our study, mutations in the *KMT2D* and *KMT2C* were significantly rarely combined in one sample (*p* = 0.038). There is a hypothesis that mutually exclusive genomic events are functionally related by common biological pathways, and mutually exclusive genes act on the same downstream effectors, thereby demonstrating functional redundancy. Therefore, the aberration of one of these genes is enough to completely disrupt their common pathways [24]. The *KMT2D* and *KMT2C* are components of similar COMPASS complexes that perform the same function. Deregulation of either *KMT2C* or *KMT2D* separately can serve as a driver mutation at the early stages of carcinogenesis, leading to changes in the epigenomic landscape. As was demonstrated for bladder cancer, tumor cells with low KMT2C activity experienced a deficiency of DNA repair mediated by homologous recombination and suffer from endogenous DNA damage and genomic instability, and their treatment with the PARP1/2 inhibitor olaparib leads to synthetic lethality [16]. The high frequency of *KMT2D* and *KMT2C* mutations in GC and its associations with repair processes allows considering them as targets for tumor treatment using PARP inhibitors, causing the lethality of tumor cells.

We compared our result on mutual exclusivity of*KMT2D* and*KMT2C* mutations with other GC mutation databases. Three datasets were acquired using cBioPortal (http://cbioportal.org accessed on 7 August 2021): Gastric Cancer (OncoSG, 2018), Stomach Adenocarcinoma (Pfizer and UHK), and TCGA PanCancer Stomach Adenocarcinoma (STAD). Visual analysis suggested that *KMT2D* and *KMT2C* mutations in these datasets were not mutually exclusive (Figure 6). For statistical analysis, we retained only sequenced samples with mutation data (without Copy-Number Alterations) in all three studies. For the groupwise mutual exclusive test, *p*-values were as follows: 0.088 for Onco SG, 0.016 for TCGA STAD, and 0.5 for the Pfizer study. Using the wFisher *p*-value combination method [25] with sample size for each experiment, we obtained the *p*-value of the mutual exclusive test under the nominal significance level of 0.05 (Figure 6a). Another interesting observation was that considering missense mutations only, mutations in *KMT2D* and *KMT2C* visually were almost mutually exclusive in these three datasets, as they were in our study (Figure 6b), although calculated differences did not approach a significance level of 0.05, which we attributed to the sample sizes. In this respect, we paid attention to the studies of bigger sample size, though of another cancer localization, namely, Breast Cancer METABRIC, Nature 2012, and Nat Commun 2016 (2509 samples) and Breast Cancer MSK, Cancer Cell 2018 (1918 samples), and in these datasets, we witnessed obvious mutual exclusivity of somatic mutations in *KMT2D*, *KMT2C,* and *ARID1A*. Although this may be a cancer typespecific observation, we altogether cannot rule out sample size effect and/or peculiarities of mutation detection/interpretation in different studies.

**Figure 6.** Analysis of mutual exclusivity of *KMT2D* and *KMT2C* mutations on the data presented in this article and in other gastric cancer mutation databases. Portions of samples without mutations in *KMT2D* or *KMT2C* are shown in grey; (**a**) analysis of all types of mutations, excluding amplification and deep deletions, portions of samples with mutations in *KMT2D* or *KMT2C* are colored black; (**b**) analysis of missense mutations only, portions of samples with missense mutations in *KMT2D* or *KMT2C* are colored blue.

> The *ARID1A* and *CHD7* genes that are related to chromatin remodeling were often mutated in our patient samples. The *ARID1A* is often mutated in esophageal and gastric cancers and is the canonical cancer gene according to the Cosmic Cancer Gene Census [26]. The proteins encoded by the *ARID1A*, *SMARCA1*, *SMARCA2*, and *SMARCA4* are subunits of the conservative multisubunit SWI/SNF complex, which uses the energy of ATP hydrolysis to mobilize nucleosomes and remodel chromatin. The expression of these genes is often deregulated in the esophagus and gastric cancers [27].

> ARID1A substitutions that we identified in gastric tumors, not annotated in human mutation databases, namely p.R2236C, p.Q1415H, and p.P1710L, are of interest since they can lead to deregulation of molecular mechanisms important for cancer progression. According to the results of in silico analysis, the p.R2236C substitution results in aberration of ADP-ribosylation, which is important for the DNA damage repair, as well as for the formation of an allosteric site at p.R2233 that can be used to bind therapeutic agents. Today, the search and targeting of allosteric sites are one of the strategies in the development of antitumor drugs [28].

> ARID1A:p.Q1415H and ARID1A:p.P1710L amino acid substitutions demonstrate an overall predicted loss of O- and C-linked glycosylation. Post-translational modifications, such as glycosylation, affect the transport, stability, and folding of the protein, changing its biochemical and biophysical properties. Numerous studies confirmed that changes in protein glycosylation have a great impact on carcinogenesis and contribute to the appearance of more aggressive cell phenotypes [29].

> Recent studies demonstrated that mutations in genes and abnormal expression of the ISO/SNF complex proteins that participate in chromatin remodeling were associated with a more aggressive course of the disease, as well as with EBV and MSI subtypes of GC [30]. In our study, somatic mutations in the chromatin remodeling genes were also found to be associated with worse overall survival in patients (without distant metastases, *p* = 0.045; and in the presence of signet ring cells, an indicator of the aggressive course in GC, *p* = 0.00011). H. Takeshima et al. investigated the role of chromatin remodelers in GC and suggested that deregulations of chromatin remodeling occur at an early stage of gastric carcinogenesis and are involved in the formation of the field cancerization [31].

> Investigation of GC using NGS previously revealed that 47% of gastric adenocarcinomas were characterized by mutations of chromatin remodeling genes, and somatic mutations of *ARID1A* had a high frequency, as it was in our study. It was shown that gastrointestinal tumors with *ARID1A* mutations demonstrated high immune activity [32]. Gastric carcinomas with somatic *ARID1A* mutations were characterized by a more intense PD-L1 expression than tumors without mutations. PD-L1 overexpression contributes to a

more active response to immunotherapy and a better prognosis of survival for patients with mutations in *ARID1A* compared to tumors with wild-type *ARID1A*. *ARID1A* mutations can serve as a biomarker for the identification of patients with gastrointestinal cancer who are sensitive to immunotherapy [33]. Clinically, the loss of *ARID1A* expression was correlated with larger tumor size, deeper invasion, lymph node metastasis, and a poor prognosis [34]. In line with these observations, in our study, mutations in *ARID1A* were associated with distant metastases (*p* = 0.03).

#### **5. Conclusions**

As a result of somatic mutation profiling of epigenetic regulation genes in GC, we have revealed associations of the presence of such mutations in tumors with a decrease in patient survival and the risk of developing distant metastasis, making the presence of mutations a marker of a poor prognosis. Studying mutations in epigenetic regulation genes can also contribute to the development of new approaches to drug therapy for GC treatment, adding to them PARP inhibitors for the treatment of tumors with mutations in genes of the *KMT2* family and immunotherapy for the treatment of tumors with *ARID1A* mutations. According to our results, this may be a significant group of patients, as the total frequency of mutations in the chromatin remodeling genes and histone modifiers in our sample were approximately 25% of all patients with mutations in epigenetic regulation genes.

**Supplementary Materials:** The following are available online at https://www.mdpi.com/article/10 .3390/cancers13184586/s1, Table S1: variant allele frequency, total read depth, reference, and variant allele read depths for each mutation, Table S2: in silico analysis of somatic mutations found in gastric cancer samples, Tables S3 and S4: designed coverage of the NGS panel, Table S5: mapped data depth and coverage for each sample.

**Author Contributions:** Conceptualization, M.V.N.; data curation, M.V.N. and I.V.B.; funding acquisition, M.V.N.; investigation, M.V.N., A.I.K., I.I.B. and T.V.K.; methodology, E.B.K. and E.A.A.; software, A.I.K. and A.S.T.; supervision, M.V.N.; validation, E.B.K. and D.S.M.; visualization, A.I.K. and V.V.S.; writing—review and editing, I.V.B. and V.V.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the Russian Foundation for Basic Research (project 18-29- 09020) and the Ministry of Science and Higher Education of the Russian Federation.

**Institutional Review Board Statement:** The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of Sechenov First Moscow State Medical University (Sechenov University), 13 July 2016 (protocol 04-19).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available in the article and supplementary files.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**

