*Systematic Review* **Health-Related Quality of Life of Patients Treated with Biological Agents and New Small-Molecule Drugs for Moderate to Severe Crohn's Disease: A Systematic Review**

**Hasan Aladraj <sup>1,</sup>\*, Mohamed Abdulla <sup>1</sup>, Salman Yousuf Guraya <sup>2</sup> and Shaista Salman Guraya <sup>2</sup>**


**Abstract:** Crohn's disease (CD) leads to a poor health-related quality of life (HRQoL). This review aimed to investigate the effect of biological agents and small-molecule drugs in improving the HRQoL of patients with moderate to severe CD. We adopted a systematic protocol to search PubMed and the Cochrane Central Register of Controlled Trials (CENTRAL), which was supplemented with manual searches. Eligible studies were RCTs that matched the research objective based on population, intervention, comparison and outcomes. Studies in paediatric populations, reviews and conference abstracts were excluded. Covidence was used for screening and data extraction. We assessed the risk of bias using RoB2 and reported the findings narratively. We included 16 multicentre, multinational RCTs in this review. Of the 15 studies that compared the effect of an intervention to a placebo, 9 were induction studies and 6 investigated maintenance therapy. Of these, 13 studies showed a significant (*p* < 0.05) improvement in the HRQoL of patients with CD. One non-inferiority study compared the intervention with another active drug and favoured the intervention. This systematic review found a substantial improvement in the HRQoL of patients with CD treated with biological agents and small-molecule drugs. However, further large clinical trials with long-term follow-up are essential to validate these findings.

**Keywords:** Crohn's disease; biologics; small-molecule drugs; health-related quality of life (HRQoL)

#### **1. Introduction**

Crohn's disease (CD) is a debilitating inflammatory disease that affects any part of the gastrointestinal tract, resulting in intestinal and systemic manifestations. It is a chronic disease characterised by alternating periods of disease relapse and remission. The chronic nature, early age of onset and incapacitating intestinal and systemic manifestations account for major social and financial stressors. Some distressing factors in patients with CD include frequent hospital visits, long-term medications with their side effects, bowel stenosis, possible surgical interventions and the fear of developing cancer [1,2]. A major burden on healthcare systems is related to the management of the CD-specific chronic internal and perianal fistulas, which need special attention in highly specialised colorectal surgery centres. The most dreadful complication of CD remains colorectal cancer, with a reported incidence of 746,000 cases (10.0% of the total cancer burden in men) and 614,000 cases (9.2% of the cancer incidence in women) [3,4].

HRQoL is a multidimensional concept which pertains to vitality, social energy and physical wellbeing [5–8]. To determine the effect of disease activity on HRQoL, several disease-specific HRQoL questionnaires have been used, such as the McMaster inflammatory bowel disease questionnaire (IBDQ) [9], the short IBDQ [10], the rating form of inflammatory bowel disease patient concerns (RFIPC) [11] and the sickness impact profile (SIP) [12]. All such tools measure specific elements of the HRQoL of patients, with a focus on certain characteristics of vitality and mental and social wellbeing.

**Citation:** Aladraj, H.; Abdulla, M.; Guraya, S.Y.; Guraya, S.S. Health-Related Quality of Life of Patients Treated with Biological Agents and New Small-Molecule Drugs for Moderate to Severe Crohn's Disease: A Systematic Review. *J. Clin. Med.* **2022**, *11*, 3743. https://doi.org/10.3390/jcm11133743

Academic Editors: Hidekazu Suzuki and Christian Selinger

Received: 14 May 2022 Accepted: 14 June 2022 Published: 28 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Traditionally, the management of CD aimed to improve HRQoL through a progressive, stepwise therapeutic intensification with a re-review of the clinical response according to symptoms. This approach did not improve the long-term clinical outcomes in patients with CD [13], leading to the introduction of a "treat to target" CD management strategy, which guides physicians in the regular assessment of disease activity using objective clinical and biological outcome measures and subsequent treatment modifications [14]. STRIDE-II, an update on the Selecting Therapeutic Targets in Inflammatory Bowel Disease (STRIDE) initiative, confirmed that the restoration of QoL is the most important long-term treatment target in CD, irrespective of other objective markers of inflammation [15].

In the last two decades, biological agents (generally large, complex molecules manufactured by biotechnology [16]) have emerged as novel therapeutic agents for CD. Since the approval of infliximab in 1999, five other agents have been approved. These drugs work by inhibiting TNF-alpha, integrin-alpha4 or IL23/12p40 [17,18]. Due to a rising number of non-responders to treatment and a deeper understanding of the pathophysiological mechanisms of CD, new drugs are being developed to target IL23p19 and the JAK/STAT mechanisms or to regulate gut leukocyte trafficking [18]. As a primary outcome measure of therapy and a key factor of consideration for decision-makers, HRQoL has become a frequently measured outcome in clinical trials.

In 2009, a systematic review reported that the then-approved biologics (infliximab, adalimumab, certolizumab and natalizumab) demonstrated clinical improvement in the HRQoL of patients with inflammatory bowel disease (IBD) [19]. Since then, despite a staggering upsurge in CD management strategies and the availability of novel biological agents, there has been a scarcity of literature that could validate their efficacy using the best clinical evidence. Therefore, this systematic review aimed to evaluate the outcomes of the currently approved and promising in-development biological agents and small-molecule agents in improving the HRQoL of patients with moderate to severe CD.

#### **2. Methods**

#### *2.1. Objective*

Our review targeted studies of patients with moderate to severe CD, measured using a Crohn's Disease Activity Index (CDAI) score of 221 to 450 points or equivalent, being treated with biological agents and small-molecule agents such as TNF-alpha, integrin-alpha4 or IL23/12p40 inhibitors or those regulating the JAK/STAT mechanism or gut leukocyte trafficking. We included studies that compared interventions with placebos or any other drug. The co-primary outcomes of this review were the number of patients achieving clinically meaningful improvements in HRQoL using the inflammatory bowel disease questionnaire (IBDQ) or the SF-36 questionnaires and the mean change in IBDQ total score or the physical component summary (PCS) and mental component summary (MCS) of the SF-36. Only studies that reported the targeted outcomes were included.

#### *2.2. The HRQoL Scales*

The IBDQ is the most frequently used disease-specific HRQoL tool [20]. The IBDQ is a 32-item questionnaire with 4 domains: bowel symptoms, systemic symptoms, emotional functioning and social functioning. The IBDQ total score is the sum of responses to all the items, which use a 7-point Likert scale grading system with 1 reflecting a severe problem and 7, no problem at all. The total score ranges between 32 (very poor HRQoL) and 224 (perfect HRQoL) [19,21].
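The scoring rule described above can be sketched in a few lines. This is a minimal illustration of the arithmetic only, assuming every item is answered; the function name `ibdq_total` and the constants are hypothetical and are not taken from any official IBDQ scoring software.

```python
from typing import Sequence

IBDQ_ITEMS = 32                 # number of questionnaire items
LIKERT_MIN, LIKERT_MAX = 1, 7   # 1 = severe problem, 7 = no problem at all


def ibdq_total(responses: Sequence[int]) -> int:
    """Sum the 32 Likert responses into the IBDQ total score (32 to 224)."""
    if len(responses) != IBDQ_ITEMS:
        raise ValueError(f"expected {IBDQ_ITEMS} responses, got {len(responses)}")
    for r in responses:
        if not LIKERT_MIN <= r <= LIKERT_MAX:
            raise ValueError(f"response {r} is outside {LIKERT_MIN}-{LIKERT_MAX}")
    return sum(responses)


worst = ibdq_total([1] * IBDQ_ITEMS)  # every item answered "severe problem"
best = ibdq_total([7] * IBDQ_ITEMS)   # every item answered "no problem at all"
```

The two extreme response patterns reproduce the bounds quoted in the text (32 and 224). Handling of unanswered items is deliberately left out, since the source does not describe it.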

The SF-36 is a generic HRQoL tool commonly used in IBD clinical trials [20]. The SF-36 has two summary components, the PCS and the MCS, derived from scores in eight individual scales (physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional and mental health). Each of the eight scales is scored from 0 to 100, with a higher score indicating better HRQoL [19].

#### *2.3. Inclusion and Exclusion Criteria*

All double- or triple-blinded randomised controlled trials (RCTs) published in English that met the objective of our review were included. We excluded studies regarding adolescents and children (under 18 years of age). Conference proceedings, systematic reviews and non-English studies were also excluded.

#### *2.4. Search Strategy*

On 25 January 2022, a literature search, designed in conjunction with a senior librarian, was carried out on the databases of PubMed and Cochrane Central Register of Controlled Trials (CENTRAL). No limits were placed on the time span. Our search did not include grey literature. The capture–recapture method was used to verify the completeness of the search strategy results [22]. Keywords of Crohn's disease, HRQoL, IBDQ, SF-36, anti-TNF and infliximab were used. To narrow our results towards RCTs, we used a search strategy suggested by the Cochrane handbook that is highly sensitive for identifying results of RCTs [23]. A manual search of the reference lists and www.clinicaltrials.gov (29 January 2022) was also conducted independently. Details of the search strategy are shown in Appendix A.
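The text does not state which capture–recapture estimator was applied. One common two-source form is the Lincoln–Petersen estimator, sketched below as an assumption rather than as the authors' procedure. The function names are hypothetical, and the counts in the usage lines are purely illustrative, not values from this review.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Estimate the total number of relevant records from two overlapping searches.

    n_first  -- relevant records found by the first source
    n_second -- relevant records found by the second source
    n_both   -- relevant records found by both sources
    """
    if n_both <= 0:
        raise ValueError("the estimator is undefined without any overlap")
    if n_both > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either source's count")
    return n_first * n_second / n_both


def search_completeness(n_first: int, n_second: int, n_both: int) -> float:
    """Share of the estimated relevant records that the two searches found."""
    found = n_first + n_second - n_both  # unique records across both sources
    return found / lincoln_petersen(n_first, n_second, n_both)


# Illustrative counts only (not taken from the review).
estimated_total = lincoln_petersen(20, 15, 10)
completeness = search_completeness(20, 15, 10)
```

A completeness close to 1 would suggest that the two sources together captured most of the relevant literature; a low value would suggest that further sources are worth searching.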

#### *2.5. Data Extraction*

The screening of titles, abstracts and full-text articles was conducted by two independent reviewers (H.A. and M.A.) in the Covidence software, with the defined inclusion and exclusion criteria as benchmarks. Any discrepancies were discussed and resolved by the two reviewers. The same software was used for the extraction of data. A customised template containing fields such as general information (title, study ID and registration number), the characteristics of the included studies (aim, date conducted and funding) and the results was used.

#### *2.6. Risk of Bias Assessment*

To ascertain the risk of bias, the Cochrane Collaboration's risk of bias tool 2 (RoB2, 22 August 2019 version) was used independently by two reviewers (H.A. and M.A.) [24]. Any discrepancies were discussed, and then, a third researcher was consulted to secure a consensus. RoB2 is an outcome-based tool examining five domains which may lead to bias (bias arising from the randomisation process, deviations from intended interventions, missing outcome data, measurement of the outcome and the selection of the reported result). Outcomes that were rated high in one domain, or that raised some concerns in multiple domains in a way that substantially lowered the confidence in the results, were rated high overall. The risk of bias in relevant outcomes was reported, and studies with a high risk of bias were not excluded based on those results.
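The overall-judgement rule described above can be written as a small function. This is a sketch, not the RoB2 tool itself: the "high" branch follows the rule stated in the text, the "low" and "some concerns" branches follow the general RoB2 guidance (all domains low, or at least one domain with some concerns and none high), and the flag `concerns_lower_confidence` is a hypothetical stand-in for the reviewers' qualitative judgement.

```python
from typing import Mapping

LOW, SOME, HIGH = "low", "some concerns", "high"
DOMAINS = ("D1", "D2", "D3", "D4", "D5")  # the five RoB2 domains


def overall_risk(domains: Mapping[str, str],
                 concerns_lower_confidence: bool = False) -> str:
    """Combine five domain-level judgements into one overall judgement."""
    ratings = []
    for name in DOMAINS:
        if name not in domains:
            raise ValueError(f"missing judgement for {name}")
        if domains[name] not in (LOW, SOME, HIGH):
            raise ValueError(f"unknown judgement {domains[name]!r} for {name}")
        ratings.append(domains[name])

    if HIGH in ratings:
        return HIGH                      # high in any single domain
    n_some = ratings.count(SOME)
    if n_some > 1 and concerns_lower_confidence:
        return HIGH                      # several concerns judged to erode confidence
    return SOME if n_some else LOW


example = overall_risk({"D1": SOME, "D2": LOW, "D3": HIGH, "D4": LOW, "D5": LOW})
```

Keeping the reviewer judgement as an explicit argument makes clear that the escalation from several "some concerns" ratings to "high" is not automatic.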

#### *2.7. Strategy for Data Synthesis*

The extracted data were categorised according to the interventions used. The results were reported narratively using descriptive statistics, with the addition of tables and graphs where appropriate. If a study showed a statistically significant improvement (*p* < 0.05) in at least one dose group at the end of the study period, the intervention was considered to be effective in improving HRQoL.
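The effectiveness criterion above amounts to a simple decision rule, sketched here under the assumption that one end-of-study *p* value is available per dose group. The function name, the dose-group labels and the *p* values in the usage lines are hypothetical and serve only to illustrate the rule.

```python
from typing import Mapping

ALPHA = 0.05  # significance threshold used in this review


def intervention_effective(end_of_study_p: Mapping[str, float],
                           alpha: float = ALPHA) -> bool:
    """Return True if at least one dose group has p < alpha at the end of the study."""
    return any(p < alpha for p in end_of_study_p.values())


# Illustrative values only: one of two dose groups reaches significance.
mixed = intervention_effective({"low dose": 0.20, "high dose": 0.03})
# Illustrative values only: neither dose group reaches significance.
none_significant = intervention_effective({"low dose": 0.31, "high dose": 0.40})
```

Because the comparison is strict, a *p* value of exactly 0.05 does not count as significant, and a study with no reported *p* values is classed as not shown to be effective.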

This review was reported according to The Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines [25]. This review is registered with The International Prospective Register of Systematic Reviews (PROSPERO), an open-access online database of systematic review protocols, with the registration number CRD42022306394 [26].

#### **3. Results**

Our first search retrieved 306 and 303 records from PubMed and CENTRAL, respectively. A further 38 records were retrieved by hand searching www.clinicaltrials.gov and reference lists. After the removal of 95 duplicates, 552 titles/abstracts were screened, of which 433 were irrelevant. Furthermore, 44 reports did not have retrievable full-text articles, 9 were ongoing studies and 44 were excluded for other reasons, as depicted in the flowchart. Finally, 22 reports of 16 studies were included (Figure 1). Four further studies met the inclusion criteria but were excluded because they reported no data on HRQoL outcomes [27–30].
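The record counts in this paragraph can be checked with a short calculation. The sketch below uses only the numbers stated in the text and assumes that the exclusions were applied sequentially in the order given; the function and dictionary keys are hypothetical labels.

```python
def prisma_flow() -> dict:
    """Recompute the study-selection counts reported in the text."""
    identified = {"PubMed": 306, "CENTRAL": 303, "manual search": 38}
    duplicates = 95
    irrelevant = 433  # excluded at title/abstract screening
    excluded_later = {
        "full text not retrievable": 44,
        "ongoing study": 9,
        "other reasons": 44,
    }

    total_identified = sum(identified.values())
    screened = total_identified - duplicates
    remaining = screened - irrelevant
    included_reports = remaining - sum(excluded_later.values())
    return {
        "identified": total_identified,
        "screened": screened,
        "remaining after screening": remaining,
        "included reports": included_reports,
    }


flow = prisma_flow()
for step, count in flow.items():
    print(f"{step}: {count}")
```

The computed totals agree with the 552 screened records and 22 included reports stated in the text.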

#### *3.1. Characteristics of the Included Studies*

Our systematic review identified 16 studies. Fifteen compared their investigated interventions to a placebo. Only SONIC compared its intervention (infliximab) to another active drug (azathioprine) [31]. All included studies were multinational, multicentre RCTs. The total number of participants in this review was 7463, with individual study sizes ranging between 108 and 1281. Three studies investigated infliximab [31–33], three studies investigated certolizumab pegol [34–36], three studies investigated ustekinumab [37], two studies investigated natalizumab [38,39] and one study each investigated adalimumab [40], filgotinib [41], upadacitinib [42], tofacitinib [43] and apilimod mesylate [44]. Nine were induction studies, and six were maintenance studies. SONIC was an induction study with a maintenance extension [31].

A clinically meaningful improvement (MCID) in HRQoL is defined as an increase of ≥16 points in the IBDQ total score and an increase of 3 to 5 points in the SF-36 PCS and MCS scores [45]. Based on these values, nine studies defined MCID in the IBDQ as ≥16 points. Three studies defined MCID in the SF-36 PCS and MCS as ≥5 points. Only the PRECiSE 2 trial defined MCID in the PCS and MCS as 4.1 and 3.9 points, respectively [35].
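The thresholds above translate directly into a responder calculation. The sketch below assumes that each patient's change from baseline is available; the function names are hypothetical and the patient-level changes in the usage line are invented for illustration, not data from any included trial.

```python
from typing import Sequence

# Thresholds quoted in the text (points of improvement from baseline).
MCID_IBDQ = 16.0
MCID_SF36_DEFAULT = 5.0     # used by three studies for PCS and MCS
MCID_PCS_PRECISE2 = 4.1     # PRECiSE 2 only
MCID_MCS_PRECISE2 = 3.9     # PRECiSE 2 only


def achieved_mcid(change: float, threshold: float) -> bool:
    """True if the improvement from baseline meets or exceeds the threshold."""
    return change >= threshold


def responder_proportion(changes: Sequence[float], threshold: float) -> float:
    """Proportion of patients whose change from baseline reaches the MCID."""
    if not changes:
        raise ValueError("no patients supplied")
    responders = sum(achieved_mcid(c, threshold) for c in changes)
    return responders / len(changes)


# Hypothetical IBDQ changes from baseline for four patients.
ibdq_rate = responder_proportion([20, 10, 16, -3], MCID_IBDQ)
```

Because the thresholds differ between trials, the same patient-level change can count as a response in one study and not in another, which is worth keeping in mind when comparing MCID proportions across studies.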

#### *3.2. Risk of Bias*

A total of 157 outcomes were assessed. Of these, 104 (66%), 1 (0.6%) and 52 (33%) were rated as having high risks, some concerns or low risks of bias overall, respectively. As many as 9.5% of outcomes [33,44] were rated as having some concerns in domain 1: randomisation process. Meanwhile, 22.3% of outcomes [38,39,43] were rated as having a high risk in domain 2: deviations from intended interventions. A total of 88 (56%) [32–35,37,39,40,44] outcomes were rated as having a high risk in missing outcome data (domain 3), while 14% [36,37] of outcomes were rated as having some concerns. In domain 5: selection of the reported results, 38 (24.2%) [32–34,40] and 41 outcomes (26.1%) [33,35,39] were rated as having a high risk and some concerns, respectively. All outcomes were rated as having low risks in domain 4 (Table 1 and Figure 2).
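The percentages in this paragraph follow from the outcome counts, as the short check below illustrates. It uses only the counts stated in the text; the helper name `pct` and the dictionary labels are hypothetical.

```python
TOTAL_OUTCOMES = 157

# Overall judgements reported in the text.
overall = {"high": 104, "some concerns": 1, "low": 52}
assert sum(overall.values()) == TOTAL_OUTCOMES  # the three groups cover all outcomes


def pct(count: int, total: int = TOTAL_OUTCOMES) -> float:
    """Percentage of all assessed outcomes, rounded to one decimal place."""
    return round(100 * count / total, 1)


overall_pct = {label: pct(n) for label, n in overall.items()}
domain3_high = pct(88)      # missing outcome data, high risk
domain5_high = pct(38)      # selection of the reported result, high risk
domain5_some = pct(41)      # selection of the reported result, some concerns

print(overall_pct, domain3_high, domain5_high, domain5_some)
```

To one decimal place the results match the text's 24.2% and 26.1%, and they round to the whole-number figures of 66%, 33% and 56% given for the other counts.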

**Table 1.** The estimated risk of bias in the studies recruited for this systematic review (*n* = 16).


D1: randomisation process, D2: deviations from the intended interventions, D3: missing outcome data, D4: measurement of the outcome, D5: selection of the reported result. low risk, some concerns, high risk.

**Figure 2.** Risk of bias in the included study outcomes.

All studies used random allocation sequences except ACCENT I [33] and Sands et al. [44]. Their outcomes raised some concerns in domain 1, as the allocation sequence was concealed, and the baseline characteristics were consistent with randomisation. All studies used the intention to treat (ITT) or modified intention to treat (mITT) populations for analysis, except ENCORE [38], ENACT-2 [39] and the tofacitinib study [43], in which there was no information available about the analysed population. Hence, their outcomes were rated as high risk in domain 2. The outcomes of PRECiSE 1 [34] and PRECiSE 2 [35] (more withdrawals in the placebo groups) and Targan et al. [32], ACCENT I [33], CHARM [40], IM UNITI [37], ENACT-2 [39] and Sands et al. [44] (lack of information about the number of or reason for withdrawals) were rated as high risk in domain 3. The major reason for withdrawal in Rutgeerts et al. [36], UNITI I and II [37] and the filgotinib study [41] was a lack of efficacy. However, their outcomes showed some concerns, as the number of withdrawn participants was balanced between the study groups. In UNITI I and II [37], unlike the IBDQ outcomes (which had low risk), PCS and MCS outcomes were rated as having some concerns due to missing outcome data. The protocols of PRECiSE 2 [35] and ENACT-2 [39] (in which there were some concerns that results were selected), Targan et al. [32], CHARM [40], PRECiSE 1 [34], Rutgeerts et al. [36] and the tofacitinib study [43] (in which it was likely that results were selected; high risk of bias) were not found. The IBDQ results were likely selected (high risk) in ACCENT I [33], but selection was not suspected in PCS and MCS results (some concerns). The full details of the RoB2 assessment can be found in Appendix B.

#### *3.3. Effect of Interventions on HRQoL*

The effect of interventions on HRQoL is summarised in Table 2 for the SONIC [31] study and Table 3 for the placebo-controlled trials. The tables include the study ID/registration number, intervention, dosage, results and conclusion.

**Table 2.** Summary of findings for the SONIC study [31]. All values are means (SD).


\* SONIC extension for maintenance therapy.


**Table 3.** Summary of findings for the placebo-controlled trials (*n* = 15).

MCID: clinically meaningful improvement, defined as an increase of ≥16 points in the IBDQ score and an increase of 3–5 points in SF-36 PCS and MCS. All mean change values are in mean (SD). All MCID values are percentages (%). MCID \*: proportion of patients achieving an improvement of ≥16 in the IBDQ score. MCID \*\*: proportion of patients achieving an improvement of ≥5 points in the SF-36 PCS or MCS score. MCID a: proportion of patients achieving an improvement of ≥4.1 points in the SF-36 PCS score. MCID b: proportion of patients achieving an improvement of ≥3.9 points in the SF-36 MCS score.

#### 3.3.1. Infliximab vs. Azathioprine

SONIC [31] compared infliximab with azathioprine. In the induction period, the mean change in the IBDQ total score was significantly higher in the infliximab group than in the azathioprine group at weeks 2, 18 and 26 (*p* < 0.05), but not at weeks 6 and 10 (*p* = 0.10). In the maintenance phase, the difference was statistically significant at weeks 34 and 42 but not at week 50 (*p* = 0.001, *p* = 0.04 and *p* = 0.09, respectively).

#### 3.3.2. Infliximab vs. Placebo

Two studies, Targan et al. [32] and ACCENT I [33], compared infliximab with placebo. Targan et al. [32] compared three groups using 5, 10 or 15 mg/kg infliximab induction with a placebo. Patients had a statistically higher mean IBDQ score in all infliximab groups at week 4 (*p* < 0.05, compared to placebo).

ACCENT I [33] examined the effect of two infliximab maintenance regimens, 5 mg/kg or 10 mg/kg infliximab, following a 5 mg/kg three-dose induction and compared them with a single dose of 5 mg/kg induction followed by a placebo. At week 10, the three-dose group had a higher mean IBDQ score compared to the single-dose induction group (*p* < 0.05). Higher IBDQ scores were maintained for both maintenance groups (5 mg/kg and 10 mg/kg infliximab) at week 30 (*p* < 0.05 and *p* < 0.01, respectively) and week 50 (*p* < 0.05 and *p* < 0.001, respectively) compared to the single-dose induction group. Up to week 14, all treatment groups had an increase exceeding the MCID. Following week 14, the infliximab maintenance groups maintained this increase, while it decreased to below 16 points in the induction-only group. The PCS scores were significantly greater (*p* < 0.05) for both maintenance groups at weeks 10, 30 and 52 compared to the single-dose induction group. The difference in MCS scores was only significant at week 54, comparing the 10 mg/kg maintenance group with the single-dose group (*p* < 0.05).

#### 3.3.3. Adalimumab vs. Placebo

The CHARM trial compared adalimumab maintenance, 40 mg every other week or weekly, with adalimumab induction only (placebo maintenance) [40].

Following a significant increase of 44.3 points (*p* < 0.0001, week 4 vs. baseline) in the mean IBDQ in the open-label induction phase, IBDQ scores continued to increase in the adalimumab maintenance groups (by approximately 5 points), while IBDQ scores deteriorated in the induction-only group. There were statistically significant differences in the mean IBDQ total scores at all visits after week 4 between the adalimumab maintenance groups and the induction-only group (*p* < 0.001 for adalimumab every other week and *p* < 0.05 for adalimumab weekly). After a year of maintenance (at week 56), patients in the adalimumab group had an IBDQ score 18 points higher than those in the placebo group, a difference that exceeded the MCID of 16 points.

The differences in PCS scores were statistically significant at all visits following week 4 in the adalimumab-every-other-week maintenance group compared to the induction-only group (*p* < 0.05), while differences in the MCS were only significant at week 56 (*p* < 0.05). In total, 77% of adalimumab-every-other-week patients achieved an MCID of ≥5 points in the PCS compared to 61% in the induction-only group (*p* < 0.01). In the MCS, improvement was achieved by 67% and 54% of adalimumab-every-other-week and placebo patients, respectively (*p* < 0.05). Differences in the mean PCS and MCS between the adalimumab-weekly group and the placebo group were not statistically significant.

#### 3.3.4. Certolizumab Pegol vs. Placebo

Three studies compared certolizumab pegol and a placebo. One study [36,46] had four arms comparing certolizumab (100 mg), certolizumab (200 mg) or certolizumab (400 mg) with a placebo. The PRECiSE 1 study had two groups comparing 400 mg of certolizumab with a placebo (administered at weeks 0, 2 and 4 and then every 4 weeks) [34]. In PRECiSE 2 [35,47], following an open-label induction of 400 mg of certolizumab at weeks 0, 2 and 4, patients received either maintenance certolizumab (400 mg) or a placebo.

Rutgeerts et al. and Schreiber et al. [36,46] reported statistically significant changes in the mean IBDQ at all reported timepoints for the 400 mg group compared to the placebo group, with the greatest change at week 10 (certolizumab pegol (400 mg): 32.2 points vs. 18.6 points for placebo; *p* ≤ 0.05). The 200 mg group had significant changes at weeks 2 and 4 compared to the placebo group (*p* ≤ 0.05), while changes in the 100 mg group were not statistically significant. Differences in the mean IBDQ between the certolizumab pegol and placebo arms were statistically significant at week 26 in both PRECiSE 1 and PRECiSE 2 (*p* = 0.03 and *p* < 0.001, respectively). PRECiSE 2 also reported significant differences in the IBDQ means at week 16 (*p* = 0.008). The percentages of patients achieving an MCID in the IBDQ at week 26 were significantly greater in the certolizumab groups in both PRECiSE 1 and 2 (*p* = 0.01 and *p* < 0.001, respectively) compared to the placebo groups.

Only PRECiSE 2 used the SF-36 tool for the estimation of HRQoL. Patients in the certolizumab group showed statistically significant (*p* < 0.05) differences at week 26 in the mean change and proportion achieving an MCID compared to the placebo group.

#### 3.3.5. Ustekinumab vs. Placebo

The UNITI trials compared ustekinumab and a placebo [37,48]. UNITI I and UNITI II induction studies compared a single intravenous infusion of 130 mg of ustekinumab or 6 mg/kg ustekinumab to a placebo. Patients had an inadequate response or intolerance to tumour necrosis factor (TNF) antagonists (UNITI I) or conventional therapy (UNITI II). Patients with a clinical response were re-randomised to maintenance therapy with subcutaneous ustekinumab (90 mg) every 12 weeks (q12w) or every 8 weeks (q8w) for 44 weeks and compared to the placebo in IM UNITI.

In both induction studies, the mean change and proportion of patients achieving an MCID in the IBDQ total score in both ustekinumab groups were statistically significant at week 8 compared to the placebo groups (*p* < 0.05). In the maintenance study at week 20, the mean decrease from the maintenance baseline was significantly less in the q12w group but not in the q8w group compared to the placebo group (*p* = 0.035 and *p* = 0.183, respectively). The mean decrease at week 44 was significantly less in both ustekinumab maintenance groups (*p* < 0.001 and *p* = 0.003, q12w and q8w compared to the placebo group, respectively). A significantly greater proportion of patients achieved MCIDs in the IBDQ in the ustekinumab (q8w) but not the ustekinumab (q12w) group (*p* = 0.014 and *p* = 0.140, respectively, compared to the placebo group).

In UNITI II, the mean change from baseline in the PCS and MCS scores was significant for both ustekinumab doses at week 8 compared to the placebo dose (*p* < 0.05). In UNITI I, the only significant change at week 8 in the mean score was in the MCS of the ustekinumab 6 mg/kg group compared to the placebo group (*p* = 0.006). The same pattern was seen in MCID proportions, significant (*p* < 0.05) in UNITI II for the MCS and PCS in both doses but only significant in the MCS for the 6 mg/kg group in UNITI I.

In the maintenance study at week 44, the mean decrease in the PCS and MCS from the maintenance baseline was significantly less in the ustekinumab (q8w) group compared to the placebo group (*p* < 0.01), while it was only significantly less in the MCS in the ustekinumab (q12w) group compared to the placebo group (*p* < 0.05). Changes in the means of MCS and PCS were not significant at week 20 for both groups. Both groups had significantly (*p* < 0.05) higher proportions of patients with MCID improvements at week 44 in the PCS and MCS, except for the PCS in the q12w group.

#### 3.3.6. Natalizumab vs. Placebo

Two studies compared natalizumab and a placebo. The ENCORE trial compared natalizumab as induction therapy to a placebo [38,49]. The ENACT-2 trial compared maintenance natalizumab with a placebo in patients who responded to natalizumab induction in ENACT-1 [39].

Induction treatment with natalizumab in the ENCORE trial showed a statistically significant (*p* < 0.001) increase in the mean IBDQ total score and the mean PCS score (compared to the placebo) but not in the MCS (*p* = 0.052).

Maintenance natalizumab in ENACT-2 maintained increases in the mean IBDQ total score, PCS and MCS scores achieved from the induction therapy in ENACT-1. The decrease from the change achieved in week 12 (randomisation of ENACT-2) was significantly less in the natalizumab group compared to the placebo group (*p* < 0.01) for all subsequent weeks in the IBDQ total score and PCS. MCS scores were not significant at weeks 24 and 36 but reached significance at weeks 48 and 60 (*p* < 0.01 and *p* < 0.001, respectively, compared to the placebo). The proportion of patients with MCIDs was significantly greater at weeks 36, 48 and 60 in the IBDQ and MCS and at all weeks in the PCS in the natalizumab group (*p* < 0.05, compared to the placebo).

#### 3.3.7. Filgotinib vs. Placebo

The FITZROY study compared oral filgotinib to a placebo [41]. There was a 16-point difference favouring the filgotinib group compared to the placebo in the mean change from baseline of the IBDQ total score. This difference was statistically significant (*p* = 0.0046) and clinically meaningful.

#### 3.3.8. Upadacitinib vs. Placebo

The CELEST study compared five doses of oral upadacitinib (3 mg, 6 mg, 12 mg or 24 mg twice daily or 24 mg once daily) with a placebo as induction therapy [42,50]. Changes in the mean IBDQ total score at weeks 8 and 16 were only statistically significant in the 6 mg and 24 mg twice-daily groups (*p* ≤ 0.05, compared to the placebo). A significantly greater proportion of patients achieved clinically meaningful improvement in the IBDQ in all upadacitinib groups at week 16 and only the 6 mg twice-daily group at week 8 (*p* ≤ 0.05, compared to placebo).

#### 3.3.9. Tofacitinib vs. Placebo

One study compared three doses of tofacitinib (5 mg, 10 mg or 15 mg twice daily) with a placebo [43]. The 15 mg arm was closed early after the enrolment of only 16 participants. Therefore, this arm was not included in the efficacy analysis. Statistical significance was not calculated for HRQoL outcomes, and thus we could not determine the implications of the clinical evidence.

#### 3.3.10. Apilimod Mesylate vs. Placebo

Sands et al. compared two doses of apilimod mesylate (50 mg and 100 mg daily) as induction therapy with a placebo [44]. No statistically significant differences were found between either of the apilimod groups and the placebo at either time point (*p* > 0.3).

#### **4. Discussion**

The overarching goal of treatment for moderate to severe CD using biological agents and small-molecule drugs is the achievement of clinical remission and the arrest or stabilisation of chronic intestinal inflammation. Our systematic review reported evidence-based clinical data from 16 RCTs and endorsed a superior role of biological agents and small-molecule drugs in improving the HRQoL outcomes in patients with CD. Out of the 16 studies identified in this systematic review, 15 studies compared their investigated interventions to a placebo. Only SONIC compared its intervention (infliximab) to another active drug (azathioprine) and favoured the intervention in the induction phase [31]. Excluding the SONIC study (as it had a different comparator) and one study [43] that did not report statistical significance, 8 of the remaining 14 studies used the intervention as induction therapy, while the other six RCTs studied maintenance therapy. All studies reported a significant difference in the mean change in the total IBDQ score, favouring the intervention group by the end of the study in at least one dose group, except for Sands et al. [44], who did not report a statistically significant difference. Three out of eight induction studies and five out of six maintenance studies reported mean changes in the PCS and MCS. Of the induction studies, two out of three studies showed a significant difference in the mean change in the PCS and MCS favouring the intervention group, while all maintenance studies had a significantly greater change in the mean PCS and MCS in their intervention groups.

In our systematic review, two induction and four maintenance studies reported the proportion of patients achieving MCIDs in the IBDQ, and all reported significantly higher proportions in the intervention groups. The only induction studies that reported MCIDs in the PCS and MCS were UNITI I and UNITI II. UNITI I had a significantly higher proportion of patients with MCIDs in the MCS, while UNITI II had a higher proportion in both the MCS and PCS [37]. Furthermore, two out of six maintenance studies reported MCIDs in the PCS and MCS. Both studies had significant findings in favour of the intervention group.

In the systematic review conducted by Vogelaar et al., the researchers found that biologics (infliximab, adalimumab, certolizumab and natalizumab) improved HRQoL [19]. The findings of our systematic review corroborate those of Vogelaar et al. and additionally show significant improvements in HRQoL with another biologic (ustekinumab) and with small-molecule drugs (upadacitinib and filgotinib). A Cochrane review of biologics in ulcerative colitis (another form of inflammatory bowel disease) found that infliximab and adalimumab significantly improved HRQoL [45]. That review noted that studies on CD have also shown significant improvement in HRQoL. Another systematic review has reported that adalimumab improved fatigue, an aspect of HRQoL [51].

Our systematic review could not measure all of the relevant HRQoL outcomes, including the proportion of patients achieving MCIDs. Due to the missing data and inconsistency in the results from the analysed studies, appropriate statistical analyses could not be used. Some interventions showed inconsistencies in the improvement between physical and mental aspects of HRQoL. Lastly, most studies did not report both co-primary outcomes in both the IBDQ and SF-36 PCS and MCS summary scales. Despite these shortcomings, this systematic review provides data from RCTs that support the efficacy of biological agents and small-molecule drugs in improving the HRQoL outcomes in patients with moderate to severe CD.

#### **5. Limitations**

There are some limitations to this review. The search strategy was conducted on two databases and only on English-language articles. Several studies that may have affected the results of this review were excluded because they did not report the targeted outcomes, or they were ongoing studies, including a trial for vedolizumab (a common biologic in current use). Effect measures were not calculated, and no statistical analyses, including a meta-analysis, were conducted. Thus, the overall effect was not calculated. Owing to inconsistent clinical data from some of the selected studies, the possibility of unintentional research bias cannot be excluded. Nevertheless, during the systematic review process, for the accuracy and verification of results, the researchers arranged periodic meetings for mutual discussions, data cross-verifications and consensuses.

#### **6. Conclusions**

The cutting-edge advancements in drug research and biotechnology have introduced novel biologics and small-molecule drugs for the treatment of CD. Our systematic review demonstrated clear evidence of the efficacy of biological agents and small-molecule drugs in improving HRQoL outcomes in patients with moderate to severe CD. Due to the paucity of the comparative analysis of biologics and small-molecule drugs with other agents in the published literature, this study may potentially guide physicians in positioning and relocating drugs in management algorithms for patients with CD.

**Supplementary Materials:** The following supporting information can be downloaded at: https://drive.google.com/drive/folders/1IIraGwQGod8JjCyno07Rz-SdqUTthrtP?usp=sharing, Appendix B: Quality assessment instrument, Appendix C: Data extraction file, Appendix D: PRISMA checklist.

**Author Contributions:** H.A. and M.A. jointly developed the protocol and the search strategy of this review. The same authors independently screened the references and extracted the data. Both authors synthesised the data and wrote the report collaboratively. S.S.G. supervised the review from start to finish and supported H.A. and M.A. through training, advice and encouragement. S.Y.G. reviewed the raw and final data, edited the final draft of the article and cross-verified all files to ensure consistency and scientific rigor. All authors have read and agreed to the published version of the manuscript.

**Funding:** The APC was funded by the Royal College of Surgeons in Ireland—Bahrain.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available in this article.

**Acknowledgments:** We would like to thank Bindhu Nair for her support in developing the search strategy.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

A literature search, designed in conjunction with a senior librarian, was used on the following databases: PubMed and Cochrane Central Register of Controlled Trials (CENTRAL). A manual search of reference lists of relevant studies was also conducted for papers that met the inclusion criteria. No limits were set regarding language or date restrictions. We did not search grey literature, and we did not contact study investigators. The aim of this systematic review was to evaluate the effect of currently approved and promising in-development biological agents and small-molecule agents in improving HRQoL in individuals with moderate to severe Crohn's disease. To answer this question, our search strategy was developed to find publications that are randomised controlled trials reporting on elements of our PICO. We used Boolean operators to combine the terms in Table A1 (both in the title/abstract field and using MeSH terms where applicable). In terms of narrowing our search to randomised controlled trials, no addition to the CENTRAL search strategy was needed, as it has a separate section for trials. For our PubMed search, with the AND operator, we added a search strategy suggested by the Cochrane handbook that is highly sensitive for identifying randomised controlled trials [23]. Our search was limited to only two databases, in reference to Cochrane guidance [52].

**Table A1.** Table of keywords used.




#### *Appendix A.1. PubMed (n = 306) 25 January 2022*

("crohn\*"[Title/Abstract] OR "Crohn Disease"[MeSH Terms]) AND ("GLPG0634"[Supplementary Concept] OR "upadacitinib"[Supplementary Concept] OR "Janus Kinase Inhibitors"[Pharmacological Action] OR "td 1473"[Supplementary Concept] OR "deucravacitinib"[Supplementary Concept] OR "ontamalimab"[Supplementary Concept] OR "ozanimod"[Supplementary Concept] OR "etrasimod"[Supplementary Concept] OR "spesolimab"[Supplementary Concept] OR ("Infliximab"[MeSH Terms] OR "Tumor Necrosis Factor-alpha"[MeSH Terms] OR ("Adalimumab"[MeSH Terms] OR "adalimumab biosimilar HS016"[Supplementary Concept]) OR "Certolizumab Pegol"[MeSH Terms] OR "antibodies, monoclonal"[MeSH Terms] OR "Ustekinumab"[MeSH Terms] OR "vedolizumab"[Supplementary Concept] OR "Natalizumab"[MeSH Terms] OR "Integrin alpha4"[MeSH Terms] OR "etrolizumab"[Supplementary Concept] OR "abrilumab"[Supplementary Concept] OR "risankizumab"[Supplementary Concept] OR "mirikizumab"[Supplementary Concept] OR "guselkumab"[Supplementary Concept]) OR ("jak inhibitor"[Title/Abstract] OR "anti alpha4"[Title/Abstract] OR "sphingosine 1 phosphate"[Title/Abstract] OR "etrasimod"[Title/Abstract] OR "ozanimod"[Title/Abstract] OR "ontamalimab"[Title/Abstract] OR "deucravacitinib"[Title/Abstract] OR "td 1473"[Title/Abstract] OR "upadacitinib"[Title/Abstract] OR "Filgotinib"[Title/Abstract] OR "spesolimab"[Title/Abstract] OR "guselkumab"[Title/Abstract] OR "brazikumab"[Title/Abstract] OR "mirikizumab"[Title/Abstract] OR "risankizumab"[Title/Abstract] OR "abrilumab"[Title/Abstract] OR "etrolizumab"[Title/Abstract] OR "Natalizumab"[Title/Abstract] OR "vedolizumab"[Title/Abstract] OR "Ustekinumab"[Title/Abstract] OR "Certolizumab Pegol"[Title/Abstract] OR "Adalimumab"[Title/Abstract] OR "Infliximab"[Title/Abstract]) OR "tnf-alpha inhibitor"[Title/Abstract] OR ("interleukin 23"[MeSH Terms] OR "interleukin 23 subunit p19"[MeSH Terms] OR "Interleukin-12 Subunit p40"[MeSH Terms]) OR "Infliximab-qbtx"[Title/Abstract]) AND ("36-item Short-Form Health Survey"[Title/Abstract] OR "SF-36"[Title/Abstract] OR "Inflammatory bowel disease questionnaire"[Title/Abstract] OR "IBDQ"[Title/Abstract] OR "Health related quality of life"[Title/Abstract] OR "HRQoL"[Title/Abstract] OR "QoL"[Title/Abstract] OR "Quality of Life"[MeSH Terms] OR "Quality of Life"[Title/Abstract] OR "SF-36V2"[Title/Abstract]) AND (("trial"[Title/Abstract] OR "randomized controlled trial"[Publication Type] OR "controlled clinical trial"[Publication Type] OR "randomized"[Title/Abstract] OR "placebo"[Title/Abstract] OR "drug therapy"[MeSH Subheading] OR "randomly"[Title/Abstract] OR "groups"[Title/Abstract]) NOT ("animals"[MeSH Terms] NOT "humans"[MeSH Terms]))

#### *Appendix A.2. CENTRAL*


### **References**


## *Review* **Gastric Cancer Screening in Japan: A Narrative Review**

**Kazuo Yashima 1,\*, Michiko Shabana 2, Hiroki Kurumi 1, Koichiro Kawaguchi <sup>1</sup> and Hajime Isomoto <sup>1</sup>**


**Abstract:** Gastric cancer is the second leading cause of cancer incidence in Japan, although gastric cancer mortality has decreased over the past few decades. This decrease is attributed to a decline in the prevalence of *H. pylori* infection. Radiographic examination has long been performed as the only method of gastric screening with evidence of reduction in mortality in the past. The revised 2014 Japanese Guidelines for Gastric Cancer Screening approved gastric endoscopy for use in population-based screening, together with radiography. While endoscopic gastric cancer screening has begun, there are some problems associated with its implementation, including endoscopic capacity, equal access, and cost-effectiveness. As *H. pylori* infection and atrophic gastritis are well-known risk factors for gastric cancer, a different screening method might be considered, depending on its association with the individual's background and gastric cancer risk. In this review, we summarize the current status and problems of gastric cancer screening in Japan. We also introduce and discuss the results of gastric cancer screening using *H. pylori* infection status in Hoki-cho, Tottori prefecture. Further, we review risk stratification as a system for improving gastric cancer screening in the future.

**Keywords:** gastric cancer; gastric cancer screening; endoscopy; *H. pylori*; eradication therapy

### **1. Introduction**

Gastric cancer is the fifth most common cancer and the fourth leading cause of cancer-related deaths worldwide [1]. *Helicobacter pylori* (*H. pylori*) infection is considered the main cause of gastric cancer [2,3]. In Japan, the adjusted incidence and mortality rates of gastric cancer have decreased over the past few decades [4]. This decrease is mainly attributed to the reduction in *H. pylori* infection rates and the preventative effects of the *H. pylori* eradication therapy [5–10]. Despite this reduction, the number of gastric cancer cases ranks second and the number of deaths caused by gastric cancer ranks third in Japan [11], making it a critical public health problem.

In Japan, radiographic examination has been conducted since the 1960s as a secondary preventive measure for gastric cancer [12]. The revised 2014 Japanese Guidelines for Gastric Cancer Screening approved gastric endoscopy for use in population-based screening, together with radiography [13]. Currently, the government of Japan recommends either radiography or gastroscopic examination for gastric cancer screening [14]. However, there are some barriers, such as participation rate, endoscopic capacity, equal access, and cost-effectiveness [15–18].

Over 99% of gastric cancers in Japan are associated with a current or past *H. pylori* infection [19,20]. Furthermore, the background risk of gastric cancer has changed compared to the past because of the rapid decrease in the infection rate of *H. pylori* [5–10]. Efficient gastric cancer screening therefore requires classifying individuals according to their *H. pylori* infection status [8,21,22].

In recent years, image-enhanced endoscopy (IEE) [23], as well as artificial intelligence (AI), have been introduced in endoscopic diagnostics [24–26]. In this review, the present status and problems of gastric cancer screening in Japan are summarized. We present the results of gastric cancer screening using *H. pylori* infection status in Hoki-cho, Tottori prefecture. Further, we introduce risk stratification as a system for improving gastric cancer screening in the future.

**Citation:** Yashima, K.; Shabana, M.; Kurumi, H.; Kawaguchi, K.; Isomoto, H. Gastric Cancer Screening in Japan: A Narrative Review. *J. Clin. Med.* **2022**, *11*, 4337. https://doi.org/10.3390/jcm11154337

Academic Editor: Sun-Young Lee

Received: 27 June 2022 Accepted: 24 July 2022 Published: 26 July 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Gastric Cancer in Japan**

#### *2.1. Epidemiology of Gastric Cancer*

In Japan, gastric cancer accounted for almost half of all cancer deaths in the 1960s, but the proportion continues to decline. According to the 2021 cancer statistics forecast of the National Cancer Center Cancer-Information Service, "Cancer Registration and Statistics", gastric cancer ranked third in the number of deaths after lung cancer and colorectal cancer, accounting for 11.1% (42,000 people) of all cancer deaths [11]. The number of gastric cancer deaths remained at approximately 50,000 per year for several decades and has been declining since 2011. However, more than 40,000 people still lose their lives to stomach cancer every year. Gastric cancer has the second highest incidence at 12.9% (130,500 people), following colorectal cancer. Regarding annual trends, the age-standardized incidence and mortality are steadily decreasing, whereas the number of cases is increasing and the number of deaths tends to plateau, owing to an increase in gastric cancer incidence and deaths in the elderly population.

#### *2.2. H. pylori and Gastric Cancer*

The International Agency for Research on Cancer designated *H. pylori* as a definite gastric carcinogen (group 1) in 1994 [27] and recommended prevention by eradication in 2014 [28]. The presence of *H. pylori* infection is determined by histologic examination, the rapid urease test, serum antibody test, stool antigen test, or <sup>13</sup>C-urea breath test. The effectiveness of the eradication treatment on gastric cancer prevention has been shown in a randomized controlled trial [29], and this primary preventive effect of eradication on gastric cancer has been reported in recent meta-analyses [30–32]. Eradication of *H. pylori* reduces the risk of gastric cancer and mortality [33–36], but the risk still remains in the second decade after eradication [37]. Moreover, the pathogenicity and carcinogenicity of *H. pylori* depend on its strain. The East Asian type of *H. pylori*, which is prevalent in Japan, is more carcinogenic than the European-type *H. pylori* [38,39]. In addition, the presence of *H. pylori* with a positive babA2 gene may contribute to an increased risk of gastric cancer, especially in the Asian population [39,40]. In Japan, the eradication treatment for gastric and duodenal ulcers has been covered by the National Health Insurance since 2000, and *H. pylori*-infected gastritis was added as an indication in 2013 [41]. According to recent reports in Japan, the cumulative incidence risk of gastric cancer was 17.0% in men and 7.7% in women in the *H. pylori*-infected population, and <1% in the non-infected population [42]. More than 99% of gastric cancers in Japan are associated with *H. pylori*-infected gastritis [19,20]. Histopathological diagnosis of gastric cancer is performed according to the Japanese Classification of Gastric Carcinoma and the Vienna classification system [43,44]. Although gastric cancer that is not associated with *H. pylori* infection is extremely rare, gastric cancer associated with autoimmune gastritis, gastric cancer due to *CDH1* gene mutation, fundic gland-type cancer, signet ring cell carcinoma, and cardia cancer are known [45]. Cardia cancer is often discovered at an advanced stage; thus, particular attention should be paid to it [46]. Moreover, the main risk factors of cardia cancer, which include gastroesophageal reflux disease and obesity, are different from those of gastric cancer associated with *H. pylori* [47].

As mentioned above, in Japan, the age-standardized incidence and mortality rate of gastric cancer has decreased over the past few decades due to a decrease in the incidence of *H. pylori* infection [4–15]. *H. pylori* infection rates in the 1960s, 1970s, and 1980s or later were 30%, 20%, and <10%, respectively [7]. A meta-analysis of the Japanese population shows that *H. pylori* infection rate is high in patients born in the 1940s; however, the infection rate decreased in patients who were born later, in the 1950s [9]. Although the morbidity rate of gastric cancer has continued to decrease due to the reduced *H. pylori* infection rates and the preventative effect of the *H. pylori* eradication therapy, the prevalence of *H. pylori* eradication has increased remarkably in recent years [8]. In the midst of dynamic changes in the incidence of *H. pylori* infection, it is considered to be important to pay attention to the high-risk groups in gastric cancer screening.

#### **3. Gastric Cancer Screening Methods Used in Japan**

#### *3.1. Current Status and Problems of Upper Gastrointestinal Series*

Annual radiographic screening for everyone >40 years of age in Japan was implemented in the 1960s as a secondary preventive measure for gastric cancer [12,14]. Gastric cancer screening using radiographic examination has been proven to reduce mortality. It has an excellent mass-processing ability and good accuracy, and is safe and cost-effective [48,49]. However, in recent case-control studies in Japan and South Korea, the effect of radiographic screening on mortality reduction was limited [50,51]. The Japan Society of Gastroenterological Cancer Screening formulated a revised version of the new gastric radiography guidelines (2011) [52]. The ability to view lesions by gastric radiographic examination has been greatly improved with the use of high-concentration, low-viscosity barium preparations and the advent of digital X-ray devices. Consequently, the rate of early detection of gastric cancer has exceeded 70% [53]. In addition, gastric cancer screening has been performed using imaging and AI to detect *H. pylori*-infected gastritis and gastric mucosal atrophy [54]. However, due to aging and immobilization of patients, radiation exposure, a lack of reading physicians, and aging facilities, the rate of participation has been sluggish. Although endoscopic examinations have been approved by the revised 2014 Japanese Guidelines for Gastric Cancer Screening [13], it is impossible to replace all conventional radiography with endoscopic examinations due to problems relating to the capacity of endoscopy, budget, and access for examinees [14,15]. In population-based gastric cancer screening, it will be necessary to continue to utilize radiographic examinations with high processing capacity as a safety net.

#### *3.2. Current Status and Problems of Upper Gastrointestinal Endoscopy*

Radiographic examination is a screening method limited to Japan, but there is a growing international interest in endoscopic screening [55]. In Korea, in response to the results of domestic research, gastric cancer screening has been limited to endoscopic examinations [55,56].

In 2013, case-control studies were conducted in Japan and Korea. The research conducted in Japan involved a study on the population of Goto Islands in Nagasaki Prefecture [57] and a study on the population of Tottori Prefecture and Niigata City [58]. Although the sample size was small in the Nagasaki study, the mortality rate of gastric cancer was significantly decreased by 79% in participants of endoscopic screening (odds ratio [OR]: 0.206, 95% confidence interval [CI]: 0.044–0.965) [57]. The case-control study conducted in Niigata City and four cities in Tottori Prefecture reported that the mortality rate was significantly lower by approximately 30% in people who underwent endoscopy within 36 months before the date of gastric cancer diagnosis (OR: 0.695, 95% CI: 0.489–0.986) [58]. The studies that were conducted in Korea were large-scale research based on national databases. When the gastroscopic examination was performed even once in the past, the effect of reducing the gastric cancer mortality rate was confirmed to be 47% in individuals aged 40–74 years old (OR: 0.53, 95% CI: 0.51–0.56) [56]. Based on these results, gastroscopy was recommended as a population-based screening method according to the revised 2014 Japanese Guidelines for Gastric Cancer Screening [13]. At the same time, the recommended schedule changed from once a year for individuals aged > 40 years to once every 2 years for individuals aged > 50 years, reflecting the recent decline in gastric cancer mortality by age group. In 2015, a study of Tottori Prefecture showed that endoscopic screening reduced the gastric cancer mortality rate by 67% compared with radiographic screening [50]. Zhang et al. conducted a meta-analysis that included 342,013 individuals in six cohort and four case-control studies that were previously published. This analysis demonstrated that endoscopic examination showed a 40% reduction in gastric cancer mortality rate (relative risk: 0.60, 95% CI: 0.49–0.73) [59].
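The percentage reductions quoted in this subsection follow from the reported ratios as (1 − ratio) × 100. The short Python sketch below reproduces that arithmetic for the estimates cited here. The function and variable names are illustrative assumptions, and reading 1 − OR as a reduction in mortality is an approximation that is reasonable when the outcome is rare.

```python
# Effect estimates quoted in the text: (label, ratio, 95% CI lower, 95% CI upper).
ESTIMATES = [
    ("Goto Islands, Nagasaki (OR)", 0.206, 0.044, 0.965),
    ("Niigata City and Tottori (OR)", 0.695, 0.489, 0.986),
    ("Korean national database (OR)", 0.53, 0.51, 0.56),
    ("Zhang et al. meta-analysis (RR)", 0.60, 0.49, 0.73),
]


def percent_reduction(ratio: float) -> float:
    """Relative reduction implied by a ratio estimate, in percent."""
    return (1.0 - ratio) * 100.0


def excludes_null(ci_low: float, ci_high: float) -> bool:
    """True when the confidence interval does not contain 1 (no effect)."""
    return ci_high < 1.0 or ci_low > 1.0


for label, ratio, low, high in ESTIMATES:
    print(
        f"{label}: {percent_reduction(ratio):.1f}% reduction, "
        f"CI excludes 1: {excludes_null(low, high)}"
    )
```

All four confidence intervals lie below 1, which is why each reduction is described as significant, although the upper bound of the Nagasaki estimate (0.965) is close to the null value.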

According to reports from the area where endoscopic examinations were introduced, the gastric cancer detection rate was 0.05–0.32% for gastric X-ray examination and 0.30–0.87% for gastroscopic examinations [8,60]. Further, the gastric cancer detection rate of endoscopy was reported to have been approximately three times higher than that of X-ray examination. In Japanese studies, the proportion of early-stage cancer was approximately 70% in the radiographic screening group and >80% in the endoscopic screening group. Similarly, Hosokawa et al. previously reported that the detection rate of early cancer was higher in the endoscopic screening group than in the radiographic screening group [61]. However, the effectiveness of gastric cancer screening should be evaluated by the mortality reduction, and not by the detection rate.

Endoscopy can diagnose early-stage cancers that can be treated by endoscopic surgical dissection. Endoscopic surgical dissection has been performed for approximately half of early-stage cancers detected by endoscopic screening [62]. It seems to contribute to the maintenance of the quality of life after treatment. Moreover, recent development and widespread use of IEE and magnifying endoscopy have improved the endoscopic diagnosis of gastric cancer [23]. IEE is useful for diagnosing gastric cancer after eradication, which is usually difficult to detect [63]. In a recent study, we showed that photodynamic endoscopic diagnosis—based on the fluorescence of photosensitizers that accumulate in tumors—may be useful in the diagnosis of early gastric cancer regardless of the endoscopist's experience and is useful for tumor detection; however, its usefulness has not been established because no prospective studies evaluating its usefulness have been performed [64].

As the participation rate in gastric cancer screening has decreased, its impact on mortality reduction has become limited. Although the participation rate in radiographic screening for gastric cancer has sunk below 10% [65], it is possible to improve the participation rate by introducing endoscopic screening as a method of gastric cancer screening. Notably, the participation rate is approximately 25% in municipalities that have already introduced endoscopic screening [66,67]. Thus, endoscopy is now the first choice for gastrointestinal tract examination instead of X-ray examination.

#### **4. Risk Stratification for Gastric Cancer Screening**

#### *4.1. Risk Factors for Gastric Cancer*

Risk factors for gastric cancer include *H. pylori* infection and accompanying gastric mucosal atrophy, smoking, and hereditary diseases, such as Lynch syndrome and familial adenomatous polyposis [23]. In addition, diet, lifestyle preferences, and Epstein-Barr virus infection have been reported as possible risk factors. Recently, it has been reported that approximately one-fifth of diffuse-type gastric cancers in Japan were attributable to the combination of alcohol intake and defective *ALDH2* allele or *CDH1* variants [68]. The most important method of obtaining information about these risk factors before endoscopic screening is a medical questionnaire. In addition, during the endoscopic examination, individuals can be stratified by gastric cancer risk based on *H. pylori* infection status and relevant findings suggestive of gastric cancer risk, as described in the endoscopy-based Kyoto classification of gastritis [69–71]. Endoscopic findings related to the risk of gastric cancer include moderate-to-severe gastric atrophy, enlarged gastric folds, nodular gastritis, xanthoma [72,73], and map-like redness [70]. When the accuracy of *H. pylori* infection diagnosis by the "Kyoto classification of gastritis" was examined, the sensitivity and specificity for detecting uninfected, existing infection, and current infection were 88.3% and 92.9%, 78.8% and 90.0%, and 67.1% and 91.4%, respectively. Moreover, risk classification by endoscopic examination was confirmed to have very high accuracy. However, to avoid false-negative results, an *H. pylori* antibody test was recommended [74].

#### *4.2. Tests Used for Risk Stratification*

According to the 2019 Basic Survey on National Life, 54.2% of men and 45.1% of women aged 40–69 years had undergone gastric cancer screening [75], approaching the target value of 50% of the 3rd Basic Plan for Cancer Countermeasures in Japan. However, in recent years, the number of *H. pylori*-negative people has increased, and the gastric cancer-adjusted mortality rate has naturally decreased [5–11]; following this, there has been a problem with cost-effectiveness in the strategy of simply increasing the participation rate. In the future, it may be necessary to stratify individuals according to gastric cancer risk by determining risk factors—such as a history of *H. pylori* infection and gastric mucosal atrophy—and reflect them in the selection of endoscopy and the determination of the screening interval.

The "ABC method", a combined assay for serum anti-*H. pylori* IgG antibody and serum pepsinogen (PG) levels, is generally used in Japan as a gastric cancer risk classification system [76]. Itoh et al. reported a strong correlation between the ABC classification system and radiological findings in relation to the risk of gastric cancer [77]. However, the revised 2014 Japanese Guidelines for Gastric Cancer Screening do not recommend this method due to insufficient scientific evidence regarding its effectiveness in gastric cancer screening [13]. The risk of gastric cancer can be stratified based on factors such as the presence of *H. pylori* infection and the extent and severity of gastric atrophy. The serum anti-*H. pylori* IgG antibody titer can predict an individual's *H. pylori* infection status, although titers vary greatly depending on the test kit used. Serum PG levels reflect the status of gastric mucosal inflammation and serve as a marker for atrophic gastritis. Individuals with PG I levels of ≤70 ng/mL and PG I/II ratio of <3 are classified as PG test positive, and people with a history of *H. pylori* eradication, treatment with proton pump inhibitors, previous gastric resection, or impairment of renal function are excluded to ensure correct stratification. This method classifies individuals into the following four groups according to their serological status: (1) group A, anti-*H. pylori* IgG antibody (−)/PG (−); (2) group B, anti-*H. pylori* IgG antibody (+)/PG (−); (3) group C, anti-*H. pylori* IgG antibody (+)/PG (+); and (4) group D, anti-*H. pylori* IgG antibody (−)/PG (+), which also includes those with autoimmune gastritis (type A gastritis) [76]. Notably, a meta-analysis conducted by Terasawa et al. demonstrated that groups A, B, and C + D were significantly different in their respective gastric cancer risk [78]; thus, this stratification is expected to serve as a mass screening system for this disease.
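The four-group rule described above can be written as a small decision function. The following Python sketch is illustrative only: the function names are hypothetical, antibody positivity is passed in as a boolean because the titer cutoff depends on the test kit, and the exclusion criteria (prior eradication, proton pump inhibitor use, gastric resection, renal impairment) are assumed to have been applied beforehand.

```python
# PG test positivity thresholds as stated in the text.
PG1_CUTOFF_NG_ML = 70.0  # PG I <= 70 ng/mL
PG_RATIO_CUTOFF = 3.0    # PG I/II ratio < 3


def pg_test_positive(pg1_ng_ml: float, pg2_ng_ml: float) -> bool:
    """PG test is positive only when both threshold conditions hold."""
    if pg2_ng_ml <= 0:
        raise ValueError("PG II must be positive to compute the PG I/II ratio")
    ratio = pg1_ng_ml / pg2_ng_ml
    return pg1_ng_ml <= PG1_CUTOFF_NG_ML and ratio < PG_RATIO_CUTOFF


def abc_group(hp_antibody_positive: bool, pg1_ng_ml: float, pg2_ng_ml: float) -> str:
    """Return the ABC group (A, B, C, or D) from the two serological results."""
    pg_positive = pg_test_positive(pg1_ng_ml, pg2_ng_ml)
    if hp_antibody_positive:
        return "C" if pg_positive else "B"
    return "D" if pg_positive else "A"


# Hypothetical example values, not taken from any study.
print(abc_group(False, 80.0, 10.0))  # antibody (-), PG (-)
print(abc_group(True, 50.0, 20.0))   # antibody (+), PG (+)
```

Keeping the PG test as a separate function makes the two thresholds explicit and lets the group assignment read exactly like the four-way table in the text.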

As the development of gastric cancer in patients not infected with *H. pylori* is extremely rare in Japan, it may be expected that the *H. pylori*-uninfected population could be excluded from the mass screening system for gastric cancer. However, group A included patients with a high risk of developing gastric cancer and could not be regarded as truly *H. pylori*-negative [79,80]. The presence of *H. pylori*-infected individuals in group A is a crucial problem because the individuals are wrongly considered to have an extremely low risk for gastric cancer, similar to healthy, *H. pylori*-uninfected individuals. The endoscopic grade of atrophy is an accurate predictive marker for gastric cancer [81,82]. To exclude individuals who are truly *H. pylori*-negative, an endoscopic evaluation of the gastric mucosa should be performed [83,84]. However, it is inefficient to perform endoscopy in all patients, as this is expensive and requires a large endoscopist workforce.

According to a report by the Kanazawa City Medical Association [84], gastric cancer may develop at an annual rate of 0.31% in a state with advanced atrophy (O-3) classified by Kimura and Takemoto [85], and it is possible to stratify the risk of gastric cancer using endoscopic diagnosis. Therefore, endoscopic diagnosis of atrophy may be more effective than the ABC classification system for predicting the risk of gastric cancer.

Several cost-effectiveness analyses demonstrated that endoscopic surveillance is a cost-effective method to reduce gastric cancer mortality. A comprehensive systematic review showed that endoscopic screening is cost-effective in high-incidence countries, and that targeted endoscopic screening of high-risk populations is also generally cost-effective in low-intermediate incidence countries [86]. Recently, Kowada et al. demonstrated that biennial endoscopy for patients with mild-to-moderate gastric mucosal atrophy and annual endoscopy for patients with severe gastric mucosal atrophy were the most cost-effective measures after *H. pylori* eradication [87].

#### *4.3. Gastric Cancer Screening Tests Performed at Hoki-cho, Tottori Prefecture*

Since 2000, patients in Tottori Prefecture have been able to select between endoscopic and radiographic examinations. The rate of gastric cancer screening by endoscopic or radiographic examination in Hoki-cho, Tottori Prefecture has remained around 20%, which is not sufficient, as the national target is 50%. With the aim of accelerating endoscopic screening and eradication therapy for *H. pylori* infection, Hoki-cho in Tottori Prefecture implemented a risk evaluation system for gastric cancer for 5 years from 2014 by testing the serum for *H. pylori* antibodies [88]. The target population comprised individuals aged 20 years or 35–70 years in each year who underwent at least one examination through the evaluation system during this period (Figure 1).

**Figure 1.** Flow chart of *H. pylori* antibody test project in Hoki-cho, Tottori prefecture. Individuals with PG I levels of ≤70 ng/mL and PG I/II ratio of <3 are classified as PG test positive, which is equal to gastric atrophy. PG, pepsinogen.

In cases with negative results for *H. pylori* diagnosis, we incorporated the serum PG method. During the 5 years from 2014 to 2018, there were a total of 6191 target individuals, of whom 2464 were screened (participation rate: 39.8%). The total number of *H. pylori*-positive cases was 753 (30.6%), and that of cases negative for *H. pylori* antibody and positive for the PG method was 58 (2.4%). The frequency of *H. pylori* positivity was 9.2% in individuals aged 20 years and increased gradually with advancing age, remaining <40% in all age groups (Figure 2). The rate was highest (38.4%) among individuals aged 60–70 years.
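The percentages in this paragraph can be checked directly from the reported counts. The Python sketch below is a minimal verification, assuming (as the arithmetic suggests) that the denominator for the 30.6% and 2.4% figures is the number of screened individuals; the constant and function names are illustrative.

```python
# Counts reported for the Hoki-cho project, 2014 to 2018.
TARGET_INDIVIDUALS = 6191
SCREENED = 2464
HP_ANTIBODY_POSITIVE = 753
HP_NEGATIVE_PG_POSITIVE = 58


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, in percent, rounded to one decimal place."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return round(100.0 * part / whole, 1)


print("Participation rate:", percent(SCREENED, TARGET_INDIVIDUALS))
print("H. pylori antibody positive:", percent(HP_ANTIBODY_POSITIVE, SCREENED))
print("Antibody negative, PG positive:", percent(HP_NEGATIVE_PG_POSITIVE, SCREENED))
```

Under that assumption, the computed values match the 39.8%, 30.6%, and 2.4% reported in the text.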

Consequently, during the 5-year study period, 71.3% of the examinees underwent a detailed endoscopic examination (Table 1), and two patients with early gastric cancer were detected. Eradication therapy was implemented in 97.6% of cases that had a positive result for *H. pylori* infection after undergoing a detailed endoscopic examination. On the other hand, only 33.7% and 22.8% of individuals with positive screening results in 2014 and 2015, respectively, had received a periodic endoscopic screening at least once during the three years after the following year. Therefore, it is important to increase the participation rate of this project and the rate of detailed endoscopic examinations to further increase the detection of gastric cancer risk and to implement periodic endoscopic screening.

**Figure 2.** The frequency of *H. pylori* positivity according to age (2014–2018).



The rate of population-based gastric cancer screening in Hoki-cho was 20.6% in 2013; however, after the introduction of the *H. pylori* infection screening, it increased to 26.2% in 2015, 22.8% in 2016, 23.2% in 2017, and 24.3% in 2018. In 2018, 657 (63.4%) of the 1036 patients had opted for endoscopic examination (26.1% in 2013, 35.3% in 2014, 52.9% in 2015, 50.7% in 2016, and 57.0% in 2017), contributing to the steady increase in the use of endoscopy (Table 2).

**Table 2.** Annual trends in the rate of participation for population-based gastric cancer screening in Hoki-cho, Tottori Prefecture.


The data were obtained from "Cancer Screening Report in Tottori Prefecture".

This implies that screening using the *H. pylori* antibody test is useful for improving the participation rate and for making endoscopic gastric cancer screening more efficient. In the future, it will be necessary to verify whether combining *H. pylori* antibody testing with endoscopic examination reduces gastric cancer mortality and to establish the optimal screening interval for *H. pylori*-infected and uninfected persons. In addition, it is important to improve the true rate of participation by recommending endoscopic examination to those who require it.

#### **5. Future Directions for Gastric Cancer Screening**

#### *5.1. Optimal Age and Intervals for Screening*

According to Japan's national screening program, the recommended age for gastric cancer screening was changed to >50 years due to a decrease in the incidence of gastric cancer in 40-year-olds [13]. Similarly, the British Society of Gastroenterology guidelines suggest that endoscopic screening be considered in individuals aged >50 years with multiple risk factors for gastric adenocarcinoma (male sex, smoking, and pernicious anemia) [89]. In Korea, gastric cancer screening is conducted for populations aged 40–74 years [55]. A study in Japan based on nationwide data showed that the endoscopic screening program would be cost-effective when implemented for populations aged 50–75 years [90]. A nationwide study in Singapore revealed that gastric cancer screening was cost-effective when used among Chinese men aged 50–70 years [91].

The screening interval might be defined according to the individual's background and gastric cancer risk. The incidence of gastric cancer differs according to individual risk, which is mainly determined by *H. pylori* infection status and atrophic gastritis. In Korea, an interval of 2 years is recommended [92]. The British Society of Gastroenterology recommends that endoscopic follow-up be performed every 3 years for individuals with severe chronic atrophic gastritis or intestinal metaplasia, and at intervals of no more than 1 year for low-grade intraepithelial neoplasia, similar to the management of epithelial precancerous conditions and lesions in the stomach (MAPS II) guideline [93]. In Japan, high-grade intraepithelial neoplasia should be treated clinically. The national program in Japan recommends repeated gastric cancer screening every 2–3 years [14]. However, high-quality prospective research is required to determine the optimal follow-up interval for endoscopic screening in Japan. If individuals with a low risk of gastric cancer could be identified within the screening programs, their screening interval could be extended. Hamashima et al. introduced the diagnosis of infection and atrophy using endoscopy and serological testing, or risk stratification, and conducted a nationwide prospective study to set risk-specific screening intervals [17]. The results of this research are expected to reduce the burden on patients by appropriately classifying the risk of gastric cancer and extending the interval between screenings for low-risk patients. The research also aims to establish a system that gives the target population fair access to endoscopic screening by effectively utilizing limited medical resources.

#### *5.2. AI as a New Screening Method*

In gastric cancer screening, gastric cancer may be missed by both radiographic and endoscopic examinations [56,94,95]. In population-based screening, a specialist is required to double-check the findings, which is labor-intensive, and the accuracy is difficult to evaluate. Recently, the diagnosis of *H. pylori* infection and the detection of gastric cancer using AI have been reported. The sensitivity and specificity of endoscopic *H. pylori* infection diagnosis were 81.9% and 83.4% using AI, 79.0% and 83.2% by an average endoscopist, and 85.2% and 89.3% by an endoscopic specialist, respectively [96]. On the other hand, when AI detection was conducted across three groups (*H. pylori*-positive, *H. pylori*-negative, and *H. pylori*-eradicated), the rate of correct diagnosis decreased to 77% [97]; hence, there is room for further improvement in diagnosis using AI, including in cases following *H. pylori* eradication. AI has a high sensitivity for gastric cancer, but its positive predictive value is low [24–26]. However, this has rapidly improved [98]. In addition to its accuracy, AI diagnostic imaging is expected to reduce the burden of double-checking and to effectively identify patients who need follow-up endoscopy [98]. Gastric cancer screening using AI is expected to reduce gastric cancer deaths more efficiently than conventional screening methods.
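A test can combine high sensitivity with a low positive predictive value (PPV) when the condition is rare in the screened population, because false positives from the large disease-free group outnumber true positives. The sketch below illustrates this relationship with Bayes' rule. All input values (90% sensitivity, 90% specificity, and prevalences of 1% and 30%) are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV = true positives / (true positives + false positives), per unit population."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return true_positives / (true_positives + false_positives)

# Hypothetical illustration only: same test, two different prevalences
ppv_rare = positive_predictive_value(0.90, 0.90, 0.01)    # condition present in 1%
ppv_common = positive_predictive_value(0.90, 0.90, 0.30)  # condition present in 30%

print(f"PPV at 1% prevalence:  {ppv_rare:.3f}")
print(f"PPV at 30% prevalence: {ppv_common:.3f}")
```

Under these assumed values, the same test yields a much lower PPV in the low-prevalence setting, which is the typical situation in population-based cancer screening.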

#### **6. Conclusions**

While endoscopic gastric cancer screening has been initiated nationwide in Japan, the incidence of *H. pylori* infection has decreased and the number of cases following *H. pylori* eradication has increased. Moreover, the importance of ABC classification, which reflects *H. pylori* infection status and gastric atrophy, before endoscopic screening is being increasingly recognized. Considering its cost-effectiveness, it is desirable to expand the use of endoscopic screening and to establish a new screening system that conducts examinations at appropriate intervals according to the individual's background and risks.

**Author Contributions:** Writing—original draft preparation, K.Y.; writing—review and editing, M.S. and H.I.; supervision, M.S., H.K., K.K. and H.I. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Ethics Committee of Tottori University (1511A080).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Data will be available from the corresponding author upon reasonable request.

**Acknowledgments:** The *H. pylori* antibody screening project was supported by Hoki-cho, Tottori Prefecture.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

