**Predicting Long-Term Health-Related Quality of Life after Bariatric Surgery Using a Conventional Neural Network: A Study Based on the Scandinavian Obesity Surgery Registry**

### **Yang Cao 1,\*, Mustafa Raoof 2, Scott Montgomery 1,3,4, Johan Ottosson <sup>2</sup> and Ingmar Näslund <sup>2</sup>**


Received: 4 November 2019; Accepted: 3 December 2019; Published: 5 December 2019

**Abstract:** Severe obesity has been associated with numerous comorbidities and reduced health-related quality of life (HRQoL). Although many studies have reported changes in HRQoL after bariatric surgery, few were long-term prospective studies. We examined the performance of the convolution neural network (CNN) for predicting 5-year HRQoL after bariatric surgery based on the available preoperative information from the Scandinavian Obesity Surgery Registry (SOReg). CNN was used to predict the 5-year HRQoL after bariatric surgery in a training dataset and evaluated in a test dataset. In general, performance of the CNN model (measured as mean squared error, MSE) increased with more convolution layer filters, computation units, and epochs, and decreased with a larger batch size. The CNN model showed an overwhelming advantage in predicting all the HRQoL measures. The MSEs of the CNN model for training data were 8% to 80% smaller than those of the linear regression model. When the models were evaluated using the test data, the CNN model performed better than the linear regression model. However, the issue of overfitting was apparent in the CNN model. We concluded that the performance of the CNN is better than the traditional multivariate linear regression model in predicting long-term HRQoL after bariatric surgery; however, the overfitting issue needs to be mitigated using more features or more patients to train the model.

**Keywords:** prediction; deep learning; convolutional neural network; health-related quality of life; bariatric surgery

#### **1. Introduction**

Severe obesity, defined as having a body mass index (BMI) greater than 40 kg/m<sup>2</sup> or greater than 35 kg/m<sup>2</sup> plus at least one obesity-related comorbidity [1,2], has been associated with numerous health outcomes and reduced health-related quality of life (HRQoL) [3–8]. HRQoL measures population health multi-dimensionally from physical, mental, emotional, and social functioning domains, which have already been identified as an important indication for bariatric surgery and recognized by the United States National Institutes of Health Conference as early as 1991 [9,10]. Although many studies have reported changes in HRQoL after bariatric surgery, few are long-term prospective studies. A systematic review of seven prospective cohort studies with a follow-up time of ≥5 years revealed that bariatric surgery patients reported considerably improved HRQoL and the improvement was maintained over the long term [11]. However, many patients still experience reduced HRQoL after surgery. In our study, 39% of patients had significant improvements in physical functioning (PF) (increased by >25 in the original score or >0.25 in the scaled score), whereas the rest had no significant improvement, and some patients (2%) even had significant deterioration (reduced by >25 in the original score or >0.25 in the scaled score). No relationship between the PF scores before and 5 years after surgery was identified (Figure S1).

Although some preoperative psychological factors, including personality change, severe psychiatric disorder, or depressive symptoms, are associated with postoperative HRQoL after bariatric surgery [12,13], whether long-term HRQoL after bariatric surgery can be predicted based on patients' baseline features has not been investigated. The present study examined the performance of the convolution neural network (CNN) for predicting 5-year HRQoL after bariatric surgery based on the available preoperative information from a national quality registry, and compared CNN with a conventional linear regression estimator.

#### **2. Material and Methods**

#### *2.1. Patients and Features*

Data for the patients registered in the Scandinavian Obesity Surgery Registry (SOReg) were used for the current study. The SOReg was launched in 2007 and has covered 98% of bariatric surgery in Sweden since 2009. SOReg is validated regularly and has been shown to have high data quality [14]. In total, 27 of 42 operating centers in Sweden participate in the HRQoL registration in SOReg. HRQoL was measured using the RAND-SF-36 and the obesity-related problems (OP) scale preoperatively and 1, 2, and 5 years after surgery. In the present study, preoperative and 5-year HRQoL data, including PF, role physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role emotional (RE), mental health (MH) scale, summary physical scale (PCS), summary mental scale (MCS), and OP, were used. All scale scores ranged from 0 to 100, with higher scores indicating better health status except for OP, where low values represent good health. Eight baseline features, including sex, age, BMI, sleep apnea syndrome (SAS), hypertension, diabetes, dyslipidemia, and depression, were also used as predictors.

In total, 6687 patients with complete information on 19 baseline features and 11 5-year HRQoL measures were used in the machine learning study.

The data that support the study are not publicly available because they contain information that could compromise research participant privacy and confidentiality. The authors will make the data available upon reasonable request and with permission of the Committee of Scandinavian Obesity Surgery Registry in Örebro, Sweden.

#### *2.2. Feature Scaling*

Before machine learning, the features in the dataset were scaled. The binary features were converted into dummy variables, and the continuous features were scaled to between 0 and 1 using a min-max scaler. In the sensitivity analysis, the normalizer and standardizer scalers were also used to evaluate the influence of scalers on the model's performance.
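The scaling step can be illustrated with a minimal sketch in plain Python. The toy records and the helper names `min_max_scale` and `encode_binary` are assumptions made for illustration only; they are not taken from the study code, which relied on scikit-learn.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def encode_binary(values, positive):
    """Encode a two-level feature as a 0/1 dummy variable."""
    return [1 if v == positive else 0 for v in values]


# Hypothetical toy records, not SOReg data.
ages = [25, 40, 55]
sexes = ["female", "male", "female"]

scaled_ages = min_max_scale(ages)                    # [0.0, 0.5, 1.0]
sex_dummy = encode_binary(sexes, positive="female")  # [1, 0, 1]
```

scikit-learn's `MinMaxScaler` applies the same linear transformation column by column.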

#### *2.3. Convolutional Neural Network*

A CNN is a regularized version of a multi-layer perceptron neural network, which was inspired by a biological process where the connectivity pattern between neurons resembles the organization of the visual cortex [15]. Although CNNs were not specifically developed for non-image data, they may achieve state-of-the-art results on regression prediction problems, especially for data with time series or spatial patterns. The CNN input is traditionally two-dimensional (2D) but can also be changed to be one-dimensional (1D), allowing it to develop an internal representation of a 1D sequence. In our study, we used a CNN with seven hidden layers, including two 1D convolution layers (with 10 filters each), two 1D max pooling layers, one flatten layer, and two dense layers (with 1000 computation units). The rectified linear unit (ReLU) activation function was used for the convolution layers and dense layers, and the normal distribution was used to initialize weights in the layers. The mean squared error was used as the loss function and the Adadelta algorithm was used as the optimizer when compiling the model [16]. The structure of the CNN model is shown in Figure S2.
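To make the building blocks concrete, the following sketch runs a single 1D convolution filter, a ReLU activation, and a max pooling step over a short feature vector in plain Python. The kernel size (3), pool size (2), kernel weights, and input values are assumptions chosen for illustration; the study does not report these settings in the text, and the actual model was built with Keras.

```python
def conv1d(signal, kernel, bias=0.0):
    """Valid (no padding), stride-1 1D convolution as used in CNN layers.

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is applied without being flipped.
    """
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal) - k + 1)
    ]


def relu(xs):
    """Rectified linear unit: negative activations are set to zero."""
    return [max(0.0, x) for x in xs]


def max_pool1d(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


# Hypothetical scaled feature vector and kernel weights.
features = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]
kernel = [1.0, 0.0, -1.0]

activated = relu(conv1d(features, kernel))  # 6 inputs, kernel of 3 -> 4 outputs
pooled = max_pool1d(activated, size=2)      # 4 activations -> 2 pooled values
```

In a full model, each of the 10 filters would produce its own activation sequence, and the flatten layer would concatenate the pooled sequences into one vector for the dense layers.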

#### *2.4. Model Validation and Evaluation*

In total, 20% of the patients were randomly selected as a test dataset for the final evaluation of the model, and the rest of the patients were used as the training dataset. To find optimal high-level parameters (like the number, size, and type of layers in the networks) and lower-level parameters (like the number of epochs, choice of loss function and activation function, and optimization procedure) in the CNN model, the K-fold cross validation method was used during the training phase [17]. We split the training data into 5 partitions, instantiated 5 identical models, and trained each one on 4 partitions while validating on the remaining partition. The performance of each model was evaluated using the mean squared error (MSE) because of the existence of zero values in the outcome variables. We then computed the average performance over the 5 folds. In the end, the choice of the parameters was a compromise between the model's performance and computing time, i.e., the model with both the smallest validation error and a shorter computing time was deemed an optimal model. The training, validation, and final evaluation process is shown in Figure S3.
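The hold-out split and 5-fold cross-validation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the study code: the outcome values are randomly generated, the helper names are hypothetical, and a mean-only predictor stands in for the CNN so that the example stays self-contained.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(y, k=5, seed=0):
    """Average validation MSE of a mean-only predictor over k folds."""
    fold_errors = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [y[i] for i in range(len(y)) if i not in held]
        valid = [y[i] for i in held_out]
        prediction = sum(train) / len(train)  # "fit" on the other k-1 folds
        fold_errors.append(mse(valid, [prediction] * len(valid)))
    return sum(fold_errors) / k


# Hypothetical scaled outcome scores, not SOReg data.
rng = random.Random(42)
scores = [rng.random() for _ in range(100)]

# Hold out 20% for the final evaluation; cross-validate on the remaining 80%.
order = list(range(len(scores)))
random.Random(1).shuffle(order)
test_idx, train_idx = order[:20], order[20:]
train_scores = [scores[i] for i in train_idx]
cv_mse = cross_validate(train_scores, k=5)
```

The test indices are never touched during cross-validation, which keeps the final evaluation independent of the hyperparameter choices.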

To avoid overuse of the deep learning method for prediction, we also applied a simple multivariate linear regression model as an estimator to predict the 5-year HRQoL scores, and compared the performance between the linear regression model and the CNN model.

#### *2.5. Software and Hardware*

The descriptive and inferential statistical analyses were performed using Stata 15.1 (StataCorp LLC, College Station, TX, USA). The CNN and multiple linear regression models were implemented using the packages scikit-learn 0.21.2 and Keras 2.2.4 in Python 3.6 (Python Software Foundation, https://www.python.org/).

All of the computation was conducted on a computer with a 64-bit Windows 7 Enterprise operating system (Service Pack 1), an Intel® Core™ i5-4210U CPU @ 2.40 GHz, and 16.0 GB of random access memory.

#### **3. Results**

#### *3.1. Descriptive Analysis of the Data*

In total, 6687 patients registered in SOReg between 2008 and 2012 with complete demographic and preoperative comorbidity information, and preoperative and 5-year HRQoL scores were included in the study. The characteristics of the patients are shown in Table 1. Briefly, the average age and BMI of the patients were 42.7 years and 42.3 kg/m<sup>2</sup>, respectively. More than three quarters (77%) were female and 45% had at least one of the five comorbidities (SAS, hypertension, diabetes, depression, and dyslipidemia) before bariatric surgery.


**Table 1.** Characteristics of the patients (*n* = 6687) included in the study, mean (SD) or *n* (%).

SD, standard deviation; NA, not applicable; BMI, body mass index; SAS, sleep apnea syndrome; PF, physical functioning; RP, role-physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role-emotional; MH, mental health; PCS, summary physical scale; MCS, summary mental scale; OP, obesity-related problems.

#### *3.2. Performance of the CNN Model in the K-Fold Cross-Validation*

We analyzed 11 HRQoL scores in the study. To make our description concise, we used the PF score as an example of our data analysis as follows.

In general, the performance of the CNN model (measured as the MSE) increased with more convolution layer filters, computation units, and epochs, and decreased with a larger batch size. Although the performance increased with the model's complexity, the computing time increased exponentially. When we set the number of computation units and filters to be large enough (1000 and 10, respectively) and the batch size was small enough (10), the performance of the CNN model in K-fold cross-validation is shown in Figure 1. The performance was not stable when the number of epochs was small and changed dramatically depending on the random seed used in training (Figure 1). When the number of epochs was >40, the model presented smaller MSE than the linear regression model (0.032 vs. 0.035, Figures 1 and 2). Although more epochs reduced the MSE in the CNN model, the computing time increased exponentially, indicating a higher cost in machine learning (Figure 1). The MSE of the linear regression model appeared constant when the number of epochs >40 (Figure 2), which means the prediction cannot be improved with more epochs. The cross-validation indicates that the CNN model may provide better prediction but at the expense of the computing time.

**Figure 1.** Performance of the convolution neural network (CNN) model in K-fold cross-validation.

**Figure 2.** Performance of the simple multivariate linear regression model in K-fold cross-validation.

#### *3.3. Performance of the CNN Model in the Final Evaluation*

When the models were evaluated using the test data that were not seen previously by the models, in general, the CNN model presented a better performance (solid line in Figure 3b) than the linear regression model (solid line in Figure 3a) with epochs >40. Although overfitting was presented sporadically in the CNN model (comparing the solid line with the dotted line in Figure 3b), the performance improved gradually with an increased number of epochs while remaining constant in the linear regression model.

Finally, we used 40 epochs for the CNN model, and predicted PF scores for both the training data and the test data. Clear correlations can be seen between the predicted values and observed values in the training data, with an MSE of 0.032 for the CNN model (Figure 3d and Table 2) compared to the MSE of 0.033 seen in the linear regression model (Figure 3c and Table 2). For the test data, the CNN model had an MSE of 0.035 (Figure 3f and Table 2) compared with 0.034 (Figure 3e and Table 2) from the linear regression model. Although the CNN model provided better prediction than the linear regression model for the test data, the overfitting became apparent in some situations when the model learned patterns more specific to the training data.

**Figure 3.** Model performance of the simple linear estimator and the CNN estimator. The dots in the plots (c)–(f) were jittered to avoid a heavy overlap of patients with the same coordinates. CNN, convolution neural network.

#### *3.4. Performance of CNN in Predicting Other HRQoL Measures*

The relationships between the baseline and the 5-year scores of other HRQoL measures in the test data are shown in Figure 4. Except for GH and VT, no clear relationship between the baseline and the observed 5-year scores is seen for the HRQoL measures (Figure 4, plots a1–j1). However, the predicted 5-year scores based on the baseline scores and the CNN model show clear correlations with the observed 5-year scores for BP, GH, VT, MH, MCS, and OP (Figure 4, plots a2–j2).

**Figure 4.** Correlation of the observed 5-year scores with the observed baseline scores and predicted scores for test data. The dots in the plots were jittered to avoid a heavy overlap of patients with the same coordinates. RP, role physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role emotional; MH, mental health; PCS, summary physical scale; MCS, summary mental scale; OP, obesity-related problems.

We compared the performance of the CNN model and the linear regression model for all the HRQoL measures in both the training data and the test data. The CNN model showed an overwhelming advantage in predicting all the HRQoL measures. The MSEs of the CNN model for the training data were 8% to 80% smaller than those of the linear regression model (Table 2). The overfitting was also apparent in the CNN model, i.e., the MSEs of the CNN model for the test data were all greater than those of the linear regression model (Table 2).


**Table 2.** Mean squared errors (MSEs) of the CNN model and the multivariate linear regression model.

PF, physical functioning; RP, role physical; BP, bodily pain; GH, general health; VT, vitality; SF, social functioning; RE, role emotional; MH, mental health; PCS, summary physical scale; MCS, summary mental scale; OP, obesity-related problems.

OP 0.0450 0.0625 0.0750 0.0608

#### *3.5. Sensitivity Analysis and Computing Time*

We also conducted sensitivity analysis using different scalers and optimizers in data preparation and model compiling, and tuned the hyperparameters using the exhaustive grid search method [18]. Although they showed more or less influence on the models' performance, the influence was negligible when the number of epochs was large and the batch size was small. The computing time for the CNN model largely depends on the hyperparameter settings of the layers, number of epochs and the batch size for training, and the software and hardware used. In our study, with the model structures and hyperparameters shown in Figure S2, the running time ranged from 70 (epoch = 40, batch size = 10, without cross-validation) to 595 s (epoch = 400, batch size = 10, with five cross-validations) on our computer.
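An exhaustive grid search simply evaluates every combination of candidate hyperparameter values and keeps the best one. The sketch below shows the idea in plain Python; the grid values are hypothetical (only some echo settings mentioned in the text), and `toy_error` is a made-up stand-in for the cross-validated MSE that a real search would compute by training the model.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination in param_grid and return the best one.

    `evaluate` maps a dict of hyperparameters to a validation error,
    where lower is better.
    """
    names = sorted(param_grid)
    best_params, best_error = None, float("inf")
    for combo in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical grid: 3 x 2 x 2 = 12 combinations.
grid = {"epochs": [40, 100, 400], "batch_size": [10, 50], "filters": [5, 10]}


def toy_error(params):
    """Made-up error surface, for illustration only."""
    return (1.0 / params["epochs"]
            + 0.001 * params["batch_size"]
            - 0.0001 * params["filters"])


best, err = grid_search(grid, toy_error)
```

Because the number of combinations is the product of the list lengths, the cost of the search grows quickly with each added hyperparameter, which is consistent with the trade-off between performance and computing time noted above.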

#### **4. Discussion**

Machine learning methods to predict HRQoL have been used in elderly people with chronic diseases [19], cervical cancer patients [20], and osteoarthritis patients [21]. However, to our knowledge, they have not been used to predict the postoperative HRQoL of patients undergoing bariatric surgery. We explored the feasibility and capacity of a deep learning method, i.e., convolution neural network, to predict long-term HRQoL after bariatric surgery using a national register. Such a study is only feasible with a well-maintained, high-quality longitudinal database with long-term follow-up like SOReg [22].

Our results indicate that 5-year HRQoL after bariatric surgery may be well predicted preoperatively for some scale domains like PF, BP, GH, VT, MH, MCS, and OP. In our study, we aimed to evaluate and predict the quality of life of patients after bariatric surgery. Some patients were not "satisfied" even when they lost weight. Other factors, such as complications during follow-up and preoperative pharmacologic drug treatment, are associated with a change of the quality of life after bariatric surgery, whereas age, sex, and preoperative metabolic comorbidity may also play a role [11,23–25]. Our findings may provide important information for postoperative care and rehabilitation for this group of patients.

Our research question was about predicting continuous outcomes using supervised deep learning methods, which could be converted to a question of supervised two- or multi-class classification, i.e., to predict whether the quality of life of the patient has improved, remained unchanged, or deteriorated. Although the precision of prediction might be reduced in classification, the accuracy might be enhanced, and the method might be more applicable for clinical use. We would like to investigate the question in future studies.
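As a sketch of how the continuous outcome could be recast as a three-class label, the function below applies the >0.25 change in the scaled score quoted in the Introduction as the threshold for a significant change. The function name and example values are hypothetical, and the sketch assumes a scale on which higher scores indicate better health (i.e., not OP).

```python
def classify_change(baseline, follow_up, threshold=0.25):
    """Label the change in a scaled (0-1) HRQoL score.

    A change larger than `threshold` in either direction counts as
    significant; anything smaller is treated as unchanged.
    """
    delta = follow_up - baseline
    if delta > threshold:
        return "improved"
    if delta < -threshold:
        return "deteriorated"
    return "unchanged"


# Hypothetical (baseline, 5-year) score pairs, not SOReg data.
pairs = [(0.40, 0.80), (0.80, 0.40), (0.50, 0.60)]
labels = [classify_change(b, f) for b, f in pairs]
```

The resulting labels could then serve as the target of a classifier in place of the continuous score.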

There has been a warning that healthcare researchers should not be overly enthralled by the promises of deep learning methods [26]. Therefore, to avoid abusing the deep learning method in our study, we also compared the performance of the CNN model with a conventional statistical learning method for continuous variables, using a multivariate linear regression model. Although the conventional statistical methods require sometimes complex processing (feature engineering) to extract the requisite discriminative features, they may provide more interpretable results compared to the deep learning methods. In contrast, the biggest advantage of deep learning methods is that they try to learn high-level features from data in an incremental manner, which eliminates the need for domain expertise and hard-core feature extraction. However, the generalizability of deep learning models relies largely on the data they learned, and overfitting on unseen data is more apparent, as observed in our study. Although there are some ways in which we may reduce overfitting in deep learning models, the rule of thumb is to use more training data.

There are potential limitations to our study. In total, 28,293 patients underwent surgery for a primary gastric bypass between 2008 and 2012 and had a follow-up longer than 5 years when the study was initiated. However, less than one quarter of the patients had complete HRQoL information and could be used for the machine learning. Compared to the patients who had no or incomplete HRQoL information, the patients with complete relevant data were older (42.7 ± 11.0 vs. 40.4 ± 10.8 years), had fewer males (21.2% vs. 25.1%), and lower BMI (42.3 ± 5.2 vs. 42.8 ± 5.5 kg/m<sup>2</sup>). These factors have already been shown to influence HRQoL [27–29]. Because of these systematic differences between the patients with and without HRQoL measures, the generalizability of our CNN model may be questionable. The missing information needs to be imputed in the future for deep machine learning. We would also point out that the CNN built in our study was only based on features from gastric bypass patients, and it cannot be generalized to other surgical procedures or health conditions. The application of CNN in predicting prognosis after surgeries still needs to be investigated using large data from the real world.

#### **5. Conclusions**

CNN can be used to predict long-term HRQoL after bariatric surgery based on the baseline features of patients. The performance of the CNN was found to be better than the traditional multivariate linear regression model; however, its overfitting on unseen data needs to be mitigated by using more features of patients or greater use of training data in the future.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2077-0383/8/12/2149/s1, Figure S1: Physical functioning (PF) scores before and after bariatric surgery of 6687 patients used in the study. Figure S2: Structure of the convolutional neural network (CNN) model used in the study. Figure S3: Process of training, validation and evaluation.

**Author Contributions:** Data curation, M.R.; Formal analysis, Y.C.; Funding acquisition, Y.C.; Investigation, Y.C., M.R., S.M. and I.N.; Methodology, Y.C. and M.R.; Project administration, J.O. and I.N.; Resources, J.O. and I.N.; Software, Y.C.; Validation, Y.C. and S.M.; Visualization, Y.C.; Writing—original draft, Y.C., M.R. and I.N.; Writing—review and editing, Y.C., M.R., S.M., J.O. and I.N.

**Funding:** Yang Cao's work was supported by Örebro Region County Council (OLL-864441). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Clinical Validation of Innovative Optical-Sensor-Based, Low-Cost, Rapid Diagnostic Test to Reduce Antimicrobial Resistance**

**Suman Kapur 1,\*,†, Manish Gehani 1,†, Nagamani Kammili 2, Pankaj Bhardwaj 3, Vijayalakshmi Nag 3, Sudha M. Devara <sup>2</sup> and Shashwat Sharad 4,\***


Received: 18 October 2019; Accepted: 27 November 2019; Published: 1 December 2019

**Abstract:** The antibiotic susceptibility test determines the most effective antibiotic treatment for bacterial infection. Antimicrobial stewardship is advocated for the rational use of antibiotics to preserve their efficacy in the long term and provide empirical therapy for disease management. Therefore, rapid diagnostic tests can play a pivotal role in efficient and timely treatment. Here, we developed a novel, rapid, affordable, and portable platform for detecting uropathogens and reporting an antibiogram to clinicians in just 4 h. This technology replicates the basic tenets of clinical microbiology, including bacterial growth in an indigenously formulated medium and measurement of inhibition of bacterial growth in the presence of antibiotic/s. Detection is based on chromogenic endpoints using optical sensors and is analyzed by a lab-developed algorithm, which reports sensitivity to the panel of antibiotics tested. To assess its diagnostic accuracy, a prospective clinical validation study was conducted in two tertiary-care Indian hospitals. Urine samples from 1986 participants were processed by both the novel/index test and the conventional Kirby Bauer Disc Diffusion method. The sensitivity and specificity of this assay were 92.5% and 82%, respectively (*p* < 0.0005). This novel technology will promote evidence-based prescription of antibiotics and reduce the burden of increasing resistance by providing rapid and precise diagnosis in the shortest possible time.

**Keywords:** urinary tract infection; rapid culture; antibiotic susceptibility testing (AST); evidence-based prescription; antibiotics; antimicrobial resistance (AMR); rapid diagnostics

#### **1. Introduction**

Healthcare challenges faced by developing countries are vastly different from those in developed nations. With very limited budgets for healthcare, developing countries have not been able to put up any significant infrastructure to address their huge disease burden. In vitro diagnostic (IVD) tests provide the basis for most medical decision-making and play a crucial role in limiting healthcare costs, since appropriate diagnostic tests performed in a timely manner i) improve patient care, ii) contribute to protecting consumers' health, iii) help to limit healthcare spending, iv) reduce the risk of trial-and-error treatment and over-prescription, v) shorten the time before treatment begins, and vi) decrease the length of hospital stays. Appropriate diagnosis can improve the effectiveness of treatments and avoid long-term complications for the infected patient. India harbors the world's largest burden of drug-resistant pathogens. Easy access, availability, and higher consumption of medicines have led to a disproportionately higher incidence of inappropriate use of antibiotics and greater levels of antimicrobial resistance (AMR) compared to developed countries [1]. It has been shown that the health sector in India suffers from gross inadequacy of funds, which will further result in conditions favorable for the development of drug resistance [2]. The high resistance of pathogens in the country, even to newer antibiotics, has led to the emergence of superbugs like New Delhi Metallo-beta-lactamase (NDM-1) [3]. By 2050, 2 million Indians are projected to die as a result of AMR [3]. Indians are the largest consumers of antibiotics worldwide, despite a decline in communicable diseases [3], due to a liberal policy for over-the-counter sale of antibiotics and irrational prescription of antibiotics. A study by Ganguly et al. highlighted the importance of rationalizing antibiotic use to limit AMR in India [4].
Irrational prescription happens due to a lack of fast point-of-use tests for evidence-based prescription, lack of infrastructure for bacterial culture and antibiotic susceptibility test (AST), and lack of awareness worldwide. Selective pressure from inappropriate use of antibiotics can lead to resistance via the emergence of mutant strains [5]. Unavailability of rapid point-of-use diagnostics to distinguish bacterial infections and suggest appropriate therapy is a major reason for irrational prescriptions of antibiotic/s.

Urinary tract infections (UTIs) lead to 23% of all antibiotic prescriptions in primary healthcare. Even in India, UTIs account for about 8.1 million prescriptions each year. Diagnosis of UTI is a multistep process including determination of pathogen load, identification, and AST requiring culture of the sample, which takes around 48–72 h. Even if high-throughput automated systems like Vitek 2, Microscan Walkaway, or Phoenix are used, the results are not available faster than 28 h [6,7]. The conventional urine culture and AST method is not accessible to most clinicians practicing in low-resource settings. Even with access to lab testing facilities, in the absence of any rapid test, clinicians are forced to prescribe antibiotics empirically. The empirical antibiotics used in the first 48–72 h prove to be ineffective against infection in approximately 33% of cases [8–10]. Unresolved, relapsed UTIs tend to be resistant to previously used antibiotics [11]. Nearly 23% to 33% of the prescriptions for UTIs have been found to have no clinical justification [8–10]. Moreover, UTIs are also caused by non-bacterial organisms such as Candida (3% of cases) [12], Trichomonas (17% of cases) [13], Chlamydia (~16% of cases) [14], and rarely Mycobacterium, Schistosoma haematobium, Adenovirus, BK polyomavirus, and Mycoplasma [15], and cannot be treated by antibiotics that are empirically prescribed. Such unnecessary drug use is often harmful, and results in multidrug-resistant infections and reduced options for antimicrobial therapy [16]. An urgent need is perceived for developing a suitable field-operable test for prescribing targeted antibiotics [17]. Addressing the menace of antimicrobial resistance needs a scalable rapid diagnostic test, which gives detection, identification, quantification, and phenotypic antimicrobial susceptibility of bacteria within a minimum turn-around time and has an integrated technology platform for clinical adoption.
This test should show high sensitivity and specificity, should be low cost for adoption in low-resource settings, and should be easy to use with minimal training [18,19].

Assays used for upstream screening to improve diagnostic yield of positive samples, like Gram staining, dipstick with leukocyte esterase and nitrite, pus cell count [20], urine analysis and microscopy [21], chlorhexidine [22], interleukin-8 [23], Griess test [24], microstix [25], serum procalcitonin level [26], and urine catalase-based uriscreen test [27], have shown poor sensitivity and specificity. Novel antibody-based lateral flow immunoassay (RapidBac) [28], chromogenic limulus amoebocyte lysate assay [29], and flow cytometry-based systems (Accuri-6, UF 100, UF-1000i) [18,30,31] have shown high sensitivity and specificity but they do not provide identification of bacteria and its antimicrobial susceptibility. Forward light scattering systems like Uro-Quick (Alifax) and BacterioScan model 216 (BacterioScan Inc., St. Louis, MO, USA) provide detection of bacteria with antimicrobial susceptibility but do not identify the causative bacteria [32,33].

Molecular and proteomic technologies require overnight incubation on culture plates, and do not provide antimicrobial susceptibility [18]. Matrix Assisted Laser Desorption Ionization-Time Of Flight (MALDI-TOF) is expensive to install [34], Fluorescence In Situ Hybridization (FISH) requires multiple probes for all uropathogens [18], while multiplex Polymerase Chain Reaction (PCR) platforms like GeneXpert Omni and Cepheid [18] do not provide quantification of significant bacteriuria and need multiple probes for all uropathogens. Application of these tests to direct urine testing needs extensive sample preparation.

Genetic signature identification Confirming Active Pathogens Through Unamplified RNA Expression (CAPTURE) assay [35] identifies bacteria but does not give antimicrobial susceptibility. Time-lapse microscopy-based systems like oCelloScope (Phillips BioCell) [36] and Accelerate ID/AST (Accelerate Diagnostics) [37] provide both identification and antimicrobial susceptibility, but phenotypic measures for identification in direct urine are not precise and they are not easy to use in a clinical lab setting. Integrated microfluidic-Biosensor assays based on ion mobility spectrometry or colorimetric sensor arrays are cost effective and sensitive but their results get confounded by urine variability and in presence of low bacterial count [18]. Most of the newer technologies mentioned above are neither easy to use nor affordable in a resource-poor setting like public hospitals of India. In this study, we evaluated a rapid, portable, easy-to-use, less resource-intensive, and affordable technology, which provides bacterial identification and AST results within 4 h. This new technology integrates the basic tenets of clinical microbiology including bacterial growth in a medium optimized for uropathogens and measurement of inhibition of bacterial growth in presence of specific antibiotic, with detection of bacteria based on chromogenic endpoint by enzymatic hydrolysis of specific media cocktails by UTI causing bacteria. The optical sensor-based measurement of endpoint output is analyzed using indigenous software, based on a lab-developed statistical algorithm, which reports both the sensitivity of the pathogen to a customizable panel of antibiotics and bacterial load in the sample. This integrated technology platform can be used for diagnosing UTIs caused by bacteria and for suggesting effective antibiotics in all types of clinical settings as a preliminary triage test [38] to promote evidence-based prescription and minimize irrational use of antibiotics. 
The low cost of the test obviates the need for upstream screening with tests of poor sensitivity and promotes scalability for use in mass population. The objective of the present study was to evaluate the diagnostic accuracy of the novel test in UTI cases as compared to the gold standard urine culture method.

#### **2. Materials and Methods**

#### *2.1. Study Design, Setting, and Population*

The study was conducted over a 2-year period from January 2017 to December 2018, simultaneously in Gandhi Medical College and Hospital, Secunderabad located in Southern India, and All India Institute of Medical Sciences (AIIMS), Jodhpur located in Northern India. To ensure sufficient case load for achieving required sample size and to ensure that good lab practices are followed, Laboratory of Gandhi Hospital, which is the referral laboratory of State of Telangana, and AIIMS, which is a premiere tertiary care hospital, were chosen for this study.

#### *2.2. Ethical Approval*

The study was reviewed and approved by Institutional Ethics Review Committee of both institutions. Objectives of the study were explained to all participants in their native language and they were enrolled after obtaining a written informed consent. The study was conducted according to the principles expressed in the Declaration of Helsinki.

#### *2.3. Study Oversight*

This prospective clinical validation study was designed to evaluate diagnostic accuracy of the novel/index test with the reference gold standard urine culture and AST method. Eligible participants were referred by clinicians for urine culture and sensitivity test, based on a provisional diagnosis of UTI. Patients who received antibiotics in the preceding two weeks or had indwelling or suprapubic catheter were excluded. Consenting participants were evaluated in microbiology laboratory by history taking and review of medical records.

#### *2.4. Test Methods for Bacterial Culture and Identification*

Clean-catch mid-stream urine samples were collected from each enrolled participant in a sterile container and divided into two parts under sterile conditions. One part was used for routine urine culture and AST and the second for conducting the index test in the hospital premises itself. All samples were processed within 2 h of collection to avoid contamination/bacterial growth.

The index test was the novel test designed for direct quantitative detection and antibiotic sensitivity of bacteria found in human urine [39,40]. The test identifies common UTI-causing bacteria, namely *Escherichia coli*, *Klebsiella*, *Pseudomonas*, *Enterococcus*, *Proteus*, and *Staphylococcus* sp. This rapid method replicates the basic tenets of clinical microbiology, namely (1) growth of bacteria in a specialized medium, and (2) measuring the inhibition of growth of bacteria in the presence of an antibiotic. Detection is based on chromogenic endpoints. The output was analyzed using lab-developed algorithm-based software, which reports the sensitivity of the pathogen to the panel of antibiotics tested. The urine sample was collected in a sterile container. To harvest the bacteria, 10 mL of urine was filtered through a sterile syringe fitted with a micro-filter, and the filtrate was discarded. After that, BITGEN, a medium specially designed for accelerated growth of uropathogens, was pushed through the filter into the vial to recover bacteria from the filter, shaken well, and then closed with the dropper cap. The bacteria were harvested in 3 mL of proprietary BITGEN medium. This was then set aside at room temperature for about 5 min. Subsequently, four drops (~110–120 μL) of proprietary BITGEN medium containing the harvested bacterial suspension were added to all three strips: one pre-functionalized strip for identification of bacteria and two different 8-well strips pre-loaded with antibiotics. All the strips were resealed and incubated at 37 °C for 4 h. A 4-h incubation period was found to be sufficient for all commonly found uropathogens, which account for 98% of UTI cases. The medium was optimized for nutrients and supports growth for up to 8 h with a starting bacterial number of 10<sup>5</sup> cells/mL [39]. BITGEN is a proprietary medium containing chromogens that are sensitive to bacterial growth even at low numbers of bacteria, which allows rapid culture. 
The enzymatic hydrolysis of specific media cocktails used in this proprietary media metabolizes the chromogens. For identification of bacteria, the 8 wells in the identification strip had a cocktail of specific substrates, which were metabolized by specific bacterial types. Growth of bacteria in the well led to end product formation during the 4-h incubation. The use of optical sensor enables measuring of all color combinations and the lab-developed analytical software interprets the identification of the bacteria based on specific chromogenic endpoints produced as a consequence of specific metabolic activity of each bacterial type. For both identification and AST, the sample was loaded at the same time and incubated for the same length of time.

To determine the susceptibility of the pathogen, the above-mentioned two pre-functionalized antibiotic strips were used. Each antibiotic strip had 8 compartments. Except for the first compartment (the reference well) of each strip, the remaining 14 compartments were preloaded with the chosen antibiotics. The preloaded antibiotics used were Amoxicillin, Gentamicin, Amikacin, Cefepime, Ofloxacin, Ciprofloxacin, Ceftriaxone, Piperacillin-Tazobactam, Cefotaxime, Cefuroxime, Tobramycin, Levofloxacin, Cefazolin, and Imipenem. The concentration and composition of the antibiotics were chosen as per Clinical and Laboratory Standards Institute (CLSI) guidelines [41].

If the urine sample contained pathogens, growth was seen in the first well of each antibiotic strip, referred to as the reference well because there is no inhibition of bacterial growth in this well. Reflecting the phenotypic AST of the bacteria present, the remaining 14 compartments showed varied levels of bacterial growth depending on the bacterial susceptibility to the chosen antibiotics. The bacterial growth within each preloaded antibiotic compartment was represented by a change in color of the BITGEN, measured by chromogenic and nephelometric endpoints using an array of 64 photodiodes in an electronic optical sensor. The intensity of the color is a measure of the number of growing cells in the presence and absence of a particular antibiotic. The sensor output was analyzed using a proprietary lab-developed statistical algorithm, pre-installed on the reader, which provides ready-to-use results for sensitivity of the pathogen to the antibiotics tested, both as a display on a liquid crystal display (LCD) screen and as a printout for permanent records. The reader was also enabled to transfer results to other storage devices using a wireless module and/or a universal serial bus (USB) interface. In case of insufficient growth, the analytical software prompts for incubation for one additional hour; if no growth is then detected, the software reports the sample as negative for the presence of bacteria.

For reference, the conventional, universally accepted gold standard of urine culture with the Kirby Bauer method for AST was chosen. First, 10 μL of each urine sample was streaked on a chromogenic culture medium, chromID® CPS Elite Translucent, using a calibrated nichrome wire loop of 4 mm by the semi-quantitative surface streaking method. The inoculated plates were incubated for 18–24 h at 37 °C. After incubation, growth of colonies up to the tertiary streaking was considered significant bacteriuria with 10<sup>5</sup> Colony Forming Units (CFU)/mL. Positive cultures were further processed for determining the AST by Kirby Bauer Disc Diffusion Method as per Clinical and Laboratory Standards Institute guidelines [41]. A suspension of each isolate was prepared to a McFarland standard and spread over Mueller Hinton Agar using the lawn culture method. Himedia discs with defined concentrations of antibiotics were placed over the culture. After incubation for 18 to 24 h at 37 °C, zones of growth inhibition around each antibiotic disc were measured to the nearest millimeter and a reference table was used to determine susceptibility. The American Type Culture Collection (ATCC) bacterial strains, namely *Enterococcus faecalis*, *Escherichia coli*, *Klebsiella pneumoniae*, *Pseudomonas aeruginosa*, and *Staphylococcus aureus*, were used for quality control in the entire process.

The cut-off for labelling both the index and the reference test as positive was pre-specified as 10<sup>5</sup> CFU/mL based on Infectious Diseases Society of America guidelines [42]. Neither the team performing the index test nor the one conducting urine culture and sensitivity was provided any clinical information about the participants. Neither team was informed about the results of the other test and, hence, the index test was conducted in a completely blinded manner.
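The arithmetic linking a calibrated-loop inoculum to the 10<sup>5</sup> CFU/mL cut-off can be sketched as follows. This is only an illustration: the function names and the example colony counts are assumptions, and the study itself judged significance from growth extending to the tertiary streak rather than from explicit colony counts.

```python
CUTOFF_CFU_PER_ML = 1e5  # positivity threshold stated in the text


def cfu_per_ml(colonies: int, loop_volume_ul: float = 10.0) -> float:
    """Estimate CFU/mL from colonies grown from a calibrated-loop inoculum."""
    if loop_volume_ul <= 0:
        raise ValueError("loop volume must be positive")
    # 1 mL = 1000 uL, so each colony represents 1000 / loop_volume_ul CFU per mL.
    return colonies * 1000.0 / loop_volume_ul


def is_significant(colonies: int, loop_volume_ul: float = 10.0) -> bool:
    """True when the estimated load reaches the pre-specified cut-off."""
    return cfu_per_ml(colonies, loop_volume_ul) >= CUTOFF_CFU_PER_ML


# Hypothetical plate: 1000 colonies from a 10 uL loop.
print(cfu_per_ml(1000))
print(is_significant(1000))
```

With a 10 μL loop, each colony corresponds to 100 CFU/mL, so the cut-off is reached at about 1000 colonies on the plate.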

#### *2.5. Data Analysis*

Collected data and results from both tests for each participant were compiled and analyzed using Statistical Package for the Social Sciences (SPSS) software (version 24). A contingency table was used for determining diagnostic accuracy and the kappa statistic was used for agreement analysis. Further, 95% confidence intervals (CI) were used to describe diagnostic accuracy, with *p* values of <0.05 considered significant. Sample size was calculated to be 600 for estimating the sensitivity of the index test, based on a precision of 4% and confidence level of 95%, when the sensitivity of the new test was expected to be at least 50%. The "best-case scenario method" was used for indeterminate results and mixed growth. Samples with rare species, budding yeast cells, and contaminated samples were removed from final analysis for a "complete case analysis". No analysis of variability in diagnostic accuracy was performed with respect to age group or department, as it was not pre-specified in the study. The raw data generated from the study, which were used to analyze these results, have been made publicly available as a safe harbor file in the online repository "Harvard Dataverse" [43].
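The stated sample size is consistent with the usual normal-approximation formula for estimating a single proportion, n = z² p(1 − p) / d². The sketch below assumes that this formula was the one applied, which the text does not state; the function name is illustrative.

```python
from statistics import NormalDist


def sample_size_for_proportion(expected: float, precision: float,
                               confidence: float = 0.95) -> float:
    """Sample size needed to estimate a proportion within +/- precision."""
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * expected * (1 - expected) / precision ** 2


# Expected sensitivity 50%, precision 4%, confidence level 95%.
n = sample_size_for_proportion(expected=0.50, precision=0.04)
print(round(n))
```

An expected sensitivity of 50% is the most conservative choice, because p(1 − p) is largest at p = 0.5.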

#### *2.6. Reagents*

Analytical-grade chemicals required for preparation of BITGEN, identification strips, and antibiotic strips were procured from Sigma Chemicals, St Louis, MO, USA. Chromogenic culture media, Mueller Hinton media, and antibiotic discs were procured from Himedia, India; chromID® CPS Elite Translucent from BioMérieux, France; 8-well strips and syringe filters from NUNC, Denmark; sterile syringes from Dispovan, India. The scanner/reader machine for the novel test was obtained from Micro Lab Instruments, Ahmedabad, India. The bacterial strains *Enterococcus faecalis* (ATCC29212), *Escherichia coli* (ATCC25922), *Klebsiella pneumoniae* (ATCC13883), *Pseudomonas aeruginosa* (ATCC27853), and *Staphylococcus aureus* (ATCC25923) were purchased from Himedia, India.

#### **3. Results**

#### *3.1. Study Characteristics*

Overall, 2001 eligible participants (1030 in AIIMS and 971 in Gandhi Hospital) were identified and 1986 participants (1022 in AIIMS and 964 in Gandhi Hospital) were enrolled in the study. Data of 1835 participants (982 in AIIMS and 853 in Gandhi Hospital) were included in the final analysis. A total of 55 samples (20 from AIIMS and 35 from Gandhi Hospital) with low sample volume could not be processed by the index test.

No indeterminate results were reported by the index test at either hospital. One hundred and eleven participants (97 in AIIMS and 14 in Gandhi Hospital) with indeterminate reference standard urine culture results were reported as having no bacterial growth and were reclassified as true negatives using best-case scenario. Samples with mixed growth in both index and reference standard tests were considered positive for UTI. Fifty-five samples (5 from AIIMS and 50 from Gandhi Hospital) were reported as contaminated and not considered for final analysis. Since the index test is designed for identifying the most common bacteria only, 19 samples (6 from AIIMS and 13 from Gandhi Hospital) with budding yeast cells and 22 samples (9 from AIIMS and 13 from Gandhi Hospital) with rare bacteria (*Citrobacter*, *Acinetobacter*, *Morganella*, and *Providencia*) were also excluded from the final analysis. Thus, a total of 96 cases were excluded from final analysis after performing both tests (Figure 1). Table 1 summarizes the mean age, gender distribution, and referring departments. The majority of participants had cystitis, and more male patients were referred at AIIMS than Gandhi Hospital. Ninety-seven cases (10%) in the AIIMS cohort and nine cases (1%) in the Gandhi Hospital cohort had progressed to frank pyelonephritis.


<sup>1</sup> AIIMS = All India Institute of Medical Sciences.

**Figure 1.** Flow of participants through the study—Standards for Reporting Diagnostic Accuracy (STARD) diagram (G = Gandhi Hospital, A = AIIMS).
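The participant flow reported in Section 3.1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text; the dictionary names are illustrative.

```python
# Counts per site as stated in Section 3.1.
enrolled = {"AIIMS": 1022, "Gandhi": 964}
exclusions = {
    "low sample volume": {"AIIMS": 20, "Gandhi": 35},
    "contaminated": {"AIIMS": 5, "Gandhi": 50},
    "budding yeast cells": {"AIIMS": 6, "Gandhi": 13},
    "rare bacteria": {"AIIMS": 9, "Gandhi": 13},
}

# Participants remaining per site after all exclusions.
analysed = {
    site: enrolled[site] - sum(reason[site] for reason in exclusions.values())
    for site in enrolled
}

# Exclusions made after both tests were performed (everything except low volume).
excluded_after_testing = sum(
    sum(counts.values())
    for reason, counts in exclusions.items()
    if reason != "low sample volume"
)

print(analysed, sum(analysed.values()), excluded_after_testing)
```

The result reproduces the 982 and 853 participants analysed at the two sites, 1835 in total, with 96 cases excluded after both tests were performed.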

#### *3.2. Test Performance*

There was no time gap between processing of samples by the two tests. No adverse event occurred while performing the index test or the reference standard test since only urine sample collection was involved. Based on positive culture results by the conventional method, 609 cases in AIIMS and 273 cases in Gandhi Hospital were diagnosed with symptomatic UTI. Furthermore, 953 participants (373 in AIIMS and 580 in Gandhi Hospital) with symptoms of UTI showed low colony count on culture plates. Of these, the index test reported 172 cases (72 in AIIMS and 100 in Gandhi Hospital) as positive, which were otherwise reported as negative by the conventional method 48 h post incubation.

#### 3.2.1. Diagnostic Accuracy

AIIMS cohort showed a higher sensitivity (92.9%) while Gandhi Hospital cohort showed a marginally higher specificity (82.8%). The sensitivity and specificity in both the validation sites were within 95% confidence interval of the other hospital (Table 2). The sensitivity and specificity obtained by use of the novel test was well within the stipulated limits laid down in the recommendations issued by the European Urinalysis Guidelines for rapid tests.
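As an illustration, specificity can be recomputed from the counts given in Section 3.2, treating the participants with low colony counts as culture-negative and those reported positive by the index test as false positives. That reading is an assumption, as is the use of a Wilson score interval, since the text does not say how the confidence intervals were obtained. The figure for Gandhi Hospital matches the reported 82.8%; the AIIMS figure is derived here and is not quoted in the text.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes: int, total: int, confidence: float = 0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Counts from Section 3.2: culture-negative participants per site and how
# many of them the index test reported as positive.
culture_negative = {"AIIMS": 373, "Gandhi": 580}
index_positive = {"AIIMS": 72, "Gandhi": 100}

specificity = {}
for site, negatives in culture_negative.items():
    true_negatives = negatives - index_positive[site]
    low, high = wilson_interval(true_negatives, negatives)
    specificity[site] = (true_negatives / negatives, low, high)
    print(f"{site}: {specificity[site][0]:.1%} (95% CI {low:.1%} to {high:.1%})")
```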


**Table 2.** Comparison of test results obtained by novel test and urine culture method.

<sup>1</sup> Kappa value and its standard error measures agreement between results of two dichotomous variables (here two diagnostic tests providing positive or negative results).

#### 3.2.2. Agreement Analysis

Good agreement was observed at both validation sites, as seen by a Kappa = 0.741. The observed agreement is statistically significant as reflected by a *p* value of <0.0005.
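For readers unfamiliar with the statistic, Cohen's kappa for two dichotomous tests can be computed from the four cells of a 2 × 2 contingency table, as sketched below. The cell counts in the example are hypothetical and are not the study data; the function name is illustrative.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for a 2 x 2 table of index test versus reference test."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of each test.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not taken from Table 2).
kappa = cohens_kappa(tp=40, fp=10, fn=5, tn=45)
print(round(kappa, 3))
```

Kappa corrects the raw proportion of agreement for the agreement expected by chance, which is why it is lower than the observed agreement (0.85 in this hypothetical table).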

#### *3.3. Identification of Bacteria*

The index test correctly reported the causative bacteria as reported positive by urine culture in 82% cases in AIIMS cohort and 80% cases in Gandhi Hospital cohort (Table 3). In Gandhi Hospital, four cases of *Streptococcus* were not reported by the index test as it is not designed to identify the same. Out of 17 mixed growth in Gandhi cohort, the index test identified seven as individual bacteria, while among 49 in AIIMS cohort, it identified 41 as individual bacteria.


**Table 3.** Identification of bacteria in the two cohorts.

#### *3.4. Antibiotic Susceptibility*

The index test used the same set of 14 antibiotics for every sample, while AIIMS and Gandhi Hospital laboratories used specific antibiotics based on identified bacteria. Hence, only a subset of antibiotics overlapped for both tests. Further, as the conventional method relied on the choice of antibiotics by the microbiologist in-charge, sets of antibiotics tested in both tests were also not used for all the samples tested. Antibiotics tested for at least 30 samples in both the tests were included in analysis presented in Table 4. The rapid index test correctly reported sensitivity and resistance to antibiotics in 91% and 96% cases, respectively, in AIIMS cohort, and these numbers were 87% in the case of sensitivity to tested antibiotics and 92% in the case of resistance to antibiotics reported for the Gandhi Hospital cohort.


R = Resistant; S = Sensitive; and I = Intermediate; a = complete agreement, b = very major error, c = major error, and d = minor error. (Identical results, either susceptible or resistant by both tests, were classified as "complete agreement"; a result reported as resistant by culture and susceptible by the novel test was labelled "very major error"; susceptible by culture but resistant by the novel test was labelled "major error"; intermediate by culture and susceptible or resistant by the novel test was labelled "minor error"). Please note that no intermediate results were reported by the novel test or by Gandhi Hospital culture reports.
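The error categories defined in the legend above can be expressed as a small lookup. The sketch below is illustrative; the function name and the choice to reject intermediate results from the novel test (which reported none) are assumptions.

```python
def classify_ast(culture: str, novel: str) -> str:
    """Classify one antibiotic result pair using the legend's categories."""
    culture, novel = culture.upper(), novel.upper()
    if culture not in {"S", "I", "R"}:
        raise ValueError(f"unknown culture result: {culture}")
    if novel not in {"S", "R"}:
        # The novel test reported no intermediate results.
        raise ValueError(f"unsupported novel test result: {novel}")
    if culture == novel:
        return "complete agreement"
    if culture == "R":  # resistant by culture, susceptible by novel test
        return "very major error"
    if culture == "S":  # susceptible by culture, resistant by novel test
        return "major error"
    return "minor error"  # intermediate by culture


for pair in [("S", "S"), ("R", "S"), ("S", "R"), ("I", "R")]:
    print(pair, classify_ast(*pair))
```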

#### **4. Discussion**

Treating infections, including UTIs caused by bacteria, is a challenging task, and development of rapid AST is very important to provide better healthcare services. The microbiological culture method and the Kirby–Bauer disc diffusion test are well established for diagnosis of UTIs in healthcare facilities worldwide. However, this method needs trained microbiologists, and its major limitations are long turn-around time, resource intensiveness in the form of lab infrastructure, and the requirement of a cold chain for supply and storage of reagents [44]. In resource-constrained settings with poor or limited access to laboratory-based testing, performing urine culture and AST is not feasible. Therefore, initial antibiotic therapy in infectious diseases such as UTIs, which account for ~40% of all infections as per the World Health Organization (WHO), is mostly empirical. Hence, an alternative method, like the index test described herein, for reporting antibiotic sensitivity in a short period of 4 h, with no ancillary resource requirement, will not only be beneficial for patient care, but also curtail unnecessary antibiotic prescriptions. Additionally, availability of results in 4 h saves patients a repeat visit to collect lab reports, which are made available only after three days under the best conditions and often even later in remote and hard-to-reach geographical locations.

The most commonly used automated systems, Vitek 2 and MicroScan Walkaway, are high-cost, resource-intensive, and non-portable, and provide AST and identification results in more than 28 h [6,7]. Reports evaluating the susceptibility of only Gram-negative bacilli to 11 antibacterials using these two systems showed results in 92.7% of isolates, with an overall concurrence of 94% with the standard test and a 3.4% major error rate [45]. With reference to preventing emergence of resistance to antibiotics, they still pose a major limitation in terms of time taken to complete the identification and antibiogram profile of the UTI-causing pathogen. Most of the newer technologies tried for UTI [18] face limitations like the need for an overnight incubation, extensive initial sample preparation, need for an upstream screening test, and lack of an integrated technology platform for clinical adoption. These technologies are expensive and not easy to use. Most of them do not give antimicrobial susceptibility. Previously tried strip-based tests also showed lower sensitivity [46]. Even automated urine analyzers have resulted in low sensitivity [47]. In comparison, this index test is portable, can be used in all healthcare settings, costs less than 0.4 million INR (~5000 USD), needs no ancillary equipment or dedicated space, and provides ready-to-use antibiogram results and microbial identification within 4 h. The sensitivity and limitations of other tests are summarized in Table 5. The high sensitivity, >90%, and specificity, >80%, of the index test, with kappa values indicating very good agreement with the gold reference standard test, show that it has good diagnostic accuracy as a rapid test [48] for its role as a preliminary triage test and its intended use of diagnosing bacteriuria and preventing irrational prescription of antibiotics. 
Although urine culture remains the gold standard for diagnosing UTI, its high cost, laboratory requirements, and long turnaround times (24–72 h) are disadvantages. Further, in this type of testing, recognition and classification of bacteria depend on the experience of laboratory technicians. The novel/index test developed is a simpler system and shows substantial, statistically significant agreement (Kappa = 0.741) with the gold standard and is therefore well suited for routine use in clinical laboratories.

This novel test correctly identified sensitivity to multiple antibiotics in more than 75% instances (and in several cases with 100% accuracy), which then becomes the basis for evidence-based, rational use of antibiotics for specific therapy. The availability of results within 4 h will discourage unnecessary prescription of antibiotics in case of absence of bacterial disease and help the physician to prescribe antibiotics that are identified to be effective against the causative pathogen.


**Table 5.**


*J. Clin. Med.* **2019** , *8*, 2098



The incidence of UTI from this hospital-based study may not be generalizable to the entire population, as the present study was conducted in tertiary hospitals, both to enroll enough participants in the shortest possible time and to allow simultaneous comparison with the reference gold standard test without any loss of time in processing or transport of collected samples. Due to the conventional practice in microbiology labs on the choice of antibiotics, which may also be governed by the availability of antibiotic discs, antibiotics could be compared for sensitivity only in a subgroup of samples tested. In spite of this, the novel test reported resistance to antibiotics with an accuracy of 80% to 100%, except for two antibiotics, cefazolin (71%) and levofloxacin (65%). This was primarily seen in samples with more than one bacterial entity with a high probability of quorum sensing, impacting the sensitivity to the given antibiotic/s.

The rapid assay described herein determines the efficacy of an antibiotic not only in the shortest possible time, but also with virtually no dependence on trained manpower and lab infrastructure. Although this novel test is suitable for use in all healthcare settings, it can prove to be of immense value for healthcare facilities in low-resource settings due to features like portability, point-of-use testing, and no additional requirements. A consensus using the Delphi technique, obtained from experts regarding criteria required for an acceptable point-of-care test for UTI detection, was reported by Weir et al. [49]. This novel test fulfils 25 out of 26 accepted criteria, the exception being use of small sample volume. The novel test also fulfils six out of seven of the WHO's ASSURED criteria for ideal characteristics for a point-of-care test in resource-limited settings, the sole unfulfilled one being equipment free, and also matches all the revised criteria suggested by Paul et al. [50]. Effective diagnosis is a prerequisite for successful therapy, and early and accurate diagnosis results in timely and appropriate treatment.

#### **5. Conclusions**

In conclusion, it can be said that this novel test, with high sensitivity and specificity for detecting bacterial UTI and reporting antibiogram, can be used as a triage test for diagnosing UTIs and suggesting appropriate treatment for an evidence-based prescription for antibiotics in any kind of healthcare settings. In the wake of growing AMR, the prevalent "lack of priority for diagnostics over treatment" needs to be addressed urgently and this novel test enables physicians and labs to achieve this by adopting this affordable and portable IVD test before prescribing antibiotics for treatment of infectious diseases.

**Author Contributions:** Conceptualization, S.K. and S.S.; methodology, M.G.; software, S.K.; validation, M.G., S.K., N.K., P.B., V.N., and S.M.D.; formal analysis, M.G.; investigation, M.G. and S.M.D.; resources, S.K., N.K., and V.N.; data curation, P.B. and M.G.; writing—original draft preparation, M.G., S.K., and S.S.; writing—review and editing, M.G., S.K., and S.S.; visualization, S.K. and S.S.; supervision, S.K., S.S., N.K., P.B., V.N., and S.M.D.; project administration, M.G., V.N., S.M.D., and S.K.; funding acquisition, S.K.

**Funding:** This technology platform was developed with funding received as a research grant from Defence Research and Development Organization, India under their NPMASS program and the clinical validation was undertaken from grants received from DBT under their SPARSH and GYTI-SRISTI schemes. The study and researchers were independent of the funding agency and the funding agencies had no role in design of the study, collection of data, collection of samples, processing of the novel/index test, analysis, and interpretation of data, writing of paper or submission for publication. The authors had full control of all primary data.

**Acknowledgments:** We gratefully acknowledge the support provided by the Government of Telangana supported Microbiology Department of Gandhi Medical College and All India Institute of Medical Sciences, Jodhpur for conducting the study and providing space and facilities for hosting the index test in their premises. We also acknowledge the support of Mr. M. Rajesh Bhatt, the lab technician for coordinating sample collection at Gandhi Hospital. Technical help from the technicians in Genomics Laboratory of BITS Pilani, Hyderabad Campus is also gratefully acknowledged.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Characteristics of Mild Cognitive Impairment in Northern Japanese Community-Dwellers from the ORANGE Registry**

### **Yu Kume 1, Tomoko Takahashi 2, Yuki Itakura 3, Sangyoon Lee 4, Hyuma Makizako 5, Tsuyosi Ono 6, Hiroyuki Shimada <sup>4</sup> and Hidetaka Ota 3,\***


Received: 12 September 2019; Accepted: 8 November 2019; Published: 10 November 2019

**Abstract:** A gradually increasing prevalence of mild cognitive impairment (MCI) is recognized in the super-aging society that Japan faces, and early detection and intervention in community-dwellers with MCI are critical issues to prevent dementia. Although many previous studies have revealed MCI/non-MCI differences in older individuals, information on the prevalence and characteristics of MCI in rural older adults is limited. The aim of this study was to investigate differential characteristics between older adults with and without MCI. The investigation was conducted over one year from 2018 to 2019. Participants were recruited from Akita in northern Japan. Neuropsychological assessments were applied to classify MCI, including the National Center for Geriatrics and Gerontology Functional Assessment Tool (NCGG-FAT) and the Touch panel-type Dementia Assessment Scale (TDAS) based on the Alzheimer's disease assessment scale. Our samples consisted of 103 older adults divided into 54 non-MCI and 49 MCI. The MCI group had lower scores of all cognitive items. Our results showed that individuals with MCI had significantly slower walking speed (WS) and worse geriatric depression scale (GDS) compared to non-MCI. In addition, WS was significantly associated with some cognitive items in non-MCI, but not in MCI. Finally, we showed that predictive variables of MCI were WS and GDS. Our study provides important information about MCI in rural community-dwellers. We suggest that older adults living in a super-aging society should receive lower limb training, and avoiding depression in older adults through interaction of community-dwellers may contribute to preventing the onset of MCI.

**Keywords:** older adults living in super-aging society; mild cognitive impairment; walking speed; depression

#### **1. Introduction**

Mild cognitive impairment (MCI) is a transitional state of cognition between normal ageing and dementia that may progress to dementia. MCI is defined by subjective or objective evidence of cognitive decline greater than expected for the individual's age and education level but that does not interfere notably with activities of daily life, and the early detection and prevention of MCI are a challenge in preventing dementia in older adults [1]. Within established processes for making a diagnosis of MCI [2], some factors in its early detection remain unclear, as do predictors of reversion from MCI to normal cognition. Differences between individuals with and without MCI have been studied and reported, showing that many factors such as a lack of exercise [3], cerebrovascular factors [4], and anxiety [5] affect cognitive function. In particular, it is recognized that cognitive and physical impairments in older adults are related through shared pathophysiological mechanisms [6]. Some studies show that older adults with MCI, compared to individuals without MCI, perform more poorly not just on neurocognitive performance, but also on complex motor and psychomotor domains [7–9], and exhibit greater gait impairment [10–14]. Recently, it has become clear that MCI and physical frailty are related. The physical phenotype of frailty is represented by low levels of lean body mass, muscle strength, gait performance, physical activity, and exhaustion [15]. Gait performance in frailty is associated with cognitive decline and with conversion from MCI to Alzheimer's disease (AD) [9,16]. Therefore, investigating close associations between MCI and physical function has important implications for improving diagnostic acuity of MCI and targeting interventions to prevent dementia and disability among older adults.

To clarify the dementia risk associated with MCI or early stage dementia, a nationwide clinical registry called the Organized Registration for the Assessment of dementia on Nationwide General consortium toward Effective treatment (ORANGE) is ongoing in Japan [17]. The recruitment of many registrants has been in progress in several regions of Japan from 2017, and we performed an extending preclinical trial in a cohort in northern Japan up to 2019. As is well known, gradual growth of the older population has been experienced in Japan. In particular, northern rural areas in Japan (Akita prefecture) are the most super-aging society in the world (e.g., the number of individuals aged over 75 in Akita is estimated to reach 205,000 people by 2025 [18]). Although there are few epidemiological data regarding MCI in rural areas of Japan, several studies have reported MCI profiles in older adults [19,20]. Most of them focus on the prevalence of MCI or the conversion rate to dementia; the detailed cognitive profile (e.g., attention, executive function, information coding skill, etc.) of MCI is not covered, and epidemiological data regarding health-related variables such as physical performance and mental status are scarce. Therefore, we analyzed the data of a prospective cohort in northern Japan. In this study, we investigated which factors were related to MCI status according to the National Center for Geriatrics and Gerontology Functional Assessment Tool (NCGG-FAT) [21,22] and the Touch panel-type Dementia Assessment Scale (TDAS) [23,24] based on the Alzheimer's disease assessment scale (ADAS) [25]. To clarify the characteristics of rural older adults with MCI, we focused on three points as follows. First, we mainly compared cognitive function, physical performance, and depressive symptoms in MCI individuals with those in non-MCI individuals. 
Second, we examined correlations between physical performance and cognitive and mental function in each group (i.e., non-MCI group and MCI group). Finally, a binomial logistic regression model was estimated to determine predictive factors for MCI in rural older adults in Japan.

#### **2. Experimental Section**

#### *2.1. Participants and Study Design*

The participants were recruited from 2018 to 2019 in a rural area in Akita with a small population (total 32,440) and a super-aged rate of 38.7% according to public information. The inclusion criteria were age 65 years and over, having walking ability without personal assistance, and living at home. The exclusion criteria were dementia, major depression, severe hearing or visual impairment, stroke, Parkinson's disease, other neurological disease, intellectual disability, need for support or care as certified by the Japanese public long-term care insurance system due to disability, and inability to complete cognitive tests at the baseline assessment. The study was approved by the ethics committee of the Faculty of Medicine, Akita University (approval No. 1649) and was performed in accordance with the Declaration of Helsinki II. Informed consent was obtained from all participants. According to sample size calculations using G\*Power for an unpaired *t* test [26], we estimated a sample size of 64 participants per group to detect a clinically significant effect with α = 0.05, power = 80%, and effect size = 0.50.
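The sample size reported above can be approximated without G\*Power. The following is a minimal sketch, assuming a two-sided test (the text does not state sidedness) and using a normal approximation with a small-sample correction term rather than the exact noncentral *t* computation that G\*Power performs; the function name `n_per_group` is illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample (unpaired) t test.

    Uses the normal approximation plus the z_alpha^2 / 4 correction that
    compensates for estimating the variance (an approximation, not the exact
    noncentral-t solution).
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    n_normal = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return ceil(n_normal + z_alpha ** 2 / 4)


n = n_per_group(effect_size=0.50, alpha=0.05, power=0.80)
print(f"per group: {n}, total: {2 * n}")
```

With the parameters given in the text (α = 0.05, power = 80%, effect size = 0.50), this approximation yields 64 participants per group, in agreement with the reported figure.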

#### *2.2. Assessment and Outcome*

After obtaining informed consent from each participant, demographics (age, gender, and education) and health variables (body mass index (BMI), medical history of hypertension and diabetes, frail phenotype, medication, and Geriatric Depression Scale-15 (GDS)) were collected according to the ORANGE protocol. A questionnaire sent in advance by mail was completed by each participant; it covered age, gender, educational duration, presence of hypertension and diabetes (yes or no), number of medications, and GDS (score range from 0 to 15, with higher scores indicating more depressive symptoms). Height and weight for calculating BMI were measured by public health nurses. Five components of the National Center for Geriatrics and Gerontology-Study of Geriatric Syndromes (NCGG-SGS) [27] based on the Fried frailty index [15] were applied to assess frailty: (i) self-reported unintentional weight loss (i.e., a decrease of 2–3 kg over six months [28]), (ii) self-reported exhaustion (i.e., presence of fatigue for two weeks [28]), (iii) self-reported low physical activity (i.e., no exercise habit for a week [29]), (iv) weakness (i.e., grip strength (GS) less than 26/18 kg for male/female [30]): GS was measured using a Smedley-type handheld dynamometer (GRIP-D; Takei Ltd., Niigata, Japan), and (v) slow walking speed (WS) (i.e., less than 1.0 m/s in 5 m walking test [29]): walking time was measured in seconds over a 2.4-m distance using infrared sensors, and participants' WS (m/s) was calculated. The number of components present was used to define robust (score of zero), pre-frail (score of 1 to 2), and frail (score of 3 to 5) status. The frail index of NCGG-SGS is almost equal to the original index of Fried's study [15], except that the cut-off values for slowness and weakness were modified to provide appropriate criteria for physical frailty assessment in the Japanese older population [31,32]. 
The present study also applied NCGG-FAT and TDAS based on ADAS to assess cognitive function in the participants and to divide the participants into non-MCI and MCI groups. The five frailty components, the NCGG-FAT, and the TDAS were all evaluated by trained public health nurses during a comprehensive health checkup at a local site.
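The frailty scoring rule described above can be sketched as follows. This is an illustration under stated assumptions, not the registry's software: the function and argument names are hypothetical, the three self-reported items are assumed to be already coded as yes/no, and the cut-offs (26/18 kg for grip strength, 1.0 m/s for walking speed, 2.4-m timed distance) are taken from the text.

```python
def walking_speed(time_s: float, distance_m: float = 2.4) -> float:
    """Walking speed in m/s from the time needed to cover the timed distance."""
    if time_s <= 0:
        raise ValueError("time_s must be positive")
    return distance_m / time_s


def frailty_phenotype(weight_loss: bool, exhaustion: bool, low_activity: bool,
                      grip_kg: float, walk_m_per_s: float, sex: str):
    """Count the five frailty components and map the count to a phenotype."""
    if sex not in ("male", "female"):
        raise ValueError("sex must be 'male' or 'female'")
    grip_cutoff = 26.0 if sex == "male" else 18.0  # kg, from the text
    score = sum([
        bool(weight_loss),
        bool(exhaustion),
        bool(low_activity),
        grip_kg < grip_cutoff,   # weakness
        walk_m_per_s < 1.0,      # slowness
    ])
    if score == 0:
        label = "robust"
    elif score <= 2:
        label = "pre-frail"
    else:
        label = "frail"
    return score, label


# Hypothetical participant: 2.4 m covered in 3.0 s, grip strength 20 kg, male.
speed = walking_speed(3.0)
print(frailty_phenotype(True, True, False, grip_kg=20.0, walk_m_per_s=speed, sex="male"))
```

Keeping the speed calculation separate from the scoring makes it easy to substitute a different timed distance without touching the cut-offs.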

#### *2.3. Components of NCGG-FAT*

The computerized multidimensional neurocognitive test was performed on an iPad (Apple, Cupertino, CA, USA) with a 9.7-inch touch display. The task instructions were presented with a letter size of at least 1.0 × 1.0 cm<sup>2</sup> on the display. For this study, a trained operator supported each participant by setting up the tablet PC and running each test. Participants completed the NCGG-FAT subtests as follows.

#### 2.3.1. Tablet Version of Word Recognition (WR)

This test comprises two computerized tasks: immediate recognition and delayed recall. In the first task, immediate recognition, participants were instructed to memorize 10 words, each of which was displayed for 2 s on the tablet PC. After that, a total of 30 words, including 10 target and 20 distracter words, were shown to participants, and they were required to select the 10 target words immediately. This task was repeated for three trials. The average number of correct answers was recorded as a score ranging from 0 to 10. In the second task, delayed recall, participants were asked to recall the 10 target words after 20 min. The number of correctly recalled target words was scored from 0 to 10. Finally, we calculated the sum of the scores for the two tasks.

#### 2.3.2. Tablet Version of Trail Making Test Version A (TMT-A) and Version B (TMT-B)

In the Trail Making Test Version A (TMT-A) task, participants were instructed to touch the target numbers in a sequence as rapidly as possible. Target numbers from 1 to 15 were randomly displayed on the tablet panel. In addition, the Trail Making Test Version B (TMT-B) instructions required participants to touch target numbers (e.g., 1–15) and letters in turn. The required time (seconds) to complete each task was recorded, within a maximum time of 90 s.

#### 2.3.3. Tablet Version of Symbol Digit Substitution Task (SDST)

In the Symbol Digit Substitution Task (SDST), nine pairs of numbers and symbols were shown in the upper part of the tablet display. A target symbol was shown in the center of the tablet panel, and selectable numbers were displayed at the bottom. Participants were asked to touch the number corresponding to the target symbol shown in the central part of the tablet display as rapidly as possible. The number of correct numbers within 90 s was recorded.

#### *2.4. Components of TDAS*

The TDAS test was presented on a 14-inch touch panel display. The TDAS subtests consisted of seven of the ADAS-cog test items (11 test items) and two other tasks. Participants were instructed verbally or visually by the computer to complete the TDAS subtests as follows.

#### 2.4.1. WR

The WR test was a computerized test based on the WR task of ADAS-cog. At the start of instructions for this task, 12 target words were individually presented on the display for 3 s each at 2 s intervals. After demonstrating the target words, the computer randomly displayed 24 words consisting of 12 target words and 12 non-target words. Participants were then instructed to respond by touching the displayed button of 'yes', 'no', or 'unknown' in response to the question regarding whether the word had been shown previously. Participants completed the trial three times. The total number of incorrect responses for three trials was recorded, with a maximum score of 72.

#### 2.4.2. Following a Command

This task was modified from the command task of ADAS-cog. The computer presented 10 selectable icons labelled from 0 to 9 and then required participants to touch the number specified. The number of incorrect responses in two trials was scored with a maximum score of 2.

#### 2.4.3. Orientation

This task was based on the orientation task of ADAS-cog. The computer displayed four screens in sequence. On each screen, participants were asked to touch selectable icons and answer what year, month, day, and weekday it is. The number of incorrect responses was scored with a maximum score of 4.

#### 2.4.4. Visual-Spatial Perception

This task was modified from the constructional praxis task of ADAS-cog to evaluate visual-spatial perception. ADAS-cog requires subjects to copy the geometric forms presented. The computer first presented four screens displaying a target geometric form (i.e., a square, rhombus, cube, or triangular prism) for 5 s each. Participants were then required to correctly select the target form in response to a question task including the target form and four non-target forms. The number of incorrect responses was scored with a maximum score of 4.

#### 2.4.5. Naming Fingers

This test assessed whether participants can name the fingers correctly, using the protocol of ADAS-cog. Participants were asked to correctly respond to a picture question of a hand marked with a red circle, by touching an icon labelled with the five finger names. An incorrect response was scored as one point, with a maximum score of 5.

#### 2.4.6. Object Recognition

This task was based on the naming objects task of ADAS-cog. Participants were instructed to touch the correct usage icon (e.g., a pair of scissors, comb or broom) of five selectable icons labelled with the purpose of usage. Three trials were completed, and an incorrect response was scored as one point (maximum score = 3).

#### 2.4.7. Accuracy of Order of a Process

This task was modified from the ideational praxis of ADAS-cog. The computer displayed seven icons labelled randomly with seven actions. Participants were asked to correctly touch the icons in order. The number of incorrect responses was recorded, with a maximum score of 5.

#### 2.4.8. Money Calculation

This task assessed the money calculation ability of each participant. Participants needed to combine coins equal to an amount of money from various denominations of coins displayed on the screen. Three trials were completed, and an incorrect response was scored as one point (maximum score = 3).

#### 2.4.9. Clock Time Recognition

This task included three kinds of question regarding clock time recognition. Participants were instructed to correctly state the time shown on a clock displayed on the screen. The number of incorrect responses was recorded, with a maximum score of 3.

#### *2.5. MCI Classification by NCGG-FAT and TDAS*

According to Petersen's report [2], in which individuals who showed cognitive impairment but were independent in activities of daily living were defined as having MCI, we classified MCI according to the cutoff point of NCGG-FAT or TDAS. For all cognitive subtests of NCGG-FAT, the standardized threshold for defining impairment in each corresponding domain, established in Japanese population-based cohorts of older community-dwellers, is a score more than 1.5 standard deviations (SD) below the age- and education-specific mean [21]. In TDAS, lower scores indicate better cognitive function (range of scores from 0 to 101), and total scores ranging from 7 to 13 were classified as MCI [23].
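The two cutoff rules above can be expressed compactly. The sketch below is illustrative only: the function names are hypothetical, the age- and education-specific reference means and SDs must come from the published norms [21] and are not reproduced here, and treating longer completion times on timed tests as "worse" (the `higher_is_better` flag) is an assumption about how the 1.5 SD rule is applied to TMT. The subtest maxima are those listed in Section 2.4 and sum to the stated TDAS range of 0 to 101.

```python
# Maximum error score of each TDAS subtest, as listed in Section 2.4.
TDAS_MAX = {
    "word_recognition": 72,
    "following_a_command": 2,
    "orientation": 4,
    "visual_spatial_perception": 4,
    "naming_fingers": 5,
    "object_recognition": 3,
    "order_of_a_process": 5,
    "money_calculation": 3,
    "clock_time_recognition": 3,
}


def tdas_total(errors: dict) -> int:
    """Sum subtest error scores after checking each against its maximum."""
    for name, value in errors.items():
        if not 0 <= value <= TDAS_MAX[name]:
            raise ValueError(f"{name} score {value} is out of range")
    return sum(errors.values())


def tdas_indicates_mci(total: int) -> bool:
    """TDAS totals from 7 to 13 (inclusive) were classified as MCI."""
    return 7 <= total <= 13


def ncgg_fat_impaired(score: float, ref_mean: float, ref_sd: float,
                      higher_is_better: bool = True) -> bool:
    """True if the score is more than 1.5 SD worse than the reference mean."""
    z = (score - ref_mean) / ref_sd
    if not higher_is_better:  # assumption: for timed tests, longer is worse
        z = -z
    return z < -1.5


print(sum(TDAS_MAX.values()))  # upper end of the TDAS score range
```

For example, `ncgg_fat_impaired(60.0, ref_mean=30.0, ref_sd=10.0, higher_is_better=False)` flags a hypothetical TMT completion time that is 3 SD slower than the reference mean.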

#### **3. Analyses**

According to the results of the normality test (Kolmogorov–Smirnov test), Age, Height, Weight, and BMI were compared using the unpaired *t* test. Gender (% female), Hypertension (% Yes), Diabetes (% Yes), Weight loss (% Yes), Poor energy (% Yes), and Low physical activity level (% Yes) were analyzed by the chi-squared test for 2 × 2 contingency tables, and Frail phenotype (%, robust/pre-frail/frail) was analyzed by Pearson's chi-square test for a 2 × 3 contingency table. The Mann–Whitney test was applied to GS (kg), WS (m/s), Amount of medications (*n*), Education (years), GDS-15 (score), and the cognitive measurements of NCGG-FAT and TDAS (Table 1).


**Table 1.** Characteristics of participants with and without mild cognitive impairment (MCI).

\* *p* < 0.05, \*\* *p* < 0.01, \*\*\* *p* < 0.001, Mann–Whitney test was applied for Education (years), Amount of medications (*n*), GDS-15 total score (score), Grip strength (kg), Walking speed (m/s), and cognitive measurements of NCGG-FAT and TDAS. Age, height, weight, and BMI were analyzed by unpaired *t* test, and gender (% female), hypertension (% Yes), diabetes (% Yes), weight loss (% Yes), poor energy (% Yes), and low physical activity level (% Yes) were analyzed by chi-squared test, except for Pearson's chi-square test for frail phenotype (%, robust/pre-frail/frail). SD, standard deviation; IQR, interquartile range; Loss weight, Loss weight more than 3 kg in six months; TMT-A, Trail Making Test A version; TMT-B, Trail Making Test B version; SDST, Symbol Digit Substitution Task; TDAS, Touch Panel-type Dementia Assessment Scale; GDS-15, Geriatric Depression Scale.

As WS, GS, the subtests of NCGG-FAT, TDAS, and GDS-15 total score were not normally distributed according to the Kolmogorov–Smirnov test, Spearman correlation analysis was applied to analyze the relationships among Age, GS, WS, subtests of NCGG-FAT, TDAS, and GDS total score for each group (Table 2).
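Spearman's coefficient is the Pearson correlation computed on ranks, which is why it suits the non-normally distributed variables named above. The following is a minimal standard-library sketch, assuming tied values receive their average rank; the data are invented for illustration and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented values: walking speed (m/s) and SDST score for six people.
ws = [0.9, 1.1, 1.2, 1.0, 1.4, 1.3]
sdst = [28, 35, 33, 30, 44, 41]
print(round(spearman_r(ws, sdst), 3))
```

Because only ranks enter the calculation, any monotone transformation of either variable leaves the coefficient unchanged.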

The values of *p*in = 0.2 and *p*out = 0.25 were used as the entry and removal criteria to select independent variables from Tables 1 and 2 for input into a binomial logistic regression model. The regression model was fitted using the likelihood ratio method, with the MCI classification as the dependent variable and the predictors (i.e., independent variables) defined according to the following three models: (i) Model I included 11 predictors: Age, GS, WS, Amount of medications, Education, WR, TMT-A, TMT-B, and SDST of NCGG-FAT, TDAS, and GDS-15 total score. (ii) Model II included 10 predictors, i.e., those of Model I except for the TDAS score: Age, GS, WS, Amount of medications, Education, WR, TMT-A, TMT-B, and SDST of NCGG-FAT, and GDS-15 total score. Finally, (iii) Model III included six predictors, excluding all cognitive variables: Age, GS, WS, Amount of medications, Education, and GDS-15 total score. Model fit was examined by the Hosmer–Lemeshow test (Table 3). SPSS Version 26.0 for Windows (SPSS Inc., Chicago, IL, USA) was used for analysis, and the level of significance was set at *p* = 0.05.
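To make the goodness-of-fit check concrete, the sketch below computes the Hosmer–Lemeshow statistic from fitted probabilities and observed outcomes. It is an illustration only, not the SPSS procedure: grouping into equal-sized bins of sorted probabilities is one common convention, the function name is hypothetical, and the *p*-value (obtained from a chi-square distribution with groups − 2 degrees of freedom) is left out because it needs a chi-square routine outside the standard library.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Return (statistic, degrees_of_freedom) for the Hosmer-Lemeshow test.

    probs    -- fitted probabilities of the event from a logistic model
    outcomes -- observed outcomes coded 0/1, in the same order as probs
    groups   -- number of bins formed from the sorted probabilities
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    if groups < 3 or len(probs) < groups:
        raise ValueError("need at least 3 groups and one observation per group")
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        variance = expected * (1 - expected / size)
        if variance > 0:
            statistic += (observed - expected) ** 2 / variance
    return statistic, groups - 2


# Invented example: three risk strata of ten people each.
probs = [0.2] * 10 + [0.5] * 10 + [0.8] * 10
outcomes = [1] * 4 + [0] * 6 + [1] * 5 + [0] * 5 + [1] * 8 + [0] * 2
print(hosmer_lemeshow(probs, outcomes, groups=3))
```

A small statistic (large *p*-value) indicates that observed and expected event counts agree across risk strata, which is how the non-significant results for Models I and II are usually read.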


**Table 2.** Correlations for each group (non-MCI and MCI).

\* *p* < 0.05, \*\* *p* < 0.01. Statistics represent Spearman *r* correlations for each parameter. BMI, body mass index; WS, walking speed; GS, grip strength; WR, word recognition; TMT-A, Trail Making Test A version; TMT-B, Trail Making Test B version; SDST, Symbol Digit Substitution Task; TDAS, Touch Panel-type Dementia Assessment Scale; GDS-15, Geriatric Depression Scale-15.



Reference group for analysis was non-MCI group. Model I: Model χ<sup>2</sup> test, *p* < 0.0001; The Hosmer–Lemeshow test, *p* = 0.12; Percentage of correct classifications = 87.4%. Model II: Model χ<sup>2</sup> test, *p* < 0.0001; The Hosmer–Lemeshow test, *p* = 0.84; Percentage of correct classifications = 84.5%. Model III: Model χ<sup>2</sup> test, *p* = 0.002; The Hosmer–Lemeshow test, *p* = 0.02; Percentage of correct classifications = 59.2%. CI, confidence interval; WR, word recognition; TMT-A, Trail Making Test A version; TMT-B, Trail Making Test B version; SDST, Symbol Digit Substitution Task; TDAS, Touch Panel-type Dementia Assessment Scale; GDS, Geriatric Depression Scale-15.

#### **4. Results**

Our sample consisted of 103 older participants, divided into 54 non-MCI people and 49 MCI people. We confirmed that the MCI group had significantly lower scores or longer required times on all cognitive items, including the WR test, TMT-A, TMT-B, SDST, and TDAS scores, than the non-MCI group (*p* < 0.0001) (Table 1). Demographic and health data including Age, Gender, BMI, presence of Hypertension or Diabetes, Frail phenotype, presence of Weight loss, Poor energy, Low physical activity level, Amount of medications, and Education showed no significant difference between the non-MCI group and MCI group. Of the physical assessments, WS was significantly different between the groups (*p* = 0.03), whereas GS was not different (*p* = 0.25). Moreover, the MCI group showed a worse GDS score (*p* = 0.046). Next, we examined correlations between physical performance, cognitive function, and mental function in each group (Table 2). According to the results of Spearman correlation analysis, WS was associated with some cognitive subtests, including WR, TMT-A, TMT-B, and SDST, in the non-MCI group (|r| > 0.30, *p* < 0.01), but these associations were not significant in the MCI group, in which only the correlations between cognitive items and Age or Education were significant. Finally, we performed a binomial logistic regression analysis to determine explanatory variables for MCI with reference to non-MCI (Table 3). According to the Phi coefficient of association, none of the nominal variables, including Gender (Phi coefficient = 0.01, *p* = 0.89), presence of Hypertension (Phi coefficient = 0.02, *p* = 0.86) and Diabetes (Phi coefficient = 0.07, *p* = 0.46), Weight loss (Phi coefficient = 0.05, *p* = 0.63), Poor energy (Phi coefficient = 0.12, *p* = 0.22), and Low physical activity level (Phi coefficient = 0.08, *p* = 0.45), was significantly associated with MCI classification, and they were not included as predictors in the regression model. 
Three regression models were estimated according to the predictors of Age, GS, WS, Amount of medications, Education, WR, TMT-A, TMT-B, SDST, TDAS, and GDS-15 total score. Model I, which included all of them, demonstrated that the classification of MCI had a significant association with Age, TMT-B, SDST, and TDAS. Next, Model II, which excluded the TDAS score from Model I, was applied to estimate a specific cognitive profile in MCI. Model II demonstrated that the classification of MCI had a significant association with Age, WR, TMT-A, TMT-B, and GDS-15 total score. Finally, considering the self-explanatory effect of cognitive items, Model III, which excluded all cognitive variables from Model II, was applied to clarify the classification of MCI. As shown in Model III, WS and GDS-15 total score were extracted as explanatory variables of MCI (Table S1). In the three estimated models, the percentage of correct classifications and the Hosmer–Lemeshow test result were 87.4% (*p* = 0.12) in Model I, 84.5% (*p* = 0.84) in Model II, and 59.2% (*p* = 0.02) in Model III.
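The Phi coefficient used above to screen the nominal variables can be computed directly from a 2 × 2 table of counts. The sketch below is a minimal illustration with invented counts (not the study's data); the function names are hypothetical, and the chi-square value shown is the uncorrected Pearson statistic, which equals *n* times Phi squared.

```python
from math import sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi for the 2 x 2 table [[a, b], [c, d]] (rows: groups, columns: yes/no)."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) via chi2 = n * phi^2."""
    n = a + b + c + d
    return n * phi_coefficient(a, b, c, d) ** 2


# Invented counts: rows are two groups, columns are condition present/absent.
table = (12, 28, 15, 25)
print(round(phi_coefficient(*table), 3), round(chi_square_2x2(*table), 3))
```

Values of Phi near zero, as reported for all nominal variables in this study, indicate almost no association between the variable and group membership.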

#### **5. Discussion**

In this study, we found that northern Japanese community-dwellers with MCI living in a super-aging society had slower WS and a tendency toward depression. Aging continues in our survey area, and the proportion of the population aged 65 years or older reached 38.7% (July, 2019). The prevalence of MCI in this study (47.6%) was higher than that previously reported for other rural areas, which was about 10%–30% [29,33]. Additionally, studies in some wealthy urban areas, unlike our rural area, showed that individuals with MCI were older and less educated than those without MCI [34,35]. Although this high prevalence and multifactorial approach may be due to different methods, it could also be because our community-dwellers living in an area of heavy snowfall in northern Japan experience a more negative impact on gait performance [36] and a potentially high incidence of depressive symptoms [37] because of fewer opportunities to go out and participate in social activities. We examined associations between cognitive function and demographic and health data including age, gender, BMI, medical history, medication, frailty phenotype, education, physical performance, and GDS in older adults living in a super-aging society (Table 1). We found that recognized risk factors for MCI including age, gender, BMI, presence of hypertension or diabetes, frailty phenotype, education, and amount of medications were not different, but WS and GDS were significantly different between the groups. We also found that WS was significantly associated with some cognitive items including SDST and TMT in the non-MCI group, but not in the MCI group (Table 2). The regression models demonstrated that MCI had a significant association with age, executive function, information processing speed, and composite cognitive performance, indicating that these are predictive variables for the presence of MCI. 
However, because of the effect of variables on these cognitive scores (Model I), we applied Models II and III (Table 3). Model II, which excluded composite cognitive performance as indexed by the TDAS score, demonstrated that MCI had a significant association with age, WR, attention, executive function, and GDS. Consistent with the results of Reinvang et al. [38], attention and executive dysfunction in neuropsychological tests could be early symptoms of MCI. In particular, SDST and TMT are recognized to reflect psychomotor processing and executive function [39], and several studies have reported that they are rapidly altered in MCI subjects [40,41]. Although these findings justify their use for the detection of cognitive impairment in older adults, most of these tests have numerous limitations (the problem of novelty, lack of sensitivity and specificity, patient cognitive reserve, etc.) [42,43]. This observation underscores the need to find new detection indicators for cognitive impairment. From this perspective, a new approach associates the WS of older adults with the presence of cognitive impairment.

Interestingly, in Model III, which excluded all cognitive domains, WS and GDS were selected as explanatory variables, although the percentage of correct classifications was low and the Hosmer–Lemeshow test indicated a poor fit. These findings indicate that the variables WS and GDS can potentially distinguish the presence or absence of MCI; therefore, they provide suggestive information on the presence of MCI. Recently, some studies have focused on both cognition and locomotor performance as predictors of adverse outcomes in community-dwellers with MCI [44,45]. In particular, slow gait speed at usual pace has been implicated in the onset of adverse outcomes, such as disability [46], cognitive impairment [47], institutionalization, falls [48,49], and mortality [50]. As previously reported, the association between slowing of walking and MCI is supported by shared neurological findings that include a smaller right hippocampus [51]. This finding underscores walking–brain behavior relationships and the value of WS as an early indicator of dementia risk. However, thus far, there is insufficient information to state that WS can potentially predict adverse outcomes in older community-dwellers, and more specific investigations need to be performed. Moreover, we showed that GS was no different between the groups (*p* = 0.25), suggesting that reinforcement of lower, but not upper, limb muscular strength may be a critical target in rehabilitation. Likewise, recent studies have indicated that lower extremity motor dysfunction may be a feature of MCI [52], but little is known about the nature and biological mechanisms, such as myokines, of lower extremity motor dysfunction associated with MCI. Regarding WS and cognitive function, the concept of frailty has recently become a topic of interest in geriatrics. We could not include frailty as a global score in the correlation analysis or binomial regression analysis because the distribution of the frail group according to the frailty phenotype was greatly biased (e.g., % of robust/pre-frail/frail, 50%/50%/0% in the non-MCI group, 43%/47%/10% in the MCI group) (Table 1). However, some studies have reported that physical frailty is associated with MCI and that, among the five items of the Fried index, a reduction of WS most closely reflects the occurrence of MCI and disability [31,53]. MCI with concomitant physical frailty may be considered to fulfil the criteria for cognitive frailty [54]. In this regard, we believe that the cognitive frailty concept has potential advantages in better stratifying the risk profiles of older adults with MCI. In a comparison between the groups, the MCI group also showed significantly higher depressive scores as indexed by the GDS. Concerning geriatric depression in MCI, cross-sectional research has shown that the association between depressive symptoms, as indexed by the Korean version of the GDS, and memory or executive function was significantly greater in individuals with MCI than in those with AD [55]. Additionally, a survival analysis with an average follow-up of 6.28 years indicated that the presence of MCI is a poor predictive factor in individuals with depressive symptoms as indexed by the GDS [56]. Thus, geriatric depressive symptoms in individuals with MCI need to be carefully screened in rural community-dwellers.

The limitations of our research need to be considered in developing our future research. First, the NCGG-FAT and TDAS used to classify individuals with MCI in this study were tablet PC versions of cognitive measurement tools based on the MCI criteria reported by Petersen [2], and evaluation of the accuracy of this MCI classification is essential for worldwide research. Second, our cohort comprised a localized group of individuals in one rural area of northern Japan, and the actual sample size (*n* = 103) did not reach the calculated required sample size (*n* = 128) because of the difficulty of sampling and recruiting in a depopulated, small rural area. Third, although younger age was associated with MCI, we could not take this association into consideration. Fourth, although focusing this study on the frailty concept was important, we judged that it was difficult to analyze frail status in detail because of the imbalance in frail samples between the groups (e.g., 0% of the non-MCI group, 10% of the MCI group). Further examination concerning frailty is warranted in future research. Finally, we hypothesized that cognitive domains, gait performance, and a tendency toward depression might be associated with MCI status. Among the three regression models in this study, WS and GDS were selected as explanatory variables in Model III. However, further research with an adequate model fit should be carried out with a large sample size in multiple rural districts. These limitations need to be considered when interpreting this study's findings.

#### **6. Conclusions**

In conclusion, WS and GDS were shown to be potential predictive variables of MCI in our study, and we consider that they provide important information about the characteristics of MCI in rural community-dwellers. Our findings suggest that, for older individuals living in a super-aging society, training lower limb muscular strength and avoiding depression through interaction among community-dwellers may contribute to preventing the onset of MCI.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2077-0383/8/11/1937/s1, Table S1: Methodology of the binomial logistic regression models.

**Author Contributions:** Conceived the trial and participated in the study design: H.S., H.M., S.L., and H.O. Recruited and collected data: T.T., Y.I., T.O., and H.O. Analyzed data: Y.K. and H.O. All authors participated in interpretation of the results. Y.K. and H.O. drafted the manuscript, and all authors contributed to critical review and revision of the manuscript. H.O. takes responsibility for the manuscript as a whole.

**Funding:** This work was supported by the Japan Agency for Medical Research and Development (AMED) (Grant: 18dk0207027h0003).

**Acknowledgments:** The authors would like to thank all participants for their enthusiasm in contributing to this project and making this study possible. We would also like to thank Soichi Kagaya (Akita University) for his management, all staff at Yokote Hall and Omori Hospital who provided assistance in performing the assessments, and AMED for financial support.

**Conflicts of Interest:** All authors declare that they have no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Cerebral White Matter Hyperintensity as a Healthcare Quotient**

### **Kaechang Park 1,\*,†, Kiyotaka Nemoto 2,†, Yoshinori Yamakawa 3, Fumio Yamashita 4, Keitaro Yoshida 2, Masashi Tamura 2, Atsushi Kawaguchi 5, Tetsuaki Arai <sup>2</sup> and Makoto Sasaki <sup>4</sup>**


Received: 18 August 2019; Accepted: 27 October 2019; Published: 1 November 2019

**Abstract:** To better understand the risk factors and optimal therapeutic strategies of cerebral white matter hyperintensity (WMH), we examined a large population of adults with and without various vascular risk factors (VRFs) or vascular risk conditions (VRCs), such as hypertension (HT), diabetes mellitus (DM), and dyslipidemia (DLP), including the comorbidities. We assessed two participant groups having no medical history of stroke or dementia that underwent brain checkup using magnetic resonance imaging (MRI): 5541 participants (2760 men, 2781 women) without VRCs and 1969 participants (1169 men, 800 women) who had received drug treatments for VRCs and the combination of comorbidities. For data analysis, we constructed WMH-brain healthcare quotient (WMH-BHQ) based on the percentile rank of WMH volume. This metric has an inverse relation to WMH. Multiple linear regression analysis of 5541 participants without VRCs revealed that age, systolic blood pressure (SBP), Brinkman index (BI), and female sex were significant factors lowering WMH-BHQ, whereas body mass index (BMI), male sex, fasting blood sugar, and triglyceride levels were increasing factors. The Kruskal–Wallis test and Dunn tests showed that WMH-BHQs significantly increased or decreased with BMI or SBP and with BI classification, respectively. Regarding the impact of impaired fasting glucose and abnormal lipid metabolism, there were almost no significant relationships. For 1969 participants who had HT, DM, and DLP, as well as their comorbidities, we found that DLP played a substantial role in increasing WMH-BHQ for some comorbidities, whereas the presence of HT and DM alone tended to decrease it. Cerebral WMH can be used as a healthcare quotient for quantitatively evaluating VRFs and VRCs and their comorbidities.

**Keywords:** white matter hyperintensity; MRI; healthcare quotient; chronic

#### **1. Introduction**

Cerebral vessel diseases are classified as large vessel diseases (LVDs) or small vessel diseases (SVDs) based on whether the diameters of the vessels involved are larger than a few millimeters or smaller than several hundred micrometers, respectively [1,2]. Both categories can be noninvasively diagnosed using magnetic resonance imaging (MRI) [3]. With regard to risk factors, many longitudinal studies have reported that LVD can be responsible for stroke, cognitive decline, and dementia [1–4]. Hypertension (HT), diabetes mellitus (DM), and dyslipidemia (DLP), three primary vascular risk conditions (VRCs) in developed countries, are risk factors for LVD [5,6]. Although HT is an obvious risk factor for SVD, the roles of DM and DLP remain disputable [7,8]. Compared with LVD, SVD has not yet been sufficiently studied with regard to its onset and development. The pathological complexity of SVD, which includes arteriosclerosis, hyalinosis, blood–brain barrier disruption, and venous collagenosis, has long complicated our ability to fully comprehend its many aspects [1,8]. For SVD studies using MRI, it is difficult to include large numbers of participants who have various conditions ranging from preclinical to chronic HT, DM, and DLP, partly because SVD is mostly asymptomatic and does not have hospital follow-up like LVD. To clarify the whole range of cerebral vessel damage, a large scale epidemiological study of SVD including preclinical or chronic HT, DM, and DLP is essential.

Brain MRIs show four major features of SVD: lacunar stroke, white matter hyperintensity (WMH), cerebral microbleeds, and visible perivascular spaces [9]. In our study, we focused on WMHs, also known as leukoaraiosis, which are commonly observed in the general population, particularly among individuals with preclinical or chronic HT, DM, and DLP. WMHs are recognized in >60% of people over 60 years old [10] and >30% of people aged 40 to 50 years in Japan, where MRI examination is incorporated as part of health checkups in connection with a screening program called Brain Dock [11,12]. WMHs are thought to reflect the loss of arterioles and capillaries caused by aging, HT, and reduced cerebral blood flow [9,13]. WMHs are also significantly associated with recurrent stroke, cognitive decline, and dementia [2,5].

Numerous efforts have been made to develop MRI-based measures of health status, such as the concept of "brain age", which reportedly reflects the mortality of an individual [14]. Our team earlier proposed brain healthcare quotients (BHQs) based on gray matter volume or fractional anisotropy and found significant associations between these metrics and various physical factors, such as obesity, high blood pressure (BP), and daily personal schedules, as well as social factors, including subjective socioeconomic status, subjective well-being, and the adoption of a postmaterialism view of life [15]. In this cross-sectional study, we proposed another BHQ based on WMH, which begins to appear in early middle age and increases in frequency with age. Using a large database obtained from 8921 participants who were examined through MRI as part of the Brain Dock component of a routine health checkup, we analyzed two groups of individuals with respect to vascular risk factors (VRFs): those without VRCs, and those with VRCs receiving drug treatment for high BP, impaired fasting glucose (IFG), or abnormal lipid metabolism (ALM), which, when chronic, lead to the onset of HT, DM, or DLP, respectively. In the drug treatment group, the BHQs of WMH were compared according to the comorbidity of HT, DM, or DLP because these VRCs commonly occur together. Nonetheless, the relationship between WMH and comorbidity remains unclear [9,10,13]. To help make progress in this area, we designed and executed a large-scale, cross-sectional study covering healthy and non-healthy states ranging from preclinical to chronic HT, DM, and DLP to examine whether WMH can be used as a healthcare quotient to maintain a healthy state or prevent the onset and development of VRCs.

#### **2. Materials and Methods**

#### *2.1. Participants*

Data were collected between January 2013 and April 2017 from the brain dock center (BDC) affiliated with Kochi University of Technology. From BDC, we enrolled 8921 healthy participants without a history of cerebral stroke, who underwent the brain dock health checkup only once. Although we were interested in the WMH of individuals with various medical backgrounds, participants who had been clinically diagnosed with HT, DM, and DLP but had not been treated with drugs were excluded (*n* = 1411). Thus, 5541 participants (2760 males, 2781 females; age, 20–89 years; mean age ± SD, 51.38 ± 9.80 years; median age, 51 years) with no medical history of HT, DM, and DLP were selected for analysis (Table 1). Here, the term "medical history" refers to the drug treatment history before and at the time of enrollment in the study. In addition, participants with a medical history of HT, DM, DLP, or their comorbidities (*n* = 1969) were enrolled, based on the examination results at BDC, so that their WMH-BHQ could be compared with that of participants with no medical history.

**Table 1.** Number and age distribution of participants without and with hypertension (HT), diabetes mellitus (DM), and/or dyslipidemia (DLP).

| | Total | Male | Female | Mean Age ± SD (Years) | Median Age (Years) |
|---|---|---|---|---|---|
| No medical history | 5541 | 2760 | 2781 | 51.4 ± 9.8 | 51 |


All participants lived in Kochi Prefecture, visited BDC, and underwent brain MRI as part of their routine health checkups. They also answered a questionnaire on their past and present medical history and lifestyle, such as smoking. Health checkups included systolic blood pressure (SBP), body mass index (BMI), Brinkman index (BI; the average number of cigarettes smoked per day multiplied by the number of years the person has smoked), and various blood chemistry test items, including hemoglobin A1c (HbA1c), fasting blood sugar (FBS), triglycerides (TG), and high-density lipoprotein (HDL) and low-density lipoprotein (LDL) cholesterol. Based on these tests, BMI, BI, high BP, IFG, and ALM were classified according to the criteria shown in Table 2.

**Table 2.** Classification of body mass index (BMI), Brinkman index (BI), high blood pressure (BP), and criteria of impaired fasting glucose (IFG) and abnormal lipid metabolism (ALM).


#### *2.2. Automated Measurement of WMH Volume*

A 1.5 Tesla MRI system (ECHELON Vega; Hitachi Medical Corporation, Tokyo, Japan) was used to perform MRI examinations for WMH diagnosis. The imaging protocol included T2-weighted spin-echo (repetition time/echo time (TR/TE) = 5800/96 ms), T1-weighted spin-echo (TR/TE = 520/14 ms), and fluid-attenuated inversion recovery (FLAIR; TR/TE = 8500/96 ms; inversion time = 2100 ms) images as described previously [16]. Images were obtained as 27 transaxial slices per scan. The slice thickness was 5 mm, with no interslice gap, as described previously [11,16]. Measurement of WMH volume was needed to evaluate the severity, especially for levels exceeding the maximum of the Fazekas scale. In our study, WMHs were automatically segmented and their volume quantified using the following procedure. First, the FLAIR images were segmented into gray matter, white matter, and cerebrospinal fluid space using SPM12, which also yielded the intensity inhomogeneity corrected image (IICI) [17,18]. Then, the IICI was anatomically normalized into the template space using Advanced Normalization Tools. A region of interest delineating the middle cerebellar peduncle was applied to the anatomically normalized IICI to estimate the intensity distribution of normal white matter of each subject. The IICI in native space was then normalized for its intensity in the brain region segmented as gray and white matter. The intensity-normalized IICI was thresholded using a 3.5 SD cutoff to segment WMH, with search regions limited to a WMH mask. The WMH volume (WMHV) was calculated by multiplying the voxel size by the slice thickness. Finally, the measured WMH was automatically colored red so that the first author (K.P.), a trained neurosurgeon, could confirm the presence and location of WMH.
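The thresholding and volume steps described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which relied on SPM12 and Advanced Normalization Tools); it assumes that the 3.5 SD cutoff is applied to z-scores computed against the normal white matter intensities sampled from the reference region, and that the volume is the number of flagged voxels times the in-plane voxel area times the slice thickness. The function names and the `pixel_area_mm2` parameter are hypothetical, and the in-plane resolution is not reported in the text.

```python
from statistics import mean, stdev

SD_CUTOFF = 3.5            # cutoff reported in the text
SLICE_THICKNESS_MM = 5.0   # slice thickness reported in the text


def segment_wmh(intensities, search_mask, reference_values, cutoff=SD_CUTOFF):
    """Flag voxels whose intensity exceeds the reference mean by more than
    `cutoff` standard deviations, restricted to the search mask.

    Assumption: `reference_values` are intensities of normal white matter
    (e.g. sampled from the middle cerebellar peduncle region of interest).
    """
    mu = mean(reference_values)
    sigma = stdev(reference_values)
    flags = []
    for value, in_mask in zip(intensities, search_mask):
        z = (value - mu) / sigma
        flags.append(bool(in_mask) and z > cutoff)
    return flags


def wmh_volume_ml(flags, pixel_area_mm2, slice_thickness_mm=SLICE_THICKNESS_MM):
    """Volume in mL: flagged voxel count x in-plane area x slice thickness.

    `pixel_area_mm2` is a hypothetical input; it is not given in the paper.
    """
    n_voxels = sum(flags)
    return n_voxels * pixel_area_mm2 * slice_thickness_mm / 1000.0


if __name__ == "__main__":
    reference = [100, 102, 98, 101, 99]    # toy normal white matter samples
    voxels = [100, 120, 130, 104, 140]     # toy flattened FLAIR intensities
    mask = [True, True, False, True, True]
    flagged = segment_wmh(voxels, mask, reference)
    print(flagged, wmh_volume_ml(flagged, pixel_area_mm2=1.0))
```

Restricting the test to the search mask mirrors the statement that search regions were limited to a WMH mask, so a bright voxel outside the mask is never counted.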

#### *2.3. WMH-BHQ, a Novel Quotient Based on WMHV*

Based on the WMHV of each participant, we constructed a new metric, the WMH-BHQ, for which higher values are better and the median value for a given set of subjects is 100. In other words, this metric was devised to convert WMHV to a standardized scale so that one can easily see whether the WMHV of a subject is above or below the median. In developing this metric, we found that the distribution of WMHV is skewed; therefore, we defined WMH-BHQ using the percentile rank of a value, which is the percentage of scores in the frequency distribution that are equal to or lower than that value. For example, a test score that is greater than 75% of the scores of people taking the test is said to be at the 75th percentile, where 75 is the percentile rank. From the raw WMHV, we obtained the percentile rank for each subject (WMHV percentile), where 0 means that the participant has the lowest WMHV in the group and 100 means the highest. From this ranking, a cumulative probability curve was estimated so that the percentile rank of newer data could be calculated from the curve. This estimation was based on nonparametric density estimation and implemented using the polspline function in the "polspline" package with R 3.4.3. We then defined this new brain health metric, WMH-BHQ, to be:

$$\text{WMH-BHQ} = 100 + 15 \times (50 - \text{WMHV percentile}) / 24. \tag{1}$$

With this formula, the median value, which is equivalent to the 50th percentile, generates a WMH-BHQ equal to 100. Likewise, the 74th and 26th percentiles produce WMH-BHQs of 85 and 115, respectively. Originally, we considered directly using the interquartile range (i.e., the 75th and 25th percentiles). However, if newer data were beyond the range of the original data, an error would be produced. To avoid this, we used the value 24 instead of 25 in the denominator. As a result, the WMH-BHQ of 95% of our subjects ranged from 70.31 to 130. A lower WMH-BHQ means a higher and more problematic level of WMHV. In a histogram of the WMH-BHQ values of a group of subjects, an upward-sloping contour implies "well-being" of brain health, and a downward-sloping contour implies "not well-being" in terms of WMH.
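As a worked illustration of Equation (1), the following Python sketch converts a WMHV percentile into a WMH-BHQ. The helper `percentile_rank` uses the plain empirical distribution of a reference sample; this is a simplifying assumption and is not the smoothed cumulative probability curve that the authors estimated with the "polspline" package. Both function names are hypothetical.

```python
def percentile_rank(value, reference):
    """Percentage (0-100) of reference values that are <= value.

    Assumption: the empirical distribution stands in for the smoothed
    cumulative probability curve used in the paper.
    """
    if not reference:
        raise ValueError("reference sample must not be empty")
    at_or_below = sum(1 for r in reference if r <= value)
    return 100.0 * at_or_below / len(reference)


def wmh_bhq(wmhv_percentile):
    """Equation (1): WMH-BHQ = 100 + 15 * (50 - WMHV percentile) / 24."""
    return 100.0 + 15.0 * (50.0 - wmhv_percentile) / 24.0


if __name__ == "__main__":
    for pct in (26, 50, 74, 97.5):
        print(pct, wmh_bhq(pct))
```

With these definitions, the 50th percentile maps to 100, the 74th to 85, and the 26th to 115, as stated in the text, and a percentile of 97.5 maps to about 70.31, which matches the lower bound reported for 95% of the subjects.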

#### *2.4. Statistical Analysis*

WMHV and WMH-BHQ data were not normally distributed. Thus, Mann–Whitney *U* and Kruskal–Wallis tests were used to evaluate the associations between WMHs and VRCs or other possible risk factors by comparing the differences between group distributions. The groups were defined by the presence or absence of VRCs or by the standard criteria and classifications of risk factors described above. When the null hypothesis was rejected in a Kruskal–Wallis test, we used Dunn tests [18] for pairwise comparisons. The *p* values were then adjusted using the Benjamini–Hochberg procedure [19], which controls the false discovery rate for multiple comparisons. Multiple regression analyses were performed to examine complex associations among multiple variables while controlling for the effect of potential confounding factors [20]. All statistical analyses except for the Benjamini–Hochberg procedure were performed using the Statistical Package for the Social Sciences software version 22 (IBM Corp., Armonk, NY, USA) [20]. Adjusted *p* values based on the Benjamini–Hochberg procedure were calculated using a Microsoft Excel (Microsoft Inc, Redmond, WA, USA) spreadsheet [21].
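For readers who wish to reproduce the adjustment step outside a spreadsheet, the Benjamini–Hochberg adjusted *p* values can be computed as in the sketch below. It implements the standard step-up adjustment in plain Python; it is not the authors' Excel spreadsheet, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Each raw p value is multiplied by m / rank (rank 1 = smallest p), and
    monotonicity is enforced by taking a running minimum from the largest
    p value downward. Adjusted values are capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest to smallest p
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03, 0.005]       # toy pairwise-comparison p values
    print(benjamini_hochberg(raw))
```

A comparison is then declared significant when its adjusted value falls below the chosen false discovery rate.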

#### *2.5. Standard Protocol Approvals, Registrations, and Participant Consents*

Written informed consent was received from all participants and this study was reviewed and approved by the institutional review board of Kochi University of Technology.

#### *2.6. Data Availability*

Anonymized data may be shared upon request.

#### **3. Results**

#### *3.1. WMH-BHQ of Participants with no Medical History According to Age Decades*

As shown in Figure 1a, the contour of the WMH-BHQ histogram changed remarkably across the age decades of the participants in this study. For subjects in their 40s, it sloped upward to the right; for those in their 50s, it was symmetric, like a rainbow curve; and for those in their 60s, 70s, and 80s, the histograms sloped upward to the left, increasingly so as the age decade increased. In terms of WMH-BHQ, brain health clearly declined with age, as shown by the box plots in Figure 1b. The Kruskal–Wallis test and Dunn tests showed that all pairwise comparisons were significant except for that between the 70s and 80s (*p* = 0.076). The WMHV histogram decayed asymptotically from a peak at volumes of <5 mL; therefore, WMH-BHQ was clearly superior to WMHV with regard to visualization and understanding of changes occurring over the age decades.

**Figure 1.** Histograms (**a**) and box plots (**b**) of WMH-BHQ with no medical history according to age decades of 40s, 50s, 60s, 70s, and 80s.

#### *3.2. WMH-BHQ Histograms by Sex*

There was a distinct difference in WMH-BHQ between males and females (Figure 2). The histogram of males was a trapezoid with an upward slope, while that of females showed a plateau, suggesting that females are more susceptible to WMH than males.

When the sexes were compared within each age decade, the Mann–Whitney *U* test showed a clear sex difference for participants younger than their 50s but not for participants in their 60s and beyond (Table 3).

**Figure 2.** WMH-BHQ Histograms of females (**a**) and males (**b**).

**Table 3.** Mann-Whitney U test of white matter hyperintensity brain healthcare quotient (WMH-BHQ) without medical history according to genders and age decades.


#### *3.3. Analysis of WMH-BHQ Risk Factors: No VRCs*

Multiple linear regression analysis of participants without VRCs was performed using age, sex, BMI, BI, SBP, HbA1c, FBS, TG, HDL, and LDL as independent variables and WMH-BHQ as the dependent variable (Table 4). A stepwise method was adopted for variable selection. Female sex was a significant factor lowering WMH-BHQ and, correspondingly, male sex was one raising it. Increases in age, SBP, and BI were significantly associated with a decrease in WMH-BHQ, whereas increases in BMI, FBS, and TG were significantly associated with an increase in WMH-BHQ. HbA1c, HDL, and LDL were excluded by the stepwise regression.

**Table 4.** Multiple linear regression analysis for white matter hyperintensity brain healthcare quotients (WMH-BHQ) risk factors.


#### *3.4. WMH-BHQ without VRCs: the Effect of Three Classifications and Two Criteria*

We explored in detail the impact of differing values for BMI, BI, and high BP and of the criteria for IFG and ALM. Regarding BMI, the box plots in Figure 3a showed that WMH-BHQ significantly increased as the BMI class increased. The Kruskal–Wallis test and the subsequent Dunn tests showed all pairwise comparisons to be significantly different; thus, BMI was positively associated with WMH-BHQ. Regarding cigarette smoking, as measured by the BI, lower levels were associated with higher WMH-BHQ values. In particular, BI values of 0 (Dunn test; *p* < 0.001) and 0–400 (*p* < 0.001) were associated with significantly better brain health than levels above 400 (Figure 3b). Also, from Figure 4a, it is apparent that WMH-BHQ declined as SBP increased. Regarding high BP, WMH-BHQs of ≥400 (*p* < 0.010) and 0–400 (*p* < 0.001) revealed significant decreases compared with levels of ≥400 (Figure 4a). For IFG, a significant difference existed only between (FBS < 100 and HbA1c < 5.6%) and (FBS ≥ 100 and < 110 or HbA1c ≥ 5.6% and < 6.0%) (*p* < 0.001), whereas the other pairwise comparisons were not significant (Figure 4b). TG showed a significant difference only between 30 ≤ TG < 149 and 150 ≤ TG < 399 (*p* = 0.001), whereas the other pairwise comparisons were not significant (Figure 4c). The ratio of LDL to HDL (LH ratio) showed significant differences between 1 ≤ LH ratio < 1.5 and 2.5 ≤ LH ratio (*p* = 0.010) and between 1.5 ≤ LH ratio < 2.0 and 2.5 ≤ LH ratio (*p* = 0.018), whereas the other pairwise comparisons were not significant (Figure 4d).

**Figure 3.** Box plots of WMH-BHQ with no medical history according to (**a**) Body Mass Index (BMI) and (**b**) Brinkman Index (BI).

**Figure 4.** Box plots of WMH-BHQ with no medical history according to (**a**) systolic BP (SBP), (**b**) triglyceride (TG), (**c**) impaired fasting glucose (IFG) criteria, and (**d**) the ratio of LDL to HDL (LH ratio). IFG criteria: A: fasting blood sugar (FBS) < 100 and HbA1c < 5.6; B: 100 ≤ FBS < 110 or 5.6 ≤ HbA1c < 6.0; C: 110 ≤ FBS < 126 or 6.0 ≤ HbA1c < 6.5; D: FBS ≥ 126 or HbA1c ≥ 6.5. \* *p* < 0.05; \*\* *p* < 0.001.
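The IFG categories A to D listed in the caption of Figure 4 can be expressed as a small classification rule. The sketch below is one possible reading: it assumes that FBS is given in mg/dL and HbA1c in percent, and that when the two markers point to different categories the more severe one is assigned, which the caption does not state explicitly. The function name is hypothetical.

```python
def ifg_category(fbs_mg_dl, hba1c_percent):
    """Assign IFG category A-D from fasting blood sugar and HbA1c.

    Assumptions: FBS in mg/dL, HbA1c in %, and the more severe category
    wins when the two markers disagree.
    """
    if fbs_mg_dl >= 126 or hba1c_percent >= 6.5:
        return "D"
    if fbs_mg_dl >= 110 or hba1c_percent >= 6.0:
        return "C"
    if fbs_mg_dl >= 100 or hba1c_percent >= 5.6:
        return "B"
    return "A"   # FBS < 100 and HbA1c < 5.6


if __name__ == "__main__":
    for fbs, a1c in [(90, 5.2), (105, 5.2), (90, 6.2), (130, 5.0)]:
        print(fbs, a1c, ifg_category(fbs, a1c))
```

Checking the categories from most to least severe keeps category A as the only one that requires both markers to be in the normal range, in line with the "and" in its definition.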

#### *3.5. WMH-BHQ with VRCs and Their Comorbidities*

WMH-BHQ histograms showed a downward slope for HT, whereas those of DM and DLP were almost flat by comparison (Figure 5a). We also analyzed WMH-BHQ patterns for various multimorbidities, specifically HT+DM, HT+DLP, DM+DLP, and HT+DM+DLP. Somewhat surprisingly, DM+DLP and HT+DM+DLP showed no downward slopes. Box plots showed that "no medical history" had the highest and HT+DM the lowest median WMH-BHQ (Figure 5b). The Kruskal–Wallis test and the subsequent Dunn test showed that all groups except DM+DLP differed significantly in mean rank from the no medical history group (Table 5).

**Figure 5.** Histograms (**a**) and box plots (**b**) of WMH-BHQ according to no medical history, single morbidity, and multiple comorbidities. HT: Hypertension, DM: Diabetes mellitus, DLP: Dyslipidemia.

**Table 5.** Dunn test of WMH-BHQ among no medical history, hypertension (HT), diabetes mellitus (DM), dyslipidemia (DLP), and their comorbidities.


In addition, there was a statistically significant difference in the mean rank of WMH-BHQ between DLP and HT (*p* = 0.001), and a marginally significant difference between DLP and DM (*p* = 0.098). Furthermore, there were significant differences in the mean rank of WMH-BHQ between DLP and HT+DM (*p* < 0.001) and between HT+DM and DM+DLP (*p* = 0.022). These results show that DLP may positively affect WMH-BHQ, or at least, DLP appears unlikely to be a negative factor in this regard.

#### **4. Discussion**

Age was the strongest risk factor positively related to WMH in the multiple regression analysis. In addition to age, SBP and BI were positive factors for WMH, which is consistent with prior reports [4,5]. Conversely, female sex, FBS, TG, and BMI were negative factors for WMH. The reason for sex differences in WMH remains unclear, although the risk factors for LVD include male sex [22,23]. Recently, a population-based study of a cognitively unimpaired cohort aged >70 years demonstrated that females had significantly greater WMHV than males [24]. In our study, which examined the percentile of WMHV, female susceptibility to WMH was significant in participants aged <60 years but not in those aged >60 years (Table 3). The average menopausal age in Japan is around 50 years, but the differences across individuals are also regarded as substantial. Further study of the exact menopausal age is needed to elucidate the influence of sex hormones. In any case, sex differences according to age may contribute to the development of sex-specific preventive strategies against WMH progression leading to stroke and dementia.

Regarding fasting blood glucose and triglycerides, both β values were relatively low and the Kruskal–Wallis test showed no significant relationships according to the IFG and ALM criteria. Fasting blood glucose and triglycerides may therefore not be heavily involved in WMH onset and development. Conversely, BMI was a powerful negative factor whose effect strengthened with increasing BMI class. BMI is one obesity index; others include waist circumference (WC) and hip-waist ratio (HWR) [25]. Obesity directly depends on the amount of adipose tissue, which can increase without weight gain. Obesity can be evaluated more accurately by WC or HWR than by BMI, especially in the elderly [25]. When WC was used instead of BMI in the multiple regression analysis, it yielded a similar result: WC was a negative factor for WMH. That obesity appears to suppress WMH may be because some growth factors for vessels are reportedly secreted from adipose tissue [26]. For example, angiopoietin-like protein 4 (ANGPTL4) is a member of the angiopoietin family, a secretory glycoprotein highly expressed in adipose tissue, liver, and placenta [27]. ANGPTL4 and/or other vascular trophic factors may be delivered from adipose tissue to the small vessels of the brain and may prevent WMH onset and progression. Further study will be needed to validate this hypothesis.

The existence and progression of WMH in the brain can be visually recognized through WMH-BHQ before or after VRCs such as HT, DM, and DLP emerge, that is, at any stage of morbidity from preclinical to chronic. Visible changes of WMH in the brain may motivate patients to make efforts to control VRCs, which is linked to the prevention of stroke or dementia, and can help determine when to start drug treatments or how to promote nondrug therapies, such as low-calorie and low-salt diets and/or physical exercise. For example, HT diagnosis based on BP values does not reflect the organ damage caused by high BP. If WMH were accurately captured by means of the new metric WMH-BHQ and efficiently treated according to the change in this measure, the damage due to HT in the brain and entire body could be minimized. Thus, WMH-BHQ may be regarded as an effective indicator that can help us to visualize brain damage and estimate the health of the whole body in terms of cerebral small vessel damage. Another advantage of WMH-BHQ is that it is based on percentile and rank order, which reduces measurement bias compared with WMHV, which heavily depends on scanning conditions and the capabilities of the MRI equipment.

Regarding comorbidity, HT+DM yielded lower WMH-BHQs than the other double morbidities (HT+DLP and DM+DLP) as well as the triple morbidity (HT+DM+DLP), although this may be due to a possible ceiling effect of the WMH volume associated with the VRF. Multimorbidity is becoming a global challenge in preventing stroke and extending lifespan [28,29]. A nationally representative cross-sectional study of more than 1.4 million persons in Scotland showed that a diagnosis of stroke became significantly more common as the number of morbidities increased [30]. Additional comorbidities are widely considered to decrease health through a destructive metabolic domino effect [29]. In our study, however, DLP may have had a suppressive or preventive effect on WMH in comorbidity, although the Scotland study did not describe DLP at all. According to the ALM criteria, there were no significant differences in WMH-BHQ. This evidence might imply some effect of statins, the usual drugs for DLP, rather than of DLP pathology. Several meta-analyses of placebo-controlled randomized trials suggest that statins may be beneficial in reducing the overall incidence of stroke [31,32]. It remains to be determined whether statins prevent the onset and development of stroke through suppression of WMH. In our study, the numbers of DLP and DM patients were far smaller than the number with HT. Further validation in a follow-up study requires a larger number of participants with DLP and DM. In the near future, MRI parameters could be assessed through artificial intelligence based on data mining [33,34]. Such an approach, together with the identification of biomarkers based on novel nanotechnology or biomedical engineering platforms, would allow new biosignatures to be proposed for risk stratification in neurovascular patients [35].

#### *Limitations*

The Brain Dock program utilizes an MRI-based approach to preventive medicine that was uniquely developed in Japan, aiming at early detection of unruptured cerebral aneurysms. At Brain Dock, health checks are conducted for a vast number of participants, and therefore, a large database could be built for brain research. Our study covered approximately 9000 participants living in Kochi Prefecture, Japan, and a single MRI machine was used throughout the study. Thus, the selection and information bias in this study could be minimized. However, our study involved a socioeconomic bias, as one-fourth of the participants belonged to the white-collar class, such as public officials with moderate yearly incomes. This proportion is likely to have had a non-negligible socioeconomic impact on WMH and on the onset and progression of VRCs. The use of 1.5 T MRI yields lower measurements of WMH volume than 3 T MRI [36]. Nevertheless, the effect of the difference in magnetic field strength may be minimized in the case of WMH-BHQ, which uses the percentile of WMH volume. Our study was designed as a cross-sectional study and only referred to the associations with WMHs at preclinical and chronic stages of VRCs. The next step will involve a prospective cohort study to establish the causal validity of WMH-BHQ using participants undergoing Brain Dock examinations more than twice.

#### **5. Conclusions**

In this study, we showed that cerebral white matter hyperintensities can be used as a healthcare quotient for quantitatively evaluating vascular risk factors or vascular risk conditions. Because it is easy to interpret (a higher WMH-BHQ is better for the brain in terms of cerebral vascular risk), WMH-BHQ might help both clinicians and patients or other individuals pay attention to reducing cerebral vascular risks.

**Author Contributions:** Conceptualization, K.P. and K.N.; methodology, K.N. and K.Y.; software, F.Y.; validation, M.T., T.A. and M.S.; formal analysis, K.Y.; investigation, M.S.; resources, K.P.; data curation, K.P.; writing—original draft preparation, K.P.; writing—review and editing, K.N.; visualization, A.K.; supervision, M.S.; project administration, K.P.; funding acquisition, Y.Y.

**Funding:** This work was partially supported by the fund of ImPACT Program of Council for Science, Technology and Innovation (Cabinet Office, Government of Japan).

**Acknowledgments:** We thank Miho Kawai for assistance with questionnaires in our Brain Dock.

**Conflicts of Interest:** None of the authors has a financial relationship with a commercial entity that has an interest in the subject of this manuscript.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
