*Proceedings* **Web Server and R Library for the Calculation of Markov Chain Molecular Descriptors †**

#### **Paula Carracedo-Reboredo 1, Cristian R. Munteanu 1,2, Humbert González-Díaz 3,4 and Carlos Fernandez-Lozano 1,2,\***


Published: 20 August 2020

**Abstract:** Markov Chain Molecular Descriptors (MCDs) have been widely used to solve Cheminformatics problems, but the software to perform the calculation is not always available to general users. In this work, we developed the first R library for the calculation of MCDs, and we also report the first public web server for the calculation of MCDs online, which includes the calculation of a new class of MCDs called Markov Singular values. We also report the first Cheminformatics study of the biological activity of 5644 compounds against colorectal cancer.

**Keywords:** Markov Chains; online tool; R; colorectal cancer

#### **1. Introduction**

Cheminformatics models are able to predict different outputs in complex molecular systems. Colorectal cancer (CRC) is the third most commonly occurring cancer in men and the second in women, with a mortality of approximately 56% of patients [1]. Although a number of compounds have been synthesized and tested for anti-CRC activity, the probability of finding an effective drug is still very low [2]. Markov Chain Molecular Descriptors (MCDs) have been widely used to solve Cheminformatics problems, but the calculation is often done with specific software that is not always available to general users. In this work, we developed the first R library [3] for the calculation of MCDs and the first public web server for their calculation online, which includes the calculation of a new class of MCDs called Markov Singular values. As a case study, we illustrate the use of these molecular descriptors in the study of active compounds against CRC.

#### **2. Materials and Methods**

We propose an R implementation of the algorithm for the calculation of MCDs that can compute two families of drug topological indices (TIs): Markov Mean Properties (MMPs) and Markov Singular Values of Transition Probabilities (MMSVs). The combination of RMarkov.mol with the RRegrs package yields a powerful and fast R tool for designing QSAR (Quantitative Structure-Activity Relationship) regression models. We obtained 5644 preclinical assays of CRC-active compounds from ChEMBL and calculated the MCDs using our web server to simplify the process.
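The general idea behind both TI families can be illustrated with a minimal sketch. This is not the actual RMarkov.mol code but an illustrative Python version of the underlying calculation, under the usual MCD assumptions: the molecular graph's adjacency matrix is row-normalised into a Markov transition matrix, mean atomic properties are averaged after k transition steps (MMP-style), and the singular values of the transition matrix give MMSV-style descriptors. The function name and the toy three-atom chain are ours.

```python
import numpy as np

def markov_descriptors(adjacency, props, k_max=3):
    """Sketch of MCD-style descriptors for a molecular graph.

    adjacency: symmetric 0/1 bond matrix of the molecular graph
    props:     per-atom property values (e.g., electronegativity)
    Returns the Markov Mean Properties for k = 0..k_max and the
    singular values of the transition matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    # Row-normalise the adjacency matrix into transition probabilities
    P = A / A.sum(axis=1, keepdims=True)
    p = np.asarray(props, dtype=float)
    mmp = []
    Pk = np.eye(len(p))
    for k in range(k_max + 1):
        # Mean property after k Markov steps over the bond network
        mmp.append(float((Pk @ p).mean()))
        Pk = Pk @ P
    # Singular values of P (MMSV-style descriptors)
    sv = np.linalg.svd(P, compute_uv=False)
    return mmp, sv

# Toy example: a linear chain of 3 identical atoms
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
mmp, sv = markov_descriptors(adj, [2.55, 2.55, 2.55])
```

With a constant property vector the mean property is invariant under the chain, which is a quick sanity check on the transition matrix.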

#### **3. Results**

Figure 1 shows the user interface of the Markov Chemical Descriptors Calculator (MCDCalc) web server, which allows the calculation of molecular descriptors for each atomic property and type of atom. SMILES formulas can be read from a text file or pasted individually into the on-screen textbox.


**Figure 1.** Online web server.

We also used the molecular descriptors as input for the RRegrs [4] package in order to find the best regression models. Random Forest (RF), Support Vector Machine (SVM), Neural Network (NN), and Partial Least Squares (PLS) regression methods were tested, and the results are presented in Table 1.


**Table 1.** Results for RF, SVM, NN and PLS.

#### **4. Conclusions**

We have developed the first R library for the calculation of MCDs and the first public web server for the calculation of MCDs online, which includes the calculation of Markov Singular values; these are useful for predicting the activity of anti-colorectal cancer compounds. The RF regression model showed the best results.

#### **References**


*Proceedings* **2020**, *54*, 28


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### **Validation of Self-Quantification Xiaomi Band in a Clinical Sleep Unit †**

#### **Francisco José Martínez-Martínez 1, Patricia Concheiro-Moscoso 1,\*, María Del Carmen Miranda-Duro 1, Francisco Docampo Boedo 2, Francisco Javier Mejuto Muiño 2 and Betania Groba 1**


Published: 21 August 2020

**Abstract:** Polysomnography (PSG) is currently the accepted gold standard for sleep studies, as it measures multiple variables that lead to a clear diagnosis of any sleep disorder. However, it has some clear drawbacks: it can only be performed by qualified technicians, has a high cost and complexity, and is very invasive. In recent years, actigraphy has been used alongside PSG for sleep studies. In this study, we intend to assess the capability of the new Xiaomi Mi Smart Band 5 to be used as an actigraphy tool. Sleep measures from PSG and the Xiaomi Mi Smart Band 5 recorded during the same night will be obtained and analysed to assess their concordance. For this analysis, we perform a paired sample *t*-test to compare the different measures, Bland–Altman plots to evaluate the level of agreement between the Mi Band and PSG, and Epoch-by-Epoch analysis to study the ability of the Mi Band to correctly identify PSG-defined sleep stages. This study belongs to the research field known as participatory health, which aims to offer an innovative healthcare model driven by the patients themselves, leading to civic empowerment and self-management of health.

**Keywords:** sleep; polysomnography; participatory health; Xiaomi Mi Smart Band 5; Internet of Things

#### **1. Introduction**

Sleep has considerable implications for our daily life, and it is crucial to effectively accomplish basic vital functions. People with poor sleep quality exhibit different sleep disturbances, such as fragmentation of the different sleep stages, night arousals and a greater tendency to take diurnal naps. Moreover, in the most serious cases, sleep disorders such as insomnia, hypersomnia or sleep apnoea may also appear [1]. Sleep Units are areas specialised in the diagnosis and treatment of all these sleep disorders and offer a wide range of diagnostic tests (e.g., PSG or the Multiple Sleep Latency Test) [2]. Among these possibilities, PSG is considered by the scientific community to be the most reliable instrument for the measurement of sleep parameters. However, PSG also implies high invasiveness and cost, as well as specialised technicians to operate it. Hence, different approaches to sleep assessment using wearable devices are being compared with PSG in order to achieve the maximum possible quality for sleep studies, so that they can be used as a complementary tool [3]. In this study, the sleep data recorded by Xiaomi wristbands are validated to determine whether these measurements are reliable enough for people to evaluate their own sleep. On this matter, previously conducted studies on this type of wearable have shown high accuracy and sensitivity, low specificity and a poorly significant, limited estimation of sleep/wakefulness states [4].

#### **2. Methods**

#### *2.1. Design of the Study*

This is an observational, analytic, longitudinal pilot study whose aim is to demonstrate that the data collection instruments, along with their management, are viable and effective. Different variables from the population of interest will be observed and recorded without any direct intervention, so as to establish associations between these variables. The study is longitudinal, since the variables will be tracked over six months, continually recording and monitoring sleep quality. A difference of >15 min in deep sleep measures between Xiaomi and PSG will be considered clinically relevant in this study. Accepting an *α* risk of 0.05 and a *β* risk of 0.1 (statistical power of 90%) in a bilateral contrast, a population of 43 patients is needed to detect a difference ≥15 min; following previous studies, an SD of ±30 min will be assumed. Only patients who undergo a medical test at the Sleep Unit of San Rafael Hospital and are >18 years old will be asked to participate. Legal and ethical aspects that guarantee good clinical practice will be followed in this study; therefore, the informed consent process will be carried out with all participants.
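The sample size above follows from the standard normal-approximation formula for a two-sided paired comparison. A minimal Python sketch (the function name is ours; only the standard library is used):

```python
from math import ceil
from statistics import NormalDist

def paired_sample_size(delta, sd, alpha=0.05, power=0.90):
    """Normal-approximation sample size for a two-sided paired test.

    delta: smallest clinically relevant mean difference (here 15 min)
    sd:    assumed SD of the paired differences (here 30 min)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)           # target power
    return ceil(((z_alpha + z_beta) * sd / delta) ** 2)

n = paired_sample_size(delta=15, sd=30)  # → 43 patients
```

This reproduces the 43 patients stated above for α = 0.05, power = 90%, a 15-min difference and a 30-min SD.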

#### *2.2. Data Collection and Analysis*

In this study, PSG data in EDF+ format, the most widely accepted standard for exchanging EEG and PSG data [5], will be obtained from patients who undergo a sleep study at the Sleep Unit of San Rafael Hospital. In addition, patients will be given a Xiaomi Mi Smart Band 5 that will measure their sleep alongside PSG for one night. By doing so, we will be able to compare both recordings in order to assess whether these wearable devices show results in concordance with PSG. Furthermore, software to obtain second-by-second data from Xiaomi bands was developed by the TALIONIS Group, since these wearables export daily data by default, which hinders the analysis.

#### 2.2.1. Variables of Interest

Table 1 shows the features that will be extracted from our data. Numeric variables will be shown as mean (M) and standard deviation (SD), including their range, minimums and maximums.


**Table 1.** Summary of the features of interest for our study.

#### 2.2.2. Statistical Analysis

After preprocessing the data (described in Figure 1), and following the analysis performed in the available literature for these validation studies [6], the summaries of the aforementioned variables for both the Xiaomi Mi Smart Band 5 and PSG will be compared using a paired sample *t*-test to study whether there are significant differences between the means of each Xiaomi-PSG variable. To evaluate the level of agreement between Xiaomi and the equivalent PSG sleep measures, we will use Bland–Altman plots. Since we are interested in the ability of Xiaomi devices to correctly identify sleep stages, Epoch-by-Epoch (EBE) analysis will be used to calculate the sensitivity (proportion of epochs identified as sleep by PSG that are correctly classified by the device), the specificity (proportion of epochs identified as awake by PSG that are correctly classified by the device), and the agreement between PSG and the Xiaomi device in the identification of light sleep (proportion of PSG F1 + F2 epochs identified as light sleep by the device), deep sleep (proportion of PSG F3 + F4 epochs identified as deep sleep by the device) and REM sleep (proportion of PSG REM epochs identified as REM by the device).
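The EBE sensitivity and specificity defined above can be sketched in a few lines. This is an illustrative Python version, not the study's actual analysis code; the per-epoch labels and the function name are ours:

```python
def ebe_metrics(psg, device):
    """Epoch-by-epoch agreement between PSG and wearable labels.

    psg, device: per-epoch labels, 'S' (sleep) or 'W' (wake).
    Sensitivity: PSG sleep epochs the device also scores as sleep.
    Specificity: PSG wake epochs the device also scores as wake.
    """
    pairs = list(zip(psg, device))
    sleep = [d for p, d in pairs if p == "S"]  # device labels on PSG-sleep epochs
    wake = [d for p, d in pairs if p == "W"]   # device labels on PSG-wake epochs
    sensitivity = sleep.count("S") / len(sleep)
    specificity = wake.count("W") / len(wake)
    return sensitivity, specificity

# Toy night of six 30-second epochs
psg =    ["S", "S", "S", "W", "W", "S"]
device = ["S", "S", "W", "W", "S", "S"]
sens, spec = ebe_metrics(psg, device)  # → (0.75, 0.5)
```

The stage-agreement proportions (light, deep, REM) follow the same pattern, restricting the PSG labels to F1 + F2, F3 + F4 or REM epochs respectively.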

**Figure 1.** Sleep data validation workflow.

#### **3. Conclusions**

This study offers a promising approach to assess whether wearable devices, in our case the Xiaomi Mi Smart Band 5, are able to correctly record our sleep. Even though these devices are not expected to replace polysomnography studies, they may be used as an initial evaluation for users to manage their own sleep quality and, if necessary, visit their doctor or simply change some habits.

**Author Contributions:** Conceptualization, F.J.M.-M., P.C.-M. and B.G.; methodology, F.J.M.-M., P.C.-M. and B.G.; investigation, F.J.M.-M., P.C.-M. and B.G.; writing–original draft preparation, F.J.M.-M., P.C.-M. and M.d.C.M.-D.; writing–review and editing, B.G., F.J.M.-M. and F.D.B.; visualization, F.J.M.-M., P.C.-M. and M.d.C.M.-D.; supervision, B.G., F.J.M.-M. and F.D.B.; project administration, B.G., F.J.M.-M. and F.D.B.; funding acquisition, P.C.-M. and M.d.C.M.-D. All authors have read and agreed to the published version of the manuscript.

**Acknowledgments:** The authors disclose the receipt of the following financial support for the research, authorship and/or publication of this article: all the economic costs involved in the study will be borne by the research team. This work is supported in part by grants from the European Social Fund 2014–2020. CITIC (Research Centre of the Galician University System) and the Galician University System (SUG) obtained funds through the European Regional Development Fund (ERDF): 80% from the Operational Programme ERDF Galicia 2014–2020 and the remaining 20% from the Secretaría Xeral de Universidades of the SUG. Specifically, the authors P.C.-M. (Ref. ED481A-2019/069) and M.d.C.M.-D. (Ref. ED481A 2018/205) obtained scholarships to develop their PhD theses. Furthermore, the diffusion and publication of this research is funded by CITIC, Research Centre of the Galician University System, with the support previously mentioned (Ref. ED431G 2019/01).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



### **Designing an Open Source Virtual Assistant †**

#### **Anxo Pérez \*, Paula Lopez-Otero and Javier Parapar**

Information Retrieval Lab, Centro de Investigación en Tecnoloxías da Información e as Comunicacións (CITIC), Universidade da Coruña, 15071 A Coruña, Spain; paula.lopez.otero@udc.es (P.L.-O.); javier.parapar@udc.es (J.P.)

**\*** Correspondence: anxo.pvila@udc.es; Tel.: +34-881-01-1276

† Presented at the 3rd XoveTIC Conference, A Coruña, Spain, 8–9 October 2020.

Published: 21 August 2020

**Abstract:** A chatbot is a type of agent that allows people to interact with an information repository using natural language. Nowadays, chatbots have been incorporated in the form of conversational assistants on the most important mobile and desktop platforms. In this article, we present our design of an assistant developed with open-source and widely used components. Our proposal covers the process end-to-end, from information gathering and processing to visual and speech-based interaction. We have deployed a proof of concept over the website of our Computer Science Faculty.

**Keywords:** conversational assistant; chatbot; question answering; natural language processing; crawling; information retrieval

#### **1. Introduction**

Nowadays, conversational systems are part of our daily routines [1]. Tech giants are aware of their relevance and are incorporating these assistants into their platforms; Microsoft Cortana and Apple Siri are popular examples. Some companies, such as Google with the Assistant or Amazon with Alexa, are even manufacturing dedicated devices. Moreover, these services offer great opportunities for customer support [2]. Many companies' websites are gradually enabling conversational capabilities to help users discover products and information [3,4]. On the other hand, voice commands have gained much traction for user interaction [5], and there is a clear tendency to move towards audio controls over tactile interfaces [6]. The inclusion of voice capabilities was, therefore, a natural improvement for conversational agents. Beyond the business value of the technology, voice-enabled assistants are truly useful for people with functional diversity [7].

In this article, we propose an architecture for a conversational assistant. As mentioned, our proposal covers the process end-to-end: we apply information retrieval, natural language processing, machine learning, and speech technologies, covering everything from data acquisition to user questions and audio responses.

#### **2. Proposal**

As mentioned previously, our architectural design involves all stages of the process. The system covers everything from information gathering and processing to visual- and speech-based interaction. For that, we have used models and techniques from different information processing fields. In this section, we will explain the process we have followed and the description of the technologies used in the development.

A web crawler is in charge of the first step of the information-processing pipeline. In this phase, we retrieve the information from the domain webpages and keep those documents up to date. For this task, we used Scrapy (https://scrapy.org/), a popular web-scraping framework. It saves the data from the Internet and creates a repository containing the files to be indexed. The website (https://www.fic.udc.es/) contains documents in both Spanish and Galician. The second phase corresponds to text processing, which includes sentence splitting and indexing. We used ElasticSearch (https://www.elastic.co/), a distributed search engine based on Lucene, for both indexing and searching, and we propose the use of the ElasticSearch language identification component for tagging the documents. As we were building a conversational system, we indexed the data at both the document and sentence level: indexing isolated sentences allows us to answer many of the user's questions concisely and directly.
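The sentence-level indexing idea can be illustrated with a minimal sketch. This is not the actual ElasticSearch configuration; the naive splitter and the flat index structure are our own simplifications:

```python
import re

def split_sentences(text):
    """Naive sentence splitter (a production pipeline would use a
    proper tokenizer instead of a punctuation regex)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def build_sentence_index(docs):
    """Index each sentence as its own retrievable unit, keeping the
    parent document id so document-level retrieval is still possible."""
    index = []
    for doc_id, text in docs.items():
        for pos, sent in enumerate(split_sentences(text)):
            index.append({"doc": doc_id, "pos": pos, "text": sent})
    return index

docs = {"faq": "The exam is on Monday. Lectures start at 9am."}
units = build_sentence_index(docs)  # two sentence-level units
```

Because each unit carries its parent document id, the same index supports both concise sentence answers and fallback document retrieval.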

Our design accepts the user's input both as written and as spoken queries. For processing voice queries, we used Kaldi (https://kaldi-asr.org/) for Automatic Speech Recognition (ASR). Finally, we provide the system response in text and audio format using Cotovia (http://gtm.uvigo.es/cotovia). Here, we again used language identification models to process user inputs and outputs correctly. For voice interaction, we trained the automatic speech recognition language models with domain-specific lexica [8]. On the voice response side, the system reproduces the responses, selecting the language according to the user input; in this case, we used pre-trained Cotovia models to perform speech synthesis [9].

One crucial problem of spoken document retrieval is term misrecognition, which prevents the system from processing the information need correctly: ASR misrecognition produces a term mismatch between the user input and the document content. To deal with it, we used efficient state-of-the-art retrieval models [10] based on n-gram decomposition. The system processes both the searchable content and the user input in that way to allow fast and robust query matching. These models achieve state-of-the-art effectiveness figures while also being quite efficient [11].
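The n-gram decomposition idea can be sketched as follows. This is an illustrative example, not necessarily the exact retrieval model of [10]; the Dice coefficient and the function names are our own choices:

```python
def char_ngrams(text, n=3):
    """Decompose a string into its set of character n-grams."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def ngram_similarity(query, candidate, n=3):
    """Dice coefficient over character n-grams: tolerant of the small
    spelling distortions that ASR misrecognition introduces."""
    q, c = char_ngrams(query, n), char_ngrams(candidate, n)
    if not q or not c:
        return 0.0
    return 2 * len(q & c) / (len(q) + len(c))

# A misrecognised query still matches the intended term reasonably well,
# while an unrelated term scores much lower
good = ngram_similarity("timetable", "time table")
bad = ngram_similarity("timetable", "library")
```

Matching on sub-word units rather than whole terms is what makes the retrieval robust to single-character ASR errors.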

To answer information needs, we designed a four-level cascade system. First, the system tries to classify the user's intent into one of several predefined structured tasks (e.g., the timetable for a subject or the date of an exam); if the input falls into one of those categories, the answer is processed according to the defined pipelines. Second, if that is not the case, the system attempts to provide a direct answer to the specific user question; for that, we propose two approaches: best-sentence matching and BERT-based question answering [12]. Third, if there is no satisfactory direct answer, the system tries to provide the best document answer. Finally, if the system does not rank any satisfactory documents, it asks the user to reformulate the question.
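The four-level cascade reduces to plain control flow. In this sketch the three callables are hypothetical stand-ins for the real intent classifier, the question-answering model and the retrieval engine, and the keyword-based intent check is a deliberate simplification:

```python
def answer(query, intents, direct_answer, best_document):
    """Four-level answer cascade.

    intents:       maps a recognised intent keyword to a structured handler
    direct_answer: returns a sentence-level answer or None
    best_document: returns a ranked document or None
    """
    # 1. Structured intents (e.g., subject timetable, exam date)
    for intent, handler in intents.items():
        if intent in query.lower():
            return handler(query)
    # 2. Direct (sentence-level) answer
    direct = direct_answer(query)
    if direct:
        return direct
    # 3. Best-ranked document
    doc = best_document(query)
    if doc:
        return doc
    # 4. Ask the user to reformulate
    return "Sorry, could you rephrase your question?"

reply = answer(
    "When is the algorithms exam?",
    intents={"exam": lambda q: "Exam dates are published in the calendar."},
    direct_answer=lambda q: None,
    best_document=lambda q: None,
)
```

Each level only runs when the previous one fails, so cheap structured handlers shield the more expensive QA and retrieval stages.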

Architecturally speaking, we use a basic client-server application. The web client communicates with the backend through REST services and WebSocket APIs. For the web interface, we used BotUI (https://github.com/botui/botui), a very intuitive JavaScript library for conversational interfaces. The server contains the implementation of the different REST endpoints and WebSocket APIs for processing audio streams.

#### **3. Conclusions and Future Work**

In this article, we presented our design of a conversational assistant based on open-source and well-known technologies. Even though we exemplified the design with its implementation for a specific web domain, the architecture introduced here can consume other information repositories, such as enterprise data stores or other databases.

There are many avenues for future work. We propose to improve the architecture with advanced Natural Language Generation (NLG) capacities. The fine-tuning of acoustic models for specific language variants is another interesting research direction. If we were not constrained by the languages to support, such as Galician, we would favor the use of Tacotron as the TTS engine [13].

**Funding:** This work was supported by projects RTI2018-093336-B-C22 (MCIU & ERDF) and GPC ED431B 2019/03 (Xunta de Galicia & ERDF). This work has also received financial support from CITIC, *Centro de Investigación del Sistema universitario de Galicia*, which is financially supported by the *Consellería de Educación, Universidade e Formación Profesional* of the *Xunta de Galicia* through the ERDF (80%) and the *Secretaría Xeral de Universidades* (20%) (Ref. ED431G 2019/01).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



