**1. Introduction**

Wearable systems have been used extensively in the healthcare field for several years. Physical medicine and rehabilitation were the first disciplines to implement these devices, using them to monitor an individual's physical activity in pursuit of better disease diagnosis or patient rehabilitation [1]. With the introduction of the Internet of Things (IoT), wearable devices have been used to collect data from patients, not only to detect motor activity but also to measure blood pressure, heart rate and even glucose level. This has allowed patients to be monitored anytime and anywhere [2]. However, although the sensors are placed on the human body in the most natural way possible to sense its activity, data acquired from this type of sensing are still rare and seldom publicly available [3].

Wearables have been used to detect depression from patients' motor activity, since it is a strong indicator of the presence of this disease. Retardation or a decrease in activity is the main feature of patients suffering from depression; for that reason, collecting data on this condition could yield results that are useful in different applications [4].

According to the World Health Organization (WHO), depression is the leading cause of disability [5]. Seven percent of people around the world suffer from Major Depressive Disorder (MDD), which causes a deterioration in quality of life and an increase in medical costs and the death rate.

MDD is characterized by several symptoms, such as loss of interest and pleasure in daily activities, sleep disorders, weight loss, suicidal ideation and suicide attempts, among others. These symptoms must be present every day for at least two weeks for a patient to be diagnosed as depressive [6]. Depression is a treatable disease, and antidepressant medication and psychotherapy are highly effective; nevertheless, for many diagnosed patients it can take months or even years to heal [7–9].

To diagnose or quantify the severity of MDD, specialists use scales and manuals such as the Hamilton Rating Scale for Depression, written in 1960 [10], the Montgomery and Asberg Depression Rating Scale (MADRS), written in 1979 [11], or the Diagnostic and Statistical Manual of Mental Disorders (DSM). However, the use and interpretation of these methods depend largely on the ability of the specialist to determine the diagnosis [6,12–14]. A weakness of these methods is that they have not been updated and adapted to new advances and discoveries about the disease. In addition, they require the participation of the patient, and in some cases patients do not answer truthfully, which makes the results unreliable.

On the other hand, using data from the monitoring of depressive patients brings several benefits to medical services: mainly, it reduces diagnosis and treatment time, improves patients' quality of life and reduces medical costs [8].

Sensing motor activity is emerging as a favorable way for psychiatry and mental health to detect abnormal behaviors. It has been demonstrated that patients suffering from depression tend to reduce their daytime activity and, due to sleep disorders, increase their nighttime activity [3]. In contrast, patients with bipolar disorder tend to show an increase in energy; however, both scenarios present a discrepancy in motor activity with respect to a healthy person. Therefore, circadian rhythm desynchronization is present in mental illness but is not yet well used for diagnosis or treatment monitoring [15].

The task of collecting motor activity data can be accomplished using sensors such as accelerometers, alone or in combination with Global Positioning System (GPS) receivers, gyroscopes, inclinometers, magnetometers, etc. [1]. Nowadays, most of these technologies are very small, cheap and easy to embed in specific devices or clothes, which facilitates usability and adaptability to everyday life. Mobile phones could replace these sensors; they play a major role in ubiquitous treatment, where the main idea is to avoid the disturbances that sensors or devices could cause patients while still collecting reliable data.

Once the data are collected, the next step is to process them to recognize patterns and obtain statistics or classifications. Sohrab Saeb et al. [8] presented the relation between the regular clinical diagnosis and sensor-based data from patients with depression. They obtained an important result: a correlation between the GPS data and the Patient Health Questionnaire for depression (PHQ-9) scores, which supports the relation between activity and depression [8].

In another work, Enrique Garcia-Ceja et al. [3] used data collected from unipolar, bipolar and healthy control subjects to compare different machine learning algorithms for classifying depressive and non-depressive signals, showing that machine learning techniques make it possible to distinguish between depressive and non-depressive people.

Machine learning is a set of algorithms that learn from the analyzed data to build trained models that classify that type of data. Among other applications, it allows diagnoses to be made or even some diseases to be predicted [16]. These methods are commonly used in a data mining process, which involves a series of interrelated steps with the final objective of acquiring valuable information.

Machine learning methods are increasingly used. For example, EEG-based machine learning provides a non-invasive method to automatically diagnose MDD using algorithms such as Linear Regression and Naive Bayes [17].

In this paper, a data mining process is carried out to classify depressive episodes using data collected during the night, during the day and over the full 24 h. Comparing the classifications obtained from data collected at different times gives a better picture of the disease and of the behavior of diagnosed patients.
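As an illustration of how per-minute activity records could be separated into the three periods compared in this paper, the following minimal sketch groups readings into day, night and full-day sets. The boundary hours (`DAY_START_HOUR`, `DAY_END_HOUR`), the function name `split_by_period` and the sample readings are assumptions introduced for the example; the text does not specify which hours count as day or night.

```python
from datetime import datetime

# Assumed boundaries: the text does not state which hours count as "day".
DAY_START_HOUR = 8   # hypothetical, inclusive
DAY_END_HOUR = 20    # hypothetical, exclusive


def split_by_period(records):
    """Group (timestamp, activity) pairs into 'day', 'night' and 'full' lists."""
    periods = {"day": [], "night": [], "full": []}
    for timestamp, activity in records:
        periods["full"].append(activity)
        if DAY_START_HOUR <= timestamp.hour < DAY_END_HOUR:
            periods["day"].append(activity)
        else:
            periods["night"].append(activity)
    return periods


# Made-up readings, used only to exercise the function.
records = [
    (datetime(2003, 5, 7, 3, 0), 5),
    (datetime(2003, 5, 7, 12, 0), 120),
    (datetime(2003, 5, 7, 23, 30), 40),
]
periods = split_by_period(records)
```

Keeping the full-day list alongside the two partial ones makes it straightforward to run the same downstream steps on each period and compare the resulting classifications.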

The paper is structured into Materials and Methods, Results, Discussion and Conclusion sections. The Materials and Methods section describes, step by step, the data mining process used to classify depressive and non-depressive episodes.

## **2. Materials and Methods**

The data mining process [18] shown in Figure 1 is followed to classify depressive episodes. The first stage consists of data collection, where the Depresjon dataset, which contains information on depression episodes from patients, is acquired. These data are submitted to a pre-processing step in order to clean, normalize and segment them into one-hour intervals. Then, feature extraction is applied, where a set of 24 features in the time and frequency domains is obtained for different stages of the day (day, night and full day). Feature selection based on a forward selection (FS) approach is subsequently performed to reduce the number of features and to avoid redundant or non-significant information. From the selected features, a classification step based on the random forest (RF) algorithm is applied to develop a series of generalized models that distinguish between healthy and depressed patients according to their motor activity. Finally, these models are validated by a statistical analysis.
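The pre-processing and feature extraction stages can be sketched as follows. This is a minimal illustration, not the implementation used in this work: min-max scaling is assumed as the normalization, the four features shown are only examples and do not reproduce the 24 features of the paper, and all function names are hypothetical. The series is assumed to hold one reading per minute, so a one-hour segment has 60 samples.

```python
import cmath
import math
from statistics import mean, pstdev


def min_max_normalize(values):
    """Scale a series to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def segment_hourly(activity, samples_per_hour=60):
    """Split a per-minute series into complete one-hour segments."""
    n_full = len(activity) // samples_per_hour
    return [
        activity[i * samples_per_hour:(i + 1) * samples_per_hour]
        for i in range(n_full)
    ]


def dft_magnitudes(segment):
    """Magnitudes of the discrete Fourier transform, non-negative frequencies."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n // 2 + 1)
    ]


def extract_features(segment):
    """Compute a few illustrative time- and frequency-domain features."""
    mags = dft_magnitudes(segment)
    return {
        "mean": mean(segment),                          # time domain
        "std": pstdev(segment),                         # time domain
        "spectral_energy": sum(m * m for m in mags[1:]),  # non-DC bins only
        "dominant_bin": max(range(1, len(mags)), key=lambda k: mags[k]),
    }


# Synthetic signal with three cycles per hour; the trailing partial hour is dropped.
activity = [1 + math.sin(2 * math.pi * 3 * t / 60) for t in range(130)]
segments = segment_hourly(min_max_normalize(activity))
features = [extract_features(s) for s in segments]
```

The direct DFT is quadratic in the segment length, which is acceptable for 60-sample segments; a fast Fourier transform routine would be the usual choice for longer windows.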


**Figure 1.** Data mining process used in this paper to classify depressive and non-depressive episodes.
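The feature selection stage of the process can be illustrated with a greedy forward selection loop, which adds one feature at a time as long as a scoring function improves. The sketch below is written under stated assumptions: to stay dependency-free, it scores feature subsets with the leave-one-out accuracy of a simple nearest-centroid classifier, which is a stand-in and not the random forest used in this work. The function names, the `min_gain` threshold and the toy data are hypothetical; a random forest scorer with the same signature could be passed as `score_fn` instead.

```python
def nearest_centroid_accuracy(rows, labels, feature_idx):
    """Leave-one-out accuracy of a nearest-centroid classifier on chosen columns."""
    correct = 0
    for i in range(len(rows)):
        # Build class centroids from every sample except the held-out one.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(rows, labels)):
            if j == i:
                continue
            vec = [row[k] for k in feature_idx]
            acc = sums.setdefault(label, [0.0] * len(vec))
            for d, v in enumerate(vec):
                acc[d] += v
            counts[label] = counts.get(label, 0) + 1
        held_out = [rows[i][k] for k in feature_idx]

        def dist(label):
            return sum((s / counts[label] - x) ** 2
                       for s, x in zip(sums[label], held_out))

        predicted = min(sums, key=dist)
        correct += predicted == labels[i]
    return correct / len(rows)


def forward_selection(rows, labels, score_fn, min_gain=1e-9):
    """Greedily add the feature that most improves score_fn; stop when none helps."""
    remaining = list(range(len(rows[0])))
    selected, best_score = [], 0.0
    while remaining:
        scored = [(score_fn(rows, labels, selected + [k]), k) for k in remaining]
        score, k = max(scored, key=lambda pair: pair[0])
        if score - best_score < min_gain:
            break
        selected.append(k)
        remaining.remove(k)
        best_score = score
    return selected, best_score


# Toy data: column 0 separates the classes, column 1 is noise, column 2 is constant.
rows = [
    [0.10, 5.0, 1.0], [0.20, 1.0, 1.0], [0.15, 3.0, 1.0],
    [0.90, 4.0, 1.0], [0.80, 2.0, 1.0], [0.85, 5.5, 1.0],
]
labels = [0, 0, 0, 1, 1, 1]
selected, best_score = forward_selection(rows, labels, nearest_centroid_accuracy)
```

On the toy data only the informative column is retained, since adding either of the other two does not raise the score; this is the behavior that lets forward selection discard redundant or non-significant features.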

## *2.1. Dataset Description*

In this work, the Depresjon dataset is used to classify depressive episodes. It comprises the motor activity of 23 patients diagnosed as bipolar II, unipolar depressive or bipolar I (all labeled as condition), and 32 non-depressive control subjects.

The motor activity corresponds to a weighting voltage collected with an actigraph watch (Actiwatch, Cambridge Neurotechnology Ltd., England, model AW4) worn on the right wrist, which records movements over 0.5 g at a sampling frequency of 32 Hz. Some advantages of actigraph accelerometers are that they are inexpensive and well-known activity trackers [19] and, in addition, that they are easy to wear and allow data to be collected during both day and night [20].

The dataset is organized into several files. One set contains a CSV file for each condition and control subject with the recorded motor activity, organized in three columns: timestamp (one-minute intervals), date (date of measurement) and activity (activity measurement from the actigraph watch). A scores file is also included, which provides information about every subject. This file includes the columns: number (contributor id), days (number of days of data collection), gender (1 or 2 for female or male), age, afftype (1: bipolar II, 2: unipolar depressive, 3: bipolar I), melanch (1: melancholia, 2: no melancholia), inpatient (1: inpatient, 2: outpatient), edu (education), marriage (1: married or cohabiting, 2: single), work (1: working or studying, 2: unemployed/sick leave/pension), madrs1 (MADRS score when measurement started) and madrs2 (MADRS score when measurement stopped).
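A file with the three-column layout described above could be read as in the following sketch. The sample rows are made up and only mimic that layout, the timestamp format is an assumption, and `read_activity` and `AFFTYPE_LABELS` are hypothetical names; the actual files should be checked against the dataset documentation before this is reused.

```python
import csv
import io
from datetime import datetime

# Made-up rows that only mimic the three-column layout described above.
SAMPLE_ACTIVITY_CSV = """timestamp,date,activity
2003-05-07 12:00:00,2003-05-07,0
2003-05-07 12:01:00,2003-05-07,143
2003-05-07 12:02:00,2003-05-07,20
"""

# Coding of the afftype column, as listed in the scores file description.
AFFTYPE_LABELS = {1: "bipolar II", 2: "unipolar depressive", 3: "bipolar I"}


def read_activity(text_stream, timestamp_format="%Y-%m-%d %H:%M:%S"):
    """Parse an activity file into (timestamp, activity) pairs.

    The timestamp format is an assumption, not taken from the dataset.
    """
    reader = csv.DictReader(text_stream)
    return [
        (datetime.strptime(row["timestamp"], timestamp_format),
         int(row["activity"]))
        for row in reader
    ]


records = read_activity(io.StringIO(SAMPLE_ACTIVITY_CSV))
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, where `path` points to one of the per-subject files.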

Features describing the date, the timestamp and whether or not the day falls on a weekend are not taken into account for the classification.

The package with the dataset files and a full description of the data can be downloaded from http://doi.org/10.5281/zenodo.1219550 [15].
