1. Introduction
Migraine is a complex neurovascular disorder that affects more than a billion people worldwide, making it one of the most prevalent neurological conditions globally [1]. The impact of migraine extends far beyond the physical pain it causes, negatively affecting several critical aspects of an individual’s life, such as marriage, parenting, and family relationships, as well as career prospects, financial stability, and general health [2]. The unpredictable nature of migraine attacks is particularly distressing for sufferers, as highlighted in numerous studies [3,4,5,6]. Individuals with migraine are also three times more likely to develop generalized anxiety disorder than the general population.
People affected by migraine have clearly expressed a desire for better management and predictability of their attacks. A survey of 565 people found that 88.8% would like a device to predict migraine attacks, with most preferring a wrist-worn option [7]. Almost all participants in another study [8] agreed that they would want information on when a migraine might occur. Personal health tools such as apps, smartphones, and wearable devices now make it possible to capture objective, prospective data at scale. Such large datasets can be used to develop prediction models with less effort and cost, enabling studies at both the individual and population levels. Biomedical signals measured by wearable devices, such as skin temperature (SkinTEMP), pulse, and sleep quality, can provide valuable information on physiological changes that occur before the onset of migraines.
In this study, we used the newer Empatica Embrace Plus device [9], which offers enhanced features over the commonly used Empatica E4 [10,11,12,13]. Our focus is on understanding pre-migraine symptoms, which could lead to earlier interventions and better patient outcomes [14]. However, the reliability of triggers in predicting an attack varies between and within individuals, adding complexity to the development of prediction models [15].
According to the third edition of the International Classification of Headache Disorders (ICHD-3) [16], some patients experience various prodromal symptoms that can begin up to two days before the onset of headache. In the PRODROME trial [17], researchers analyzed prodromal symptoms in 920 participants who reported that they could reliably identify migraine attacks in which prodromal symptoms were followed by headache; the trial showed that treatment during the prodrome prevents the onset of moderate or severe headache and reduces functional disability. A clearer understanding of pre-headache symptoms could improve our ability to study the mechanisms underlying attack onset and may enable earlier treatment during the prodrome, with the aim of shortening the prodrome, preventing headache, and improving function. Given the complex nature of migraine attacks, more studies are needed at both the individual and population levels [14].
Studies have shown that biomedical signals regulated by the autonomic nervous system (ANS) are altered in chronic diseases, including migraine [18]. Various signal processing techniques and machine learning (ML) models have been explored for predicting migraines [19,20,21,22].
AI models based on deep learning and other machine learning techniques have shown good results in the detection and classification of headache disorders, including migraine [23].
Among the relatively few studies focused on migraine prediction, we selected those that used monitoring through electronic diaries [17,24], wearable sensors [8,10,11,12,13,25], or finger-worn peripheral capillary oxygen saturation (SpO2) devices and ECG patches [18]. These studies are notable for collecting data from participants with migraine and for their significant contributions to the development of migraine prediction models [12,18,25,26,27].
Researchers who have studied migraine prediction have identified specific biomedical signals that are particularly informative for predicting migraine episodes. For example, the authors of [18] highlighted the importance of a multivariate analysis involving heart rate (HR), electrodermal activity (EDA), SkinTEMP, and SpO2 signals for migraine prediction.
To collect their data, the researchers used a finger-worn SpO2 device and affixed ECG patches. The authors of [18] observed a lack of sufficiently useful models for migraine prediction, except for the combination of SkinTEMP, HR, and SpO2. With this combination, four different models performed reasonably well in predicting migraine attacks in two participants with migraine. However, predictions based on the combination of SkinTEMP, EDA, HR, and SpO2 showed a relatively high false positive rate (FPR).
In the study in [25], the authors included eight participants with episodic migraine and recorded real-time hemodynamic signals, including TEMP, HR, and EDA, from a wrist-worn device for 24 h. Personalized prediction models were built with recurrent neural networks, such as long short-term memory (LSTM) networks, to estimate, once per minute, whether pain would appear within the next 120 min. The training data were balanced between pain and no-pain periods. The algorithm predicted migraine attacks with a sensitivity of 95% across the sample.
The study in [12] used the Empatica E4 wristband to collect data from seven volunteers with different types of migraine. What distinguishes this research from previous studies is its exclusive focus on data collected during sleep periods. The authors chose this approach based on findings from a prior study [13], which highlighted the impact of physical activity on heart rate (HR).
The fundamental hypothesis of the study in [12] is that signals acquired during sleep allow a rough estimate of the probability that a migraine attack will occur the following day. To test this hypothesis, sleep data were divided into two classes: (1) nights preceding a day without migraine and (2) nights preceding a day with migraine. Class (2) therefore contained information related to the prodromal stage of a migraine attack. The signals used in the analysis included accelerometer (Acc) data, blood volume pulse (BVP), TEMP, EDA, RR, and pulse rate variability (PRV).
In that study, quadratic discriminant analysis (QDA) and linear discriminant analysis (LDA) were compared, and QDA produced better results than LDA (84.1% vs. 70.2% for personal recognition models). The authors of [12] also tested user-independent recognition models but found that these could not detect migraine attacks: the balanced recognition accuracy was below 50%. To better understand how well a user-independent model can detect migraine attacks in advance, data should be collected from much larger user groups, or the study should focus on only one type of migraine attack [12].
The body of research confirming changes in biomedical signals during migraine episodes, the presence of observable prodromal symptoms in some individuals before migraine onset, and the findings of other migraine prediction studies together suggest that biomedical signals may also be abnormal during the prodromal phase. This hypothesis drove our investigation, in which we drew on the expertise of colleagues who have conducted similar migraine research using modern analytical tools and technologies.
The main contributions of this paper are outlined as follows:
A comprehensive investigation into the application of wearable technology in conjunction with machine learning models to predict migraine attacks by analyzing physiological signals collected during nocturnal sleep.
A detailed examination of the effectiveness of various analysis frames in predicting migraine attacks, highlighting the most suitable time frames for accurate predictions.
The identification of key physiological features with the highest predictive power for detecting abnormalities that precede migraine onset, providing valuable insights into the early detection of migraines.
A foundation for future research, emphasizing the importance of traditional statistical metrics in feature extraction and suggesting that physiological data collected over the investigated periods be integrated to improve prediction models.
2. Methods
Inspiration for this investigation was drawn from the study in [12], which led us to conduct a similar investigation using biomedical signals from nocturnal sleep periods. Nights were labeled based on participants’ migraine diaries: nights preceding a day without migraine were labeled (0) and pre-migraine nights were labeled (1); nights following a day with migraine were labeled (2) and excluded from the investigation because of possible abnormalities. These labels served as the output of the developed ML model. The input comprised 78 features extracted from six signals measured by the Empatica Embrace Plus wristband. A general workflow diagram of the research process is shown in Figure 1.
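The night-labeling rule can be sketched as follows. This is a minimal illustration, assuming that each night is identified by the calendar date on which it starts and that the diary is available as a set of migraine dates. The names `label_nights` and `usable_nights`, and the rule that label 2 takes precedence when a night both follows and precedes a migraine day, are our assumptions rather than details given in the text.

```python
from datetime import date, timedelta

# Label values from the text; the constant names are hypothetical.
NIGHT_BEFORE_MIGRAINE_FREE_DAY = 0
PRE_MIGRAINE_NIGHT = 1
POST_MIGRAINE_NIGHT = 2  # excluded from the analysis


def label_nights(night_dates, migraine_dates):
    """Map each night (keyed by the date it starts) to label 0, 1, or 2.

    Assumption: the night starting on date d follows day d and precedes
    day d + 1. If day d had a migraine, label 2 is used even when day d + 1
    also had one (exclusion takes precedence).
    """
    migraine_dates = set(migraine_dates)
    labels = {}
    for night in night_dates:
        if night in migraine_dates:
            labels[night] = POST_MIGRAINE_NIGHT
        elif night + timedelta(days=1) in migraine_dates:
            labels[night] = PRE_MIGRAINE_NIGHT
        else:
            labels[night] = NIGHT_BEFORE_MIGRAINE_FREE_DAY
    return labels


def usable_nights(labels):
    """Drop post-migraine nights (label 2), keeping labels 0 and 1."""
    return {night: lab for night, lab in labels.items() if lab != POST_MIGRAINE_NIGHT}


# Example: a single migraine on 3 March 2024.
labels = label_nights([date(2024, 3, d) for d in range(1, 6)], [date(2024, 3, 3)])
print(usable_nights(labels))
```

In this example, the night of 2 March is labeled 1, the night of 3 March is labeled 2 and dropped, and the remaining nights are labeled 0.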
An important aspect of this research was the use of precise criteria to define nocturnal sleep data and to segment them into analysis frames of 5, 10, 30, 60, 90, and 120 min. This segmentation expanded the dataset and allowed us to explore how the duration of the analysis frame affected migraine prediction performance.
Specific criteria were established to define nocturnal sleep data, based on the time of recording, the sleep detection stage (SDS) values measured by the Empatica Embrace Plus (which help identify different phases of sleep), and night-time movement. Data were retained only if they (i) were recorded between 7 p.m. and 8 a.m., (ii) had an SDS value of exactly 101, which strictly indicates sleeping, and (iii) involved night-time movement of no more than 20 steps.
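A minimal sketch of the nocturnal-sleep filter, assuming each record carries a timestamp, an SDS value, and a step count. The helper names are hypothetical, and because the text does not state the interval over which the 20-step limit is counted, the sketch applies it per record.

```python
from datetime import datetime

SLEEP_SDS_VALUE = 101   # SDS value that indicates sleep, per the text
MAX_NIGHT_STEPS = 20    # movement limit from the text


def is_nocturnal_sleep(timestamp, sds, steps):
    """Return True if one record satisfies all three nocturnal-sleep criteria.

    Assumption: the 20-step limit is checked per record, because the text
    does not state the interval over which steps are counted.
    """
    in_night_window = timestamp.hour >= 19 or timestamp.hour < 8  # 7 p.m. to 8 a.m.
    return in_night_window and sds == SLEEP_SDS_VALUE and steps <= MAX_NIGHT_STEPS


def filter_nocturnal_sleep(records):
    """Keep (timestamp, sds, steps, ...) tuples that meet the criteria."""
    return [r for r in records if is_nocturnal_sleep(r[0], r[1], r[2])]


record = (datetime(2024, 3, 2, 2, 15), 101, 0)
print(is_nocturnal_sleep(*record))  # True
```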
After identifying the nocturnal sleep data, they were subdivided into analysis frames, upon which feature extraction was subsequently performed. This approach was essential for the operation of the developed framework and the sleep-based ML model.
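The subdivision into analysis frames can be sketched as below. It assumes evenly spaced, contiguous samples (one per minute by default, a hypothetical sampling rate for the digital biomarkers) and discards an incomplete trailing frame; neither detail is specified in the text, and gaps created by the sleep filter are ignored here.

```python
FRAME_MINUTES = (5, 10, 30, 60, 90, 120)


def split_into_frames(samples, frame_minutes, samples_per_minute=1):
    """Split one night's sleep samples into consecutive, non-overlapping frames.

    Assumptions: samples are evenly spaced and contiguous, and an incomplete
    trailing frame is discarded.
    """
    size = frame_minutes * samples_per_minute
    n_full = len(samples) // size
    return [samples[i * size:(i + 1) * size] for i in range(n_full)]


night = list(range(125))  # 125 one-minute samples (hypothetical)
frame_counts = {m: len(split_into_frames(night, m)) for m in FRAME_MINUTES}
print(frame_counts)  # {5: 25, 10: 12, 30: 4, 60: 2, 90: 1, 120: 1}
```

For the same night, 5 min frames yield 25 samples while 120 min frames yield only one, which illustrates why shorter frames expand the dataset.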
2.1. Empatica Embrace Plus Wristband
The Empatica Embrace Plus [9] supports extended data collection of up to 14 days thanks to its long battery life and increased NOR flash memory, and its rapid 90 min charging allows longer monitoring cycles, which was essential to our study.
Featuring multiple sensors, including a 3D accelerometer, a gyroscope, a photoplethysmography (PPG) sensor (up to 64 Hz), a TEMP sensor (1 Hz), and an EDA sensor (up to 4 Hz), the device provides access to raw data, digital biomarkers, and reports via the Empatica Health Monitoring Platform.
Embrace Plus offers four configurations: Pulse Rate Pro, SpO2 Pro, Actigraphy Pro, and Actigraphy Optimized. Its algorithms derive digital biomarkers from raw sensor data, such as RMSSD for PRV, respiratory rate from PPG and Acc data, and continuous monitoring of SpO2.
Additional algorithms provide SDS, activity count (ACT), activity classification, body position, and MET from the 3-axis accelerometer, as well as gait speed, skin conductance level, and continuous SkinTEMP estimation from the EDA sensors. These biomarkers capture vital physiological data, although SpO2 is not included in our professional plan.
In reviewing the Empatica Embrace Plus, we focused on the accuracy of its heart rate algorithm during the daytime, which was questioned in [13,25]. The developers [28] reported that the error of the PR algorithm is less than 3 bpm under no-motion conditions and less than 5 bpm during typical daily activities, both within clinically acceptable ranges. Although our study primarily uses nocturnal sleep data, these findings support the reliability of daytime PR data for further research.
In this study, digital biomarkers obtained from the device were utilized, providing insight into various physiological parameters. The signals used, their physical significance, and units of measurement are described in the following.
Pulse rate (PR): Continuous monitoring of the participant’s heart rate, measured in beats per minute (bpm).
Pulse rate variability (RMSSD): Intermittent monitoring of pulse rate variability, expressed as the root mean square of successive differences between consecutive inter-beat intervals (the intervals between consecutive systolic peaks), measured in milliseconds (ms).
Respiratory rate (RR): Intermittent monitoring of the participant’s breathing rate, measured in breaths per minute (brpm).
Sleep detection: Automatic detection of sleep periods based on physical activity data.
Activity count: Continuous estimation of movement intensity, measured in arbitrary units (a.u.).
Accelerometer magnitude standard deviation: Average standard deviation of Acc data magnitude, measured in g (gravitational force).
Step count: Continuous monitoring of step count, measured in steps.
Metabolic equivalent of task (MET): Continuous estimation of energy consumption during physical activity, measured in MET.
Electrodermal activity (EDA): Continuous monitoring of the participant’s skin conductance level, measured in microsiemens (µS).
Wearing detection: Detection of device wearing status, expressed as the proportion of time the device is worn versus not worn, measured in percentage (%).
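For reference, the RMSSD definition given in the list above can be written in a few lines. The device computes this biomarker itself; the sketch only illustrates the formula, assuming a hypothetical list of inter-beat intervals in milliseconds.

```python
import math


def rmssd(ibi_ms):
    """RMSSD (ms) from consecutive inter-beat intervals given in milliseconds."""
    if len(ibi_ms) < 2:
        raise ValueError("RMSSD needs at least two inter-beat intervals")
    # Successive differences between neighbouring inter-beat intervals.
    diffs = [later - earlier for earlier, later in zip(ibi_ms, ibi_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


print(round(rmssd([800, 810, 790, 800]), 2))  # 14.14
```

For intervals of 800, 810, 790, and 800 ms, the successive differences are 10, −20, and 10 ms, so RMSSD = √(600/3) = √200 ≈ 14.14 ms.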
2.2. Participants
In this study, ten migraine sufferers were enrolled according to specific inclusion criteria to assess the applicability of wearable biosensor technology for predicting migraine attacks by monitoring changes in ANS activity during the prodromal phase. Participants were instructed to wear the device on the wrist of their non-dominant hand until at least three migraine episodes had been recorded. The demographic information and number of monitoring days for each participant are shown in Table 1.
To minimize external influences on the autonomic nervous system, and thus allow the most precise analysis possible, and to avoid bioethical concerns, participants were selected according to the following inclusion and exclusion criteria.
2.2.1. Inclusion Criteria
Age ≥ 18 years;
Diagnosis of episodic migraine with/without aura, meeting ICHD-3 (3rd International Classification of Headache Disorders) criteria;
Frequency of at least 4 migraine attacks per month;
Ability to write, read, and understand the Lithuanian language;
Ability to understand and perform all procedures defined in the study protocol.
2.2.2. Exclusion Criteria
Pregnant and lactating women;
Diagnosis of chronic and/or hemiplegic migraine;
Diagnosis of another headache syndrome, except for episodic tension-type headache with a frequency not exceeding 4 days per month;
Use of preventive migraine treatment;
Use of drugs with an effect on the ANS: antidepressants, cholinergic and anticholinergic drugs, antipsychotics, antihistamines, antihypertensives, antiepileptic drugs, opioids, benzodiazepines, antiparkinsonian drugs, antispasmodics, antiemetics, muscle relaxants;
Any other chronic pain diagnosis.
2.3. Data Pre-Processing
During pre-processing, the objective was to build an individual dataset for each participant’s migraine prediction model. The digital biomarkers, stored in the cloud as daily folders each containing 14 files corresponding to different signals, were merged into a single comprehensive file per participant.
As shown in Table 2, the PRV and RR signals were excluded from the research because of their high proportion of missing values; this pattern was observed consistently across all participants’ data.
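The exclusion of sparsely recorded signals can be sketched as a per-signal missing fraction compared against a threshold. The cutoff `max_missing` is hypothetical, since the text names the excluded signals (PRV and RR) but not a numeric criterion; missing entries are assumed to be `None` or NaN.

```python
import math


def missing_fraction(values):
    """Fraction of entries that are None or NaN (1.0 for an empty signal)."""
    if not values:
        return 1.0
    n_missing = sum(1 for v in values if v is None or (isinstance(v, float) and math.isnan(v)))
    return n_missing / len(values)


def drop_sparse_signals(signals, max_missing):
    """Keep signals whose missing fraction does not exceed max_missing.

    max_missing is a hypothetical threshold; the text names the excluded
    signals (PRV, RR) but not a numeric cutoff.
    """
    return {name: vals for name, vals in signals.items() if missing_fraction(vals) <= max_missing}


signals = {
    "PR": [61.0, 60.5, None, 62.0],
    "PRV": [None, float("nan"), 41.0, None],
    "RR": [None, None, None, 14.0],
}
print(sorted(drop_sparse_signals(signals, max_missing=0.5)))  # ['PR']
```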
The data pre-processing involved removing missing values to ensure that the data were reliable and complete. Data were normalized according to the formula
To eliminate non-informative values in the PR signal resulting from issues such as poor device attachment, the values were clipped to predefined thresholds of 25 to 115 bpm (Figure 2).
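A minimal sketch of the PR cleaning steps described above: missing samples are dropped and the remaining values are clipped to the 25 to 115 bpm range. The function name is hypothetical, and the normalization step is omitted because its formula is not reproduced here.

```python
import math

PR_MIN_BPM, PR_MAX_BPM = 25, 115  # thresholds from the text


def clean_pulse_rate(pr_bpm, low=PR_MIN_BPM, high=PR_MAX_BPM):
    """Drop missing PR samples, then clip the remaining values to [low, high] bpm."""
    present = [v for v in pr_bpm if v is not None and not math.isnan(v)]
    return [min(max(v, low), high) for v in present]


print(clean_pulse_rate([20, 58, None, 130, float("nan"), 72]))  # [25, 58, 115, 72]
```

Clipping replaces out-of-range values with the nearest bound rather than discarding them; if removal were intended instead, the final comprehension would filter rather than clip.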
The features were extracted using the statistical measures described in Section 3 (Feature Extraction). As mentioned above and shown in Figure 1, each night was labeled 0, 1, or 2. Several classifiers were used: Random Forest (100 estimators, Gini criterion), HistGradientBoosting (learning rate 0.1, max depth 3, min samples leaf 20), XGBoost (learning rate 0.1, 100 estimators, max depth 3, mlogloss evaluation metric), a support vector machine (SVM) (C = 1.0, RBF kernel, scale gamma), and k-nearest neighbors (KNN) (5 neighbors, uniform weights, Minkowski metric). The SVM classifies the extracted features by finding the hyperplane that maximizes the margin between the migraine and non-migraine classes; with the RBF kernel, this hyperplane lies in the kernel-induced feature space, an approach used in similar classification tasks [29]. In contrast, XGBoost, a gradient boosting algorithm, builds decision trees sequentially, each new tree fitted to reduce the prediction errors of the current ensemble, which allows it to capture complex patterns in the data [30].
Stratified five-fold cross-validation was used for robust evaluation, computing accuracy, precision, recall, and F1-score. The dataset is divided into five parts of approximately equal size that preserve the class proportions; each part serves once as the test set and four times as part of the training set, so that every data segment is used for both training and testing [31].
Given the class imbalance, random under-sampling was applied to achieve a 5:1 class ratio, and the resulting recall values were compared with those obtained on the unbalanced data.
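Random under-sampling to a 5:1 ratio can be sketched in plain Python. We assume that the ratio means at most five majority-class samples per minority-class sample and that sampling is done without replacement; the function name is hypothetical. The imbalanced-learn package offers the same operation through `RandomUnderSampler(sampling_strategy=0.2)`, where the float is the minority-to-majority ratio after resampling.

```python
import random
from collections import Counter


def random_undersample(y, ratio=5, seed=0):
    """Indices kept after randomly under-sampling the majority class of a
    binary label list so that majority:minority is at most ratio:1."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common()
    rng = random.Random(seed)
    major_idx = [i for i, label in enumerate(y) if label == majority]
    minor_idx = [i for i, label in enumerate(y) if label == minority]
    # Sample without replacement; keep every majority sample if already within the ratio.
    kept_major = rng.sample(major_idx, min(n_major, ratio * n_minor))
    return sorted(minor_idx + kept_major)


y = [0] * 100 + [1] * 10
kept = random_undersample(y)
print(len(kept), sum(y[i] == 0 for i in kept))  # 60 50
```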
An analysis of variance (ANOVA), which is widely used in medical studies [32,33,34], was performed to compare features between nights labeled 0 and 1. The classifiers were trained and evaluated based on F1-scores for the different analysis frames, and the best classifiers are summarized in the results tables.
3. Feature Extraction
The feature extraction phase prepared the data for subsequent analysis and ML applications by extracting features from the biomedical signals EDA, PR, Acc, MET, SkinTEMP, and ACT.
As mentioned above, nocturnal sleep was divided into analysis frames to determine which frame duration was most suitable for predicting migraine attacks. For each analysis frame, the following features were extracted.
Mean: The average level of the signal.
Median: The middle value of the data, a robust measure of central tendency.
Std: The degree of data variability relative to the mean.
Max: The highest recorded value.
Min: The lowest recorded value.
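Clearance factor: The peak value divided by the squared mean of the square roots of the absolute values; highly sensitive to isolated peaks.
Crest factor: The peak value divided by the RMS; indicates how extreme the peaks are relative to the overall signal magnitude.
Impulse factor: The peak value divided by the mean absolute value; indicates abrupt, impulse-like changes in the signal.
Kurtosis: Measures how sharp (heavy-tailed) or flat the distribution is.
Peak value: The largest absolute value in the data.
RMS (root mean square): The square root of the mean of the squared values; reflects the overall magnitude of the signal.
Shape factor: The RMS divided by the mean absolute value; depends on the waveform shape but not on its scale.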
Skewness: Measures the symmetry or asymmetry of the data distribution.
These features were calculated using formulas from [35]. Time-domain features such as mean, median, Std, min, and max have been used in similar research with wearables, for example, stress detection using the Empatica E4 [36] and migraine forecasting [12,37]. Other time-domain features, such as clearance factor, crest factor, impulse factor, peak value, kurtosis, RMS, shape factor, and skewness, have been applied successfully in other domains [38,39]. These latter features are considered more explainable but are not commonly applied to data obtained from wearables [40,41]. This approach allowed us to explore the relationship between these features and migraine occurrence, providing insight into predictive patterns of migraine.
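The 13 features can be computed per analysis frame as sketched below. The exact formulas of [35] are not reproduced in the text, so this sketch uses common textbook definitions as an assumption: population (biased) moments for Std, skewness, and kurtosis, Pearson (non-excess) kurtosis, peak value as the maximum absolute value, and NaN wherever a ratio is undefined (for example, an all-zero ACT frame). The signal order and feature names are hypothetical.

```python
import math
import statistics

SIGNALS = ("EDA", "PR", "Acc", "MET", "SkinTEMP", "ACT")
FEATURES = ("mean", "median", "std", "max", "min", "clearance_factor", "crest_factor",
            "impulse_factor", "kurtosis", "peak_value", "rms", "shape_factor", "skewness")


def _ratio(num, den):
    """num / den, or NaN when the denominator is zero (e.g. an all-zero frame)."""
    return num / den if den != 0 else float("nan")


def frame_features(x):
    """The 13 time-domain features of one analysis frame (a list of numbers)."""
    n = len(x)
    mean = statistics.fmean(x)
    abs_x = [abs(v) for v in x]
    peak = max(abs_x)                                  # peak value = max |x|
    rms = math.sqrt(sum(v * v for v in x) / n)
    mean_abs = sum(abs_x) / n
    sqrt_mean_sq = (sum(math.sqrt(a) for a in abs_x) / n) ** 2
    # Central moments of order 2, 3, and 4 (population definitions).
    m2, m3, m4 = (sum((v - mean) ** k for v in x) / n for k in (2, 3, 4))
    return {
        "mean": mean,
        "median": statistics.median(x),
        "std": math.sqrt(m2),                          # population standard deviation
        "max": max(x),
        "min": min(x),
        "clearance_factor": _ratio(peak, sqrt_mean_sq),
        "crest_factor": _ratio(peak, rms),
        "impulse_factor": _ratio(peak, mean_abs),
        "kurtosis": _ratio(m4, m2 ** 2),               # Pearson (non-excess) kurtosis
        "peak_value": peak,
        "rms": rms,
        "shape_factor": _ratio(rms, mean_abs),
        "skewness": _ratio(m3, m2 ** 1.5),
    }


def feature_vector(frame_by_signal):
    """Concatenate the 13 features of each of the six signals: 6 x 13 = 78 values."""
    vector = []
    for signal in SIGNALS:
        feats = frame_features(frame_by_signal[signal])
        vector.extend(feats[name] for name in FEATURES)
    return vector
```

Frames of different durations simply pass longer or shorter lists to `frame_features`; the output length of `feature_vector` is always 78.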
In the learning process, the input consists of 78 features: the 13 features specified above for each of the six signals (13 × 6 = 78). These features form the basis for training and evaluating a variety of ML models. The result of this method is a night-labeling system.
4. Results
4.1. Feature Ranking and Analysis
In the analysis performed, a variety of physiological, activity, and signal processing measurements were evaluated to determine their relative importance in predicting migraines. ANOVA feature ranking was used to evaluate the impact of the features on the prediction results. The recurrence of each signal source within the selected feature subset is shown in Table 3, and the recurrence of each feature type, independent of the signal source, is shown in Table 4.
Table 3 presents the recurrence results for each signal as a source within the subset of features ranked. From the initial set of 78 features, the 35 most significant were selected using analysis of variance (ANOVA), allowing us to identify the characteristics that have the most substantial impact on the predictive model. The PR and SkinTEMP signals showed the highest frequency of recurrence, highlighting their significant contribution to the overall set of features. In contrast, signals such as ACT and Acc appeared less frequently, indicating their relatively lower influence on the predictive model.
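Selecting the 35 highest-ranked features by ANOVA F-score and counting their signal sources (as in Table 3) and feature types (as in Table 4) might look like this with scikit-learn’s `SelectKBest` and `f_classif`. The `SIGNAL_feature` naming convention is a hypothetical choice that makes the counting straightforward; this is a sketch, not the authors’ pipeline.

```python
from collections import Counter

from sklearn.feature_selection import SelectKBest, f_classif


def select_and_count(X, y, feature_names, k=35):
    """Keep the k features with the largest ANOVA F-scores, then count how often
    each signal source and each feature type appears among them.

    Assumes hypothetical names of the form "SIGNAL_feature", e.g. "PR_median".
    """
    selector = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    chosen = [name for name, keep in zip(feature_names, selector.get_support()) if keep]
    by_signal = Counter(name.split("_", 1)[0] for name in chosen)
    by_feature = Counter(name.split("_", 1)[1] for name in chosen)
    return chosen, by_signal, by_feature
```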
The feature ranking in Table 4 indicates which feature types are most informative for migraine prediction. Features such as median and min were frequently selected for the feature subset, independent of the signal source. In contrast, skewness and kurtosis were selected less often, suggesting that the shape of the data distribution may play a minor role in migraine prediction.
4.2. Classifier Performance Analysis
To evaluate the migraine prediction model, key metrics including accuracy, F1-score, precision, and recall were used. The analysis focused on different analysis frames, with the objective of identifying the best-performing classifier for each interval based on these metrics.
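The four metrics can be computed from confusion-matrix counts as below, treating pre-migraine nights (label 1) as the positive class; recall is the sensitivity emphasized in the following sections. Returning 0 when a denominator is zero is our assumption; scikit-learn’s default behaves similarly but also emits a warning.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (sensitivity), and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Here TP = 2, FP = 1, and FN = 1, so accuracy is 0.75 and precision, recall, and F1 all equal 2/3.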
To identify the most suitable classifier for each analysis frame length, XGBoost, HistGradientBoosting, Random Forest, SVM, and KNN models were trained, and for each analysis frame the model with the highest F1-score was selected.
Random Forest shows strong and consistent performance across multiple participants and analysis frames. XGBoost also performs well, particularly in shorter analysis frames. HistGradientBoosting is another strong contender, particularly in medium to longer analysis frames.
Performance metrics vary significantly between participants. This could be due to individual differences in physiological signals or different levels of data quality and quantity. For some participants, a specific classifier consistently performs better, while for others, the best classifier changes with the duration of the analysis frame.
Shorter analysis frames tend to produce higher F1-scores and recall for most participants, as shown in Table 5 and Table 6. This is likely because shorter frames capture more granular changes in physiological signals. Medium analysis frames show balanced performance, as demonstrated in Table 7 and Table 8, with some drop in F1-score and recall compared with shorter frames, although they remain useful. Longer analysis frames generally result in lower performance metrics (Table 9 and Table 10), as the ability to distinguish pre-migraine nights from nights before migraine-free days diminishes.
The 5 and 10 min frames are generally the best for achieving high F1-scores and recall. These shorter frames capture more detailed variations in the data, which is important for detecting the onset of migraines.
4.3. Rationale for Implementing Cost-Sensitive Learning in Classifier Training
Classification methods based on feature-level approaches and cost-sensitive functions are widely used to deal with problems related to unbalanced datasets [42]. In this context, the decision to apply cost-sensitive learning with a 5:1 ratio in classifier training was based on the unbalanced nature of the dataset. The results shown in the previous section in Table 5, Table 6, Table 7, Table 8, Table 9 and Table 10 were obtained by training the ML models on unbalanced data.
The cost ratio was determined empirically to balance sensitivity and specificity and to avoid model overfitting. For the first participant, there were more migraine samples than non-migraine samples, whereas all other participants had fewer migraine samples than non-migraine samples. The goal of cost-sensitive learning was to increase the model’s sensitivity to migraine samples, that is, its ability to identify migraine cases.
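One common way to implement a 5:1 cost ratio is through per-sample weights, sketched below. We assume the higher cost is assigned to migraine (label 1) samples, since the stated goal was to raise sensitivity to them; the text does not describe the mechanism used, so this is only one plausible realization, and the function name is hypothetical.

```python
def cost_sensitive_weights(y, positive=1, cost_ratio=5.0):
    """Per-sample weights: errors on migraine (positive) samples cost cost_ratio
    times more than errors on non-migraine samples (assumed 5:1)."""
    return [cost_ratio if label == positive else 1.0 for label in y]


weights = cost_sensitive_weights([0, 1, 0, 0, 1])
print(weights)  # [1.0, 5.0, 1.0, 1.0, 5.0]
```

The resulting list can be passed as `sample_weight` to the `fit` method of scikit-learn’s Random Forest, HistGradientBoosting, and SVM classifiers and of `xgboost.XGBClassifier`; `KNeighborsClassifier.fit` does not accept sample weights.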
4.4. Comparative Analysis of Classifier Learning with Cost-Sensitive Approach
Cost-sensitive learning with a 5:1 ratio was introduced in classifier training. The recall of classifiers trained with cost-sensitive learning and with unbalanced data is compared in Figure A1.
The application of cost-sensitive learning in classifier training generally led to an improvement in recall for participants who initially had lower recall metrics with unbalanced data. For example, participants 1, 3, and 10 showed increases in recall in various analysis frames, indicating that cost-sensitive learning effectively improved the model’s ability to identify migraine instances. However, some participants, such as 2 and 8, experienced a decrease in recall in most analysis frames, suggesting a potential trade-off when applying this approach. For others, such as participants 5 and 9, the recall metrics remained largely unchanged, indicating that the effect of cost-sensitive learning can be participant-specific. These variations in recall performance could be attributed to inherent differences in individual physiological responses and the complexity of their data patterns. In general, the cost-sensitive approach appears to benefit those with initially lower recall, highlighting its utility in improving migraine detection in unbalanced datasets.
4.5. ANOVA Feature Analysis: Comparison of Pre-Migraine Night with Night before a Migraine-Free Day
To compare feature values between pre-migraine nights and nights before a migraine-free day, an analysis of variance (ANOVA) was performed [43]. Based on the ANOVA results, heat maps showing the F-statistic and the p-value for each signal feature were generated, as shown in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10, Figure 11, Figure 12, Figure 13 and Figure 14.
The F-statistic measures the ratio of the variability between group means to the variability within groups. A higher F-statistic indicates that the group means differ more strongly relative to the variability within the groups. In this context, it helps determine whether feature values differ significantly between nights labeled 0 and those labeled 1. The F-statistic in ANOVA is calculated as

$$F = \frac{SS_B / df_B}{SS_W / df_W},$$

where
$SS_B$ (sum of squares between groups) is the variability of the group means around the overall mean;
$SS_W$ (sum of squares within groups) is the variability of the observations around their own group mean;
$df_B = k - 1$ and $df_W = N - k$ are the degrees of freedom between and within groups, respectively, with $k$ the number of groups (here, $k = 2$) and $N$ the total number of observations. These formulas were derived and used according to the principles outlined in [44,45].
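The one-way ANOVA F-statistic described here can be computed directly from the group values, as in this plain-Python sketch (the function name is ours). For two groups, such as the values of one feature on nights labeled 0 and 1, the between-group degrees of freedom are 1 and the within-group degrees of freedom are N − 2.

```python
def anova_f(*groups):
    """One-way ANOVA: return (F, df_between, df_within) for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


print(anova_f([1, 2, 3], [4, 5, 6]))  # (13.5, 1, 4)
```

For the groups [1, 2, 3] and [4, 5, 6], the between-group sum of squares is 13.5 and the within-group sum of squares is 4, so F = 13.5 / (4 / 4) = 13.5 with (1, 4) degrees of freedom.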
The p-value is the probability of observing differences between group means at least as large as those actually observed if the null hypothesis (equal group means) were true. A lower p-value indicates stronger evidence that the differences between the groups are statistically significant. In this study, it helps confirm whether the features differ significantly between the two types of nights. The p-value is calculated from the F-distribution:

$$p = P\left(F \ge F_{\text{obs}}\right), \quad F \sim F(df_B, df_W),$$

that is, the probability of obtaining, under the null hypothesis, an F-statistic as extreme as or more extreme than the observed value $F_{\text{obs}}$.
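With SciPy, the F-statistic and its upper-tail p-value for one heat-map cell could be obtained as follows. `scipy.stats.f_oneway` already returns both; the explicit call to `scipy.stats.f.sf` is included only to mirror the definition of the p-value. The helper name, its inputs, and the 0.05 significance flag are illustrative assumptions.

```python
from scipy import stats


def anova_cell(night0_values, night1_values, alpha=0.05):
    """F-statistic, p-value, and significance flag for one feature and frame.

    night0_values / night1_values: feature values from frames of nights labeled
    0 and 1 (hypothetical inputs for one heat-map cell).
    """
    result = stats.f_oneway(night0_values, night1_values)
    df_between = 1                                   # k - 1 with k = 2 groups
    df_within = len(night0_values) + len(night1_values) - 2
    # Upper tail of the F(df_between, df_within) distribution at the observed F.
    p_value = stats.f.sf(result.statistic, df_between, df_within)
    return float(result.statistic), float(p_value), bool(p_value < alpha)
```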
The ANOVA F-statistic and p-value heat maps for the EDA features (Figure 3 and Figure 4) indicate that the shorter analysis frames of 5 and 10 min exhibit higher F-statistic values and significant p-values (less than 0.05) for several features, such as mean, median, max, min, peak value, and RMS. This suggests that these features differ significantly between pre-migraine nights and migraine-free nights within these shorter frames. As the analysis frame length increases to 30, 60, 90, and 120 min, the F-statistic values decrease and the p-values rise above the significance threshold, indicating weaker and less significant between-group differences.
Similarly, for PR features, the F-statistic values are higher, and p-values are significant for shorter analysis frames of 5 and 10 min, particularly for features such as the clearance factor and the shape factor. This trend diminishes with longer analysis frames, where the F-statistic values drop and p-values become non-significant, highlighting that shorter frames are more effective for capturing significant variations in PR related to migraines.
For SkinTEMP features, the ANOVA results in Figure 7 and Figure 8 show high F-statistic values and low p-values for shorter frames, indicating notable differences in features such as mean, median, Std, max, min, clearance factor, crest factor, impulse factor, peak value, and RMS between pre-migraine nights and migraine-free nights. As the frame lengthens, these differences become less significant, as evidenced by lower F-statistic values and higher p-values.
MET features also display higher F-statistic values and significant p-values for shorter frames, with features such as mean, median, min, clearance factor, crest factor, impulse factor, and shape factor showing notable differences. Longer frames see a decline in both the F-statistic values and the significance of the p-values, suggesting that the predictive power of these features decreases over longer periods.
For ACT features, shorter analysis frames again show higher F-statistic values and significant p-values for features such as mean, median, Std, max, min, peak value, and RMS. This trend aligns with the previous observations: longer analysis frames lead to lower F-statistic values and reduced significance, indicating that shorter frames are more suitable for capturing relevant differences.
The ANOVA results for the Acc features, shown in Figure 13 and Figure 14, further reinforce this pattern, with higher F-statistic values and significant p-values for the shorter frames of 5 and 10 min, particularly for features such as mean, median, Std, max, min, peak value, and RMS. As the analysis frames extend to 30, 60, 90, and 120 min, statistical significance wanes, as reflected by lower F-statistic values and higher p-values.
The ANOVA results indicate that shorter analysis frames of 5 and 10 min are more effective at capturing significant variations in physiological features between pre-migraine nights and migraine-free nights. These shorter frames exhibit higher F-statistic values and significant p-values for various features, suggesting that they may contain more information relevant to migraine prediction. Because shorter frames also yield more samples per night, part of this effect may reflect greater statistical power rather than larger between-group differences. As the analysis frame length increases, the significance of the differences diminishes, pointing towards reduced predictive power over longer periods. This insight is important for developing effective migraine prediction models and emphasizes the need to select analysis frame lengths that maximize predictive accuracy and clinical relevance.
EDA, SkinTEMP, and Acc features show the highest F-statistic values and the most significant p-values in the 5 and 10 min frames, making them potentially useful for early detection of migraines. MET and ACT features also show significant differences, especially in the 5 and 30 min frames, further indicating their potential utility in migraine prediction. These findings suggest that shorter analysis frames are generally more effective and that specific signals, such as EDA, SkinTEMP, and Acc data, are particularly valuable for accurate and timely migraine prediction.
4.6. Generalized Model for Predicting Migraines
The final stage of the study involved training ML models on data from all ten participants, using cost-sensitive learning with a 5:1 ratio, and evaluating their performance metrics. Among all the models evaluated, XGBoost achieved the best F1-score for the generalized model across all analysis frames, suggesting that it was the most robust of the evaluated classifiers regardless of analysis-frame duration, as shown in Table 11.
However, the clinical implications of these metrics, particularly recall, warrant careful consideration. Although XGBoost consistently provided the best F1-scores, the recall values reveal substantial limitations in the model’s sensitivity, especially for longer analysis frames. The shorter 5 and 10 min frames perform relatively better, but their recall still needs to improve before the model can support clinically effective migraine prediction.
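The tension between F1-score and recall follows from their definitions. Recall counts only how many true pre-migraine events are caught, whereas F1 also rewards precision. The sketch below uses hypothetical confusion-matrix counts (not values from Table 11) to show how a classifier with few false alarms can still miss most migraine events.

```python
def classification_metrics(tp, fp, fn):
    """Precision, recall (sensitivity), and F1 for the positive (migraine) class."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical counts: 10 true migraine events, 3 detected, 1 false alarm.
precision, recall, f1 = classification_metrics(tp=3, fp=1, fn=7)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.75 recall=0.30 f1=0.43
```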
These results suggest that the current generalized model, while robust in certain metrics, may not be sufficiently sensitive for clinical application, as it risks failing to detect a substantial number of true migraine events. Consequently, there is a pressing need to enhance the model’s sensitivity and ability to distinguish between migraine and non-migraine events. This could involve incorporating additional features, further optimizing the model, or integrating it with other predictive methods to improve its overall effectiveness in a clinical setting.
5. Discussion
The primary objective of this study was to explore the impact of the duration of the analysis frame and specific features on migraine prediction using wearable biosensor data. By investigating various physiological, activity-based, and signal processing metrics, we identified effective classifiers for different analysis intervals. Focusing on pre-migraine nights was crucial for understanding the patterns preceding migraine attacks while minimizing the confounding effects of physical activity on biomarkers.
Our results showed that shorter analysis frames, specifically 5 and 10 min, yielded higher F1-scores and recall values for several participants. This suggests that shorter frames are more effective at capturing the subtle physiological changes that precede migraine attacks. However, the study also highlighted the importance of balancing sensitivity and specificity in migraine prediction, where false positives and false negatives have distinct implications for patient care.
The use of cost-sensitive learning with a 5:1 ratio was a key strategy for addressing the imbalanced nature of the dataset. This approach improved recall for several participants, enhancing the model’s ability to detect migraine instances. However, its effectiveness varied among participants, indicating the need for personalized approaches to classifier training.
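Cost-sensitive learning can be expressed as a class-weighted loss in which errors on the minority class cost more. The sketch below is a minimal weighted binary cross-entropy. It assumes the 5:1 ratio puts the higher weight on the pre-migraine (positive) class. The labelling and the function name are illustrative, not the study's code. Libraries expose the same idea through parameters such as XGBoost's `scale_pos_weight` or the `class_weight` argument of many scikit-learn classifiers.

```python
import math


def weighted_log_loss(y_true, p_pred, pos_weight=5.0, neg_weight=1.0, eps=1e-12):
    """Class-weighted binary cross-entropy, averaged over samples.

    y_true: 1 = pre-migraine frame, 0 = migraine-free (assumed labelling).
    p_pred: predicted probability of class 1.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        if y == 1:
            total += pos_weight * -math.log(p)      # cost of missing a migraine
        else:
            total += neg_weight * -math.log(1 - p)  # cost of a false alarm
    return total / len(y_true)


missed_migraine = weighted_log_loss([1], [0.1])  # positive scored at p = 0.1
false_alarm = weighted_log_loss([0], [0.9])      # negative scored at p = 0.9
print(missed_migraine / false_alarm)  # ~5.0: the miss costs five times more
```

Because a confident miss now costs five times as much as an equally confident false alarm, the model is pushed towards higher recall. This may reduce precision.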
Feature extraction played an important role in this study, and traditional statistical metrics such as minimum and median values proved more predictive than unconventional features. This emphasizes the importance of using a comprehensive range of features to build robust predictive models. Furthermore, integrating frequently updated raw data during the prodromal phase helped capture the effects of daily activities on physiological parameters, thus improving the predictive power of the model.
Compared to the reviewed studies, our results align with those of [18], which highlighted the importance of a multivariate analysis of HR, EDA, SkinTEMP, and SpO2 signals for migraine prediction. That study found that combining these signals improved prediction accuracy, albeit with a relatively high false-positive rate. Our research supports these findings, indicating that EDA, SkinTEMP, and Acc data are significant predictors, especially when analyzed in shorter time frames.
Furthermore, the researchers in [25] demonstrated the effectiveness of personalized prediction models using hemodynamic signals, including TEMP, HR, and EDA, obtained from a wrist-worn device. They achieved a sensitivity of 95% in predicting migraine attacks with LSTM models. Our study supports the use of similar physiological signals and emphasizes the value of shorter analysis frames for improving predictive accuracy. However, it is important to note that [25] trained their models on data balanced between pain and no-pain classes, which can produce artificially high predictive accuracy. It is therefore crucial to identify the best balancing approach and to test it on separate datasets. We will consider investigating LSTM models with other balancing methods.
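One way to avoid inflated estimates is to apply class balancing to the training data only and to evaluate on held-out data that keeps its natural class ratio. The sketch below uses random undersampling purely as an example balancing method. It is not the procedure used in [25] or in this study, and the data are hypothetical.

```python
import random


def undersample_majority(X, y, seed=0):
    """Randomly drop majority-class samples so both classes are equally sized."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical nights: 1 = pre-migraine, 0 = migraine-free.
X_train, y_train = list(range(12)), [1, 1] + [0] * 10
X_test, y_test = list(range(12, 18)), [1] + [0] * 5

X_bal, y_bal = undersample_majority(X_train, y_train)
print(y_bal.count(1), y_bal.count(0))    # 2 2 -> balanced training set
print(y_test.count(1), y_test.count(0))  # 1 5 -> test set left untouched
```

Metrics computed on `X_test`/`y_test` then reflect the imbalance a deployed model would actually face.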
The study methodology, including the exclusion of signals with a high proportion of missing values (such as PRV and RR) and the application of normalization, improved the reliability of the processed data. The decision to focus on nocturnal sleep data, segmented into analysis frames of different lengths, provided a robust framework for feature extraction and model training.
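The two preprocessing steps mentioned here can be sketched as follows. This section does not restate the exact missing-value threshold or normalization method, so the 50% threshold and per-signal z-score normalization below are assumptions for illustration. The signal values are hypothetical.

```python
import statistics


def drop_sparse_signals(signals, max_missing=0.5):
    """Keep signals whose fraction of missing (None) samples is <= max_missing."""
    kept = {}
    for name, values in signals.items():
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[name] = values
    return kept


def z_normalize(values):
    """Z-score normalize the present samples; missing samples stay None."""
    present = [v for v in values if v is not None]
    mu = statistics.fmean(present)
    sigma = statistics.pstdev(present) or 1.0  # guard against constant signals
    return [None if v is None else (v - mu) / sigma for v in values]


signals = {
    "EDA": [1.0, 2.0, 3.0, None],
    "PRV": [None, None, None, 60.0],  # 75% missing -> excluded
}
kept = drop_sparse_signals(signals)
print(list(kept))  # ['EDA']
print(z_normalize(kept["EDA"]))
```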
6. Research Limitations
Despite the promising results, several limitations must be acknowledged. First, the absence of a control group without migraines limits the generalizability of the findings. The study design relied on baseline data from nights before migraine-free days, which may not fully represent the variability in physiological signals among the general population.
Second, the sample size of ten participants, while sufficient for a preliminary analysis, is too small to develop generalized predictive models. Future research should involve a larger and more diverse participant pool to enhance the robustness and applicability of the models.
Third, the study did not incorporate other potential migraine triggers, such as meteorological conditions, space weather, or menstrual cycles, which could influence the accuracy of the predictive models. Including these factors in future research could provide a more comprehensive understanding of migraine triggers and improve the accuracy of prediction.
Fourth, while the study focused on nocturnal sleep data, the possible impact of daytime activities and stress levels on physiological signals was not explored. Future studies should consider incorporating data from both daytime and nighttime periods to develop more holistic predictive models.
Finally, the study examined a limited set of ML algorithms, namely XGBoost, HistGradientBoosting, Random Forest, SVM, and KNN. Because the findings are constrained by this selection, exploring a wider range of classifiers, including newer and more sophisticated algorithms, is necessary to fully understand the potential for accurate migraine prediction. Future studies should therefore evaluate other classifiers to provide a more comprehensive comparison of modeling approaches and their effectiveness in predicting migraines.
7. Conclusions
This study has demonstrated the potential of using wearable biosensor technology and ML models to predict migraine attacks by analyzing physiological signals during nocturnal sleep periods. The findings highlight the importance of shorter analysis frames (5 and 10 min) in capturing pre-migraine abnormalities, with features such as EDA, SkinTEMP, and Acc data showing the highest predictive power.
Although the use of cost-sensitive learning improved the recall metrics for several participants, the variability in effectiveness underscores the need for personalized approaches in classifier training. The study’s emphasis on traditional statistical metrics for feature extraction provides a solid foundation for future research in this area.
To enhance the clinical applicability of migraine prediction models, future studies should involve a larger and more diverse group of participants, incorporate additional migraine triggers, and explore the integration of physiological data during the day and at night. The development of adaptive self-learning models that account for individual variability and external factors holds promise for more personalized and effective migraine management strategies.
In conclusion, the integration of wearable technology and ML offers a promising pathway toward early detection and prevention of migraine attacks. Continued research and refinement of predictive models are essential to realize the full potential of this approach in clinical settings, ultimately improving the quality of life of individuals affected by migraines and other neurological conditions.
Future work should focus on expanding the pool of participants to improve the generalizability of predictive models and incorporate a wider range of migraine triggers, such as meteorological conditions and menstrual cycles. Furthermore, future studies should explore both daytime and nighttime physiological data, as well as a wider array of machine learning algorithms, to develop more robust and accurate migraine prediction models.