Fusing Wearable Biosensors with Artificial Intelligence for Mental Health Monitoring: A Systematic Review
Abstract
1. Introduction
1.1. Background
1.2. Objectives
- What types of biosensors and data collection methods are most commonly used for AI-based monitoring of mental health conditions such as depression, stress, and anxiety?
- What methodological challenges and gaps exist for research at the intersection of biosensing, AI, and mental health?
2. Materials and Methods
3. Results
3.1. Stress
Article/Year Application | Sample Size | Wearable/ Feature Space | Prediction Task/ AI Details | Env. | Labels/ Monitoring Details | Research Challenges |
---|---|---|---|---|---|---|
(Akbulut et al., 2020) [17] Real-time stress monitoring and management | 30 | ❖ CVDiMo wearable sensor device | ❖ Multiclass classification ● A feed-forward neural network (FFNN) with 2 hidden layers and linear output ● 92% (metabolic syndrome) and 89% (others) accuracy | IL | ● Labels: self-reported emotional ratings (1–10) after emotion-inducing videos ● 6 labels/participant (one for each video) ● Monitoring: approximately 1 h | ● GSR/EDA biases challenge accuracy ● Small sample size limits generalizability ● Larger datasets needed for robust, broader analysis |
(Anusha et al., 2019) [18] Pre-surgery stress detection | 41 | ❖ ADI-VSM device | ❖ Multiclass classification ● Localized supervised learning with adaptive dataset partitioning ● 97.83% accuracy; 85.06% without person-specific factors | IL | ● Labels: State–Trait Anxiety Inventory (STAI) and salivary cortisol levels ● 2 STAI surveys assessed stress levels (low, moderate, and high) ● Monitoring: approximately 3 h before surgery | ● Stress response variability limits applicability ● Clinical conditions affect detection accuracy ● Small, surgery-specific sample size |
(Aristizabal et al., 2021) [19] Wearable and self-report stress detection | 18 | ❖ Empatica E4 | ❖ Binary classification ● Supervised deep neural networks with logistic generalized estimating equation (GEE) for stress estimation ● 96% accuracy (wearable + survey); 88% accuracy (wearable only) | IL | ● Labels: self-reported stress/anxiety (STAI and PSS-10) and salivary cortisol ● 8 labels/participant: 4 cortisol samples; 4 self-reports ● Monitoring: 2 h monitoring with 4 prompts (baseline, post-stress, and recovery) | ● Small sample size ● Imbalanced data labels ● Data showed high variability ● Research-grade wearable limits translatability ● Stress detection focused on induced stress in controlled settings |
(Barki and Chung, 2023) [20] Mental stress detection | 14 | ❖ In-ear device (earbud) PPG system | ❖ Binary classification ● Supervised convolutional neural network (CNN) ● 92.04% accuracy, 90.80% F1-score; 96.02% accuracy with white-Gaussian noise | IL | ● Labels: stress/no-stress classifications based on PPG signals and mental stress tasks (Stroop test; arithmetic) ● 2 labels/participant (stressed and non-stressed) ● Monitoring: 3 min per task | ● Small sample size limits generalizability ● Limited stressors reduce real-world applicability ● Further research needed for diverse settings and populations |
(Betti et al., 2017) [21] Stress detection via physiological signals correlated with salivary cortisol using PCA analysis for early warnings | 15 | ❖ Zephyr BioHarness 3 (chest belt for HRV), Shimmer Sensor (EDA/GSR monitoring), and MindWave Mobile EEG headset (EEG) | ❖ Binary classification ● Supervised ML ● 84% sensitivity, 90% specificity, and 86% overall accuracy | IL | ● Labels: salivary cortisol levels during the Maastricht Acute Stress Test (MAST) ● 5 salivary cortisol samples and multiple physiological data points per participant ● Monitoring: approximately 1 h | ● Small sample size limits generalizability ● Offline processing hinders real-time application ● MAST protocol complexity and cultural adjustments affect reproducibility |
(Booth et al., 2022) [22] Predicting perceived stress | 606 | ❖ Garmin Vivosmart 3 | ❖ Binary classification ● Supervised ML; multimodal ● 62% accuracy, 65% precision, and 89% recall | IW | ● Labels: daily self-reported perceived stress using a 5-point Likert scale ● 56 labels/participant (daily for 56 days) ● Monitoring: approximately 2 months ● Prompting: 1–3 times daily via SMS (at 8 am, noon, or 4 pm) to complete the surveys | ● Low resolution limits daily stress assessment ● More contextual data on activities and environment needed ● Inadequate data limits GRU/LSTM; stronger temporal models required ● Non-compliance and contextual variability impact predictions |
(Campanella et al., 2023) [23] Stress detection | 29 | ❖ Empatica E4 bracelet | ❖ Binary classification ● Supervised ML ● 71% precision, 60% recall, and 65% F1-score (random forest (RF)) | IL | ● Labels: stress/no-stress classifications during cognitive, social, and physiological stress tasks ● 2 labels/participant (stressed and non-stressed) ● Monitoring: approximately 15–20 min per task | ● Unbalanced data impacts classifier performance ● Small sample size and lab setting limit generalizability ● Noise from Empatica E4 and lack of real-life stressors reduce accuracy ● Larger, diverse samples and varied stressors needed for robustness |
(Can et al., 2019) [24] Real-life stress monitoring system during a programming contest | 21 | ❖ Samsung Gear S1, S2, and S3 (HRV) and Empatica E4 (EDA/GSR, ST, ACC) | ❖ Multiclass classification ● Supervised ML ● 86–92% accuracy (separate vs. three-model); 88–97% accuracy (general vs. person-specific models) | IL | ● Labels: self-reported stress levels using the STAI ● 2 stress-related labels/participant (before and after the stress task) ● Monitoring: approximately 30 min | ● Device-related data quality variations impact accuracy ● Inconsistencies from subjective self-reported stress ● Future efforts focus on better devices and refined models for subjective stress data |
(Can et al., 2020) [25] Smartwatch-based stress detection validated with primary school teachers | 32 | ❖ Samsung Gear S1, S2 (HRV), and Empatica E4 (EDA/GSR, ST, and ACC) | ❖ Multiclass classification ● Personalized models ● Up to 94.44% accuracy (HR signal); 100% accuracy (EDA/GSR signal) | IL | ● Labels: perceived stress (NASA-TLX) during lecture, exam, and recovery ● 3 labels/participant ● Monitoring: approximately 1.5 to 2 h ● Prompting: after each session to complete surveys | ● Lower smartwatch data quality mitigated by artifact detection ● Notifications risk increasing stress without interventions ● Limited generalizability due to specific participant group and setting |
(Chen and Lee, 2023) [26] Stress measured in students using Sudoku in distracting environments for real-time educational assessment | 30 | ❖ Polar Verity Sense (PPG), BMD101 (ECG), and NeuroSky MindWave Mobile 2 (EEG) | ❖ Multiclass classification ● StressNeXt Model, Attention-LRCN, and Self-Supervised CNN ● 95–99% accuracy; 93–96% F1-score | IL | ● Labels: self-reported stress (3-point scale) after Sudoku in noisy, monitored, and comforting scenarios ● 3 labels/participant (one for each scenario) ● Monitoring: approximately 45 min (15 min per Sudoku puzzle scenario) | ● Self-reported stress bias affects reliability ● Adding GSR/EDA and respiratory signals can improve accuracy ● Larger datasets needed for robustness and broader applicability |
(Golgouneh et al., 2019) [27] Portable stress monitoring system for continuous stress index (SI) estimation and classification | 37 | ❖ Custom-fabricated wearable devices featuring PPG and GSR/EDA sensors developed by the RCDAT | ❖ Multiclass classification ● Classical machine learning: K-nearest neighbor (KNN), ANNs, Naive Bayes (NB), and support vector machine (SVM) used for stress classification ● Best performance: KNN (K = 3) with 85.3% accuracy | IL | ● Labels: self-reported stress/relaxation levels on a scale of 0 to 4, completed by participants after each test ● 3 labels/participant (relaxed, normal, and stressed) ● Monitoring: approximately 30 min during the stress-inducing tasks | ● Limited generalizability due to sensor types, conditions, and participant age range ● Validation needed on diverse datasets and sensors ● Computational complexity for real-time use not fully evaluated |
(Halim and Rehan, 2020) [28] Wearable system identifies driver emotions using hemispheric asymmetry to link brain dynamics with stress | 86 | ❖ EMOTIV EPOC+ EEG headset, a 16-channel device | ❖ Binary classification ● SVM, ANN, and RF evaluated using precision, sensitivity, specificity, F-measure, and G mean ● 97.95% accuracy, 89.23% precision, and 94.92% specificity | IL | ● Labels: self-reported emotional states (stress/relaxation) based on the Self-Assessment Manikin (SAM) after driving sessions ● 2 labels/participant (relaxed and stressed) ● Monitoring: approximately 30 min per session | ● Excludes fatigue, drowsiness, and alcohol effects ● Self-reports as ground truth introduce subjectivity ● Lab setting limits real-world applicability |
(Bin Heyat et al., 2022) [29] Monitoring researchers’ mental stress with an automatic stress detection system | 20 | ❖ A smart T-shirt developed by Hexin Medical Co. Ltd., featuring silver-coated flexible dry electrodes | ❖ Binary classification ● Decision Tree (DT), NB, RF, and Logistic Regression (LR) to classify the intra-subject (mental stress and normal) and inter-subject classification ● Intra-subject: 93.30% accuracy, 96.70% specificity, and 93.50% F1-score; inter-subject: 94.10% accuracy | IL | ● Labels: self-reported emotional states (stress/relaxation) using the SAM and anxiety levels using STAI ● 2 labels/participant (one for each driving session: relaxed and stressed) ● Monitoring: approximately 30 min per session | ● Data from 20 subjects limits generalizability ● Single-signal focus overlooks multi-signal interactions ● Small, homogeneous sample risks overfitting ● Unspecified gender distribution affects generalizability |
(Kikhia et al., 2016) [30] Detecting stress in dementia patients, with personalized stress threshold settings | 6 | ❖ Philips DTI-2 wristband sensor | ❖ Binary classification ● Base learners: neural networks, splines, ridge regression, RF, GLMs, Gaussian process, XGBoost, KNN, and SVM ● Four-fold cross-validation used to predict anxiety symptom changes between MIDUS-1 and MIDUS-3 ● 89% accuracy at the highest threshold | IW | ● Labels: stress monitored via wearables; caregiver observations as ground truth ● 2 labels/participant (stressed/not stressed) based on the analysis of physiological data and clinical observations ● Monitoring: 2 months | ● Limited participants reduce generalizability ● GSR/EDA-only algorithm with five severity levels oversimplifies stress ● Additional physiological data needed for better accuracy |
(Kim et al., 2020) [31] Mental stress assessment and monitoring | 21 | ❖ EGG: BIOPAC EGG100C with disposable electrodes; ECG and RESP: BIOPAC BN-RSPEC and BN-RESP-XDCR with elastic bands | ❖ Multiclass classification ● Conventional machine learning models: SVM, LR, RF, DT, and KNN were used as classifier models ● 70.15% accuracy; 0.741 AUC | IL | ● Labels: perceived stress (10-point VAS) before and after tasks ● 2 labels/participant ● Monitoring: approximately 50 min during the relaxation and stress-inducing tasks (arithmetic and Stroop tasks) | ● Small sample size limits the generalizability ● Protocol-based categorical stress levels restrict analysis depth ● Wired equipment limits real-time monitoring |
(Nath et al., 2022) [32] Stress classification framework for older adults | 19 | ❖ Shimmer3 GSR+ | ❖ Binary classification ● DT, KNN, and Probability- and Kernel-based classifier ● 0.95 F1-Micro, 0.87 Macro, and 0.81 AUC | IL | ● Labels: salivary cortisol levels recorded during the Trier Social Stress Test (TSST) ● 5 labels/participant (based on cortisol samples at different stages: baseline, after stress induction, and recovery) ● Monitoring: approximately 1 h | ● Cortisol-only stress definition may miss rapid indicators ● Continuous EEG is impractical for older adults, risking frustration and anxiety |
(Perez et al., 2018) [33] Stress estimation on students in controlled environments like classrooms | 12 | ❖ COTS wrist wearable devices equipped with HR, GSR/EDA, ST, and ACC sensors | ❖ Multiclass classification ● Supervised ML models ● 99–100% accuracy | IL and IW | ● Labels: self-reported stress (5-point Likert) after each task ● 6 labels/participant across lab and classroom phases ● Monitoring: approximately 30–40 min per session over multiple sessions | ● Subject variability and device differences hinder replication ● Wearables outside the classroom could capture greater physiological variation over time |
(Rescio et al., 2023) [34] Non-invasive stress detection in industrial settings | 20 | ❖ Shimmer GSR | ❖ Binary classification ● Supervised (DT, RF, and KNN) and unsupervised (K-means, GMM, and SOM) algorithms evaluated ● GMM achieved 77.4% accuracy (one level) and 75.1% (two levels) | IL | ● Labels: self-reported stress (5-point scale) after each task ● 4 labels/participant (one after each stress-inducing task) ● Monitoring: approximately 40 min during stress tasks (Trier Social Stress Test, Stroop Test, and Math Test) | ● Lab conditions may not reflect real industrial complexities ● Unsupervised learning had lower accuracy than supervised methods |
(Ribeiro et al., 2023) [35] Five-level stress classification | 16 | ❖ MAX30102 PPG sensor for HR, HRV, and SpO2, Grove GSR sensor for GSR/EDA, and an infrared sensor for skin temperature | ❖ Multiclass classification ● Fuzzy logic ● 75% sensitivity, 97% specificity, and 93% accuracy | IL | ● Labels: self-reported stress (5-point scale) after thermal stress phases ● 5 labels/participant (rest, cold stress, recovery, heat stress, and final recovery) ● Monitoring: approximately 27 min | ● Thermal stress focus limits generalization ● System complexity and size hinder daily use ● Costly hardware reduces accessibility ● Binary relevance overlooks stress indicator correlations |
(Sevil et al., 2020) [36] Real-time detection of physical activity and acute psychological stress | 24 | ❖ Empatica E4 | ❖ Multiclass classification ● Ensemble models ● 99.30% PA accuracy; 92.70% APS accuracy. 89.90% accuracy for simultaneous occurrences of both PA and APS | IL | ● Labels: self-reported stress levels ● 207 experiments with stress labels from physical and psychological tasks (e.g., treadmill, bike, and mental/emotional stress) ● Monitoring: 20–60 min per experiment | ● Limited data volume favored simpler ML techniques over advanced deep learning models ● Larger datasets are needed in future research to effectively train and evaluate deep learning models |
(Tonacci et al., 2020) [37] Predicting stress reduction following relaxation at workplace | 24 | ❖ Shimmer 3 GSR+ | ❖ Binary classification ● Matlab “Classification Learner” trained classifiers on autonomic features from ECG and GSR/EDA ● 79.2% accuracy | IL | ● Labels: self-reported anxiety (STAI; VAS-A) before and after relaxation ● 2 labels/participant ● Monitoring: 12 min (baseline, audio–video relaxation, rest, and video/audio-only relaxation) | ● Predominantly female sample limits generalizability ● Small sample size restricts demographic analysis and model training ● Short protocol duration limits autonomic changes ● Non-optimized YouTube media may affect intervention efficacy |
(Toshnazarov et al., 2024) [38] Accurate stress detection in natural settings | IL: 26 IW: 28 | ❖ Samsung Galaxy Watch 5, Polar H10 chest strap | ❖ Binary classification ● The ML models include AdaBoost, GB, LR, MLP, RF, SVM, and extreme GB (XGBoost) ● 84% F1-score (lab); 71% F1-score (real-life) | IL and IW | ● Labels: self-reported stress ecological momentary assessments (EMA) via smartphones ● 12 labels/participant daily, randomized intervals ● 2-week continuous monitoring (Samsung smartwatch, and smartphone) ● Lab stress tasks: public speaking (9 min), cognitive (8 min), and physical (4 min), with 30 min rest ● Prompting: 12 times/day for stress and contextual data | ● Daily-life variability may affect SOSW performance ● Small sample size limits generalizability; larger data needed ● Continuous sensing may impact smartwatch battery life ● Data transmission issues could disrupt capture ● Results depend on smartwatch model; sensor accuracy may vary |
(Tutunji et al., 2023) [39] Passive stress detection, supporting personalized psychiatry | 83 | ❖ Empatica E4 | ❖ Binary classification ● Generalized linear mixed-effects models (GLMM) with maximal fitting ● Individualized (leave-one-beep-out) and group-based (leave-one-subject-out) models tested ● 29.87% error rate | IW | ● Labels: self-reported stress and affect (EMA, 7-point Likert) ● 6 labels/day per participant (exam and control weeks) ● Monitoring: 2-week continuous ● Prompting: EMA surveys prompted 6 times/day | ● Cross-sectional design limits long-term predictions ● Low accuracy due to real-life stress detection challenges ● Device reliability affected by data noise in daily use ● Small sample and student exam focus limit generalizability |
(Umer, 2022) [40] Monitoring physical and mental stress in construction workers | 8 | ❖ Equivital EQ02 Life Monitor vest | ❖ Multiclass/binary classification ● RUS Boosted Trees, Subspace KNN, and bagged trees ● 94.7% accuracy for simultaneous physical and mental stress monitoring | IL | ● Labels: self-reported physical (RPE) and mental stress (NASA-TLX) ● 2 labels/participant: physical (every 5 min) and mental (pre/post-task) ● Monitoring: 25 min physical (manual handling) + 25 min mental (digits task), repeated over two days | ● Controlled environment limits generalizability to actual construction sites ● Small sample size affects robustness and scalability |
(Velmovitsky et al., 2022) [41] Using Apple Watch ECG data for heart rate variability monitoring and stress prediction | 33 | ❖ Apple Watch Series 6 | ❖ Binary classification ● Classical ML, RF, and SVM ● 52% to 64% F1-weighted scores | IW | ● Labels: self-reported stress (DASS-21; single-item Likert) ● 6 labels/participant daily (every 3 h; EMA) ● Monitoring: 2 weeks with Apple Watch (30 s ECG readings) ● Prompting: 6 times/day for ECG and stress questionnaires via mobile health platform | ● Predominantly white female participants limit generalizability ● Real-life data collection introduces noise, reducing accuracy ● Models performed well for “no stress” but poorly for “stress” states |
(Vila et al., 2019) [42] A real-life application of stress detection of passengers while traveling | 1 | ❖ Empatica E4 | ❖ Regression ● Personalized stress model (linear regression) ● 0.187 RMSE (raw output), 0.146 RMSE (Clipped Output), and 96.5% classification rate | IW | ● Labels: self-reported stress (1–10 scale) ● 4 labels/participant (3 high-stress; 1 low-stress episode) ● High-stress: 162 min; low-stress: 120 min (museum visit) ● Monitoring: 3-day continuous monitoring while traveling; self-reports given daily | ● Simple linear regression used without feature selection ● Small number of non-stress labels |
(Weerasinghe et al., 2023) [43] Mental stress classification and exploring links between perceived and acute stress | 22 | ❖ SynAmps amplifier and a 62-channel QuickCap EEG system | ❖ Multiclass classification ● Spiking Neural Networks, Spike Time Dependent Plasticity (STDP), Intrinsic Plasticity (IP), Neuron Evolving and Pruning, self-pruning, unsupervised learning ● 90.76% accuracy | IL | ● Labels: self-reported mental stress (PSS-14) ● 3 labels/participant (stress, neutral, and positive states) ● Monitoring: EEG monitoring across 3 sessions (2 min each) with 40 audio comments (10–15 s each) | ● Reliance on EEG data limits generalizability to other stress scenarios ● Potential overfitting to specific audio stress cues ● Spiking neural network model complexity hinders real-time wearable applications without optimization |
(Xu et al., 2024) [44] Non-invasive stress monitoring using an electronic skin | 10 | ❖ CARES sensor (electronic skin) | ❖ Multiclass classification ● Supervised learning, Shapley additive explanation ● 99.2% accuracy for stress/relaxation detection | IL and IW | ● Labels: self-reported state anxiety (STAI-Y) ● 3 labels/participant (pre-, during, and post-stressor tasks) ● Monitoring: 24 h continuous monitoring across daily activities (exercise, lab work, and relaxation) | ● CARES device promising but has research gaps ● Sweat cross-reactivity may reduce sensor accuracy ● Scalability and cost pose challenges ● Long-term wearability needs testing for skin compatibility ● ML algorithms may introduce bias and lack generalizability ● Real-world validation and data privacy are crucial |
(Zhang et al., 2023) [45] Model the relationship between overcrowding and stress levels | 26 | ❖ Empatica E4, Zephyr Bioharness 3, and FrontRow wearable camera | ❖ GLMM analysis ● Mask R-CNN for image detection/Geographically weighted regression and GLMM for effects estimate; personalized models | IW | ● Labels: self-reported stress via eDiary app ● 3 labels/participant (pre-, during, and post-walk) ● Locations: green, blue, transit, and commercial spaces ● Monitoring: 80 min (20 min per location: 5 min sitting; 15 min walking) | ● Small sample size from a single setting limits generalizability ● Confounders like cultural and socio-economic differences not accounted for |
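Several studies in the table above contrast group-level with person-specific models, and Tutunji et al. [39] explicitly test a group-based leave-one-subject-out scheme. The following is a minimal sketch of that kind of split in plain Python. The function name `leave_one_subject_out` and the toy subject IDs are illustrative assumptions and are not taken from any reviewed study.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_subject_out(
    subject_ids: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    All samples from the held-out subject go to the test fold, so no
    participant contributes data to both training and testing.
    """
    # Group sample indices by the participant they came from.
    by_subject: Dict[str, List[int]] = defaultdict(list)
    for index, subject in enumerate(subject_ids):
        by_subject[subject].append(index)

    # One fold per participant, in order of first appearance.
    for held_out, test_idx in by_subject.items():
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: 3 participants with 2 labeled windows each.
subjects = ["p1", "p1", "p2", "p2", "p3", "p3"]
folds = list(leave_one_subject_out(subjects))
for held_out, train_idx, test_idx in folds:
    print(held_out, train_idx, test_idx)
```

Splitting by window rather than by subject would let the same participant appear on both sides of a fold, which tends to inflate reported accuracy for small cohorts such as those listed above.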
3.2. Depression
Article/Year Application | Sample Size | Wearable/ Feature Space | Prediction Task/ AI Details | Env. | Labels/ Monitoring Details | Research Gaps/ Future Challenges |
---|---|---|---|---|---|---|
(Chikersal et al., 2021) [57] Detecting and predicting depression in college students, with predicting symptoms up to 15 weeks in advance | 138 | ❖ Fitbit Flex 2 | ❖ Binary classification ● Personalized model ● 85.7% accuracy in detecting post-semester depressive symptoms, predicting these outcomes with accuracy >80% | IW | ● Labels: pre- and post-semester depression severity scores using the Beck Depression Inventory-II (BDI-II) ● 2 labels/participant ● Monitoring: 16-week semester | ● Passive data may miss nuances of depressive symptoms ● College student focus limits generalizability ● High dimensionality and small sample size challenge feature stability ● Refinement needed for missing data and robustness |
(Dai et al., 2022) [58] Personalized depression predictions in RCTs | 89 | ❖ Fitbit Alta HR | ❖ Multitask Learning ● Multitask Learning, Hierarchical Model Architecture, Dynamic Task Weighing ● 0.725 AUROC; 0.668 AUPRC | IW | ● Labels: depression remission outcomes based on PHQ-9 scores ● 2 labels/participant (baseline and after 6 months) ● Monitoring: 6 months ● Continuous monitoring via Fitbit and assessments at 2, 4, and 6-month check-ins | ● Assumes static treatment paths post-randomization, limiting use in adaptive trials ● Small sample size reduces confidence in results ● The impact of varying wearable data lengths on model performance was not evaluated |
(Kim et al., 2019) [59] Classify depression in older adults living alone | 47 | ❖ Actiwatch Spectrum PRO (Philips Respironics) | ❖ Binary classification ● Binary LR compared with 4 ML models (logit, DT, boosted trees, and RF) ● ~70% binary LR accuracy; 91% logit model accuracy | IW | ● Labels: self-reported depression (EMA via Actiwatch, SGDS-K, and K-HDRS) ● 4 EMA prompts/day + 2 SGDS-K and K-HDRS labels (baseline; 2 weeks) ● Monitoring: 2 weeks (56 labels/participant) ● Prompting: 4 times/day (10-point Likert scale) | ● Subjective reports limit accuracy without clinical diagnoses ● EMA grand means may miss moment-to-moment mood variations ● Small, predominantly female sample limits generalizability ● Low statistical power hinders detecting sleep efficiency associations |
(Li et al., 2020) [60] Recognizing mild depression | 51 | ❖ HydroCel Geodesic Sensor Net (EEG system with 128 channels) | ❖ Binary classification ● Used different models including CNN, SVM, and KNN ● 80.74% accuracy (highest) | IL | ● Labels: BDI-II scores classify participants as mildly depressed or healthy ● 2 labels/participant (mildly depressed; healthy) ● Monitoring: EEG monitoring during a 7 min facial expression task | ● Focus on mild depression limits applicability to more severe cases ● Heavy reliance on EEG data quality and preprocessing may affect robustness and generalizability ● Small, homogeneous sample from a single university limits broader applicability |
(Mullick et al., 2022) [61] Predicting adolescent depression during COVID-19, with personalized models | 37 | ❖ Fitbit Inspire HR | ❖ Multiclass classification ● Algorithms: LASSO, elastic net, RF, AdaBoost, extra trees, gradient boosting, and XGBoost; high PHQ-9: 60–75% accuracy; low PHQ-9: 20–40% accuracy ● 50–60% average accuracy | IW | ● Labels: self-reported depression levels using the PHQ-9, completed weekly by participants ● 24 labels/participant (one per week) ● Monitoring: 24 weeks | ● COVID-19 impact limits generalizability to non-pandemic conditions ● Missing data from technical issues and non-adherence affected accuracy ● Small sample size reduced robustness despite study duration ● Model struggled with rare/severe depression changes, needing better data handling |
(Pedrelli et al., 2020) [62] Monitoring changes in depression severity | 31 | ❖ Empatica E4 | ❖ Regression task (depression severity) ● Average ensemble of boosting and RF ● Correlation between models’ estimate and clinician-rated HDRS scores: 0.46–0.7 | IW | ● Labels: HDRS-17 assessed by clinicians during in-person visits ● 6 labels/participant (assessed bi-weekly for 8 weeks) ● Monitoring: 9 weeks | ● Data collection issues require better network and sensor connectivity ● High MAE (3.8–4.74) limits clinical scalability despite strong correlations ● Small sample size and low symptom variability reduce generalizability ● Minimal sleep impact due to short study; longer monitoring needed |
(Price et al., 2022) [63] Unexpected similarities found between schizophrenia patients and healthy controls | 77 | ❖ Actiwatch device | ❖ UMAP/SHAP ● Unsupervised ML clustering methods | IW | ● Labels: actigraphy-based movement patterns and clinical diagnoses ● 2 labels/participant (schizophrenia, depression, or control) ● Monitoring: 1-week continuous monitoring with actigraphy devices | ● Single-institution sample limits generalizability ● Cross-group differences may be confounded by age, medication, disorder type, and gender ● Depression and schizophrenia severity levels may not reflect typical cases |
(Rykov et al., 2021) [64] Predicting workforce depression | 267 | ❖ Fitbit Charge 2 | ❖ Binary classification ● Dropouts meet Multiple Additive Regression Trees ● 80% accuracy, 82% sensitivity, and 78% specificity | IW | ● Labels: depressive symptom severity assessed using the 9-item PHQ-9 ● 2 labels/participant (PHQ-9 scores at baseline and after 14 days) ● Monitoring: 14 days | ● Workforce-specific cohort limits generalizability ● Findings apply mainly to balanced demographic subgroups; broader testing needed ● Self-reported depression assessments may introduce response bias |
(Sato et al., 2023) [65] Enhanced MDD screening using SQIs to filter motion artifacts | 69 | ❖ Empatica E4 ACC for activity and sleep detection | ❖ Binary classification ● Classical ML; linear classification model ● 87.3% sensitivity; 84.0% specificity | IW | ● Labels: depressive symptoms using the Zung Self-Rated Depression Scale (SDS) ● 2 labels/participant (self-reported SDS scores and physiological markers over 24 h) ● Monitoring: 24 h | ● Small sample (69 participants) limits generalizability ● Linear models may miss complex HRV patterns ● Wearable device variability not fully assessed, affecting HRV accuracy |
(Bai et al., 2022) [66] Tracking mood stability and predicting variations in MDD for personalized treatment | 261 | ❖ Mi Band 2 (Xiaomi Corporation) and phone usage statistics | ❖ Binary/multiclass classification ● SVMs, KNN, DT, NB, RF, and LR ● RF with 84.46% accuracy and 97.38% recall for predicting between Steady-remission and Mood Swing-moderate | IW | ● Labels: PHQ-9 assessments ● Each participant contributed three consecutive PHQ-9 results per data sample ● Monitoring: 12 weeks ● Prompting: Daily at 8 PM to record mood using the Visual Analog Scale (VAS) | ● Imbalanced, small dataset affects model generalizability and accuracy ● Restricted to Android users, excluding a significant population segment |
(Shah et al., 2021) [67] Depression management | 14 | ❖ Samsung Galaxy Watch | ❖ Regression task ● SHAP used to compare ML models per subject (RF, gradient boost, AdaBoost, elastic net, SVM, and Poisson regressor) ● Voting regressor selected the best model from all strategies ● Spearman’s rho = 0.67, p < 0.0001 | IW | ● Labels: self-reported mood (EMA, 7-point Likert for depression/anxiety) ● 4 labels/participant daily (8 a.m., 12 p.m., 4 p.m., and 8 p.m.) for 30 days ● Monitoring: 1 month ● Prompting: 4 times/day via the EMA app to report mood, lifestyle factors, and stress levels | ● N-of-1 models’ treatment guidance remains unproven ● Low-variability participants led to poor model fits ● Wearables and EEG may lack comprehensive data capture ● Small, specific sample limits generalizability ● Lack of clinical interviews weakens depression assessment ● High-frequency data collection may impact long-term adherence and quality |
(Tazawa et al., 2020) [68] Assessing depression severity and mood disorder evaluation | 86 | ❖ Silmee W20 (TDK, Inc., Tokyo, Japan) wristband biosensor device | ❖ Binary classification/regression ● XGBoost, self-rating assessment score (labels) ● 76% accuracy, 73% sensitivity, and 79% specificity | IW | ● Labels: clinician-administered HAMD-17 and self-reported BDI-II ● 2 labels/participant (HAMD-17 depressive severity; BDI-II depression score) ● Monitoring: continuous monitoring for up to 7 days | ● Small, non-diverse sample limits generalizability and significance ● No external validation; medication and hospitalization effects excluded |
(Tian et al., 2023) [69] Diagnosing depression | 178 | ❖ The three-lead EEG sensor and electrodes | ❖ Binary classification ● Using different classification models including “KNN, SVM, NB, DT, RF, XGBoost” ● 90.70% accuracy, 96.53% specificity, and 81.79% sensitivity (KNN) | IL | ● Labels: clinician-administered assessments using the PHQ-9 for depression ● 2 labels/participant ● Monitoring: 162 s, including 90 s of resting-state EEG data and 72 s of audio-stimulated EEG data | ● Small, specific sample limits generalizability ● ALO algorithm’s complexity may hinder real-time clinical use ● Further validation needed across diverse populations ● Three EEG leads may limit detection versus multi-lead systems |
(Yang et al., 2023) [70] Depression recognition using multimodal data for improved severity assessment | 57 | ❖ Custom wristband with audio, activity (gyroscope; accelerometer), and heart rate sensors | ❖ Binary classification ● Using different models: XGBoost, RF, DT, LR, KNN, and NB (Gaussian) ● 83% accuracy | IW | ● Labels: self-reported depressive symptoms using multiple questionnaires, including the PHQ-9, BDI, and the Center for Epidemiological Studies Depression (CES-D) scale ● 3 labels/participant, collected before and after a course of treatment ● Monitoring: one day at a time (from 8:00 A.M. to 10:00 P.M.) | ● Larger sample needed for validation across populations ● Long-term effectiveness in real-world settings untested ● Emotion-sensing graphs and GCN model may hinder real-time use due to high computational demands |
(Zhu et al., 2019) [71] Mild depression detection during free viewing | 51 | ❖ 128-channel HydroCel Geodesic Sensor Net (EEG) and EyeLink 1000 Desktop Eye Tracker | ❖ Binary classification ● Models: Linear SVM, Radial Basis Function SVM (RBF SVM), Gradient Boosting Decision Tree (GBD tree), RF, Self-Normalizing Neural Networks (SNNs), and Batch Normalized Multilayer Perceptron (BNMLP) ● 83% accuracy (highest) | IL | ● Labels: self-reported depressive symptoms (BDI-II) ● 2 labels/participant (mild depression or normal control) ● Monitoring: 30 trials, with each trial lasting 6 s of viewing neutral and emotional facial expressions, followed by 2 s of rest, for a total of 240 s of active monitoring per participant | ● Limited sample size affects generalizability with only two depression levels studied ● Excluded severe depression, requiring broader diagnostic capabilities ● Inconsistent EEG band performance with EM, needing optimization in feature selection and fusion strategies |
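Many rows above report accuracy alongside sensitivity and specificity (for example, 80%, 82%, and 78% in Rykov et al. [64]). The sketch below shows how the three quantities derive from one binary confusion matrix. The class `ConfusionCounts` and the counts themselves are hypothetical: they were chosen only so that the resulting values coincide with that row under an assumed balanced class split, and they are not the study's actual confusion matrix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (positive class = depressed)."""
    tp: int  # depressed, predicted depressed
    fp: int  # not depressed, predicted depressed
    tn: int  # not depressed, predicted not depressed
    fn: int  # depressed, predicted not depressed


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true positives among all actual positives."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true negatives among all actual negatives."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all cases classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Hypothetical counts: 100 positive and 100 negative cases (assumed balance).
counts = ConfusionCounts(tp=82, fp=22, tn=78, fn=18)
print(f"sensitivity={sensitivity(counts):.2f}")
print(f"specificity={specificity(counts):.2f}")
print(f"accuracy={accuracy(counts):.2f}")
```

With unbalanced classes the same sensitivity and specificity would yield a different accuracy, which is one reason the three figures are worth reading together rather than comparing accuracy alone across rows.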
3.3. Anxiety
Article/Year Application | Sample Size | Wearable/ Feature Space | Prediction Task/ AI Details | Env. | Labels/ Monitoring Details | Research Gaps/ Future Challenges |
---|---|---|---|---|---|---|
(Jacobson et al., 2021) [75] Predicting symptom deterioration in GAD and PD for early intervention | 265 | ❖ Mini Mitter Actiwatch | ❖ Binary classification ● Unsupervised deep autoencoder with ensemble modeling ● Base learners: neural networks, splines, ridge regression, RF, GLMs, Gaussian processes, XGBoost, KNN, and SVM ● 0.696 AUC, CI [0.598, 0.793], 84.6% sensitivity, 52.7% specificity, and 68.7% balanced accuracy | IW | ● Labels: anxiety disorder symptoms assessed via the Composite International Diagnostic Interview (CIDI) ● 2 labels/participant (baseline and after 17–18 years) ● Monitoring: one week for actigraphy data collection, with follow-up interviews 17–18 years later | ● Specificity of 52.7% risks false positives and unnecessary interventions ● Sparse data collection over two decades may miss symptom fluctuations, affecting accuracy ● Low prevalence of GAD and PD limits generalizability |
(Lee et al., 2022) [76] Identifying geriatric depression and anxiety | 352 | ❖ Fitbit Alta HR2 | ❖ Multilabel classification ● Classical ML: LR, SVM, RF, and gradient boosting (GB) ● 90% accuracy for single labeling | IW | ● Labels: self-reported depression and anxiety symptoms using the Korean version of the Geriatric Depression Scale (K-SGDS) and the Korean Geriatric Anxiety Inventory (K-GAI) ● 2 labels/participant (depression; anxiety) from questionnaires ● Monitoring: 1+ month using low-cost activity trackers | ● Moderate-resolution trackers may lower data quality vs. ActiGraph ● Binary relevance method overlooks depression–anxiety relationships ● Seasonal mood and activity variations not considered ● More data needed to reduce prediction variance |
(Shaukat-Jali et al., 2021) [77] Detecting social anxiety and its levels | 12 | ❖ Empatica E4 | ❖ Binary (non-anxious vs. socially anxious) and multiclass (baseline, anticipation, and reactive anxiety) classification ● Supervised learning ● Accuracy: 97.5–99.48% (binary); 98.86–99.52% (multiclass with severity) with EDA/GSR being the most predictive physiological marker | IL | ● Labels: self-reported subclinical social anxiety levels using the Liebowitz Social Anxiety Scale (LSAS-SR) and the Social Phobia Screening Questionnaire (SPSQ) ● 3 labels/participant (baseline, anticipation anxiety, and reactive anxiety) ● Monitoring: approximately 30 min during the task | ● Small sample size limits generalizability ● Models may be biased toward certain classes, inflating accuracy ● Uncontrolled external factors, such as caffeine and alcohol consumption, may affect results |
(Di Tecco et al., 2024) [78] Detecting anxiety induced by a horror movie trailer | 34 | ❖ Shimmer3 GSR+ | ❖ Multiclass classification ● Classifiers based on DT, Discriminant Analysis, NB, SVM, ANNs, KNN, Kernels, and Ensembles, optimized using a Bayesian optimizer ● >95% accuracy | IL | ● Labels: self-reported anxiety (post-task questionnaires) ● 3 labels/participant (baseline, anxiety-inducing, and relaxation) ● Monitoring: 10 min (2 min horror trailer; 4 min relaxation clips before and after) | ● Limited audiovisual stimuli reduce specificity and applicability ● Improved data collection and signal processing needed for precision and reliability |
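
The row for Jacobson et al. [75] in the table above reports sensitivity, specificity, and balanced accuracy together. As a quick consistency check, balanced accuracy is the mean of sensitivity and specificity. The minimal Python sketch below recomputes it from the two reported rates. The function name is ours and used only for illustration; it is not taken from the cited study.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return (sensitivity + specificity) / 2.0


# Rates reported for Jacobson et al. [75] in the table above, in percent.
sens, spec = 84.6, 52.7

bal_acc = balanced_accuracy(sens, spec)
# Share of true negatives that would be flagged as positive.
false_positive_rate = 100.0 - spec

print(f"balanced accuracy: {bal_acc:.2f}%")
print(f"false positive rate: {false_positive_rate:.1f}%")
```

The recomputed value (about 68.65%) agrees with the reported 68.7% up to rounding of the inputs. The false positive rate of roughly 47% is the quantity behind the "risks false positives" challenge listed for that study.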
3.4. General Trends
4. Risks of Bias and Applicability Appraisal
5. Discussion
5.1. Key Challenges for the Field
5.2. Limitations
6. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Canady, V. Mental illness will cost the world $16 trillion (USD) by 2030. Ment. Health Wkly 2018, 28, 7–8. [Google Scholar]
- Alonso, J.; Codony, M.; Kovess, V.; Angermeyer, M.C.; Katz, S.J.; Haro, J.M.; De Girolamo, G.; De Graaf, R.; Demyttenaere, K.; Vilagut, G. Population level of unmet need for mental healthcare in Europe. Br. J. Psychiatry 2007, 190, 299–306. [Google Scholar] [CrossRef] [PubMed]
- Rathod, S.; Pinninti, N.; Irfan, M.; Gorczynski, P.; Rathod, P.; Gega, L.; Naeem, F. Mental health service provision in low-and middle-income countries. Health Serv. Insights 2017, 10, 1178632917694350. [Google Scholar] [CrossRef] [PubMed]
- Dalla Vecchia, E.; Costa, M.M.; Lau, E. Urgent mental health issues in adolescents. Lancet Child Adolesc. Health 2019, 3, 218–219. [Google Scholar] [CrossRef]
- Wainberg, M.L.; Scorza, P.; Shultz, J.M.; Helpman, L.; Mootz, J.J.; Johnson, K.A.; Neria, Y.; Bradford, J.-M.E.; Oquendo, M.A.; Arbuckle, M.R. Challenges and opportunities in global mental health: A research-to-practice perspective. Curr. Psychiatry Rep. 2017, 19, 1–10. [Google Scholar] [CrossRef]
- Substance Abuse and Mental Health Services Administration. Key substance use and mental health indicators in the United States: Results from the 2019 National Survey on Drug Use and Health. 2020. Available online: https://digitalcommons.fiu.edu/srhreports/health/health/32/ (accessed on 14 March 2025).
- Vogel, J.; Auinger, A.; Riedl, R.; Kindermann, H.; Helfert, M.; Ocenasek, H. Digitally enhanced recovery: Investigating the use of digital self-tracking for monitoring leisure time physical activity of cardiovascular disease (CVD) patients undergoing cardiac rehabilitation. PLoS ONE 2017, 12, e0186261. [Google Scholar] [CrossRef]
- Dian, F.J.; Vahidnia, R.; Rahmati, A. Wearables and the Internet of Things (IoT), applications, opportunities, and challenges: A Survey. IEEE Access 2020, 8, 69200–69211. [Google Scholar] [CrossRef]
- Gedam, S.; Paul, S. A review on mental stress detection using wearable sensors and machine learning techniques. IEEE Access 2021, 9, 84045–84066. [Google Scholar] [CrossRef]
- Kang, M.; Chai, K. Wearable sensing systems for monitoring mental health. Sensors 2022, 22, 994. [Google Scholar] [CrossRef]
- Bhalla, N.; Jolly, P.; Formisano, N.; Estrela, P. Introduction to biosensors. Essays Biochem. 2016, 60, 1–8. [Google Scholar] [CrossRef]
- Shajari, S.; Kuruvinashetti, K.; Komeili, A.; Sundararaj, U. The emergence of AI-based wearable sensors for digital health technology: A review. Sensors 2023, 23, 9498. [Google Scholar] [CrossRef] [PubMed]
- Shang, R.; Chen, H.; Cai, X.; Shi, X.; Yang, Y.; Wei, X.; Wang, J.; Xu, Y. Machine Learning-Enhanced Triboelectric Sensing Application. Adv. Mater. Technol. 2024, 9, 2301316. [Google Scholar] [CrossRef]
- Zhou, J.; Chen, H.; Wu, Z.; Zhou, P.; You, M.; Zheng, C.; Guo, Q.; Li, Z.; Weng, M. 2D Ti3C2Tx MXene-based light-driven actuator with integrated structure for self-powered multi-modal intelligent perception assisted by neural network. Nano Energy 2025, 134, 110552. [Google Scholar] [CrossRef]
- Gomes, N.; Pato, M.; Lourenco, A.R.; Datia, N. A survey on wearable sensors for mental health monitoring. Sensors 2023, 23, 1330. [Google Scholar] [CrossRef]
- Rotenberg, S.; McGrath, J.J. Inter-relation between autonomic and HPA axis activity in children and adolescents. Biol. Psychol. 2016, 117, 16–25. [Google Scholar] [CrossRef]
- Akbulut, F.P.; Ikitimur, B.; Akan, A. Wearable sensor-based evaluation of psychosocial stress in patients with metabolic syndrome. Artif. Intell. Med. 2020, 104, 101824. [Google Scholar] [CrossRef]
- Anusha, A.; Sukumaran, P.; Sarveswaran, V.; Shyam, A.; Akl, T.J.; Preejith, S.; Sivaprakasam, M. Electrodermal activity based pre-surgery stress detection using a wrist wearable. IEEE J. Biomed. Health Inform. 2019, 24, 92–100. [Google Scholar]
- Aristizabal, S.; Byun, K.; Wood, N.; Mullan, A.F.; Porter, P.M.; Campanella, C.; Jamrozik, A.; Nenadic, I.Z.; Bauer, B.A. The feasibility of wearable and self-report stress detection measures in a semi-controlled lab environment. IEEE Access 2021, 9, 102053–102068. [Google Scholar] [CrossRef]
- Barki, H.; Chung, W.-Y. Mental stress detection using a wearable in-ear plethysmography. Biosensors 2023, 13, 397. [Google Scholar] [CrossRef]
- Betti, S.; Lova, R.M.; Rovini, E.; Acerbi, G.; Santarelli, L.; Cabiati, M.; Del Ry, S.; Cavallo, F. Evaluation of an integrated system of wearable physiological sensors for stress monitoring in working environments by using biological markers. IEEE Trans. Biomed. Eng. 2017, 65, 1748–1758. [Google Scholar]
- Booth, B.M.; Vrzakova, H.; Mattingly, S.M.; Martinez, G.J.; Faust, L.; D’Mello, S.K. Toward robust stress prediction in the age of wearables: Modeling perceived stress in a longitudinal study with information workers. IEEE Trans. Affect. Comput. 2022, 13, 2201–2217. [Google Scholar] [CrossRef]
- Campanella, S.; Altaleb, A.; Belli, A.; Pierleoni, P.; Palma, L. A method for stress detection using Empatica E4 bracelet and machine-learning techniques. Sensors 2023, 23, 3565. [Google Scholar] [CrossRef] [PubMed]
- Can, Y.S.; Chalabianloo, N.; Ekiz, D.; Ersoy, C. Continuous stress detection using wearable sensors in real life: Algorithmic programming contest case study. Sensors 2019, 19, 1849. [Google Scholar] [CrossRef] [PubMed]
- Can, Y.S.; Chalabianloo, N.; Ekiz, D.; Fernandez-Alvarez, J.; Riva, G.; Ersoy, C. Personal stress-level clustering and decision-level smoothing to enhance the performance of ambulatory stress detection with smartwatches. IEEE Access 2020, 8, 38146–38163. [Google Scholar] [CrossRef]
- Chen, Q.; Lee, B.G. Deep learning models for stress analysis in university students: A sudoku-based study. Sensors 2023, 23, 6099. [Google Scholar] [CrossRef]
- Golgouneh, A.; Tarvirdizadeh, B. Fabrication of a portable device for stress monitoring using wearable sensors and soft computing algorithms. Neural Comput. Appl. 2020, 32, 7515–7537. [Google Scholar] [CrossRef]
- Halim, Z.; Rehan, M. On identification of driving-induced stress using electroencephalogram signals: A framework based on wearable safety-critical scheme and machine learning. Inf. Fusion 2020, 53, 66–79. [Google Scholar] [CrossRef]
- Bin Heyat, M.B.; Akhtar, F.; Abbas, S.J.; Al-Sarem, M.; Alqarafi, A.; Stalin, A.; Abbasi, R.; Muaad, A.Y.; Lai, D.; Wu, K. Wearable flexible electronics based cardiac electrode for researcher mental stress detection system using machine learning models on single lead electrocardiogram signal. Biosensors 2022, 12, 427. [Google Scholar] [CrossRef]
- Kikhia, B.; Stavropoulos, T.G.; Andreadis, S.; Karvonen, N.; Kompatsiaris, I.; Sävenstedt, S.; Pijl, M.; Melander, C. Utilizing a wristband sensor to measure the stress level for people with dementia. Sensors 2016, 16, 1989. [Google Scholar] [CrossRef]
- Kim, N.; Seo, W.; Kim, S.; Park, S.-M. Electrogastrogram: Demonstrating feasibility in mental stress assessment using sensor fusion. IEEE Sens. J. 2020, 21, 14503–14514. [Google Scholar] [CrossRef]
- Nath, R.K.; Thapliyal, H.; Caban-Holt, A. Machine learning based stress monitoring in older adults using wearable sensors and cortisol as stress biomarker. J. Signal Process. Syst. 2022, 94, 513–525. [Google Scholar] [CrossRef]
- de Arriba Perez, F.; Santos-Gago, J.M.; Caeiro-Rodríguez, M.; Iglesias, M.J.F. Evaluation of commercial-off-the-shelf wrist wearables to estimate stress on students. JoVE (J. Vis. Exp.) 2018, 136, e57590. [Google Scholar]
- Rescio, G.; Manni, A.; Caroppo, A.; Ciccarelli, M.; Papetti, A.; Leone, A. Ambient and wearable system for workers’ stress evaluation. Comput. Ind. 2023, 148, 103905. [Google Scholar] [CrossRef]
- Ribeiro, G.; Postolache, O.; Martín, F.F. A New Intelligent Approach for Automatic Stress Levels Assessment based on Multiple Physiological Parameters Monitoring. IEEE Trans. Instrum. Meas. 2023, 73, 4001714. [Google Scholar] [CrossRef]
- Sevil, M.; Rashid, M.; Askari, M.R.; Maloney, Z.; Hajizadeh, I.; Cinar, A. Detection and characterization of physical activity and psychological stress from wristband data. Signals 2020, 1, 188–208. [Google Scholar] [CrossRef]
- Tonacci, A.; Dellabate, A.; Dieni, A.; Bachi, L.; Sansone, F.; Conte, R.; Billeci, L. Can machine learning predict stress reduction based on wearable sensors’ data following relaxation at workplace? A pilot study. Processes 2020, 8, 448. [Google Scholar] [CrossRef]
- Toshnazarov, K.; Lee, U.; Kim, B.H.; Mishra, V.; Najarro, L.A.C.; Noh, Y. SOSW: Stress Sensing with Off-the-shelf Smartwatches in the Wild. IEEE Internet Things J. 2024, 11, 21527–21545. [Google Scholar] [CrossRef]
- Tutunji, R.; Kogias, N.; Kapteijns, B.; Krentz, M.; Krause, F.; Vassena, E.; Hermans, E.J. Detecting Prolonged Stress in Real Life Using Wearable Biosensors and Ecological Momentary Assessments: Naturalistic Experimental Study. J. Med. Internet Res. 2023, 25, e39995. [Google Scholar] [CrossRef]
- Umer, W. Simultaneous monitoring of physical and mental stress for construction tasks using physiological measures. J. Build. Eng. 2022, 46, 103777. [Google Scholar] [CrossRef]
- Velmovitsky, P.E.; Alencar, P.; Leatherdale, S.T.; Cowan, D.; Morita, P.P. Using Apple Watch ECG data for heart rate variability monitoring and stress prediction: A pilot study. Front. Digit. Health 2022, 4, 1058826. [Google Scholar] [CrossRef]
- Vila, G.; Godin, C.; Sakri, O.; Labyt, E.; Vidal, A.; Charbonnier, S.; Ollander, S.; Campagne, A. Real-time monitoring of passenger’s psychological stress. Future Internet 2019, 11, 102. [Google Scholar] [CrossRef]
- Weerasinghe, M.M.A.; Wang, G.; Whalley, J.; Crook-Rumsey, M. Mental stress recognition on the fly using neuroplasticity spiking neural networks. Sci. Rep. 2023, 13, 14962. [Google Scholar] [CrossRef] [PubMed]
- Xu, C.; Song, Y.; Sempionatto, J.R.; Solomon, S.A.; Yu, Y.; Nyein, H.Y.; Tay, R.Y.; Li, J.; Heng, W.; Min, J. A physicochemical-sensing electronic skin for stress response monitoring. Nat. Electron. 2024, 7, 168–179. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Měchurová, K.; Resch, B.; Amegbor, P.; Sabel, C.E. Assessing the association between overcrowding and human physiological stress response in different urban contexts: A case study in Salzburg, Austria. Int. J. Health Geogr. 2023, 22, 15. [Google Scholar] [CrossRef]
- Allen, A.P.; Kennedy, P.J.; Dockray, S.; Cryan, J.F.; Dinan, T.G.; Clarke, G. The trier social stress test: Principles and practice. Neurobiol. Stress 2017, 6, 113–126. [Google Scholar] [CrossRef]
- Mueller, V.; Richer, R.; Henrich, L.; Berger, L.; Gelardi, A.; Jaeger, K.M.; Eskofier, B.M.; Rohleder, N. The stroop competition: A social-evaluative stroop test for acute stress induction. In Proceedings of the 2022 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), Ioannina, Greece, 27–30 September 2022; pp. 1–4. [Google Scholar]
- WHO. Depression. Available online: https://www.who.int/news-room/fact-sheets/detail/depression (accessed on 14 March 2025).
- Benazzi, F. Various forms of depression. Dialogues Clin. Neurosci. 2006, 8, 151–161. [Google Scholar] [CrossRef]
- American Psychiatric Association. Diagnostic and Statistical Manual of Mental Disorders: DSM-5; American Psychiatric Association: Washington, DC, USA, 2013; Volume 5. [Google Scholar]
- Govindasamy, K.A.; Palanichamy, N. Depression detection using machine learning techniques on Twitter data. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 960–966. [Google Scholar]
- Disner, S.G.; Beevers, C.G.; Haigh, E.A.; Beck, A.T. Neural mechanisms of the cognitive model of depression. Nat. Rev. Neurosci. 2011, 12, 467–477. [Google Scholar] [CrossRef]
- Malhi, G.; Mann, J. Depression. Lancet 2018, 392, 2299–2312. [Google Scholar] [CrossRef]
- Riemann, D.; Berger, M.; Voderholzer, U. Sleep and depression—Results from psychobiological studies: An overview. Biol. Psychol. 2001, 57, 67–103. [Google Scholar] [CrossRef]
- Kemp, A.H.; Quintana, D.S.; Gray, M.A.; Felmingham, K.L.; Brown, K.; Gatt, J.M. Impact of depression and antidepressant treatment on heart rate variability: A review and meta-analysis. Biol. Psychiatry 2010, 67, 1067–1074. [Google Scholar] [CrossRef]
- Davey, C.G.; Yücel, M.; Allen, N.B.; Harrison, B.J. Task-related deactivation and functional connectivity of the subgenual cingulate cortex in major depressive disorder. Front. Psychiatry 2012, 3, 14. [Google Scholar] [CrossRef] [PubMed]
- Chikersal, P.; Doryab, A.; Tumminia, M.; Villalba, D.K.; Dutcher, J.M.; Liu, X.; Cohen, S.; Creswell, K.G.; Mankoff, J.; Creswell, J.D. Detecting depression and predicting its onset using longitudinal symptoms captured by passive sensing: A machine learning approach with robust feature selection. ACM Trans. Comput.-Hum. Interact. (TOCHI) 2021, 28, 1–41. [Google Scholar] [CrossRef]
- Dai, R.; Kannampallil, T.; Zhang, J.; Lv, N.; Ma, J.; Lu, C. Multi-task learning for randomized controlled trials: A case study on predicting depression with wearable data. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2022, 6, 1–23. [Google Scholar] [CrossRef]
- Kim, H.; Lee, S.; Lee, S.; Hong, S.; Kang, H.; Kim, N. Depression prediction by using ecological momentary assessment, actiwatch data, and machine learning: Observational study on older adults living alone. JMIR Mhealth Uhealth 2019, 7, e14149. [Google Scholar] [CrossRef]
- Li, X.; La, R.; Wang, Y.; Hu, B.; Zhang, X. A deep learning approach for mild depression recognition based on functional connectivity using electroencephalography. Front. Neurosci. 2020, 14, 192. [Google Scholar] [CrossRef]
- Mullick, T.; Radovic, A.; Shaaban, S.; Doryab, A. Predicting depression in adolescents using mobile and wearable sensors: Multimodal machine learning–based exploratory study. JMIR Form. Res. 2022, 6, e35807. [Google Scholar] [CrossRef]
- Pedrelli, P.; Fedor, S.; Ghandeharioun, A.; Howe, E.; Ionescu, D.F.; Bhathena, D.; Fisher, L.B.; Cusin, C.; Nyer, M.; Yeung, A. Monitoring changes in depression severity using wearable and mobile sensors. Front. Psychiatry 2020, 11, 584711. [Google Scholar] [CrossRef]
- Price, G.D.; Heinz, M.V.; Zhao, D.; Nemesure, M.; Ruan, F.; Jacobson, N.C. An unsupervised machine learning approach using passive movement data to understand depression and schizophrenia. J. Affect. Disord. 2022, 316, 132–139. [Google Scholar] [CrossRef]
- Rykov, Y.; Thach, T.-Q.; Bojic, I.; Christopoulos, G.; Car, J. Digital biomarkers for depression screening with wearable devices: Cross-sectional study with machine learning modeling. JMIR Mhealth Uhealth 2021, 9, e24872. [Google Scholar] [CrossRef]
- Sato, S.; Hiratsuka, T.; Hasegawa, K.; Watanabe, K.; Obara, Y.; Kariya, N.; Shinba, T.; Matsui, T. Screening for major depressive disorder using a wearable ultra-short-term HRV monitor and signal quality indices. Sensors 2023, 23, 3867. [Google Scholar] [CrossRef]
- Bai, R.; Xiao, L.; Guo, Y.; Zhu, X.; Li, N.; Wang, Y.; Chen, Q.; Feng, L.; Wang, Y.; Yu, X. Tracking and monitoring mood stability of patients with major depressive disorder by machine learning models using passive digital data: Prospective naturalistic multicenter study. JMIR Mhealth Uhealth 2021, 9, e24365. [Google Scholar] [CrossRef] [PubMed]
- Shah, R.V.; Grennan, G.; Zafar-Khan, M.; Alim, F.; Dey, S.; Ramanathan, D.; Mishra, J. Personalized machine learning of depressed mood using wearables. Transl. Psychiatry 2021, 11, 1–18. [Google Scholar] [CrossRef] [PubMed]
- Tazawa, Y.; Liang, K.-c.; Yoshimura, M.; Kitazawa, M.; Kaise, Y.; Takamiya, A.; Kishi, A.; Horigome, T.; Mitsukura, Y.; Mimura, M. Evaluating depression with multimodal wristband-type wearable device: Screening and assessing patient severity utilizing machine-learning. Heliyon 2020, 6, e03274. [Google Scholar] [CrossRef] [PubMed]
- Tian, F.; Zhu, L.; Shi, Q.; Wang, R.; Zhang, L.; Dong, Q.; Qian, K.; Zhao, Q.; Hu, B. The three-lead EEG sensor: Introducing an EEG-assisted depression diagnosis system based on ant lion optimization. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 1305–1318. [Google Scholar] [CrossRef]
- Yang, D.; Li, Y.; Gao, B.; Woo, W.L.; Zhang, Y.; Kendrick, K.M.; Luo, L. Using Wearable and Structured Emotion-Sensing-Graphs for Assessment of Depressive Symptoms in Patients Undergoing Treatment. IEEE Sens. J. 2023, 24, 3637–3648. [Google Scholar] [CrossRef]
- Zhu, J.; Wang, Y.; La, R.; Zhan, J.; Niu, J.; Zeng, S.; Hu, X. Multimodal mild depression recognition based on EEG-EM synchronization acquisition network. IEEE Access 2019, 7, 28196–28210. [Google Scholar] [CrossRef]
- World Health Organization. The ICD-10 Classification of Mental and Behavioural Disorders: Diagnostic Criteria for Research; World Health Organization: Geneva, Switzerland, 1993; Volume 2. [Google Scholar]
- Canals, J.; Voltas, N.; Hernández-Martínez, C.; Cosi, S.; Arija, V. Prevalence of DSM-5 anxiety disorders, comorbidity, and persistence of symptoms in Spanish early adolescents. Eur. Child Adolesc. Psychiatry 2019, 28, 131–143. [Google Scholar] [CrossRef]
- Muris, P.; Simon, E.; Lijphart, H.; Bos, A.; Hale, W.; Schmeitz, K. The youth anxiety measure for DSM-5 (YAM-5): Development and first psychometric evidence of a new scale for assessing anxiety disorders symptoms of children and adolescents. Child Psychiatry Hum. Dev. 2017, 48, 1–17. [Google Scholar] [CrossRef]
- Jacobson, N.C.; Lekkas, D.; Huang, R.; Thomas, N. Deep learning paired with wearable passive sensing data predicts deterioration in anxiety disorder symptoms across 17–18 years. J. Affect. Disord. 2021, 282, 104–111. [Google Scholar] [CrossRef]
- Lee, T.-R.; Kim, G.-H.; Choi, M.-T. Identification of geriatric depression and anxiety using activity tracking data and minimal geriatric assessment scales. Appl. Sci. 2022, 12, 2488. [Google Scholar] [CrossRef]
- Shaukat-Jali, R.; van Zalk, N.; Boyle, D.E. Detecting subclinical social anxiety using physiological data from a wrist-worn wearable: Small-scale feasibility study. JMIR Form. Res. 2021, 5, e32656. [Google Scholar] [CrossRef]
- Di Tecco, A.; Pistolesi, F.; Lazzerini, B. Elicitation of Anxiety Without Time Pressure and Its Detection Using Physiological Signals and Artificial Intelligence: A Proof of Concept. IEEE Access 2024, 12, 22376–22393. [Google Scholar] [CrossRef]
- Shiffman, S.; Stone, A.A.; Hufford, M.R. Ecological momentary assessment. Annu. Rev. Clin. Psychol. 2008, 4, 1–32. [Google Scholar] [CrossRef] [PubMed]
- Wang, L.; Miller, L.C. Just-in-the-moment adaptive interventions (JITAI): A meta-analytical review. Health Commun. 2020, 35, 1531–1544. [Google Scholar] [CrossRef] [PubMed]
- Yang, M.-J.; Sutton, S.K.; Hernandez, L.M.; Jones, S.R.; Wetter, D.W.; Kumar, S.; Vinci, C. A Just-In-Time Adaptive intervention (JITAI) for smoking cessation: Feasibility and acceptability findings. Addict. Behav. 2023, 136, 107467. [Google Scholar] [CrossRef]
- Ben-Zeev, D.; Scherer, E.A.; Wang, R.; Xie, H.; Campbell, A.T. Next-generation psychiatric assessment: Using smartphone sensors to monitor behavior and mental health. Psychiatr. Rehabil. J. 2015, 38, 218. [Google Scholar] [CrossRef]
- Quirk, S.; Lovis, J.; Stenhouse, K.; Van Dyke, L.; Roumeliotis, M.; Thind, K. A standardized automation framework for monitoring institutional radiotherapy protocol compliance. Med. Phys. 2021, 48, 2661–2666. [Google Scholar] [CrossRef]
- Polacsek, M.; Woolford, M. Strategies to support older adults’ mental health during the transition into residential aged care: A qualitative study of multiple stakeholder perspectives. BMC Geriatr. 2022, 22, 151. [Google Scholar] [CrossRef]
- Tyler, C.M.; McKee, G.B.; Alzueta, E.; Perrin, P.B.; Kingsley, K.; Baker, F.C.; Arango-Lasprilla, J.C. A study of older adults’ mental health across 33 countries during the COVID-19 pandemic. Int. J. Environ. Res. Public Health 2021, 18, 5090. [Google Scholar] [CrossRef]
- Yu, C.C.; Tou, N.X.; Low, J.A. A comparative study on mental health and adaptability between older and younger adults during the COVID-19 circuit breaker in Singapore. BMC Public Health 2022, 22, 507. [Google Scholar] [CrossRef]
- Sun, Y.; Kargarandehkordi, A.; Slade, C.; Jaiswal, A.; Busch, G.; Guerrero, A.; Phillips, K.T.; Washington, P. Personalized Deep Learning for Substance Use in Hawaii: Protocol for a Passive Sensing and Ecological Momentary Assessment Study. JMIR Res. Protoc. 2024, 13, e46493. [Google Scholar] [CrossRef] [PubMed]
- Kargarandehkordi, A.; Slade, C.; Washington, P. Personalized AI-Driven Real-Time Models to Predict Stress-Induced Blood Pressure Spikes Using Wearable Devices: Proposal for a Prospective Cohort Study. JMIR Res. Protoc. 2024, 13, e55615. [Google Scholar] [CrossRef] [PubMed]
- Islam, T.; Washington, P. Individualized stress mobile sensing using self-supervised pre-training. Appl. Sci. 2023, 13, 12035. [Google Scholar] [CrossRef] [PubMed]
- Islam, T.; Washington, P. Personalized prediction of recurrent stress events using self-supervised learning on multimodal time-series data. arXiv 2023, arXiv:2307.03337. [Google Scholar]
- Li, J.; Washington, P. A comparison of personalized and generalized approaches to emotion recognition using consumer wearable devices: Machine learning study. JMIR AI 2024, 3, e52171. [Google Scholar] [CrossRef]
- Eom, S.; Eom, S.; Washington, P. SIM-CNN: Self-supervised individualized multimodal learning for stress prediction on nurses using biosignals. In Proceedings of the Workshop on Machine Learning for Multimodal Healthcare Data, Honolulu, HI, USA, 29 July 2023; pp. 155–171. [Google Scholar]
- Taji, H.; Miranda, J.; Peón-Quirós, M.; Atienza, D. Energy-Efficient Frequency Selection Method for Bio-Signal Acquisition in AI/ML Wearables. In Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design, Newport Beach, CA, USA, 5–7 August 2024; pp. 1–6. [Google Scholar]
- Wu, J.; Lin, X.; Yang, C.; Yang, S.; Liu, C.; Cao, Y. Wearable Sensors Based on Miniaturized High-Performance Hybrid Nanogenerator for Medical Health Monitoring. Biosensors 2024, 14, 361. [Google Scholar] [CrossRef]
- Neseem, M.; Nelson, J.; Reda, S. AdaSense: Adaptive low-power sensing and activity recognition for wearable devices. In Proceedings of the 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 20–24 July 2020; pp. 1–6. [Google Scholar]
Search Term | Publisher | No. of Papers |
---|---|---|
((“sensing” OR monitor* OR track* OR measur* OR biosignal OR biomarker OR ecg OR ppg OR eeg OR gsr OR “galvanic skin response” OR “breathing rate” OR “respiratory rate” OR accelerometer OR posture OR “heart rate” OR steps OR “step count” OR hrv OR “heart rate variability” OR “skin temp*”) AND (wearable* OR smartwatch OR “biometric trackers” OR “health monitoring devices” OR “activity track*” OR “fitness track*” OR “fitness monitor*” OR “body worn sensor*” OR “smart band*”) AND (mental* OR wellbeing OR stress OR anxiety OR substance OR depression) AND (“machine learning” OR “deep learning” OR “artificial intelligence” OR AI OR ML)) | IEEE | 13 |
(as above) | MDPI | 11 |
(as above) | Elsevier | 6 |
(as above) | JMIR | 6 |
(as above) | Frontiers | 3 |
(as above) | Others | 9 |
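
The search string in the table above combines four groups of alternatives: terms inside a group are joined with OR, the groups are joined with AND, and a trailing `*` truncates a term. The following Python sketch illustrates that logic on a piece of free text. It is only an approximation under stated assumptions: each term list is an abbreviated subset of the full string, the helper names `term_to_regex` and `matches_query` are hypothetical, and real bibliographic databases apply their own query syntax and field matching.

```python
import re

# Abbreviated subsets of the four OR-groups in the search string above
# (illustrative only; the complete term lists are given in the table).
GROUPS = [
    ["sensing", "monitor*", "ecg", "ppg", "eeg", "heart rate", "hrv"],
    ["wearable*", "smartwatch", "fitness track*", "smart band*"],
    ["mental*", "wellbeing", "stress", "anxiety", "depression"],
    ["machine learning", "deep learning", "artificial intelligence", "AI", "ML"],
]


def term_to_regex(term: str) -> re.Pattern:
    """Build a case-insensitive whole-word pattern; a trailing * allows any word ending."""
    if term.endswith("*"):
        body = re.escape(term[:-1]) + r"\w*"
    else:
        body = re.escape(term)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def matches_query(text: str, groups=GROUPS) -> bool:
    """True if every group (AND) contains at least one matching term (OR)."""
    return all(
        any(term_to_regex(term).search(text) for term in group)
        for group in groups
    )


# Made-up example sentences, not taken from any reviewed paper.
hit = "We monitor heart rate with a smartwatch to detect stress using machine learning."
miss = "A smartwatch for counting daily steps."

print(matches_query(hit))
print(matches_query(miss))
```

The first sentence satisfies all four groups, while the second fails the mental health and AI groups, so only the first would be retrieved under this simplified matching.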
Study | Risk of Bias (ROB): Participants | ROB: Predictors | ROB: Outcome | ROB: Analysis | Applicability (A): Participants | A: Predictors | A: Outcome |
---|---|---|---|---|---|---|---|
[17] | high | low | low | low | high | low | low |
[18] | high | low | low | low | low | low | low |
[19] | low | low | low | low | high | low | low |
[20] | low | low | low | high | high | low | low |
[21] | low | low | low | unclear | high | low | low |
[22] | low | low | low | low | low | low | low |
[23] | low | low | low | high | low | low | low |
[24] | low | low | high | low | low | low | low |
[25] | low | low | low | low | high | low | low |
[26] | low | low | high | low | low | low | high |
[27] | low | low | low | low | low | low | low |
[28] | low | low | high | low | low | low | high |
[29] | high | high | low | low | high | low | low |
[30] | unclear | low | low | high | high | low | low |
[31] | low | low | unclear | low | high | low | high |
[32] | unclear | high | low | low | low | low | low |
[33] | high | high | low | high | high | high | low |
[34] | high | low | low | low | high | low | low |
[35] | unclear | low | low | low | unclear | low | low |
[36] | high | low | low | low | low | low | low |
[37] | high | low | high | high | high | low | high |
[38] | high | high | high | high | high | high | high |
[39] | high | low | high | high | high | low | high |
[40] | high | low | low | low | high | low | low |
[41] | high | low | low | low | high | low | low |
[42] | high | low | high | high | high | low | high |
[43] | low | low | low | low | low | low | low |
[44] | high | low | low | low | high | low | low |
[45] | high | low | low | low | high | low | low |
[57] | high | low | low | low | high | low | low |
[58] | low | low | low | low | low | low | low |
[59] | high | high | high | high | low | low | low |
[60] | low | low | low | low | low | low | low |
[61] | high | low | low | high | high | low | low |
[62] | low | low | low | low | low | low | low |
[63] | high | low | low | high | high | low | low |
[64] | low | low | low | low | low | low | low |
[65] | low | low | low | low | low | low | low |
[66] | low | low | low | low | low | low | low |
[67] | low | low | low | low | low | low | low |
[68] | low | low | low | low | low | low | low |
[69] | low | low | low | low | low | low | low |
[70] | unclear | low | low | low | unclear | low | low |
[71] | high | low | low | low | high | low | low |
[75] | low | low | low | low | low | low | low |
[76] | low | low | low | low | low | low | low |
[77] | high | low | low | high | high | low | low |
[78] | low | low | low | low | low | low | low |
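
A table of per-study ratings like the one above is often easier to read once the ratings are counted per domain. The Python sketch below shows one way to do that for rows in the same pipe-delimited layout. Only the first three rows are copied in as sample input, so the printed counts describe that sample and not the full table. The domain labels and function names are ours, and reading the last three columns as applicability ratings is an assumption based on the heading of Section 4.

```python
from collections import Counter

# Our labels for the seven rating columns, in table order (assumed naming).
DOMAINS = [
    "ROB participants", "ROB predictors", "ROB outcome", "ROB analysis",
    "Applicability participants", "Applicability predictors", "Applicability outcome",
]

# First three rows of the table above, copied verbatim as sample input.
ROWS = """\
[17] | high | low | low | low | high | low | low |
[18] | high | low | low | low | low | low | low |
[19] | low | low | low | low | high | low | low |
"""


def split_row(line: str) -> list:
    """Split one pipe-delimited row into stripped cells, ignoring the trailing pipe."""
    return [cell.strip() for cell in line.strip().rstrip("|").split("|")]


def tally(rows: str, domains=DOMAINS) -> dict:
    """Count how often each rating (high, low, unclear) occurs in each domain."""
    counts = {domain: Counter() for domain in domains}
    for line in rows.splitlines():
        if not line.strip():
            continue
        ratings = split_row(line)[1:]  # drop the study identifier
        for domain, rating in zip(domains, ratings):
            counts[domain][rating] += 1
    return counts


counts = tally(ROWS)
for domain, counter in counts.items():
    print(f"{domain}: {dict(counter)}")
```

Passing all rows of the table to `tally` in place of the three-row sample would give the corresponding counts for the full set of reviewed studies.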
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kargarandehkordi, A.; Li, S.; Lin, K.; Phillips, K.T.; Benzo, R.M.; Washington, P. Fusing Wearable Biosensors with Artificial Intelligence for Mental Health Monitoring: A Systematic Review. Biosensors 2025, 15, 202. https://doi.org/10.3390/bios15040202