1. Introduction
1.1. Suicide as a Global Public Health Crisis
Suicide is a pressing global public health issue that significantly impacts individuals across various demographics and regions. The complexity of this phenomenon is shaped by diverse risk factors, cultural contexts, and the effects of globalization. Understanding these elements is critical for developing effective prevention strategies.
Globally, suicide ranks as the third leading cause of death among 15- to 19-year-olds [1]. Epidemiological trends reveal that men generally exhibit higher suicide rates than women, except in regions like India and China, where young women are more vulnerable [2]. Additionally, youth suicide attempts are disproportionately higher in low- and middle-income countries (LMICs) compared to high-income countries, highlighting significant disparities [3].
Globalization plays a multifaceted role in influencing suicide rates. In high- and middle-income nations, suicide rates initially rise with globalization before declining due to improved healthcare and social integration [4]. In low-income countries, however, this relationship follows a U-shaped curve, where initial reductions in suicide rates give way to increases as social inequalities deepen [4,5]. Vulnerable populations, such as LGBTQ+ individuals, those with psychiatric disorders, and socioeconomically disadvantaged youth, face heightened risks. Protective factors, like family cohesion and access to mental healthcare, can help mitigate these risks [2].
1.2. The Role of Technology in Suicide Prevention
The integration of technology in detecting early suicidal ideation offers significant promise, particularly through the analysis of social media and digital communication. Advanced methodologies, including natural language processing (NLP) and deep learning, have been employed to identify behavioral patterns and emotional cues indicative of suicidal thoughts. This technological approach not only enhances detection accuracy but also facilitates timely interventions.
Natural Language Processing Techniques
NLP techniques analyze user-generated content on social media, identifying emotional nuances and abrupt behavioral changes that signal suicidal ideation [6]. Models such as the LSTM-attention-RNN and the cat swarm intelligent adaptive recurrent network achieve high accuracy rates of 93.7% and 90.3%, respectively, in detecting suicidal thoughts [7].
Deep Learning Models and Real-Time Detection
Deep learning frameworks, including transformers and multimodal approaches, effectively classify suicidal ideation, achieving F1 scores of 0.97 on specific datasets [8]. These models leverage extensive training datasets and attention mechanisms, making them suitable for real-world mental health screening applications.
Innovative systems, such as chatbot integrations, utilize deep learning to provide real-time detection of suicidal ideation during conversations, offering immediate support to individuals in distress [9]. However, while these technological advancements show promise, ethical considerations and the need for human oversight remain essential to ensure effective and responsible deployment.
Refining predictive models for suicidal ideation detection is a critical endeavor, particularly in the context of addressing the global public health crisis posed by suicide. As outlined, suicide rates remain alarmingly high, with significant disparities across demographics and regions. The use of advanced technologies, such as natural language processing (NLP) and deep learning, has shown immense potential in detecting the early signs of suicidal ideation from social media and digital communication. However, the effectiveness of these models depends heavily on continuous refinement to improve accuracy, reduce false negatives, and adapt to the complexities of language and cultural contexts.
This research underscores the value of advancing these technologies to ensure timely interventions and more precise identification of at-risk individuals. By refining models to better capture nuanced emotional cues, context-specific triggers, and behavioral patterns, researchers can create tools that are not only effective but also scalable for global application. Additionally, such refinements enable these systems to address ethical challenges and integrate human oversight, ensuring they align with the sensitive nature of mental health screening. Ultimately, the value of this research lies in its potential to save lives by bridging gaps in mental healthcare and creating proactive, data-driven solutions to combat the rising tide of suicide worldwide.
1.3. Objectives and Contributions of This Study
This study leverages advanced natural language processing (NLP) and machine learning techniques to detect suicidal ideation from Twitter data. The methodology centers on developing a robust machine learning framework capable of processing and analyzing large volumes of tweets. Using this framework, a predictive model is trained to identify patterns indicative of suicidal ideation, enabling a proactive approach to suicide prevention.
As suicide rates continue to rise globally, data-driven solutions are imperative. This research focuses on developing a predictive model that could potentially analyze social media posts in real time to identify potential suicide risks when integrated with live Twitter data streams. By incorporating NLP and sentiment analysis, the model detects textual and emotional cues often associated with distress or crisis. The study emphasizes two main objectives:
Classification of Suicidal Ideation: The first objective is to train a machine learning model to categorize suicidal ideation into three distinct levels of severity and context. This involves teaching the model to recognize subtle linguistic patterns, such as expressions of despair or self-harm intentions, using advanced NLP techniques for semantic and syntactic analysis. The classification system is foundational for understanding the varied manifestations of suicidal thoughts.
Predictive Modeling for Risk Assessment: The second objective is to develop a predictive model capable of forecasting the likelihood of suicidal ideation based on previously observed patterns in the data. This model identifies trends and warning signals, providing a tool that, when connected to live Twitter data streams, could enable the real-time monitoring of social media platforms for early intervention opportunities.
These objectives form the core of the study, demonstrating the potential for AI-driven tools to enhance mental health surveillance and support suicide prevention initiatives.
This article provides a comprehensive exploration of how artificial intelligence (AI) can be used to detect suicidal ideation from social media data. The structure of the article is designed to guide readers through the key components of the study.
Section 2 offers an in-depth literature review, examining topics such as suicidal ideation, associated risk factors, and the role of social media as a mental health indicator. It also highlights insights gained from analyzing social media data related to suicidal ideation while identifying gaps addressed by this research.
Section 3 outlines the methodology, detailing the data collection process, preprocessing techniques, and analytical approaches used to develop and train the predictive model. The results and findings are presented in Section 4, where performance metrics, such as precision, recall, and overall accuracy, are discussed to evaluate the model’s effectiveness.
Section 5 provides a detailed discussion, comparing the proposed model with previous studies, emphasizing its unique contributions, and contextualizing the results within the broader landscape of AI applications in mental health analysis. Finally, Section 6 concludes the article by summarizing the study’s contributions, implications, and potential directions for future research. This structured approach ensures a clear and thorough understanding of the study’s objectives, methods, findings, and significance.
2. Literature Review
2.1. Suicidal Ideation and Its Significance
Suicidal ideation is a pervasive public health issue influenced by psychological, social, and environmental factors. Among adolescents, this issue is especially critical due to their vulnerability to mental health disorders and socio-environmental stressors. Psychological conditions, such as depression, hopelessness, and worthlessness, often exacerbate suicidal ideation, while substance abuse and social adversities, like discrimination and strained family relationships, compound the risk [10,11,12]. Understanding these factors is foundational for developing effective strategies to mitigate suicide risk.
2.2. Prevalence and Demographic Patterns
Adolescents exhibit a notable prevalence of suicidal ideation, as highlighted by a study in Macapá, where 46.7% of adolescents reported experiencing suicidal thoughts, with higher rates among private school students compared to public school students [13]. Suicide rates among 15–19-year-olds globally are significantly higher than among younger demographics, emphasizing adolescence as a critical window for intervention [14]. Gender differences are also evident, with males generally at a higher risk of completed suicide, while females are more likely to report ideation [15].
2.3. Social Media as a Data Source for Suicide Detection
Social media platforms, like Twitter and Facebook, are rich sources of real-time data reflecting users’ psychological states. Through computational techniques, such as natural language processing (NLP) and sentiment analysis, social media content can be analyzed to identify patterns indicative of suicidal ideation. Linguistic features, emotional expressions, and behavioral indicators provide critical insights. For instance, studies have shown that increased expressions of sadness or anxiety in posts correlate with depression and suicidal thoughts [16]. Machine learning models trained on such patterns enable predictive analytics that can facilitate early intervention [17].
2.4. The Role of NLP in Detecting Suicidal Ideation
NLP has emerged as a powerful tool for understanding and classifying language indicative of mental health conditions, including suicidal ideation.
Linguistic Features: Text-based indicators, such as negative sentiment, increased use of first-person pronouns, and expressions of hopelessness, are key markers of suicidal thoughts [17].
Sentiment Analysis: Advanced models, like CNN-BiLSTM, have demonstrated high accuracy in classifying mental health-related content, making them effective in identifying the early warning signs of mental distress [18].
Topic Modeling and Semantic Analysis: NLP techniques identify recurring themes and topics, such as crisis or despair, within social media posts. These insights provide a deeper understanding of the context and severity of suicidal ideation [16].
2.5. Machine Learning Models for Suicide Detection
Machine learning (ML) enhances the capability of NLP by enabling large-scale analysis and predictive modeling. Various ML approaches have been successfully applied to detect suicidal ideation:
Supervised Learning: Logistic regression and random forest models have been employed to classify social media posts, achieving promising results in detecting suicidal tendencies [19].
Deep Learning Architectures: Models like RoBERTa-CNN and LSTM-attention-RNN have shown superior performance by capturing contextual and emotional nuances in text, with RoBERTa-CNN achieving 98% accuracy on Reddit posts [20,21].
Network-Based Models: These models incorporate users’ social connections and interaction patterns to complement text-based approaches, offering a more comprehensive assessment of mental health [22].
2.6. Integration of NLP and ML in Suicide Risk Prediction
The combination of NLP and ML allows for:
- Real-Time Monitoring: These models can process large volumes of live social media data, providing continuous mental health surveillance.
- Classification and Prediction: NLP techniques identify and classify suicidal ideation, while ML algorithms predict the likelihood of progression to suicidal behavior based on historical and linguistic patterns [18,23].
- Personalization: Models can be fine-tuned to individual users, tailoring predictions to unique linguistic and behavioral patterns, thus improving accuracy and relevance [20].
2.7. Generative AI in Cybersecurity
Generative AI, particularly large language models (LLMs), has revolutionized cybersecurity by introducing advanced capabilities that transform traditional defense paradigms. LLMs exhibit emergent abilities, such as in-context learning, adaptive instruction following, and step-by-step reasoning, allowing them to tackle novel tasks with minimal input and adapt swiftly without extensive retraining. These attributes make LLMs indispensable tools in cybersecurity, enabling sophisticated threat detection, proactive response strategies, and the development of intelligent, resilient defense systems. Osipov et al. [24] emphasize the transformative potential of LLMs in fortifying cybersecurity measures, highlighting their capacity to enhance system resilience against increasingly sophisticated cyberattacks.
2.8. AI-Based Biometric Data Processing
Machine learning advancements have significantly propelled the capabilities of biometric data processing for enhanced security applications. A prominent study by Osipov et al. [24] showcases the application of machine learning in speech emotion recognition (SER) within telecommunication systems. The researchers introduced a novel wavelet capsular neural network, 2D-CapsNet, designed to analyze photoplethysmogram (PPG) data and identify states of panic stupor with an accuracy of 86.0%. This approach highlights the growing potential of AI in interpreting biometric signals to detect emotional states, offering critical applications in stress identification, deception detection, and secure telecommunication interactions.
While prior research highlights the significant role of social, psychological, and environmental factors in suicidal ideation, as well as the potential of natural language processing (NLP) and machine learning (ML) for suicide risk detection, several critical gaps remain. Existing studies have often underutilized advanced preprocessing techniques and interpretable ML models, such as random forest classifiers, instead favoring deep learning models. While effective, these models often lack transparency, require extensive computational resources, and are not always optimized for real-time applications. Additionally, much of the research focuses on general text-based patterns without adequately addressing the challenges posed by noisy and imbalanced social media data, which are crucial for developing practical and scalable solutions.
This study aims to address these gaps by leveraging a curated dataset and advanced NLP techniques, combined with a robust and interpretable random forest classifier, to detect suicidal ideation in Twitter posts. Unlike deep learning approaches, this model emphasizes computational efficiency and interpretability, making it better suited for real-world mental health applications. By automating the data labeling process with a verified model and employing sophisticated preprocessing methods, such as tokenization, stemming, and feature extraction using term frequency–inverse document frequency (TF-IDF) and count vectorization, this study provides a scalable and practical framework for suicide prevention. The findings underscore the importance of integrating precise linguistic and emotional pattern detection with computational efficiency to enable real-time mental health surveillance and intervention, particularly among vulnerable populations like adolescents.
3. Methodology
This study utilizes a dataset derived from Twitter to develop advanced predictive models that detect suicidal ideation using natural language processing (NLP) and machine learning (ML). The dataset plays a crucial role in identifying individuals who may be at risk, offering valuable insights for suicide prevention efforts.
3.1. Data Collection Methodology
Using Python’s Tweepy library, tweets were programmatically retrieved from the Twitter API during a defined period spanning June to August 2022. The dataset comprises over 20,000 tweets, each filtered using specific English hashtags and keywords that indicate potential suicidal thoughts. Examples of these hashtags include:
#wanttodie
#suicideprevention
#waysout
#depressionhelp
#feelinghopeless
#mentalhealthstruggles
#overwhelmed
To ensure the focus remained on original user posts, retweets were systematically excluded. Each tweet in the dataset includes the following attributes:
Anonymized User ID: Ensures user privacy, while maintaining the ability to analyze post history.
Timestamp: Specifies the time and date of the post.
Content: The main body of the tweet, including any hashtags.
Associated Keywords/Hashtags: A list of tags or terms that triggered the inclusion of the tweet.
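The inclusion logic described above, matching watched hashtags and excluding retweets, can be sketched as follows. The hashtag subset, function name, and retweet-prefix convention are illustrative assumptions, not the study's actual collection code.

```python
import re

# Illustrative subset of the hashtags listed above that trigger inclusion.
WATCH_TAGS = {"#wanttodie", "#feelinghopeless", "#depressionhelp", "#overwhelmed"}

def is_candidate(tweet_text):
    """Keep only original posts that contain at least one watched hashtag.

    Retweets (conventionally prefixed with 'RT @user:') are excluded so the
    dataset contains only original user posts.
    """
    if tweet_text.startswith("RT @"):
        return False
    tags = {t.lower() for t in re.findall(r"#\w+", tweet_text)}
    return bool(tags & WATCH_TAGS)

posts = [
    "RT @someone: #wanttodie",            # retweet: excluded
    "I feel so alone #FeelingHopeless",   # original with watched tag: kept
    "Great game tonight! #sports",        # no watched tag: excluded
]
kept = [p for p in posts if is_candidate(p)]
```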
3.2. Risk Categorization Framework
For effective analysis, the tweets were categorized into risk classes based on content indicators. The dataset used in this study contained 20,000 tweets, categorized into two classes: “Potential Suicide Post” and “Not Suicide Post”. The dataset included two columns: one for the tweet content and another for the corresponding label. The labels were encoded as binary values (1 for “Potential Suicide Post” and 0 for “Not Suicide Post”). The data were preprocessed and split into training and testing sets to develop a predictive model for suicide ideation detection. We followed a plan similar to that of previous studies [25,26].
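The two-column layout and binary label encoding can be sketched as follows; the column names and example tweets are illustrative assumptions.

```python
import pandas as pd

# Illustrative two-column dataset: tweet text plus its class label.
df = pd.DataFrame({
    "tweet": [
        "Cherishing every moment with my loved ones",
        "It hurts to even wake up every morning",
    ],
    "label": ["Not Suicide Post", "Potential Suicide Post"],
})

# Encode labels as binary values:
# 1 = "Potential Suicide Post", 0 = "Not Suicide Post".
df["label"] = (df["label"] == "Potential Suicide Post").astype(int)
```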
3.3. Data Preprocessing
Loading and Cleaning Data:
The dataset was imported using Pandas, and missing values were removed. Tweets were cleaned by:
- Converting the text to lowercase.
- Removing mentions (@usernames), URLs, special characters, and numbers using regular expressions.
- Reducing consecutive repeating characters to single instances (e.g., “soooo” → “so”).
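The cleaning steps above can be sketched with regular expressions. The exact patterns are illustrative: here, runs of three or more repeated characters are collapsed so that legitimate double letters (e.g., “feel”) survive, and a whitespace-normalization step is added for tidiness.

```python
import re

def clean_tweet(text):
    """Apply the cleaning steps described above to a single tweet."""
    text = text.lower()                          # lowercase
    text = re.sub(r"@\w+", "", text)             # remove mentions (@usernames)
    text = re.sub(r"https?://\S+", "", text)     # remove URLs
    text = re.sub(r"[^a-z\s]", "", text)         # remove special characters and numbers
    text = re.sub(r"(.)\1{2,}", r"\1", text)     # collapse repeats: "soooo" -> "so"
    text = re.sub(r"\s+", " ", text)             # normalize whitespace
    return text.strip()

cleaned = clean_tweet("@friend I feel soooo lost... 123 https://t.co/x")
# cleaned == "i feel so lost"
```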
Tokenization and Stopword Removal:
- The text was tokenized into individual words.
- Common stopwords (e.g., “the”, “and”) were removed, and words were reduced to their root forms using the Porter stemmer.
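A sketch of tokenization, stopword removal, and Porter stemming; the stopword list here is an illustrative subset, and NLTK's PorterStemmer stands in for the study's stemmer.

```python
from nltk.stem import PorterStemmer

# Illustrative subset of common English stopwords.
STOPWORDS = {"the", "and", "to", "a", "i", "it", "is", "up"}

def tokenize_and_stem(text):
    """Tokenize on whitespace, drop stopwords, and Porter-stem the rest."""
    stemmer = PorterStemmer()
    return [stemmer.stem(tok) for tok in text.split() if tok not in STOPWORDS]

tokens = tokenize_and_stem("it hurts to even wake up every morning")
```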
Feature Extraction:
- Text data were converted into numerical format using TF-IDF (term frequency–inverse document frequency) and count vectorization for machine learning readiness.
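Both vectorizers are available in scikit-learn; a minimal sketch on a toy two-document corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "cant find a way out of this darkness",
    "cherishing every moment with my loved ones",
]

# Count vectorization: raw term frequencies per document.
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(corpus)

# TF-IDF: term frequencies down-weighted by how common each term is
# across the corpus, so ubiquitous words carry less weight.
tfidf_vec = TfidfVectorizer()
X_tfidf = tfidf_vec.fit_transform(corpus)

# Both produce a (n_documents, vocabulary_size) sparse matrix.
print(X_counts.shape, X_tfidf.shape)
```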
Train-Test Split:
- The dataset was divided into training (80%) and testing (20%) subsets using train_test_split.
3.4. Model Development
Algorithm Selection: A random forest classifier was chosen for its robustness and ability to handle high-dimensional data. It was trained with 100 estimators for optimal performance.
Training Process: The classifier was trained on the preprocessed training set (X_train, y_train) and validated on the testing set (X_test, y_test).
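The split and training configuration can be sketched as follows, using a synthetic feature matrix as a stand-in for the TF-IDF features (the data here are generated, not the study's dataset):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the TF-IDF feature matrix and binary labels.
X, y = make_classification(n_samples=500, n_features=50, random_state=42)

# 80/20 train-test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Random forest with 100 estimators, mirroring the study's configuration.
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```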
Evaluation Metrics: The model’s performance was evaluated using the following standard metrics:
Precision: Proportion of correct positive predictions.
Recall: Proportion of actual positives correctly identified.
F1-Score: Harmonic mean of precision and recall.
Accuracy: Overall correctness of predictions.
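In terms of the confusion-matrix counts (TP, TN, FP, FN), these metrics are:

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{F1} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```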
A confusion matrix was used to visualize true positives, true negatives, false positives, and false negatives, which provided a detailed view of model performance. The confusion matrix was particularly useful for identifying specific areas where the model underperformed, such as false negatives (critical in suicide ideation detection).
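A sketch of computing the confusion matrix and per-class metrics with scikit-learn; the labels below are illustrative, not the study's predictions.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Illustrative true and predicted labels (1 = potential suicide post).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall, and F1, as reported in Table 2.
print(classification_report(
    y_true, y_pred,
    target_names=["Not Suicide Post", "Potential Suicide Post"],
))
```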
3.5. Data Features
Data Sample
The following sections provide an in-depth overview of the Twitter data utilized in this study, beginning with a sample dataset that illustrates the structure and classification of the posts. Additionally, the sections detail the distribution of “Suicide” versus “Not Suicide” posts, highlighting the percentage split between these categories to offer insight into the data’s composition and balance.
Table 1 provides an overview of how the dataset is structured, showcasing examples of tweets labeled as either “Not Suicide Post” or “Potential Suicide Post”. Each row represents a single tweet along with its corresponding classification, offering a clear understanding of the type of data used in the study. For instance, tweets such as “I love my new phone it’s super fast” and “Cherishing every moment with my loved ones” are labeled as “Not Suicide Post”, reflecting neutral or positive sentiments. On the other hand, tweets like “It hurts to even wake up every morning” and “I can’t seem to find a way out of this darkness” are categorized as “Potential Suicide Post”, indicating expressions of emotional distress or hopelessness. This structure highlights the diverse linguistic and emotional cues present in the dataset, which are essential for training models to detect suicidal ideation effectively.
Figure 1 illustrates the distribution of suicide risk classification, with 59.6% of posts classified as “Not Suicide Post” and 40.4% as “Potential Suicide Post”. While the dataset is not heavily imbalanced, the notable proportion of “Potential Suicide Posts” underscores the importance of accurately identifying and addressing these cases. This distribution is reflective of the realistic variability in social media content, where a significant number of posts express potential distress or suicidal ideation. A nearly balanced dataset ensures the model is not biased toward either class, allowing it to perform effectively in distinguishing between the two. Such a distribution justifies the need for rigorous preprocessing and robust model development to handle the sensitive nature of suicidal ideation detection.
Figure 2 below shows that the dataset contains notably more not suicide posts (11,921) than potential suicide posts (8079). While not extremely imbalanced, this distribution may lead to biased results depending on how the model handles class weighting. Even though the imbalance is not severe, the consequences of missing actual suicide-related posts are significant. Ensuring the model is sensitive enough to detect potential suicide posts is crucial, even if it means accepting a slightly higher false positive rate.
Figure 3 below shows a word cloud of the potential suicide posts. The word cloud illustrates the most frequently occurring words in potential suicide posts, with larger words like “hurts”, “lost”, “wake”, “even”, and “pain” representing the dominant themes in the dataset. These words reflect intense emotional distress, feelings of hopelessness, and personal struggles. Supporting terms, such as “nobody”, “understands”, “burden”, and “unbearable”, further emphasize themes of isolation and a sense of being overwhelmed. Phrases like “better without” and “every morning” hint at repetitive struggles and despair, adding context to the emotional expressions in the posts.
The purpose of the word cloud is to provide a visual summary of the language patterns in posts associated with suicidal ideation, helping to identify key emotional cues and recurring themes. This visualization offers valuable insights into the dataset, highlighting specific linguistic patterns that can guide the development of predictive models or mental health interventions. By focusing on these prominent words and phrases, researchers can better understand the emotional undertones of at-risk individuals and create targeted strategies for timely support.
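The raw input to a word-cloud renderer is simply a word-frequency table, with each word's display size scaled by its count. A minimal sketch of that underlying computation (the posts are illustrative):

```python
from collections import Counter

# Illustrative distress-related posts echoing the themes in Figure 3.
posts = [
    "it hurts to even wake up every morning",
    "nobody understands the pain i feel",
    "i feel lost and the pain is unbearable",
]

# Word frequencies: the values a word-cloud renderer scales word sizes by.
counts = Counter(word for post in posts for word in post.split())
top = counts.most_common(3)
```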
4. Results
4.1. Performance Results
The model’s performance metrics are summarized in Table 2.
The bar chart in Figure 4 highlights the model’s strong performance in classifying not suicide posts (Class 0) and potential suicide posts (Class 1), with high precision, recall, and F1-scores for both categories. While precision is slightly higher than recall for potential suicide posts, indicating a low rate of false positives, recall is marginally higher for not suicide posts, reflecting the model’s ability to capture most true cases in this class. The consistently high F1-scores across both classes demonstrate the model’s balance between precision and recall, showcasing its reliability in accurately distinguishing between the two categories. This performance underscores the model’s effectiveness for detecting suicidal ideation, while maintaining a manageable rate of false positives and negatives.
Table 2 presents performance metrics for a classification model predicting two classes: “Not Suicide Post” (Class 0) and “Potential Suicide Post” (Class 1). The model achieves a precision of 82% and recall of 88% for Class 0, indicating that it effectively identifies nonsuicidal posts but has some false positives. For Class 1, the precision is higher at 88%, meaning fewer false alarms, while the recall is slightly lower at 83%, showing some missed cases of suicidal ideation. Both classes have an F1-score of 0.85, reflecting a balanced performance between precision and recall. With a total of 145 instances for Class 0 and 155 for Class 1, the metrics are evaluated on a fairly balanced dataset.
Overall, the model achieves an accuracy of 85%, with macro and weighted averages of precision, recall, and F1-scores also at 0.85, indicating consistent performance across both classes. While the model performs well overall, slightly improving the recall for potential suicide posts could further reduce missed critical cases, which is vital in real-world applications, such as suicide ideation detection.
4.2. Precision–Recall Curve
The precision–recall curve is used to evaluate a model’s ability to distinguish between positive and negative classes, especially in datasets with class imbalance, by showing the tradeoff between precision (accuracy of positive predictions) and recall (ability to identify all true positives). It helps determine the optimal balance for specific applications, such as minimizing false negatives in suicide ideation detection, while maintaining reasonable precision to avoid excessive false positives.
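A sketch of computing the curve and its AUC with scikit-learn; the labels and scores below are illustrative, not the study's outputs.

```python
from sklearn.metrics import auc, precision_recall_curve

# Illustrative true labels and predicted probabilities for the positive class.
y_true = [0, 0, 1, 1, 0, 1, 1, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.6]

# Precision and recall at every classification threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_scores)

# Area under the precision-recall curve summarizes the tradeoff in one number.
pr_auc = auc(recall, precision)
```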
Figure 5 shows the precision–recall curve. The model demonstrates strong performance, as indicated by a high AUC score of 0.93, suggesting it effectively distinguishes between positive and negative classes. The precision–recall curve highlights a tradeoff: at low recall values, precision is near 1.0, meaning positive predictions are highly accurate but many true cases are missed (false negatives). As recall increases, the model identifies more true positives, but precision declines due to a rise in false positives, reflecting the typical tradeoff between these metrics.
In practical applications like suicide ideation detection, high recall is critical to minimize false negatives, ensuring that actual cases are not missed, while reasonable precision prevents overwhelming resources with false positives. The model achieves a commendable balance, making it well-suited for contexts where identifying true cases is prioritized without overloading systems. The PR curve and AUC underscore the model’s effectiveness and its potential for deployment in sensitive mental health tasks.
4.3. Dataset Reduction
Reducing the dataset size from 20,000 tweets to a smaller, curated subset was necessary to enhance the model’s precision by focusing on high-quality and relevant data. A large dataset often contains noise, such as mislabeled or irrelevant entries, which can confuse the model and reduce its ability to make accurate predictions. By carefully curating the dataset, we eliminated much of this noise, enabling the model to better capture meaningful patterns and correctly identify potential suicide posts.
However, this approach introduced trade-offs. While precision improved by reducing false positives, the smaller dataset limited the diversity of examples, potentially affecting the model’s generalizability and recall (its ability to capture all true positives). This trade-off highlights the balance between focusing on accuracy in predictions versus ensuring the model can handle a broader range of inputs, especially in real-world applications where variability in data is inevitable.
4.4. Suicide Ideation Confusion Matrix
The confusion matrix was chosen because it provides an in-depth understanding of classification performance beyond overall accuracy. It highlights errors, like false positives (incorrectly flagging nonsuicidal posts) and false negatives (missing potential suicide posts), both of which are critical for real-world applications. By analyzing these metrics, targeted improvements can be made to address specific weaknesses.
Figure 6 shows two confusion matrices, one for the training dataset (left) and one for the test dataset (right). These matrices summarize the model’s performance in predicting “Not Suicide Post” and “Potential Suicide Post” classifications.
4.5. Training Confusion Matrix (Left):
True Positives (TP): 302 — the model correctly classified 302 posts as “Potential Suicide Post”.
True Negatives (TN): 315 — the model correctly classified 315 posts as “Not Suicide Post”.
False Positives (FP): “Not Suicide Posts” misclassified as “Potential Suicide Posts”.
False Negatives (FN): 43 — “Potential Suicide Posts” misclassified as “Not Suicide Posts”.
This matrix shows strong performance on the training set, with a relatively low number of false positives and false negatives, suggesting that the model has effectively learned patterns in the training data.
Test Confusion Matrix (Right):
True Positives (TP): 128 — the model correctly identified 128 “Potential Suicide Posts”.
True Negatives (TN): 127 — the model correctly identified 127 “Not Suicide Posts”.
False Positives (FP): 18 — “Not Suicide Posts” misclassified as “Potential Suicide Posts”.
False Negatives (FN): 27 — “Potential Suicide Posts” misclassified as “Not Suicide Posts”.
The test matrix reflects the model’s ability to generalize to unseen data, with a strong balance of true positives and true negatives. However, relative to class size, false negatives increase slightly compared to the training data, indicating potential room for improvement in recall for “Potential Suicide Posts”.
4.6. Comparison and Insights:
1. Generalization: The training matrix shows higher overall correct classifications compared to the test matrix, indicating that the model has learned well on the training data. However, the slight difference in performance on the test set may highlight minor overfitting or areas where the model’s generalizability could improve.
2. False Negatives: The presence of 27 false negatives in the test set is critical for suicide ideation detection, as missing potential suicide posts could have severe real-world implications. Strategies to improve recall, such as fine-tuning classification thresholds or enhancing feature representation, are necessary.
3. False Positives: The relatively low number of false positives in both matrices indicates that the model maintains high precision, minimizing unnecessary alerts, which is valuable for efficient resource allocation.
The confusion matrices indicate that the model performs well in distinguishing between a “Not Suicide Post” and “Potential Suicide Post” on both training and test data. While the model achieves a good balance of precision and recall, addressing false negatives in the test set should be prioritized to ensure the robust and reliable detection of suicide ideation in real-world scenarios.
4.7. Performance Analysis
5. Discussion
5.1. Machine Learning Models
5.1.1. Logistic Regression (LR)
Kruthika et al. [27] demonstrated the simplicity and effectiveness of logistic regression combined with Bag-of-Words (BoW) vectorization, achieving an accuracy of 92%. While LR achieved high accuracy, this study’s random forest classifier (RFC) demonstrated superior robustness in handling high-dimensional data and capturing complex linguistic patterns, though it achieved a slightly lower accuracy of 85%. This highlights RFC’s strength in providing balanced and nuanced insights rather than focusing solely on raw accuracy.
5.1.2. Support Vector Machines (SVM)
Goni et al. [28] reported a 94% accuracy for SVM models utilizing a probability-based feature set (ProBFS). While SVM demonstrated strong performance in binary classification tasks, the RFC used in this study balanced precision (88%) and AUC (0.93), providing a broader evaluation of model reliability. This balance is crucial for the sensitive task of suicide ideation detection, where minimizing false negatives is a priority.
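The precision–recall AUC cited here summarizes the trade-off across every possible threshold. A pure-Python sketch of how the curve’s points are traced, on hypothetical scores (sklearn’s precision_recall_curve performs the same sweep):

```python
# Sweep the decision threshold over the observed scores and record
# (recall, precision) at each step; these points form the PR curve.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.30]   # hypothetical probabilities
labels = [1,    1,    0,    1,    0,    0]

points = []
for threshold in sorted(set(scores), reverse=True):
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    points.append((tp / (tp + fn), tp / (tp + fp)))   # (recall, precision)

for recall, precision in points:
    print(f"recall={recall:.2f} precision={precision:.2f}")
```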
5.1.3. Naive Bayes
Shanmukha et al. [29] highlighted the effectiveness of naive Bayes with TF-IDF vectorization, showcasing strong results in some contexts. Similar to this study’s methodology, TF-IDF was utilized for feature extraction. However, the RFC utilized in this study offered more comprehensive insights into linguistic patterns, demonstrating an advantage over the simplicity of naive Bayes.
5.2. Deep Learning Models
5.2.1. BERT and Transformers
Akintoye et al. [30] showcased exceptional performance from models like BERT and RoBERTa, with accuracies ranging from 97.7% to 99.9%, due to their ability to capture semantic nuances in text. While deep learning models excel in raw performance, their reliance on extensive computational resources and large-scale datasets contrasts with this study’s approach, which focuses on a smaller, curated dataset. This methodology prioritizes interpretability and practical application, particularly in contexts where computational efficiency and real-world deployment are critical.
5.2.2. Long Short-Term Memory (LSTM)
SenilSeby et al. [30] emphasized LSTM’s ability to capture temporal dynamics, achieving accuracies of around 92.3%. LSTM’s temporal modeling capabilities are well-suited for sequential data but may not be as critical for single-instance text analysis, such as tweets. In contrast, the RFC’s ability to efficiently process high-dimensional feature sets aligns more effectively with the static nature of tweet data in this study.
5.3. Key Findings and Contributions
The rise in mental health concerns globally, particularly suicide rates, underscores the urgent need for innovative, data-driven solutions to enhance mental health surveillance and intervention. Social media platforms, such as Twitter, provide a unique opportunity to detect signs of suicidal ideation in real time, given the openness and immediacy of user-generated content. While several studies have explored machine learning (ML) and natural language processing (NLP) techniques for analyzing social media data, challenges, such as noisy datasets, false positives, and scalability, remain prevalent.
The study presents a novel approach to detecting suicidal ideation from Twitter data by integrating advanced NLP techniques with a robust random forest classifier (RFC). The novelty of this research lies in its focus on leveraging high-quality, curated datasets alongside state-of-the-art preprocessing methods to enhance the accuracy and reliability of predictive modeling in a sensitive context. Unlike previous studies that rely heavily on resource-intensive deep learning models, our work emphasizes computational efficiency, interpretability, and practical applicability.
Contributions
The specific contributions of this research are as follows:
Development of a Robust Model: A random forest classifier was utilized to handle high-dimensional data effectively, balancing precision and recall to minimize false negatives in suicidal ideation detection.
Advanced Preprocessing Pipeline: The study employs rigorous preprocessing techniques, including tokenization, stemming, and feature extraction through term frequency–inverse document frequency (TF-IDF) and count vectorization, to ensure high-quality data transformation.
Balanced Dataset Approach: A curated dataset of over 20,000 tweets was refined to focus on meaningful linguistic patterns, reducing noise while maintaining a realistic distribution of classes.
Comprehensive Validation: The model’s performance was rigorously evaluated using metrics such as accuracy (85%), precision (88%), recall (83%), and a precision–recall AUC score of 0.93, demonstrating its reliability for real-world applications.
Scalability and Practicality: The proposed framework is scalable and suitable for real-time deployment, offering a practical tool for mental health monitoring and suicide prevention on social media platforms.
By addressing existing challenges and offering a computationally efficient, interpretable, and reliable model, this research establishes a benchmark for leveraging artificial intelligence in the critical field of mental health surveillance. The findings demonstrate the potential of machine learning to contribute meaningfully to global suicide prevention strategies.
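The TF-IDF weighting at the heart of the preprocessing pipeline can be illustrated without a library. This sketch uses the textbook tf × log(N/df) form on hypothetical, pre-tokenized documents (sklearn’s TfidfVectorizer applies a smoothed variant plus normalization):

```python
import math

# Textbook TF-IDF: term frequency scaled by log(N / document frequency),
# so words concentrated in few documents receive the highest weights.
docs = [["feel", "pain", "pain"], ["feel", "fine"], ["good", "day"]]  # hypothetical
N = len(docs)

def tfidf(term, doc):
    tf = doc.count(term) / len(doc)          # within-document frequency
    df = sum(1 for d in docs if term in d)   # documents containing the term
    return tf * math.log(N / df)

print(tfidf("pain", docs[0]))   # distinctive term: high weight
print(tfidf("feel", docs[0]))   # spread across documents: lower weight
```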
5.4. Innovative Approaches
Visualization and Linguistic Validation: The use of word clouds to analyze and validate the linguistic focus of potential suicide posts distinguished this study. By highlighting recurring emotional cues, such as words like “pain” and “lost”, the visualization provided qualitative support for the model’s ability to identify relevant patterns in the data.
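A word cloud is driven by plain term frequencies; the counting step behind it can be sketched as follows (hypothetical posts and stopword list; rendering the image itself would typically use a package such as wordcloud):

```python
from collections import Counter

# Count word frequencies across flagged posts; the most frequent terms
# ("pain", "lost", ...) are what a word cloud renders largest.
posts = ["so much pain today", "i feel lost", "lost in pain"]   # hypothetical
stopwords = {"so", "much", "i", "in", "today", "feel"}

counts = Counter(
    word for post in posts for word in post.split() if word not in stopwords
)
print(counts.most_common(2))
```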
Ethical and Practical Relevance: Unlike studies prioritizing raw performance, this study aligns with the ethical requirements of suicide prevention by balancing accuracy with interpretability and real-world applicability.
In summary, while models such as SVM and the deep learning approaches achieve higher raw accuracy, this study’s RFC-based approach offers a valuable balance of precision, recall, and interpretability. The focus on dataset quality and practical application makes it a robust choice for detecting suicidal ideation on Twitter, addressing the unique challenges of this critical task.
5.5. Alignment with Objectives
The study successfully achieved its objectives by developing a Python-based system capable of processing large-scale datasets and generating predictive insights for suicide prevention. The integration of natural language processing (NLP) and sentiment analysis allowed the model to capture nuanced emotional and textual cues associated with suicidal ideation. This capability enabled the real-time identification of high-risk individuals, fulfilling the goal of supporting healthcare providers, social media platforms, and intervention agencies in deploying timely and targeted interventions.
The results underscore the potential of combining sentiment detection with advanced machine learning frameworks, such as transformer models (e.g., BERT or GPT), to further improve precision and reduce false positives. Incorporating such architectures could enhance the detection of complex emotional states and subtle linguistic signals indicative of suicidal ideation, allowing for more precise and effective resource allocation. Additionally, extending the system’s functionality to include temporal and contextual analysis could enable it to identify behavioral patterns over time, offering deeper insights into triggers like social isolation or bullying. These advancements would enhance the system’s scalability and impact, ensuring proactive support for individuals at risk.
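The temporal extension suggested above could start from a simple windowed count of flagged posts per user. The sketch below assumes flagged posts arrive as timestamps; the seven-day window and three-post threshold are hypothetical illustration values, not rules from the study:

```python
from datetime import datetime, timedelta

# Flag escalation when enough of a user's flagged posts fall inside one window.
WINDOW, THRESHOLD = timedelta(days=7), 3   # hypothetical rule, for illustration only

def escalating(timestamps):
    timestamps = sorted(timestamps)
    for i, start in enumerate(timestamps):
        in_window = [t for t in timestamps[i:] if t - start <= WINDOW]
        if len(in_window) >= THRESHOLD:
            return True
    return False

flagged = [datetime(2024, 1, d) for d in (1, 2, 3, 15)]   # one user's flagged posts
print(escalating(flagged))   # three posts within the first seven days
```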
Overall, this study highlights the value of leveraging artificial intelligence to address critical mental health challenges. The model demonstrated a strong ability to detect suicidal ideation while maintaining a balance between precision and recall. Although the findings are promising, future iterations must focus on reducing false negatives and incorporating advanced techniques to further refine the system’s capabilities, ensuring it becomes an indispensable tool in suicide prevention efforts.
5.6. Limitations and Future Directions
The framework we propose is inherently designed to integrate with live systems. Although our study focuses on validating the model on existing data, the predictive capabilities demonstrated lay the groundwork for real-time deployment. Future iterations will integrate continuous data streams to enable personalized monitoring, addressing the concern about adapting the model to individual psychological health needs. We acknowledge the importance of validating these claims with live data and have outlined this as a crucial next step in our research. Our ultimate aim is to translate these findings into real-time applications that can significantly enhance mental health surveillance.
While the model demonstrated strong performance, further efforts are needed to address false negatives, as these represent critical missed opportunities for intervention. Incorporating additional data sources, expanding the dataset, and employing advanced techniques, such as temporal analysis and topic modeling, could improve the model’s ability to identify context-specific risk factors. Additionally, embedding ethical considerations and privacy safeguards into the system ensures responsible deployment while maintaining user trust.
While the model demonstrates potential for detecting suicidal ideation, several limitations must be acknowledged. One significant limitation is that the markup process for the dataset was not elaborated from the perspective of healthcare professionals. Although the model is designed to align with widely accepted standards, further collaboration with healthcare professionals and social authorities is necessary to refine its classification accuracy. This step is crucial to ensure that the annotations and predictions are practically relevant and suitable for real-world applications.
Additionally, the model is not intended to replace professional judgment but rather to serve as a supplementary tool. Its current scope relies on a limited dataset, and while it is designed to integrate with larger datasets for further testing and validation, this scalability has yet to be fully realized. By working alongside healthcare professionals and social authorities, the model can be used as an extra measure to flag potential cases of suicidal ideation, providing data-driven insights that complement expert evaluations. These collaborative efforts will enhance the model’s reliability and utility in real-world mental health monitoring and intervention scenarios.
Taken together, this study demonstrates the potential of AI-driven solutions to tackle the global mental health crisis through the large-scale analysis of social media data. The model’s ability to detect suicidal ideation effectively aligns with the objectives of scaling proactive suicide prevention and advancing data-driven models. Future enhancements will further refine its capabilities, paving the way for impactful applications in mental health intervention and resource allocation.
6. Conclusions
This study set out to address the global mental health crisis by leveraging artificial intelligence (AI) to detect suicidal ideation from social media posts. The primary goal was to develop a data-driven model capable of analyzing large-scale Twitter datasets to identify patterns indicative of distress. Through advanced natural language processing (NLP) and sentiment analysis techniques, the model aimed to provide timely and actionable insights to support suicide prevention efforts. By achieving a balance between precision and recall, the model demonstrates significant potential for real-world applications in identifying at-risk individuals and facilitating targeted interventions.
The process involved several critical steps to ensure the model’s effectiveness. A curated subset of the original dataset of 20,000 tweets was used to minimize noise and focus on high-quality, relevant data. Preprocessing techniques, such as tokenization, stemming, and feature extraction through TF-IDF and count vectorization, transformed raw text into a machine-readable format. A random forest classifier was selected for its robustness and ability to handle high-dimensional data, ensuring that the model could effectively capture linguistic and emotional patterns associated with suicidal ideation. These steps were integral in refining the model’s ability to distinguish between “Potential Suicide Posts” and “Not Suicide Posts”.
The findings revealed that the model achieved an overall accuracy of 85%, with a precision of 88% and a recall of 83% for potential suicide posts, reflecting its strong capability to identify high-risk cases, while maintaining a low false positive rate. The precision–recall curve, with an AUC score of 0.93, further validated its effectiveness in balancing precision and recall across varying thresholds. The confusion matrices for both training and test datasets highlighted areas for improvement, particularly in reducing false negatives, which remain critical for sensitive applications like suicide prevention.
Overall, the study demonstrates the potential of AI-driven solutions to address critical mental health challenges by analyzing social media data to detect suicidal ideation. While the model performed well in capturing high-risk cases and minimizing false alarms, further enhancements are needed to improve recall and address missed cases. Incorporating advanced architectures like transformers, temporal analysis, and broader datasets can further refine the system. This research underscores the importance of leveraging AI to create scalable and proactive tools for mental health intervention, offering hope for a more comprehensive approach to suicide prevention.