Review

Early Warning of Infectious Disease Outbreaks Using Social Media and Digital Data: A Scoping Review

by Yamil Liscano 1,2,*, Luis A. Anillo Arrieta 2,3, John Fernando Montenegro 1, Diego Prieto-Alvarado 1 and Jorge Ordoñez 1

1 Grupo de Investigación en Salud Integral (GISI), Departamento Facultad de Salud, Universidad Santiago de Cali, Cali 760035, Colombia
2 School of Basic Sciences, Technology, and Engineering, Universidad Nacional Abierta y a Distancia–UNAD, Barranquilla 080005, Colombia
3 Department of Public Health, Division of Health Sciences, Universidad del Norte, Barranquilla 080005, Colombia
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2025, 22(7), 1104; https://doi.org/10.3390/ijerph22071104
Submission received: 21 April 2025 / Revised: 12 June 2025 / Accepted: 16 June 2025 / Published: 13 July 2025

Abstract

Background and Aim: Digital surveillance, which utilizes data from social media, search engines, and other online platforms, has emerged as an innovative approach for the early detection of infectious disease outbreaks. This scoping review aimed to systematically map and characterize the methodologies, performance metrics, and limitations of digital surveillance tools compared to traditional epidemiological monitoring. Methods: A scoping review was conducted in accordance with the Joanna Briggs Institute and PRISMA-ScR guidelines. Scientific databases including PubMed, Scopus, and Web of Science were searched, incorporating both empirical studies and systematic reviews without language restrictions. Key elements analyzed included digital sources, analytical algorithms, accuracy metrics, and validation against official surveillance data. Results: The reviewed studies demonstrate that digital surveillance can provide significant lead times (from days to several weeks) compared to traditional systems. While performance varies by platform and disease, many models showed strong correlations (r > 0.8) with official case data and achieved low predictive errors, particularly for influenza and COVID-19. Google Trends and X (formerly Twitter) emerged as the most frequently used sources, often analyzed using supervised regression, Bayesian models, and ARIMA techniques. Conclusions: While digital surveillance shows strong predictive capabilities, it faces challenges related to data quality and representativeness. Key recommendations include the development of standardized reporting guidelines to improve comparability across studies, the use of statistical techniques like stratification and model weighting to mitigate demographic biases, and leveraging advanced artificial intelligence to differentiate genuine health signals from media-driven noise. These steps are crucial for enhancing the reliability and equity of digital epidemiological monitoring.

1. Introduction

Emerging infectious diseases pose an increasing threat to both biodiversity and human societies, significantly impacting ecosystems and global public health. Traditionally, epidemiological surveillance has relied mainly on official systems that can sometimes experience delays in the early detection of outbreaks. In this context, digital surveillance has emerged as a complementary tool that captures early signals in real time using data from unconventional sources [1,2,3,4].
Digital platforms such as search engines, social media, and participatory surveillance systems have become key tools for detecting early signs of epidemic activity. Various studies have demonstrated that signals derived from Baidu searches or X (formerly Twitter) posts, for example, can anticipate the detection of diseases like COVID-19 and influenza by one to three weeks prior to official reports from traditional systems [5,6,7,8].
The integration of multiple digital sources, including Google queries, X messages, and alerts from platforms such as HealthMap, achieves high correlation levels with official data while significantly enhancing predictive capacity. Advanced techniques such as machine learning algorithms, autoregressive models, and temporal smoothing statistical methods have been shown to increase precision and provide timely alerts [4,9,10].
These methods are not confined to a single geographical context; their effectiveness has been validated in multiple countries, including China, the United Kingdom, the United States, and various Latin American nations. The high spatial and temporal resolution of these digital data facilitates the precise identification of local transmission hotspots, enabling targeted prevention and control measures [11,12,13,14,15].
Nevertheless, digital surveillance faces significant challenges related to data quality, such as media noise and biases from disparities in internet access. To ensure the reliability of these signals, rigorous validation and methodological clarity are essential. For instance, “careful keyword selection” is often cited as a requirement, but this can refer to diverse processes ranging from the use of expert-curated clinical terms to empirically validated, algorithm-based selections. While the potential of digital surveillance is clear, the wide heterogeneity in methods, data sources, and validation strategies makes it difficult for policymakers to assess and compare approaches. This creates a critical knowledge gap between academic research and practical implementation, highlighting the need for a systematic synthesis that maps these different methodologies and their limitations [1,16,17,18].
Therefore, the objective of this scoping review is not to definitively evaluate, but to systematically map and characterize the landscape of digital surveillance for infectious diseases. Specifically, we aim to: (1) identify the key data sources, analytical methods, and performance metrics reported in the literature; (2) categorize the different types of early warning mechanisms employed; and (3) synthesize the primary challenges and limitations discussed, including data quality, biases, and the gap between prediction and implementation, in order to provide a clear overview for researchers and decision-makers.

2. Materials and Methods

2.1. Protocol and Definitions

This study was conducted in accordance with the Joanna Briggs Institute guidelines for scoping reviews and the PRISMA-ScR framework [19,20]. A detailed protocol was developed outlining the eligibility criteria, search strategies, study selection procedures, data extraction, and descriptive analysis, with the aim of mapping and synthesizing the literature on digital disease surveillance.
Traditional Epidemiological Surveillance: The ongoing, systematic collection, analysis, and interpretation of health-related data essential to planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know [21,22].
Digital Surveillance: The use of digital data, particularly from social media or other internet-based sources, for the purpose of public health surveillance [16].
Participatory Surveillance: An approach for gathering information from the community to monitor health trends, where members of the community are proactively engaged to regularly report on health events. It complements other sources of surveillance information, such as from health care facilities [23].
Warning Signal: For the purpose of this review, a “warning signal” is defined as a statistically significant deviation from a baseline or expected pattern in a digital data stream (e.g., search queries, social media posts) that suggests a potential increase in infectious disease activity processed and identified by an analytical model [6,24,25,26].
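As a concrete illustration of this definition, the deviation rule can be sketched as a rolling z-score test over a digital data stream. The window length and threshold below are illustrative choices for exposition, not parameters drawn from the reviewed studies:

```python
from statistics import mean, stdev

def warning_signals(series, window=7, z_threshold=3.0):
    """Flag points that deviate significantly from a rolling baseline.

    A point is flagged as a warning signal if it exceeds the mean of the
    preceding `window` observations by more than `z_threshold` sample
    standard deviations (an illustrative rule, not from any one study).
    """
    signals = []
    for t in range(window, len(series)):
        baseline = series[t - window:t]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and (series[t] - mu) / sigma > z_threshold:
            signals.append(t)
    return signals

# Flat query volume with a sudden spike at index 10
volume = [100, 102, 99, 101, 100, 98, 103, 100, 101, 99, 180]
print(warning_signals(volume))  # → [10]
```

In practice, the reviewed models replace this simple rule with regression, Bayesian, or ARIMA-based baselines, but the underlying logic of flagging departures from an expected pattern is the same.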

2.2. Eligibility Criteria

Empirical studies (observational, experimental, and modeling) as well as systematic or scoping reviews that met the following criteria were included:
  • They report methods and results on digital surveillance of infectious diseases (e.g., influenza, COVID-19, RSV, dengue).
  • They describe study design characteristics, digital data sources, detection methods, temporal advantages, accuracy, and correlations with traditional surveillance systems.
  • They are published in peer-reviewed journals and available as full-text, without language restrictions.
Studies were excluded based on the following criteria:
  • Did not provide empirical data or focus exclusively on theoretical models lacking validation.
  • Focused solely on traditional surveillance without integrating digital data.
  • Presented unclear or insufficient results for extracting the required information.

2.3. PCC (Population, Concept, Context) Question

  • Population (P): Studies that involve the use of digital data (e.g., social media, search engines, mobile applications) for disease surveillance in human populations.
  • Concept (C): The characteristics, performance, and methods of digital disease surveillance; variables to be extracted include study design, data sources, detection methods, temporal advantages, detection rates, accuracy, among others.
  • Context (C): Studies published in peer-reviewed scientific journals addressing digital surveillance of outbreaks or epidemics at local, regional, national, or global levels.

2.4. Search Strategy

A systematic and comprehensive search strategy was designed and executed on 24 February 2025, across eight electronic databases: PubMed, Scopus, Web of Science, Springer, SciELO, Science Direct, Google Scholar, and Redalyc. The strategy was constructed using key terms related to the components of the PCC (Population, Concept, Context) question.
No language or date range filters were applied during the database search. The final selection of studies was based on the eligibility criteria detailed in Section 2.2, which were applied by the reviewers during the screening phases.
The specific search strategy used for each database and the results obtained are detailed below:
1. PubMed
Strategy: (((“digital surveillance” [Title/Abstract] OR “infodemiology” [Title/Abstract] OR “infoveillance” [Title/Abstract])) AND ((“infectious diseases” [MeSH Terms] OR “disease outbreaks” [MeSH Terms] OR “epidemics” [Title/Abstract])) AND ((“social media” [MeSH Terms] OR “Google Trends” [Title/Abstract] OR “Twitter” [Title/Abstract] OR “X” [Title/Abstract])) AND ((“machine learning” [MeSH Terms] OR “forecasting” [Title/Abstract] OR “time series analysis” [Title/Abstract]))).
2. Scopus
Strategy: TITLE-ABS-KEY ((“digital surveillance” OR “infodemiology”) AND (“infectious diseases” OR “epidemic”) AND (“social media” OR “Google Trends” OR “Twitter” OR “X”) AND (“machine learning” OR “forecasting”)).
3. Web of Science (WOS)
Strategy: TS = ((“digital surveillance” OR “infodemiology”) AND (“infectious diseases” OR “epidemic”) AND (“social media” OR “Google Trends” OR “Twitter” OR “X”) AND (“machine learning” OR “forecasting”)).
4. Science Direct
Strategy: (“digital surveillance” OR “infodemiology”) AND (“infectious diseases” OR “epidemic”) AND (“social media” OR “Google Trends” OR “Twitter” OR “X”) AND (“machine learning” OR “forecasting”).
5. Google Scholar
Strategy: allintitle: (“digital surveillance” OR “infodemiology” OR “social media”) AND (“infectious diseases” OR “epidemic” OR “outbreak”).
6. Springer
Strategy: (“digital surveillance” OR “infodemiology”) AND (“infectious diseases” OR “epidemic”) AND (“social media” OR “Google”).
7. SciELO
Strategy: (ti:(vigilancia digital OR infodemiología)) AND (ti:(enfermedades infecciosas OR epidemia)).
8. Redalyc
Strategy: (“vigilancia digital” OR “infodemiología”) AND (“enfermedades infecciosas”).
Note: In search strings that include “Twitter OR X,” both terms were used to ensure comprehensive coverage of the literature published both before and after the platform’s rebranding.

2.5. Study Selection and Data Extraction

Following the search, all identified records were exported to Zotero (version 6.0; accessed 24 February 2025) for citation management and removal of duplicates. The remaining records were then uploaded to Rayyan (Rayyan Systems Inc.; 125 Cambridgepark Drive, Suite 301, Cambridge, MA 02140, USA; https://www.rayyan.ai/; accessed 24 February 2025), a web application designed to facilitate collaborative screening. Two independent reviewers (Yamil Liscano and Luis Anillo) screened the titles and abstracts of the identified studies to determine their eligibility. Preselected articles then underwent full-text review to confirm compliance with the inclusion and exclusion criteria. Discrepancies were resolved by consensus or consultation with a third reviewer.
To achieve the scoping review’s objective of comprehensively mapping the field, a structured form was developed for the detailed extraction of a wide range of variables. This was intended to capture the heterogeneity in study designs, data sources, analytical techniques, and contextual factors. The form captured the following variables:
  • Author and Year
  • Study Design
  • Digital Data Sources
  • Comparison Method Employed
  • Geographical Scope of the Study
  • Techniques or Algorithms Used for Digital Signal Detection
  • Temporal Advantage in Detection
  • Reported Detection Indicators or Rates
  • Measures of Precision and Performance
  • Data Collection Period
  • Type of Disease or Outbreak
  • Specific Digital Platforms and Tools
  • Data Preprocessing Methods
  • Analytical Algorithms or Techniques
  • Statistical and Performance Metrics
  • Spatial Resolution and Temporal Granularity
  • Integration with Traditional Surveillance Systems
  • Keyword Selection Process
  • Measurement of Media Impact
  • Demographic and Usage Characteristics
Other key methodological characteristics for critical appraisal were assessed.
The PRISMA flow diagram summarizing the study selection process is presented in Figure 1. The diagram was generated using the PRISMA2020 R package via its online Shiny application [27] (https://estech.shinyapps.io/prisma_flowdiagram/, accessed 24 February 2025).

2.6. Statistical Analysis

A descriptive statistical analysis, including frequency counts and distributions of the extracted variables, was conducted using R software (version 4.3.0; accessed 15 March 2025). Foundational charts and plots were generated using R’s ggplot2 library, while more complex and customized visualizations, such as matrices and heatmaps, were created with Python’s matplotlib library (version 3.8.2; accessed 9 April 2025). Napkin (beta version; accessed 6 June 2025; https://www.napkin.ai/) was also used to create schematic and process diagrams.

3. Results

3.1. General Study Information

A total of 1009 records were identified from the various search sources. After removing 500 duplicates, 509 records remained for screening. Based on title and abstract review, 440 of these were excluded, leaving 69 reports for full-text retrieval. An additional 20 reports could not be retrieved, which left 49 articles to be assessed for eligibility. From these, a further 21 reports were excluded after full-text analysis (15 due to insufficient data and 6 for lack of relevance). This process resulted in 28 studies being included from the database search. One additional eligible study was identified by screening the reference lists of included articles, for a total of 29 studies in the final review (see Figure 1).
Studies on digital public health surveillance stand out for a remarkable methodological diversity, reflecting an evolution toward more sophisticated approaches. As shown in Table 1, the research includes quantitative comparative empirical analyses (e.g., [5]), observational studies [28], retrospective and cohort analyses [29], as well as exploratory studies [30]. This variety demonstrates how statistical and modeling techniques have progressively adapted to the growing availability of digital data, incorporating advanced methods to improve outbreak prediction and management.
Regarding the diseases studied, respiratory and viral illnesses clearly predominate, especially COVID-19 and influenza, owing to their epidemiological significance. The data-collection periods span two decades, from early studies on dengue beginning in 2003 [46] to recent research focused on COVID-19 [6,28,50]. The urgency of sudden outbreaks like COVID-19 or Zika often leads to studies with limited timeframes, whereas influenza research typically analyzes data across multiple seasons. Beyond the main respiratory viruses, other diseases such as Zika, MRSA, MERS, dengue, and cholera have also been examined. Broad studies such as that of Feldman et al. [42], which covers multiple pathologies, validate the general applicability of these methodologies.
Geographical diversity is also a key aspect (see Figure 2). There are studies with a local scope, such as Wittwer et al. [33] in Brazilian cities or Broniatowski et al. [51] in a Baltimore hospital. Other works have national approaches in countries like China, the United Kingdom, the United States, Italy, and India. Additionally, international investigations spanning multiple countries, such as those by Yan et al. [34] and Feldman et al. [42], highlight the scalability of these methods. This geographic breadth allows for the adaptation of methodologies to specific contexts, considering factors like internet connectivity and population density.

3.2. Data Sources and Digital Platforms

The analysis of the 29 included studies reveals a digital ecosystem dominated by two primary types of data sources: public web search engines and social media platforms, as detailed in Table 2 and visually summarized in Figure 3. Search engines, predominantly Google (used in 20 studies), offer high-volume, query-based data that are effective for tracking general interest in diseases like influenza or COVID-19. In contrast, social media platforms like X (12 studies) and Weibo provide richer, albeit noisier, contextual data, often used to analyze symptom self-reporting and public sentiment. A notable trend is the combination of these sources within a single study, a strategy used to balance the breadth of search data with the depth of social media content, as seen in the work of McGough et al. [32] and Santillana et al. [39].
It is important to note that the existing literature is heavily dominated by studies using Google and X. A significant gap exists regarding the use of other globally popular platforms such as Facebook, TikTok, or messaging services like Telegram and WhatsApp, likely due to data access restrictions. This overrepresentation limits the generalizability of the current findings and highlights a key area for future research.
Beyond these, specialized public health platforms such as HealthMap and ProMED-mail serve as crucial aggregators of news and official reports, frequently used to validate signals from other digital sources. Emerging sources are also gaining traction; mobility data from platforms like Apple Mobility and sensor data from smart thermometers were particularly leveraged in COVID-19 studies to correlate population movement and fever trends with case data (see Figure 3).
A near-universal theme across all studies is the validation of digital signals against traditional surveillance systems from institutions like the CDC, WHO, or national health ministries. This pattern underscores that these digital tools are used almost exclusively to complement traditional epidemiological reporting, by providing early warnings and real-time trends, rather than to replace it.

3.3. Methods and Analytical Techniques

The methodological workflow across the reviewed studies, detailed in Table 3, follows a consistent pattern of data collection, preprocessing, signal detection, and analysis, as outlined in Figure 4. In the preprocessing stage, a clear distinction emerges based on the data source. Studies using time series data from search engines frequently apply smoothing techniques like moving averages to reduce noise. In contrast, those analyzing unstructured social media text employ more complex natural language processing techniques, including stop word removal, stemming, and lemmatization to extract meaningful signals.
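Both preprocessing styles described above can be sketched in a few lines of Python; the stop-word list and window length are illustrative choices, not drawn from any particular study:

```python
import re

STOP_WORDS = {"the", "a", "i", "have", "so"}  # illustrative, not a standard list

def moving_average(series, window=3):
    """Smooth a query-volume time series with a trailing moving average,
    reducing day-to-day noise before trend analysis."""
    return [sum(series[max(0, t - window + 1):t + 1]) /
            len(series[max(0, t - window + 1):t + 1])
            for t in range(len(series))]

def preprocess_post(text):
    """Lowercase a social media post, strip punctuation, and remove
    stop words, leaving only candidate symptom-related tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(moving_average([10, 20, 30, 40]))              # → [10.0, 15.0, 20.0, 30.0]
print(preprocess_post("I have a fever, so tired!"))  # → ['fever', 'tired']
```

Stemming and lemmatization, mentioned above, would be applied after tokenization; they require a linguistic resource (e.g., an NLP library) and are omitted from this minimal sketch.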
The methodological flow begins with data collection from a wide array of digital sources, such as search engines, social media, and specialized surveillance platforms, to capture early epidemiological signals (see Figure 4). These raw data then undergo a critical preprocessing phase, where techniques like cleaning, normalization, and smoothing are applied. This process, which often includes content classification to filter out digital noise, is essential for ensuring the quality and reliability of the data before analysis. Subsequently, the analytical stages first focus on detecting predictive patterns and time lags against official reports, using methods such as correlation and causality analysis. Building on this foundation, a varied set of analytical techniques is applied for robust modeling. These range from statistical models like linear regression and ARIMA to advanced machine learning approaches, including supervised regressions, ensemble models, and Bayesian methods, enabling a precise analysis of the captured epidemiological signals.
The analytical landscape, visually represented in the matrix in Figure 5, shows an evolution in methodological complexity. While traditional statistical methods like linear regression and correlation analysis remain foundational for validation in many studies [5,29], there is a clear trend toward more sophisticated machine learning techniques. For instance, studies tackling complex, multi-source data often employ supervised regression models like LASSO, ensemble methods like AdaBoost, or advanced classifiers such as support vector machines (SVMs) [39,42]. Similarly, time series forecasting has advanced from standard autoregressive models to more robust ARIMA and ARIMAX models to improve predictive accuracy [10,49].
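As a minimal sketch of the foundational regression approach, mapping a single digital predictor (e.g., weekly search volume) to official case counts reduces to ordinary least squares; the data below are synthetic and for illustration only:

```python
def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x with one digital predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) /
         sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Synthetic weekly search volume vs. officially reported cases
search = [10, 20, 30, 40, 50]
cases = [25, 45, 65, 85, 105]
a, b = fit_ols(search, cases)
print(a, b)  # → 5.0 2.0
```

The LASSO, ensemble, and ARIMA/ARIMAX methods cited above extend this baseline, respectively, with regularization across many query terms and with explicit autoregressive structure over time.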

3.4. Performance and Early Detection

The primary value of digital surveillance lies in its performance, specifically its ability to provide early warnings with high accuracy, as detailed in Table 4. The reported lead time varies significantly, from a few days to several weeks ahead of official reports.
While many studies report advantages of 1–3 weeks, exceptional cases exist, such as Feldman et al. [42], who detected outbreaks in news media an average of 43 days before official alerts. However, the practical relevance of such long lead times must be contextualized; for rapidly spreading diseases with short incubation periods like COVID-19, even a few days of early warning can be more impactful for triggering immediate public health responses. A comparative overview of these performance metrics across studies is presented in Figure 6.
Digital surveillance systems generally demonstrate high precision, with numerous studies reporting strong correlation coefficients (often r > 0.8), robust classification metrics (e.g., high sensitivity, specificity, and F1-scores), and low predictive errors (such as RMSE and MAE), as shown in Table 4. However, the trade-off between lead time and precision illustrated in Figure 7 deserves attention. Studies based solely on social media data (red points) sometimes achieve the longest lead times but exhibit considerable variability in precision. In contrast, those leveraging web search data (blue points) tend to maintain consistently high correlation with official reports (r > 0.8), though with more moderate lead times. Studies that integrate multiple data sources (yellow points), such as Santillana et al. [39], frequently strike a strategic balance, combining early detection with strong precision. This highlights the potential of multi-source approaches to optimize system performance. Conversely, studies like that of Alessa and Faezipour [48], which achieve high precision but short lead times, are well-suited for nowcasting applications rather than early warning.
The X-axis represents the lead time in days (temporal advantage of digital surveillance methods over official reporting). The Y-axis shows the detection rate as a percentage, reflecting the effectiveness of digital surveillance methods in terms of sensitivity, correlation coefficients, or detection accuracy. Each data point represents a study, color-coded by data source: web search (blue), social media (red), health data (green), and combined sources (yellow). Key studies are labeled for reference. The dashed line indicates the overall trend across all studies (r = 0.20, not significant, p = 0.32), suggesting no strong relationship between lead time and detection effectiveness in digital health surveillance systems. Note: This figure was generated using the matplotlib library in Python.
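Lead times such as those compared above are commonly estimated by shifting the digital series against official counts and selecting the lag with the highest correlation. A pure-Python sketch with synthetic data (the series and maximum lag are illustrative):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def best_lead(digital, official, max_lag=4):
    """Return the shift (in time steps) of the digital series that best
    correlates with official counts; a positive lag means the digital
    signal leads official reporting by that many steps."""
    n = len(digital)
    return max(range(max_lag + 1),
               key=lambda lag: pearson(digital[:n - lag], official[lag:]))

# Synthetic epidemic curve; the digital signal peaks two steps earlier
official = [1, 2, 4, 8, 16, 8, 4, 2]
digital = [4, 8, 16, 8, 4, 2, 1, 1]
print(best_lead(digital, official))  # → 2
```

With daily data, the returned lag corresponds directly to a lead time in days; with weekly surveillance data, each step represents one week.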

3.5. Complementary and Specific Aspects

A critical analysis of the studies’ contextual variables, summarized in Table 5 and the Figure 8 heatmap, reveals uneven methodological rigor across the literature. While most studies adequately address spatial and temporal resolution and keyword selection, significant gaps emerge when comparing how they handle external biases.
For instance, regarding spatial resolution, there is a wide range from hyper-local analyses of a single hospital [49] to global surveillance systems [42], a factor largely dictated by data availability. Similarly, the keyword selection process varies from manual, expert-defined lists to automated, correlation-based methods like that used by Verma et al. [38].
In contrast to the detailed reporting on these aspects, the measurement of media impact and the inclusion of demographics represent significant weak points. Only a minority of studies, such as Lampos et al. [28] and Kogan et al. [6], explicitly attempt to model or mitigate the confounding effects of media coverage. Even more sparse is the consideration of demographic biases; few studies outside of those using clinic-level data [29] stratified their analysis by age, gender, or socioeconomic status. This methodological asymmetry, clearly visualized in Figure 8, suggests that while digital surveillance is technically advancing, its findings may be biased by underestimating the human and communication factors that shape digital data.
In this heat map, the level of detail with which each study addresses four fundamental aspects is shown: the spatial and temporal resolution of the data (Resolution), the keyword selection process (Selection), the measurement of media impact (Measurement), and the consideration of demographic or usage variables (Demographics). The color indicates a range from “absent” (0) to “present” (2), allowing an at-a-glance view of each study’s methodological strengths and possible gaps. Note: This figure was generated using the matplotlib library in Python.

4. Discussion

4.1. Main Findings and Methodological Evolution

The diverse findings of this review can be best understood when organized into a conceptual framework that illustrates the complete pipeline of digital epidemiological surveillance, from data collection to public health application (see Figure 9). This framework helps structure the remarkable methodological diversity observed in the literature, which ranges from comparative empirical analyses to advanced research based on machine learning. It also accounts for the clear evolution toward increasingly sophisticated approaches over a long period, from pioneering research on dengue in 2003 to recent analyses on emerging threats like COVID-19.
By conceptualizing digital surveillance as a modern extension of traditional syndromic surveillance operating within the field of infodemiology, we can effectively analyze how these methods have been adapted across various geographical contexts and pathogens. The following discussion will therefore analyze our main findings through the key stages of this framework: (1) Data Collection from Digital Sources, (2) Analysis and Performance Evaluation, and (3) Integration with Public Health Systems.

4.1.1. Data Sources and Collection: The Core of Digital Syndromic Surveillance

The first stage of the framework involves gathering data from a diverse digital ecosystem, where search engines and social media are the dominant sources. A recurrent characteristic is the use of search engine data, primarily from Google Trends, often combined with social media platforms like X. These sources function as proxies for health-seeking behaviors and symptom self-reporting, forming the basis of a digital approach to syndromic surveillance. For example, studies have monitored search queries for terms like “pneumonia” or analyzed “sick posts” on platforms like Weibo to detect early signals. The high volume of search queries allows for the detection of broad trends with high correlation to official data, while the unstructured nature of social media provides richer, though often noisier, contextual data that enables deeper analysis like sentiment classification. The adaptability of these methods has been validated across a wide range of pathogens, including influenza, COVID-19, Zika, MERS, and dengue.

4.1.2. Analysis and Performance: From Raw Data to Actionable Signals

Once collected, the raw digital data undergo significant analysis to transform it into actionable public health signals. This review reveals a clear evolution toward increasingly sophisticated analytical approaches, incorporating innovations such as transfer learning, Bayesian analysis, and time series modeling. For instance, studies have leveraged transfer learning to adapt models to new contexts, used Bayesian methods to identify change points in time series, and employed robust ARIMA models for forecasting. These modern techniques facilitate a deeper understanding of the temporal and causal relationships between digital signals and official data.
One of the most notable findings is the ability of these methods to detect trends far ahead of traditional systems. Studies indicate a significant time advantage, with alerts sometimes preceding official notifications by more than a month, as seen in the work of Feldman et al. [42], who reported an average lead time of 43 days. This predictive accuracy is consistently validated through high correlation coefficients (e.g., r > 0.9 in multiple influenza studies) and significant reductions in predictive errors, with low RMSE and MAE values being common metrics of success. However, performance is not uniform; precision and lead time depend heavily on the data source, the analytical method employed, and the specific characteristics of the disease and region.

4.1.3. Public Health Integration and Challenges

The final stage of the framework is the integration of these digital insights with traditional public health systems. A key finding is that nearly all reviewed studies validate their digital data against official reports from recognized institutions like the CDC, WHO, or national health ministries, reinforcing that these tools currently serve as a complement to, not a replacement for, conventional surveillance.
However, this integration faces significant challenges rooted in the field of infodemiology, particularly regarding data quality and representativeness. Several studies explicitly recognize the distorting effects of media noise and panic. For instance, major public announcements or sensationalized news can trigger waves of searches from the “worried well”, creating massive signal spikes that are unrelated to actual case counts. This challenge is increasingly amplified by modern artificial intelligence, where recommendation algorithms can create filter bubbles or “echo chambers”, and generative AI can produce convincing fake news that triggers artificial spikes in digital data, making it difficult to separate from genuine public health events [52,53,54].
Furthermore, potential biases arising from the digital divide can affect the reliability and equity of these systems. This means that populations with lower internet access or different online behaviors, often rural, elderly, or low-income communities, are systematically underrepresented in the data. Consequently, a model trained on this biased data may fail to detect localized outbreaks in these vulnerable groups, compromising health equity, a limitation noted in studies analyzing regions with variable internet penetration. Addressing these challenges is crucial for moving digital surveillance from a set of promising research tools to a fully integrated and reliable component of modern public health response [16,55].

4.2. Comparison with Previous Literature

The findings of this review concur with previous research highlighting the utility of digital surveillance. However, to provide a more actionable analysis for researchers and policymakers, the approaches identified in the literature can be classified according to their primary use-case and the resource setting in which they are applied, as summarized in Table 6.
A key distinction emerges between real-time monitoring tools and retrospective models. Real-time “nowcasting” approaches often leverage live data streams from platforms like X to provide immediate situational awareness, which is invaluable during fast-moving outbreaks like influenza or cholera. In contrast, retrospective models frequently use historical data archives, such as Google Trends, to analyze the dynamics of past epidemics (e.g., Zika, Dengue) or to build and validate predictive forecasting models.
Furthermore, the choice of methodology is heavily influenced by the resource context. Studies conducted in high-resource settings often demonstrate the capacity to integrate multiple, complex data streams, including social media, search queries, and even electronic health records, using sophisticated machine learning models. Conversely, research in low-resource settings provides valuable insights into how to maximize the utility of single, highly accessible data sources like Google Trends, often employing more straightforward but effective statistical models.
Beyond classifying the primary studies included in our review, it is also useful to position our work within the context of other recent syntheses in the field. Table 7 provides a direct comparison with several key reviews and pioneering studies, highlighting our unique contribution in terms of methodological scope and practical application.
For example, the present review (2025) evaluates the use of social media and digital sources for the early detection of infectious disease outbreaks using retrospective and predictive analyses, machine learning, correlations, and time series analysis drawing on X, Google Trends, health forums, news databases, and epidemiological records. It demonstrates early outbreak detection several weeks in advance, with high correlations to official reports; however, it also notes variability in data representativeness depending on the region and digital access.
In contrast, Al-Kenane et al. (2024) [56] examine the relationship between Google Trends and governmental response in Kuwait using time series analysis, Pearson correlations, and bootstrap techniques. They report a high correlation (R ≈ 0.71) and note anticipatory changes in policies, but with a limited geographical scope. Other studies, such as Melo et al. (2024) [57], Jia et al. (2023) [58], Zhao et al. (2021) [18], and Salathé et al. (2012) [59], are also discussed in relation to their objectives, methodologies, and findings, highlighting innovative approaches, challenges in generalization, and the evolution of digital epidemiology.
Moreover, Melo et al. (2024) [57] share objectives similar to those of this study, as they also evaluate multiple digital tools, such as Google Trends, X, and mobile applications, in arbovirus surveillance. Their comparative review, based on statistical methods such as ANOVA and correlations, corroborates the advantages identified in early detection and predictive precision. However, Melo et al. [57] also emphasize the high methodological variability among studies, a challenge echoed in the current research, stressing the need for greater standardization in the field.
The work by Jia et al. (2023) [58] provides a complementary perspective by focusing on advanced technological innovations such as artificial intelligence, GIS, and digital twins, demonstrating significant improvements in accuracy and real-time detection. Although it does not concentrate exclusively on arboviruses, this study broadens the context of digital surveillance, highlighting the role of smart devices and technological evolution in enhancing epidemiological response. Nonetheless, it acknowledges methodological heterogeneity as a persistent barrier, similar to our findings.
In addition, Zhao et al. (2021) [18] contribute a critical and complementary dimension by analyzing ethical aspects related to digital surveillance in public health. Their approach diverges from operational considerations to address crucial issues such as privacy and civil rights protection, factors essential for ensuring the social acceptance and long-term sustainability of these systems. Although their analysis does not include concrete operational metrics, it emphasizes the importance of balancing technical efficiency with ethical responsibility, a consideration indirectly recognized in our study through the impact of media and digital context.
The pioneering work by Salathé et al. (2012) [59] laid the conceptual foundations of digital epidemiology by highlighting the initial potential of social media and Big Data to reduce outbreak detection times. Even though it lacks the detailed metrics provided by more recent studies, its foundational contribution helps explain how the field has evolved toward more rigorous and quantitative methodological approaches, as exemplified in the present study.
When comparing the current study with that of Shakeri et al. (2021) [60], broad methodological and temporal diversity in the analyzed approaches is evident, ranging from early research on diseases such as dengue to recent studies focused on COVID-19. Notably, there is an increasing use of advanced techniques such as transfer learning, Bayesian analysis, and time series models, especially in the context of respiratory and viral diseases. Moreover, the importance of adapting these methodologies to specific regional contexts and effectively integrating digital sources (such as Google Trends and social networks) with traditional epidemiological surveillance systems is underscored.
The scoping review by Shakeri et al. (2021) [60] provides a broader and more detailed perspective on the use of digital platforms in public health surveillance, covering not only infectious diseases but also areas such as mental health and chronic conditions. This study also highlights significant limitations, such as methodological biases arising from keyword selection and the limited practical application of the results in concrete public health actions. Both studies agree on the need for continuous methodological improvement and stronger integration of digital data with traditional systems, which is vital to maximize the impact of public health interventions. In this regard, it is important to emphasize that the complementarity between these analytical approaches and traditional surveillance enhances the response capacity to health emergencies. This implies not only technological advancement but also the consideration of sociocultural and operational factors that facilitate greater effectiveness in public health interventions derived from digital surveillance.
To provide a more actionable analysis for researchers and policymakers, it is useful to compare the analytical techniques identified in this review. As summarized in Table 8, the choice of method depends on the specific research question, data characteristics, and the desired balance between interpretability and predictive power. Simpler methods like correlation and linear regression offer transparency and are ideal for initial validation, confirming whether a digital data source shows a basic relationship with official case counts. In contrast, more sophisticated approaches like supervised ML are better suited for integrating complex, multi-source data streams to achieve higher predictive accuracy, though often at the cost of interpretability.
Each technique serves a distinct purpose within the digital surveillance toolkit. While time series models like ARIMA are specialized for forecasting diseases with clear seasonal patterns, they may struggle with the unpredictability of novel outbreaks. This is where supervised ML excels, offering the flexibility to model complex interactions from heterogeneous inputs like search queries, mobility data, and social media. NLP is indispensable for unlocking insights from unstructured text, allowing for real-time sentiment analysis and symptom mining from platforms like X. Finally, Bayesian methods offer a crucial advantage by quantifying the uncertainty in predictions, which provides a probabilistic framework to support robust public health decision-making, such as identifying the precise onset of an outbreak.
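To make the forecasting discussion concrete, the sketch below fits only the autoregressive (AR) core of an ARIMA-type model by ordinary least squares and iterates it forward; a full ARIMA implementation would add differencing and moving-average terms. The function names and data are illustrative assumptions, not code from any reviewed study:

```python
import numpy as np

def fit_ar(series, p=3):
    """Fit an AR(p) model y[t] = phi_1*y[t-1] + ... + phi_p*y[t-p] + c
    by ordinary least squares; returns [phi_1, ..., phi_p, c]."""
    y = np.asarray(series, float)
    # Each row holds the p most recent lags (newest first) plus an intercept.
    X = np.array([list(y[t - p:t][::-1]) + [1.0] for t in range(p, len(y))])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def forecast(series, coef, steps=7):
    """Roll the fitted AR model forward, feeding each prediction back in."""
    p = len(coef) - 1
    hist = [float(v) for v in series]
    out = []
    for _ in range(steps):
        lags = hist[-p:][::-1]  # [y[t-1], ..., y[t-p]]
        out.append(float(np.dot(coef[:p], lags) + coef[p]))
        hist.append(out[-1])
    return out
```

Such a model captures the short-range autocorrelation that makes seasonal diseases forecastable, while its limits with novel outbreaks motivate the machine learning alternatives discussed above.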
In summary, by explicitly comparing our approach with that of other studies, it is evident that:
  • Methodological Novelty: Our study integrates multiple digital sources and applies advanced machine learning techniques and time series modeling, thereby surpassing the limited scope of research focused on specific geographical or technological contexts.
  • Practical Application: Systematic validation against official data and the consideration of media impact and demographic variables reinforce the operational utility of our approach, enabling early outbreak alerts with advantages of up to several weeks.
  • Contextualization of Limitations: Whereas previous studies point out limitations in generalization and methodological heterogeneity, our work links these shortcomings with previous empirical evidence and proposes specific strategies (such as improved keyword selection and multi-source integration) to overcome them in future research.
This explicit comparison highlights how our approach contributes to the evolution of digital public health surveillance by offering a more comprehensive methodology that is adaptable to diverse epidemiological and operational contexts.

4.3. Methodological Considerations

The analysis of the methods and techniques used in digital surveillance reveals a shared methodological flow, spanning from data collection to evaluation, and emphasizes how the incorporation of advanced techniques substantially improves early outbreak detection. In the initial phase, data are collected from diverse digital sources, such as search engines, social networks, specialized platforms, and mobility data, followed by rigorous preprocessing, including normalization and smoothing, to ensure clear, high-quality epidemiological signals.

4.3.1. Discussion of Specific Techniques

One of the most innovative methodological aspects is the use of transfer learning and Bayesian analysis. For example, some studies (such as Lampos et al., 2021 [28], described in Table 3) have implemented unsupervised models combined with transfer learning techniques. These methods allow for leveraging information previously learned from large volumes of data, thereby facilitating the detection of subtle patterns in new contexts and enhancing predictive capacity in scenarios with limited data. Similarly, the use of Bayesian algorithms, as employed by authors such as Sharpe et al. (2016) [43] and Yan et al. (2017) [34], has proven effective in identifying change points in time series. These algorithms allow for more precise modeling of uncertainty and have, on occasion, resulted in improved correlation with official data, translating into more reliable early alerts.
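The change-point idea can be sketched with a deliberately simplified single-change-point Poisson model: a uniform prior over the change day and maximum-likelihood plug-in rates on each side of the candidate split. This is an illustration of the principle, not the exact algorithms of Sharpe et al. [43] or Yan et al. [34]:

```python
import math
import numpy as np

def change_point_posterior(counts):
    """Posterior over the day a Poisson rate changes, with a uniform
    prior on the change point and plug-in rate estimates per segment."""
    y = np.asarray(counts, float)
    n = len(y)
    lg = np.array([math.lgamma(v + 1) for v in y])  # log(y!)
    logpost = np.full(n, -np.inf)
    for tau in range(1, n - 1):
        lam1 = max(y[:tau].mean(), 1e-9)   # rate before the change
        lam2 = max(y[tau:].mean(), 1e-9)   # rate after the change
        ll = (y[:tau] * math.log(lam1) - lam1 - lg[:tau]).sum() \
           + (y[tau:] * math.log(lam2) - lam2 - lg[tau:]).sum()
        logpost[tau] = ll
    logpost -= logpost.max()               # stabilize before exponentiating
    post = np.exp(logpost)
    return post / post.sum()
```

Given a daily count series that jumps from a low to a high rate, the posterior concentrates on the jump day, which is precisely the "early alert" signal these studies exploit, while its spread quantifies the uncertainty of that alert.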

4.3.2. Clarity in the Integration Process

The integration of digital data with official sources is another fundamental pillar that boosts the predictive accuracy of digital surveillance. For example, the study by Timpka et al. (2014) [29] combined signals from Google Flu Trends with clinical and laboratory data, achieving correlation coefficients as high as 0.96, which demonstrates the synergy between the two types of data. This integration not only reinforces the validity of predictive models but also mitigates the biases inherent in relying solely on digital data. An additional example is found in the work by Yousefinaghani et al. (2021) [26], where signals from the X API and Google Trends were used simultaneously and contrasted with official reports (such as the Johns Hopkins COVID-19 data), resulting in a notable improvement in prediction accuracy, with reduced error metrics (RMSE and MAE) and high correlation coefficients (above 0.75).
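The benefit of multi-source integration can be illustrated with a toy regression: pooling two digital streams as predictors of official counts cannot worsen, and typically lowers, the in-sample error relative to either stream alone. All names and data below are hypothetical:

```python
import numpy as np

def rmse(pred, y):
    """Root mean square error between predictions and observations."""
    pred, y = np.asarray(pred, float), np.asarray(y, float)
    return float(np.sqrt(np.mean((pred - y) ** 2)))

def ols_fit_predict(X, y):
    """Fit ordinary least squares (with intercept) and return fitted values."""
    X1 = np.column_stack([X, np.ones(len(X))])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ beta
```

Comparing `rmse(ols_fit_predict(searches_and_tweets, official), official)` against the single-source equivalent reproduces, in miniature, the error reductions reported when search and social media signals are combined.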

4.3.3. Keyword Selection and Management of Media Impact

Similarly, the proper selection of keywords, implemented through methods ranging from manual identification to automated tools such as Google Correlate, is critical for extracting relevant signals and minimizing digital noise. Specific strategies, as described in studies like those by Shakeri Hossein Abad et al. (2021) [60], have enabled the reduction of interference from media overexposure by adjusting predictions based on variations in informational attention, while ensuring the representativeness of the signals. This approach contributes to a better interpretation of the data, ultimately optimizing early detection and operational response in public health.
The combination of advanced techniques (such as transfer learning and Bayesian analysis) with a rigorous process that integrates digital data with official sources translates into significant improvements in outbreak prediction. The concrete examples provided in Table 3 support the notion that these strategies not only increase early alert capacity (with advantages ranging from several days to weeks) but also reduce predictive errors, positioning these methods as essential complementary tools in epidemiological surveillance.

4.4. Study Limitations

Despite the significant advances in digital surveillance presented by the analyzed studies, important methodological limitations deserve a more in-depth discussion. First, there is notable heterogeneity in spatial resolution and temporal granularity, ranging from very local scales (such as municipal level) to global analyses. This variability considerably complicates the comparability and generalization of results, as epidemiological signals may exhibit different patterns depending on the geographic and temporal context. Therefore, it would be necessary to establish minimum standards that allow for more robust comparative analyses, thereby facilitating the interpretation of results across diverse studies [13,60,61].
Another critical point is the selection of keywords, a fundamental procedure for ensuring the quality and representativeness of digital signals. Currently, many studies rely on manual or semi-automated methodologies for determining these keywords, which may not fully capture the complexity and evolution of digital language, especially in heterogeneous cultural or linguistic contexts. This could result in biases in the early identification of outbreaks, potentially underestimating or overestimating certain terms based on subjective criteria or technical limitations. Therefore, moving toward more sophisticated methods, such as machine learning-based language models, could allow for a more dynamic, adaptive, and precise selection of key terms [42,60,62].
Another substantial challenge lies in the insufficient consideration of demographic and technological variables in most studies. The lack of detailed analyses regarding the demographic composition of digital users, including factors such as age, gender, education, socioeconomic status, and location, can limit the representativeness of the obtained epidemiological signals. Moreover, the digital divide, marked by disparities in access to and use of technology between urban and rural areas or between countries with different socioeconomic levels, can significantly bias the results, hindering their generalization and universal applicability. Thus, integrating detailed demographic analyses and studies on technological penetration would allow for a better interpretation of the data and tailored conclusions for specific realities [63,64].
Although most research validates its findings through comparison with official data, the systematic integration of these digital methodologies with traditional epidemiological surveillance systems remains limited. There is still a need to develop clear and standardized protocols that facilitate the effective and transparent combination of both data sources. In this regard, internationally coordinated multicenter studies could provide the necessary foundation to establish consensus standards that ensure external validity and allow for reliable and generalized application of the results on a global scale [16,60].

4.5. Clinical Implications and Recommendations for Future Research

Digital surveillance presents itself as a promising complement to traditional epidemiological surveillance systems by enabling the early detection of outbreaks and near real-time analysis of epidemiological trends. However, to fully exploit this potential and overcome the current challenges, it is necessary to implement standardized protocols and more robust integration strategies, combined with an interdisciplinary approach that brings together technical experts, epidemiologists, and social scientists [1,65].

4.5.1. Clear and Actionable Recommendations

Standardization of Protocols: It is imperative to develop multicenter studies that use uniform methods in all phases of the analysis. This would include the adoption of standardized protocols for keyword selection, whether through manual or automated methods, and for data preprocessing (normalization, smoothing, filtering). For example, common guidelines could be established that integrate similar validation metrics to those employed in the works of Brancato et al. (2024) [66] or Jacobson et al. (2024) [67], thus facilitating the comparison of results across different studies and geographical contexts.
Systematic Integration of Demographic Analysis: It is recommended to systematically incorporate geographical, socioeconomic, and demographic variables into predictive models. This approach would improve the representativeness and accuracy of epidemiological signals, reducing biases arising from regional variations in access to and use of digital technologies. To achieve this, future research should explore advanced statistical techniques such as model weighting or post-stratification, where digital data are adjusted against census data to create more nationally representative samples. Furthermore, presenting results stratified by geographic or demographic groups, rather than as a single aggregate figure, would provide more granular and equitable public health insights [60,64].
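A post-stratification adjustment of the kind recommended here can be sketched in a few lines: stratum-level signal rates observed in a digital sample are reweighted from the sample's stratum shares to census shares. The strata and numbers in the usage note are invented for illustration:

```python
def poststratify(sample_rates, sample_shares, census_shares):
    """Compare a naive digital-sample estimate with a post-stratified one.

    sample_rates  -- signal rate observed within each stratum
    sample_shares -- share of each stratum in the digital sample
    census_shares -- share of each stratum in the target population
    """
    naive = sum(r * s for r, s in zip(sample_rates, sample_shares))
    adjusted = sum(r * c for r, c in zip(sample_rates, census_shares))
    return naive, adjusted
```

For instance, with urban/rural rates of 0.10 and 0.30, a sample that is 90% urban, and a census that is 60% urban, `poststratify([0.10, 0.30], [0.9, 0.1], [0.6, 0.4])` yields a naive estimate of 0.12 but an adjusted estimate of 0.18, showing how an over-sampled urban population can mask signal concentrated in under-connected rural strata.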
  • Use of Emerging Technologies: The adoption of artificial intelligence tools and advanced techniques, such as metagenomic analysis, could optimize outbreak detection. It is recommended to implement automated classification and filtering techniques to analyze large volumes of data, thereby increasing the sensitivity and predictive capacity of digital systems, particularly when combined with traditional surveillance sources [68].
  • Mitigation of Media-Driven Noise: To counteract the effects of the infodemic, models should be designed to distinguish between general “chatter” and true symptom-related signals. This could involve multi-stream analysis that compares symptom searches against news trends or the integration of data sources less susceptible to media influence, such as participatory surveillance systems.
  • Longitudinal Evaluations: It is suggested to conduct long-term follow-up studies that evaluate the efficacy, stability, and cost-effectiveness of digital systems in various epidemiological contexts. This approach would not only provide robust evidence on the sustainability of digital surveillance but also help refine and improve predictive models over time.
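The media-noise mitigation idea above, comparing symptom searches against news trends, can be sketched as a simple regression adjustment: the component of search volume explained by news coverage is removed, and the residual is kept as the candidate epidemiological signal. This is a heuristic illustration under assumed data, not a method taken from the reviewed studies:

```python
import numpy as np

def media_adjusted(searches, news):
    """Regress search volume on news-coverage volume and return the
    residual, i.e., the search activity not explained by media attention."""
    searches = np.asarray(searches, float)
    news = np.asarray(news, float)
    X = np.column_stack([news, np.ones(len(news))])
    beta, *_ = np.linalg.lstsq(X, searches, rcond=None)
    return searches - X @ beta
```

When a search spike is driven purely by a news cycle, the regression absorbs it and the residual stays flat; a genuine outbreak component uncorrelated with coverage survives the adjustment.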

4.5.2. Interdisciplinary Perspective

To address the complexity of digital analysis in public health, it is essential to foster collaboration among technical experts, epidemiologists, and social scientists. Such interdisciplinary collaboration would allow for the following:
  • Integrating Technical and Contextual Knowledge: Technology specialists can optimize algorithms and predictive models, while epidemiologists contribute their understanding of disease dynamics and social scientists provide insights into cultural, demographic, and behavioral factors that are essential for interpreting digital signals.
  • Designing Adapted and Equitable Interventions: This collaborative approach will facilitate the design of public health interventions that are both precise and adapted to local realities, maximizing the impact on outbreak prevention and control.
  • Developing Holistic Solutions: By combining skills and knowledge from various disciplines, it is possible to develop comprehensive solutions that address both operational and ethical aspects, ensuring that digital surveillance is implemented responsibly and with high standards of effectiveness.
Although current evidence widely supports the predictive capacity of digital tools in epidemiological surveillance, methodological and operational challenges persist that hinder their clinical generalization. Therefore, it is recommended to adopt an interdisciplinary and collaborative approach that combines advanced technological innovations with rigorous methodological designs, thereby strengthening the public health system’s response capacity in the face of epidemiological emergencies.

4.6. From Prediction to Action: Early Warning Mechanisms and Their Impact

While the title of this review emphasizes “early warning”, a crucial distinction must be made between a model’s predictive accuracy and its function as a true warning mechanism. The value of digital surveillance lies not just in forecasting trends, but in its ability to translate those forecasts into timely, actionable alerts for public health officials. Our review identified three primary mechanisms through which the included studies operationalize these warnings.
The first and most straightforward mechanism is based on anomaly detection. These systems function by establishing a baseline of normal digital activity and triggering an alert when data deviates significantly from this pattern. For example, studies have monitored search volumes for terms like “pneumonia” and issued a warning upon detecting an abnormal spike, suggesting a potential outbreak before official case counts rise [5,44]. This approach is effective for capturing sudden changes but can be sensitive to media-driven noise.
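A minimal version of such an anomaly detector, assuming a rolling baseline and a z-score trigger (the window and threshold values are illustrative choices, not parameters from the cited studies):

```python
import numpy as np

def anomaly_alert(series, window=28, z_thresh=3.0):
    """Return the indices of days whose value exceeds the rolling-baseline
    mean of the preceding `window` days by more than `z_thresh` SDs."""
    y = np.asarray(series, float)
    alerts = []
    for t in range(window, len(y)):
        base = y[t - window:t]
        mu, sd = base.mean(), base.std()
        if sd > 0 and (y[t] - mu) / sd > z_thresh:
            alerts.append(t)
    return alerts
```

Fed a daily volume of "pneumonia" searches, such a detector fires on the first day the volume breaks sharply above its recent baseline, which is also why, as noted above, it fires just as readily on media-driven spikes unless those are filtered out.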
A second, more structured mechanism relies on pre-defined thresholds. These approaches often adapt established epidemiological methods, such as the Moving Epidemic Method, to digital data streams. A warning is issued only when the signal (e.g., search interest for RSV) crosses a statistically defined intensity threshold, providing a more robust and less arbitrary alert system [30,47].
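A simplified, MEM-inspired threshold rule might look as follows; the actual Moving Epidemic Method derives thresholds from modeled epidemic-period start values across seasons, so this percentile-plus-persistence sketch only approximates the idea:

```python
import numpy as np

def epidemic_threshold(past_seasons, pct=90):
    """A crude intensity threshold: the pct-th percentile of signal
    values pooled across past (non-epidemic) seasons."""
    pooled = np.concatenate([np.asarray(s, float) for s in past_seasons])
    return float(np.percentile(pooled, pct))

def first_sustained_crossing(current, threshold, sustain=2):
    """Warn only after `sustain` consecutive periods above the threshold,
    which suppresses one-off spikes; returns the warning index or None."""
    run = 0
    for t, v in enumerate(current):
        run = run + 1 if v > threshold else 0
        if run >= sustain:
            return t
    return None
```

Requiring a sustained crossing rather than a single exceedance is what makes threshold-based warnings "more robust and less arbitrary" than raw anomaly detection.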
The third mechanism involves supervised classification. Here, machine learning models are trained to identify and categorize specific infection-related content, such as tweets describing symptoms or social media posts from individuals who self-identify as ill. An early warning is then triggered when the volume or proportion of this classified content surpasses a certain frequency, effectively creating a real-time cohort of “digital cases” [41,49,50].
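The supervised-classification mechanism can be illustrated with a toy multinomial naive Bayes model separating self-reported symptom posts from general chatter; the reviewed studies used far larger training sets and stronger models, so this is purely a didactic stand-in:

```python
import math
from collections import Counter

class SymptomPostClassifier:
    """Minimal bag-of-words multinomial naive Bayes with Laplace smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / len(labels) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for text, lab in zip(texts, labels):
            self.counts[lab].update(text.lower().split())
        self.vocab = {w for c in self.counts.values() for w in c}
        self.total = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        words = text.lower().split()
        best, best_lp = None, -math.inf
        for c in self.classes:
            lp = math.log(self.prior[c])
            for w in words:
                # Laplace-smoothed per-class word likelihood.
                lp += math.log((self.counts[c][w] + 1) /
                               (self.total[c] + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best
```

Tracking the daily count of posts the classifier labels as symptom reports yields the "digital cases" series whose surge can trigger an early warning.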
However, a critical finding of this review, and a key limitation of the current field, is the profound gap between demonstrating a model’s predictive accuracy and evaluating its real-world impact on public health response. While nearly all studies validate their models against official case data, we found a scarcity of research that measures whether the “early warnings” generated by these digital systems led to concrete actions (e.g., increased local testing, targeted public health messaging, resource allocation) or ultimately altered the trajectory of an outbreak. This evidence gap highlights the critical next step for the field: moving beyond prediction and toward a formal evaluation of translational impact.

5. Conclusions

The evidence gathered in this review demonstrates that the integration of diverse digital sources, such as search engines, social networks, and specialized databases, enables the anticipation of infectious disease outbreaks with a considerable time advantage over traditional surveillance systems. These methods, which employ advanced statistical analysis and machine learning techniques, consistently achieve high correlation and precision when compared with official data.
Nonetheless, challenges persist due to heterogeneity in spatial and temporal resolution, inconsistencies in keyword selection and processing, varying strategies for managing media noise, and the limited incorporation of demographic and behavioral data. These factors contribute to bias and limit the clinical generalizability and external validity of current approaches.
To address these limitations, actionable steps must be taken. First, methodological protocols should be standardized across studies, particularly in the selection and preprocessing of keywords. This could involve the use of NLP-based taxonomies, crowd-sourced dictionaries, or biomedical ontologies to ensure consistency and contextual relevance. Second, integrating demographic, geographic, and behavioral dimensions into models can help mitigate bias and improve representativeness. Finally, leveraging artificial intelligence for dynamic adaptation to emerging trends and data patterns can further enhance system accuracy. Together, these strategies will facilitate real-time, informed decision-making and lay the groundwork for more robust, equitable, and effective digital public health surveillance systems in future epidemiological emergencies.

Author Contributions

Conceptualization, J.F.M., D.P.-A. and Y.L.; methodology, J.F.M., D.P.-A. and Y.L.; validation, J.F.M., D.P.-A. and Y.L.; formal analysis, J.F.M., D.P.-A., J.O. and Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, L.A.A.A. and Y.L.; visualization, Y.L. and J.O.; supervision, Y.L. and L.A.A.A.; project administration, Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been funded by Dirección General de Investigaciones of Universidad Santiago de Cali under call No. DGI-01-2025.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

This research has been funded by Dirección General de Investigaciones of Universidad Santiago de Cali under call No. DGI-01-2025.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CDC: Centers for Disease Control and Prevention
HPA: Health Protection Agency
PAHO: Pan American Health Organization
WHO: World Health Organization
API: Application programming interface
GPHIN: Global Public Health Intelligence Network
MEM: Moving epidemic method
RT-PCR: Reverse transcription-polymerase chain reaction
NPIs: Non-pharmaceutical interventions
DON: Disease Outbreak News
ANOVA: Analysis of variance
LASSO: Least absolute shrinkage and selection operator
LOESS: Locally estimated scatterplot smoothing
ROC: Receiver operating characteristic
AIC: Akaike information criterion
SSD: Sum of squared differences
ARIMA: Autoregressive integrated moving average
ARIMAX: ARIMA with exogenous variables
SVM: Support vector machine
OLS: Ordinary least squares
RMSE: Root mean square error
MAE: Mean absolute error
MAPE: Mean absolute percentage error
AUC: Area under the curve
R2: Coefficient of determination
ρ (rho): Correlation coefficient
Re: Effective reproduction number

References

  1. Fallatah, D.; Adekola, H.A. Digital Epidemiology: Harnessing Big Data for Early Detection and Monitoring of Viral Outbreaks. Infect. Prev. Pract. 2024, 6, 100382. [Google Scholar] [CrossRef]
  2. MacIntyre, C.R.; Lim, S.; Gurdasani, D.; Miranda, M.; Metcalf, D.; Quigley, A.L.; Hutchinson, D.; Burr, A.; Heslop, D.J. Early Detection of Emerging Infectious Diseases—Implications for Vaccine Development. Vaccine 2023, 42, 1826–1830. [Google Scholar] [CrossRef] [PubMed]
  3. O’Shea, J. Digital Disease Detection: A Systematic Review of Event-Based Internet Biosurveillance Systems. Int. J. Med. Inform. 2017, 101, 15–22. [Google Scholar] [CrossRef]
  4. Shausan, A.; Nazarathy, Y.; Dyda, A. Emerging Data Inputs for Infectious Diseases Surveillance and Decision Making. Front. Digit. Health 2023, 5, 1131731. [Google Scholar] [CrossRef]
  5. Dai, Y.; Wang, J. Identifying the Outbreak Signal of COVID-19 before the Response of the Traditional Disease Monitoring System. PLoS Negl. Trop. Dis. 2020, 14, e0008758. [Google Scholar] [CrossRef] [PubMed]
  6. Kogan, N.E.; Clemente, L.; Liautaud, P.; Kaashoek, J.; Link, N.B.; Nguyen, A.T.; Lu, F.S.; Huybers, P.; Resch, B.; Havas, C.; et al. An Early Warning Approach to Monitor COVID-19 Activity with Multiple Digital Traces in near Real Time. Sci. Adv. 2021, 7, eabd6989. [Google Scholar] [CrossRef] [PubMed]
  7. Perlaza, C.L.; Cruz Mosquera, F.E.; Moreno Reyes, S.P.; Tovar Salazar, S.M.; Cruz Rojas, A.F.; España Serna, J.D.; Liscano, Y. Sociodemographic, Clinical, and Ventilatory Factors Influencing COVID-19 Mortality in the ICU of a Hospital in Colombia. Healthcare 2024, 12, 2294. [Google Scholar] [CrossRef]
  8. Shin, S.-Y.; Seo, D.-W.; An, J.; Kwak, H.; Kim, S.-H.; Gwack, J.; Jo, M.-W. High Correlation of Middle East Respiratory Syndrome Spread with Google Search and Twitter Trends in Korea. Sci. Rep. 2016, 6, 32920. [Google Scholar] [CrossRef]
  9. Poirel, L.; Vuillemin, X.; Kieffer, N.; Mueller, L.; Descombes, M.-C.; Nordmann, P. Identification of FosA8, a Plasmid-Encoded Fosfomycin Resistance Determinant from Escherichia Coli, and Its Origin in Leclercia Adecarboxylata. Antimicrob. Agents Chemother. 2019, 63, 10–1128. [Google Scholar] [CrossRef]
  10. Samaras, L.; García-Barriocanal, E.; Sicilia, M.-A. Comparing Social Media and Google to Detect and Predict Severe Epidemics. Sci. Rep. 2020, 10, 4747. [Google Scholar] [CrossRef]
  11. Budd, J.; Miller, B.S.; Manning, E.M.; Lampos, V.; Zhuang, M.; Edelstein, M.; Rees, G.; Emery, V.C.; Stevens, M.M.; Keegan, N.; et al. Digital Technologies in the Public-Health Response to COVID-19. Nat. Med. 2020, 26, 1183–1192. [Google Scholar] [CrossRef] [PubMed]
  12. Chen, T.; Rosen, R.; Grace, W.; Alden, D. Case Report: A Case of Adult Nesidioblastosis. HPB 2022, 24, S328. [Google Scholar] [CrossRef]
  13. Dhewantara, P.W.; Lau, C.L.; Allan, K.J.; Hu, W.; Zhang, W.; Mamun, A.A.; Soares Magalhães, R.J. Spatial Epidemiological Approaches to Inform Leptospirosis Surveillance and Control: A Systematic Review and Critical Appraisal of Methods. Zoonoses Public Health 2019, 66, 185–206. [Google Scholar] [CrossRef]
  14. Nageshwaran, G.; Harris, R.C.; Guerche-Seblain, C.E. Review of the Role of Big Data and Digital Technologies in Controlling COVID-19 in Asia: Public Health Interest vs. Privacy. Digit. Health 2021, 7, 20552076211002953. [Google Scholar] [CrossRef] [PubMed]
  15. Villanueva Parra, I.; Muñoz Diaz, V.; Martinez Guevara, D.; Cruz Mosquera, F.E.; Prieto-Alvarado, D.E.; Liscano, Y. A Scoping Review of Angiostrongyliasis and Other Diseases Associated with Terrestrial Mollusks, Including Lissachatina Fulica: An Overview of Case Reports and Series. Pathogens 2024, 13, 862. [Google Scholar] [CrossRef]
  16. Aiello, A.E.; Renson, A.; Zivich, P.N. Social Media- and Internet-Based Disease Surveillance for Public Health. Annu. Rev. Public Health 2020, 41, 101–118. [Google Scholar] [CrossRef]
  17. Ibrahim, N.K. Epidemiologic Surveillance for Controlling COVID-19 Pandemic: Types, Challenges and Implications. J. Infect. Public Health 2020, 13, 1630–1638. [Google Scholar] [CrossRef]
  18. Zhao, I.Y.; Ma, Y.X.; Yu, M.W.C.; Liu, J.; Dong, W.N.; Pang, Q.; Lu, X.Q.; Molassiotis, A.; Holroyd, E.; Wong, C.W.W. Ethics, Integrity, and Retributions of Digital Detection Surveillance Systems for Infectious Diseases: Systematic Literature Review. J. Med. Internet Res. 2021, 23, e32328. [Google Scholar] [CrossRef]
19. Munn, Z.; Barker, T.H.; Moola, S.; Tufanaru, C.; Stern, C.; McArthur, A.; Stephenson, M.; Aromataris, E. Methodological Quality of Case Series Studies: An Introduction to the JBI Critical Appraisal Tool. JBI Evid. Synth. 2020, 18, 2127–2133. [Google Scholar] [CrossRef]
  20. Tricco, A.C.; Lillie, E.; Zarin, W.; O’Brien, K.K.; Colquhoun, H.; Levac, D.; Moher, D.; Peters, M.D.J.; Horsley, T.; Weeks, L.; et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and Explanation. Ann. Intern. Med. 2018, 169, 467–473. [Google Scholar] [CrossRef]
  21. Chiolero, A.; Buckeridge, D. Glossary for Public Health Surveillance in the Age of Data Science. J. Epidemiol. Community Health 2020, 74, 612–616. [Google Scholar] [CrossRef] [PubMed]
  22. Hong, R.; Walker, R.; Hovan, G.; Henry, L.; Pescatore, R. The Power of Public Health Surveillance. Dela. J. Public Health 2020, 6, 60–63. [Google Scholar] [CrossRef] [PubMed]
  23. Guerra, J.; Acharya, P.; Barnadas, C. Community-Based Surveillance: A Scoping Review. PLoS ONE 2019, 14, e0215278. [Google Scholar] [CrossRef] [PubMed]
  24. Southall, E.; Brett, T.S.; Tildesley, M.J.; Dyson, L. Early Warning Signals of Infectious Disease Transitions: A Review. J. R. Soc. Interface 2021, 18, 20210555. [Google Scholar] [CrossRef]
  25. Wang, R.; Jiang, Y.; Michael, E.; Zhao, G. How to Select a Proper Early Warning Threshold to Detect Infectious Disease Outbreaks Based on the China Infectious Disease Automated Alert and Response System (CIDARS). BMC Public Health 2017, 17, 570. [Google Scholar] [CrossRef]
  26. Yousefinaghani, S.; Dara, R.; Mubareka, S.; Sharif, S. Prediction of COVID-19 Waves Using Social Media and Google Search: A Case Study of the US and Canada. Front. Public Health 2021, 9, 656635. [Google Scholar] [CrossRef]
  27. Haddaway, N.R.; Page, M.J.; Pritchard, C.C.; McGuinness, L.A. PRISMA2020: An R Package and Shiny App for Producing PRISMA 2020-Compliant Flow Diagrams, with Interactivity for Optimised Digital Transparency and Open Synthesis. Campbell Syst. Rev. 2022, 18, e1230. [Google Scholar] [CrossRef]
  28. Lampos, V.; Majumder, M.S.; Yom-Tov, E.; Edelstein, M.; Moura, S.; Hamada, Y.; Rangaka, M.X.; McKendry, R.A.; Cox, I.J. Tracking COVID-19 Using Online Search. NPJ Digit. Med. 2021, 4, 17. [Google Scholar] [CrossRef]
  29. Timpka, T.; Spreco, A.; Dahlström, Ö.; Eriksson, O.; Gursky, E.; Ekberg, J.; Blomqvist, E.; Strömgren, M.; Karlsson, D.; Eriksson, H.; et al. Performance of eHealth Data Sources in Local Influenza Surveillance: A 5-Year Open Cohort Study. J. Med. Internet Res. 2014, 16, e116. [Google Scholar] [CrossRef]
  30. Van De Belt, T.H.; Van Stockum, P.T.; Engelen, L.J.L.P.G.; Lancee, J.; Schrijver, R.; Rodríguez-Baño, J.; Tacconelli, E.; Saris, K.; Van Gelder, M.M.H.J.; Voss, A. Social Media Posts and Online Search Behaviour as Early-Warning System for MRSA Outbreaks. Antimicrob. Resist. Infect. Control 2018, 7, 69. [Google Scholar] [CrossRef]
  31. Lampos, V.; Cristianini, N. Tracking the Flu Pandemic by Monitoring the Social Web. In Proceedings of the 2010 2nd International Workshop on Cognitive Information Processing, Elba, Italy, 14–16 June 2010; IEEE: Elba Island, Italy, 2010; pp. 411–416. [Google Scholar]
32. McGough, S.F.; Brownstein, J.S.; Hawkins, J.B.; Santillana, M. Forecasting Zika Incidence in the 2016 Latin America Outbreak Combining Traditional Disease Surveillance with Search, Social Media, and News Report Data. PLoS Negl. Trop. Dis. 2017, 11, e0005295. [Google Scholar] [CrossRef] [PubMed]
  33. Wittwer, S.; Paolotti, D.; Lichand, G.; Leal Neto, O. Participatory Surveillance for COVID-19 Trend Detection in Brazil: Cross-Sectional Study. JMIR Public Health Surveill. 2023, 9, e44517. [Google Scholar] [CrossRef]
  34. Yan, S.J.; Chughtai, A.A.; Macintyre, C.R. Utility and Potential of Rapid Epidemic Intelligence from Internet-Based Sources. Int. J. Infect. Dis. 2017, 63, 77–87. [Google Scholar] [CrossRef]
  35. Strauss, R.A.; Castro, J.S.; Reintjes, R.; Torres, J.R. Google Dengue Trends: An Indicator of Epidemic Behavior. The Venezuelan Case. Int. J. Med. Inform. 2017, 104, 26–30. [Google Scholar] [CrossRef]
36. Chunara, R.; Andrews, J.R.; Brownstein, J.S. Social and News Media Enable Estimation of Epidemiological Patterns Early in the 2010 Haitian Cholera Outbreak. Am. J. Trop. Med. Hyg. 2012, 86, 39–45. [Google Scholar] [CrossRef] [PubMed]
  37. Barboza, P.; Vaillant, L.; Le Strat, Y.; Hartley, D.M.; Nelson, N.P.; Mawudeku, A.; Madoff, L.C.; Linge, J.P.; Collier, N.; Brownstein, J.S.; et al. Factors Influencing Performance of Internet-Based Biosurveillance Systems Used in Epidemic Intelligence for Early Detection of Infectious Diseases Outbreaks. PLoS ONE 2014, 9, e90536. [Google Scholar] [CrossRef]
  38. Verma, M.; Kishore, K.; Kumar, M.; Sondh, A.R.; Aggarwal, G.; Kathirvel, S. Google Search Trends Predicting Disease Outbreaks: An Analysis from India. Healthc. Inform. Res. 2018, 24, 300. [Google Scholar] [CrossRef] [PubMed]
  39. Santillana, M.; Nguyen, A.T.; Dredze, M.; Paul, M.J.; Nsoesie, E.O.; Brownstein, J.S. Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance. PLoS Comput. Biol. 2015, 11, e1004513. [Google Scholar] [CrossRef]
  40. Majumder, M.S.; Santillana, M.; Mekaru, S.R.; McGinnis, D.P.; Khan, K.; Brownstein, J.S. Utilizing Nontraditional Data Sources for Near Real-Time Estimation of Transmission Dynamics During the 2015–2016 Colombian Zika Virus Disease Outbreak. JMIR Public Health Surveill. 2016, 2, e30. [Google Scholar] [CrossRef]
  41. Li, L.; Gao, L.; Zhou, J.; Ma, Z.; Choy, D.F.; Hall, M.A. Can Social Media Data Be Utilized to Enhance Early Warning: Retrospective Analysis of the U.S. COVID-19 Pandemic 2021. medRxiv 2021. [Google Scholar] [CrossRef]
  42. Feldman, J.; Thomas-Bachli, A.; Forsyth, J.; Patel, Z.H.; Khan, K. Development of a Global Infectious Disease Activity Database Using Natural Language Processing, Machine Learning, and Human Expertise. J. Am. Med. Inform. Assoc. 2019, 26, 1355–1359. [Google Scholar] [CrossRef] [PubMed]
  43. Sharpe, J.D.; Hopkins, R.S.; Cook, R.L.; Striley, C.W. Evaluating Google, Twitter, and Wikipedia as Tools for Influenza Surveillance Using Bayesian Change Point Analysis: A Comparative Analysis. JMIR Public Health Surveill. 2016, 2, e161. [Google Scholar] [CrossRef]
  44. Porcu, G.; Chen, Y.X.; Bonaugurio, A.S.; Villa, S.; Riva, L.; Messina, V.; Bagarella, G.; Maistrello, M.; Leoni, O.; Cereda, D.; et al. Web-Based Surveillance of Respiratory Infection Outbreaks: Retrospective Analysis of Italian COVID-19 Epidemic Waves Using Google Trends. Front. Public Health 2023, 11, 1141688. [Google Scholar] [CrossRef] [PubMed]
  45. Lu, F.S.; Hou, S.; Baltrusaitis, K.; Shah, M.; Leskovec, J.; Sosic, R.; Hawkins, J.; Brownstein, J.; Conidi, G.; Gunn, J.; et al. Accurate Influenza Monitoring and Forecasting Using Novel Internet Data Streams: A Case Study in the Boston Metropolis. JMIR Public Health Surveill. 2018, 4, e4. [Google Scholar] [CrossRef]
  46. Chan, E.H.; Sahai, V.; Conrad, C.; Brownstein, J.S. Using Web Search Query Data to Monitor Dengue Epidemics: A New Model for Neglected Tropical Disease Surveillance. PLoS Negl. Trop. Dis. 2011, 5, e1206. [Google Scholar] [CrossRef]
  47. Wang, D.; Guerra, A.; Wittke, F.; Lang, J.C.; Bakker, K.; Lee, A.W.; Finelli, L.; Chen, Y.-H. Real-Time Monitoring of Infectious Disease Outbreaks with a Combination of Google Trends Search Results and the Moving Epidemic Method: A Respiratory Syncytial Virus Case Study. Trop. Med. Infect. Dis. 2023, 8, 75. [Google Scholar] [CrossRef] [PubMed]
  48. Alessa, A.; Faezipour, M. Flu Outbreak Prediction Using Twitter Posts Classification and Linear Regression With Historical Centers for Disease Control and Prevention Reports: Prediction Framework Study. JMIR Public Health Surveill. 2019, 5, e12383. [Google Scholar] [CrossRef]
  49. Broniatowski, D.A.; Paul, M.J.; Dredze, M. National and Local Influenza Surveillance through Twitter: An Analysis of the 2012-2013 Influenza Epidemic. PLoS ONE 2013, 8, e83672. [Google Scholar] [CrossRef]
50. Shen, C.; Chen, A.; Luo, C.; Zhang, J.; Feng, B.; Liao, W. Using Reports of Symptoms and Diagnoses on Social Media to Predict COVID-19 Case Counts in Mainland China: Observational Infoveillance Study. J. Med. Internet Res. 2020, 22, e19421. [Google Scholar] [CrossRef]
  51. Broniatowski, D.A.; Dredze, M.; Paul, M.J.; Dugas, A. Using Social Media to Perform Local Influenza Surveillance in an Inner-City Hospital: A Retrospective Observational Study. JMIR Public Health Surveill. 2015, 1, e5. [Google Scholar] [CrossRef]
52. Eysenbach, G. How to Fight an Infodemic: The Four Pillars of Infodemic Management. J. Med. Internet Res. 2020, 22, e21820. [Google Scholar] [CrossRef] [PubMed]
  53. Klimiuk, K.B.; Balwicki, Ł.W. What is infodemiology? An overview and its role in public health. Przegl. Epidemiol. 2024, 78, 81–89. [Google Scholar] [CrossRef]
  54. Menz, B.D.; Modi, N.D.; Sorich, M.J.; Hopkins, A.M. Health Disinformation Use Case Highlighting the Urgent Need for Artificial Intelligence Vigilance: Weapons of Mass Disinformation. JAMA Intern. Med. 2024, 184, 92–96. [Google Scholar] [CrossRef] [PubMed]
  55. Boyd, A.D.; Gonzalez-Guarda, R.; Lawrence, K.; Patil, C.L.; Ezenwa, M.O.; O’Brien, E.C.; Paek, H.; Braciszewski, J.M.; Adeyemi, O.; Cuthel, A.M.; et al. Potential Bias and Lack of Generalizability in Electronic Health Record Data: Reflections on Health Equity from the National Institutes of Health Pragmatic Trials Collaboratory. J. Am. Med. Inform. Assoc. 2023, 30, 1561–1566. [Google Scholar] [CrossRef]
  56. Al-Kenane, K.; Boy, F.; Alsaber, A.; Nafea, R.; AlMutairi, S. Digital Epidemiology of High-Frequency Search Listening Trends for the Surveillance of Subjective Well-Being during COVID-19 Pandemic. Front. Psychol. 2024, 15, 1442303. [Google Scholar] [CrossRef] [PubMed]
57. Melo, C.L.; Mageste, L.R.; Guaraldo, L.; Paula, D.P.; Wakimoto, M.D. Use of Digital Tools in Arbovirus Surveillance: Scoping Review. J. Med. Internet Res. 2024, 26, e57476. [Google Scholar] [CrossRef]
  58. Jia, P.; Liu, S.; Yang, S. Innovations in Public Health Surveillance for Emerging Infections. Annu. Rev. Public Health 2023, 44, 55–74. [Google Scholar] [CrossRef]
  59. Salathé, M.; Bengtsson, L.; Bodnar, T.J.; Brewer, D.D.; Brownstein, J.S.; Buckee, C.; Campbell, E.M.; Cattuto, C.; Khandelwal, S.; Mabry, P.L.; et al. Digital Epidemiology. PLoS Comput. Biol. 2012, 8, e1002616. [Google Scholar] [CrossRef]
  60. Shakeri Hossein Abad, Z.; Kline, A.; Sultana, M.; Noaeen, M.; Nurmambetova, E.; Lucini, F.; Al-Jefri, M.; Lee, J. Digital Public Health Surveillance: A Systematic Scoping Review. NPJ Digit. Med. 2021, 4, 41. [Google Scholar] [CrossRef]
  61. Shaweno, D.; Karmakar, M.; Alene, K.A.; Ragonnet, R.; Clements, A.C.; Trauer, J.M.; Denholm, J.T.; McBryde, E.S. Methods Used in the Spatial Analysis of Tuberculosis Epidemiology: A Systematic Review. BMC Med. 2018, 16, 193. [Google Scholar] [CrossRef]
  62. Sulaiman, F.; Yanti, N.S.; Lesmanawati, D.A.S.; Trent, M.J.; Macintyre, C.R.; Chughtai, A.A. Language Specific Gaps in Identifying Early Epidemic Signals—A Case Study of the Malay Language. Glob. Biosecurity 2019, 1, 1–10. [Google Scholar] [CrossRef]
  63. Cho, P.J.; Yi, J.; Ho, E.; Shandhi, M.M.H.; Dinh, Y.; Patil, A.; Martin, L.; Singh, G.; Bent, B.; Ginsburg, G.; et al. Demographic Imbalances Resulting From the Bring-Your-Own-Device Study Design. JMIR Mhealth Uhealth 2022, 10, e29510. [Google Scholar] [CrossRef] [PubMed]
  64. Ragnedda, M.; Ruiu, M.L.; Calderón-Gómez, D. Examining the Interplay of Sociodemographic and Sociotechnical Factors on Users’ Perceived Digital Skills. MaC 2024, 12, 8167. [Google Scholar] [CrossRef]
  65. Kostkova, P.; Saigí-Rubió, F.; Eguia, H.; Borbolla, D.; Verschuuren, M.; Hamilton, C.; Azzopardi-Muscat, N.; Novillo-Ortiz, D. Data and Digital Solutions to Support Surveillance Strategies in the Context of the COVID-19 Pandemic. Front. Digit. Health 2021, 3, 707902. [Google Scholar] [CrossRef] [PubMed]
  66. Brancato, V.; Esposito, G.; Coppola, L.; Cavaliere, C.; Mirabelli, P.; Scapicchio, C.; Borgheresi, R.; Neri, E.; Salvatore, M.; Aiello, M. Standardizing Digital Biobanks: Integrating Imaging, Genomic, and Clinical Data for Precision Medicine. J. Transl. Med. 2024, 22, 136. [Google Scholar] [CrossRef]
  67. Jacobson, L.P.; Parker, C.B.; Cella, D.; Mroczek, D.K.; Lester, B.M.; on behalf of program collaborators for Environmental influences on Child Health Outcomes; Smith, P.B.; Newby, K.L.; Catellier, D.J.; Gershon, R.; et al. Approaches to Protocol Standardization and Data Harmonization in the ECHO-Wide Cohort Study. Pediatr. Res. 2024, 95, 1726–1733. [Google Scholar] [CrossRef]
  68. Syrowatka, A. Leveraging Artificial Intelligence for Pandemic Preparedness and Response: A Scoping Review to Identify Key Use Cases. NPJ Digit. Med. 2021, 4, 96. [Google Scholar] [CrossRef]
Figure 1. PRISMA flow diagram.
Figure 2. Temporal and geographic distribution of the studies. Note: This figure was generated using the ggplot2 library in R.
Figure 3. Frequency of use of different data sources. Note: This figure was generated using the matplotlib library in Python.
Figure 4. Methodological flow diagram. Note: This figure was generated using Napkin.
Figure 5. Matrix of analytical techniques in digital surveillance studies (chronologically ordered). Cells marked with a value indicate that the corresponding technique was used in that study, allowing quick identification of which methods have been applied and how frequently. Note: This figure was generated using the matplotlib library in Python [5,6,8,10,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51].
Figure 6. Comparison of lead time and precision across studies. Note: This figure was generated using the ggplot2 library in R [5,6,8,10,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51].
Figure 7. Relationship between lead time and detection rate in digital public health surveillance studies [5,6,8,10,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51].
Figure 8. Heat map of complementary characteristics and contextual variables [5,6,8,10,26,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51].
Figure 9. Conceptual framework for digital epidemiological surveillance. Note: This figure was generated using Napkin.
Table 1. General characteristics of the included studies.
Author and Year | Study Design | Data Collection Period | Location | Type of Disease or Outbreak
Dai et al. (2020) [5] | Quantitative comparative empirical | 2015–2020 | China | COVID-19
Lampos et al. (2010) [31] | Quantitative empirical | 2009 | United Kingdom | Influenza (H1N1)
Lampos et al. (2021) [28] | Observational/modeling | 2011–2020 | USA, United Kingdom, Australia, etc. | COVID-19
Van de Belt et al. (2018) [30] | Comparative exploratory | 2015–2017 | Netherlands | MRSA
Timpka et al. (2014) [29] | Open cohort | 2007–2012 | Sweden | Influenza
McGough et al. (2017) [32] | Retrospective multivariable forecasting | 2015–2016 | Latin America | Zika
Yousefinaghani et al. (2021) [26] | Observational, retrospective, and predictive | 2020 | USA and Canada | COVID-19
Wittwer et al. (2023) [33] | Cross-sectional comparative | 2020 | Brazil | COVID-19
Shin et al. (2016) [8] | Observational correlational | 2015 | South Korea | MERS
Yan et al. (2017) [34] | Systematic review | 2006–2016 | International | Various
Strauss et al. (2017) [35] | Observational correlational | 2004–2014 | Venezuela | Dengue
Chunara et al. (2012) [36] | Observational | 2011 | Haiti | Cholera
Barboza et al. (2014) [37] | Quantitative evaluation | 2010 | International | Various
Kogan et al. (2021) [6] | Early warning | 2020 | USA | COVID-19
Verma et al. (2018) [38] | Cross-sectional correlational | 2016 | India | Dengue, malaria, etc.
Santillana et al. (2015) [39] | Machine learning | 2013–2015 | USA | Influenza (ILI)
Majumder et al. (2016) [40] | Retrospective | 2015–2016 | Colombia | Zika
Samaras et al. (2020) [10] | Comparative | Influenza season | Greece | Influenza
Li et al. (2021) [41] | Retrospective | 2020 | USA | COVID-19
Feldman et al. (2019) [42] | Database development | 10 months | Global | 114 diseases
Sharpe et al. (2016) [43] | Retrospective comparative | 2012–2015 | USA | Influenza (ILI)
Porcu et al. (2023) [44] | Retrospective | 2020–2021 | Italy | COVID-19
Lu et al. (2018) [45] | Retrospective observational | 2012–2016 | Boston, USA | Influenza
Chan et al. (2011) [46] | Real-time monitoring | 2003–2010 | Bolivia, Brazil, India, etc. | Dengue
Wang et al. (2023) [47] | Outbreak prediction | 5 years | Japan, Germany, Belgium | RSV
Alessa and Faezipour (2019) [48] | Retrospective observational | 2018 | USA (Connecticut) | Influenza
Broniatowski et al. (2013) [49] | Observational infoveillance | 2012–2013 | USA (National and NYC) | Influenza
Shen et al. (2020) [50] | Retrospective observational | 2019–2020 | China | COVID-19
Broniatowski et al. (2015) [51] | Retrospective observational study | 20 November 2011–16 March 2014 | Baltimore, Maryland, USA (inner-city hospital) | Influenza
COVID-19: coronavirus disease 2019; H1N1: a subtype of the influenza A virus (H1N1); MRSA: methicillin-resistant Staphylococcus aureus; MERS: Middle East respiratory syndrome; ILI: influenza-like illness.
Table 2. Digital data sources and platforms employed.
Author and Year | Specific Digital Platforms and Tools | Integration with Traditional Surveillance Systems
Dai et al. (2020) [5] | Baidu Search Engine | Comparison with the traditional case reporting system
Lampos et al. (2010) [31] | X | Calibration of the “flu-score” with HPA data
Lampos et al. (2021) [28] | Google Search and news data | Comparison with official case and death data
Van de Belt et al. (2018) [30] | Coosto (social media monitoring) and Google Trends | Comparison with official notifications in the SO ZI/AMR system
Timpka et al. (2014) [29] | Google Flu Trends, Healthcare Direct/1177, Google Analytics | Comparison with clinical and laboratory data on influenza
McGough et al. (2017) [32] | Google Search, X, HealthMap | Integration with Zika data reported by PAHO and health ministries
Yousefinaghani et al. (2021) [26] | X API and Google Trends | Comparison with official data (Johns Hopkins COVID-19)
Wittwer et al. (2023) [33] | Brazil Sem Corona and GitHub data | Integration with PS and TS data to improve prediction
Shin et al. (2016) [8] | Google Trends, Topsy | Comparison with official MERS data
Yan et al. (2017) [34] | Google Flu Trends, Google Trends, Baidu, X, ProMED-mail, HealthMap | Discussion on complementarity with traditional systems
Strauss et al. (2017) [35] | Google Dengue Trends | Comparison and proposal for complementarity with the surveillance system
Chunara et al. (2012) [36] | HealthMap and X | Comparison with official data from the MSPP
Barboza et al. (2014) [37] | Argus, BioCaster, GPHIN, HealthMap, MedISys, ProMED-mail | Comparative evaluation with official BHI data
Kogan et al. (2021) [6] | Google Trends, X, UpToDate, GLEAM, Apple Mobility, Cuebiq, Kinsa Thermometer | Integration of digital proxies with cases, deaths, and ILI
Verma et al. (2018) [38] | Google Trends and Google Correlate | Comparison with the IDSP surveillance system
Santillana et al. (2015) [39] | Google Trends, X, athenahealth, FluNearYou | Comparison of predictions with CDC reports
Majumder et al. (2016) [40] | HealthMap and Google Trends | Validation with official INS data
Samaras et al. (2020) [10] | Google Trends, X API (Tweepy and Pytrends) | Comparison with official influenza data in Europe
Li et al. (2021) [41] | X Standard Search API | Comparison with official systems based on searches and news
Feldman et al. (2019) [42] | GDELT Global Knowledge Graph and Google Translate API | Comparison with WHO (DON) reports
Sharpe et al. (2016) [43] | Google Flu Trends, HealthTweets, Wikipedia | Comparison with CDC official reports
Porcu et al. (2023) [44] | Google Trends | Validation with RT-PCR data
Lu et al. (2018) [45] | Google Trends, X, athenahealth, Flu Near You | Validation with data from the Boston Public Health Commission
Chan et al. (2011) [46] | Google Search queries | Comparison with data from ministries of health and WHO
Wang et al. (2023) [47] | Google Trends | Complement for clinical surveillance
Alessa and Faezipour (2019) [48] | X | Validation with CDC and hospital data
Broniatowski et al. (2013) [49] | X API (HealthTweets and Google Flu Trends) | Validation with CDC and NYC Department of Health reports
Shen et al. (2020) [50] | Weibo | Comparison with official data from the China CDC
Broniatowski et al. (2015) [51] | X (HealthTweets) | Comparison with hospital data (laboratory cases and ILI in ED)
CDC: Centers for Disease Control and Prevention; HPA: Health Protection Agency; PAHO: Pan American Health Organization; WHO: World Health Organization; API: application programming interface; GPHIN: Global Public Health Intelligence Network; RT-PCR: reverse transcription-polymerase chain reaction; NPIs: non-pharmaceutical interventions; DON: Disease Outbreak News.
Table 3. Methods for data analysis and processing.
Author and Year | Comparison Method | Detection Method | Preprocessing | Analytical Techniques
Dai et al. (2020) [5] | Correlation analysis between anomalous peaks and official reports | Abnormal increase in ILI and searches (e.g., “pneumonia”, “SARS”) | Smoothing (7-day moving average) | ANOVA, linear regression, correlation
Lampos et al. (2010) [31] | Comparison of “flu-score” in tweets versus ILI rates | Calculation of “flu-score” from tweets | Stop word removal, stemming, smoothing | Linear regression, LASSO, supervised learning
Lampos et al. (2021) [28] | Comparison of online queries with official COVID-19 data | Unsupervised models and transfer learning with symptoms | Normalization and weighting of symptoms | Elastic net, Gaussian processes, correlation
Van de Belt et al. (2018) [30] | Comparison of outbreaks detected on social networks with official reports | Detection on social media and Google Trends | Thresholds in social media and Google Trends | Descriptive statistics, ROC analysis, correlation
Timpka et al. (2014) [29] | Comparison of eHealth data with clinical and laboratory cases | Correlation of eHealth data with clinical data | Weekly adjustment, detrending | Linear regression, autoregressive models, correlation
McGough et al. (2017) [32] | Predictive models of Zika cases with digital data | Case prediction using digital signals | Log transformations and normalization | Elastic net, cross-validation, autoregressive models
Yousefinaghani et al. (2021) [26] | Comparison of digital time series with COVID-19 cases | Anomaly analysis in tweets and searches | Keyword filtering and geolocation | Anomaly analysis, regression, validation
Wittwer et al. (2023) [33] | Comparison of self-reported infection rates with official data | Estimation of infection rates from self-reports | LOESS smoothing of fluctuations | Autoregressive models, AIC, variable combination
Shin et al. (2016) [8] | Correlation between digital data and official cases | Lag correlation between digital data and cases | Normalization and word selection | Spearman and lag analysis
Yan et al. (2017) [34] | Correlation analysis and detection of digital signals | Detection of digital signals in official reports | Categorization and noise elimination | Correlation, Bayesian algorithms, signal detection
Strauss et al. (2017) [35] | Comparison of digital surveillance with reported dengue cases | Digital surveillance based on dengue searches | Normalization and volume conversion | Linear regression, correlation analysis
Chunara et al. (2012) [36] | Correlation analysis between tweets and cholera reports | Analysis of HealthMap and X reports | Filtering and selection of key terms | Exponential fit, Euler–Lotka equation
Barboza et al. (2014) [37] | Evaluation of biosurveillance with media signals | Media searches validated by human assessment | Manual filtering and duplicate removal | Poisson regression, rate calculations
Kogan et al. (2021) [6] | Comparison of digital proxies with case and death data | Modeling digital proxies and official data | Smoothing and scaling of digital proxies | Exponential growth, harmonic mean, correlation
Verma et al. (2018) [38] | Correlation between search patterns and outbreaks in India | Identification of terms in Google Correlate | Selection of terms in Google Correlate | Correlation analysis and time series analysis
Santillana et al. (2015) [39] | Prediction of ILI by combining multiple digital sources | Prediction of ILI activity using multiple proxies | Normalization and mapping of digital sources | LASSO regression, SVM, AdaBoost
Majumder et al. (2016) [40] | Estimation of Zika transmission using digital data | Modeling Zika transmission with IDEA | Scaling and smoothing of Google Trends | Non-linear optimization, SSD minimization
Samaras et al. (2020) [10] | Predictive modeling of influenza with ARIMA | Epidemic activity prediction using ARIMA | Elimination of duplicates in X | ARIMA(X) models, predictive analysis
Li et al. (2021) [41] | Classification of COVID-19 tweets and lead time analysis | Classification of tweets as COVID-19 alerts | Tokenization and lemmatization of tweets | Supervised classification and sentiment analysis
Feldman et al. (2019) [42] | Validation of outbreak detection with WHO reports | Outbreak detection in news articles | Automatic translation and tag-based filtering | Naïve Bayes, SVM, bidirectional LSTM
Sharpe et al. (2016) [43] | Detection of changes in time series using Bayesian methods | Identification of change points in time series | Normalization and weekly grouping | Bayesian change point models
Porcu et al. (2023) [44] | Detection of outliers in searches using ARMA and EWMA | Detection of epidemic signals in searches | Scaling adjustment from 0 to 100 | ARMA, EWMA, outlier detection
Lu et al. (2018) [45] | Comparison of ARGO models vs. simple autoregressives | Prediction of influenza with ensemble models | Filtering out irrelevant terms | Multivariable regression and ensemble methods
Chan et al. (2011) [46] | Fitting linear models to dengue searches | Univariate linear regression with dengue searches | Replacement of spurious peaks | Univariate linear regression
Wang et al. (2023) [47] | Correlation between Google Trends and clinical surveillance | Definition of thresholds with the Moving Epidemic Method | Exclusion of atypical years (2020–2021) | Moving Epidemic Method (MEM)
Alessa and Faezipour (2019) [48] | Classification of tweets with FastText and linear regression | Regression and classification of tweets | Stemming and stop word removal | FastText and linear regression
Broniatowski et al. (2013) [49] | Tweet filtering for influenza detection | Supervised classification of influenza tweets | Filtering in tweet stages | SVM, logistic regression
Shen et al. (2020) [50] | Granger causality analysis between “sick posts” and case counts | Case prediction using Granger causality and supervised models | Classification into “sick” versus others | Random forest classifier and OLS regression
Broniatowski et al. (2015) [51] | Estimation of influenza prevalence using tweets and counts | Estimation of ILI prevalence from X | Normalization of tweet volumes | ARIMAX analysis and logistic regression
ANOVA: analysis of variance; LASSO: least absolute shrinkage and selection operator; LOESS: locally estimated scatterplot smoothing; ROC: receiver operating characteristic; AIC: Akaike information criterion; SSD: sum of squared differences; ARIMA: autoregressive integrated moving average; ARIMAX: ARIMA with exogenous variables; SVM: support vector machine; OLS: ordinary least squares; MAE: mean absolute error; MAPE: mean absolute percentage error; AUC: area under the curve.
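Several of the detection methods in Table 3 (e.g., the lag correlation analysis of Shin et al. [8]) reduce to correlating a digital signal against official case counts at a range of time shifts and reporting the shift that maximizes agreement. The following is a minimal, pure-Python sketch of that idea with synthetic numbers; the function names and data are illustrative only and are not taken from any reviewed study, which typically used library routines in R or SciPy instead.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return sxy / (sx * sy)

def best_lead(searches, cases, max_lag=4):
    """Correlate the digital series shifted earlier by `lag` steps against cases.

    Returns (lag, r) with the highest correlation; a positive lag means the
    digital signal led the official series by that many time steps.
    """
    best = (0, pearson(searches, cases))
    for lag in range(1, max_lag + 1):
        r = pearson(searches[:-lag], cases[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic weekly series: official cases trail search interest by 2 steps.
searches = [3, 5, 9, 20, 42, 60, 55, 38, 22, 12, 7, 4]
cases = [2, 2, 3, 5, 9, 20, 42, 60, 55, 38, 22, 12]
lag, r = best_lead(searches, cases)
print(lag, round(r, 3))
```

The recovered lag is what the reviewed studies report as "lead time"; production analyses additionally assess statistical significance and often use Spearman rather than Pearson correlation for non-linear signals.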
Table 4. Performance in early detection and precision.
Author and Year | Lead Time | Detection Rate | Precision
Dai et al. (2020) [5] | 20 days before the official alert | High correlations; no specific rate reported | High correlation coefficients
Lampos et al. (2010) [31] | Tweets within hours; HPA takes 1–2 weeks | Correlations 81.78–85.56% | Cross-validation ~89–94%
Lampos et al. (2021) [28] | Cases: 16.7 days before; deaths: 22.1 days before | Correlation r ≈ 0.82–0.85 | Evaluated with AUC and MAE
van de Belt et al. (2018) [30] | Outbreaks detected 1–2 days earlier | Sensitivity 20%, specificity 96% | AUC, sensitivity, specificity
Timpka et al. (2014) [29] | GFT 2 weeks earlier; telenursing varies | GFT r = 0.96, telenursing r ≈ 0.95–0.97 | Pearson r, RMSE
McGough et al. (2017) [32] | Forecasts 1–3 weeks earlier | Measured by predictive error (rRMSE) | RMSE, rRMSE, Pearson ρ
Yousefinaghani et al. (2021) [26] | 83% of waves detected 1 week early | 100% of symptoms detected in US Category I | RMSE, MAE, correlations > 75%
Wittwer et al. (2023) [33] | Lead time depends on participation | High correlation in cities with good participation | RMSE, MAE, Pearson correlation
Shin et al. (2016) [8] | 3–4 days prior to confirmation | Correlations > 0.7, up to 0.9 | Significant correlations (p < 0.05)
Yan et al. (2017) [34] | 1–12 days before official reports | Alerts 1–12 days early, variable correlation | Moderate to high depending on the method
Strauss et al. (2017) [35] | Early alert before update | r = 0.87 during epidemic weeks | R2 = 0.75 in regression
Chunara et al. (2012) [36] | Daily updates; official data delayed 1–2 days | ρ ≈ 0.80 during growth phases | Variability in Re (1.54 to 6.89)
Barboza et al. (2014) [37] | Detects events before publication | C-DR 83–95%, I-DR 47–92% | Statistical differences in I-Se
Kogan et al. (2021) [6] | Case increases 2–3 weeks earlier | Combined sensitivity up to 0.75 | Precision 0.90–0.98 in proxies
Verma et al. (2018) [38] | Google Trends anticipates 2–3 weeks | r > 0.80 for chikungunya and dengue | Chikungunya r = 0.82–0.87
Santillana et al. (2015) [39] | Prediction up to 4 weeks before | Real-time prediction r = 0.989 | RMSE 0.176% ILI, reduced MAPE
Majumder et al. (2016) [40] | Near real-time estimates | No detection rate reported; estimation of R0 | Good SSD model fit
Samaras et al. (2020) [10] | Searches and tweets anticipate 2–3 weeks | Pearson R ≈ 0.933–0.943 | MAPE ≈ 18.7–22.6%
Li et al. (2021) [41] | Detects signals 16 days in advance | Signal strategy identifies alerts | High classification precision
Feldman et al. (2019) [42] | Outbreaks detected on average 43.4 days earlier | 94% of outbreaks detected before WHO | Recall 88.8%, precision 86.1%
Sharpe et al. (2016) [43] | Google alerts changes 1–2 weeks earlier | Google: sensitivity 92%, PPV 85% | Google shows the best performance
Porcu et al. (2023) [44] | Epidemics detected 7–8 weeks before | PPV 80% in Lombardy, <50% in Marche | High correlation in areas with high connectivity
Lu et al. (2018) [45] | Nowcasting and forecasting 1 week ahead | Correlations of 0.98 (nowcast) and 0.94 (forecast) | Low RMSE, MAE, and MAPE
Chan et al. (2011) [46] | Real-time available data | Correlations 0.82–0.99 | Good correlation fit
Wang et al. (2023) [47] | Almost immediate data | Japan r = 0.87, Germany r = 0.65 | Good threshold estimation
Alessa and Faezipour (2019) [48] | Almost real-time | 96.29% correlation with CDC | F-measure 89.9%
Broniatowski et al. (2013) [49] | Tweets available up to 2 weeks in advance | National r = 0.93, municipal r = 0.88 | Lower MAE in the infection model
Shen et al. (2020) [50] | Predicts cases 14 days earlier | Sick posts explain 12.8% of variance | High standardized coefficients
Broniatowski et al. (2015) [51] | Tweets ahead of official data | High correlation at the municipal level | 85% accuracy in trend prediction
PPV: positive predictive value; RMSE: root mean square error; MAE: mean absolute error; MAPE: mean absolute percentage error; rRMSE: relative root mean square error; AUC: area under the curve; R²: coefficient of determination; ρ (rho): correlation coefficient; Re: effective reproduction number; GFT: Google Flu Trends; ILI: influenza-like illness; CDC: Centers for Disease Control and Prevention.
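The error and correlation metrics abbreviated above are simple to compute. The sketch below is a minimal, self-contained illustration; the example series are invented for demonstration and are not taken from any of the reviewed studies:

```python
import math

def rmse(pred, obs):
    # Root mean square error between a digital estimate and official counts.
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def mae(pred, obs):
    # Mean absolute error.
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def mape(pred, obs):
    # Mean absolute percentage error (observed values must be non-zero).
    return 100.0 * sum(abs((p - o) / o) for p, o in zip(pred, obs)) / len(obs)

def pearson_r(x, y):
    # Pearson correlation coefficient between two equal-length series.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented weekly series: a search-volume proxy vs. official case counts.
digital = [12.0, 15.0, 21.0, 30.0, 26.0, 18.0]
official = [10.0, 14.0, 22.0, 28.0, 27.0, 17.0]
print(round(pearson_r(digital, official), 3), round(rmse(digital, official), 3))
```

In practice these quantities would be computed with a statistical library (e.g., `scipy.stats.pearsonr`, which also reports significance); the point here is only to make the footnote's abbreviations concrete.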
Table 5. Complementary characteristics and contextual variables.
| Author and Year | Spatial Resolution and Temporal Granularity | Keyword Selection Process | Measurement of Media Impact | Demographic and Usage Characteristics |
|---|---|---|---|---|
| Dai et al. (2020) [5] | Regional (Wuhan, China); daily and weekly data | Manual selection (“pneumonia”, “SARS”) | Not evaluated | Not specified |
| Lampos et al. (2010) [31] | Urban centers (10 km radius); daily and weekly aggregation | Manual selection and LASSO | Not directly measured | 5.5 million X users (United Kingdom) |
| Lampos et al. (2021) [28] | National; daily data | 19 symptom-based sets | Minimizes panic effect in the autoregressive model | Application in multiple countries, no demographic details |
| van de Belt et al. (2018) [30] | Provinces; daily data | Boolean searches in Google Trends | Not explicitly evaluated | Geographic information by province |
| Timpka et al. (2014) [29] | County; daily data | ICD-10 and grouping in telenursing | Correlation between media coverage and GFT | Age distribution in RIR |
| McGough et al. (2017) [32] | National; weekly data | LASSO and penalized regression | Not measured; low influence mentioned | Data profiles by country, no demographics |
| Yousefinaghani et al. (2021) [26] | States/provinces; weekly data | Predefined symptom lists | Indirect impact by comparing preventive term usage | Geolocation of tweets by state/province |
| Wittwer et al. (2023) [33] | Municipalities; daily data | Questionnaire based on COVID-19 symptoms | Impact of media campaigns on participation | Participation rates and urban differences |
| Shin et al. (2016) [8] | National; daily data | Basic and extended terms (“MERS”, “hospital”) | Recognizes media noise | Aggregated search and tweet data |
| Yan et al. (2017) [34] | Local/global; daily to weekly data | Relevance and specificity-based selection | Evaluation of media noise | Lack of detailed user data |
| Strauss et al. (2017) [35] | National; weekly data | Spanish terms for dengue | Annual variation in searches vs. incidence | Impact of internet penetration |
| Chunara et al. (2012) [36] | Departments and arrondissements; daily data | Searches for “cholera” and hashtags | Media amplification effect | Geographic and demographic biases |
| Barboza et al. (2014) [37] | Country-level events; monthly data | Defined by epidemiologists | Comparison of media and official source signals | Language distribution and regional impact |
| Kogan et al. (2021) [6] | States; daily data | COVID-19-related terms | Analysis of bias in digital proxies | Differences in activity and adherence to NPIs |
| Verma et al. (2018) [38] | States; weekly data | Terms with high correlation (Google Correlate) | Search explosion preceding the report | Internet penetration in Haryana and Chandigarh |
| Santillana et al. (2015) [39] | National; weekly data | Terms based on previous studies | Captures media effects in search variation | Aggregated national level, no demographic details |
| Majumder et al. (2016) [40] | National; aggregated data | Keyword “Zika” in Google Trends | Comparison of curves, not evaluating noise | Aggregated data, no demographic details |
| Samaras et al. (2020) [10] | National (Greece); aggregated data | Terms in Greek | Media bias in searches and tweets | Limitations in tweet geolocation |
| Li et al. (2021) [41] | State-level (USA); daily data | Keyword “coronavirus” | Signal ratio as an indicator of public opinion | Filtered by location in the USA |
| Feldman et al. (2019) [42] | Global; updates every 15 min | Filtering by GDELT and name databases | Lead time of 43.4 days and 94% outbreak coverage | No demographic characteristics; only media data |
| Sharpe et al. (2016) [43] | National; weekly data | Implicit terms in each source | Evaluation of discrepancies in changes | Aggregated data, no demographic details |
| Porcu et al. (2023) [44] | Regions (Italy); weekly data | Italian translation of symptoms | Search volume as a proxy for alerts | Variability in internet access by region |
| Lu et al. (2018) [45] | City; weekly data | Specific terms for Boston | Media influence in method comparison | Emergency room visits (age, gender, ethnicity) |
| Chan et al. (2011) [46] | National; weekly/monthly data | Selection based on correlation with official data | Not reported | Not reported |
| Wang et al. (2023) [47] | National and regional; weekly data | Term “RSV” or “RS virus” | Mitigation of media impact with MEM | No specific details reported |
| Alessa and Faezipour (2019) [48] | State (Connecticut); weekly data | 11 verified keywords | Not directly measured | No demographic characteristics detailed |
| Broniatowski et al. (2013) [49] | Municipal, regional, and national; weekly data | Keyword list and previous models | Sensitive to media “chatter” | Possible biases due to underrepresentation of users |
| Shen et al. (2020) [50] | National and provincial; daily data | 167 keywords per daily observation | Comparison between “sick posts” and other posts | User pool with age and gender composition |
| Broniatowski et al. (2015) [51] | Municipal (hospital); weekly data | 269 health-related terms filtered in stages | Filtering to reduce media “chatter” | Data from pediatric and adult patients |
ICD-10: International Classification of Diseases, Tenth Revision; NPIs: non-pharmaceutical interventions; GFT: Google Flu Trends; MEM: moving epidemic method.
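Many of the keyword selection processes in the table reduce, at their core, to counting the posts or queries per day that match a curated term list. The sketch below illustrates that step only; the keyword set and posts are hypothetical (the reviewed studies used far larger, validated lists, e.g., 167 or 269 terms):

```python
from collections import Counter

# Hypothetical curated keyword list; real studies validated such lists
# against clinical vocabularies or prior models.
KEYWORDS = {"fever", "cough", "flu"}

def daily_signal(posts):
    # Count, per day, the posts containing at least one tracked keyword.
    counts = Counter()
    for day, text in posts:
        if set(text.lower().split()) & KEYWORDS:
            counts[day] += 1
    return dict(counts)

posts = [
    ("2020-03-01", "woke up with a fever and a bad cough"),
    ("2020-03-01", "great weather today"),
    ("2020-03-02", "i think i caught the flu"),
]
print(daily_signal(posts))
```

Note the deliberately naive whitespace tokenization; separating genuine illness reports from media “chatter”, as several of the included studies did, would require an additional classification stage.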
Table 6. Use-Case Classification of Digital Surveillance Approaches.
| Use Case/Purpose | Typical Platforms | Common Diseases | Contextual Considerations (High vs. Low Resources) |
|---|---|---|---|
| Real-Time Monitoring (“Nowcasting”) | X API, participatory systems (FluNearYou), news data (GDELT) | Influenza, COVID-19, cholera | High resource: integration of multiple real-time data streams. Low resource: reliance on free social media platforms. |
| Retrospective Analysis and Modeling | Google Trends archives, historical social media data | Dengue, Zika, MERS | High and low resource: accessible in both contexts, as historical data are often freely available. |
| Predictive Forecasting | Combination of multiple sources (search queries, social media, clinical data) | Influenza, COVID-19, RSV | High resource: requires high-quality longitudinal data and computational power for complex ML models. Low resource: more challenging; often relies on simpler time series models. |
Table 7. Comparison of results with other studies.
| Authors/Study | Objective/Scope | Methodology/Techniques | Data Sources | Key Findings | Impact/Context | Advantages/Disadvantages |
|---|---|---|---|---|---|---|
| This study (2025) | To evaluate the use of social media and digital sources for early detection of infectious disease outbreaks. | Retrospective and predictive analysis; use of machine learning, correlations, and time series analysis. | X, Google Trends, health forums, news databases, epidemiological records. | Outbreaks anticipated several weeks in advance; high correlation with official reports. | Impact of media context and digital penetration on data quality. | ✔ Integration of multiple digital sources; validation with official data. ✘ Variability in data representativeness depending on region and digital access. |
| Al-Kenane et al. (2024) [56] | Relationship between Google Trends searches and government response in Kuwait. | Time series analysis; Pearson and bootstrap. | Google Trends (English and Arabic). | High correlation (R ~ 0.71); anticipates policy changes. | Incorporates bilingual analysis and effects of government measures. | ✔ Innovative and robust approach. ✘ Limited to Kuwait and psychological variables. |
| Melo et al. (2024) [57] | To evaluate digital tools for arbovirus surveillance and early detection. | Review with comparative analysis; ANOVA and correlations. | Google Trends, X, apps, social media, and official data. | Early detection (days to weeks); high precision in outbreak prediction. | Considers media influence and spatial data resolution. | ✔ Comprehensive comparison of tools and contexts. ✘ High variability between studies. |
| Jia et al. (2023) [58] | Review of technological innovations (AI, GIS, digital twins) in epidemiological surveillance. | Synthesis in Annual Review of Public Health. | Geospatial data, EHRs, big data, electronic reporting. | Improved accuracy, timeliness, and real-time detection. | Impact of smart devices and digital evolution in public health. | ✔ Highlights key advances in surveillance. ✘ Study heterogeneity; requires integration with other systems. |
| Zhao et al. (2021) [18] | Ethical analysis of digital surveillance in infectious diseases. | Systematic review with theoretical focus on privacy and civil rights. | Big data, EHRs, digital surveillance. | Assesses ethical risks vs. benefits; correlation with official reports. | Emphasizes the need to balance surveillance and privacy. | ✔ Strong theoretical framework on digital surveillance ethics. ✘ Does not address operational metrics. |
| Salathé et al. (2012) [59] | Impact of big data and social media on digital epidemiology. | Narrative review and Editors’ Outlook. | Social media, mobile phones, online searches. | Reduced outbreak detection times. | Potential for early alerts vs. technical and bias challenges. | ✔ Pioneer in digital epidemiology. ✘ Lacks detailed error metrics. |
✔: Advantages; ✘: Disadvantages.
Table 8. Comparison of analytical methods for digital epidemiological surveillance: strengths, limitations, and ideal applications.
| Method/Technique | Strengths | Limitations | Ideal Use Case and Examples |
|---|---|---|---|
| Correlation and Linear Regression [56,57] | Simple and interpretable; low computational cost | Assumes linearity; sensitive to outliers (e.g., media panic spikes) | Initial validation of digital data relevance in disease monitoring. |
| Time Series Models (e.g., ARIMA) [10,51] | Strong for forecasting; handles seasonality | Less flexible to sudden changes; requires data transformation | Short-term forecasts for diseases with seasonal patterns (e.g., flu, RSV). |
| Supervised ML (e.g., SVM, LASSO, RF) [18,39,58] | Captures complex patterns; variable selection (e.g., LASSO) | Risk of overfitting; opaque (“black box”); needs large datasets | Integrating diverse sources (searches, mobility, social media) into predictive models. |
| Natural Language Processing (NLP) [42,57] | Extracts insights from unstructured text; captures context and nuance | Sensitive to slang and errors; ambiguity in word meaning | Sentiment and symptom mining from social media for real-time public health signals. |
| Bayesian Methods [43] | Quantifies uncertainty; updates with new evidence | Computationally intensive; sensitive to prior assumptions | Change-point detection in disease trends, e.g., outbreak onset. |
Abbreviations: ML: machine learning; SVM: support vector machine; LASSO: least absolute shrinkage and selection operator; RF: random forest; NLP: natural language processing; RSV: respiratory syncytial virus.
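As a toy illustration of the time series family in the table, the sketch below fits an AR(1) model, the simplest autoregressive building block behind ARIMA, by least squares and iterates it forward. This is illustrative only: a real analysis would use a full ARIMA implementation (e.g., from statsmodels) with differencing and seasonal terms, and the weekly series here is invented:

```python
def fit_ar1(series):
    # Least-squares fit of the recurrence x[t] = a + b * x[t-1].
    x, y = series[:-1], series[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

def forecast(series, steps, a, b):
    # Iterate the fitted recurrence forward for a short-term forecast.
    out, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        out.append(last)
    return out

# Invented weekly ILI-like counts (not real surveillance data).
weekly = [20.0, 24.0, 29.0, 35.0, 42.0, 50.0]
a, b = fit_ar1(weekly)
print([round(v, 1) for v in forecast(weekly, 3, a, b)])
```

The limitation noted in the table is visible even here: the fitted recurrence extrapolates the recent trend and cannot react to a sudden regime change until new observations arrive.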