1. Introduction
The proliferation of digital news platforms has led to an overwhelming influx of information, making it increasingly challenging to filter, analyze, and derive meaningful insights in real time. Traditional news analysis frameworks, such as dashboard-based aggregators and static retrieval systems, often fail to provide dynamic, real-time insights and struggle with limitations in adaptability, contextual awareness, and interactive query resolution [1,2,3]. Furthermore, many of these systems rely on keyword-based retrieval mechanisms, which lack the semantic depth necessary for effective correlation analysis, leading to a fragmented understanding of complex global events [4,5]. The challenge is exacerbated by the prevalence of misinformation, biases in reporting, and redundancy in news content, which contribute to cognitive overload and hinder efficient decision-making [6,7]. Existing AI-driven approaches attempt to address these issues through automated summarization and classification techniques, yet they often fall short in generating real-time, contextually relevant insights [8,9].
To bridge these gaps, this study introduces an AI-driven chatbot designed for real-time news automation, integrating generative AI models, knowledge graphs, and natural language processing techniques to improve the accuracy and efficiency of news summarization and correlation analysis. The proposed system processes a dataset comprising 1,306,518 news reports spanning from September 2023 to February 2025, categorizing them into 15 primary event categories and 202 distinct subcategories [10]. By leveraging state-of-the-art machine learning methodologies, the chatbot achieves real-time classification, interactive query-based exploration, and automated event correlation, thereby enhancing the ability to extract structured insights from large-scale news data. Unlike conventional dashboard-based solutions, this chatbot enables users to interactively query news data, uncovering hidden relationships between events and presenting real-time summaries with high contextual relevance. The system was evaluated using precision, recall, and F1 score metrics, demonstrating an average F1 score of 0.94 for summarization tasks and 0.92 for correlation analysis, with response times averaging 9 s for summarization queries and approximately 21 s for event correlation computations. These results highlight the chatbot’s capability to process and deliver relevant insights with high accuracy and efficiency.
The contributions of this research are both theoretical and practical, as outlined below:
This study advances the field of AI-driven news analytics by introducing an interactive chatbot framework that overcomes the limitations of static news retrieval platforms. Unlike conventional methods, the chatbot dynamically integrates user queries with real-time news aggregation, improving adaptability and contextual relevance [11].
The chatbot is trained on a large-scale dataset of over 1.3 million news reports, categorized into 15 primary event categories and 202 subcategories. The model achieves an F1 score of 0.94 in summarization tasks and 0.92 in correlation analysis, demonstrating superior performance compared with traditional keyword-based retrieval systems.
This research introduces a novel AI-based correlation analysis framework that leverages knowledge graphs and GPT embeddings to establish relationships between events. The system effectively detects correlations between seemingly disparate news topics, achieving an average correlation computation time of 21 s per query.
From an implementation perspective, the chatbot integrates Google Gemini API, Microsoft Power Automate, and Dataverse to facilitate real-time retrieval, classification, and analysis of news content. This architecture ensures seamless interaction between AI-driven decision-making systems and automated process workflows.
2. Background
The exponential growth of digital news platforms has significantly increased the availability of real-time information. However, this rapid expansion has also led to challenges in extracting meaningful insights, detecting emerging patterns, and providing personalized and relevant news summaries [2,3,8,9]. Traditional news delivery methods, including dashboard-based aggregators and social media-driven information streams, often fail to filter out irrelevant or redundant content, contributing to information overload [1].
Recent advancements in artificial intelligence (AI), particularly Generative Pre-trained Transformers (GPTs) and knowledge graph-based semantic modeling, have shown promise in improving news categorization, event correlation, and summarization [5,13]. These approaches enable a structured representation of interrelated global events, allowing for automated and real-time analysis. However, despite these advancements, existing solutions exhibit notable limitations, as summarized in Table 1.
2.1. Advancements in AI-Driven News Analytics
The integration of Large Language Models (LLMs) with real-time news retrieval has facilitated new methodologies for automated intelligence extraction [17]. The chatbot presented in this study enhances real-time news comprehension by leveraging conversational AI and integrating it with an extensive, dynamically updating news repository [1].
A fundamental innovation introduced in this research is the development of an interactive chatbot capable of executing just-in-time news aggregation. Unlike traditional static news portals, this chatbot dynamically generates news summaries based on user intent, retrieving relevant data from a vast multi-source repository. Furthermore, it integrates Natural Language Understanding (NLU) capabilities, ensuring that news retrieval is contextually aligned with user queries [5].
2.2. Knowledge Graph-Based Event Correlation
An important aspect of contemporary AI-driven news analytics is the incorporation of knowledge graphs for structuring extracted features into meaningful semantic relationships. Knowledge graphs enhance the ability to correlate seemingly disparate events by encoding inter-entity relationships, event dependencies, and contextual attributions [14]. These capabilities enable sophisticated pattern recognition, anomaly detection, and in-depth event correlation.
The AI-driven chatbot employs a hybrid approach combining GPT embeddings and knowledge graph-assisted correlation analysis. This framework significantly improves event-linkage detection, allowing users to explore underlying patterns between seemingly unrelated news events [4]. The reliance on vector-based similarity metrics, such as cosine similarity in embedding spaces, has demonstrated effectiveness in uncovering hidden relationships between key global events.
2.3. Real-Time AI Chatbot for News Summarization and Decision Support
By addressing key limitations in existing AI-driven news solutions, this study introduces a practical application of AI-powered chatbot technology in real-time news analysis. The chatbot’s implementation aligns with advancements in Robotic Process Automation (RPA), where intelligent systems autonomously process and summarize global news updates to support decision-making in critical domains such as disaster response, logistics, and cybersecurity [8].
In summary, this paper presents an innovative chatbot-based news intelligence system that advances AI-driven event correlation, real-time summarization, and anomaly detection. By bridging the gap between static news retrieval platforms and interactive conversational agents, this solution represents a significant leap in automated news analytics.
3. Methodology of the Autonomous News Bot
As seen from Figure 1, the chatbot initiates the process upon receiving a user query, denoted as q, which may cover a wide range of topics, including requests for localized news summaries, correlation analyses between events, or general inquiries. This query is first processed by the chatbot’s classification model to determine the specific intent, which is labeled as I. The classification phase allows the chatbot to assign each query to one of three categories: location-based news summary, event correlation, or general inquiry.
If the intent I is identified as a location-based news summary request, the chatbot invokes the ObtainLatestLocationNewsSummary function. Within this function, the chatbot begins by identifying the event category associated with the user-specified event e. The language model performs a semantic analysis to map the event e to one or more relevant categories $C_e$ within the predefined category set $C$. Following this categorization, the chatbot retrieves a set of news reports, referred to as $N$, that aligns with the category $C_e$, the user-defined location L, and the recent three-day period D. The retrieved news data are then summarized by the chatbot using a language model summarization function, LLM_Summarize, which condenses the content into a concise summary, including key points and references to the original news sources. This summary is subsequently returned to the user.
Should the user query fall under the category of event correlation, the chatbot proceeds with the CalculateCorrelation function. This function is employed when the user is interested in understanding the correlation between two specific events over a chosen time period. Initially, the chatbot identifies relevant categories for each event, labeled $C_{e_1}$ and $C_{e_2}$, through the language model’s category mapping function LLM_Match. Once categorized, the chatbot retrieves the corresponding news reports for both events within the specified duration, constructing a dataset for further analysis. The chatbot then calculates the daily frequencies, $f_1(d)$ and $f_2(d)$, of the reports for each day $d$ within the selected period. Using these daily frequency vectors, the chatbot computes the Pearson correlation coefficient r, which measures the statistical correlation between the events based on their frequency of occurrence across the date range. The result of this computation, r, is then communicated back to the user, providing insight into the relationship between the two events.
For any other user queries that do not pertain to a specific location-based news summary or event correlation, the chatbot defaults to the HandleGenericQueries function. This function addresses general inquiries by leveraging a generative language model, such as GPT, to generate an informative response. The response is produced by the generative model and then evaluated for relevance. If the initial response does not meet a predefined relevance threshold, T, the query is iteratively refined and reprocessed until the response achieves satisfactory alignment with the user’s intent. Once a ‘suitable response’ is generated, it is delivered to the user. The ‘suitability’ of a response in our system refers to how well the generated response matches the user’s intent and query in terms of relevance and accuracy. In cases where the query does not match any supported category, the chatbot returns a message indicating that the query cannot be processed, thereby maintaining clarity in user interaction.
This structured approach enables the chatbot to efficiently manage a diverse range of queries, directing each to the appropriate function based on the classified intent. The sequential logic underpinning the chatbot’s decision-making process ensures that users receive tailored responses that meet their specific informational needs, whether for localized news, event analysis, or general knowledge inquiries.
To summarize, the proposed AI-driven chatbot for real-time news automation follows a structured methodology to process user queries, retrieve relevant news, analyze event correlations, and generate informative summaries. The chatbot integrates LLMs, knowledge graphs, and real-time data retrieval techniques to provide accurate and contextually relevant insights.
As highlighted before, this methodology involves three primary functions:
ObtainLatestLocationNewsSummary: fetches and summarizes the latest news for a specific event and location.
CalculateCorrelation: determines correlations between two events based on the frequency of related news articles.
HandleGenericQueries: processes general inquiries using a generative AI-based response system.
Below, we provide a detailed pseudocode representation of the chatbot’s methodology in Algorithm 1:
Algorithm 1. AI-driven chatbot for real-time news automation.
Require: User query $q$
Ensure: Response to user query with relevant insights
Initialize AI chatbot with language model and database access
Receive user query $q$
Classify intent $I \leftarrow$ LLM_Classify($q$)
if $I$ = Location-based News Summary then
    $(e, L) \leftarrow$ ExtractEventLocation($q$)
    $C_e \leftarrow$ LLM_Match($e$)
    Retrieve $N$ from database
    Generate summary $S \leftarrow$ LLM_Summarize($N$)
    Return $S$
else if $I$ = Event Correlation Analysis then
    $(e_1, e_2) \leftarrow$ ExtractEventPair($q$)
    $C_{e_1} \leftarrow$ LLM_Match($e_1$), $C_{e_2} \leftarrow$ LLM_Match($e_2$)
    Retrieve news reports for $C_{e_1}$, $C_{e_2}$ within time frame $D$
    Compute daily frequencies $F_1$, $F_2$
    Compute correlation $r \leftarrow$ PearsonCorrelation($F_1$, $F_2$)
    Return correlation result $r$
else
    Generate response $R \leftarrow$ GPT_Generate($q$)
    Compute relevance $V \leftarrow$ CosineSimilarity($q$, $R$)
    if $V < T$ then
        $q \leftarrow$ LLM_Refine($q$)
        Recompute $R \leftarrow$ GPT_Generate($q$)
    end if
    Return $R$
end if
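The intent-based routing in Algorithm 1 can be illustrated with a short Python sketch. It is not the implementation used in this study: `classify_intent` is a keyword-based stand-in for the LLM classifier, and the three handlers are hypothetical stubs that only tag the query, whereas a real system would call the language model and the news database.

```python
from enum import Enum
from typing import Callable, Dict


class Intent(Enum):
    LOCATION_SUMMARY = "location_summary"
    EVENT_CORRELATION = "event_correlation"
    GENERAL = "general"


def classify_intent(query: str) -> Intent:
    """Toy stand-in for LLM_Classify: keyword rules instead of a language model."""
    text = query.lower()
    if "correlation" in text or "correlate" in text:
        return Intent.EVENT_CORRELATION
    if "summary" in text or "latest news" in text:
        return Intent.LOCATION_SUMMARY
    return Intent.GENERAL


# Hypothetical handler stubs, named after the three functions in the text.
def obtain_latest_location_news_summary(query: str) -> str:
    return f"[summary] {query}"


def calculate_correlation(query: str) -> str:
    return f"[correlation] {query}"


def handle_generic_queries(query: str) -> str:
    return f"[generic] {query}"


HANDLERS: Dict[Intent, Callable[[str], str]] = {
    Intent.LOCATION_SUMMARY: obtain_latest_location_news_summary,
    Intent.EVENT_CORRELATION: calculate_correlation,
    Intent.GENERAL: handle_generic_queries,
}


def route(query: str) -> str:
    """Classify the query, then dispatch it to the matching handler."""
    return HANDLERS[classify_intent(query)](query)
```

Keeping the dispatch table separate from the classifier means the keyword rules could be replaced by a model call without touching the handlers.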
This structured methodology ensures that the chatbot efficiently handles a diverse range of queries, supporting real-time event analysis, dynamic summarization, and interactive query-driven decision-making. By leveraging AI-driven techniques, the system provides high-accuracy responses tailored to specific user needs.
4. Mathematical Modelling
Table 2 provides the notations used within this section.
4.1. Common Mathematical Models
1. User Query:
The user query, denoted $q$, which may relate to various news and event topics or to general inquiries.
2. Category Mapping $C_e$:
$C_e$: the subset of categories within the category set $C$ (derived from the category vocabulary) that semantically align with a specific event $e$. Mapping is achieved by a language model as
$$C_e = \mathrm{LLM\_Match}(e), \qquad C_e \subseteq C$$
While exploring advanced methodologies for news classification, it remains pertinent to acknowledge the enduring prevalence of traditional classification systems within mainstream news portals [18,19,20,21]. These systems categorize news articles into familiar segments such as business, sports, breaking news, and travel, categories that align with conventional reader expectations. This approach reflects established media practices, ensuring that the content is organized in a manner that is readily accessible and understandable to the audience. Such traditional methods continue to serve as a foundation for reader engagement and content navigation across digital news platforms.
3. News Reports Collection $N$:
The set of news reports $N$ filtered by category $C_e$, location $L$, and date range $D$.
4. Language Model (LLM) Functions:
Category Mapping Function LLM_Match: determines the set of relevant categories by analyzing semantic similarity.
Summarization Function LLM_Summarize: summarizes news reports and includes URLs for references.
Generative Response Function GPT_Generate: generates responses for general user queries by leveraging pre-trained knowledge.
4.2. Function 1: ObtainLatestLocationNewsSummary
This function generates a summary of the latest news for a specific event $e$ and location $L$ over the last three days $D$.
Step 1: Input Collection
Define user-selected event $e$ and location $L$, defaulting to “Global” if $L$ is not provided.
Step 2: Event Category Mapping
For $e$, identify the relevant categories:
$$C_e = \mathrm{LLM\_Match}(e)$$
Step 3: News Reports Retrieval
Retrieve news reports for the past three days that match $C_e$ and $L$:
$$N = \{\, n : \mathrm{category}(n) \in C_e,\ \mathrm{location}(n) = L,\ \mathrm{date}(n) \in D \,\}$$
Step 4: Summarize News Reports
Use the LLM summarization function to create a concise summary of $N$:
$$S = \mathrm{LLM\_Summarize}(N)$$
Return summary $S$ containing the main points with URLs for original sources.
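The retrieval and summarization steps of this function can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the `NewsReport` fields are hypothetical, “Global” is interpreted as applying no location filter, the three-day window is taken to include the current day, and `placeholder_summary` merely lists titles with URLs in place of the LLM summarization call.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class NewsReport:
    # Hypothetical record layout; the actual schema is not given in the text.
    title: str
    category: str
    location: str
    published: date
    url: str


def retrieve_reports(
    reports: Iterable[NewsReport],
    categories: Set[str],
    location: Optional[str],
    today: date,
    days: int = 3,
) -> List[NewsReport]:
    """Keep reports whose category, location, and date match the request."""
    start = today - timedelta(days=days - 1)  # window includes `today`
    loc = location or "Global"                # default when no location is given
    return [
        r for r in reports
        if r.category in categories
        and (loc == "Global" or r.location == loc)
        and start <= r.published <= today
    ]


def placeholder_summary(reports: Iterable[NewsReport]) -> str:
    """Stand-in for LLM_Summarize: one line per report with its source URL."""
    return "\n".join(f"- {r.title} ({r.url})" for r in reports)
```

In a deployed system the filter would run inside the database query rather than over an in-memory list; the sketch only makes the selection criteria explicit.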
4.3. Function 2: CalculateCorrelation
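This function calculates the correlation between two events $e_1$ and $e_2$ based on daily frequencies of related news reports over a specified date range $D$.
Step 1: Event Category Mapping
Identify relevant categories for both $e_1$ and $e_2$:
$$C_{e_1} = \mathrm{LLM\_Match}(e_1), \qquad C_{e_2} = \mathrm{LLM\_Match}(e_2)$$
Step 2: News Reports Retrieval
For each event $e_1$ and $e_2$, retrieve related news reports:
$$N_1 = \{\, n : \mathrm{category}(n) \in C_{e_1},\ \mathrm{date}(n) \in D \,\}, \qquad N_2 = \{\, n : \mathrm{category}(n) \in C_{e_2},\ \mathrm{date}(n) \in D \,\}$$
Step 3: Daily Frequency Calculation
For each day $d \in D$, calculate frequencies:
$$f_1(d) = |\{\, n \in N_1 : \mathrm{date}(n) = d \,\}|, \qquad f_2(d) = |\{\, n \in N_2 : \mathrm{date}(n) = d \,\}|$$
Define frequency vectors, where $d_1, \ldots, d_k$ are the days in $D$:
$$F_1 = [f_1(d_1), \ldots, f_1(d_k)], \qquad F_2 = [f_2(d_1), \ldots, f_2(d_k)]$$
Step 4: Pearson Correlation Calculation
Calculate Pearson correlation, where $\bar{F}_1$ and $\bar{F}_2$ are the means of $F_1$ and $F_2$:
$$r = \frac{\sum_{i=1}^{k} \left(f_1(d_i) - \bar{F}_1\right)\left(f_2(d_i) - \bar{F}_2\right)}{\sqrt{\sum_{i=1}^{k} \left(f_1(d_i) - \bar{F}_1\right)^2}\,\sqrt{\sum_{i=1}^{k} \left(f_2(d_i) - \bar{F}_2\right)^2}}$$
Return $r$, representing the correlation between $e_1$ and $e_2$.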
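The frequency-based Pearson computation of this function can be sketched in plain Python. The function names below are illustrative, and the publication dates are assumed to be already retrieved for each event; the Pearson formula itself is the standard one. A constant frequency series has zero variance, so the sketch raises an error in that case rather than returning a value.

```python
from collections import Counter
from datetime import date, timedelta
from math import sqrt
from typing import Iterable, List, Sequence


def daily_frequencies(dates: Iterable[date], start: date, end: date) -> List[int]:
    """Count reports per day over the inclusive range [start, end]."""
    counts = Counter(dates)
    n_days = (end - start).days + 1
    # Days without any report get a count of zero.
    return [counts[start + timedelta(days=i)] for i in range(n_days)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)
```

For example, two events whose daily counts rise and fall together yield a coefficient close to 1, while counts that move in opposite directions yield a coefficient close to -1.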
4.4. Function 3: HandleGenericQueries
This function responds to general queries $q$ unrelated to localized news summarization or event correlation by leveraging the generative AI capabilities of GPT.
Step 1: User Intent Classification
Classify the user’s query $q$ to determine whether it relates to news summarization, event correlation, or a general inquiry:
$$I = \mathrm{LLM\_Classify}(q)$$
Step 2: Generative Response Mechanism
If $I$ is a general inquiry, use the GPT function to generate a response:
$$R = \mathrm{GPT\_Generate}(q)$$
Step 3: Relevance and Iterative Refinement
Calculate relevance using cosine similarity between embeddings of $q$ and $R$, where $E(\cdot)$ denotes the embedding function:
$$V = \mathrm{CosineSimilarity}(q, R) = \frac{E(q) \cdot E(R)}{\lVert E(q) \rVert \, \lVert E(R) \rVert}$$
If $V$ is below a threshold $T$, refine $q$ and repeat the response generation:
$$q \leftarrow \mathrm{LLM\_Refine}(q), \qquad R \leftarrow \mathrm{GPT\_Generate}(q)$$
Step 4: Return Final Response
Return the final generated response, $R$, having met the relevance threshold $T$.
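The relevance check and refinement loop can be sketched as below. Several parts are assumptions made for illustration: a bag-of-words count vector replaces the GPT embeddings, the `generate` and `refine` callables are supplied by the caller in place of GPT_Generate and LLM_Refine, and a `max_rounds` cap is added so that the loop terminates even if the threshold is never reached.

```python
from collections import Counter
from math import sqrt
from typing import Callable, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for model embeddings)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_with_refinement(
    query: str,
    generate: Callable[[str], str],
    refine: Callable[[str], str],
    threshold: float,
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Generate a response; refine the query while relevance stays below threshold."""
    response = generate(query)
    relevance = cosine_similarity(embed(query), embed(response))
    rounds = 0
    while relevance < threshold and rounds < max_rounds:
        query = refine(query)
        response = generate(query)
        relevance = cosine_similarity(embed(query), embed(response))
        rounds += 1
    return response, relevance
```

The cap on refinement rounds is a design choice of this sketch; without it, a query that can never satisfy the threshold would loop indefinitely.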
5. Implementation
The implementation of the AI-driven news analysis chatbot, as visualized in Figure 2, involved a strategic orchestration of four key technology components. The architecture diagram illustrates the integrated system of components enabling the chatbot to perform complex news analysis tasks by leveraging user queries, processing information, and retrieving relevant data. For a Robotic Process Automation (RPA) practitioner, this diagram provides a clear blueprint for automating the flow of information and tasks between different technologies, such as Microsoft Copilot Studio, Microsoft Power Automate (MPA), Microsoft Dataverse (MD), and Google Gemini API (GGAPI). By understanding this architecture, RPA practitioners can effectively implement and optimize the chatbot for dynamic news analysis, ensuring seamless communication, efficient data processing, and accurate information retrieval. Microsoft Copilot Studio serves as the front end for user interaction, enabling seamless communication and query processing. It captures user intents, whether related to obtaining news summaries, calculating correlations, or posing generic queries. This information is then passed to MPA, which acts as the central orchestrator, facilitating automated interactions and data flow between the different components.
MPA routes requests to appropriate modules based on the user’s intent. For instance, requests for news summaries are directed to both GGAPI and MD. The GGAPI leverages its advanced functionalities to accurately match news categories to user queries and generate Fetch XML for efficient retrieval of relevant news reports from the Dataverse repository, which houses a live news database comprising 991,325 news reports categorized into 202 distinct news categories. Additionally, Google Gemini’s summarization capabilities facilitate the concise and informative presentation of news data to the user. Similarly, requests for correlation calculations are processed by a dedicated module, utilizing both the GGAPI and MD to fetch and analyze relevant news data. Generic queries are handled by leveraging the generative AI capabilities of the system. This synergistic integration of technologies enables the chatbot to effectively address a wide range of user needs and deliver dynamic news analysis capabilities.
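To make the Fetch XML step concrete, the sketch below builds a query of the kind that could be sent to Dataverse. It is an illustration only: in the system described here the query is generated by the Gemini API, whereas this example constructs it deterministically, and the table and column names (`new_newsreport`, `new_category`, `new_location`, and so on) are hypothetical. The `fetch`, `entity`, `attribute`, `filter`, and `condition` elements and the `in`, `eq`, and `last-x-days` operators are standard FetchXML.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Sequence


def build_fetch_xml(
    categories: Sequence[str],
    location: Optional[str] = None,
    days: int = 3,
    top: int = 50,
) -> str:
    """Build a FetchXML query for recent reports in the given categories."""
    if not categories:
        raise ValueError("at least one category is required")
    fetch = ET.Element("fetch", {"top": str(top)})
    # Hypothetical table and column names, used only for illustration.
    entity = ET.SubElement(fetch, "entity", {"name": "new_newsreport"})
    for column in ("new_title", "new_url", "new_category", "new_location", "createdon"):
        ET.SubElement(entity, "attribute", {"name": column})
    flt = ET.SubElement(entity, "filter", {"type": "and"})
    in_cond = ET.SubElement(
        flt, "condition", {"attribute": "new_category", "operator": "in"}
    )
    for category in categories:
        ET.SubElement(in_cond, "value").text = category
    if location:
        ET.SubElement(
            flt, "condition",
            {"attribute": "new_location", "operator": "eq", "value": location},
        )
    ET.SubElement(
        flt, "condition",
        {"attribute": "createdon", "operator": "last-x-days", "value": str(days)},
    )
    return ET.tostring(fetch, encoding="unicode")
```

Calling `build_fetch_xml(["Floods", "Cyclones"], "Australia")` returns a single-line XML string whose filter restricts results to the two categories, the given location, and the last three days.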
Table 3 depicts the roles of the selected technology components for the proposed chatbot-based news analytics system presented in this paper.
Figure 2 outlines the architectural configuration of the AI-driven chatbot using ArchiMate, a standardized notational language specifically designed for architectural representation within complex information systems [22,23,24]. ArchiMate is endorsed by The Open Group, and its standard dictates the use of specific symbols and color schemes to represent different architectural domains and relationships. The diagram prominently features ‘access’ relations (dashed arrow with a regular arrowhead) and ‘realization’ relations (dashed arrow with a white triangle at the arrowhead), which depict interaction dynamics and implementation details, respectively. As seen in Figure 2, Microsoft Copilot Studio realizes three business services, namely location-based news summary, correlation calculation, and generative AI. GGAPI, MPA, and MD realize location-based news summary and correlation calculation.
6. Results
To rigorously assess the performance of the AI-driven chatbot for real-time news automation, the system was evaluated using a dataset comprising 1,306,518 news reports. These reports were collected over a comprehensive monitoring period spanning from 25 September 2023 to 17 February 2025. The dataset was systematically classified into 202 subcategories grouped into 15 primary event categories, ensuring broad coverage of diverse global topics. This classification schema was derived from a comprehensive analysis of the dataset, where these 15 primary event categories and 202 subcategories were identified as sufficiently representative of the variance within the data without overly fragmenting the information, which could potentially lead to model over-fitting. The classification results are summarized in Table 4. The LLM assesses the predominant themes within the news text and classifies it into the most fitting category, ensuring comprehensive and nuanced understanding. It does not reject information simply because it spans multiple disciplines (e.g., a news report on both politics and education); rather, it synthesizes the information to determine the most relevant category or categories, reflecting the interdisciplinary nature of the content where appropriate.
To demonstrate the chatbot’s functionality, we present two figures highlighting its news summarization and correlation analysis capabilities.
Figure 3 illustrates the chatbot’s ability to generate concise and structured summaries for news articles. The chatbot extracts key information from multiple sources and provides an easy-to-read summary.
Figure 4 showcases the chatbot’s event correlation feature, where users input two event categories, and the system computes a correlation coefficient based on historical data. This functionality allows users to understand potential dependencies between different types of events. Together, these figures illustrate the chatbot’s effectiveness in automating news intelligence and supporting real-time decision-making.
The chatbot was evaluated based on two key dimensions: summarization accuracy and correlation analysis. The summarization evaluation assessed the chatbot’s ability to generate concise and relevant summaries for diverse news topics. The results of this evaluation are presented in Table 5.
The chatbot demonstrated high summarization accuracy, achieving an overall average F1 score of 0.94 across multiple categories. The system required an average processing time of 9 s per summarization task.
In addition to summarization, the chatbot was also evaluated for its correlation analysis capabilities. The AI-driven correlation module systematically processed various query types and dynamically extracted relationships between news events. The chatbot’s effectiveness in detecting event correlations was validated using a dataset of correlated event pairs. The results of this evaluation are presented in Table 6.
AI-driven chatbots have become integral to various domains, including politics, healthcare, cybersecurity, and finance. Evaluating their effectiveness requires quantitative analysis of key performance metrics such as precision, recall, and the F1 score. This paper presents a three-dimensional (3D) visualization that systematically analyzes the chatbot’s performance across multiple categories. Using a pre-trained model like Google Gemini means that significant portions of the model’s learning and adaptation processes have already been completed before integration into the presented system. This eliminates the need for extensive retraining, which typically involves calculating loss functions during training to improve the model. Hence, this study does not use a loss function and instead focuses on performance metrics that assess how well the pre-trained model adapts to the use case.
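For readers less familiar with these metrics, the sketch below shows how precision, recall, and the F1 score are computed from true positive, false positive, and false negative counts, and how per-category scores can be averaged. The function names are illustrative and the counts in the usage note are made-up; how individual outputs were judged correct in this study is not specified here, so the sketch covers only the standard formulas.

```python
from typing import Iterable, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def macro_average(scores: Iterable[float]) -> float:
    """Unweighted mean of per-category scores."""
    values = list(scores)
    if not values:
        raise ValueError("no scores to average")
    return sum(values) / len(values)
```

As a usage example with illustrative counts, `precision_recall_f1(8, 2, 2)` gives a precision, recall, and F1 score of approximately 0.8 each, and `macro_average` can then combine the F1 scores of several categories into one overall figure.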
The 3D surface plot in Figure 5 illustrates the chatbot’s performance scores in 15 primary event categories. The x-axis represents the categories, the y-axis differentiates the three evaluation metrics, and the z-axis denotes the performance scores, ranging between 0.87 and 0.98. A color gradient (using the ‘viridis’ colormap) visually encodes variations in performance intensity, where lighter shades correspond to higher scores.
From the analysis, the chatbot demonstrates robust performance across most categories, maintaining precision, recall, and F1 scores above 0.90 in most cases. The highest performance scores are observed in Sports and Pandemics, where the chatbot exhibits strong accuracy and recall when retrieving relevant data. However, slightly lower scores appear in Natural Disasters and UFO-related topics, suggesting areas where improvements can be made, potentially by enhancing the training dataset or refining the response generation algorithm.
The 3D visualization provides an intuitive representation of chatbot performance, enabling stakeholders to identify strengths and improvement areas. The inclusion of a color bar legend facilitates interpretation, making the chart an effective tool for evaluating chatbot capabilities in diverse scenarios.
The chatbot exhibited strong correlation detection capabilities, achieving an overall average F1 score of 0.92 across multiple event pairings. As shown in Figure 6, the performance of the Google Gemini API is similar to that of other LLMs such as OpenAI’s GPT API and Meta’s LLaMA (within Figure 6, the outcome of this study is labeled as ‘AI-Driven Chatbot’ in yellow). The remaining results in this figure were obtained from [25]. A significant advantage of employing Google Gemini’s API (version: 1.5 flash), as opposed to other LLM APIs such as those of DeepSeek or OpenAI, lies in its cost-effectiveness: Google Gemini’s API can be utilized at no charge up to a specified daily usage limit. The processing time for correlation tasks was approximately 21 s per query. The majority (over 70%) of this processing time is spent querying the Microsoft Dataverse database while aggregating the news reports. To reduce the processing time for summarization (an average of 9 s) and correlation (an average of 21 s), future studies should focus on database indexing and retrieval processes.
The comprehensive evaluation of the system, covering both summarization and correlation analyses, underscores its robustness in automated news analytics. These findings position the chatbot as a scalable and adaptive solution for event tracking, real-time summarization, and data-driven decision-making in various domains.
7. Conclusions
This study has introduced an AI-driven chatbot for real-time news automation, demonstrating its effectiveness in processing large-scale global news data through event summarization and correlation analysis. By leveraging advanced AI methodologies, including Generative Pre-trained Transformers, knowledge graphs, and real-time retrieval systems, the chatbot successfully categorizes and structures information to provide users with precise and context-aware insights. The experimental evaluation validated its efficacy, showing high accuracy in news classification, summarization, and event correlation detection. These capabilities position the chatbot as a valuable tool for applications in critical domains such as disaster response, policy analysis, cybersecurity, and media intelligence, where timely and accurate information is essential for decision-making.
Despite its promising contributions, this research has certain limitations. The chatbot’s performance is dependent on the quality and diversity of available news sources, making it vulnerable to biases and misinformation inherent in the dataset. Furthermore, the computational overhead required for real-time processing, particularly in correlation analysis, poses challenges in high-frequency data environments. Another limitation is the chatbot’s reliance on predefined taxonomies for event categorization, which may not fully capture emerging or evolving news narratives. The system also faces challenges in interpreting ambiguous queries, necessitating further advancements in natural language understanding to improve user interactions.
Future research directions aim to enhance the chatbot’s adaptability and scalability by integrating reinforcement learning for continuous model improvement based on user feedback. Expanding multilingual support will enable broader accessibility, allowing the chatbot to process non-English news sources effectively. Further refinements in misinformation detection using fact-checking AI models and sentiment analysis can enhance the chatbot’s reliability. Additionally, optimizing computational efficiency through distributed AI processing and cloud-based architectures will enable faster response times, making the system more practical for large-scale deployments. While cosine similarity has demonstrated utility in this current framework due to its effectiveness in high-dimensional text data, we acknowledge that other metrics like Minkowski, Chebyshev, Jaccard, and Canberra could unveil additional insights into the complex relationships between news events [26,27]. We plan to investigate these metrics in future work to determine their impact on the quality of our correlation analyses, acknowledging that this might increase the computational complexity and the challenge of interpreting results. By addressing these areas, the chatbot can evolve into a more robust and intelligent news analysis system, contributing significantly to real-time information processing and decision support in diverse societal contexts.
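As a reference for the alternative metrics named above, the following sketch gives their standard definitions in plain Python. It is illustrative only and not part of the evaluated system; the function names are chosen here, the Jaccard distance is defined on sets (for instance, sets of tokens or categories), and Canberra terms in which both coordinates are zero are treated as contributing zero.

```python
from typing import Sequence, Set


def minkowski(x: Sequence[float], y: Sequence[float], p: float = 2.0) -> float:
    """Minkowski distance; p=1 is Manhattan and p=2 is Euclidean."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def chebyshev(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    if len(x) != len(y) or not x:
        raise ValueError("vectors must be non-empty and of equal length")
    return max(abs(a - b) for a, b in zip(x, y))


def canberra(x: Sequence[float], y: Sequence[float]) -> float:
    """Sum of |a - b| / (|a| + |b|), skipping terms where both are zero."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom:
            total += abs(a - b) / denom
    return total


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """One minus the ratio of shared to total distinct elements."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Unlike cosine similarity, the Minkowski, Chebyshev, and Canberra measures are sensitive to vector magnitude, which is one reason their results may be harder to compare directly with the current cosine-based relevance scores.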