AI-Driven Chatbot for Real-Time News Automation

Sufi, Fahim; Alsulami, Musleh

doi:10.3390/math13050850


by Fahim Sufi ¹,* and Musleh Alsulami ²

¹ School of Public Health and Preventive Medicine, Monash University, Australia, VIC 3004, Australia
² Department of Software Engineering, College of Computing, Umm Al-Qura University, Makkah 21961, Saudi Arabia
* Author to whom correspondence should be addressed.

Mathematics 2025, 13(5), 850; https://doi.org/10.3390/math13050850

Submission received: 17 February 2025 / Revised: 1 March 2025 / Accepted: 3 March 2025 / Published: 4 March 2025

(This article belongs to the Topic Soft Computing and Machine Learning)


Abstract:

The rapid expansion of digital news sources has necessitated intelligent systems capable of filtering, analyzing, and deriving meaningful insights from vast amounts of information in real time. This study presents an AI-driven chatbot designed for real-time news automation, integrating advanced natural language processing techniques, knowledge graphs, and generative AI models to improve news summarization and correlation analysis. The chatbot processes over 1,306,518 news reports spanning from 25 September 2023 to 17 February 2025, categorizing them into 15 primary event categories and extracting key insights through structured analysis. By employing state-of-the-art machine learning techniques, the system enables real-time classification, interactive query-based exploration, and automated event correlation. The chatbot demonstrated high accuracy in both summarization and correlation tasks, achieving an average F1 score of 0.94 for summarization and 0.92 for correlation analysis. Summarization queries were processed within an average response time of 9 s, while correlation analyses required approximately 21 s per query. The chatbot’s ability to generate real-time, concise news summaries and uncover hidden relationships between events makes it a valuable tool for applications in disaster response, policy analysis, cybersecurity, and public communication. This research contributes to the field of AI-driven news analytics by bridging the gap between static news retrieval platforms and interactive conversational agents. Future work will focus on expanding multilingual support, enhancing misinformation detection, and optimizing computational efficiency for broader real-world applicability. The proposed chatbot stands as a scalable and adaptive solution for real-time decision support in dynamic information environments.

Keywords: generative AI; chatbots; robotic process automation; news insight automation; AI-driven automation; real-time news analysis for robots

MSC: 03-04; 03-08; 03-11

1. Introduction

The proliferation of digital news platforms has led to an overwhelming influx of information, making it increasingly challenging to filter, analyze, and derive meaningful insights in real time. Traditional news analysis frameworks, such as dashboard-based aggregators and static retrieval systems, often fail to provide dynamic, real-time insights and struggle with limitations in adaptability, contextual awareness, and interactive query resolution [1,2,3]. Furthermore, many of these systems rely on keyword-based retrieval mechanisms, which lack the semantic depth necessary for effective correlation analysis, leading to a fragmented understanding of complex global events [4,5]. The challenge is exacerbated by the prevalence of misinformation, biases in reporting, and redundancy in news content, which contribute to cognitive overload and hinder efficient decision-making [6,7]. Existing AI-driven approaches attempt to address these issues through automated summarization and classification techniques, yet they often fall short in generating real-time, contextually relevant insights [8,9].

To bridge these gaps, this study introduces an AI-driven chatbot designed for real-time news automation, integrating generative AI models, knowledge graphs, and natural language processing techniques to improve the accuracy and efficiency of news summarization and correlation analysis. The proposed system processes a dataset comprising over 1,306,518 news reports spanning from September 2023 to February 2025, categorizing them into 15 primary event categories (and 202 distinct subcategories) [10]. By leveraging state-of-the-art machine learning methodologies, the chatbot achieves real-time classification, interactive query-based exploration, and automated event correlation, thereby enhancing the ability to extract structured insights from large-scale news data. Unlike conventional dashboard-based solutions, this chatbot enables users to interactively query news data, uncovering hidden relationships between events and presenting real-time summaries with high contextual relevance. The system was evaluated using precision, recall, and F1 score metrics, demonstrating an average F1 score of 0.94 for summarization tasks and 0.92 for correlation analysis, with response times averaging 9 s for summarization queries and approximately 21 s for event correlation computations. These results highlight the chatbot’s capability to process and deliver relevant insights with high accuracy and efficiency.

The contributions of this research are both theoretical and practical, as outlined below:

This study advances the field of AI-driven news analytics by introducing an interactive chatbot framework that overcomes the limitations of static news retrieval platforms. Unlike conventional methods, the chatbot dynamically integrates user queries with real-time news aggregation, improving adaptability and contextual relevance [11].
The chatbot is trained on a large-scale dataset of over 1.3 million news reports, categorized into 15 primary event categories and 202 subcategories. The model achieves an F1 score of 0.94 in summarization tasks and 0.92 in correlation analysis, demonstrating superior performance compared with traditional keyword-based retrieval systems.
This research introduces a novel AI-based correlation analysis framework that leverages knowledge graphs and GPT embeddings to establish relationships between events. The system effectively detects correlations between seemingly disparate news topics, achieving an average correlation computation time of 21 s per query.
From an implementation perspective, the chatbot integrates Google Gemini API, Microsoft Power Automate, and Dataverse to facilitate real-time retrieval, classification, and analysis of news content. This architecture ensures seamless interaction between AI-driven decision-making systems and automated process workflows.

The chatbot is designed to support decision-making in critical domains such as disaster response, cybersecurity, policy analysis, and public communication. Its ability to generate concise, high-accuracy summaries and extract latent event relationships makes it a scalable and practical tool for real-time intelligence automation [12]. The bot can be publicly accessed by both human and robot actors (available at https://copilotstudio.microsoft.com/environments/a00ca161-b640-eae8-9caa-8058a3d7ae19/bots/crd69_copilot/canvas?__version__=2&enableFileAttachment=false, accessed on 15 February 2025).

2. Background

The exponential growth of digital news platforms has significantly increased the availability of real-time information. However, this rapid expansion has also led to challenges in extracting meaningful insights, detecting emerging patterns, and providing personalized and relevant news summaries [2,3,8,9]. Traditional news delivery methods, including dashboard-based aggregators and social media-driven information streams, often fail to filter out irrelevant or redundant content, contributing to information overload [1].

Recent advancements in artificial intelligence (AI), particularly Generative Pre-trained Transformers (GPTs) and knowledge graph-based semantic modeling, have shown promise in improving news categorization, event correlation, and summarization [5,13]. These approaches enable a structured representation of interrelated global events, allowing for automated and real-time analysis. However, despite these advancements, existing solutions exhibit notable limitations, as summarized in Table 1.

2.1. Advancements in AI-Driven News Analytics

The integration of Large Language Models (LLMs) with real-time news retrieval has facilitated new methodologies for automated intelligence extraction [17]. The chatbot presented in this study enhances real-time news comprehension by leveraging conversational AI and integrating it with an extensive, dynamically updating news repository [1].

A fundamental innovation introduced in this research is the development of an interactive chatbot capable of executing just-in-time news aggregation. Unlike traditional static news portals, this chatbot dynamically generates news summaries based on user intent, retrieving relevant data from a vast multi-source repository. Furthermore, it integrates Natural Language Understanding (NLU) capabilities, ensuring that news retrieval is contextually aligned with user queries [5].

2.2. Knowledge Graph-Based Event Correlation

An important aspect of contemporary AI-driven news analytics is the incorporation of knowledge graphs for structuring extracted features into meaningful semantic relationships. Knowledge graphs enhance the ability to correlate seemingly disparate events by encoding inter-entity relationships, event dependencies, and contextual attributions [14]. These capabilities enable sophisticated pattern recognition, anomaly detection, and in-depth event correlation.

The AI-driven chatbot employs a hybrid approach combining GPT embeddings and knowledge graph-assisted correlation analysis. This framework significantly improves event-linkage detection, allowing users to explore underlying patterns between seemingly unrelated news events [4]. The reliance on vector-based similarity metrics, such as cosine similarity in embedding spaces, has demonstrated effectiveness in uncovering hidden relationships between key global events.

2.3. Real-Time AI Chatbot for News Summarization and Decision Support

By addressing key limitations in existing AI-driven news solutions, this study introduces a practical application of AI-powered chatbot technology in real-time news analysis. The chatbot’s implementation aligns with advancements in Robotic Process Automation (RPA), where intelligent systems autonomously process and summarize global news updates to support decision-making in critical domains such as disaster response, logistics, and cybersecurity [8].

In summary, this paper presents an innovative chatbot-based news intelligence system that advances AI-driven event correlation, real-time summarization, and anomaly detection. By bridging the gap between static news retrieval platforms and interactive conversational agents, this solution represents a significant leap in automated news analytics.

3. Methodology of the Autonomous News Bot

As seen from Figure 1, the chatbot initiates the process upon receiving a user query, denoted as q, which may cover a wide range of topics, including requests for localized news summaries, correlation analyses between events, or general inquiries. This query is first processed by the chatbot’s classification model to determine the specific intent, which is labeled as I. The classification phase allows the chatbot to assign each query to one of three categories: location-based news summary, event correlation, or general inquiry.

If the intent I is identified as a location-based news summary request, the chatbot invokes the ObtainLatestLocationNewsSummary function. Within this function, the chatbot begins by identifying the event category $C_e$ associated with the user-specified event e. The language model performs a semantic analysis to map the event e to one or more relevant categories within the predefined category set $C$. Following this categorization, the chatbot retrieves a set of news reports, referred to as $\mathrm{newsReports}_{e,L,D}$, that aligns with the category $C_e$, the user-defined location L, and the recent three-day period D. The retrieved news data are then summarized by the chatbot using a language model summarization function, LLM_Summarize, which condenses the content into a concise summary, including key points and references to the original news sources. This summary is subsequently returned to the user.

Should the user query fall under the category of event correlation, the chatbot proceeds with the CalculateCorrelation function. This function is employed when the user is interested in understanding the correlation between two specific events over a chosen time period. Initially, the chatbot identifies relevant categories for each event, labeled $C_{e_1}$ and $C_{e_2}$, through the language model’s category mapping function LLM_Match. Once categorized, the chatbot retrieves the corresponding news reports for both events within the specified duration, constructing a dataset for further analysis. The chatbot then calculates the daily frequencies, $f_{e_1}(d)$ and $f_{e_2}(d)$, of the reports for each day within the selected period. Using these daily frequency vectors, the chatbot computes the Pearson correlation coefficient r, which measures the statistical correlation between the events based on their frequency of occurrence across the date range. The result of this computation, r, is then communicated back to the user, providing insight into the relationship between the two events.

For any other user queries that do not pertain to a specific location-based news summary or event correlation, the chatbot defaults to the HandleGenericQueries function. This function addresses general inquiries by leveraging a generative language model, such as GPT, to generate an informative response. The response is produced by the generative model and then evaluated for relevance. If the initial response does not meet a predefined relevance threshold, T, the query is iteratively refined and reprocessed until the response achieves satisfactory alignment with the user’s intent. Once a ‘suitable response’ is generated, it is delivered to the user. The ‘suitability’ of a response in our system refers to how well the generated response matches the user’s intent and query in terms of relevance and accuracy. In cases where the query does not match any supported category, the chatbot returns a message indicating that the query cannot be processed, thereby maintaining clarity in user interaction.

This structured approach enables the chatbot to efficiently manage a diverse range of queries, directing each to the appropriate function based on the classified intent. The sequential logic underpinning the chatbot’s decision-making process ensures that users receive tailored responses that meet their specific informational needs, whether for localized news, event analysis, or general knowledge inquiries.

To summarize, the proposed AI-driven chatbot for real-time news automation follows a structured methodology to process user queries, retrieve relevant news, analyze event correlations, and generate informative summaries. The chatbot integrates LLMs, knowledge graphs, and real-time data retrieval techniques to provide accurate and contextually relevant insights.

As highlighted before, this methodology involves three primary functions:

ObtainLatestLocationNewsSummary: fetches and summarizes the latest news for a specific event and location.
CalculateCorrelation: determines correlations between two events based on the frequency of related news articles.
HandleGenericQueries: processes general inquiries using a generative AI-based response system.

Below, we provide a detailed pseudocode representation of the chatbot’s methodology in Algorithm 1:

Algorithm 1 AI-Driven chatbot for real-time news automation.

Require: User query q
Ensure: Response to user query with relevant insights
Initialize AI chatbot with language model and database access
Receive user query q
Classify intent $I \leftarrow$ LLM_Classify(q)
if I = Location-based News Summary then
    $e, L \leftarrow$ ExtractEventLocation(q)
    $C_e \leftarrow$ LLM_Match(e)
    Retrieve $\mathrm{newsReports}_{e,L,D}$ from database
    Generate summary $S \leftarrow$ LLM_Summarize($\mathrm{newsReports}_{e,L,D}$)
    Return S
else if I = Event Correlation Analysis then
    $e_1, e_2 \leftarrow$ ExtractEventPair(q)
    $C_{e_1}, C_{e_2} \leftarrow$ LLM_Match($e_1$), LLM_Match($e_2$)
    Retrieve news reports for $e_1, e_2$ within time frame D
    Compute daily frequencies $f_{e_1}(d)$, $f_{e_2}(d)$
    Compute correlation $r \leftarrow$ PearsonCorrelation($f_{e_1}, f_{e_2}$)
    Return correlation result r
else
    Generate response $R_q \leftarrow$ GPT_Generate(q)
    Compute relevance $V \leftarrow$ CosineSimilarity(q, $R_q$)
    if $V < T$ then
        $q \leftarrow$ LLM_Refine(q)
        Recompute $R_q \leftarrow$ GPT_Generate(q)
    end if
    Return $R_q$
end if

This structured methodology ensures that the chatbot efficiently handles a diverse range of queries, supporting real-time event analysis, dynamic summarization, and interactive query-driven decision-making. By leveraging AI-driven techniques, the system provides high-accuracy responses tailored to specific user needs.
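The routing logic of Algorithm 1 can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the keyword rules in `classify_intent` are a toy stand-in for LLM_Classify, and the `Intent` enum, the `route` function, and the handler table are hypothetical names that do not come from the described system.

```python
from enum import Enum
from typing import Callable, Dict


class Intent(Enum):
    """The three intent classes named in the methodology."""
    LOCATION_SUMMARY = "location_summary"
    EVENT_CORRELATION = "event_correlation"
    GENERAL = "general"


def classify_intent(query: str) -> Intent:
    """Toy stand-in for LLM_Classify: keyword rules instead of a language model."""
    text = query.lower()
    if "correlation" in text or "correlate" in text:
        return Intent.EVENT_CORRELATION
    if "news" in text or "summary" in text:
        return Intent.LOCATION_SUMMARY
    return Intent.GENERAL


def route(query: str, handlers: Dict[Intent, Callable[[str], str]]) -> str:
    """Dispatch the query to the handler registered for its classified intent."""
    return handlers[classify_intent(query)](query)


# Hypothetical handlers; real ones would call the three functions of Algorithm 1.
handlers: Dict[Intent, Callable[[str], str]] = {
    Intent.LOCATION_SUMMARY: lambda q: "summary:" + q,
    Intent.EVENT_CORRELATION: lambda q: "correlation:" + q,
    Intent.GENERAL: lambda q: "generic:" + q,
}
```

Keeping the classifier separate from the handler table means the intent model could be replaced without touching the three downstream functions.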

4. Mathematical Modelling

Table 2 provides the notations used within this section.

4.1. Common Mathematical Models

1. User Query: Represented as q, the user query that might relate to various news and event topics or general inquiries.

2. Category Mapping $C_e$: $C_e \subseteq C$ is the subset of categories within the category set $C$ (derived from vocabulary $V$) that semantically align with a specific event e. Mapping is achieved by a language model as

$$C_e = \operatorname{LLM\_Match}(e, V) \tag{1}$$
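To make the shape of Equation (1) concrete, the following sketch maps an event string to a subset of categories. It is not the system's LLM_Match: token overlap (Jaccard similarity) is swapped in for the language model's semantic matching, and the vocabulary descriptions, the `threshold` parameter, and the function names are assumptions made for illustration.

```python
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case whitespace tokenization (deliberately simplistic)."""
    return set(text.lower().split())


def match_categories(event: str, vocabulary: Dict[str, str],
                     threshold: float = 0.2) -> Set[str]:
    """Return every category whose description overlaps the event text.

    Overlap is measured with Jaccard similarity, a stand-in for the
    semantic similarity an LLM would provide.
    """
    event_tokens = tokenize(event)
    matched: Set[str] = set()
    for category, description in vocabulary.items():
        desc_tokens = tokenize(description)
        union = event_tokens | desc_tokens
        if not union:
            continue
        score = len(event_tokens & desc_tokens) / len(union)
        if score >= threshold:
            matched.add(category)
    return matched


# Hypothetical vocabulary: category name -> descriptive keywords.
vocabulary = {
    "Natural Disasters": "flood earthquake cyclone wildfire",
    "Sports": "football cricket tennis match",
}
```

As in Equation (1), the result is a set, so an event may map to several categories or to none.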

While exploring advanced methodologies for news classification, it remains pertinent to acknowledge the enduring prevalence of traditional classification systems within mainstream news portals [18,19,20,21]. These systems categorize news articles into familiar segments such as business, sports, breaking news, and travel—categories that align with conventional reader expectations. This approach reflects established media practices, ensuring that the content is organized in a manner that is readily accessible and understandable to the audience. Such traditional methods continue to serve as a foundation for reader engagement and content navigation across digital news platforms.

3. News Reports Collection $\mathrm{newsReports}_{e,L,D}$: The set of news reports $\mathrm{newsReports}_e \subseteq N$ filtered by category $C_e$, location L, and date range D.

$$\mathrm{newsReports}_{e,L,D} = \{\, n \in N \mid n \in \mathrm{newsReports}_e,\ \operatorname{Time}(n) \in D,\ \operatorname{Location}(n) \approx L \,\} \tag{2}$$

4. Language Model (LLM) Functions:

Category Mapping Function LLM_Match: determines the set of relevant categories by analyzing semantic similarity.
Summarization Function LLM_Summarize: summarizes news reports and includes URLs for references.
Generative Response Function GPT_Generate: generates responses for general user queries by leveraging pre-trained knowledge.

4.2. Function 1: ObtainLatestLocationNewsSummary

This function generates a summary of the latest news for a specific event e and location L over the last three days D.

Step 1: Input Collection

Define user-selected event e and location L, defaulting to “Global” if L is not provided.

Step 2: Event Category Mapping

For e, identify the relevant categories:

$$C_e = \operatorname{LLM\_Match}(e, V) \tag{3}$$

Step 3: News Reports Retrieval

Retrieve news reports for the past three days that match $C_e$ and L:

$$D = \{\, \text{today} - i \mid i \leq 3 \,\} \quad \text{(last three days)} \tag{4}$$

$$\mathrm{newsReports}_{e,L,D} = \{\, n \in N \mid n \in \mathrm{newsReports}_e,\ \operatorname{Time}(n) \in D,\ \operatorname{Location}(n) \approx L \,\} \tag{5}$$

Step 4: Summarize News Reports

Use the LLM summarization function to create a concise summary of $\mathrm{newsReports}_{e,L,D}$:

$$\text{summary} = \operatorname{LLM\_Summarize}(\mathrm{newsReports}_{e,L,D}) \tag{6}$$

Return summary containing the main points with URLs for original sources.
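The four steps of this function can be sketched in Python as follows. This is a minimal illustration, not the deployed implementation: the `NewsReport` fields are a hypothetical record layout, the three-day window is assumed to mean today plus the two preceding days, a case-insensitive exact match stands in for the approximate location match, and `list_titles` is a toy replacement for LLM_Summarize.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class NewsReport:
    # Hypothetical record layout; the stored fields are not specified in the text.
    title: str
    url: str
    category: str
    location: str
    published: date


def last_days(today: date, days: int = 3) -> Set[date]:
    """Date window D: today plus the preceding days, `days` dates in total (assumed)."""
    return {today - timedelta(days=i) for i in range(days)}


def filter_reports(reports: Iterable[NewsReport], categories: Set[str],
                   location: str, window: Set[date]) -> List[NewsReport]:
    """Keep reports whose category is in C_e, whose date is in D, and whose location matches L.

    A case-insensitive exact match stands in for the approximate location
    match; the default location "Global" disables the location filter.
    """
    wanted = location.lower()
    return [
        r for r in reports
        if r.category in categories
        and r.published in window
        and (wanted == "global" or r.location.lower() == wanted)
    ]


def obtain_latest_location_news_summary(
    reports: Iterable[NewsReport], categories: Set[str],
    summarize: Callable[[List[NewsReport]], str],
    today: date, location: str = "Global",
) -> str:
    """Filter the reports, then pass them to the supplied summarizer."""
    return summarize(filter_reports(reports, categories, location, last_days(today)))


def list_titles(selected: List[NewsReport]) -> str:
    """Toy summarizer: one line per report, followed by its source URL."""
    return "\n".join(f"{r.title} ({r.url})" for r in selected)
```

Passing the summarizer as a parameter keeps the retrieval logic testable without any call to a language model.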

4.3. Function 2: CalculateCorrelation

This function calculates the correlation between two events $e_1$ and $e_2$ based on daily frequencies of related news reports over a specified date range D.

Step 1: Event Category Mapping

Identify relevant categories for both $e_1$ and $e_2$:

$$C_{e_1} = \operatorname{LLM\_Match}(e_1, V) \tag{7}$$

$$C_{e_2} = \operatorname{LLM\_Match}(e_2, V) \tag{8}$$

Step 2: News Reports Retrieval

For each event $e_1$ and $e_2$, retrieve related news reports:

$$\mathrm{newsReports}_{e_1,D} = \{\, n \in N \mid n \in \mathrm{newsReports}_{e_1},\ \operatorname{Time}(n) \in D \,\} \tag{9}$$

$$\mathrm{newsReports}_{e_2,D} = \{\, n \in N \mid n \in \mathrm{newsReports}_{e_2},\ \operatorname{Time}(n) \in D \,\} \tag{10}$$

Step 3: Daily Frequency Calculation

For each day $d \in D$, calculate frequencies:

$$f_{e_1}(d) = \left|\{\, n \in \mathrm{newsReports}_{e_1,D} \mid \operatorname{Time}(n) = d \,\}\right| \tag{11}$$

$$f_{e_2}(d) = \left|\{\, n \in \mathrm{newsReports}_{e_2,D} \mid \operatorname{Time}(n) = d \,\}\right| \tag{12}$$

Define frequency vectors:

$$f_{e_1} = \{\, f_{e_1}(d) \mid d \in D \,\} \tag{13}$$

$$f_{e_2} = \{\, f_{e_2}(d) \mid d \in D \,\} \tag{14}$$
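The daily frequency step can be sketched in a few lines of Python. The sketch assumes each report contributes one publication date and that the range D is an inclusive span of calendar days; the function names are illustrative. Days without any report are kept as explicit zeros so that both vectors share the same length and day order, which the subsequent correlation step requires.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, List


def date_range(start: date, end: date) -> List[date]:
    """All calendar days from start to end, inclusive, in chronological order."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def daily_frequencies(published_dates: Iterable[date], days: List[date]) -> List[int]:
    """Count reports per day; days with no reports yield an explicit 0."""
    counts = Counter(published_dates)
    return [counts[d] for d in days]
```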

Step 4: Pearson Correlation Calculation

Compute means:

$$\mu_{e_1} = \frac{1}{|D|} \sum_{d \in D} f_{e_1}(d) \tag{15}$$

$$\mu_{e_2} = \frac{1}{|D|} \sum_{d \in D} f_{e_2}(d) \tag{16}$$

Calculate Pearson correlation:

$$r = \frac{\sum_{d \in D} \left(f_{e_1}(d) - \mu_{e_1}\right)\left(f_{e_2}(d) - \mu_{e_2}\right)}{\sqrt{\sum_{d \in D} \left(f_{e_1}(d) - \mu_{e_1}\right)^{2}} \cdot \sqrt{\sum_{d \in D} \left(f_{e_2}(d) - \mu_{e_2}\right)^{2}}} \tag{17}$$

Return r, representing the correlation between $e_1$ and $e_2$.
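Equations (15)–(17) translate directly into code. The sketch below is a plain-Python illustration rather than the system's implementation. One detail is an added assumption: when either frequency vector is constant, the denominator of Equation (17) is zero and r is undefined, so the function returns `None` in that case instead of raising a division error.

```python
import math
from typing import Optional, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> Optional[float]:
    """Pearson correlation of two equal-length daily frequency vectors.

    Returns None when either vector has zero variance (r is undefined).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    if spread_x == 0 or spread_y == 0:
        return None
    return covariance / (math.sqrt(spread_x) * math.sqrt(spread_y))
```

A value near 1 indicates that the two event types tend to be reported on the same days, a value near -1 that one rises when the other falls; neither implies a causal link.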

4.4. Function 3: HandleGenericQueries

This function responds to general queries q that are unrelated to localized news summarization or event correlation by leveraging the generative AI capabilities of GPT.

Step 1: User Intent Classification

Classify the user’s query q to determine whether it relates to news summarization, event correlation, or a general inquiry.

$$I = \operatorname{LLM\_Classify}(q) \tag{18}$$

Step 2: Generative Response Mechanism

If I is a general inquiry, use the GPT function to generate a response:

$$R_q = \operatorname{GPT\_Generate}(q) \tag{19}$$

Step 3: Relevance and Iterative Refinement

Calculate relevance using cosine similarity between embeddings of q and $R_q$:

$$V = \operatorname{CosineSimilarity}(E(q), E(R_q)) \tag{20}$$

If V is below a threshold T, refine q and repeat the response generation:

$$Q_{\text{refined}} = \operatorname{LLM\_Refine}(q) \tag{21}$$

$$R_{Q_{\text{refined}}} = \operatorname{GPT\_Generate}(Q_{\text{refined}}) \tag{22}$$

Step 4: Return Final Response

Return the final generated response, $R_q$, having met the relevance threshold T.
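The relevance check and refinement loop can be sketched as follows. Several parts are assumptions made for illustration: `embed` is a toy bag-of-words embedding standing in for E(·), the `generate` and `refine` callables stand in for GPT_Generate and LLM_Refine, relevance after refinement is measured against the refined query, and `max_rounds` is a hypothetical cap added so that the loop always terminates.

```python
import math
from collections import Counter
from typing import Callable, Dict


def embed(text: str) -> Dict[str, int]:
    """Toy bag-of-words embedding; a real system would use model embeddings."""
    return Counter(text.lower().split())


def cosine_similarity(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def handle_generic_query(query: str,
                         generate: Callable[[str], str],
                         refine: Callable[[str], str],
                         threshold: float,
                         max_rounds: int = 3) -> str:
    """Generate a response, refining the query while relevance stays below the threshold."""
    response = generate(query)
    for _ in range(max_rounds):
        if cosine_similarity(embed(query), embed(response)) >= threshold:
            break
        query = refine(query)
        response = generate(query)
    return response
```

If the cap is reached, the last response is returned even though it is below the threshold; a production system might instead return a message that the query cannot be processed.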

5. Implementation

The implementation of the AI-driven news analysis chatbot, as visualized in Figure 2, involved a strategic orchestration of four key technology components. The architecture diagram illustrates the integrated system of components enabling the chatbot to perform complex news analysis tasks by leveraging user queries, processing information, and retrieving relevant data. For a Robotic Process Automation (RPA) practitioner, this diagram provides a clear blueprint for automating the flow of information and tasks between different technologies, such as Microsoft Copilot Studio, Microsoft Power Automate (MPA), Microsoft Dataverse (MD), and Google Gemini API (GGAPI). By understanding this architecture, RPA practitioners can effectively implement and optimize the chatbot for dynamic news analysis, ensuring seamless communication, efficient data processing, and accurate information retrieval. Microsoft Copilot Studio served as the front end for user interaction, enabling seamless communication and query processing. It captures user intents, whether related to obtaining news summaries, calculating correlations, or posing generic queries. This information is then passed to MPA, which acts as the central orchestrator, facilitating automated interactions and data flow between the different components.

MPA routes requests to appropriate modules based on the user’s intent. For instance, requests for news summaries are directed to both GGAPI and MD. The GGAPI leverages its advanced functionalities to accurately match news categories to user queries and generate Fetch XML for efficient retrieval of relevant news reports from the Dataverse repository, which houses a live news database comprising 991,325 news reports categorized into 202 distinct news categories. Additionally, Google Gemini’s summarization capabilities facilitate the concise and informative presentation of news data to the user. Similarly, requests for correlation calculations are processed by a dedicated module, utilizing both the GGAPI and MD to fetch and analyze relevant news data. Generic queries are handled by leveraging the generative AI capabilities of the system. This synergistic integration of technologies enables the chatbot to effectively address a wide range of user needs and deliver dynamic news analysis capabilities. Table 3 depicts the roles of the selected technology components for the proposed chatbot-based news analytics system presented in this paper.
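To illustrate the kind of Fetch XML query that could retrieve recent reports from Dataverse, the following sketch builds one with Python's standard library. The table name `example_newsreport` and the `example_*` column names are hypothetical, since the actual schema is not given here; `createdon` is a standard Dataverse column, and `last-x-days`, `eq`, and `in` are standard FetchXML condition operators.

```python
import xml.etree.ElementTree as ET
from typing import Iterable


def build_fetch_xml(categories: Iterable[str], location: str,
                    days: int = 3, top: int = 50) -> str:
    """Build a FetchXML query for recent reports in the given categories and location."""
    fetch = ET.Element("fetch", {"top": str(top)})
    # Hypothetical table and column names; the real schema is not published.
    entity = ET.SubElement(fetch, "entity", {"name": "example_newsreport"})
    for column in ("example_title", "example_url", "example_category", "createdon"):
        ET.SubElement(entity, "attribute", {"name": column})

    conditions = ET.SubElement(entity, "filter", {"type": "and"})
    ET.SubElement(conditions, "condition",
                  {"attribute": "createdon", "operator": "last-x-days", "value": str(days)})
    if location != "Global":
        ET.SubElement(conditions, "condition",
                      {"attribute": "example_location", "operator": "eq", "value": location})
    # The "in" operator takes one <value> child per accepted category.
    category_condition = ET.SubElement(
        conditions, "condition", {"attribute": "example_category", "operator": "in"})
    for category in categories:
        ET.SubElement(category_condition, "value").text = category
    return ET.tostring(fetch, encoding="unicode")
```

Building the query through an XML API rather than string concatenation ensures that category names containing characters such as `&` are escaped correctly.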

Figure 2 outlines the architectural configuration of the AI-driven chatbot using ArchiMate, a standardized notational language specifically designed for architectural representation within complex information systems [22,23,24]. ArchiMate is endorsed by The Open Group and adheres to international standards, which dictate the use of specific symbols and color schemes to represent different architectural domains and relationships. The diagram prominently features ‘access’ relations (dashed arrow with a regular arrowhead) and ‘realization’ relations (dashed arrow with a white triangle at the arrowhead), which depict interaction dynamics and implementation details, respectively. As seen in Figure 2, Microsoft Copilot Studio realizes three business services, namely location-based news summary, correlation calculation, and generative AI. GGAPI, MPA, and MD realize location-based news summary and correlation calculation.

6. Results

To rigorously assess the performance of the AI-driven chatbot for real-time news automation, the system was evaluated using a dataset comprising 1,306,518 news reports. These reports were collected over a comprehensive monitoring period spanning from 25 September 2023 to 17 February 2025. The dataset was systematically classified into 202 subcategories grouped into 15 primary event categories, ensuring broad coverage of diverse global topics. This classification schema was derived from a comprehensive analysis of the dataset, where these 15 primary event categories and 202 subcategories were identified as sufficiently representative of the variance within the data without overly fragmenting the information, which could potentially lead to model over-fitting. The classification results are summarized in Table 4. The LLM assesses the predominant themes within the news text and classifies it into the most fitting category, ensuring comprehensive and nuanced understanding. It does not reject information simply because it spans multiple disciplines (e.g., a news report on both politics and education); rather, it synthesizes the information to determine the most relevant category or categories, reflecting the interdisciplinary nature of the content where appropriate.

To demonstrate the chatbot’s functionality, we present two figures highlighting its news summarization and correlation analysis capabilities. Figure 3 illustrates the chatbot’s ability to generate concise and structured summaries for news articles. The chatbot extracts key information from multiple sources and provides an easy-to-read summary. Figure 4 showcases the chatbot’s event correlation feature, where users input two event categories, and the system computes a correlation coefficient based on historical data. This functionality allows users to understand potential dependencies between different types of events. Together, these figures illustrate the chatbot’s effectiveness in automating news intelligence and supporting real-time decision-making.

The chatbot was evaluated based on two key dimensions: summarization accuracy and correlation analysis. The summarization evaluation assessed the chatbot’s ability to generate concise and relevant summaries for diverse news topics. The results of this evaluation are presented in Table 5.

The chatbot demonstrated high summarization accuracy, achieving an overall average F1 score of 0.94 across multiple categories. The system required an average processing time of 9 s per summarization task.

In addition to summarization, the chatbot was also evaluated for its correlation analysis capabilities. The AI-driven correlation module systematically processed various query types and dynamically extracted relationships between news events. The chatbot’s effectiveness in detecting event correlations was validated using a dataset of correlated event pairs. The results of this evaluation are presented in Table 6.

AI-driven chatbots have become integral to various domains, including politics, healthcare, cybersecurity, and finance. Evaluating their effectiveness requires quantitative analysis of key performance metrics such as precision, recall, and the F1 score. This paper presents a three-dimensional (3D) visualization that systematically analyzes the chatbot’s performance across multiple categories. Using a pre-trained model like Google Gemini means that significant portions of the model’s learning and adaptation processes have already been completed before integration into the presented system. This eliminates the need for extensive retraining, which typically involves calculating loss functions during training to improve the model. Hence, this study does not use a loss function and instead focuses on performance metrics that assess how well the pre-trained model adapts to the use case.
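For reference, the three metrics relate as shown in the short sketch below. The counts of true positives, false positives, and false negatives are illustrative inputs, not values from this study, and the zero-denominator handling is a common convention rather than something stated in the text.

```python
from typing import Tuple


def precision_recall(true_pos: int, false_pos: int, false_neg: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN); 0.0 when undefined."""
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because the F1 score is a harmonic mean, it is pulled toward the lower of the two inputs, so a high F1 requires both precision and recall to be high.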

The 3D surface plot in Figure 5 illustrates the chatbot’s performance scores in 15 primary event categories. The x-axis represents the categories, the y-axis differentiates the three evaluation metrics, and the z-axis denotes the performance scores, ranging between 0.87 and 0.98. A color gradient (using the ‘viridis’ colormap) visually encodes variations in performance intensity, where lighter shades correspond to higher scores.

From the analysis, the chatbot demonstrates robust performance across most categories, maintaining precision, recall, and F1 scores above 0.90 in most cases. The highest performance scores are observed in Sports and Pandemics, where the chatbot exhibits strong accuracy and recall when retrieving relevant data. However, slightly lower scores appear in Natural Disasters and UFO-related topics, suggesting areas where improvements can be made, potentially by enhancing the training dataset or refining the response generation algorithm.

The 3D visualization provides an intuitive representation of chatbot performance, enabling stakeholders to identify strengths and improvement areas. The inclusion of a color bar legend facilitates interpretation, making the chart an effective tool for evaluating chatbot capabilities in diverse scenarios.

The chatbot exhibited strong correlation detection capabilities, achieving an overall average F1 score of 0.92 across multiple event pairings. As shown in Figure 6, the performance of the Google Gemini API is similar to that of other LLMs, such as OpenAI’s GPT API and Meta’s LLaMA (within Figure 6, the outcome of this study is labeled as ‘AI-Driven Chatbot’ in yellow). The remaining results in this figure were obtained from [25]. A significant advantage of employing Google Gemini’s API (version: 1.5 Flash), as opposed to other LLM APIs such as those of DeepSeek or OpenAI, lies in its cost-effectiveness: Google Gemini’s API can be utilized at no charge up to a specified daily usage limit. The processing time for correlation tasks was observed to be approximately 21 s per query. The majority (over 70%) of the processing time is spent querying the Microsoft Dataverse database while aggregating the news reports. To reduce the processing time for summarization (an average of 9 s) and correlation (an average of 21 s), future studies should focus on database indexing and retrieval processes.

The comprehensive evaluation of the system, covering both summarization and correlation analyses, underscores its robustness in automated news analytics. These findings position the chatbot as a scalable and adaptive solution for event tracking, real-time summarization, and data-driven decision-making in various domains.

7. Conclusions

This study has introduced an AI-driven chatbot for real-time news automation, demonstrating its effectiveness in processing large-scale global news data through event summarization and correlation analysis. By leveraging advanced AI methodologies, including Generative Pre-trained Transformers, knowledge graphs, and real-time retrieval systems, the chatbot successfully categorizes and structures information to provide users with precise and context-aware insights. The experimental evaluation validated its efficacy, showing high accuracy in news classification, summarization, and event correlation detection. These capabilities position the chatbot as a valuable tool for applications in critical domains such as disaster response, policy analysis, cybersecurity, and media intelligence, where timely and accurate information is essential for decision-making.

Despite its promising contributions, this research has certain limitations. The chatbot’s performance is dependent on the quality and diversity of available news sources, making it vulnerable to biases and misinformation inherent in the dataset. Furthermore, the computational overhead required for real-time processing, particularly in correlation analysis, poses challenges in high-frequency data environments. Another limitation is the chatbot’s reliance on predefined taxonomies for event categorization, which may not fully capture emerging or evolving news narratives. The system also faces challenges in interpreting ambiguous queries, necessitating further advancements in natural language understanding to improve user interactions.

Future research will aim to enhance the chatbot’s adaptability and scalability by integrating reinforcement learning for continuous model improvement based on user feedback. Expanding multilingual support will broaden accessibility, allowing the chatbot to process non-English news sources effectively. Further refinements in misinformation detection, using fact-checking AI models and sentiment analysis, can enhance the chatbot’s reliability. Additionally, optimizing computational efficiency through distributed AI processing and cloud-based architectures will enable faster response times, making the system more practical for large-scale deployments. While cosine similarity has proven useful in the current framework because of its effectiveness on high-dimensional text data, we acknowledge that other metrics such as Minkowski, Chebyshev, Jaccard, and Canberra could reveal additional insights into the complex relationships between news events [26,27]. We plan to investigate these metrics in future work to determine their impact on the quality of our correlation analyses, recognizing that this may increase both the computational complexity and the difficulty of interpreting results. By addressing these areas, the chatbot can evolve into a more robust and intelligent news analysis system, contributing significantly to real-time information processing and decision support in diverse societal contexts.
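To make the metrics named above concrete, the following is a minimal Python sketch of their textbook definitions, applied to two hypothetical daily report-count vectors. The vectors, the function names, and the choice to apply Jaccard to the sets of days with at least one report are illustrative assumptions; this is not the implementation used in this study.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def minkowski(u, v, p=2):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)


def chebyshev(u, v):
    """Largest absolute coordinate-wise difference."""
    return max(abs(a - b) for a, b in zip(u, v))


def canberra(u, v):
    """Sum of |a - b| / (|a| + |b|), skipping coordinates where both are 0."""
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(u, v) if a != 0 or b != 0)


def jaccard_distance(set_a, set_b):
    """1 - |A & B| / |A | B|; defined here as 0.0 when both sets are empty."""
    union = set_a | set_b
    if not union:
        return 0.0
    return 1 - len(set_a & set_b) / len(union)


# Hypothetical daily report counts for two events over five days.
u = [1, 2, 0, 1, 0]
v = [1, 3, 0, 1, 0]

# Jaccard operates on sets, so compare the sets of days with any report.
active_u = {i for i, count in enumerate(u) if count > 0}
active_v = {i for i, count in enumerate(v) if count > 0}

print("cosine similarity:", cosine_similarity(u, v))
print("Minkowski (p=2):  ", minkowski(u, v, p=2))
print("Chebyshev:        ", chebyshev(u, v))
print("Canberra:         ", canberra(u, v))
print("Jaccard distance: ", jaccard_distance(active_u, active_v))
```

Cosine similarity is a similarity (higher means closer), whereas the other four functions return distances (lower means closer), so any comparison across them would need to account for that difference in direction and scale.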

Author Contributions

Conceptualization, F.S.; methodology, F.S.; software, F.S.; validation, F.S. and M.A.; formal analysis, F.S.; investigation, F.S.; resources, M.A.; data curation, F.S.; writing—original draft preparation, F.S.; writing—review and editing, F.S. and M.A.; visualization, F.S.; supervision, M.A.; project administration, M.A.; funding acquisition, M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

This research generated original data of 9261 cyber-related news articles structured with 95 attack types, 29 industries, locations, countries, event dates, and significance. These data are provided with the submission as a supplementary document. To support research reproducibility and to allow the dataset to be extended with other cyber-related data, researchers and scientists can publicly access it at https://github.com/DrSufi/CyberNews (accessed on 15 February 2025).

Acknowledgments

The authors would like to acknowledge the contributions of Edris Alam of Emergency and Crisis Management, Rabdan Academy, UAE, in validating the results. The mathematical rigor presented within this paper has led to the development of Coeus Institute’s flagship product, GERA, which is being used by federal governments and intelligence agencies worldwide (https://coeus.institute/gera/, accessed on 15 February 2025). As the CTO of Coeus Institute, the author, Fahim Sufi, would like to extend his gratitude to all members of Coeus Institute, US.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
GPT	Generative Pre-trained Transformer
LLM	Large Language Model
NLU	Natural Language Understanding
RPA	Robotic Process Automation
ML	Machine Learning
NLP	Natural Language Processing
CNN	Convolutional Neural Network
SR	Spectral Residual
API	Application Programming Interface
NER	Named Entity Recognition
RNN	Recurrent Neural Network
ANN	Artificial Neural Network
GNN	Graph Neural Network

References

1. Balkus, S.V.; Yan, D. Improving short text classification with augmented data using GPT-3. Nat. Lang. Eng. 2022, 30, 943–972.
2. Kausar, N.; AliKhan, A.; Sattar, M. Towards better representation learning using hybrid deep learning model for fake news detection. Soc. Netw. Anal. Min. 2022, 12, 165.
3. Barua, A.; Sharif, O.; Hoque, M.M. Multi-class Sports News Categorization using Machine Learning Techniques: Resource Creation and Evaluation. Procedia Comput. Sci. 2021, 193, 112–121.
4. Levshun, D.; Kotenko, I. A survey on artificial intelligence techniques for security event correlation: Models, challenges, and opportunities. Artif. Intell. Rev. 2023, 56, 8547–8590.
5. Pawar, C.S.; Makwana, A. Comparison of bert-base and gpt-3 for marathi text classification. In Proceedings of the Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 563–574.
6. Nguyen, T.P.; Carvalho, B.; Sukhdeo, H.; Joudi, K.; Guo, N.; Chen, M.; Wolpaw, J.T.; Kiefer, J.J.; Byrne, M.; Jamroz, T.; et al. Comparison of artificial intelligence large language model chatbots in answering frequently asked questions in anaesthesia. BJA Open 2024, 10, 100280.
7. Babu, A.; Boddu, S.B. BERT-Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding. Explor. Res. Clin. Soc. Pharm. 2024, 13, 100419.
8. Hasib, K.M.; Towhid, N.A.; Faruk, K.O.; Mahmud, J.A.; Mridha, M.F. Strategies for enhancing the performance of news article classification in Bangla: Handling imbalance and interpretation. Eng. Appl. Artif. Intell. 2023, 125, 106688.
9. Maham, S.; Tariq, A.; Khan, M.U.G.; Alamri, F.S.; Rehman, A.; Saba, T. ANN: Adversarial news net for robust fake news classification. Sci. Rep. 2024, 14, 7897.
10. Sufi, F.K. AI-GlobalEvents: A Software for analyzing, identifying and explaining global events with Artificial Intelligence. Softw. Impacts 2022, 11, 100218.
11. Sufi, F.K. Identifying the drivers of negative news with sentiment, entity and regression analysis. Int. J. Inf. Manag. Data Insights 2022, 2, 100074.
12. Gruenhagen, J.H.; Sinclair, P.M.; Carroll, J.A.; Baker, P.R.; Wilson, A.; Demant, D. The rapid rise of generative AI and its implications for academic integrity: Students’ perceptions and use of chatbots for assistance with assessments. Comput. Educ. Artif. Intell. 2024, 7, 100273.
13. Fatemi, B.; Rabbi, F.; Opdahl, A.L. Evaluating the effectiveness of gpt large language model for news classification in the iptc news ontology. IEEE Access 2023, 11, 145386–145394.
14. Nicholson, D.N.; Greene, C.S. Constructing knowledge graphs and their biomedical applications. Comput. Struct. Biotechnol. J. 2020, 18, 1414–1428.
15. Kim, A.; Su, Y. How implementing an AI chatbot impacts Korean as a foreign language learners’ willingness to communicate in Korean. System 2024, 122, 103256.
16. Du, J.; Daniel, B.K. Transforming language education: A systematic review of AI-powered chatbots for English as a foreign language speaking practice. Comput. Educ. Artif. Intell. 2024, 6, 100230.
17. She, X.; Zhao, X. A Text Summarization Generation Algorithm Based on the Improved GPT-2 Model. In Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, Xi’an, China, 26–28 January 2024; pp. 541–549.
18. Ahmed, J.; Ahmed, M. Online news classification using machine learning techniques. IIUM Eng. J. 2021, 22, 210–225.
19. Daud, S.; Ullah, M.; Rehman, A.; Saba, T.; Damaševičius, R.; Sattar, A. Topic classification of online news articles using optimized machine learning models. Computers 2023, 12, 16.
20. Sunagar, P.; Kanavalli, A.; Nayak, S.S.; Mahan, S.R.; Prasad, S.; Prasad, S. News topic classification using machine learning techniques. In Proceedings of the International Conference on Communication, Computing and Electronics Systems: Proceedings of ICCCES 2020, Shillong, India, 8–30 April 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 461–474.
21. Mulahuwaish, A.; Gyorick, K.; Ghafoor, K.Z.; Maghdid, H.S.; Rawat, D.B. Efficient classification model of web news documents using machine learning algorithms for accurate information. Comput. Secur. 2020, 98, 102006.
22. The Open Group. ArchiMate® 3.0 Specification; The Open Group: San Francisco, CA, USA, 2016.
23. Josey, A.; Lankhorst, M.; Band, I.; Jonkers, H.; Quartel, D. An Introduction to the ArchiMate® 3.0 Specification; White Paper from The Open Group: San Francisco, CA, USA, 2016; p. 35.
24. Kraan, W. Using ArchiMate to design learning environment architectures. J. East China Norm. Univ. (Nat. Sci.) 2012, 2012, 52.
25. Sufi, F.; Alsulami, M. Mathematical Modeling and Clustering Framework for Cyber Threat Analysis Across Industries. Mathematics 2025, 13, 655.
26. Malla, A.; Omwenga, M.M.; Bera, P.K. Exploring Image Similarity through Generative Language Models: A Comparative Study of GPT-4 with Word Embeddings and Traditional Approaches. In Proceedings of the 2024 IEEE International Conference on Electro Information Technology (eIT), Eau Claire, WI, USA, 30 May–1 June 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 275–279.
27. Zaki, N.; Krishnan, A.; Turaev, S.; Rustamov, Z.; Rustamov, J.; Almusalami, A.; Ayyad, F.; Regasa, T.; Iriho, B.B. Node embedding approach for accurate detection of fake reviews: A graph-based machine learning approach with explainable AI. Int. J. Data Sci. Anal. 2024, 18, 295–315.

Figure 1. Flow diagram of the generative AI-based autonomous chatbot.

Figure 2. Architectural diagram for implementing the chatbot agent-based news analytics system.

Figure 3. Chatbot response for news summarization. The chatbot aggregates information from multiple news sources and provides users with a structured summary of key developments.

Figure 4. Chatbot response for correlation analysis. The chatbot analyzes the relationship between two selected event categories and computes a correlation coefficient based on historical trends.

Figure 5. Three-dimensional performance analysis of chatbot across categories. The plot displays precision, recall, and F1 score for each category, with color intensity denoting performance variation.

Figure 6. Three-dimensional performance analysis of various LLM technologies. The plot displays precision, recall, and F1 score for technologies such as OpenAI’s GPT API, the Google Gemini API, and Meta’s LLaMA.

Table 1. Limitations of existing solutions and addressed problems.

Existing Limitations	Problems Addressed in This Paper
Static news dashboards with limited interactivity [1]	AI-driven chatbot enabling dynamic and interactive news analysis
Overwhelming volume of redundant and irrelevant news [9]	Context-aware summarization reducing cognitive load
Lack of real-time correlation analysis between events [4]	AI-driven event correlation using GPT embeddings
Inefficient information retrieval leading to delayed insights [14]	Automated news classification and trend detection
Inability to integrate AI with robotic decision-making systems [8]	Chatbot-based AI integration for real-time decision support in RPA and autonomous systems
Lack of adaptive user engagement [15,16]	AI-driven personalized news delivery based on user interactions
Ineffective misinformation filtering [6,7]	Automated bias detection and fact-checking using knowledge graph verification
Absence of comprehensive AI chatbot-based news solutions [12]	Integrated chatbot leveraging generative AI and real-time data analysis

Table 2. Notation table.

Notation	Description
$q$	User query on general topics or news events
$e$	User-selected event
$L$	User-specified location, or “Global” if not specified
$D$	Date range for report selection; in ObtainLatestLocationNewsSummary, this is the last 3 days
$V$	Vocabulary of predefined news categories
$C$	Set of all news categories derived from $V$
$C_{e} \subseteq C$	Subset of categories related to $e$, identified via semantic matching
$\mathit{newsReports}_{e, L, D} \subseteq N$	Filtered news reports based on $C_{e}$, $L$, and $D$
$\mathrm{LLM\_Match}(e, V)$	Language model function that maps $e$ to categories $C_{e}$
$\mathrm{LLM\_Summarize}(\mathit{newsReports})$	Summarizes news reports with references
$\mathrm{GPT\_Generate}(q)$	Generates a response for $q$, leveraging pre-trained knowledge
$r$	Pearson correlation coefficient between daily frequencies of reports for two events $e_{1}$ and $e_{2}$
$f_{e}(d)$	Daily frequency vector of news reports for event $e$ within date range $D$
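The last two rows of Table 2 can be illustrated with a short Python sketch: it builds the daily frequency vectors $f_{e_1}(d)$ and $f_{e_2}(d)$ from report dates and computes the Pearson coefficient $r$ between them. The report dates, the date range, and the function names `daily_frequency` and `pearson_r` are hypothetical and serve only to illustrate the notation; this is not the implementation used in this study.

```python
from collections import Counter
from datetime import date, timedelta
from math import sqrt


def daily_frequency(report_dates, start, end):
    """Return the daily report counts f_e(d) for every day d in [start, end]."""
    counts = Counter(report_dates)
    n_days = (end - start).days + 1
    return [counts.get(start + timedelta(days=i), 0) for i in range(n_days)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # r is undefined when one series is constant
    return cov / sqrt(var_x * var_y)


# Hypothetical publication dates for two events over a 5-day range D.
start, end = date(2025, 2, 1), date(2025, 2, 5)
dates_e1 = [date(2025, 2, 1), date(2025, 2, 2), date(2025, 2, 2),
            date(2025, 2, 4)]
dates_e2 = [date(2025, 2, 1), date(2025, 2, 2), date(2025, 2, 2),
            date(2025, 2, 2), date(2025, 2, 4)]

f_e1 = daily_frequency(dates_e1, start, end)  # [1, 2, 0, 1, 0]
f_e2 = daily_frequency(dates_e2, start, end)  # [1, 3, 0, 1, 0]
r = pearson_r(f_e1, f_e2)
print(f"r = {r:.3f}")
```

Days without any report are counted as zero rather than dropped, so both vectors always cover the same days in $D$ and can be compared element by element.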

Table 3. Technology components used for implementing the automated news analytics solution.

Requirements for the Proposed System	Microsoft Copilot Studio	Microsoft Power Automate	Microsoft Dataverse	Google Gemini
ObtainLatestLocationNewsSummary()	✓	✓	✓	✓
- Obtain Event and Time Frame	✓
- Match relevant News Categories		✓		✓
- Generate Fetch XML				✓
- Obtain all Relevant News			✓
- Summarization	✓	✓		✓
CalculateCorrelation()	✓	✓	✓	✓
- Obtain Event 1, Event 2, Time Frame	✓
- Match relevant News Categories		✓		✓
- Generate Fetch XML				✓
- Obtain all Relevant News			✓
- Calculate Daily Frequencies	✓			✓
- Calculate correlations with explanation	✓			✓
HandleGenericQueries()	✓
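As an illustration of the “Generate Fetch XML” step in Table 3, the Python sketch below assembles a FetchXML query that filters reports by matched categories, location, and a recent date window. In the proposed system this step is delegated to Google Gemini; the deterministic builder here only shows the kind of query that might result. The table name `news_report` and the column names `title`, `category`, `location`, and `publishedon` are hypothetical placeholders, and the condition operators should be checked against the Dataverse FetchXML reference before use.

```python
import xml.etree.ElementTree as ET


def build_fetch_xml(categories, location, last_days=3):
    """Build a FetchXML query string for reports matching C_e, L, and D.

    The table name and column names below are hypothetical placeholders.
    """
    fetch = ET.Element("fetch")
    entity = ET.SubElement(fetch, "entity", name="news_report")
    for column in ("title", "category", "location", "publishedon"):
        ET.SubElement(entity, "attribute", name=column)

    flt = ET.SubElement(entity, "filter", type="and")

    # Restrict to the matched categories C_e.
    in_cond = ET.SubElement(flt, "condition",
                            attribute="category", operator="in")
    for cat in categories:
        ET.SubElement(in_cond, "value").text = cat

    # Apply the location filter unless the user asked for global coverage.
    if location != "Global":
        ET.SubElement(flt, "condition",
                      attribute="location", operator="eq", value=location)

    # Date range D expressed as "the last N days".
    ET.SubElement(flt, "condition", attribute="publishedon",
                  operator="last-x-days", value=str(last_days))

    return ET.tostring(fetch, encoding="unicode")


query = build_fetch_xml(["Health and Medicine"], "Australia")
print(query)
```

Building the query with an XML library rather than string concatenation means category names and locations containing characters such as `&` or `<` are escaped automatically.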

Table 4. Categorization of news reports and corresponding subcategories.

Event Category	Number of News Titles	Number of Sub Categories
Politics, Governance, and International Affairs	263,055	19
Industry and Business News	186,944	27
Economic and Financial News	146,675	19
Crime, Safety, and Security	137,598	25
Entertainment and Culture	130,022	15
Human Rights and Social Issues	85,399	16
Disasters, Accidents, and Crisis	48,193	20
Science and Technology	41,656	19
Environment and Climate	37,511	7
Legal and Justice	34,698	4
Health and Medicine	26,735	12
Lifestyle and Trends	21,231	7
Education and Learning	14,742	2
Media and Communication	8591	8
Unusual and Extraordinary Events	1142	2

Table 5. Performance evaluation of the chatbot’s news summarization capabilities.

User Topic of Interest	Location	TP	FP	FN	Precision	Recall	F1 Score
Politics	USA	32	2	1	0.940	0.970	0.960
Military	Global	28	3	2	0.930	0.950	0.940
Sports	UK	37	1	2	0.960	0.980	0.970
UFO	Global	21	4	3	0.900	0.920	0.910
Healthcare	Australia	25	2	3	0.920	0.940	0.930
Terrorism	USA	18	3	1	0.910	0.930	0.920
Finance	Global	30	2	3	0.945	0.955	0.950
Environment	UK	22	3	2	0.915	0.940	0.927
Average					0.928	0.948	0.938
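For reference, the standard definitions of the metrics in the column headers of Table 5 can be sketched in Python as follows. The counts in the usage example are hypothetical and are not taken from the table; the function name `precision_recall_f1` is likewise an illustrative choice.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Each metric falls back to 0.0 when its denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts for a single query topic.
precision, recall, f1 = precision_recall_f1(tp=45, fp=5, fn=3)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

F1 is the harmonic mean of precision and recall, so it always lies between the two and is pulled toward the lower value.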

Table 6. Performance evaluation metrics for correlation calculation.

Event 1	Event 2	Precision	Recall	F1 Score
Elections	Economic Crisis	0.920	0.915	0.917
Military	Cybersecurity Threats	0.947	0.900	0.923
Healthcare	Pandemic Outbreaks	0.960	0.970	0.965
Natural Disasters	Humanitarian Aid	0.875	0.910	0.892
UFO Sightings	Space Research	0.900	0.895	0.897
Terrorism	Border Security	0.935	0.920	0.927
Political Unrest	Media Propaganda	0.918	0.930	0.924
Environmental Change	Energy Crisis	0.905	0.915	0.910
Average		0.923	0.918	0.920

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Sufi, F.; Alsulami, M. AI-Driven Chatbot for Real-Time News Automation. Mathematics 2025, 13, 850. https://doi.org/10.3390/math13050850

