Article

FrameSum: Leveraging Framing Theory and Deep Learning for Enhanced News Text Summarization

by Xin Zhang 1,*, Qiyi Wei 1, Bin Zheng 2, Jiefeng Liu 1 and Pengzhou Zhang 3

1 School of Computer and Cyber Sciences, Communication University of China, Beijing 100024, China
2 School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
3 State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing 100024, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(17), 7548; https://doi.org/10.3390/app14177548
Submission received: 6 August 2024 / Revised: 21 August 2024 / Accepted: 22 August 2024 / Published: 26 August 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract: Framing theory is a widely accepted theoretical framework in news communication studies, frequently employed to analyze the content of news reports. This paper introduces framing theory into the text summarization task and proposes a news text summarization method based on framing theory to address the rapidly increasing speed and scale of global information dissemination. Traditional text summarization methods often overlook the implicit deep-level semantic content and situational frames in news texts; the method proposed in this paper aims to fill this gap. Our deep learning-based news frame identification module automatically identifies frame elements in the text and predicts the dominant frame of the text. The frame-aware summarization generation model (FrameSum) incorporates the identified frame features into the text representation and attention mechanism, ensuring that the generated summary focuses on the core content of the news report while maintaining high information coverage, readability, and objectivity. Empirical studies on the standard CNN/Daily Mail dataset show that the method significantly improves summary quality while maintaining the accuracy of news facts.

1. Introduction

In the context of globalization, both the speed and scale of information dissemination are increasing at an unprecedented, exponential rate. A forecast study by the International Data Corporation (IDC) predicts that the total global data volume will grow from 33 ZB in 2018 to 175 ZB by 2025 [1]. Particularly during major global events (e.g., the Olympic Games, financial crises, public health emergencies), news reports flood in like tidal waves, making it especially challenging for the public to capture and understand key information regarding event developments, impacts, and countermeasures in a short period. This predicament not only underscores the necessity of improving information filtering efficiency and in-depth analysis capabilities but also drives news text summarization technology to evolve and keep pace with the times. It continuously enhances its ability to accurately extract and efficiently disseminate core information to meet the needs of readers in an era of information explosion.
Although existing news text summarization methods have made some progress in combating information overload, most focus on statistics at the lexical and sentence level and the extraction of linguistic features [2,3,4,5,6], often overlooking the deep-level semantic content and situational frames implied in news texts. News reports are not simply a compilation of facts; they usually embed multiple attributes, such as specific social, cultural, and ideological aspects, which largely shape the topic selection, discourse construction, and meaning conveyance of the news [7,8]. In this context, a news frame can be defined as a conceptual structure used by journalists to present and organize information, shaping the way events and issues are interpreted. Frames help highlight certain aspects of a story while downplaying others, thus influencing how audiences understand and respond to news content. Many researchers have used various computational tools to capture frames in text corpora, from topic models [9] to dictionary methods [10], semantic networks [11], and supervised machine learning models [12]. Eisele et al. [13] even compared frame identification methods with different degrees of supervision, but few researchers have built on these foundations to study text summarization. Therefore, introducing framing theory into the news text summarization process to generate summaries that contain factual information and reflect frame characteristics, and thus improve the quality and accuracy of news summaries, is a research direction that urgently needs further exploration and innovative application. The concept of framing theory is further elaborated in Section 2.2.
This paper takes the CNN/Daily Mail dataset as the research object and innovatively proposes an automatic news text summarization model based on framing theory. The model aims to provide strong technical support and theoretical basis for news dynamics mastery and information attention prediction while promoting the practical application of framing theory in the field of news text summarization. Specifically, this paper designs a deep learning-based news frame identification model that can automatically identify frame elements from the text and then construct an encoder–decoder summary generation model that integrates frame information. This model incorporates the frame identification results into the text representation and attention mechanism, ensuring that the generated summary focuses on the core content of the news report and maintains high information coverage, readability, and objectivity.
Through empirical research on a large-scale news corpus, we qualitatively and quantitatively compared and analyzed the performance of the proposed frame-aware summarization model with traditional models. The results show that our proposed model significantly improves summary quality and better maintains the accuracy and completeness of news facts. In summary, the contributions of this paper are as follows:
  • We innovatively introduce news framing theory into the text summarization task. This paper combines framing theory from the field of news communication with the text summarization task in natural language processing, constructing a novel frame-aware text summarization model that enriches the theoretical foundation and technical approach of text summarization.
  • We propose a deep learning-based news frame identification model. This paper designs a deep learning model that can automatically identify core frame elements (e.g., factual frames, responsibility frames, conflict frames) from news texts, laying the foundation for frame-aware summarization.
  • We construct an encoder–decoder summary generation model that integrates frame information. This model incorporates frame identification results into text representation and attention mechanisms, generating summaries that closely adhere to the core of the news frame and have high information coverage and readability.
  • We conducted a systematic empirical evaluation of our proposed model on the standard CNN/Daily Mail dataset. Through quantitative and qualitative analysis, we assessed the model’s performance across multiple dimensions, including information completeness, news frame relevance, and reading experience readability. Experimental results show that our model outperforms several existing summarization baseline models on the standard dataset, validating the guiding value and innovative significance of introducing framing theory into the news text summarization task.

2. Literature Review

2.1. Text Summarization

2.1.1. Extractive Summarization

Broadly speaking, extractive summarization methods can be seen as addressing a binary classification problem, where each sentence is classified as either summary-worthy or not. In 1958, Luhn [14] first proposed the concept of automatic text summarization, and the mainstream approach in early work was based on statistics. Graph-based methods are represented by the TextRank algorithm, where Mihalcea et al. [15] consider words in the document as vertices of a graph and construct edges based on the co-occurrence relationship between words to calculate the importance of words and the value of corresponding sentences. The lexical chain-based method was proposed by Barzilay et al. [16], which extracts summary sentences through three steps: text segmentation, lexical chain identification, and finding strong lexical chains. Nallapati et al. [17] used classifier and selector architectures based on recurrent neural networks (RNNs) to capture sentence saliency and provided conditions under which each architecture can achieve optimal performance. Yasunaga et al. [18] combined recurrent neural networks with graph convolutional networks to compute the importance of each sentence. Existing extractive summarization methods have achieved high ROUGE scores, but they all suffer from low readability.

2.1.2. Abstractive Summarization

In contrast to extractive summarization methods, the core of abstractive summarization is to fully understand the document content and then reorganize the language to generate a grammatically correct and coherent summary. Currently, the most widely studied approach is the sequence-to-sequence framework. In 2015, Rush et al. [19] first applied a model with an attention-based encoder and a neural network language model decoder to abstractive summarization. Subsequently, Chopra et al. [20] introduced a conditional recurrent neural network to construct the decoder based on Rush's work. Nallapati et al. [21] were the first to propose using the seq2seq framework combined with recurrent neural networks (RNNs) to process long text summaries and introduced a generator-pointer mechanism to handle out-of-vocabulary and low-frequency words. The Google team proposed the transformer model [22] in 2017, which relies solely on the attention mechanism and completely abandons traditional recurrent neural network units. Moreover, Claude 3 [23], released by Anthropic; Gemini [24], released by Google; GPT-4 [25] and ChatGPT [26], released by OpenAI; "Ernie Bot" [27], by Baidu; and "Tong Yi Qian Wen" [28], by Alibaba, are all built on the transformer architecture.
Despite the significant progress made in abstractive summarization, several challenges remain: (1) Factual inconsistency: Abstractive models sometimes generate summaries that contain factual errors or hallucinations not present in the source text [29]. This is particularly problematic for news summarization, where accuracy is crucial. (2) Lack of domain knowledge: Current models often struggle with domain-specific texts, as they lack the necessary background knowledge to accurately interpret and summarize specialized content [30]. (3) Limited control over summary content: While these models can generate fluent summaries, they often lack fine-grained control over the specific information included or emphasized in the summary [31]. (4) Difficulty with long documents: Many abstractive models struggle with very long documents, often losing important information or focusing too heavily on the beginning or end of the text [32].

2.1.3. Text Summarization for News Domain

Text summarization models for the news domain often no longer use a single method or technique to achieve the summarization task but rather combine multiple methods and models, with multiple techniques overlapping and interacting. Mohsin et al. [2] proposed a sentence ordering model based on an improved genetic algorithm and adaptive particle swarm optimization for news texts. Singh et al. [3] used convolutional neural networks and recurrent neural networks to implement word-level and sentence-level attention and trained a pointer-generator network using a controlled actor–critic model. Liu et al. [4] designed a segment-aware attention mechanism based on the transformer model, dividing news articles into multiple parts and generating corresponding summaries for each part. Ma et al. [5] proposed a language-guided attention mechanism that utilizes dependency parsing to capture cross-position dependencies and grammatical structures, thereby improving the performance of news text summarization models. Although the above research works have made some progress on the news text summarization task, most of them focus on improving specific technical details while neglecting the core elements and viewpoints of news reports.

2.2. Framing Theory

2.2.1. Defining the Concept of Framing

Bateson [6] first proposed framing as spatial and temporal boundaries of interactive message sets. Goffman [7] defined frames as cognitive structures for interpreting the world. Building on this, Gitlin [33] defined frames as the principles used in selection, emphasis, and presentation when addressing questions about what exists, what happens, and what it means, highlighting the role of ideology. Gamson [34], from the perspective of the media level, believed that frames are organized central ideas or storylines that provide meaning to a series of events. Entman [35] argued that framing involves selection and salience, emphasizing definitions, relationships, moral evaluations, and solutions. From this, we can see that framing has a dual meaning. On the one hand, as an established knowledge system or cognitive predisposition, frames pre-exist in our minds, stemming from our past experiences and knowledge accumulation. On the other hand, frames are also a guiding or constructing mechanism for processing new information and understanding new things.

2.2.2. News Framing Research

After the 1980s, framing theory entered news communication, introducing media frames and news frames. Tuchman [36] viewed news as a frame, arguing that news reporting is a process of “bounding” partial facts, “selecting” partial facts, and subjectively “reorganizing” these social facts. Zang [37] believed that news frames refer to the subjective interpretation and thinking structure of events by news media or news workers, and the concept of framing should be understood as a compound of nouns and verbs. Huang [38] further pointed out that the central issue of framing theory is media production, with the final product embodied in the form of text or discourse, and this production process is not closed or isolated but requires placing its product—the text—in a specific context for interpretation. Simply put, “framing” is a specific principle for selectively processing news facts, telling people “how to think” [39]. This paper introduces framing theory into the news text summarization task, aiming to more comprehensively focus on the framing elements of news reports, thereby enhancing the fidelity and coverage of summaries to the core content of news reports.

2.2.3. News Framing Analysis Methods

Pan [40] noted that news framing analysis is cross-disciplinary, covering the discourse itself, the construction of the discourse, and its reception. Although framing theory originated in sociology, it has been developed into a method for studying news texts in the field of communication. Tuchman [36] believed that framing analysis is better at revealing the internal logic and essential characteristics of news production than other social science research approaches. These three categories of framing analysis also correspond to the three major areas of media communication research: media content, news production, and media effects.
News framing research methods are diverse and rich, including but not limited to content analysis [41], discourse analysis [42], narrative analysis [43], etc. This paper will adopt the content analysis method used by most researchers, which starts from the media content, i.e., the news text itself, and reveals the dominant frames hidden behind news reports and their operating mechanisms by conducting in-depth discussions on aspects such as keywords, themes, narrative structures, and emotional tendencies in the text [44]. Different news organizations may generate multiple frames due to differences in their own interests, ideologies, or audience positioning, shaping the public’s perception of and response to social phenomena with these frames. For example, in the reporting of public health issues, Chinese media usually prioritize national and collective interests, emphasizing the proactive measures taken by the government in responding to public health crises, the efficiency of policy implementation, and the overall layout of epidemic prevention and control. In contrast, Western media have a more diverse perspective. In addition to focusing on government policies and measures, they place more emphasis on reporting the operating mechanisms of the public health system, civil rights, information transparency, and public reactions to policies [45].
Entman [35] believed that news frames have the following four functions: providing problem definitions, interpreting event causes, offering moral evaluations, and suggesting solutions, corresponding to four types of frames: problem definition frames, event attribution frames, moral judgment frames, and solution frames. Iyengar [46] divided the frames in news reports into thematic and episodic structural frames. De Vreese [47] believed that news frames include issue-specific frames and generic frames. Issue-specific frames are necessarily related to specific issues and events, while frames that can cross issue boundaries and be used in multiple reports are called “generic frames”. He defined seven generic frames: fact frame, responsibility frame, conflict frame, human interest frame, economic consequences frame, morality frame, and leadership frame.

2.2.4. Algorithm-Based News Frame Analysis

In recent years, with the rapid development of computational communication and the widespread application of natural language processing technology in the field of news communication, news framing analysis has gradually transitioned from traditional qualitative research methods to a new stage of quantitative analysis using big data and algorithmic models. Burscher et al. [11] used supervised machine learning algorithms to establish an automatic classifier to encode four types of news frames. Lawlor et al. [48] used dictionary-based methods to identify and validate frames. Walter et al. [49] used topic modeling to identify frame elements and utilized community detection techniques in topic networks to group frame elements into frame packages. Eisele et al. [12] compared different machine learning methods, and the results showed that supervised machine learning methods performed best in news framing analysis tasks, followed by LDA topic modeling, while semi-supervised machine learning methods performed the worst. Given the excellent performance of supervised machine learning methods in news framing analysis tasks, this paper will adopt supervised deep learning methods, combining natural language processing techniques with news professional knowledge, in order to obtain more accurate and interpretable frame identification and analysis capabilities.

3. News Text Summarization Model Based on Framing Theory

3.1. Overview

This paper presents a news text summarization model founded on framing theory, utilizing an end-to-end neural network architecture. The model is built around an encoder–decoder core and incorporates key technologies such as frame identification and multi-head attention mechanisms. It comprises two main modules: a frame identification module and a summary generation module. The model’s workflow is illustrated in Figure 1.
The frame identification module is mainly responsible for extracting frame semantic features from the text and predicting the dominant frame category of the news, providing prior frame knowledge for subsequent summary generation.
The summary generation module consists of two components: a frame-aware encoder and a frame-guided decoder. In the frame-aware encoder, a frame embedding mechanism is introduced to enhance its ability to model frame semantics. In the frame-guided decoder, the frame representation is used as additional input to the decoder, guiding the decoder’s attention mechanism to adaptively focus on frame-relevant text segments, ultimately generating high-quality summaries that closely align with the dominant frame of the news article.

3.2. News Framing Recognition Module (NFRM) Based on Deep Learning

3.2.1. Constructing Frame Categories

This study adopts the seven generic frames summarized by De Vreese [47] as research categories: fact frame, responsibility frame, conflict frame, human interest frame, economic consequences frame, morality frame, and leadership frame.
  • Fact frame: Objectively describes the basic aspects of an event, including the cause, process, involved personnel, time, location, and other fundamental factual elements, aiming to answer the core question of “what happened”.
  • Responsibility frame: Attributes and assigns the cause or responsible party for the event’s occurrence, or comments on and questions the actions of the responsible party, exploring the deep-rooted causes of the issue.
  • Conflict frame: Emphasizes the conflicts of interest, divergent opinions, or contradictions among the involved parties in the event, highlighting the complexity and controversy of the event.
  • Human interest frame: Evokes the audience’s empathy and resonance by vividly depicting individuals’ personal experiences and emotional journeys in the event, enhancing the infectiousness of the report.
  • Economic consequences frame: Focuses on and evaluates the actual or potential impact of news events on the economy, including economic indicators, trade, gains, and losses.
  • Morality frame: Examines news events from a moral and ethical perspective, discussing the moral concepts of fairness, justice, integrity, obligation, and rights behind the event, judging the moral responsibility and social norms of the parties involved, and prompting the public to reflect on and discuss moral values.
  • Leadership frame: Focuses on reporting the role and performance of leaders and decision-makers in the event, assessing their decision-making abilities, crisis response strategies, and leadership roles for the public and team, and exploring how leaders’ words and actions influence the event’s process and outcome.
The above seven frame types are derived from content analysis of a large number of news reports [44,45,50,51] and have cross-linguistic and cross-cultural universality. Political, economic, social, and entertainment news alike can all be effectively classified using these seven frames. Moreover, the news reporting characteristics they capture, such as causal attribution, emotional tendency, and moral judgment, have corresponding semantic indicators in the text, facilitating the definition of clear operationalized variables and making automatic frame type recognition by computers possible.

3.2.2. Constructing a Frame Annotation Corpus

To ensure broad coverage and high representativeness of the frame annotation work, we carefully selected 3000 news corpora from the CNN/Daily Mail dataset. These samples encompass diverse topic content, varied information sources, and distinct reporting styles, aiming to build a dataset that fully reflects the diversity of the news ecosystem and lays a solid foundation for subsequent frame annotation and recognition model training.
According to De Vreese’s [47] seven generic frame types, we treat frame annotation as an explicit multi-category classification task. Each report needs to be classified into one of the seven frames to reveal its core narrative logic and value orientation.
To ensure annotation quality, we recruited three graduate students from the communication studies department as manual annotators and provided systematic frame annotation training before the annotation work began. The training focused on the connotation and identification points of each frame to ensure cognitive consistency among all annotators in frame classification (for information on frame annotation guidelines, please refer to Appendix A). In the annotation process, two annotators first independently completed the initial sample annotation; then, we randomly selected 10% of the samples for review by the third annotator. For cases where the review opinions were inconsistent, we specially invited a senior professor in the field to arbitrate, striving to reach a consensus on all annotation results. The entire annotation process was implemented using an online crowdsourcing platform to facilitate remote collaboration among annotators.

3.2.3. Design of NFRM

We design a deep learning-based news frame recognition model to automatically extract semantic features from news text and analyze and identify the frame types used in the news, thus guiding the generation of subsequent summaries. The architecture of NFRM is shown in Figure 2.
1. Text Preprocessing
We use the existing natural language processing toolkit spaCy [52] to segment, tokenize, and perform part-of-speech tagging on the raw news text and utilize NER tools to identify named entities in the text. Subsequently, we perform language normalization processing on the text, including removing stop words, converting case, handling punctuation marks, etc., laying the foundation for subsequent feature extraction and model construction.
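As a rough illustration, this preprocessing step can be sketched with spaCy as follows; the pipeline name and the exact normalization choices here are our assumptions, not necessarily the paper's configuration.

```python
import spacy

# A minimal preprocessing sketch (assumed pipeline: en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

def preprocess(text: str):
    doc = nlp(text)                                    # one pass: tokenize, POS tag, NER
    tokens = [(t.text, t.pos_) for t in doc]           # segmentation + POS tags
    entities = [(e.text, e.label_) for e in doc.ents]  # named entities
    # Language normalization: lowercase, drop stop words and punctuation.
    normalized = [t.lower_ for t in doc if not (t.is_stop or t.is_punct)]
    return tokens, entities, normalized
```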
2. Text Semantic Representation
We use the encoder part of the transformer model [22] as the text semantic feature extractor. The core of the transformer is the multi-head attention mechanism, which is used to capture the relationships between different semantic features. For a detailed explanation of the attention and multi-head attention mechanisms, we refer readers to the original paper by Vaswani et al. [22].
After $N$ layers of transformer encoder computation, we obtain the contextual semantic representation matrix $H \in \mathbb{R}^{n \times d}$ of the text sequence.
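For concreteness, this encoding step can be sketched in PyTorch as follows; the hidden size, depth, and head count are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

d, N = 512, 6                        # placeholder hidden size and depth
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=N,
)
x = torch.randn(1, 128, d)           # one news text: n = 128 token embeddings
H = encoder(x)                       # contextual semantic matrix H ∈ R^{n×d}
```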
3. Frame Element Feature Extraction
The key innovation of our NFRM lies in the frame element feature extraction process. This section introduces our approach to extracting semantic features specific to each frame type, which forms the foundation of our frame-aware summarization model.
A. Fact Frame Feature Extraction: Utilizing tokenization and part-of-speech tagging techniques, we extract key information such as “who, what, when, where” from the news text to reflect the objective facts of the news report; then, through named entity recognition models, we identify named entities involved in the news, such as person names, place names, organizations, etc., to further enrich the factual information; simultaneously, we count the various numerical information appearing in the news report, such as quantity, time, amount, etc., which are also important indicators reflecting objective facts; finally, we extract factual statement sentences from the text and calculate their proportion to further highlight the factual orientation of the news report. The above four types of features are combined into a semantic feature vector for the fact frame.
B. Responsibility Frame Feature Extraction: Using causal relationship extraction models, we analyze the causes and consequences of events in the news report to determine responsibility attribution; then, through sentiment analysis techniques, we identify the stances of various subjects (such as government, enterprises, the public, etc.) in the news to further understand responsibility attribution; at the same time, based on keyword matching methods, we extract relevant vocabulary such as “responsibility”, “accountability”, “investigation”, etc., in the news report, which can also reflect the characteristics of the responsibility frame; finally, combining contextual factors such as the time, place, and nature of the event in the report, we analyze the specific context of responsibility attribution to more comprehensively capture the features of the responsibility frame. The above four types of features are combined into a semantic feature vector for the responsibility frame.
C. Conflict Frame Feature Extraction: Using sentiment analysis models, we extract vocabulary expressing negative emotions such as conflict and confrontation in the news report, such as “dispute”, “confrontation”, etc.; then, through subject identification techniques, we identify the opposing parties involved in the news, such as political opponents, social groups, etc.; simultaneously, based on sentence pattern matching methods, we extract sentences expressing speech acts such as arguments and debates in the news report to further highlight the conflict features; finally, we analyze the rhetorical devices used in the news, such as metaphors and exaggerations, to observe whether they enhance the sense of conflict. Combining the above features, we obtain the semantic feature vector for the conflict frame.
D. Human Interest Frame Feature Extraction: Using character profile recognition techniques, we extract paragraphs focusing on personal stories and emotional experiences from the news report to reflect the human interest features; then, through sentiment analysis models, we identify vocabulary expressing character emotions and psychological states in the news, such as “sad”, “delighted”, etc., to enhance the shaping of character images; subsequently, we analyze whether the news report attempts to describe events from the character’s perspective, such as using a first-person perspective to highlight personal experiences; finally, we detect vivid and figurative rhetorical devices used in the news, such as metaphors and similes, to further evoke readers’ empathy. Combining the above features, we obtain the semantic feature vector for the human interest frame.
E. Economic Consequences Frame Feature Extraction: Based on vocabulary matching methods, we extract vocabulary related to economy and finance from the news report, such as “cost”, “profit”, “stock price”, etc., to reflect the characteristics of the economic consequences frame; then, we count the various economic data appearing in the news, such as growth rate, inflation rate, GDP, etc., to quantify the economic impact; simultaneously, through causal relationship extraction models, we analyze the specific impact and consequences of the news event on economic operation; finally, combining industry knowledge, we assess the relevance of the news event to specific economic fields or industries to further enrich the analysis of the economic consequences frame. Combining the above features, we obtain the semantic feature vector for the economic consequences frame.
F. Morality Frame Feature Extraction: Based on vocabulary matching methods, we extract vocabulary related to moral values such as morality, ethics, and justice from the news report, such as “justice”, “morality”, etc., to reflect the characteristics of the morality frame; then, through sentence pattern matching techniques, we extract sentences containing moral judgments and connotations of praise or criticism in the news to further highlight the moral orientation; subsequently, we analyze the rhetorical devices used in the news, such as metaphors and similes, to observe whether they are intended to reinforce moral appeals; finally, combining semantic reasoning techniques, we mine the implicit moral implications in the news report to more comprehensively capture the features of the morality frame. The above four types of features are combined into a semantic feature vector for the morality frame.
G. Leadership Frame Feature Extraction: Using character recognition techniques, we extract the leaders involved in the news report, such as politicians, entrepreneurs, etc.; then, based on a verb dictionary, we extract verbs describing the behavior of these leaders in the news, such as “decide”, “call for”, “promote”, etc., to reflect leadership characteristics; subsequently, through opinion mining models, we identify evaluative descriptions of leaders’ abilities, achievements, etc., in the news, such as “outstanding leadership”, “excellent achievements”, etc.; finally, we analyze the rhetorical devices used in the news, such as exaggeration and praise, to observe whether they are intended to highlight the leadership image. Combining the above features, we can obtain the semantic feature vector for the leadership frame.
For a more detailed explanation of the feature extraction process for each frame type, please refer to Appendix B.
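To make the feature design concrete, the following sketch shows how a few of the fact-frame indicators above (the "who/when/where" entities, numeric mentions, and the share of factual statement sentences) might be computed with spaCy; the specific proxies and feature layout are our assumptions, and Appendix B documents the actual design.

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def fact_frame_features(text: str) -> list:
    doc = nlp(text)
    who = sum(e.label_ == "PERSON" for e in doc.ents)           # "who"
    where = sum(e.label_ in ("GPE", "LOC") for e in doc.ents)   # "where"
    when = sum(e.label_ in ("DATE", "TIME") for e in doc.ents)  # "when"
    numeric = sum(t.like_num for t in doc)        # quantities, amounts, dates
    sents = list(doc.sents)
    # Crude proxy for factual statement sentences: non-interrogative ones.
    factual_ratio = sum("?" not in s.text for s in sents) / max(len(sents), 1)
    return [who, where, when, numeric, factual_ratio]
```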
4. News Frame Classification and Prediction
News frame classification and prediction is a key innovation in our approach. While it builds upon existing multi-class classification techniques, our method incorporates novel elements specifically designed for frame identification.
We consider news frame recognition as a multi-class classification problem, where each news text belongs to only one frame type. The set of news frame labels is defined as $L = \{l_1, l_2, \ldots, l_7\}$, corresponding to the fact frame, responsibility frame, conflict frame, human interest frame, economic consequences frame, morality frame, and leadership frame, respectively. For each frame category $l_i \in L$ ($i = 1, 2, \ldots, 7$), its semantic feature representation $\Phi_{l_i}$ is obtained as described in Section 3.2.3.
A key innovation in our approach is the use of a frame-aware attention mechanism to perform weighted aggregation on the contextual semantic representation $H$ of the text sequence using the frame features $\Phi_{l_i}$. This yields the text representation vector $s_i$ under that frame:

$$e_{ij} = v_i^{\top} \tanh(W_h h_j + W_f \Phi_{l_i}), \qquad \alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{n} \exp(e_{ik})}, \qquad s_i = \sum_{j=1}^{n} \alpha_{ij} h_j$$

where $e_{ij}$ is the attention score of the $j$-th hidden state under the $i$-th frame, $\alpha_{ij}$ is the corresponding attention weight, and $v_i \in \mathbb{R}^{d_a}$, $W_h \in \mathbb{R}^{d_a \times d_h}$, and $W_f \in \mathbb{R}^{d_a \times d_{\Phi}}$ are the attention parameters ($d_{\Phi}$ being the dimension of the frame feature vector). $s_i \in \mathbb{R}^{d_h}$ is the aggregated text representation under the $i$-th frame.
We then concatenate the text representations of all frames, $S = [s_1; s_2; \ldots; s_7]$. Through a multi-layer perceptron (MLP), we obtain the final classification probability distribution:

$$P = \mathrm{softmax}(\mathrm{MLP}(S)), \qquad \hat{l} = \arg\max_i P_i$$

where $S \in \mathbb{R}^{7 d_h}$ is the concatenated text representation vector and $P \in \mathbb{R}^7$ is the predicted probability distribution. We select the frame with the highest predicted probability as the final classification result, with $\hat{l} \in L$ the predicted frame category label.
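A minimal PyTorch sketch of this frame-aware aggregation and classification is given below; all dimensions are placeholders, and the MLP depth is our choice. The module returns logits, so applying softmax gives $P$ and its argmax gives $\hat{l}$.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FrameAwareClassifier(nn.Module):
    """Sketch of the frame-aware attention aggregation (Section 3.2.3)."""
    def __init__(self, d_h=512, d_a=256, d_phi=128, n_frames=7):
        super().__init__()
        self.W_h = nn.Linear(d_h, d_a, bias=False)
        self.W_f = nn.Linear(d_phi, d_a, bias=False)
        self.v = nn.Parameter(torch.randn(n_frames, d_a))
        self.mlp = nn.Sequential(nn.Linear(n_frames * d_h, d_h),
                                 nn.ReLU(), nn.Linear(d_h, n_frames))

    def forward(self, H, Phi):        # H: (n, d_h); Phi: (7, d_phi)
        reps = []
        for i in range(Phi.size(0)):
            # e_ij = v_i^T tanh(W_h h_j + W_f Φ_{l_i})
            e = torch.tanh(self.W_h(H) + self.W_f(Phi[i])) @ self.v[i]
            alpha = F.softmax(e, dim=0)       # attention weights α_ij
            reps.append(alpha @ H)            # s_i = Σ_j α_ij h_j
        S = torch.cat(reps)                   # S = [s_1; ...; s_7]
        return self.mlp(S)                    # logits; P = softmax(logits)
```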
5. Loss Function and Optimization Strategy
In multi-class classification tasks, we use the cross-entropy loss function to measure the difference between the predicted probability distribution and the true label distribution:

$$L = -\frac{1}{N} \sum_{x=1}^{N} \sum_{y=1}^{7} l_{xy} \log(P_{xy})$$

where $N$ is the number of training samples, $l_{xy} \in \{0, 1\}$ is the true label of the $x$-th sample on the $y$-th frame, and $P_{xy}$ is the predicted probability of the $x$-th sample on the $y$-th frame.
To minimize the loss function, we use the Adam [53] optimization algorithm as implemented in the PyTorch library [54]. The Adam update rule is as follows:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$
$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad \theta_t = \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $m_t$ and $v_t$ are the first and second moment estimates of the gradient, respectively, $\beta_1$ and $\beta_2$ are the decay rates, $\hat{m}_t$ and $\hat{v}_t$ are the bias-corrected estimates, $\theta_t$ is the value of the model parameters at step $t$, $\alpha$ is the learning rate, and $\epsilon$ is a smoothing term.
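In PyTorch terms, this objective and optimizer amount to the following; the model is a stand-in and the hyperparameters are illustrative rather than the paper's tuned values.

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 7)                  # stand-in for the NFRM classifier
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
criterion = nn.CrossEntropyLoss()          # L = -(1/N) Σ_x Σ_y l_xy log P_xy

features = torch.randn(32, 512)            # a batch of text representations
labels = torch.randint(0, 7, (32,))        # gold frame indices
optimizer.zero_grad()
loss = criterion(model(features), labels)
loss.backward()
optimizer.step()                           # Adam applies the m_t, v_t updates
```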
In the training phase, we train the model on a news dataset containing annotated frame labels and use a validation set for hyperparameter tuning to select the optimal hyperparameter configuration. In the testing phase, we evaluate the news frame recognition performance of the model on the test set using metrics such as accuracy, precision, recall, and F1 score for each category. Our method combines the text semantic feature representation automatically extracted by deep neural networks with specially designed frame semantic feature extraction modules, enabling effective mining of the semantic features of different frames in news texts and performing news frame recognition and classification accordingly.

3.3. A Summarization Model Incorporating Framework Information (FrameSum)

This paper proposes a summarization generation model called FrameSum that incorporates framework information to generate high-quality summaries that balance informativeness, topic relevance, and framework consistency. The overall architecture of the model is based on the transformer encoder–decoder framework [22], with the integration of framework information to enhance summary quality.
The encoder employs a multi-layer self-attention structure of the transformer to encode the news text into semantic representations (as shown in Figure 3). Similarly, the decoder uses a multi-layer self-attention structure of the transformer to auto-regressively generate the next word based on the encoder’s output and the previously generated summary sequence (as shown in Figure 4). In both the encoder and decoder, we incorporate the framework information of the news to guide the summary generation process in focusing on the dominant framework and key content.

3.3.1. Encoder Design Incorporating Framework Information

1. Frame Feature Representation
Through the automatic framework identification model, the news text $t$ obtains its framework label $\hat{l} \in L$. To transform the discrete framework label into a continuous vector representation, we map $\hat{l}$ through the word embedding matrix to obtain a $d_f$-dimensional framework embedding vector $f$, as follows:

$$f = \mathrm{Embedding}_{frame}(\hat{l})$$

To introduce the semantic features of the framework into the encoder, we also need to transform the semantic feature vector of the framework into a fixed-dimensional embedding representation. Specifically, we map the semantic feature vector $\Phi$ corresponding to framework $\hat{l}$ through a linear transformation layer to obtain a $d_f$-dimensional semantic embedding vector $s$:

$$s = W_s \Phi + b_s$$

where $W_s \in \mathbb{R}^{d_f \times d_{\Phi}}$ and $b_s \in \mathbb{R}^{d_f}$ are learnable parameters. Finally, we concatenate $f$ and $s$ to obtain the framework information representation $r = [f, s]$ of the news text $t$.
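A small sketch of this construction follows; the values of $d_f$ and $d_{\Phi}$ are placeholders.

```python
import torch
import torch.nn as nn

d_f, d_phi = 64, 128
frame_embedding = nn.Embedding(7, d_f)    # Embedding_frame over 7 labels
proj = nn.Linear(d_phi, d_f)              # s = W_s Φ + b_s

l_hat = torch.tensor([2])                 # predicted framework label
Phi = torch.randn(1, d_phi)               # its semantic feature vector
f = frame_embedding(l_hat)                # framework embedding f
s = proj(Phi)                             # semantic embedding s
r = torch.cat([f, s], dim=-1)             # r = [f, s], shape (1, 2·d_f)
```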
2. Framework-Aware Attention Mechanism
To enable the encoder to extract key information based on the framework type of the news text, we introduce the framework information $r$ to modulate the attention weights in each self-attention layer of the transformer encoder. For the $l$-th layer and the $i$-th position, the attention weight $\alpha_{ij}^{l}$ to the $j$-th position is calculated as follows:

$$e_{ij}^{l} = \frac{(h_i^l W_q^l + r_i W_r^l)(h_j^l W_k^l)^{\top}}{\sqrt{d_k}} \quad (i, j = 1, 2, \ldots, n), \qquad \alpha_{ij}^{l} = \frac{\exp(e_{ij}^{l})}{\sum_{k=1}^{n} \exp(e_{ik}^{l})}, \qquad \tilde{h}_i^{l} = \sum_{j=1}^{n} \alpha_{ij}^{l}\, h_j^l W_v^l$$

where $W_r^l \in \mathbb{R}^{2 d_f \times d_k}$ is a learnable parameter and $d_k$ is the dimension of the attention head. By adding the news framework representation $r_i W_r^l$ to the query vector $h_i^l W_q^l$, the attention mechanism captures semantic relationships while considering the framework attributes of the current news.
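A single-head version of this modulated attention can be sketched as follows; all dimensions are placeholders and biases are omitted.

```python
import torch
import torch.nn.functional as F

n, d_model, d_k, d_r = 128, 512, 64, 128   # d_r = 2·d_f
h = torch.randn(n, d_model)                # layer-l hidden states
r = torch.randn(d_r)                       # framework representation
W_q, W_k = torch.randn(d_model, d_k), torch.randn(d_model, d_k)
W_r = torch.randn(d_r, d_k)
W_v = torch.randn(d_model, d_model)

q = h @ W_q + r @ W_r                      # frame-modulated queries
e = (q @ (h @ W_k).T) / d_k ** 0.5         # scores e_ij^l
alpha = F.softmax(e, dim=-1)               # weights α_ij^l
h_out = alpha @ (h @ W_v)                  # updated hidden states
```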
3. Text Representation Incorporating Framework Information
In the last layer $L$ of the transformer encoder, the hidden state $h_i^L$ at the $i$-th position is concatenated with the framework representation $r_i$ to obtain the final text representation incorporating framework information:

$$D_i = [h_i^L, r_i]$$

As the output of the encoder, $D_i$ incorporates not only the semantic content at the $i$-th position but also the attribute information of the news framework.

3.3.2. Decoder with Framework Information

1. Framework-Guided Decoding Strategy
In the transformer decoder, the framework representation $r$ of the current news is used as an additional input to guide the decoding process toward framework-related content. Specifically, at each time step in the decoder, the attention distribution $\alpha_t \in \mathbb{R}^n$ between the current hidden state $s_t$ and $r$ is first calculated through the attention mechanism:

$$e_{ti} = \frac{(s_t W_q)(r_i W_k)^{\top}}{\sqrt{d_k}} \quad (i = 1, 2, \ldots, n), \qquad \alpha_t = \mathrm{softmax}(e_t)$$

where $W_q, W_k \in \mathbb{R}^{d_{model} \times d_k}$ are learnable attention parameters and $d_k$ is the dimension of the attention head.
The context vector $c_t \in \mathbb{R}^{d_{model}}$ at the current time step is obtained by a weighted summation of $r$ using the attention distribution $\alpha_t$:

$$c_t = \sum_{i=1}^{n} \alpha_{ti} r_i$$
Then, $c_t$ is concatenated with the current decoder hidden state $s_t$ and passed through a feed-forward neural network to generate the output representation $o_t \in \mathbb{R}^{d_{model}}$ at the current time step:

$$o_t = \mathrm{FFN}([s_t; c_t])$$
Finally, we use the output representation $o_t$ to predict the vocabulary distribution $p_t \in \mathbb{R}^{|V|}$ at the current time step:

$$p_t = \mathrm{softmax}(o_t W_v + b_v)$$

where $W_v \in \mathbb{R}^{d_{model} \times |V|}$ and $b_v \in \mathbb{R}^{|V|}$ are learnable output parameters and $|V|$ is the vocabulary size.
Through the above framework-guided decoding strategy, the decoder can selectively focus on relevant content based on the framework attributes of the text during the summary generation process, thereby generating summaries that better align with the original theme.
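One decoding step under this strategy might look as follows; the dimensions and vocabulary size are placeholders, and $r$ is treated as a sequence of per-position representations following the notation above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, d_k, vocab, n = 512, 64, 30000, 128
W_q, W_k = torch.randn(d_model, d_k), torch.randn(d_model, d_k)
ffn = nn.Linear(2 * d_model, d_model)      # FFN over [s_t; c_t]
out_proj = nn.Linear(d_model, vocab)       # o_t W_v + b_v

s_t = torch.randn(d_model)                 # current decoder hidden state
r = torch.randn(n, d_model)                # framework representations r_i

e_t = ((s_t @ W_q) @ (r @ W_k).T) / d_k ** 0.5   # scores e_ti
alpha_t = F.softmax(e_t, dim=-1)                 # attention α_t over r
c_t = alpha_t @ r                                # context vector c_t
o_t = ffn(torch.cat([s_t, c_t]))                 # output representation o_t
p_t = F.softmax(out_proj(o_t), dim=-1)           # vocabulary distribution p_t
```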
2. Coverage Loss
To improve the coverage of the summary over the news content, we introduce a coverage loss $L_{cov}$ during training. First, we define the coverage vector $g^t \in \mathbb{R}^n$, which represents the accumulated attention of the decoder over each position of the original text in the first $t$ time steps:

$$g^t = \sum_{\tau=1}^{t} \alpha^{\tau}$$
The element-wise minimum of the attention distribution $\alpha_t$ at the current time step and the coverage vector $g^{t-1}$ is accumulated as the coverage loss:

$$L_{cov}^{t} = \sum_{i=1}^{n} \min(\alpha_{ti}, g_i^{t-1}), \qquad L_{cov} = \frac{1}{T} \sum_{t=1}^{T} L_{cov}^{t}$$
where T is the total number of time steps in the decoder. By minimizing the coverage loss, the coverage of the summary over the original content can be improved, generating more comprehensive and complete summaries.
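The coverage loss can be computed directly from the stacked attention distributions, as in this sketch:

```python
import torch

def coverage_loss(alphas: torch.Tensor) -> torch.Tensor:
    """Sketch: alphas is a (T, n) stack of decoder attention distributions."""
    T, n = alphas.shape
    g = torch.zeros(n)                      # coverage vector g^0 = 0
    total = torch.tensor(0.0)
    for t in range(T):
        # Σ_i min(α_ti, g_i^{t-1}) penalizes re-attending to covered positions.
        total = total + torch.minimum(alphas[t], g).sum()
        g = g + alphas[t]                   # g^t = g^{t-1} + α_t
    return total / T                        # L_cov
```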
3. Framework Relevance Loss
To ensure consistency between the generated summary and the original text in terms of the dominant framework, we introduce a framework relevance loss $L_{fr}$ during training. Let $y = (y_1, y_2, \ldots, y_m)$ be the generated summary sequence. Using the pre-trained automatic framework identification model from Section 3.2.3, we perform framework prediction on the summary to obtain its probability distribution $q \in \mathbb{R}^7$ over the frameworks:

$$q = \mathrm{FrameClassifier}(y)$$
The framework distribution $q$ of the summary is compared with the framework label $\hat{l} \in L$ of the original news text to calculate the framework relevance loss:

$$L_{fr} = -\sum_{k=1}^{7} l_k \log(q_k)$$

where $l_k$ is the label of the original text on the $k$-th framework (1 indicates belonging to that framework, 0 otherwise) and $q_k$ is the predicted probability of the summary on the $k$-th framework. By minimizing the framework relevance loss, the thematic relevance of the summary can be improved, highlighting the core content of the original text.
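In code, this reduces to a cross-entropy between the classifier's distribution over the generated summary and the source's frame label, e.g.:

```python
import torch

def frame_relevance_loss(q: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
    """Sketch: q is the (7,) frame distribution the pre-trained classifier
    assigns to the generated summary; l is the (7,) one-hot frame label of
    the source article."""
    return -(l * torch.log(q + 1e-12)).sum()   # L_fr = -Σ_k l_k log(q_k)
```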
4. Cross-Entropy Loss
In the summary generation task, we use the cross-entropy loss to optimize the generation quality of the model. Let $z = (z_1, z_2, \ldots, z_n)$ be the reference summary sequence, where $z_i$ denotes the $i$-th token of the reference summary. At each time step $t$, the model predicts the probability distribution $p_t$ of the next token based on the input sequence and the previously generated tokens. We want $p_t$ to be as close as possible to the true token distribution and therefore compute the cross-entropy loss:

$$L_{ce} = -\frac{1}{n} \sum_{t=1}^{n} \log p_t(z_t)$$
The final loss function is a weighted sum of the cross-entropy loss, coverage loss, and framework relevance loss:

$$L = L_{ce} + \lambda_1 L_{cov} + \lambda_2 L_{fr}$$

where $\lambda_1$ and $\lambda_2$ are balancing factors used to control the weights of the different loss terms.
Through the above transformer encoder and decoder that incorporate framework information, as well as the constraints of the coverage loss and framework relevance loss functions, high-quality summaries can be generated that consider informativeness, key frameworks, and semantic coherence.

4. Experiments

4.1. Datasets and Evaluation Metrics

4.1.1. Datasets

For our experiments, we selected three datasets: CNN/Daily Mail, BBCnews, and XSum datasets. The CNN/Daily Mail dataset, widely employed for evaluating text summarization tasks, comprises a vast collection of online news articles and their corresponding reference summaries from CNN and the Daily Mail. The BBCnews dataset, sourced from the British Broadcasting Corporation (BBC), encompasses a diverse range of news reports covering a broad spectrum of topics and events. The XSum dataset, consisting of BBC news articles paired with single-sentence summaries, is specifically designed for extremely abstractive summarization tasks. The BBCnews and XSum datasets were primarily used to assess the model’s generalization capabilities and its ability to generate concise summaries. In our experiments, we extracted 3000 news samples from the CNN/Daily Mail dataset and annotated them with frameworks. These samples are representative and diverse in terms of topics, information sources, and reporting styles. The scale and relevant information of the datasets are shown in Table 1 and Table 2, respectively.

4.1.2. Evaluation Metrics

We employ ROUGE (Recall-Oriented Understudy for Gisting Evaluation) [55] as an automatic evaluation metric to calculate the n-gram overlap between the generated summaries and the reference summaries. We report the F1 scores for ROUGE-1, ROUGE-2, and ROUGE-L.
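As a concrete example, ROUGE F1 scores can be computed with Google's rouge-score package (one common implementation; the paper does not state which it used):

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
scores = scorer.score("the cat sat on the mat",        # reference summary
                      "a cat was sitting on the mat")  # generated summary
print({k: round(v.fmeasure, 3) for k, v in scores.items()})  # F1 per metric
```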
To complement the automatic evaluation metrics, we formed a panel of three experts to manually evaluate the summaries generated by different models. These experts were selected based on their extensive experience in the fields of news journalism and natural language processing. Specifically, the panel consisted of:
(1) a senior editor with over 15 years of experience at a major news organization;
(2) a professor of computational linguistics with a focus on text summarization;
(3) a researcher specializing in news framing analysis with over 10 years of experience.
Each expert independently rated the summaries on a scale of 1–5 (1 being the worst and 5 being the best) across three dimensions: readability, information coverage, and framework relevance. To ensure the objectivity and consistency of the ratings, we provided the experts with detailed guidelines and conducted a calibration session before the actual evaluation. The final manual evaluation scores were calculated as the arithmetic mean of all expert ratings.

4.2. News Framing Identification Experiments

4.2.1. Experimental Setup

For this experiment, we used the manually annotated dataset constructed in Section 3.2.2 and randomly divided it into training, validation, and test sets in an 8:1:1 ratio. We employed the BERT model as the encoder and added a 7-class fully connected network on top of it as the framing classifier. The model was trained end-to-end using the cross-entropy loss function and the Adam optimizer, with a batch size of 32 and an initial learning rate of 2 × 10−5. All experiments were conducted on a server equipped with NVIDIA Tesla V100 GPUs. The training process for our NFRM model took approximately 3 h on this hardware configuration.
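This setup corresponds roughly to the following sketch with the Hugging Face transformers library; the checkpoint name is our assumption, since the paper does not specify one.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-uncased")
classifier = nn.Linear(bert.config.hidden_size, 7)   # 7-class framing head
optimizer = torch.optim.Adam(
    list(bert.parameters()) + list(classifier.parameters()), lr=2e-5)

batch = tokenizer(["Officials traded blame over the spill."],
                  return_tensors="pt", truncation=True)
cls = bert(**batch).last_hidden_state[:, 0]   # [CLS] representation
logits = classifier(cls)                      # framing logits for 7 classes
```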

4.2.2. Baseline Models

To comprehensively evaluate the performance of the framing identification model, we set up the following baseline methods:
(1) NBM (naive Bayes multinomial): A probabilistic graphical model based on Bayes' theorem that estimates the conditional probability of each category through word frequency statistics and classifies according to the maximum a posteriori probability criterion.
(2) SVM (support vector machine) [12]: A classic discriminative model that seeks the maximum-margin hyperplane separating samples of different classes in a high-dimensional space. We used a linear kernel function in our experiments.
(3) TF-IDF+LR: Converts text into a TF-IDF-weighted bag-of-words representation and uses L2-regularized logistic regression for classification.
(4) RNN: Employs a bidirectional LSTM network for sequence modeling of the text and maps the hidden state of the last time step to class labels through a fully connected layer.
(5) Transformer: A text encoding model based on self-attention mechanisms that captures global semantic information through positional encoding and multi-head attention. We used a 6-layer transformer encoder in our experiments.
(6) BERT: Performs classification directly with the pre-trained BERT model, using the output vector of the [CLS] token as the classification feature.

4.2.3. Experimental Results and Analysis

Table 3 shows the performance of various models on the news framing identification dataset. It can be observed that deep learning-based methods generally outperform traditional machine learning methods. Among them, our proposed classification model that incorporates framing semantic features achieves the best performance on all metrics with a clear leading advantage.
In comparison, traditional machine learning models such as NBM, SVM, and TF-IDF+LR perform poorly on all metrics, with NBM achieving an F1 score of only 0.31. This may be because these models cannot effectively capture the semantic information and structural features of the text. Although the deep learning-based RNN and transformer models show some improvement in accuracy and F1 score, a considerable gap remains compared to BERT.
NFRM demonstrates excellent performance advantages in the news framing identification task, which can be attributed to the following two points: First, the model incorporates rich framing semantic knowledge and characterizes the framing elements of news reports from multiple perspectives. Second, it employs the powerful pre-trained language model BERT as the encoder, enabling the model to fully extract deep semantic information from the text.
In addition to model comparisons, we also designed ablation experiments to examine the impact of different framing features on framing classification. Specifically, we removed one category of framing features at a time and evaluated the changes in model performance. As shown in Figure 5, regardless of which category of features is removed, the F1 score of NFRM decreases to varying degrees, with the responsibility feature being the most significant. This indicates that causal attribution information is crucial for framing identification. In fact, many news reports analyze causes and identify responsible parties when describing events, which forms an important semantic basis for framing construction. Additionally, removing the morality feature also leads to a substantial performance loss, which is related to the frequent presence of moral judgments and value judgments in news reports.
In contrast, removing the fact, conflict, human interest, economic consequences, or leadership features has a relatively smaller impact on model performance, seemingly suggesting that these features contribute less than the responsibility and morality features. However, this does not mean that these features are dispensable. Although the influence of removing a single category of features is limited, model performance declines more markedly when multiple categories of features are removed simultaneously. The construction of news frames is the result of the joint action of various semantic elements, and we need to capture the interactive influence and holistic nature of framing elements from multiple perspectives.

4.3. News Text Summarization Generation Experiments

4.3.1. Experimental Setup

The FrameSum model is implemented on the encoder–decoder architecture and trained and tested on a high-performance server with NVIDIA Tesla V100 SXM2 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) rented from the AutoDL platform. Both the encoder and decoder adopt an 8-layer transformer structure, with 8 attention heads and a hidden dimension of 1024 in the feed-forward network.
The model uses the Adam optimizer with an initial learning rate of 2 × 10−5 and a batch size of 32. All models are trained for 10 epochs, and an early stopping strategy and dropout regularization are employed to avoid overfitting.
The total training time for the FrameSum model, including all 10 epochs, was approximately 18 h on our hardware setup. However, due to the early stopping strategy, the actual training time varied slightly between runs, typically concluding after 7–8 epochs.

4.3.2. Baseline Models

To comprehensively evaluate the effectiveness of our proposed method, we selected the following baseline models for comparison:
(1) Lead [56]: A simple rule-based text summarization method. It directly extracts the first few sentences of the original document as the summary, without any semantic understanding or generation process.
(2) LexRank [57]: A graph-based sentence ranking algorithm that treats sentences as nodes and selects the most relevant sentences as the summary based on node scores.
(3) TextRank [15]: A graph-based ranking algorithm used for keyword extraction and document summarization in natural language processing.
(4) T5 [58]: An advanced natural language processing model based on the transformer architecture, designed to cast all NLP problems into a text-to-text format. We fine-tuned T5 on the summarization dataset.
(5) Flat transformer (FT) [59]: A transformer model that handles flat text sequences after merging multiple documents. It concatenates all documents into a single input sequence and uses the standard self-attention mechanism to capture and integrate information for generating summaries.
(6) Pointer-generator (PG) [60]: A pointer-generator network that mitigates the out-of-vocabulary (OOV) problem through a copying mechanism and introduces a coverage loss to avoid repetition.
(7) GPT-2 [61]: An autoregressive language model developed by OpenAI that conditions on all previously generated content when generating text.
(8) Llama3 [62]: A large pre-trained language model developed by Meta that demonstrates excellent performance on various language tasks and can generate high-quality text.
(9) Qwen [63]: A large language model developed by Alibaba that uses advanced natural language processing techniques to provide services such as grammar correction, style improvement, and content generation.

4.3.3. Experimental Results and Analysis

Our evaluation of the FrameSum model and baseline methods consists of both automatic metrics and expert assessments. We first present the results of the automatic evaluation using ROUGE metrics, followed by a detailed analysis of the manual evaluation conducted by our expert panel. This combination of quantitative and qualitative assessments provides a comprehensive view of the performance and quality of the generated summaries.
1. Automatic Evaluation Results
Table 4, Table 5 and Table 6 present the ROUGE metrics of each model on the CNN/Daily Mail, BBCnews, and XSum datasets, respectively. The results demonstrate that our proposed frame-aware summarization model (FrameSum) outperforms all baseline methods across all metrics, with statistically significant improvements (p < 0.01).
Specifically, the three unsupervised methods, Lead, LexRank, and TextRank, perform relatively poorly. Lead simply extracts sentences from the beginning of the article, ignoring the semantic information of the entire text. Although LexRank and TextRank consider the semantic relationships between sentences, they fail to capture the overall thematic framework of the article. The pointer-generator alleviates the OOV problem through a copying mechanism and introduces coverage loss to avoid repetition, thereby achieving better results than the unsupervised methods. However, its performance is still significantly lower than FrameSum.
T5, flat transformer, and GPT-2 represent the application of pre-trained language models in text summarization tasks. Compared to the pointer-generator, these models can better model the semantic information of long documents and generate more coherent and fluent summaries. However, due to the lack of explicit modeling of the thematic framework of news reports, the summaries generated by these models are often not concise enough and lack prominent themes.
Llama3 is a pre-trained language model with a larger number of parameters, possessing stronger language understanding and generation capabilities. Experimental results show that the fine-tuned Llama3 surpasses other pre-trained models in automatic evaluation metrics but still lags behind FrameSum. This indicates that merely increasing the model size cannot completely solve the issue of topic deviation in news summarization.
Qwen, which is widely deployed as a commercial AI writing assistant, excels in the readability of summaries, often generating more accessible and easy-to-understand text. However, from the perspective of information coverage, the summaries generated by Qwen often omit some important details and lack alignment with the thematic framework of the original text.
Compared to the aforementioned baseline models, FrameSum achieves the best performance on all metrics. On the CNN/Daily Mail dataset, FrameSum’s ROUGE-1, ROUGE-2, and ROUGE-L scores are 0.64, 0.41, and 1.65 percentage points higher than the best baseline (Llama3), respectively. On the BBCnews dataset, the corresponding metric improvements are 0.41, 0.31, and 0.07 percentage points. On the XSum dataset, the corresponding metric improvements are 1.35, 4.78, and 2.44 percentage points. These results fully demonstrate that by introducing news framing knowledge and performing explicit modeling, FrameSum can better grasp the main ideas of articles and generate concise and thematically prominent summaries. Moreover, FrameSum makes targeted improvements based on pre-trained language models, inheriting their strong language expression capabilities while compensating for their deficiencies in thematic summarization.
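For readers who wish to reproduce the automatic metrics, the snippet below is a minimal sketch of a ROUGE-1/2/L computation using Google's rouge-score package; the paper itself reports scores from the standard ROUGE toolkit [55], so the package here is a convenient stand-in, and the example texts are taken from Table 7.

```python
# Minimal sketch of a ROUGE-1/2/L computation with Google's rouge-score
# package (pip install rouge-score), used here as a stand-in for the
# original ROUGE toolkit cited in [55]. Example texts come from Table 7.
from rouge_score import rouge_scorer

reference = ("Actress Reese Witherspoon, 35, was struck by an 84-year-old driver "
             "while jogging in Santa Monica. She suffered minor injuries but is "
             "not pressing charges.")
candidate = ("Reese Witherspoon, 35, was hit by an 84-year-old driver while "
             "jogging in Santa Monica. She suffered minor injuries but is not "
             "pressing charges.")

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
for name, result in scores.items():
    # The F-measure is the figure reported in Tables 4-6.
    print(f"{name}: F1 = {result.fmeasure:.4f}")
```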
2. Manual Evaluation Results
As introduced in the Evaluation Metrics section, we invited a panel of three experts to manually evaluate the quality of summaries generated by different models. Figure 6 shows the average scores given by the experts for each model in three dimensions: readability, information coverage, and framework relevance.
From the figure above, it can be observed that the summaries generated by Qwen have the highest readability, benefiting from its language style optimization function aimed at the general public. However, Qwen’s performance in information coverage and framework relevance is average, possibly because it focuses more on the readability of summaries while neglecting information completeness. Llama3’s metrics are relatively balanced, but there is still a gap compared to human-written summaries.
In contrast, FrameSum is on par with Qwen and Llama3 in terms of readability and information coverage but far surpasses them in framework relevance, achieving a high score of 4.35. This indicates that the summaries generated by FrameSum are not only easy to understand and information rich but also closely aligned with the core themes of news reports, accurately summarizing the central content of the articles. The above results fully validate the positive role of incorporating framing knowledge in improving summary quality.
3. Case Analysis
To intuitively showcase the strengths and weaknesses of the summaries generated by each model, Table 7 presents a news report along with a manually written reference summary and the summaries generated by each model.
From Table 7, it can be observed that the summaries generated by different models exhibit notable differences in content coverage and emphasis. The sentences extracted by Lead are from the beginning of the article, and although they provide background information about the news, they do not cover the subsequent important content, such as the driver being cited. The summaries generated by LexRank and TextRank are concise and cover the core information but lack some important details, such as Witherspoon’s age and her decision not to press charges. The summaries by pointer-generator and GPT-2 are relatively comprehensive and contain most of the key information, but they differ in details, such as whether they mention Witherspoon’s desire to move forward. The summaries by T5 and flat transformer are clear and concise but may overlook some minor details, such as the specific date of the incident. The summaries by Llama3 and Qwen achieve a good balance between information coverage and conciseness, including most of the key information.
In contrast, the summary generated by FrameSum not only covers the key elements of the event (such as the individuals involved, age, location, injuries, and follow-up actions) but also captures Witherspoon’s attitude of wanting to move forward. This summary maintains conciseness while accurately grasping the core content of the news.
4. Ablation Study
To investigate the role of each key component in the frame-aware summarization model, we conducted an ablation study. Table 8 shows the ROUGE metrics of the model under different configurations on the CNN/Daily Mail dataset.
After removing the frame embedding (-w/o frame emb.), the ROUGE scores of the model decrease significantly, indicating that frame label information helps the model grasp the theme of the article. Further removing the frame features (-w/o frame feat.) on this basis leads to a further decline in model performance, suggesting that frame-related semantic features can provide important clues for summary generation.
Additionally, we examined the impact of different loss functions. Removing the coverage loss (-w/o cov. loss) results in a decrease in ROUGE scores, indicating that coverage loss helps improve the information coverage of summaries. Removing the frame relevance loss (-w/o frame rel. loss) has the greatest impact on model performance, demonstrating that this loss term plays a crucial role in generating summaries with clear themes and consistent framing.
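To make the two ablated loss terms concrete, the following PyTorch sketch shows one plausible way to combine them with the base generation loss. The coverage term follows the formulation of See et al. [60]; the frame relevance term shown here (a cross-entropy between the predicted frame of the generated summary and the source article's dominant frame label) and the weights lambda_cov and lambda_frame are illustrative assumptions, not the paper's exact definitions.

```python
# Hedged PyTorch sketch of the ablated loss terms. The coverage loss
# follows See et al. [60]; the frame relevance term is a plausible
# stand-in, not necessarily the paper's exact formulation.
import torch
import torch.nn.functional as F

def coverage_loss(attention, coverage):
    # attention, coverage: (batch, src_len) at one decoding step.
    # Penalize attending again to already-covered source positions.
    return torch.sum(torch.min(attention, coverage), dim=1).mean()

def frame_relevance_loss(summary_frame_logits, source_frame_label):
    # summary_frame_logits: (batch, num_frames); source_frame_label: (batch,)
    return F.cross_entropy(summary_frame_logits, source_frame_label)

def total_loss(nll, attention, coverage, frame_logits, frame_label,
               lambda_cov=1.0, lambda_frame=1.0):
    # nll: the base negative log-likelihood of the reference summary.
    return (nll
            + lambda_cov * coverage_loss(attention, coverage)
            + lambda_frame * frame_relevance_loss(frame_logits, frame_label))
```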
Together, these ablation results validate the effectiveness of each module in FrameSum and further confirm the positive significance of incorporating framing knowledge in improving the quality of news summaries, providing new ideas for the development of intelligent text processing systems.

5. Conclusions

This paper innovatively introduces the framing theory from news communication into the field of natural language processing for the task of news text summarization. We propose an automatic news text summarization model based on framing theory. By integrating the two subtasks of news framing identification and summary generation, the model can automatically extract framing elements from news reports and generate high-quality summaries that are factual, comprehensive, and frame relevant. Experiments on the standard CNN/Daily Mail dataset, as well as the BBCnews and XSum test sets, show that our proposed method significantly outperforms existing text summarization baseline models on various evaluation metrics, confirming the applicability and effectiveness of framing theory in explaining and guiding news report summarization. Moreover, manual evaluation also indicates that incorporating framing semantics not only helps improve summary quality but also better aligns with human cognitive habits, showing broad prospects in practical application scenarios such as news aggregation, public opinion analysis, and decision support.
While our experimental results demonstrate significant improvements in news text summary quality using our framing theory-based model, we acknowledge certain limitations in our approach. Firstly, the currently available manually annotated framing corpora are relatively limited in scale, and the high dependence on training data may limit the generalization ability of the model. Future work needs to explore more efficient and economical framing annotation methods, such as crowdsourcing annotation, few-shot learning, unsupervised corpus mining, etc., to expand the scale of available framing corpora. Secondly, the existing frame-aware summarization models focus more on improving the automatic evaluation metrics of summaries and still lack in-depth explanations of the internal logic and causal mechanisms of summary generation.
Future work should draw on the latest advances in explainable artificial intelligence to characterize feature representations in the framing semantic space, surface framing reasoning chains, and introduce human-computer interaction feedback mechanisms that let users actively adjust framing preferences, ultimately achieving explainable and controllable intelligent summarization. It could also explore the integration of reinforcement learning techniques, which have shown promise in optimizing complex systems [64,65,66], to further enhance the adaptability and efficiency of our frame-aware summarization model.

Author Contributions

Methodology, X.Z.; validation, Q.W., B.Z. and J.L.; formal analysis, Q.W. and B.Z.; investigation, X.Z.; data curation, Q.W. and J.L.; writing—original draft preparation, X.Z. and Q.W.; writing—review and editing, X.Z. and P.Z.; visualization, Q.W.; supervision, P.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2022YFC3302100).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study utilized three publicly available datasets: The CNN/Daily Mail dataset (non-anonymous version) is available at https://github.com/abisee/cnn-dailymail, accessed on 10 April 2024. The BBC News Summary dataset can be accessed on Kaggle at https://www.kaggle.com/datasets/pariza/bbc-news-summary, accessed on 13 June 2024. The XSum dataset is available in the Edinburgh NLP GitHub repository at https://github.com/EdinburghNLP/XSum/tree/master/XSum-Dataset, accessed on 13 June 2024. Additionally, the data repository contains the manually annotated 3000 news samples from the CNN/Daily Mail dataset used for training and evaluating the news framing identification model proposed in this study. Detailed descriptions of the data preprocessing, annotation guidelines, and the specific train/validation/test splits are provided in the Methods section of the paper.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Appendix A

In Section 3.2.2, we discussed the definition and identification of news text framing, as shown in Table A1.
Table A1. General framing annotation guidelines.
Fact Frame (Label: $l_1$)
[Definition] Objectively presents the basic facts of the event, including the cause, process, involved parties, time, location, and other elements, answering “what happened”.
[Identification Points] Describes facts in an objective and neutral language; explains the background information such as time, place, and characters (5W1H) of the event; includes key aspects such as the cause, process, and result of the event; usually used at the beginning of news reports to provide context.
[Example] On November 2nd, a terrorist attack occurred in the city center, causing at least ten civilian casualties.

Responsibility Frame (Label: $l_2$)
[Definition] Explores the causes attributed to the occurrence of the event, identifies the responsible parties, and comments on or questions the dereliction of duty by the responsible parties.
[Identification Points] Analyzes the causes of the event; identifies the responsible parties; criticizes or holds the responsible parties accountable for their dereliction of duty; discusses how to prevent similar events from happening again.
[Example] The government was accused of failing to effectively prevent the terrorist attack, sparking questions and blame regarding the government’s security policies.

Conflict Frame (Label: $l_3$)
[Definition] Highlights the evident divergence, opposition, and conflicts between the parties involved in the event.
[Identification Points] Identifies the different interest groups involved in the event; describes the opposing views or interest demands between different interest groups; emphasizes the complexity and controversial nature of the event; commonly seen in event reports involving public policies or the interests of different groups.
[Example] The terrorist attack triggered tensions between the government and counter-terrorism departments, with the government demanding stricter security measures while the opposition accused the government of neglecting counter-terrorism efforts.

Human Interest Frame (Label: $l_4$)
[Definition] Presents vivid details and reveals individual fates by describing individuals involved in the event, evoking emotional resonance with the audience.
[Identification Points] Selects individuals from the event for a special feature; vividly describes the experiences, psychological states, and emotional changes of individuals; rich in details and dramatic tension; evokes sympathy and empathy from the audience.
[Example] A survivor recounted his fear and despair during the terrorist attack, as well as the painful experience of losing loved ones.

Economic Consequence Frame (Label: $l_5$)
[Definition] Focuses on the impact of the event on the economic domain, including aspects such as costs, benefits, and budgets.
[Identification Points] Analyzes the economic impact of the event on related industries, businesses, or individuals; involves economic indicators such as data and amounts; discusses economic issues such as cost control, benefit improvement, and budget balancing; commonly seen in reports involving public policies and business operations.
[Example] The terrorist attack not only led to a sharp decline in business activities in the city center but also caused economic losses and a decrease in investor confidence.

Morality Frame (Label: $l_6$)
[Definition] Evaluates the event and related behaviors from the perspective of moral ethics, involving values, social norms, etc.
[Identification Points] Judges the behaviors of the parties involved from a moral perspective; discusses the value conflicts or moral dilemmas reflected by the event; provokes societal reflection and discussion on moral choices and ethical principles; commonly seen in hot social events and criminal cases.
[Example] Various sectors of society called on the government to strengthen counter-terrorism measures and condemned terrorist acts as a serious challenge to human values and moral norms.

Leadership Frame (Label: $l_7$)
[Definition] Focuses on the performance of leaders or managers in responding to the event, evaluating their leadership, decision-making, and control abilities.
[Identification Points] Reports on the words and actions of managers and decision-makers in the event; analyzes the response strategies and solutions of leaders; evaluates the abilities and determination demonstrated by leaders in the event; reflects the public’s expectations of leaders.
[Example] Government leaders issued statements condemning the terrorist attack, pledging to take all necessary measures to ensure public safety, demonstrating the leaders’ firm resolve and leadership capabilities.
We adopted the seven general frame types summarized by De Vreese. In the actual annotation process, annotators only need to determine the most prominent and dominant frame type for each report, without identifying secondary frames. This aims to focus on the core narrative structure of each report, simplify the annotation process, and ensure the clarity and consistency of the annotation results.
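For implementation purposes, the seven-way label set of Table A1 can be encoded as a simple mapping; the sketch below is one such encoding (the dictionary itself is ours, but the frame names and labels follow Table A1).

```python
# Seven-way frame label set from Table A1; annotators assign exactly one
# dominant label per report.
FRAME_LABELS = {
    "l1": "Fact Frame",
    "l2": "Responsibility Frame",
    "l3": "Conflict Frame",
    "l4": "Human Interest Frame",
    "l5": "Economic Consequence Frame",
    "l6": "Morality Frame",
    "l7": "Leadership Frame",
}
```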

Appendix B

In this appendix, we provide a detailed explanation of the feature extraction process for each frame type. For named entity recognition, sentiment analysis, and dependency parsing tasks across all relevant frame types, we utilize the spaCy library, which provides robust and efficient natural language processing capabilities.
1. Fact Frame Feature Extraction
(1) Keyword extraction: Define part-of-speech patterns $P = \{p_1, p_2, \ldots, p_m\}$ for factual keywords, such as $p_1$ = “who/WP”, $p_2$ = “what/WP”, etc. Match these patterns against the tokenized and part-of-speech tagged text, and extract the words $w_j$ at the positions $j$ matched by each pattern $p_i$ as the set of factual keywords $K_i$. The keyword feature vector is represented as $E_{keyword} = [K_1, K_2, \ldots, K_m]$.
(2) Named entity recognition: Perform named entity recognition to obtain the set of named entities $E = \{e_1, e_2, \ldots, e_k\}$. For each named entity type $c_i$, calculate its occurrence frequency $f_i$ in the set $E$, giving $F = [f_1, f_2, \ldots, f_l]$, where $l$ is the number of named entity types. The named entity feature vector can be represented as $E_{entity} = [E, F]$.
(3) Numerical information statistics: Use regular expressions to match numerical information in the news text, obtaining the set of numerical information $N = \{n_1, n_2, \ldots, n_q\}$. For each type $u_i$ of numerical information, calculate its occurrence frequency $g_i$ in the set $N$, giving $G = [g_1, g_2, \ldots, g_r]$, where $r$ is the number of numerical information types. The numerical information feature vector can be represented as $E_{numeric} = [N, G]$.
(4) Factual statement extraction: Extract the subject–verb–object structure of each sentence $s_i$ in the text and determine whether it is a factual statement. Extract the set of factual statements $D = \{d_1, d_2, \ldots, d_v\}$ and calculate the proportion of factual statements $r = |D| / |S|$, where $|S|$ is the total number of sentences in the text. The factual statement feature vector can be represented as $E_{statement} = [D, r]$.
Combine the above four types of features into the semantic feature vector of the fact frame: $\Phi_{fact} = [E_{keyword}, E_{entity}, E_{numeric}, E_{statement}]$.
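The sketch below illustrates three of these fact-frame features with spaCy's en_core_web_sm pipeline; the part-of-speech patterns $P$ are omitted and the subject–verb–object test for factual statements is deliberately crude, so treat all pattern sets here as simplified placeholders rather than the paper's full definitions.

```python
# Illustrative sketch of fact-frame feature extraction with spaCy's
# en_core_web_sm pipeline; pattern sets are simplified placeholders.
import re
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

def fact_frame_features(text: str) -> dict:
    doc = nlp(text)
    # E_entity: named-entity type frequencies F = [f_1, ..., f_l].
    entity_freq = Counter(ent.label_ for ent in doc.ents)
    # E_numeric: numeric mentions N matched by a simple regex.
    numbers = re.findall(r"\d+(?:[.,]\d+)?", text)
    # E_statement: proportion r = |D| / |S| of sentences that contain an
    # explicit subject and object, used as a crude factual-statement test.
    sents = list(doc.sents)
    svo = [s for s in sents
           if any(t.dep_ == "nsubj" for t in s)
           and any(t.dep_ in ("dobj", "obj") for t in s)]
    return {
        "entity_freq": dict(entity_freq),
        "num_numeric": len(numbers),
        "factual_ratio": round(len(svo) / max(len(sents), 1), 3),
    }

print(fact_frame_features(
    "On November 2nd, an attack in the city center injured at least ten civilians."))
```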
2. Responsibility Frame Feature Extraction
(1) Causal relation extraction: Perform dependency parsing on the text to obtain dependency structure triples of the form (head word, dependency relation, dependent word). Based on dependency relation patterns $M = \{m_1, m_2, \ldots, m_p\}$, such as $m_1$ = (“nsubj”, “dobj”, “cause”) and $m_2$ = (“nsubj”, “prep”, “pobj”), match the “cause–effect” semantic relation pairs $(e_i, e_j)$ and determine the causal relation type $c_k \in C$ $(k = 1, 2, \ldots, n)$ they belong to. Calculate the frequency $f_k$ of each causal relation type $c_k$, giving $E_{causality} = [f_1, f_2, \ldots, f_n]$.
(2) Sentiment analysis: Using the sentiment analysis module provided by spaCy, we perform sentiment analysis on the events $e_i$ and $e_j$, obtaining scores $s_i$ and $s_j$. The analyzer classifies text into positive, negative, or neutral categories, providing a comprehensive understanding of the emotional tone associated with each event. The sentiment feature of the causal relation pair $(e_i, e_j)$ is $s_{i,j} = s_i + s_j$. Calculate the average sentiment score $\bar{s}$ of all relation pairs as the overall sentiment orientation feature, giving $E_{sentiment} = [\bar{s}, s_{1,2}, s_{1,3}, \ldots, s_{i,j}, \ldots]$.
(3) Responsibility keyword statistics: Construct a responsibility keyword dictionary $D = \{w_1, w_2, \ldots, w_m\}$, including words such as “responsibility”, “accountability”, “investigate”, etc. Calculate the occurrence frequency $t_{i,k}$ of the $k$-th dictionary keyword for each event $e_i$. For each causal relation pair $(e_i, e_j)$, calculate the sum of keyword frequencies $t_{ij,k} = t_{i,k} + t_{j,k}$. The keyword feature vector is represented as $E_{keywords} = [t_{1,1}, t_{1,2}, \ldots, t_{ij,k}, \ldots]$.
(4) Context information encoding: Extract the contextual information $u_i$, such as the time and location of each event $e_i$, giving $U = \{u_1, u_2, \ldots, u_l\}$. Convert each piece of contextual information into a feature vector $\mathbf{u}_i$; the contextual information feature matrix is then represented as $E_{context} = [\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_l]^T$.
Concatenate the above four types of feature vectors to obtain the semantic feature vector of the responsibility frame: $\Phi_{responsibility} = [E_{causality}, E_{sentiment}, E_{keywords}, E_{context}]$.
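A simplified sketch of these responsibility-frame features follows; surface causal cue words stand in for the dependency patterns $M$, and the keyword dictionary $D$ is a small illustrative sample in lemma form.

```python
# Simplified sketch of the responsibility-frame features. Surface causal
# cue words stand in for the dependency patterns M, and the keyword
# dictionary D is a small illustrative sample (lemma forms).
import spacy

nlp = spacy.load("en_core_web_sm")

CAUSAL_CUES = {"because", "cause", "due", "lead", "result"}
RESPONSIBILITY_TERMS = {"responsibility", "accountability", "investigate",
                        "blame", "accuse", "fail"}

def responsibility_frame_features(text: str) -> dict:
    doc = nlp(text)
    causal_hits = sum(tok.lemma_.lower() in CAUSAL_CUES for tok in doc)
    keyword_hits = sum(tok.lemma_.lower() in RESPONSIBILITY_TERMS for tok in doc)
    # Dependency triples (head, relation, child) that a fuller pattern
    # matcher would test against M.
    triples = [(tok.head.text, tok.dep_, tok.text) for tok in doc
               if tok.dep_ in ("nsubj", "nsubjpass", "dobj", "pobj")]
    return {"causal_cues": causal_hits,
            "responsibility_keywords": keyword_hits,
            "dependency_triples": triples[:5]}

print(responsibility_frame_features(
    "The government was accused of failing to prevent the attack."))
```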
3. Conflict Frame Feature Extraction
(1) Sentiment lexicon extraction: We use the sentiment analysis module from spaCy to identify words expressing negative emotions such as anger, frustration, and disagreement in the text. While “conflict” and “confrontation” are not emotions per se, they often indicate underlying emotional states commonly associated with conflict situations. We calculate the sentiment intensity $s_i$ of each identified word based on its context. Calculate the total number $N_s$ of negative sentiment words and the sum $S_s = \sum_{i=1}^{N_s} s_i$ of their sentiment intensities, giving $E_{sentiment} = [N_s, S_s]$.
(2) Opposing entities identification: Through named entity recognition, obtain the opposing entities $e_1$ and $e_2$. For each entity $e_i$, extract its occurrence frequency $f_i$ in the text and its key descriptive words $d_{ij}$. The opposing entities feature vector is represented as $E_{entities} = [e_1(f_1, d_{11}, d_{12}, \ldots), e_2(f_2, d_{21}, d_{22}, \ldots)]$.
(3) Argumentation pattern matching: Predefine argumentation pattern templates $J = \{j_1, j_2, \ldots, j_m\}$, such as “A argued that...” and “A and B debated...”. Calculate the number $c_i$ of matches for each pattern template $j_i$ in the text. The argumentation pattern feature vector is represented as $E_{patterns} = [c_1, c_2, \ldots, c_m]$.
(4) Rhetorical device analysis: Define a set of rhetorical devices $R = \{r_1, r_2, \ldots, r_n\}$ that express conflict, such as $r_1$ = “metaphor”, $r_2$ = “hyperbole”, etc., and design corresponding recognition rules for each rhetorical device $r_i$. For example, metaphor can be recognized through “is like” and “is a”, while hyperbole can be recognized through “most”, “always”, etc. Calculate the occurrence frequency $t_i$ of each rhetorical device in the news text. The rhetorical device feature vector is represented as $E_{rhetoric} = [t_1, t_2, \ldots, t_n]$.
Combine the above four types of feature vectors into the semantic feature vector of the conflict frame: $\Phi_{conflict} = [E_{sentiment}, E_{entities}, E_{patterns}, E_{rhetoric}]$.
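An illustrative sketch of the conflict-frame features follows. Note that spaCy's stock English pipelines do not ship a sentiment component, so a tiny negative-word lexicon (in lemma form) stands in for the sentiment module described above, and the argumentation templates $J$ are reduced to two sample regexes.

```python
# Illustrative conflict-frame sketch; the lexicon and templates are
# small stand-ins for the dictionaries described in the text.
import re
from collections import Counter

import spacy

nlp = spacy.load("en_core_web_sm")

NEGATIVE_LEMMAS = {"anger", "angry", "clash", "accuse", "oppose",
                   "tension", "dispute", "conflict"}
ARGUMENT_PATTERNS = [r"\b\w+ argued that\b", r"\b\w+ and \w+ debated\b"]

def conflict_frame_features(text: str) -> dict:
    doc = nlp(text)
    neg_count = sum(tok.lemma_.lower() in NEGATIVE_LEMMAS for tok in doc)
    # Opposing entities e_1, e_2: the two most frequent actor mentions.
    actors = Counter(ent.text for ent in doc.ents
                     if ent.label_ in ("PERSON", "ORG", "GPE"))
    pattern_counts = [len(re.findall(p, text)) for p in ARGUMENT_PATTERNS]
    return {"negative_word_count": neg_count,
            "top_actors": actors.most_common(2),
            "argument_pattern_counts": pattern_counts}

print(conflict_frame_features(
    "The opposition accused the government of neglecting counter-terrorism, "
    "and tension between the two sides grew."))
```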
4. Human Interest Frame Feature Extraction
(1) Character sketch identification: For each paragraph $p_i$ in the text, extract person names, pronouns, and other character references $e_{ij}$, and calculate the density of character references in each paragraph, $d_i = \sum_j e_{ij} / \mathrm{length}(p_i)$, where $\mathrm{length}(p_i)$ is the length of the paragraph $p_i$ (its number of tokens). Set a character sketch density threshold $\theta$, and select the paragraphs $P_{bio} = \{p_i \mid d_i > \theta\}$ whose density exceeds the threshold as character sketch paragraphs. The character sketch feature vector is represented as $E_{bio} = [\,|P_{bio}|,\ |P_{bio}| / |P|\,]$.
(2) Sentiment lexicon extraction: Perform sentiment analysis on each paragraph $p_i$ and extract sentiment words $w_{ij}$ to form the sentiment lexicon set $W_i$. Calculate the sentiment lexicon density of each paragraph, $s_i = |W_i| / \mathrm{length}(p_i)$. The sentiment lexicon feature vector is represented as $E_{emo} = [\,\sum_i |W_i| / \sum_i \mathrm{length}(p_i),\ \max(s_i),\ \mathrm{avg}(s_i)\,]$.
(3) Character perspective analysis: Extract the subject $sub_{ij}$ of each sentence in paragraph $p_i$ and determine whether each subject is a character reference. Calculate the proportion of character subjects, $r_i = |\{sub_{ij} \mid sub_{ij} \in e_{ij}\}| / |\{sub_{ij}\}|$. Set a character perspective proportion threshold $\delta$, and select the paragraphs $P_{pov} = \{p_i \mid r_i > \delta\}$ whose proportion exceeds the threshold as character perspective paragraphs. The character perspective feature vector is represented as $E_{pov} = [\,|P_{pov}|,\ |P_{pov}| / |P|\,]$.
(4) Rhetorical device detection: Define a set of rhetorical devices $R = \{r_1, r_2, \ldots, r_m\}$ and design corresponding recognition rules $t_{kl}$ for each rhetorical device $r_k$, giving $T_k = \{t_{k1}, t_{k2}, \ldots\}$. Use the rule set $T_k$ to detect whether each paragraph $p_i$ contains the rhetorical device $r_k$, and obtain the occurrence count $c_{ik}$. The rhetorical device feature vector is represented as $E_{rhe} = [\,\sum_i c_{i1}, \sum_i c_{i2}, \ldots, \sum_i c_{im}\,]$.
Combine the above four types of feature vectors to obtain the semantic feature vector of the human interest frame: $\Phi_{human} = [E_{bio}, E_{emo}, E_{pov}, E_{rhe}]$.
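The character-sketch density $d_i$ can be illustrated as follows; the pronoun list and the threshold $\theta$ are illustrative choices, not the paper's tuned values.

```python
# Sketch of the character-sketch density d_i: the share of tokens in a
# paragraph that refer to people (PERSON entities or personal pronouns).
import spacy

nlp = spacy.load("en_core_web_sm")
PERSONAL_PRONOUNS = {"he", "she", "him", "her", "his", "hers", "they", "them"}

def character_density(paragraph: str) -> float:
    doc = nlp(paragraph)
    refs = sum(tok.ent_type_ == "PERSON" or tok.lower_ in PERSONAL_PRONOUNS
               for tok in doc)
    return refs / max(len(doc), 1)

paragraphs = [
    "A survivor recounted his fear during the attack and the pain of "
    "losing loved ones. He said he still cannot sleep.",
    "The city council approved a new security budget for the district.",
]
theta = 0.08  # illustrative density threshold
P_bio = [p for p in paragraphs if character_density(p) > theta]
print(f"{len(P_bio)} of {len(paragraphs)} paragraphs flagged as character sketches")
```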
5. Economic Consequence Frame Feature Extraction
(1) Economic lexicon extraction: Construct an economic domain vocabulary dictionary $D_{eco} = \{w_1, w_2, \ldots, w_n\}$, including economic terms, financial terms, etc. Calculate the TF-IDF weight of each dictionary word $w_i$ in the text and use it as the economic relevance of that word. Select the top $k$ words with the highest economic relevance as the economic lexicon features $E_{term} = [w_1, w_2, \ldots, w_k]$.
(2) Economic data statistics: Define a set of economic data types $T_{data} = \{t_1, t_2, \ldots, t_p\}$, such as GDP, CPI, unemployment rate, etc. Design a corresponding regular expression template $r_i$ for each data type $t_i$, and extract economic data values $v_{ij}$ from the news text based on the templates to form the economic data features $E_{data} = \{(t_i, v_{ij})\}$.
(3) Economic impact analysis: Extract event entities $e_i$ from the news text, such as “new policy introduced”, “company bankruptcy”, etc. Determine whether there is a causal relationship between each event entity $e_i$ and the economic entities $eco_j$, and obtain the causal relationship set $C = \{(e_i, eco_j, rel_{ij})\}$, where $rel_{ij}$ represents the causal relationship type, such as “promote”, “inhibit”, etc. For each event entity $e_i$, calculate the number of causal relationships between it and economic entities, $n_i = |\{(e_i, eco_j, rel_{ij}) \mid (e_i, eco_j, rel_{ij}) \in C\}|$. The economic impact feature vector is represented as $E_{impact} = [n_1, n_2, \ldots, n_q]$.
(4) Industry association assessment: Define a set of industry domains $I = \{i_1, i_2, \ldots, i_r\}$ related to the economy, and construct a corresponding set of keywords $K_k = \{kw_{k1}, kw_{k2}, \ldots\}$ for each industry domain $i_k$. Calculate the co-occurrence frequency of each event entity $e_i$ with the keywords of each industry in the text, $co_{ik} = \sum_j \mathrm{count}(e_i, kw_{kj})$. The industry association feature vector is represented as $E_{industry} = [co_{11}, co_{12}, \ldots, co_{1r}, co_{21}, \ldots, co_{qr}]$, i.e., the association degree between each event entity and each industry.
Combine the above four types of features to obtain the complete semantic feature vector of the economic consequence frame: $\Phi_{economic} = [E_{term}, E_{data}, E_{impact}, E_{industry}]$.
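The TF-IDF-weighted economic lexicon $E_{term}$ can be sketched with scikit-learn as below; the dictionary $D_{eco}$ and the two-document corpus are toy placeholders for the full resources described above.

```python
# Sketch of the economic-lexicon feature: TF-IDF weights for a small
# sample economic dictionary D_eco, computed with scikit-learn.
from sklearn.feature_extraction.text import TfidfVectorizer

D_ECO = ["gdp", "inflation", "unemployment", "budget", "investment",
         "losses", "revenue"]

corpus = [
    "The attack caused heavy economic losses and shook investor confidence.",
    "Quarterly revenue rose despite a tight budget and weak investment.",
]

# Restricting the vocabulary to D_eco scores only economic terms.
vectorizer = TfidfVectorizer(vocabulary=D_ECO)
tfidf = vectorizer.fit_transform(corpus)

target_doc = 0  # score the first news text
weights = dict(zip(vectorizer.get_feature_names_out(),
                   tfidf[target_doc].toarray()[0]))
top_terms = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
print(top_terms)  # top-k economic lexicon features E_term
```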
6. Morality Frame Feature Extraction
(1) Moral lexicon extraction: Construct a moral domain vocabulary dictionary $D_{moral} = \{w_1, w_2, \ldots, w_n\}$, including words related to morality, ethics, justice, etc. Calculate the occurrence frequency $f_i$ of each dictionary word $w_i$ in the text and use it as the moral relevance of that word. Select the top $k$ words with the highest moral relevance as the moral lexicon features $E_{term} = [w_1, w_2, \ldots, w_k]$.
(2) Moral pattern matching: Define a set of moral judgment sentence patterns $P_{moral} = \{p_1, p_2, \ldots, p_s\}$, such as “is a virtue”. For each sentence $s_i$ in the text, match the sentence patterns in $P_{moral}$ to obtain $m_{ij} = \mathrm{match}(s_i, p_j)$, where $m_{ij} = 1$ indicates a successful match and $m_{ij} = 0$ a failure. The moral pattern feature vector is represented as $E_{pattern} = [m_{11}, m_{12}, \ldots, m_{1s}, m_{21}, \ldots, m_{ts}]$, where $t$ is the total number of sentences in the text.
(3) Rhetorical device analysis: Define a set of rhetorical devices $R_{moral} = \{r_1, r_2, \ldots, r_u\}$ related to moral appeals, and design a corresponding recognition rule $alg_k$ for each rhetorical device $r_k$. For each sentence $s_i$ in the text, use the recognition rule $alg_k$ to determine whether it contains the rhetorical device $r_k$, giving $rh_{ik} = alg_k(s_i)$, where $rh_{ik} = 1$ indicates occurrence and $rh_{ik} = 0$ non-occurrence. The rhetorical device feature vector is represented as $E_{rhetoric} = [rh_{11}, rh_{12}, \ldots, rh_{1u}, rh_{21}, \ldots, rh_{tu}]$.
(4) Semantic reasoning: Construct a knowledge base $KB_{moral}$ containing moral common sense, values, etc., represented as a set of triples $(e_1, r, e_2)$, where $e_1$ and $e_2$ are entities or concepts and $r$ is the relationship between them. Extract entities, concepts, and their relationships from the text to obtain the semantic representation $G_{news} = \{(e_1, r, e_2)\}$. Use a knowledge reasoning algorithm to determine whether each triple in $G_{news}$ can be inferred from the moral knowledge base $KB_{moral}$, giving the reasoning result $inf_i = \mathrm{infer}((e_1, r, e_2))$, where $inf_i = 1$ indicates that it can be inferred and $inf_i = 0$ that it cannot. The semantic reasoning feature vector is represented as $E_{inference} = [inf_1, inf_2, \ldots, inf_q]$.
Combine the above four types of features into the complete semantic feature vector of the morality frame: $\Phi_{moral} = [E_{term}, E_{pattern}, E_{rhetoric}, E_{inference}]$.
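The moral-pattern matrix $m_{ij}$ can be sketched with plain regular expressions; the templates and the moral lexicon below are small illustrative samples, not the paper's full dictionaries.

```python
# Sketch of the moral-pattern matrix m_ij: match each sentence s_i against
# a set of moral-judgment templates P_moral.
import re

P_MORAL = [r"is a virtue", r"is (morally )?(wrong|unacceptable)",
           r"condemned .* as"]
MORAL_TERMS = {"morality", "ethics", "justice", "values", "condemned"}

def morality_frame_features(sentences: list[str]) -> dict:
    # m_ij = 1 if sentence s_i matches pattern p_j, else 0.
    pattern_matrix = [[int(bool(re.search(p, s, re.I))) for p in P_MORAL]
                      for s in sentences]
    term_freq = sum(w.strip(".,").lower() in MORAL_TERMS
                    for s in sentences for w in s.split())
    return {"pattern_matrix": pattern_matrix, "moral_term_freq": term_freq}

sentences = [
    "Society condemned the attack as a serious challenge to human values.",
    "Honesty is a virtue that the report repeatedly invokes.",
]
print(morality_frame_features(sentences))
```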
7. Leadership Frame Feature Extraction
(1) Leader identification: Obtain the set of person names $E = \{e_1, e_2, \ldots, e_n\}$ in the text through named entity recognition. Construct a leadership position dictionary $D_{leader} = \{j_1, j_2, \ldots, j_m\}$, including position names such as president, prime minister, CEO, etc. Match the dictionary $D_{leader}$ in the context of each person name $e_i$; if a match exists, mark that person as a leader. The leader feature vector is represented as $E_{leader} = [l_1, l_2, \ldots, l_n]$, where $l_i = 1$ indicates that the person $e_i$ is a leader and $l_i = 0$ that they are not.
(2) Leadership behavior extraction: Construct a leadership behavior verb dictionary $D_{action} = \{v_1, v_2, \ldots, v_p\}$, including words such as decide, instruct, encourage, etc. Perform dependency parsing on the text, extract verbs $v_j$ and their subjects, and determine whether the verbs are in the dictionary $D_{action}$ and whether their subjects are leaders. Calculate the frequency of leadership behavior verbs corresponding to each leader $e_i$, giving $E_{action} = [a_{11}, a_{12}, \ldots, a_{1p}, a_{21}, \ldots, a_{np}]$, where $a_{ij}$ represents the co-occurrence frequency of the leader $e_i$ and the leadership behavior verb $v_j$.
(3) Leadership ability evaluation: Construct a leadership ability evaluation dictionary $D_{ability} = \{w_1, w_2, \ldots, w_q\}$, including adjectives such as strong, excellent, incompetent, etc. Use an opinion mining model to identify evaluative sentences in the text, and extract the evaluation object, evaluation word, and sentiment polarity. For each leader $e_i$, calculate the frequency of positive and negative leadership ability evaluation words related to them, giving $E_{ability} = [pos_1, neg_1, pos_2, neg_2, \ldots, pos_n, neg_n]$, where $pos_i$ and $neg_i$ represent the frequency of positive and negative evaluation words for the leader $e_i$, respectively.
(4) Rhetorical device analysis: Define a set of rhetorical devices $R_{leader} = \{r_1, r_2, \ldots, r_s\}$ used to describe leaders, and design a corresponding recognition rule $alg_k$ for each rhetorical device $r_k$. For each sentence $s_i$ in the text, use the recognition rule $alg_k$ to determine whether it contains the rhetorical device $r_k$, giving $rh_{ik} = alg_k(s_i)$, where $rh_{ik} = 1$ indicates occurrence and $rh_{ik} = 0$ non-occurrence. The rhetorical device feature vector is represented as $E_{rhetoric} = [rh_{11}, rh_{12}, \ldots, rh_{1s}, rh_{21}, \ldots, rh_{ts}]$.
Combine the above four types of features into the semantic feature vector of the leadership frame: $\Phi_{leader} = [E_{leader}, E_{action}, E_{ability}, E_{rhetoric}]$.
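Leader identification and leadership-behavior extraction can be sketched as follows; $D_{leader}$ and $D_{action}$ are small sample dictionaries, and the context window size is an illustrative choice.

```python
# Sketch of leader identification (position title near a PERSON entity)
# and leadership-behavior extraction (leader as subject of an action verb).
import spacy

nlp = spacy.load("en_core_web_sm")

D_LEADER = {"president", "prime minister", "ceo", "minister", "leader"}
D_ACTION = {"decide", "instruct", "encourage", "pledge", "order"}

def leadership_frame_features(text: str, window: int = 5) -> dict:
    doc = nlp(text)
    leaders = set()
    for ent in doc.ents:
        if ent.label_ != "PERSON":
            continue
        # Check a small token window around the name for a position title.
        start = max(ent.start - window, 0)
        end = min(ent.end + window, len(doc))
        context = doc[start:end].text.lower()
        if any(title in context for title in D_LEADER):
            leaders.add(ent.text)
    # Leadership verbs whose grammatical subject is an identified leader.
    actions = [(tok.text, tok.head.lemma_) for tok in doc
               if tok.dep_ == "nsubj"
               and any(tok.text in name for name in leaders)
               and tok.head.lemma_ in D_ACTION]
    return {"leaders": sorted(leaders), "leader_actions": actions}

print(leadership_frame_features(
    "President Smith pledged to take all necessary measures, and Smith "
    "ordered a review of security policy."))
```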

References

  1. IDC. Expect 175 Zettabytes of Data Worldwide by 2025. Networkworld. 2018. Available online: https://www.networkworld.com/article/3325397/idc-expect-175-zettabytes-of-data-worldwide-by-2025.html (accessed on 25 July 2024).
  2. Mohsin, M.; Latif, S.; Haneef, M.; Tariq, U.; Khan, M.A.; Kadry, S.; Choi, J.I. Improved Text Summarization of News Articles Using GA-HC and PSO-HC. Appl. Sci. 2021, 11, 10511. [Google Scholar] [CrossRef]
  3. Singh, R.K.; Khetarpaul, S.; Gorantla, R.; Allada, S.G. SHEG: Summarization and headline generation of news articles using deep learning. Neural Comput. Appl. 2021, 33, 3251–3265. [Google Scholar] [CrossRef]
  4. Liu, Y.; Zhu, C.; Zeng, M. End-to-end segmentation-based news summarization. arXiv 2021, arXiv:2110.07850. [Google Scholar]
  5. Ma, C.; Zhang, W.E.; Wang, H.; Gupta, S.; Guo, M. Dependency Structure for News Document Summarization. arXiv 2021, arXiv:2109.11199. [Google Scholar]
  6. Huang, Y.H.; Lan, H.Y.; Chen, Y.S. Unsupervised Text Summarization of Long Documents using Dependency-based Noun Phrases and Contextual Order Arrangement. In Proceedings of the 34th Conference on Computational Linguistics and Speech Processing (ROCLING 2022), Taipei, Taiwan, 18–19 November 2022; pp. 15–24. [Google Scholar]
  7. Bateson, G. A theory of play and fantasy. Psychiatr. Res. Rep. 1955, 2, 39–51. [Google Scholar]
  8. Goffman, E. Frame Analysis: An Essay on the Organization of Experience; Harvard University Press: Cambridge, MA, USA, 1974. [Google Scholar]
  9. Heidenreich, T.; Lind, F.; Eberl, J.; Boomgaarden, H.G. Media Framing Dynamics of the ‘European Refugee Crisis’: A Comparative Topic Modelling Approach. J. Refug. Stud. 2019, 32, i172–i182. [Google Scholar] [CrossRef]
  10. Nassar, R. Framing refugees: The impact of religious frames on U.S. partisans and consumers of cable news media. Polit. Commun. 2020, 37, 593–611. [Google Scholar] [CrossRef]
  11. Calabrese, C.; Anderton, B.N.; Barnett, G.A. Online representations of “Genome Editing” uncover opportunities for encouraging engagement: A semantic network analysis. Sci. Commun. 2019, 41, 222–242. [Google Scholar]
  12. Burscher, B.; Odijk, D.; Vliegenthart, R.; de Rijke, M.; de Vreese, C.H. Teaching the computer to code frames in news: Comparing two supervised machine learning approaches to frame analysis. Commun. Methods Meas. 2014, 8, 190–206. [Google Scholar] [CrossRef]
  13. Eisele, O.; Heidenreich, T.; Litvyak, O.; Boomgaarden, H.G. Capturing a News Frame–Comparing Machine-Learning Approaches to Frame Analysis with Different Degrees of Supervision. Commun. Methods Meas. 2023, 17, 205–226. [Google Scholar]
  14. Luhn, H.P. The automatic creation of literature abstracts. IBM J. Res. Dev. 1958, 2, 159–165. [Google Scholar] [CrossRef]
  15. Mihalcea, R.; Tarau, P. TextRank: Bringing order into text. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 25–26 July 2004; pp. 404–411. [Google Scholar]
  16. Barzilay, R.; Elhadad, M. Using Lexical Chains for Text Summarization. In Advances in Automatic Text Summarization; MIT Press: Cambridge, MA, USA, 1997; pp. 111–121. [Google Scholar]
  17. Nallapati, R.; Zhou, B.; Ma, M. Classify or select: Neural architectures for extractive document summarization. arXiv 2016, arXiv:1611.04244. [Google Scholar]
  18. Yasunaga, M.; Zhang, R.; Meelu, K.; Pareek, A.; Srinivasan, K.; Radev, D. Graph-based neural multi-document summarization. In Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, BC, Canada, 3–4 August 2017; pp. 452–462. [Google Scholar]
  19. Rush, A.M.; Chopra, S.; Weston, J. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 379–389. [Google Scholar]
  20. Chopra, S.; Auli, M.; Rush, A.M. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the ACL: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 93–98. [Google Scholar]
  21. Nallapati, R.; Zhou, B.; Santos, C.N.; Gulcehre, C.; Xiang, B. Abstractive text summarization using sequence-to-sequence RNNs and beyond. In Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, 11–12 August 2016; pp. 280–290. [Google Scholar]
  22. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  23. Anthropic. The Claude 3 Model Family: Opus, Sonnet, Haiku. Available online: https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf (accessed on 25 July 2024).
  24. Google. Gemini: A Family of Highly Capable Multimodal Models. Available online: https://assets.bwbx.io/documents/users/iqjWHBFdfxIU/r7G7RrtT6rnM/v0 (accessed on 25 July 2024).
  25. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774. [Google Scholar]
  26. OpenAI. Introducing ChatGPT. Available online: https://openai.com/blog/chatgpt (accessed on 25 July 2024).
  27. Baidu. The Report of Ernie Bot. Available online: https://aistudio.baidu.com/aistudio/projectdetail/5748979 (accessed on 25 July 2024).
  28. Wang, P.; Yang, A.; Men, R.; Lin, J.; Bai, S.; Li, Z.; Ma, J.; Zhou, C.; Zhou, J.; Yang, H. OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022. [Google Scholar]
  29. Maynez, J.; Narayan, S.; Bohnet, B.; Karamanolakis, G. On Faithfulness and Factuality in Abstractive Summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1906–1919. [Google Scholar]
  30. Sotudeh, S.; Gharebagh, S.S.; Goharian, N. TLDR: Extreme Summarization of Scientific Documents. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 4766–4777. [Google Scholar]
  31. Dou, Z.Y.; Liu, P.; Hayashi, H.; Jiang, Z.; Neubig, G. GSum: A General Framework for Guided Neural Abstractive Summarization. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 4830–4842. [Google Scholar]
  32. Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The Long-Document Transformer. arXiv 2020, arXiv:2004.05150. [Google Scholar]
  33. Gitlin, T. The Whole World Is Watching: Mass Media in the Making and Unmaking of the New Left; University of California Press: Berkeley, CA, USA, 1980; pp. 6–7. [Google Scholar]
  34. Gamson, W.A.; Modigliani, A. Media discourse and public opinion on nuclear power: A constructionist approach. Am. J. Sociol. 1989, 95, 1–37. [Google Scholar] [CrossRef]
  35. Entman, R.M. Framing: Toward clarification of a fractured paradigm. J. Commun. 1993, 43, 51–58. [Google Scholar] [CrossRef]
  36. Tuchman, G. Making News: A Study in the Construction of Reality; Free Press: New York, NY, USA, 1978. [Google Scholar]
  37. Zang, G.R. News Media and News Sources: A Discourse on Media Framing and Reality Construction; San Min Book Co., Ltd.: Taipei, Taiwan, 1999. [Google Scholar]
  38. Huang, D. The Image of the Messenger: The Construction and Deconstruction of Journalistic Professionalism; Fudan University Press: Shanghai, China, 2005. [Google Scholar]
  39. Cacciatore, M.A.; Scheufele, D.A.; Iyengar, S. The End of Framing as we Know it … and the Future of Media Effects. Mass Commun. Soc. 2016, 19, 7–23. [Google Scholar] [CrossRef]
  40. Pan, Z.D. Frame Analysis: A Field in Urgent Need of Theoretical Clarification. Commun. Soc. 2006, 1, 17–46. [Google Scholar]
  41. Krippendorff, K. Content Analysis: An Introduction to Its Methodology; Sage Publications: Thousand Oaks, CA, USA, 2018. [Google Scholar]
  42. Fairclough, N. Analysing Discourse: Textual Analysis for Social Research; Psychology Press: London, UK, 2003. [Google Scholar]
  43. Ryan, M.L. Narrative as Virtual Reality; Johns Hopkins University Press: Baltimore, MD, USA, 2001; pp. 357–359. [Google Scholar]
  44. Semetko, H.A.; Valkenburg, P.M. Framing European politics: A content analysis of press and television news. J. Commun. 2000, 50, 93–109. [Google Scholar] [CrossRef]
  45. Zhang, P.W. Research on News Framing of Public Health Emergencies. Master’s Thesis, University of Electronic Science and Technology of China, Chengdu, China, 2022. [Google Scholar]
  46. Iyengar, S. Is Anyone Responsible?: How Television Frames Political Issues; University of Chicago Press: Chicago, IL, USA, 1994. [Google Scholar]
  47. De Vreese, C.H. Framing Europe: Television News and European Integration; Aksant: Amsterdam, The Netherlands, 2003. [Google Scholar]
  48. Lawlor, A.; Tolley, E. Deciding who’s legitimate: News media framing of immigrants and refugees. Int. J. Commun. 2017, 11, 25. [Google Scholar]
  49. Walter, D.; Ophir, Y. News frame analysis: An inductive mixed-method computational approach. Commun. Methods Meas. 2019, 13, 248–266. [Google Scholar] [CrossRef]
  50. Valkenburg, P.M.; Semetko, H.A.; De Vreese, C.H. The effects of news frames on readers’ thoughts and recall. Commun. Res. 1999, 26, 550–569. [Google Scholar] [CrossRef]
  51. Tong, J. Environmental risks in newspaper coverage: A framing analysis of investigative reports on environmental problems in 10 Chinese newspapers. Environ. Commun. 2014, 8, 345–367. [Google Scholar] [CrossRef]
  52. Honnibal, M.; Montani, I. spaCy: Industrial-Strength Natural Language Processing in Python. Available online: https://spacy.io (accessed on 21 August 2024).
  53. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  54. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 8024–8035. [Google Scholar]
  55. Lin, C.Y. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the ACL-04 Workshop on Text Summarization Branches Out, Barcelona, Spain, 25–26 July 2004. [Google Scholar]
  56. Wasson, M. Using leading text for news summaries: Evaluation results and implications for commercial summarization applications. In Proceedings of the 17th International Conference on Computational Linguistics, Montreal, QC, Canada, 10–14 August 1998; pp. 1364–1368. [Google Scholar]
  57. Erkan, G.; Radev, D.R. LexRank: Graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 2004, 22, 457–479. [Google Scholar] [CrossRef]
  58. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
  59. Xiao, T.; Xu, C.; Wu, H.; Ji, Z.; Wang, C.; Zhou, H.Y. Flat Transformer for Long Document Summarization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; pp. 11545–11552. [Google Scholar]
  60. See, A.; Liu, P.J.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1073–1083. [Google Scholar]
  61. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
  62. MetaAI. Llama3-Model Card. 2024. Available online: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3 (accessed on 25 July 2024).
  63. Qwen Team, Alibaba Group. Qwen Technical Report. Available online: https://qianwen-res.oss-cn-beijing.aliyuncs.com/QWEN_TECHNICAL_REPORT.pdf (accessed on 25 July 2024).
  64. Yan, M.; Xiong, R.; Wang, Y.; Li, C. Edge Computing Task Offloading Optimization for a UAV-assisted Internet of Vehicles via Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2024, 73, 5647–5658. [Google Scholar] [CrossRef]
  65. Yan, M.; Luo, M.; Chan, C.A.; Gygax, A.F.; Li, C.; Chih-Lin, I. Energy-Efficient Content Fetching Strategies in Cache-Enabled D2D Networks via an Actor-Critic Reinforcement Learning Structure. IEEE Trans. Veh. Technol. 2024; early access. [Google Scholar] [CrossRef]
  66. Yan, M.; Chan, C.A.; Gygax, A.F.; Li, C.; Nirmalathas, A.; Chih-Lin, I. Efficient Generation of Optimal UAV Trajectories with Uncertain Obstacle Avoidance in MEC Networks. IEEE Internet Things J. 2024; early access. [Google Scholar] [CrossRef]
Figure 1. Workflow of the news text summarization model based on framing theory.
Figure 2. Architecture of NFRM.
Figure 3. Diagram of the encoder structure incorporating frame information.
Figure 4. Framework-guided decoder structure diagram.
Figure 5. Ablation study results.
Figure 6. Manual evaluation results.
Table 1. Scale of dataset.

Dataset                           Size
Training set (CNN/Daily Mail)     2400
Validation set (CNN/Daily Mail)   300
Test set (CNN/Daily Mail)         300
Test set (BBCnews)                450
Test set (XSum)                   450
Table 2. Information overview of the CNN/Daily Mail dataset.

                 Train     Test      Validation
Avg #tokens      807       827       814
Max #entities    199       148       234
Avg #entities    44        44        44
Vocab size       48,020    22,311    22,433
Table 3. Framing identification results.

Model               Accuracy   Precision   Recall   F1
NBM                 0.44       0.33        0.45     0.31
SVM                 0.61       0.62        0.61     0.58
TF-IDF              0.62       0.69        0.61     0.56
RNN                 0.47       0.37        0.49     0.34
Transformer         0.71       0.69        0.70     0.65
BERT                0.79       0.75        0.79     0.72
Our Model (NFRM)    0.92       0.87        0.89     0.90
Table 4. Experimental results on the CNN/Daily Mail dataset.

Model                   ROUGE-1   ROUGE-2   ROUGE-L
Lead                    34.17     12.44     31.17
LexRank                 36.75     13.54     29.89
TextRank                37.17     13.62     32.82
PG                      39.53     17.28     36.38
T5                      43.21     17.82     38.96
FT                      40.46     25.26     34.65
GPT-2                   43.21     17.82     38.96
Llama3                  44.18     26.43     37.91
Qwen                    43.74     24.68     37.62
Our model (FrameSum)    44.82     26.84     39.56
Table 5. Experimental results on the BBCnews dataset.

Model                   ROUGE-1   ROUGE-2   ROUGE-L
Lead                    56.65     44.83     55.84
LexRank                 55.75     44.83     55.00
TextRank                52.64     41.33     51.51
T5                      54.28     41.79     55.01
FT                      55.61     44.21     55.92
Llama3                  56.73     47.18     56.84
Our model (FrameSum)    57.14     47.49     56.91
Table 6. Experimental results on the XSum dataset.

Model                   ROUGE-1   ROUGE-2   ROUGE-L
Lead                    34.91     12.03     27.07
LexRank                 32.83     10.93     25.44
TextRank                35.01     11.51     31.42
T5                      38.98     13.07     29.34
FT                      40.52     14.16     29.92
Llama3                  42.13     14.51     28.41
Our model (FrameSum)    43.48     19.29     30.85
Table 7. Case study of generated summaries by different models.

Original News: (PEOPLE.com)—Reese Witherspoon wants to move on after her accident. The actress is not pressing charges against the 84-year-old driver who struck her as she was jogging Wednesday in Santa Monica, Calif., her rep tells PEOPLE. The actress, 35, suffered minor injuries and went home to recover, her rep said at the time. The driver, meanwhile, was cited for failing to yield to a pedestrian in a crosswalk.
Lead: (PEOPLE.com)—Reese Witherspoon wants to move on after her accident. The actress is not pressing charges against the 84-year-old driver who struck her as she was jogging Wednesday in Santa Monica, Calif., her rep tells PEOPLE. The actress, 35, suffered minor injuries and went home to recover, her rep said at the time.
LexRank: Reese Witherspoon was struck by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian.
TextRank: Reese Witherspoon was hit by a driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian.
PG: Reese Witherspoon, 35, was struck by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian in a crosswalk.
T5: Reese Witherspoon was hit by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian.
FT: Actress Reese Witherspoon, 35, was struck by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian in a crosswalk.
GPT-2: Reese Witherspoon, 35, was struck by an 84-year-old driver while jogging in Santa Monica on Wednesday. She suffered minor injuries and went home to recover. The actress is not pressing charges, but the driver was cited for failing to yield to a pedestrian in a crosswalk.
Llama3: Reese Witherspoon, 35, was hit by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries but is not pressing charges. The driver was cited for failing to yield to a pedestrian in a crosswalk.
Qwen: Actress Reese Witherspoon, 35, was struck by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries and is not pressing charges. The driver was cited for failing to yield to a pedestrian in a crosswalk. Witherspoon wants to move on from the incident.
FrameSum: Reese Witherspoon, 35, was hit by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries but is not pressing charges. The driver was cited for failing to yield to a pedestrian. Witherspoon wants to move on from the accident.
Reference Summary: Actress Reese Witherspoon, 35, was struck by an 84-year-old driver while jogging in Santa Monica. She suffered minor injuries but is not pressing charges. The driver was cited for failing to yield to a pedestrian in a crosswalk. Witherspoon wants to move on from the incident.
Table 8. Ablation study results.

Model                   ROUGE-1   ROUGE-2   ROUGE-L
FrameSum                44.82     19.24     39.56
-w/o frame emb.         44.23     18.79     38.91
-w/o frame feat.        43.11     18.07     38.85
-w/o cov. loss          41.57     17.62     37.40
-w/o frame rel. loss    41.96     18.23     39.17