A Hierarchical Representation Model Based on Longformer and Transformer for Extractive Summarization
Abstract
1. Introduction
- (1)
- This study proposes a hierarchical document representation method that employs Longformer as the sentence encoder and Transformer as the document encoder. Unlike models that use CNNs (Convolutional Neural Networks) or LSTMs (Long Short-Term Memory networks) as encoders [5,6,7], the proposed model can encode long documents of up to 4096 tokens directly, owing to the Longformer sentence encoder (see the sketch after this list).
- (2)
- The encoders combine global attention and local (sliding-window) attention [8], which keeps key tokens informed by the whole document while reducing computational complexity.
- (3)
- The proposed hierarchical model achieves the best Rouge-1 and Rouge-L [9] on the combined CNN/DailyMail dataset [10]; on the long-document CNN subset it achieves state-of-the-art Rouge-1, Rouge-2, and Rouge-L, and on the shorter DailyMail subset it achieves the best Rouge-1 and Rouge-L. These results indicate that Longformer performs well as a sentence encoder on long documents.
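The following is a minimal sketch of the hierarchical encoding pipeline described in contributions (1) and (2), written in PyTorch with the Hugging Face transformers library. The module structure, sentence pooling at boundary tokens, and hyperparameter defaults are illustrative assumptions rather than the authors' released code; the paper's actual settings are given in Section 4.3.

```python
# A minimal sketch (assumptions noted above): Longformer encodes the token
# sequence of the whole document, sentence vectors are pooled at sentence
# boundaries, and a Transformer document encoder plus a sigmoid scorer
# produces a per-sentence extraction probability.
import torch
import torch.nn as nn
from transformers import LongformerModel

class LongTransExtractor(nn.Module):
    def __init__(self, d_model: int = 768, n_layers: int = 2, n_heads: int = 8):
        super().__init__()
        # Sentence encoder: pretrained Longformer, up to 4096 tokens.
        self.longformer = LongformerModel.from_pretrained("allenai/longformer-base-4096")
        # Document encoder: inter-sentence Transformer (layer count assumed).
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.doc_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Decoder: linear layer scoring each sentence for extraction.
        self.scorer = nn.Linear(d_model, 1)

    def forward(self, input_ids, attention_mask, cls_positions):
        # Global attention on sentence-boundary tokens; local sliding-window
        # attention everywhere else, as in Longformer [8].
        global_mask = torch.zeros_like(input_ids)
        global_mask.scatter_(1, cls_positions, 1)
        token_states = self.longformer(
            input_ids=input_ids,
            attention_mask=attention_mask,
            global_attention_mask=global_mask,
        ).last_hidden_state
        # Gather the hidden state at each sentence's boundary token.
        idx = cls_positions.unsqueeze(-1).expand(-1, -1, token_states.size(-1))
        sent_vecs = token_states.gather(1, idx)
        # Contextualize sentence vectors across the document, then score.
        sent_vecs = self.doc_encoder(sent_vecs)
        return torch.sigmoid(self.scorer(sent_vecs)).squeeze(-1)
```

At inference time, the highest-scoring sentences would be extracted as the summary, optionally filtered by a redundancy heuristic such as the trigram blocking referenced in the result tables below.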
2. Related Work
3. Proposed Model
3.1. Sentence Encoder
3.2. Document Encoder
3.3. Decoder
4. Experiments
4.1. Datasets
4.2. Evaluation Criteria
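ROUGE-1, ROUGE-2, and ROUGE-L [9] measure unigram overlap, bigram overlap, and longest-common-subsequence overlap between a candidate summary and a reference. A minimal sketch using Google's rouge-score package follows; the choice of implementation is an assumption, since the paper does not name one.

```python
# Hedged sketch: computing Rouge-1/2/L with the rouge-score package.
# Install with: pip install rouge-score
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cat sat on the mat ."
candidate = "a cat was sitting on the mat ."
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    # Each entry carries precision, recall, and F1; papers usually report F1.
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```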
4.3. Experimental Settings
4.4. Experimental Results and Analysis
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Luhn, H.P. The automatic creation of literature abstracts. IBM J. Res. Dev. 1958, 2, 159–165.
- Chopra, S.; Auli, M.; Rush, A.M. Abstractive sentence summarization with attentive recurrent neural networks. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 13–15 June 2016; pp. 93–98.
- Vaswani, A.; Shazeer, N.; Parmar, N. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Bahdanau, D.; Brakel, P.; Xu, K. An actor-critic algorithm for sequence prediction. arXiv 2016, arXiv:1607.07086.
- Nallapati, R.; Zhai, F.; Zhou, B. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents. In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; pp. 3075–3081.
- Dong, Y.; Shen, Y.; Crawford, E. BanditSum: Extractive summarization as a contextual bandit. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 2–4 November 2018; pp. 3739–3748.
- Beltagy, I.; Peters, M.E.; Cohan, A. Longformer: The long-document transformer. arXiv 2020, arXiv:2004.05150.
- Lin, C.-Y. ROUGE: A package for automatic evaluation of summaries. In Proceedings of Text Summarization Branches Out, Barcelona, Spain, 25–26 July 2004; pp. 74–81.
- Hermann, K.M.; Kocisky, T.; Grefenstette, E. Teaching machines to read and comprehend. Adv. Neural Inf. Process. Syst. 2015, 28, 1693–1701.
- Cho, K.; van Merriënboer, B.; Gulcehre, C. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014.
- Sutskever, I.; Vinyals, O.; Le, Q.V. Sequence to sequence learning with neural networks. Adv. Neural Inf. Process. Syst. 2014, 27, 3104–3112.
- Rush, A.M.; Chopra, S.; Weston, J. A neural attention model for abstractive sentence summarization. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal, 17–21 September 2015; pp. 379–389.
- Conroy, J.M.; O'Leary, D.P. Text summarization via hidden Markov models. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New Orleans, LA, USA, 9–13 September 2001; pp. 406–407.
- Mihalcea, R. Graph-based ranking algorithms for sentence extraction, applied to text summarization. In Proceedings of the ACL Interactive Poster and Demonstration Sessions, Barcelona, Spain, 21–26 July 2004; pp. 170–173.
- Al-Sabahi, K.; Zuping, Z.; Nadher, M. A hierarchical structured self-attentive model for extractive document summarization (HSSAS). IEEE Access 2018, 6, 24205–24212.
- Narayan, S.; Cohen, S.B.; Lapata, M. Ranking sentences for extractive summarization with reinforcement learning. In Proceedings of NAACL-HLT, New Orleans, LA, USA, 1–6 June 2018; pp. 1747–1759.
- Yao, K.; Zhang, L.; Luo, T. Deep reinforcement learning for extractive document summarization. Neurocomputing 2018, 284, 52–62.
- Liu, Y.; Lapata, M. Text summarization with pretrained encoders. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing, Hong Kong, China, 3–7 November 2019; pp. 3730–3740.
- Zhang, X.; Wei, F.; Zhou, M. HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), Florence, Italy, 28 July–2 August 2019; pp. 5059–5069.
- Wang, D.; Liu, P.; Zheng, Y. Heterogeneous graph neural networks for extractive document summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), Online, 6–8 July 2020.
- Wu, Y.; Hu, B. Learning to extract coherent summary via deep reinforcement learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 5602–5609.
- Zhong, M.; Liu, P.; Wang, D. Searching for effective neural extractive summarization: What works and what's next. In Proceedings of the 57th Conference of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019; pp. 1049–1058.
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. In Proceedings of the ICLR, Addis Ababa, Ethiopia, 26–30 April 2020.
- Ye, Z.; Guo, Q.; Gan, Q. BP-Transformer: Modelling long-range context via binary partitioning. arXiv 2019, arXiv:1911.04070.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- Cheng, J.; Lapata, M. Neural summarization by extracting sentences and words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016.
- See, A.; Liu, P.; Manning, C.D. Get to the point: Summarization with pointer-generator networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), Vancouver, BC, Canada, 30 July–4 August 2017; pp. 1073–1083.
- Haghighi, A.; Vanderwende, L. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Boulder, CO, USA, 31 May–5 June 2009; pp. 362–370.
- Vanderwende, L.; Suzuki, H.; Brockett, C. Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion. Inf. Process. Manag. 2007, 43, 1606–1618.
- Mohsen, F.; Wang, J.; Al-Sabahi, K. A hierarchical self-attentive neural extractive summarizer via reinforcement learning (HSASRL). Appl. Intell. 2020, 50, 2633–2646.
- Lebanoff, L.; Song, K.; Dernoncourt, F. Scoring sentence singletons and pairs for abstractive summarization. In Proceedings of the ACL, Florence, Italy, 28 July–2 August 2019.
Table 1. Number of documents in the training, validation, and test splits of each dataset.

Dataset | Training | Validation | Testing
---|---|---|---
CNN | 90,266 | 1220 | 1093 |
DailyMail | 196,961 | 12,148 | 10,397 |
CNN/DailyMail | 287,277 | 13,368 | 11,490 |
Table 2. Rouge scores on the combined CNN/DailyMail dataset.

Model | Rouge-1 | Rouge-2 | Rouge-L
---|---|---|---
SumBasic [30] | 34.11 | 11.13 | 31.14 |
LexRank [15] | 35.34 | 13.31 | 31.93 |
KLSumm [29] | 29.92 | 10.50 | 27.37 |
Lead-3 | 40.0 | 17.5 | 36.2 |
DQN [18] | 39.4 | 16.1 | 35.6 |
BANDITSUM [7] | 41.5 | 18.7 | 37.6 |
HSASRL [31] | 41.5 | 19.5 | 37.9 |
HSSAS [16] | 42.3 | 17.8 | 37.6 |
Refresh [17] | 40.0 | 18.2 | 36.6 |
BERT-Extr [32] | 41.13 | 18.68 | 37.75 |
HIBERT [20] | 42.37 | 19.95 | 38.83 |
HSG + Tri-Blocking [21] | 42.95 | 19.76 | 39.23 |
BERTSUMEXT + TRIBLK [19] | 43.25 | 20.24 | 39.63 |
Long-Trans-Extr (ours) | 43.78 | 19.83 | 39.71 |
Table 3. Sentence and document encoders used by each model, with Rouge scores on CNN/DailyMail.

Model | Sentence-Encoder | Document-Encoder | Rouge-1 | Rouge-2 | Rouge-L
---|---|---|---|---|---
NN-SE [27] | CNN | LSTM | 35.5 | 14.7 | 32.2 |
HSSAS [16] | Bi-LSTM + Attention | Bi-LSTM + Attention | 42.3 | 17.8 | 37.6 |
BERT-Extr [32] | BERT | — | 41.13 | 18.68 | 37.5 |
BERTSUMEXT + TRIBLK [19] | BERT | Transformer | 43.25 | 20.24 | 39.63 |
Long-Trans-Extr (ours) | Longformer | Transformer | 43.78 | 19.83 | 39.71 |
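The Tri-Blocking variants in the tables above (HSG + Tri-Blocking [21], BERTSUMEXT + TRIBLK [19]) apply trigram blocking: while assembling the summary from the highest-scoring sentences, any candidate that repeats a trigram of an already-selected sentence is skipped to reduce redundancy. A minimal sketch follows; the whitespace tokenization and three-sentence budget are illustrative assumptions.

```python
# Hedged sketch of the trigram-blocking selection heuristic [19].
from typing import List, Set, Tuple

def trigrams(text: str) -> Set[Tuple[str, str, str]]:
    # Assumption: plain whitespace tokenization, lowercased.
    toks = text.lower().split()
    return {tuple(toks[i:i + 3]) for i in range(len(toks) - 2)}

def select_with_tri_blocking(sentences: List[str], scores: List[float], k: int = 3) -> List[str]:
    selected, seen = [], set()
    # Visit sentences from highest to lowest extraction score.
    for i in sorted(range(len(sentences)), key=lambda i: -scores[i]):
        tri = trigrams(sentences[i])
        if tri & seen:  # candidate repeats a trigram already in the summary
            continue
        selected.append(i)
        seen |= tri
        if len(selected) == k:
            break
    # Restore document order for the final summary.
    return [sentences[i] for i in sorted(selected)]
```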
Table 4. Rouge scores on the CNN and DailyMail test sets separately.

Model | CNN Rouge-1 | CNN Rouge-2 | CNN Rouge-L | DailyMail Rouge-1 | DailyMail Rouge-2 | DailyMail Rouge-L
---|---|---|---|---|---|---
NN-SE [27] | 28.4 | 10.0 | 25.0 | 36.2 | 15.2 | 32.9 |
Refresh [17] | 30.4 | 11.7 | 26.9 | 41.0 | 18.8 | 37.7 |
BANDITSUM [7] | 30.7 | 11.6 | 27.4 | 42.1 | 18.9 | 38.3 |
DQN [18] | — | — | — | 41.9 | 16.5 | 33.8 |
HSASRL [31] | 30.92 | 12.2 | 27.4 | 42.88 | 20.48 | 39.71 |
Long-Trans-Extr (ours) | 33.75 | 13.11 | 30.44 | 44.89 | 20.02 | 40.82 |
Table 5. Average document length of the CNN and DailyMail datasets.

Dataset | Avg. Words per Document | Avg. Sentences per Document
---|---|---
CNN | 760.50 | 33.98 |
DailyMail | 653.33 | 29.33 |
Table 6. GPU memory usage and training time for global-only versus global + local attention.

Attention Mode | GPU Memory | Training Time per Epoch
---|---|---
Global | 7014 MB | 80.8 h |
Global + Local | 4881 MB | 55.48 h |
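The comparison above shows that replacing global attention everywhere with Longformer's global + local scheme roughly halves GPU memory and training time. In the Hugging Face implementation, the width of the local sliding window is set via `attention_window`; a minimal configuration sketch follows (the window size of 512 is the Longformer default, not a value reported in the paper).

```python
# Hedged sketch: configuring Longformer's local sliding-window attention.
from transformers import LongformerConfig, LongformerModel

config = LongformerConfig.from_pretrained("allenai/longformer-base-4096")
# One local window width per layer; 512 is the library default (assumption).
config.attention_window = [512] * config.num_hidden_layers
model = LongformerModel.from_pretrained("allenai/longformer-base-4096", config=config)
```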
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).