Multi-Modal Sentiment Analysis Based on Interactive Attention Mechanism
Abstract
1. Introduction
- (1) A hierarchical multi-head attention mechanism is used to realize hierarchical extraction of data features;
- (2) A gate channel replaces the feed-forward layer in the BERT model to realize information filtering (a minimal sketch of this idea follows the list);
- (3) Information interaction between different modalities is realized through a tensor fusion model based on self-attention.
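Of the three components above, the gate channel in (2) is the easiest to picture in isolation: it stands in for the position-wise feed-forward sublayer of a standard BERT encoder layer and decides, element by element, how much candidate information is allowed through. The following PyTorch-style code is a minimal sketch of that idea only; the class names (`GatedChannel`, `GatedEncoderLayer`), the sigmoid/tanh gating form, and the layer sizes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GatedChannel(nn.Module):
    """Illustrative gate channel: a sigmoid gate filters a tanh-projected
    candidate, standing in for the feed-forward sublayer of a BERT layer."""

    def __init__(self, d_model: int = 768, d_inner: int = 3072):
        super().__init__()
        self.value = nn.Linear(d_model, d_inner)  # candidate information
        self.gate = nn.Linear(d_model, d_inner)   # decides what passes through
        self.out = nn.Linear(d_inner, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.sigmoid(self.gate(x)) * torch.tanh(self.value(x)))


class GatedEncoderLayer(nn.Module):
    """BERT-style encoder layer with the feed-forward block replaced by the
    gate channel; the multi-head self-attention sublayer is unchanged."""

    def __init__(self, d_model: int = 768, n_heads: int = 12, dropout: float = 0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.channel = GatedChannel(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.drop = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, key_padding_mask: torch.Tensor = None) -> torch.Tensor:
        attn_out, _ = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        x = self.norm1(x + self.drop(attn_out))             # residual + norm, as in BERT
        return self.norm2(x + self.drop(self.channel(x)))   # gate channel replaces the FFN
```

In this sketch the sigmoid gate does the information filtering: entries driven toward zero suppress the corresponding candidate features before they are projected back to the model dimension.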
2. Related Work
2.1. Deep Learning
2.2. Multi-Modal Sentiment Analysis Model
- (1) Feature extraction: how to use the model to extract the features of different modalities;
- (2) Feature fusion: how to fuse the features of different modalities so that cross-modal information interaction is achieved (a minimal sketch of one such fusion step follows this list).
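For the fusion stage, a natural reference point is the tensor fusion network (TFN) [5]: each modality's feature vector is padded with a constant 1, and the outer product of the padded vectors forms a joint tensor containing unimodal and cross-modal interaction terms. The sketch below follows that recipe for a text + audio pair and adds a simple attention weighting over the joint tensor; the module name `AttentiveTensorFusion` and the row-level attention used here are illustrative assumptions, not the exact self-attention-based fusion described in Section 3.3.

```python
import torch
import torch.nn as nn


class AttentiveTensorFusion(nn.Module):
    """Illustrative TFN-style fusion for two modalities with an attention
    re-weighting of the joint tensor before classification."""

    def __init__(self, d_text: int, d_audio: int, n_classes: int = 2):
        super().__init__()
        fused_dim = (d_text + 1) * (d_audio + 1)
        self.row_score = nn.Linear(d_audio + 1, 1)       # one attention score per text dimension
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, text_feat: torch.Tensor, audio_feat: torch.Tensor) -> torch.Tensor:
        # text_feat: (batch, d_text), audio_feat: (batch, d_audio), already pooled per utterance
        ones = text_feat.new_ones(text_feat.size(0), 1)
        t = torch.cat([text_feat, ones], dim=-1)           # (batch, d_text + 1)
        a = torch.cat([audio_feat, ones], dim=-1)          # (batch, d_audio + 1)
        fused = torch.bmm(t.unsqueeze(2), a.unsqueeze(1))  # outer product: (batch, d_text + 1, d_audio + 1)
        scores = self.row_score(fused).squeeze(-1)         # (batch, d_text + 1)
        weights = torch.softmax(scores, dim=-1).unsqueeze(-1)
        fused = (weights * fused).flatten(start_dim=1)     # attention-weighted joint tensor
        return self.classifier(fused)
```

Padding each vector with a constant 1 keeps the purely unimodal terms inside the outer product, so the fused tensor carries unimodal as well as bimodal interactions; with a third modality the same construction also yields trimodal terms.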
2.3. Fusion Method
3. Interactive Attention Mechanism Based on BERT Model
3.1. Hierarchical Multi-Head Self-Attention
3.2. Gated Information Channel
3.3. Tensor Fusion Method Based on Self-Attention
4. Experiment
4.1. Experimental Dataset and Preprocessing
4.2. Experimental Parameter Configuration
4.3. Data Processing
5. Analysis of Experimental Results
5.1. Analysis of Baseline Results
5.2. Ablation Study of LG-BERT
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Gove, R.; Faytong, J. Machine Learning and Event-Based Software Testing: Classifiers for Identifying Infeasible GUI Event Sequences. Adv. Comput. 2012, 86, 109–135.
- Chen, R.; Zhou, Y.; Zhang, L.; Duan, X. Word-level sentiment analysis with reinforcement learning. IOP Conf. Series Mater. Sci. Eng. 2019, 490, 062063.
- Chen, M.; Wang, S.; Liang, P.P.; Baltrušaitis, T.; Zadeh, A.; Morency, L.-P. Multi-modal sentiment analysis with word-level fusion and reinforcement learning. In Proceedings of the 19th ACM International Conference on Multi-Modal Interaction, Glasgow, UK, 13–17 November 2017; pp. 163–171.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008.
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Zadeh, A.; Chen, M.; Poria, S.; Cambria, E.; Morency, L.-P. Tensor fusion network for multimodal sentiment analysis. arXiv 2017, arXiv:1707.07250.
- He, J.; Zhao, L.; Yang, H.; Zhang, M.; Li, W. HSI-BERT: Hyperspectral Image Classification Using the Bidirectional Encoder Representation From Transformers. IEEE Trans. Geosci. Remote Sens. 2019, 58, 165–178.
- Zail, C.; Huang, K.; Wu, L.; Zhong, Z.; Jiao, Z. Relational Graph Convolutional Network for Text-Mining-Based Accident Causal Classification. Appl. Sci. 2022, 12, 2482.
- Zhao, S.; Zhang, T.; Hu, M.; Chang, W.; You, F. AP-BERT: Enhanced pre-trained model through average pooling. Appl. Intell. 2022.
- He, J.; Hu, H. MF-BERT: Multimodal Fusion in Pre-Trained BERT for Sentiment Analysis. IEEE Signal Process. Lett. 2021, 29, 454–458.
- Zhu, X.; Zhu, Y.; Zhang, L.; Chen, Y. A BERT-based multi-semantic learning model with aspect-aware enhancement for aspect polarity classification. Appl. Intell. 2022.
- Morency, L.-P.; Mihalcea, R.; Doshi, P. Towards multi-modal sentiment analysis: Harvesting opinions from the web. In Proceedings of the 13th International Conference on Multi-Modal Interfaces, Alicante, Spain, 14–18 November 2011; pp. 169–176.
- Wang, H.; Meghawat, A.; Morency, L.-P.; Xing, E.P. Select-additive learning: Improving generalization in multi-modal sentiment analysis. In Proceedings of the 2017 IEEE International Conference on Multimedia and Expo (ICME), Hong Kong, China, 10–14 July 2017; pp. 949–954.
- Kumar, A.; Vepa, J. Gated mechanism for attention-based multimodal sentiment analysis. In Proceedings of the ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4477–4481.
- Arjmand, M.; Dousti, M.J.; Moradi, H. Teasel: A transformer-based speech-prefixed language model. arXiv 2021, arXiv:2109.05522.
- Zhang, S.; Xu, X.; Pang, Y.; Han, J. Multi-layer attention based CNN for target-dependent sentiment classification. Neural Process. Lett. 2020, 51, 2089–2103.
- Zadeh, A.; Liang, P.P.; Mazumder, N.; Poria, S.; Cambria, E.; Morency, L.-P. Memory fusion network for multi-view sequential learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; Volume 32.
- Wang, A.; Cho, K. BERT has a Mouth, and It Must Speak: BERT as a Markov Random Field Language Model. arXiv 2019, arXiv:1902.04094.
- Tsai, Y.-H.H.; Liang, P.P.; Zadeh, A.; Morency, L.-P.; Salakhutdinov, R. Learning factorized multi-modal representations. arXiv 2018, arXiv:1806.06176.
- Liang, P.P.; Liu, Z.; Zadeh, A.; Morency, L.-P. Multi-modal language analysis with recurrent multistage fusion. arXiv 2018, arXiv:1808.03920.
- Pham, H.; Liang, P.P.; Manzini, T.; Morency, L.-P.; Póczos, B. Found in translation: Learning robust joint representations by cyclic translations between modalities. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 6892–6899.
- Tsai, Y.-H.H.; Bai, S.; Liang, P.P.; Kolter, J.Z.; Morency, L.-P.; Salakhutdinov, R. Multi-modal transformer for unaligned multi-modal language sequences. In Proceedings of the Association for Computational Linguistics Meeting, Florence, Italy, 28 July–2 August 2019; Volume 2019, p. 6558.
Model | Modality | ACC/% | F1/%
---|---|---|---
TFN [5] | T + A + V | 77.1 | –
LMF [6] | T + A + V | 76.4 | 75.7
GME-LSTM [3] | T + A + V | 76.5 | –
MFM [20] | T + A + V | 78.1 | 78.1
RMFN [21] | T + A + V | 78.4 | 78.0
MCTN [22] | T + A + V | 79.3 | 79.1
MulT [23] | T + A + V | 83.0 | 82.8
CM-BERT+ | T + A | 82.65 | 82.64
BERT-TA+ | T + A | 83.38 | 83.45
LG-BERT (ours) | T + A | 83.82 | 83.91
Group | Model | Modality | ACC/% | F1/% | Learning Rate
---|---|---|---|---|---
A | BERT | T | 83.67 | 83.70 | 1 × 10⁻⁵
B1 | A + Level Multi-head Attention (LM) | T | 82.07 | 82.08 | 1 × 10⁻⁵
B2 | A + Gate Channel (GC) | T | 82.22 | 82.17 | 1 × 10⁻⁵
B3 | A + Attention Fusion based on TFN (AFT) | T + A | 83.38 | 83.45 | 1 × 10⁻⁵
B4 | A + Fusion (CM-BERT) | T + A | 82.36 | 82.32 | 1 × 10⁻⁵
D | A + LM + GC + AFT | T + A | 83.82 | 83.91 | 1 × 10⁻⁵
Citation: Wu, J.; Zhu, T.; Zheng, X.; Wang, C. Multi-Modal Sentiment Analysis Based on Interactive Attention Mechanism. Appl. Sci. 2022, 12, 8174. https://doi.org/10.3390/app12168174