CharAs-CBert: Character Assist Construction-Bert Sentence Representation Improving Sentiment Classification
Abstract
1. Introduction
- When understanding a sentence, the word weights produced by BERT are not used directly to explain the sentence; instead, a slice attention enhancement network is designed to explain these behaviors, assigning higher weight coefficients to the salient words in a sentence while exploring the channel dependencies and spatial correlations of different salient words.
- Based on our sentence construction, we design a bidirectional independent recurrent neural network to explore the construction vector of sentences. It alleviates the ambiguity of the same word in different sentences, promotes long-term learning so that the network establishes effective long-term dependencies, and realizes the interaction between forward and backward semantic information, improving the model’s ability to perceive contextual details.
- A construction-based character graph convolutional network is designed to explore the internal structural information of salient words in sentences, that is, the strong correlation between adjacent characters within each salient word. These character features strengthen the construction information and improve its ability to distinguish the basic structure of sentences. Furthermore, we design a triplet loss function to better tune and optimize the network so that it learns better sentence representations (a minimal sketch of such a loss follows this list).
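The exact form of this triplet loss is not reproduced in this excerpt, so the following is only a minimal sketch of a standard triplet margin loss over sentence embeddings; the function name, margin value, and toy dimensions are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 1.0):
    # Pull same-sentiment sentence vectors together and push
    # different-sentiment vectors apart by at least `margin`.
    d_pos = F.pairwise_distance(anchor, positive)  # anchor vs. same class
    d_neg = F.pairwise_distance(anchor, negative)  # anchor vs. other class
    return F.relu(d_pos - d_neg + margin).mean()

# Toy check with 768-dimensional sentence embeddings (BERT-base width).
a, p, n = (torch.randn(8, 768) for _ in range(3))
print(triplet_loss(a, p, n))
```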
2. Related Works
3. CharAs-CBert Framework
3.1. Initial Embedding Module
3.2. Slice Attention Module (SAM)
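The Introduction summarizes SAM as channel and spatial attention over salient words. As a rough illustration under those stated assumptions (loosely following the shuffle-attention design of Zhang and Yang that the paper cites), a sliced channel-and-spatial attention over BERT token embeddings might look like the sketch below; the class name `SlicedAttention` and the slice count are hypothetical, not the authors' code.

```python
import torch
import torch.nn as nn

class SlicedAttention(nn.Module):
    """Split the hidden dimension into slices, then gate each slice with
    channel attention (which features matter) and spatial attention
    (which tokens matter)."""
    def __init__(self, hidden: int, n_slices: int = 4):
        super().__init__()
        assert hidden % n_slices == 0
        self.n_slices = n_slices
        d = hidden // n_slices
        self.channel_fc = nn.Linear(d, d)  # per-slice channel gate
        self.spatial_fc = nn.Linear(d, 1)  # per-slice token gate

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, h = x.shape                  # (batch, seq_len, hidden)
        xs = x.reshape(b, t, self.n_slices, h // self.n_slices)
        # Channel attention: average over tokens, gate each feature.
        chan = torch.sigmoid(self.channel_fc(xs.mean(dim=1)))
        xs = xs * chan.unsqueeze(1)
        # Spatial attention: score each token position within a slice,
        # so salient words receive higher weight coefficients.
        xs = xs * torch.sigmoid(self.spatial_fc(xs))
        return xs.reshape(b, t, h)

# Example: re-weight 16 BERT-base token embeddings.
out = SlicedAttention(768)(torch.randn(2, 16, 768))
```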
3.3. Bidirectional Independent Recurrent Module (BIRM)
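The Introduction describes BIRM as a bidirectional independent recurrent network built for long-term dependencies. A minimal sketch following the IndRNN recurrence of Li et al., h_t = relu(W x_t + u ⊙ h_{t-1} + b), with a simple forward/backward concatenation, could be written as below; class names and initialization are our own assumptions.

```python
import torch
import torch.nn as nn

class IndRNNCell(nn.Module):
    """One IndRNN step: h_t = relu(W x_t + u * h_{t-1} + b), where the
    recurrent weight u is element-wise, so each neuron is independent."""
    def __init__(self, d_in: int, d_hid: int):
        super().__init__()
        self.w = nn.Linear(d_in, d_hid)                        # W x_t + b
        self.u = nn.Parameter(torch.empty(d_hid).uniform_(-1, 1))

    def forward(self, x, h):
        return torch.relu(self.w(x) + self.u * h)

class BiIndRNN(nn.Module):
    """Run one IndRNN left-to-right and one right-to-left, then concatenate,
    so forward and backward semantic information interact."""
    def __init__(self, d_in: int, d_hid: int):
        super().__init__()
        self.fwd, self.bwd = IndRNNCell(d_in, d_hid), IndRNNCell(d_in, d_hid)
        self.d_hid = d_hid

    def scan(self, cell, seq):
        h, outs = seq.new_zeros(seq.size(0), self.d_hid), []
        for t in range(seq.size(1)):
            h = cell(seq[:, t], h)
            outs.append(h)
        return torch.stack(outs, dim=1)

    def forward(self, x):                      # x: (batch, seq_len, d_in)
        f = self.scan(self.fwd, x)
        b = self.scan(self.bwd, x.flip(1)).flip(1)
        return torch.cat([f, b], dim=-1)       # (batch, seq_len, 2*d_hid)
```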
3.4. Character Graph Convolution Module (CharGCM)
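Per the Introduction, CharGCM models the strong correlation between adjacent characters within a salient word. A minimal sketch under that assumption treats each character as a graph node, links neighboring characters in a chain, and applies one graph convolution H' = relu(Â H W); the helper names and sizes are illustrative only.

```python
import torch
import torch.nn as nn

def chain_adjacency(n_chars: int) -> torch.Tensor:
    """Normalized adjacency for a word's characters: adjacent characters
    are connected, self-loops are added, then D^-1/2 (A + I) D^-1/2."""
    a = torch.eye(n_chars)
    idx = torch.arange(n_chars - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    d = a.sum(dim=1).rsqrt()            # D^-1/2 as a vector
    return d.unsqueeze(1) * a * d.unsqueeze(0)

class CharGCNLayer(nn.Module):
    """One graph convolution over character nodes: H' = relu(A_hat H W)."""
    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.w = nn.Linear(d_in, d_out, bias=False)

    def forward(self, h, a_hat):
        return torch.relu(a_hat @ self.w(h))

# Toy usage: a 5-character salient word with 64-dim character embeddings,
# mean-pooled into one structural feature for the word.
h = torch.randn(5, 64)
word_vec = CharGCNLayer(64, 64)(h, chain_adjacency(5)).mean(dim=0)
```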
4. Experimental Results and Analysis
4.1. Datasets Preparation
4.2. Parameters Settings
4.3. Comparison with Other Models
- The overall performance of our proposed CharAs-CBert sentence representation framework on the three baseline datasets outperforms the other representation models; for example, it exceeds SBERT-att in F1 by 1.1%, 1.03%, and 1.17% on Laptop, Restaurant, and ACL14, respectively. There are three possible reasons. First, we use the Slice Attention Module (SAM) to establish long-term dependent salient-word representations from two directions, channel and space, which improves the performance of the sentence representation. Second, BIRM and CharGCM are introduced to support construction information, explore the internal structural information of sentences, and highlight the differences between different sentences, improving sentence representation performance. Third, the fusion of three different feature vectors makes up for the insufficiency of a single representation and understands sentences from different angles and levels (a minimal fusion sketch follows this list). In addition, the introduction of rich low-level semantics further enhances the differences between sentences, improving both the sentence representation and the accuracy of the downstream sentiment classification task.
- Compared with the BERT-LSTM and Tree-LSTM sentence representation models, TG-HTreeLSTM and TE-DCNN have certain competitive advantages on all three datasets. For example, on the Laptop data, the F1 of TG-HTreeLSTM is 6.53% higher than that of Tree-LSTM. The possible reason is that Tree-LSTM can only process binarized constituency trees, which differ from the original constituency tree, whereas TG-HTreeLSTM can process the original constituency tree of a sentence, resulting in a performance improvement. The good performance of TE-DCNN may be because its dynamic synthesis strategy plays an important role, allowing the network to obtain better semantic information.
- Compared with TE-DCNN, the F1 of Capsule-B improves by 0.98%, 1.42%, and 2.01% on Laptop, Restaurant, and ACL14, respectively. The possible reason is that, owing to the directionality of capsule neurons, the capsule network can more effectively perceive the subtle changes between different sentences, which improves its ability to distinguish sentence structure and thereby the quality of the sentence representation.
- On the ACL14 baseline data, the ACC of CNN-LSTM is 1.09% higher than that of the LSTM method. The possible reason is that CNN obtains the local spatial features of sentences while LSTM encodes the time series, establishing a complementary relationship between spatial and temporal features that improves sentence representation accuracy. In addition, Self-Att achieves a certain competitive advantage on the three sets of open-source baseline datasets, mainly because self-attention focuses on key information and effectively models both the local and global semantics of sentences.
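The first bullet above credits part of the gain to fusing three feature vectors. The exact fusion operator is not given in this excerpt, so the sketch below shows the simplest plausible variant, concatenating the SAM, BIRM, and CharGCM sentence vectors before classification; all names and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TriFeatureFusion(nn.Module):
    """Concatenate three complementary sentence vectors and classify
    sentiment (positive / negative / neutral) from the fused feature."""
    def __init__(self, d_sam: int, d_birm: int, d_char: int, n_classes: int = 3):
        super().__init__()
        self.cls = nn.Linear(d_sam + d_birm + d_char, n_classes)

    def forward(self, v_sam, v_birm, v_char):
        fused = torch.cat([v_sam, v_birm, v_char], dim=-1)
        return self.cls(fused)                 # sentiment logits

# Example: fuse 768-, 512-, and 64-dim sentence vectors for a batch of 8.
logits = TriFeatureFusion(768, 512, 64)(
    torch.randn(8, 768), torch.randn(8, 512), torch.randn(8, 64))
```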
4.4. Ablation Studies
4.4.1. Different Components of CharAs-CBert
- Compared with a single-structure sentence representation, the multi-feature co-representation method shows better performance. For example, on the ACL14 baseline dataset, CharAs-CBert () improves over CharAs-CBert (), CharAs-CBert (), and CharAs-CBert () by 1.27%, 1.78%, and 2.02%, respectively. The possible reason is that multi-feature vector fusion understands the sentence from different angles, and the different feature vectors complement one another, making up for the detail semantics that a single feature vector easily ignores. In addition, the CharAs-CBert () method is inferior to the CharAs-CBert () method on the three sets of open-source baseline data, which indicates that the proposed BIRM plays a positive role in the overall performance of the model. A possible reason is that stacking multiple layers of bidirectional independent recurrent neural networks obtains a better global representation.
- On the Laptop baseline data, CharAs-CBert () outperforms CharAs-CBert () and CharAs-CBert () by 0.1% and 0.27%, respectively, which shows that our proposed components play a positive role in the overall performance of the model. In addition, we also found that the SAM component has the least positive effect on the model. A possible reason is that, in the absence of construction information, the model represents the sentence using only word vectors, ignoring the basic structure of the sentence and failing to fully capture its context.
- Although the BiLSTM-based variant achieves a certain competitive advantage, it is still inferior to CharAs-CBert, since building deep BiLSTMs that learn the key semantics in the data is difficult. In contrast, BIRM can be stacked into very deep networks thanks to its non-saturating activation functions, and because the layers are stacked in residual form, we obtain better deep semantics and perceive richer changes of detail (a sketch of such residual stacking follows this list).
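Assuming the `BiIndRNN` class sketched in Section 3.3 is in scope, residual stacking of bidirectional IndRNN layers might look as follows; the class name and depth are illustrative, not the authors' configuration.

```python
import torch.nn as nn

class ResidualBiIndRNNStack(nn.Module):
    """Deep stack of BiIndRNN layers (see the Section 3.3 sketch): the
    non-saturating relu lets many layers be trained, and residual
    connections preserve low-level semantics through the depth."""
    def __init__(self, d_model: int, depth: int = 6):
        super().__init__()
        # Each BiIndRNN concatenates two directions, so give each direction
        # d_model // 2 units to keep residual widths equal (d_model even).
        self.layers = nn.ModuleList(
            BiIndRNN(d_model, d_model // 2) for _ in range(depth))

    def forward(self, x):               # x: (batch, seq_len, d_model)
        for layer in self.layers:
            x = x + layer(x)            # residual-form stacking
        return x
```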
4.4.2. Comparing with Loss Functions
4.4.3. Comparison of Different Layers
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CharAs-CBert | Character Assist Construction-Bert |
SAM | Slice Attention Module |
BIRM | Bidirectional Independent Recurrent Module |
CharGCM | Character Graph Convolution Module |
Att | Attention |
References
- Zhou, C.; Sun, C.; Liu, Z.; Lau, F. A C-LSTM neural network for text classification. arXiv 2015, arXiv:1511.08630.
- Wan, S.; Lan, Y.; Guo, J.; Xu, J.; Pang, L.; Cheng, X. A deep architecture for semantic matching with multiple positional sentence representations. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30.
- Schwenk, H.; Douze, M. Learning joint multilingual sentence representations with neural machine translation. arXiv 2017, arXiv:1704.04154.
- Hao, T.; Li, X.; He, Y.; Wang, F.L.; Qu, Y. Recent progress in leveraging deep learning methods for question answering. Neural Comput. Appl. 2022, 34, 2765–2783.
- Rao, G.; Huang, W.; Feng, Z.; Cong, Q. LSTM with sentence representations for document-level sentiment classification. Neurocomputing 2018, 308, 49–57.
- Fu, Q.; Wang, C.; Han, X. A CNN-LSTM network with attention approach for learning universal sentence representation in embedded system. Microprocess. Microsyst. 2020, 74, 103051.
- Zhang, Y.; Wang, J.; Zhang, X. Learning sentiment sentence representation with multiview attention model. Inf. Sci. 2021, 571, 459–474.
- Kim, T.; Yoo, K.M.; Lee, S. Self-guided contrastive learning for BERT sentence representations. arXiv 2021, arXiv:2106.07345.
- Li, S.; Li, W.; Cook, C.; Zhu, C.; Gao, Y. Independently recurrent neural network (IndRNN): Building a longer and deeper RNN. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5457–5466.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv 2019, arXiv:1908.10084.
- Bayhaqy, A.; Sfenrianto, S.; Nainggolan, K.; Kaburuan, E.R. Sentiment analysis about E-commerce from tweets using decision tree, K-nearest neighbor, and naïve Bayes. In Proceedings of the 2018 International Conference on Orange Technologies (ICOT), Bali, Indonesia, 23–26 October 2018; pp. 1–6.
- Rathi, M.; Malik, A.; Varshney, D.; Sharma, R.; Mendiratta, S. Sentiment analysis of tweets using machine learning approach. In Proceedings of the 2018 Eleventh International Conference on Contemporary Computing (IC3), Noida, India, 2–4 August 2018; pp. 1–3.
- Anwar, M.K.M.K.; Yusoff, M.; Kassim, M. Decision Tree and Naïve Bayes for Sentiment Analysis in Smoking Perception. In Proceedings of the 2022 IEEE 12th Symposium on Computer Applications & Industrial Electronics (ISCAIE), Penang Island, Malaysia, 21–22 May 2022; pp. 294–299.
- Chang, C.C.; Lin, C.J. LIBSVM: A library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2011, 2, 1–27.
- Li, W.; Hao, S. Sparse lifting of dense vectors: Unifying word and sentence representations. arXiv 2019, arXiv:1911.01625.
- Ma, J.; Li, J.; Liu, Y.; Zhou, S.; Li, X. Integrating Dependency Tree into Self-Attention for Sentence Representation. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 7–13 May 2022; pp. 8137–8141.
- Bai, X.; Shang, J.; Sun, Y.; Balasubramanian, N. Learning for Expressive Task-Related Sentence Representations. arXiv 2022, arXiv:2205.12186.
- Hu, X.; Mi, H.; Li, L.; de Melo, G. Fast-R2D2: A Pretrained Recursive Neural Network based on Pruned CKY for Grammar Induction and Text Representation. arXiv 2022, arXiv:2203.00281.
- Zhao, D.; Wang, J.; Lin, H.; Chu, Y.; Wang, Y.; Zhang, Y.; Yang, Z. Sentence representation with manifold learning for biomedical texts. Knowl.-Based Syst. 2021, 218, 106869.
- Wu, Z.; Wang, S.; Gu, J.; Khabsa, M.; Sun, F.; Ma, H. CLEAR: Contrastive learning for sentence representation. arXiv 2020, arXiv:2012.15466.
- Zhang, Y.; Zhang, R.; Mensah, S.; Liu, X.; Mao, Y. Unsupervised Sentence Representation via Contrastive Learning with Mixing Negatives. Available online: https://aaai-2022.virtualchair.net/poster_aaai8081 (accessed on 8 June 2022).
- Zhang, Y.; He, R.; Liu, Z.; Bing, L.; Li, H. Bootstrapped unsupervised sentence representation learning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Bangkok, Thailand, 1–6 August 2021; Volume 1, pp. 5168–5180.
- Xu, W.; Li, S.; Lu, Y. USR-MTL: An unsupervised sentence representation learning framework with multi-task learning. Appl. Intell. 2021, 51, 3506–3521.
- Seo, J.; Lee, S.; Liu, L.; Choi, W. TA-SBERT: Token Attention Sentence-BERT for Improving Sentence Representation. IEEE Access 2022, 10, 39119–39128.
- Zhang, Q.L.; Yang, Y.B. SA-Net: Shuffle attention for deep convolutional neural networks. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 2235–2239.
- Tang, Z.; Wan, B.; Yang, L. Word-character graph convolution network for Chinese named entity recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 1520–1532.
Datasets | Restaurant Training | Restaurant Testing | Laptop Training | Laptop Testing | ACL14 Training | ACL14 Testing
---|---|---|---|---|---|---
Positive | 2164 | 728 | 994 | 341 | 3142 | 346
Negative | 807 | 196 | 870 | 128 | 1562 | 173
Neutral | 637 | 196 | 464 | 169 | 1562 | 173
Consnum | 100,043 | 1,105,665 | 241,546 | 992,438 | 819,242 | 286,552
Charnodes | 12,891 | 67,235 | 15,213 | 37,539 | 31,028 | 15,652
Model | Laptop ACC | Laptop F1 | Restaurant ACC | Restaurant F1 | ACL14 ACC | ACL14 F1
---|---|---|---|---|---|---
LSTM | 75.38 | 72.24 | 73.98 | 70.07 | 77.42 | 73.19 |
CNN-LSTM | 76.51 | 73.02 | 74.21 | 70.56 | 78.51 | 74.23 |
Tree-LSTM | 78.08 | 74.88 | 76.64 | 72.89 | 80.5 | 77.06 |
BERT-LSTM | 80.92 | 76.73 | 80.48 | 74.9 | 81.54 | 77.96 |
TG-HRecNN | 82.08 | 79.52 | 80.93 | 75.92 | 82.46 | 80.63 |
TG-HTreeLSTM | 83.03 | 81.41 | 80.96 | 76.42 | 85.83 | 82.17 |
TE-DCNN | 87.55 | 83.25 | 83.93 | 78.99 | 87.49 | 83.84 |
Capsule-B | 88.32 | 84.23 | 85.09 | 80.41 | 91.38 | 85.85 |
Self-Att [16] | 86.51 | 82.42 | 83.79 | 78.64 | 86.92 | 82.74 |
SBERT-att [24] | 90.59 | 85.93 | 85.31 | 81.93 | 91.53 | 86.37 |
CharAs-CBert | 92.19 | 87.03 | 86.22 | 82.96 | 92.88 | 87.54 |
Model | Laptop ACC | Laptop F1 | Restaurant ACC | Restaurant F1 | ACL14 ACC | ACL14 F1
---|---|---|---|---|---|---
CharAs-CBert () | 86.35 | 82.4 | 81.34 | 77.65 | 86.01 | 83.48 |
CharAs-CBert () | 87.12 | 82.33 | 82.05 | 77.79 | 87.54 | 83.65 |
CharAs-CBert () | 88.04 | 82.82 | 82.59 | 78.73 | 88.02 | 83.72 |
CharAs-CBert () | 88.62 | 83.5 | 83.07 | 79.53 | 88.08 | 84.13 |
CharAs-CBert () | 88.84 | 83.62 | 83.29 | 80.06 | 88.4 | 85.5 |
CharAs-CBert () | 89.85 | 84.17 | 83.43 | 80.65 | 88.64 | 85.54 |
CharAs-CBert () | 90.02 | 84.75 | 83.97 | 80.67 | 88.72 | 85.92 |
CharAs-CBert () | 90.03 | 85.48 | 84.24 | 81.04 | 89.62 | 86.32 |
CharAs-CBert () | 91.3 | 85.65 | 84.67 | 82.03 | 90.62 | 86.42 |
CharAs-CBert () | 91.47 | 85.69 | 84.92 | 82.19 | 91.14 | 86.71 |
CharAs-CBert () | 91.96 | 85.75 | 85.44 | 82.56 | 92.27 | 86.92 |
CharAs-CBert | 92.19 | 87.03 | 86.22 | 82.96 | 92.88 | 87.54 |