
Search Results (4)

Search Parameters:
Keywords = sentence boundary disambiguation

20 pages, 2098 KiB  
Article
Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method
by Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng
Appl. Sci. 2024, 14(7), 2989; https://doi.org/10.3390/app14072989 - 2 Apr 2024
Viewed by 1225
Abstract
Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan relies on rule-based methods, statistical learning, or a combination of the two; these approaches place high demands on the corpus and on the researchers' linguistic expertise, and manual annotation is costly. In this study, we explore Tibetan SBD using deep learning. First, we analyze the characteristics of Tibetan and various subword techniques, selecting Byte Pair Encoding (BPE) and SentencePiece (SP) for text segmentation and training the Bidirectional Encoder Representations from Transformers (BERT) pre-trained language model. Second, we study Tibetan SBD with different BERT pre-trained language models, which mainly learn the ambiguity of the shad (“།”) in different positions in modern Tibetan texts and determine whether a given shad functions as a sentence delimiter. This study also introduces four models based on BERT, namely BERT-CNN, BERT-RNN, BERT-RCNN, and BERT-DPCNN, for performance comparison. Finally, to verify the performance of pre-trained language models on the SBD task, this study conducts SBD experiments on both the publicly available Tibetan pre-trained language model TiBERT and the multilingual pre-trained language model Multi-BERT. The experimental results show that the F1 score of the BERT (BPE) model trained in this study reaches 95.32% on 465,669 Tibetan sentences, nearly five percentage points higher than BERT (SP) and Multi-BERT. The SBD method based on pre-trained language models in this study lays the foundation for building datasets for the later tasks of Tibetan pre-training, summary extraction, and machine translation.
(This article belongs to the Section Computing and Artificial Intelligence)
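The subword step this abstract describes can be illustrated with a minimal Byte Pair Encoding sketch. This is a toy example of the generic BPE merge-learning algorithm, not the authors' Tibetan pipeline; the English corpus and merge count are purely illustrative:

```python
from collections import Counter

def learn_bpe(corpus, num_merges):
    """Learn BPE merge rules from a list of words (toy illustration)."""
    # Represent each word as a tuple of single-character symbols.
    vocab = Counter(tuple(word) for word in corpus)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Merge the most frequent pair everywhere it occurs.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i < len(symbols) - 1 and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

merges = learn_bpe(["low", "low", "lower", "newest", "newest"], 2)
```

In a real pipeline the learned merges would be applied to segment raw text before feeding it to the BERT model; libraries such as SentencePiece implement this efficiently.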

9 pages, 1925 KiB  
Proceeding Paper
A New Approach for Carrying Out Sentiment Analysis of Social Media Comments Using Natural Language Processing
by Mritunjay Ranjan, Sanjay Tiwari, Arif Md Sattar and Nisha S. Tatkar
Eng. Proc. 2023, 59(1), 181; https://doi.org/10.3390/engproc2023059181 - 17 Jan 2024
Cited by 3 | Viewed by 5876
Abstract
Business and science use sentiment analysis to extract and assess subjective information from the web, social media, and other sources using NLP, computational linguistics, text analysis, image processing, audio processing, and video processing. It models polarity, attitudes, and urgency from positive, negative, or neutral inputs. Unstructured data make emotion assessment difficult, yet unstructured consumer data allow businesses to market, engage, and connect with consumers on social media, and text data can be assessed instantly for user sentiment. Opinion mining identifies a text’s positive, negative, or neutral opinions, attitudes, views, emotions, and sentiments. Text analytics uses machine learning to evaluate “unstructured” natural language text data, and these data can help firms make money and decisions. Sentiment analysis shows how individuals feel about products, services, organizations, people, events, themes, and qualities; it is applied to reviews, forums, blogs, social media, and other articles. Data-driven (DD) methods find complicated semantic representations of texts without feature engineering. DD sentiment analysis is three-tiered: document-level sentiment analysis determines the polarity and sentiment of a whole document, aspect-based sentiment analysis assesses document segments for emotion and polarity, and word-level sentiment analysis recognizes word polarity and labels sentiments as positive, negative, or neutral. Our innovative method captures sentiments from text comments. The syntactic layer encompasses sentence-level normalisation, identification of ambiguities at paragraph boundaries, part-of-speech (POS) tagging, text chunking, and lemmatization. Pragmatics includes personality recognition, sarcasm detection, metaphor comprehension, aspect extraction, and polarity detection; semantics includes word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection.
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
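The normalisation and polarity-detection steps the abstract lists can be sketched in miniature. This is a hypothetical lexicon-based illustration, not the paper's actual pipeline; the word lists are invented for the example:

```python
# Tiny illustrative sentiment lexicons (not from the paper).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def normalize(comment):
    """Sentence-level normalisation: lowercase and strip punctuation."""
    return "".join(ch for ch in comment.lower() if ch.isalnum() or ch.isspace())

def polarity(comment):
    """Classify a comment as positive, negative, or neutral by lexicon hits."""
    tokens = normalize(comment).split()
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

label = polarity("I love this great phone!")
```

A production system would replace the lexicon lookup with the learned representations and the POS-tagging, chunking, and lemmatization stages the abstract describes.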

15 pages, 770 KiB  
Article
Pauses and Parsing: Testing the Role of Prosodic Chunking in Sentence Processing
by Caoimhe Harrington Stack and Duane G. Watson
Languages 2023, 8(3), 157; https://doi.org/10.3390/languages8030157 - 28 Jun 2023
Cited by 1 | Viewed by 2219
Abstract
It is broadly accepted that the prosody of a sentence can influence sentence processing by providing the listener with information about the syntax of the sentence. It is less clear what mechanism underlies the transmission of this information. In this paper, we test whether the influence of prosodic structure on parsing is a result of perceptual breaks such as pauses, or whether it is the result of more abstract prosodic elements, such as intonational phrases. In three experiments, we test whether different types of perceptual breaks influence syntactic attachment in ambiguous sentences: intonational boundaries (Experiment 1), an artificial buzzing sound (Experiment 2), and an isolated pause (Experiment 3). We find that although full intonational boundaries influence syntactic disambiguation, the artificial buzz and isolated pause do not. These data rule out theories that argue that perceptual breaks indirectly influence grammatical attachment through memory mechanisms, and instead show that listeners use prosodic breaks themselves as cues to parsing.
(This article belongs to the Special Issue Pauses in Speech)

25 pages, 1359 KiB  
Article
Estimating Sentence-like Structure in Synthetic Languages Using Information Topology
by Andrew D. Back and Janet Wiles
Entropy 2022, 24(7), 859; https://doi.org/10.3390/e24070859 - 22 Jun 2022
Cited by 1 | Viewed by 2326
Abstract
Estimating sentence-like units and sentence boundaries in human language is an important task in the context of natural language understanding. While this topic has been considered using a range of techniques, including rule-based approaches and supervised and unsupervised algorithms, a common aspect of these methods is that they inherently rely on a priori knowledge of human language in one form or another. Recently we have been exploring synthetic languages based on the concept of modeling behaviors using emergent languages. These synthetic languages are characterized by a small alphabet and limited vocabulary and grammatical structure. A particular challenge for synthetic languages is that there is generally no a priori language model available, which limits the use of many natural language processing methods. In this paper, we explore how natural ‘chunks’, in the sense of sentence-like units, may be discovered in synthetic language sequences without any linguistic or semantic language model. Our approach is to consider the problem from the perspective of information theory. We extend the basis of information geometry and propose a new concept, which we term information topology, to model the incremental flow of information in natural sequences. We introduce an information topology view of the incremental information and the incremental tangent angle of the Wasserstein-1 distance of the probabilistic symbolic language input. It is not suggested as a fully viable alternative for sentence boundary detection per se, but it provides a new conceptual method for estimating the structure and natural limits of information flow in language sequences without any semantic knowledge. We consider relevant existing performance metrics such as the F-measure, indicate their limitations, and introduce a new information-theoretic global performance measure based on modeled distributions. Although the methodology is not proposed for human language sentence detection, we provide some examples using human language corpora where potentially useful results are shown. The proposed model shows potential advantages for overcoming difficulties due to the disambiguation of complex language and suggests potential improvements for human language methods.
(This article belongs to the Special Issue Statistical Methods for Complex Systems)
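The incremental Wasserstein-1 distance the abstract refers to can be sketched for distributions over an ordered symbol alphabet, where W1 reduces to the summed absolute difference of the CDFs. This is an illustrative reading of the incremental-distance idea, not the authors' implementation; the sequence and alphabet are made up for the example:

```python
from collections import Counter

def wasserstein1(p, q, alphabet):
    """W1 between two discrete distributions over an ordered alphabet."""
    cp = cq = total = 0.0
    for s in alphabet:
        cp += p.get(s, 0.0)  # running CDF of p
        cq += q.get(s, 0.0)  # running CDF of q
        total += abs(cp - cq)
    return total

def incremental_w1(sequence, alphabet):
    """W1 between the empirical symbol distributions before and after
    each new symbol arrives, tracking the incremental flow of information."""
    counts = Counter()
    prev = {}
    distances = []
    for i, s in enumerate(sequence, 1):
        counts[s] += 1
        cur = {k: v / i for k, v in counts.items()}
        if prev:
            distances.append(wasserstein1(prev, cur, alphabet))
        prev = cur
    return distances

flow = incremental_w1("abab", "ab")
```

Peaks in such a distance series could mark points where the symbol statistics shift, which is the kind of structural signal the information topology approach builds on.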
