Search Results (29)

Search Parameters:
Keywords = Chinese internet text

32 pages, 2973 KiB  
Article
Social Media Users’ Visual and Emotional Preferences of Internet-Famous Sites in Urban Riverfront Public Spaces: A Case Study in Changsha, China
by Yuanyuan Huang and Bohong Zheng
Land 2024, 13(7), 930; https://doi.org/10.3390/land13070930 - 26 Jun 2024
Cited by 3 | Viewed by 2342
Abstract
With the increasing online exposure of urban public spaces, the new concept of “internet-famous sites” has emerged in China. Social media users are the main contributors to this new phenomenon. To fully understand social media users’ preferences in such public spaces, this article took 27 typical riverfront internet-famous sites (RIFSs) in Changsha City (China) as an example. Through social media platform selection, keyword research, text and image data extraction, visual and emotional symbol coding, and manual calculations of coding frequency, this study investigated social media users’ perception of RIFSs, especially their visual and emotional preferences. The online images and review comments were extracted from the popular Chinese social media platform “Xiaohongshu”. We found that (1) the popularity of each RIFS had a significant head effect and there were far more positive emotions than neutral and negative emotions in review comments. (2) RIFSs in Changsha were divided into five categories: commercial RIFSs, art exhibition RIFSs, historical and cultural RIFSs, ecological recreational RIFSs, and uncultivated RIFSs. Social media users had different visual focuses on each kind of RIFS. (3) Social media users provided specific reasons for their emotional preferences towards different types of RIFSs. This study can provide a new perspective on improving waterfront vitality and offer a targeted and attractive method for waterfront regeneration that is different from traditional methods.
(This article belongs to the Special Issue Landscape Governance in the Age of Social Media (Second Edition))

23 pages, 736 KiB  
Review
A Systematic Review and Comprehensive Analysis of Pioneering AI Chatbot Models from Education to Healthcare: ChatGPT, Bard, Llama, Ernie and Grok
by Ketmanto Wangsa, Shakir Karim, Ergun Gide and Mahmoud Elkhodr
Future Internet 2024, 16(7), 219; https://doi.org/10.3390/fi16070219 - 22 Jun 2024
Cited by 6 | Viewed by 11221
Abstract
AI chatbots have emerged as powerful tools for providing text-based solutions to a wide range of everyday challenges. Selecting the appropriate chatbot is crucial for optimising outcomes. This paper presents a comprehensive comparative analysis of five leading chatbots: ChatGPT, Bard, Llama, Ernie, and Grok. The analysis is based on a systematic review of 28 scholarly articles. The review indicates that ChatGPT, developed by OpenAI, excels in educational, medical, humanities, and writing applications but struggles with real-time data accuracy and lacks open-source flexibility. Bard, powered by Google, leverages real-time internet data for problem solving and shows potential in competitive quiz environments, albeit with performance variability and inconsistencies in responses. Llama, an open-source model from Meta, demonstrates significant promise in medical contexts, natural language processing, and personalised educational tools, yet it requires substantial computational resources. Ernie, developed by Baidu, specialises in Chinese language tasks, thus providing localised advantages that may not extend globally due to restrictive policies. Grok, developed by xAI and still in its early stages, shows promise in providing engaging, real-time interactions, humour, and mathematical reasoning capabilities, but its full potential remains to be evaluated through further development and empirical testing. The findings underscore the context-dependent utility of each model and the absence of a singularly superior chatbot. Future research should expand to include a wider range of fields, explore practical applications, and address concerns related to data privacy, ethics, security, and the responsible deployment of these technologies.

19 pages, 7461 KiB  
Article
Detection System Based on Text Adversarial and Multi-Information Fusion for Inappropriate Comments in Mobile Application Reviews
by Zhicheng Yu, Yuhao Jia and Zhen Hong
Electronics 2024, 13(8), 1432; https://doi.org/10.3390/electronics13081432 - 10 Apr 2024
Viewed by 1304
Abstract
With the rapid development of mobile application technology, the content and forms of comments disseminated on the internet are becoming increasingly complex. Various comments serve as users’ firsthand reference materials for understanding the application. However, some comments contain a significant amount of inappropriate content unrelated to the app itself, such as gambling, loans, pornography, and game account recharging, seriously impacting the user experience. Therefore, this article aims to assist users in filtering out irrelevant and inappropriate messages, enabling them to quickly obtain useful and relevant information. This study focuses on analyzing actual comments on various Chinese apps on the Apple App Store. However, these irrelevant comments exhibit a certain degree of concealment, sparsity, and complexity, which increases the difficulty of detection. Additionally, due to language differences, the existing English research methods exhibit relatively poor adaptability to Chinese textual data. To overcome these challenges, this paper proposes a research method named “blend net”, which combines text adversarial and multi-information fusion detection to enhance the overall performance of the system. The experimental results demonstrate that the method proposed in this paper achieves precision and recall rates both exceeding 98%, representing an improvement of at least 2% compared to existing methods.
(This article belongs to the Section Artificial Intelligence)

18 pages, 996 KiB  
Article
REACT: Relation Extraction Method Based on Entity Attention Network and Cascade Binary Tagging Framework
by Lingqi Kong and Shengquau Liu
Appl. Sci. 2024, 14(7), 2981; https://doi.org/10.3390/app14072981 - 2 Apr 2024
Cited by 1 | Viewed by 1203
Abstract
With the development of the Internet, vast amounts of text information are being generated constantly. Methods for extracting the valuable parts from this information have become an important research field. Relation extraction aims to identify entities and the relations between them from text, helping computers better understand textual information. Currently, the field of relation extraction faces various challenges, particularly in addressing the relation overlapping problem. The main difficulties are as follows: (1) Traditional methods of relation extraction have limitations and lack the ability to handle the relation overlapping problem, requiring a redesign. (2) Relation extraction models are easily disturbed by noise from words with weak relevance to the relation extraction task, leading to difficulties in correctly identifying entities and their relations. In this paper, we propose the Relation extraction method based on the Entity Attention network and Cascade binary Tagging framework (REACT). We decompose the relation extraction task into two subtasks: head entity identification, and tail entity and relation identification. REACT first identifies the head entity and then identifies all possible tail entities that can be paired with the head entity, as well as all possible relations. With this architecture, the model can handle the relation overlapping problem. To reduce the interference of words in the text that are not related to the head entity or the relation extraction task, and to improve the accuracy of identifying the tail entities and relations, we designed an entity attention network. To demonstrate the effectiveness of REACT, we construct a high-quality Chinese dataset and conduct a large number of experiments on this dataset. The experimental results confirm the effectiveness of REACT, showing its significant advantages in handling the relation overlapping problem compared to other current methods.

19 pages, 914 KiB  
Article
A Privacy-Preserving Multilingual Comparable Corpus Construction Method in Internet of Things
by Yu Weng, Shumin Dong and Chaomurilige
Mathematics 2024, 12(4), 598; https://doi.org/10.3390/math12040598 - 17 Feb 2024
Cited by 2 | Viewed by 1525
Abstract
With the expansion of the Internet of Things (IoT) and artificial intelligence (AI) technologies, multilingual scenarios are gradually increasing, and applications based on multilingual resources are also on the rise. In this process, apart from the need for the construction of multilingual resources, privacy protection issues like data privacy leakage are increasingly highlighted. Comparable corpora are important in multilingual language information processing in IoT. However, privacy-preserving multilingual comparable corpora are rare, so there is an urgent need to construct such a multilingual corpus resource. This paper proposes a method for constructing a privacy-preserving multilingual comparable corpus, taking Chinese–Uighur–Tibetan IoT-based news as an example. The method maps texts in different languages to a unified language vector space to avoid sensitive information, then calculates the similarity between texts in different languages and uses it as a comparability index to construct comparable relations. Through the decision-making mechanism of minimizing the impossibility, it can identify a comparable corpus pair of multilingual texts based on chapter size to realize the construction of a privacy-preserving Chinese–Uighur–Tibetan comparable corpus (CUTCC). Evaluation experiments demonstrate the effectiveness of our proposed provable method, which outperforms in accuracy rate by 77%, recall rate by 34% and F value by 47.17%. The CUTCC provides valuable privacy-preserving data resources support and language service for multilingual situations in IoT.
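The abstract above describes computing a similarity between texts that have already been mapped into a shared vector space and using it as a comparability index. The following is a minimal sketch of that idea only, assuming cosine similarity as the measure (the abstract does not name one) and assuming the vectors are already available; `cosine_similarity` and `best_match` are hypothetical helper names, not part of the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero vector as not comparable
    return dot / (norm_u * norm_v)


def best_match(query_vec, candidates):
    """Return (doc_id, score) for the candidate most similar to query_vec.

    candidates: dict mapping a hypothetical document id to its vector.
    """
    scored = ((doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in candidates.items())
    return max(scored, key=lambda pair: pair[1])
```

Working on vectors rather than raw text is what lets a comparison of this kind avoid touching the original content directly; how the vectors are produced is outside the scope of this sketch.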

16 pages, 939 KiB  
Article
Danmei and/as Fanfiction: Translations, Variations, and the Digital Semiosphere
by JSA Lowe
Humanities 2024, 13(1), 20; https://doi.org/10.3390/h13010020 - 23 Jan 2024
Viewed by 3817
Abstract
Since the late 1990s, Chinese internet publishing has seen a surge in literary production in terms of danmei, which are webnovels that share many of the features of Anglophone fanfiction. Thanks in part to recent live-action adaptations, there has been an influx of new Western and Chinese diaspora readers of danmei. Juxtaposing these bodies of literature in English in particular enables us to examine the complexities of how danmei are newly circulating in the Anglophone world and have become available themselves for transformative work, as readers also write fanfiction based on danmei. This paper offers a comparative reading of the following three such texts, which explore trauma recovery through the arc of romance: Tianya Ke, a danmei novel by Priest; Notebook No. 6 by magdaliny, a novella-length piece of fanfiction based on Marvel characters; and orange_crushed’s Strays, a fanfiction based on the live-action drama that was, in turn, based on Tianya Ke. The space described by Lotman’s semiosphere offers an additional model in which these texts reflect on one another; furthermore, along the porous digital border between fanfiction, danmei in translation, and fan novels based on danmei, readers and writers negotiate and vex contemporary culture.
(This article belongs to the Special Issue The Past, Present and Future of Fan-Fiction)

16 pages, 2229 KiB  
Article
Chinese Text De-Colloquialization Technique Based on Back-Translation Strategy and End-to-End Learning
by Hongkai Liu, Zhonglin Ye, Haixing Zhao and Yanlin Yang
Appl. Sci. 2023, 13(19), 10818; https://doi.org/10.3390/app131910818 - 29 Sep 2023
Cited by 2 | Viewed by 1317
Abstract
With the development of the Internet, there has been a significant increase in various types of textual information. However, when people engage in the composition of formal texts, they often incorporate their colloquial habits, which can diminish the professionalism and formality of the text. Existing research on Chinese texts primarily focuses on correcting misspelt characters that are visually or phonetically similar, as well as obvious grammatical errors, such as redundancy, omissions, and incorrect word order. However, there is limited research addressing the correction of text that exhibits colloquial expressions without apparent grammatical errors or misspelt characters. This article proposes a novel technique that utilizes deep learning methods to directly transform colloquial textual expressions into formal written expressions. Firstly, a parallel corpus dataset of written and spoken language is constructed using a back-translation strategy. Then, an end-to-end learning mechanism based on neural machine translation is employed, with colloquial text as the source language and written text as the target language. This allows the model to directly transform the colloquial text into text with a formal style. Finally, an evaluation of the proposed approach is conducted using the bilingual evaluation understudy (BLEU) and manual assessment techniques. The experimental results demonstrate that the technology proposed in this paper performs well in the task of de-colloquialization in Chinese texts. The contribution of this paper lies in proposing an automated method for collecting a substitute for manually annotated parallel corpora of spoken and written language, which significantly saves time and reduces the manual cost of constructing the dataset. Furthermore, the application of end-to-end learning techniques from neural machine translation to the task of de-colloquialization allows the trained model to directly generate written language flexibly based on the input of spoken language. This presents a novel solution for the task of the de-colloquialization of Chinese text.

21 pages, 7696 KiB  
Article
Non-Standard Address Parsing in Chinese Based on Integrated CHTopoNER Model and Dynamic Finite State Machine
by Mengwei Zhang, Xingui Liu, Jingzhen Ma, Zheng Zhang, Yue Qiu and Zhipeng Jiang
Appl. Sci. 2023, 13(17), 9855; https://doi.org/10.3390/app13179855 - 31 Aug 2023
Cited by 1 | Viewed by 1474
Abstract
Information in non-standard address texts in Chinese is usually presented with rough content, complex and diverse presentation forms, and inconsistent hierarchical granularity, causing low accuracy in Chinese address parsing. Therefore, we propose a method for parsing non-standard address text in Chinese that integrates the Chinese Toponym Named Entity Recognition (CHTopoNER) model and a dynamic finite state machine (FSM). First, named entity recognition is performed by the CHTopoNER model. Sets of dynamic FSMs are then constructed based on the address hierarchical characteristics to sort and combine the Chinese address elements, thereby achieving address parsing on the Chinese internet. This method showed excellent accuracy in parsing both standard and non-standard placename addresses. In particular, this method performed better in address parsing for disordered or missing hierarchical elements than traditional methods using an FSM. Specifically, this method achieved accuracies of 96.6% and 96.8% for standard and non-standard placenames, respectively. These accuracies increased by 8.0% and 57.1%, respectively, compared with the integrated CHTopoNER model and traditional FSM, and by 7.4% and 19.8%, respectively, compared with the integrated CHTopoNER model and bidirectional FSM. After analysis, the address-parsing method showed good scalability and adaptability, which could be applied to various types of address-parsing tasks.
(This article belongs to the Special Issue Applications of Machine Learning on Earth Sciences)

18 pages, 3359 KiB  
Article
Social Media Opinion Analysis Model Based on Fusion of Text and Structural Features
by Jie Long, Zihan Li, Qi Xuan, Chenbo Fu, Songtao Peng and Yong Min
Appl. Sci. 2023, 13(12), 7221; https://doi.org/10.3390/app13127221 - 16 Jun 2023
Cited by 3 | Viewed by 2034
Abstract
Opinion recognition for comments in Internet media is a new task in text analysis. It takes comment statements as the research object, learning the opinion tendency from annotated original text and then performing opinion tendency recognition on unannotated statements. However, due to the uncertainty of NLP (natural language processing) on short texts and the complexity of Chinese text, existing methods have some limitations in accuracy and application scenarios. In this paper, we propose an opinion tendency recognition model, HGAT (heterogeneous graph attention network), that integrates text vector and context structure methods to address the above problems. This method first trains a text vectorization model based on annotation text content, then constructs a heterogeneous graph with annotation, news, and theme as its vertices, and then optimizes the feature vectors of all nodes using a heterogeneous graph neural network model with an attention mechanism. In addition, this article collected 1,684,318 news items and 57,845,091 comments from Toutiao, selected 511 of those stories and their corresponding 103,787 comments, and tested the impact of HGAT on this dataset. Experiments show that this method yields a stable improvement across different NLP methods, increasing accuracy by 2–10%, and provides a new perspective for opinion tendency recognition.

27 pages, 2506 KiB  
Article
A Study of Public Attitudes toward Shanghai’s Image under the Influence of COVID-19: Evidence from Comments on Sina Weibo
by Yanlong Guo, Lan Zu, Denghang Chen and Han Zhang
Int. J. Environ. Res. Public Health 2023, 20(3), 2297; https://doi.org/10.3390/ijerph20032297 - 27 Jan 2023
Cited by 2 | Viewed by 3032
Abstract
With the advent of the Internet era, Chinese users tend to express their opinions on social media platforms such as Sina Weibo. Changes in people’s emotions toward cities, as expressed in microblog texts, can reflect the image of cities presented on mainstream social media and thus help in shaping a good city image. In this paper, we collected microblog data containing “Shanghai” from 1 January 2019 to 1 September 2022 using Python, and we applied three methods: Term Frequency-Inverse Document Frequency keyword statistics, Latent Dirichlet Allocation theme model construction, and sentiment analysis with the Zhiwang Sentiment Dictionary. We also explore the impact of the COVID-19 epidemic on Shanghai’s urban image in the context of the “Shanghai Territorial Static Management”, an important public opinion topic during the COVID-19 epidemic. The results of the study show that the “Shanghai-wide static management” during the COVID-19 epidemic significantly lowered the public’s perception of Shanghai and negatively affected the city’s image. By analyzing the data results, we summarize the basic characteristics of Shanghai’s city image and provide strategies for communicating Shanghai’s city image in the post-epidemic era.
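The first of the three methods named in the abstract above, Term Frequency-Inverse Document Frequency keyword statistics, can be illustrated with a short sketch. This is the textbook formulation (term frequency times the log of inverse document frequency), not necessarily the exact variant the authors used, and it assumes each post has already been split into tokens, for example by a Chinese word segmenter. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF scores for pre-tokenized documents.

    docs: list of token lists (segmentation is assumed to be done upstream).
    Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

A term that appears in every document, such as the search keyword itself, scores zero under this formulation, which is why keyword statistics of this kind surface the words that distinguish one post from the rest.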

13 pages, 790 KiB  
Article
Shadow Education in China and Its Diversified Normative Governance Mechanism: Double Reduction Policy and Internet Public Opinion
by Jijian Lu, Pan Tuo, Junyan Pan, Meimei Zhou, Mohan Zhang and Shaohua Hu
Sustainability 2023, 15(2), 1437; https://doi.org/10.3390/su15021437 - 12 Jan 2023
Cited by 6 | Viewed by 7502
Abstract
As a private supplementary education activity outside the formal education system, shadow education aims to help students pass exams smoothly and obtain better educational resources. In 2021, the Chinese Government issued “Opinions on Further Reducing the Burden of Students’ Homework and Off-campus Training in Compulsory Education” (referred to as the “double reduction” policy for short). The policy aims to strengthen the standardization of out-of-school training institutions, and it has stimulated public debate on the Internet. However, research on the double reduction policy and how to guide the reform of shadow education in combination with online public opinion is still lacking. Based on this, the text of the double reduction policy and the hot spots in the development of online public opinion were selected for a text analysis, and a diversified and standardized governance mechanism of China’s shadow education was constructed. The results of this study show that shadow education reforms need to pay attention to government policy documents and online public opinion, and to develop public opinion warning lines. This study can provide the international academic community with information on China’s shadow education reform, providing valuable experience and a reference.
(This article belongs to the Section Environmental Sustainability and Applications)

24 pages, 4257 KiB  
Article
E3W—A Combined Model Based on GreedySoup Weighting Strategy for Chinese Agricultural News Classification
by Zeyan Xiao, Senqi Yang, Xuliang Duan, Dezhao Tang, Yan Guo and Zhiyong Li
Appl. Sci. 2022, 12(23), 12059; https://doi.org/10.3390/app122312059 - 25 Nov 2022
Cited by 1 | Viewed by 2009
Abstract
With the continuous development of the internet and big data, modernization and informatization are rapidly being realized in the agricultural field. Accordingly, the volume of agricultural news is also increasing. This explosion of agricultural news has made accurate access to agricultural news difficult, and the spread of news about some agricultural technologies has slowed down, which hinders the development of agriculture to some extent. To address this problem, we apply NLP to agricultural news texts to classify the agricultural news, in order to ultimately improve the efficiency of agricultural news dissemination. We propose a classification model based on ERNIE + DPCNN, ERNIE, EGC, and Word2Vec + TextCNN as sub-models for Chinese short-agriculture text classification (E3W), utilizing the GreedySoup weighting strategy and multi-model combination; specifically, E3W consists of four sub-models, the output of which is processed using the GreedySoup weighting strategy. In the E3W model, we divide the classification process into two steps: in the first step, the text is passed through the four independent sub-models to obtain an initial classification result given by each sub-model; in the second step, the model considers the relationship between the initial classification result and the sub-models, and assigns weights to this initial classification result. The final category with the highest weight is used as the output of E3W. To fully evaluate the effectiveness of the E3W model, the accuracy, precision, recall, and F1-score are used as evaluation metrics in this paper. We conduct multiple sets of comparative experiments on a self-constructed agricultural data set, comparing E3W and its sub-models, as well as performing ablation experiments. The results demonstrate that the E3W model can improve the average accuracy by 1.02%, the average precision by 1.62%, the average recall by 1.21%, and the average F1-score by 1.02%. Overall, E3W can achieve state-of-the-art performance in Chinese agricultural news classification.
(This article belongs to the Section Computing and Artificial Intelligence)
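The two-step combination described in the E3W abstract above (each sub-model proposes a label, weights are attached to those proposals, and the label with the highest total weight wins) resembles a weighted vote. The sketch below shows a plain weighted vote under that reading. It is not the paper's GreedySoup procedure; the sub-model keys, labels, and weight values are invented for illustration.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick the label with the largest summed weight.

    predictions: dict mapping a sub-model name to its predicted label.
    weights: dict mapping the same sub-model names to a non-negative weight
             (hypothetical values; how E3W derives them is not shown here).
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[label] += weights[model]
    return max(totals, key=totals.get)


# Illustrative inputs only: four sub-models, two candidate labels.
example_predictions = {
    "ernie_dpcnn": "policy",
    "ernie": "market",
    "egc": "market",
    "w2v_textcnn": "policy",
}
example_weights = {
    "ernie_dpcnn": 0.4,
    "ernie": 0.3,
    "egc": 0.2,
    "w2v_textcnn": 0.2,
}
example_label = weighted_vote(example_predictions, example_weights)
```

With these made-up weights the two sub-models voting "policy" outweigh the two voting "market", so a tie in raw votes is broken by how much each sub-model is trusted.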

16 pages, 3119 KiB  
Article
ArRASA: Channel Optimization for Deep Learning-Based Arabic NLU Chatbot Framework
by Meshrif Alruily
Electronics 2022, 11(22), 3745; https://doi.org/10.3390/electronics11223745 - 15 Nov 2022
Cited by 8 | Viewed by 3397
Abstract
Since the introduction of deep learning-based chatbots for knowledge services, many research and development efforts have been undertaken in a variety of fields. The global market for chatbots has grown dramatically as a result of strong demand. Nevertheless, open-domain chatbots’ limited functional scalability poses a challenge to their implementation in industries. Much work has been performed on creating chatbots for languages such as English and Chinese. Still, there is a need to develop chatbots for other languages such as Arabic and Persian, as they are widely used on the Internet today. In this paper, we introduce ArRASA as a channel optimization strategy based on a deep-learning platform to create a chatbot that understands Arabic. ArRASA is a closed-domain chatbot that can be used in any Arabic industry. The proposed system consists of four major parts: tokenization of text, featurization, intent categorization, and entity extraction. The performance of ArRASA is evaluated using traditional assessment metrics, i.e., accuracy and F1 score for the intent classification and entity extraction tasks in the Arabic language. The proposed framework achieves promising results by securing 96%, 94% and 94%, 95% accuracy and an F1 score for intent classification and entity extraction, respectively.

15 pages, 1079 KiB  
Article
Chinese Spam Detection Using a Hybrid BiGRU-CNN Network with Joint Textual and Phonetic Embedding
by Jinliang Yao, Chenrui Wang, Chuang Hu and Xiaoxi Huang
Electronics 2022, 11(15), 2418; https://doi.org/10.3390/electronics11152418 - 3 Aug 2022
Cited by 8 | Viewed by 3202
Abstract
The proliferation of spam in China has a negative impact on internet users’ experiences online. Existing methods for detecting spam are primarily based on machine learning. However, it has been discovered that these methods are susceptible to adversarial textual spam that has frequently been imperceptibly modified by spammers. Spammers continually modify their strategies to circumvent spam detection systems. Text with Chinese homophonic substitution may be easily understood by users according to its context. Currently, spammers widely use homophonic substitution to break down spam identification systems on the internet. To address these issues, we propose a Bidirectional Gated Recurrent Unit (BiGRU)–Text Convolutional Neural Network (TextCNN) hybrid model with joint embedding for detecting Chinese spam. Our model effectively uses phonetic information and combines the advantages of parameter sharing from TextCNN with long-term memory from BiGRU. The experimental results on real-world datasets show that our model resists homophone noise to some extent and outperforms mainstream deep learning models. We also demonstrate the generality of joint textual and phonetic embedding, which is applicable to other deep learning networks in Chinese spam detection tasks.
(This article belongs to the Section Artificial Intelligence)

17 pages, 3894 KiB  
Article
EmergEventMine: End-to-End Chinese Emergency Event Extraction Using a Deep Adversarial Network
by Jianzhuo Yan, Lihong Chen, Yongchuan Yu, Hongxia Xu, Qingcai Gao, Kunpeng Cao and Jianhui Chen
ISPRS Int. J. Geo-Inf. 2022, 11(6), 345; https://doi.org/10.3390/ijgi11060345 - 10 Jun 2022
Cited by 4 | Viewed by 3064
Abstract
With the rapid development of the internet and social media, extracting emergency events from online news reports has become an urgent need for public safety. However, current studies on the text mining of emergency information mainly focus on text classification and event recognition, only obtaining a general and conceptual cognition about an emergency event, which cannot effectively support emergency risk warning, etc. Existing event extraction methods of other professional fields often depend on a domain-specific, well-designed syntactic dependency or external knowledge base, which can offer high accuracy in their professional fields, but their generalization ability is not good, and they are difficult to directly apply to the field of emergency. To address these problems, an end-to-end Chinese emergency event extraction model, called EmergEventMine, is proposed using a deep adversarial network. Considering the characteristics of Chinese emergency texts, including small-scale labelled corpora, relatively clearer syntactic structures, and concentrated argument distribution, this paper simplifies the event extraction with four subtasks as a two-stage task based on the goals of subtasks, and then develops a lightweight heterogeneous joint model based on deep neural networks for realizing end-to-end and few-shot Chinese emergency event extraction. Moreover, adversarial training is introduced into the joint model to alleviate the overfitting of the model on the small-scale labelled corpora. Experiments on the Chinese emergency corpus fully prove the effectiveness of the proposed model. Moreover, this model significantly outperforms other existing state-of-the-art event extraction models.