Text Mining Applications and Theory

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (31 March 2017) | Viewed by 12827

Special Issue Editors


Guest Editor
Auckland University of Technology, New Zealand
Interests: NLP; text mining; artificial intelligence; computational linguistics; Twitter

Guest Editor
Auckland University of Technology, New Zealand
Interests: information retrieval; natural language processing; question answering

Special Issue Information

Dear Colleagues,

With the accumulation of vast amounts of free-form text on the web and in other corpora, there has been a corresponding increase in interest in extracting information from textual resources using text-mining techniques. This growing interest has produced a large number of theories and techniques in the text-mining domain, which have been successfully applied in real-world scenarios with promising results. In addition, other areas, such as the Semantic Web, natural language generation, and linguistic theory, have built close relationships with text mining, fostering multidisciplinary research and a broad range of applications.

The Special Issue on “Text Mining Applications and Theory” aims to provide an international forum for researchers and practitioners to exchange information on advances in the state of the art of text-mining research. It will contain extended versions of selected papers presented at the First New Zealand Text Mining Workshop (TMNZ-2016), held in Hamilton, New Zealand, on 16 November 2016.

Dr. Parma Nand
Dr. Rivindu Perera
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

 

Keywords

  • Text Mining
  • Natural Language Processing
  • Language Modeling
  • Computational Linguistics

Published Papers (2 papers)


Research

Article
Identifying High Quality Document–Summary Pairs through Text Matching
by Yongshuai Hou, Yang Xiang, Buzhou Tang, Qingcai Chen, Xiaolong Wang and Fangze Zhu
Information 2017, 8(2), 64; https://doi.org/10.3390/info8020064 - 12 Jun 2017
Cited by 3 | Viewed by 6179
Abstract
Text summarization, namely automatically generating a short summary of a given document, is a difficult task in natural language processing. Deep learning has gradually been deployed for text summarization, but there is still a lack of large-scale, high-quality datasets for this technique. In this paper, we propose a novel deep learning method to identify high-quality document–summary pairs for building a large-scale pairs dataset. Concretely, a long short-term memory (LSTM)-based model was designed to measure the quality of document–summary pairs. In order to leverage information across all parts of each document, we further propose an improved LSTM-based model that removes the forget gate in the LSTM unit. Experiments conducted on training and test sets built from Sina Weibo (a Chinese microblog website similar to Twitter) showed that the LSTM-based models significantly outperformed baseline models in terms of the area under the receiver operating characteristic curve (AUC).
(This article belongs to the Special Issue Text Mining Applications and Theory)
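The abstract above describes removing the forget gate from the LSTM unit so that the cell state accumulates information across an entire document. A minimal NumPy sketch of that modification, reconstructed from the standard LSTM gate equations rather than from the paper itself (all variable names, dimensions, and the stacked-parameter layout are assumptions), might look like:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_no_forget(x, h, c, W, U, b):
    """One step of an LSTM cell with the forget gate removed.

    Standard LSTM: c_t = f_t * c_{t-1} + i_t * g_t.
    With the forget gate dropped, c_t = c_{t-1} + i_t * g_t, so the
    cell state accumulates evidence from every token it has seen.
    W, U, b hold stacked parameters for the input gate, the output
    gate, and the candidate cell (3 blocks instead of the usual 4).
    """
    z = W @ x + U @ h + b
    H = h.size
    i = sigmoid(z[0:H])          # input gate
    o = sigmoid(z[H:2 * H])      # output gate
    g = np.tanh(z[2 * H:3 * H])  # candidate cell state
    c_new = c + i * g            # no forget gate: pure accumulation
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Tiny demo: run a random "document" of 5 token embeddings through the cell.
rng = np.random.default_rng(0)
D, H = 4, 3                      # embedding size, hidden size (assumed)
W = rng.normal(scale=0.1, size=(3 * H, D))
U = rng.normal(scale=0.1, size=(3 * H, H))
b = np.zeros(3 * H)
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step_no_forget(rng.normal(size=D), h, c, W, U, b)
print(h.shape)  # (3,)
```

Without a forget gate, c never decays, so the cell state is a running accumulation over all tokens; this matches the abstract's stated goal of leveraging information across all parts of each document.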

Article
Multi-Label Classification from Multiple Noisy Sources Using Topic Models
by Divya Padmanabhan, Satyanath Bhat, Shirish Shevade and Y. Narahari
Information 2017, 8(2), 52; https://doi.org/10.3390/info8020052 - 05 May 2017
Cited by 9 | Viewed by 5745
Abstract
Multi-label classification is a well-known supervised machine learning setting in which each instance is associated with multiple classes. Examples include annotating images with multiple labels and assigning multiple tags to a web page. Since several labels can be assigned to a single instance, one of the key challenges in this problem is to learn the correlations between the classes. Our first contribution assumes labels from a perfect source. Towards this, we propose a novel topic model (ML-PA-LDA). The distinguishing feature of our model is that the classes that are present, as well as the classes that are absent, generate the latent topics and hence the words. Extensive experimentation on real-world datasets reveals the superior performance of the proposed model. A natural way to procure the training dataset is by mining user-generated content or directly through users on a crowdsourcing platform. In this more practical crowdsourcing scenario, an additional challenge arises because the labels of the training instances are provided by noisy, heterogeneous crowd workers of unknown quality. With this motivation, we further augment our topic model to the scenario where the labels are provided by multiple noisy sources and refer to this model as ML-PA-LDA-MNS. In experiments with simulated noisy annotators, the proposed model learns the qualities of the annotators well, even with minimal training data.
(This article belongs to the Special Issue Text Mining Applications and Theory)
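The abstract's distinguishing idea, that both present and absent classes generate topics and hence words, can be illustrated with a toy generative sketch. This is not the authors' ML-PA-LDA model; the sampling scheme, dimensions, and names below are illustrative assumptions only:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy dimensions (all assumed for illustration).
C, K, V = 3, 4, 20   # classes, topics, vocabulary size
doc_len = 50

# Each class carries TWO topic distributions: one used when the class
# is present in the document, another used when it is absent.
theta_present = rng.dirichlet(np.ones(K), size=C)
theta_absent = rng.dirichlet(np.ones(K), size=C)
phi = rng.dirichlet(np.ones(V), size=K)   # per-topic word distribution

def generate_doc(labels):
    """Generate one document: every class contributes topics, whether
    it is present (label 1) or absent (label 0)."""
    words = []
    for _ in range(doc_len):
        cls = rng.integers(C)                      # pick a class uniformly
        theta = theta_present[cls] if labels[cls] else theta_absent[cls]
        topic = rng.choice(K, p=theta)             # class -> topic
        words.append(rng.choice(V, p=phi[topic]))  # topic -> word
    return words

doc = generate_doc(labels=[1, 0, 1])
print(len(doc))  # 50
```

Inference in the actual model would recover the topic and word distributions from observed words and labels; the sketch only shows the forward generative story, in which absent classes also shape the document's words.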
