Natural Language Generation and Machine Learning

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (30 December 2019)

Special Issue Editor


Dr. Ioannis Konstas
Guest Editor
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh, Scotland, UK
Interests: natural language generation; natural language processing

Special Issue Information

Dear Colleagues,

The generation of natural language has always been one of the core tasks in natural language processing, with a wide spectrum of applications ranging from summarization systems to conversational agents. Only quite recently, however, has it received rekindled interest from the research community, mostly due to vast improvements in the linguistic capacity of neural generators and the wide adoption of more sophisticated generation systems within commercial personal assistants.

Natural language generation is an umbrella term for a variety of tasks that are usually categorized according to the input they take (e.g., text, unstructured/semi-structured/structured meaning representations, images, dialogue history) and the output formats they produce (e.g., sentence, document, caption, dialogue utterance). The most common tasks include text summarization; data-to-text generation, with input ranging from semi-structured record–field–value tables to meaning representations and RDF triples; caption generation for images; and conversational response generation in the context of either closed- or open-domain dialogue. Given the subjectivity of the task and, often, the lack of sufficiently large datasets with multiple reference outputs, evaluating the quality of the generated text becomes a significant bottleneck in the deployment of a successful generation system.
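To make these input formats concrete, the hypothetical snippet below sketches what a single data-to-text instance might look like for an RDF-triple input and for a flat slot–value meaning representation; the field names and example values are illustrative assumptions, not drawn from any specific dataset.

```python
# Illustrative only: rough shapes of common data-to-text inputs paired with a
# reference output sentence. Field names and values are hypothetical examples.
rdf_input = {
    "triples": [  # subject-predicate-object triples
        ("Alan_Turing", "birthPlace", "London"),
        ("Alan_Turing", "field", "Computer_Science"),
    ],
    "reference": "Alan Turing, who worked in computer science, was born in London.",
}

slot_value_mr = {
    "meaning_representation": {  # flat slot-value meaning representation
        "name": "The Eagle",
        "eatType": "coffee shop",
        "area": "riverside",
    },
    "reference": "The Eagle is a coffee shop in the riverside area.",
}
```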

The goal of this Special Issue is to present a collection of current research in data-driven approaches to natural language generation using machine learning, aiming to capture a variety of tasks, modeling approaches, and effective evaluation techniques. We are interested in submissions of high-quality, original technical and survey papers addressing both theoretical and practical aspects. We wish this Special Issue to showcase not only systems with state-of-the-art performance on a specific task but also studies that consider the ethical implications and potential societal impact of such systems with regard to generating output that is faithful, factual, and consistent with common-sense knowledge.

Dr. Ioannis Konstas
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language generation
  • natural language processing
  • data-to-text generation
  • text summarization
  • knowledge base generation
  • conversational response generation
  • multilingual generation
  • caption generation
  • generation for low-resource languages
  • quality estimation for NLG

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)


Research

Article
Multidocument Arabic Text Summarization Based on Clustering and Word2Vec to Reduce Redundancy
by Samer Abdulateef, Naseer Ahmed Khan, Bolin Chen and Xuequn Shang
Information 2020, 11(2), 59; https://doi.org/10.3390/info11020059 - 23 Jan 2020
Abstract
Arabic is one of the most semantically and syntactically complex languages in the world. A key challenge in text mining is text summarization, so we propose an unsupervised score-based method that combines the vector space model, continuous bag of words (CBOW), clustering, and a statistically based method. The main problems in multidocument text summarization are noisy data, redundancy, diminished readability, and sentence incoherence. In this study, we adopt a preprocessing strategy to solve the noise problem and use the word2vec model for two purposes: first, to map the words to fixed-length vectors and, second, to obtain the semantic relationships between the vectors based on their dimensions. Similarly, we use the k-means algorithm for two purposes: (1) selecting the distinctive documents and tokenizing them into sentences, and (2) running another iteration of k-means to select the key sentences based on a similarity metric, which overcomes the redundancy problem and generates the initial summary. Lastly, we use weighted principal component analysis (W-PCA) to weight the encoded sentences based on a list of features and select the highest-weighted ones, which correspond to the important sentences, addressing the incoherence and readability problems. We adopted Recall-Oriented Understudy for Gisting Evaluation (ROUGE) as an evaluation measure to examine our proposed technique and compare it with state-of-the-art methods. Finally, an experiment on the Essex Arabic Summaries Corpus (EASC) using the ROUGE-1 and ROUGE-2 metrics showed promising results in comparison with existing methods.
(This article belongs to the Special Issue Natural Language Generation and Machine Learning)
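The sketch below is not the authors' implementation; it is a minimal, hypothetical illustration of the general pipeline the abstract describes (averaged CBOW word2vec sentence vectors, k-means clustering to pick non-redundant representative sentences, and a simple PCA-based ranking standing in for the weighted PCA step), assuming gensim and scikit-learn are available. All function names, feature choices, and hyperparameters are placeholders.

```python
# Hypothetical sketch of a cluster-then-rank extractive summarizer, loosely
# following the pipeline described in the abstract above. Not the authors' code.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA


def sentence_vectors(sentences, vector_size=100):
    """Train a CBOW word2vec model and average word vectors per sentence."""
    tokenized = [s.lower().split() for s in sentences]
    model = Word2Vec(tokenized, vector_size=vector_size, sg=0, min_count=1)  # sg=0 -> CBOW
    vecs = []
    for tokens in tokenized:
        word_vecs = [model.wv[t] for t in tokens if t in model.wv]
        vecs.append(np.mean(word_vecs, axis=0) if word_vecs else np.zeros(vector_size))
    return np.vstack(vecs)


def summarize(sentences, n_clusters=5, summary_len=3):
    """Cluster sentence vectors, keep one representative per cluster to reduce
    redundancy, then rank representatives with a simple PCA-based score."""
    X = sentence_vectors(sentences)
    km = KMeans(n_clusters=min(n_clusters, len(sentences)), n_init=10, random_state=0).fit(X)

    # One representative sentence per cluster: the one closest to its centroid.
    reps = []
    for c in range(km.n_clusters):
        idx = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(X[idx] - km.cluster_centers_[c], axis=1)
        reps.append(idx[np.argmin(dists)])

    # PCA-based scoring: project representatives onto the first principal
    # component and rank by magnitude (a stand-in for the weighted PCA step).
    pca = PCA(n_components=1).fit(X[reps])
    scores = np.abs(pca.transform(X[reps])).ravel()
    top = [reps[i] for i in np.argsort(scores)[::-1][:summary_len]]
    return [sentences[i] for i in sorted(top)]  # keep original order for readability
```

A real system would add Arabic-specific preprocessing (normalization, stop-word removal) and a document-level clustering pass before the sentence-level one, as the abstract indicates.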
