Advances in Computational Linguistics

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (15 April 2020) | Viewed by 21405

Special Issue Editors


Guest Editor
Department of Informatics, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
Interests: multiagent systems; ontologies; knowledge representation; computational linguistics

Guest Editor
Department of Informatics, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
Interests: ontologies; knowledge representation; computational linguistics; cognitive linguistics

Special Issue Information

Dear Colleagues,

Since the emergence of the digital computer, processing information encoded in natural language has been a goal pursued by researchers in the field, because the flexibility and expressiveness of human language promise enormous leverage for human–machine communication and information extraction. However, owing to the polysemous and pragmatic nature of natural language, this goal has always been hard to achieve and was somewhat abandoned during the 1980s and 1990s.
Increasing computational power, the shift from symbolic to statistical approaches, and, more recently, the emergence of deep learning have put that goal within reach once more. As a result, research institutions and large technology companies have again invested heavily in natural language processing research. Today, it is possible to buy home computing devices that interact through natural language and can control household appliances, play music, and perform other tasks. Nonetheless, there is still much room for progress and there are obstacles to be overcome: the treatment of metaphors and other figures of speech, the generation of poetic texts, paraphrase generation and semantic similarity, and question answering systems, to name a few of the challenges.

The aim of this Special Issue is to present research dedicated to producing advances in challenging areas of natural language processing, both oral and written.

Topics of interest include but are not limited to the following:

  • Generation of poems and song lyrics;
  • Paraphrase generation and semantic similarity;
  • Contextual question answering systems;
  • Fake news detection;
  • Word-level and sentence-level semantics;
  • Sentiment analysis and argument mining;
  • Textual inference;
  • Discourse and pragmatics;
  • Summarization;
  • Methodologies and tools for corpus annotation.

Dr. Alcione de Paiva Oliveira
Dr. Alexandra Moreira
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for the submission of manuscripts are available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Natural language processing
  • Computational linguistics
  • Sentiment analysis
  • Dialogue and interactive systems
  • Discourse and pragmatics
  • Document analysis
  • Natural language generation
  • Natural language semantics
  • Information extraction
  • Text mining
  • Machine learning
  • Machine translation
  • Question answering
  • Natural language resources
  • Textual inference

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (4 papers)


Research

14 pages, 535 KiB  
Article
Modeling Word Learning and Processing with Recurrent Neural Networks
by Claudia Marzi
Information 2020, 11(6), 320; https://doi.org/10.3390/info11060320 - 13 Jun 2020
Cited by 1 | Viewed by 3199
Abstract
The paper focuses on what two different types of Recurrent Neural Networks, namely a recurrent Long Short-Term Memory network and a recurrent variant of self-organizing memories, a Temporal Self-Organizing Map, can tell us about speakers' learning and processing of a set of fully inflected verb forms selected from the top-frequency paradigms of Italian and German. Thanks to their re-entrant layer of temporal connectivity, both architectures can develop a strong sensitivity to sequential patterns that are highly attested in the training data. The main goal is to evaluate the learning and processing dynamics of verb inflection data in the two neural networks, focusing on the effects of morphological structure on word production and word recognition, as well as on generalization to untrained verb forms. For both models, results show that production, recognition, and generalization are facilitated for verb forms in regular paradigms. However, the two models are differently influenced by structural effects, with the Temporal Self-Organizing Map more prone to adaptively finding a balance between processing issues of learnability and generalization, on the one side, and discriminability on the other.
(This article belongs to the Special Issue Advances in Computational Linguistics)
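The paper's models are an LSTM and a Temporal Self-Organizing Map; as a much simpler illustration of the re-entrant temporal layer they share, here is a minimal Elman-style recurrence in NumPy. All sizes, weights, and symbol sequences are arbitrary toy values, not the paper's trained models:

```python
import numpy as np

rng = np.random.default_rng(0)
V, H = 6, 8  # toy vocabulary size and hidden-layer size (hypothetical)
W_in = rng.normal(scale=0.5, size=(H, V))   # input-to-hidden weights
W_rec = rng.normal(scale=0.5, size=(H, H))  # re-entrant (recurrent) weights

def encode(symbol_ids):
    """Run a sequence of one-hot symbols through an Elman-style
    recurrence. The hidden state at each step depends on the whole
    prefix seen so far, which is what gives recurrent models their
    sensitivity to sequential patterns such as stems and endings."""
    h = np.zeros(H)
    for i in symbol_ids:
        x = np.zeros(V)
        x[i] = 1.0
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

# The same hypothetical stem (symbols 0, 1, 2) followed by two
# different inflectional endings (3 vs. 4) yields related but
# distinct hidden states.
state_a = encode([0, 1, 2, 3])
state_b = encode([0, 1, 2, 4])
```

Training such a network to predict the next symbol is what lets the paper probe production, recognition, and generalization; this sketch only shows the forward recurrence.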

20 pages, 308 KiB  
Article
Fully-Unsupervised Embeddings-Based Hypernym Discovery
by Maurizio Atzori and Simone Balloccu
Information 2020, 11(5), 268; https://doi.org/10.3390/info11050268 - 18 May 2020
Cited by 8 | Viewed by 5552
Abstract
The hypernymy relation is the one occurring between an instance term and its general term (e.g., “lion” and “animal”, “Italy” and “country”). This paper addresses Hypernym Discovery, the NLP task that aims at finding valid hypernyms for words in a given text, proposing HyperRank, an unsupervised approach that, unlike most approaches in the literature, does not require manually labeled training sets. The proposed algorithm exploits the cosine distance of points in the vector space of word embeddings, as in previous state-of-the-art approaches, but the ranking is then corrected by also weighting word frequencies and the absolute level of similarity, which is expected to be similar when measuring co-hyponyms against their common hypernym. This brings two major advantages over other approaches: (1) it corrects the inadequacy of semantic similarity, which is known to cause a significant performance drop, and (2) it takes multiple words into account when provided, allowing common hypernyms to be found for a set of co-hyponyms, a task ignored by other systems but very useful when coupled with set expansion (which finds co-hyponyms automatically). We then evaluate HyperRank on the SemEval 2018 Hypernym Discovery task and show that, regardless of language or domain, our algorithm significantly outperforms all existing unsupervised algorithms and some supervised ones as well. We also evaluate the algorithm on a new dataset to measure the improvement when finding hypernyms for sets of words instead of singletons.
(This article belongs to the Special Issue Advances in Computational Linguistics)
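The core idea, ranking hypernym candidates by embedding similarity corrected with a frequency weight, can be sketched in a few lines. The vectors, frequencies, and the `rank_hypernyms` function below are illustrative stand-ins, not HyperRank's actual scoring formula:

```python
import math

# Toy embeddings and corpus frequencies (hypothetical values chosen
# only for illustration; hypernyms tend to be more frequent terms).
EMB = {
    "lion":    [0.90, 0.10, 0.30],
    "tiger":   [0.85, 0.15, 0.35],
    "animal":  [0.70, 0.30, 0.40],
    "country": [0.10, 0.90, 0.20],
}
FREQ = {"lion": 1_000, "tiger": 900, "animal": 50_000, "country": 80_000}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_hypernyms(query_terms, candidates):
    """Score each candidate by its average cosine similarity to the
    query set (one word or several co-hyponyms), weighted by log
    frequency as a stand-in for the paper's frequency correction."""
    scores = {}
    for c in candidates:
        sim = sum(cosine(EMB[q], EMB[c]) for q in query_terms) / len(query_terms)
        scores[c] = sim * math.log(FREQ[c])
    return sorted(scores, key=scores.get, reverse=True)

# A set of co-hyponyms pulls the ranking toward their shared hypernym.
print(rank_hypernyms(["lion", "tiger"], ["animal", "country"]))
```

Accepting a set of query terms rather than a single word is what enables the common-hypernym use case the abstract describes.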

12 pages, 674 KiB  
Article
A Diverse Data Augmentation Strategy for Low-Resource Neural Machine Translation
by Yu Li, Xiao Li, Yating Yang and Rui Dong
Information 2020, 11(5), 255; https://doi.org/10.3390/info11050255 - 6 May 2020
Cited by 24 | Viewed by 5140
Abstract
One important issue that affects the performance of neural machine translation is the scale of available parallel data. For low-resource languages, the amount of parallel data is insufficient, which results in poor translation quality. In this paper, we propose a diverse data augmentation method that does not use extra monolingual data. We expand the training data by generating diverse pseudo-parallel data on both the source and target sides. To generate diverse data, a restricted sampling strategy is employed at each decoding step. Finally, we filter and merge the original data and the synthetic parallel corpus to train the final model. In experiments, the proposed approach achieved an improvement of 1.96 BLEU points on the IWSLT2014 German–English translation task, which was used to simulate a low-resource language. Our approach also consistently obtained substantial improvements of 1.0 to 2.0 BLEU points on three other low-resource translation tasks: English–Turkish, Nepali–English, and Sinhala–English.
(This article belongs to the Special Issue Advances in Computational Linguistics)
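Restricted sampling at each decoding step means drawing the next token from only the few most probable candidates, rather than always taking the argmax, so repeated decoding runs yield varied but still plausible pseudo-translations. A minimal sketch, with hypothetical token log-probabilities standing in for a real decoder's output distribution:

```python
import math
import random

def restricted_sample(logprobs, k=3, rng=None):
    """Sample the next token from only the k most probable candidates.
    Tokens outside the top k can never be chosen, which keeps the
    generated pseudo-parallel data plausible while still diverse."""
    rng = rng or random.Random()
    top = sorted(logprobs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    tokens = [t for t, _ in top]
    weights = [math.exp(lp) for _, lp in top]  # renormalised by choices()
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical log-probabilities for the next target-side token.
step = {"Haus": -0.4, "Gebäude": -1.2, "Heim": -1.6, "Hund": -6.0, "und": -7.0}
rng = random.Random(42)
samples = {restricted_sample(step, k=3, rng=rng) for _ in range(200)}
```

Running the decoder several times with this sampler produces the multiple distinct pseudo-parallel sentence pairs that are then filtered and merged with the original data.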

23 pages, 1429 KiB  
Article
A Framework for Word Embedding Based Automatic Text Summarization and Evaluation
by Tulu Tilahun Hailu, Junqing Yu and Tessfu Geteye Fantaye
Information 2020, 11(2), 78; https://doi.org/10.3390/info11020078 - 31 Jan 2020
Cited by 18 | Viewed by 6721
Abstract
Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is hard to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on the overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased toward certain situations while producing summaries; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can successfully determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
(This article belongs to the Special Issue Advances in Computational Linguistics)
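One common way to pick the salient top-n sentences with word embeddings is to score each sentence by its similarity to the document centroid; the sketch below illustrates that general idea with toy two-dimensional word vectors, and is not the paper's exact scoring method:

```python
import math

# Toy word vectors (hypothetical values for illustration only).
VEC = {
    "cats":  [0.90, 0.10], "dogs": [0.80, 0.20], "pets": [0.85, 0.15],
    "sleep": [0.50, 0.50], "tax":  [0.10, 0.90], "law":  [0.15, 0.85],
}

def sent_vec(sentence):
    """Embed a sentence as the average of its known word vectors."""
    vecs = [VEC[w] for w in sentence.lower().split() if w in VEC]
    dim = len(next(iter(VEC.values())))
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def top_n(sentences, n):
    """Rank sentences by cosine similarity to the document centroid and
    keep the n most central ones as a pseudo-reference summary."""
    embedded = [sent_vec(s) for s in sentences]
    centroid = [sum(c) / len(sentences) for c in zip(*embedded)]
    ranked = sorted(sentences,
                    key=lambda s: cosine(sent_vec(s), centroid),
                    reverse=True)
    return ranked[:n]

doc = ["cats sleep", "dogs sleep", "tax law", "pets sleep"]
summary = top_n(doc, 2)  # the off-topic "tax law" sentence is excluded
```

The same embedding-space similarity can then be used to score a system summary against this pseudo-reference, sidestepping ROUGE's reliance on exact n-gram overlap.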
