Article

Large Scale Linguistic Processing of Tweets to Understand Social Interactions among Speakers of Less Resourced Languages: The Basque Case

by Joseba Fernandez de Landa *, Rodrigo Agerri * and Iñaki Alegria
IXA NLP Group, University of the Basque Country UPV/EHU, 20018 Donostia-San Sebastian, Spain
* Authors to whom correspondence should be addressed.
Information 2019, 10(6), 212; https://doi.org/10.3390/info10060212
Submission received: 30 April 2019 / Revised: 4 June 2019 / Accepted: 11 June 2019 / Published: 13 June 2019
(This article belongs to the Special Issue Natural Language Processing and Text Mining)

Abstract

Social networks like Twitter are increasingly important in the creation of new ways of communication. They have also become useful tools for social and linguistic research due to the massive amounts of public textual data available. This is particularly important for less resourced languages, as it allows current natural language processing techniques to be applied to large amounts of unstructured data. In this work, we study the linguistic and social aspects of young and adult people’s behaviour based on the content of their tweets and the social relations that arise from them. With this objective in mind, we gathered over 10 million tweets from more than 8000 users. First, we classified each user by life stage (young/adult) according to the writing style of their tweets. Second, we applied topic modelling techniques to the personal tweets to find the most popular topics for each life stage. Third, we established the relations and communities that emerge from the retweets. We conclude that the large amounts of unstructured data provided by Twitter facilitate social research using computational techniques such as natural language processing, making it possible both to segment communities based on demographic characteristics and to discover how they interact or relate to one another.
Keywords: social informatics; social networks; topic modelling; relations; less resourced languages; text classification; information extraction; natural language processing
