Humanistic Data Processing

A special issue of Algorithms (ISSN 1999-4893). This special issue belongs to the section "Databases and Data Structures".

Deadline for manuscript submissions: closed (6 December 2016) | Viewed by 38476

Special Issue Editors


Guest Editor
Department of Computer Engineering and Informatics, University of Patras, 26504 Rio Achaia, Greece
Interests: data structures; information retrieval; data mining; bioinformatics; string algorithmics; computational geometry; multimedia databases; internet technologies

Guest Editor
Department of Informatics, Ionian University, 491 00 Kerkira, Greece
Interests: algorithmic data management; spatio-temporal database systems; distributed data structures and P2P overlays; cloud infrastructures; indexing; query processing and query optimization

Special Issue Information

Dear Colleagues,

Data processing and analysis is one of the most important, yet challenging, tasks of our era. The abundance of information retrieved from, or related to, the Humanistic Sciences poses significant challenges to the research community. The goal is two-fold: on the one hand, to extract knowledge that aids the understanding of human behavior and enhances human creativity, learning, decision making, socializing and even biological processing; on the other hand, to extract the underlying semantic knowledge and exploit it by incorporating it into computationally intelligent systems.

The nature of humanistic data can be multimodal, semantically heterogeneous, dynamic, time- and space-dependent, and highly complicated. Translating humanistic information, e.g., behavior, state of mind, artistic creation, linguistic utterance, learning and genomic information, into numerical or categorical low-level data is a significant challenge in its own right. New algorithms, appropriate for dealing with this type of data, need to be proposed, and existing ones adapted to its particular characteristics.

This Special Issue aims to bring together interdisciplinary approaches that apply innovative as well as existing techniques for data matching, fusion, and mining, and for knowledge discovery and management (such as decision rules, decision trees, association rules, ontologies and alignments, clustering, filtering, learning, classifier systems, neural networks, support vector machines, preprocessing, post-processing, feature selection, and visualization techniques) to data derived from all areas of the Humanistic Sciences, e.g., linguistic, historical, behavioral, psychological, artistic, musical, educational, and social data. The Issue is devoted to the many facets of the above fields and will explore the current state-of-the-art. Its topics of interest cover the scope of the MHDW 2016 workshop (https://conferences.cwa.gr/mhdw2016/). Extended versions of papers presented at MHDW 2016 are sought, but this Call for Papers is fully open to anyone who wants to contribute a relevant research manuscript.

Dr. Katia Lida Kermanidis
Dr. Christos Makris
Dr. Phivos Mylonas
Dr. Spyros Sioutas
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • humanistic sciences
  • data matching and fusion
  • data mining
  • knowledge discovery and management
  • artificial intelligence
  • information retrieval
  • context
  • social data analytics

Published Papers (6 papers)


Research

Article
Fuzzy Random Walkers with Second Order Bounds: An Asymmetric Analysis
by Georgios Drakopoulos, Andreas Kanavos and Konstantinos Tsakalidis
Algorithms 2017, 10(2), 40; https://doi.org/10.3390/a10020040 - 30 Mar 2017
Cited by 10 | Viewed by 4587
Abstract
Edge-fuzzy graphs constitute an essential modeling paradigm across a broad spectrum of domains ranging from artificial intelligence to computational neuroscience and social network analysis. Under this model, fundamental graph properties such as edge length and graph diameter become stochastic and as such they are consequently expressed in probabilistic terms. Thus, algorithms for fuzzy graph analysis must rely on non-deterministic design principles. One such principle is Random Walker, which is based on a virtual entity and selects either edges or, like in this case, vertices of a fuzzy graph to visit. This allows the estimation of global graph properties through a long sequence of local decisions, making it a viable strategy candidate for graph processing software relying on native graph databases such as Neo4j. As a concrete example, Chebyshev Walktrap, a heuristic fuzzy community discovery algorithm relying on second order statistics and on the teleportation of the Random Walker, is proposed and its performance, expressed in terms of community coherence and number of vertex visits, is compared to the previously proposed algorithms of Markov Walktrap, Fuzzy Walktrap, and Fuzzy Newman–Girvan. In order to facilitate this comparison, a metric based on the asymmetric metrics of Tversky index and Kullback–Leibler divergence is used. Full article
(This article belongs to the Special Issue Humanistic Data Processing)
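The vertex-visiting Random Walker idea at the heart of this paper is easy to illustrate. The sketch below is not the authors' Chebyshev Walktrap; the graph, the fuzzy edge memberships, and the teleportation probability are all invented for illustration. Edge memberships act as unnormalized transition weights, and teleportation occasionally restarts the walker at a random vertex:

```python
import random

def fuzzy_random_walk(adj, start, steps, teleport=0.15, seed=42):
    """Vertex-visiting random walk on an edge-fuzzy graph.

    `adj` maps each vertex to {neighbor: membership in (0, 1]}; memberships
    act as unnormalized transition weights. With probability `teleport` the
    walker jumps to a uniformly random vertex instead of following an edge.
    Returns per-vertex visit counts, a rough proxy for community affinity.
    """
    rng = random.Random(seed)
    vertices = list(adj)
    visits = dict.fromkeys(vertices, 0)
    v = start
    for _ in range(steps):
        visits[v] += 1
        nbrs = adj[v]
        if not nbrs or rng.random() < teleport:
            v = rng.choice(vertices)  # teleportation step
        else:
            v = rng.choices(list(nbrs), weights=list(nbrs.values()), k=1)[0]
    return visits

# Two loosely bridged clusters: {a, b, c} and {d, e}.
adj = {
    "a": {"b": 0.9, "c": 0.8},
    "b": {"a": 0.9, "c": 0.7},
    "c": {"a": 0.8, "b": 0.7, "d": 0.1},  # weak fuzzy bridge between clusters
    "d": {"c": 0.1, "e": 0.9},
    "e": {"d": 0.9},
}
counts = fuzzy_random_walk(adj, "a", steps=10_000)
```

A long walk started inside the tightly connected cluster spends most of its visits there, which is exactly the local signal that walk-based community discovery aggregates into a global partition.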

Article
A Geo-Clustering Approach for the Detection of Areas-of-Interest and Their Underlying Semantics
by Evaggelos Spyrou, Michalis Korakakis, Vasileios Charalampidis, Apostolos Psallas and Phivos Mylonas
Algorithms 2017, 10(1), 35; https://doi.org/10.3390/a10010035 - 18 Mar 2017
Cited by 13 | Viewed by 5637
Abstract
Living in the “era of social networking”, we are experiencing a data revolution, generating an astonishing amount of digital information every single day. Due to this proliferation of data volume, there has been an explosion of new application domains for information mined from social networks. In this paper, we leverage this “socially-generated knowledge” (i.e., user-generated content derived from social networks) towards the detection of areas-of-interest within an urban region. These large and homogeneous areas contain multiple points-of-interest which are of special interest to particular groups of people (e.g., tourists and/or consumers). In order to identify them, we exploit two types of metadata, namely location-based information included within geo-tagged photos that we collect from Flickr, along with plain simple textual information from user-generated tags. We propose an algorithm that divides a predefined geographical area (i.e., the center of Athens, Greece) into “tile”-shaped sub-regions and based on an iterative merging procedure, it aims to detect larger, cohesive areas. We examine the performance of the algorithm both in a qualitative and quantitative manner. Our experiments demonstrate that the proposed geo-clustering algorithm is able to correctly detect regions that contain popular tourist attractions within them with very promising results. Full article
(This article belongs to the Special Issue Humanistic Data Processing)
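The tile-and-merge idea behind the geo-clustering algorithm can be sketched as follows. This is not the authors' implementation; the grid granularity, the density threshold, and 4-adjacency merging are illustrative assumptions. Geo-tagged points are binned into square tiles, and adjacent tiles dense enough in photos are flood-filled into larger areas-of-interest:

```python
from collections import defaultdict

def tile_counts(points, min_lat, min_lon, tile_deg):
    """Bin (lat, lon) points into square tiles of side `tile_deg` degrees."""
    counts = defaultdict(int)
    for lat, lon in points:
        counts[(int((lat - min_lat) // tile_deg),
                int((lon - min_lon) // tile_deg))] += 1
    return counts

def merge_dense_tiles(counts, min_photos):
    """Merge 4-adjacent tiles holding at least `min_photos` points into
    connected areas-of-interest (a simple flood fill over dense tiles)."""
    dense = {t for t, c in counts.items() if c >= min_photos}
    areas, seen = [], set()
    for start in sorted(dense):
        if start in seen:
            continue
        area, stack = set(), [start]
        while stack:
            i, j = t = stack.pop()
            if t in seen:
                continue
            seen.add(t)
            area.add(t)
            stack += [n for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1))
                      if n in dense]
        areas.append(area)
    return areas

# Synthetic demo: two dense clusters of points, one spanning two tiles.
pts = [(0.5, 0.5)] * 3 + [(1.5, 0.5)] * 3 + [(5.5, 5.5)] * 3
areas = merge_dense_tiles(tile_counts(pts, 0.0, 0.0, 1.0), min_photos=3)
```

The two adjacent dense tiles merge into a single area, while the isolated tile stays its own area; semantics for each area would then come from the photos' user-generated tags.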

Article
A Novel, Gradient Boosting Framework for Sentiment Analysis in Languages where NLP Resources Are Not Plentiful: A Case Study for Modern Greek
by Vasileios Athanasiou and Manolis Maragoudakis
Algorithms 2017, 10(1), 34; https://doi.org/10.3390/a10010034 - 06 Mar 2017
Cited by 36 | Viewed by 9035
Abstract
Sentiment analysis has played a primary role in text classification. It is an undoubted fact that some years ago, textual information was spreading in manageable rates; however, nowadays, such information has overcome even the most ambiguous expectations and constantly grows within seconds. It is therefore quite complex to cope with the vast amount of textual data particularly if we also take the incremental production speed into account. Social media, e-commerce, news articles, comments and opinions are broadcasted on a daily basis. A rational solution, in order to handle the abundance of data, would be to build automated information processing systems, for analyzing and extracting meaningful patterns from text. The present paper focuses on sentiment analysis applied in Greek texts. Thus far, there is no wide availability of natural language processing tools for Modern Greek. Hence, a thorough analysis of Greek, from the lexical to the syntactical level, is difficult to perform. This paper attempts a different approach, based on the proven capabilities of gradient boosting, a well-known technique for dealing with high-dimensional data. The main rationale is that since English has dominated the area of preprocessing tools and there are also quite reliable translation services, we could exploit them to transform Greek tokens into English, thus assuring the precision of the translation, since the translation of large texts is not always reliable and meaningful. The new feature set of English tokens is augmented with the original set of Greek, consequently producing a high dimensional dataset that poses certain difficulties for any traditional classifier. Accordingly, we apply gradient boosting machines, an ensemble algorithm that can learn with different loss functions providing the ability to work efficiently with high dimensional data. 
Moreover, for the task at hand, we deal with class imbalance issues, since the distribution of sentiments in real-world applications is often unequal. For example, in political forums or electronic discussions about immigration or religion, negative comments overwhelm the positive ones. The class imbalance problem was confronted using a hybrid technique that performs a variation of under-sampling the majority class and over-sampling the minority class. Experimental results, considering different settings, such as translation of tokens against translation of sentences, consideration of limited Greek text preprocessing and omission of the translation phase, demonstrated that the proposed gradient boosting framework can effectively cope with both high-dimensional and imbalanced datasets and performs significantly better than a plethora of traditional machine learning classification approaches in terms of precision and recall measures. Full article
(This article belongs to the Special Issue Humanistic Data Processing)
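The hybrid resampling idea described in the abstract can be sketched generically. The paper's exact variation is not specified here, so the "meet in the middle" target size and the random sampling strategy below are illustrative assumptions, not the authors' method:

```python
import random

def hybrid_rebalance(samples, labels, seed=0):
    """Hybrid resampling for class imbalance: under-sample classes above a
    mid-point target size (without replacement) and over-sample classes
    below it (with replacement), so every class ends up the same size."""
    rng = random.Random(seed)
    by_label = {}
    for x, y in zip(samples, labels):
        by_label.setdefault(y, []).append(x)
    sizes = [len(xs) for xs in by_label.values()]
    target = (min(sizes) + max(sizes)) // 2  # meet in the middle
    out = []
    for y, xs in by_label.items():
        if len(xs) > target:
            xs = rng.sample(xs, target)  # under-sample the majority
        elif len(xs) < target:
            xs = xs + [rng.choice(xs) for _ in range(target - len(xs))]  # over-sample
        out.extend((x, y) for x in xs)
    rng.shuffle(out)
    return out

# Demo: 90 "negative" (0) vs. 10 "positive" (1) samples -> 50/50 split.
data = hybrid_rebalance(list(range(100)), [0] * 90 + [1] * 10)
```

A gradient boosting classifier trained on the rebalanced set then sees both sentiment classes equally often, which is what makes precision and recall on the minority class meaningful.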

Article
Large Scale Implementations for Twitter Sentiment Classification
by Andreas Kanavos, Nikolaos Nodarakis, Spyros Sioutas, Athanasios Tsakalidis, Dimitrios Tsolis and Giannis Tzimas
Algorithms 2017, 10(1), 33; https://doi.org/10.3390/a10010033 - 04 Mar 2017
Cited by 49 | Viewed by 6066
Abstract
Sentiment Analysis on Twitter Data is indeed a challenging problem due to the nature, diversity and volume of the data. People tend to express their feelings freely, which makes Twitter an ideal source for accumulating a vast amount of opinions towards a wide spectrum of topics. This amount of information offers huge potential and can be harnessed to receive the sentiment tendency towards these topics. However, since no one can invest an infinite amount of time to read through these tweets, an automated decision making approach is necessary. Nevertheless, most existing solutions are limited in centralized environments only. Thus, they can only process at most a few thousand tweets. Such a sample is not representative in order to define the sentiment polarity towards a topic due to the massive number of tweets published daily. In this work, we develop two systems: the first in the MapReduce and the second in the Apache Spark framework for programming with Big Data. The algorithm exploits all hashtags and emoticons inside a tweet, as sentiment labels, and proceeds to a classification method of diverse sentiment types in a parallel and distributed manner. Moreover, the sentiment analysis tool is based on Machine Learning methodologies alongside Natural Language Processing techniques and utilizes Apache Spark’s Machine learning library, MLlib. In order to address the nature of Big Data, we introduce some pre-processing steps for achieving better results in Sentiment Analysis as well as Bloom filters to compact the storage size of intermediate data and boost the performance of our algorithm. Finally, the proposed system was trained and validated with real data crawled by Twitter, and, through an extensive experimental evaluation, we prove that our solution is efficient, robust and scalable while confirming the quality of our sentiment identification. Full article
(This article belongs to the Special Issue Humanistic Data Processing)
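The Bloom filters the authors use to compact intermediate data are a standard structure and easy to sketch. The sizing, the hash construction, and the tweet-hashtag example below are illustrative assumptions, not the paper's configuration:

```python
import hashlib

class BloomFilter:
    """Compact probabilistic set: membership tests may yield false
    positives but never false negatives, so it can safely compact
    intermediate data at a fraction of the storage cost."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.m, self.k = num_bits, num_hashes
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        return all(self.bits[p] for p in self._positions(item))

# Demo: record hashtags seen so far without storing the strings themselves.
bf = BloomFilter()
for tag in ("#happy", "#sad", "#mondaymotivation"):
    bf.add(tag)
```

In a MapReduce or Spark pipeline, such a filter lets a worker test "have I already emitted this key?" using a fixed-size bit array instead of shipping the full set of seen keys between stages.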

Article
Mining Domain-Specific Design Patterns: A Case Study †
by Vassiliki Gkantouna and Giannis Tzimas
Algorithms 2017, 10(1), 28; https://doi.org/10.3390/a10010028 - 21 Feb 2017
Viewed by 4536
Abstract
Domain-specific design patterns provide developers with proven solutions to common design problems that arise, particularly in a target application domain, facilitating them to produce quality designs in the domain contexts. However, research in this area is not mature and there are no techniques to support their detection. Towards this end, we propose a methodology which, when applied on a collection of websites in a specific domain, facilitates the automated identification of domain-specific design patterns. The methodology automatically extracts the conceptual models of the websites, which are subsequently analyzed in terms of all of the reusable design fragments used in them for supporting common domain functionalities. At the conceptual level, we consider these fragments as recurrent patterns consisting of a configuration of front-end interface components that interrelate each other and interact with end-users to support certain functionality. By performing a pattern-based analysis of the models, we locate the occurrences of all the recurrent patterns in the various website designs which are then evaluated towards their consistent use. The detected patterns can be used as building blocks in future designs, assisting developers to produce consistent and quality designs in the target domain. To support our case, we present a case study for the educational domain. Full article
(This article belongs to the Special Issue Humanistic Data Processing)

Article
Evaluation of Diversification Techniques for Legal Information Retrieval
by Marios Koniaris, Ioannis Anagnostopoulos and Yannis Vassiliou
Algorithms 2017, 10(1), 22; https://doi.org/10.3390/a10010022 - 29 Jan 2017
Cited by 17 | Viewed by 7592
Abstract
“Public legal information from all countries and international institutions is part of the common heritage of humanity. Maximizing access to this information promotes justice and the rule of law”. In accordance with the aforementioned declaration on free access to law by legal information institutes of the world, a plethora of legal information is available through the Internet, while the provision of legal information has never before been easier. Given that law is accessed by a much wider group of people, the majority of whom are not legally trained or qualified, diversification techniques should be employed in the context of legal information retrieval, as to increase user satisfaction. We address the diversification of results in legal search by adopting several state of the art methods from the web search, network analysis and text summarization domains. We provide an exhaustive evaluation of the methods, using a standard dataset from the common law domain that we objectively annotated with relevance judgments for this purpose. Our results: (i) reveal that users receive broader insights across the results they get from a legal information retrieval system; (ii) demonstrate that web search diversification techniques outperform other approaches (e.g., summarization-based, graph-based methods) in the context of legal diversification; and (iii) offer balance boundaries between reinforcing relevant documents or sampling the information space around the legal query. Full article
(This article belongs to the Special Issue Humanistic Data Processing)
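One classic web-search diversification technique of the kind this paper evaluates is Maximal Marginal Relevance (MMR), which greedily trades off relevance against redundancy. The paper's exact method set is not reproduced here; the relevance scores and pairwise similarities below are toy values chosen to show the re-ranking effect:

```python
def mmr(relevance, sim, k, lam=0.7):
    """Maximal Marginal Relevance: greedily select k documents, trading
    relevance to the query against similarity to already-selected results.
    `relevance` maps doc -> query relevance; `sim(a, b)` returns a
    similarity in [0, 1]; `lam` = 1.0 ignores diversity entirely."""
    selected, candidates = [], set(relevance)
    while candidates and len(selected) < k:
        best = max(
            candidates,
            key=lambda d: lam * relevance[d]
            - (1 - lam) * max((sim(d, s) for s in selected), default=0.0),
        )
        selected.append(best)
        candidates.discard(best)
    return selected

# Toy example: B is highly relevant but near-duplicates A (e.g., two rulings
# citing the same precedent), so a diversity-aware ranking prefers the less
# relevant but novel C in second place.
relevance = {"A": 1.0, "B": 0.9, "C": 0.5}
pair_sim = {frozenset("AB"): 0.95, frozenset("AC"): 0.1, frozenset("BC"): 0.1}
sim = lambda a, b: pair_sim[frozenset((a, b))]
```

With `lam=0.5` the ranking becomes ["A", "C"] rather than the pure-relevance ["A", "B"], which is precisely the "broader insight across results" effect the evaluation measures.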
