Humanistic Data Mining: Tools and Applications

A special issue of Algorithms (ISSN 1999-4893).

Deadline for manuscript submissions: closed (31 October 2018) | Viewed by 34524

Special Issue Editors

Department of Computer Engineering and Informatics, University of Patras, 26504 Patras, Greece
Interests: data structures; information retrieval; data mining; bioinformatics; string algorithmics; computational geometry; multimedia databases; internet technologies
Department of Informatics, Ionian University, 491 00 Kerkira, Greece
Interests: algorithmic data management; spatio-temporal database systems; distributed data structures and P2P overlays; cloud infrastructures; indexing; query processing and query optimization
Computer Engineering & Informatics Department, University of Patras, Greece
Interests: database and knowledge-based systems; intelligent information systems; data mining; pattern recognition; data compression; biomedical informatics; multimedia

Special Issue Information

Dear Colleagues,

Digital data mining can be described as one of the most important, computationally intensive and challenging tasks of our era. This applies both to the research community, which faces enormous challenges arising from (big-)data management and from emerging disciplines such as precision agriculture, and to the applied world, for instance in social data handling and related social apps. It is therefore becoming evident that new approaches have to be followed, and new tools and applications have to be invented, in order to handle the vast amounts of information efficiently.

The aim of the “Mining Humanistic Data Workshop”, and by association of the proposed Special Issue, rests on two main pillars. The first pillar focuses on the analysis of primitive information and knowledge, as well as the extraction of the inherited knowledge. The task here is to achieve a better understanding of the human activities associated with the respective computational tasks. The second pillar aims to exploit the extracted knowledge by incorporating it into smart tools and applications, which will ultimately make the everyday life of the users involved easier.

This Special Issue aims to bring together interdisciplinary approaches that focus on the application of innovative as well as existing methodologies for humanistic data mining, knowledge discovery and knowledge management. Since humanistic data are typically dominated by semantic heterogeneity and are quite dynamic in nature, computer science researchers are encouraged to develop new suitable algorithms, tools and applications to tackle them efficiently, whereas existing ones need to be adapted to the individual special characteristics of the data using traditional methodologies, such as decision rules, decision trees, association rules, ontologies and alignments, clustering, filtering, learning, classifier systems, neural networks, support vector machines, preprocessing, post-processing, feature selection and visualization techniques. The Special Issue is devoted to exploring the multiple facets of the above research fields and the current related state of the art. Its topics of interest cover the scope of the MHDW 2018 workshop (https://conferences.cwa.gr/mhdw2018/). Extended versions of papers presented at MHDW 2018 are sought, but this Call for Papers is also fully open to all who want to contribute by submitting a relevant research manuscript.

Dr. Phivos Mylonas
Dr. Katia Lida Kermanidis
Dr. Christos Makris
Dr. Spyros Sioutas
Dr. Vasileios Megalooikonomou
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Algorithms is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • humanistic sciences
  • data mining
  • knowledge discovery
  • knowledge representation and management
  • artificial intelligence
  • information retrieval
  • context
  • social data analytics

Published Papers (8 papers)

Research

14 pages, 463 KiB  
Article
Ensemble and Deep Learning for Language-Independent Automatic Selection of Parallel Data
by Despoina Mouratidis and Katia Lida Kermanidis
Algorithms 2019, 12(1), 26; https://doi.org/10.3390/a12010026 - 18 Jan 2019
Cited by 5 | Viewed by 4348
Abstract
Machine translation is used in many applications in everyday life. Due to the increase of translated documents that need to be organized as useful or not (for building a translation model), the automated categorization of texts (classification) is a popular research field of machine learning. This kind of information can be quite helpful for machine translation. Our parallel corpora (English-Greek and English-Italian) are based on educational data, which are quite difficult to translate. We apply two state-of-the-art architectures, Random Forest (RF) and Deeplearning4j (DL4J), to our data (which constitute three translation outputs). To our knowledge, this is the first time that deep learning architectures are applied to the automatic selection of parallel data. We also propose new string-based features that seem to be effective for the classifier, and we investigate whether an attribute selection method could be used for better classification accuracy. Experimental results indicate an increase of up to 4% (compared to our previous work) using RF and rather satisfactory results using DL4J.
(This article belongs to the Special Issue Humanistic Data Mining: Tools and Applications)

19 pages, 541 KiB  
Article
Trajectory Clustering and k-NN for Robust Privacy Preserving Spatiotemporal Databases
by Elias Dritsas, Maria Trigka, Panagiotis Gerolymatos and Spyros Sioutas
Algorithms 2018, 11(12), 207; https://doi.org/10.3390/a11120207 - 14 Dec 2018
Cited by 10 | Viewed by 5184
Abstract
In the context of this research work, we studied the problem of privacy preservation in spatiotemporal databases. In particular, we investigated the k-anonymity of mobile users based on real trajectory data. The k-anonymity set consists of the k nearest neighbors. We constructed a motion vector of the form (x,y,g,v), where x and y are the spatial coordinates, g is the angle direction, and v is the velocity of mobile users, and studied the problem in four-dimensional space. We followed two approaches. The former applied only the k-Nearest Neighbor (k-NN) algorithm on the whole dataset, while the latter combined trajectory clustering, based on K-means, with k-NN; that is, it applied k-NN inside a cluster of mobile users with a similar motion pattern (g,v). We defined a metric, called vulnerability, that measures the rate at which the k-NNs vary. This metric varies from 1/k (high robustness) to 1 (low robustness) and represents the probability that the real identity of a mobile user is discovered by a potential attacker. The aim of this work was to prove that, with high probability, the above rate tends to a number very close to 1/k in the clustering method, which means that the k-anonymity is highly preserved. Through experiments on real spatial datasets, we evaluated the anonymity robustness, the so-called vulnerability, of the proposed method.
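As an illustration of the k-NN step described in this abstract, the following is a minimal sketch, not the authors' implementation. It assumes plain Euclidean distance over unscaled (x, y, g, v) tuples and assumes that the user is excluded from their own anonymity set; the function name `k_anonymity_set` and the toy data are hypothetical.

```python
import math


def k_anonymity_set(points, query_index, k):
    """Return the indices of the k nearest neighbors of points[query_index].

    Each point is a motion vector (x, y, g, v). Distance is plain Euclidean
    distance in four-dimensional space (an assumption; the abstract does not
    state the distance function or any feature scaling).
    """
    query = points[query_index]
    # Assumption: the queried user is not a member of their own set.
    others = [i for i in range(len(points)) if i != query_index]
    others.sort(key=lambda i: math.dist(points[i], query))
    return others[:k]


# Hypothetical toy data: three users moving alike, one far away.
users = [
    (0.0, 0.0, 0.0, 1.0),
    (0.1, 0.0, 0.0, 1.0),
    (0.0, 0.2, 0.0, 1.0),
    (5.0, 5.0, 1.5, 3.0),
]
print(k_anonymity_set(users, 0, 2))
```

In the clustering variant mentioned in the abstract, the same search would presumably be restricted to the users of one cluster, which here would amount to passing only that cluster's motion vectors as `points`.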

12 pages, 1046 KiB  
Article
Decision Support Software for Forecasting Patient’s Length of Stay
by Ioannis E. Livieris, Theodore Kotsilieris, Ioannis Dimopoulos and Panagiotis Pintelas
Algorithms 2018, 11(12), 199; https://doi.org/10.3390/a11120199 - 06 Dec 2018
Cited by 12 | Viewed by 3755
Abstract
Length of stay of hospitalized patients is generally considered to be a significant and critical factor for healthcare policy planning, which consequently affects the hospital management plan and resources. Its reliable prediction in the preadmission stage could further assist in identifying abnormality or potential medical risks to trigger additional attention for individual cases. Recently, data mining and machine learning have become significant tools in the healthcare domain. In this work, we introduce new decision support software for the accurate prediction of hospitalized patients’ length of stay, which incorporates a novel two-level classification algorithm. Our numerical experiments indicate that the proposed algorithm exhibits better classification performance than any examined single learning algorithm. The proposed software was developed to provide assistance to the hospital management and strengthen the service system by offering customized assistance according to patients’ predicted hospitalization time.

22 pages, 1404 KiB  
Article
Solon: A Holistic Approach for Modelling, Managing and Mining Legal Sources
by Marios Koniaris, George Papastefanatos and Ioannis Anagnostopoulos
Algorithms 2018, 11(12), 196; https://doi.org/10.3390/a11120196 - 03 Dec 2018
Cited by 6 | Viewed by 4308
Abstract
Recently there has been an exponential growth in the number of publicly available legal resources. Portals allowing users to search legal documents through keyword queries are now widespread. However, legal documents are mainly stored and offered in different sources and formats that do not facilitate semantic machine-readable techniques, thus making it difficult for legal stakeholders to acquire, modify or interlink legal knowledge. In this paper, we describe Solon, a legal document management platform. It offers advanced modelling, managing and mining functions over legal sources, so as to facilitate access to legal knowledge. It utilizes a novel method for extracting semantic representations of legal sources from unstructured formats, such as PDF and HTML text files, interlinking and enhancing them with classification features. At the same time, utilizing the structure and specific features of legal sources, it provides refined search results. Finally, it allows users to connect and explore legal resources according to their individual needs. To demonstrate the applicability and usefulness of our approach, Solon has been successfully deployed in a public sector production environment, making Greek tax legislation easily accessible to the public. Opening up legislation in this way will help increase transparency and make governments more accountable to citizens.

24 pages, 3347 KiB  
Article
Measuring the Impact of Financial News and Social Media on Stock Market Modeling Using Time Series Mining Techniques
by Foteini Kollintza-Kyriakoulia, Manolis Maragoudakis and Anastasia Krithara
Algorithms 2018, 11(11), 181; https://doi.org/10.3390/a11110181 - 06 Nov 2018
Cited by 8 | Viewed by 4943
Abstract
In this work, we study the task of predicting the closing price of the following day of a stock, based on technical analysis, news articles and public opinions. The intuition of this study lies in the fact that technical analysis contains information about the event, but not the cause of the change, while data like news articles and public opinions may be interpreted as a cause. The paper uses time series analysis techniques such as Symbolic Aggregate Approximation (SAX) and Dynamic Time Warping (DTW) to study the existence of a relation between price data and textual information, either from news or social media. Pattern matching techniques from time series data are also incorporated, in order to experimentally validate potential correlations of price and textual information within given time periods. The ultimate goal is to create a forecasting model that exploits the previously discovered patterns in order to augment the forecasting accuracy. Results obtained from the experimental phase are promising. The performance of the classifier shows clear signs of improvement and robustness within the time periods where patterns between stock price and the textual information have been identified, compared to the periods where patterns did not exist.
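For readers unfamiliar with Dynamic Time Warping, the textbook dynamic-programming formulation can be sketched as follows. This is a generic illustration, not the configuration used in the paper: the absolute-difference local cost, the absence of a warping window, and the function name `dtw_distance` are all assumptions made here.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance between two numeric sequences.

    cost[i][j] holds the cheapest cumulative cost of aligning the first i
    elements of a with the first j elements of b.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already used a[i-1]
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]


# Two series with the same shape but different pacing align at zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

Because the alignment may stretch or compress the time axis, a price series and a text-derived series that follow the same pattern at different speeds can still be judged similar, which is what makes the measure attractive for the kind of comparison the abstract describes.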

22 pages, 1468 KiB  
Article
Learning Representations of Natural Language Texts with Generative Adversarial Networks at Document, Sentence, and Aspect Level
by Aggeliki Vlachostergiou, George Caridakis, Phivos Mylonas and Andreas Stafylopatis
Algorithms 2018, 11(10), 164; https://doi.org/10.3390/a11100164 - 22 Oct 2018
Cited by 8 | Viewed by 4048
Abstract
The ability to learn robust, resizable feature representations from unlabeled data has potential applications in a wide variety of machine learning tasks. One way to create such representations is to train deep generative models that can learn to capture the complex distribution of real-world data. Generative adversarial network (GAN) approaches have shown impressive results in producing generative models of images, but relatively little work has been done on evaluating the performance of these methods for the learning representation of natural language, both in supervised and unsupervised settings at the document, sentence, and aspect level. Extensive research validation experiments were performed by leveraging the 20 Newsgroups corpus, the Movie Review (MR) Dataset, and the Finegrained Sentiment Dataset (FSD). Our experimental analysis suggests that GANs can successfully learn representations of natural language texts at all three aforementioned levels.

19 pages, 507 KiB  
Article
LSTM Accelerator for Convolutional Object Identification
by Alkiviadis Savvopoulos, Andreas Kanavos, Phivos Mylonas and Spyros Sioutas
Algorithms 2018, 11(10), 157; https://doi.org/10.3390/a11100157 - 17 Oct 2018
Cited by 22 | Viewed by 3023
Abstract
Deep Learning has dramatically advanced the state of the art in vision, speech and many other areas. Recently, numerous deep learning algorithms have been proposed to solve traditional artificial intelligence problems. In this paper, in order to detect the version that can provide the best trade-off in terms of time and accuracy, convolutional networks of various depths have been implemented. Batch normalization is also considered since it acts as a regularizer and achieves the same accuracy with fewer training steps. For maximizing the yield of the complexity by diminishing, as well as minimizing the loss of accuracy, LSTM neural net layers are utilized in the process. The image sequences are proven to be classified by the LSTM in a more accelerated manner, while managing better precision. Concretely, the more complex the CNN, the higher the percentages of exactitude; in addition, but for the high-rank increase in accuracy, the time was significantly decreased, which eventually rendered the trade-off optimal. The average improvement of performance for all models regarding both datasets used amounted to 42%.

16 pages, 325 KiB  
Article
An Auto-Adjustable Semi-Supervised Self-Training Algorithm
by Ioannis E. Livieris, Andreas Kanavos, Vassilis Tampakas and Panagiotis Pintelas
Algorithms 2018, 11(9), 139; https://doi.org/10.3390/a11090139 - 14 Sep 2018
Cited by 21 | Viewed by 3973
Abstract
Semi-supervised learning algorithms have become a topic of significant research as an alternative to traditional classification methods which exhibit remarkable performance over labeled data but lack the ability to be applied on large amounts of unlabeled data. In this work, we propose a new semi-supervised learning algorithm that dynamically selects the most promising learner for a classification problem from a pool of classifiers based on a self-training philosophy. Our experimental results illustrate that the proposed algorithm outperforms its component semi-supervised learning algorithms in terms of accuracy, leading to more efficient, stable and robust predictive models.
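The self-training idea this abstract builds on can be sketched with a single base learner. The example below is a generic illustration only: it does not implement the auto-adjustable selection among a pool of classifiers proposed in the paper. The one-dimensional nearest-centroid learner, the margin-based confidence rule, and all names (`centroids`, `predict_with_margin`, `self_train`, `min_margin`) are assumptions chosen to keep the sketch self-contained.

```python
def centroids(labeled):
    """Fit a toy 1-D nearest-centroid model: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict_with_margin(model, x):
    """Return (label, margin), where margin is the distance gap between the
    runner-up centroid and the nearest one (used here as confidence)."""
    ranked = sorted(model, key=lambda y: abs(x - model[y]))
    best = ranked[0]
    if len(ranked) == 1:
        return best, float("inf")
    margin = abs(x - model[ranked[1]]) - abs(x - model[best])
    return best, margin


def self_train(labeled, unlabeled, min_margin=1.0, max_rounds=10):
    """Repeatedly fit on the labeled set, pseudo-label the unlabeled points
    predicted with enough confidence, and add them to the labeled set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(labeled)
        confident, remaining = [], []
        for x in pool:
            label, margin = predict_with_margin(model, x)
            if margin >= min_margin:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from; stop early
        labeled.extend(confident)
        pool = remaining
    return centroids(labeled), pool


model, leftover = self_train([(0.0, "a"), (10.0, "b")], [1.0, 9.0, 5.2])
print(model, leftover)
```

Points near a class centroid are absorbed in the first round, while the ambiguous point stays unlabeled. A pool-based variant such as the one the abstract describes would additionally compare several learners, but the selection criterion is not given in the abstract and is therefore not sketched here.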