Research

22 pages, 2548 KiB

Open Access Article

Comparison of Compression-Based Measures with Application to the Evolution of Primate Genomes

by Diogo Pratas, Raquel M. Silva and Armando J. Pinho

Entropy 2018, 20(6), 393; https://doi.org/10.3390/e20060393 - 23 May 2018

Cited by 5 | Viewed by 4502

An efficient DNA compressor furnishes an approximation to measure and compare information quantities present in, between and across DNA sequences, regardless of the characteristics of the sources. In this paper, we directly compare two information measures, the Normalized Compression Distance (NCD) and the Normalized Relative Compression (NRC). These measures answer different questions: the NCD measures how similar both strings are (in terms of information content), whereas the NRC (which, in general, is nonsymmetric) indicates the fraction of one of them that cannot be constructed using information from the other one. This leads to the problem of finding out which measure (or question) is more suitable for the answer we need. For computing both, we use a state-of-the-art DNA sequence compressor that we benchmark against some top compressors in different compression modes. Then, we apply the compressor to DNA sequences of different scales and natures, first synthetic sequences and then real DNA sequences. The latter include mitochondrial DNA (mtDNA), messenger RNA (mRNA) and genomic DNA (gDNA) of seven primates. We provide several insights into evolutionary acceleration rates at different scales, namely, the observation and confirmation across the whole genomes of a higher variation rate of the mtDNA relative to the gDNA. We also show the importance of relative compression for localizing similar information regions using mtDNA.

(This article belongs to the Special Issue Entropy-based Data Mining)
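
As a rough illustration of the NCD mentioned in the abstract above, the sketch below uses the standard formula NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is a compressed size. It assumes Python's general-purpose `zlib` as a stand-in compressor, which is not the specialized DNA compressor the article uses; the helper names and the synthetic sequences are hypothetical.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of `data` after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, approximated with zlib (assumption)."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def random_dna(length: int, rng: random.Random) -> bytes:
    """Hypothetical synthetic sequence drawn uniformly over A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length)).encode()


def mutate(seq: bytes, rate: float, rng: random.Random) -> bytes:
    """Substitute each base with a different base with probability `rate`."""
    out = bytearray(seq)
    for i, base in enumerate(out):
        if rng.random() < rate:
            out[i] = rng.choice([b for b in b"ACGT" if b != base])
    return bytes(out)


rng = random.Random(0)
x = random_dna(2000, rng)      # reference sequence
y = mutate(x, 0.02, rng)       # close relative of x
z = random_dna(2000, rng)      # unrelated sequence
print("NCD(x, y) =", ncd(x, y))
print("NCD(x, z) =", ncd(x, z))
```

A lightly mutated copy should score a smaller distance than an unrelated sequence. With zlib the comparison is only meaningful for inputs well inside its 32 KiB window, which is one reason a dedicated DNA compressor is preferable for genome-scale data.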

9 pages, 3839 KiB

Open Access Article

Remote Sensing Extraction Method of Tailings Ponds in Ultra-Low-Grade Iron Mining Area Based on Spectral Characteristics and Texture Entropy

by Baodong Ma, Yuteng Chen, Song Zhang and Xuexin Li

Entropy 2018, 20(5), 345; https://doi.org/10.3390/e20050345 - 06 May 2018

Cited by 12 | Viewed by 3860

Abstract

With the rapid development of the steel and iron industry, ultra-low-grade iron ore has been developed extensively since the beginning of this century in China. Due to the high concentration ratio of the iron ore, a large amount of tailings was produced and many tailings ponds were established in the mining area. This poses a great threat to regional safety and the environment because of dam breaks and metal pollution. The spatial distribution of tailings ponds is the basic information needed for monitoring their status. Taking Changhe Mining Area as an example, tailings ponds were extracted from Landsat 8 OLI images based on both spectral and texture characteristics. First, ultra-low-grade iron-related objects (i.e., tailings and iron ore) were extracted by the Ultra-low-grade Iron-related Objects Index (ULIOI) with a threshold. Second, the tailings pond was distinguished from the stope by their entropy difference in the panchromatic image at a 7 × 7 window size. This remote sensing method could be beneficial to safety and environmental management in the mining area.

(This article belongs to the Special Issue Entropy-based Data Mining)
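
The texture-entropy step in the abstract above can be pictured as the Shannon entropy of the gray-level histogram inside a sliding window. The following is a minimal sketch under that assumption; the function names, the border handling (the window is clipped at image edges) and the toy images are hypothetical and are not taken from the article.

```python
import math
from collections import Counter


def shannon_entropy(values) -> float:
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def local_entropy(image, row, col, window=7):
    """Entropy of gray levels in a window centred on (row, col).

    `image` is a list of equal-length rows of integer gray levels.
    The window is clipped where it would extend past the image border.
    """
    half = window // 2
    rows = range(max(0, row - half), min(len(image), row + half + 1))
    cols = range(max(0, col - half), min(len(image[0]), col + half + 1))
    return shannon_entropy(image[r][c] for r in rows for c in cols)


# Toy 9 x 9 images: a flat surface and a two-level checkerboard texture.
flat = [[5] * 9 for _ in range(9)]
checker = [[(r + c) % 2 for c in range(9)] for r in range(9)]
print(local_entropy(flat, 4, 4), local_entropy(checker, 4, 4))
```

A smooth region yields an entropy near zero while a rough texture yields a higher value, which is the kind of contrast a threshold on local entropy can exploit.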

17 pages, 1967 KiB

Open Access Article

KL Divergence-Based Fuzzy Cluster Ensemble for Image Segmentation

by Huiqin Wei, Long Chen and Li Guo

Entropy 2018, 20(4), 273; https://doi.org/10.3390/e20040273 - 12 Apr 2018

Cited by 22 | Viewed by 5008

Abstract

Ensemble clustering combines different basic partitions of a dataset into a more stable and robust one. Thus, cluster ensemble plays a significant role in applications like image segmentation. However, existing ensemble methods have several drawbacks, including the lack of diversity of basic partitions and the low accuracy caused by data noise. In this paper, to overcome these difficulties, we propose an efficient fuzzy cluster ensemble method based on the Kullback–Leibler (KL) divergence. The data are first classified with distinct fuzzy clustering methods. Then, the soft clustering results are aggregated by a fuzzy KL divergence-based objective function. Moreover, for image segmentation problems, we utilize the local spatial information in the cluster ensemble algorithm to suppress the effect of noise. Experimental results reveal that the proposed methods outperform many other methods in synthetic and real image-segmentation problems.

(This article belongs to the Special Issue Entropy-based Data Mining)
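
For readers unfamiliar with the quantity named in the abstract above, the sketch below computes the plain KL divergence between two fuzzy membership vectors. It shows only the basic divergence, not the article's ensemble objective function; the function name, the `eps` guard and the example vectors are assumptions made for illustration.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL divergence D(p || q) in nats between two discrete distributions.

    Terms with p_i == 0 contribute nothing; q_i is floored at `eps`
    (an illustrative guard) to avoid division by zero.
    """
    return sum(
        pi * math.log(pi / max(qi, eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical soft memberships of one pixel in three clusters,
# as produced by two different fuzzy clustering runs.
run_a = [0.7, 0.2, 0.1]
run_b = [0.3, 0.3, 0.4]
print(kl_divergence(run_a, run_b))
```

The divergence is zero when two runs agree exactly and grows as their memberships diverge, and it is not symmetric in its arguments.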

19 pages, 24604 KiB

Open Access Article

Multiple Sclerosis Identification Based on Fractional Fourier Entropy and a Modified Jaya Algorithm

by Shui-Hua Wang, Hong Cheng, Preetha Phillips and Yu-Dong Zhang

Entropy 2018, 20(4), 254; https://doi.org/10.3390/e20040254 - 05 Apr 2018

Cited by 35 | Viewed by 5200

Abstract

Aim: Currently, identifying multiple sclerosis (MS) by human experts may encounter the problem of “normal-appearing white matter”, which causes a low sensitivity. Methods: In this study, we presented a computer-vision-based approach to identify MS in an automatic way. This proposed method first extracted the fractional Fourier entropy map from a specified brain image. Afterwards, it sent the features to a multilayer perceptron trained by a proposed improved parameter-free Jaya algorithm. We used cost-sensitive learning to handle the imbalanced data problem. Results: The 10 × 10-fold cross validation showed our method yielded a sensitivity of 97.40 ± 0.60%, a specificity of 97.39 ± 0.65%, and an accuracy of 97.39 ± 0.59%. Conclusions: We validated by experiments that the proposed improved Jaya performs better than the plain Jaya algorithm and other recent bioinspired algorithms in terms of classification performance and training speed. In addition, our method is superior to four state-of-the-art MS identification approaches.

(This article belongs to the Special Issue Entropy-based Data Mining)

14 pages, 2997 KiB

Open Access Article

Multi-Graph Multi-Label Learning Based on Entropy

by Zixuan Zhu and Yuhai Zhao

Entropy 2018, 20(4), 245; https://doi.org/10.3390/e20040245 - 02 Apr 2018

Cited by 6 | Viewed by 4256

Abstract

Recently, Multi-Graph Learning was proposed as an extension of Multi-Instance Learning and has achieved some successes. However, to the best of our knowledge, there is currently no study on Multi-Graph Multi-Label Learning, where each object is represented as a bag containing a number of graphs and each bag is marked with multiple class labels. It is an interesting problem that arises in many applications, such as image classification and medicinal analysis. In this paper, we propose an innovative algorithm to address the problem. First, it uses more precise structures, multiple graphs, instead of instances to represent an image so that the classification accuracy can be improved. Then, it uses multiple labels as the output to eliminate the semantic ambiguity of the image. Furthermore, it calculates the entropy to mine the informative subgraphs instead of just mining the frequent subgraphs, which enables the selection of more accurate features for the classification. Lastly, since the current algorithms cannot directly deal with graph structures, we degenerate the Multi-Graph Multi-Label Learning into the Multi-Instance Multi-Label Learning in order to solve it by MIML-ELM (Improving Multi-Instance Multi-Label Learning by Extreme Learning Machine). The performance study shows that our algorithm outperforms the competitors in terms of both effectiveness and efficiency.

(This article belongs to the Special Issue Entropy-based Data Mining)

20 pages, 1789 KiB

Open Access Article

Deconstructing Cross-Entropy for Probabilistic Binary Classifiers

by Daniel Ramos, Javier Franco-Pedroso, Alicia Lozano-Diez and Joaquin Gonzalez-Rodriguez

Entropy 2018, 20(3), 208; https://doi.org/10.3390/e20030208 - 20 Mar 2018

Cited by 66 | Viewed by 9649

Abstract

In this work, we analyze the cross-entropy function, widely used in classifiers both as a performance measure and as an optimization objective. We contextualize cross-entropy in the light of Bayesian decision theory, the formal probabilistic framework for making decisions, and we thoroughly analyze its motivation, meaning and interpretation from an information-theoretical point of view. In this sense, this article presents several contributions: First, we explicitly analyze the contribution to cross-entropy of (i) prior knowledge; and (ii) the value of the features in the form of a likelihood ratio. Second, we introduce a decomposition of cross-entropy into two components: discrimination and calibration. This decomposition enables the measurement of different performance aspects of a classifier in a more precise way, and justifies previously reported strategies to obtain reliable probabilities by means of the calibration of the output of a discriminating classifier. Third, we give different information-theoretical interpretations of cross-entropy, which can be useful in different application scenarios, and which are related to the concept of reference probabilities. Fourth, we present an analysis tool, the Empirical Cross-Entropy (ECE) plot, a compact representation of cross-entropy and its aforementioned decomposition. We show the power of ECE plots, as compared to other classical performance representations, in two diverse experimental examples: a speaker verification system, and a forensic case where some glass findings are present.

(This article belongs to the Special Issue Entropy-based Data Mining)
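
To make the roles of the prior and the likelihood ratio in the abstract above concrete, the sketch below computes an empirical cross-entropy for a binary classifier. It assumes the common formulation in which posterior odds equal the likelihood ratio times the prior odds, and the cost is a prior-weighted average of base-2 logarithmic losses over the two classes; this may differ in detail from the article's definition. The function name and the example likelihood ratios are hypothetical.

```python
import math


def empirical_cross_entropy(lr_target, lr_nontarget, prior):
    """Prior-weighted average log loss, in bits, from likelihood ratios.

    lr_target:    likelihood ratios for trials where the target class is true
    lr_nontarget: likelihood ratios for trials where it is false
    prior:        assumed prior probability of the target class, in (0, 1)
    """
    odds = prior / (1.0 - prior)
    # -log2 P(target | x) with posterior odds = lr * odds
    target_cost = sum(
        math.log2(1.0 + 1.0 / (lr * odds)) for lr in lr_target
    ) / len(lr_target)
    # -log2 P(non-target | x)
    nontarget_cost = sum(
        math.log2(1.0 + lr * odds) for lr in lr_nontarget
    ) / len(lr_nontarget)
    return prior * target_cost + (1.0 - prior) * nontarget_cost


# An uninformative system (all likelihood ratios equal to 1) leaves only
# the uncertainty of the prior; a discriminating one reduces the cost.
uninformative = empirical_cross_entropy([1.0, 1.0], [1.0, 1.0], prior=0.5)
informative = empirical_cross_entropy([10.0, 20.0], [0.1, 0.05], prior=0.5)
print(uninformative, informative)
```

Evaluating the same function over a range of priors gives one curve of the kind an ECE plot displays.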

18 pages, 3944 KiB

Open Access Article

Applying Time-Dependent Attributes to Represent Demand in Road Mass Transit Systems

by Teresa Cristóbal, Gabino Padrón, Javier Lorenzo-Navarro, Alexis Quesada-Arencibia and Carmelo R. García

Entropy 2018, 20(2), 133; https://doi.org/10.3390/e20020133 - 20 Feb 2018

Cited by 2 | Viewed by 3637

Abstract

The development of efficient mass transit systems that provide quality of service is a major challenge for modern societies. To meet this challenge, it is essential to understand user demand. This article proposes using new time-dependent attributes to represent demand, attributes that differ from those that have traditionally been used in the design and planning of this type of transit system. Data mining was used to obtain these new attributes; they were created using clustering techniques, and their quality was evaluated with the Shannon entropy function and with neural networks. The methodology was applied to an intercity public transport company, and the results demonstrate that the attributes obtained offer a more precise understanding of demand and enable predictions to be made with acceptable precision.

(This article belongs to the Special Issue Entropy-based Data Mining)

19 pages, 1713 KiB

Open Access Article

Patent Keyword Extraction Algorithm Based on Distributed Representation for Patent Classification

by Jie Hu, Shaobo Li, Yong Yao, Liya Yu, Guanci Yang and Jianjun Hu

Entropy 2018, 20(2), 104; https://doi.org/10.3390/e20020104 - 02 Feb 2018

Cited by 73 | Viewed by 12522

Abstract

Many text mining tasks such as text retrieval, text summarization, and text comparisons depend on the extraction of representative keywords from the main text. Most existing keyword extraction algorithms are based on a discrete bag-of-words type of word representation of the text. In this paper, we propose a patent keyword extraction algorithm (PKEA) based on the distributed Skip-gram model for patent classification. We also develop a set of quantitative performance measures for keyword extraction evaluation based on information gain and cross-validation with Support Vector Machine (SVM) classification, which are valuable when human-annotated keywords are not available. We used a standard benchmark dataset and a homemade patent dataset to evaluate the performance of PKEA. Our patent dataset includes 2500 patents from five distinct technological fields related to autonomous cars (GPS systems, lidar systems, object recognition systems, radar systems, and vehicle control systems). We compared our method with Frequency, Term Frequency-Inverse Document Frequency (TF-IDF), TextRank and Rapid Automatic Keyword Extraction (RAKE). The experimental results show that our proposed algorithm provides a promising way to extract keywords from patent texts for patent classification.

(This article belongs to the Special Issue Entropy-based Data Mining)

15 pages, 3113 KiB

Open Access Article

Using Entropy in Web Usage Data Preprocessing

by Michal Munk and Lubomir Benko

Entropy 2018, 20(1), 67; https://doi.org/10.3390/e20010067 - 22 Jan 2018

Cited by 4 | Viewed by 4742

Abstract

This paper examines the use of entropy in the field of web usage mining. Entropy offers an alternative way of determining the ratio of auxiliary pages in session identification using the Reference Length method. The experiment was conducted on two different web portals. The first log file was obtained from a course on a virtual learning environment web portal. The second log file was obtained from a web portal with anonymous access. A comparison of the entropy-based estimate and the sitemap-based estimate of the ratio of auxiliary pages showed that, in the case of sitemap abundance, entropy could be a full-fledged substitute for the estimate of the ratio of auxiliary pages.

(This article belongs to the Special Issue Entropy-based Data Mining)

793 KiB

Open Access Article

Cross Entropy Method Based Hybridization of Dynamic Group Optimization Algorithm

by Rui Tang, Simon Fong, Nilanjan Dey, Raymond K. Wong and Sabah Mohammed

Entropy 2017, 19(10), 533; https://doi.org/10.3390/e19100533 - 09 Oct 2017

Cited by 11 | Viewed by 4210

Abstract

Recently, a new algorithm named dynamic group optimization (DGO) has been proposed, which lends itself strongly to exploration and exploitation. Although DGO has demonstrated its efficacy in comparison to other classical optimization algorithms, it has two computational drawbacks. The first is related to the two mutation operators of DGO, which may decrease the diversity of the population and so limit the search ability. The second is the homogeneity of the updated population information, which is selected only from the companions in the same group. This may result in premature convergence and degrade the mutation operators. To deal with these two problems, this paper proposes a new hybridized algorithm that combines the dynamic group optimization algorithm with the cross entropy method. The cross entropy method samples the problem space by generating candidate solutions from a distribution and then updates the distribution based on the better candidate solutions discovered. The cross entropy operator not only enlarges the promising search area, but also guarantees that the new solution takes all the surrounding useful information into consideration. The proposed algorithm is tested on 23 up-to-date benchmark functions; the experimental results verify that the proposed algorithm is more effective and efficient than other contemporary population-based swarming algorithms.

(This article belongs to the Special Issue Entropy-based Data Mining)
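
The sample-then-update loop of the cross entropy method described in the abstract above can be sketched as follows. This is the plain method with an independent Gaussian per dimension, not the DGO hybrid the article proposes; the parameter values, the function names and the toy objective are assumptions chosen for illustration.

```python
import random
import statistics


def cross_entropy_method(objective, dim, iterations=50, samples=100,
                         elite=10, seed=0):
    """Minimize `objective` by iteratively refitting a Gaussian to elites."""
    rng = random.Random(seed)
    mean = [0.0] * dim   # initial guess for the distribution centre
    std = [5.0] * dim    # initial spread, wide enough to cover the optimum
    for _ in range(iterations):
        # 1. Sample candidate solutions from the current distribution.
        population = [
            [rng.gauss(mean[d], std[d]) for d in range(dim)]
            for _ in range(samples)
        ]
        # 2. Keep the best candidates (lowest objective values).
        population.sort(key=objective)
        best = population[:elite]
        # 3. Refit the distribution to the elite set.
        mean = [statistics.fmean(ind[d] for ind in best) for d in range(dim)]
        std = [statistics.pstdev(ind[d] for ind in best) for d in range(dim)]
    return mean


def shifted_sphere(x):
    """Toy objective with its minimum at (3, 3, ...)."""
    return sum((xi - 3.0) ** 2 for xi in x)


solution = cross_entropy_method(shifted_sphere, dim=2)
print(solution)
```

Because the spread shrinks as the elites cluster, the plain method can converge prematurely on multimodal problems, which is one motivation for combining it with other search operators.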
