Feature Selection for High-Dimensional Data

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (31 October 2017) | Viewed by 14908

Special Issue Editors


Dr. Verónica Bolón Canedo
Guest Editor
Grupo LIDIA, Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: machine learning; pattern recognition; feature selection; medical applications

Dr. Noelia Sánchez-Maroño
Guest Editor
Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: artificial intelligence; machine learning; pattern recognition; feature selection

Dr. Amparo Alonso-Betanzos
Guest Editor
Departamento de Computación, Facultad de Informática, Universidade da Coruña, 15071 A Coruña, Spain
Interests: computer science; artificial intelligence; machine learning; feature selection; scalability issues in machine learning

Special Issue Information

Dear Colleagues,

Feature selection has become one of the most active research areas of the last few years, driven by the appearance of datasets containing hundreds of thousands of features. It is a valuable tool for better modelling the underlying process of data generation, as well as for reducing the cost of acquiring features. Furthermore, from the machine learning perspective, because feature selection reduces the dimensionality of a problem, it can maintain or even improve an algorithm's performance while lowering computational costs.

The advent of Big Data has brought unprecedented challenges to machine learning researchers, who must now deal with huge volumes of data, in terms of both instances and features, making the learning task more complex and computationally demanding than ever. In particular, when the number of features is extremely large, learning algorithms can degrade due to overfitting, learned models become more complex and therefore less interpretable, and the speed and efficiency of the algorithms decline as the data grow. A vast body of feature selection methods exists in the literature, including filters based on different metrics (e.g., entropy, probability distributions, or information theory) as well as embedded and wrapper methods built around different induction algorithms. However, some of the most widely used algorithms were developed when datasets were much smaller and do not scale well, so these successful algorithms need to be readapted to handle Big Data problems.
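To make the filter approach mentioned above concrete, the following is a minimal sketch (in Python with scikit-learn; the synthetic dataset and the choice of k = 50 are purely illustrative assumptions on our part, not requirements of this call): rank features by mutual information with the class label and keep the top k.

```python
# Minimal filter-style feature selection sketch: rank features by an
# information-theoretic score and keep the k best. Dataset and k are
# illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic "wide" data: 200 samples, 5,000 features, few of them informative.
X, y = make_classification(n_samples=200, n_features=5000,
                           n_informative=20, random_state=0)

# Score every feature by mutual information with y, then keep the top 50.
selector = SelectKBest(score_func=mutual_info_classif, k=50)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)  # (200, 50)
```

Wrapper and embedded methods follow the same reduction goal but evaluate candidate subsets with an induction algorithm, which is typically far more expensive at this scale.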

In this Special Issue, we invite investigators to contribute with their recent developments in feature selection methods for high-dimensional settings, as well as review articles that will stimulate the continuing efforts to understand the problems usually encountered in this field.

Topics of interest include, but are not limited to:

  • New feature selection methods
  • Ensemble methods for feature selection
  • Feature selection for microarray data
  • Parallelization of feature selection methods
  • Missing data in the context of feature selection
  • Feature selection applications

Dr. Verónica Bolón Canedo
Dr. Noelia Sánchez-Maroño
Dr. Amparo Alonso-Betanzos
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and then using the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Feature selection
  • Ensemble feature selection
  • Filters
  • Wrappers
  • Embedded methods

Published Papers (3 papers)


Research

Article
Gene Selection for Microarray Cancer Data Classification by a Novel Rule-Based Algorithm
by Adrian Pino Angulo
Information 2018, 9(1), 6; https://doi.org/10.3390/info9010006 - 2 Jan 2018
Cited by 25 | Viewed by 4568
Abstract
Because the number of genes vastly exceeds the number of samples, microarray data analysis is considered an extremely difficult task in sample classification. Feature selection mitigates this problem by removing irrelevant and redundant genes from the data. In this paper, we propose a new methodology for feature selection that aims to detect relevant, non-redundant, and interacting genes by analysing the feature value space instead of the feature space. Following this methodology, we also propose a new feature selection algorithm, namely Pavicd (Probabilistic Attribute-Value for Class Distinction). Experiments on fourteen microarray cancer datasets reveal that Pavicd obtains the best performance in terms of running time and classification accuracy when using Ripper-k and C4.5 as classifiers. When using an SVM (Support Vector Machine), the Gbc (Genetic Bee Colony) wrapper algorithm achieves the best results; however, Pavicd is significantly faster.
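The abstract's key idea is scoring in the feature value space rather than the feature space. The sketch below is a loose, hypothetical illustration of that general idea only — it is not the published Pavicd criterion, and the scoring rule, function names, and toy data are all our own assumptions: each (attribute, value) pair is scored by how far its class-conditional distribution deviates from the class prior.

```python
import numpy as np

def attribute_value_scores(X, y):
    """Hypothetical illustration (NOT the published Pavicd criterion): score
    each (feature, value) pair by how far P(class | feature == value)
    deviates from the class prior P(class), weighted by the value's frequency."""
    classes, counts = np.unique(y, return_counts=True)
    prior = counts / len(y)
    scores = {}
    for j in range(X.shape[1]):
        for v in np.unique(X[:, j]):
            mask = X[:, j] == v
            cond = np.array([np.mean(y[mask] == c) for c in classes])
            # L1 distance between conditional and prior class distributions.
            scores[(j, v)] = mask.mean() * np.abs(cond - prior).sum()
    return scores

# Toy discretised data: three features, the last one informative about the class.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)
X = np.column_stack([rng.integers(0, 3, size=100),
                     rng.integers(0, 3, size=100),
                     y + rng.integers(0, 2, size=100)])
scores = attribute_value_scores(X, y)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # an (attribute, value) pair of feature 2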

Article
sCwc/sLcc: Highly Scalable Feature Selection Algorithms
by Kilho Shin, Tetsuji Kuboyama, Takako Hashimoto and Dave Shepard
Information 2017, 8(4), 159; https://doi.org/10.3390/info8040159 - 6 Dec 2017
Cited by 8 | Viewed by 4873
Abstract
Feature selection is a useful tool for identifying which features, or attributes, of a dataset cause or explain the phenomena that the dataset describes, and for improving the efficiency and accuracy of learning algorithms that discover such phenomena. Consequently, feature selection has been studied intensively in machine learning research. However, while feature selection algorithms with excellent accuracy have been developed, they are seldom used to analyse high-dimensional data, because such data usually include too many instances and features, which makes traditional feature selection algorithms inefficient. To eliminate this limitation, we improved the run-time performance of two of the most accurate feature selection algorithms in the literature. The result is two accurate and fast algorithms, namely sCwc and sLcc. Multiple experiments with real social media datasets demonstrate that our algorithms improve remarkably on the originals. For example, on one dataset with 15,568 instances and 15,741 features and another with 200,569 instances and 99,672 features, sCwc performed feature selection in 1.4 seconds and 405 seconds, respectively; it is estimated that the original algorithms would need several hours to dozens of days to process the same datasets. In addition, sLcc turned out to be as fast as sCwc on average. We also introduce a fast implementation of our algorithms: sCwc does not require any tuning parameter, while sLcc takes a threshold parameter that can be used to control the number of features the algorithm selects.
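Cwc and Lcc are consistency-based selectors, and the paper's contribution is the speed-up, which the sketch below does not reproduce. As a hedged illustration of consistency-constrained selection in general (all function names and the toy data are our own assumptions), a naive backward elimination drops features while a consistency measure stays above a threshold; the threshold is analogous in spirit to the sLcc parameter the abstract mentions.

```python
import numpy as np
from collections import Counter, defaultdict

def consistency(X, y, features):
    """Fraction of instances that agree with the majority class of their
    feature-value pattern (1.0 means the subset fully determines the class)."""
    groups = defaultdict(list)
    for row, label in zip(X[:, features], y):
        groups[tuple(row)].append(label)
    agree = sum(Counter(labels).most_common(1)[0][1] for labels in groups.values())
    return agree / len(y)

def backward_select(X, y, threshold=1.0):
    """Greedily drop features while consistency stays at or above the
    threshold. Lowering the threshold trades consistency for a smaller
    subset -- a naive sketch, not the published sCwc/sLcc algorithms."""
    selected = list(range(X.shape[1]))
    for f in reversed(range(X.shape[1])):
        trial = [g for g in selected if g != f]
        if trial and consistency(X, y, trial) >= threshold:
            selected = trial
    return selected

# Toy data: the class is the XOR of the first two binary features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6))
y = X[:, 0] ^ X[:, 1]
print(backward_select(X, y))  # expected to keep features 0 and 1
```

The naive version re-evaluates consistency from scratch at every step; the scalability results reported in the paper come precisely from avoiding that kind of redundant work.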

Article
Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection
by Mohammad Bagher Dowlatshahi, Vali Derhami and Hossein Nezamabadi-pour
Information 2017, 8(4), 152; https://doi.org/10.3390/info8040152 - 22 Nov 2017
Cited by 33 | Viewed by 4859
Abstract
The main purpose of feature subset selection is to remove irrelevant and redundant features from data so that learning algorithms can be trained on a subset of relevant features. Many algorithms have been developed for feature subset selection, and most suffer from two major problems on high-dimensional datasets: first, some search a high-dimensional feature space without any domain knowledge about feature importance; second, most were originally designed for continuous optimization problems, whereas feature selection is a binary optimization problem. To overcome these weaknesses, we propose a novel hybrid filter-wrapper algorithm, called Ensemble of Filter-based Rankers to guide an Epsilon-greedy Swarm Optimizer (EFR-ESO), for high-dimensional feature subset selection. The Epsilon-greedy Swarm Optimizer (ESO) is a new binary swarm intelligence algorithm, introduced in this paper as the wrapper. In the proposed EFR-ESO, we extract knowledge about feature importance with an ensemble of filter-based rankers and then use this knowledge to weight the feature probabilities in the ESO. Experiments on 14 high-dimensional datasets indicate that the proposed algorithm performs excellently in terms of both classification error rate and the number of selected features.
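As a hedged illustration of the general mechanism the abstract describes — not the published EFR-ESO, whose swarm dynamics are more involved — the sketch below aggregates several filter rankings into per-feature weights and then assembles a subset epsilon-greedily, mostly exploiting high-weight features and occasionally exploring. The Borda-style aggregation and all names are our own assumptions.

```python
import numpy as np

def epsilon_greedy_subset(filter_rankings, n_select, epsilon=0.1, rng=None):
    """Hypothetical sketch (not the published EFR-ESO): combine filter
    rankings into per-feature weights, then pick a subset epsilon-greedily."""
    rng = rng or np.random.default_rng()
    n_features = len(filter_rankings[0])
    # Borda-style aggregation: rank 0 is best, so invert ranks into weights.
    weights = np.zeros(n_features)
    for ranking in filter_rankings:
        weights += n_features - np.asarray(ranking, dtype=float)
    order = list(np.argsort(-weights))  # best-weighted features first
    chosen, pool = [], set(range(n_features))
    for _ in range(n_select):
        if rng.random() < epsilon:                       # explore
            pick = int(rng.choice(sorted(pool)))
        else:                                            # exploit
            pick = int(next(f for f in order if f in pool))
        chosen.append(pick)
        pool.remove(pick)
    return chosen

# Two hypothetical filter rankings: ranking[f] is the rank of feature f (0 = best).
r1 = [0, 1, 2, 3, 4]
r2 = [1, 0, 3, 2, 4]
print(epsilon_greedy_subset([r1, r2], n_select=3, epsilon=0.2,
                            rng=np.random.default_rng(0)))
```

In a full wrapper, each subset produced this way would be scored by training a classifier, with the resulting fitness feeding back into the swarm's search.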
