Feature Selection in Big Image Datasets
Abstract
1. Introduction
2. Materials and Methods
- Feature extraction: Image feature extraction was performed to transform the image datasets into columnar feature datasets. The techniques applied were: bag-of-features methods based on feature detection algorithms such as SIFT [3], SURF [4] and KAZE [5]; local binary pattern (LBP) methods [6]; and convolutional neural networks (ConvNets) used as feature extractors, with architectures such as VGG, ResNet and DenseNet.
- Feature selection: Feature selection comprises a broad family of dimensionality reduction techniques that remove irrelevant and redundant features while keeping the relevant original ones. In particular, filter methods select a subset of the original feature set independently of the induction model used. They are therefore better suited to a big data scenario because of their advantages in computational cost [7]. Within this framework, the feature selection stage was carried out on the big data platform Apache Spark using several implementations of filter methods: Spark’s MLlib [8] implementation of the filter selector [9]; a Spark implementation of the Relief-F method [10]; and the implementation for Spark [12] of the ITFS framework [11].
- Classification: Not every classifier available in Spark MLlib supports multi-class problems. The suitable models in Spark for this problem are Decision Trees, Random Forests, Naive Bayes and Multilayer Perceptron classifiers. Given the results obtained in the experiments, the last two classifiers (Naive Bayes and Multilayer Perceptron) were used for the results presented in this manuscript.
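To make the idea of a filter method concrete, the following is a minimal sketch of a ranking filter that scores each discrete feature by its mutual information with the class label and keeps the top k, without consulting any classifier. It is not the Spark MLlib, Relief-F or ITFS implementation used in this work; the function names `mutual_information` and `select_top_k` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from two equal-length sequences of discrete values."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def select_top_k(rows, labels, k):
    """Rank feature columns by mutual information with the label.

    Returns the indices of the k highest-scoring features (best first)
    together with the score of every feature. No classifier is involved,
    which is what makes this a filter method.
    """
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Toy dataset (hypothetical): feature 0 copies the label,
# features 1 and 2 are statistically independent of it.
rows = [
    (0, 1, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 0, 1),
]
labels = [0, 0, 1, 1]

selected, scores = select_top_k(rows, labels, k=1)
print(selected, scores)
```

On the toy data, the feature that copies the label is ranked first and the two independent features receive a score of zero. Continuous image descriptors would need to be discretized before a count-based estimate like this one applies, and a distributed implementation would compute the counts in parallel across partitions rather than in a single Python loop.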
3. Results
4. Discussion and Conclusions
Funding
Acknowledgments
Conflicts of Interest
References
- Bellman, R.E. Dynamic Programming; Dover Publications, Inc.: New York, NY, USA, 2003.
- Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L.A. Feature Extraction: Foundations and Applications; Springer: Berlin/Heidelberg, Germany, 2008; Volume 207.
- Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
- Bay, H.; Ess, A.; Tuytelaars, T.; Van Gool, L. SURF: Speeded up robust features. Comput. Vis. Image Underst. 2008, 110, 346–359.
- Alcantarilla, P.F.; Bartoli, A.; Davison, A.J. KAZE features. In Proceedings of the European Conference on Computer Vision, Firenze, Italy, 7–13 October 2012; pp. 214–227.
- Ojala, T.; Pietikainen, M.; Harwood, D. A Comparative Study of Texture Measures with Classification Based on Feature Distributions. Pattern Recognit. 1996, 29, 51–59.
- Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Feature selection for high-dimensional data. Prog. Artif. Intell. 2016, 5, 65–75.
- Meng, X.; Bradley, J.; Yavuz, B.; Sparks, E.; Venkataraman, S.; Liu, D.; Freeman, J.; Tsai, D.; Amde, M.; Owen, S.; et al. MLlib: Machine learning in Apache Spark. J. Mach. Learn. Res. 2016, 17, 1235–1241.
- Barnard, G. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 1–10.
- Ramirez, S. RELIEF-F Feature Selection for Apache Spark. Available online: https://github.com/sramirez/spark-RELIEFFC-fselection (accessed on 6 May 2019).
- Brown, G.; Pocock, A.; Zhao, M.J.; Luján, M. Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection. J. Mach. Learn. Res. 2012, 13, 27–66.
- Ramírez-Gallego, S.; Mouriño-Talín, H.; Martínez-Rego, D.; Bolón-Canedo, V.; Benítez, J.M.; Alonso-Betanzos, A.; Herrera, F. An Information Theory-Based Feature Selection Framework for Big Data Under Apache Spark. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1441–1453.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Figueira-Domínguez, J.G.; Bolón-Canedo, V.; Remeseiro, B. Feature Selection in Big Image Datasets. Proceedings 2020, 54, 40. https://doi.org/10.3390/proceedings2020054040