*Proceeding Paper* **Survey on Preprocessing Techniques for Big Data Projects †**

**Ignacio D. Lopez-Miguel**

Centro de Postgrado, Universidad Internacional Menéndez Pelayo, C/ Isaac Peral, 23, 28040 Madrid, Spain; lopezmiguelignacio@posgrado.uimp.es

† Presented at the 4th XoveTIC Conference, A Coruña, Spain, 7–8 October 2021.

**Abstract:** In the era of big data, vast amounts of data are being produced. This raises two main issues when trying to extract knowledge from these data: much of the information is irrelevant to the problem to be solved, and the data contain many imperfections and errors. Preprocessing is therefore a key step before applying any kind of learning algorithm. Two of the main preprocessing techniques are feature selection, which reduces the set of features to a relevant subset, and discretisation, which reduces the number of possible values of continuous variables. This paper reviews different methods for carrying out these two steps, focusing on the big data context and giving examples of projects where they have been applied.

**Keywords:** preprocessing; big data; feature selection; discretisation
