### **1. Introduction**

With the emergence of the "big data" phenomenon, massive amounts of data are generated daily. These data are normally available in a raw format and must be processed before any knowledge can be extracted from them. This step in the big data chain is usually referred to as preprocessing, and a wide range of techniques exists for it [1].

The main approaches to preprocessing big data are discretisation and feature selection. The former transforms continuous data into a limited set of values, while the latter aims to reduce the number of attributes [1,2].
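To make the two notions concrete, the following minimal Python sketch illustrates each on toy data. It is only an illustration under stated assumptions: equal-width binning and a variance threshold are chosen here as simple representatives of discretisation and feature selection respectively, and the function names, data and threshold are hypothetical rather than taken from the cited works.

```python
from statistics import pvariance


def equal_width_bins(values, n_bins):
    """Discretisation sketch: map each continuous value to a bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single bin is enough.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_by_variance(rows, threshold):
    """Feature selection sketch: keep the column indices whose population variance exceeds threshold."""
    columns = list(zip(*rows))
    return [i for i, col in enumerate(columns) if pvariance(col) > threshold]


# Hypothetical continuous attribute, reduced to three discrete values.
ages = [18.0, 22.5, 37.0, 41.2, 65.0]
age_bins = equal_width_bins(ages, 3)  # [0, 0, 1, 1, 2]

# Hypothetical dataset with three attributes; the first one is constant.
rows = [
    (1.0, 5.0, 0.2),
    (1.0, 7.0, 0.9),
    (1.0, 6.0, 0.4),
]
kept = select_by_variance(rows, 0.0)  # [1, 2]: the constant attribute is dropped
```

In this sketch the continuous attribute is replaced by three bin indices, and the attribute that carries no variation is removed, which reduces the number of attributes from three to two.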

The remainder of this paper is organised as follows. Section 2 introduces the preprocessing techniques, divided into feature selection and discretisation; for each of them, a classification is presented with examples for every category. Section 3 concludes the paper and suggests a future line of research.

### **2. Data Preprocessing**

This section presents different feature selection and discretisation techniques, drawing on big data projects in which they have been applied.
