1.3.1. Why BD?

Large volumes of data generally come in two forms: structured or unstructured. Structured data are generated by machines and humans according to a particular model or schema, and are usually stored in a database. They are organised around clearly defined data types. Typical examples include dates, times, numbers, and strings stored in the columns of a database. Unstructured data, on the other hand, do not follow any pre-defined model or schema. Examples include log files, mobile data, social media posts, text files, and other media [12].
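The distinction can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the table name `orders`, its columns, and the sample log line are hypothetical and do not come from the cited source. The structured record is queried directly through its schema, whereas the unstructured text must first be parsed with an ad hoc pattern.

```python
import re
import sqlite3

# Structured data: a table whose columns have declared types.
# The table and column names are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_date TEXT, order_time TEXT, quantity INTEGER, customer TEXT)"
)
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    ("2021-03-14", "09:26:53", 3, "alice"),
)

# Because the schema is known, the data can be queried by column.
structured_rows = conn.execute(
    "SELECT customer, quantity FROM orders WHERE quantity > 1"
).fetchall()

# Unstructured data: free text with no declared schema (hypothetical log line).
log_line = "Mar 14 09:26:53 alice ordered 3 items, payment OK"

# Any structure has to be recovered by the reader, here with a regular expression.
match = re.search(r"ordered (\d+) items", log_line)
extracted_quantity = int(match.group(1)) if match else None
```

The same quantity is available in both cases, but only the structured form exposes it without custom parsing logic.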

The amount of data generated by well-known corporations, small-scale industries, and scientific projects has been growing at an extraordinary rate. These high volumes of data present considerable processing, storage, and analytical challenges that must be carefully addressed. Furthermore, traditional relational database management systems (RDBMSs) and the associated data processing tools cannot deal effectively with such data, whose size is typically measured in terabytes or petabytes. Fortunately, BD paradigms and tools such as MapReduce and Hadoop are available to address these challenges.
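As a rough illustration of the MapReduce paradigm named above, the following Python sketch counts words in a small collection of documents. It runs in a single process and does not use Hadoop or any of its APIs; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only to mirror the usual map, shuffle, and reduce stages.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key, as a framework would between stages."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values of each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run the three stages in sequence over a list of documents."""
    pairs = []
    for document in documents:
        # In a real cluster, each document (or split) could be mapped on a different node.
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


counts = word_count(["big data tools", "big data challenges"])
```

Because each call to `map_phase` depends only on its own input, and each key is reduced independently, both stages can in principle be distributed across many machines, which is what makes the paradigm suitable for data too large for a single system.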

Certain features of BD are explained below.
