*3.3. The Data Analysis Module*

The Data Analysis module supports searching, filtering and analyzing the data stored in each data pond of the Hydria data lake. It offers a data manipulation and query mechanism that lets users formulate queries against the data ponds involving selection, projection, grouping and ordering operations through point-and-click interaction, without requiring any knowledge of SQL (Figure 2). The mechanism also provides in-context explanations of the different data analysis elements, as well as online help with examples; this functionality is mainly targeted at users with limited relevant experience.

Besides analyzing each data pond, the module offers data extraction and visualization functionality: data can be visualized as histograms, pie charts, (heat) maps, (stacked) bar/column charts, area/mekko/bubble charts and scatter plots, and exported to various file types (such as CSV/TSV and raw text). The data visualization component implements a three-step process in which the user *(i)* defines the type of chart to be exported, *(ii)* specifies the base dataset by selecting the data pond and the data pond field(s) used to create the chart, and *(iii)* optionally applies filtering conditions (restrictions) to the chosen dataset.


**Figure 2.** Filter and project records.
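To make the query mechanism and the three-step chart definition more concrete, the following Python sketch shows one way point-and-click choices could be compiled into a parameterized SQL query. It is not Hydria's implementation: it assumes, hypothetically, that each data pond can be queried as a SQL table (SQLite is used only for illustration), and the names `QuerySpec`, `Condition`, `ChartSpec`, `demo`, the `sensors` pond and its fields are invented for this example. Projection, selection, grouping and ordering map to `columns`, `conditions`, `group_by`/`aggregate` and `order_by`; for charts, step *(i)* corresponds to `chart_type`, step *(ii)* to `pond` and `columns`, and step *(iii)* to `conditions`.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple

# Hypothetical sketch: assumes each data pond is exposed as a SQL table.
# Only whitelisted operators/aggregates can be picked in the (assumed) UI.
ALLOWED_OPS = {"=", "!=", "<", "<=", ">", ">=", "LIKE"}
ALLOWED_AGGS = {"COUNT", "SUM", "AVG", "MIN", "MAX"}


def quote_ident(name: str) -> str:
    """Quote a table/column name so user-picked names cannot inject SQL."""
    return '"' + name.replace('"', '""') + '"'


@dataclass
class Condition:
    column: str
    op: str
    value: Any


@dataclass
class QuerySpec:
    pond: str                                                     # data pond (assumed SQL table)
    columns: List[str]                                            # projection
    conditions: List[Condition] = field(default_factory=list)     # selection, AND-ed together
    group_by: List[str] = field(default_factory=list)             # grouping
    aggregate: Optional[Tuple[str, str]] = None                   # e.g. ("COUNT", "*")
    order_by: List[Tuple[str, bool]] = field(default_factory=list)  # (column, descending)

    def to_sql(self) -> Tuple[str, List[Any]]:
        """Compile the point-and-click choices into SQL plus bound parameters."""
        select = [quote_ident(c) for c in self.columns]
        if self.aggregate is not None:
            func, col = self.aggregate
            if func not in ALLOWED_AGGS:
                raise ValueError(f"unsupported aggregate: {func}")
            arg = "*" if col == "*" else quote_ident(col)
            select.append(f"{func}({arg}) AS agg")
        sql = f"SELECT {', '.join(select)} FROM {quote_ident(self.pond)}"
        params: List[Any] = []
        if self.conditions:
            clauses = []
            for cond in self.conditions:
                if cond.op not in ALLOWED_OPS:
                    raise ValueError(f"unsupported operator: {cond.op}")
                clauses.append(f"{quote_ident(cond.column)} {cond.op} ?")
                params.append(cond.value)  # values are bound, never inlined
            sql += " WHERE " + " AND ".join(clauses)
        if self.group_by:
            sql += " GROUP BY " + ", ".join(quote_ident(c) for c in self.group_by)
        if self.order_by:
            sql += " ORDER BY " + ", ".join(
                f"{quote_ident(c)} {'DESC' if desc else 'ASC'}" for c, desc in self.order_by
            )
        return sql, params


@dataclass
class ChartSpec:
    chart_type: str   # step (i): e.g. "pie", "histogram"
    query: QuerySpec  # step (ii): pond + fields; step (iii): restrictions

    def data(self, conn: sqlite3.Connection) -> List[tuple]:
        """Fetch the rows that would feed the chart."""
        sql, params = self.query.to_sql()
        return conn.execute(sql, params).fetchall()


def demo() -> List[tuple]:
    """Count readings per region with temp > 10 in a toy, made-up pond."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sensors (region TEXT, temp REAL)")
    conn.executemany(
        "INSERT INTO sensors VALUES (?, ?)",
        [("north", 12.0), ("north", 15.5), ("south", 21.0), ("south", 9.0)],
    )
    chart = ChartSpec(
        chart_type="pie",
        query=QuerySpec(
            pond="sensors",
            columns=["region"],
            conditions=[Condition("temp", ">", 10)],
            group_by=["region"],
            aggregate=("COUNT", "*"),
            order_by=[("region", False)],
        ),
    )
    return chart.data(conn)


if __name__ == "__main__":
    print(demo())  # [('north', 2), ('south', 1)]
```

In this sketch, values chosen in the interface are passed as bound parameters, table and column names are quoted, and operators and aggregate functions are restricted to whitelists, so that users who never write SQL also cannot inject it through their selections.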
