*4.1. Data Collection and Presentation*

As illustrated in the first part of Figure 5, the power generation data extracted from the polycrystalline PV systems installed at KKU are associated with four primary data sources measured over the same period. Weather station (WS) sensors located near the station measured ambient temperature (Ta), relative humidity (RH), wind speed (W), wind direction (WD), solar irradiation (SR), and precipitation (R), although solar irradiance was found to be measured more accurately by the Py sensor. Parameters computed from the WS and Py readings were also considered. The remaining sources were the solar PV system inverters (N) and the panel sensors (PVSR). The four sources were used together in our experiment. The data cover December 2019 to February 2020, spanning the autumn and winter seasons. During this period, data were acquired and tabulated from sunrise to sunset at five-minute intervals for low, high, and average temperature, humidity, wind speed, and solar radiation. This made it possible to distinguish cloudy, clear-sky, and mixed days. In total, about 5000 samples were collected, with data types including integer, float, and object. A statistical summary of the generated power is presented in Table 6.
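The sunrise-to-sunset sampling at five-minute intervals can be sketched as follows. This is only an illustration: the function name `five_minute_grid` and the sunrise and sunset times are hypothetical and are not taken from the KKU logs.

```python
from datetime import datetime, timedelta

def five_minute_grid(sunrise, sunset):
    """Return the timestamps from sunrise to sunset (inclusive) at 5-minute steps."""
    step = timedelta(minutes=5)
    stamps = []
    t = sunrise
    while t <= sunset:
        stamps.append(t)
        t += step
    return stamps

# Hypothetical daylight window for one day (assumed values, not measured ones).
sunrise = datetime(2019, 12, 1, 6, 30)
sunset = datetime(2019, 12, 1, 17, 30)
grid = five_minute_grid(sunrise, sunset)
samples_per_day = len(grid)
```

Under these assumed times, each day contributes one sample per grid timestamp; the actual count per day depends on the real daylight window and on any gaps in the sensor logs.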

**Figure 5.** Block Diagram of the System.



The collected dataset thus represents the sensor readings. Let **A** = {**a1**, **a2**, **a3**, ... , **an**} be the *n*-by-*m* dataset matrix, where *n* = 5402 is the number of observations collected from each sensor, the vector **ai** is the *i*th observation with *m* = 42 attributes, and the generated power **p** is the target of these features.
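As a minimal sketch of this notation, the snippet below splits a list of sensor records into a feature matrix **A** (one row per observation) and a target vector **p**. The helper `split_features_target`, the key name `power`, and the toy values are hypothetical and serve only to illustrate the layout; the actual dataset has *n* = 5402 rows and *m* = 42 attributes.

```python
def split_features_target(records, target_key="power"):
    """Split records into feature names, an n-by-m matrix A, and a target vector p."""
    if not records:
        return [], [], []
    # Fix the column order from the first record so every row of A lines up.
    feature_names = [key for key in records[0] if key != target_key]
    A = [[rec[name] for name in feature_names] for rec in records]
    p = [rec[target_key] for rec in records]
    return feature_names, A, p

# Toy records with made-up values and only three attributes (Ta, RH, SR).
records = [
    {"Ta": 18.2, "RH": 41.0, "SR": 512.0, "power": 3.1},
    {"Ta": 19.0, "RH": 39.5, "SR": 640.0, "power": 3.9},
]
names, A, p = split_features_target(records)
n, m = len(A), len(A[0])  # number of observations, number of attributes
```

Here each row of `A` plays the role of one observation **ai**, and `p[i]` is the generated power paired with it.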
