*2.2. El Niño–Southern Oscillation (ENSO)*

El Niño–southern oscillation (ENSO) is a periodic change in the oceanic atmosphere system in the tropical Pacific Ocean that affects climate around the world. It occurs every three to seven years (average five years) and typically lasts nine months to two years, associated with floods, droughts, and other global disturbances. During normal or non El Niño conditions, trade winds blow west across the Pacific Ocean. The western part of the equatorial Pacific is characterized by warm, wet, and low-pressure weather conditions due to the accumulation of moisture in the form of typhoons and thunderstorms.

During the ENSO event, there was an increase in air pressure across the Indian Ocean, Indonesia, and Australia, and a decrease in air pressure over Tahiti and the rest of the central and eastern Pacific Ocean. The trade winds in the South Pacific weaken or head east, and warm water spreads eastward from the western Pacific and Indian Ocean to the eastern Pacific. This has led to widespread droughts in the western Pacific and dry eastern Pacific rainfall. While El Niño is characterized by unusually warm ocean temperatures in the central to eastern Pacific Ocean, La Niña is characterized by unusually cold ocean temperatures in the region, but warmer waters in the western Pacific Ocean, as shown in Figure 3. However, as El Niño conditions lasted for several months, more global warming occurred in the oceans.

**Figure 3.** Model of surface temperature, wind, area of rising air and thermoline (blue surface) in the tropical Pacific during El Niño, Normal, and La Niña (https://reefresilience.org/th/stressors/ climate-and-ocean-change/el-nino-southern-oscillation/, accessed on 24 March 2022).

In this study, the Oceanic Niño Index (ONI) from the National Oceanic and Atmospheric Administration (2020) was used to identify the El Niño–southern oscillation. The ONI is the 3-month running mean of the sea surface temperature anomaly in the Niño 3.4 region (5◦ N–5◦ S, 120◦–170◦ W). The ONI index exceeding +0.5 ◦C or −0.5 ◦C for at least five consecutive months was considered as a full-fledged El Niño (E) or La Niña (L). According to Null report, the three latest very strong El Niño events (ONI ≥ 2 ◦C) in 1982, 1997, and 2015 and three latest strong La Niña events (−1.5 to −1.9 ◦C) in 1999, 2007, and 2011 were selected to study the climate variations [19].

#### *2.3. Conformal Cubic Atmospheric Model (CCAM)*

The CCAM is a dynamic global climate model developed by the Commonwealth Scientific and Industrial Research Organization (CSIRO), Division of Atmospheric Research, Australia. It is used to forecast global climate through dynamic scale reduction by generating a grid covering the region's forecast area [20]. The model has also been developed by adding physical parameterization schemes that include longwave radiation, shortwave radiation, aerosol, cumulus convection, cloud distribution, soil temperature, etc., to reduce the climate forecast error. The CCAM dataset was downscaled to 10 km grid resolution, which is sufficient for the analysis of both spatial and temporal forecasts at the regional level [21,22]. Data were changed from grid data to station format, which covers 124 meteorological measurement stations across Thailand (Figure 1).

Climate variables, focusing mainly on agricultural-related variables for cluster analysis, were used in this study. They consist of a total of 7 variables: rainfall (mm/day), relative humidity (percent), average temperature (degrees Celsius), maximum temperature (degree Celsius), minimum temperature (degrees Celsius), solar radiation (watts/square meter), and wind speed (m/s). Monthly data of those variables were collected for the years 1987, 1999, 2000, 2004, 2011, and 2015, of which the ENSO phenomenon occurred.

## *2.4. Multivariate Panel Data*

Panel data is the combination of cross-sectional data and time-series data representing behavioral units over the time *xij*(*t*) . Data were collected from cross-section data, which collects the value of the variables in each unit at a given point in time. Then, the data were repeatedly collected from the same unit at a subsequent time, either yearly, quarterly, monthly, weekly, daily, or hourly. If each panel unit is observed at the same time point, a data set is called balanced panel data. Consequently, if a balanced panel contains *n* panel units and *T* periods, the number of observations in the dataset is necessarily *N* = *n* × *T*. However, if at least one panel unit is not observed every period, a data set is called unbalanced panel data. Therefore, the number of observations in the unbalanced panel dataset is *N* < *n* × *T*.

Multivariate panel data has a very complex structure and cannot be represented by a simple two-dimensional table. Table 1 shows the multivariate combination of data in a twodimensional table format, where *n* represents the number of samples collected, *p* represents the number of variables (*x*1, *x*2, ... , *xp*), *T* represents the length of time and represents the

data value of the *i*th sample and *j*th variable at time *t*, where *i* ∈ [1, *n*]; *j* ∈ [1, *p*]; *t* ∈ [1, *T*]. Descriptive statistics, such as mean and variance of *j*th variable, is calculated as Equations (1) and (2), respectively [15–18].

$$\overline{\mathbf{x}}\_{\dot{j}} = \frac{1}{T} \frac{1}{n} \sum\_{t=1}^{T} \sum\_{i=1}^{n} x\_{\dot{i}\dot{j}}(t),\tag{1}$$

$$s\_j^2 = \frac{1}{T} \sum\_{t=1}^T \frac{1}{n-1} \sum\_{i=1}^n \left[ \mathbf{x}\_{i\bar{j}}(t) - \overline{\mathbf{x}}\_{\bar{j}}(t) \right]^2,\tag{2}$$


**Table 1.** Multivariate panel data format for spatail cluster.

The values for monthly climate variables were organized in two configuration matrices. Matrix *N* × *p* had monthly data (*T*) for stations (*n*) in its rows (*N* = *n* × *T*) and the variables (*p*) in the columns. It was used to identify clusters of similar stations. Furthermore, monthly climate variables within these clusters (*Nc*) were analyzed to discover seasonality within the spatial cluster. For the second step, monthly climate variables were arranged in *T* × *Nc* rows, and the variables (*p*) were set up in columns (Table 2).

**Table 2.** Multivariate panel data format for temporal cluster.


2.4.1. Multivariate Cluster Analysis

Cluster analysis is an unsupervised learning technique to identify groups with similar characteristics in the same group [23]. Agglomerative hierarchical clustering was used in this research. The bottom-up hierarchical algorithm treats each sample as a single cluster and then combines pair of clusters that are most similar until every cluster is grouped into one single cluster. In the case of general cross-section data, block distance, Euclidean distance, Minkowski distance, Chebychev distance, or Mahalanobis distance are used to measure the distance between two vectors *x <sup>i</sup>* <sup>=</sup> *xi*1, *xi*2,..., *xip* and *x <sup>j</sup>* <sup>=</sup>

*xj*1, *xj*2,..., *xjp* ).

Cluster analysis of samples collected from multivariate panel data is often averaged over time data into general cross-section data. Typical Euclidean distance is then calculated for further grouping. However, this will result in information loss because the mean shows the average change in the data but does not show the distributing characteristics of the data, such as the standard deviation. Therefore, in this study, a Euclidean distance timed and spaced (*drk*) is used to calculate the distance between sample *r* and sample *k* [15–18], as in Equation (3).

$$d\_{rk} = \sqrt{\sum\_{t=1}^{T} \sum\_{j=1}^{p} \left(\mathbf{x}\_{rj}(t) - \mathbf{x}\_{kj}(t)\right)^2},\tag{3}$$

The distance should satisfy some conditions as follows:


A distance matrix for spatial grouping analysis contains a distance value between every pair of samples as in Equation (4a), which is the symmetric matrix (*n* × *n*) with all diagonal values of zero. At the same time, a distance matrix for temporal grouping analysis within the spatial cluster contains a distance value between every pair of months as in Equation (4b), which is the asymmetric matrix (12 × 12) with all diagonal values of zero.

$$
\begin{bmatrix}
0\\d\_{21}&0\\d\_{31}&d\_{32}&0\\\vdots&\vdots&\vdots&\ddots\\d\_{n1}&d\_{n2}&\cdots&d\_{n(n-1)}&0
\end{bmatrix},\tag{4a}
$$

$$
\begin{bmatrix}
0\\d\_{21}&0\\d\_{31}&d\_{32}&0\\\vdots&\vdots&\vdots&\ddots\\d\_{121}&d\_{122}&\cdots&d\_{12(11)}&0
\end{bmatrix},\tag{4b}
$$

Average linkage, which is the unweighted pair group method using arithmetic averages (UPGMA), was used to average the distance values between pairs of clusters [24]. It is widely used because it compromises the extreme cases [25].

The multivariate cluster analysis used in this paper was implemented directly using the "philanthropy", "cluster", "factoextra" and "FactoMineR" package in R programming language and RStudio [26].

#### 2.4.2. Cluster Validation

This paper employed silhouette width (*Si*) [27] to determine the optimal number of clusters, and it also could be used to validate consistency within clusters of data. The silhouette measures the similarity of *i*-th observation to its own cluster and the similarity of observation to other clusters as Equation (5).

$$S\_{\bar{i}} = \frac{b\_{\bar{i}} - a\_{\bar{i}}}{\max(b\_{\bar{i}\prime}, a\_{\bar{i}})} \tag{5}$$

where *ai* is the average distance between *i* and all other observations in the same cluster, and *bi* is the average distance between *i* and the observations in the "nearest neighboring cluster" as Equation (6).

$$b\_i = \min\_{\mathbb{C}\_k, \in \mathcal{C}\_r \backslash \mathcal{C}(i)} \sum\_{j \in \mathcal{C}\_k} \frac{d(i, j)}{n(\mathbb{C}\_k)} \tag{6}$$

where *C*(*i*) is the cluster containing observation *i*, *d*(*i*, *j*) is the Euclidean distance timed and spaced between observations *i* and *j*, and *n*(*C*) is the cardinality of cluster *C*.

*Si* ranges from −1 to +1, where a high value indicates that the observation is well matched to its own cluster, while a low or negative value indicates that observation is poorly matched to its own cluster. The average of observation's silhouette in a cluster was obtained to determine whether the clustering configuration is appropriate. The advantage of using silhouette only depends on the actual partition of the observations, not on the clustering algorithm that was used, and no need to access the original data. This paper implemented this function using the silhouette function in package cluster [28].

#### **3. Results**

This section may be divided into subheadings. It should provide a concise and precise description of the experimental results, their interpretation, as well as the experimental conclusions that can be drawn.

#### *3.1. Variable Characteristics*

Figure 4 shows boxplots of seven variables; they are varied by month but have the same pattern each year.

Rainfalls were more varied than others in 1997 and 2007 for the El Niño and La Niña phenomenon, respectively. The average rainfall in La Niña phenomenon was higher than that in El Niño phenomenon and the normal average, except for 1999, which was affected by the 1997–1998 very strong El Niño. Furthermore, all factors in each year had a pattern in relation to the season. For example, rainfall was very high and more fluctuated from August to September. It can be concluded that climate factors were different from month to month and year to year. Obviously, the rainfall between El Niño and La Niña differed significantly, while other climate factors were similar. This suggested the rainfall should be more focused to analyse the impact of the ENSO phenomenon on spatial clustering.

**Figure 4.** *Cont*.

**Figure 4.** *Cont*.

**Figure 4.** Boxplots of climate factors of 124 Thai meteorological stations by ENSO, years and months: (**a**) El Niño; (**b**) La Niña.

#### *3.2. Spatial Clustering*

The average silhouette width was used to determine a suitable number of clusters (*k*). It suggested the value 4 or 5 for *k*, due to their maximum width (Figure 5). So, a fair comparison between the ENSO events was achieved for choosing five spatial clusters (SCs) close to height 12.5 (distance between clusters) for all datasets in this study.

**Figure 5.** The average silhouette width for spatial cluster analysis by number of clusters and years.

Five spatial clusters, SC1, SC2, SC3, SC4, and SC5, which were sorted according to the amount of precipitation from ascending to high, were formed and displayed on a spatial map in Figure 6. It was obvious that precipitation was the only meteorological data to noticeably differ between clusters. Spatial clustering in El Niño events was mostly grouped in SC2 (yellow) with 62–66 members except in 1982, which mostly in SC1 (red) with 59 members; however, its average rainfalls were nearly the same to SC2, whereas spatial clustering in La Niña events was mostly grouped in SC1 (red) with 61–83 members. While SC5 (pink) was the least populated member with one member, which was the station in the east for both events (Table 3). These showed most areas in Thailand had low precipitation rate.

**Figure 6.** Spatial cluster analysis (SC1–SC5) for the 124 Thai meteorological stations on a map by ENSO events and years (*x* is monthly rainfall average).



 SCs.

*n*—number of members, C—Central, E—East, N—North,

NE—Northeast,

 S—South, W—West.

In La Niña event, SC1 (red) was found mostly in Northeast and Central areas, which had the least amount of rainfall, and SC2 (yellow) was widely distributed in the North, which had low rainfall. SC3 (green) with moderate rainfall were distributed among all regions, except North and Northeast, while SC4 (blue) are in the south which had quite a lot of rainfall. Lastly, SC5 (pink) with the highest rainfall had one station in the East (Table 3).

While, spatial clustering in El Niño was differently distributed by years. In 1997 and 2015, SC1 (red) was found mostly in the North and SC2 (yellow) was widely distributed in the Northeast, and vice versa in 1982. In 1982 and 1997, SC3 (green) with moderate rainfalls were distributed among all regions, except North and Northeast, and SC4 (blue) were in the South which had quite a lot of rainfall, and vice versa in 2015. In every year, SC5 (pink) with the highest rainfall had one station in the East (Table 3).

The spatial clistering extracted the drought areas in the North region, classified as SC1 with less rainfall than SC2 and in the South region classified as SC3 with less rainfall than SC4 for the El Niño event. These areas would be at risk to be the most drought-prone areas. This suggested the effect of ENSO on spatial clustering.

The distribution of SGs over six regions, showing a clear trend in the redistribution of SGs observed in this study, is shown in Table 4. More diverse climate was found in the East and West than other regions. All regions had a heterogenous meteorological distribution. Every year for both El Niño and La Niña events had 2–4 SCs. However, less distribution for El Niño (2015) in Central, East, and West regions and for La Niña (1999) in the West, and more distribution for La Niña (1999) in the South were noted. These would be due to changes in TGs and intensity of climate factors.

**Table 4.** The distribution of SGs over six regions for ENSO.

