*Article* **On the Spatio-Temporal Characteristics of Aerosol Optical Depth in the Arabian Gulf Zone**

**Alina Bărbulescu**

Department of Civil Engineering, Transilvania University of Bras,ov, 5 Turnului Street, 900152 Bras,ov, Romania; alina.barbulescu@unitbv.ro

**Abstract:** The article investigates some of the available measurements (Terra MODIS satellite data) of the aerosol optical depth (AOD) taken in the Arabian Gulf, a zone traditionally affected by intense sand-related (or even sand-driven) meteorological events. The Principal Component Analysis (PCA) reveals the main subspace of the data. Clustering of the series was performed after selecting the optimal number of groups using 30 different methods, such as the silhouette, gap, Duda, Dunn, Hartigan, Hubert, etc. The AOD regional and temporal tendency detection was completed utilizing an original algorithm based on the dominant cluster found at the previous stage, resulting in the regional time series (RTS) and temporal time series (TTS). It was shown that the spatially-indexed time series (SITS) agglomerates along with the first PC. In contrast, six PCs are responsible for 60.5% of the variance in the case of the temporally-indexed time series (TITS). Both RTS and TTS are stationary in trend and fit the studied data series set well.

**Keywords:** AOD; classification; dendrogram; PCA

#### **1. Introduction**

Dust clouds and storms occur worldwide, especially in the Middle East, southwestern United States, northern China, and the Saharan desert. The essential conditions triggering these phenomena are the existence of huge dust or sand sources, little vegetation, strong surface winds, and an unstable atmosphere [1]. Dust particles primarily enter the lower atmosphere through saltation bombardment, which depends on the meteorological conditions near the surface, the soil texture, and particle size [2–5]. Dust is emitted as hydrophobic particles, relatively ineffective as cloud condensation nuclei. However, during their transport in the atmosphere, due to the interaction with gaseous and particulate air pollutants, their hygroscopicity increases, fortunately enhancing the efficiency of dust removal from the atmosphere through precipitation [6,7]. Haywood et al. [8] indicated that the aerosols cause a strong radiative forcing of climate because of their efficient scattering of solar radiation.

The most abundant aerosol in the atmosphere is dust, composed of oxides (silica, iron oxides), quartz, feldspar, gypsum, and hematite [9]. Ginoux et al. [10] emphasized the anthropogenic and natural dust sources.

Many studies [8,11–13] have already investigated and documented a significant variability of the airborne desert dust during the past decades in the Middle East, Africa, central Asia, and South America, and identified Shamal (the north-westerly wind blowing over Syria, Iraq, and the Arabian Gulf) as the significant natural trigger of dust storm activities across the Arabian Peninsula. Shamal transports the dust lifted from Syria and Iraq to the Arabian Gulf and Peninsula [14–18]. Still, Notaro et al. [16] identified increased dust activity over eastern Saudi Arabia around the Ad Dahna Desert, with dust transported from the Iraqi Desert and local sources. Yu et al. [18] concluded that a strong wind speed determines higher dust activity along the coast of the Persian Gulf in north-central Saudi Arabia, and one has to consider this influence when tracing the phenomenon along and across United Arab Emirates territory.

**Citation:** B ˘arbulescu, A. On the Spatio-Temporal Characteristics of Aerosol Optical Depth in the Arabian Gulf Zone. *Atmosphere* **2022**, *13*, 857. https://doi.org/10.3390/ atmos13060857

Academic Editor: Gabriele Curci

Received: 6 May 2022 Accepted: 23 May 2022 Published: 24 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Different scientists analyzed the aerosol optical depth (AOD) distribution in southeastern Asia and the Middle East in correlation with the seasonal conditions [19–21] and to determine the air quality modifications in China, Sahel, South Africa, and South America [22–25]. They showed that the atmospheric heating rates and the absorption characteristics are linearly dependent, noticing a significant difference between the aerosols in the Indian region and the zones with large deserts and high dunes (such as UAE) contributing to the dust loadings in the atmosphere [26].

Other researchers provided the classification of the aerosol types taking into account different characteristics—fine mode fraction and the aerosol index [27], AOD [4,28,29], refraction index, and Ångström exponent [30–32].

The long-term trend of AOD over the Arabian Peninsula, and eastern and southern Asia for 2002–2009, estimated based on AERONET data, was increasing. The same tendency was observed over most tropical oceans [33]. A long-term positive AOD trend over the Arabian Peninsula occurred, with a higher seasonal tendency during spring and summer (periods when the dust is transported) [26].

Multivariate statistical analysis became one of the most utilized tools for extracting the common characteristics of big data sets issued from environmental sciences [34–38]. The spatio-temporal analysis of AOD is mainly performed by Principal Component Analysis (PCA, also named EOF—Empirical Orthogonal Functions), non-negative matrix factorization (NMF), and combined Principal Component Analysis (CPCA). These tools helped capture the aerosol regimes, the factors influencing the AOD's concentrations, and the trends [20–25].

Recent studies on dust-aerosol in the UAE evaluated the regional distribution of this type of aerosol and the dust storms' intensity [26,39–41]. Using AERONET data collected from 2006 to 2015, Abuelgasim and Farahat [26] found an increasing trend of AOD in summer and spring and a 4.32% mean annual variation of the aerosol loading. They estimated a variation of 11.36% of the mean annual Ångström exponent for the study period. The highest concentration of aerosols was found in summer, while from November to March, an increasing tendency was found during 2011–2016.

Other scientists [40,41] studied the frequency of the dust storms for nine years, using hourly data recorded at eight airports in the UAE. The variation of the aerosol radius was presented in [4], based on monthly series collected at 387 points for 15 years.

Despite the investigations performed to determine the aerosol's characteristics and the effect of meteorological conditions on their loadings and transport in the Arabian Gulf region, many aspects of the aerosol's properties in the UAE remain to be studied.

The AOD time series varies depending on the data structure, aerosol extinction, and surface reflectance [22–24]. Still, here, we shall not analyze the connection between these variables, but the spatial and temporal variation of the AOD series for 178 months. This research continues the attempt to understand the aerosol characteristics in the UAE, aspects that have not been treated in the studies [4,26,40,41].

The main contributions of this research are:


#### **2. Methods and Data Series**

The methods employed at the first stage of our investigations are PCA, also called EOF, and Clustering.

The first one was used to estimate the similarity in terms of linear dependence within the data and eventually to qualify regional/global aspects. The second one was performed to evaluate dissimilarity, the natural tendency of grouping (if present) in data, and identifying aspects and events localized in time and/or space.

PCA is a statistical technique that (linearly) transforms (and deflates) a large set of (possibly correlated) variables into a smaller set of orthogonal uncorrelated variables representing the most significant information of the initial variables set. Initially employed in the study of meteorological series [42], its use became frequent in other fields such as ozone series evaluation [43,44] and isolating aerosols' sources [23] based on AOD retrieved from satellite data.

The PCA method is shortly described in the following.

Let us consider the matrix *X*, whose columns are the series recorded at each point. If *n* is the number of series (367, in this case), and *m* is the number of time units (178 months, here), *X* is an *m* × *n* matrix. It was shown that *X* can be written as a product of two matrices, *Y* (*m* × *m*) and *Z* (*m* × *n*), of orthogonal functions of position and time.

The equation *X* = *YZ* is equivalent to

$$\alpha\_{ik} = \sum\_{j=1}^{m} y\_{ij} z\_{jk} i = \overline{1,m}, \; k = \overline{1,n} \tag{1}$$

so, the vector of the values recorded at a certain point is a linear combination of the *Y* columns (with different weights), under orthogonality conditions.

Formula (1) is called the PCA analysis.

For a symmetric matrix,

$$A = \mathbf{X}\mathbf{X}^{\mathsf{T}} = \mathbf{Y}\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\mathbf{Y}^{\mathsf{T}} \tag{2}$$

there is a decomposition such as

$$A = \mathbf{Y} \boldsymbol{\Lambda} \mathbf{Y}^T \tag{3}$$

where Λ is the diagonal matrix formed by *A*'s eigenvalues. Therefore, from (1) and (3) it results that

$$z\_{ik} = \sum\_{j=1}^{m} y\_{j\text{i}} x\_{jk\text{,}} i = \overline{1, m}\_{\text{,}} k = \overline{1, n}\_{\text{,}} \tag{4}$$

because *Z=YTX*.

The *j-th* eigenvector has a contribution of *λj*/ ∑*<sup>m</sup> <sup>i</sup>*=<sup>1</sup> *λ<sup>i</sup>* to *X*, where *λi*, *i* = 1, *m* are the eigenvalues [25].

When a small dominant set of principal components exists, the technique detects the common characteristics of the data samples and reveals the regional or temporal aspects. The absence of dominant principal components results in the data series independence, so the phenomenon is localized [45].

Modifications of PCA have been proposed, such as sPCA [46] (that proposes sparse loadings), CPCA [47] (to investigate the pattern of a specific element), Common PCA (to simultaneous reduce the dimensionality in different groups) [48], or Combined PCA (to compare the modes in the AOD decomposition) [23,24,49]. PCA was chosen here because we are interested in the common characteristics of the series.

To determine the number of principal components, the scree plot, the Kaiser criterion, and the proportion of the variance explained by each component may be utilized [50–52]. Here we employed the combined scree plot and variance explained by each component.

Clustering is a method for identifying patterns and similarities within the data and the natural grouping tendency of the similar objects within a data set of interest.

Different scientists introduced various tests for detecting the optimal number of clusters. Although some are more commonly used (because they are well-known or easier to compute), there is no reason to give more credit to one or another, mainly because they rely on different mathematical and computational techniques. Therefore, the strategy for establishing the optimal number of clusters was a multi-criteria decision obeying the majority rule after performing 30 different tests, including silhouette [53], gap, Duda, Dunn, Hartigan, Hubert, and so on [54]. For example, if 15 tests voted for two clusters, eight for

three, five for four, and the rest for six, the chosen number is two. This approach assures choosing the number of clusters with the highest probability among the possible number of such groups. In the previous example, the probability associated with two clusters is 15/30 = 1/2, compared to 4/15, 1/6, and 1/15, associated with other choices.

Agglomerative Hierarchical Clustering has been performed using the R software.

The agglomerative coefficient was computed to determine the grouping quality. The higher this coefficient is, the best the clustering is. The dendrogram showing the elements in each cluster was also plotted.

The regional time series (RTS) (temporal time series (TTS)) was built using the spatiallyindexed time series (SITS) (temporally-indexed time series (TITS)) and the following algorithm, which is a modified version of *Method II* [4] based on the resulting clusters.

Given *k* data series each containing *n* values, consecutively recorded at the same time, denote by *Y* = (*yji*) (*j =* 1, ... , *n, i =* 1, ... , *k*) the matrix whose column *i* is formed by the elements of the *i-th* series (*i =* 1, ... , *k*). The steps of the algorithm for determining the representative series are:


The novelty of the approach proposed here consists of the following.


The trend and level stationarity of RTS and TTS has been checked using the KPSS test [55]. The null hypothesis, H0, was the series stationarity in level (trend), and the alternative one, H1, was the series nonstationarity in level (trend).

Remember that a time series is stationary if the statistical properties of the process generating it remain unchanged over time. Therefore, the mean, variance, and autocorrelation structure are constant over time.

One of the common causes of the violation of H0 is the existence of a trend in the mean due to the presence of a unit root or the existence of a deterministic trend. In the first case, the stochastic shocks have persistent effects. In contrast, in the second one, they have only transitory effects after which the variable tends toward a deterministically evolving (non-constant) mean (and the process is called a trend-stationary).

The KPSS test is based on the time series decomposition into a deterministic trend, a random walk, and a stationary error. In the case of stationarity, the series has a fixed element as intercept, or the series is stationary around a fixed level.

The test was performed at the level of significance of 0.05. If the *p*-value is less than 0.05, the null hypothesis is rejected.

Data used in this study are monthly AOD series retrieved by Terra MODIS (at a wavelength of 412 nm) at 387 points from July 2002 to April 2017 in the Arabian Gulf

Region (Figure 1a), between 24.95–26.25 latitudes and 51.55–55.75 longitudes. The series retrieved at the point of coordinates 26.15 latitude and 51.55 longitude is presented in Figure 1b and the series recorded in January 2003 is shown in Figure 1c.

**Figure 1.** (**a**) Observation area (https://www.google.com/mymaps/, 2022); (**b**) the series retrieved at the point of coordinates 26.15 latitude and 51.55 longitude (SITS); (**c**) series recorded in January 2003 (TITS).

The sites are ordered in increasing order in latitude, and subsequently, by decrease latitude, from the left corner on the map of the studied region to the lower right corner. Details on the study area may be found in [4,40,41]. The coordinates of the sampling points are given in Table S1 in the Supplementary Materials. Data have been organized in a matrix, *X*, whose columns contain the AOD at each point, and the lines contain the monthly values at the observation points.

#### **3. Results and Discussion**

#### *3.1. PCA and Clustering*

Table 1 shows the computed eigenvalues greater than 1, the proportion of the variance explained by each component, and the cumulative proportion of the variance explained for SITS.


**Table 1.** Eigen-analysis of the correlation matrix of SITS.

Although eight eigenvalues are greater than 1, the first component (PC1) explains 90.20% of the variance within this set. The second component (PC2) explains only 2.8% of the variance within SITS, while the others have even smaller contributions. The first two principal components (PCs) are enough to extract the essential information within the time series set, which proves to be highly PCA compressible. Only 9.80% of the variance within this data is outside the direction of the first dominant PC, and 6.90% is outside the plane determined by PC1 and PC2. These small percentages reveal that the series similarity (linear dependence) is high in this set because the data points agglomerate along with PC1. Therefore, the sand aerosols over the Arabian Gulf have a regional nature, the AOD values being relatively similar (linearly dependent) across the analyzed area.

The optimal number of clusters—two—was selected after running the NbClust package in R. Figure 2 displays the silhouette chart (one of the 30 methods run). The agglomerative clustering has been performed for SITS setting the number of clusters equal to two. The computed agglomerative coefficient was 0.7678, indicating a good partition of the series in two sets. The highest this coefficient is, the better the clustering is.

**Figure 2.** The silhouette chart. The dotted line indicates the optimum number of clusters (two).

The dendrogram displaying the groups of the observation points is presented in Figure 3.

**Figure 3.** The dendrogram for classification of SITS. The elements in different clusters are in different colors.

Analyzing the elements in the clusters related to the positions of the points on the map (Figure 1a and Supplementary Table S1), results in that one cluster contains the points situated on the eastern side of the region and (very few) near the Qatar shore, while the second one contains the rest of the locations. So, one cluster contains the sites situated in the direction where the Shamal blows—between the red borders in Figure 1.

Figure 4 shows the scree plot of the TITS (the lines of the data matrix). The computation found 25 eigenvalues greater than one, with only six PCs explaining 60.5% of the variance.

**Figure 4.** The scree plot for TITS.

The majority principle decided that the number of clusters to classify the series is also two for the TITS. Figure 5 contains the elements in the clusters and the dendrograms for the TITS. The agglomerative clustering provided an agglomerative coefficient of 0.92802, which indicates a strong separation between the groups.

**Figure 5.** (**a**) Clusters (the values on the axes represent the scores on PC2 and PC1) and (**b**) the dendrogram from the hierarchical clustering of TITS. The elements in different clusters are in different colors.

Comparing the clusters' content, it resulted that one of them mainly contains the series recorded in the summer months (March to August), while the other contains the rest. So, the classification is related to seasonality.

#### *3.2. Estimating the RTS and TTS*

Taking into account the results from the previous section, the RTS has been computed by applying the algorithm (I)–(V) described in Section 2 to SITS (columns in the matrix *X*). The AOD's RTS (as a function of time) is presented in Figure 6a. One can remark on the periodic behavior of this series, whose highest values of the AOD's RTS are primarily recorded in July, while the lowest is in November–January.

**Figure 6.** (**a**). The AOD 's RTS; (**b**) AOD's RTS monthly average.

Figure 6b shows that the AOD's RTS monthly average value for July is about three times higher than the corresponding average values for November–January. This result is in concordance with those of Abuelgasim and Farahat [26] and Yoon et al. [56], which indicated a significant increase of AOD over the Gulf Region, especially in summer, related to the dust abundance [39–41,57,58].

The trend and level stationarity hypotheses could not be rejected for the RTS (*p*-value > 0.1, in both cases) when applying the KPSS test. This means that the RTS does not present a variation in trend or level.

The TTS (Figure 7) was built from the TITS (the transposed matrix, *YT*) and the same algorithm.

**Figure 7.** The TTS of AOD obtained running the algorithm from Section 3 for TITS.

The aspect of TTS, with peaks and troughs is related not only by the monthly characteristics of the AOD (higher dust quantities in summer, and lower in winter), but also to the position of the sampling points (Figure 1a) in the direction NW-SE in which Shamal blows.

The non-stationarity in the level and trend stationarity are emphasized by the results of the KPSS test, whose *p*-values are lower than 0.01 (for level-stationarity) and 0.12376 (for trend-stationarity).

The goodness-of-fit indicators are tools for assessing the fit quality. The lower their values are, the better the fit is. For a correct model quality assessment, using more than one indicator is recommended. For example, MSE can be influenced by values that significantly deviate from the average. Therefore, three goodness-of-fit indicators have been utilized here—MAE, MSE, and MAPE—the last one being non-dimensional, so it is more reliable for comparisons between the two models.

All the values of the goodness-of-fit indicators corresponding to the RTS and TTS are all very low (Table 2). They indicate a better fit for RTS than for TTS when reported to the average indicators. Indeed, the mean values of MAE (MSE and MAPE, respectively) are 0.1724/0.0326 = 5.29 (21.88 and 4.38 times higher, respectively) for TTS than for RTS. The minimum MAE and MSE are also lower for RTS compared to TTS, but the min MAPE is higher for RTS.


**Table 2.** Goodness of fit indicators of RTS and TTS.

The maximum values of MAE (max MSE and max MAPE, respectively) is 11.15 (55.32 and 4.84, respectively) times higher for RTS than for TTS, indicating a higher variability at the temporal scale than at the spatial one.

This means that RTS better represents the individual series than TTS, so there is a higher homogeneity of the SITS than the TITS. The result is in concordance with the findings related to the seasonal variability of AOD [26].

#### **4. Conclusions**

This study extended our previous research on the aerosol radius, using the AOD series collected during the same period at the same sampling points in the Gulf Region. The novelty of the research consists of a dual analysis in time and space and the detection of RTS and TTS that characterizes the AOD behavior over the study zone.

The approach combined the PCA with a new algorithm, building the RTS and TTS series based on the classification provided by the clustering.

It was found that a single principal component explains more than 90% of the variance of SITS, indicating that the series are agglomerated along with PC1. The TITS are scattered (the first six dominant principal components accounting for only 60.5% of the variance in the sets). Still, both RTS and TTS fit data well and are trend stationary.

We intend to extend the research to sets of series with missing data, given that most of the available records present gaps.

**Supplementary Materials:** The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/atmos13060857/s1, Table S1: Data series and the coordinates of the locations.

**Funding:** This research received funding from the research fund of Transilvania University of Bras, ov, Romania.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data can be freely downloaded from the site https://modis.gsfc.nasa. gov/data/ (accessed on 10 July 2017).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

