**1. Introduction**

The chemical composition of cloud water has revealed the high complexity of this medium [1], resulting from the cloud scavenging of soluble gases, the dissolution of the soluble fraction of the aerosol acting as cloud condensation nuclei (CCN), and from aqueous phase reactions [2]. Additionally, recent studies have shown that microbial activity altered the cloud water chemical composition [3–5]. Therefore, cloud water is composed of a mixture of complex inorganic and organic compounds with strong oxidants which have been shown to drive aqueous phase transformations in the presence of solar radiation [6,7].

Studies devoted to the analysis of the chemical composition and to its variability have aimed to better understand several atmospheric physicochemical processes such as droplet activation and growth, production and consumption of chemical compounds, as well as transport and deposition of pollutants by precipitation. Many field studies have reported in-cloud production of low-volatility products [8,9]. The production of strong acids such as sulfates has been highlighted in the cloud aqueous phase leading to acidification, a process that controls both the phase partitioning and chemical reaction rates [10] and is responsible for the widely known phenomenon of acid rain [11,12]. Oxidative cloud processing was suggested to form secondary organic aerosol (aqSOA) through functionalization of dissolved organic compounds [13]. These aqueous phase transformations alter particle properties in terms of mass, chemical composition, hygroscopicity, and oxidation state [14], also affecting their CCN ability. Variability of solute concentrations, together with drop size distribution, has been observed in many field studies, revealing variations of the CCN composition with particle size and the dependency of some processes (dissolution of soluble gases and condensational growth) on drop size [15–17]. In-cloud scavenging efficiencies have also been investigated by simultaneously measuring cloud water concentrations and interstitial particulate or gaseous concentrations [18–20]. Deviations of up to a few orders of magnitude from Henry's law equilibrium have been reported for carbonyl compounds with low effective water solubility [21,22], which suggests a more efficient scavenging of organic compounds by cloud water than expected.
Deposition through precipitation of the scavenged chemical compounds removes large amounts of organic and inorganic pollutants from the atmosphere providing a significant contribution of nutrients (positive inputs) and pollutants (negative inputs) in various ecosystems [23,24].

Cloud water studies have been conducted over various continents [1] including Europe [25–28], Asia [7,29–31], North and South America [1,32–34], as well as in contrasted environments (polluted, marine, and remote). Significant developments have been implemented to enhance the collection efficiency of cloud collectors [35–37] and to better characterize the molecular composition by targeted or non-targeted methods, often using mass spectrometry [3,20,38]. Those studies mostly investigated the temporal variability of cloud chemical composition, as well as the transport of air masses and the physicochemical processes [30,33]. However, most of these field campaigns were performed over short-term periods due, in part, to the inherent difficulty of collecting clouds. A few sites have continuously collected cloud water over long-term periods such as the Puy de Dôme station in France (PUY) [39], Mt. Brocken in Germany [40], Whiteface Mountain in USA [32], Mt. Oyama in Japan [41] and Mt. Tai in China [31]. These mountain sites offer facilities to sample clouds under optimal conditions and to conserve quality homogeneity of chemical analysis on a long-term basis. These stations have a relatively high altitude, i.e., hundreds of meters above the surrounding plains, where cloud formation occurs, optimizing collection efficiency and sampling air masses from varied origins, throughout the year.

The present study aims at analyzing a long-term dataset of the chemical composition of cloud water samples at PUY. This remote site is influenced by long-range transport [42,43], and the study has the objective of providing information on the physicochemical variability of air masses on a regional scale. For this, cloud water chemical composition is used to constrain a multivariate statistical analysis and propose a chemical classification of the sampled clouds. Then, this classification is combined with a numerical analysis using the CAT model (computing advection-interpolation of atmospheric parameters and trajectory tool) [39,44]. This model simulates the atmospheric transport of air masses and provides zone and sector matrices. Thereby, compared with the basic back-trajectory methods [45], the CAT model brings additional information and enables more robust statistical analyses, such as partial least squares (PLS) regressions between chemical and air mass history matrices. A specific point is the comparison with a previous cloud classification at PUY [46] and the identification of PUY's specificities relative to similar studies [33,45,47–49]. Monitoring over 18 years also reveals some trends, both in terms of chemical concentrations and history of air masses, that are discussed in this paper. Such long-term monitoring avoids numerous statistical biases encountered in short-term field campaigns. The statistical information on the variability of chemical composition as a function of environmental factors (i.e., source regions) is also helpful for evaluating the extent of oceanic vs. continental influences and for defining contrasted environmental scenarios for modeling purposes. Finally, the effect of cloud physics or air mass history on the chemical composition of clouds sampled at PUY is estimated.

### **2. Experiments**

### *2.1. Cloud Sampling*

Sampling was performed at PUY (45.7722° N, 2.9648° E), which belongs to several international networks: EMEP (the European Monitoring and Evaluation Programme), GAW (Global Atmosphere Watch), and ACTRIS (Aerosols, Clouds, and Trace Gases Research Infrastructure). The observatory chalet is on top of a monogenetic volcano rising above the surrounding area and reaching 1465 m a.s.l. PUY is part of the Chaîne des Puys, a north–south oriented chain of extinct volcanoes in the Massif Central (France); to the west, an agricultural plain extends to the ocean, about 300 km away. The urban area of Clermont-Ferrand and its surrounding suburbs (285,000 inhabitants) is situated 16 km east of and 1000 m lower than the station. PUY makes it possible to characterize air masses with various histories, coming from the boundary layer or the free troposphere, depending on the season and the time of day. The top of the mountain is frequently in cloudy conditions, on average 30% of the time per year, with higher occurrences during winter and autumn [50]. This makes PUY a reference site to study and sample cloud properties.

The cloud sampling dataset used in this study covered the period 2001–2018, with an average sampling time of 3 h and an average sampling volume of 75 mL. Non-precipitating cloud droplets were sampled using cloud collectors, as described previously for PUY cloud studies [46]. Cloud droplets, larger than 7 µm (cut-off diameter) [51], were collected by impaction onto a rectangular aluminum plate. Most of the time, droplets were collected directly as a liquid, and more rarely, they froze upon impaction (supercooled conditions). The water was transferred at room temperature, either directly or after a short melting period into glass vials. The aluminum collectors were cleaned and sterilized by autoclaving. Samples were collected in sterilized bottles and cloud water was filtered (0.20 µm nylon filter to eliminate microorganisms and particles). The majority of the sampled clouds resulted from frontal systems that mainly occurred during autumn, spring, and winter; these meteorological conditions enabled sampling clouds over long-time durations.

### *2.2. PuyCloud Database*

The PuyCloud observation system deals with the monitoring of the biological, microphysical, and chemical properties of clouds. Biological and chemical analyses are performed in collaboration with the ICCF (Institute of Chemistry of Clermont-Ferrand). It is part of the French CO-PDD (Cézeaux-Aulnat-Opme-Puy De Dôme) multisite platform fully described by Baray et al. [39] in terms of instrumentations and data availability and widely employed [22,25,52–55].

The meteorological parameters that are monitored at PUY include the following: wind speed and direction, temperature, pressure, and relative humidity. Cloud microphysical properties, i.e., liquid water content (LWC) and effective droplet radius (r<sub>e</sub>), are measured with a Gerber particle volume monitor, model 100 (PVM-100).

Physicochemical parameters are measured immediately after sampling (pH, conductivity, and redox potential). The concentrations of the major organic and inorganic ions (acetic, formic, succinic, malonic and oxalic acids, Ca<sup>2+</sup>, K<sup>+</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, Cl<sup>−</sup>, SO<sub>4</sub><sup>2−</sup>, and NO<sub>3</sub><sup>−</sup>) are measured by ion chromatography, using a DIONEX DX-320. The H<sub>2</sub>O<sub>2</sub> and iron contents, which are important parameters in the evaluation of the cloud oxidative capacity, are also determined. The spectrofluorimetric method based on the reactivity of p-hydroxyphenylacetic acid with horseradish peroxidase [53] was used to measure the concentration of hydrogen peroxide in cloud water. The Fe(II) concentration was measured by UV-visible spectroscopy at 562 nm, using the method developed by Stookey [56] based on the rapid complexation of iron with ferrozine. A detailed description of the physicochemical parameters and chemical analysis has been provided in Bianco et al. [6].

Between 2001 and 2018, 141 cloud events were sampled at PUY, representing 295 individual samples (896 h of sampling), including 72 samples in spring, 21 samples in summer, 75 samples in fall, and 127 samples in winter. Table S1 indicates the physicochemical analysis performed for each cloud event. This cloud water chemical characterization has been performed systematically over the last 20 years. Additional cloud water chemical and biological analyses have been developed during the last decade using targeted or global methods [3,22,25,52,54,57–61].

### *2.3. Dynamical Analysis*

The trajectory approach is commonly used to identify source areas of air pollutants, based on conditional probability fields including back trajectory calculations, land cover, and meteorological data [62,63]. In the present work a dynamical analysis using the CAT model is performed to identify source areas of chemical compounds detected in cloud samples.

The CAT model is the recent evolution of the Lagrangian model LACYTRAJ [44]. CAT is a three-dimensional (3D) forward/backward kinematic trajectory code using initialization wind fields from the recent reanalysis ECMWF ERA-5 [64]. A cluster of starting back trajectory points is defined by the user and advected by the model using a bilinear interpolation for horizontal wind fields and time and a log-linear interpolation for vertical wind fields. The CAT model has already been used to determine the air masses arriving in PUY on the basis of calculations of two sets of 24 h back trajectories per day over a two-year period (2015–2016) [39].

In this study, sets of 45 back trajectories were calculated every hour during the cloud sampling, starting within a volume of ±0.1° in latitude and longitude. The vertical starting altitude of the back trajectories was deduced from the pressure measured at the Puy de Dôme summit, assuming hydrostatic equilibrium. Trajectories were calculated between the summit and 50 m below (corresponding to 4 hPa) to take into account the ascent, along the slopes of PUY, of the air masses arriving below the observatory. The temporal resolution was 15 min and the total duration was 72 h.

The CAT model can be initialized with ECMWF ERA-5 wind fields of any temporal and spatial resolution. For this work, wind fields were extracted every 3 h with a spatial resolution of 0.5° in latitude (55 km) and longitude (40 km), on 23 vertical pressure levels between 200 and 1000 hPa. CAT integrated a topography matrix at a resolution of around 10 km [65].
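The kilometre equivalents quoted for the 0.5° grid can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km (an assumption made here for illustration, not a value taken from the CAT configuration) and evaluates the east-west spacing at the latitude of PUY.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical approximation)
PUY_LAT_DEG = 45.7722     # latitude of the station, from Section 2.1


def grid_spacing_km(step_deg, lat_deg):
    """Return the (north-south, east-west) length in km of a step_deg grid step."""
    km_per_deg = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = step_deg * km_per_deg
    # Meridians converge towards the pole, so the east-west step shrinks with cos(lat).
    east_west = north_south * math.cos(math.radians(lat_deg))
    return north_south, east_west


ns_km, ew_km = grid_spacing_km(0.5, PUY_LAT_DEG)
print(f"0.5 deg in latitude  ~ {ns_km:.1f} km")
print(f"0.5 deg in longitude ~ {ew_km:.1f} km at {PUY_LAT_DEG} N")
```

Under these assumptions the spacing is roughly 56 km north-south and 39 km east-west, consistent with the rounded values of 55 km and 40 km given in the text.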

In addition to the wind parameters, the boundary layer height was also extracted from the ECMWF ERA-5 reanalysis in the same horizontal resolution, and spatially and temporally interpolated on all the trajectory points.

The trajectory calculation phase was followed by a dynamical characterization analysis phase. This phase consisted of flagging the cloud samples on the basis of the results of the trajectory calculations.

The history of air masses was modeled by counting the number of trajectory points in each of the following nine geographic areas: eight sectors, namely north-northeast (NNE), east-northeast (ENE), east-southeast (ESE), south-southeast (SSE), south-southwest (SSW), west-southwest (WSW), west-northwest (WNW), and north-northwest (NNW), named "sector" hereafter, and one nearby area. The latter was defined because the origin of the points closest to PUY, within a radius of 0.5°, could not be determined given the spatial resolution of the wind fields. The percentage of points located over the sea and over the continental surfaces was then determined using the topography file. If the altitude of the topography interpolated on a trajectory point was 0, the point was considered to be above the sea and was assigned to the "sea surface" zone; otherwise, it was assigned to the "continental surface" zone. Finally, we separated the continental and sea zones vertically, using the atmospheric boundary layer height (ABLH) interpolated on the trajectory points (data summarized in Table S1, blue columns).

All of these characteristics were, then, compiled for each cloud sampling, providing a so-called "zone matrix" and a so-called "sector matrix". Thus, the matrices indicated, for each cloud sample, the distribution of the sectors or the zones crossed by their 72-hour backward trajectory. The relationship between the air mass history and the cloud composition was the subject of a statistical analysis, as described in Section 2.4.
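The construction of one row of the sector and zone matrices can be illustrated with the following minimal sketch. It is not the CAT implementation: the function names, the flat-Earth bearing, the interpretation of the 0.5° radius in latitude/longitude space, the comparison of the point height with the ABLH (both taken above the surface), and the demonstration points are all assumptions introduced here.

```python
import math
from collections import Counter

PUY_LAT, PUY_LON = 45.7722, 2.9648  # station coordinates (Section 2.1)
NEARBY_RADIUS_DEG = 0.5             # radius of the "nearby" area around PUY
SECTORS = ["NNE", "ENE", "ESE", "SSE", "SSW", "WSW", "WNW", "NNW"]


def sector_of(lat, lon):
    """Return the 45-degree sector of a trajectory point, or 'nearby'."""
    dlat = lat - PUY_LAT
    dlon = lon - PUY_LON
    # Assumption: the 0.5 degree radius is measured in latitude/longitude space.
    if math.hypot(dlat, dlon) < NEARBY_RADIUS_DEG:
        return "nearby"
    # Bearing from PUY to the point, clockwise from north (flat-Earth approximation).
    east = dlon * math.cos(math.radians(PUY_LAT))
    bearing = math.degrees(math.atan2(east, dlat)) % 360.0
    return SECTORS[int(bearing // 45.0) % 8]


def zone_of(topo_m, height_m, ablh_m):
    """Combine the surface type (topography equal to 0 means sea) with the ABLH split."""
    surface = "sea" if topo_m == 0 else "continental"
    layer = "within ABL" if height_m <= ablh_m else "above ABL"
    return f"{surface} surface, {layer}"


def matrix_rows(points):
    """points: iterable of (lat, lon, height_m, topo_m, ablh_m) tuples for one sample."""
    points = list(points)
    n = len(points)
    sectors = Counter(sector_of(lat, lon) for lat, lon, *_ in points)
    zones = Counter(zone_of(topo, h, ablh) for _, _, h, topo, ablh in points)
    sector_row = {name: 100.0 * count / n for name, count in sectors.items()}
    zone_row = {name: 100.0 * count / n for name, count in zones.items()}
    return sector_row, zone_row


# Invented trajectory points, for illustration only.
demo_points = [
    (45.9, 3.1, 300.0, 800.0, 900.0),
    (47.0, -3.5, 1500.0, 0.0, 700.0),
    (44.0, -8.0, 400.0, 0.0, 900.0),
    (49.0, 8.0, 2000.0, 250.0, 1200.0),
]
sector_row, zone_row = matrix_rows(demo_points)
print(sector_row)
print(zone_row)
```

Each dictionary gives the percentage of trajectory points per sector or zone for one cloud sample; stacking such rows over all samples would give matrices of the kind described above.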

### *2.4. Statistical Analysis*

A principal component analysis (PCA) was performed using the concentrations of both organic and inorganic ions (Ca<sup>2+</sup>, K<sup>+</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, Cl<sup>−</sup>, NO<sub>3</sub><sup>−</sup>, SO<sub>4</sub><sup>2−</sup>, acetate, formate, malonate, oxalate, and succinate). The aim was to determine the most relevant variables to establish chemical categories (categories which would then be compared with microphysical parameters or air mass history matrices). However, some chemical analyses were lacking and some samples presented missing values. Missing values were not replaced by mean values, in order to fully represent the variability of the dataset and to avoid overfitting [66]. Thus, samples that were not fully characterized were not considered in the statistical analysis.

Then, we performed numerous PCAs. The maximum of information gathered on the first two factors was obtained with three ions with predominantly marine sources (Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup>) and three ions with major sources on the continental surface, mostly anthropogenic (NH<sub>4</sub><sup>+</sup>, SO<sub>4</sub><sup>2−</sup>, and NO<sub>3</sub><sup>−</sup>). Hence, by keeping six inorganic ions, as in similar studies [33,46,47], we achieved the best balance between samples and variables (i.e., increasing the number of variables means decreasing the number of samples).

The PCA was computed on Spearman's correlations, which are more appropriate when running the PCA on variables with different distributions.
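To illustrate why a rank-based correlation suits concentrations with skewed distributions, the following sketch computes Spearman's coefficient as the Pearson correlation of average ranks. It is a generic textbook construction, not the XLSTAT routine, and the concentration values are invented for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented, strongly skewed concentrations (arbitrary units).
na = [5.0, 12.0, 30.0, 80.0, 400.0]
cl = [7.0, 15.0, 28.0, 95.0, 2000.0]
print("Pearson :", pearson(na, cl))
print("Spearman:", spearman(na, cl))
```

Because the two invented series increase together, the rank correlation equals 1 even though the relationship is not linear, whereas the Pearson coefficient is pulled by the single extreme sample.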

Then, we performed agglomerative hierarchical clustering (AHC), an iterative classification method that builds homogeneous groups of objects (categories of cloud events, herein) on the basis of their dissimilarity, computed from a set of descriptive variables (chemical variables, herein). The AHC produced a dendrogram showing the progressive grouping of the data. To calculate the dissimilarity between samples, we applied the common Ward's agglomeration method (which minimizes the within-group inertia) using Euclidean distance. The data were centered and reduced (standardized) so that variables with strong variance did not unduly weigh on the results. The truncation level, and therefore the number of categories to retain, was automatically defined on the basis of the entropy.
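The two ingredients named above, standardization and Ward's criterion, can be sketched as follows. This is a minimal illustration rather than the XLSTAT procedure: the use of the population standard deviation is an assumption, the helper names are hypothetical, and the concentrations are invented. Ward's method merges, at each step, the pair of groups whose fusion gives the smallest increase in within-group inertia, which for groups A and B equals n_A n_B / (n_A + n_B) times the squared Euclidean distance between their centroids.

```python
import math


def standardize(columns):
    """Center each variable on 0 and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.append([(v - mean) / sd for v in col])
    return out


def centroid(rows):
    """Mean position of a group of samples (each sample is a list of variables)."""
    return [sum(vals) / len(rows) for vals in zip(*rows)]


def ward_cost(group_a, group_b):
    """Increase in within-group inertia if group_a and group_b are merged."""
    ca, cb = centroid(group_a), centroid(group_b)
    d2 = sum((p - q) ** 2 for p, q in zip(ca, cb))
    na, nb = len(group_a), len(group_b)
    return na * nb / (na + nb) * d2


# Invented concentrations for four samples and two variables (arbitrary units).
na_conc = [10.0, 12.0, 200.0, 220.0]
nh4_conc = [150.0, 160.0, 20.0, 25.0]
samples = [list(row) for row in zip(*standardize([na_conc, nh4_conc]))]

print("merge samples 0 and 1:", ward_cost([samples[0]], [samples[1]]))
print("merge samples 0 and 2:", ward_cost([samples[0]], [samples[2]]))
```

In this toy case the first merge is far cheaper than the second, so samples 0 and 1 would be grouped first; repeating the step until one group remains produces the dendrogram.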

Then, a PLS regression was performed to establish the correlations between the chemical parameters and the air masses history parameters. The Mann–Whitney and Kruskal–Wallis nonparametric tests were carried out to validate significant differences between two and among several data groups, respectively. Two air mass categories were declared to be different when the probability for the groups to have identical data distribution was lower than 5% (*p*-value < 0.05). These tests were chosen because the population from which the sample was extracted did not follow a normal distribution, according to the Shapiro–Wilk normality test.
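For reference, the Kruskal–Wallis statistic can be written out in a few lines. The sketch below is a generic textbook version with the usual tie correction and a chi-square approximation for the *p*-value; it is not the XLSTAT implementation, it does not handle the degenerate case where all values are identical, and the three groups of values are invented.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution (integer df >= 1)."""
    half = x / 2.0
    if df % 2:
        a, q = 0.5, math.erfc(math.sqrt(half))
    else:
        a, q = 1.0, math.exp(-half)
    # Recurrence on the regularized upper incomplete gamma function Q(a, x/2).
    while a < df / 2.0:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def kruskal_wallis(*groups):
    """Return (H, p) for two or more groups of observations."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3.0 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1.0 - ties / (n ** 3 - n)  # tie correction (equals 1 when there are no ties)
    return h, chi2_sf(h, len(groups) - 1)


# Invented concentrations for three hypothetical categories.
h_stat, p_value = kruskal_wallis([4.1, 6.3, 5.0], [20.5, 35.2, 18.9], [80.0, 120.4, 95.7])
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```

With these fully separated invented groups, H = 7.2 and *p* ≈ 0.027, so the hypothesis of identical distributions would be rejected at the 5% level used in this work.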

This statistical analysis was performed using Excel XLSTAT software [67].

### **3. Results and Discussion**

The multivariate statistical analysis was performed on 295 cloud samples collected at PUY, starting with PCA and AHC, in order to classify them according to their chemical composition. Then, these results were compared to the previous PUY study [46]. Then, we investigated, by using PLS regression, the relationships among these chemical data and the matrices provided by the CAT model, both on the zones ("sea and continental surfaces") and on the cardinal sectors crossed by the air masses. Finally, we compared the respective influences of the air mass history and microphysics on the chemical composition of clouds.

### *3.1. Clusterization of Cloud Waters at PUY*

Data relative to Cl<sup>−</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>, presented in Table S1, were analyzed by AHC and PCA to obtain categories based on ion concentration dissimilarities.

*Atmosphere* **2020**, *11*, x FOR PEER REVIEW 6 of 21

#### 3.1.1. Chemical Categories

AHC was used to categorize cloud samples based on the long-term monitoring of their chemical composition. The AHC algorithm successfully grouped all the observations with a satisfactory cophenetic correlation (correlation coefficient between the dissimilarity and the Euclidean distance matrices) of 0.619 (Figure 1). Indeed, the closer the correlation to 1, the better the quality of the clustering. The dotted line in Figure 1 represents the degree of truncation (dissimilarity = 91.16) of the dendrogram used for creating categories and was automatically chosen based on the entropy level. Given the small difference in dissimilarity (Figure 1) between the light blue category (dissimilarity = 88.91) and the aggregate of the yellow and red categories (dissimilarity = 93.41), the AHC could be almost as robust with three or five categories. Nevertheless, these four AHC categories are consistent with our previous study [46]. Categories 1, 2, 3, and 4 consist of 113, 31, 55, and 9 clouds, respectively.

**Figure 1.** Dendrogram representing the agglomerative hierarchical clustering (AHC) based on dissimilarities using Ward's method on 6 inorganic ion concentrations. The 208 cloud samples (without missing chemical values) were assigned to one of four automatically established categories (dissimilarity values displayed in bold). The six ions are the same as those used for principal component analysis (PCA).

The AHC profile plot (Figure 2) represents the four categories determined from the six main inorganic ions (Cl<sup>−</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>). The light blue category with low ion concentrations is named "marine" according to its air mass history (i.e., the time spent by this air mass above the "sea surface", detailed in Section 3.2). This category is the most homogeneous, as confirmed by its significantly lower within-class variance (179.86), shown in Figure 1. The "marine" category is also the main category (113 objects), which is consistent with the remoteness of PUY. The dark blue category is characterized by high concentrations of Na<sup>+</sup>, Cl<sup>−</sup>, and Mg<sup>2+</sup> and by its air mass history, and is thus called "highly marine". PUY is located more than 300 km from the Atlantic shore. Nevertheless, at a synoptic scale, the air masses are mainly transported from the ocean to PUY with no relief in between (as confirmed hereafter by the CAT model). Hence, this category, with 31 objects, appears counterintuitively modest. This suggests that some western clouds (which could have been classified as "highly marine") have either precipitated or become diluted (increase in liquid water content), thereby decreasing concentrations. These western clouds are then classified as "marine", hence the importance of this category (i.e., a category with a marine history, but without salt).

**Figure 2.** Profile plot established by the AHC from the six main inorganic ions (Na<sup>+</sup>, Cl<sup>−</sup>, Mg<sup>2+</sup>, SO<sub>4</sub><sup>2−</sup>, NH<sub>4</sub><sup>+</sup>, and NO<sub>3</sub><sup>−</sup>). The Y axis displays the normalized ion concentrations of the category centroids (e.g., for Cl<sup>−</sup>: ([Cl<sup>−</sup>] − [Cl<sup>−</sup>]<sub>min</sub>)/([Cl<sup>−</sup>]<sub>max</sub> − [Cl<sup>−</sup>]<sub>min</sub>)).

In red, the smallest category (nine objects), referred to as "polluted" in Figure 1, displays peak concentrations for SO<sub>4</sub><sup>2−</sup>, NH<sub>4</sub><sup>+</sup>, and NO<sub>3</sub><sup>−</sup>, suggesting the air mass passed over an urbanized area. Below these maxima, in yellow, the "continental" category with 55 objects stands out. It should be noted that, with only nine objects, the polluted category is statistically less robust, and could have been merged with the "continental" category (see dissimilarities in Figure 1) and regarded as the extreme SO<sub>4</sub><sup>2−</sup>, NH<sub>4</sub><sup>+</sup>, and NO<sub>3</sub><sup>−</sup> values of that category. Conversely, the "highly marine" category could have been split (see dissimilarities in Figure 1) according to SO<sub>4</sub><sup>2−</sup> concentration (not shown).

Because the computed *p*-value in the Kruskal–Wallis test (Figure 3) is lower than the significance level alpha = 0.05, the distributions of ion (Cl<sup>−</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>) concentrations can be accepted as significantly different between categories. The samples do not come from the same population. We observe, in particular, high sea salt concentrations (Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup>) for both "marine" and "highly marine" categories, and high concentrations of potentially anthropogenic ions (NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>) [68–72] for both "polluted" and "continental" categories (Table S2).

**Figure 3.** Distribution of inorganic ions (Cl<sup>−</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>) of the cloud waters sampled at PUY for each air mass category (marine, highly marine, continental, and polluted). The number of analyzed samples is 208 (samples with missing data were removed). One box plot per category is displayed for each ion. The mean values are displayed as red crosses. The central horizontal bars are the medians. The lower and upper limits of the box are the first and third quartiles, respectively. The ends of the whiskers are the 10th and 90th percentiles. Black diamonds are the minimum and maximum for each species. The box plot's horizontal width has no statistical meaning. Statistical differences (Kruskal–Wallis test; *p*-value < 0.05) between groups are indicated above the box plots.

#### 3.1.2. Variable Validation

A PCA was computed on a Spearman correlation matrix using the concentrations of the ions (Cl<sup>−</sup>, Mg<sup>2+</sup>, Na<sup>+</sup>, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>). The PCA correlation circle (Figure 4a) provides evidence that Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup> are strongly correlated (see correlation matrix in Table S3; r(Na<sup>+</sup>, Cl<sup>−</sup>) = 0.82, r(Mg<sup>2+</sup>, Cl<sup>−</sup>) = 0.77, and r(Na<sup>+</sup>, Mg<sup>2+</sup>) = 0.77), as are NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup> (r(NO<sub>3</sub><sup>−</sup>, NH<sub>4</sub><sup>+</sup>) = 0.77, r(NO<sub>3</sub><sup>−</sup>, SO<sub>4</sub><sup>2−</sup>) = 0.75, and r(NH<sub>4</sub><sup>+</sup>, SO<sub>4</sub><sup>2−</sup>) = 0.78), while these two sets are practically uncorrelated, except Cl<sup>−</sup> and SO<sub>4</sub><sup>2−</sup> (r(Cl<sup>−</sup>, SO<sub>4</sub><sup>2−</sup>) = 0.49), suggesting the presence of anthropogenic chlorine and sea salt sulphate.

**Figure 4.** Principal component analysis (PCA) on a Spearman correlation chemical matrix. (**a**) Correlation circle and projection of the 6 ion concentrations; (**b**) Two-dimensional map of the observations colored according to the AHC category. The XLSTAT software automatically displayed confidence ellipses (interval 95%) around AHC categories, and resized points with squared cosines of the observations (i.e., the larger the point, the more it is related to a factor, F1 or F2).

In this PCA (Figure 4), the first two factors represent 85.57% of the initial variability of the data; the PCA is robust, with no information hidden in the next four factors (see squared cosines of the variables in Table S4). The horizontal axis (F1) is linked to the total ion concentration and represents 58.16% of the information, while the vertical axis (F2: 27.4%) is linked to the concentrations of NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup> on the positive side, and of Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup> on the negative side. The PCA is consistent with the AHC. Accordingly, in Figure 4b, the AHC "marine" category stands out on the left (F1 < 0) of the chart, the "highly marine" category at the bottom right (F1 > 0 and F2 < 0), and the "continental" and "polluted" categories at the top right (F1 > 0 and F2 > 0).
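The variance shares quoted above can be related to eigenvalues with a short calculation. Since a correlation matrix has ones on its diagonal, its eigenvalues sum to the number of variables (six here), so each share of variance corresponds to an eigenvalue of share × 6. The sketch below only rearranges the reported percentages; the variable names are illustrative.

```python
# Shares of variance reported for F1 and F2 (in %), taken from the text.
share_f1, share_f2 = 58.16, 27.4
n_variables = 6  # Cl-, Mg2+, Na+, NH4+, NO3-, SO4 2-

# The eigenvalues of a correlation matrix sum to the number of variables,
# so each eigenvalue equals (share / 100) * n_variables.
eigenvalue_f1 = share_f1 / 100 * n_variables
eigenvalue_f2 = share_f2 / 100 * n_variables

# Variance carried by the first two factors and by the remaining four.
cumulative_share = share_f1 + share_f2
remaining_share = 100 - cumulative_share
```

The sum of the rounded shares is 85.56%, which differs from the reported 85.57% only in the last digit, presumably because the F2 share is quoted with one decimal.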

### 3.1.3. Evolution Since the 2001–2011 Study [46]


In this study, the statistical analysis has evolved compared to our previous work. First, the AHC is performed with a larger number of samples (208 versus 134), and the variables considered for the statistical analysis are different, i.e., pH is not taken into account and Mg<sup>2+</sup> is added to the variables, as explained above. We removed cloud events with missing values. In the PCA, Spearman's correlations replaced Pearson's. However, the distribution of categories is fairly unchanged; among the 208 cloud events used in the AHC, 164 (78.8%) were clustered in a category similarly named in the 2001–2011 study [46] (see Table S1).

The samples with high Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup> concentrations are still gathered in the so-called "highly marine" (HM) category. However, the present HM category is an expanded version of the former "highly marine" (HM\_01–11) category, with lower mean concentrations of Cl<sup>−</sup> and Na<sup>+</sup> ([Na<sup>+</sup>]<sub>HM</sub> = 192 µM vs. [Na<sup>+</sup>]<sub>HM\_01–11</sub> = 311 µM and [Cl<sup>−</sup>]<sub>HM</sub> = 163 µM vs. [Cl<sup>−</sup>]<sub>HM\_01–11</sub> = 232 µM). The average ratio Cl<sup>−</sup>/Na<sup>+</sup> of this updated "HM" category is higher (1.22 vs. 1.06). Among the 31 cloud events in HM, only 10 were clustered in "HM\_01–11".

The "marine" category is barely larger in percentage than the former one (marine\_01–11), with lower Na<sup>+</sup>, Cl<sup>−</sup>, and SO<sub>4</sub><sup>2−</sup> concentrations ([Na<sup>+</sup>]<sub>Marine</sub> = 23.3 µM vs. [Na<sup>+</sup>]<sub>Marine\_01–11</sub> = 32 µM, [Cl<sup>−</sup>]<sub>Marine</sub> = 20.5 µM vs. [Cl<sup>−</sup>]<sub>Marine\_01–11</sub> = 30 µM, and [SO<sub>4</sub><sup>2−</sup>]<sub>Marine</sub> = 12.8 µM vs. [SO<sub>4</sub><sup>2−</sup>]<sub>Marine\_01–11</sub> = 28 µM). The NH<sub>4</sub><sup>+</sup> and NO<sub>3</sub><sup>−</sup> concentrations are equivalent. Among the 113 cloud events in "marine", 111 are clustered in "marine\_01–11".

Conversely, the "continental" category is slightly smaller in percentage than the former one (continental\_01–11), 26% of the samples vs. 34%, and its mean SO<sub>4</sub><sup>2−</sup> concentration decreases ([SO<sub>4</sub><sup>2−</sup>]<sub>Continental</sub> = 46.6 µM vs. [SO<sub>4</sub><sup>2−</sup>]<sub>Continental\_01–11</sub> = 94 µM). The other concentrations remain almost unchanged. Among the 35 cloud events in "continental", 35 were clustered in "continental\_01–11".

In both studies, 4% of cloud samples are in the "polluted" category, but the mean ion concentrations are markedly lower in the present study. Among the nine cloud events in "polluted", eight are clustered in "polluted\_01–11".
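The per-category counts quoted in the preceding paragraphs can be cross-checked against the overall agreement figure of 164 events (78.8%). The short sketch below simply adds up the events that kept a similarly named category; the dictionary layout is an illustrative choice, and the numbers are those given in the text.

```python
# For each present category: (events quoted for the category,
# events already clustered in the similarly named 2001-2011 category).
overlap = {
    "highly marine": (31, 10),
    "marine": (113, 111),
    "continental": (35, 35),
    "polluted": (9, 8),
}
total_events = 208  # cloud events used in the AHC

# Only the second entry of each pair enters the agreement figure.
same_name = sum(kept for _, kept in overlap.values())
agreement_pct = 100 * same_name / total_events
```

With these counts, `same_name` is 164 and `agreement_pct` rounds to 78.8, matching the figure reported from Table S1.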

In summary, the "marine" category slightly increases in percentage, while the mean ion concentrations of the "continental" and "polluted" categories decrease, in particular for the anthropogenic ions. "Highly marine" is the category that has expanded the most. This trend is not fully explained by the minor statistical processing adjustments (see Section 2.4). We compared (not shown) the two methods on the first 2001–2011 dataset, without observing any significant difference.

We performed a Mann–Whitney test (Figure 5) on the clouds sampled from 2001 to 2011 (period covered by our previous work [46]) and on those sampled since then. It appears that NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup> concentrations are significantly lower in the second period (2012–2018), i.e., [NH<sub>4</sub><sup>+</sup>]<sub>01–11</sub> = 96 µM vs. [NH<sub>4</sub><sup>+</sup>]<sub>12–18</sub> = 78 µM, [NO<sub>3</sub><sup>−</sup>]<sub>01–11</sub> = 76 µM vs. [NO<sub>3</sub><sup>−</sup>]<sub>12–18</sub> = 44 µM, and [SO<sub>4</sub><sup>2−</sup>]<sub>01–11</sub> = 31 µM vs. [SO<sub>4</sub><sup>2−</sup>]<sub>12–18</sub> = 27 µM (*p*-values are 0.027, 0.0009, and 0.009, respectively). The concentration of sea salts does not evolve significantly, but the changes in the anthropogenic classes (mentioned above) drive the changes observed in the "marine" and "highly marine" categories. Category terminology will receive additional justifications in Section 3.2.
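For readers unfamiliar with the Mann–Whitney test used above, the following sketch shows how the U statistic and a two-sided *p*-value can be obtained from pooled ranks. It relies on the normal approximation without tie or continuity correction, so it is a simplified illustration and not the exact procedure of the statistical software used in the study; the function name and the sample values are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(sample_a, sample_b):
    """Two-sided Mann-Whitney U test with the normal approximation.

    No tie or continuity correction is applied (simplifying assumption).
    Returns the U statistic of sample_a and the approximate p-value.
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = [(v, 0) for v in sample_a] + [(v, 1) for v in sample_b]
    pooled.sort(key=lambda pair: pair[0])

    # Sum the (tie-averaged) ranks of the values belonging to sample_a.
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        n_from_a = sum(1 for k in range(i, j + 1) if pooled[k][1] == 0)
        rank_sum_a += mean_rank * n_from_a
        i = j + 1

    n_a, n_b = len(sample_a), len(sample_b)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal
    return u_a, p_value

# Hypothetical concentrations (µM) for two periods, for illustration only.
period_1 = [96.0, 120.0, 88.0, 140.0, 101.0]
period_2 = [60.0, 78.0, 55.0, 82.0, 70.0]
u_stat, p_val = mann_whitney_u(period_1, period_2)
```

A *p*-value below the chosen significance level (alpha = 0.05 in the text) indicates that the two periods are unlikely to share the same distribution.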

**Figure 5.** Mann–Whitney nonparametric tests on 154 clouds sampled from 2001 to 2011, the period covered by the previous study [46], and 88 clouds sampled from 2012 to 2018. We compare, for both periods, NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup> concentrations. The *p*-values of all pairwise comparisons are significant at level alpha = 0.05.

### *3.2. Influence of Air Mass History at PUY*

This section is devoted to the correlation between the concentration of the inorganic ions and the air mass history. During their atmospheric transport, air masses receive chemical species in various forms (gases and particles) from various sources, depending strongly on the altitude of the air masses. During transport, chemicals can also undergo multiphase chemical transformations, as well as dry or wet deposition. The objective here is to evaluate the effect of the history of air masses on the chemical composition of clouds. To this end, PLS regressions are performed and the results are validated with nonparametric tests (Kruskal–Wallis and Mann–Whitney tests).

As described in Section 2.4, the CAT model provides two matrices. The "zone matrix" contains information about the time spent by air masses over "continental surface" or "sea surface", in the atmospheric boundary layer (<ABLH) or in the free troposphere (>ABLH). The "sector matrix" contains information about the time spent by air masses in the eight 45° sectors (NNE, ENE, ESE, SSE, SSW, WSW, WNW, and NNW; see Figure S1c). Figure 6a represents the distribution of these parameters for all the cloud events. Despite the distance from the coast (300 km), the strong maritime influence at PUY is obvious (Figure 6a). Over a 72-hour backward trajectory, an air mass spends, on average, almost two days over the "sea surface". Consistently, PUY is characterized by prevailing strong west and north winds (WSW, WNW, NNW, and NNE), with average percentages of time spent over these four main sectors of 23, 44, 14, and 12%, respectively (Figure 6b).
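To make the notion of a "sector matrix" more concrete, the sketch below assigns back-trajectory points to the eight 45° sectors and converts the counts into percentages of time. It is not the CAT model: it assumes that sectors are bounded at multiples of 45° clockwise from north (NNE covering 0–45°), that each point is described by its bearing from the station, and that points are equally spaced in time. All names and values are hypothetical.

```python
# Sector names in clockwise order, starting at north (assumed convention).
SECTORS = ["NNE", "ENE", "ESE", "SSE", "SSW", "WSW", "WNW", "NNW"]

def sector_of(bearing_deg):
    """Map a bearing in degrees (clockwise from north) to a 45° sector."""
    # The final modulo guards against a float result of exactly 360.0.
    return SECTORS[int((bearing_deg % 360) // 45) % 8]

def time_share_by_sector(bearings):
    """Percentage of equally spaced trajectory points falling in each sector."""
    counts = {name: 0 for name in SECTORS}
    for bearing in bearings:
        counts[sector_of(bearing)] += 1
    return {name: 100 * n / len(bearings) for name, n in counts.items()}

# Hypothetical bearings (degrees) of four points along one back trajectory.
shares = time_share_by_sector([250.0, 260.0, 280.0, 300.0])
```

One row of such percentages per cloud event would give a table with the same shape as the sector matrix described above; the zone matrix could be built in the same way from a surface type and an altitude flag per point.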

**Figure 6.** (**a**) Natural logarithm of the number of CAT (computing advection-interpolation of atmospheric parameters and trajectory tool) back trajectories points arriving at the summit of the PUY station per square of 0.8° size for "highly marine" (top left), "marine" (bottom left), "polluted" (top right), and "continental" (bottom right) categories. The black lines separate the different sectors. Distributions of the parameters evaluated by the CAT model for all the 295 cloud events; (**b**) Percentage of the time spent over the "sea surface" (blue) and "continental surface" (brown), in pale blue and pale brown below the atmospheric boundary layer height (<ABLH) and dark above in the free troposphere (>ABLH); (**c**) Percentage of the time spent in the 8 forty-five degrees sectors (NNE, ENE, ESE, SSE, SSW, WSW, WNW and NNW). One box plot per zone/sector is displayed. The black crosses correspond to the means. The central horizontal bars are the medians. The lower and upper limits of the box are the first and third quartiles, respectively. The ends of whiskers are 10th and 90th percentiles. Black diamonds are minimum and maximum for each species.

To perform the PLS analysis (Figure 7), the matrix of the explanatory variables (the "Xs") is composed of the "sector matrix" and the "zone matrix". The matrix of the dependent variables (the "Ys") is the chemical matrix. As explained in Section 2.3, we restricted our statistical analysis to the concentration of six chemical compounds to avoid excessive loss of information and overfitting in the statistical analyses.


**Figure 7.** Partial least squares (PLS) chart with t component on axes t1 and t2. The correlations map superimposes the "Xs", the "Ys" and the cloud events. The dependent variables from the chemical matrix are symbolized by a black "Y"; the explanatory variables from the "sector matrix" by a black "X"; and from the "zone matrix" by a blue, brown, light or dark "X". The 208 cloud events are gathered by AHC category (red circle, "marine"; dark blue diamond, "highly marine"; yellow square, "continental"; and red triangle, "polluted").

The index of the predictive quality of the models is quite low (Q<sup>2</sup> = 0.1; ideally, it should be close to 1), suggesting weak correlations. It is well known that cloud composition depends on many parameters other than the chosen explanatory variables, which relate to the air mass history calculated by the model. Indeed, cloud chemical composition depends foremost on local microphysics [17,37,73], proximity to sources [33,48,74,75], biological activity [4,5,61], seasonal cycles [30,76–78], and diurnal cycles [79].
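To clarify what a low Q<sup>2</sup> means, the sketch below uses one widely used definition, Q<sup>2</sup> = 1 − PRESS/TSS, where PRESS is the sum of squared cross-validated prediction errors and TSS is the total sum of squares around the mean. This is an assumption for illustration: the software used in the study may compute a cumulative variant over the PLS components. The function name and the sample values are hypothetical.

```python
def q_squared(observed, cv_predicted):
    """Q2 = 1 - PRESS / TSS from observations and cross-validated predictions."""
    mean_obs = sum(observed) / len(observed)
    # PRESS: squared errors of predictions made without the predicted sample.
    press = sum((o - p) ** 2 for o, p in zip(observed, cv_predicted))
    # TSS: squared deviations of the observations from their mean.
    tss = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - press / tss

# Hypothetical observations, with three sets of cross-validated predictions.
observed = [1.0, 2.0, 3.0, 4.0]
q2_perfect = q_squared(observed, [1.0, 2.0, 3.0, 4.0])   # exact predictions
q2_mean = q_squared(observed, [2.5, 2.5, 2.5, 2.5])      # always the mean
q2_partial = q_squared(observed, [1.5, 2.5, 2.5, 3.5])   # imperfect model
```

Under this definition, a value near 0 means that the model predicts held-out samples hardly better than the overall mean, which is consistent with the interpretation of weak correlations given above.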

Figure 7 reveals several intricate relationships between the chemical parameters and the air mass history. First, some zone variables are weakly correlated with some sector variables (cf. PLS correlation matrix in Table S5): "sea surface" (>ABLH) with WNW, "sea surface" (<ABLH) with WSW, "continental surface" (>ABLH) with ENE (too few observations to be interpretable on the graph) and, more robustly, "continental surface" (>ABLH) with ENE (R = 0.7).

The "polluted" category in red (Figure 7) and, to a lesser extent, the "continental" category in yellow are on the left of the display, toward the NNE sector and the "continental surface" (>ABLH) zone. The "highly marine" category in dark blue and, to a lesser extent, the "marine" category in light blue are drawn toward the WNW/WSW sectors and the "sea surface" (>ABLH) zone.

We performed an AHC on the "sector matrix" and obtained three clusters. Then, we reran the previous PLS. The simplified correlation matrix (Table 1) highlights the links between the "sea surface" zones and the west sectors, and between the "continental surface" zones and the northeast sectors. We do not keep this clustering in the main PLS to avoid a loss of information.

**Table 1.** PLS correlation matrix between clustered sector variables and zone variables. Highest correlation displayed in red and highest anti-correlation in blue.


The simplified correlation matrix (Table 2) displays weak correlations. However, the link between sea salts (Cl<sup>−</sup>, Mg<sup>2+</sup>, and Na<sup>+</sup>) and both the "sea surface" (>ABLH) zone and the WSW/WNW clustered sectors is noticeable. The same applies to ions of potentially anthropogenic origin (NH<sub>4</sub><sup>+</sup>, NO<sub>3</sub><sup>−</sup>, and SO<sub>4</sub><sup>2−</sup>) and both the "continental surface" (>ABLH) zone and the NNW/NNE/ENE sector. For both marine and continental ions, the correlations are higher above the atmospheric boundary layer height (>ABLH), confirming that PUY is influenced by long-range transport [42,43].


**Table 2.** PLS correlation matrix between chemical variables and both zone and clustered sector variables. Highest correlation displayed in dark red and highest anti-correlation in dark blue.

To statistically validate these observations, we performed the Kruskal–Wallis test and compared the category distribution within each zone (Figure S1a) and each main sector (Figure S1b). As the computed *p*-values are lower than the significance level alpha = 0.05, we conclude that the main sectors (WSW, WNW, NNW, NNE, and ENE) and the zones ("sea surface" (>ABLH), "sea surface" (<ABLH), and "continental surface" (>ABLH)) differ significantly between categories, i.e., the samples do not come from the same population. Only the *p*-value of "continental surface" (>ABLH) is greater than the significance level alpha = 0.05 (*p*-value = 0.062). The difference between the categories according to the sector distribution can also be observed on the map (Figure S1c) provided by the CAT model. The history of air masses therefore significantly influences the chemical composition of clouds.
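As an illustration of the Kruskal–Wallis comparison described above, the sketch below computes the H statistic from pooled ranks and compares it with the chi-square critical value for four groups (three degrees of freedom, alpha = 0.05). It omits the tie correction and does not compute an exact *p*-value, so it is a simplified stand-in for the software actually used; the function name and the sample values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction) for a list of samples."""
    # Pool all values, remembering the index of the group each came from.
    pooled = sorted((value, g) for g, sample in enumerate(groups) for value in sample)
    n_total = len(pooled)

    # Accumulate tie-averaged ranks per group.
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += mean_rank
        i = j + 1

    weighted = sum(r ** 2 / len(sample) for r, sample in zip(rank_sums, groups))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

# Chi-square critical value for alpha = 0.05 and 3 degrees of freedom
# (four categories compared).
CHI2_CRIT_DF3 = 7.815

# Hypothetical percentages of time spent in one zone, one list per category.
zone_time = [
    [62.0, 70.0, 66.0],   # "highly marine"
    [55.0, 58.0, 51.0],   # "marine"
    [35.0, 41.0, 38.0],   # "continental"
    [20.0, 28.0, 25.0],   # "polluted"
]
h_stat = kruskal_wallis_h(zone_time)
differs = h_stat > CHI2_CRIT_DF3
```

When H exceeds the critical value, the hypothesis that all categories share the same distribution for that zone or sector is rejected at the 5% level.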
