Given the vertical profiles of meteorological parameters, we investigated their relationship with the BSPM. Neural networks, in turn, allow us to reconstruct almost any dependence given a sufficient amount of data. As stated above, there were 312 cases of HLC observations, which is rather small for such a task. This imposes a restriction on the maximum size of the neural network: the number of trainable parameters should not exceed the amount of data.
The first step is motivated by the fact that using the values from the vertical profiles as is would significantly increase the number of neural network parameters.
In addition, we have significantly expanded the database by conducting many atmospheric sensing experiments in recent years. New algorithms have been created to process these data and compare them with the meteorological situation.
3.1. Preliminary Analysis
As a first step in the analysis, the change in BSPM elements with altitude within the HLC thickness was investigated.
Figure 2 shows an example of one BSPM element measurement made on 19 May 2016, starting at 14:07. The m11 element is always equal to one because all BSPM elements are normalized to it. The remaining plots show the behavior of the values of all BSPM elements within the HLC altitude range. The element values in both channels of the lidar receiving system are consistent with each other and fluctuate around a constant value with some added noise. Similar behavior is present in the other measurements. Thus, no altitude dependence within the HLC is observed, and for one observation it is sufficient to take the median value of the BSPM element from the channel with the best signal-to-noise ratio; the median is more resistant to the presence of outliers.
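The channel selection and median reduction described above can be sketched as follows. This is an illustrative example with synthetic profiles; the variable names and the mean-over-std SNR criterion are assumptions, not the authors' exact implementation.

```python
import numpy as np

# Hypothetical example: reducing one BSPM element to a single value per
# observation. profile_ch1/profile_ch2 stand in for the element's values
# over the HLC altitude range in the two receiving channels.
rng = np.random.default_rng(0)
true_value = 0.85
profile_ch1 = true_value + rng.normal(0.0, 0.02, size=40)   # better SNR
profile_ch2 = true_value + rng.normal(0.0, 0.10, size=40)   # noisier channel

def reduce_element(profiles):
    """Pick the channel with the best signal-to-noise ratio (assumed here
    as mean/std) and take the median, which is robust to outliers."""
    best = max(profiles, key=lambda p: np.abs(p.mean()) / p.std())
    return np.median(best)

value = reduce_element([profile_ch1, profile_ch2])
```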
The next step was to study the distributions of values of the BSPM elements as functions of meteorological parameters.
Figure 3 presents histograms of the median values of the BSPM elements in the range of HLC altitudes. Since the measurements were performed under different meteorological conditions and for different altitudes of HLC formation, we can expect histograms "smeared" over some interval, since their construction did not take into account dependencies on meteorological parameters.
Figure 3 shows that this is fulfilled only for the following BSPM elements: m22, m33, and m44. For the remaining elements, no variability is observed. For elements m12, m21, m13, m31, m14, and m41, the distribution resembles a normal distribution with a mean of 0 and a small variance at the noise level. The background lidar signal is used as the noise estimate; it is calculated as the average lidar signal over the upper 3 km of the lidar operating altitude range (12–15 km) for each receiving channel. For elements m24, m42, m34, and m43, the similarity to a normal distribution is no longer observed, but the mean also lies near zero. From this, we conclude that the BSPM elements most sensitive to environmental conditions are m22, m33, and m44.
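The background-noise estimate used above can be illustrated with a short sketch. The altitude grid spacing and the toy signal model are assumptions; only the averaging over the upper 3 km (12–15 km) follows the text.

```python
import numpy as np

# Sketch of the background-noise estimate: the noise level for each receiving
# channel is taken as the mean lidar signal over the upper 3 km (12-15 km) of
# the operating altitude range. The 50 m grid step and the signal shape are
# illustrative assumptions.
altitudes = np.linspace(0.0, 15.0, 301)            # km
signal = np.exp(-altitudes / 4.0) + 0.01           # toy signal + background

def background_noise(alt, sig, top_from=12.0, top_to=15.0):
    mask = (alt >= top_from) & (alt <= top_to)
    return sig[mask].mean()

noise = background_noise(altitudes, signal)
```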
The last step is to verify that there is some dependence between the meteorological parameters and the values of the HLC BSPM elements. To investigate this question, instead of the altitude profile, the value of the meteorological parameter at the central altitude of the HLC was taken.
Figure 4, Figure 5 and Figure 6 show the scatter plots for the m22, m33, and m44 BSPM elements. In the pressure and temperature diagrams, the points are not stretched along a single axis, which indicates that there is some dependence between the values. This was implemented as a proof of concept to test for any dependency on the experimental environment: if we obtain a shape that differs from an ellipse/band, then some dependency is present. We obtained a non-ellipse/non-band shape for some coefficients; therefore, it makes sense to try machine learning to reconstruct this dependency. Further in the article, we analyze the altitude profiles of meteorological parameters; the central point was taken here only to simplify visualization.
Figure 7, Figure 8, Figure 9 and Figure 10 show the scatter plots for the m24, m42, m34, and m43 BSPM elements. In all figures, there is a well-defined vertical trend, which signals weak or no dependence between the values.
Thus, the following HLC BSPM elements are subject to analysis using machine learning methods: m22, m33, and m44. This is probably because the element m44 and the sum m22 + m33 are invariant with respect to rotation of the lidar basis (or of the cloud itself) about the vertical axis [23]. The remaining elements either reveal no dependence on meteorological parameters or require much more data.
3.2. Implementation of Data Dimensionality Reduction
A total of 124,512 altitude profiles of meteorological parameters with hourly resolution were obtained using the ERA5 reanalysis for the period from 2009 to 2023. Each profile corresponds to the lidar coordinates and consists of more than 30 points on a heterogeneous pressure grid. For standardization, all profiles were transformed by interpolation to a single 31-point altitude grid. Feeding all of these values to the input of a neural network would significantly increase the number of network parameters, which creates the need for data compression with maximum information preservation. The classical method of dimensionality reduction is principal component analysis (PCA). This method uses the singular value decomposition of the covariance matrix of the data; the spectrum of this decomposition identifies the components that carry the most information.
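The PCA compression of the interpolated profiles can be sketched as follows. The synthetic temperature-like profiles and the choice of three components are illustrative; the real input is the ERA5 profiles on the 31-point grid.

```python
import numpy as np
from sklearn.decomposition import PCA

# Minimal sketch of the PCA compression step: profiles on a common 31-point
# grid are compressed to a few components and reconstructed back to assess
# the information loss. The data here are synthetic.
rng = np.random.default_rng(1)
n_profiles, n_levels = 2000, 31
base = np.linspace(280.0, 220.0, n_levels)           # toy temperature profile
profiles = base + rng.normal(0.0, 2.0, (n_profiles, n_levels))

pca = PCA(n_components=3).fit(profiles)
compressed = pca.transform(profiles)                 # shape (2000, 3)
reconstructed = pca.inverse_transform(compressed)
rmse = np.sqrt(np.mean((profiles - reconstructed) ** 2))
```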
An alternative approach is the use of so-called autoencoders, which are special neural networks. Such a tool can be represented as an ordinary multilayer perceptron whose specific feature is that, during training, it is required to reproduce at the output the same values as at the input. Each layer of the neural network is then an encoder that maps the vector of input values into a new space of a different dimension. If there is a layer with a small number of neurons inside the autoencoder, it performs data compression, and the outputs of this layer can be used as new components containing the maximum information from the point of view of this autoencoder. In some cases, such neural networks achieve better data compression than PCA, since they take into account nonlinear dependencies in the input data. In addition, the choice of the activation function allows for customization of the mapping of the input data to the compressed representation. We use the hyperbolic tangent as such a function: this yields components in the range from –1 to 1. The activation function of the output layer is linear, which allows arbitrary values at the output of the neural network. We also need to take into account that the input data have different scales of variation. This complicates the use of the hyperbolic tangent: values that are too large or too small force a neuron to keep its state at –1 or 1 (a "frozen" neuron), which disturbs the training of the network. To eliminate this problem, all input data are normalized by subtracting the mean and dividing by the standard deviation, producing input values with a mean of 0 and a standard deviation of 1.
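A minimal autoencoder sketch under the assumptions stated above: inputs are standardized, the bottleneck uses tanh (components in [–1, 1]), and the output layer is linear. sklearn's MLPRegressor is used here for brevity; the original work may use a different framework, and the data are synthetic.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Toy 31-point profiles driven by two latent factors plus noise.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 31)
profiles = (np.outer(rng.normal(size=500), t)
            + np.outer(rng.normal(size=500), t ** 2)
            + rng.normal(0.0, 0.05, (500, 31)))

X = StandardScaler().fit_transform(profiles)     # zero mean, unit std
ae = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                  solver="adam", max_iter=3000, random_state=0)
ae.fit(X, X)                                     # reproduce input at output

# The bottleneck outputs serve as the compressed components:
components = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])
```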
To determine the dimensionality of the inner compressive layer of the autoencoder, as well as the constraint on the singular spectrum in PCA, we considered the standard deviation between the original altitude profiles and those reconstructed from the compressed components, determined according to the following formula:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \hat{x}_i\right)^2},$$

where $x_i$ is the value of the original profile at the $i$-th grid point, $\hat{x}_i$ is the reconstructed value, and $N$ is the number of grid points.
If increasing the number of compressed components did not significantly improve this metric, the increase in components was stopped. During training, the data were divided randomly into two samples: training and test. The test sample represents 33% of the total data.
Table 1 shows the standard deviation values for PCA and autoencoder (AE) obtained on the test sample. In most cases, AE gives better compression than PCA. The exception is absolute humidity, for which these approaches give comparable results.
The following values for the number of compressed components were found during training: temperature profile requires three components; relative humidity profile requires six components; absolute humidity profile requires three components; and wind speed profile requires five components.
3.3. Estimation of HLC Detection Altitude
One of the important characteristics of HLCs, in addition to the BSPM elements, is the altitude of their lower and upper boundaries. It is of interest to determine whether there is a relationship between meteorological parameters and HLC detection altitudes. For this purpose, it was decided to move from determining the upper and lower boundaries to determining the altitude of the cloud center and the deviation of the boundaries from this center. To normalize the scale of the center altitude variation, 8 km was subtracted and the result was divided by two (the average HLC altitude is about 6–10 km, i.e., an interval 4 km thick; we take the center of this interval and divide by half of the thickness). This allows us to further obtain a more stable process of neural network training. The normalization parameters were taken from the general distribution of altitudes. Because lidar experiments conducted since 2009, in which the HLC altitudes were also determined, are available, the set of available values increases to 779 observations.
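The reparameterization above can be sketched as a small helper. The function name is hypothetical; the constants (8 km center, divisor 2) are taken from the text.

```python
# Sketch of the HLC-altitude reparameterization: lower/upper boundaries are
# converted to a normalized center and a boundary deviation from that center.
def encode_boundaries(lower_km, upper_km):
    center = (lower_km + upper_km) / 2.0
    center_norm = (center - 8.0) / 2.0          # maps ~6-10 km into ~[-1, 1]
    deviation = (upper_km - lower_km) / 2.0     # boundary offset from center
    return center_norm, deviation

center_norm, deviation = encode_boundaries(7.0, 9.5)
```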
To determine the quality of neural network performance, we used the cross-validation approach, which is convenient to apply in conditions with a small amount of data. In this approach, the data are randomly divided into K equal parts, so-called folds. Then, the same steps are performed for each part:
The current part forms the test sample;
The remaining parts form the training sample;
The neural network is trained on the training sample, and the standard deviation is calculated on the test sample.
As a result, we obtain K different values of the standard deviation and K trained neural networks. In the case of normal training and the absence of data heterogeneity, these values will be comparable in magnitude. Otherwise, we will obtain quite different values.
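The cross-validation loop described above can be sketched on synthetic data. K = 5, the input dimension, and the toy target are illustrative assumptions; the network architecture (one hidden layer of 15 tanh neurons) follows the description later in the section.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins: 779 observations of compressed meteo components.
rng = np.random.default_rng(3)
X = rng.normal(size=(779, 17))
y = 0.5 * X[:, 0] + 0.1 * rng.normal(size=779)    # toy target

fold_rmse, models = [], []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model = MLPRegressor(hidden_layer_sizes=(15,), activation="tanh",
                         max_iter=500, random_state=0)
    model.fit(X[train_idx], y[train_idx])          # train on K-1 folds
    err = model.predict(X[test_idx]) - y[test_idx] # evaluate on held-out fold
    fold_rmse.append(np.sqrt(np.mean(err ** 2)))
    models.append(model)
# Commensurate fold errors suggest homogeneous data and stable training.
```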
In addition, the random forest (RF) method with the number of trees equal to 100 was used as a reference. This method is convenient because it is robust to different scales of input data changes and is not prone to overtraining. Thus, if the neural network obtains a result worse than the random forest method, it becomes a sign that the network architecture is chosen incorrectly. It is also a benefit that the random forest method allows us to identify those input parameters that give the greatest contribution to the determination of the output value, which can also provide additional information for analysis. Thus, it is interesting to study the behavior of the random forest method on full data and on compressed ones. At the same time, it is important to understand that, in some cases, the set of parameters obtained in this way may lead to misinterpretation of the data. This information can only be used for auxiliary purposes.
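The random-forest reference, including the auxiliary use of feature importances, can be sketched as follows. The data are synthetic, with one deliberately dominant feature; only the 100-tree setting follows the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of the reference model: 100 trees, with feature_importances_ used
# only as auxiliary information about which inputs matter most.
rng = np.random.default_rng(4)
X = rng.normal(size=(779, 10))
y = 2.0 * X[:, 3] + 0.1 * rng.normal(size=779)    # feature 3 dominates

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top_feature = int(np.argmax(rf.feature_importances_))
```

As cautioned above, importance rankings can mislead (e.g., with correlated inputs) and should be treated as auxiliary information only.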
In addition, scatter plots are a convenient tool for evaluating the relationship of one value to another.
Figure 11 shows the scatter plots for each fold using the random forest method and compression with PCA.
In the case of perfect altitude determination, we should see a straight line, since the estimate would coincide with the true value. Due to the presence of noise or weak dependence on the input data, the estimate differs from the true value. In the figure, a linear relationship between the estimate and the true value can be seen. In addition, there are outlying points that give a very large error. Presumably, this is due either to an error in the experimental determination of the HLC altitude or to the specific circumstances of its formation.
It is convenient to consider the estimation error in terms of standard deviation. The values of this deviation can be seen in
Table 2 (first column). The magnitude of the error is of the order of 1 km but with low variation, signaling the homogeneity of the data and the absence of specific outliers.
For the neural network, we have chosen a model of an ordinary multilayer perceptron with one hidden layer of 15 neurons and an activation function in the form of a hyperbolic tangent. The output neuron has a linear activation function since we are trying to solve a regression problem. This architecture corresponds to 316 training parameters, which is smaller than the training dataset. Thus, it will be more difficult for the neural network to be overtrained.
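The quoted parameter count can be verified with a short calculation. The input dimension of 19 is an assumption chosen here because it reproduces the 316 parameters stated in the text; the text itself does not give the input size explicitly.

```python
# Parameter count for a perceptron with one hidden layer (tanh) and one
# linear output neuron: weights plus biases for each layer.
def mlp_param_count(n_in, n_hidden, n_out=1):
    hidden = n_in * n_hidden + n_hidden      # hidden-layer weights + biases
    output = n_hidden * n_out + n_out        # output-layer weights + bias
    return hidden + output

params = mlp_param_count(19, 15)             # 19*15 + 15 + 15 + 1 = 316
```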
Figure 12 shows the scatter plots obtained with the neural network and data compression using PCA. A linear relationship can also be seen here. The values of the standard deviation are presented in
Table 2 (second column). The values are of the same order of magnitude as those of the random forest method but exceed them. This is most likely due to insufficient data: training a large network becomes vulnerable to overtraining, while a small network is insufficient to reconstruct the dependency. Increasing the number of neurons in the hidden layer leads to rapid overtraining of the network and an increase in the error, while decreasing it leads to an inability to reconstruct the dependency and also to an increase in the error.
Figure 13 and
Figure 14 show scatter plots using autoencoder compression. The error rates are comparable to PCA compression. The use of compressed data also allows for the estimation of observed values, and both methods give comparable results.
In all cases, there is an indication that altitude can be estimated using machine learning methods and meteorological observations. However, the magnitude of the error is too high to use these results in practical applications. This can be corrected either by adding specific data, such as the dynamics of profile changes over time, or by taking anthropogenic factors into account. It is also necessary to expand the experimental dataset, which will allow us to obtain more unambiguous results.
3.5. Evaluation of HLC BSPM Elements
In this paper, the elements m22, m33, and m44 of the HLC BSPM were evaluated. There were 312 measurements suitable for training, which significantly complicated the task due to the small size of the experimental array. A random forest method with 100 trees was used as a reference. This allowed us to estimate the potentially optimal result. As the neural network, we took a multilayer perceptron with a hidden layer of five neurons, the simplest model for small statistics. To evaluate the quality of these approaches, we again used the standard deviation and two-fold cross-validation. The following results were obtained.
Table 4 presents data for the m22 BSPM element. The worst result is observed when using the neural network method, while the random forest method yields better results.
The scatter plot in Figure 17 shows a weak relationship between the estimate of m22 and its measured value. This is most likely due to the weak relationship with the input parameters. In Figure 18, it can be seen that the neural network is almost unable to detect the relationship between the compressed meteorological parameters and the element m22.
Table 5 shows the m33 BSPM element. The best result is observed when using the random forest method. Both methods virtually failed to detect any dependencies (Figure 19 and Figure 20); i.e., this BSPM element is independent of meteorological conditions.
Table 6 shows the m44 BSPM element. A better result is observed for the random forest method. In both cases, only a small dependence on the input values is observed (Figure 21 and Figure 22).
Concluding the description of the results, it is worth recalling that we considered the elements of the HLC BSPM normalized by m11; because of this, m11 in the analyzed matrices is equal to one. The elements m22 and m44 of such BSPMs depend on meteorological parameters. Element m33 requires further study and expansion of the analyzed dataset. The enrichment of the experimental dataset is continuing [18,21]. In addition, the influence of other atmospheric parameters on the HLC BSPM elements is being investigated. These parameters will be added as input data for neural network training in the future.