3.1. Deep Neural Network
In 2006, deep learning was introduced in the research of Hinton and his students, which initiated a wave of research on multilayer neural networks, and many scholars began to study the applications of DNN in various fields [32]. In short, a DNN can be understood as a neural network that contains many hidden layers, each of which contains many neurons. DNN has a network structure similar to that of the SNN widely used in atmospheric parameter retrieval, i.e., it contains an input layer, several hidden layers, and an output layer. Moreover, as with SNN, DNN can be trained with the backpropagation learning algorithm, which minimizes the squared error of the predictions by adjusting the thresholds and weights of the network so that the outputs approach the expected values. SNN and DNN are both fully connected networks, in which each neuron in a layer is connected to all neurons in the next layer. In the hidden layers, each neuron performs a nonlinear computation on its input vector, achieving a nonlinear description of the relationship between the input and output samples and giving these networks their nonlinear mapping capability [33]. Building a DNN or SNN involves the error backpropagation algorithm, loss functions, gradient descent algorithms, etc., as detailed in References [26,34,35]. In addition, the structural similarity between DNN and SNN means that they are operated in similar ways when applied to microwave remote sensing of atmospheric parameters, e.g., in the building of learning samples, the application of the error backpropagation algorithm, and the setting of the activation function. However, the larger numbers of hidden layers and of neurons per hidden layer give DNN a stronger learning ability than SNN and, therefore, superior performance in applications.
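For illustration only, a forward pass through such a fully connected network can be sketched as follows; the layer sizes and random weights here are placeholders, not the configurations used in this study:

```python
import numpy as np

def relu(x):
    # Nonlinear activation applied in each hidden layer
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Forward pass through a fully connected network.

    Every neuron in one layer feeds all neurons in the next layer;
    the hidden layers apply a nonlinear activation, and the output
    layer is linear, as is typical for regression-style retrieval.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                  # hidden layers: nonlinear
    return a @ weights[-1] + biases[-1]      # output layer: linear

# Toy dimensions: 15 inputs -> two hidden layers -> 74 outputs
rng = np.random.default_rng(0)
sizes = [15, 100, 100, 74]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

y = forward(rng.normal(size=15), weights, biases)
print(y.shape)  # (74,)
```

Training such a network adjusts `weights` and `biases` by backpropagation of the prediction error; here only the forward computation is shown.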
The purpose of a DNN, as a type of neural network, is to build a statistical model between input and output samples through supervised learning on a training dataset, and then to make predictions for new input samples from the validation dataset when they are fed into the established model. In this study, the application of DNN to the retrieval of atmospheric temperature and humidity profiles from MWHTS observations involved two main aspects: building the DNN-based observation-bias-correction model and building the DNN-based retrieval model for the temperature and humidity profiles.
For MWHTS-observation-bias correction, DNN was used to build a statistical model between the MWHTS-observed brightness temperatures and the MWHTS observation biases for predicting the observation biases, which are defined as follows:

\Delta T_B = T_B^{obs} - T_B^{sim},

where T_B^{obs} is the observed brightness temperature, and T_B^{sim} is the simulated brightness temperature. The predictions of the observation bias are obtained as follows. First, the training dataset of the DNN is established, i.e., the MWHTS-observed brightness temperatures in the analysis dataset are set as the input samples, and the observation biases in the analysis dataset are set as the output samples. Then, the four-layer network structure of the DNN (i.e., one input layer, two hidden layers, and one output layer) is built, the training dataset is used to train the DNN, and the DNN-based observation-bias-prediction model is obtained. Finally, the MWHTS-observed brightness temperatures from the testing dataset are fed into the DNN-based observation-bias-prediction model to obtain the predictions of the observation bias, and the observation biases in the testing dataset are used to verify the correction effect of the model. Further details of the training and testing of the DNN-based observation-bias-prediction model are contained in
Section 3.3 and
Section 4, respectively. Based on the predictions of the observation bias, the observation-bias-correction model yields the corrected brightness temperatures:

T_B^{cor} = T_B^{obs} - \Delta \hat{T}_B,

where \Delta \hat{T}_B is the prediction of the observation bias. A schematic of the observation-bias-correction process is displayed in
Figure 2.
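Under the setup described above (15 channels in, 15 channel biases out, two hidden layers), the bias-correction step could be sketched with scikit-learn's MLPRegressor; the data below are synthetic placeholders, and the layer widths are illustrative rather than the tuned values of this study:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Synthetic stand-ins for the analysis dataset (shapes only):
# 15 MWHTS channels in, 15 channel biases out.
rng = np.random.default_rng(42)
obs_train = rng.normal(250.0, 10.0, size=(500, 15))   # observed Tb [K]
bias_train = rng.normal(0.0, 1.0, size=(500, 15))     # obs - sim  [K]

# Two hidden layers, as in the DNN described above; the widths (100, 100)
# are placeholders, not the values determined by the extensive testing.
bias_model = MLPRegressor(hidden_layer_sizes=(100, 100),
                          activation="relu",
                          early_stopping=True,
                          validation_fraction=0.2,
                          n_iter_no_change=100,
                          max_iter=2000)
bias_model.fit(obs_train, bias_train)

# Predict biases for (synthetic) testing observations and correct them.
obs_test = rng.normal(250.0, 10.0, size=(50, 15))
corrected = obs_test - bias_model.predict(obs_test)
print(corrected.shape)  # (50, 15)
```

The subtraction in the last step corresponds to the correction equation above: corrected brightness temperatures are the observations minus the predicted biases.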
For the retrieval of atmospheric temperature and humidity profiles, either the MWHTS-observed brightness temperatures or the MWHTS-simulated brightness temperatures can be used to build the retrieval model. The DNN-based retrieval model using the observations is built as follows. First, the training dataset of the DNN is established, i.e., the MWHTS-observed brightness temperatures in the analysis dataset are set as the input samples, and the atmospheric temperature and humidity profiles in the analysis dataset are set as the output samples. Then, the four-layer network structure of the DNN (i.e., one input layer, two hidden layers, and one output layer) is built, and the DNN is trained by using the training dataset; thus, the DNN-based retrieval model using the observations is obtained. If the DNN is instead trained with the simulated brightness temperatures in the training dataset, the DNN-based retrieval model using the simulations is obtained. Finally, the observed brightness temperatures from the testing dataset are fed into the DNN-based retrieval model using the observations, or the corrected brightness temperatures from the testing dataset are fed into the DNN-based retrieval model using the simulations, to obtain the retrievals of the temperature and humidity profiles, and the temperature and humidity profiles in the testing dataset are used to verify the retrievals. Further details of the training and testing of the DNN-based retrieval models are also contained in
Section 3.3 and
Section 4, respectively. The schematics of the retrieval of the atmospheric temperature and humidity profiles using the above two DNN-based retrieval models are summarized in
Figure 3 and
Figure 4, respectively.
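A comparable sketch for the retrieval model using the simulations (15 channels in, 74 profile values out) is given below; the data are synthetic placeholders and the layer widths are illustrative:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Synthetic stand-ins: 15 simulated channels -> 74 profile values
# (temperature and humidity levels stacked), shapes only.
sim_train = rng.normal(250.0, 10.0, size=(500, 15))
profiles_train = rng.normal(0.0, 1.0, size=(500, 74))

# Retrieval model "using the simulations": trained on simulated Tb,
# later applied to the bias-corrected observed Tb.
retrieval_model = MLPRegressor(hidden_layer_sizes=(100, 100),
                               activation="relu",
                               max_iter=500)
retrieval_model.fit(sim_train, profiles_train)

# At test time, the corrected observations stand in for simulations.
corrected_obs = rng.normal(250.0, 10.0, size=(50, 15))
retrieved_profiles = retrieval_model.predict(corrected_obs)
print(retrieved_profiles.shape)  # (50, 74)
```

The retrieval model using the observations differs only in its training inputs: observed rather than simulated brightness temperatures, applied directly to the uncorrected test observations.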
3.2. The 1DVAR Algorithm
The 1DVAR algorithm is generally classified under the general term of physical retrieval: it inputs initial values of the atmospheric parameters into a radiative transfer model and adjusts those values through an iterative process, with the aim of fitting the simulations from the radiative transfer model to the observations from the satellite. The 1DVAR algorithm mainly includes two parts: the radiative transfer model for simulating the brightness temperatures, and the minimization of the cost function. Assuming that the errors in the observations are unbiased, uncorrelated, and Gaussian-distributed, the optimal estimate of the atmospheric state variable, S, can be obtained by minimizing the following cost function [36]:

\xi(S) = (S - S_a)^T B^{-1} (S - S_a) + [Y^{obs} - f(S)]^T E^{-1} [Y^{obs} - f(S)],

where E is the observation-error covariance matrix, which is the sum of the error covariance of the brightness-temperature simulations and that of the sensor noise; B is the background covariance matrix; S_a is the background state variable; Y^{obs} is the vector of observed brightness temperatures; f is the forward operator that simulates the satellite observations at the atmospheric state variable, S; and T represents the matrix transpose. By minimizing the cost function, ξ, the optimal solution is obtained iteratively as follows:

S_{n+1} = S_a + B K_n^T (K_n B K_n^T + E)^{-1} [Y^{obs} - f(S_n) + K_n (S_n - S_a)],

where K_n is the tangent linear operator of f at the point S_n, n is the iteration index, and S_1 is the initial state variable.
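As a minimal numerical sketch of the iteration above, consider a toy linear forward operator with placeholder covariances (not the MWHTS configuration); one 1DVAR update can then be written as:

```python
import numpy as np

def onedvar_step(S_n, S_a, y_obs, f, K_n, B, E):
    # One iteration of the optimal-estimation update:
    # S_{n+1} = S_a + B K^T (K B K^T + E)^{-1} [y - f(S_n) + K (S_n - S_a)]
    innovation = y_obs - f(S_n) + K_n @ (S_n - S_a)
    gain = B @ K_n.T @ np.linalg.inv(K_n @ B @ K_n.T + E)
    return S_a + gain @ innovation

rng = np.random.default_rng(1)
n_state, n_chan = 6, 4
K = rng.normal(size=(n_chan, n_state))      # tangent linear operator of f
f = lambda S: K @ S                         # toy linear forward operator
B = np.eye(n_state)                         # background covariance matrix
E = 0.01 * np.eye(n_chan)                   # observation-error covariance matrix
y_obs = f(rng.normal(size=n_state)) + rng.normal(0.0, 0.1, n_chan)
S_a = np.zeros(n_state)                     # background state variable

def cost(S):
    # The 1DVAR cost function xi(S)
    dS, dy = S - S_a, y_obs - f(S)
    return dS @ np.linalg.inv(B) @ dS + dy @ np.linalg.inv(E) @ dy

S = S_a.copy()                              # S_1: initial state variable
for _ in range(3):                          # for a linear f, one step suffices
    S = onedvar_step(S, S_a, y_obs, f, K, B, E)

print(cost(S) < cost(S_a))  # True: the update reduces the cost function
```

With a nonlinear forward operator such as a radiative transfer model, K_n must be re-evaluated at each iterate and several iterations are generally needed for convergence.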
In this study, the parameters of the 1DVAR for retrieving atmospheric temperature and humidity profiles using MWHTS observations were specifically set as follows. The averages of the temperature and humidity profiles in the analysis dataset were taken as both the background state variable,
S_a, and the initial state variable, S_1. The MWHTS observation bias was corrected by using DNN, as detailed in
Section 3.1, above. After removing the biases in the observations, the residual differences between the observations and the simulations, together with the sensitivities of MWHTS measured in flight (see Table 1), which are often considered as the instrument channel noise, were used to compute the observation-error covariance matrix, E; the atmospheric temperature and humidity profiles in the analysis dataset were used to generate the background covariance matrix, B. For details on the calculation of the observation-error covariance matrix and the background covariance matrix, see References [20,37,38]. The procedure for building the MWHTS 1DVAR retrieval system by setting the parameters of the 1DVAR is shown in
Figure 5.
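The two covariance matrices and the background state can be estimated from sample statistics; a sketch with synthetic stand-in data (shapes only, placeholder noise values) is:

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic stand-ins (shapes only): 74-level profiles, 15 channels.
profiles = rng.normal(size=(1000, 74))     # analysis-dataset profiles
residuals = rng.normal(size=(1000, 15))    # obs - sim after bias removal
noise_std = np.full(15, 0.5)               # placeholder channel sensitivities [K]

# Background covariance B from the profile ensemble (rows = samples).
B = np.cov(profiles, rowvar=False)                          # (74, 74)

# Observation-error covariance E: residual covariance plus channel noise.
E = np.cov(residuals, rowvar=False) + np.diag(noise_std**2) # (15, 15)

# Background state S_a: the mean profile of the analysis dataset.
S_a = profiles.mean(axis=0)                                 # (74,)
print(B.shape, E.shape, S_a.shape)  # (74, 74) (15, 15) (74,)
```

The cited references describe the full calculation; this sketch only illustrates the sample-covariance construction.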
3.3. Design of Retrieval Experiment
In this study, three commonly used retrieval schemes for retrieving atmospheric temperature and humidity profiles from MWHTS brightness temperatures were designed in order to study the effect of DNN on the retrieval accuracy in each scheme. To compare the performance of DNN in retrieving the temperature and humidity profiles, SNN was also applied in the three retrieval schemes. The three retrieval schemes were designed as follows.
The first retrieval scheme: the 1DVAR retrieval, in which the atmospheric temperature and humidity profiles were retrieved by the MWHTS 1DVAR retrieval system. First, according to the description in
Section 3.1, the DNN-based and SNN-based observation-bias-correction models were built and used to correct the MWHTS-observed brightness temperatures in the testing dataset, yielding the DNN-based and the SNN-based corrected brightness temperatures. Then, the parameters of the 1DVAR were set according to the description in
Section 3.2, and the MWHTS 1DVAR retrieval system was established. Finally, the DNN-based corrected brightness temperatures and the SNN-based corrected brightness temperatures were input to the MWHTS 1DVAR retrieval system, and the retrieval results of the atmospheric temperature and humidity profiles based on DNN and SNN models were obtained, respectively. The schematic of the 1DVAR retrieval scheme is summarized in
Figure 6.
The second retrieval scheme: the NN-based retrieval using the observations, in which the atmospheric temperature and humidity profiles were retrieved by the NN-based retrieval model using the observations. First, the DNN-based and SNN-based retrieval models using the observations were established, as described in
Section 3.1. Then, the MWHTS-observed brightness temperatures in the testing dataset were input to the DNN-based and the SNN-based retrieval models using the observations, and the retrieval results of the atmospheric temperature and humidity profiles based on the DNN and SNN models were obtained, respectively.
The third retrieval scheme: the NN-based retrieval using the simulations, in which the atmospheric temperature and humidity profiles were retrieved by the NN-based retrieval model using the simulations. First, the DNN-based and SNN-based retrieval models using the simulations were established, as described in
Section 3.1. Then, the DNN-based and SNN-based corrected brightness temperatures obtained in the first retrieval scheme were input to the DNN-based and SNN-based retrieval models using the simulations, respectively, and the retrieval results of the atmospheric temperature and humidity profiles based on the DNN and SNN models were obtained.
In this study, the DNN and SNN designs used in the three retrieval schemes were those that produced the best results in reproducing the observation biases and the atmospheric temperature and humidity profiles. The input layers of the DNN and SNN used in the observation-bias correction have 15 neurons, which receive the observations of the 15 channels of MWHTS, and the output layers have 15 neurons, which output the observation biases corresponding to the observations in the input layers. In the retrievals of the temperature and humidity profiles, the input layers of the DNN and SNN also have 15 neurons, which receive the simulations or the observations of MWHTS; the output layers have 74 neurons, which output the temperature and humidity profiles.
For the hidden layers of DNN and SNN applied in the three retrieval schemes, the configurations of neurons, layers, and activation function were determined by extensive testing (CPU: Intel I5, 1.8 GHz; Memory: 16 GB; GPU: NVIDIA GeForce GTX 1060 6 GB). In the extensive testing, the trained DNN and SNN for the observation-bias correction were evaluated by the Root Mean Square Error (RMSE) between the predictions of the observation bias and the observation biases in the testing dataset, and the trained DNN and SNN for the retrieval of the temperature and humidity profiles were evaluated by the RMSE between the retrievals and the temperature and humidity profiles in the testing dataset.
In the three retrieval schemes, one hidden layer for SNN and two hidden layers for DNN were found to ensure adequate training of the models in the extensive testing. The Rectified Linear Unit (ReLU) was selected as the activation function in this study because it can overcome the problems of saturation and vanishing gradients [26]. Moreover, compared with Leaky ReLU and sigmoid, DNN and SNN with ReLU obtained the highest prediction accuracies in the three retrieval schemes. SNN was trained with different numbers of neurons in the hidden layer, increasing from 5 to 50 in steps of one, while DNN was trained with numbers of neurons in the hidden layers increasing from 10 to 1000 in steps of 10, with the same number of neurons used in both hidden layers. It was found that small differences in the number of neurons in the hidden layer have a significant impact on the prediction accuracy of SNN, while the impact on DNN is small. Targeting the prediction accuracy of the neural networks, the number of hidden layers, the number of neurons in the hidden layers, and the activation function determined by the extensive testing for the three retrieval schemes are listed in Table 2 and Table 3. In addition, it is important to avoid overfitting during training. Early stopping can terminate training before overfitting occurs: the training dataset is split, and a subset (20%) is used as a validation dataset to monitor the performance of the NN during training. A maximum number of training epochs is specified, and training is terminated if the loss on the validation dataset does not improve over a given number of epochs (i.e., the patience). The maximum number of epochs and the patience for the DNN and SNN used in the three retrieval schemes were 2000 and 100, respectively.
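The hyperparameter search and early-stopping setup described above can be sketched as follows; the grid and dataset here are small placeholders (the full scan over 10 to 1000 neurons and the real training data are not reproduced), while the patience and maximum epochs match the values stated in the text:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(3)
# Synthetic stand-ins (shapes only): 15 channels in, 74 profile values out.
X_train = rng.normal(size=(300, 15)); y_train = rng.normal(size=(300, 74))
X_test = rng.normal(size=(80, 15));   y_test = rng.normal(size=(80, 74))

best_n, best_rmse = None, np.inf
for n in range(10, 41, 10):  # placeholder grid; the text scans 10..1000 in steps of 10
    model = MLPRegressor(hidden_layer_sizes=(n, n),  # same width in both hidden layers
                         activation="relu",
                         early_stopping=True,        # hold out 20% for validation
                         validation_fraction=0.2,
                         n_iter_no_change=100,       # patience
                         max_iter=2000)              # maximum number of epochs
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    if rmse < best_rmse:
        best_n, best_rmse = n, rmse

print(best_n is not None and np.isfinite(best_rmse))  # True
```

Each candidate configuration is scored by the RMSE against the testing dataset, matching the evaluation criterion described above; the configuration with the lowest RMSE is retained.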