2.2.2. Radiation Correction and Spectral Reflectivity
According to radiative transfer theory, the radiance of the target detected by the hyperspectral imager mounted on the drone has two parts: radiation leaving the water and diffuse reflection from the sky. The sky diffuse reflection carries no water surface information and needs to be removed. According to [3,5,34], the radiance detected by the spectrometer can be expressed as follows:

$L_{sw} = L_w + r \cdot L_{sky}$    (1)
where $L_w$ is the water-leaving radiance; $L_{sky}$ is the diffuse reflection of the sky without any water information; $r$ is the reflectivity of the air–water boundary for sky light, which is influenced by factors such as solar position, observation geometry, and wind speed. The value of $r$ can be set in the range of 0.022–0.028. In a breezy or windless environment, the water surface is calm and $r$ is set to 0.022. When the wind speed is about 5 m/s, $r$ can be set to 0.025, and when the wind speed is 10 m/s, $r$ is set to 0.026–0.028.
In order to calculate the reflectivity of the water surface, the total incident irradiance $E_d(0^+)$ needs to be estimated. According to [3,35], a standard gray board, whose reflectivity is about 10% to 30%, can be placed on the ground beside the water. The total incident irradiance can then be estimated by Equation (2):

$E_d(0^+) = \pi \cdot L_p / \rho_p$    (2)
where $L_p$ is the digital value of the radiance measured over the gray board and $\rho_p$ is the reflectivity of the standard gray board. Finally, the water-leaving reflectivity $R_{rs}$ can be calculated by Equation (3):

$R_{rs} = (L_{sw} - r \cdot L_{sky}) / E_d(0^+)$    (3)
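As a minimal sketch of this correction chain, Equations (1)–(3) can be written in code as follows (the function names and the per-band example values are hypothetical; the wind-speed cutoffs follow the ranges given above):

```python
import numpy as np

def surface_reflectance_factor(wind_speed):
    """Pick the air-water reflectance factor r from the wind-speed ranges above."""
    if wind_speed < 2.0:       # calm or windless
        return 0.022
    elif wind_speed <= 5.0:    # around 5 m/s
        return 0.025
    else:                      # around 10 m/s
        return 0.027           # midpoint of the 0.026-0.028 range

def remote_sensing_reflectance(L_sw, L_sky, L_p, rho_p, wind_speed=1.0):
    """Eqs. (1)-(3): remove reflected skylight, estimate E_d from the panel, divide."""
    r = surface_reflectance_factor(wind_speed)
    L_w = L_sw - r * L_sky               # Eq. (1) rearranged: water-leaving radiance
    E_d = np.pi * L_p / rho_p            # Eq. (2): total incident irradiance
    return L_w / E_d                     # Eq. (3): water-leaving reflectivity

# Synthetic per-band radiance values, for illustration only
L_sw = np.array([0.12, 0.15, 0.10])
L_sky = np.array([0.30, 0.28, 0.25])
L_p = np.array([0.80, 0.85, 0.78])
R_rs = remote_sensing_reflectance(L_sw, L_sky, L_p, rho_p=0.30)
```

In practice each array holds one value per spectral band, so the whole correction is a vectorized per-band operation.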
2.2.4. Noisy-Label Instance Selection
Given a data set $D = \{(x_i, y_i)\}_{i=1}^{n}$, for regression tasks, according to the regression definition, we can write:

$y_i = f(x_i) + \varepsilon_i$    (4)
where $f$ is a regression mapping function and $\varepsilon_i$ is the error between the predicted value and the label value. For data with noisy labels, the labels $y_i$ may not be the true values. We assume the latent true values are $y_i^*$, so the error between the latent true value $y_i^*$ and the actual label $y_i$ needs to be considered. Assuming the noise error conforms to a normal distribution, Equation (4) can be written as:

$y_i = f(x_i) + \delta_i + \varepsilon_i$    (5)
where $\delta_i$ is a mean-shift parameter representing the mean value of the difference between the label value and the latent true value, and $\varepsilon_i$ is the random error with $\varepsilon_i \sim N(0, \sigma^2)$. If $\delta_i$ is non-zero, the sample pair $(x_i, y_i)$ may be polluted by noise. Meanwhile, to guarantee the fidelity of the data, we assume that the noisy-label instances are sparse, which can be expressed as:

$\|\delta\|_0 \le k$    (6)
where $k$ is a parameter indicating the maximum number of noisy labels. The value of $k$ is usually unknown and depends on the specific dataset.
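The noise model of Equations (5) and (6) can be illustrated with a small simulation (the linear function and the noise magnitudes below are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 5                             # sample count and sparsity bound
x = rng.uniform(0.0, 1.0, n)
f = lambda v: 2.0 * v + 1.0               # hypothetical true regression function

# Sparse mean-shift noise: only k of the delta_i are non-zero (Eq. (6))
delta = np.zeros(n)
noisy_idx = rng.choice(n, size=k, replace=False)
delta[noisy_idx] = rng.normal(3.0, 0.5, size=k)

eps = rng.normal(0.0, 0.1, n)             # random error, N(0, sigma^2)
y = f(x) + delta + eps                    # Eq. (5): observed noisy labels
```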
For regression problems, the solution objective can be expressed as:

$\min_{f,\delta} \sum_{i=1}^{n} d(y_i - \delta_i, f(x_i)) + \lambda P(\delta)$    (7)

where $d(a, b)$ is a function measuring the distance between $a$ and $b$, and $P(\cdot)$ is a sparse loss function. Equation (7) indicates that the number of detected noisy samples should be as small as possible while the noisy-label offsets are eliminated.
To determine an appropriate value of $k$, we introduce the RegENN algorithm. RegENN is a noisy-label sample selection method for regression problems proposed by Kordos et al. [17]. The core idea of the RegENN algorithm is that, for regression problems, the labels of similar samples should be similar: if the labels of instances similar to the sample under test differ greatly from its own label, that sample can be considered a noisy-label instance. The RegENN algorithm is expressed in pseudocode as Algorithm 1.
Algorithm 1: RegENN: Edited Nearest Neighbor for regression using a threshold
Data: Training set; hyperparameter α, which controls how the threshold is calculated from the standard deviation; the number of neighbors k used to train the model.
Result: Selected instance set
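A sketch of RegENN in Python, following the published description (flag an instance as noisy when a model trained without it predicts a value that deviates from the instance's label by more than α times the standard deviation of its neighbors' labels); the function name and the example data are hypothetical:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor, NearestNeighbors

def reg_enn(X, y, alpha=3.0, k=5):
    """RegENN sketch: remove instance i when |y_i - y_hat_i| exceeds
    alpha * std of the labels of its k nearest neighbors."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    # k nearest neighbors of each point, excluding the point itself
    nn = NearestNeighbors(n_neighbors=k + 1, metric="manhattan").fit(X)
    _, idx = nn.kneighbors(X)
    idx = idx[:, 1:]
    keep = np.ones(n, dtype=bool)
    for i in range(n):
        mask = np.arange(n) != i                        # leave instance i out
        model = KNeighborsRegressor(n_neighbors=k, metric="manhattan")
        model.fit(X[mask], y[mask])
        y_hat = model.predict(X[i:i + 1])[0]
        theta = alpha * np.std(y[idx[i]])               # adaptive threshold
        if abs(y[i] - y_hat) > theta:
            keep[i] = False                             # flagged as noisy label
    return X[keep], y[keep]

# Example: a linear relation with one corrupted label
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(40, 1))
y = 2.0 * X.ravel()
y[0] += 100.0                                           # inject a noisy label
X_clean, y_clean = reg_enn(X, y)
```

Retraining a model for every instance is the expensive part; with a KNN base model this reduces to simple neighborhood averaging, which keeps the sketch tractable.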
In Algorithm 1, we chose the Manhattan distance as the metric for the nearest-neighbor algorithm. The Manhattan distance between two spectra $x$ and $y$ with $m$ bands can be formulated as follows:

$d(x, y) = \sum_{j=1}^{m} |x_j - y_j|$    (8)
According to the spectral analysis, we found that the reflectance spectral curves of different water bodies are similar in shape and differ mainly in amplitude, so the SAM measurement method is not suitable. Compared with SAM-based distance measurements, the Manhattan distance is more sensitive to spectral reflectance amplitude. To search for samples similar to the target sample, we use the K-nearest-neighbor method.
However, in the original algorithm, the determination of α is a problem. When the value of α is too large, fewer samples are flagged as outliers; when the value of α is too small, more samples are flagged as outliers, resulting in data distortion. Combining Equation (7) and RegENN, the water quality parameter regression problem can be written as:

$\min_{\alpha} L_{fit}(y, \hat{f}_{\alpha}(x)) + t \cdot L_{num}(N(\alpha))$    (9)

where $\hat{f}_{\alpha}$ is the model trained after removing noisy-label samples, $L_{fit}$ is the model fitting error after removing noisy-label samples, $N(\alpha)$ is the number of detected noisy samples as a function of α, $L_{num}$ is a loss function on the number of detected noisy samples, and $t$ is a weight parameter. Our objective is not only to ensure the model fit after noisy-label instance selection, but also to impose certain constraints on the number of detected samples to ensure data fidelity. The specific mathematical forms of $L_{fit}$ and $L_{num}$ are discussed in Section 3.1 below.
For the setting of the value of α, we use the grid search method. Empirical values of α given in previous studies fall within a limited range; within this range, we adopt the grid search method, combined with Equation (9), to determine the value of α.
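The grid search over α can be sketched as follows. The concrete forms of the fitting error and the penalty on the number of removed samples are placeholders here (the paper defers them to Section 3.1), so `fit_and_score` is a hypothetical user-supplied routine:

```python
import numpy as np

def select_alpha(X, y, alphas, fit_and_score, t=0.5):
    """Grid search over alpha following the objective of Equation (9):
    trade off fitting error against the fraction of removed samples."""
    best_alpha, best_obj = None, np.inf
    for a in alphas:
        fit_err, n_removed = fit_and_score(X, y, a)    # user-supplied evaluation
        objective = fit_err + t * n_removed / len(y)   # hypothetical L_num form
        if objective < best_obj:
            best_alpha, best_obj = a, objective
    return best_alpha
```

In use, `fit_and_score` would run RegENN with the candidate α, retrain the regressor on the retained samples, and return the validation error together with the number of removed instances.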
2.2.5. Water Quality Parameter Inversion
We use PLSR, RFR, KNN, AdaBoost and 1DCNN-based algorithms to retrieve three water quality parameters (chromaticity, turbidity, and COD). Since these methods are data-driven, we take all the bands as input and test the performance of the different algorithms. The PLSR, RFR, KNN and AdaBoost algorithms use functions included in the Python library scikit-learn.
At the same time, we build a 1DCNN model to further improve the fitting performance. Because the number of samples is small and the number of spectral bands is large, we introduce a self-attention module to help the model learn better features. The self-attention mechanism (SAM) was proposed in [38]. An attention mechanism enables a network to learn a weight vector indicating the importance of different features, which improves network performance. It can be divided into two categories: spatial attention [39] and channel attention [40]. For hyperspectral data, we adopt channel attention to allow the model to learn better features.
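The channel-attention idea can be sketched in NumPy as a squeeze-and-excitation style forward pass (the weight shapes and the reduction ratio are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def channel_attention(features, W1, W2):
    """Channel attention sketch for 1-D spectral features.
    features: (batch, channels, length); W1/W2: squeeze/excite weight matrices."""
    s = features.mean(axis=2)                    # squeeze: global average pooling
    h = np.maximum(0.0, s @ W1)                  # excitation bottleneck, ReLU
    w = 1.0 / (1.0 + np.exp(-(h @ W2)))          # sigmoid channel weights in (0, 1)
    return features * w[:, :, None]              # reweight each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16, 32))             # 4 samples, 16 channels, 32 bands
W1 = rng.standard_normal((16, 4))                # reduction ratio 4 (assumption)
W2 = rng.standard_normal((4, 16))
out = channel_attention(x, W1, W2)
```

In a trained network, W1 and W2 are learned, so informative channels receive weights near 1 and uninformative channels are suppressed toward 0.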
As for the loss function, we adopt the Smooth L1 loss:

$\mathrm{SmoothL1}(x) = 0.5 x^2$ if $|x| < 1$; $|x| - 0.5$ otherwise    (10)

For the regression problem, the Smooth L1 loss not only avoids the gradient explosion that the L2 loss causes for outliers with large residuals, but also avoids the drawback that the L1 loss is not smooth in the [−1, 1] interval. The 1DCNN network structure we constructed is shown in Table 2.
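The Smooth L1 loss is a direct translation of the piecewise definition above:

```python
import numpy as np

def smooth_l1(residual):
    """Smooth L1 loss: quadratic inside [-1, 1], linear (L1-like) outside."""
    r = np.abs(residual)
    return np.where(r < 1.0, 0.5 * r ** 2, r - 0.5)
```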