Article

A Novel Robust Classification Method for Ground-Based Clouds

School of Automation and Electrical Engineering, Zhejiang University of Science and Technology, Hangzhou 310023, China
* Author to whom correspondence should be addressed.
Atmosphere 2021, 12(8), 999; https://doi.org/10.3390/atmos12080999
Submission received: 18 June 2021 / Revised: 26 July 2021 / Accepted: 30 July 2021 / Published: 3 August 2021
(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)

Abstract

Though the traditional convolutional neural network has a high recognition rate in cloud classification, its robustness is poor when the cloud images are occluded. In this paper, we propose a novel scheme for cloud classification, in which convolutional neural networks are used for feature extraction and a weighted sparse representation coding is adopted for classification. Three such algorithms are proposed. Experiments are carried out using the multimodal ground-based cloud dataset, and the results show that, in the case of occlusion, the accuracy of the proposed methods is much improved over that of the traditional convolutional neural network-based algorithms.

1. Introduction

Recently, with the expansion of big data, improvements in algorithms and the exponential growth in computing power, machine learning has become a major research focus. Machine learning, and deep learning in particular, is widely used in computer vision, speech recognition, natural language processing, data mining and meteorological information processing.
In the early days, Buch et al. extracted features from cloud images, including texture measurements, location information and pixel brightness, and then used binary decision trees to divide them into four kinds of clouds: altocumulus, cirrus, cumulus and stratiform [1]. Singh et al. proposed a procedure to test texture features for the automatic training of a cloud classifier, in which five feature extraction methods were examined, namely autocorrelation, co-occurrence matrices, edge frequency, Law's features and primitive length; such tests help in understanding the advantages and disadvantages of different feature extraction methods and classification techniques [2]. Heinle et al. used a k-nearest neighbor classifier to distinguish seven kinds of cloud images by extracting color and texture features [3]. Neto et al. classified cloud and sky pixels in the RGB color space using statistical features based on the Euclidean geometric distance [4]. Liu et al. proposed a multiple random projection algorithm to obtain textons and discriminative features [5]. However, the accuracy of these cloud classification methods based on manual feature extraction is far from satisfactory, and the features must be carefully hand-designed for each classification task.
Cloud classification plays a key role in a wide range of applications, such as solar power generation, weather forecasting [6,7], the Deep Space Climate Observatory mission [8], rainfall estimation [9] and optical remote sensing [10]. However, cloud classification is a challenging task. Different clouds exhibit different meteorological features, such as temperature, humidity, pressure, wind speed and color distribution. Classifying a cloud requires accurately estimating its characteristics, such as its shape, thickness and degree of sparseness. Traditionally, cloud classification [11,12,13] has been carried out by professional observers. Besides being labor-intensive, manual observation is prone to human error, and it becomes even harder to classify clouds with satisfactory accuracy in the presence of interferences such as sunlight and fog.
Recently, convolutional neural networks (CNNs) have been widely used in image classification and recognition applications. Such networks have several advantages. First, they do not require manual feature extraction: the features are learned automatically from many training pictures, yielding very good performance. Second, they handle large amounts of data well and retain strong discriminative power as the data grow, which makes them a good choice for dealing with the ever-changing appearance of clouds. LeCun et al. proposed the CNN model known as LeNet-5, which achieved an accuracy in handwritten digit recognition as high as 99% [14]. Krizhevsky et al. proposed a deep neural network model, denoted AlexNet, which is much larger than LeNet and replaces the sigmoid activation function with the simpler ReLU activation function; such a model makes training easier under different parameter initialization methods and reduces overfitting through dropout layers [15]. In [16], an efficient network, called Inception-v3, was proposed, but deeper networks of this kind are more difficult to train. He et al. proposed a residual learning framework, ResNet, which simplifies the training of deeper networks; it takes residual blocks as its basic building blocks and alleviates the vanishing-gradient problem [17]. Liu et al. combined multimodal information and visual features to achieve cloud classification through a support vector machine [18]; in later work, a joint fusion convolutional neural network consisting of a visual sub-network and a multimodal sub-network was proposed, in which the extracted visual and multimodal features are integrated with weights for cloud classification [19]. Although the recognition accuracy of these CNN-based cloud classification methods is very high, they share a common weakness: a lack of robustness to perturbation.
Compressed sensing (CS) [20,21] is a recently developed technology. It is based on the concept of sparse representation of signals [22]. Let $x \in \mathbb{R}^{n\times 1}$ be a signal vector. It is called $\|\alpha\|_0$-sparse in a matrix $D \in \mathbb{R}^{n\times m}$ if $x = D\alpha$, where $D$ is called the dictionary and $\|\alpha\|_0$ denotes the number of non-zero entries in the coefficient vector $\alpha \in \mathbb{R}^{m\times 1}$. Sparse representation has been used in the fields of image processing [23] and face recognition [24,25]. A weighted sparse representation was proposed in [26], and simulations show that it is more robust to occlusion, corruption and other interferences. Gan et al. [27] proposed a duplex norm-bounded sparse coding, which classifies cloud images by extracting nine-dimensional features, including five color features and four statistical features, but they did not analyze its robustness.
In this paper, we investigate the classification of clouds observed by ground-based cameras. The main contributions of this paper are as follows:
  • A novel structure for robust cloud classification is proposed, in which the features are extracted by convolution neural networks and the classification is executed using the weighted sparse representation;
  • A two-channel neural network is proposed for extracting features of ground-based clouds.
Experimental results show that our proposed method yields a satisfactory performance for classifying ground-based clouds. In particular, under light occlusion, the robustness of the proposed two-channel neural network-based classifier is better than that of the classifiers based on a single convolutional neural network.
The rest of this paper is organized as follows: Section 2 briefly introduces some related work; Section 3 describes the novel robust cloud classification structure and the specific algorithm flow; Section 4 presents the experimental analysis; and some concluding remarks are given in Section 5.

2. Related Work

In this section, we introduce some existing results that will be used in our proposed robust classification algorithm for ground-based clouds.

2.1. ResNet Model

It has been observed that, after too many layers are added to a CNN, the training error tends to rise. Though the numerical stability brought by batch normalization makes it easier to train deeper networks, this problem still exists. ResNet was proposed to address this problem. The residual block is the basic building block of ResNet. In a residual block, the input can propagate forward faster through cross-layer shortcut connections, which reduces the occurrence of vanishing gradients. Two additional major benefits of rectified linear units (ReLU) are sparsity and a reduced likelihood of vanishing gradients [17]; see Figure 1.
Table 1 shows the parameters involved in the ResNet-50 model.
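To make the cross-layer shortcut concrete, here is a minimal sketch of a bottleneck-style residual block in PyTorch. It is an illustration only: the channel sizes are placeholders and not the exact ResNet-50 configuration of Table 1.

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Minimal bottleneck residual block: y = ReLU(F(x) + x)."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The identity shortcut lets gradients bypass the convolutional body,
        # which is what mitigates the vanishing-gradient problem in deep networks.
        return self.relu(self.body(x) + x)

block = BottleneckBlock(channels=256, mid_channels=64)
y = block(torch.randn(1, 256, 56, 56))   # the output has the same shape as the input
```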

2.2. Inception Model

The efficient convolutional neural network based on the Inception model, proposed in [16], consists of a set of basic convolution blocks, i.e., the Inception blocks. The naive version of such a network is shown in Figure 2.
As shown in Figure 2, the k × k convolutions, for k = 1, 3 and 5 (k denotes the 2D convolution kernel size), are employed to extract information at different spatial scales. However, with this structure the number of outputs grows from stage to stage [28].
The Inception module with dimension reduction is shown in Figure 3, where the 1 × 1 convolutions are used to compute reductions before the expensive 3 × 3 and 5 × 5 convolutions. Besides being used for dimension reduction, they are also used for the rectified linear activation, which makes them dual-purpose.
In general, an Inception network consists of modules of the above type stacked upon each other, with occasional max-pooling layers of stride 2 to halve the resolution of the grid, thereby reducing the computational complexity and improving performance [28].
Table 2 shows the parameters involved in the Inception-v3 model.
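The following is a minimal PyTorch sketch of an Inception module with 1 × 1 reductions, of the kind shown in Figure 3. The branch channel counts are illustrative placeholders and not the Inception-v3 values of Table 2.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception module: parallel 1x1, 3x3, 5x5 and pooling branches, concatenated."""
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.branch1 = nn.Conv2d(in_ch, c1, kernel_size=1)
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, c3_red, kernel_size=1),   # cheap 1x1 reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(c3_red, c3, kernel_size=3, padding=1),
        )
        self.branch5 = nn.Sequential(
            nn.Conv2d(in_ch, c5_red, kernel_size=1),   # cheap 1x1 reduction
            nn.ReLU(inplace=True),
            nn.Conv2d(c5_red, c5, kernel_size=5, padding=2),
        )
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1),
        )

    def forward(self, x):
        # Concatenate the four branches along the channel dimension.
        return torch.cat([self.branch1(x), self.branch3(x),
                          self.branch5(x), self.branch_pool(x)], dim=1)

block = InceptionBlock(192, c1=64, c3_red=96, c3=128, c5_red=16, c5=32, pool_proj=32)
out = block(torch.randn(1, 192, 28, 28))   # 64 + 128 + 32 + 32 = 256 output channels
```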

2.3. Robust Sparse Coding-Based Classification

The traditional sparse coding problem is formulated as
$\tilde{\alpha} = \arg\min_{\alpha} \|y - D\alpha\|_2^2 \quad \mathrm{s.t.} \quad \|\alpha\|_1 \le \kappa,$  (1)
where $\kappa$ is the prescribed sparsity level. This coding model actually assumes that the coding residual $e = y - D\alpha$ follows a Gaussian or Laplacian distribution.
When applied to image recognition, each image is represented by a (feature) vector $x \in \mathbb{R}^{n\times 1}$ and the columns of the dictionary $D \in \mathbb{R}^{n\times m}$ are formed from the $m$ training images; it is assumed that a test image $y$ can be well represented as a linear combination of the training samples with only a few terms involved, i.e., not more than $\kappa$, as (1) suggests.
In practice, the residual $e = y - D\alpha$ may be far from Gaussian or Laplacian, especially when the images contain occlusion or damage, so the traditional sparse coding model has poor robustness. To improve the robustness, we must make good use of those pixels that are not perturbed and neglect those that are greatly distorted in the sparse coding-based classification (1). This can be done by assigning each pixel a weighting factor, leading to the following robust sparse coding problem:
$\hat{\alpha} = \arg\min_{\alpha} \|W(y - D\alpha)\|_2^2 \quad \mathrm{s.t.} \quad \|\alpha\|_1 \le \kappa,$  (2)
where the diagonal matrix $W$ is selected based on the residual vector $\hat{e}$ in such a way that if $|\hat{e}(i)| < |\hat{e}(j)|$, then $W(i,i) > W(j,j)$. Here, we choose $W$ with
$W(i,i) = \dfrac{e^{\beta(\varphi - \hat{e}^2(i))}}{1 + e^{\beta(\varphi - \hat{e}^2(i))}}, \quad \forall i,$  (3)
where the parameters $\beta$ and $\varphi$ are determined by experiments [26]. Usually, $W$ is normalized so that $\mathrm{trace}(W^2) = 1$; see the next section.
The class that image $y$ belongs to, denoted $j(y)$, is then determined by
$j(y) = \arg\min_j \|\hat{e}_j\|_2 \quad \mathrm{s.t.} \quad \hat{e}_j = W\big(y - D\hat{\alpha}(j)\big).$  (4)
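As a small numerical illustration of the weighting rule (3), the NumPy snippet below shows how a heavily perturbed entry of the residual receives a weight close to zero; the values of β and φ are arbitrary here, not the experimentally tuned ones.

```python
import numpy as np

def weight_matrix(residual, beta, phi):
    """Diagonal weights from Eq. (3): entries with large residuals get weights near zero."""
    w = 1.0 / (1.0 + np.exp(-beta * (phi - residual**2)))   # logistic function of (phi - e^2)
    return np.diag(w)

# Toy example: the third coordinate is heavily perturbed and is almost ignored.
residual = np.array([0.05, 0.10, 2.0])
W = weight_matrix(residual, beta=8.0, phi=0.5)
print(np.round(np.diag(W), 3))   # the first two weights are close to 1, the last is close to 0
```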

3. Our Proposed Methods

We propose a novel robust classification scheme for ground-based clouds. The basic idea is to use CNNs for feature extraction and sparse coding for classification.

3.1. A Two-Channel Neural Network-Based Feature Extraction

A more sophisticated approach is depicted in Figure 4, which contains two parts: the first part extracts features, converting an image into a (feature) vector z; the second part classifies the feature vector z.
As each CNN performs differently under different sets of conditions, an intuitive idea is to fuse the features obtained with more than one CNN. Here, we propose a scheme, shown in Figure 4, in which two CNNs are used, yielding feature vectors $y_i \in \mathbb{R}^{n_i\times 1}$ for $i = 1, 2$ that are simply stacked as
$y = \big[\, y_1^{\top} \;\; y_2^{\top} \,\big]^{\top} \in \mathbb{R}^{(n_1+n_2)\times 1}.$  (5)
Let $n_1 + n_2 = 2n$. The feature-selection system aims to convert the feature vector $y$ of dimension $2n$ into a vector $z$ of dimension $n$. Here, we propose such a system consisting of two projections $P_I$ and $P_E$:
$y \in \mathbb{R}^{2n\times 1} \;\mapsto\; \tilde{z} = P_I(y) \in \mathbb{R}^{1.5n\times 1}, \qquad \tilde{z} \in \mathbb{R}^{1.5n\times 1} \;\mapsto\; z = P_E(\tilde{z}) \in \mathbb{R}^{n\times 1},$  (6)
where the two projections $P_I$ and $P_E$ are determined from the training samples $Y = \big[\,Y_1 \;\cdots\; Y_k \;\cdots\; Y_K\,\big]$, with $Y_k \in \mathbb{R}^{2n\times m_k}$ the data matrix of the $k$th class, and from the class-mean matrix $V = \big[\,v_1 \;\cdots\; v_k \;\cdots\; v_K\,\big] \in \mathbb{R}^{2n\times K}$, for $k = 1, 2, \ldots, K$. A numerical sketch of the two projections is given after the list below.
  • $P_I$: For a given $Y_k \in \mathbb{R}^{2n\times m_k}$, denote the mean and variance vectors $v_k, \bar{v}_k \in \mathbb{R}^{2n\times 1}$ by
    $v_k \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k} Y_k(:,j), \qquad \bar{v}_k(i) \triangleq \frac{1}{m_k}\sum_{j=1}^{m_k}\big[Y_k(i,j) - v_k(i)\big]^2, \quad i = 1, 2, \ldots, 2n.$  (7)
    Define $V_Y \triangleq \frac{1}{K}\sum_{k=1}^{K} \bar{v}_k$, which is a vector determined by the matrix $Y$ via (7). Let $\{V_Y(j_p)\}$, with $j_p < j_{p+1}$ for $p = 1, 2, \ldots, 1.5n-1$, be the set of the $1.5n$ smallest entries of $V_Y$; then the projection $P_I$ such that $\tilde{z} = P_I(y)$ is given by
    $\tilde{z}(p) = y(j_p), \quad p \le 1.5n.$  (8)
    Thus, $Y_k \mapsto \tilde{Z}_k = P_I(Y_k)$ for $k = 1, 2, \ldots, K$, i.e., $Y \mapsto \tilde{Z} = P_I(Y)$. As understood, the projection $P_I$ intends to keep those entries of the feature vector $y$ that are tightly clustered within each of the classes.
  • $P_E$: With the projected class-mean matrix $P_I(V) \in \mathbb{R}^{1.5n\times K}$ obtained, we can compute its mean and variance vectors, denoted as $v_*, \bar{v}_* \in \mathbb{R}^{1.5n\times 1}$, using (7). Let $\{\bar{v}_*(j_p)\}$, with $j_p < j_{p+1}$ for $p = 1, 2, \ldots, n-1$, be the set of the $n$ largest entries of $\bar{v}_*$; then the projection $P_E$ such that $z = P_E(\tilde{z})$ is given by
    $z(p) = \tilde{z}(j_p), \quad p \le n.$  (9)
    Unlike $P_I$, $P_E$ aims at enhancing the discrimination between the classes by keeping those entries of the vector $\tilde{z}$ that have a large variance across the class means.
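The NumPy sketch below illustrates one possible implementation of the two projections under the definitions above. The function names, and the assumption that the per-class sample matrices are available as a list, are our own illustrative choices.

```python
import numpy as np

def fit_projections(class_samples, n):
    """class_samples: list of K arrays of shape (2n, m_k), one per class.
    Returns the index sets (idx_I, idx_E) realising P_I and P_E."""
    # P_I: keep the 1.5n coordinates with the smallest average within-class variance.
    within_var = np.mean([Yk.var(axis=1) for Yk in class_samples], axis=0)
    idx_I = np.sort(np.argsort(within_var)[: int(1.5 * n)])

    # P_E: among those, keep the n coordinates whose class means vary the most.
    class_means = np.stack([Yk.mean(axis=1)[idx_I] for Yk in class_samples], axis=1)
    between_var = class_means.var(axis=1)
    idx_E = np.sort(np.argsort(between_var)[-n:])
    return idx_I, idx_E

def project(y, idx_I, idx_E):
    """Apply z = P_E(P_I(y)) to a stacked 2n-dimensional feature vector y."""
    return y[idx_I][idx_E]
```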

3.2. Robust Sparse Coding with Extended Dictionary

As assumed before, there are K different classes of clouds to be considered. The dictionary for robust sparse coding is formed from the feature vectors of the training samples. Instead of using one sample for each class, we make use of all the $m_k$ training pictures of the $k$th class. Precisely speaking, the $k$th sub-dictionary, denoted as $D_k$, is given by
$D_k = P_E\big(P_I(Y_k)\big), \quad k = 1, 2, \ldots, K,$  (10)
and the total dictionary $D \in \mathbb{R}^{n\times m}$ with $m = \sum_{k=1}^{K} m_k$, named the extended dictionary, is formed from $\{D_k\}$:
$D = \big[\, D_1 \;\cdots\; D_k \;\cdots\; D_K \,\big].$
The optimal sparse representation-based robust classification (2) is usually attacked by solving the following problem (see [20,21]):
$\bar{\alpha} = \arg\min_{\alpha} \|\bar{W}(z - D\alpha)\|_2^2 + \lambda \|\alpha\|_1,$  (11)
where $\bar{W}$ is the weighting matrix, which is diagonal and a function of $e = z - D\alpha$.
The problem defined by (11) is an alternative version of (2) and is difficult to solve directly because $\bar{W}$ is a function of $e = z - D\alpha$. In practice, the weighting matrix $\bar{W}$ is obtained with an iterative procedure in which $\bar{W}$ is updated based on the previous estimate of $\bar{\alpha}$. Let $\bar{\alpha}^{(l)}$ be the estimate of $\bar{\alpha}$ at the $l$th iteration of Algorithm 1 below, and denote
$\hat{e}^{(l)} = z - D\bar{\alpha}^{(l)}.$  (12)
The weighting matrix $W$ is then updated with the above $\hat{e}^{(l)}$ via (3), i.e.,
$W^{(l)}(i,i) = \dfrac{e^{\beta(\varphi - |\hat{e}^{(l)}(i)|^2)}}{1 + e^{\beta(\varphi - |\hat{e}^{(l)}(i)|^2)}}, \quad \forall i,$  (13)
and hence $\bar{W}^{(l)}$ is updated with
$\bar{W}^{(l)}(i,i) = W^{(l)}(i,i) \,/\, \|W^{(l)}\|_F, \quad \forall i,$  (14)
with $\|\cdot\|_F$ denoting the Frobenius norm.
With the obtained $\bar{\alpha} = \big[\,\bar{\alpha}_1^{\top} \;\cdots\; \bar{\alpha}_k^{\top} \;\cdots\; \bar{\alpha}_K^{\top}\,\big]^{\top}$ and $\bar{W}$, compute
$\delta_k = \bar{W}\,(z - D_k \bar{\alpha}_k), \quad k \le K,$  (15)
and hence
$g_k = \|\delta_k\|_2, \quad k \le K.$  (16)
The class of the test image represented by $z$ is determined by
$k(z) = \arg\min_k \{ g_k \}.$  (17)
The entire procedure of the proposed weighted sparse representation (WSR) algorithm is outlined in Algorithm 1.
Algorithm 1: Weighted Sparse Representation.
  Require: Dictionary $D \in \mathbb{R}^{n\times m}$, test sample $z \in \mathbb{R}^{n\times 1}$ and initial error
   $\hat{e}^{(0)} = z - \frac{1}{m}\sum_{j=1}^{m} D(:,j)$; set $\lambda = 0.4$ and $N_{ite}$, the number of iterations.
  Ensure: the class label $k(z)$ of the test sample.
  While $0 \le l \le N_{ite}$, do
  (1) update $\bar{W}$ via (13) and (14), yielding $\bar{W}^{(l)}$;
  (2) update $\bar{\alpha}$ by solving the following using any of the basis pursuit-based algorithms:
   $\bar{\alpha}^{(l)} = \arg\min_{\alpha} \|\bar{W}^{(l)}(z - D\alpha)\|_2^2 + \lambda \|\alpha\|_1$;
  (3) update the sparse representation error $\hat{e}^{(l)}$ via (12);
  (4) $l = l + 1$;
  End while when $l > N_{ite}$; output $\bar{\alpha}$ as well as $\bar{W}$, and execute the classification with (15)–(17).
  Return
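A compact sketch of Algorithm 1 is given below, assuming NumPy and using scikit-learn's Lasso as a stand-in for the basis pursuit solver in step (2); the scaling between λ and the Lasso regularization parameter is approximate, and β, φ are placeholders for the experimentally determined values.

```python
import numpy as np
from sklearn.linear_model import Lasso   # any basis-pursuit-style solver can be used instead

def wsr_classify(D, z, class_index, beta, phi, lam=0.4, n_iter=5):
    """Weighted sparse representation classification (sketch of Algorithm 1).
    D: (n, m) extended dictionary; z: length-n test feature; class_index: length-m labels."""
    n, m = D.shape
    e = z - D.mean(axis=1)                               # initial error e_hat(0)
    for _ in range(n_iter):
        w = 1.0 / (1.0 + np.exp(-beta * (phi - e**2)))   # Eq. (13): logistic weights
        w = w / np.linalg.norm(w)                        # Eq. (14): Frobenius normalisation
        # Step (2): weighted l1-regularised least squares, solved here with Lasso.
        solver = Lasso(alpha=lam / (2 * n), fit_intercept=False, max_iter=5000)
        solver.fit(w[:, None] * D, w * z)
        alpha = solver.coef_
        e = z - D @ alpha                                # Eq. (12): update the residual
    # Eqs. (15)-(17): per-class weighted residuals, pick the class with the smallest one.
    labels = np.unique(class_index)
    residuals = [np.linalg.norm(w * (z - D[:, class_index == k] @ alpha[class_index == k]))
                 for k in labels]
    return labels[int(np.argmin(residuals))]
```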

4. Experiments

In this section, we present some experimental results to examine our proposed approach. We first introduce the dataset and setup used in the experiments, then present the experimental results and discussions.

4.1. Dataset

The multimodal ground-based cloud dataset (MGCD) [19], collected in China, mainly contains two kinds of ground-based cloud information, i.e., the cloud images and the multimodal cloud information. The cloud images, with a size of 1024 × 1024 pixels, are captured at different times by a sky camera with a fisheye lens. The fisheye lens provides a wide-range observation of the sky conditions, with horizontal and vertical viewing angles of 180 degrees. The multimodal cloud information is collected by a weather station and includes temperature, humidity, pressure, wind speed, maximum wind speed and average wind speed. Each cloud image corresponds to a set of multimodal data. The 8000 pictures used are classified into 7 groups of clouds: cumulus, altocumulus, cirrus, clear sky, stratocumulus, cumulonimbus and mixed. In addition, it should be noted that cloud images with less than 10% cloud cover belong to the clear sky class. The detailed distribution of samples for each class is illustrated in Table 3.
Figure 5 shows three samples for each of the 7 classes of clouds.
The occluded testing samples are demonstrated in Figure 6.

4.2. Parameter Setting

Random partition ensures that the learned features will be uniformly distributed, while non-random allocation leads to uneven samples, which would seriously affect the convergence of the training phase and the generalization in the testing phase. Therefore, the dataset is randomly divided into a training set and a testing set: the former contains 2/3 of the cloud samples of each class and the latter contains the remaining 1/3.
All experiments are carried out with the same experimental setup. We use transfer learning to train the Inception-v3 and ResNet-50 convolutional neural network models. We change the size of the fully connected output layer to fit the number of cloud categories, and the input cloud samples are automatically cropped to the input layer size.
A mini-batch stochastic gradient descent method is used to adjust the parameters; we set the mini-batch size to 10, the maximum number of epochs to 6, the initial learning rate to 0.00001 and the validation frequency to 250.
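The paper does not state which framework was used for training; as an illustration, the transfer-learning setup described above could be written in PyTorch roughly as follows. The momentum value and the data-loading pipeline are assumptions; the mini-batch size, number of epochs and learning rate are the values quoted in the text.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 7   # the seven MGCD cloud classes

# Start from an ImageNet-pretrained ResNet-50 and replace the fully connected
# output layer so that its size fits the number of cloud categories.
model = models.resnet50(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Mini-batch SGD with the settings quoted above (batch size 10, 6 epochs, lr = 1e-5).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:   # images resized/cropped to the 224 x 224 input size upstream
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```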

4.3. Results

We will examine five methods for classifying ground-based clouds, two of which are existing ones:
  • ICNN—using the Inception-v3 convolutional neural network (ICNN) for feature extraction and classification;
  • RCNN—it is similar to ICNN, but the CNN used is the ResNet-50 convolutional neural network.
The other three are the methods proposed in this paper:
  • IWSRC—using the Inception-v3 convolutional neural network for feature extraction and the weighted sparse representation coding for classification;
  • RWSRC—exactly the same as IWSRC but with the Inception-v3 CNN replaced by the ResNet-50;
  • RIWSRC—this is the proposed two-channel CNN-based sparse representation coding method, depicted in Figure 4.
The training loss of the two convolutional neural networks is displayed in Figure 7.
The training accuracy of the two convolutional neural networks is shown in Figure 8.
Table 4 shows the accuracy of all the methods used in this paper without occlusion.
Table 5 shows the accuracy of each method when the occluded cloud samples are used.

4.4. Discussion

In Figure 7, the overall trend is downward, and the loss of RCNN is generally lower than that of ICNN. In Figure 8, the recognition rate of RCNN rises faster than that of ICNN.
As seen in Table 4, RCNN is better than ICNN, since ResNet-50 improves gradient propagation and hence is more effective in cloud classification. The proposed methods IWSRC, RWSRC and RIWSRC are all better than ICNN and RCNN because the noise interference can be suppressed by the weighted representation learning during classification; RIWSRC achieves the best performance thanks to the optimized feature selection process. The testing cloud images in Table 5 contain 5%, 10%, 15%, 20% and 25% occlusion, respectively.
In Table 5, the experimental results with various levels of occlusion show that the proposed IWSRC and RWSRC are better than ICNN and RCNN. The neural network is sensitive to the perturbation caused by occlusion, whereas the weighted sparse representation (WSR) adjusts the weights according to the size of the error: a larger error leads to a smaller weight, which helps to reduce the effect of the perturbation on classification. In general, the proposed RIWSRC is more robust and effective than both IWSRC and RWSRC, as the fusion of multiple features can further improve the robustness of the algorithm. The code is publicly available at https://github.com/tangming666/NRC (accessed on 18 July 2021).

5. Conclusions

A novel robust classification scheme is proposed for ground-based clouds in this paper. The basic idea is to use CNNs for feature extraction and a weighted sparse representation coding for classification. Two classification algorithms are proposed directly along this line, and a third one is based on the fusion of two CNNs. Experimental results show that the three proposed algorithms can greatly enhance the robustness. The proposed scheme can be applied to most deep learning neural networks.
As future research, we will consider adding some multimodal information to the selected neural network features to improve the robustness of the system. More applications of the proposed methods will be explored.

Author Contributions

All authors made significant contributions to the manuscript. A.Y. and M.T. conceived, designed and performed the experiments; G.L. wrote the paper; B.H. performed the experiments and analyzed the data; Z.X., B.Z. and T.C. revised the paper and provided the background knowledge of cloud classification. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The models and code used during the study are available in a repository (https://github.com/tangming666/NRC, accessed on 1 August 2021). The data are available from the corresponding author upon request ([email protected]).

Acknowledgments

This work was supported by the Key Research & Development Project of Zhejiang Province (2021C04030) and the Public Projects of Zhejiang Province (LGG20F020007 and LGG21F030004). We would like to express our gratitude to Liu Shuang for providing the MGCD dataset. Our deepest gratitude goes to the reviewers for their careful work and thoughtful suggestions that have helped improve this paper substantially.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Buch, K.A., Jr.; Sun, C.H.; Thorne, L.R. Cloud classification using whole-sky imager data. Ninth Symp. Meteorol. Obs. Instrum. 1996, 4, 353–358.
  2. Singh, M.; Glennen, M. Automated ground-based cloud recognition. Pattern Anal. Appl. 2005, 8, 258–271.
  3. Heinle, A.; Macke, A.; Srivastav, A. Automatic cloud classification of whole sky images. Atmos. Meas. Tech. 2010, 3, 557–567.
  4. Neto, S.L.M.; Wangenheim, R.V.; Pereira, R.B. The use of Euclidean geometric distance on RGB color space for the classification of sky and cloud patterns. J. Atmos. Ocean. Technol. 2010, 27, 1504–1517.
  5. Liu, S.; Wang, C.; Xiao, B.; Zhang, Z.; Shao, Y. Ground-based cloud classification using multiple random projections. In Proceedings of the 2012 International Conference on Computer Vision in Remote Sensing (CVRS), Xiamen, China, 16–18 December 2012.
  6. Papin, C.; Bouthemy, P.; Rochard, G. Unsupervised segmentation of low clouds from infrared METEOSAT images based on a contextual spatio-temporal labeling approach. IEEE Trans. Geosci. Remote Sens. 2002, 40, 104–114.
  7. Rathore, P.; Rao, A.S.; Rajasegarar, S.; Vanz, E.; Gubbi, J.; Palaniswami, M.S. Real-time urban microclimate analysis using internet of things. IEEE Internet Things J. 2017, 5, 500–511.
  8. Holdaway, D.; Yang, Y. Study of the effect of temporal sampling frequency on DSCOVR observations using the GEOS-5 nature run results (part I): Earth's radiation budget. Remote Sens. 2016, 8, 98.
  9. Mahrooghy, M.; Younan, N.H.; Anantharaj, V.G. On the use of a cluster ensemble cloud classification technique in satellite precipitation estimation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 1356–1363.
  10. Tan, K.; Zhang, Y.; Tong, X. Cloud extraction from Chinese high resolution satellite imagery by probabilistic latent semantic analysis and object-based machine learning. Remote Sens. 2016, 8, 963.
  11. Liu, L.; Sun, X.; Chen, F.; Zhao, S.; Gao, T. Cloud Classification Based on Structure Features of Infrared Images. J. Atmos. Ocean. Technol. 2011, 410–417.
  12. Yang, J.; Lu, W.; Ma, Y.; Wen, Y. An Automatic Ground-based Cloud Detection Method Based on Adaptive Threshold. J. Appl. Meteorol. Sci. 2009, 20, 713–721.
  13. Liu, S.; Wang, C.; Xiao, B.; Zhang, Z.; Cao, X. Tensor Ensemble of Ground-Based Cloud Sequences: Its Modeling, Classification, and Synthesis. IEEE Geosci. Remote Sens. Lett. 2013, 10, 1190–1194.
  14. LeCun, Y.; Bottou, L. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
  15. Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105.
  16. Szegedy, C.; Vanhoucke, V.; Ioffe, S. Rethinking the Inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  17. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
  18. Liu, S.; Li, M. Deep multimodal fusion for ground-based cloud classification in weather station networks. EURASIP J. Wirel. Commun. Netw. 2018, 2018, 48.
  19. Liu, S.; Li, M.; Zhang, Z.; Xiao, B.; Cao, X. Multimodal ground-based cloud classification using joint fusion convolutional neural network. Remote Sens. 2018, 10, 822.
  20. Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory 2006, 52, 1289–1306.
  21. Kutyniok, G. Theory and applications of compressed sensing. GAMM-Mitteilungen 2013, 36, 79–101.
  22. Elad, M.; Aharon, M. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process. 2006, 15, 3736–3745.
  23. Mairal, J.; Elad, M.; Sapiro, G. Sparse representation for color image restoration. IEEE Trans. Image Process. 2007, 17, 53–69.
  24. Wright, J.; Ma, Y. Dense error correction via l1-minimization. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, 19–24 April 2009.
  25. Yang, M.; Zhang, L. Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary. In Proceedings of the 11th European Conference on Computer Vision, Crete, Greece, 5–11 September 2010; pp. 448–461.
  26. Yang, M.; Zhang, L.; Yang, J. Robust sparse coding for face recognition. In Proceedings of the CVPR 2011, Colorado Springs, CO, USA, 20–25 June 2011; pp. 625–632.
  27. Gan, J.; Lu, W.; Li, Q.; Zhang, Z.; Yang, J.; Ma, Y.; Yao, W. Cloud type classification of total-sky images using duplex norm-bounded sparse coding. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 3360–3372.
  28. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
Figure 1. Flowchart of a residual block in the ResNet, where x is the input data vector and y is the output vector, while ReLU is rectified linear units: ReLU(z) = max(0, z).
Figure 2. Block-diagram of the Inception CNN, a naive version.
Figure 3. Block-diagram of the Inception module with dimension reduction.
Figure 4. The structure of our proposed method. In ResNet-50, the residual block uses a structure similar to Figure 1. In the Inception-v3 net, Inception uses the structure of Figure 3.
Figure 5. Three samples for each of the seven classes of clouds.
Figure 6. The cloud samples with various percentage occlusion on MGCD.
Figure 7. Training loss of the two convolutional neural networks. An epoch means one complete pass of the training dataset through the algorithm.
Figure 8. Training accuracy of the two convolutional neural networks.
Table 1. Architecture of the ResNet-50; c is the number of channels, m × n is the dimension of the data, and k × k is the convolution kernel size.

Type                  Patch Size (k × k × c)/Stride (s)   Input Size (m × n × c)
conv                  7 × 7 × 64 / 2                      224 × 224 × 3
max-pooling           3 × 3 × 64 / 2                      112 × 112 × 64
3 × residual block                                        56 × 56 × 64
4 × residual block                                        28 × 28 × 256
6 × residual block                                        14 × 14 × 512
3 × residual block                                        7 × 7 × 1024
average pooling       7 × 7 × 2048                        7 × 7 × 2048
softmax                                                   1 × 1 × 2048
Table 2. Architecture of the Inception-v3; c is the number of channels, m × n is the dimension of the data, and k × k is the convolution kernel size.

Type                  Patch Size (k × k × c)/Stride (s)   Input Size (m × n × c)
conv                  3 × 3 × 32 / 2                      299 × 299 × 3
conv                  3 × 3 × 32 / 1                      149 × 149 × 32
conv padded           3 × 3 × 64 / 1                      147 × 147 × 32
pool                  3 × 3 × 64 / 2                      147 × 147 × 64
conv                  3 × 3 × 80 / 1                      73 × 73 × 64
conv                  3 × 3 × 192 / 2                     71 × 71 × 80
conv                  3 × 3 × 288 / 1                     35 × 35 × 192
3 × Inception                                             35 × 35 × 288
5 × Inception                                             17 × 17 × 768
2 × Inception                                             8 × 8 × 1280
average pooling       8 × 8 × 2048                        8 × 8 × 2048
softmax                                                   1 × 1 × 2048
Table 3. Distribution of the 8000 samples from the MGCD.

Label   Cloud Class      Number of Samples
1       Cumulus          1438
2       Altocumulus      731
3       Cirrus           1323
4       Clear sky        1338
5       Stratocumulus    963
6       Cumulonimbus     1187
7       Mixed            1020
Table 4. The accuracy (%) of all the methods used in this paper without occlusion.

Method    Accuracy
ICNN      96.97
RCNN      97.09
IWSRC     99.56
RWSRC     99.28
RIWSRC    99.81
Table 5. The accuracy (%) of all the methods used in this paper with occlusion.

Method    Occ. 5%   Occ. 10%   Occ. 15%   Occ. 20%   Occ. 25%
ICNN      84.03     83.18      82.24      79.43      77.52
RCNN      90.47     89.68      83.99      78.77      75.15
IWSRC     98.62     97.2       93.94      87.21      79.4
RWSRC     99.03     97.97      97.62      96.56      93.28
RIWSRC    99.37     98.87      98.06      95.53      90.28
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Yu, A.; Tang, M.; Li, G.; Hou, B.; Xuan, Z.; Zhu, B.; Chen, T. A Novel Robust Classification Method for Ground-Based Clouds. Atmosphere 2021, 12, 999. https://doi.org/10.3390/atmos12080999

