1. Introduction
Any region’s weather is closely tied to the presence of clouds. Clouds play a major role in all types of precipitation. Although not all clouds produce precipitation, they are crucial for controlling the weather in some regions. Cloud types and heights differ across geographical locations, such as the Earth’s tropics and poles [1]. The sky almost always contains clouds, and they are ever-changing. Clouds serve as indicators of atmospheric conditions and are crucial for weather forecasting and warnings, as well as for regulating the Earth’s energy balance, temperature, and weather [2]. Since cloud conditions change continually, meteorologists examine the specifics of the cloud type before issuing a weather prediction.
We can forecast changes in weather by studying and categorizing clouds. Changes in cloud classification may significantly affect future measurements and forecasts. Additionally, locating and analyzing clouds can help meteorologists adjust weather forecasts, better understand the local ecology, and anticipate changes in the world’s climate [3]. Ground-based cloud observations are remote sensing images, so they carry information such as cloud parameters and spatial, temporal, and spectral resolution, and classifying this information is very important. Parameters such as cloud height, cloud cover, and cloud type need to be defined for ground-based cloud observations [4,5,6]; of these, cloud type is the first and most readily available parameter that humans can obtain.
The cloud is a product of nature, and it reflects the climate and different weather conditions on Earth. Clouds appear very diverse in different atmospheric conditions. Extracting useful information from huge amounts of image data by detecting and analyzing different entities in images is a major challenge. Today, the identification of cloud types is mainly based on the observations of experts. The results are subjective and cannot meet the actual requirements of meteorological observations. The automatic classification of cloud types has become a demanding problem that needs to be solved in this case.
Researchers are increasingly using ground-based whole-sky imagery for a variety of purposes, including solar radiation analysis, weather forecasting, and aviation. These images usually have a higher resolution than those obtained from satellites [7]. Additionally, the camera’s vertical orientation makes it simple to photograph clouds at various low altitudes, so these images complement satellite images with useful data. In most cases, cloud observations are performed manually by experts from meteorological institutes, who observe clouds to better understand atmospheric phenomena and classify them into categories according to the World Meteorological Organization standard [8]. Although the results are quite accurate, such manual observations are expensive and time-consuming. Automatic, systematic cloud classification algorithms are therefore needed to reduce costs.
In recent years, several studies have applied computer vision techniques to identify and classify clouds based on their volume, shape, thickness, height, and coverage. For example, Kliangsuwan and Heednacram [9] used a new method for feature extraction in cloud classification. The authors propose three further types of features based on the Fourier transform, namely Fast Fourier Transform Projection on the modified x-axis (k-FFTPX), half k-FFTPX, and -FFT, and use an ANN-based classification technique with a tree algorithm to extract features. Li et al. [10] proposed a new approach to cloud pattern recognition that analyzes the image as a set of patches rather than a set of pixels and classifies it with a Support Vector Machine (SVM). Zhen et al. [11] extracted spectral and texture features using tonal statistical analysis and the Gray Level Co-occurrence Matrix (GLCM), and used an SVM classifier with a Radial Basis Function (RBF) kernel to classify different clouds from sky images. Taravat et al. [12] used a neural network in conjunction with an SVM for automatic cloud classification of whole-sky ground-based color images.
Algorithms based on structural features such as cloud scale, edge sharpness, and the Fourier transform cannot effectively exploit the useful information in cloud images because cloud images, as a type of natural texture, vary widely in appearance under large variations in illumination, climate, and distortion [13]. Cheng and Yu [14] proposed a cloud classification method that handles mixed cloud types in one image by segmenting the image into blocks. In each block, statistical texture features and Local Binary Patterns (LBP) features are extracted and then classified using a Bayesian classifier. Liu et al. [15] proposed a feature extraction technique that improves on LBP, called Salient Local Binary Pattern (SaLBP), for ground-based cloud image classification. SaLBP uses the most frequently occurring patterns (salient patterns) to obtain descriptive information. Liu and Zhang [13] presented a feature extraction algorithm called Learning Group Patterns (LGP) to classify seven sky conditions; the algorithm considers the resolution of the texture by using SaLBP and LGP. Zhang et al. [16] focused both on designing appropriate feature representations and on learning distance metrics from sample pairs. The authors also propose a feature extraction technique called Transfer Deep Local Binary Patterns (TDLBP) and learn WML. Wang et al. [17] proposed a feature extraction method, called SLBP, based on the average rank of the occurrence frequencies of rotation-invariant patterns defined in the LBP of the cloud image. More recently, various deep learning-based image classification methods have been proposed and shown to be effective [18,19]. In 2020, Wang et al. [20] proposed CloudA, a convolutional neural network (CNN)-based deep learning model, as a ground-based cloud image recognition method. CloudA visualizes cloud features using TensorBoard, and these features help explain the ground-based cloud classification process.
Feature extraction therefore plays an important role and affects the results of the classifier. Many feature extraction methods have been proposed in the last decade. LBP and its variants have been proposed as effective feature extraction methods for classifying natural texture images [21]. Although extracting local features with LBP gives many positive results, the extracted features can still be noisy and redundant. Indeed, a large number of extracted features can decrease performance because of the curse of dimensionality. Feature selection, which finds a subset of relevant features, is therefore an essential step for improving the accuracy of the classifier and reducing the computational burden.
This paper presents an approach for color image classification based on LTP features and feature selection methods. Since color information is important for representing texture, we consider different color spaces for extracting local features. The rest of this paper is structured as follows. Section 2 presents feature extraction with the LBP and LTP descriptors. Section 3 introduces the histogram selection approach, which uses the Intra-Class Similarity (ICS) score to select the most important features. Section 4 presents the cloud image classification process. Then, Section 5 shows the experimental results on two benchmark datasets. Finally, Section 6 gives the conclusions and perspectives of this work.
2. Feature Extraction
Feature extraction is an important step in multimedia processing and in building any pattern classification model. How to extract features that reflect the contents of an image as fully as possible remains a challenging problem in computer vision. In other words, feature extraction is the process of obtaining the most important data from the raw data: relevant features are extracted from the objects to form feature vectors.
The most common image features include color, texture, and shape, and most feature spaces are built on these features. However, the performance of the model depends heavily on which image features are used [22]. LBP feature extraction was first proposed by Ojala et al. [23] to describe the texture of an image. It compares each neighboring pixel with the central pixel to obtain a binary pattern, generated as follows: a neighboring pixel takes the value 1 if its value is greater than or equal to that of the central pixel, and 0 otherwise. These binary values are then multiplied by their respective weights and summed to obtain the LBP value of the center pixel.
The formula for calculating LBP is as follows:
$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p,$$
where
$$s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
Here, $x_c$ and $y_c$ are the coordinates of the center pixel, $P$ is the number of neighboring pixels, $R$ is the neighborhood radius, $g_c$ is the grayscale value of the center pixel, and $g_p$ is the grayscale value of the $p$th neighboring pixel.
Figure 1 shows the process and encoding of the LBP operator for grayscale images with 3 × 3 pixels.
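The encoding described above can be sketched in a few lines of Python for the simplest setting, P = 8 neighbors at radius R = 1. This is a minimal illustration, not the implementation used in the experiments: the function names, the clockwise neighbor ordering, and the sample patch are assumptions, and the sketch uses the common convention that a neighbor equal to the center is coded as 1.

```python
from typing import List

# Offsets of the 8 neighbors for P = 8, R = 1, listed clockwise from the
# top-left corner. The starting point and direction are an assumption:
# any fixed ordering gives a valid (but differently numbered) code.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """Return the LBP code of the interior pixel at (row, col)."""
    center = image[row][col]
    code = 0
    for p, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        neighbor = image[row + dr][col + dc]
        # Binary test: 1 when the neighbor is at least as bright as the center.
        if neighbor >= center:
            code += 1 << p  # weight 2^p
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """Histogram (2^8 = 256 bins) of the LBP codes of all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Illustrative 3 x 3 grayscale patch (values are made up).
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(lbp_code(patch, 1, 1))  # 1 + 16 + 32 + 64 + 128 = 241
```

For larger radii or other values of P, the neighbors lie on a circle and are usually obtained by interpolation, which this sketch omits.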
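The LTP operator was developed from LBP and introduced by Tan and Triggs [24]. It is reported to be more discriminative than LBP and less sensitive to noise in homogeneous regions. In LTP, $s(g_p - g_c)$ is defined as follows:
$$s(g_p - g_c) = \begin{cases} 1, & g_p \ge g_c + t \\ 0, & |g_p - g_c| < t \\ -1, & g_p \le g_c - t \end{cases}$$
where $g_c$ and $g_p$ are the grayscale values of the center pixel and of a neighboring pixel, and $t$ is the user-defined threshold.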
Figure 2 shows the operation and encoding of the LTP operator for grayscale images with 3 × 3 pixels, for a fixed value of the threshold t.
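A minimal Python sketch of the ternary encoding follows, again for P = 8 and R = 1. It uses the coding scheme of Tan and Triggs, in which the ternary pattern is split into an "upper" binary pattern (neighbors coded +1) and a "lower" binary pattern (neighbors coded -1). The function name, the neighbor ordering, the sample patch, and the threshold t = 2 are illustrative assumptions, not values taken from the experiments.

```python
from typing import List, Tuple

# Same illustrative neighbor ordering as a plain LBP sketch (P = 8, R = 1).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def ltp_codes(image: List[List[int]], row: int, col: int, t: int) -> Tuple[int, int]:
    """Return the (upper, lower) binary codes of the LTP at an interior pixel."""
    center = image[row][col]
    upper = 0  # bits set where the ternary value is +1
    lower = 0  # bits set where the ternary value is -1
    for p, (dr, dc) in enumerate(NEIGHBOR_OFFSETS):
        diff = image[row + dr][col + dc] - center
        if diff >= t:
            upper += 1 << p
        elif diff <= -t:
            lower += 1 << p
        # otherwise |diff| < t: the ternary value is 0 and no bit is set
    return upper, lower


# Illustrative 3 x 3 grayscale patch and threshold (both made up).
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
print(ltp_codes(patch, 1, 1, t=2))  # (96, 12)
```

Because each pixel yields two binary codes, an image described with LTP produces two histograms where LBP produces one.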
3. Histogram Selection
A histogram describes discrete or continuous data and is one of the most common ways to represent the distribution of a variable. In other words, it summarizes numeric data by counting the data points that fall within each specified range of values (called a bin). Pertinent features can be selected in several ways, such as evaluating individual features or groups of features. Several histogram selection methods based on graph construction or similarity measurement have been introduced [25,26]. Histogram selection can be regarded as an evaluation of groups of features [27]. Histogram selection methods are usually grouped into three approaches: filter, wrapper, and embedded methods. A filter method computes a score for each histogram to measure its effectiveness and then ranks the histograms by that score. A wrapper method evaluates the histograms using a specific classification algorithm and selects those that maximize the classification rate. An embedded method combines the reduced processing time of the filter method with the high efficiency of the wrapper method.
To improve classification performance, many methods have been proposed to reduce the dimensionality of the feature matrix. One such method, proposed by Porebski et al. in 2013 [28], reduces dimensionality by selecting feature histograms: the most important histograms are selected according to the score of each histogram. This ICS-based histogram selection approach has recently been extended to the multi-color-space domain. Consider a database of $N$ color texture images, where each image $I_i$ is characterized by $\delta$ histograms. The entire dataset is represented by the matrix $H$ as follows:
$$H = \begin{pmatrix} H_1^1 & \cdots & H_1^r & \cdots & H_1^{\delta} \\ \vdots & & \vdots & & \vdots \\ H_i^1 & \cdots & H_i^r & \cdots & H_i^{\delta} \\ \vdots & & \vdots & & \vdots \\ H_N^1 & \cdots & H_N^r & \cdots & H_N^{\delta} \end{pmatrix}$$
in which $H_i^r$ is the $r$th histogram of color texture image $i$. $H_i^r$ is defined as $H_i^r = [H_i^r(1), \ldots, H_i^r(Q)]$, where $Q$ is the number of bins of the histogram.
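The ICS technique evaluates the similarity between histograms extracted from images of the same class. Let $I_j^k$ be the $k$th training image of class $j$, let class $j$ contain $N_j$ images, and let $H_{j,k}^r$ denote the $r$th histogram of $I_j^k$. The histogram intersection between two images of class $j$ is calculated as follows:
$$D_{int}\left(H_{j,k}^r, H_{j,k'}^r\right) = \sum_{q=1}^{Q} \min\left(H_{j,k}^r(q), H_{j,k'}^r(q)\right)$$
To measure the similarity within class $j$, let $SIM_j^r$ be the similarity measure, calculated as follows:
$$SIM_j^r = \frac{2}{N_j (N_j - 1)} \sum_{k=1}^{N_j - 1} \sum_{k'=k+1}^{N_j} D_{int}\left(H_{j,k}^r, H_{j,k'}^r\right)$$
Porebski et al. suggested that the higher the $SIM_j^r$ of a class, the more relevant the histogram $H^r$. Finally, the ICS score of a histogram $H^r$ is calculated by:
$$S_{ICS}^r = \frac{1}{C} \sum_{j=1}^{C} SIM_j^r$$
where $C$ is the number of classes considered. $S_{ICS}^r$ takes a value from 0 to 1. The most discriminative histogram is the one with the highest $S_{ICS}^r$ score.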
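The scoring procedure can be sketched in Python as follows. The sketch rests on two assumptions: each histogram is normalized to sum to 1 (so that the intersection, and hence the score, lies between 0 and 1), and the class similarity is the mean intersection over all pairs of images in the class. The function names and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of bin-wise minima of two histograms with the same number of bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def class_similarity(histograms: List[Sequence[float]]) -> float:
    """Mean pairwise intersection of one candidate histogram within one class."""
    pairs = list(combinations(histograms, 2))
    if not pairs:
        raise ValueError("a class needs at least two images")
    return sum(histogram_intersection(a, b) for a, b in pairs) / len(pairs)


def ics_score(histograms_by_class: Dict[str, List[Sequence[float]]]) -> float:
    """Average of the class similarities over all classes."""
    sims = [class_similarity(hists) for hists in histograms_by_class.values()]
    return sum(sims) / len(sims)


# Toy data: the same candidate histogram (2 bins) extracted from two
# images per class, for two classes "a" and "b".
consistent = {"a": [[0.5, 0.5], [0.5, 0.5]],
              "b": [[1.0, 0.0], [1.0, 0.0]]}
noisy = {"a": [[1.0, 0.0], [0.0, 1.0]],
         "b": [[0.5, 0.5], [0.0, 1.0]]}

print(ics_score(consistent))  # 1.0
print(ics_score(noisy))       # 0.25
```

In a filter-style selection, this score would be computed once per candidate histogram, and the histograms would then be ranked from highest to lowest score.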
5. Experimental Results
Previous studies focus on the color information of the RGB color space to classify cloud/sky images. However, choosing a suitable color space can improve classification performance [32,33]. This work considers 14 color spaces for extracting features: HLS, HSV, IHLS, Lab, rgb, YIQ, YUV, RGB, bwrgby, XYZ, YCbCr, Luv, I1I2I3, and ISH. Each extraction method is parameterized by the pair (R, P), giving 15 pairs of parameters for each of the feature extraction techniques LBP, LTP, and LBP+LTP. Each feature extraction technique, with a specific pair of input parameters, is applied to the 14 color spaces. Finally, histogram selection is applied to those features. In summary, the parameters used to run the experiments and produce the cloud/sky image classification results are: (1) the pair of parameters (R, P), (2) the 14 color spaces, (3) the 3 feature extraction methods (LBP, LTP, and LBP+LTP), and (4) the number of selected histograms. Moreover, the dimension of each histogram depends on the value of P: LBP features yield 3 histograms, one per color channel, and each histogram consists of $2^P$ bins. When the LTP descriptor is applied, the number of histograms is doubled compared with the LBP descriptor.
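The size of the resulting feature vector can be checked with a short calculation. The sketch below assumes that each histogram has 2^P bins (no uniform-pattern reduction), that LBP gives one histogram per color channel, and that LTP gives two; the function and dictionary names are hypothetical.

```python
# Histograms produced per color channel by each descriptor (assumed:
# one for LBP, two for LTP, i.e. its upper and lower binary patterns).
HISTOGRAMS_PER_CHANNEL = {"LBP": 1, "LTP": 2}


def feature_dimension(descriptor: str, num_neighbors: int, num_channels: int = 3) -> int:
    """Total number of bins when all histograms are concatenated."""
    bins_per_histogram = 2 ** num_neighbors
    num_histograms = HISTOGRAMS_PER_CHANNEL[descriptor] * num_channels
    return num_histograms * bins_per_histogram


print(feature_dimension("LBP", 12))  # 3 histograms x 4096 bins = 12288
print(feature_dimension("LTP", 8))   # 6 histograms x 256 bins = 1536
```

Under these assumptions, the first value matches the 12,288 features reported for 3 histograms at P = 12, which illustrates why histogram selection is useful: the dimension grows exponentially with P.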
5.1. Results on the SWIMCAT Dataset
Table 1 summarizes selected results of the experiments on the SWIMCAT dataset, in which LBP, LTP, and LBP+LTP features are extracted from the 14 color spaces. The highest value for each feature is underlined. We observe that, when classifying cloud/sky images in the SWIMCAT dataset, using color components gives better results than using grayscale images; for the RGB color space specifically, Table 1 reports the ACC obtained with LBP (1, 12), LTP (2, 12), and LBP+LTP (3, 12).
Table 2 presents the results obtained on the SWIMCAT dataset with the histogram selection method. The highest value for each feature is underlined. For color spaces with H and S color components, the LTP feature gives better results than the LBP feature. Specifically, in both the HLS and ISH color spaces, the LTP (4, 8) feature achieves a higher ACC than the LBP (4, 12) feature.
Figure 6 presents the selected results of the highest accuracy obtained from three types of features (LBP, LTP, LBP+LTP) in the two scenarios: without and with the histogram selection method.
Table 3 compares the results obtained on the SWIMCAT dataset with those of previous studies. The highest results obtained on the SWIMCAT dataset for the three types of features (Table 3) using the ICS technique are as follows:
- The LBP feature has the highest ACC of 99.2 for the YUV color space at R = 2, P = 12, with 12,288 features (3 histograms).
- The LTP feature has the highest ACC of 99.1 for the IHLS color space at R = 2, P = 12, with 12,288 features (3 histograms).
- The LBP+LTP feature has the highest ACC of 99.2 for the YUV color space at R = 2, with 12,288 features (3 histograms).
For the LBP feature, using the ICS technique makes no significant difference to the results. For the LTP and LBP+LTP features, however, the ICS technique gives better results while selecting fewer histograms than are used without selection. Moreover, on the SWIMCAT dataset, the LTP feature gives better results than the LBP feature (Table 2). The proposed approach outperforms LBP variants such as WLBP, SRBP, and SWOBP. For example, the SaLBP technique is based on uniform LBP, in which many of the commonly occurring bins have been eliminated.
5.2. Results on the Cloud-ImVN 1.0 Dataset
Table 4 shows the best results obtained with different types of features on the Cloud-ImVN 1.0 dataset. The HLS, HSV, and IHLS color spaces appear among the best results, which shows that color spaces with high dichroism or with H (Hue) and S (Saturation) components can be good candidates for cloud/sky image classification. Moreover, we observe and confirm that the RGB space is not the best color space for characterizing cloud images. LTP features achieve higher accuracy than LBP features, with a lower standard deviation: the best LBP accuracy is 85.4 ± 4.6, whereas the best LTP accuracy is 88.1 ± 2.2. Overall, the combination of LBP and LTP features achieves the best accuracy, at 92.2 ± 2.4.
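To apply a texture descriptor in a color space other than RGB, the image is first converted and then split into channel planes, and the descriptor is computed on each plane separately. The sketch below does this for HSV only, using Python's standard `colorsys` module. The function name and the rescaling of each component to the range 0 to 255 are assumptions made for illustration; the other color spaces used in this work would need their own conversion formulas.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Plane = List[List[int]]


def split_hsv_channels(image: List[List[Pixel]]) -> Tuple[Plane, Plane, Plane]:
    """Convert an 8-bit RGB image into H, S, and V planes scaled to 0..255."""
    h_plane: Plane = []
    s_plane: Plane = []
    v_plane: Plane = []
    for row in image:
        # colorsys expects and returns floats in the range [0, 1].
        hsv_row = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in row]
        h_plane.append([round(h * 255) for h, _, _ in hsv_row])
        s_plane.append([round(s * 255) for _, s, _ in hsv_row])
        v_plane.append([round(v * 255) for _, _, v in hsv_row])
    return h_plane, s_plane, v_plane


# One white pixel and one saturated blue pixel (made-up values).
tiny = [[(255, 255, 255), (0, 0, 255)]]
h, s, v = split_hsv_channels(tiny)
print(s)  # [[0, 255]]
print(v)  # [[255, 255]]
```

In this toy example the two pixels are indistinguishable in the V plane but fully separated in the S plane, which gives some intuition for why a channel such as saturation can carry information that a single RGB channel does not.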
Table 5 presents the results obtained on the Cloud-ImVN 1.0 dataset when using the histogram selection method with different parameters: color space, features used, and (R, P) values. When the ICS method is applied, the results of cloud/sky image classification change significantly: the accuracy is higher and the number of features is greatly reduced. In Table 5, the LBP (3, 12) feature in the RGB color space achieves the highest ACC of 89.3 ± 2.2 with one histogram selected, whereas, without ICS, the LBP (5, 12) feature in the RGB color space achieves the highest ACC of 81.9 ± 3.3.
Figure 7 presents the selected results of the highest accuracy obtained from three types of features (LBP, LTP, LBP+LTP) in the two scenarios: without and with the histogram selection method.
Table 6 compares the results obtained on the Cloud-ImVN 1.0 dataset with those of previous studies. The highest results obtained on this dataset for the three types of features using the ICS technique are as follows:
- The LBP feature has the highest ACC of 91.6 for the Lab and Luv color spaces, with 4096 features.
- The LTP feature has the highest ACC of 88.1 for the HSV color space, with 1536 features.
- The LBP+LTP feature has the highest ACC of 92.2 for the Lab color space, with only 144 features.
Many image processing algorithms classify images mainly in the RGB color space. However, for the classification of cloud images, the RGB color space appears to have many disadvantages and is not a good candidate. A color space with high dichroism and with high luminance components is a good candidate because the luminance in cloudy areas is higher than in other areas. As on the SWIMCAT dataset, the LTP descriptor gives better results than other LBP variants on Cloud-ImVN 1.0.
6. Conclusions
The existing cloud features are very useful in determining color space and texture features to classify cloud types. In order to be able to classify sky/clouds, it is essential to distinguish between two types of pixels (sky and clouds); a suitable color space can facilitate this classification. High-dichroism color systems are good candidates for cloud/sky image classification. In addition, color systems with high luminance components are also good candidates because the luminance in cloudy areas is higher than in cloudless areas.
This paper presents and systematically analyzes various features developed for the task of cloud/sky image classification. We found that the LBP and LTP feature extraction techniques generalize well to this task. Combining color space information with the texture structure captured by the LBP and LTP features yields higher classification accuracy. In the experiments, using the ICS technique to select the most relevant histograms further improves performance with fewer features. However, the threshold t was obtained by exhaustive search in the RGB color space, which may affect the results in other color spaces.
By exploiting different aspects of ground-based sky/cloud images, the proposed method addresses the basic problems of color space processing and feature selection. Several future directions are as follows:
- In this study, the value of the parameter t was obtained experimentally by exhaustive search in the RGB color space, so future work should study methods for optimizing t for the LTP feature.
- In the future, we will study a scoring method for individual features and combine it with the histogram selection method to select the histograms that contain the most relevant features.