The flowchart of the proposed approach is shown in Figure 1. The motivation of the proposed method is to exploit the complex spatial relationships among different features. First, pixel-level category confidence features are extracted by the CNN [18], and superpixels are generated by the SLIC algorithm. Region-level feature extraction is then accomplished by constructing the region adjacency graph (RAG) and applying a voting strategy. Subsequently, all features are introduced into the proposed region-level MRF, and the final classification results are obtained by minimizing the total energy function.
2.1. Superpixel Construction
Superpixel construction provides the regions to be classified in the proposed method. A superpixel is a subregion of an image composed of adjacent pixels with similar features such as color, intensity, and texture. Compared with pixel-level segmentation algorithms, superpixels retain useful spatial structure information for further image processing and generally preserve boundary information. In this paper, the simple linear iterative clustering (SLIC) algorithm [19] is used to generate superpixels. The SLIC algorithm groups spatially homogeneous pixels into the same cluster by adaptive k-means clustering. As one of the most common methods in regional segmentation, the SLIC algorithm offers good performance with a simple implementation, and it is resistant to speckle noise because it takes regional consistency into account.
In the SLIC algorithm, the value of every superpixel is obtained by calculating the mean intensity of all pixels in each region. To facilitate the subsequent operations, the region adjacency graph (RAG) is introduced to represent the correlations between regions [20]. Two regions are connected if they share a boundary. Figure 2 shows an example of an RAG for superpixels; the nodes denote the values of the superpixels, and the edges describe the pairwise spatial information.
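The RAG construction described above can be illustrated with a minimal sketch. It assumes that the superpixel label map is already available as a 2-D list and that two regions share a boundary when they contain 4-connected neighbouring pixels; the function name `build_rag` and the toy data are hypothetical.

```python
from collections import defaultdict

def build_rag(labels, image):
    """Return (mean_intensity, edges) for a superpixel label map.

    labels and image are equally sized 2-D lists: labels[y][x] is the
    superpixel id of pixel (y, x) and image[y][x] is its intensity.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    edges = set()
    height, width = len(labels), len(labels[0])
    for y in range(height):
        for x in range(width):
            r = labels[y][x]
            sums[r] += image[y][x]
            counts[r] += 1
            # Checking the right and lower neighbours visits every
            # 4-connected pixel pair exactly once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width:
                    s = labels[ny][nx]
                    if s != r:
                        edges.add((min(r, s), max(r, s)))
    # Node value: mean intensity of all pixels in the region.
    means = {r: sums[r] / counts[r] for r in sums}
    return means, edges

# Toy 3x3 example with three superpixels (hypothetical values).
labels = [[0, 0, 1],
          [0, 2, 1],
          [2, 2, 1]]
image = [[10, 12, 50],
         [11, 30, 52],
         [31, 29, 54]]
means, edges = build_rag(labels, image)
```

In this toy case every pair of regions touches, so the graph is a triangle whose nodes carry the three regional mean intensities.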
2.2. Initialization by Convolutional Neural Network
The convolutional neural network (CNN) is one of the most common tools in supervised image processing, and a traditional CNN for classification is structured as shown in Figure 3. The convolution layer can be regarded as a bank of filters that outputs various features from input patches, and the feature maps are constructed by sliding the filters over all positions. The pooling layer subsamples the feature maps to resist overfitting and increase the robustness of the whole design. The fully connected layer reshapes the feature maps from two-dimensional to one-dimensional, and the softmax layer outputs the classification results.
In the softmax layer, the network outputs the pixel-level category labels and the corresponding probability distribution. For each pixel, the distribution lists the probability of every label, and these probabilities sum to 1. The pixel-level category labels are used to generate the label field for superpixels, and the probability distribution is regarded as a deep feature to construct the probability field in the MRF.
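As a small illustration of the softmax output, the following sketch converts raw class scores for one pixel into a probability distribution and reads off the predicted label. The scores are hypothetical, and the function is a generic softmax rather than the specific network layer used in this work.

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 0.5, -1.0]   # hypothetical scores for one pixel, three classes
probs = softmax(scores)     # probabilities over the three classes
label = max(range(len(probs)), key=probs.__getitem__)  # most probable class
```

The list `probs` plays the role of the per-pixel probability distribution, and `label` plays the role of the pixel-level category label.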
The criterion of superpixel segmentation is to cluster spatially adjacent pixels with the same features, so the category of a superpixel region should be consistent with that of most of its pixels. Because the output of the CNN only provides pixel-wise labels, a region-based majority voting strategy is necessary for initializing the RAG labels. Assume that there are M pixels in superpixel r and that the category of each pixel is given by the CNN. By counting the histogram of the label distribution in the region, the majority voting result is selected as the initial category of r. The superpixel probability distribution is calculated by the joint probability of the corresponding pixels.
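The voting step can be sketched as follows for a single superpixel. The sketch assumes that the CNN labels and probability vectors of the M pixels are given as plain lists, and it aggregates the probabilities by a per-class average, which is the reading given in Section 2.4; the function name `init_superpixel` and the numbers are hypothetical.

```python
from collections import Counter

def init_superpixel(pixel_labels, pixel_probs):
    """Initial label and probability distribution of one superpixel.

    pixel_labels: CNN label of each of the M pixels in the superpixel.
    pixel_probs:  CNN probability vector of each of the M pixels.
    """
    # Majority vote over the histogram of pixel labels.
    histogram = Counter(pixel_labels)
    initial_label = histogram.most_common(1)[0][0]
    # Assumed aggregation: per-class average over the M pixels.
    m = len(pixel_probs)
    n_classes = len(pixel_probs[0])
    mean_probs = [sum(p[k] for p in pixel_probs) / m for k in range(n_classes)]
    return initial_label, mean_probs

pixel_labels = [1, 1, 0, 1]                      # hypothetical CNN labels
pixel_probs = [[0.2, 0.8], [0.4, 0.6],
               [0.7, 0.3], [0.3, 0.7]]           # hypothetical CNN outputs
initial_label, mean_probs = init_superpixel(pixel_labels, pixel_probs)
```

Three of the four pixels vote for class 1, so the superpixel starts with label 1 even though one pixel was assigned to class 0.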
2.3. Region-Level Markov Random Fields
The image classification problem can be formulated as a maximum a posteriori (MAP) estimate within the Bayesian framework:

$$\hat{X} = \arg\max_{X} P(X \mid Y) = \arg\max_{X} P(Y \mid X)\,P(X),$$

where $X$ is the image label, and $Y$ is the image feature. The maximum a posteriori probability can thus be decomposed into two parts, $P(Y \mid X)$ and $P(X)$. $P(Y \mid X)$ describes the conditional probability of $Y$ given $X$ and is referred to as the feature model, and $P(X)$ is the prior probability of $X$, which is referred to as the spatial context model.
The traditional MRF model bears a sharply growing computation cost as the size of the image increases, and meanwhile, the block-level texture information cannot be completely extracted in a pixel-based framework [21]. To address these problems, a region-level MRF model is constructed to improve the capacity of the algorithm. Compared with the pixel-based MRF model, the region-level MRF model has three main advantages. First, processing an image at the region level effectively reduces the complexity of the algorithm because the number of elements in the label field declines. Second, latent semantic information can be reflected, including in the oversegmented regions [22]. Third, the process of generating regions suppresses the influence of oversegmentation, and some of the pixel-level misclassification is smoothed out by the regions. In general, processing an image in regions allows the topological structure and contextual information to be parsed efficiently.
In the superpixel-based region-level MRF model for SAR image classification, the spatial context model describes the interactions between adjacent superpixels, and the feature model expresses the intensity distribution of each superpixel. The feature model for an SAR image is often modeled as a Gaussian distribution, and the energy function is written as:

$$E_{f}^{I} = \sum_{l} \sum_{r \in R_l} \left[ \ln\left(\sqrt{2\pi}\,\sigma_l\right) + \frac{(y_r - \mu_l)^2}{2\sigma_l^2} \right],$$

where $y_r$ is the intensity of superpixel $r$, $\mu_l$ and $\sigma_l^2$ are the mean and variance of the intensity distribution of class $l$, and $R_l$ is the set of superpixels which belong to class $l$. $R = \bigcup_l R_l$ represents all the regions in the input SAR image. Assuming two adjacent superpixels $r$ and $s$, the energy function of the spatial context model is as follows:

$$E_{s}^{I} = -\beta \sum_{(r,s) \in C} \delta(x_r, x_s),$$

where $x_r$ expresses the label of $r$, $\delta(x_r, x_s)$ expresses the interaction between the two regions (1 if their labels agree and 0 otherwise), $C$ denotes the set of all cliques in the SAR input image, and $\beta$ is a potential parameter that balances the contributions of the feature model and the spatial context model. The MRF energy function of the intensity field is integrated as:

$$E^{I} = E_{f}^{I} + E_{s}^{I}.$$
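To make the intensity-field energy concrete, the following minimal sketch evaluates a Gaussian feature term plus a Potts-style spatial term for a given labelling of the RAG. It is an illustration under stated assumptions, not the authors' implementation: the sign convention (each agreeing neighbour pair lowers the energy by `beta`), the function name `intensity_energy`, and all numerical values are hypothetical.

```python
import math

def intensity_energy(labels, intensity, edges, mu, sigma, beta):
    """Energy of one labelling: Gaussian unary term plus Potts pairwise term."""
    unary = 0.0
    for region, cls in labels.items():
        # Negative log of a Gaussian density evaluated at the region intensity.
        unary += (math.log(math.sqrt(2.0 * math.pi) * sigma[cls])
                  + (intensity[region] - mu[cls]) ** 2 / (2.0 * sigma[cls] ** 2))
    # Assumed convention: each agreeing neighbour pair lowers the energy by beta.
    agreeing = sum(1 for r, s in edges if labels[r] == labels[s])
    return unary - beta * agreeing

intensity = {0: 11.0, 1: 52.0, 2: 45.0}   # hypothetical superpixel mean intensities
edges = [(0, 1), (0, 2), (1, 2)]          # RAG edges between adjacent superpixels
mu = {0: 10.0, 1: 50.0}                   # hypothetical class means
sigma = {0: 5.0, 1: 5.0}                  # hypothetical class standard deviations

# Region 2 (intensity 45) labelled as class 0 versus class 1.
energy_a = intensity_energy({0: 0, 1: 1, 2: 0}, intensity, edges, mu, sigma, beta=1.0)
energy_b = intensity_energy({0: 0, 1: 1, 2: 1}, intensity, edges, mu, sigma, beta=1.0)
```

Both labellings have one agreeing edge, so the spatial terms cancel and the difference comes entirely from the feature term: assigning region 2 to the class whose mean is closer to its intensity gives the lower energy.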
2.4. Construction of Probability Field
It should be mentioned that the traditional superpixel-based region-level MRF model only considers the intensity characteristics, and the label field is guided by the iterative update of the energy function. The Gaussian distribution is simple to calculate but is a coarse model for multiplicative noise, and the penalty between adjacent superpixels is crude and indiscriminate. In this paper, a probability field is constructed to jointly guide the label field and improve the classification accuracy. The probability field is based on the probability output of the CNN. The superpixel probability distribution is obtained by averaging the probabilities of the pixels that belong to the same superpixel.
In accordance with the MRF model based on the intensity field, the energy function for the probability field has two parts as well. The unary part quantifies the possibility that a superpixel belongs to each class, and its energy function can be defined as:

$$E_{u}^{P} = -\sum_{r \in R} \frac{1}{N_r} \sum_{i \in r} p_i(x_r),$$

where $N_r$ is the total number of pixels within the superpixel $r$, $i$ is an arbitrary pixel in the superpixel $r$, $R$ is the set of all superpixels in the input SAR image, and $p_i(x_r)$ is the probability that the pixel $i$ belongs to class $x_r$. The inner average calculates the confidence that a superpixel belongs to its label. Similar to the spatial context model, the binary part describes the relationship between regions. Its energy function, denoted $E_{b}^{P}$, is expressed through $\langle P_r, P_s \rangle$, the inner product between the probability distributions of the adjacent superpixels $r$ and $s$. $P_r$ is the total probability distribution for superpixel $r$ and is written as:

$$P_r = \left(p_r^{1}, p_r^{2}, \ldots, p_r^{K}\right),$$

where $p_r^{k}$ is the probability of the superpixel $r$ at label $k$, and $K$ is the total number of categories. The inner product can be computed directly from the output of the softmax layer in the CNN and is positively correlated with the similarity of adjacent superpixels. To be specific, the value of $\langle P_r, P_s \rangle$ increases when the regions are similar to each other. $\beta$ is the balance coefficient and takes the same initial value as in the intensity field. The energy function for the probability field is as follows:

$$E^{P} = E_{u}^{P} + E_{b}^{P}.$$

We can obtain the total RMRF energy function for the two fields, which is shown below:

$$E = E^{I} + E^{P},$$

where $E^{I}$ denotes the MRF energy function of the intensity field.
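A probability-field energy of this kind can be sketched as below. The unary term rewards confidence in the assigned label, and the pairwise term is built on the inner product of neighbouring distributions. The pairwise rule used here (similarity lowers the energy when labels agree and raises it when they differ) is one hypothetical choice made for illustration and is not necessarily the formula used in the method; the function names and numbers are also hypothetical.

```python
def inner(p, q):
    """Inner product of two probability vectors."""
    return sum(a * b for a, b in zip(p, q))

def probability_energy(labels, probs, edges, beta):
    """Energy of one labelling in a probability field (illustrative form)."""
    # Unary part: higher confidence in the assigned label lowers the energy.
    unary = -sum(probs[r][labels[r]] for r in labels)
    binary = 0.0
    for r, s in edges:
        similarity = inner(probs[r], probs[s])
        # Assumed soft rule: similar neighbours are rewarded for sharing a
        # label and penalised for carrying different labels.
        binary += -similarity if labels[r] == labels[s] else similarity
    return unary + beta * binary

# Hypothetical averaged softmax outputs of three superpixels (two classes).
probs = {0: [0.9, 0.1], 1: [0.8, 0.2], 2: [0.1, 0.9]}
edges = [(0, 1), (1, 2)]

good = probability_energy({0: 0, 1: 0, 2: 1}, probs, edges, beta=0.5)
bad = probability_energy({0: 0, 1: 1, 2: 1}, probs, edges, beta=0.5)
```

Unlike a hard 0/1 neighbourhood rule, the pairwise contribution here varies continuously with how similar the two distributions are, so mislabelling superpixel 1 is penalised both by its own low confidence and by its strong similarity to superpixel 0.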
The motivation for constructing the probability field is to remedy the insufficiency of the MRF model by treating the superpixel probability distribution from the CNN as deep-level feature information within the framework of the probability field. The unary part gives a quantified value of how strongly the region belongs to each category, and the binary part describes the spatial context relations between superpixels. The traditional region-level MRF model adopts a binary strategy for neighborhood relationships, in which the pairwise term equals 1 if the regions have the same label and 0 when they have different labels. The binary part of the probability field makes this strategy soft and continuous. When the labels of adjacent regions are different, the energy function outputs a measured value that indicates how different they are. When the labels are the same, the energy function also gives a detailed measurement of their similarity.
The initial RAG label is generated by the CNN, and the label field is updated by minimizing the energy functions of both the intensity field and the probability field. The simulated annealing (SA) algorithm is used to obtain the final classification results.
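The label update can be illustrated with a generic simulated annealing loop over superpixel labels. This is a minimal sketch with a Metropolis acceptance rule and a geometric cooling schedule; the schedule, the parameter values, the function name `simulated_annealing`, and the toy energy are all assumptions made for illustration, and the energy is recomputed in full at every step for simplicity.

```python
import math
import random

def simulated_annealing(init_labels, n_classes, energy,
                        t0=1.0, cooling=0.95, sweeps=100, seed=0):
    """Minimize energy(labels) over region labels, starting from init_labels."""
    rng = random.Random(seed)
    labels = dict(init_labels)
    current = energy(labels)
    best, best_energy = dict(labels), current
    temperature = t0
    for _ in range(sweeps):
        for region in list(labels):
            old = labels[region]
            new = rng.randrange(n_classes)
            if new == old:
                continue
            labels[region] = new
            candidate = energy(labels)
            delta = candidate - current
            # Metropolis rule: always accept improvements, and accept worse
            # moves with a probability that shrinks as the temperature drops.
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                current = candidate
                if current < best_energy:
                    best, best_energy = dict(labels), current
            else:
                labels[region] = old  # reject the move
        temperature *= cooling  # geometric cooling (assumed schedule)
    return best, best_energy

# Toy energy on a three-node chain: a per-region preference plus a reward
# of 0.5 for every pair of neighbours that share a label.
edges = [(0, 1), (1, 2)]
preferred = {0: 0, 1: 0, 2: 1}

def toy_energy(labels):
    unary = sum(0.0 if labels[r] == preferred[r] else 2.0 for r in labels)
    pairwise = -0.5 * sum(labels[r] == labels[s] for r, s in edges)
    return unary + pairwise

best, best_energy = simulated_annealing({0: 1, 1: 1, 2: 0}, 2, toy_energy)
```

In practice the `energy` argument would be the sum of the intensity-field and probability-field energies, and the initial labels would come from the majority vote over the CNN output.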