**4. Experiment**

The performance of the proposed algorithm is verified on two texture datasets and five remote sensing scene datasets. First, the texture recognition performance of the algorithm is verified on the two texture datasets and compared with ResNet-50, RSNet, and several typical LBP-derived algorithms. Then, the remote sensing scene classification performance is evaluated on the five remote sensing scene datasets and compared with ResNet-50, RSNet, and representative remote sensing scene classification algorithms.

#### *4.1. Experimental Data*

#### 4.1.1. Texture Dataset

The performance of the proposed algorithm is first validated on two classic texture datasets: the KTH-TIPS2-a dataset and the KTH-TIPS2-b dataset.

The KTH-TIPS2-a dataset includes 11 classes of texture images. Most classes are imaged at nine different scales, in three poses, and under four different lighting conditions, for a total of 4608 images, each of size 200 × 200 pixels. We use three sets of samples for training and the remaining set for testing, perform four experiments, and report the average of the four results as the final result.

The KTH-TIPS2-b dataset includes 11 classes of texture images, each imaged at nine different scales, in three poses, and under four different lighting conditions, for a total of 4752 images, each of size 200 × 200 pixels. We use one set of samples for training and the remaining three sets for testing, perform four experiments, and report the average of the four results as the final result.

Some example images from these texture datasets are shown in Figure 9.

**Figure 9.** Example images of two texture datasets from top to bottom: KTH-TIPS2-a and KTH-TIPS2-b.

#### 4.1.2. Remote Sensing Scene Dataset

Besides the texture image classification, the performance of the algorithm is also validated on five remote sensing scene datasets: AID dataset, RSSCN7 dataset, UC Merced Land-Use dataset, WHU-RS19 dataset, and OPTIMAL-31 dataset.

The AID dataset [37] contains 30 classes of scene images with about 200 to 400 samples per class, 10,000 in total; each image is 600 × 600 pixels. Each class is randomly split into training and test sets at a ratio of 20:80.

The RSSCN7 dataset [38] contains seven classes of scene images with 400 samples per class, 2800 in total; each image is 400 × 400 pixels. Each class is randomly split into training and test sets at a ratio of 50:50.

The UC Merced Land-Use dataset [39] contains 21 classes of scene images with 100 samples per class, 2100 in total; each image is 256 × 256 pixels. Each class is randomly split into training and test sets at a ratio of 50:50.

The WHU-RS19 dataset [40] contains 19 classes of scene images with about 50 samples per class, 1005 in total; each image is 600 × 600 pixels. Each class is randomly split into training and test sets at a ratio of 60:40.

The OPTIMAL-31 dataset [41] contains 31 classes of scene images with 60 samples per class, 1860 in total; each image is 256 × 256 pixels. Each class is randomly split into training and test sets at a ratio of 80:20.
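The per-class random splits described above can be sketched as follows. This is a minimal illustration; the function name and seed handling are ours, not from the paper:

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, train_ratio, seed=0):
    """Randomly split each class at the given training ratio,
    e.g. train_ratio=0.2 for the 20:80 split used on AID."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, test = [], []
    for y, items in by_class.items():
        rng.shuffle(items)
        k = round(len(items) * train_ratio)
        train += [(s, y) for s in items[:k]]
        test += [(s, y) for s in items[k:]]
    return train, test
```

Splitting per class rather than over the whole pool keeps the class proportions identical in the training and test sets.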

Some example images from these remote sensing scene datasets are shown in Figure 10.

**Figure 10.** Example images of five remote sensing scene classification datasets from top to bottom: AID, RSSCN7, UC-Merced, WHU-RS19, and OPTIMAL-31.

#### *4.2. Experimental Setup*

Performance of the algorithms in the experiments is measured by the overall accuracy (OA) and the confusion matrix (CM) on the test set. The classification accuracy over all scene categories in a dataset is calculated as OA = *SP*/*ST*, where *SP* is the number of correct predictions in the test set and *ST* is the total number of images in the test set. The CM shows the classification accuracy of the algorithm for each class in the dataset.
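The two metrics can be sketched in a few lines; this is a hedged illustration of OA = SP/ST and a row-normalized CM, not the authors' evaluation code:

```python
import numpy as np

def overall_accuracy(y_true, y_pred):
    """OA = SP / ST: correct predictions over total test images."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry (i, j): fraction of class-i test images predicted as class j,
    so every row sums to 1 and the diagonal is the per-class accuracy."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)
```

The diagonal of this matrix is what Tables 3, 4, 7, and 8 compare class by class.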

To verify the performance of the proposed algorithm, we compare the proposed DBSNet with several representative algorithms on the texture and remote sensing datasets. For the texture datasets, we compare DBSNet with the hand-crafted texture feature descriptor ULBP and several efficient, recently proposed LBP-derived algorithms such as COV-LBPD, MRELBP, and fast LBP-TOP, running their source code on the texture datasets. After feature extraction by the texture descriptors, classification is performed with a nearest-neighbor classifier. In the proposed DBSNet, ResNet-50 is one choice of deep feature extractor. Because the ResNet-50 model is complex and the extracted features are high-dimensional, we also replace ResNet-50 with the shallow CNN model shown in Figure 11 and repeat the classification experiments on the texture datasets to further verify the complementary effect of the hand-crafted texture features on the deep features. The network is trained and tested on the four train-test splits respectively, yielding four feature extractors after removing the fully connected layer. For each feature extractor, we extract the deep features and classify them with the fully connected layer. The deep features fused with the ULBP features are also classified by the fully connected layer to obtain the performance of the fused features.
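The fusion of deep and ULBP features can be sketched as feature concatenation. This is a minimal sketch: the per-modality L2 normalization and the 59-bin uniform-LBP dimension are our illustrative assumptions, and the actual fusion layer in DBSNet may differ:

```python
import numpy as np

def fuse_features(deep_feats, ulbp_feats):
    """Concatenate L2-normalized deep features with ULBP histogram
    features so that neither modality dominates by sheer scale."""
    d = deep_feats / (np.linalg.norm(deep_feats, axis=1, keepdims=True) + 1e-12)
    u = ulbp_feats / (np.linalg.norm(ulbp_feats, axis=1, keepdims=True) + 1e-12)
    return np.concatenate([d, u], axis=1)
```

The fused vector is then fed to the fully connected classification layer exactly like the plain deep features, which makes the with/without-ULBP comparison in Tables 2 and 5 direct.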

For the remote sensing datasets, we compare the proposed DBSNet with the classic image classification algorithms IFK-SIFT [10], CaffeNet [42], VGG-VD-16 [18], GoogLeNet, ARCNet-VGGNet16 [41], and GBNet + global feature [43]. In addition to quoting the original results in references [37,41,43], we run IFK-SIFT, CaffeNet, VGG-VD-16, and GoogLeNet on the OPTIMAL-31 dataset with the parameter settings of reference [37]. We extract the deep features using models pretrained on ImageNet with the fully connected layers removed, together with the IFK-SIFT features, classify each with liblinear [44], repeat the experiment 10 times, and take the mean accuracy as the result. Since the ResNet-50 used in the proposed method is fine-tuned for better performance, we also fine-tune the deep models CaffeNet, VGG-VD-16, and GoogLeNet for a fairer comparison: we change the output channels of the last fully connected layer and optimize the model parameters with stochastic gradient descent (SGD). The detailed parameter settings are listed in Table 1.
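The fine-tuning update can be sketched as a generic SGD-with-momentum step applied to all model parameters after the last fully connected layer has been resized; the momentum formulation and hyperparameters here are illustrative, while the settings actually used are listed in Table 1:

```python
import numpy as np

def sgd_step(params, grads, lr, momentum, velocity):
    """One SGD-with-momentum update over a dict of parameter arrays:
    v <- momentum * v - lr * grad;  w <- w + v."""
    for k in params:
        velocity[k] = momentum * velocity[k] - lr * grads[k]
        params[k] = params[k] + velocity[k]
    return params, velocity
```

In practice the same step is applied mini-batch by mini-batch, with the gradients coming from backpropagation through the resized network.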


**Table 1.** Parameter settings of the deep models.

Besides the comparison methods above, three algorithms that differ in feature extraction method and loss function are compared on both the texture and remote sensing datasets: the fine-tuned ResNet-50, RSNet, and DBSNet. These three algorithms are tested to verify whether the deep features extracted by RSNet and the statistical texture features obtained by ULBP are complementary, and whether the proposed Sinkhorn loss is robust.

**Figure 11.** The framework of the shallow convolutional neural network (CNN).

#### *4.3. Experimental Results and Analysis*

In this section, we report the classification performance of the proposed DBSNet and the comparison methods on the challenging texture datasets and remote sensing scene classification datasets, respectively.

#### 4.3.1. Experiments on Texture Dataset

For texture recognition, the classification results in Table 2 compare the different algorithms on the KTH-TIPS2-a and KTH-TIPS2-b texture datasets. The accuracy of the best-performing algorithm on each dataset is shown in bold. On both datasets, the traditional hand-crafted methods are not competitive, while ResNet-50, RSNet, and DBSNet achieve successively higher accuracy, which indicates that the Sinkhorn loss performs well and that the features obtained by ULBP are complementary to the deep features.

**Table 2.** Classification accuracy of different algorithms on KTH-TIPS2-a and KTH-TIPS2-b texture datasets.


Tables 3 and 4 show the confusion matrices of the RSNet and DBSNet algorithms on the KTH-TIPS2-b texture dataset, which clearly reflect the classification performance on each category. Comparing the two confusion matrices, among the 11 classes DBSNet outperforms RSNet in seven classes (aluminium foil, brown bread, cork, cracker, lettuce leaf, linen, and wood) and is inferior to RSNet in three classes (corduroy, cotton, and wool). The overall classification performance of DBSNet is better than that of RSNet, which demonstrates the superiority of the proposed feature extraction method over a purely deep-feature-based method.

To further verify the complementary effect of the hand-crafted texture features on the deep features, we replace the RSNet feature extractor with a shallow CNN feature extractor. In Table 5, the accuracy of the best-performing feature set on each dataset is shown in bold. The classification performance of the fused features is better than that of the deep features alone on all four train-test splits of both KTH-TIPS2-a and KTH-TIPS2-b. Consequently, the ULBP features complement the low-dimensional deep features of the shallow CNN in the classification task, and even as the dimension of the deep features increases, this complementary effect persists, as already shown in Table 2.

**Table 3.** Confusion matrix (CM) of RSNet algorithm on KTH-TIPS2-b dataset.

| | Aluminium Foil | Brown Bread | Corduroy | Cork | Cotton | Cracker | Lettuce Leaf | Linen | White Bread | Wood | Wool |
|---|---|---|---|---|---|---|---|---|---|---|---|
| **aluminium foil** | 0.9846 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0154 | 0 | 0 | 0 |
| **brown bread** | 0 | 0.8549 | 0 | 0 | 0 | 0.0494 | 0 | 0 | 0.0957 | 0 | 0 |
| **corduroy** | 0.0123 | 0.0031 | 0.8117 | 0.0802 | 0.0062 | 0.0123 | 0 | 0.0463 | 0.0031 | 0.0031 | 0.0216 |
| **cork** | 0 | 0 | 0 | 0.8549 | 0 | 0.1204 | 0 | 0 | 0.0247 | 0 | 0 |
| **cotton** | 0 | 0 | 0.1358 | 0 | 0.2531 | 0 | 0 | 0.3827 | 0.0031 | 0.0463 | 0.1790 |
| **cracker** | 0 | 0.4846 | 0 | 0.0710 | 0 | 0.4414 | 0 | 0.0031 | 0 | 0 | 0 |
| **lettuce leaf** | 0 | 0 | 0 | 0 | 0.0062 | 0 | 0.9938 | 0 | 0 | 0 | 0 |
| **linen** | 0 | 0 | 0.0093 | 0 | 0.1790 | 0 | 0 | 0.8117 | 0 | 0 | 0 |
| **white bread** | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.9877 | 0.0123 | 0 |
| **wood** | 0 | 0 | 0 | 0 | 0.0247 | 0 | 0 | 0 | 0.0278 | 0.9475 | 0 |
| **wool** | 0.0062 | 0.0031 | 0.0309 | 0.1636 | 0.0123 | 0 | 0 | 0.5216 | 0 | 0 | 0.2623 |


**Table 4.** CM of DBSNet algorithm on KTH-TIPS2-b dataset.

**Table 5.** Classification accuracy of different feature sets on KTH-TIPS2-a and KTH-TIPS2-b texture datasets.


#### 4.3.2. Experiments on Remote Sensing Scene Dataset

For remote sensing scene classification, the results in Table 6 compare the different algorithms on the five challenging remote sensing datasets. The accuracies of the top three algorithms on each dataset are shown in bold. DBSNet performs better than RSNet, and RSNet performs better than ResNet-50, on all five datasets, which demonstrates that the features obtained by ULBP remain complementary to the deep features on remote sensing data and that the proposed Sinkhorn loss guides the learning of the network better than the commonly used softmax loss. Compared with the mid-level method IFK-SIFT, the deep-feature-based methods CaffeNet, VGG-VD-16, GoogLeNet, ARCNet-VGGNet16, and GBNet + global feature achieve improved performance, but they still have limitations in feature extraction. On top of the deep features, DBSNet adds texture features that are informative for image classification and uses a more suitable loss function. Compared with these representative methods, DBSNet always ranks in the top three on all five datasets.

**Table 6.** Classification accuracy of different algorithms on AID, RSSCN7, UC-Merced, WHU-RS19, and OPTIMAL-31 remote sensing scene classification datasets.


The confusion matrices of the RSNet and DBSNet algorithms on the RSSCN7 dataset are compared to analyze the classification performance in more detail. It can be seen from Tables 7 and 8 that, among the seven classes, DBSNet outperforms RSNet in five classes (Grass, Industry, Forest, Resident, and Parking) and falls behind RSNet in two classes (Field and RiverLake). Overall, the classification performance of DBSNet is better than that of RSNet; as a complement, the texture features clearly play a role in the classification task.


**Table 7.** CM of RSNet algorithm on RSSCN7 dataset.

**Table 8.** CM of DBSNet algorithm on RSSCN7 dataset.

