**3. Results**

In this section, LSNet is first validated on the CIFAR-100 dataset. We then focus on evaluating LSNet on the AID dataset to demonstrate its effectiveness on the scene classification task, comparing its performance with that of ResNet34, which uses vanilla convolutions.

#### *3.1. Results on the CIFAR-100 Dataset*

The overall accuracy on the CIFAR-100 test set is used to evaluate the performance of LSNet, as shown in Tables 4 and 5. This metric is defined as the ratio of the number of correctly predicted samples to the total number of samples in the test set, as shown in Equation (26).

$$p_c = \frac{\sum_{k=1}^{n} P_{kk}}{N} \tag{26}$$

where *Pkk* represents the number of correctly classified samples in the *k*th class, while *n* and *N* are the number of categories and the total number of samples in the test set, respectively. Compared with metrics that evaluate each category individually, *pc* is more intuitive and straightforward for comparing LSNet and ResNet on the CIFAR-100 dataset, which has 100 categories. The parameter Δ denotes the accuracy difference between LSNet-1d and ResNet-1d, demonstrating the performance of LSNet more intuitively.
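The overall accuracy of Equation (26) can be computed directly from the predictions via the confusion matrix. The following is a minimal sketch (function and variable names are illustrative, not from the paper's code):

```python
import numpy as np

def overall_accuracy(y_true, y_pred, n_classes):
    """Overall accuracy p_c = (sum of P_kk over all classes k) / N."""
    # Build the confusion matrix: entry [i, j] counts samples of
    # true class i that were predicted as class j.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    # The diagonal holds the correctly classified counts P_kk;
    # the matrix total is the number of test samples N.
    return cm.trace() / cm.sum()

# Toy example with 3 classes: 4 of 6 samples are correct.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = overall_accuracy(y_true, y_pred, n_classes=3)
```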

**Table 4.** Performance of networks on the CIFAR-100 dataset in the 1D experiment.


In the 1D experiment, all of the LSNets that use different nonlinear functions in the LS1D block outperform the baseline network ResNet-1d by more than 1.5%. Among them, LSNet-1d with ELU in the LS1D block reaches the best test set accuracy, surpassing the baseline network by 2.48%.

In the 2D experiment, LSNet-2d, whose LS2D block uses SELU as the nonlinear function, is slightly inferior to the baseline network ResNet-2d, while all other LSNets outperform the baseline. Among them, LSNet-2d with leaky ReLU in the LS2D block performs best, outperforming ResNet-2d by 1.38%.


**Table 5.** Performance of networks on the CIFAR-100 dataset in the 2D experiment.

The experimental results confirm the advantage of introducing nonlinearity into the feature extractor module. LSNets with different nonlinear functions perform differently, which indicates the importance of choosing suitable nonlinear predict and update operators. Moreover, the most appropriate nonlinear functions differ between the 1D and 2D experiments, indicating that the choice is structure-dependent.
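The role of the swappable nonlinearity can be illustrated with a generic lifting step. The sketch below is not the paper's exact LS block (the learned convolutional predict and update operators are replaced by a bare nonlinearity for brevity); it only shows how plugging in different nonlinear functions, e.g. ELU versus leaky ReLU, yields different decompositions of the same input:

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)

def lifting_step_1d(x, nonlinearity):
    """One nonlinear lifting step: split -> predict -> update.

    Simplified illustration only: the paper's LS1D block uses learned
    operators, here replaced by the bare nonlinearity.
    """
    even, odd = x[::2], x[1::2]
    detail = odd - nonlinearity(even)     # predict step
    approx = even + nonlinearity(detail)  # update step
    return approx, detail

x = np.linspace(-1.0, 1.0, 8)
approx_elu, detail_elu = lifting_step_1d(x, elu)
approx_lrelu, detail_lrelu = lifting_step_1d(x, leaky_relu)
# The two nonlinearities decompose the same signal differently,
# which is why the operator choice affects downstream accuracy.
```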

#### *3.2. Results on the AID Dataset*

For each evaluated network, the overall accuracy on the AID test set is listed in Table 6. In the 1D experiment, all LSNets with different nonlinear functions in the LS1D block outperform the baseline ResNet-1d. Among them, LSNet-1d with leaky ReLU performs only comparably to ResNet-1d, while LSNet-1d with ELU reaches the highest test set overall accuracy, surpassing ResNet-1d by 2.05%.


**Table 6.** Performance of networks on the AID dataset in the 1D experiment.

Confusion matrices for each evaluated network are shown in Figure 6. Probabilities with values greater than 0.01 are displayed on the confusion matrices. Figure 6b–f are sparser than Figure 6a, indicating smaller error rates and higher recalls. As Figure 6a shows, ResNet-1d performs well on some classes, such as baseball field, beach, bridge, desert, and forest, whose recalls are higher than 95%. However, ResNet-1d confuses some classes with others. For instance, 20% of the images in the park class are mistaken for the resort class, while approximately 10% of the images in the mountain class are incorrectly classified as the dense residential class.

**Figure 6.** Confusion matrices of the AID dataset in the 1D experiment. The number on the *i*th row, *j*th column represents the normalized number of the images in the *i*th class that are classified as the *j*th class. The numbers below 0.01 are not displayed on the confusion matrices.
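The row-normalized confusion matrix described in the caption can be reproduced as follows. This is a sketch with illustrative names; the 0.01 display threshold is mimicked by zeroing small entries:

```python
import numpy as np

def normalized_confusion(y_true, y_pred, n_classes, threshold=0.01):
    """Row-normalized confusion matrix: entry [i, j] is the fraction of
    class-i samples predicted as class j, so the diagonal gives the
    per-class recalls. Entries below `threshold` are zeroed, mirroring
    how values under 0.01 are hidden in the plotted matrices."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    row_sums = cm.sum(axis=1, keepdims=True)
    cm = np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)
    cm[cm < threshold] = 0.0
    return cm

# Toy example: class 0 has recall 2/3, class 1 has recall 1.
cm = normalized_confusion([0, 0, 0, 1, 1], [0, 0, 1, 1, 1], n_classes=2)
```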

For a further comparison between LSNet-1d and ResNet-1d, we select the classes whose recalls are below 75% in the ResNet-1d experiment and analyze them further. As shown in Figure 7, ResNet-1d surpasses all 1D LSNets on the medium residential class, but it is inferior to all 1D LSNets on five classes: railway station, resort, river, school, and square. All 1D LSNets lead ResNet-1d on the square class by more than 10%, which demonstrates that features extracted by the nonlinear lifting scheme better distinguish this class from all others.

**Figure 7.** Network performance on partial classes of the AID dataset. The classes whose test set recalls are below 75% in the ResNet-1d experiment are selected for recall comparison between 1D LSNets and ResNet-1d.
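The class selection used for Figure 7 (and analogously for Figure 9 with an 80% threshold) can be sketched as follows, given a row-normalized confusion matrix whose diagonal holds the per-class recalls:

```python
import numpy as np

def low_recall_classes(cm, threshold=0.75):
    """Return indices of classes whose recall (diagonal entry of a
    row-normalized confusion matrix) falls below `threshold`."""
    recalls = np.diag(cm)
    return [k for k, r in enumerate(recalls) if r < threshold]

# Toy row-normalized confusion matrix for 3 classes with
# recalls 0.95, 0.60, and 0.70.
cm = np.array([[0.95, 0.05, 0.00],
               [0.30, 0.60, 0.10],
               [0.10, 0.20, 0.70]])
selected = low_recall_classes(cm)  # classes with recall below 75%
```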

In the 2D experiment, LSNets slightly improve performance compared with ResNet-2d, as shown in Table 7. Among them, LSNet-2d with ReLU in the LS2D block achieves the highest test set accuracy, surpassing ResNet-2d by 0.45%. LS blocks constructed from different nonlinear predict and update operators yield LSNets that differ in performance, which again indicates the significance of seeking suitable nonlinear operators.


**Table 7.** Performance of networks on the AID dataset in the 2D experiment.

To compare 2D LSNets and ResNet-2d, confusion matrices displaying the output probabilities with values greater than 0.01 are drawn, as shown in Figure 8. Recalls and error rates are displayed for each class. It can be seen that 2D LSNets perform better on some classes, such as the sparse residential class and the viaduct class, where all types of 2D LSNets are superior to ResNet-2d.

Furthermore, we select the classes whose recalls are below 80% in the ResNet-2d experiment and conduct further analysis. As shown in Figure 9, LSNets perform better on most of these classes. For instance, on the commercial class, LSNet-2d with ReLU and LSNet-2d with ELU are superior to ResNet-2d by 8.4%, while LSNet-2d with CELU surpasses ResNet-2d by 7.8%. Moreover, all types of LSNets outperform ResNet-2d on the river class. This indicates that the nonlinear lifting scheme is advantageous for extracting features that distinguish the river class from all other classes.

**Figure 8.** Confusion matrices of the AID dataset in the 2D experiment. The number on the *i*th row, *j*th column represents the normalized number of the images in the *i*th class that are classified as the *j*th class. The numbers below 0.01 are not displayed on the confusion matrices.

**Figure 9.** Networks' performance on partial classes of the AID dataset. The classes whose test set recalls are below 80% are selected for recall comparison between 2D LSNets and ResNet-2d.
