Article

Remote Sensing Image Classification Based on Neural Networks Designed Using an Efficient Neural Architecture Search Methodology

1 School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
2 School of Computer Science, Wuhan University, Wuhan 430072, China
3 Jiangxi Xintong Machinery Manufacturing Co., Ltd., Pingxiang 330075, China
4 School of Computer and Information Science, Hubei Engineering University, Xiaogan 432100, China
5 Gravitation and Earth Tide, National Observation and Research Station, Wuhan 430071, China
6 School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1563; https://doi.org/10.3390/math12101563
Submission received: 25 March 2024 / Revised: 2 May 2024 / Accepted: 15 May 2024 / Published: 17 May 2024
(This article belongs to the Special Issue Deep Learning and Adaptive Control, 2nd Edition)

Abstract

Successful applications of machine learning to the analysis of remote sensing images remain limited by the difficulty of designing neural networks manually. Neural architecture search (NAS) offers the potential to discover new and more effective network architectures automatically, but existing NAS algorithms are computationally intensive, require large amounts of data, and are therefore difficult to apply when developing optimal neural network architectures for remote sensing image classification. We propose a differentiable neural architecture search method for remote sensing image classification. The method uses a binary gate strategy for partial channel connections to reduce the size of the network parameters, creating a sparse connection pattern that lowers memory consumption and the computational cost of NAS. Experimental results indicate that our method achieves a 15.1% increase in validation accuracy during the search phase compared to DDSAS, although its accuracy is slightly lower (by 4.5%) than that of DARTS. However, we reduced the search time by 88% and the network parameter size by 84% compared to DARTS. In the architecture evaluation phase, our method demonstrates a 2.79% improvement in validation accuracy over a manually configured CNN network.

1. Introduction

Remote sensing typically applies optical imaging obtained from satellite or aircraft platforms to detect and monitor the physical characteristics of a geographical area by measuring its reflected and emitted radiation at a distance. As such, remote sensing images are composed of multiple bands, where each band represents a specific wavelength range of electromagnetic radiation corresponding to different features or characteristics such as vegetation, water, or urban areas. These properties of remote sensing images make them ideally suited for automated analysis using machine learning approaches based on neural networks. For example, neural networks are utilized to capture the correlation among multiple spectra of remote sensing images, perform land cover classification and object detection, analyze remote sensing images that change over time, model the time dependence of remote sensing images, etc. [1,2]. Nonetheless, despite the many successful applications of machine learning toward the analysis of remote sensing images, neural networks remain quite difficult to design manually owing to the great many hyperparameters involved, such as network layer types and activation functions [3,4].
This issue has been addressed in recent years through the development of automatic search processes, denoted as neural architecture search (NAS), for determining the optimal neural network architecture for a given task without relying on human expertise [5,6,7]. Generally, NAS functions by defining a network hyperparameter search space and applying a search strategy based on a number of methods [8,9,10,11], such as reinforcement learning (RL) [5], evolutionary (EL) algorithms [12,13], gradient-based methods [14], and Bayesian optimization [15]. These methods differ in terms of computational complexity, scalability, and the ability to consider different search spaces [16]. For example, the NAS process based on RL has been demonstrated to design a neural network architecture from scratch that performs as well as or better than the best human-designed architecture in terms of accuracy for identifying object classes within the CIFAR-10 dataset [5]. However, while this work overcame some limitations of conventional NAS algorithms, it relied on heuristics and other manually developed methods to guide the search process. The feasibility of automating the design of network architectures was further confirmed by the application of a large-scale EL algorithm for image classification applications [17]. However, the NAS process involves increasingly high computational costs as the architectural search space increases [18]. Moreover, each candidate model cannot fully reuse the structure and trained parameters of previously evaluated models. Hence, the search process can be very computationally intensive [19].
Several studies have sought to address this issue by simplifying the NAS process. For example, the NAS process has been simplified by first searching for architectural building blocks on a small dataset and then transferring the obtained blocks to a larger dataset [20]. A similar process was applied in conjunction with a genetic algorithm to generate optimal network architectures that managed to surpass the performances of the best human-designed architectures [21]. In another work, a controller was applied to search for the best subgraph within a large computational graph representing the neural network architecture, and the efficiency of the search process was increased by sharing parameters between the subgraphs [22]. However, truly significant reductions in the computational cost of the NAS process have been achieved using the differentiable architecture search (DARTS) algorithm [23]. DARTS introduces the concept of continuous relaxation, where softmax relaxation is applied in the discrete search space. The DARTS algorithm is able to overcome the limitations of conventional NAS approaches by treating the search process as a continuous optimization problem, similar to the gradient approximation technique used by Finn et al. [24]. As such, the search process can be completed with only a single super network, and it therefore avoids repeatedly training multiple models. Moreover, a number of other NAS algorithms have been proposed based on DARTS. For example, partially connected DARTS (PC-DARTS) uses partial channel connection technology, where the search is conducted based on a subset of randomly selected channels, and edge normalization is applied to prevent instability from arising in the search process [25]. Dynamic and Differentiable Space-Architecture Search (DDSAS) generates a network architectural space and dynamically samples that space using the gradient descent algorithm, where the sampling process is guided by the upper confidence bound (UCB) to balance the exploitation and exploration of the search process, thereby preventing the solution from becoming trapped within local optima [26]. Nonetheless, the DARTS and DDSAS algorithms remain computationally intensive methods that require a large amount of data and computational resources. Accordingly, applying these algorithms to developing optimal neural network architectures for remote sensing image classification remains a challenging issue.
The present work addresses this issue by proposing a differentiable neural architecture search method specifically designed for remote sensing image classification. The number of parameters in the network is reduced by limiting the number of connections between neurons based on a binary gate strategy for implementing partial channel connections. This sparse connectivity pattern allows for lower memory consumption and reduces the computational overhead of the search process. In addition, we apply edge normalization to improve the stability of the search process. Our main contributions can be summarized as follows. Firstly, our proposed method was compared with DDSAS and DARTS during the search phase. Our method's validation accuracy of 85.3% is 15.1% higher than DDSAS's 70.2%, although it is 4.5% lower than DARTS's 89.8%. However, our method reduces the search time by 88% compared to DARTS, and the number of network parameters required is reduced by 84% compared to the other two methods. Secondly, during the architecture evaluation phase, the experimental validation accuracy achieved using our proposed method is 2.79% higher than that of the manually configured CNN described in Reference [1]. The robustness experiments demonstrate that the proposed method exhibits good generalization and stability.

2. A Neural Network Architecture Search Method for Remote Sensing Images

The various hyperparameters in the NAS space exploited by the DARTS algorithm represent the super network illustrated in Figure 1, where the network consists of cells, and the cells consist of nodes.
The computational complexity of the DARTS algorithm is reduced by decomposing the information from the multiple bands of remote sensing images according to the scheme illustrated in Figure 2. Not only does this reduce the dimensionality of the image, but each of the decomposed bands of the remote sensing image is processed or analyzed independently to extract relevant features or components, which reduces the training time. Comparing Figure 1 and Figure 2 indicates that the generic cells in Figure 1 are structured as normal cells and reduction cells, where a reduction cell is connected after several consecutive normal cells. With the exception of the first cell, the inputs of all remaining cells are the outputs of the two previous cells. The first cell is special because its input is specified as the images of all n bands, where n bands are input into the search network in parallel. The individual operations conducted in the first cell are illustrated in Figure 3, where the n decomposed bands are assigned partial channel connections and subjected to edge normalization.
The specific details regarding the implementation of partial channel connections are illustrated in Figure 3. As can be seen, the information flow in the directed graph from source node $x_i$ traverses eight possible operators on its way to destination node $x_j$. This process can be represented by the function $f_{pc}^{i,j}(x_i; \alpha_{i,j}^{O})$, where the weight of this edge is determined by the architectural parameter $\alpha_{i,j}^{O}$. Here, the superscript $O$ denotes the search space, $x_i$ represents the output of node $i$, $\alpha_{i,j}^{O}$ represents the weights of the candidate operations ($O = \{o_l\}$, $l = 1, \dots, m$), and $m$ is the number of candidate operations (e.g., convolution, pooling, skip connection, identity, etc.). The goal of network training is to find the operation with the maximum weight. The advantage of the random channel sampling strategy is that the operations on each path are trained uniformly, making the super network appear more random and the search method more competitive.
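The partial channel connection described above can be sketched in PyTorch as follows. This is an illustrative sketch rather than the authors' implementation: the class name PartialChannelMixedOp, the constructor-style candidate_ops argument, and the default sampling ratio k = 4 (matching the 1/4 channel sampling rate in Table 1) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialChannelMixedOp(nn.Module):
    """Mixes the candidate operations on a randomly sampled 1/k subset of the channels;
    the remaining channels bypass the operators unchanged (partial channel connection)."""

    def __init__(self, channels, candidate_ops, k=4):
        super().__init__()
        self.k = k
        # each entry of candidate_ops is a constructor taking the number of channels it operates on
        self.ops = nn.ModuleList(op(channels // k) for op in candidate_ops)
        # one architecture weight alpha per candidate operation on this edge
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        c = x.size(1)
        idx = torch.randperm(c, device=x.device)                  # random channel sampling mask
        sampled, rest = x[:, idx[: c // self.k]], x[:, idx[c // self.k:]]
        weights = F.softmax(self.alpha, dim=0)                    # softmax relaxation over operators
        mixed = sum(w * op(sampled) for w, op in zip(weights, self.ops))
        return torch.cat([mixed, rest], dim=1)                    # untouched channels are passed through
```

Here, candidate_ops would supply the eight operations of the search space (for instance, lambda c: nn.Conv2d(c, c, 3, padding=1) as a stand-in for a stride-1 3×3 convolution that preserves the spatial dimensions).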
In the early stages of the search process, the search algorithm tends to favor weight-free operations, such as skip connections and max pooling, because these operations produce consistent results without trainable weights. In contrast, weighted operations, such as convolution, can produce inconsistent results during gradient optimization; even if they are well optimized later, they cannot outcompete the weight-free operators. This advantage of the weight-free operators is mitigated by applying edge normalization, which distributes the weights over all operations in the directed graph.
The implementation of edge normalization is illustrated in Figure 4. As can be seen, the information flow between two nodes is represented by the function $f_{en}^{i,j}(x_i; \beta_{i,j})$, where $\beta_{i,j}$ is an architectural parameter that is applied to calculate the weight of the edge as $\exp\{\beta_{i,j}\} / \sum_{i'<j} \exp\{\beta_{i',j}\}$. After the NAS process is completed, the weights of the edges $(i, j)$ in the directed graph are determined using the architectural parameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$. Because the weights are shared for each operation during training, the learned parameters are less sensitive to channel sampling, which makes the network search process more stable.
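A minimal sketch of edge normalization is given below, assuming each incoming edge (i, j) of a node is represented by a mixed operation module such as the one above; the class name EdgeNormalizedNode and its interface are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeNormalizedNode(nn.Module):
    """Aggregates the outputs of all incoming edges of an intermediate node, weighting each
    edge by softmax-normalized edge parameters beta (edge normalization)."""

    def __init__(self, edge_ops):
        super().__init__()
        self.edge_ops = nn.ModuleList(edge_ops)            # one mixed operation per incoming edge (i, j)
        self.beta = nn.Parameter(1e-3 * torch.randn(len(self.edge_ops)))

    def forward(self, inputs):
        # inputs[i] is the output x_i of predecessor node i
        edge_weights = F.softmax(self.beta, dim=0)          # exp(beta_ij) / sum_{i'<j} exp(beta_i'j)
        return sum(w * op(x) for w, x, op in zip(edge_weights, inputs, self.edge_ops))
```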
The architectural parameters α and β and convolutional network weights w are updated according to the following bi-optimization function:
$$\min_{\alpha,\beta} \; L_{val}\big(w^{*}(\alpha,\beta), \alpha, \beta\big), \quad \text{s.t.} \; w^{*}(\alpha,\beta) = \arg\min_{w} L_{train}(w, \alpha, \beta) \quad (1)$$
Here, $L_{val}$ and $L_{train}$ are the cross-entropy loss functions evaluated on the validation and training datasets, respectively.
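The alternating update implied by Equation (1) can be sketched as a first-order approximation, in the style of DARTS; the function name search_epoch and the assumption that the weight and architecture parameters have already been separated into two optimizers are illustrative.

```python
import torch

def search_epoch(model, train_loader, val_loader, w_optimizer, arch_optimizer, criterion, device):
    """One epoch of first-order bi-level optimization: the weights w are updated on the
    training split, the architecture parameters (alpha, beta) on the validation split."""
    model.train()
    for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, val_loader):
        x_tr, y_tr = x_tr.to(device), y_tr.to(device)
        x_va, y_va = x_va.to(device), y_va.to(device)

        # architecture step: minimize L_val with respect to alpha and beta
        arch_optimizer.zero_grad()
        criterion(model(x_va), y_va).backward()
        arch_optimizer.step()

        # weight step: minimize L_train with respect to w
        w_optimizer.zero_grad()
        criterion(model(x_tr), y_tr).backward()
        w_optimizer.step()
```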
In considering information flow along a particular edge from node i to node j, the output of node j is represented by the following function f:
$$f_{i,j}(x_{b_k}; \alpha_{i,j}^{O}, \beta_{i,j}) = f_{pc}^{i,j}(x_{b_k}; \alpha_{i,j}^{O}) \cdot f_{en}^{i,j}(x_{b_k}; \beta_{i,j}) = \left( \sum_{o \in O} \frac{\exp\{\alpha_{i,j}^{o}\}}{\sum_{o' \in O} \exp\{\alpha_{i,j}^{o'}\}} \cdot o(s_{i,j} \times x_i) + (1 - s_{i,j}) \times x_i \right) \cdot \frac{\exp\{\beta_{i,j}\}}{\sum_{i'<j} \exp\{\beta_{i',j}\}} \quad (2)$$
Here, $s_{i,j}$ is the channel sampling mask containing only values of 0 and 1, and $o$ represents an operation applied on the directed edge from $x_i$. $f_{pc}^{i,j}$ represents the function obtained after sampling some of the channels, and $f_{en}^{i,j}$ represents the edge normalization function. $O$ represents the search space, $\alpha_{i,j}^{O}$ represents the weights of the directed edge flowing from node $i$ to node $j$ over the search space $O$, and $\beta_{i,j}$ represents the parameter of the directed edge flowing from node $i$ to node $j$.
Finally, returning to Figure 2, we see that the final image is reconstructed from the n decomposed bands after being assigned partial channel connections and subjected to edge normalization as follows:
$$\mathrm{output} = \sum_{k=1}^{n} gate_k\big(f(x_{b_k}; \alpha^{O}, \beta)\big) \cdot \frac{\exp\{p_k\}}{\sum_{l=1}^{n} \exp\{p_l\}} \quad (3)$$
Here, $gate_k$ represents the kth binary gate operation, which is defined as follows:
$$gate_k(z) = \begin{cases} 1 \times z, & \text{with probability } p_k \\ 0, & \text{with probability } 1 - p_k \end{cases} \quad (4)$$
Here, $p_k$ represents the probability that the band image passes through the kth binary gate, and $z$ represents the band image. Finally, the outputs are concatenated into the final image, as represented by the symbol ⊕ in Figure 2.
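A possible PyTorch sketch of the band gating and fusion in Equations (3) and (4) is given below. It is an assumption-laden illustration: the gate probabilities $p_k$ are parameterized here through a sigmoid over learnable logits, the gated outputs are fused by a weighted sum for simplicity (the paper concatenates them, denoted ⊕ in Figure 2), and the class name BandGateCombiner is invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BandGateCombiner(nn.Module):
    """Applies a stochastic binary gate to every band's sub-network output and fuses the
    gated outputs with softmax-normalized band probabilities, following Equations (3)-(4)."""

    def __init__(self, n_bands):
        super().__init__()
        self.p_logits = nn.Parameter(torch.zeros(n_bands))      # learnable logits behind p_k

    def forward(self, band_features):
        # band_features: list of n tensors f(x_{b_k}; alpha, beta) with identical shapes
        p = torch.sigmoid(self.p_logits)                         # gate-opening probabilities p_k
        weights = F.softmax(p, dim=0)                            # exp(p_k) / sum_l exp(p_l)
        fused = 0.0
        for k, z in enumerate(band_features):
            # during training the gate is sampled (1 with probability p_k, otherwise 0);
            # at evaluation time its expectation p_k is used instead
            gate = torch.bernoulli(p[k]) if self.training else p[k]
            fused = fused + gate * weights[k] * z
        return fused                                             # weighted fusion of the gated bands
```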

3. Experiment

The performance of the proposed differentiable NAS method was evaluated by applying it to develop a neural network architecture for classifying remote sensing images derived from the C-band radar data of the Sentinel-1 satellite mission for the east coast of Canada, which comprise two object classes: ships and icebergs. The C-band radar dataset was composed of 1604 image samples in total, which were partitioned into 1443 samples (90%) for the training dataset and 161 samples (10%) for the validation dataset. The training dataset included 726 samples of ships and 717 samples of icebergs; this reasonably balanced class distribution was advantageous for the training process. Figure 5 shows two-dimensional and three-dimensional displays of a sample ship, and Figure 6 shows two-dimensional and three-dimensional displays of a sample iceberg.
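The 90/10 partition described above can be reproduced with a short sketch; the full_dataset object and the seed value are placeholders.

```python
import torch
from torch.utils.data import random_split

# Hypothetical Dataset object holding the 1604 Sentinel-1 samples (ships and icebergs).
# The 90/10 partition below reproduces the 1443/161 split described in the text.
def split_dataset(full_dataset, seed=0):
    n_total = len(full_dataset)              # 1604 samples
    n_train = int(0.9 * n_total)             # 1443 training samples
    n_val = n_total - n_train                # 161 validation samples
    generator = torch.Generator().manual_seed(seed)
    return random_split(full_dataset, [n_train, n_val], generator=generator)
```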
The experimental NAS process applied here was divided into two phases: a search phase and an evaluation phase. The purpose of the search phase is to obtain the optimal set of hyperparameters $\{\alpha_{i,j}\}$ and $\{\beta_{i,j}\}$ for each edge $(i, j)$ in each cell. These parameters determine the cell with the best possible performance. In the evaluation phase, the best cells found during the search are used to build a larger architecture, which is trained on the data from scratch to verify the generalization ability of the searched network structure.
The super networks employed in the process are illustrated in Figure 7, where a shallow super network comprising 8 cells was used in the search phase, while a deep super network comprising 20 cells was used in the evaluation phase.
The specific parameter values of the super networks employed in the search (S) and evaluation (E) phases are listed in Table 1. In both phases, the two reduction cells were located at 1/3 and 2/3 of the depth of the super network. For the 50 training epochs applied to the super network of the search phase, only the network parameters (w) were updated during the first 15 epochs using the stochastic gradient descent (SGD) optimizer, while both the network and architectural parameters ($\alpha$ and $\beta$) were updated simultaneously from the 16th epoch onward, with the architectural parameters handled by the Adam optimizer. In contrast, only the network parameters were updated during the 350 training epochs applied to the super network of the evaluation phase, again using the SGD optimizer. In the architecture evaluation phase, the hyperparameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$ of the optimal super network architecture determined in the architecture search phase were employed as fixed values, and the super network was trained from scratch to optimize the weights w. In addition, the learning rate was initialized to the #LR value in Table 1 and adjusted with cosine annealing over the training epochs until reaching zero.
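The optimizer and scheduler settings in Table 1 translate into the following sketch; build_search_optimizers and the weight_parameters()/arch_parameters() accessors are assumed names, not the authors' code.

```python
import torch

def build_search_optimizers(model, epochs=50):
    """Optimizer/scheduler setup matching the search-phase settings in Table 1.
    model.weight_parameters() and model.arch_parameters() are assumed accessors."""
    w_optimizer = torch.optim.SGD(model.weight_parameters(),
                                  lr=0.025, momentum=0.9, weight_decay=3e-4)
    arch_optimizer = torch.optim.Adam(model.arch_parameters(),
                                      lr=6e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    # cosine annealing of the initial learning rate down to zero over all epochs
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs, eta_min=0.0)
    return w_optimizer, arch_optimizer, scheduler

# During the first 15 epochs only w_optimizer is stepped; from epoch 16 onward the
# architecture parameters are updated as well (e.g., with search_epoch above).
```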
The experiments were divided into the following three parts: architecture search performance, where the outcomes of the proposed method were compared with those obtained using the DARTS and DDSAS methods; architecture evaluation, where the classification performance of the CNN obtained by the proposed method was compared with that of a previous CNN designed manually for the same image classification task of detecting ships and icebergs [1]; and the robustness of the DARTS, DDSAS, and proposed NAS methods, evaluated by comparing the validation accuracies of CNNs designed using the different methods under different random seed points, different numbers of training epochs, and different numbers of nodes applied in each cell of the super network. The computational cost and classification accuracy obtained when applying different binary gates were also compared. All experiments were conducted on a Tesla A100 graphics processing unit (GPU) using Python 3.8.

3.1. Architecture Search Performance

The architecture search was conducted to find the optimal cell structure and the two optimal incoming edges of each intermediate node in the directed graph shown on the left side of Figure 3. Equation (2), based on the hyperparameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$, determines the weights of the edges. These operations were implemented by modifying the forward function in PyTorch to choose the best operation for these selected edges from eight candidate operations and their corresponding convolutional mask sizes, including none, max pooling (max_pool_3×3), average pooling (avg_pool_3×3), skip connections (skip_connect), separable convolution (sep_conv_3×3), sep_conv_5×5, dilated convolution (dil_conv_3×3), and dil_conv_5×5. To facilitate comparisons with DDSAS and DARTS, we used the same cell structure as those methods. The basic cell architecture employed seven nodes per cell, which included two input nodes $c_{k-2}$ and $c_{k-1}$ representing the outputs of the two previous cells, one output node $c_{k}$, and four intermediate nodes labeled 0, 1, 2, and 3. Each intermediate node had two incoming edges, representing the two operations with the highest weight values during the architecture search phase. The maximum weights were determined using $\max_{o \in O,\, o \neq zero} \alpha_{i,j}^{o}$, where the two largest weights were retained and the other edges connecting to node $j$ were pruned. A cell has 14 edges, and one of the above-discussed eight candidate operations was applied to each edge. The normal cells and the reduction cells were stacked one after the other in the super network and shared these weights. The optimal normal and reduction cell architectures obtained by the proposed NAS method are presented in Figure 8a,b, respectively. The corresponding network architectures obtained by the DDSAS and DARTS methods are presented in Figure 9 and Figure 10, respectively.
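The discretization step described above (retaining, for each intermediate node, the two incoming edges with the strongest non-zero operation weight and the argmax operation on each) can be sketched as follows; derive_cell and the alpha dictionary layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

OPS = ["none", "max_pool_3x3", "avg_pool_3x3", "skip_connect",
       "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]

def derive_cell(alpha, n_intermediate=4):
    """alpha[(i, j)] holds the operation weights of edge i -> j. For every intermediate
    node j, the two incoming edges with the largest non-'none' operation weight are kept,
    together with the strongest operation on each kept edge."""
    cell = []
    for j in range(2, 2 + n_intermediate):                     # intermediate nodes follow the 2 inputs
        scored = []
        for i in range(j):                                      # all candidate predecessor nodes
            w = F.softmax(alpha[(i, j)].detach(), dim=0)
            w[OPS.index("none")] = -1.0                         # exclude the zero operation
            best = int(torch.argmax(w))
            scored.append((float(w[best]), i, OPS[best]))
        for _, i, op in sorted(scored, reverse=True)[:2]:       # retain the two strongest edges
            cell.append((op, i, j))
    return cell
```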
During the search phase, 50 epochs were executed on the shallow super network structure of eight cells to obtain candidate super network structures. The evaluation phase involved executing 350 epochs on the deep super network structure of 20 cells to obtain learning parameters and optimize the model. The experimental selection included DDSAS and DARTS as baselines, with the parameter settings for the three methods shown in Table 1. The performance evaluation results obtained during the architecture search phase are shown in Table 2.
Table 2 shows the performance evaluation obtained during the search phase. The results indicate that the DARTS method achieved the greatest training and validation accuracies of all methods considered, while the DDSAS method obtained the lowest accuracies. In contrast, the proposed method's validation accuracy of 85.3% is 15.1% higher than the DDSAS method's 70.2% but 4.5% lower than the DARTS method's 89.8%. However, the DARTS method required the longest search time among the three methods considered, while the search time of the proposed method was 88% lower than this maximum value. In addition, the proposed method required 84% fewer network parameters than the other two methods. Accordingly, the proposed method decreased the search time by a factor greater than 8 relative to that of the DARTS method while sacrificing little classification accuracy.
The training and validation accuracies should be as close as possible, indicating that the model performs similarly on the training and validation data and therefore generalizes well. A significant difference between the training and validation accuracy may lead to an insufficient generalization performance of the model in practical applications, making it challenging to handle new, unseen data effectively. In Table 2, for the search phase, the difference between the training and validation accuracy for DARTS is 9.753%, while for the proposed method, it is 7.464%. Comparatively, the proposed method shows a better generalization performance. For the evaluation phase, the results for the three methods are shown in Table 3. Compared to DARTS, the proposed method has a 1.5% higher validation accuracy, and its validation accuracy is 9.04% higher than that of DDSAS, while still maintaining the advantages in time and space observed in the search phase.

3.2. Architecture Evaluation

The proposed method uses the cell structure shown in Figure 8, while the manually designed network in [1] consists of four convolutional layers and pooling layers stacked alternately, as shown in Figure 11. Table 4 compares the classification performance of the CNN generated by the proposed method with that of the manually designed CNN previously proposed for the same image classification task in [1]. The training accuracies of the two methods are essentially equivalent, reaching 99%. The validation accuracy of the proposed method, at 99.22%, is 2.79% higher than the validation accuracy of 96.43% for the method described in [1]. This indicates that the network structure obtained using the architecture search method has a better generalization performance than the manually designed model in [1]. Figure 12 shows the observed classification performance of this method over 350 training epochs.

3.3. Method Robustness

Random seeds during the search process were set to maintain model stability; however, evaluating an excessive number of seeds increases the computational cost and wastes resources. Table 5 lists the validation accuracy of cell structures designed using DARTS, DDSAS, and the proposed NAS method under different random seeds, numbers of training epochs, and numbers of nodes applied to each cell of the super network during the search phase. Five scenarios were attempted with random seeds 0, 1, 2, 3, and 4. The standard deviation for the DARTS method was ±0.161; for DDSAS, it was ±0.289; and for the proposed method, it was ±0.158. This indicates that the proposed method is less affected by parameter randomness, demonstrating better performance stability than the other two methods.
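A typical way to fix the random seeds used in such a robustness study is sketched below; this is a generic PyTorch recipe rather than the authors' exact configuration.

```python
import random
import numpy as np
import torch

def set_seed(seed: int):
    """Fixes the random seeds (0-4 in the robustness study) so that channel sampling,
    data shuffling, and weight initialization are reproducible across runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```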
Four scenarios were attempted to assess the influence of epochs on the architectural performance during the search phase, with epochs set at 50, 75, 100, and 125. The standard deviation for DARTS was ±1.01; for DDSAS, it was ±4.98; and for our method, it was ±1.72. DDSAS showed the highest standard deviation in validation accuracy, indicating that the current architecture search time is insufficient for obtaining the optimal cell structure. In considering the trade-off between search cost and accuracy, extending the search phase for DARTS and the proposed method would not yield significant benefits. Therefore, setting the epoch at 50 is ideal.
Finally, the search space was expanded by trying five, six, and seven nodes per cell, respectively. Increasing the search space enhances the model's expressive power, but a more extensive search space does not always yield better results and increases model complexity. A more comprehensive search space can lead to overfitting if the dataset is simple. Performance improved for DDSAS and the proposed algorithm from five to six nodes; however, performance declined for all three algorithms from six to seven. Therefore, selecting six nodes during the search phase is ideal.
Using the channel decomposition method shown in Figure 2, the search phase was run with different combinations of bands (e.g., HH and HV) to measure the required search time, training accuracy (TA), and validation accuracy (VA). Different binary gate configurations B1, B2, and B3 were evaluated for various numbers of training epochs (50, 75, 100, and 125), with the results shown in Table 6. Here, a binary gate value of 1 indicates that the corresponding band passed through the gate (the channel is open), while a value of 0 indicates that it did not. The remaining parameter settings for the search phase are as shown in Table 1.
The dataset has two bands and an incidence angle. The incidence angle was considered as an additional feature and input into the network for training together with the bands: after being processed into the same data format as HH and HV, it passes through gate B3 and enters the channels shown in Figure 2, and the results are shown in Table 6. When opening channel B2 and closing channels B1 and B3, with 50 epochs, the validation accuracy is only 67.5%, and the significant gap between the training and validation accuracy suggests overfitting. Analysis reveals that this is mainly due to the dataset's relative simplicity, where the model has reached a performance bottleneck. Table 6 shows that overfitting is also present at epochs 75, 100, and 125, leading to suboptimal validation accuracy values. Opening channels B2 and B3 results in a validation accuracy of 79.23%, while opening channels B1, B2, and B3 simultaneously leads to a validation accuracy of 88.91%. Compared to the single-channel scenario, the dual-channel and triple-channel configurations increase the validation accuracy by 11.73% and 21.41%, respectively, achieving significant improvement, albeit with a corresponding increase in search time. Remote sensing images contain information from multiple bands, but only some bands' data are necessary for classification; combining certain bands can yield better results, balancing efficiency and accuracy. Designing multiple channels also allows for better parallelism, which is an area that future researchers will continue to explore. Channel design is currently implemented in software, with plans for hardware implementations to further improve time complexity.

4. Conclusions

The present work addressed the computational intensiveness of existing NAS algorithms, which require a large amount of data and computational resources, by proposing a differentiable neural architecture search method specifically designed for remote sensing image classification. The number of parameters in the network was reduced by limiting the number of connections between neurons based on a binary gate strategy for implementing partial channel connections, which generates a sparse connectivity pattern that decreases memory consumption and reduces the computational overhead of the NAS process. Meanwhile, edge normalization was applied to improve the stability of the search process. The experimental results of detecting ships and icebergs using C-band radar data from the Sentinel-1 satellite mission indicate that our proposed method reduces the required network parameters by 84% compared to the DARTS and DDSAS methods, and it reduces the computational time required for developing networks specific to this image classification task by more than a factor of eight relative to DARTS. Our proposed method thus significantly simplifies the automated design process while sacrificing little classification accuracy.

Author Contributions

Conceptualization, L.S. and L.D.; methodology, L.S., M.Y., W.D. and Z.Z.; software, L.S. and W.D.; validation, L.S.; data curation, W.D. and Z.Z.; writing—original draft, L.S.; writing—review and editing, L.S. and L.D.; visualization, M.Y.; supervision, L.D.; project administration, Z.Z. and C.X.; funding acquisition, L.S., L.D., W.D. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number: 62341204); the National Social Science Fund of China, "Research on Virtual Reality Media Narrative" (grant number: 21&ZD326); the Open Fund of the Wuhan Gravitation and Solid Earth Tide National Observation and Research Station (grant number: WHYWZ202109); and the Natural Science Foundation of Hunan Province, China (grant number: 2022JJ50051).

Data Availability Statement

The data used in this study originate from an internally developed modeling software that is not publicly available. Unfortunately, we lack the authorization to publicly disclose the data generated by this software.

Conflicts of Interest

Authors L.S. and C.X. were employed by the company Jiangxi Xintong Machinery Manufacturing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Song, L.; Peters, D.K.; Huang, W.; Power, D. Ship-iceberg discrimination from Sentinel-1 SAR data using parallel CNN. Concurr. Comput. Pract. Exp. 2021, 33, e6297.
2. Song, L.; Ding, L.; Wen, T.; Yin, M.; Zeng, Z. Time series change detection using reservoir computing networks for remote sensing data. Int. J. Intell. Syst. 2022, 37, 10845–10860.
3. Zhang, X.; Li, Y.; Zhang, X.; Wang, Y.; Sun, J. Differentiable Architecture Search with Random Features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16060–16069.
4. Guo, Z.; Zhang, X.; Mu, H.; Heng, W.; Liu, Z.; Wei, Y.; Sun, J. Single Path One-Shot Neural Architecture Search with Uniform Sampling. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 544–560.
5. Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–16.
6. Wang, H.; Yang, R.; Huang, D.; Wan, Y. iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 1945–1957.
7. Ren, P.; Xia, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv. 2020, 54, 1–34.
8. Wan, A.; Dai, X.; Zhang, P.; He, Z.; Tian, Y.; Xie, S.; Wu, B.; Yu, M.; Xu, T.; Chen, K.; et al. FBNetV2: Differentiable neural architecture search for spatial and channel dimensions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12962–12971.
9. Song, D.; Xu, C.; Jia, X.; Chen, Y.; Xu, C.; Wang, Y. Efficient residual dense block search for image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12007–12014.
10. Chang, J.; Zhang, X.; Guo, Y.; Meng, G.; Xiang, S.; Pan, C. DATA: Differentiable ArchiTecture Approximation. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 876–886.
11. Mellor, J.; Turner, J.; Storkey, A.; Crowley, E.J. Neural architecture search without training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 7588–7598.
12. Chu, X.; Zhang, B.; Xu, R. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12239–12248.
13. Liu, Y.; Sun, Y.; Xue, B.; Zhang, M.; Yen, G.; Tan, K.C. A Survey on Evolutionary Neural Architecture Search. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 550–570.
14. Santra, S.; Hsieh, J.W.; Lin, C.F. Gradient Descent Effects on Differential Neural Architecture Search. IEEE Access 2021, 9, 89602–89618.
15. Zhou, H.; Yang, M.; Wang, J.; Pan, W. BayesNAS: A Bayesian Approach for Neural Architecture Search. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7603–7613.
16. Yu, K.; Sciuto, C.; Jaggi, M.; Musat, C.; Salzmann, M. Evaluating the search phase of neural architecture search. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 1–16.
17. Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2902–2911.
18. Wang, W.; Zhang, X.; Cui, H.; Yin, H.; Zhang, Y. FP-DARTS: Fast parallel differentiable neural architecture search for image classification. Pattern Recognit. 2023, 136, 109193.
19. Jin, C.; Huang, J.; Wei, T.; Chen, Y. Neural architecture search based on dual attention mechanism for image classification. Math. Biosci. Eng. 2022, 20, 2691–2715.
20. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710.
21. Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 4780–4789.
22. Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameters sharing. J. Mach. Learn. Res. 2018, 80, 4095–4104.
23. Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–13.
24. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 1126–1135.
25. Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.J.; Tian, Q.; Xiong, H. PC-DARTS: Partial channel connections for memory-efficient architecture search. In Proceedings of the International Conference on Learning Representations, Virtual, 26 April–1 May 2020; pp. 1–13.
26. Yang, L.; Hu, Y.; Lu, S.; Sun, Z.; Mei, J.; Zeng, Y.; Shi, Z.; Han, Y.; Li, X. DDSAS: Dynamic and differentiable space-architecture search. In Proceedings of the 13th Asian Conference on Machine Learning, Virtual, 17–19 November 2021; pp. 284–299.
Figure 1. Super network exploited by the DARTS algorithm for conducting NAS.
Figure 2. Process for decomposing the information from the multiple bands of a remote sensing image.
Figure 3. Implementation of partial channel connections.
Figure 4. Implementation of edge normalization.
Figure 5. A sample ship: (a) 2D display; (b) 3D display.
Figure 6. A sample iceberg: (a) 2D display; (b) 3D display.
Figure 7. Two phases of the search method.
Figure 8. Optimal cell architectures obtained using the proposed NAS method.
Figure 9. Optimal cell architectures obtained using the DDSAS method.
Figure 10. Optimal cell architectures obtained using the DARTS method.
Figure 11. Manual CNN architecture.
Figure 12. Classification performance of the proposed method during the evaluation phase.
Table 1. Parameters of the super networks employed in the search (S) and evaluation (E) phases. #NC is the number of normal cells, #RC is the number of reduction cells, #C is the number of channels, and #CS is the channel sampling rate, which is defined as a value relative to the number of feature maps. #EP is the number of training epochs, #PAR is the parameter size, #BZ is the batch size, #DPR is the DropPath probability, #OPT is the optimizer, #LR is the initial learning rate, #MOM is the momentum, and #WD is the weight decay.

Phase | P | #NC | #RC | #C | #CS | #EP | #PAR | #BZ | #DPR | #OPT | #LR | #MOM | #WD
S | w | 6 | 2 | 16 | 1/4 | 50 | 0.3 M | 128 | 0.3 | SGD | 0.025 | 0.9 | 3 × 10⁻⁴
S | α, β | | | | | | | | | Adam | 6 × 10⁻⁴ | (0.5, 0.999) | 1 × 10⁻³
E | w | 18 | 2 | 36 | 1/4 | 350 | 3.63 M | 100 | 0.2 | SGD | 0.025 | 0.9 | 3 × 10⁻⁴
Table 2. Comparison of performances of different search methods in the search phase.

Method | Parameters | Time (s) | Training Accuracy | Validation Accuracy | Epochs
DARTS | 1.93 M | 52,748 | 99.552 | 89.8 | 50
DDSAS | 1.93 M | 23,408 | 74.592 | 70.184 | 50
Proposed method | 0.3 M | 6119 | 92.772 | 85.308 | 50
Table 3. Comparison of performances of different search methods in the evaluation phase.

Method | Parameters | Time (s) | Training Accuracy | Validation Accuracy | Epochs
DARTS | 3.94 M | 18,461 | 99.99 | 97.74 | 350
DDSAS | 4.24 M | 8192 | 99.592 | 90.184 | 350
Proposed method | 2.21 M | 2153 | 99.84 | 99.22 | 350
Table 4. Comparison of the classification accuracies obtained using the proposed method and the manually designed CNN [1].

Model | Training Accuracy | Validation Accuracy | Search Method
Our method | 99.84 | 99.22 | gradient
The method in [1] | 99.89 | 96.43 | manual
Table 5. Validation accuracies of CNNs designed using different methods under different random seed points and different hyperparameter values.

Method | Random Seeds (0 / 1 / 2 / 3 / 4) | Epochs (50 / 75 / 100 / 125) | Nodes (5 / 6 / 7)
DARTS | 89.421 / 89.458 / 89.82 / 89.79 / 89.663 | 89.8 / 90.67 / 91.32 / 92.18 | 89.92 / 89.8 / 89.69
DDSAS | 70.1 / 69.94 / 71.184 / 70.79 / 70.21 | 70.18 / 76.33 / 79.25 / 81.75 | 70.08 / 70.18 / 70.15
Our method | 85.072 / 85.076 / 85.308 / 84.82 / 84.988 | 85.31 / 87.18 / 88.43 / 89.27 | 85.22 / 85.31 / 85.24
Table 6. Comparison of the search time and classification accuracy of the proposed method when employing different binary gates B1, B2, and B3; 50, 75, 100, and 125 represent epochs.

B1 | B2 | B3 | 50 epochs (TA / VA / Time (s)) | 75 epochs (TA / VA / Time (s)) | 100 epochs (TA / VA / Time (s)) | 125 epochs (TA / VA / Time (s))
0 | 1 | 0 | 83.13 / 67.5 / 1341 | 85 / 61.88 / 2582 | 90.63 / 61.88 / 3452 | 96.88 / 76.88 / 4322
0 | 1 | 1 | 86.52 / 79.23 / 4986 | 88.64 / 82.81 / 8049 | 92.76 / 84.9 / 10,741 | 94.48 / 87.02 / 13,433
1 | 0 | 0 | 83.13 / 67.5 / 1384 | 85 / 68.13 / 2567 | 90 / 60 / 3466 | 93.13 / 71.88 / 4350
1 | 0 | 1 | 86.93 / 77.88 / 4962 | 89.62 / 79.63 / 7934 | 93.97 / 82.17 / 10,622 | 95.73 / 83.33 / 13,295
1 | 1 | 0 | 92.77 / 85.31 / 6119 | 95.74 / 88.27 / 9528 | 97.06 / 88.35 / 12,923 | 97.53 / 88.9 / 16,317
1 | 1 | 1 | 94.48 / 88.91 / 9764 | 97.78 / 91.34 / 14,996 | 99.33 / 94.35 / 20,213 | 99.76 / 96.65 / 25,285

