Article

Remote Sensing Image Classification Based on Neural Networks Designed Using an Efficient Neural Architecture Search Methodology

1 School of Information Engineering, East China Jiaotong University, Nanchang 330013, China
2 School of Computer Science, Wuhan University, Wuhan 430072, China
3 Jiangxi Xintong Machinery Manufacturing Co., Ltd., Pingxiang 330075, China
4 School of Computer and Information Science, Hubei Engineering University, Xiaogan 432100, China
5 Gravitation and Earth Tide, National Observation and Research Station, Wuhan 430071, China
6 School of Computer Science, Hunan University of Technology, Zhuzhou 412007, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1563; https://doi.org/10.3390/math12101563
Submission received: 25 March 2024 / Revised: 2 May 2024 / Accepted: 15 May 2024 / Published: 17 May 2024
(This article belongs to the Special Issue Deep Learning and Adaptive Control, 2nd Edition)

Abstract

Successful applications of machine learning to the analysis of remote sensing images remain limited by the difficulty of designing neural networks manually. Neural architecture search (NAS) offers the potential to discover new and more effective network architectures automatically, but existing NAS algorithms are computationally intensive, require large amounts of data, and are therefore difficult to apply when developing optimal neural network architectures for remote sensing image classification. We propose a differentiable neural architecture search method for remote sensing image classification. The method uses a binary gate strategy for partial channel connections to reduce the size of the network parameters, creating a sparse connection pattern that lowers memory consumption and the computational cost of NAS. Experimental results indicate that our method achieves a 15.1% increase in validation accuracy during the search phase compared to DDSAS, although its accuracy is slightly lower (by 4.5%) than that of DARTS. However, we reduced the search time by 88% and the network parameter size by 84% compared to DARTS. In the architecture evaluation phase, our method demonstrates a 2.79% improvement in validation accuracy over a manually configured CNN network.

1. Introduction

Remote sensing typically applies optical imaging obtained from satellite or aircraft platforms to detect and monitor the physical characteristics of a geographical area by measuring its reflected and emitted radiation at a distance. As such, remote sensing images are composed of multiple bands, where each band represents a specific wavelength range of electromagnetic radiation corresponding to different features or characteristics such as vegetation, water, or urban areas. These properties of remote sensing images make them ideally suited for automated analysis using machine learning approaches based on neural networks. For example, neural networks are utilized to capture the correlation among multiple spectra of remote sensing images, perform land cover classification and object detection, analyze remote sensing images that change over time, model the time dependence of remote sensing images, etc. [1,2]. Nonetheless, despite the many successful applications of machine learning toward the analysis of remote sensing images, neural networks remain quite difficult to design manually owing to the great many hyperparameters involved, such as network layer types and activation functions [3,4].
This issue has been addressed in recent years through the development of automatic search processes, denoted as neural architecture search (NAS), for determining the optimal neural network architecture for a given task without relying on human expertise [5,6,7]. Generally, NAS functions by defining a network hyperparameter search space and applying a search strategy based on a number of methods [8,9,10,11], such as reinforcement learning (RL) [5], evolutionary (EL) algorithms [12,13], gradient-based methods [14], and Bayesian optimization [15]. These methods differ in terms of computational complexity, scalability, and the ability to consider different search spaces [16]. For example, the NAS process based on RL has been demonstrated to design a neural network architecture from scratch that performs as well as or better than the best human-designed architecture in terms of accuracy for identifying object classes within the CIFAR-10 dataset [5]. However, while this work overcame some limitations of conventional NAS algorithms, it relied on heuristics and other manually developed methods to guide the search process. The feasibility of automating the design of network architectures was further confirmed by the application of a large-scale EL algorithm for image classification applications [17]. However, the NAS process involves increasingly high computational costs as the architectural search space increases [18]. Moreover, each candidate model cannot fully reuse the structure and trained parameters of previously evaluated models. Hence, the search process can be very computationally intensive [19].
Several studies have sought to address this issue by simplifying the NAS process. For example, the NAS process has been simplified by first searching for architectural building blocks on a small dataset and then transferring the obtained blocks to a larger dataset [20]. A similar process was applied in conjunction with a genetic algorithm to generate optimal network architectures that managed to surpass the performances of the best human-designed architectures [21]. In another work, a controller was applied to search for the best subgraph within a large computational graph representing the neural network architecture, and the efficiency of the search process was increased by sharing parameters between the subgraphs [22]. However, truly significant reductions in the computational cost of the NAS process have been achieved using the differentiable architecture search (DARTS) algorithm [23]. DARTS introduces the concept of continuous relaxation, where softmax relaxation is applied in the discrete search space. The DARTS algorithm is able to overcome the limitations of conventional NAS approaches by treating the search process as a continuous optimization problem, similar to the gradient approximation technique used by Finn et al. [24]. As such, the search process can be completed with only a single super network, and it therefore avoids repeatedly training multiple models. Moreover, a number of other NAS algorithms have been proposed based on DARTS. For example, partially connected DARTS (PC-DARTS) uses partial channel connection technology, where the search is conducted based on a subset of randomly selected channels, and edge normalization is applied to prevent instability from arising in the search process [25]. Dynamic and Differentiable Space-Architecture Search (DDSAS) generates a network architectural space and dynamically samples that space using the gradient descent algorithm, where the sampling process is guided by the upper confidence bound (UCB) to balance the exploitation and exploration of the search process, thereby preventing the solution from becoming trapped within local optima [26]. Nonetheless, the DARTS and DDSAS algorithms remain computationally intensive methods that require a large amount of data and computational resources. Accordingly, applying these algorithms to developing optimal neural network architectures for remote sensing image classification remains a challenging issue.
The present work addresses this issue by proposing a differentiable neural architecture search method specifically designed for remote sensing image classification. The number of parameters in the network is reduced by limiting the number of connections between neurons based on a binary gate strategy for implementing partial channel connections. This sparse connectivity pattern allows for lower memory consumption and reduces the computational overhead of the search process. In addition, we apply edge normalization to improve the stability of the search process. Our main contributions can be summarized as follows. Firstly, our proposed method was compared with DDSAS and DARTS during the search phase. Our method's validation accuracy of 85.3% is 15.1% higher than DDSAS's 70.2%, although it is 4.5% lower than DARTS's 89.8%. However, our method reduces the search time by 88% compared to DARTS, and the number of network parameters required is reduced by 84% compared to the other two methods. Secondly, during the architecture evaluation phase, the experimental validation accuracy achieved using our proposed method is 2.79% higher than that of the manually configured CNN described in Reference [1]. The robustness experiments demonstrate that the proposed method exhibits good generalization and stability.

2. A Neural Network Architecture Search Method for Remote Sensing Images

The various hyperparameters in the NAS space exploited by the DARTS algorithm represent the super network illustrated in Figure 1, where the network consists of cells, and the cells consist of nodes.
The computational complexity of the DARTS algorithm is reduced by decomposing the information from the multiple bands of remote sensing images according to the scheme illustrated in Figure 2. Not only does this reduce the dimensionality of the image, but each of the decomposed bands of the remote sensing image is processed or analyzed independently to extract relevant features or components, which reduces the training time. Comparing Figure 1 and Figure 2 indicates that the generic cells in Figure 1 are structured as normal cells and reduction cells, where a reduction cell is connected after several consecutive normal cells. With the exception of the first cell, the inputs of all remaining cells are the outputs of the two previous cells. The first cell is special because its input is specified as the images of all n bands, where n bands are input into the search network in parallel. The individual operations conducted in the first cell are illustrated in Figure 3, where the n decomposed bands are assigned partial channel connections and subjected to edge normalization.
The specific details regarding the implementation of partial channel connections are illustrated in Figure 3. As can be seen, the information flow in the directed graph from source node $x_i$ traverses eight possible operators on its way to destination node $x_j$. This process can be represented by the function $f_{pc}^{i,j}(x_i; \alpha_{i,j}^{O})$, where the weight of this edge is determined by the architectural parameter $\alpha_{i,j}^{O}$. Here, the superscript $O$ denotes the search space, $x_i$ represents the output of node $i$, $\alpha_{i,j}^{O}$ represents the weights of the candidate operations ($O = \{o_l\}$, $l = 1, \dots, m$), and $m$ is the number of candidate operations (e.g., convolution, pooling, skip connection, identity, etc.). The goal of network training is to find the operation with the maximum weight. The advantage of the random channel sampling strategy is that the operations on each path are trained uniformly, making the super network appear more random and the search method more competitive.
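The partial channel connection described above can be sketched in PyTorch as follows. This is an illustrative sketch rather than the authors' implementation: the class name PartialChannelMixedOp, the constructor-style candidate_ops argument, and the default sampling ratio k = 4 (matching the 1/4 channel sampling rate in Table 1) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartialChannelMixedOp(nn.Module):
    """Mixes the candidate operations on a randomly sampled 1/k subset of the channels;
    the remaining channels bypass the operators unchanged (partial channel connection)."""

    def __init__(self, channels, candidate_ops, k=4):
        super().__init__()
        self.k = k
        # each entry of candidate_ops is a constructor taking the number of channels it operates on
        self.ops = nn.ModuleList(op(channels // k) for op in candidate_ops)
        # one architecture weight alpha per candidate operation on this edge
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(self.ops)))

    def forward(self, x):
        c = x.size(1)
        idx = torch.randperm(c, device=x.device)                  # random channel sampling mask
        sampled, rest = x[:, idx[: c // self.k]], x[:, idx[c // self.k:]]
        weights = F.softmax(self.alpha, dim=0)                    # softmax relaxation over operators
        mixed = sum(w * op(sampled) for w, op in zip(weights, self.ops))
        return torch.cat([mixed, rest], dim=1)                    # untouched channels are passed through
```

Here, candidate_ops would supply the eight operations of the search space (for instance, lambda c: nn.Conv2d(c, c, 3, padding=1) as a stand-in for a stride-1 3×3 convolution that preserves the spatial dimensions).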
In the early stages of the search process, the search algorithm tends to favor weight-free operations, such as skip connections and max pooling, because these operations produce consistent results without trainable weights. In contrast, weighted operations, such as convolution, can produce inconsistent results during gradient optimization; even if they are well optimized later, they cannot outcompete the weight-free operators. This advantage of the weight-free operators is mitigated by applying edge normalization, which distributes the weights over all operations in the directed graph.
The implementation of edge normalization is illustrated in Figure 4. As can be seen, the information flow between two nodes is represented by the function $f_{en}^{i,j}(x_i; \beta_{i,j})$, where $\beta_{i,j}$ is an architectural parameter that is applied to calculate the weight of the edge as $\exp\{\beta_{i,j}\} / \sum_{i'<j} \exp\{\beta_{i',j}\}$. After the NAS process is completed, the weights of the edges $(i, j)$ in the directed graph are determined using the architectural parameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$. Because the weights are shared for each operation during training, the learned parameters are less sensitive to channel sampling, which makes the network search process more stable.
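A minimal sketch of edge normalization is given below, assuming each incoming edge (i, j) of a node is represented by a mixed operation module such as the one above; the class name EdgeNormalizedNode and its interface are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeNormalizedNode(nn.Module):
    """Aggregates the outputs of all incoming edges of an intermediate node, weighting each
    edge by softmax-normalized edge parameters beta (edge normalization)."""

    def __init__(self, edge_ops):
        super().__init__()
        self.edge_ops = nn.ModuleList(edge_ops)            # one mixed operation per incoming edge (i, j)
        self.beta = nn.Parameter(1e-3 * torch.randn(len(self.edge_ops)))

    def forward(self, inputs):
        # inputs[i] is the output x_i of predecessor node i
        edge_weights = F.softmax(self.beta, dim=0)          # exp(beta_ij) / sum_{i'<j} exp(beta_i'j)
        return sum(w * op(x) for w, x, op in zip(edge_weights, inputs, self.edge_ops))
```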
The architectural parameters α and β and convolutional network weights w are updated according to the following bi-optimization function:
$$\min_{\alpha,\beta} \; L_{val}\big(w^{*}(\alpha,\beta), \alpha, \beta\big), \quad \text{s.t.} \; w^{*}(\alpha,\beta) = \arg\min_{w} L_{train}(w, \alpha, \beta) \quad (1)$$
Here, $L_{val}$ and $L_{train}$ are the cross-entropy loss functions evaluated on the validation and training datasets, respectively.
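The alternating update implied by Equation (1) can be sketched as a first-order approximation, in the style of DARTS; the function name search_epoch and the assumption that the weight and architecture parameters have already been separated into two optimizers are illustrative.

```python
import torch

def search_epoch(model, train_loader, val_loader, w_optimizer, arch_optimizer, criterion, device):
    """One epoch of first-order bi-level optimization: the weights w are updated on the
    training split, the architecture parameters (alpha, beta) on the validation split."""
    model.train()
    for (x_tr, y_tr), (x_va, y_va) in zip(train_loader, val_loader):
        x_tr, y_tr = x_tr.to(device), y_tr.to(device)
        x_va, y_va = x_va.to(device), y_va.to(device)

        # architecture step: minimize L_val with respect to alpha and beta
        arch_optimizer.zero_grad()
        criterion(model(x_va), y_va).backward()
        arch_optimizer.step()

        # weight step: minimize L_train with respect to w
        w_optimizer.zero_grad()
        criterion(model(x_tr), y_tr).backward()
        w_optimizer.step()
```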
In considering information flow along a particular edge from node i to node j, the output of node j is represented by the following function f:
$$f_{i,j}(x_{b_k}; \alpha_{i,j}^{O}, \beta_{i,j}) = f_{pc}^{i,j}(x_{b_k}; \alpha_{i,j}^{O}) \cdot f_{en}^{i,j}(x_{b_k}; \beta_{i,j}) = \left( \sum_{o \in O} \frac{\exp\{\alpha_{i,j}^{o}\}}{\sum_{o' \in O} \exp\{\alpha_{i,j}^{o'}\}} \cdot o(s_{i,j} \times x_i) + (1 - s_{i,j}) \times x_i \right) \cdot \frac{\exp\{\beta_{i,j}\}}{\sum_{i'<j} \exp\{\beta_{i',j}\}} \quad (2)$$
Here, $s_{i,j}$ is the channel sampling mask containing only values of 0 and 1, and $o$ represents an operation applied on the directed edge from $x_i$. $f_{pc}^{i,j}$ represents the function obtained after sampling some of the channels, and $f_{en}^{i,j}$ represents the edge normalization function. $O$ represents the search space, $\alpha_{i,j}^{O}$ represents the weights of the directed edge flowing from node $i$ to node $j$ over the search space $O$, and $\beta_{i,j}$ represents the parameter of the directed edge flowing from node $i$ to node $j$.
Finally, returning to Figure 2, we see that the final image is reconstructed from the n decomposed bands after being assigned partial channel connections and subjected to edge normalization as follows:
$$\mathrm{output} = \sum_{k=1}^{n} gate_k\big(f(x_{b_k}; \alpha^{O}, \beta)\big) \cdot \frac{\exp\{p_k\}}{\sum_{l=1}^{n} \exp\{p_l\}} \quad (3)$$
Here, $gate_k$ represents the kth binary gate operation, which is defined as follows:
$$gate_k(z) = \begin{cases} 1 \times z, & \text{with probability } p_k \\ 0, & \text{with probability } 1 - p_k \end{cases} \quad (4)$$
Here, $p_k$ represents the probability that the band image passes through the kth binary gate, and $z$ represents the band image. Finally, the outputs are concatenated into the final image, as represented by the symbol ⊕ in Figure 2.
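A possible PyTorch sketch of the band gating and fusion in Equations (3) and (4) is given below. It is an assumption-laden illustration: the gate probabilities $p_k$ are parameterized here through a sigmoid over learnable logits, the gated outputs are fused by a weighted sum for simplicity (the paper concatenates them, denoted ⊕ in Figure 2), and the class name BandGateCombiner is invented.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BandGateCombiner(nn.Module):
    """Applies a stochastic binary gate to every band's sub-network output and fuses the
    gated outputs with softmax-normalized band probabilities, following Equations (3)-(4)."""

    def __init__(self, n_bands):
        super().__init__()
        self.p_logits = nn.Parameter(torch.zeros(n_bands))      # learnable logits behind p_k

    def forward(self, band_features):
        # band_features: list of n tensors f(x_{b_k}; alpha, beta) with identical shapes
        p = torch.sigmoid(self.p_logits)                         # gate-opening probabilities p_k
        weights = F.softmax(p, dim=0)                            # exp(p_k) / sum_l exp(p_l)
        fused = 0.0
        for k, z in enumerate(band_features):
            # during training the gate is sampled (1 with probability p_k, otherwise 0);
            # at evaluation time its expectation p_k is used instead
            gate = torch.bernoulli(p[k]) if self.training else p[k]
            fused = fused + gate * weights[k] * z
        return fused                                             # weighted fusion of the gated bands
```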

3. Experiment

The performance of the proposed differentiable NAS method was evaluated by applying it to develop a neural network architecture for classifying remote sensing images derived from the C-band radar data of the Sentinel-1 satellite mission for the east coast of Canada, which comprise two object classes: ships and icebergs. The C-band radar dataset was composed of 1604 image samples in total, which were partitioned into 1443 samples (90%) for the training dataset and 161 samples (10%) for the validation dataset. The training dataset included 726 samples of ships and 717 samples of icebergs; this reasonably balanced class distribution was advantageous for the training process. Figure 5 shows two-dimensional and three-dimensional displays of a sample ship, and Figure 6 shows two-dimensional and three-dimensional displays of a sample iceberg.
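The 90/10 partition described above can be reproduced with a short sketch; the full_dataset object and the seed value are placeholders.

```python
import torch
from torch.utils.data import random_split

# Hypothetical Dataset object holding the 1604 Sentinel-1 samples (ships and icebergs).
# The 90/10 partition below reproduces the 1443/161 split described in the text.
def split_dataset(full_dataset, seed=0):
    n_total = len(full_dataset)              # 1604 samples
    n_train = int(0.9 * n_total)             # 1443 training samples
    n_val = n_total - n_train                # 161 validation samples
    generator = torch.Generator().manual_seed(seed)
    return random_split(full_dataset, [n_train, n_val], generator=generator)
```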
The experimental NAS process applied here was divided into two phases: a search phase and an evaluation phase. The purpose of the search phase is to obtain the optimal set of hyperparameters $\{\alpha_{i,j}\}$ and $\{\beta_{i,j}\}$ for each edge $(i, j)$ in each cell. These parameters determine the cell with the best possible performance. In the evaluation phase, the best cells found during the search are used to build a larger architecture, which is trained on the data from scratch to verify the generalization ability of the searched network structure.
The super networks employed in the process are illustrated in Figure 7, where a shallow super network comprising 8 cells was used in the search phase, while a deep super network comprising 20 cells was used in the evaluation phase.
The specific parameter values of the super networks employed in the search (S) and evaluation (E) phases are listed in Table 1. In both phases, the two reduction cells were located at 1/3 and 2/3 of the depth of the super network. For the 50 training epochs applied to the super network of the search phase, only the network parameters (w) were updated during the first 15 epochs using the stochastic gradient descent (SGD) optimizer, while both the network and architectural parameters ($\alpha$ and $\beta$) were updated simultaneously from the 16th epoch onward, with the architectural parameters handled by the Adam optimizer. In contrast, only the network parameters were updated during the 350 training epochs applied to the super network of the evaluation phase, again using the SGD optimizer. In the architecture evaluation phase, the hyperparameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$ of the optimal super network architecture determined in the architecture search phase were employed as fixed values, and the super network was trained from scratch to optimize the weights w. In addition, the learning rate was initialized to the #LR value in Table 1 and adjusted with cosine annealing over the training epochs until reaching zero.
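The optimizer and scheduler settings in Table 1 translate into the following sketch; build_search_optimizers and the weight_parameters()/arch_parameters() accessors are assumed names, not the authors' code.

```python
import torch

def build_search_optimizers(model, epochs=50):
    """Optimizer/scheduler setup matching the search-phase settings in Table 1.
    model.weight_parameters() and model.arch_parameters() are assumed accessors."""
    w_optimizer = torch.optim.SGD(model.weight_parameters(),
                                  lr=0.025, momentum=0.9, weight_decay=3e-4)
    arch_optimizer = torch.optim.Adam(model.arch_parameters(),
                                      lr=6e-4, betas=(0.5, 0.999), weight_decay=1e-3)
    # cosine annealing of the initial learning rate down to zero over all epochs
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs, eta_min=0.0)
    return w_optimizer, arch_optimizer, scheduler

# During the first 15 epochs only w_optimizer is stepped; from epoch 16 onward the
# architecture parameters are updated as well (e.g., with search_epoch above).
```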
The experiments were divided into the following three parts: architecture search performance, where the outcomes of the proposed method were compared with those obtained using the DARTS and DDSAS methods; architecture evaluation, where the classification performance of the CNN obtained by the proposed method was compared with that of a previous CNN designed manually for the same image classification task of detecting ships and icebergs [1]; and the robustness of the DARTS, DDSAS, and proposed NAS methods, evaluated by comparing the validation accuracies of CNNs designed using the different methods under different random seed points, different numbers of training epochs, and different numbers of nodes applied in each cell of the super network. The computational cost and classification accuracy obtained when applying different binary gates were also compared. All experiments were conducted on a Tesla A100 graphics processing unit (GPU) using Python 3.8.

3.1. Architecture Search Performance

The architecture search was conducted to find the optimal cell structure and the two optimal incoming edges of each intermediate node in the directed graph shown on the left side of Figure 3. Equation (2), based on the hyperparameters $\alpha_{i,j}^{O}$ and $\beta_{i,j}$, determines the weights of the edges. These operations were implemented by modifying the forward function in PyTorch to choose the best operation for these selected edges from eight candidate operations and their corresponding convolutional mask sizes, including none, max pooling (max_pool_3×3), average pooling (avg_pool_3×3), skip connections (skip_connect), separable convolution (sep_conv_3×3), sep_conv_5×5, dilated convolution (dil_conv_3×3), and dil_conv_5×5. To facilitate comparisons with DDSAS and DARTS, we used the same cell structure as those methods. The basic cell architecture employed seven nodes per cell, which included two input nodes $c_{k-2}$ and $c_{k-1}$ representing the outputs of the two previous cells, one output node $c_{k}$, and four intermediate nodes labeled 0, 1, 2, and 3. Each intermediate node had two incoming edges, representing the two operations with the highest weight values during the architecture search phase. The maximum weights were determined using $\max_{o \in O,\, o \neq zero} \alpha_{i,j}^{o}$, where the two largest weights were retained and the other edges connecting to node $j$ were pruned. A cell has 14 edges, and one of the above-discussed eight candidate operations was applied to each edge. The normal cells and the reduction cells were stacked one after the other in the super network and shared these weights. The optimal normal and reduction cell architectures obtained by the proposed NAS method are presented in Figure 8a,b, respectively. The corresponding network architectures obtained by the DDSAS and DARTS methods are presented in Figure 9 and Figure 10, respectively.
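The discretization step described above (retaining, for each intermediate node, the two incoming edges with the strongest non-zero operation weight and the argmax operation on each) can be sketched as follows; derive_cell and the alpha dictionary layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

OPS = ["none", "max_pool_3x3", "avg_pool_3x3", "skip_connect",
       "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]

def derive_cell(alpha, n_intermediate=4):
    """alpha[(i, j)] holds the operation weights of edge i -> j. For every intermediate
    node j, the two incoming edges with the largest non-'none' operation weight are kept,
    together with the strongest operation on each kept edge."""
    cell = []
    for j in range(2, 2 + n_intermediate):                     # intermediate nodes follow the 2 inputs
        scored = []
        for i in range(j):                                      # all candidate predecessor nodes
            w = F.softmax(alpha[(i, j)].detach(), dim=0)
            w[OPS.index("none")] = -1.0                         # exclude the zero operation
            best = int(torch.argmax(w))
            scored.append((float(w[best]), i, OPS[best]))
        for _, i, op in sorted(scored, reverse=True)[:2]:       # retain the two strongest edges
            cell.append((op, i, j))
    return cell
```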
During the search phase, 50 epochs were executed on the shallow super network structure of eight cells to obtain candidate super network structures. The evaluation phase involved executing 350 epochs on the deep super network structure of 20 cells to obtain learning parameters and optimize the model. The experimental selection included DDSAS and DARTS as baselines, with the parameter settings for the three methods shown in Table 1. The performance evaluation results obtained during the architecture search phase are shown in Table 2.
Table 2 shows the performance evaluation obtained during the search phase. The results indicate that the DARTS method achieved the greatest training and validation accuracies of all methods considered, while the DDSAS method obtained the lowest accuracies. In contrast, the proposed method's validation accuracy of 85.3% is 15.1% higher than the DDSAS method's 70.2% but 4.5% lower than the DARTS method's 89.8%. However, the DARTS method required the longest search time among the three methods considered, while the search time of the proposed method was 88% lower than this maximum value. In addition, the proposed method required 84% fewer network parameters than the other two methods. Accordingly, the proposed method decreased the search time by a factor greater than 8 relative to that of the DARTS method while sacrificing little classification accuracy.
The training and validation accuracies should be as close as possible, indicating that the model performs similarly on the training and validation data and therefore generalizes well. A significant difference between the training and validation accuracy may lead to an insufficient generalization performance of the model in practical applications, making it challenging to handle new, unseen data effectively. In Table 2, for the search phase, the difference between the training and validation accuracy for DARTS is 9.753%, while for the proposed method, it is 7.464%. Comparatively, the proposed method shows a better generalization performance. For the evaluation phase, the results for the three methods are shown in Table 3. Compared to DARTS, the proposed method has a 1.5% higher validation accuracy, and its validation accuracy is 9.04% higher than that of DDSAS, while still maintaining the advantages in time and space observed in the search phase.

3.2. Architecture Evaluation

The proposed method uses the cell structure shown in Figure 8, while the manually designed network in [1] consists of four convolutional layers and pooling layers stacked alternately, as shown in Figure 11. Table 4 compares the classification performance of the CNN generated by the proposed method with that of the manually designed CNN previously proposed for the same image classification task in [1]. The training accuracies of the two methods are essentially equivalent, reaching 99%. The validation accuracy of the proposed method, at 99.22%, is 2.79% higher than the validation accuracy of 96.43% for the method described in [1]. This indicates that the network structure obtained using the architecture search method has a better generalization performance than the manually designed model in [1]. Figure 12 shows the observed classification performance of this method over 350 training epochs.

3.3. Method Robustness

Random seeds during the search process were set to maintain model stability; however, evaluating an excessive number of seeds increases the computational cost and wastes resources. Table 5 lists the validation accuracy of cell structures designed using DARTS, DDSAS, and the proposed NAS method under different random seeds, numbers of training epochs, and numbers of nodes applied to each cell of the super network during the search phase. Five scenarios were attempted with random seeds 0, 1, 2, 3, and 4. The standard deviation for the DARTS method was ±0.161; for DDSAS, it was ±0.289; and for the proposed method, it was ±0.158. This indicates that the proposed method is less affected by parameter randomness, demonstrating better performance stability than the other two methods.
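A typical way to fix the random seeds used in such a robustness study is sketched below; this is a generic PyTorch recipe rather than the authors' exact configuration.

```python
import random
import numpy as np
import torch

def set_seed(seed: int):
    """Fixes the random seeds (0-4 in the robustness study) so that channel sampling,
    data shuffling, and weight initialization are reproducible across runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```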
Four scenarios were attempted to assess the influence of epochs on the architectural performance during the search phase, with epochs set at 50, 75, 100, and 125. The standard deviation for DARTS was ±1.01; for DDSAS, it was ±4.98; and for our method, it was ±1.72. DDSAS showed the highest standard deviation in validation accuracy, indicating that the current architecture search time is insufficient for obtaining the optimal cell structure. In considering the trade-off between search cost and accuracy, extending the search phase for DARTS and the proposed method would not yield significant benefits. Therefore, setting the epoch at 50 is ideal.
Finally, the search space was expanded by trying five, six, and seven nodes per cell, respectively. Increasing the search space enhances the model's expressive power, but a more extensive search space does not always yield better results and increases model complexity. A more comprehensive search space can lead to overfitting if the dataset is simple. Performance improved for DDSAS and the proposed algorithm from five to six nodes; however, performance declined for all three algorithms from six to seven. Therefore, selecting six nodes during the search phase is ideal.
Using the channel decomposition method shown in Figure 2, the search phase was run with different combinations of bands (e.g., HH and HV) to measure the required search time, training accuracy (TA), and validation accuracy (VA). Different binary gate configurations B1, B2, and B3 were evaluated for various numbers of training epochs (50, 75, 100, and 125), with the results shown in Table 6. Here, a binary gate value of 1 indicates that the corresponding band passed through the gate (the channel is open), while a value of 0 indicates that it did not. The remaining parameter settings for the search phase are as shown in Table 1.
The dataset has two bands and an incidence angle. The incidence angle was considered as an additional feature and input into the network for training together with the bands: after being processed into the same data format as HH and HV, it passes through gate B3 and enters the channels shown in Figure 2, and the results are shown in Table 6. When opening channel B2 and closing channels B1 and B3, with 50 epochs, the validation accuracy is only 67.5%, and the significant gap between the training and validation accuracy suggests overfitting. Analysis reveals that this is mainly due to the dataset's relative simplicity, where the model has reached a performance bottleneck. Table 6 shows that overfitting is also present at epochs 75, 100, and 125, leading to suboptimal validation accuracy values. Opening channels B2 and B3 results in a validation accuracy of 79.23%, while opening channels B1, B2, and B3 simultaneously leads to a validation accuracy of 88.91%. Compared to the single-channel scenario, the dual-channel and triple-channel configurations increase the validation accuracy by 11.73% and 21.41%, respectively, achieving significant improvement, albeit with a corresponding increase in search time. Remote sensing images contain information from multiple bands, but only some bands' data are necessary for classification; combining certain bands can yield better results, balancing efficiency and accuracy. Designing multiple channels also allows for better parallelism, which is an area that future researchers will continue to explore. Channel design is currently implemented in software, with plans for hardware implementations to further improve time complexity.

4. Conclusions

The present work addressed the computational intensiveness of existing NAS algorithms, which require a large amount of data and computational resources, by proposing a differentiable neural architecture search method specifically designed for remote sensing image classification. The number of parameters in the network was reduced by limiting the number of connections between neurons based on a binary gate strategy for implementing partial channel connections, which generates a sparse connectivity pattern that decreases memory consumption and reduces the computational overhead of the NAS process. Meanwhile, edge normalization was applied to improve the stability of the search process. The experimental results of detecting ships and icebergs using C-band radar data from the Sentinel-1 satellite mission indicate that our proposed method reduces the required network parameters by 84% compared to the DARTS and DDSAS methods, and it reduces the computational time required for developing networks specific to this image classification task by more than a factor of eight relative to DARTS. Our proposed method thus significantly simplifies the automated design process while sacrificing little classification accuracy.

Author Contributions

Conceptualization, L.S. and L.D.; methodology, L.S., M.Y., W.D. and Z.Z.; software, L.S. and W.D.; validation, L.S.; data curation, W.D. and Z.Z.; writing—original draft, L.S.; writing—review and editing, L.S. and L.D.; visualization, M.Y.; supervision, L.D.; project administration, Z.Z. and C.X.; funding acquisition, L.S., L.D., W.D. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (grant number: 62341204); the National Social Science Fund of China, "Research on Virtual Reality Media Narrative" (grant number: 21&ZD326); the Open Fund of the Wuhan Gravitation and Solid Earth Tide National Observation and Research Station (grant number: WHYWZ202109); and the Natural Science Foundation of Hunan Province, China (grant number: 2022JJ50051).

Data Availability Statement

The data used in this study originate from an internally developed modeling software that is not publicly available. Unfortunately, we lack the authorization to publicly disclose the data generated by this software.

Conflicts of Interest

Authors L.S. and C.X. were employed by the company Jiangxi Xintong Machinery Manufacturing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Song, L.; Peters, D.K.; Huang, W.; Power, D. Ship-iceberg discrimination from Sentinel-1 SAR data using parallel CNN. Concurr. Comput. Pract. Exp. 2021, 33, e6297.
2. Song, L.; Ding, L.; Wen, T.; Yin, M.; Zeng, Z. Time series change detection using reservoir computing networks for remote sensing data. Int. J. Intell. Syst. 2022, 37, 10845–10860.
3. Zhang, X.; Li, Y.; Zhang, X.; Wang, Y.; Sun, J. Differentiable Architecture Search with Random Features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16060–16069.
4. Guo, Z.; Zhang, X.; Mu, H.; Heng, W.; Liu, Z.; Wei, Y.; Sun, J. Single Path One-Shot Neural Architecture Search with Uniform Sampling. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 544–560.
5. Zoph, B.; Le, Q.V. Neural Architecture Search with Reinforcement Learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 1–16.
6. Wang, H.; Yang, R.; Huang, D.; Wan, Y. iDARTS: Improving DARTS by Node Normalization and Decorrelation Discretization. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 1945–1957.
7. Ren, P.; Xia, Y.; Chang, X.; Huang, P.Y.; Li, Z.; Chen, X.; Wang, X. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions. ACM Comput. Surv. 2020, 54, 1–34.
8. Wan, A.; Dai, X.; Zhang, P.; He, Z.; Tian, Y.; Xie, S.; Wu, B.; Yu, M.; Xu, T.; Chen, K.; et al. FBNetV2: Differentiable neural architecture search for spatial and channel dimensions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 12962–12971.
9. Song, D.; Xu, C.; Jia, X.; Chen, Y.; Xu, C.; Wang, Y. Efficient residual dense block search for image super-resolution. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12007–12014.
10. Chang, J.; Zhang, X.; Guo, Y.; Meng, G.; Xiang, S.; Pan, C. DATA: Differentiable ArchiTecture Approximation. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 876–886.
11. Mellor, J.; Turner, J.; Storkey, A.; Crowley, E.J. Neural architecture search without training. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 7588–7598.
12. Chu, X.; Zhang, B.; Xu, R. FairNAS: Rethinking evaluation fairness of weight sharing neural architecture search. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 12239–12248.
13. Liu, Y.; Sun, Y.; Xue, B.; Zhang, M.; Yen, G.; Tan, K.C. A Survey on Evolutionary Neural Architecture Search. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 550–570.
14. Santra, S.; Hsieh, J.W.; Lin, C.F. Gradient Descent Effects on Differential Neural Architecture Search. IEEE Access 2021, 9, 89602–89618.
15. Zhou, H.; Yang, M.; Wang, J.; Pan, W. BayesNAS: A Bayesian Approach for Neural Architecture Search. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7603–7613.
16. Yu, K.; Sciuto, C.; Jaggi, M.; Musat, C.; Salzmann, M. Evaluating the search phase of neural architecture search. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 1–16.
17. Real, E.; Moore, S.; Selle, A.; Saxena, S.; Suematsu, Y.L.; Tan, J.; Le, Q.V.; Kurakin, A. Large-scale evolution of image classifiers. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 2902–2911.
18. Wang, W.; Zhang, X.; Cui, H.; Yin, H.; Zhang, Y. FP-DARTS: Fast parallel differentiable neural architecture search for image classification. Pattern Recognit. 2023, 136, 109193.
19. Jin, C.; Huang, J.; Wei, T.; Chen, Y. Neural architecture search based on dual attention mechanism for image classification. Math. Biosci. Eng. 2022, 20, 2691–2715.
20. Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710.
21. Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 4780–4789.
22. Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameters sharing. J. Mach. Learn. Res. 2018, 80, 4095–4104.
23. Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable architecture search. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1–13.
24. Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 1126–1135.
25. Xu, Y.; Xie, L.; Zhang, X.; Chen, X.; Qi, G.J.; Tian, Q.; Xiong, H. PC-DARTS: Partial channel connections for memory-efficient architecture search. In Proceedings of the International Conference on Learning Representations, Virtual, 26 April–1 May 2020; pp. 1–13.
26. Yang, L.; Hu, Y.; Lu, S.; Sun, Z.; Mei, J.; Zeng, Y.; Shi, Z.; Han, Y.; Li, X. DDSAS: Dynamic and differentiable space-architecture search. In Proceedings of the 13th Asian Conference on Machine Learning, Virtual, 17–19 November 2021; pp. 284–299.
Figure 1. Super network exploited by the DARTS algorithm for conducting NAS.
Figure 2. Process for decomposing the information from the multiple bands of a remote sensing image.
Figure 3. Implementation of partial channel connections.
Figure 4. Implementation of edge normalization.
Figure 5. A sample ship: (a) 2D display; (b) 3D display.
Figure 6. A sample iceberg: (a) 2D display; (b) 3D display.
Figure 7. Two phases of the search method.
Figure 8. Optimal cell architectures obtained using the proposed NAS method.
Figure 9. Optimal cell architectures obtained using the DDSAS method.
Figure 10. Optimal cell architectures obtained using the DARTS method.
Figure 11. Manual CNN architecture.
Figure 12. Classification performance of the proposed method during the evaluation phase.
Table 1. Parameters of the super networks employed in the search (S) and evaluation (E) phases. #NC is the number of normal cells, #RC is the number of reduction cells, #C is the number of channels, and #CS is the channel sampling rate, which is defined as a value relative to the number of feature maps. #EP is the number of training epochs, #PAR is the parameter size, #BZ is the batch size, #DPR is the DropPath probability, #OPT is the optimizer, #LR is the initial learning rate, #MOM is the momentum, and #WD is the weight decay.

Phase | P | #NC | #RC | #C | #CS | #EP | #PAR | #BZ | #DPR | #OPT | #LR | #MOM | #WD
S | w | 6 | 2 | 16 | 1/4 | 50 | 0.3 M | 128 | 0.3 | SGD | 0.025 | 0.9 | 3 × 10⁻⁴
S | α, β | | | | | | | | | Adam | 6 × 10⁻⁴ | (0.5, 0.999) | 1 × 10⁻³
E | w | 18 | 2 | 36 | 1/4 | 350 | 3.63 M | 100 | 0.2 | SGD | 0.025 | 0.9 | 3 × 10⁻⁴
Table 2. Comparison of performances of different search methods in the search phase.

Method | Parameters | Time (s) | Training Accuracy | Validation Accuracy | Epochs
DARTS | 1.93 M | 52,748 | 99.552 | 89.8 | 50
DDSAS | 1.93 M | 23,408 | 74.592 | 70.184 | 50
Proposed method | 0.3 M | 6119 | 92.772 | 85.308 | 50
Table 3. Comparison of performances of different search methods in the evaluation phase.

Method | Parameters | Time (s) | Training Accuracy | Validation Accuracy | Epochs
DARTS | 3.94 M | 18,461 | 99.99 | 97.74 | 350
DDSAS | 4.24 M | 8192 | 99.592 | 90.184 | 350
Proposed method | 2.21 M | 2153 | 99.84 | 99.22 | 350
Table 4. Comparison of the classification accuracies obtained using the proposed method and the manually designed CNN [1].

Model | Training Accuracy | Validation Accuracy | Search Method
Our method | 99.84 | 99.22 | gradient
The method in [1] | 99.89 | 96.43 | manual
Table 5. Validation accuracies of CNNs designed using different methods under different random seed points and different hyperparameter values.

Method | Random Seeds (0 / 1 / 2 / 3 / 4) | Epochs (50 / 75 / 100 / 125) | Nodes (5 / 6 / 7)
DARTS | 89.421 / 89.458 / 89.82 / 89.79 / 89.663 | 89.8 / 90.67 / 91.32 / 92.18 | 89.92 / 89.8 / 89.69
DDSAS | 70.1 / 69.94 / 71.184 / 70.79 / 70.21 | 70.18 / 76.33 / 79.25 / 81.75 | 70.08 / 70.18 / 70.15
Our method | 85.072 / 85.076 / 85.308 / 84.82 / 84.988 | 85.31 / 87.18 / 88.43 / 89.27 | 85.22 / 85.31 / 85.24
Table 6. Comparison of the search time and classification accuracy of the proposed method when employing different binary gates B1, B2, and B3; 50, 75, 100, and 125 represent epochs.

B1 | B2 | B3 | 50 epochs (TA / VA / Time (s)) | 75 epochs (TA / VA / Time (s)) | 100 epochs (TA / VA / Time (s)) | 125 epochs (TA / VA / Time (s))
0 | 1 | 0 | 83.13 / 67.5 / 1341 | 85 / 61.88 / 2582 | 90.63 / 61.88 / 3452 | 96.88 / 76.88 / 4322
0 | 1 | 1 | 86.52 / 79.23 / 4986 | 88.64 / 82.81 / 8049 | 92.76 / 84.9 / 10,741 | 94.48 / 87.02 / 13,433
1 | 0 | 0 | 83.13 / 67.5 / 1384 | 85 / 68.13 / 2567 | 90 / 60 / 3466 | 93.13 / 71.88 / 4350
1 | 0 | 1 | 86.93 / 77.88 / 4962 | 89.62 / 79.63 / 7934 | 93.97 / 82.17 / 10,622 | 95.73 / 83.33 / 13,295
1 | 1 | 0 | 92.77 / 85.31 / 6119 | 95.74 / 88.27 / 9528 | 97.06 / 88.35 / 12,923 | 97.53 / 88.9 / 16,317
1 | 1 | 1 | 94.48 / 88.91 / 9764 | 97.78 / 91.34 / 14,996 | 99.33 / 94.35 / 20,213 | 99.76 / 96.65 / 25,285

