
Combining KAN with CNN: KonvNeXt’s Performance in Remote Sensing and Patent Insights

by Minjong Cheon 1 and Changbae Mun 2,*
1 Center for Sustainable Environment Research, Korea Institute of Science and Technology, 5 Hwarang-ro 14-gil, Wolgok-dong, Seongbuk-gu, Seoul 02792, Republic of Korea
2 Department of Electrical, Electronic & Communication Engineering, Hanyang Cyber University, Seoul 04764, Republic of Korea
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(18), 3417; https://doi.org/10.3390/rs16183417
Submission received: 8 July 2024 / Revised: 12 August 2024 / Accepted: 10 September 2024 / Published: 14 September 2024

Abstract
Rapid advancements in satellite technology have led to a significant increase in high-resolution remote sensing (RS) images, necessitating advanced processing methods. Additionally, patent analysis revealed a substantial increase in deep learning and machine learning applications in remote sensing, highlighting the growing importance of these technologies. Therefore, this paper introduces the Kolmogorov-Arnold Network (KAN) model to remote sensing to enhance efficiency and performance in RS applications. We conducted several experiments to validate KAN’s applicability, starting with the EuroSAT dataset, where we combined the KAN layer with multiple pre-trained CNN models. Optimal performance was achieved using ConvNeXt, leading to the development of the KonvNeXt model. KonvNeXt was evaluated on the Optimal-31, AID, and Merced datasets for validation and achieved accuracies of 90.59%, 94.1%, and 98.1%, respectively. The model also showed fast processing speed: the Optimal-31 and Merced datasets were completed in 107.63 s each, while the bigger and more complicated AID dataset took 545.91 s. This result is meaningful because KonvNeXt achieved faster speeds and comparable accuracy relative to an existing study that utilized ViT, proving KonvNeXt’s applicability for remote sensing classification tasks. Furthermore, we investigated the model’s interpretability by utilizing Occlusion Sensitivity; by displaying the influential regions, we validated its potential use in a variety of domains, including medical imaging and weather forecasting. This paper is meaningful in that it is the first to use KAN in remote sensing classification, proving its adaptability and efficiency.

1. Introduction

The rapid growth of satellite technology has resulted in a surge of high-resolution remote sensing (RS) images [1,2]. Integrating data from many satellites into time series has improved the identification of particular occurrences and trends, thereby deepening our understanding of global climate patterns and ecological changes [3,4]. Satellites such as Landsat, Sentinel, MODIS, and Gaofen are now among the most significant in Earth observation, with uses ranging from monitoring deforestation to tracking atmospheric conditions. To monitor vegetation, crop yields, and forest disturbances, for instance, sun-synchronous polar-orbiting satellites are essential [5,6]. Deep learning models have shown significant results in a variety of such applications, including environmental monitoring, object detection, and land cover categorization [7,8]. The early and active adoption of deep learning techniques in the RS field has made advanced image processing and analysis possible.
To better understand the usage of deep learning (DL) and machine learning (ML) techniques in remote sensing, we examined patent titles containing such terminology. As Figure 1 shows, the number of these patents increased steadily from 23 in 2015 to 278 in 2023. This trend suggests that while these technologies will continue to be employed in remote sensing, more advanced deep learning technologies will be necessary to manage increasingly complex and vast remote sensing data and applications [9].
However, applying deep learning to high-resolution datasets poses major obstacles. The vast number of parameters required for deep neural networks, particularly for high-resolution images, can result in higher processing costs and memory needs, rendering the training process inefficient and potentially infeasible with standard hardware resources. Furthermore, overparameterization can lead to longer training durations and overfitting, in which the model memorizes the training data rather than generalizing from it, limiting its usefulness in real-world applications [10]. Addressing these difficulties requires novel training methods that reduce the number of parameters while retaining performance. Model pruning and quantization are representative solutions that have been widely used, but each method has its own benefits and downsides [11,12]. Therefore, in this paper, we introduce a novel network, the Kolmogorov-Arnold Network (KAN), whose characteristics could provide a more efficient learning framework [13]. The main contributions of this paper can be summarized as follows:
  • Efficiency and Performance Improvement: By replacing standard Multi-Layer Perceptrons (MLPs) with KAN, we hope to improve the efficiency and performance of remote sensing applications.
  • Comprehensive Model Comparison: Our strategy includes using and comparing different pre-trained Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) models to determine the best KAN pairings, resulting in optimum performance.
  • Evaluation of Diverse Datasets: Based on the above results, we presented and evaluated our suggested model, KonvNeXt, on four different remote sensing datasets to compare its performance with existing results. This experiment allowed us to scrutinize KonvNeXt’s performance in remote sensing fields.
  • Application of Explainable AI (XAI): In addition, we applied Explainable AI (XAI) approaches to our model for its interpretability. This process is significant for understanding the decision-making processes of deep learning models, leading to transparency in AI-driven remote sensing applications.
We conclude this section by providing a summary of the paper’s structure. In Section 2: Related Works, we introduce research papers that are closely linked to our work. Section 3: Materials and Methods describes the data used, the Kolmogorov-Arnold Network (KAN), and our proposed approach. Section 4: Results presents the outcomes of our experiments; Section 5: Discussion analyzes these findings; and Section 6: Conclusions closes the study with significant insights and next steps.

2. Related Works

First, we introduce related research that applied deep learning technologies to remote sensing datasets. Khan and Basalamah demonstrated the effectiveness of a multibranch system combining pre-trained models for local and global feature extraction in satellite image classification and achieved superior accuracy across three datasets [14]. By maximizing the performance of small neural networks, Chen et al. used knowledge distillation to increase accuracy on a variety of remote sensing datasets [15]. Broni-Bediako et al. used automated neural architecture search to develop efficient CNNs for scene classification, which outperformed more traditional models with fewer parameters on benchmarks such as EuroSAT and BigEarthNet [16]. Temenos et al. applied an interpretable deep learning framework for land use classification using SHAP to achieve both high accuracy and enhanced interpretability [17]. Yadav et al. enhanced classification on the EuroSAT dataset using pre-trained CNNs and advanced preprocessing techniques, with GoogleNet showing the best performance [18]. These studies highlight advancements in model accuracy, efficiency, and interpretability for remote sensing image classification.
Then, we introduce research that utilized KAN for classification or regression. In satellite traffic forecasting, Vaca-Rubio et al. showed that KANs outperformed traditional MLPs while using fewer parameters. That research demonstrated the substantial influence of KAN-specific factors and highlighted their potential for adaptive forecasting models [19]. Bozorgasl and Chen addressed diverse data distributions across clients by developing Wavelet Kolmogorov-Arnold Networks (Wav-KAN) for hybrid learning. Extensive trials on datasets including MNIST, CIFAR10, and CelebA validated their improvements, which were substantially increased by the incorporation of wavelet-based activation functions [20]. Using KANs, Abueidda et al. created DeepOKAN, a Deep Operator Network, to develop efficient surrogates for mechanics problems. Their approach required fewer learnable parameters than traditional MLP-based models and was validated in computational solid mechanics, demonstrating notable computational speedups and efficiency and making it suitable for high-dimensional and complex engineering applications [21].
While previous research has utilized pre-trained models, knowledge distillation, NAS, and interpretability frameworks to enhance performance on remote sensing datasets, our technique focuses on determining the feasibility of applying KAN to RS datasets. Our suggested methodology aligns with current trends in KAN research and provides a new way to optimize remote sensing applications. Unlike other approaches that rely on complex multibranch frameworks or preprocessing procedures, our integration of KAN with ConvNeXt indicates its capacity to handle remote sensing datasets effectively, thus offering a viable alternative for future studies in this sector.

3. Materials and Methods

3.1. Dataset Description and Processing

Four different datasets, EuroSAT, Optimal-31, AID, and Merced, were used in this experiment. The EuroSAT dataset proposed by Helber et al. is used to categorize land cover and land use. This collection has 27,000 images covering 34 European countries, divided into ten categories: annual crop, forest, herbaceous vegetation, highway, industrial, pasture, permanent crop, residential, river, and sea/lake [22]. The Optimal-31 dataset consists of 1860 aerial images sourced from Wang et al. This dataset includes 31 different land use types, with each type represented by 60 images of 256 × 256 pixels at a spatial resolution of 0.3 m [23]. AID is a large-scale aerial image dataset sourced from Google Earth, containing 10,000 images of 600 × 600 pixels across 30 scene types, such as airports, forests, and residential areas. Despite post-processing, these images have proven comparable to real optical aerial images for land use and cover mapping [24]. Merced consists of 21 land use classes with 100 images per class, each measuring 256 × 256 pixels. The images were manually extracted from the USGS National Map Urban Area Imagery collection, with a pixel resolution of 1 foot [25]. Figure 2, Figure 3, Figure 4 and Figure 5 show examples from each dataset.
Each dataset was divided into three sections for experimentation: approximately 70% of the data was allocated for training, 15% for validation, and 15% for testing. As part of dataset preparation, images were resized to 224 × 224 pixels and normalized using the mean and standard deviation values of ImageNet (mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]). The transformations applied to the data include normalization, conversion to tensors, random horizontal and vertical flips, and random resized crops.
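The following is a minimal sketch of this preprocessing and splitting pipeline, assuming PyTorch and torchvision with an ImageFolder-style directory layout; the dataset path and random seed are hypothetical placeholders.

```python
import torch
from torchvision import datasets, transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                  # random resized crop to 224 x 224
    transforms.RandomHorizontalFlip(),                  # random horizontal flip
    transforms.RandomVerticalFlip(),                    # random vertical flip
    transforms.ToTensor(),                              # conversion to tensor
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),  # ImageNet normalization
])

dataset = datasets.ImageFolder("data/EuroSAT", transform=transform)  # hypothetical path
n = len(dataset)
n_train, n_val = int(0.70 * n), int(0.15 * n)
train_set, val_set, test_set = torch.utils.data.random_split(
    dataset, [n_train, n_val, n - n_train - n_val],     # ~70/15/15 split
    generator=torch.Generator().manual_seed(42),
)
```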

3.2. ConvNeXt

Based on conventional ConvNet methods, the ConvNeXt algorithm explicitly modernizes the ResNet model by integrating several Vision Transformer (ViT) design features. As shown in Figure 6, the main structural changes include replacing the stem with a patchify layer, which divides the input image into smaller, non-overlapping patches and treats each as a sequence token for further processing. The algorithm adopts the inverted bottleneck block used by MobileNetV2 to expand the number of channels before reducing them, capturing spatial and channel-wise features at a reduced computational cost. Depthwise convolutions with larger network widths process each input channel separately, which can reduce computational complexity, while pointwise convolutions (1 × 1 convolutions) combine information across channels. Moreover, ConvNeXt applies large kernel sizes for convolutions, Gaussian Error Linear Unit (GELU) activation functions, and LayerNorm, which is also a main strategy of the Vision Transformer. With these modifications, ConvNeXt can achieve better accuracy than ViT models. Furthermore, by adjusting the hidden dimension sizes, several versions exist, such as ConvNeXt-Tiny and ConvNeXt-Large [26,27].
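To make these design choices concrete, the following is a condensed PyTorch sketch of a single ConvNeXt block plus the patchify stem, following the published ConvNeXt-Tiny dimensions; this is an illustration rather than the authors’ exact implementation.

```python
import torch
from torch import nn

class ConvNeXtBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Large-kernel (7 x 7) depthwise convolution: each channel processed separately.
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)             # LayerNorm, as in ViT
        self.pwconv1 = nn.Linear(dim, 4 * dim)    # inverted bottleneck: expand channels
        self.act = nn.GELU()                      # GELU activation, as in ViT
        self.pwconv2 = nn.Linear(4 * dim, dim)    # pointwise projection back down

    def forward(self, x):                         # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                 # channels-last for LayerNorm/Linear
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        return residual + x.permute(0, 3, 1, 2)   # back to channels-first

# Patchify stem: a 4 x 4, stride-4 convolution that splits the image into
# non-overlapping patch tokens (96 channels in ConvNeXt-Tiny).
stem = nn.Conv2d(3, 96, kernel_size=4, stride=4)
block = ConvNeXtBlock(96)
out = block(stem(torch.randn(1, 3, 224, 224)))    # (1, 96, 56, 56)
```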

3.3. Kolmogorov-Arnold Network

Inspired by the Kolmogorov-Arnold representation theorem, Kolmogorov-Arnold Networks (KANs) are a sophisticated type of neural network with learnable activation functions on edges, as opposed to the fixed activations on nodes in conventional Multi-Layer Perceptrons (MLPs). B-splines, which are piecewise polynomial functions defined by control points and knots, parameterize these activation functions. Spline-parameterized functions $\varphi_{q,p}$ transform each input feature $x_p$; the results are aggregated into intermediate values for each $q$ and then passed through the outer functions $\Phi_q$. The sum of these transformed values is the final output $f(\mathbf{x})$, which enables the network to capture complex data patterns effectively and flexibly. The activation functions in KANs combine a basis function with a spline, the basis function typically being the Sigmoid Linear Unit (SiLU), defined as $\mathrm{silu}(x) = x/(1 + e^{-x})$. The spline component $\mathrm{spline}(x) = \sum_i c_i B_i(x)$ uses B-spline basis functions $B_i(x)$ and coefficients $c_i$, which are learned during training; Figure 7 depicts an example curve. These coefficients determine the final shape of the activation functions, replacing the traditional linear transformation parameters $W$ and $b$ of MLPs.
Like an MLP, a KAN contains multiple neurons arranged in layers. Each hidden layer contains nodes that aggregate the inputs from the previous layer, and the output of each node is determined by the spline-parameterized functions of the incoming edges. As data flows through the network, each input to a node is transformed by the B-spline functions on the edges; the transformed inputs are then aggregated (usually by summation) at the node, and this aggregation forms the input for the next layer. The training process in KANs involves adjusting the control points and knots of the B-splines to minimize the loss function, and there are three key steps. The first is to compute the output by passing the inputs through the network and applying the spline transformations at each edge. Backpropagation then computes the gradients of the loss with respect to the control points and knots. Finally, these gradients are used to update the parameters with optimization algorithms such as gradient descent. KANs exhibit various advantages over MLPs, including better accuracy and interpretability with fewer parameters. They achieve this through smaller architectures that can perform comparably to or better than larger MLPs in tasks such as data fitting and partial differential equation (PDE) solving. Since KANs can be visualized, they help discover mathematical and physical laws in scientific applications. Moreover, KANs can aid in preventing catastrophic forgetting, a neural network problem in which learning new information causes the loss of previously learned information. One notable drawback of KANs compared to MLPs is their slower training pace, which optimization can address [13,28].
The formula for the Kolmogorov-Arnold Network (KAN) is given by

$$f(\mathbf{x}) = \sum_{q=1}^{2n+1} \Phi_q\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right)$$

where $f(\mathbf{x})$ is the network output, the $\varphi_{q,p}(x_p)$ are spline functions, and the $\Phi_q$ are transformations. Each learnable activation has the form

$$\varphi(x) = w\,b(x) + \mathrm{spline}(x)$$

where $\varphi(x)$ denotes the activation function, $w$ is a weight, $b(x)$ is the basis function, implemented as the Sigmoid Linear Unit (SiLU), and $\mathrm{spline}(x)$ is the spline component:

$$b(x) = \mathrm{silu}(x) = \frac{x}{1 + e^{-x}}$$
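As a worked illustration of one such edge activation, the sketch below evaluates $\varphi(x) = w\,b(x) + \mathrm{spline}(x)$ with a SiLU basis and a cubic B-spline built via SciPy; the knot vector, coefficient values, and weight are hypothetical, and practical KAN implementations compute the B-spline bases directly inside the network rather than through SciPy.

```python
import numpy as np
from scipy.interpolate import BSpline

def silu(x):
    # Basis function b(x) = x / (1 + exp(-x))
    return x / (1.0 + np.exp(-x))

degree = 3                                      # cubic B-spline
knots = np.linspace(-1.5, 1.5, 12)              # uniform knot vector (hypothetical)
n_coef = len(knots) - degree - 1                # number of basis functions B_i
rng = np.random.default_rng(0)
coef = 0.1 * rng.standard_normal(n_coef)        # coefficients c_i, learned in training

def phi(x, w=1.0):
    # phi(x) = w * silu(x) + sum_i c_i * B_i(x)
    spline = BSpline(knots, coef, degree, extrapolate=True)
    return w * silu(x) + spline(x)

print(phi(np.linspace(-1.0, 1.0, 5)))           # the edge activation at five inputs
```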

3.4. ConvNeXt Kolmogorov-Arnold Networks: KonvNeXt

The proposed approach combines KAN with the ConvNeXt architecture to improve a pre-trained ConvNeXt model’s learning capacity. To enhance feature-extraction performance, we adopted models that were trained on the ImageNet dataset. The model adds two KANLinear layers in place of the conventional MLP classifier. Unlike typical neural networks, which use fixed activation functions on the nodes, KANLinear layers use learnable activation functions on their edges, and the use of B-splines to describe these activation functions provides more stability and flexibility. By substituting these spline functions for linear weights in the KANLinear layer, the model runs more efficiently, because the KAN layer consumes less memory than the original layers. This approach is not limited to ConvNeXt: the same substitution can replace neural network layers in various deep learning models.
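The sketch below illustrates this head substitution, assuming an efficient-KAN-style `KANLinear` layer with an `(in_features, out_features)` constructor (an assumption about that library’s interface); the hidden width of 32 follows the node-reduction experiment reported in Section 4.

```python
import torch
from torch import nn
from torchvision import models
from efficient_kan import KANLinear   # assumed KAN layer implementation

class KonvNeXt(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 32):
        super().__init__()
        backbone = models.convnext_tiny(weights="IMAGENET1K_V1")  # ImageNet pre-trained
        self.features = backbone.features         # keep the ConvNeXt feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(                 # two KAN layers replace the MLP head
            KANLinear(768, hidden),
            KANLinear(hidden, num_classes),
        )

    def forward(self, x):
        feats = self.pool(self.features(x)).flatten(1)   # (N, 768) for ConvNeXt-Tiny
        return self.head(feats)

model = KonvNeXt(num_classes=10)                   # e.g., the ten EuroSAT classes
logits = model(torch.randn(2, 3, 224, 224))        # (2, 10)
```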

4. Results

First, we applied various CNN-based models that were pre-trained on the ImageNet dataset to determine the optimal one for the EuroSAT dataset. Typically, these pre-trained networks utilize MLP layers for classification or regression tasks. However, since our research aimed to demonstrate the potential of replacing MLPs with KANs, we substituted MLP with KAN in these models. We evaluated several models, including VGG16, MobileNetV2, EfficientNet, ConvNeXt, ResNet101, and ViT [26,29,30,31,32,33]. The observed accuracies of these models were 88%, 75%, 67%, 94%, 75%, and 92%, respectively, as shown in Figure 8. To demonstrate the efficiency of the KAN, we initially configured each KAN layer with 256 nodes and subsequently reduced the number of nodes to 32 for comparison. The results revealed identical performance between the two configurations: both setups achieved an accuracy of 94% in the first epoch, which increased to 96% in the second epoch and was maintained in subsequent epochs. This experiment indicates that KAN layers can achieve high accuracy with fewer training epochs, even when the number of nodes is significantly reduced, which proves the efficiency of the KAN layer. From the proposed approach of integrating ConvNeXt with KAN, we obtained interesting results in terms of classification accuracy per epoch: the model achieved an accuracy of 94% in the first epoch, which increased to 96% in the second epoch, and 96% accuracy was consistently maintained through the fifth epoch. These results suggest that even one or two epochs can be sufficient to train KAN-based networks. Despite the drawback that KAN is ten times slower than traditional MLPs, the reduced number of necessary training epochs effectively offsets this downside [34].
Then, using our KonvNeXt model, we tested three more popular datasets in remote sensing fields: Optimal-31, AID, and Merced. We chose these datasets because an existing paper had applied ViT to them [35]. As Table 1 shows, with 25 epochs, our proposed model achieved 90.59% for Optimal-31, 94.1% for AID, and 98.1% for Merced. Since Bazi et al.’s ViT model yielded 92.76% for Optimal-31, 91.76% for AID, and 92.76% for Merced, these results imply that our proposed approach achieves accuracy comparable to or better than the ViT models. This result is meaningful because the proposed model did not require a long training time: it took 107.63 s for the Optimal-31 dataset, 545.91 s for the AID dataset, and 107.63 s for the Merced dataset. Since Bazi et al.’s model took an average of 30 min to train on those datasets, these results show that our approach not only achieved high performance but also fast speed, which could overcome the original downsides of KAN models [35].
In another experiment, we compared the performance of our proposed model with that of the original ConvNeXt model. ConvNeXt achieved accuracy rates of 84.68%, 94.6%, and 97.8% on the three datasets, as shown in Table 2. These results indicate that substituting a neural network’s linear layers with KAN layers can be effective in terms of performance and memory efficiency. Interestingly, with only two layers used in each classifier head for comparison, the speed gap between the models was small. This result suggests that the ConvNeXt model with the KAN layer is efficient enough for these remote sensing datasets, even with fewer linear layers.
Since our proposed approach utilizes ConvNeXt to capture features from images, XAI methods can be applied to our model. For instance, we applied Occlusion Sensitivity in the above experiments. Occlusion Sensitivity is a technique used in Explainable AI to identify the regions of an image that most influence a model’s predictions by systematically occluding parts of the image and observing changes in the output. The heatmap overlay in the right image highlights these influential regions, with red/yellow areas indicating high sensitivity and blue areas indicating low sensitivity. The results are shown in Figure 9, Figure 10 and Figure 11 and demonstrate that the proposed model classified the targets with reasonable decision-making. Furthermore, this result shows the potential of applying the model to diverse datasets where interpreting deep learning is crucial, such as in weather prediction or medical diagnosis [36]. In addition to Occlusion Sensitivity, we also present confusion matrices in Figure 12 and Figure 13, which illustrate the classification performance of KonvNeXt on the UC Merced and AID datasets.
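The following is a minimal sketch of the occlusion-sensitivity procedure described above (our own illustration, not the authors’ exact code): a gray patch slides across the image, and the drop in the predicted class probability at each position forms the heatmap. The patch size, stride, and fill value are hypothetical choices.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def occlusion_sensitivity(model, image, target, patch=32, stride=16, fill=0.5):
    """image: (C, H, W) tensor; target: class index; returns a coarse heatmap."""
    model.eval()
    _, H, W = image.shape
    base = F.softmax(model(image.unsqueeze(0)), dim=1)[0, target]
    rows = (H - patch) // stride + 1
    cols = (W - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    for i, y in enumerate(range(0, H - patch + 1, stride)):
        for j, x in enumerate(range(0, W - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + patch, x:x + patch] = fill   # mask one region
            prob = F.softmax(model(occluded.unsqueeze(0)), dim=1)[0, target]
            heat[i, j] = base - prob                       # large drop => influential
    return heat
```

Upsampling `heat` to the input resolution and overlaying it on the image yields heatmaps like those in Figures 9–11.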

5. Discussion

This paper introduces the application of Kolmogorov-Arnold Networks (KAN) to remote sensing datasets by integrating them with the ConvNeXt algorithm. Given that KAN is in its early stages of development and has primarily been tested on the MNIST, CIFAR-10, CIFAR-100, and ImageNet datasets, its suitability for specific tasks was previously uncertain. Therefore, our work is pioneering in this field, since we evaluated the efficacy of our proposed model, KonvNeXt, by testing it on four representative remote sensing classification datasets. The experiments demonstrated that KonvNeXt has significant potential for applications across various datasets. In addition, our approach may offer greater applicability, because many existing deep learning models utilize neural network layers that demand substantial GPU memory depending on their parameters. In this context, the KAN layer could serve as a substitute, extending beyond computer vision tasks to natural language processing (NLP) and even large language model (LLM) tasks. In future work, we will explore models that consist solely of KAN layers, following the current trend, without merging them with any CNN models, and apply them to remote sensing datasets to compare the performance of KAN with existing CNN models [37,38].

6. Conclusions

The objective of this paper is to introduce the KAN model to the field of remote sensing. To validate the applicability of this model, we conducted several experiments. First, with the EuroSAT dataset, we combined the KAN layer with multiple pre-trained CNN models to discover the best fit. We found that ConvNeXt with KAN reached the optimal performance, so we named the combination KonvNeXt. For broader validation in the remote sensing field, the model was tested on three further datasets—Optimal-31, AID, and Merced—and achieved high performance in terms of both accuracy and speed. Specifically, the model achieved accuracies of 90.59% on Optimal-31, 94.1% on AID, and 98.1% on Merced. In terms of processing speed, the model exhibited notable efficiency, processing the Optimal-31 and Merced datasets in 107.63 s each, while the AID dataset, being more complex and larger in scale, took 545.91 s. These experiments showed that KonvNeXt can be suitably applied to remote sensing classification tasks. Furthermore, by utilizing Occlusion Sensitivity, we demonstrated that existing XAI methods can be applied to KonvNeXt, which suggests its versatility in various fields, such as medicine or weather forecasting. In conclusion, this paper is meaningful as it represents the first approach to applying KAN to remote sensing classification and demonstrates its promise.

Author Contributions

Conceptualization, C.M.; methodology, M.C.; visualization, M.C.; supervision, C.M.; project administration, C.M.; funding acquisition, C.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2023S1A5A8078960).

Data Availability Statement

The four datasets used in this experiment are available via the following links: Optimal-31 at https://huggingface.co/datasets/jonathan-roberts1/Optimal-31 (accessed on 12 June 2024); EuroSAT at https://github.com/phelber/EuroSAT (accessed on 12 June 2024); Merced at http://weegee.vision.ucmerced.edu/datasets/landuse.html; and AID at https://captain-whu.github.io/AID/ (accessed on 12 June 2024).

Acknowledgments

We would like to express our gratitude to the providers of the datasets used in this study. Our sincere thanks go to Jonathan Roberts for making the Optimal-31 dataset available via Hugging Face, Patrick Helber and contributors for providing the EuroSAT dataset on GitHub, the UC Merced team for making the Merced dataset accessible, and the Captain WHU team for their work on the AID dataset. Their contributions have helped advance our research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, C. High Resolution Satellite Imaging Sensors for Precision Agriculture. Front. Agric. Sci. Eng. 2018, 5, 393–405. [Google Scholar] [CrossRef]
  2. Ma, Y.; Wu, H.; Wang, L.; Huang, B.; Ranjan, R.; Zomaya, A.; Jie, W. Remote Sensing Big Data Computing: Challenges and Opportunities. Future Gener. Comput. Syst. 2015, 51, 47–60. [Google Scholar] [CrossRef]
  3. Xu, H.; Wang, Y.; Guan, H.; Shi, T.; Hu, X. Detecting Ecological Changes with a Remote Sensing Based Ecological Index (RSEI) Produced Time Series and Change Vector Analysis. Remote Sens. 2019, 11, 2345. [Google Scholar] [CrossRef]
  4. Milesi, C.; Churkina, G. Measuring and Monitoring Urban Impacts on Climate Change from Space. Remote Sens. 2020, 12, 3494. [Google Scholar] [CrossRef]
  5. Ustin, S.L.; Middleton, E.M. Current and Near-Term Advances in Earth Observation for Ecological Applications. Ecol. Process. 2021, 10, 1. [Google Scholar] [CrossRef]
  6. Leblois, A.; Damette, O.; Wolfersberger, J. What Has Driven Deforestation in Developing Countries Since the 2000s? Evidence from New Remote-Sensing Data. World Dev. 2017, 92, 82–102. [Google Scholar] [CrossRef]
  7. Shafique, A.; Cao, G.; Khan, Z.; Asad, M.; Aslam, M. Deep Learning-Based Change Detection in Remote Sensing Images: A Review. Remote Sens. 2022, 14, 871. [Google Scholar] [CrossRef]
  8. Adegun, A.A.; Viriri, S.; Tapamo, J.-R. Review of Deep Learning Methods for Remote Sensing Satellite Images Classification: Experimental Survey and Comparative Analysis. J. Big Data 2023, 10, 93. [Google Scholar] [CrossRef]
  9. Google Patents. Available online: https://patents.google.com/ (accessed on 11 June 2024).
  10. Liu, S.; Yin, L.; Mocanu, D.C.; Pechenizkiy, M. Do We Actually Need Dense Over-Parameterization? In-Time Over-Parameterization in Sparse Training. arXiv 2021. [Google Scholar] [CrossRef]
  11. Hoefler, T.; Alistarh, D.; Ben-Nun, T.; Dryden, N.; Peste, A. Sparsity in Deep Learning: Pruning and Growth for Efficient Inference and Training in Neural Networks. J. Mach. Learn. Res. 2021, 22, 1–124. [Google Scholar]
  12. Vadera, S.; Ameen, S. Methods for Pruning Deep Neural Networks. IEEE Access 2022, 10, 63280–63300. [Google Scholar] [CrossRef]
  13. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljačić, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov-Arnold Networks. arXiv 2024. [Google Scholar] [CrossRef]
  14. Khan, S.D.; Basalamah, S. Multi-Branch Deep Learning Framework for Land Scene Classification in Satellite Imagery. Remote Sens. 2023, 15, 3408. [Google Scholar] [CrossRef]
  15. Chen, G.; Zhang, X.; Tan, X.; Cheng, Y.; Dai, F.; Zhu, K.; Gong, Y.; Wang, Q. Training Small Networks for Scene Classification of Remote Sensing Images via Knowledge Distillation. Remote Sens. 2018, 10, 719. [Google Scholar] [CrossRef]
  16. Broni-Bediako, C.; Murata, Y.; Mormille, L.H.B.; Atsumi, M. Searching for CNN Architectures for Remote Sensing Scene Classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4701813. [Google Scholar] [CrossRef]
  17. Temenos, A.; Temenos, N.; Kaselimi, M.; Doulamis, A.; Doulamis, N. Interpretable Deep Learning Framework for Land Use and Land Cover Classification in Remote Sensing Using SHAP. IEEE Geosci. Remote Sens. Lett. 2023, 20, 8500105. [Google Scholar] [CrossRef]
  18. Yadav, D.; Kapoor, K.; Yadav, A.K.; Kumar, M.; Jain, A.; Morato, J. Satellite Image Classification Using Deep Learning Approach. Earth Sci. Inform. 2024, 17, 2495–2508. [Google Scholar] [CrossRef]
  19. Vaca-Rubio, C.J.; Blanco, L.; Pereira, R.; Caus, M. Kolmogorov-Arnold Networks (KANs) for Time Series Analysis. arXiv 2024. [Google Scholar] [CrossRef]
  20. Bozorgasl, Z.; Chen, H. WAV-KAN: Wavelet Kolmogorov-Arnold Networks. arXiv 2024. [Google Scholar] [CrossRef]
  21. Abueidda, D.W.; Pantidis, P.; Mobasher, M.E. DeepOKAN: Deep Operator Network Based on Kolmogorov Arnold Networks for Mechanics Problems. arXiv 2024. [Google Scholar] [CrossRef]
  22. Helber, P.; Bischke, B.; Dengel, A.; Borth, D. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. arXiv 2017. [Google Scholar] [CrossRef]
  23. Wang, Q.; Liu, S.; Chanussot, J.; Li, X. Scene Classification with Recurrent Attention of VHR Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2019, 57, 1155–1167. [Google Scholar] [CrossRef]
  24. Xia, G.-S.; Hu, J.; Hu, F.; Shi, B.; Bai, X.; Zhong, Y.; Zhang, L.; Lu, X. AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3965–3981. [Google Scholar] [CrossRef]
  25. Yang, Y.; Newsam, S. Bag-of-Visual-Words and Spatial Extensions for Land-Use Classification. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 2–5 November 2010. [Google Scholar]
  26. Liu, Z.; Mao, H.; Wu, C.-Y.; Feichtenhofer, C.; Darrell, T.; Xie, S. A ConvNet for the 2020s. arXiv 2022. [Google Scholar] [CrossRef]
  27. Cheon, M.; Choi, Y.-H.; Kang, S.-Y.; Choi, Y.; Lee, J.-G.; Kang, D. KARINA: An Efficient Deep Learning Model for Global Weather Forecast. arXiv 2024. [Google Scholar] [CrossRef]
  28. Schmidt-Hieber, J. The Kolmogorov–Arnold Representation Theorem Revisited. Neural Netw. 2021, 137, 119–126. [Google Scholar] [CrossRef]
  29. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014. [Google Scholar] [CrossRef]
  30. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2018. [Google Scholar] [CrossRef]
  31. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019. [Google Scholar] [CrossRef]
  32. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016. [Google Scholar]
  33. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image Is Worth 16×16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations, Online, 3–7 May 2021. [Google Scholar]
  34. Cheon, M. Kolmogorov-Arnold Network for Satellite Image Classification in Remote Sensing. arXiv 2024. [Google Scholar] [CrossRef]
  35. Bazi, Y.; Bashmal, L.; Rahhal, M.M.A.; Dayil, R.A.; Ajlan, N.A. Vision Transformers for Remote Sensing Image Classification. Remote Sens. 2021, 13, 516. [Google Scholar] [CrossRef]
  36. Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. arXiv 2013. [Google Scholar] [CrossRef]
  37. Cheon, M. Demonstrating the Efficacy of Kolmogorov-Arnold Networks in Vision Tasks. arXiv 2024. [Google Scholar] [CrossRef]
  38. Drokin, I. Kolmogorov-Arnold Convolutions: Design Principles and Empirical Studies. arXiv 2024. [Google Scholar] [CrossRef]
Figure 1. Annual Trends in AI/Deep Learning/Machine Learning Patents (2015–2023).
Figure 2. Example images of the AID dataset used in the experiment.
Figure 3. Example images of the Optimal-31 dataset used in the experiment.
Figure 4. Example images of the Merced dataset used in the experiment.
Figure 5. Example images of the EuroSAT dataset used in the experiment.
Figure 6. The overall architecture of the ConvNeXt model, used as the backbone model in the experiment.
Figure 7. An example of a B-spline curve (blue) with its control points (red) and connecting lines.
Figure 8. Comparison of various pre-trained CNN/ViT models on the EuroSAT dataset.
Figure 9. Occlusion Sensitivity example on the Optimal-31 dataset with the KonvNeXt model: the left image shows the original aerial view of an aircraft that served as the model’s input.
Figure 10. Occlusion Sensitivity example on the Merced dataset with the KonvNeXt model: the left image shows the original aerial photograph of a forest that served as the model’s input.
Figure 11. Occlusion Sensitivity example on the AID dataset with the KonvNeXt model: the left image shows the original aerial view of a bridge used as input.
Figure 12. Confusion matrix of the KonvNeXt model on the UC Merced dataset.
Figure 13. Confusion matrix of the KonvNeXt model on the AID dataset.
Table 1. Summary of accuracy and speed when applying KonvNeXt to three different datasets: Optimal-31, AID, and Merced.

              Optimal-31    AID         Merced
Accuracy      90.59%        94.1%       98.1%
Speed         107.63 s      545.91 s    107.63 s
Table 2. Summary of accuracy and speed when applying ConvNeXt to three different datasets: Optimal-31, AID, and Merced.

              Optimal-31    AID         Merced
Accuracy      84.68%        94.6%       97.8%
Speed         106.63 s      549.3 s     106.64 s