1. Introduction
Hyperspectral imaging (HSI) sensors gather extensive information in the form of 3D hyperspectral images, whose spatial dimensions capture the visual characteristics of objects and whose spectral dimension supports material identification. Extracting such information from hyperspectral images goes beyond what manual analysis can achieve and requires advanced computer-aided techniques. Each hyperspectral image consists of numerous narrow-band images of the same scene, thus requiring tailored image processing methods for thorough analysis. Accordingly, while conventional imaging principles and processing techniques apply to hyperspectral data, their comprehensive utilization requires additional adaptation and effort.
Remote sensing not only serves as the origin and primary domain of HSI technology but also drives most of its advancements and applications. Its foremost application lies in earth observation to detect and monitor the physical characteristics of areas and objects on the earth. Monitoring changes in land cover is key for better formulating and managing regulations to prevent or mitigate damage that results from human activities [
1]. Furthermore, monitoring subtle yet significant alterations in land cover aids in predicting and even preventing natural disasters and hazardous events [
2]. The continuous temporal availability of remote sensing data can substantially facilitate the automatic extraction, mapping, and monitoring of terrestrial objects and land covers. Being sensitive to narrow spectral bands across a continuous spectral range, HSI arguably emerges as the most promising method for acquiring remote sensing data, as it is remarkably informative and holds the potential to revolutionize earth monitoring capabilities [
3].
In recent decades, the exponential growth of computational power has shifted Artificial Intelligence (AI) to the forefront as the most influential and transformative technology of our time. AI harnesses the capabilities of computing systems for training and inference, which facilitates a broad spectrum of applications. Machine Learning (ML), a prominent subdiscipline of AI, employs statistical algorithms to emulate human learning processes by leveraging available data. Through this process, ML produces statistical models capable of making predictions for new, unseen data. A fundamental task within ML is classification, where objects are identified and categorized. Recent advancements in ML, particularly in image classification and segmentation, underscore the immense potential of these techniques in hyperspectral image analysis [
4]. Research indicates that ML methods surpass traditional approaches in hyperspectral image analysis, which typically involves manual or semimanual examination of the spectral information to identify objects and materials [
5]. Unlike conventional methods, ML autonomously explores the relationship between the spectral information and desired outcomes during the learning phase, thus exhibiting robustness against noise and outliers in the dataset. Among ML methodologies, supervised learning stands out as the preferred approach due to its simplicity, speed, cost-effectiveness, and reliability.
Semantic segmentation, the primary task in hyperspectral image analysis, entails assigning one or multiple labels to every pixel in a given image, thus generating segmented maps as output. This process utilizes both spectral and spatial information to exploit the physical and chemical characteristics of constituent objects and areas. Hyperspectral segmentation essentially performs pixel-level classification, thus distinguishing it from patch-level classification, which assigns labels to pixel patches. While Deep Learning (DL), a subgroup of ML methodologies, has significantly advanced semantic segmentation in RGB images in recent years [
6], hyperspectral image segmentation presents additional complexity due to its spectral dimension [
7], which usually contains the most informative details to distinguish the objects and areas within the image. Therefore, standard DL models need customization with additional considerations so that they can properly meet the requirements of the hyperspectral image segmentation task.
Despite the significant potential of ML in hyperspectral image analysis, several critical challenges persist that call for further research and development. One such challenge is the curse of dimensionality [
8], which is a phenomenon wherein the computational complexity of a problem escalates drastically with the increase in variables, dimensions, or features. This challenge is particularly pronounced in hyperspectral image analysis due to its spectral dimension, which leads to sparsity and exacerbates the curse of dimensionality. Consequently, efficient exploitation of information from hyperspectral images requires adopting proper strategies for dimensionality reduction. Despite ongoing efforts, devising approaches that effectively reduce dimensionality while maximizing the preservation of valuable information remains a challenge, which is crucial for enhancing the practicality of ML solutions in real-world scenarios [
9].
Another critical obstacle in hyperspectral image analysis, particularly for DL techniques, is the ground truth scarcity [
10]. DL methods often require extensive training data with a ground truth, which is typically challenging to obtain. This scarcity not only hampers model training but also leads to overfitting and low model performance. Consequently, classical ML techniques like Support Vector Machines (SVMs) may outperform DL in scenarios with limited training data [
11]. Some strategies, including data augmentation [
12], semisupervised learning [
13,
14], and transfer learning [
15,
16,
17], aim to address this challenge by respectively augmenting existing datasets, labeling unlabeled data, or transferring knowledge from pretrained models to new datasets so that they can partially mitigate the impact of ground truth scarcity on DL performance.
Ensuring the robustness, reliability, and generalizability of ML models poses another significant challenge [
18,
19]. Current datasets often lack the necessary variability to develop robust models capable of performing reliably under diverse conditions. Factors such as acquisition time, setup variations, sensor resolutions, and noise levels are frequently overlooked, thus leading to the development of models based on limited data scenarios. Building robust models that can accommodate a wide range of conditions remains a pressing challenge, which is essential for ensuring the practicality of ML solutions in real-world applications.
Furthermore, the accelerating pace of AI development raises concerns regarding computational limitations. Moore’s Law [
20], which has historically described computational progress, is nearing its physical boundaries, which calls for innovative alternative approaches to sustain future advancements. Packing ever more transistors onto a microchip is becoming infeasible as manufacturing approaches the physical limits of further miniaturization [
21]. Meanwhile, memory production faces similar constraints, as the demand, particularly driven by AI and the Internet of Things (IoT), outpaces the production capacities [
22]. Addressing these challenges requires concerted efforts in software optimization, algorithmic innovations, and architectural advancements to ensure the continued progress of AI and ML technologies [
23,
24].
One of the most significant developments in DL is the emergence of end-to-end pipeline structures. These structures integrate the feature engineering process with training–validation stages, thus consolidating the conventional pipeline’s four main steps: preprocessing, feature engineering, training–validation, and postprocessing. Despite the growing popularity of end-to-end DL models for hyperspectral image segmentation [
25], several concerns and challenges persist, thus making the classic four-stage ML pipeline structure more suitable for real-world applications, as explained in [
7]. Dimensionality reduction poses another critical challenge, thus requiring the careful preservation of valuable information. Classical ML pipelines address this by conducting feature engineering, which not only streamlines data but also enhances model robustness and reliability by projecting features into a more informative feature space, thereby helping to mitigate overfitting issues caused by the limited availability of ground truth data. Although feature engineering improves classifier performance and optimizes resource consumption, its heuristic nature does not guarantee optimal solutions. In contrast, end-to-end models leverage global optimization to identify optimal features during training and validation, thus offering potential improvements in efficiency and performance.
This paper proposes a framework that integrates the classic four-stage ML pipeline structure with end-to-end optimization capabilities and is specifically tailored to address challenges encountered in hyperspectral segmentation tasks in real-world applications. Feature engineering enables dimensionality reduction, hence lowering the impact of ground truth scarcity on model performance. We propose a strategy to decompose feature engineering into distinct inner steps, thus enabling the design and development of a framework that generates and optimizes multiple models through various combinations of these steps, including scenarios where feature engineering steps are omitted. This approach facilitates optimized model selection and enables comparative evaluations across different pipeline configurations.
Furthermore, we extend this framework concept into a prototype Automated Machine Learning (AutoML) system [
26]. An AutoML framework automates the end-to-end process of applying machine learning to real-world problems, including data preprocessing, feature engineering, model selection, and hyperparameter tuning [
27,
28]. By incorporating various consolidated techniques at different stages of the pipeline, our system identifies the most suitable methods for specific prediction tasks and input data. Our holistic scheme addresses diverse optimization requirements, including data versioning, model selection, and hyperparameter tuning. This enhances the generalizability, reliability, robustness, repeatability, and tractability of the resulting models while also allowing for effective monitoring and the mitigation of overfitting. The efficient implementation of our optimization scheme facilitates resource management and minimizes the risk of system failure. Additionally, our integrated midprocess statistics reporting enables a systematic review of AutoML behavior and choices, thus providing deeper insights into the effectiveness of each step. Finally, we evaluate our framework using a well-established problem with a widely cited dataset in the literature, thus allowing readers to benchmark our results against state-of-the-art approaches.
2. Materials and Methods
2.1. Framework Overview
End-to-end DL approaches face challenges such as processing inefficiencies and concerns regarding model robustness and generalizability, particularly with high-dimensional data like hyperspectral images. Wolpert’s “No Free Lunch” [
29,
30] theorem highlights the absence of a universally superior supervised learning algorithm, thus emphasizing the need to tailor approaches to individual classification problems. Therefore, although end-to-end DL models show a great capacity to generalize well in practice, the theoretical basis for this behavior remains unclear and is still being questioned [
31,
32,
33,
34]. DL models also require extensive training data, thus exacerbating the challenge of ground truth scarcity in hyperspectral image segmentation tasks. Despite attempts to mitigate these challenges through techniques like unsupervised learning, the curse of dimensionality and increasing model complexity hinder processing efficiency, thus making traditional four-stage machine learning pipelines more suitable for real-world applications.
Figure 1 shows the high-level workflow map we followed for designing and implementing the approach we propose in this research. It comprises three key phases: data engineering, model generation and training, and prediction and evaluation. In the data engineering phase, tasks involve preparing datasets by collecting, preprocessing, and splitting them into train and test sets. This phase is critical for ensuring model performance and the comparability of produced models. The model generation and training phase focuses on training models, tuning hyperparameters, and packaging models. This phase is resource-intensive and aims to enhance classifier performance through feature engineering methodologies. At its core is the model tweaking process, which systematically combines the different proposed steps of feature engineering with diverse classifiers to optimize hyperparameters. This iterative approach ensures that the resulting models are finely tuned for optimal performance across diverse evaluation metrics. The prediction and evaluation phase utilizes the optimized models to predict unseen data, thus extracting evaluation metrics to compare and analyze model performance and ultimately leading to conclusions.
2.2. Model Pipeline Configuration
As reasoned before, the proposed model pipeline configuration in this research adopts a four-stage ML structure comprising preprocessing, feature engineering, core classification/segmentation, and postprocessing steps, with the latter being optional and deferred for later inclusion. This configuration supports two modes: training and prediction. During training, a trained pipeline, or model, is generated and then utilized in the prediction mode to assign labels to new data.
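A minimal sketch of this two-mode usage, assuming the scikit-learn stack listed in Section 2.4, is given below; the particular steps and parameter values are illustrative placeholders rather than the tuned configuration.

```python
# Illustrative four-stage pipeline (postprocessing omitted, as in the text);
# fit() corresponds to the training mode and predict() to the prediction mode.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

model = Pipeline([
    ("transform", StandardScaler()),    # feature transformation
    ("extract", PCA(n_components=30)),  # feature extraction / dimensionality reduction
    ("classify", SVC(kernel="rbf")),    # core pixelwise classifier
])

# Training mode: X_train holds per-pixel spectra, y_train the class labels.
# model.fit(X_train, y_train)
# Prediction mode: assign labels to new, unseen pixels.
# y_pred = model.predict(X_test)
```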
2.2.1. Data Preprocessing and Feature Engineering
Data preprocessing is a critical stage that refines datasets by removing noise and validating data correctness. While some studies include feature engineering tasks within data preprocessing, in this study, we separated them for clarity and conducted preprocessing tasks before data versioning.
In general, feature engineering, as the core of the four-stage ML pipeline structure, aims to optimize features to improve model performance. In the context of high-dimensional data classification tasks, such as hyperspectral image segmentation, feature engineering becomes particularly crucial due to the mathematical complexities introduced by the data's high dimensionality. By removing redundant information and reducing dimensionality, feature engineering enhances the computational efficiency, performance, and reliability of ML models. The proposed framework assesses different types of feature engineering methodologies, including feature transformation, feature selection, and feature extraction, each serving a distinct role in enhancing the classification performance. Feature transformation involves mathematical operations to improve feature consistency, while feature selection reduces data dimensionality by selecting relevant subsets. Feature extraction projects data into a lower-dimensional feature space, thus further enhancing computational efficiency. By systematically optimizing the feature engineering steps, the framework aims to improve classification performance and optimize resource consumption, thus providing valuable insights about the dataset and its potential applications.
The framework employs a brute-force assembly process within an iterative loop for hyperparameter tuning, thus generating a set of models with different combinations of feature engineering steps, which are schematically shown in
Figure 2. These combinations are strategically positioned within the model pipeline to maximize their effectiveness. Feature transformation is placed at the beginning to preprocess data effectively, while feature extraction, which incorporates a form of feature selection, is positioned last. This positioning ensures the optimal utilization of feature engineering methodologies and avoids redundancy in the pipeline. By delineating clear categories of feature engineering and their respective roles, the framework offers a systematic approach to assess and optimize feature engineering for diverse ML tasks, thus contributing to enhanced model performance and resource efficiency. Our chosen approach and methodology for each type of feature engineering are elucidated as follows, with a consolidated code sketch provided after the list:
Feature transformation (FT): In this study, we employed two main techniques for feature transformation, Normalization (Min–Max scaling) and Standardization (standard scaling), due to their effectiveness in enhancing classification accuracy. Normalization is suitable for datasets with small standard deviations and non-normal feature distributions, while standardization helps transform data to a normal distribution, thus improving convergence and classification performance. Other scaling techniques like Maximum Absolute and Robust scalers were excluded due to the dataset characteristics, while Quantile and Power transformer scalers were deemed unsuitable for the study’s purposes [
35,
36,
37].
Feature selection (FS): The automatic feature selection approaches typically involve a combination of feature subset search methods and evaluation techniques to rank or prioritize features based on their correlations or importance in predictive tasks. These methods are categorized into Wrappers, Filters, and Embedded methods [
38]. Wrappers utilize a predictor to assess the usefulness of feature subsets, thus often leading to overfitting, while Filters rely on correlation coefficients or Mutual Information among features, though they may fail to find the best feature set in certain scenarios due to insufficient sample size. Embedded methods, on the other hand, embed the feature subset generator within model training to reduce overfitting and increase efficiency.
Embedded feature selection methods aim to optimize both the goodness of fit and the number of features, which is often achieved through direct objective optimization or thresholding techniques. Linear models with LASSO regularization and Random Forest are examples of direct objective optimization approaches, while thresholding methods like Ridge regularization offer an alternative solution. However, selecting the optimal threshold or number of features remains a challenge, thus often requiring empirical tuning. Another group of embedded methods utilizes nested subsets to manipulate feature subset search, thus employing forward selection or backward elimination techniques. While forward selection is computationally efficient, backward elimination provides a more accurate subset in a general context. Therefore, we employed Recursive Feature Elimination with Crossvalidation (RFECV), a common backward elimination approach that incorporates crossvalidation for robust feature selection.
For this study, we incorporated Random Forest (RF), Logistic Regression (LR) using Lasso (L1) and Ridge (L2) regularization, the Linear Support Vector Machine (LinearSVM), and K-Nearest Neighbors (KNN) into the RFECV structure as the base estimators [
39,
40]. RF assesses feature importance by evaluating the decrease in node impurity within its decision trees, thus using Gini impurity to quantify the likelihood of misclassifying a random observation. The LinearSVM and LR determine feature importance based on the coefficients assigned to each variable within their linear model. Although KNN does not inherently offer a measure of feature importance, it can be extracted using the Permutation Feature Importance technique [
41], which involves permuting feature values to gauge their impact on model precision, thereby integrating feature importance within the KNN model.
Feature extraction (FE): Feature extraction techniques primarily focus on reducing dimensionality by transforming data from a high-dimensional feature space to a lower-dimensional one. Unlike feature selection, which discards certain features, feature extraction aims to summarize information while highlighting important details and suppressing less relevant ones. While convolutional neural networks excel at feature extraction, the computational complexity and demand for extensive training data pose challenges, thus aligning with concerns regarding end-to-end pipelines. Therefore, in this study, we employed alternative feature extraction techniques such as Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Kernel Fisher’s Discriminant Analysis (KFDA), and Locally Linear Embedding (LLE).
PCA computes principal components for linear dimensionality reduction by maximizing variance via eigenvectors [
42], while KPCA extends this by using a kernel matrix to enable nonlinear dimensionality reduction [
43,
44]. Generalizing the PCA approach, ICA aims to extract maximally independent components, instead of principal components, from the original features, thus relying on the assumption of mutual statistical independence and non-Gaussian distribution of the components [
45]. LDA, unlike PCA and ICA, is a supervised technique that employs a discriminant rule to project data into a lower-dimensional space, thus typically reducing the dimensions to K − 1, where K represents the number of classes. It can further reduce the dimensions by projecting data into a subspace with dimension L, if L < K − 1, which is similar to PCA's approach of selecting the first L eigenvectors of the projection [
46]. KFDA focuses on maximizing the ratio of between-class variance to within-class variance using Fisher’s Discriminant Analysis in the feature space [
47], while LLE preserves the distances within local neighborhoods, thus mapping data to a low-dimensional space based on optimal linear reconstructions from nearby data points [
48].
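The consolidated sketch referenced above illustrates the three feature engineering families and the brute-force assembly of Figure 2 in scikit-learn; the estimators, component counts, and fold settings are assumptions for illustration rather than the framework's tuned choices (KFDA, available through the kfda package listed in Section 2.4, is omitted for brevity).

```python
# Hedged sketch: candidate steps per category and their brute-force combination
# into pipeline configurations (None marks an omitted step).
from itertools import product
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.svm import SVC

transformers = [None, MinMaxScaler(), StandardScaler()]                                 # FT
selectors = [None, RFECV(LogisticRegression(penalty="l1", solver="liblinear"), cv=5)]   # FS
extractors = [None,                                                                      # FE
              PCA(n_components=30),
              KernelPCA(n_components=30, kernel="rbf"),
              FastICA(n_components=30),
              LinearDiscriminantAnalysis(n_components=15),  # at most K - 1 components for K classes
              LocallyLinearEmbedding(n_components=30, n_neighbors=10)]

# Every FT x FS x FE combination becomes a candidate pipeline, ordered
# transformation -> selection -> extraction -> classifier, as argued above.
candidates = []
for ft, fs, fe in product(transformers, selectors, extractors):
    steps = [(name, step) for name, step in
             [("transform", ft), ("select", fs), ("extract", fe)] if step is not None]
    steps.append(("classify", SVC(kernel="rbf")))
    candidates.append(Pipeline(steps))
```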
2.2.2. Core Image Segmentation Task
Hyperspectral image analysis typically involves retrieving valuable information from both the spectral and spatial characteristics of image pixels. Segmentation methods for hyperspectral images vary, thus ranging from spectral-based approaches—often referred to as pixelwise classification approaches—to those integrating both spatial and spectral data—which are known as spatial–spectral classification. Common methods for incorporating spatial information include superpixel approaches [
49], spatial filtering [
50], and utilizing 3D CNN structures [
51]. In scenarios where the spatial information is less relevant, such as material identification or component detection, the spectral information becomes paramount.
While some studies advocate for spatial–spectral approaches in hyperspectral image segmentation, they often fail to demonstrate advantages over spectral pixelwise classification methods. Coarse spatial resolution can further diminish the usefulness of spatial details, particularly in land-use and land cover segmentation problems. Accordingly, pixelwise classification has been employed for the hyperspectral segmentation task in this study. We recommend integrating spatial features through image processing tools in the preprocessing and postprocessing phases for future cases where the spatial characteristics are more meaningful, as in high-resolution scenarios. In the evaluation of the proposed framework, two popular classifiers, the Nonlinear Support Vector Machine with Radial Basis Function Kernel (KernelSVM-RBF) and Multilayer Perceptron (MLP), were selected to assess the impact of optimized feature engineering on classification performance, thus ensuring comparability with state-of-the-art results. For MLP, we adopted a shallow, wide network with a single hidden layer of size 1000 to simplify the framework evaluation by reducing the classifier complexity and highlighting the impact of the feature engineering steps. However, the classifiers chosen for this framework evaluation may not represent the optimal choice, and better classifiers could yield further improvements.
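A brief sketch of the two classifier configurations, assuming scikit-learn, is given below; apart from the stated hidden layer size, the hyperparameter values shown are placeholders subject to the grid search described in Section 2.3.

```python
# KernelSVM-RBF and a shallow, wide MLP as interchangeable final pipeline steps.
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

kernel_svm = SVC(kernel="rbf", C=1.0, gamma="scale")           # KernelSVM-RBF
mlp = MLPClassifier(hidden_layer_sizes=(1000,), max_iter=500)  # single hidden layer of size 1000

# Either classifier is trained pixelwise on the engineered features:
# kernel_svm.fit(X_train, y_train); y_pred = kernel_svm.predict(X_test)
```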
2.3. Optimization Strategies
Arguably, the most critical research challenges in using ML techniques concern how to solve, verify, validate, and compare the employed models. Accordingly, our AutoML framework is designed to incorporate different optimization strategies to solve the model’s inner mathematical problem, tune its parameters, and validate the results. It is specifically designed to ensure the solution’s reliability, generalizability, reproducibility, and comparability. Typically, optimization involves regularization, hyperparameter tuning, and model selection. In addition to that, we also adopted data versioning to facilitate managing the iterative process of model development by creating distinct versions of datasets to track changes and updates.
As a primary step, data versioning was achieved through stratified k-folding—a sampling approach for crossvalidation—where the dataset was divided into k representative folds for training and testing. These versions were stored separately, thus ensuring reproducibility and aiding in model comparison and selection. The testing portion remained untouched for final evaluation and model selection, while the rest was used for training and hyperparameter tuning. We performed random shuffling before k-folding to ensure unbiased samples.
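A sketch of this data-versioning step is given below, assuming scikit-learn's StratifiedKFold; the number of folds, the synthetic stand-in data, and the file naming are illustrative assumptions.

```python
# Stratified k-folding with prior shuffling; each fold pair is stored as a
# distinct dataset version whose test indices stay untouched until final evaluation.
import pickle
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Stand-in for the preprocessed per-pixel spectra and labels.
X, y = make_classification(n_samples=500, n_features=220, n_informative=40, n_classes=5)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fold count is illustrative
for version, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    with open(f"dataset_version_{version}.pkl", "wb") as f:
        pickle.dump({"train": train_idx, "test": test_idx}, f)
```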
Regularization is an added objective specific to each ML methodology that helps to avoid overfitting by defining a loss function to optimize the model inner parameters. On the other hand, hyperparameter tuning optimizes input parameters through an iterative process, which is often conducted through heuristic approaches like Grid Search and Random Search. We adopted Grid search to ensure the reproducibility and comparability of the models, which is especially crucial for small datasets, as random search’s high variance and lack of reproducibility can hinder performance evaluation.
Returning to crossvalidation, the same stratified k-folding scheme was also used for hyperparameter tuning. Crossvalidation helps monitor the possibility of overfitting by systematically rotating through different subsets of the data for training and validation. To prevent bias in evaluating the model performance, we adopted the Nested Crossvalidation (or Double Crossvalidation) approach, as depicted in
Figure 3. This involves outer iterations for model selection and inner iterations for hyperparameter tuning, with each using stratified crossvalidation. The best-performing model from inner iterations was retrained on its entire training version subset to produce the final model.
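The sketch below expresses this nested scheme with scikit-learn, assuming a simple pipeline and grid; the fold counts, grid values, and synthetic stand-in data are placeholders.

```python
# Nested (double) crossvalidation: the inner GridSearchCV tunes hyperparameters,
# and the outer loop scores the refit best model on a held-out fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=220, n_informative=40, n_classes=5)

pipeline = Pipeline([("transform", StandardScaler()),
                     ("extract", PCA()),
                     ("classify", SVC(kernel="rbf"))])
param_grid = {"extract__n_components": [10, 20, 30], "classify__C": [1, 10, 100]}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")
    search.fit(X[train_idx], y[train_idx])                        # inner iterations: tuning
    outer_scores.append(search.score(X[test_idx], y[test_idx]))   # outer iteration: unbiased estimate
print(np.mean(outer_scores))
```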
The feature engineering proposed in this study emphasizes the independence of feature transformation and selection from the classifier, thus ensuring flexibility and adaptability across different ML tasks. While techniques like PCA and LDA allow for the flexible adjustment of feature components, others like LLE, ICA, and KFDA pose challenges due to the varying interpretations of component numbers. Because the effectiveness of feature extraction depends on the choice of classifier, the number of components is tuned as a pipeline hyperparameter across the different classifiers to determine its optimal value, as depicted in the schematic diagram in
Figure 4.
As previously described, we implemented an embedded approach known as backward elimination for feature selection optimization. This method ensured the independence of the feature selection process, thus leading to a notable reduction in the computational load and enabling unbiased model inference. Accordingly, each outer–inner iteration combination of feature transformation and selection technique was conducted separately before model training and hyperparameter tuning. The best-performing feature subset across iterations was selected through maximum voting.
Figure 4 illustrates this strategy, thus demonstrating how the chosen feature subset is integrated into the pipeline configuration to streamline dataset shrinkage.
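As an illustration of the voting step, the sketch below aggregates the per-iteration RFECV masks; interpreting "maximum voting" as a majority threshold is our assumption, since the exact cutoff is not restated here.

```python
# Combine boolean support_ masks from the per-iteration RFECV runs by voting.
import numpy as np

def vote_feature_subset(masks, threshold=None):
    """Keep the bands selected by at least `threshold` iterations (default: majority)."""
    votes = np.sum(np.asarray(masks, dtype=int), axis=0)
    if threshold is None:
        threshold = len(masks) // 2 + 1
    return votes >= threshold

# selected = vote_feature_subset([rfecv.support_ for rfecv in fitted_rfecv_runs])
# X_reduced = X[:, selected]   # the chosen subset then shrinks the dataset in the pipeline
```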
Finally, we used the framework to select the most accurate model for hyperspectral classification tasks, thus implicitly determining the most effective techniques. As explained previously, the framework incorporates several combinations of pipeline steps, including the end-to-end pipelines (without feature engineering) for explanatory purposes, thus allowing us to compare and demonstrate the efficacy of feature engineering. Accordingly, the primary evaluation compared the outcomes of the proposed four-stage pipeline with an end-to-end structure, thus assessing the computational time and predictive performance. The performance evaluation involved calculating the predictive accuracy using the test portion of the dataset from the outer iterations, with the accuracies averaged for comparison. Additional metrics like F1 Score, Precision, and Recall were also calculated to provide further statistical insight into model performance.
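A small sketch of the per-fold metric computation follows, assuming scikit-learn's metric functions; macro averaging over the classes is our assumption for the multiclass metrics.

```python
# Per-fold evaluation metrics, later averaged across the outer iterations.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate_fold(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```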
The secondary objective emphasizes the stand-alone nature of feature engineering, thus reducing the computational burden and providing insights for robust dimension reduction. As mentioned earlier, unlike other feature engineering steps, feature extraction’s parameter optimization depends on the choice of the classifier. Therefore, we can assess if tuning the feature extraction parameters improves the prediction and whether this improvement is influenced by the classifier choice.
2.4. Implementation
The implementation of the framework is based on Python 3.6.x, which was chosen for its widespread support and compatibility with all utilized packages. Various libraries, including Pandas, Numpy, Matplotlib, mpl_toolkits, scikit-learn, scikit-image, kfda, pillow, Pickle, scipy, math, and others, were employed for different tasks such as data analysis and result visualization. The deployment and execution occurred on a Docker-managed server utilizing CPU cores exclusively. The server’s multiuser nature dictates CPU core allocation, with 10 cores designated for feature transformation–selection and the grid search split into 16 parallel executions, with each utilizing 5 cores. Communication with the server and result visualization were facilitated through the Jupyter Notebook web service, thus enabling remote access via SSH Tunnelling for efficient management and access to execution results.
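As a hedged illustration of how such a core budget maps onto scikit-learn's built-in parallelism (the estimators and grids are placeholders, and the orchestration of the 16 parallel grid search executions is handled outside these calls):

```python
# n_jobs mirrors the core allocation described above.
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rfecv = RFECV(LogisticRegression(penalty="l1", solver="liblinear"),
              cv=5, n_jobs=10)                        # 10 cores for feature transformation-selection
search = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10, 100]},
                      cv=5, n_jobs=5)                 # 5 cores per grid search execution
```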
2.5. Testing Dataset
To evaluate the framework, we used the Indian Pines [
52] dataset, which is a hyperspectral image capturing a scene from the Indian Pines test site in northwest Tippecanoe County, Indiana, US, covering a 2 × 2 mile portion, including the Purdue University Agronomy farm and its surroundings. Captured by the AVIRIS sensor aboard a NASA aircraft on 12 June 1992, the image comprises 145 × 145 pixels and 224 spectral reflectance bands in the range of 0.4–2.5 µm. Accessible through the Purdue University Research Repository [
53], the dataset has 220 spectral bands after noise removal. It is already calibrated, and the pixel values are proportional to radiance. The ground truth contains 16 classes, predominantly related to agriculture and some to forests and natural perennial vegetation; the scene also features elements like highways, a rail line, housing, and built structures, along with some unlabeled areas.
Figure 5 shows the number of labeled pixels per class and their percentage in the ground truth set.
The dataset was intentionally chosen to be small in volume to align with the primary aim of this research: tackling the ground truth scarcity issue. In real-world applications of remote sensing data, collecting accurate and reliable ground truth information for each potential class on the ground is not only costly and labor-intensive but also often infeasible due to limited accessibility to regions and the temporal variability of on-ground objects. Consequently, ground truth scarcity is an inevitable issue with hyperspectral datasets. Therefore, using the Indian Pines dataset, which encapsulates the challenges of high-dimensional data with limited labeled samples, allowed us to effectively evaluate the proposed framework’s ability to handle these constraints. The inherent complexity of the dataset, due to the variety of classes and the presence of mixed pixels, further tested the robustness and reliability of our AutoML framework in hyperspectral image segmentation tasks.
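For readers reproducing this setup, a hedged loading sketch follows; the .mat file names and variable keys are assumptions (they differ between dataset distributions), while the shapes in the comments reflect the figures above.

```python
# Load the hyperspectral cube and ground truth, then flatten to per-pixel samples.
from scipy.io import loadmat

cube = loadmat("Indian_pines.mat")["indian_pines"]        # (145, 145, 220) reflectance cube
gt = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]    # (145, 145) labels, 0 = unlabeled

X = cube.reshape(-1, cube.shape[-1])   # one row per pixel, one column per spectral band
y = gt.ravel()
mask = y > 0                           # keep only labeled pixels for supervised training
X_labeled, y_labeled = X[mask], y[mask]
```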
4. Discussion
In ML, performance is a trade-off between the accuracy and the time of the process. This stems from the fact that no AI or ML task admits an absolute solution. Specifically, ML adopts statistical learning techniques that rely on error minimization, which never reaches zero when dealing with real-world datasets. The efficiency of an ML model is likewise defined as a compromise between these two factors, accuracy and time, which are restricted by the requirements of the specified application. Accordingly, the primary objective of our proposed framework is to improve hyperspectral image segmentation through feature engineering, which also involves striking a balance between accuracy and inference time. Reduced inference time is particularly crucial for real-time applications, where reducing latency and increasing throughput are of concern.
As previously elucidated, our framework encompasses various models constructed from all conceivable pipeline configurations, thus facilitating a benchmarking setup to assess the impact of feature engineering on hyperspectral image segmentation tasks.
Figure 9 illustrates a comparison of the mean accuracy and mean inference time across different models, thus incorporating or excluding various feature engineering steps. To establish a benchmark for comparing the effect of feature engineering, we categorized the pipeline structures based on state-of-the-art approaches into five reference categories: the end-to-end pipeline or “no feature engineering steps”, “only with feature transformation (FT) step”, “with FT and feature extraction (FE) steps”, “with FT and feature selection (FS) steps”, and “with all feature engineering steps” pipelines.
Figure 9 depicts the mean accuracy and inference time of the best-performing model across all these configuration categories. It is important to note that for clarity, all values are presented relative to the “only with FT step”. Thus, at the “only with FT step”, the mean value for the accuracy and inference time is depicted as zero. Notably, standardization consistently emerged as the optimal feature transformation technique, with PCA being the preferred feature extraction method across all cases. Additionally, RFECV-LR-l1 has been identified as the optimal feature selection technique.
The results reveal a consistent improvement in the prediction performance and a reduction in the inference time with the inclusion of feature engineering steps. Notably, while feature transformation and feature selection significantly enhanced performance, feature extraction had a relatively minor impact. This underscores the efficacy of the proposed feature transformation–selection approach for dimensionality reduction while improving performance. It is suggested that focusing on feature transformation and selection, independent of the core classifier choice, can greatly enhance AutoML performance.
We have also assessed the framework using another hyperspectral dataset from an industrial setup, which has been reported thoroughly in [
54]. This dataset focuses on a specific application: detecting and measuring residual contaminants in the production of washing machine cabinets for zero-defect manufacturing. The choice of this dataset is significant because it not only involves a different sensor and setup but also targets classes with less distinct spectral signatures, thus providing a more stringent evaluation of the framework’s capabilities. Overall, the results are consistent with those reported here: the AutoML framework again produced an optimized, robust model and improved the overall performance. The findings underscore the importance of tailored feature engineering strategies in optimizing model performance and efficiency across diverse datasets and scenarios. The industrial assessment provides a comprehensive and thorough test of the framework’s transferability, thus highlighting its capability to adapt to different contexts and confirming its generalizability beyond similar case studies.
5. Conclusions
In this study, we introduced an AutoML framework tailored for hyperspectral image segmentation, thus highlighting the effectiveness of a classic four-stage ML pipeline structure that integrates feature engineering to address data challenges. Through feature engineering and dimensionality reduction techniques, we mitigated the necessity for a large quantity of labeled data, which is a particular challenge for end-to-end deep learning models. Additionally, the proposed framework not only improved overall performance but also generated models with more accurate predictions within shorter inference times. Through a multilateral optimization approach, the framework ensured model robustness and reliability by mitigating overfitting and bias concerns. The transparency of the framework’s process, allowing access to all models and statistical information, facilitated the validation of the proposed approach’s effectiveness. Overall, the study successfully achieved its objectives in addressing challenges with hyperspectral image-based machine learning solutions.
The hyperspectral-based AutoML framework presented in this study offers a streamlined solution for developing supervised ML models tailored to specific hyperspectral image segmentation tasks. Automating the task-specific feature engineering process simplifies what is typically a complex and expertise-intensive endeavor. Notably, the proposed feature selection component, identified as a key factor in enhancing predictive performance, can pinpoint irrelevant features, thus enabling informed decisions regarding data collection, transmission, and storage efficiency.
In considering future directions for AutoML, it is crucial to prioritize sustainability, efficiency, scalability, and inclusiveness. Addressing the resource-intensive nature and carbon emissions associated with AutoML processes is essential, and efforts should be made to minimize footprints through optimization criteria like overall runtime, energy consumption, and CO2 emissions. Enhancing efficiency remains a key focus, particularly through algorithmic improvements and resource consumption optimization, and there is an opportunity to assess and refine frameworks for better performance. Additionally, scalability and inclusiveness are critical for making AutoML more accessible across different fields and applications. The proposed framework offers problem-specific solutions and can be scaled up independently or integrated into other AutoML frameworks. Future research should explore alternative feature engineering sequences and incorporate multiple techniques within each category to improve dimensionality reduction effectiveness. Evaluating additional ML and image processing methodologies, along with exploring new evaluation metrics and model selection techniques, can provide a more comprehensive analysis of AutoML frameworks. Moreover, extending the evaluation to new applications will help establish the generalizability of the proposed approach across diverse hyperspectral datasets and applications.