1. Introduction
Hyperspectral imaging (HSI) sensors gather extensive information in the form of 3D hyperspectral images, whose spatial dimensions capture the visual characteristics of objects and whose spectral dimension supports material identification. Extracting such information from hyperspectral images goes beyond what manual analysis can achieve and requires advanced computer-aided techniques. Each hyperspectral image consists of numerous narrow-band images of the same scene, thus requiring tailored image processing methods for thorough analysis. Accordingly, while conventional imaging principles and processing techniques apply to hyperspectral data, their comprehensive utilization requires additional adaptation and effort.
Remote sensing not only serves as the origin and primary domain of HSI technology but also drives most of its advancements and applications. Its foremost application lies in earth observation to detect and monitor the physical characteristics of areas and objects on the earth. Monitoring changes in land cover is key for better formulating and managing regulations to prevent or mitigate damage that results from human activities [
1]. Furthermore, monitoring subtle yet significant alterations in land cover aids in predicting and even preventing natural disasters and hazardous events [
2]. The continuous temporal availability of remote sensing data can substantially facilitate the automatic extraction, mapping, and monitoring of terrestrial objects and land covers. Being sensitive to narrow spectral bands across a continuous spectral range, HSI arguably emerges as the most promising method for acquiring remote sensing data, as it is remarkably informative and holds the potential to revolutionize earth monitoring capabilities [
3].
In recent decades, the exponential growth of computational power has shifted Artificial Intelligence (AI) to the forefront as the most influential and transformative technology of our time. AI harnesses the capabilities of computing systems for training and inference, which facilitates a broad spectrum of applications. Machine Learning (ML), a prominent subdiscipline of AI, employs statistical algorithms to emulate human learning processes by leveraging available data. Through this process, ML produces statistical models capable of making predictions for new, unseen data. A fundamental task within ML is classification, where objects are identified and categorized. Recent advancements in ML, particularly in image classification and segmentation, underscore the immense potential of these techniques in hyperspectral image analysis [
4]. Research indicates that ML methods surpass traditional approaches in hyperspectral image analysis, which typically involves manual or semimanual examination of the spectral information to identify objects and materials [
5]. Unlike conventional methods, ML autonomously explores the relationship between the spectral information and desired outcomes during the learning phase, thus exhibiting robustness against noise and outliers in the dataset. Among ML methodologies, supervised learning stands out as the preferred approach due to its simplicity, speed, cost-effectiveness, and reliability.
Semantic segmentation, the primary task in hyperspectral image analysis, entails assigning one or multiple labels to every pixel in a given image, thus generating segmented maps as output. This process utilizes both spectral and spatial information to exploit the physical and chemical characteristics of constituent objects and areas. Hyperspectral segmentation essentially performs pixel-level classification, thus distinguishing it from patch-level classification, which assigns labels to pixel patches. While Deep Learning (DL), a subgroup of ML methodologies, has significantly advanced semantic segmentation in RGB images in recent years [
6], hyperspectral image segmentation presents additional complexity due to its spectral dimension [
7], which usually contains the most informative details to distinguish the objects and areas within the image. Therefore, standard DL models need customization with additional considerations so that they can properly meet the requirements of the hyperspectral image segmentation task.
Despite the significant potential of ML in hyperspectral image analysis, several critical challenges persist that call for further research and development. One such challenge is the curse of dimensionality [
8], which is a phenomenon wherein the computational complexity of a problem escalates drastically with the increase in variables, dimensions, or features. This challenge is particularly pronounced in hyperspectral image analysis due to its spectral dimension, which leads to sparsity and exacerbates the curse of dimensionality. Consequently, efficient exploitation of information from hyperspectral images requires adopting proper strategies for dimensionality reduction. Despite ongoing efforts, devising approaches that effectively reduce dimensionality while maximizing the preservation of valuable information remains a challenge, which is crucial for enhancing the practicality of ML solutions in real-world scenarios [
9].
Another critical obstacle in hyperspectral image analysis, particularly for DL techniques, is the ground truth scarcity [
10]. DL methods often require extensive training data with a ground truth, which is typically challenging to obtain. This scarcity not only hampers model training but also leads to overfitting and low model performance. Consequently, classical ML techniques like Support Vector Machines (SVMs) may outperform DL in scenarios with limited training data [
11]. Some strategies, including data augmentation [
12], semisupervised learning [
13,
14], and transfer learning [
15,
16,
17], aim to address this challenge by respectively augmenting existing datasets, labeling unlabeled data, or transferring knowledge from pretrained models to new datasets so that they can partially mitigate the impact of ground truth scarcity on DL performance.
Ensuring the robustness, reliability, and generalizability of ML models poses another significant challenge [
18,
19]. Current datasets often lack the necessary variability to develop robust models capable of performing reliably under diverse conditions. Factors such as acquisition time, setup variations, sensor resolutions, and noise levels are frequently overlooked, thus leading to the development of models based on limited data scenarios. Building robust models that can accommodate a wide range of conditions remains a pressing challenge, which is essential for ensuring the practicality of ML solutions in real-world applications.
Furthermore, the accelerating pace of AI development raises concerns regarding computational limitations. Moore’s Law [
20], which has historically described computational progress, is nearing its physical boundaries, which calls for innovative alternative approaches to sustain future advancements. Packing ever more transistors onto a microchip is becoming infeasible as manufacturing approaches the physical limits of further miniaturization [
21]. Meanwhile, memory production faces similar constraints, as the demand, particularly driven by AI and the Internet of Things (IoT), outpaces the production capacities [
22]. Addressing these challenges requires concerted efforts in software optimization, algorithmic innovations, and architectural advancements to ensure the continued progress of AI and ML technologies [
23,
24].
One of the most significant developments in DL is the emergence of end-to-end pipeline structures. These structures integrate the feature engineering process with training–validation stages, thus consolidating the conventional pipeline’s four main steps: preprocessing, feature engineering, training–validation, and postprocessing. Despite the growing popularity of end-to-end DL models for hyperspectral image segmentation [
25], several concerns and challenges persist, thus making the classic four-stage ML pipeline structure more suitable for real-world applications, as explained in [
7]. Dimensionality reduction poses another critical challenge, thus requiring the careful preservation of valuable information. Classical ML pipelines address this by conducting feature engineering, which not only streamlines data but also enhances model robustness and reliability by projecting features into a more informative feature space, thereby helping to mitigate overfitting issues caused by the limited availability of ground truth data. Although feature engineering improves classifier performance and optimizes resource consumption, its heuristic nature does not guarantee optimal solutions. In contrast, end-to-end models leverage global optimization to identify optimal features during training and validation, thus offering potential improvements in efficiency and performance.
This paper proposes a framework that integrates the classic four-stage ML pipeline structure with end-to-end optimization capabilities and is specifically tailored to address challenges encountered in hyperspectral segmentation tasks in real-world applications. Feature engineering enables dimensionality reduction, hence lowering the impact of ground truth scarcity on model performance. We propose a strategy to decompose feature engineering into distinct inner steps, thus enabling the design and development of a framework that generates and optimizes multiple models through various combinations of these steps, including scenarios where feature engineering steps are omitted. This approach facilitates optimized model selection and enables comparative evaluations across different pipeline configurations.
Furthermore, we extend this framework concept into a prototype Automated Machine Learning (AutoML) system [
26]. An AutoML framework automates the end-to-end process of applying machine learning to real-world problems, including data preprocessing, feature engineering, model selection, and hyperparameter tuning [
27,
28]. By incorporating various consolidated techniques at different stages of the pipeline, our system identifies the most suitable methods for specific prediction tasks and input data. Our holistic scheme addresses diverse optimization requirements, including data versioning, model selection, and hyperparameter tuning. This enhances the generalizability, reliability, robustness, repeatability, and tractability of the resulting models while also allowing for effective monitoring and the mitigation of overfitting. The efficient implementation of our optimization scheme facilitates resource management and minimizes the risk of system failure. Additionally, our integrated midprocess statistics reporting enables a systematic review of AutoML behavior and choices, thus providing deeper insights into the effectiveness of each step. Finally, we evaluate our framework using a well-established problem with a widely cited dataset in the literature, thus allowing readers to benchmark our results against state-of-the-art approaches.
2. Materials and Methods
2.1. Framework Overview
End-to-end DL approaches face challenges such as processing inefficiencies and concerns regarding model robustness and generalizability, particularly with high-dimensional data like hyperspectral images. Wolpert’s “No Free Lunch” [
29,
30] theorem highlights the absence of a universally superior supervised learning algorithm, thus emphasizing the need to tailor approaches to individual classification problems. Therefore, although end-to-end DL models show a great capacity to generalize well in practice, the theoretical basis for this behavior remains unclear and is still being questioned [
31,
32,
33,
34]. DL models also require extensive training data, thus exacerbating the challenge of ground truth scarcity in hyperspectral image segmentation tasks. Despite attempts to mitigate these challenges through techniques like unsupervised learning, the curse of dimensionality and increasing model complexity hinder processing efficiency, thus making traditional four-stage machine learning pipelines more suitable for real-world applications.
Figure 1 shows the high-level workflow map we followed for designing and implementing the approach we propose in this research. It comprises three key phases: data engineering, model generation and training, and prediction and evaluation. In the data engineering phase, tasks involve preparing datasets by collecting, preprocessing, and splitting them into train and test sets. This phase is critical for ensuring model performance and the comparability of produced models. The model generation and training phase focuses on training models, tuning hyperparameters, and packaging models. This phase is resource-intensive and aims to enhance classifier performance through feature engineering methodologies. At its core is the model tweaking process, which systematically combines the different proposed steps of feature engineering with diverse classifiers to optimize hyperparameters. This iterative approach ensures that the resulting models are finely tuned for optimal performance across diverse evaluation metrics. The prediction and evaluation phase utilizes the optimized models to predict unseen data, thus extracting evaluation metrics to compare and analyze model performance and ultimately leading to conclusions.
2.2. Model Pipeline Configuration
As reasoned before, the proposed model pipeline configuration in this research adopts a four-stage ML structure comprising preprocessing, feature engineering, core classification/segmentation, and postprocessing steps, with the latter being optional and deferred for later inclusion. This configuration supports two modes: training and prediction. During training, a trained pipeline, or model, is generated and then utilized in the prediction mode to assign labels to new data.
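A minimal sketch of this two-mode usage, assuming the scikit-learn stack listed in Section 2.4, is given below; the particular steps and parameter values are illustrative placeholders rather than the tuned configuration.

```python
# Illustrative four-stage pipeline (postprocessing omitted, as in the text);
# fit() corresponds to the training mode and predict() to the prediction mode.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

model = Pipeline([
    ("transform", StandardScaler()),    # feature transformation
    ("extract", PCA(n_components=30)),  # feature extraction / dimensionality reduction
    ("classify", SVC(kernel="rbf")),    # core pixelwise classifier
])

# Training mode: X_train holds per-pixel spectra, y_train the class labels.
# model.fit(X_train, y_train)
# Prediction mode: assign labels to new, unseen pixels.
# y_pred = model.predict(X_test)
```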
2.2.1. Data Preprocessing and Feature Engineering
Data preprocessing is a critical stage that refines datasets by removing noise and validating data correctness. While some studies include feature engineering tasks within data preprocessing, in this study, we separated them for clarity and conducted preprocessing tasks before data versioning.
In general, feature engineering, as the core of the four-stage ML pipeline structure, aims to optimize features to improve model performance. In the context of high-dimensional data classification tasks, such as hyperspectral image segmentation, feature engineering becomes particularly crucial due to the mathematical complexities introduced by the data's high dimensionality. By removing redundant information and reducing dimensionality, feature engineering enhances the computational efficiency, performance, and reliability of ML models. The proposed framework assesses different types of feature engineering methodologies, including feature transformation, feature selection, and feature extraction, each serving a distinct role in enhancing the classification performance. Feature transformation involves mathematical operations to improve feature consistency, while feature selection reduces data dimensionality by selecting relevant subsets. Feature extraction projects data into a lower-dimensional feature space, thus further enhancing computational efficiency. By systematically optimizing the feature engineering steps, the framework aims to improve classification performance and optimize resource consumption, thus providing valuable insights about the dataset and its potential applications.
The framework employs a brute-force assembly process within an iterative loop for hyperparameter tuning, thus generating a set of models with different combinations of feature engineering steps, which are schematically shown in
Figure 2. These combinations are strategically positioned within the model pipeline to maximize their effectiveness. Feature transformation is placed at the beginning to preprocess data effectively, while feature extraction, which incorporates a form of feature selection, is positioned last. This positioning ensures the optimal utilization of feature engineering methodologies and avoids redundancy in the pipeline. By delineating clear categories of feature engineering and their respective roles, the framework offers a systematic approach to assess and optimize feature engineering for diverse ML tasks, thus contributing to enhanced model performance and resource efficiency. Our chosen approach and methodology for each type of feature engineering are elucidated as follows, with a consolidated code sketch provided after the list:
Feature transformation (FT): In this study, we employed two main techniques for feature transformation, Normalization (Min–Max scaling) and Standardization (standard scaling), due to their effectiveness in enhancing classification accuracy. Normalization is suitable for datasets with small standard deviations and non-normal feature distributions, while standardization helps transform data to a normal distribution, thus improving convergence and classification performance. Other scaling techniques like Maximum Absolute and Robust scalers were excluded due to the dataset characteristics, while Quantile and Power transformer scalers were deemed unsuitable for the study’s purposes [
35,
36,
37].
Feature selection (FS): The automatic feature selection approaches typically involve a combination of feature subset search methods and evaluation techniques to rank or prioritize features based on their correlations or importance in predictive tasks. These methods are categorized into Wrappers, Filters, and Embedded methods [
38]. Wrappers utilize a predictor to assess the usefulness of feature subsets, thus often leading to overfitting, while Filters rely on correlation coefficients or Mutual Information among features, though they may fail to find the best feature set in certain scenarios due to insufficient sample size. Embedded methods, on the other hand, embed the feature subset generator within model training to reduce overfitting and increase efficiency.
Embedded feature selection methods aim to optimize both the goodness of fit and the number of features, which is often achieved through direct objective optimization or thresholding techniques. Linear models with LASSO regularization and Random Forest are examples of direct objective optimization approaches, while thresholding methods like Ridge regularization offer an alternative solution. However, selecting the optimal threshold or number of features remains a challenge, thus often requiring empirical tuning. Another group of embedded methods utilizes nested subsets to manipulate feature subset search, thus employing forward selection or backward elimination techniques. While forward selection is computationally efficient, backward elimination provides a more accurate subset in a general context. Therefore, we employed Recursive Feature Elimination with Crossvalidation (RFECV), a common backward elimination approach that incorporates crossvalidation for robust feature selection.
For this study, we incorporated Random Forest (RF), Logistic Regression (LR) using Lasso (L1) and Ridge (L2) regularization, the Linear Support Vector Machine (LinearSVM), and K-Nearest Neighbors (KNN) into the RFECV structure as the base estimators [
39,
40]. RF assesses feature importance by evaluating the decrease in node impurity within its decision trees, thus using Gini impurity to quantify the likelihood of misclassifying a random observation. The LinearSVM and LR determine feature importance based on the coefficients assigned to each variable within their linear model. Although KNN does not inherently offer a measure of feature importance, it can be extracted using the Permutation Feature Importance technique [
41], which involves permuting feature values to gauge their impact on model precision, thereby integrating feature importance within the KNN model.
Feature extraction (FE): Feature extraction techniques primarily focus on reducing dimensionality by transforming data from a high-dimensional feature space to a lower-dimensional one. Unlike feature selection, which discards certain features, feature extraction aims to summarize information while highlighting important details and suppressing less relevant ones. While convolutional neural networks excel at feature extraction, the computational complexity and demand for extensive training data pose challenges, thus aligning with concerns regarding end-to-end pipelines. Therefore, in this study, we employed alternative feature extraction techniques such as Principal Component Analysis (PCA), Kernel Principal Component Analysis (KPCA), Independent Component Analysis (ICA), Linear Discriminant Analysis (LDA), Kernel Fisher’s Discriminant Analysis (KFDA), and Locally Linear Embedding (LLE).
PCA computes principal components for linear dimensionality reduction by maximizing variance via eigenvectors [
42], while KPCA extends this by using a kernel matrix to enable nonlinear dimensionality reduction [
43,
44]. Generalizing the PCA approach, ICA aims to extract maximally independent components, instead of principal components, from the original features, thus relying on the assumption of mutual statistical independence and non-Gaussian distribution of the components [
45]. LDA, unlike PCA and ICA, is a supervised technique that employs a discriminant rule to project data into a lower-dimensional space, thus typically reducing the dimensions to K − 1, where K represents the number of classes. It can further reduce the dimensions by projecting data into a subspace with dimension L, if L < K − 1, which is similar to PCA's approach of selecting the first L eigenvectors of the projection [
46]. KFDA focuses on maximizing the ratio of between-class variance to within-class variance using Fisher’s Discriminant Analysis in the feature space [
47], while LLE preserves the distances within local neighborhoods, thus mapping data to a low-dimensional space based on optimal linear reconstructions from nearby data points [
48].
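The consolidated sketch referenced above illustrates the three feature engineering families and the brute-force assembly of Figure 2 in scikit-learn; the estimators, component counts, and fold settings are assumptions for illustration rather than the framework's tuned choices (KFDA, available through the kfda package listed in Section 2.4, is omitted for brevity).

```python
# Hedged sketch: candidate steps per category and their brute-force combination
# into pipeline configurations (None marks an omitted step).
from itertools import product
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA, KernelPCA, FastICA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.svm import SVC

transformers = [None, MinMaxScaler(), StandardScaler()]                                 # FT
selectors = [None, RFECV(LogisticRegression(penalty="l1", solver="liblinear"), cv=5)]   # FS
extractors = [None,                                                                      # FE
              PCA(n_components=30),
              KernelPCA(n_components=30, kernel="rbf"),
              FastICA(n_components=30),
              LinearDiscriminantAnalysis(n_components=15),  # at most K - 1 components for K classes
              LocallyLinearEmbedding(n_components=30, n_neighbors=10)]

# Every FT x FS x FE combination becomes a candidate pipeline, ordered
# transformation -> selection -> extraction -> classifier, as argued above.
candidates = []
for ft, fs, fe in product(transformers, selectors, extractors):
    steps = [(name, step) for name, step in
             [("transform", ft), ("select", fs), ("extract", fe)] if step is not None]
    steps.append(("classify", SVC(kernel="rbf")))
    candidates.append(Pipeline(steps))
```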
2.2.2. Core Image Segmentation Task
Hyperspectral image analysis typically involves retrieving valuable information from both the spectral and spatial characteristics of image pixels. Segmentation methods for hyperspectral images vary, thus ranging from spectral-based approaches—often referred to as pixelwise classification approaches—to those integrating both spatial and spectral data—which are known as spatial–spectral classification. Common methods for incorporating spatial information include superpixel approaches [
49], spatial filtering [
50], and utilizing 3D CNN structures [
51]. In scenarios where the spatial information is less relevant, such as material identification or component detection, the spectral information becomes paramount.
While some studies advocate for spatial–spectral approaches in hyperspectral image segmentation, they often fail to demonstrate advantages over spectral pixelwise classification methods. Coarse spatial resolution can further diminish the usefulness of spatial details, particularly in land-use and land cover segmentation problems. Accordingly, pixelwise classification has been employed for the hyperspectral segmentation task in this study. We recommend integrating spatial features through image processing tools in the preprocessing and postprocessing phases for future cases where the spatial characteristics are more meaningful, as in high-resolution scenarios. In the evaluation of the proposed framework, two popular classifiers, the Nonlinear Support Vector Machine with Radial Basis Function Kernel (KernelSVM-RBF) and Multilayer Perceptron (MLP), were selected to assess the impact of optimized feature engineering on classification performance, thus ensuring comparability with state-of-the-art results. For MLP, we adopted a shallow, wide network with a single hidden layer of size 1000 to simplify the framework evaluation by reducing the classifier complexity and highlighting the impact of the feature engineering steps. However, the classifiers chosen for this framework evaluation may not represent the optimal choice, and better classifiers could yield further improvements.
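A brief sketch of the two classifier configurations, assuming scikit-learn, is given below; apart from the stated hidden layer size, the hyperparameter values shown are placeholders subject to the grid search described in Section 2.3.

```python
# KernelSVM-RBF and a shallow, wide MLP as interchangeable final pipeline steps.
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

kernel_svm = SVC(kernel="rbf", C=1.0, gamma="scale")           # KernelSVM-RBF
mlp = MLPClassifier(hidden_layer_sizes=(1000,), max_iter=500)  # single hidden layer of size 1000

# Either classifier is trained pixelwise on the engineered features:
# kernel_svm.fit(X_train, y_train); y_pred = kernel_svm.predict(X_test)
```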
2.3. Optimization Strategies
Arguably, the most critical research challenges in using ML techniques concern how to solve, verify, validate, and compare the employed models. Accordingly, our AutoML framework is designed to incorporate different optimization strategies to solve the model’s inner mathematical problem, tune its parameters, and validate the results. It is specifically designed to ensure the solution’s reliability, generalizability, reproducibility, and comparability. Typically, optimization involves regularization, hyperparameter tuning, and model selection. In addition to that, we also adopted data versioning to facilitate managing the iterative process of model development by creating distinct versions of datasets to track changes and updates.
As a primary step, data versioning was achieved through stratified k-folding—a sampling approach for crossvalidation—where the dataset was divided into k representative folds for training and testing. These versions were stored separately, thus ensuring reproducibility and aiding in model comparison and selection. The testing portion remained untouched for final evaluation and model selection, while the rest was used for training and hyperparameter tuning. We performed random shuffling before k-folding to ensure unbiased samples.
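A sketch of this data-versioning step is given below, assuming scikit-learn's StratifiedKFold; the number of folds, the synthetic stand-in data, and the file naming are illustrative assumptions.

```python
# Stratified k-folding with prior shuffling; each fold pair is stored as a
# distinct dataset version whose test indices stay untouched until final evaluation.
import pickle
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold

# Stand-in for the preprocessed per-pixel spectra and labels.
X, y = make_classification(n_samples=500, n_features=220, n_informative=40, n_classes=5)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)  # fold count is illustrative
for version, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    with open(f"dataset_version_{version}.pkl", "wb") as f:
        pickle.dump({"train": train_idx, "test": test_idx}, f)
```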
Regularization is an added objective specific to each ML methodology that helps to avoid overfitting by defining a loss function to optimize the model inner parameters. On the other hand, hyperparameter tuning optimizes input parameters through an iterative process, which is often conducted through heuristic approaches like Grid Search and Random Search. We adopted Grid search to ensure the reproducibility and comparability of the models, which is especially crucial for small datasets, as random search’s high variance and lack of reproducibility can hinder performance evaluation.
Returning to crossvalidation, the same stratified k-folding scheme was also used for hyperparameter tuning. Crossvalidation helps monitor the possibility of overfitting by systematically rotating through different subsets of the data for training and validation. To prevent bias in evaluating the model performance, we adopted the Nested Crossvalidation (or Double Crossvalidation) approach, as depicted in
Figure 3. This involves outer iterations for model selection and inner iterations for hyperparameter tuning, with each using stratified crossvalidation. The best-performing model from inner iterations was retrained on its entire training version subset to produce the final model.
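The sketch below expresses this nested scheme with scikit-learn, assuming a simple pipeline and grid; the fold counts, grid values, and synthetic stand-in data are placeholders.

```python
# Nested (double) crossvalidation: the inner GridSearchCV tunes hyperparameters,
# and the outer loop scores the refit best model on a held-out fold.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=220, n_informative=40, n_classes=5)

pipeline = Pipeline([("transform", StandardScaler()),
                     ("extract", PCA()),
                     ("classify", SVC(kernel="rbf"))])
param_grid = {"extract__n_components": [10, 20, 30], "classify__C": [1, 10, 100]}

outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y):
    search = GridSearchCV(pipeline, param_grid, cv=inner_cv, scoring="accuracy")
    search.fit(X[train_idx], y[train_idx])                        # inner iterations: tuning
    outer_scores.append(search.score(X[test_idx], y[test_idx]))   # outer iteration: unbiased estimate
print(np.mean(outer_scores))
```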
The feature engineering proposed in this study emphasizes the independence of feature transformation and selection from the classifier, thus ensuring flexibility and adaptability across different ML tasks. While techniques like PCA and LDA allow for the flexible adjustment of feature components, others like LLE, ICA, and KFDA pose challenges due to the varying interpretations of component numbers. Because the effectiveness of feature extraction depends on the choice of classifier, the number of components is tuned as a pipeline hyperparameter across the different classifiers to determine its optimal value, as depicted in the schematic diagram in
Figure 4.
As previously described, we implemented an embedded approach known as backward elimination for feature selection optimization. This method ensured the independence of the feature selection process, thus leading to a notable reduction in the computational load and enabling unbiased model inference. Accordingly, each outer–inner iteration combination of feature transformation and selection technique was conducted separately before model training and hyperparameter tuning. The best-performing feature subset across iterations was selected through maximum voting.
Figure 4 illustrates this strategy, thus demonstrating how the chosen feature subset is integrated into the pipeline configuration to streamline dataset shrinkage.
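As an illustration of the voting step, the sketch below aggregates the per-iteration RFECV masks; interpreting "maximum voting" as a majority threshold is our assumption, since the exact cutoff is not restated here.

```python
# Combine boolean support_ masks from the per-iteration RFECV runs by voting.
import numpy as np

def vote_feature_subset(masks, threshold=None):
    """Keep the bands selected by at least `threshold` iterations (default: majority)."""
    votes = np.sum(np.asarray(masks, dtype=int), axis=0)
    if threshold is None:
        threshold = len(masks) // 2 + 1
    return votes >= threshold

# selected = vote_feature_subset([rfecv.support_ for rfecv in fitted_rfecv_runs])
# X_reduced = X[:, selected]   # the chosen subset then shrinks the dataset in the pipeline
```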
Finally, we used the framework to select the most accurate model for hyperspectral classification tasks, thus implicitly determining the most effective techniques. As explained previously, the framework incorporates several combinations of pipeline steps, including the end-to-end pipelines (without feature engineering) for explanatory purposes, thus allowing us to compare and demonstrate the efficacy of feature engineering. Accordingly, the primary evaluation compared the outcomes of the proposed four-stage pipeline with an end-to-end structure, thus assessing the computational time and predictive performance. The performance evaluation involved calculating the predictive accuracy using the test portion of the dataset from the outer iterations, with the accuracies averaged for comparison. Additional metrics like F1 Score, Precision, and Recall were also calculated to provide further statistical insight into model performance.
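A small sketch of the per-fold metric computation follows, assuming scikit-learn's metric functions; macro averaging over the classes is our assumption for the multiclass metrics.

```python
# Per-fold evaluation metrics, later averaged across the outer iterations.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate_fold(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
    }
```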
The secondary objective emphasizes the stand-alone nature of feature engineering, thus reducing the computational burden and providing insights for robust dimension reduction. As mentioned earlier, unlike other feature engineering steps, feature extraction’s parameter optimization depends on the choice of the classifier. Therefore, we can assess if tuning the feature extraction parameters improves the prediction and whether this improvement is influenced by the classifier choice.
2.4. Implementation
The implementation of the framework is based on Python 3.6.x, which was chosen for its widespread support and compatibility with all utilized packages. Various libraries, including Pandas, Numpy, Matplotlib, mpl_toolkits, scikit-learn, scikit-image, kfda, pillow, Pickle, scipy, math, and others, were employed for different tasks such as data analysis and result visualization. The deployment and execution occurred on a Docker-managed server utilizing CPU cores exclusively. The server’s multiuser nature dictates CPU core allocation, with 10 cores designated for feature transformation–selection and the grid search split into 16 parallel executions, with each utilizing 5 cores. Communication with the server and result visualization were facilitated through the Jupyter Notebook web service, thus enabling remote access via SSH Tunnelling for efficient management and access to execution results.
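As a hedged illustration of how such a core budget maps onto scikit-learn's built-in parallelism (the estimators and grids are placeholders, and the orchestration of the 16 parallel grid search executions is handled outside these calls):

```python
# n_jobs mirrors the core allocation described above.
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rfecv = RFECV(LogisticRegression(penalty="l1", solver="liblinear"),
              cv=5, n_jobs=10)                        # 10 cores for feature transformation-selection
search = GridSearchCV(SVC(kernel="rbf"), {"C": [1, 10, 100]},
                      cv=5, n_jobs=5)                 # 5 cores per grid search execution
```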
2.5. Testing Dataset
To evaluate the framework, we used the Indian Pines [
52] dataset, which is a hyperspectral image capturing a scene from the Indian Pines test site in northwest Tippecanoe County, Indiana, US, covering a 2 × 2 mile portion, including the Purdue University Agronomy farm and its surroundings. Captured by the AVIRIS sensor aboard a NASA aircraft on 12 June 1992, the image comprises 145 × 145 pixels and 224 spectral reflectance bands in the range of 0.4–2.5 µm. Accessible through the Purdue University Research Repository [
53], the dataset has 220 spectral bands after noise removal. It is already calibrated, and the pixel values are proportional to radiance. The ground truth contains 16 classes, predominantly related to agriculture and some to forests and natural perennial vegetation; the scene also features elements like highways, a rail line, housing, and built structures, along with some unlabeled areas.
Figure 5 shows the number of labeled pixels per class and their percentage in the ground truth set.
The dataset was intentionally chosen to be small in volume to align with the primary aim of this research: tackling the ground truth scarcity issue. In real-world applications of remote sensing data, collecting accurate and reliable ground truth information for each potential class on the ground is not only costly and labor-intensive but also often infeasible due to limited accessibility to regions and the temporal variability of on-ground objects. Consequently, ground truth scarcity is an inevitable issue with hyperspectral datasets. Therefore, using the Indian Pines dataset, which encapsulates the challenges of high-dimensional data with limited labeled samples, allowed us to effectively evaluate the proposed framework’s ability to handle these constraints. The inherent complexity of the dataset, due to the variety of classes and the presence of mixed pixels, further tested the robustness and reliability of our AutoML framework in hyperspectral image segmentation tasks.
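For readers reproducing this setup, a hedged loading sketch follows; the .mat file names and variable keys are assumptions (they differ between dataset distributions), while the shapes in the comments reflect the figures above.

```python
# Load the hyperspectral cube and ground truth, then flatten to per-pixel samples.
from scipy.io import loadmat

cube = loadmat("Indian_pines.mat")["indian_pines"]        # (145, 145, 220) reflectance cube
gt = loadmat("Indian_pines_gt.mat")["indian_pines_gt"]    # (145, 145) labels, 0 = unlabeled

X = cube.reshape(-1, cube.shape[-1])   # one row per pixel, one column per spectral band
y = gt.ravel()
mask = y > 0                           # keep only labeled pixels for supervised training
X_labeled, y_labeled = X[mask], y[mask]
```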
4. Discussion
In ML, performance is a trade-off between the accuracy and the time of the process. This stems from the fact that no AI or ML task admits an absolute solution. Specifically, ML adopts statistical learning techniques that rely on error minimization, which never reaches zero when dealing with real-world datasets. The efficiency of an ML model is likewise defined as a compromise between these two factors, accuracy and time, which are restricted by the requirements of the specified application. Accordingly, the primary objective of our proposed framework is to improve hyperspectral image segmentation through feature engineering, which also involves striking a balance between accuracy and inference time. Reduced inference time is particularly crucial for real-time applications, where reducing latency and increasing throughput are of concern.
As previously elucidated, our framework encompasses various models constructed from all conceivable pipeline configurations, thus facilitating a benchmarking setup to assess the impact of feature engineering on hyperspectral image segmentation tasks.
Figure 9 illustrates a comparison of the mean accuracy and mean inference time across different models, thus incorporating or excluding various feature engineering steps. To establish a benchmark for comparing the effect of feature engineering, we categorized the pipeline structures based on state-of-the-art approaches into five reference categories: the end-to-end pipeline or “no feature engineering steps”, “only with feature transformation (FT) step”, “with FT and feature extraction (FE) steps”, “with FT and feature selection (FS) steps”, and “with all feature engineering steps” pipelines.
Figure 9 depicts the mean accuracy and inference time of the best-performing model across all these configuration categories. It is important to note that for clarity, all values are presented relative to the “only with FT step”. Thus, at the “only with FT step”, the mean value for the accuracy and inference time is depicted as zero. Notably, standardization consistently emerged as the optimal feature transformation technique, with PCA being the preferred feature extraction method across all cases. Additionally, RFECV-LR-l1 has been identified as the optimal feature selection technique.
The results reveal a consistent improvement in the prediction performance and a reduction in the inference time with the inclusion of feature engineering steps. Notably, while feature transformation and feature selection significantly enhanced performance, feature extraction had a relatively minor impact. This underscores the efficacy of the proposed feature transformation–selection approach for dimensionality reduction while improving performance. It is suggested that focusing on feature transformation and selection, independent of the core classifier choice, can greatly enhance AutoML performance.
We have also assessed the framework using another hyperspectral dataset from an industrial setup, which has been reported thoroughly in [
54]. This dataset focuses on a specific application: detecting and measuring residual contaminants in the production of washing machine cabinets for zero-defect manufacturing. The choice of this dataset is significant because it not only involves a different sensor and setup but also targets classes with less distinct spectral signatures, thus providing a more stringent evaluation of the framework’s capabilities. Overall, the results are consistent with those reported here: the AutoML framework again produced an optimized, robust model and improved the overall performance. The findings underscore the importance of tailored feature engineering strategies in optimizing model performance and efficiency across diverse datasets and scenarios. The industrial assessment provides a comprehensive and thorough test of the framework’s transferability, thus highlighting its capability to adapt to different contexts and confirming its generalizability beyond similar case studies.
5. Conclusions
In this study, we introduced an AutoML framework tailored for hyperspectral image segmentation, thus highlighting the effectiveness of a classic four-stage ML pipeline structure that integrates feature engineering to address data challenges. Through feature engineering and dimensionality reduction techniques, we mitigated the necessity for a large quantity of labeled data, which is a particular challenge for end-to-end deep learning models. Additionally, the proposed framework not only improved overall performance but also generated models with more accurate predictions within shorter inference times. Through a multilateral optimization approach, the framework ensured model robustness and reliability by mitigating overfitting and bias concerns. The transparency of the framework’s process, allowing access to all models and statistical information, facilitated the validation of the proposed approach’s effectiveness. Overall, the study successfully achieved its objectives in addressing challenges with hyperspectral image-based machine learning solutions.
The hyperspectral-based AutoML framework presented in this study offers a streamlined solution for developing supervised ML models tailored to specific hyperspectral image segmentation tasks. Automating the task-specific feature engineering process simplifies what is typically a complex and expertise-intensive endeavor. Notably, the proposed feature selection component, identified as a key factor in enhancing predictive performance, can pinpoint irrelevant features, thus enabling informed decisions regarding data collection, transmission, and storage efficiency.
In considering future directions for AutoML, it is crucial to prioritize sustainability, efficiency, scalability, and inclusiveness. Addressing the resource-intensive nature and carbon emissions associated with AutoML processes is essential, and efforts should be made to minimize footprints through optimization criteria like overall runtime, energy consumption, and CO2 emissions. Enhancing efficiency remains a key focus, particularly through algorithmic improvements and resource consumption optimization, and there is an opportunity to assess and refine frameworks for better performance. Additionally, scalability and inclusiveness are critical for making AutoML more accessible across different fields and applications. The proposed framework offers problem-specific solutions and can be scaled up independently or integrated into other AutoML frameworks. Future research should explore alternative feature engineering sequences and incorporate multiple techniques within each category to improve dimensionality reduction effectiveness. Evaluating additional ML and image processing methodologies, along with exploring new evaluation metrics and model selection techniques, can provide a more comprehensive analysis of AutoML frameworks. Moreover, extending the evaluation to new applications will help establish the generalizability of the proposed approach across diverse hyperspectral datasets and applications.