Article

Exploring the Potential of PRISMA Satellite Hyperspectral Image for Estimating Soil Organic Carbon in Marvdasht Region, Southern Iran

by Mehdi Golkar Amoli 1, Mahdi Hasanlou 1,*, Ruhollah Taghizadeh Mehrjardi 2 and Farhad Samadzadegan 1
1 School of Surveying and Geospatial Engineering, College of Engineering, University of Tehran, Tehran 14174-66191, Iran
2 Faculty of Agriculture, Ardakan University, Ardakan 89516-56767, Iran
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(12), 2149; https://doi.org/10.3390/rs16122149
Submission received: 23 April 2024 / Revised: 8 June 2024 / Accepted: 10 June 2024 / Published: 13 June 2024

Abstract

Soil organic carbon (SOC) is a crucial factor for soil fertility, directly impacting agricultural yields and ensuring food security. In recent years, remote sensing (RS) technology has been highly recommended as an efficient tool for producing SOC maps. The PRISMA hyperspectral satellite was used in this research to predict the SOC map in Fars province, located in southern Iran. The main purpose of this research is to investigate the capabilities of the PRISMA satellite in estimating SOC and examine hyperspectral processing techniques for improving SOC estimation accuracy. To this end, denoising methods and a feature generation strategy have been used. For denoising, three distinct algorithms were employed over the PRISMA image, including Savitzky–Golay + first-order derivative (SG + FOD), VisuShrink, and total variation (TV), and their impact on SOC estimation was compared in four different methods: Method One (reflectance bands without denoising, shown as M#1), Method Two (denoised with SG + FOD, shown as M#2), Method Three (denoised with VisuShrink, shown as M#3), and Method Four (denoised with TV, shown as M#4). Based on the results, the best denoising algorithm was TV (Method Four or M#4), which increased the estimation accuracy by about 27% (from 40% to 67%). After TV, the VisuShrink and SG + FOD algorithms improved the accuracy by about 23% and 18%, respectively. In addition to denoising, a new feature generation strategy was proposed to enhance accuracy further. This strategy comprised two main steps: first, estimating the number of endmembers using the Harsanyi–Farrand–Chang (HFC) algorithm, and second, employing Principal Component Analysis (PCA) and Independent Component Analysis (ICA) transformations to generate high-level features based on the estimated number of endmembers from the HFC algorithm. The feature generation strategy was unfolded in three scenarios to compare the ability of PCA and ICA transformation features: Scenario One (without adding any extra features, shown as S#1), Scenario Two (incorporating PCA features, shown as S#2), and Scenario Three (incorporating ICA features, shown as S#3). Each of these three scenarios was repeated for each denoising method (M#1–4). After feature generation, high-level features were added to the outputs of Methods One, Three, and Four. Subsequently, three machine learning algorithms (LightGBM, GBRT, RF) were employed for SOC modeling. The results showcased the highest accuracy when features obtained from PCA transformation were added to the results from the TV algorithm (Method Four—Scenario Two or M#4–S#2), yielding an R2 of 81.74%. Overall, denoising and feature generation methods significantly enhanced SOC estimation accuracy, escalating it from approximately 40% (M#1–S#1) to 82% (M#4–S#2). This underscores the remarkable potential of hyperspectral sensors in SOC studies.

Graphical Abstract

1. Introduction

Soil organic matter (SOM), commonly referred to as the amount of organic matter present in the soil, is one of the most crucial parameters that determine soil health [1,2]. A primary constituent of SOM is carbon, which typically accounts for approximately 55% of its composition. Soil organic carbon (SOC) plays a key role in soil fertility and plant growth [3]. Higher SOC levels improve nutrient absorption and water retention and reduce soil erosion [4,5,6]. Given the significance and benefits of SOC, there is a pressing need for a more functional method for rapid and accurate SOC estimation and monitoring [7]. Traditional laboratory methods such as Walkley–Black provide accurate results but are time-consuming and costly. As a result, employing these methods on a regional or national scale to map the spatial distribution of SOC becomes impractical. On the other hand, the development of various remote sensing (RS) sensors, especially multispectral and hyperspectral (HS) sensors, offers an efficient way to estimate the SOC concentration from RS images. Castaldi et al. employed Sentinel-2 time series data to accurately identify bare soil and thereby enhance the estimation of SOC [8]. Zhang et al. utilized a time series of Landsat-8 imagery to estimate SOC levels in the Jianghan Plain in China [9]. Lin et al. fused Sentinel-2A and Sentinel-3A images using the Gram–Schmidt Spectral Sharpening method to combine the benefits of high spectral and high spatial resolution for SOM estimation [10]. However, because of the complexity of the carbon cycle and the narrow absorption features in the spectral signature of SOC, multispectral sensors, whose wide bandwidths cannot resolve these absorption ranges, cannot be expected to deliver accurate SOC estimates on their own.
Using HS data is an effective solution for estimating SOC exclusively from RS data. During the last two decades, the use of HS sensors in RS has been greatly appreciated [11]. Among them, the Hyperion sensor is known as one of the first HS instruments of the 21st century. Deployed aboard the NASA EO-1 satellite, it captured images across 220 spectral bands (357–2576 nm) from 2000 to 2017, providing invaluable data for RS applications [12]. While Hyperion kindled great hopes in RS studies, it also grappled with significant challenges. A key issue was its low signal-to-noise ratio (SNR) in the short-wave infrared region, dropping below 40 above 2100 nm and below 100 above 1225 nm [13]. This led to considerable noise interference, affecting many of the narrow absorption points of carbon and resulting in uncertainty [14]. Fortunately, recent years have seen the launch of new-generation HS satellites aimed at addressing Hyperion’s limitations. In 2022, the German hyperspectral satellite EnMAP (Environmental Mapping and Analysis Program) was launched by DLR (German Aerospace Center), with 228 bands spanning from 420 to 2540 nm. Similarly, in 2019, Japan’s Ministry of Economy launched its HISUI (Hyperspectral Imager Suite) mission, boasting 185 bands and a signal-to-noise ratio of about 400. In the same year, ASI (Agenzia Spaziale Italiana) also launched the PRISMA satellite, equipped with two instruments: a hyperspectral camera module and a panchromatic camera module [15]. The hyperspectral camera captures images in 239 bands from the VNIR to the SWIR (400–2505 nm) with a spatial resolution of 30 m [16,17]. Therefore, in this study, we further investigate the capabilities of the PRISMA satellite in estimating SOC.
Over the past two decades, many studies have focused on soil parameters using HS data. Peón et al. utilized an airborne hyperspectral scanner and Hyperion satellite imagery to estimate topsoil organic carbon [18]. Mzid et al. compared various satellite images, such as PRISMA, Sentinel-2, and Landsat-8, for estimating soil parameters like organic carbon and clay content [19]. PRISMA images were employed to map soil nutrient parameters (SOM, P2O5, K2O); ultimately, the researchers achieved an accuracy of R2 = 69% for SOM prediction [20]. Additionally, Castaldi et al. used simulated spectra of PRISMA, EnMAP, and HyspIRI to estimate soil variables and demonstrated their superiority over multispectral sensors and previous generations of HS sensors [14]. Angelopoulou et al. used PRISMA and HySpex images to estimate SOC in crop fields across Northern Greece; in the best scenario with PRISMA images, they achieved an R2 value of 76% for SOC estimation [21]. Ou et al. proposed a Kubelka–Munk (K-M) theory-based spectral correction model to remove the influence of soil moisture from airborne hyperspectral data and thereby improve the sensitivity to SOM for SOM inversion [22]. However, most of these studies pay little attention to hyperspectral image processing techniques. An essential step in HS image processing is noise removal, which is necessary given the high spectral resolution of these sensors [23,24]. Maintaining the SNR is essential, especially because of the narrow absorption points of SOC in the SWIR region, which necessitates a denoising process. In most studies that used HS data for SOC estimation, researchers have employed techniques like the Savitzky–Golay (SG) filter, Principal Component Analysis (PCA) transformation, and wavelet decomposition for denoising purposes [21,25,26]. In some studies, however, denoising methods were not utilized, and researchers relied solely on transformations like the logarithm, inverse logarithm, first-order derivative (FOD), or continuum removal to enhance the visibility of absorption features. In our research, we applied three distinct denoising methods to the PRISMA image: (1) Savitzky–Golay combined with the first-order derivative (SG + FOD), (2) wavelet shrinkage (VisuShrink), and (3) standard total variation (TV). Among these three methods, SG + FOD is a common, well-established approach for denoising hyperspectral signals in digital soil mapping studies. VisuShrink and TV, in contrast, are basic and well-known image denoising methods that have not yet been applied to SOC estimation with HS satellite images, so it is necessary to examine and compare their performance against conventional denoising methods. Wavelet shrinkage is a method initially designed for signal denoising, capitalizing on the sparse representation of signals in the wavelet domain. In this method, the coefficients of the signal (image) at different levels of decomposition are examined, and using a thresholding strategy, coefficients likely to represent noise are eliminated [27]. The main point in wavelet shrinkage methods is selecting the optimal threshold value, and many methods have been developed for this purpose. One of the most basic and practical methods is VisuShrink, which introduces a global threshold for all wavelet decomposition levels. This threshold aims to minimize the maximum mean squared error over a wide range of possible signals and noise levels [28]. Another commonly employed method for threshold estimation is SureShrink.
This method offers an adaptive, data-driven threshold for each wavelet decomposition level, aiming to minimize the Stein Unbiased Risk Estimate (SURE) value, an unbiased estimate of the MSE between the denoised signal and the original signal [24,29]. Besides wavelet shrinkage-VisuShrink, we utilize the standard TV method for denoising. This method is straightforward yet effective in denoising images by minimizing image variations while preserving important details like textures. The denoising process focuses on reducing the TV of the noisy image while ensuring the denoised version remains close to the original [30]. Due to the capabilities of the TV algorithm, it could be expected that this algorithm will perform well in denoising HS images. However, it is unclear whether traditional denoising methods outperform TV in this specific task. No studies have compared them for SOC estimation yet.
In addition to denoising, feature generation is a crucial part of working with hyperspectral data [31]. In this study, a two-step algorithm is used for feature generation: first, the number of endmembers in the PRISMA image is estimated using the Harsanyi–Farrand–Chang (HFC) method [32]; then, the best independent components (ICs) and principal components (PCs) are selected based on the estimated number of endmembers. After generating the ICs and PCs, three different scenarios are defined for each of the three applicable methods (the two denoising methods, TV and VisuShrink, plus the original bands, i.e., M#1, M#3, and M#4). The first scenario uses only the denoised reflectance bands or original reflectance bands (184 bands) for estimating SOC. The second scenario combines the denoised reflectance bands with the 10 generated PCs (184 bands + 10 PCs). The third scenario combines the denoised reflectance bands with the 10 generated ICs (184 bands + 10 ICs). Combining the endmember-number estimation method with the PCA/ICA algorithms makes the feature generation process more interpretable. Studies estimating SOC from hyperspectral RS should pay attention to this subject, yet little attention has been paid to interpretable feature generation in past studies.
Ultimately, the effectiveness of various denoising methods and feature generation approaches was evaluated using three ML algorithms: Random Forest (RF), Gradient-Boosting Regression Trees (GBRTs), and LightGBM over four methods (M#1–4 for denoising algorithms evaluation) and three scenarios (S#1–3 for feature generation strategy evaluation).
This study aims to explore the potential of PRISMA images in SOC estimation. Although this tool has been minimally used in soil studies, our research endeavors to introduce more specialized techniques for processing HS data for SOC estimation. To achieve this goal, three denoising methods and a feature generation strategy are employed to improve SOC estimation accuracy. In general, the main purposes of this research are the following: (1) To assess the capability of PRISMA images in SOC estimation. (2) To investigate the effects of different denoising methods on PRISMA images and their influence on SOC estimation accuracy. (3) To evaluate the effectiveness of feature generation methods based on PCA and ICA transformations in SOC estimation.

2. Materials and Methods

Figure 1 displays a flowchart outlining our proposed methods for estimating SOC. Each box is color-coded according to its role. Here is a breakdown of our approach: The orange box involves data preparation, such as removing absorption bands, denoising the PRISMA image, and gathering SOC ground data. The green box is for generating features, including creating principal components (PCs) and independent components (ICs). The black box represents various methods and scenarios. The blue box represents the ML pipeline. The red box is for outputs and results. More details will be explained in further sections.

2.1. Study Area and Soil Sampling

The area under consideration in this study lies in the southern part of Iran, in Fars province near Marvdasht, covering approximately 530 square kilometers between 52°41′35.82″ and 52°57′1.07″ east longitude and 29°48′35.02″ and 30°2′14.72″ north latitude. Based on Mahler’s classification, the area comprises three primary physiographic units: mountains, piedmont plains, and alluvial plains. The soil moisture and temperature regimes in this region are identified as xeric and thermic, respectively, and the soils are categorized into two orders: Entisols and Inceptisols. The dominant land use in the study area is irrigated agriculture, with crops such as wheat, alfalfa, and canola. The average annual rainfall for the main area and its extension is 287.63 mm, with an average annual temperature of around 17.80 degrees Celsius. Additionally, the area’s average elevation stands at approximately 1600 m above sea level, with an average slope of around 5.0 degrees. The conditioned Latin hypercube sampling method was employed to determine the locations of 123 sampling points within the area (Figure 2). These samples were transported to a laboratory, air-dried, and sifted through a 2 mm sieve, and the SOC content was measured using the Walkley–Black method. To improve the effectiveness of the ML models, especially given the scarcity of ground data, artificial points were generated around each sample in the four cardinal directions, at a distance of 30 m from the central point. These additional points retained the SOC value of the central point but exhibited different RS features. This procedure represents a simple form of data augmentation.
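As a rough illustration of this augmentation step, the Python sketch below offsets each sampling location by 30 m in the four cardinal directions in a projected coordinate system and copies its SOC value to the new points; the function name and array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def augment_samples(coords, soc, spacing=30.0):
    """Add four artificial points (N, S, E, W) at `spacing` metres around each
    sample; every new point keeps the SOC value of its central point."""
    offsets = np.array([[0.0, spacing], [0.0, -spacing], [spacing, 0.0], [-spacing, 0.0]])
    aug_coords, aug_soc = [coords], [soc]
    for dx, dy in offsets:
        aug_coords.append(coords + np.array([dx, dy]))  # shifted copies of all points
        aug_soc.append(soc)                             # identical SOC labels
    return np.vstack(aug_coords), np.concatenate(aug_soc)
```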

2.2. PRISMA Hyperspectral Data

PRISMA, an Italian Space Agency (ASI) hyperspectral satellite, was launched on 22 March 2019. Positioned in a sun-synchronous orbit 615 km above the Earth, it gathers data across 239 spectral bands ranging from 400 to 2505 nm (Table 1). Equipped with both hyperspectral and panchromatic sensors, PRISMA employs the push-broom technique to scan the Earth’s surface [15]. The HS sensor encompasses two prism spectrometers, one in the VNIR and another in the SWIR range [15,16]. Consequently, it captures 66 bands in the VNIR and 173 bands in the SWIR with a spectral resolution of less than 12 nm and a spatial resolution of 30.0 m. A panchromatic camera also captures images from 400 to 700 nm with a spatial resolution of 5 m. Moreover, PRISMA has a temporal resolution of 29 days but offers a 7-day revisit capability using roll maneuvers [16,33]. This study utilized a PRISMA scene acquired on 9 July 2020, at 9:27 UTC. The image was of type L2D, indicating that it had undergone radiometric and geometric corrections such as atmospheric and viewing-geometry correction. However, an important consideration is atmospheric absorption, whereby approximately 16% of electromagnetic radiation is absorbed by atmospheric gases and molecules [34]. The main absorbers in the atmosphere are H2O, O3, and CO2, with H2O and CO2 molecules being particularly notable for their numerous absorption points in the 0.9- to 2.7-micrometer range. Because of this issue and the high spectral resolution of PRISMA, certain bands are situated within the absorption regions of these gases, rendering them unable to record usable data. In this study, this problem was addressed by removing bands 1 to 6, 107 to 125, 149 to 169, and 238 to 239. Consequently, we worked with the remaining 184 bands (Figure 3).

2.3. Denoising PRISMA Image

From a radiometric perspective, there are two primary sources of error in HS data. The first is the atmosphere, which introduces effects such as scattering and absorption; these atmospheric effects are addressed through atmospheric correction [35]. The second source is noise created during the imaging operation, which is the main factor in reducing the SNR [24]. The main types of noise are thermal noise and quantization noise. Thermal noise is inherent in all electronic circuits and devices due to the random thermal motion of electrons within conductive materials [36] and has a direct relationship with the detector’s temperature [37]. Quantization noise arises during the analog-to-digital conversion (ADC) process, where the continuous analog signal from the detector is converted into a discrete digital representation [38]; it is strictly related to the bit depth of the ADC. Examining the effect of these noise sources on HS data and minimizing their impact through denoising is one of the important pre-processing steps when working with HS data. This process directly affects the SNR and, as a result, increases the accuracy of subsequent processing steps [39]. In this research, we used three distinct denoising methods, explained in the following sections.

2.3.1. Savitzky–Golay and First-Order Derivative

Savitzky–Golay (SG) is a finite impulse response filter that is widely used in signal and image processing for denoising and smoothing data [40]. The filter operates by fitting a polynomial of a chosen degree to a moving window of data points (Equation (1)) and then uses the coefficients of this polynomial to estimate the smoothed value at the center of the window. The SG filter has two main parameters: the degree of the polynomial (m) and the length of the window (n, with n ≥ m + 1). The SG filter uses least-squares estimation to compute the polynomial coefficients such that the difference between the original data and the estimated data is minimized [41].
$$S_k = \sum_{i=0}^{m} C_i \, y_{k + i - (n-1)/2} \tag{1}$$
In Equation (1), $S_k$ is the smoothed value at point k, $C_i$ are the polynomial coefficients, and $y_{k+i-(n-1)/2}$ are the data points within the window centered at point k. To determine the coefficients $C_i$, the SG filter uses least-squares estimation of Y as a polynomial of degree m [41,42]. The coefficients $C_0, C_1, \dots$ are obtained by solving the normal equations (bold $\mathbf{C}$ denotes a vector, bold $\mathbf{J}$ denotes a matrix) (Equations (2) and (3)).
$$Y = C_0 + C_1 X + C_2 X^2 + \dots + C_m X^m \tag{2}$$
$$\mathbf{C} = (\mathbf{J}^T \mathbf{J})^{-1} \mathbf{J}^T \mathbf{Y} \tag{3}$$
In Equation (3), $\mathbf{J}$ is the coefficient matrix whose i-th row has the values $1, x_i, x_i^2, \dots$ In this study, we chose the degree of the polynomial and the window length based on their performance in SOC estimation; a polynomial degree of 3 and a window length of 11 yielded the best results. Additionally, a common technique to highlight absorption points in spectral signatures after denoising is computing the first-order derivative (FOD). Following denoising with the SG method, we applied the FOD to the PRISMA spectra at the locations corresponding to the SOC data points (Figure 4). In short, we first denoise the image with the SG filter and then compute the FOD of the spectral signature at each SOC ground-truth point.
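As a minimal sketch of this SG + FOD step, the snippet below smooths a single extracted PRISMA signature with SciPy’s Savitzky–Golay filter (window length 11, polynomial degree 3, as reported above) and then takes its first-order derivative along the band axis; the function name is illustrative, not from the paper.

```python
import numpy as np
from scipy.signal import savgol_filter

def sg_fod(spectrum, window_length=11, polyorder=3):
    """Savitzky-Golay smoothing of one spectral signature followed by its
    first-order derivative (SG + FOD)."""
    smoothed = savgol_filter(spectrum, window_length=window_length, polyorder=polyorder)
    return np.gradient(smoothed)  # first-order derivative across bands
```

Equivalently, passing deriv=1 to savgol_filter returns the derivative of the fitted polynomial directly, combining both operations in one call.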

2.3.2. Wavelet Shrinkage (VisuShrink)

A commonly used method for denoising is based on wavelet shrinkage techniques. The main process in wavelet shrinkage involves applying a transformation function to the wavelet coefficients using a specific threshold [28,42]. The transformation function tries to remove coefficients related to noise to a large extent. Finally, the denoised image or signal is reconstructed through the inverse wavelet transform. In this research, 2D VisuShrink was employed for denoising the PRISMA image (Figure 5). This method uses a universal threshold, aiming to minimize the effect of noise on the wavelet coefficients of the image (Equation (4)).
y = w + n
In this context, let us denote w = (w1, w2) as the wavelet coefficients of the denoised image, y = (y1, y2) as the wavelet coefficients of the noisy image, and n = (n1, n2) as the wavelet coefficients of the white Gaussian noise. It is assumed that ŵ is the estimated value of w, where some of its coefficients are equivalent to y, while the rest are set to zero (Equation (5)) [42].
$$\hat{w}_k = \begin{cases} y_k, & k \in A \\ 0, & k \notin A \end{cases} \tag{5}$$
After calculating $\hat{w}$, we can compute the expected value $E\big((\hat{w}_k - w_k)^2\big)$ for $k \in A$ (Equation (6)) [42]:
$$E\big((\hat{w}_k - w_k)^2\big) = E\big((y_k - w_k)^2\big) = E\big((w_k + n_k - w_k)^2\big) = E\big(n_k^2\big) = \sigma_n^2 \tag{6}$$
If $k \notin A$, we have $\hat{w}_k = 0$, and as a result (Equation (7)):
$$E\big((\hat{w}_k - w_k)^2\big) = E\big((0 - w_k)^2\big) = E\big(w_k^2\big) = w_k^2 \tag{7}$$
Based on Equations (6) and (7), we have (Equations (8) and (9)):
$$f_k = \begin{cases} \sigma_n^2, & k \in A \\ w_k^2, & k \notin A \end{cases} \tag{8}$$
$$E\big(\|\hat{w} - w\|^2\big) = \sum_{k=1}^{N} f_k \tag{9}$$
Based on the equations above, set A comprises all indices k for which $w_k^2 > \sigma_n^2$. In the contrary case, where the coefficient values are lower than the noise level, the coefficients become unrecoverable [42]. Therefore, to determine set A, it is necessary to know the value of $\sigma_n^2$. However, the exact value of $\sigma_n^2$ is typically unknown in practice [42], so it must be estimated. Hampel et al. showed that the median absolute deviation (MAD) converges to 0.6745σn as the sample size goes to infinity (Equation (10)) [43].
$$\sigma_n = \frac{\mathrm{MAD}(|y_{1i}|)}{0.6745}, \qquad y_{1i} \in HH_1 \tag{10}$$
The VisuShrink method employs a universal threshold, which is applied to all wavelet decomposition level coefficients. This threshold is carefully selected to minimize the maximum value of the estimation error and constraint, as described in (Equation (11)). Upon solving (Equation (11)), the value of the universal threshold (λ) is calculated according to (Equation (12)):
$$E\big(\|\hat{w} - w\|^2\big) \le (2 \ln N + 1)\left(\sigma_n^2 + \sum_{k=1}^{N} \min\big(w_k^2, \sigma_n^2\big)\right) \tag{11}$$
$$\lambda = \sigma_n \sqrt{2 \ln N} \tag{12}$$
In (Equation (12)), N is equal to the data length. After calculating λ, the wavelet coefficients are adjusted using the soft threshold function (Equation (13)) [42,44].
$$T_\lambda(w_k) = \mathrm{sign}(w_k) \cdot \max\big(|w_k| - \lambda, \ 0\big) \tag{13}$$
Here, $w_k$ represents a wavelet coefficient, λ denotes the threshold value, and $T_\lambda(w_k)$ is the resulting coefficient after thresholding. Beyond the threshold itself, a key choice in wavelet shrinkage-based methods is the selection of the wavelet mother function and the number of decomposition levels in the wavelet transform. In this study, to determine these parameters, a variety of mother functions across different levels of decomposition (ranging from 2 to 5) were applied to denoise the PRISMA image (Table 2), and their performance was assessed by comparing the resulting SOC estimation accuracy. Based on the results, the db1 (Daubechies) mother function combined with a decomposition level of 3 yielded the highest accuracy.
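For concreteness, a minimal band-wise VisuShrink sketch using PyWavelets is shown below: it estimates σn from the finest diagonal detail subband via the MAD rule (Equation (10)), applies the universal threshold of Equation (12) with soft thresholding (Equation (13)), and reconstructs the band. The db1 wavelet and three decomposition levels follow the settings reported above; the function name and looping over bands separately are illustrative assumptions.

```python
import numpy as np
import pywt

def visushrink_band(band, wavelet="db1", level=3):
    """Denoise a single PRISMA band with VisuShrink (universal threshold,
    soft thresholding of all detail coefficients)."""
    coeffs = pywt.wavedec2(band, wavelet=wavelet, level=level)
    # Noise estimate from the finest diagonal detail subband (HH1), Equation (10)
    sigma = np.median(np.abs(coeffs[-1][-1])) / 0.6745
    lam = sigma * np.sqrt(2.0 * np.log(band.size))  # universal threshold, Equation (12)
    denoised = [coeffs[0]] + [
        tuple(pywt.threshold(d, lam, mode="soft") for d in detail)  # Equation (13)
        for detail in coeffs[1:]
    ]
    rec = pywt.waverec2(denoised, wavelet=wavelet)
    return rec[: band.shape[0], : band.shape[1]]  # trim possible reconstruction padding
```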

2.3.3. Standard Total Variation

Total variation (TV) is a basic image denoising method that aims to reduce image noise while preserving crucial features such as textures and edges [45,46]. The main idea can be modeled as an optimization problem that minimizes the image’s overall variation (Equations (14)–(16)):
f = u + n
$$\arg\min_u \; \|f - u\|_2^2 + \lambda \, \mathrm{TV}_\Omega(u) \tag{15}$$
$$\mathrm{TV}_\Omega(u) = \sum_{i \in \Omega} \sqrt{(\nabla_i^h u)^2 + (\nabla_i^v u)^2} \tag{16}$$
where f is the noisy image, u is the original clean image, n is additive white Gaussian noise, Ω is the image domain, $\nabla_i^h u$ is the horizontal gradient of the image at point i, and $\nabla_i^v u$ is the vertical gradient at point i. In Equation (15), λ is a positive regularization parameter that controls the trade-off between fidelity to the observed image (first term) and the smoothing term (second term) [47]. The fidelity term of the TV optimization problem is convex and differentiable; however, the non-differentiability of the regularization term hinders conventional gradient-based solutions. As a result, various methods have emerged to address this issue, among which the Chambolle projection algorithm stands out as one approach [47,48]. The Chambolle algorithm, by adding a new variable (p), separates the optimization problem into a differentiable and a non-differentiable part (Equation (17)). The algorithm operates iteratively by calculating the value of p using the Chambolle projection (Equation (18)) and then updating the value of u through a linear equation (Equation (19)). This iterative process continues until convergence, signified by the difference between successive denoised images falling below a predetermined threshold.
$$\min_p \; \left\| \mathrm{div}\, p - \frac{f}{\lambda} \right\|^2 \quad \text{s.t.} \quad \|p_{ij}\|^2 - 1 \le 0 \tag{17}$$
$$p_{ij}^{\,n+1} = \frac{p_{ij}^{\,n} + \tau \left( \nabla\!\left( \mathrm{div}\, p^{\,n} - f/\lambda \right) \right)_{ij}}{1 + \tau \left| \left( \nabla\!\left( \mathrm{div}\, p^{\,n} - f/\lambda \right) \right)_{ij} \right|} \tag{18}$$
$$u = f - \lambda \, \mathrm{div}\, p^{\,n+1} \tag{19}$$
where div is the divergence operator, τ is the step-size parameter, $u^0 = f$, and $p^0 = 0$. It is important to note that our study does not aim to explore the Chambolle projection algorithm in depth; for further details, interested readers are referred to the relevant sources [49]. Beyond the formulation itself, the crucial point in using the TV algorithm is the correct selection of the regularization parameter (λ), which plays a significant role in denoising. Therefore, similar to Section 2.3.2, λ was selected based on the SOC estimation accuracy obtained with different values, using the LightGBM algorithm for estimation. The results indicate that the optimal performance is achieved with λ = 0.1 (Figure 6). Consequently, we apply the total variation algorithm with λ = 0.1 for denoising the PRISMA HS data throughout the remainder of this study (referred to hereafter as TV-0.1).
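As a hedged sketch of this step, the snippet below applies scikit-image’s Chambolle TV denoiser band by band to the hyperspectral cube, mirroring the band-wise use of the standard TV algorithm described above; note that skimage’s weight parameter plays the role of the regularization parameter, although its exact scaling may not coincide with the paper’s λ, and the function name and cube layout are illustrative.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle

def tv_denoise_cube(cube, weight=0.1):
    """Band-wise Chambolle TV denoising of a (rows, cols, bands) cube."""
    return np.stack(
        [denoise_tv_chambolle(cube[:, :, b], weight=weight) for b in range(cube.shape[2])],
        axis=2,
    )
```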

2.4. Feature Generation

Feature generation/extraction serves as a valuable post-processing technique in HS image analysis. Its appeal lies in the capability of feature extraction methods to generate high-level features, thus enhancing the accuracy of subsequent processing, whether it involves classification or regression [31,50,51]. In this study, after applying the denoising methods to the PRISMA image, a new feature generation strategy was employed to extract high-level features that enhance the SOC estimation model. Because of the cost and time constraints of SOC data collection, our research focuses on an unsupervised feature extraction method, which allows us to investigate its performance without being hindered by the lack of training data. The proposed feature generation method comprises two steps: first, estimating the number of endmembers, and second, applying PCA and ICA transformations to the PRISMA image. In the first step, we use the HFC (Harsanyi–Farrand–Chang) method to determine the number of endmembers in the denoised image [32]. Endmembers represent the pure materials in the HS image, encompassing natural elements like vegetation and certain minerals, or man-made structures such as buildings [52,53]. The HFC algorithm indicates that the PRISMA image contains ten endmembers, i.e., ten distinct materials within the study area, which aligns reasonably well with our prior knowledge of the region. In the second step, we apply PCA and ICA transformations to the denoised image and, according to the number of endmembers, select the PCs with the largest variance and the ICs with the greatest negentropy. Overall, the ten selected PCs and the ten selected ICs are added separately to the hyperspectral reflectance bands (184 bands), each creating a different scenario in the training process, as explained in more detail in Section 2.5.
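A minimal sketch of the PCA/ICA feature generation step is given below using scikit-learn; the HFC endmember count is not re-implemented and is simply taken as the value of ten reported above, the ICs are extracted with FastICA without the additional negentropy ranking described in the text, and the function name and cube layout are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

def generate_features(cube, n_endmembers=10):
    """Append n_endmembers PCs and ICs to a denoised (rows, cols, 184) cube,
    returning the two feature stacks used in Scenarios Two and Three."""
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands)
    pcs = PCA(n_components=n_endmembers).fit_transform(X)           # highest-variance PCs
    ics = FastICA(n_components=n_endmembers, max_iter=1000,
                  random_state=0).fit_transform(X)                  # independent components
    with_pcs = np.hstack([X, pcs]).reshape(rows, cols, bands + n_endmembers)
    with_ics = np.hstack([X, ics]).reshape(rows, cols, bands + n_endmembers)
    return with_pcs, with_ics
```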

2.5. Different Methods and Scenarios

This study applies two main HS image processing techniques to the PRISMA satellite image. The first technique includes the three denoising methods (SG + FOD, VisuShrink, and TV), and the second involves feature generation. An important point to note is that all three denoising methods (SG, TV, and VisuShrink) are applied to the image, and their output is an image. However, we have specifically used the FOD operator to highlight changes in spectral signatures. When the derivative is used as a two-dimensional operator on the image, it conveys information about edges, which is not relevant to our study’s objective. For this reason, the FOD was applied only to the spectra of the points where SOC ground data were measured. Therefore, the output of the SG + FOD method is in the form of a signal that cannot be used for calculating the PCA and ICA transformations in the feature generation step. Consequently, we do not use the output of the SG + FOD method (M#2) in the feature generation step.
To provide a structured comparison of the denoising algorithms and assess their effectiveness, the findings are presented through four distinct methods (Table 3). Method One serves as a baseline, devoid of any denoising algorithm and relying solely on the initial reflectance band values (denoted M#1). In contrast, Method Two employs the SG + FOD algorithm for denoising (M#2), Method Three utilizes the VisuShrink algorithm (M#3), and Method Four uses the TV algorithm (M#4). Assessing the feature generation strategy is slightly different: three scenarios are designed to evaluate feature generation within the applicable methods (Table 3). In the first scenario, only the reflectance bands (after removing the absorption bands) are used, without any additional features (184 bands). In the second scenario, the 10 PCs created in the feature generation step are added to the reflectance bands, and SOC estimation is carried out with this set (184 bands + 10 PCs). In the third scenario, SOC estimation is carried out after adding the 10 ICs created in the feature generation step (184 bands + 10 ICs).

2.6. Machine Learning

After applying the preprocessing steps and creating the feature sets, SOC is modeled using ML algorithms. This research uses three ML algorithms from the ensemble learning family: RF, GBRT, and LightGBM. RF is one of the oldest and most widely used ML algorithms and is based on bootstrap aggregation (bagging) of weak learners. GBRT and LightGBM are newer than RF and use boosting, in which weak learners are combined sequentially so that each new learner seeks to reduce the errors of the previous one, increasing the final accuracy [54,55]. Ultimately, all of these algorithms are applied in all methods and scenarios, and their performance is evaluated.
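For illustration, the three regressors can be instantiated as follows with scikit-learn and the LightGBM package; the specific hyperparameter values shown are placeholders, since the paper tunes the learning rate and number of estimators by grid search (Section 3).

```python
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from lightgbm import LGBMRegressor

# Placeholder settings; learning_rate and n_estimators are tuned later by grid search.
models = {
    "RF": RandomForestRegressor(n_estimators=500, random_state=0),
    "GBRT": GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, random_state=0),
    "LightGBM": LGBMRegressor(n_estimators=500, learning_rate=0.05, random_state=0),
}
```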

2.7. Model Evaluation

After training an ML model, the crucial step is evaluating its performance on test data. This step is essential because, once the training process is finished, a well-trained model should provide accurate results for a variety of inputs. One common approach is to split the data into training and testing sets. Although widely used, this method has drawbacks: since the split is stochastic, there is a risk of uneven distribution, potentially leading to overfitting or underfitting and adversely affecting the assessment of the model’s performance. To mitigate these challenges and ensure the model’s reliability, cross-validation is recommended. Cross-validation divides the dataset into multiple groups or folds, which are then used iteratively for training and testing, providing a more comprehensive and robust evaluation of the model’s performance. In this research, we employed ten-fold cross-validation, where the dataset was split into ten equal folds. The model’s performance is evaluated using four metrics: the coefficient of determination (R2), root mean squared error (RMSE), ratio of performance to interquartile distance (RPIQ), and mean absolute error (MAE). The R2 value reflects how effectively the independent variables explain the variation in the dependent variable, indicating the goodness of fit on a scale from 0 to 1: a value closer to 1 indicates a higher proportion of explained variance, whereas a value closer to 0 suggests the model explains the variance poorly and may not be suitable. It is essential to mention that each metric is reported as the average over the ten cross-validation folds, ensuring a thorough assessment of the model’s performance. These statistical indicators are calculated using Equations (20)–(23):
$$\mathrm{MAE} = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} \tag{20}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{21}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y}_i)^2} \tag{22}$$
$$\mathrm{RPIQ} = \frac{IQ_y}{\mathrm{RMSE}_P} \tag{23}$$
where $y_i$ and $\hat{y}_i$ denote the observed and predicted SOC values, $IQ_y$ is the interquartile distance of the observed SOC values, $\mathrm{RMSE}_P$ is the prediction RMSE, n denotes the number of soil samples used for validation, and $\bar{y}_i$ denotes the mean of the observed SOC values.
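The snippet below sketches one way to obtain these four metrics under ten-fold cross-validation with scikit-learn; pooling the out-of-fold predictions before computing the scores is an assumption on our part (fold-wise averaging, as described above, is an equally valid alternative), and the function name is illustrative.

```python
import numpy as np
from scipy.stats import iqr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict

def evaluate(model, X, y, n_splits=10):
    """MAE, RMSE, R2 and RPIQ (Equations (20)-(23)) from 10-fold CV predictions."""
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    y_pred = cross_val_predict(model, X, y, cv=cv)
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    return {
        "MAE": mean_absolute_error(y, y_pred),
        "RMSE": rmse,
        "R2": r2_score(y, y_pred),
        "RPIQ": iqr(y) / rmse,  # interquartile distance of observed SOC over RMSE
    }
```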

3. Results

3.1. General Statistics of SOC Data

Table 4 displays the overall statistics of the SOC data gathered. Based on the samples collected, the average SOC content in the region is 1.03%, with a standard deviation of approximately 0.35%. This variance may stem from variations in cultivation type across different farms. Additionally, the highest and lowest SOC levels recorded in the region are 2.2% and 0.13%, respectively.

3.2. Different Methods’ Results

3.2.1. Method One (M#1)

In Method One (M#1), no denoising technique was utilized, and only the absorption bands were eliminated. In the first scenario of this method, no extra features were incorporated; essentially, this scenario shows the accuracy of the PRISMA image without any denoising or feature generation, standing at an R2 of 40.1%. In the second scenario, where the 10 PCs extracted by the feature generation strategy were added, the accuracy improved by 22%. Similarly, in the third scenario, involving the 10 ICs, the accuracy showed an average increase of 20.4% across all three ML algorithms, and the RF algorithm reduced the MAE by about 0.03% (from 0.20% to 0.17%) when comparing the third scenario to the first. The GBRT algorithm showed a similar improvement, with the MAE decreasing from 0.19% in the first scenario to 0.15% in the third, which is a promising result. Furthermore, within this method, the LightGBM algorithm attained the highest accuracy, reaching an R2 of 65.05% in the second scenario (M#1–S#2) (Table 5, Figure 7 and Figure 8). Additionally, among the three algorithms, both GBRT and LightGBM performed better in the second scenario than in the third.
To achieve maximum accuracy across all three ML algorithms, a hyperparameter tuning process should be applied. In this research, we utilized the grid search method for hyperparameter tuning (Figure 9).
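A minimal grid search sketch with scikit-learn is shown below; the parameter ranges are illustrative placeholders, as the exact grids used in the paper are not listed here, and X_train / y_train stand for the feature set and SOC values of a given method and scenario.

```python
from lightgbm import LGBMRegressor
from sklearn.model_selection import GridSearchCV, KFold

param_grid = {
    "learning_rate": [0.01, 0.05, 0.1, 0.2],   # illustrative grid
    "n_estimators": [100, 300, 500, 1000],
}
search = GridSearchCV(
    LGBMRegressor(random_state=0),
    param_grid,
    scoring="r2",
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
)
# search.fit(X_train, y_train); search.best_params_ then holds the tuned values.
```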

3.2.2. Method Two (M#2)

In the second method, SG + FOD is used as a denoising method. As detailed in Section 2.5, this method returns signals primarily influenced by the first-order derivative operator. However, these signals are unsuitable for input in the feature generation stage. Consequently, scenarios two and three do not apply to this method. Based on our findings, the GBRT algorithm achieved the highest accuracy, reaching an R2 value of 56.86% (Figure 10 and Table 6). This represents an increase of approximately 16% compared to the first scenario in Method One (M#1–S#1). Such improvement underscores the effectiveness of the SG + FOD denoising method.

3.2.3. Method Three (M#3)

This method uses the VisuShrink algorithm to reduce noise in the PRISMA HS image. In the first scenario, after applying VisuShrink for denoising, the accuracy of SOC estimation reached an R2 value of 64%, a notable improvement of around 24% compared to M#1–S#1 (see Figure 11 and Figure 12, and Table 6). Moreover, the second scenario yielded the highest accuracy, with the LightGBM algorithm achieving an R2 of 70% (M#3–S#2). Additionally, compared to the first scenario of this method, there was an average accuracy increase of about 5.5% in the second scenario and approximately 5% in the third scenario across all algorithms. Also, the LightGBM algorithm reduced the MAE by about 0.08% (from 0.19% to 0.11%) when comparing the second scenario of this method (M#3–S#2) to the first scenario of Method One (M#1–S#1). Similarly, the GBRT algorithm showed an improvement, with the MAE decreasing from 0.21% in the first scenario of Method One to 0.15% in the third scenario of Method Three (M#3–S#3). These outcomes highlight the adequate performance of the introduced feature generation algorithm. It is worth noting that, similar to M#1, the LightGBM algorithm demonstrated the highest accuracy in this method. Lastly, a grid search was conducted over different learning rates and numbers of estimators to optimize the ML algorithms’ performance, as shown in Figure 13.

3.2.4. Method Four (M#4)

In the fourth method, the TV algorithm with a regularization parameter of 0.1 removes noise from the PRISMA HS image. According to the results, this algorithm was the most successful in denoising: it increased the accuracy of SOC estimation by 26.85%, 10.13%, and 2.92% compared to the first, second, and third methods, respectively (Table 6). Moreover, the second scenario showed the highest accuracy in this study, with the LightGBM algorithm achieving an R2 of 81.74% (M#4–S#2) (Figure 14 and Figure 15, and Table 6). Also, the LightGBM algorithm reduced the MAE by about 0.10% (from 0.19% to 0.09%) when comparing the second scenario of this method (M#4–S#2) to the first scenario of Method One (M#1–S#1). Similarly, the GBRT algorithm showed an improvement, with the MAE decreasing from 0.21% in the first scenario of Method One (M#1–S#1) to 0.10% in the third scenario of Method Four (M#4–S#3). Additionally, using the 10 PCs in the second scenario increased accuracy by about 13.7%, and using the 10 ICs in the third scenario increased accuracy by about 14.8%, indicating the efficiency of the feature generation method. An interesting point is that, as in Methods One and Three, the highest accuracy was achieved in the second scenario. Lastly, a grid search was conducted over different learning rates and numbers of estimators to optimize the ML algorithms’ performance, as shown in Figure 16.

4. Discussion

This study investigates the ability of the PRISMA satellite to estimate SOC in agricultural regions. Soil samples were gathered from 123 farms in Fars province and analyzed for their SOC content in the laboratory. Two strategies were implemented to enhance the accuracy of SOC estimation: noise reduction of the PRISMA image and a feature generation strategy to extract advanced features. Three methods were evaluated for noise reduction. The first, SG + FOD, is a common technique for denoising hyperspectral observations in SOC studies. Applying SG + FOD resulted in a 14% improvement in SOC estimation accuracy, reaching a maximum accuracy of 56.86%. Previous studies have reported similar improvements regarding the effectiveness of the SG filter in enhancing estimation accuracy. Yang et al. utilized log(1/R) + SG + FOD with PRISMA satellite imagery, resulting in a remarkable 25% improvement compared to using reflectance values (R) alone for organic carbon estimation [56]. Xu et al. used SG + log(1/R) as a preprocessing method for predicting SOM content based on laboratory spectrometer observations, yielding a modest 4% improvement over using reflectance values alone [57]. Angelopoulou et al. employed SG as a denoising method and PCA for dimension reduction in SOM content estimation using an ASD field spectrometer, demonstrating a 15% improvement in accuracy [21]. The second algorithm used for noise removal was VisuShrink, in which the db1 mother function and three decomposition levels were used to remove noise-related coefficients from the image’s wavelet transform. SOC estimation accuracy increased by 23.7% after denoising with the VisuShrink method, which is a substantial improvement. VisuShrink reached a maximum accuracy of 64% in estimating SOC, about 7% higher than the best accuracy of the SG + FOD algorithm, indicating its better performance in denoising PRISMA images. The VisuShrink method has been employed in two previous studies on SOC estimation, but only for denoising ASD spectrometer observations; to our knowledge, it has not been utilized in any study involving hyperspectral satellites thus far. In those studies [21,57], applying VisuShrink to the ASD spectrometer signals improved accuracy by approximately 18% and 7%, respectively. The third denoising algorithm was the standard TV algorithm, a fundamental method for removing noise from images. Interestingly, this algorithm had not previously been utilized in SOC studies to denoise HS sensor data. Unlike some versions of TV algorithms, the standard TV algorithm treats an HS image as a collection of individual 2D images, neglecting the spectral dimension and inter-band correlation, and applies a constant regularization parameter across all bands. Despite these simplifications, the algorithm demonstrated notable performance, enhancing SOC estimation accuracy by approximately 27% and achieving an R2 value of 66.95%. In general, the TV algorithm outperforms the other two algorithms, one reason being that it was originally designed for image denoising rather than signal denoising. In the TV algorithm, we set the regularization parameter to 0.1, which results in a slightly blurrier denoised image compared to the VisuShrink algorithm (Figure 17).
However, our main goal is to estimate the SOC value rather than extract high-frequency features like edges. This blurring effect is not expected to negatively impact our estimation accuracy. This is especially true because the ground data sampling method ensures that the SOC estimation for each point is based on the average SOC content of the surrounding points (within a 30 m neighborhood).
The second approach to enhancing SOC estimation accuracy relies on generating high-level features from the PRISMA image. Given the intricate nature of the carbon cycle and its relationship with electromagnetic radiation, it is necessary to create new features to bridge the gap between SOC and the reflectance bands. The number of high-level features is determined by the number of unique spectral signatures in the study area. Based on the obtained results (Section 3), both the PCA-based and ICA-based feature generation methods deliver significant performance improvements. For instance, in Scenario Two (added PCA-based features), Methods One, Three, and Four show accuracy enhancements of 22.15%, 5.51%, and 13.71%, respectively; in Scenario Three (added ICA-based features), the improvements are 20.46%, 14.50%, and 5.03%. Notably, Method One shows the most significant improvement, of about 21%, likely because it uses data without denoising. Overall, the highest accuracy is achieved in Scenario Two of Method Four (M#4–S#2), with an R2 of 81.74%, which is notably high for SOC studies. Previous studies often achieved such accuracy using airborne hyperspectral imagers or spectrometers coupled with field or laboratory measurements. For example, Francos et al. used the AVIRIS-NG airborne hyperspectral imager (GSD = 1 m) and reached an R2 of 80% for SOC estimation in southern Italy [58]. Peón et al. attained an accuracy of R2 = 61% in SOC estimation using the Hyperion satellite [18]. Gomez et al. used an image from the Hyperion HS satellite to estimate SOC in northwestern Australia, applying the Partial Least-Squares Regression (PLSR) model and achieving an R2 of 51% [59]. Angelopoulou et al. used a PRISMA satellite image for OM estimation, reaching R2 = 76% [21]. Shen et al. used laboratory spectrometry observations for SOC estimation, with the VisuShrink method for denoising and the PLSR algorithm for modeling, and in the best scenario reached an R2 of 71% [40]. Gasmi et al. used a PRISMA image for SOM estimation in northern Morocco; they also compared different feature selection methods on the PRISMA image and concluded that the RF-embedded feature subset selection algorithm had the best performance, yielding R2 = 69% [20]. Achieving this level of accuracy with a medium-resolution HS satellite marks a significant advancement in SOC estimation. These results suggest that the proposed feature generation method is indeed valuable; the improvements observed were not random but systematic, indicating the potential for this method to serve as a feature generation pipeline for future studies. In Figure 17, the SOC prediction maps for a specific part of the study area under different methods, scenarios, and ML algorithms are shown and can be used for further analysis.
By examining the results, it can be said that the feature generation strategy increased the accuracy of SOC estimation to a desirable level. However, this strategy needs further exploration for more interpretable investigation. For this reason, the permutation importance method was used to examine the importance of the features extracted in the feature generation step and to determine the best spectral ranges of the PRISMA image for SOC estimation. The permutation importance method assesses the significance of a feature by measuring the decrease in the accuracy of the ML model when that feature is shuffled, compared to its normal state, and this procedure is repeated for each feature. In this study, the permutation importance method was applied with the LightGBM algorithm and the test data to evaluate the fourth method (M#4, owing to its highest accuracy). The permutation importance analysis yielded intriguing results. In the second scenario of Method Four (M#4–S#2), 9 of the 15 most important features were associated with the PCA transformation, underscoring the pivotal role of these extracted features. On average, the importance of the PCA transformation features was 7.08% (based on the drop in R2), while the selected reflectance bands (RBs) had an importance of around 3.9% (Figure 18). Notably, the only unselected PC among the top 15 features was PC1, which had the highest variance. This finding suggests that minor changes are more critical for modeling SOC than major changes.
Moreover, the SWIR (bands 166 and 184) and VNIR (bands 120 and 121) spectral ranges of the PRISMA image were the most significant for accurate SOC estimation. To investigate the importance of the ICA features, the permutation importance method was also applied to the third scenario of the fourth method (M#4–S#3), and the same pattern as in the second scenario (M#4–S#2) was repeated: among the 15 most important features, 9 were related to the ICA transformation, meaning that 90% of the ICA transformation features were grouped among the important features (Figure 18). The importance of the ICs indicates the presence of independent components affecting SOC (e.g., the vegetation phenological cycle or agricultural practices), and the ICA transformation successfully separated the impact of each component, so the ML algorithm could independently determine the effect of each component on the estimation process. Another critical point is the importance of the SWIR spectral range: among the remaining six features related to the PRISMA RBs, five were associated with the SWIR range (bands 177, 180, 181, 182, and 184). This result underscores the significance of this spectral range in estimating SOC.
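A hedged sketch of this importance analysis with scikit-learn is given below; model is assumed to be a fitted LightGBM regressor for M#4–S#2, (X_test, y_test) a held-out fold, and feature_names a list labeling the 184 reflectance bands plus the 10 PCs, all of which are placeholders rather than objects defined in the paper.

```python
from sklearn.inspection import permutation_importance

# Mean drop in R2 when each feature is shuffled, averaged over n_repeats shuffles.
result = permutation_importance(model, X_test, y_test,
                                scoring="r2", n_repeats=10, random_state=0)
top15 = result.importances_mean.argsort()[::-1][:15]
for idx in top15:
    print(f"{feature_names[idx]}: {result.importances_mean[idx]:.4f}")
```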
The final step in SOC modeling employed three ML algorithms: RF, GBRT, and LightGBM. Based on the results achieved, the LightGBM algorithm was superior to the other two. Specifically, in Methods One, Three, and Four, it yielded the highest accuracy, while in Method Two the GBRT algorithm outperformed LightGBM (Figure 19). Across all four methods, the RF algorithm consistently demonstrated the lowest accuracy; these outputs highlight the robust capabilities of algorithms using a boosting strategy. In a related study, Zarei et al. utilized Extreme Gradient Boosting (XGBoost), RF, and GBRT algorithms for soil salinity estimation, revealing the advantages of GBRT and XGBoost over RF [60]. Similarly, Ye et al. demonstrated the superiority of GBRT and XGBoost over RF in SOC estimation using GF-6 hyperspectral satellite data [61]. Furthermore, Yang Li et al. showcased the superiority of XGBoost over the RF algorithm in SOC estimation based on spectrometer observations [56].
Although the denoising methods employed in this study proved highly effective, they are fundamental techniques in the field of noise removal, and newer versions of these methods may offer enhanced performance in noise reduction for HS images, which could be a focus of future research. Moreover, upcoming studies could explore feature selection algorithms to streamline the number of features used and evaluate their impact on the accuracy of SOC estimation. Additionally, further research could use multi-temporal PRISMA images to extract features from time series data, enhancing the SOC prediction ability. One major limitation of this research, common in SOC studies, is the small number of soil samples, which often necessitates the use of data augmentation methods. Although the neighboring points used in this study (in the four cardinal directions) belong to the same type of arable land and cultivation, with similar organic carbon content, there are still slight differences in the actual SOC content between these points and the central point, and these differences introduce some uncertainty into our results. To reduce this uncertainty, future research would benefit from increasing the number of sample points, which would lessen the reliance on data augmentation and lead to more accurate results. In addition, as the number of samples increases, the train–test split method can be used instead of cross-validation to evaluate the model; this helps avoid overfitting and provides more interpretable model performance at a lower computational cost than cross-validation. Future research should also use test samples from different climates to better investigate the presented methods’ performance under various conditions.

5. Conclusions

In this research, we used the PRISMA new-generation hyperspectral satellite to estimate SOC in agricultural fields. The main objective was to investigate the ability of the PRISMA image in SOC estimation and to examine solutions for improving the accuracy of SOC estimation based on hyperspectral image processing techniques. Two main approaches were employed to increase the estimation accuracy: the first was based on denoising, and the second focused on feature generation. For denoising, we utilized three distinct algorithms: VisuShrink and standard TV (used for the first time in SOC estimation based on spaceborne HS data), and SG + FOD (a common denoising algorithm in SOC studies). These algorithms increased the accuracy by about 24%, 27%, and 18%, respectively, showing the positive effect of denoising. Notably, while the VisuShrink algorithm has recently been used for denoising spectrometer signals in SOC studies, the standard TV algorithm had not been used in any such study; given its good performance, it or its improved versions can be employed in future research. The second approach involved a feature generation strategy for improving accuracy, which included two scenarios based on PCA and ICA transformations. Based on the obtained results, the performance of the feature generation strategy was auspicious, yielding an average accuracy improvement of about 13–14% across the methods to which it was applied (M#1, M#3, and M#4). The highest accuracy achieved in this research was R2 = 81.74%, obtained with the LightGBM algorithm in the second scenario of the fourth method (TV-0.1 + 10 PCs, or M#4–S#2), which is a high accuracy for SOC studies. Compared to the initial state (Scenario One of Method One, or M#1–S#1) with R2 = 40.1%, this represents a more than two-fold increase, demonstrating the efficiency of the proposed approaches.

Author Contributions

All authors contributed to the study’s conception and design. M.G.A., M.H., R.T.M. and F.S. performed material preparation, data collection, and analysis. M.G.A. wrote the first draft of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets generated and analyzed during the current study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the editor and reviewers for their valuable comments on our manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RS: Remote sensing
SOC: Soil organic carbon
SOM: Soil organic matter
TV: Total variation
HS: Hyperspectral
PCA: Principal Component Analysis
ICA: Independent Component Analysis
ML: Machine learning
SG: Savitzky–Golay
FOD: First-order derivative
SNR: Signal-to-noise ratio
SURE: Stein's Unbiased Risk Estimate
HFC: Harsanyi–Farrand–Chang
ADC: Analog-to-digital conversion
PC: Principal component
IC: Independent component
RF: Random Forest
LightGBM: Light gradient-boosting machine
GBRT: Gradient-Boosting Regression Tree
db-1: Daubechies wavelet mother function, level one
S#x: Scenario number x
M#x: Method number x
RB: Reflectance band
OM: Organic matter
MAE: Mean absolute error
RMSE: Root mean squared error
PLSR: Partial least-squares regression
XGBoost: Extreme Gradient Boosting

Figure 1. The flowchart of the proposed SOC estimation method.
Figure 2. Location of the study area and distribution of SOC ground data. The red square indicates the location of the study area in Fars province.
Figure 3. Spectral signature of SOC before and after removing absorption bands from the PRISMA image.
Figure 4. Spectral signature of SOC in (a) the original reflectance band, (b) after denoising with the SG algorithm, and (c) after denoising with SG + FOD. It is worth mentioning that the spectral signatures are related to the point with 1.76% SOC concentration. It should be noted that SR refers to the surface reflectance value, SG refers to the Savitzky–Golay filter, and FOD refers to the first-order derivative.
Figure 5. Analysis of the SOC spectral signature after denoising with the VisuShrink method across different wavelet mother functions. It is worth mentioning that the spectral signatures are related to the point with 1.76% SOC concentration. It should be noted that Original RB refers to the original reflectance band value and the x-axis shows wavelength in nm.
Figure 6. Analysis of the SOC spectral signature after denoising with the TV algorithm across different regularization parameters (λ). It is worth mentioning that the spectral signatures are related to the point with 1.76% SOC concentration. It should be noted that Original RB refers to the original reflectance band value and the x-axis shows wavelength in nm.
Figure 7. Scatterplots and corresponding fitting curves of measured SOC and predicted SOC in Method One (M#1). The x-axis is equal to the measured SOC, and the y-axis is equal to the predicted SOC.
Figure 8. Prediction map of SOC over the study area in M#1 and different scenarios and ML algorithms.
Figure 9. The trend of accuracy changes in ML algorithms in Method One across different scenarios. It should be noted that NE refers to the number of estimators, LR refers to the learning rate, and score refers to the R2.
Figure 10. Scatterplots and corresponding fitting curves of measured SOC and predicted SOC in Method Two (M#2). The x-axis is equal to the measured SOC, and the y-axis is equal to the predicted SOC.
Figure 11. Scatterplots and corresponding fitting curves of measured SOC and predicted SOC in Method Three (M#3). The x-axis is equal to the measured SOC, and the y-axis is equal to the predicted SOC.
Figure 12. Prediction map of SOC over the study area in M#3 and different scenarios and ML algorithms.
Figure 13. The trend of accuracy changes in ML algorithms in Method Three across different scenarios. It should be noted that NE refers to the number of estimators, LR refers to the learning rate, and score refers to the R2.
Figure 14. Scatterplots and corresponding fitting curves of measured SOC and predicted SOC in Method Four (M#4). The x-axis is equal to the measured SOC, and the y-axis is equal to the predicted SOC.
Figure 15. Prediction map of SOC over the study area in M#4 and different scenarios and ML algorithms.
Figure 16. The trend of accuracy changes in ML algorithms in Method Four across different scenarios. It should be noted that NE refers to the number of estimators, LR refers to the learning rate, and score refers to the R2.
Figure 17. SOC prediction maps of a specific part of the study area across different methods (M#1, M#3, and M#4), scenarios (S#1, S#2, and S#3), and ML algorithms.
Figure 18. Fifteen most important features after applying the permutation importance method on M#4–S#2 (left) and M#4–S#3 (right). It is worth mentioning that the black vertical line represents the standard deviation of accuracy across different repetitions of the permutation importance method.
Figure 19. Comparison of different ML algorithms over different methods: (a) Method One (M#1), (b) Method Two (M#2), (c) Method Three (M#3), (d) Method Four (M#4).
Table 1. Technical characteristics of the PRISMA satellite.

Property | VNIR | SWIR | Panchromatic
Spectral range | 400–1010 nm | 920–2505 nm | 400–700 nm
Spectral resolution | <12 nm | <12 nm | --
SNR | 200 in the range of 0.4–1.0 µm | 200 in the range of 1.0–1.75 µm; >400 at 1.55 µm; 100 in the range of 1.95–2.35 µm; >200 at 2.1 µm | 240
Spectral bands | 66 | 173 | 1
Data quantization | 12 bit
IFOV | 48.34 µrad
Spatial resolution | 30 m | 30 m | 5 m
Table 2. Results of investigating different mother functions and decomposition levels of the discrete wavelet transform and their performance on SOC estimation, based on the R2 value from the LightGBM algorithm. The bold numbers show the best results among the tested configurations.

Mother Function | Level 2 (R2) | Level 3 (R2) | Level 4 (R2) | Level 5 (R2)
db-1-sigma | 59.85 | 62.77 | 61.29 | 60.27
db-1-sigma/2 | 48.81 | 50.75 | 52.35 | 51.67
db-2-sigma | 39.17 | 42.30 | 44.25 | 43.11
db-2-sigma/2 | 41.32 | 38.42 | 39.73 | 39.28
db-3-sigma | 39.87 | 44.68 | 46.02 | 44.52
db-3-sigma/2 | 41.03 | 41.36 | 41.37 | 40.39
db-4-sigma | 41.9 | 43.34 | 44.42 | 42.73
db-4-sigma/2 | 44.4 | 41.72 | 41.59 | 41.01
db-5-sigma | 45.18 | 47.59 | 48.33 | 48.42
db-5-sigma/2 | 43.46 | 48.14 | 43.55 | 45.41
bior-1.3-sigma | 55.29 | 55.83 | 58.81 | 57.95
bior-1.3-sigma/2 | 47.68 | 49.11 | 47.50 | 48.11
bior-1.5-sigma | 54.95 | 56.31 | 54.95 | 57.54
bior-1.5-sigma/2 | 47.36 | 46.67 | 46.41 | 43.06
bior-2.2-sigma | 49.2 | 50.0 | 49.54 | 50.34
bior-2.2-sigma/2 | 46.3 | 47.97 | 47.22 | 47.49
coif-1-sigma | 49.15 | 48.35 | 52.40 | 51.24
coif-1-sigma/2 | 47.21 | 45.65 | 46.52 | 47.12
coif-2-sigma | 51.11 | 52.70 | 52.61 | 53.80
coif-2-sigma/2 | 47.72 | 48.51 | 49.53 | 49.64
coif-3-sigma | 47.79 | 49.11 | 50.43 | 48.11
coif-3-sigma/2 | 48.16 | 48.20 | 47.37 | 48.31
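As context for the configurations screened in Table 2, the following is a minimal, hypothetical sketch of VisuShrink denoising of a single reflectance spectrum with PyWavelets. The reading of the "-sigma" and "-sigma/2" suffixes as the noise estimate fed into the universal threshold is an assumption, and the synthetic spectrum and parameter values are illustrative only, not the authors' implementation.

```python
# A minimal, hypothetical sketch of VisuShrink wavelet denoising for one spectrum.
import numpy as np
import pywt

rng = np.random.default_rng(2)
spectrum = np.sin(np.linspace(0, 6, 184)) + rng.normal(scale=0.05, size=184)  # synthetic 184-band spectrum

def visushrink(signal, wavelet="db1", level=3, sigma_scale=1.0):
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Robust noise estimate from the finest detail coefficients (median absolute deviation)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    # Universal (VisuShrink) threshold; sigma_scale=0.5 would correspond to the assumed "sigma/2" variant
    thresh = sigma_scale * sigma * np.sqrt(2.0 * np.log(len(signal)))
    denoised_coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised_coeffs, wavelet)[: len(signal)]

denoised = visushrink(spectrum, wavelet="db1", level=3, sigma_scale=1.0)  # "db-1-sigma", level 3
print(denoised.shape)
```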
Table 3. The summary of features used in different methods and scenarios.

Method | Scenario One (S#1) | Scenario Two (S#2) | Scenario Three (S#3)
Reflectance bands (Method One or M#1) | Reflectance bands (184 features) | Reflectance bands + 10 PCs (184 + 10 features) | Reflectance bands + 10 ICs (184 + 10 features)
SG + FOD (Method Two or M#2) | Denoised reflectance bands (184 features) | -- | --
VisuShrink (Method Three or M#3) | Denoised reflectance bands with VisuShrink (db1-level3) (184 features) | Denoised reflectance bands with VisuShrink (db1-level3) + 10 PCs (184 + 10 features) | Denoised reflectance bands with VisuShrink (db1-level3) + 10 ICs (184 + 10 features)
Total variation (Method Four or M#4) | Denoised reflectance bands with TV-0.1 (184 features) | Denoised reflectance bands with TV-0.1 + 10 PCs (184 + 10 features) | Denoised reflectance bands with TV-0.1 + 10 ICs (184 + 10 features)
Table 4. General statistics of SOC data.

Number of Points | Mean (%) | Std (%) | Median (%) | Q1 (%) | Q3 (%) | Min (%) | Max (%)
123 | 1.03 | 0.35 | 0.98 | 0.90 | 1.15 | 0.13 | 2.2
Table 5. Overall statistics of prediction results in M#1 across different scenarios and various ML algorithms. The bold numbers show the best results in this method.

Scenario | ML Algorithm | RMSE (%) | MAE (%) | R2 (%) | RPIQ
M#1–S#1 | RF | 0.05 | 0.20 | 38.4 | 4.40
M#1–S#1 | GBRT | 0.05 | 0.21 | 38.1 | 4.38
M#1–S#1 | LightGBM | 0.05 | 0.19 | 40.1 | 4.42
M#1–S#2 | RF | 0.04 | 0.17 | 57.5 | 5.00
M#1–S#2 | GBRT | 0.04 | 0.16 | 60.8 | 4.95
M#1–S#2 | LightGBM | 0.04 | 0.15 | 65.05 | 5.10
M#1–S#3 | RF | 0.04 | 0.17 | 57.9 | 4.92
M#1–S#3 | GBRT | 0.04 | 0.17 | 56.9 | 4.91
M#1–S#3 | LightGBM | 0.04 | 0.15 | 63.2 | 5.05
Table 6. Overall results of SOC prediction across different methods (M#2–4). The bold numbers show the best results in each method. Scenarios S#2 and S#3 were not evaluated for Method Two (M#2).

Method | Scenario | ML Algorithm | RMSE (%) | MAE (%) | R2 (%) | RPIQ
Method Two (M#2) | S#1 | RF | 0.05 | 0.19 | 46.9 | 4.62
Method Two (M#2) | S#1 | GBRT | 0.05 | 0.17 | 56.86 | 4.87
Method Two (M#2) | S#1 | LightGBM | 0.05 | 0.17 | 54.92 | 4.82
Method Three (M#3) | S#1 | RF | 0.04 | 0.14 | 60.47 | 5.10
Method Three (M#3) | S#1 | GBRT | 0.04 | 0.13 | 63.04 | 5.19
Method Three (M#3) | S#1 | LightGBM | 0.04 | 0.13 | 64.03 | 5.23
Method Three (M#3) | S#2 | RF | 0.04 | 0.14 | 65.43 | 5.16
Method Three (M#3) | S#2 | GBRT | 0.04 | 0.12 | 68.61 | 5.28
Method Three (M#3) | S#2 | LightGBM | 0.04 | 0.11 | 70.04 | 5.38
Method Three (M#3) | S#3 | RF | 0.05 | 0.14 | 64.26 | 5.19
Method Three (M#3) | S#3 | GBRT | 0.04 | 0.12 | 69.58 | 5.42
Method Three (M#3) | S#3 | LightGBM | 0.04 | 0.12 | 68.81 | 5.38
Method Four (M#4) | S#1 | RF | 0.04 | 0.14 | 64.99 | 5.55
Method Four (M#4) | S#1 | GBRT | 0.04 | 0.13 | 65.50 | 5.60
Method Four (M#4) | S#1 | LightGBM | 0.04 | 0.13 | 66.95 | 5.81
Method Four (M#4) | S#2 | RF | 0.04 | 0.11 | 78.08 | 6.15
Method Four (M#4) | S#2 | GBRT | 0.04 | 0.10 | 78.76 | 6.21
Method Four (M#4) | S#2 | LightGBM | 0.03 | 0.09 | 81.74 | 6.29
Method Four (M#4) | S#3 | RF | 0.04 | 0.10 | 78.73 | 6.20
Method Four (M#4) | S#3 | GBRT | 0.03 | 0.10 | 80.66 | 6.26
Method Four (M#4) | S#3 | LightGBM | 0.03 | 0.09 | 81.57 | 6.39
