1. Introduction
The potential of Earth observation (EO) from space for resource management and optimization emerged in the early 1960s and grew throughout the rest of the 20th century with the advent of digital computers, pattern-recognition technology, and the first artificial satellites. The ability to monitor large areas at a reduced per-unit cost is one of the main advantages of space-based technologies [
1]. Remote sensing (RS) through EO makes it possible to obtain quantitative information at the pixel level, which helps in understanding environmental and socioeconomic trends. Land use and land cover (LULC) maps can be used for sustainable resource management and the study of climate change phenomena. According to the Eurostat definition, land cover (LC) corresponds to the physical coverage of the land surface, while land use (LU) refers to the socioeconomic use of the land [
2]. It is possible to create thematic maps of land cover using different classification methods. Whereas photointerpretation can be ambiguous and subjective, automatic classification is a defined, quantifiable, and repeatable process [
3] (p. 359).
Classifying involves assigning a land cover class to all pixels in a digital image. Automatic LULC classification is typically based on the notion that different land cover types have different spectral reflectance behaviors [
4]. Classification techniques (e.g., maximum likelihood, decision tree, and neural networks) are then used to define spectral signatures using sample data to discriminate between different classes of selected LULC based on pixel values [
5,
6]. Image classification, which can be supervised or unsupervised, is the process of extracting semantic information from raw data, i.e., pixel values, by assigning a class label to each pixel [
7,
8]. When classifying land cover, supervised approaches offer better performance than unsupervised approaches but require a sufficient number of accurate samples [
9].
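The idea of assigning each pixel to the class with the most similar spectral signature can be illustrated with a minimum-distance-to-mean classifier. This is a deliberately simple sketch, not one of the classifiers evaluated in this study; the band values, class names, and function names are hypothetical.

```python
from math import dist


def class_means(samples):
    """Average the training spectra of each class into one mean signature."""
    means = {}
    for label, spectra in samples.items():
        n_bands = len(spectra[0])
        means[label] = [
            sum(s[b] for s in spectra) / len(spectra) for b in range(n_bands)
        ]
    return means


def classify_pixel(pixel, means):
    """Return the label whose mean signature is closest in Euclidean distance."""
    return min(means, key=lambda label: dist(pixel, means[label]))


# Hypothetical reflectance values in three bands (green, red, near-infrared).
training = {
    "vegetation": [[0.08, 0.05, 0.45], [0.10, 0.06, 0.50]],
    "bare soil": [[0.18, 0.22, 0.30], [0.20, 0.25, 0.32]],
    "water": [[0.06, 0.04, 0.02], [0.05, 0.03, 0.01]],
}
means = class_means(training)

# A tiny "image" of three pixels, one spectrum per pixel.
image = [[0.09, 0.05, 0.48], [0.19, 0.24, 0.31], [0.05, 0.04, 0.02]]
labels = [classify_pixel(p, means) for p in image]
print(labels)
```

Supervised methods such as RF or neural networks replace the distance rule with a learned decision function, but the input (one spectrum per pixel) and the output (one label per pixel) are the same.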
Machine learning (ML) and deep learning (DL) are two approaches that use artificial intelligence (AI) to classify RS images. Among the classification techniques, support vector machine (SVM), random forest (RF), and maximum likelihood are ML-based algorithms. Artificial neural networks (ANNs), on the other hand, can be considered either ML or DL depending on the number of layers in the network [
10]. All these methods are common supervised classification techniques that are also used for hyperspectral imagery (HSI) [
11]. The use of ANN for image classification and change detection has been a well-known technique for many years. Later, the focus shifted to ML algorithms, such as SVM, that could handle large data sets with few training samples, or such as RF, that were easy to use with good accuracy [
10]. In the last decade, however, with the rise of DL, there has been a renewed interest in ANNs for their ability to produce good results in image analysis, including land cover classification [
10]. These methods are derived from neural networks but have greater computational power. DL methods use deeper layers to extract feature information and, particularly in the case of convolutional neural networks (CNNs), consider both the spatial and spectral characteristics of images [
10]. DL models provide promising results for object recognition or classification of hyperspectral data, by better handling images with high spatial and spectral resolution [
11,
12].
Multispectral data are a key resource in remote sensing because they provide more than one spectral measurement per pixel [
13] (p. 4). Most multispectral satellites record information in the visible and near-infrared regions of the electromagnetic spectrum, in a number of bands ranging from three to six or more, improving the ability to distinguish man-made surfaces, vegetation, clear and turbid water, rocks, and soil [
14]. However, bands in multispectral sensors are not contiguous across the spectrum and often have bandwidths of 100–200 nm. As a result, unlike hyperspectral sensors, they do not have sufficient spectral resolution to directly identify materials with diagnostic characteristics [
15] (p. 306). Hyperspectral data cover a wide range, from the visible (0.4–0.7 µm) to the shortwave infrared (SWIR) (2.4 µm), and are useful for detailed land cover legends. The ability to distinguish features with similar spectral signatures is enhanced by the availability of multiple bands [
16]. In addition to improving the ability to discriminate between similar objects, hyperspectral data make it possible to perform advanced studies, for example, predicting crop traits or estimating grassland biochemical parameters [
17,
18].
HSI from aircraft has been in development since the 1980s, but the first hyperspectral space EO missions for civil and scientific purposes have only been available since the early 21st century [
19]. The main providers of space-based hyperspectral data over the last few decades have been Earth Observing-1 (NMP/EO-1) [
20], launched under NASA’s New Millennium Program in 2000 and carrying the Hyperion spectrometer, and PROBA (Project for On-Board Autonomy), launched in 2001 with the Compact High-Resolution Imaging Spectrometer (CHRIS) developed by the European Space Agency [
21]. Among the latest EO hyperspectral missions are PRISMA (Hyperspectral Precursor of the Application Mission) and EnMAP (Environmental Mapping and Analysis Program). EnMAP is a German imaging spectroscopy mission that was launched in April 2022 and recently completed its commissioning phase [
22,
23]. PRISMA was developed by the Italian Space Agency (ASI) and has been in orbit since March 2019, with commissioning completed in January 2020 [
24]. The PRISMA mission is expected to contribute to the advancement of environmental RS by providing hyperspectral data for various applications, such as monitoring of agricultural crops, forest resources, and inland and coastal waters; mapping of natural resources, soil properties, and soil degradation; climate change studies; and environmental research [
25].
Working with HSI raises issues that are well known in the scientific community, related to the size of the data. The very high dimensionality of hyperspectral data, due to the large amount of information recorded in different bands, has some disadvantages: high storage costs, redundancy, and degraded performance [
26]. The dimensionality reduction is addressed using techniques based on band selection or feature extraction, as we will see below. However, the reduction in the size of the data must preserve the most relevant information it contains. To predict the separability of two classes of materials, the statistical distance between two spectral bands must be measured, and one of the most common methods is the Bhattacharyya distance [
1]. The other issue is related to the notion that the number of samples that are used to train a classifier has an impact on its accuracy. Therefore, in order to maintain accuracy, the number of training pixels per class needs to increase as the dimensionality of the data increases, according to the curse of dimensionality or the Hughes phenomenon (Hughes 1968) [
13,
27].
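As an illustration of how such a separability measure behaves, the sketch below computes the Bhattacharyya distance between two classes in a single band, assuming that the reflectance of each class is approximately Gaussian. The sample values and the function name are hypothetical, and the univariate form is a simplification of the multivariate case used with many bands.

```python
from math import log, sqrt
from statistics import mean, pvariance


def bhattacharyya_gaussian(x, y):
    """Bhattacharyya distance between two univariate Gaussian samples.

    D = 1/4 * (m1 - m2)^2 / (v1 + v2) + 1/2 * ln((v1 + v2) / (2 * s1 * s2))
    where m, v, and s are the mean, variance, and standard deviation.
    Both samples must have non-zero variance.
    """
    m1, m2 = mean(x), mean(y)
    v1, v2 = pvariance(x), pvariance(y)
    mean_term = 0.25 * (m1 - m2) ** 2 / (v1 + v2)
    var_term = 0.5 * log((v1 + v2) / (2.0 * sqrt(v1 * v2)))
    return mean_term + var_term


# Hypothetical near-infrared reflectance samples for three classes.
olive = [0.30, 0.32, 0.31, 0.33]
hazelnut = [0.31, 0.33, 0.32, 0.34]
water = [0.02, 0.03, 0.02, 0.03]

d_similar = bhattacharyya_gaussian(olive, hazelnut)
d_distinct = bhattacharyya_gaussian(olive, water)
print(f"olive vs hazelnut: {d_similar:.3f}")
print(f"olive vs water:    {d_distinct:.3f}")
```

A small distance indicates strongly overlapping classes, which is the situation expected for permanent crops with similar spectral signatures; a large distance indicates classes that are easy to separate in that band.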
The literature on the use of PRISMA data, particularly in the field of land cover classification, is still limited. Most of the research shows the potential of PRISMA data for specific purposes, such as forest conservation with wildfire fuel mapping [
28] or fire detection [
29], geological applications [
30], cryospheric applications [
31], urban surface detection [
32], and mapping methane point emissions [
33]. There are also interesting studies in the agricultural field dealing with specific crop or vegetation type discrimination [
34,
35,
36,
37]. However, the possibility of using PRISMA data to distinguish LULC classes has not been fully investigated.
This research aims to evaluate the possibility of classifying permanent agricultural crops (i.e., orchards, fruit trees, olive groves, etc.) in a highly fragmented agricultural area using PRISMA data. The purpose of this study is to discriminate entities with very similar spectral signatures using HSI. From this point of view, high spectral resolution becomes an advantage. However, the handling of hyperspectral images may not be an easy task due to the high dimensionality of the data [
26,
27]. Therefore, as explained in the following sections, two techniques were tested to reduce the dimensionality of the data. Three AI-based classifiers were then compared by evaluating the accuracy of their results. In particular, RF, ANN, and CNN were chosen from among the well-established methods for supervised classification. This choice was made to test the PRISMA data using algorithms with known performance that are expected to produce different results [
38]. In addition, it was found useful to report the processing time for data of different dimensionality, used as input in each classifier.
4. Discussion
Land cover classifications based on multispectral data allowed us to distinguish more general land cover classes. The use of hyperspectral data has advanced the identification of detailed maps of LULC. However, as noted in the introduction, most of the research focuses on temporary crops [
34,
61,
65] or a few tree species [
57,
66]. The results show that a detailed legend for land cover maps can be obtained by supervised classification using HSI. In particular, it is possible to extend the classification to the fourth level, starting from the third level of CLC. This study shows how the PRISMA data can be used to distinguish between many types of permanent crops. In this case study, CLC class “222: Fruit and berry plantations” was divided into eight subclasses. With CNN and Cube-R1, the F1 score is higher than 0.7 for the following classes: hazelnut orchard (0.93), olive groves (0.90), citrus groves (0.88), walnut orchard (0.81), persimmon orchard (0.77), and peach orchard (0.71). Based on these results, the main objective of this study can be considered achieved. Furthermore, to understand the actual applicability of the data in the application domain, the worst-case scenario was assumed (non-multitemporal analysis with a single autumn image), and even in this case the results were encouraging. It is also true, as the photos in
Figure 5 show, that the climatic conditions of the study area were conducive to the maintenance of the canopy over a longer period of time.
Among the different techniques used for the classification, the best choice seems to be the use of CNN. In terms of OA, K, PA, UA, and F1, this technique gave the best global result and per-class evaluation.
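For reference, the sketch below shows how these metrics are commonly derived from a confusion matrix, with F1 computed as the harmonic mean of PA and UA. The matrix values are hypothetical and the function name is illustrative; it is assumed here that rows hold the reference classes and columns the mapped classes.

```python
def accuracy_metrics(cm):
    """Derive OA, kappa, and per-class PA, UA, F1 from a square confusion matrix.

    cm[i][j] is the number of reference pixels of class i labelled as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_totals = [sum(row) for row in cm]  # reference pixels per class
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # mapped
    correct = [cm[i][i] for i in range(n)]

    oa = sum(correct) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (oa - expected) / (1 - expected)
    pa = [c / r if r else 0.0 for c, r in zip(correct, row_totals)]
    ua = [c / t if t else 0.0 for c, t in zip(correct, col_totals)]
    f1 = [2 * p * u / (p + u) if p + u else 0.0 for p, u in zip(pa, ua)]
    return {"OA": oa, "K": kappa, "PA": pa, "UA": ua, "F1": f1}


# Hypothetical confusion matrix for three classes.
confusion = [
    [50, 5, 0],
    [10, 30, 0],
    [0, 5, 100],
]
metrics = accuracy_metrics(confusion)
print(f"OA = {metrics['OA']:.3f}, K = {metrics['K']:.3f}")
print("F1 per class:", [round(v, 3) for v in metrics["F1"]])
```

Because OA and K are dominated by large, easily classified classes, the per-class PA, UA, and F1 values are the more informative figures for the small permanent-crop classes.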
Classification with the RF algorithm using Cube-R2 gave good global results (OA = 0.887; K = 0.867; F1 = 0.603), although they are not comparable to those of the neural networks. RF remains an excellent alternative for land cover mapping with aggregated classes because of its very short computation times. Indeed, F1 scores above 80% were obtained for impervious, bare soil, temporary crops/low vegetation, and woods and forests. Among the classes of interest (ID from 9 to 18), the best results were obtained for hazelnut orchard (67%) and olive grove (59%).
Classification with ANN produced excellent overall results (OA = 0.963; K = 0.956; F1 = 0.766) using Cube-R1. The F1 score is higher than in the RF case, but for some of the classes of interest (ID from 9 to 18), such as cherry and poplar groves, it is below 60%, and it is 0% for vineyards. It may be possible to obtain better results for these classes with an image from a different season. For olive, hazelnut, and persimmon groves, however, F1 values above 80% are very favorable. The main disadvantage is the high computational cost, measured in hours of processing.
In the case of convolutional neural networks, using Cube-R1 as in the previous case, the OA and global K coefficients are very high (OA = 0.973; K = 0.968; F1 = 0.842). This is because they are influenced by the very high accuracy obtained for aggregated classes such as impervious. However, the UA, PA, and F1 values are also higher than for the previous techniques. The convolutional network seems to give better classification results, especially for the following classes: hazelnut, olive, persimmon, and walnut. Conversely, for cherry, vineyard, and apricot, the classification is less accurate. For the classes of interest (ID from 9 to 18), the F1 score is higher than 0.70 in 6 out of 10 cases: hazelnut orchard (0.93), olive grove (0.90), citrus grove (0.88), walnut orchard (0.81), persimmon orchard (0.77), and peach orchard (0.71).
The CNN produces an overall F1 score (0.842) that is about 8 percentage points higher than that of ANN (0.766) and about 24 percentage points higher than that of RF (0.603). In the classes of interest (ID from 9 to 18), CNN gives F1 results almost 30% better than RF in all classes. Compared to ANN, CNN improves F1 by more than 20% in the vineyard, citrus, and poplar classes, is equivalent for the apricot and persimmon classes, and is slightly better for all the others.
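The gaps between classifiers can be expressed either in percentage points or as a relative improvement, and the two readings differ noticeably. The short sketch below recomputes both from the overall F1 values reported above (RF with Cube-R2, ANN and CNN with Cube-R1); the helper name is illustrative.

```python
# Overall F1 scores reported in this section.
overall_f1 = {"RF": 0.603, "ANN": 0.766, "CNN": 0.842}


def f1_gap(better, worse):
    """Return the gap in percentage points and as a relative gain in percent."""
    diff = overall_f1[better] - overall_f1[worse]
    points = diff * 100
    relative = diff / overall_f1[worse] * 100
    return points, relative


for baseline in ("ANN", "RF"):
    points, relative = f1_gap("CNN", baseline)
    print(f"CNN vs {baseline}: +{points:.1f} points, +{relative:.1f}% relative")
```

For CNN against ANN the absolute gap is roughly 8 points (about 10% in relative terms), while against RF it is roughly 24 points (about 40% in relative terms).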
Based on the results obtained, the convolutional network yields excellent results with PRISMA hyperspectral data, especially with the cube resulting from the band selection (Cube-R1) and with 100 training epochs.
5. Conclusions
This study shows that HSI data can discriminate between different types of permanent crops (ID from 9 to 18), achieving very high F1 scores for some classes. As shown above, the F1 score is higher than 0.70 in 6 out of 10 cases: hazelnut orchard (0.93), olive grove (0.90), citrus grove (0.88), walnut orchard (0.81), persimmon orchard (0.77), and peach orchard (0.71). This shows that satellite imagery can be used for level IV classification and confirms studies based on PRISMA data to distinguish crops [
34,
67], vegetation [
36], or forest types [
35].
The most promising technology for this kind of application is based on neural network methods, especially CNNs (CNN F1 results are almost 30% better compared to RF for permanent crops). Indeed, various studies have demonstrated the potential of CNNs for hyperspectral data classification compared to other techniques [
38,
59,
61,
62]. The best results were obtained with CNN using Cube-R1 (156 bands). This confirms that strong dimensionality reduction is not necessary when using DL-based methods [
37].
When dealing with high dimensionality of data, methods based on ML such as RF have limitations [
68]. This aspect necessitates information reduction, which inevitably has an impact on the ability to discriminate classes with similar characteristics. However, as shown in [
58], better results can be obtained with focused band selection.
The processing times may not be relevant for a case like this, where only a single data set is processed. However, this aspect could become crucial for multitemporal analyses or when experiments are carried out with different sample data. Although the best results were obtained with CNN, this method is more time consuming than RF. In fact, the main disadvantages of this method are the computational time and computer power consumption [
59,
69]. However, it is important to note that the algorithms were run with open-source tools, in which most of the processing and analysis is single-threaded. The time required could be significantly reduced as open-source software moves toward multithreaded execution.
In terms of reducing the high dimensionality of the data, the best results were obtained using Cube-R1 derived by band selection. Eliminating bands in the three regions of the spectrum where the transmittance is low is always useful [
48,
49,
50]. Data obtained with this method preserve more than 150 of the original 240 bands. This guarantees good results when used as input for neural networks, but not for RF, for which feature extraction using the PCA method is essential. Starting from the original Cube-O, PCA can significantly reduce data dimensions while preserving a high percentage of variance. Therefore, it is useful for discriminating between very different classes, but as stated in the previous paragraphs, it tends to overlook the subtle differences between similar classes.
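The reason PCA can preserve most of the variance with far fewer components is that adjacent hyperspectral bands are highly correlated. The toy sketch below illustrates this with only two hypothetical bands, for which the eigenvalues of the covariance matrix have a closed form; a real workflow would apply a library PCA to the full cube, and all names and values here are illustrative.

```python
from math import sqrt
from statistics import mean


def covariance_2x2(a, b):
    """Sample variances and covariance of two bands (lists of pixel values)."""
    ma, mb = mean(a), mean(b)
    n = len(a)
    saa = sum((x - ma) ** 2 for x in a) / (n - 1)
    sbb = sum((y - mb) ** 2 for y in b) / (n - 1)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)
    return saa, sab, sbb


def eigenvalues_sym_2x2(saa, sab, sbb):
    """Eigenvalues of [[saa, sab], [sab, sbb]], largest first."""
    half_trace = (saa + sbb) / 2
    radius = sqrt(((saa - sbb) / 2) ** 2 + sab**2)
    return half_trace + radius, half_trace - radius


# Hypothetical reflectance of five pixels in two adjacent, correlated bands.
band_a = [0.30, 0.32, 0.35, 0.38, 0.40]
band_b = [0.31, 0.33, 0.35, 0.39, 0.41]

lam1, lam2 = eigenvalues_sym_2x2(*covariance_2x2(band_a, band_b))
explained = lam1 / (lam1 + lam2)  # variance share of the first component
print(f"First principal component explains {explained:.1%} of the variance")
```

The small share left in the second component is exactly where subtle differences between similar classes may reside, which is consistent with PCA working well for very different classes and less well for similar ones.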
As mentioned in the introduction, there is to date a lack of studies using PRISMA for LULC classification. The reasons are mainly due to the difficulties in accessing and managing the data. However, future improvements are expected in both data availability, partly due to new hyperspectral missions such as EnMAP [
23], and data quality, using higher-accuracy DEMs for orthorectification. This would improve the reflectance accuracy, and thus even more accurate classifications are likely. It is expected that the interest in HSI will increase soon, especially because of the possibilities and advantages in the field of natural and agricultural land classification and monitoring.
The analysis of multi-seasonal imagery is a likely future application of PRISMA hyperspectral data for land cover mapping. The most promising approach seems to be the use of convolutional networks, and future tests will focus on CNNs with different network architectures. The results could be further improved by using 3D CNNs, which take advantage of the combination of spatial and spectral information [
70]. Also, a different dimensionality reduction method with more specific band selection may be used. This would allow for better results with ML algorithms that are less time-consuming, such as RF, or less sensitive to the Hughes phenomenon, such as SVM [
71].