
Characterising the Thematic Content of Image Pixels with Topologically Structured Clustering

School of Geography, University of Nottingham, Nottingham NG7 2RD, UK
Remote Sens. 2025, 17(1), 130
Submission received: 22 November 2024 / Revised: 30 December 2024 / Accepted: 31 December 2024 / Published: 2 January 2025
(This article belongs to the Section Environmental Remote Sensing)


The location of a pixel in feature space is a function of its thematic composition. The latter is central to an image classification analysis, notably as an input (e.g., training data for a supervised classifier) and/or an output (e.g., predicted class label). Whether as an input to or output from a classification, little if any information beyond a class label is typically available for a pixel. The Kohonen self-organising feature map (SOFM) neural network however offers a means to both cluster together spectrally similar pixels that can be allocated suitable class labels and indicate relative thematic similarity of the clusters generated. Here, the thematic composition of pixels allocated to clusters represented by individual SOFM output units was explored with two remotely sensed data sets. It is shown that much of the spectral information of the input image data is maintained in the production of the SOFM output. This output provides a topologically structured representation of the image data, allowing spectrally similar pixels to be grouped together and the similarity of different clusters to be assessed. In particular, it is shown that the thematic composition of both pure and mixed pixels can be characterised by a SOFM. The location of the output unit in the output layer of the SOFM associated with a pixel conveys information on its thematic composition. Pixels in spatially close output units are more similar spectrally and thematically than those in more distant units. This situation also enables specific sub-areas of interest in the SOFM output space and/or feature space to be identified. This may, for example, provide a means to target efforts in training data acquisition for supervised classification as the most useful training cases may have a tendency to lie within specific sub-areas of feature space.

1. Introduction

The thematic content of a pixel determines its location in spectral feature space. Pixels of spectrally distinct land cover classes will, for example, occupy dissimilar locations in feature space. The intermediate area between these latter locations in feature space may also contain, amongst other things, pixels that represent a mixture of the classes. The relationship between the thematic composition of a pixel and its location within feature space is critical to applications such as thematic mapping via image classification analysis. The latter partitions feature space, usually using unsupervised or supervised classification methods, into a set of mutually exclusive clusters or classes. Information on the thematic composition of pixels can be an input to a classification (e.g., labelled pixels for training and testing a supervised classification) and an output (e.g., labels for pixels predicted by unsupervised or supervised classifiers). In each case, the information normally available typically comprises only a class label for each pixel; a pixel-based approach is assumed throughout as the fundamental spatial unit of remotely sensed imagery, but the work is relevant to other spatial units. Little if any information is normally provided on the spectral or thematic similarity of the clusters or classes although this could be useful information.
This paper aims to explore the potential to characterise the thematic composition of pixels using topologically structured clustering. The latter is an unsupervised approach that clusters together similar pixels but also conveys information on the relative similarity of clusters. In addition to characterising the thematic composition of pixels, the information generated could be used in other applications such as helping to target sub-areas of feature space or pixels of particular composition, which could be of value to a study, notably as training cases for a supervised classification.

1.1. Background

The ability to characterise the thematic content of pixels in a remotely sensed image is fundamental to many analyses. An unsupervised classification can be used to cluster together spectrally similar pixels and the spectral classes defined can sometimes relate to thematic classes of interest. Thus, sometimes an unsupervised classification may be used to help generate training data for a supervised classification [1]. Alternatively, class labels indicating the thematic composition of a set of pixels in a ground reference data set are often used in the development as well as the evaluation of supervised image classifications that predict class membership for pixels [2,3]. Typically, the determination of class labels for this purpose is undertaken for a sample of pixels and requires ground reference data often arising from fieldwork or image interpretation. Acquiring the ground data to label pixels can be a challenging and expensive task [4,5,6,7] and hence methods to aid the characterisation of the thematic content of pixels are desired. This is particularly the case with training data for supervised image classification applications.
Pixels that have been labelled thematically are often used as reference data in supervised image classification analyses. The latter comprise three stages: training, class allocation, and testing. In the first stage, training, the ground reference data are used essentially to teach the classifier to identify the classes of interest to allow their discrimination [2]. A variety of approaches may be used in the training stage but often the reference data are split into two separate data sets, one explicitly for training and the other for some sort of validation activity [8,9]. For example, reference data may sometimes be used in an iterative cross-validation process to aid the design of an effective classifier and select the apparently optimal classifier [10]. In the second stage, class allocation, pixels are allocated a class label, and the remotely sensed image is converted into a thematic map. In the third stage, testing, reference data are used to evaluate the accuracy of the map generated by the classification analysis [3,11]. Thus, three sets of reference data that are normally independent of each other to avoid bias may sometimes be required; many classifiers need only two sets of samples for the purposes of training and testing while some define additional data sets [8].
The aim of using ground reference data differs in the stages of a supervised classification analysis and hence the nature of an ideal sample of reference data for each also differs between them. Forming the reference data sets should therefore be based ideally on their intended use. For example, a sample of pixels acquired for optimal testing could differ markedly from one for optimal training. A single one-size-fits-all approach to the formation of optimal reference data sets is unlikely to exist.
With regard to the testing stage, good practice recommendations exist for the design of the testing set in order to generate a statistically rigorous assessment of classification accuracy [11]. There is also a range of activities undertaken to aid accuracy assessment such as the generation of validation databases [12,13]. The good practice recommendations advise on issues such as the nature of the sample of pixels to be used to assess classification accuracy and stress the need for a representative and unbiased sample of pixels. Consequently, the need for the testing sample to be acquired with a probability sampling design is a key feature of good practice advice for accuracy assessment [11]. The training stage is, however, different to the testing stage and ideally requires a sample of reference data suited to its particular needs. Thus, an ideal training sample could deviate greatly from that which would be acquired via a probability sample design.
As the first stage of the standard three-stage supervised classification analysis, it is important that training is effective or the resulting classification may be of poor quality [14]. A variety of statements have been made in the literature about training data requirements for image classification. For example, it is common to find arguments put forward that the training set should contain a large and representative sample of exemplar training pixels with the number of pixels balanced over the set of classes (e.g., [2,3,15]). In some circumstances this could be an appropriate approach, but the suggested attributes are not universally applicable or required for training a classifier. A large literature exists on the characteristics of the training data and how a classification analysis should be undertaken. From this, the validity of the statements on issues such as the size, representativity, typicality, and balance of the training pixels can be questioned.
Key attributes of a training set that impact its suitability for use in a supervised classification analysis are its size and composition [16,17] as well as geographical dispersion [18]. Classifiers do differ in their sensitivity to key properties of the training set such as its size [19]. A substantial literature suggests the need for a large training set. A large number of training samples may be required if following conventional heuristics to guide the training process. The 30p heuristic, where p indicates the number of discriminating variables (e.g., spectral wavebands) used, for example, is widely adopted with conventional statistical classifiers [2,3]. This requirement arises from the need to generate high quality descriptive statistics for each of the classes. If, for example, there are insufficient reference samples then poor estimates of the covariance matrices may be generated, and this can impact negatively on a classification analysis [20]. The size of the training set required for a popular statistical classifier such as the maximum likelihood classifier thus scales directly with the dimensionality of the data to be classified and so can be large with, for example, hyperspectral data sets [21]. Failure to use a sufficiently large training set exposes the analysis to the curse of dimensionality that could result in the generation of a classification with sub-optimal accuracy. Similar heuristics exist for other classifiers. For instance, with feedforward artificial neural networks, the required training set size is a function of the network size defined by the number of units and weighted connections contained within it [22]. Some classifiers, including popular contemporary machine learning methods such as a convolutional neural network (CNN), are also suggested to be data greedy and often require large training sets [5,23,24,25,26,27,28] although the complexity of the analysis influences the required training set size [9]. 
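The scaling implied by the 30p heuristic can be illustrated with a short calculation. This is only a sketch: the function name is hypothetical, and it assumes the heuristic is applied separately to each class, which is not stated explicitly in the text above.

```python
def min_training_pixels(n_bands, n_classes, per_band=30):
    """Training set size suggested by the 30p heuristic.

    Assumption (not from the text): 30p pixels are needed for EACH class,
    so the total is 30 * p * number of classes.
    """
    per_class = per_band * n_bands
    return per_class, per_class * n_classes


# Illustrative values only: four wavebands and three classes.
per_class, total = min_training_pixels(n_bands=4, n_classes=3)
print(per_class, total)  # 120 360
```

Under the same assumption, a hypothetical 200-band hyperspectral data set would require 6000 pixels per class, which illustrates why the requirement can become large.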
Recently, a theoretical basis for calculating the minimum sample size required was proposed [29]. This may be helpful as acquiring a large training sample can be a major challenge, especially if the data are to be acquired close to the date of image acquisition and if a temporal series of images is to be used. Moreover, beyond a critical sample size, further increases in the number of training pixels may not have a significant impact on classification accuracy [30,31]. Cost issues are especially apparent if ground reference data are to be acquired through an expensive programme of fieldwork [4,32]. However, there is also substantial literature that shows that small, often carefully selected, training sets can be used to generate accurate classifications [21,33,34,35]. Critically, characteristics of the training data set beyond its size are important in the development of an accurate classification [27,34].
It is often stated that the classes should be balanced in terms of their size in the training set to obtain an accurate classification [36,37]. Balance could be achieved by acquiring an equal number of training pixels for each class. If imbalance exists, the training set could be balanced prior to its use in classification [38,39]. For example, under-sampling could be used to reduce the number of pixels for a class that is overly abundant in the training set while over-sampling could be used to increase the number of pixels that are of a relatively rare class in the training set. Both under- and over-sampling can be effective but are not problem-free. Under-sampling is, for example, wasteful of potentially useful information and over-sampling can result in overfitting [40]. Additionally, it is well known that imbalance can be useful to a classification analysis. For example, imbalance that reflects the proportional occurrence of the classes in the region to be mapped can be constructive [41,42].
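A minimal sketch of random under- and over-sampling is given below to make the trade-offs concrete. The function name, the `target` parameter, and the toy data are all hypothetical; the methods cited above may differ considerably from this simple random approach.

```python
import random
from collections import defaultdict


def rebalance(samples, target, seed=0):
    """Randomly under- or over-sample each class to `target` cases.

    `samples` is a list of (features, label) pairs.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))

    balanced = []
    for label, cases in by_class.items():
        if len(cases) >= target:
            # Under-sampling: surplus cases are discarded, losing information.
            balanced.extend(rng.sample(cases, target))
        else:
            # Over-sampling: all cases are kept and some are duplicated at
            # random; duplicates add no new information.
            extra = [rng.choice(cases) for _ in range(target - len(cases))]
            balanced.extend(cases + extra)
    return balanced


# Toy training set: 8 grass pixels and 2 tree pixels, one feature each.
data = [((i,), "grass") for i in range(8)] + [((i,), "trees") for i in range(2)]
balanced = rebalance(data, target=4)
counts = {lab: sum(1 for _, l in balanced if l == lab) for lab in ("grass", "trees")}
print(counts)  # {'grass': 4, 'trees': 4}
```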
A representative sample is often suggested as being desirable in training a supervised image classification [22,23]. This could be especially valuable for conventional statistical classifiers as it would aid the derivation of accurate class descriptive statistics. Random sampling to generate such a training set is typically suggested and this could also help ensure that the number of pixels per class reflects their actual abundance in the region to be mapped. However, this approach would be difficult and expensive to achieve, especially if a large sample size was required. Furthermore, it is established that a highly biased and distinctly unrepresentative sample of relatively atypical pixels may be appropriate for the purpose of training a classifier [33].
Although training often focuses on exemplar or typical pixels of each class, this may not always be essential or desirable. As stressed above, atypical pixels of classes can sometimes be useful in training, certainly more useful than an exemplar pixel that lies at or near the centroid of the class in feature space [34,43,44,45]. Indeed, the most useful pixels for training a classifier sometimes tend to lie in distinct regions of feature space, which are also classifier-specific [43]. Additionally, there is no requirement for the pixels used in training to even be pure (i.e., represent an area covered by a single class). Mixed pixels can be highly effective training pixels [34,45]. This is especially the situation when a mixed pixel lies between the classes in feature space where it provides information on more than one class and may lie close to where a separating hyperplane could be fitted. Some classifications can be trained with mixed pixels [45,46] and hence there is sometimes a desire to be able to characterise and label such pixels as well as those that are pure.
In reality, there is no single universally applicable set of best practices to apply in the design of the training stage. Key aspects of training are classifier-specific, and the nature of the classification problem faced (e.g., number and relative separability of classes) influences the design of an optimal training stage. However, it is possible to define aspects of the nature of a useful training set based on the way the classifier to be used partitions feature space as well as the aims and nature of the mapping programme.

1.2. Forming a Useful Training Set

The way classifiers partition feature space to allow for the labelling of pixels of unknown class membership is of fundamental importance. There is no universal requirement to fully describe spectrally every class that exists in the region to be mapped but rather a desire to obtain data that help to separate the classes of interest, and this potentially requires only data for a sub-set of the feature space and/or classes present. Consider a simple hypothetical example involving just two separable classes that lie in a two-dimensional feature space. Each class occupies a well-defined but separate region of the feature space. There is no need to fully sample the spectral response of each class to locate a suitable hyperplane to separate them. Rather, only a few training pixels from the edge of each class distribution that faces directly onto the other class are needed; this is especially the situation if a margin-based classifier such as a support vector machine (SVM) is used [33]. A large representative sample of pixels may include these useful pixels but inefficiently and expensively, with the majority of the pixels acquired going unused. A small, highly unrepresentative and biased set of training data is sometimes all that is needed [33,43]. Critically, individual pixels can vary in their value for training a classifier and recognising this may allow for the development of methods to intelligently form training sets for a selected classifier [33,43,47].
A small carefully selected set of pixels for training can be ideal for training a classifier. As noted above, this is the situation with margin-based classifiers such as a SVM that only require pixels that represent effective support vectors [33,48,49]. It is these and only these pixels in the training set that contribute to the fitting of the classification hyperplanes. Such a sample could also be useful for use with other classifiers such as the standard multi-layer perceptron neural network using backpropagation learning. However, it would not be ideal and indeed possibly of little if any value for use with other classifiers. For example, a relevance vector machine (RVM) may also favour the use of atypical pixels but of a very different nature [43]. Critically, individual training pixels may be of unequal value to a classification and much depends on the specific classifier to be used. Nonetheless, the most useful pixels for training sometimes tend to cluster in specific regions of feature space and these regions may vary between classifiers [33,43]. Thus, for a selected classifier, it may be possible to focus attention on potentially useful pixels for training by targeting sampling on specific sub-regions of feature space. As noted with a SVM, for example, it may be possible to focus on the potentially informative pixels and divert efforts away from the relatively uninformative pixels. In some instances, the locations of such pixels, in feature space or even geographically, may be predicted [43]. This could enable accurate classification from small training sets.
Based on knowledge of how a classifier works and the desired classification output, there are opportunities to define means to generate small but useful training sets. Often, the focus of a study may be on only a sub-set of the classes that exist in a region [27,50,51]. This desired output presents opportunities to form useful but small training sets. For example, it may sometimes be possible to omit classes of no interest from the training set [51]. For classifiers that fully partition feature space, pixels of an omitted class that exist in the area to be mapped will be commissioned into sets of trained classes [51,52]. Such errors however may be of no concern if they are confined to classes of no interest. Alternatively, only a small number of training pixels for classes of no interest that are spectrally highly separable could be used. Such classes may be described poorly, and sub-optimal hyperplanes may be fitted but this again need be of no consequence to the application at hand [51]. An even more extreme situation is when interest is focused on a single class. In such circumstances, one class classification using only, or at least mainly, training pixels for that class may be used. The support vector data description (SVDD), essentially a one class version of the SVM, enables accurate classification from a small sample of pixels of the class of interest [51]. The SVDD is also useful when the class set is non-exhaustive [50,51].
The source of ground reference data can be important, especially in relation to its quality. Ground reference data are typically assumed implicitly to be from a gold standard (i.e., to be 100% accurate) but rarely is this the case. Large errors in labelling were noted for popular sources of ground data such as those arising from professional image interpreters [1,53,54] and citizens volunteering data [55,56]. Ground data sets do contain errors, and this can substantially impact a classification analysis [29,57,58]. Little attention has focused on the effect of training data error on classification analyses [8], yet it is a major problem as mislabelled training pixels will degrade the training set. For instance, mislabelled pixels would degrade class descriptive statistics and subsequently the key summary statistics such as the mean and covariance used in standard statistical classifiers. The effect can be even more important with machine learning methods that utilise the data for training pixels individually rather than summary descriptions based on all of them. If, for example, pixels identified as support vectors in a SVM classification were actually mislabelled, this could have a substantial negative impact on a classification analysis [57]. The effect of mislabelled pixels would also be expected to vary with their abundance in the training set, which could be problematic in very small data sets. However, if only small training sets are required, this would allow greater care and attention to be paid to labelling than if a larger sample was to be acquired. A variety of actions may also be taken to address the impact of errors in the training data [8].
Once acquired, a training sample may be subject to some pre-processing operations before being used to train a classifier. This might include analyses such as those that act to reduce noise, remove mislabelled pixels and down-weight, or even remove potential outliers [32,59,60,61,62]. These can be operations that enhance an analysis, but there are also dangers in their use. For example, the support vectors in a classification by a SVM are often atypical pixels lying at the edge of a class’s distribution with the potential to be considered as outliers, and their removal could be detrimental to the classification analysis. In addition, classifiers vary in tolerance to potential distortions caused by incorrectly labelled pixels. For example, a conventional statistical classifier may sometimes be less sensitive to mislabelled training pixels than a SVM [57]. Other classifiers may also be relatively insensitive to a degree of mislabelling [37].
Often, there is a desire to focus on automating aspects of the training stage and enhancing the training set generated. This could, for example, make use of the abundant unlabelled pixels [63,64] in the image or pre-trained classifiers [65]. Popular approaches include those described as semi-supervised learning [5,66], active learning [14,36,67,68], and transfer learning [7,28,69]. Historical land cover data may also be useful [61,70,71,72]. A key feature often exploited in the use of unlabelled pixels is the variable importance of the available pixels in the imagery to be classified for training purposes. Thus, for instance, active learning methods may identify the pixels with greatest class allocation uncertainty. These pixels could be targeted for labelling and added to the training set as they may be expected to be useful in forming an effective classifier. However, this type of approach typically requires the assessment of uncertainty across the entire image and over all classes. This may not always be ideal as sometimes interest is focused on a particular sub-set of classes and a desire to limit the geographical scope for logistical and financial reasons. Focusing efforts on a limited part of the entire data set would therefore be beneficial [36]. In particular, it would be more useful if attention could be focused on the class(es) of interest and spatially constrained to regions that were spectrally similar with other, spectrally distinct regions ignored. A more focused approach to identifying potentially useful pixels for training could be based upon the use of an unsupervised classification, which not only clusters together pixels that are spectrally similar, but also indicates the relative spectral similarity of the clusters.
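The idea of targeting the pixels with the greatest class allocation uncertainty can be sketched as follows. The use of Shannon entropy as the uncertainty measure, the function names, and the class membership probabilities are all assumptions made for illustration; active learning methods in the cited literature use a variety of criteria.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in bits) of a set of class membership probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def most_uncertain(posteriors, k):
    """Indices of the k pixels with the highest entropy, most uncertain first."""
    ranked = sorted(range(len(posteriors)),
                    key=lambda i: entropy(posteriors[i]), reverse=True)
    return ranked[:k]


# Hypothetical class membership probabilities for four unlabelled pixels.
posteriors = [
    [0.98, 0.01, 0.01],  # confidently allocated
    [0.34, 0.33, 0.33],  # confused between all three classes
    [0.60, 0.30, 0.10],
    [0.50, 0.50, 0.00],  # confused between two classes
]
print(most_uncertain(posteriors, 2))  # [1, 2]
```

Note that this ranking requires probabilities for every pixel and every class, which is the whole-image cost discussed above.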
Unsupervised classifications were used to inform the training of supervised classification analyses [1,73] including use in targeting the most useful training pixels [44]. If the classes are spectrally distinct, this may be a viable process but sometimes the spectral classes output from an unsupervised classification do not have a simple relationship to the meaningful classes to be used in the supervised classification. Also, there is often little information beyond the set of cluster memberships generated. For example, popular unsupervised classifiers such as the k-means or ISODATA algorithm cluster together spectrally similar pixels, and their basic output is the allocated cluster label for each pixel [2,3]. However, the Kohonen self-organising feature map (SOFM) neural network provides a means of unsupervised classification that will not only cluster together spectrally similar pixels but also explicitly indicate the relative spectral similarity of the clusters [74,75]. This could enable a focus on the cluster(s) of relevance and be used to target new pixels for training to enhance supervised classification. That is, the clusters representing the classes of interest, including pixels of mixed composition, could be highlighted and used to help select training pixels with others ignored. This allows for a focus on thematically relevant pixels and constrains the area of the imagery from where they are to be obtained facilitating targeted ground data collection for pixel labelling. The aim of this article is to show that a SOFM can be used to characterise the thematic composition of pixels and generate an unsupervised classification that can aid the selection of training sites for supervised classification. Mapping the clusters of interest in feature and/or geographical space may help target useful pixels for training the supervised classification.

2. Materials and Methods

The following two data sets were used. First, fine spatial resolution imagery acquired by an airborne thematic mapper (ATM) of an urban site was used. Second, Sentinel-2 multi-spectral instrument (MSI) imagery acquired for an agricultural region was used.

2.1. ATM Data

The ATM data set was used in earlier research to illustrate the ability to extract accurate sub-pixel scale land cover information from remotely sensed imagery, and is described in [76]. As illustrated in [76], the image data set was acquired for a test site in the city of Swansea, UK, with an airborne thematic mapper in 11 spectral wavebands and at a spatial resolution of approximately 1.5 m. The specific focus of this previous work was on a small region comprising three land cover classes: trees, grass, and asphalt.
The image data in three wavebands that offered a high level of spectral separability were spatially degraded with an 11 × 11 low pass filter to simulate crudely a coarser spatial resolution image of the site. These bands were denoted b1–b3 and represent the data acquired in the 605–625 nm, 695–750 nm, and 1550–1750 nm wavebands of the ATM sensor, respectively. The fine spatial resolution image was classified to form a fine spatial resolution reference data set from which the class composition of pixels in the coarser resolution data set could be determined. The focus was on a sample of 50 coarse spatial resolution pixels. For each thematic class, at least 5 coarse spatial resolution pixels were pure with the class composition of the remaining 35 pixels being variable, including pixels that were highly mixed, dominated by a single class, or even pure [76].
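The derivation of class composition from the fine spatial resolution classification can be sketched as follows. This is an illustration only: the function name is hypothetical, the block is a toy 4 × 4 example rather than the 11 × 11 footprint used here, and it assumes that composition is simply the proportion of fine resolution pixels of each class falling within the footprint of a coarse pixel.

```python
from collections import Counter


def class_composition(block, classes):
    """Percentage cover of each class within one coarse pixel footprint.

    `block` is a 2-D list of class labels from the fine resolution map.
    """
    counts = Counter(label for row in block for label in row)
    total = sum(counts.values())
    return {c: 100.0 * counts.get(c, 0) / total for c in classes}


# Toy footprint containing all three classes (a highly mixed pixel).
block = [
    ["trees", "trees", "grass", "grass"],
    ["trees", "grass", "grass", "grass"],
    ["asphalt", "asphalt", "grass", "grass"],
    ["asphalt", "asphalt", "asphalt", "grass"],
]
composition = class_composition(block, ["trees", "grass", "asphalt"])
print(composition)  # {'trees': 18.75, 'grass': 50.0, 'asphalt': 31.25}
```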

2.2. Sentinel-2 MSI Data

A Sentinel-2 MSI image was acquired for a region of agricultural land located to the west of the village of Feltwell in the UK (Figure 1).
Attention focused on a single class: oil seed rape. A notable feature of this crop is that it has a very distinctive flowering period, typically in late April and early May in the UK. To take advantage of this feature and to help enhance the separability of the class of interest, a Sentinel-2 image acquired on 7 May 2018 was obtained (Figure 2). Not only would the fields planted to oil seed rape be expected to be in flower at this time, but some other fields would be bare, enhancing further the spectral separability of the class of interest in the region to be mapped. A crop map produced by the UK Centre for Ecology and Hydrology for 2018 was acquired for use as the ground reference data (Figure 1).
With the Sentinel-2 MSI imagery (Figure 2), attention was focused on the data acquired in four wavebands with a 10 m spatial resolution: Sentinel-2 bands 2 (blue), 3 (green), 4 (red), and 8 (near-infrared). To ensure that the data set would comprise pixels of variable thematic composition (i.e., comprise both mixed and pure pixels), the Sentinel-2 imagery was spatially degraded using a 17 × 17 mean filter. This degradation produced a coarse spatial resolution image of the test site that comprised 2128 pixels (Figure 2c).
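A simple way to picture the spatial degradation is as the averaging of blocks of fine resolution pixels. The sketch below assumes non-overlapping blocks and drops any incomplete blocks at the image edge; both are assumptions, as the exact implementation of the filter is not specified above. The function name and the toy 4 × 4 single-band image are hypothetical.

```python
def degrade(image, size):
    """Average non-overlapping size x size blocks of a 2-D list of values.

    Assumption: trailing rows/columns that do not fill a whole block are dropped.
    """
    n_rows = len(image) // size
    n_cols = len(image[0]) // size
    coarse = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [image[r * size + i][c * size + j]
                     for i in range(size) for j in range(size)]
            row.append(sum(block) / (size * size))
        coarse.append(row)
    return coarse


# Toy single-band image degraded with a 2 x 2 block (size=17 in the study).
fine = [[1, 1, 3, 3],
        [1, 1, 3, 3],
        [5, 5, 7, 7],
        [5, 5, 7, 9]]
print(degrade(fine, 2))  # [[1.0, 3.0], [5.0, 7.5]]
```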

2.3. Methods

In earlier work using the ATM data set [76], the attention had focused on the relationship between the actual and predicted areal coverage for each land cover class separately. Here, the focus was on the complete fractional class composition of the sampled pixels. The class composition data were rescaled to ensure that the total class coverage for each simulated coarse spatial resolution pixel summed to exactly 100% as the attention was on the thematic composition of individual pixels. The data in the 3 coarse spatial resolution images generated were inputted into a SOFM to form an unsupervised classification.
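The rescaling of the class composition data can be sketched as a proportional adjustment. This assumes that each class cover value is divided by the pixel total and multiplied by 100; the function name and the example values are hypothetical.

```python
import math


def rescale_to_100(cover):
    """Proportionally rescale class cover values so that they sum to 100%."""
    total = sum(cover.values())
    if total <= 0:
        raise ValueError("class cover values must sum to a positive number")
    return {c: 100.0 * v / total for c, v in cover.items()}


# Hypothetical pixel whose estimated cover sums to 95% before rescaling.
rescaled = rescale_to_100({"trees": 30.0, "grass": 50.0, "asphalt": 15.0})
print(rescaled)
print(math.isclose(sum(rescaled.values()), 100.0))
```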
To simplify the analysis and aid the visualisation of the results from the analyses of the Sentinel-2 data, an uncentred principal components analysis (PCA) using the covariance matrix was used as a feature reduction analysis [2]. The data in the first two principal components, which explained 99.8% of the variance in the acquired Sentinel-2 data, were used as input into a SOFM. The loss of information was small and helped visualisation.
A conventional SOFM was used to generate unsupervised classifications of the data sets. This type of neural network consists of just 2 layers: an input and an output layer. The input layer contains an input neuron or unit associated with each input variable (e.g., spectral waveband) used. The output layer is normally composed of a set of output units arranged in low dimensional space. The output layer is, as used here, often a two dimensional array of output units (Figure 3). Each input unit is linked to every output unit with a weighted connection [75]. Similarly, each output unit is connected to its neighbouring output units [77]. The SOFM produces an unsupervised classification via a competitive learning process, the output of which is topologically structured [78,79,80]. This analysis clusters together similar pixels, associating them with one or more output units that are close to each other in the SOFM output layer. Conversely, spectrally dissimilar pixels lie in output units that are distant from each other in the output layer. Although the output layer is topologically structured, distances and angular relationships between output units may not be directly comparable. The key issue however is that units close together will contain relatively spectrally similar pixels and the degree of similarity between pixels associated with output units declines as distance between the output units assessed in the output layer increases. As well as being used for unsupervised classification [81], the SOFM is also attractive as a means of dimensionality reduction [74] and for informing the input of supervised classifications [74,82].
The input data to the SOFM comprises a vector $x$, the length of which is the number of the input variables $m$ and thus $x = (x_1, x_2, \ldots, x_m)$. Each input unit is linked to every unit in the SOFM’s output layer by a weighted connection. Training starts with these weights set at random values. The synaptic weight vector for unit $j$ is $w_j = (w_{j1}, w_{j2}, \ldots, w_{jm})$. The SOFM output unit that best matches the input data is defined as the winning unit and identified by finding the output unit with the smallest Euclidean distance between the input and weight vectors. The location of this winning unit in the output layer determines the neighbourhood in the SOFM output layer in which weight updating occurs. The weight update is achieved as follows:
$$w_{t+1} = w_t + \alpha_t \times \beta_t \times (x_t - w_t)$$
in which $w_t$ is the weight at time $t$, $\alpha$ is the learning rate, and $\beta$ is the neighbourhood function [79,83]. Thus, the SOFM requires two parameters to be defined: a neighbourhood function and a learning rate. The neighbourhood is a region of the output space defined around the winning unit, and weight updating is limited to only those weights associated with units that lie within the defined neighbourhood. The size of the neighbourhood typically decreases during the analysis. Similarly, the magnitude of the learning rate also often decreases during the analysis. Variants of the basic SOFM exist [84] and the exact approach and settings used are, as in this study, often defined after a series of trials.
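A single weight update of the kind described above might be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in the study: the neighbourhood function is taken to be a simple step ("bubble") function equal to 1 inside the neighbourhood and 0 outside, grid distance is measured as Chebyshev distance, and the function name `sofm_update` and the example values are hypothetical.

```python
import numpy as np

def sofm_update(weights, grid, x, alpha, radius):
    """One SOFM update for a single input vector x.
    weights: (n_units, m) weight vectors; grid: (n_units, 2) unit coordinates."""
    # Winning unit: smallest Euclidean distance between input and weight vectors
    winner = int(np.argmin(np.linalg.norm(weights - x, axis=1)))
    # Assumed neighbourhood: units within `radius` grid steps of the winner
    grid_dist = np.abs(grid - grid[winner]).max(axis=1)
    beta = (grid_dist <= radius).astype(float)
    # w(t+1) = w(t) + alpha * beta * (x - w(t))
    new_weights = weights + alpha * beta[:, None] * (x - weights)
    return new_weights, winner

rows, cols = 4, 4
grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
rng = np.random.default_rng(1)
weights = rng.random((rows * cols, 3))   # random initial weights, 3 input variables
x = np.array([0.2, 0.5, 0.8])            # hypothetical input pixel
updated, winner = sofm_update(weights, grid, x, alpha=0.5, radius=1)
print(winner, int(np.any(updated != weights, axis=1).sum()))
```

With a learning rate of 0.5 the winning unit's weight vector moves half way towards the input, and units outside the neighbourhood are left unchanged.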
With the analyses of the ATM data, the SOFM had 3 input units, 1 for each spectral waveband of data used. The output layer comprised 16 units arranged in the shape of a square (Figure 3).
With the analysis of the Sentinel-2 data, there were only 2 input units for the first 2 principal components extracted from the PCA. Most attention focused on analyses in which the output layer was square and comprised 16 units. However, further analysis was also undertaken using a larger output layer comprising 100 units arranged in the shape of a square. The latter was investigated since research has shown that larger networks can allow enhanced inter-class separation [77] and may thus provide better characterisation of the thematic composition of pixels than an SOFM with a small output layer.
The same approach to training was used in all analyses with the SOFM. As is often undertaken, this was a two stage process involving a coarse followed by fine tuning [77]. In the first stage, the learning rate declined from 0.5 to 0.1 over 100 training epochs with a fixed neighbourhood of 1.0. The second stage also ran for 100 training epochs with the learning rate fixed at 0.1 and the neighbourhood at 0. Once trained, a data set can be passed through the SOFM and every pixel allocated to its winning output unit. The units in the output layer were allocated a numerical label, starting with 1 in the top left of the output layer and incremented consistently until reaching the final output unit in the bottom right of the output layer (Figure 3).
Spectrally similar pixels would be expected to lie within a single or set of neighbouring output units. Spectrally dissimilar pixels would be expected to be associated with distant output units. Thus, the output layer can be partitioned to enable class labels to be assigned to pixels. Here, attention focused on the relationship between the thematic composition of pixels and the location of the winning unit they are associated with in the SOFM output layer.
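The two stage training and the allocation of pixels to numbered output units can be sketched as below. Several details are assumptions made only for illustration: the learning rate is taken to decline linearly in the first stage, weights are updated after each pixel presented in random order, the neighbourhood is a step function measured in grid steps, and units are numbered row by row. The function names and the random input data are hypothetical.

```python
import numpy as np

def train_sofm(data, rows=4, cols=4, seed=0):
    """Two stage SOFM training; returns (rows*cols, m) weight vectors."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
    # Random initial weights drawn within the range of each input variable
    weights = rng.uniform(data.min(axis=0), data.max(axis=0),
                          size=(rows * cols, data.shape[1]))
    # Stage 1 (coarse): learning rate 0.5 -> 0.1 over 100 epochs, neighbourhood 1
    # Stage 2 (fine): learning rate 0.1 for 100 epochs, neighbourhood 0 (winner only)
    schedule = [(a, 1) for a in np.linspace(0.5, 0.1, 100)] + [(0.1, 0)] * 100
    for alpha, radius in schedule:
        for x in data[rng.permutation(len(data))]:
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            inside = np.abs(grid - grid[winner]).max(axis=1) <= radius
            weights[inside] += alpha * (x - weights[inside])
    return weights

def allocate(data, weights):
    """Label of the winning unit for each pixel, numbered from 1 (top left)
    row by row to rows*cols (bottom right); row-wise numbering is assumed."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return dists.argmin(axis=1) + 1

rng = np.random.default_rng(42)
pixels = rng.random((60, 3))          # hypothetical 3-band pixels
weights = train_sofm(pixels)
labels = allocate(pixels, weights)
print(np.bincount(labels, minlength=17)[1:])   # pixels per output unit 1..16
```

The printed counts show how many pixels fall in each output unit, which is the starting point for examining the thematic composition of each unit.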
With the ATM data, this assessment was aided by the use of Spearman rank correlation analyses focused on the relationship between thematic composition and the distance between units in the output layer. Additionally, the average squared Mahalanobis distance (d2) between the pixels associated with an output unit and the class centroid, determined using the 5 pure cases of each class in earlier work [76], was assessed. The latter was calculated for each pixel using the following equation:
$$d^2 = (x - \mu)^{T} \Sigma^{-1} (x - \mu)$$
where $\mu$ indicates the location of the class centroid, $x$ the location of the pixel, and $\Sigma$ is the covariance matrix. The average value calculated over all pixels associated with an output unit indicates how far that set of pixels is from the relevant class centroid. The smaller the value the closer the pixels lie to the centroid of the class.
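The average squared Mahalanobis distance for the pixels of one output unit can be computed as in the following sketch. The function name, the identity covariance matrix and the two example pixels are hypothetical and chosen only so that the result is easy to check by hand; in practice the centroid and covariance matrix would be estimated from the pure cases of the class.

```python
import numpy as np

def mean_sq_mahalanobis(pixels, centroid, cov):
    """Average of (x - mu)^T Sigma^-1 (x - mu) over the rows of `pixels`."""
    diff = pixels - centroid
    inv_cov = np.linalg.inv(cov)
    # Quadratic form evaluated row by row
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return float(d2.mean())

centroid = np.zeros(3)                 # hypothetical class centroid
cov = np.eye(3)                        # hypothetical covariance matrix
unit_pixels = np.array([[1.0, 0.0, 0.0],
                        [0.0, 2.0, 0.0]])
result = mean_sq_mahalanobis(unit_pixels, centroid, cov)
print(result)   # squared distances are 1 and 4, so the mean is 2.5
```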

3. Results

The contents of the winning units in the output layer of each of the SOFMs, generated using the ATM and Sentinel-2 data sets, were assessed and explored.

3.1. ATM Data

The data for the 50 simulated coarse spatial resolution pixels were entered into the trained SOFM and were distributed across the SOFM’s output layer. Every output unit was associated with at least one pixel (Figure 4). The class composition of the pixel(s) allocated to each output unit was assessed and the mean value over all pixels associated with each output unit is summarised in Figure 5.

3.2. Sentinel-2 MSI Data

The data for the 2,128 coarse resolution pixels were inputted into an SOFM and each allocated to its winning SOFM output unit. As with the analysis of the ATM data, all output units were associated with some of the pixels (Figure 6). Plotting the pixels belonging to each output unit in feature space showed that the SOFM output space maintained key spectral information contained in the imagery (Figure 7).

4. Discussion

4.1. ATM Data

It was evident that the three land cover classes were associated with different locations in the SOFM output layer. Indeed, output unit 16 (Figure 3) contained 6 pure pixels of the grass class. Output unit 13 contained 9 pixels strongly associated with asphalt, 8 of which were pure asphalt and 1 substantially dominated by asphalt (87% asphalt cover); the average asphalt cover for the pixels associated with the output unit was 98.5%. The trees class was found to be associated with 2 adjacent output units, unit 2 (containing only pure pixels of trees) and unit 6 (containing 2 pure pixels of trees and 4 pixels dominated by trees); the average tree cover for the unit was 87.1%. In between these specified output units that are strongly associated with a single class were units associated with mixed pixels, and these were arranged in an order determined by their thematic content.
As expected, the output of the SOFM maintained important spectral information of the imagery. Specifically, the pixels allocated to neighbouring output units were close to each other in spectral feature space (Figure 8). This is especially apparent in Figure 8b–d in which the data show a broadly triangular structure. Each apex of the latter is associated with one of the classes; it is worth noting that the pixels allocated to unit 16 (associated with grass) lie in the upper central region of the feature space while pixels allocated to output units 2 and 6 (associated with trees) lie in the bottom left and pixels allocated to output unit 13 (associated with asphalt) lie in the lower right of the feature space (Figure 8b–d). Moreover, the units lying between those associated with a single class contained mixed pixels. The data are also arranged so that the degree of spectral similarity to a class varies with distance from the units associated with the pure pixels of the classes. As an example, units 13–16 form the base of the SOFM output layer (Figure 3) and contain pixels associated with the asphalt and grass classes that are arranged in order in the spectral feature space; they lie along the right hand side of the triangular shape in Figure 8b–d. The transition from essentially pure asphalt to pure grass is explicit in the SOFM and spectral feature space. The transition from pure grass (unit 16) to pure asphalt (unit 13) was correlated with distance along the base of the SOFM output layer (Spearman rank correlation: rs = 1.0; n = 4; significant at the 95% level of confidence for a one-tailed test). The average squared Mahalanobis distance calculated for pixels associated with each output unit also showed clear trends between thematic composition and location in the SOFM output layer (Figure 9).
Again, as an example, it is evident that along the base of the SOFM output layer, the average squared Mahalanobis distance to the asphalt class increases with progress from unit 13 to unit 16. Correspondingly, the average squared Mahalanobis distance to the grass class declines from unit 13 to unit 16. This confirms that unit 13 is strongly associated with asphalt and unit 16 with grass, with the composition gradually changing from one class to another in a consistent manner depending on the location relative to units 13 and 16.
The results show that spectrally and thematically close pixels lie in the same or close SOFM output units while spectrally and thematically dissimilar pixels lie in distant SOFM output units. Critically, the SOFM provides a basis to characterise the thematic composition of the pixels, both mixed and pure. Thus, the location of the output unit associated with a pixel in the SOFM output space illustrates the pixel’s thematic composition. Moreover, the location relative to other locations in the output layer indicates relative thematic similarity.
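The significance of the Spearman rank correlation reported for units 13–16 can be checked with a short worked example. With n = 4 and no ties there are only 24 possible rankings, and just one of them gives rs = 1.0, so the exact one-tailed probability is 1/24 (about 0.042), which is below 0.05. The sketch below enumerates the rankings; the use of an exact permutation test is an assumption made here for illustration and the test actually used in the study is not stated.

```python
from itertools import permutations

def spearman_rs(x_ranks, y_ranks):
    """Spearman rank correlation for untied ranks: 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    n = len(x_ranks)
    d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

position = (1, 2, 3, 4)                      # rank of position along the base
observed = spearman_rs(position, position)   # perfectly monotonic ordering
all_rankings = list(permutations(position))
# Number of rankings at least as strongly positively correlated as observed
count = sum(spearman_rs(position, p) >= observed - 1e-12 for p in all_rankings)
p_one_tailed = count / len(all_rankings)
print(observed, count, round(p_one_tailed, 4))
```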

4.2. Sentinel-2 MSI Data

As with any unsupervised classification, some ground data or other knowledge is required to label the classes. This could, for example, be achieved by an exploratory analysis of the pixels allocated to specific output units. Critically, with some ground data it is possible to associate output units with the land cover composition of the pixels they contain. All of the output units were associated with some of the pixels (Figure 6). Furthermore, Figure 7 shows the pixels belonging to each output unit in feature space, which highlights that the SOFM output space maintained key spectral information contained in the input imagery. It was evident that pixels associated with specific output units occupied a distinct part of the feature space. Moreover, the relative closeness of clusters in feature space reflected the location of output units in the SOFM output layer. It was also evident that there was a tendency for the pixels allocated to neighbouring SOFM output units to be spectrally close while those in distant output units were spectrally dissimilar (Figure 7). Note, for example, that the locations associated with the pixels allocated to units 1 and 16, amongst the most distant in the SOFM output layer, lie in distant locations in the feature space (Figure 7). Additionally, the pixels associated with units 2 and 5, that neighbour unit 1 in the output layer, are close together in feature space (Figure 7). This result highlights the potential of the SOFM as an unsupervised classification but also one that maintains key properties of the input spectral data.
The pixels allocated to each SOFM output unit may be mapped geographically to allow for comparison against the original imagery and ground reference data (Figure 1 and Figure 2). From this, it is evident that different classes are associated with specific parts of the output space. Of particular relevance here, oil seed rape was associated strongly with output unit 16 (Figure 10). It is also worth noting that this unit contained mostly pure or nearly pure pixels of oil seed rape; many mixed pixels at the field boundaries are absent. There was also a strong tendency for units neighbouring unit 16 (units 11, 12, and 15) to lie close in feature space (Figure 11).
Two other near neighbouring units to unit 16 were found to be partially associated with oil seed rape. These were SOFM output units 7 and 14, both of which are neighbours to neighbouring units around unit 16. The pixels belonging to units 7 and 14 included some pixels of mixed composition that included oil seed rape (Figure 12). Other output units were very strongly associated with other classes. For example, unit 8 was associated with winter wheat (Figure 13), like its neighbour unit 12.
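The neighbour relationships referred to above can be made explicit with a small sketch. It assumes that the 16 units are numbered row by row in a 4 × 4 grid and that diagonal units count as immediate neighbours (Chebyshev distance); both are assumptions, although they are consistent with units 11, 12, and 15 neighbouring unit 16. The function names are hypothetical.

```python
def unit_position(label, cols=4):
    """(row, col) of a unit, 0-based, assuming row-wise numbering from 1."""
    return divmod(label - 1, cols)

def grid_distance(a, b, cols=4):
    """Chebyshev distance between two units in the output layer."""
    (ra, ca), (rb, cb) = unit_position(a, cols), unit_position(b, cols)
    return max(abs(ra - rb), abs(ca - cb))

rings = {u: grid_distance(u, 16) for u in range(1, 17)}
immediate = sorted(u for u, d in rings.items() if d == 1)
second = sorted(u for u, d in rings.items() if d == 2)
print(immediate)   # [11, 12, 15]
print(second)      # [6, 7, 8, 10, 14]
```

Under these assumptions units 7 and 14 both lie two grid steps from unit 16, that is, they are neighbours of its immediate neighbours.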
Critically, it was possible to define a set of SOFM output units that were associated with oil seed rape. These units were either immediate, next door neighbours or close neighbours to unit 16 in the SOFM output space. Output unit 16 was strongly associated with pure pixels of oil seed rape while mixed pixels, which can be useful in training, were associated with some units, notably units 7, 11, 14, and 15. It is worth noting that the pixels associated with these output units were all close in feature space (Figure 7). Only the pixels associated with such output units may be needed for training and hence the SOFM output may allow for the targeting of the most useful locations for new training pixels.
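Restricting attention to the pixels associated with a chosen set of output units is straightforward once each pixel has a winning-unit label. The sketch below is illustrative only: the label array is hypothetical, and the set of units simply repeats those named above for oil seed rape.

```python
import numpy as np

def candidate_mask(unit_labels, target_units):
    """Boolean mask of pixels whose winning unit is in `target_units`."""
    return np.isin(unit_labels, list(target_units))

# Hypothetical winning-unit labels for ten pixels
labels = np.array([1, 16, 7, 3, 15, 11, 8, 14, 16, 2])
target_units = {7, 11, 14, 15, 16}     # units associated with oil seed rape
mask = candidate_mask(labels, target_units)
candidates = np.flatnonzero(mask)      # indices of candidate training pixels
print(candidates)   # [1 2 4 5 7 8]
```

For an image, the same mask reshaped to the image dimensions would give a map of candidate locations for ground data collection.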
Finally, the analysis with a SOFM with a large 10 × 10 output layer revealed the potential to provide a detailed characterisation of the thematic content of the pixels (Figure 14). It was evident visually that the original set of clusters (Figure 7) was essentially subdivided when the large SOFM output space was used. Note, for example, that the original cluster associated with pure pixels of oil seed rape (unit 16 in Figure 7) is divided into two clusters (Figure 14); similar observations may be made for other clusters such as the cluster associated with unit 6 in Figure 7. Critically, this outcome indicates that the space occupied by a class in feature space may be usefully subdivided. This might, for example, allow a focus on specific sub-clusters of a class of interest (e.g., sub-clusters closest to another class to be discriminated) and allow sub-clusters of limited or no value (e.g., a sub-cluster at the edge of a class in feature space that does not face on to any class) to be ignored when defining potential sites to acquire additional ground data for training. Indeed, ref. [43] shows that markedly different sub-cluster areas in feature space are especially valuable for a diverse set of machine learning classifiers. This situation suggests an ability to focus training data acquisition on sites that lie in spectrally useful locations of feature space for the specific classifier selected for an application. Future work could explore the design and training of an SOFM to enhance its utility as a source of useful training sites for more complex multi-class classifications.

5. Conclusions

The SOFM is an effective unsupervised classifier. It was shown here that it allows spectrally similar pixels to be clustered together but also provides information on the relative similarity of clusters defined on the basis of the location of the SOFM output units they are associated with. Critically, a SOFM can provide information on the thematic composition of image pixels, both pure and mixed. In brief, spectrally similar pixels are associated with output units that are close together while dissimilar pixels lie in distant units. Sub-regions of the SOFM output space are associated with specific classes, and pixels of mixed composition may lie between such regions. With a very large SOFM output layer, a class may be associated with a set of output units that are close in the output layer, and which may represent spectral sub-classes. Pixels associated with specific output units may be of particular value. For example, an output unit may be associated with pixels of a class that lie close to pixels of another class in feature space and hence be potentially useful training pixels for a margin-based classifier such as an SVM. As such, the use of a SOFM focusses on the potentially most useful pixels thematically and within a limited geographic scope as the pixels associated with other output units may be of no interest.
The ability to cluster similar pixels and indicate the relative similarity of clusters may be helpful in using a SOFM to help train a supervised classification. Characterising the class composition of pixels associated with particular SOFM output units enables efforts to acquire ground data to focus on potentially useful training sites. For an application focused on mapping a specific class of interest, it was shown that the class may be associated strongly with just a small part of the SOFM output space. Pixels associated with these units in the imagery may then be the target of ground data collection activities, with the less useful parts of the image ignored.
The potential of a SOFM analysis to characterise the thematic composition of pixels was demonstrated. Notably, it was stressed that a SOFM provides an enhanced output relative to other popular unsupervised classifications in that information on the similarity of the clusters is provided. While it is possible as a post-classification analysis to calculate variables such as inter-cluster distances with standard unsupervised classifications or to have some information on cluster similarity from hierarchical clustering algorithms, the similarity of clusters is an explicit part of the SOFM output. It is also stressed that the SOFM output may be used to enhance supervised classifications by aiding the focus on the most useful parts of feature space to select useful training pixels.


This research received no external funding.

Data Availability Statement

The Sentinel-2 MSI image is freely available through the Copernicus Data Space Ecosystem (, checked 30 December 2024). The crop map was the UKCEH Land Cover ® Plus Crops © 2018 UKCEH © RSAC © Crown Copyright 2007. Licence number 100017572. This map was accessed through Digimap Collections to support academic research. The ATM data were acquired by the Natural Environment Research Council and may be available from them on request.


I am grateful to the NERC, UKCEH, and Digimap Collections for the data used. I am also grateful to the two referees and the editor for their constructive comments on the manuscript.

Conflicts of Interest

The author declares no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.


  1. Dell, R.L.; Banwell, A.F.; Willis, I.C.; Arnold, N.S.; Halberstadt, A.R.; Chudley, T.R.; Pritchard, H.D. Supervised classification of slush and ponded water on Antarctic ice shelves using Landsat 8 imagery. J. Glaciol. 2022, 68, 401–414.
  2. Mather, P.M.; Koch, M. Computer Processing of Remotely-Sensed Images: An Introduction; John Wiley & Sons: Chichester, UK, 2011.
  3. Kavzoglu, T.; Tso, B.; Mather, P. Classification Methods for Remotely Sensed Data, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2024.
  4. Mumby, P.J.; Green, E.P.; Edwards, A.J.; Clark, C.D. The cost-effectiveness of remote sensing for tropical coastal resources assessment and management. J. Environ. Manag. 1999, 55, 157–166.
  5. Zhang, B.; Zhang, Y.; Li, Y.; Wan, Y.; Guo, H.; Zheng, Z.; Yang, K. Semi-supervised deep learning via transformation consistency regularization for remote sensing image semantic segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 16, 5782–5796.
  6. Rawat, S.; Saini, R. Evaluating the impact of sampling designs on the performance of machine learning techniques for land use land cover classification using Sentinel-2 data. Int. J. Remote Sens. 2023, 44, 7889–7908.
  7. Hosseiny, B.; Mahdianpari, M.; Hemati, M.; Radman, A.; Mohammadimanesh, F.; Chanussot, J. Beyond supervised learning in remote sensing: Systematic review of deep learning approaches. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2023, 17, 1035–1052.
  8. Elmes, A.; Alemohammad, H.; Avery, R.; Caylor, K.; Eastman, J.R.; Fishgold, L.; Friedl, M.A.; Jain, M.; Kohli, D.; Laso Bayas, J.C.; et al. Accounting for training data error in machine learning applied to Earth observations. Remote Sens. 2020, 12, 1034.
  9. Kattenborn, T.; Leitloff, J.; Schiefer, F.; Hinz, S. Review on convolutional neural networks (CNN) in vegetation remote sensing. ISPRS J. Photogramm. Remote Sens. 2021, 173, 24–49.
  10. Ramezan, C.; Warner, T.A.; Maxwell, A.E. Evaluation of sampling and cross-validation tuning strategies for regional-scale machine learning classification. Remote Sens. 2019, 11, 185.
  11. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57.
  12. Pengra, B.; Long, J.; Dahal, D.; Stehman, S.V.; Loveland, T.R. A global reference database from very high resolution commercial satellite data and methodology for application to Landsat derived 30 m continuous field tree cover data. Remote Sens. Environ. 2015, 165, 234–248.
  13. Li, C.; Gong, P.; Wang, J.; Yuan, C.; Hu, T.; Wang, Q.; Yu, L.; Clinton, N.; Li, M.; Guo, J.; et al. An all-season sample database for improving land-cover mapping of Africa with two classification schemes. Int. J. Remote Sens. 2016, 37, 4623–4647.
  14. Tuia, D.; Volpi, M.; Copa, L.; Kanevski, M.; Munoz-Mari, J. A survey of active learning algorithms for supervised remote sensing image classification. IEEE J. Sel. Top. Signal Process. 2011, 5, 606–617.
  15. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31.
  16. Ramezan, C.A.; Warner, T.A.; Maxwell, A.E.; Price, B.S. Effects of training set size on supervised machine-learning land-cover classification of large-area high-resolution remotely sensed data. Remote Sens. 2021, 13, 368.
  17. Weitkamp, T.; Karimi, P. Evaluating the effect of training data size and composition on the accuracy of smallholder irrigated agriculture mapping in Mozambique using remote sensing and machine learning algorithms. Remote Sens. 2023, 15, 3017.
  18. Cracknell, M.J.; Reading, A.M. Geological mapping using remote sensing data: A comparison of five machine learning algorithms, their response to variations in the spatial distribution of training data and the use of explicit spatial information. Comput. Geosci. 2014, 63, 22–33.
  19. Li, C.; Wang, J.; Wang, L.; Hu, L.; Gong, P. Comparison of classification algorithms and training sample sizes in urban land classification with Landsat thematic mapper imagery. Remote Sens. 2014, 6, 964–983.
  20. Hoffbeck, J.P.; Landgrebe, D.A. Covariance matrix estimation and classification with limited training data. IEEE Trans. Pattern Anal. Mach. Intell. 1996, 18, 763–767.
  21. Chi, M.; Feng, R.; Bruzzone, L. Classification of hyperspectral remote-sensing data with primal SVM for small-sized training dataset problem. Adv. Space Res. 2008, 41, 1793–1799.
  22. Kavzoglu, T. Increasing the accuracy of neural network classification using refined training data. Environ. Model. Softw. 2009, 24, 850–858.
  23. Millard, K.; Richardson, M. On the importance of training data sample selection in random forest image classification: A case study in peatland ecosystem mapping. Remote Sens. 2015, 7, 8489–8515.
  24. Huang, Z.; Pan, Z.; Lei, B. Transfer learning with deep convolutional neural network for SAR target classification with limited labeled data. Remote Sens. 2017, 9, 907.
  25. Lin, Z.; Ji, K.; Kang, M.; Leng, X.; Zou, H. Deep convolutional highway unit network for SAR target classification with limited labeled training data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1091–1095.
  26. Zhang, L.; Zhang, L. Artificial intelligence for remote sensing data analysis: A review of challenges and opportunities. IEEE Geosci. Remote Sens. Mag. 2022, 10, 270–294.
  27. Fu, Y.; Shen, R.; Song, C.; Dong, J.; Han, W.; Ye, T.; Yuan, W. Exploring the effects of training samples on the accuracy of crop mapping with machine learning algorithm. Sci. Remote Sens. 2023, 7, 100081.
  28. Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924.
  29. Gong, P.; Wang, J.; Huang, H. Stable classification with limited samples in global land cover mapping: Theory and experiments. Sci. Bull. 2024, 69, 1862–1865.
  30. Su, M.; Guo, R.; Chen, B.; Hong, W.; Wang, J.; Feng, Y.; Xu, B. Sampling strategy for detailed urban land use classification: A systematic analysis in Shenzhen. Remote Sens. 2020, 12, 1497.
  31. Gong, P.; Liu, H.; Zhang, M.; Li, C.; Wang, J.; Huang, H.; Clinton, N.; Ji, L.; Li, W.; Bai, Y.; et al. Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 2019, 64, 370–373.
  32. Stanimirova, R.; Tarrio, K.; Turlej, K.; McAvoy, K.; Stonebrook, S.; Hu, K.T.; Arévalo, P.; Bullock, E.L.; Zhang, Y.; Woodcock, C.E.; et al. A global land cover training dataset from 1984 to 2020. Sci. Data 2023, 10, 879.
  33. Foody, G.M.; Mathur, A. Toward intelligent training of supervised image classifications: Directing training data acquisition for SVM classification. Remote Sens. Environ. 2004, 93, 107–117.
  34. Plaza, J.; Plaza, A.; Perez, R.; Martinez, P. On the use of small training sets for neural network-based characterization of mixed pixels in remotely sensed hyperspectral images. Pattern Recognit. 2009, 42, 3032–3045.
  35. Shao, Y.; Lunetta, R.S. Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS J. Photogramm. Remote Sens. 2012, 70, 78–87.
  36. Ertekin, S.; Huang, J.; Bottou, L.; Giles, L. Learning on the border: Active learning in imbalanced data classification. In Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management, Lisbon, Portugal, 6–10 November 2007; Association for Computing Machinery: New York, NY, USA, 2007; pp. 127–136.
  37. Mellor, A.; Boukir, S.; Haywood, A.; Jones, S. Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin. ISPRS J. Photogramm. Remote Sens. 2015, 105, 155–168.
  38. Hu, S.; Liang, Y.; Ma, L.; He, Y. MSMOTE: Improving classification performance when training data is imbalanced. In Proceedings of the 2009 Second International Workshop on Computer Science and Engineering, Qingdao, China, 28–30 October 2009; IEEE: Piscataway, NJ, USA, 2009; Volume 2, pp. 13–17.
  39. Scott, G.J.; England, M.R.; Starms, W.A.; Marcum, R.A.; Davis, C.H. Training deep convolutional neural networks for land–cover classification of high-resolution imagery. IEEE Geosci. Remote Sens. Lett. 2017, 14, 549–553.
  40. Seiffert, C.; Khoshgoftaar, T.M.; Van Hulse, J.; Napolitano, A. RUSBoost: Improving classification performance when training data is skewed. In Proceedings of the 2008 19th International Conference on Pattern Recognition, Tampa, FL, USA, 8–11 December 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1–4.
  41. Zhu, Z.; Gallant, A.L.; Woodcock, C.E.; Pengra, B.; Olofsson, P.; Loveland, T.R.; Jin, S.; Dahal, D.; Yang, L.; Auch, R.F. Optimizing selection of training and auxiliary data for operational land cover classification for the LCMAP initiative. ISPRS J. Photogramm. Remote Sens. 2016, 122, 206–221.
  42. Waldner, F.; Jacques, D.C.; Löw, F. The impact of training class proportions on binary cropland classification. Remote Sens. Lett. 2017, 8, 1122–1131.
  43. Pal, M.; Foody, G.M. Evaluation of SVM, RVM and SMLR for accurate image classification with limited ground data. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2012, 5, 1344–1355.
  44. Demir, B.; Erturk, S. Clustering-based extraction of border training patterns for accurate SVM classification of hyperspectral images. IEEE Geosci. Remote Sens. Lett. 2009, 6, 840–844.
  45. Foody, G.M.; Mathur, A. The use of small training sets containing mixed pixels for accurate hard image classification: Training on mixed spectral responses for classification by a SVM. Remote Sens. Environ. 2006, 103, 179–189.
  46. Hansen, M.C. Classification trees and mixed pixel training data. In Remote Sensing of Land Use and Land Cover: Principles and Applications; Giri, C.P., Ed.; CRC Press: Boca Raton, FL, USA, 2012; pp. 127–136.
  47. Fowler, J.; Waldner, F.; Hochman, Z. All pixels are useful, but some are more useful: Efficient in situ data collection for crop-type mapping using sequential exploration methods. Int. J. Appl. Earth Obs. Geoinf. 2020, 91, 102114.
  48. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
  49. Roli, F.; Fumera, G. Support vector machines for remote sensing image classification. In Proceedings of SPIE 4170, Image and Signal Processing for Remote Sensing VI; SPIE: Bellingham, WA, USA, 2001; pp. 160–166.
  50. Muñoz-Marí, J.; Bruzzone, L.; Camps-Valls, G. A support vector domain description approach to supervised classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2007, 45, 2683–2692.
  51. Foody, G.M.; Mathur, A.; Sanchez-Hernandez, C.; Boyd, D.S. Training set size requirements for the classification of a specific class. Remote Sens. Environ. 2006, 104, 1–4.
  52. Mantero, P.; Moser, G.; Serpico, S.B. Partially supervised classification of remote sensing images through SVM-based probability density estimation. IEEE Trans. Geosci. Remote Sens. 2005, 43, 559–570.
  53. Powell, R.L.; Matzke, N.; de Souza, C., Jr.; Clark, M.; Numata, I.; Hess, L.L.; Roberts, D.A. Sources of error in accuracy assessment of thematic land-cover maps in the Brazilian Amazon. Remote Sens. Environ. 2004, 90, 221–234.
  54. Pengra, B.W.; Stehman, S.V.; Horton, J.A.; Dockter, D.J.; Schroeder, T.A.; Yang, Z.; Cohen, W.B.; Healey, S.P.; Loveland, T.R. Quality control and assessment of interpreter consistency of annual land cover reference data in an operational national monitoring program. Remote Sens. Environ. 2020, 238, 111261.
  55. Foody, G.M.; See, L.; Fritz, S.; Van der Velde, M.; Perger, C.; Schill, C.; Boyd, D.S.; Comber, A. Accurate attribute mapping from volunteered geographic information: Issues of volunteer quantity and quality. Cartogr. J. 2015, 52, 336–344.
  56. Wang, Y.; Li, C.; Liu, X.; Li, H.; Yao, Z.; Zhao, Y. How well do the volunteers label land cover types in manual interpretation of remote sensing imagery? Int. J. Digit. Earth 2024, 17, 2347443.
  57. Foody, G.M.; Pal, M.; Rocchini, D.; Garzon-Lopez, C.X.; Bastin, L. The sensitivity of mapping methods to reference data quality: Training supervised image classifications with imperfect reference data. ISPRS Int. J. Geo-Inf. 2016, 5, 199.
  58. Frank, J.; Rebbapragada, U.; Bialas, J.; Oommen, T.; Havens, T.C. Effect of label noise on the machine-learned classification of earthquake damage. Remote Sens. 2017, 9, 803.
  59. Arai, K. A supervised Thematic Mapper classification with a purification of training samples. Int. J. Remote Sens. 1992, 13, 2039–2049.
  60. Yang, X.; Song, Q.; Cao, A. Weighted support vector machine for data classification. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; IEEE: Piscataway, NJ, USA, 2015; Volume 2, pp. 859–864.
  61. Radoux, J.; Lamarche, C.; Van Bogaert, E.; Bontemps, S.; Brockmann, C.; Defourny, P. Automated training sample extraction for global land cover mapping. Remote Sens. 2014, 6, 3965–3987.
  62. Dutta, S.; Das, M. Remote sensing scene classification under scarcity of labelled samples—A survey of the state-of-the-arts. Comput. Geosci. 2023, 171, 105295.
  63. Persello, C.; Bruzzone, L. Active and semisupervised learning for the classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6937–6956.
  64. Kwak, G.H.; Park, N.W. Unsupervised domain adaptation with adversarial self-training for crop classification using remote sensing images. Remote Sens. 2022, 14, 4639.
  65. Nunnari, G.; Calvari, S. Exploring convolutional neural networks for the thermal image classification of volcanic activity. Geomatics 2024, 4, 124–137.
  66. Miao, W.; Geng, J.; Jiang, W. Semi-supervised remote-sensing image scene classification using representation consistency siamese network. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–14.
  67. Rajan, S.; Ghosh, J.; Crawford, M.M. An active learning approach to hyperspectral data classification. IEEE Trans. Geosci. Remote Sens. 2008, 46, 1231–1242.
  68. Demir, B.; Minello, L.; Bruzzone, L. Definition of effective training sets for supervised classification of remote sensing images by a novel cost-sensitive active learning method. IEEE Trans. Geosci. Remote Sens. 2013, 52, 1272–1284. [Google Scholar] [CrossRef]
  69. Pires de Lima, R.; Marfurt, K. Convolutional neural network for remote-sensing scene classification: Transfer learning analysis. Remote Sens. 2019, 12, 86. [Google Scholar] [CrossRef]
  70. Paris, C.; Bruzzone, L. A novel approach to the unsupervised extraction of reliable training samples from thematic products. IEEE Trans. Geosci. Remote Sens. 2021, 59, 1930–1948. [Google Scholar] [CrossRef]
  71. Hermosilla, T.; Wulder, M.A.; White, J.C.; Coops, N.C. Land cover classification in an era of big and open data: Optimizing localized implementation and training data selection to improve mapping outcomes. Remote Sens. Environ. 2022, 268, 112780. [Google Scholar] [CrossRef]
  72. Bratic, G.; Yordanov, V.; Brovelli, M.A. High-resolution land cover classification: Cost-effective approach for extraction of reliable training data from existing land cover datasets. Int. J. Digit. Earth 2023, 16, 3618–3636. [Google Scholar] [CrossRef]
  73. Han, S.; Lee, J. Parallelized inter-image k-means clustering algorithm for unsupervised classification of series of satellite images. Remote Sens. 2023, 16, 102. [Google Scholar] [CrossRef]
  74. Lin, C.-T.; Lee, Y.-N.; Pu, H.-C. Satellite sensor image classification using cascaded architecture of neural fuzzy network. IEEE Trans. Geosci. Remote Sens. 2000, 38, 1033–1043. [Google Scholar]
  75. Goncalves, M.L.; Netto, M.L.; Costa, J.A.; Zullo Junior, J. An unsupervised method of classifying remotely sensed images using Kohonen self-organizing maps and agglomerative hierarchical clustering methods. Int. J. Remote Sens. 2008, 29, 3171–3207. [Google Scholar] [CrossRef]
  76. Foody, G.M. Relating the land-cover composition of mixed pixels to artificial neural network classification output. Photogramm. Eng. Remote Sens. 1996, 62, 491–498. [Google Scholar]
  77. Ji, C.Y. Land-use classification of remotely sensed data using Kohonen self-organizing feature map neural networks. Photogramm. Eng. Remote Sens. 2000, 66, 1451–1460. [Google Scholar]
  78. Kohonen, T. Self-organized formation of topologically correct feature maps. Biol. Cybern. 1982, 43, 59–69. [Google Scholar] [CrossRef]
  79. Kohonen, T. The self-organizing map. Proc. IEEE 1990, 78, 1464–1480. [Google Scholar] [CrossRef]
  80. Villmann, T.; Der, R.; Herrmann, M.; Martinetz, T.M. Topology preservation in self-organizing feature maps: Exact definition and measurement. IEEE Trans. Neural Netw. 1997, 8, 256–266. [Google Scholar] [CrossRef]
  81. Chang, D.H.; Islam, S. Estimation of soil physical properties using remote sensing and artificial neural network. Remote Sens. Environ. 2000, 74, 534–544. [Google Scholar] [CrossRef]
  82. Sharna, P.; Mutreja, U. Analysis of satellite images using artificial neural network. Int. J. Soft Comput. Eng. 2013, 2, 276–278. [Google Scholar]
  83. Lek, S.; Guégan, J.F. Artificial neural networks as a tool in ecological modelling, an introduction. Ecol. Model. 1999, 120, 65–73. [Google Scholar] [CrossRef]
  84. Ienne, P.; Thiran, P.; Vassilas, N. Modified self-organizing feature map algorithms for efficient digital hardware implementation. IEEE Trans. Neural Netw. 1997, 8, 315–330. [Google Scholar] [CrossRef] [PubMed]
Figure 1. A map of the test site, indicated by the black dashed box, used in the analyses of the Sentinel-2 data. This map also represents the ground reference data used in the analyses of the Sentinel-2 data. The map is based on UKCEH Land Cover ® Plus: Crops © 2018 UKCEH and acquired via the Digimap service.
Figure 2. Sentinel-2 MSI imagery. (a) True colour composite, with fields planted with oil seed rape showing in bright light green; (b) band 8 image, with oil seed rape fields shown in a very light tone; and (c) coarse spatial resolution band 8 image, as a guide to the data used in the SOFM.
Figure 3. A SOFM. The network shown is that used with the ATM data set, which had 3 input units (triangles) and an output layer containing 16 output units (circles) arranged in a square. The weighted connections between the input and output units are shown with solid black lines, while the connections between the output units are shown with dashed red lines. Each output unit is identified by its position in the layer, given by the numerical identifier in red.
Figure 4. The number of pixels allocated to each of the SOFM output units for the ATM data set; the layout of the output layer matches that shown in Figure 3.
Figure 5. Class composition of the pixels in the SOFM output units. Each pie chart shows the average composition of the pixels associated with the specific output unit. The classes are trees in blue, grass in orange, and asphalt in grey. The layout of the output layer matches that shown in Figure 3.
Figure 6. The number of pixels allocated to each of the SOFM output units for the Sentinel-2 MSI data set; the layout of the output layer matches that shown in Figure 3.
Figure 7. The locations of the cases allocated to each SOFM output unit in the input spectral feature space.
Figure 8. Locations of cases allocated to all 16 SOFM output units in the ATM data feature space. (a) 2D feature space defined by b1 and b2, (b) 2D feature space defined by b1 and b3, (c) 2D feature space defined by b2 and b3, and (d) 3D feature space defined by b1, b2 and b3.
Figure 9. The average squared Mahalanobis distance for the pixels associated with each SOFM unit to each class. (a) Trees, (b) grass, and (c) asphalt.
Figure 10. Cases associated with SOFM output unit 16 (shown in cyan) overlaid on original resolution image acquired in band 8 (i.e., Figure 2b).
Figure 11. Cases belonging to units neighbouring unit 16 overlain on original resolution image in band 8. (a) Unit 11, (b) unit 12, and (c) unit 15.
Figure 12. Cases belonging to units that are near neighbours of unit 16 overlain on original resolution image in band 8. (a) Unit 7 and (b) unit 14.
Figure 13. Cases belonging to unit 8 overlain on original resolution image in band 8.
Figure 14. The locations of the cases allocated to each output unit in the SOFM with a 10 × 10 unit output layer. A legend is not shown as there are 100 classes and the central aim is to compare them visually against Figure 7.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Foody, G.M. Characterising the Thematic Content of Image Pixels with Topologically Structured Clustering. Remote Sens. 2025, 17, 130.