Dimensionality Reduction and Anomaly Detection Based on Kittler’s Taxonomy: Analyzing Water Bodies in Two Dimensional Spaces

Marinho, Giovanna Carreira; Júnior, Wilson Estécio Marcílio; Dias, Mauricio Araujo; Eler, Danilo Medeiros; Negri, Rogério Galante; Casaca, Wallace

doi:10.3390/rs15164085


by Giovanna Carreira Marinho ¹, Wilson Estécio Marcílio Júnior ¹, Mauricio Araujo Dias ¹,*, Danilo Medeiros Eler ¹, Rogério Galante Negri ² and Wallace Casaca ³

¹ Department of Mathematics and Computer Science, Faculty of Sciences and Technology, Campus Presidente Prudente, São Paulo State University (UNESP), Sao Paulo 19060-900, Brazil

² Department of Environmental Engineering, Institute of Sciences and Technology, Campus São José dos Campos, São Paulo State University (UNESP), Sao Paulo 12247-004, Brazil

³ Department of Computer Science and Statistics, Institute of Biosciences, Letters and Exact Sciences, Campus São José do Rio Preto, São Paulo State University (UNESP), Sao Paulo 15054-000, Brazil

* Author to whom correspondence should be addressed.

Remote Sens. 2023, 15(16), 4085; https://doi.org/10.3390/rs15164085

Submission received: 26 June 2023 / Revised: 14 August 2023 / Accepted: 18 August 2023 / Published: 19 August 2023

(This article belongs to the Special Issue Remote Sensing for Surface Water Monitoring)


Abstract:

Dimensionality reduction is one of the most widely used data transformations and plays a critical role in preserving meaningful properties while transforming data from high- to low-dimensional spaces. Previous studies, e.g., on image analysis, comparing data from these two spaces have found that, in general, studies related to anomaly detection achieve the same or similar results in both spaces. However, no study has compared these spaces with respect to the anomaly detection strategy based on Kittler’s Taxonomy (ADS-KT). This study investigates the differences between the two spaces when dimensionality reduction is combined with the ADS-KT in the analysis of a satellite image. Our methodology starts by applying the pre-processing phase of the ADS-KT to create the high-dimensional space. Next, a dimensionality reduction technique generates the low-dimensional space. Then, we analyze features extracted from both spaces using visualizations. Finally, machine-learning approaches, in accordance with the ADS-KT, produce results for both spaces. In the results section, the metrics assessing the transformed data present values close to zero, in contrast with those of the high-dimensional space. Therefore, we conclude that dimensionality reduction directly impacts the application of the ADS-KT. Future work should investigate whether dimensionality reduction impacts the ADS-KT for any set of attributes.

Keywords:

dimensionality reduction; remote sensing; Kittler’s taxonomy; anomaly detection; machine learning; image analysis

1. Introduction

Dimensionality reduction techniques are essential for transforming data from a high-dimensional space (e.g., multi-spectral images) to a low-dimensional space without losing relevant properties of the data. This transformation is feasible because these techniques reduce redundancies while preserving most of the attributes of the data. The transformations and combinations of the data space performed by these techniques are of great importance in data processing and are used in a wide range of fields and case studies, such as water resources management [1], medicine [2], and route selection [3]. The main advantages of dimensionality reduction are widely described in the scientific literature, for example, in various studies involving digital image analysis [4,5,6,7,8,9,10,11,12,13,14,15]. These studies show that similar results can be obtained by analyzing data in either high or low dimensions.

An analysis of the separability of classes and the performance of anomaly detection was done by [4]. In this study, the impact of dimensionality reduction applied at the beginning of the analysis, i.e., during the signal acquisition process, was investigated. The performance of dimensionality reduction based on random projection and the Compressive-Projection Principal Component Analysis (CPPCA) method was evaluated by analyzing the results of classification and anomaly detection. The classifications obtained on the datasets reconstructed from this last strategy were quite similar to those obtained on the original data (correlation coefficient above 0.9 for each component).

Another study aimed to improve the accuracy of classification through two strategies and was conducted by [5]. The Maximum Margin Criterion (MMC) was used for feature extraction and a correlation-based method was used to generate a subset of features. Next, two strategies were defined to exhaustively combine these extracted and selected features. In the first instance, the amount of features extracted could be different from the amount of features selected, i.e., sets of different cardinalities were considered. In the second strategy, both sets had the same cardinality. Finally, from these combined sets of features, obtained in the two strategies, the algorithms Sequential Backward Selection (SBS) and ReliefF were used independently to eliminate redundancies and less relevant information, i.e., to perform dimensionality reduction. The results obtained from three sets of hyperspectral image data showed that the accuracy obtained was higher in the second strategy. The use of the SBS algorithm showed a higher consumption of time when the cardinality of the combined feature set was higher and did not always provide a reduced and optimized set of features.

Two methods of dimensionality reduction were analyzed in the study published by [6]. This work applies the Tucker decomposition to decompose (or compress) the spectral information of a hyperspectral image. With this, feature extraction is applied in this spectral space in order to obtain a set of features that can be used for classification. Next, the authors applied the PCA method to compare the obtained features and performed classifications from these features. Analyzing the values of the metrics applied to the results obtained, the first transformation allowed the authors to obtain data closer to the original data, while PCA presented a lower accuracy value in a shorter time.

A review of dimensionality reduction techniques comparing the results obtained by the Support Vector Machines classifier was conducted in the study published in [7]. The authors used linear feature extraction techniques—PCA and Tensor Locality Preserving Projection (TLPP)—and non-linear techniques—Kernel Principal Component Analysis (KPCA) and Laplacian Eigenmaps (LE)—in addition to supervised (MI and DM) and unsupervised band selection methods—Constrained Band Selection (CBS) and Propagation Affinity (PA). Combinations of techniques and analysis of performance in classification were made from hyperspectral images. The results show that TLPP offered better results in the classification. In addition, the joint use of TLPP with CBS improved the accuracy of the results.

An investigation into how the dimensionality reduction methods PCA and Independent Component Analysis (ICA) affect classification accuracy was conducted by [8]. Aerial images originally represented by high-dimensional descriptors (Gabor and Dist) were considered for the classification tests. Dimensionality reduction techniques were applied to the full descriptors and those calculated for each spectral component. Analyzing the performance, the classification accuracy was close to 90%, even from descriptors with smaller dimensions. The authors concluded that, through the use of descriptors and dimensionality reduction techniques, it is possible to obtain advanced results of aerial image classification.

Five methods of dimensionality reduction were compared based on the results obtained by the K-means clustering algorithm (unsupervised), in the study published by [9]. These methods were applied to ten multispectral images from the Landsat satellite and the results showed that some more advanced non-linear methods found a better clustering; however, PCA, a linear method, was responsible for presenting a low computational cost and high performance.

A comparison framework for feature extraction without the need for labels was proposed in [10]. The tests performed compared the Fast Fourier Transform (FFT) and PCA methods applied to some remote sensing images. The results (composed of a reduced feature space obtained by the framework) indicate that the last technique (PCA) outperformed the first (FFT). These conclusions are due to the fact that PCA performed a greater separation of the two classes involved in the problem and this technique produced a space that has more than 15% separability compared to FFT.

The application of dimensionality reduction through the PCA method and the monitoring of vegetation through the Normalized Difference Vegetation Index (NDVI) in satellite images were the topics of the study conducted by Navin et al. [11]. Both the application and the monitoring refer to the same area of study. In this study, PCA was used to assist in the image analysis. Moreover, the NDVI index, to help in the monitoring of local vegetation, was calculated from satellite images. The validation of the image generated through the NDVI Index was performed by comparing this image with the same region of an image from Google Earth. This comparison indicated a similarity between the two images. This work allowed for the analysis of the monitoring of the study area and the management of terrestrial resources.

The correlation between hyperspectral images and dimensionality reduction was analyzed in [12]. By applying a Minimum Noise Fraction Rotation (MNF) method to a certain study area, the authors observed an improvement in image quality in terms of both noise and redundancy reductions between features. The authors claim that after the transformation, the processed data improved the accuracy of some activities, such as classification.

The cited studies have analyzed the impact of consolidated dimensionality reduction techniques on image analysis activities. However, other studies have proposed methods different from those previously available in the literature [13,14,15]. A semi-supervised dimensionality reduction method based on the concept of sparse representation was proposed in [13]. This method preserves the intrinsic geometry information and discriminative information of the samples. The samples selected in this study are distinguished based on a sparse coefficient, which is higher when the samples and the reference entities are of the same class and lower otherwise. This coefficient is used in the construction of a graph, which is an important data structure. In addition, a regularization term is incorporated into this structure for dimensionality reduction. This method makes use of labeled and unlabeled samples to improve the model performance. In addition, the Linear Discriminant Analysis (LDA) method was incorporated for data classification. The LDA was compared to existing methods based on the classification of hyperspectral images and, from the results, it was possible to conclude that the method achieved better results.

Two other dimensionality reduction methods for hyperspectral images, the Discriminative Graph-based Dimensionality Reduction (DGDR) and its extension Multi-Scale (MS-DGDR), were presented in [14]. With regard to these methods, a projection function is sought in order to achieve the following objectives: the minimization of a similarity term (which has the intra-class dispersion relationship) and the maximization of a dissimilarity term (which has the inter-class distance relationship). The results obtained on real data show that both methods provide improvements in classification accuracy, compared to other existing methods.

A dimensionality reduction method called Class Predabillity Semi-supervised DR (CPSDR) is described in [15]. The CPSDR algorithm is based on the Local Fisher Discriminant Analysis (LFDA). Unlike other semi-supervised methods, which focus only on a small amount of labeled data and depend on local geometry information, CPSDR focuses on unlabeled samples and explores class structure information in addition to local geometry information. Through this information, the scatter matrix is more discriminatory, and, in terms of implementation, the problem is formulated as an optimization and is solved through eigenvalue decomposition. The results obtained from two sets of image data suggest that the proposed method presents a more discriminative intra- and inter-class scatter matrix, achieving an efficient classification performance and resulting in an effective semi-supervised method.

The aforementioned studies allow us to understand that when the images to be analyzed are multi-spectral, which is very common, for example, for remote sensing images, they can contain a great deal of redundant data between their bands. Dimensionality reduction reduces the redundancies of these images and increases computational efficiency. Consequently, a posterior analysis of these images, for example for anomaly detection, can be performed on a smaller amount of data. In this way, dimensionality reduction contributes to the subsequent step, e.g., image analysis, requiring fewer computational resources. However, the consequences of the use of dimensionality reduction and certain image analysis strategies, e.g., anomaly detection based on Kittler’s Taxonomy [16], remain unknown. Therefore, new studies are necessary to discover the consequences of using dimensionality reduction and the anomaly detection strategy based on Kittler’s Taxonomy (ADS-KT) applied together to remote sensing images.

Studies regarding how dimensionality reduction impacts anomaly detection based on Kittler’s Taxonomy are of great interest within the field of machine learning. The reason for this interest lies in the fact that anomaly detection based on Kittler’s Taxonomy is an innovative and promising pattern recognition strategy, which can be used to help machines recognize the occurrence of problems, e.g., environmental disasters, in different contexts present in remote sensing images. The ADS-KT was applied for the recognition of problems monitored by remote sensing for the first time in [17]. The study presents an approach to decision-making systems, based on which machines would be closer to recognizing contexts in which the problems are embedded. The strategy uses tools to detect anomalies and inconsistencies related to water pollution resulting from environmental disasters. Unfortunately, only publications of the application of this strategy in high-dimensional data are found in the scientific literature. The results of its application on low-dimensional data remain unknown.

In relation to the research question, we hypothesized that if transformations in the original data space (e.g., reducing the attribute values) impact the ADS-KT even when the dimensions are maintained, then dimensionality reduction also impacts the same strategy, since it both transforms and reduces the original data space. Therefore, this study aims to analyze the behavior of the ADS-KT, presented by Dias et al. [17], when applied in a low-dimensional space obtained from a dimensionality reduction technique, or in a transformed data space. This objective is important to help us understand if and how dimensionality reduction impacts Kittler’s Taxonomy-based anomaly detection when it is applied to a remote sensing image, especially when an environmental disaster is used as a case study.

In order to achieve this goal, this study initially applied the pre-processing phase of the ADS-KT [17] to a satellite image that records the consequences of an environmental disaster. Once the pre-processing result of the strategy was obtained, i.e., the first result, a dimensionality reduction method was applied to generate data in a low-dimensional space, i.e., the second result. Next, features were extracted from the two results in order to analyze some behaviors from some visualizations. Finally, the training, validation, and testing approaches of the classifiers were performed for the two results—the result obtained only by pre-processing (high dimension) and the result obtained by the dimensionality reduction algorithm (low dimension). From the results of the classifications and Kittler’s taxonomy, it was observed that the use of low dimension (fewer features) showed the occurrence of a different anomaly category than that found in the classification made from a high dimension. This discovery reinforces the importance of using taxonomy to classify the different behaviors that occur when working with machine learning.

The main contribution of this study is to show researchers in the field of Remote Sensing that investigations regarding the occurrence of anomalies of the type Unexpected structure and structural components should take into account high-dimensional spaces instead of low-dimensional ones. While studies related to digital image analysis, such as [4,5,6,7,8,9,10,11,12,13,14,15], allow researchers to sustain the idea that data analysis in high dimension generates results similar to those analyzed in low dimension, this study contributes to show that there is at least one exception to this rule, when digital image analysis involves anomaly detection based on Kittler’s Taxonomy.

The innovation of this study is that, to the authors’ knowledge, it is the first to describe the occurrence of an anomaly of the type Unexpected Structural Component (see [16]) in research in the field of Remote Sensing. This innovation is important because anomaly detection based on Kittler’s Taxonomy has proven to be a promising tool in regards to machine learning applied to remote sensing. In this way, this study joins other previously published ones, such as [17,18], which also innovated by describing research on the use of Kittler’s Taxonomy in the field of Remote Sensing.

2. Materials and Methods

2.1. Study Area

In order to verify the proposed methodology, data from the Mariana region (Minas Gerais, Brazil) were selected. This region is important for this study, because an environmental disaster involving the rupture of a dam occurred there, causing a discharge of mining waste into the Doce River. Figure 1 presents information related to the study area.

2.2. Materials

Data were selected from the Earth Explorer platform, provided by the United States Geological Survey (USGS) [19], using the following information: UTM zone 23, coordinates 20°13′48.07″S and 42°43′47.24″W, and date 12 November 2015 (scene identifier LC08_L1TP_217074_20151112_20170402_01_T1). These data consist of high-resolution images (with value 9 as the image quality) acquired by the Operational Land Imager (OLI) sensor of the Landsat 8 satellite. The scene, with a size of 15,705 × 15,440 pixels, was obtained from the USGS catalog at “Level L1TP”, ensuring an accurate representation in radiometric and geometric terms. From this scene, eight spectral bands were selected, with the following wavelengths (in micrometers) and spatial resolutions (in meters): Band 1: coastal aerosol, 0.43–0.45 µm, 30 m; Band 2: blue, 0.45–0.51 µm, 30 m; Band 3: green, 0.53–0.59 µm, 30 m; Band 4: red, 0.64–0.67 µm, 30 m; Band 5: near infrared (NIR), 0.85–0.88 µm, 30 m; Band 6: SWIR 1, 1.57–1.65 µm, 30 m; Band 7: SWIR 2, 2.11–2.29 µm, 30 m; Band 8: panchromatic (PAN), 0.50–0.68 µm, 15 m.

The software QGis and the Orfeo toolbox were the tools used for the stages involving image analysis. At the end of the methodology, the language Python and the development environment Google Colaboratory were used to create visualizations.

2.3. Conceptualization

2.3.1. Dimensionality Reduction

When mapping data from a high to a low dimension, dimensionality reduction (DR) techniques maintain the original properties of the data [20]. This is accomplished through two steps: feature selection and feature extraction [5,7]. Each of these steps is described separately below.

In the feature selection stage, a search is made for a subset of features that satisfy a certain criterion. Depending on the method used, this criterion can be data similarity, entropy, or variance, among others. In this step, important and non-redundant information is selected, and the relationship between the information is maintained. For remote sensing images, the objective of this stage is to select relevant spectral bands for analysis. This stage can be performed by any method, whether supervised, unsupervised, or semi-supervised. For example, Principal Component Analysis (PCA) is an unsupervised dimensionality reduction method, since it does not make use of any pre-defined information about the data.

In the feature extraction stage, the new feature space is built from maximizing the separability between the data, i.e., seeking to find a new space in which the data are more spread out in the space. This step can be performed by any method that performs linear or non-linear transformations. For example, PCA applies linear transformation for mapping the high-dimensional space to low-dimensional space.

2.3.2. Principal Component Analysis (PCA)

PCA is a method that can be used for dimensionality reduction. It implements the Karhunen–Loève Transform and, when applied to remote sensing images, it constitutes a powerful tool in the spectral analysis of these images [21]. PCA can be used during pre-processing of the data for image quality improvement, reducing its dimensionality and facilitating its analysis [11]. In this method, all the necessary information is estimated directly from the data.

In terms of its functioning, PCA maximizes the variance of the new components and minimizes the mean squared error between the original data and this new representation. As its goal is to reduce redundancy between the input data, first, a decomposition of the features is performed, and then, the dimensionality is reduced by searching for new features that are linear combinations of the original data. This is a popular method for dimensionality reduction of remote sensing data [10], commonly applied to multi- or hyperspectral images.
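The description above can be illustrated with a minimal sketch of PCA computed through an eigendecomposition of the band covariance matrix. This is not the implementation used in this study: the function name `pca_reduce`, the synthetic seven-band data, and the choice of two components are assumptions made only for illustration.

```python
import numpy as np


def pca_reduce(pixels, n_components=2):
    """Project rows of `pixels` (n_samples, n_bands) onto the top principal components.

    Hypothetical helper written for illustration only.
    """
    centered = pixels - pixels.mean(axis=0)
    # Covariance between bands, shape (n_bands, n_bands).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions of largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained = eigvals[order] / eigvals.sum()
    # New features are linear combinations of the original bands.
    return centered @ components, components, explained


rng = np.random.default_rng(0)
# Synthetic stand-in for seven redundant bands: two latent factors plus small noise.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 7))
pixels = latent @ mixing + 0.01 * rng.normal(size=(500, 7))

scores, components, explained = pca_reduce(pixels, n_components=2)
print(scores.shape, explained.sum())
```

Because the synthetic bands are built from only two latent factors, two components retain almost all of the variance, which is the kind of redundancy between bands that the text refers to.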

2.3.3. Multi- and Hyperspectral Images

Multi- and hyperspectral remote sensing images present a variety of relevant information for classifications and detections in general. They are represented as a data cube, where the width and height are associated, respectively, with the axes x and y, and the number of bands is associated with the axis z [5]. Each pixel of the image provides spectral information about an observation and can be represented as a high-dimensional vector, where each dimension represents a spectral band [7].
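As a small illustration of this cube representation, the sketch below stores a toy image as a NumPy array and reshapes it into one spectral vector per pixel. The array layout (rows, columns, bands) and the toy sizes are assumptions chosen for the example, not properties of the dataset used in this study.

```python
import numpy as np

rows, cols, bands = 4, 5, 7  # toy sizes; a real scene is far larger
cube = np.arange(rows * cols * bands, dtype=float).reshape(rows, cols, bands)

# One row per pixel, one column per spectral band.
pixels = cube.reshape(-1, bands)

# Spectral vector of the pixel at row 2, column 3.
spectrum = cube[2, 3, :]

# The reshape is lossless: the cube can be rebuilt from the pixel matrix.
restored = pixels.reshape(rows, cols, bands)
print(pixels.shape, spectrum.shape)
```

The pixel matrix is the form in which classifiers and dimensionality reduction methods usually consume the image, with each row treated as one sample.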

Remote sensing images are formed from data captured by sensors, which receive a great deal of information [13]. For example, images formed from the region of the electromagnetic spectrum called “visible light” are composed of information from different wavelengths, so each band of such an image represents a specific portion of that region. Due to their high spectral resolution, these images contain a lot of redundant information (high dimensionality), which, in addition to increasing computational and storage costs, can reduce classification accuracy, for example. Dimensionality reduction is therefore important, since it seeks a low-dimensional space from a selection of bands in which the redundant information between them is eliminated, making the classification more computationally efficient [4,5,7,9,12,14,15].

2.3.4. Classification

Classification is an activity that consists of assigning, for each sample, a discrete class belonging to a set of available classes [22]. In remote sensing image classification, a sample can be represented by a subset of pixels that belong to the same class. Thus, after the manual selection of some samples in an image, a model is generated through the training of a classifier, a kind of pattern recognition specialist. Based on this model, it is identified to which class each pixel of the entire image belongs.

When labeled samples, i.e., samples identified according to class, are used for training the classifier, the classification is called supervised [23]. When there are no labeled samples, the classification is called unsupervised. If only part of the samples are labeled, the classification is called semi-supervised. In this study, supervised classifications are made, since labeled samples are used for the training of the selected classifiers. Besides these classification categories, there are two types of classifiers, i.e., contextual and non-contextual [16], which will be explained below.

Contextual classifiers (i.e., strong classifiers) depend on specific knowledge, such as prior knowledge or training data. This study used the contextual classifier Boost [24] to classify the images. This classifier performs the task of supervised classification and makes use of weak classifiers, combining the results of these, to achieve the final goal.

Non-contextual classifiers (i.e., weak classifiers) are less restrictive and precise compared to contextual classifiers. This study used the non-contextual classifier Decision Tree (DT) [25] to classify the images. Also used in supervised classification, this classifier can be used together or alone, and each decision node is responsible for separating the data into two classes (in the case of Binary Trees).

When using a contextual and a non-contextual classifier, it is expected that their results will be similar when classifying the same input data, i.e., that their class probability estimates will be similar [26,27,28,29]. In this sense, when there is a disagreement between their results, there may be an incongruence that is related to some kind of anomaly. Thus, an analysis of the divergence between the results obtained by a contextual and a non-contextual classifier is necessary to detect the incongruence in the context in which they are being used in this study.

For example, contextual and non-contextual classifiers can be used to classify a satellite image composed of a river and other diverse elements. Let us consider that the results obtained by both classifiers diverge in some regions of this river. In other words, one classifier can assign a class “water” and the other a class “non-water” to the same portion of the river. This divergence may indicate the presence of an inconsistency between the classifications. The categorization of the anomaly that may be related to this inconsistency can be performed through Kittler’s Taxonomy [16].
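The river example can be sketched as a pixel-wise comparison of two label maps. The label values, the toy 3 × 3 maps, and the helper `incongruence_mask` are hypothetical; the sketch only flags where two classifiers disagree and does not reproduce the incongruence measure or the categorization step of Kittler’s Taxonomy.

```python
import numpy as np

WATER, NON_WATER = 1, 0  # hypothetical label encoding


def incongruence_mask(labels_a, labels_b):
    """Boolean map that is True where the two label maps disagree."""
    if labels_a.shape != labels_b.shape:
        raise ValueError("label maps must have the same shape")
    return labels_a != labels_b


# Toy outputs of a contextual and a non-contextual classifier for the same image.
contextual = np.array([[1, 1, 0],
                       [1, 1, 0],
                       [0, 0, 0]])
non_contextual = np.array([[1, 0, 0],
                           [1, 0, 0],
                           [0, 0, 0]])

mask = incongruence_mask(contextual, non_contextual)
rate = mask.mean()  # fraction of pixels on which the classifiers diverge
print(mask.sum(), rate)
```

Pixels where the mask is True are candidates for further inspection, since a divergence of this kind may indicate an anomaly.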

2.3.5. Kittler’s Taxonomy

Kittler’s Taxonomy consists of an anomaly detection framework [16]. This framework exposes the multifaceted nature of each anomaly and suggests effective mechanisms to identify and differentiate each peculiarity of the anomaly, in order to contribute to its detection. Taxonomy also helps to identify the different causes of anomalous events.

Anomaly, according to Kittler et al. [16], is the phenomenon that occurs when machine perception systems, already trained and modeled for an application domain, are exposed to a new scenario that was not previously considered. Thus, after this domain change, they will fail to assign the correct meaning to the data.

Conventional anomaly detection methods are focused on the concept of observational anomaly, which can be an outlier or a deviation from a distribution. These types of approaches do not directly evidence a domain anomaly, but provide means for it. The domain anomaly arises when the observed data are not explained by existing models.

Conventional methods use non-generative models for data classification due to their fast performance. However, non-generative models are not capable of detecting cross-domain anomaly. Thus, the use of more than one specialist allows for this detection to happen more efficiently. The incongruence between the results of several non-generative models is an indication of a potential domain anomaly. In addition, incongruence can help in identifying the type of anomaly.

When multiple experts must be used to allow for the detection of domain anomalies, the contextual classifier category stands out. This is due to the fact that its classification is able to hierarchically represent sensory data. The contextual classifier assigns a class to each sample using the information of that sample and its neighbors, while the non-contextual classifier only makes use of information related to the sample itself.

Some factors must be evaluated together for the detection and identification of domain anomalies, namely high quality of data obtained by the sensor, contextual and non-contextual classification of the same data, and occurrence of incongruence. The taxonomy described by [16] organizes anomalies into the following types: Measurement model drift, Component model drift, Unexpected structure and structural components, Unknown object, Unknown structure, and Unexpected structural component. Next, only the categories that are analyzed in this study are detailed.

Unexpected structure and structural components: this category of anomaly is related to a complete or partial change of the study area, i.e., of the domain. In this case, the observation differs considerably from the reference models of the classifiers in terms of its structure (which defines its shape) and components (which together make up the structure). Let us consider a model created to classify pixels in an image into “water” and “non-water” classes. During the classification, when analyzing a river that belongs to the image, if this river has undergone a change of domain, for example, it received a considerable amount of ore tailings, the classifier that was able to identify the river as water may fail, since the structure and components of this river are unexpected (water was expected and now it has turned to mud). This example was studied and published in [17]. Both the referred study [17] and any other study that applies anomaly detection based on Kittler’s Taxonomy to the analysis of water bodies (including this study) are very important to help preserve water resources. Clean water and sanitation correspond to the 6th goal of the UN Sustainable Development Goals [30]. This goal has received considerable critical attention from the scientific community to investigate and publish studies in order to help ensure the availability and the sustainable management of water resources and sanitation for the world [31,32,33,34,35].
Unexpected structural component: this other category of anomaly is related to the lack of attributes in the model characteristics (only a subset of object models is used), i.e., the lack of relevant information leads to the occurrence of this type of anomaly. For example, the application of any filter may be responsible for eliminating features relevant to the model to classify an object as belonging to a certain type of anomaly, since the entire universe of features was not contemplated.

2.4. Methodology

Figure 2 presents the proposed methodology. More details about the parameters used in some steps of the methodology can be found in Appendix A. Initially, the first seven bands (with a spatial resolution of 30 m), which make up the selected dataset, were added as raster layers to a project in QGis. From these data, a virtual raster was created by overlapping these bands, in order to facilitate its manipulation. Next, bands 4 (red), 3 (green), and 2 (blue) were chosen for the raster rendering, and a contrast enhancement based on the mean and standard deviation was performed. These activities were important to improve the analysis and visualization of the image (raster) observations in natural colors.

Let us consider $f (x, y)$, a two-dimensional function that represents a raster (a numerical matrix of dimension $M \times N$), in which we can access the pixel intensity value at the coordinates $(x, y)$ [36], as shown in Equation (1):

$$f (x, y) = \begin{bmatrix} f (0, 0) & f (0, 1) & \dots & f (0, N - 1) \\ f (1, 0) & f (1, 1) & \dots & f (1, N - 1) \\ \vdots & \vdots & \ddots & \vdots \\ f (M - 1, 0) & f (M - 1, 1) & \dots & f (M - 1, N - 1) \end{bmatrix}$$

(1)

Still, the matrix notation presented in Equation (2) also provides a representation of an image and its pixels, where $a_{i, j} = f (x = i, y = j) = f (i, j)$ indicates a relation with the previous representation [36]:

$$A = \begin{bmatrix} a_{0, 0} & a_{0, 1} & \dots & a_{0, N - 1} \\ a_{1, 0} & a_{1, 1} & \dots & a_{1, N - 1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{M - 1, 0} & a_{M - 1, 1} & \dots & a_{M - 1, N - 1} \end{bmatrix}$$

(2)

Thus, we can define the virtual raster generated in this stage as a k-dimensional vector, as given in Equation (3). In this case, $k = 7$, since seven bands were used, and $b_{1} (x, y)$ corresponds to the intensity of the pixels of band 1, and so on:

$$\mathrm{vr} (x, y) = \begin{bmatrix} b_{1} (x, y) \\ b_{2} (x, y) \\ \vdots \\ b_{k} (x, y) \end{bmatrix}$$

(3)

Although the virtual raster is composed of seven bands, only $b_{4}$, $b_{3}$, and $b_{2}$ were chosen to generate the RGB image in QGis [37]. In this rendering process, the mean (Equation (4)) and standard deviation (Equation (5)) were calculated to adjust the contrast, creating a color table. In the following equations, K is a set containing the indices i of the bands involved in this step, i.e., $K = \{4, 3, 2\}$:

$$g (x, y) = \frac{1}{| K |} \sum_{i \in K} f_{i} (x, y)$$

(4)

$$h (x, y) = \sqrt{\frac{1}{| K |} \sum_{i \in K} {(f_{i} (x, y) - g (x, y))}^{2}}$$

(5)
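As an illustration only, Equations (4) and (5) can be sketched with NumPy. The dictionary layout of `bands`, the function names, and the `stretch` helper are assumptions made for this example; in particular, `stretch` is a generic mean ± 2 standard deviations linear stretch and is not claimed to reproduce the QGis renderer exactly.

```python
import numpy as np

def band_mean_std(bands, indices=(4, 3, 2)):
    """Per-pixel mean g(x, y) and standard deviation h(x, y) over the bands in K.

    `bands` is a hypothetical dict mapping a band index to a 2-D array.
    """
    stack = np.stack([np.asarray(bands[i], dtype=float) for i in indices])
    g = stack.mean(axis=0)                         # Equation (4)
    h = np.sqrt(((stack - g) ** 2).mean(axis=0))   # Equation (5)
    return g, h

def stretch(band, n_std=2.0):
    """Assumed linear stretch of one band to [0, 1] between mean +/- n_std * std."""
    band = np.asarray(band, dtype=float)
    lo = band.mean() - n_std * band.std()
    hi = band.mean() + n_std * band.std()
    if hi == lo:                                   # constant band: nothing to stretch
        return np.zeros_like(band)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)
```

The value `n_std=2.0` mirrors the setting reported in Appendix A.1; values outside the stretch interval are clipped to the ends of the color range.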

In the next step, an operation called “superimpose sensor” (from the Orfeo toolbox) was performed to project this “virtual raster” (composed of the seven bands). Next, band 8 (panchromatic, with a spatial resolution of 15 m) was used for pan-sharpening based on the component substitution technique. Thus, through the pan-sharpening (RCS) function, the projected virtual raster and the panchromatic band were fused. In this way, a composite image of seven bands with a spatial resolution of 15 m was obtained.

Equation (6) mathematically describes the operations involved in this step [38,39,40,41]. In Equation (6), k indicates the indices of the spectral bands, g consists of a vector of injection gains, and w is a vector of weights. Thus, from the multi-spectral image interpolated to the scale of the panchromatic image ($S (x, y)$), the intensity component $I_{L}$ of the image was calculated. Next, histogram matching was performed between the panchromatic image P and the intensity component $I_{L}$. Finally, from the sum of these values, the result of the pan-sharpening ($R (x, y)$) was obtained:

$$\begin{aligned} R_{k} (x, y) &= S_{k} (x, y) + g_{k} (P - I_{L}), \quad k = 1, \dots, N, \\ g &= [g_{1}, \dots, g_{k}, \dots, g_{N}], \\ I_{L} &= \sum_{i = 1}^{N} w_{i} S_{i} (x, y), \\ w &= [w_{1}, \dots, w_{i}, \dots, w_{N}]. \end{aligned}$$

(6)
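A minimal NumPy sketch of Equation (6) as written is given next. It is not the Orfeo toolbox implementation: the function name, the uniform default weights, the unit default gains, and the mean/standard-deviation matching used as a stand-in for histogram matching are all assumptions made for illustration.

```python
import numpy as np

def component_substitution(ms, pan, weights=None, gains=None):
    """Additive component-substitution pan-sharpening following Equation (6).

    ms  : (N, H, W) multi-spectral image already resampled to the pan grid (S).
    pan : (H, W) panchromatic image (P).
    """
    ms = np.asarray(ms, dtype=float)
    pan = np.asarray(pan, dtype=float)
    n_bands = ms.shape[0]
    # Assumed defaults: uniform weights w and unit injection gains g.
    w = np.full(n_bands, 1.0 / n_bands) if weights is None else np.asarray(weights, dtype=float)
    g = np.ones(n_bands) if gains is None else np.asarray(gains, dtype=float)

    intensity = np.tensordot(w, ms, axes=1)        # I_L = sum_i w_i S_i
    # Simplified stand-in for histogram matching: match mean and std of P to I_L.
    pan_std = pan.std()
    if pan_std > 0:
        pan_matched = (pan - pan.mean()) * (intensity.std() / pan_std) + intensity.mean()
    else:
        pan_matched = np.full_like(pan, intensity.mean())

    detail = pan_matched - intensity               # P - I_L
    return ms + g[:, None, None] * detail          # R_k = S_k + g_k (P - I_L)
```

When the panchromatic band carries no detail beyond the intensity component, the difference `P - I_L` vanishes and the output equals the input bands, which is a quick sanity check for the formula.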

Then, bands 4, 3, and 2 were selected for rendering. Contrast enhancement was applied to the rendering result. These processes are important because they facilitate the manual selection of samples in the image. This image is the final result of the pre-processing.

Next, unlike the strategy proposed in [17], the PCA method was applied to the image resulting from pre-processing for dimensionality reduction, in order to generate an image with low dimensionality. This stage was carried out through the Orfeo toolbox, with seven output components defined, one for each band. Thus, from the input image, in the high-dimensional space, this step generated an image in the low-dimensional space.

In mathematical notation, PCA first computes the covariance matrix of the input image [11], which in this case is the result of pre-processing. The eigenvalues used for dimensionality reduction are then calculated from this covariance matrix. Equation (7) shows the covariance matrix, where $\sigma_{i, j}$ corresponds to the covariance of each available pair of bands, $D_{p, i}$ represents the digital number of pixel p in band i, and $\mu_{i}$ is the mean of D for band i:

$$\begin{aligned} C_{B, B} &= \begin{bmatrix} \sigma_{1, 1} & \dots & \sigma_{1, n} \\ \vdots & \ddots & \vdots \\ \sigma_{n, 1} & \dots & \sigma_{n, n} \end{bmatrix}, \\ \text{where } \sigma_{i, j} &= \frac{1}{N - 1} \sum_{p = 1}^{N} (D_{p, i} - \mu_{i}) (D_{p, j} - \mu_{j}) \end{aligned}$$

(7)

In this method, the eigenvalues ($\lambda$) are calculated from the variance-covariance matrix as the roots of $\det (C - \lambda I) = 0$, where C is the covariance matrix of the bands and I is the identity matrix. Equation (8) presents the principal components, where Y consists of the principal component vector, t is the transformation matrix, and X corresponds to the original input data vector:

$$Y = \begin{pmatrix} y_{1} \\ y_{2} \\ \vdots \\ y_{n} \end{pmatrix} = \begin{pmatrix} t_{1, 1} & \dots & t_{1, n} \\ \vdots & \ddots & \vdots \\ t_{n, 1} & \dots & t_{n, n} \end{pmatrix} \begin{pmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{pmatrix}$$

(8)

During this step, the eigenvectors were calculated from the eigenvalues, i.e., $(C - \lambda_{k} I) t_{k} = 0$, where $\lambda_{k}$ is the k-th eigenvalue and $t_{k}$ is the corresponding eigenvector. After this step, the output image (composed of the seven components) was rendered using components 4, 3, and 2. Then, contrast enhancement based on the mean and standard deviation was also applied.
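The PCA steps of Equations (7) and (8) can be sketched as follows. This is a plain NumPy illustration under stated assumptions, not the Orfeo toolbox code: the function name and the `(n_bands, H, W)` array layout are hypothetical, the data are mean-centered before projection, and the whitening option mentioned in Appendix A.2 is omitted.

```python
import numpy as np

def pca_bands(image):
    """Project an (n_bands, H, W) image onto its principal components.

    Returns the component image, the eigenvalues (descending), and the
    transformation matrix T whose rows are the eigenvectors t_k.
    """
    n_bands, height, width = image.shape
    X = image.reshape(n_bands, -1).T.astype(float)   # one row per pixel
    mu = X.mean(axis=0)                              # band means mu_i
    Xc = X - mu
    C = Xc.T @ Xc / (X.shape[0] - 1)                 # covariance matrix, Equation (7)

    eigvals, eigvecs = np.linalg.eigh(C)             # solves det(C - lambda I) = 0
    order = np.argsort(eigvals)[::-1]                # largest variance first
    eigvals = eigvals[order]
    T = eigvecs[:, order].T                          # rows are eigenvectors t_k

    Y = Xc @ T.T                                     # Y = T X per pixel, Equation (8)
    return Y.T.reshape(n_bands, height, width), eigvals, T
```

After the projection, the covariance matrix of the components is diagonal and its diagonal holds the eigenvalues, so keeping only the leading components retains most of the variance.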

In the next step, sampling was initiated by manually selecting 250 polygons that delimit regions in the image. This selection was made on the image resulting from pre-processing, since it was in natural colors, which facilitates the visualization of the samples. For each polygon, a class was assigned (either “water” or “non-water”), according to the components of the selected region. Half of the 250 samples correspond to open water bodies, such as lakes, rivers, reservoirs, etc. It is expected that these open water bodies can be classified as “water” by the classifiers. The other half corresponds to various geographical features, such as fields, forests, roads, cities, and so on. It is expected that these diverse geographical features can be classified as “non-water” by the classifiers. Observations directly related to the environmental disaster, such as mining waste reservoirs, water reservoirs, and contaminated rivers, were not selected as samples, to avoid biased results. In addition, the samples were selected so as to be well distributed across the image, and they were validated against topographic maps of the region (adopted as ground truth). Figure 3 presents an example of a sample from each class, in addition to red dots indicating the dispersion of all samples in the image resulting from pre-processing.

Right after, features were extracted from the pre-processing image and the PCA image considering the regions selected in the sampling. Thus, the following metrics were used for each sample: mean (Equation (9)), variance (Equation (10)), standard deviation (Equation (11)), median (Equation (12)), minimum value (Equation (13)), and maximum value (Equation (14)). The following equations present these metrics [36], with $S_{x, y, k}$ being a subset of M elements of the image f, which represents a sample collected over the k bands or components:

$$\mathrm{mean}_{k} = \frac{1}{M} \sum_{(s, t) \in S_{x, y, k}} f_{k} (s, t)$$

(9)

$$v_{k}^{2} = \frac{1}{M} \sum_{(s, t) \in S_{x, y, k}} {[f_{k} (s, t) - \mathrm{mean}_{k}]}^{2}$$

(10)

$$\mathrm{std}_{k} = \sqrt{v_{k}^{2}}$$

(11)

$$\mathrm{median}_{k} = \operatorname{median} \{S_{x, y, k}\}$$

(12)

$$\mathrm{min}_{k} = \min \{S_{x, y, k}\}$$

(13)

$$\mathrm{max}_{k} = \max \{S_{x, y, k}\}$$

(14)
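The six per-sample features of Equations (9) to (14) can be computed as in the sketch below. The function name, the `(n_bands, H, W)` array layout, and the boolean polygon mask are assumptions of this example; the paper extracts the samples from polygons drawn in QGis.

```python
import numpy as np

def sample_features(image, mask):
    """Features of one sample, one value per band or component k.

    image : (n_bands, H, W) array.
    mask  : boolean (H, W) array marking the M pixels of the sample polygon.
    """
    values = image[:, mask].astype(float)                     # shape (n_bands, M)
    mean = values.mean(axis=1)                                # Equation (9)
    variance = ((values - mean[:, None]) ** 2).mean(axis=1)   # Equation (10)
    return {
        "mean": mean,
        "variance": variance,
        "std": np.sqrt(variance),                             # Equation (11)
        "median": np.median(values, axis=1),                  # Equation (12)
        "min": values.min(axis=1),                            # Equation (13)
        "max": values.max(axis=1),                            # Equation (14)
    }
```

Note that the variance divides by M, as in Equation (10), rather than by M - 1.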

From these features, Joyplot, Boxplot, and Heatmap visualizations were generated. The first consists of stacked, partially overlapping density plots, useful for visually comparing the distributions of two or more variables. The second shows information such as outliers, lower and upper limits, and quartiles of a distribution of values. The last allows the analysis, through colors, of the relationship between values of a matrix. Such visualizations were useful to analyze the differences between the two dimensional spaces.

Next, the training, validation, and testing of the classifiers were performed for the two images separately: the final pre-processing image and the image resulting from the PCA method. To do so, it was necessary to compute the second-order statistics of the two images using the Orfeo toolbox, according to Equations (15) and (16). This step results in a file for each image f (of dimension $M \times N$), containing the mean $\mathrm{mean}$ and the standard deviation $\mathrm{std}$ of each band or component k that composes it. Such information was necessary for the construction of the predictive models. Two models were generated for each image, one for each classifier: Boost (contextual) and Decision Tree (non-contextual).

$$\mathrm{mean}_{k} = \frac{1}{M N} \sum_{x = 0}^{M - 1} \sum_{y = 0}^{N - 1} f_{k} (x, y)$$

(15)

$$\mathrm{std}_{k} = \sqrt{\frac{1}{M N} \sum_{x = 0}^{M - 1} \sum_{y = 0}^{N - 1} {[f_{k} (x, y) - \mathrm{mean}_{k}]}^{2}}$$

(16)

During its training, the Boost classifier performs a sequence of steps on the training set $\{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{N}, y_{N})\}$, where $x_{i}$ represents a sample and $y_{i}$ its respective label ($y_{i} \in \{-1, 1\}$) [24,42]. Let $D_{t} (x_{i}, y_{i})$ be the t-th distribution over the samples. The training consists of: initializing the weights $D_{0} (x_{i}, y_{i})$ (Equation (17)); training weak classifiers $h_{j}$, one for each feature j (Equation (18)), in an iterative process (with respect to $t = 1, \dots, T$) of computing the classifier error $\epsilon_{j}$ (Equation (19)), choosing the weak classifier $h_{t}$ with the lowest error $\epsilon_{t}$ (Equation (20)), and updating the weights $D_{t + 1} (x_{i}, y_{i})$ (Equation (21), where $Z_{t}$ is a normalization factor that ensures that $D_{t + 1}$ is a probability distribution, i.e., $\sum_{i = 1}^{N} D_{t + 1} (x_{i}, y_{i}) = 1$). This iterative process is performed until a stopping criterion is satisfied ($\epsilon_{t} \geq 1 / 2$, in Equation (20)) or the maximum number of training epochs T is reached. At the end of these steps, the final output $H (x)$ is obtained, as shown in Equation (22):

$$D_{0} (x_{i}, y_{i}) = \frac{1}{N}, \quad i = 1, \dots, N$$

(17)

$$h_{j} (x_{i}) = \begin{cases} 1, & \text{if } p_{j} x_{i, j} < p_{j} \theta_{i, j} \\ - 1, & \text{otherwise} \end{cases}$$

(18)

$$\epsilon_{j} = \sum_{i : y_{i} \neq h_{j} (x_{i})} D_{t} (x_{i}, y_{i}), \quad t = 1, \dots, T$$

(19)

$$\begin{aligned} &\text{If } \epsilon_{t} \geq 1 / 2, \text{ then stop;} \\ &\text{else } \alpha_{t} = \frac{1}{2} \ln \left( \frac{1 - \epsilon_{t}}{\epsilon_{t}} \right) \end{aligned}$$

(20)

$$D_{t + 1} (x_{i}, y_{i}) = \frac{D_{t} (x_{i}, y_{i}) e^{- \alpha_{t} y_{i} h_{t} (x_{i})}}{Z_{t}}$$

(21)

$$H (x) = \operatorname{sign} \left( \sum_{t = 1}^{T} \alpha_{t} h_{t} (x) \right), \quad \operatorname{sign} (x) = \begin{cases} - 1, & x < 0 \\ 0, & x = 0 \\ 1, & x > 0 \end{cases}$$

(22)
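The boosting loop of Equations (17) to (22) can be sketched with threshold stumps as weak classifiers. This is an illustrative implementation, not the Orfeo toolbox Boost classifier: the function names, the exhaustive threshold search over observed feature values, and the use of a single threshold per feature are assumptions of the example.

```python
import numpy as np

def train_adaboost(X, y, n_rounds=10):
    """Train boosted decision stumps; labels y must be in {-1, 1}."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_samples, n_features = X.shape
    D = np.full(n_samples, 1.0 / n_samples)              # Equation (17)
    model = []
    for _ in range(n_rounds):
        best = None
        for j in range(n_features):
            for theta in np.unique(X[:, j]):              # candidate thresholds
                for p in (1, -1):                         # polarity p_j
                    pred = np.where(p * X[:, j] < p * theta, 1, -1)   # Equation (18)
                    err = D[pred != y].sum()                          # Equation (19)
                    if best is None or err < best[0]:
                        best = (err, j, theta, p, pred)
        err, j, theta, p, pred = best
        if err >= 0.5:                                    # stopping criterion, Equation (20)
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        model.append((alpha, j, theta, p))
        if err == 0.0:                                    # perfect stump: nothing left to reweight
            break
        D = D * np.exp(-alpha * y * pred)                 # Equation (21)
        D = D / D.sum()                                   # division by Z_t
    return model

def predict_adaboost(model, X):
    """Final output H(x) of Equation (22)."""
    X = np.asarray(X, dtype=float)
    score = np.zeros(X.shape[0])
    for alpha, j, theta, p in model:
        score += alpha * np.where(p * X[:, j] < p * theta, 1, -1)
    return np.sign(score).astype(int)
```

The small floor `1e-12` on the error only guards against division by zero when a stump classifies every training sample correctly.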

The Decision Tree classifier, in turn, builds a tree structure from the training data [25,43,44]. For example, the Iterative Dichotomiser 3 (ID3) is an algorithm that builds a decision tree from entropy ($\mathrm{Ent} (D)$, which is the impurity of the collection of samples D at each node of the tree) and information gain ($\mathrm{Gain} (D, a)$, which represents the expected reduction in the entropy of the collection of samples D with respect to the attribute a), presented in Equations (23) and (24), respectively. In this process, the following are considered: the training set $D = \{(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{m}, y_{m})\}$ and a set of d attributes $A = \{a_{1}, a_{2}, \dots, a_{d}\}$ of D, such as color, shape, texture, etc. Still, $p_{k}$ ($k = 1, 2, \dots, c$, where c is the number of classes) is the proportion of samples of class k in the current sample set, $\{a_{i}^{1}, a_{i}^{2}, \dots, a_{i}^{V}\}$ is the set of V possible values of each attribute $a_{i}$, and $D_{v}$ represents the subset of samples in D related to the value $a_{i}^{v}$ of $a_{i}$. Thus, considering these data, a recursive process is initiated from the root node with all training data. At each node, the purity criterion is applied to find the best split in order to generate new child nodes. If a stopping criterion is met (e.g., the maximum branch size is reached, all samples of a node are of the same class, etc.), the recursive process is stopped. Some implementations also apply pruning to optimize the tree.

$$\mathrm{Ent} (D) = - \sum_{k = 1}^{c} p_{k} \log_{2} p_{k}$$

(23)

$$\mathrm{Gain} (D, a) = \mathrm{Ent} (D) - \sum_{v = 1}^{V} \frac{| D_{v} |}{| D |} \mathrm{Ent} (D_{v})$$

(24)
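Equations (23) and (24) translate directly into a few lines of Python. The sketch below uses hypothetical function names and toy categorical values; it only illustrates the split criterion and is not the decision tree implementation used by the Orfeo toolbox.

```python
import math
from collections import Counter

def entropy(labels):
    """Ent(D) of Equation (23) for a list of class labels."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())

def information_gain(values, labels):
    """Gain(D, a) of Equation (24) for one attribute.

    values[i] is the value of attribute a for sample i, labels[i] its class.
    """
    n = len(labels)
    subsets = {}
    for value, label in zip(values, labels):
        subsets.setdefault(value, []).append(label)   # build each D_v
    remainder = sum(len(subset) / n * entropy(subset)
                    for subset in subsets.values())
    return entropy(labels) - remainder
```

An attribute that separates “water” from “non-water” perfectly yields a gain equal to the entropy of the node, whereas an attribute unrelated to the class yields a gain of zero, which is why ID3 splits on the attribute with the highest gain.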

The training and validation of the models used by the classifiers were carried out based on the second-order statistics files, on the selected samples, and on the respective images (result of the pre-processing and result of the PCA). During the training stage, half of the samples were used to train each model and the rest to validate. Despite the samples having been selected from the pre-processing image, the same polygons were used to indicate the same regions in the PCA image. Simultaneously, from the validation samples, confusion matrices were built to analyze the performance of the classifiers. A confusion matrix was generated for each model.

The classifications were carried out based on their respective (1) models created in the previous step (training), (2) images, and (3) second-order statistics. Each result of this stage consists of a binary image containing the classification of objects of the “water” class in black and the “non-water” class in white. In this stage, the classifier automatically assigns, for each pixel of the image, the class to which it belongs: “water” or “non-water”.

Finally, an analysis of the differences between contextual and non-contextual classifications was carried out. For this activity, the values of the confusion matrices obtained during the training of the classifiers were analyzed and a visual analysis of the images obtained in the classification step was performed. This step is important to check for the occurrence or not of divergences.

3. Results

Five metrics (mean, median, standard deviation, minimum value, and maximum value) were used to visually analyze the dimensional spaces of the resulting image from pre-processing and PCA, from a Joyplot. Thus, a distribution was created for each metric, considering each band or component. The two visualizations are presented in Figure 4. These visualizations allow for a general analysis of the values obtained by the metrics for the samples of each band or component of the image.

Analyzing Figure 4, it can be seen that the distributions of mean, median, minimum value, and maximum value of the components of the image resulting from PCA in Figure 4b became more concentrated around the value 0, unlike what happened for the image resulting from pre-processing in Figure 4a. Considering the samples on the first component (1) of the image resulting from PCA, it is observed that it has less variability in the standard deviation than the others, since its distribution has a greater amplitude.

Next, some Boxplots were generated from the six metrics used: mean, median, standard deviation, variance, minimum value, and maximum value. Thus, in each visualization, a metric was considered and a Boxplot was obtained for each component or band. The visualizations are displayed in Figure 5. From these visualizations, it is possible to make a more detailed comparison between the values obtained by the metrics for the two images.

Figure 5b shows that, after the application of dimensionality reduction, the mean, median, minimum value, and maximum value in particular showed less variability, since their values were closer to the mean and the components showed similar means. Besides, analyzing the boxplots of the standard deviation of the components of the image resulting from PCA, it is observed that the first component presented the least variability in relation to the other components. This situation does not occur in the image resulting from pre-processing in Figure 5a, in which the last three bands present more variability than the first four.

Finally, considering the standard deviation values obtained from the samples collected from the bands of the resulting image from pre-processing (PP) and from the components of the resulting image from PCA, the cosine similarity of the two distributions was calculated in order to understand how similar these values are. A Heatmap was built considering the similarity between PP and PCA, PP and PP, and PCA and PCA. Such visualizations, displayed in Figure 6, are useful for analyzing the relationship between these values.
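A pairwise cosine similarity of this kind can be sketched as follows. The function name and the array layout (one row per band or component, one column per sample) are assumptions of this example, and rows are assumed to be non-zero vectors.

```python
import numpy as np

def cosine_similarity_matrix(A, B):
    """Cosine similarity between every row of A and every row of B.

    A : (n_a, n_samples) array, e.g., per-sample standard deviations of each PP band.
    B : (n_b, n_samples) array, e.g., per-sample standard deviations of each PCA component.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)
    B_unit = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A_unit @ B_unit.T                      # entry (i, j) lies in [-1, 1]
```

Calling the function with the same array twice gives the within-image heatmaps (PP versus PP, PCA versus PCA), whose diagonal is 1 by construction.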

In Figure 6a, we can observe a lower similarity (value close to 0.6) between the first four bands of the pre-processing image and the third component of the PCA image. In Figure 6b, regarding the bands of the pre-processing image, there is a strong similarity (value close to 1.0) between the first four bands, as well as between the last two. For the resulting components of the PCA, presented in Figure 6c, there is less similarity (value close to 0.65) between the first two components and the third, and the same occurs between the third and the last.

During the training and validation of the models corresponding to the classifiers used in this study, the confusion matrices presented in Table 1 were obtained. From the data presented, it is possible to observe that the values produced for the classifications of the images from the pre-processing and PCA were different. The values reveal that for the resulting image of the pre-processing, the classifier Decision Tree correctly assigned the class “non-water” to 6190 pixels and the class “water” to 6212 pixels. For the PCA image, this classifier correctly classified 6224 pixels that belonged to the “non-water” class and 6257 to the “water” class. The classifier Boost correctly assigned the class “non-water” to 6245 pixels and the class “water” to 6189 pixels of the pre-processing image. For the PCA image, the pixels were 6195 and 6225, respectively. Analyzing the classification errors, with regard to the classifications using the Decision Tree classifier, for the resulting image of the pre-processing, there were 129 pixels that were of the “non-water” class and were classified as “water” and 107 pixels that were of the “water” class and were classified as “non-water”. Analyzing the performance of the same classifier applied to the image resulting from the PCA dimensionality reduction method, there were 95 and 62 pixels, respectively. Finally, considering the Boost classifier used with the pre-processing image, 74 pixels of the “non-water” class were classified as “water” and 130 that were “water” were classified as “non-water”. By analyzing this classifier together with the result of the PCA image classification, there were 124 and 94 pixels, respectively.
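The counts quoted above can be turned into summary metrics with a short script. The counts are copied from the paragraph; the dictionary layout, the function names, and the choice of overall accuracy and “water” recall as summaries are assumptions of this sketch and are not metrics reported in the text.

```python
# (correct non-water, correct water, non-water classified as water, water classified as non-water)
table1 = {
    ("Decision Tree", "pre-processing"): (6190, 6212, 129, 107),
    ("Decision Tree", "PCA"): (6224, 6257, 95, 62),
    ("Boost", "pre-processing"): (6245, 6189, 74, 130),
    ("Boost", "PCA"): (6195, 6225, 124, 94),
}

def overall_accuracy(tn, tp, fp, fn):
    """Fraction of validation pixels assigned to the correct class."""
    return (tn + tp) / (tn + tp + fp + fn)

def water_recall(tn, tp, fp, fn):
    """Fraction of true "water" pixels that were classified as "water"."""
    return tp / (tp + fn)

accuracies = {key: overall_accuracy(*counts) for key, counts in table1.items()}
recalls = {key: water_recall(*counts) for key, counts in table1.items()}

for key in table1:
    print(key, round(accuracies[key], 4), round(recalls[key], 4))
```

Each of the four matrices sums to the same number of validation pixels, so the derived values are directly comparable across classifiers and images.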

Inconsistency is an indication of the presence of anomalies and occurs when there is divergence in predictions; thus, the approach used to measure this situation was the divergence analysis, responsible for checking the difference between two probability distributions. This is necessary because it is expected that both classifications (contextual and non-contextual) produce similar probability estimates during the classification step for the same image. Figure 7 presents the results of the contextual (obtained by the classifier Boost) and non-contextual (obtained by the classifier Decision Tree) classifications of each of the two images (pre-processing and PCA).

By visually analyzing the images resulting from contextual and non-contextual classifications for the pre-processing image in Figure 7b,c respectively, it was observed that there was a divergence in classification in the region of the river that was affected by the environmental disaster (e.g., the part of the river indicated by the red arrow in Figure 7a), indicating an inconsistency in polluted water bodies. This fact does not occur in other water bodies not affected by the event (e.g., the lake indicated by the blue arrow in Figure 7a), indicating a congruence for uncontaminated water. In terms of the classification results obtained from the PCA image, in Figure 7e,f, respectively, it was observed that the region affected by the disaster was not recognized as water, while unaffected water bodies were recognized as water by both classifiers. This lack of divergence between the results indicates a congruence between them. Another relevant fact was the difference in the amount of noise between the different feature spaces for the non-contextual classifier, i.e., between the images Figure 7c,f. The result of this classifier for the image resulting from PCA Figure 7f showed more noise than in the result of the pre-processing image Figure 7c.

Therefore, the classification of the pixels that correspond to water bodies in the image depends on the type of classifier and the feature space of the image. The dimensionality reduction caused the contextual classifier not to classify the polluted river as water, as exemplified in Figure 7e, differently from the result of the same classifier without the application of PCA, as exemplified in Figure 7b. Still, in the high-dimensional space, i.e., for the pre-processing image, the Boost classifier was able to classify both polluted and non-polluted water bodies as “water”, as exemplified in Figure 7b. This situation was different for the classifier Decision Tree, as exemplified in Figure 7c, which was able to recognize only non-polluted water bodies as “water”, assigning the class “non-water” to polluted water bodies. For the low-dimensional space (i.e., PCA image), both classifiers were able to classify only non-polluted water bodies as “water”, as exemplified in Figure 7e,f.

In an analysis of the four examples, i.e., Figure 7b,c,e,f, all of them detected the lake as water, but there was no agreement regarding the classification of the contaminated river. These situations described previously demonstrate the occurrence of anomalies, even when the incongruence happens between multimodal systems, i.e., in different data channels, as happened between Figure 7b,f in the result of the classification of the contaminated river.
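The visual comparison between contextual and non-contextual results can be complemented by a per-pixel disagreement map. The sketch below is a simplified proxy for the divergence analysis, not the procedure used in the study: it compares hard class labels rather than probability estimates, and the function name and the label values (0 for “water”, 255 for “non-water”) are assumptions.

```python
import numpy as np

def incongruence(contextual, non_contextual, water=0):
    """Compare two classification maps of the same image.

    Returns a boolean map of pixels where the classifiers disagree, a boolean
    map of pixels labeled "water" only by the contextual classifier, and the
    fraction of disagreeing pixels.
    """
    contextual = np.asarray(contextual)
    non_contextual = np.asarray(non_contextual)
    disagree = contextual != non_contextual
    water_only_contextual = (contextual == water) & (non_contextual != water)
    return disagree, water_only_contextual, float(disagree.mean())
```

Under these assumptions, a river recognized as “water” only by the contextual classifier shows up as a connected region in the second map, while congruent regions remain empty in both maps.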

4. Discussion

This study aimed to investigate the impact of dimensionality reduction on anomaly detection based on Kittler’s Taxonomy in remote sensing. From the results obtained, we have found that the application of the ADS-KT to an image in the low-dimensional space did not produce results similar to those obtained for an image in the high-dimensional space, indicating that associating dimensionality reduction with the strategy is not recommended.

From the results obtained in the classification of the image resulting from pre-processing (high-dimensional space), i.e., the same results obtained in [17], the divergence between the two classifications indicated the occurrence of anomalies of the type Unexpected structure and structural components, since the conditions proposed by [16] for its detection were satisfied. This divergence is evidenced in Figure 7. Thus, the classifiers that were modeled from samples of “water” and “non-water” to classify a domain, when used to classify the entire image (composed of water bodies that have undergone changes in terms of structure and components, from an environmental disaster), presented a divergence in their classifications.

When applying the Principal Components Analysis method, only the most relevant features were preserved, generating an image of low-dimensional space, with statistical values different from the original image. In this way, as not all the features of the original samples were taken into consideration, the results evidenced by the classification indicated the occurrence of another category of anomaly (Unexpected structural component). In other words, dimensionality reduction eliminated attributes of features or structural components of the original image and this omission led to a weak modeling of the classifiers. This modeling led to the emergence of an unexpected structural component, since where we expected a “water” classification by one of the classifiers, there was a classification as “non-water”, with no relevant divergences in the classification, as presented in Figure 7.

The values presented in Table 1 (confusion matrix of the classifiers) confirm that the class assignments made by the classifiers were not similar between the two images considering the same classifiers. Still, the visualizations created from the features extracted from both images (Figure 4, Figure 5 and Figure 6) showed that there was a structural difference in the PCA image compared to the pre-processing image. These facts explain the difference in types of anomalies found for the two images.

In summary, although many studies [4,5,6,7,8,9,10,11,12,13,14,15] achieve the same or similar results from their data analysis activities in the low- and in the high-dimensional space, this study is responsible for introducing a new exception: when anomaly detection based on Kittler’s Taxonomy is used in image analysis activities, the use of a low-dimensional space can lead to the occurrence of a type of anomaly different from the one that occurs for the high-dimensional space.

5. Conclusions

This study investigated the impact of combining dimensionality reduction with the ADS-KT. Our study also compared the results obtained by applying the strategy to two dimensional spaces: high and low. Thus, it is concluded that dimensionality reduction directly impacts the application of the ADS-KT used by [17], since applying the strategy after the PCA method generated results different from those originally presented by [17], which were also correctly reproduced in this study. This fact evidences that the taxonomy is quite restrictive in relation to the content described by the theory presented in [16]. In other words, the boundaries between the different types of anomalies proposed by Kittler’s Taxonomy are well defined, in such a way that nuances in the application of the methodology can cross these boundaries, leading the study to find a different type of anomaly than the one initially expected.

The findings indicate that dimensionality reduction, on the one hand, prevents the detection of anomalies of the type Unexpected structure and structural components. On the other hand, it allows us to detect anomalies of the type Unexpected structural component, which were not anticipated. To the authors’ knowledge, this is the first study that describes the occurrence of an anomaly of the type Unexpected structural component in Remote Sensing.

A limitation of this study is that values located between the extremes of “high dimensionality” and “low dimensionality” were not investigated for the possibility of any of these values being identified as a threshold between the anomalies of the types Unexpected structure and structural components and Unexpected structural component, in relation to the application of the methodology proposed by this study. These values were not investigated because this type of investigation is outside the scope of this study’s proposal. Thus, as future work, it is suggested to investigate if there is a threshold for dimensionality reduction, which defines the detection limit between these two types of anomalies.

It is expected that the results and outcomes achieved in this study will also contribute to other research related to machine learning. In addition, it is also expected that this study will stimulate other researchers to investigate the application of anomaly detection based on Kittler’s Taxonomy not only for Remote Sensing, but also for other areas of knowledge.

Author Contributions

Conceptualization, G.C.M., W.E.M.J., M.A.D., D.M.E., R.G.N. and W.C.; Funding acquisition, M.A.D., G.C.M. and D.M.E.; Investigation, G.C.M., M.A.D. and D.M.E.; Methodology, G.C.M., M.A.D. and D.M.E.; Resources, G.C.M., W.E.M.J., M.A.D., D.M.E., R.G.N. and W.C.; Validation, G.C.M.; writing—original draft, G.C.M., M.A.D. and D.M.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the São Paulo Research Foundation (FAPESP), grants #2020/06477-7, #2016/24185-8, #2021/01305-6, and #2021/03328-3 and the National Council for Scientific and Technological Development (CNPq), grants #316228/2021-4 and #305220/2022-5.

Data Availability Statement

The Landsat 8 dataset was provided by the United States Geological Survey (USGS): https://earthexplorer.usgs.gov/ (accessed on 5 August 2023). The QGIS software was provided by the QGIS Development Team: https://www.qgis.org/ (accessed on 5 August 2023). The Orfeo ToolBox was provided by the OTB Communities Development Team: https://www.orfeo-toolbox.org/ (accessed on 5 August 2023). The Python Programming Language was provided by Python Software Foundation: https://www.python.org/ (accessed on 5 August 2023). The Google Colaboratory was provided by Google: https://colab.google/ (accessed on 5 August 2023). The outputs (classification and extracted features) of this study can be found in the following Google Drive repository: https://drive.google.com/drive/folders/1eReaq0KhmCO-JLQz54tOJHHFqxS9Tbns?usp=sharing (accessed on 13 August 2023).

Acknowledgments

We thank the United States Geological Survey (USGS) for providing Landsat 8 dataset (https://earthexplorer.usgs.gov/ (accessed on 5 August 2023)); the QGIS Development Team for providing QGIS software (https://www.qgis.org/ (accessed on 5 August 2023)); the Open-source Geospatial and the OTB Communities for providing the Orfeo ToolBox (https://www.orfeo-toolbox.org/ (accessed on 5 August 2023)); the National Water Agency (ANA) for providing water resources information from Brazil (https://www.gov.br/ana/pt-br (accessed on 25 March 2023)). This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Processing Parameters

Appendix A.1. Pre-Processing

In the steps involving the pre-processing, the QGis software (version 2.18.19—Las Palmas) [37] and the toolbox Orfeo (version 6.4.0) [41] were used.

From the seven bands of the Landsat 8 satellite image imported into QGis as raster layers in band order 7 to 1, the tool Build Virtual Raster (Catalog) (found in the Raster menu, under the option Miscellaneous) was used. The visible raster layers (the seven selected bands) were chosen as input and the Separate option was checked so that each source raster was stored in a separate band of the output (virtual) raster.

With regard to the raster rendering settings used in this study, these can be defined in the Properties option of the right-click menu on the respective raster to be configured. In this option, when selecting the Style menu, Band 4 was defined as Red band, Band 3 as Green band and Band 2 as Blue band. Additionally, the options Mean +/- standard deviation x (the value 2.00 was set) and Clip extent to canvas were selected and applied.

To perform the pan-sharpening, the toolbox Orfeo was used. In the Geometry section, the Superimpose sensor option is responsible for performing the projection. In this step, the reference input was defined as the raster of band 8 (panchromatic image) and the image to be projected as the previously created virtual raster (multi-spectral image). The default values were kept, i.e., 0 for Default elevation, 4 for Spacing of the deformation field, default Mode, and Interpolation of type Nearest Neighbor—NN.

Then, merging was performed by the Pansharpening (rcs) option of the toolbox. In this step, the panchromatic image and the result of the Superimpose sensor were selected as input to produce the image resulting from the pre-processing.

For the remaining steps, described in the following subsections, the software QGis (version 3.22.12, Białowieża) [45] and the toolbox Orfeo (version 7.4.0) [46] were used.

Appendix A.2. Dimensionality Reduction

Dimensionality reduction was performed by the option DimensionalityReduction from the Image Filtering section of the Orfeo toolbox. This stage received as input the image resulting from pre-processing and used the following parameters.

Rescale Output: no;
Algorithm: pca;
Option Perform pca whitening: checked (True);
Number of Components: 0, indicating that all components will be kept;
Option Center and reduce data: unchecked (False).
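The effect of these settings can be illustrated with a minimal NumPy sketch of PCA with whitening in which all components are kept. This is not the Orfeo ToolBox implementation: the function name pca_whiten and the synthetic seven-band pixels are hypothetical, and we assume that leaving Center and reduce data unchecked only means that the bands are not standardized beforehand, while the mean subtraction that PCA itself requires is still performed.

```python
import numpy as np


def pca_whiten(pixels):
    """PCA with whitening on an (n_pixels, n_bands) array, keeping all components."""
    # Subtract the per-band mean (required by PCA). No division by the
    # per-band standard deviation: that is our reading of the unchecked
    # "Center and reduce data" option (an assumption).
    centred = pixels - pixels.mean(axis=0)
    covariance = np.cov(centred, rowvar=False)
    # eigh returns eigenvalues in ascending order; sort them descending so
    # that the first component carries the largest variance.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    # Project onto the principal axes, then whiten: each component is
    # divided by its standard deviation so that all have unit variance.
    scores = centred @ eigenvectors
    return scores / np.sqrt(eigenvalues)


# Synthetic, correlated seven-band pixels (illustrative data only).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(7, 7))
pixels = rng.normal(size=(5000, 7)) @ mixing + 10.0
components = pca_whiten(pixels)
```

After whitening, every component has unit variance and the components are mutually uncorrelated, so the original differences in scale between bands are no longer visible in the transformed image.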

Appendix A.3. Classification

The second-order statistics, used in the classification, were computed by the ComputeImageStatistics option of the Learning section of the toolbox Orfeo. This step was performed for the image resulting from pre-processing and, later, for the image resulting from PCA, since classifications were performed on both images and it is necessary to know the statistical values of these two images.
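As a rough illustration of what such a statistics file contains, the sketch below computes the per-band mean and standard deviation of an image and uses them to standardize the pixels. It is only a sketch under stated assumptions: the function names and the random image are hypothetical, the population standard deviation is assumed, and we assume that the statistics serve to normalize the samples before training.

```python
import numpy as np


def band_statistics(image):
    """Per-band mean and standard deviation of a (rows, cols, bands) array."""
    flat = image.reshape(-1, image.shape[-1]).astype(float)
    # Population standard deviation (ddof=0) is an assumption.
    return flat.mean(axis=0), flat.std(axis=0)


def standardize(image, mean, std):
    """Shift and scale every band to zero mean and unit standard deviation."""
    return (image - mean) / std


# Hypothetical 4 x 5 image with seven bands (illustrative values only).
rng = np.random.default_rng(0)
image = rng.uniform(0, 65535, size=(4, 5, 7))
mean, std = band_statistics(image)
scaled = standardize(image, mean, std)
```

Because the pre-processing image and the PCA image have different value ranges, each needs its own statistics, which is why the step is repeated for both images.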

For training the two classifiers (Boost and DT) on the two images, the option TrainImagesClassifier from toolbox Orfeo was used. This step received as input the respective image (pre-processing result or PCA), the file of the samples collected in the sampling and the respective image statistics file. Most of the default values were kept; the parameters used were the following.

Maximum training sample size per class: 1000;
Maximum validation sample size per class: 1000;
Bound sample number by minimum: 1;
Training and validation sample ratio: 0.5;
Name of the discrimination field (the name of the field containing the classes in the collected samples file, a shapefile): Class;
Default elevation: 0;
Random seed: 0.

The classifier-specific parameters were kept at their standard values. For Boost:

Boost type: real;
Weak count: 100;
Weight Trim Rate: 0.95;
Maximum depth of the tree: 1.

As for the DT:

Maximum depth of the tree: 65,535;
Minimum number of samples in each node: 10;
Termination criteria for regression tree: 0.01;
Cluster possible values of a categorical variable into K <= cat clusters to find a suboptimal split: 10;
Option Set Use1seRule flag to false: checked (True);
Option Set TruncatePrunedTree flag to false: checked (True).

At the end of this stage, a confusion matrix and model were generated for each classifier associated with each image.

Classification was performed using the ImageClassifier option, also from toolbox Orfeo, which received as input the respective image (result of pre-processing or PCA), the respective model previously created for the two classifiers (Boost or DT) and the image statistics file, resulting in the classifications. The default values of this version were maintained.

Appendix A.4. Feature Extraction

For feature extraction, the Zonal Statistics tool available in the Raster Analysis section of Processing Toolbox in QGis was used. This tool was executed once for each band or component of the image to calculate the following statistics: mean, median, standard deviation, minimum, maximum, and variance. The layer created in the sampling step defined the regions of the image in which these values were computed, so that the six statistics were obtained for every combination of polygon (sample) and band or component.
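The six statistics can be sketched with the Python standard library as follows. This is an illustration rather than the QGis implementation: the function name, the polygon and band labels and the pixel values are hypothetical, and the population forms of the standard deviation and variance are an assumption.

```python
import statistics


def zonal_stats(values):
    """Six summary statistics for the pixel values inside one polygon."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # Population forms are assumed here; the sample forms would use
        # statistics.stdev and statistics.variance instead.
        "stdev": statistics.pstdev(values),
        "min": min(values),
        "max": max(values),
        "variance": statistics.pvariance(values),
    }


# Hypothetical pixel values, keyed by (polygon, band or component).
samples = {
    ("polygon_1", "band_1"): [102, 98, 110, 105],
    ("polygon_1", "band_2"): [87, 91, 90, 85],
}
features = {key: zonal_stats(values) for key, values in samples.items()}
```

Each polygon therefore contributes six values per band or component, which form the feature vectors used in the visualizations.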

The final file of this step was used to generate the visualizations, which were developed in Python within the Google Colaboratory development environment.

References

1. Tiouiouine, A.; Yameogo, S.; Valles, V.; Barbiero, L.; Dassonville, F.; Moulin, M.; Bouramtane, T.; Bahaj, T.; Morarech, M.; Kacimi, I. Dimension Reduction and Analysis of a 10-Year Physicochemical and Biological Water Database Applied to Water Resources Intended for Human Consumption in the Provence-Alpes-Côte d’Azur Region, France. Water 2020, 12, 525.
2. Wang, G.; Lauri, F.; Hajjam El Hassani, A. A Study of Dimensionality Reduction’s Influence on Heart Disease Prediction. In Proceedings of the 2021 12th International Conference on Information, Intelligence, Systems & Applications (IISA), Chania Crete, Greece, 12–14 July 2021; pp. 1–6.
3. Sameer, Y.M.; Abed, A.N.; Sayl, K.N. Geomatics-based approach for highway route selection. Appl. Geomat. 2023, 15, 161–176.
4. Fowler, J.E.; Du, Q.; Zhu, W.; Younan, N.H. Classification performance of random-projection-based dimensionality reduction of hyperspectral imagery. In Proceedings of the 2009 IEEE International Geoscience and Remote Sensing Symposium, Cape Town, South Africa, 12–17 July 2009; Volume 5, pp. V-76–V-79.
5. Ghosh, S.; Pramanik, P. A Combined Framework for Dimensionality Reduction of Hyperspectral Images using Feature Selection and Feature Extraction. In Proceedings of the 2019 IEEE Recent Advances in Geoscience and Remote Sensing: Technologies, Standards and Applications (TENGARSS), Kochi, India, 17–20 October 2019; pp. 39–44.
6. Bilius, L.B.; Pentiuc, S.G. Tensor-Based and Projection-Based Methods for Dimensionality Reduction of Hyperspectral Images. In Proceedings of the 2022 International Conference on Development and Application Systems (DAS), Suceava, Romania, 26–28 May 2022; pp. 167–170.
7. Sellami, A.; Farah, M. Comparative study of dimensionality reduction methods for remote sensing images interpretation. In Proceedings of the 2018 4th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP), Sousse, Tunisia, 21–24 March 2018; pp. 1–6.
8. Avramovic, A.; Risojevic, V. Descriptor dimensionality reduction for aerial image classification. In Proceedings of the 2011 18th International Conference on Systems, Signals and Image Processing, Sarajevo, Bosnia, 16–18 June 2011; pp. 1–4.
9. Journaux, L.; Tizon, X.; Foucherot, I.; Gouton, P. Dimensionality Reduction Techniques: An Operational Comparison On Multispectral Satellite Images Using Unsupervised Clustering. In Proceedings of the 7th Nordic Signal Processing Symposium - NORSIG 2006, Reykjavik, Iceland, 7–9 June 2006; pp. 242–245.
10. Grobler, T.; Kleynhans, W.; Salmon, B. Empirically Comparing Two Dimensionality Reduction Techniques – PCA and FFT: A Settlement Detection Case Study in the Gauteng Province of South Africa. In Proceedings of the IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 28 July–2 August 2019; pp. 3329–3332.
11. Navin, M.S.; Agilandeeswari, L.; Anjaneyulu, G. Dimensionality Reduction and Vegetation Monitoring On LISS III Satellite Image Using Principal Component Analysis and Normalized Difference Vegetation Index. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February 2020; pp. 1–5.
12. Yang, X.; Xu, W.d.; Liu, H.; Zhu, L.y. Research on Dimensionality Reduction of Hyperspectral Image under Close Range. In Proceedings of the 2019 International Conference on Communications, Information System and Computer Engineering (CISCE), Haikou, China, 5–7 July 2019; pp. 171–174.
13. Zhang, X.; Huyan, N.; Zhou, N.; An, J. Semi-supervised sparse dimensionality reduction for hyperspectral image classification. In Proceedings of the 2016 IEEE Region 10 Conference (TENCON), Singapore, 22–25 November 2016; pp. 2830–2833.
14. Gu, Y.; Wang, Q. Discriminative graph-based dimensionality reduction for hyperspectral image classification. In Proceedings of the 2016 8th Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS), Los Angeles, CA, USA, 21–24 August 2016; pp. 1–5.
15. Liang, L.; Xia, Y.; Xun, L.; Yan, Q.; Zhang, D. Class-Probability Based Semi-Supervised Dimensionality Reduction for Hyperspectral Images. In Proceedings of the 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 23–25 November 2018; pp. 460–463.
16. Kittler, J.; Christmas, W.; de Campos, T.; Windridge, D.; Yan, F.; Illingworth, J.; Osman, M. Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 845–859.
17. Dias, M.A.; Silva, E.A.d.; Azevedo, S.C.d.; Casaca, W.; Statella, T.; Negri, R.G. An Incongruence-Based Anomaly Detection Strategy for Analyzing Water Pollution in Images from Remote Sensing. Remote Sens. 2020, 12, 43.
18. Dias, M.A.; Marinho, G.C.; Negri, R.G.; Casaca, W.; Muñoz, I.B.; Eler, D.M. A Machine Learning Strategy Based on Kittler’s Taxonomy to Detect Anomalies and Recognize Contexts Applied to Monitor Water Bodies in Environments. Remote Sens. 2022, 14, 2222.
19. USGS - The United States Geological Survey, “Earth Explorer”. Available online: https://earthexplorer.usgs.gov/ (accessed on 5 August 2023).
20. Marcílio-Jr, W.E.; Eler, D.M. Explaining dimensionality reduction results using Shapley values. Expert Syst. Appl. 2021, 178, 115020.
21. Crosta, A. Processamento Digital de Imagens de Sensoriamento Remoto; UNICAMP/Instituto de Geociências: Campinas, Brazil, 1999.
22. Bishop, C.M. Pattern Recognition and Machine Learning (Information Science and Statistics); Springer: Berlin/Heidelberg, Germany, 2006.
23. Richards, J.; Jia, X. Remote Sensing Digital Image Analysis; Springer: Berlin/Heidelberg, Germany, 1999.
24. Boosting - OpenCV Documentation. Available online: https://docs.opencv.org/2.4/modules/ml/doc/boosting.html (accessed on 11 March 2023).
25. Decision Trees - OpenCV Documentation. Available online: https://docs.opencv.org/2.4/modules/ml/doc/decision_trees.html (accessed on 11 March 2023).
26. Weinshall, D.; Zweig, A.; Hermansky, H.; Kombrink, S.; Ohl, F.W.; Anemüller, J.; Bach, J.H.; Van Gool, L.; Nater, F.; Pajdla, T.; et al. Beyond Novelty Detection: Incongruent Events, When General and Specific Classifiers Disagree. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 1886–1901.
27. Kittler, J.; Zor, C. A measure of surprise for incongruence detection. In Proceedings of the 2nd IET International Conference on Intelligent Signal Processing 2015 (ISP), London, UK, 1–2 December 2015; pp. 1–6.
28. Ponti, M.; Kittler, J.; Riva, M.; de Campos, T.; Zor, C. A decision cognizant Kullback–Leibler divergence. Pattern Recognit. 2017, 61, 470–478.
29. Kittler, J.; Zor, C. Delta Divergence: A Novel Decision Cognizant Measure of Classifier Incongruence. IEEE Trans. Cybern. 2019, 49, 2331–2343.
30. United Nations Department of Economic and Social Affairs (UN DESA). The Sustainable Development Goals Report 2022–July 2022; United Nations Publications: New York, NY, USA, 2022.
31. Assaf, A.T.; Sayl, K.N.; Adham, A. Surface Water Detection Method for Water Resources Management. J. Phys. Conf. Ser. 2021, 1973, 012149.
32. Adham, A.; Sayl, K.N.; Abed, R.; Abdeladhim, M.A.; Wesseling, J.G.; Riksen, M.; Fleskens, L.; Karim, U.; Ritsema, C.J. A GIS-based approach for identifying potential sites for harvesting rainwater in the Western Desert of Iraq. Int. Soil Water Conserv. Res. 2018, 6, 297–304.
33. Shen, J.; Li, J.; Zhang, Y.; Song, J. Farmers’ Water Poverty Measurement and Analysis of Endogenous Drivers. Water Resour. Manag. 2023, 1–18.
34. Sulaiman, S.O.; Kamel, A.H.; Sayl, K.N.; Alfadhel, M.Y. Water resources management and sustainability over the Western desert of Iraq. Environ. Earth Sci. 2019, 78, 495.
35. Sayl, K.N.; Muhammad, N.S.; Yaseen, Z.M.; El-shafie, A. Estimation the Physical Variables of Rainwater Harvesting System Using Integrated GIS-Based Remote Sensing Approach. Water Resour. Manag. 2016, 30, 3299–3313.
36. Gonzalez, R.C.; Wintz, P. Digital Image Processing, 2nd ed.; Addison-Wesley Longman Publishing Co., Inc.: Boston, MA, USA, 1987.
37. Documentation for QGIS 2.18. Available online: https://docs.qgis.org/2.18/en/docs/ (accessed on 11 March 2023).
38. Vivone, G.; Alparone, L.; Chanussot, J.; Dalla Mura, M.; Garzelli, A.; Licciardi, G.A.; Restaino, R.; Wald, L. A Critical Comparison Among Pansharpening Algorithms. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2565–2586.
39. Mhangara, P.; Mapurisa, W.; Mudau, N. Comparison of Image Fusion Techniques Using Satellite Pour l’Observation de la Terre (SPOT) 6 Satellite Imagery. Appl. Sci. 2020, 10, 1881.
40. Xu, Q.; Li, B.; Zhang, Y.; Ding, L. High-Fidelity Component Substitution Pansharpening by the Fitting of Substitution Data. IEEE Trans. Geosci. Remote Sens. 2014, 52, 7380–7392.
41. Documentation for Orfeo ToolBox 6.4. Available online: https://www.orfeo-toolbox.org/CookBook-6.4/ (accessed on 11 March 2023).
42. Shen, L.; Li, C. Water body extraction from Landsat ETM+ imagery using adaboost algorithm. In Proceedings of the 2010 18th International Conference on Geoinformatics, Beijing, China, 18–20 June 2010; pp. 1–4.
43. Yi-bin, L.; Ying-ying, W.; Xue-wen, R. Improvement of ID3 algorithm based on simplified information entropy and coordination degree. In Proceedings of the 2017 Chinese Automation Congress (CAC), Jinan, China, 20–22 October 2017; pp. 1526–1530.
44. Swain, P.H.; Hauska, H. The decision tree classifier: Design and potential. IEEE Trans. Geosci. Electron. 1977, 15, 142–147.
45. Documentation for QGIS 3.22. Available online: https://docs.qgis.org/3.22/en/docs/ (accessed on 11 March 2023).
46. Documentation for Orfeo ToolBox 7.4. Available online: https://www.orfeo-toolbox.org/CookBook-7.4/ (accessed on 11 March 2023).

Figure 1. Information related to the study area: (a) location in South America; (b) part of the hydrographic map of Brazil—the densest region indicates the Doce River basin; (c) image of the study area in the composition R(4)G(3)B(2).

Figure 2. Sequence of steps of the proposed methodology. The step used for dimensionality reduction is highlighted. From this stage, the methodology proposed in this study differs from the ADS-KT presented in [17].

Figure 3. The red dots, in (a), indicate the locations of the collected samples in the image. Examples of water (b) and non-water (c) samples selected in the image.

Figure 4. Joyplots obtained considering the features extracted on (a) the pre-processing image and (b) the PCA image. In each visualization, a distribution is generated considering a metric for each band or component.

Figure 5. Boxplots obtained considering the features extracted on (a) the pre-processing image and (b) the PCA image, totaling six views for each image. Each visualization considers one of the metrics used for each band or component.

Figure 6. Heatmaps obtained from the cosine similarity between (a) the pre-processing image and the PCA image, (b) only the pre-processing image, and (c) just the PCA image. Values close to 1 indicate greater similarity.

Figure 7. Part of the results obtained on (a) the pre-processing image: (b) contextual classification (Boost) and (c) non-contextual classification (Decision tree), and on (d) the PCA image: (e) contextual classification and (f) non-contextual classification. The red arrow in (a) indicates a region of the river that was affected by the environmental disaster, and the blue arrow indicates a lake not affected by the event.

Table 1. Confusion matrices for the classifiers.

Image	Classifier	Reference label	Produced label 0 (no-water)	Produced label 1 (water)
Pre-processing	DT	0 (no-water)	6190	129
Pre-processing	DT	1 (water)	107	6212
Pre-processing	Boost	0 (no-water)	6245	74
Pre-processing	Boost	1 (water)	130	6189
PCA	DT	0 (no-water)	6224	95
PCA	DT	1 (water)	62	6257
PCA	Boost	0 (no-water)	6195	124
PCA	Boost	1 (water)	94	6225
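
Summary metrics can be derived from the confusion matrices in Table 1 with a short Python sketch such as the one below. The counts are copied from the table; the choice of metrics (overall accuracy, precision and recall for the water class, and Cohen's kappa) and the names TABLE_1 and summarize are ours and are only meant as an illustration of how the matrices can be read.

```python
# Confusion matrices from Table 1. Rows are reference labels and columns are
# produced labels, both in the order (0 = no-water, 1 = water).
TABLE_1 = {
    ("Pre-processing", "DT"): [[6190, 129], [107, 6212]],
    ("Pre-processing", "Boost"): [[6245, 74], [130, 6189]],
    ("PCA", "DT"): [[6224, 95], [62, 6257]],
    ("PCA", "Boost"): [[6195, 124], [94, 6225]],
}


def summarize(matrix):
    """Derive summary metrics from a 2 x 2 confusion matrix (water = positive)."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Cohen's kappa: agreement corrected for the agreement expected by chance.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total**2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "kappa": kappa,
    }


for (image, classifier), matrix in TABLE_1.items():
    metrics = summarize(matrix)
    line = ", ".join(f"{name}={value:.4f}" for name, value in metrics.items())
    print(f"{image} / {classifier}: {line}")
```

Because every matrix contains the same number of reference samples per class (6319), the chance agreement is 0.5 in all four cases, so kappa reduces to twice the accuracy minus one.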


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

