Article

A Texture-Considerate Convolutional Neural Network Approach for Color Consistency in Remote Sensing Imagery

Institute for Geography and Spatial Information, Zhejiang University, Hangzhou 310027, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(17), 3269; https://doi.org/10.3390/rs16173269
Submission received: 16 May 2024 / Revised: 29 July 2024 / Accepted: 2 September 2024 / Published: 3 September 2024
(This article belongs to the Section Remote Sensing Image Processing)

Abstract

Remote sensing allows us to conduct large-scale scientific studies that require extensive mapping and the amalgamation of numerous images. However, owing to variations in radiation, atmospheric conditions, sensor perspectives, and land cover, significant color discrepancies often arise between different images, necessitating color consistency adjustments for effective image mosaicking and applications. Existing methods for color consistency adjustment in remote sensing images struggle with complex one-to-many nonlinear color-mapping relationships, often resulting in texture distortions. To address these challenges, this study proposes a convolutional neural network-based color consistency method for remote sensing cartography that considers both global and local color mapping and texture mapping constrained by the source domain. This method effectively handles complex color-mapping relationships while minimizing texture distortions in the target image. Comparative experiments on remote sensing images from different times, sensors, and resolutions demonstrated that our method achieved superior color consistency, preserved fine texture details, and provided visually appealing outcomes, assisting in generating large-area data products.

1. Introduction

Satellite remote sensing serves as an unparalleled tool for understanding the Earth and provides crucial data for large-scale global change monitoring and expansive scientific research. This technology is essential for applications such as high-speed ship detection in synthetic aperture radar (SAR) images, which enables the tracking of maritime activities with great precision [1]. Additionally, it plays a significant role in the land cover classification of multispectral imagery, aiding in the assessment and management of natural resources, and in building extraction from high-resolution remote sensing images, which is vital for urban planning and development [2,3]. Owing to the different functions and orbits of different satellite sensors, individual images can cover only specific terrestrial regions [2,4,5]. Consequently, large-area remote sensing analysis inevitably involves creating a mosaic of multisource images that encompasses diverse resolutions, sensors, and temporal phases. During the optical image mosaicking process, substantial color discrepancies often arise among different images due to varying radiation environments, atmospheric conditions, sensor perspectives, and changes in land cover. These discrepancies lead to visual incoherence in the mosaicked output, which can impede data interpretation and feature extraction, thereby underscoring the importance of research on color consistency. This study aims to address the issue of color consistency in multisource optical imagery, which is a fundamental research task with significant scholarly relevance and practical value that aids in the generation of large-area data products.
Remote sensing technologies have rapidly advanced owing to increasing demand, and large-scale cartography has become essential for regional studies and analyses [6,7,8]. Concurrently, there has been a rapid increase in the volume and resolution of image data and a reduction in swath width. This has led to a substantial increase in the number of images requiring mosaicking, thereby increasing the demand for color consistency during image stitching. Various methods have been explored to meet these practical requirements. Currently, the primary approaches to achieving image color consistency can be categorized into statistical analyses, parametric models, and machine learning [9,10]. Statistical-analysis-based methods for color consistency assume that the mean of remote sensing images represents their hue and brightness, and the variance represents contrast and clarity. By manipulating these statistical properties, these methods aim to minimize inconsistencies in brightness and contrast between images [9,11,12]. Typical algorithms include Wallis filtering [13,14,15] and histogram matching [15,16,17,18,19,20]. Parametric model-based color consistency methods rely on the color correspondence between reliable pixel pairs in overlapping image regions. They represent these relationships and optimize the model based on these correspondences [9,11,21,22,23,24]. The key challenge in these methods is the selection of reliable color correspondences because satellite images are subject to complex nonlinear relationships owing to atmospheric disturbances and changes in land cover, necessitating the identification of representative relationships to avoid clearly variable features [9,11,23]. Machine-learning-based methods for color consistency have evolved from content generated by artificial intelligence, particularly style transfer, to facilitate color mapping between images with similar semantic structures [25]. Tasar et al. [25] and Benjdira et al. [26] proposed color-mapping generative adversarial networks for urban datasets that can generate fake training images that are semantically identical to the training images but have spectral distributions similar to the test images, thus achieving color consistency between images. Additionally, Li et al. [27] designed more robust “U-shaped” attention mechanisms for remote sensing images, integrating them into generative adversarial networks to better minimize color distribution differences between images.
Despite their effectiveness in certain applications, these methods struggle with the complexity of one-to-many nonlinear color mapping, owing to the diverse and complex ground surfaces captured in remote sensing images. Moreover, existing approaches lack targeted protection of texture information during the mapping process, often leading to distortion of texture details. Therefore, this study addresses these challenges by developing a texture-considerate convolutional neural network (CNN) approach for color consistency in remote sensing imagery. This approach integrates a color-mapping strategy that addresses both global and local aspects, harnessing the capabilities of CNNs for fitting complex nonlinear relationships and extracting sophisticated spatial features, and establishes intricate one-to-many color mappings among multisource images within overlapping regions. It further constrains the color consistency process using texture information from the source domain by exploiting the texture-feature extraction capabilities of the CNN. By imposing constraints based on the original texture of the images requiring correction, this method achieves better color-consistency outcomes.

2. Materials and Methods

2.1. Overview of the Proposed Method

We propose a texture-considerate CNN approach to address two problems caused by oversimplified color-mapping relationships: inaccurate color mapping and the absence of consistency constraints on texture details, which leads to texture distortions. As illustrated in Figure 1, the method comprises two components: color mapping and texture mapping. Initially, the color mapping network used the overlapping area of the target image as the input and the overlapping area of the reference image as the label. Through the color-mapping network, complex one-to-many color mappings were achieved based on color characteristics within a specific spatial context to closely approximate the color tone of the reference image. The color loss values between the color mapping outputs and the overlapping area of the reference image were then calculated. Subsequently, using the output of the color mapping network as the input and the target image as the label, the process passed through the texture mapping network. The texture mapping network, which leverages the reversibility of color mapping, reconstructed the target image to deepen texture processing, ensuring that the texture mapping results closely resembled the original image, with texture loss values computed accordingly. The combined loss functions from the color and texture mapping calculations served as a comprehensive loss function for training the entire network. This integrated approach dynamically adjusted the color-mapping outputs to minimize the overall error, allowing preservation of the source-domain texture information while achieving a color resemblance to the reference image. The final color mapping outputs therefore reflect both the color and texture mapping effects.
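To make the two-stage training flow concrete, the following PyTorch-style sketch outlines one optimization step under our reading of Figure 1. It is illustrative only and not the authors' released code: the network modules, loss callables, and function names are hypothetical placeholders standing in for the components described in Sections 2.2–2.4.

```python
import torch

def training_step(color_net, texture_net, target_patch, reference_patch,
                  color_loss_fn, texture_loss_fn, fuse_losses, optimizer):
    """One joint optimization step of the two-stage pipeline (illustrative).

    color_net   -- maps the target-image patch toward the reference colors
    texture_net -- reconstructs the target patch from the color-mapped output,
                   exploiting the reversibility of the color mapping
    fuse_losses -- dynamic weighting of the two loss terms (cf. Equation (6))
    """
    optimizer.zero_grad()

    # Stage 1: color mapping, supervised by the overlapping reference patch
    color_mapped = color_net(target_patch)
    loss_color = color_loss_fn(color_mapped, reference_patch)    # cf. Equation (5)

    # Stage 2: texture mapping, supervised by the original target patch
    reconstructed = texture_net(color_mapped)
    loss_texture = texture_loss_fn(reconstructed, target_patch)  # cf. Equation (4)

    # The comprehensive loss trains both networks jointly
    loss = fuse_losses(loss_color, loss_texture)                 # cf. Equation (6)
    loss.backward()
    optimizer.step()
    return color_mapped.detach(), loss.item()
```

Both networks are updated through the single comprehensive loss, matching the description that the combined loss trains the entire network.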

2.2. Color Mapping Network

To ensure high-quality color mapping across multiple scales, we considered the scale variability of land object color changes, sensitivity to color-mapping details, and adaptability of the model. Based on the UNet3+ architecture, a three-layer structure was designed for color mapping. This structure enables the model to capture essential detailed features while maintaining sensitivity and adaptability to features across various scales. This approach ensures the accuracy of color mapping and enhances the generalization capabilities of the model for remote sensing image-processing tasks at different scales, as illustrated in Figure 2.
When applying the three-layer UNet3+ model for color mapping, the number of input channels must be considered. The perception of color is influenced not only by the information within individual channels, but also by the interaction among multiple channels. Given that the RGB model represents a standard color space utilizing combinations of red, green, and blue channels to express nearly all colors found in nature, we employed a three-channel approach to enhance the dimensions of color mapping. This ensures that the model accounts for variations within individual channels while integratively handling the interactions between multiple channels, thereby enhancing mapping quality.
In this network architecture, each pixel value in each channel is treated as a distinct category. The target image and the corresponding pixel values from the reference image, which serve as labels, are mapped one to one. The classification process based on the three-layer UNet3+ model establishes the color-mapping relationships. Within this model, the convolutional structure consists of 128 3 × 3 filters with a padding of 1 and a stride of 1. On the encoder side, each encoder $X_{En}^{i}$ passes through two convolution operations (each followed by batch normalization and a ReLU activation function) and a 2 × 2 pooling operation to become the encoder $X_{En}^{i+1}$, gradually extracting image features, including both color and texture information. On the decoder side, the full-scale skip connections in the UNet3+ model establish direct links between the encoders and decoders at different scales. Each decoder layer integrates feature maps from the same and smaller scales, as well as from larger scales in the encoder, capturing coarse- and fine-grained information across all scales. Since color consistency processing involves local and global adjustments, full-scale skip connections enhance multiscale feature integration, enabling a deeper understanding of image details and structures for more precise color mapping. These skip connections facilitate the direct transfer of texture and color features between the encoder and decoder, reducing information loss during network training and preserving vital color and texture features within the image for higher-quality color consistency processing. To fuse shallow color information with deep texture information, a feature aggregation mechanism was applied to the concatenated feature maps across the three scales, adapting to the varying scales and complexities of the images and aiding in the handling of diverse types of ground objects. The decoder $X_{De}^{i}$ is defined as follows:
$$
X_{De}^{i}=
\begin{cases}
X_{En}^{i}, & i=N\\[1ex]
H\!\left(\left[\underbrace{C\!\left(D\!\left(X_{En}^{k}\right)\right)_{k=1}^{i-1},\,C\!\left(X_{En}^{i}\right)}_{\text{Scales: }1\text{th}\sim i\text{th}},\,\underbrace{C\!\left(U\!\left(X_{De}^{k}\right)\right)_{k=i+1}^{N}}_{\text{Scales: }(i+1)\text{th}\sim N\text{th}}\right]\right), & i=1,\ldots,N-1
\end{cases}
\tag{1}
$$
where $i$ indexes the downsampling layer along the encoder, $N$ is the total number of encoder layers (three in this study), $k$ indexes the encoder and decoder scales being fused, $C(\cdot)$ is a convolution operation, $H(\cdot)$ is the feature aggregation mechanism consisting of a convolution followed by batch normalization and a ReLU activation function, $D(\cdot)$ and $U(\cdot)$ are down- and upsampling operations, respectively, and $[\cdot]$ denotes concatenation.
Using multiscale integrated feature extraction, the three-layer UNet3+ model comprehensively considers the combined information of the features at different levels, allowing for an improved fit of the color and texture distributions within the images. This capability results in enhanced performance in achieving color consistency. The network architecture includes a designated loss function LossColor, which aims to enforce color consistency both globally and locally within images, as described in Section 2.4.
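As an illustration of this three-layer, full-scale-skip structure, the sketch below implements a UNet3+-style module in PyTorch. It follows the stated configuration (128 filters of 3 × 3 with padding 1 and stride 1, batch normalization, ReLU, 2 × 2 pooling, and the aggregation of Equation (1)), but it is our own simplified reconstruction rather than the authors' implementation; in particular, it regresses continuous RGB values for brevity instead of performing the per-pixel classification described above, and the upsampling mode is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(in_ch, out_ch, n_convs=2):
    """n_convs x (3x3 conv, padding 1, stride 1 -> BatchNorm -> ReLU)."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def skip_conv(width):
    """3x3 convolution applied to each re-scaled skip branch before fusion."""
    return nn.Conv2d(width, width, 3, padding=1)

class ThreeLayerUNet3Plus(nn.Module):
    """Simplified three-level UNet3+-style network with full-scale skip connections."""

    def __init__(self, in_ch=3, out_ch=3, width=128):
        super().__init__()
        self.enc1 = conv_block(in_ch, width)   # X_En^1, full resolution
        self.enc2 = conv_block(width, width)   # X_En^2, 1/2 resolution
        self.enc3 = conv_block(width, width)   # X_En^3, 1/4 resolution
        self.pool = nn.MaxPool2d(2)

        self.d2_e1, self.d2_e2, self.d2_d3 = skip_conv(width), skip_conv(width), skip_conv(width)
        self.d1_e1, self.d1_d2, self.d1_d3 = skip_conv(width), skip_conv(width), skip_conv(width)

        # feature aggregation H(.): convolution -> BatchNorm -> ReLU on the concatenation
        self.agg2 = conv_block(3 * width, width, n_convs=1)
        self.agg1 = conv_block(3 * width, width, n_convs=1)
        self.head = nn.Conv2d(width, out_ch, 1)

    def forward(self, x):
        up = lambda t, s: F.interpolate(t, scale_factor=s, mode="bilinear", align_corners=False)
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        d3 = e3                                          # X_De^3 = X_En^3 (i = N)
        d2 = self.agg2(torch.cat([self.d2_e1(self.pool(e1)),   # D(X_En^1)
                                  self.d2_e2(e2),               # X_En^2
                                  self.d2_d3(up(d3, 2))], 1))   # U(X_De^3)
        d1 = self.agg1(torch.cat([self.d1_e1(e1),
                                  self.d1_d2(up(d2, 2)),
                                  self.d1_d3(up(d3, 4))], 1))
        return torch.sigmoid(self.head(d1))              # color-mapped output in [0, 1]
```

In the proposed method, the same architecture would be instantiated twice, once as the color mapping network and once as the texture mapping network of Section 2.3.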

2.3. Texture Mapping Network

Despite the capacity of the three-layer UNet3+ model to implement one-to-many color mappings while reasonably preserving texture information, the inherent complexity and diversity of remote sensing data pose challenges in maintaining texture details. Consequently, a texture-mapping relationship model based on the reversible concept of color-mapping processing was established. Color consistency processing can map the color space of the target image to that of the reference image and reverse this mapping to restore the color space of the target image. This means that while achieving color consistency with the reference image, the reverse operation can preserve the texture information of the target image, thereby minimizing the loss of texture details during color consistency processing and enhancing texture information retention. Considering the capabilities of detailed mapping, multiscale adaptability, and multiscale feature integration, a texture-mapping network was designed based on a three-layer UNet3+ model. This approach enables the model to accurately capture texture details while maintaining its integrity as much as possible. Additionally, the uniformity of the model allows for the sharing of learned feature representations and mechanisms across different tasks, thereby ensuring close coordination between color mapping and texture mapping. This coordination helps maintain the coherence of the overall method, as illustrated in Figure 3.
Within this network architecture, each pixel value in each channel is treated as a distinct category. The color-mapping outputs and the pixel values of the target image, which serve as labels, are paired one to one. This classification process, executed using the three-layer UNet3+ model, establishes the texture-mapping relationships. The convolutional structure in this three-layer UNet3+ model aligns with that described in Section 2.2. In the encoder section on the left, each encoder $X_{En}^{i}$ passes through three convolution operations (each followed by batch normalization and a ReLU activation function) and a 2 × 2 pooling operation to become encoder $X_{En}^{i+1}$, gradually extracting texture features at various scales of the image. On the decoder side, the UNet3+ model’s full-scale skip connections facilitate the acquisition of texture information at different hierarchical levels, enhancing the network’s capability to capture local and global image features and thereby augmenting the sensitivity of the texture mapping network to image texture details. The feature aggregation mechanism, which combines and fuses features at different levels, enables the texture restoration network to better understand the textural information of an image. The composition of the decoder $X_{De}^{i}$ is defined by Equation (1).
Through multiscale texture feature extraction, the three-layer UNet3+ model effectively addresses texture distortions during color mapping, maintaining the original texture details and features of the source domain as much as possible throughout the process of achieving color consistency. The network architecture included a designated loss function, LossTexture, to constrain the consistency of the texture results with the source domain, as described in Section 2.4.

2.4. Comprehensive Loss Function

Color consistency processing must not only focus on the precision of one-to-many color mapping, but also consider the potential degradation of texture information. Therefore, we proposed a comprehensive loss function, LossC_T, which considers both color and texture. This function comprises color mapping loss, LossColor, and texture mapping loss, LossTexture, as illustrated in Figure 1.
For the color mapping loss, LossColor, the objective is to adjust the colors in the target image such that they resemble the colors in the reference image more closely. This includes adjusting both the individual pixel colors and the overall coloration of the image. To account for pixel- and image-level discrepancies and balance local and global variations, LossColor was formulated using a root mean square error function, LossP_RMSE, and a color distance function, LossG_CD. LossP_RMSE quantifies the difference between the predicted pixel values and their true values by adjusting each pixel color to closely match the corresponding pixel color in the reference image. The formula used is as follows:
$$\mathrm{Loss}_{P\_RMSE}=\left\|\left(I_{o}-I_{r}\right)^{2}\right\|\tag{2}$$

where $I_{o}$ is the output image, $I_{r}$ is the reference image, and $\left\|\cdot\right\|$ denotes the L1 norm operation.
LossG_CD constrains the global color characteristics of the image, maintains the dynamic range, and prevents most pixels from clustering at a few gray levels. It adjusts the color distribution at the macroscopic level to achieve overall tonal consistency. The formula is as follows:
$$\mathrm{Loss}_{G\_CD}=\sum_{I_{o}}\sum_{I_{r}}\bar{\omega}_{or}\,\frac{\Delta H\!\left(\hat{I}_{or},\hat{I}_{ro}\right)}{N_{bin}}\tag{3}$$

where $\hat{I}_{or}$ is the overlapping area of the output image $I_{o}$ with the reference image $I_{r}$, $\hat{I}_{ro}$ is the overlapping area of the reference image $I_{r}$ with the output image $I_{o}$, and $\bar{\omega}_{or}$ is a normalized weight proportional to the overlapping area between $I_{o}$ and $I_{r}$ ($\sum\bar{\omega}_{or}=1$). $\Delta H$ computes the difference between the image histograms, and $N_{bin}$ is the number of histogram bins.
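The two terms of LossColor can be illustrated with the short sketch below. The pixel-level term is written here as a standard RMSE and the global term as a normalized per-channel histogram difference; the exact normalization used by the authors may differ, the function names are ours, and because `torch.histc` is not differentiable, a practical training implementation would need a soft (differentiable) histogram.

```python
import torch

def loss_p_rmse(output, reference):
    """Pixel-level color term (cf. Equation (2)): penalizes per-pixel differences."""
    return torch.sqrt(torch.mean((output - reference) ** 2))

def loss_g_cd(output, reference, n_bins=256):
    """Global color-distance term (cf. Equation (3)): normalized per-channel
    histogram difference between two overlapping patches, divided by the number
    of bins. Assumes tensors shaped (B, C, H, W) with values in [0, 1]."""
    diff = 0.0
    for c in range(output.shape[1]):                 # loop over the RGB channels
        h_out = torch.histc(output[:, c], bins=n_bins, min=0.0, max=1.0)
        h_ref = torch.histc(reference[:, c], bins=n_bins, min=0.0, max=1.0)
        h_out = h_out / h_out.sum()
        h_ref = h_ref / h_ref.sum()
        diff = diff + torch.abs(h_out - h_ref).sum()
    return diff / (n_bins * output.shape[1])
```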
Regarding the texture mapping loss, because texture mapping aims to minimize the degradation of texture details during color mapping, the root mean square error (RMSE) function, LossTexture, was selected to measure the differences between the texture-mapped and target images. This function accurately quantifies these differences, particularly for complex textures with significant variability, thereby enhancing the mapping of critical texture information and reducing degradation. The formula used is as follows:
$$\mathrm{Loss}_{Texture}=\left\|\left(I_{t}-I_{t}^{\prime}\right)^{2}\right\|\tag{4}$$

where $I_{t}$ is the target image, $I_{t}^{\prime}$ is the image after texture mapping, and $\left\|\cdot\right\|$ denotes the L1 norm operation.
Given the dual phases of color and texture mapping within the network, each with distinct tasks, disparities often arise in the convergence rates and magnitudes of the loss functions during training. This can lead the network to prioritize dominant tasks over others. Therefore, to balance and integrate the two network models, we used a dynamic weighting fusion strategy. The color mapping network loss function LossColor is defined as follows:
$$
\mathrm{Loss}_{Color}=
\begin{cases}
\dfrac{\mathrm{Loss}_{P\_RMSE}}{\mathrm{Loss}_{G\_CD}}\times\mathrm{Loss}_{G\_CD}+\mathrm{Loss}_{P\_RMSE}, & \text{if }\mathrm{Loss}_{P\_RMSE}\ge\mathrm{Loss}_{G\_CD}\\[2ex]
\dfrac{\mathrm{Loss}_{G\_CD}}{\mathrm{Loss}_{P\_RMSE}}\times\mathrm{Loss}_{P\_RMSE}+\mathrm{Loss}_{G\_CD}, & \text{if }\mathrm{Loss}_{G\_CD}>\mathrm{Loss}_{P\_RMSE}
\end{cases}
\tag{5}
$$

where × denotes an integer operation, i.e., the preceding loss ratio is rounded to an integer before the multiplication.
LossC_T serves as the overall loss function for the entire network and is composed of LossColor and LossTexture. This ensured that the color mapping output, the final outcome of this methodology, was consistent with the coloration of the reference image while preserving good textural details. LossC_T is defined as follows:
$$
\mathrm{Loss}_{C\_T}=
\begin{cases}
\dfrac{\mathrm{Loss}_{Color}}{\mathrm{Loss}_{Texture}}\times\mathrm{Loss}_{Texture}+\mathrm{Loss}_{Color}, & \text{if }\mathrm{Loss}_{Color}\ge\mathrm{Loss}_{Texture}\\[2ex]
\dfrac{\mathrm{Loss}_{Texture}}{\mathrm{Loss}_{Color}}\times\mathrm{Loss}_{Color}+\mathrm{Loss}_{Texture}, & \text{if }\mathrm{Loss}_{Texture}>\mathrm{Loss}_{Color}
\end{cases}
\tag{6}
$$

where × denotes the same integer operation as in Equation (5).
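Equations (5) and (6) share the same dynamic weighting rule, which the following sketch expresses as a single helper. Our reading of the "integer operation" is that the loss ratio is rounded to an integer before multiplication; detaching the weight from the computation graph is our assumption rather than something stated in the paper.

```python
import torch

def fuse_losses(loss_a, loss_b):
    """Dynamic weighting fusion (cf. Equations (5) and (6)): the smaller loss is
    scaled up by the rounded ratio of the two magnitudes and added to the larger
    one, so that neither task dominates training. Expects scalar tensors."""
    ratio = (torch.max(loss_a, loss_b) / torch.min(loss_a, loss_b)).detach()
    weight = torch.clamp(torch.round(ratio), min=1.0)   # the "integer operation"
    if loss_a >= loss_b:
        return weight * loss_b + loss_a
    return weight * loss_a + loss_b

# LossColor = fuse_losses(pixel_level_loss, global_color_distance)  -- Equation (5)
# LossC_T   = fuse_losses(loss_color, loss_texture)                 -- Equation (6)
```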

2.5. Experiments

2.5.1. Experimental Design

Owing to the varying spectral characteristics and color distributions inherent to the data from different sensors and temporal phases, the effectiveness of the proposed method was evaluated through color-consistency processing experiments under six distinct scenarios, as shown in Table 1.

2.5.2. Comparative Methods and Experimental Environment

To comprehensively validate the effectiveness of the proposed method, comparative experiments were conducted using four representative methods: histogram matching and Wallis filtering (statistical methods), the model developed by Xia et al. [20] (a parametric model), and the CycleGAN method (machine learning). These methods were selected because of their established relevance and utility in addressing similar challenges in image processing. Both the proposed method and CycleGAN were trained on a dataset split 7:3 into training and testing sets, with the following hyperparameters: a batch size of 10, a learning rate of 0.0001, and a learning-rate decay of 0.8 every 20 epochs. Optimization used the Adam optimizer, a gradient-descent-based algorithm with an adaptive learning rate, with exponential decay rates of 0.9 and 0.999 and a weight decay of 1 × 10−8. The total training period was set to 300 epochs.
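The reported training configuration maps directly onto standard PyTorch components, as sketched below. The model definition is omitted (a placeholder stands in), and the sketch assumes that the 0.8 decay every 20 epochs refers to a step learning-rate schedule.

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Conv2d(3, 3, 3, padding=1)   # placeholder for the combined network

optimizer = Adam(model.parameters(),
                 lr=1e-4,                     # learning rate 0.0001
                 betas=(0.9, 0.999),          # exponential decay rates
                 weight_decay=1e-8)
scheduler = StepLR(optimizer, step_size=20, gamma=0.8)   # x0.8 every 20 epochs

num_epochs, batch_size = 300, 10
for epoch in range(num_epochs):
    # ... iterate over mini-batches of size `batch_size`, compute the
    # comprehensive loss, and call optimizer.step() for each batch ...
    scheduler.step()                          # decay the learning rate per schedule
```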
The experimental environment was as follows: Windows 10 operating system, Intel(R) Core(TM) i9-12900K processor, 64 GB RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of video memory. Under these conditions, modeling on 5000 image slices of 128 × 128 pixels takes approximately 4 h 21 min, whereas inference requires only 96 s.

2.5.3. Evaluation Metrics

To obtain a more comprehensive and reliable evaluation of the results, it is essential to integrate objective indicators and subjective visual evaluations. Objective indicators provide statistically significant data support and are characterized by their stability and quantifiable nature. However, these indicators do not fully capture the actual content of images or human visual perception. Subjective visual evaluations compensate for this limitation by offering insights into the overall effect, details, and content perception of images, thereby providing more guidance. Combining these assessments enables a more thorough and credible evaluation, ensuring accuracy and completeness. For a comprehensive evaluation of color and texture mapping, we designed a composite index (Equation (7)) that integrates six objective indicators: root mean square error (Equation (8)), structural similarity index (Equations (9)–(12)), color distance (Equation (13)), mean difference (Equation (14)), standard deviation difference (Equation (15)), and information entropy (Equation (16)).
$$
\begin{aligned}
CE=\;&0.2\times\frac{RMSE-RMSE_{min}}{RMSE_{max}-RMSE_{min}}+0.2\times\left(1-\frac{SSIM-SSIM_{min}}{SSIM_{max}-SSIM_{min}}\right)+0.15\times\frac{CD-CD_{min}}{CD_{max}-CD_{min}}\\
&+0.15\times\frac{\Delta\bar{f}-\Delta\bar{f}_{min}}{\Delta\bar{f}_{max}-\Delta\bar{f}_{min}}+0.15\times\frac{\Delta S-\Delta S_{min}}{\Delta S_{max}-\Delta S_{min}}+0.15\times\left(1-\frac{H-H_{min}}{H_{max}-H_{min}}\right)
\end{aligned}
\tag{7}
$$

where $X_{max}$ and $X_{min}$ are the maximum and minimum values of metric $X$ calculated across all compared methods.
$$RMSE=\sqrt{\frac{1}{M\times N}\sum_{i=0}^{M-1}\sum_{j=0}^{N-1}\left[f_{1}\left(i,j\right)-f_{2}\left(i,j\right)\right]^{2}}\tag{8}$$

where $M$ and $N$ are the width and height of the image, respectively, and $f_{1}(i,j)$ and $f_{2}(i,j)$ are the pixel values of the two compared images at position $(i,j)$.
$$SSIM=L\cdot S\cdot C\tag{9}$$

$$S=\frac{2\sigma_{x}\sigma_{y}+C_{2}}{\sigma_{x}^{2}+\sigma_{y}^{2}+C_{2}}\tag{10}$$

$$C=\frac{\sigma_{xy}+C_{3}}{\sigma_{x}\sigma_{y}+C_{3}}\tag{11}$$

$$L=\frac{2\mu_{x}\mu_{y}+C_{1}}{\mu_{x}^{2}+\mu_{y}^{2}+C_{1}}\tag{12}$$

where $L$, $S$, and $C$ are the luminance, contrast, and structure comparison functions, respectively; $n$ is the number of image partitions; $\mu_{x}$ and $\mu_{y}$ are the average luminosities of images $x$ and $y$; $\sigma_{x}$ and $\sigma_{y}$ are their standard deviations; $\sigma_{xy}$ is the covariance between images $x$ and $y$; and $C_{1}$, $C_{2}$, and $C_{3}$ are constants that prevent division by zero, typically $C_{1}=\left(k_{1}\cdot l\right)^{2}$, $C_{2}=\left(k_{2}\cdot l\right)^{2}$, and $C_{3}=C_{2}/2$, where $l$ is the dynamic range of the pixel values and $k_{1}$ and $k_{2}$ are usually set to 0.01 and 0.03, respectively.
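For reference, a single-window version of Equations (9)–(12) can be computed as follows. Practical SSIM is usually averaged over local windows (the image partitions mentioned above); this global variant is a simplification for illustration, with the `dynamic_range` argument playing the role of the pixel-value range in the constants.

```python
import numpy as np

def global_ssim(x, y, dynamic_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM following Eqs. (9)-(12): product of the luminance (L),
    contrast (S), and structure (C) terms computed over whole images given as
    NumPy arrays of identical shape."""
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    c3 = c2 / 2.0
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()

    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return luminance * contrast * structure
```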
$$CD=\sum_{I_{r}}\sum_{I_{t}}\bar{\omega}_{rt}\,\frac{\Delta H\!\left(\hat{I}_{rt},\hat{I}_{tr}\right)}{N_{bin}}\tag{13}$$

where $\hat{I}_{rt}$ is the overlapping region between the result image $I_{r}$ and the reference image $I_{t}$, and $\hat{I}_{tr}$ is the overlapping region between the reference image $I_{t}$ and the result image $I_{r}$. The normalized weight $\bar{\omega}_{rt}$, proportional to the overlapping area between $I_{r}$ and $I_{t}$, satisfies $\sum\bar{\omega}_{rt}=1$. $\Delta H$ is the difference between the histograms of the images, and $N_{bin}$ is the number of histogram bins.
$$\Delta\bar{f}=\left|\bar{f}_{1}-\bar{f}_{2}\right|\tag{14}$$

where $\bar{f}_{1}$ and $\bar{f}_{2}$ are the mean values of the two compared images, and $\left|\cdot\right|$ denotes the absolute value operation.
$$\Delta S=\left|S_{1}-S_{2}\right|\tag{15}$$

where $S_{1}$ and $S_{2}$ are the standard deviations of the two respective images, and $\left|\cdot\right|$ denotes the absolute value operation.
$$H=-\sum_{i=1}^{k}P_{i}\log_{2}P_{i}\tag{16}$$

where $k$ is the number of grayscale levels and $P_{i}$ is the probability of occurrence of the $i$-th grayscale level within the image, which can be approximated using histogram frequencies.
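Equation (7) can be computed once the six indicators have been collected for every compared method, as in the sketch below. The dictionary keys and function name are ours; SSIM and entropy are inverted after min-max normalization because larger values indicate better results, matching the (1 − ·) terms in Equation (7).

```python
import numpy as np

def composite_index(metrics_by_method, weights=None):
    """Composite evaluation index CE of Eq. (7): each metric is min-max
    normalized across all compared methods, SSIM and entropy (H) are inverted,
    and the normalized scores are combined with the stated weights.
    `metrics_by_method` maps a method name to a dict with keys
    'RMSE', 'SSIM', 'CD', 'dMean', 'dStd', 'H' (key names are ours)."""
    if weights is None:
        weights = {'RMSE': 0.2, 'SSIM': 0.2, 'CD': 0.15,
                   'dMean': 0.15, 'dStd': 0.15, 'H': 0.15}
    higher_is_better = {'SSIM', 'H'}
    methods = list(metrics_by_method)

    scores = {m: 0.0 for m in methods}
    for key, weight in weights.items():
        values = np.array([metrics_by_method[m][key] for m in methods], dtype=float)
        span = values.max() - values.min()
        norm = (values - values.min()) / span if span > 0 else np.zeros_like(values)
        if key in higher_is_better:
            norm = 1.0 - norm
        for m, v in zip(methods, norm):
            scores[m] += weight * v
    return scores  # lower CE indicates better overall color/texture consistency
```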

2.5.4. Experimental Data and Procedure

We utilized imagery from the Landsat 8, Sentinel 2, and GF2 series as the data sources. Landsat 8 provides long-term time series data at a resolution of 30 m, while Sentinel 2 supplements this with a 10 m spatial resolution, enriching the observation of terrestrial details. Considering the color variability of ground targets across different seasons, the frequent revisit cycles of Landsat 8 (16 days) and Sentinel 2 (5 days) offer regular data updates, enabling the selection of appropriate image data for color-consistency experiments based on seasonal changes. Because of their high-quality data and open-access policies, both Landsat 8 and Sentinel 2 have become vital tools in academic research and application development. The GF2 satellite carries two high-resolution cameras providing 1 m panchromatic and 4 m multispectral imagery; it offers sub-meter spatial resolution, high positioning accuracy, and rapid attitude maneuverability, and its high-quality Earth observation data are widely used in urban planning, agricultural monitoring, resource surveying, and other fields. These highly representative data sources support a thorough evaluation of the proposed method.
Based on Landsat 8, Sentinel 2, and GF2 imagery, images were selected according to the differences in sensors and seasonal variations, as depicted in Figure 4. To balance the volume of the training data and the effects of the spatial scale, the overlapping regions of the image pairs were cropped to 128 × 128 pixel slices at a 50% overlap rate. These slices, using the target image as input data and the reference images as label data, were compiled into a dataset, resulting in 43,215 experimental dataset slices.
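The slicing of the overlapping regions into training pairs can be sketched as follows. Array layout, variable names, and the handling of border remainders (discarded here) are our assumptions; only the 128 × 128 patch size and the 50% overlap rate come from the text.

```python
import numpy as np

def make_patches(target_overlap, reference_overlap, patch=128, overlap_ratio=0.5):
    """Crop the co-registered overlapping regions of a target/reference image
    pair into patch x patch slices with 50% overlap between neighbouring
    slices. Arrays are assumed to be (H, W, C); border remainders smaller than
    one patch are discarded."""
    stride = int(patch * (1 - overlap_ratio))            # 64 px for 128 / 50%
    h, w = target_overlap.shape[:2]
    inputs, labels = [], []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            inputs.append(target_overlap[top:top + patch, left:left + patch])
            labels.append(reference_overlap[top:top + patch, left:left + patch])
    return np.stack(inputs), np.stack(labels)
```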

3. Results and Discussion

The objective evaluation is presented in Table 2, which is calculated using the composite index of Equation (7). Based on the composite index, the proposed method achieved the best outcomes in all six experiments. This superiority is attributed to the ability of the method to simultaneously consider pixel- and image-level color mappings while effectively preserving texture details, thereby garnering favorable evaluations across various objective metrics. Histogram matching relies primarily on the pixel distribution in image histograms for color mapping and achieves commendable results at the image level. However, it performs poorly in maintaining pixel-level color mapping and texture details, particularly in regions with significant color differences, owing to its inability to precisely handle complex color variations between individual pixels. Wallis filtering maps colors based on the overall mean and variance of the images. Although it improved the general appearance of images, it reduced the preservation of the original textures and color details of the terrain, leading to suboptimal performance in the composite index. Xia et al.’s method [20] achieved relatively better texture preservation, but excessively simplified the complex one-to-many color correspondences during the color mapping process, diminishing its effectiveness at the pixel and image levels. Despite its strong performance in image-level color mapping metrics, CycleGAN tended to blur or even fabricate textures that were not present in the actual terrain owing to its generative mechanism, resulting in poor pixel-level color and texture mapping performance.

3.1. Comparison of the Same Sensors with Small Temporal Differences

For the first experiment, a Landsat 8 image from 7 April 2022 served as the target image, with a reference image from 15 March 2022. As shown in Figure 5, the Wallis filtering method displayed a noticeable color shift toward a yellowish tone, likely owing to the higher proportion of water surfaces in the reference image compared to the target image. The CycleGAN results exhibited blurred texture distortions caused by the random division of training data—trained data retained better texture details, but untrained portions were prone to fabricated textures. However, the other methods showed minimal visual differences. An integrated evaluation of objective and subjective assessments revealed that under conditions of small temporal differences with the same sensor, the proposed method maintained the original texture structure and achieved superior color consistency across all terrain types compared to the other methods.

3.2. Comparison of the Same Sensors with Large Temporal Differences

The second experiment used a Landsat 8 image from 7 April 2022 as the target image and an image from 22 February 2020 as the reference image. The results indicated that Wallis filtering and CycleGAN experienced the same issues as in the first experiment, including overall color bias and blurred textures (Figure 6). Additionally, because of significant temporal differences, the two images exhibited distinct stages of crop growth, with histogram matching, Wallis filtering, and Xia et al.’s method [20] retaining the greener saturation of crops from the target image, whereas CycleGAN severely blurred textures. The proposed method most closely matched the color tone of the reference image, but with notable changes in some building colors, as depicted in column II of Figure 6. Other methods failed to achieve color consistency with the reference image. An integrated evaluation of objective and subjective assessments indicated that, even with large temporal differences and significant color tone variations in the terrain, the proposed method preserved the texture structure of the target image more effectively and closely matched the color tone of the reference image, offering a distinct advantage over the other methods.

3.3. Comparison of Different Sensors with Small Temporal Differences

The third experiment used a Landsat 8 image from 7 April 2022 as the target image and a Sentinel 2 image from 8 April 2022 as the reference image. The results, shown in Figure 7, reflect notable color tone differences owing to the different sensors, particularly in buildings and water. Apart from CycleGAN and the proposed method, the other techniques failed to effectively correct the color consistency of buildings and water where significant color shifts occurred within the same terrain types; CycleGAN, however, introduced abnormal textures. This is a fundamental issue stemming from its inability to ensure the authenticity and effectiveness of generated data; in addition, training data containing a high proportion of complex textures adversely affected simple textures, such as water surfaces. Notable color differences were apparent in some farmlands, as illustrated in column I of Figure 7, where the reference image displayed bare farmland in blue, whereas histogram matching, Wallis filtering, and Xia et al.’s method [20] retained the brownish-yellow tone of the target image. Because of the differing angles of sunlight, the mountainous regions in the two images showed reversed patterns of light and shadow. The proposed method aligned its results with the light and shadow faces of the mountain in the reference image, thereby achieving better color consistency. An integrated evaluation of objective and subjective assessments showed that under conditions of small temporal differences with different sensors, the proposed method achieved superior color consistency outcomes in both objective and subjective evaluations, primarily by correcting significant color shifts in the same terrain types while preserving the original texture structure.

3.4. Comparison of Different Sensors with Large Temporal Differences

The fourth experiment used a Landsat 8 image from 7 April 2022 as the target image and a Sentinel 2 image from 27 February 2022 as the reference image. The results, depicted in Figure 8, illustrate that the two images exhibited significant color tone differences. Histogram matching, Wallis filtering, and Xia et al.’s method [20] encountered the same issues as in the third experiment, with significant color shifts occurring in the same terrain types. CycleGAN and the proposed method achieved better correction results for terrains with significant color shifts, although CycleGAN encountered the same issues with texture distortion as in the third experiment. An integrated evaluation of objective and subjective assessments showed that under conditions of large temporal differences with different sensors, the proposed method demonstrated better performance in handling color consistency. This was a result of the significant differences making the feature disparities between the image data more pronounced and more readily corrected by the proposed method. In remote sensing cartography applications that deal with images with large temporal differences, the proposed method has significant practical value.

3.5. Comparison of High-Resolution Sensor with Small Temporal Differences

The fifth experiment used a GF2 image from 14 July 2022 as the target image and another GF2 image from the same date as the reference image. Since both images were acquired on the same day, their overall tones were relatively close, with only minor color shifts caused mainly by differences in illumination. The results indicate that the Wallis filtering result remained closer to the tones of the target image, while the CycleGAN result showed a certain degree of texture distortion in buildings (e.g., the blue roof in Figure 9f(Ⅲ)). The remaining methods showed only slight visual differences from one another. A comprehensive evaluation of both objective and subjective assessments shows that, under the conditions of a high-resolution sensor with small temporal differences, the proposed method maintained the original texture structure and achieved better color consistency than the other methods.

3.6. Comparison of High-Resolution Sensor with Large Temporal Differences

The sixth experiment used the GF2 image from 1 October 2023 as the target image and the image from 21 April 2023 as the reference image. As shown in Figure 10, the two images exhibited obvious tonal differences. Histogram matching, Wallis filtering, and Xia et al.’s method [20] encountered the same problem as in the third experiment, where significant color shifts occurred in the same terrain types, especially in vegetation and water surfaces. Both CycleGAN and the proposed method achieve better correction results for terrains with significant color shifts, although CycleGAN still suffers from a certain degree of texture distortion. An integrated evaluation of objective and subjective assessments showed that under conditions of large temporal differences with high-resolution sensors, the proposed method can more effectively preserve the texture structure of the target image and closely match the hue of the reference image, demonstrating better performance in handling color consistency.

3.7. Case Study

We selected Zhejiang Province as the research area because of its rapid socioeconomic development and widespread demand for remote sensing applications. Zhejiang boasts diverse terrain types and rich land cover marked by significant natural variations and frequent human activities, providing a rich array of surface features and color cases for color consistency processing.
Based on the proposed texture-considerate CNN method for remote sensing image color consistency, we applied this approach to the region, selecting 10 scenes from Landsat 8 and Sentinel 2 imagery. A regional map before color consistency processing is shown in Figure 11. To minimize color transmission errors during the color consistency process, the image of path/row 119/040, located at the center of the study area, was selected as the reference. The images of paths/rows 119/039, 118/040, 119/041, and 120/040 were processed for consistency against this reference, followed by the images of paths/rows 118/039, 118/041, and 120/039 and the Sentinel 2 tiles 50RPT and 51RTP. Finally, all the processed images were used to produce a regional map displaying the results of the color consistency processing, as illustrated in Figure 12.
Comparisons of the images from the study area before and after processing revealed that the remote sensing images processed using the proposed method maintained good color consistency, both globally and locally, relative to the reference images. Moreover, this method enhanced the texture mapping constrained by the source domain, ensuring better preservation of texture information within the images. However, the method presented in this paper still exhibits certain limitations, particularly in cases where there is a significant discrepancy in the types of land objects within the images, such as the absence of water bodies in overlapping areas. Under these circumstances, the method is unable to maintain the original color tones of the water bodies. The final regional map not only exhibited pleasing visual effects, but also retained detailed textural features, providing a solid data foundation for subsequent remote sensing image analysis and applications. This contributes significantly to regional research and management efforts and demonstrates the utility of the proposed method in practical settings.

4. Conclusions

Existing methods for processing color consistency in remote sensing imagery often face challenges due to oversimplified color mapping relationships, which can lead to inaccuracies in color mapping, as well as a lack of constraints on the consistency of texture details, resulting in texture distortions. We proposed a texture-considerate convolutional neural network approach to address these challenges. This method effectively manages complex one-to-many color mappings while minimizing texture distortions in the target image. Drawing on the concept of augmenting information dimensions through spatial features and the first law of geography, which emphasizes spatial autocorrelation, the proposed method enhanced the processing capability for different objects with the same spectrum and stabilized the color-mapping relationships by considering both global and local mapping scales. Using texture mapping constrained by the source domain, we strengthened the in-depth processing of texture information, thereby reducing texture distortion during color consistency processing. Comparative experiments showed that the proposed method not only presents good visual effects for different sensors but also maintains good texture characteristics. While our experiments focused on data from Landsat-8, Sentinel-2, and GF-2 sensors, the underlying principles and architecture of our method suggest that it should generalize well to other sensors. Based on powerful feature extraction capabilities, the proposed method can learn and identify complex patterns in images, regardless of the specific sensor used. The proposed method is primarily divided into two stages: modeling and inference. The processing time is contingent upon the number of samples and the hardware specifications. One key advantage of the proposed method is that its performance is primarily influenced by computational efficiency rather than the size of the images being processed. The method’s design ensures that it can handle very large datasets and high-resolution imagery effectively, which is achieved by leveraging the parallel processing capabilities of GPUs.

Author Contributions

Conceptualization, X.Q. and C.S.; methodology, X.Q., S.W. and Z.X.; software, X.Q.; writing—original draft preparation, X.Q. and C.S.; supervision, C.S. and X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant 226-2023-00154 and in part by the National Key Research and Development Program of China under Grant 2018YFB0505002.

Data Availability Statement

The procedure and sample datasets are publicly available at https://github.com/qxyikl/ColorConsistencyProgramExample.git, accessed on 15 May 2024.

Acknowledgments

We thank the anonymous reviewers for their constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, G.; Lei, J.; Xie, W. Algorithm/Hardware Codesign for Real-Time On-Satellite CNN-Based Ship Detection in SAR Imagery. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–18. [Google Scholar] [CrossRef]
  2. Rees, M.J. On the Future: A Keynote Address. Engineering 2020, 6, 110–114. [Google Scholar] [CrossRef]
  3. Gao, Y.; Zheng, N.; Wang, C. Global Change Study and Remote Sensing Technology. Geo-Inf. Sci. 2000, 2, 42–46. [Google Scholar]
  4. Gonçalves, J.A.; Henriques, R. UAV Photogrammetry for Topographic Monitoring of Coastal Areas. ISPRS J. Photogramm. Remote Sens. 2015, 104, 101–111. [Google Scholar] [CrossRef]
  5. Bioucas-Dias, J.M.; Plaza, A.; Camps-Valls, G. Hyperspectral Remote Sensing Data Analysis and Future Challenges. IEEE Geosci. Remote Sens. Mag. 2013, 1, 6–36. [Google Scholar] [CrossRef]
  6. Hansen, M.C.; Loveland, T.R. A Review of Large Area Monitoring of Land Cover Change using Landsat Data. Remote Sens. Environ. 2012, 122, 66–74. [Google Scholar] [CrossRef]
  7. Zhang, W.; Li, X.; Yu, J. Remote Sensing Image Mosaic Technology Based on SURF Algorithm in Agriculture. EURASIP J. Image Video Process. 2018, 2018, 85. [Google Scholar] [CrossRef]
  8. Wang, W.; Wen, D. Review of Dodging Algorithm of Remote Sensing Image. Jiangsu Sci. Technol. Inf. 2017, 6, 51–55. [Google Scholar]
  9. Li, Y.; Yin, H.; Yao, J. A Unified Probabilistic Framework of Robust and Efficient Color Consistency Correction for Multiple Images. ISPRS J. Photogramm. Remote Sens. 2022, 190, 1–24. [Google Scholar] [CrossRef]
  10. Schroeder, T.A.; Cohen, W.B.; Song, C. Radiometric Correction of Multi-temporal Landsat Data for Characterization of Early Successional Forest Patterns in Western Oregon. Remote Sens. Environ. 2006, 103, 16–26. [Google Scholar] [CrossRef]
  11. Li, Z.; Zhu, H.; Zhou, C. A Color Consistency Processing Method for HY-1C Images of Antarctica. Remote Sens. 2020, 12, 1143. [Google Scholar] [CrossRef]
  12. Zhang, L.; Zhang, Z.; Zhang, J. The Image Matching Based on Wallis Filtering. J. Wuhan Tech. Univ. Surv. Mapp. 1999, 1, 24–27. [Google Scholar]
  13. Li, D.; Wang, M.; Pan, J. Auto-dodging Processing and Its Application for Optical RS Images. Geomat. Inf. Sci. Wuhan Univ. 2006, 9, 753–756. [Google Scholar]
  14. Yeganeh, H.; Ziaei, A.; Rezaie, A. A Novel Approach for Contrast Enhancement Based on Histogram Equalization. In Proceedings of the International Conference on Computer and Communication Engineering, Kuala Lumpur, Malaysia, 13–15 May 2008; pp. 256–260. [Google Scholar]
  15. Jensen, J.R. Introductory Digital Image Processing: A Remote Sensing Perspective, 2nd ed.; Prentice Hall: Hoboken, NJ, USA, 1996; pp. 107–192. [Google Scholar]
  16. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 3rd ed.; Person Prentice Hall: Hoboken, NJ, USA, 2008; pp. 120–139. [Google Scholar]
  17. Bockstein, I.M. Color Equalization Method and Its Application to Color Image Processing. J. Opt. Soc. Am. A 1986, 3, 735–737. [Google Scholar] [CrossRef]
  18. Buzuloiu, V.V. Adaptive-neighborhood Histogram Equalization of Color Images. J. Electron. Imaging 2001, 10, 445–459. [Google Scholar] [CrossRef]
  19. Acharya, T.; Ray, A.K. Image Processing: Principles and Applications; Wiley-Interscience: Hoboken, NJ, USA, 2005; pp. 110–114. [Google Scholar]
  20. Xia, M.; Yao, J.; Gao, Z. A Closed-form Solution for Multi-view Color Correction with Gradient Preservation. ISPRS J. Photogramm. Remote Sens. 2019, 157, 188–200. [Google Scholar] [CrossRef]
  21. Zhang, Y.; Yu, L.; Sun, M. A Mixed Radiometric Normalization Method for Mosaicking of High-Resolution Satellite Imagery. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2972–2984. [Google Scholar] [CrossRef]
  22. Zhang, L.; Wu, C.; Du, B. Automatic Radiometric Normalization for Multitemporal Remote Sensing Imagery with Iterative Slow Feature Analysis. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6141–6155. [Google Scholar] [CrossRef]
  23. Xia, M.; Yao, J.; Xie, R. Color Consistency Correction Based on Remapping Optimization for Image Stitching. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 2977–2984. [Google Scholar]
  24. Gatys, L.A.; Ecker, A.S.; Bethge, M. A Neural Algorithm of Artistic Style. J. Vis. 2016, 16, 326. [Google Scholar] [CrossRef]
  25. Tasar, O.; Happy, S.L.; Tarabalka, Y. ColorMapGAN: Unsupervised Domain Adaptation for Semantic Segmentation Using Color Mapping Generative Adversarial Networks. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7178–7193. [Google Scholar] [CrossRef]
  26. Benjdira, B.; Bazi, Y.; Koubaa, A. Unsupervised Domain Adaptation using Generative Adversarial Networks for Semantic Segmentation of Aerial Images. Remote Sens. 2019, 11, 1369. [Google Scholar] [CrossRef]
  27. Li, X.; Zhang, L.; Wang, Q. Multi-temporal Remote Sensing Imagery Semantic Segmentation Color Consistency Adversarial Network. Acta Geod. Cartogr. Sin. 2020, 49, 1473–1484. [Google Scholar]
Figure 1. Overview of the proposed method (LossColor represents color mapping loss; LossTexture represents texture mapping loss; and LossC_T represents comprehensive loss function, which is calculated by LossColor and LossTexture).
Figure 2. Architecture of the color mapping network based on the three-layer UNet3+ model.
Figure 3. Architecture of the texture mapping network based on the three-layer UNet3+ model.
Figure 4. Experimental data (the orange border represents the overlapping area of the image, and the Roman numerals indicate the locations of the partial schematic images for six experiments). (a) Target image for experiments 1–4. (b) Reference image for experiment 1. (c) Reference image for experiment 2. (d) Reference image for experiment 3. (e) Reference image for experiment 4. (f) Target image for experiment 5. (g) Reference image for experiment 5. (h) Target image for experiment 6. (i) Reference image for experiment 6.
Figure 5. Color consistency results of the partial schematic images for experiment 1. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4b. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 6. Color consistency results of the partial schematic images for experiment 2. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4c. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 7. Color consistency results of the partial schematic images for experiment 3. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4d. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 8. Color consistency results of the partial schematic images for experiment 4. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4e. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 9. Color consistency results of the partial schematic images for experiment 5. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4g. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 10. Color consistency results of the partial schematic images for experiment 6. Ⅰ~Ⅳ represent different features, and the location is shown in Figure 4i. (a) Reference image. (b) Target image. (c) Histogram matching. (d) Wallis filtering. (e) Xia et al.’s method [20]. (f) CycleGAN. (g) Proposed method.
Figure 11. Regional map before color consistency processing in the research area.
Figure 12. Regional map after color consistency processing in the research area.
Table 1. Experimental plans.

| Experimental Plans | Target Images | Reference Images |
| --- | --- | --- |
| Same sensor with small temporal differences | Landsat 8 OLI, 7 April 2022 | Landsat 8 OLI, 15 March 2022 |
| Same sensor with large temporal differences | Landsat 8 OLI, 7 April 2022 | Landsat 8 OLI, 22 February 2020 |
| Different sensors with small temporal differences | Landsat 8 OLI, 7 April 2022 | Sentinel 2 MSI, 8 April 2022 |
| Different sensors with large temporal differences | Landsat 8 OLI, 7 April 2022 | Sentinel 2 MSI, 27 February 2022 |
| High-resolution sensor with small temporal differences | GF 2, 14 July 2022 | GF 2, 14 July 2022 |
| High-resolution sensor with large temporal differences | GF 2, 1 October 2023 | GF 2, 21 April 2023 |
Table 2. Quantitative evaluation of different methods in experiments 1–6 (calculated by Equation (7); lower values are better).

| Experimental Plans | Histogram Matching | Wallis Filtering | Xia et al.’s Method [20] | CycleGAN | Proposed Method |
| --- | --- | --- | --- | --- | --- |
| Same sensor with small temporal differences | 0.3954 | 0.6703 | 0.2956 | 0.5650 | 0.2663 |
| Same sensor with large temporal differences | 0.2767 | 0.7204 | 0.2363 | 0.3715 | 0.1343 |
| Different sensors with small temporal differences | 0.2931 | 0.6669 | 0.3552 | 0.4994 | 0.2501 |
| Different sensors with large temporal differences | 0.3792 | 0.7983 | 0.3431 | 0.4374 | 0.3097 |
| High-resolution sensor with small temporal differences | 0.6258 | 0.4624 | 0.2770 | 0.4958 | 0.2650 |
| High-resolution sensor with large temporal differences | 0.5407 | 0.6114 | 0.5632 | 0.4966 | 0.3447 |