1. Introduction
In the field of marine observation, scientists use underwater robots to capture photographs for studying marine organisms and mineral resources. For instance, in research on coral bleaching and underwater polymetallic nodule deposits, analyzing and recognizing the color characteristics of underwater targets is essential [1,2]. However, in complex underwater environments, factors such as lighting conditions, spectral absorption by the water medium, and backscattering by particles prevent images from directly reflecting the true colors of objects [3]. Even with the same light source, the resulting image colors can vary. This color instability poses a challenge, as the technology for obtaining accurate surface colors of marine objects is not yet mature [4]. Therefore, color constancy in underwater images is an urgent problem to address.
Early methods for color constancy analysis in underwater images were based on statistical analysis of pixel values. Among them, the white-patch algorithm [5,6] assumes that a white surface fully reflects the illuminant color of the scene, and takes the maximum value of each RGB color channel as the illuminant color of the image. However, its estimates degrade when the overall scene brightness is low. The gray-world algorithm [7,8] assumes that, for color-rich images, the average pixel values of the three RGB color channels tend to be similar. However, its estimates are poor for images with limited color or a single color, which makes it difficult to apply in underwater environments. Li et al. [9] proposed a wavelet transform method based on the YUV color model, which significantly improves imaging quality. Yan et al. [10] proposed a new color constancy framework based on the relationship between the reflectance difference and the local normalized reflectance difference. Iqbal et al. [11] introduced the Laplacian transform to minimize artifacts and noise. With the advancement of underwater image processing techniques, Hassan et al. [12] constructed an underwater image processing method based on Retinex theory and achieved better results. However, these algorithms are manually designed and have limited applicability, as they cannot effectively estimate image illumination under different lighting conditions and in complex environments.
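The two statistical estimators described above can be illustrated with a minimal sketch. The pixel values below are made up for illustration, and the function names `white_patch` and `gray_world` are hypothetical helpers, not taken from the cited works.

```python
def white_patch(pixels):
    """Estimate the illuminant as the per-channel maximum over all pixels."""
    return tuple(max(p[c] for p in pixels) for c in range(3))


def gray_world(pixels):
    """Estimate the illuminant as the per-channel mean over all pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


# Toy "image": a flat list of (R, G, B) pixels in [0, 1] (illustrative values only).
pixels = [(0.2, 0.4, 0.6), (0.4, 0.2, 0.2), (0.3, 0.3, 0.1)]
wp = white_patch(pixels)  # brightest response in each channel
gw = gray_world(pixels)   # average response in each channel
```

In a dark scene the channel maxima carry little signal, and in a scene dominated by one color the channel means are biased toward that color, which matches the failure cases noted above.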
The application of machine learning techniques to color constancy research has provided a new direction for solving complex pattern recognition problems. In recent years, researchers have started to incorporate machine learning methods into color constancy analysis, particularly algorithms based on image feature learning. Several learning methods have been proposed for color constancy, such as Bayesian-based color constancy algorithms [13], backpropagation (BP) neural network-based color constancy algorithms [14], support vector regression (SVR)-based illuminance estimation algorithms [15], and extreme learning machine (ELM)-based illuminance estimation algorithms [16]. Furthermore, deep learning methods with more powerful learning capabilities have been applied to color constancy [17]. Deep learning color constancy methods based on convolutional neural networks (CNN) [18,19] and transfer learning [20], fast Fourier transform-based color constancy methods [21], and contrastive learning for color constancy [22] have been proposed. In deep learning algorithms, image features are determined during network training, and the complex network models significantly increase the computational burden. To combine the powerful computational capabilities of deep learning algorithms with the efficient learning ability of ELM, researchers have extensively investigated the integration of ELM into general deep learning frameworks. Deng et al. [23] presented the deep extreme learning machine (DELM), which combines ELM with the idea of autoencoders in the form of the Extreme Learning Machine Autoencoder (ELM-AE). DELM exhibits strong nonlinear modeling and generalization abilities, which can be further expanded by using larger-scale training data and deeper network structures, showing great potential for underwater image color constancy computation. Some studies have found that using a swarm intelligence algorithm to find the optimal parameters of DELM considerably improves its performance [24,25,26].
Motivated by the aforementioned research, this study proposes an arithmetic optimization algorithm based on iterative chaotic initialization (IAOA) to optimize the illumination estimation model of DELM, referred to as IAOA-DELM. For underwater images captured under unknown illumination, the IAOA-DELM approach first extracts the color features of the images using the gray-edge framework [27]. Then, based on IAOA-DELM, it estimates the light source of the images. Finally, the estimated light source is used to correct the underwater images to standard illuminant values using the Von Kries diagonal model [28]. This correction ensures that underwater images collected under lighting of different color temperatures are rendered with accurate colors. The innovation of this paper is primarily demonstrated in the following aspects:
- (1) The basic DELM model is constructed to compute scene illumination information from the color features of underwater images.
- (2) To address the stability and generalization issues caused by the initial parameters in the orthogonal matrix, AOA is employed to optimize the input layer weights and thresholds of ELM-AE in the DELM structure. The exploration and exploitation stages of AOA are combined with the nonlinear feature mapping stage of ELM-AE.
- (3) AOA is applied to select the number of hidden layer nodes and to adaptively search for the effective activation nodes. It simultaneously optimizes the hidden layer biases, the input weights, and the number of hidden layer nodes, yielding an underwater image illumination estimation model with good predictive performance and stability.
- (4) The initial search agents of AOA are generated using iterative chaotic mapping to improve the initialization strategy of AOA, giving IAOA. Without requiring prior knowledge, this initialization strategy improves the quality of the initial population and thereby the speed and accuracy of the algorithm.
The remainder of this paper is organized as follows: Section 2 provides an introduction to AOA and DELM. Section 3 describes the image dataset and presents the proposed color constancy model (IAOA-DELM). Section 4 analyzes the experimental results. Finally, Section 5 summarizes the article.
3. Our Contribution
The selection of weights and biases for each hidden layer in ELM-AE is done randomly, and this randomness can have an influence on the training performance and stability of DELM. To address the instability issue in the selection of DELM parameters, the arithmetic optimization algorithm (AOA) was introduced. However, the AOA initialization strategy still suffers from the drawbacks of an uneven and unstable parameter distribution. To overcome this limitation, an iterative chaotic initialization strategy was incorporated to improve the AOA initialization. After optimizing the common parameters of DELM using the improved AOA (IAOA), it is still necessary to conduct repeated experiments to determine the optimal values for the number of search agents, iteration count, hidden layer quantity, and number of neurons. This section explains the construction of the IAOA-DELM color constancy model and provides insights into the setup of the experimental scenarios and the construction of the dataset.
3.1. Search Agent Strategy of DELM Based on AOA
DELM employs an adaptive mechanism to activate effective neurons in the hidden layer based on the characteristics of the training data set. The number of hidden layer neuron nodes is a crucial parameter in the network structure, as it determines the number of relevant nodes in the input layer and the dimensionality of the effective feature parameters. Insufficient hidden layer nodes can impede DELM from accurately capturing the common features of the training set, while an excessive number of nodes may lead to an overemphasis on the specific features of the training set, thereby compromising its generalization ability. Traditional approaches for determining the optimal number of hidden layer nodes include conducting repeated experiments or relying on empirical knowledge to identify the ideal number. Alternatively, a fixed formula derived from statistical experience can be used to calculate the number of nodes in the hidden layer (m) based on the number of nodes in the input layer (n) and the number of nodes in the output layer (s), such as the formulas and . In the first method, researchers set the comparison interval and step size subjectively, which had high complexity and low accuracy. The second method only takes into account the number of nodes in the input and output layers as influential factors, neglecting other important considerations. It will miss the effective feature information of the data itself and produce large training errors. To address this issue, this study introduces a search agent strategy for DELM based on AOA optimization, aiming to determine the number of hidden layer neurons. The steps were as follows:
(1) To determine the maximum network structure, the number of hidden layers and the upper limit of the number of hidden layer nodes were set for DELM;
(2) The relevant parameters of AOA were initialized, and the number n of input nodes and the number s of hidden layer nodes were passed to AOA as independent parameters to be optimized;
(3) The fitness value of each individual was calculated to obtain the optimal parameter combination based on the search agent structure, and the node parameter results of the input and output layers were collected;
(4) The results were mapped to 0 or 1 using the Ceil function (0 means freezing the node, 1 means activating the node), and the optimal number of hidden layer nodes was calculated.
The search agent structure used in this research is shown in Figure 2.
Figure 2 depicts the search agent structure, which consists of four nodes in the input layer and five nodes in the hidden layer. The second and fourth nodes in the input layer are activated, as are the first, fourth, and fifth nodes in the hidden layer. This activation state indicates that the network selects the second and fourth input features as the effective features for training and connects them to the effective hidden layer nodes 1, 4, and 5. According to the connections between the effective nodes of the input layer and the hidden layer, the input layer weights ($w_{21}$, $w_{24}$, $w_{25}$, $w_{41}$, $w_{44}$, and $w_{45}$, where $w_{ij}$ connects input node $i$ to hidden node $j$) and the hidden layer biases ($b_1$, $b_4$, and $b_5$) are selected, and the output parameter matrix is formed. This completes the selection and learning of the important features in the training set. The input layer weights, the hidden layer biases, and the number of hidden layer nodes are thus optimized simultaneously. The input weight matrix β of ELM-AE was calculated according to formula (6). The output weight matrix of all hidden layers was obtained through training so as to obtain the AOA-DELM.
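The decoding of a search agent into active nodes can be sketched as follows. This is a minimal illustration that assumes each raw agent entry lies in (-1, 1], so that the ceiling function yields 0 or 1. The function name `decode_agent` and the raw values are hypothetical, chosen to reproduce the activation pattern described for Figure 2.

```python
import math


def decode_agent(input_raw, hidden_raw):
    """Map raw agent entries to 0/1 masks with ceil and list the retained weights."""
    # Assumption: every raw entry lies in (-1, 1], so ceil gives 0 (freeze) or 1 (activate).
    input_mask = [math.ceil(v) for v in input_raw]
    hidden_mask = [math.ceil(v) for v in hidden_raw]
    active_inputs = [i + 1 for i, m in enumerate(input_mask) if m == 1]   # 1-based
    active_hidden = [j + 1 for j, m in enumerate(hidden_mask) if m == 1]  # 1-based
    # Each (i, j) pair names one retained input weight; j also names a retained bias.
    weight_pairs = [(i, j) for i in active_inputs for j in active_hidden]
    return input_mask, hidden_mask, weight_pairs


# Hypothetical raw values: inputs 2 and 4 and hidden nodes 1, 4, 5 end up active.
input_mask, hidden_mask, weight_pairs = decode_agent(
    [-0.3, 0.8, -0.5, 0.2], [0.6, -0.2, -0.9, 0.4, 0.1]
)
n_hidden_active = sum(hidden_mask)  # number of effective hidden layer nodes
```

Under these assumptions the agent keeps six input weights and three hidden biases, and the effective hidden layer size is the sum of the hidden mask.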
3.2. Improved Arithmetic Optimization Algorithm Based on Iterative Chaotic Initialization (IAOA)
In the initial stage of a meta-heuristic optimization algorithm, the initial population is generated randomly, which leads to an uneven distribution of individuals in the population. Iterative chaotic mapping is characterized by strong chaos and pseudo-randomness. Compared with randomly generating the initial positions, iterative chaotic mapping distributes the population more evenly in the search space [30]. If a population has n individuals, then the random initial population is X = {$x_1$, …, $x_k$, …, $x_n$}, k ∈ [1, n], and the mathematical expression is as follows:
$$x_{k+1} = \sin\left(\frac{a\pi}{x_k}\right)$$
where $x_{k+1}$ is the value obtained by introducing iterative chaotic mapping, and a ∈ (0,1). In this paper, a is 0.7.
AOA uses mathematical operators as its optimization mechanism to select the population with the lowest loss function from all candidate populations (candidate schemes). However, the random initialization strategy of AOA places a large number of individuals far from the optimal value, which limits the optimization efficiency of AOA. If the population distribution is close enough to the optimum, the exploration and exploitation phases of AOA become efficient. To achieve this goal, an iterative chaos algorithm was introduced to generate the initial population distribution of AOA, giving IAOA.
Suppose there are N search agents in the IAOA initialization, each search agent vector has M dimensions, and each variable has the same upper boundary ub and the same lower boundary lb. First, the first dimension value was randomly generated for the first search agent vector, X1. The second dimension value X2 was generated using the iterative chaos algorithm, so that X2 was opposite to the distribution of X1. The first search agent vector X2 = {, …, , …, } was generated based on the idea that . Similarly, the population vectors of the remaining N-1 search agents were obtained to form the initial search agent population matrix X2 = {, …, } of IAOA. The boundary absorption strategy was adopted in the subsequent iterations: a dimension value greater than the upper boundary was set to ub, and a dimension value less than the lower boundary was set to lb.
Compared with the completely random generation of search agent vectors in the original AOA, the initial population of IAOA is more widely and more uniformly distributed in the search interval. An initial population that is evenly distributed within the search interval has a higher likelihood of capturing the correct feature values and is more likely to explore and discover the optimal population.
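A chaotic initialization with boundary absorption can be sketched as below. The sketch assumes the iterative map takes the common form x_{k+1} = sin(aπ/x_k) with a = 0.7, and that chaotic values in [-1, 1] are rescaled linearly to [lb, ub]. The rescaling, the zero guard, and the names `iterative_map`, `init_population`, and `absorb` are illustrative assumptions, not the exact procedure of this paper.

```python
import math
import random


def iterative_map(x, a=0.7):
    """One step of the iterative chaotic map x -> sin(a*pi/x); the result lies in [-1, 1]."""
    if x == 0.0:  # assumed guard against division by zero
        x = 1e-12
    return math.sin(a * math.pi / x)


def init_population(n_agents, n_dims, lb, ub, a=0.7, seed=0):
    """Build n_agents vectors of n_dims values from chaotic sequences."""
    rng = random.Random(seed)
    population = []
    for _ in range(n_agents):
        x = rng.uniform(0.01, 1.0)  # random first value for this agent
        agent = []
        for _ in range(n_dims):
            # Assumed linear rescaling of the chaotic value from [-1, 1] to [lb, ub].
            agent.append(lb + (x + 1.0) / 2.0 * (ub - lb))
            x = iterative_map(x, a)
        population.append(agent)
    return population


def absorb(agent, lb, ub):
    """Boundary absorption: values above ub become ub, values below lb become lb."""
    return [min(max(v, lb), ub) for v in agent]


population = init_population(n_agents=5, n_dims=4, lb=-1.0, ub=1.0)
clamped = absorb([-5.0, 0.5, 7.0], 0.0, 1.0)
```

The clamping step is applied after each position update in later iterations, so an agent that leaves the search interval is pulled back onto the nearest boundary.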
3.3. Color Constancy Algorithm Flow of Underwater Image Based on IAOA-DELM
The IAOA-DELM color constancy algorithm for underwater images was constructed by combining the IAOA-DELM design idea with underwater image feature learning. The detailed steps of the complete process are as follows:
(1) Underwater scene images were captured, and the gray-edge framework was used to extract color features from the images as input vectors, which constitute the input data set;
(2) The number of DELM hidden layers, the number of iterations, and the number of search agents were input. A high-quality initial population for AOA was generated using the iterative chaos algorithm;
(3) The dataset was randomly divided into training and test sets using ten-fold cross-validation, where nine subsets were used for training and one subset was used for testing;
(4) The training data set was input, the chromaticity feature vectors were normalized, and the parameter search was restricted to the effective interval. The training set was used as input for training, and the effective nodes of DELM were activated. IAOA was employed to optimize the input layer weights, hidden layer biases, and hidden layer nodes of DELM;
(5) The fitness of the search agent population was calculated and compared with the best fitness in the previous iteration to decide whether to update the population position;
(6) The optimal parameters of IAOA-DELM were obtained after reaching the maximum number of iterations, and the input weight matrix β of ELM-AE was calculated. The output weight matrix of DELM was obtained, and the IAOA-DELM illumination estimation model was constructed;
(7) The IAOA-DELM illumination estimation model was used to calculate the illumination of the test set images. Color constancy of the underwater images was achieved by restoring each image to the standard light source using the diagonal mapping matrix.
The implementation flow chart of the IAOA-DELM lighting correction model is shown in Figure 3.
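The final correction step of the flow above can be illustrated with a minimal sketch of the diagonal (Von Kries) model, in which each channel is scaled by the ratio of the canonical illuminant to the estimated one. The function name `von_kries_correct`, the white canonical illuminant (1, 1, 1), and the sample values are assumptions made for illustration.

```python
def von_kries_correct(pixels, estimated, canonical=(1.0, 1.0, 1.0)):
    """Scale each channel by canonical/estimated, i.e. apply a diagonal mapping matrix."""
    gains = [c / e for c, e in zip(canonical, estimated)]
    return [tuple(v * g for v, g in zip(p, gains)) for p in pixels]


# Hypothetical estimated illuminant with a strong color cast.
estimated = (0.5, 1.0, 0.25)
pixels = [(0.5, 1.0, 0.25), (0.25, 0.5, 0.125)]
corrected = von_kries_correct(pixels, estimated)
```

A surface that reflected exactly the estimated illuminant is mapped to the canonical white, and a darker surface of the same hue is mapped to a neutral gray.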
3.4. Experimental Scene Construction
The experimental environment was a computer with the CPU model ARM
[email protected]. All experiments in this paper were run in MATLAB R2017b under Windows 11.
The underwater image acquisition system included a PC terminal, a pool, a camera, a target object, and a light source. The camera was a Mercury II series high-precision industrial camera produced by DaHeng Image Company, model MER2-1220-32U3C. The camera has a resolution of 4024 (H) × 3036 (V), a frame rate of 32.3 fps, and a pixel size of 1.85 µm × 1.85 µm. The optical lens has a resolution of 8 megapixels and a focal length of 8.0 mm. Seven common light sources recommended by the International Commission on Illumination (CIE) were used to provide lighting. The light source types are presented in Table 1.
In the layout of the pool environment, a filter was used to remove impurities from the pool water. Reflection lighting was used: the light source was installed on the same side as the camera to obtain the reflected image of the underwater object. The projection angle was kept unchanged when the light source was replaced. To reduce the impact of bright spots and of reflections from the air and water interfaces, all sides of the pool except the front (where the camera looks in) were painted black. During shooting, the doors and windows were closed to ensure that the experimental light source was the only light source in the scene. The straight-line distance between the target and the camera was 1 m. The experimental environment is shown in Figure 4.
3.5. Data Set Acquisition
To enhance algorithm efficiency, it is essential to use an efficient, low-dimensional feature vector as the input for subsequent algorithms. Van de Weijer et al. [31] introduced a gray-edge framework that unifies various traditional unsupervised color constancy algorithms. The mathematical formula is presented as follows:
$$\left( \int \left| \frac{\partial^{n} f^{\sigma}(x)}{\partial x^{n}} \right|^{p} dx \right)^{1/p} = k\, e^{n,p,\sigma}$$
where $f^{\sigma} = f \otimes G^{\sigma}$ represents the convolution of image f and Gaussian filter $G^{\sigma}$, $e^{n,p,\sigma}$ is the estimated illuminant, and k is a constant.
By adjusting the parameters n, p, and σ, different color constancy algorithms can be derived. The gray-edge framework can be used as an image feature statistics tool to introduce higher-order derivative information of the image into the input features. In this research, the value range of n was {0, 1, 2}, the value range of p was {1, 2, 3, …, 10}, and the value range of σ was {1, 3, 5, 7, 9}. This gives 150 (3 × 10 × 5) parameter combinations. Extracting r and g for each combination yields a 300-dimensional input feature vector. In this data set, the real illumination RGB information was added as a label by extracting color card information. Here, the RGB information was converted to r and g chroma information to remove the influence of illumination intensity.
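The feature extraction can be sketched with NumPy as follows. This is a simplified illustration, not the implementation used in this study: the Gaussian kernel is truncated at a radius of 3σ, derivatives are approximated by repeated finite differences, the chromaticities are taken as r = R/(R+G+B) and g = G/(R+G+B), and the ordering of the 300 features is arbitrary. All function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(channel, sigma):
    """Separable Gaussian blur with edge padding (kernel truncated at radius 3*sigma)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(channel, radius, mode="edge")
    blur = lambda v: np.convolve(v, kernel, mode="valid")
    return np.apply_along_axis(blur, 0, np.apply_along_axis(blur, 1, padded))


def derivative_magnitude(channel, order):
    """Magnitude of the order-th derivative via repeated finite differences."""
    if order == 0:
        return np.abs(channel)
    dy, dx = channel, channel
    for _ in range(order):
        dy = np.gradient(dy, axis=0)
        dx = np.gradient(dx, axis=1)
    return np.sqrt(dx ** 2 + dy ** 2)


def gray_edge_features(image):
    """Return the (r, g) estimates for all 150 (sigma, n, p) settings as a 300-vector."""
    img = np.asarray(image, dtype=float)
    feats = []
    for sigma in (1, 3, 5, 7, 9):
        smoothed = [gaussian_smooth(img[..., c], sigma) for c in range(3)]
        for n in (0, 1, 2):
            mags = [derivative_magnitude(s, n) for s in smoothed]
            for p in range(1, 11):
                # Minkowski p-norm of the derivative magnitude, per channel.
                est = np.array([np.mean(m ** p) ** (1.0 / p) for m in mags])
                total = est.sum()
                # Fall back to a neutral chromaticity when the image has no edges.
                rg = est[:2] / total if total > 0 else np.array([1 / 3, 1 / 3])
                feats.extend(rg)
    return np.array(feats)


# Uniformly colored test image: the n = 0, p = 1 estimate equals its chromaticity.
flat = np.ones((8, 8, 3)) * np.array([0.2, 0.4, 0.4])
flat_features = gray_edge_features(flat)
random_features = gray_edge_features(np.random.default_rng(0).random((12, 12, 3)))
```

With n = 0 and p = 1 the estimate reduces to gray-world, while larger p moves it toward white-patch, which is why sweeping the three parameters yields a family of complementary features.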
In this research, 300 underwater images were captured under six light sources of different color temperatures, as shown in Table 1.
The color features of the image dataset were extracted using the gray-edge framework. These extracted chromaticity features, along with the corresponding light source information obtained from the underwater color cards, formed the dataset. To evaluate the model, the dataset was divided into a training set and a test set using the ten-fold cross-validation method. The training set consisted of 270 samples, while the test set contained 30 samples. The training set was utilized to train the neural network and obtain the optimal model parameters. The test set was used to check the correction ability of the model and evaluate the performance of the algorithm.
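The ten-fold split of the 300 samples can be sketched as below. The shuffling seed and the way indices are dealt into folds are assumptions made for illustration, and `ten_fold_splits` is a hypothetical helper name.

```python
import random


def ten_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(ten_fold_splits(300))
```

Each of the ten rounds trains on 270 samples and tests on the remaining 30, and every sample appears in a test set exactly once.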
3.6. Evaluation Index
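Chroma error and angle error are important evaluation indexes of illumination correction. The true illumination of the image was set as ($r_t$, $g_t$) and the illumination predicted by the algorithm as ($r_e$, $g_e$). The chroma error ($E_c$) and angle error ($E_a$) of the algorithm are given in Equations (10) and (12). The smaller the angle error and chroma error, the better the effect of the algorithm. Chroma accuracy (CR) was used as the fitness of the search agent population.
$$E_c = \sqrt{(r_t - r_e)^2 + (g_t - g_e)^2} \tag{10}$$
$$E_a = \arccos\left( \frac{e_t \cdot e_e}{\lVert e_t \rVert \, \lVert e_e \rVert} \right) \tag{12}$$
where $e_t$ and $e_e$ are the true and predicted illumination vectors, and $\lVert \cdot \rVert$ is the Euclidean norm of a vector.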
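The two evaluation indexes can be sketched in a few lines. The sketch assumes the standard definitions: chroma error as the Euclidean distance between the true and predicted (r, g) chromaticities, and angle error as the angle between the two illumination vectors, reported here in degrees. The function names and the choice of degrees are assumptions.

```python
import math


def to_rg(rgb):
    """Convert an RGB triple to (r, g) chromaticity, removing intensity."""
    s = sum(rgb)
    return rgb[0] / s, rgb[1] / s


def chroma_error(true_rg, est_rg):
    """Euclidean distance between true and estimated (r, g) chromaticities."""
    return math.hypot(true_rg[0] - est_rg[0], true_rg[1] - est_rg[1])


def angle_error(true_vec, est_vec):
    """Angle in degrees between the true and estimated illumination vectors."""
    dot = sum(t * e for t, e in zip(true_vec, est_vec))
    norm_t = math.sqrt(sum(t * t for t in true_vec))
    norm_e = math.sqrt(sum(e * e for e in est_vec))
    cos_angle = max(-1.0, min(1.0, dot / (norm_t * norm_e)))  # clamp rounding noise
    return math.degrees(math.acos(cos_angle))


rg = to_rg((2.0, 1.0, 1.0))
ec = chroma_error((0.0, 0.0), (0.3, 0.4))
ea = angle_error((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Because the angle depends only on direction, scaling an illumination vector by a constant leaves the angle error unchanged, which is consistent with removing illumination intensity from the labels.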
The diagram of the IAOA-DELM light correction process is shown in Figure 5.