1. Introduction
One of the ultimate goals of spectral estimation from a camera image is to predict, from a device-dependent camera signal, the spectral reflectance data that represent the physical properties of the captured objects. Spectral reflectance is a major area of interest in many fields, including biometric identification [1,2], art archiving [3], cosmetics [4,5], agriculture [6], and high-fidelity color reproduction [7]. Thus, the technique of spectral estimation from camera images has gained importance.
Generally, it is acknowledged that spectral reflectance estimation from a three-channel RGB camera has relatively lower estimation accuracy than a multispectral imaging system containing more signal channels with different filtration mechanisms. Conventionally, a multispectral imaging system is constructed using a camera with bandpass filtration systems, such as narrowband filters or liquid crystal tunable filters (LCTF) [7,8,9,10,11], programmable illumination imaging systems [12,13,14,15,16,17], or broadband filters with trichromatic cameras [18,19,20,21,22,23]. It is worth mentioning that, with the rapid development of both digital cameras and LED lighting, a multispectral imaging system based on consumer-grade cameras and broadband lighting can now be built at much lower cost and with simpler operation, although the spectral reflectance estimation algorithm still needs to be developed and verified.
The current literature on spectral reflectance estimation is extensive and focuses particularly on accurate algorithms. These algorithms can be classified into three groups: model-based methods, interpolation methods, and learning-based methods [20]. Recently, learning-based reconstruction has developed dramatically. It can use low-cost but high-resolution cameras, and it is convenient in practical applications because it is not necessary to characterize the spectral sensitivity functions of the imaging system, which would otherwise require a wavelength resolution comparable to that of a spectrophotometer. The performance, however, is greatly affected by the choice of the training set used as part of the characterization process. In recent years, there has been an increased amount of literature on learning-based algorithms, for example, the pseudoinverse method, Wiener estimation [23,24], principal component analysis [25,26,27], and polynomial-based regression models [18,28,29]. For example, Berns et al. proposed an image-based spectral reflectance estimation method using matrix R based on the Wyszecki hypothesis by combining a trichromatic camera and absorption filters [18,19]. Hardeberg et al. proposed a method using a principal eigenvector technique, pointed out that any spectral reflectance can be expressed as a linear combination of basis functions and a scalar vector, and evaluated illuminant estimation models from color to multispectral imaging [7,10]. Li and Cao proposed two reconstruction methods based on local linear regression, which achieve reasonable reconstruction accuracy [30,31]. Shen et al. reported that the partial least squares (PLS) regression method could also be adopted to construct a regression model based on the correlation between response values and spectral reflectance [32]. All these studies claimed to have achieved good results using different metrics.
Overall, most studies use a mathematical algorithm for spectral estimation. However, the impact of overfitting, which leads to poor accuracy, has not been sufficiently investigated. Overfitting is usually caused by too high a polynomial degree in the camera signal expansion. When higher-degree polynomials are used, irrelevant detail and noise in the training dataset are picked up and learned as concepts, and the error on the test set starts to rise as the model’s ability to generalize decreases; that is, the model works well on the training data but performs badly on the test set. Several approaches could be used to reduce this problem, such as reducing the number of features or imposing penalties by weighting the features, as in regularization methods [33,34,35,36,37,38,39,40]. For example, Shen et al. proposed a nonlinear regression method based on a polynomial model for spectral estimation, with consideration given to the potential overfitting problem in the polynomial-based regression model [14,33]. Graham et al. demonstrated that the root-polynomial regression model could provide leading performance in both spectral recovery and color reproduction. Harifi et al. initially applied a principal-component-analysis-embedded regression method to recover the spectral reflectance, and a third-order polynomial system was found to be best for the calculation of the transformation matrix [28]. Therefore, a method that both reduces the problem of overfitting and characterizes the nonlinearity of the spectral reflectance estimation of multispectral imaging systems should be promising.
The main aim of this study is to develop a multispectral imaging system using an RGB camera and two commonly used illuminants. We specifically focused on the development of a more accurate spectral estimation method from raw camera responses using the iteratively reweighted regularization regression model proposed in this study. To solve the overfitting problem, we develop a feature selection process that uses neighborhood component analysis. The superior performance of this proposed method is evaluated and compared with existing methods by using both a semiglossy ColorChecker SG (CCSG140) chart and a matte ColorChecker DC chart (CCDC240). The overall performance of both the proposed and the traditional methods is compared in terms of both spectral and colorimetric accuracy.
2. Multispectral Imaging Model
In this section, we show the details of the proposed method. Two raw images are captured under two different color temperature lighting conditions to recover the spectral reflectance of a scene.
2.1. Regularization Model
The camera response is proportional to the intensity of the captured light; thus, the camera response can be expressed as a linear combination of the camera sensitivity functions, the illumination spectral power distribution, and the spectral reflectance of the objects. As has been shown in previous work [5,6,7,8,9,10], the camera sensor response vector c can be formulated in terms of the spectral reflectance as:

c = rLS + ε

where S is a 31 × 3 matrix that represents the spectral sensitivities of the sensors (assuming 31 bands from 400 nm to 700 nm at 10 nm intervals); L is a 31 × 31 diagonal matrix that represents the spectral power distribution of the illuminant; r is a 1 × 31 discrete spectral vector of the object uniformly sampled over the visible wavelength range, typically from 400 nm to 700 nm at 10 nm intervals; ε is a 1 × 3 vector of the additive system errors; and c is a 1 × 3 vector that represents the camera sensor response.
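As a concrete illustration of this forward model, the following minimal sketch (Python/NumPy, with randomly generated placeholder spectra rather than measured data) computes a camera response from a reflectance vector, an illuminant SPD, and sensor sensitivities; all values and names here are hypothetical.

```python
import numpy as np

# Minimal sketch of the forward imaging model c = rLS + eps
# (31 spectral bands, 400-700 nm at 10 nm intervals; values are illustrative only).
n_bands = 31

r = np.random.rand(1, n_bands)        # 1 x 31 spectral reflectance of one patch
L = np.diag(np.random.rand(n_bands))  # 31 x 31 diagonal illuminant SPD
S = np.random.rand(n_bands, 3)        # 31 x 3 camera spectral sensitivities (R, G, B)
eps = 1e-3 * np.random.randn(1, 3)    # 1 x 3 additive system error

c = r @ L @ S + eps                   # 1 x 3 camera sensor response
print(c)
```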
The spectral reflectance can be estimated using prior knowledge of a training set of measured color patches and camera responses. When the error can be ignored, the above equation can be rewritten for a whole training set in matrix notation as:

R = CQ

where R is an n × 31 matrix of the spectral dataset in which each row represents the spectral reflectance of one of the n samples; C is an n × 3 matrix (n × 6 when two illuminations are used) that represents the camera responses, where each row consists of one or two sets (two three-channel images taken under two different lighting conditions) of camera responses for a sample; and Q is a 3 × 31 (or 6 × 31) matrix that represents the transform between the camera response and the spectral reflectance.
As described in the literature, most spectral estimation processes are based on regression models, including linear, second-, third-, and fourth-order polynomial expansion models. In general, the accuracy of the spectral estimation depends not only on the training set but also on the number of signal features: the greater the number of features, the better the nonlinearity of the imaging system can be characterized. To improve the spectral estimation accuracy, a polynomial transform was used to extend the response values instead of increasing the number of imaging channels. Taking a third-order polynomial model as an example, the six-channel camera response (a row of C) can be expanded to 84 terms, i.e., the constant term plus all monomials of the six channels up to the third order:

d = [1, c₁, …, c₆, c₁², c₁c₂, …, c₆², c₁³, c₁²c₂, …, c₆³]
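The sketch below (Python/NumPy, with the hypothetical helper name polynomial_expand) generates such an expansion by enumerating the constant term and all monomials of the six channels up to the third order; the exact ordering of the 84 terms used in the study is not specified, so the ordering here is illustrative only.

```python
import numpy as np
from itertools import combinations_with_replacement

def polynomial_expand(C, degree=3):
    """Expand an n x k camera-response matrix into all monomials of total degree
    <= `degree`, including the constant term (84 terms for k = 6, degree = 3).
    Sketch only; the term ordering is illustrative."""
    n, k = C.shape
    terms = [np.ones(n)]                                    # constant term
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(k), d):
            terms.append(np.prod(C[:, list(idx)], axis=1))  # product of the chosen channels
    return np.column_stack(terms)

# Example: an n x 6 response matrix C becomes an n x 84 expanded matrix D.
# D = polynomial_expand(C)
```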
Using D to denote the matrix expanded from C, a model can be built to map the polynomial camera response features to the spectral reflectance as:

R = DΘ

where Θ is the 84 × 31 transform matrix searched for by the least-squares method. Overfitting can be partly addressed by imposing a penalty on the size of the coefficients: the L2-norm of the coefficient matrix Θ, weighted by a regularization parameter κ, is added to the loss expression to give preference to solutions with smaller norms. The objective function is as follows:

J(Θ) = ‖DΘ − R‖² + κ‖Θ‖²
Here, κ is a regularization parameter to be empirically selected. The purpose of this regularization is to stabilize the regression output, preventing a large change in the result when small perturbations occur in the input camera response. The gradient of the objective function in the least-squares sense becomes:

∇J(Θ) = 2Dᵀ(DΘ − R) + 2κΘ
Setting the gradient to zero gives:

Θ = (DᵀD + κI)⁻¹DᵀR

where I is the identity matrix. Small, positive values of κ reduce the variance of the estimates. Although biased, the reduced variance of ridge estimates often results in a smaller mean squared error than the ordinary least-squares estimates, particularly when κ is small compared with the smallest positive singular value of D. Referring to the singular value decomposition of the expanded camera signal response matrix, D = UΣVᵀ, the above equation can be transformed into:

Θ = V(ΣᵀΣ + κI)⁻¹ΣᵀUᵀR
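A minimal sketch of this regularized solution is given below (Python/NumPy); the function names are hypothetical, D is the expanded response matrix, R the measured reflectance matrix, and κ a chosen regularization parameter. The SVD variant simply restates the same closed-form solution.

```python
import numpy as np

def ridge_transform(D, R, kappa):
    """Closed-form regularized solution Theta = (D^T D + kappa*I)^(-1) D^T R.
    D: n x p expanded camera responses, R: n x 31 measured reflectances."""
    p = D.shape[1]
    return np.linalg.solve(D.T @ D + kappa * np.eye(p), D.T @ R)

def ridge_transform_svd(D, R, kappa):
    """Equivalent solution computed via the SVD D = U S V^T."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    shrink = s / (s**2 + kappa)              # regularized inverse singular values
    return Vt.T @ (shrink[:, None] * (U.T @ R))

# Reflectance of new samples is then estimated as R_hat = D_test @ Theta.
```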
2.2. Iteratively Reweighted Regularization Model
The conventional method is optimal when the noise can be ignored; however, it provides a poor estimation and is unreliable when outliers are present in the training data. Residual analysis is required to address these problems and downweight the influence of outliers. Here, we propose a method that assigns a weight to each training sample to downweight the influence of outliers iteratively. In the first iteration, each training sample is assigned an equal weight, and the model coefficients are estimated using the regularization model from Equation (8). At subsequent iterations, the weights are recomputed so that points farther from the model predictions in the previous iteration are given lower weight, until the values of the coefficient estimates converge within a specified tolerance. The modified objective function minimized by the M-estimator is as follows:

J(Θ) = Σᵢ ωᵢ‖dᵢΘ − rᵢ‖² + κ‖Θ‖²

where dᵢ and rᵢ are the ith rows of D and R, and ωᵢ = ω(uᵢ) is the ‘fair’ weight function of the standardized residual uᵢ, defined as:

ω(u) = 1/(1 + |u|)

The M-estimator is a way of mitigating the influence of outliers in an otherwise normally distributed data set.
The value u in the weight function is

u = resid/(tune · s · √(1 − h))

where resid is the vector of residuals from the previous iteration, tune is the tuning constant (with 1.4 as the default), h is the vector of leverage values from a least-squares fit, which is calculated from a QR decomposition, and s is an estimate of the standard deviation of the error term given by

s = MAD/0.6745

where MAD is the median absolute deviation of the residuals. Assuming the signal matrix is of full rank and the sample errors are independent and identically distributed, the gradient of the weighted objective in the least-squares sense becomes

∇J(Θ) = 2DᵀW(DΘ − R) + 2κΘ

where W is the diagonal matrix of the sample weights ωᵢ.
Setting this gradient to zero, the scheme for finding the solution is as follows:

Θ = (DᵀWD + κI)⁻¹DᵀWR

Solving this estimating equation is equivalent to a weighted least-squares problem: the weights depend on the residuals, and the residuals depend on the estimated coefficients, so an iterative solution is required:
(a) The initial estimates of spectral reflectance R0 are calculated in Equation (8).
(b) At each iteration t, residuals resid(t−1) and associated weights wi(t−1) are calculated from the previous iteration.
(c) The new weighted least-squares estimates are solved with Equation (10).
(d) Steps (b) and (c) are repeated until the estimated coefficients converge; a minimal numerical sketch of this loop is given below.
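The sketch (Python/NumPy) assumes the ‘fair’ weight function ω(u) = 1/(1 + |u|), the default tuning constant of 1.4, a MAD-based scale estimate, and leverage values obtained from a QR decomposition, which mirrors common robust-fitting implementations; it is not the authors’ MATLAB code, and the helper name irls_ridge is hypothetical.

```python
import numpy as np

def irls_ridge(D, R, kappa, tune=1.4, max_iter=50, tol=1e-6):
    """Iteratively reweighted regularized regression (sketch).
    D: n x p expanded responses (assumed full column rank, n > p), R: n x q reflectances."""
    n, p = D.shape
    Theta = np.linalg.solve(D.T @ D + kappa * np.eye(p), D.T @ R)  # (a) unweighted start
    Q, _ = np.linalg.qr(D)                  # leverage values h from a least-squares fit
    h = np.sum(Q**2, axis=1)
    for _ in range(max_iter):
        resid = np.linalg.norm(R - D @ Theta, axis=1)              # (b) residual per sample
        s = np.median(np.abs(resid - np.median(resid))) / 0.6745   # MAD-based scale estimate
        u = resid / (tune * max(s, 1e-12) * np.sqrt(np.clip(1.0 - h, 1e-12, None)))
        w = 1.0 / (1.0 + np.abs(u))                                # 'fair' weight function
        W = np.diag(w)
        Theta_new = np.linalg.solve(D.T @ W @ D + kappa * np.eye(p),
                                    D.T @ W @ R)                   # (c) weighted ridge solve
        if np.max(np.abs(Theta_new - Theta)) < tol:                # (d) convergence check
            return Theta_new
        Theta = Theta_new
    return Theta
```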
2.3. Feature Selection
It should be noted that when the number of polynomial expansion signals is large, its components are correlated and the columns of the signal matrix have an approximately linear dependence. The estimation then becomes extremely sensitive to random noise in the camera response, producing a large variance, and multicollinearity becomes an issue. This degrades the prediction performance and the stability of the spectral estimation precision [29]. The objective of feature selection is to search for a subset of the extended polynomial camera responses that optimally models the relationship between the camera responses and the spectral reflectance. The subset is subject to constraints such as required or excluded features and the size of the subset. The performance of the spectral estimation transform matrix can be improved using neighborhood component analysis feature selection [38,39,40,41,42]. Consider a spectral estimation training set S containing n color patches:

S = {(cᵢ, rᵢ), i = 1, 2, …, n}

where cᵢ is the polynomial expansion of the camera signals from the ith patch, and rᵢ is the corresponding spectral reflectance. A randomized regression model can be built as follows:
(a) A patch Ref(c) is randomly selected from S as the ‘reference point’ for a camera response c.
(b) The response value at c is set equal to the response value of the reference point Ref(c).
The probability P(Ref(c) = cⱼ | S) that point cⱼ is picked from S as the reference for c is

P(Ref(c) = cⱼ | S) = ν(d_w(c, cⱼ)) / Σₗ ν(d_w(c, cₗ))

where d_w is the distance function and ν is the kernel function, which assumes large values when d_w is small. Now consider the leave-one-out application of this randomized regression model, that is, predicting the response for cᵢ using the data in S⁻ⁱ, the training set S excluding the point (cᵢ, rᵢ). The probability pᵢⱼ that point cⱼ is picked as the reference point for cᵢ is given by:

pᵢⱼ = ν(d_w(cᵢ, cⱼ)) / Σₗ≠ᵢ ν(d_w(cᵢ, cₗ)),  j ≠ i
Let r̂ᵢ be the response value that the randomized regression model predicts for cᵢ and rᵢ be the actual spectral response for cᵢ. Let l be a loss function that measures the disagreement between rᵢ and r̂ᵢ. Then, the average value of the loss function l(rᵢ, r̂ᵢ) is given by:

lᵢ = E[l(rᵢ, r̂ᵢ) | S⁻ⁱ] = Σⱼ≠ᵢ pᵢⱼ l(rᵢ, rⱼ)
After adding a regularization term λ, the feature weight vector w is obtained by minimizing the regularized average loss:

f(w) = (1/n) Σᵢ lᵢ + λ Σᵣ wᵣ²

where wᵣ is the weight for the rth feature item, n is the number of observations, and p is the number of predictor variables over which the regularization sum r = 1, …, p runs.
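The sketch below (Python/NumPy) evaluates a simplified version of this feature-selection objective for a given weight vector. It assumes a weighted L1 distance, an exponential kernel, and a squared spectral loss; the exact distance, kernel, and loss used by MATLAB’s fsrnca may differ, and the function name is hypothetical. In practice, the objective is minimized over w with a gradient-based optimizer, and features with near-zero fitted weights are discarded.

```python
import numpy as np

def nca_regression_objective(w, C, Rspec, lam, sigma=1.0):
    """Leave-one-out NCA-style feature-selection objective (sketch).
    w: length-p feature weights, C: n x p expanded responses,
    Rspec: n x q reflectances, lam: regularization strength."""
    # weighted distances d_w(c_i, c_j) = sum_r w_r^2 * |c_ir - c_jr|
    diff = np.abs(C[:, None, :] - C[None, :, :])          # n x n x p
    dist = np.tensordot(diff, w**2, axes=([2], [0]))      # n x n
    K = np.exp(-dist / sigma)                             # kernel values
    np.fill_diagonal(K, 0.0)                              # leave-one-out: exclude self
    P = K / (K.sum(axis=1, keepdims=True) + 1e-12)        # reference probabilities p_ij
    # squared spectral loss l(r_i, r_j) between every pair of patches
    sq_err = np.sum((Rspec[:, None, :] - Rspec[None, :, :])**2, axis=2)
    loss = np.mean(np.sum(P * sq_err, axis=1))            # average leave-one-out loss
    return loss + lam * np.sum(w**2)                      # regularized objective f(w)
```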
3. Experiment and Result
In this section, the proposed method is implemented and compared with currently existing methods; in addition, the influence of feature selection on the estimation accuracy of the proposed method is investigated and discussed.
3.1. Camera Setup
To verify the proposed approach, comparative experiments were conducted. The multispectral imaging system we developed includes a commercial trichromatic camera (Canon EOS 6D Mark II) with 16-bit digitization and two spectrally tunable THOUSLITE LED Cubes mounted with translucent diffuse reflectors. To eliminate glare and specular highlights, a linear polarizer was placed in the illumination plane of each LED Cube with the polarizing axes oriented in the same direction, and another linear polarizing filter was placed on the lens of the camera. As shown in
Figure 1, the imaging plane of the digital camera was set to be approximately parallel to the sample placement plane, and the two LED Cubes were placed at an angle of approximately 45° to the color samples.
During image capture, each Cube was used to simulate D65 (CCT approximately 6500 K) or incandescent light (CCT approximately 3500 K). These two illuminations are among the most commonly used, and the shapes of their SPD curves are quite different and only weakly correlated. The spatial nonuniformity of the light field was corrected using an exposure of a white card. The relative spectral power distributions of the two light sources were measured using a diffuse white sample and are illustrated in
Figure 2. Note that the SPD curves are similar in some wavelength bands because these lights are fitted from 15 narrow channels over the visible wavelength range.
The parameter settings for the camera were fixed during the image capture, with the aperture set to f/5.6, the shutter speed to 1/8 s, and the ISO speed to 640. Canon EOS Utility software was used to control the camera for the image capture. The original raw responses were recorded by the camera and used to predict spectral reflectance. The dark current noise was recorded with the camera lens cap closed and was subsequently subtracted from the captured digital images.
An X-Rite ColorChecker semigloss chart (CCSG, 140 patches) and a GretagMacbeth ColorChecker DC chart (CCDC, 240 patches, comprising 232 matte and 8 glossy patches) were used as color targets and captured by the proposed multispectral imaging system. The spectral reflectance of all the patches was measured with a Konica Minolta CM-2600d portable sphere spectrophotometer from 400 nm to 700 nm at 10 nm intervals (SCI mode, diffuse/8° geometry). Measuring with the Specular Component Included (SCI) captures the true color data from the sample and negates the effect of surface appearance so that only color is measured; it therefore makes little or no difference whether the patches are mirror-like or matte in appearance.
Figure 3 shows the color distribution of both charts plotted in the CIELAB color space. The CIELAB values were calculated using the CIE 1931 standard observer and illuminant D65.
The raw camera RGB of each patch in the color chart was obtained; a 50 × 50 pixel region from the raw Bayer-patterned response at the center of each color patch was extracted without postprocessing and demosaiced by the DcRaw program. Then, the transform matrix between the camera response and the spectral reflectance was calculated from the training set and evaluated on the test set. To test the model more robustly and fairly, a ten-fold cross-validation approach was repeated ten times to evaluate the proposed method. All the patches were divided randomly into ten groups; for each fold, 9/10 of the patches were assigned as the training set and 1/10 of the patches as the test set. Both spectral differences and color differences between the model-predicted results from the camera and the measurement results from the spectrophotometer were calculated to represent the predictive accuracy of the spectral reflectance estimation. That is, the root-mean-square error (RMS) was used as the spectral metric [15], with CIEDE2000 (color difference) under the CIE standard illuminant D65 as the colorimetric metric. The RMS and CIEDE2000 are positive values, with 0 corresponding to a perfect estimation. The RMS metric is given by the following equation:

RMSᵢ = √[(1/N) Σₖ (r̂ᵢ(λₖ) − rᵢ(λₖ))²]

where r̂ᵢ denotes the reconstructed spectral reflectance of the ith patch, rᵢ denotes the measured reference of the ith patch, N denotes the number of wavelength samples of the spectral reflectance over the visible spectrum, and the reported statistics are averaged over all n test samples. The CIEDE2000 color difference was computed from the CIELAB values of the estimated and measured spectra following the CIE recommendation.
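A minimal sketch of the per-patch RMS computation is shown below (Python/NumPy, hypothetical helper name). The CIEDE2000 values are not reproduced here; they are obtained by applying the standard CIEDE2000 formula to the CIELAB coordinates of the estimated and measured spectra under illuminant D65.

```python
import numpy as np

def spectral_rms(R_est, R_meas):
    """Per-patch spectral RMS error between estimated and measured reflectance.
    R_est, R_meas: n x N arrays (N = 31 wavelength samples, 400-700 nm at 10 nm).
    Returns a length-n vector of RMS values; 0 corresponds to a perfect estimation."""
    return np.sqrt(np.mean((R_est - R_meas)**2, axis=1))
```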
3.2. The Influence of Feature Selection
Another common problem, raised in the Introduction, concerns the number of features in the expanded polynomial. For all the existing methods, the performance is calculated using the full first-order, second-order, or third-order expansions, rather than using only the most important selected features. As noted in
Section 2.3, the feature number of the camera response is crucial for spectral estimation.
Neighborhood component analysis feature selection is performed to optimize the feature selection for the proposed imaging system. Here, the MATLAB function fsrnca, from the Statistics and Machine Learning Toolbox, was used. However, as explained by Urban et al., too many items can cause overfitting problems [22]. By selecting weighted features, the colorimetric and spectral precision can be examined as the feature number is changed. In this study, using 380 color samples from the SG140 chart and the DC240 chart, a regularization model with third-order polynomial expansion was used to evaluate the different selected features in terms of both the spectral reflectance RMS and the mean CIEDE2000 color difference.
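The following sketch outlines how such a sweep over the number of retained features could be carried out: features are ranked by their NCA weights, the top k are kept, and a regularized transform is fitted and tested with k-fold cross-validation. The helper name, the fold protocol, and the plain ridge solve are illustrative assumptions and may differ from the exact procedure used in this study.

```python
import numpy as np

def sweep_feature_count(D, Rspec, nca_weights, kappa, ks=range(5, 85, 5), n_folds=10):
    """Sketch: mean spectral RMS as a function of the number of retained features.
    D: n x 84 expanded responses, Rspec: n x 31 reflectances,
    nca_weights: length-84 NCA feature weights."""
    order = np.argsort(nca_weights)[::-1]            # most relevant features first
    n = D.shape[0]
    folds = np.array_split(np.random.permutation(n), n_folds)
    results = {}
    for k in ks:
        sel = order[:k]
        errs = []
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            Dtr, Dte = D[np.ix_(train_idx, sel)], D[np.ix_(test_idx, sel)]
            Theta = np.linalg.solve(Dtr.T @ Dtr + kappa * np.eye(len(sel)),
                                    Dtr.T @ Rspec[train_idx])
            errs.append(np.sqrt(np.mean((Dte @ Theta - Rspec[test_idx])**2, axis=1)))
        results[k] = float(np.mean(np.concatenate(errs)))  # mean RMS for k features
    return results
```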
Figure 4 reveals the relationship between the mean spectral reflectance RMS, the mean CIEDE2000 color difference, and the number of features within the expanded polynomial feature range. Both the colorimetric and the spectral errors initially decreased as the feature number increased and then rose again after an optimum of approximately 30 features was reached. Thus, it is more meaningful to understand the importance of the features and to train a model using only the selected features.
Figure 5a illustrates the feature selection weights of the 84 extended third-order polynomial feature items from the six-channel camera responses using neighborhood component analysis (NCA). Over half of the feature weights are less than 400. Using the loss on the test set as the measure of performance, the weights of the irrelevant features should be close to zero, and tuning the feature selection with five-fold cross-validation should improve the performance.
Figure 5b shows a hierarchical treemap view of the feature weights and reveals obvious patterns, including which items are most important for spectral estimation. The relationship between the features is indicated by color and proximity. The 30 most relevant features were identified and are summarized in
Table 1. There are 6 first-order terms, 17 second-order terms, and 7 third-order terms among the features most important for spectral estimation. This means that not all terms are necessary for spectral estimation; some of the second-order and third-order polynomial expansion terms are irrelevant or even detrimental and decrease the estimation performance. In addition, it should be noted that the selection of features might depend on the camera, the illumination, and the training dataset.
Based on the proposed multispectral camera system, the performance of the spectral reflectance estimation with different expansions (including the linear expansion, the second-order polynomial expansion, the third-order polynomial expansion, and finally, the third-order polynomial expansion with the proposed 30 features) was evaluated in terms of both the colorimetric error and spectral difference. All results are given in
Table 2. Comparing the evaluation results of the different polynomial expansion methods, it can be seen that the best performance, i.e., the smallest average color difference between the predicted and measured spectra, is achieved by selecting 30 feature items from the third-order polynomial expansion; this configuration has the smallest root-mean-square error (RMS), the smallest CIEDE2000 color difference under D65 illumination, the lowest standard deviation (SD), and the largest Student’s t statistic (t-stat). The t-stat is used to test the significance of the estimates; the higher the t-stat value, the greater the confidence we have in the coefficient as a predictor. It is not safe to conclude that more features are needed to obtain more accurate results: an excessive number of features would cause overfitting and thus reduce accuracy.
To visualize the performance of different signal expansion methods, the estimated spectra of three randomly selected samples (No. 54, No. 180, and No. 288) are plotted in
Figure 6, where the left axis shows the estimated spectra of samples with different polynomial expansions and the right axis shows the ΔR error between estimated and measured spectra. Our feature selection method outperforms the traditional expansion methods in terms of both spectral and colorimetric accuracy.
3.3. The Influence of the Regression Model on the Proposed Method
The proposed method was implemented and compared with currently existing methods. These included regularized least squares (RLS) [34], Tikhonov-regularized least squares [36], the Wiener method [5], the ordinary least-squares method (OLS) [10], principal component analysis (PCA) [7], and the partial least-squares method (PLS) [33]. All the existing methods were implemented under their optimal conditions; six principal components were used conventionally for the PCA- and PLS-based methods, as the first six (instead of the total 84) components explain over 95% of the total variance.
Table 3 and
Table 4 compare the evaluation results for the spectral RMS metric and the CIEDE2000 color difference between 84 items (third-order polynomial expansion) and 30 items (feature-selected). They show that our proposed method, the iteratively reweighted regularization model (IRWR), improves both the spectral and colorimetric precision of the reconstruction when feature selection is used, while the traditional methods exhibit a large discrepancy because they do not consider the overfitting effect caused by an excessive number of features. In particular, for the feature-selected model (30 items), IRWR has the smallest RMSE (mean value of 2.14%, maximum value of 9.35%, standard deviation (SD) of 1.77%) and the smallest CIEDE2000 color difference (mean value of 1.79, maximum value of 7.31, SD of 1.39) compared with the other methods. Furthermore, the predictive error of the feature-selected method is reduced by approximately 0.19 CIEDE2000 units compared with the unselected 84 items. Compared with the fully expanded items, our feature-selected model has the lowest standard deviation (SD), which indicates that the spectral estimation errors tend to be close to the mean value and that the model is robust.
To illustrate both the overall predictive accuracy and the error distribution of all methods,
Figure 7 is plotted using boxplots to represent the distributions of the color difference and spectral RMS error for the proposed and existing estimation methods. The boxplot distribution of the estimation results of the proposed method is more compact than that of the existing methods, and most of its CIEDE2000 and spectral RMS values are smaller than those of the other traditional methods. The top of the blue rectangle indicates the upper quartile, the horizontal red line near the middle of the rectangle indicates the median, and the bottom of the blue rectangle indicates the lower quartile. The whisker extending from the top of the rectangle indicates the maximum value (excluding outliers), and the whisker extending from the bottom indicates the minimum value. The outliers (red crosses) above or below the box body are values greater than the upper quartile plus 1.5 times the interquartile range or less than the lower quartile minus 1.5 times the interquartile range. In addition, the boxplot of the overall error distribution for the ten-fold validation in
Figure 7 shows that more samples have small errors when estimated by the proposed method, which intuitively demonstrates the superiority of the proposed method.
The reconstruction of the spectral reflectance of three randomly selected samples (No. 54, No. 180, and No. 288) compared with those of the existing methods is shown in
Figure 8, where the black line is the measured spectral reflectance. The reflectance reconstructed by the proposed method is found to be more accurate than those of the traditional methods. These experimental results may, to some extent, demonstrate the superiority of the proposed method.
3.4. Methods Implementation and Comparison for Illuminant Metamerism
To evaluate whether the RGB camera under two illuminations gives a better spectral estimation than the camera under one illumination, our proposed model with third-order polynomial regression was also investigated under each illumination (3500 K or 6500 K) separately. The performance of the spectral estimation under 3500 K, under 6500 K, under both illuminations without feature selection (84 items), and under both illuminations with feature selection (30 items) is listed in
Table 5 in terms of the mean error, maximum and minimum errors, standard deviation (SD), and Student’s t mean and variance (t-stat) of the color differences under the five test conditions. All the color difference data were calculated according to the CIE Special Metamerism Index: Change in Illuminant (CIE 015.2-1986), and the data listed in the table are the CIE ΔE*76 color differences between the estimated and measured spectra.
The results indicate that the three-channel signals under 3500 K and 6500 K with traditional quadratic expansions have high chromatic accuracy near the corresponding color temperatures. The performance of the six channels under the two illumination conditions is improved. It can be seen that the best performance, i.e., the smallest average color difference between the predicted and measured spectra, is achieved by our proposed method, which uses 30 items selected from the third-order polynomial expansion variables and six channels under both 3500 K and 6500 K illumination. The average color differences range from 2.14 (Illuminant D65) to 2.40 ΔE*ab units (Illuminant D50), with an overall mean of 2.23 ΔE*ab units, a maximum value of 5.41, and a standard deviation (SD) of 1.19. Additionally, the reconstructed spectral reflectance of the three randomly selected samples (No. 54, No. 180, and No. 288) with single illumination (3500 K, 6500 K), double illumination, and feature-selected double illumination is shown in
Figure 9. Our method achieves improved estimation accuracy by selecting features from polynomial expansions of camera response under two illumination conditions.
3.5. Discussion
In this study, we developed a cross-polarized multispectral camera system using two commonly used illuminants and proposed a new method to improve the accuracy of estimating spectral data from raw camera responses. The results illustrate that the performance of reflectance estimation from camera images can be significantly improved when two broadband illuminations are used. This implies that multi-illumination with consumer-grade cameras can be an effective approach to constructing a multispectral camera system. The factors that have contributed to the improvement in estimation accuracy are as follows.
(1) We found that the feature selection of the expanded camera response influences the estimation performance. We have shown that a small number of features can provide better performance than the full selection of features of the camera response expansion. The selection of features might depend on the camera, illumination, and training dataset. Further investigation of factors affecting the feature selection was conducted and reported in this paper.
(2) As outliers present in the training data could lead to poor estimates, the iteratively reweighted regularization model was proposed. An analysis of residuals was necessary to work around this by assigning weights to the training data. The weighting is done automatically and iteratively: weights are recomputed so that points farther from the model predictions in the previous iteration are given lower weight, and the influence of outliers is thereby downweighted. The result supported our previous research.
(3) The most significant performance improvement was achieved by mapping the six-channel signals under two illumination conditions into the spectral reflectance, which minimized the degree of metamerism significantly compared to the three-channel mapping. As shown in
Table 5, for each of the illumination conditions tested, the best performance, i.e., the smallest mean color difference between the predicted and the measured spectral data, was achieved by using six channels under two illumination conditions, where the mean color differences ranged from 2.14 (Illuminant D65) to 2.4 ΔE*ab units (Illuminant D50), with an overall average of 2.23 units.