2.1. Hyperspectral Imaging
Hyperspectral imaging (HSI) is a method of capturing and processing images formed from electromagnetic radiation in the ultraviolet (UV), visible (VIS), and near-infrared (NIR) ranges [26]. This technique is applied in various industries where particle recognition, material identification, and analysis of substance compositions and concentrations are required.
The initial step in HSI involves illuminating the target with a light source that covers the spectral range of the hyperspectral camera. Halogen lamps are often used for this purpose because they combine low cost with high spectral power density. After reflection from the sample, the light is collected by an objective and passes through the entrance slit, which only admits light from a narrow linear region. After collimation, the light is dispersed into a sequence of narrow spectral bands and imaged onto a CMOS matrix detector. The collected image represents a single line of the object as a function of wavelength [27].
To obtain a complete image of the sample, the acquisition of each line must be synchronized with the sample’s linear translation. Dedicated linear tables supplied by the camera manufacturer, with adjustable travel length and speed, are often used for this purpose. By scanning individual lines, the hyperspectral camera accumulates an image composed of all the scanned lines. The resulting image is often presented as a hyperspectral cube encompassing both spatial and spectral dimensions, which facilitates a comprehensive representation of the analyzed sample’s properties.
Hyperspectral imaging devices capture the intensity, which is denoted as $I(x, y, \lambda)$, where $x$ and $y$ are pixel coordinates, $\lambda$ represents the wavelength of electromagnetic radiation, and $I$ is the radiation intensity. Additionally, the quantities $\Delta x$ and $\Delta y$ determine the spatial resolution of the image and are directly dependent on the camera’s construction, including the lenses and detectors used. The size $\Delta\lambda$ corresponds to the spectral resolution, i.e., the intervals between consecutive spectral bands; this size is known as the full width at half maximum (FWHM) [28].
Hyperspectral imaging provides continuous intensity information for the entire camera spectrum, unlike multispectral imaging, for which intensity is characterized only at discrete bands. The collected hyperspectral data are represented by a three-dimensional matrix whose dimensions are the length of the image, the width of the image, and the number of wavelength channels.
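The cube layout described here can be illustrated with a minimal NumPy sketch. It is not the acquisition software's actual data format. The function name `assemble_cube` is hypothetical, the width of 384 pixels is an assumption taken from the camera model name, and the 170 channels match the dimensionality quoted later in this section.

```python
import numpy as np

def assemble_cube(line_frames):
    """Stack line-scan frames into a cube of shape (lines, width, channels)."""
    frames = [np.asarray(f) for f in line_frames]
    if len({f.shape for f in frames}) != 1:
        raise ValueError("all line frames must share the same (width, channels) shape")
    return np.stack(frames, axis=0)

# Synthetic stand-in for a scan: 5 lines, 384 pixels per line (assumed), 170 channels.
rng = np.random.default_rng(0)
frames = [rng.integers(0, 4096, size=(384, 170)) for _ in range(5)]
cube = assemble_cube(frames)

spectrum = cube[2, 100, :]    # full spectrum of one pixel
band_image = cube[:, :, 50]   # spatial image at a single wavelength channel
samples = cube.reshape(-1, cube.shape[-1])  # one row per pixel, one column per channel
```

Flattening the two spatial axes, as in `samples`, gives the per-pixel table on which per-wavelength statistics are usually computed.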
Hyperspectral cameras find extensive application in various industries and in material science. They are utilized for:
Conducting research on plant phenotyping [29];
Analyzing artwork and historical artifacts to identify the materials used in paintings [30];
Supporting material science, particularly waste classification in sorting facilities [31];
Examining tissues and pathological changes in the medical field [32].
In hyperspectral imaging, the sample’s content can be studied based on the scattering, absorption, or rotation of the plane of polarization of the electromagnetic waves. Detecting particle concentrations through the scattering of near-infrared radiation poses challenges for the precision and repeatability of the measurement results. Sources of noise include variations in human body temperature, which change the scattering of the radiation reaching the camera [33], and the fact that the intensity changes resulting from scattering lie within the range of measurement noise [34].
Due to the above factors, a more favorable approach is to apply a detection method based on the examination of radiation absorption.
The research project focuses on non-invasive glucose and silicon level detection using hyperspectral imaging techniques. The absorption of radiation by glucose falls within wavelength intervals similar to those of water, hemoglobin, and fats [35]. By applying appropriate data analysis methods, it is possible to extract information about the concentration of the molecule of interest.
The study progresses through a systematic series of steps and data inspections aimed at achieving accurate and reliable results.
2.3. Measurement Setup
The measurement setup was equipped with a hyperspectral camera (Headwall, model: Micro-Hyperspec SWIR 384, wavelength range: 900–2500 nm) mounted vertically over a linearly movable table. The samples were placed on the table and illuminated with a halogen lamp. Each of the prepared solutions was poured into Petri dishes made of polystyrene (PS).
Figure 1 presents a testbed for data collection.
The next step was to set the focus of the lens according to the height at which the camera was mounted. Headwall provides the data acquisition software for its devices. In the software, the speed and travel distance of the linear table had to be set, along with the camera’s operating parameters, such as the exposure interval for each line of the captured image and the duration of data acquisition. The data collected for the study are available at [25].
2.3.1. Initial Data Inspection for Glucose in Distilled Water
The whole dataset contains 94,730 samples across all glucose concentrations and a total of 47,375 samples for silicon. The obtained data were initially processed to assess their quality and to explore their potential for prediction. The number of hyperspectral samples is the same for every glucose concentration.
From the determined parameters for each wavelength, it can be inferred that the dataset has:
Spectral data span the camera's spectral range in intervals of approximately 9 nm between channels.
Spectral data are stored in the int64 format.
There are 94,730 samples for each wavelength.
The skewness coefficient for the entire wavelength range is positive and ranges from 0.2 to 8.5, excluding the wavelength 902.503 nm.
Kurtosis for wavelengths in the range of 902.503–1312.93 nm and 2315.13–2506.03 nm has values from −0.2 to −1.7, while for wavelengths 1389.29–1580.18 nm, it ranges from 70 to 163. For wavelengths 1608.82–2019.24 nm, this coefficient is between 25 and 55, and for the remaining wavelengths, it ranges from 0.2 to 5.
The linear correlation coefficient between the wavelength intensity value and glucose concentration using the Pearson method is approximately −0.3.
The correlation coefficient indicates the strength of the linear association between two variables, i.e., how well one variable can be predicted from the other. In this case, it is useful information for predicting the concentration of particles from the intensities at individual wavelengths.
The absolute value of the skewness coefficient ranges from 0.1 to 9. A high absolute value indicates stronger asymmetry in the distribution of the samples and poorer repeatability of the data collected by the camera at these wavelengths. Positive values indicate a right-skewed distribution, while negative values indicate a left-skewed one. For a normal distribution, the skewness is 0.
For a normal distribution, the kurtosis coefficient is 3 (equivalently, the excess kurtosis is 0). The negative values reported above can only arise for excess kurtosis, so they should be compared against 0 rather than 3. A coefficient below the normal reference corresponds to a platykurtic distribution, which is flatter than a normal distribution, with points spread more evenly around the mean and a lower probability of outliers. A coefficient above the reference corresponds to a leptokurtic distribution, in which the data are more concentrated around the mean [36]. High kurtosis in the region of light absorption may indicate a higher occurrence of extreme values associated with the absorption of energy at these wavelengths and may thus indicate that the light intensity changes with glucose concentration.
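The skewness and kurtosis interpretation can be checked on synthetic data with a short NumPy sketch. It uses the plain moment-based (biased) estimators, which is an assumption: the estimator used in the study is not stated, and library routines often apply a bias correction. The three synthetic "channels" are illustrative only.

```python
import numpy as np

def central_moments(x):
    """Return the 2nd, 3rd and 4th central moments of each column."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean(axis=0)
    return (d**2).mean(axis=0), (d**3).mean(axis=0), (d**4).mean(axis=0)

def skewness(x):
    """Moment-based skewness per column (0 for a symmetric distribution)."""
    m2, m3, _ = central_moments(x)
    return m3 / m2**1.5

def excess_kurtosis(x):
    """Moment-based kurtosis minus 3 per column (0 for a normal distribution)."""
    m2, _, m4 = central_moments(x)
    return m4 / m2**2 - 3.0

rng = np.random.default_rng(0)
n = 100_000
channels = np.column_stack([
    rng.normal(size=n),       # symmetric reference
    rng.uniform(size=n),      # platykurtic: flatter than normal
    rng.exponential(size=n),  # right-skewed and leptokurtic
])
skew = skewness(channels)
kurt = excess_kurtosis(channels)
```

On these samples the normal column gives values near 0 for both statistics, the uniform column gives a negative excess kurtosis, and the exponential column gives positive skewness and positive excess kurtosis. A channel with constant intensity has zero variance, so both ratios are undefined for it.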
Detailed statistical information is presented in Table 2 (glucose) and Table 3 (silicon). The values of individual parameters such as standard deviation, skewness, kurtosis, and correlation for different wavelengths are presented to provide insights into different aspects of the dataset.
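Additionally, the correlation matrix between the concentration and each wavelength was determined using the formula:
$$r_{ij} = \frac{1}{p} \sum_{k=1}^{p} \frac{(x_{ik} - \mu_i)(x_{jk} - \mu_j)}{\sigma_i \sigma_j}$$
where $p$ is the number of pixels in vector $x_i$, $\mu_i$ is the mean for a single spectral channel, and $\sigma_i$ is the standard deviation for a single spectral channel.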
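A minimal NumPy sketch of the Pearson correlation between the concentration and each spectral channel is given here. It assumes the population normalization (division by the number of pixels, matched by `np.std` with its default `ddof=0`). The function name, the concentration values, and the synthetic spectra are hypothetical and are not taken from the dataset.

```python
import numpy as np

def channel_concentration_correlation(spectra, concentration):
    """Pearson r between every spectral channel (column) and the concentration."""
    X = np.asarray(spectra, dtype=float)
    y = np.asarray(concentration, dtype=float)
    p = X.shape[0]                       # number of pixels (rows)
    Xc = X - X.mean(axis=0)              # centre each channel
    yc = y - y.mean()
    cov = Xc.T @ yc / p                  # covariance of each channel with y
    return cov / (X.std(axis=0) * y.std())

# Hypothetical data: 4 concentration levels, 25 pixels each, 3 channels.
rng = np.random.default_rng(1)
concentration = np.repeat([0.0, 50.0, 100.0, 200.0], 25)
noise = rng.normal(size=concentration.size)
spectra = np.column_stack([
    1000.0 - 2.0 * concentration,                # perfectly anti-correlated
    500.0 + 0.5 * concentration + 20.0 * noise,  # noisy positive trend
    rng.normal(size=concentration.size),         # unrelated channel
])
r = channel_concentration_correlation(spectra, concentration)
```

The result agrees with `np.corrcoef` computed channel by channel. A channel with constant intensity has zero standard deviation and would need to be excluded before the division.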
The wavelengths were also compared with one another to investigate their correlation and linear dependencies. The plot suggests that most of the data may introduce disturbances to the model, so dimensionality reduction of the hyperspectral data would be beneficial.
Figure 2 and Figure 3 show the intensity values depending on the wavelength for glucose and silicon, respectively.
The examined parameters represent statistical features that affect the initial data preprocessing, detection of outliers, feature selection, and modeling decisions. Thorough consideration of these statistics can lead to more accurate and meaningful analyses and conclusions from the hyperspectral dataset. For the collected data, the kurtosis coefficient ranges from 0 to 100, indicating a diverse distribution of data for each channel. For wavelengths for which the intensity values are constant, the kurtosis coefficient falls within the range from −2 to 0. The highest values were observed for the wavelength range from 1150 nm to 1700 nm. The correlation coefficient between the intensity of light and the glucose concentration is approximately −0.3, indicating a weak decreasing linear relationship between these variables.
The values recorded for the wavelength 892.958 nm appear erroneous and therefore unreliable. Errors at a given wavelength may result from damage to the device or overexposure at that wavelength. The data for this wavelength would need to be removed to ensure the overall quality of the dataset.
2.3.2. Preprocessing
Figure 4 presents the flowchart that illustrates the applied method.
The first step was the normalization of spectral data for each pixel. This process led to the improvement of linear relationships between wavelengths and to noise reduction.
First-order numerical differentiation was applied to the data using the central difference method. Numerical differentiation of hyperspectral data is particularly valuable when the rate of change or the gradient of the data needs to be analyzed, which can reveal underlying trends or anomalies. However, because differentiation is sensitive to rapid signal changes, it can amplify noise and distort the signal. Therefore, it is advisable to apply a Savitzky–Golay filter in the preceding step to mitigate this effect [37].
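The smoothing and differentiation steps can be sketched with NumPy alone. The window length of 7 and polynomial order of 2 are assumed values, since the filter settings are not stated. Repeating the edge values at the boundaries is a simplification that library implementations may handle differently. The helper names are hypothetical.

```python
import numpy as np

def savgol_coefficients(window_length, polyorder):
    """Least-squares smoothing weights for the centre point of an odd window."""
    if window_length % 2 == 0 or polyorder >= window_length:
        raise ValueError("window_length must be odd and larger than polyorder")
    half = window_length // 2
    offsets = np.arange(-half, half + 1)
    design = np.vander(offsets, polyorder + 1, increasing=True)  # columns 1, j, j^2, ...
    # Row 0 of the pseudo-inverse maps the window samples to the fitted value at offset 0.
    return np.linalg.pinv(design)[0]

def smooth(spectrum, window_length=7, polyorder=2):
    """Savitzky-Golay smoothing of a 1-D spectrum (edge values are repeated)."""
    coeffs = savgol_coefficients(window_length, polyorder)
    padded = np.pad(np.asarray(spectrum, dtype=float), window_length // 2, mode="edge")
    return np.convolve(padded, coeffs[::-1], mode="valid")

def first_derivative(spectrum, wavelengths):
    """Central differences in the interior, one-sided differences at the two ends."""
    return np.gradient(spectrum, wavelengths)

# Synthetic quadratic spectrum on a 9.5 nm grid (illustrative values only).
wavelengths = 900.0 + 9.5 * np.arange(50)
spectrum = 1e-4 * (wavelengths - 900.0) ** 2
derivative = first_derivative(smooth(spectrum), wavelengths)
```

For a quadratic input, both the order-2 filter and the central difference are exact away from the edges, so `derivative` matches the analytic slope in the interior. For a full dataset, the same functions could be applied to each pixel's spectrum, for example with `np.apply_along_axis`.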
Figure 5 and Figure 6 show the intensity distribution of each wavelength after normalization and filtering for glucose and silicon, respectively.
Based on hyperspectral data, principal components were determined to reduce dimensionality and, consequently, noise and distortions. Principal components are a linear combination of the original features, which makes their direct interpretation challenging.
Eigenvalues provide information about what part of the total variability is explained by a given principal component. The first principal component explains the largest part of the variance, the second principal component explains the largest part of the variance not explained by the previous component, and so on. As a result, each successive principal component explains a smaller part of the variance, meaning that successive eigenvalues are progressively smaller.
The total variance is the sum of the eigenvalues, which allows for calculation of the percentage of variability defined by each component. Consequently, for each successive component, the cumulative variability and the cumulative percentage of the variability can be computed.
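The relationship between eigenvalues, explained variance, and cumulative variance can be sketched as follows. This uses an eigendecomposition of the covariance matrix in NumPy on synthetic data. The data, the function names, and the 95% threshold passed to the helper are assumptions for illustration, not the study's implementation.

```python
import numpy as np

def pca_explained_variance(X):
    """Eigenvalues (descending), eigenvectors, variance ratios, cumulative ratios."""
    Xc = np.asarray(X, dtype=float) - np.mean(X, axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)  # remove tiny negative round-off
    eigvecs = eigvecs[:, order]
    ratio = eigvals / eigvals.sum()             # share of total variance per component
    return eigvals, eigvecs, ratio, np.cumsum(ratio)

def n_components_for(cumulative, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))

# Synthetic data: 500 samples, 20 channels driven by 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3)) * np.array([10.0, 5.0, 2.0])
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 20))

eigvals, eigvecs, ratio, cumulative = pca_explained_variance(X)
k = n_components_for(cumulative, 0.95)
scores = (X - X.mean(axis=0)) @ eigvecs[:, :k]  # data projected onto k components
```

Because the synthetic data are driven by three latent factors, at most three components are needed to pass the threshold here.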
Principal component analysis helps us to understand which original attributes enable accurate model classification. Visualization techniques, such as a biplot, can aid with understanding the relationships between variables and components. The relational matrix between the first five principal components is presented in Figure 7 and Figure 8.
The color indicates the concentration of each component. Clustering of the data depending on the concentration of the components can be clearly seen.
To achieve the highest ratio of conveyed information to the number of attributes, the first four principal components were selected for glucose, because the eigenvalues of these components sum to 95% of the total for the entire set. The application of this method reduced the number of dimensions from 170 to 4 attributes for glucose and to 2 attributes for silicon. The results are presented in Table 4 for a comparative analysis of the PCA outcomes between glucose and silicon.
2.3.3. K-Fold Cross-Validation
K-fold cross-validation is a statistical method used to estimate the predictive ability of a model. The process involves dividing the dataset into k equal subsets, where one of them is used for validation, while the remaining k−1 subsets are used to train the model. Rotating the validation subset allows the model to be trained and validated on different subsets of the data.
K-fold cross-validation is particularly useful for detecting overfitting during the learning process. It helps evaluate the stability of the model and its ability to generalize to different datasets.
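The splitting procedure can be sketched in NumPy as shown here. An ordinary least-squares model is used purely as a stand-in so that the example stays self-contained. It is not one of the models used in the study, and the data, the value k = 5, and the function names are hypothetical.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, folds[i]

def fit_linear(X, y):
    """Least-squares fit with an intercept column (stand-in model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

# Synthetic regression problem: 100 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)

splits = list(k_fold_indices(len(X), k=5))
rmse = []
for train_idx, val_idx in splits:
    coef = fit_linear(X[train_idx], y[train_idx])
    residual = y[val_idx] - predict_linear(coef, X[val_idx])
    rmse.append(float(np.sqrt(np.mean(residual**2))))

mean_rmse = float(np.mean(rmse))  # averaged validation error across folds
```

A large spread between the per-fold errors would suggest that the model is unstable with respect to the choice of training data.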
2.3.4. Prediction Methods
After preprocessing the collected data, two algorithms were used: support vector regression (SVR) with a radial basis function (RBF) kernel and a multilayer perceptron (MLP) with a ReLU activation function for the input and hidden layers.
Support vector regression (SVR) is a machine learning technique used in regression tasks. The primary goal of SVR is to find a regression function that has minimal deviation from the actual data while keeping the regression errors below a certain permissible level. Unlike traditional regression methods, SVR allows for adaptation to nonlinear relationships between variables.
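For reference, the "permissible level" of error corresponds to the tolerance $\varepsilon$ in the standard textbook formulation of $\varepsilon$-insensitive SVR, shown here together with the RBF kernel. The symbols $C$, $\varepsilon$, $\gamma$, $\xi_i$, and $\phi$ are the usual textbook notation and are not taken from this study, whose hyperparameter values are not stated.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad & y_i - \langle w, \phi(x_i) \rangle - b \le \varepsilon + \xi_i, \\
& \langle w, \phi(x_i) \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n, \\[4pt]
K(x, x') &= \exp\!\left( -\gamma \lVert x - x' \rVert^{2} \right), \qquad \gamma > 0 .
\end{aligned}
```

Deviations smaller than $\varepsilon$ incur no penalty, while $C$ controls how strongly larger deviations are penalized. The kernel $K$ replaces the inner product $\langle \phi(x), \phi(x') \rangle$, which is what allows the model to capture nonlinear relationships.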
MLP (multi-layer perceptron), a type of artificial neural network, can be used for both classification and regression tasks. MLP can model more complex, nonlinear relationships between inputs and outputs. MLP includes at least one hidden layer, allowing for the processing and detection of features at different levels of abstraction. This enables more advanced and intricate data representations, which can be beneficial for addressing more challenging problems.
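The forward pass of such a network for regression can be sketched in a few lines of NumPy. The layer sizes are hypothetical (four inputs are chosen to match the four principal components retained for glucose), the weights are random rather than trained, and a linear output layer is assumed because the target is a continuous concentration.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: negative pre-activations are set to zero."""
    return np.maximum(z, 0.0)

def mlp_forward(x, weights, biases):
    """ReLU on every layer except the last, which stays linear for regression."""
    a = np.asarray(x, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    return a @ weights[-1] + biases[-1]

rng = np.random.default_rng(0)
layer_sizes = [4, 16, 8, 1]  # hypothetical: 4 inputs, two hidden layers, 1 output
weights = [rng.normal(scale=0.5, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

batch = rng.normal(size=(10, 4))  # 10 samples of 4 features
predictions = mlp_forward(batch, weights, biases).ravel()
```

Training would adjust `weights` and `biases` to minimize a regression loss. That step is omitted here.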
The prepared models underwent k-fold cross-validation. K-fold cross-validation aims to assess the model’s quality during training to eliminate potential issues. The process involves dividing the training set into k subsets with an equal number of elements, where k-1 subsets are used for training/adapting the model, and one subset is used for validating the model.