1. Introduction
Over the past 20 years, hyperspectral imaging (HSI) has become an invaluable tool for food safety and quality applications [1,2]. Spoilage and contamination of food and agricultural products are ongoing concerns for the food industry. Recent applications of hyperspectral imaging for food safety include detection of mold in peanuts [3,4], lead pollution in lettuce leaves [5], and Fusarium head blight in wheat kernels and wheat flour [6]. Food fraud, the intentional misrepresentation of food or food ingredients for economic gain, is another major food safety issue that has been addressed with hyperspectral imaging. For example, this technology has been applied to identify fillets of less expensive species of fish that have been marketed and sold as more expensive red snapper (Lutjanus campechanus) fillets [7,8].
Hyperspectral imaging has been a staple of agricultural monitoring, with initial applications dating back to the 1970s. Early applications include large-scale remote monitoring of land and agriculture from the Landsat-I satellite [9], monitoring of crop yield [10], and detection of plant disease and invasive species [11]. While agricultural applications have continued since these early examples, the methods have changed, with new technologies enabling more localized analysis. Unmanned aerial vehicles (UAVs) have become attractive survey platforms for local, detailed aerial monitoring efforts [12], and advancements in computing technology and miniaturization of HSI devices have enabled the construction of new systems for in-field crop analysis [13].
Hyperspectral imaging devices are complex systems that can be characterized by the method with which the full spatial-spectral data cube is obtained. Data cubes can be acquired by spatial scanning, spectral scanning, or a combination of these methods [14]. With spatial scanning imagers, light is collected at a point or along a line and dispersed into its spectral components by a dispersive optic such as a prism or diffraction grating. This point or line is then scanned over the target area through the physical motion of the sensor, reflection from a scanning mirror, or physical motion of the target object. With spectral scanning imagers, the full spatial content is collected by the image sensor for individual wavelengths in sequence. Collection of the wavelengths is typically accomplished by switching wavelengths through filter wheels, electronically controlled liquid crystal tunable filters (LCTFs), or acousto-optic tunable filters (AOTFs) [15].
Despite successes in the food safety and agriculture industries, hyperspectral imaging does have disadvantages, mostly because the data cube is constructed from individual components collected in a time-sequential manner. This can be an error-prone process, especially for high-speed imaging applications. Another category of hyperspectral imager, the snapshot imager, overcomes these issues by combining an array of optics to collect both the spatial and spectral information simultaneously. Usually, this means some compromise in either the spectral or spatial domain, and these solutions tend to be both complex and costly [16]. In research and discovery, it is unknown which wavelengths will be significant and which are redundant. In many cases, once the spectral characteristics for a particular targeted application are understood, the complexity of the spectral imaging system can be reduced significantly.
Issues common to all hyperspectral imager types are the significant computing power required and the large file sizes of the data cubes, especially in applications involving larger fields of view. Attempts to address these issues have included the application of compressive sensing [17,18,19], deep neural networks [20], and methods centered around principal component analysis (PCA) [21]. Each of these solutions, however, still carries its own limitations in terms of heavy computational requirements and large file sizes for data cube analysis.
This paper shows proof of concept for a new method for selecting narrow wavelengths for the classification of material samples. This method could support the design of a hypothetical rapid spectral imaging system consisting of a focal plane array covered with a mosaic color filter array, or of a system using illumination by selected-wavelength LEDs. Such a system could collect full spatial resolution images at a small number of narrow wavelengths for visible/near-infrared (VNIR) reflectance, shortwave infrared (SWIR) reflectance, and fluorescence. The proposed method has the potential to be applied in a hand-held, mobile device for rapid scanning of food products in wholesale or retail marketplaces, or configured as a drone-deployable payload for low-altitude aerial scanning of crops and vegetation.
The aim of this study was to evaluate the potential of this new method in an application combating food fraud: determining the correct species of fish fillets that are often mislabeled to justify a higher selling price [8,22]. Specific objectives were to (1) develop and evaluate a heuristic wavelength selection algorithm, (2) develop and evaluate methods for classifying the species of a fillet using classifiers designed for both single-mode spectroscopy and a fusion of spectroscopy modes, and (3) compare the relative effectiveness of each spectral mode for this classification task.
2. Materials and Methods
2.1. Hyperspectral Imaging Systems
Full-resolution reflectance and fluorescence images were collected using an in-house developed visible and near-infrared (VNIR) hyperspectral imaging system [23]. The light source for the VNIR reflectance was a 150 W quartz tungsten lamp (Dolan Jenner, Boxborough, MA, USA). For fluorescence imaging, two UV narrowband light sources were used, each with four 10 W, 365 nm LEDs (LED Engin, San Jose, CA, USA). VNIR reflectance images in 125 wavelengths within the 419–1007 nm spectral range and fluorescence images in 60 wavelengths within the 438–718 nm range were acquired using a 23 mm focal length lens, an imaging spectrograph (Hyperspec-VNIR, Headwall Photonics, Fitchburg, MA, USA), and a 14-bit electron-multiplying charge-coupled device (EMCCD) camera (Luca DL 604M, Andor Technology, South Windsor, CT, USA).
A separate hyperspectral imaging system was used to acquire reflectance images in the SWIR region. The illumination source for this system was a custom-designed two-unit lighting system, each with four 150 W gold-coated halogen lamps with MR16 reflectors. The detection unit included a 25 mm focal length lens and a hyperspectral camera, including a 16-bit mercury cadmium telluride array detector and an imaging spectrograph (Hyperspec-SWIR, Headwall Photonics, Fitchburg, MA, USA). The SWIR reflectance images were acquired in a wavelength range of 842–2532 nm (287 wavelengths).
2.2. Simulated Annealing
Rather than sensing the full-resolution spectra in each of the three modes, the proposed method uses just a small number of narrow wavelength bands (referred to simply as “wavelengths” in this paper) that are specifically chosen to yield accurate species classifications. Simulated annealing, a heuristic optimization method modeled after the metallurgical annealing process in which a metal undergoes controlled cooling to remove defects and toughen it, was used to select the wavelengths. The simulated annealing algorithm consists of a discrete-time inhomogeneous Markov chain with current state s_i and a cooling schedule defined by a starting temperature, T_max, a final temperature, T_min, and a total number of steps, N [24]. The goal of the algorithm is to determine the minimum of a user-defined energy function, E(s).
At each iteration i, a new trial state is determined by randomly selecting a “neighbor” of the previous state and calculating its energy. If the resulting energy is less than the energy from the previous iteration, the trial state becomes the new state of the system. If the resulting energy exceeds the energy of the previous iteration, the algorithm adopts the trial state with probability given by:

P = exp(−(E(s_trial) − E(s_{i−1})) / T_i),

where T_i is the temperature at iteration i. Note that this equation allows the algorithm to occasionally accept states that result in an increase in energy. This can benefit the optimization by preventing it from becoming stuck in local minima. The probability of accepting such states is high at the beginning of the process when the temperature is high but gradually decreases with decreasing temperature. The output of the algorithm is the state with the lowest energy encountered throughout the annealing schedule.
Figure 1 provides a summary of this algorithm.
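The acceptance rule described above can be sketched in a few lines of Python (a minimal illustration of the Metropolis criterion, not the authors' implementation):

```python
import math
import random

def accept_trial(trial_energy, current_energy, temperature, rng=random.random):
    """Metropolis acceptance rule: always accept a lower-energy trial state;
    accept a higher-energy one with probability exp(-(dE)/T), which shrinks
    as the temperature decreases."""
    delta = trial_energy - current_energy
    if delta <= 0:
        return True
    return rng() < math.exp(-delta / temperature)
```

At high temperatures even energy-increasing moves are usually accepted, while at low temperatures they are almost always rejected, matching the behavior described above.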
For this wavelength selection problem, we define the state as an array of binary elements indicating the presence or absence of each wavelength in the full-resolution spectrum. Because the collected spectra may contain artifacts at the lowest and highest wavelengths, we institute a fixed buffer of size b at either end of the spectrum. Thus, the state at iteration i can be expressed as

s_i = [x_{b+1}, x_{b+2}, …, x_{L−b}],

where x_j is 1 to indicate that the jth wavelength is selected and 0 to indicate that it is not, and L is the total number of wavelengths in the spectrum. Furthermore, because consecutive wavelengths are highly correlated and thus offer little additional information if both are selected, we institute a minimum separation of d wavelength indices between selected wavelengths. Finally, we set a limit, k, on the number of wavelengths selected such that:

Σ_j x_j = k.

Under these three restrictions, we update the state for each iteration by generating a “neighbor” of the current system state. This is done by randomly de-selecting one wavelength index from the current state and selecting a new one. The energy of the trial state is then calculated as

E(s_i) = 1 − A(s_i),

where A(s_i) is the average 4-fold cross-validation accuracy (see Section 2.5) as determined using the weighted k-nearest neighbors (WKNN) classifier. WKNN is a variation of the familiar k-nearest neighbors algorithm in which the training data points are weighted by the inverse square of their distances from the query point. It was chosen as the basis for the energy calculation because of its relatively high classification performance and its rapid training time. Accuracy, in this sense, is calculated as the fraction of correct classifications, weighted by the number of samples per class in the test sets to ensure an equal contribution from each class.
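The neighbor move under the three restrictions (edge buffer, minimum index separation, fixed number of selected wavelengths) and the energy calculation can be sketched as follows. This is an illustrative sketch, not the paper's code; `accuracy_fn` is a placeholder for the WKNN cross-validation accuracy evaluation:

```python
import random

def neighbor(state, n_wavelengths, buffer, min_sep, rng=random):
    """Return a neighboring state: randomly de-select one wavelength index
    and select a new one that respects the edge buffer and the minimum
    separation from the remaining selected indices."""
    state = list(state)
    dropped = state.pop(rng.randrange(len(state)))
    candidates = [j for j in range(buffer, n_wavelengths - buffer)
                  if j != dropped and j not in state
                  and all(abs(j - s) >= min_sep for s in state)]
    state.append(rng.choice(candidates))
    return sorted(state)

def energy(state, accuracy_fn):
    """Energy = 1 - (average 4-fold cross-validation accuracy of a WKNN
    classifier on the selected wavelengths); accuracy_fn stands in for
    that evaluation here."""
    return 1.0 - accuracy_fn(state)
```

Minimizing this energy is equivalent to maximizing the cross-validated classification accuracy of the selected wavelength subset.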
The simulated annealing algorithm was implemented in Python 3.7 using the simanneal 0.5.0 library [25]. The starting and final temperatures were selected to ensure nearly 100% acceptance of new states in the initial steps, regardless of whether the energy decreased or increased, and nearly 0% acceptance of states that increased the energy during the final steps. The number of steps was chosen to balance the desire for rapid processing with the need for algorithm convergence.
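The effect of such a cooling schedule can be illustrated with an exponential decay between a starting and final temperature (the form used by the simanneal library); the numeric values below are illustrative only, not the paper's settings:

```python
import math

def temperature(step, n_steps, t_start, t_final):
    """Exponential cooling: T decays from t_start at step 0 to t_final at
    step n_steps."""
    return t_start * math.exp(-math.log(t_start / t_final) * step / n_steps)

def p_accept_worse(delta_e, t):
    """Probability of accepting a state whose energy increases by delta_e."""
    return math.exp(-delta_e / t)
```

With, say, t_start = 25.0 and t_final = 0.001, a small energy increase of 0.01 is accepted with probability near 1 at the first step and near 0 at the last, matching the selection behavior described above.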
We compared the performance of the proposed simulated annealing approach for wavelength selection with three common feature selection methods: analysis of variance (ANOVA), recursive feature elimination (RFE), and Extremely Randomized Trees (i.e., Extra Trees) [26] classifier feature importance. The ANOVA method selects features based on their ability to provide separation between the target classes in a linear manner. The RFE method recursively eliminates the least important features using a linear classifier (in this case, the linear discriminant classifier) until the desired number of features remains. Finally, the nonlinear Extra Trees method assigns a quantitative importance to each feature based on its relevance to correct classification. The performance comparison was conducted using the same WKNN classifier featured in the simulated annealing algorithm.
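As an illustration of the ANOVA baseline, the per-wavelength one-way ANOVA F-statistic (between-class variance over within-class variance) can be computed directly in NumPy; this sketch mirrors the behavior of standard library implementations rather than reproducing the study's code:

```python
import numpy as np

def anova_select(X, y, k):
    """Rank features (wavelengths) by the one-way ANOVA F-statistic and
    return the indices of the top-k features."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    # Between-class and within-class sums of squares, per feature.
    ss_between = sum(
        (y == c).sum() * (X[y == c].mean(axis=0) - grand_mean) ** 2
        for c in classes)
    ss_within = sum(
        ((X[y == c] - X[y == c].mean(axis=0)) ** 2).sum(axis=0)
        for c in classes)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    f = (ss_between / df_between) / (ss_within / df_within)
    return np.argsort(f)[::-1][:k]
```

A wavelength whose class means are well separated relative to its within-class spread receives a high F-score and is selected first.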
2.3. Classification of Fish Species
To evaluate the success of the optimal wavelength selection algorithm, a pair of classification studies was conducted with the goal of determining the correct species of a fillet based on spectral information from a single sample point on the fillet, represented by one 10 × 10 pixel block (i.e., voxel). For both studies, a multi-layer perceptron (MLP) neural network served as the primary classifier. In the first study, each spectral mode (i.e., VNIR, fluorescence, and SWIR) was investigated separately, and the results of the MLP classifier were compared with results from a collection of common machine learning classifiers. The classifiers were trained on the spectral values from the selected wavelengths and evaluated using 4-fold cross-validation. In the second study, the selected wavelengths from the three spectral modes were combined in the input layer of the MLP classifier, and this spectral fusion method was again evaluated with 4-fold cross-validation. Both studies were repeated for numbers of selected wavelengths k = 3, 4, 5, 6, and 7. Results using all available wavelengths were included as a benchmark for comparison.
2.3.1. Multi-Layer Perceptron (MLP) Classifier
An MLP neural network is a common feed-forward artificial neural network that determines its weight values through supervised learning to yield a nonlinear decision boundary designed to minimize a cost function. In this case, the cost function was defined as the complement of the multiclass classification accuracy (weighted by the number of samples per class). For each of the studies described in the subsequent sections, the same two-layered MLP network shown in Figure 2 was used. To protect against overfitting, dropout with a probability of 50% was applied to both hidden layers [27]. Additionally, L2 kernel regularization (with a factor of 0.0001) was applied to both hidden layers to protect against overfitting by adding a term to the loss function that increases with the magnitude of the network’s weight vector. The input and hidden layers featured the rectified linear unit (ReLU) activation function, and the output layer included the softmax activation function to yield the classification decision.
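The forward pass of such a network, together with the L2 penalty term added to the loss, can be sketched in NumPy. This is an inference-time sketch only: dropout is a training-time operation and is omitted, and the layer widths in the usage example are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stabilized
    return e / e.sum(axis=-1, keepdims=True)

def mlp_forward(x, weights, biases, l2=1e-4):
    """Forward pass of a feed-forward MLP with ReLU hidden layers and a
    softmax output layer; also returns the L2 penalty term that would be
    added to the training loss."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    probs = softmax(h @ weights[-1] + biases[-1])
    penalty = l2 * sum(float((w ** 2).sum()) for w in weights)
    return probs, penalty
```

For example, with k = 9 fused input wavelengths and 25 output species, the softmax output is a probability distribution over the 25 classes.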
2.3.2. Single-Mode Classification Study
In addition to the MLP classifier, four common machine learning classifiers—including support vector machine with a linear kernel (SVM), WKNN, linear discriminant (LD), and Gaussian Naïve Bayes (GNB)—were used to perform classification separately for each of the VNIR, fluorescence, and SWIR data. As with the first study, feature sets consisted of the k spectral samples with no further attempt at feature selection. A 4-fold cross-validation was conducted for each study as a robust estimation of multiclass classification accuracy (weighted by the number of samples per class).
SVM determines the set of maximum-margin hyperplanes to separate the classes in the feature space. WKNN, as explained above, is a variation on the k-nearest neighbors algorithm that weights the training points by the inverse square of their distances from the query point. LD classification makes simplifying assumptions about the data (i.e., Gaussian distributed with the same covariance matrix for all classes) to determine the separating hyperplanes. Finally, GNB combines the probabilities of obtaining the measured value for each input given each specific class and selects the class with the highest resulting probability. GNB assumes statistical independence between the inputs [28]. SVM was included due to its reputation as a high-performance classifier. WKNN, another robust classifier, was included for its performance and because of its use in the simulated annealing algorithm. LD was included for comparison to evaluate any performance degradation that might result from the expected violation of the Gaussian or identical covariance assumptions. GNB was included for comparison to evaluate performance degradation due to the expected violation of independence among the inputs (i.e., the selected wavelengths).
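The WKNN weighting scheme described above can be sketched as follows (a minimal single-query version for illustration; the study used a standard library implementation):

```python
import numpy as np

def wknn_predict(X_train, y_train, x_query, k=5):
    """Weighted k-NN: each of the k nearest training points votes for its
    class with weight 1/d^2, the inverse squared distance to the query."""
    d2 = ((X_train - x_query) ** 2).sum(axis=1)
    nearest = np.argsort(d2)[:k]
    weights = 1.0 / np.maximum(d2[nearest], 1e-12)  # guard exact matches
    votes = {}
    for idx, w in zip(nearest, weights):
        votes[y_train[idx]] = votes.get(y_train[idx], 0.0) + w
    return max(votes, key=votes.get)
```

Because the votes fall off with squared distance, a single very close neighbor can outweigh several farther ones.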
Each classifier was trained with the k = 3, 4, 5, 6, and 7 wavelengths selected by the simulated annealing algorithm for each of the three spectral modes. To place the resulting classification accuracy values in context, the results of this study were compared with benchmark classification accuracies determined using all wavelengths in the full-resolution spectra.
2.3.3. Spectral Fusion Classification Study
For this study, the wavelengths were selected for each of the three spectral modes independently, as discussed in the previous section, and then concatenated into a single vector, which formed a new input layer for the MLP classifier. This classifier was then trained and evaluated (using 4-fold cross-validation) for k = 3, 4, 5, 6, and 7 wavelengths, and the results were compared with a benchmark determined by including all wavelengths from the full-resolution spectra. Due to concerns about the usefulness of the SWIR data for species classification, we also evaluated fusion with just the VNIR and fluorescence modes.
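The fusion step amounts to a simple concatenation of the selected wavelength values from each mode. A sketch (the band counts 125/60/287 are taken from the acquisition description; the selected indices below are arbitrary examples):

```python
import numpy as np

def fuse_modes(vnir, fluor, swir, sel_vnir, sel_fluor, sel_swir):
    """Concatenate the selected wavelengths from each spectral mode into a
    single fused feature vector for the MLP input layer."""
    return np.concatenate([vnir[sel_vnir], fluor[sel_fluor], swir[sel_swir]])
```

With k = 3 wavelengths per mode, the fused input vector has 9 elements.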
2.4. Fish Fillet Data Collection
Figure 3 shows an overview of the data acquisition and processing steps for the studies presented in this paper. The database for this study consisted of VNIR and SWIR reflectance and fluorescence spectra collected from 133 fish fillets representing a total of 25 different species groups (Table 1). The species of each fillet was verified using DNA barcoding [8]. Each fillet was placed in a 150 × 100 × 25 mm sample holder created with a 3D printer (Fortus 250mc, Stratasys, Eden Prairie, MN, USA) using production-grade black thermoplastic. Image acquisition was conducted by the pushbroom method, where a linear motorized translation stage was used to move the sample holder incrementally across the scanning line of the imaging spectrograph. The length of the instantaneous field of view (IFOV) was made slightly longer than the length of the sample holder (150 mm) by adjusting the lens-to-sample distance. The resulting spatial resolution along this dimension was 0.4 mm/pixel. Each fillet was sampled along the width direction (100 mm) of the holder with a step size of 0.4 mm to match the spatial resolution of the length direction [8].
Flat-field corrections were applied to the VNIR and SWIR reflectance images and the fluorescence images to convert the original absolute intensities in CCD counts to relative reflectance and fluorescence intensities [29]. An initial spatial mask was then created for each imaging mode to separate the fish fillets from the background. To filter out inaccurate measurements around the thinner edges of the fillets and portions near the bone structure, an outlier removal scheme was instituted. First, the mean (μ) and standard deviation (σ) of the fish pixel intensities were calculated over the entire fillet. Voxels of 10 × 10 pixels were considered to mimic independent fish fillet spectral point measurements using the field of view of a fiber optic spectrometer. A voxel was excluded as an outlier if ≥10% of its constituent pixels fell outside the range μ ± 2σ.
Figure 4 shows an example result of voxel processing where most of the excluded voxels are concentrated near the fillet edges. This approach produced a final set of spatial masks, one each for the VNIR and SWIR reflectance and fluorescence images, which determined the blocks to be used for analysis. Finally, the fluorescence spectra were scaled by a constant factor of 6000, the approximate maximum of fluorescence spectral values in the database. This was done to set the range of fluorescence values to between zero and one. Alternative normalization methods such as z-score and area under the curve (AUC) normalization were tried as well and produced similar results. However, this simple scaling was chosen because, unlike these alternatives, it requires no knowledge of the entire spectrum and is thus consistent with the concept of collecting only a small number of wavelengths for analysis.
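The voxel exclusion test and the constant-factor fluorescence scaling can be sketched as follows (an illustrative NumPy version of the rules described above, not the paper's code):

```python
import numpy as np

def voxel_is_valid(voxel, mu, sigma, max_outlier_frac=0.10):
    """A 10x10 voxel is excluded when >= 10% of its pixels fall outside
    mu ± 2*sigma, where mu and sigma are computed over the whole fillet."""
    outliers = (voxel < mu - 2 * sigma) | (voxel > mu + 2 * sigma)
    return float(outliers.mean()) < max_outlier_frac

def scale_fluorescence(spectrum, scale=6000.0):
    """Scale fluorescence intensities to roughly [0, 1] by a constant
    factor (6000, the approximate database maximum); unlike z-score or AUC
    normalization, this needs no knowledge of the full spectrum."""
    return np.asarray(spectrum) / scale
```

Because the scaling factor is fixed, it can be applied to measurements at just a few selected wavelengths, consistent with the reduced-wavelength sensing concept.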
Table 1 provides a summary of this database with the numbers of fillets per species and the number of valid voxels for each fillet and each collection mode.
The reflectance and scaled fluorescence spectra for each of the 25 fish species are shown in Figure 5. The significant differences in the shapes and positions of the spectral averages for the various species and the homogeneous nature of the spectra (as indicated by the relatively short error bars) suggest that high classification accuracies can be achieved with this spectral information.
2.5. Cross-Validation Train and Test Datasets
For both the single-mode and the spectral fusion studies, 4-fold cross-validation was conducted by dividing the complete dataset (as described in Table 1) into four disjoint test sets, each of which contained voxels from at least one fillet of each of the 25 species. The corresponding training set for each test set was then composed of all data not in the test set. Four-fold cross-validation (as opposed to the more common 5- or 10-fold versions) was chosen because there was greater variability between fillets of the same species than between voxels of the same fillet. Thus, we wanted to ensure that each test set contained entire fillets that were not included in the corresponding training set. For those species with more than four fillets in the complete dataset (e.g., Malabar blood snapper), the fillets were divided among the four test sets with the goal of keeping the total number of fillets in each test set as equal as possible.
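One simple way to satisfy these constraints, assigning whole fillets to folds round-robin within each species, can be sketched as follows. The paper does not specify its exact assignment procedure, so this is an illustrative strategy only:

```python
from collections import defaultdict

def assign_folds(fillets, n_folds=4):
    """Assign whole fillets to n_folds test folds, round-robin within each
    species, so that no fillet is split between train and test and fillet
    counts per fold stay as equal as possible.

    `fillets` is a list of (fillet_id, species) pairs.
    """
    per_species = defaultdict(list)
    for fid, species in fillets:
        per_species[species].append(fid)
    fold_of = {}
    for species, ids in per_species.items():
        for i, fid in enumerate(ids):
            fold_of[fid] = i % n_folds
    return fold_of
```

Any species with at least n_folds fillets is guaranteed a fillet in every test fold under this scheme.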
2.6. Data Imbalance Correction
To prevent classification biases due to data imbalances between the various species, we applied sampling with replacement to each training set to produce 8000 voxel samples per species for a total of 200,000 samples in each training set. No resampling was applied to the test sets, but the measured multiclass classification accuracies were weighted by the number of voxel samples per class to ensure an equal contribution from each species.
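The balancing step can be sketched as sampling with replacement per class (the 8000-samples-per-class figure is from the text; the seed is an illustrative choice for reproducibility):

```python
import numpy as np

def balance_by_resampling(X, y, n_per_class=8000, seed=0):
    """Sample each class with replacement up to n_per_class samples so that
    every species contributes equally to the training set."""
    rng = np.random.default_rng(seed)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_per_class, replace=True)
        for c in np.unique(y)])
    return X[idx], y[idx]
```

With 25 species this yields 25 × 8000 = 200,000 training samples, as described above; the test sets are left untouched.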