Jade Identification Using Ultraviolet Spectroscopy Based on the SpectraViT Model Incorporating CNN and Transformer

Li, Xiongjun; Cai, Jilin; Feng, Jin

doi:10.3390/app14219839

by Xiongjun Li *, Jilin Cai and Jin Feng

College of Physics and Optoelectronic Engineering, Shenzhen University, Shenzhen 518060, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(21), 9839; https://doi.org/10.3390/app14219839

Submission received: 4 September 2024 / Revised: 8 October 2024 / Accepted: 13 October 2024 / Published: 28 October 2024

Abstract:

Jade is a highly valuable and diverse gemstone, and its spectral characteristics can be used to identify its quality and type. We propose a jade ultraviolet (UV) spectrum recognition model based on deep learning, called SpectraViT, aiming to improve the accuracy and efficiency of jade identification. The algorithm combines residual modules to extract local features and transformers to capture global dependencies of jade’s UV spectrum, and finally classifies jade using fully connected layers. Experiments were conducted on a UV spectrum dataset containing four types of jade (natural diamond, CVD-grown diamond, HPHT-grown diamond, and moissanite). The results show that the algorithm can effectively identify different types of jade, achieving an accuracy of 99.24%, surpassing traditional algorithms based on Support Vector Machines (SVM) and Partial Least Squares Discriminant Analysis (PLS_DA), as well as other deep learning methods. This paper also provides a reference solution for other spectral analysis problems.

Keywords:

deep learning; spectral analysis; jade identification; transformer

1. Introduction

Gems are high-value and diverse minerals or rocks [1]. Effectively identifying gem species can prevent confusion between counterfeits, synthetic gems, and natural gems. Traditional gem identification methods mainly rely on manual observation, testing, and experience-based judgment, which are subjective, unstable, inaccurate, and inefficient. To overcome these issues, spectroscopic analysis technology, as a non-destructive, rapid, accurate, and objective identification method, has gradually gained widespread attention and application [2]. Spectral analysis is a method that uses the spectrum of a substance to identify its composition and relative chemical content. Ultraviolet–visible (UV–VIS) absorption spectroscopy determines the composition, structure, and properties of jade by analyzing the selective absorption characteristics of valence electrons within the material under electromagnetic radiation [3]. The position of absorption peaks and the intensity of absorption in the spectrum are used for qualitative and quantitative analysis of the components. Because gemstones differ in crystal structure and chemical composition, they exhibit distinct ultraviolet–visible absorption spectral features. UV–VIS absorption spectroscopy is also included as a common method for gem identification in the national standard GB/T 42645-2023 [3] “Gem Identification – Ultraviolet–Visible Absorption Spectroscopy”. UV–VIS spectroscopy is highly efficient and enables rapid analysis, allowing for direct and non-destructive testing of samples [4]. However, due to the variety of gemstone types, ultraviolet spectra also show complex and diverse forms. Identification requires comprehensive consideration of the wavelength position, number, shape, and relative intensity of absorption peaks, which makes it difficult to describe and classify spectra quickly and accurately by subjective judgment.
Gemstone identification based on ultraviolet spectroscopy mainly relies on chemometric models that characterize the relationship between spectral data and gemstone categories [5]. Traditional spectral analysis methods include Principal Component Analysis (PCA) [6,7], Partial Least Squares Discriminant Analysis (PLS_DA), Support Vector Machines (SVM) [8], among others. Although the resolution of spectral measurement equipment is continuously improving, providing more effective information and reducing more noise, the high dimensionality and susceptibility to the interference of spectra, as well as unavoidable noise, pose challenges. This makes it difficult for traditional chemometrics to directly extract effective features from spectra. Therefore, how to use advanced computer technology to improve the accuracy and efficiency of jade ultraviolet spectral analysis, and quickly achieve jade identification, is a pressing problem that needs to be addressed.

In recent years, deep-learning-based spectral analysis techniques have rapidly developed [9] and have been widely applied in fields such as agricultural products [10], pharmaceuticals [11], minerals, and medicine [12]. In 2017, Acquarelli et al. were the first to apply convolutional neural network (CNN)-based methods to classify ten different types of spectra [13]. Compared to traditional chemometric methods, CNNs achieved higher accuracy in quantitative analysis and were less dependent on spectral data processing. In 2019, another study [14] introduced the Inception structure [15] based on CNN, using combinations of convolution kernels of different sizes to extract multi-scale local features of the spectra, thus improving the adaptability of the network. CNN-based classification models [16,17,18] can effectively capture local features of spectra through convolutional layers, but the fixed size of the convolution kernels limits CNNs in learning long-range features. Spectral data, characterized by wavelength or frequency amplitudes, have properties similar to time–frequency sequences [19], and CNNs struggle to capture long-distance feature dependencies. One study [20] combined visible-near-infrared spectra with LSTM and CNN to propose a prediction model for orange quality, which generally outperformed the CNN-based DeepSpectra and CNN-AT models in predicting five quality aspects of oranges, except for vitamin C content prediction, where it was slightly inferior to DeepSpectra.

The Transformer model, emerging and widely applied in various natural language processing tasks, has made significant progress and success in deep learning. Compared with CNNs, Transformers perform better in tasks requiring long-distance dependencies [21]. In recent years, Transformer-based methods have also been proposed for one-dimensional signal classification [22,23,24]. Compared with CNNs, Transformers more easily leverage the self-attention mechanism, compensating for CNNs’ inherent limitations in long-range dependencies. One study proposed the SpectraTr [11] model based on Transformers for spectral classification, but Transformers have significantly more parameters than CNNs, slower inference speeds, and require more training samples. The combination of CNNs and Transformers has shown significant advantages in recent research. CNNs excel at extracting local features, while Transformers can capture long-range dependencies and global information. This combination has achieved remarkable results and widespread application in image classification and recognition, natural language processing, medical image analysis, and multimodal fusion. One study [25] proposed a lightweight CNN–Transformer model to solve the traveling salesman problem (TSP), combining CNN embedding layers and partial self-attention mechanisms to better learn spatial features from input data and reduce redundancy in fully connected attention models, showing clear performance and accuracy advantages over other deep learning models. Another study [26] similarly combined the strengths of CNNs and Transformers, modeling the time–frequency features of EEG signals through alternating model structures, capturing both local features and long-range dependencies, achieving an ROC curve area (AUC) of 93.5% on the CHB-MIT database. Although spectral data have properties similar to time–frequency sequences, there are currently no models combining CNNs and Transformers applied to spectral analysis tasks.

Because spectral data characterized by wavelength or frequency amplitudes have properties similar to time–frequency sequences, we propose a hybrid CNN–Transformer spectral classification model, SpectraViT. On a jade ultraviolet spectral dataset, this model outperformed traditional SVM and PLS_DA methods as well as other deep learning classification methods, indicating that the proposed model is an effective solution. The model not only achieves higher accuracy but also consumes fewer computational resources and less memory than the pure Transformer model, balancing efficiency and accuracy.

The rest of this paper is organized as follows: the second part introduces the jade ultraviolet spectral dataset and the architecture of the hybrid model, including the inverted residual structure and Transformer. The third part presents the application experiments of the model and discusses the experimental results in detail. The final section provides the conclusion.

2. Materials and Methods

2.1. Dataset Creation and Preprocessing

The dataset for the model was sourced from the PDS II 300 natural diamond identifier of Shenzhen Jewelry Research Institute Co., Ltd. (Shenzhen, China). The samples include natural diamonds (D), chemical vapor deposition (CVD)-grown diamonds, high pressure high temperature (HPHT)-grown diamonds, and moissanite (MS), with sample diameters ranging from 3 to 9.5 mm. The data collection environment had a temperature range of 16 °C to 27 °C and a humidity range of 35% to 80% RH. Ultraviolet spectral data of various diamond samples were collected under different environments and times, including 4206 natural diamonds, 1826 CVD-grown diamonds, 546 HPHT-grown diamonds, and 774 moissanite samples. Figure 1 shows the ultraviolet spectra of various jade samples.

In the obtained ultraviolet spectrum dataset, the spectral range is from 225 nm to 423 nm, with a sampling interval of 0.13 nm or 0.14 nm. The data point interval may fluctuate by 0.01 nm, and each spectrum may have a different starting point and number of sampling points, so the sampling positions and lengths vary from spectrum to spectrum. We believe that this inconsistency may affect the accuracy of the spectral analysis. Therefore, the collected ultraviolet spectra were interpolated and resampled. Cubic spline interpolation was used so that the interpolation function matches the measured values at the spectral data points and has continuous first-order and second-order derivatives there. Subsequently, resampling was performed starting from 265 nm with a resampling interval of 0.1 nm, standardizing the length of all spectra to 1300. The interpolation algorithm [27] steps are as follows (Algorithm 1):

Algorithm 1 Cubic spline interpolation.

1: Input: Given spectral data points $(x_{i}, y_{i}), i = 0, 1, 2, \dots, n$.

2: Fitting: Use a cubic polynomial $S_{i} (x)$ to fit each subinterval $[x_{i}, x_{i + 1}], i = 0, 1, \dots, n - 1$, where $S (x)$ is the interpolation function satisfying the following conditions:

At each data point, $S (x)$ equals $y_{i}$, i.e., $S (x_{i}) = y_{i}, i = 0, 1, 2, \dots, n$.
At each internal node, the first and second derivatives of $S (x)$ are continuous, i.e., $S_{i}^{'} (x_{i + 1}) = S_{i + 1}^{'} (x_{i + 1})$ and $S_{i}^{''} (x_{i + 1}) = S_{i + 1}^{''} (x_{i + 1}), i = 0, 1, \dots, n - 2$.
At the two endpoints, $S (x)$ satisfies one of the boundary conditions: clamped (fixed) boundary conditions $S^{'} (x_{0}) = A$, $S^{'} (x_{n}) = B$, or not-a-knot boundary conditions $S_{0}^{'''} (x_{1}) = S_{1}^{'''} (x_{1})$, $S_{n - 2}^{'''} (x_{n - 1}) = S_{n - 1}^{'''} (x_{n - 1})$.

3: Formulation: Construct a system of linear equations for the second derivatives $M_{i} = S^{''} (x_{i})$ using the above conditions, and solve for $M_{i}, i = 0, 1, 2, \dots, n$.

4: Coefficient Calculation: Using $M_{i}$, determine the coefficients $a_{i}$, $b_{i}$, $c_{i}$, $d_{i}$ for each subinterval, such that $S_{i} (x) = a_{i} + b_{i} (x - x_{i}) + c_{i} (x - x_{i})^{2} + d_{i} (x - x_{i})^{3}$.

5: Interpolation Function Construction: Integrate the polynomial coefficients from each interpolation segment into a complete interpolation function.

6: Interpolation Calculation: Given an interpolation point $x$, use the piecewise expression of $S (x)$ to compute the corresponding function value $S (x)$.
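Steps 3 to 6 of Algorithm 1 can be sketched in plain Python as below. This is a minimal illustration, not the authors' code: it swaps in the natural boundary condition (second derivative zero at both endpoints) because it keeps the linear system purely tridiagonal, whereas the text lists clamped and not-a-knot conditions. The function names, the synthetic Gaussian "spectrum", and the alternating 0.13/0.14 nm grid are assumptions made for the example; only the 265 nm start, 0.1 nm step, and length of 1300 come from the text.

```python
import math
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return S(x) for strictly increasing nodes xs with values ys.

    Assumption: natural boundary condition (M_0 = M_n = 0), used here
    in place of the clamped / not-a-knot conditions named in the text.
    """
    n = len(xs) - 1                                   # number of subintervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    slope = [(ys[i + 1] - ys[i]) / h[i] for i in range(n)]

    # Step 3: tridiagonal system for the second derivatives M_1 .. M_{n-1}.
    M = [0.0] * (n + 1)
    diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
    rhs = [6.0 * (slope[k + 1] - slope[k]) for k in range(n - 1)]
    for k in range(1, n - 1):                         # forward elimination
        w = h[k] / diag[k - 1]
        diag[k] -= w * h[k]
        rhs[k] -= w * rhs[k - 1]
    for k in range(n - 2, -1, -1):                    # back substitution
        M[k + 1] = (rhs[k] - h[k + 1] * M[k + 2]) / diag[k]

    def S(x):
        # Steps 4-6: pick the segment, build a_i..d_i, evaluate the cubic.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        t = x - xs[i]
        a = ys[i]
        b = slope[i] - h[i] * (2.0 * M[i] + M[i + 1]) / 6.0
        c = M[i] / 2.0
        d = (M[i + 1] - M[i]) / (6.0 * h[i])
        return a + t * (b + t * (c + t * d))

    return S


def resample(xs, ys, start, step, count):
    """Evaluate the spline on a uniform grid start, start + step, ..."""
    spline = natural_cubic_spline(xs, ys)
    return [spline(start + k * step) for k in range(count)]


# Hypothetical spectrum on an irregular 0.13/0.14 nm grid covering 225-423 nm.
wavelengths = [225.0]
while wavelengths[-1] < 423.0:
    wavelengths.append(wavelengths[-1] + (0.13 if len(wavelengths) % 2 else 0.14))
intensity = [math.exp(-((w - 300.0) / 25.0) ** 2) for w in wavelengths]

# Resampling parameters from the text: start 265 nm, step 0.1 nm, 1300 points.
resampled = resample(wavelengths, intensity, start=265.0, step=0.1, count=1300)
```

With these parameters the resampled grid ends at 394.9 nm, which lies inside the measured range, so no extrapolation is needed.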

Finally, the dataset was randomly divided into training, validation, and test sets in a ratio of 6:2:2, as shown in Table 1.
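A random 6:2:2 split can be sketched as follows. The text does not say how rounding was handled or whether the split was stratified by class, so the helper below (a hypothetical `split_indices`) simply shuffles all sample indices with a fixed seed and assigns the remainder to the test set; the resulting sizes may therefore differ slightly from those in Table 1.

```python
import random


def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]      # whatever is left goes to the test set
    return train, val, test


# Class counts reported in Section 2.1.
counts = {"D": 4206, "CVD": 1826, "HPHT": 546, "MS": 774}
train, val, test = split_indices(sum(counts.values()))
print(len(train), len(val), len(test))
```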

2.2. SpectraViT Model

In recent years, the Transformer model has been widely applied in the field of deep learning. In this study, we use the Transformer model to address the classification problem of jade and incorporate an inverted residual module to extract local features from the spectral signals. The model structure is shown in Figure 2.

First, the input data pass through the MV2 module (Figure 3) and then through an inverted residual module (Figure 3a). This inverted residual module consists of two convolutional layers and a SiLU activation function, designed to learn local features while maintaining consistent input and output dimensions. Another residual module (Figure 3b) is used for downsampling, reducing the sequence length of the input to 1 × 650 through a convolution operation with a stride of 2.
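The sequence lengths quoted above follow from the usual output-length formula for a 1-D convolution. The sketch below is a worked check rather than the model code; the padding values are assumptions, since the text gives only the kernel width (1 × 20, Section 2.2.1), the stride of 2, and the resulting length of 650.

```python
def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    """Output length of a 1-D convolution with symmetric zero padding."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


input_len = 1300                      # resampled spectrum length

# A 1 x 1 convolution never changes the sequence length.
pointwise = conv1d_out_len(input_len, kernel=1)

# Assumed padding of 9: a width-20 kernel with stride 2 halves 1300 to 650.
downsampled = conv1d_out_len(input_len, kernel=20, stride=2, padding=9)

# With stride 1 and an even kernel, symmetric padding is off by one either
# way, so keeping the length at 1300 needs asymmetric ("same") padding.
too_short = conv1d_out_len(input_len, kernel=20, stride=1, padding=9)
too_long = conv1d_out_len(input_len, kernel=20, stride=1, padding=10)

print(pointwise, downsampled, too_short, too_long)
```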

Subsequently, global feature extraction is performed through the Transformer module (Figure 2). A Conv1 × 1 convolutional layer increases the number of feature channels to 320. Next, a global average pooling layer reduces the sequence length of the features to 1 while retaining the average value of each channel. This helps capture global features and reduce the number of model parameters. Finally, a linear layer maps the features to the output space, transforming the 320-dimensional feature vector into a 4-dimensional output vector.

2.2.1. Inverted Residual Module

The core idea of residual connections [28] is to directly pass the input signal to subsequent layers through skip connections, allowing the network to better retain feature information. The inverted residual structure [29] was first introduced in the lightweight and modular MobileNetV2 model. Like other residual structures, it mitigates vanishing and exploding gradients during the training of convolutional networks. The channel dimension is first expanded through a 1 × 1 pointwise convolution, then features are extracted using a depthwise convolution, and, finally, the channel dimension is reduced again through a 1 × 1 projection convolution. The projection acts as a linear bottleneck to prevent information loss. Compared to traditional residual structures, this design can offer higher accuracy and stronger generalization ability while reducing the number of parameters and the computational load. Our model uses two residual modules (Figure 3) for local feature learning of spectral signals, improving accuracy compared with traditional CNN structures. Each residual module includes two 1 × 1 convolutional layers and a 1 × 20 depthwise separable convolutional layer. The SiLU [30] activation function is used for feature extraction from the spectral signals. The second residual module is used for downsampling, with a convolution stride set to 2 to reduce the feature dimensions.

2.2.2. Transformer Module

The Transformer model is a sequence modeling method based on self-attention mechanisms [21], initially used for natural language processing tasks. Compared with traditional convolutional neural network (CNN) models, the Transformer module employs a global attention mechanism, which better captures long-distance dependencies within sequences. The original Transformer consists of an encoder and a decoder. The encoder is responsible for transforming the input sequence into a series of high-level feature representations, while the decoder generates the target sequence based on the output of the encoder. This study utilizes only the encoder module of the Transformer. The encoder block consists of multi-head attention layers, feed-forward neural network layers, and two layer normalization blocks, as shown in Figure 4.

Multi-Head Attention Layer

In the Transformer module, the global attention mechanism is a key component. The multi-head attention layer features multiple parallel attention heads, each capable of learning different attention points. This allows the model to simultaneously focus on various aspects of the spectral data, enabling it to pay attention to all positions in the sequence and to better capture contextual information. As shown in Figure 5, the input sequence is first mapped to the spaces of queries (Q), keys (K), and values (V) through linear transformations. Then, attention weights for multiple attention heads are calculated using Formula (1), resulting in weighted values for each head. These weighted values are combined to form the final output representation. Here, Q represents the query vector, K represents the key vector, V represents the value vector, and $d_{k}$ represents the dimension of the key vector.

Attention (Q, K, V) = softmax \left( \frac{Q K^{T}}{\sqrt{d_{k}}} \right) V

(1)
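Formula (1) for a single attention head can be sketched in plain Python as follows. The tiny Q, K, and V matrices are made-up values for illustration; a real implementation would use a tensor library and add the learned projections and the multi-head concatenation described above.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    weights = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights.append(softmax(scores))
    # Each output row is a weighted average of the value vectors.
    output = [
        [sum(w * v[j] for w, v in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]
    return output, weights


# Hypothetical sequence of three positions with d_k = 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
output, weights = attention(Q, K, V)
```

Every position attends to every other position, which is what lets the encoder relate absorption features that are far apart in wavelength.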

Feed-Forward Neural Network

Another important component of the model is the Feed-Forward Network (FFN), which applies nonlinear transformations to the output of the attention mechanism at each position. FFN maps and transforms the input according to Formula (2), thereby enhancing the model’s nonlinear expressiveness. Here, $W$ and $W^{'}$ are learnable weight matrices, $b_{1}$ and $b_{2}$ are bias terms, and SiLU is the nonlinear activation function.

FFN (x) = SiLU (x W + b_{1}) W^{'} + b_{2}

(2)
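A minimal sketch of Formula (2) for one position is given below, using the standard definition SiLU(z) = z · sigmoid(z). The helper names and the toy weights are assumptions; the paper does not state the hidden width of the FFN.

```python
import math


def silu(z):
    """SiLU(z) = z * sigmoid(z), written to avoid overflow for large |z|."""
    if z >= 0:
        return z / (1.0 + math.exp(-z))
    e = math.exp(z)
    return z * e / (1.0 + e)


def linear(x, W, b):
    """Row vector x times matrix W (len(x) rows, len(b) columns) plus bias b."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j] for j in range(len(b))]


def ffn(x, W, b1, W_prime, b2):
    """FFN(x) = SiLU(x W + b1) W' + b2, applied to a single position."""
    hidden = [silu(h) for h in linear(x, W, b1)]
    return linear(hidden, W_prime, b2)


# Toy example: 2 input features, hidden width 3, 2 output features.
x = [1.0, -2.0]
W = [[0.5, -1.0, 0.0], [0.25, 0.5, 1.0]]
b1 = [0.0, 0.1, -0.1]
W_prime = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
b2 = [0.0, 0.0]
y = ffn(x, W, b1, W_prime, b2)
```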

2.2.3. Classification Layer

Global average pooling is applied to the output of the last layer of the Transformer to obtain a fixed-length vector. This vector is fed into a fully connected layer for feature extraction and transformation. Finally, a linear transformation maps the output of the fully connected layer to a 1 × 4 vector, and the Softmax function normalizes it into a probability vector over the four jade categories.

2.3. Loss Function

This paper uses the cross-entropy loss function to address the multi-class classification problem of spectra. It measures the distance between the predicted probability distribution and the target probability distribution; for one-hot labels, it reduces to the negative log-likelihood of the true class. Its formula is expressed as follows:

loss (x, y) = - \sum_{i = 1}^{n} p (x_{i}) \log (y_{i})

(3)

where $n$ is the number of classes, $p (x_{i})$ represents the i-th element of the true label of sample $x$, and $y_{i}$ represents the probability that the model predicts $x$ belongs to the i-th class.
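As a worked example of Formula (3), the sketch below converts hypothetical logits for the four classes into probabilities with Softmax and evaluates the loss against a one-hot label. The logits and the chosen true class are invented for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted):
    """-sum_i p(x_i) * log(y_i); classes with zero target weight are skipped."""
    return -sum(p * math.log(y) for p, y in zip(target, predicted) if p > 0)


classes = ["D", "CVD", "HPHT", "MS"]
target = [0, 0, 1, 0]                 # hypothetical one-hot label: HPHT
logits = [0.2, 1.5, 3.0, -1.0]        # hypothetical network outputs
probs = softmax(logits)
loss = cross_entropy(target, probs)

# An uninformed prediction (0.25 per class) costs log(4), about 1.386.
baseline = cross_entropy(target, [0.25] * 4)
```

With a one-hot target only the true-class term survives, so the loss equals the negative log of the probability assigned to the correct class.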

2.4. Evaluation Methods

This paper uses classic evaluation metrics including the Confusion Matrix, Accuracy, Precision, Recall, and F1 Score. The Confusion Matrix provides an intuitive representation of classification results. Accuracy is the percentage of correctly predicted samples out of the total number of samples. Precision indicates the percentage of actual positive samples among those predicted as positive by the model. Recall represents the percentage of actual positive samples that are predicted as positive. The F1 Score is the harmonic mean of Precision and Recall. Their mathematical expressions are shown in the following equations. Due to the significant differences in the number of samples across each class in the dataset, we compute the weighted average of Precision, Recall, and F1 Score for each class to evaluate the overall performance of the model.

Acc = \frac{T P + T N}{T P + T N + F P + F N} \times 100\%

(4)

Precision = \frac{T P}{T P + F P} \times 100\%

(5)

Recall = \frac{T P}{T P + F N} \times 100\%

(6)

F 1 = \frac{2 \times (Precision \times Recall)}{Recall + Precision} \times 100\%

(7)

where $T P$ denotes true positives (the true label is positive and the prediction is positive), $F P$ denotes false positives (the true label is negative but the prediction is positive), $F N$ denotes false negatives (the true label is positive but the prediction is negative), and $T N$ denotes true negatives (the true label is negative and the prediction is negative).
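The support-weighted averages described above can be computed from a confusion matrix as sketched below. The matrix is an invented example, not the values of Table 3, and the function name is hypothetical; the 0/1 scale is used instead of percentages.

```python
def weighted_metrics(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns accuracy and support-weighted precision, recall, and F1.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    precision = recall = f1 = 0.0
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                         # true c, predicted as another class
        fp = sum(cm[r][c] for r in range(k)) - tp    # predicted c, truly another class
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = (tp + fn) / total                   # share of samples in class c
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return accuracy, precision, recall, f1


# Hypothetical confusion matrix, rows/columns ordered D, CVD, HPHT, MS.
cm = [
    [840, 1, 0, 0],
    [2, 362, 1, 0],
    [0, 6, 103, 0],
    [0, 0, 0, 155],
]
acc, prec, rec, f1 = weighted_metrics(cm)
```

Note that when each class is weighted by its number of true samples, the weighted recall is algebraically the same quantity as the overall accuracy.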

3. Results

3.1. Experimental Environment

The experiment was conducted on a desktop equipped with a GTX 1050Ti graphics processing unit (GPU). The software environment consisted of the Windows 10 operating system, CUDA 12.1, and cuDNN 8.9.7. The model was implemented in Python 3.10 with PyTorch 2.0 as the deep learning framework. Adam was used as the optimizer, with an initial learning rate of 1 × 10⁻⁴, a batch size of 10, and 500 epochs of training. The learning rate was dynamically adjusted using the cosine annealing algorithm, eventually reducing it to 1 × 10⁻⁶. Following the cosine function, the learning rate decreases slowly at first, then more quickly, and finally slowly again, which avoids overly abrupt parameter updates during training and lets the loss settle as close to the optimal solution as possible. In this experiment, the schedule helped to avoid poor local optima, accelerated model convergence, and improved the model’s generalization ability.
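The cosine annealing schedule can be written in closed form, as sketched below. The start and end learning rates and the 500 epochs come from the text; treating the annealing period as equal to the full 500 epochs (no warm restarts) is an assumption.

```python
import math


def cosine_annealing_lr(epoch, total_epochs=500, lr_max=1e-4, lr_min=1e-6):
    """Learning rate after `epoch` epochs of a single cosine half-period."""
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)


schedule = [cosine_annealing_lr(e) for e in range(501)]

# The drop is small near the ends and largest around the midpoint.
early_drop = schedule[0] - schedule[50]
middle_drop = schedule[225] - schedule[275]
late_drop = schedule[450] - schedule[500]
```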

3.2. Model Performance on the Jade Spectral Dataset

The model with the highest accuracy on the validation set was evaluated on the test set. As shown in Table 2, the model achieved an accuracy of 99.24% in the jade classification task, demonstrating excellent classification capability. The model’s recall rate is slightly higher than its precision, indicating a tendency to capture positive samples, although this may lead to some false positives. The F1 score is 99.23%, reflecting a good balance between precision and recall. As shown in Figure 6, the loss curve (a) and accuracy curve (b) indicate that our model converged after 190 training iterations and achieved a high accuracy. The confusion matrix in Table 3 shows that the model performs very well on the D, CVD, and MS categories, but there are some misclassifications in the HPHT category, mainly HPHT samples misclassified as CVD. Overall, the model exhibits an outstanding performance.

3.3. Comparison of the Model with Other Models

To validate the superiority of the SpectraViT model used in this study, comparisons were first made with traditional classification algorithms, including SVM and PLS_DA. Then, comparisons were conducted with CNN-based models such as AlexNet and DeepSpectra, as well as with the Transformer-based SpectraTr model for time-series sequences.

3.3.1. Comparison with SVM and PLS_DA

SVM and PLS_DA are two commonly used traditional machine learning methods widely applied in spectral analysis. SVM is a popular supervised learning model used for classification and regression analysis, with the core idea of finding a hyperplane that maximizes the margin between classes. PLS_DA is a classification method that combines partial least squares regression with linear discriminant analysis, particularly suitable for high-dimensional data.

In this section, we compared the deep learning model SpectraViT with SVM and PLS_DA to verify the superiority of our model over traditional methods. To implement these two classification algorithms, we first standardized the data. For the SVM model, we used a linear kernel function with parameters C = 1 and γ = auto, applying the One-vs-One strategy for classifying the four types of UV spectra. For the PLS_DA model, we selected 100 components for feature extraction. All of the models were implemented using Python and its Scikit-learn library.

As shown in Table 4, the SpectraViT model demonstrated outstanding performance in accuracy, recall, precision, and F1 score, achieving 99.24%, 99.25%, 99.06%, and 99.23%, respectively. In comparison, the performance of SVM and PLS_DA is lower. These results indicate that SpectraViT offers performance advantages in spectral analysis tasks over traditional machine learning methods, particularly in providing a higher classification accuracy and better model generalization when handling complex datasets.

3.3.2. Comparison with Other Deep Learning Models

In this study, we compared SpectraViT with the AlexNet, DeepSpectra, and SpectraTr deep learning models. The comparison was based on the number of parameters, FLOPs (the number of floating-point operations), and performance metrics on the test set, including accuracy, recall, precision, and F1 score. Table 5 details the performance of each model.

In terms of the number of parameters, SpectraViT has the fewest, with only 0.852 M, significantly lower than the other models. AlexNet has 8.291 M parameters, DeepSpectra has as many as 207.398 M, and SpectraTr has 16.030 M. Fewer parameters indicate that our model is more efficient in terms of storage and computation, making it suitable for deployment in resource-constrained environments. SpectraViT also has the lowest FLOPs, at just 0.009 G, indicating the lowest computational complexity. In contrast, AlexNet’s FLOPs are 0.022 G, DeepSpectra’s are 0.213 G, and SpectraTr’s are as high as 2.101 G. Lower FLOPs give SpectraViT a significant advantage in inference speed and energy consumption.

Regarding performance on the validation set, SpectraViT achieved an accuracy of 99.31%, outperforming all other models. AlexNet and DeepSpectra achieved validation accuracies of 98.77% and 98.43%, respectively, while SpectraTr achieved 97.82%. On the test set, SpectraViT also outperformed the other models, with an accuracy of 99.24%, recall of 99.25%, precision of 99.06%, and F1 score of 99.23%. AlexNet’s accuracy was 98.02%, recall was 98.02%, precision was 98.09%, and F1 score was 98.05%. DeepSpectra’s metrics were close to AlexNet’s, at around 98.36%. SpectraTr performed relatively worse.

Overall, SpectraViT excels in all aspects, particularly in terms of parameter count and computational complexity, while also achieving better accuracy, recall, precision, and F1 score on both the validation and test sets compared to other models. This indicates that SpectraViT is not only more efficient in terms of resource consumption but also superior in performance, with better generalization capability. Thus, SpectraViT demonstrates better performance than other deep convolutional neural network models and pure Transformer networks for jade UV spectrum recognition.

3.4. Impact of Preprocessing Algorithms on Model Performance

The experimental results shown in Table 6 indicate that preliminary preprocessing of spectral data through interpolation and resampling can significantly enhance the accuracy of classification algorithms. This finding confirms the effectiveness of interpolation and resampling techniques in adjusting data sampling rates and distributions, especially when handling unevenly sampled signals or converting signals between different sampling rates.

In the deep learning process, models using resampling techniques performed the best. With the original data, the SpectraViT model achieved an accuracy and recall of 98.29%, precision of 98.30%, and an F1 score of 98.30%. This demonstrates that even without any preprocessing, the model performs well, indicating a strong robustness to the raw data.

When the raw spectral data were resampled, model performance improved significantly, with accuracy reaching 99.24%, recall of 99.25%, precision of 99.06%, and an F1 score of 99.23%. This suggests that resampling preserves and expresses the most useful initial features of the data.

Further experiments explored the effects of different preprocessing algorithms, including normalization (NMS), standard score transformation (SS), Savitzky–Golay smoothing (SG), and multiplicative scatter correction (MSC), applied after interpolation and resampling. The results showed a slight decrease in model performance with these preprocessing methods. This indicates that these preprocessing methods may lead to some loss of feature information, reducing the model’s classification performance. However, our model still demonstrated strong nonlinear fitting and adaptive learning capabilities, effectively extracting key features from the data. Additionally, our model is capable of achieving good spectral recognition without extensive preprocessing.

3.5. Impact of Loss Functions on Model Performance

In the spectral classification experiments, we compared the performance of four different loss functions (Cross Entropy Loss, Weighted Cross Entropy Loss, Focal Loss, and Dice Loss) in terms of model accuracy. The results showed that Cross Entropy Loss achieved the highest accuracy of 99.24%. Weighted Cross Entropy Loss, Focal Loss, and Dice Loss reached accuracies of 98.54%, 98.32%, and 98.65%, respectively, as shown in Table 7. These results indicate that, for spectral data classification tasks, the standard Cross Entropy Loss function is the optimal choice among the four loss functions.
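For reference, commonly used per-sample forms of the four losses are written out below in the notation of Formula (3). The paper does not give the class weights $w_{i}$, the focal parameters $\alpha_{i}$ and $\gamma$, or the smoothing constant $\epsilon$, so these symbols are placeholders, and the exact variants used in the experiments may differ (for instance, Dice loss is often accumulated over a batch rather than a single sample).

```latex
% p(x_i): i-th element of the true label, y_i: predicted probability of class i,
% n: number of classes (n = 4 for the jade dataset)
\begin{align}
L_{\mathrm{CE}}    &= -\sum_{i=1}^{n} p(x_i) \log(y_i) \\
L_{\mathrm{WCE}}   &= -\sum_{i=1}^{n} w_i \, p(x_i) \log(y_i) \\
L_{\mathrm{Focal}} &= -\sum_{i=1}^{n} \alpha_i \, (1 - y_i)^{\gamma} \, p(x_i) \log(y_i) \\
L_{\mathrm{Dice}}  &= 1 - \frac{2 \sum_{i=1}^{n} p(x_i) \, y_i + \epsilon}
                               {\sum_{i=1}^{n} p(x_i) + \sum_{i=1}^{n} y_i + \epsilon}
\end{align}
```

Setting $w_{i} = 1$, or $\alpha_{i} = 1$ and $\gamma = 0$, recovers the standard cross-entropy loss, which makes the first three losses directly comparable.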

3.6. Ablation Experiments

To investigate the impact of each module on the model, ablation experiments were conducted. Specifically, key modules were removed from the complete model to compare the changes in model performance. The complete model includes four modules: conv1, mv2, transformer, and conv2. Here, conv1 and conv2 are convolution modules with 1 × 1 kernels, used for linear combinations of features across different channels, increasing cross-channel information interaction, and performing dimensionality reduction or expansion to adjust the number of output channels.

In this experiment, comparisons were made by removing either the MV2 module or the Transformer module from the complete model. Table 8 shows the results of the ablation experiments, where Accuracy refers to the accuracy on the test set. As seen in Table 8, removing either module leads to a decrease in the model’s accuracy. Notably, the removal of the Transformer module causes a significant drop in accuracy, indicating the crucial role of the global attention mechanism in spectral classification. The Transformer module, combined with the convolution modules, can more effectively learn spectral features and improve accuracy.

3.7. Performance of SpectraViT on Public Datasets

To further validate the recognition and generalization performance of the SpectraViT model, we conducted a comparative analysis using the publicly available fruit puree dataset. The primary connection between this dataset and the jade dataset lies in their use for qualitative analysis, allowing us to apply similar analysis methods to process and compare the spectral features of different substances. Specifically, this dataset includes 351 strawberry samples, 159 raspberry samples, and 665 “non-fruit” samples (including various other fruits and “contaminated” strawberry and raspberry purees, where the weight percentage of other fruits is greater than 10%). The dataset was sourced from reference [31]. The results are shown in Table 9. According to the performance comparison on the fruit puree spectral dataset, our SpectraViT model outperforms all other models across all evaluation metrics (accuracy, recall, precision, and F1 score), achieving 98.47%, 98.49%, 98.47%, and 98.48%, respectively. AlexNet and SpectraTr also performed well, with AlexNet achieving accuracy, recall, precision, and F1 scores of 97.96%, 97.95%, 97.99%, and 97.96%, respectively, and SpectraTr achieving 97.45%, 97.45%, 97.43%, and 97.44%, respectively. In contrast, DeepSpectra had noticeably lower metrics, with an accuracy of about 95.41%.

Traditional machine learning methods did not perform as well on the fruit puree spectral dataset compared with the jade UV spectral dataset, with SVM and PLS-DA achieving accuracies of only 92.93% and 93.88%, respectively. Traditional machine learning methods struggle to achieve a strong performance across multiple datasets, whereas deep learning models can effectively classify multiple spectral datasets. This indicates that deep learning models have a significant advantage in handling spectral data. Furthermore, among deep learning models, our SpectraViT model demonstrates superior accuracy, recall, precision, and F1 score on both the jade UV spectral dataset and the fruit puree infrared spectral dataset compared with other deep learning models. This suggests that SpectraViT is a better-suited deep learning model for spectral data classification.

4. Conclusions

This paper presents SpectraViT, a deep learning model that combines convolutional neural networks and Transformers to classify jade UV spectra. The convolutional layers extract local features of the spectra and the Transformer captures global dependencies, enabling efficient identification of jade samples. Experiments on the jade UV spectral dataset show that the model reaches an accuracy of 99.24%, outperforming traditional machine learning algorithms and other neural network models. A comparison of preprocessing methods further showed that interpolation and resampling improve classification accuracy (from 98.29% on raw spectra to 99.24%, Table 6), whereas the additional preprocessing methods tested lowered it, possibly because they cause feature loss. The proposed model offers a new approach to the UV spectral classification of jade, but it has limitations. The dataset used in this study is relatively small, which poses a risk of overfitting. Future work could train and test on larger and more diverse spectral datasets to improve the model’s generalization ability and robustness.

Author Contributions

Conceptualization, J.C. and X.L.; methodology, X.L.; software, J.C.; validation, J.C., X.L. and J.F.; formal analysis, X.L.; investigation, J.C.; resources, X.L.; data curation, J.C.; writing—original draft preparation, J.C.; writing—review and editing, X.L.; visualization, J.C.; supervision, X.L.; project administration, X.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 51827901.

Institutional Review Board Statement

Ethical review and approval were waived for this study because it was conducted on the jade ultraviolet spectral dataset and publicly available spectral datasets and did not involve experimental studies in humans.

Informed Consent Statement

Not applicable.

Data Availability Statement

The ultraviolet spectral dataset presented in this study is unavailable due to privacy reasons. The fruit puree dataset is publicly available (reference [31]).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, B.-Z. Overview of Gemstones. Synth. Cryst. 1983, 2, 88–101.
2. Wang, H.-P.; Chen, P.; Dai, J.-W.; Liu, D.; Li, J.-Y.; Xu, Y.-P.; Chu, X.-L. Recent advances of chemometric calibration methods in modern spectroscopy: Algorithms, strategy, and related issues. TrAC Trends Anal. Chem. 2022, 153, 116648.
3. GB/T 42645-2023; Gems Testing—Ultraviolet-Visible Absorption Spectroscopy. State Administration for Market Regulation, Standardization Administration of China: Beijing, China, 2023.
4. Song, G.; Wu, J.; Jian, H.; Li, J.; Liu, L. Rapid Detection Technology for Gems and Jewelry Based on Ultraviolet-Visible Reflection Spectroscopy. In Proceedings of the 2013 China Jewelry and Ornament Academic Exchange Conference, NGTC, Wuhan, China, 27–30 March 2013; China Gems & Jewelry Trade Association: Beijing, China, 2013; pp. 292–295.
5. Hopke, P.K. The evolution of chemometrics. Anal. Chim. Acta 2003, 500, 365–377.
6. Dadon, A.; Mandelmilch, M.; Ben-Dor, E.; Sheffer, E. Sequential PCA-based Classification of Mediterranean Forest Plants using Airborne Hyperspectral Remote Sensing. Remote Sens. 2019, 11, 2800.
7. Tan, A.; Zhao, J.; Zhao, Y.; Li, X.; Su, H. Determination of microplastics by FTIR spectroscopy based on quaternion parallel feature fusion and support vector machine. Chemom. Intell. Lab. Syst. 2023, 243, 105018.
8. Wu, Q.; Luo, J.; Fang, H.; He, D.; Liang, T. Spectral classification analysis of recycling plastics of small household appliances based on infrared spectroscopy. Vib. Spectrosc. 2024, 130, 103636.
9. Yang, J.; Xu, J.; Zhang, X.; Wu, C.; Lin, T.; Ying, Y. Deep learning for vibrational spectral analysis: Recent progress and a practical guide. Anal. Chim. Acta 2019, 1081, 6–17.
10. Padarian, J.; Minasny, B.; McBratney, A.B. Using deep learning to predict soil properties from regional spectral data. Geoderma Reg. 2019, 16, e00198.
11. Fu, P.; Wen, Y.; Zhang, Y.; Li, L.; Feng, Y.; Yin, L.; Yang, H. SpectraTr: A novel deep learning model for qualitative analysis of drug spectroscopy based on transformer structure. J. Innov. Opt. Health Sci. 2022, 15, 2250021.
12. Chang, X.; Yu, M.; Liu, R.; Jing, R.; Ding, J.; Xia, J.; Zhu, Z.; Li, X.; Yao, Q.; Zhu, L.; et al. Deep learning methods for oral cancer detection using Raman spectroscopy. Vib. Spectrosc. 2023, 126, 103522.
13. Acquarelli, J.; Laarhoven, T.V.; Gerretzen, J.; Tran, T.N.; Buydens, L.M.C.; Marchiori, E. Convolutional neural networks for vibrational spectroscopic data analysis. Anal. Chim. Acta 2017, 954, 22–31.
14. Zhang, X.; Lin, T.; Xu, J.; Luo, X.; Ying, Y. DeepSpectra: An end-to-end deep learning approach for quantitative spectral analysis. Anal. Chim. Acta 2019, 1058, 48–57.
15. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9.
16. Balabin, R.M.; Safieva, R.Z.; Lomakina, E.I. Comparison of linear and nonlinear calibration models based on near infrared (NIR) spectroscopy data for gasoline properties prediction. Chemom. Intell. Lab. Syst. 2007, 88, 183–188.
17. Xu, P.; Fu, L.; Xu, K.; Sun, W.; Tan, Q.; Zhang, Y.; Zha, X.; Yang, R. Investigation into maize seed disease identification based on deep learning and multi-source spectral information fusion techniques. J. Food Compos. Anal. 2023, 119, 105254.
18. Hoon Yun, B.; Yu, H.Y.; Kim, H.; Myoung, S.; Yeo, N.; Choi, J.; Sook Chun, H.; Kim, H.; Ahn, S. Geographical discrimination of Asian red pepper powders using 1H NMR spectroscopy and deep learning-based convolution neural networks. Food Chem. 2024, 439, 138082.
19. Fast detection of cumin and fennel using NIR spectroscopy combined with deep learning algorithms. Optik 2021, 242, 167080.
20. Wu, Y.; Zhu, X.; Huang, Q.; Zhang, Y.; Evans, J.; He, S. Predicting the Quality of Tangerines Using the GCNN-LSTM-AT Network Based on Vis–NIR Spectroscopy. Appl. Sci. 2023, 13, 8221.
21. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: New York, NY, USA, 2017; Volume 30.
22. Sun, J.; Xie, J.; Zhou, H. EEG Classification with Transformer-Based Models. In Proceedings of the 2021 IEEE 3rd Global Conference on Life Sciences and Technologies (LifeTech), Kyoto, Japan, 10–12 March 2021; pp. 92–93.
23. Wang, H.; Cao, L.; Huang, C.; Jia, J.; Dong, Y.; Fan, C.; De Albuquerque, V.H.C. A Novel Algorithmic Structure of EEG Channel Attention Combined with Swin Transformer for Motor Patterns Classification. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 3132–3141.
24. Zeynali, M.; Seyedarabi, H.; Afrouzian, R. Classification of EEG signals using Transformer based deep learning and ensemble models. Biomed. Signal Process. Control 2023, 86, 105130.
25. Jung, M.; Lee, J.; Kim, J. A lightweight CNN-transformer model for learning traveling salesman problems. Appl. Intell. 2024, 54, 7982–7993.
26. Li, C.; Huang, X.; Song, R.; Qian, R.; Liu, X.; Chen, X. EEG-based seizure prediction via Transformer guided CNN. Measurement 2022, 203, 111948.
27. Dyer, S.A.; Dyer, J.S. Cubic-spline interpolation. 1. IEEE Instrum. Meas. Mag. 2001, 4, 44–46.
28. Zhao, Y.; Zhang, X.; Feng, W.; Xu, J. Deep Learning Classification by ResNet-18 Based on the Real Spectral Dataset from Multispectral Remote Sensing Images. Remote Sens. 2022, 14, 4883.
29. Sinha, D.; El-Sharkawy, M. Thin MobileNet: An Enhanced MobileNet Architecture. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 10–12 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 280–285.
30. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw. 2018, 107, 3–11.
31. Shu, H.; Tang, H.; Zhang, H.; Zheng, W. Data for: Spectra Data Classification with Kernel Extreme Learning. Mendeley Data, V1, 2019.

Figure 1. Representative ultraviolet spectra of jade, including D (natural diamond), CVD (chemical vapor deposition-grown diamond), HPHT (high pressure high temperature-grown diamond), and MS (moissanite). The different colors represent the same type of spectra obtained under different conditions. (a) D, (b) CVD, (c) HPHT, (d) MS.

Figure 2. Structure of SpectraViT.

Figure 3. Structure of MV2 module. (a) MV2, (b) MV2↓.

Figure 4. Transformer module.

Figure 5. Structure of multi-head attention layer.

Figure 6. Loss and accuracy curves for training and validation sets. (a) loss, (b) accuracy.

Table 1. Number of samples in the training, validation, and test sets (7352 samples in total), including D (natural diamond), CVD (chemical vapor deposition-grown diamond), HPHT (high pressure high temperature-grown diamond), and MS (moissanite).

Class	Training Set	Validation Set	Test Set	Total	Percentage (%)
D	2525	840	841	4206	57.21
CVD	1098	364	364	1826	24.83
HPHT	330	108	108	546	7.43
MS	466	154	154	774	10.53

Table 2. Performance of the model (%).

Datasets	Acc (%)	Recall (%)	Precision (%)	F1 (%)
Dataset of jade ultraviolet spectra	99.24	99.24	99.06	99.23

Table 3. Confusion Matrix.

Model	True Label	Predicted D	Predicted CVD	Predicted HPHT	Predicted MS
SpectraViT	D	838	2	0	0
	CVD	0	361	3	0
	HPHT	1	5	102	0
	MS	0	0	0	154
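
The metrics reported in this section can be related to the confusion matrix in Table 3 with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the support-weighted averaging of per-class precision, recall, and F1 is an assumption, since the averaging convention is not restated here. Values computed this way therefore need not match Table 2 digit for digit.

```python
# Confusion matrix copied from Table 3 (rows: true label, columns: predicted label).
LABELS = ["D", "CVD", "HPHT", "MS"]
CONFUSION = [
    [838, 2, 0, 0],
    [0, 361, 3, 0],
    [1, 5, 102, 0],
    [0, 0, 0, 154],
]


def per_class_metrics(cm):
    """Return a list of (precision, recall, f1, support) tuples, one per class."""
    n = len(cm)
    stats = []
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                         # samples whose true label is k
        predicted = sum(cm[i][k] for i in range(n))  # samples predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats.append((precision, recall, f1, support))
    return stats


def summarize(cm):
    """Return overall accuracy and support-weighted precision, recall, and F1.

    The weighted averaging is an assumption made for this sketch.
    """
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(len(cm))) / total
    stats = per_class_metrics(cm)
    weighted = tuple(
        sum(s[j] * s[3] for s in stats) / total for j in range(3)
    )
    return accuracy, weighted


if __name__ == "__main__":
    acc, (prec, rec, f1) = summarize(CONFUSION)
    print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
    for name, (p, r, f, s) in zip(LABELS, per_class_metrics(CONFUSION)):
        print(f"{name}: precision={p:.4f} recall={r:.4f} f1={f:.4f} support={s}")
```

With support weighting, the averaged recall is algebraically identical to the overall accuracy, which is one plausible reason the accuracy and recall columns in the tables are so close. The per-class output also makes it easy to see that most errors involve the HPHT class.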

Table 4. Performance comparison of SpectraViT, SVM, and PLS_DA.

Model	Test Acc (%)	Test Recall (%)	Test Precision (%)	Test F1 (%)
SpectraViT	99.24	99.25	99.06	99.23
SVM	98.43	98.43	98.46	98.44
PLS_DA	98.36	98.36	98.48	98.39

Table 5. Performance comparison of SpectraViT and other models.

Model	Params (M)	FLOPs (G)	Val_Acc (%)	Test Acc (%)	Test Recall (%)	Test Precision (%)	Test F1 (%)
SpectraViT	0.852	0.009	99.31	99.24	99.25	99.06	99.23
AlexNet	8.291	0.022	98.77	98.02	98.02	98.09	98.05
DeepSpectra	207.398	0.213	98.43	98.36	98.36	98.42	98.38
SpectraTr	16.030	2.101	97.82	96.93	96.93	97.06	96.72

Table 6. The effect of different preprocessing methods on SpectraViT.

Method	Additional Preprocessing	Acc (%)	Recall (%)	Precision (%)	F1 (%)
Raw		98.29	98.29	98.30	98.30
Resample	None	99.24	99.25	99.06	99.23
	NMS	98.09	98.09	98.12	98.10
	SS	98.43	98.43	98.48	98.45
	SG	98.57	98.57	98.57	98.37
	MSC	98.64	98.64	98.65	98.64
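
The "Resample" rows of Table 6 refer to interpolating each spectrum onto a common wavelength grid before it is fed to the model. A minimal sketch of that idea follows. It uses plain linear interpolation as a simple stand-in, whereas the reference list includes a cubic-spline interpolation paper [27]. The function name, the grid limits, and the number of points are hypothetical, and the authors' actual settings are not reproduced here.

```python
from bisect import bisect_right


def resample_linear(wavelengths, intensities, start, stop, num_points):
    """Linearly interpolate a spectrum onto num_points evenly spaced wavelengths.

    wavelengths must be strictly increasing. Grid points outside the measured
    range are clamped to the nearest measured intensity (an assumption of this
    sketch, not a documented choice of the paper).
    """
    if len(wavelengths) != len(intensities) or len(wavelengths) < 2:
        raise ValueError("need two or more matching wavelength/intensity pairs")
    if num_points < 2:
        raise ValueError("num_points must be at least 2")

    step = (stop - start) / (num_points - 1)
    grid = [start + i * step for i in range(num_points)]

    resampled = []
    for x in grid:
        if x <= wavelengths[0]:
            resampled.append(float(intensities[0]))
            continue
        if x >= wavelengths[-1]:
            resampled.append(float(intensities[-1]))
            continue
        # Index j satisfies wavelengths[j - 1] <= x < wavelengths[j].
        j = bisect_right(wavelengths, x)
        x0, x1 = wavelengths[j - 1], wavelengths[j]
        y0, y1 = intensities[j - 1], intensities[j]
        t = (x - x0) / (x1 - x0)
        resampled.append(y0 + t * (y1 - y0))
    return grid, resampled


if __name__ == "__main__":
    # Toy spectrum with unevenly spaced samples (made-up numbers).
    wl = [200.0, 210.0, 230.0]
    absorbance = [0.0, 1.0, 3.0]
    grid, values = resample_linear(wl, absorbance, 200.0, 230.0, 4)
    print(list(zip(grid, values)))
```

Resampling in this way gives every spectrum the same length and wavelength alignment, which a fixed-size network input requires. A spline-based interpolant could be substituted for the linear step without changing the surrounding logic.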

Table 7. The effect of different loss functions on SpectraViT.

Metric	Cross Entropy Loss	Weighted Cross Entropy Loss	Focal Loss	Dice Loss
Acc (%)	99.24	98.54	98.32	98.65
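
To make the comparison in Table 7 more concrete, the sketch below evaluates three of the listed losses for a single sample using their standard textbook definitions. It is an illustration under stated assumptions, not the authors' implementation: the inverse-frequency class weights (derived from the training counts in Table 1) and the focal-loss focusing parameter `gamma=2.0` are hypothetical choices, and Dice loss is omitted.

```python
import math

CLASSES = ["D", "CVD", "HPHT", "MS"]
TRAIN_COUNTS = [2525, 1098, 330, 466]  # training-set sizes from Table 1


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target, class_weights=None):
    """Cross-entropy for one sample, optionally scaled by the target class weight."""
    p_target = softmax(logits)[target]
    weight = 1.0 if class_weights is None else class_weights[target]
    return -weight * math.log(p_target)


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed value. gamma=0 recovers plain cross-entropy.
    """
    p_target = softmax(logits)[target]
    return -((1.0 - p_target) ** gamma) * math.log(p_target)


def inverse_frequency_weights(counts):
    """Assumed weighting scheme: weight_k = N / (K * n_k)."""
    total = sum(counts)
    k = len(counts)
    return [total / (k * c) for c in counts]


if __name__ == "__main__":
    weights = inverse_frequency_weights(TRAIN_COUNTS)
    logits = [2.0, 0.5, 0.1, -1.0]      # made-up scores for one spectrum
    target = CLASSES.index("HPHT")      # pretend the true class is HPHT
    print("CE         :", cross_entropy(logits, target))
    print("weighted CE:", cross_entropy(logits, target, weights))
    print("focal      :", focal_loss(logits, target))
```

Weighted cross-entropy and focal loss both aim to compensate for class imbalance, the former by up-weighting rare classes and the latter by down-weighting easy samples. Table 7 indicates that, on this dataset, plain cross-entropy still gave the highest accuracy.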

Table 8. Impact of different modules on model performance. × indicates that a module was removed; ✔ indicates that it was retained.

MV2	Transformer	Accuracy (%)	Recall (%)	Precision (%)	F1 (%)
✔	✔	99.24	99.25	99.06	99.23
×	✔	98.91	98.91	98.91	98.91
✔	×	93.18	93.18	93.06	92.50

Table 9. Performance comparison on the fruit puree spectral dataset.

Model	Acc (%)	Recall (%)	Precision (%)	F1 (%)
SpectraViT	98.47	98.49	98.47	98.48
SVM	92.93	92.93	93.00	92.93
PLS-DA	93.88	93.88	93.95	93.82
AlexNet	97.96	97.95	97.99	97.96
DeepSpectra	95.41	95.41	95.49	95.38
SpectraTr	97.45	97.45	97.43	97.44

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, X.; Cai, J.; Feng, J. Jade Identification Using Ultraviolet Spectroscopy Based on the SpectraViT Model Incorporating CNN and Transformer. Appl. Sci. 2024, 14, 9839. https://doi.org/10.3390/app14219839

