1. Introduction
SDSS (the Sloan Digital Sky Survey [1,2]) and LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope [3,4]) are the two most ambitious spectroscopic surveys worldwide. The latest and final data release of optical spectra of SDSS is data release 16.
LAMOST has been upgraded to have both low- and medium-resolution modes of observation and has finished data release 7 (DR7) [5], which is the product of the medium-resolution spectra (MRS) survey (R∼7500). LAMOST DR7 was released in March 2020. The massive spectra from the survey telescope have provided a new opportunity to search for special and unusual celestial bodies such as cataclysmic variables (CVs) [6,7,8].
Over recent decades, spectral classification has mainly focused on 1D processing and has achieved recognized results [9,10,11]. With the continuous development of deep learning technology [12], the convolutional neural network (CNN) [13] has shown excellent performance in feature extraction and combination. As a classic deep convolutional network, VGGNet (Visual Geometry Group Network) [14] is widely used in image classification tasks. ResNet (Residual Network) [15] introduces residual learning, which allows deep networks to extract more effective features. DenseNet (Densely Connected Convolutional Networks) [16] feeds the output of each layer into all subsequent layers, allowing the network to extract features of more dimensions. A multi-layer convolutional neural network based on learnable kernels was first introduced into the field of spectral classification in [17]; this method tried to fold the 1D spectra, but did not achieve satisfactory results.
This paper verifies the applicability of the networks above in 2D, discusses in depth the degree of feature extraction and overfitting, and proposes corresponding solutions.
Spectral signal-to-noise ratio (SNR) often leads to different results within the same method [18]. This paper explores the influence of SNR on classification accuracy under various methods and under different dimensions of the same method. Different classification network structures are implemented under high and low SNR conditions, which efficiently improves the classification accuracy. The categories in the experimental spectra are unevenly populated, and we investigate each small category separately.
The classification algorithm should be able to extract the specific features which are representative of individual classes. Therefore, the identification of individual features should be maximized. The appearance of common features like the continuum, telluric lines, etc. and noise should be minimized in the feature extraction process.
In this paper, different networks were verified according to the practical situation of LAMOST spectra. A novel and robust method based on an enhanced multi-scale coded convolutional neural network (EMCCNN) is proposed and implemented. The method is verified on SDSS spectra and then used to carry out a systematic search for CVs in the MRS of LAMOST.
CVs are binary systems consisting of a white dwarf star and a companion star. The companion star is usually a K or M type red dwarf star, and in some cases a white dwarf star or a red giant star. CVs play an important role in the study of accretion disk theory. Presently, only about two thousand CVs have been discovered [19]. Most of the optical spectra of CVs have been identified by large surveys including SDSS [20,21,22,23,24], the Far-Ultraviolet Spectroscopic Explorer (FUSE) [25], the ROSAT XRT-PSPC All Sky Survey [26] and the Catalina Real-time Transient Survey (CRTS) [27,28]. Some questions about CVs, such as the period gap, remain unresolved because of the limited number of samples [29]. This places higher demands on new automatic classification methods for searching for CVs in massive spectra.
This paper is organized as follows. Section 2 describes the experimental data and data preprocessing. We introduce the different methods used in the paper in Section 3. Section 4 presents the implementation of the methods in detail. Section 5 reports the results of our experiment. In Section 6, we give our conclusions and outline our plans for future work.
3. Methods
The goal of spectra classification is to maximize common features and minimize noise and spectral individual features. A variety of classification networks are employed and tested.
3.1. Convolutional Neural Network
CNN can handle both 1D and 2D data, depending on whether its kernel is 1D or 2D. We use CNN to classify M stars, because CNN performs well in image classification, and it can also handle 1D spectra classification. The feature extraction capabilities of CNN can help us to improve the accuracy of traditional classification methods. Quadratic interpolation is performed to facilitate the folding of the spectra to form a 2D spectral matrix.
As a special form of CNN, 1D CNN has certain applications in the field of signal processing. We adopted the network structure shown in Figure 7.
In the specific network construction, the backpropagation algorithm is adopted. The 1D matrix information composed of spectra is continuously trained to fit the parameters in each layer by gradient descent.
Using the total errors to calculate the partial derivative of the parameter, the magnitude of the influence of a parameter on the overall errors can be obtained, which is used to correct the parameters in the back propagation. Since the construction of the network uses a linear arrangement, the total error calculated by the final output layer is used to perform the partial derivative calculation of the parameters in all layers:
Each neuron in a CNN network is connected to all neurons in the previous layer, so the calculation of the local gradient requires a forward recursive calculation of the gradient of each subsequent layer of neurons. After defining the linear output of each layer and the parameters in the structure, the output of the activation function is:
, then the recursive formula for each gradient can be derived as:
In the successive calculation of the formula above, the gradient calculated by each parameter under the total errors is used as the basis to achieve the goal of correcting the parameters in the direction of the minimum loss function.
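The chain-rule bookkeeping described in this subsection can be illustrated with a deliberately tiny example. The sketch below is not the network of Figure 7: it assumes a hypothetical two-layer scalar chain with a tanh activation and a squared-error loss, and all function and variable names are illustrative. It computes the layer-by-layer gradients by hand, provides a finite-difference check, and applies one gradient descent step.

```python
import math

def forward(w1, w2, x):
    """Hypothetical two-layer chain: z1 = w1*x, a1 = tanh(z1), z2 = w2*a1."""
    z1 = w1 * x
    a1 = math.tanh(z1)
    z2 = w2 * a1          # linear output layer
    return z1, a1, z2

def loss(w1, w2, x, y):
    """Squared error between the network output and the target y."""
    return 0.5 * (forward(w1, w2, x)[2] - y) ** 2

def backward(w1, w2, x, y):
    """Propagate the output error back through the layers with the chain rule."""
    z1, a1, z2 = forward(w1, w2, x)
    delta2 = z2 - y                           # dE/dz2 at the output layer
    grad_w2 = delta2 * a1                     # dE/dw2
    delta1 = delta2 * w2 * (1.0 - a1 ** 2)    # dE/dz1, since tanh'(z) = 1 - tanh(z)^2
    grad_w1 = delta1 * x                      # dE/dw1
    return grad_w1, grad_w2

def numeric_grad(w1, w2, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
    return g1, g2

def sgd_step(w1, w2, x, y, lr=0.1):
    """One gradient descent update in the direction that reduces the loss."""
    g1, g2 = backward(w1, w2, x, y)
    return w1 - lr * g1, w2 - lr * g2
```

Each `delta` is the error attributed to one layer, obtained from the delta of the layer after it; this is the recursion that the text describes, reduced to scalars.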
3.2. VGGNet and ResNet
VGGNet, which generalizes well across different datasets, is widely used in 2D data processing. Because our spectral dimensions are not particularly large, we use the shallow VGG16 network shown in Figure 8 in our experiments to prevent overfitting on the training set.
The main improvement of VGGNet is to replace shallow networks built from large convolution kernels with deeper networks built from small kernels. In the convolutional layer, the number of extracted features for each layer output can be calculated using the following formula:
Here p, k and s are the size of max pooling, kernel size, and stride, respectively. As classification networks have developed, they have become progressively deeper in order to extract and combine features better, but a deep network is difficult to train and risks losing information.
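As a rough illustration of how p, k and s determine the size of a layer output, the sketch below uses the common textbook relation for a 1D convolution without padding followed by non-overlapping max pooling. This is an assumption and may differ from the exact formula used in the paper; the function name and the input length n are hypothetical.

```python
def conv_pool_output_length(n, k, s, p):
    """Output length of a 1D convolution (kernel k, stride s, no padding)
    followed by non-overlapping max pooling of size p.

    Assumes the usual relations:
        conv_len = floor((n - k) / s) + 1
        out_len  = floor(conv_len / p)
    """
    if min(n, k, s, p) < 1:
        raise ValueError("n, k, s and p must be positive integers")
    if n < k:
        raise ValueError("input is shorter than the kernel")
    conv_len = (n - k) // s + 1
    return conv_len // p

# Illustrative sizes only: a length-100 input, kernel 3, stride 1, pooling 2.
example_length = conv_pool_output_length(100, 3, 1, 2)
```

With these illustrative sizes the convolution yields 98 positions and the pooling halves that to 49.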
ResNet, as shown in Figure 9, proposes residual learning to solve these problems, allowing information to be transmitted across layers to preserve information integrity, while learning only the residuals of the previous network output:
This allows us to increase the number of convolutional layers, which is an effective way to train a very deep network. The most prominent characteristic of ResNet is that, in addition to the result of the conventional convolution calculation, the initial input value is also added to the final output. The network therefore fits the difference between the two, and the calculation formula for each layer of ResNet is as follows:
The gradient calculation formula of the neural network then changes relative to the conventional one:
Compared with traditional networks, the extra term “1” makes the gradient less likely to vanish, which means that the gradient calculated at the last layer can be propagated back through the network; this effective transmission of the gradient makes the learning of spectral features more efficient. However, because our spectral features are limited and may suffer from strong noise interference, such efficient feature extraction may lead to significant overfitting, which leaves the trained network with poor generalization capability.
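For reference, the residual relations discussed in this subsection are commonly written as follows in the residual-learning literature. The symbols used here ($x_l$ for the input of layer $l$, $\mathcal{F}$ for the residual mapping with weights $W_l$, $E$ for the loss, and $L$ for a deeper layer) are assumptions and are not taken from the paper's own notation.

```latex
\begin{align}
  % Residual unit: the block learns F and the input is added back
  x_{l+1} &= x_l + \mathcal{F}(x_l, W_l) \\
  % Unrolling the recursion from layer l up to a deeper layer L
  x_L &= x_l + \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \\
  % Gradient of the loss E with respect to x_l: note the additive term 1
  \frac{\partial E}{\partial x_l}
    &= \frac{\partial E}{\partial x_L}
       \left( 1 + \frac{\partial}{\partial x_l}
              \sum_{i=l}^{L-1} \mathcal{F}(x_i, W_i) \right)
\end{align}
```

The additive term 1 in the last line is what carries the gradient from layer $L$ straight back to layer $l$, even when the derivative of the residual branch is small.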
3.3. EMCCNN
EMCCNN is a network designed for the characteristics of spectra, and has achieved good results for different SNR spectra. We find that for spectra, a deep convolutional layer can lead to significant overfitting of the data and a very poor generalization capability of the model. Direct convolution of spectra will extract a lot of noise and individual features, which is not an ideal way to classify spectra. Unsupervised denoising methods tend to remove some spectral feature peaks in the data, which is also not an ideal method to maximize the common features of this spectral type.
Therefore, we add a supervised denoising network before the convolutional neural network, which helps to distinguish features from noise and allows the network to extract spectral features instead of noise. In addition, the spectral feature peaks are not all of the same type. For different types of feature peaks, different convolution kernel sizes may extract features of different quality: some features may be extracted better by larger convolution kernels, others by small-scale kernels. We therefore let convolution kernels of different scales learn features simultaneously, then combine these features and assign them different weights through the fully connected layer. The resulting feature extraction network, EMCCNN, is shown in Figure 10. The detailed architecture of EMCCNN is described in Appendix A, and we use cross-entropy loss as the loss function of EMCCNN.
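The multi-scale idea can be sketched in a few lines of plain Python. The example below is not the EMCCNN architecture of Appendix A and omits the denoising encoder entirely; it only shows, under assumed kernel sizes and with hypothetical function names, how kernels of different widths can be applied in parallel, pooled, and concatenated, and how a cross-entropy loss is evaluated on class scores.

```python
import math

def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (cross-correlation)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def multi_scale_features(signal, kernels):
    """Apply every kernel in parallel, then ReLU and global max pooling.

    Returns one pooled feature per kernel, concatenated into a single list.
    """
    features = []
    for kernel in kernels:
        response = [max(0.0, v) for v in conv1d_valid(signal, kernel)]
        features.append(max(response))
    return features

def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Cross-entropy loss for one sample with an integer class label."""
    return -math.log(softmax(logits)[target_index])

# Illustrative input: a narrow peak and a broad plateau, probed at two scales.
toy_flux = [0, 0, 0, 3, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
toy_kernels = [[1 / 3] * 3, [0.2] * 5]   # hypothetical averaging kernels of width 3 and 5
toy_features = multi_scale_features(toy_flux, toy_kernels)
```

In a trained network the kernels would be learned rather than fixed, and a fully connected layer would weight the concatenated features before the loss is computed.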
4. Experiment
4.1. 1D Convolution Experiment and Results
Because the massive LAMOST spectra in this experiment are unlabeled, we first applied the method to SDSS spectra to verify its validity and efficiency before searching for CVs in the LAMOST spectra.
We first compare the spectral classification performance of a DNN (Deep Neural Network) with four hidden layers and a 1D CNN, as shown in Figure 11. The optimizer used was stochastic gradient descent, the learning rate was set to
, and the activation function chosen was ReLU (Rectified Linear units). We found that the CNN network can quickly extract the corresponding features, but it will also overfit quickly, which has an accuracy (number of spectra correctly classified to the parent class/number of total spectra) of 60% in the test set, and the generalization capability is extremely poor.
CNN has strong feature extraction capabilities, but it is not suitable for spectra because it also extracts many individual features and much noise at the same time. The problem becomes even worse as the SNR increases.
Because VGG16 is composed of several convolutional layers, direct convolution of the spectra behaves like the direct use of CNN and leads to overfitting of the training data. Hence, we add an encoder to the VGG16 network for denoising. After denoising, VGG16 is not as bad as CNN for spectra in the lower SNR range, but it is still unsatisfactory for spectra with SNR > 15. We consider that the data features are more pronounced at high SNR, and that individual spectral features and noise interfere strongly with a very deep network. In addition, the feedback path of the denoising encoder is too long, which makes it hard for the encoder to distinguish individual features, noise and common features.
ResNet’s interlayer transfer mechanism might help to solve the problem of overfitting, but the effect is not very satisfactory. Like CNN, ResNet performs well on the training data and reaches more than 99% accuracy in all three SNR ranges. It does not perform well on the test set, although the problem is less serious than for CNN. We tried adding a dropout layer to reduce overfitting, but this made training very slow and the final result was still unsatisfactory, as shown in Figure 12.
The experiments shown in Table 1 indicate that networks with a deep structure are not suitable for the classification of spectra: they fully extract noise and the individual features of each spectrum, and placing a denoising encoder before them makes the feedback path too long, leading to a poor denoising result. Based on these conclusions, we propose EMCCNN, which is not too deep and is therefore beneficial to training the denoising capability of the encoder. We find that EMCCNN is very robust to SNR, which makes it especially suited to the spectra of LAMOST.
For an objective comparison, we also used a support vector machine (SVM) to classify the spectra directly and compare its results with those of EMCCNN in Table 2. The comparison demonstrates that SVM can quickly fit the training set, but it tends to overfit severely in the experiment.
4.2. 2D Convolution Experiment and Results
We apply classification networks that are commonly used for 2D data to spectral classification in order to explore the feasibility of classifying 1D data after 2D folding. We fold each spectrum into a 50 × 100 matrix and feed it into a 2D classification network. The feature peaks of the spectra become inconspicuous after folding, and the data are only correlated along each row; the correlation within each column is not obvious. Nevertheless, Figure 13 shows clearly that each type of spectrum looks different after folding, which makes it possible to apply a 2D classification network. We aim to fully apply the classification network so that satisfactory performance can be achieved.
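A minimal sketch of the folding step is given below. It assumes that each spectrum is first resampled to exactly 50 × 100 = 5000 points and then cut row by row. The paper mentions quadratic interpolation; here that is interpreted, as an assumption, as local three-point Lagrange interpolation on an evenly spaced grid. The function names are hypothetical.

```python
def quadratic_resample(flux, n_out):
    """Resample flux onto n_out evenly spaced positions using a local
    three-point (quadratic) Lagrange stencil centred on the nearest sample."""
    n_in = len(flux)
    if n_in < 3 or n_out < 2:
        raise ValueError("need at least 3 input samples and 2 output samples")
    out = []
    for j in range(n_out):
        t = j * (n_in - 1) / (n_out - 1)            # fractional index in the input
        i = min(max(int(round(t)), 1), n_in - 2)    # stencil centre, kept off the edges
        d = t - i                                   # offset from the centre, in [-1, 1]
        y0, y1, y2 = flux[i - 1], flux[i], flux[i + 1]
        # Lagrange basis polynomials for nodes -1, 0, +1 evaluated at d
        out.append(y0 * d * (d - 1) / 2 + y1 * (1 - d * d) + y2 * d * (d + 1) / 2)
    return out

def fold_spectrum(flux, rows=50, cols=100):
    """Resample a 1D spectrum to rows*cols points and cut it into rows."""
    resampled = quadratic_resample(flux, rows * cols)
    return [resampled[r * cols:(r + 1) * cols] for r in range(rows)]
```

Consecutive pixels stay adjacent within a row, while vertically adjacent pixels are 100 samples apart in wavelength, which is why the correlation is strong along rows and weak along columns.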
As shown in Table 3, VGG16 performs quite well in the SNR range 10 < sn < 15, and ResNet also performs better than on 1D spectra in the range 5 < sn < 15. This discovery is inspiring, because it shows that although adjacent rows of a folded 1D spectrum are not directly related, the 2D classification network can still achieve satisfactory results. The 2D classification of 1D spectra is therefore feasible, and for some spectra it can even produce better results than 1D classification.
Because the results of ResNet_2d on the training set are very good, we try to use early stopping to overcome overfitting. We show the results of ResNet_2d at different epochs in Figure 14. Early stopping evidently does not improve the accuracy on the test set.
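For readers unfamiliar with the technique, early stopping can be sketched as follows. This is a generic patience-based rule, not necessarily the criterion used for ResNet_2d; the function name, the `patience` parameter, and the scores in the usage line are made up purely for illustration and are not results from the paper.

```python
def early_stopping_epoch(val_scores, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training is considered stopped once `patience` consecutive epochs
    pass without a new best validation score.
    """
    best_epoch, best_score, waited = 0, float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break   # stop training; later epochs are never evaluated
    return best_epoch

# Made-up validation accuracies, for illustration only.
kept_epoch = early_stopping_epoch([0.60, 0.65, 0.66, 0.64, 0.63, 0.62, 0.70], patience=3)
```

The rule only selects among the validation scores the model actually reaches, so it cannot help when no epoch generalizes well, which is consistent with the observation reported for Figure 14.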
After the above comprehensive analysis and comparison of the methods under different situations, EMCCNN shows its superiority in 1D spectra especially its robustness against noise and is selected as the final structure to search for CVs in LAMOST archives.
4.3. Subclass Classification Results
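Because the experimental spectra categories are not uniform, we explore the detailed results for each spectral category in this section. Here we use precision, recall and F1-score to judge the performance of EMCCNN for each subclass:
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$$
where TP are current subclass samples predicted correctly by the classifier, FP are samples of other subclasses that are wrongly predicted as the current subclass, and FN are current subclass samples wrongly predicted as other subclasses.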
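The three counts defined here are enough to compute the usual per-class scores. The sketch below assumes that the three metrics are precision, recall and F1-score in their standard form; the function name and the counts in the usage line are hypothetical and are not values from Table 4.

```python
def subclass_metrics(tp, fp, fn):
    """Precision, recall and F1-score for one subclass.

    tp: current subclass samples predicted correctly
    fp: samples of other subclasses wrongly predicted as the current subclass
    fn: current subclass samples wrongly predicted as other subclasses
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts, for illustration only.
precision, recall, f1 = subclass_metrics(tp=8, fp=2, fn=8)
```

Because none of the three scores uses true negatives, they remain informative for small subclasses that would barely affect the overall accuracy.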
Because the subclass data are not uniform and the number of features is not consistent, the performance of EMCCNN differs between subclasses. We show the results in Table 4.
6. Conclusions
There is a great need for accurate and automatic classification methods of specified objects in massive spectra. The goal that we propose for spectral feature extraction is to maximize the extraction of common features and to minimize the extraction of noise and spectral individual features, which enhances the generalization capabilities of the network. We found that the simple DNN network cannot extract the features of spectra well, but the CNN with strong feature extraction capability can lead to significant overfitting. It also emerges that the deep network is not suited for the denoising training of the encoder, because its feedback process can be very long. This makes the encoder difficult to train and achieve a good denoising result.
In most cases, CVs, especially CVs at quiescence, show emission lines of the Balmer series as well as HeI and HeII lines. For LAMOST spectra, only the Hα emission line (6563 Å) is included in B band, which places higher demands on the feature extraction ability of the method. In this work, EMCCNN simultaneously performs supervised denoising, feature extraction, and classification. EMCCNN is not a deep network, which is ideal for the encoder to achieve a good denoising result. In addition, convolution kernels of different scales can extract spectral peaks of different quality, which provides sufficient options for the classifier. By weighting features according to their quality, the network can select high-quality spectral type features. This design enables EMCCNN to achieve the best results across the three SNR ranges for 1D data.
The traditional view is that folding a 1D spectrum into a 2D image causes information loss, and that the network has to do extra learning to understand that the pixels are correlated only along the horizontal axis and not along the vertical axis. However, EMCCNN achieves precise acquisition of the characteristic features of folded 2D spectra and can achieve good classification results, especially where 1D classification tends to overfit to an extreme degree. This shows that folding the spectra into 2D can effectively reduce the tendency to overfit under certain circumstances.
Furthermore, this paper shows that the classification of 2D spectra is feasible, which means that powerful deep learning toolkits such as Caffe, Cognitive Toolkit, PyTorch and TensorFlow, as well as image processing libraries, can be used directly for 2D spectral classification.
The discoveries from the MRS of LAMOST provide more samples for astronomers to characterize the population of CVs. More new CVs will be discovered with the gradual release of LAMOST spectra.