1. Introduction
Many intelligent methods, e.g., nearest-neighbor classifiers, support vector machines, and neural networks, have been developed and implemented for various classification applications [1,2]. Among them, the neural network is one of the most popular and attractive methods at present. For instance, some deep neural networks have shown remarkable performance and obtained breakthroughs in many areas in recent years, including pattern recognition and image processing tasks [3]. The essential idea of these typical deep neural networks is to deepen the layers of the network and thereby extract higher-level features from the input data [4]. However, due to the large number of layers and the complicated structure with many hyper-parameters, the training process is long and tedious, which may hinder deployment in real-world classification tasks. In addition, deep networks usually require the support of powerful computing resources, which are expensive at present [5]. Moreover, the complicated architecture of these models makes them difficult to analyze theoretically. Thus, a flat neural network with a concise structure and fewer parameters is a flexible and competitive alternative [6]. It can achieve reasonable classification results without a complicated structure or a long training process.
Many researchers have recently developed and investigated the broad learning system (BLS), an attractive flat neural network [6]. It is designed and implemented according to the random vector functional-link neural network (RVFLNN), a fast model that provides the generalization ability of functional approximation [7]. The BLS not only inherits the advantages of the RVFLNN but also achieves remarkable performance. The principles of basic broad learning (BL) are quite concise. Specifically, the input training data are first randomly mapped to produce the corresponding feature nodes. The generated feature nodes are then mapped with random weights to generate enhancement nodes. Both the feature nodes and the enhancement nodes are used to compute the output weights via ridge regression approximation. The learned weights can then be applied to the test data to generate the corresponding predicted results [6].
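The BL pipeline just described can be sketched in a few lines. The following is a minimal illustration in our own notation; the hyper-parameter names, tanh activations, and one-hot label matrix are our assumptions for demonstration, not the exact formulation of [6]:

```python
import numpy as np

def bls_train(X, Y, n_feature=20, n_enhance=40, reg=1e-3, seed=0):
    """Minimal broad learning sketch: random feature nodes, random
    enhancement nodes, and ridge-regression output weights."""
    rng = np.random.default_rng(seed)
    Wf = rng.standard_normal((X.shape[1], n_feature))   # random feature mapping
    Z = np.tanh(X @ Wf)                                 # feature nodes
    We = rng.standard_normal((n_feature, n_enhance))    # random enhancement weights
    H = np.tanh(Z @ We)                                 # enhancement nodes
    A = np.hstack([Z, H])                               # all nodes, concatenated
    # Ridge regression approximation of the output weights.
    W = np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ Y)
    return Wf, We, W

def bls_predict(params, X):
    """Apply the learned weights to (test) data."""
    Wf, We, W = params
    Z = np.tanh(X @ Wf)
    A = np.hstack([Z, np.tanh(Z @ We)])
    return np.argmax(A @ W, axis=1)
```

Because the output weights have a closed-form ridge regression solution, training requires no iterative back-propagation, which is the source of the BLS's speed advantage.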
The BLS is effective and has shown brilliant results in diverse classification and pattern recognition studies, and many researchers have extended it with meaningful achievements [8,9,10,11,12,13]. For instance, Feng et al. integrated a fuzzy system into the basic BLS, replacing the original feature nodes with a group of Takagi–Sugeno fuzzy subsystems [14]. Their experimental results show that the proposed model achieves suitable performance compared with other models. The authors in [15] replaced the ridge regression approximation of standard BL with a regularized discriminative approach to generate more effective learned weights for image classification and demonstrated the model's outstanding classification capability. Other researchers ameliorated the BLS structure, obtaining recurrent-BLS and gated-BLS for text classification, with the desired results in training time and accuracy [16]. Yang et al. indicated that feature nodes and enhancement nodes may contain inefficient and redundant features, and applied a series of autoencoders to the BLS to acquire more effective features for various classification applications [17]. Other researchers combined a deep model, such as a convolutional neural network (CNN), with the BLS for classification and showed that the resulting model is flexible in many applications [18]. Chen et al. adopted a similar strategy to implement a CNN-based broad learning model that extracts valuable features from facial emotion images before classifying emotions [19]. Many scholars have investigated the BLS in diverse applications. Sheng et al. developed a BLS-based visual assessment system for soccer games, which was effective in assessing trainees' performance [20]. In Ref. [21], the authors implemented a discriminant manifold BLS method to classify hyperspectral images and effectively enhanced the recognition accuracy with limited training samples. In Ref. [22], the researchers proposed a competitive BLS method for COVID-19 detection based on CT scans or chest X-ray images. In addition, Zhou et al. investigated the BLS in the healthcare area and developed a semi-supervised BLS with transfer learning for EEG signal recognition [23]. Other researchers implemented the BLS with decomposition algorithms for AQI forecasting and obtained ideal results [24]. Zhao et al. processed input signals with principal component analysis (PCA) to generate valuable features and employed them with the BLS for fault diagnosis in a rotor system. Their results indicate that PCA can achieve dimension reduction of the input data as well as extraction of valid features, with which the BLS can acquire accurate fault diagnosis results efficiently [25].
Although the BLS has been studied, upgraded, and applied in many aspects, it still has some deficiencies to be mitigated. For instance, the raw input data fed to the broad learning structure may contain high correlation and redundancy that can interfere with the recognition results; this issue is also discussed in [25]. In addition, Chen et al. noted that the broad learning architecture may contain redundant information among the feature nodes and enhancement nodes that can be simplified [6]; a similar view is reported in [17]. These limitations stem from redundancy or high correlation in the input or generated data, and many feature engineering approaches can alleviate them in the basic BLS. Thus, it is appropriate to apply feature engineering methods for further processing to obtain more effective feature information and enhance the upgraded model's performance.
Many techniques and algorithms have been developed for extracting features from raw input data [26,27], obtaining essential features while achieving dimensionality reduction. One of the most widely used methods is PCA [28]. It produces a series of orthogonal bases that capture the directions of maximum variance in the input data, with uncorrelated coefficients along the new bases [29,30]. In this way, PCA finds a linear subspace of lower dimensionality than the original input feature space, in which the newly generated features retain the effective information for further analysis. Numerous studies have applied PCA to pattern recognition and classification with favorable results. For example, Hargrove et al. implemented PCA to process the detected signals in pattern recognition-based myoelectric control and achieved remarkable results [31]. Howley et al. investigated PCA for processing spectral data and indicated that applying PCA can enhance classification performance with high-dimensional data [32].
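The two properties just mentioned, orthogonal bases and uncorrelated coefficients, can be checked directly. The following sketch (our own synthetic data, purely for illustration) uses scikit-learn's PCA on two strongly correlated features:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Two strongly correlated input features: the second largely repeats the first.
x = rng.standard_normal(500)
X = np.column_stack([x, 0.9 * x + 0.1 * rng.standard_normal(500)])

pca = PCA(n_components=2).fit(X)
scores = pca.transform(X)  # coefficients of the data in the new bases

# The principal directions form an orthonormal basis.
print(np.allclose(pca.components_ @ pca.components_.T, np.eye(2)))
# Coefficients along different components are (numerically) uncorrelated.
print(abs(np.corrcoef(scores.T)[0, 1]) < 1e-6)
```

Keeping only the first component here would halve the dimensionality while retaining almost all of the variance, which is exactly the dimensionality-reduction behavior exploited in the studies above.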
Standard PCA performs linear dimensionality reduction. However, if the input data contain more complex structures that are difficult to represent in a linear subspace, basic PCA may perform poorly [33]. Fortunately, kernel principal component analysis (KPCA) was introduced and developed to perform nonlinear dimension reduction as well as feature extraction [34]. KPCA has been widely utilized and verified in various applications. In Ref. [35], the authors applied KPCA to extract gait features and then evaluated them with a support vector machine (SVM) to improve the recognition of gait patterns. Their results indicate that KPCA is effective for feature extraction and dimensionality reduction, and for enhancing the classification of young–elderly gait patterns. Fauvel et al. investigated KPCA for feature extraction from hyperspectral remote sensing data; their experimental results validate the usefulness of KPCA for evaluating hyperspectral data compared with the conventional PCA approach [36]. Shao et al. evaluated KPCA for extracting signal features from a gear system, which can be applied to identify various faults effectively [37].
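A small experiment makes the linear-versus-nonlinear contrast concrete. This sketch is our own illustration (the two-rings data and the kernel width `gamma=10` are assumptions chosen for this synthetic example, not settings from the cited studies); it compares linear PCA and RBF-kernel KPCA using scikit-learn:

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.linear_model import LogisticRegression

# Two concentric rings: a structure that no linear projection can separate.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

# Linear PCA cannot unfold the rings, so a linear classifier on its
# scores stays near chance level.
lin = PCA(n_components=2).fit_transform(X)
acc_lin = LogisticRegression().fit(lin, y).score(lin, y)

# RBF-kernel KPCA maps the rings into a space where they become
# linearly separable.
kp = KernelPCA(n_components=2, kernel="rbf", gamma=10).fit_transform(X)
acc_kpca = LogisticRegression().fit(kp, y).score(kp, y)

print(acc_lin, acc_kpca)  # the KPCA scores should be far easier to separate
```

The same mechanism, implicit nonlinear mapping followed by an eigendecomposition of the kernel matrix, is what the gait, hyperspectral, and gear-fault studies above rely on.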
Given the advantages of KPCA in feature extraction and dimensionality reduction [34] and the above-mentioned issues in broad learning structures, we propose a novel model, a broad learning model with a dual feature extraction strategy (BLM_DFE), for classification in this work. Unlike the basic broad learning architecture and the related studies on KPCA for recognition [35,36,37], the proposed model is built on the broad learning structure with dual KPCA operations. The first KPCA transforms the original input data into low-dimensional, newly extracted data, and the second KPCA compresses the generated feature/enhancement nodes into compressed nodes. Therefore, we can establish a new model that simplifies the structure of broad learning and simultaneously improves recognition performance with effective, low-dimensional features. These distinguishing characteristics make BLM_DFE different from the ordinary broad learning approach. Moreover, several experimental evaluations indicate that BLM_DFE obtains competitive classification accuracy compared with other popular methods.
Overall, the motivation of the proposed BLM_DFE is to upgrade the original BLS, ameliorate the broad learning structure, and further enhance performance in various classification tasks. The main objective of this study is thus to exploit the advantages of KPCA to address the above-mentioned issues of the ordinary broad learning model, obtain more effective features, and achieve the desired classification performance in many real-world applications.
The main contributions of this work can be presented as follows:
We implement a novel broad learning structure that embeds the KPCA technique to enhance classification performance.
The proposed model compresses feature/enhancement nodes by performing KPCA and uses fewer nodes to achieve better performance.
The proposed model with a dual feature extraction strategy can perform better than the original BLS on diverse benchmark databases.
This dual feature extraction strategy with KPCA can further improve the recognition results compared with using a single KPCA in the broad learning structure, as indicated by the ablation study.
Several kernel functions are investigated and evaluated in the proposed model on various benchmark databases to validate its effectiveness.
Many popular classifiers are compared, further demonstrating the rationality and effectiveness of the proposed model.
BLM_DFE is a general model that can achieve the desired results on multiple types of data.
The rest of the paper is organized as follows: Section 2 reviews some basic methods and techniques. In Section 3, the proposed model is presented. Extensive experiments and analysis are conducted in Section 4. Furthermore, we give a brief discussion in Section 5. Finally, Section 6 concludes this study.
3. Proposed Broad Learning Model with a Dual Feature Extraction Strategy
From the basic methods of KPCA mentioned in Section 2, we can conclude that the KPCA technique is effective in feature extraction and dimensionality reduction, which can relieve the issues in the standard broad learning framework described in Section 1. It is therefore reasonable and meaningful to insert KPCA into the broad learning structure to extract effective features and compress the number of nodes in this study.
The brief diagram of the proposed BLM_DFE is presented in Figure 2. The raw input data are initially processed with the first KPCA to obtain new input data. These new input data, containing more compact information, are then used to construct the feature and enhancement nodes. All produced feature/enhancement nodes are further processed with the second KPCA to generate new compressed nodes for calculating the learned weights, which are applied for classification to evaluate the proposed model. Furthermore, the whole architecture of this model is shown in Figure 3. In detail, the original input training data samples are processed with the first KPCA (green box) to generate new, low-dimensional features as the new input for the broad learning model (green arrow in Figure 3). The raw test data samples are also mapped with the first KPCA, as explained in Section 2.3. In this way, we can obtain the new feature nodes (orange circles) and new enhancement nodes (yellow circles) as follows:
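In the standard BLS formulation (reconstructed here in commonly used notation; the symbols are ours, with $\hat{X}$ the KPCA-transformed input, $\phi_i$ and $\xi_j$ the mapping and activation functions, and $W_{e_i}, \beta_{e_i}, W_{h_j}, \beta_{h_j}$ the randomly generated weights and biases), the $i$-th group of feature nodes and the $j$-th group of enhancement nodes take the form:

```latex
Z_i = \phi_i\!\left(\hat{X} W_{e_i} + \beta_{e_i}\right), \quad i = 1, \dots, n,
\qquad
H_j = \xi_j\!\left(Z^{n} W_{h_j} + \beta_{h_j}\right), \quad j = 1, \dots, m,
```

where $Z^{n} = [Z_1, \dots, Z_n]$ denotes the collection of all feature nodes.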
All new feature nodes and enhancement nodes can be used to construct a new matrix as follows:
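In commonly used BLS notation (the symbols are ours: $Z_i$ for the feature-node groups and $H_j$ for the enhancement-node groups), this matrix is the horizontal concatenation:

```latex
A = \left[\, Z_1, \dots, Z_n \mid H_1, \dots, H_m \,\right].
```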
Here, we continue to apply KPCA (light-blue box) to process and compress the concatenated node matrix, since the large number of feature/enhancement nodes contains redundant information. The generated compressed nodes (light-blue circles) are then used to calculate the recognition results by multiplying them with the learned weight matrix.
The new trainable learned weight matrix $W$ can be computed in the same way as mentioned in Section 2.1:

$$W = \left(A_c^{T} A_c + \lambda I\right)^{-1} A_c^{T} Y,$$

where $A_c$ denotes the compressed nodes, as expressed in Figure 3; $Y$ indicates the label matrix; $I$ signifies the identity matrix; and $\lambda$ represents the regularization parameter. For the test procedure, the test samples mapped with the first KPCA are processed to generate the corresponding feature/enhancement nodes, which are further mapped and compressed with the second KPCA and applied to determine the predicted output matrix $\hat{Y}$ with the learned weight matrix $W$. Thus, the final predicted class label for each test sample can be obtained by taking the class with the maximum value in the corresponding row of $\hat{Y}$.
To sum up, the specific steps of the proposed BLM_DFE can be described as follows:
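The steps above can be sketched end to end in code. The following is a hedged illustration using scikit-learn's KernelPCA; all function and parameter names are our own, and the random tanh node generation is a simplified stand-in for the full BLS mapping functions:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

def blm_dfe_fit(X, Y, n_feature_nodes=10, n_enhance_nodes=10,
                n_components1=20, n_components2=15, kernel="rbf",
                reg=1e-3, rng=None):
    """Sketch of BLM_DFE training: KPCA -> random nodes -> KPCA -> ridge."""
    rng = np.random.default_rng(rng)
    # First KPCA: compress the raw input into low-dimensional features.
    kpca1 = KernelPCA(n_components=n_components1, kernel=kernel)
    Xn = kpca1.fit_transform(X)
    # Random feature and enhancement nodes, as in the standard BLS.
    Wf = rng.standard_normal((Xn.shape[1], n_feature_nodes))
    Z = np.tanh(Xn @ Wf)
    We = rng.standard_normal((Z.shape[1], n_enhance_nodes))
    H = np.tanh(Z @ We)
    A = np.hstack([Z, H])
    # Second KPCA: compress the concatenated nodes.
    kpca2 = KernelPCA(n_components=n_components2, kernel=kernel)
    Ac = kpca2.fit_transform(A)
    # Ridge-regression solution for the learned output weights.
    W = np.linalg.solve(Ac.T @ Ac + reg * np.eye(Ac.shape[1]), Ac.T @ Y)
    return kpca1, kpca2, Wf, We, W

def blm_dfe_predict(model, X_test):
    """Map test data through both KPCA stages and apply the weights."""
    kpca1, kpca2, Wf, We, W = model
    Z = np.tanh(kpca1.transform(X_test) @ Wf)
    H = np.tanh(Z @ We)
    Yp = kpca2.transform(np.hstack([Z, H])) @ W
    return np.argmax(Yp, axis=1)
```

Note that both KPCA stages are fitted on the training data only and merely applied (via `transform`) to the test data, matching the test procedure described above.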
5. Discussion
We have designed and developed a novel method that imposes KPCA on the broad learning framework to improve recognition performance. The novel outcomes of BLM_DFE can be summarized as follows: our model processes both the original input data and the generated feature/enhancement nodes to obtain more effective features while simplifying the broad learning architecture. Furthermore, across various KPCA kernels, this dual feature extraction approach shows its effectiveness in the experimental results in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. For instance, with 8 training samples on the GT database, the proposed model obtains 83.74% accuracy with the RBF kernel, 81.71% with the sigmoid kernel, and 80.91% with the linear kernel. Although the results differ across kernels, they all demonstrate the superiority of the proposed model; for example, K-NN, as one comparison method, only achieved 73.17% accuracy under the same experimental settings. The proposed model improves classification performance over the original BLS and other methods for the following reasons: applying KPCA yields a more effective feature representation than the original high-dimensional data, and compressing the large number of raw feature/enhancement nodes by transforming them into another feature space reduces redundant nodes while preserving useful information. Given these points, researchers can consider building their own models on the proposed dual feature extraction strategy for high-dimensional input data, or for features already extracted by learning models, in real-world applications.
From the above-mentioned experimental results (Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7), it can be observed that BLM_DFE(L) usually performs worse than BLM_DFE(R) and BLM_DFE(S). The reason is that BLM_DFE(L) uses a linear kernel, whereas BLM_DFE(R) and BLM_DFE(S) apply the RBF and sigmoid kernels, respectively, which can perform nonlinear dimension reduction and often achieve more competitive performance. These experimental results thus also confirm the effectiveness of the KPCA applied in the proposed model.
We evaluated the proposed BLM_DFE on six different databases. Our model achieved significant improvements over the original BLS on the face, plant, and object data types (refer to Table 2, Table 3, Table 4 and Table 5). Since the model performed well on these data types, we could consider handling practical applications with similar data, such as flower recognition, in future investigations.
In the ablation study, we compared the proposed model with several reduced models implemented by ourselves, such as KBLM and BLMK. From the experimental results in Table 8 and Table 9, we find that the dual feature extraction strategy shows excellent performance. The linear kernel used in the proposed model was often inferior to the other kernels, such as the RBF kernel. This evidence illustrates that nonlinear KPCA can obtain effective features and perform dimensionality reduction on input data with more complicated structures. In addition, although the results of BLMK are inferior to those of KBLM and BLM_DFE, this model is still competitive and achieved good results on the described databases. For instance, with six training samples on the Flavia database, BLMK with an RBF kernel attained 67.39% accuracy, compared with 58.42% for LRRR. Considering this, BLMK can simplify the structure of the BLS with fewer nodes for classification, and BLM_DFE inherits this advantage in recognition applications. The results of this ablation study show that a large number of nodes may carry redundant information; hence, compressing these nodes is a meaningful step in classification tasks. Moreover, to further verify the effectiveness of the generated compressed nodes, we plan to apply them with other popular classifiers, such as SVM, in future studies. Thus, we can assess whether the compressed nodes can further improve classification performance when employed together with other common models, which can explore and enhance the novelty of the proposed model.
In this study, we insert the KPCA technique into the broad learning structure to address some of the issues mentioned above (e.g., redundant information among feature/enhancement nodes) in the standard BLS and to further enhance its classification performance. The KPCA technique is embedded in the broad learning architecture, rather than simply being combined with it. Besides this, some recent studies [17,25] adopted similar strategies to modify the basic BLS and improved their proposed models' performance, as described in the Introduction. Therefore, although the proposed model is relatively simple in terms of technical soundness, considering the obvious improvement in experimental results on various benchmark databases and the related studies by other researchers, we believe this work is still meaningful in terms of novelty.
The advantages of the proposed model are clear, as discussed in the previous sections. However, there is still room for improvement. KPCA is a popular and widely used feature extraction method that has been validated in various studies and applications, which is why we selected it to extract low-dimensional useful features as the new input to the broad learning framework. Nevertheless, valid high-level information in the raw input data may be ignored. Hence, integrating other feature extraction methods, such as stacked autoencoders [65], to generate multiple features as the new input may further enhance classification performance; given this point, we aim to explore a multi-feature extraction broad learning model in our future studies. There are also some disadvantages to BLM_DFE. For instance, when using KPCA to process data, we need to handle an $N \times N$ kernel matrix (with $N$ indicating the number of input samples), and the computational complexity is $O(N^3)$, as expressed in Section 4.4. Therefore, if the number of input samples is very large, the computational cost of KPCA is also high; considering this point, the proposed model is more suitable for classification tasks with small- or medium-sized databases. The original BLS has a good ability to process and incrementally incorporate new input data in classification applications. Another shortcoming of BLM_DFE is that the embedding of KPCA changes the raw input data and the original feature/enhancement nodes, affecting its ability to dynamically deal with new input data. However, this is still acceptable considering the superior performance of the proposed model on various databases compared with other popular classifiers, including the standard BLS. In addition, we aim to further explore ameliorating this model to better handle dynamically arriving data in the future.
Here, we introduced a dual feature extraction strategy that embeds KPCA in the broad learning structure to handle high-dimensional input data and feature/enhancement nodes, which adds several transformation operations. One concern is that these operations can make fine-tuning and understanding the model more complex. However, KPCA compresses the feature/enhancement nodes into concise and valuable features, which are then used to compute the learned weight matrix using ridge regression, exactly as in the standard BLS. Therefore, we only need to embed the two KPCA operations when building this model, without significantly increasing the complexity of its structure. In addition, we validated the implemented model on various real-world databases, and the corresponding experimental results verify the rationality and usability of our model in practical classification applications.