Article

Semi-Supervised Classification via Hypergraph Convolutional Extreme Learning Machine

1
School of Computer Science, China University of Geosciences, Wuhan 430078, China
2
Guangxi Key Laboratory of Marine Disaster in the Beibu Gulf, Beibu Gulf University, Qinzhou 535011, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(9), 3867; https://doi.org/10.3390/app11093867
Submission received: 24 March 2021 / Revised: 18 April 2021 / Accepted: 22 April 2021 / Published: 25 April 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Extreme Learning Machine (ELM) is characterized by simplicity, strong generalization ability, and computational efficiency. However, previous ELMs fail to consider the inherent high-order relationships among data points, which leaves them ineffective on structured data and lacking robustness to noisy data. This paper presents a novel semi-supervised ELM, termed Hypergraph Convolutional ELM (HGCELM), which uses hypergraph convolution to extend ELM into the non-Euclidean domain. The method inherits all the advantages of ELM and consists of a random hypergraph convolutional layer followed by a hypergraph convolutional regression layer, enabling it to model complex intraclass variations. We show that the traditional ELM is a special case of the HGCELM model in the regular Euclidean domain. Extensive experimental results show that HGCELM remarkably outperforms eight competitive methods on 26 classification benchmarks.

1. Introduction

Extreme Learning Machine (ELM) [1,2] was developed as a simple but effective learning model for classification and regression problems. As a special form of the random vector functional-link network (RVFL) [3], ELM suggests that the hidden layer parameters of a neural network play an important role but do not need to be updated during training [4,5]. Inspired by this, a large number of ELM variants have been proposed and widely applied to biomedical data analysis [6], computer vision [7], system modeling and prediction [8,9], and so on.
The key to the classic ELM is random mapping [10,11]. Despite being helpful, the random mapping often suffers from poor robustness due to its randomness. To remedy this drawback, a number of works have been devoted to seeking the optimal hidden parameters. Wu et al. [12] presented a multi-objective evolutionary ELM to jointly optimize the structural risk and empirical risk of ELM. Similarly, many popular heuristic search methods, including differential evolution [13], have been adopted for this purpose. However, heuristic search is often time-consuming. Alternatively, kernel ELM (KELM) [4,14] implicitly implements the ELM hidden mapping in the reproducing kernel Hilbert space, which usually results in more stable performance. Considering that the random mapping also provides good diversity, it allows us to design an ensemble of ELMs [5,15], which has been proven useful for improving the robustness of a single ELM. Another trend in enhancing ELM is to make it deeper [16,17]. Compared with popular deep learning models, e.g., Convolutional Neural Networks (CNNs) [18], deep ELMs lack the potential to capture deep semantics from large-scale complex data.
The above-mentioned ELMs typically belong to supervised ELM, which requires adequate training samples for model training. In most general situations, where labeled training samples are significantly limited, the auxiliary information from unlabeled data is surprisingly helpful for improving generalization ability [19,20]. To this end, Semi-supervised ELM (SS-ELM) [21] has drawn increasing attention. SS-ELM adopts a frequently used scheme, i.e., graph Laplacian regularization, to incorporate the manifold structure. To combine multiple graphs in SS-ELM, Yi et al. [22] introduced an adaptive multiple graph regularized SS-ELM approach. These efforts have demonstrated that structured information is beneficial for ELM. Nonetheless, a simple graph can only reveal the pairwise relationships between data points and thus ignores the high-order relationships. Moreover, these methods basically work in the Euclidean domain, leading to indirect and insufficient use of structured information.
Recently, Graph Neural Networks (GNNs) [23] have emerged as a powerful technique for generalizing neural networks to graph-structured data. Technically, GNNs capture the dependencies in graphs via message passing between the nodes of graphs [24]. Graph Convolutional Network (GCN) [25] is one of the most popular GNN models for node-level tasks [26] and graph-level tasks [27]. It has been proven that GCN is equivalent to a fixed low-pass filter [28,29] followed by a linear classifier; thus, it often suffers from the over-smoothing problem as the model depth increases. More recently, some works have attempted to extend GCN to hypergraph scenarios [30,31] so that the high-order relationships of data can be embedded. Despite considerable success, current GCNs rely heavily on arduous model updates via gradient descent.
Based on the observation that GCN is equivalent to a multilayer perceptron multiplied by an adjacency matrix, Zhang et al. [32] proposed a Graph Convolutional ELM (GCELM) for semi-supervised learning. As a pioneering work, GCELM follows the same pipeline as ELM but extends it into the non-Euclidean domain, enabling ELM to deal with graph-structured data directly. However, GCELM mainly focuses on the graph embedding of pairwise connections, and thus fails to capture the intraclass variations between data points. This motivates us to develop an enhanced version of GCELM, which we refer to as the Hypergraph Convolutional Extreme Learning Machine (HGCELM). Specifically, HGCELM contains a random hypergraph convolutional layer that produces hypergraph embeddings, and a hypergraph regression layer with a closed-form solution.
To sum up, the main contributions of this paper are two-fold:
  • We propose a simple but effective hypergraph convolutional ELM, i.e., HGCELM, for semi-supervised classification. The HGCELM method not only inherits all the advantages of ELM but also enables ELM to model the high-order relationships of data. This successful attempt signifies that structured information among data, especially high-order relationships, is important for ELM, which offers an alternative direction for ELM representation learning.
  • We show that the traditional ELMs are special cases of HGCELM on Euclidean data. We conduct extensive experiments on 26 popular datasets for the semi-supervised classification task. Comparisons with state-of-the-art methods demonstrate that the proposed HGCELM achieves superior performance.
The rest of the paper is organized as follows. In Section 2, we briefly review hypergraph learning, ELMs, and graph neural networks. In Section 3, we systematically introduce the framework, formulation, and implementation of the proposed method. Experimental evaluations and comparisons are presented in Section 4, followed by the conclusions and future work given in Section 5.

2. Preliminary and Related Work

2.1. Notations

Throughout this paper, symbols for vectors are boldface lowercase italics (e.g., $\mathbf{x}$), symbols for matrices are boldface uppercase roman letters (e.g., $\mathbf{X}$), and symbols for scalars are italics (e.g., $x_{ij}$). Let $\mathcal{X} = \{\mathbf{x}_i \in \mathbb{R}^m,\ \mathbf{y}_i \in \mathbb{R}^C\}_{i=1}^{N}$ be the sample set consisting of $N$ $m$-dimensional data points and $C$ distinct classes, in which $\mathbf{x}_i$ and $\mathbf{y}_i$ denote the $i$-th data point and its one-hot target vector, respectively. For clarity, we denote the labeled data and unlabeled data with subscripts $L$ and $U$, e.g., $\mathbf{X}_L$ and $\mathbf{X}_U$ are the feature matrices for training and test, respectively. $\mathbf{I}_N$ signifies an identity matrix of size $N \times N$. The matrix Frobenius norm is defined as $\|\mathbf{X}\|_F = \left(\sum_{ij} x_{ij}^2\right)^{1/2}$. The main notations involved in this paper and their definitions are given in Table 1.

2.2. Hypergraph Preliminary

A hypergraph is a generalization of the simple graph, in which an edge can join any number of vertices. We refer to the edges in a hypergraph as hyperedges. By contrast, we denote the regular graph, in which each edge connects only two vertices, as a simple graph. A visual comparison between the simple graph and the hypergraph is illustrated in Figure 1. It can be seen that a hypergraph can reveal a more complex data relationship, which makes it superior to the simple graph. For a learning task, a hypergraph is usually used to represent the high-order intraclass variations among data points.
Formally, let $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ be a hypergraph composed of a vertex set $\mathcal{V}$ of size $N$, a hyperedge set $\mathcal{E}$ of size $|\mathcal{E}|$, and a set of hyperedge weights $\mathbf{W}$, where the weight of hyperedge $e$ is denoted as $w(e)$. A hypergraph is often described by an incidence matrix $\mathbf{H} \in \mathbb{R}^{N \times |\mathcal{E}|}$ whose elements indicate whether a vertex joins a corresponding hyperedge (e.g., Figure 1b). Mathematically, the incidence matrix is defined by
$$h(v, e) = \begin{cases} 1, & v \in e \\ 0, & v \notin e. \end{cases}$$
The degree of a vertex $v$ and the degree of a hyperedge $e$ are given, respectively, as
$$d(v) = \sum_{e \in \mathcal{E}} w(e)\, h(v, e),$$
$$\delta(e) = \sum_{v \in \mathcal{V}} h(v, e).$$
By analogy with the simple graph, spectral analysis can be used as an efficient tool for analyzing hypergraphs. The normalized hypergraph Laplacian matrix [33,34] is calculated by
$$\mathbf{L} = \mathbf{I} - \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{W} \mathbf{D}_e^{-1} \mathbf{H}^T \mathbf{D}_v^{-1/2}.$$
For a semi-supervised learning task, the hypergraph is usually incorporated together with an empirical error term [35], as follows:
$$\arg\min_{\mathbf{F}} \; \mathcal{R}_{emp}(\mathbf{F}) + \lambda\, \mathrm{tr}\!\left(\mathbf{F}^T \mathbf{L} \mathbf{F}\right),$$
where $\mathcal{R}_{emp}(\mathbf{F})$ denotes the empirical error term over a problem-dependent prediction $\mathbf{F}$.
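For concreteness, the following sketch computes the normalized hypergraph Laplacian of Equation (4) from an incidence matrix and hyperedge weights. It is a minimal NumPy illustration, not code from the paper; the function name and the small constant guarding against zero degrees are our own choices.

```python
import numpy as np

def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian: L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    H : (N, |E|) incidence matrix; w : optional hyperedge weights (defaults to all ones).
    """
    N, E = H.shape
    w = np.ones(E) if w is None else np.asarray(w, dtype=float)
    W = np.diag(w)
    d_v = H @ w                     # vertex degrees d(v) = sum_e w(e) h(v, e)
    d_e = H.sum(axis=0)             # hyperedge degrees delta(e) = sum_v h(v, e)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d_v, 1e-12)))
    De_inv = np.diag(1.0 / np.maximum(d_e, 1e-12))
    return np.eye(N) - Dv_inv_sqrt @ H @ W @ De_inv @ H.T @ Dv_inv_sqrt
```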

2.3. ELMs

The basic ELM can be interpreted as two components, i.e., a random hidden mapping and a ridge regression classifier. Formally, ELM's hidden layer can be expressed as
$$\mathbf{Z} = \sigma(\mathbf{X}\boldsymbol{\Theta} + \mathbf{b}),$$
where $\mathbf{Z}$ is the hidden layer output matrix parameterized by the hidden weight matrix $\boldsymbol{\Theta}$ and bias vector $\mathbf{b}$, and $\sigma$ denotes a nonlinear activation function such as the Sigmoid. In the second stage, ELM computes the prediction by
$$\mathbf{Y} = \mathbf{Z}\boldsymbol{\beta}.$$
Here, $\boldsymbol{\beta}$ is the output weight matrix. Since $\mathbf{Z}$ is known to the output layer, Equation (7) is essentially a least-squares optimization problem and can be solved as
$$\boldsymbol{\beta} = \mathbf{Z}^{\dagger}\mathbf{Y}_T.$$
ELM avoids iterative parameter tuning and is thus significantly faster than gradient descent-based neural networks. SS-ELM is the semi-supervised version of ELM obtained by introducing a graph Laplacian regularization term [21]. Its formulation is given by
$$\arg\min_{\boldsymbol{\beta}} \; \left\|\mathbf{Z}_T\boldsymbol{\beta} - \mathbf{Y}_T\right\|_F^2 + \lambda\, \mathrm{tr}\!\left(\boldsymbol{\beta}^T \mathbf{Z}^T \mathbf{L} \mathbf{Z} \boldsymbol{\beta}\right).$$
SS-ELM also has a closed-form solution. It should be noted that graph structure information is considered in SS-ELM, but the model still essentially works in the regular Euclidean domain.
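As a minimal illustration of the two-stage ELM pipeline described above (random hidden mapping followed by a regularized least-squares output layer), the sketch below assumes a sigmoid activation and a small ridge term; the function and parameter names are ours, not from the original ELM papers.

```python
import numpy as np

def elm_fit_predict(X_train, Y_train, X_test, L=50, reg=1e-5, rng=None):
    """Basic ELM: fixed random hidden mapping + closed-form regularized output layer."""
    rng = np.random.default_rng(0) if rng is None else rng
    m = X_train.shape[1]
    Theta = rng.standard_normal((m, L))      # random hidden weights, never updated
    b = rng.standard_normal(L)               # random hidden biases
    sigmoid = lambda A: 1.0 / (1.0 + np.exp(-A))
    Z_train = sigmoid(X_train @ Theta + b)
    Z_test = sigmoid(X_test @ Theta + b)
    # ridge-regularized least-squares solution for the output weights beta
    beta = np.linalg.solve(Z_train.T @ Z_train + reg * np.eye(L), Z_train.T @ Y_train)
    return Z_test @ beta                     # continuous scores; argmax gives class labels
```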

2.4. GCNs

There is increasing interest in generalizing convolutions to the graph domain [23]. The recent development of GNNs allows us to efficiently approximate convolution on graph-structured data. GNNs can typically be divided into two categories [23,28]: spectral convolutions [36], which perform convolution by transforming node representations into the spectral domain using the graph Fourier transform or its extensions, and spatial convolutions [37], which perform convolution by aggregating signals sampled from a node's neighborhood.
In [25], Kipf et al. developed the Graph Convolutional Network (GCN) by simplifying the spectral convolution with first-order Chebyshev polynomials and setting the largest eigenvalue of the normalized graph Laplacian to 2. Formally, GCN defines spectral convolution over a graph as follows:
$$\mathbf{Z} = \sigma\!\left(\tilde{\mathbf{D}}^{-1/2} \tilde{\mathbf{A}} \tilde{\mathbf{D}}^{-1/2} \mathbf{X} \boldsymbol{\Theta}\right).$$
Here, $\tilde{\mathbf{A}} = \mathbf{I}_N + \mathbf{A}$ is the so-called augmented normalized adjacency matrix, and $\tilde{\mathbf{D}}$ is given by $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$. However, current GNNs rely heavily on gradient descent optimizers, which are often time-consuming and prone to locally optimal solutions. To overcome these shortcomings, Zhang et al. [32] proposed a randomization-based GCN (i.e., GCELM) by combining the advantages of ELM with GCN. Instead of updating all trainable parameters, GCELM employs a random graph convolutional layer and keeps it fixed. This allows GCELM to compute a closed-form solution for the training phase, resulting in a faster learning speed.
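A single GCN propagation step of Equation (10) can be sketched as follows; this is an illustrative NumPy snippet in which the ReLU activation is a common but assumed choice.

```python
import numpy as np

def gcn_layer(A, X, Theta):
    """One GCN propagation step: Z = sigma(D~^{-1/2} A~ D~^{-1/2} X Theta)."""
    N = A.shape[0]
    A_tilde = np.eye(N) + A                   # augmented adjacency I_N + A
    d_tilde = A_tilde.sum(axis=1)             # D~_ii = sum_j A~_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))
    S = D_inv_sqrt @ A_tilde @ D_inv_sqrt     # normalized propagation matrix
    return np.maximum(S @ X @ Theta, 0.0)     # ReLU activation (an assumption here)
```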

3. HGCELM

In Figure 2, we provide an illustration of the proposed HGCELM framework. The framework first constructs a hypergraph from the given dataset, and then feeds it into the HGCELM model consisting of a random hypergraph convolutional layer and a hypergraph convolutional regression layer. The details for HGCELM are introduced as follows.

3.1. Hypergraph Construction

We represent the high-order relationships of a dataset by constructing a hypergraph $\mathcal{G}$. For this purpose, each data point $\mathbf{x}_i$ is treated as a vertex $v$; furthermore, we consider $\mathbf{x}_i$ as the center vertex and associate it together with its $k$ nearest neighbors with the hyperedge $e$. As a result, each hyperedge connects $k + 1$ vertices. The incidence matrix of the hypergraph is defined by
$$h(v_i, e_j) = \begin{cases} 1, & \mathbf{x}_j \in \mathcal{N}_k(\mathbf{x}_i) \\ 0, & \text{otherwise}. \end{cases}$$
The degrees of the vertex set $\mathcal{V}$ and the hyperedge set $\mathcal{E}$ can be expressed in diagonal matrix form, i.e., $\mathbf{D}_v$ and $\mathbf{D}_e$. There are various ways to assign weights to hyperedges, such as the sum of the similarities of the vertices within a hyperedge [34]. In this paper, we assign equal weights to all hyperedges, so the weight matrix can be represented as an identity matrix $\mathbf{I} \in \mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}|}$. Samples belonging to the same class often have a higher probability of being assigned to the same neighborhood $\mathcal{N}_k$. Therefore, it is reasonable to use a hypergraph to describe the intraclass variations of data.
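A minimal sketch of this kNN-based hypergraph construction might look as follows; the brute-force distance computation, dense incidence matrix, and function name are our own simplifications for illustration.

```python
import numpy as np

def build_hypergraph(X, k=5):
    """Build a kNN hypergraph incidence matrix H with one hyperedge per data point.

    Hyperedge j groups x_j together with its k nearest neighbours, so each hyperedge
    connects k + 1 vertices, as described in Section 3.1 (unit hyperedge weights).
    """
    N = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    H = np.zeros((N, N))                                           # N vertices x N hyperedges
    for j in range(N):
        members = np.argsort(dist[j])[:k + 1]                      # x_j itself plus k nearest neighbours
        H[members, j] = 1.0
    d_v = H.sum(axis=1)                                            # vertex degrees
    d_e = H.sum(axis=0)                                            # hyperedge degrees (= k + 1)
    return H, d_v, d_e
```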

3.2. Random Hypergraph Convolution

Inspired by the previous GCELM [32], we propose a novel ELM mapping, called random hypergraph convolution (RGC), to incorporate the high-order relationships among data into the feature mapping. We define the random hypergraph convolution following that proposed in [30,31], except that ours does not need to be updated iteratively. It is expressed as follows:
$$\mathbf{Z} = \sigma(\mathbf{S}\mathbf{X}\boldsymbol{\Theta}),$$
where $\mathbf{S} = \mathbf{D}_v^{-1/2} \mathbf{H} \mathbf{D}_e^{-1} \mathbf{H}^T \mathbf{D}_v^{-1/2}$ is referred to as the augmented normalized incidence matrix obtained by imposing a symmetric normalization. It should be noted that $\mathbf{S}$ incorporates the structured information of the data and can be precomputed during implementation. Thus, the random hypergraph convolution enables ELM to embed high-order information. Following ELM theory, the filter parameters of the random hypergraph convolution, $\boldsymbol{\Theta}$, are randomly generated under a specific probability distribution, e.g., the Gaussian distribution $\Theta_{ij} \sim \mathcal{N}(0, 1)$.
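Under the equal-weight setting of Section 3.1 (W = I), the random hypergraph convolution can be sketched as below. The sigmoid activation and fixed random seed are illustrative assumptions; only S and Θ are prescribed by the formulation above.

```python
import numpy as np

def random_hypergraph_embedding(X, H, L=50, rng=None):
    """Random hypergraph convolution Z = sigma(S X Theta) with a fixed random Theta."""
    rng = np.random.default_rng(0) if rng is None else rng
    d_v = H.sum(axis=1)
    d_e = H.sum(axis=0)
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(d_v))
    De_inv = np.diag(1.0 / d_e)
    S = Dv_inv_sqrt @ H @ De_inv @ H.T @ Dv_inv_sqrt  # symmetric normalization, unit hyperedge weights
    Theta = rng.standard_normal((X.shape[1], L))      # Theta_ij ~ N(0, 1), never updated
    Z = 1.0 / (1.0 + np.exp(-(S @ X @ Theta)))        # sigmoid activation (assumed)
    return S, Z
```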

3.3. Hypergraph Convolutional Regression

Based on the hidden hypergraph embedding Z , we use a hypergraph convolutional regression layer to predict labels. Formally, the layer can be written as:
$$\mathbf{Y} = \mathbf{S}\mathbf{Z}\boldsymbol{\beta}.$$
To solve for $\boldsymbol{\beta}$, the above equation can be rewritten as the following ridge regression problem:
$$\arg\min_{\boldsymbol{\beta}} \; \|\boldsymbol{\beta}\|_F, \quad \text{s.t.} \quad \tilde{\mathbf{Y}} = \mathbf{S}\mathbf{Z}\boldsymbol{\beta}.$$
Here, $\tilde{\mathbf{Y}} = [\mathbf{Y}_T; \mathbf{Y}_U]$ is an augmented training target matrix. Since $\mathbf{Y}_U$ is unavailable during the training stage, it is set to a zero matrix. Let $\mathbf{M}$ be a diagonal mask matrix with its first $N_T$ diagonal elements $M_{ii} = 1$ and the rest equal to 0. We further rewrite Equation (14) as
$$\arg\min_{\boldsymbol{\beta}} \; \frac{1}{2}\left\|\mathbf{M}\left(\tilde{\mathbf{Y}} - \mathbf{S}\mathbf{Z}\boldsymbol{\beta}\right)\right\|_F^2 + \frac{\lambda}{2}\|\boldsymbol{\beta}\|_F^2.$$
It is easy to prove that Equation (15) has an optimal solution, whose closed form is expressed as
$$\boldsymbol{\beta}^* = \left(\mathbf{Z}^T \mathbf{S}^T \mathbf{M} \mathbf{S} \mathbf{Z} + \lambda \mathbf{I}\right)^{-1} \mathbf{Z}^T \mathbf{S}^T \mathbf{M} \tilde{\mathbf{Y}}.$$
The labels of the unlabeled data points can then be determined based on the obtained $\boldsymbol{\beta}^*$, which is given by
$$\bar{\mathbf{Y}}_U = \mathbf{S}\mathbf{Z}_U \boldsymbol{\beta}^*.$$
We show the overall learning steps of the proposed HGCELM in Algorithm 1. Since there is no iteration in HGCELM, training such a model is computationally efficient.
Algorithm 1: HGCELM
Input: Dataset $\mathbf{X}$, training labels $\mathbf{Y}_T$, hyper-parameters $\lambda$ and $L$.
1 Construct the hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ by Equation (11);
2 Generate $\boldsymbol{\Theta}$ using the standard normal distribution;
3 Calculate the random hypergraph embedding: $\mathbf{Z} = \sigma(\mathbf{S}\mathbf{X}\boldsymbol{\Theta})$;
4 Solve the hypergraph convolutional regression: $\boldsymbol{\beta}^* = \left(\mathbf{Z}^T \mathbf{S}^T \mathbf{M} \mathbf{S} \mathbf{Z} + \lambda \mathbf{I}\right)^{-1} \mathbf{Z}^T \mathbf{S}^T \mathbf{M} \tilde{\mathbf{Y}}$;
5 Predict the test labels by Equation (17);
Output: $\bar{\mathbf{Y}}_U$.
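A hedged NumPy sketch of steps 4 and 5 of Algorithm 1, i.e., the closed-form solution of Equation (16) and the prediction of Equation (17), is given below; the mask construction, helper name, and the convention that train_idx holds the labeled-sample indices are our assumptions.

```python
import numpy as np

def hypergraph_regression(S, Z, Y_train, train_idx, lam=1e-3):
    """Closed-form output layer of HGCELM and prediction for all vertices."""
    N, L = Z.shape
    C = Y_train.shape[1]
    M = np.zeros((N, N))
    M[train_idx, train_idx] = 1.0            # diagonal mask selecting the labeled rows
    Y_tilde = np.zeros((N, C))
    Y_tilde[train_idx] = Y_train             # targets of unlabeled samples stay zero
    SZ = S @ Z
    beta = np.linalg.solve(SZ.T @ M @ SZ + lam * np.eye(L), SZ.T @ M @ Y_tilde)
    scores = SZ @ beta                       # soft predictions for labeled and unlabeled vertices
    return beta, scores.argmax(axis=1)       # output weights and hard class predictions
```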

3.4. Computation Complexity and Connection to Existing Methods

We give a theoretical analysis of the computational complexity of our method. Ignoring the hypergraph generation procedure and assuming $L \ll N$, the time complexity of HGCELM is approximately $\mathcal{O}(N^3)$. Compared with ELM, HGCELM has a higher time complexity, which is determined by the hypergraph convolution. Fortunately, the augmented normalized incidence matrix can be precomputed and is sparse; thus, the hypergraph embedding can be implemented efficiently. HGCELM is also more efficient than GCN, since it does not require iterative optimization.
Our HGCELM is closely related to the classic ELMs. First, HGCELM maintains the advantages of ELMs, i.e., the fast learning speed achieved by a closed-form solution. Second, our HGCELM degenerates to the classic ELM when $\mathbf{S}$ is discarded. Consequently, the classic ELM is a special case of the HGCELM model in the regular Euclidean domain. Furthermore, our HGCELM remedies the drawback that traditional ELMs cannot deal with structured data.

4. Results and Discussion

In this section, we conduct experiments to verify the effectiveness of the proposed HGCELM.

4.1. Experimental Configurations

4.1.1. Datasets

We evaluate our method on 26 widely used classification datasets taken from the University of California at Irvine (UCI) repository. For each dataset, we scale its feature values into $[0, 1]$ using min-max standardization and randomly choose 5 samples per class as the training set, with the rest as the test set. The details of these datasets are reported in Table 2.
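For reproducibility, the per-dataset preprocessing described above can be sketched as follows; the epsilon guard against constant features and the function name are our assumptions.

```python
import numpy as np

def prepare_split(X, y, n_per_class=5, rng=None):
    """Min-max scale features to [0, 1] and draw n labeled samples per class."""
    rng = np.random.default_rng(0) if rng is None else rng
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    train_idx = []
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        train_idx.extend(rng.choice(idx, size=n_per_class, replace=False))
    train_idx = np.array(train_idx)
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return X, train_idx, test_idx
```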

4.1.2. Baseline Methods

We compare our method with eight baselines, including the basic ELM [2], KELM [38], SS-ELM [21], Transductive Support Vector Machine (TSVM) [19], Self-training Semi-supervised ELM (ST-ELM) [39], Laplacian Support Vector Machine (LapSVM) [20], GCELM [32], and GCN [25]. For a fair comparison with previous works, we set 50 hidden neurons for the methods that contain a hidden layer, i.e., GCELM, ELM, SS-ELM, ST-ELM, and GCN. The hyperparameter settings of these methods are provided in Table 3. We implement all the methods in Python 3.5 running on an Intel i5-6500 3.20 GHz CPU with 8.00 GB RAM.

4.2. Qualitative Study

4.2.1. Comparisons with Baselines

Table 4 reports the test accuracy of the different methods over the 26 datasets. Each method is evaluated over 30 independent runs. At the bottom of the table, two summary metrics are calculated: the arithmetic mean of test accuracy over the 26 datasets and a win/tie/loss (W/T/L) count of the datasets on which our HGCELM wins, ties, or loses against each competitor. On the basis of these results, we conclude that the proposed HGCELM method consistently outperforms the other baselines. Compared with GCELM, HGCELM achieves a 2.29% improvement in terms of averaged accuracy with a lower standard deviation. This demonstrates that the hypergraph is more helpful for semi-supervised learning, because the hypergraph convolution aims to embed the intraclass variations, while the graph convolution focuses only on the pairwise relationship. There is also a significant improvement when comparing HGCELM against the classic semi-supervised methods (i.e., SS-ELM, TSVM, ST-ELM, and LapSVM). Specifically, our method wins on 25, 26, 26, and 24 datasets, respectively. Notably, ELM and KELM are purely supervised, so their classification accuracy is lower than that of most semi-supervised methods because of the limited training samples. Despite being fully optimized, GCN cannot achieve better test accuracy than our HGCELM, which further signifies the effectiveness and reliability of our proposal.

4.2.2. Performance with Varying Training Size

To investigate the performance of HGCELM, we visualize the test accuracy under varying training sizes. In this experiment, we increase the training size from 5% to 50% and present the corresponding results of 30 evaluations with a box plot. As shown in Figure 3, the test accuracy of HGCELM tends to increase as the training size increases. Meanwhile, HGCELM becomes more robust and more stable. When the training size is larger than 25%, HGCELM's accuracy approaches its highest value and tends to converge. For the Iris dataset, HGCELM achieves a remarkable accuracy (higher than 90%) using only 5 samples per class (10%). This means that HGCELM is able to explore and make better use of the useful information in the unlabeled data.

4.2.3. Analysis on Decision Boundaries

To provide an intuitive understanding of the superiority of our HGCELM, we visualize the decision boundaries of ELM, SS-ELM, GCELM, and HGCELM. In Figure 4, we synthesize three types of representative data distributions, i.e., linearly separable data (the first row), linearly inseparable data with half circles (the second row), and linearly inseparable data with concentric circles (the third row), each of which contains 100 samples from two classes. The complexity of these datasets gradually increases from top to bottom. In this experiment, we select 10 samples for training and initialize each classifier with 10 hidden neurons. Owing to its simplicity, all four classifiers can correctly find a reliable decision boundary on the linearly separable dataset. For the first linearly inseparable dataset, the semi-supervised methods (i.e., HGCELM, GCELM, and SS-ELM) obtain more generalized decision boundaries than the supervised method (i.e., ELM), and the spectral convolution-based methods (HGCELM and GCELM) are superior to the graph Laplacian-based method (SS-ELM) in terms of both decision boundary and test accuracy. On the second linearly inseparable dataset, HGCELM accurately classifies the data points with a better decision boundary, while the other three methods show relatively poorer ones. In particular, ELM fails on this dataset under the same experimental settings, because ELM cannot use the unlabeled samples, making it more likely to overfit the limited training set. Although SS-ELM shows the same accuracy as HGCELM, its decision boundary cannot separate the two circles. We can therefore conclude that HGCELM has better generalization ability than the other methods. This ability benefits from the fact that the high-order relationships among all the data points can improve the quality of decision making. It should be noted that, due to the geometric properties of the data distribution, a Euclidean-distance-based graph construction strategy can result in an inappropriate graph structure on the linearly inseparable datasets. Therefore, a small neighbor size is desirable when constructing a graph or hypergraph for these datasets.

4.3. Parameter Sensitivity Study

Two parameter sensitivity studies are carried out to further explore the robustness of the proposed method.

4.3.1. Impact of Hidden Neurons

Figure 5 shows the impact of the number of hidden neurons. We compare our method with five baselines that require setting hidden neurons. As seen from Figure 5a–d, all the compared methods achieve better performance when using more hidden neurons on all the datasets. Nevertheless, our proposed HGCELM enjoys a more competitive edge than the other methods, particularly when more than 20 hidden neurons are used. Notice that GCN guarantees relatively better accuracy than the other competitors when using few hidden neurons, because the filter parameters of GCN are fully optimized. By contrast, ELMs require more hidden neurons, as has been repeatedly demonstrated by many previous works. From Figure 5c,d, the classic ELM suffers from overfitting (its accuracy declines as the number of hidden neurons grows) caused by the inadequate training data. This problem is overcome in the other semi-supervised ELMs, including our HGCELM.

4.3.2. Impact of λ and k

In this experiment, we further explore the impact of the other two important hyper-parameters, i.e., $\lambda$ and $k$. Figure 6a–d shows the results of a grid search. Here, we set $\lambda \in \{10^{-7}, 10^{-6}, \ldots, 10^{2}\}$ and $k \in \{3, 6, \ldots, 30\}$, respectively. We can observe two tendencies from the results. First, to obtain better performance, the regularization coefficient $\lambda$ should be set to less than 1. Theoretically, a larger $\lambda$ forces HGCELM toward a more compact model but also increases the risk of under-fitting the training data. Second, a larger neighbor size $k$ will in general guarantee better classification accuracy. However, hyperedges will contain noise samples if $k$ becomes too large. We suggest suitable ranges for the two parameters of $\lambda \in [10^{-3}, 10^{-1}]$ and $k \in [5, 10]$.
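A simple grid search over the two hyper-parameters could be sketched as below. This reuses the illustrative helper functions from the earlier sketches (build_hypergraph, random_hypergraph_embedding, hypergraph_regression) and assumes that the scaled feature matrix X, integer labels y, one-hot training targets Y_train, and the index arrays train_idx / test_idx come from the preprocessing sketch in Section 4.1.1.

```python
import numpy as np

# Hypothetical grid search over lambda and k (ranges follow the text above).
lambdas = [10.0 ** p for p in range(-7, 3)]   # 1e-7, 1e-6, ..., 1e2
neighbor_sizes = range(3, 31, 3)              # 3, 6, ..., 30
best = {"lam": None, "k": None, "acc": -1.0}
for k in neighbor_sizes:
    H, _, _ = build_hypergraph(X, k=k)
    S, Z = random_hypergraph_embedding(X, H, L=50)
    for lam in lambdas:
        _, pred = hypergraph_regression(S, Z, Y_train, train_idx, lam=lam)
        acc = float((pred[test_idx] == y[test_idx]).mean())
        if acc > best["acc"]:
            best = {"lam": lam, "k": k, "acc": acc}
print(best)
```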

5. Conclusions

We have proposed a novel ELM, called HGCELM, for semi-supervised classification. The idea behind HGCELM is to combine hypergraph convolution with ELM so as to embed the high-order relationships of data. The resulting model extends ELM into the non-Euclidean domain and endows ELM with the capability of modeling structured data. The proposed HGCELM is characterized by a very light computational burden and good generalization ability, making it easy to implement and apply in practice. Extensive experiments on 26 datasets demonstrate that HGCELM is superior to many existing methods. This successful attempt points to a promising avenue for designing randomized neural networks and graph neural networks.

Author Contributions

Conceptualization, Z.L. and Y.C.; Data curation, Z.L.; Formal analysis, Z.Z.; Investigation, Y.M.; Methodology, Z.L. and Y.C.; Project administration, Y.C. and Z.C.; Software, Z.Z.; Supervision, Z.C.; Validation, Y.M.; Writing—original draft, Z.L.; Writing—review & editing, Z.Z. and Y.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Fundamental Research Funds for National University, China University of Geosciences (Wuhan) under grant 1910491T06, in part by the Guangxi Natural Science Fund General Project under grant 2021GXNSFAA075029, and in part by the Guangxi Science and Technology Base and Talent Project under grant Guike AD20159036.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The UCI datasets used in this work are available from https://archive.ics.uci.edu/ml/datasets.php accessed on 14 March 2021.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Huang, G.; Huang, G.B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48. [Google Scholar] [CrossRef] [PubMed]
  2. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  3. Zhang, Y.; Wu, J.; Cai, Z.; Du, B.; Yu, P.S. An unsupervised parameter learning model for RVFL neural network. Neural Netw. 2019, 112, 85–97. [Google Scholar] [CrossRef]
  4. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2012, 42, 513–529. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  5. Cai, Y.; Liu, X.; Zhang, Y.; Cai, Z. Hierarchical ensemble of extreme learning machine. Pattern Recognit. Lett. 2018, 116, 101–106. [Google Scholar] [CrossRef]
  6. Song, Y.; Crowcroft, J.; Zhang, J. Automatic epileptic seizure detection in EEGs based on optimized sample entropy and extreme learning machine. J. Neurosci. Methods 2012, 210, 132–146. [Google Scholar] [CrossRef]
  7. Zeng, Y.; Xu, X.; Shen, D.; Fang, Y.; Xiao, Z. Traffic Sign Recognition Using Kernel Extreme Learning Machines with Deep Perceptual Features. IEEE Trans. Intell. Transp. Syst. 2017, 18, 1647–1653. [Google Scholar]
  8. Xu, Y.; Dong, Z.Y.; Zhao, J.H.; Zhang, P.; Wong, K.P. A Reliable Intelligent System for Real-Time Dynamic Security Assessment of Power Systems. IEEE Trans. Power Syst. 2012, 27, 1253–1263. [Google Scholar] [CrossRef]
  9. Chen, X.; Dong, Z.Y.; Meng, K.; Xu, Y.; Wong, K.P.; Ngan, H.W. Electricity Price Forecasting With Extreme Learning Machine and Bootstrapping. IEEE Trans. Power Syst. 2012, 27, 2055–2062. [Google Scholar] [CrossRef]
  10. Zhang, L.; Suganthan, P. A survey of randomized algorithms for training neural networks. Inf. Sci. 2016, 364–365, 146–155. [Google Scholar] [CrossRef]
  11. Cao, W.; Wang, X.; Ming, Z.; Gao, J. A review on neural networks with random weights. Neurocomputing 2018, 275, 278–287. [Google Scholar] [CrossRef]
  12. Wu, Y.; Zhang, Y.; Liu, X.; Cai, Z.; Cai, Y. A multiobjective optimization-based sparse extreme learning machine algorithm. Neurocomputing 2018, 317, 88–100. [Google Scholar] [CrossRef]
  13. Zhang, Y.; Wu, J.; Cai, Z.; Zhang, P.; Chen, L. Memetic Extreme Learning Machine. Pattern Recognit. 2016, 58, 135–148. [Google Scholar] [CrossRef]
  14. Xiao, L.; Shao, W.; Jin, F.; Wu, Z. A self-adaptive kernel extreme learning machine for short-term wind speed forecasting. Appl. Soft Comput. 2021, 99, 106917. [Google Scholar] [CrossRef]
  15. Wang, X.B.; Zhang, X.; Li, Z.; Wu, J. Ensemble extreme learning machines for compound-fault diagnosis of rotating machinery. Knowl. Based Syst. 2020, 188, 105012. [Google Scholar] [CrossRef]
  16. Huang, G.B.; Bai, Z.; Kasun, L.L.C.; Vong, C.M. Local Receptive Fields Based Extreme Learning Machine. IEEE Comput. Intell. Mag. 2015, 10, 18–29. [Google Scholar] [CrossRef]
  17. Cai, Y.; Zhang, Z.; Yan, Q.; Zhang, D.; Banu, M.J. Densely connected convolutional extreme learning machine for hyperspectral image classification. Neurocomputing 2021, 434, 21–32. [Google Scholar] [CrossRef]
  18. Cai, Y.; Liu, X.; Cai, Z. BS-Nets: An End-to-End Framework for Band Selection of Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2020, 58, 1969–1984. [Google Scholar] [CrossRef] [Green Version]
  19. Joachims, T. Transductive inference for text classification using support vector machines. In Proceedings of the ICML, Bled, Slovenia, 27–30 June 1999; Volume 99, pp. 200–209. [Google Scholar]
  20. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2434. [Google Scholar]
  21. Huang, G.; Song, S.; Gupta, J.N.D.; Wu, C. Semi-Supervised and Unsupervised Extreme Learning Machines. IEEE Trans. Cybern. 2014, 44, 2405–2417. [Google Scholar] [CrossRef] [PubMed]
  22. Yi, Y.; Qiao, S.; Zhou, W.; Zheng, C.; Liu, Q.; Wang, J. Adaptive multiple graph regularized semi-supervised extreme learning machine. Soft Comput. 2018, 22, 3545–3562. [Google Scholar] [CrossRef]
  23. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4–24. [Google Scholar] [CrossRef] [Green Version]
  24. Garg, V.; Jegelka, S.; Jaakkola, T. Generalization and Representational Limits of Graph Neural Networks. In Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; Daumé, H., III, Singh, A., Eds.; Volume 119, pp. 3419–3430. [Google Scholar]
  25. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  26. Cai, Y.; Zhang, Z.; Cai, Z.; Liu, X.; Jiang, X.; Yan, Q. Graph Convolutional Subspace Clustering: A Robust Subspace Clustering Framework for Hyperspectral Image. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4191–4202. [Google Scholar] [CrossRef]
  27. Lee, J.; Lee, I.; Kang, J. Self-Attention Graph Pooling. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Volume 97, pp. 3734–3743. [Google Scholar]
  28. Li, Q.; Han, Z.; Wu, X.M. Deeper insights into graph convolutional networks for semi-supervised learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  29. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying Graph Convolutional Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; Chaudhuri, K., Salakhutdinov, R., Eds.; Volume 97, pp. 6861–6871. [Google Scholar]
  30. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 3558–3565. [Google Scholar]
  31. Bai, S.; Zhang, F.; Torr, P.H. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637. [Google Scholar] [CrossRef]
  32. Zhang, Z.; Cai, Y.; Gong, W.; Liu, X.; Cai, Z. Graph Convolutional Extreme Learning Machine. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar] [CrossRef]
  33. Schölkopf, B.; Platt, J.; Hofmann, T. Learning with Hypergraphs: Clustering, Classification, and Embedding. In Advances in Neural Information Processing Systems 19: Proceedings of the 2006 Conference; MITP: Cambridge, MA, USA, 2007; pp. 1601–1608. [Google Scholar]
  34. Jin, T.; Cao, L.; Zhang, B.; Sun, X.; Deng, C.; Ji, R. Hypergraph Induced Convolutional Manifold Networks. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19), Macao, China, 10–16 August 2019; pp. 2670–2676. [Google Scholar]
  35. Cai, Y.; Zhang, Z.; Cai, Z.; Liu, X.; Jiang, X. Hypergraph-Structured Autoencoder for Unsupervised and Semisupervised Classification of Hyperspectral Image. IEEE Geosci. Remote Sens. Lett. 2021, 1–5. [Google Scholar] [CrossRef]
  36. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. In Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS’16), Barcelona, Spain, 4–9 December 2016; pp. 3844–3852. [Google Scholar]
  37. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive Representation Learning on Large Graphs. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17), Long Beach, CA, USA, 4–9 December 2017; pp. 1025–1035. [Google Scholar]
  38. Chen, C.; Li, W.; Su, H.; Liu, K. Spectral-Spatial Classification of Hyperspectral Image Based on Kernel Extreme Learning Machine. Remote Sens. 2014, 6, 5795–5814. [Google Scholar] [CrossRef] [Green Version]
  39. Li, Y.; Guan, C.; Li, H.; Chin, Z. A self-training semi-supervised SVM algorithm and its application in an EEG-based brain computer interface speller system. Pattern Recognit. Lett. 2008, 29, 1285–1294. [Google Scholar] [CrossRef]
Figure 1. Visual comparison between a simple graph and a hypergraph. (a) An example of a simple graph (left) and its corresponding adjacency matrix A (right). (b) An example of a hypergraph (left) and its corresponding incidence matrix H (right), where edges with the same color denote a hyperedge. Best viewed in color.
Figure 2. The proposed HGCELM framework.
Figure 3. Test accuracy of HGCELM with varying labeled sample sizes on (a) Iris, (b) Wine, (c) WDBC, and (d) Segmentation datasets.
Figure 4. Visualization of the decision boundaries for ELM, SS-ELM, GCELM, and HGCELM over the three synthetic data distributions. This figure is best viewed in color.
Figure 5. Classification performance comparison under different numbers of hidden neurons on (a) Iris, (b) Wine, (c) WDBC, and (d) Segmentation datasets.
Figure 6. Parameter sensitivity analysis for $\lambda$ and $k$ on (a) Iris, (b) Wine, (c) WDBC, and (d) Segmentation datasets. The x axis is denoted as $\log_{10}(\lambda)$.
Table 1. Important notations used in this paper and their definitions.
Notation | Definition
$N$ | The number of data points.
$m$ | The number of features.
$C$ | The number of classes.
$\mathbf{X}$ | The feature matrix of the dataset, $\mathbb{R}^{N \times m}$.
$\mathbf{Y}$ | The label matrix with one-hot encoding, $\mathbb{R}^{N \times C}$.
$N_T$ | The number of labeled samples.
$\mathcal{T}$ | The labeled set.
$\mathcal{U}$ | The unlabeled set.
$\boldsymbol{\Theta}$ | The hidden layer parameter matrix, $\mathbb{R}^{m \times L}$.
$\boldsymbol{\beta}$ | The output layer parameter matrix, $\mathbb{R}^{L \times C}$.
$\mathbf{Z}$ | The matrix of latent representation, $\mathbb{R}^{N \times L}$.
$\mathbf{Z}^{\dagger}$ | The Moore–Penrose generalized inverse of matrix $\mathbf{Z}$.
$L$ | The number of hidden neurons.
$\mathcal{V}$ | The set of vertices in the hypergraph.
$\mathcal{E}$ | The set of hyperedges in the hypergraph.
$\mathbf{W}$ | The diagonal matrix of the hyperedge weights, $\mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}|}$.
$\mathcal{G}$ | A hypergraph $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$.
$\mathbf{H}$ | The incidence matrix of the hypergraph, $\mathbb{R}^{N \times |\mathcal{E}|}$.
$d(v)$ | The degree of the vertex $v$.
$\delta(e)$ | The degree of the hyperedge $e$.
$\mathbf{D}_v$ | The diagonal matrix of the vertex degrees, $\mathbb{R}^{N \times N}$.
$\mathbf{D}_e$ | The diagonal matrix of the hyperedge degrees, $\mathbb{R}^{|\mathcal{E}| \times |\mathcal{E}|}$.
$\mathbf{L}$ | The hypergraph Laplacian matrix, $\mathbb{R}^{N \times N}$.
Table 2. Details of the benchmark datasets.
Dataset | #Classes | #Instance | #Feature | #Train | #Test
austra | 2 | 680 | 14 | 10 | 680
australian | 2 | 690 | 14 | 70 | 620
breast | 2 | 277 | 9 | 29 | 248
cleve | 2 | 296 | 13 | 30 | 266
diabetes | 2 | 768 | 8 | 78 | 690
dnatest | 3 | 1186 | 180 | 120 | 1066
german | 2 | 1000 | 24 | 100 | 900
heart | 2 | 270 | 13 | 27 | 243
ionosphere | 2 | 351 | 32 | 10 | 341
iris | 3 | 150 | 4 | 15 | 135
sonar | 2 | 208 | 60 | 10 | 198
vote | 2 | 435 | 16 | 10 | 425
WBC | 2 | 683 | 9 | 10 | 673
weather | 2 | 22 | 4 | 10 | 12
Wine | 3 | 178 | 13 | 19 | 159
X8D5K | 5 | 1000 | 8 | 100 | 900
zoo | 7 | 101 | 16 | 13 | 88
cloud | 2 | 1024 | 10 | 103 | 921
bupa | 2 | 345 | 6 | 10 | 335
air | 3 | 359 | 64 | 37 | 322
segmentation | 7 | 210 | 18 | 21 | 189
pima In. D. | 2 | 768 | 8 | 10 | 758
Xinp | 3 | 178 | 13 | 19 | 159
wdbc | 2 | 569 | 30 | 58 | 511
ecoli label | 2 | 335 | 2 | 10 | 325
appendicitis | 2 | 106 | 7 | 12 | 94
Table 3. Hyperparameter settings of different methods.
Method | Hyper-Parameters
HGCELM (Ours) | $L = 50$, $\lambda = 10^{-3}$
GCELM [32] | $L = 50$, $\lambda = 10^{-3}$
ELM [2] | $L = 50$, $\lambda = 10^{-5}$
KELM [38] | $\lambda = 10^{-5}$
SS-ELM [21] | $L = 50$, $\lambda_1 = 10^{-5}$, $\lambda_2 = 10^{-2}$
TSVM [19] | $C = 10^{3}$, $C^* = 1.5$, kernel = RBF
ST-ELM [39] | $L = 50$, $\lambda_1 = \lambda_2 = 10^{-5}$
LapSVM [20] | $\lambda_1 = 0.001$, $\lambda_2 = 0.05$
GCN [25] | $L = 50$, lr $= 0.02$, epochs $= 100$
Table 4. Experimental results of different algorithms on 26 different datasets (MEAN ± STD, preferably in BOLD).
Data Sets | HGCELM | GCELM | SS-ELM | ELM | KELM | TSVM | ST-ELM | LapSVM | GCN
austra | 82.57 ± 3.72 | 80.78 ± 2.99 | 76.24 ± 8.14 | 73.32 ± 10.48 | 79.47 ± 4.12 | 77.43 ± 10.59 | 75.17 ± 9.99 | 55.70 ± 0.21 | 71.38 ± 8.36
australian | 81.90 ± 4.73 | 78.46 ± 4.82 | 78.10 ± 6.24 | 71.35 ± 11.25 | 79.32 ± 6.11 | 75.38 ± 10.81 | 78.57 ± 9.56 | 75.79 ± 6.61 | 75.36 ± 7.56
breast | 66.52 ± 10.39 | 61.61 ± 8.61 | 55.30 ± 7.67 | 52.00 ± 7.38 | 55.56 ± 8.97 | 55.77 ± 6.78 | 53.18 ± 10.68 | 71.54 ± 0.00 | 60.17 ± 6.18
cleve | 74.51 ± 2.09 | 72.90 ± 6.42 | 72.88 ± 5.30 | 68.95 ± 7.31 | 72.38 ± 3.79 | 72.80 ± 5.70 | 68.85 ± 8.30 | 54.20 ± 0.00 | 70.17 ± 6.14
diabetes | 69.22 ± 3.75 | 70.10 ± 3.09 | 66.27 ± 8.92 | 57.31 ± 8.51 | 65.71 ± 5.64 | 64.97 ± 5.26 | 58.43 ± 8.56 | 65.30 ± 0.06 | 62.93 ± 6.62
dnatest | 78.55 ± 2.34 | 67.87 ± 5.03 | 51.60 ± 4.63 | 50.77 ± 4.79 | 50.81 ± 4.26 | 60.99 ± 2.94 | 42.86 ± 8.51 | 48.27 ± 4.10 | 59.33 ± 3.95
german | 67.38 ± 2.93 | 67.10 ± 10.01 | 54.48 ± 5.35 | 57.95 ± 9.42 | 51.07 ± 7.48 | 59.52 ± 6.89 | 59.64 ± 7.63 | 70.21 ± 0.03 | 59.54 ± 5.32
heart | 78.50 ± 2.83 | 76.53 ± 3.63 | 72.54 ± 6.06 | 69.15 ± 9.07 | 73.25 ± 8.99 | 70.23 ± 7.77 | 67.21 ± 8.62 | 55.77 ± 0.00 | 70.00 ± 8.58
ionosphere | 90.32 ± 0.00 | 81.07 ± 7.51 | 75.98 ± 4.31 | 74.65 ± 8.96 | 75.57 ± 6.18 | 77.05 ± 5.43 | 77.20 ± 5.68 | 64.52 ± 0.00 | 71.32 ± 8.21
iris | 95.56 ± 1.72 | 94.32 ± 3.20 | 80.04 ± 4.79 | 80.70 ± 10.76 | 93.04 ± 3.88 | 93.00 ± 3.70 | 81.04 ± 8.36 | 92.70 ± 2.58 | 91.19 ± 3.61
sonar | 67.93 ± 4.60 | 67.44 ± 6.64 | 63.28 ± 5.10 | 64.66 ± 6.37 | 62.56 ± 5.37 | 66.41 ± 5.27 | 62.21 ± 5.10 | 46.46 ± 0.00 | 67.22 ± 7.09
vote | 90.12 ± 2.60 | 84.19 ± 5.72 | 83.75 ± 3.49 | 84.39 ± 6.17 | 87.98 ± 1.97 | 87.97 ± 3.59 | 87.65 ± 6.21 | 38.35 ± 0.00 | 86.62 ± 3.87
WBC | 97.21 ± 0.27 | 96.53 ± 0.74 | 92.30 ± 1.83 | 89.35 ± 5.63 | 95.30 ± 2.78 | 95.40 ± 1.90 | 92.14 ± 3.50 | 65.26 ± 0.07 | 95.93 ± 2.28
weather | 84.17 ± 6.92 | 76.94 ± 14.38 | 69.72 ± 15.14 | 73.61 ± 12.74 | 69.17 ± 11.21 | 76.39 ± 14.29 | 70.00 ± 15.00 | 41.67 ± 0.00 | 50.83 ± 11.66
Wine | 96.38 ± 1.38 | 91.25 ± 2.28 | 89.37 ± 3.33 | 79.75 ± 6.72 | 90.31 ± 3.92 | 90.51 ± 3.34 | 79.59 ± 7.68 | 88.45 ± 1.60 | 90.84 ± 2.79
X8D5K | 100.00 ± 0.00 | 100.00 ± 0.00 | 100.00 ± 0.00 | 96.93 ± 3.07 | 99.96 ± 0.06 | 99.98 ± 0.05 | 99.93 ± 0.21 | 100.00 ± 0.00 | 100.00 ± 0.00
zoo | 100.00 ± 0.00 | 99.22 ± 1.32 | 98.02 ± 1.45 | 98.18 ± 1.51 | 98.65 ± 1.60 | 99.48 ± 1.09 | 99.69 ± 0.94 | 99.95 ± 0.28 | 100.00 ± 0.00
cloud | 90.38 ± 2.92 | 90.18 ± 3.68 | 89.92 ± 3.22 | 73.38 ± 10.32 | 88.48 ± 4.36 | 89.71 ± 4.65 | 75.95 ± 9.54 | 81.32 ± 4.86 | 88.10 ± 3.90
bupa | 57.37 ± 2.72 | 56.33 ± 6.93 | 52.55 ± 4.01 | 54.07 ± 6.63 | 56.15 ± 4.98 | 55.95 ± 5.62 | 55.41 ± 5.85 | 41.79 ± 0.00 | 51.72 ± 3.71
air | 81.72 ± 6.33 | 71.30 ± 6.25 | 66.52 ± 4.50 | 67.44 ± 5.83 | 70.89 ± 4.18 | 70.68 ± 6.38 | 69.53 ± 7.55 | 72.49 ± 7.63 | 79.65 ± 5.78
segmentation | 84.51 ± 2.00 | 83.22 ± 3.14 | 78.91 ± 3.42 | 65.71 ± 7.74 | 83.01 ± 3.22 | 81.85 ± 4.53 | 67.05 ± 8.36 | 80.84 ± 3.51 | 80.74 ± 3.61
pima In. D. | 68.97 ± 3.19 | 68.45 ± 5.55 | 67.28 ± 7.67 | 58.43 ± 7.06 | 63.52 ± 6.78 | 64.84 ± 8.34 | 62.35 ± 5.65 | 35.29 ± 1.03 | 65.47 ± 4.70
Xinp | 97.33 ± 0.85 | 93.48 ± 1.95 | 93.04 ± 2.31 | 80.31 ± 6.64 | 92.76 ± 3.00 | 93.90 ± 1.76 | 81.56 ± 8.59 | 92.45 ± 1.32 | 94.97 ± 1.82
wdbc | 94.44 ± 1.42 | 92.13 ± 0.16 | 88.94 ± 4.59 | 81.20 ± 8.63 | 91.72 ± 4.20 | 91.15 ± 4.02 | 89.51 ± 5.39 | 62.97 ± 0.00 | 90.86 ± 2.44
ecoli_label | 67.45 ± 1.78 | 64.46 ± 7.16 | 67.09 ± 2.29 | 57.92 ± 11.95 | 61.17 ± 12.58 | 58.92 ± 13.97 | 63.28 ± 9.71 | 61.80 ± 8.44 | 63.14 ± 8.21
appendicitis | 86.98 ± 2.25 | 85.83 ± 2.80 | 73.33 ± 14.10 | 58.12 ± 11.00 | 69.06 ± 12.88 | 63.70 ± 15.29 | 57.86 ± 15.23 | 16.67 ± 0.00 | 76.15 ± 10.68
Average | 82.69 ± 2.84 | 80.4 ± 4.61 | 75.64 ± 5.14 | 71.26 ± 7.78 | 76.9 ± 5.29 | 77.53 ± 5.94 | 72.62 ± 7.68 | 65.9 ± 1.57 | 76.63 ± 5.12
W/T/L | - | 24/1/1 | 25/1/0 | 26/0/0 | 26/0/0 | 26/0/0 | 26/0/0 | 24/1/1 | 24/2/0
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
