1. Introduction
As technology progresses, particularly with the pervasive adoption of smartphones, the volume of data surrounding us is increasing at an exponential rate. Consequently, processing and analyzing these vast data effectively has become critically important. For instance, rapidly distinguishing the images in a mobile device's photo gallery places significant demands on both memory capacity and computational power. It is therefore imperative to develop algorithms that minimize memory usage while ensuring rapid solution speeds and achieving satisfactory learning outcomes.
Matrix factorization (MF) plays an important role in fields such as machine learning and data analysis, as it can learn a low-dimensional representation from high-dimensional data. Many classical matrix factorization methods have been proposed, including non-negative matrix factorization [1,2], singular value decomposition (SVD) [3], principal component analysis (PCA) [4], and concept factorization [5].
Non-negative matrix factorization (NMF) has become popular in recent years. It factorizes data into a set of non-negative bases and the representations of the data under these bases. Despite its impressive performance and interpretability, NMF can only handle non-negative data due to its non-negativity constraints. Moreover, since, at the same low rank, the reconstruction error of NMF is usually larger than that of SVD, Ref. [6] proposed a low-rank matrix factorization with orthogonal constraints.
Matrix factorization considers global information while ignoring local manifolds. Many graph-based matrix factorization methods have been proposed to address this problem. Cai et al. proposed GNMF [7] and LCCF [8], which consider the local manifold. Yi et al. proposed NMF-LCAG [9], which considers both reconstruction and local information. Peng et al. proposed GCCF [10], which uses the maximum correntropy criterion (MCC) to guard against outliers. Wang et al. proposed CHNMF [11], which uses a hypergraph to explore high-order geometric information.
The above methods are all unsupervised and cannot exploit known label information, yet using even a small amount of labeled data can lead to a significant improvement in performance. Liu et al. proposed CNMF [12] and CCF [13], which constrain samples of the same class to share the same representation. Zhang et al. proposed NMFCC [14], which uses the label information to construct must-link and cannot-link matrices. In fact, an unsupervised graph-based MF method can be transformed into a semi-supervised one by incorporating must-link and cannot-link matrices. Peng et al. proposed CSNMF [15] and CSCF [16], which assign adaptive neighbors to construct an appropriate adjacency matrix. Zhou et al. proposed CLMF [17], which uses sparsity-induced similarity (SIS) to adaptively learn the adjacency matrix.
A special type of matrix factorization, called symmetric matrix factorization (SMF), directly factorizes the graph matrix into two identical low-rank matrices. Since the graph contains only local relations between data, its result contains only local information. Zhang et al. proposed SNMFCC [14], which incorporates the must-link and cannot-link matrices into the adjacency matrix. Wu et al. proposed PCPSNMF [18], which adaptively and simultaneously learns the adjacency matrix and propagates the initial pairwise constraints among the data. Chavoshinejad et al. proposed S⁴NMF [30], which introduces self-supervised information into semi-supervised symmetric NMF. Yin et al. proposed HSSNMF [19], which uses a hypergraph-based pairwise constraint propagation algorithm to capture high-order information. It is important to note that the essence of SMF is the problem $\min_{H}\|A - HH^{\top}\|_F^2$, which is non-convex.
Graph-based and symmetric matrix factorization methods are unsuitable for handling excessive amounts of data because they require the storage and computation of an $n$-order matrix. To address this issue, bipartite graph-based methods have been proposed. Liu et al. [20] proposed the local anchor embedding (LAE) algorithm. Wang et al. proposed EAGR [21], which uses FLAE, an improved LAE algorithm that modifies the estimation of the local weights and the construction of the adjacency matrix. Nie et al. proposed BGSSL [22], which applies a bipartite graph to graph-based semi-supervised learning (GSSL). It should be noted that it is difficult to apply an $n \times m$-order bipartite graph to SMF directly, as SMF requires an $n$-order full adjacency matrix.
In response to the substantial memory demands and high computational complexity of prior methodologies, this paper proposes a novel fast global local matrix factorization (FGLMF) model aimed at enhancing performance while addressing these critical considerations. The method has the following benefits.
A novel method using only matrix factorization is proposed. It simultaneously learns the global and local information between data, and it is easily interpretable.
A bipartite graph is introduced into the symmetric matrix factorization. The computational complexity of the proposed method scales only linearly with the number of anchors, which is lower than that of other bipartite graph-based methods.
The proposed SMF is convex thanks to the use of the label information. Therefore, every local minimizer is a global minimizer, and the model can be solved very quickly.
3. Proposed Method
In this section, a new fast global local MF model is proposed. First, a novel method is introduced to construct a naturally normalized full adjacency matrix from the bipartite graph. Thanks to this, the $n$-order full adjacency matrix never has to be represented explicitly, which effectively reduces the computational and spatial complexity. For local information, a bipartite graph-based semi-supervised symmetric matrix factorization framework is proposed, which is convex and unconstrained, thus enabling a fast solution while ensuring a globally optimal solution. For global information, low-rank matrix factorization is utilized and solved rapidly in one step; avoiding non-negativity constraints means that data containing negative entries can also be handled effectively.
3.1. Methodology
Due to the requirement of a full adjacency matrix in symmetric matrix factorization, it is not practical to use a bipartite graph directly. A concept of constructing the full adjacency matrix using anchor points is therefore proposed, as shown in Figure 1. By connecting samples through the anchor points they share, the adjacency relationship between samples is obtained. The new full adjacency matrix is as follows:
$$A = Z\Lambda^{-1}Z^{\top}, \qquad (6)$$
where $Z \in \mathbb{R}^{n \times m}$ is the adjacency matrix of the bipartite graph between the $n$ samples and the $m$ anchors, and $\Lambda \in \mathbb{R}^{m \times m}$ is a diagonal matrix with $\Lambda_{jj} = \sum_{i=1}^{n} Z_{ij}$. According to [24], Equation (6) can be viewed as a non-weighted hypergraph; accordingly, Figure 1 can be considered as connecting samples through hyperedges, each of which connects not just two vertices but multiple vertices. In addition, because each row of the adjacency matrix of a bipartite graph sums to one, the fully connected matrix generated by Equation (6) is self-normalized:
Theorem 1. For a matrix $Z$ with $Z\mathbf{1}_m = \mathbf{1}_n$, the degree matrix of $A = Z\Lambda^{-1}Z^{\top}$ is the identity matrix.
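For illustration, the following NumPy sketch constructs a row-stochastic bipartite graph and numerically checks the self-normalization property of Theorem 1. The Gaussian-weighted k-nearest-anchor construction and all names here are illustrative assumptions; the paper's Equation (3) may use a different weighting.

```python
import numpy as np

def bipartite_graph(X, anchors, k=5, sigma=1.0):
    # Connect each sample to its k nearest anchors with Gaussian weights,
    # then normalize each row to sum to one (a common anchor-graph choice).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # (n, m) squared distances
    idx = np.argsort(d2, axis=1)[:, :k]                         # k nearest anchors per sample
    Z = np.zeros_like(d2)
    rows = np.arange(X.shape[0])[:, None]
    Z[rows, idx] = np.exp(-d2[rows, idx] / (2 * sigma**2))
    return Z / Z.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
anchors = X[rng.choice(200, size=20, replace=False)]            # stand-in for k-means anchors
Z = bipartite_graph(X, anchors)
A = Z @ np.diag(1.0 / Z.sum(axis=0)) @ Z.T                      # Equation (6): A = Z Λ^{-1} Zᵀ
print(np.allclose(A.sum(axis=1), 1.0))                          # Theorem 1: all degrees equal 1 -> True
```

Note that the $n$-order matrix $A$ is materialized here only to verify Theorem 1; the proposed method never forms it explicitly.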
Therefore, the full adjacency matrix generated by Equation (6) can capture high-order relationships between data, does not require additional normalization, and significantly reduces computational complexity, since only the $n \times m$ bipartite graph has to be stored and computed. The bipartite graph-based symmetric matrix factorization can thus be expressed in the following form:
$$\min_{V}\ \big\|Z\Lambda^{-1}Z^{\top} - V^{\top}V\big\|_F^2.$$
Traditional symmetric matrix factorization requires incorporating the label information into the full adjacency matrix, which involves additional computation and necessitates explicitly forming the $n$-order matrix. Therefore, this paper incorporates the label information into the low-dimensional representation matrix as one-hot constraints. Specifically, let $V = B + FP$, where $B \in \mathbb{R}^{c \times n}$ and $P \in \mathbb{R}^{u \times n}$ are auxiliary matrices, $F \in \mathbb{R}^{c \times u}$ is the low-dimensional representation of the $u$ unlabeled samples, and $I$ is the identity matrix. $B$ records the label information, where $c$ is the number of classes and $n$ is the number of samples. If the $j$-th sample is labeled as the $i$-th class, $B_{ij} = 1$; otherwise, $B_{ij} = 0$. If the $j$-th column of $B$ is a zero vector and the $j$-th sample is the $i$-th unlabeled one, $P_{ij} = 1$; otherwise, $P_{ij} = 0$; hence, when the labeled samples are ordered first, $P = [\mathbf{0}\ \ I]$. For example, if there are eight samples with three classes, and $x_1$ is labeled with class I, $x_2$ and $x_3$ are labeled with class II, and $x_4$ is labeled with class III, then $B$ becomes
$$B = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}.$$
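As a concrete check of this construction, the sketch below builds $B$ and $P$ from a partial label vector. The symbols follow the notation assumed above; the helper name and the encoding of unlabeled samples as $-1$ are illustrative.

```python
import numpy as np

def one_hot_constraints(labels, c):
    # labels[j] in {0..c-1} for labeled samples, -1 for unlabeled ones.
    n = len(labels)
    unlabeled = [j for j in range(n) if labels[j] < 0]
    B = np.zeros((c, n))
    P = np.zeros((len(unlabeled), n))
    for j, y in enumerate(labels):
        if y >= 0:
            B[y, j] = 1.0               # one-hot column for a labeled sample
    for i, j in enumerate(unlabeled):
        P[i, j] = 1.0                   # select the i-th unlabeled sample
    return B, P

# The eight-sample example above: classes I, II, II, III, then four unlabeled.
B, P = one_hot_constraints([0, 1, 1, 2, -1, -1, -1, -1], c=3)
print(B.astype(int))
print((B @ B.T).astype(int))            # diag(1, 2, 1): per-class counts of labeled samples
```

The product $BB^{\top} = \mathrm{diag}(1, 2, 1)$ recovers the per-class counts of labeled samples, which is exactly the diagonal matrix $L$ that appears in the optimization below.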
Imposing a one-hot constraint on the labeled samples is justifiable, as these samples do not necessitate further distinction. The reformed SMF is
$$\min_{F}\ \big\|Z\Lambda^{-1}Z^{\top} - (B + FP)^{\top}(B + FP)\big\|_F^2. \qquad (9)$$
This reduces computation and, more importantly, transforms the model from non-convex to convex. Therefore, there is no need to add constraints to $F$, and any local minimizer is a global minimizer [25]. An explanation of the model's convexity is provided in Theorem 2.
Theorem 2. The objective function of Equation (9) is convex.
Equation (9) only considers the local information between samples; it is also crucial to consider the global information, which MF can effectively capture. Thus, the following MF framework is proposed:
$$\min_{U, F}\ \|X - UV\|_F^2 + \lambda\,\big\|Z\Lambda^{-1}Z^{\top} - V^{\top}V\big\|_F^2, \quad \text{s.t.}\ U^{\top}U = I,$$
where $X \in \mathbb{R}^{d \times n}$ is the data matrix, $U \in \mathbb{R}^{d \times c}$ is the basis matrix, $V = B + FP$, and $\lambda$ balances the global (MF) and local (SMF) terms.
The proof of Theorem 2 is given in Appendix B.
3.2. Optimization
Considering $V = B + FP$, and noting that $BP^{\top} = \mathbf{0}$ and $PP^{\top} = I$, there is
$$VV^{\top} = BB^{\top} + FF^{\top} = L + FF^{\top},$$
where $L$ is a diagonal matrix with $L_{ii} = l_i$, and $l_i$ is the number of samples labeled with the $i$-th class.
Firstly, expanding the MF term, there is
$$\|X - UV\|_F^2 = \langle X, X\rangle - 2\langle X, UV\rangle + \langle UV, UV\rangle,$$
where $\langle \cdot, \cdot \rangle$ is the inner product of matrices, representing the sum of element-wise products. Discarding all terms unrelated to $U$ and $F$, and using $U^{\top}U = I$, we obtain
$$-2\langle X, UV\rangle + \|V\|_F^2. \qquad (13)$$
Secondly, expanding the SMF term, there is
$$\big\|Z\Lambda^{-1}Z^{\top} - V^{\top}V\big\|_F^2 = \big\|Z\Lambda^{-1}Z^{\top}\big\|_F^2 - 2\big\langle Z\Lambda^{-1}Z^{\top}, V^{\top}V\big\rangle + \big\|V^{\top}V\big\|_F^2.$$
Discarding all terms unrelated to $F$, and noting that $\langle Z\Lambda^{-1}Z^{\top}, V^{\top}V\rangle = \mathrm{tr}\big(\Lambda^{-1}(VZ)^{\top}(VZ)\big)$ and $\|V^{\top}V\|_F^2 = \|VV^{\top}\|_F^2 = \|L + FF^{\top}\|_F^2$, we obtain
$$-2\,\mathrm{tr}\big(\Lambda^{-1}(VZ)^{\top}(VZ)\big) + \big\|L + FF^{\top}\big\|_F^2, \qquad (15)$$
so the $n$-order matrix $Z\Lambda^{-1}Z^{\top}$ never has to be formed explicitly.
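The practical payoff of Equation (15) is that the SMF term touches the data only through the $c \times m$ product $VZ$ and the $c \times c$ matrix $VV^{\top}$. A minimal sketch under the same assumed notation (the function name is illustrative):

```python
import numpy as np

def smf_objective(F, B, P, Z, lam_diag):
    # Equation (15): -2 tr(Λ^{-1}(VZ)ᵀ(VZ)) + ||L + FFᵀ||_F², evaluated without
    # ever forming the n x n matrix Z Λ^{-1} Zᵀ.  lam_diag holds the diagonal of Λ.
    V = B + F @ P                                   # c x n, labeled columns stay one-hot
    VZ = V @ Z                                      # c x m -- the only product touching all n samples
    term1 = -2.0 * np.einsum('ij,ij,j->', VZ, VZ, 1.0 / lam_diag)
    G = V @ V.T                                     # c x c, equals L + FFᵀ
    return term1 + (G * G).sum()
```

Both terms cost $O(nmc)$ and $O(nc^2)$ time, respectively, i.e., linear in $n$ and in the number of anchors.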
3.2.1. Fix $U$, Optimize $F$
According to Equations (13) and (15), the gradient of the objective with respect to $F$ is
$$\nabla_F = 2\big(F - U^{\top}XP^{\top}\big) + 4\lambda\Big(\big(L + FF^{\top}\big)F - (VZ)\Lambda^{-1}(PZ)^{\top}\Big),$$
which can likewise be evaluated in time linear in $n$ and in the number of anchors.
Due to the absence of constraints on $F$, it can be optimized using state-of-the-art unconstrained optimization algorithms. Here, CG_DESCENT 6.8 (https://people.clas.ufl.edu/hager/software/, accessed on 1 September 2024) [26,27,28] is employed. It is important to note that traditional matrix factorization methods employ multiplicative updating rules (MURs) for optimization. While these rules can ensure a decrease in the objective function, they do not guarantee convergence. Furthermore, as the objective function of $F$ is convex, it can be solved rapidly.
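The original CG_DESCENT solver is a C library; as an illustrative stand-in, the sketch below runs SciPy's conjugate-gradient method on the flattened $F$. The analytic gradient above could be supplied via `jac`, but a finite-difference gradient keeps the sketch short; all names are the sketch's own.

```python
import numpy as np
from scipy.optimize import minimize

def solve_F(F0, B, P, Z, lam_diag, UtX, lam=100.0):
    # F-step: with U fixed and UᵀU = I, minimize Equation (13) + λ·Equation (15)
    # over the unconstrained variable F.  smf_objective is the sketch above.
    c, u = F0.shape

    def fun(f):
        V = B + f.reshape(c, u) @ P
        mf = -2.0 * (UtX * V).sum() + (V * V).sum()   # Equation (13), up to a constant
        return mf + lam * smf_objective(f.reshape(c, u), B, P, Z, lam_diag)

    return minimize(fun, F0.ravel(), method='CG').x.reshape(c, u)
```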
3.2.2. Fix $F$, Optimize $U$
According to Equation (13), the only term involving $U$ is $-2\langle X, UV\rangle$. Using the SVD $XV^{\top} = Q\Sigma R^{\top}$, we have
$$-2\langle X, UV\rangle = -2\,\mathrm{tr}\big(\Sigma R^{\top}U^{\top}Q\big). \qquad (17)$$
Because $Q$, $R$, and $U$ all have orthonormal columns, the diagonal entries of $R^{\top}U^{\top}Q$ are not larger than $1$. When all the diagonal elements of $R^{\top}U^{\top}Q$ equal $1$, Equation (17) has been minimized. Therefore, we have $R^{\top}U^{\top}Q = I$, i.e., $U = QR^{\top}$.
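This is the classical orthogonal Procrustes update; a minimal sketch using NumPy's thin SVD, with names as above:

```python
import numpy as np

def update_U(X, V):
    # U-step: with V fixed, the minimizer of ||X - UV||_F² over UᵀU = I is
    # U = Q Rᵀ, where X Vᵀ = Q Σ Rᵀ is a thin SVD (a d x c problem only).
    Q, _, Rt = np.linalg.svd(X @ V.T, full_matrices=False)
    return Q @ Rt
```

Because $XV^{\top}$ is only $d \times c$, this step is independent of the number of samples once $XV^{\top}$ has been formed.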
The pseudocode for FGLMF is presented in Algorithm 1.
Algorithm 1 FGLMF
1: Input: data matrix $X$, constraint matrices $B$ and $P$, parameter $\lambda$.
2: Output: clustering indicator matrix (representation matrix) $V$.
3: Initialize the anchor point set by k-means, and generate the bipartite graph $Z$ by Equation (3).
4: while not convergent do
5:  Update $F$ by CG_DESCENT 6.8.
6:  Compute the SVD of $XV^{\top}$.
7:  Update $U = QR^{\top}$.
8:  Let $V = B + FP$.
9: end while
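Putting the pieces together, the following sketch mirrors Algorithm 1 using the helper functions defined above. A fixed iteration count stands in for the paper's convergence test; all names are the sketches' own, not the paper's code.

```python
import numpy as np

def fglmf(X, Z, labels, c, lam=100.0, iters=10):
    # Alternate the CG step for F (Section 3.2.1) and the SVD step for U
    # (Section 3.2.2), as in Algorithm 1.
    lam_diag = Z.sum(axis=0)                       # diagonal of Λ
    B, P = one_hot_constraints(labels, c)          # one-hot label constraints (earlier sketch)
    F = np.zeros((c, P.shape[0]))
    U = update_U(X, B + F @ P)                     # initial orthogonal basis
    for _ in range(iters):
        F = solve_F(F, B, P, Z, lam_diag, U.T @ X, lam)
        U = update_U(X, B + F @ P)
    V = B + F @ P
    return V.argmax(axis=0)                        # read the cluster from each column of V
```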
3.3. Computational Complexity
The computational complexity of FGLMF is mainly divided into six parts.
Computing the anchors by k-means.
Computing the distances between the samples and the anchors.
Constructing the bipartite graph, where k is the number of nearest neighbors.
Evaluating the objective function with respect to $F$.
Evaluating the gradient with respect to $F$.
Computing the SVD of $XV^{\top}$.
Since the numbers of anchors and classes are far smaller than the number of samples, the total computational complexity is linear in $n$ and in the number of anchors. It should be noted that other bipartite graph-based methods, such as BGSSL and EAGR, scale quadratically in the number of anchors.
4. Experiments
4.1. Compared Methods
To demonstrate the efficiency and effectiveness of the proposed FGLMF, we choose the following methods for comparison:
TSVD: Truncated singular value decomposition [29], which gives the best approximation at a preset rank.
SemiGNMF: Semi-supervised graph-regularized non-negative matrix factorization [7].
CSNMF: Correntropy-based semi-supervised NMF [15].
CSCF: Correntropy-based semi-supervised concept factorization [16].
CLMF: Correntropy-based low-rank matrix factorization [17].
PCPSNMF: Pairwise constraint propagation-induced SNMF [18].
S⁴NMF: Self-supervised semi-supervised non-negative matrix factorization [30].
HSSNMF: Hypergraph-based semi-supervised symmetric non-negative matrix factorization [19].
EAGR: Efficient anchor graph regularization [21].
BGSSL: Bipartite graph semi-supervised learning [22].
Specifically, TSVD is an unsupervised MF method used as the baseline, and the rest are semi-supervised methods. Methods (1)–(5) are matrix factorization methods that factorize the samples, (6)–(8) are matrix factorization methods that factorize the full adjacency matrix, and (9)–(10) are bipartite graph-based methods.
4.3. Experiment Settings
To ensure the fairness of the experiments, all experiments in this paper were performed on a personal computer with an Intel i7-6800k CPU and 16 GB RAM. The parameters of all the compared methods were set according to the papers in which they are described. The number of nearest neighbors was set to five for all graph-based methods. For FGLMF, we fixed the parameter $\lambda$ at 100. For the COIL20 and YaleB datasets, the number of anchor points was set to 500 and 1500, respectively, and for the remaining datasets, the number of anchor points was set to 2000. According to [22], we set the dimension of the low-dimensional representation of BGSSL to the number of ground-truth classes plus one, and for the remaining methods, the dimension of the low-dimensional representation was set to the number of ground-truth classes of each dataset. For TSVD, GNMF, CSNMF, and CSCF, we obtained the results by running k-means on the low-dimensional representation, while for the remaining methods, we read the cluster of each sample directly from its low-dimensional representation. The stop rule of CG_DESCENT was set to a fixed tolerance. We randomly selected 30% of the samples as labeled samples.
Four commonly used metrics were adopted: accuracy (ACC), normalized mutual information (NMI), adjusted Rand index (ARI), and F-score. Accuracy calculates the percentage of correctly assigned labels by comparing the real label information with the learned label information. NMI measures the amount of shared information between two statistical distributions. ARI evaluates the agreement between the obtained labels and the true labels, while the F-score determines clustering accuracy through a weighted average of precision and recall. All four metrics range from 0 to 1, with higher values indicating better clustering quality. Since these metrics emphasize different aspects of clustering, we report results across all of them to conduct a comprehensive evaluation. We conducted ten random trials for all methods and report the average results for each dataset.
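The paper does not list its metric implementations; standard choices are scikit-learn for NMI and ARI and Hungarian matching for ACC, sketched below (the function name is the sketch's own, and integer label arrays are assumed):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    # ACC: best one-to-one match between predicted clusters and true labels
    # found with the Hungarian algorithm.
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    D = int(max(y_pred.max(), y_true.max())) + 1
    w = np.zeros((D, D), dtype=int)
    for t, p in zip(y_true, y_pred):
        w[p, t] += 1
    row, col = linear_sum_assignment(-w)      # maximize the total matched count
    return w[row, col].sum() / len(y_true)

# NMI and ARI come directly from scikit-learn:
# nmi = normalized_mutual_info_score(y_true, y_pred)
# ari = adjusted_rand_score(y_true, y_pred)
```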
4.4. Experiment Performance
Table 2, Table 3, Table 4 and Table 5 show the clustering results of the various methods, with the best results highlighted. Due to negative values in the USPS data, both GNMF and CSNMF are unsuitable for this dataset. It can be observed that the proposed FGLMF achieves the best results in most cases. Additionally, on the YaleB data, CLMF demonstrates the best performance and, compared with the other bipartite graph-based methods, the performance of FGLMF is significantly enhanced. It should be noted that, similar to FGLMF, CLMF is a method that considers both global and local information, which indicates that considering global and local information on facial data can yield better results. However, CLMF can only handle a small amount of data due to its large demand for computational resources. FGLMF performs better than the other MF methods in most cases, even though it is an anchor-based method. This might be because the other MF methods carry non-negativity constraints and use MURs for optimization, which only guarantee a decrease in the objective function, not convergence. Since a convex, unconstrained SMF is proposed in this paper, convergence to the global optimal solution is guaranteed for any algorithm that guarantees a local optimal solution. Furthermore, the most advanced unconstrained optimization algorithms are often much faster than MURs.
We also compare the average performance of SemiGNMF, PCPSNMF, HSSNMF, EAGR, BGSSL, and FGLMF under different labeling ratios in Figure 2. As the labeled ratio increases, the clustering performance improves in most cases. On the USPS dataset, the accuracy of the remaining MF methods fluctuates to different degrees, and their performance sometimes decreases as the labeling ratio increases. On the one hand, the improvement of all methods is not significant, indicating that the labeling information has reached a bottleneck. On the other hand, the other MF methods cannot guarantee convergence and may therefore require more iterations, which would further increase their already high computational cost. In most cases, the proposed FGLMF shows good performance, demonstrating its effectiveness.
4.5. Time Consumption
The average time taken by each method is presented in Table 6. The time associated with anchor-based methods can be divided into two components: the first is the time spent on the k-means algorithm, while the second depends on the model. It is noteworthy that numerous efficient k-means algorithms are currently available, such as [31]. However, since this article does not primarily focus on k-means algorithms, we opted for the built-in k-means algorithm provided by the scikit-learn library [32] in Python to ensure fairness. FGLMF is the fastest among the anchor-based methods, especially on large-scale datasets, where the improvement is particularly evident. It should be noted that both FGLMF and TSVD require an SVD. However, FGLMF performs the SVD on a $d \times c$ matrix, while TSVD needs to perform the SVD on a $d \times n$ matrix, where $c$ is much smaller than $d$; this explains why FGLMF is faster than TSVD in the solution process. Unfortunately, FGLMF requires k-means to find the anchors, which makes its total elapsed time longer than that of TSVD. Nevertheless, FGLMF is still the second fastest among all the methods.
4.6. Sensitivity to the Parameter $\lambda$
FGLMF has a parameter $\lambda$, and the selection of this parameter is examined in this section. Figure 3 presents the average clustering accuracy of FGLMF under different $\lambda$ values. It can be observed that either an excessively high or an excessively low $\lambda$ degrades the performance of FGLMF. A $\lambda$ that is too low neglects the local structure and only considers global information, reducing clustering performance. An overly high $\lambda$ ignores the global information, likewise reducing clustering performance. When $\lambda$ takes values of 10 and 100, FGLMF exhibits the best performance across all datasets.
4.7. Effect of Anchors
To verify the sensitivity of the proposed method to anchor points, we conducted tests on each dataset using different numbers of anchor points, as presented in Figure 4. With an increasing number of anchor points, the clustering performance improves. Among the datasets, YaleB is the most sensitive to the number of anchor points, whereas USPS and MNIST are not sensitive to it. This might be attributed to the complexity of the data. Both USPS and MNIST are handwritten digit datasets, and a small number of anchor points can represent the entire dataset. However, YaleB is a dataset of human faces with different expressions and shadows, and a relatively large number of anchor points is necessary to represent the entire dataset. In addition, the solution time tends to increase with the number of anchor points, although this increase is not significant: the cost of FGLMF grows only linearly with the number of anchor points, rather than quadratically as in other methods. This indicates that FGLMF is suitable for situations with a large number of anchor points. As mentioned above, for more complex datasets, a small number of anchor points may represent only part of the dataset, and it is sometimes necessary to increase the number of anchor points to improve performance.
4.8. Bases
To verify the degree of global information extraction by the proposed method, we present the first ten bases extracted by FGLMF for each dataset in Figure 5. It can be observed that each base represents its respective category relatively clearly. It should be noted that each category of the Letters dataset contains both uppercase and lowercase letters. FGLMF does an excellent job of extracting the facial features of each person in the YaleB dataset, which also explains why the proposed method performs much better than other bipartite graph-based methods on this dataset.
4.9. Generated Graph
To validate the graph generated by Equation (6), we compare it with a graph generated using a Gaussian kernel on the COIL20 dataset, as shown in Figure 6. Figure 6a displays the adjacency matrix of the bipartite graph generated using 500 anchors and five nearest neighbors. It can be observed that the points are scattered randomly, and no special structure of the adjacency matrix is apparent. Figure 6b presents the full adjacency matrix generated from the adjacency matrix in Figure 6a, showing a clear block structure, i.e., connections between samples of the same class. Figure 6c illustrates the graph generated using the Gaussian kernel, which exhibits a structure similar to that in Figure 6b, validating the feasibility of our approach. Furthermore, Figure 6c emphasizes connections with the nearest samples, while Figure 6b captures high-order relationships between the data, i.e., connections within classes.
4.10. Convergence Study
To verify the convergence of FGLMF, the convergence curves are presented in Figure 7. The figures show the convergence curves of five iterations. The colored broken lines represent the inner loop of CG_DESCENT, and the blank spaces between the dots of different colors represent the process of solving $U$ using SVD. If there is only one dot of a specific color, it indicates that the solution has already reached the target accuracy and CG_DESCENT no longer works; at this point, neither $F$ nor $U$ will continue to be optimized. It should be noted that the algorithm should be terminated when the number of iterations of CG_DESCENT is 0; we provide the results of five iterations here to make the figures easier to understand. A complete iteration can be considered as starting from the dot of one color and ending at the dot of another color.
The figure shows that FGLMF reached the best result after three iterations on USPS, while only two iterations were required to reach the best result on other datasets. Additionally, in an internal loop, only a few iterations are needed to reach the optimal solution. This is attributed to the convexity of the proposed SMF, which eliminates the non-negative constraints and enables the algorithm to solve the problem rapidly. Furthermore, such a small number of iterations further explains why FGLMF requires so little time. Since the first iteration was close to the optimal value, CG_DESCENT requires only a few iterations after the first iteration to reach the optimal value.
To better demonstrate the convergence of FGLMF, we have also magnified the regions before and after the first iteration in the subgraphs. It can be observed that the objective function continues to decline. After the update of $U$, the objective function undergoes a significant drop. Additionally, based on the decline of the objective function after updating $U$, it can also be seen that $F$ is already very close to the optimal solution.
5. Conclusions
Traditional matrix factorization methods face challenges related to high computational complexity and substantial memory requirements. This paper proposes a fast global–local matrix factorization model that employs low-rank matrix factorization to account for global information while utilizing symmetric matrix factorization (SMF) to capture local information. Compared with conventional approaches, the computational complexity is substantially reduced, significantly alleviating both the computational burden and the memory demands. Furthermore, we introduce a convex, unconstrained symmetric matrix factorization method for bipartite graphs and provide an analysis and proof of its convexity. Thanks to the proposed symmetric matrix factorization technique, our method demonstrates considerable advantages in performance.
Limitations and future work: Despite the reduction in computational and spatial complexity, the proposed method struggles to handle extremely large-scale datasets because it still needs to store the $n \times m$ bipartite graph; when $n$ becomes excessively large, there may be instances of insufficient memory. In future work, it would be beneficial to explore optimization techniques such as mini-batch stochastic gradient descent to alleviate the model's memory constraints.