4.1. The Non-Convex Approximation of Rank
Although RPCA exhibits high robustness, it also has limitations. Firstly, the convex surrogate used for the low-rank matrix in RPCA is not a tight approximation in certain cases, which affects the accuracy of the algorithm. Secondly, RPCA recovers the low-rank and sparse matrices by solving convex optimization problems, and the slow convergence of convex optimization limits the efficiency of RPCA on large-scale problems. To address these issues, an improvement has been made by introducing non-convex functions to approximate the matrix rank and the $\ell_0$-norm. This enhancement modifies the low-rank and sparse terms in RPCA and accelerates the solution of the optimization problem, thereby improving computational efficiency.
The optimization model of RPCA is relaxed to
$$\min_{L,S}\ \|L\|_* + \lambda\,\|S\|_1 \quad \text{s.t.} \quad D = L + S, \tag{16}$$
where $\|L\|_* = \sum_i \sigma_i(L)$ is the nuclear norm and $\sigma_i(L)$ represents the $i$th singular value of the matrix $L$.
Although Equation (16) is a convex problem that is easy to optimize, in practical scenarios with corrupted data, the global optimal solution of Equation (16) may still incur significant error. Furthermore, since the nuclear norm is essentially the $\ell_1$-norm of the vector of matrix singular values, it inherits the shrinkage effect of the $\ell_1$-norm, which results in biased estimates. This implies that the nuclear norm excessively penalizes large singular values: for example, soft-thresholding with threshold $\tau$ returns $\max(\sigma_i - \tau, 0)$, shrinking even a clearly significant singular value by $\tau$, so the solution can deviate substantially from the true one.
Therefore, a new non-convex approximation function is employed to approximate the rank of $L$, which provides a closer approximation than the nuclear norm. The mathematical form of the non-convex function is defined as follows:
$$\|L\|_\gamma = \sum_i \frac{(1+\gamma)\,\sigma_i(L)}{\gamma + \sigma_i(L)}, \quad \gamma > 0, \tag{17}$$
where $\gamma$ represents the model parameter and $\sigma_i(L)$ represents the $i$th singular value of matrix $L$.
The non-convex rank-approximation function $\|\cdot\|_\gamma$ possesses the following characteristics:
- (1) $\|L\|_\gamma \geq 0$;
- (2) $\lim_{\gamma \to 0} \|L\|_\gamma = \operatorname{rank}(L)$, $\lim_{\gamma \to \infty} \|L\|_\gamma = \|L\|_*$;
- (3) $\|\cdot\|_\gamma$ is unitarily invariant, meaning that for any orthogonal matrices $P$ and $Q$, it holds that $\|PLQ\|_\gamma = \|L\|_\gamma$;
- (4) For any matrix $L$, it holds that $\|L\|_\gamma = 0$ if and only if $\sigma_i(L) = 0$ for all $i$, i.e., $L = 0$.
Figure 4 illustrates the use of the non-convex function $\|\cdot\|_\gamma$ and the nuclear norm to approximate the rank function as the value of $\gamma$ gradually increases. It can be observed that as the singular value increases, the nuclear norm deviates significantly from 1, while the non-convex function approaches 1 more stably. A smaller parameter value $\gamma$ leads to a better approximation of the rank function.
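The trend in Figure 4 can be checked numerically. Below is a minimal NumPy sketch, assuming the $\gamma$-norm form of Equation (17); the function name and test spectrum are illustrative:

```python
import numpy as np

def gamma_norm(sigma, gamma):
    """Non-convex rank surrogate of Equation (17):
    sum over singular values of (1 + gamma) * s / (gamma + s)."""
    sigma = np.asarray(sigma, dtype=float)
    return np.sum((1.0 + gamma) * sigma / (gamma + sigma))

# A rank-2 spectrum: two nonzero singular values, two zeros.
sigma = np.array([10.0, 5.0, 0.0, 0.0])
print("true rank    :", np.count_nonzero(sigma))   # 2
print("nuclear norm :", sigma.sum())               # 15.0 -- far from the rank
for g in (10.0, 1.0, 0.01):
    print(f"gamma-norm (gamma={g}):", gamma_norm(sigma, g))
# The gamma-norm tends to the rank (2) as gamma -> 0 and to the
# nuclear norm (15) as gamma -> infinity, matching Figure 4.
```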
4.2. The Non-Convex Penalty Function
However, for the sparse component $S$, G. Gasso et al. [39] demonstrated that adopting non-convex penalty functions can better approximate the sparsity properties of sparse signals and, under certain conditions, ensure the uniqueness of sparse solutions. Hence, a non-convex penalty function is utilized to approximate the sparse matrix $S$, denoted as the sparse penalty function $g(\cdot)$. Suppose that the function $g$ is continuous, concave, and monotonically increasing on $(0, \infty)$, and continuous, concave, and monotonically decreasing on $(-\infty, 0)$.
Table 1 presents the non-convex penalty functions for approximating the sparse matrix $S$, including the $\ell_p$ penalty function ($0 < p < 1$) [40], the logarithmic penalty function, the exponential penalty [41], and the Smoothly Clipped Absolute Deviation (SCAD) penalty [42], together with their super-gradients, where $S_{ij}$ represents the element at the $i$th row and $j$th column of matrix $S$.
All of the penalty functions shown in Figure 5 exhibit the characteristic of sparsity, i.e., large gradients in some regions and small or zero gradients in others, and are thus, to some extent, relevant to sparse learning. In sparse learning, the model is usually expected to have large gradients for certain parameters, which results in larger weights for those parameters, and small or zero gradients for the remaining parameters, thus achieving sparsity in the model.
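To make this gradient behavior concrete, the sketch below evaluates the logarithmic penalty of Table 1 in one common parameterization (the paper's exact form and constants may differ), together with its super-gradient:

```python
import numpy as np

def log_penalty(x, gamma=2.0):
    """Logarithmic penalty (one common form): concave and increasing in |x|."""
    return np.log(gamma * np.abs(x) + 1.0) / np.log(gamma + 1.0)

def log_supergradient(x, gamma=2.0):
    """Super-gradient w.r.t. |x|: large near zero, decaying for large |x|."""
    return gamma / (np.log(gamma + 1.0) * (gamma * np.abs(x) + 1.0))

x = np.array([0.01, 0.1, 1.0, 10.0])
print(log_supergradient(x))  # decays with |x|: small entries are pushed
                             # to zero, large entries are barely penalized
```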
4.3. Derivation of the Overall Formula for NCA-RPCA Algorithm
From Section 4.1 and Section 4.2, the optimization problem can be redefined as follows:
$$\min_{L,S}\ \|L\|_\gamma + \lambda \sum_{i,j} g\big(S_{ij}\big) \quad \text{s.t.} \quad D = L + S, \tag{18}$$
where $S_{ij}$ represents the element in the $i$th row and $j$th column of matrix $S$, and the parameter $\lambda$ determines the relative impact of the low-rank and sparsity terms on the objective function. A value of $\lambda$ that balances minimizing the rank of $L$ against maximizing the sparsity of $S$ can be obtained by tuning $\lambda$. To determine this parameter, numerical experiments were conducted in this study, using the number of iterations and the error value (err) as the selection criteria. Let $\lambda = c/\sqrt{\max(m,n)}$, where the scaling factor $c$ is varied over a prescribed range, and $m$ and $n$ are the numbers of rows and columns of the input image matrix, respectively.
Figure 6 shows the relationship between the number of iterations, the error value (err), and $c$.
From Figure 6, it can be observed that as $c$ increases, the number of iterations rises continuously, and the error value also increases to some extent. Therefore, based on Figure 6, this study chooses a relatively small value of $c$ to determine the parameter $\lambda$. The role of $\lambda$ is to balance the low-rank and sparse components of the matrix, while $\gamma$ affects the threshold of the non-convex penalty function; these two parameters therefore need to be adjusted jointly to achieve a balance between the low-rank and sparse components. The penalty coefficient $\mu$ is set following the approach in reference [30]. This process ultimately leads to a well-optimized set of parameters for the algorithm.
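For reference, the parameter rule just described can be expressed in a few lines; this is a sketch assuming the standard RPCA scaling $\lambda = c/\sqrt{\max(m,n)}$, with $c$ the factor tuned empirically in Figure 6:

```python
import numpy as np

def balance_lambda(m, n, c=1.0):
    """lambda = c / sqrt(max(m, n)): standard RPCA scaling with a tunable factor c."""
    return c / np.sqrt(max(m, n))

# Example: a 480 x 640 image matrix with c = 1 gives lambda ~ 0.0395.
print(balance_lambda(480, 640))
```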
When seeking a solution for Equation (18), the properties of the surrogate functions can be derived from the gradient (or the super-gradient at the non-smooth origin). At a point $x_0$, if $g(x) \leq g(x_0) + \langle v,\, x - x_0\rangle$ holds for every $x$, then $v$ is a super-gradient of the concave function $g$ at $x_0$. Thus, the function $\|L\|_\gamma$ in Equations (17) and (18) can be approximated, for $L$ sufficiently close to $L_k$, by its first-order Taylor expansion $\|L_k\|_\gamma + \langle \nabla\|L_k\|_\gamma,\, L - L_k\rangle$, where $\nabla\|L_k\|_\gamma$ represents the gradient of $\|\cdot\|_\gamma$ at $L_k$. The same principle can also be applied to the penalty function $g$.
Expanding $\|L\|_\gamma$ and $g(S)$ using the first-order Taylor series, the Lagrangian form of the optimization problem described above can be established based on Equation (9):
$$\mathcal{L}(L,S,Y,\mu) = \widehat{\|L\|}_{\gamma} + \lambda\,\widehat{g}(S) + \langle Y,\, D - L - S\rangle + \frac{\mu}{2}\,\|D - L - S\|_F^2,$$
where $\widehat{\|L\|}_{\gamma}$ and $\widehat{g}(S)$ denote the first-order Taylor expansions of $\|L\|_\gamma$ and $g(S)$ at the previous iterate $(L_k, S_k)$, the dual variable $Y$ is the Lagrange multiplier, and $\mu$ is the penalty coefficient. For the sake of convenience in subsequent calculations and processing, the terms $\langle Y,\, D - L - S\rangle$ and $\frac{\mu}{2}\|D - L - S\|_F^2$ can be combined.
Here, $\frac{1}{2\mu}\|Y\|_F^2$ can be treated as a constant when solving for $L$ and $S$, so the expression can be simplified accordingly. In other words, $\langle Y,\, D - L - S\rangle + \frac{\mu}{2}\|D - L - S\|_F^2$ can be equivalently represented as $\frac{\mu}{2}\big\|D - L - S + \frac{1}{\mu}Y\big\|_F^2$.
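Concretely, this combination relies on the standard completion-of-squares identity for augmented Lagrangian terms; writing $A = D - L - S$ for brevity,
$$\langle Y,\, A\rangle + \frac{\mu}{2}\,\|A\|_F^2 = \frac{\mu}{2}\,\Big\|A + \frac{1}{\mu}Y\Big\|_F^2 - \frac{1}{2\mu}\,\|Y\|_F^2,$$
and the trailing term $\frac{1}{2\mu}\|Y\|_F^2$ is precisely the constant dropped above.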
Continuing, the inexact augmented Lagrange multiplier method (IALM) is employed to solve the equation, mainly involving three steps. In the first step, $L$ and $Y$ are fixed, and $S$ is updated to find its optimal value. Since $\widehat{\|L\|}_{\gamma}$ is independent of $S$, it can be omitted during the process of solving for $S$; $\frac{1}{2\mu}\|Y\|_F^2$ is a constant and can likewise be omitted, as follows:
$$S_{k+1} = \arg\min_{S}\ \lambda\,\widehat{g}(S) + \frac{\mu}{2}\,\Big\|D - L_k - S + \frac{1}{\mu}Y\Big\|_F^2.$$
Because $\widehat{g}(S)$ represents the first-order Taylor expansion of $g(S)$ at $S_k$ and its constant term does not depend on $S$, that constant can be omitted during the process of solving for $S$; that is, $\widehat{g}(S)$ is equivalent, up to a constant, to $\sum_{i,j}\nabla g\big(|S_{k,ij}|\big)\,|S_{ij}|$. The above equation is derived and simplified accordingly.
In the second step, with $Y$ fixed along with the updated $S_{k+1}$, the optimization of $L$ is performed to obtain its optimal value. Similarly, it can be observed that $\lambda\,\widehat{g}(S)$ is independent of $L$ during the process of solving for $L$, and thus can be omitted:
$$L_{k+1} = \arg\min_{L}\ \widehat{\|L\|}_{\gamma} + \frac{\mu}{2}\,\Big\|D - L - S_{k+1} + \frac{1}{\mu}Y\Big\|_F^2.$$
Writing $\|L\|_\gamma = \sum_j \phi\big(\sigma_j(L)\big)$ with $\phi(\sigma) = \frac{(1+\gamma)\sigma}{\gamma+\sigma}$ as in Equation (17), the first-order Taylor expansion $\widehat{\|L\|}_{\gamma}$ equals $\sum_j \nabla\phi\big(\sigma_j(L_k)\big)\,\sigma_j(L)$ plus a constant. During the process of solving for $L$, this constant can be omitted, meaning that $\widehat{\|L\|}_{\gamma}$ is effectively equivalent to $\sum_j \nabla\phi\big(\sigma_j(L_k)\big)\,\sigma_j(L)$. The above equation is derived and simplified accordingly.
In the third step, with the updated $L_{k+1}$ and $S_{k+1}$ fixed, the gradient ascent method is applied to update $Y$, and the penalty parameter $\mu$ is updated using the step size $\rho$, where $k$ denotes the iteration number. The specific solving process is as follows:
Step one: fixing $L$ and $Y$, update $S$:
$$S_{k+1} = \arg\min_{S}\ \lambda\sum_{i,j}\nabla g\big(|S_{k,ij}|\big)\,|S_{ij}| + \frac{\mu}{2}\,\Big\|D - L_k - S + \frac{1}{\mu}Y\Big\|_F^2.$$
Let $T = D - L_k + \frac{1}{\mu}Y$ and decompose the above equation into element-wise subproblems regarding $S_{ij}$; then the equation becomes
$$\min_{S_{ij}}\ \lambda\,\nabla g\big(|S_{k,ij}|\big)\,|S_{ij}| + \frac{\mu}{2}\,\big(S_{ij} - T_{ij}\big)^2,$$
where $T_{ij}$ and $S_{ij}$ are elements of $T$ and $S$, respectively. For the sake of simplicity in notation, the subscripts $i$ and $j$ are omitted below. The soft-thresholding operator [43] can be applied to solve for $S$, which can be obtained from the following equation:
$$S_{k+1} = \operatorname{sign}(T)\,\max\Big(|T| - \frac{\lambda w}{\mu},\, 0\Big).$$
Here, $w$ represents the super-gradient $\nabla g(|S_k|)$.
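For reference, the soft-thresholding operator of [43] is a one-liner; in the sketch below the threshold may be a scalar or a matrix of per-element thresholds $\lambda w_{ij}/\mu$ (names are illustrative, not the authors' code):

```python
import numpy as np

def soft_threshold(T, tau):
    """Soft-thresholding operator: sign(T) * max(|T| - tau, 0), element-wise.
    tau may be a scalar or an array of per-element thresholds."""
    return np.sign(T) * np.maximum(np.abs(T) - tau, 0.0)

# Example: entries below the threshold are zeroed, large ones are shrunk.
T = np.array([-3.0, -0.2, 0.1, 2.5])
print(soft_threshold(T, 0.5))  # [-2.5  0.   0.   2. ]
```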
Second step: with $S$ and $Y$ fixed, update $L$:
$$L_{k+1} = \arg\min_{L}\ \sum_j \nabla\phi\big(\sigma_j(L_k)\big)\,\sigma_j(L) + \frac{\mu}{2}\,\Big\|D - L - S_{k+1} + \frac{1}{\mu}Y\Big\|_F^2.$$
Let $M = D - S_{k+1} + \frac{1}{\mu}Y$; then the above equation becomes
$$L_{k+1} = \arg\min_{L}\ \sum_j \nabla\phi\big(\sigma_j(L_k)\big)\,\sigma_j(L) + \frac{\mu}{2}\,\|L - M\|_F^2.$$
Performing the singular value decomposition on $M$, we have $M = U\Sigma V^{T}$, where the diagonal matrix of singular values is represented by $\Sigma$. The $j$th singular value of $M$ is denoted as $\sigma_j(M)$, and the $j$th singular value of $L_k$ is $\sigma_j(L_k)$. According to the singular value soft-thresholding operator [43], the optimal solution for $L$ is given by
$$\sigma_j(L_{k+1}) = \max\Big(\sigma_j(M) - \frac{\nabla\phi\big(\sigma_j(L_k)\big)}{\mu},\, 0\Big).$$
Therefore,
$$L_{k+1} = U\,\operatorname{diag}\big(\boldsymbol{\sigma}(L_{k+1})\big)\,V^{T},$$
where $\boldsymbol{\sigma}(L_{k+1})$ is the column vector composed of the $\sigma_j(L_{k+1})$, and $U$ and $V$ are matrices obtained from the singular value decomposition of $M$.
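The second step can likewise be sketched as a weighted singular value thresholding, where `w[j]` stands for the super-gradient $\nabla\phi(\sigma_j(L_k))$ assumed above (an illustrative sketch, not the authors' implementation):

```python
import numpy as np

def weighted_svt(M, w, mu):
    """Weighted singular value thresholding: shrink the j-th singular
    value of M by w[j] / mu and rebuild the matrix."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    sig_new = np.maximum(sig - w / mu, 0.0)  # per-singular-value thresholds
    return U @ np.diag(sig_new) @ Vt
```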
Finally, update the Lagrange multiplier $Y$ and the penalty parameter $\mu$:
$$Y_{k+1} = Y_k + \mu_k\,\big(D - L_{k+1} - S_{k+1}\big), \qquad \mu_{k+1} = \rho\,\mu_k.$$
If the convergence condition is satisfied in an iteration, meaning that the difference between the observed matrix $D$ and the reconstruction $L + S$ is smaller than a given threshold, then $L$ and $S$ are very close to the actual low-rank and sparse components, and the decomposition result can be considered stable. Therefore, $L$ and $S$ are output, and the entire algorithm concludes.
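Putting the three steps together, the following NumPy sketch mirrors the overall iteration; it is a minimal illustrative skeleton assuming the $\gamma$-norm of Equation (17) and a logarithmic-type sparse penalty, with placeholder parameter defaults rather than the paper's tuned values:

```python
import numpy as np

def nca_rpca(D, lam, gamma=0.01, mu=1.0, rho=1.5, tol=1e-7, max_iter=500):
    """Minimal IALM sketch for the NCA-RPCA model (illustrative, not optimized)."""
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    Y = np.zeros_like(D)
    for _ in range(max_iter):
        # Step 1: update S by weighted soft-thresholding; the weights are the
        # super-gradient of a log-type penalty at the previous iterate (assumed form).
        w_s = 1.0 / (np.abs(S) + gamma)
        T = D - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam * w_s / mu, 0.0)
        # Step 2: update L by weighted singular value thresholding; the weights
        # are the gradient of the gamma-norm surrogate at the previous iterate.
        M = D - S + Y / mu
        U, sig, Vt = np.linalg.svd(M, full_matrices=False)
        sig_L = np.linalg.svd(L, compute_uv=False)
        w_l = (1.0 + gamma) * gamma / (gamma + sig_L) ** 2
        L = U @ np.diag(np.maximum(sig - w_l / mu, 0.0)) @ Vt
        # Step 3: gradient ascent on the multiplier, then grow the penalty.
        R = D - L - S
        Y = Y + mu * R
        mu = rho * mu
        # Convergence: relative reconstruction error below the tolerance.
        if np.linalg.norm(R, "fro") <= tol * np.linalg.norm(D, "fro"):
            break
    return L, S
```

A call such as `L, S = nca_rpca(D, lam=1.0 / np.sqrt(max(D.shape)))` then returns the two components for comparison with the procedure in Figure 7.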
Based on the algorithm procedure described above, a flowchart can be constructed. Assuming the input original data matrix is $D$ and the outputs $L$ and $S$ are the low-rank and sparse components of $D$, the flowchart of the NCA-RPCA algorithm is illustrated in Figure 7.