Article

Improved RPCA Method via Fractional Function-Based Structure and Its Application

College of Electronic Science and Technology, National University of Defense Technology, Changsha 410000, China
* Author to whom correspondence should be addressed.
Information 2025, 16(1), 69; https://doi.org/10.3390/info16010069
Submission received: 24 December 2024 / Accepted: 6 January 2025 / Published: 20 January 2025
(This article belongs to the Section Artificial Intelligence)

Abstract

With the advancement of oil logging techniques, vast amounts of data have been generated. However, this data often contains significant redundancy and noise, so logging data must be denoised before it is used for oil logging recognition. This paper therefore proposes an improved robust principal component analysis (IRPCA) algorithm for logging data denoising, which addresses both the various types of noise introduced during oil logging data acquisition and the limitations of conventional data processing methods. The IRPCA algorithm enhances both the efficiency of the model and the accuracy of low-rank matrix recovery. This improvement is achieved primarily by introducing an approximate zero norm based on the fractional function structure and by adding weighted kernel parametrization and penalty terms to strengthen the model's capability to handle complex matrices. The efficacy of the proposed IRPCA algorithm is verified through simulation experiments, demonstrating its superiority over the widely used RPCA algorithm. We then present a denoising method tailored to the characteristics of logging data and based on the IRPCA algorithm. This method first separates the original logging data into background and foreground information. The background information is then further separated to isolate the true background from the noise, yielding the denoised logging data. The results indicate that the IRPCA algorithm is practical and effective when applied to the denoising of actual logging data.

1. Introduction

Oil is the lifeblood of industry, and China has emerged as the world's largest oil importer. As China's dependence on foreign oil intensifies, the mismatch between supply and demand becomes increasingly apparent [1]. One effective strategy for mitigating the oil supply–demand conflict is the accurate identification of oil reservoirs, which can stabilize oil production and enhance the development of oil reserves.
The logging data curve is a data signal reflecting changes in physical properties with well depth and serves as the foundation for determining various parameters of oil and gas reservoirs in oil logging recognition. Logging data curves offer invaluable insights into identifying subsurface sedimentation and analyzing the distribution of subsurface material layers [2,3].
Employing logging data curves for lithology identification is faster and more cost-effective than other methods. Logging data processing primarily encompasses data pre-processing, attribute reduction, and classification, and well logging data denoising is the most critical task in logging data pre-processing. In the process of logging data denoising, challenges are inevitable due to sparse and inhomogeneous sampling methods, which lead to redundant information, noise, and even misinterpretation. By technically reconstructing the logging data [4], we can render the geophysical information more comprehensive, accurately reflect the characteristics of the underground geological body, ensure the precision of complex geological reconstructions, and provide more effective guidelines and references for logging data mining.
The application of robust principal component analysis (RPCA) in logging data processing [5] merits in-depth study, particularly in the contexts of logging data cleaning and data mining. Regarding well logging data denoising, the measurement instruments or sampling methods used to collect the data often result in a substantial amount of redundancy and noise in the well logging data from each field, especially in the harsh down-hole environments where various errors are inevitable. Consequently, to further enhance the mining efficacy of logging data, the development of novel logging data denoising methods for effective denoising is an urgent matter of concern.
Several common methods exist for solving the standard RPCA model, such as the Iterative Thresholding Algorithm (IT) [6], Accelerated Proximal Gradient (APG) [7], the Augmented Lagrange Multiplier Method (ALM) [8], Exact ALM method (EALM) [9], and Inexact ALM method (IALM) [10].
Presently, the RPCA model and its enhanced optimization algorithm have been employed for feature extraction, dimensionality reduction, subspace segmentation, etc., in the context of image signal and voice data [11,12,13]. Similarly, the advanced concept of sparse decomposition in the RPCA model can be utilized in logging data processing, effectively integrating the features and advantages of both to achieve superior denoising results [14,15,16].
The fundamental concept behind enhancing RPCA algorithms involves addressing the parametric optimization problem intrinsic to the RPCA model [12]. However, a prevalent issue with these algorithms is that they slow down and their recovery error grows as the dimensionality of the input matrix increases. The study in [17] suggests employing a smoothing function adhering to specific value rules to approximate the minimization parametrization; the resulting algorithm operates faster and exhibits superior recovery accuracy for the input matrix compared to other algorithms under identical experimental conditions. Candes [18] introduced the concept of weighted parametric minimization to enhance the sparse decomposition capacity of the low-rank matrix recovery model; however, the results indicated that this approach degrades the stability of the solution to a certain degree. Peng [19] proposed weighted kernel parametrization to augment the low-rankness and sparsity of the matrix, thereby increasing the efficiency of matrix recovery. Zou [20] demonstrated that, in solving the low-rank matrix recovery model, the stability and sparsity of the solution cannot be achieved simultaneously; to guarantee the solution's accuracy, the algorithm analysis must take both factors into account. RPCA algorithms can also leverage both static and dynamic functions to augment their performance under sparsity constraints [21,22,23]. Rekavandi [24] introduced four α-divergence-based RPCA methods, which provide a robust and flexible framework for signal recovery in fMRI and for foreground–background separation in video analysis. The main idea of those algorithms is to extract the principal loading vectors and their corresponding PCs by minimizing a cost function derived from the α-divergence between the sample density (obtained using the observed data) and the nominal density model. These methods effectively mitigate the impact of both structured and unstructured outliers.
Our goal is to construct a faster and more robust matrix recovery method from RPCA that preserves solution stability while better exploiting the low-rankness of the observation matrix and the sparsity of the sparse matrix. Moreover, there is a practical demand for oil logging denoising methods. We therefore make two main contributions in this study. First, we propose an approximate zero norm based on the structure of the fractional function, use it to construct the objective optimization function, and then optimize the RPCA model with weighted kernel parametrization and penalty terms. Second, we propose a new IRPCA-based method for denoising oil logging data.
This paper is organized as follows. Section 2 introduces the basic principles of RPCA. Section 3 elucidates the details of the Improved RPCA. Section 4 presents experimental results to demonstrate the efficiency of IRPCA in both simulation experiments and logging data denoising experiments. Lastly, we present our conclusions in Section 5.

2. Related Work

Principle of RPCA

Similar to classical principal component analysis (PCA) [25], robust principal component analysis (RPCA) [26] fundamentally involves finding the optimal projection of the data onto a low-dimensional space. PCA assumes that the data noise follows a Gaussian distribution and can be thrown off by significant noise or severe outliers, so it may fail to yield the desired results; RPCA, in contrast, assumes that the data noise is sparse, makes no assumption about its intensity, and can recover the underlying low-rank data even when the observed data is voluminous and noisy. RPCA is primarily solved through regularization: it first establishes the overall objective function, then imposes constraints to ensure the objective function has a unique solution, and finally solves the objective function with an optimization method. The principle of RPCA is described as follows:
In numerous practical applications, the observation matrix $D \in \mathbb{R}^{m \times n}$ of a signal frequently exhibits a low-rank or approximately low-rank structure while being corrupted by substantial sparse noise. To recover the low-rank structure of the original matrix and eliminate the sparse noise, the matrix $D$ is expressed as the sum of a low-rank matrix and a sparse matrix. The recovery of the low-rank matrix $A$ can then be formulated as the bi-objective optimization problem in Equation (1):
$$\min_{A,E}\ \big(\operatorname{rank}(A),\ \|E\|_0\big) \quad \text{s.t.}\quad D = A + E \qquad (1)$$
Subsequently, the bi-objective optimization problem of Equation (1) is transformed into the minimization problem of Equation (2) by introducing a compromise factor $\lambda > 0$, which balances the low rank of matrix $A$ against the sparsity of matrix $E$.
$$\min_{A,E}\ \operatorname{rank}(A) + \lambda \|E\|_0 \quad \text{s.t.}\quad D = A + E \qquad (2)$$
where $\operatorname{rank}(A)$ denotes the rank of the matrix and $\|\cdot\|_0$ denotes the number of non-zero elements in the sparse matrix.
The minimization problem in Equation (2) requires convex relaxation because both the rank function and the $\ell_0$ norm are discontinuous functions in space. Equation (2) is therefore relaxed into Equation (3), where $\|A\|_*$ is the nuclear norm of $A$ and $\|E\|_1$ is the $\ell_1$ norm of $E$.
$$\min_{A,E}\ \|A\|_* + \lambda \|E\|_1 \quad \text{s.t.}\quad D = A + E \qquad (3)$$
In Equation (3), the singular value decomposition of the matrix is set as $A_0 = U\Sigma V^T = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, which must satisfy the incoherence conditions on the parameter $\mu$ shown in Equation (4):

$$\max_i \left\|U^T e_i\right\|^2 \le \frac{\mu r}{n_1}, \qquad \max_i \left\|V^T e_i\right\|^2 \le \frac{\mu r}{n_2}, \qquad \left\|U V^T\right\|_\infty \le \sqrt{\frac{\mu r}{n_1 n_2}} \qquad (4)$$
where $e_i$ denotes the $i$-th standard unit vector.
When the elements of the sparse matrix $E_0$ obey a uniform distribution and the number of non-zero elements is $m$, exact recovery requires the rank of the low-rank matrix to satisfy Equation (5):

$$\operatorname{rank}(A_0) \le \rho_r\, n_2\, \mu^{-1} (\log n_1)^{-2}, \qquad m \le \rho_s\, n_1 n_2 \qquad (5)$$

where the low-rank rate $\rho_r > 0$, the sparsity rate $\rho_s > 0$, and $\lambda = 1/\sqrt{\max(n_1, n_2)}$. There then exists a constant $c$ such that the low-rank matrix is recovered with probability at least $1 - c\, n_1^{-10}$ under Equation (5).
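For orientation before the improvements, the following is a minimal NumPy sketch of how the relaxed model in Equation (3) is typically solved with an inexact ALM iteration in the spirit of [10]. The function names, initialization, and stopping constants are our illustrative choices, not the paper's implementation.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink the singular values of M by tau."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(M, tau):
    """Element-wise soft thresholding, the proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def rpca_ialm(D, max_iter=500, tol=1e-7):
    """Solve min ||A||_* + lam * ||E||_1  s.t.  D = A + E  (Equation (3))."""
    n1, n2 = D.shape
    lam = 1.0 / np.sqrt(max(n1, n2))     # compromise factor, cf. Equation (5)
    Y = np.zeros_like(D)                 # Lagrange multiplier
    mu, rho = 1.25 / np.linalg.norm(D, 2), 1.5
    A = np.zeros_like(D)
    E = np.zeros_like(D)
    for _ in range(max_iter):
        A = svt(D - E + Y / mu, 1.0 / mu)      # low-rank update
        E = shrink(D - A + Y / mu, lam / mu)   # sparse update
        R = D - A - E
        Y = Y + mu * R                         # multiplier update
        mu = rho * mu                          # penalty update
        if np.linalg.norm(R, 'fro') / np.linalg.norm(D, 'fro') < tol:
            break
    return A, E
```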

3. Proposed Methodology

3.1. Improved RPCA Model Based on Approximate-Zero Norm (Fractional Function)

To achieve superior sparse signal reconstruction, sparse matrix optimization based on an approximate zero norm replaces the problem of minimizing the $\ell_0$ norm with the minimization of an approximating norm. Continuous smooth functions are considerably easier to manipulate than discontinuous functions in actual computation; hence, an approximation norm based on a smoothing function is used to approximate the discontinuous $\ell_0$ norm. Furthermore, the smoothing function must align with the trend of the $\ell_0$ norm, satisfying $f(x_i) = 1$ for $x_i \neq 0$, $f(x_i) = 0$ for $x_i = 0$, and $\|x\|_0 = \sum_{i=1}^n f(x_i)$. This facilitates the transformation from the optimization problem over the $\ell_0$ norm to an optimization problem over the smoothing function.
There are various types of smoothing functions. In this paper, we propose using the simplest fractional function $\rho_b(t)$ to replace the discontinuous $\ell_0$ norm. The graph of the fractional function is shown in Figure 1.

$$\rho_b(t) = \frac{bt}{bt + 1.5}, \qquad t \ge 0 \qquad (6)$$

where $b \in (0, +\infty)$.
As shown in Figure 1, $\rho_b(0) = 0$ when $t = 0$, and for $t \neq 0$, $\rho_b(t) \to 1$ as $b \to +\infty$. The fractional function $\rho_b(t)$ can therefore approximate $\|x\|_0$, which transforms Equation (3) into Equation (7) for solving the optimization problem under the conditions of [6]. Equation (7) is presented below:

$$\min_{A,E}\ \|A\|_* + \lambda P_b(E) \quad \text{s.t.}\quad D = A + E \qquad (7)$$

where $P_b(x) = \sum_{i=1}^N \rho_b(x_i)$.
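To make the approximation concrete, the short sketch below evaluates the fractional function of Equation (6) and shows how, for large $b$, the sum $P_b(x)$ approaches $\|x\|_0$; the printed values in the comments are approximate.

```python
import numpy as np

def rho_b(t, b=1.3):
    """Fractional function of Equation (6), applied element-wise for t >= 0."""
    t = np.abs(t)
    return (b * t) / (b * t + 1.5)

x = np.array([0.0, 1e-3, 0.5, 10.0])   # one zero entry, three non-zero entries
print(rho_b(x))                # ~[0, 0.0009, 0.30, 0.90] for the default b = 1.3
print(rho_b(x, b=1e4).sum())   # ~2.9, close to ||x||_0 = 3 as b grows large
```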

3.2. Improved RPCA Model Based on Weighted Kernel Parametrization and Penalty Terms

To enhance the robustness of the solution model, an improvement strategy involving weighted kernel parametrization with penalty terms can be integrated into Equation (7). The rank of a low-rank matrix equals the number of non-zero singular values in the RPCA model, so the minimization of $\|\sigma(A)\|_0$ is equivalent to that of $\operatorname{rank}(A)$, where $\sigma(A) = (\sigma_i(A))$ denotes the vector of singular values of the matrix. The weighted kernel parametrization is accordingly written as $\sum_i w_i \sigma_i(A)$, where $w = (w_i)$ weights the individual singular values. If $w_i = 1/\sigma_i(A)$ for each component with $\sigma_i(A) \neq 0$, then $\sum_i w_i \sigma_i(A) = \operatorname{rank}(A)$. Consequently, by adopting weighted kernel parametrization, the RPCA model incorporating this technique is derived from Equation (7) as shown below:

$$\min_{A,E}\ \sum_j w_{A,j}\, \sigma_j + \lambda P_b(E) \quad \text{s.t.}\quad D = A + E \qquad (8)$$

where $w_A = \{w_{A,j}\}$ are the weights of the singular values of the low-rank matrix $A$.
It is essential to consider the interplay between sparsity and robustness when solving Equation (8). Therefore, building upon Equation (8), a Frobenius-norm penalty is introduced to regulate the robustness of the recovered solution, while the weights of the weighted kernel parametrization govern the sparsity of the matrix being recovered, striking a balance between robustness and sparsity within the RPCA solution framework. With these two enhancements, the recovery accuracy of the RPCA model can be significantly improved. The refined model is given in Equation (9):

$$\min_{A,E}\ \sum_j w_{A,j}\, \sigma_j + \lambda P_b(E) + \lambda_2 \|A\|_F^2 \quad \text{s.t.}\quad D = A + E \qquad (9)$$

where $P_b(x) = \sum_{i=1}^N \rho_b(x_i)$, $w_A = \{w_{A,j}\}$, $\sigma_j$ are the singular values of matrix $A$, and $\lambda_2 \|A\|_F^2$ is the penalty term on matrix $A$.
To solve the optimization problem in Equation (9), the values of $w_A$ must be determined first. The update procedure for $w_A$ is given in Algorithm 1.
Algorithm 1. Update Steps of $w_A$
1: Initialize $i = 0$, $w_A^{(0)} = \mathbf{1} \in \mathbb{R}^N$.
2: Alternately update the matrices $A$ and $E$.
3: Update $w_{A,j}^{(i+1)} = 1 / (\sigma_j^{(i)} + \varepsilon_A)$, where $\varepsilon_A = 0.01$.
4: If $i$ reaches its maximum value, output $A$ and $E$; otherwise set $i = i + 1$ and return to step 2.
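Step 3 of Algorithm 1 reduces to a one-line reweighting; the sketch below illustrates it (the function name is ours):

```python
import numpy as np

def update_weights(sigma, eps_A=0.01):
    """Algorithm 1, step 3: w_{A,j} = 1 / (sigma_j + eps_A).
    Small singular values receive large weights and are shrunk harder."""
    return 1.0 / (np.asarray(sigma) + eps_A)

print(update_weights([5.2, 1.1, 0.0]))  # -> [~0.19, ~0.90, 100.0]
```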
After the values of $w_A$ have been obtained by Algorithm 1, $w_A$ is held constant in Equation (9). We can then construct the augmented Lagrangian function of Equation (9), introducing an auxiliary variable $B$ with the constraint $B = A$, to obtain Equation (10):

$$L(A, E, Y, \mu) = \sum_j w_{A,j}\,\sigma_j + \lambda P_b(E) + \lambda_2 \|B\|_F^2 + \langle Y, D - A - E\rangle + \frac{\mu_1}{2}\|D - A - E\|_F^2 \quad \text{s.t.}\quad B = A \qquad (10)$$

Letting $Y = (Y_1, Y_2)$ and $\mu = (\mu_1, \mu_2)$, this results in Equation (11):

$$L(A, E, B, Y, \mu) = \sum_{j=1}^n w_{A,j}\,\sigma_j + \lambda P_b(E) + \lambda_2\|B\|_F^2 + \langle Y_1, D - A - E\rangle + \frac{\mu_1}{2}\|D - A - E\|_F^2 + \langle Y_2, B - A\rangle + \frac{\mu_2}{2}\|B - A\|_F^2 \qquad (11)$$

where $Y \in \mathbb{R}^{m \times n}$ are the Lagrange multipliers, $\mu > 0$ are the penalty parameters, and $\langle \cdot , \cdot \rangle$ denotes the standard inner product. We use the alternating variable update method [1] to solve $\min_{A,E} L(A, E, Y_k, \mu_k)$: the matrices $A$ and $E$ are updated in alternating iterations until the convergence condition is satisfied.
Then, based on Equation (11), we update matrix $A$ with $E = E_k$ fixed; Equation (12) is shown below:

$$A_{k+1} = \arg\min_A L(A, E_k, Y_k, \mu_k) = \arg\min_A \sum_{j=1}^n w_{A,j}\,\sigma_j + \frac{\mu_1}{2}\left\|D - A - E + \mu_1^{-1} Y_1\right\|_F^2 + \frac{\mu_2}{2}\left\|B - A + \mu_2^{-1} Y_2\right\|_F^2 = \mathcal{D}_{w_A (\mu_1 + \mu_2)^{-1}}\!\left(\mu_1\!\left(D - E + \mu_1^{-1} Y_1\right) + \mu_2\!\left(B + \mu_2^{-1} Y_2\right)\right) \qquad (12)$$

where $\mathcal{D}_\varepsilon(Q) = U S_\varepsilon(\Sigma) V^T$ is the singular value thresholding operator and $S_\varepsilon(\cdot)$ denotes soft thresholding (shrinkage) of the singular values.
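A sketch of this weighted singular value operator is shown below, under the assumption that each singular value $\sigma_j$ is shrunk by its own threshold $w_j \cdot \mathrm{scale}$ with $\mathrm{scale} = (\mu_1 + \mu_2)^{-1}$; the function name is ours.

```python
import numpy as np

def weighted_svt(Q, w, scale):
    """Weighted singular value operator of Equation (12): shrink the j-th
    singular value of Q by w[j] * scale and truncate at zero."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    s_shrunk = np.maximum(s - scale * np.asarray(w)[:len(s)], 0.0)
    return U @ np.diag(s_shrunk) @ Vt
```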
Then, we use the updated matrix $A_{k+1}$ to iteratively update matrix $E$:

$$E_{k+1} = \arg\min_E L(A_{k+1}, E, Y_k, \mu_k) = \arg\min_E\ \lambda\mu_2^{-1} P_b(E) + \frac{1}{2}\left\|E - \left(D - A_{k+1} + Y_2/\mu_2\right)\right\|_F^2 \qquad (13)$$

We update $Y_1$ and $Y_2$ by Equation (14) with $A = A_{k+1}^*$ and $E = E_{k+1}^*$:

$$Y_1^{k+1} = Y_1^k + \mu_1^k\left(D - A_{k+1} - E_{k+1}\right), \qquad Y_2^{k+1} = Y_2^k + \mu_2^k\left(B_{k+1} - A_{k+1}\right) \qquad (14)$$

We update $\mu$ by Equation (15):

$$\mu_{k+1} = \begin{cases} \rho\,\mu_k, & \text{if } \left\|E_{k+1}^* - E_k^*\right\|_F / \|D\|_F < \varepsilon \\ \mu_k, & \text{otherwise} \end{cases} \qquad (15)$$

where $\rho > 1$ and $\varepsilon > 0$.
Because Equation (13) is a non-convex optimization problem, we use the difference-of-convex-functions algorithm (DCA) to transform Equation (13) into the decomposition $f(E) = g(E) - h(E)$ of Equation (16):

$$\begin{aligned} f(E) &= \lambda\mu_2^{-1} P_b(E) + \frac{1}{2}\left\|E - \left(D - A_{k+1} + Y_2/\mu_2\right)\right\|_F^2 \\ g(E) &= \frac{1}{2}\left\|E - \left(D - A_{k+1} + Y_2/\mu_2\right)\right\|_F^2 + \lambda\mu_2^{-1}\|E\|_1 \\ h(E) &= \lambda\mu_2^{-1}\|E\|_1 - \lambda\mu_2^{-1} P_b(E) \end{aligned} \qquad (16)$$

Linearizing $h$ then yields Equations (17) and (18):

$$V_{k+1} = \frac{\lambda}{\mu_2^k}\operatorname{sign}(E_k) - \frac{\lambda}{\mu_2^k}\cdot\frac{b\operatorname{sign}(E_k)}{b|E_k| + 1} + \frac{\lambda}{\mu_2^k}\cdot\frac{b^2 E_k}{\left(b|E_k| + 1\right)^2} \qquad (17)$$

$$E_{k+1} = \arg\min_E\ \frac{1}{2}\left\|E - \left(D - A_{k+1} + Y_2/\mu_2\right)\right\|_F^2 + \lambda\mu_2^{-1}\|E\|_1 - \left\langle E, V_{k+1}\right\rangle \qquad (18)$$

Based on Equation (18), we obtain the update rule for matrix $E$ in Equation (19):

$$E_{k+1} = S_{\mu_2^{-1}\lambda}\left(V_{k+1} + D - A_{k+1} + Y_2/\mu_2\right) \qquad (19)$$
The update of $B$ is obtained from Equation (20):

$$\arg\min_B\ \lambda_2\|B\|_F^2 + \frac{\mu_2}{2}\left\|B - A + \mu_2^{-1} Y_2\right\|_F^2 \qquad (20)$$

Because Equation (20) is convex, $B$ has the closed-form solution of Equation (21):

$$B = \frac{\mu_2}{2\lambda_2 + \mu_2}\left(A - \mu_2^{-1} Y_2\right) \qquad (21)$$
The pseudo-code of IRPCA is shown in Algorithm 2; the flow chart of IRPCA is shown in Figure 2.
Algorithm 2. Pseudo-Code of IRPCA
Input: Observation matrix $D$, $\lambda$, $b = 1.3$.
1: Initialize $Y^0$, $E^0 = 0$, $\mu^0 > 0$, $\rho > 1$, $k = 0$.
2: while not converged do
3:   $M = \mu_1\left(D - E + \mu_1^{-1} Y_1\right) + \mu_2\left(B + \mu_2^{-1} Y_2\right)$
4:   $(U, \Sigma, V) = \operatorname{svd}(M)$
5:   $B = \mu_2 (2\lambda_2 + \mu_2)^{-1}\left(A - \mu_2^{-1} Y_2\right)$
6:   $A_{k+1} = U S_{w_A(\mu_1 + \mu_2)^{-1}}(\Sigma)\, V^T$
7:   $V_{k+1} = \mu_2^{-1}\lambda\operatorname{sign}(E_k) - \mu_2^{-1}\lambda\operatorname{sign}(E_k)\dfrac{b}{b|E_k| + 1} + \mu_2^{-1}\lambda\dfrac{b^2 E_k}{\left(b|E_k| + 1\right)^2}$
8:   $E_{k+1} = S_{\mu_2^{-1}\lambda}\left(V_{k+1} + D - A_{k+1} + Y_2/\mu_2\right)$
9:   $Y_1^{k+1} = Y_1^k + \mu_1^k\left(D - A_{k+1} - E_{k+1}\right)$, $Y_2^{k+1} = Y_2^k + \mu_2^k\left(B_{k+1} - A_{k+1}\right)$, $\mu_2^{k+1} = \rho\mu_2^k$, $\mu_1^{k+1} = \mu_2^{k+1}$
10:   $k = k + 1$
11: end while
Output: $A_k$, $E_k$.
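To summarize the loop, the following is a condensed NumPy sketch of Algorithm 2 under the updates above. Two points are assumptions on our part rather than the paper's specification: the argument of the singular value step is divided by $(\mu_1 + \mu_2)$, as the proximal form of Equation (12) suggests, and the multiplier pairing follows Equation (11), so the $D = A + E$ residual is tied to $(Y_1, \mu_1)$. The default parameter values are illustrative.

```python
import numpy as np

def shrink(M, tau):
    """Element-wise soft thresholding S_tau(M)."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def irpca(D, lam=None, lam2=1e-2, b=1.3, rho=1.5, max_iter=300, tol=1e-7):
    """Sketch of Algorithm 2 (IRPCA): weighted singular value shrinkage for A,
    DC-linearized fractional penalty for E, closed-form B, multiplier updates."""
    n1, n2 = D.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(n1, n2))
    mu1 = mu2 = 1.25 / np.linalg.norm(D, 2)      # illustrative initialization
    A = np.zeros_like(D); E = np.zeros_like(D); B = np.zeros_like(D)
    Y1 = np.zeros_like(D); Y2 = np.zeros_like(D)
    w = np.ones(min(n1, n2))                     # Algorithm 1, step 1
    for _ in range(max_iter):
        # Steps 3, 4, 6: weighted singular value shrinkage, Eq. (12);
        # M is the proximal point combining the two quadratic terms.
        M = (mu1 * (D - E + Y1 / mu1) + mu2 * (B + Y2 / mu2)) / (mu1 + mu2)
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        s_new = np.maximum(s - w / (mu1 + mu2), 0.0)
        A = U @ np.diag(s_new) @ Vt
        w = 1.0 / (s_new + 0.01)                 # Algorithm 1, step 3
        # Step 5: closed-form B update, Eq. (21)
        B = (mu2 / (2.0 * lam2 + mu2)) * (A - Y2 / mu2)
        # Step 7: DC linearization V, Eq. (17)
        aE = np.abs(E)
        V = (lam / mu2) * (np.sign(E)
                           - np.sign(E) * b / (b * aE + 1.0)
                           + b ** 2 * E / (b * aE + 1.0) ** 2)
        # Step 8: sparse update, Eq. (19); the D = A + E residual is paired
        # with (Y1, mu1), consistent with Eq. (11)
        E = shrink(V + D - A + Y1 / mu1, lam / mu1)
        # Step 9: multiplier and penalty updates, Eqs. (14) and (15)
        Y1 = Y1 + mu1 * (D - A - E)
        Y2 = Y2 + mu2 * (B - A)
        mu2 = rho * mu2
        mu1 = mu2
        if np.linalg.norm(D - A - E, 'fro') / np.linalg.norm(D, 'fro') < tol:
            break
    return A, E
```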

4. Experiment and Analysis

4.1. Experiment Environment and Evaluation Index

The entire experimental setup in this paper was run on a system equipped with a 2.40 GHz Intel Core i7-3630QM CPU and 8 GB of RAM, using MATLAB R2015a on the Windows 7 operating system.
We employ two performance metrics to verify the efficacy of the algorithms: the iteration time and the error rate, where a lower error rate indicates better performance. The error rate is defined as $\|\hat{A} - A\|_F / \|A\|_F$, where $\hat{A}$ is the low-rank matrix recovered by the algorithm and $A$ is the low-rank matrix of the input data.
The input data is represented by $D = A + E$, where $A$ is the low-rank matrix of size $m \times m$, $m$ is the dimension of the matrix, and $E$ is the sparse matrix. The low-rank matrix $A$ is generated as follows: first, a set of basis vectors is generated; then $r$ of these basis vectors are selected; finally, a low-rank matrix $A$ of rank $r$ and dimension $m$ is constructed from random combinations of the selected vectors.
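As an illustration, the snippet below generates synthetic data in this spirit and evaluates the error rate; it reuses the irpca() sketch from Section 3, and the specific constants (random seed, 10% sparsity, the $[-25, 25]$ range) mirror the settings described in Section 4.2.2.

```python
import numpy as np

rng = np.random.default_rng(0)
m, r = 500, 25                        # dimension m and rank r = 0.05 m
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, m))  # rank-r matrix
E = np.zeros((m, m))                  # sparse noise: 10% of entries in [-25, 25]
mask = rng.random((m, m)) < 0.10
E[mask] = rng.uniform(-25, 25, mask.sum())
D = A + E

A_hat, E_hat = irpca(D)               # any of the compared solvers fits here
err = np.linalg.norm(A_hat - A, 'fro') / np.linalg.norm(A, 'fro')
print(f"rank(A_hat) = {np.linalg.matrix_rank(A_hat)}, error rate = {err:.2e}")
```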
To evaluate the performance of the algorithm, we partitioned the experiment into two segments: the first segment involves the simulation data experiment, and the second segment encompasses the oil logging data denoising experiment.

4.2. Denoising Experiment of Simulation Data

4.2.1. Algorithm Performance Evaluation with Varying Rank of the Matrix

First, we use noise-free simulation data to verify the improved RPCA (IRPCA), which we compare against APG [7], EALM [9], IALM [10], and Mo-ST0 [5]. We set $m = 10^3$. Each of the five algorithms is used to compute $\hat{A}$ and $\hat{E}$ for the object matrix. The iteration time and error rate of the five algorithms are compared for a constant matrix dimension $m = 10^3$ and a varying matrix rank. The results are depicted in Figure 3 and Figure 4 below.
As depicted in Figure 3 and Figure 4, the iteration time and error rate increase as the rank of the matrix increases. Not only does the IRPCA exhibit a lower iteration time and error rate compared to other algorithms, it also demonstrates high recovery accuracy and a smooth curve trend.

4.2.2. Algorithm Performance Evaluation with Varying Matrix Dimensions

Building on the above experiment, we add a sparse matrix $E$ containing 10% sparse noise to test the performance of the algorithms at the same rank-to-dimension ratio, where $\operatorname{rank}(A) = 0.05m$, $a = 0.1$, and the non-zero elements of $E$ lie in $[-25, 25]$. We use APG, EALM, IALM, Mo-ST0, and IRPCA to calculate $\hat{A}$ and $\hat{E}$ of the input data and record the iteration time, number of iterations, rank of $\hat{A}$, and error rate. The results are shown in Table 1.
As Table 1 shows, IRPCA achieves higher recovery accuracy and a lower iteration time than the other four algorithms, while IALM requires the fewest iterations.

4.2.3. Algorithm Performance Evaluation of Matrix Dimensionality with Noise

Building on the experiment in Section 4.2.2, we add mixed noise to the input data $D$: $E$ consists of 10% sparse noise, 20% Gaussian noise $\sim N(0, 1)$, and 70% Gaussian noise $\sim N(0, 0.01)$. This mixture approximates the distribution of real noise. The results are shown below in Table 2.
The results in Table 2 show that the performance of all five algorithms is degraded. However, the accuracy and efficiency of IRPCA are still better than those of the other four algorithms.
In this section, we evaluate the performance of the five aforementioned algorithms under various noise conditions. The results indicate that the data recovery capability of the algorithms diminishes as the level of noise intensifies. However, IRPCA consistently outperforms the others.

4.3. Denoising Experiment of Oil Logging Data

4.3.1. Procedure of Oil Logging Data Denoising

As illustrated in Figure 5, the logging data curve, which is a series of curves derived from geophysical logging instruments, encapsulates the most crucial attribute information within the formation. The changes in trend and distribution of these curves directly signify the depositional environment and characteristics of the formation. Typically, the interpretation of logging data curves provides insights into the distribution of oil or gas within a formation.
Obtaining accurate logging data curves is an essential prerequisite for ensuring high-precision processing and data mining of logging data. During the collection of logging data, redundant information and noise, introduced by various random factors, may degrade the effectiveness and value of logging data mining. Therefore, it is crucial to pre-process the logging data before further application.
Hence, we propose a denoising strategy for oil logging data. This strategy is predicated on the concept of separation. Initially, the foreground and background information are segregated from the logging curve using IRPCA, as detailed in Equation (22).
$$\min_{L,S}\ \operatorname{rank}(L) + \sigma\|S\|_0 \quad \text{s.t.}\quad I = L + S \qquad (22)$$

where $L$ is the low-rank part representing the background information and $S$ is the sparse part representing the foreground information.
Subsequently, the IRPCA model is further applied to L in order to decompose the information, thereby obtaining the low-rank and sparse matrix. The resulting denoising model is presented in Equation (23):
$$\min_{A,E}\ \operatorname{rank}(A) + \lambda\|E\|_0 \quad \text{s.t.}\quad D = A + E \qquad (23)$$
where A is the low-rank part after denoising and E is the sparse part of the noise.
Finally, we combine the low-rank part $A$ with the foreground information $S$ to obtain clean, denoised logging data.
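The whole pipeline can be summarized in a few lines; the sketch below reuses the irpca() function from Section 3 and assumes the noisy logging curves have been arranged into a matrix $I$ (for example, by stacking depth windows), which is our assumption about the data layout rather than a detail specified in the paper.

```python
import numpy as np

def denoise_logging(I):
    """Two-stage separation of Equations (22) and (23), reusing the irpca()
    sketch from Section 3 (assumption: any RPCA-type solver fits this role)."""
    L, S = irpca(I)   # stage 1: background L and foreground S, Eq. (22)
    A, E = irpca(L)   # stage 2: true background A and residual noise E, Eq. (23)
    return A + S      # recombine denoised background with the foreground
```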

4.3.2. Experiment of Oil Logging Data Denoising

To evaluate the effectiveness of the denoising scheme proposed for logging data in the previous section, we utilize the real data from well W1 for the denoising experiment. Detailed information regarding W1 can be found in Table 3.
In the denoising experiment, 10% and 40% random noise were, respectively, introduced into the logging data from well W1 for denoising, and the performance of IRPCA was compared with that of the other four algorithms.
To more visually illustrate the process of denoising the logging data, the GR and SP attributes of well W1, with 10% random noise added, were selected for data denoising.
Below is a demonstration of the denoising process for oil data:
The background information L and foreground information S are separated from the noisy logging data, as shown in Figure 6.
The separated background information, denoted as $L$, still contains some residual noise, prompting us to further apply the IRPCA decomposition to $L$. The denoised logging data is obtained by combining the low-rank signal, denoted as $A$, with the foreground signal, denoted as $S$. Figure 7 shows the information curves of the denoised GR and SP logging data juxtaposed with the original data; the near-coincidence of the two curves indicates effective recovery.
The results of the denoising process applied to the logging data with 10% random noise are presented in Table 4. Similarly, the results of the denoising process applied to the logging data with 40% random noise are presented in Table 5. The time of first iteration in Table 5 denotes the time required for the separation of foreground and background information, while the time of the second iteration refers to the time required for the subsequent separation.
As shown in Table 4, in the denoising of the five logging attributes, the EALM algorithm exhibits the highest iteration time, and APG and EALM exhibit higher error rates than the other three algorithms. The IALM algorithm requires less time and exhibits a lower error rate than EALM and APG. The IRPCA algorithm delivers the best denoising performance, with both the lowest error rate and the shortest iteration time.
As shown in Table 5, the error rates of the five algorithms increase with the proportion of random noise added. In contrast, the iteration time does not increase significantly, because the matrix dimension and rank do not change. The IALM algorithm exhibits a shorter iteration time and a lower error rate than EALM and APG. On this basis, IRPCA performs as anticipated, with a shorter iteration time, while maintaining the lowest error rate.
From the results of both the simulated data experiments and the oil logging data experiments, it is clear that the proposed algorithm enhances the efficiency of the computation and the accuracy of the recovery matrix compared to other algorithms.

5. Conclusions

In this paper, an Improved RPCA algorithm is proposed, which applies an approximate zero norm based on the fractional function structure to reconstruct the RPCA model and then adds weighted kernel parametrization and penalty terms to help the RPCA model enhance the low-rankness of the observation matrix and the sparsity of the sparse matrix. Simulated data experiments are first used to evaluate the performance of the five algorithms; according to the simulation results, IRPCA achieves a significant improvement in iteration time and error rate compared to the other four algorithms. We then detail the steps of oil logging data denoising. Finally, the results of the oil logging data experiments indicate that the IRPCA algorithm operates quickly while maintaining a low error rate, demonstrating its effectiveness in application.
Moreover, the results of the simulation experiments indicate that IRPCA can also be applied to a wide variety of optimization problems in other fields, suggesting promising prospects for practical application. The IRPCA model proposed in this paper achieves assured recovery accuracy while improving iteration efficiency; however, it cannot avoid problems such as slow training speed. Therefore, using a rapid optimization algorithm to fine-tune the parameters of the robust principal component analysis model could increase the training speed and improve the recovery accuracy. Likewise, optimizing the cost function by minimizing a dynamic divergence between the sample density (obtained from the observed data) and the nominal density model could further enhance the robustness and accuracy of the recovery. These challenges will be addressed, and practical applications further explored, in our subsequent research.

Author Contributions

Conceptualization, Y.-K.P. and S.P.; methodology, Y.-K.P. and S.P.; software, Y.-K.P.; validation, Y.-K.P.; formal analysis, Y.-K.P. and S.P.; investigation, Y.-K.P. and S.P.; resources, Y.-K.P.; data curation, Y.-K.P.; writing—original draft preparation, Y.-K.P.; visualization, S.P.; supervision, S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (No. 62106276).

Data Availability Statement

All data are available upon request from the corresponding author.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Hu, W.; Bao, J. Development trends of oil industry and China’s countermeasures. China Univ. Pet. Ed. Nat. Sci. 2018, 42, 1–10. [Google Scholar]
  2. Lai, J.; Han, N.R.; Jia, Y.W.; Ji, Y.; Wang, G.; Pang, X.; He, Z.; Wang, S. Detailed description of the sedimentary reservoir of a braided delta based on well logs. Geol. China 2018, 45, 304–318. [Google Scholar]
  3. Ellis, D. Well Logging for Earth Scientists; Springer: Dordrecht, The Netherlands, 2007. [Google Scholar]
  4. Hu, H.; Zhang, J.; Li, Z. A distortion correction method of lateral multi-lens video logging image. In Proceedings of the 2012 IEEE International Conference on Computer Science and Automation Engineering (CSAE), Zhangjiajie, China, 25–27 May 2012; IEEE: New York, NY, USA, 2012. [Google Scholar]
  5. Xia, X.; Gao, F. An Optimization Algorithm of Robust Principal Component Analysis and Its Application. In IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2019. [Google Scholar]
  6. Fornasier, M.; Rauhut, H. Iterative thresholding algorithms. Appl. Comput. Harmon. Anal. 2008, 25, 187–208. [Google Scholar] [CrossRef]
  7. Polson, N.G.; Scott, J.G.; Willard, B.T. Proximal Algorithms in Statistics and Machine Learning. Stat. Sci. 2015, 30, 559–581. [Google Scholar] [CrossRef]
  8. Wah, B.W.; Wang, T.; Shang, Y.; Wu, Z. Improving the Performance of Weighted Lagrange-Multiplier Methods for Nonlinear Constrained Optimization. Inf. Sci. 2000, 124, 241–272. [Google Scholar] [CrossRef]
  9. Lin, Z.; Chen, M.; Ma, Y. The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
  10. Lin, Z.; Ganesh, A.; Wright, J.; Wu, L.; Chen, M.; Ma, Y. Fast Convex Optimization Algorithms for Exact Recovery of a Corrupted Low-Rank Matrix; Report No. UILU-ENG-09-2214, DC-246; Coordinated Science Laboratory: Urbana, IL, USA, 2009. [Google Scholar]
  11. Bouwmans, T.; Sobral, A.; Javed, S.; Jung, S.K.; Zahzah, E.-H. Decomposition into Low-Rank Plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset. Comput. Sci. Rev. 2017, 23, 1–71. [Google Scholar] [CrossRef]
  12. Bouwmans, T.; Javed, S.; Zhang, H.; Lin, Z.; Otazo, R. On the Applications of Robust PCA in Image and Video Processing. Proc. IEEE 2018, 106, 1427–1457. [Google Scholar] [CrossRef]
  13. Yi, X.; Park, D.; Chen, Y. Fast Algorithms for Robust PCA via Gradient Descent. Adv. Neural Inf. Process. Syst. 2016, 29, 4152–4160. [Google Scholar]
  14. Succurro, M.; Arcuri, G.; Costanzo, G.D. A combined approach based on robust PCA to improve bankruptcy forecasting. Rev. Account. Financ. 2019, 18, 296–320. [Google Scholar] [CrossRef]
  15. Wang, S.; Xia, K.; Wang, L. Improved RPCA Method Via Non-Convex Regularisation for Image Denoising. IET Signal Process. 2020, 14, 269–277. [Google Scholar] [CrossRef]
  16. Han, G.; Wang, J.; Cai, X. Background subtraction based on modified online robust principal component analysis. Int. J. Mach. Learn. Cybern. 2017, 8, 1839–1852. [Google Scholar] [CrossRef]
  17. Mohimani, H.; Babaie-Zadeh, M.; Jutten, C. A Fast Approach for Overcomplete Sparse Decomposition Based on Smoothed L0 Norm. IEEE Trans. Signal Process. 2008, 57, 289–301. [Google Scholar] [CrossRef]
  18. Candes, E.J.; Wakin, M.B.; Boyd, S.P. Enhancing Sparsity by Reweighted ℓ1 Minimization. J. Fourier Anal. Appl. 2008, 14, 877–905. [Google Scholar] [CrossRef]
  19. Peng, Y.; Ganesh, A.; Wright, J. RASL: Robust Alignment by Sparse and Low-Rank Decomposition for Linearly Correlated Images. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 34, 2233–2246. [Google Scholar] [CrossRef]
  20. Zou, H.; Zhang, H.H. On the Adaptive Elastic-Net with a Diverging Number of Parameters. Ann. Stat. 2009, 37, 1733. [Google Scholar] [CrossRef] [PubMed]
  21. Rekavandi, A.M.; Seghouane, A.K.; Evans, R.J. Adaptive Matched Filter Using Non-Target Free Training Data. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020. [Google Scholar]
  22. Rekavandi, A.M.; Seghouane, A.K. Robust Principal Component Analysis Using Alpha Divergence. In Proceedings of the 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, United Arab Emirates, 25–28 October 2020. [Google Scholar]
  23. Rekavandi, A.M.; Seghouane, A.K.; Abed-Meraim, K. TRPAST: A Tunable and Robust Projection Approximation Subspace Tracking Method. IEEE Trans. Signal Process. 2023, 71, 2407–2419. [Google Scholar] [CrossRef]
  24. Rekavandi, A.M.; Seghouane, A.K.; Evans, R.J. Learning Robust and Sparse Principal Components with the α-Divergence. IEEE Trans. Image Process. 2024, 33, 3441–3455. [Google Scholar] [CrossRef]
  25. Jolliffe, I.T. Principal Component Analysis; Springer: New York, NY, USA, 2005. [Google Scholar]
  26. Candes, E.J.; Li, X.; Ma, Y.; Wright, J. Robust Principal Component Analysis? J. ACM 2011, 58, 11. [Google Scholar] [CrossRef]
Figure 1. Graph of the fractional function.
Figure 2. Flow chart of IRPCA.
Figure 3. Iteration time of the five algorithms when m = 10³.
Figure 4. Error rate of the five algorithms when m = 10³.
Figure 5. An example of logging data curves.
Figure 6. Background and foreground information of the signal: (a) GR; (b) SP.
Figure 7. Comparison between the recovery curve and the original curve: (a) GR; (b) SP.
Table 1. Performance comparison of low-rank matrix recovery algorithms.

Matrix Dimension | Algorithm | Iteration Time (s) | Number of Iterations | rank($\hat{A}$) | $\|\hat{A}-A\|_F/\|A\|_F$
500 | APG | 8.7932 | 91 | 25 | 1.4578 × 10⁻⁴
500 | EALM | 6.3506 | 49 | 25 | 1.3215 × 10⁻⁸
500 | IALM | 2.8904 | 24 | 25 | 3.5487 × 10⁻⁸
500 | Mo-ST0 | 2.3414 | 32 | 25 | 1.3154 × 10⁻⁸
500 | IRPCA | 1.8042 | 38 | 25 | 1.1235 × 10⁻⁸
800 | APG | 28.0172 | 92 | 40 | 1.2354 × 10⁻⁴
800 | EALM | 21.7095 | 46 | 40 | 5.5478 × 10⁻⁸
800 | IALM | 12.3727 | 24 | 40 | 6.2457 × 10⁻⁸
800 | Mo-ST0 | 11.5923 | 32 | 40 | 5.8975 × 10⁻⁸
800 | IRPCA | 10.1024 | 42 | 40 | 5.4325 × 10⁻⁸
1000 | APG | 58.4691 | 93 | 50 | 1.1245 × 10⁻⁴
1000 | EALM | 34.4025 | 47 | 50 | 5.1245 × 10⁻⁸
1000 | IALM | 18.1181 | 25 | 50 | 5.7897 × 10⁻⁸
1000 | Mo-ST0 | 14.1457 | 34 | 50 | 4.8974 × 10⁻⁸
1000 | IRPCA | 11.3245 | 38 | 50 | 4.4578 × 10⁻⁸
1500 | APG | 274.4469 | 93 | 75 | 1.3578 × 10⁻⁴
1500 | EALM | 106.2495 | 46 | 75 | 3.4578 × 10⁻⁸
1500 | IALM | 76.6090 | 26 | 75 | 4.4787 × 10⁻⁸
1500 | Mo-ST0 | 69.5547 | 35 | 75 | 3.9878 × 10⁻⁸
1500 | IRPCA | 56.5423 | 39 | 75 | 3.2457 × 10⁻⁸
2000 | APG | 706.6480 | 94 | 100 | 9.0457 × 10⁻⁵
2000 | EALM | 240.6892 | 44 | 100 | 2.4578 × 10⁻⁸
2000 | IALM | 202.8769 | 27 | 100 | 4.3741 × 10⁻⁸
2000 | Mo-ST0 | 196.3647 | 35 | 100 | 2.9878 × 10⁻⁸
2000 | IRPCA | 153.2442 | 39 | 100 | 2.4785 × 10⁻⁸
3000 | APG | 2359.0081 | 94 | 150 | 9.4557 × 10⁻⁵
3000 | EALM | 756.9117 | 45 | 150 | 2.7852 × 10⁻⁸
3000 | IALM | 661.3952 | 26 | 150 | 3.7892 × 10⁻⁸
3000 | Mo-ST0 | 621.1346 | 33 | 150 | 2.9784 × 10⁻⁸
3000 | IRPCA | 521.6254 | 40 | 150 | 2.4575 × 10⁻⁸
Table 2. Performance comparison of low-rank matrix recovery algorithms under mixed sparse noise.

Matrix Dimension | Algorithm | Iteration Time (s) | Number of Iterations | rank($\hat{A}$) | $\|\hat{A}-A\|_F/\|A\|_F$
500 | APG | 12.3189 | 123 | 315 | 0.0245
500 | EALM | 59.8645 | 575 | 370 | 0.0278
500 | IALM | 6.1080 | 35 | 288 | 0.0275
500 | Mo-ST0 | 6.1214 | 46 | 247 | 0.0232
500 | IRPCA | 4.5621 | 57 | 243 | 0.0219
800 | APG | 40.7880 | 124 | 502 | 0.0196
800 | EALM | 189.8796 | 536 | 592 | 0.0185
800 | IALM | 25.7993 | 35 | 461 | 0.0167
800 | Mo-ST0 | 21.2365 | 57 | 445 | 0.0159
800 | IRPCA | 17.5623 | 56 | 422 | 0.0151
1000 | APG | 77.6130 | 124 | 625 | 0.0185
1000 | EALM | 398.8512 | 518 | 741 | 0.0179
1000 | IALM | 57.1481 | 35 | 577 | 0.0154
1000 | Mo-ST0 | 42.9562 | 49 | 478 | 0.0141
1000 | IRPCA | 40.2154 | 56 | 465 | 0.0132
1500 | APG | 323.0590 | 124 | 932 | 0.0145
1500 | EALM | 1476.0963 | 485 | 1069 | 0.0155
1500 | IALM | 215.4624 | 36 | 863 | 0.0121
1500 | Mo-ST0 | 165.1245 | 49 | 698 | 0.0122
1500 | IRPCA | 143.2654 | 61 | 697 | 0.0117
2000 | APG | 819.0725 | 125 | 1241 | 0.0135
2000 | EALM | 3532.5535 | 461 | 1429 | 0.0121
2000 | IALM | 524.9827 | 36 | 1151 | 0.0118
2000 | Mo-ST0 | 375.1258 | 51 | 981 | 0.0107
2000 | IRPCA | 356.3264 | 61 | 986 | 0.0098
3000 | APG | 2815.5103 | 125 | 1848 | 0.0112
3000 | EALM | 11,534.6709 | 428 | 2171 | 0.0131
3000 | IALM | 1811.3096 | 36 | 1728 | 0.0089
3000 | Mo-ST0 | 1396.3699 | 57 | 1679 | 0.0083
3000 | IRPCA | 1325.5321 | 59 | 1563 | 0.0079
Table 3. Detailed information of well W1.

Depth | Attributes of Logging Data
2750 m–3550 m | GR, DT, SP, LLD, DEN, LLS, K
Table 4. Comparison of denoising performance for five logging attributes (with 10% random noise).

Logging Attribute | Algorithm | Number of First Iterations | Number of Second Iterations | Time of First Iteration (s) | Time of Second Iteration (s) | $\|\hat{A}-A\|_F/\|A\|_F$
DT | APG | 49 | 47 | 0.0269 | 0.0257 | 0.07141
DT | EALM | 11 | 10 | 0.9311 | 1.0513 | 0.07539
DT | IALM | 59 | 55 | 0.0234 | 0.0231 | 0.01284
DT | Mo-ST0 | 57 | 56 | 0.0195 | 0.0197 | 0.01158
DT | IRPCA | 49 | 46 | 0.0183 | 0.0176 | 0.01053
LLD | APG | 49 | 49 | 0.0192 | 0.0187 | 0.01245
LLD | EALM | 8 | 8 | 0.0828 | 0.0813 | 0.01321
LLD | IALM | 54 | 52 | 0.0185 | 0.0175 | 0.00233
LLD | Mo-ST0 | 51 | 49 | 0.0171 | 0.0169 | 0.00215
LLD | IRPCA | 49 | 49 | 0.0163 | 0.0160 | 0.00198
DEN | APG | 49 | 49 | 0.0358 | 0.0348 | 0.01123
DEN | EALM | 8 | 7 | 0.1103 | 0.1089 | 0.01215
DEN | IALM | 80 | 78 | 0.0244 | 0.0232 | 0.00196
DEN | Mo-ST0 | 64 | 61 | 0.0231 | 0.0231 | 0.00191
DEN | IRPCA | 76 | 76 | 0.0225 | 0.0223 | 0.00188
LLS | APG | 49 | 49 | 0.0273 | 0.0283 | 0.01224
LLS | EALM | 8 | 8 | 0.1233 | 0.1228 | 0.01256
LLS | IALM | 70 | 70 | 0.0208 | 0.0203 | 0.00294
LLS | Mo-ST0 | 61 | 62 | 0.0196 | 0.0195 | 0.00291
LLS | IRPCA | 67 | 67 | 0.0189 | 0.0188 | 0.00287
K | APG | 49 | 47 | 0.4571 | 0.3643 | 0.03451
K | EALM | 11 | 11 | 0.2163 | 0.1987 | 0.02534
K | IALM | 57 | 56 | 0.0802 | 0.0789 | 0.02575
K | Mo-ST0 | 52 | 51 | 0.0608 | 0.0609 | 0.2412
K | IRPCA | 53 | 53 | 0.0595 | 0.0597 | 0.02378
Table 5. Comparison of denoising performance for seven logging attributes (with 40% random noise).

Logging Attribute | Algorithm | Number of First Iterations | Number of Second Iterations | Time of First Iteration (s) | Time of Second Iteration (s) | $\|\hat{A}-A\|_F/\|A\|_F$
GR | APG | 49 | 49 | 0.7582 | 0.6852 | 0.03312
GR | EALM | 11 | 10 | 0.1785 | 0.1723 | 0.04212
GR | IALM | 56 | 53 | 0.0145 | 0.0133 | 0.00983
GR | Mo-ST0 | 47 | 50 | 0.0131 | 0.0129 | 0.00815
GR | IRPCA | 49 | 46 | 0.0113 | 0.0107 | 0.00789
DT | APG | 49 | 47 | 0.0298 | 0.0278 | 0.09831
DT | EALM | 11 | 10 | 0.2923 | 0.2488 | 0.08523
DT | IALM | 59 | 55 | 0.0284 | 0.0245 | 0.03212
DT | Mo-ST0 | 47 | 56 | 0.0213 | 0.0212 | 0.02935
DT | IRPCA | 49 | 48 | 0.0198 | 0.0187 | 0.02895
SP | APG | 49 | 48 | 0.0678 | 0.0655 | 0.02432
SP | EALM | 10 | 10 | 0.1145 | 0.1043 | 0.01985
SP | IALM | 56 | 55 | 0.0279 | 0.0246 | 0.00756
SP | Mo-ST0 | 48 | 47 | 0.0259 | 0.0257 | 0.00721
SP | IRPCA | 52 | 51 | 0.0253 | 0.0247 | 0.00693
LLD | APG | 49 | 49 | 0.0198 | 0.0189 | 0.03212
LLD | EALM | 9 | 8 | 0.0832 | 0.0825 | 0.03121
LLD | IALM | 54 | 52 | 0.0198 | 0.0189 | 0.01453
LLD | Mo-ST0 | 47 | 46 | 0.0174 | 0.0175 | 0.01325
LLD | IRPCA | 49 | 49 | 0.0173 | 0.0169 | 0.01234
DEN | APG | 49 | 49 | 0.0363 | 0.0353 | 0.02583
DEN | EALM | 8 | 8 | 0.1123 | 0.1093 | 0.01423
DEN | IALM | 80 | 78 | 0.0255 | 0.0243 | 0.01413
DEN | Mo-ST0 | 68 | 67 | 0.0231 | 0.0291 | 0.01385
DEN | IRPCA | 76 | 76 | 0.0223 | 0.0219 | 0.01332
LLS | APG | 49 | 49 | 0.0263 | 0.0253 | 0.02453
LLS | EALM | 8 | 8 | 0.1321 | 0.1312 | 0.01868
LLS | IALM | 70 | 70 | 0.0228 | 0.0203 | 0.00987
LLS | Mo-ST0 | 57 | 59 | 0.0229 | 0.0227 | 0.00912
LLS | IRPCA | 67 | 67 | 0.0212 | 0.0210 | 0.00878
K | APG | 49 | 47 | 0.4671 | 0.3982 | 0.07545
K | EALM | 11 | 11 | 0.2234 | 0.5978 | 0.04563
K | IALM | 57 | 56 | 0.0786 | 0.0745 | 0.02987
K | Mo-ST0 | 47 | 47 | 0.0714 | 0.0716 | 0.02812
K | IRPCA | 53 | 55 | 0.0632 | 0.0723 | 0.02789
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

