Article

Minimum Residual Sum of Squares Estimation Method for High-Dimensional Partial Correlation Coefficient

1 School of Mathematical Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China
2 School of Mathematics, Yunnan Normal University, Kunming 650500, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(20), 4311; https://doi.org/10.3390/math11204311
Submission received: 20 September 2023 / Revised: 10 October 2023 / Accepted: 13 October 2023 / Published: 16 October 2023
(This article belongs to the Special Issue Advances of Functional and High-Dimensional Data Analysis)

Abstract:
The partial correlation coefficient (Pcor) is a vital statistical tool employed across various scientific domains to decipher intricate relationships and reveal inherent mechanisms. However, existing methods for estimating Pcor often overlook the accuracy of its calculation. In response, this paper introduces a minimum residual sum of squares Pcor estimation method (MRSS), a high-precision approach tailored to high-dimensional scenarios. Notably, the MRSS algorithm reduces the estimation bias encountered with positive Pcor. Through simulations on high-dimensional data, encompassing both sparse and non-sparse conditions, MRSS consistently mitigates the estimation bias for positive Pcors, surpassing the other algorithms discussed. For instance, for large sample sizes ($n \ge 100$) with Pcor > 0, the MRSS algorithm reduces the MSE and RMSE by about 30–70% compared to other algorithms. The robustness and stability of the MRSS algorithm are demonstrated by a sensitivity analysis over variance and sparsity parameters. Stock data from China's A-share market are employed to showcase the practicality of the MRSS methodology.

1. Introduction

The partial correlation coefficient (Pcor) measures the correlation between two random variables, X and Y, after accounting for the effects of controlling variables Z, denoted by $\rho_{XY|Z}$. The Pcor essentially quantifies the unique relationship between X and Y, after removing the correlations between X and Z, and between Y and Z [1]. This correlation coefficient provides a more thorough comprehension of the connection between variables, untainted by the influence of confounding factors. Unlike the Pearson correlation coefficient, which only captures the direct correlation between random variables, the Pcor enables the identification of whether correlations stem from intermediary variables. This distinction enhances the precision and validity of statistical analyses.
The Pcor is a fundamental statistical tool for investigating intricate relationships and gaining a more profound comprehension of the underlying mechanisms in a variety of scientific fields, such as psychology, biology, economics, and the social sciences. When examining genetic markers and illness outcomes, biologists have used the Pcor to identify correlations while accounting for potential confounding factors [2,3,4]. Marrelec et al. utilised the partial correlation matrix to explore large-scale functional brain networks through functional MRI [5]. In the field of economics, Pcor assists in comprehending complex connections, including the interplay between interest rates and inflation, while considering other variables' influence [6]. The financial industry also employs Pcor to interpret connections and relationships between stocks in the financial markets [7,8]. For example, Michis proposed a wavelet procedure for estimating Pcor between stock market returns over different time scales and implemented it for portfolio diversification [9]. Using partial correlations within a complex network framework, Singh et al. examined the degree of globalisation and regionalisation of stock market linkages and how these linkages vary across different economic or market cycles [10]. Meanwhile, the employment of the Gaussian graphical model (GGM) technique in psychology has recently gained popularity for defining the relationships between observed variables; this technique employs Pcors to represent pairwise interdependencies, controlling for the influence of all other variables [11,12,13]. In the field of geography, a correlation analysis based on the Pcor of the fractal dimension of the variations of the H, Z, and D geomagnetic components has been implemented to study geomagnetic field component variations in Russia [14].
Several methodologies have been proposed over the years to estimate the Pcor in statistical analyses. For instance, Peng et al. introduced a Pcor estimation technique that relies on the sparsity of the partial correlation matrix and utilises sparse regression methods [3]. Khare et al. suggested a high-dimensional graphical model selection approach based on the use of pseudolikelihood [15]. Kim provided the R package "ppcor" for fast calculation of partial and semi-partial correlation coefficients [16]. Huang et al. introduced the kernel partial correlation coefficient as a measure of the conditional dependence between two random variables in various topological spaces [17]. Van Aert and Goos focused on calculating the sampling variance of the Pcor [18]. Hu and Qiu proposed a statistical inference procedure for the Pcor under the high-dimensional nonparanormal model [19]. However, these methods mainly centre on determining whether or not the partial correlation coefficient is zero, without adequate regard for the precision of the Pcor calculation or the efficacy of the algorithm. We analysed multiple high-dimensional algorithms and discovered notable Pcor estimation biases, particularly for positive Pcor; even with larger sample sizes, these biases persisted. Motivated by these findings, our primary goal is to put forward a Pcor estimation algorithm that increases estimation precision and diminishes the estimation bias for positive Pcor values.
This paper reviews current methods for estimating Pcor in high-dimensional data. We introduce a novel minimum residual sum of squares (MRSS) Pcor estimation method under high-dimensional conditions, aiming to mitigate the estimation bias for positive Pcor. The algorithm’s effectiveness is validated through simulation studies under sparse and non-sparse conditions and real data analysis on stock markets.
The sections are structured as follows: Section 2 outlines definitions and corresponding formulae for calculating Pcor, and examines common algorithms for estimating Pcor. Section 3 presents our Minimum Residual Sum of Squares Pcor estimation, designed to mitigate estimation bias for positive Pcor. In Section 4, we demonstrate the effectiveness of our proposed algorithm through simulation studies on high-dimensional data under both sparse and non-sparse conditions. Section 5 provides an analysis of real data related to stock markets, while Section 6 contains the conclusion.

2. Estimation for Partial Correlation Coefficient

2.1. Definition of Pcor

Classically, the partial correlation coefficient is defined as the correlation coefficient between the regression residuals from the linear models of the two variables on the controlling variables. Let X and Y be two random variables, and let $Z = [Z_1, Z_2, \ldots, Z_p]$ be p-dimensional controlling variables. Consider the linear regression models of X and Y, respectively, on the controlling variables Z,
$$X = \alpha_0 + \sum_{i=1}^{p} \alpha_i Z_i + \varepsilon, \qquad Y = \beta_0 + \sum_{i=1}^{p} \beta_i Z_i + \zeta,$$
where $\varepsilon$ and $\zeta$ are error terms. The partial correlation coefficient between X and Y conditional on Z, denoted by $\rho_{XY|Z}$, is defined as the correlation coefficient between the regression residuals $\varepsilon$ and $\zeta$, as follows:
$$\rho_{XY|Z} = \mathrm{cor}(\varepsilon, \zeta) = \frac{\mathrm{cov}(\varepsilon, \zeta)}{\sqrt{\mathrm{var}(\varepsilon)\,\mathrm{var}(\zeta)}}, \tag{1}$$
where $\mathrm{cor}(\cdot, \cdot)$ is the correlation coefficient of two random variables, $\mathrm{cov}(\cdot, \cdot)$ is the covariance of two random variables, and $\mathrm{var}(\cdot)$ is the variance of a random variable. Let the sample size be n. In conventional low-dimensional cases ($p < n$), ordinary least squares (OLS) is used to compute the residuals $\varepsilon$ and $\zeta$; subsequently, the Pcor is computed from the correlation coefficient of the residuals. However, the OLS method is not practical for high-dimensional cases ($p > n$); regularisation methods are introduced later to deal with such cases.
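To make the low-dimensional case concrete, the following R sketch applies definition (1) directly: it regresses X and Y on Z by OLS and correlates the residuals. The data, dimensions, and variable names here are illustrative assumptions, not taken from the paper.

```r
# Minimal R sketch of definition (1) for p < n: regress X and Y on Z by OLS,
# then correlate the residuals. Data are simulated purely for illustration.
set.seed(1)
n <- 200; p <- 5
Z <- matrix(rnorm(n * p), n, p)
X <- drop(Z %*% rnorm(p) + rnorm(n))
Y <- drop(Z %*% rnorm(p) + rnorm(n))

eps  <- residuals(lm(X ~ Z))   # residual of X after removing Z
zeta <- residuals(lm(Y ~ Z))   # residual of Y after removing Z
cor(eps, zeta)                 # Pcor estimate via Equation (1)
```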

2.2. Calculation Formulae of Pcor

2.2.1. Based on Concentration Matrix

The concentration matrix can also be used to calculate the Pcor. Let $U_1 = [X, Y]$, $U_2 = Z = [Z_1, \ldots, Z_p]$, $U = [U_1, U_2] = [X, Y, Z]$, and let $\Sigma = \mathrm{cov}(U)$ be the covariance matrix. Assuming that $\Sigma$ is non-singular, the concentration matrix is denoted as $\Omega = (\omega_{ij})_{i,j=1}^{p+2} = \Sigma^{-1}$. Consider the following linear regression,
$$U_1 = U_2 b + e,$$
where $b = (b_1, b_2)$ is the regression coefficient and $e = (e_1, e_2)$ is the regression error. We have
$$\hat{e} = (\hat{e}_1, \hat{e}_2) = U_1 - U_2\hat{b} = U_1 - \hat{U}_1,$$
where $\hat{b}$ is the estimator of b and $\hat{U}_1$ is the estimator of $U_1$. The regression residual $\hat{e} \sim N(0, V)$ is independent of $\hat{U}_1$. The covariance matrix of $\hat{e}$ can be computed by
$$\mathrm{cov}(\hat{e}) = \mathrm{cov}(U_1) + \mathrm{cov}(\hat{U}_1) - 2\,\mathrm{cov}(U_1, \hat{U}_1) = \Sigma_{11} + \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{22}\Sigma_{22}^{-1}\Sigma_{21} - 2\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} = \Omega_{11}^{-1}.$$
According to the definition in Equation (1) and $\Omega_{11}^{-1} = \frac{1}{\omega_{11}\omega_{22} - \omega_{12}\omega_{21}}\begin{pmatrix} \omega_{22} & -\omega_{12} \\ -\omega_{21} & \omega_{11} \end{pmatrix}$, the partial correlation coefficient can be computed by
$$\rho_{XY|Z} = \mathrm{cor}(\hat{e}_1, \hat{e}_2) = \frac{\mathrm{cov}(\hat{e}_1, \hat{e}_2)}{\sqrt{\mathrm{var}(\hat{e}_1)\,\mathrm{var}(\hat{e}_2)}} = \frac{-\omega_{12}}{\sqrt{\omega_{11}\omega_{22}}}. \tag{2}$$
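As a quick numerical cross-check, formula (2) can be evaluated from the inverse of the sample covariance matrix when it is non-singular. The sketch below reuses the simulated data from the previous snippet and is again purely illustrative.

```r
# Concentration-matrix formula (2), reusing X, Y, Z from the previous sketch.
U     <- cbind(X, Y, Z)
Omega <- solve(cov(U))                          # concentration matrix
-Omega[1, 2] / sqrt(Omega[1, 1] * Omega[2, 2])  # matches cor(eps, zeta) above
```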

2.2.2. Based on Additional Regression Models

Additional linear regression models are introduced to calculate the Pcor. Consider new linear regression models of X with $[Y, Z]$ and Y with $[X, Z]$, respectively,
$$X = \lambda_0 Y + \sum_{i=1}^{p} \lambda_i Z_i + \eta, \tag{3}$$
$$Y = \gamma_0 X + \sum_{i=1}^{p} \gamma_i Z_i + \tau, \tag{4}$$
where $\eta$ and $\tau$ are regression error terms. Peng et al. [3] established the relationship between the above regression coefficients and the Pcor, verifying that $\lambda_0 = \rho_{XY|Z}\sqrt{\omega_{22}/\omega_{11}}$, $\gamma_0 = \rho_{XY|Z}\sqrt{\omega_{11}/\omega_{22}}$, $\mathrm{var}(\eta) = 1/\omega_{11}$ and $\mathrm{var}(\tau) = 1/\omega_{22}$ hold. It follows that $\lambda_0\gamma_0 = \rho_{XY|Z}^2$. Thus, the partial correlation coefficient between X and Y can be calculated by the formulae below,
$$\rho_{XY|Z} = \lambda_0\sqrt{\mathrm{var}(\tau)/\mathrm{var}(\eta)}, \tag{5}$$
$$\rho_{XY|Z} = \mathrm{sign}(\lambda_0)\sqrt{\lambda_0\gamma_0}, \tag{6}$$
where $\mathrm{sign}(\cdot)$ is the sign function.
Consider linear regression models of Y with $[1, Z]$ and Y with $[1, X, Z]$, respectively,
$$Y = \beta_0 + \sum_{i=1}^{p} \beta_i Z_i + \zeta, \qquad Y = \gamma_1 + \gamma_0 X + \sum_{i=1}^{p} \gamma_i Z_i + \tau,$$
where $\zeta$ and $\tau$ are error terms. The partial correlation coefficient can also be calculated as follows [20]:
$$\rho_{XY|Z} = \mathrm{sign}(\gamma_0)\sqrt{\frac{\mathrm{var}(\zeta) - \mathrm{var}(\tau)}{\mathrm{var}(\zeta)}}. \tag{7}$$
Here, we present five distinct formulae, (1), (2), (5), (6), and (7), for calculating Pcor based on diverse regression models. Specific algorithms applicable to high-dimensional scenarios will be presented in the following section.

2.3. Regularisation Regression for High-Dimensional Cases

Suppose we have centralised samples $\{x_j, y_j, z_{j1}, \ldots, z_{jp}\}_{j=1}^{n}$ observed i.i.d. from $[X, Y, Z]$ with $Z = [Z_1, \ldots, Z_p]$. Let $X = [x_1, \ldots, x_n]^T$, $Y = [y_1, \ldots, y_n]^T$ and $Z = [Z_1, \ldots, Z_p] = (z_{ji})_{n\times p}$. We consider matrix-type linear regression models as follows,
$$X = \sum_{i=1}^{p} \alpha_i Z_i + \varepsilon, \tag{8}$$
$$Y = \sum_{i=1}^{p} \beta_i Z_i + \zeta, \tag{9}$$
where $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]^T$ and $\zeta = [\zeta_1, \ldots, \zeta_n]^T$ are error terms. If we estimate the regression coefficients $\hat\alpha = [\hat\alpha_1, \ldots, \hat\alpha_p]^T$ and $\hat\beta = [\hat\beta_1, \ldots, \hat\beta_p]^T$, then we can calculate the estimated residuals $\hat\varepsilon = X - \hat{X}$ and $\hat\zeta = Y - \hat{Y}$, with $\hat{X} = Z\hat\alpha = \sum_{i=1}^{p}\hat\alpha_i Z_i$ and $\hat{Y} = Z\hat\beta = \sum_{i=1}^{p}\hat\beta_i Z_i$. According to the definition of the Pcor, we can estimate the Pcor as follows
$$\hat\rho = \frac{\mathrm{cov}(\hat\varepsilon, \hat\zeta)}{\sqrt{\mathrm{var}(\hat\varepsilon)\,\mathrm{var}(\hat\zeta)}}, \tag{10}$$
where $\mathrm{cov}(\hat\varepsilon, \hat\zeta) = \sum_{j=1}^{n}(\hat\varepsilon_j - \bar\varepsilon)(\hat\zeta_j - \bar\zeta)$, $\mathrm{var}(\hat\varepsilon) = \sum_{j=1}^{n}(\hat\varepsilon_j - \bar\varepsilon)^2$ and $\mathrm{var}(\hat\zeta) = \sum_{j=1}^{n}(\hat\zeta_j - \bar\zeta)^2$, with $\bar\varepsilon = \frac{1}{n}\sum_{j=1}^{n}\hat\varepsilon_j$ and $\bar\zeta = \frac{1}{n}\sum_{j=1}^{n}\hat\zeta_j$.
In high-dimensional ($p > n$) situations, penalty functions and regularisation regression methods can be introduced to estimate the regression coefficients of these models. Regularisation regression methods address overfitting in statistical modelling by adding a penalty to the loss function, constraining the coefficient magnitudes. Let $p_\lambda(\beta)$ be a penalty function with tuning parameter $\lambda$; for example, the regularisation estimate of model (8) is given by
$$\hat\alpha = \arg\min_{\alpha}\ \frac{1}{n}\|X - Z\alpha\|^2 + p_\lambda(\alpha),$$
where the penalty $p_\lambda(\alpha)$ can be chosen from a wide range of options, including the Lasso penalty [21], the Ridge penalty [22], the SCAD penalty [23], the Elastic net [24], the Fused lasso [25], the MCP penalty [26], and other penalty functions. In this paper, the Lasso regularisation, with penalty $p_\lambda(\alpha) = \lambda\|\alpha\|_1$, is implemented with the R package "glmnet" [27], and the MCP regularisation, whose penalty has derivative $p'_\lambda(\alpha) = \frac{1}{t}(t\lambda - \alpha)_+$ for $\alpha \ge 0$ with $t > 1$, is implemented with the R package "ncvreg".
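To illustrate this step, the sketch below fits the Lasso versions of models (8) and (9) with cross-validated tuning via cv.glmnet; the MCP fits used elsewhere in the paper could be obtained analogously with ncvreg::cv.ncvreg(..., penalty = "MCP"). The simulated data and dimensions are illustrative assumptions.

```r
# Regularised fits of models (8) and (9) for p > n; simulated data for illustration.
library(glmnet)
set.seed(2)
n <- 100; p <- 500                         # high-dimensional setting: p > n
Z <- matrix(rnorm(n * p), n, p)
X <- drop(Z[, 1:5] %*% rep(0.3, 5) + rnorm(n))
Y <- drop(Z[, 1:5] %*% rep(0.3, 5) + rnorm(n))

fit_x <- cv.glmnet(Z, X)                   # Lasso fit of model (8), lambda by CV
fit_y <- cv.glmnet(Z, Y)                   # Lasso fit of model (9)
alpha_hat <- as.numeric(coef(fit_x, s = "lambda.min"))[-1]  # coefficients, intercept dropped
```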

2.4. Existing Pcor Estimation Algorithms

To investigate high-dimensional Pcor estimation methods, we present some existing methods that are suitable for both sparse and non-sparse conditions. Weighing the advantages and disadvantages of these methods, we then propose a new high-dimensional Pcor estimation method, MRSS: the minimum residual sum of squares partial correlation coefficient estimation algorithm.

2.4.1. Res Algorithm

The Res algorithm follows directly from the definition of the Pcor and is implemented as follows. First, we use regularisation regression (Lasso or MCP) on the linear models (8) and (9) to obtain the estimated regression coefficients $\hat\alpha$ and $\hat\beta$; we then calculate the estimated residuals $\hat\varepsilon = X - \hat{X}$ and $\hat\zeta = Y - \hat{Y}$, with $\hat{X} = Z\hat\alpha$ and $\hat{Y} = Z\hat\beta$; finally, we estimate the Pcor $\hat\rho_{res}$ by formula (10).
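Continuing the glmnet sketch above, a minimal illustration of the Res estimate is to plug the penalised residuals into formula (10):

```r
# Res estimate: residuals of the penalised fits plugged into formula (10).
X_hat    <- drop(predict(fit_x, newx = Z, s = "lambda.min"))
Y_hat    <- drop(predict(fit_y, newx = Z, s = "lambda.min"))
eps_hat  <- X - X_hat
zeta_hat <- Y - Y_hat
rho_res  <- cor(eps_hat, zeta_hat)
```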

2.4.2. Reg2 Algorithm

The Reg2 algorithm can more effectively remove the influence of Z from X and Y using the new regressions below. Consider the new linear regression models as follows
$$X = a_1\hat{X} + a_2\hat{Y} + \eta_1, \tag{11}$$
$$Y = b_1\hat{X} + b_2\hat{Y} + \tau_1, \tag{12}$$
where $\eta_1$ and $\tau_1$ are error terms, and the estimators $\hat{X} = \sum_{i=1}^{p}\hat\alpha_i Z_i$ and $\hat{Y} = \sum_{i=1}^{p}\hat\beta_i Z_i$ are obtained by the Lasso or MCP regularisation regressions of models (8) and (9). Then, we apply ordinary least squares (OLS) to models (11) and (12) and denote the new estimators of X and Y by $\hat{X}_{Reg2}$ and $\hat{Y}_{Reg2}$. Computing the new residuals $\hat\eta_1 = X - \hat{X}_{Reg2}$ and $\hat\tau_1 = Y - \hat{Y}_{Reg2}$, we finally estimate the Pcor by the Reg2 algorithm as $\hat\rho_{reg2} = \mathrm{cor}(\hat\eta_1, \hat\tau_1)$.
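A minimal sketch of the second-stage OLS step, continuing from the fitted values X_hat and Y_hat of the previous snippet (our illustration, not the authors' code):

```r
# Reg2 estimate: refit X and Y on the first-stage fitted values by OLS (models (11)-(12)).
eta1 <- residuals(lm(X ~ X_hat + Y_hat))   # residual of model (11)
tau1 <- residuals(lm(Y ~ X_hat + Y_hat))   # residual of model (12)
rho_reg2 <- cor(eta1, tau1)
```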

2.4.3. Coef and Var Algorithm

The Coef and Var algorithms are obtained by introducing new regression coefficients based on the Pcor calculation formulae (5) and (6). Consider linear regression models as follows
$$X = \lambda_0 Y + \sum_{i=1}^{p}\lambda_i Z_i + \eta_2, \tag{13}$$
$$Y = \gamma_0 X + \sum_{i=1}^{p}\gamma_i Z_i + \tau_2, \tag{14}$$
where $\eta_2$ and $\tau_2$ are error terms. Then, we apply MCP regularisation to models (13) and (14) and obtain the estimated first-term regression coefficients $\hat\lambda_0$, $\hat\gamma_0$ and the estimated variances $\mathrm{var}(\hat\eta_2)$, $\mathrm{var}(\hat\tau_2)$. Finally, we obtain the Pcor estimate of the Coef algorithm as $\hat\rho_{coef} = \mathrm{sign}(\hat\lambda_0)\sqrt{\hat\lambda_0\hat\gamma_0}$ and the Pcor estimate of the Var algorithm as $\hat\rho_{var} = \hat\lambda_0\sqrt{\mathrm{var}(\hat\tau_2)/\mathrm{var}(\hat\eta_2)}$.
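The sketch below illustrates models (13) and (14) with MCP via the ncvreg package, continuing the simulated setup above. The residuals are rebuilt from the cross-validated coefficients, and abs() is used defensively in case the product of the two estimated coefficients turns out negative; all names are ours.

```r
# Coef and Var estimates from MCP fits of models (13) and (14).
library(ncvreg)
fit13 <- cv.ncvreg(cbind(Y, Z), X, penalty = "MCP")    # model (13): X on [Y, Z]
fit14 <- cv.ncvreg(cbind(X, Z), Y, penalty = "MCP")    # model (14): Y on [X, Z]
b13 <- coef(fit13); b14 <- coef(fit14)                 # coefficients at lambda.min
lambda0 <- b13[2]                                      # coefficient of Y in model (13)
gamma0  <- b14[2]                                      # coefficient of X in model (14)
eta2 <- drop(X - (b13[1] + cbind(Y, Z) %*% b13[-1]))   # residual of model (13)
tau2 <- drop(Y - (b14[1] + cbind(X, Z) %*% b14[-1]))   # residual of model (14)
rho_coef <- sign(lambda0) * sqrt(abs(lambda0 * gamma0))   # Coef estimate, formula (6)
rho_var  <- lambda0 * sqrt(var(tau2) / var(eta2))         # Var estimate, formula (5)
```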

2.4.4. RSS2 Algorithm

The RSS2 algorithm is based on the residual sum of squares in formula (7). First, we apply MCP regularisation to model (9), $Y = Z\beta + \zeta$, and estimate the residual $\hat\zeta$ and the residual sum of squares (RSS) $R_1 = \|\hat\zeta\|_2^2$. Similarly, we apply MCP regularisation to model (14), $Y = \gamma_0 X + \sum_{i=1}^{p}\gamma_i Z_i + \tau$, and estimate the first-term regression coefficient $\hat\gamma_0$, the residual $\hat\tau$, and the RSS $R_2 = \|\hat\tau\|_2^2$. We then obtain the Pcor estimate $\hat\rho_Y = \mathrm{sign}(\hat\gamma_0)\sqrt{\max(0, R_1 - R_2)/R_1}$. Switching the roles of X and Y and repeating the above steps, we apply MCP regularisation to model (8), $X = Z\alpha + \varepsilon$, and model (13), $X = \lambda_0 Y + \sum_{i=1}^{p}\lambda_i Z_i + \eta$, obtaining the RSS values $R_3 = \|\hat\varepsilon\|_2^2$, $R_4 = \|\hat\eta\|_2^2$ and the estimated first-term coefficient $\hat\lambda_0$, which give another Pcor estimate $\hat\rho_X = \mathrm{sign}(\hat\lambda_0)\sqrt{\max(0, R_3 - R_4)/R_3}$. Finally, the Pcor estimate of the RSS2 algorithm is $\hat\rho_{rss2} = (\hat\rho_X + \hat\rho_Y)/2$.
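Continuing the same illustrative setup (and reusing fit13, fit14, lambda0 and gamma0 from the previous snippet), the RSS2 estimate can be sketched as follows:

```r
# RSS2 estimate: residual sums of squares of the reduced and full MCP regressions.
rss <- function(fit, design, y) {                      # RSS at lambda.min
  b <- coef(fit)
  sum((y - (b[1] + design %*% b[-1]))^2)
}
fit8 <- cv.ncvreg(Z, X, penalty = "MCP")               # model (8): X on Z
fit9 <- cv.ncvreg(Z, Y, penalty = "MCP")               # model (9): Y on Z
R1 <- rss(fit9, Z, Y);  R2 <- rss(fit14, cbind(X, Z), Y)
R3 <- rss(fit8, Z, X);  R4 <- rss(fit13, cbind(Y, Z), X)
rho_Y <- sign(gamma0)  * sqrt(max(0, R1 - R2) / R1)
rho_X <- sign(lambda0) * sqrt(max(0, R3 - R4) / R3)
rho_rss2 <- (rho_X + rho_Y) / 2
```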

3. Minimum Residual Sum of Squares Pcor Estimation Algorithm

3.1. Motivation

From the comprehensive simulations in this paper, it is evident that the Pcor estimation methods discussed exhibit significant bias. This bias becomes more pronounced as the true Pcor increases, especially when the Pcor is positive. Therefore, further research is necessary to address this estimation bias in positive Pcor scenarios. While each algorithm has its merits, the Reg2 algorithm performs notably well when Pcor is below approximately 0.5 . In contrast, the Coef and Var algorithm stands out with minimal bias when Pcor exceeds roughly 0.5 . Our goal is to develop a method that synergises the strengths of both the Reg2 and Var algorithms.
The models introduced in the Reg2 algorithm, (11) and (12), can be represented as,
$$X = a_1\hat{X} + a_2\sum_{i=1}^{p}\hat\beta_i Z_i + \eta_1, \tag{15}$$
$$Y = b_2\hat{Y} + b_1\sum_{i=1}^{p}\hat\alpha_i Z_i + \tau_1. \tag{16}$$
When compared with models (13) and (14) from the Coef and Var algorithms, it is evident that the residuals $\eta_1$ and $\eta_2$ share commonalities: both capture, in some sense, the information in X after the exclusion of the effects of Y and Z. Similarly, $\tau_1$ and $\tau_2$ capture the essence of Y after removing the influences of X and Z. If we choose the $\eta_k$ and $\tau_k$ with the smaller residual sum of squares, this leads to a better estimation of the corresponding regression models. A reduced residual sum of squares in the corresponding regression models signifies enhanced precision in eliminating the effects of the controlling variables, leading to a more accurate Pcor estimator. Guided by the objective of minimising the residual sum of squares, we introduce a novel algorithm for high-dimensional Pcor estimation in the subsequent subsection.

3.2. MRSS Algorithm and Its Implementation

We propose a novel Minimum Residual Sum of Squares partial correlation coefficient estimation algorithm, denoted by MRSS. This algorithm aims to diminish the estimation bias for positive Pcor values under high-dimensional situations. Our MRSS algorithm amalgamates the strengths of the Reg2, Coef, and Var algorithms, effectively curtailing bias in Pcor estimation.
Define $RSS_X = \|\eta_k\|_2^2$ and $RSS_Y = \|\tau_k\|_2^2$ as the residual sum of squares of X after removing the effects of Y and Z, and the residual sum of squares of Y after removing the effects of X and Z, respectively. The tuning parameter k is chosen by minimising the residual sum of squares, so as to remove more of the associated effects and ensure a more efficient Pcor estimator. For $k = 1$, the pair $(\eta_1, \tau_1)$ represents the residuals from the Reg2 algorithm's models (11) and (12). For $k = 2$, $(\eta_2, \tau_2)$ corresponds to the residuals from the Coef and Var algorithms' models (13) and (14). Then, the residuals estimated by the MRSS algorithm satisfy the minimum residual sum of squares for both X and Y, giving a more efficient Pcor estimator as follows
$$\eta_{mrss} = \arg\min_{k=1,2} RSS_X = \arg\min_{k=1,2}\|\eta_k\|_2^2, \qquad \tau_{mrss} = \arg\min_{k=1,2} RSS_Y = \arg\min_{k=1,2}\|\tau_k\|_2^2. \tag{17}$$
The Pcor estimated by MRSS is then given by
$$\rho_{mrss} = \mathrm{cor}(\eta_{mrss}, \tau_{mrss})\,I_{\{k=1\}} + \lambda_0\sqrt{\mathrm{var}(\tau_{mrss})/\mathrm{var}(\eta_{mrss})}\,I_{\{k=2\}}, \tag{18}$$
where I is the indicator function and $\lambda_0$ is the first-term regression coefficient in model (13). If $k = 1$, then $\rho_{mrss}$ is estimated following the idea of the Reg2 algorithm; if $k = 2$, then $\rho_{mrss}$ is estimated following the idea of the Coef and Var algorithms. If the two k selections in (17) differ, the more stable Reg2 algorithm is preferred, setting $k = 1$ in (18). Given that MRSS integrates two existing algorithms, its convergence should align with their rates.
During the implementation of the MRSS algorithm (Algorithm 1), the Coef and Var algorithms often misestimate the Pcor as 0 or $\pm 1$ when the true Pcor is close to 0 or $\pm 1$, affecting the algorithms' precision. To address this, we incorporate a discriminative condition in the MRSS pseudo-code: if the estimated Pcor $\hat\rho_{coef}$ or $\hat\rho_{var}$ is zero or $\pm 1$, the Coef and Var estimates are deemed unreliable, and the Reg2 algorithm's estimate is adopted.
(Algorithm 1: pseudo-code of the MRSS algorithm.)
The proposed MRSS algorithm selects the most suitable residuals by minimising the RSS and removing the impact of the controlling variables, thereby optimising the estimation of the residuals in the regression models. As such, the Pcor estimate produced by the MRSS algorithm combines the advantages of both algorithms, resulting in a more accurate estimate. Notably, our MRSS algorithm effectively addresses the Pcor estimation bias in cases where Pcor $\ge 0$. For instance, when the Coef and Var algorithms estimate the Pcor as 0 for a true Pcor near 0, the MRSS algorithm uses the minimum RSS principle to select the Reg2 algorithm, which performs better in the vicinity of Pcor = 0, thereby efficiently avoiding such misestimations. Around Pcor = 0.5, the MRSS algorithm employs the minimum RSS principle to determine the more accurate of the Reg2 and Var methods. This selection conforms to the minimum RSS principle, whereby the regression model and accompanying residuals are selected to provide optimal estimation accuracy, leading to a more precise Pcor estimate. When the Pcor lies close to 1, the Reg2 algorithm's estimates are typically too low, with high RSS values; in that case, the MRSS method selects the Var algorithm, whose RSS values are small and which performs better, in accordance with the minimum RSS principle. In essence, the MRSS method amalgamates the merits of the Reg2 and Var algorithms. By minimising the residual sum of squares, MRSS can choose the algorithm with the smaller estimation error for Pcor $\ge 0$, which allows for proficient regulation of the Pcor estimation bias.
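Putting the pieces together, the following sketch reflects our reading of the MRSS selection rule in (17) and (18), including the fallback to Reg2 when the Coef/Var estimates are degenerate or the two selections of k disagree; it continues from the residuals eta1, tau1, eta2, tau2 computed above and is not the authors' exact implementation.

```r
# MRSS estimate: choose the residuals with the smaller RSS, then apply (18).
kX <- which.min(c(sum(eta1^2), sum(eta2^2)))            # k selected for X in (17)
kY <- which.min(c(sum(tau1^2), sum(tau2^2)))            # k selected for Y in (17)
degenerate <- rho_coef %in% c(-1, 0, 1) || rho_var %in% c(-1, 0, 1)
if (kX == 2 && kY == 2 && !degenerate) {
  rho_mrss <- lambda0 * sqrt(var(tau2) / var(eta2))     # Var-type estimate (k = 2)
} else {
  rho_mrss <- cor(eta1, tau1)                           # Reg2-type estimate (k = 1)
}
rho_mrss
```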

4. Simulation

4.1. Data Generation

To study the estimation efficiency of the Pcor estimation algorithms under high-dimensional conditions, we generate n centralised samples $\{x_j, y_j, z_{j1}, \ldots, z_{jp}\}_{j=1}^{n}$ i.i.d. from $[X, Y, Z]$ with $Z = [Z_1, \ldots, Z_p]$. Let $X = [x_1, \ldots, x_n]^T$, $Y = [y_1, \ldots, y_n]^T$ and $Z = [Z_1, \ldots, Z_p] = (z_{ji})_{n\times p}$. Initially, we produce the n controlling samples $\{Z_i\}_{i=1}^{p}$ independently and identically by
$$Z_i = 0.5u + e_i,$$
where $u = [u_1, \ldots, u_n]^T$ and $e_i = [e_{1i}, \ldots, e_{ni}]^T$, with $u_j$ and $e_{ji}$ generated independently from the normal distribution $N(0, \sigma^2)$ with variance $\sigma^2$, for $i = 1, \ldots, p$. The samples X and Y are then generated by
$$X = \sum_{i=1}^{p}\alpha_i Z_i + \varepsilon, \quad \text{and} \quad Y = \sum_{i=1}^{p}\beta_i Z_i + \zeta,$$
where $\varepsilon = [\varepsilon_1, \ldots, \varepsilon_n]^T$ and $\zeta = [\zeta_1, \ldots, \zeta_n]^T$, with $\zeta_j = (\omega\varepsilon_j + \eta_j)/\sqrt{1+\omega^2}$ and $\varepsilon_j$, $\eta_j$ drawn i.i.d. from $N(0, \sigma^2)$. The Pearson correlation of $\varepsilon$ and $\zeta$ gives the partial correlation coefficient Pcor $\rho_{XY|Z} = \omega/\sqrt{1+\omega^2}$. Notably, there is a one-to-one mapping between the true Pcor and the $\omega$ parameter.
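For reference, one replication of this design can be generated as below. The parameter values are illustrative, the coefficients follow Example 1, and the inversion of the Pcor–ω mapping follows the relation stated above.

```r
# One replication of the simulation design (Example 1 coefficients, illustrative values).
set.seed(3)
n <- 200; p <- 1000; sigma2 <- 1; l <- 10
pcor_true <- 0.5
omega <- pcor_true / sqrt(1 - pcor_true^2)          # invert Pcor = omega / sqrt(1 + omega^2)
u  <- rnorm(n, sd = sqrt(sigma2))
Z  <- 0.5 * u + matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)   # columns Z_i = 0.5u + e_i
alpha <- beta <- c(rep(seq(0.1, l / 20, by = 0.1), 2), rep(0, p - l))  # Example 1 coefficients
eps  <- rnorm(n, sd = sqrt(sigma2))
eta  <- rnorm(n, sd = sqrt(sigma2))
zeta <- (omega * eps + eta) / sqrt(1 + omega^2)
X <- drop(Z %*% alpha) + eps
Y <- drop(Z %*% beta)  + zeta
```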
Since our MRSS algorithm and the Reg2 algorithm perform essentially the same for Pcor < 0, our simulation focuses on true Pcor values in the range $[0, 1]$, an interval prone to significant biases with existing methods. We let the true partial correlation coefficient vary as Pcor $= 0, 0.05, 0.1, \ldots, 0.95$, with sample size $n = 50, 100, \ldots, 400$, controlling-variable dimension $p = 200, 500, 1000, 2000, 4000$, and normal distribution variance $\sigma^2 = 1, 10, 40$. For each $(n, p)$ combination, we estimate the partial correlation coefficient over 200 replications using the aforementioned estimation algorithms. We use the software R (4.3.1) for our simulation.
Recognising that both sparse and non-sparse conditions are prevalent in real-world applications [3,28], we present examples under both conditions. To ensure comparability between the examples, the initial l coefficients of α and β are fixed under both conditions, where we set the number of highly correlated controlling variables to $l = 6, 10, 14$. For the non-sparse examples, the coefficients of α and β beyond the l-th converge asymptotically to 0 at varying rates, starting from the $(l+1)$-th coefficient at about 0.05, which is significantly smaller than the initial l coefficients.
  • Example 1: under sparse conditions
    Let the coefficients α and β be non-zero for the initial l elements and zero for the rest as follows
    $\alpha = \beta = (0.1, 0.2, \ldots, \tfrac{l}{20}, 0.1, 0.2, \ldots, \tfrac{l}{20}, 0, \ldots, 0).$
  • Example 2: under non-sparse conditions
    Let the coefficients α and β be the same as Example 1 for the initial l elements, with a convergence rate of $O(1/2^{p})$ for the remaining elements, as follows
    $\alpha = \beta = (0.1, 0.2, \ldots, \tfrac{l}{20}, \tfrac{r}{2^{l/2+1}}, \tfrac{r}{2^{l/2+2}}, \ldots, \tfrac{r}{2^{p/2}}, \tfrac{r}{2^{l/2+1}}, \ldots, \tfrac{r}{2^{p/2}}),$
    where r is a tuning parameter to make the ( l + 1 ) -th element close to 0.05 .
  • Example 3: under non-sparse conditions
    Let the coefficients α and β be the same as Example 1 for the initial l elements with a convergence rate of O ( 1 / p ) for the remaining elements as follows,
    $\alpha = \beta = (0.1, 0.2, \ldots, \tfrac{l}{20}, \tfrac{r}{l/2+1}, \tfrac{r}{l/2+2}, \ldots, \tfrac{r}{p/2}, \tfrac{r}{l/2+1}, \ldots, \tfrac{r}{p/2}),$
    where r is a tuning parameter to make the ( l + 1 ) -th element close to 0.05 .
  • Example 4: under non-sparse conditions
    Let the coefficients α and β be the same as Example 1 for the initial l elements, with a convergence rate of $O(1/\sqrt{p})$ for the remaining elements, as follows,
    $\alpha = \beta = (0.1, 0.2, \ldots, \tfrac{l}{20}, \tfrac{r}{\sqrt{l/2+1}}, \tfrac{r}{\sqrt{l/2+2}}, \ldots, \tfrac{r}{\sqrt{p/2}}, \tfrac{r}{\sqrt{l/2+1}}, \ldots, \tfrac{r}{\sqrt{p/2}}),$
    where r is a tuning parameter to make the ( l + 1 ) -th element close to 0.05 .

4.2. Simulation Results

4.2.1. By MSE and RMSE

We assess the efficacy of the Pcor estimation algorithms using the mean square error (MSE) and root mean square error (RMSE) indices defined below. These indicators reflect the performance of the Pcor estimation algorithms from different perspectives.
$$MSE(\rho_0) = \frac{1}{R}\sum_{i=1}^{R}\left(\hat\rho^{(i)} - \rho_0\right)^2, \quad \text{and} \quad RMSE(\rho_0) = \sqrt{\frac{1}{R}\sum_{i=1}^{R}\left(\hat\rho^{(i)} - \rho_0\right)^2},$$
where $\rho_0$ is the true Pcor and $\hat\rho^{(i)}$ is the estimated Pcor in the i-th of $R = 200$ replications.
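For completeness, the two indices can be computed with small helper functions such as the following (illustrative):

```r
# MSE and RMSE over R replications, as defined above; rho_hat is a vector of estimates.
mse  <- function(rho_hat, rho0) mean((rho_hat - rho0)^2)
rmse <- function(rho_hat, rho0) sqrt(mean((rho_hat - rho0)^2))
```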
Table 1 displays the mean MSE and RMSE ($\times 10^2$) of the estimated Pcors over the true Pcor values $0, 0.05, \ldots, 0.95$ with $l = 10$, $\sigma^2 = 1$, $n = 50, 100, 200, 400$ and $p = 200, 500, 1000, 2000, 4000$ across Examples 1–4 using the various methods. Table A1 and Table A2, which give the corresponding mean MSE and RMSE ($\times 10^2$) for the numbers of highly correlated controlling variables $l = 6, 14$, can be found in Appendix A.
For small sample sizes ($n < 100$), all algorithms tend to underperform owing to the limited data information, with mean MSE and RMSE values approximately ten times higher than those for large sample sizes ($n \ge 100$). Our MRSS algorithm nevertheless remains competitive, with both MSE and RMSE of the same order of magnitude as the best-performing Lasso.Reg2. For large sample sizes ($n \ge 100$), however, the MRSS algorithm becomes notably superior. Specifically, MRSS reduces the MSE by around 40% compared to the next-best MCP.Reg2, and this percentage grows with increasing n; MRSS therefore represents a significant improvement in algorithmic performance. Additionally, the MSE of the MRSS algorithm exhibits a slower increase with increasing controlling-variable dimension p, implying improved stability to some extent.
To compare the performance of the different algorithms more intuitively, we calculated the percentage difference of the MSE as $\frac{MSE_{MRSS} - MSE_{ALG}}{MSE_{ALG}} \times 100\%$, with ALG denoting the algorithms listed above; the percentage difference of the RMSE is calculated similarly. Table 2 shows the average percentage difference of MSE and RMSE relative to the MRSS algorithm for a small sample size ($n = 50$) and large sample sizes ($n = 100, 200, 400$) with the same settings as in Table 1. For the small sample size ($n = 50$), we observe a 10–20% decrease in MSE and RMSE for the MRSS algorithm relative to the Res algorithm, a 10–20% increase relative to Lasso.Reg2, and only slight changes relative to the other algorithms. For large sample sizes ($n = 100, 200, 400$), the MRSS algorithm reduces the MSE by about 30–70% and the RMSE by 20–60% relative to the other algorithms, achieving effective control of the Pcor estimation error. These results further illustrate the superiority of the MRSS algorithm. For optimal Pcor estimation performance, we suggest using the MRSS algorithm with a minimum sample size of $n = 100$.
For Examples 1–4, shifting from sparse to non-sparse conditions with increasing non-sparsity, we observe that all algorithms exhibit a higher MSE and RMSE under non-sparse conditions compared to sparse conditions, and the MSE and RMSE increase with increasing non-sparsity. This could be attributed to the greater impact and more complicated correlations of the controlling variables, resulting in a less accurate estimate of the partial correlation. However, even in Example 4 with the strongest non-sparsity, the MRSS algorithm still performs well, possessing the smallest MSE and RMSE and outperforming conventional algorithms. Especially under non-sparse conditions, the MRSS algorithm provides a dependable and accurate estimation of Pcor despite the influence of complex controlling variables.

4.2.2. For Pcor Values on [0, 1]

To investigate the effectiveness of the Pcor estimation algorithms across the range of Pcor values, we fix the ratio of the controlling-variable dimension to the sample size (i.e., $p/n = 2, 10$). Figure 1 displays the average estimated Pcor over 200 repetitions against the true Pcor for $n = 100, 200, 400$ and $l = 6$ in Example 1. The MRSS, MCP.Reg2, and MCP.Var are shown in red, green and blue, respectively. When the Pcor is small, roughly Pcor < 0.5, MRSS tracks the true Pcor accurately, performing similarly to MCP.Reg2. When the Pcor is large, roughly Pcor > 0.5, MRSS performs slightly sub-optimally, comparable to MCP.Var and marginally behind RSS2. Essentially, MRSS effectively amalgamates the strengths of both the MCP.Reg2 and MCP.Var algorithms, reducing their potential weaknesses in Pcor estimation. For a small sample size, $n = 100$, MRSS leads to a significant improvement in the estimation of large Pcor in $[0, 1]$, but there remains a considerable estimation bias for small Pcor owing to the limited sample size and information. For a large sample size, $n \ge 200$, MRSS effectively reduces the Pcor estimation bias for Pcor > 0. Consequently, enlarging the sample size substantially boosts the MRSS estimation accuracy, even when the ratio of the controlling-variable dimension to the sample size, p/n, increases from 2 to 10.

4.3. Parameter Sensitivity

We investigate the sensitivity of the performance of the MRSS algorithm to different parameter settings, such as variance and sparsity. This allows us to explore the robustness of algorithms under different parameter configurations.

4.3.1. For Variance

We set a variance parameter $\sigma^2$ in the data generation to test the stability of our algorithm under varying variance. Table 3 shows the mean MSE ($\times 10^2$) and RMSE ($\times 10^2$) of the estimated Pcors over the true Pcor values $0, 0.05, \ldots, 0.95$ with different variances $\sigma^2 = 1, 10, 40$ and $l = 10$, for small sample sizes ($n = 50, 100$) and large sample sizes ($n = 200, 400$) in Examples 1–4. We find that, as the variance $\sigma^2$ increases from 1 to 40, the MSE and RMSE remain consistent across the various examples and sample sizes. This indicates that our MRSS algorithm is highly robust to the variance and retains good stability.

4.3.2. For Sparsity

To evaluate the effectiveness of the algorithms under different sparsity conditions, we set the data generation conditions to move from sparse to non-sparse, with an increasingly non-sparse convergence rate from Example 1 to Example 4. This corresponds to a greater contribution of the controlling variables as we progress through the examples. From Table 1, Table 2 and Table 3 above, we observe that the MRSS algorithm performs well in all examples. For moderate non-sparse convergence rates, as in Examples 2–3, MRSS exhibits both low MSE and low RMSE, comparable to the sparse conditions of Example 1. As the non-sparsity and the impact of the controlling variables increase in Example 4, even the best-performing MRSS encounters difficulties in reducing the estimation bias. Nevertheless, the MRSS algorithm remains the most favourable choice for estimating Pcor under both sparse and non-sparse conditions. If it is possible to analyse the degree of non-sparsity of the initial data, then we can obtain a better understanding of the algorithm's error margin.
Another indication of the sparsity strength is the number of highly correlated controlling variables, l. Figure 2 illustrates the performance of the featured algorithms for $l = 6, 10, 14$, contrasting the average estimated Pcor with the true Pcor in Example 2, with $n = 100$, $p = 200$ in the first row and $n = 200$, $p = 2000$ in the second. As l increases, the interference from the controlling variables in the estimation process becomes more pronounced, leading to a heightened estimation bias. However, the MRSS algorithm consistently shows an optimal performance throughout the entire $[0, 1]$ interval. Remarkably, despite the high interference level at $l = 14$, MRSS keeps the estimates in close alignment with the diagonal, in contrast to its counterparts. Table 4 shows the mean MSE and RMSE for $l = 6, 10, 14$. As l increases, both the MSE and RMSE of the MRSS algorithm increase, but they remain only slightly behind the best algorithm in small samples and clearly ahead of the other algorithms in large samples. These results demonstrate the robustness, stability, and precision advantages of the MRSS algorithm.

4.4. Summaries

Our study examines the practicality and effectiveness of the MRSS algorithm in a variety of scenarios through extensive simulations, which provide valuable insights into its accuracy and efficiency. We provide empirical evidence that MRSS effectively incorporates the strengths of the MCP.Reg2 and MCP.Var algorithms and reduces the potential weaknesses of Pcor estimation, especially in challenging environments with high-dimensional sparse and non-sparse conditions. For larger sample sizes ($n \ge 100$), the MRSS algorithm reduces the MSE and RMSE by approximately 30–70% compared to the other algorithms and effectively controls the Pcor estimation errors. For small sample sizes ($n < 100$), a reduction of 10–20% in MSE and RMSE is observed for the MRSS algorithm compared to the Res algorithm, an increase of 10–20% compared to Lasso.Reg2, and only slight changes compared to the other algorithms.
The sensitivity analysis over various variance and sparsity parameters demonstrates the benefits of the MRSS algorithm in terms of robustness, stability, and accuracy. As the variance increases from 1 to 40, the MSE and RMSE remain consistent across the different examples and sample sizes, showing that our MRSS algorithm is remarkably resilient to variability and maintains excellent stability. As the level of sparsity decreases (from Example 1 to Example 4, or from $l = 6$ to 14), the MSE and RMSE of the MRSS algorithm increase but remain within the same order of magnitude. Even the best-performing MRSS algorithm undergoes a noticeable rise in MSE and RMSE for Example 4 and $l = 14$, as increasingly non-sparse and intricate controlling variables introduce certain systematic errors.

5. Real Data Analysis

A distinguishing feature of financial markets is the observed correlation among the price movements of various financial assets; in particular, there is substantial cross-correlation in the simultaneous time evolution of stock returns [29]. In numerous instances, a strong correlation does not necessarily imply a significant direct relationship. For instance, two stocks in the same market may be subject to shared macroeconomic or investor-psychology influences. Therefore, to examine the direct correlation between two stocks, it is necessary to eliminate the common drivers represented by the market index. The Pcor meets this requirement by assessing the direct relationship between the two stocks after removing the market impacts of the controlling variables. When the Pcor is accurately estimated, it is possible to evaluate the impact of diverse factors (e.g., economic sectors, other markets, or macroeconomic factors) on a specific stock. The resulting partial correlation data may be utilised in fields such as stock market risk management, stock portfolio optimisation, and financial control [7,8]. Moreover, the Pcor can also indicate the interdependence and influence of industries in the context of global integration. These techniques for analysing the Pcor can provide valuable information on the correlations between different assets and different sectors of the economy, as they are generalisable and can be applied to other asset types and cross-asset relationships in financial markets. This information is beneficial for practitioners and policymakers.
We chose 100 stocks with substantial market capitalisation and robust liquidity from the Shanghai Stock Exchange (SSE) market. These stocks can comprehensively represent the overall performance of listed stock prices in China's A-share market. We then downloaded their daily adjusted closing prices from Yahoo Finance from January 2018 to August 2023 and removed the missing data. Here, a sufficient sample size of $n = 1075$ was chosen to ensure the effectiveness of the algorithms and to limit the bias in the Pcor estimation. For each pair of the 100 stocks, we estimate their Pcor by setting the remaining stocks as the corresponding controlling variables and construct the estimated Pcor matrix. The Pcor matrix reveals the intrinsic correlation between two stocks after removing the influence of the broader stock market.
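A hedged sketch of this construction is given below: for every pair of columns, the remaining stocks serve as controlling variables, and a pairwise Pcor estimator is applied. The function pcor_fun is a hypothetical placeholder (for example, the MRSS steps above wrapped in a function), and returns stands for an n × 100 matrix of the processed price data.

```r
# Hedged sketch of the Pcor-matrix construction for the stock data: for each
# pair of columns, the remaining stocks serve as controlling variables.
# `pcor_fun(x, y, z)` is a hypothetical pairwise Pcor estimator supplied by the user.
estimate_pcor_matrix <- function(returns, pcor_fun) {
  m <- ncol(returns)
  P <- diag(1, m)
  for (i in 1:(m - 1)) {
    for (j in (i + 1):m) {
      z <- returns[, -c(i, j), drop = FALSE]            # controlling variables
      P[i, j] <- P[j, i] <- pcor_fun(returns[, i], returns[, j], z)
    }
  }
  P
}
```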
Figure 3 presents the estimated Pcor matrices for the 100 SSE stocks using the MCP.Reg2, MCP.Var and MRSS algorithms; blue signifies Pcor = 1, while red represents Pcor = −1. Whilst the MCP.Coef, MCP.Var, and RSS2 algorithms all estimate the Pcor as 0 when the true Pcor approaches 0, our proposed MRSS algorithm resembles MCP.Reg2, which estimates the Pcor accurately for weak partial correlations; thus, MRSS is capable of effectively estimating weak partial correlations. When dealing with high Pcor values and strong partial correlations, we find that the MCP.Var algorithm overestimates the Pcor as a result of the divergence in stock prices: for two stocks with higher stock prices, the Pcor estimated by the Var algorithm tends to be overestimated, often close to 1. MRSS effectively solves this problem. Notably, as a result of incorporating the MCP.Var algorithm, the MRSS algorithm amplifies certain partial correlations that are not significant under MCP.Reg2; these results can also be seen in Table 5. The MRSS estimates these to be stronger partial correlations, resulting in improved clarity of the partial correlation structure.
Figure 4 shows the stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates obtained by the MRSS algorithm from the 100 SSE stocks. Each node represents a stock, coloured by its sector, and the edge thickness represents the Pcor estimate between two nodes, with thick edges for Pcor > 0.4 and thin edges for Pcor < 0.4. Table 5 lists the stock pairs with their sectors and Pcor estimates for all MRSS-estimated Pcor > 0.4 among the 100 SSE stocks, and Table 6 lists the corresponding stock pairs with their company names, businesses, and sectors. Here, we use the industry classifications from the Global Industry Classification Standard (GICS): Communication Services, Consumer Discretionary (C.D.), Consumer Staples, Energy, Financials, Health Care, Industrials, Information Technology (I.T.), Materials, Real Estate and Utilities. We find that two stocks connected in the partial correlation network with a high Pcor are almost always in the same sector and operate in the same business. In addition, high Pcor values may indicate shareholding relationships between companies. For instance, the highly correlated stocks 601398–601939–601288–601988–601328 (Financials) are all state-controlled banks, which do not have a direct high-Pcor link with the city banks 601009–601166 (Financials). Stocks that do not belong to the same industry but exhibit a high Pcor may have certain other links behind them, such as 601519 (I.T.)–601700 (Industrials), which share a common major shareholder. After stripping out the other factors influencing the market, the Pcor represents the inherent and intrinsic correlation between two stocks, such as their belonging to the same sector.
As societies become increasingly integrated, the productive activities of different industries become interdependent and interact with each other. Categorising a company into only one industry does not reflect its overall performance and associated risks. Many listed companies in the stock market belong to conglomerates and operate in different industry sectors, so it is natural for the performance of these companies to be affected by multiple industries. Therefore, the Pcor, apart from showing the correlations within industries, also reveals the correlation between two industries that are linked by two stocks from different industries. For example, the partial correlation between the Bank of Communications (601328) and PetroChina (601857), with Pcor = 0.258, links the Energy (600028–601857, in orange) and Financials (601398–601939–601288–601988–601328, in dark blue) sectors of state-owned assets.
Overall, the MRSS algorithm amalgamates the characteristics of MCP.Reg2 and MCP.Var, enhancing the estimation of strong partial correlations, while effectively estimating those weak partial correlations, ultimately revealing the stock correlations.

6. Conclusions

This paper presents a novel minimum residual sum of squares (MRSS) algorithm for estimating partial correlation coefficients. Its purpose is to reduce the estimation bias for positive partial correlation coefficients in high-dimensional settings under both sparse and non-sparse conditions. The MRSS algorithm effectively mitigates the Pcor estimation bias by synergistically harnessing the strengths of the Coef, Reg2, and Var algorithms. We also discuss the MRSS algorithm's mathematical foundation and its performance in various scenarios compared with several existing algorithms. Through rigorous simulations and real data analysis, it becomes evident that the MRSS algorithm consistently outperforms its constituent and listed algorithms, particularly in challenging environments characterised by non-sparse conditions and high dimensionality. The sensitivity analysis over variance and sparsity parameters demonstrates the robustness, stability, and precision advantages of the MRSS algorithm. Further evidence of the effectiveness of the MRSS algorithm in the correlation analysis of stock data is provided by the real data analysis.

7. Future Work

Our proposed MRSS algorithm combines the benefits of two existing algorithms by minimising the residual sum of squares, enhancing the accuracy of the Pcor estimation. In upcoming studies, we may explore the integration of additional algorithms under the minimum-RSS principle to achieve a greater amalgamation of benefits from various algorithms and improve the estimation accuracy of the integrated algorithm. Reducing the computational complexity of our minimum-RSS integration algorithm to decrease computing time represents a core issue for future research. Additionally, conducting in-depth theoretical research on the MRSS algorithm, including a proof analysis of consistency and convergence, will be an essential direction for our next steps. Further refinement of the theoretical proofs and an in-depth investigation of the error convergence speed may uncover the reasons for the systematic estimation bias, which cannot be ignored when the Pcor is positive, in all current algorithms. Meanwhile, expanding the use of the MRSS algorithm to a wider range of fields is a focal point of our future research. Concerning financial data, we intend to thoroughly examine the partial correlations between financial assets beyond stocks and advise on relevant policies.

Author Contributions

Conceptualisation and methodology, J.Y. and M.Y.; software, G.B. and J.Y.; validation and formal analysis, G.B.; data curation, writing—original draft preparation, review and editing, and visualisation, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Doctoral Foundation of Yunnan Normal University (Project No.2020ZB014) and the Youth Project of Yunnan Basic Research Program (Project No.202201AU070051).

Data Availability Statement

The authors confirm that the data supporting the findings of this study are available within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Tables of the mean MSE ($\times 10^2$) and RMSE ($\times 10^2$) of the estimated Pcors over the true Pcor values $0, 0.05, \ldots, 0.95$ with $\sigma^2 = 1$, $n = 50, 100, 200, 400$, $p = 200, 500, 1000, 2000, 4000$ and the numbers of highly correlated controlling variables $l = 6, 14$ in Examples 1–4.
Table A1. The mean MSE ($\times 10^2$) and RMSE ($\times 10^2$) of the estimated Pcors over the true Pcor values $0, 0.05, \ldots, 0.95$ with $l = 6$, $\sigma^2 = 1$, $n = 50, 100, 200, 400$ and $p = 200, 500, 1000, 2000, 4000$ in Examples 1–4.
n / p / MSE ($\times 10^2$) for Lasso.Res, Lasso.Reg2, MCP.Res, MCP.Reg2, MCP.Coef, MCP.Var, RSS2, MRSS / RMSE ($\times 10^2$) for the same eight methods in the same order
Example 1
5020010.011.110.610.215.814.312.18.131.231.932.031.035.333.831.527.6
50011.414.311.712.320.519.116.211.333.236.033.533.940.239.136.432.6
100011.815.812.013.122.521.518.612.833.837.634.135.042.041.239.034.7
1005007.46.67.25.87.36.25.33.126.724.526.423.422.320.919.616.3
10008.68.68.57.29.28.06.84.028.927.728.626.025.223.822.218.4
20009.610.79.48.711.210.08.55.430.430.930.128.428.126.825.021.4
2005003.11.72.01.11.81.51.60.617.312.614.110.610.710.010.77.2
10004.12.62.91.62.42.02.10.819.815.616.612.512.311.612.28.1
20005.03.73.72.23.12.62.51.021.918.319.014.413.913.013.39.0
40010001.30.70.50.40.60.50.60.311.38.37.06.36.05.86.54.6
20001.71.10.70.50.80.70.80.312.910.18.17.26.76.47.25.0
40002.21.50.90.70.90.80.90.414.512.19.38.17.47.08.15.4
Example 2
5020010.511.511.010.616.515.012.48.631.832.532.531.636.435.031.928.7
50011.814.812.112.721.619.916.711.833.736.534.134.441.339.937.133.3
100012.216.012.413.523.122.019.213.134.337.934.635.542.541.739.735.1
1005007.86.97.66.17.86.75.53.327.425.027.123.923.421.920.116.8
10009.09.08.97.69.78.57.14.429.528.429.226.626.324.822.819.5
200010.011.09.89.011.610.58.85.731.031.430.729.028.927.625.522.3
2005003.31.82.31.32.01.71.70.718.013.214.911.411.310.611.17.7
10004.42.93.21.82.72.32.10.920.516.217.513.313.312.412.48.6
20005.33.94.12.43.42.92.61.122.718.919.815.114.813.813.69.6
40010001.50.80.60.50.70.60.60.312.19.17.77.06.56.36.75.1
20001.91.20.80.60.80.70.80.413.710.98.98.07.26.97.45.5
40002.41.71.10.81.00.91.00.415.312.810.28.98.07.68.36.0
Example 3
5020011.412.312.011.518.116.613.910.033.133.534.032.938.136.834.230.8
50012.715.513.013.722.921.418.213.234.937.335.435.842.441.438.935.2
100013.217.013.514.524.623.420.614.535.739.136.136.843.842.940.936.8
1005008.67.78.67.08.77.66.44.028.826.328.725.525.524.021.918.9
10009.99.59.88.510.99.77.95.230.929.230.728.128.426.924.421.7
200010.911.810.89.913.111.910.16.832.532.532.230.531.530.027.824.9
2005004.02.32.91.72.42.11.90.919.714.816.713.013.312.511.89.3
10005.13.43.92.43.22.82.51.222.317.719.415.015.314.313.610.5
20006.24.64.93.14.13.53.11.524.420.421.717.017.015.915.011.7
40010002.01.21.00.80.90.90.80.513.910.99.78.98.38.07.76.7
20002.51.71.21.01.11.00.90.615.612.810.99.99.18.88.37.3
40003.12.21.51.21.41.21.20.717.214.612.210.810.09.69.37.9
Example 4
5020013.914.214.714.022.420.717.713.336.636.037.636.342.341.039.035.5
50016.518.416.917.227.526.423.817.539.840.740.340.145.945.244.140.3
100018.020.518.218.828.628.026.119.241.642.941.942.046.546.345.742.3
10050012.410.812.410.313.712.410.27.834.531.134.531.033.431.929.227.2
100014.613.414.612.718.316.814.010.837.534.837.534.538.737.334.632.0
200016.616.516.615.422.120.617.914.240.038.540.038.042.141.039.136.7
2005007.04.75.94.15.14.73.62.926.020.823.819.921.320.517.716.9
10009.46.88.15.77.36.65.14.130.124.827.923.225.124.221.120.0
200011.68.910.37.49.48.76.85.533.428.231.426.428.427.524.323.1
40010005.23.93.73.33.53.32.42.722.319.218.817.718.117.815.216.2
20006.85.54.94.34.64.43.23.625.522.721.620.320.720.317.418.7
40008.57.06.25.45.95.64.24.528.525.424.422.623.423.019.620.9
Table A2. The mean MSE ($\times 10^2$) and RMSE ($\times 10^2$) of the estimated Pcors over the true Pcor values $0, 0.05, \ldots, 0.95$ with $l = 14$, $\sigma^2 = 1$, $n = 50, 100, 200, 400$ and $p = 200, 500, 1000, 2000, 4000$ in Examples 1–4.
n / p / MSE ($\times 10^2$) for Lasso.Res, Lasso.Reg2, MCP.Res, MCP.Reg2, MCP.Coef, MCP.Var, RSS2, MRSS / RMSE ($\times 10^2$) for the same eight methods in the same order
Example 1
5020068.536.870.455.287.996.061.557.480.958.281.972.492.997.176.374.3
50091.049.591.271.986.893.277.373.793.467.793.482.792.395.786.284.4
1000100.956.498.878.383.589.383.079.098.372.597.386.390.593.689.687.2
10050039.421.418.214.291.1103.230.514.461.344.141.536.694.9100.649.936.8
100054.428.330.922.997.9104.735.723.372.150.654.146.498.1101.354.646.9
200069.634.647.934.499.8104.641.934.881.555.967.456.998.9101.160.857.5
20050010.65.31.71.77.113.22.80.831.822.513.012.821.928.414.18.2
100016.49.42.52.420.932.34.51.339.529.615.615.440.849.818.210.5
200023.414.33.73.743.656.78.42.447.336.518.818.763.472.425.914.9
40010004.42.10.50.50.90.80.90.220.514.26.96.97.36.97.84.5
20006.43.70.60.71.21.21.20.324.819.07.88.08.38.68.84.9
40009.06.00.80.91.83.21.50.429.324.29.09.410.313.49.85.6
Example 2
5020068.637.070.556.088.195.961.458.180.958.482.073.093.097.176.374.9
50091.549.991.371.987.394.477.673.993.668.293.482.792.596.386.384.5
1000101.056.999.278.283.589.483.578.998.472.797.586.390.493.789.987.2
10050039.921.718.814.792.2103.230.714.861.744.442.237.295.4100.650.137.4
100055.028.731.923.698.0104.936.023.872.451.055.047.198.1101.354.947.5
200070.034.848.835.299.8104.842.435.881.856.068.157.698.9101.261.258.3
20050011.05.61.91.97.613.83.00.932.423.113.813.522.829.314.58.7
100016.89.82.82.722.132.74.91.440.030.216.416.242.550.619.311.3
200023.914.94.04.044.757.88.82.747.837.219.619.564.473.026.616.1
40010004.72.30.60.61.10.91.00.321.214.97.67.77.97.68.15.0
20006.84.00.80.81.31.41.30.425.419.88.78.88.89.29.15.5
40009.46.51.01.11.93.51.60.530.025.09.810.210.914.210.26.2
Example 3
5020070.338.172.557.689.497.263.260.281.959.383.174.093.797.777.576.2
50093.151.392.873.488.495.079.075.694.469.194.283.693.196.687.185.5
1000102.558.2100.780.085.191.085.380.399.173.798.287.391.394.590.787.8
10050042.423.221.917.394.4105.331.917.563.646.045.540.496.5101.651.140.7
100057.630.235.726.8100.1106.737.027.274.152.258.250.399.1102.155.850.8
200072.236.052.337.9101.8106.543.738.583.157.170.659.899.8102.062.360.6
20050013.07.33.33.210.417.54.01.735.326.217.717.428.735.517.612.8
100019.311.94.44.326.037.66.32.642.933.320.520.347.355.922.215.6
200026.917.15.95.949.864.110.84.350.739.823.923.768.677.729.520.3
40010006.33.71.41.51.81.71.60.824.619.011.911.912.111.711.19.0
20008.86.01.81.82.22.31.91.029.124.013.013.213.314.012.49.8
400011.88.82.12.23.35.22.41.233.629.114.314.716.119.613.610.7
Example 4
5020073.440.276.761.492.499.966.464.583.861.085.576.495.299.079.578.9
50096.653.696.677.392.599.282.780.196.370.896.185.895.298.689.188.0
1000106.661.8105.885.490.496.491.186.4101.176.0100.790.294.197.293.891.2
10050049.227.931.025.1102.1111.134.725.468.550.554.248.7100.2104.253.749.1
100066.035.548.437.1106.9112.141.037.879.456.867.859.3102.3104.659.860.0
200081.741.866.849.8109.2112.950.450.888.561.979.968.7103.2104.967.869.7
20050018.912.28.38.120.829.78.96.342.433.828.227.844.051.627.824.8
100028.319.411.711.545.059.214.79.951.942.433.533.066.175.335.131.0
200038.826.416.215.876.192.823.615.260.949.339.338.787.095.743.238.1
400100013.210.36.96.98.07.86.46.035.531.225.725.827.527.524.924.3
200018.815.79.29.411.612.59.78.242.438.529.729.933.434.930.628.4
400024.921.411.712.018.322.412.310.848.844.933.634.042.046.434.032.5

References

  1. Tabachnick, B.G.; Fidell, L.S.; Ullman, J.B. Using Multivariate Statistics, 6th ed.; Pearson: Boston, MA, USA, 2013. [Google Scholar]
  2. Huang, Y.; Chang, X.; Zhang, Y.; Chen, L.; Liu, X. Disease characterization using a partial correlation-based sample-specific network. Brief. Bioinform. 2021, 22, bbaa062.
  3. Peng, J.; Wang, P.; Zhou, N.; Zhu, J. Partial correlation estimation by joint sparse regression models. J. Am. Stat. Assoc. 2009, 104, 735–746.
  4. De La Fuente, A.; Bing, N.; Hoeschele, I.; Mendes, P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 2004, 20, 3565–3574.
  5. Marrelec, G.; Kim, J.; Doyon, J.; Horwitz, B. Large-scale neural model validation of partial correlation analysis for effective connectivity investigation in functional MRI. Hum. Brain Mapp. 2009, 30, 941–950.
  6. Wang, G.J.; Xie, C.; Stanley, H.E. Correlation structure and evolution of world stock markets: Evidence from Pearson and partial correlation-based networks. Comput. Econ. 2018, 51, 607–635.
  7. Kenett, D.Y.; Tumminello, M.; Madi, A.; Gur-Gershgoren, G.; Mantegna, R.N.; Ben-Jacob, E. Dominating clasp of the financial sector revealed by partial correlation analysis of the stock market. PLoS ONE 2010, 5, e15032.
  8. Kenett, D.Y.; Huang, X.; Vodenska, I.; Havlin, S.; Stanley, H.E. Partial correlation analysis: Applications for financial markets. Quant. Finance 2015, 15, 569–578.
  9. Michis, A.A. Multiscale partial correlation clustering of stock market returns. J. Risk Financ. Manag. 2022, 15, 24.
  10. Singh, V.; Li, B.; Roca, E. Global and regional linkages across market cycles: Evidence from partial correlations in a network framework. Appl. Econ. 2019, 51, 3551–3582.
  11. Epskamp, S.; Fried, E.I. A tutorial on regularized partial correlation networks. Psychol. Methods 2018, 23, 617–634.
  12. Williams, D.R.; Rast, P. Back to the basics: Rethinking partial correlation network methodology. Br. J. Math. Stat. Psychol. 2020, 73, 187–212.
  13. Waldorp, L.; Marsman, M. Relations between networks, regression, partial correlation, and the latent variable model. Multivariate Behav. Res. 2022, 57, 994–1006.
  14. Gvozdarev, A.; Parovik, R. On the relationship between the fractal dimension of geomagnetic variations at Altay and the space weather characteristics. Mathematics 2023, 11, 3449.
  15. Khare, K.; Oh, S.Y.; Rajaratnam, B. A convex pseudolikelihood framework for high dimensional partial correlation estimation with convergence guarantees. J. R. Stat. Soc. B 2015, 77, 803–825.
  16. Kim, S. ppcor: An R package for a fast calculation to semi-partial correlation coefficients. Commun. Stat. Appl. Methods 2015, 22, 665.
  17. Huang, Z.; Deb, N.; Sen, B. Kernel partial correlation coefficient—A measure of conditional dependence. J. Mach. Learn. Res. 2022, 23, 9699–9756.
  18. Van Aert, R.C.; Goos, C. A critical reflection on computing the sampling variance of the partial correlation coefficient. Res. Synth. Methods 2023, 14, 520–525.
  19. Hu, H.; Qiu, Y. Inference for nonparanormal partial correlation via regularized rank-based nodewise regression. Biometrics 2023, 79, 1173–1186.
  20. Cox, D.R.; Wermuth, N. Multivariate Dependencies: Models, Analysis and Interpretation; Chapman and Hall: London, UK, 1996.
  21. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 1996, 58, 267–288.
  22. Owen, A.B. A robust hybrid of lasso and ridge regression. Contemp. Math. 2007, 443, 59–72.
  23. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
  24. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. B 2005, 67, 301–320.
  25. Tibshirani, R.; Saunders, M.; Rosset, S.; Zhu, J.; Knight, K. Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. B 2005, 67, 91–108.
  26. Zhang, C.H. Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 2010, 38, 894–942.
  27. Wang, H. Coordinate descent algorithm for covariance graphical lasso. Stat. Comput. 2014, 24, 521–529.
  28. Fan, J.; Fan, Y.; Lv, J. High dimensional covariance matrix estimation using a factor model. J. Econom. 2008, 147, 186–197.
  29. Elton, E.J.; Gruber, M.J.; Brown, S.J.; Goetzmann, W.N. Modern Portfolio Theory and Investment Analysis; John Wiley and Sons: Hoboken, NJ, USA, 2009.
Figure 1. Average estimated Pcor against true Pcor for each true Pcor = 0, 0.1, …, 0.95, with p = 2n (first row) and p = 10n (second row), n = 100, 200, 400, and l = 6 in Example 1.
Figure 2. Average estimated Pcor against true Pcor for n = 100, p = 200 (first row) and n = 200, p = 2000 (second row), with l = 6, 10, 14 in Example 2.
Figure 3. Estimated Pcor matrix of 100 HKSE stocks, with blue representing Pcor = 1 and red representing Pcor = −1.
Figure 4. Stocks' Pcor network for the top-100 and top-50 pairs of Pcor estimates obtained by the MRSS algorithm from 100 SSE stocks. Each node represents a stock, coloured by its sector. Edge thickness represents the Pcor estimate between the two nodes: thick edges have Pcor > 0.4 and thin edges have Pcor < 0.4.
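Figure 4 is obtained by ranking all estimated pairwise Pcors and keeping the top-k pairs. As a hedged illustration (not the authors' plotting code), the sketch below assumes an estimated Pcor matrix `pcor`, a list of stock `names`, and a list of `sectors` are already available (all hypothetical names) and uses the networkx package to assemble the graph, with edges classified as thick or thin at the 0.4 threshold given in the caption; node colouring and layout are left to the plotting routine.

```python
import numpy as np
import networkx as nx

def top_k_pcor_network(pcor, names, sectors, k=100, thick=0.4):
    """Graph of the k largest Pcor estimates among all stock pairs.

    pcor    : (p, p) symmetric matrix of estimated partial correlations
    names   : length-p list of stock symbols
    sectors : length-p list of sector labels (used to colour nodes)
    """
    p = pcor.shape[0]
    iu, ju = np.triu_indices(p, k=1)              # each pair counted once
    order = np.argsort(pcor[iu, ju])[::-1][:k]    # k largest estimates
    G = nx.Graph()
    for idx in order:
        i, j = iu[idx], ju[idx]
        r = float(pcor[i, j])
        G.add_node(names[i], sector=sectors[i])
        G.add_node(names[j], sector=sectors[j])
        # thick edge for Pcor > 0.4, thin edge otherwise, as in Figure 4
        G.add_edge(names[i], names[j], pcor=r, width=2.5 if r > thick else 0.8)
    return G

# e.g. G = top_k_pcor_network(pcor_hat, symbols, sectors, k=50)
```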
Table 1. The mean MSE (×10²) and RMSE (×10²) of the estimated Pcors over true Pcor = 0, 0.05, …, 0.95 with l = 10, σ² = 1, n = 50, 100, 200, 400, and p = 200, 500, 1000, 2000, 4000 in Examples 1–4.
Example 1 | MSE (×10²) | RMSE (×10²)
n | p | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
5020036.721.737.627.434.635.434.627.159.244.459.950.855.256.355.650.5
50046.829.647.236.034.435.336.435.566.951.967.258.254.956.057.357.5
100051.433.050.739.334.134.836.638.870.254.869.760.854.355.357.560.3
10050022.111.715.29.524.221.514.07.745.932.438.129.947.946.035.527.3
100029.616.122.413.932.530.022.613.453.137.846.136.154.553.545.235.4
200036.119.729.618.435.034.329.418.258.841.753.241.556.156.351.141.3
2005006.63.11.71.52.42.61.80.725.117.312.812.112.713.311.57.6
10009.65.22.52.23.84.32.40.930.222.115.614.516.417.113.18.9
200013.07.93.63.16.06.63.31.435.327.018.617.321.021.515.310.5
40010002.71.30.50.50.70.60.70.216.111.36.76.86.36.06.74.4
20003.82.20.60.60.80.70.90.319.214.67.88.06.86.47.54.9
40005.23.50.80.91.00.91.10.422.318.49.09.37.77.58.75.6
Example 2
5020037.021.838.328.134.735.834.927.959.444.460.451.355.356.756.051.2
50047.129.547.636.534.535.436.436.067.151.767.458.555.056.157.258.0
100051.733.651.339.834.134.836.939.270.455.470.161.254.455.257.860.5
10050022.612.015.79.925.122.614.98.146.532.838.730.548.847.136.728.0
100030.016.523.014.433.330.823.413.953.538.146.836.655.154.145.836.2
200036.620.130.319.235.334.730.019.059.242.153.842.456.456.651.642.2
2005006.93.41.91.72.62.81.90.825.817.913.712.913.514.011.88.2
10009.95.62.82.44.14.62.61.130.822.816.415.417.017.913.49.5
200013.58.33.93.46.46.93.51.535.927.619.518.121.622.315.811.2
40010002.91.50.60.60.70.60.70.316.812.17.57.56.76.46.94.9
20004.12.50.80.80.90.80.90.419.915.48.68.87.37.07.85.5
40005.53.91.01.11.11.01.20.523.019.39.810.38.38.19.16.3
Example 3
5020038.523.039.929.535.136.235.529.460.645.761.652.755.757.256.552.5
50048.730.649.237.934.835.737.137.368.352.768.559.655.456.558.059.0
100053.434.553.041.434.535.437.440.971.556.071.262.454.956.158.461.9
10050024.413.317.811.527.925.517.310.148.334.641.232.851.350.039.631.1
100031.917.525.216.134.633.126.515.855.239.349.038.856.255.948.638.5
200038.821.532.420.736.035.831.720.661.043.655.644.157.157.653.044.0
2005008.34.42.82.63.63.92.51.328.120.316.515.817.117.714.011.1
100011.66.94.03.55.56.23.31.733.325.219.518.321.122.116.012.8
200015.49.75.44.78.38.94.62.338.329.822.621.126.226.818.814.6
40010004.02.31.11.21.21.01.00.619.615.110.610.69.59.28.97.7
20005.43.61.41.51.41.31.20.822.818.611.912.010.410.19.98.5
40007.05.21.81.91.71.61.60.925.922.313.113.511.611.511.29.4
Example 4
5020041.725.443.632.836.337.637.232.663.148.164.455.657.258.758.355.4
50053.834.354.342.636.137.238.941.971.756.072.063.357.058.260.062.6
100058.938.358.946.835.936.939.945.775.259.375.166.456.757.861.165.3
10050030.417.324.816.935.534.625.916.653.939.448.639.957.657.848.139.7
100039.722.534.623.538.439.533.623.461.644.857.446.859.861.154.546.8
200048.027.143.229.739.540.837.029.567.849.364.352.960.962.157.952.7
20050012.98.16.86.38.79.35.54.535.127.425.524.428.729.822.121.1
100018.512.49.88.813.915.08.26.742.133.730.628.936.538.026.925.6
200024.616.613.511.821.622.112.39.748.538.936.033.545.846.532.730.7
40010009.17.05.25.25.25.04.04.229.625.922.322.422.221.919.720.4
200012.710.47.07.17.27.15.85.834.931.425.926.026.226.223.724.0
400016.514.18.99.29.79.97.87.839.836.329.329.630.431.127.227.6
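The MSE and RMSE entries in Table 1 (and in Tables 3 and 4 below) are means over the grid of true Pcor values 0, 0.05, …, 0.95. The snippet below is a minimal sketch of that bookkeeping, not the authors' code: it assumes the estimates for each true Pcor are stored as an array over Monte Carlo replications and that the per-Pcor RMSE is the square root of the per-Pcor MSE before averaging (both conventions are assumptions); the factor of 100 matches the ×10² scaling of the tables.

```python
import numpy as np

def table_mse_rmse(estimates_by_true):
    """Mean MSE and RMSE (both x100) over a grid of true Pcor values.

    estimates_by_true : dict mapping each true Pcor (0, 0.05, ..., 0.95)
                        to a 1-D array of estimates across replications.
    """
    mses, rmses = [], []
    for true_pcor, est in estimates_by_true.items():
        sq_err = (np.asarray(est, dtype=float) - true_pcor) ** 2
        mses.append(sq_err.mean())            # MSE at this true Pcor
        rmses.append(np.sqrt(sq_err.mean()))  # RMSE at this true Pcor
    return 100 * float(np.mean(mses)), 100 * float(np.mean(rmses))
```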
Table 2. The average percentage difference of the MSE and RMSE relative to the MRSS algorithm for a small sample size (n = 50) and a large sample size (n = 100, 200, 400), under the same settings as in Table 1.
(%) | For MSE | For RMSE
Example | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2
Small sample size (n = 50)
Example 1 | −25 | 21 | −25 | −1 | −2 | −4 | −6 | −14 | 12 | −15 | −1 | 2 | 0 | −1
Example 2 | −24 | 22 | −25 | −1 | 0 | −3 | −5 | −14 | 12 | −14 | −1 | 3 | 1 | −1
Example 3 | −23 | 23 | −24 | −1 | 3 | 0 | −2 | −13 | 12 | −14 | −1 | 4 | 2 | 0
Example 4 | −22 | 23 | −23 | −1 | 11 | 8 | 3 | −13 | 12 | −13 | −1 | 7 | 5 | 2
Large sample size (n = 100, 200, 400)
Example 1 | −79 | −62 | −52 | −39 | −65 | −63 | −56 | −60 | −44 | −34 | −27 | −36 | −35 | −30
Example 2 | −78 | −61 | −51 | −39 | −63 | −61 | −53 | −58 | −43 | −34 | −26 | −34 | −34 | −27
Example 3 | −74 | −55 | −47 | −34 | −56 | −55 | −42 | −52 | −35 | −28 | −20 | −30 | −30 | −18
Example 4 | −53 | −28 | −27 | −14 | −37 | −38 | −15 | −31 | −14 | −14 | −6 | −20 | −21 | −6
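The percentages in Table 2 compare each competing algorithm with MRSS, so a negative entry means MRSS attains the smaller error. The sketch below reproduces that arithmetic under the sign convention (MRSS − competitor)/competitor × 100, which is consistent with the reported values but is our reading rather than a formula stated in the table; the example call reuses the Example 1, n = 50 entries of Table 1.

```python
import numpy as np

def pct_diff_vs_mrss(err_mrss, err_other):
    """Average percentage difference of MRSS relative to a competitor.

    err_mrss, err_other : mean MSE (or RMSE) values of MRSS and of the
    competitor over the same (n, p) settings; a negative result means
    the MRSS error is smaller.
    """
    err_mrss = np.asarray(err_mrss, dtype=float)
    err_other = np.asarray(err_other, dtype=float)
    return float(np.mean(100.0 * (err_mrss - err_other) / err_other))

# Example 1, n = 50 rows of Table 1, Lasso-Res MSE column vs MRSS:
# pct_diff_vs_mrss([27.1, 35.5, 38.8], [36.7, 46.8, 51.4])  ->  about -25
```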
Table 3. The mean MSE (×10²) and RMSE (×10²) of the estimated Pcors over true Pcor = 0, 0.05, …, 0.95 with different variances σ² = 1, 10, 40 and l = 10, for a small sample size (n = 50, 100) and a large sample size (n = 200, 400) in Examples 1–4.
Small Sample Size | MSE (×10²) | RMSE (×10²)
Example | σ² | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
Example 1137.1021.9933.7924.0832.4731.8928.9323.4459.0143.8355.7046.2153.8253.8950.3845.38
1037.0421.7433.6624.0232.3331.8428.7323.4758.9643.6455.5846.1353.7053.8850.1645.39
4037.0221.8433.8024.1732.4031.8828.8323.5758.9743.6855.6846.2053.6853.8250.2345.43
Example 2137.5122.2434.3624.6232.8632.3429.3924.0359.3644.1156.2146.7654.1754.3150.8546.00
1037.5422.1834.2524.4932.7432.1629.1723.8459.3944.0656.1146.6354.1254.1950.6345.82
4037.4022.1634.2924.6432.8032.2729.1524.0859.2744.0056.1546.7554.0854.2250.5046.05
Example 3139.2823.3936.2526.1933.8233.6230.9125.7060.8045.3457.8748.4055.1155.5552.3547.83
1039.2623.3136.1426.1033.7333.6330.7025.5960.8145.2657.8348.3655.0655.6152.1547.77
4039.1623.2636.3526.3133.8833.7130.7325.8260.7145.2057.9448.4755.1355.5852.1647.91
Example 4145.4027.5043.2332.0436.9737.7635.4231.6365.5449.5063.6554.1658.2159.2856.6453.76
1045.4327.4043.2032.0036.9837.7635.5031.6565.5649.4063.6554.1058.2159.3456.7753.76
4045.3827.4243.1531.8937.0637.9435.3031.5965.4949.3663.5653.9858.2359.4356.5053.68
Large Sample Size | MSE (×10²) | RMSE (×10²)
Example | σ² | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
Example 116.813.871.621.472.462.601.710.6524.7018.4411.7711.3411.8011.9710.476.98
106.803.841.651.492.512.711.760.6824.6818.3911.8511.4111.8712.1810.557.08
406.773.881.641.462.432.551.700.6624.6118.4311.7811.2911.6511.8610.346.98
Example 217.164.171.831.672.632.781.800.7525.3819.1812.5812.1612.4212.6010.807.60
107.154.131.871.692.722.891.850.7725.3619.1112.6812.2012.6212.8110.887.66
407.144.151.841.672.632.761.800.7525.3219.1012.5912.1112.3212.5510.777.56
Example 318.605.342.752.553.623.822.371.2828.0021.8815.6915.2215.9916.2313.1210.69
108.585.302.782.573.613.822.371.3127.9721.7915.7615.2615.9316.1813.0110.74
408.575.332.762.533.593.762.331.2827.9421.8415.6815.1415.9316.1312.8410.63
Example 4115.7311.448.538.0711.0511.407.276.4538.3132.2728.2527.4631.6232.2625.3724.88
1015.7111.438.568.0711.0411.407.226.4438.2932.2528.2927.4531.6132.2425.2824.86
4015.7011.428.568.0511.0911.367.296.4338.2832.2528.2827.4231.6832.2025.3924.83
Table 4. The mean MSE (×10²) and RMSE (×10²) of the estimated Pcors over true Pcor = 0, 0.05, …, 0.95 with l = 6, 10, 14 and σ² = 1, for a small sample size (n = 50, 100) and a large sample size (n = 200, 400) in Examples 1–4.
Small Sample Size | MSE (×10²) | RMSE (×10²)
Example | l | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
Example 169.811.29.99.514.413.211.37.430.731.430.829.632.230.928.925.2
1037.122.033.824.132.531.928.923.459.043.855.746.253.853.950.445.4
1470.637.859.646.191.298.555.047.181.258.272.663.694.698.269.664.5
Example 2610.211.610.39.915.113.811.67.831.331.931.430.233.131.829.525.9
1037.522.234.424.632.932.329.424.059.444.156.246.854.254.350.846.0
1471.038.260.146.691.598.855.347.581.558.473.064.094.798.469.864.9
Example 3611.112.311.310.816.415.112.99.032.633.032.931.635.033.631.428.1
1039.323.436.226.233.833.630.925.760.845.357.948.455.155.552.347.8
1473.039.562.648.893.2100.356.749.982.759.675.065.995.699.170.866.9
Example 4615.315.615.614.722.120.818.313.838.337.338.637.041.540.438.635.7
1045.427.543.232.037.037.835.431.665.549.563.654.258.259.356.653.8
1478.943.570.956.098.9105.361.057.586.362.880.771.598.3101.473.972.8
Large Sample Size | MSE (×10²) | RMSE (×10²)
Example | l | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
Example 162.91.91.81.11.61.41.40.516.312.812.49.99.59.09.76.5
106.83.91.61.52.52.61.70.624.718.411.811.311.812.010.57.0
1411.76.81.61.612.617.93.20.932.224.311.811.925.329.914.18.1
Example 263.12.12.01.31.81.51.50.617.013.513.210.610.29.69.97.1
107.24.21.81.72.62.81.80.725.419.212.612.212.412.610.87.6
1412.17.21.81.813.118.33.41.032.825.012.712.726.230.714.68.8
Example 363.82.62.61.72.21.91.70.918.815.215.112.412.211.511.08.9
108.65.32.72.53.63.82.41.328.021.915.715.216.016.213.110.7
1414.49.13.23.115.621.44.51.936.028.516.916.931.035.717.713.0
Example 468.16.16.55.06.05.64.23.927.623.524.721.722.822.219.219.3
1015.711.48.58.111.011.47.36.438.332.328.227.531.632.325.424.9
1423.817.610.710.629.937.412.69.447.040.031.731.550.055.232.629.9
Table 5. Stock pairs, their sectors, and Pcor estimates from the different algorithms for all pairs with an MRSS-estimated Pcor > 0.4, from 100 SSE stocks.
Symbol (Stock 1) | Sector | Symbol (Stock 2) | Sector | Res (Lasso) | Reg2 (Lasso) | Res (MCP) | Reg2 (MCP) | Coef | Var | RSS2 | MRSS
601398 | Financials | 601939 | Financials | 0.526 | 0.526 | 0.535 | 0.535 | 0.533 | 0.840 | 0.527 | 0.840
600022 | Materials | 601005 | Materials | 0.569 | 0.569 | 0.581 | 0.581 | 0.580 | 0.769 | 0.590 | 0.769
601186 | Industrials | 601390 | Industrials | 0.589 | 0.589 | 0.566 | 0.566 | 0.587 | 0.748 | 0.584 | 0.748
600012 | Industrials | 601099 | Financials | 0.405 | 0.405 | 0.399 | 0.399 | 0.404 | 0.697 | 0.414 | 0.697
601288 | Financials | 601988 | Financials | 0.473 | 0.473 | 0.476 | 0.476 | 0.490 | 0.646 | 0.473 | 0.646
600028 | Energy | 601857 | Energy | 0.550 | 0.550 | 0.545 | 0.545 | 0.569 | 0.607 | 0.534 | 0.607
601098 | C.D. | 601801 | C.D. | 0.468 | 0.468 | 0.474 | 0.474 | 0.475 | 0.606 | 0.476 | 0.606
601328 | Financials | 601988 | Financials | 0.357 | 0.357 | 0.316 | 0.316 | 0.369 | 0.600 | 0.322 | 0.600
600017 | Industrials | 601880 | Industrials | 0.372 | 0.372 | 0.382 | 0.382 | 0.384 | 0.574 | 0.394 | 0.574
600026 | Industrials | 601872 | Industrials | 0.590 | 0.590 | 0.572 | 0.573 | 0.590 | 1 | 0.593 | 0.573
601866 | Industrials | 601919 | Industrials | 0.552 | 0.552 | 0.545 | 0.545 | 0.562 | 1 | 0.554 | 0.545
601179 | Industrials | 601390 | Industrials | 0.291 | 0.291 | 0.275 | 0.275 | 0.285 | 0.543 | 0.284 | 0.543
600011 | Utilities | 600021 | Utilities | 0.535 | 0.535 | 0.522 | 0.522 | 0.535 | 0.543 | 0.529 | 0.543
601333 | Industrials | 601801 | C.D. | 0.526 | 0.526 | 0.526 | 0.526 | 0.525 | 1 | 0.528 | 0.526
600011 | Utilities | 600027 | Utilities | 0.514 | 0.514 | 0.517 | 0.517 | 0.514 | 1 | 0.501 | 0.517
601088 | Energy | 601666 | Energy | 0.289 | 0.289 | 0.326 | 0.326 | 0.359 | 0.492 | 0.349 | 0.492
601288 | Financials | 601398 | Financials | 0.353 | 0.353 | 0.338 | 0.338 | 0.349 | 0.491 | 0.339 | 0.491
601168 | Materials | 601899 | Materials | 0.515 | 0.515 | 0.490 | 0.490 | 0.535 | 1 | 0.497 | 0.490
601186 | Industrials | 601618 | Industrials | 0.260 | 0.260 | 0.245 | 0.245 | 0.250 | 0.488 | 0.249 | 0.488
600018 | Industrials | 601018 | Industrials | 0.480 | 0.480 | 0.483 | 0.483 | 0.486 | 1 | 0.485 | 0.483
600008 | Utilities | 600012 | Industrials | 0.319 | 0.319 | 0.309 | 0.309 | 0.312 | 0.436 | 0.304 | 0.436
601009 | Financials | 601166 | Financials | 0.300 | 0.300 | 0.298 | 0.298 | 0.303 | 0.427 | 0.307 | 0.427
600020 | Industrials | 601177 | Industrials | 0.431 | 0.431 | 0.430 | 0.430 | 0.421 | 0.422 | 0.429 | 0.422
601001 | Energy | 601137 | Materials | 0.268 | 0.268 | 0.261 | 0.261 | 0.283 | 0.421 | 0.260 | 0.421
601519 | I.T. | 601700 | Industrials | 0.224 | 0.224 | 0.219 | 0.219 | 0.217 | 0.410 | 0.203 | 0.410
600017 | Industrials | 601008 | Industrials | 0.404 | 0.404 | 0.405 | 0.405 | 0.403 | 0.979 | 0.407 | 0.405
601318 | Financials | 601601 | Financials | 0.414 | 0.414 | 0.403 | 0.403 | 0.414 | 1 | 0.414 | 0.403
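For context on the quantities in Table 5, the sketch below shows the classical residual-correlation construction of a Pcor between two stocks given the remaining ones: each of the two return series is regressed on all other stocks with a cross-validated lasso, and the Pearson correlation of the two residual series is taken. We read the "Res" columns as estimates of this kind, but that reading, the function name, the returns layout, and the use of scikit-learn's LassoCV are all illustrative assumptions; this is not the MRSS estimator itself.

```python
import numpy as np
from sklearn.linear_model import LassoCV

def residual_pcor(returns, i, j, random_state=0):
    """Residual-based Pcor between stocks i and j given all other stocks.

    returns : (n_days, n_stocks) matrix of (standardised) daily returns.
    """
    n, p = returns.shape
    others = [k for k in range(p) if k not in (i, j)]
    Z = returns[:, others]
    residuals = []
    for target in (i, j):
        y = returns[:, target]
        fit = LassoCV(cv=5, random_state=random_state).fit(Z, y)
        residuals.append(y - fit.predict(Z))   # part of y not explained by Z
    return float(np.corrcoef(residuals[0], residuals[1])[0, 1])
```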
Table 6. Stock pairs with their company names, businesses, and sectors for all pairs with an MRSS-estimated Pcor > 0.4, from 100 SSE stocks.
Symbol | Company | Business | Sector | Symbol | Company | Business | Sector
601098 | South Central Media | Media | C.D. | 601801 | Anhui Xinhua Media | Publishing | C.D.
600028 | Sinopec | Refining and Trading | Energy | 601857 | PetroChina | Refining and Trading | Energy
601088 | China Shenhua Energy | Coal Mining | Energy | 601666 | Pingdingshan Tianan Coal Mining | Coal Mining | Energy
601001 | Datong Coal Industry | Coal Mining | Energy | 601137 | Ningbo Boway Alloy Material | Industrial Metals | Materials
601398 | Industrial and Commercial Bank of China | Banks | Financials | 601939 | China Construction Bank | Banks | Financials
601288 | Agricultural Bank of China | Banks | Financials | 601988 | Bank of China | Banks | Financials
601328 | Bank of Communications | Banks | Financials | 601988 | Bank of China | Banks | Financials
601288 | Agricultural Bank of China | Banks | Financials | 601398 | Industrial and Commercial Bank of China | Banks | Financials
601009 | Bank of Nanjing | Banks | Financials | 601166 | Industrial Bank of China | Banks | Financials
601318 | Ping An Insurance of China | Insurance | Financials | 601601 | China Pacific Insurance | Insurance | Financials
601186 | China Railway Construction | Infrastructure | Industrials | 601390 | China Railway Engineering Group | Infrastructure | Industrials
600012 | Anhui Expressway | Railway and Highway | Industrials | 601099 | China Pacific Insurance | Certificate | Financials
600017 | Rizhao Port | Shipping Ports | Industrials | 601880 | Dalian Port | Shipping Ports | Industrials
600026 | China Shipping Development | Shipping Ports | Industrials | 601872 | China Merchants Energy Shipping | Shipping Ports | Industrials
601866 | China Shipping Container Lines | Shipping Ports | Industrials | 601919 | China Ocean Shipping | Shipping Ports | Industrials
601179 | China Xidian Electric | Grid Equipment | Industrials | 601390 | China Railway Engineering | Infrastructure | Industrials
601333 | Guangshen Railway | Railway and Highway | Industrials | 601801 | Anhui Xinhua Media | Publishing | C.D.
601186 | China Railway Construction | Infrastructure | Industrials | 601618 | Metallurgical Corporation of China | Professional Engineering | Industrials
600018 | Shanghai International Port Group | Shipping Ports | Industrials | 601018 | Ningbo Port | Shipping Ports | Industrials
600020 | Zhongyuan Expressway | Railway and Highway | Industrials | 601177 | Hangzhou Advance Gearbox | Machine | Industrials
600017 | Rizhao Port | Shipping Ports | Industrials | 601008 | Lianyungang Port | Shipping Ports | Industrials
601519 | Shanghai DZH | Software Development | I.T. | 601700 | Changshu Fengfan Power Equipment | Grid Equipment | Industrials
600022 | Jinan Iron and Steel | Plain Steel | Materials | 601005 | Chongqing Iron and Steel | Plain Steel | Materials
601168 | Western Mining | Industrial Metals | Materials | 601899 | Zijin Mining | Industrial Metals | Materials
600011 | Huaneng Power International | Electricity | Utilities | 600021 | Shanghai Electric Power | Electricity | Utilities
600011 | Huaneng Power International | Electricity | Utilities | 600027 | Huadian Power International | Electricity | Utilities
600008 | Beijing Capital | Water | Utilities | 600012 | Anhui Expressway | Railway and Highway | Industrials
