Article

Default Prediction with Industry-Specific Default Heterogeneity Indicators Based on the Forward Intensity Model

School of Management, Harbin Institute of Technology, Harbin 150001, China
*
Author to whom correspondence should be addressed.
Axioms 2023, 12(4), 402; https://doi.org/10.3390/axioms12040402
Submission received: 8 March 2023 / Revised: 18 April 2023 / Accepted: 19 April 2023 / Published: 21 April 2023
(This article belongs to the Special Issue Big Data Analytics and Mathematical Methods in Digital Economy)

Abstract

Default prediction for a large sample of firms in a region is affected by industry default heterogeneity. To build a credit risk model better suited to Chinese-listed firms, whose defaults are highly heterogeneous across industries, we extend the forward intensity model to predict their defaults using information on industry default heterogeneity. Unlike the original model, we combine a Bayesian approach with the forward intensity model to generate time-varying industry-specific default heterogeneity indicators. Our model captures co-movements of defaults in different industries that cannot be observed with the original forward intensity model, so that it can flexibly adjust a firm’s PD according to its industry. We also consider default heterogeneity in other industries by studying how the level and trend of other industries’ default heterogeneity influence a firm’s credit risk. Finally, we compute PDs for 4476 firms from January 2001 to December 2019 for 36 prediction horizons. The extended model improves both the in-sample and out-of-sample accuracy ratios of firms’ PDs for all 36 horizons; for almost all horizons, the accuracy ratio increases by more than 6%. Our model also reduces the gap between the aggregated PDs and the realized number of defaults. The industry-specific default heterogeneity indicator improves the model’s performance, especially for predicting defaults in a large portfolio, which is of significance for credit risk management in China and other regions.

1. Introduction

1.1. Background and Purpose

Credit risk management is an important topic for individual investors, banks, and financial regulators. To analyze the structure of credit risk over multiple periods, credit risk models have been developed to calculate the probability of default (PD) together with its term structure. Researchers and practitioners use firms’ PDs for the credit analysis of individual firms and of large portfolios. Recent research has found that default heterogeneity exists among industries. Traditional credit risk models, which lack information on industry-specific default heterogeneity, cannot meet the needs of financial regulators and of creditors who lend to a large number of firms, because such models cannot capture the co-movements of defaults within an industry. As the largest emerging market, China has a huge impact on the global economy. Because the industry-specific default heterogeneity of Chinese-listed firms is complex, the PDs that credit risk models predict for these firms are affected by it.
Therefore, the purpose of this paper is to build a credit risk model that is more suitable for Chinese-listed firms than the traditional model. To address default heterogeneity across Chinese industries, the model must contain information on industry default heterogeneity, so that it can adjust to each industry when predicting the defaults of a large number of Chinese-listed firms. Such a model is also practical for the credit analysis of large portfolios, whether in China or in other countries.

1.2. Research Objectives and Main Work

In line with this purpose, the main objectives of this paper are as follows. (1) We apply an advanced PD model, rarely used for Chinese firms before, to predict the PDs of Chinese-listed firms. Firstly, the model should allow dynamic measurement, automatically updating forward PDs as the firm’s risk factors are updated. Secondly, the model should predict the term structure of the PD and thus provide multi-period PD estimates. (2) In view of the default heterogeneity of different industries in China, we adjust the model so that it incorporates default heterogeneity information when predicting future defaults, giving our PDs higher accuracy ratios. In addition, when predicting the defaults of a large number of Chinese-listed firms, the aggregated PDs should be closer to the realized number of defaults than those of the original model. Meeting these objectives improves the model’s practicability in the credit analysis of large portfolios.
Based on the above research objectives, this paper mainly addresses the following problems: (1) Which PD model can realize the dynamic measurement of multi-period PDs for Chinese-listed firms? (2) How can the PD model be improved so that it contains information on the default heterogeneity of different industries? (3) How can the accuracy ratios of multi-period PDs be improved and the gap between aggregated PDs and the realized number of defaults be reduced for multiple periods?
To solve the above problems, we develop a richer model based on the forward intensity approach proposed by Duan et al. [1], which has been applied to estimate the PDs of more than 73,000 listed firms worldwide. The forward intensity model can address the term structure effect and predict PDs for different horizons for a firm, which makes it an advanced credit risk measurement approach. Our model has two major differences compared with the original model. Firstly, we constructed time-varying industry-specific default heterogeneity indicators, which can reflect an unobserved part of the differences in defaults among different industries through a Bayesian approach based on the forward intensity model. The indicators we constructed can capture co-movements of defaults in all industries. Secondly, we consider the impact of co-movements of defaults both within industries and across industries when computing PDs. In fact, the situations of defaults in the industry and in the whole country both influence the PD of each listed firm to different degrees.
Because firms in different industries have different characteristics, we divided all Chinese-listed firms into 10 industries. Our extended model can measure the influence on different industries’ firms. In our preliminary analysis, we found there was great heterogeneity of default among different industries. In this paper, we compute PDs according to default heterogeneity in different industries and improve the predictive ability of the forward intensity model.

1.3. Hypothesis, Novelties and Contributions

In this paper, Chinese-listed firms’ PDs are estimated based on the forward intensity model. Compared with the original model, we introduce the default heterogeneity indicator to characterize the default correlations within the industry. The model in this paper therefore rests on three hypotheses. Hypothesis (1): Firms’ defaults and other exits follow Poisson processes. Since the forward intensity model is derived from the doubly stochastic Poisson intensity model, we keep this basic assumption. This also means that defaults and other exits do not affect the firms’ covariates. Hypothesis (2): Firms’ defaults are conditionally independent. This is also called the CID hypothesis. The traditional default intensity model is usually built within the CID framework, which means that default correlations depend only on common factors, whether observable or latent. Hypothesis (3): Default heterogeneity causes default correlations within the industry. Under the CID hypothesis, we assume that default correlations within the industry come from latent common factors, which can be described by industry default heterogeneity.
Compared with previous research, our novelties mainly include two aspects: (1) In recent research on the PD prediction of Chinese-listed companies, the forward intensity model, which has better performance, is rarely applied or modified by researchers compared with the structural model. In this paper, we fill this gap. (2) The original forward intensity model does not capture the co-movement of defaults within the industry when predicting the PDs of a large number of firms in a region. We propose a Bayesian approach to make the forward intensity model include information on industry default heterogeneity, so that the model can flexibly adjust the firm’s PD according to industry.
Our main work is as follows. Firstly, we introduce the industry-specific default heterogeneity indicator into our model with a Bayesian approach and maximize the pseudo log-likelihood function to estimate the parameters of our Bayesian model. Then, default heterogeneity indicators for 10 industries are generated from January 2000 to December 2019. Secondly, we measure industry-specific default heterogeneity’s influence on each Chinese-listed firm, both from its industry and from other industries, for all prediction horizons. In addition, our model also includes trends in industry-specific default heterogeneity indicators within the industry and among other industries. Due to default heterogeneity among different industries, we estimate four parameters of 10 industries for 36 prediction horizons in our second estimation, and, finally, we compute PDs from them based on forward intensity.
The main contributions of this paper are as follows: (1) We construct a credit risk measurement model considering industry-specific default heterogeneity to predict the PDs of Chinese-listed firms based on the forward intensity model. Including information on industry-specific default heterogeneity can make the model more practical in stress testing for an industry. (2) We introduce industry-specific default heterogeneity indicators into the forward intensity model. We capture co-movements in firms’ defaults within the industry and among other industries for all prediction horizons. PDs are computed by considering the impact of default heterogeneity among different industries. (3) Compared with the original forward intensity model, our extended model improves the accuracy ratio and reduces the gap between aggregated PDs and the realized number of defaults for all prediction horizons.
This paper is organized as follows. In Section 2, we review the theoretical development of credit risk models and introduce recent research on industry-specific default heterogeneity. In Section 3, we extend the forward intensity model and introduce the industry-specific default heterogeneity indicator. This section also explains how we compute PDs. Section 4 gives details about how to estimate the parameters of our extended model. Section 5 presents the data and some preliminary analyses on defaults for all industries. In Section 6, we show the estimated values of the parameters of our model. We tested the model’s performance for all horizons by comparing its predictive ability for PDs. Section 7 presents our conclusions and discusses future developments.

2. Literature Review

2.1. Credit Risk Models

Credit risk models that can compute PDs with a term structure can be classified into two main categories: structural and reduced-form models. A structural model uses the structure of assets and liabilities to estimate the expected default frequency in the future. The earliest structural model was proposed by Merton [2] based on the Black–Scholes option pricing model. Later, the KMV company proposed the KMV model, based on the Merton model. Hillegeist et al. [3] argue that structural models have stronger predictive power than Z-scores. Because a structural model relies only on market information as a predictive variable, it misses much information. Researchers therefore often combine structural models with other models; recent examples are Zhang and Shi [4], Song et al. [5], and Zeng et al. [6].
In this paper, we extend the forward intensity model, which falls into the class of reduced-form models. Compared with structural models, reduced-form models allow risk factors to be selected freely, but early reduced-form models could not calculate PDs with a term structure. The earliest reduced-form models date back to the 1960s: Altman [7] and Beaver [8] only calculated credit scores using discriminant analysis. In 2007, Duffie et al. [9] proposed a doubly stochastic Poisson intensity model, which made it possible to predict PDs together with their term structure. They addressed the term structure effect to realize multi-period prediction, which provides the theoretical basis for our study. In 2012, Duan et al. [1] proposed a forward intensity approach to make the doubly stochastic Poisson intensity model more practical. The forward intensity approach is implemented by maximizing a decomposable pseudo-likelihood function. It realizes multi-period prediction using only the data available at the time of prediction, which provides the practical basis for this paper. Some researchers compute PDs based on the doubly stochastic Poisson intensity model or the forward intensity model. Recent examples are Hwang and Chu [10], Caporale et al. [11], Berent and Rejman [12], and Sigrist and Leuenberger [13].

2.2. Default Heterogeneity in Industries

Compared with structural models, the forward intensity approach includes more risk factors due to its broader assumptions. However, the original forward intensity approach computes PDs for a large sample of firms without considering default heterogeneity across industries and its influence. Recently, a growing number of researchers have studied default heterogeneity among different industries. Bhimani et al. [14] computed PDs for 31,025 private enterprises in Portugal and found that industry and geographical location affect default. Dakovic et al. [15] constructed a generalized linear mixed model that captures unobserved default heterogeneity in different industries, and found that a model considering unobserved heterogeneity had a higher accuracy ratio. Giesecke and Kim [16] developed a dynamic measurement method for systemic risk in the entire financial sector, which captures the impact of industry-specific risk factors on the timing of failure. Koopman et al. [17] proposed a high-dimensional, nonlinear, non-Gaussian dynamic factor model to decompose system default risk into (1) macro financial risk, (2) autonomous default dynamics, and (3) potential components of industry-specific effects. They found that about 35% of the variation in default rate was caused by industry-specific factors. Mensi et al. [18] investigated the quantile-dependent short-term and long-term effects of FFR, the VIX index, and the crude oil price on credit risk in the US banking, financial service, and insurance sectors, and found that these effects are time-varying and heterogeneous under different credit market conditions. Gertler et al. [19] used the quasi-panel method to model the corporate loan default rates of four major economic industries. Their model combines long-term and short-term techniques within a flexible unified framework in order to capture heterogeneity among industries, and they found that significant default heterogeneity exists across industries. Lee [20] examined whether industry-level credit risk affects the yield spread of corporate bonds, using three types of industry risk variables: dilemma exposure measures, industry status, and product market competition. The evidence suggests that industry systemic risk does play an important role in explaining bond yield spreads.
This paper extends the forward intensity approach by including information on default heterogeneity in industries. We computed PDs with information on the default heterogeneity of all industries to make them more helpful for predicting defaults both for one firm and a large portfolio of firms.

3. Computed PDs by Industry-Specific Heterogeneity Indicators

3.1. Industry-Specific Default Heterogeneity Indicators

It is an effective method to construct different categories of samples’ indicators as independent variables impacting credit risk. For example, Batrancea [21] constructed 10 ratios concerning financial performance to study their influence on bank assets and the liabilities of the most important 45 banks in Europe and Israel, the United States of America, and Canada, which provides a good basis for the research in this paper. In this paper, we extend the forward intensity model by constructing industry-specific heterogeneity indicators for all industries to compute PDs for multiple periods.
Duffie et al. [9] described defaults and other types of delisting excluding bankruptcy (other exit) as two independent doubly stochastic Poisson processes. For the i-th firm, default and other exit intensities can be denoted by $h_i(m,n)$ and $\bar{h}_i(m,n)$, respectively. The two intensities represent two average arrival rates during the interval $[n\tau, (n+1)\tau]$ with observation time point $m\tau$. Here, $\tau$ is the basic time interval, which is set as one month, so $\tau = 1/12$. Within a basic time interval the default intensity is deterministic, so we can get the following probabilities:
$$
\begin{aligned}
\mathrm{PD}_i(m,n) &= 1 - \exp\big(-h_i(m,n)\tau\big), \\
\mathrm{POE}_i(m,n) &= \exp\big(-h_i(m,n)\tau\big)\big[1 - \exp\big(-\bar{h}_i(m,n)\tau\big)\big], \\
\mathrm{PS}_i(m,n) &= \exp\big[-\big(h_i(m,n) + \bar{h}_i(m,n)\big)\tau\big].
\end{aligned}
\tag{1}
$$
Here, $\mathrm{PD}_i(m,n)$, $\mathrm{POE}_i(m,n)$, and $\mathrm{PS}_i(m,n)$ are the probabilities of default, other exit and survival, respectively, during the interval $[n\tau, (n+1)\tau]$ and observed at the time point $m\tau$. Obviously, $\mathrm{PD}_i(m,n) + \mathrm{POE}_i(m,n) + \mathrm{PS}_i(m,n) = 1$. For more details, readers are referred to Duan et al. [1].
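As a minimal sketch of the three probabilities above, the following Python function evaluates them for a single firm over one basic interval. The function name and the numerical intensities are illustrative assumptions and are not taken from the paper's data.

```python
import math

TAU = 1.0 / 12.0  # basic time interval (one month, expressed in years)


def interval_probabilities(h, h_bar, tau=TAU):
    """Return (PD, POE, PS) for one basic interval.

    h     -- forward default intensity h_i(m, n)
    h_bar -- forward other-exit intensity
    """
    pd = 1.0 - math.exp(-h * tau)
    poe = math.exp(-h * tau) * (1.0 - math.exp(-h_bar * tau))
    ps = math.exp(-(h + h_bar) * tau)
    return pd, poe, ps


# Hypothetical intensities, chosen only for illustration.
pd, poe, ps = interval_probabilities(h=0.02, h_bar=0.05)

# The three outcomes are exhaustive, so the probabilities sum to one.
assert abs(pd + poe + ps - 1.0) < 1e-12
```

The final check mirrors the identity stated in the text: default, other exit, and survival partition the possible outcomes for the interval.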
Considerable evidence shows that default heterogeneity exists among industries. Our preliminary analysis shows that Chinese-listed firms also exhibit different co-movements in different industries over time. In addition, a firm’s default tendency changes with co-movements of defaults in its industry and in the whole economy. Section 5 shows the details of the preliminary analysis. To study unobserved co-movements of defaults in industries, we treat each industry as a whole by averaging the default intensities of all the firms in the industry:
$$
H_j(m,n) = \frac{\sum_{i=1}^{I_j(m)} h_i(m,n)}{I_j(m)},
\tag{2}
$$
where $I_j(m)$ is the total number of surviving firms in the sample at the time point $m\tau$ in the j-th industry.
A Bayesian approach is often employed when modeling firms’ credit risk. Ni et al. [22] found that Bayesian estimation can be employed on the firm’s default forward intensity to introduce default heterogeneity into the forward intensity model. In this paper, we take the industry’s average default intensity, estimated by the forward intensity approach, as the average prior default intensity of the industry. According to the additivity of Poisson processes, the number of firm defaults in the j-th industry follows the Poisson distribution. If we take all firms in the j-th industry as a whole, the number of defaults for every firm in the j-th industry follows the Poisson distribution with the industry’s average prior default intensity $H_j(m,n)$. According to the properties of the Poisson process, in a basic time interval, the conjugate prior distribution of $H_j(m,n)$ is a Gamma distribution expressed as $\Gamma(\alpha_j(m,n), \beta_j(m,n))$, and
$$
H_j(m,n) \sim \Gamma\big(\alpha_j(m,n), \beta_j(m,n)\big).
\tag{3}
$$
Then, the density function of $H_j(m,n)$ is
$$
\pi\big(H_j(m,n)\big) = \frac{\beta_j(m,n)^{\alpha_j(m,n)}}{\Gamma\big(\alpha_j(m,n)\big)}\, H_j(m,n)^{\alpha_j(m,n)-1}\, e^{-\beta_j(m,n) H_j(m,n)}.
\tag{4}
$$
Let $y_i(n)$ denote whether the i-th firm defaults during the interval $[n\tau, (n+1)\tau]$. When the observation time is after $(n+1)\tau$, the probability function of $y_i(n)$ is expressed as
$$
p\big(y_i(n)\big) = \frac{h_i(m,n)^{y_i(n)}}{\Gamma\big(y_i(n)+1\big)}\, e^{-h_i(m,n)}.
\tag{5}
$$
Because default is a low-probability event, we assume a firm defaults at most once a month. Let $y_j(m,n) = \sum_{i=1}^{I_j(m)} y_i(n)$ denote the number of defaults during the interval $[n\tau, (n+1)\tau]$ for all firms in the j-th industry which are surviving in the sample at time point $m\tau$. We then get the posterior distribution of $H_j(m,n)$ for the j-th industry when the observation time is after $(n+1)\tau$:
$$
\begin{aligned}
\pi\big(H_j(m,n) \mid y_j(m,n)\big)
&= \frac{p\big(y_j(m,n) \mid H_j(m,n)\big)\, \pi\big(H_j(m,n)\big)}{\int_0^{+\infty} p\big(y_j(m,n) \mid H_j(m,n)\big)\, \pi\big(H_j(m,n)\big)\, \mathrm{d}H_j(m,n)} \\[4pt]
&= \frac{\dfrac{H_j(m,n)^{\sum_{i=1}^{I_j(m)} y_i(n)}\, e^{-I_j(m) H_j(m,n)}}{\prod_{i=1}^{I_j(m)} \Gamma\big(y_i(n)+1\big)} \cdot \dfrac{\beta_j(m,n)^{\alpha_j(m,n)}}{\Gamma\big(\alpha_j(m,n)\big)}\, H_j(m,n)^{\alpha_j(m,n)-1}\, e^{-\beta_j(m,n) H_j(m,n)}}{\displaystyle\int_0^{+\infty} \dfrac{H_j(m,n)^{\sum_{i=1}^{I_j(m)} y_i(n)}\, e^{-I_j(m) H_j(m,n)}}{\prod_{i=1}^{I_j(m)} \Gamma\big(y_i(n)+1\big)} \cdot \dfrac{\beta_j(m,n)^{\alpha_j(m,n)}}{\Gamma\big(\alpha_j(m,n)\big)}\, H_j(m,n)^{\alpha_j(m,n)-1}\, e^{-\beta_j(m,n) H_j(m,n)}\, \mathrm{d}H_j(m,n)} \\[4pt]
&= \frac{\big(I_j(m) + \beta_j(m,n)\big)^{\alpha_j(m,n) + y_j(m,n)}}{\Gamma\big(\alpha_j(m,n) + y_j(m,n)\big)}\, H_j(m,n)^{\alpha_j(m,n) + y_j(m,n) - 1}\, e^{-\left(I_j(m) + \beta_j(m,n)\right) H_j(m,n)}.
\end{aligned}
\tag{6}
$$
Let $\hat{H}_j(m,n)$ be the parameter of the posterior intensity of $H_j(m,n)$, then
$$
\hat{H}_j(m,n) \sim \Gamma\big(\alpha_j(m,n) + y_j(m,n),\ \beta_j(m,n) + I_j(m)\big).
\tag{7}
$$
According to the properties of the Gamma distribution, during the period $[n\tau, (n+1)\tau]$, the mean values of the prior and posterior default intensities can be expressed correspondingly as
$$
E\big(H_j(m,n)\big) = \frac{\alpha_j(m,n)}{\beta_j(m,n)} = \tau H_j(m,n), \qquad
E\big(\hat{H}_j(m,n)\big) = \frac{\alpha_j(m,n) + y_j(m,n)}{\beta_j(m,n) + I_j(m)} = \tau \hat{H}_j(m,n).
\tag{8}
$$
Combining the above formulas to eliminate $\alpha_j(m,n)$, we have a relationship between $H_j(m,n)$ and $\hat{H}_j(m,n)$:
$$
\hat{H}_j(m,n) = \frac{\beta_j(m,n) H_j(m,n) + \dfrac{y_j(m,n)}{\tau}}{\beta_j(m,n) + I_j(m)}.
\tag{9}
$$
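The conjugate update above can be checked numerically. The sketch below, written under the paper's Gamma prior and Poisson likelihood, updates the Gamma parameters and confirms that the posterior mean divided by $\tau$ matches the closed-form relationship between the prior and posterior intensities. The function names and all numbers are hypothetical and serve only as an illustration.

```python
TAU = 1.0 / 12.0


def posterior_gamma_params(alpha, beta, n_defaults, n_firms):
    """Gamma-Poisson conjugate update: (alpha + y_j, beta + I_j)."""
    return alpha + n_defaults, beta + n_firms


def posterior_intensity(h_prior, beta, n_defaults, n_firms, tau=TAU):
    """Closed form obtained after eliminating alpha."""
    return (beta * h_prior + n_defaults / tau) / (beta + n_firms)


# Hypothetical industry: 200 surviving firms, 1 observed default,
# prior average intensity 0.024 and prior confidence beta = 50.
h_prior, beta, y_j, i_j = 0.024, 50.0, 1, 200

# The prior mean alpha / beta equals tau * H_j, so alpha = tau * beta * H_j.
alpha = TAU * beta * h_prior
alpha_post, beta_post = posterior_gamma_params(alpha, beta, y_j, i_j)

# The posterior mean equals tau * H_hat_j, so H_hat_j = mean / tau.
h_hat_from_mean = (alpha_post / beta_post) / TAU
h_hat_closed_form = posterior_intensity(h_prior, beta, y_j, i_j)

assert abs(h_hat_from_mean - h_hat_closed_form) < 1e-12
```

A larger `beta` pulls the posterior intensity toward the prior value, while a smaller `beta` lets the observed default count dominate.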
According to the properties of conjugate distributions, the higher $\beta_j(m,n)$, the more confidence we have in $H_j(m,n)$. In particular, as $\beta_j(m,n)$ approaches positive infinity, $\hat{H}_j(m,n) \to H_j(m,n)$, which means that the default intensity computed by the forward intensity approach completely dominates the posterior default intensity. Conversely, if $\beta_j(m,n) = 0$, then $\hat{H}_j(m,n) = \dfrac{y_j(m,n)}{\tau I_j(m)}$, which means that the posterior default intensity depends only on the data we observe, namely the average number of defaults per firm in the j-th industry during the n-th month, scaled by $1/\tau$. Combining the above formulas gives:
$$
\hat{H}_j(m,n) = \frac{\beta_j(m,n) + I_j(m)\, \dfrac{y_j(m,n)}{\tau \sum_{i=1}^{I_j(m)} h_i(m,n)}}{\beta_j(m,n) + I_j(m)}\, H_j(m,n).
\tag{10}
$$
The above equation shows the relationship between the average posterior intensity and the average prior intensity for all firms in the j-th industry. Here, $\tau \sum_{i=1}^{I_j(m)} h_i(m,n)$ is the expected number of defaults in the j-th industry in month n, estimated using the forward intensity approach, while $y_j(m,n)$ is the actual number of defaults in the n-th month in the same sample from this industry. If $\tau \sum_{i=1}^{I_j(m)} h_i(m,n) > y_j(m,n)$, the original model overestimates the default intensity of the whole industry in the n-th month, and then $\hat{H}_j(m,n) < H_j(m,n)$. Conversely, if $\tau \sum_{i=1}^{I_j(m)} h_i(m,n) < y_j(m,n)$, the original model underestimates the default intensity of the whole industry in the n-th month, and then $\hat{H}_j(m,n) > H_j(m,n)$. Let the ratio of the posterior default intensity to the prior default intensity be the industry-specific default heterogeneity indicator, which reflects the extent to which the total number of defaults in the industry exceeds or falls below the aggregated prior PDs:
$$
Z_j(m,n) = \frac{\hat{H}_j(m,n)}{H_j(m,n)} = \frac{\beta_j(m,n) + I_j(m)\, \dfrac{y_j(m,n)}{\tau \sum_{i=1}^{I_j(m)} h_i(m,n)}}{\beta_j(m,n) + I_j(m)}.
\tag{11}
$$
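The behavior of the indicator can be illustrated with a short sketch. The function below evaluates the ratio for one industry and one month from the firms' prior intensities, the observed default count, and the confidence parameter. The function name, the value of `beta`, and the intensities are assumptions made for illustration. The sketch also assumes that the expected number of defaults is strictly positive.

```python
TAU = 1.0 / 12.0


def heterogeneity_indicator(beta, prior_intensities, n_defaults, tau=TAU):
    """Ratio of posterior to prior average intensity for one industry-month.

    prior_intensities -- h_i(m, n) for every surviving firm in the industry
    n_defaults        -- realized number of defaults y_j(m, n)
    """
    n_firms = len(prior_intensities)
    expected_defaults = tau * sum(prior_intensities)  # must be > 0
    return (beta + n_firms * n_defaults / expected_defaults) / (beta + n_firms)


# Hypothetical industry: 100 firms, each with intensity 0.12, so the model
# expects one default in the month.
intensities = [0.12] * 100
beta = 50.0

z_match = heterogeneity_indicator(beta, intensities, n_defaults=1)
z_quiet = heterogeneity_indicator(beta, intensities, n_defaults=0)
z_cluster = heterogeneity_indicator(beta, intensities, n_defaults=3)

assert abs(z_match - 1.0) < 1e-9   # realized defaults match the expectation
assert z_quiet < 1.0 < z_cluster   # fewer or more defaults than expected
```

When realized defaults equal the model's expectation the indicator is one; a default cluster pushes it above one, and a quiet month pulls it below one, with `beta` controlling how strongly.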
Note that the industry-specific default heterogeneity indicator is time-varying and depends on the difference between the total number of firm defaults in the n-th month and the number of defaults expected by the original model in the same industry in the same month. It can capture the co-movements of defaults in the industry. Because the original model does not consider industry-specific default heterogeneity, it can capture only limited information on the time dynamics of the industry or the whole economy, namely what is conveyed by the observable common factors among its input variables. Our indicator is calculated by comparing the posterior and prior default intensities in the industry, so it can capture an unobserved part of the co-movements of defaults in industries. For example, when the average posterior default intensity of an industry is much greater than its average prior default intensity, a joint rise in defaults in the industry has not been captured by the original model. If we ignore this information, we will underestimate firms’ PDs when a default cluster occurs in this industry. Because the indicator is time-varying, we can also measure the average tendency to default in different industries over different time periods. On the other hand, for industries with low credit risk, default events are rare, so the number of realized defaults for different prediction horizons will be far less than the aggregated PDs of all firms within the industry. According to Formula (11), the industry-specific default heterogeneity indicator $Z_j(m,n)$ will then be less than 1. If a firm’s credit risk depends strongly on the indicator for its industry, the new default forward intensity estimated for firms in this industry will be clearly lower than the original default forward intensity, which means the new PDs are lower.
Therefore, when we estimate PDs, we introduce default heterogeneity in the industries into the forward intensity function. Duan and Miao [23] found that short-term PDs can capture more co-movements in credit risk. We therefore calculate the industry-specific default heterogeneity indicators for the shortest horizon, which capture most co-movements, by setting $n = m$. Then, $h_i(m,m)$ denotes the i-th firm’s default intensity during the m-th month observed on the first day of the m-th month. We assume that, for firms in the same industry, the posterior 1-month PDs share the same confidence in the 1-month prior PD. Then, we replace $\beta_j(m,m)$ and $Z_j(m,m)$ by $\beta_j(1)$ and $Z_j(m)$, respectively:
$$
Z_j(m) = \frac{\beta_j(1) + I_j(m)\, \dfrac{y_j(m,m)}{\tau \sum_{i=1}^{I_j(m)} h_i(m,m)}}{\beta_j(1) + I_j(m)}.
\tag{12}
$$
To predict forward PDs, we assume that an industry maintains a similar level of default heterogeneity over adjacent periods. We use the most recent available industry-specific default heterogeneity indicator to represent co-movements of defaults in the industry in the current month. We define the industry-specific posterior default intensity of the m-th month as the prior default intensity multiplied by the most recent available industry-specific default heterogeneity indicator:
$$
\hat{h}_{ij}(m) = Z_j(m-1)\, h_i(m,m).
\tag{13}
$$
Here, we use the 1-month-horizon prior default intensity and industry-specific default heterogeneity indicator to make $\hat{h}_{ij}(m)$ as close as possible to the current situation. When we observe on the first day of the m-th month, $Z_j(m)$ is still unknown, so the indicator of the previous month is used. Then, the industry-specific posterior default intensity is as follows:
$$
\hat{h}_{ij}(m) = \frac{\beta_j(1) + I_j(m-1)\, \dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m-1)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m).
\tag{14}
$$
The 1-month industry-specific posterior default intensity of every firm is calculated taking industry-specific default heterogeneity into account. In addition, if the PDs of an industry are aggregated, the systematic credit risk of the industry can also be estimated, which can be applied to stress testing of the industry’s credit risk. Calculating portfolio credit risk by aggregating every firm’s PD in this way is a bottom-up method. However, this industry-specific posterior default intensity cannot be regarded as our final calibrated default intensity, because a firm’s default is influenced not only by the industry the firm is in but possibly also by other industries. Therefore, to calibrate a firm’s PD, we need to take the default situations of all industries into account. We nevertheless need the industry-specific posterior default intensity to estimate $\beta_j(1)$, as described in the next section. With $\beta_j(1)$, we can calculate $Z_j(m)$ for all past months, which reflects the average level of default heterogeneity of the j-th industry in the m-th month.
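The bottom-up aggregation described above can be sketched as follows. Each firm's 1-month prior intensity is scaled by last month's industry indicator, converted into a 1-month PD, and the PDs are summed to give the expected number of defaults in the industry. The function name, the indicator value, and the intensities are hypothetical, and the conversion from intensity to PD reuses the 1-month default probability defined in Section 3.1.

```python
import math

TAU = 1.0 / 12.0


def industry_expected_defaults(z_prev, prior_intensities, tau=TAU):
    """Scale 1-month prior intensities by Z_j(m-1) and aggregate the PDs.

    Returns the list of industry-specific posterior intensities and the
    expected number of defaults in the industry for the month.
    """
    posterior = [z_prev * h for h in prior_intensities]
    pds = [1.0 - math.exp(-h * tau) for h in posterior]
    return posterior, sum(pds)


# Hypothetical three-firm industry and a hypothetical lagged indicator.
prior = [0.01, 0.03, 0.06]
posterior, expected = industry_expected_defaults(z_prev=1.5, prior_intensities=prior)
_, expected_unadjusted = industry_expected_defaults(z_prev=1.0, prior_intensities=prior)

# An indicator above one raises the aggregated PD of the industry.
assert expected > expected_unadjusted
```

Comparing the aggregated value with and without the adjustment gives a quick view of how much an industry-level default cluster shifts the portfolio's expected losses.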

3.2. PDs for All Horizons

If the default heterogeneity of other industries is not taken into account, the impact of co-movement of the whole economy’s defaults will not be adequately considered. Thus, we construct a new variable, $Z_{j,other}(m)$, denoting the average industry-specific default heterogeneity of the other industries, weighted by their numbers of firms:
$$
Z_{j,other}(m) = \frac{\sum_{s=1}^{10} Z_s(m) I_s(m) - Z_j(m) I_j(m)}{\sum_{s=1}^{10} I_s(m) - I_j(m)}.
\tag{15}
$$
Here, $Z_j(m)$ and $Z_{j,other}(m)$ both contribute to our calibrated default intensity. On the other hand, when we compute PDs for longer prediction horizons, it is not enough to consider only the current value of industry-specific default heterogeneity. Duan et al. [1] found that trends in some input variables are helpful for predicting PDs. We therefore also take the trends in our industry-specific default heterogeneity indicators into account. A trend is calculated as the difference between the current value and the average value over a past period. To be consistent, we take the average value of the past 12 months, following Duan et al. [1]. Because the default intensity function of the original forward intensity approach has a proportional-hazards form, our forward default intensity function is as follows:
$$
\hat{h}_i(m, m+l-1) = \exp\big(X_j(m)\, \gamma_j(l)\big)\, h_i(m, m+l-1).
\tag{16}
$$
Here, $\gamma_j(l)$ is a column vector and $X_j(m)$ is a row vector:
$$
\gamma_j(l) = \big[\gamma_{j1}(l), \gamma_{j2}(l), \gamma_{j3}(l), \gamma_{j4}(l)\big]^{T}, \qquad
X_j(m) = \big[Z_j(m-1),\ \mathrm{TZ}_j(m-1),\ Z_{j,other}(m-1),\ \mathrm{TZ}_{j,other}(m-1)\big],
\tag{17}
$$
where $X_j(m)$ is the vector of industry-specific default heterogeneity indicators for the j-th industry. Here, $\mathrm{TZ}_j(m-1)$ and $\mathrm{TZ}_{j,other}(m-1)$ are the trends of $Z_j(m-1)$ and $Z_{j,other}(m-1)$, respectively. The next section describes how we estimate $\gamma_j(l)$ for all the prediction horizons. Once $\gamma_j(l)$ is estimated, we can calibrate the default intensity estimated by the forward intensity approach to compute $\hat{h}_i(m,n)$, which accounts for co-movements in the industry’s credit risk. With $\hat{h}_i(m,n)$, we can compute cumulative POEs and PDs for different prediction horizons. Firstly, we compute conditional PDs and POEs from $\hat{h}_i(m,n)$ and $\bar{h}_i(m,n)$, where $\bar{h}_i(m,n)$ is calculated by the original model. Then, we compute forward PDs, which are conditional PDs multiplied by the probability that the firm survives from the observation month to the prediction month. Finally, we compute cumulative PDs by accumulating forward PDs. For more details on computing PDs, readers may also refer to Duan et al. [1].
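The calibration step can be sketched in a few lines. The code below builds the four inputs (own-industry level and trend, other-industry level and trend) and applies the proportional-hazards adjustment to a firm's forward intensity. All function names, indicator histories, firm counts, and coefficients are hypothetical. The trend is assumed here to be the latest value minus the mean of the most recent 12 observations including the latest one, and the paper may define the averaging window differently.

```python
import math


def z_other(z_values, firm_counts, j):
    """Firm-weighted average indicator of all industries except industry j."""
    weighted = sum(z * n for z, n in zip(z_values, firm_counts))
    total = sum(firm_counts)
    return (weighted - z_values[j] * firm_counts[j]) / (total - firm_counts[j])


def trend(series, window=12):
    """Latest value minus the mean of the last `window` values (assumption)."""
    recent = series[-window:]
    return series[-1] - sum(recent) / len(recent)


def calibrated_intensity(h_forward, x, gamma):
    """Proportional-hazards adjustment: exp(X * gamma) * h."""
    return math.exp(sum(xi * gi for xi, gi in zip(x, gamma))) * h_forward


# Hypothetical 12-month indicator histories for industry j and for the others.
z_own_hist = [1.0] * 11 + [1.3]
z_oth_hist = [1.0] * 11 + [0.9]

x = [z_own_hist[-1], trend(z_own_hist), z_oth_hist[-1], trend(z_oth_hist)]
gamma = [0.4, 0.1, 0.2, 0.05]  # hypothetical coefficients for one horizon

h_hat = calibrated_intensity(0.02, x, gamma)
```

With all coefficients set to zero the calibrated intensity reduces to the original forward intensity, which is a convenient sanity check when estimating `gamma`.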
In this paper, we compute cumulative PDs for horizons from 1 month to 36 months and evaluate the prediction performance of our PDs computed with industry-specific default heterogeneity and PDs estimated by the original model in Section 6.

4. Pseudo-Likelihood Functions

Firstly, we employ the forward intensity model to calculate the forward default intensity. Then, let $\tau_i$ and $\bar{\tau}_i$ be the months after the $(m-1)$-th month in which the default and other exit occur, respectively, for the $i$-th firm. We assume the firms' defaults are conditionally independent, and the 1-month horizon pseudo-likelihood function of the $j$-th industry's posterior probability is expressed as:
$$P^{\text{posterior}}_{l=1}\big(\beta_j(1), \tau_D, \tau_{OE}, h\big) = \prod_{m=1}^{M} \prod_{j=1}^{J} \prod_{i=1}^{I_j(m)} \hat{P}^{\text{posterior}}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big),$$
where $M$ denotes the last month of the sample, $I_j(m)$ is the number of firms of the $j$-th industry in the sample in month $m$, $J$ is the number of industries in the sample, $\hat{P}^{\text{posterior}}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big)$ is a probability depending on the actual status of the $i$-th firm during month $m$, and $h_i(m,m)$ is the default intensity estimated by the forward intensity model on the first day of month $m$ for the $i$-th firm. Then,
$$\begin{aligned}
\hat{P}^{\text{posterior}}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big)
={}& 1_{\{t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) > m\}}\, P\big(t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) > m\big) \\
&+ 1_{\{t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i = m\}}\, P\big(t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i = m\big) \\
&+ 1_{\{t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i = m\}}\, P\big(t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i = m\big) \\
&+ 1_{\{t_{0i} > m\}}\, P\big(t_{0i} > m\big) + 1_{\{\min(\tau_i,\bar{\tau}_i) < m\}}\, P\big(\min(\tau_i,\bar{\tau}_i) < m\big),
\end{aligned}$$
where $t_{0i}$ is the first month for the $i$-th firm. The above formula is similar to the overlapped pseudo-likelihood function proposed by Duan et al. [1]. Here, the first term is the probability that the firm survives during month $m$. The second term is the probability that the firm defaults during month $m$. The third term is the probability that the firm has another exit event during month $m$. The last two terms cover the situations where the firm has not yet entered the sample and where the firm has already exited from the market. Substituting Formula (14) into the overlapped pseudo-likelihood function gives:
$$\begin{aligned}
\hat{P}^{\text{posterior}}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big)
={}& 1_{\{t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) > m\}} \exp\left\{-\tau\left[\frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m) + \bar{h}_i(m,m)\right]\right\} \\
&+ 1_{\{t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i = m\}} \left\{1 - \exp\left[-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m)\right]\right\} \\
&+ 1_{\{t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i = m\}} \left\{1 - \exp\big[-\tau\, \bar{h}_i(m,m)\big]\right\} \exp\left[-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m)\right] \\
&+ 1_{\{t_{0i} > m\}} + 1_{\{\min(\tau_i,\bar{\tau}_i) < m\}}.
\end{aligned}$$
Then, we keep the terms associated with $\beta_j(1)$:
$$\begin{aligned}
\hat{P}^{\text{posterior},\beta}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big)
={}& 1_{\{t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) > m\}} \exp\left\{-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m)\right\} \\
&+ 1_{\{t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i = m\}} \left\{1 - \exp\left[-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m)\right]\right\} \\
&+ 1_{\{t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i = m\}} \exp\left[-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h_i(m,m)\right] \\
&+ 1_{\{t_{0i} > m\}} + 1_{\{\min(\tau_i,\bar{\tau}_i) < m\}}.
\end{aligned}$$
Appendix A shows how to maximize $L^{\text{posterior}}_{l=1}\big(\beta_j(1), \tau_D, \tau_{OE}, h\big)$ and estimate $\beta_j(1)$. With $\beta_j(1)$, we can compute all firms' PDs and calculate the current values and trends of our industry-specific default heterogeneity indicators for all industries. The pseudo-likelihood function of the new default intensity with industry-specific default heterogeneity indicators is expressed as:
$$P_l\big(\gamma, \tau_D, \tau_{OE}, X, h\big) = \prod_{m=1}^{M-1} \prod_{j=1}^{J} \prod_{i=1}^{I_j(m)} \hat{P}_{j,l}\big(\gamma_{j1}, \gamma_{j2}, \gamma_{j3}, \gamma_{j4}, \tau_i, \bar{\tau}_i, X_j(m), h_i(m, m+l-1)\big),$$
where
$$\begin{aligned}
\hat{P}_{j,l}\big(\gamma_{j1}, \gamma_{j2}, \gamma_{j3}, \gamma_{j4}, \tau_i, \bar{\tau}_i, X_j(m), h_i(m, m+l-1)\big)
={}& 1_{\{t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) \ge m+l\}} \exp\left(-\tau \sum_{k=0}^{l-1} \Big\{\exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k) + \bar{h}_i(m,m+k)\Big\}\right) \\
&+ 1_{\{t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i < m+l\}} \exp\left(-\tau \sum_{k=0}^{\tau_i-m-1} \Big\{\exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k) + \bar{h}_i(m,m+k)\Big\}\right) \\
&\quad \times \Big\{1 - \exp\Big[-\tau \exp\big[X_j(m)\gamma_j(\tau_i-m+1)\big] h_i(m,\tau_i)\Big]\Big\} \\
&+ 1_{\{t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i < m+l\}} \exp\left(-\tau \sum_{k=0}^{\bar{\tau}_i-m-1} \Big\{\exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k) + \bar{h}_i(m,m+k)\Big\}\right) \\
&\quad \times \Big\{1 - \exp\big[-\tau\, \bar{h}_i(m,\bar{\tau}_i)\big]\Big\} \exp\Big\{-\tau \exp\big[X_j(m)\gamma_j(\tau_i-m+1)\big] h_i(m,\tau_i)\Big\} \\
&+ 1_{\{t_{0i} > m\}} + 1_{\{\min(\tau_i,\bar{\tau}_i) < m\}}.
\end{aligned}$$
Here, the first term is the probability that a firm survives for $l$ months from the $m$-th month. The second term is the probability that a firm defaults within $l$ months from the $m$-th month. The third term is the probability that a firm has another exit within $l$ months from the $m$-th month. The last two terms cover the situations where a firm has not yet entered the sample and where a firm has already exited the market. Similarly, we keep the terms associated with $\gamma$:
$$\begin{aligned}
\hat{P}^{\gamma}_{j,l}\big(\gamma_{j1}, \gamma_{j2}, \gamma_{j3}, \gamma_{j4}, \tau_i, \bar{\tau}_i, X_j(m), h_i(m, m+l-1)\big)
={}& 1_{\{t_{0i} \le m,\ \min(\tau_i,\bar{\tau}_i) \ge m+l\}} \exp\left[-\tau \sum_{k=0}^{l-1} \exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k)\right] \\
&+ 1_{\{t_{0i} \le m,\ \tau_i \le \bar{\tau}_i,\ \tau_i < m+l\}} \exp\left[-\tau \sum_{k=0}^{\tau_i-m-1} \exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k)\right] \\
&\quad \times \Big\{1 - \exp\Big[-\tau \exp\big[X_j(m)\gamma_j(\tau_i-m+1)\big] h_i(m,\tau_i)\Big]\Big\} \\
&+ 1_{\{t_{0i} \le m,\ \bar{\tau}_i \le \tau_i,\ \bar{\tau}_i < m+l\}} \exp\left[-\tau \sum_{k=0}^{\bar{\tau}_i-m-1} \exp\big[X_j(m)\gamma_j(k+1)\big] h_i(m,m+k)\right] \\
&\quad \times \exp\Big[-\tau \exp\big[X_j(m)\gamma_j(\tau_i-m+1)\big] h_i(m,\tau_i)\Big] \\
&+ 1_{\{t_{0i} > m\}} + 1_{\{\min(\tau_i,\bar{\tau}_i) < m\}}.
\end{aligned}$$
Details on how to maximize $P_l\big(\gamma, \tau_D, \tau_{OE}, X, h\big)$ and estimate $\gamma$ are also given in Appendix A.

5. Data and Preliminary Analysis

5.1. Data

Our data consist of firm-specific variables, firms’ events information, and common factors obtained from the NUS-CRI database. Default events are defined using the standards of CRI: (1) bankruptcy filing; (2) a missed or delayed payment; or (3) debt restructuring/distressed exchange. Our data set is all Chinese-listed firms from 1991 to 2020. In our sample, we used firm-specific and common variables for the first available day of each month from January 2000 to December 2019 to estimate our models’ parameters. To test the out-of-sample performance, the data were divided into the experimental group and the evaluation group at a ratio of 5 to 1, randomly. There were in total 3747 and 729 firms in the experimental group and the evaluation group, respectively.
To compare the models, we construct the forward intensity function with the same variables used by Duan et al. [1]. There are four common factors: stock index return, interest rate, financial aggregated DTD, and non-financial aggregated DTD. Here, aggregated DTD is the median DTD of all financial or non-financial Chinese-listed firms. DTD is calculated by the adjustment method provided by Duan and Wang [24]. The firm-specific variables include DTD, CASH/TA, CA/CL, NI/TA, SIZE, M/B, and SIGMA. Duan et al. [1] found that, for estimating default, it is helpful to consider the trends in some firm-specific variables, where a trend is the difference between the current value and the 1-year mean value of the variable. Therefore, DTD, CASH/TA, CA/CL, NI/TA, and SIZE each enter as a trend and a level. Here, the level is the 1-year mean value of the variable.
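The level and trend decomposition can be illustrated with a short Python sketch. The function name `level_and_trend` is hypothetical, and so is the choice to include the current month in the 12-month window; the text only specifies a 1-year mean.

```python
import numpy as np

def level_and_trend(series, window=12):
    """Split a monthly series into level (rolling mean) and trend (current minus level).

    Assumption: the window covers the current month and the 11 months before it.
    Months without a full window are returned as NaN.
    """
    s = np.asarray(series, dtype=float)
    level = np.full(s.shape, np.nan)
    for t in range(window - 1, len(s)):
        level[t] = s[t - window + 1 : t + 1].mean()
    trend = s - level
    return level, trend

# Hypothetical monthly DTD values for one firm
dtd = np.array([2.0, 2.1, 2.3, 2.2, 2.4, 2.6, 2.5, 2.7, 2.9, 3.0, 3.1, 3.3, 3.2])
level, trend = level_and_trend(dtd)
print(level[-1], trend[-1])
```

The same decomposition applies to the industry indicators, where the trend is the current value minus the average over the past 12 months.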

5.2. Preliminary Analysis

According to the NUS-CRI database, Chinese-listed firms are divided into 10 industries: basic materials, communications, consumer (cyclical), consumer (non-cyclical), diversified, energy, financial, industrial, technology, and utilities. Figure 1 shows the number of all industry defaults per year from January 2000 to December 2019. Figure 2 shows the frequency of all industry defaults per year in the same period. Because there are many industries, we show five industries with more defaults and five industries with fewer defaults in two figures, respectively.
We find that default frequency fluctuates differently across industries. In terms of the overall default ratio, diversified firms have a higher default frequency than average. Comparatively, PDs in the utilities industry are very low. In addition, industries differ in the time span and degree of fluctuation. From 2003 to 2007, for example, default frequency was high in the financial and communications industries. In the energy industry, by comparison, default clustering spans 2004 to 2008. We believe that corporate defaults are related not only to the overall default ratio and trend in the region but also to the overall default ratio and trends in the industry.
Currently, credit risk models are able to measure a firm's PDs for multiple periods. By aggregating the PDs of all firms, the PD of the whole industry or region can be obtained. Therefore, credit risk models can also be employed for stress testing. However, if a credit risk model does not contain information on industry-specific default heterogeneity, it cannot capture unobserved co-movements of defaults in the industry. Our model considers different co-movements of defaults in all industries. With information on industry-specific default heterogeneity, aggregated PDs calculated by a bottom-up approach will be closer to the realized number of defaults.

6. Empirical Results

6.1. Parameter Estimates

After employing the forward intensity model to estimate the original forward default intensity of all firms, we estimate $\beta_j(1)$ by maximizing the pseudo-likelihood function (17). Table 1 shows the estimated values of the 10 industries' parameters.
According to Formula (14), the larger $\beta_j(1)$ is, the closer posterior PDs are to prior PDs. This means that a firm in the $j$-th industry is only slightly influenced by other firms' defaults in the same industry, and the posterior probability does not change much. According to the estimated values of the 10 industries' parameters, a firm in the consumer (non-cyclical) industry is least influenced by other firms' defaults in the same industry. In contrast, financial firms' defaults influence other financial firms the most. Firms in the basic materials, diversified, technology, and utilities industries can also be influenced by other firms' defaults in the same industries.
After computing all $Z_j(m)$ and $TZ_j(m)$ values, we estimate $\gamma_j(l)$ for all horizons in all industries. Table 2, Table 3, Table 4 and Table 5 show the estimated values of $\gamma_{j1}(l)$, $\gamma_{j2}(l)$, $\gamma_{j3}(l)$, and $\gamma_{j4}(l)$ for some representative prediction horizons, where $\gamma_{j1}(l)$, $\gamma_{j2}(l)$, $\gamma_{j3}(l)$, and $\gamma_{j4}(l)$ are the coefficients of $Z_j$, $TZ_j$, $Z_{j,other}$, and $TZ_{j,other}$, respectively.
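The shrinkage role of $\beta_j(1)$ can be illustrated numerically. The sketch below is reconstructed from the way Formula (14) enters the pseudo-likelihood in Section 4, namely as a multiplier of the form (β + I·r)/(β + I), where r is the ratio of realized defaults to the defaults expected from the prior intensities. Formula (14) itself is not reproduced here, so the function name, arguments, and this reading of the multiplier are assumptions.

```python
def heterogeneity_factor(beta, n_firms, n_defaults, expected_defaults):
    """Multiplier applied to the prior default intensity (assumed form).

    beta              : industry parameter beta_j(1)
    n_firms           : number of firms in the industry last month, I_j(m-1)
    n_defaults        : realized defaults last month, y_j(m-1, m-1)
    expected_defaults : tau * sum of prior intensities h_i(m-1, m-1)
    """
    ratio = n_defaults / expected_defaults
    return (beta + n_firms * ratio) / (beta + n_firms)

# Hypothetical industry: 100 firms, 3 defaults realized, 1.5 expected
for beta in (0.0, 100.0, 1e6):
    print(beta, heterogeneity_factor(beta, 100, 3, 1.5))
```

With β = 0 the multiplier equals the realized-to-expected ratio (2.0 here); as β grows it approaches 1, so the posterior intensity stays close to the prior one.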
Table 2 reflects the impact of current values of industry-specific default heterogeneity indicators in 10 industries for some representative horizons. Table 3 reflects the impact of trends in default heterogeneity indicators in 10 industries for some representative horizons. In the industrial industry, the impact of co-movements is largest and longest. Default clustering may influence the industrial industry over more than 2 years. In the financial industry, the impact of co-movements is also large and lasts for more than a year.
Table 4 reflects the impact of the current values of industry-specific default heterogeneity indicators in other industries for some representative horizons. Table 5 reflects the impact of trends in industry-specific default heterogeneity in other industries for some representative horizons. Some industry categories, like communications, consumer (non-cyclical), diversified, and energy are influenced much more by co-movements in other industries’ defaults than by themselves.

6.2. Comparing with the Number of Defaults

Since the forward intensity model is practical for estimating multi-period PDs with term structure, we chose to extend it by constructing industry-specific default heterogeneity indicators. Therefore, we need to compare the original forward intensity model with our extended model. In addition, neural networks and machine learning are also good ways to improve default prediction. Previous research on machine learning usually only predicts defaults through discrimination or calculation of credit scores rather than estimation of accumulative PDs for multiple periods. For example, Barboza et al. [25] predicted bankruptcy one year prior to the event and compared the performance of different machine learning models, including support vector machines, bagging, boosting, and random forest. Gunnarsson et al. [26] constructed a multilayer perceptron network and a deep belief network and compared their performance for credit scoring. However, with the development of PD models, it is possible to predict multi-period PDs through machine learning. For example, Sigrist and Leuenberger [13] combined econometric models with different machine learning models to estimate multi-period cumulative PDs and found that tree-boosting has the highest prediction accuracy. In this paper, we need to judge whether our industry default heterogeneity indicators are helpful to improve default-predicting ability for Chinese-listed firms, and the comparison between the original forward intensity model and our extended model is as follows.
Comparing the aggregated number of defaults with aggregated PDs is widely used to evaluate credit risk model performance. For each horizon, we computed the aggregate accumulated PDs of all surviving firms using data for the first day of the month and aggregated numbers of defaults in the prediction horizons for the same firms. Then, we compared them for all prediction horizons. Figure 3 and Figure 4 show a comparison of the realized number of defaults, original aggregated accumulated PDs, and new aggregated accumulated PDs for a 12-month prediction horizon in sample and out of sample, respectively.
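The bottom-up comparison amounts to summing firm-level PDs and counting realized defaults over the same firms and horizon. A minimal Python sketch follows; the matrix layout (months in rows, firms in columns, NaN for firms not in the sample that month) and the function name are illustrative assumptions.

```python
import numpy as np

def aggregate_comparison(pd_matrix, default_matrix):
    """Compare aggregated PDs with realized default counts, month by month.

    pd_matrix      : (months, firms) cumulative PDs at a fixed horizon, NaN if inactive
    default_matrix : (months, firms) 1 if the firm defaulted within the horizon, else 0
    """
    expected = np.nansum(pd_matrix, axis=1)       # expected number of defaults
    realized = np.nansum(default_matrix, axis=1)  # realized number of defaults
    return expected, realized, expected - realized

# Hypothetical data: 2 months, 3 firms (third firm absent in month 1)
pds = np.array([[0.1, 0.2, np.nan], [0.3, 0.3, 0.4]])
defaults = np.array([[0, 1, np.nan], [0, 0, 1]])
expected, realized, gap = aggregate_comparison(pds, defaults)
print(expected, realized, gap)
```

A gap closer to zero across months indicates that the aggregated PDs track the realized number of defaults more closely.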
Obviously, new aggregated PDs that consider industry-specific default heterogeneity are closer to the realized number both in and out of sample. From 2003 to 2005, a large number of firms defaulted, and new aggregated PDs that consider industry-specific default heterogeneity are higher than the original PDs for the prediction horizon of 12 months. In contrast, the new aggregated PDs are lower than the original PDs for the period 2013 to 2015. This indicates that our model can capture more effective information on industry-specific default heterogeneity and its impact.

6.3. Prediction Accuracy Ratio

Accuracy ratio (AR) is widely adopted to evaluate the predictive ability of credit risk models. AR reflects how much effective information about future defaults a credit risk model contains. If the AR is zero, the model is a zero-information model, and its cumulative accuracy profile lies on the 45° line. The higher the AR, the better the predictive power of the model, and the further the cumulative accuracy profile lies above the 45° line. Readers may refer to Vassalou and Xing [27] for more details. The AR they obtained for Merton's model is 0.592, which means the model contains substantial, effective information on defaults in the future. Figure 5 and Figure 6 show the cumulative accuracy profiles for calibrated PDs from January 2000 to December 2019 for horizons of 1, 2, 3, 6, 12, 24, and 36 months in and out of sample, respectively. The model's cumulative accuracy profiles are clearly above the 45° line for all prediction horizons. In particular, the predictive power of the model is strong for horizons of no more than 1 year.
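For readers who want to reproduce this kind of evaluation, the AR can be computed from the cumulative accuracy profile as the area between the model curve and the 45° line, divided by the same area for a perfect ranking. The following Python sketch uses the standard definition; the function name is ours, ties in PDs are not treated specially, and at least one default and one non-default are assumed.

```python
import numpy as np

def accuracy_ratio(pd_scores, defaulted):
    """Accuracy ratio from the cumulative accuracy profile (CAP)."""
    pd_scores = np.asarray(pd_scores, dtype=float)
    defaulted = np.asarray(defaulted, dtype=int)
    n, n_def = len(pd_scores), defaulted.sum()
    # Rank firms from riskiest to safest
    order = np.argsort(-pd_scores, kind="stable")
    # CAP: fraction of defaults captured among the k riskiest firms, k = 0..n
    cap = np.concatenate(([0.0], np.cumsum(defaulted[order]) / n_def))
    # Trapezoidal area under the CAP on an evenly spaced grid of width 1/n
    area_model = np.sum((cap[1:] + cap[:-1]) / 2.0) / n - 0.5
    # A perfect model captures all defaults within the first n_def firms
    area_perfect = 0.5 * (1.0 - n_def / n)
    return area_model / area_perfect

# Hypothetical PDs and outcomes for six firms
print(accuracy_ratio([0.30, 0.05, 0.20, 0.01, 0.10, 0.02], [1, 0, 0, 0, 1, 0]))
```

A perfect ranking gives an AR of 1, and a ranking that places every defaulter last gives −1.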
Figure 7 and Figure 8 contrast the ARs of new PDs with the original PDs for all horizons in and out of sample. The ARs of new PDs are always higher than those of original PDs, both in sample and out of sample, for all horizons. On the other hand, the out-of-sample ARs of the new PDs are not lower than the in-sample ARs. For the short-term prediction horizons, the ARs of our new PDs were greatly improved compared with the original PDs out of sample. When the prediction horizons increase, our calibrated PDs improve less. However, when the prediction horizon increases to 3 years, the ARs of our new PDs are still higher than for the original PDs.
Table 6 and Table 7 show the values of the ARs for horizons of 1, 2, 3, 6, 12, 24, and 36 months in and out of sample, respectively. The ARs of our new PDs in the sample show a similar increase, of about 7%. Out of sample, the ARs of our new PDs are relatively increased by 8% when the prediction horizons are no more than 3 months. When the prediction horizons increase to more than 6 months, the relative increase in ARs is stable at around 6%. This means that our model is also helpful for long-horizon prediction.

7. Conclusions

All in all, industry-specific default heterogeneity is a problem that cannot be ignored in default prediction for a large portfolio. To address this issue, we extended the forward intensity model and introduced industry-specific default heterogeneity indicators into the model. We divided Chinese-listed firms into 10 industries and estimated the parameters of industry-specific default heterogeneity indicator functions for different industries. Then, we calculated the default heterogeneity indicators for all industries for each month and maximized the pseudo-likelihood function of new PDs. We measured the impacts that all firms received from within- and across-industry default heterogeneity for all horizons. Finally, we computed new PDs containing information about co-movements of default within and across industries over time. The new PDs improve the ARs and reduce the gap between aggregated PDs and the realized number of defaults, both in and out of sample, for all horizons. Through theoretical modeling and data analysis, we reached the following conclusions:
(1) Co-movements in different industries are very heterogeneous. Defaults in some industries are greatly affected by co-movements of defaults within the industry, and some industries are greatly affected by co-movements of default in other industries.
(2) Extending the forward intensity model, we studied how firms' defaults are influenced by co-movements of default within the industry and across industries. We computed all Chinese-listed firms' PDs by measuring the influence of co-movements of default within the industry and across industries.
(3) The empirical results show that new PDs considering the impact of industry-specific default heterogeneity within and across industries have stronger predictive ability.
For both in-sample and out-of-sample predictions, the ARs of the new PDs show a relative improvement of more than 6% for all prediction horizons. Out of sample, the ARs of short-term PDs improved by 8% or more. We also compared the aggregated PDs and the realized number of defaults for one year and found that new PDs have a smaller gap from the realized number of defaults. The main contribution of this paper is that we extended the forward intensity model so that PDs contain information about default heterogeneity in industries. In addition, we measured the impact of default heterogeneity within and across industries. This allows the model to capture more co-movements of defaults in the industry and the region. This is of significance not only for individual investors seeking to avoid risk but also for credit risk supervision departments seeking to detect systemic credit risk. For example, it helps a stress test capture risk spillover within and across industries when default clustering occurs.
In addition, we find that the levels and trends of systemic credit risk across the region have a very different impact on various industries. The credit risk of industrial and financial firms, for example, is more sensitive to the increasing frequency of defaults across the region. While the cluster of defaults across the region is accelerating rapidly, even if it has not yet reached a high level, regulators must pay attention to these two industries. When the default cluster is already severe, diversified, energy, and communications firms are more likely to default. The non-cyclical consumption sector is more affected by external influences than the cyclical consumption sector. In a bad economic climate, regulators can focus on limiting leverage in these sectors to avoid a chain reaction. In addition, industries with large fluctuations in the industry-specific default heterogeneity indicator, such as the financial industry, are more affected by systemic credit risk. When a default cluster appears, the financial industry's overall credit risk rises much higher than in other industries. When the economic environment is good, the financial industry's overall credit risk declines more than in other industries. For industries with small fluctuations in the industry-specific default heterogeneity indicator, credit risk is mainly affected by the firm-specific attributes and observed common factors, and the new PD is closer to the original PD. Therefore, regulators could use such a model to do stress testing on the whole region and control the leverage of firms in the financial, diversified, and energy industries according to the level of systemic credit risk.
Credit risk modeling for large portfolios still faces challenges, and future research has several directions. In order to maintain stability, the spillover credit risks from other industries, which we did not classify in this paper, could be considered. Researchers can use machine learning methods to study the relationships between risk spillovers across industries. On the other hand, Duan et al. [1] adopted the exponential function to fit the firm's forward default intensity in the forward intensity approach. Compared with the linear function, the exponential function has more advantages. Researchers can also try to use neural networks to fit the forward default intensity and thus combine machine learning models with the forward default intensity model. Researchers can also conduct a more detailed study of how defaults spread within and across industries. The sources of default heterogeneity in industries deserve deeper study, because models that capture more unobserved co-movements of defaults in industries should predict defaults more accurately.

Author Contributions

Conceptualization, Z.N.; methodology, Z.N.; software, Z.N.; validation, Z.N.; formal analysis, Z.N.; investigation, Z.N.; resources, M.J.; writing—original draft preparation, Z.N. and W.Z.; writing—review and editing, Z.N. and W.Z.; supervision, M.J.; project administration, M.J.; funding acquisition, M.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Natural Science Foundation of China Grant No. 71831005 and No. 71502044. Funder: Minghui Jiang.

Data Availability Statement

Our data source is the Credit Research Initiative (CRI) database of the National University of Singapore.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

In this paper, we employed the gradient descent method to maximize the pseudo log-likelihood function and estimate $\beta_j(1)$. Appendix A shows the pseudo log-likelihood function of $\beta_j(1)$ and its gradient.
We substitute Formula (14) into Formula (18) and take the log of it:
$$L^{\text{posterior}}_{l=1} = \sum_{j=1}^{J} \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \ln \hat{P}_{j,l=1}\big(\beta_j(1), \tau_i, \bar{\tau}_i, h_i(m,m)\big) = \sum_{j=1}^{J} L^{\text{posterior}}_{j,l=1}.$$
Here, $L^{\text{posterior}}_{l=1}$ can be decomposed by industry, and $m = 1$ denotes the first month from which we want to compute new PDs. Note that we also have data before this month, so we can also calculate $Z_j(m)$ and $TZ_j(m)$ when $m < 1$. If we maximize each of the $J$ functions $L^{\text{posterior}}_{j,l=1}$, then $L^{\text{posterior}}_{l=1}$ is maximized. Let $h^{E}_i(m,n)$ be the default intensity conditional on the firm having an event of type $E$ in the $n$-th month:
$$h^{E}_i(m,n) = 1_{\{E_i(n) = E\}} \times h_i(m,n),$$
where $E$ is the event type, which can be 0, 1, or 2 to denote survival, default, or other exit, respectively. Here, $E_i(n)$ is the event type of the $i$-th firm in the $n$-th month and $L^{\beta,\text{posterior}}_{j,l=1}$ is the decomposed part of $L^{\text{posterior}}_{j,l=1}$ that is associated only with $\beta_j(1)$. It can be expressed as follows:
$$\begin{aligned}
L^{\beta,\text{posterior}}_{j,l=1} ={}& -\tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)} \Big(h^{0}_i(m,m) + h^{2}_i(m,m)\Big) \\
&+ \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \ln\left\{1 - \exp\left(-\tau\, \frac{\beta_j(1) + I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h^{1}_i(m,m)\right)\right\}.
\end{aligned}$$
The gradient of $L^{\beta,\text{posterior}}_{j,l=1}$ is expressed as
$$\begin{aligned}
G^{\beta,\text{posterior}}_{j,l=1} ={}& -\tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \frac{I_j(m-1) - I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\big(\beta_j(1) + I_j(m-1)\big)^{2}} \Big(h^{0}_i(m,m) + h^{2}_i(m,m)\Big) \\
&+ \tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \frac{\exp\left(-\tau\, \dfrac{\beta_j(1) + I_j(m-1)\,\frac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h^{1}_i(m,m)\right)}{1 - \exp\left(-\tau\, \dfrac{\beta_j(1) + I_j(m-1)\,\frac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\beta_j(1) + I_j(m-1)}\, h^{1}_i(m,m)\right)} \\
&\quad \times \frac{I_j(m-1) - I_j(m-1)\,\dfrac{y_j(m-1,m-1)}{\tau \sum_{i=1}^{I_j(m)} h_i(m-1,m-1)}}{\big(\beta_j(1) + I_j(m-1)\big)^{2}}\, h^{1}_i(m,m).
\end{aligned}$$
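The common factor in both terms of this gradient comes from differentiating the multiplier of the prior intensity with respect to $\beta_j(1)$. As a worked step, write $\beta = \beta_j(1)$, $I = I_j(m-1)$, and $r$ for the ratio of $y_j(m-1,m-1)$ to $\tau \sum_i h_i(m-1,m-1)$; the shorthand $Z(\beta)$ and $r$ is introduced here only for illustration and is not the paper's notation.

```latex
\begin{aligned}
Z(\beta) &= \frac{\beta + I r}{\beta + I}, \\
\frac{\partial Z}{\partial \beta}
  &= \frac{(\beta + I) - (\beta + I r)}{(\beta + I)^{2}}
   = \frac{I - I r}{(\beta + I)^{2}}, \\
\frac{\partial}{\partial \beta}\Big[-\tau Z(\beta)\, h\Big]
  &= -\tau\, \frac{I - I r}{(\beta + I)^{2}}\, h, \\
\frac{\partial}{\partial \beta} \ln\!\Big\{1 - e^{-\tau Z(\beta) h}\Big\}
  &= \tau\, \frac{e^{-\tau Z(\beta) h}}{1 - e^{-\tau Z(\beta) h}}\,
     \frac{I - I r}{(\beta + I)^{2}}\, h .
\end{aligned}
```

The third line corresponds to the survival and other-exit term and the fourth to the default term. When realized defaults exceed expected defaults ($r > 1$), the derivative of $Z$ is negative, so increasing $\beta$ pulls the multiplier back toward 1.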
Similarly, we take the log of the pseudo-likelihood function of $\gamma_j(l)$ and then calculate its gradient. The pseudo log-likelihood function of $\gamma_j$ can be expressed as:
$$L_l = \sum_{l=1}^{l_{max}} \sum_{m=1}^{M-l+1} \sum_{i=1}^{I(m)} \ln \hat{P}_{j,l}\big(\gamma_{j1}, \gamma_{j2}, \gamma_{j3}, \gamma_{j4}, \tau_i, \bar{\tau}_i, X_j(m), h_i(m, m+l-1)\big) = \sum_{l=1}^{l_{max}} \sum_{j=1}^{J} L_{j,l},$$
where $l_{max}$ is the largest prediction horizon we set and $L_l$ can be decomposed by industry and prediction horizon. Then, $L^{\gamma}_{j,l}$ is the decomposed part of $L_{j,l}$ that is associated only with $\gamma_j(l)$, and can be expressed as follows:
$$\begin{aligned}
L^{\gamma}_{j,l} ={}& -\tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \Big\{\exp\big[X_j(m)\gamma_j(l)\big] \big[h^{0}_i(m,m+l-1) + h^{2}_i(m,m+l-1)\big]\Big\} \\
&+ \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \ln\Big\{1 - \exp\Big\{-\tau \exp\big[X_j(m)\gamma_j(l)\big]\, h^{1}_i(m,m+l-1)\Big\}\Big\}.
\end{aligned}$$
Let $G^{\gamma}_{j,l}$ be the gradient of $L^{\gamma}_{j,l}$:
$$\begin{aligned}
G^{\gamma}_{j,l} ={}& -\tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} X_j(m)^{T} \exp\big[X_j(m)\gamma_j(l)\big] \big[h^{0}_i(m,m+l-1) + h^{2}_i(m,m+l-1)\big] \\
&+ \tau \sum_{m=1}^{M} \sum_{i=1}^{I_j(m)} \frac{\exp\Big\{-\tau \exp\big[X_j(m)\gamma_j(l)\big]\, h^{1}_i(m,m+l-1)\Big\}}{1 - \exp\Big\{-\tau \exp\big[X_j(m)\gamma_j(l)\big]\, h^{1}_i(m,m+l-1)\Big\}} \times X_j(m)^{T} \exp\big[X_j(m)\gamma_j(l)\big]\, h^{1}_i(m,m+l-1).
\end{aligned}$$
The gradient descent method is used to estimate $\beta$ and $\gamma$, so that $L^{\beta}_{j,l=1}$ and $L^{\gamma}_{j,l}$ reach their maxima. Finally, we complete the estimation of the pseudo-likelihood functions and obtain the values of all parameters.
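The log-likelihood and gradient for $\gamma_j(l)$ above can be checked with a small numerical sketch. The Python code below is only an illustration, not the authors' implementation: the array layout (one row per firm-month), the function names, the fixed step size, and the time step of 1/12 are assumptions. The log term is evaluated only for defaulted observations, because the default-conditioned intensity is zero otherwise; that restriction is also our reading of the formula.

```python
import numpy as np

TAU = 1.0 / 12.0  # assumed length of one month in years

def loglik_gamma(gamma, X, h, event):
    """Pseudo log-likelihood part that depends on gamma.

    X: (n, 4) indicator rows; h: (n,) prior forward intensities;
    event: (n,) with 0 = survival, 1 = default, 2 = other exit.
    """
    scale = np.exp(X @ gamma)
    d = event == 1
    ll = -TAU * np.sum(scale[~d] * h[~d])                      # survival / other exit
    ll += np.sum(np.log1p(-np.exp(-TAU * scale[d] * h[d])))    # defaults
    return ll

def grad_gamma(gamma, X, h, event):
    """Analytic gradient matching loglik_gamma."""
    scale = np.exp(X @ gamma)
    d = event == 1
    grad = -TAU * (X[~d] * (scale[~d] * h[~d])[:, None]).sum(axis=0)
    a = TAU * scale[d] * h[d]
    # exp(-a) / (1 - exp(-a)) equals 1 / expm1(a)
    grad += (X[d] * (a / np.expm1(a))[:, None]).sum(axis=0)
    return grad

def gradient_ascent(X, h, event, steps=300, lr=0.5):
    """Plain gradient ascent on the log-likelihood (equivalently, descent on its negative)."""
    gamma = np.zeros(X.shape[1])
    for _ in range(steps):
        gamma += lr * grad_gamma(gamma, X, h, event) / len(h)
    return gamma

# Synthetic, hypothetical data for one industry and one horizon
rng = np.random.default_rng(0)
X = rng.normal(scale=0.5, size=(300, 4))
h = rng.uniform(0.01, 0.2, size=300)
event = np.zeros(300, dtype=int)
event[:15], event[15:30] = 1, 2
gamma_hat = gradient_ascent(X, h, event)
print(gamma_hat, loglik_gamma(gamma_hat, X, h, event))
```

In practice a quasi-Newton optimizer would likely converge faster than a fixed-step update; the simple loop is used here only to mirror the description in the text.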

References

  1. Duan, J.C.; Sun, J.; Wang, T. Multiperiod corporate default prediction: A forward intensity approach. J. Econom. 2012, 170, 191–209.
  2. Merton, R. On the pricing of corporate debt: The risk structure of interest rates. J. Financ. 1974, 29, 449–470.
  3. Hillegeist, S.A.; Keating, E.K.; Cram, D.P.; Lundstedt, K.G. Assessing the probability of bankruptcy. Rev. Account. Stud. 2004, 9, 5–34.
  4. Zhang, Y.J.; Shi, B.S. Non-tradable shares pricing and optimal default point based on hybrid KMV models: Evidence from China. Knowl.-Based Syst. 2016, 110, 202–209.
  5. Song, Y.N.; Zhang, F.R.; Liu, C.C. The risk of block chain financial market based on particle swarm optimization. J. Comput. Appl. Math. 2020, 370, 112667.
  6. Zeng, L.; Lau, W.Y.; Bahri, E.N.A. Can the modified ESG-KMV logit model explain the default risk of internet finance companies? Front. Environ. Sci. 2022, 10, 961239.
  7. Altman, E.I. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J. Financ. 1968, 23, 589–609.
  8. Beaver, W.H. Market prices, financial ratios, and the prediction of failure. J. Account. Res. 1968, 6, 179–192.
  9. Duffie, D.; Saita, L.; Wang, K. Multi-period corporate default prediction with stochastic covariates. J. Financ. Econ. 2007, 83, 635–665.
  10. Hwang, R.-C.; Chu, C.-K. Forecasting Forward Defaults with the Discrete-Time Hazard Model. J. Forecast. 2014, 33, 108–123.
  11. Caporale, G.M.; Cerrato, M.; Zhang, X. Analysing the determinants of insolvency risk for general insurance firms in the UK. J. Bank. Financ. 2017, 84, 107–122.
  12. Berent, T.; Rejman, R. Bankruptcy Prediction with a Doubly Stochastic Poisson Forward Intensity Model and Low-Quality Data. Risks 2021, 9, 217.
  13. Sigrist, F.; Leuenberger, N. Machine learning for corporate default risk: Multi-period prediction, frailty correlation, loan portfolios, and tail probabilities. Eur. J. Oper. Res. 2023, 305, 1390–1406.
  14. Bhimani, A.; Gulamhussen, M.A.; Lopes, S.D.-R. Accounting and non-accounting determinants of default: An analysis of privately-held firms. J. Account. Public Policy 2010, 29, 517–532.
  15. Dakovic, R.; Czado, C.; Berg, D. Bankruptcy prediction in Norway: A comparison study. Appl. Econ. Lett. 2010, 17, 1739–1746.
  16. Giesecke, K.; Kim, B. Systemic Risk: What Defaults Are Telling Us. Manag. Sci. 2011, 57, 1387–1405.
  17. Koopman, S.J.; Lucas, A.; Schwaab, B. Dynamic Factor Models with Macro, Frailty, and Industry Effects for U.S. Default Counts: The Credit Crisis of 2008. J. Bus. Econ. Stat. 2012, 30, 521–532.
  18. Mensi, W.; Shahzad, S.J.H.; Hammoudeh, S.; Hkiri, B.; Al Yahyaee, K.H. Long-run relationships between US financial credit markets and risk factors: Evidence from the quantile ARDL approach. Financ. Res. Lett. 2019, 29, 101–110.
  19. Gertler, L.; Jancovicova-Bognarova, K.; Majer, L. Explaining Corporate Credit Default Rates with Sector Level Detail. Financ. A Uver-Czech J. Econ. Financ. 2020, 70, 96–120.
  20. Lee, H.-H. Distress risk, product market competition, and corporate bond yield spreads. Rev. Quant. Financ. Account. 2020, 55, 1093–1135.
  21. Batrancea, L.M. An Econometric Approach on Performance, Assets, and Liabilities in a Sample of Banks from Europe, Israel, United States of America, and Canada. Mathematics 2021, 9, 3178.
  22. Ni, Z.; Jiang, M.; Zhan, W. Predicting Multi-Period Corporate Default Based on Bayesian Estimation of Forward Intensity: Evidence from China. Systems 2023, 11, 18.
  23. Duan, J.-C.; Miao, W. Default Correlations and Large-Portfolio Credit Analysis. J. Bus. Econ. Stat. 2016, 34, 536–546.
  24. Duan, J.C.; Wang, T. Measuring Distance-to-Default for Financial and Non-Financial Firms. Glob. Credit. Rev. 2012, 2, 95–108.
  25. Barboza, F.; Kimura, H.; Altman, E. Machine learning models and bankruptcy prediction. Expert Syst. Appl. 2017, 83, 405–417.
  26. Gunnarsson, B.R.; Broucke, S.V.; Baesens, B.; et al. Deep learning for credit scoring: Do or don’t? Eur. J. Oper. Res. 2021, 295, 292–305.
  27. Vassalou, M.; Xing, Y.H. Default risk in equity returns. J. Financ. 2004, 59, 831–868.
Figure 1. The number of all industry defaults per year. Source: NUS-CRI database.
Figure 2. The frequency of all industry defaults per year. Source: NUS-CRI database.
Figure 3. This figure shows a comparison of the realized number of defaults, original aggregated accumulated PDs, and new aggregated accumulated PDs for a 12-month prediction horizon in the sample.
Figure 4. This figure shows a comparison of the realized number of defaults, original aggregated accumulated PDs, and new aggregated accumulated PDs for a 12-month prediction horizon out of sample.
Figure 5. This figure shows the in-sample cumulative accuracy profiles of new PDs from January 2000 to December 2019 for different prediction horizons.
Figure 6. This figure shows the out-of-sample cumulative accuracy profiles of new PDs from January 2000 to December 2019 for different prediction horizons.
Figure 7. This figure compares the in-sample cumulative accuracy profiles of new PDs and original PDs for horizons of 1–36 months.
Figure 8. This figure compares the out-of-sample cumulative accuracy profiles of new PDs and original PDs for horizons of 1–36 months.
Table 1. Maximum pseudo-likelihood estimates for $\beta_j(1)$.

| Industry | $\beta_j(1)$ |
|---|---|
| Basic Materials | 775.41 |
| Communications | 873.63 |
| Consumer (Cyc) | 1152.10 |
| Consumer (N-Cyc) | 19,243.82 |
| Diversified | 705.98 |
| Energy | 2423.68 |
| Financial | 276.79 |
| Industrial | 1135.92 |
| Technology | 613.32 |
| Utilities | 685.59 |
Table 2. Maximum pseudo-likelihood estimates for $\gamma_{j1}(l)$.

| $Z_j$ | $\gamma_{j1}(1)$ | $\gamma_{j1}(2)$ | $\gamma_{j1}(3)$ | $\gamma_{j1}(6)$ | $\gamma_{j1}(12)$ | $\gamma_{j1}(24)$ | $\gamma_{j1}(36)$ |
|---|---|---|---|---|---|---|---|
| Basic Materials | 2.491185 | 2.280642 | 2.155384 | 1.899994 | −0.17849 | −1.35955 | −0.44241 |
| Communications | −2.33016 | −2.48579 | −2.58817 | −3.55322 | −1.46946 | −0.95426 | −1.068 |
| Consumer (Cyc) | −0.17068 | −0.3005 | −0.29659 | −1.6231 | 0.270778 | 0.515792 | 2.265838 |
| Consumer (NC) | −2.71041 | −2.76414 | −2.86428 | −2.44659 | −3.08668 | −1.99594 | −0.90574 |
| Diversified | −1.40276 | −0.98109 | −0.66264 | −1.03316 | −0.40614 | −0.40304 | −0.98643 |
| Energy | −4.25579 | −4.065 | −4.19237 | −3.32675 | −1.30519 | −0.38496 | 1.197159 |
| Financial | 2.734843 | 2.868124 | 2.859868 | 2.655078 | 2.034357 | −0.37494 | 0.082072 |
| Industrial | 2.854156 | 2.871496 | 2.946054 | 3.540954 | 3.536143 | 2.546084 | 0.027085 |
| Technology | −0.25124 | −0.24487 | 0.196717 | −0.99298 | −0.11297 | −1.51631 | −0.15389 |
| Utilities | −2.21956 | −2.17543 | −2.34536 | −2.69005 | −2.0572 | −2.72963 | −0.02748 |
Table 3. Maximum pseudo-likelihood estimates for $\gamma_{j2}(l)$.

| $TZ_j$ | $\gamma_{j2}(1)$ | $\gamma_{j2}(2)$ | $\gamma_{j2}(3)$ | $\gamma_{j2}(6)$ | $\gamma_{j2}(12)$ | $\gamma_{j2}(24)$ | $\gamma_{j2}(36)$ |
|---|---|---|---|---|---|---|---|
| Basic Materials | −2.24804 | −3.07666 | −2.29236 | −1.39137 | 0.774615 | 2.2439 | 0.465381 |
| Communications | 2.417619 | 3.533761 | 2.423067 | 4.272601 | −0.84387 | 0.828925 | 0.20722 |
| Consumer (Cyc) | 0.258211 | 0.999959 | 0.223138 | 0.895433 | 0.307311 | −0.60722 | −1.1084 |
| Consumer (NC) | −2.17874 | 2.099645 | 2.688754 | 2.547406 | 1.794337 | 9.730836 | 3.621596 |
| Diversified | −1.33008 | −1.43756 | 1.718504 | −0.99719 | 1.340722 | −1.39585 | −0.19119 |
| Energy | 0.945732 | 1.48091 | 3.675942 | 1.215448 | 3.31764 | 0.285514 | 2.258077 |
| Financial | −2.99554 | −2.91961 | −3.03768 | −2.8092 | −1.84164 | 0.348643 | 0.618982 |
| Industrial | −2.59238 | −2.71164 | −2.95667 | −3.20924 | −3.50535 | −2.62851 | −0.19389 |
| Technology | 0.411408 | −0.05615 | −0.10175 | 1.319924 | 0.447983 | 0.477757 | 0.030217 |
| Utilities | 1.668564 | 3.580656 | −0.04558 | 0.004809 | 2.576682 | −0.74262 | 0.720997 |
Table 4. Maximum pseudo-likelihood estimates for $\gamma_{j3}(l)$.

| $Z_{j,\mathrm{other}}$ | $\gamma_{j3}(1)$ | $\gamma_{j3}(2)$ | $\gamma_{j3}(3)$ | $\gamma_{j3}(6)$ | $\gamma_{j3}(12)$ | $\gamma_{j3}(24)$ | $\gamma_{j3}(36)$ |
|---|---|---|---|---|---|---|---|
| Basic Materials | −2.52744 | −2.34069 | −2.21324 | −2.02293 | 0.116848 | 1.208248 | 0.370522 |
| Communications | 2.736875 | 2.871824 | 2.979698 | 3.771385 | 1.623522 | 1.195812 | 1.250113 |
| Consumer (Cyc) | 0.086383 | 0.211614 | 0.229138 | 1.523775 | −0.33232 | −0.54249 | −2.31581 |
| Consumer (NC) | 2.8266 | 2.876228 | 2.993546 | 2.629115 | 3.242611 | 2.099184 | 0.765712 |
| Diversified | 2.039327 | 1.601927 | 1.279474 | 1.789817 | 1.038678 | 0.736229 | 1.258517 |
| Energy | 4.680227 | 4.462585 | 4.607325 | 3.661166 | 1.546356 | 0.606136 | −1.10992 |
| Financial | −2.80724 | −2.94792 | −2.92779 | −2.76668 | −2.16542 | 0.132517 | −0.26506 |
| Industrial | −2.80509 | −2.84218 | −2.90324 | −3.50324 | −3.48343 | −2.55798 | −0.07193 |
| Technology | 0.568141 | 0.597914 | 0.083002 | 1.318448 | 0.537657 | 1.79763 | 0.146919 |
| Utilities | 1.677209 | 1.614704 | 1.666923 | 1.967737 | 1.404774 | 2.075835 | −0.47442 |
Table 5. Maximum pseudo-likelihood estimates for $\gamma_{j4}(l)$.

| $TZ_{j,\mathrm{other}}$ | $\gamma_{j4}(1)$ | $\gamma_{j4}(2)$ | $\gamma_{j4}(3)$ | $\gamma_{j4}(6)$ | $\gamma_{j4}(12)$ | $\gamma_{j4}(24)$ | $\gamma_{j4}(36)$ |
|---|---|---|---|---|---|---|---|
| Basic Materials | 2.147868 | 3.726177 | 2.787542 | 1.373924 | 0.439178 | −1.03204 | 1.224042 |
| Communications | −2.72153 | −2.59917 | −2.4975 | −2.92083 | 0.923662 | −1.47403 | 0.027272 |
| Consumer (Cyc) | −0.63154 | −0.00092 | −0.08518 | −1.26268 | 1.590467 | 2.310409 | 2.734578 |
| Consumer (NC) | −2.89664 | −2.00488 | −2.07615 | −2.74982 | −2.17565 | −1.41085 | 0.23462 |
| Diversified | −0.90503 | −1.67394 | 0.751807 | −1.69593 | 1.085887 | 2.508569 | 1.608284 |
| Energy | −3.07229 | −2.68038 | −4.0168 | −5.10284 | 0.382157 | 1.557167 | 1.929201 |
| Financial | 4.367602 | 4.145667 | 2.898224 | 0.882199 | 3.998458 | 0.975475 | 0.269972 |
| Industrial | 3.307187 | 3.975671 | 3.591346 | 2.965927 | 4.566214 | 2.623784 | 0.640499 |
| Technology | 1.176082 | 0.753229 | 1.602314 | −1.25978 | 0.404673 | 1.77964 | 2.752035 |
| Utilities | −2.04431 | −1.27136 | −2.15938 | −3.01697 | −0.00345 | −1.59101 | −0.13897 |
Table 6. Comparison of the ARs in the sample.

| Accuracy Ratio | 1 Month | 2 Months | 3 Months | 6 Months | 12 Months | 24 Months | 36 Months |
|---|---|---|---|---|---|---|---|
| Original PDs | 0.5785 | 0.5782 | 0.5811 | 0.5772 | 0.5520 | 0.5083 | 0.4913 |
| Revised PDs | 0.6246 | 0.6202 | 0.6255 | 0.6238 | 0.6056 | 0.5554 | 0.5293 |
| Increased (%) | 7.4 | 6.8 | 7.1 | 7.5 | 8.8 | 8.5 | 7.2 |
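
The caption of Table 6 does not say how the "Increased (%)" row is computed. As a rough check, the sketch below recomputes the relative change in AR from the rounded table values under two candidate definitions: the difference divided by the original AR, and the difference divided by the revised AR. The function and variable names are illustrative assumptions and do not come from the paper.

```python
# Accuracy ratios copied from Table 6 (in-sample); horizons are in months.
horizons = [1, 2, 3, 6, 12, 24, 36]
original = [0.5785, 0.5782, 0.5811, 0.5772, 0.5520, 0.5083, 0.4913]
revised = [0.6246, 0.6202, 0.6255, 0.6238, 0.6056, 0.5554, 0.5293]
reported = [7.4, 6.8, 7.1, 7.5, 8.8, 8.5, 7.2]  # "Increased (%)" row


def pct_change(old, new, base):
    """Percentage change from old to new, relative to the chosen base."""
    return 100.0 * (new - old) / base


# Candidate 1 (assumption): change relative to the original AR.
rel_to_original = [pct_change(o, r, o) for o, r in zip(original, revised)]
# Candidate 2 (assumption): change relative to the revised AR.
rel_to_revised = [pct_change(o, r, r) for o, r in zip(original, revised)]

for h, a, b, rep in zip(horizons, rel_to_original, rel_to_revised, reported):
    print(f"{h:>2} months: vs original {a:5.2f}%, "
          f"vs revised {b:5.2f}%, reported {rep}%")
```

On these rounded inputs, the version that divides by the revised AR stays within 0.1 percentage points of the reported row, while the version that divides by the original AR comes out roughly 0.5 to 0.9 points higher. This is only an observation about the rounded table values, not a statement of the authors' procedure.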
Table 7. Comparison of the ARs out of sample.

| Accuracy Ratio | 1 Month | 2 Months | 3 Months | 6 Months | 12 Months | 24 Months | 36 Months |
|---|---|---|---|---|---|---|---|
| Original PDs | 0.5543 | 0.5733 | 0.6032 | 0.6149 | 0.6131 | 0.5254 | 0.5174 |
| Revised PDs | 0.6262 | 0.6335 | 0.6559 | 0.6557 | 0.6418 | 0.5628 | 0.5527 |
| Increased (%) | 11.5 | 9.5 | 8 | 6.2 | 4.8 | 6.6 | 6.4 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Ni, Z.; Jiang, M.; Zhan, W. Default Prediction with Industry-Specific Default Heterogeneity Indicators Based on the Forward Intensity Model. Axioms 2023, 12, 402. https://doi.org/10.3390/axioms12040402