Article

An Age–Period–Cohort Framework for Profit and Profit Volatility Modeling

by
Joseph L. Breeden
Deep Future Analytics LLC, 1600 Lena St., Suite E3, Santa Fe, NM 87505, USA
Mathematics 2024, 12(10), 1427; https://doi.org/10.3390/math12101427
Submission received: 25 March 2024 / Revised: 26 April 2024 / Accepted: 6 May 2024 / Published: 7 May 2024
(This article belongs to the Special Issue Application of Survival Analysis in Economics, Finance and Insurance)

Abstract
The greatest source of failure in portfolio analytics is not individual models that perform poorly, but rather an inability to integrate models quantitatively across management functions. The separable components of age–period–cohort models provide a framework for integrated credit risk modeling across an organization. Using a panel data structure, credit risk scores can be integrated with an APC framework using either logistic regression or machine learning. Such APC scores for default, payoff, and other key rates fit naturally into forward-looking cash flow estimates. Given an economic scenario, every applicant at the time of origination can be assigned profit and profit volatility estimates so that underwriting can truly be account-level. This process optimizes the most fallible part of underwriting, which is setting cutoff scores and assigning loan pricing and terms. This article provides a summary of applications of APC models across portfolio management roles, with a description of how to create the models to be directly integrated. As a consequence, cash flow calculations are available for each account, and cutoff scores can be set directly from portfolio financial targets.

1. Introduction

The greatest technological change in credit risk analytics in the last decade has been the application of machine learning and non-bureau datasets. This is intended to provide better discrimination of credit risk between loan applicants, particularly where traditional credit bureau data and scores are limited. However, we should ask how often lenders fail due to an inability to rank-order risk or are unprofitable because of their credit scores. While obtaining more rank-order discrimination in a credit score is always valuable, the largest portfolio failures come from systematic shifts in the probability of default relative to a credit score or an inability to price loans accurately considering the risk of prepayment.
These difficulties arise because the architecture of almost all credit scores deployed at lenders today does not fit mathematically into the cash flow models used in finance departments to predict yield and set pricing. This mismatch occurs because credit scores are built using cross-sectional data, where each account appears once in the dataset with an indicator of whether it defaulted or prepaid during a fixed observation window, commonly the first 12 to 48 months, depending on the product. Any model built on cross-sectional data will not be able to adjust for or predict the timing of defaults or prepayments. Without knowledge of timing, the models cannot integrate macroeconomic scenarios in order to calibrate the probabilities to future environments. Timing and environmental calibration are both critical to predicting the expected yield for a loan. With cross-sectional credit scores, such as credit bureau scores, intermediate models are created to attempt to connect rank order to cash flow models. The results are rarely satisfactory.
The concepts of survival modeling solve this problem by providing a framework that unifies credit scoring, finance, and many other predictive analytical needs within a lender. This paper presents a framework for profit and uncertainty (FPU) modeling that shows how lending analytics can be integrated by leveraging the body of work that already exists for discrete-time survival modeling and age–period–cohort (APC) models, in particular. In this sense, this article is a survey article of previous work, organizing this work into a framework to solve the biggest problems facing lenders today. Notably, this framework can incorporate the most popular techniques in machine learning and alternate data, as well as forecast uncertainty, climate risk stress testing, portfolio optimization, and normalizing pandemic-era data for credit risk modeling.
Section 2 starts from the perspective of the business owner to establish the key objectives. Section 3 establishes the basic principles of age–period–cohort models for the purpose of supporting lending analytics. Section 4 defines a basic cash flow and yield model and explains how APC models support yield calculations. This section also addresses the competing aspect of prepayment, which is essential for predicting loan yields. Section 5 reviews the most recent innovations in integrating APC outputs with traditional credit scoring techniques by switching to a panel data structure while leaving the credit score output largely unchanged. Section 6 reviews how APC models have been used for loss forecasting and economic capital since the 1990s. Section 7 extends the framework to post-charge-off recovery prediction. Section 8 and Section 9 show how the same approach works for machine learning models and alternate data. Section 10 and Section 11 incorporate credit cycles and economic cycles in this framework. Section 12 defines pricing uncertainty. In Section 13, we bring these components together to describe how loan pricing can be optimized at the segment or account level. With a fully aligned framework, Section 14 revisits how scores should be used in underwriting, with special attention to forecast uncertainty. Section 15 discusses how stress testing for extreme events, such as climate change and pandemics, can be incorporated into the FPU. Section 16 reveals the unique ability of APC models to provide adjustments after an event like the COVID-19 pandemic so that scores may still be created using data collected during this period. We conclude with Section 17.
Although many of these applications of APC models have been previously published, this article is the first to present this integrated framework usable in finance, credit risk, and loan origination. In addition, the discussions regarding setting score cutoffs from an APC cash flow model, including volatility estimates in loan origination, and normalizing pandemic-era data with APC inputs to allow for its use in scoring models are all original contributions.

2. Start from the Goal and Work Backward

Technological innovations are always initially used as replacement parts in an existing framework. Only over time are frameworks redesigned to leverage the full power of innovation. That is happening today in lending. The way to break free from legacy frameworks is to start from the top, identify the core goals of the business, and then create the components we need to realize those objectives.
In the context of lending, that legacy framework is the scorecard [1]. As recently as the 1960s, much loan underwriting was still performed with paper cards, where the underwriter would add or subtract points for specific borrower attributes. The formulas were initially judgmental. Scientists from operations research pioneered the use of statistical models to replace these heuristic scorecards. Using historical data, the risk of default could be rank-ordered. Lenders could set a cutoff score to control how much risk they were willing to assume, just as was done with the original paper scorecards. Even the first credit scores would have been optimized to predict the probability of default in sample, but predicting probabilities out of sample, in future economic conditions, was beyond those models. Thus, credit scores have been used primarily for rank ordering ever since [2,3].
The largest financial institutions have large teams dedicated to creating origination scores to predict default over a fixed initial horizon, for example, 24 or 36 months, or behavior scores to predict default using loan performance attributes over a shorter horizon, such as 12 months. The data are better, the models are sophisticated, and the IT systems are efficient, but we have not revisited the problem statement.
The goal of lending is not to rank-order risk. Lenders provide money to borrowers with the expectation of achieving a positive net yield above their established hurdle rate. Over the past several decades, yields have fallen due to increasingly intense price competition and higher expenses, in large part from increased regulatory compliance costs. This leaves little room for error in designing loan products and setting pricing, but the credit scores used by almost all lenders are still only ranking risk. To compete effectively, lenders need to start with detailed cash flow models for the product and determine what model inputs are required to make effective, forward-looking estimates of yield.
The finance teams of financial institutions have cash flow models with varying degrees of complexity. However, too often, they use historic average default and prepayment rates as inputs. Instead, an effective cash flow model requires forward-looking predictions that include the lifecycle (or hazard function) for the borrower, a scenario for future macroeconomic conditions, and an adjustment for the borrower’s attributes or segment-level risk adjustment. The structure of age–period–cohort models is a natural fit for cash flow models and also provides a mechanism to integrate with loan-level scores.

3. Age–Period–Cohort Models as a Credit Risk Framework

Age–period–cohort models were developed independently of survival models [4,5] but share many commonalities. Cox proportional hazards (Cox PH) [6] models extend survival models to estimate the relative risk between individuals or cohorts. Cox PH models have been applied to create credit scores for loan applicants.
The challenge when using Cox PH models in lending [7] is that they were developed to consider a hazard function and attributes along only one other dimension. That second dimension could capture borrower attributes by origination date or macroeconomic factors by calendar date. Clearly, information from all three dimensions (age of the account, a; origination date or vintage, v; and calendar date, t) is important for predicting loan performance. However, the simple relationship a = t − v introduces a linear dependency among any estimates of the dependence of performance on those three dimensions. Cox PH models will generate a unique estimate, but these estimates can be both unstable and biased when attempting to reproduce the original coefficients used to generate test data [8]. Consequently, the only advantage of APC models over Cox PH models in the application to loan performance is that they allow the model developer to make explicit assumptions about the linear trend allocation, given the data available and prior knowledge about the problem domain.
When used in lending, APC models [9,10,11,12,13] are most often applied to predicting the probability of default (PD) or probability of attrition or prepayment (PA) [14]. When data are available, they may also be used to estimate the recovery rate or loss given default, where the cohort origination date may be taken as the default date, and the age function is measured for recoveries from the default date.
Taking the PD as an example, a binomial distribution is applicable, so the APC formula may be written as
logit(PD) = F(a) + G(v) + H(t) + ε  (1)
where F(a) is the lifecycle function, G(v) is the vintage function that measures variations in credit risk by vintage, and H(t) is the environment function that measures external impacts, usually macroeconomic, upon loan performance versus calendar date. These functions are commonly estimated via smoothing splines [15] or non-parametrically with a Bayesian estimator [16].
Regardless of the estimation method, each of these functions may be viewed as having linear and nonlinear components. Therefore, Equation (1) can be written as
logit(PD) = α₀ + α₁·a + F′(a) + β₁·v + G′(v) + γ₁·t + H′(t) + ε  (2)
where F′(a), G′(v), and H′(t) denote the nonlinear components. In Equation (2), α₀ is the only constant term; by convention, it is included in F(a), meaning that G(v) and H(t) are mean-zero. The constants α₁, β₁, and γ₁ are not independently estimable because of the relation a = t − v. Many techniques have been proposed for resolving this ambiguity. In the case of lending, if the available data span more than one economic cycle, the most common solution is to assume that γ₁ = 0. This is consistent with assuming that a through-the-cycle (TTC) average PD exists, which is central to many banking regulations, including the Basel II formula for computing regulatory capital requirements [17].
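The linear ambiguity can be seen directly in a few lines of NumPy: because a = t − v, moving a linear trend δ among the three components leaves every fitted value unchanged. The functional forms below are purely illustrative, not estimates from any dataset.

```python
import numpy as np

# Illustrative (hypothetical) nonlinear components plus linear trends.
def logit_pd(a, v, t, alpha1, beta1, gamma1):
    F = 0.5 * np.sin(a / 6.0)      # hypothetical nonlinear lifecycle
    G = 0.1 * np.cos(v / 12.0)     # hypothetical vintage quality
    H = 0.2 * np.sin(t / 24.0)     # hypothetical environment
    return alpha1 * a + F + beta1 * v + G + gamma1 * t + H

# Panel indices satisfying a = t - v.
v, a = np.meshgrid(np.arange(24), np.arange(36))
t = v + a

base = logit_pd(a, v, t, alpha1=0.05, beta1=0.02, gamma1=0.0)

# Shift a trend delta among the three dimensions: because a + v - t = 0,
# the fitted surface is identical, so the trends are not separately estimable.
delta = 0.03
shifted = logit_pd(a, v, t, alpha1=0.05 + delta, beta1=0.02 + delta,
                   gamma1=0.0 - delta)

assert np.allclose(base, shifted)
```

This is why an explicit assumption such as γ₁ = 0 must be imposed before the remaining trends can be allocated.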
Beyond the challenges in determining the linear coefficients, previous work has shown that the nonlinear components of F(a), G(v), and H(t) are uniquely estimable. Although important, having estimable coefficients does not prove that the formulation is a good fit for lending data. One could imagine cross-terms between a, v, and t. In fact, these are common. Small variations in credit bureau scores by vintage should be a primary determinant of the structure in G(v). However, cohorts with significantly different credit bureau scores will exhibit different lifecycle functions F(a). Making these functions multidimensional to capture cross-term effects has proven challenging with real data. The simple solution is to segment the lifecycle function by credit bureau score and segment the environment function by geographic region.
With the above considerations, APC models have been found to have in-sample fits of 1% to 3% errors over a one-year period for sufficiently large datasets. Tests can be performed to verify that the model errors do not result from missing cross-terms or, equivalently, insufficient segmentation. Out-of-time tests show that APC forecasting errors grow slowly given actual economic conditions because the lifecycle functions estimate the full life of the loans. In contrast, roll rate and state transition models show significant increases in error rates beyond the first six months unless they are hybridized with APC methods [18].
APC models can be employed as the central framework for credit risk modeling because once the functions of Equation (1) are estimated and tested, those functions are independent. Independence means that individual functions may be replaced with models for credit risk based on borrower attributes or macroeconomic sensitivity, holding the other functions as fixed. In this way, the approach becomes equivalent to a discrete-time survival model estimated as a panel logistic regression with the hazard function (lifecycle) taken as a fixed input from the APC analysis.

4. Profit Models

To show how APC models integrate with cash flow models for yield forecasting, we need to develop a simple cash flow model. First, note that “cash flow model” does not refer to a statistical model. Rather, it is a set of equations for combining forecast components and loan terms in order to calculate an expected yield for the loan. These are better described as an aggregation technique.
Payment(a) = −LoanAmount − OriginationCosts
 + ScheduledPayment(a) · PActive(a)
 + ScheduledPrincipalOutstanding(a) · PA(a)
 + ScheduledInterest(a) · PActive(a)
 + PenaltyFees(a) · PActive(a)
 − AccountMaintenance(a) · PActive(a)
 − CostOfFunds(a) · ScheduledPrincipalOutstanding(a) · PActive(a)
 + ScheduledPrincipalOutstanding(a) · RecoveryRate(a) · PD(a)  (3)
Equation (3) is a simple cash flow model for a term loan, such as an auto loan or personal loan.
PActive(a) = PActive(a − 1) · (1 − PD(a) − PA(a))
is the probability that the account is active at age a, with PActive(a = 0) = 1. The LoanAmount and OriginationCosts are the initial negative entries in the cash flow.
ExpectedPrincipalPayment(a) = ScheduledPayment(a) · PActive(a)
Prepayment(a) = ScheduledPrincipalOutstanding(a) · PA(a)
Fees(a) = ScheduledInterest(a) · PActive(a) + PenaltyFees(a) · PActive(a)
are the payments and fees expected throughout the life of the loan. Costs may be allocated to servicing the accounts, for which
AccountMaintenance(a) · PActive(a)
is included. In the event of default, net recoveries, including collateral liquidation,
ScheduledPrincipalOutstanding(a) · RecoveryRate(a) · PD(a)
are included. The cost of funds can also be deducted, adjusted for the probability of being active:
CostOfFunds(a) · ScheduledPrincipalOutstanding(a) · PActive(a)
The amortization schedule for payments and balances is determined in the usual way from the loan’s interest rate, term or draw period, repayment period, or balloon date. Note that PD and PA in Equation (3) are conditional on the account being active in the previous month. This is a competing risk structure, where each month, the account may default, prepay, or remain active.
The final profit calculation is the sum of all the incremental cash flows over the life of the loan plus the lagging delinquency, charge-off, and recovery periods. The loan’s life ends when PActive = 0, which may occur at term but can also be either before or after that, depending on the various adjustments.
Profit = Σₐ Payments(a)
AvgExpectedBalance = (1/N) Σ_{a=1..N} ScheduledPrincipalOutstanding(a) · PActive(a)
Yield = Profit / AvgExpectedBalance  (4)
Cash flow models must also be segmented or even account-level. Account-level pricing is impossible without account-level yield forecasting. Specific loan and borrower attributes change not just the level of default and prepayment risk but also the timing of those risks and thus have an amplified effect on the yield, net present value (NPV), or internal rate of return (IRR).
Lines of credit will have obvious differences since there is no initial loan or scheduled payments. Many hybrid products also exist. However, the basic formulation in Equations (3) and (4) provides the point of comparison needed to integrate an APC framework.
The above discussion presents many input variables measured on a portfolio without a detailed definition of those variables. This is intentional, as the accounting policies of the institution will have the final say over how charge-offs are determined, what is included in fees and expenses, and many other small details. The APC modeling and supporting model coefficients will adapt to these definitions without detailed adjustment by the analyst.
APC models integrate well within cash flow calculations because they can provide monthly forecasts of conditional PD and PA specific to segments and vintages. To obtain account-level PD and PA forecasts, the vintage function of an APC model needs to be replaced with a score using borrower attributes.
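As a rough sketch of the Equation (3) aggregation for a term loan (not the production model of any lender), the cash flow recursion can be coded directly. The cost, maintenance, and recovery parameters below are illustrative placeholders, and PD and PA are taken as given monthly conditional curves such as an APC model would supply.

```python
import numpy as np

def loan_yield(amount, apr, term, pd_curve, pa_curve,
               orig_cost_rate=0.01, maint=2.0, cof=0.03, recovery=0.4):
    """Sketch of the Equation (3) cash flow aggregation for a term loan.
    pd_curve, pa_curve: monthly conditional PD and PA by age (length=term).
    Cost and recovery parameters are illustrative placeholders."""
    r = apr / 12.0
    pmt = amount * r / (1.0 - (1.0 + r) ** -term)   # standard amortization payment
    bal = amount
    p_active = 1.0
    cash = -amount * (1.0 + orig_cost_rate)         # loan amount + origination costs
    avg_bal = 0.0
    for a in range(term):
        interest = bal * r
        principal = pmt - interest
        p_prev = p_active
        # Competing risks: conditional on being active last month.
        p_active = p_prev * (1.0 - pd_curve[a] - pa_curve[a])
        cash += (principal + interest - maint) * p_active
        cash -= (cof / 12.0) * bal * p_active            # cost of funds
        cash += bal * pa_curve[a] * p_prev               # prepayment returns principal
        cash += bal * recovery * pd_curve[a] * p_prev    # net recoveries after default
        avg_bal += bal * p_active
        bal -= principal
    avg_bal /= term
    return cash, cash / avg_bal   # profit and yield on average expected balance

profit, yld = loan_yield(10_000, 0.12, 36, np.full(36, 0.002), np.full(36, 0.010))
```

Replacing the flat hazard curves with segment- or account-level APC forecasts turns this into the account-level yield estimate described above.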

5. Credit Scoring for Cash Flow Forecasting

The mismatch between the structures of credit scores and cash flow models is one of the primary causes of yield forecasting failure. The primary problem is that rank-order scores do not connect well to a forward-looking estimate of the PD. In addition, the PA is often not modeled in detail, which is equally essential for predicting the expected yield. Rather than trying to put a wrapper around an existing credit score to create a PD model, a better solution is to create a model using a panel data structure [19,20,21] so that the timing of losses may be related to exogenous events or the maturing of the account. In credit scoring, this naturally leads to some variation of a discrete-time survival model. Survival models have been used for credit risk modeling for many years, including many recent innovations [22,23,24,25,26].
Within the APC framework, this means estimating a logistic regression credit score, where the APC lifecycle and environment functions are included as fixed offsets when estimating the credit score [27,28].
logit(PD(i, a, t)) = F(a) + H(t) + Σⱼ cⱼ sᵢⱼ + ε
Having a fixed offset in a regression equation means that F(a) and H(t) carry no estimated coefficients; they are previously estimated and taken as numerical inputs. Only the scoring coefficients cⱼ are estimated for the scoring factors sᵢⱼ, for factor j and account i. In most discrete-time survival estimations, the functions F(a) and H(t) are estimated simultaneously with the scoring coefficients, but then a solution to the linear ambiguity must be incorporated as a constraint or by removing the trend from one of the functions.
By predicting the log-odds of default within an APC framework, the credit score component, C(i) = Σⱼ cⱼ sᵢⱼ, is also in units of log-odds of default, just as with a traditional credit score. This creates the bridge whereby the total forecast PD(i, a, t) can be used in the cash flow model, and the credit score C(i) can be deployed within traditional loan origination systems. In practical application, these APC credit scores have proven more robust out of sample, yet their coefficients generally look quite similar to those of traditional cross-sectional credit scores.
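A minimal sketch of this estimation on synthetic data: the precomputed APC terms F(a) + H(t) enter the linear predictor as an offset with coefficient fixed at one, and only the scoring coefficients cⱼ are fit (here by Newton's method rather than a statistics package; all values are simulated).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: fixed APC offsets F(a) + H(t) plus two scoring factors.
n = 5000
offset = rng.normal(-3.0, 0.5, n)      # precomputed F(a) + H(t), held fixed
s = rng.normal(size=(n, 2))            # scoring factors s_ij
c_true = np.array([0.8, -0.5])         # hypothetical true coefficients
p = 1.0 / (1.0 + np.exp(-(offset + s @ c_true)))
y = rng.binomial(1, p)                 # observed default indicators

# Estimate only the c_j by Newton's method; the offset has no coefficient.
c = np.zeros(2)
for _ in range(25):
    eta = offset + s @ c
    mu = 1.0 / (1.0 + np.exp(-eta))
    grad = s.T @ (y - mu)                          # score vector
    hess = (s * (mu * (1 - mu))[:, None]).T @ s    # observed information
    c += np.linalg.solve(hess, grad)

assert np.all(np.abs(c - c_true) < 0.25)
```

The recovered cⱼ are in log-odds units, so C(i) = Σⱼ cⱼ sᵢⱼ drops directly into both the cash flow model and a conventional origination system.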

6. Loss Reserves and Economic Capital

Perhaps the most significant advantage of panel data-based credit scores is that they can work over any forecast horizon. With the APC origination score, the credit score component C(i) is independent of the forecast horizon and, therefore, can be used to rank-order risk regardless of the interval. If this model is to be used with Basel II, where a 12-month PD is required, then the unconditional PD is summed over 12 months. For a lifetime forecast, the unconditional PD is summed over the life of the loan. One model and one score apply to any interval. This is the opposite of traditional cross-sectional scores, where a new score is required for every time interval.
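The horizon summation can be sketched in a few lines; the flat conditional hazard curves below are illustrative stand-ins for fitted APC outputs.

```python
import numpy as np

# Illustrative conditional monthly curves for one account (not real estimates).
term = 48
pd_c = np.full(term, 0.004)    # conditional PD from the APC score
pa_c = np.full(term, 0.015)    # conditional prepayment probability

# Survival recursion: probability of being active entering each month.
p_active_prev = np.concatenate([[1.0], np.cumprod(1 - pd_c - pa_c)[:-1]])
upd = pd_c * p_active_prev     # unconditional monthly default probability

pd_12m = upd[:12].sum()        # Basel II style 12-month PD
pd_life = upd.sum()            # lifetime PD for IFRS 9 Stage 2 / CECL

assert 0 < pd_12m < pd_life < 1
```

The same `upd` vector serves both stages: only the summation interval changes.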
Computing loss reserves and economic capital is not an area of innovation for most lenders, but it can be a path to improvement. The complexity and cost of implementing International Financial Reporting Standard (IFRS) 9 [29,30,31] internationally and Current Expected Credit Loss (CECL) [32,33] in the US lead to a natural desire to repurpose existing models. However, traditional scores built on cross-sectional data lead to an immediate problem when converting to a PD forecast that incorporates an economic scenario or a through-the-cycle economic environment.
Let us consider first the application of an APC score to an IFRS 9 loss reserve calculation. At origination, all loans are assumed to be performing as planned, and the loss risk is covered via product pricing, Stage 1. After origination, at each assessment period (monthly, quarterly, or annually, depending on the size of the institution), a second model using behavioral information, especially delinquency, is run to determine whether loans have deteriorated significantly and should be classified as Stage 2. On average, this might be 20% of the total portfolio. Loans classified as Stage 2 must have their loss reserves recomputed to cover the cumulative loss risk throughout the remaining life of the loan. Stage 3 loans are in default and still reserved as cumulative lifetime loss risk, but post-default accounting treatment applies.
Because Stage 1 is a 12-month loss reserve and Stage 2 is a lifetime loss reserve, the majority of account-level implementations for IFRS 9 use two different scoring models. This can lead to credit risk assessment discontinuities beyond the change to the lifetime calculation. With the APC score, the same model can be used for both stages, but with different intervals for summing the estimates. The lifetime interval also arises naturally from the competing risk prediction of prepayments. When the balance reaches zero, the lifetime is complete, without making assumptions about the average life of the loan. The most important feature of an APC approach to IFRS 9 estimation is that it changes smoothly as the account ages. Because the lifecycle, F(a), is a core part of the forecast, each month that the account ages without defaulting reduces the loss reserve until the reserve reaches zero at payoff.
CECL was intended to simplify IFRS 9 for smaller US lenders by classifying all accounts as Stage 2, lifetime loss reserve. This simplifies the task of having one model, but lifetime loss forecasting is significantly more challenging than a 12-month forecast. Many approximate approaches are used for CECL compliance, but few are technically sound and compliant with all CECL requirements. Auditors and regulators have, so far, given wide latitude to what is considered CECL-compliant, but lesser models do not provide actual insights into portfolio risks or tie into other analytics needs for the organization.
The IFRS 9 process does not say that a lender should use their 12-month PD models from Basel II, but that is an approach often taken. The problem is that almost none of the Basel II IRB models [34,35,36] consider competing risks such as prepayments. Therefore, the Stage 2 lifetime loss forecast must come from a completely different model that incorporates the expected change in default and prepayment risk as the account ages and for the remaining life of the loan. Alternatively, an organization might use a 12-month PD but incorporate an overlay to adjust to a lifetime loss risk. However, the Basel II framework is designed to calibrate a loss distribution to estimate the tail risk using a through-the-cycle (TTC) calibration for the PD. The PD used in Basel II is not the PD driven by specific economic scenarios needed for IFRS 9. However, the APC scoring framework can actually provide both the IFRS 9 forecasts and Basel II TTC PD just by changing the environment scenario to a through-the-cycle environment and summing for the life of the loan.
Economic capital, distinguished from the regulatory capital calculation of Basel II, can be computed directly with the default and prepayment models developed for cash flow modeling. The only added step is to compute a mean-reverting distribution using a diffusion equation [37,38] and then sample via Monte Carlo to estimate the lifetime loss risk distribution. When studying a sample account within a segment to allocate economic capital, the results fit very well to a beta distribution with a small number of samples [39]. This calculation is no more difficult than a lifetime loss forecast.
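A sketch of such a calculation, using a discrete Ornstein–Uhlenbeck process as the mean-reverting diffusion for the environment function: all parameters are illustrative, and the unconditional-survival weighting is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Mean-reverting (Ornstein-Uhlenbeck) scenarios for the environment H(t).
n_paths, horizon = 10_000, 36
kappa, sigma = 0.2, 0.15          # illustrative reversion speed and volatility

H = np.zeros((n_paths, horizon))
h = np.zeros(n_paths)
for t in range(horizon):
    h = h + kappa * (0.0 - h) + sigma * rng.normal(size=n_paths)
    H[:, t] = h

# Monthly PD per path; F(a) and the score are flattened into one constant here.
base_logit = -5.0
pd_paths = 1.0 / (1.0 + np.exp(-(base_logit + H)))
lifetime_loss = pd_paths.sum(axis=1)    # lifetime loss proxy per scenario

el = lifetime_loss.mean()                        # expected loss
ul = np.quantile(lifetime_loss, 0.999) - el      # unexpected loss at 99.9%

assert el > 0 and ul > 0
```

Fitting a beta distribution to `lifetime_loss` would reproduce the segment-level allocation described above.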
Basel II might set the lender’s overall capital requirements, but to understand the loan’s profitability, the capital can be allocated proportionally according to the relative economic capital levels. Getting this right is mutually beneficial to loan origination because then we can scale the cost of capital by the capital needs of the segments. High-risk borrowers have a higher loss rate by definition, but their loss volatility is proportionally lower if measured by the unexpected loss divided by the expected loss (UL/EL). As a rule of thumb, the UL/EL for a subprime borrower might be 0.5, but for a superprime borrower, it can be as much as 4 [40]. The scale from subprime to superprime also depends on the loan product because of the consumer’s payment hierarchy.

Prepayment Scores

Credit risk modeling predominantly refers to predicting defaults, and yet for prime lending, voluntary prepayment may be the greatest risk of unprofitable lending. Prepayment modeling needs to be on par with default modeling, although the motivations for prepayment can be more complex and less predictable than default. In prime mortgage lending, this parity with default modeling has long been true [41,42,43]. In consumer lending, fewer good prepayment scores are in use today, although they certainly exist [44,45].
Competing risk survival models have been available for decades [46,47], but they have mostly been utilized in simpler situations. When both default and prepayment have unique lifecycles, macroeconomic drivers, and credit score factors, estimating both models simultaneously can quickly degenerate. However, panel data structures naturally allow for separately estimated competing risk models. Even if the default and prepayment models are built entirely independently, by making them periodic performance forecasts for accounts that were active in the previous period, the models can immediately be combined into a cash flow model. This means that the PD and PA models are both conditional on the account being active in the previous period. When implemented in the cash flow model, the unconditional default and prepayment rates are computed as
UPD(t) = PD(t) · PActive(t − 1)
UPA(t) = PA(t) · PActive(t − 1)
PActive(t) = 1 for t = 0; PActive(t − 1) · (1 − PD(t) − PA(t)) for t > 0
With a competing risks approach, prepayment is an account rate and refers to a loan that pays off entirely. With automatic payments, partial loan prepayments have fallen dramatically. Payment amounts are mostly only dynamic for lines of credit.
As with default forecasting, the lifecycle versus age of the account is the most important factor in a prepayment model. Early prepayments are much more financially painful for the lender than prepayments late in the lifecycle. First payment default from underwriting failures is the most extreme example. Both loan originators and subsequent loan buyers should be concerned about the timing of prepayment throughout the life of the loan.
Prepayment models diverge from default models in their connection between vintage (origination month) and environmental effects. The most important economic factor is the difference between the loan’s annual percentage rate (APR) and externally available loan rates as a predictor of the likelihood that the customer will refinance the loan. This can be approximated for modeling as the change in a comparable market interest rate between loan origination and a later forecast date. This introduces a cross-term into the APC structure but is readily incorporated into a panel data prepayment score with the APC lifecycle as an input.
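Constructing this rate-gap factor in a panel dataset is straightforward; the market rate series and the panel index values below are hypothetical.

```python
import numpy as np

# Hypothetical market rate history indexed by calendar month.
market_rate = 0.05 + 0.01 * np.sin(np.arange(120) / 12.0)

# Panel rows: vintage month v (origination) and calendar month t, with t >= v.
v = np.array([3, 3, 3, 40, 40, 40])
t = np.array([3, 10, 20, 40, 50, 60])

# Rate-gap factor: market move since origination, a proxy for the borrower's
# refinancing incentive (loan APR minus currently available rates).
rate_gap = market_rate[v] - market_rate[t]

assert rate_gap[0] == 0.0   # no gap at the origination month itself
```

Because the factor depends on both v and t, it is a cross-term in APC terms, but as a column in the panel score it is handled like any other factor.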
For credit cards or other lines of credit, prepayment may be replaced with purchase and payment models. Purchase models are necessary in order to predict revenue as part of the cash flow model. Payment models will determine the level of revolving balances. All of the same model designs will work here as well.

7. Loss Recovery Prediction

The recovery of debt owed by the consumer after the loan has been charged off is an important contributor to cash flow analysis. Modeling recovery rates is a difficult problem because there are generally three possible outcomes: no recovery, full recovery, and a distribution of partial recoveries. Several approaches have been proposed [48,49,50,51].
Recovery efforts can last for years, so for young portfolios, the full recovery rate cannot be determined from simple moving averages. Survival and APC models are also a good fit for recovery modeling [52] but with the charge-off date used to define the initial event. The hazard function or APC lifecycle measures the probability of recovery versus the age of the defaulted debt.
The expected incremental recoveries can be added to cash flow projections, conditioned on the probability of the original charge-off. As part of the cash flow, these recoveries can be discounted for the net present value calculation. Few lenders have sufficient recovery data to estimate the full lifetime recovery rate, so the estimation typically involves choosing an extrapolation of the recovery rate lifecycle.

8. The Role of Machine Learning and AI

Machine learning is seeing wide adoption in credit risk modeling. Even the list of surveys of applications of machine learning is quite long [53,54,55,56]. Research into applying machine learning to credit risk has closely followed the development of new algorithms [57,58,59,60,61]. However, the vast majority of this research continues to view credit risk modeling as a classification problem rather than predicting probabilities as a time series.
Refs. [50,62,63,64] have won the most scoring competitions, and ReLU neural networks [58,60,65,66,67] are popular for their explainability. ReLU is the most common activation function in neural networks for credit scoring: for an input x, ReLU(x) = max(0, x), which is 0 for negative values and linear for positive values. Companies such as Google made deep learning neural networks famous, but with the usual credit scoring inputs, ReLU networks rarely need more than two or three hidden layers. Because of the ReLU activation, such a neural network is essentially a piecewise linear model across the input space. Interestingly, stochastic gradient-boosted regression trees also create a piecewise linear model across the input space, although differences in the estimation process mean that the models are not identical. Also, gradient boosting iteratively chooses the next best tree; if mapped to a neural network architecture, this would likely correspond to a heavily pruned multilayer network.
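To make the piecewise-linear point concrete, here is a minimal sketch (illustrative only; the weights are hypothetical and not from any credit model):

```python
def relu(x):
    """ReLU activation: 0 for negative inputs, linear for positive inputs."""
    return max(0.0, x)

def tiny_relu_net(x, w1, b1, w2, b2):
    """One-input ReLU network with hypothetical weights, to illustrate
    that the resulting function is piecewise linear in x."""
    hidden = [relu(w * x + b) for w, b in zip(w1, b1)]
    return sum(wo * h for wo, h in zip(w2, hidden)) + b2

# On any region where the set of active (positive) hidden units is fixed,
# the output is exactly linear; kinks occur only where units switch on/off.
f = lambda x: tiny_relu_net(x, w1=[1.0, -1.0], b1=[0.0, 0.5],
                            w2=[0.7, 0.3], b2=0.1)
```

On any interval where no hidden unit changes sign, equally spaced inputs produce equally spaced outputs, which is the piecewise-linearity described above.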
Neural networks excel in processing images, sound, and credit card transactions because of the logical proximity of the inputs. Traditional credit scoring data are a mix of continuous and discrete variables that usually have no logical proximity. Research has suggested that tree-based approaches are better suited to such discontiguous inputs [68]. Alternatively, tree-based methods may just be better at discovering the ideal architecture for an equivalent sparse neural network.
Some authors have applied the advantages of ensemble modeling [69,70,71,72,73,74,75] to survival analysis with random survival forests [76], even with competing risks [77].
The root cause of the limitations of ML, AI, and generative AI in credit risk is the sparsity of data. We may have many accounts to model, but not enough economic cycles or credit cycles for any statistical algorithm to predict borrower behavior without a conceptual understanding of the borrowers, the loan products, and the economy. Multiple decades of data are required for AI to learn such patterns purely from the data, or we need a future-generation AI that can learn from human best practices and combine that with statistical analysis the way our best human analysts do today. An alternative given the data available today is to combine the long-range forecasting talents of APC models with the nonlinear scoring capabilities of machine learning methods.
Regardless of which method is best or simply preferred by an analyst expert in a certain technique, the real win comes by changing how the data are structured. To integrate ML and APC, the key is again to adopt a panel data structure and incorporate lifecycle and environment inputs [78], as with discrete-time survival models or multihorizon survival models from logistic regression applications [28]. The extension to age–period–cohort (APC) machine learning models is immediately available with stochastic gradient-boosted regression trees (SGBRT) and can be designed into neural network architectures [78].
For SGBRT, libraries exist in R and Python that allow the analyst to specify an “offset”. In regression, an offset is used to include the forecasts of an external model without applying any scaling coefficient. The primary consequence is that the machine learning model will only attempt to explain the residuals of the external model. In this case, the external model is the lifecycle + environment from the APC decomposition. Practically, it means that the long-term structure in the data is left to the APC algorithm, and the short-term, account-specific structure is explained by the machine learning model. If the goal is to build a score, this is an ideal approach to ensuring that the scoring model focuses only on the part that rank-orders accounts, without becoming confused by economic trends and product lifecycles.
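The offset mechanism can be sketched in plain Python (a hypothetical, self-contained illustration rather than any particular library's API; in practice, one would pass the APC lifecycle + environment through the offset or base-margin facility of the chosen package):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_with_offset(X, y, offset, lr=2.0, epochs=1200):
    """Fit logit(p_i) = offset_i + b0 + b1*x_i1 + ... where the offset
    (the APC lifecycle + environment, in log-odds) is held at a fixed
    coefficient of one, so the estimated coefficients explain only the
    residual, account-level risk."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)            # intercept + k slopes
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi, oi in zip(X, y, offset):
            p = sigmoid(oi + beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = p - yi
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic check (hypothetical data): true logit = offset + 1.5 * x
random.seed(42)
offset = [random.uniform(-1.0, 1.0) for _ in range(300)]
X = [[random.uniform(-1.0, 1.0)] for _ in range(300)]
y = [1 if random.random() < sigmoid(o + 1.5 * x[0]) else 0
     for o, x in zip(offset, X)]
beta = fit_logistic_with_offset(X, y, offset)
```

Because the offset enters the log-odds unscaled, the fitted slope recovers only the account-level signal, which is exactly the division of labor between the APC decomposition and the scoring model described above.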
For neural networks, a slightly custom architecture is required. Figure 1 shows how to create a neural network for an origination score (left) with lifecycle and environment as fixed inputs, equivalent to the offset in regression. On the right is the same concept, but designed to create a unique forecast for each horizon when incorporating behavioral data, such as delinquency.
Simply put, one must create a network in two halves. The left side of the network passes the APC inputs straight to the output, without adjustment by the network estimation. The right side can be the same ReLU network as described above. The input data must again be panel data.
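A forward pass of such a two-halves network might look like the following sketch (all weights hypothetical):

```python
import math

def relu_vec(v):
    return [max(0.0, x) for x in v]

def apc_nn_logit(lifecycle, environment, features, W1, b1, w2, b2):
    """Two-halves network: the APC lifecycle and environment pass straight
    to the output with fixed unit weights (the left half), while borrower
    features go through a trainable ReLU hidden layer (the right half).
    All weights shown are hypothetical."""
    hidden = relu_vec([sum(w * x for w, x in zip(row, features)) + b
                       for row, b in zip(W1, b1)])
    score = sum(w * h for w, h in zip(w2, hidden)) + b2
    return lifecycle + environment + score   # combined log-odds

def prob_default(logit):
    return 1.0 / (1.0 + math.exp(-logit))

# With zero network weights, the forecast reduces to lifecycle + environment:
logit0 = apc_nn_logit(-3.0, 0.4, [0.2, 0.7],
                      W1=[[0.0, 0.0]], b1=[0.0], w2=[0.0], b2=0.0)
```

Only the right half is trained; the left half guarantees that long-term lifecycle and environment structure survives estimation untouched.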
Not all machine learning algorithms can be modified in this way, but fortunately, the two most popular approaches can both be modified in an afternoon by an analyst already familiar with the techniques, assuming the package allows for such customization. In out-of-sample/out-of-time testing, both SGBRT and NNs, when modified with APC inputs, were significantly more robust and could go for years without retraining [79]. To predict future PD values, one need only put the future values of lifecycle and environment into the models to obtain a scenario-based prediction, rather than just a rank-order score.
Beyond dramatically reducing overfitting, this modification turns the machine learning model into an interlocking component of an account-level cash flow model capable of accepting economic scenarios to adjust the forecasts. The output of the machine learning model can be in the same units as its logistic regression counterparts, so they can be integrated into a yield calculation without modification or assumptions. This relatively small change turns machine learning scores into machine learning cash flow models.

9. Incorporating Alternate Data

Machine learning is generating successes in credit risk, although less dramatically in well-worn domains where the input data are still bureau and application data. These datasets have been linearized over decades such that little nonlinearity remains to be discovered.
The biggest wins for machine learning appear to be in niche products, alternate channels, and serving the underbanked [80], as well as utilizing alternate data sources. A well-trained machine learning algorithm may preprocess deposit histories [81], corporate financial statements, Twitter posts [82], social media [83,84,85], or mobile phone use [86,87] to create input factors that eventually feed into deceptively simple methods like logistic regression models.
When exploring alternate data, the real question is whether we are making the best use of it. The APC cash flow framework inherently separates account idiosyncratic effects from systemic effects in the environment. This is valuable with alternate data because it can improve the signal-to-noise ratio for the information contained therein. Alternate data, either at origination or as behavioral information, can be used in an APC + machine learning model just like any other data source, but it may be even more valuable in the context of alternate data to avoid spurious correlations or double-counting against the economic environment.

10. Understanding Credit Cycles

Credit quality and origination volumes go through cycles. A credit cycle occurs when the intrinsic credit risk of loans originated in certain periods is better or worse than that of loans from other periods, even after normalizing for all available scoring factors and post-origination economic conditions.
Adverse selection is not noise. This residual risk can arise from price competition among lenders, or when the consumer’s loan purchasing choices change with the overall economic environment. Competitive adverse selection occurs when one lender’s product offerings appear less desirable to borrowers compared to those of other lenders. Those who cannot qualify for the desirable competing product will apply to the less desirable lender as a last option [88,89]. These are generally higher-risk borrowers, and this can persist for a period of time, but competitive adverse selection should not be cyclical.
Macroeconomic adverse selection is autocorrelated over years and goes through cycles [90,91,92]. Generally speaking, it is not model-dependent. Machine learning methods do nothing to reduce this uncertainty [93]. Rather, we have a severe hidden variable problem. Credit bureau and loan application data do not tell us about the personality of the borrower, only whether they have managed to pay their bills in the past. In terms of the probability of default, adverse selection can be a bigger contributor to default risk than economic cycles.
Figure 2 shows the residuals of an APC origination score [27] built on Fannie Mae and Freddie Mac data, including the fields available in those datasets: FICO score, loan-to-value (LTV), debt-to-income (DTI), loan purpose, jumbo, property type, and documentation. The graph is scaled to the units of the FICO score, so by adding this adjustment to the reported FICO score, it may be interpreted as the effective score for the applicant. The highest-scoring band is also the most dynamic through the credit cycle. The period of poor quality in 2018–2019 did not impact lower FICO borrowers, so it was largely ignored by commentators. However, the post-pandemic period of 2022–2023 saw dramatically poorer quality among all score bands. These variations correlate well with the Senior Loan Officer Opinion Survey (SLOOS) of mortgage demand and changes in economic conditions, most importantly, the change in the 30-year mortgage interest rate.
No one has yet published the necessary survey results to profile the attitudes of borrowers seeking loans in different environments, but the best intuitive explanation appears to be the loss of value shoppers in certain economic conditions. Lender surveys show that borrower demand falls dramatically when interest rates rise or the cost of the item purchased is rising. Demand is apparently not falling proportionally among all personality types. Instead, the value shoppers who can delay their purchases will wait for better economic conditions. The remaining borrowers are the impulse buyers who will eventually be at higher risk. This effect is strongest among the highest-scoring borrowers because that is where we have the largest proportion of value shoppers.
Some have suggested that this is a nonlinear modeling problem that can be resolved by replacing logistic regression scores with machine learning. However, the problem is not the model form: measuring macroeconomic adverse selection is a noisy and delayed process when using cross-sectional data. When panel data versions of logistic or machine learning models are built with the same traditional inputs, an early warning signal of adverse selection becomes possible.
Macroeconomic adverse selection is partly predictable, but the first step is measurement—tracking residual risk by vintage pool and segment. This residual risk can be calibrated to units of bureau score so that the underwriting team can understand that recent loans booked with 720 scores are performing as if their scores were 35 points lower, for example. Because of the autocorrelation in macroeconomic adverse selection, these adjustments can be incorporated directly into the default models for new originations, assuming that the problem persists until proven otherwise.
Still, estimating adverse selection requires performance data, which may result in a persistent six-to-nine-month lag. An alternative is to monitor loan demand or even predict loan demand because falling demand strongly correlates with higher residual risk. This prediction is more of a nowcast, looking at current or near-term economic conditions, so it is more reliable than a longer-term economic scenario. The nowcast of adverse selection can then be incorporated into the default model and estimated yields by segment, ultimately influencing the pricing, terms, and risk appetite of the lender.

11. Incorporating Economic Cycles

The profit projections used at loan origination and in finance for portfolio management must use forward-looking economic scenarios. The lending industry has become very efficient, with margins much tighter in recent decades, making the use of historical averages for key loan performance rates non-competitive. Rather, market-leading lenders already set pricing according to forward expectations of economic conditions.
APC and survival models have long been deployed in lending for stress testing [22,24,94,95,96,97]. This capability remains unchanged when account-level, panel data models are created.
Directly incorporating economic scenarios into profit projections changes how loan rates are set. If management takes a more pessimistic stance on the economic environment, swapping economic scenarios or changing scenario weights would immediately flow through the portfolio and account-level profitability estimates without manual intervention. Rather than guessing at new credit score cutoffs, the re-estimation of loan yields would immediately adjust the product offering.
This is really just putting portfolio stress-test models to work. Most lenders have implemented stress-testing models for regulatory compliance. Unfortunately, many of those are not high enough quality to be used for anything else, but in principle, that is what is needed. At the very least, creating the necessary models to stress-test loan profitability can also be used to enhance portfolio stress testing. This is a chance to obtain a tangible return from stress-testing investments.
Although incorporating economic scenarios is a necessity, we must be realistic in the use of these scenarios. The economy has momentum that allows for some reasonable predictability for up to twelve months. In fact, our research has shown that we could do a good job of reproducing economists' forecasts by continuing current economic trends in the near term and flattening them out to long-run averages around 12 months into the future. For loss forecasting, most lenders use scenarios up to 24 months so that they can see the impacts of a near-term recession or expansion. Beyond 24 months, no one really knows what will happen, and we should revert the net portfolio impact of the economy to a long-run average. For pricing loans and portfolio optimization, this will give the most reasonable lifetime expectation and is also consistent with the reasonable and supportable (R&S) period used by most lenders.
One of the best uses of stress testing in cash flow forecasting is reverse stress testing [98,99]. By testing a range of macroeconomic scenarios or scenarios generated via Monte Carlo simulation, the lender can determine how much of a recession can be withstood while still remaining profitable or staying above the institution’s hurdle rate for investment.

12. Profit Volatility Models

Predicting profitability for future originations is not enough to manage a portfolio. To optimize resource allocation by segment, modern portfolio theory (MPT) prescribes optimizing the Sharpe Ratio [100,101]:
Sharpe Ratio = Expected Return / Expected Volatility in Return
This is the classic risk–reward tradeoff, although lenders have often misinterpreted “risk” in this formulation. The risk that needs to be minimized is not the expected loss. It is also not the volatility of losses, as measured for economic capital or Basel II regulatory capital. Rather, expected volatility in return is the confidence interval in the yield estimate in Equation (4), considering model estimation uncertainties and future economic uncertainties in the PD, PA, recoveries, and any other estimated components of Equation (3). In fact, for prime and superprime lending, the greatest volatility is in prepayment rates, and thus revenue.
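As a minimal numerical illustration (the segment names and figures below are hypothetical), note that the denominator is the uncertainty in the yield estimate, not the expected loss rate:

```python
# Hypothetical segments: expected yield and the standard deviation of
# that yield estimate (model estimation + economic uncertainty).
segments = {
    "superprime": {"yield": 0.021, "yield_vol": 0.004},
    "prime":      {"yield": 0.034, "yield_vol": 0.011},
    "near_prime": {"yield": 0.055, "yield_vol": 0.030},
}

def sharpe(seg, risk_free=0.0):
    # "Risk" here is the volatility of the yield estimate, not the loss rate.
    return (seg["yield"] - risk_free) / seg["yield_vol"]

ranked = sorted(segments, key=lambda s: sharpe(segments[s]), reverse=True)
```

In this toy example, the lowest-yielding segment earns the best risk-adjusted ranking because its yield is the most certain, which is precisely the tradeoff the loss rate alone cannot capture.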
To compute the uncertainty in yield, we need to estimate the uncertainties in the PD, PA, recovery rates, and any other forecasted component. Using an APC framework, the uncertainty in the PD will have three components: uncertainty in the lifecycle, uncertainty from estimating the credit risk, and uncertainty in the future economic environment.
Each of these components (lifecycle, quality, and environment) can be separate models with their own forecast uncertainties at a given age, vintage, and date. If those models are estimated via logistic regression or APC + ML, forecast confidence intervals are normally distributed in log-odds space. Combining these components gives a log-odds forecast for the account rate and a logit-normal distribution for the forecast.
The forecast and uncertainty are straightforward to compute at a single time step for an account or vintage cohort. With a competing risks approach (default, payoff, or continue), as expressed in the cash flow model in Equation (8), the uncertainties must be combined each month in order to obtain a lifetime forecast of yield and yield uncertainty. The forecast can be solved algebraically, but the uncertainty distribution does not have a closed-form solution.
Fortunately, some simple Monte Carlo experiments have shown that the result can always be expressed as a beta distribution with two estimated parameters [39]. In a quantum computing future, this might be solved without simulation, but with current classical computing, just fifty iterations of a Monte Carlo simulation provide a sufficiently precise estimate of the lifetime yield uncertainty. The entire calculation can be performed in one second on a laptop, which is efficient enough for portfolio optimization or improving underwriting, as discussed in the next section.
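The procedure can be sketched as follows (a toy simulation with made-up monthly rates, not the article's cash flow model), using a method-of-moments fit for the beta parameters:

```python
import random

def simulate_lifetime_yield(n_iter=50, seed=1):
    """Toy Monte Carlo (hypothetical rates): draw noisy monthly default and
    payoff probabilities, compound survival over 36 months, and accumulate
    a lifetime yield in (0, 1) for each iteration."""
    random.seed(seed)
    draws = []
    for _ in range(n_iter):
        survival, total = 1.0, 0.0
        for _month in range(36):
            pd_m = min(max(random.gauss(0.004, 0.001), 0.0), 1.0)  # default prob
            pa_m = min(max(random.gauss(0.015, 0.004), 0.0), 1.0)  # payoff prob
            total += survival * 0.009                               # monthly margin
            survival *= (1.0 - pd_m - pa_m)
        draws.append(min(max(total, 1e-6), 1.0 - 1e-6))
    return draws

def fit_beta_moments(draws):
    """Method-of-moments fit of Beta(a, b) to the simulated lifetime yields."""
    n = len(draws)
    m = sum(draws) / n
    v = sum((d - m) ** 2 for d in draws) / (n - 1)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common

draws = simulate_lifetime_yield()
a, b = fit_beta_moments(draws)
```

The fitted Beta(a, b) summarizes the lifetime yield distribution with two parameters, matching the sample mean by construction.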
Having estimated yield and yield uncertainty by product and segment allows for true portfolio optimization using efficient frontier concepts from Modern Portfolio Theory. Unfortunately, the first discussion of applications of the efficient frontier to lending has been taken out of context. Robert Oliver showed [102] that, analogous to the efficient frontier, one could set score cutoffs to manage tradeoffs between yield and other business targets. Others have repeated this work, interpreting the efficient frontier in lending as comparing yield to risk, where risk is taken to be the loss rate. Unfortunately, this is a misinterpretation. Portfolio optimization and the efficient frontier associated with optimal investment across a range of loan products and segments compare yield to uncertainty in yield. The loss rate is a component of yield but cannot substitute for volatility in yield.
The final required component of portfolio optimization is a correlation matrix between the yields of the different loan products and segments. Unfortunately, this is also usually computed incorrectly. The most common approach is to take time series of losses and compute a correlation matrix. First, one must consider yield, not losses. Second, when a large volume of new loans is originated, those loans will mature according to their lifecycles. Such origination surges usually happen when interest rates fall and consumer demand increases. This creates oscillations in loss rates and yields that arise from the predictable response to changes in origination volumes. Correlating loss time series is, in effect, correlating loan origination volumes. This is the source of many spurious anti-correlation results in loan product correlation matrices. In response to a recession, consumer loans usually default before commercial loans, but there is no true anti-correlation among loan products, only poorly estimated correlation matrices.
To properly compute correlations for loan portfolios, the time series must be normalized for origination volumes, which means factoring out the lifecycle from the observed performance. APC modeling also offers a solution for computing correlation matrices. The data for each loan portfolio can be decomposed, the environment functions extracted, and then a correlation matrix computed among those functions. This correlation will again be in log-odds space, which is appropriate for correlation measures that assume normal distributions.
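A sketch of the idea (with hypothetical extracted environment functions in log-odds):

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly environment functions (log-odds of default vs. date)
# already extracted by the APC decomposition, so lifecycle effects driven by
# origination-volume swings cannot masquerade as (anti-)correlation.
env_auto     = [-0.2, -0.1, 0.0, 0.3, 0.5, 0.4, 0.2, 0.0]
env_mortgage = [-0.3, -0.2, -0.1, 0.2, 0.6, 0.5, 0.3, 0.1]

rho = pearson(env_auto, env_mortgage)
```

Correlating the extracted environment functions, rather than raw loss series, keeps the measurement in log-odds space and free of lifecycle contamination.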
With yield forecasts, yield uncertainties, and yield correlations, portfolio optimization between loan products or segments becomes useful, and the algorithms are widely available.

13. Optimizing Pricing for Net Income

One of the worst practices in lending today is “meet the market” pricing. Essentially, lenders look at their competitors’ prices and decide whether to be slightly more or less aggressive, based largely on loan volume growth goals. This can fail spectacularly when it creates a herd mentality of lenders competing for growth with no actual predictions of future profitability.
The purpose of starting with product yield forecasts is to restructure how loans are priced. At times, this will require recognizing that matching competitor prices in some segments will likely be unprofitable. The lender could then decide to pull back on lending in certain segments until market prices are more sensible, to restructure the offering to appeal to less price-sensitive and more profitable segments, or to make a strategic decision to bear a temporarily increased loss risk in order to maintain a market position, although perhaps prudently shrinking this exposure. Originating to resell the loans when they are expected to be unprofitable has been done by many, but it carries significant reputation risk or even repurchase risk, depending on the sale terms.
Any changes to pricing and terms will necessarily change the volume of loans originated [103,104,105]. Consumer segments will have different price sensitivity or may be sensitive to monthly payment size rather than total interest cost. We have found that we can incorporate other loan terms in the demand forecast, such as changes in credit limit and teaser rates and periods. Building a volume forecasting model allows us to change from setting fixed yield cutoffs for loan origination to optimizing the expected net income. With net income defined simply as
Net Income = Estimated Yield × Estimated Volume
an increase in interest rates will increase estimated yield but reduce volume. Decreasing rates will decrease estimated yield but increase volume, so there will be a natural optimal value for setting the loan terms.
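The resulting interior optimum can be illustrated with toy response curves (all functions and numbers are hypothetical):

```python
# Hypothetical yield and demand responses to the offered rate (APR):
# yield rises with the rate, volume falls, so net income has an interior optimum.

def estimated_yield(rate):
    return rate - 0.03                       # margin over funding/loss costs (toy)

def estimated_volume(rate):
    return max(0.0, 1000.0 * (0.15 - rate))  # linear demand curve (toy)

def net_income(rate):
    return estimated_yield(rate) * estimated_volume(rate)

# Sweep candidate rates and pick the one maximizing expected net income.
rates = [0.04 + 0.001 * i for i in range(100)]
best_rate = max(rates, key=net_income)
```

With these toy curves, the optimum sits midway between the break-even rate and the rate at which demand vanishes; in practice, both response curves would come from the yield and demand models described above.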
Of course, many other approaches can be used for this optimization. One popular approach is to find the loan terms that maximize volume while maintaining a minimum yield, also known as a hurdle rate. This approach is most common for those seeking to sell or securitize loans in excess of what can be held in the portfolio. The yield forecast can be modified in this case to focus on servicing revenue and selling the risk. In situations with significant capital constraints, such as the outflow of deposits experienced by US credit unions in 2022, the focus might turn to maximizing yield with volume held at a threshold.
The real value of this analysis comes when it is segmented. The usual situation is not that a single policy applied to the entire portfolio beats the market, but rather, by carefully optimizing by segment, we find opportunities that are missed. This is most often true when economic conditions are changing, and the lending market has not yet found equilibrium. Note that the yield estimates must consider the effect of changing interest rates on macroeconomic adverse selection, so raising rates significantly could actually lower yields for some segments as poorer quality loans are originated.
Ultimately, all of the above needs to incorporate a yield uncertainty adjustment or economic capital allocation. This can also be handled in portfolio optimization where the portfolio growth targets consider profit volatility.

14. Underwriting with Profitability and Profit Uncertainty Estimates

At financial institutions today, yield estimates remain in the finance department, with only high-level impacts on product management. Instead, loans are generally originated according to a credit score, with minimum cutoffs set in a backward-looking, intuitive manner. Loan origination scores, whether bureau scores or in-house scores, are rank-ordering tools. The technology for creating scores has been refined over decades. Portfolio failures rarely occur because the lender could not rank-order risk within a pool of applicants. Failures usually happen because the lender could not set origination cutoffs through changing economic conditions that would ensure profitable lending.
The business line, which owns the success of the products, needs better origination tools. Some of the leading fintechs have already moved to PD-based origination systems. Those companies often build their own tech stacks so that they are not locked into legacy architectures. However, even legacy systems do not need to restrict lenders from using PD-based scores. The age–period–cohort or discrete-time survival model frameworks view credit risk as a component separable from the lifecycle and environment. This means that even when a PD model has been properly built with panel data and economic scenarios, the credit risk component can be run separately to produce a numerical risk ranking and scaled as required by the underwriting system.
The advantage of replacing a rank-order score built on cross-sectional data with a PD-based score is that the cutoff score can be set immediately from a chosen lifetime loss or profitability goal. For loss rate targets, the threshold is adjusted to achieve the target according to the competing risk cash flow model under the chosen economic scenario. As mentioned before, for many products, the probability of prepayment is actually more important than the probability of charge-off. Certain borrower or product attributes can significantly change the likelihood of prepayment. For example, having a deposit or checking account, especially with a direct payroll deposit, will alter the likelihood of prepayment and the profitability of the loan or line of credit. Certain rewards products can significantly alter the product profitability, both positively for improved utilization and negatively for the reward costs. Both aspects should be in the yield calculation at underwriting. Therefore, a better approach would be to extract the prepayment score from the panel data-based prepayment model and use it as a second score for underwriting, at least in custom segmentation. This can be a matrix or decision tree approach where both cutoff scores are estimated, with constraints, to achieve the business’s profitability targets and risk appetite.
Extracting the scoring component from a panel data-based model in order to fit into existing systems is a quick path to implementation, but it does not fully utilize the model’s potential. Rather than computing a risk-ranked credit score at origination and running through a decision tree, loan underwriting could be based directly on yield. Account-level yield calculations are quick enough to run at underwriting. For each account, given the current pricing for the borrower’s attributes and the loan terms, and considering the accepted macroeconomic scenario, the yield can be estimated. The decision to originate the loan can be based directly on a yield threshold. As economic scenarios change, the models and thresholds can be preserved, and the forecasts can simply be updated.
Using forward-looking estimates of profitability in underwriting would be a major advance, but considering profit uncertainty may be even more critical. Within linear models, the forecast uncertainty varies smoothly across the input space, except when crossing the boundary of important indicator variables. Bureau scores are reported without confidence intervals, and the industry assumes a standard amount of uncertainty, with a few key exceptions. For example, the thin-file/thick-file distinction is important in underwriting. A low bureau score based on very little credit history means that we do not really know if the borrower is a good or bad risk. A starter product might be offered to allow the customer to demonstrate their creditworthiness. A low score for a thick-file customer means that we are fairly certain that they are a poor credit risk, and they might be rejected outright.
Aside from the thin-file/thick-file distinction, forecast uncertainty is rarely considered in lending. The real power of machine learning is its ability to identify pockets of predictability within a broad input space rather than assume continuity in forecast accuracy across the whole space. Finding pockets of predictability necessarily means that other regions may have much lower predictability and higher uncertainty.
With the creation of panel data scores with APC inputs, estimating the full forecast uncertainty is a fairly efficient process using Monte Carlo simulation to estimate the parameters of a beta distribution for each borrower, but such uncertainty estimates are not in use anywhere in lending. This is a missed opportunity for more precision in product offer generation, but with machine learning, forecast uncertainty estimation is critical. Even legacy loan origination systems could be adapted to using a yield forecast as a credit score and the uncertainty in yield as a second score in a decision tree or matrix.

15. Preparing for Natural Disasters, Climate Change, and Pandemics

Stress testing natural disasters, climate change, and pandemics is currently being viewed as a regulatory activity. However, these risks also fit within the APC modeling framework and can even be run through the profit and profit uncertainty forecasts in order to determine the corresponding risks.
For stress testing the impact of natural disasters, climate events, and pandemics on loan portfolios, a consistent approach may be followed [106]:
  • Identify severe historical events that may be used as reference events.
  • Estimate how each event impacted the economy relative to borrowers, or find previous research that has quantified the economic impacts.
  • Create “impulse functions” for key macroeconomic variables that can be overlaid on a baseline macroeconomic scenario.
  • Run this scenario through a portfolio stress-test model, profit model, or underwriting model to measure the potential impact of each event.
  • Estimate the probability of future events of this magnitude in order to determine the net expected impact. If more than one event has been observed historically, they can be used to create a risk distribution.
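Step 3, overlaying an impulse function on a baseline scenario, can be sketched as follows (illustrative numbers only):

```python
# Hypothetical quarterly baseline unemployment-rate scenario and a
# decaying event "impulse function" for a reference natural disaster.
baseline = [4.0, 4.0, 4.1, 4.1, 4.2, 4.2, 4.3, 4.3]
impulse  = [0.0, 1.5, 1.2, 0.8, 0.4, 0.2, 0.0, 0.0]  # shock decaying to zero

# Overlay the impulse on the baseline to form the stressed scenario,
# which is then fed through the stress-test / profit model in step 4.
stressed = [b + i for b, i in zip(baseline, impulse)]
```

The same overlay can be applied to any macroeconomic driver used by the portfolio models, keeping the baseline scenario and the event severity separately adjustable.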
If an event has no precedent, an economist may still be consulted to create scenarios of the potential impact.
Climate risk assessment, appropriately, grew from an insurance industry perspective, which divides the problem into physical risk and transition risk. Insurance portfolios directly bear the cost of property damage and have such long horizons that transition risk is important. Financial institutions bear little physical risk because collateralized loans are required to be insured. Some residual risk will remain because of the claims and liquidation process following a disaster, but lenders rarely show significant losses from the kinds of events that can devastate insurers.
Much discussion is occurring around transition risks deriving from adopting carbon-neutral policies over the next 30 to 50 years. However, even US 30-year mortgages have such short effective lifetimes, around 6 years on average, that portfolio managers can adapt to transition risks comparatively quickly. Transition risks can easily be designed out of any loan product by including periodic interest rate repricing. Such renewal periods have already been adopted in order to minimize the lifetime loss reserve calculations under the US CECL accounting standard.
The biggest risk for lenders is economic risk. This is a risk largely ignored by insurance companies and the climate risk community because insurers are not subject to the economic environment in which consumers live, other than possible decreases in property values. For lenders, loans can default from a natural disaster event, even when no property damage occurs. In fact, this is the only significant climate risk for lenders.
When a devastating wildfire hits a forest community, as in California in 2018, neighboring communities see less business from forest products, recreation and tourism, and the many businesses that support those directly affected [107]. Residents lose their jobs even if their homes and cars are unharmed. Credit cards default. Commercial real estate loans default. Small business loans default. The populations of these communities decline as people move elsewhere for work. Lenders experience rising defaults, rising prepayments, and falling loan demand in communities that saw no direct climate damage.
Past severe weather events involving drought, wildfires, and floods can be analyzed in order to quantify the broader economic impact across a region. This is already being done by economists. For example, the California wildfires in 2018 reduced the statewide GDP by 1% compared to the baseline [108]. Each such event that is studied provides a data point for calibrating event severity to economic impact.
The climate event economic impact can be overlaid on an existing baseline economic scenario and fed through the lender’s CCAR or other stress-test model, or an industry stress-test model. This approach fits naturally into the profit forecasting and stress-testing framework described in this article.
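As a sketch of how such an overlaid scenario might flow through a simple stress-test model, a portfolio default probability can be shifted in log-odds space by the scenario's deviation from baseline. The sensitivity `beta` here is an assumed illustrative value, not an estimated coefficient from the article.

```python
import numpy as np

def stressed_default_rate(base_pd, gdp_path, gdp_baseline, beta=-8.0):
    """Shift a portfolio default probability in log-odds space by the
    GDP deviation from baseline. `beta` is an assumed macroeconomic
    sensitivity (negative: weaker GDP raises defaults)."""
    log_odds = np.log(base_pd / (1.0 - base_pd))
    shocked = log_odds + beta * (np.asarray(gdp_path) - np.asarray(gdp_baseline))
    return 1.0 / (1.0 + np.exp(-shocked))

# A 1% GDP shortfall versus baseline raises an assumed 2% baseline PD.
pd_stressed = stressed_default_rate(
    0.02, gdp_path=[0.02, 0.01], gdp_baseline=[0.02, 0.02]
)
```

In practice the institution's own CCAR or industry stress-test model would replace this one-factor toy.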
An even more compelling example of a climate event causing only economic harm was the 2012 Midwestern drought. This drought caused no lasting damage to fields, barns, tractors, and combines, but the economic base for rural communities in states like Kentucky was so devastated that it took seven years to recover, and over those seven years, cumulative economic growth was 16% lower than in neighboring urban communities [109,110]. FDIC data show that lenders had measurable losses across the range of loan product types, not just agricultural loans. This event serves as an effective reference event for assessing the impact of drought on loan portfolios [106].
When considering contagious diseases in human populations, we have two well-documented reference events: the 2002–2003 Hong Kong SARS recession and the 2020 COVID-19 recession. Other disease outbreaks, such as MERS, Ebola, and Zika, have not caused the kind of macroeconomic impacts that are applicable to loan portfolios.
Severe Acute Respiratory Syndrome (SARS) was used as a reference event at the outset of the COVID-19 pandemic in order to run scenarios of what could happen. In banking, the Hong Kong SARS recession of 2002–2003 was the only disease-led recession within our data prior to COVID-19.
In late 2002, SARS spread from civets to humans, eventually affecting roughly 8000 people, mostly in southern China and Hong Kong. With 774 deaths, the fatality rate was 9.6%. The virus itself affected only a small share of Hong Kong's population (1755 cases in total), but the resulting fear and control measures caused a measurable shock to the Hong Kong economy.
While working for a Hong Kong lender in 2002–2003, the author created an effective model for SARS using time series data on the Hong Kong hospitalization rate. The model could even explain the sudden reversal of the 2002 economic recovery. However, any model that correlates the number of disease cases with loan performance is specific to that pathogen and societal response. The extra loan defaults experienced by a financial institution are not caused by the illness of the borrowers themselves; rather, they reflect an economic contagion effect driven by fear of the disease. The multiplier between the number of sick people and the economic impact is a function of how much the disease is feared.
Therefore, the best approach is to model defaults using traditional macroeconomic factors such as GDP, unemployment, and residential property values. In Hong Kong, all of these were affected by the fear of SARS. Consumers stopped shopping, so businesses laid off workers. Property values fell, especially hard in areas with outbreaks. Rather than modeling the disease itself, we created a model of recessions that would translate future epidemics into scenarios impacting typical macroeconomic factors. The general lesson from this exercise is that, to run stress tests for pandemics and natural disasters, lenders do not need to create custom stress-test models; they need to create custom macroeconomic scenarios that can be fed into existing stress-test models.
Given a loss expectation for a specific risk, lenders can directly translate this to an incremental loss reserve or capital charge, depending on the immediacy of the danger. Increased reserves or capital can be allocated to the products and geographies most at risk, increasing the cost of lending in those areas. In response, the lender may choose to adjust interest rates or terms on loans originated in those areas or sell loans from these areas for greater portfolio diversification.
When selling loans, the incremental loss estimation due to disaster risk will also impact the expected yield on the loans. Buyers of loans may want to perform their own climate risk assessment to make sure that the expected yields will be what is asserted by the seller.
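A minimal sketch of translating a disaster loss expectation into an incremental loss rate and a net expected yield, assuming a simple annualized-rate view; the function name and all input values are hypothetical.

```python
def disaster_adjusted_yield(coupon, base_loss_rate, event_prob, loss_given_event):
    """Net expected annual yield on a loan pool after adding an
    incremental disaster-risk loss expectation.  All inputs are
    annualized rates on outstanding balance.

    Returns (net_yield, incremental_loss_rate)."""
    incremental_loss = event_prob * loss_given_event
    return coupon - base_loss_rate - incremental_loss, incremental_loss

# E.g., a 7% coupon pool with 1.5% normal annual losses and a 2% annual
# chance of a severe event costing 10% of balance.
net_yield, add_on = disaster_adjusted_yield(0.07, 0.015, 0.02, 0.10)
```

The `add_on` term is what a lender might carry as an incremental reserve or price into the loan rate, and what a loan buyer might deduct when checking the seller's asserted yield.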

16. Modeling Post-Event Data

After an event has occurred, credit risk analysts face unique challenges. Following the 2020 COVID-19 pandemic, US loan performance data did not again track economic fundamentals until 2023. During the primary years of the pandemic, consumers were supported by government policies and lender forbearance programs intended to counterbalance the worst economic impacts. As lenders sought to refresh their scoring models, how best to use the pandemic data became a vexing problem.
The question is whether the pandemic data can be normalized to make them applicable to post-pandemic consumer behavior. Unfortunately, standard credit score development does not provide a robust method of normalizing the data because the timing of defaults is not specifically identified. Without such normalization, most lenders were forced to exclude the pandemic data and continue using pre-pandemic models. However, the APC scoring approach described earlier provides a normalization method.
In fact, the solution is a simple application of the APC scoring methodology. The historical data are first decomposed via APC. The environment function from this decomposition captures the net impact of the crisis and any government or lender support for borrowers. Figure 3 shows an example of the environment function for nine auto lenders throughout the pandemic. In the US, rising unemployment would have predicted an increase in defaults. However, with government assistance and lender forbearance programs, the number of defaults dropped instead. Including macroeconomic data when building a score would not provide the needed normalization, but using the environment function from an APC decomposition as a fixed offset when constructing a score effectively normalizes the performance history for the net impact of exogenous forces.
A credit score built in this way will be able to use pandemic data and still extrapolate to normal economic periods. Our tests show that these scores are quite robust and can be built by incorporating data from before, during, and after the pandemic into a single score. This is a significant change in the way scores are built and employed during natural disasters. Note that this method works with machine learning algorithms exactly as described in Section 8.
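A minimal numpy sketch of this fixed-offset approach on synthetic data; the environment values, coefficients, and sample size are illustrative assumptions, not the article's estimates.

```python
import numpy as np

def fit_logistic_with_offset(X, y, offset, iters=25):
    """Logistic regression by Newton-Raphson with a fixed offset term,
    i.e. log-odds = X @ beta + offset.  The offset carries the APC
    environment function, so it is held fixed rather than re-estimated."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta + offset
        p = 1.0 / (1.0 + np.exp(-eta))
        w = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)
        hess = (X * w[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# Synthetic panel: one borrower attribute plus an exogenous environment
# value attached to each observation date, standing in for the APC
# environment function (including a pandemic-era swing).
rng = np.random.default_rng(42)
n = 20000
attribute = rng.normal(size=n)
environment = rng.choice([-0.5, 0.0, 1.2], size=n)
true_eta = -3.0 + 0.8 * attribute + environment
defaults = rng.binomial(1, 1.0 / (1.0 + np.exp(-true_eta)))

X = np.column_stack([np.ones(n), attribute])
beta = fit_logistic_with_offset(X, defaults, offset=environment)
# beta recovers roughly (-3.0, 0.8): the borrower effect net of the
# exogenous environment, so pandemic-era data need not be excluded.
```

The same offset can be supplied to standard GLM software or, as noted above, to machine learning algorithms.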

17. Conclusions

In data analysis, we often think of modeling techniques, such as Cox proportional hazards models or age–period–cohort models, as a package of specific estimators and output parameters. If we step back and think of APC and survival methods as a framework where age, vintage, and calendar date effects are separable predictions over forecast intervals, leaving open the details about how each piece is estimated, the approach becomes very powerful. An APC framework provides a natural integration of data analysis with cash flow forecasting.
If the creation of credit scores is modified to use panel data with lifecycle and environment as inputs, then all credit risk analytics can be integrated across finance, underwriting, credit risk management, stress testing, and capital. This level of integration is not possible with models built on cross-sectional data. The approximations required to incorporate temporal effects with traditional credit scores lead to inherent model weaknesses. An APC framework provides direct mathematical solutions for integrating models and results in better component models.
Beyond presenting age–period–cohort modeling as an integrating framework across an institution’s credit risk analytics, this article contributed several new pieces to the framework. Being able to solve for score cutoffs from yield targets set by the finance team solves a significant issue in lending today. Incorporating volatility estimates in the underwriting process needs to be carried out with careful consideration of fair lending regulations but could be a valuable part of applicant segmentation. Further, the COVID-19 pandemic created a shock in the historical data of lenders that has caused almost all institutions to exclude that data from consideration in credit score construction. Including APC offsets for lifecycle and environment during the score estimation normalizes the pandemic data, meaning that no data need to be excluded.
The greatest challenges to the adoption of FPU are the limitations of older lending systems, the siloed nature of many organizations, and the seeming complexity of the approach. Much of this can be overcome through analyst training, as the added complexity amounts to learning new estimation methods and quantifying processes that were previously hidden as management judgment.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study are proprietary to the contributing institutions and not publicly available.

Conflicts of Interest

Author Joseph L. Breeden is employed by Deep Future Analytics LLC. The company had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Myers, J.H.; Forgy, E.W. The development of numerical credit evaluation systems. J. Am. Stat. Assoc. 1963, 58, 799–806. [Google Scholar] [CrossRef]
  2. Thomas, L.; Crook, J.; Edelman, D. Credit Scoring and Its Applications; SIAM: Philadelphia, PA, USA, 2017. [Google Scholar]
  3. Siddiqi, N. Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards; John Wiley & Sons: Hoboken, NJ, USA, 2017. [Google Scholar]
  4. Cox, D.R.; Oakes, D.O. Analysis of Survival Data; Chapman and Hall: London, UK, 1984. [Google Scholar]
  5. Fleming, T.R.; Harrington, D.P. Counting Processes and Survival Analysis; Wiley: New York, NY, USA, 1991. [Google Scholar]
  6. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–220. [Google Scholar] [CrossRef]
  7. De Leonardis, D.; Rocci, R. Assessing the default risk by means of a discrete-time survival analysis approach. Appl. Stoch. Model. Bus. Ind. 2008, 24, 291–306. [Google Scholar] [CrossRef]
  8. Breeden, J.L.; Bellotti, A.; Leonova, Y. Instabilities in Cox proportional hazards models in credit risk. J. Credit. Risk 2023, 19, 29–55. [Google Scholar] [CrossRef]
  9. Holford, T.R. The estimation of age, period and cohort effects for vital rates. Biometrics 1983, 39, 311–324. [Google Scholar] [CrossRef] [PubMed]
  10. Glenn, N.D. Cohort Analysis, 2nd ed.; Sage: London, UK, 2005. [Google Scholar]
  11. Carstensen, B. Age–period–cohort models for the Lexis diagram. Stat. Med. 2007, 26, 3018–3045. [Google Scholar] [CrossRef] [PubMed]
  12. Smith, T.R.; Wakefield, J. A review and comparison of age–period–cohort models for cancer incidence. Stat. Sci. 2016, 31, 591–610. [Google Scholar] [CrossRef]
  13. Fu, W. A Practical Guide to Age-Period-Cohort Analysis: The Identification Problem and Beyond; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018. [Google Scholar]
  14. Breeden, J.L. Reinventing Retail Lending Analytics: Forecasting, Stress Testing, Capital and Scoring for a World of Crises, 2nd Impression; Risk Books: London, UK, 2014. [Google Scholar]
  15. Gascoigne, C.; Smith, T. Penalized smoothing splines resolve the curvature identifiability problem in age-period-cohort models with unequal intervals. Stat. Med. 2023, 42, 1888–1908. [Google Scholar] [CrossRef] [PubMed]
  16. Schmid, V.; Held, L. Bayesian age-period-cohort modeling and prediction—BAMP. J. Stat. Softw. Artic. 2007, 21, 1–15. [Google Scholar] [CrossRef]
  17. Basel Committee on Banking Supervision. International Convergence of Capital Measurement and Capital Standards: A Revised Framework. Available online: http://www.bis.org (accessed on 21 November 2023).
  18. Breeden, J.L. Testing retail lending models for missing cross-terms. J. Risk Model Valid. 2010, 4, 49–57. [Google Scholar] [CrossRef]
  19. Yaffee, R. A primer for panel data analysis. Connect. Inf. Technol. 2003, 8, 1–11. [Google Scholar]
  20. Hsiao, C. Panel data analysis—advantages and challenges. Test 2007, 16, 1–22. [Google Scholar] [CrossRef]
  21. Hsiao, C. Analysis of Panel Data; Number 64; Cambridge University Press: Cambridge, UK, 2022. [Google Scholar]
  22. Stepanova, M.; Thomas, L. Survival analysis methods for personal loan data. Oper. Res. 2002, 50, 277–289. [Google Scholar] [CrossRef]
  23. Ata, N.; Özkök, E.; Karabey, U. Survival data mining: An application to credit card holders. Sigma Mühendislik Fen Bilim. Derg. 2008, 26, 33–42. [Google Scholar]
  24. Bellotti, T.; Crook, J. Credit scoring with macroeconomic variables using survival analysis. J. Oper. Res. Soc. 2009, 60, 1699–1707. [Google Scholar] [CrossRef]
  25. Byanjankar, A. Predicting credit risk in peer-to-peer lending with survival analysis. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–8. [Google Scholar]
  26. Djeundje, V.B.; Crook, J. Dynamic survival models with varying coefficients for credit risks. Eur. J. Oper. Res. 2019, 275, 319–333. [Google Scholar] [CrossRef]
  27. Breeden, J.L. Incorporating lifecycle and environment in loan-level forecasts and stress tests. Eur. J. Oper. Res. 2016, 255, 649–658. [Google Scholar] [CrossRef]
  28. Breeden, J.L.; Crook, J. Multihorizon discrete time survival models. J. Oper. Res. Soc. 2022, 73, 56–69. [Google Scholar] [CrossRef]
  29. IASB. IFRS 9 Financial Instruments; Technical Report; IFRS Foundation: London, UK, 2014. [Google Scholar]
  30. Bellini, T. IFRS 9 and CECL Credit Risk Modelling and Validation: A Practical Guide with Examples Worked in R and SAS; Academic Press: Cambridge, MA, USA, 2019. [Google Scholar]
  31. Novotny-Farkas, Z. The interaction of the IFRS 9 expected loss approach with supervisory rules and implications for financial stability. Account. Eur. 2016, 13, 197–227. [Google Scholar] [CrossRef]
  32. FASB. Financial Instruments Credit Losses (Subtopic 825-15); Financial Accounting Series; FASB: Norwalk, CT, USA, 2012. [Google Scholar]
  33. Breeden, J.L. Living with CECL: Mortgage Modeling Alternatives; Prescient Models LLC: Santa Fe, NM, USA, 2018. [Google Scholar]
  34. Gordy, M.B. A risk-factor model foundation for ratings-based bank capital rules. J. Financ. Intermediation 2003, 12, 199–232. [Google Scholar] [CrossRef]
  35. Australian Prudential Regulation Authority. Implementation of the Basel II Capital Framework 3. Internal Ratings-Based Approach to Credit Risk; Technical Report; Australian Prudential Regulation Authority: Sydney, Australia, 2005.
  36. Cornford, A. The Global Implementation of Basel II: Prospects and Outstanding Problems; United Nations Conference on Trade and Development: Geneva, Switzerland, 2006; SSRN 1278049. [Google Scholar]
  37. Uhlenbeck, G.E.; Ornstein, L.S. On the theory of the Brownian motion. Phys. Rev. 1930, 36, 823–841. [Google Scholar] [CrossRef]
  38. Breeden, J.L.; Liang, S. A mean-reverting model to create macroeconomic scenarios for credit risk models. J. Risk Model Valid. 2015, 9, 1–12. [Google Scholar]
  39. Breeden, J.L.; Leonova, Y. Classical and quantum computing methods for estimating loan-level risk distributions. J. Oper. Res. Soc. 2023, 74, 1800–1814. [Google Scholar] [CrossRef]
  40. Breeden, J.L. Universal laws of retail economic capital. RMA J. 2006, 88, 48. [Google Scholar]
  41. Schwartz, E.S.; Torous, W.N. Prepayment and the valuation of mortgage-backed securities. J. Financ. 1989, 44, 375–392. [Google Scholar]
  42. Kang, P.; Zenios, S.A. Complete prepayment models for mortgage-backed securities. Manag. Sci. 1992, 38, 1665–1685. [Google Scholar] [CrossRef]
  43. Stanton, R. Rational prepayment and the valuation of mortgage-backed securities. Rev. Financ. Stud. 1995, 8, 677–708. [Google Scholar] [CrossRef]
  44. Heitfield, E.; Sabarwal, T. What drives default and prepayment on subprime auto loans? J. Real Estate Financ. Econ. 2004, 29, 457–477. [Google Scholar] [CrossRef]
  45. Li, Z.; Li, K.; Yao, X.; Wen, Q. Predicting prepayment and default risks of unsecured consumer loans in online lending. Emerg. Mark. Financ. Trade 2019, 55, 118–132. [Google Scholar] [CrossRef]
  46. Sohn, S.Y.; Jeon, H. Competing risk model for technology credit fund for small and medium-sized enterprises. J. Small Bus. Manag. 2010, 48, 378–394. [Google Scholar] [CrossRef]
  47. Li, Z.; Li, A.; Bellotti, A.; Yao, X. The profitability of online loans: A competing risks analysis on default and prepayment. Eur. J. Oper. Res. 2023, 306, 968–985. [Google Scholar] [CrossRef]
  48. Jokivuolle, E.; Peura, S. A Model for Estimating Recovery Rates and Collateral Haircuts for Bank Loans. Bank of Finland Research Discussion Paper. 2000. Available online: https://ssrn.com/abstract=1021182 (accessed on 9 April 2023).
  49. Altman, E.I.; Resti, A.; Sironi, A. Analyzing and Explaining Default Recovery Rates; A report submitted to the International Swaps & Derivatives Association; International Swaps & Derivatives Association: New York, NY, USA, 2001. [Google Scholar]
  50. Bastos, J.A. Forecasting bank loans loss-given-default. J. Bank. Financ. 2010, 34, 2510–2517. [Google Scholar] [CrossRef]
  51. Calabrese, R.; Zenga, M. Bank loan recovery rates: Measuring and nonparametric density estimation. J. Bank. Financ. 2010, 34, 903–911. [Google Scholar] [CrossRef]
  52. Witzany, J.; Rychnovsky, M.; Charamza, P. Survival Analysis in LGD Modeling; Charles University Prague: Prague, Czech Republic, 2010. [Google Scholar]
  53. Lessmann, S.; Baesens, B.; Seow, H.-V.; Thomas, L.C. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Eur. J. Oper. Res. 2015, 247, 124–136. [Google Scholar] [CrossRef]
  54. Bhatore, S.; Mohan, L.; Reddy, Y.R. Machine learning techniques for credit risk evaluation: A systematic literature review. J. Bank. Financ. Technol. 2020, 4, 111–138. [Google Scholar] [CrossRef]
  55. Breeden, J.L. A survey of machine learning in credit risk. J. Credit. Risk 2021, 17, 1–62. [Google Scholar] [CrossRef]
  56. Shi, S.; Tse, R.; Luo, W.; D’Addona, S.; Pau, G. Machine learning-driven credit risk: A systemic review. Neural Comput. Appl. 2022, 34, 14327–14339. [Google Scholar] [CrossRef]
  57. Makowski, P. Credit scoring branches out. Credit World 1985, 75, 30–37. [Google Scholar]
  58. Desai, V.S.; Crook, J.N.; Overstreet, G.A., Jr. A comparison of neural networks and linear scoring models in the credit union environment. Eur. J. Oper. Res. 1996, 95, 24–37. [Google Scholar] [CrossRef]
  59. Henley, W.E.; Hand, D. Construction of a k-nearest-neighbour credit-scoring system. IMA J. Manag. Math. 1997, 8, 305–321. [Google Scholar] [CrossRef]
  60. West, D. Neural network credit scoring models. Comput. Opns. Res. 2000, 27, 1131. [Google Scholar] [CrossRef]
  61. Yobas, M.B.; Crook, J.N.; Ross, P. Credit scoring using neural and evolutionary techniques. IMA J. Manag. Math. 2000, 11, 111–125. [Google Scholar] [CrossRef]
  62. Twala, B. Multiple classifier application to credit risk assessment. Expert Syst. Appl. 2010, 37, 3326–3336. [Google Scholar] [CrossRef]
  63. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  64. Chang, Y.-C.; Chang, K.-H.; Wu, G.-J. Application of extreme gradient boosting trees in the construction of credit risk assessment models for financial institutions. Appl. Soft Comput. 2018, 73, 914–920. [Google Scholar] [CrossRef]
  65. Malhotra, R.; Malhotra, D.K. Evaluating consumer loans using neural networks. Omega 2003, 31, 83–96. [Google Scholar] [CrossRef]
  66. Vellido, A.; Lisboa, P.J.G.; Vaughan, J. Neural networks in business: A survey of applications (1992–1998). Expert Syst. Appl. 1999, 17, 51–70. [Google Scholar] [CrossRef]
  67. Jensen, H.L. Using neural networks for credit scoring. Manag. Financ. 1992, 18, 15–26. [Google Scholar] [CrossRef]
  68. Grinsztajn, L.; Oyallon, E.; Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Adv. Neural Inf. Process. Syst. 2022, 35, 507–520. [Google Scholar]
  69. Dasarathy, B.V.; Sheela, B.V. A composite classifier system design: Concepts and methodology. Proc. IEEE 1979, 67, 708–713. [Google Scholar] [CrossRef]
  70. Clemen, R.T. Combining forecasts: A review and annotated bibliography. Int. J. Forecast. 1989, 5, 559–583. [Google Scholar] [CrossRef]
  71. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef]
  72. Ho, T.K. The random subspace method for constructing decision forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
  73. Lee, T.-H.; Yang, Y. Bagging binary and quantile predictors for time series. J. Econom. 2006, 135, 465–497. [Google Scholar] [CrossRef]
  74. Liang, G.; Zhu, X.; Zhang, C. An empirical study of bagging predictors for different learning algorithms. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 7–11 August 2011. [Google Scholar]
  75. Van Erp, M.; Vuurpijl, L.; Schomaker, L. An overview and comparison of voting methods for pattern recognition. In Proceedings of the Eighth International Workshop on Frontiers in Handwriting Recognition, Ontario, ON, Canada, 6–8 August 2002; IEEE: Piscataway, NJ, USA, 2002; pp. 195–200. [Google Scholar]
  76. Fantazzini, D.; Figini, S. Random survival forests models for SME credit risk measurement. Methodol. Comput. Appl. Probab. 2009, 11, 29–45. [Google Scholar] [CrossRef]
  77. Frydman, H.; Matuszyk, A. Random survival forest for competing credit risks. J. Oper. Res. Soc. 2022, 73, 15–25. [Google Scholar] [CrossRef]
  78. Breeden, J.L.; Leonova, E. When big data isn’t enough: Solving the long-range forecasting problem in supervised learning. In Proceedings of the 2019 International Conference on Modeling, Simulation, Optimization and Numerical Techniques (SMONT 2019), Shenzhen, China, 27–28 February 2019; Atlantis Press: Amsterdam, The Netherlands, 2019. [Google Scholar]
  79. Breeden, J.L.; Leonova, Y. Stabilizing machine learning models with age-period-cohort inputs for scoring and stress testing. Front. Appl. Math. Stat. 2023, 9, 1195810. [Google Scholar] [CrossRef]
  80. Abdulrahman, U.F.I.; Panford, J.K.; Hayfron-acquah, J.B. Fuzzy logic approach to credit scoring for micro finance in Ghana: A case study of kwiqplus money lending. Int. J. Comput. Appl. 2014, 94, 11–18. [Google Scholar]
  81. New Credit Score Unveiled Drawing on Bank Account Data. ABA Banking Journal, October 2018. Newsbytes, Retail and Marketing, Technology. Available online: https://bankingjournal.aba.com/2018/10/new-credit-score-unveiled-drawing-on-bank-account-data/ (accessed on 21 August 2023).
  82. Mengelkamp, A.; Hobert, S.; Schumann, M. Corporate credit risk analysis utilizing textual user generated content-a twitter based feasibility study. In Proceedings of the PACIS, Singapore, 5–9 July 2015; p. 236. [Google Scholar]
  83. Allen, L.; Peng, L.; Shan, Y. Social Networks and Credit Allocation on Fintech Lending Platforms; Technical Report; Baruch College: New York, NY, USA, 2020. [Google Scholar]
  84. Bailey, M.; Cao, R.; Kuchler, T.; Stroebel, J.; Wong, A. Social connectedness: Measurement, determinants, and effects. J. Econ. Perspect. 2018, 32, 259–280. [Google Scholar] [CrossRef] [PubMed]
  85. Freedman, S.; Jin, G.Z. The information value of online social networks: Lessons from peer-to-peer lending. Int. J. Ind. Organ. 2017, 51, 185–222. [Google Scholar] [CrossRef]
  86. Björkegren, D.; Grissen, D. Behavior revealed in mobile phone usage predicts loan repayment. arXiv 2017, arXiv:1712.05840. [Google Scholar] [CrossRef]
  87. Pedro, J.S.; Proserpio, D.; Oliver, N. Mobiscore: Towards universal credit scoring from mobile phone data. In Proceedings of the International Conference on User Modeling, Adaptation, and Personalization, Dublin, Ireland, 29 June–3 July 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 195–207. [Google Scholar]
  88. Marquez, R. Competition, adverse selection, and information dispersion in the banking industry. Rev. Financ. Stud. 2002, 15, 901–926. [Google Scholar] [CrossRef]
  89. Crawford, G.S.; Pavanini, N.; Schivardi, F. Asymmetric information and imperfect competition in lending markets. Am. Econ. Rev. 2018, 108, 1659–1701. [Google Scholar] [CrossRef]
  90. Breeden, J.L. Macroeconomic adverse selection: How consumer demand drives credit quality. In Proceedings of the Credit Scoring and Credit Control XII Conference, Edinburgh, UK, 23–25 August 2011. [Google Scholar]
  91. Calem, P.S.; Cannon, M.; Nakamura, L.I. Credit Cycle and Adverse Selection Effects in Consumer Credit Markets-Evidence from the Heloc Market; Technical Report; Board of Governors of the Federal Reserve Bank: Washington, DC, USA, 2011. [Google Scholar]
  92. Breeden, J.L.; Canals-Cerdá, J.J. Consumer risk appetite, the credit cycle, and the housing bubble. J. Credit. Risk 2018, 14, 1–30. [Google Scholar] [CrossRef]
  93. Breeden, J.L.; Leonova, Y. Macroeconomic adverse selection in machine learning models of credit risk. Eng. Proc. 2023, 39, 95. [Google Scholar] [CrossRef]
  94. Breeden, J.L. Modeling data with multiple time dimensions. Comput. Stat. Data Anal. 2007, 51, 4761–4785. [Google Scholar] [CrossRef]
  95. Breeden, J.L.; Thomas, L.C.; McDonald, J., III. Stress testing retail loan portfolios with dual-time dynamics. J. Risk Model Valid. 2008, 2, 43–62. [Google Scholar] [CrossRef]
  96. Breeden, J.L.; Thomas, L.C. The relationship between default and economic cycle for retail portfolios across countries: Identifying the drivers of economic downturn. J. Risk Model Valid. 2008, 2, 11–44. [Google Scholar] [CrossRef]
  97. Bellotti, A.; Crook, J. Retail credit stress testing using a discrete hazard model with macroeconomic factors. J. Oper. Res. Soc. 2014, 65, 340–350. [Google Scholar] [CrossRef]
  98. Grundke, P. Reverse stress tests with bottom-up approaches. J. Risk Model Valid. 2011, 5, 71. [Google Scholar] [CrossRef]
  99. Eichhorn, M.; Bellini, T.; Mayenberger, D. Reverse Stress Testing in Banking: A Comprehensive Guide; Walter de Gruyter GmbH & Co. KG: Berlin, Germany, 2021. [Google Scholar]
  100. Sharpe, W.F. Mutual fund performance. J. Bus. 1966, 39, 119–138. [Google Scholar] [CrossRef]
  101. Farinelli, S.; Ferreira, M.; Rossello, D.; Thoeny, M.; Tibiletti, L. Beyond sharpe ratio: Optimal asset allocation using different performance ratios. J. Bank. Financ. 2008, 32, 2057–2063. [Google Scholar] [CrossRef]
  102. Oliver, R.M.; Wells, E. Efficient frontier cutoff policies in credit portfolios. J. Oper. Res. Soc. 2001, 52, 1025–1033. [Google Scholar] [CrossRef]
  103. Karlan, D.S.; Zinman, J. Elasticities of demand for consumer credit. Yale University Economic Growth Center Discussion Paper; Yale University: New Haven, CT, USA, 2005. [Google Scholar]
  104. DeFusco, A.A.; Paciorek, A. The interest rate elasticity of mortgage demand: Evidence from bunching at the conforming loan limit. Am. Econ. J. Econ. Policy 2017, 9, 210–240. [Google Scholar] [CrossRef]
  105. Karlan, D.; Zinman, J. Long-run price elasticities of demand for credit: Evidence from a countrywide field experiment in Mexico. Rev. Econ. Stud. 2019, 86, 1704–1746. [Google Scholar] [CrossRef]
  106. Breeden, J.L. Impacts of drought on loan repayment. J. Risk Financ. Manag. 2023, 16, 85. [Google Scholar] [CrossRef]
  107. McCoy, S.J.; Walsh, R.P. Wildfire risk, salience & housing demand. J. Environ. Econ. Manag. 2018, 91, 203–228. [Google Scholar]
  108. Wang, D.; Guan, D.; Zhu, S.; Kinnon, M.M.; Geng, G.; Zhang, Q.; Zheng, H.; Zheng, H.; Lei, T.; Shao, S.; et al. Economic footprint of California wildfires in 2018. Nat. Sustain. 2021, 4, 252–260. [Google Scholar] [CrossRef]
  109. Al-Kaisi, M.M.; Elmore, R.W.; Guzman, J.G.; Hanna, H.M.; Hart, C.E.; Helmers, M.J.; Hodgson, E.W.; Lenssen, A.W.; Mallarino, A.P.; Robertson, A.E.; et al. Drought impact on crop production and the soil environment: 2012 experiences from Iowa. J. Soil Water Conserv. 2013, 68, 19A–24A. [Google Scholar] [CrossRef]
  110. Bell, J.E.; Leeper, R.D.; Palecki, M.A.; Coopersmith, E.; Wilson, T.; Bilotta, R.; Embler, S. Evaluation of the 2012 drought with a newly established national soil monitoring network. Vadose Zone J. 2015, 14, 1–7. [Google Scholar] [CrossRef]
Figure 1. A neural network for credit scoring with age–period–cohort inputs for stabilization, and calibration to probability with economic scenarios.
Figure 2. Adverse selection measured in units of bureau score for models of Fannie Mae and Freddie Mac mortgage performance data, segmented by score band.
Figure 3. The APC environment function for US auto defaults throughout the pandemic.
