1. Introduction
Digital lending platforms promise to replace the frictions of mortgage origination with instant approvals, data-rich underwriting and finely tuned prices. In segments such as unsecured consumer credit, those promises appear to hold: by ingesting alternative variables—from cash-flow traces to digital footprints—fintech algorithms have outperformed legacy scorecards and expanded credit access without raising loss rates. Yet, the largest slice of U.S. housing finance—the conforming mortgage market dominated by Fannie Mae and Freddie Mac—operates inside a markedly different institutional shell. Every loan must pass a government-sponsored enterprise (GSE) scorecard that fixes both the information set and the modeling approach, and most credit risk is transferred to investors within weeks via agency mortgage-backed securities. Whether a lender is a community bank, a nationwide originator or an app-based fintech, the same standardized inputs feed the same automated underwriting engines, and the originating institution rarely bears long-run losses.
In the U.S. conforming mortgage market, banks are traditional financial institutions that hold a bank charter, operate under strict regulatory oversight and typically maintain physical branch networks. In contrast, fintech lenders are predominantly online platforms that use digital technology to streamline mortgage origination, enhance user experience and accelerate loan approval [1,2]. Recent studies have highlighted significant differences between fintech lenders and traditional banks regarding risk assessment capabilities, pricing strategies and borrower targeting [3,4]. This study investigates how these two lender types differ in aligning pricing with borrower risk within the regulated framework of U.S. government-sponsored enterprises (GSEs), and it asks two questions. First, how accurately can lenders discriminate between borrowers who will default and those who will not when all parties share the same mandatory data? Second, once risk is measured, how tightly do lenders map it into interest rates, and does that mapping differ between fintechs and incumbent banks? Answering these questions requires separating screening (the act of predicting default) from pricing (the act of converting that prediction into an interest rate). Prior work often conflates the two, using the origination APR itself as both the lender’s risk signal and its price. Here, we deploy a two-stage empirical framework that disentangles them: machine-learning models trained within each lender class generate out-of-sample default probabilities, and a pooled benchmark translates those probabilities into a fair pricing curve against which actual rates can be judged.
Leveraging 30-year fixed-rate mortgages originated from 2012 through 2020, we find that non-fintech non-bank lenders post the highest screening accuracy (average AUC ≈ 0.860), with banks following closely (≈0.857) and fintechs lagging (≈0.852) despite using the same gradient-boosting algorithms. More strikingly, banks display the steepest rate-risk slope, about 7.2 basis points for every one percentage-point increase in predicted default probability, and the narrowest distribution of mispricing residuals. Fintech lenders, in contrast, exhibit a slope that is roughly 40 percent flatter (about 4.2 basis points) and underprice nearly one-third of their loans relative to the benchmark. The evidence suggests that technological sophistication alone cannot overcome two structural frictions: the information ceiling imposed by GSE scorecards and the weak incentives that arise when credit risk is swiftly securitized.
These findings matter for both market design and consumer welfare. If alternative-data fields were cautiously integrated into agency underwriting engines, the ceiling on predictive accuracy might rise for all lenders, allowing genuine analytics advantages to surface. Conversely, modest risk-retention requirements could sharpen pricing discipline by forcing originators—fintech and bank alike—to internalize a sliver of future losses. Until such reforms take hold, fintechs’ celebrated algorithms are likely to remain blunted in the very market that shapes the majority of U.S. household leverage.
2. Literature Review
Fintech’s rapid rise in unsecured consumer lending is frequently attributed to its proficiency in combining alternative data with machine-learning (ML) models, thus enhancing default prediction accuracy and expanding credit access. Specifically, incorporating digital footprint variables significantly lowers both rejection rates and subsequent delinquency rates compared to traditional FICO-based screening methods [5]. Similarly, incorporating even basic online behavioral signals yields considerable improvements in predictive accuracy, as measured by area under the curve (AUC) [6]. Complementing this evidence, fintech lenders have been shown to price unsecured personal loans at a finer granularity than traditional banks, suggesting advanced risk differentiation capabilities [4].
However, despite these technological advancements, mortgage markets impose institutional frictions that restrict fintech’s full potential. Within the conforming mortgage segment, proprietary data usage by fintech lenders is constrained by standardized underwriting systems like Fannie Mae’s Desktop Underwriter and Freddie Mac’s Loan Product Advisor [7,8]. Additionally, the rapid securitization of mortgages tends to dilute incentives for lenders to invest in granular screening processes, as documented by Keys et al. and Acharya et al. [9,10].
Empirical evidence regarding fintech performance in mortgage lending is somewhat mixed. Some studies indicate efficient mortgage processing without increased default rates [8]. Conversely, others highlight fintech lenders’ coarse pricing strategies and note cross-subsidization within loan portfolios, suggesting limitations in fintech’s risk assessment capabilities [1,3]. Furthermore, there is evidence of inclusivity shortcomings, particularly affecting women, due to gaps in product suitability, transparency and trust, thereby underscoring the importance of fairness and transparency in algorithmic design [11].
Algorithmic bias represents a critical challenge across all algorithm-driven lending systems, including both fintech platforms and traditional banks. Such biases are attributed primarily to skewed or incomplete training data rather than deliberate programmer intent [12]. This concern aligns with findings from studies of big tech lending in China, where algorithmic models outperform traditional credit assessments during economic downturns, partly due to structural features like high interest rates and short maturities that mitigate risk in unsecured lending [13,14].
Research exploring fintech’s technological edge in risk pricing further elaborates on its potential and constraints. Some observe that fintech institutions are more responsive than traditional banks to expansions in credit markets [15], while others discuss reintermediation trends driven by fintech, impacting lending dynamics and borrower experiences [16]. There are also insights into the complementary roles fintech platforms and traditional banks play in lending markets, highlighting the distinct strengths and limitations of each [17]. Additionally, deep-learning methods have demonstrated significant potential in improving mortgage risk prediction, reinforcing fintech’s position at the methodological forefront [18]. Complementing these methodological advancements, recent studies propose frameworks to assess fairness in algorithmic credit scoring, addressing crucial ethical and regulatory concerns [19].
Further complexities arise from regulatory environments. Some authors illustrate how fintech lenders exploit regulatory arbitrage opportunities, reshaping mortgage market dynamics and introducing competitive pressures alongside systemic risks [20]. Complementing these observations, others detail how securitization processes negatively impact distressed loan renegotiations, indicating structural barriers that fintech firms must navigate [21].
Recent literature also focuses on distinguishing predictive accuracy from pricing effectiveness. Some validate the predictive effectiveness of machine-learning models for consumer credit risk [22]. Others examine information asymmetries prevalent in peer-to-peer lending markets [23]. Additional studies investigate the role of machine learning in exacerbating or alleviating disparities within credit markets [2], while others explicitly discuss discriminatory practices and pricing inefficiencies observed among fintech lenders [1]. Further research emphasizes the importance of clearly separating default prediction accuracy from pricing strategies to enhance analytical clarity [6].
Against this backdrop, the current study adopts a rigorous benchmarking approach, evaluating lender-specific screening accuracy using multiple ML classifiers. Additionally, it introduces a pooled risk benchmark to independently assess potential mispricing in mortgage lending. This dual-pronged methodological strategy aims to clarify fintech’s specific limitations—whether rooted in predictive accuracy, pricing inefficiencies or a combination of both—in the standardized and regulated conforming mortgage market.
3. Data and Sample
Lenders were classified based on their regulatory identifiers and operational characteristics. Specifically, banks have a bank charter and a Research, Statistics, Supervision and Discount (RSSD) identifier issued by the Federal Reserve, while fintech lenders are identified as non-bank entities whose mortgage origination processes are predominantly completed online, following previous studies [8,20,24,25]. The analysis employs borrower FICO scores, which represent widely recognized creditworthiness measures in the U.S. A FICO score is a three-digit number, typically ranging from 300 to 850, calculated based on an individual’s credit history (including payment history, amounts owed, length of credit history, new credit and credit mix) and is widely used by lenders to assess the likelihood of timely loan repayment.
Panel A of Table 1 presents key origination characteristics of conforming 30-year fixed-rate GSE mortgages. Fintech lenders tend to originate loans with slightly lower FICO scores and loan-to-value (LTV) ratios compared to banks and non-fintechs. Fintech loans have a higher proportion of refinances (70%) and lower first-time homebuyer rates. Banks lend in areas with higher unemployment rates and lower real income, while fintechs originate in higher-income, lower-unemployment metros.
Panel B of Table 1 summarizes loan performance across groups. Fintech loans exhibit lower delinquency rates at 12 and 24 months compared to non-fintechs but slightly higher than banks. Notably, fintechs demonstrate higher prepayment rates across all horizons, consistent with more tech-savvy borrowers or aggressive rate-shopping behavior.
These patterns suggest that fintech lenders target a different risk-return profile, possibly emphasizing refinance opportunities and faster prepayment cycles, while maintaining comparable or lower default risk relative to traditional lenders.
4. Methodology
We employ a two-stage empirical framework that separates default prediction (screening) from interest rate setting (pricing). This separation allows clear identification of how lenders translate borrower risk into pricing decisions, an advantage over approaches that conflate these tasks [1]. To ensure model robustness and mitigate overfitting, the hyperparameters for each model were tuned through 3-fold cross-validation within the training data. Out-of-sample area under the ROC curve (AUC) scores then serve as a check on model generalizability. While the primary evaluation metric is AUC, we acknowledge precision–recall curves as another valuable evaluation tool, particularly useful in future extensions of this research.
4.1. Stage One: Lender-Specific Default Prediction and AUC Evaluation
Let the binary variable $y_i \in \{0, 1\}$ indicate whether loan $i$ becomes 90 or more days delinquent within 36 months of origination. We denote by $X_i$ the vector of borrower- and loan-level features available at origination (e.g., FICO score, loan-to-value ratio, debt-to-income ratio).
For each lender group $g$ (banks, non-fintech non-banks and fintechs), we split the data into training (70%) and test (30%) subsets using stratified random sampling. We then estimate the probability of default using five different machine-learning models: logistic regression (logit), random forest (RF), LightGBM (LGBM), XGBoost (XGB) and gradient-boosting classifier (GBC).
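The stratified 70/30 split described above can be illustrated with a minimal, standard-library sketch. The function name `stratified_split`, the seed and the toy labels are illustrative assumptions rather than the implementation used in the study; the point is only that sampling within each outcome class keeps the default rate roughly equal in the training and test subsets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_share=0.30, seed=0):
    """Return (train_idx, test_idx), sampling the test share within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random order within the class
        n_test = round(len(idxs) * test_share)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Synthetic labels (assumption): 1 = 90+ days delinquent within 36 months.
labels = [1] * 100 + [0] * 900
train_idx, test_idx = stratified_split(labels)

train_rate = sum(labels[i] for i in train_idx) / len(train_idx)
test_rate = sum(labels[i] for i in test_idx) / len(test_idx)
print(len(train_idx), len(test_idx), train_rate, test_rate)
```

In practice the same split would be applied once per lender group, so that each group's models are trained and evaluated only on that group's loans.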
We select these models to balance interpretability, robustness and predictive power. Logistic regression serves as a benchmark due to its simplicity and transparency. Tree-based models are included due to their ability to capture non-linear relationships and interactions among features, which are common in mortgage risk prediction. Gradient-boosting variants (GBC, XGBoost, LightGBM) are especially suited for handling imbalanced classification problems and high-dimensional tabular data. The models employed in this study are widely used in credit and mortgage risk modeling. Their effectiveness in accurately predicting mortgage defaults has been demonstrated in prior research [13,18,24], supporting their application in our comparative analysis.
Each model is trained separately for each lender group to allow for group-specific patterns in default behavior. To ensure fair comparison and optimal performance, we tune the hyperparameters for each model using 3-fold cross-validation within the training data. The hyperparameters are selected based on the highest cross-validated AUC score. For example, the regularization strength C is tuned for logistic regression, while the number of estimators, learning rate and maximum tree depth are tuned for tree-based models. Technical descriptions of all models are provided in Appendix B, while the full list of candidate hyperparameter values and the selected configurations by lender group are reported in Appendix C.
For observation $i$ in group $g$, the predicted probability of default is denoted by
$$\hat{p}_{i}^{(g)} = f_g(X_i),$$
where $\hat{p}_{i}^{(g)}$ is the predicted probability that loan $i$ defaults given its origination features $X_i$, while $f_g$ represents the machine-learning model trained and tuned specifically for lender group $g$.
To measure predictive accuracy, we rely on the area under the ROC curve (AUC). Although the AUC lacks a single closed-form expression for binary classifiers, it can be interpreted as the probability that a randomly chosen “positive” (defaulted) loan receives a higher predicted default probability than a randomly chosen “negative” (non-defaulted) loan. We evaluate each of the five models on the test set and select the best-performing model (the one with the highest AUC) for subsequent analysis within each lender group. Finally, we compute the chosen model’s out-of-sample AUC, denoted $\mathrm{AUC}_g$. A higher $\mathrm{AUC}_g$ indicates stronger discriminative power in identifying high-risk borrowers for lender group $g$.
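The probabilistic reading of the AUC given above can be made concrete with a short sketch that counts, over all (defaulted, non-defaulted) pairs, how often the defaulted loan receives the higher score, with ties counted as one half. The function name `pairwise_auc` and the six toy loans are illustrative assumptions; a production pipeline would more likely rely on a library routine.

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulted and one non-defaulted loan")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (assumption): three defaulted and three non-defaulted loans.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.02, 0.10, 0.08, 0.05, 0.30, 0.10]
auc = pairwise_auc(y_true, scores)
print(round(auc, 4))  # 7.5 correctly ordered pairs out of 9
```

The double loop is quadratic in the number of loans, so it is meant only to illustrate the definition, not to be run on millions of mortgages.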
4.2. Pooled Model and Predicted Risk Scores
Although we obtain separate predictions $\hat{p}_{i}^{(g)}$ by training on each lender group separately, we also want a uniform risk benchmark that does not depend on the lender’s own pricing or underwriting. To do this, we train a single best-performing “pooled” model on all loans from all lenders, denoted by $f_{\text{pool}}$. This model likewise uses only borrower- and loan-level features available at origination (excluding interest rates to preserve exogeneity). Formally,
$$\hat{p}_i^{\text{pool}} = f_{\text{pool}}(X_i).$$
This pooled measure is our baseline estimate of each borrower’s default risk, unaffected by which lender originated the loan. The pooled model is trained on 70% of the combined dataset and then used to predict default probabilities for the full sample. We implement $f_{\text{pool}}$ using LightGBM, the best-performing machine-learning algorithm identified in the group-specific training stage.
4.3. Pricing Alignment Analysis
To test whether lenders set interest rates in proportion to exogenous risk, we compare the actual loan interest rate $r_i$ to the pooled default probability $\hat{p}_i^{\text{pool}}$. We employ two complementary approaches:
(i) Decile analysis. We sort loans into deciles by $\hat{p}_i^{\text{pool}}$. Within each decile $d$, we calculate the average predicted default probability $\bar{p}_d$ and the average interest rate $\bar{r}_d$. Plotting $\bar{r}_d$ against $\bar{p}_d$ for banks, non-fintech non-banks and fintechs reveals how steeply rates rise as risk increases; a steeper, more linear curve indicates tighter risk-based pricing, whereas a flatter curve signals weaker sensitivity.
(ii) Regression analysis. We estimate a lender-group-specific linear regression of the form
$$r_i = \alpha_g + \beta_g \, \hat{p}_i^{\text{pool}} + \varepsilon_i.$$
The slope $\beta_g$ captures the marginal change in the interest rate associated with a one-unit increase in predicted default probability. A larger $\beta_g$ reflects stronger risk-based pricing. Comparing $\beta_g$ across groups therefore shows which lenders adjust rates most sharply in response to borrower risk.
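Both approaches can be sketched in a few lines of standard-library Python. The helpers `decile_means` and `ols_fit` and the synthetic loans below are illustrative assumptions, not the study's code or data; rates are taken to be in percentage points and default probabilities in the unit interval, and the toy data are built with a known slope of 7.0 so that the output can be checked by eye.

```python
def decile_means(pd_hat, rate, n_bins=10):
    """Sort loans by predicted PD and return (mean PD, mean rate) for each bin."""
    order = sorted(range(len(pd_hat)), key=lambda i: pd_hat[i])
    n = len(order)
    out = []
    for d in range(n_bins):
        idx = order[d * n // n_bins:(d + 1) * n // n_bins]
        if not idx:  # can happen when there are fewer loans than bins
            continue
        mean_pd = sum(pd_hat[i] for i in idx) / len(idx)
        mean_rate = sum(rate[i] for i in idx) / len(idx)
        out.append((mean_pd, mean_rate))
    return out


def ols_fit(x, y):
    """Closed-form simple OLS of y on x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    beta = sxy / sxx
    return my - beta * mx, beta


# Synthetic loans for one lender group (assumption): rate = 3.5 + 7.0 * PD.
pd_hat = [i / 1000 for i in range(1, 101)]
rate = [3.5 + 7.0 * p for p in pd_hat]

deciles = decile_means(pd_hat, rate)
alpha, beta = ols_fit(pd_hat, rate)
print(deciles[0], deciles[-1], alpha, beta)
```

Running the same two functions separately on each lender group's loans would give one decile curve and one slope per group, which is the comparison the text describes.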
4.4. Mispricing Residual Analysis
Even if average rates rise with default probability, individual loans may be over- or underpriced relative to a benchmark pricing curve. To quantify this, we first estimate
$$r_i = \alpha + \beta \, \hat{p}_i^{\text{pool}} + \varepsilon_i$$
using all loans in a pooled regression (across all lenders). We define the fitted rate for loan $i$ as
$$\hat{r}_i = \hat{\alpha} + \hat{\beta} \, \hat{p}_i^{\text{pool}}$$
and then compute the residual (mispricing) for each loan:
$$e_i = r_i - \hat{r}_i.$$
A negative $e_i$ means that the loan is underpriced (the lender charged a lower rate than the risk-based benchmark), while a positive $e_i$ implies that the loan is overpriced. Aggregating these residuals by lender group reveals whether fintechs, banks or non-fintech lenders systematically deviate from risk-based prices.
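A minimal sketch of the residual aggregation follows. The benchmark coefficients (`alpha`, `beta`), the four toy loans and, in particular, the tolerance band `band_bp` used to label a loan as under- or overpriced are all illustrative assumptions; the text does not state the threshold applied when classifying loans, so the 25 basis point band here is purely hypothetical.

```python
from collections import defaultdict


def mispricing_summary(rates, pd_hat, groups, alpha, beta, band_bp=25.0):
    """Summarize residuals (actual rate minus benchmark rate) by lender group, in bp."""
    resid_by_group = defaultdict(list)
    for r, p, g in zip(rates, pd_hat, groups):
        fitted = alpha + beta * p                       # benchmark rate, in percent
        resid_by_group[g].append((r - fitted) * 100.0)  # percent -> basis points

    summary = {}
    for g, resid in resid_by_group.items():
        n = len(resid)
        summary[g] = {
            "mean_bp": sum(resid) / n,
            # band_bp is a hypothetical cut-off, not taken from the study
            "share_under": sum(e < -band_bp for e in resid) / n,
            "share_over": sum(e > band_bp for e in resid) / n,
        }
    return summary


# Toy inputs (assumptions): rates in percent, PDs as probabilities.
rates = [3.80, 4.10, 3.30, 3.75]
pd_hat = [0.02, 0.05, 0.02, 0.05]
groups = ["bank", "bank", "fintech", "fintech"]

summary = mispricing_summary(rates, pd_hat, groups, alpha=3.5, beta=6.0)
for g, stats in summary.items():
    print(g, stats)
```

With a symmetric band, the under- and overpriced shares need not sum to one, since loans priced within the band count as neither.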
5. Results
5.1. Screening Accuracy
Figure 1 and Table 2 report the out-of-sample AUCs of five machine-learning classifiers, namely logistic regression (logit), random forest (RF), LightGBM (LGBM), XGBoost (XGB) and gradient-boosting classifier (GBC), estimated separately for banks, non-fintech non-banks and fintech lenders.
Across all lender types, LightGBM consistently posts the highest—or statistically indistinguishable second-highest—AUC, confirming its superior ability to separate defaulters from non-defaulters. Averaging over the five models, non-fintech lenders attain the best overall predictive accuracy (mean AUC ≈ 0.860), banks follow closely (≈0.857), and fintechs lag (≈0.852). Performance within non-fintech and fintech groups is remarkably stable across the three gradient-boosting methods (LGBM, XGB, GBC), whereas logistic regression and random forest score a few hundredths lower. Banks show the greatest spread, with a distinct dip for random forest and a recovery under boosting algorithms.
Taken together, the evidence highlights tree-based boosting—especially LightGBM—as the most reliable modeling choice for high-dimensional mortgage credit data across all lender categories.
5.2. Pricing Alignment
Next, we examine whether interest rates align with the pooled model’s default probability $\hat{p}_i^{\text{pool}}$, using both decile-level averages and linear regressions of the interest rate on $\hat{p}_i^{\text{pool}}$.
Figure 2 plots the average origination interest rate against the average predicted default probability across deciles of estimated risk, separately for banks, non-fintech non-banks and fintech lenders. Each point represents the mean interest rate and mean predicted risk within a decile. The upward-sloping curves reflect positive pricing alignment: higher-risk borrowers are charged higher rates. However, the steepness and level of each curve differ across lender types. Banks exhibit the highest average rates and the steepest pricing gradient, suggesting stronger risk-based pricing. In contrast, non-fintech and fintech lenders show flatter slopes, indicating weaker sensitivity of pricing to estimated borrower risk. Banks charge the highest interest rates in every decile, followed by fintech and then non-fintech lenders.
Table 3 presents the results from lender-type-specific OLS regressions of interest rates on the predicted default probability ($\hat{p}_i^{\text{pool}}$). All coefficients on $\hat{p}_i^{\text{pool}}$ are statistically significant at the 1% level, confirming that lenders positively adjust pricing in response to borrower risk. Among the three groups, banks exhibit the steepest pricing sensitivity, with a coefficient of 7.19, indicating a strong alignment between risk and rate. Non-fintech lenders also show meaningful alignment (5.43), while fintechs apply the shallowest pricing slope (4.18). These results suggest that banks price risk more aggressively, whereas fintech lenders exhibit relatively weaker sensitivity to borrower default risk, despite operating with modern algorithms. Notably, the R-squared values remain low across all regressions (4–5%), consistent with the fact that much of the interest rate variation is driven by factors beyond the modeled credit risk, including competition, borrower characteristics not captured in $\hat{p}_i^{\text{pool}}$ and loan features.
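As a worked reading of the Table 3 coefficients, the arithmetic below converts a slope into basis points per percentage point of predicted default probability and expresses the fintech and non-fintech slopes relative to banks. It assumes, as the units quoted elsewhere in the paper suggest but the text does not state explicitly, that the interest rate is measured in percentage points and the predicted default probability as a number between 0 and 1.

```latex
% Assumption: r_i in percentage points (pp), \hat{p}_i^{pool} as a probability in [0, 1].
\begin{align*}
\beta_g \;\frac{\text{pp of rate}}{\text{unit of PD}}
  &= \beta_g \times \frac{100\ \text{bp}}{100\ \text{pp of PD}}
   = \beta_g\ \text{bp per pp of PD}, \\
1 - \frac{\beta_{\text{fintech}}}{\beta_{\text{bank}}}
  &= 1 - \frac{4.18}{7.19} \approx 0.42, \\
1 - \frac{\beta_{\text{non-fintech}}}{\beta_{\text{bank}}}
  &= 1 - \frac{5.43}{7.19} \approx 0.24.
\end{align*}
```

Under that assumption, the bank coefficient of 7.19 corresponds to roughly 7.2 basis points per percentage point of predicted default probability, and the fintech slope is about 42 percent flatter than the bank slope.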
5.3. Mispricing Summary
Using the fitted pricing curve as a benchmark, we compute the mean residuals and the share of loans classified as under- or overpriced. Table 4 reports summary statistics on interest rate mispricing, calculated as the difference between the actual interest rate and the fair rate predicted by the pooled PD-based pricing model. Banks exhibit a small positive average mispricing of +4.6 basis points, indicating a slight tendency to overcharge relative to risk. In contrast, both fintech and non-fintech lenders show negative average mispricing (–8.2 and –15.1 basis points, respectively), suggesting systematic underpricing of riskier borrowers. The share of underpriced loans is highest among fintech lenders (32.02%), followed closely by non-fintechs (29.99%), while their shares of overpriced loans remain relatively low (13.5% and 15.37%, respectively). These results imply that banks adhere more closely to risk-based pricing, while alternative lenders, especially non-fintechs, tend to offer below-benchmark rates to higher-risk borrowers, potentially reflecting either competitive strategies or less precise risk-pricing mechanisms.
6. Discussion
Three forces jointly explain fintech lenders’ muted alignment of price and risk in conforming mortgages. First, regulatory constraints limit informational flexibility. Specifically, every conforming mortgage loan must clear the standardized underwriting systems—Fannie Mae’s Desktop Underwriter (DU) or Freddie Mac’s Loan Product Advisor (LPA). These automated platforms employ strict and uniform “scorecards” that evaluate borrower risk based on predetermined criteria, such as credit scores, loan-to-value ratios and debt-to-income ratios. Critically, these scorecards do not accept proprietary fintech data or alternative risk indicators—such as real-time cash flow, rental payments or utility bill histories—that have significantly enhanced predictive accuracy in unsecured lending markets. Consequently, even the most advanced fintech algorithms can achieve only incremental improvements within this rigid framework, restricting the ability to differentiate borrower risks more effectively.
Second, incentive misalignment compounds this regulatory rigidity. Fintech and other non-bank lenders primarily use warehouse funding, originating loans intended for rapid sale into securitization pools backed by government-sponsored enterprise (GSE) guarantees. This originate-to-distribute model shifts the long-term credit risk to MBS investors, significantly weakening incentives for precise risk-based pricing. The immediate rewards from slightly lower rates and higher origination volume thus outweigh the long-term, dispersed default costs, making systematic underpricing rational from a business growth perspective.
Third, competitive positioning further shapes these pricing dynamics. Fintech lenders differentiate themselves through speed, streamlined user experiences and digital convenience, whereas traditional banks leverage brand recognition, customer trust, cross-selling opportunities and rigorous regulatory oversight. Banks also frequently retain loan servicing rights and face capital requirements that strongly incentivize accurate upfront risk pricing. This strategic and regulatory alignment explains why banks consistently display steeper rate-risk slopes compared to fintech lenders.
These dynamics underscore the structural limitations in algorithmic credit scoring when underwriting decisions are decoupled from long-term financial accountability. However, targeted policy reforms could significantly enhance fintech lenders’ alignment between pricing and borrower risk. First, modernizing GSE underwriting scorecards to allow carefully verified alternative data—such as rental payments, utility histories or verified real-time financial transaction data—could meaningfully expand the informational scope, enabling more nuanced borrower risk differentiation. Second, introducing modest risk-retention requirements, where lenders must retain a small percentage (e.g., 5%) of each originated loan’s risk, would align lenders’ incentives with long-term loan performance without compromising the liquidity benefits provided by agency mortgage-backed securities. Such requirements echo existing regulatory frameworks like the Dodd–Frank Act’s risk-retention rules and could substantially strengthen pricing discipline.
The findings also suggest several promising directions for future research. One avenue is examining whether fintech lenders demonstrate superior performance in private-label or non-conforming mortgage segments, where underwriting standards are more flexible, and lenders retain greater exposure to loan outcomes. Another research direction involves evaluating the broader welfare implications: specifically, does fintech-driven underpricing sustainably enhance homeownership access, or does it merely shift default risks onto government-supported entities and, ultimately, taxpayers? Finally, analyzing operational efficiencies in post-origination loan servicing may reveal whether fintech lenders provide measurable value that offsets weaker initial pricing accuracy. By explicitly separating default prediction from risk-based pricing, our analytical framework provides a useful template for exploring these critical questions in other regulated credit markets.
7. Conclusions
This study provides robust evidence that fintech mortgage lenders lag behind traditional banks in both screening accuracy and risk-based pricing, even when all parties operate under the same GSE-mandated information regime. Using five state-of-the-art machine-learning models that are tuned via cross-validation and evaluated out-of-sample, we document a systematic performance gap (mean AUC across models: 0.852 for fintechs vs. 0.857 for banks). A two-stage framework that cleanly separates default prediction from pricing further reveals that banks adjust rates by 7.2 basis points for every percentage-point increase in predicted default probability, whereas fintechs adjust them by just 4.2 bp. These patterns persist across more than six million conforming loans originated between 2012 and 2020, which supports the external validity of our results.
The findings carry broad international relevance. Many mortgage markets—from Canada to the U.K. and Australia—share two key features of the U.S. conforming segment: (i) highly standardized, regulator-approved underwriting algorithms and (ii) rapid securitization that shifts future credit losses off lenders’ balance sheets. In such environments, data ceilings and weak ex-post incentives can blunt the very technologies that drive fintech’s success in unsecured credit. Policymakers worldwide can therefore draw two actionable lessons. First, cautiously expanding the set of verifiable alternative data (e.g., rental payment histories, transaction-level cash-flow data) that government or quasi-government underwriting systems accept would raise the ceiling on predictive accuracy for all lenders. Second, modest risk-retention rules—mirroring the 5% “skin-in-the-game” standard in other securitized asset classes—would strengthen price discipline without unduly inhibiting secondary-market liquidity. Together, these reforms could unlock fintech’s analytic potential while safeguarding systemic stability, rendering our results pertinent well beyond the U.S. context.