*Model II:*

$$\text{Logit Default}_j = \beta_0 + \beta_1\,\text{Hard Information}_j + \beta_2\,\text{Soft Information}_j + \alpha\,\text{Control Variables}_j + \varepsilon, \qquad (2)$$

The dependent variable for Model I, the funding probability model, is a dummy variable that equals 1 when the loan has been successfully funded and 0 otherwise.

Model II is the default prediction model; the dependent variable, default, indicates whether the loan has been repaid completely and without delay: 1 represents 'defaulted' and 0 represents 'repaid'.
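
As a minimal sketch of how a logit specification of this kind maps regressors to a probability, the snippet below inverts the logit link. The function name `logit_probability`, the regressors chosen, and all coefficient values are hypothetical placeholders for illustration, not estimates from this paper.

```python
import math

def logit_probability(beta0, betas, x):
    """Return P(y = 1) implied by a logit model with intercept beta0."""
    # Linear predictor: beta0 + sum_k beta_k * x_k
    z = beta0 + sum(b * v for b, v in zip(betas, x))
    # Inverse of the logit link (the logistic function)
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two dummies (car ownership, mobile verification)
p = logit_probability(-3.0, [-0.5, -0.4], [1, 1])
```

With a zero linear predictor the function returns 0.5; negative coefficients on a dummy lower the implied probability relative to the baseline.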

All the chosen hard and soft information variables are listed in Appendix A, Table A1, and are based on the references from the literature review. We used financially related information, income level and collateral as the hard information. Socially and psychologically related information such as age, gender, loan description, marital status, educational level and social media information is used as the soft information. Loan features are used as the control variables.

The hard information is represented by key financial determinants that indicate the wealth and solvency of the borrower. These are the four fundamental financial indicators available in our dataset: monthly income, home ownership, car ownership and existing mortgage loans. Car and home ownership are dummy variables, with 1 indicating 'ownership' and 0 indicating 'none'. We also include income verification in the model as a check on the accuracy of the declared income.

As soft information is difficult to measure, proxies must be employed. Table 1 summarizes the proxies used in our model. Our approach to soft data is similar to that in the literature: we employ education duration (e.g., Liao et al. 2015), age (e.g., Gonzalez and Loureiro 2014), and gender (e.g., Gonzalez and Loureiro 2014; Barasinska and Schäfer 2014; Ravina 2019; Pope and Sydnor 2011). We also employ the length of the loan purpose statement as a linguistic indicator, as suggested by Lin et al. (2013) and Kim et al. (2020).

Since social impact has been shown to be a significant factor in loan success (Greiner and Wang 2009; Herrero-Lopez 2009; Lin et al. 2013), we used the verification data from Weibo (the largest Chinese social network) as our indicator of social impact. If an applicant's social network account was verified, the variable is coded as "1", otherwise "0".

Pope and Sydnor (2011) showed that profile photos influence funding success. Since the profile photos on Renrendai.com were not always real pictures of the applicants, we chose video verification as the proxy for the picture indicator. During the verification process, borrowers must record a video of themselves holding their ID cards and reading a statement accepting the general rules and conditions of Renrendai.com, and then upload the video with their loan application. If the applicant completed video verification, this is recorded as "1", otherwise as "0".

The expansion of mobile services is a fundamental component of Fintech 2.0, and mobile usage data is the preferred verification tool for Fintech firms, particularly big data firms. Since mobile numbers were brought under China's real-name system, which allows real cellphone users to be tracked and verified, they have become a critical source for anti-fraud efforts. Furthermore, mobile usage behavior is one of the most powerful indicators of default in the consumer finance market. As a result, we included a variable for mobile verification in our model. This is also a dummy variable: "1" means verified, "0" means not verified.
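
The three verification proxies described above (Weibo, video and mobile) are all coded as 0/1 dummies. A minimal sketch of such an encoding follows; the field names (`weibo_verified`, `video_verified`, `mobile_verified`) and the choice to treat a missing flag as "not verified" are assumptions made for illustration and do not describe the actual dataset schema.

```python
VERIFICATION_FIELDS = ("weibo_verified", "video_verified", "mobile_verified")

def encode_verifications(applicant):
    """Map verification flags to 0/1 dummies.

    Assumption: a flag that is absent from the record counts as not verified.
    """
    return {k: int(bool(applicant.get(k, False))) for k in VERIFICATION_FIELDS}

# Hypothetical applicant record with one flag missing
row = encode_verifications({"weibo_verified": True, "mobile_verified": False})
```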

Following Nigmonov et al. (2022) and Khan and Xuan (2021), we included the interest rate, the loan term, and the loan amount as control variables. The average interest rate is 14.9%, and the highest interest rate is 24.4%. The average amount is 60,637.93 yuan. Since the amounts are quite large, we used the log of the amount to normalize the distribution. The loan term ranges from 1 month to 36 months, with an average of 16 months.
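
The log transformation of the loan amount can be sketched as follows. The text does not state the base of the logarithm; the natural log is assumed here, and the helper name `log_amount` is hypothetical.

```python
import math

def log_amount(amount_yuan):
    """Natural log of a loan amount in yuan (assumed base; amounts must be positive)."""
    if amount_yuan <= 0:
        raise ValueError("loan amount must be positive")
    return math.log(amount_yuan)

# Log of the reported average amount. Note this is the log of the mean,
# which generally differs from the mean of the logged amounts.
log_of_mean = log_amount(60637.93)
```

On this scale, the reported average of 60,637.93 yuan maps to a value of roughly 11, which compresses the right tail of the amount distribution.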

We summarize the descriptive statistics of all the independent variables in Table 1 below.


**Table 1.** Descriptive Summary of Independent Variables.

#### **4. Results**

Table 2 shows the logit regression results for Model I and Model II. Taking the mean income group 4 as the reference group, the results show that income has a positive relationship with funding success: income groups lower than 4 are less likely to receive loans, while groups higher than 4 are more likely than the average group to have loans funded. This reflects the common-sense view of peer investors that higher income means better solvency and more trustworthiness, and it is consistent with most of the research in the field, such as Pötzsch and Böhme (2010). However, the default results suggest that this is not the case: the lower-income groups are negatively correlated with default and thus actually have a lower default probability (e.g., income groups 2 and 3), while the high-income groups can default more (e.g., income groups 6 and 7 are more likely to default than income group 4). This may be because borrowers intend to misstate their income to create a more trustworthy image for the lenders, while the lenders do not recognize the risk of false information. Moreover, the value of income verification has not been recognized: the high verified income group has a lower default probability. Nevertheless, compared to income group 4, investors give more loans to income group 3 than to groups 5, 6 and 7, which is evidently a TYPE II error that provides loans to those with lower creditworthiness. This results from misdiagnosing the signals from income. It also implies the necessity of verifying key information on the P2P platform. Since there is no credit rationing process on the platform, the judgment rests purely on non-professional lenders, and the validity of the information provided on the platform becomes critical.

Table 2 presents the logit regression results for the funding probability model and the default prediction model, with coefficients and robust standard errors in parentheses.


**Table 2.** Comparison of Logit Regression Results for Funding Probability and Default Predicting Model.


Heteroscedasticity-robust standard errors in parentheses. \*\*\* *p* < 0.01, \*\* *p* < 0.05, \* *p* < 0.1. The numbers associated with the variable 'income' refer to income groups. The sample includes 7 income groups.

Comparing the logit regression results from both models, we can see that, except for car ownership, all hard information variables show either opposite results across the two models or different significance levels.

The median income group 4 is used as the reference group, revealing that lower-income groups (1, 2, 3) are less likely to receive loan funding compared to the median income group (4), whereas higher-income groups (5, 6, 7) are more likely to be funded. The funding probability model shows interesting results, in which the interaction of verified income with declared income yields the opposite result. Surprisingly, verified higher-income groups are less preferred by investors. Combined with the results of the default prediction model, we find that verified higher-income groups show a lower default probability. However, higher-income groups without income verification demonstrate a higher probability of default. The implication may be that people in higher-income groups are more inclined to be dishonest regarding their incomes. In Table 3, we further analyze the distribution of income verification; the results show that the income verification percentage increases with income level. Applicants in income groups 1 and 2 are very unlikely to verify their income, with a verification percentage of only around 0.3%. By contrast, the high-income groups all have a verification percentage above 14%. However, as we can see from the regression results, investors are less willing to lend to verified high-income groups than to the average income group, although verified high-income groups have a lower probability of default. Investors are instead more willing to lend to unverified high-income groups, who actually have a higher probability of default. This induces TYPE II errors among the investors, since they fail to read income verification in high-income groups as a positive signal of creditworthiness and lend more funds to those who have a higher probability of default.
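
To make the reference-group coding concrete, the sketch below builds dummy variables for the seven income groups while omitting group 4, so that every income coefficient is read relative to that group. The function name `income_dummies` and the column naming scheme are hypothetical; the paper does not describe its implementation.

```python
REFERENCE_GROUP = 4  # income group used as the baseline in the text

def income_dummies(group, n_groups=7, reference=REFERENCE_GROUP):
    """Return 0/1 dummies for every income group except the reference group."""
    if not 1 <= group <= n_groups:
        raise ValueError(f"income group must be between 1 and {n_groups}")
    return {
        f"income_{g}": int(g == group)
        for g in range(1, n_groups + 1)
        if g != reference
    }

baseline = income_dummies(4)   # all dummies are 0 for the reference group
high = income_dummies(6)       # only income_6 is set to 1
```

A borrower in group 4 has all six dummies equal to zero, which is why a positive coefficient on, say, `income_6` is interpreted as "more likely than group 4".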

Table 3 shows the distribution of the verified income group and the percentage it occupies of the total application according to income group.


**Table 3.** Verified Income Distributions.

Lenders tend to prefer borrowers with fixed assets such as houses or cars. However, only car ownership is seen to be a significant indicator of reduced probability of default. House ownership does not secure loan repayment, a finding in line with Jiménez and Saurina (2004), in which loans with collateral are often linked to higher default rates. Additionally, since loans in the P2P market are usually small, a car is easier to monetize, whereas liquidating a house for loan repayment is more time-consuming and complicated than liquidating smaller assets. As far as mortgage loans are concerned, investors prefer borrowers without any debt. However, the default model suggests that the probability of default is lower for people with mortgage loans. This could be because people with mortgage loans are more concerned about their creditworthiness.

For soft information, mobile verification exhibits the opposite result in the two logit regressions. It is negatively correlated with funding probability, but also negatively correlated with default. This means that borrowers who have mobile verification are less likely to default but are also less likely to get the loan funded. From Table 4, we can see that the percentage of mobile verified in successful loans (4.77%) is much less than in defaulted loans (17.87%). Additionally, the percentages of successful and non-default mobile and video verified loans differ substantially. Successful mobile-verified loans represent 26.6% of all verified loans, among which only 3.9% defaulted. This is lower than the total default rate of 4.6% and substantiates a positive relationship between mobile verification and the creditworthiness of borrowers. However, lenders cannot effectively diagnose the signal and categorize the borrowers by this feature.
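
The comparison logic used in this section can be sketched as a sign check: a signal is read correctly when it moves funding probability and default probability in opposite directions, and is misdiagnosed when both coefficients share a sign (as with mobile verification, which lowers both). The function below is an illustrative simplification that ignores statistical significance; its name and the example coefficient values are hypothetical, not taken from Table 2.

```python
def diagnose_signal(funding_coef, default_coef):
    """Compare coefficient signs across the funding and default models."""
    if funding_coef == 0 or default_coef == 0:
        return "undetermined"
    same_sign = (funding_coef > 0) == (default_coef > 0)
    # Same sign: investors favor riskier borrowers or shun safer ones.
    return "misdiagnosed" if same_sign else "consistent"

# Hypothetical coefficients: the signal lowers funding AND lowers default
mobile_like = diagnose_signal(-0.3, -0.2)
# Hypothetical coefficients: the signal raises funding and lowers default
well_read = diagnose_signal(0.4, -0.2)
```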

The finding that non-financial information can improve the prediction model, and can sometimes even outperform financial information in predicting default, has been documented by Fernando et al. (2020) and Bhimani et al. (2013) using business loans. We add further evidence from a microfinance dataset.

Table 4 shows the distribution of mobile verification in funded and non-funded loans, and in defaulted and non-defaulted loans.


**Table 4.** Mobile Verification Distribution List.

Video verification also showed opposite results in the logit regression comparison, which is consistent with Duarte et al. (2012), where borrowers' willingness to show their appearance does not indicate higher creditworthiness. However, most lenders place great trust in video verification, since the indicator is significantly correlated with loan success. As shown in Table 5, in contrast to mobile-verified loans, 61.29% of video-verified loans succeed in funding, while 8.2% defaulted, which is 3.6 percentage points higher than the total default rate of 4.6%. This may be because borrowers who bear higher risk are willing to offer more information, indicating a classic case of adverse selection and the existence of a TYPE II error.
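
The gap between the two default rates quoted above is a difference in percentage points, which is distinct from the relative increase. A short worked calculation using only the two rates reported in the text (variable names are illustrative):

```python
video_default_rate = 8.2     # percent, video-verified loans (from the text)
overall_default_rate = 4.6   # percent, all loans (from the text)

# Absolute gap, in percentage points
gap_points = round(video_default_rate - overall_default_rate, 1)

# Relative increase over the overall default rate, as a fraction
relative_increase = video_default_rate / overall_default_rate - 1
```

The absolute gap is 3.6 percentage points, while in relative terms the default rate of video-verified loans is roughly 78% higher than the overall rate.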

Table 5 shows the distribution of video verification in funded and non-funded loans, and in defaulted and non-defaulted loans.


**Table 5.** Video Verification Distribution List.

We can also see from the significance levels of the variables that all the hard information is significant in the funding probability model except house ownership, but becomes less significant in the default prediction model. This pattern does not appear in the soft information variables, whose results are more consistent across both models. This suggests that lenders were less capable of diagnosing the signals from hard information than those from soft information.

From our regression results, we can see that investors were not able to effectively diagnose most of the useful information from the signals provided by borrowers, especially from hard, financially related signals. This indicates that investors on the P2P platform may have lacked financial literacy regarding credit appraisal. Their biased investment decisions may have created credit risk for the disintermediated financial system. On the other hand, the P2P investors react surprisingly well to soft signals. They correctly diagnosed the effect of age, gender, educational level, marital status, and social media on creditworthiness. This has an important policy implication: in a financial environment with a weak credit bureau and limited financial literacy, soft information may even perform better in credit screening. Adding more socially related soft information to the credit rationing model could mitigate adverse selection in disintermediated financial institutions.

## **5. Discussion and Conclusions**

This paper examines whether online P2P investors can accurately and effectively diagnose signals of creditworthiness during their decision-making process. According to our findings, TYPE II errors exist in the investors' decision-making process. Comparisons of the signs of the coefficients determining loan defaults and loan funding show that the investors were predisposed to making inaccurate diagnoses of signals and gravitated to borrowers with low creditworthiness, while inadvertently screening out their counterparts with high creditworthiness.

This happens particularly with hard, financially based signals. Specifically, signals such as income and property ownership were insignificant or typically provided contradictory guidance in terms of default. However, investors allocated disproportionate weight to these signals in the loan funding decision. Surprisingly, investors were more adept at diagnosing soft social signals than hard financial signals. That is, the directions of all soft signals in the loan funding process were accurately reflected in the default prediction model, with the exception of softer signals such as video and mobile verification. These results suggest that soft social information can be a compensatory solution when hard information is not solid enough. The absence of a solid credit bureau is typically the main problem for developing countries in credit appraisal, and as our results show, soft information can provide an alternative solution to this problem in credit analysis. Due to data limitations, our soft information is restricted to social identity information. However, with the development of artificial intelligence and machine learning, softer information relevant to social behavior, such as social networks and mobile usage behavior, can provide more comprehensive angles for credit analysis in microfinance and deserves further research.

Our paper demonstrates the existence of TYPE II errors in the disintermediated lending market, indicating a high potential credit risk in financial markets. Due to the growing size of the Fintech industry, this may pose systemic risk to financial systems, requiring regulators' close attention. In addition, we believe the problem of misidentifying creditworthiness signals can be alleviated by a sophisticated and independent credit bureau and by increasing public financial literacy. Meanwhile, expanding the use of social soft information could also mitigate adverse selection in disintermediated financial institutions. This process must be accompanied by transparent and effective oversight of the use of soft information in order to avoid abuse.

**Author Contributions:** Conceptualization, Y.W. and Z.D.; methodology, Y.W.; software, Y.W.; validation, Y.W.; formal analysis, Y.W.; investigation, Y.W.; resources, Y.W.; data curation, Y.W.; writing original draft preparation, Y.W.; writing—review and editing, Y.W.; visualization, Y.W.; supervision, Z.D.; project administration, Y.W.; funding Y.W. and Z.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by Charles University, grant number "SVV 260 597" and GACR funding under project number 18-04630S.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Restrictions apply to the availability of these data. The data were obtained from Tsinghua University and are available from the author with the permission of Tsinghua University and RenrenDai.com.

**Acknowledgments:** I would like to express my sincere gratitude to Tsinghua University for providing the research dataset, and to my supervisor, Zdenek Drabek, for his professional guidance. Finally, I thank all my colleagues from PBC School of Finance, Tsinghua University for their insightful suggestions.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. List of Variables**

**Table A1.** Description of independent variables.

