Article

A Machine Learning Approach for Investigating the Determinants of Stock Price Crash Risk: Exploiting Firm and CEO Characteristics

Business School, Shandong University, Weihai 264209, China
*
Author to whom correspondence should be addressed.
Systems 2024, 12(5), 143; https://doi.org/10.3390/systems12050143
Submission received: 5 March 2024 / Revised: 7 April 2024 / Accepted: 17 April 2024 / Published: 23 April 2024

Abstract

This study uses machine learning to investigate the effects of firm and CEO characteristics on stock price crash risk by collecting massive data on publicly listed firms in China. The results show that eXtreme Gradient Boosting (XGBoost) is the most effective model for predicting stock price crash risk, with relatively satisfactory performance. Meanwhile, the SHapley Additive exPlanations (SHAP) method is used to interpret the importance of features. The results show that the average weekly return of a firm over a year (RET) contributes the most and is negatively associated with crash risk, followed by Sigma, IPO age, and firm size. We also found that, among CEO characteristics, CEO pay contributes substantially to crash risk at the firm level. Our findings have important implications for research into the impact of firm and CEO characteristics on stock price crash risk and provide a novel way for investors to plan their investment decisions and risk-taking behavior rationally.

1. Introduction

A stock price crash is undoubtedly a disaster for society and investors, particularly retail investors who concentrate their money in a handful of firms; a crash of one stock in such a portfolio sharply reduces their wealth [1]. The severe economic losses caused by stock price crashes have prompted extensive research into the internal formation mechanisms of stock price crash risk. Because of information opacity and asymmetry, managers can conceal bad news to keep their high salaries [2,3]. When the accumulated bad news exceeds a certain threshold, it is disclosed to the market all at once, resulting in a significant decrease in stock price and company reputation [4].
Firm characteristics influence stock price movement and corporate risk-taking, as is common in risk-related research [5]. For instance, Deng et al. [6] proposed that stock price crash risk is related to several firm characteristics, including cash flow, operating capacity, debt-paying ability, growth potential, and profitability. They also suggested that we could use machine learning techniques to find each factor’s effects and feature importance. Large companies are likelier to experience a stock price crash because they imply discretionary disclosure [4,7]. However, other studies have stated that a stock price crash occurs when the “bad news hoarding” phenomenon accumulates and reaches a critical value, at which point the bad news floods the stock market without warning [3,8,9]. Therefore, we require a systematic investigation of the influence of firm characteristics to accurately reflect each determinant’s impact.
Previous studies have also identified CEO characteristics as key factors influencing firm-specific stock price crash risk [10,11,12,13]. For example, a younger CEO early in his career has the incentive to defer bad news for a string of consecutive earnings, increasing the likelihood of a crash risk [10]. Furthermore, a CEO with a greater position of power can withhold bad news for financial gain [14], resulting in stock price crashes. In addition, an overconfident CEO with poor management skills is likelier to overstate returns and ignore bad news, increasing crash risk [15]. According to Habib, Hasan, and Jiang [16], female CEOs positively impact stock price crash risk, but the relationship varies depending on whether the female CEO is in her first appointment. Various aspects of CEO characteristics can influence crash risk, so a comprehensive investigation into how CEO characteristics contribute to crashes is urgently needed.
Understanding the causes of stock price crashes is critical for instructing investors on how to protect shareholder value and reduce wealth losses. The large number of firms in the Chinese market across a wide range of industries and sectors allows for the collection of a large amount of data, which helps to analyze and predict the risk of stock crashes more accurately. Thus, this study uses data from Chinese listed companies from 2010 to 2020 to provide comprehensive information on the factors that influence crash risk in the Chinese stock market and to explore how firm and CEO characteristics affect crash risk using machine learning algorithms. Unlike traditional analytical tools, machine learning methods can analyze large and complex datasets and capture relationships that conventional models may miss [17].
Our study contributes to risk management research in two ways. First, we developed a novel stock price crash determinants model by combining firm and CEO characteristics to provide a new perspective on crash risk research. Previous studies in this field have focused solely on the impact of either firm or CEO characteristics [6,14,15], whereas this research combines the two. Second, our study reveals the importance rankings of firm and CEO characteristics and identifies the specific relationships between these factors and crash risk. These relationships have empirical implications for increasing the detectability of firm-specific stock price crashes and improving stock market regulation.
The remainder of this study is structured as follows. Section 2 shows the data source and measurements, and Section 3 presents the analysis results. Section 4 discusses these findings. Finally, Section 5 concludes the paper by discussing the study’s contributions and limitations.

2. Research Methodology

Using machine learning algorithms, this study analyzed crash risk based on firm and CEO characteristics. Machine learning can process a wide range of complex data, and its superior accuracy and explanatory power have made it a popular method for prediction and analysis [18,19,20,21]. In addition to its ability to identify long-term and subtle temporal patterns that are difficult for human analysts to detect, machine learning is particularly effective at modeling nonlinear behavior in financial data and accurately predicting the interaction effects of leading indicators of financial volatility [22]. Many scholars have shown that combining advanced deep learning and machine learning techniques improves financial forecasting performance [23,24]. Recognizing the value of machine learning, we used 11 machine learning methods to investigate the relationships between firm and CEO characteristics and crash risk: ridge regression, least absolute shrinkage and selection operator (Lasso), elastic net, multilayer perceptron regressor (MLPRegressor), decision tree, bagging, random forest, Extra-Trees, adaptive boosting (AdaBoost), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost).

2.1. Data and Sample

Our data cover listed companies in China from 2010 to 2020. The Chinese stock market has long been the world’s second largest and one of its most dynamic, and it significantly affects the global market. It also has a short history, and its laws and regulatory systems remain insufficient, which makes it a reasonable setting for studying crash risk. We used the Choice database to collect firm characteristics and stock returns, and the China Stock Market and Accounting Research (CSMAR) database to obtain CEO characteristics. Then, we matched the firm characteristics data, CEO characteristics data, and stock return data by stock code and fiscal year. Furthermore, we winsorized all variables at the 1st and 99th percentiles, as extreme values can bias the results. To accurately measure crash risk, we excluded observations with missing values and firm-years with fewer than 30 weeks of stock returns. The final sample included 1,999 firms (11,915 firm-year observations) from 2010 to 2020. Table 1 shows the frequency and percentage distributions of the sample across industries. The dataset covers a wide range of industries, from computer and communication equipment manufacturing to the production and supply of electric power and heat power.
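The sample construction steps above can be sketched in pandas. This is only an illustration under stated assumptions: the column names (`stock_code`, `fiscal_year`), the function names, and the order of operations (filter first, then winsorize) are hypothetical and are not taken from the study, and all non-key columns are assumed to be numeric.

```python
import pandas as pd


def winsorize(series: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip a numeric column to its 1st and 99th percentiles."""
    lo, hi = series.quantile([lower, upper])
    return series.clip(lower=lo, upper=hi)


def build_sample(firm: pd.DataFrame, ceo: pd.DataFrame, returns: pd.DataFrame,
                 min_weeks: int = 30) -> pd.DataFrame:
    """Match firm, CEO, and return data by stock code and fiscal year.

    Assumption: every table carries the hypothetical key columns
    'stock_code' and 'fiscal_year', and `returns` has one row per week.
    """
    keys = ["stock_code", "fiscal_year"]
    # Number of weekly return observations in each firm-year
    weeks = returns.groupby(keys).size().rename("n_weeks").reset_index()
    merged = firm.merge(ceo, on=keys, how="inner").merge(weeks, on=keys, how="inner")
    # Drop firm-years with too few weeks of returns or with missing values
    merged = merged[merged["n_weeks"] >= min_weeks].dropna().copy()
    # Winsorize every remaining variable (keys and the week count excluded)
    value_cols = [c for c in merged.columns if c not in keys + ["n_weeks"]]
    merged[value_cols] = merged[value_cols].apply(winsorize)
    return merged.reset_index(drop=True)
```

An inner join keeps only firm-years present in all three sources, which mirrors the exclusion of observations with missing data.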

2.2. Measuring Stock Crash Risk

Following Chen et al. [7], we used negative conditional return skewness (NCSKEW) and down-to-up volatility (DUVOL) in our study to estimate stock price crash risk. We used weekly return data to estimate the weekly return of each firm using the following regression model [4]:
$$R_{i,w} = \beta_0 + \beta_1 R_{r,w-2} + \beta_2 R_{r,w-1} + \beta_3 R_{r,w} + \beta_4 R_{r,w+1} + \beta_5 R_{r,w+2} + \varepsilon_{i,w}$$
where $R_{i,w}$ is the stock return of firm $i$ in week $w$, $R_{r,w}$ is the weighted average market return in week $w$, and $\varepsilon_{i,w}$ is the residual term, representing the part of the stock return that is unrelated to market returns. We included two lead and two lag terms of the market return to alleviate potential problems caused by asynchronous stock trading [25]. We measured the firm-specific weekly return $W_{j,t}$ as the natural logarithm of one plus the residual, $\ln(1 + \varepsilon_{i,w})$.
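As an illustration of the regression above, the following NumPy sketch estimates the coefficients by ordinary least squares for one firm-year and converts the residuals into firm-specific weekly returns. The function name and the choice to drop the first and last two weeks (where lags or leads are unavailable) are assumptions made here, not details reported in the study.

```python
import numpy as np


def firm_specific_returns(r_firm, r_market):
    """Regress firm returns on market returns at weeks w-2..w+2.

    Returns (W, beta), where W = ln(1 + residual) and beta holds the
    intercept followed by the five market-return coefficients.
    """
    r_firm = np.asarray(r_firm, dtype=float)
    r_market = np.asarray(r_market, dtype=float)
    n = len(r_firm)
    # Only weeks with two lags and two leads available enter the regression
    weeks = np.arange(2, n - 2)
    X = np.column_stack(
        [np.ones(len(weeks))] + [r_market[weeks + k] for k in (-2, -1, 0, 1, 2)]
    )
    y = r_firm[weeks]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ beta
    return np.log1p(residual), beta
```

`np.log1p` evaluates ln(1 + x) accurately for the small residuals typical of weekly returns.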
NCSKEW, the negative conditional return skewness of a specific firm’s weekly returns, is the first indicator of a stock price crash risk. We computed NCSKEW using the following model:
$$NCSKEW_{j,t} = -\frac{n(n-1)^{3/2} \sum W_{j,t}^{3}}{(n-1)(n-2)\left(\sum W_{j,t}^{2}\right)^{3/2}}$$
where $n$ is the number of weekly return observations for firm $j$ in year $t$, and $W_{j,t}$ is the firm-specific weekly return, measured as the natural logarithm of one plus the residual. A firm with a higher NCSKEW has a higher risk of stock price crash.
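A minimal sketch of the NCSKEW calculation for a single firm-year follows. It applies the formula to the firm-specific weekly returns as given, without any further demeaning; the function name is hypothetical.

```python
import numpy as np


def ncskew(w):
    """Negative conditional skewness of firm-specific weekly returns."""
    w = np.asarray(w, dtype=float)
    n = w.size
    numerator = n * (n - 1) ** 1.5 * np.sum(w ** 3)
    denominator = (n - 1) * (n - 2) * np.sum(w ** 2) ** 1.5
    # The leading minus sign makes left-skewed (crash-prone) returns score higher
    return -numerator / denominator
```

A year with one large negative week and many small positive weeks yields a positive value, whereas a symmetric return distribution yields zero.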
DUVOL, the weekly return down-to-up volatility, is the second indicator of a stock price crash risk. We calculated DUVOL with the following model:
$$DUVOL_{j,t} = \ln\left\{\frac{(n_u - 1)\sum_{DOWN} W_{j,t}^{2}}{(n_d - 1)\sum_{UP} W_{j,t}^{2}}\right\}$$
where $n_u$ is the number of up weeks and $n_d$ is the number of down weeks. Specifically, we divided the weeks of each firm-year into up weeks and down weeks and calculated the standard deviation of the firm-specific weekly returns in each subsample. A firm with a higher DUVOL has a higher risk of stock price crash.
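The DUVOL calculation can be sketched in the same way. The text does not state how up and down weeks are separated; the sketch assumes the common convention of comparing each firm-specific weekly return with its annual mean, and it requires at least two weeks in each group. The function name is hypothetical.

```python
import numpy as np


def duvol(w):
    """Down-to-up volatility of firm-specific weekly returns."""
    w = np.asarray(w, dtype=float)
    mean = w.mean()
    # Assumption: down weeks fall below the annual mean, up weeks do not
    down = w[w < mean]
    up = w[w >= mean]
    n_d, n_u = down.size, up.size
    ratio = ((n_u - 1) * np.sum(down ** 2)) / ((n_d - 1) * np.sum(up ** 2))
    return np.log(ratio)
```

When the down weeks are more volatile than the up weeks, the ratio exceeds one and DUVOL is positive.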

2.3. Measuring Determinants

We can divide the factors that influence crash risk into two categories: firm characteristics and CEO characteristics. The former includes firm age, IPO age, LogSize, leverage, goodwill, brand capital, cash, return on assets (ROA), return on equity (ROE), sigma, RET, and DTURN. The latter includes the CEO’s gender, age, education, MBA, duality, tenure, pay, shareholdings, board experience, academic experience, and overseas experience. We used previous studies to assess firm and CEO characteristics [10,19,21,23,24,26,27,28]. Table 2 shows a detailed description of these variables.

3. Results

3.1. Evaluation Criterion

We used MSE to evaluate the model performance, computed as follows:
$$MSE = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^{2}$$
where $Y_i$ is the true value of NCSKEW or DUVOL, $\hat{Y}_i$ is the predicted value, and $n$ is the number of observations. MSE is a good indicator for evaluating machine learning models [29,30]. Thus, we employed MSE to compare our 11 machine learning models.

3.2. Model Evaluation and Comparison

This study trained the 11 machine learning models (ridge regression, Lasso, elastic net, MLPRegressor, decision tree, bagging, random forest, Extra-Trees, AdaBoost, GBDT, and XGBoost) on 80% of the data and used the remaining 20% to assess them. We also evaluated the models using five-fold cross-validation, which divides the dataset into five subsamples. In each round, four subsamples are used to train the model, and the remaining subsample serves as a test set to evaluate it, so that five models are constructed and each is evaluated on a different test subsample. We chose MSE as the evaluation criterion; a higher MSE indicates worse performance. Table 3 displays the MSE results of the five-fold cross-validation process (MSE kf1 to MSE kf5). When crash risk was measured with NCSKEW, XGBoost’s MSE values ranged from 0.4654 to 0.5518, whereas when it was measured with DUVOL, they ranged from 0.2293 to 0.2657, which is lower than for the other methods. XGBoost achieved the lowest mean MSE regardless of whether we used NCSKEW or DUVOL as the measure.
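The evaluation procedure described above can be sketched with NumPy alone. Two assumptions are made for illustration: the cross-validation folds are drawn from the 80% training portion (the text does not say whether the folds use the training portion or the full dataset), and a closed-form ridge regression stands in for the 11 models. All function names are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between true and predicted values."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    intercept = y_mean - x_mean @ coef
    return lambda X_new: X_new @ coef + intercept


def holdout_and_kfold(X, y, fit, k=5, test_size=0.2, seed=0):
    """Return (per-fold validation MSEs, hold-out test MSE)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(round(test_size * len(y)))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    folds = np.array_split(train_idx, k)
    fold_mse = []
    for i in range(k):
        # Train on k-1 folds and validate on the remaining one
        fit_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        predict = fit(X[fit_idx], y[fit_idx])
        fold_mse.append(mse(y[folds[i]], predict(X[folds[i]])))
    # Refit on the whole training portion and score the hold-out set
    predict = fit(X[train_idx], y[train_idx])
    return fold_mse, mse(y[test_idx], predict(X[test_idx]))
```

Comparing the mean of the per-fold MSEs across models, as well as the hold-out MSE, gives both a performance and a stability check for each candidate model.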
Next, we compared the models’ performance using MSE as the measure. XGBoost’s MSE was 0.4557 when crash risk was measured with NCSKEW and 0.2186 when it was measured with DUVOL, which were slightly higher than those of ridge regression, Lasso, bagging, Extra-Trees, and GBDT. However, XGBoost outperformed the elastic net, MLPRegressor, and decision tree models. Although the MSE value of AdaBoost was slightly lower than that of XGBoost when measured using NCSKEW, the mean value of the five-fold cross-validation for XGBoost was lower than that for AdaBoost, indicating that XGBoost has greater stability than AdaBoost. The results are summarized in Table 3 and show that XGBoost is the best model for exploring the determinants of crash risk.

3.3. Model Interpretation

Our study interpreted the model results using the SHapley Additive exPlanations (SHAP) method proposed by Lundberg and Lee [31]. SHAP uses game theory to quantify the contribution of each variable. We calculate SHAP values as follows:
$$SHAP\ value_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(M - |S| - 1)!}{M!}\left[f_x(S \cup \{i\}) - f_x(S)\right]$$
where $i$ is the feature to be interpreted, $N$ is the set of all input features, $M$ is the total number of features, and $S$ is a subset of $N$ that excludes $i$. $f_x$ is the model’s predicted result for $x$: $f_x(S)$ is the predicted result using the features in $S$, and $f_x(S \cup \{i\})$ is the predicted result after adding $i$ to $S$.
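To make the formula concrete, the sketch below computes exact Shapley values for a toy three-feature model by enumerating every subset. It is not the procedure used in this study: here features outside $S$ are simply replaced by a baseline value, which is a simplifying assumption, and the toy model, the baseline, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all subsets S of N excluding i."""
    M = n_features
    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for size in range(M):
            # Weight |S|! (M - |S| - 1)! / M! from the SHAP formula
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                total += weight * (value_fn(set(S) | {i}) - value_fn(set(S)))
        phi.append(total)
    return phi


def make_value_fn(model, x, baseline):
    """f_x(S): prediction with features in S taken from x, the rest from baseline."""
    def value_fn(S):
        z = [x[j] if j in S else baseline[j] for j in range(len(x))]
        return model(z)
    return value_fn


# Toy model with one additive term and one interaction term
model = lambda z: 2.0 * z[0] + z[1] * z[2]
x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(make_value_fn(model, x, baseline), 3)
```

For this toy model the values are approximately 2, 3, and 3: the additive term is credited entirely to the first feature, the interaction term is split equally between the other two, and the values sum to the difference between the prediction at `x` and at the baseline. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples.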
Feature importance is critical for determining which features contribute the most to the model’s performance. This study used SHAP to estimate the model by determining each factor’s importance and the feature’s specific impact. Table 4 shows the results of the SHAP summary analysis. Using the summary table, we calculated the mean absolute SHAP value, which reflected the contributions of each variable’s characteristics. Furthermore, the table efficiently conveys the effects of the variables on the model. Finally, we calculated the average feature importance rank because NCSKEW produces results that differ from those of DUVOL.
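The ranking step can be illustrated as follows. Given one matrix of SHAP values per crash risk measure (rows are observations, columns are features), the sketch computes the mean absolute SHAP value of each feature, converts it into a rank, and averages the ranks across the two measures. The function names and the example numbers are hypothetical.

```python
import numpy as np


def importance_ranks(shap_values):
    """Return (mean |SHAP| per feature, rank per feature), with rank 1 = most important."""
    mean_abs = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(-mean_abs)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(order) + 1)
    return mean_abs, ranks


def average_rank(shap_ncskew, shap_duvol):
    """Average the importance ranks obtained under the two crash risk measures."""
    _, rank_a = importance_ranks(shap_ncskew)
    _, rank_b = importance_ranks(shap_duvol)
    return (rank_a + rank_b) / 2.0
```

Averaging ranks rather than raw SHAP magnitudes avoids mixing the different scales of NCSKEW and DUVOL.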
The SHAP summary table shows that the eight most important firm characteristics are RET, sigma, IPO age, LogSize, DTURN, brand capital, leverage, and cash. In addition, the top eight CEO characteristics include CEO pay, CEO accounting, CEO shareholdings, CEO age, CEO marketing, CEO tenure, CEO education, and CEO academic experience. The SHAP summary table displays the specific impact of features on the risk of stock price crash. Crash risk correlates negatively with RET, sigma, IPO age, LogSize, DTURN, brand capital, CEO pay, cash, ROE, firm age, CEO marketing, CEO education, CEO RD, CEO design, CEO production, and CEO overseas experience. Meanwhile, a positive relationship exists between crash risk and leverage, goodwill, ROA, CEO accounting, CEO age, CEO tenure, CEO academic experience, CEO board experience, CEO finance, CEO HRM, and CEO MBA. However, using NCSKEW to estimate the impact of CEO duality and CEO gender yields different results than using DUVOL.

4. Discussion

Intense fluctuations in stock prices have created uncertainty in investors’ reactions and behaviors, as well as in the daily operations of companies, increasing the need to manage stock price crash risk for companies’ long-term development. Given that China is the world’s largest developing country, with a huge consumer base and diverse market demand, there is significant uncertainty about stock market changes. Based on the background of the Chinese market, we used 11 machine learning methods to explore the influential factors of crash risk using firm and CEO characteristics from a large-scale sample of Chinese listed companies. This section discusses the study’s significant theoretical and practical implications.
This study contributes to the literature on stock price crash risk by developing a stock price crash determinants model using machine learning techniques that considers both firm and CEO characteristics. Although much previous literature has studied the relationship between firm and CEO characteristics and crash risk, most research has not explored the joint influence and relative importance of a large set of firm and CEO characteristics [4,6]. Analyzing the determinants of crash risk becomes more complex when many features must be considered at once. Our study fills this gap by applying machine learning methods to identify the factors that significantly impact stock price crash risk.
This study provides evidence of how firm and CEO characteristics impact crash risk, including firm characteristics (e.g., RET, sigma, and firm age) and CEO characteristics (e.g., CEO pay, CEO accounting, and CEO shareholdings). Extending Zhang et al.’s [32] study with the SHAP method, we found that RET, the average weekly return of a firm over a year, is the most important of these factors and negatively impacts crash risk. Moreover, unlike Xu and Zou [33], who argued that the relationship between CEO pay and stock price crash risk is unclear, our study found that CEO pay is the most important CEO characteristic and has a negative influence on crash risk. Higher pay encourages CEOs to focus more on firm performance, resulting in lower crash risk.
Furthermore, we found that a firm whose CEO has accounting experience is more likely to crash. A plausible explanation is that a CEO with accounting experience can more easily manipulate reported company performance, resulting in a higher crash risk. Additionally, CEO finance and crash risk are positively associated, consistent with Jiang et al.’s [11] findings. However, our results on the impact of CEO age on crash risk contradict previous research. According to other researchers, a firm with a younger CEO is more likely to experience a crash [10], whereas our results show a positive relationship between CEO age and crash risk.
This study has some practical implications for investors and supervisors. First, there is a strong association between firm characteristics and crash risk. Our study found that RET, IPO age, firm size, and brand capital affect crash risk. To avoid losses, investors should focus on these factors. For example, they can invest in stocks with a higher and more stable average weekly return, a higher ROE, and a lower ROA. RET, the most important determinant in this study, negatively influences crash risk. A low or declining RET may indicate poor profitability or business challenges, leading investors to become pessimistic about the company’s future performance and stock price and to sell their shares. Consequently, a mass sell-off can put downward pressure on the stock price, potentially leading to a market disaster if left unchecked. During a crash, stock prices may plummet, market turnover may sharply decrease, investor confidence may suffer, and the stock market’s operational mechanism may sustain significant damage. Therefore, a decreasing or downward-trending RET is an early indicator of an impending stock crash. Investors and regulators should closely monitor this indicator and respond quickly to potential risks. Simultaneously, companies should strive to improve performance and profitability to maintain share prices and bolster investor trust.
Our findings suggest that regulators should be vigilant in monitoring firms’ characteristics, particularly those associated with higher crash risk. They can use the insights from this study to develop more targeted surveillance mechanisms and policies to prevent potential crashes. For example, they could impose stricter disclosure requirements on firms with specific characteristics, such as a high IPO age or a large size, to ensure that investors can access all relevant information when making investment decisions.

5. Conclusions

This study comprehensively identified the impact of various firm and CEO characteristics on crash risk and their respective contributions to stock price crash prediction. We applied 11 machine learning models to analyze data from Chinese listed firms between 2010 and 2020. The results show that XGBoost performs best among these models and effectively captures the relationships between the 31 input variables and the risk of a stock price crash. Furthermore, the SHAP method used to interpret the XGBoost model shows the importance of firm and CEO characteristics. Among the firm and CEO characteristics, the ten most important factors affecting stock price crash risk are RET, sigma, IPO age, LogSize, DTURN, brand capital, CEO pay, leverage, cash, and goodwill.
Although this article contributes to a comprehensive presentation of firm and CEO characteristics that may be associated with the risk of a stock price crash, there are still research limitations. First, our study only included 31 features in our models, which may explain the models’ limited performance. The study may have overlooked important variables impacting model performance. Second, we do not differentiate between the equity sectors to which the stock belongs, which may affect the accuracy of the results. Future research can use advanced machine learning methods to analyze samples from various equity sectors. Moreover, we only considered stocks from companies listed in the Chinese stock market. Future research can use a longer period of observations and a broader range of stocks from different markets to explore crash risk more comprehensively.

Author Contributions

Conceptualization, Y.L., S.W. and F.L.; methodology, Y.L., S.W., H.X., R.W. and F.L.; software, S.W.; formal analysis, Y.L., S.W. and F.L.; resources, F.L.; data curation, S.W.; writing—original draft preparation, Y.L., H.X., S.W. and F.L.; writing—review and editing, Y.L., H.X., R.W. and F.L.; supervision, F.L.; project administration, F.L.; funding acquisition, F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Humanities and Social Sciences Foundation of the Ministry of Education of China (Grant No. 21YJC630076).

Data Availability Statement

Data will be available from the corresponding author upon reasonable request.

Acknowledgments

We gratefully acknowledge the insightful suggestions from the editors and anonymous reviewers that substantively improved this article. We would also like to thank the members of Star-lights Research Team for their comments on earlier versions of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Barber, B.M.; Odean, T. The behavior of individual investors. Handbook Econ. Financ. 2013, 2, 1533–1570.
  2. Bleck, A.; Liu, X. Market transparency and the accounting regime. J. Account. 2007, 45, 229–256.
  3. Jin, L.; Myers, S.C. R2 around the world: New theory and new tests. J. Financ. Econ. 2006, 79, 257–292.
  4. Hutton, A.P.; Marcus, A.J.; Tehranian, H. Opaque financial reports, R2, and crash risk. J. Financ. Econ. 2009, 94, 67–86.
  5. Liu, C.; Chen, Y.; Huang, S.; Chen, X.; Liu, F. Assessing the Determinants of Corporate Risk-Taking Using Machine Learning Algorithms. Systems 2023, 11, 263.
  6. Deng, S.; Zhu, Y.; Duan, S.; Fu, Z.; Liu, Z. Stock Price Crash Warning in the Chinese Security Market Using a Machine Learning-Based Method and Financial Indicators. Systems 2022, 10, 108.
  7. Chen, J.; Hong, H.; Stein, J.C. Forecasting crashes: Trading volume, past returns, and conditional skewness in stock prices. J. Financ. Econ. 2001, 61, 345–381.
  8. Baik BO, K.; Farber, D.B.; Lee SA, M. CEO ability and management earnings forecasts. Contemp. Account. Res. 2011, 28, 1645–1668.
  9. Xie, W.; Ye, C.; Wang, T.; Shen, Q. M&A goodwill, information asymmetry and stock price crash risk. Econ. Res.-Ekon. Istraz. 2020, 33, 3385–3405.
  10. Andreou, P.C.; Louca, C.; Petrou, A.P. CEO age and stock price crash risk. Int. Rev. Financ. 2017, 21, 1287–1325.
  11. Jiang, X.; Zhu, J.; Akbar, A.; Hou, Z.; Bao, X. The dark side of executives’ professional background: Evidence from Chinese firm’s stock price crash risk. Manag. Decis. Econ. 2022, 43, 3771–3784.
  12. Kim, J.B.; Liao, S.; Liu, Y. Married CEOs and stock price crash risk. Eur. Financ. Manag. 2022, 28, 1376–1412.
  13. Li, Y.; Zeng, Y. The impact of top executive gender on asset prices: Evidence from stock price crash risk. J. Corp. Financ. 2019, 58, 528–550.
  14. Al Mamun, M.; Balachandran, B.; Duong, H.N. Powerful CEOs and stock price crash risk. J. Corp. Financ. 2020, 62, 101582.
  15. Kim, J.B.; Wang, Z.; Zhang, L. CEO overconfidence and stock price crash risk. Contemp. Account. Res. 2016, 33, 1720–1749.
  16. Habib, A.; Hasan, M.M.; Jiang, H. Stock price crash risk: Review of the empirical literature. Account. Financ. 2018, 58, 211–251.
  17. Zhu, Y.; Xie, C.; Wang, G.J.; Yan, X.G. Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China’s SME credit risk in supply chain finance. Neural Comput. Appl. 2017, 28, 41–50.
  18. Choudhury, P.; Allen, R.T.; Endres, M.G. Machine learning for pattern discovery in management research. Strateg. Manag. J. 2021, 42, 30–57.
  19. Liu, F.; Wang, R.; Fang, M. Mapping green innovation with machine learning: Evidence from China. Technol. Forecast. Soc. Chang. 2024, 200, 123107.
  20. Liu, F.; Long, X.; Dong, L.; Fang, M. What makes you entrepreneurial? Using machine learning to investigate the determinants of entrepreneurship in China. China Econ. Rev. 2023, 81, 102029.
  21. Liu, F.; Huang, W.; Zhang, J.; Fang, M. Corporate social responsibility in family business: Using machine learning to uncover who is doing good. Technol. Soc. 2024, 76, 102453.
  22. Chatzis, S.P.; Siakoulis, V.; Petropoulos, A.; Stavroulakis, E.; Vlachogiannakis, N. Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Syst. Appl. 2018, 112, 353–371.
  23. Wang, M.; Yu, Y.; Liu, F. Does digital transformation curb the formation of zombie firms? A machine learning approach. Technol. Anal. Strat. Manag. 2023, 1–17.
  24. Zhang, J.; Zhu, M.; Liu, F. Find who is doing social good: Using machine learning to predict corporate social responsibility performance. Oper. Manag. Res. 2023, 17, 253–266.
  25. Dimson, E. Risk measurement when shares are subject to infrequent trading. J. Financ. Econ. 1979, 7, 197–226.
  26. Ben-Nasr, H.; Bouslimi, L.; Zhong, R. Do patented innovations reduce stock price crash risk? Inter. Rev. Financ. 2021, 21, 3–36.
  27. Chen, Y.; Fan, Q.; Yang, X.; Zolotoy, L. CEO early-life disaster experience and stock price crash risk. J. Corp. Financ. 2021, 68, 101928.
  28. Hasan, M.M.; Taylor, G.; Richardson, G. Brand capital and stock price crash risk. Manag. Sci. 2022, 68, 7221–7247.
  29. Baek, Y.; Kim, H.Y. ModAugNet: A new forecasting framework for stock market index value with an overfitting prevention LSTM module and a prediction LSTM module. Expert Syst. Appl. 2018, 113, 457–480.
  30. Lv, P.; Shu, Y.; Xu, J.; Wu, Q. Modal decomposition-based hybrid model for stock index prediction. Expert Syst. Appl. 2022, 202, 117252.
  31. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.
  32. Zhang, M.; Xie, L.; Xu, H. Corporate philanthropy and stock price crash risk: Evidence from China. J. Bus. Ethics. 2016, 139, 595–617.
  33. Xu, J.; Zou, L. The impact of CEO pay and its disclosure on stock price crash risk: Evidence from China. China Financ. Rev. Int. 2019, 9, 479–497.
Table 1. Distribution of the sample across industries.
Panel A. Distribution of Sample Firms Across Industries
Industry | Frequency | Percentage
Computers, Communication Equipment, and Other Electronic Equipment Manufacturing | 201 | 10.06%
Pharmaceutical Manufacturing | 168 | 8.40%
Electrical Machinery and Equipment Manufacturing | 138 | 6.90%
Software and Information Technology Services | 136 | 6.80%
Raw Chemical Materials and Chemical Products Manufacturing | 134 | 6.70%
Special-purpose Machinery Manufacturing | 134 | 6.70%
Real Estate | 77 | 3.85%
General-purpose Machinery Manufacturing | 68 | 3.40%
Automobile Manufacturing | 57 | 2.85%
Retail | 53 | 2.65%
Nonmetal Mineral Products | 48 | 2.40%
Rubber and Plastic Products | 45 | 2.25%
Wholesale | 41 | 2.05%
Internet and Other Related Services | 41 | 2.05%
Smelting and Pressing of Nonferrous Metals | 36 | 1.80%
Metal Products | 34 | 1.70%
Instrument and Apparatus Manufacturing | 34 | 1.70%
Wine, Beverage, and Refined Tea Manufacturing | 30 | 1.50%
Railway, Shipping, Aerospace, and Other Transport Equipment Manufacturing | 29 | 1.45%
Agricultural and Sideline Product Processing | 28 | 1.40%
Business Services | 27 | 1.35%
Food Manufacturing | 26 | 1.30%
Ecological Protection and Environmental Governance | 24 | 1.20%
Civil Engineering Construction | 23 | 1.15%
Textile Garments and Clothing | 22 | 1.10%
Production and Supply of Electric Power and Heat Power | 20 | 1.00%
Others | 325 | 16.26%
Total | 1999 | 100.00%
Panel B. Distribution of Firm-year Observations Across Industries
Industry | Frequency | Percentage
Computers, Communication Equipment, and Other Electronic Equipment Manufacturing | 1117 | 9.37%
Pharmaceutical Manufacturing | 1040 | 8.73%
Electrical Machinery and Equipment Manufacturing | 807 | 6.77%
Software and Information Technology Services | 796 | 6.68%
Raw Chemical Materials and Chemical Products Manufacturing | 761 | 6.39%
Special-purpose Machinery Manufacturing | 757 | 6.35%
Real Estate | 560 | 4.70%
General-purpose Machinery Manufacturing | 400 | 3.36%
Retail | 367 | 3.08%
Automobile Manufacturing | 333 | 2.79%
Nonmetal Mineral Products | 311 | 2.61%
Wholesale | 281 | 2.36%
Rubber and Plastic Products | 256 | 2.15%
Metal Products | 243 | 2.04%
Wine, Beverage, and Refined Tea Manufacturing | 230 | 1.93%
Internet and Other Related Services | 222 | 1.86%
Smelting and Pressing of Nonferrous Metals | 220 | 1.85%
Agricultural and Sideline Product Processing | 186 | 1.56%
Instrument and Apparatus Manufacturing | 176 | 1.48%
Business Services | 170 | 1.43%
Textile Garments and Clothing | 147 | 1.23%
Railway, Shipping, Aerospace and Other Transport Equipment Manufacturing | 147 | 1.23%
Food Manufacturing | 138 | 1.16%
Civil Engineering Construction | 138 | 1.16%
Textiles | 120 | 1.01%
Ecological Protection and Environmental Governance | 119 | 1.00%
Production and Supply of Electric Power and Heat Power | 105 | 0.88%
Others | 1768 | 14.84%
Total | 11,915 | 100.00%
Table 2. Variable definitions.

| Variable | Definition | Mean | SD |
|---|---|---|---|
| Firm characteristics | | | |
| Firm age | Years since the firm’s founding | 18.535 | 5.531 |
| IPO age | Years since the firm’s IPO | 9.795 | 6.872 |
| LogSize | ln(Total assets) | 22.055 | 1.145 |
| Leverage | Total debt/Total assets | 0.413 | 0.207 |
| Goodwill | Goodwill/Total assets; missing goodwill values are replaced with zero | 0.038 | 0.082 |
| Brand capital | Advertising expenses/Total assets | 0.038 | 0.082 |
| Cash | (Cash + Short-term investments)/Total assets | 0.167 | 0.129 |
| ROA | Net income/Total assets | 0.055 | 0.070 |
| ROE | Net income/Total equity | 0.059 | 0.141 |
| Sigma | Standard deviation of weekly stock returns over a year | 0.052 | 0.020 |
| RET | Average weekly return of the firm over a year | 0.000 | 0.007 |
| DTURN | Current-year mean monthly share turnover − last-year mean monthly share turnover | 0.140 | 24.296 |
| CEO characteristics | | | |
| CEO gender | “1” if the CEO is male and “0” if the CEO is female | 0.930 | 0.256 |
| CEO age | Age of the CEO | 49.724 | 6.508 |
| CEO education | “1” if the CEO holds a postgraduate degree and “0” otherwise | 0.465 | 0.499 |
| CEO MBA | “1” if the CEO holds an MBA degree and “0” otherwise | 0.077 | 0.267 |
| CEO duality | “1” if the CEO also serves as chairman and “0” otherwise | 0.314 | 0.464 |
| CEO tenure | Number of years in the CEO position at a particular company | 1.315 | 0.798 |
| CEO pay | ln(Total annual salary + 1) | 13.127 | 1.813 |
| CEO shareholdings | ln(Outstanding shares held by CEO + 1) | 9.055 | 7.903 |
| CEO board experience | “1” if the CEO is also a director and “0” otherwise | 0.918 | 0.274 |
| CEO academic experience | “1” if the CEO has experience of (a) teaching at a college, (b) working at a research laboratory, and (c) researching at an institute, and “0” otherwise | 0.234 | 0.423 |
| CEO overseas experience | “1” if the CEO has overseas experience and “0” otherwise | 0.094 | 0.292 |
| CEO production | “1” if the CEO has career experience in the production area and “0” otherwise | 0.127 | 0.333 |
| CEO RD | “1” if the CEO has career experience in the R&D area and “0” otherwise | 0.266 | 0.442 |
| CEO design | “1” if the CEO has career experience in the design area and “0” otherwise | 0.031 | 0.172 |
| CEO HRM | “1” if the CEO has career experience in the human resource management area and “0” otherwise | 0.022 | 0.146 |
| CEO administration | “1” if the CEO has career experience in the administration area and “0” otherwise | 1.000 | 0.000 |
| CEO marketing | “1” if the CEO has career experience in the marketing area and “0” otherwise | 0.291 | 0.454 |
| CEO finance | “1” if the CEO has career experience in the finance area and “0” otherwise | 0.136 | 0.342 |
| CEO accounting | “1” if the CEO has career experience in the accounting area and “0” otherwise | 0.105 | 0.306 |

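
As a rough illustration of how several of the variables in Table 2 could be computed from raw series, the following Python sketch uses only the standard library. The function names and input lists are hypothetical, and using the sample standard deviation for Sigma is an assumption; the table does not state which estimator the authors used.

```python
import math
from statistics import mean, stdev


def ret(weekly_returns):
    """RET: average weekly return of a firm over one year."""
    return mean(weekly_returns)


def sigma(weekly_returns):
    """Sigma: standard deviation of weekly returns over one year.

    Assumption: sample standard deviation (n - 1 denominator).
    """
    return stdev(weekly_returns)


def dturn(turnover_this_year, turnover_last_year):
    """DTURN: current-year minus last-year mean monthly share turnover."""
    return mean(turnover_this_year) - mean(turnover_last_year)


def log_plus_one(amount):
    """ln(x + 1) transform, as used for CEO pay and CEO shareholdings."""
    return math.log(amount + 1)


def leverage(total_debt, total_assets):
    """Leverage: total debt divided by total assets."""
    return total_debt / total_assets


# Hypothetical inputs, for illustration only.
weekly = [0.01, -0.02, 0.03, 0.00]
print(ret(weekly), sigma(weekly))
print(dturn([2.0, 4.0], [1.0, 1.0]))
print(log_plus_one(500_000), leverage(40.0, 100.0))
```

The ln(x + 1) transform keeps a zero salary or zero shareholding defined, since ln(0 + 1) = 0.
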
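Table 3. Descriptive results of machine learning methods.

Panel A. Using NCSKEW as the measure

| Model | MSE | MSE kf1 | MSE kf2 | MSE kf3 | MSE kf4 | MSE kf5 | Mean |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.4557 | 0.4883 | 0.4654 | 0.5337 | 0.5222 | 0.5518 | 0.5123 |
| GBDT | 0.4578 | 0.4831 | 0.4853 | 0.5522 | 0.5036 | 0.5372 | 0.5123 |
| AdaBoost | 0.4522 | 0.5032 | 0.4898 | 0.5417 | 0.5109 | 0.5502 | 0.5191 |
| Bagging | 0.4582 | 0.5055 | 0.4939 | 0.5503 | 0.5168 | 0.5553 | 0.5243 |
| Random forest | 0.5196 | 0.4971 | 0.5305 | 0.5722 | 0.5430 | 0.6794 | 0.5644 |
| Extra-Trees | 0.5385 | 0.5402 | 0.5351 | 0.5810 | 0.5647 | 0.6586 | 0.5759 |
| Lasso | 0.5241 | 0.6591 | 0.6528 | 0.5668 | 0.5531 | 0.6562 | 0.6176 |
| Ridge | 0.5248 | 0.6666 | 0.6528 | 0.5675 | 0.5550 | 0.6551 | 0.6194 |
| MLPRegressor | 0.9078 | 0.7692 | 0.6892 | 0.7641 | 0.7241 | 0.8027 | 0.7499 |
| Elastic net | 0.8124 | 0.7649 | 0.8174 | 0.9246 | 0.8383 | 0.8271 | 0.8345 |
| Decision tree | 1.0060 | 1.0545 | 1.0502 | 0.9974 | 1.1352 | 1.1662 | 1.0807 |

Panel B. Using DUVOL as the measure

| Model | MSE | MSE kf1 | MSE kf2 | MSE kf3 | MSE kf4 | MSE kf5 | Mean |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.2186 | 0.2459 | 0.2657 | 0.2589 | 0.2472 | 0.2293 | 0.2494 |
| GBDT | 0.2215 | 0.2366 | 0.2819 | 0.2715 | 0.2295 | 0.2302 | 0.2499 |
| AdaBoost | 0.2235 | 0.2559 | 0.2851 | 0.2717 | 0.2377 | 0.2491 | 0.2599 |
| Bagging | 0.2243 | 0.2555 | 0.2808 | 0.2717 | 0.2333 | 0.2585 | 0.2600 |
| Random forest | 0.2683 | 0.2501 | 0.3257 | 0.3008 | 0.2610 | 0.3519 | 0.2979 |
| Extra-Trees | 0.2938 | 0.2915 | 0.3245 | 0.3298 | 0.2878 | 0.3279 | 0.3123 |
| Lasso | 0.2847 | 0.4207 | 0.4454 | 0.3260 | 0.2801 | 0.3447 | 0.3634 |
| Ridge | 0.2853 | 0.4245 | 0.4458 | 0.3259 | 0.2808 | 0.3437 | 0.3641 |
| MLPRegressor | 0.6372 | 0.4650 | 0.5544 | 0.6304 | 0.4168 | 0.4075 | 0.4948 |
| Elastic net | 0.4766 | 0.5588 | 0.5457 | 0.5194 | 0.4912 | 0.4720 | 0.5174 |
| Decision tree | 0.5534 | 0.5155 | 0.5811 | 0.6509 | 0.5470 | 0.4683 | 0.5526 |

Note: The table compares the performance of 11 models by mean squared error (MSE) under five-fold cross-validation. MSE kf1 to MSE kf5 are the MSEs on the five folds, and Mean is their average.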
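
The fold-wise evaluation behind Table 3 can be sketched as follows. This is a minimal illustration in plain Python, with a deliberately trivial mean predictor standing in for the 11 models. The helper names (`kfold_indices`, `cross_val_mse`, `fit_mean`, `predict_mean`) are hypothetical and the contiguous, unshuffled split is an assumption; the authors' actual pipeline is not shown in the source. The last lines check that the Mean column matches the average of the five fold MSEs for the XGBoost row of Panel A.

```python
from statistics import mean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def kfold_indices(n, k=5):
    """Yield (train_idx, test_idx) for k contiguous folds (no shuffling)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def cross_val_mse(fit, predict, X, y, k=5):
    """Return the list of per-fold MSEs for a fit/predict pair."""
    scores = []
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        scores.append(mse([y[i] for i in test], preds))
    return scores


# Toy stand-in model (assumption): always predict the training mean.
def fit_mean(X, y):
    return mean(y)


def predict_mean(model, X):
    return [model] * len(X)


# Hypothetical data, for illustration only.
X = list(range(10))
y = [float(v) for v in range(10)]
fold_scores = cross_val_mse(fit_mean, predict_mean, X, y, k=5)
print(fold_scores, mean(fold_scores))

# Per-fold MSEs reported for XGBoost in Panel A of Table 3.
xgb_folds = [0.4883, 0.4654, 0.5337, 0.5222, 0.5518]
xgb_mean = round(mean(xgb_folds), 4)
print(xgb_mean)  # 0.5123, matching the Mean column
```
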
Table 4. The results of the SHAP summary analysis.

| Variable | SHAP1 | Effects1 | Rank1 | SHAP2 | Effects2 | Rank2 | Average Rank |
|---|---|---|---|---|---|---|---|
| RET | 0.4565 | −0.9212 | 1 | 0.4441 | −0.9330 | 1 | 1 |
| Sigma | 0.1206 | −0.9230 | 2 | 0.0738 | −0.8016 | 2 | 2 |
| IPO age | 0.0779 | −0.8919 | 3 | 0.0641 | −0.8554 | 3 | 3 |
| LogSize | 0.0624 | −0.8781 | 4 | 0.0518 | −0.9182 | 4 | 4 |
| DTURN | 0.0296 | −0.1388 | 5 | 0.0215 | −0.5543 | 5 | 5 |
| Brand capital | 0.0257 | −0.1839 | 6 | 0.0178 | −0.3067 | 7 | 6.5 |
| CEO pay | 0.0193 | −0.4289 | 9 | 0.0203 | −0.5393 | 6 | 7.5 |
| Leverage | 0.0169 | 0.3330 | 10 | 0.0155 | 0.5731 | 8 | 9 |
| Cash | 0.0217 | −0.1021 | 7 | 0.0081 | −0.1376 | 12 | 9.5 |
| Goodwill | 0.0115 | 0.2983 | 11 | 0.0119 | 0.7344 | 9 | 10 |
| ROE | 0.0197 | −0.0529 | 8 | 0.0080 | −0.2135 | 13 | 10.5 |
| Firm age | 0.0104 | −0.3905 | 13 | 0.0096 | −0.7212 | 10 | 11.5 |
| ROA | 0.0110 | 0.5675 | 12 | 0.0085 | 0.6158 | 11 | 11.5 |
| CEO accounting | 0.0086 | 0.8266 | 15 | 0.0046 | 0.7113 | 15 | 15 |
| CEO shareholdings | 0.0079 | 0.3512 | 17 | 0.0058 | −0.2465 | 14 | 15.5 |
| CEO age | 0.0081 | 0.1521 | 16 | 0.0045 | 0.4151 | 16 | 16 |
| CEO marketing | 0.0090 | −0.7914 | 14 | 0.0041 | −0.8544 | 18 | 16 |
| CEO tenure | 0.0067 | 0.2150 | 18 | 0.0039 | 0.3601 | 19 | 18.5 |
| CEO education | 0.0016 | −0.8944 | 21 | 0.0042 | −0.7742 | 17 | 19 |
| CEO academic experience | 0.0018 | 0.6689 | 19 | 0.0015 | 0.8396 | 20 | 19.5 |
| CEO RD | 0.0006 | −0.0171 | 24 | 0.0006 | −0.4033 | 21 | 22.5 |
| CEO duality | 0.0007 | −0.3051 | 23 | 0.0004 | 0.0793 | 26 | 24.5 |
| CEO board experience | 0.0004 | 0.4694 | 26 | 0.0005 | 0.0919 | 24 | 25 |
| CEO design | 0.0017 | −0.5892 | 20 | 0.0000 | −0.0667 | 30 | 25 |
| CEO finance | 0.0004 | 0.4053 | 27 | 0.0005 | 0.5884 | 23 | 25 |
| CEO HRM | 0.0010 | 0.7714 | 22 | 0.0001 | 0.4582 | 28 | 25 |
| CEO production | 0.0006 | −0.5677 | 25 | 0.0004 | −0.3166 | 25 | 25 |
| CEO gender | 0.0002 | 0.3752 | 30 | 0.0005 | −0.7119 | 22 | 26 |
| CEO overseas experience | 0.0002 | −0.3707 | 29 | 0.0002 | −0.2451 | 27 | 28 |
| CEO MBA | 0.0003 | 0.5626 | 28 | 0.0001 | 0.5475 | 29 | 28.5 |
| CEO administration | 0.0000 | 0.0000 | 31 | 0.0000 | 0.0000 | 31 | 31 |

Note: SHAP1, Effects1, and Rank1 are the results using NCSKEW as the measure; SHAP2, Effects2, and Rank2 are the results using DUVOL. Average Rank is the mean of Rank1 and Rank2.
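
The ranking arithmetic in Table 4 can be illustrated with a short Python sketch. It assumes that SHAP1 and SHAP2 are mean absolute SHAP values per feature, which is the usual convention for a SHAP summary but is not confirmed in this section. The function names and the toy SHAP matrix are hypothetical; the ranks passed to `average_rank` are taken from the CEO pay, Leverage, and Cash rows of the table.

```python
def mean_abs_importance(shap_rows):
    """Mean absolute SHAP value per feature (assumed importance measure).

    shap_rows: one list of per-feature SHAP values per observation.
    """
    n_obs = len(shap_rows)
    n_features = len(shap_rows[0])
    return [
        sum(abs(row[j]) for row in shap_rows) / n_obs
        for j in range(n_features)
    ]


def rank_descending(scores):
    """Rank 1 goes to the largest score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0] * len(scores)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def average_rank(rank1, rank2):
    """Element-wise mean of two rank lists."""
    return [(a + b) / 2 for a, b in zip(rank1, rank2)]


# Toy SHAP matrix (hypothetical): 2 observations, 2 features.
toy = [[0.5, -0.1], [-0.3, 0.2]]
importance = mean_abs_importance(toy)
print(importance, rank_descending(importance))

# Rank1 and Rank2 for CEO pay, Leverage, and Cash, as listed in Table 4.
avg = average_rank([9, 10, 7], [6, 8, 12])
print(avg)  # [7.5, 9.0, 9.5], matching the Average Rank column
```
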
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Y.; Xue, H.; Wei, S.; Wang, R.; Liu, F. A Machine Learning Approach for Investigating the Determinants of Stock Price Crash Risk: Exploiting Firm and CEO Characteristics. Systems 2024, 12, 143. https://doi.org/10.3390/systems12050143

AMA Style

Li Y, Xue H, Wei S, Wang R, Liu F. A Machine Learning Approach for Investigating the Determinants of Stock Price Crash Risk: Exploiting Firm and CEO Characteristics. Systems. 2024; 12(5):143. https://doi.org/10.3390/systems12050143

Chicago/Turabian Style

Li, Yan, Huiyuan Xue, Shiyu Wei, Rongping Wang, and Feng Liu. 2024. "A Machine Learning Approach for Investigating the Determinants of Stock Price Crash Risk: Exploiting Firm and CEO Characteristics" Systems 12, no. 5: 143. https://doi.org/10.3390/systems12050143
