Text Mining for U.S. Pension De-Risking Analysis
Abstract
:1. Introduction
2. Literature Review
2.1. Research Related to Pension Plan De-Risking Strategies
2.2. Text Mining of Financial Documents
2.3. Machine Learning in Text Classification
3. Research Methodology
3.1. Data Collection
3.2. Level 1 and Level 2 Filters
3.3. Machine Learning
3.3.1. Data Pre-Processing
- 2-g: “board took”, “took action”, “action termin”, “termin db”, “db plan”;
- 3-g: “board took action”, “took action termin”, “action termin db”, “termin db plan”.
3.3.2. Model Training and Testing
Algorithm Selection
Model Training
Model Testing
3.4. Level 3 Filter and Manual Process
4. Empirical Analysis and Implications
4.1. Impacts of Pension De-Risking on Firms’ Performance
4.2. Implications
5. Conclusions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A
Variables | Variable Definitions |
---|---|
equals 1 if firm has one or more de-risking activities in year equals 0 for all the observations of the non-derisking firms and the observations of the de-risking firms in the years when they do not conduct any de-risking activity. | |
Pension Assets (PA) | Calculated as sum of overfunded and underfunded pension assets (PPLAO + PPLAU before 1997, and PPLAO after 1997). |
Pension Liabilities (PL) | Calculated as the sum of overfunded and underfunded pension benefit obligation (PBPRO + PBPRU before 1997, and PBPRO after 1997). |
Pension Underfunding Ratio | Defined as the ratio of difference between PA and PL to PA. |
Total Assets | Defined as logarithm of book value of firm total assets with CPI-adjustment. |
Leverage | Defined as the book value of firm debt divided by the sum of market value of firm equity and the book value of firm debt. |
Profitability | Defined as firm earnings before interest, tax, depreciation, and amortization (EBITDA) divided by the book value of firm assets. |
Earnings Volatility | Defined as standard deviation of firms’ earnings (first difference of EBITDA ratio) during the four-year period before each of the firms’ fiscal year-ends. |
Cash Holding | The ratio of cash plus marketable securities to total assets. |
No-cash Working Capital | The ratio of working capital net of cash to total assets. |
Tangible Assets | Defined as the book value of firms’ tangible assets divided by the book value of firms’ total assets. |
Capital Expenditure | The ratio of capital expenditure to total assets. |
Sales Growth | The annual growth rate of a firm’s total sales. |
Private Debt | The ratio of private debt capital to the market value of assets. The private debt is calculated using total debt minus the amount of notes, subordinated debt, debentures and commercial papers. |
Credit Rating | Computed using a conversion process in which AAA-rated bonds are assigned a value of 22 and D-rated bonds receive a value of one, following Klock et al. (2005). |
Stock Return Volatility | Defined as the standard deviation of firm equity monthly returns during the 24-month period before each of firms’ fiscal year-ends. |
Equity Excess Return | It follows the method in Faulkender and Wang (2006) to estimate a firm’s annualized stock returns subtracted by the benchmark returns of Fama and French (1993) size and book-to-market matched portfolios during the same time period. |
Appendix B
Document Frequency | Total Frequency | |
---|---|---|
2-g | ||
benefit_pension | 111 | 148 |
benefit_plan | 246 | 570 |
benefit_retir | 80 | 110 |
compani_s | 159 | 300 |
compani_termin | 83 | 98 |
contribut_plan | 90 | 166 |
death_disabl | 89 | 166 |
defer_compens | 104 | 168 |
defin_benefit | 191 | 327 |
defin_contribut | 104 | 191 |
defin_section | 98 | 217 |
employ_termin | 89 | 144 |
employe_benefit | 106 | 191 |
employe_pension | 82 | 138 |
financi_statement | 92 | 106 |
mean_section | 84 | 147 |
particip_s | 80 | 291 |
pension_benefit | 97 | 185 |
pension_plan | 295 | 766 |
plan_termin | 194 | 295 |
profit_share | 124 | 186 |
retir_benefit | 117 | 224 |
retir_plan | 196 | 350 |
retir_termin | 106 | 165 |
s_employ | 83 | 154 |
section_erisa | 92 | 247 |
set_forth | 151 | 299 |
stock_option | 126 | 277 |
termin_employ | 209 | 652 |
termin_plan | 96 | 132 |
year_end | 89 | 143 |
3-g | ||
benefit_pension_plan | 107 | 137 |
defin_benefit_pension | 99 | 125 |
defin_benefit_plan | 85 | 162 |
defin_contribut_plan | 82 | 148 |
employe_benefit_plan | 91 | 128 |
References
- Allahyari, Mehdi, Seyedamin Pouriyeh, Mehdi Assefi, Saied Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, and Krys Kochut. 2017. A brief survey of text mining: Classification, clustering and extraction techniques. arXiv arXiv:1707.02919. [Google Scholar]
- Atanasova, Christina, and Karel Hrazdil. 2010. Why do healthy firms freeze their defined-benefit pension plans? Global Financial Journal 21: 293–303. [Google Scholar] [CrossRef]
- Bollen, Johan, and Mao Huina. 2011. Twitter mood as a stock market predictor. Computer 44: 91–94. [Google Scholar] [CrossRef]
- Cantor, David R., Frederick M. Hood, and Mark L. Power. 2017. Annuity buyouts: An empirical analysis. Investment Guides 2017: 10–20. [Google Scholar]
- Chang, Chih-Chung, and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2: 1–27. [Google Scholar] [CrossRef]
- Chintalapudi, Nalini, Gopi Battineni, and Francesco Amenta. 2021. Sentimental analysis of COVID-19 tweets using deep learning models. Infectious Disease Reports 13: 329–39. [Google Scholar] [CrossRef]
- Choy, Helen, Juichia Lin, and Micah S. Officer. 2014. Does freezing a defined benefit pension plan affect firm risk? Journal of Accounting and Economics 57: 1–21. [Google Scholar] [CrossRef]
- Comprix, Joseph, and Karl A. Muller III. 2011. Pension plan accounting estimates and the freezing of defined benefit pension plans. Journal of Accounting and Economics 51: 115–33. [Google Scholar] [CrossRef]
- Das, Sanjiv R., and Mike Y. Chen. 2007. Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science 53: 1375–88. [Google Scholar] [CrossRef] [Green Version]
- Fama, Eugene F., and Kenneth R. French. 1993. Common risk factors in the returns on stocks and bonds. Journal of Financial Economics 33: 3–56. [Google Scholar] [CrossRef]
- Faulkender, Michael, and Rong Wang. 2006. Corporate financial policy and the value of cash. Journal of Finance 61: 1957–90. [Google Scholar] [CrossRef]
- Ghasiya, Piyush, and Koji Okamura. 2021. Investigating COVID-19 news across four nations: A topic modeling and sentiment analysis approach. IEEE Access 9: 36645–56. [Google Scholar] [CrossRef] [PubMed]
- Hagenau, Michael, Michael Liebmann, and Dirk Neumann. 2013. Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems 55: 685–69. [Google Scholar] [CrossRef]
- Hotho, Andreas, Andreas Nürnberger, and Gerhard Paaß. 2005. A brief survey of text mining. LDV Forum 20: 19–62. [Google Scholar]
- Huang, Chenn-Jung, Jia-Jian Liao, Dian-Xiu Yang, Tun-Yu Chang, and Yun-Cheng Luo. 2010. Realization of a news dissemination agent based on weighted association rules and text mining techniques. Expert Systems with Applications 37: 6409–13. [Google Scholar] [CrossRef]
- Jallan, Yashovardhan, and Baabak Ashuri. 2020. Text mining of the securities and exchange commission financial filings of publicly traded construction firms using deep learning to identify and assess risk. Journal of Construction Engineering and Management 146: 04020137. [Google Scholar] [CrossRef]
- Jiang, Ming, Junlei Wu, Xiangrong Shi, and Min Zhang. 2019. Transformer based memory network for sentiment analysis of web comments. IEEE Access 7: 179942–53. [Google Scholar] [CrossRef]
- Joachims, Thorsten. 1998. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning. Berlin and Heidelberg: Springer, pp. 137–42. [Google Scholar]
- jsoup: Java HTML Parser. n.d. Available online: https://jsoup.org/ (accessed on 3 December 2020).
- Klock, Mark S., Sattar A. Mansi, and William. F. Maxwell. 2005. Does corporate governance matter to bondholders? Journal of Financial and Quantitative Analysis, 693–719. [Google Scholar] [CrossRef]
- Kloptchenko, Antonina, Tomas Eklund, Jonas Karlsson, Barbro Back, Hannu Vanharanta, and Ari Visa. 2004. Combining data and text mining techniques for analysing financial reports. Intelligent Systems in Accounting, Finance and Management 12: 29–41. [Google Scholar] [CrossRef]
- Kumar, B. Shravan, and Vadlamani Ravi. 2016. A survey of the applications of text mining in financial domain. Knowledge-Based Systems 14: 128–47. [Google Scholar] [CrossRef]
- Leo, Martin. 2020. Operational Resilience Disclosures by Banks: Analysis of Annual Reports. Risks 8: 128. [Google Scholar] [CrossRef]
- Naseem, Usman, Imran Razzak, Katarzyna Musial, and Muhammad Imran. 2020. Transformer based Deep Intelligent Contextual Embedding for Twitter sentiment analysis. Future Generation Computer Systems 113: 58–69. [Google Scholar] [CrossRef]
- Nassirtoussi, Arman Khadjeh, Saeed Aghabozorgi, Teh Ying Wah, and David Chek Ling Ngo. 2014. Text mining for market orediction: A systematic review. Expert Systems with Applications 41: 7653–70. [Google Scholar] [CrossRef]
- RapidMiner. n.d. Available online: https://rapidminer.com/ (accessed on 3 December 2020).
- Schumaker, Robert P., and Hsinchun Chen. 2009. Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions of Information Systems 27: 1–19. [Google Scholar] [CrossRef]
- Sebastiani, Fabrizio. 2002. Machine learning in automated text categorization. ACM Computing Surveys 34: 1–47. [Google Scholar] [CrossRef]
- Singh, Mrityunjay, Amit Kumar Jakhar, and Shivam Pandey. 2021. Sentiment analysis on the impact of coronavirus in social life using the BERT model. Social Network Analysis and Mining 11: 33. [Google Scholar] [CrossRef]
- Stanford NLP Group. n.d. CoreNLP. Available online: https://stanfordnlp.github.io/CoreNLP/ (accessed on 3 December 2020).
- Tang, Yuchun, Yan-Qing Zhang, Nitesh V. Chawla, and Sven Krasser. 2009. SVMs modeling for highly imbalanced classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39: 281–88. [Google Scholar] [CrossRef] [Green Version]
- Tetlock, Paul C., Maytal Saar-tsechansky, and Sofus Macskassy. 2008. More than words: Quantifying language to measure firms’ fundamentals. The Journal of Finance 63: 1437–67. [Google Scholar] [CrossRef]
- Tharwat, Alaa, Aboul Ella Hassanien, and Basem E. Elnaghi. 2017. A BA-based algorithm for parameter optimization of Support Vector Machine. Pattern Recognition Letters 93: 13–22. [Google Scholar] [CrossRef]
- Tian, Ruilin, and Jeffrey (Jun) Chen. 2020. De-Risking Strategies of Defined Benefit Plans: Empirical Evidence from the United States. Schaumburg: Society of Actuaries. [Google Scholar]
- Türegün, Nida. 2019. Text mining in financial information. Current Analysis on Economics & Finance 1: 18–26. [Google Scholar]
- U.S. Department of Labor. n.d. Pension Protection Act (PPA). Available online: https://www.dol.gov/agencies/ebsa/laws-and-regulations/laws/pension-protection-act (accessed on 3 December 2020).
- U.S. Securities and Exchange Commission. n.d. About EDGAR. Available online: https://www.sec.gov/edgar/about (accessed on 25 November 2020).
- Vafeas, Nikos, and Adamos Vlittis. 2018. Independent directors and defined benefit pension plan freezes. Journal of Corporate Finance 50: 505–18. [Google Scholar] [CrossRef]
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30: 5998–6008. [Google Scholar]
- Vu, Tien Thanh, Shu Chang, Quang Thuy Ha, and Nigel Collier. 2012. An experiment in integrating sentiment features for tech stock prediction in Twitter. In The Workshop on Information Extraction and Entity Analytics on Social Media Data. Mumbai: The COLING 2012 Organizing Committee, pp. 23–38. [Google Scholar]
- Werner, Antweiler, and Murray Z. Frank. 2004. Is all that talk just noise? The information content of internet stock message board. Journal of Finance 10: 1259–94. [Google Scholar]
- Wu, Gang, and Edward Y. Chang. 2005. KBA: Kernel boundary alignment considering imbalanced data distribution. IEEE Transactions on Knowledge and Data Engineering 17: 786–95. [Google Scholar] [CrossRef] [Green Version]
- Zhai, Yu Zheng, Arthur L Hsu, and Saman K Halgamuge. 2007. Combining news and technical indicators in daily stock price trends prediction. Paer presented at 4th International Symposium on Neural Networks: Advances in Neural Networks, Part III, Nanjing, China, June 3–7; Berlin and Heidelberg: Springer, pp. 1087–96. [Google Scholar]
- Zhang, Wen, Taketoshi Yoshida, and Xijin Tang. 2011. A comparative study of TF*IDF, LSI and multi-words for text classification. Expert Systems with Applications 38: 2758–65. [Google Scholar] [CrossRef]
- Zheng, Ying, and Harry Zhou. 2012. An intelligent text mining system applied to SEC docuemnts. Paper presented at IEEE/ACIS 11th International Conference on Computer and Information Science, Shanghai, China, June 8. [Google Scholar]
Total No. of Filings Scanned | Total No. of Level 1 Results | Total No. of Level 2 Results | Total Process Time of Level 1 and Level 2 Filters | Average Process Time per Filing |
---|---|---|---|---|
18,351,566 | 1,892,026 | 881,942 | 15,002 h | 1.94 s |
Strategy | Search Keywords |
---|---|
Shift | shift, switch |
Freeze | freeze |
Termination | cease, terminate, termination, wind up |
Buyout | buyout, buy-out, buy out |
Buyin | buyin, buy-in, buy in |
Longevity hedge | longevity |
Extended Basic Keywords |
---|
DB |
DC |
Defined benefit |
Defined contribution |
Pension |
Retirement |
Shift | Freeze | Terminate | Buyout | Buyin | LH | Total | |
---|---|---|---|---|---|---|---|
Number of cases | 20,980 | 56,302 | 832,355 | 19,539 | 291 | 6311 | 935,778 |
True Positive | True Negative | |
---|---|---|
Predicted Positive | 303 (TP) | 16 (FP) |
Predicted Negative | 45 (FN) | 436 (TN) |
Accuracy | Precision | Recall | Specificity | F-Measure | |
---|---|---|---|---|---|
Linear SVM (C = 0.0) | 92.4% | 95.0% | 87.1% | 96.5% | 0.91 |
kNN (similarity measure = cosine, k = 6) | 88.4% | 87.2% | 85.9% | 90.3% | 0.87 |
NB | 83.4% | 83.9% | 76.4% | 88.7% | 0.80 |
DT | 81.0% | 88.3% | 64.9% | 93.4% | 0.75 |
Penalty Parameter (C) | Accuracy |
---|---|
0 | 92.4% |
0.05 | 56.8% |
0.10 | 63.4% |
0.15 | 76.6% |
0.20 | 85.9% |
0.25 | 89.5% |
0.30 | 90.5% |
0.35 | 91.1% |
0.40 | 91.6% |
0.45 | 90.8% |
0.50 | 92.0% |
k-Value | Canberra Distance | Chebyshev Distance | Correlation Similarity | Cosine Similarity | Dice Similarity | Euclidean Distance |
---|---|---|---|---|---|---|
1 | 43.5% | 79.6% | 88.4% | 87.8% | 89.6% | 87.6% |
2 | 56.5% | 80.3% | 82.8% | 86.6% | 85.9% | 88.0% |
3 | 56.5% | 72.5% | 87.1% | 87.1% | 88.3% | 89.4% |
4 | 56.5% | 71.0% | 83.1% | 88.6% | 86.8% | 87.6% |
5 | 56.5% | 67.3% | 87.3% | 89.0% | 88.6% | 87.8% |
6 | 56.5% | 68.4% | 86.4% | 90.0% | 85.5% | 88.4% |
7 | 56.5% | 65.9% | 87.4% | 87.5% | 88.5% | 87.6% |
8 | 56.5% | 66.5% | 86.3% | 89.3% | 87.3% | 88.4% |
9 | 56.5% | 64.6% | 87.4% | 87.8% | 87.6% | 87.0% |
10 | 56.5% | 65.5% | 85.9% | 88.0% | 87.1% | 88.0% |
11 | 56.5% | 63.1% | 88.1% | 88.1% | 87.9% | 87.3% |
12 | 56.5% | 62.9% | 86.5% | 88.0% | 86.9% | 88.4% |
13 | 56.5% | 62.6% | 87.9% | 88.9% | 87.4% | 85.8% |
14 | 56.5% | 63.6% | 86.9% | 88.3% | 87.4% | 87.9% |
15 | 56.5% | 62.5% | 89.1% | 88.5% | 88.5% | 88.3% |
16 | 56.5% | 62.6% | 86.8% | 87.8% | 87.0% | 88.8% |
17 | 56.5% | 60.5% | 87.1% | 87.1% | 88.1% | 87.5% |
18 | 56.5% | 62.0% | 86.8% | 87.5% | 87.4% | 88.6% |
19 | 56.5% | 60.9% | 87.6% | 87.5% | 87.8% | 87.9% |
20 | 56.5% | 61.1% | 86.1% | 87.1% | 87.1% | 88.1% |
Accuracy | Precision | Recall | Specificity | F-Measure | |
---|---|---|---|---|---|
Pruning < 10% and > 90%, without class weighting | 86.4% | 72.6% | 78.6% | 89.2% | 0.75 |
Pruning < 5% and > 95%, without class weighting | 89.3% | 79.0% | 81.6% | 92.1% | 0.80 |
Pruning < 5% and > 95%, with class weighting | 93.3% | 92.8% | 79.6% | 97.9% | 0.86 |
Accuracy | Precision | Recall | Specificity | F-Measure | |
---|---|---|---|---|---|
Linear SVM model testing | 93.3% | 75.0% | 95.8% | 92.8 | 0.84 |
Shift | Freeze | Terminate | Buyout | Buyin | LH | Total | |
---|---|---|---|---|---|---|---|
Cases Identified by Level 1 and Level 2 Filters | 20,980 | 56,302 | 832,355 | 19,539 | 291 | 6311 | 935,778 |
True De-risking Cases | 467 | 8649 | 1735 | 138 | 28 | 5 | 11,022 |
% of True Cases | 2.2% | 15.4% | 0.2% | 0.7% | 9.6% | 0.1% | 1.2% |
Underfunding Ratio | Profitability | Stock Ret Vol. | Credit Rating | Excess Return | |
---|---|---|---|---|---|
0.024 *** | −0.008 *** | 0.009 *** | −0.522 *** | 0.011 | |
(3.640) | (−5.437) | (8.297) | (−7.622) | (1.024) | |
Total Assets | −0.017 *** | 0.004 *** | −0.006 *** | 1.114 *** | 0.000 |
(−11.071) | (11.508) | (−23.394) | (56.191) | (0.027) | |
Leverage | 0.121 *** | −0.099 *** | 0.076 *** | −5.232 *** | −0.339 *** |
(5.842) | (−20.951) | (18.399) | (−23.078) | (−8.597) | |
Profitability | −0.214 *** | −0.091 *** | 14.652 *** | 0.687 *** | |
(−6.155) | (−14.979) | (34.247) | (11.994) | ||
Earnings Volatility | 0.042 | 0.067 ** | 0.195 *** | −3.173 *** | −1.131 *** |
(0.353) | (2.451) | (8.587) | (−2.044) | (−5.247) | |
Cash Holding | 0.090 *** | 0.029 *** | 0.031 *** | −5.966 *** | −0.094 ** |
(3.112) | (4.419) | (6.374) | (−16.264) | (−2.076) | |
No-cash Working Capital | 0.038 * | 0.017 *** | −0.003 | −1.617 *** | −0.105 *** |
(1.849) | (3.563) | (−0.702) | (−6.273) | (−3.101) | |
Tangible Assets | −0.128 *** | −0.052 *** | 0.023 *** | 0.823 *** | 0.036 |
(−7.344) | (−13.014) | (7.583) | (4.543) | (1.286) | |
Capital Expenditure | 0.434 *** | 0.461 *** | 0.048 *** | 0.277 *** | −0.705 *** |
(6.972) | (33.076) | (4.438) | (0.390) | (−6.855) | |
Sales Growth | −0.001 | 0.001 ** | 0.002 *** | −0.315 * | 0.015 *** |
(−0.307) | (2.151) | (4.888) | (−4.869) | (2.902) | |
Private Debt | −0.045 * | −0.017 | 0.009 * | 0.481 *** | −0.144 *** |
(−1.762) | (−2.817) | (1.837) | (1.762) | (−3.063) | |
Constant | 0.539 *** | 0.112 *** | 0.101 *** | 3.642 *** | 0.047 |
(14.596) | (13.307) | (15.907) | (9.411) | (0.771) | |
Industry-fixed Effect | Yes | Yes | Yes | Yes | Yes |
Time-fixed Effect | Yes | Yes | Yes | Yes | Yes |
Number of obs. | 15,444 | 16,279 | 14,023 | 10,513 | 13,340 |
0.284 | 0.308 | 0.347 | 0.551 | 0.062 |
Panel A: One-Year Lead Dependent Variable | |||||
Underfunding Ratio | Profitability | Credit Rating | Stock Ret Vol. | Excess Equity Return | |
0.026 *** | −0.005 *** | −0.519 *** | 0.007 *** | 0.023 ** | |
(3.719) | (−3.187) | (−7.322) | (6.261) | (2.264) | |
Number of obs. | 13,800 | 14,458 | 9554 | 12,573 | 11,998 |
0.271 | 0.272 | 0.560 | 0.326 | 0.024 | |
Panel B: Three-Year Forward Moving Average Dependent Variable | |||||
Underfunding Ratio | Profitability | Credit Rating | Stock Ret Vol. | Excess Equity Return | |
0.027 *** | −0.007 *** | −0.492 *** | 0.007 *** | 0.012 * | |
(4.348) | (−4.677) | (−7.640) | (7.886) | (1.702) | |
Number of obs. | 15,586 | 16,279 | 10,883 | 14,199 | 13,526 |
0.252 | 0.310 | 0.575 | 0.392 | 0.063 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhang, L.; Tian, R.; Chen, J. Text Mining for U.S. Pension De-Risking Analysis. Risks 2022, 10, 41. https://doi.org/10.3390/risks10020041
Zhang L, Tian R, Chen J. Text Mining for U.S. Pension De-Risking Analysis. Risks. 2022; 10(2):41. https://doi.org/10.3390/risks10020041
Chicago/Turabian StyleZhang, Limin, Ruilin Tian, and Jun Chen. 2022. "Text Mining for U.S. Pension De-Risking Analysis" Risks 10, no. 2: 41. https://doi.org/10.3390/risks10020041
APA StyleZhang, L., Tian, R., & Chen, J. (2022). Text Mining for U.S. Pension De-Risking Analysis. Risks, 10(2), 41. https://doi.org/10.3390/risks10020041