Intelligent Risk Evaluation for Investment Banking IPO Business Based on Text Analysis

Zhang, Lei; Wang, Chao; Liu, Xiaoxing

doi:10.3390/info15080498


by Lei Zhang ¹, Chao Wang ²,³,* and Xiaoxing Liu ²

¹ School of Cyber Science and Engineering, Southeast University, Nanjing 210096, China

² School of Economics and Management, Southeast University, Nanjing 210096, China

³ School of Finance, Nanjing Agricultural University, Nanjing 210095, China

* Author to whom correspondence should be addressed.

Information 2024, 15(8), 498; https://doi.org/10.3390/info15080498

Submission received: 25 July 2024 / Revised: 13 August 2024 / Accepted: 19 August 2024 / Published: 20 August 2024

(This article belongs to the Section Information Applications)


Abstract:

This paper proposes an intelligent method for analyzing the risk of investment banking IPO business. The method constructs a text quality analysis system and a company quality analysis system from the prospectus and combines text analysis technology with machine learning. Taking the Sci-Tech Innovation Board in China as a sample, the empirical results show that both the text quality and the company quality disclosed in the prospectus affect the withdrawal rate of investment banking IPO business. By applying text analysis and machine learning to the text quality and company quality features, the risk of investment banking IPO business can be predicted intelligently and effectively. The research results can improve the efficiency of investment banking IPO business and save resource costs, and can also improve the standardization and authenticity of investment banking IPO business.

Keywords:

natural language processing; machine learning; prospectus; withdrawal rate of IPO

1. Introduction

Cultivating first-class investment banks is a main starting point for promoting the prosperity of the capital market, and it is also an important link in creating capital channels and promoting the development of the real economy. Investment banks have diversified their income in recent years, earning revenue from consulting fees, SEOs (seasoned equity offerings), bond issuances, and general advisory services. In 2023, the total revenue of China’s investment banking was CNY 405.90 billion, of which the IPO (initial public offering) business contributed CNY 54.16 billion, or about 13.34%. IPOs are therefore one of the main businesses of investment banking, and identifying the risks of IPO business efficiently and accurately has become the key to building a first-class investment bank. The capital market reform characterized by improved information disclosure provides an opportunity for this. In China’s financial market, government departments adopted the document “Overall Implementation Plan for Setting Up the Science and Technology Innovation Board in Shanghai Stock Exchange and Pilot Registration System” in 2019, marking the establishment of the Sci-Tech Innovation Board. According to this document, the Sci-Tech Innovation Board takes “information disclosure as the center” and sets up an interactive question-and-answer session, requiring IPO companies to supplement accurate and complete company information to the market in a timely manner. Through audit inquiries, the regulatory authorities conduct multiple rounds of questioning and require supplementary information on the problems found in the prospectuses of IPO companies, to protect the interests of investors and improve market transparency. This regulatory method has become one of the main means of reviewing Sci-Tech Innovation Board listings.

The risk of IPO business in investment banking is concentrated in the production of filing documents, and the core content that runs through all the filing materials is concentrated in the prospectus. Before applying for an IPO, a company has relatively little public information in the market, and the prospectus is the first systematic information disclosure made by the company. A prospectus plays a great role in reducing information asymmetry about the company and is a fundamental factor in ensuring the smooth operation of the stock market. Under the current practice model of investment banking, it is difficult to accurately quantify the quality of disclosure in prospectuses, which is a pressing issue for both regulatory authorities and investment banks. Investment banks hope to strengthen their role as capital market gatekeepers by blocking unqualified projects before filing. Doing so raises the success rate of underwriting and lowers the risk borne by the investment bank, while recommending more high-quality projects for listing reduces the investment risk of investors. Therefore, effectively evaluating the quality of a company’s IPO project through its prospectus is of great practical significance. However, existing research lacks quantitative methods for measuring the quality of prospectus disclosure.

Regarding the quality of information disclosure in a prospectus, the traditional manual audit and inquiry method requires a lot of human resources, is highly subjective, and is prone to human error. Artificial intelligence technology provides an important research paradigm for reducing both the workload of employees and human errors. The introduction of text analysis techniques and machine learning methods can make the quantification of information disclosure quality in prospectuses more intelligent. Therefore, this paper conducts a text analysis of the prospectuses of investment banking IPO business from the perspectives of text quality and company quality, and combines it with a machine learning model to provide an intelligent method for evaluating the risk of investment banking IPO business. The central question addressed in this study is how to identify the risk of an investment bank’s IPO business through text analysis and machine learning. The study suggests that investment banks may add text analysis as one tool of risk assessment. This will not only help reduce labor costs but also enhance the standardization and quality of information disclosure in the investment banking IPO business. It can improve the risk management capability of investment banking IPO business and provide a more effective guarantee for financial security in the capital market.

The main innovations of this paper are as follows: this paper proposes a specific method for quantitatively evaluating the disclosure quality of prospectuses by deeply mining the disclosure features of text quality and company quality contained in prospectuses through text analysis technology. On this basis, combined with the machine learning method, an intelligent evaluation system for the IPO business risk of investment banking based on prospectus is constructed, which improves the effectiveness and efficiency of IPO risk evaluation for investment banking.

2. Literature Review

Text analysis is the process of mining the information content of text in depth by using appropriate technical methods. It covers, among other things, the readability [1,2], similarity [3,4], emotional tone [5,6], and semantic features [7,8] of text. Based on these features, Loughran and McDonald [9] gave an overview of text analysis in the field of accounting. Guo et al. [10] and Gentzkow et al. [11] introduced the application of text analysis in finance and economics, respectively. Many types of text are relevant to finance, and the existing literature has conducted text analysis in the financial field mainly based on the disclosed text information of listed companies (such as financial reports, periodic reports, interim announcements, and prospectuses), financial reports and financial information in the media, social network text, online search indexes, etc. [12,13,14,15,16]. The above studies provide important references for the understanding and application of text analysis methods.

The development of text analysis techniques has also provided new means for financial risk management [17]. Some studies have mined and parsed the risk information hidden behind financial data through text analysis techniques [18,19,20]. Financial risk management based on text analysis technology cannot be separated from the support of artificial intelligence technology. Large volumes of data with low value density need to be processed by machine learning methods, and multi-dimensional and multi-form data also provide training samples for financial risk management based on artificial intelligence technology. Bayesian networks [21], artificial neural networks [22], support vector machines [23,24], decision trees and random forests [25], deep learning [26], and other artificial intelligence methods have been applied successively to financial risk management. The results show that artificial intelligence methods significantly improve the accuracy of risk identification.

Despite the wide application of text analysis in finance, its application in financial risk management has mainly focused on online public opinion risk [27,28,29], and relatively few applications have addressed the identification of risk factors in the text of company disclosures. Text analysis of company disclosures mainly focuses on text quality; the mining of company disclosure information is insufficient, and text analysis of company quality is lacking. Therefore, this paper focuses on text analysis based on prospectuses and identifies risk factors in investment banking IPO business by mining the text quality and company quality features in disclosures.

3. Methodology

3.1. Construction of Text Quality Analysis System Based on Prospectus

The core of the registration system is full, accurate, and timely information disclosure. There are significant differences in the quality of information disclosure among different investment banks. Every year, the China Securities Regulatory Commission issues many fines against investment banks with poor-quality practices. Many of these fines are caused by low-level errors such as inconsistencies in the prospectus and data that do not match. The text quality of the prospectus disclosed by an investment bank can affect the withdrawal rate of its IPO business. Therefore, this paper first constructs a text quality analysis system based on the prospectus.

There have been many studies that initially explored the text quality of listed companies’ disclosures through text analysis techniques [30]. By summarizing the text quality analysis indicators of these studies and combining them with the features of prospectuses, this paper constructs the text quality analysis system based on prospectus at the levels of Chinese characters, vocabulary, sentences, and chapters.

3.1.1. Chinese Characters Level

The validity of Chinese characters: A prospectus usually contains a lot of invalid Chinese characters such as spaces, punctuation marks, and auxiliary words. These invalid characters are also known as stop words in text analysis. In general studies, the removal of stop words is the basis of text analysis. It is necessary to remove these invalid Chinese characters in the process of word segmentation, which is realized based on a corpus of stop words built by combining the mainstream Chinese stop word lists, namely, the stop word lists released by the Natural Language Processing Laboratory of the Harbin Institute of Technology, Baidu Inc., the School of Information of Renmin University of China, and the Machine Intelligence Laboratory of Sichuan University, respectively. Therefore, the description of text quality at the level of Chinese characters in this paper includes the number of valid characters.

The commonality of Chinese characters: According to their frequency of occurrence, Chinese characters can be divided into common characters, sub-common characters, and rare characters. Since a prospectus includes only a few rare characters, this paper measures the commonality of Chinese characters by the number of common characters and sub-common characters only. Common characters are a positive indicator, that is, the higher their proportion, the easier the prospectus is to understand. Sub-common characters reduce the fluency of reading and increase its difficulty. Therefore, the higher the number of sub-common characters, the lower the readability of the prospectus. A corpus of common and sub-common characters is constructed using the List of Common Characters in Modern Chinese published in 1988 in China. This list is regarded as one of the bases for the current standard Chinese characters and divides them into common characters and sub-common characters. After removing the stop words, the number of common and sub-common characters is obtained by word segmentation of the prospectus.
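The character-level indicators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `STOP_CHARS`, `COMMON_CHARS`, and `SUBCOMMON_CHARS` are tiny hypothetical stand-ins for the merged stop word lists and the List of Common Characters in Modern Chinese, their memberships are illustrative only, and the function name is invented for this example.

```python
# Tiny hypothetical stand-ins; the real corpora contain thousands of entries
# and the memberships below are illustrative only.
STOP_CHARS = set("的了和是在，。、；： \n")
COMMON_CHARS = set("公司业务收入风险发行")
SUBCOMMON_CHARS = set("赁")


def character_indicators(text):
    """Count valid, common, and sub-common characters in a text."""
    # Valid characters: everything left after dropping stop characters.
    valid = [ch for ch in text if ch not in STOP_CHARS]
    common = sum(1 for ch in valid if ch in COMMON_CHARS)
    subcommon = sum(1 for ch in valid if ch in SUBCOMMON_CHARS)
    return {
        "valid_characters": len(valid),
        "common_characters": common,
        "subcommon_characters": subcommon,
    }


sample = "公司的租赁业务收入。"
indicators = character_indicators(sample)
print(indicators)
```

In practice the three sets would be loaded from the published lists, and the counts would be computed per prospectus.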

3.1.2. Vocabulary Level

Commonality of vocabulary: This is described by the total number of the common vocabulary. Like the commonality of Chinese characters, common vocabulary is a positive indicator. A corpus of common vocabularies is constructed using the List of Common Words in Modern Chinese (2nd Edition) published by the Commercial Press of China in 2021. After removing the stop words, the number of common vocabularies is obtained by word segmentation in the prospectus.

Part of speech complexity: Although adjectives, adverbs, quantifiers, numerals, and other parts of speech belong to content words, the number of such content words usually increases the difficulty of understanding the prospectus. Therefore, the proportion of adjectives, adverbs, quantifiers, and numerals is counted specifically for the description of part of speech complexity.

Vocabulary semantic difficulty: This paper analyzes the text quality of the prospectus. The text involves many professional vocabularies, which makes it difficult to read and understand the prospectus. Therefore, the semantic difficulty of vocabulary is measured by the proportion of financial vocabulary. A corpus of financial vocabulary is constructed according to the commonly used accounting dictionary named the Oxford English–Chinese Accounting Dictionary and the financial dictionary named the Latest Chinese-English Economic and Financial Common Terms. After removing the stop words, the number of financial vocabularies is obtained by word segmentation in the prospectus.
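A minimal sketch of the three vocabulary-level indicators is given below. It assumes the prospectus has already been segmented and part-of-speech tagged, with stop words removed. The word sets are hypothetical stand-ins for the corpora named above, and the tags `a`, `d`, `m`, and `q` (adjective, adverb, numeral, quantifier) are an assumption about the tag set of the tagger in use.

```python
# Hypothetical stand-ins for the common-word and financial-term corpora.
COMMON_WORDS = {"公司", "收入", "增长"}
FINANCIAL_WORDS = {"资产负债率", "存货周转率"}
# Assumed tags for adjectives, adverbs, numerals, and quantifiers.
COMPLEX_POS = {"a", "d", "m", "q"}


def vocabulary_indicators(tagged_tokens):
    """tagged_tokens: list of (word, pos_tag) pairs with stop words removed."""
    total = len(tagged_tokens)
    if total == 0:
        return {"common_words": 0, "complex_pos_ratio": 0.0, "financial_ratio": 0.0}
    common = sum(1 for word, _ in tagged_tokens if word in COMMON_WORDS)
    complex_pos = sum(1 for _, pos in tagged_tokens if pos in COMPLEX_POS)
    financial = sum(1 for word, _ in tagged_tokens if word in FINANCIAL_WORDS)
    return {
        "common_words": common,                    # commonality of vocabulary
        "complex_pos_ratio": complex_pos / total,  # part of speech complexity
        "financial_ratio": financial / total,      # semantic difficulty
    }


tokens = [("公司", "n"), ("资产负债率", "n"), ("显著", "a"), ("增长", "v")]
vocab = vocabulary_indicators(tokens)
print(vocab)
```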

3.1.3. Sentence Level

Syntactic complexity: In a strict sense, syntactic complexity should be expressed by the average syntactic tree height. Due to the difficulties in the classification of Chinese phrases, the syntactic complexity is represented by the commonality of vocabulary instead of the phrase type. First, the average clause data are counted. On this basis, the syntactic complexity is further represented by the proportion of common characters, sub-common characters, common vocabulary, complex vocabulary, and financial vocabulary in each sentence.

Sentence length: This includes the average number of characters and vocabulary in the whole sentence. Punctuation serves as a temporary pause to receive text information. The more words contained in a clause or the whole sentence, the greater the amount of information to be processed, and the higher the difficulty of reading the text. The optimal length of a Chinese sentence is 7 to 12 characters; otherwise, reading is made difficult. Therefore, the longer the average sentence length, the lower the readability of the text.
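The sentence-length indicator can be illustrated with a short sketch. The punctuation sets used to delimit whole sentences and clauses are assumptions made for this example, and lengths are counted in characters with punctuation excluded.

```python
import re

# Assumed delimiters: sentence-ending marks, and clause-level marks.
SENTENCE_END = "。！？"
CLAUSE_BREAK = "，；：" + SENTENCE_END


def split_on(text, delimiters):
    """Split text on any delimiter character, dropping empty pieces."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [piece.strip() for piece in re.split(pattern, text) if piece.strip()]


def average_length(pieces):
    """Average number of non-punctuation characters per piece."""
    if not pieces:
        return 0.0
    lengths = [sum(1 for ch in p if ch not in CLAUSE_BREAK) for p in pieces]
    return sum(lengths) / len(lengths)


text = "公司主营业务稳定，收入持续增长。风险较低。"
sentences = split_on(text, SENTENCE_END)
clauses = split_on(text, CLAUSE_BREAK)
avg_sentence = average_length(sentences)
avg_clause = average_length(clauses)
print(len(sentences), avg_sentence, len(clauses), avg_clause)
```

The same helper can count vocabulary instead of characters if each piece is first segmented into words.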

3.1.4. Chapter Level

This is mainly depicted by the chapter length of the prospectus, and Chinese characters, vocabulary, and sentences are used as calculation units, respectively. The more Chinese characters, vocabulary, and sentences a chapter contains, the longer it is and the higher the cost of information processing.

In summary, the text quality analysis system constructed in this paper for the prospectus of the company in the IPO process is shown in Table 1.

3.2. Construction of Company Quality Analysis System Based on Prospectus

Although company disclosure quality is a popular research topic, relatively few studies have examined disclosure quality based on prospectuses. The few studies that exist mainly take the perspective of text analysis and lack a quantitative analysis of company quality in prospectuses [6,7,8,21]. Therefore, this study starts from the disclosure of information about company quality in the prospectus, proposes a method for constructing a company quality analysis system based on the prospectus, and explores the impact of disclosed information about company quality on the IPO withdrawal rate.

The prospectuses of IPO companies on the Sci-Tech Innovation Board in China have a generally fixed writing mode. According to the “Standards on the Content and Format of Information Disclosure No. 41” document, the fixed format of the prospectus of IPO companies on the Sci-Tech Innovation Board in China usually includes the core contents such as an overview, the basic information of the issuer, business and technology, corporate governance and independence, financial accounting information, management analysis, the use of raised funds and future development planning, investor protection, and other important matters. The historical review and inquiry records of IPO companies on the Sci-Tech Innovation Board in China show that compliance status, industry status, technical status, management status, and financial status are more attractive to regulators and investors. Therefore, this study constructs a company quality analysis system and indicators from these aspects.

3.2.1. Compliance Status

This corresponds to the text analysis and mining of the “company governance and independence” section of the prospectus. For the characterization of compliance status, the coverage and detail of relevant disclosures are quantified by the number of secondary, tertiary, and quaternary titles disclosed and the number of characters in each title. The more detailed the information disclosed in the prospectus, the higher the quality of the issuing company presented in the corporate governance and independence section. Accordingly, the potential risk factors for the investment banking IPO business are lower, and the company is more likely to pass the IPO review.
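The title-based indicators can be sketched as follows. The heading numbering conventions encoded in `HEADING_PATTERNS` (for example “一、” for secondary titles) are an assumption made for illustration and may differ between prospectuses. The sketch also interprets “the number of characters in each title” as the number of body characters under each title, which is likewise an assumption.

```python
import re

# Assumed numbering conventions for heading levels; real prospectuses may differ.
HEADING_PATTERNS = [
    ("secondary", re.compile(r"^[一二三四五六七八九十]+、")),
    ("tertiary", re.compile(r"^（[一二三四五六七八九十]+）")),
    ("quaternary", re.compile(r"^\d+、")),
]


def heading_level(line):
    """Return the heading level of a line, or None for body text."""
    for level, pattern in HEADING_PATTERNS:
        if pattern.match(line):
            return level
    return None


def title_statistics(section_lines):
    """Count headings per level and the body characters under each heading."""
    counts = {"secondary": 0, "tertiary": 0, "quaternary": 0}
    chars_per_title = []  # list of [heading text, body character count]
    for raw in section_lines:
        line = raw.strip()
        if not line:
            continue
        level = heading_level(line)
        if level is not None:
            counts[level] += 1
            chars_per_title.append([line, 0])
        elif chars_per_title:
            # Body text is attributed to the most recent heading.
            chars_per_title[-1][1] += len(line)
    return counts, chars_per_title


section = [
    "一、公司治理制度的建立健全情况",
    "（一）股东大会制度",
    "公司已建立股东大会制度。",
    "1、运行情况",
    "报告期内运行规范。",
    "（二）董事会制度",
    "公司已建立董事会制度。",
]
counts, per_title = title_statistics(section)
print(counts)
```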

3.2.2. Industry Status

Quantification is based on text analysis of the “company’s main business, main products or services” and “basic situation of the industry in which the company operates” in the “business and technology” section of the prospectus. For the feature mining of industry status, firstly, the industry classification of the company is identified through text analysis. The dummy variable for this feature is defined as 1 if the industry it belongs to contains emerging industries that the Sci-Tech Innovation Board focuses on supporting, such as those involving information technology, high-end equipment, new materials, new energy, energy saving and environmental protection, as well as biomedicine, etc., and is defined as 0 otherwise. In addition, industry status is identified and quantified by company market share. For the portrayal of other industry-related content, the number of tertiary and quaternary titles disclosed in the “company’s main business, main products or services” and “basic situation of the industry in which company operates” sections, as well as the number of characters in each title, are also counted.

3.2.3. Technology Status

Quantification is based on the text analysis of the “company’s core technology and research and development” in the “business and technology” section of the prospectus. Following the relevant policy documents, the focus is on identifying whether a company meets four additional indicators, and a dummy variable measuring the company’s core technology and R&D capability is set accordingly. First, we identify the company’s R&D investment and the proportion of R&D investment to the main business income; if the cumulative amount of the last three years exceeds CNY 60 million or the proportion of investment to operating income in the last three years is above 5%, then the value of the dummy variable is increased by 1. Second, we identify the proportion of the existing R&D personnel to the total number of employees; if it exceeds 10%, then the value of the dummy variable is increased by 1. Third, we count the invention patents (including national defense patents) that form the company’s core technology and are applied to its main business; if there are more than five, then the value of the dummy variable is increased by 1. Fourth, we identify the compound growth rate of operating income in the last three years and the operating income in the last year; if the growth rate reaches 20% or the amount reaches CNY 300 million, then the value of the dummy variable is increased by 1. In addition, the number of tertiary and quaternary titles disclosed in the section “core technologies of company’s main products” and the number of characters in each title are counted.
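The scoring rule for the technology indicator can be written as a small function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, monetary amounts are in CNY million, ratios are fractions, and the growth and revenue thresholds are read as “reaches” (greater than or equal to).

```python
def technology_score(rd_total_3y, rd_ratio_3y, rd_staff_ratio,
                     invention_patents, revenue_cagr_3y, revenue_last_year):
    """Return a 0-4 score, adding 1 for each indicator the company meets.

    Monetary amounts are in CNY million; ratios are fractions (0.05 = 5%).
    """
    score = 0
    # R&D investment: cumulative amount over CNY 60 million or ratio above 5%.
    if rd_total_3y > 60 or rd_ratio_3y > 0.05:
        score += 1
    # R&D personnel exceed 10% of all employees.
    if rd_staff_ratio > 0.10:
        score += 1
    # More than five invention patents applied to the main business.
    if invention_patents > 5:
        score += 1
    # Revenue growth reaches 20% or last-year revenue reaches CNY 300 million
    # (reading the thresholds as >= is an assumption).
    if revenue_cagr_3y >= 0.20 or revenue_last_year >= 300:
        score += 1
    return score


example_score = technology_score(
    rd_total_3y=80, rd_ratio_3y=0.03, rd_staff_ratio=0.12,
    invention_patents=3, revenue_cagr_3y=0.25, revenue_last_year=100,
)
print(example_score)
```

In this example the company meets the R&D investment, R&D personnel, and revenue growth indicators but not the patent indicator.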

3.2.4. Management Status

This is quantified based on the text analysis of “directors, supervisors, senior managers, and core technical personnel” in the “company basic situation” section of the prospectus. We identify the number of directors, supervisors, senior managers, and core technical personnel as well as their birth dates, and estimate the working experience of relevant personnel according to their birth dates. We identify whether the prospectus discloses the “shareholdings of directors, supervisors, senior managers and core technical personnel and their close relatives”. In addition, the number of tertiary and quaternary titles disclosed in the section “directors, supervisors, senior managers, and core technical personnel” and the number of characters on each title are counted.

3.2.5. Financial Status

This is quantified based on the text analysis of the “major financial indicators for the reporting period” in the “financial and accounting information and management analysis” section of the prospectus, which specifically includes quantitative indicators such as the current ratio, asset–liability ratio, accounts receivable turnover, inventory turnover, and the growth rate of main business revenue.

It is notable that according to the structure standards of the prospectuses in the Sci-Tech Innovation Board, the primary title usually shows the necessary disclosure content, corresponding to compliance status, industry status, technical status, management status, and financial status in the company quality analysis system. Therefore, the primary title of all prospectuses is basically the same. Under the same primary titles, there are different secondary, tertiary, and quaternary titles. Among them, the secondary title is a common indicator reflecting the detail of information disclosure in different prospectuses. The quaternary title is usually the lowest level of title in the prospectus, reflecting the most detailed information disclosure. In summary, the company quality analysis system based on the prospectus is shown in Table 2.

3.3. Construction of Risk Intelligent Evaluation System for Investment Banking IPO Business

The construction of the risk intelligent evaluation system for investment banking IPO business mainly consists of two processes, that is, the selection of features of the risk evaluation index system for investment banking IPO business and the intelligent prediction of the risk of investment banking IPO business based on the above features. The risk evaluation indicator system for investment banking IPO business integrates prospectus text analysis indicators and company quality analysis indicators and then analyzes the relationship between different indicators and the withdrawal rate for investment banking IPO business through feature selection. On this basis, the results of the company IPO review are predicted by a deep neural network, and the risk of investment banking IPO business is evaluated by the predicted results.

3.3.1. Feature Selection Based on a Random Forest Model

The fused prospectus-based text quality and company quality indicator system has high feature dimensions, and it is difficult to determine which features are more advantageous for portraying the risk of investment banking IPO business. Therefore, the high-dimensional features are selected through the random forest model. Random forest is a very effective ensemble learning method suitable for various regression and classification problems. Random forests can handle high-dimensional data and large-scale data sets, are robust to missing values and outliers, and are not prone to overfitting.

To show the feature screening process, the importance of different features is ranked by the out-of-bag error, and the optimal number of features is determined as follows. Each time a decision tree is built, the data set used to train it is obtained by sampling with replacement, and the prediction error of the model is calculated on the remaining data that did not participate in building the tree, that is, the mean square error of the out-of-bag data. On this basis, noise is randomly added to one feature across all out-of-bag samples, and the mean square error of the out-of-bag data is calculated again. The importance of the feature is then described by the change in the out-of-bag mean square error before and after the perturbation. The importance of each feature is calculated, and a corresponding proportion of the least important features is eliminated to obtain a new feature set. The above process is repeated on the new feature set, and the feature set with the lowest out-of-bag error is selected, which realizes the feature selection of the indicator system constructed in this study.
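The perturb-and-remeasure step described above is commonly called permutation importance. The sketch below illustrates only that step, under stated assumptions: noise is added by shuffling the feature column, a fixed hypothetical predictor (`stand_in_model`) replaces a fitted random forest, and a small synthetic data set replaces the out-of-bag samples.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y_true, feature, rng):
    """Increase in MSE after shuffling one feature column across samples."""
    baseline = mse(y_true, [predict(r) for r in rows])
    shuffled_col = [r[feature] for r in rows]
    rng.shuffle(shuffled_col)
    perturbed = []
    for row, value in zip(rows, shuffled_col):
        noisy = list(row)        # copy so the original data stay intact
        noisy[feature] = value   # replace only the feature under study
        perturbed.append(noisy)
    return mse(y_true, [predict(r) for r in perturbed]) - baseline


def stand_in_model(row):
    # Hypothetical fitted model that uses feature 0 and ignores feature 1.
    return 2.0 * row[0]


rng = random.Random(0)
rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
importances = [
    permutation_importance(stand_in_model, rows, targets, f, rng)
    for f in range(2)
]
print(importances)
```

Features would then be ranked by these values, the least important share dropped, and the model refitted, repeating until the out-of-bag error stops improving.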

3.3.2. Risk Prediction Based on Deep Neural Network

Compared with the random forest model, the neural network model can describe complex nonlinear relationships and interactions among explanatory variables and extract and identify risks involved in the investment banking IPO business. Therefore, the deep neural network model is used to predict the possible risks of investment banking IPO business. To improve the accuracy of the results, a three-layer fully connected network is superimposed as the hidden layer of the deep neural network, and the deep neural network is trained by the backpropagation algorithm.

The specific algorithm process of the deep neural network model is as follows: if the model has $n$ neurons in layer $l$ and $m$ neurons in layer $l - 1$, the matrix composed of the linear coefficients $w$ in layer $l$ is represented as $W_{n \times m}^{l}$, the vector composed of the bias terms $b$ in layer $l$ is represented as $B_{n \times 1}^{l}$, the vector composed of the outputs $a$ of layer $l - 1$ is represented as $A_{m \times 1}^{l - 1}$, the vector composed of the linear outputs $z$ before activation in layer $l$ is represented as $Z_{n \times 1}^{l}$, and the vector composed of the outputs $a$ of layer $l$ is represented as $A_{n \times 1}^{l}$. To sum up, the output of layer $l$ is expressed as

$$A_{n \times 1}^{l} = \sigma(Z_{n \times 1}^{l}) = \sigma(W_{n \times m}^{l} A_{m \times 1}^{l - 1} + B_{n \times 1}^{l}) \tag{1}$$

where $\sigma(\cdot)$ represents the activation function. If there are $k$ neurons in layer $l + 1$ of the model, the recurrence relationship used by the gradient descent algorithm is obtained as follows:

$$\Delta_{n \times 1}^{l} = (W_{k \times n}^{l + 1})^{T} \Delta_{k \times 1}^{l + 1} \odot \sigma'(Z_{n \times 1}^{l}) \tag{2}$$

where the symbol $\odot$ denotes the Hadamard product. Accordingly, the coefficient matrix $W$ and bias term $B$ of each hidden layer and output layer are randomly initialized, the loss function is used to calculate the $\Delta^{L}$ of the output layer, and then the gradient $\Delta_{n \times 1}^{l}$ in layer $l$ can be calculated through the backpropagation algorithm. We can update $W_{n \times m}^{l}$ and $B_{n \times 1}^{l}$ in layer $l$ according to $\Delta_{n \times 1}^{l}$ and obtain

$$W_{n \times m}^{l} = W_{n \times m}^{l} - \alpha \sum_{i = 1}^{s} \Delta_{n \times 1}^{i, l} (A_{m \times 1}^{i, l - 1})^{T} \tag{3}$$

$$B_{n \times 1}^{l} = B_{n \times 1}^{l} - \alpha \sum_{i = 1}^{s} \Delta_{n \times 1}^{i, l} \tag{4}$$

where $\alpha$ represents the iteration step, $s$ represents the number of training samples, and the superscript $i$ indexes the training sample. Sample classification results are obtained when all the change values of $W$ and $B$ are less than the threshold value.
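Equations (1) to (4) can be traced in a small pure-Python sketch. The choices below are assumptions made for illustration and are not taken from the paper: a sigmoid activation, a squared-error loss, a single training sample ($s = 1$), one hidden layer, and arbitrary initial weights. The functions accept any number of layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """Equation (1): A^l = sigma(W^l A^{l-1} + B^l) for each layer."""
    activations = [x]
    for W, B in zip(weights, biases):
        prev = activations[-1]
        z = [sum(w * a for w, a in zip(row, prev)) + b for row, b in zip(W, B)]
        activations.append([sigmoid(v) for v in z])
    return activations


def backward(weights, activations, y):
    """Equation (2): propagate Delta^l from the output layer backwards."""
    out = activations[-1]
    # Output-layer Delta for the loss 0.5 * sum((a - y)^2); for the sigmoid,
    # sigma'(z) = a * (1 - a).
    deltas = [[(a - t) * a * (1 - a) for a, t in zip(out, y)]]
    for l in range(len(weights) - 1, 0, -1):
        W_next, a = weights[l], activations[l]
        delta = [
            sum(W_next[k][j] * deltas[0][k] for k in range(len(W_next)))
            * a[j] * (1 - a[j])
            for j in range(len(a))
        ]
        deltas.insert(0, delta)
    return deltas


def update(weights, biases, activations, deltas, alpha):
    """Equations (3) and (4) for a single training sample (s = 1)."""
    for l, (W, B) in enumerate(zip(weights, biases)):
        prev = activations[l]
        for n in range(len(W)):
            for m in range(len(prev)):
                W[n][m] -= alpha * deltas[l][n] * prev[m]
            B[n] -= alpha * deltas[l][n]


# Arbitrary illustrative parameters: 2 inputs, 3 hidden neurons, 1 output.
weights = [[[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [[0.3, -0.1, 0.2]]]
biases = [[0.0, 0.1, -0.1], [0.05]]
x, y = [1.0, 0.5], [1.0]


def loss(weights, biases):
    out = forward(weights, biases, x)[-1]
    return 0.5 * sum((a - t) ** 2 for a, t in zip(out, y))


before = loss(weights, biases)
for _ in range(100):
    acts = forward(weights, biases, x)
    update(weights, biases, acts, backward(weights, acts, y), alpha=0.5)
after = loss(weights, biases)
print(before, after)
```

A full training loop would sum the updates over all $s$ samples and stop once the changes in $W$ and $B$ fall below the threshold.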

3.3.3. Performance Metrics of Machine Learning

The common performance metrics of machine learning are the accuracy rate, precision rate, and recall rate [31,32,33]. The accuracy rate is the proportion of the number of correctly classified samples to the total number of samples. The precision rate is the proportion of all samples predicted by the model to be positive that are actually positive. It reflects the confidence of the result that the model predicts a positive class. The recall rate is the proportion of samples that are correctly predicted to be positive by the model out of all samples that are actually positive. It reflects the ability of the model to capture positive class samples.

The calculation of performance metrics relies on the confusion matrix. A confusion matrix is a matrix used to summarize the results of a machine learning classifier. For the binary classification confusion matrix applied in this study, the following symbols are defined to describe the relationship between the real value and the predicted value: true negatives (TN) are the number of negative samples correctly predicted as negative; false positives (FP) are the number of negative samples incorrectly predicted as positive; false negatives (FN) are the number of positive samples incorrectly predicted as negative; and true positives (TP) are the number of positive samples correctly predicted as positive. Based on the defined confusion matrix, the accuracy rate, precision rate, and recall rate are calculated as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{6}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{7}$$
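Equations (5) to (7) translate directly into code. The sketch below uses hypothetical confusion matrix counts chosen only for illustration, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, and recall from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts, not results from the paper.
accuracy, precision, recall = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(accuracy, precision, recall)
```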

The MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean square error) are commonly used indicators to measure the difference between the predicted values of the model and the real observed values [32,33,34]. The MAE is obtained by calculating the average of the absolute values of the differences between the predicted values and the real observed values. The advantage of the MAE is that it is less affected by outliers because it uses the absolute value of the difference. The calculation is

$$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |\hat{y}_{i} - y_{i}| \tag{8}$$

where $\hat{y}_{i}$ represents the real observed values, $y_{i}$ represents the predicted values, and $n$ is the number of samples.

The MSE is obtained by calculating the average of the squared differences between the predicted and the real observed values. The advantage of the MSE is that the difference value is squared, so the impact of larger error values on the fit will be greater, which helps to capture the prediction error of the model more sensitively. The calculation is

$$\mathrm{MSE} = \frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2} \tag{9}$$

The RMSE is obtained by calculating the mean of the squared difference between the predicted value and the real observed value and taking its square root. The RMSE has the advantage of having a large penalty for larger error values because it squares the difference values. This can avoid the excessive influence of large error values on the fit. The calculation is

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {({\hat{y}}_{i} - y_{i})}^{2}}

(10)
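The error measures in Equations (8)–(10) can be sketched in a few lines of standard-library Python. The function name `regression_errors` and the sample numbers are hypothetical and serve only to show the arithmetic.

```python
import math


def regression_errors(observed, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [p - o for o, p in zip(observed, predicted)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n   # Equation (8)
    mse = sum(d * d for d in diffs) / n    # Equation (9)
    rmse = math.sqrt(mse)                  # Equation (10)
    return mae, mse, rmse


# Hypothetical inquiry counts, for illustration only.
observed = [5, 6, 4, 7]
predicted = [4, 6, 6, 7]
mae, mse, rmse = regression_errors(observed, predicted)
print(mae, mse, rmse)
```

Because the RMSE is simply the square root of the MSE, the two always rank models in the same order; the RMSE is usually reported because it shares the units of the target variable.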

4. Application of Risk Intelligent Evaluation System for Investment Banking IPO Business

4.1. Data

For all companies applying for an IPO on the China Sci-Tech Innovation Board, we obtained the project information of the issuing company from the content disclosed on the official website, including the company name, review status, acceptance date, inquiry date, listing committee meeting date, registration submission date, and registration effective date. In addition, the documents disclosed in the “Inquiry and reply” section of the official website were counted to obtain the number of review inquiries, audit replies, and legal replies for each issuing company. Through this process, a total of 904 companies that submitted applications to the Sci-Tech Innovation Board for IPO review were collected, of which 567 were listed companies with effective registration. For each applicant company, the application draft of the prospectus was downloaded for text analysis, and the prospectus text was segmented into words with the “jieba” package in Python 3.10.
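The segmentation step can be illustrated with a short sketch that counts a few chapter-level indicators of the kind listed in Table 1. It assumes that `jieba.lcut` is used for word segmentation; the fallback tokenizer, the function name `chapter_indicators`, the punctuation set used to split sentences, and the sample sentence are all illustrative assumptions rather than the authors' pipeline.

```python
import re

try:
    import jieba  # third-party segmenter named in the text

    def segment(text):
        return jieba.lcut(text)
except ImportError:
    # Fallback assumption: treat every non-space character as one token.
    def segment(text):
        return [ch for ch in text if not ch.isspace()]

CJK = r"[\u4e00-\u9fff]"


def chapter_indicators(text):
    """Count characters, tokens, and sentences in one prospectus passage."""
    # Split on Chinese sentence-ending punctuation and drop empty pieces.
    sentences = [s for s in re.split(r"[。！？]", text) if s.strip()]
    # Keep only tokens that contain at least one Chinese character.
    tokens = [t for t in segment(text) if re.search(CJK, t)]
    characters = re.findall(CJK, text)
    avg_chars = len(characters) / len(sentences) if sentences else 0.0
    return {
        "characters": len(characters),
        "vocabularies": len(tokens),
        "sentences": len(sentences),
        "avg_chars_per_sentence": avg_chars,
    }


# Hypothetical two-sentence passage ("The company's main business revenue
# grew steadily. The corporate governance structure is sound.").
sample = "公司主营业务收入稳定增长。公司治理结构完善。"
print(chapter_indicators(sample))
```

The character and sentence counts do not depend on the segmenter, whereas the vocabulary count does, which is why the choice of word segmentation program matters for the vocabulary-level indicators.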

4.2. Statistical Features of Risk Evaluation Indicators for Investment Banking IPO Business

Table 3 gives the statistical features of the Sci-Tech Innovation Board IPO review process, grouped by the industry category stated in the prospectus. In terms of industry classification, the information technology industry has the largest number of IPO companies, accounting for more than one-third of all the companies applying for listing. The numbers of IPO companies in the biomedical industry and the energy-saving and environmental protection industry are also relatively large. These industries have all been sectors supported by the state in recent years. The high-end equipment and new materials industries, both of which belong to the manufacturing industry, have relatively few companies, so the proportion of scientific and innovative companies from the manufacturing industry is relatively small. However, the listing pass rates of companies in the new materials and high-end equipment industries are clearly higher than those of other industries, which reflects the support of the Sci-Tech Innovation Board for manufacturing companies and suggests that manufacturing companies meeting the Sci-Tech Innovation Board listing standards are usually of relatively high quality. Overall, the IPO approval rate of the Sci-Tech Innovation Board is about 64%, and the average passing time is about 8–9 months. In terms of both the passing rate and the passing time, the Sci-Tech Innovation Board provides a more flexible, transparent, and convenient financing channel for science and technology companies. According to the statistics of review inquiries, audit replies, and legal replies, ordinary review inquiries and replies are the most frequent, with an average of about 5.6 per company, followed by audit inquiries and replies, with an average of about 4.59, and legal inquiries and replies, with the smallest average of about 3.24.
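The overall approval rate quoted above can be reproduced from the industry rows of Table 3 as a sample-weighted average. The sketch below copies the passing rates and sample counts from Table 3; the variable names are illustrative.

```python
# (passing rate in %, number of samples) per industry, copied from Table 3.
table3 = {
    "Information technology": (66.98, 318),
    "High-end equipment": (76.92, 13),
    "New materials": (82.05, 39),
    "New energy": (70.33, 91),
    "Energy saving and environmental protection": (70.00, 120),
    "Biomedicine": (62.71, 177),
    "Others": (41.78, 146),
}

total_samples = sum(n for _, n in table3.values())
# Weight each industry's passing rate by its number of applicants.
overall_rate = sum(rate * n for rate, n in table3.values()) / total_samples
it_share = table3["Information technology"][1] / total_samples

print(total_samples)            # 904
print(round(overall_rate, 2))   # 63.61
print(it_share > 1 / 3)         # True
```

A simple unweighted mean of the seven industry rates would overstate the overall rate, because the two industries with the highest pass rates are also the smallest.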

4.3. Feature Selection of Risk Evaluation Indicators for Investment Banking IPO Business

Based on the risk evaluation indicators for investment banking IPO business that integrate text quality analysis and company quality analysis, features are selected with the constructed random forest model. The parameter settings are determined by testing the mean square error (MSE) of the random forest for different numbers of leaves, and the results are shown in Figure 1. Figure 1 shows that the MSE of the random forest remains at a similar level as the number of leaves changes from 5 to 100, so the number of leaves has relatively little effect on the result of feature selection. Since the MSE is generally low when the number of leaves is 20, the number of leaves in the benchmark experiment is set to 20 for the subsequent feature selection.

Based on the set model parameters, the random forest model is further used to rank the importance of each indicator. The indicators are ranked by the ratio of the increase in mean square error caused by replacing an input variable to the standard deviation of that variable; a larger ratio indicates a higher importance of the variable. The focus here is on feature selection using review status as the indicator variable of risk level, for which the results of the analysis are presented; the steps are similar when review time, inquiry replies, and initial underpricing are used as the indicator variables. With the screening threshold for feature selection set to 0.1, Figure 2 presents all the features with an importance level greater than 0.1 that can predict the risk of investment banking IPO business. As can be seen from the figure, the variables obtained after feature selection are, in decreasing order of importance, the number of titles of compliance status, the total number of words of compliance status, the attribute of conforming to the Sci-Tech Innovation Board, the increase rate of the main business revenue, etc. The variables related to compliance status, science and innovation attributes, and financial status are better able to portray the withdrawal rate of an IPO and are more valuable for predicting the review results of investment banking IPO business.
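The idea of ranking features by the error increase they cause can be sketched with a model-agnostic permutation test. This is a simplified illustration and not the authors' implementation: it reads "replacing the input variables" as shuffling one feature column at a time, it omits the standard-deviation normalization described above (so its scores are not on the same scale as Figure 2), and the toy data, the stand-in `model`, and all function names are assumptions.

```python
import random


def mse(model, rows, targets):
    """Mean square error of a prediction function on a dataset."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only feature j permuted.
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def select_features(importances, threshold=0.1):
    """Indices of features whose importance exceeds the threshold."""
    return [j for j, score in enumerate(importances) if score > threshold]


# Toy data: the target depends on feature 0 only; feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]  # stand-in for a fitted random forest

scores = permutation_importance(model, rows, targets)
print(select_features(scores))
```

In this toy setting, shuffling the irrelevant feature leaves the error unchanged, so only the informative feature passes the threshold.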

The compliance status mainly corresponds to the “company governance and independence” section of the prospectus. Corporate governance can improve the operational efficiency of the company. The independence of business, assets, personnel, and finance ensures that the company can operate independently and reduces conflicts of interest. Therefore, the compliance status helps to protect the interests of investors, which is crucial for the company’s IPO approval.

The Sci-Tech Innovation Board in China is positioned to focus on supporting emerging industries such as information technology, high-end equipment, new materials and new energy. It aims to provide more flexible and transparent financing channels for scientific and technological innovation companies to promote scientific and technological innovation and industrial upgrading. Therefore, whether the attributes of science and technology innovation are met is the key to determining a company’s IPO approval.

The Sci-Tech Innovation Board in China has high requirements for information disclosure: listed companies need to provide more detailed and transparent financial information, business information, and risk warnings. This helps market participants to understand and evaluate enterprises and improves the transparency and fairness of the market. Therefore, financial information is particularly important for listed companies on the Sci-Tech Innovation Board in China.

4.4. Risk Prediction for Investment Banking IPO Business Based on Random Forest

The random forest model itself can be used for direct result prediction. The first 80% of the sample data are used as training samples, and the last 20% are used as test samples. Based on the random forest model obtained after the above feature selection, the review status of a company’s IPO can be predicted. The prediction results for the test samples are shown in Table 4 in terms of accuracy rate, precision rate, and recall rate. As can be seen from the table, the prediction with all features and the prediction with the selected features are basically the same on all three measures. Although the variable dimension is greatly reduced after feature selection, the prediction performance of the model does not suffer much loss, indicating that the variables obtained after feature selection are representative, and the withdrawal rate of investment banking IPO business can be basically described by the selected features. Based on the random forest model, the accuracy rate with all features is about 68.59%, which indicates that the features of prospectus text quality and company quality constructed in this paper can basically judge the withdrawal rate of IPO business. This is of practical significance for investment banks, which can conduct an intelligent risk analysis of IPO business in advance and then guide the IPO to meet the requirements according to the analysis results, which can greatly reduce labor cost. The precision and recall are considerably higher than the accuracy, indicating that the model has a higher ability to capture and predict positive class samples.
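The ordered 80/20 split described above can be sketched as follows. The helper name `ordered_split` is hypothetical, and applying it to 904 placeholder samples is an assumption made only to show the resulting sizes; the text does not state how fractional counts were rounded.

```python
def ordered_split(samples, train_fraction=0.8):
    """Split samples in their given order into (train, test)."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Placeholder identifiers; 904 matches the number of applicant companies.
samples = list(range(904))
train, test = ordered_split(samples)
print(len(train), len(test))
```

Unlike a shuffled split, an ordered split keeps the original ordering of the samples, so the test set consists entirely of the samples that come last in the dataset.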

Further, the risk of investment banking IPO business is described by the review time and inquiry opinions, and the prediction results of the random forest are analyzed. The review status of a company’s IPO is a categorical variable, so Table 4 analyzes the prediction results through the accuracy rate, precision rate, and recall rate. The review time, inquiry opinions, and underpricing degree are numerical variables, for which the errors of the accuracy rate, precision rate, and recall rate of the forecast results are large. Therefore, goodness of fit is used here to analyze the forecast results of investment banking IPO business risk. The MAE, MSE, and RMSE are also considered for error measurement, and the specific results are shown in Table 5.
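The text does not define goodness of fit explicitly. Assuming it refers to the coefficient of determination R^2, which is an assumption of this sketch, it can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The function name and the sample review times are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical review times in days, for illustration only.
observed = [200.0, 250.0, 300.0, 250.0]
predicted = [220.0, 250.0, 280.0, 250.0]
print(r_squared(observed, predicted))
```

Under this definition, a value near zero means that the model predicts little better than the mean of the observed values, which is one way to read a goodness of fit of only a few percent.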

As can be seen from Table 5, for review inquiries, audit inquiries, legal inquiries, and review time, the goodness of fit on the training set is around 40% in all cases. However, the goodness of fit of the random forest predictions on the test set is generally low. The values for audit inquiries and legal inquiries are somewhat higher, both at about 21%, while the goodness of fit for IPO review time is only about 2%. Although the goodness of fit is relatively low, the MAE, MSE, and RMSE indicate that the prediction error of the random forest model for investment banking IPO business risk is relatively small. Thus, although the random forest predictions deviate considerably from the numerical risk indicators, the ranking of the overall risk is basically consistent with the actual data, indicating that the random forest method can basically achieve an intelligent analysis of the risks for investment banking IPO business.

The risk of the IPO business of investment banking is portrayed by review inquiries, audit inquiries, legal inquiries, and review time, respectively. For each of these, the features of prospectus text quality and company quality are ranked with the random forest model, and the ten most important features are listed in Table 6. As can be seen from Table 6, the important features for review inquiries, audit inquiries, and legal inquiries overlap. Features related to compliance status have better predictive performance for the various inquiries and replies of the China Securities Regulatory Commission. Among the features given in Table 6, some company quality indicators disclosed in the prospectus are considerably more important than the text quality indicators. However, across the different risk descriptions, both text quality features and company quality features play a very important role, indicating that both contain effective information that needs attention when predicting the risk for investment banking IPO business.

4.5. Risk Prediction for Investment Banking IPO Business Based on Deep Neural Network

Since the random forest model has relatively low performance in predicting the risks for investment banking IPO business, a deep neural network is further used to predict risks. Similarly, 80% of the sample data are used as the training set, and the remaining 20% are used as the test set. All features of prospectus text quality and company quality are taken as the system input, the risks for investment bank IPO business (described by the review status, review inquiries, audit inquiries, legal inquiries, and review time, respectively) are taken as the system output, and a neural network containing three hidden layers is used for learning and prediction. The hidden layers use the logsig transfer function with 10 neurons; the output layer uses the purelin transfer function with 2 neurons; the maximum number of training iterations is set to 500; the convergence error is set to 0.1; and the learning rate is set to 0.1.
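The network structure described above can be sketched as a forward pass in plain Python. This is an illustration of the architecture only, under stated assumptions: each of the three hidden layers is taken to have 10 neurons, the weights are randomly initialized rather than trained, the input size of 5 is hypothetical, and the training settings (500 iterations, error goal 0.1, learning rate 0.1) are not implemented. The names `logsig` and `purelin` follow the text; all other names are illustrative.

```python
import math
import random


def logsig(x):
    """Logistic sigmoid, 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def purelin(x):
    """Linear (identity) transfer function."""
    return x


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_network(n_features, rng, hidden=(10, 10, 10), n_outputs=2):
    """Stack hidden logsig layers followed by one purelin output layer."""
    sizes = [n_features, *hidden, n_outputs]
    layers = []
    for i in range(len(sizes) - 1):
        weights, biases = init_layer(sizes[i], sizes[i + 1], rng)
        # Hidden layers use logsig; only the last layer is linear.
        activation = purelin if i == len(sizes) - 2 else logsig
        layers.append((weights, biases, activation))
    return layers


def forward(x, layers):
    """Propagate one input vector through all layers."""
    for weights, biases, activation in layers:
        x = [activation(sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x


rng = random.Random(0)
network = build_network(n_features=5, rng=rng)
output = forward([0.1, 0.2, 0.3, 0.4, 0.5], network)
print(len(network), len(output))
```

The linear output layer leaves predictions unbounded, which suits numerical targets such as review time or the number of inquiries.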

When the risk for investment banking IPO business is described by the review status, the accuracy rate of predicting a company’s IPO review status is 72.73%, indicating that the prediction result has a certain accuracy. On this basis, the experiment is repeated with the features selected by the random forest model to obtain the prediction results of the company IPO review status. The results are also compared using the accuracy rate, precision rate, and recall rate, and the specific data are shown in Table 7. Compared with the prediction results based on the random forest model in Table 4, the prediction performance based on the deep neural network is generally slightly higher when all features are taken as system inputs. However, when only the selected features are used, the prediction performance of the deep neural network is slightly worse than that of the random forest. This also shows that deep neural networks are more suitable for machine learning with many initial features. Similarly, the risk for investment banking IPO business is further described by other indicators, and the predicted results and corresponding performance analysis are shown in Table 8. Compared with the prediction results based on the random forest model in Table 5, the performance of the deep neural network in predicting the risk for investment banking IPO business is significantly improved: all kinds of errors are significantly reduced, and the overall error level is generally low. This indicates that, based on the prospectus text analysis system and the company quality analysis system, an intelligent analysis of risk for investment banking IPO business can be effectively realized.

5. Conclusions

The most important question addressed in this study is the use of intelligent analysis in the study of IPOs. By constructing the text quality analysis system and company quality analysis system based on a prospectus, the intelligent analysis method of investment banking IPO business risk is proposed based on the machine learning method and text analysis technology. Taking the Sci-Tech Innovation Board in China as a sample, the empirical analysis results show that the text quality and the company quality disclosed in the prospectus can affect the withdrawal rate of investment banking IPO business. Through text analysis and machine learning on the text quality and company quality, the risk of investment banking IPO business can be predicted intelligently and effectively. Therefore, this intelligent analysis method can be used as a risk analysis tool for investment banks.

The main conclusions of this study are as follows. This study proposes an IPO business risk analysis method based on machine learning and text analysis, which can be used as a risk analysis tool for investment banks. It shows that the feature mining of text quality and company quality based on a prospectus can effectively predict the review status of a company’s IPO. Therefore, investment banks can identify the risks that IPO businesses may face through an intelligent analysis of the prospectus and reduce the risks through corresponding measures.

In addition, the prospectus should be a high-quality and professional document, yet different investment banks vary considerably in the quality of their prospectuses. Regulatory authorities also raise many inquiries about the quality of the prospectus, and the resulting negative impact can lead to the withdrawal of IPO businesses. This study shows that there is a relationship between the information disclosure quality of a prospectus and the IPO withdrawal rate. Therefore, investment banks should improve the information disclosure of their prospectuses with regard to both text quality and company quality.

The investment bank is responsible for writing the prospectus and, to decrease the IPO withdrawal rate, judges whether the company meets the IPO requirements based on the information the company discloses in the prospectus. Although companies may act strategically in information disclosure, outstanding investment banks will usually identify the problems reflected in the information disclosure of companies and guide the companies to meet IPO requirements. Therefore, this study provides a new risk analysis tool for investment banks, which highlights its application value.

These conclusions yield the following insights. Given that the proposed method can intelligently identify the IPO business risks of investment banking, investment banks are advised to add text analysis as one tool of risk assessment. The intelligent risk assessment for investment banking IPO business can not only improve the business efficiency of investment banking and save resource cost but also improve the standardization and authenticity of investment banking IPO business.

In addition, investment banks should pay attention to the standardization and professionalism of the prospectus to improve the quality of IPO business and provide better services for companies. Investment banks should disclose detailed and accurate information about the company’s conditions in the prospectus and effectively guide the possible risks of the company in advance to improve the quality of IPO companies. This is of great value to enhance the gatekeeper role of investment banks in the capital market.

This study has the following limitations, which suggest directions for future work. The main limitation is that the risk analysis perspective of investment banks is relatively narrow. In this study, the risks faced by investment banks are mainly analyzed from the perspective of the IPO business, and the market risk, capital risk, legal risk, and reputation risk that investment banks may face are not deeply considered.

This study proposes a method to identify the IPO business risks of investment banking through text analysis and machine learning and applies the method to the Sci-Tech Innovation Board in China as an example. The text analysis of the prospectus requires word segmentation, which depends on the understanding of different languages by the word segmentation program. Therefore, this study does not apply it to the markets of other countries for comparison. Future work will fill this gap by comparing it with other countries to explain if this methodology can be used elsewhere.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z.; software, C.W.; validation, X.L.; formal analysis, C.W.; investigation, C.W.; resources, L.Z.; data curation, L.Z.; writing—original draft preparation, C.W.; writing—review and editing, L.Z.; visualization, X.L.; supervision, X.L.; project administration, X.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 72173018, and the Ministry of Education of Humanities and Social Science, grant number 21YJC790108.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Guay, W.; Samuels, D.; Taylor, D. Guiding through the fog: Financial statement complexity and voluntary disclosure. J. Account. Econ. 2016, 62, 234–269.
2. Bushee, B.J.; Gow, I.D.; Taylor, D.J. Linguistic complexity in firm disclosures: Obfuscation or information? J. Account. Res. 2018, 56, 85–121.
3. Lang, M.; Stice-Lawrence, L. Textual analysis and international financial reporting: Large sample evidence. J. Account. Econ. 2015, 60, 110–135.
4. Kelly, B.; Papanikolaou, D.; Seru, A.; Taddy, M. Measuring technological innovation over the long run. Am. Econ. Rev. Insights 2021, 3, 303–320.
5. Jiang, F.; Lee, J.; Martin, X.; Zhou, G. Manager sentiment and stock returns. J. Financ. Econ. 2019, 132, 126–149.
6. Bochkay, K.; Hales, J.; Chava, S. Hyperbole or reality? Investor response to extreme language in earnings conference calls. Account. Rev. 2020, 95, 31–60.
7. Bochkay, K.; Chychyla, R.; Nanda, D. Dynamics of CEO disclosure style. Account. Rev. 2019, 94, 103–140.
8. Hanley, K.W.; Hoberg, G. Dynamic interpretation of emerging risks in the financial sector. Rev. Financ. Stud. 2019, 32, 4543–4603.
9. Loughran, T.; McDonald, B. Textual analysis in accounting and finance: A survey. J. Account. Res. 2016, 54, 1187–1230.
10. Guo, L.; Shi, F.; Tu, J. Textual analysis and machine leaning: Crack unstructured data in finance and accounting. J. Financ. Data Sci. 2016, 2, 153–170.
11. Gentzkow, M.; Kelly, B.; Taddy, M. Text as data. J. Econ. Lit. 2019, 57, 535–574.
12. Chi, S.S.; Shanthikumar, D.M. Local bias in Google search and the market response around earnings announcements. Account. Rev. 2017, 92, 115–143.
13. Jung, M.J.; Naughton, J.P.; Tahoun, A.; Wang, C. Do firms strategically disseminate? Evidence from corporate use of social media. Account. Rev. 2018, 93, 225–252.
14. Baloria, V.P.; Heese, J. The effects of media slant on firm behavior. J. Financ. Econ. 2018, 129, 184–202.
15. Bonaime, A.; Gulen, H.; Ion, M. Does policy uncertainty affect mergers and acquisitions? J. Financ. Econ. 2018, 129, 531–558.
16. Cookson, J.A.; Niessner, M. Why don’t we agree? Evidence from a social network of investors. J. Financ. 2020, 75, 173–228.
17. Broeders, D.; Prenio, J. Innovative Technology in Financial Supervision (Suptech): The Experience of Early Users; Financial Stability Institute/Bank for International Settlements: Basel, Switzerland, 2018; Available online: https://www.bis.org/fsi/publ/insights9.pdf (accessed on 20 July 2024).
18. Cerchiello, P.; Giudici, P. Big data analysis for financial risk management. J. Big Data 2016, 3, 1–12.
19. Nyman, R.; Kapadia, S.; Tuckett, D. News and narratives in financial systems: Exploiting big data for systemic risk assessment. J. Econ. Dyn. Control 2021, 127, 104119.
20. Hale, G.; Lopez, J.A. Monitoring banking system connectedness with big data. J. Econom. 2019, 212, 203–220.
21. Gandy, A.; Veraart, L.A. A Bayesian methodology for systemic risk assessment in financial networks. Manag. Sci. 2017, 63, 4428–4446.
22. Iturriaga, F.J.L.; Sanz, I.P. Bankruptcy visualization and prediction using neural networks: A study of US commercial banks. Expert Syst. Appl. 2015, 42, 2857–2869.
23. Chatzis, S.P.; Siakoulis, V.; Petropoulos, A.; Stavroulakis, E.; Vlachogiannakis, N. Forecasting stock market crisis events using deep and statistical machine learning techniques. Expert Syst. Appl. 2018, 112, 353–371.
24. Gong, C.; Liu, T.; Yang, J.; Tao, D. Large-margin label-calibrated support vector machines for positive and unlabeled learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3471–3483.
25. Döpke, J.; Fritsche, U.; Pierdzioch, C. Predicting recessions with boosted regression trees. Int. J. Forecast. 2017, 33, 745–759.
26. Hsieh, J.P.A.; Rai, A.; Xu, S.X. Extracting business value from IT: A sensemaking perspective of post-adoptive use. Manag. Sci. 2011, 57, 2018–2039.
27. Derakhshan, A.; Beigy, H. Sentiment analysis on stock social media for stock price movement prediction. Eng. Appl. Artif. Intell. 2019, 85, 569–578.
28. Affuso, E.; Lahtinen, K.D. Social media sentiment and market behavior. Empir. Econ. 2019, 57, 105–127.
29. Ouyang, Z.; Chen, S.; Lai, Y.; Yang, X. The correlations among COVID-19, the effect of public opinion, and the systemic risks of China’s financial industries. Phys. A Stat. Mech. Its Appl. 2022, 600, 127518.
30. Miller, B.P. The effects of reporting complexity on small and large investor trading. Account. Rev. 2010, 85, 2107–2143.
31. Peng, Y.; Albuquerque, P.H.; Kimura, H.; Saavedra, C.A. Feature selection and deep neural networks for stock price direction forecasting using technical analysis indicators. Mach. Learn. Appl. 2021, 5, 100060.
32. Sahu, S.K.; Mokhade, A.; Bokde, N.D. An overview of machine learning, deep learning, and reinforcement learning-based techniques in quantitative finance: Recent progress and challenges. Appl. Sci. 2023, 13, 1956.
33. Nazareth, N.; Reddy, Y.V.R. Financial applications of machine learning: A literature review. Expert Syst. Appl. 2023, 219, 119640.
34. Dessain, J. Machine learning models predicting returns: Why most popular performance metrics are misleading and proposal for an efficient metric. Expert Syst. Appl. 2022, 199, 116970.

Figure 1. Mean square error of random forest.

Figure 2. Importance ranking of feature selection.

Table 1. Text quality analysis system based on prospectus.

Level	Feature	Indicator
Chinese character	validity of Chinese characters	the number of valid characters
	commonality of Chinese characters	the number of common characters
	commonality of Chinese characters	the number of sub-common characters
Vocabulary	commonality of vocabulary	the number of common vocabularies
	part of speech complexity	the proportion of adverbs
	vocabulary semantic difficulty	the proportion of financial vocabulary
Sentence	syntactic complexity	the proportion of adverbs
		the proportion of common vocabulary
		the proportion of financial vocabulary
		the proportion of valid characters
		the proportion of common characters
		the proportion of sub-common characters
	sentence length	the average number of characters in the sentence
	sentence length	the average number of vocabularies in the sentence
Chapter	chapter length	the number of characters
		the number of vocabularies
		the number of sentences

Table 2. Company quality analysis system based on prospectus.

Level	Prospectus Section	Indicator
Compliance status	corporate governance and independence	the number of secondary titles
		the number of tertiary titles
		the number of quaternary titles
		the number of characters
Industry status	business and technology: company’s main business, main products, or services; the basic situation of the industry in which the company operates	the number of tertiary titles
		the number of quaternary titles
		the number of characters
		market share
		sunrise industry or not
Technology status	business and technology: company’s core technology and research and development; core technologies of the company’s main products	the number of tertiary titles
		the number of quaternary titles
		the number of characters
		attributes of science and innovation
Management status	company basic situation: directors, supervisors, senior managers, and core technical personnel	the number of tertiary titles
		the number of quaternary titles
		the number of characters
		the disclosure of close relatives
		the number of directors
		the number of supervisors
		the number of senior managers
		the number of core technical personnel
		the average age of directors
		the average age of supervisors
		the average age of senior managers
		the average age of core technical personnel
Financial status	financial and accounting information and management analysis: major financial indicators for the reporting period	the current ratio
		the asset–liability ratio
		accounts receivable turnover
		inventory turnover
		the increase rate of the main business revenue

Table 3. Statistical features of the review process of Sci-Tech Innovation Board IPO.

	Passing Rate (%)	Passing Time (days)	Review Inquiries	Audit Replies	Legal Replies	Number of Samples
Information technology	66.98	254.72	5.51	4.64	3.06	318
High-end equipment	76.92	219.31	5.85	4.77	3.54	13
New materials	82.05	241.18	5.54	4.26	3.6	39
New energy	70.33	257.55	5.77	4.35	3.15	91
Energy saving and environmental protection	70	251.80	5.53	4.59	3.21	120
Biomedicine	62.71	258.22	5.99	4.81	3.62	177
Others	41.78	254.25	5.03	4.36	3.03	146
Total	63.61	253.99	5.60	4.59	3.24	904

Table 4. Prediction results of IPO review based on random forest.

	Accuracy Rate	Precision Rate	Recall Rate
Feature selection	0.6835	0.9730	0.8030
All features	0.6859	0.9640	0.8015

Table 5. Prediction results for IPO risk based on random forest.

	Review Inquiries	Audit Inquiries	Legal Inquiries	Passing Time
Goodness of fit (training set)	0.3748	0.3744	0.4089	0.4126
Goodness of fit (test set)	0.0831	0.2118	0.2067	0.0218
MAE	515.8704	361.8981	248.3763	8.26 × 10³
MSE	2.50 × 10³	1.15 × 10³	536.2318	1.15 × 10⁶
RMSE	50.0357	33.8409	23.1567	1.07 × 10³

Table 6. Statistics of the key features.

Description of Risk	Main Feature
Review inquiries	Compliance status: secondary titles, tertiary titles, quaternary titles, the number of characters; the number of sub-common characters; the number of sentences; the attributes of science and innovation; the asset–liability ratio
Audit inquiries	Compliance status: secondary titles, tertiary titles, quaternary titles, the number of characters; technology status: the number of characters; the proportion of financial vocabulary; the proportion of sub-common characters; the number of sub-common characters
Legal inquiries	Compliance status: secondary titles, tertiary titles, quaternary titles, the number of characters; the proportion of common vocabulary; the proportion of financial vocabulary; the proportion of sub-common characters; the number of financial vocabularies
Review time	The number of vocabularies; the number of adverbs; the number of common vocabularies; similarity; the proportion of adverbs; the proportion of sub-common characters; compliance status: tertiary titles, the number of sentences

Table 7. Prediction results for IPO review based on deep neural network.

	Accuracy Rate	Precision Rate	Recall Rate
Dimension reduction variables	0.6816	0.9333	0.8053
All variables	0.7273	0.9839	0.8175

Table 8. Prediction results for IPO risk based on deep neural network.

	Review Inquiries	Audit Inquiries	Legal Inquiries	Passing Time	Initial Underpricing
Goodness of fit	0.1489	0.2007	0.1990	0.0157	0.0193
MAE	2.9979	1.8473	1.4740	45.8540	0.0651
MSE	13.7366	5.5815	3.0562	7786.5685	0.0133
RMSE	3.7063	2.3625	1.7482	88.2415	0.1154
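
The error rows of Table 8 are related by RMSE = √MSE. The sketch below, assuming the standard definitions of MAE, MSE, and RMSE, computes the three measures on a small hypothetical example and then checks that the MSE and RMSE values reported in Table 8 agree up to rounding. The helper name `regression_errors` and the toy vectors are illustrative assumptions; only the `table8` values come from the table.

```python
import math


def regression_errors(y_true, y_pred):
    """Return (MAE, MSE, RMSE) for two equal-length sequences."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = math.sqrt(mse)
    return mae, mse, rmse


# Hypothetical toy data, for illustration only.
mae, mse, rmse = regression_errors([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# (MSE, RMSE) pairs copied from Table 8.
table8 = {
    "Review Inquiries": (13.7366, 3.7063),
    "Audit Inquiries": (5.5815, 2.3625),
    "Legal Inquiries": (3.0562, 1.7482),
    "Passing Time": (7786.5685, 88.2415),
    "Initial Underpricing": (0.0133, 0.1154),
}

# The reported RMSE should match sqrt(MSE) within rounding tolerance.
consistent = {
    name: abs(math.sqrt(m) - r) < 1e-3 for name, (m, r) in table8.items()
}
print(consistent)
```

Because RMSE is expressed in the same units as the target, it is the easier of the two squared-error measures to compare across columns with different scales, such as inquiry counts versus passing time.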

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

