1. Introduction
There is a strong relationship between investors’ divergence and the trading volume (
Banerjee and Kremer 2010;
Carlin et al. 2014;
Cookson and Niessner 2020). Currently, investors are eager to obtain information and share opinions in online stock forums. Therefore, the relationship between investors’ divergence and the trading volume can be investigated by quantifying and mining the online public opinions (
Antweiler and Frank 2004;
Atmaz and Basak 2018;
Han et al. 2022). Online public opinions not only contain investors’ judgments, but also affect investors’ trading strategies. Therefore, we collected the online public opinions in the Chinese stock message board Guba Eastmoney (Guba) and regarded the users of Guba as investors to examine the impacts of investors’ online public opinion divergence on the trading volume of the CSI 300 index constituents. In addition, we also examined the differential impact of investors on the trading volume based on the classification of investors’ influence levels in Guba.
According to the 52nd Statistical Report on China’s Internet Development, the Internet penetration rate has reached 76.4%. The popularization of the Internet has helped break down information barriers and improve the transparency of public market information. Investors make extensive use of media platforms as a source of information (
Zhang et al. 2024), and investors from different countries can interact with each other through online stock forums (
Liu et al. 2023). Investors in financial markets are heterogeneous in terms of their individual capabilities (
Lee and Swaminathan 2000). Investors are more susceptible to external information due to the multi-channel information available on the Internet (
Lei and Song 2024). The way in which investors gather and interpret information can have a significant impact on their expectations (
Tan et al. 2014;
Drake et al. 2015). Differences in investors’ abilities to interpret public information are the dominant factor in the emergence of divergence (
Daniel and Hirshleifer 2015).
The most direct manifestation of investor divergence is the two relative trading choices of buying and selling. There are two different views in academic research on the question of the impact of investor divergence on the stock trading volume. The classical view, represented by
Hirshleifer (
1977), argues that higher investor divergence corresponds to a higher stock trading volume.
Al-Nasseri and Ali (
2018) analyzed 289,443 online tweets from StockTwits and constructed an indicator of divergence of opinion, finding that higher online divergence increases the trading volume.
Cookson and Niessner (
2020) quantified the daily divergence of investor sentiment on StockTwits and found that investor divergence is significantly and positively associated with the stock trading volume. Another view is the “no-trade theorem” of
Milgrom and Stokey (
1982). According to this theorem, in a rational expectations equilibrium, investors choose to stay on the sidelines, because they fear that their counterparties have information that they do not know, and a more cautious investment attitude leads to a reduction in the trading volume.
Cao et al. (
2002) found that investors with a wait-and-see attitude delay trading until the price of a stock confirms their expected private information.
Antweiler and Frank (
2004) used computer text-classification techniques to extract investor sentiments from Yahoo and Raging Bull stock message board posts and conducted an empirical study using high-frequency data at 15 min intervals to find that investor divergence on the same day leads to a decrease in the trading volume on the next day. In addition, some scholars have argued that both divergence and convergence of opinion help to explain the trading volume. For example,
Banerjee and Kremer (
2010) analyzed the relationship between divergence of opinions and the pre- and post-announcement trading volumes using a dynamic divergence-of-opinion model and found that both divergence and convergence of opinions help to explain the trading volume.
Li and Hou (
2024) captured individuals’ opinions on stocks from the most popular Chinese stock forums and classified the posts using a machine learning approach. They found that both convergence and divergence of opinions lead to an increased trading volume in the Chinese stock market.
Guba was established in 2006, and it is an online stock forum with one of the largest numbers of users and browsing volumes in China (
Huang et al. 2016). The forum stores information about the person who posted, the time of the post, the title, the number of clicks and comments. It is dominated by non-professional individual investors whose quality varies widely (
Huang et al. 2023;
Ackert et al. 2016). Investors are unable to absorb and utilize information due to their low level of financial expertise (
Lei and Song 2024). Guba hosts both seasoned investors with extensive investment experience and “newcomers” who have only recently entered the market, and the two groups shape online public opinions differently. Therefore, it is essential to investigate the impacts of divergence among investors at the same or different levels based on their influence. The open online environment and the complex composition of netizens can easily lead to divergence, which may affect investors’ original trading strategies and motivation to participate in the market. The trading volume is an important part of market quality and an important indicator of investor participation in the secondary market. Therefore, exploring the divergence of investors’ online public opinions can reveal the current state of Guba and its impact on market quality. It also provides a reference for financial institutions to manage online public opinions to support the development of the secondary market.
In this paper, we find that an increase in the number of investors’ online opinions in Guba contributes to an increase in the stock trading volume, but the existence of online opinion divergence reduces investors’ willingness to trade. By differentiating the constituents of the CSI 300 index by industry and market capitalization, we find that non-financial stocks are more widely discussed and, therefore, more affected by online public opinion divergence than financial stocks. Although mid-cap stocks attract fewer online public opinions than large-cap stocks, they are more affected by online public opinion divergence. In addition, we follow the classification used by Guba to divide investors into three levels and find that the convergence of opinions among high-level investors is conducive to the recovery of the stock trading volume. However, the existence of divergence among low-level investors exacerbates the decline in trading volume.
Our contributions are as follows: First, when analyzing the relationship between investor behavior and the stock market, existing research tends to look at the market as a whole without classifying investors. Based on special data, we further differentiate investors and examine whether the relationship between investors’ divergence and the stock trading volume differs across investor groups. Investors are classified as high, medium or low level according to Guba’s classification of investor influence levels. This classification suggests new directions for future research. Second, we provide a more comprehensive and detailed analysis of investor divergence by covering the case in which investors do not publish online public opinions. We adjust the online public opinion divergence indicator by treating the absence of online public opinions in Guba as the absence of investor divergence, which fills a gap in existing studies. This adjustment allows us to better represent the views of certain types of users on the CSI 300 index.
The remainder of this paper is organized as follows:
Section 2 provides the research hypotheses.
Section 3 describes our data and methodology.
Section 4 displays the results and some initial interpretations, while
Section 5 provides our conclusions.
2. Hypotheses
In general, investors’ different interpretations of public information can lead to divergence in market expectations and buying and selling choices, resulting in an increase in the trading volume (
Hong and Stein 2007;
Daniel and Hirshleifer 2015). Guba provides investors with greater flexibility, enabling them to engage in discussions on a wide range of stocks without transaction costs.
Benjamin et al. (
2022) found that a positive social media sentiment can lead to an increase in the value of a company.
Rakowski et al. (
2021) found that the attention generated through Twitter activity significantly impacts trading volume. However, divergence resulting from differences in investors’ interpretations of public information affects their trading strategies. This can lead to a decrease in the expected volume of buying and selling transactions. Therefore, based on the above analysis, we propose two hypotheses to examine the impact of online public opinions and divergence on the trading volume:
H1. The greater the number of online public opinions, the greater the trading volume.
H2. Divergence can hinder the increase in trading volume.
Increased discussion among investors inevitably leads to more information being available for stock trading. Experienced investors typically delay trading when prices deviate from their private information (
Cao et al. 2002).
Banerjee and Kremer (
2010) found that both divergence and convergence can help to explain the trading volume.
Giannini et al. (
2019) measured investors’ divergence on StockTwits and found that both divergence and convergence result in an abnormal trading volume during earnings announcements. According to Guba’s classification of investors, those with a low level of influence are identified as “newcomers”. They have a shorter registration duration, fewer followers and less recognition of their online public opinions, and they are more numerous. Therefore, due to their limited professional ability and cognition, the divergence of low-level investors can put pressure on the stable operation of the market. Investors with high levels of influence have more followers and receive greater attention in Guba, which means their online public opinions carry more weight. This implies that the divergence of high-level investors and that of low-level investors can have different impacts on the Chinese stock market. Therefore, we propose hypotheses 3 and 4:
H3. High-level investors’ divergence leads to an increased trading volume.
H4. Low-level investors’ divergence leads to a decreased trading volume.
For stocks in different sectors, there are differences in the size of investor divergence. The CSI 300 index consists of high-quality stocks in various industries, which have different characteristics due to different industries. Financial stocks are less affected by politics, are less liquid and have a lower price dispersion. Non-financial stocks are of greater interest to investors because of their broader coverage, greater price dispersion and more opportunities for quick short-term gains.
Oliveira et al. (
2017) found that microblogging sentiment and attention indicators are particularly useful for the prediction of some sectors such as high technology, energy and telecommunications. Based on the above analysis, we expect the trading volume of non-financial stocks to be significantly influenced by divergence, with distinct impacts from high- and low-level investors. Therefore, we propose hypotheses 5 and 6:
H5. The smaller the divergence among high-level investors, the greater the increase in the trading volume of non-financial stocks.
H6. The presence of divergence among low-level investors leads to a faster decrease in the trading volume of non-financial stocks.
Listed non-financial companies vary in market capitalization. Large-cap companies tend to have more stable stock prices and attract more investments from financial institutions. Small-cap companies have lower stock prices and are more susceptible to market manipulation, making them riskier investments.
Liu and Liu (
2014) used the A-share market of the Shanghai Stock Exchange from 2005 to 2011 to compare the impacts of individual and institutional investor sentiments. It was found that stocks with higher investor attention have small market capitalization and a low book-to-market ratio. Therefore, combined with the aforementioned characteristics of high-level investors with a high influence and long registration duration, we propose hypothesis 7:
H7. The more consistent the opinions of high-level investors, the more they will help to increase the trading volume in non-financial mid- and small-cap stocks.
3. Data and Methodologies
3.1. Data
We examined the impact of divergence among investors on the trading volume of the CSI 300 index constituents from 1 January 2021 to 31 December 2021. A total of 237 constituent stocks are selected in this paper. We have excluded the stocks of the CSI 300 index constituents that are not regularly sampled and suspended from trading. This avoids the “index effect” and ensures the continuity of the research. The relevant data on the trading of the stock and the market are provided by Wind.
Most studies have found that online public opinions during the mid-market session are more sensitive, so they mostly focus on the mid-market session (
Antweiler and Frank 2004). At the same time, the online public opinions expressed during the mid-market session align with the actual trading data of the market, which are more closely linked to the market and have a higher number of online public opinions per unit of time. For the sake of clarity, we focus on the relationship between investors’ divergence and the trading volume during mid-market sessions.
In this paper, the superscript “OC” (from opening to closing) is used to indicate the mid-market session, i.e., from 9:30 a.m. to 3:00 p.m. on the trading day. The non-trading session is from 3:00 p.m. on the previous trading day to 9:30 a.m. on the trading day, which includes holidays, weekends and other closed days. It is indicated by the superscript “CO” (from closing to opening). After crawling and organizing investors’ online public opinions in Guba, we collected a total of 2,001,527 online public opinions during trading hours and 4,491,458 online public opinions during non-trading hours. In order to avoid missing the impact of non-trading hour online public opinions on the trading volume, we include the online public opinions during non-trading hours as a control variable in the regression.
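The session definitions above can be illustrated with a minimal Python sketch. It assumes a sorted list of trading days is available and that the opening time belongs to the “OC” session while the closing time belongs to the following “CO” session; the function name `assign_session` and these boundary conventions are illustrative assumptions, not details taken from our crawler.

```python
from bisect import bisect_left, bisect_right
from datetime import date, datetime, time
from typing import Optional

OPEN, CLOSE = time(9, 30), time(15, 0)  # 9:30 a.m. and 3:00 p.m.


def assign_session(ts: datetime, trading_days: list[date]) -> tuple[Optional[date], str]:
    """Map a posting timestamp to (trading day, session label).

    'OC' is the opening-to-closing session of a trading day. 'CO' covers
    everything from the previous close to the next open, so posts made after
    the close, on weekends or on holidays are attached to the next trading day.
    `trading_days` must be sorted in ascending order (hypothetical input).
    """
    day = ts.date()
    idx = bisect_left(trading_days, day)
    is_trading_day = idx < len(trading_days) and trading_days[idx] == day

    if is_trading_day and OPEN <= ts.time() < CLOSE:
        return day, "OC"
    if is_trading_day and ts.time() < OPEN:
        return day, "CO"

    # After the close or on a closed day: roll forward to the next trading day.
    nxt = bisect_right(trading_days, day)
    if nxt == len(trading_days):
        return None, "CO"  # no later trading day in the calendar
    return trading_days[nxt], "CO"
```

Under these assumptions, a post made on a Saturday is labelled “CO” and attached to the following Monday, provided that Monday is a trading day in the supplied calendar.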
In addition, we use the Jieba Chinese word segmentation library together with the SnowNLP natural language processing library to process the text. Customized dictionaries, including stock names, industry-specific terms, professional terms and network buzzwords, are integrated for word segmentation training to ensure the accuracy of the segmentation results. By drawing on the China Knowledge Network Sentiment Dictionary (HowNet) and the National Taiwan University Simplified Chinese Sentiment Polarity Dictionary (NTUSD), we have expanded the corpus of positive and negative words in SnowNLP, which improves the accuracy of sentiment categorization. Based on the methodology above, we categorize investors’ online public opinions by sentiment. This provides the basis for the subsequent empirical research.
To control for noisy information and to avoid misleading information, we process the underlying data as follows: First, we collect the user IDs whose historical main post and comment counts are not 0 and crawl the user information published by the website. Then, we index their personal home pages according to the user IDs and crawl each investor’s historical comment data one by one, storing the comment time, stock bar name and posting topic of each comment. Since the research period of this paper is from 1 January 2021 to 31 December 2021, the crawled comments are filtered by posting time: we remove comments outside the 2021 timeframe, on non-trading days within 2021 and during non-trading hours on trading days. Using the ‘stockbar_code’ and ‘source_post_id’ fields reserved in the comments, we then generate the URLs of the posts made by each investor user and crawl the content of the main posts corresponding to the comments. Second, we clean up the text data. Specifically, we adapt a text-cleaning function in Python. We remove special symbols, @ and usernames in body text, URL links, advertising links and image postings in replies/retweets and merge excess spaces in the body text. Because the sentiment suggested by an emoji can differ from the meaning actually intended, which would bias the measurement of investor sentiment, we remove emoji altogether. Repeated comments in the captured data may arise either from network fluctuations that cause repeated posting or from users deliberately repeating themselves to highlight their own emotions. Therefore, we de-duplicate the text data of comments with the same posting time, user ID and corresponding bar name and keep the text data of comments with different posting times.
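The cleaning and de-duplication steps described above can be sketched as follows. This is a minimal illustration rather than the function actually used in this paper: the regular expressions, the set of characters that are kept and the record fields `post_time`, `user_id` and `bar_name` are all assumptions made for the example.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Assumption: a username is the run of word characters that follows "@".
# For text without spaces this may also swallow the words right after the name.
MENTION_RE = re.compile(r"@[\w\-]+")
# Keep Chinese characters, ASCII letters and digits, whitespace and common
# punctuation; everything else (emoji, special symbols) is replaced by a space.
DROP_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s,.!?%，。！？、；：]")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Remove URLs, @usernames, emoji and special symbols; merge extra spaces."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = DROP_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def deduplicate(comments: list[dict]) -> list[dict]:
    """Drop comments that repeat the same posting time, user ID and bar name.

    Comments that differ in posting time are kept even if their text is
    identical, since they may be deliberate emphasis by the user.
    """
    seen = set()
    kept = []
    for comment in comments:
        key = (comment["post_time"], comment["user_id"], comment["bar_name"])
        if key not in seen:
            seen.add(key)
            kept.append(comment)
    return kept
```

Keeping the first record for each key preserves the original crawl order, which makes the result reproducible across runs.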
3.2. Classification of Investors
Guba categorizes investors into ten levels, with higher levels representing greater influence. Meanwhile, Guba also provides information on investors’ other characteristics, such as the number of investors’ online public opinions, the number of followers and the registration date. We collected a total of 2,001,527 online public opinions during the trading session and found 160,705 investors, who are all the investors who posted online public opinions on the 237 constituent stocks in Guba during the year 2021. The statistics are shown in
Table 1.
Table 1 shows the statistics on the basic information of investors, classified according to their levels. The average online public opinions of investors are counted separately for each level. The results show that the average number of online public opinions increases with the level of investors. However, the total number of investors in each level decreases as the level increases. Therefore, we reclassify investors into three groups based on the average number of online public opinions (k = 1, 2 and 3 represent low-, mid- and high-level investors) to explore the relationship between divergence and the trading volume among investors of different levels.
First, after categorization, the groups differ not only in the average registration duration and the total number of investors but also, clearly, in the average number of followers. This also explains why Guba identifies the level of investors by their number of followers: a higher number of followers implies a higher level of influence. Second, low-level investors are the most numerous and have the lowest average number of followers and the shortest average registration duration. The opposite is true for high-level investors. Third, in terms of the total number of online public opinions, low-level investors post the most, followed by mid-level investors, and high-level investors post the fewest. However, after dividing the investors into three groups, we can see that high-level investors have the highest average number of online public opinions. This suggests that they are more concerned about the market and more willing to express their personal opinions about it.
3.3. Investors Divergence
In this paper, we quantify the divergence and construct the divergence variable. Then, we explore its relationship with the trading volume and discuss it in terms of categorizing investors who choose to express online public opinions.
Based on the existing text mining and sentiment scoring, the words in online public opinions are classified into positive, neutral and negative. Then, the enhanced SnowNLP is used to score the online public opinions one by one. The sentiment of each online public opinion is quantified based on the number of sentiment words it contains. In general, connectives and nouns are considered neutral words. They are not counted in each online public opinion, because they do not contain sentiment. The final score of each online public opinion sentiment ranges from −1 to 1. The closer the score is to 1, the more positive the sentiment. The score for neutral online public opinions is 0. Numerous studies have pointed out that neutral online public opinions should be considered as noise. If they are included in the discussion, it will lead to biased results in the scoring of online public opinions (
Antweiler and Frank 2004). Therefore, when calculating the divergence, only positive and negative online public opinions will be considered. Scores between −1 and 0 will be classified as negative online public opinions, while scores between 0 and 1 will be considered positive online public opinions.
Antweiler and Frank (2004) considered carrying the value of the previous period forward into a period without online public opinions, instead of assuming that there is no online public opinion in that period. However, they found that this assumption reduces the accuracy of the results. Therefore, when there is no online public opinion, the sentiment of the online public opinion should be considered as neutral. After categorizing investors, we also found that the posting of online public opinions by investors at each level is random. We therefore treat the absence of online public opinions as neutral sentiment and define it as no divergence of the online public opinions. This is a better way of representing investors’ opinions on the CSI 300 index constituents. In conclusion, we propose a divergence variable based on the methodology proposed by
Antweiler and Frank (
2004). Furthermore, we provide a separate explanation for the absence of online public opinions. The specific equation is as follows:
$$
A_{k,i,t}^{OC} =
\begin{cases}
1 - \sqrt{1 - \left( \dfrac{M_{k,i,t}^{pos} - M_{k,i,t}^{neg}}{M_{k,i,t}^{pos} + M_{k,i,t}^{neg}} \right)^{2}}, & M_{k,i,t}^{pos} + M_{k,i,t}^{neg} > 0, \\
1, & M_{k,i,t}^{pos} + M_{k,i,t}^{neg} = 0.
\end{cases}
\tag{1}
$$
In Equation (1), $k$ represents investors at different levels, $i$ represents the CSI 300 index constituents, and $t$ represents the time interval distinguished by trading and non-trading sessions. $M_{k,i,t}^{pos}$ and $M_{k,i,t}^{neg}$ represent the number of positive and negative online public opinions on stock $i$ on day $t$ of the trading session for level $k$ investors. $A_{k,i,t}^{OC}$ is the divergence variable on stock $i$ on day $t$ of the trading session for level $k$ investors. For $A_{k,i,t}^{OC}$, the smaller its value, the larger the divergence.
The difference between this paper and Antweiler and Frank’s (2004) work in setting the divergence variable is the separate determination when there is no online public opinion. To exhaustively characterize $A_{k,i,t}^{OC}$, four cases are listed here: (1) If all three online public opinions are positive, $A_{k,i,t}^{OC} = 1 - \sqrt{1 - 1^{2}} = 1$. There is no divergence, and, conversely, the result is the same if all the online public opinions are negative. (2) If there are two positive online public opinions and one negative online public opinion, $A_{k,i,t}^{OC} = 1 - \sqrt{1 - (1/3)^{2}} \approx 0.057$, i.e., there exists divergence. (3) If there is one positive, one neutral and one negative online public opinion, $A_{k,i,t}^{OC} = 1 - \sqrt{1 - 0^{2}} = 0$. Then, the divergence is greatest. (4) If there are no positive and negative online public opinions, $M_{k,i,t}^{pos} = M_{k,i,t}^{neg} = 0$. According to the original author’s equation design, if there are no online public opinions, the divergence would be considered greatest. However, the interpretation is different from the meaning of case (3). We believe that when investors do not make comments, they are more inclined to have no opinion or have a wait-and-see attitude. Since there is no specific sentiment tendency, there is no divergence. In this case, it is more reasonable to set $A_{k,i,t}^{OC}$ to $1$.
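The four cases can be checked numerically with a short Python sketch. It assumes that the divergence variable takes the form of the agreement index of Antweiler and Frank (2004), computed from the counts of positive and negative opinions, together with the no-opinion adjustment argued for in this subsection; the function names `count_sentiment` and `agreement_index` are hypothetical and introduced only for illustration.

```python
import math


def count_sentiment(scores: list[float]) -> tuple[int, int]:
    """Count positive (score > 0) and negative (score < 0) opinions.

    Neutral opinions (score == 0) are treated as noise and ignored.
    """
    n_pos = sum(1 for s in scores if s > 0)
    n_neg = sum(1 for s in scores if s < 0)
    return n_pos, n_neg


def agreement_index(n_pos: int, n_neg: int) -> float:
    """Return a value in [0, 1]; smaller values mean larger divergence.

    Assumed form: 1 - sqrt(1 - B^2) with B = (n_pos - n_neg) / (n_pos + n_neg).
    When there are no positive or negative opinions, the index is set to 1
    (no divergence) instead of 0.
    """
    total = n_pos + n_neg
    if total == 0:
        return 1.0  # adjustment: absence of opinions means no divergence
    b = (n_pos - n_neg) / total
    return 1.0 - math.sqrt(max(0.0, 1.0 - b * b))


# The four cases discussed in the text, using illustrative sentiment scores.
case1 = agreement_index(*count_sentiment([0.8, 0.5, 0.3]))   # all positive
case2 = agreement_index(*count_sentiment([0.8, 0.5, -0.3]))  # two positive, one negative
case3 = agreement_index(*count_sentiment([0.8, 0.0, -0.3]))  # positive, neutral, negative
case4 = agreement_index(*count_sentiment([]))                # no opinions
```

Without the adjustment, case (4) would evaluate to 0 and be indistinguishable from case (3), which is the situation the separate determination is meant to avoid.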
3.4. Variables
The previous part describes the construction of the divergence variable; here, we will explain the selection and treatment of each variable in the regression model.
There are differences in investors’ attentiveness, degree of sophistication and capital size, which can lead to the absence of relevant online public opinions, as well as to neutral online public opinions. Therefore, the divergence variable is set to 1 in such cases when it is constructed, i.e., there is no divergence. To avoid inaccurately estimating the trading volume in the absence of divergence, we add a dummy variable to the regression model. The dummy variable equals 1 when online public opinions are present in Guba and 0 otherwise. It is included so that the model truly reflects the relationship between online public opinions and the stock trading volume.
The following control variables are selected based on the study of Antweiler and Frank (2004): the return, the market capitalization and the number of online public opinions. So as not to ignore the impact of divergence during non-trading hours on the trading volume, we also add the overall divergence of investors during non-trading hours and the trading volume of individual stocks on the previous trading day as control variables.
3.5. Empirical Model
We established a regression model to analyze the differential impact of divergence on the trading volume, as shown in Equation (2).
In Equation (2), represents the CSI 300 index constituents, and represents the time interval in terms of trading days. represent all, low-, mid-, and high- level investors. In this model, the smaller value of is indicative of greater divergence. represents the value of the expected increase when there is no online public opinion. If , it means that the trading volume of the stock tends to increase at that time. reflects the magnitude of the impact of investors’ divergence on the trading volume in the presence of online public opinions. When , if and , it indicates that the existence of divergence reduces the expected increase in trading volume, so the existence of divergence hinders the increase in trading volume. If , and , it indicates that the existence of divergence helps to mitigate the decline in trading volume. At the same time, the smaller the divergence is, the more beneficial it is to the recovery of the trading volume. The number of online public opinions signifies the amount of investor discussion about the stock. When investors talk positively about the expected movement of a stock, the corresponding number of main posts or comment entries will be higher. We treat the number of online public opinion articles logarithmically. the number of online public opinions. If > 0, it means that the higher its discussion and attention, the higher the trading volume of the corresponding stock.
In addition, due to the different industries and market capitalization of the CSI 300 index constituents, investors have shown mixed interest in discussing and trading them. Non-financial companies such as Kweichow Moutai Co Ltd. (600519.SH), Contemporary Amperex Technology Co., Limited. (300750.SZ) and BYD Co Ltd. (002594.SZ) not only have higher weightings but are also more topical. In contrast, financials are typically discussed only when dividends or major events are released, as they are known for their solid returns. The amount of capital is also an important factor in determining investors’ trading preferences for stocks of different market capitalizations.
We collected statistics on investors’ online public opinions, distinguishing between various industries and market capitalization. Then, we calculated the average market capitalization of stocks on each trading day throughout 2021. The lower and upper quartiles are considered as small-cap and large-cap stocks, while the rest are classified as mid-cap stocks. The results are shown in
Table 2.
Table 2 shows that the cumulative number of online public opinions posted during the trading session is much higher for non-financial stocks than for financial stocks. Additionally, there is a difference in the number of online public opinions about stocks with different market capitalization across industries. Large-cap stocks have more online public opinions, i.e., investors prefer to discuss large-cap stocks. However, China is still an emerging market, and individual investors have less capital. Therefore, mid- and small-cap stocks are more favored by investors. Categorizing stocks of different industries and market capitalization based on the above statistics can provide a basis for the subsequent exploration of the impact of investor divergence on the trading volumes of different types of stocks. It also helps to illustrate the differential impact of investors’ divergence on market liquidity in China.
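The quartile rule used to separate small-, mid- and large-cap stocks can be sketched in Python as follows. The sketch assumes that each stock is represented by its average market capitalization over 2021 and that stocks lying exactly on a quartile boundary are assigned to the outer group; the function name, the stock labels and the choice of the inclusive quantile method are assumptions made for illustration.

```python
from statistics import quantiles


def classify_by_market_cap(avg_caps: dict[str, float]) -> dict[str, str]:
    """Label each stock as 'small', 'mid' or 'large' by quartile of average cap.

    Stocks at or below the lower quartile are small-cap, stocks at or above
    the upper quartile are large-cap and the remainder are mid-cap.
    """
    q1, _, q3 = quantiles(avg_caps.values(), n=4, method="inclusive")
    labels = {}
    for stock, cap in avg_caps.items():
        if cap <= q1:
            labels[stock] = "small"
        elif cap >= q3:
            labels[stock] = "large"
        else:
            labels[stock] = "mid"
    return labels


# Hypothetical average market capitalizations (arbitrary units).
example = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0,
           "E": 5.0, "F": 6.0, "G": 7.0, "H": 8.0}
groups = classify_by_market_cap(example)
```

With eight hypothetical stocks, the two smallest fall into the small-cap group, the two largest into the large-cap group and the remaining four into the mid-cap group.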
5. Conclusions
We examine the impact of investors’ divergence on the trading volume of the CSI 300 index constituents using online public opinions from Guba. Using the investor information that is publicly available in Guba, we further test how this impact varies across investors of different levels. The results show that divergence among all investors hinders the increase in trading volume, whereas a larger number of online opinions represents a higher level of discussion and leads to a greater trading volume. When investors are classified by their levels of influence, it is evident that investors of different levels have varying impacts on the trading volume. Convergence among high-level investors can increase investor confidence and willingness to trade; as a result, the trading volume rebounds, which contributes to the improvement in market quality. Conversely, the divergence of low-level investors disrupts the original order of the market: the trading volume decreases as investors become less willing to trade. Additionally, the divergence among investors of different levels can also impede an increase in the trading volume. After dividing the CSI 300 index constituents by industry and market capitalization, we find that non-financial stocks and mid-cap stocks are more affected by investors’ divergence. The impacts of high- and low-level investors’ divergences on the trading volume consistently differ: the convergence of high-level investors has a positive effect on the recovery of the trading volume, while the divergence of low-level investors accelerates the decline in trading volume.
The impact of investors’ divergence on the Chinese stock market is a significant concern. Online stock forums provide a convenient platform for investors to express their views and expectations, and they can also serve as a channel for guiding investors’ online public opinions. Therefore, market regulators and financial institutions can utilize relevant online stock forums to provide positive guidance for low-level investors. This will be helpful in maintaining market stability and improving the quality of the Chinese stock market. On the basis of the above findings, we would like to make the following recommendations: First, we recommend strengthening the regulation of Internet stock forums. With the development of the Internet and communication technologies, information dissemination has become faster and more widespread. Because they are anonymous, stock forums such as Guba make it easy for investors to share market expectations. However, the rapid dissemination of information may exacerbate irrational investment behavior, as many investors lack professionalism and experience. Therefore, regulators need to use big data and information technology to strengthen market regulation, including monitoring and verifying information on forums to ensure orderly information dissemination. Second, we recommend improving investors’ financial literacy. Investors need to improve their financial literacy and their ability to discriminate between pieces of information. This includes improving their knowledge of the financial market to make rational investments, as well as improving their ability to discern information on the Internet to avoid making wrong investment decisions by blindly following the herd.
There are limitations and shortcomings in our paper, and subsequent studies may consider improvements in the following areas: First, we only use the level of influence to differentiate investors, and we do not discuss high-level investors in more detail. However, Guba also provides other characteristics as label fields, so subsequent studies can draw on multiple perspectives to conduct more detailed research. Second, we use Jieba for Chinese word segmentation and SnowNLP for sentiment scoring to classify investors’ online public opinions by sentiment. With the development of large-scale artificial intelligence models, subsequent research can use them to handle massive data and natural language processing.