We divide the 6-year interval into 67 overlapping half-year intervals and select the 1053 companies with the highest density of news mentioning them during the period under review. We exclude news with relevance below 80 (i.e., news whose probability of being connected with the company is below 80%). Then, for each time interval, we check every pair of companies for co-mentions within a single article: if two companies are both mentioned in one article during the interval, the weight of the link between them is set to 1; if they are not mentioned together during the interval, the weight is set to 0. From these binary weights we form an unweighted symmetric co-mention matrix for each time interval.
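The construction described above can be illustrated with a minimal Python sketch. The input format (one dictionary of company identifier to relevance score per article), the function name `comention_matrix`, and the toy identifiers are assumptions made for illustration; they are not taken from the data set used in the study.

```python
from itertools import combinations


def comention_matrix(articles, companies, min_relevance=80):
    """Build a binary, symmetric co-mention matrix for one time window.

    articles:  list of dicts mapping company id -> relevance score
               for one news article (hypothetical input format).
    companies: ordered list of company ids that index the matrix.
    """
    index = {company: i for i, company in enumerate(companies)}
    n = len(companies)
    adj = [[0] * n for _ in range(n)]
    for article in articles:
        # Keep only tracked companies whose relevance passes the threshold.
        kept = sorted(
            index[c] for c, rel in article.items()
            if c in index and rel >= min_relevance
        )
        # Every pair of companies kept in the same article gets a link.
        for i, j in combinations(kept, 2):
            adj[i][j] = adj[j][i] = 1
    return adj


# Toy window with three articles and four tracked companies.
window = [
    {"AAA": 95, "BBB": 90},
    {"BBB": 85, "CCC": 99, "DDD": 40},  # DDD falls below the threshold
    {"AAA": 100},
]
adj = comention_matrix(window, ["AAA", "BBB", "CCC", "DDD"])
```

Repeated co-mentions of the same pair leave the entry at 1, which is what makes the resulting matrix unweighted.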
5.1. Similarity Analysis Using Measures h and d
We apply the proposed model to the co-mention network and to the market graph. Figure 3 shows how the structure of the market graph changed between adjacent half-year periods in terms of the ranking distance d and the local structure distance h.
For each six-month window (period) we constructed a market graph in accordance with the approach described in Section 2.1. The IDs of the periods and their starting and ending dates are given in Table 1. Thus, we obtained 67 market graphs corresponding to the 67 six-month periods. Similarly, we obtained 67 company co-mention networks, one for each of the 67 periods (see Table 1), using the methodology described in Section 2.2.
We computed the value of the d-metric for each pair of graphs constructed for two consecutive six-month periods, i.e., for periods t and t+1, where t = 1, …, 66. In addition, we computed the value of the h-metric for each of these pairs of graphs.
Figure 3 shows the evolution of the ranking and local structure distances between the market graphs constructed for each pair of consecutive six-month periods, i.e., between periods 1 and 2, between 2 and 3, …, between 66 and 67. Thus, the i-th point on the plane has as its coordinates the values of d and h computed between the i-th and (i+1)-th graphs. Each point characterizes the difference between the graphs in two consecutive time windows, evaluated by both the Hamming distance h and the d-measure. This visualization allows one to distinguish periods with higher or lower intensity of graph changes.
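As an illustration of how one coordinate of each point can be obtained, the following Python sketch computes the Hamming distance between consecutive graphs. It assumes that h counts the vertex pairs that are linked in exactly one of the two graphs; the definition in the paper may include a normalization that is not reproduced here. The d-measure is not sketched because its definition is given in the cited reference. The helper names are hypothetical.

```python
import numpy as np


def hamming_distance(a, b):
    """Number of vertex pairs that are linked in exactly one of the graphs."""
    a, b = np.asarray(a), np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("graphs must share the same vertex set")
    upper = np.triu_indices(a.shape[0], k=1)  # count each pair once
    return int(np.sum(a[upper] != b[upper]))


def consecutive_distances(graphs, dist):
    """dist(G_t, G_{t+1}) for every pair of consecutive graphs."""
    return [dist(g, g_next) for g, g_next in zip(graphs, graphs[1:])]


g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # edge 0-1
g2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # edges 0-1 and 1-2
g3 = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # edge 0-2
h_series = consecutive_distances([g1, g2, g3], hamming_distance)
```

Passing a different `dist` function to `consecutive_distances` yields the second coordinate of each point in the same way.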
Figure 3 shows that the local structure of the market graph changed very little until the beginning of the 2008 crisis (blue points). During the crisis (red points), however, the values of the similarity measure h (i.e., the Hamming distance) between consecutive graphs increased sharply, by more than a factor of ten. Moreover, after the peak of the crisis had passed, the instability of the network local structure remained at the same high level (green points). On the other hand, the value of the measure d, which measures the proximity of the vertex rankings of two consecutive graphs, did not increase during the crisis.
The i-th point in Figure 3 shows the similarity, in terms of d and h, of the i-th and (i+1)-th graphs constructed for the corresponding consecutive 6-month intervals defined in Table 1. Points that correspond to periods with midpoints from July 2005 to May 2008 are colored blue, points that correspond to periods with midpoints from June 2008 to May 2009 are colored red, and points that correspond to periods with midpoints from June 2009 to October 2010 are colored green. It should be noted that the local structure of the market graph changed greatly at the beginning of and during the financial crisis.
Figure 3 shows that the structure of significant correlations between asset returns changed only slightly before the crisis, while turbulence in financial markets during the crisis induced visible transformations of the market graphs. Structural changes slowed down for several periods and then resumed. The list of central vertices of the market graphs was updated more intensively before and after the crisis than during the crisis, i.e., the ranking order of the companies was more stable during the crisis. This may be because many of the companies that were vulnerable during the crisis belonged to the same economic sectors, which were exposed to risk.
It is well known that if the edge densities of two graphs are very different, then the Hamming distance between these graphs is large. Thus, the main contribution to the change in the market graph structure came from increases and decreases in the edge density of the graph (see Figure 2).
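The dependence of the Hamming distance on edge density can be made explicit with a short derivation. It assumes that h counts the vertex pairs linked in exactly one of the two graphs (an unnormalized form, which may differ from the paper's definition by a constant factor); the symbols E_i, n, and ρ_i are notation introduced here for the edge sets, the number of vertices, and the edge densities.

```latex
% Assumption: h counts vertex pairs linked in exactly one of the two graphs.
\begin{aligned}
h(G_1, G_2) &= |E_1 \,\triangle\, E_2| = |E_1| + |E_2| - 2\,|E_1 \cap E_2| \\
            &\ge |E_1| + |E_2| - 2\min(|E_1|, |E_2|) = \bigl|\,|E_1| - |E_2|\,\bigr| \\
            &= \binom{n}{2}\,|\rho_1 - \rho_2|, \qquad \rho_i = |E_i| \Big/ \binom{n}{2}.
\end{aligned}
```

For n = 1053 there are 553,878 vertex pairs, so under this assumption a change in edge density of only 0.01 already forces at least 5,539 pairs to differ, regardless of how the remaining edges are arranged.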
Note that the proximity of the “blue” points to the “green” ones does not imply that the corresponding graphs are close to each other, since each point reflects only the distance between two consecutive graphs. To understand how much the graphs from the initial “blue” period differ from the “green” graphs, we conduct a multidimensional scaling analysis in Section 5.3.
Similarly, we computed the values of the h- and d-metrics for each pair of company co-mention networks built for two consecutive six-month periods. The points whose coordinates are given by these two values are shown in Figure 4.
Unlike the market graph, the node ranking and the structure of the co-mention networks did not change significantly over time. However, the network local structure did change in the periods from April 2007 to March 2008 (Figure 4). This interval covers the time before and during the financial crisis of 2008.
Figure 4 shows that the co-mention network local structure changed slightly in 2007 (blue points). However, in the period before the crisis (red points), the values of the similarity measure h (i.e., the Hamming distance) between consecutive graphs increased by a factor of 1.5–2 or more. What caused the changes in the local structure of the company co-mention network, and whether such changes in the characteristics of the news flow may be a precursor of crisis phenomena in the financial market, remain open questions. Surprisingly, at the very beginning of the crisis, the network local structure became more stable than in 2007 and remained stable in subsequent periods (green points). On the other hand, the value of the measure d, which measures the similarity of the vertex rankings of two consecutive graphs, did not increase during the crisis.
The values of the measures d and h obtained for consecutive market graphs (Figure 3) significantly exceed the values of d and h for consecutive company co-mention networks (Figure 4). Some values of the d-measure differ by more than a factor of 2, while the values of the h-measure differ by an order of magnitude. In this sense, the company co-mention network is more stable than the market graph.
Figure 5 shows how the structure of the market graph differed from that of the co-mention network over the successive half-year periods. The ranking distance increased significantly, while the local structure distance remained stable and low. Thus, from the local structure point of view, the market graph and the co-mention network are similar in many ways. The only exceptions are periods 41 to 51 (with midpoints in August 2008–June 2009), when the United States subprime mortgage crisis started, and periods 61 to 67 (with midpoints in April 2010–October 2010).
Financial and economic news that affects an industry or a sector often mentions the key companies of that industry or sector. Therefore, a connection between companies reflected by their co-mention in a news item may result from their belonging to the same economic sector. It is known that correlations between returns on assets in the same sector are quite high. It can therefore be assumed that the market graph, constructed from correlations between asset returns, and the company co-mention network, constructed from co-mentions in the news, should be similar. However, as Figure 5 shows, this is not quite true: the differences are significant both with respect to network local structure (h) and with respect to node ranking (d).
5.2. QAP Correlation and Regression Analysis
Using the company co-mention networks and the market graphs, we carry out a QAP correlation analysis, since standard correlation analysis is not suitable for such data: the observations (matrix cells) are not independent of each other, which violates one of the basic assumptions of linear regression analysis. QAP (Quadratic Assignment Procedure) was proposed and developed in [49,50,51,74]. We use QAP correlation analysis to determine the significance of correlations:
for time-adjacent co-mention networks,
for time-adjacent market graphs.
When the market graph is used as the main network, the corresponding cells of the two matrices are first compared to compute the observed Pearson correlation coefficient. The rows and columns of one matrix are then randomly permuted (with the same permutation applied to both), the correlation is recomputed, and this process is repeated many times. If the correlations obtained for the random permutations are lower than the observed value, the relationship between the respective matrices is considered significant.
The correlation analysis was carried out in R.
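The permutation procedure can be summarized in a few lines of code. The sketch below is written in Python with NumPy purely for illustration and is not the R code used in the study; the function names, the one-sided p-value convention, and the toy graphs are assumptions.

```python
import numpy as np


def _offdiag(m):
    """Flatten a square matrix, dropping the diagonal (no self-loops)."""
    return m[~np.eye(m.shape[0], dtype=bool)]


def qap_correlation(x, y, n_perm=1000, seed=0):
    """Observed Pearson correlation and one-sided permutation p-value."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    rng = np.random.default_rng(seed)
    observed = np.corrcoef(_offdiag(x), _offdiag(y))[0, 1]
    at_least = 0
    for _ in range(n_perm):
        p = rng.permutation(x.shape[0])
        y_perm = y[np.ix_(p, p)]  # same permutation for rows and columns
        r = np.corrcoef(_offdiag(x), _offdiag(y_perm))[0, 1]
        at_least += r >= observed
    return observed, (at_least + 1) / (n_perm + 1)


# Toy example: y equals x except for one flipped vertex pair.
rng = np.random.default_rng(1)
upper = np.triu((rng.random((30, 30)) < 0.3).astype(int), k=1)
x = upper + upper.T
y = x.copy()
y[0, 1] = y[1, 0] = 1 - y[0, 1]
r_obs, p_val = qap_correlation(x, y)
```

Permuting rows and columns together relabels the vertices while preserving the structure of the graph, which is why the permuted correlations form a suitable reference distribution for dependent network data.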
We apply QAP regression to find the factors that influence the market graph and the company co-mention network. For networks represented by binary data, standard OLS inference should not be used, since it requires the observations to be independent and identically distributed. Connections between nodes in a network imply potential dependence between directly or indirectly connected nodes, so this assumption is violated and conventional OLS significance tests cannot be relied upon. In QAP, the rows and columns of the network matrices are permuted, and the relationship between the independent matrices and the dependent matrix is recomputed for each permutation. The test statistics obtained over many permutations are then used to test the null hypothesis of the regression.
In our study, we wanted to find a connection between market graphs and company co-mention networks in adjacent periods of time. To investigate how the market graph is related to the company co-mention network, we used QAP regression, in which the market graph at time t is the dependent variable. The market graph matrices of the previous period and the company co-mention networks of the current period were used as independent variables.
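A minimal sketch of such a regression is given below, assuming a linear model fitted by least squares on the off-diagonal cells and significance assessed by permuting the rows and columns of the dependent matrix. It is written in Python with NumPy for illustration only and does not reproduce the R routines used in the study; all names and the toy data are hypothetical.

```python
import numpy as np


def _offdiag(m):
    m = np.asarray(m, float)
    return m[~np.eye(m.shape[0], dtype=bool)]


def _ols(y_mat, x_mats):
    """Least-squares coefficients: intercept first, then one per x matrix."""
    columns = [_offdiag(x) for x in x_mats]
    design = np.column_stack([np.ones(columns[0].size)] + columns)
    coef, *_ = np.linalg.lstsq(design, _offdiag(y_mat), rcond=None)
    return coef


def qap_regression(y_mat, x_mats, n_perm=1000, seed=0):
    """Coefficients and two-sided permutation p-values (Y-permutation)."""
    y_mat = np.asarray(y_mat, float)
    rng = np.random.default_rng(seed)
    observed = _ols(y_mat, x_mats)
    exceed = np.zeros_like(observed)
    for _ in range(n_perm):
        p = rng.permutation(y_mat.shape[0])
        permuted = _ols(y_mat[np.ix_(p, p)], x_mats)
        exceed += np.abs(permuted) >= np.abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)


def random_graph(rng, n, density):
    upper = np.triu((rng.random((n, n)) < density).astype(int), k=1)
    return upper + upper.T


# Toy data: the current graph is the previous one with a few pairs flipped.
rng = np.random.default_rng(2)
previous = random_graph(rng, 30, 0.3)
comention = random_graph(rng, 30, 0.3)
current = previous.copy()
for i in range(29):
    current[i, i + 1] = current[i + 1, i] = 1 - current[i, i + 1]
coef, p_values = qap_regression(current, [previous, comention], n_perm=500)
```

Here `coef[1]` and `coef[2]` are the coefficients of the previous graph and of the co-mention network, respectively; the p-value reported for the intercept (`p_values[0]`) is not of interest.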
The results of the analysis are presented in Table 5 and Table 6. The rows and columns of the dependent variable matrix were permuted 1000 times. The matrices of independent variables are shown in Table 6. The QAP results show that the market graph matrix is closely related to the market graph of the previous period. The exceptions are periods 37–43 (April 2008–October 2008), the peak of the financial crisis. Company co-mention networks had a smaller impact on the market graph, though they are also significant in all of the models built.
QAP shows (Table 5) that there is a significant correlation both between adjacent co-mention networks and between adjacent market graphs. The estimated density of the correlations over the repeated QAP runs shows that, across all runs, the correlations for randomly permuted graphs were lower than the test statistic, and therefore the obtained correlation values can be considered statistically significant.
The estimated correlation coefficients are quite high. The company co-mention network is stably reproduced from period to period. For the market graphs, by contrast, the correlation values vary over a wide range, and it can be argued that they decreased at the beginning of the global financial crisis.
Since we have data for several types of graphs and several periods of time, we can also construct a linear regression on graphs. The market graph in the current period was taken as the dependent variable. The independent variables were the market graph in the previous period and the company co-mention network in the current period (Co-mention).
The QAP regression analysis of the dependence of the current market graph on the previous one, as well as on the current company co-mention graph, is given in Table 6. All coefficients of the models are statistically significant.
We also note that the coefficient for the co-mention network reaches its highest value in the period that corresponds exactly to the beginning of the 2008 crisis. This indicates that during the crisis the market graph had a special structure, which can be explained by the structure of the corresponding co-mention network.
5.3. Multidimensional Scaling
In this subsection we use the multidimensional scaling procedure to visually represent the matrix of pairwise distances between graphs (both market graphs and company co-mention networks). Multidimensional scaling was developed in [75] and aims at a graphical representation of the distances between sets of objects [76]. Given a small number of dimensions k and a distance matrix containing the distances between each pair of objects (graphs), the multidimensional scaling algorithm places every object (graph) into a k-dimensional Euclidean space in such a way that the between-object distances obtained by the graph similarity measures are preserved as closely as possible.
The best-known multidimensional scaling methods are the metric, non-metric, and generalized methods. The metric algorithm finds a linear relationship, while the non-metric algorithm is characterized by a set of nonparametric monotonic curves. Since we used quantitative rather than ordinal scales, preference was given to classical multidimensional scaling (MDS), which is also known as principal coordinates analysis [77].
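For reference, classical MDS can be written compactly as a double centering of the squared distances followed by an eigendecomposition. The Python sketch below is illustrative; in particular, measuring the share of variance explained by a component as its share of the positive eigenvalues is one common convention and is an assumption here, not necessarily the computation used in the study.

```python
import numpy as np


def classical_mds(dist, k=2):
    """Classical MDS (principal coordinates analysis) of a distance matrix.

    Returns the n x k coordinates and, for each of the k components,
    its share of the sum of positive eigenvalues.
    """
    d = np.asarray(dist, float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # ascending order
    order = np.argsort(eigvals)[::-1]             # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0, None)          # drop negative eigenvalues
    coords = eigvecs[:, :k] * np.sqrt(positive[:k])
    explained = positive[:k] / positive.sum()
    return coords, explained


# Toy check: four points on a line are recovered exactly in one dimension.
x = np.array([0.0, 1.0, 3.0, 6.0])
dist = np.abs(x[:, None] - x[None, :])
coords, explained = classical_mds(dist, k=1)
recovered = np.abs(coords[:, 0][:, None] - coords[:, 0][None, :])
```

When the distances are not exactly Euclidean, some eigenvalues are negative and the low-dimensional configuration only approximates the original distances.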
Since we consider two sequences of graphs (market graphs and company co-mention networks) and use five measures to calculate distances between graphs, we obtain ten matrices of pairwise distances between graphs (five for market graphs and five for co-mention graphs). We therefore apply the multidimensional scaling procedure to each of the ten distance matrices.
Let a similarity measure be a function that returns the distance (dissimilarity) between two graphs. In our study we use the following measures:
the Hamming distance h;
the network similarity measure d proposed in [46];
the linear combination of d and h defined in (5);
the D-measure;
the graph diffusion distance (GDD) [48].
Using a given measure, we compute the matrix of pairwise distances between all market graphs in our sequence. Likewise, using the same measure, we compute the matrix of pairwise distances between all company co-mention networks in the sequence.
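The computation of such a distance matrix can be sketched as follows. The example is in Python and uses an unnormalized Hamming distance as the plugged-in measure; both the helper names and this choice of measure are assumptions made for illustration, and any other graph distance function with the same signature could be passed instead.

```python
import numpy as np


def hamming(a, b):
    """Vertex pairs linked in exactly one of the two graphs (assumed form)."""
    a, b = np.asarray(a), np.asarray(b)
    upper = np.triu_indices(a.shape[0], k=1)
    return int(np.sum(a[upper] != b[upper]))


def pairwise_distance_matrix(graphs, dist):
    """Symmetric T x T matrix of dist(G_s, G_t) for a sequence of T graphs."""
    t = len(graphs)
    out = np.zeros((t, t))
    for i in range(t):
        for j in range(i + 1, t):
            out[i, j] = out[j, i] = dist(graphs[i], graphs[j])
    return out


g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # edge 0-1
g2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # edges 0-1 and 1-2
g3 = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]  # edge 0-2
dmat = pairwise_distance_matrix([g1, g2, g3], hamming)
```

For 67 graphs this requires 2211 evaluations of the measure per sequence, which is why the cost of a single evaluation matters for the slower measures.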
Multidimensional scaling analysis allows us to represent these pairwise distances visually. It can therefore provide important insight into the dynamics of both market graphs and company co-mention networks.
Figure 6a presents the results of multidimensional scaling applied to the distance matrix between market graphs calculated using the h-measure defined in (2). Figure 6a shows that the local structure of the market graph is stable over time. During the financial crisis of 2008 (periods 38–50), the topological dissimilarity increases significantly and then quickly returns to its previous level. Redundancy analysis shows that 56% of the variance is explained by the first principal component, which we consider sufficient.
Figure 6b presents the results of multidimensional scaling applied to the distance matrix between the co-mention graphs calculated using the h-measure defined in (2). Figure 6b shows that the topological dissimilarity of the co-mention graphs decreases markedly before the beginning of the crisis and quickly returns to its previous level afterwards. Only 20% of the variance is explained by the first principal component.
Figure 6c presents the results of multidimensional scaling applied to the market graph for the distance matrix obtained using the d-measure defined in (3). A significant shift of the central nodes (companies) of the market graph can be seen during the crisis. The first principal component explains 28% of the variance.
Figure 6d presents the results of multidimensional scaling applied to the company co-mention graph for the distance matrix obtained using the d-measure defined in (3). Figure 6d shows that for the co-mention graph there is a monotone increase in the rank distance, which accelerates after the crisis. Thus, the crisis led to significant changes in the ranking order of the companies in the co-mention graph. Only 31% of the variance is explained by the first principal component.
The results of multidimensional scaling applied to the market graph and the co-mention graph for the distance matrix obtained using the linear combination of d and h defined in (5) are presented in Figure 6e and Figure 6f.
Figure 6g presents the results of multidimensional scaling applied to the market graph for the distance matrix obtained using the D-measure. It should be noted that the results are quite similar to those shown in Figure 6c. The first principal component explains 59% of the variance.
Figure 6h presents the results of multidimensional scaling applied to the co-mention graph for the distance matrix obtained using the D-measure. A significant decrease can be seen before the beginning of the crisis, followed by an increase to a higher level afterwards. The first principal component explains 75% of the variance.
Figure 6k,l present the results of multidimensional scaling applied to the market graph and to the co-mention graph, respectively, for the distance matrix obtained using the Graph Diffusion Distance. The results are similar to those shown in Figure 6a. The first principal component explains 39% of the variance for the market graph and 13% for the co-mention graph.
The graph similarity measures (D-measure, Graph Diffusion Distance, d, h) gave similar results for the market graph in terms of the principal component method. For the D-measure and the h-metric it suffices to use only the first principal component. For the co-mention network, different measures gave different results: except for the D-measure, the first principal component explains less than 32% of the total variance.
However, the D-measure appears to be the most time-consuming to compute among the similarity measures considered. In our study, we used the corresponding R functions to estimate the similarity between graphs with 1053 nodes. Computing the similarity for each pair of graphs with the D-measure took about 5 times longer than with the d- and h-metrics and GDD, and longer still as the edge density of the graphs increased.
Below we draw some conclusions from the results of the multidimensional scaling (MDS).
We found that a one-factor model can explain a significant part of the dynamics of change in the structure of both the market graph and the co-mention graph. However, the reliability of this conclusion depends essentially on the choice of graph similarity measure.
The one-factor estimates obtained by MDS from the distance matrices for the market graphs turned out to differ somewhat across graph similarity measures. In particular, the h-measure and GDD give very similar results, which differ from the results obtained for the d- and D-measures. The one-factor estimates obtained by MDS for the co-mention graphs are more sensitive to the choice of graph similarity measure.
We note that the visual representations of the evolution of the market graph constructed using the Hamming distance and the GDD-measure (Figure 6a,k) show very similar temporal dynamics.
The visual representations of the evolution of the company co-mention network constructed using these two measures (Figure 6b,l) also show quite similar temporal dynamics, which differ only in sign.
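A difference in sign alone carries no structural information in classical MDS, because each coordinate axis is determined only up to reflection. The short derivation below uses notation introduced here: D^{(2)} is the matrix of squared distances, J is the centering matrix, and (λ, v) is an eigenpair of the double-centered matrix B.

```latex
% Notation introduced here: J = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^{\top}.
\begin{aligned}
B &= -\tfrac{1}{2}\, J D^{(2)} J, \qquad
B v = \lambda v \;\Longrightarrow\; B(-v) = \lambda(-v), \\
x &= \sqrt{\lambda}\, v \quad\text{and}\quad x' = -x
\quad\text{give}\quad |x'_i - x'_j| = |x_i - x_j| \ \text{for all } i, j.
\end{aligned}
```

Both x and x' are therefore equally valid one-dimensional MDS solutions, and which of them a numerical routine returns depends on the eigenvector normalization it happens to use.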
The apparent similarity between the edge density dynamics (Figure 2) and the dynamics shown in Figure 6a,k indicates that the main factor identified by MDS when using the Hamming distance or the GDD-measure is the graph edge density. In other words, the dynamics of graph changes obtained using the Hamming distance or the GDD-measure can be explained by a factor as simple as the graph edge density.
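One simple way to quantify this visual similarity is to correlate the edge density series with the first MDS coordinate. The Python sketch below illustrates the idea; the function names are hypothetical, the coordinate values in the example are made up, and the absolute value is taken because the sign of an MDS axis is arbitrary.

```python
import numpy as np


def edge_density(adj):
    """Share of vertex pairs that are linked in an undirected simple graph."""
    a = np.asarray(adj)
    n = a.shape[0]
    return float(np.triu(a, k=1).sum() / (n * (n - 1) / 2))


def alignment_with_density(graphs, first_coordinate):
    """Absolute Pearson correlation between density and an MDS coordinate."""
    densities = [edge_density(g) for g in graphs]
    return abs(float(np.corrcoef(densities, first_coordinate)[0, 1]))


g1 = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # 1 of 3 possible edges
g2 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # 2 of 3 possible edges
g3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # 3 of 3 possible edges
# Made-up first MDS coordinates for the three graphs (illustration only).
alignment = alignment_with_density([g1, g2, g3], [2.0, 0.0, -2.0])
```

A value close to 1 would support the interpretation that the first MDS factor essentially tracks edge density.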
On the other hand, the d-measure allowed us to identify almost identical dynamics over time for both the market graph and the co-mention network (Figure 6c,d). The figures show that these changes took place smoothly and continuously, while the ranking of the central nodes changed quite significantly in both graphs over the entire period under consideration.
The results obtained using the D-measure are more ambiguous. Figure 6g,h show that one factor is not sufficient to explain the dynamics of the market graph. It seems that the D-measure is a more adequate tool for network comparison, since it uses more factors to explain the differences between the graphs.
Only one method out of five (the d-measure) shows a significant difference in the structure of the graphs between the pre-crisis and post-crisis periods. The dynamics of changes in the market graph turned out to be dissimilar to the dynamics of the company co-mention network; the closest similarity between the two was obtained with the d-measure.