3.2. Measurement of Key Level
A social network can be defined as G = (V, E), where V is the set of nodes in G and E is the set of edges between nodes in G. 
G is a logical network in which edges are constructed from the results of real communications between nodes. G contains a very large number of nodes, but the distance between nodes in G is short, in line with the six degrees of separation in the small world of social networks [47]. In other words, the small world of social networks can effectively support fast, large-scale video propagation with a high success ratio. With the help of the short social distance between nodes, the propagation of a video could seemingly achieve extensive coverage in G. However, only a small number of videos achieve relatively extensive popularity in networks, so social distance is evidently not the decisive factor for the popularity of video propagation. Besides the small-world characteristic of social networks, some nodes have abundant social links and strong social influence and undertake the task of linking social relationships. These nodes, which use their social links and influence to promote the propagation of resources, are considered the key nodes. Specifically, the key nodes use social links to supply video data to other nodes or push video data to their neighbor nodes, and they rely on their social influence to make the neighbor nodes accept the pushed video data. If the key nodes are selected before the propagation of videos, the video systems can use them to enlarge the propagation scale of videos.
We assume that the visibility of video information for each node is one at the initial moment of video propagation; in other words, all nodes in 
G can obtain video information through message broadcasting by the video source nodes. “Pull” and “push” are the two modes of video propagation. When a node 
 wants to obtain video resources, 
 sends a request message to one or more nodes (“pull”) or waits for video data pushed by other nodes (“push”). If 
 has obtained video data and successfully supplies or pushes it to many nodes, 
 can be considered a key node. The coverage levels of 
 for propagation of 
 and the interest-related probability that 
 obtained data of video 
 are the important factors for estimation of key levels of 
. Let 
 be the interest-related probability that 
 obtained data of video 
. The value of 
 is calculated by the probability 
 that 
 accepts 
 and the influence factor 
 that 
 is influenced by the number of 
’s one-hop neighbor nodes which fetch data from 
. 
 can be defined as:
The videos are classified into different categories in terms of their theme and content, such as comedy and science fiction. Most users have similar interest distributions: intense interest in a few video categories and weak interest in numerous others. The stronger the interest of a user is, the higher the probability that the user accepts a video. For instance, a user who is highly interested in basketball accepts an NBA game regardless of whether the Lakers or the Warriors are playing. Let  be a set of video categories where each video has a unique category;  is the number of items in ;  is the similarity value of content between  and . The information (e.g., title, introduction, directors, actors, etc.) of  and  is constructed as two vectors, respectively; the cosine of the angle between the two vectors can be used as the value of .
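As a minimal sketch of the cosine measure described above, the following Python code builds term-count vectors from the descriptive information of two videos and returns the cosine of the angle between them. The bag-of-words tokenization and the names `text_vector` and `cosine_similarity` are illustrative assumptions; the text does not specify how the information vectors are constructed.

```python
import math
from collections import Counter


def text_vector(info: str) -> Counter:
    """Turn descriptive text (title, introduction, ...) into term counts.

    Assumption: a plain whitespace bag-of-words model.
    """
    return Counter(info.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is treated as unrelated
    return dot / (norm_a * norm_b)


lakers = text_vector("nba basketball lakers game")
warriors = text_vector("nba basketball warriors game")
similarity = cosine_similarity(lakers, warriors)
```

Two descriptions that share most of their terms yield a value close to 1, while descriptions with no common terms yield 0.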
 is the average value of content-based similarity values between 
 and all items in 
 and is considered as the predicted probability; 
 can be defined as:
        where 
k is a positive integer and is the number of videos which belong to 
 and are propagated during the same time period; 
 denotes that multiple videos which belong to the same video category are propagated in networks at the same time; 
 denotes that a single video which belongs to 
 is propagated in networks. When the videos of multiple categories are propagated in networks, there is a game among the different video categories: whether a video is accepted depends on how well its category matches the interest preference of users. When multiple videos which belong to the same category are propagated in networks, there is also a game between them. In fact, 
 can be considered a game factor, which means that users accept the videos of the same category with the same probability.
 denotes the important level of a video category 
 relative to other categories and is also considered as the probability that 
 accepts videos in 
; 
 can be defined as:
        where 
 is the number of items in 
 which are accepted by 
; 
 returns the number of all video categories. 
 is the total number of all videos in all video categories which are accepted by 
; 
 is a regulatory factor; 
 means that the probability that 
 accepts 
 is suited to the content-based correlation levels between 
 and items in 
; 
 means that the probability that 
 accepts 
 is suited to the historical behaviors of video playback of 
.
 means that there is a balance between content-based correlation levels and historical behaviors.
In social networks, if two nodes build a direct communication relationship and the message interaction between them does not depend on forwarding by other nodes, they have a one-hop neighbor relationship, and the social distance between them in the social network is one. Let 
 be the set of one-hop neighbor nodes of 
 in 
G. The two nodes 
 and 
 in 
G communicate directly (e.g., through “pull” and “push” of video data), so an edge is built between them, and they have a neighbor relationship in 
G. The social relationship level between 
 and 
 is closer than that between nodes which do not share an edge; 
 uses the edges to interact with the neighbor nodes, so that there are different levels of social relationship between 
 and the neighbor nodes. If 
 and neighbor nodes in 
 frequently share videos with each other, they have a close social relationship. For instance, if 
 and a neighbor node 
 in 
 always satisfy each other’s requests for video data, there is a close social relationship between 
 and 
. If 
 and 
 always accept the video data pushed by each other, 
 and 
 also have a close social relationship. Further, the video-fetching behaviors of 
’s neighbor nodes exert different levels of influence on 
 according to the closeness of their social relationship. When 
 has data of 
 and obtains the information that the close neighbor nodes accept 
, the probability that 
 makes the same decision (acceptance or rejection of 
) as the close neighbor nodes may be high. On the other hand, nodes which have only a loose social relationship with 
 have a low influence on it. Therefore, 
 can be defined as:
 is a set of 
’s neighbor nodes which accept 
; 
 returns the number of nodes in 
; 
 is the number of videos which are accepted by 
 under the influence of neighbor nodes. 
 is the historical ratio of influence-based video fetching of 
 and also is considered as the experiential probability of fetching 
 of 
. 
 is also a regulatory factor like 
. 
w is the weight of edge between 
 and a neighbor node in 
; 
 is the cumulative sum of edge weights between 
 and nodes in 
 which accept 
; 
 is the total sum of edge weights between 
 and all nodes in 
. 
 of 
 and 
’s one-hop neighbor node 
 can be defined as:
        where 
 is the interaction frequency between 
 and 
; 
 is the total frequency of interaction between 
 and all nodes in 
; 
 is the number of successful interactions between 
 and 
; 
 is the total number of successful interactions between 
 and all nodes in 
. 
 is the interaction level of 
 relative to 
 among all nodes in 
; 
 is the propagation level of video data of 
 relative to all nodes in 
. The interaction and propagation levels of the nodes in 
 are the two important factors of the edge weight 
w. Interaction and propagation are closely related: (1) the interaction between nodes is the precondition of propagation; (2) the propagation reflects the effectiveness of the interaction between nodes. High-frequency interaction raises the probability of successful propagation, and a high probability of successful propagation in turn strengthens the motivation for interaction. The larger the value of 
w is, the higher the probability of successful video propagation is. 
 is a time-related statistical value, which is different from 
. For instance, 
 and 
 are the starting and statistical time, respectively. The value of 
 is calculated according to the number of 
 during the time period from 
 to 
. Therefore, the value of 
 is related to time and depends on the value of 
 during the time period. According to Equations (1) and (2), 
 can be defined as:
        where 
 is also a regulatory factor like 
. On the other hand, the value of 
 dynamically changes according to the increase in the number of nodes which are covered by 
. The nodes make use of the edges to implement “pull” and “push” of data for 
 propagation in social networks. In the process of “pull”, 
 wants to obtain data of 
 and sends a request message to its neighbor nodes. If a neighbor node 
 has data of 
, 
 delivers data of 
 to 
 and thereby covers 
. If the neighbor nodes do not have data of 
, they forward the request messages to their own neighbor nodes. The request messages are forwarded along the edges and are answered by the nodes carrying data of 
. In the process of “push”, 
 accepts the 
 data pushed by a neighbor node 
, which denotes that 
 is covered by 
 via “push”. If 
 enables most of the nodes in 
G to be covered via “pull” and “push”, 
 plays an important role in the video propagation and can be considered a key node. If 
 enables a large number of neighbor nodes to be covered during a period of time, 
 can be considered a candidate key node. Because the key nodes are selected before the starting time of the propagation period of 
, the number of neighbor nodes covered by 
 during the propagation period should be a predicted value. In other words, the prediction of coverage levels of nodes in 
G is an estimation for the selection of key nodes before propagation of 
.
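The “pull” process described above can be illustrated with a small sketch. It assumes that request forwarding behaves like a breadth-first search over the edges of G and that the first node found holding the video data answers the request; the adjacency-dictionary representation and the name `pull_request` are hypothetical and are not taken from the paper.

```python
from collections import deque


def pull_request(graph, holders, requester):
    """Forward a request along edges until a node holding the data is found.

    graph:     dict mapping each node to a list of its one-hop neighbors
    holders:   set of nodes that already store the video data
    requester: node that wants to obtain the video data
    Returns (supplier, hops), or None if no reachable node holds the data.
    Assumption: forwarding is modeled as breadth-first search.
    """
    if requester in holders:
        return requester, 0
    visited = {requester}
    queue = deque([(requester, 0)])
    while queue:
        node, hops = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in visited:
                continue
            if neighbor in holders:
                return neighbor, hops + 1  # this neighbor answers the request
            visited.add(neighbor)
            queue.append((neighbor, hops + 1))  # otherwise it forwards the request
    return None


graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
result = pull_request(graph, holders={"d"}, requester="a")
```

In this example the request from `a` is forwarded by `b` and answered by `d`, two hops away. After delivery, the requester would itself be added to the set of holders, which is how coverage grows from round to round.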
In the case where 
 is a neighbor node of 
 and the value of 
 of 
 can be calculated by the similarity between 
 and the accepted videos in 
 according to the historical playback records of 
, which is the content-based probability of 
 fetching 
, 
 is the probability that 
 is covered by 
 for propagation of 
 via “pull” and “push” between 
 and 
. Here, “pull” means that 
 responds to the request for 
’s data of 
, and “push” means that 
 receives 
’s data pushed by 
. 
 is the predicted value in terms of the statistical information that 
 is covered by 
 and is defined as:
        where 
 is the number of times that 
 is successfully covered by 
 for all videos in 
, and 
 is the number of “pull” from 
 to 
 and “push” from 
 to 
 for all videos in 
. Not all “pull” and “push” interactions between 
 and 
 succeed. For instance, when 
 receives a request message for video data from 
 and does not store the requested video resource, 
 does not supply video data for 
 and only forwards the request message to other neighbor nodes. When 
 receives a pushed message of video data from 
 and is uninterested in the pushed video, 
 rejects the pushed video. Therefore, the probability 
 that 
 is covered by 
 for propagation of 
 can be defined as:
If a threshold value 
 is used to estimate whether 
 is covered by 
 for propagation of 
, 
 denotes that 
 is interested in 
 and 
 is covered by 
 via “pull” or “push”. 
 denotes that 
 is not covered by 
 via “pull” or “push”. 
’s neighbor nodes which may be covered by 
 for propagation of 
 form a set 
 by comparison between 
 and 
. The nodes in 
 are the predicted results. The propagation process of 
 can be divided into multiple rounds, each covering one time period. The time length of each propagation round can be defined as the playback time 
 of 
. 
 is the set of nodes which are covered by 
 at 
 round of 
, and 
 returns the number of nodes in 
 at 
 round of 
. The value of 
 at 
 round of 
 can be calculated according to the following equation.
        
        where 
 returns the number of nodes in 
; 
 is the set of nodes which have stored data of 
 in 
; 
 returns the intersection set of 
, and 
; 
 is the difference set between the predicted set 
 and the set of nodes which have been covered in 
. 
 is the set of nodes which have been covered in 
G, and 
 denotes the set of nodes which have not been covered in 
G; 
 is a ratio of the cumulative sum of weight values of predicted nodes covered by 
 relative to the set of nodes which have not been covered in 
G. Because 
 is a predicted result, the value of 
 can be calculated before the starting time of the 
 propagation round. 
 can be calculated according to the predicted coverage levels of neighbor nodes of 
 at the 
 propagation round. However, the difference between the predicted and real results of 
 can be used to optimize the estimation of coverage levels of other nodes at the 
 propagation round; left uncorrected, this difference has a negative influence on the accuracy of key node selection at the next propagation round. 
 can be re-defined as:
        where 
 is an influence factor which is used to regulate the weight values of nodes which are not covered; 
 increases linearly with the number 
x of the propagation round; 
 is the frequency that 
 receives the push of 
 data from 
’s neighbor nodes (we assume that 
’s neighbor nodes push data of 
 only once); 
 returns the number of 
’s neighbor nodes and 
. For instance, a node 
 is the one-hop neighbor node of multiple nodes in 
G. If 
 is not covered by 
’s neighbor nodes in 
G after 
x propagation rounds of 
 and receives many push requests for 
 data, the probability that 
 is covered by 
’s neighbor nodes via “pull” or “push” may increase at the next propagation round of 
 in terms of the linear threshold theory. Moreover, 
 is also a prior factor: entropy between predicted and real results of covered neighbor nodes of 
 for historical propagation of videos. For instance, before propagation of 
, 
 is the key node of multiple videos; 
 enables 
’s neighbor nodes to store the propagated videos according to the predicted neighbor nodes with high coverage probability. Let 
 be the set of nodes which belong to the predicted set 
 and are not covered by 
 in the process of propagation of 
. Let 
 be the set of nodes which do not belong to the predicted set 
 and are covered by 
 in the process of propagation of 
. The value of 
 can be defined as:
        where 
 and 
 return the number of nodes in 
 and 
, respectively, and 
 denotes the ratio of the number of covered neighbor nodes to the number of all neighbor nodes. Further, the value of 
 can be defined as:
        where 
 returns the number of videos in a video set 
, 
 acts as a key node in the propagation process of the videos in 
, and 
 is the mean value of the coverage success rate of 
 for the propagation of videos in 
. 
 is the probability that 
 accepts 
 and can be calculated in real time according to the historical playback records of 
. Because the number of videos watched by 
 increases stably, the value of 
 is also stable during the propagation process of 
. The value of 
 can also be calculated in real time or periodically from the current ratio of the number of neighbor nodes which have played 
 to the number of all neighbor nodes, together with the historical influence-based playback records. 
 can be calculated according to the current coverage levels of neighbor nodes of 
 at the 
 propagation round. Therefore, when 
 starts to be propagated in 
G, the value of 
 can be calculated and be periodically updated according to the variation of values of 
 and 
 during the time period of the propagation round. According to the values of 
 and 
, the key levels of 
 for propagation of video 
 at the 
 round can be defined as:
At the initial propagation round, , but the values of  and  may not be equal to 0, so the value of  can be calculated or updated before the starting time of each propagation round of .
3.3. Video Propagation Based on Key Nodes
Video propagation in social networks is a cascade process, so the scale and capacity of the key nodes determine the scale and efficiency of video propagation. The key nodes implement video propagation by supplying video data to request nodes; they also use their social influence to push  data successfully with high probability. However, not all key nodes are activated at once because the upload bandwidth of the source nodes carrying video data is limited. The key nodes are activated in batches according to a priority based on their key levels. Because the key nodes use their extensive social connections and strong social influence to extend the coverage range of video propagation, they should be covered preferentially. However, extending the range and efficiency of video coverage also causes a fast increase in the network bandwidth required by video delivery. If the key nodes have enough bandwidth and numerous neighbor nodes, they can successfully handle the converged requests for video data. The successful delivery of video data not only increases the coverage range of video propagation but also turns the newly covered nodes into new suppliers of upload bandwidth. The cascade propagation of videos from key nodes to their neighbor nodes uses the high propagation success rate to balance the supply and demand of upload bandwidth at the initial stage of video propagation and to ensure data delivery with low waiting delay and a low packet loss rate. On the other hand, if the key nodes do not have adequate upload bandwidth to respond to the video data requests of nodes, the number of video copies does not increase quickly, and the startup delay of request nodes is lengthened; if the key nodes have low node degree centrality, the limited channels of video propagation restrict the scale of video copies and leave the upload bandwidth resources of key nodes underutilized. 
Therefore, in addition to the interest preference  and coverage capacities  of key nodes, the available bandwidth and node degree centrality should be considered in the process of selecting key nodes.
When multiple videos propagate in 
G, the nodes in 
G may be selected as the key nodes of multiple videos, so these videos compete for the bandwidth resources of the key nodes. The bandwidth resources of key nodes should therefore be allocated dynamically in terms of the predicted popularity levels of videos. For instance, videos in the primary stage of propagation need more upload bandwidth resources than those in the end stage of propagation; videos with a large number of potential covered nodes also need more upload bandwidth resources than those with a small number of potential covered nodes. The allocated upload bandwidth of a key node 
 at the 
 round of 
 propagation can be defined as:
        where 
 is the available bandwidth of 
; 
k is the number of propagated videos at 
 round, 
 returns the number of in 
 at 
 round, 
 is the number of 
’s neighbor nodes which may be covered by 
 for propagation of 
 at 
 round, and 
 is the allocated ratio of upload bandwidth of 
 for propagation of 
 at the 
 round. The larger the values of 
 of nodes are, the stronger the key levels (propagation capacities) of nodes will be. Because the key level values of nodes are in the range 
, the nodes with the values of key levels in 
 are considered to have no propagation capacity. For instance, before the starting time of the 
 propagation round of 
, the key levels of all nodes for propagation of 
 are estimated and form a set 
. All nodes are also considered as candidate key nodes and form a set 
. The nodes that have stored data of 
 form a set 
. The following steps show the process of propagation of 
 based on the selection of key nodes at the 
 round and are described in Algorithm 1 “Propagation of 
 based on key nodes”:
(1) 
 and the items in 
 are greater than 0. The nodes corresponding to the items in 
 form a candidate set 
 of key nodes at the 
 propagation round of 
. Upload bandwidth and node degree centrality should be treated as weights of the key levels and be included in the estimation of the key levels of nodes before the selection of key nodes. The available upload bandwidth determines the number of nodes that can be served by the video data delivery of 
. The more sufficient the available upload bandwidth of 
 is, the stronger the capacity of propagation of 
 will be. The capacity of propagation of 
 based on the available upload bandwidth at the 
 round for 
 can be defined as:
        where 
 is the number of request nodes which are served by 
 via data delivery of 
 and is defined as:
        where 
 is the average packet loss rate of 
 in the process of data delivery of all videos, and 
 is the transmission rate of data required by playback of 
. The values of 
 of all candidate key nodes are calculated and form a set 
; 
 and 
 are the maximum and minimum values in 
, respectively; 
. The node degree centrality of 
 can be defined as:
        where 
 returns the number of neighbor nodes of 
, 
 is the normalization value of node degree centrality of 
, which reduces the negative influence caused by variation in the number of nodes in 
G, and 
. The weighted key levels of nodes in 
 can be calculated according to the following equation.
        
The values of  of nodes in  are in the range [0, 1]. The items in  are sorted in descending order of their weighted key levels.
(2) Distribution and scale of data requests in 
G are different during the different propagation rounds of 
. Variation in the distribution and scale of data requests results in a change in the balance between supply and demand of upload bandwidth. If the scale of requests is far larger than the current supply capacity of upload bandwidth in 
G, more key nodes should be selected to meet the surging demand for upload bandwidth. If the scale of requests is less than the current supply capacity of upload bandwidth in 
G, the selection of key nodes should be suspended in order to save upload bandwidth resources in networks. Therefore, the demand for upload bandwidth should be predicted before the selection of key nodes. The predicted demand value 
 of upload bandwidth at the 
 round of 
 propagation can be defined as:
        where 
 is the predicted number of newly covered nodes per unit time at the 
 round, which includes the two types of nodes that fetch data of 
 via active request and push acceptance; 
 is the length of playback time of 
 and is also the time length of the 
 round; 
 is the number of covered nodes at the 
 round; 
 is the predicted value of upload bandwidth at the 
 round of 
 propagation. The value of 
 is calculated according to the grey forecasting model (GFM). The time length of the 
 round is equally divided into multiple time slots 
. The value of 
 at each time slot 
 can be defined as 
, where 
 is the number of covered nodes during 
 and 
 is the time length of 
. The covering rates of nodes corresponding to the time slots can be calculated and form a time-ordered sequence 
 where 
. The covering rate of nodes corresponding to each time slot at the 
 round can be calculated according to the GFM, and the value of 
 can be obtained according to the cumulative sum of the predicted covering rate of nodes at the 
 round. The covering rate of nodes at the initial round cannot be obtained, so the mean value of the real covering rate of all videos in 
 at the initial round can be considered as the predicted value of 
 of 
 at the initial round. When a neighbor node of 
 is the member in 
 of 
 and is covered at the 
 round, it should be removed from 
 of 
 at the 
 round. Let 
 be the cumulative sum of available upload bandwidth in 
 at the 
 round. If 
, the supply of upload bandwidth is scarce, so key nodes must be selected and preferentially covered, and step (3) is executed. If 
, the supply of upload bandwidth is redundant, so there is no need to add new key nodes, and step (5) is executed.
(3)  is the number of selected key nodes at the  round of propagation of ;  returns the number of items in  and is the upper bound of ;  is the lower bound of  where  is the mean value of upload bandwidth of nodes in . If , the  nodes in  are selected as the key nodes and are preferentially covered. If  where , all nodes in  are selected as the key nodes. The time length of the current round is defined as , which means that propagation of  should enter a new round in order to select new key nodes after the available upload bandwidth of all nodes in  is consumed. Shortening the time period of the propagation round makes the update of supply and demand of upload bandwidth closer to real time, which speeds up the iteration of propagation rounds and relieves the supply shortage of upload bandwidth by adding new key nodes.
(4) The  key nodes are selected from . They immediately fetch data from  by sending request messages and accepting the push of  data. The nodes in  of key nodes have a high probability of fetching data of , so the key nodes should preferentially push data of  to nodes in their  in terms of the descending values of . Moreover, the key nodes also need to handle the request messages of  data from their nodes. For instance, when a neighbor node  of  uses the edges with social neighbor nodes to send a request message for  data,  responds and delivers data from  to . Moreover, when the neighbor node  of  receives a request from a neighbor node of  and  does not deliver data of  for the request nodes,  also forwards the request message to , and  directly returns the response message to the request nodes.
(5) When the current round finishes, the values of parameters in all equations are updated according to the distribution of  copies in G. The covered nodes at the  round are also removed from . All nodes in G remove the nodes which have been covered from their . After the values of  of nodes in  are re-estimated, the nodes in  with  are added into . If  is the empty set, the propagation process of  based on the selection of key nodes proceeds to step (6); otherwise, the process returns to step (1).
(6) The key node selection process for propagation of 
 ends.
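Step (2) relies on the grey forecasting model (GFM) but does not state its form. The following sketch assumes the widely used GM(1,1) variant and applies it to a sequence of per-slot covering rates; the function name `gm11_forecast` and the sample values are illustrative assumptions and are not data from the paper.

```python
import math
from itertools import accumulate


def gm11_forecast(series, steps=1):
    """Forecast the next `steps` values of a positive series with GM(1,1).

    Assumption: the GFM named in the text is taken to be standard GM(1,1).
    """
    series = list(series)
    n = len(series)
    if n < 3:
        raise ValueError("GM(1,1) needs at least three observations")
    x1 = list(accumulate(series))                            # accumulated sequence
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]     # background values
    y = series[1:]
    m = n - 1
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    denom = m * szz - sz * sz
    # Least-squares fit of y = -a * z + b (grey differential equation)
    slope = (m * szy - sz * sy) / denom if denom else 0.0
    a, b = -slope, (sy - slope * sz) / m
    if abs(a) < 1e-12:
        return [b] * steps                                   # no growth or decay detected
    c = series[0] - b / a

    def x1_hat(k):
        """Fitted accumulated value at zero-based index k."""
        return c * math.exp(-a * k) + b / a

    # Restore the original series by differencing the accumulated forecast
    return [x1_hat(k) - x1_hat(k - 1) for k in range(n, n + steps)]


# Hypothetical covering rates (covered nodes per unit time) in four time slots
rates = [2.0, 3.0, 4.5, 6.75]
next_rates = gm11_forecast(rates, steps=2)
```

Multiplying each predicted rate by the slot length and summing the results gives a predicted number of newly covered nodes for the next round, which is how the text describes the use of the forecast.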
        
| Algorithm 1 Propagation of  based on key nodes | 
| 1: x is round number of propagation of ; | 
| 2:  is length of  round; | 
| 3:  and  are constructed; | 
| 4: calculates  of nodes in ; | 
| 5: while | 
| 6: | 
| 7:  for (h = 0; h < ; h++) | 
| 8:    ; | 
| 9:  end for | 
| 10:  calculates value of ; | 
| 11:  calculates value of ; | 
| 12:  if | 
| 13:     and ; | 
| 14:    for () | 
| 15:      if | 
| 16:        ; | 
| 17:         is selected as key node; | 
| 18:        ; | 
| 19:      else break; | 
| 20:      end if | 
| 21:    end for | 
| 22:    ; | 
| 23:     is starting time of  round; | 
| 24:     is current time; | 
| 25:    while | 
| 26:      key nodes implement push and supply data request; | 
| 27:    end while | 
| 28:  end if | 
| 29:  removes new covered nodes from ; | 
| 30:  recalculates value of  in ; | 
| 31:  reconstructs ; | 
| 32:  recalculates  of nodes in ; | 
| 33:  ; | 
| 34: end while |
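The selection loop in steps (2) and (3) can be sketched as a greedy procedure. The sketch below is a simplification under stated assumptions: candidates are ranked by their weighted key level, and nodes are added until their cumulative upload bandwidth covers the gap between predicted demand and current supply. The exact bounds on the number of selected nodes given in step (3) are not reproduced, and the names `Candidate` and `select_key_nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    node: str
    weighted_key_level: float   # assumed to lie in [0, 1]
    upload_bandwidth: float     # available upload bandwidth of the node


def select_key_nodes(candidates, demand, supply):
    """Greedily pick key nodes until the bandwidth deficit is covered.

    demand: predicted upload bandwidth demand for the coming round
    supply: cumulative upload bandwidth of nodes that already hold the data
    Returns the selected node names in descending order of key level.
    """
    deficit = demand - supply
    if deficit <= 0:
        return []  # supply is sufficient, so no new key nodes are added
    ranked = sorted(
        (c for c in candidates if c.weighted_key_level > 0),
        key=lambda c: c.weighted_key_level,
        reverse=True,
    )
    selected, added = [], 0.0
    for cand in ranked:
        if added >= deficit:
            break
        selected.append(cand.node)
        added += cand.upload_bandwidth
    return selected


pool = [
    Candidate("a", 0.9, 2.0),
    Candidate("b", 0.4, 5.0),
    Candidate("c", 0.7, 3.0),
    Candidate("d", 0.0, 10.0),
]
chosen = select_key_nodes(pool, demand=10.0, supply=6.0)
```

Ranking before accumulating bandwidth keeps the nodes with the strongest propagation capacity at the front of the selection, while the early exit avoids activating more key nodes than the predicted demand requires.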