**7. Case Study**

In this section, we apply our system to anomaly detection on call record data provided by a mobile operator to demonstrate its effectiveness.

In this study, the dataset is provided by one of the largest mobile operators in China. It covers 7 million people in a Chinese provincial capital city over half a year, from January to June 2014. According to the operator, all users can be divided into two categories: local users (customers of the mobile operator that provided this dataset) and alien users (customers of other operators). The reason for this distinction is that, for policy reasons, the communication behaviors of alien users are not recorded and cannot be collected by our data provider. Therefore, we focus on the local users, whose entire calling behaviors are recorded in the dataset. We do not show details of alien alters, such as their scores, in the detail view and ego view. To protect users' personal information, we encrypt all telephone numbers and encode the local or alien category directly in the user ID: an ID beginning with 728 indicates a local user, and an ID beginning with 719 indicates an alien user. Each user has a unique ID. The basic statistics of the mobile communication data are summarized in Table 1.
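The ID convention described above can be illustrated with a short sketch. It assumes that the category marker is a three-digit prefix of the encrypted ID, which matches the local ego IDs quoted in this section; the function name `user_category` and the alien ID in the example are made up for illustration.

```python
# Hypothetical helper: map an encrypted user ID to its category.
# Assumption: the marker is a 3-digit prefix (728 = local, 719 = alien).
LOCAL_PREFIX = "728"
ALIEN_PREFIX = "719"


def user_category(user_id: str) -> str:
    """Return 'local' or 'alien' based on the ID prefix."""
    if user_id.startswith(LOCAL_PREFIX):
        return "local"
    if user_id.startswith(ALIEN_PREFIX):
        return "alien"
    raise ValueError(f"unrecognized ID prefix: {user_id!r}")


# 7285322362 is an ego ID quoted in this section; 7190000001 is invented.
categories = [user_category(u) for u in ("7285322362", "7190000001")]
```

Because only local users have complete call records, a filter of this kind would be applied before any ego is scored.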


**Table 1.** Basic statistics of the mobile communication networks.

Parameter Selection. As described in Chapter 4, the most important parameter of the LOF algorithm is its neighbor number n. We find that n has a great influence on the scores of egos located on the boundaries between sparse and dense areas. To classify the points in these areas more accurately, we need to compare them with enough neighbors, so we select a larger n, which leaves the scores of points well inside dense or sparse areas largely unaffected while giving fuller consideration to the points on the boundary. In feature selection, we find that if we do not consider the time sequence, some points with anomalous behavior cannot be detected, because their attributes, such as the number of alters and the number of calls, do not differ from those of normal users. When we introduce the time sequence, detection of these points improves.
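To make the role of the neighbor number n concrete, the following is a minimal plain-Python sketch of the textbook LOF computation. It is not the implementation used in our system: the function name `lof_scores`, the Euclidean distance, the tie handling, and the toy data are all assumptions made for illustration.

```python
import math


def lof_scores(points, n_neighbors):
    """Local Outlier Factor of every point (illustrative sketch only)."""
    m = len(points)
    if not 1 <= n_neighbors < m:
        raise ValueError("n_neighbors must be between 1 and len(points) - 1")
    # Pairwise Euclidean distances (an assumption; any metric could be used).
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_distance = [], []
    for i in range(m):
        order = sorted((j for j in range(m) if j != i), key=lambda j: dist[i][j])
        knn = order[:n_neighbors]  # simplification: ties are cut by index order
        neighbors.append(knn)
        k_distance.append(dist[i][knn[-1]])
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(m):
        reach = [max(k_distance[j], dist[i][j]) for j in neighbors[i]]
        lrd.append(1.0 / (sum(reach) / n_neighbors + 1e-10))
    # LOF: mean ratio of the neighbors' density to the point's own density.
    return [
        sum(lrd[j] for j in neighbors[i]) / (n_neighbors * lrd[i])
        for i in range(m)
    ]


# Toy data: a 3x3 grid of "normal" points plus one isolated point.
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
scores = lof_scores(grid + [(10.0, 10.0)], n_neighbors=3)
```

In this toy example the isolated point receives by far the largest score, while the grid points stay close to 1. Rerunning with different values of `n_neighbors` shows how the choice of n shifts the scores; for real data, a library implementation such as scikit-learn's `LocalOutlierFactor` would be the more practical choice.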

Exploratory Analysis. First, we carry out a preliminary exploration to get an overall picture of the dataset. In the MDS map of the group view, shown in Figure 4a and containing all the ego information, most of the points are concentrated together, and only a small number of points with high anomaly scores are distributed on the periphery. This shows that most users cluster together and are regarded as normal, while the few users at the edge are regarded as outliers. When we zoom in on the view, we find that egos with higher scores also appear in regions dominated by lower scores; these regions are worth studying further.

From the list in Figure 4b, we notice that nine egos have scores above 3, and from Figure 4c, that the data of each segment shows a downward trend. We also find that egos with anomaly scores below 1 have no more than 150 alters, but they make up only about 6% of the population, whereas about 95% of egos have anomaly scores below 1.5 and no more than 216 alters, which is larger than Dunbar's number. We see two possible reasons for this. First, communication is bidirectional: a user may receive unwanted calls, which increases the number of contacts. Second, the result may be affected by the algorithm, and misclassification is possible. A deeper analysis is needed to validate this.

Results. To further verify the effectiveness of the system, we demonstrate it on actual cases. We draw the ego views of all users with scores greater than 3. As shown in Figure 8, their alters and calls are particularly numerous. They mainly show two kinds of structure: either concentrated on the outer layer, which is characteristic of advertisement users, or concentrated on the inner layer, which is characteristic of robots. Their central radar maps have distinctly convex shapes. Figure 4f is the first ego's detail view. This ego contacts many users with higher anomaly scores, and its incoming contacts exceed its outgoing contacts; on further investigation, we find that it is a customer service account of a scam group. The second ego, shown in Figure 9, is a highly active user with even contact intervals, so we think it is a robot account. Users who score above 3 can thus be initially identified as anomalous from the group view, and this is further confirmed by the analysis of the ego view.

**Figure 8.** Ego views of the top nine users. Anomaly scores decrease from left to right and from top to bottom.

**Figure 9.** Second ego's statistical view.
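The robot judgment for the second ego rests on how regular its contact intervals are. One way to quantify such regularity, offered here only as an illustrative assumption and not as the measure used by our system, is the coefficient of variation of the gaps between consecutive calls: values near 0 indicate evenly spaced, machine-like activity. The function name `interval_cv` and the timestamps below are hypothetical.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive calls.

    `timestamps` are call times in seconds (hypothetical input format).
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        raise ValueError("need at least three timestamps")
    avg = mean(gaps)
    if avg == 0:
        return 0.0  # all calls share one timestamp; treat as perfectly regular
    return pstdev(gaps) / avg


# Invented examples: one call every 60 s versus irregular, human-like gaps.
robot_like = interval_cv([0, 60, 120, 180, 240])
human_like = interval_cv([0, 5, 300, 310, 2000])
```

A low value alone does not prove automation, so such a measure would complement, not replace, the visual inspection of the statistical view.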

At the same time, some egos cannot be judged as anomalous directly from the group view. One example is ego 7285322362, shown in Figure 10a, whose numbers of alters and calls are not high. However, his solar ego glyph shows that his connection with his alters is weak, and his detail view shows that his active time differs from that of normal users, so he is an abnormal ego. As mentioned above, there are some nodes with high anomaly scores in areas full of low scores. We speculate that they may be normal egos with abnormal behavior or abnormal egos disguised as normal egos, so we analyze these points in detail. Ego 7281468187, shown in Figure 10b, has a score of 1.738 and could be considered an anomalous ego. However, our exploration shows that although his behavior pattern is slightly different from that of ordinary users, the network structure of his contacts is not abnormal. We therefore think that he is a normal ego who shows abnormal behavior: his score is high only because he behaves differently from normal egos. We believe that the dense areas also contain truly anomalous egos, such as the one shown in Figure 10c. Although we cannot tell whether he is abnormal from his solar ego glyph, a closer look at his behavior patterns shows signs of round-the-clock activity, so we conclude that he is an abnormal user. We think that nodes of this kind are anomalous egos that try to blend in with normal egos and disguise themselves as normal, but they can still be detected by our system. After we communicated with experts, they confirmed that our speculation is correct and indicated that our research is very helpful to them in mining potential abnormal users.

We have thoroughly investigated the anomaly detection results produced by our system and found many interesting patterns. We also invited several experts from our data provider and from the telecom data analysis field to help us understand and check our findings. First, we showed them the abnormal users with high scores, as shown in Figure 8, and they agreed with our assessment. For example, the first ego, ID:7282270387, is a customer service account of a scam group, while the second user, ID:7283827875, is a robot account. Next, we verified with them some controversial nodes, such as nodes with high scores in an area full of low scores. They all consider our findings valuable and useful. They verified, for example, that ego:7285321419, shown in Figure 10c, is indeed a potential abnormal user. However, because the results involve the privacy of users, they did not disclose more details to us.

**Figure 10.** Some examples: (**a**) an ego who cannot be judged directly by ego view, (**b**) an ego who is misclassified, (**c**) an ego who tries to disguise himself as a normal one but is actually anomalous.
