*Problem Description*

Social communication data is generally noisy and large, and it often needs to be analyzed in real time. Moreover, experts with different backgrounds may disagree on whether a user in the network is anomalous. For example, sociologists may judge whether a user is anomalous by the number of contacts he or she has, whereas algorithm researchers may focus on the similarity between the user and others in the network. We therefore need a novel visualization system that supports experts from different domains. Our goal is to find anomalies in unlabeled social data, which are difficult to identify from a single perspective. For the follow-up study, we define the following questions.


### **4. Detection in Social Networks**

An anomaly usually refers to a user who behaves differently from the other users in a social network. Only by identifying these unusual users can we carry out follow-up analysis and validation. In this section, we introduce the data model based on egocentric networks and the metrics used to detect anomalies in our system. We then combine them for anomaly detection to find the egos that need deeper exploration.

### *4.1. Data and Model*

Communication data records how people communicate with each other and how they organize their social networks. For example, Call Detail Records (CDRs) can be used not only to study human communication behavior but also to analyze Ego Networks (ENs). CDRs are collected by mobile operators for billing and network traffic monitoring. The basic information in such data includes the anonymized IDs of callers and callees, time stamps, call durations, and so on.
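To make the record structure concrete, the following minimal sketch represents one CDR with the basic fields listed above. The field names, the CSV layout, and the sample rows are illustrative assumptions, not the format of any particular operator or of the dataset used in this work.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class CallRecord:
    # All field names are hypothetical; the text only lists the kinds of information.
    caller_id: str    # anonymized ID of the caller
    callee_id: str    # anonymized ID of the callee
    start: datetime   # time stamp at which the call started
    duration_s: int   # call duration, assumed here to be in seconds


def parse_cdrs(text: str) -> List[CallRecord]:
    """Parse CSV rows of the assumed form: caller,callee,ISO-8601 start,duration."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        caller, callee, start, duration = (field.strip() for field in row)
        records.append(
            CallRecord(caller, callee, datetime.fromisoformat(start), int(duration))
        )
    return records


# Made-up sample rows, for illustration only.
sample = """a1f3,9c2e,2019-03-01T08:15:00,62
9c2e,a1f3,2019-03-01T09:02:10,180
a1f3,77b0,2019-03-02T21:40:05,15
"""
records = parse_cdrs(sample)
```

Each record carries a direction (caller to callee), a start time, and a strength (here the duration), which are the ingredients a contact-based model needs.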

To design a system that can be used across social networks, we first need to build a general model. This is not easy, because the model must generalize over a wide variety of data while providing features that meet the requirements of anomaly detection. Building on existing research, we construct a general model based on the egocentric network, which reveals the topology and features of egos and helps in understanding and validating them.

An egocentric network usually consists of a central node (the ego) and several surrounding nodes (the alters), with a bond that connects the ego to each alter. In social networks, this bond is called a contact. Contact is an important measure of the intimacy between egos and alters, as well as of the structure of the ego's network. Different forms of social communication have different contact methods. For example, in telecommunication or e-mail, a contact is a phone call or an e-mail between two people, while on Twitter or Weibo, it is a retweet, comment, or like on another user's post.

Graphs are commonly used to represent social networks [33], and both directed and undirected graphs appear in the literature [4,42]. Generally speaking, bidirectional contact indicates stronger intimacy than unidirectional contact [33] and is therefore of particular research value. To preserve the distinction between bidirectional and unidirectional contacts, we abstract the social network into a directed graph $G(V, E)$, where $V$ and $E$ are the sets of nodes and links in the graph, respectively. $l^{t}_{i,j}$ is a link from $i$ to $j$ with start time $t$, and the weight $w^{t}_{i,j}$ is the value of the contact between $i$ and $j$ starting at time $t$. Although the way contact is quantified differs across social networks, the contact between two people can in general be measured by the total count of contacts, the strength of each contact, and the direction.

Figure 1 shows our data model using the above concepts. The longer an arrow is, the less intimate the relationship. A dotted border indicates that the alter is a bidirectional alter. The square box represents the ego, and the color of the box indicates which group the ego belongs to. The color of the nodes indicates whether they have something in common: for telecommunication or e-mail, whether they are from the same operator or service provider; for Twitter and Weibo, whether they have joined the same topic. Unless otherwise stated, we refer to contact from ego to alter as contact-out and to contact from alter to ego as contact-in. Similarly, we use alter-in and alter-out to denote alters that have contact-in or contact-out behavior, and local alter and alien alter to indicate whether or not the alter has something in common with the ego, such as interests.
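As an illustration of this terminology, the sketch below builds a small directed, weighted graph from timed contacts and derives the alter categories for one ego. It is a minimal sketch under stated assumptions, not the system's implementation: the function names, the `group` mapping (standing in for operator, service provider, or topic), the summation of weights over time, and the reading of alter-out as "alters reached by contact-out" are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# One timed link l^t_{i,j} with weight w^t_{i,j}: (source i, target j, start time t, weight w)
Contact = Tuple[str, str, int, float]


def build_graph(contacts: Iterable[Contact]) -> Dict[str, Dict[str, float]]:
    """Aggregate timed links into directed totals: weights[i][j] = sum of w over t.

    Summing over time is an assumption; other aggregations are possible.
    """
    weights: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for i, j, _t, w in contacts:
        weights[i][j] += w
    return weights


def ego_network(
    weights: Dict[str, Dict[str, float]], ego: str, group: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Classify the alters of `ego` using the terminology of the text."""
    # alter-out: alters the ego reaches through contact-out (ego -> alter)
    alter_out = {j for j, w in weights.get(ego, {}).items() if w > 0}
    # alter-in: alters that reach the ego through contact-in (alter -> ego)
    alter_in = {i for i, targets in weights.items() if targets.get(ego, 0) > 0}
    alter_out.discard(ego)  # ignore self-loops
    alter_in.discard(ego)
    alters = alter_out | alter_in
    # local alters share the (hypothetical) group label with the ego; alien alters do not
    local = {a for a in alters if group.get(a) == group.get(ego)}
    return {
        "alter_out": alter_out,
        "alter_in": alter_in,
        "bidirectional": alter_out & alter_in,
        "local": local,
        "alien": alters - local,
    }


# Made-up contacts and group labels, for illustration only.
contacts = [
    ("ego", "a", 1, 2.0),
    ("a", "ego", 2, 1.0),
    ("ego", "b", 3, 1.0),
    ("c", "ego", 4, 4.0),
    ("ego", "a", 5, 1.5),
]
group = {"ego": "op1", "a": "op1", "b": "op2", "c": "op1"}

weights = build_graph(contacts)
result = ego_network(weights, "ego", group)
```

In this toy example, `a` is the only bidirectional alter, since contact exists in both directions, and `b` is the only alien alter, since its group label differs from that of the ego.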
