4.2. IoT Identity Ecosystem
This research set out to identify the identity attributes collected or generated by IoT through empirical studies of IoT devices, mobile apps, and device services, supplemented by daily news and use cases. From the resulting extensive list of identity attributes, this research builds the IoT Identity Ecosystem.
The UTCID IoT Identity Ecosystem organizes identity attributes describing People (e.g., blood type, postal address, name) and Devices (e.g., model number, name, GPS location). Some attributes can describe both a Person and a Device (e.g., GPS location).
Recently, the concept of the “Smart City” has risen rapidly [45]. Smart cities consist of smart phones, mobile devices, sensors, embedded systems, smart environments, smart meters, and instrumentation sustaining the intelligence of cities [46]. As a result, the relationship between people and devices has become more intertwined. From mobile phones and laptops to GPS units, sports watches, and even baby monitors, technical devices collect identity attributes anytime and anywhere. This research constructed a list of identity attributes according to the devices’ characteristics, functions, affordances, and other documents [47].
The UTCID IoT Identity Ecosystem takes the UTCID ITAP dataset as input. Each identity attribute in the dataset has several characteristics, such as “Attribute Type”, “Risk”, “Liability Value”, “Possession”, “Verification Accuracy”, “Prevalence”, “Uniqueness”, “Verification Invasiveness”, and “Probability of Exposure”. This research uses some of these characteristics in the proposed solution. For example, an identity attribute’s type falls into one of four categories: What You Are, What You Have, What You Know, and What You Do [48]. While prior work focused on identity attribute types as they relate to people, this research extends the concepts and properties to the identity attributes of devices as well.
What You Are: For a person, it means a person’s physical characteristics, such as fingerprints and retinas. For a device, it means the type of a device. It can be a laptop, a smart watch, a sensor, and so on. It is also related to a device’s hardware configuration, such as circuit design and power usage.
What You Have: For a person, it means credentials and numbers assigned to the person by other entities. For a device, it means identifiers assigned to the device by other entities such as model numbers, serial numbers, and inventory tags.
What You Know: For a person, it means information known privately to the person, such as passwords. For a device, it means any information that is stored on the device and known only to the device.
What You Do: For a person, it means a person’s behavior and action patterns, such as GPS location. For a device, it is related to application types, GPS location, and behavior patterns.
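The four categories, applied to both people and devices, can be pictured as a small record type. The Python sketch below is illustrative only: the names `AttributeType`, `Owner`, `IdentityAttribute`, and `by_type` are assumptions made for this example and are not part of the UTCID ITAP dataset. The example attributes are drawn from the descriptions above.

```python
from dataclasses import dataclass
from enum import Enum


class AttributeType(Enum):
    WHAT_YOU_ARE = "What You Are"
    WHAT_YOU_HAVE = "What You Have"
    WHAT_YOU_KNOW = "What You Know"
    WHAT_YOU_DO = "What You Do"


class Owner(Enum):
    PERSON = "Person"
    DEVICE = "Device"


@dataclass(frozen=True)
class IdentityAttribute:
    """Hypothetical record for one identity attribute."""
    name: str
    owner: Owner
    attribute_type: AttributeType


# Examples taken from the category descriptions in the text.
EXAMPLES = [
    IdentityAttribute("Fingerprint", Owner.PERSON, AttributeType.WHAT_YOU_ARE),
    IdentityAttribute("Serial number", Owner.DEVICE, AttributeType.WHAT_YOU_HAVE),
    IdentityAttribute("Password", Owner.PERSON, AttributeType.WHAT_YOU_KNOW),
    IdentityAttribute("GPS location", Owner.DEVICE, AttributeType.WHAT_YOU_DO),
]


def by_type(attributes, attribute_type):
    """Return the names of all attributes in the given category."""
    return [a.name for a in attributes if a.attribute_type is attribute_type]
```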
Drawing on expert knowledge and studies, this research populated the identity attributes of devices, classified those attributes, and defined relationships between the device identity attributes and the identity attributes of people defined in previous work (see Figure 2).
Table 2 shows examples of identity attributes of IoT devices. We will discuss the UTCID IoT Identity Ecosystem in more detail in the next subsection.
Using Bayesian inference, the UTCID IoT Identity Ecosystem answers several research questions about the privacy risk, risk of exposure, and liability that any person faces in managing identity attributes. The graphical model in the IoT identity ecosystem does not represent a specific person; instead, it provides a general analysis of the universal relationships among identity attributes and the various risks for identity management. The question that the IoT Identity Ecosystem can answer, and the functionality that this research uses, is: “When a set of attributes is exposed, how does it affect the risk of other attributes being exposed?”
For instance, if an individual’s credit card number is compromised, which are the highest-risk attributes that fraudsters might try to obtain next? Moreover, what if one’s unique device identifier is compromised? Which other device identity attributes might fraudsters try to obtain in order to identify a specific person? Multiple attributes can be selected as evidence (i.e., exposed identity attributes) at the same time. The Ecosystem also shows the potential loss after such a breach, and it allows users to choose a node property, such as value or risk, to determine node sizes and colors in the 3D graphical model.
Let us define some notation before introducing more details. Given V as a set consisting of N identity attributes and E as a set of directed edges, the IoT identity ecosystem is represented as a graph $G(V, E)$. Each edge $e_{ij} \in E$ is a tuple $(v_i, v_j)$ where the identity attribute $v_i$ is the starting node and the identity attribute $v_j$ is the destination node such that $1 \le i, j \le N$. Each node represents an identity attribute that consists of different quantitative properties, and each edge represents a possible path by which the destination node can be breached given that the starting node is breached. We assume there is no cycle in the graph of the UTCID IoT Identity Ecosystem.
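The graph definition and the no-cycle assumption can be sketched in a few lines of Python. This is a minimal illustration, not the Ecosystem's implementation: the function names and the attribute names in `EDGES` are hypothetical, and the acyclicity check uses Kahn's topological sort, which is one standard choice.

```python
from collections import deque


def build_graph(edges):
    """Build an adjacency list {start: [destinations]} from (start, destination) tuples."""
    graph = {}
    for start, dest in edges:
        graph.setdefault(start, []).append(dest)
        graph.setdefault(dest, [])  # make sure sink nodes also appear as keys
    return graph


def is_acyclic(graph):
    """Kahn's algorithm: the graph has no cycle iff every node can be removed in topological order."""
    indegree = {node: 0 for node in graph}
    for dests in graph.values():
        for dest in dests:
            indegree[dest] += 1
    queue = deque(node for node, degree in indegree.items() if degree == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for dest in graph[node]:
            indegree[dest] -= 1
            if indegree[dest] == 0:
                queue.append(dest)
    return removed == len(graph)


# Hypothetical edges: breaching the start node opens a path to the destination node.
EDGES = [
    ("Device Identifier", "GPS Location"),
    ("GPS Location", "Postal Address"),
    ("Device Identifier", "Postal Address"),
]
```

Adding an edge from "Postal Address" back to "Device Identifier" would create a cycle, and `is_acyclic` would then return `False`.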
This research constructs two different approaches to calculate the impact on the identity attributes from their ancestors and descendants. The first one is a static approach.
Each identity attribute is linked to a person or a device. Among the several properties of an attribute, this research defines Risk and Uniqueness as follows:
Risk (shows the risk of exposure): Low, Medium, High.
Uniqueness (shows how unique the identity attribute is for the individuals, devices, or organizations that have it): Individual, Small Group, Large Group.
These properties are obtained from the UTCID ITAP dataset. The Uniqueness of an identity attribute determines the strength of the identity attribute [49]. As a first step, by referring to the Bayesian Network Model in the UTCID IoT Identity Ecosystem, this research provides an approach that uses the basic properties of each identity attribute.
For example, the identity attribute “Social Security Number” has a risk of “High” and a uniqueness of “Individual”. However, using this approach as a metric does not leverage the characteristics of the Bayesian Network model. We therefore move on to a more advanced approach.
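The static approach amounts to a lookup of fixed properties. The sketch below is a hypothetical illustration: the ordering of the levels from weakest to strongest and the helper `static_rank` are assumptions, and only the “Social Security Number” entry comes from the text. The “Postal Address” values are invented for the example.

```python
# Levels ordered from lowest to highest; this ordering is an assumption of the sketch.
RISK_LEVELS = ("Low", "Medium", "High")
UNIQUENESS_LEVELS = ("Large Group", "Small Group", "Individual")

STATIC_PROPERTIES = {
    "Social Security Number": ("High", "Individual"),  # values stated in the text
    "Postal Address": ("Medium", "Small Group"),       # hypothetical values
}


def static_rank(name):
    """Return (risk rank, uniqueness rank) as positions in the ordered level tuples."""
    risk, uniqueness = STATIC_PROPERTIES[name]
    return RISK_LEVELS.index(risk), UNIQUENESS_LEVELS.index(uniqueness)
```

Because these ranks never change with the set of breached attributes, they ignore the structure of the network, which is the limitation noted above.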
For the advanced approach, or dynamic approach, this research uses Bayesian inference. Each identity attribute A has a prior probability, or probability of exposure, denoted as $P(A)$, which indicates the probability of this identity attribute being exposed on its own. Each identity attribute A also has a liability value, denoted as $L(A)$, which indicates the maximum potential monetary loss one would incur when the identity attribute A is exposed. Note that the probability of exposure and the liability value of identity attributes are obtained from the UTCID ITAP dataset.
The first part of this dynamic approach is bringing the ancestors of an identity attribute into the calculation.
Given the graph in Figure 3 as an example of the IoT identity ecosystem, identity attribute E is one of the ancestors of identity attribute A, whereas identity attribute D is not an ancestor of A. Every ancestor of identity attribute A has a path that leads to A. Let $Anc(A)$ be the set of ancestors of A. Given that $A_i$ is exposed, where $A_i \in Anc(A)$, applying Bayesian inference yields the posterior probability of exposure of A, denoted as $P(A \mid A_i)$. This is the probability that identity attribute A is exposed after its ancestor $A_i$ has been exposed. For simplicity, we denote the posterior probability caused by $A_i$ as $P'_i(A)$. Hence, given the prior probability of exposure $P(A)$ and the $P'_i(A)$ values, it is easy to compute the percentage increase in the probability of exposure for identity attribute A as $\frac{P'_i(A) - P(A)}{P(A)}$. Therefore, the sum of the percentage increases in the probability of exposure for A can be computed as $AC(A) = \sum_{i} \frac{P'_i(A) - P(A)}{P(A)}$ where $A_i \in Anc(A)$, and we call $AC(A)$ the “Accessibility” of A.
This research calls accessibility a dynamic property because its value changes with the situation, that is, with the set of identity attributes that has been breached. The accessibility of identity attribute A differs when different ancestors of A are breached. The value of accessibility indicates how difficult it is to obtain the identity attribute: the lower the value, the harder the attribute is to reach. The value can also be affected by the number of ancestors, since an attribute with only a few ancestors has only a few entry points and is therefore harder to reach.
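The accessibility computation can be sketched on a toy network. Several things here are assumptions and not taken from the source: the node names and numbers, the noisy-OR form of the conditional probabilities (the text does not say how they are specified), the use of the no-evidence marginal as the baseline probability, and the reading of “percentage increase” as a relative increase. Inference is done by brute-force enumeration, which is only practical for very small graphs.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class Network(NamedTuple):
    parents: Dict[str, List[str]]         # node -> direct parents
    leak: Dict[str, float]                # assumed chance of exposure with no exposed parent
    weight: Dict[Tuple[str, str], float]  # assumed strength of each (parent, child) edge


def p_given_parents(net, node, assign):
    """Noisy-OR (assumption): each exposed parent independently may expose the node."""
    q = 1.0 - net.leak[node]
    for parent in net.parents[node]:
        if assign[parent]:
            q *= 1.0 - net.weight[(parent, node)]
    return 1.0 - q


def prob_exposed(net, target, evidence=None):
    """P(target exposed | evidence), by enumerating every joint assignment."""
    evidence = evidence or {}
    nodes = list(net.parents)
    num = den = 0.0
    for values in product((0, 1), repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        if any(assign[n] != v for n, v in evidence.items()):
            continue
        p = 1.0
        for node in nodes:
            p_true = p_given_parents(net, node, assign)
            p *= p_true if assign[node] else 1.0 - p_true
        den += p
        if assign[target]:
            num += p
    return num / den


def ancestors(net, node):
    """All nodes with a directed path to `node`."""
    seen, stack = set(), list(net.parents[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(net.parents[current])
    return seen


def accessibility(net, target):
    """Sum over ancestors of the relative increase in exposure probability."""
    prior = prob_exposed(net, target)
    return sum(
        (prob_exposed(net, target, {anc: 1}) - prior) / prior
        for anc in ancestors(net, target)
    )


# Hypothetical chain X1 -> X2 -> A with made-up probabilities.
NET = Network(
    parents={"X1": [], "X2": ["X1"], "A": ["X2"]},
    leak={"X1": 0.10, "X2": 0.05, "A": 0.02},
    weight={("X1", "X2"): 0.5, ("X2", "A"): 0.4},
)
```

In this sketch a root node such as `X1` has no ancestors, so its accessibility is zero, which matches the observation that attributes with few entry points are harder to reach.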
The second dynamic property is called the “Post Effect”. Taking the graph in Figure 3 as an example again, identity attribute H is one of the descendants of identity attribute A. Given that identity attribute A is breached, applying Bayesian inference shows that the probability of exposure of every descendant of A is affected. Let $Des(A)$ be the set of descendants of identity attribute A and let $A_j$ be an identity attribute in $Des(A)$ with prior probability of exposure $P(A_j)$. The posterior probability of exposure for identity attribute $A_j$ is $P'(A_j)$, and the percentage difference in probability can be written as $\Delta P(A_j) = \frac{P'(A_j) - P(A_j)}{P(A_j)}$. Recall that each identity attribute A has a liability value $L(A)$. Therefore, the increase in monetary loss for identity attribute $A_j$ can be denoted as $\Delta P(A_j) \cdot L(A_j)$. Thus, the total increase in monetary loss over the descendant set $Des(A)$ can be written as $PE(A) = \sum_{j} \Delta P(A_j) \cdot L(A_j)$ where $A_j \in Des(A)$, and we call $PE(A)$ the “Post Effect” of A.
The post effect also changes its value depending on the situation. In real-life cases, fraudsters do not target only one identity attribute of a victim; multiple identity attributes are usually affected. Therefore, as different sets of identity attributes are stolen, the exposure probability of each identity attribute changes, and the post effect changes accordingly. The higher the post effect of identity attribute A, the larger the impact on the descendants of A.
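As a rough illustration of the post effect, the sketch below sums the liability-weighted change in exposure probability over the descendants of a breached attribute. It assumes that the posterior probabilities have already been obtained by Bayesian inference and that “percentage of the probability difference” means the relative change with respect to the prior. The function name, the descendant names, and all numbers are hypothetical.

```python
def post_effect(prior, posterior, liability, descendants):
    """Sum over descendants of (relative increase in exposure probability) * liability."""
    total = 0.0
    for name in descendants:
        relative_increase = (posterior[name] - prior[name]) / prior[name]
        total += relative_increase * liability[name]
    return total


# Hypothetical descendants of a breached attribute A.
PRIOR = {"Y1": 0.10, "Y2": 0.20}         # probability of exposure before A is breached
POSTERIOR = {"Y1": 0.15, "Y2": 0.25}     # probability of exposure after A is breached
LIABILITY = {"Y1": 1000.0, "Y2": 400.0}  # maximum monetary loss per attribute

effect = post_effect(PRIOR, POSTERIOR, LIABILITY, ["Y1", "Y2"])
```

If a different set of attributes were breached, the posterior values would change and so would the result, which is why the post effect is treated as a dynamic property.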
Take “Social Security Number” as an example again. It has an accessibility of 58% and a post effect close to $14 million. With these two dynamic properties of identity attributes, we are able to perform the privacy risk analysis in the next part.