3.3. Trust Assessment
Building a zero-trust system requires defining a set of attributes from different categories to verify the trust cycle. The zero-trust ecosystem must be verified through a continuous trust cycle, implemented as a chain of trust, in order to assess the semantic and syntactic relationships among the cloud input sources (users and devices) and the output data. The chain of trust determines what level of access can be granted, and access is denied when the connection falls below the threshold of an acceptable trust score.
Figure 4 illustrates the chain of trust and the assessment scoring criteria within the cloud ecosystem. The proposed framework constructs two assessment scoring criteria to manage the access of distributed medical devices. The first is the critical trust ($T_C$), which relies on cloud-native microservices. The second is the bond trust ($T_B$), a proposed scoring scheme for access control, as explained below. $T_B$ uses pretrained machine learning models to analyze the semantic and syntactic attributes of the trusted and authorized chain of zero-trust cycle pillars (see Section 3.2) related to users, devices, and data output.
Critical Trust ($T_C$): $T_C$ is the initial evaluation and scoring criterion used to grant access to the cloud ecosystem. This grant is preliminary and does not allow a direct connection to the back-end storage and computation resources. $T_C$ is important because it acts as an additional layer of security that separates user access control from the actual dataset resources. $T_C$ is evaluated using cloud-based microservices, and the critical trust score has four main attributes: the authentication, authorization, encryption, and logging microservices are digitized to derive the final $T_C$ score, as per Equation (1):

$$T_C = w_A A + w_Z Z + w_E E + w_L L \qquad (1)$$

Each microservice attribute is assigned a logical value, i.e., 1 or 0. These logical values are then multiplied by a scoring factor ($w$) that reflects the importance of each microservice and can be set by the system administrator. Based on the resulting score, the cloud decision engine assigns one of three access statuses: allow, for a trusted authority; verify, when more information is needed; and deny, for non-trusted access requests.

In Equation (1), $A$ is the authentication and its scoring factor is $w_A$; $Z$ is the authorization and its scoring factor is $w_Z$; $E$ is the encryption and its scoring factor is $w_E$; and $L$ is the logging, with a scoring factor of $w_L$.
Table 1 provides an example of critical trust score evaluation using different scoring factors and logical values of the microservices.
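To make the scoring concrete, the critical trust computation can be sketched as follows. This is a minimal sketch that assumes Equation (1) is a weighted sum of the four binary microservice checks. The integer scoring factors (4, 3, 2, 1) and the allow/verify/deny thresholds (`allow_at`, `deny_below`) are hypothetical administrator settings, and the names `MicroserviceChecks`, `critical_trust`, and `access_decision` are illustrative only.

```python
# Minimal sketch of the critical trust score T_C, assuming Equation (1) is a
# weighted sum of four binary microservice checks. The scoring factors and the
# allow/verify/deny thresholds are hypothetical administrator settings.
from dataclasses import dataclass


@dataclass
class MicroserviceChecks:
    authentication: int  # A: 1 if authentication succeeded, else 0
    authorization: int   # Z: 1 if authorization succeeded, else 0
    encryption: int      # E: 1 if the channel is encrypted, else 0
    logging: int         # L: 1 if the request is logged, else 0


def critical_trust(checks: MicroserviceChecks, w_a: float, w_z: float,
                   w_e: float, w_l: float) -> float:
    """Return T_C = w_A*A + w_Z*Z + w_E*E + w_L*L."""
    values = (checks.authentication, checks.authorization,
              checks.encryption, checks.logging)
    if any(v not in (0, 1) for v in values):
        raise ValueError("each microservice check must be 0 or 1")
    weights = (w_a, w_z, w_e, w_l)
    return sum(w * v for w, v in zip(weights, values))


def access_decision(t_c: float, allow_at: float, deny_below: float) -> str:
    """Map T_C to one of the three statuses of the cloud decision engine.

    allow_at and deny_below are hypothetical thresholds (allow_at >= deny_below).
    """
    if t_c >= allow_at:
        return "allow"
    if t_c < deny_below:
        return "deny"
    return "verify"


# Example: integer importance factors 4, 3, 2, 1 and logging disabled.
score = critical_trust(MicroserviceChecks(1, 1, 1, 0), 4, 3, 2, 1)
print(score, access_decision(score, allow_at=9, deny_below=5))  # 9 allow
```

Integer importance factors are used here only to keep the arithmetic exact. Any real-valued factors set by the administrator work the same way.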
Bond Trust ($T_B$): When a transaction passes the critical trust assessment, the bond trust evaluates its relationship to other resources in order to build a trust cycle. This ensures that only authorized, highly trusted actors and designated people can access data or resources according to the organizational policy or rules. Calculating the bond trust is more complex and depends on several aspects. $T_B$ has two main assessment criteria. The first is $T_B^{\mathrm{sem}}$, which assesses the semantic relationship between the individual attributes stored in the healthcare information system. The second is $T_B^{\mathrm{syn}}$, which assesses the syntactic relationship between the set of candidates in a generated health report. These two measures are used for two reasons. First, each attribute must be meaningful and related to similar attributes when compared with the pretrained model. Second, the attributes in the generated report must be consistent with the context of the patient's history, so that the report is highly likely to belong to the same patient, avoiding false diagnoses caused by retrieving the wrong case.
The proposed assessment of $T_B^{\mathrm{sem}}$ uses an Attribute2Vec representation based on a pretrained Word2Vec model [40,41]. Attribute2Vec maps the user (x), hardware (y), and output (z) attributes stored in the electronic health records, together with their synonyms, to words that share the same context. The skip-gram methodology [42] is used to derive the attributes with the same context; in this framework, we suggest using the three words with the highest context probability. The advantage of this assessment technique is that it generalizes the model by accepting a wide variety of attribute descriptions in a global context. Word2Vec can be applied to different languages and dialects; for example, Altibbi.com [43] used it to train on 1.5 million medical consultation questions in Arabic. We recommend using a matching engine on the Vertex AI platform at Google Cloud to ensure that the word embedding and vector similarity matching processes are efficient and reliable.
Figure 5 depicts the process of assessing bond trust. The input has three attribute categories: users, devices, and output. The hidden layer extracts features, and the softmax output layer predicts the context probabilities and extracts the set of similar attributes with the highest probability. In this research, we selected the three highest-probability attributes. The cosine similarity is then used to measure the relationship between attributes from different categories (x, y, and z). Finally, bond trust scoring derives the final score, which is used to accept or reject the attributes based on a predefined threshold.
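The selection of the three highest-probability attributes in Figure 5 can be sketched as a skip-gram style output step. This minimal sketch assumes a softmax over the dot products between a centre attribute vector and each vocabulary attribute's output (context) vector. The two-dimensional vectors and attribute names are hypothetical. A deployed system would instead read them from the pretrained Word2Vec/Attribute2Vec model.

```python
# Minimal sketch of the output step in Figure 5, assuming a skip-gram style
# softmax over dot products between a centre attribute vector and the output
# (context) vectors. The toy vectors and attribute names are hypothetical.
import math
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def context_probabilities(center: Sequence[float],
                          output_vectors: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """p(context | center) = softmax(u_context . v_center) over the vocabulary."""
    scores = {word: dot(center, vec) for word, vec in output_vectors.items()}
    top = max(scores.values())  # subtract the max score for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


def top_k_attributes(center: Sequence[float],
                     output_vectors: Dict[str, Sequence[float]],
                     k: int = 3) -> List[Tuple[str, float]]:
    """Return the k attributes with the highest context probability."""
    probs = context_probabilities(center, output_vectors)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


vocab_vectors = {
    "pulse": [3.0, 0.0],
    "bpm": [2.5, 0.5],
    "heart_rate": [2.0, 0.0],
    "glucose": [0.0, 1.0],
    "address": [-1.0, 0.0],
}
print([word for word, _ in top_k_attributes([1.0, 0.0], vocab_vectors)])
# ['pulse', 'bpm', 'heart_rate']
```

A full softmax over a large vocabulary is expensive. This is one reason a managed vector matching service is attractive for the similarity lookups.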
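The cosine similarity in Equation (2) is used to predict the similarity of the context of the x, y, and z attributes:

$$\cos(\theta) = \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \, \lVert \mathbf{B} \rVert} \qquad (2)$$

where $\mathbf{A} \cdot \mathbf{B}$ is the dot product between two attribute vectors $\mathbf{A}$ and $\mathbf{B}$, $\lVert \mathbf{A} \rVert$ and $\lVert \mathbf{B} \rVert$ are their respective L2-norms, and $\theta$ is the angle between the two vectors.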
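Equation (2) translates directly into code. The sketch below computes the cosine similarity from the dot product and the two L2-norms, then recovers the angle θ, which Algorithm 1 compares against its threshold. The function names are illustrative.

```python
# Minimal sketch of Equation (2): cosine similarity between two attribute
# vectors, plus the angle theta (in degrees) recovered from it.
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(theta) = (A . B) / (||A|| * ||B||)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))  # L2-norm of A
    norm_b = math.sqrt(sum(y * y for y in b))  # L2-norm of B
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def angle_degrees(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between A and B; clamping guards against rounding outside [-1, 1]."""
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))


print(round(angle_degrees([1.0, 0.0], [1.0, 1.0]), 6))  # 45.0
```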
The attribute vectors with the highest probability across x, y, and z are then used to derive the bond, i.e., the semantic mutual relationship. This bond is expressed as three bond trust scores, $B_{xy}$, $B_{xz}$, and $B_{zy}$, for the relationships between x and y, x and z, and z and y, respectively. Together, these scores form the bond trust set, which is derived using the two inputs described below.
A. Cosine similarity logical evaluation: Algorithm 1 assigns a logical value of either one or zero to the cosine similarity between two attributes, based on the relationship between attributes x, y, and z. The value is assigned by comparing the angle $\theta$ between the two attributes against a threshold angle $\theta_t$. Equation (2) is used to derive $\theta$ from the cosine similarity between the attribute vectors at a given index $i$, i.e., the position of the similar-context attributes. The algorithm produces a set of three logical values, $c_{xy}^{i}$, $c_{xz}^{i}$, and $c_{zy}^{i}$, for each index $i$.
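Algorithm 1. Algorithm for the proposed cosine similarity logical evaluation process
Input: User (x), Device (y), Output data (z), angle threshold ($\theta_t$)
if $\theta(x_i, y_i) \le \theta_t$ then
    $c_{xy}^{i} \leftarrow 1$
else
    $c_{xy}^{i} \leftarrow 0$
end if
if $\theta(x_i, z_i) \le \theta_t$ then
    $c_{xz}^{i} \leftarrow 1$
else
    $c_{xz}^{i} \leftarrow 0$
end if
if $\theta(z_i, y_i) \le \theta_t$ then
    $c_{zy}^{i} \leftarrow 1$
else
    $c_{zy}^{i} \leftarrow 0$
end if
Output: $c_{xy}^{i}$, $c_{xz}^{i}$, $c_{zy}^{i}$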
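Algorithm 1 can be written as a short runnable function. This sketch assumes that a pair is marked as similar (1) when the angle between its attribute vectors is at most the threshold θ_t, and dissimilar (0) otherwise. The 30-degree threshold and the example vectors are hypothetical.

```python
# Minimal sketch of Algorithm 1, assuming a pair is marked similar (1) when
# the angle between its attribute vectors is at most the threshold theta_t.
# The vectors and the 30-degree threshold in the example are hypothetical.
import math
from typing import Dict, Sequence


def angle_degrees(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle between two non-zero vectors, derived from Equation (2)."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    cos_theta = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cos_theta))


def logical_value(a: Sequence[float], b: Sequence[float], theta_t: float) -> int:
    """Return 1 if theta(a, b) <= theta_t, otherwise 0."""
    return 1 if angle_degrees(a, b) <= theta_t else 0


def cosine_logical_evaluation(x_i: Sequence[float], y_i: Sequence[float],
                              z_i: Sequence[float], theta_t: float) -> Dict[str, int]:
    """Produce c_xy^i, c_xz^i and c_zy^i for one index i."""
    return {
        "xy": logical_value(x_i, y_i, theta_t),  # user vs. device
        "xz": logical_value(x_i, z_i, theta_t),  # user vs. output
        "zy": logical_value(z_i, y_i, theta_t),  # output vs. device
    }


print(cosine_logical_evaluation([1.0, 0.0], [1.0, 0.1], [0.0, 1.0], theta_t=30.0))
# {'xy': 1, 'xz': 0, 'zy': 0}
```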
B. Weight: The weight is calculated using the GloVe word embedding model [44] to account for the co-occurrence of attributes in the global representation context of the healthcare database. The weight is based on the conditional probability of attribute occurrence, or importance, as shown in Equation (3):

$$w_i = P_i(B \mid A) \qquad (3)$$

where $P_i(B \mid A)$ is the probability of word B occurring in the context of word A at a given index $i$ of two semantically or syntactically similar attributes.
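The three scalar values $B_{xy}$, $B_{xz}$, and $B_{zy}$ are stored in $\mathbf{V}$, a $3 \times 1$ vector, as shown in Equation (4):

$$\mathbf{V} = \begin{bmatrix} B_{xy} \\ B_{xz} \\ B_{zy} \end{bmatrix} \qquad (4)$$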
In Equation (4), $B_{xy}$ is the relationship score between the user (x) and hardware (y), derived using Equation (5); $B_{xz}$ is the relationship score between the user (x) and output (z), derived using Equation (6); and $B_{zy}$ is the relationship score between the output (z) and hardware (y), derived using Equation (7):

$$B_{xy} = \sum_{i=1}^{N} w_i \, c_{xy}^{i} \qquad (5)$$

$$B_{xz} = \sum_{i=1}^{N} w_i \, c_{xz}^{i} \qquad (6)$$

$$B_{zy} = \sum_{i=1}^{N} w_i \, c_{zy}^{i} \qquad (7)$$

Here, $c_{xy}^{i}$, $c_{xz}^{i}$, and $c_{zy}^{i}$ are the logical values produced by Algorithm 1 for the corresponding pair at index $i$. The term $w_i$ is the scalar weight, derived by Equation (3), that scales the bond score of each attribute according to the importance of the feature at index $i$. $N$ is the number of attributes, which are indexed in order of the probability of their context relationship. Only attributes of the same class from the user, device, and output categories are compared with each other. If they belong to the same category, the algorithm assigns them a similarity score of either 0 or 1 and then multiplies that score by the scalar weight for the attribute. This step is repeated for all attributes, and the products are aggregated to obtain a final scalar that represents the combined similarity score for each bond score.
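The bond scores can be assembled from the logical values and weights. This sketch assumes that each bond score is the weighted sum of the Algorithm 1 logical values over the N ranked attributes, as described in the text above Equations (5) to (7). Sharing a single weight list across all three pairs is a simplifying assumption, and the example values are hypothetical.

```python
# Minimal sketch of Equations (5)-(7) and the vector V of Equation (4),
# assuming each bond score is the weighted sum of the Algorithm 1 logical
# values over the N ranked attributes. A single weight list shared by all
# three pairs is a simplifying assumption; the example values are hypothetical.
from typing import List, Sequence


def bond_score(logical_values: Sequence[int], weights: Sequence[float]) -> float:
    """B = sum_{i=1}^{N} w_i * c^i."""
    if len(logical_values) != len(weights):
        raise ValueError("need one weight per ranked attribute")
    return sum(w * c for w, c in zip(weights, logical_values))


def bond_vector(c_xy: Sequence[int], c_xz: Sequence[int], c_zy: Sequence[int],
                weights: Sequence[float]) -> List[float]:
    """V = [B_xy, B_xz, B_zy] (stored as a list in place of a 3x1 column)."""
    return [bond_score(c_xy, weights), bond_score(c_xz, weights),
            bond_score(c_zy, weights)]


# N = 3 ranked attributes, as suggested for the top-three context words.
weights = [0.5, 0.25, 0.25]
print(bond_vector([1, 1, 0], [1, 0, 0], [0, 0, 0], weights))  # [0.75, 0.5, 0.0]
```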
The $\mathbf{V}$ vector is normalized in Equation (8), producing a new $3 \times 1$ vector $\hat{\mathbf{V}}$. The result, given in Equation (9), contains three scalar values between zero and one:

$$\hat{\mathbf{V}} = \begin{bmatrix} \hat{B}_{xy} \\ \hat{B}_{xz} \\ \hat{B}_{zy} \end{bmatrix} \qquad (9)$$
The first part of the bond score, $T_B^{\mathrm{sem}}$, is calculated in Equation (10) by aggregating the three normalized scores $\hat{B}_{xy}$, $\hat{B}_{xz}$, and $\hat{B}_{zy}$. $T_B^{\mathrm{sem}}$ takes a value between zero and one, where zero indicates completely non-matched attributes and one indicates the highest attribute similarity match. Any value strictly between zero and one requires additional trust verification and reassessment.
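The normalization, aggregation, and three-way decision can be sketched together. This section does not reproduce the specific normalization function of Equation (8) or the aggregation operator of Equation (10), so the sketch makes two hypothetical stand-ins. First, each bond score is divided by the largest attainable score (the sum of the weights). Second, the semantic part is taken as the mean of the three normalized scores. Only the decision rule comes from the text: 0 means reject, 1 means accept, and anything in between triggers re-verification.

```python
# Minimal sketch of Equations (8)-(10) plus the decision rule in the text.
# ASSUMPTIONS: normalization divides each bond score by the largest attainable
# score (the sum of the weights), and aggregation takes the mean of the three
# normalized scores. Both are hypothetical stand-ins, not the framework's
# stated functions.
from typing import List, Sequence


def normalize_bond_vector(v: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Map each of B_xy, B_xz, B_zy into [0, 1]."""
    max_score = sum(weights)
    if max_score <= 0:
        raise ValueError("weights must have a positive sum")
    return [b / max_score for b in v]


def semantic_bond_trust(v_hat: Sequence[float]) -> float:
    """Aggregate the three normalized scores into T_B^sem (mean, assumed)."""
    return sum(v_hat) / len(v_hat)


def semantic_decision(t_sem: float) -> str:
    """0 = no match (reject), 1 = full match (accept), otherwise re-verify."""
    if t_sem == 0.0:
        return "reject"
    if t_sem == 1.0:
        return "accept"
    return "verify"


weights = [0.5, 0.25, 0.25]
v_hat = normalize_bond_vector([0.75, 0.5, 0.0], weights)
t_sem = semantic_bond_trust(v_hat)
print(round(t_sem, 4), semantic_decision(t_sem))  # 0.4167 verify
```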
At the same time, $T_B^{\mathrm{syn}}$ assesses the similarity of the generated report text by evaluating the syntactic match of the candidate report generated from the data stored in the healthcare information system. Syntactic analysis is effective for evaluating a full report rather than only the meaning of a single word, whereas semantic analysis provides a wider contextual analysis using various probabilistically related attributes.
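$T_B^{\mathrm{syn}}$ is inspired by the BLEU score [45], which was originally designed for evaluating machine translation, as shown in Equation (11):

$$T_B^{\mathrm{syn}} = \mathrm{BP} \cdot \exp\!\left(\sum_{n=1}^{n_{\max}} \lambda_n \log p_n\right), \qquad \mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{\,1 - r/c} & \text{if } c \le r \end{cases} \qquad (11)$$

Equation (11) has two parts. The first is the brevity penalty $\mathrm{BP}$, which penalizes a generated report whose length $c$ is shorter than the reference length $r$. The second is the precision $p_n$ of the candidate n-grams, combined with weights $\lambda_n$ (uniform, $\lambda_n = 1/n_{\max}$, in the original BLEU). Here, $n$ is the n-gram length, i.e., the number of consecutive candidate attributes being matched. Its maximum, $n_{\max}$, is typically 4 and can be increased to impose stricter requirements for identifying medical errors. With $n_{\max} = 4$, the BLEU score requires the candidate report to match the reference template on at least four consecutive attributes to obtain a non-zero score.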
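The syntactic part can be illustrated with a plain sentence-level BLEU score, which follows Equation (11) with uniform weights and no smoothing. The sketch compares one candidate report against a single reference template, with each report represented as a list of attribute tokens. The example reports are hypothetical. The behavior discussed in the text follows directly from the code: an empty reference (no patient history) or a candidate with no matching 4-gram yields a score of zero.

```python
# Minimal sketch of a sentence-level BLEU score (Equation (11)) for one
# candidate report against one reference template, with uniform weights
# 1/max_n and no smoothing. Token lists stand in for report attributes;
# the example reports are hypothetical.
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """Clipped n-gram precision p_n: each n-gram counts at most as often as in the reference."""
    cand = ngram_counts(candidate, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    ref = ngram_counts(reference, n)
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    if not candidate or not reference:  # e.g. no patient history to compare with
        return 0.0
    precisions = [modified_precision(candidate, reference, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:  # geometric mean collapses to zero
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


reference = ["patient", "reports", "chest", "pain", "since", "monday"]
candidate = ["patient", "reports", "chest", "pain", "since", "tuesday"]
print(round(bleu(candidate, reference), 4))  # 0.7598
```

Production evaluations usually rely on an established implementation with optional smoothing for short texts. The unsmoothed version is kept here because it matches Equation (11) term by term.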
When the patient has no history, the BLEU score is zero, which makes it less effective for syntactic analysis; this scoring evaluation is more meaningful when the patient has a previous history in the EHR. The final bond trust normalizes the sum of $T_B^{\mathrm{sem}}$ and $T_B^{\mathrm{syn}}$ to keep its value between zero and one, as shown in Equation (12):

$$T_B = \frac{T_B^{\mathrm{sem}} + T_B^{\mathrm{syn}}}{2} \qquad (12)$$
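Combining the two parts is a one-line computation. This sketch assumes that Equation (12) normalizes the sum of the semantic and syntactic parts by dividing it by two. It also checks that both inputs already lie in [0, 1].

```python
# Minimal sketch of the final bond trust, assuming Equation (12) normalizes the
# sum of the semantic and syntactic parts by dividing it by two.


def bond_trust(t_sem: float, t_syn: float) -> float:
    """T_B = (T_B^sem + T_B^syn) / 2, which stays in [0, 1] when both parts do."""
    for name, value in (("t_sem", t_sem), ("t_syn", t_syn)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (t_sem + t_syn) / 2.0


# A patient with no history in the EHR has t_syn = 0, so T_B cannot exceed 0.5.
print(bond_trust(1.0, 0.0))  # 0.5
```

Under this assumption, the missing-history case noted in the text caps the bond trust at one half. Any acceptance threshold above 0.5 would therefore route such requests to additional verification.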