2.1. Problem Definition and General Framework
Health-condition assessment is the task of automatically classifying a health-condition monitoring record into one of several health-condition classes. Formally, the training dataset consists of N labelled health-condition monitoring sequence inputs, their corresponding labels, and their corresponding domain knowledge attribute sets, each of which contains F domain knowledge attributes. Moreover, each health-condition monitoring record has L monitoring channels, and each channel is a time series.
Given the unlabelled testing dataset of unlabelled health-condition monitoring sequence inputs, the goal of health-condition assessment is to learn a predictive model that takes an unlabelled health-condition monitoring sequence as input and outputs a prediction of its domain knowledge attributes and a prediction of its health-condition class, drawn from a set of M different health-condition classes.
Because deep learning methods, which can naturally integrate and extract hierarchical features, have recently achieved remarkable success in health-condition assessment tasks, a DNN model is used as the basic classifier. Moreover, to extract features automatically with a CNN and capture long-term trends with an RNN, we use a residual convolution recurrent neural network (RCR-Net), as shown in Figure 2.
Nevertheless, an RCR-Net model still cannot be applied directly to reliably detect and assess health conditions from monitoring data that contain noise and class imbalance. On the one hand, class imbalance and the semantic ambiguities caused by noisy segments may lead to poor, even unacceptable, DNN model quality. On the other hand, a proper explanation of the health-condition assessment result is required to support reliable decision-making.
A K-margin-based interpretable learning method is presented to solve the above difficulties. Specifically, a skewness-aware RCR-Net approach (Section 2.2) is presented to alleviate the problem of class imbalance. In addition, to tackle noise, a K-margin-based diagnosis model (Section 2.3) is proposed, which can automatically focus on the most important segments and involve only part of the labeled segments in the skewness-aware RCR-Net learning process. Finally, a knowledge-directed diagnosis interpretation model is employed to extract features that can be regarded as important domain knowledge. The framework of our model is shown in Figure 2.
2.2. Skewness-Aware RCR-Net
To alleviate the lack of well-labelled data and the class imbalance, a skewness-aware RCR-Net model is proposed. First, a skewness-aware data augmentation is employed to generate more short-term segments from the long-term monitoring records. After that, a multi-view RCR-Net model is presented to establish a general deep learning model that captures features automatically from the health-condition monitoring data.
Generally, two predefined parameters are required for the skewness-aware data augmentation: the length of the sliding window and the maximum stride threshold. For records with rare labels, the dynamic stride becomes smaller, while for records with common labels it becomes larger. Formally, given the maximum stride threshold and the label set, the dynamic stride of records with a given label is given by the following formula:
Please note that if a monitoring record is shorter than the sliding window, it is zero-padded at its end.
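The augmentation step can be illustrated with a minimal Python sketch. The stride rule used here is an assumption, since only its qualitative behaviour is stated in the text: the stride is scaled by the frequency of a label relative to the most frequent label and capped at the maximum stride threshold. The names `dynamic_strides` and `augment` are hypothetical, and a single-channel record is assumed for brevity.

```python
from collections import Counter
from typing import Dict, List, Sequence


def dynamic_strides(labels: Sequence[str], max_stride: int) -> Dict[str, int]:
    """Assumed rule (not the paper's formula): the stride grows with the
    relative frequency of a label and never exceeds max_stride."""
    counts = Counter(labels)
    most_common = max(counts.values())
    return {
        label: max(1, (max_stride * count) // most_common)
        for label, count in counts.items()
    }


def augment(record: Sequence[float], window: int, stride: int) -> List[List[float]]:
    """Cut one single-channel record into fixed-length segments."""
    values = list(record)
    if len(values) < window:
        # Zero-pad short records at the end, as described in the text.
        values += [0.0] * (window - len(values))
    return [
        values[start:start + window]
        for start in range(0, len(values) - window + 1, stride)
    ]


# Toy label set: 8 common records ("N") and 2 rare records ("AF").
strides = dynamic_strides(["N"] * 8 + ["AF"] * 2, max_stride=4)
common_segments = augment(range(10), window=4, stride=strides["N"])
rare_segments = augment(range(10), window=4, stride=strides["AF"])
```

With this assumed rule, a rare-label record yields more overlapping segments than a common-label record of the same length, which is the rebalancing effect the text describes.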
Additionally, a multi-view RCR-Net is presented to establish a general deep learning model that can automatically obtain features for a wide range of health conditions. Specifically, it consists of 33 layers of residual blocks [43], one recurrent block, and one fully connected block. The residual blocks make it possible to construct a deeper model through the residual connections between blocks and to automatically extract more effective local morphological-level features. To capture potential trend-level features in the monitoring data, a recurrent layer with Bi-directional Long Short-Term Memory (Bi-LSTM) cells is employed. Finally, the prediction is made by a fully connected layer and a softmax layer. The network is trained by minimizing the cross-entropy loss.
The high-level architecture of the multi-view RCR-Net is shown in Figure 3. It takes an augmented segment as input, splits it into a fixed number of fragments of a given length, and outputs the health-condition prediction of the augmented segment.
2.3. Diagnosis Model Based on K-Margin
Although alleviating data inadequacy and class imbalance through data augmentation is essential to enhancing the performance of the health-condition assessment model, the augmentation inevitably generates “hard” health-condition monitoring segments due to noisy segments. For example, as shown in Figure 4, the reference label of the 8th–14th augmented segments is ‘Atrial Fibrillation’ because the reference label of the ECG instance is ‘Atrial Fibrillation’. Nevertheless, the main ECG signs are ‘Too noisy to classify’ for the 8th–11th augmented segments and ‘Normal sinus rhythm’ for the 12th–14th augmented segments. Therefore, to filter these noisy segments, we first enhance the robustness of our model with a K-margin-based noise filtering approach that computes the cross-entropy objective function over only a selected portion of the segments. Moreover, a K-margin-based health-condition detector is proposed to predict the health condition of a monitoring record according to its top-K most confident segments.
2.3.1. K-Margin-Based Noise Filtering
A minimum uncertainty margin is proposed to select appropriate segments for calculating the cross-entropy. Because the predictions of all augmented segments of each health monitoring record can be obtained from a trained multi-view RCR-Net model, we define the uncertainty margin of the t-th segment of a given monitoring record as:
where the two classes involved are the most probable and second-most probable prediction classes of the record.
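As an illustration only, the margin can be sketched in Python under an assumed form. The definition used below, one minus the gap between the two largest class probabilities, is an assumption chosen so that a smaller value means a more confident prediction, in line with the surrounding text; the function name `uncertainty_margin` is hypothetical.

```python
from typing import Sequence


def uncertainty_margin(probs: Sequence[float]) -> float:
    """Assumed form: 1 - (p_top1 - p_top2).

    probs holds the predicted class probabilities of one segment.
    A smaller margin means the two most probable classes are easier
    to tell apart, i.e. the prediction is more confident.
    """
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


# A clear-cut segment versus an ambiguous (possibly noisy) one.
confident = uncertainty_margin([0.90, 0.05, 0.05])
ambiguous = uncertainty_margin([0.40, 0.35, 0.25])
```

Under this assumption the clear-cut segment receives the smaller margin, so it would be preferred over the ambiguous one.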
Usually, the trained model has less doubt (that is, more confidence) in distinguishing the two most probable categories if the uncertainty margin is smaller for a given health monitoring record. Conversely, segments with larger uncertainty margins are more ambiguous. Therefore, the most confident segment of the i-th monitoring record can be defined as:
where the candidate is the t-th segment of the record. A diagnosis based on the minimum uncertainty margin is a strategy for finding the predicted class of a monitoring record with the largest confidence. Similarly, the most confident label of a record can be defined as follows:
A K-margin-based health-condition label prediction algorithm (K-margin) is presented (see Algorithm 1) to select the top-K most confident segments for training the multi-view RCR-Net. Given a health-condition monitoring record and its augmented segments, it requires K iterations to output the top-K most confident segments and their labels under the trained multi-view RCR-Net model.
Intuitively, the top-K most confident segments can be used for the multi-view RCR-Net training process, instead of the entire segments array, to avoid learning features from noisy segments and to focus automatically on the most essential part that represents the characteristics of a given record. Nevertheless, it is not reliable to use them for training while the model itself is not yet reliable, since they are selected by the KEEN model. Hence, we calculate the average probability of predictions for all augmented segments of a record:
where each term is the prediction probability of the t-th augmented candidate for a record, and T is the number of augmented candidates for the record. The top-K most confident candidates for the given record are used in the model fine-tuning process, to achieve higher consistency among the augmented candidates and reduce the impact of noisy ones, when this average probability is sufficiently high. Otherwise, all the candidates are used. The reason is that a low average probability of predictions over all augmented candidates indicates poor model performance. In that case, the multi-view RCR-Net model is not yet accurate or reliable, and more “hard” samples are required for the training process. Thereafter, the selected segments can be defined as follows:
Consequently, the task of health-condition assessment using our KEEN model can be expressed as optimizing the cross-entropy objective function:
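Algorithm 1 K-margin
Input: a T-segments array of an input health-condition monitoring record; a trained multi-view RCR-Net model
Parameter: an integer K
Output: the top-K most confident segments and their label predictions
1–2: initialize the selected segments, their label predictions, and the iteration counter
3: while fewer than K segments have been selected do
4: select the most confident remaining segment using Equation (3)
5: predict its most confident label using Equation (4)
6: remove the index of the selected segment from the list of segment indexes (implementation detail)
7: add the selected segment and its label prediction to the outputs
8: update the iteration counter
9: end while
10: return the selected segments and their label predictions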
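The selection loop of Algorithm 1, together with the average-probability check described above, can be sketched in Python as follows. This is a sketch under stated assumptions rather than the authors' implementation: the margin is taken as one minus the gap between the two largest class probabilities, the average probability is taken over the top class probability of each segment, and the names `margin`, `k_margin`, `training_segments`, and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def margin(probs: Sequence[float]) -> float:
    """Assumed uncertainty margin: smaller means more confident."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top1 - top2)


def k_margin(segment_probs: Sequence[Sequence[float]], k: int) -> List[Tuple[int, int]]:
    """Return (segment index, predicted class index) for the top-K
    most confident segments, most confident first."""
    remaining = list(range(len(segment_probs)))
    selected: List[Tuple[int, int]] = []
    while remaining and len(selected) < k:
        # Most confident remaining segment: smallest margin.
        best = min(remaining, key=lambda t: margin(segment_probs[t]))
        probs = segment_probs[best]
        label = max(range(len(probs)), key=probs.__getitem__)
        remaining.remove(best)  # drop its index from the candidate list
        selected.append((best, label))
    return selected


def training_segments(segment_probs: Sequence[Sequence[float]],
                      k: int, threshold: float) -> List[int]:
    """Use only the top-K segments once the model looks reliable;
    otherwise keep every segment (assumed gating rule)."""
    avg = sum(max(p) for p in segment_probs) / len(segment_probs)
    if avg >= threshold:
        return [t for t, _ in k_margin(segment_probs, k)]
    return list(range(len(segment_probs)))


probs = [[0.40, 0.35, 0.25], [0.90, 0.05, 0.05],
         [0.10, 0.80, 0.10], [0.50, 0.30, 0.20]]
top2 = k_margin(probs, k=2)
```

The loop recomputes the minimum over the remaining candidates in each of the K iterations, mirroring the structure of Algorithm 1; sorting all margins once would give the same result.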
2.3.2. K-Margin-Based Health-Condition Detector
To diagnose the health condition of a given health-condition monitoring record under the trained model, we convert the label predictions of the top-K most confident segments into a matrix as follows:
Furthermore, a K-majority weighted voting algorithm based on the minimum uncertainty margin is presented to output the most likely label of a given record. The K-majority weighted voting method votes on the classes to which the record may belong, and can be defined as follows:
Thus, by naturally exploiting the expected consistency among the segments associated with each record, our KEEN model can make a diagnosis of a given record. It deals with noise automatically, because only a portion of the segments that belong to the same record is included in the learning process, and the diagnosis is likewise made according to the predictions of the most essential segments.
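The voting step can be sketched as follows. The weighting is an assumption: each of the top-K segments votes for its predicted label with weight one minus its uncertainty margin, so that more confident segments count more. A dictionary of scores stands in for the label matrix, and the function name `weighted_vote` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def weighted_vote(selected: Sequence[Tuple[str, float]]) -> str:
    """selected holds (predicted label, uncertainty margin) pairs for
    the top-K segments of one record. Assumed weight: 1 - margin."""
    scores: Dict[str, float] = defaultdict(float)
    for label, seg_margin in selected:
        scores[label] += 1.0 - seg_margin
    return max(scores, key=scores.get)


# Two weakly confident 'AF' votes lose to one strongly confident 'N' vote.
record_label = weighted_vote([("N", 0.1), ("AF", 0.7), ("AF", 0.8)])
```

In this toy case a plain majority vote would return 'AF', whereas the confidence-weighted vote returns 'N', which shows how the weighting can change the outcome.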
2.4. Knowledge-Directed Interpretation
In real-life health-condition assessment, computer-aided diagnosis requires a high degree of interpretability so that humans can reach a reliable conclusion from the evidence behind the diagnosis.
Recently, some methods have tried to explain the DNN model by highlighting the most relevant segments of health-condition monitoring data [38] and by exploring feature effects [40] in the prediction process. Nevertheless, this kind of method cannot provide detailed domain technological-level information on “why”, as we still do not know the relationship between this kind of explanation and domain knowledge.
The main goal of knowledge-directed interpretation is to provide the indispensable domain technological-level information on “why”, and to make the prediction results explicable so that reliable decisions can be made. Specifically, we integrate domain knowledge into the training process of the skewness-aware RCR-Net.
The function of knowledge-directed interpretation is similar to the observation process that domain experts follow when classifying health-condition monitoring data. When domain experts diagnose health-condition monitoring records, they first observe the characteristics of these records. Taking arrhythmia diagnosis as an example, cardiologists usually analyze the ECG record for characteristics such as “P waves disappear”, “RR-interval”, etc. They then use these characteristics to classify the records. Inspired by this, a knowledge-directed interpretation method is proposed to convert the morphological-level local features, which are automatically extracted from the health-condition monitoring record, into knowledge-level features.
The architecture of the knowledge-directed interpretation method is shown in Figure 5. The health-condition predictor, shown in the orange dashed box, trains a multi-view RCR-Net for health-condition classification. The red box is the knowledge-level feature extraction step, which shares weights with the residual layers of the health-condition predictor and replaces the Bi-LSTM layer and softmax layer with a mean square error (MSE) loss layer to form a knowledge-level feature extractor.
It takes a fragment of a segment that is augmented from a health-condition monitoring record, and outputs the prediction of its knowledge-level features. Subsequently, we concatenate the knowledge-level features of all fragments that belong to the same segment to obtain the combined features.
Formally, given the extracted features corresponding to one segment, with one feature vector per fragment, the aggregated features are computed by a column-wise aggregation operation over these vectors. Then, the f-th knowledge-level feature of a segment can be denoted as:
where the operator is an aggregation operation. Usually, a pooling operation (e.g., sum, max, average, etc.) [44] can be used to aggregate these features. Because the knowledge-level features differ greatly in nature, each must be aggregated into the unified feature vector according to its own properties. For instance, a max pooling operation would be employed for the knowledge-level attribute “P waves disappear” to determine whether P waves disappear in any fragment of a segment. By contrast, a mean pooling operation would be better for the knowledge-level attribute “ST slope”. Other aggregation operations may be used as well; for example, the standard deviation coefficient may be calculated for the knowledge-level attribute “PR interval” to determine whether the interval is unequal over the entire record.
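The attribute-specific aggregation can be illustrated with a short Python sketch. The attribute keys, the `AGGREGATORS` table, and the `aggregate` function are hypothetical names, and the “standard deviation coefficient” is interpreted here as the coefficient of variation (standard deviation divided by the mean), which is an assumption.

```python
from statistics import mean, pstdev
from typing import Callable, Dict, List, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Population standard deviation divided by the mean (0 if mean is 0)."""
    mu = mean(values)
    return pstdev(values) / mu if mu else 0.0


# One aggregation operation per knowledge-level attribute.
AGGREGATORS: Dict[str, Callable[[List[float]], float]] = {
    "p_waves_disappear": max,                 # present in any fragment?
    "st_slope": mean,                         # typical value over fragments
    "pr_interval": coefficient_of_variation,  # is the interval unequal?
}


def aggregate(fragment_features: Sequence[Dict[str, float]]) -> Dict[str, float]:
    """Column-wise aggregation of per-fragment features into one
    feature vector for the enclosing segment."""
    return {
        name: op([fragment[name] for fragment in fragment_features])
        for name, op in AGGREGATORS.items()
    }


fragments = [
    {"p_waves_disappear": 0.0, "st_slope": 0.5, "pr_interval": 0.2},
    {"p_waves_disappear": 1.0, "st_slope": 1.5, "pr_interval": 0.2},
]
segment_features = aggregate(fragments)
```

The same function could be reused at the record level by passing the aggregated features of the selected segments instead of per-fragment features.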
Similarly, the f-th knowledge-level feature of a record can be denoted as:
To avoid learning knowledge-level information from noisy segments, the aggregation operation is applied only to the selected top-K most confident segments of the given health-condition monitoring record under the trained model. We choose the mean square error loss as the empirical loss and optimize the knowledge-directed interpretation model by gradient descent as follows:
The total loss could be defined as follows:
In this way, domain technological-level features (e.g., P waves disappear, RR-interval, etc.), which correspond to domain knowledge that is convincing and understandable to a domain expert, can be extracted.