Article

FFICL-Net: A Fusing Symmetric Feature-Importance Ranking Contrastive-Learning Network for Multivariate Time-Series Classification

School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Symmetry 2025, 17(4), 522; https://doi.org/10.3390/sym17040522
Submission received: 12 February 2025 / Revised: 21 March 2025 / Accepted: 27 March 2025 / Published: 30 March 2025
(This article belongs to the Section Computer)

Abstract

Supervised contrastive learning has emerged as a novel method for multivariate time-series classification. By utilizing labeled data, it maximally learns the differences in feature representations between categories. However, existing supervised contrastive-learning approaches lack model interpretability, making it difficult to analyze the importance ranking among features; experimentally, different preprocessing of the data often changes the feature-importance ranking. We therefore propose FFICL-Net, which combines an LSTM, which analyzes the importance of sequence variables, with an ITransformer, which treats each variable as a token and learns both the self-attention relationships between variables and their relationship to the final feature representation. The feature importance derived from the two modules is contrasted, making the two feature-importance rankings more similar and forming a kind of symmetry, so that the resulting feature representation fuses the characteristics of both models; this leads to more stable and accurate feature-importance rankings and helps improve classification accuracy. We conducted comparative experiments on all 30 public UEA datasets and achieved the best results on more of these datasets than any competing model. The average accuracy reached 72.8%, an improvement of 0.7% over the best-performing baseline.

1. Introduction

Multivariate time-series data [1] refers to a collection of data points across multiple related variables over a set period of time at specified intervals. This type of data is crucial and widely used in various fields, including economics [2], climate [3], medicine [4], and social sciences [5]. Through multivariate time series, research tasks such as classification, forecasting [6], and imputation [7] can be performed on existing time-series data.
Contrastive learning [8] is often used in time-series classification tasks to address the issue of scarce labeled data. With contrastive learning, effective feature representations can be learned through self-supervised or semi-supervised learning under limited labeled conditions. For multivariate time series with complete classification labels, contrastive learning can also enhance the performance of supervised learning by generating similar representations for time-series data of the same category, while differentiating the feature representations between different categories.
However, current supervised contrastive learning [9] often lacks the interpretability needed to yield variable-importance results for multivariate data, owing to its black-box nature and its sole focus on extracting discriminative feature representations. In recent years, with the development of machine-learning and deep-learning technologies, an increasing number of researchers have begun to focus on the interpretability of computational models. Attention mechanisms are widely applied in the field of interpretability to highlight important features in the data. The classic Transformer [10] uses self-attention to focus on relationships between different time steps, whereas the ITransformer [11] instead focuses on the relationships between variables. However, this inversion ignores the sequential, step-by-step computation over the time series when extracting and embedding features, which is especially critical for multivariate time-series data.
LSTM (Long Short-Term Memory) [12] is effective at capturing long-term dependencies and avoiding problems of gradient vanishing or exploding. However, traditional LSTM models lack interpretability regarding the influence of each multivariate variable during sequential computation and are unable to determine feature importance.
To address the limitations of ITransformer and LSTM in time-series feature extraction, as well as their lack of interpretability, which makes it challenging to obtain stable and accurate feature-importance rankings, we designed FFICL-Net, a feature-importance contrastive-learning classification network. To achieve more stable and accurate feature-importance results that aid supervised contrastive learning in feature-representation learning, we expanded the traditional supervised contrastive-learning approach. We designed an interpretable LSTM model to learn the feature importance of each variable at every time step in the time series and an interpretable ITransformer model to learn the feature importance among variable embeddings. The feature-importance outcomes from both modules are compared, aiming to reduce their differences and incorporate this as a loss function. Additionally, the fused feature importance is used to weight the feature representations, assisting in the classification of multivariate time series.
Specifically, our work contributes in four main areas:
  • To address the issue of unstable feature-importance results obtained from interpretable models, we introduce contrastive-learning methods into time-series classification. Feature fusion is performed across different temporal dimensions of the time series, which benefits both the classification and the interpretation of the series.
  • We design two methods incorporating attention mechanisms, adding attention layers to the ITransformer and LSTM, respectively, to capture the feature importance of variables in time-series data.
  • We propose a new loss function that combines supervised contrastive-learning loss, feature-importance contrast loss, and classifier loss.
  • Experimentally, we test our methods on thirty public UEA (University of East Anglia) datasets, achieving results that surpass current state-of-the-art time-series classification models.

2. Related Works

2.1. Multivariate Time-Series Classification

In recent years, the problem of time-series classification has seen extensive application research in fields such as medicine, finance, and engineering. Multivariate time series, where each sequence has multiple dimensions—for instance, ECG (Electrocardiogram) [13] and EEG (Electroencephalogram) [14] data—have garnered significant attention from researchers. In 2018, the UEA archive [15] published 30 multivariate time-series classification datasets, providing a robust benchmark for comparing the strengths and weaknesses of proposed algorithms.
Like many deep-learning challenges, multivariate time-series classification issues can often be addressed through feature extraction using convolutional neural networks. Chia-Yu Hsu [16] and others proposed an MTS (Multivariate Time-Series)-CNN (Convolutional Neural Network) model for multivariate time-series classification, which extracts temporal features along sub-sequences of a sliding window using CNNs, followed by a fully connected layer to classify faulty semiconductor components. CNNs do not intuitively utilize the temporal step features of time series, whereas LSTM [17] structures, through gating units, effectively transmit and preserve information over long time series. Zhao [18] and colleagues introduced LSTM-MFCN (Multi-scale Fully Convolutional Network), which combines LSTM and CNN modules to tackle multivariate time-series classification, constructing a multi-scale receptive field with MFCN [19] to extract spatial features over varying ranges, while the LSTM network captures temporal information, significantly improving the model’s classification performance by integrating spatio-temporal features.
Furthermore, the Transformer [20] model addresses long-term dependencies in time series through self-attention mechanisms that process an entire sequence for embedding. Liu [21] and others expanded the gated Transformer network by merging two Transformer blocks, combining time channel and time step feature information for classification. In contrast, ITransformer [11] does not change the Transformer’s network architecture but switches the roles of self-attention mechanisms and feed-forward networks. ITransformer considers different variables separately, encoding each variable as an independent token and using self-attention to model inter-variable correlations while using feed-forward networks to model the temporal correlations of variables, achieving better sequential temporal representations.
However, models such as LSTM and ITransformer lack interpretability and cannot indicate which variables drive the features they extract from a time series.

2.2. Interpretable Time-Series Classification

Unlike the general field of computer vision, time-series classification can employ CAM (Class Activation Mapping) [22], using attention maps to analyze important regions within images. Inês Neves [23] researched an Explainable Artificial Intelligence (XAI) scheme, introducing an advanced conceptual framework for explainable time series, including PFI (Permutation Feature Importance) [24], LIME (Local Interpretable Model-agnostic Explanations) [25], and SHAP (SHapley Additive exPlanations) [26]. More specifically, Kevin Fauvel [27] proposed XCM (eXplainable Convolutional neural network for Multivariate time-series classification), a new compact convolutional neural network that directly identifies and extracts observation variables and time steps crucial for classification outcomes from the input data. Similarly, Tsung-Yu Hsieh [28] adopted a CAM-like attention mechanism network to specifically analyze the contributions of input variables to classification, using convolution-based feature extraction layers and attention mechanism layers to concurrently assess the importance of variables across the entire time series and at different time steps for classification results. However, the results obtained through existing interpretable models are often unstable.

2.3. Contrastive Learning for Time-Series Classification

The goal of contrastive learning [8] is to encode data from the same category similarly while making the encodings of different categories differ as much as possible; the contrastive objective introduces additional information that helps extract effective features. Supervised contrastive learning [9] goes further by tying that additional information to the class labels. It differs from general self-supervised contrastive learning, which pairs each sample with its data-augmented counterpart as a positive pair and with other samples as negative pairs. By using the available label information, supervised contrastive learning reduces the distance between feature representations of the same category in feature space and increases the distance between different categories, which greatly benefits classification.
Moreover, for contrastive learning on time series, which differs from general data augmentation, Luo [29] proposed InfoTS, an information-aware contrastive-learning method that adaptively selects the optimal augmentations for learning time-series representations. Zhang [30] introduced SMDE, a time-series contrastive-learning framework that diverges from conventional techniques concentrating predominantly on instance-level contrast; SMDE leverages signal decomposition and ensemble methodologies, enhancing the focus on localized mode alterations within time-series signals.
Therefore, we combined interpretable time-series classification and a supervised contrastive-learning framework to address the problems of existing time-series models lacking interpretability and existing interpretable models producing unstable results.

3. Method

In this section, we propose a supervised contrastive-learning model that integrates feature importance (FFICL-Net); the overall framework is illustrated in Figure 1. By combining the interpretable LSTM and interpretable ITransformer, we obtain more accurate feature-importance ranking. Furthermore, employing an end-to-end supervised contrastive-learning framework enables the model to extract more precise feature representations to aid classification. Implementation and functionality of the interpretable LSTM module, interpretable ITransformer module, and improved supervised contrastive-learning framework will be detailed in the following subsections.

3.1. Problem Formulation

Define a multivariate time series as $X = (X_1, X_2, \ldots, X_T)$, where $T$ denotes the length of the time series, starting from time step 1 and ending at time step $T$. Each $X_t = [x_t^1, \ldots, x_t^p] \in \mathbb{R}^p$, where $p$ is the dimension of the multivariate variables, indicating that $p$ variables collectively constitute the data at time step $t$. For each $X$ there is a corresponding label $y$, the true label of the series. The task is to learn the features and inherent information of the time series from a set of known input series and labels $\mathcal{X} = \{(X_1, y_1), \ldots, (X_N, y_N)\}$, where $N$ is the number of samples, and to predict the labels $y$ of other multivariate time series.
While obtaining the classification result for each time series, we also aim to determine the influence of each variable on that result, that is, the feature-importance vector $I = [a_1, \ldots, a_p] \in \mathbb{R}^p$, where $a_p$ is the attention score of the $p$-th variable, subject to
$$\sum_{p=1}^{P} a_p = 1$$

3.2. Interpretable LSTM

The interpretable LSTM module processes the original time-series data, capturing hidden states at each time step. To enhance the interpretability of the standard LSTM framework, we adopt a strategy where the hidden states corresponding to each variable are separated and then encoded individually, producing distinct feature representations for each variable. These representations are subsequently evaluated through an attention mechanism, which assigns a weight to each variable based on its relevance to the classification task. The architecture of the model is illustrated in Figure 2a.
To modify the original LSTM network structure, we define the hidden-state matrix at time step $t$ as $\tilde{h}_t = [h_t^1, \ldots, h_t^P]^{\top} \in \mathbb{R}^{P \times d}$, where $h_t^p \in \mathbb{R}^{d}$ represents the state of the $p$-th variable at time $t$, a vector of custom length $1 \times d$. Consequently, the entire $\tilde{h}_t$ layer has size $D = P \times d$, where $P$ is the number of variables and $d$ is the dimension of each variable's state vector.
Subsequently, to convert the input vector $x_t$ at time $t$ into a representation of size $P \times d$, we define the transformation parameter $U_j = [U_j^1, \ldots, U_j^P]^{\top}$, where $U_j \in \mathbb{R}^{P \times d \times d}$ and $U_j^p \in \mathbb{R}^{d \times d}$. Additionally, to preserve the LSTM network's capability of recursively extracting information from the time series, we define another transformation parameter $W_j = [W_j^1, \ldots, W_j^P]$, where $W_j \in \mathbb{R}^{P \times d \times d}$ and $W_j^p \in \mathbb{R}^{d \times d}$. Using $U_j$ and $W_j$, we compute
$$\tilde{j}_t = \tanh\!\left(W_j \circledast \tilde{h}_{t-1} + U_j \circledast x_t + b_j\right)$$
Here, $\tilde{j}_t \in \mathbb{R}^{P \times d}$, and each $j_t^p$ represents the hidden state of variable $p$ at that moment. The tensor-dot operation $\circledast$ is defined as the product of two tensors along the $P$ axis.
Further, using $\tilde{j}_t$ as the hidden-state update in the standard LSTM, we can recursively update the input gate $i_t$, forget gate $f_t$, output gate $o_t$, and memory cell $c_t$ with the following equations:
$$i_t \oplus f_t \oplus o_t = \sigma\!\left(\mathbf{W}\left[x_t \oplus \mathrm{vec}(\tilde{h}_{t-1})\right] + b\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \mathrm{vec}(\tilde{j}_t)$$
$$\tilde{h}_t = \mathrm{mat}\!\left(o_t \odot \tanh(c_t)\right)$$
In the above formulas, $\mathrm{vec}(\cdot)$ denotes the vectorization operation, which concatenates the columns of a matrix into a single vector; $\oplus$ denotes concatenation; $\odot$ denotes element-wise multiplication; and $\mathrm{mat}(\cdot)$ reshapes a vector in $\mathbb{R}^{D}$ back into an $\mathbb{R}^{P \times d}$ matrix.
After recursively obtaining the final hidden state $\tilde{h}_T$, we compute the feature importance $I_A$ as
$$I_A = \sum_{p=1}^{P} a_p \, \tilde{h}_T^{\,p}$$
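To make the per-variable update above concrete, the following PyTorch sketch implements one step of the tensorized cell under the stated shapes ($P$ variables, $d$ hidden units per variable, $D = P \times d$). It is a minimal illustration, not the authors' released code: the class name `InterpretableLSTMCell` is ours, and we read each variable as contributing a scalar input per time step, so the per-variable input transform maps 1 → d.

```python
import torch
import torch.nn as nn

class InterpretableLSTMCell(nn.Module):
    """Sketch of the per-variable hidden-state update: j~_t, shared gates, c_t, h~_t."""

    def __init__(self, P: int, d: int):
        super().__init__()
        self.P, self.d = P, d
        D = P * d
        # Per-variable recurrent and input transforms, applied variable-wise (the tensor-dot operation).
        self.W_j = nn.Parameter(torch.randn(P, d, d) * 0.1)
        self.U_j = nn.Parameter(torch.randn(P, d, 1) * 0.1)
        self.b_j = nn.Parameter(torch.zeros(P, d))
        # Shared gates act on the concatenation of x_t and vec(h~_{t-1}), as in a standard LSTM.
        self.gates = nn.Linear(P + D, 3 * D)

    def forward(self, x_t, h_tilde, c):
        # x_t: (B, P) inputs at step t; h_tilde: (B, P, d); c: (B, D).
        B, P, d = h_tilde.shape
        # j~_t = tanh(W_j * h~_{t-1} + U_j * x_t + b_j), computed variable-wise.
        rec = torch.einsum('pij,bpj->bpi', self.W_j, h_tilde)
        inp = torch.einsum('pij,bpj->bpi', self.U_j, x_t.unsqueeze(-1))
        j_tilde = torch.tanh(rec + inp + self.b_j)
        # Input, forget, and output gates from the concatenated input and vectorized state.
        z = torch.sigmoid(self.gates(torch.cat([x_t, h_tilde.reshape(B, -1)], dim=-1)))
        i, f, o = z.chunk(3, dim=-1)
        c = f * c + i * j_tilde.reshape(B, -1)          # c_t = f ⊙ c_{t-1} + i ⊙ vec(j~_t)
        h_tilde = (o * torch.tanh(c)).reshape(B, P, d)  # h~_t = mat(o ⊙ tanh(c_t))
        return h_tilde, c
```

Running this cell over all $T$ steps and applying a per-variable attention layer to $\tilde{h}_T$ would then yield the scores $a_p$ that make up $I_A$.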

3.3. Interpretable ITransformer

Unlike the Transformer, which encodes the data points at the same time step into a unified temporal token and uses self-attention to model temporal correlations between different moments, the ITransformer encodes each variable separately. This addresses the tendency of Transformers to neglect inter-variable correlations: the variable tokens produced by the ITransformer, combined with self-attention, reveal the correlations between different variables.
However, concerning the issue of feature importance, the self-attention mechanism in the ITransformer only considers the relationships between different variables and does not take into account the correlation between the weights of the variables and the final feature representation. We have improved the original ITransformer structure by adding an additional layer of attention mechanism to extract the feature importance of variables and integrate it with the existing feature vectors. This enhancement allows the model to focus more on important variables, resulting in better feature representations. The model structure is illustrated in Figure 2b.
For the input $X^{\top}$, the model first applies an Embedding, as shown in the equation:
$$M = \mathrm{Embedding}(X^{\top})$$
This reduces the original input, of time-series length $T$, to $D$ dimensions; that is, $\mathrm{Embedding}: \mathbb{R}^{T} \rightarrow \mathbb{R}^{D}$ and $M = \{m_1, \ldots, m_P\} \in \mathbb{R}^{P \times D}$. Here, each $m_p$ is the intermediate feature extracted for the $p$-th variable, with a length smaller than the original. Through attention, the importance $I_B$ of the embedded series can be obtained.
The attention layer employs two linear layers to calculate the attention weights:
$$s_t = \sigma_2\!\left(\sigma_1\!\left(W_1 m_t + B_1\right) W_2 + B_2\right)$$
$$I_B = \mathrm{softmax}(s_t)$$
where $W_1$, $W_2$, $B_1$, and $B_2$ are parameters learned by the model, and $\sigma_1$ and $\sigma_2$ are nonlinear activation functions, here chosen to be tanh.
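The two-layer attention described above reduces to a small module; the sketch below assumes the variable tokens $m_p$ arrive with embedding dimension $D$, and the class name and hidden width are illustrative rather than taken from the paper.

```python
import torch
import torch.nn as nn

class VariableAttention(nn.Module):
    """Scores each variable token with two linear layers (tanh activations),
    then softmaxes over variables to produce the importance vector I_B."""

    def __init__(self, embed_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lin1 = nn.Linear(embed_dim, hidden_dim)  # W1, B1
        self.lin2 = nn.Linear(hidden_dim, 1)          # W2, B2
        self.act = nn.Tanh()                          # sigma_1 = sigma_2 = tanh

    def forward(self, m):
        # m: (B, P, D) variable tokens from the inverted embedding.
        s = self.act(self.lin2(self.act(self.lin1(m)))).squeeze(-1)  # (B, P)
        return torch.softmax(s, dim=-1)  # one weight per variable, summing to 1
```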

3.4. Improved Supervised Contrastive Learning

To extract better feature information, a common approach is to use supervised contrastive learning. Supervised contrastive learning is an extension of contrastive learning, which is generally unsupervised. For classification models, supervised contrastive learning leverages the additional label information to maximize the differences between categories. Furthermore, to address the issue of unstable important feature indicators produced by the interpretable LSTM, we decided to refine the supervised contrastive-learning approach by comparing the feature-importance results obtained from the interpretable LSTM and the interpretable ITransformer discussed in the first two sections. The feature-importance results obtained from the two explainable models should be similar; therefore, the differences in their comparisons should be minimized. By computing feature importance twice, we aim to ensure that the feature-importance results obtained on the dataset do not vary significantly with different training sets.
After obtaining accurate feature-importance rankings through the refined supervised contrastive learning, we employed an end-to-end model to integrate the learned feature representations with the feature-importance ranking results to obtain the final classification feature Z. Z is then used as the input vector for both the supervised contrastive learning and the classification model.
$$Z = H \, I_B^{\top}$$
Next, the representation $Z$ is fed into a classifier $g$, which produces the estimated sample label $\hat{y} = g(Z)$. With the ground truth $y \in Y$, we measure the classification loss.
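The fusion step and classifier can be written compactly. Since the text only states $Z = H I_B^{\top}$, the sketch below takes the reading that $I_B$ reweights the per-variable features of $H$ before classification; this is an interpretation, not the confirmed implementation.

```python
import torch
import torch.nn as nn

def fuse_and_classify(H, I_B, classifier: nn.Module):
    # H: (B, P, D) per-variable feature representations; I_B: (B, P) importance weights.
    Z = (H * I_B.unsqueeze(-1)).sum(dim=1)  # importance-weighted pooling over variables -> (B, D)
    y_hat = classifier(Z)                   # y_hat = g(Z), e.g. nn.Linear(D, num_classes)
    return Z, y_hat
```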

3.5. Loss Function

The proposed end-to-end model is trained by minimizing a hybrid loss function L, which combines feature-importance contrastive loss, supervised contrastive loss, and classification loss with weighted terms.
Feature-importance contrastive loss: this loss seeks to reduce the difference between the feature importance derived from the interpretable LSTM and that derived from the interpretable ITransformer, making the two feature-importance rankings more similar and forming a kind of symmetry. By minimizing this loss, the model learns to align the feature-importance representations of the two temporal patterns, ensuring greater stability of the model.
$$L_1 = \frac{1}{N}\sum_{n=1}^{N}\left( I_B^{\,n} - I_A^{\,n} \right)$$
Supervised contrastive loss: The supervised contrastive loss L 2 encourages the model to increase the similarity between representations of samples from the same class while reducing the similarity between representations of samples from different classes, thereby aiding in learning meaningful feature representations. Unlike unsupervised loss, the true labels of the samples are directly used during the model’s training process, which helps the model capture the distinguishing features between different classes. The formula for this loss is as follows:
$$L_2 = -\log \frac{\sum_{y = y^{+}} \exp\!\left(\mathrm{sim}(z, z^{+})/\tau\right)}{\sum_{y \neq y^{+}} \exp\!\left(\mathrm{sim}(z, z^{-})/\tau\right)}$$
$Z^{+}$ is the representation derived from samples that belong to the same class as $Z$, whereas $Z^{-}$ is derived from samples with different labels.
Classification loss: We use a loss function widely employed for multi-class classification problems, the softmax cross-entropy loss function. The specific formula is as follows:
$$L_3 = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} \mathbb{1}\{y_i = c\}\,\log \hat{y}_{i,c}$$
where $C$ is the number of classes, $N$ is the number of samples, and $\hat{y}_{i,c}$ is the predicted probability that sample $i$ belongs to class $c$.
Hybrid loss: the hybrid loss combines the three losses described above:
$$L = L_1 + L_2 + L_3$$
The hybrid loss enables the model to effectively balance the contributions from feature-importance contrastive, supervised contrastive, and classification losses, thereby achieving the best possible classification performance for the given time-series classification task.
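A hedged sketch of the three loss terms and their unweighted sum is given below. The temperature, the absolute difference used for the importance-contrast term, and the cosine-style similarity are our choices where the text leaves the details open.

```python
import torch
import torch.nn.functional as F

def hybrid_loss(I_A, I_B, Z, labels, logits, temperature: float = 0.1):
    """L = L1 + L2 + L3; I_A, I_B: (B, P) importance vectors, Z: (B, D) representations,
    labels: (B,) class indices, logits: (B, C) classifier outputs."""
    # L1: feature-importance contrastive loss -- align the two importance vectors.
    L1 = (I_B - I_A).abs().mean()

    # L2: supervised contrastive loss over the batch representations.
    z = F.normalize(Z, dim=-1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pos.fill_diagonal_(0)                               # exclude self-pairs from positives
    not_self = torch.ones_like(pos).fill_diagonal_(0)
    log_prob = sim - torch.log((torch.exp(sim) * not_self).sum(dim=1, keepdim=True) + 1e-12)
    L2 = -(pos * log_prob).sum(dim=1).div(pos.sum(dim=1).clamp(min=1)).mean()

    # L3: softmax cross-entropy classification loss.
    L3 = F.cross_entropy(logits, labels)
    return L1 + L2 + L3
```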

4. Results

We conducted comparative experiments on thirty public UEA datasets, comparing with State-Of-The-Art (SOTA) models such as ITransformer and MICOS. We tested the differences in classification effectiveness between FFICL-Net and Transformer-like networks, and mixed supervised contrastive-learning models. For the issue of interpretability, we specifically calculated the feature-importance ranking on the FaceDetection dataset from among the thirty datasets and conducted a series of validation experiments.

4.1. Datasets

UEA datasets: The UEA dataset is currently an important open-source dataset resource in the field of time-series mining. Each dataset sample includes a class label and contains time-series data such as ECG/EEG brain signals, human activity recognition, and motion classification. The detailed information about the thirty UEA datasets we used is shown in Table 1.
FaceDetection: FaceDetection [15] includes MEG (Magnetoencephalography) recordings and category labels, where class 0 represents Scramble and class 1 represents Face. The data come from ten subjects (subject01 to subject10) with test data from six additional subjects (subject11 to subject16). Each subject has approximately 580–590 trials. Each trial includes 1.5 seconds of MEG recording and the corresponding category label.

4.2. Dataset Format and Preprocessing

As some of the UEA datasets we used were not normalized, we normalized them with Z-score normalization:
$$x' = \frac{x - \mu}{\sigma}$$
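For the unnormalized datasets, per-channel Z-score normalization is a one-liner; the sketch below assumes samples are stored as (n_samples, n_channels, length) arrays, which is our assumption about the data layout.

```python
import numpy as np

def zscore(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Normalize each channel of each sample to zero mean and unit variance.
    mu = x.mean(axis=-1, keepdims=True)
    sigma = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sigma + eps)
```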
Meanwhile, to ensure the security of the dataset during transmission, we utilized a recent method [31] for the encryption of medical datasets.

4.3. Training Details

For our end-to-end model, we trained on a Tesla P100 GPU for 100 epochs, with a batch size of 16 and a learning rate of $1 \times 10^{-3}$. The loss function is as described in Section 3, and we used the Adam optimizer.
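These settings translate into a standard training loop; the sketch below fixes only the stated hyperparameters (batch size 16, learning rate 1e-3, Adam, 100 epochs) and assumes a `model` that returns the two importance vectors, the fused representation, and the class logits, matching the `hybrid_loss` sketch above.

```python
import torch
from torch.utils.data import DataLoader

def train(model, train_set, device="cuda"):
    loader = DataLoader(train_set, batch_size=16, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.to(device).train()
    for epoch in range(100):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            I_A, I_B, Z, logits = model(x)              # assumed model outputs
            loss = hybrid_loss(I_A, I_B, Z, y, logits)  # hybrid loss sketched in Section 3.5
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```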

4.4. Results

To validate the effectiveness and generalizability of our approach, we conducted tests on thirty public UEA datasets, comparing against classical methods (DTW (Dynamic Time Warping) [32], MLP (Multilayer Perceptron) [33], and LSTM [17]), Transformer-based methods (FED [34] and Crossformer [35]), and recently published state-of-the-art methods (ITransformer [11], TimesNet [36], and SMDE [30]). As shown in Table 2, our model achieved the best results on 12 of the 30 datasets, as well as the highest average accuracy. In particular, on the large FaceDetection dataset, our model achieved a 1.5% improvement, demonstrating that the proposed contrastive-learning framework can learn more complex features. Figure 3 shows the critical difference plot of all the above methods.
In the FaceDetection dataset, we also used 5-fold cross-validation with five different metrics to measure the model's classification results: AUC, Accuracy, Recall, Specificity, and F1 score. The experimental results were averaged and are shown in Table 3: compared to the baselines, our model achieved the best results on AUC, Accuracy, Specificity, and F1 score, while its Recall is slightly below that of ITransformer.
The ROC curve comparison of the three models is shown in Figure 4:

4.5. Interpretability Analysis

We validated the feature-importance ranking on the FaceDetection dataset using 5-fold cross-validation. The importance ranking obtained from a single trained model was used as the reference. We then selected subsets of features, in steps of ten dimensions, and evaluated them with classical sequence models as well as in generalization tests, validating on the other four folds of the 5-fold cross-validation. The average accuracy was calculated, and all results reported in this section are averages over multiple runs.
Feature importance ranking results: To verify that our method can obtain stable feature-importance ranking results, we selected multiple epochs from 5-fold cross-validation and extracted their feature-importance ranking results. A total of five attention scores were obtained, and we calculated the cosine similarity between each pair, then compared the minimum cosine similarity among the five groups to analyze the differences in feature-importance ranking results.
As can be seen from Table 4, with the increase in training rounds, the minimum cosine similarity also gradually increases, indicating that the differences among the five attention scores become smaller and smaller. Through the improved supervised contrastive-learning framework, combined with the attention scores of interpretable LSTM and interpretable ITransformer, we can obtain stable feature-importance ranking results. The subsequent subsections will analyze the correctness of the obtained feature-importance ranking results.
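The stability check in Table 4 comes down to pairwise cosine similarities between the fold-wise attention-score vectors; a small sketch of that computation follows (the function name is ours).

```python
import numpy as np
from itertools import combinations

def min_pairwise_cosine(score_vectors):
    """score_vectors: list of importance vectors (one per fold);
    returns the minimum pairwise cosine similarity, as reported in Table 4."""
    sims = []
    for a, b in combinations(score_vectors, 2):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        sims.append(float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b))))
    return min(sims)
```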
Model self-testing on selected dimensions: Similarly, we use the feature-importance ranking $I_A$ obtained on the FaceDetection training set. We select the top 25 dimensions (1–25), the middle 25 dimensions (50–75), and the 25 least important dimensions (119–144). As shown in Figure 5a, the accuracy of FFICL decreases from the most to the least important group of dimensions, validating the correctness of our model's importance ranking: the less important dimensions provide less assistance to classification.
Temporal recursive models on selected dimensions: We tested every ten dimensions on the classical sequence models RNN (Recurrent Neural Network) [37], LSTM [17], and BiLSTM [38], using the by-product feature-importance ranking $I_A$ obtained during training on FaceDetection. As shown in Figure 5b, the accuracy initially remained high because features with higher importance were more correlated with the ground truth. However, as the importance decreased, the unimportant features could no longer help these sequence models classify correctly, leading to a sharp decline in accuracy.
Out-of-distribution generalization: Apart from validation on in-domain data, we also tested the model on another dataset, SelfRegulationSCP1. We used the model parameters trained on FaceDetection as a pre-trained model to test the importance ranking obtained from the training of SelfRegulationSCP1. SelfRegulationSCP1 comprises six features; in each iteration, one feature is removed and classification is performed using the remaining features. As shown in Figure 5c, the more important the removed feature, the lower the resulting classification accuracy.

4.6. Ablation Study

In order to compare and contrast the effects of the learning framework and the explainable attention mechanism, we conducted the following ablation experiments. The specific experimental results are shown in Table 5.
Without the contrastive-learning structure, using only the interpretable LSTM and interpretable ITransformer yields an accuracy of 67.2%. Similarly, using the original ITransformer within the contrastive-learning framework, without the feature-importance scores, yields 66%. This demonstrates the contribution of the two modules.
It is worth noting that the 68.5% accuracy obtained using only ITransformer in Table 3 is higher than the 67.2% in Table 5, where both the interpretable ITransformer and interpretable LSTM are used for simple feature fusion. This is because, under simple feature fusion, the two modules produce different feature-importance rankings, and the mismatched attention scores degrade the final classification results. This is also one of the reasons for adding the CL module: to ensure that the two feature-importance rankings are more similar.

5. Discussion

The novel interpretable supervised contrastive-learning framework proposed in this study can be applied to various time series to rank the importance of each feature within the time series. This holds significant implications for time-series data in fields such as medicine and finance. For instance, the importance ranking results can be further analyzed to assess the impact of medical indicators on patient prognosis or to identify the factors that most significantly influence market prices in financial data. Additionally, the improved supervised contrastive-learning framework presented in this paper offers a potential approach for contrastive learning to extract better feature representations.
However, this study also has certain limitations. First, the method for testing the importance of indicators is not sufficiently robust, lacking a quantitative way to prove the effectiveness of the feature-importance results, and the model performs poorly on datasets with few features. Second, the model was only tested on the public UEA datasets, which do not document what each feature dimension represents; the feature-importance rankings obtained therefore cannot be given meaningful domain interpretations, unlike datasets from the finance or climate sectors, where prior knowledge of feature importance from other studies can aid or validate explainable models. All of the aforementioned areas are worthy of further research.

6. Conclusions

The interpretability of deep-learning models for time-series classification is a significant area of research. Previous explainable models often failed to provide a stable importance ranking, with results potentially varying significantly based on the choice of training set random seeds. In this study, we introduced FFICL-Net for classifying multivariate time series. We conducted tests on thirty public UEA datasets to validate the model’s generalizability and the stability of feature-importance rankings obtained during training.
The introduced improvements in the contrastive-learning framework and attention mechanisms effectively enhanced the final classification accuracy. Combining the feature-importance rankings from the interpretable LSTM with those from the improved attention mechanism of the interpretable ITransformer helped to extract features effectively. The average accuracy achieved on the thirty datasets was 0.7% higher than that of previous State-Of-The-Art (SOTA) models. The proposed integrated model thus outperforms existing approaches and provides valuable insights for subsequent studies.
In summary, our proposed model that integrates feature-importance ranking and contrastive learning is the first to combine explainable modules and improve the existing contrastive learning framework, using deep-learning methods to study the problem of multivariate time-series classification. This approach is highly accurate, and the feature-importance rankings it generates are of considerable reference value.

Author Contributions

W.Q., methodology, software, writing—original draft, visualization; A.S., supervision, writing—review and editing; C.Z., investigation, validation; S.L., investigation, validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

UEA datasets we used in this paper are from public databases. UEA datasets downloaded link: https://www.timeseriesclassification.com/dataset.php (accessed on 2 November 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Singhal, A.; Seborg, D.E. Clustering multivariate time-series data. J. Chemom. 2005, 19, 427–438.
  2. Islam, F.; Shahbaz, M.; Ahmed, A.U.; Alam, M.M. Financial development and energy consumption nexus in Malaysia: A multivariate time-series analysis. Econ. Model. 2013, 30, 435–441.
  3. Mudelsee, M. Climate Time Series Analysis; Atmospheric and Oceanographic Sciences Library; Springer: Cham, Switzerland, 2010; p. 397.
  4. Ghassemi, M.; Pimentel, M.; Naumann, T.; Brennan, T.; Clifton, D.; Szolovits, P.; Feng, M. A multivariate timeseries modeling approach to severity of illness assessment and forecasting in ICU with sparse, heterogeneous clinical data. Proc. AAAI Conf. Artif. Intell. 2015, 29, 446–453.
  5. Chamlin, M.B.; Sanders, B.A. Social policy and crash fatalities: A multivariate time series analysis. J. Crime Justice 2018, 41, 322–333.
  6. Du, S.; Li, T.; Yang, Y.; Horng, S.-J. Multivariate time series forecasting via attention-based encoder–decoder framework. Neurocomputing 2020, 388, 269–279.
  7. Miao, X.; Wu, Y.; Wang, J.; Gao, Y.; Mao, X.; Yin, J. Generative semi-supervised learning for multivariate time series imputation. Proc. AAAI Conf. Artif. Intell. 2021, 35, 8983–8991.
  8. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607.
  9. Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised contrastive learning. Adv. Neural Inf. Process. Syst. 2020, 33, 18661–18673.
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010.
  11. Liu, Y.; Hu, T.; Zhang, H.; Wu, H.; Wang, S.; Ma, L.; Long, M. iTransformer: Inverted transformers are effective for time series forecasting. arXiv 2023, arXiv:2310.06625.
  12. Graves, A. Long short-term memory. In Supervised Sequence Labelling with Recurrent Neural Networks; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–45.
  13. Gao, J.; Zhang, H.; Lu, P.; Wang, Z. An effective LSTM recurrent network to detect arrhythmia on imbalanced ECG dataset. J. Healthc. Eng. 2019, 2019, 6320651.
  14. Shen, F.; Liu, J.; Wu, K. Multivariate time series forecasting based on elastic net and high-order fuzzy cognitive maps: A case study on human action prediction through EEG signals. IEEE Trans. Fuzzy Syst. 2020, 29, 2336–2348.
  15. Bagnall, A.; Dau, H.A.; Lines, J.; Flynn, M.; Large, J.; Bostrom, A.; Southam, P.; Keogh, E. The UEA multivariate time series classification archive. arXiv 2018, arXiv:1811.00075.
  16. Hsu, C.-Y.; Liu, W.-C. Multiple time-series convolutional neural network for fault detection and diagnosis and empirical study in semiconductor manufacturing. J. Intell. Manuf. 2021, 32, 823–836.
  17. Kawakami, K. Supervised Sequence Labelling with Recurrent Neural Networks. Ph.D. Thesis, Technical University of Munich, Munich, Germany, 2008.
  18. Zhao, L.; Mo, C.; Ma, J.; Chen, Z.; Yao, C. LSTM-MFCN: A time series classifier based on multi-scale spatial–temporal features. Comput. Commun. 2022, 182, 52–59.
  19. Salloum, R.; Ren, Y.; Kuo, C.-C.J. Image splicing localization using a multi-task fully convolutional network (MFCN). J. Vis. Commun. Image Represent. 2018, 51, 201–209.
  20. Zerveas, G.; Jayaraman, S.; Patel, D.; Bhamidipaty, A.; Eickhoff, C. A transformer-based framework for multivariate time series representation learning. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 2114–2124.
  21. Liu, M.; Ren, S.; Ma, S.; Jiao, J.; Chen, Y.; Wang, Z.; Song, W. Gated transformer networks for multivariate time series classification. arXiv 2021, arXiv:2103.14438.
  22. Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929.
  23. Neves, I.; Folgado, D.; Santos, S.; Barandas, M.; Campagner, A.; Ronzio, L.; Cabitza, F.; Gamboa, H. Interpretable heartbeat classification using local model-agnostic explanations on ECGs. Comput. Biol. Med. 2021, 133, 104393.
  24. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  25. Ribeiro, M.T.; Singh, S.; Guestrin, C. "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144.
  26. Lundberg, S.M.; Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
  27. Fauvel, K.; Lin, T.; Masson, V.; Fromont, É.; Termier, A. XCM: An explainable convolutional neural network for multivariate time series classification. Mathematics 2021, 9, 3137.
  28. Hsieh, T.-Y.; Wang, S.; Sun, Y.; Honavar, V. Explainable multivariate time series classification: A deep neural network which learns to attend to important variables as well as time intervals. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 607–615.
  29. Luo, D.; Cheng, W.; Wang, Y.; Xu, D.; Ni, J.; Yu, W.; Zhang, X.; Liu, Y.; Chen, Y.; Chen, H.; et al. Time series contrastive learning with information-aware augmentations. Proc. AAAI Conf. Artif. Intell. 2023, 37, 4534–4542.
  30. Zhang, H.; Chan, S.; Qin, S.; Dong, Z.; Chen, G. SMDE: Unsupervised representation learning for time series based on signal mode decomposition and ensemble. Knowl.-Based Syst. 2024, 301, 112369.
  31. Rehman, M.U.; Shafique, A.; Khan, I.U.; Ghadi, Y.Y.; Ahmad, J.; Alshehri, M.S.; Al Qathrady, M.; Alhaisoni, M.; Zayyan, M.H. An efficient deep learning model for brain tumour detection with privacy preservation. CAAI Trans. Intell. Technol. 2023.
  32. Berndt, D.J.; Clifford, J. Using dynamic time warping to find patterns in time series. In Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 31 July–1 August 1994; pp. 359–370.
  33. Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional neural networks. In Web-Age Information Management, Proceedings of the 15th International Conference, WAIM 2014, Macau, China, 16–18 June 2014; Springer: Cham, Switzerland, 2014; pp. 298–310.
  34. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
  35. Zhang, Y.; Yan, J. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the Eleventh International Conference on Learning Representations, Kigali, Rwanda, 1–5 May 2023.
  36. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-variation modeling for general time series analysis. arXiv 2022, arXiv:2210.02186.
  37. Hüsken, M.; Stagge, P. Recurrent neural networks for time series classification. Neurocomputing 2003, 50, 223–235.
  38. Zhang, L.; Xiang, F. Relation classification via BiLSTM-CNN. In Data Mining and Big Data, Proceedings of the Third International Conference, DMBD 2018, Shanghai, China, 17–22 June 2018; Springer: Cham, Switzerland, 2018; pp. 373–382.
Figure 1. The proposed model includes two ways to obtain feature importance: (1) feature importance A is extracted by adding an attention layer to the interpretable ITransformer; (2) feature importance B is obtained through the interpretable LSTM. The two are then compared to obtain the feature-importance contrastive loss L1. $Z^{+}$ and $Z^{-}$ are feature representations that are compared to obtain the supervised contrastive-learning loss L2. "FC" denotes the fully connected layer, which classifies the original features fused with the feature importance to obtain the classification loss L3.
Figure 2. (a) Interpretable LSTM: each time step of the original time-series input is used as a token; the hidden state is updated through the network parameters to obtain the final hidden state $\tilde{h}_T$, and a weighted attention mechanism is applied to $\tilde{h}_T$ to derive the feature importance $I_A$. (b) Interpretable ITransformer: each variable of the original time-series input is treated as a token; after the time dimension is reduced through embedding, attention is applied to the feature representation of each variable to derive the feature importance $I_B$.
Figure 3. Critical difference plot of the MTS classifiers on the 30 UEA datasets.
Figure 4. ROC curves of the test set on three models. Green ROC curve represents the results of the FFICL model on the test set. Red ROC curve represents the results of the ITransformer model on the test set. Purple ROC curve represents the results of the MICOS model on the test set.
Figure 5. (a) Evaluation results of selected dimensions of FFICL-Net. (b) Evaluation results of selected dimensions on the temporal recursive models RNN, LSTM, and BiLSTM. (c) The accuracy trend on SelfRegulationSCP1 obtained by deleting one of the six indicators in order of indicator importance.
Table 1. The basic information and feature dimensions of all 30 UEA datasets.

Dataset | Train Cases | Test Cases | Dimensions | Length | Classes
ArticularyWordRecognition | 275 | 300 | 9 | 144 | 25
AtrialFibrillation | 15 | 15 | 2 | 640 | 3
BasicMotions | 40 | 40 | 6 | 100 | 4
CharacterTrajectories | 1422 | 1436 | 3 | 182 | 20
Cricket | 108 | 72 | 6 | 1197 | 12
DuckDuckGeese | 60 | 40 | 1345 | 270 | 5
EigenWorms | 128 | 131 | 6 | 17,984 | 5
Epilepsy | 137 | 138 | 3 | 206 | 4
EthanolConcentration | 261 | 263 | 3 | 1751 | 2
ERing | 30 | 30 | 4 | 65 | 6
FaceDetection | 5890 | 3524 | 144 | 62 | 2
FingerMovements | 316 | 100 | 28 | 50 | 2
HandMovementDirection | 320 | 147 | 10 | 400 | 4
Handwriting | 150 | 850 | 3 | 152 | 26
Heartbeat | 204 | 205 | 61 | 405 | 2
JapaneseVowels | 270 | 370 | 12 | 29 | 9
Libras | 180 | 180 | 2 | 45 | 15
LSST | 2459 | 2466 | 6 | 36 | 14
InsectWingbeat | 30,000 | 20,000 | 200 | 78 | 10
MotorImagery | 278 | 100 | 64 | 3000 | 2
NATOPS | 180 | 180 | 24 | 51 | 6
PenDigits | 7494 | 3498 | 2 | 8 | 10
PEMS-SF | 267 | 173 | 963 | 144 | 7
Phoneme | 3315 | 3353 | 11 | 217 | 39
RacketSports | 151 | 152 | 6 | 30 | 4
SelfRegulationSCP1 | 268 | 293 | 6 | 896 | 2
SelfRegulationSCP2 | 200 | 280 | 7 | 1152 | 2
SpokenArabicDigits | 6599 | 2199 | 13 | 93 | 10
StandWalkJump | 12 | 15 | 4 | 2500 | 3
UWaveGestureLibrary | 120 | 320 | 3 | 315 | 8
Table 2. Model accuracy (%) comparison on the 30 public UEA datasets against classical methods, Transformer-based methods, and recently published state-of-the-art methods. Columns are grouped in the original as Classical Methods, RNN, Transformers, Mixed Supervised, and Ours.

Datasets / Models | DTW-KNN (1994) | MLP (2014) | MLSTM-FCN (2019) | FED (2022) | CrossF (2022) | ITransformer (2023) | TimesNet (2023) | TS-TCC (2022) | SMDE (2024) | Ours
ArticularyWordRecognition | 93.0 | 97.3 | 97.3 | 97.7 | 98.0 | 98.0 | 96.2 | 97.3 | 96.3 | 98.0
AtrialFibrillation | 20.0 | 46.7 | 26.7 | 53.3 | 46.7 | 40.0 | 33.3 | 30.0 | 20.0 | 40.0
BasicMotions | 100.0 | 85.0 | 95.0 | 92.5 | 90.0 | 92.5 | 100.0 | 100.0 | 97.5 | 100.0
CharacterTrajectories | 96.1 | 98.8 | 98.5 | 99.1 | 98.2 | 99.4 | 99.2 | 99.1 | 99.2 | 99.7
Cricket | 97.2 | 91.7 | 91.7 | 84.7 | 84.7 | 98.6 | 87.5 | 93.8 | 100.0 | 100.0
DuckDuckGeese | 44.0 | 42.0 | 67.5 | 57.5 | 60.0 | 55.0 | 62.5 | 42.0 | 60.0 | 66.0
EigenWorms | 63.4 | 52.7 | 50.4 | 61.8 | 55.0 | 83.2 | 84.0 | 39.3 | 84.0 | 87.8
Epilepsy | 97.8 | 60.1 | 76.1 | 65.9 | 73.2 | 73.2 | 78.1 | 95.7 | 97.1 | 74.6
ERing | 91.9 | 82.6 | 94.1 | 92.9 | 84.4 | 93.3 | 91.4 | 94.4 | 90.0 | 93.7
EthanolConcentration | 28.9 | 33.5 | 37.3 | 28.9 | 35.0 | 31.2 | 25.1 | 27.2 | 28.9 | 32.3
FaceDetection | 54.9 | 67.4 | 54.5 | 68.9 | 66.2 | 66.0 | 67.8 | 54.9 | 54.9 | 70.3
FingerMovements | 53.0 | 64.0 | 58.0 | 62.0 | 64.0 | 60.0 | 59.0 | 47.0 | 53.0 | 62.0
HandMovementDirection | 24.3 | 58.1 | 36.5 | 58.1 | 58.1 | 40.0 | 50.0 | 40.0 | 37.8 | 50.0
Handwriting | 27.2 | 22.5 | 28.6 | 26.0 | 26.2 | 28.0 | 23.1 | 39.2 | 41.9 | 32.0
Heartbeat | 69.3 | 73.2 | 66.3 | 76.6 | 76.6 | 73.7 | 75.1 | 69.5 | 74.6 | 78.5
InsectWingbeat | N/A | 10.0 | 16.7 | 10.0 | 27.6 | 11.0 | 10.0 | 47.9 | 55.8 | 21.8
JapaneseVowels | 88.1 | 97.8 | 97.6 | 98.7 | 98.9 | 98.4 | 96.5 | 90.8 | 97.0 | 98.1
Libras | 81.1 | 73.3 | 85.6 | 81.1 | 76.1 | 87.0 | 77.8 | 78.3 | 85.0 | 87.8
LSST | 52.6 | 35.8 | 37.3 | 67.8 | 42.8 | 57.5 | 59.2 | 39.1 | 61.9 | 62.7
MotorImagery | 46.0 | 61.0 | 51.0 | 61.0 | 61.0 | 59.0 | 51.0 | 51.0 | 62.0 | 59.0
NATOPS | 85.0 | 93.9 | 88.9 | 96.7 | 88.3 | 93.9 | 81.8 | 80.0 | 91.7 | 86.7
PEMS-SF | 79.8 | 82.1 | 69.9 | 88.4 | 82.1 | 80.9 | 83.2 | 63.5 | 80.9 | 89.6
PenDigits | 94.6 | 93.0 | 97.8 | 97.3 | 93.7 | 97.6 | 98.2 | 96.7 | 98.2 | 98.1
PhonemeSpectra | 13.3 | 7.1 | 11.0 | 11.7 | 7.6 | 10.1 | 18.2 | 16.5 | 21.9 | 11.0
RacketSports | 84.9 | 79.0 | 80.3 | 84.2 | 81.6 | 81.6 | 82.6 | 82.5 | 84.2 | 83.6
SelfRegulationSCP1 | 75.6 | 88.4 | 87.4 | 89.8 | 91.5 | 88.7 | 91.8 | 89.0 | 89.4 | 92.2
SelfRegulationSCP2 | 48.3 | 51.7 | 47.2 | 54.4 | 53.3 | 54.4 | 53.3 | 53.3 | 57.8 | 58.3
SpokenArabicDigits | 90.6 | 96.7 | 99.0 | 99.7 | 96.4 | 98.3 | 98.4 | 99.8 | 97.8 | 99.1
StandWalkJump | 40.0 | 60.0 | 6.7 | 60.0 | 53.3 | 53.3 | 53.3 | 45.0 | 53.3 | 60.0
UWaveGestureLibrary | 84.7 | 81.9 | 89.1 | 80.0 | 81.6 | 87.5 | 85.3 | 80.6 | 91.9 | 91.3
Average Accuracy | 64.2 | 68.4 | 64.8 | 70.2 | 68.4 | 69.7 | 69.1 | 66.1 | 72.1 | 72.8
Best accuracy count | 3 | 3 | 2 | 5 | 3 | 1 | 1 | 3 | 6 | 12
Average Rank | 7.33 | 6.7 | 6.47 | 4.57 | 5.72 | 5.1 | 5.43 | 6.5 | 4.5 | 2.95
Table 3. Comparison of different models on the FaceDetection dataset across five evaluation metrics: AUC, Accuracy, Recall, Specificity, and F1-score.

Metric | Method | Result
AUC | SMDE | 0.67
AUC | ITransformer | 0.71
AUC | FFICLNet | 0.74
Accuracy | SMDE | 0.651
Accuracy | ITransformer | 0.685
Accuracy | FFICLNet | 0.703
Recall | SMDE | 0.654
Recall | ITransformer | 0.68
Recall | FFICLNet | 0.642
Specificity | SMDE | 0.648
Specificity | ITransformer | 0.665
Specificity | FFICLNet | 0.741
F1-score | SMDE | 0.652
F1-score | ITransformer | 0.675
F1-score | FFICLNet | 0.676
Table 4. The minimum cosine similarity as a function of the number of training epochs.

Epoch | Epoch 1 | Epoch 25 | Epoch 50 | Epoch 75 | Epoch 100
Minimum Cosine Similarity | 0.5484 | 0.6921 | 0.8247 | 0.8761 | 0.9212
Table 5. Ablation experiment on the classification models without the FFI (Fusion Feature Importance) module and without the CL (Contrastive Learning) module, respectively ("w/o" means without). The AUC, Accuracy, Recall, Specificity, and F1-score results obtained from the experiments are shown.

Metric | Method | Result
AUC | FFICLNet w/o FFI | 0.69
AUC | FFICLNet w/o CL | 0.72
AUC | FFICLNet | 0.74
Accuracy | FFICLNet w/o FFI | 0.66
Accuracy | FFICLNet w/o CL | 0.672
Accuracy | FFICLNet | 0.703
Recall | FFICLNet w/o FFI | 0.589
Recall | FFICLNet w/o CL | 0.68
Recall | FFICLNet | 0.642
Specificity | FFICLNet w/o FFI | 0.727
Specificity | FFICLNet w/o CL | 0.665
Specificity | FFICLNet | 0.741
F1-score | FFICLNet w/o FFI | 0.631
F1-score | FFICLNet w/o CL | 0.675
F1-score | FFICLNet | 0.676

