Article

BIMO: Bootstrap Inter–Intra Modality at Once Unsupervised Learning for Multivariate Time Series

Seongsil Heo, Sungsik Kim and Jaekoo Lee
College of Computer Science, Kookmin University, Seoul 02707, Republic of Korea
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(9), 3825; https://doi.org/10.3390/app14093825
Submission received: 2 April 2024 / Accepted: 25 April 2024 / Published: 30 April 2024
(This article belongs to the Special Issue Deep Networks for Biosignals)

Abstract

It is difficult to learn meaningful representations of time-series data since they are sparsely labeled and unpredictable. Hence, we propose bootstrap inter–intra modality at once (BIMO), an unsupervised representation learning method based on time series. Unlike previous works, the proposed BIMO method learns both inter-sample and intra-temporal modality representations simultaneously without negative pairs. BIMO comprises a main network and two auxiliary networks, namely inter-auxiliary and intra-auxiliary networks. The main network is trained to learn inter–intra modality representations sequentially by regulating the use of each auxiliary network dynamically. Thus, BIMO thoroughly learns inter–intra modality representations simultaneously. The experimental results demonstrate that the proposed BIMO method outperforms the state-of-the-art unsupervised methods and achieves comparable performance to existing supervised methods.

1. Introduction

The volume of time-series data is rapidly growing with various applications in a wide variety of domains. Considerable developments have been noted in several fields, such as signal processing and machine learning [1,2,3,4]. Recently, deep learning models for time-series data have demonstrated remarkable performances [5,6,7,8,9,10].
Most of these models adopt a supervised learning approach, which requires collecting a massive amount of data with high-quality annotations. Therefore, we explore a time-series unsupervised learning approach to tackle this data acquisition problem.
Unsupervised learning attempts to identify meaningful generalized properties from unlabeled data. Unsupervised learning has recently attracted significant attention, particularly in computer vision. The contrastive learning method is prominent among various unsupervised learning methods [11,12,13,14,15,16,17]. In addition, recent attempts have been made to remove negative pairs, which is a problem in the contrastive learning method [15,18].
However, unsupervised learning with time-series data has not been studied as extensively as in computer vision, and some challenges remain in existing methods. Most time-series data are unpredictable and nonstationary [19,20]; thus, existing methods are limited with regard to extracting meaningful generalized properties.
Unsupervised learning-based time-series models can be broadly categorized into two approaches: those that learn inter-sample modality representations [21,22] and those that learn intra-temporal modality representations [23,24]. Inter-sample modality representation derives relationships between two samples. In contrast, intra-temporal modality representation derives features according to time within the same sample.
Most previous studies focused on training specific modality representations. In addition, the contrastive learning method requires careful treatment when collecting proper negative pairs.
Therefore, in this paper, we propose the Bootstrap Inter–Intra Modality at Once (BIMO) method, which is an unsupervised learning method for multivariate time series that simultaneously explores inter–intra modality representations without negative pairs. The proposed BIMO method comprises three neural networks: the main network and two auxiliary networks (i.e., the inter-auxiliary and intra-auxiliary networks). These three networks interact and learn from each other.
From given raw time-series data, two transformed samples are generated using an augmentation strategy: (1) the input to the main network and (2) the input to the inter-auxiliary network. The input of the main network generates another sample, which is the input of the intra-auxiliary network, using a subsampling strategy. The main network simultaneously predicts the representation of the two samples generated from the two auxiliary networks. The proposed BIMO method learns the complementary properties in both modalities efficiently and simultaneously by adjusting the weight of each auxiliary network dynamically.
We measured the performance of the learned representation with various datasets to validate the generalizability of the proposed method. Here, we used univariate UCR datasets [25], which are well-known time-series datasets. We showed that the proposed BIMO method is universal, comparable to state-of-the-art (SOTA) time-series supervised methods, and superior to previous time-series unsupervised methods.
We also evaluated the performance of the proposed method on multivariate UEA datasets [26]. Here, we found that the proposed BIMO method is suitable for representation learning with multivariate time-series data. We then used a real-world wearable stress and affect detection (WESAD) dataset to demonstrate the noise robustness of the proposed BIMO method.
Our primary contributions are summarized as follows. (1) We propose a simple unsupervised learning method for time series that trains the main network using two auxiliary networks while exploring inter–intra modality representations simultaneously. (2) We remove the constraint of negative pairs from contrastive learning-based time-series data analysis. (3) We present various comprehensive analyses for extracting robust features, considering inter–intra modality representations, from the unsupervised learning perspective on time-series data. (4) We utilize various datasets to verify that the proposed BIMO method is universal, robust against noise, and outperforms contemporary SOTA methods.

2. Materials and Methods

The goal of BIMO is to be easily applied to downstream tasks by discovering the most significant modality representations in all domains of time-series data. This study was inspired by existing work on SOTA contrastive learning-based unsupervised learning methods [15,23,27].
As shown in Figure 1, the proposed BIMO method consists of the main network and two auxiliary networks. The main network consists of an encoder $f_\theta$, a projector $g_\theta$, and a predictor $q_\theta$, and each auxiliary network comprises an encoder and a projector. The main network learns to have a similar distribution between the two values $p_\xi$ and $p_\lambda$ from the respective projectors of the two auxiliary networks and the value $q_\theta(p_\theta)$ from its own predictor.
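For illustration, a minimal sketch of how these three networks can be organized is given below, assuming a PyTorch implementation; the helper names (MLPHead, Branch, build_bimo) and the exact layer composition are illustrative assumptions, not the verbatim implementation.

import copy
import torch.nn as nn

class MLPHead(nn.Module):
    # Two-layer MLP used for both the projector g and the predictor q
    # (512 and 320 output units, as described in Section 2.3).
    def __init__(self, in_dim, hidden_dim=512, out_dim=320):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.BatchNorm1d(hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class Branch(nn.Module):
    # Encoder f followed by projector g; all three networks share this layout.
    def __init__(self, encoder, feat_dim):
        super().__init__()
        self.f = encoder            # any encoder mapping (B, C, T) -> (B, feat_dim)
        self.g = MLPHead(feat_dim)  # projector

    def forward(self, x):
        return self.g(self.f(x))

def build_bimo(encoder, feat_dim=320):
    main = Branch(encoder, feat_dim)      # weights theta
    predictor = MLPHead(in_dim=320)       # q_theta, applied to the main projection
    inter_aux = copy.deepcopy(main)       # weights xi
    intra_aux = copy.deepcopy(main)       # weights lambda
    for p in list(inter_aux.parameters()) + list(intra_aux.parameters()):
        p.requires_grad = False           # auxiliaries are updated by moving average only
    return main, predictor, inter_aux, intra_aux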
Simultaneously learning both inter- and intra-modality representations is a significant challenge. We trained the proposed BIMO method to learn inter–intra modality representations efficiently and stably based on the fundamental concept that high-level features are built from low-level and intermediate-level features [28].
An overview of the training process of the proposed BIMO method is given in Algorithm 1. The complexity of the proposed BIMO method is $O(4N)$, whereas the complexity of USRL, an existing SOTA method, is at least $O(18N)$.
While training, we first used a hard constraint in the inter-auxiliary network to learn sufficient low-level coarse information, i.e., the time characteristics within samples, from the intra-auxiliary network. As the number of epochs increased, we gradually applied a hard constraint to the intra-auxiliary network and not to the inter-auxiliary network. Therefore, the proposed BIMO method sufficiently learns fine-grained features, i.e., the correlation between two augmented samples, from the inter-auxiliary network.
Therefore, BIMO learns low-level features sufficiently at the initial training step, and gradually learns high-level features.
Algorithm 1 BIMO’s training procedure
Input: Time series set $X = \{x_n\}_{n=1}^{N}$, number of epochs $M$
Output: Trained $f_\theta$
1: Initialize weights of $f_\theta, g_\theta, q_\theta, f_\lambda, g_\lambda, f_\xi, g_\xi$
2: $m \leftarrow 1$
3: repeat
4:   for $n = 1$ to $N$ with $s_n = \mathrm{size}(x_n)$ do
5:     generate $v \leftarrow t(x_n)$, $v' \leftarrow t'(x_n)$ from different augmentations $t \sim \tau$, $t' \sim \tau'$
6:     pick the subseries length $s_{v_{sub}} = \mathrm{size}(v_{sub})$ in $[\![1, s_n]\!]$
7:     extract $v_{sub}$ among subseries of $v$ of length $s_{v_{sub}}$
8:     $r_\theta \leftarrow f_\theta(v)$, $p_\theta \leftarrow g_\theta(r_\theta)$, $q_\theta(p_\theta)$
9:     $r_\xi \leftarrow f_\xi(v')$, $p_\xi \leftarrow g_\xi(r_\xi)$
10:    $r_\lambda \leftarrow f_\lambda(v_{sub})$, $p_\lambda \leftarrow g_\lambda(r_\lambda)$
11:    $\mathcal{L}_{inter} \leftarrow \lVert \overline{q_\theta}(p_\theta) - \overline{p_\xi} \rVert_2^2$
12:    $\mathcal{L}_{intra} \leftarrow \lVert \overline{q_\theta}(p_\theta) - \overline{p_\lambda} \rVert_2^2$
13:    $\mathcal{L}_{BIMO} \leftarrow (1 - \frac{1}{m})\,\mathcal{L}_{inter} + \frac{1}{m}\,\mathcal{L}_{intra}$
14:    update weights of $f_\theta, g_\theta, q_\theta$ using $\mathcal{L}_{BIMO}$
15:    update weights of $f_\lambda, g_\lambda, f_\xi, g_\xi$ using an exponential moving average
16:   end for
17:   $m \leftarrow m + 1$
18: until $m = M$
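A runnable sketch of one training epoch corresponding to Algorithm 1 is given below, assuming PyTorch. Here, augment and subsample are hypothetical helpers (sketches appear in Sections 2.1 and 3.1), the optimizer is assumed to hold only the parameters of the main network and the predictor, and the symmetrized tilde terms of Equation (3) are omitted for brevity.

import torch
import torch.nn.functional as F

def bimo_loss(prediction, target):
    # Squared L2 distance between l2-normalized vectors (Eqs. (1)-(2)).
    p = F.normalize(prediction, dim=-1)
    t = F.normalize(target, dim=-1)
    return (2 - 2 * (p * t).sum(dim=-1)).mean()

def train_epoch(loader, main, predictor, inter_aux, intra_aux, optimizer, m, tau=0.996):
    for x in loader:                                   # x: (B, C, T) raw series
        v, v_prime = augment(x), augment(x)            # line 5: two augmented views
        v_sub = subsample(v)                           # lines 6-7: random subseries of v
        q_theta = predictor(main(v))                   # line 8
        with torch.no_grad():
            p_xi = inter_aux(v_prime)                  # line 9
            p_lambda = intra_aux(v_sub)                # line 10
        loss = ((1 - 1 / m) * bimo_loss(q_theta, p_xi)          # lines 11-13
                + (1 / m) * bimo_loss(q_theta, p_lambda))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                               # line 14: update theta only
        with torch.no_grad():                          # line 15: moving-average targets
            for aux in (inter_aux, intra_aux):
                for p_aux, p_main in zip(aux.parameters(), main.parameters()):
                    p_aux.mul_(tau).add_((1 - tau) * p_main)

Note that at the first epoch (m = 1) the weighting reduces the loss to the intra term only, matching the training schedule described above.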

2.1. BIMO’s Components

Given time-series data $X = \{x_n\}_{n=1}^{N}$, where $N$ is the number of samples, each sample $x_n = (x_{n,1}, \ldots, x_{n,T})$ comprises $T$ ordered real values.
The proposed BIMO method consists of three networks, and each network uses its own set of weights: $\theta$, $\xi$, and $\lambda$.
A sample $x$ generates two augmented views $v \leftarrow t(x)$ and $v' \leftarrow t'(x)$, which apply two augmentations $t \sim \tau$ and $t' \sim \tau'$ (line 5). For the augmentation strategy, we employ a magnitude domain augmentation method, which transforms the values of the time-series data, and a time domain augmentation method, which transforms the time-series data sequence. Here, $v$ is the input of the main network, and $v'$ is the input of the inter-auxiliary network; $v_{sub}$, which is the input of the intra-auxiliary network, is subsampled from $v$ (lines 6-7), and $M$ is the number of epochs.
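As an illustration of this strategy, a hedged sketch of one magnitude-domain transform (random scaling) and one time-domain transform (time slicing) is given below; the exact transforms and their parameters are design choices and may differ from those used in the experiments.

import torch

def magnitude_scale(x, sigma=0.1):
    # Magnitude-domain augmentation: multiply each channel by a random factor near 1.
    factor = 1.0 + sigma * torch.randn(x.size(0), x.size(1), 1, device=x.device)
    return x * factor

def time_slice(x, ratio=0.9):
    # Time-domain augmentation: keep a random contiguous window covering `ratio` of the series.
    T = x.size(-1)
    win = max(1, int(ratio * T))
    start = torch.randint(0, T - win + 1, (1,)).item()
    return x[..., start:start + win]

def augment(x):
    # One possible composition t ~ tau: scale in magnitude, then slice in time.
    return time_slice(magnitude_scale(x))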

2.2. Training Details

We first forward the three generated samples (lines 8-10). The main and inter-auxiliary networks learn representations through samples generated from the same time-series data under different augmentations. Therefore, the proposed BIMO method learns to have similar distributions between $q_\theta(p_\theta)$ from the predictor of the main network and $p_\xi$ from the projector of the inter-auxiliary network (line 11).
The inputs of the intra-auxiliary network are subsamples from the input of the main network. Hence, the samples are highly likely to have similar distributions since they are in similar periods. The proposed BIMO method also learns to have a similar distribution between $q_\theta(p_\theta)$ from the predictor of the main network and $p_\lambda$ from the projector of the intra-auxiliary network (line 12).
First, we train the main network with a high ratio for the intra-auxiliary network and a low ratio for the inter-auxiliary network so that it learns low-level coarse information at the initial stage, following the fundamental principles of deep learning [28] (line 13). Then, we gradually decrease the ratio of the intra-auxiliary network and increase the ratio of the inter-auxiliary network every epoch. We only minimize the loss function with respect to a single set of weights, $\theta$, in each training step (line 14). The other weights (i.e., $\xi$ and $\lambda$) prevent network collapse through a slowly moving exponential average, $\xi \leftarrow \tau \xi + (1 - \tau)\theta$ (and analogously for $\lambda$) (line 15).
The output of the main network is $q_\theta(g_\theta(f_\theta(v))) \triangleq q_\theta(p_\theta)$, the output of the inter-auxiliary network is $g_\xi(f_\xi(v')) \triangleq p_\xi$, and the output of the intra-auxiliary network is $g_\lambda(f_\lambda(v_{sub})) \triangleq p_\lambda$. Each output $q_\theta(p_\theta)$, $p_\xi$, and $p_\lambda$ is $\ell_2$-normalized to obtain $\overline{q_\theta}(p_\theta) \triangleq q_\theta(p_\theta)/\lVert q_\theta(p_\theta)\rVert_2$, $\overline{p_\xi} \triangleq p_\xi/\lVert p_\xi\rVert_2$, and $\overline{p_\lambda} \triangleq p_\lambda/\lVert p_\lambda\rVert_2$, respectively. Thus, the training objective aims to minimize the differences between $q_\theta(p_\theta)$ and $p_\xi$ as well as between $q_\theta(p_\theta)$ and $p_\lambda$. Losses are defined as follows:
$\mathcal{L}_{inter} \triangleq \lVert \overline{q_\theta}(p_\theta) - \overline{p_\xi} \rVert_2^2 = 2 - 2 \cdot \dfrac{\langle q_\theta(p_\theta),\, p_\xi \rangle}{\lVert q_\theta(p_\theta) \rVert_2 \cdot \lVert p_\xi \rVert_2}$    (1)
$\mathcal{L}_{intra} \triangleq \lVert \overline{q_\theta}(p_\theta) - \overline{p_\lambda} \rVert_2^2 = 2 - 2 \cdot \dfrac{\langle q_\theta(p_\theta),\, p_\lambda \rangle}{\lVert q_\theta(p_\theta) \rVert_2 \cdot \lVert p_\lambda \rVert_2}$    (2)
$\mathcal{L}_{BIMO} = \left(1 - \frac{1}{m}\right)\left(\mathcal{L}_{inter} + \tilde{\mathcal{L}}_{inter}\right) + \frac{1}{m}\left(\mathcal{L}_{intra} + \tilde{\mathcal{L}}_{intra}\right)$    (3)
Equations (1) and (2) represent the inter and intra losses, respectively, and Equation (3) represents the total loss. $\tilde{\mathcal{L}}_{inter}$ and $\tilde{\mathcal{L}}_{intra}$ in Equation (3) exchange $v$ and $v'$ to symmetrize the losses, where $m$ denotes the training epoch.
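To make the tilde terms explicit, the sketch below evaluates Equation (3) with the roles of v and v' exchanged, reusing the hypothetical bimo_loss and network objects from the training sketch given after Algorithm 1. The use of a subsample of v' in the exchanged intra term is an assumption, since the text only states that v and v' are swapped.

def symmetric_bimo_loss(main, predictor, inter_aux, intra_aux,
                        v, v_prime, v_sub, v_prime_sub, m):
    # Forward direction: main network sees v, inter-auxiliary sees v'.
    l_inter = bimo_loss(predictor(main(v)), inter_aux(v_prime).detach())
    l_intra = bimo_loss(predictor(main(v)), intra_aux(v_sub).detach())
    # Tilde terms: v and v' exchanged (main sees v', inter-auxiliary sees v).
    l_inter_t = bimo_loss(predictor(main(v_prime)), inter_aux(v).detach())
    l_intra_t = bimo_loss(predictor(main(v_prime)), intra_aux(v_prime_sub).detach())
    # Eq. (3): dynamic weighting by the current epoch m.
    return (1 - 1 / m) * (l_inter + l_inter_t) + (1 / m) * (l_intra + l_intra_t)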

2.3. Architecture and Optimization

Time-series data have to accommodate varying lengths and be efficient in terms of time and memory, as such data are often updated in real time. Thus, we used a dilated causal convolution network [23,29,30] as a backbone to fulfil the requirements.
The dilated causal convolution network comprises 20 layers, each of which exponentially increases the dilation parameter: $2^i$ for the $i$-th layer. We employ an adaptive max-pooling layer as the last layer to squeeze the temporal dimension and output a vector of a fixed size. Here, representation $r$ is projected into a multilayer perceptron (MLP), $g_\theta$, comprising two layers, and projection $p$ is forwarded into another MLP, $q_\theta$, which has the same structure as $g_\theta$. We used output dimensions of 512 and 320 for the first and second layers of the MLPs, respectively. For the auxiliary networks, we began with the exponential moving average parameter $\tau_{base} = 0.996$ and increased it to 1 during training.
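A compact sketch of such an encoder is shown below, assuming PyTorch; the depth and channel width are reduced for illustration, whereas the network described above uses 20 layers.

import torch.nn as nn
import torch.nn.functional as F

class CausalConvBlock(nn.Module):
    def __init__(self, in_ch, out_ch, dilation, kernel_size=3):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation    # pad only on the left -> causal
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        return F.relu(self.conv(F.pad(x, (self.left_pad, 0))))

class DilatedCausalEncoder(nn.Module):
    def __init__(self, in_ch=1, channels=40, depth=10, feat_dim=320):
        super().__init__()
        self.blocks = nn.Sequential(*[
            CausalConvBlock(in_ch if i == 0 else channels, channels, dilation=2 ** i)
            for i in range(depth)
        ])
        self.pool = nn.AdaptiveMaxPool1d(1)             # squeeze the temporal dimension
        self.out = nn.Linear(channels, feat_dim)        # fixed-size representation r

    def forward(self, x):                               # x: (B, C, T), any length T
        h = self.pool(self.blocks(x)).squeeze(-1)
        return self.out(h)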

3. Results and Discussion

We performed classification tasks to evaluate the proposed BIMO method’s validity in representation learning. We used typical time-series datasets: univariate UCR datasets [25] and multivariate UEA datasets [26]. We also used a public wearable dataset, the WESAD dataset [31], to validate BIMO’s robustness against noisy data. The encoder was trained on an unlabeled training set, and the learned encoder was used to perform a classification task. In addition, we trained a simple single-layer linear classifier on a labeled training set [32,33,34].
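This evaluation protocol can be summarized by the following sketch: the trained encoder is frozen and a single linear classifier is fitted on the labeled training set. Scikit-learn's LogisticRegression is used here as a stand-in for the single-layer linear classifier, which is an assumption about the exact classifier implementation.

import torch
from sklearn.linear_model import LogisticRegression

@torch.no_grad()
def extract_features(encoder, series):
    # Frozen representations from the unsupervised-trained encoder.
    encoder.eval()
    return encoder(series).cpu().numpy()

def linear_evaluation(encoder, x_train, y_train, x_test, y_test):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(extract_features(encoder, x_train), y_train)
    return clf.score(extract_features(encoder, x_test), y_test)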

3.1. Implementation

Sample Generation: Time-series augmentation can be divided into magnitude-based and time-based methods. In this study, we used the time-series augmentation sets $t$ and $t'$, which comprise magnitude-based magnitude-warping and scaling methods and time-based time-slicing and time-warping methods [35,36].
The time-series subsampling strategy is based on the literature [23]. We randomly extracted a part of the samples by selecting the length and starting point. We selected different lengths and starting points for each epoch and trained them with various lengths of subsamples to learn a sufficient inter-temporal modality representation.
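A minimal sketch of this subsampling step (lines 6-7 of Algorithm 1) is shown below; the minimum-length ratio is an illustrative choice.

import torch

def subsample(v, min_ratio=0.1):
    # Random contiguous subseries of v (shape (B, C, T)); the length and the
    # starting point are drawn anew at every call.
    T = v.size(-1)
    min_len = max(1, int(min_ratio * T))
    length = torch.randint(min_len, T + 1, (1,)).item()
    start = torch.randint(0, T - length + 1, (1,)).item()
    return v[..., start:start + length]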
Encoder Selection: A time-series encoder must respect temporal order, accommodate inputs of unequal lengths, and be efficient in terms of both time and memory. Note that deep convolutional neural networks (CNNs) do not consider temporal information and are difficult to apply to data of various lengths. Long short-term memory (LSTM) is inefficient in terms of time and memory. Thus, we used exponentially dilated causal convolutions to handle these issues [23,29,30].
To verify the conformity of our encoder selection, we measured the classification performance on the UCR datasets using dilated causal convolutions, ResNet, and a two-layer LSTM encoder. Each model outperformed the other two on 65%, 35%, and 5% of the first 20 UCR datasets, respectively. This result confirmed that the encoder with dilated causal convolution was the most suitable for the proposed BIMO method. The accuracy results are detailed in Table 1.

3.2. Univariate Time Series

We validated the proposed BIMO method’s performance using the 85 initially released UCR datasets, which are representative univariate time-series datasets [25]. (1) We compared the BIMO method’s performance with that of existing SOTA unsupervised models, (2) we compared it with existing SOTA supervised models, and (3) we compared performance depending on the combination of auxiliary networks.
Overall Performance: In terms of performance, we compared the proposed BIMO method with unsupervised models for time series, i.e., USRL (which utilizes triplet loss) [23], DTW (which employs a kernel-based estimation method) [37], and RWS (which uses a similarity matrix) [38], as shown in Table 2.
We also compared BIMO with supervised models, i.e., PF (which uses a decision tree ensemble) [39], BOSS (which employs a dictionary-based classifier) [5], InceptionTime (ITime) [7], and HIVE-COTE (which uses ensemble methods) [8]. As shown in Figure 2, we compared performance based on the average rank according to the accuracy results on the UCR datasets. All accuracy results are detailed in Table 2.
For the unsupervised models, the proposed BIMO method obtained the best rank scores: 3.71, 3.91, and 6.11 for BIMO, USRL, and DTW, respectively. For the supervised models, BIMO showed the third-highest score: 2.41, 2.52, 3.71, 3.73, and 3.91 for HIVE-COTE, ITime, BIMO, BOSS, and PF, respectively. These results demonstrate that BIMO is superior to existing SOTA unsupervised models and comparable to well-known supervised models.
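For reference, average-rank scores of this kind can be computed from a per-dataset accuracy table with a routine such as the following sketch (assuming SciPy; ties share the average of their ranks, and the accuracy table itself is a hypothetical input).

import numpy as np
from scipy.stats import rankdata

def average_ranks(accuracy, model_names):
    # accuracy: (n_datasets, n_models) array; rank 1 is the best model per dataset.
    ranks = np.vstack([rankdata(-row) for row in accuracy])
    return dict(zip(model_names, ranks.mean(axis=0)))

# e.g. average_ranks(acc_table, ["BIMO", "USRL", "DTW"]) returns each model's mean rank.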
Inter–Intra Modality Representation Ablation: We compared performance depending on the combination of auxiliary networks based on the average rank according to the accuracy results on the UCR datasets. We used a single auxiliary network, e.g., an inter-auxiliary or intra-auxiliary network, and multiple auxiliary networks, e.g., inter-auxiliary and intra-auxiliary networks. As shown in Table 3, we compared the performance in terms of the average rank score. More detailed overall accuracy results are shown in Table 4.
Given multiple auxiliary networks, we employed static and dynamic loss functions. During training, the static loss function kept an equal ratio of the inter-auxiliary and intra-auxiliary networks (Inter and Intra). The dynamic loss function changed the ratio of the inter- and intra-auxiliary networks every epoch. In one variant, the main network was initially trained with the inter-auxiliary network at a higher ratio than the intra-auxiliary network, and the ratio of the intra-auxiliary network was then increased gradually (Inter ↦ Intra). In the other variant, the main network was first trained with the intra-auxiliary network at a higher ratio than the inter-auxiliary network, and the ratio of the inter-auxiliary network was gradually increased (Intra ↦ Inter), which is the training method of BIMO.
As shown in Table 3, the Intra ↦ Inter method obtained the best rank score. We confirmed that the initial training phase sufficiently learns the intra-modality representations, which are relatively low-level features, before the inter-modality representations, which are relatively high-level features. The proposed dynamic training method allows the main network to learn both modality representations evenly.
Representation Metric Space: We also validated the performance of representation learning for some UCR datasets using embedding visualization with dimensionality reduction. The results are shown in Figure 3.
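A routine such as the following can produce this kind of visualization; t-SNE is assumed here as the dimensionality-reduction method, since the specific method is not stated, and matplotlib and scikit-learn are used.

import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(features, labels, title):
    # Project learned representations to 2-D and color the points by class label.
    emb = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=10)
    plt.title(title)
    plt.show()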

3.3. Multivariate Time Series

We validated the performance of BIMO for UEA datasets. Here, we compared the performance of BIMO with USRL and DTW. The accuracy results are shown in Table 5. The BIMO, USRL, and DTW models, respectively, showed the best accuracies for approximately 50%, 32%, and 18% of the datasets. Overall, BIMO’s performance is comparable to that of SOTA unsupervised models for multivariate time series.

3.4. Robustness to Noisy Data

Most real-world time-series data contain some noise. In particular, the photoplethysmogram (PPG) signal, which is also referred to as the blood volume pulse, contains considerable noise. A PPG signal is simple and highly useful in daily life since it can be easily measured from the wrist. However, it is difficult to use in an end-to-end deep learning model because it is susceptible to many internal and external noise sources in the measurement environment [40,41]. Therefore, most existing PPG-based studies have focused on signal processing and feature engineering [4,31,42,43,44].
In this study, we validated the noise robustness of BIMO, which is an end-to-end deep learning model, using noisy PPG signals. We used the PPG signal from the WESAD dataset [31]. The WESAD dataset is labeled with four emotional states: baseline, stress, amusement, and meditation. We performed a binary classification task, stress versus non-stress, with leave-one-subject-out cross-validation, where non-stress is defined by combining the baseline and amusement states [31].
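The leave-one-subject-out protocol can be sketched as follows; fit_and_score is a hypothetical callback that trains a classifier on the frozen features of the training subjects and returns the accuracy on the held-out subject.

import numpy as np

def leave_one_subject_out(features, labels, subject_ids, fit_and_score):
    # For every subject: train on all other subjects, evaluate on the held-out one.
    scores = []
    for subject in np.unique(subject_ids):
        held_out = subject_ids == subject
        scores.append(fit_and_score(features[~held_out], labels[~held_out],
                                    features[held_out], labels[held_out]))
    return float(np.mean(scores))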
We compared the performance of BIMO with existing SOTA supervised learning models for PPG, namely a weak feature engineering method [31] and a strong feature engineering method named OMDP [4]. The weak feature engineering-based method uses a peak detection algorithm from which simple statistical features are computed. OMDP employs a two-step signal processing method in both the time and frequency domains and an ensemble-based peak detection method; it extracts diverse features from the detected peaks.
As a result, we found that BIMO outperformed the weak feature engineering-based supervised methods and performed comparably to the strong feature engineering-based SOTA models (Table 6). This is a meaningful result, since BIMO opens up the possibility that unsupervised, end-to-end, data-driven feature learning is also feasible for noisy time-series data.

4. Conclusions

We proposed BIMO, an unsupervised learning method that is applicable to sparsely labeled and unpredictable time-series data. BIMO learns general features by considering both inter-modality and intra-modality representations simultaneously. In the proposed BIMO method, two auxiliary networks are employed to train the main network, and different ratios of the two auxiliary networks are applied dynamically to learn both modalities efficiently. BIMO demonstrated superior representation learning performance compared to SOTA unsupervised models and comparable performance to well-known supervised models. In addition, we showed that BIMO is universal and robust to noisy data. The trained encoder of the main network can also be used in many different downstream tasks by fine-tuning the model with simple classifiers.

Author Contributions

Conceptualization, J.L. and S.H.; writing—original draft preparation, J.L. and S.H.; writing—review and editing, J.L. and S.K.; supervision, J.L.; funding acquisition, J.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation (NRF) grant (No. RS-2023-00212484) and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant (No. RS-2022-00167194) funded by the Korea government (MSIT).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Acknowledgments

This research was partially supported by the Kookmin University Industry–Academic Cooperation Foundation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bone, D.; Lee, C.C.; Chaspari, T.; Gibson, J.; Narayanan, S. Signal processing and machine learning for mental health research and clinical applications [perspectives]. IEEE Signal Process. Mag. 2017, 34, 195–196. [Google Scholar] [CrossRef]
  2. Costello, Z.; Martin, H.G. A machine learning approach to predict metabolic pathway dynamics from time-series multiomics data. NPJ Syst. Biol. Appl. 2018, 4, 19. [Google Scholar] [CrossRef] [PubMed]
  3. Parmezan, A.R.S.; Souza, V.M.; Batista, G.E. Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model. Inf. Sci. 2019, 484, 302–337. [Google Scholar] [CrossRef]
  4. Heo, S.; Kwon, S.; Lee, J. Stress Detection With Single PPG Sensor by Orchestrating Multiple Denoising and Peak-Detecting Methods. IEEE Access 2021, 9, 47777–47785. [Google Scholar] [CrossRef]
  5. Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585. [Google Scholar]
  6. Fawaz, H.I.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963. [Google Scholar] [CrossRef]
  7. Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.A.; Petitjean, F. Inceptiontime: Finding alexnet for time series classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962. [Google Scholar] [CrossRef]
  8. Dempster, A.; Petitjean, F.; Webb, G.I. ROCKET: Exceptionally fast and accurate time series classification using random convolutional kernels. Data Min. Knowl. Discov. 2020, 34, 1454–1495. [Google Scholar] [CrossRef]
  9. Kim, I.; Kim, D.; Kwon, S.; Lee, S.; Lee, J. Fall detection using biometric information based on multi-horizon forecasting. In Proceedings of the 2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 21–25 August 2022; pp. 1364–1370. [Google Scholar]
  10. Kim, I.; Lim, J.; Lee, J. Human Activity Recognition via Temporal Fusion Contrastive Learning. IEEE Access 2024, 12, 20854–20866. [Google Scholar] [CrossRef]
  11. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum contrast for unsupervised visual representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9729–9738. [Google Scholar]
  12. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  13. Chen, T.; Kornblith, S.; Swersky, K.; Norouzi, M.; Hinton, G. Big self-supervised models are strong semi-supervised learners. arXiv 2020, arXiv:2006.10029. [Google Scholar]
  14. Chen, X.; Fan, H.; Girshick, R.; He, K. Improved baselines with momentum contrastive learning. arXiv 2020, arXiv:2003.04297. [Google Scholar]
  15. Grill, J.B.; Strub, F.; Altché, F.; Tallec, C.; Richemond, P.H.; Buchatskaya, E.; Doersch, C.; Pires, B.A.; Guo, Z.D.; Azar, M.G.; et al. Bootstrap your own latent: A new approach to self-supervised learning. arXiv 2020, arXiv:2006.07733. [Google Scholar]
  16. Kim, D.; Yoo, Y.; Park, S.; Kim, J.; Lee, J. Selfreg: Self-supervised contrastive regularization for domain generalization. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 9619–9628. [Google Scholar]
  17. Kim, D.; Kim, J.; Lee, J. Inter-domain curriculum learning for domain generalization. ICT Express 2022, 8, 225–229. [Google Scholar] [CrossRef]
  18. Chen, X.; He, K. Exploring simple siamese representation learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 15750–15758. [Google Scholar]
  19. Tsay, R.S. Analysis of Financial Time Series; John Wiley & Sons: Hoboken, NJ, USA, 2005; Volume 543. [Google Scholar]
  20. Cowpertwait, P.S.; Metcalfe, A.V. Introductory Time Series with R; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
  21. Pascual, S.; Ravanelli, M.; Serra, J.; Bonafonte, A.; Bengio, Y. Learning problem-agnostic speech representations from multiple self-supervised tasks. arXiv 2019, arXiv:1904.03416. [Google Scholar]
  22. Sarkar, P.; Etemad, A. Self-supervised learning for ecg-based emotion recognition. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 3217–3221. [Google Scholar]
  23. Franceschi, J.Y.; Dieuleveut, A.; Jaggi, M. Unsupervised scalable representation learning for multivariate time series. arXiv 2019, arXiv:1901.10738. [Google Scholar]
  24. Schneider, S.; Baevski, A.; Collobert, R.; Auli, M. wav2vec: Unsupervised pre-training for speech recognition. arXiv 2019, arXiv:1904.05862. [Google Scholar]
  25. Dau, H.A.; Bagnall, A.; Kamgar, K.; Yeh, C.C.M.; Zhu, Y.; Gharghabi, S.; Ratanamahatana, C.A.; Keogh, E. The UCR time series archive. IEEE/CAA J. Autom. Sin. 2019, 6, 1293–1305. [Google Scholar] [CrossRef]
  26. Bagnall, A.; Dau, H.A.; Lines, J.; Flynn, M.; Large, J.; Bostrom, A.; Southam, P.; Keogh, E. The UEA multivariate time series classification archive, 2018. arXiv 2018, arXiv:1811.00075. [Google Scholar]
  27. Fan, H.; Zhang, F.; Gao, Y. Self-Supervised Time Series Representation Learning by Inter-Intra Relational Reasoning. arXiv 2020, arXiv:2011.13548. [Google Scholar]
  28. Bengio, Y. Learning Deep Architectures for AI; Now Publishers Inc.: Norwell, MA, USA, 2009. [Google Scholar]
  29. Oord, A.v.d.; Dieleman, S.; Zen, H.; Simonyan, K.; Vinyals, O.; Graves, A.; Kalchbrenner, N.; Senior, A.; Kavukcuoglu, K. Wavenet: A generative model for raw audio. arXiv 2016, arXiv:1609.03499. [Google Scholar]
  30. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  31. Schmidt, P.; Reiss, A.; Duerichen, R.; Marberger, C.; Van Laerhoven, K. Introducing wesad, a multimodal dataset for wearable stress and affect detection. In Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder CO, USA, 16–18 October 2018; pp. 400–408. [Google Scholar]
  32. Dosovitskiy, A.; Springenberg, J.T.; Riedmiller, M.; Brox, T. Discriminative unsupervised feature learning with convolutional neural networks. Adv. Neural Inf. Process. Syst. 2014, 27. [Google Scholar] [CrossRef] [PubMed]
  33. Doersch, C.; Gupta, A.; Efros, A.A. Unsupervised visual representation learning by context prediction. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1422–1430. [Google Scholar]
  34. Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2536–2544. [Google Scholar]
  35. Wen, Q.; Sun, L.; Yang, F.; Song, X.; Gao, J.; Wang, X.; Xu, H. Time series data augmentation for deep learning: A survey. arXiv 2020, arXiv:2002.12478. [Google Scholar]
  36. Um, T.T.; Pfister, F.M.; Pichler, D.; Endo, S.; Lang, M.; Hirche, S.; Fietzek, U.; Kulić, D. Data augmentation of wearable sensor data for parkinson’s disease monitoring using convolutional neural networks. In Proceedings of the 19th ACM International Conference on Multimodal Interaction, Glasgow, UK, 13–17 November 2017; pp. 216–220. [Google Scholar]
  37. Lei, Q.; Yi, J.; Vaculin, R.; Wu, L.; Dhillon, I.S. Similarity Preserving Representation Learning for Time Series Clustering. arXiv 2017, arXiv:1702.03584. [Google Scholar]
  38. Wu, L.; Yen, I.E.H.; Yi, J.; Xu, F.; Lei, Q.; Witbrock, M. Random warping series: A random features method for time-series embedding. In Proceedings of the International Conference on Artificial Intelligence and Statistics, PMLR, Lanzarote, Spain, 9–11 April 2018; pp. 793–802. [Google Scholar]
  39. Lucas, B.; Shifaz, A.; Pelletier, C.; O’Neill, L.; Zaidi, N.; Goethals, B.; Petitjean, F.; Webb, G.I. Proximity forest: An effective and scalable distance-based classifier for time series. Data Min. Knowl. Discov. 2019, 33, 607–635. [Google Scholar] [CrossRef]
  40. Lee, Y.K.; Kwon, O.W.; Shin, H.S.; Jo, J.; Lee, Y. Noise reduction of PPG signals using a particle filter for robust emotion recognition. In Proceedings of the 2011 IEEE International Conference on Consumer Electronics-Berlin (ICCE-Berlin), Berlin, Germany, 6–8 September 2011; pp. 202–205. [Google Scholar]
  41. Liang, Y.; Elgendi, M.; Chen, Z.; Ward, R. An optimal filter for short photoplethysmogram signals. Sci. Data 2018, 5, 180076. [Google Scholar] [CrossRef]
  42. Hanyu, S.; Xiaohui, C. Motion artifact detection and reduction in PPG signals based on statistics analysis. In Proceedings of the 2017 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017; pp. 3114–3119. [Google Scholar]
  43. Sadhukhan, D.; Pal, S.; Mitra, M. PPG Noise Reduction based on Adaptive Frequency Suppression using Discrete Fourier Transform for Portable Home Monitoring Applications. In Proceedings of the 2018 15th IEEE India Council International Conference (INDICON), Coimbatore, India, 16–18 December 2018; pp. 1–6. [Google Scholar]
  44. Pollreisz, D.; TaheriNejad, N. Detection and removal of motion artifacts in PPG signals. Mob. Netw. Appl. 2019, 27, 728–738. [Google Scholar] [CrossRef]
Figure 1. BIMO’s architecture: f, g, and q represent the encoder, projector, and predictor, respectively.
Figure 2. Average rank diagram of BIMO, existing SOTA unsupervised models (USRL, DTW), and supervised models (PF, BOSS, HIVE-COTE, ITime) for the UCR datasets. The average rank is the mean rank of a model across the datasets. The black lines indicate unsupervised models, and the dotted lines represent supervised models.
Figure 3. Visualization of embedded vectors of the ECG500 and UWaveGestureLibrary UCR test datasets with dimensionality reduction. Each class, marked with a different shape and color, is well separated.
Table 1. Accuracy scores depending on encoder type for the first 20 UCR datasets. Encoder types include dilated convolution (DConv.), ResNet, and LSTM. Bold text represents the best accuracy.
Dataset | DConv. (BIMO) | ResNet | LSTM
Adiac | 0.760 | 0.482 | 0.342
ArrowHead | 0.814 | 0.763 | 0.388
Beef | 0.800 | 0.625 | 0.313
BeetleFly | 0.850 | 0.688 | 0.750
BirdChicken | 0.900 | 0.750 | 0.563
Car | 0.917 | 0.688 | 0.417
CBF | 0.998 | 0.992 | 0.401
ChlorineConcentration | 0.635 | 0.731 | 0.534
CinCECGTorso | 0.757 | 0.629 | 0.283
Coffee | 1.000 | 1.000 | 0.625
Computers | 0.681 | 0.729 | 0.571
CricketX | 0.750 | 0.651 | 0.107
CricketY | 0.716 | 0.628 | 0.216
CricketZ | 0.758 | 0.378 | 0.102
DiatomSizeReduction | 0.977 | 0.911 | 0.336
DistalPhalanxOutlineAgeGroup | 0.743 | 0.820 | 0.523
DistalPhalanxOutlineCorrect | 0.786 | 0.809 | 0.581
DistalPhalanxTW | 0.684 | 0.688 | 0.422
Earthquakes | 0.765 | 0.727 | 0.767
ECG200 | 0.900 | 0.906 | 0.698
Table 2. Accuracy scores of BIMO, SOTA unsupervised models (USRL, and DTW), and supervised models (BOSS, PF, ResNet, HIVE-COTE and ITime). Bold text represents the best accuracy among the unsupervised models; * denotes the best accuracy, while underlined text represents the second-best accuracy among all models.
Dataset | BIMO | USRL | DTW | BOSS | PF | HIVE-COTE | ITime
(Unsupervised: BIMO, USRL, DTW; Supervised: BOSS, PF, HIVE-COTE, ITime)
Adiac0.7600.7160.6040.7650.7340.8110.836 *
ArrowHead0.8140.8290.7030.8340.875 *0.8630.829
Beef0.8000.7000.6330.8000.7200.933 *0.700
BeetleFly0.8500.9000.7000.9000.8750.950 *0.850
BirdChicken0.9000.8000.7500.950 *0.8650.8670.950 *
Car0.917 *0.8170.7330.8330.8470.8670.900
CBF0.9980.9940.9970.9980.9930.999 *0.998
ChlCon0.6350.7820.6480.6610.6340.7120.875 *
CinCECGTorso0.7570.7400.6510.8870.9340.996 *0.851
Coffee1.000 *1.000 *1.000 *1.000 *1.000 *1.000 *1.000 *
Computers0.6810.6280.7000.7560.6440.7600.812 *
CricketX0.7500.7770.7540.7360.8020.8230.867 *
CricketY0.7160.7670.7440.7540.7940.8490.851 *
CricketZ0.7580.7640.7540.7460.8010.8310.859 *
DiaSizRed0.9770.993 *0.9670.9310.9660.9410.931
DisPhaOutAgeGroup0.7430.7340.770 *0.7480.7310.7630.727
DisPhaxOutCorrect0.7860.7680.7170.7280.7930.7720.794 *
DistalPhalanxTW0.684 *0.6760.5900.6760.6600.6830.676
Earthquakes0.765 *0.7480.7190.7480.7540.7480.741
ECG2000.9000.9000.7700.8700.9090.8500.910 *
ECG50000.9400.9360.9240.9410.9370.946 *0.941
ECGFiveDays1.000 *1.000 *0.7681.000 *0.8491.000 *1.000 *
ElectricDevices0.6320.7320.6020.799 *0.7060.7700.723
FaceAll0.8390.8020.8080.7820.894 *0.8030.804
FaceFour0.8410.8750.8301.000 *0.9740.9550.966
FacesUCR0.9480.9180.9050.9570.9460.9630.973 *
FiftyWords0.7830.7800.6900.7050.8310.8090.842 *
Fish0.9590.8800.8230.989*0.9350.989 *0.983
FordA0.8500.9350.5550.9300.8550.964 *0.948
FordB0.7140.8100.6200.7110.7150.8230.937 *
GunPoint1.000 *0.9930.9071.000 *0.9971.000 *1.000 *
Ham0.740 *0.6950.4670.6670.6600.6670.714
HandOutlines0.9240.9220.8810.9030.9210.9320.960 *
Haptics0.5100.4550.3770.4610.4450.5190.568 *
Herring0.703 *0.5780.5310.5470.5800.6880.703 *
InlineSkate0.3720.4470.3840.5160.542 *0.5000.486
InsWinbeatSound0.6300.6230.3550.5230.6190.655 *0.635
ItalyPowerDemand0.9630.9250.9500.9090.9670.9630.968 *
LarKitAppliances0.8660.8480.7950.7650.7820.8640.907 *
Lightning20.8830.918 *0.8690.8360.8660.8200.803
Lightning70.8190.7950.7260.6850.822 *0.7400.808
Mallat0.9560.964 *0.9340.9380.9580.9620.963
Meat1.000 *0.9500.9330.9000.9330.9330.950
MedicalImages0.7300.7840.7370.7180.7580.7780.799 *
MidPhaOutAgeGroup0.6180.656 *0.5000.5450.5620.5970.533
MidPhaOutCorrect0.8260.8140.6980.7800.836 *0.8320.835
MiddlePhalanxTW0.5660.610 *0.5060.5450.5290.5710.513
MoteStrain0.8710.8710.8350.8790.9020.933 *0.903
NonInvFetECGTho10.9230.9100.7900.8380.9060.9300.962 *
NonInvFetECGTho20.9290.9270.8650.9010.9400.9450.967 *
OliveOil0.964 *0.9000.8330.8670.8670.9000.867
OSULeaf0.7290.8310.5910.9550.8270.979 *0.934
PhaOutCorrect0.8010.8010.7280.7720.8240.8070.854 *
Phoneme0.2630.2890.2280.2650.3200.382 *0.335
Plane1.000 *0.9901.000 *1.000 *1.000 *1.000 *1.000 *
ProPhaOutAgeGroup0.863 *0.8540.8050.8340.8460.8590.854
ProPhaOutCorrect0.8780.8590.7840.8490.8730.8800.931 *
ProximalPhalanxTW0.8140.824 *0.7610.8000.7790.8150.776
RefrigerationDevices0.5240.5170.4640.4990.5320.557 *0.509
ScreenType0.4460.4130.3970.4640.4550.589 *0.576
ShapeletSim0.6940.8170.6501.000 *0.7761.000 *0.989
ShapesAll0.6670.8750.7680.9080.8860.9050.925 *
SmaKitAppliances0.7900.7150.6430.7250.7440.853 *0.779
SonAIBORobSur10.967 *0.8970.7250.6320.8460.7650.884
SonAIBORobSur20.8580.9340.8310.8590.8960.9280.953 *
StarLightCurves0.9700.9650.9070.9780.9810.982 *0.979
Strawberry0.9620.9460.9410.9760.9680.9700.984 *
SwedishLeaf0.9290.9310.7920.9220.9470.9540.971 *
Symbols0.9600.9650.9500.9670.9620.9740.982 *
SyntheticControl0.9000.9830.9930.9670.9950.997 *0.997 *
ToeSeg10.9170.9520.7720.9390.9250.982 *0.969
ToeSeg20.8910.8850.8380.962 *0.8620.9540.939
Trace1.000 *1.000 *1.000 *1.000 *1.000 *1.000 *1.000 *
TwoLeadECG0.9960.997 *0.9050.9810.9890.9960.996
TwoPatterns1.000 *1.000 *1.000 *0.9931.000 *1.000 *1.000 *
UWavGesLibAll0.9580.9410.8920.9390.972 *0.9680.955
UWavGesLibX0.8020.8110.7280.7620.8290.840 *0.825
UWavGesLibY0.7120.7350.6340.6850.7620.7650.769 *
UWavGesLibZ0.7420.7590.6580.6950.7640.783 *0.770
Wafer0.9960.9930.9800.9950.9960.999 *0.999 *
Wine0.8080.870 *0.5740.7410.5690.7780.667
WordSynonyms0.7010.7040.6490.6380.779 *0.7380.756
Worms0.6840.7140.5840.5580.7180.5580.805 *
WormsTwoClass0.842 *0.8180.6230.8310.7840.7790.792
Yoga0.8070.8780.8370.918 *0.8790.918 *0.906
Table 3. Average rank comparison depending on the combination of auxiliary networks: a single auxiliary network (Inter or Intra) and multiple auxiliary networks (Inter and Intra, Inter ↦ Intra, Intra ↦ Inter). Bold text represents the best rank score.
Single | Single | Plural | Plural | Plural
Inter | Intra | Inter and Intra | Inter ↦ Intra | Intra ↦ Inter
2.39 | 3.33 | 2.87 | 3.33 | 1.90
Table 4. Accuracy scores depending on the combination of auxiliary networks for the first and recent UCR datasets: using a single auxiliary network (Inter or Intra) and plural auxiliary networks (Inter and Intra, Inter ↦ Intra, Intra ↦ Inter). Bold text represents the best accuracy, and underlined text represents the second-best accuracy.
Dataset | Inter | Intra | Inter and Intra | Inter ↦ Intra | Intra ↦ Inter (BIMO)
(Single: Inter, Intra; Plural: Inter and Intra, Inter ↦ Intra, Intra ↦ Inter)
Adiac0.7780.6930.7290.6420.760
ArrowHead0.8310.7670.8260.7850.814
Beef0.7500.7860.7860.8210.800
BeetleFly0.8500.8500.9000.8500.850
BirdChicken0.8500.9000.8000.7970.900
Car0.9170.8500.8830.9830.916
CBF0.9900.9930.9960.9860.998
ChlCon0.6130.6270.5970.7330.635
CinCECGTorso0.7450.7370.7661.0000.757
Coffee1.0001.0001.0001.0001.000
Computers0.6330.6210.6810.6250.681
CricketX0.7710.6290.6830.6490.750
CricketY0.6960.5850.6860.6520.716
CricketZ0.7500.6260.7060.6700.758
DiaSizRed0.9610.9800.9640.9840.977
DisPhaOutAgeGroup0.7350.6990.7350.7210.786
DisPhaxOutCorrect0.7720.7720.7610.7500.743
DistalPhalanxTW0.7210.6690.6690.7130.684
Earthquakes0.7500.7350.7350.7060.765
ECG2000.8900.8900.8800.8900.900
ECG50000.9400.9400.9410.9390.940
ECGFiveDays0.9910.9980.9950.9971.000
ElectricDevices0.6060.5210.6250.5850.632
FaceAll0.8300.6800.7710.7010.839
FaceFour0.8530.8750.8410.8410.841
FacesUCR0.9470.9200.9220.9170.948
FiftyWords0.7740.7920.7850.7880.783
Fish0.9590.9010.9480.9130.959
FordA0.8670.9200.8700.9180.850
FordB0.7180.7880.7560.7750.714
GunPoint1.0000.9860.9930.9861.000
Ham1.0000.7600.7400.7120.740
HandOutlines0.9210.9160.9020.9130.924
Haptics0.9160.4940.5230.4870.510
Herring0.5940.6880.6250.7030.703
InlineSkate0.3520.3670.3670.3740.372
InsWinbeatSound0.6080.5980.6090.5970.630
ItalyPowerDemand0.9550.9540.9520.9630.963
LarKitAppliances0.8710.6210.8630.6590.866
Lightning20.7670.7830.7670.7170.883
Lightning70.7780.7780.7640.7500.819
Mallat0.8980.8290.9200.8750.956
Meat1.0000.9500.9830.9831.000
MedicalImages0.7330.7260.7300.7460.730
MidPhaOutAgeGroup0.5330.6580.6180.6050.826
MidPhaOutCorrect0.7990.7920.7990.8230.618
MiddlePhalanxTW0.5860.5590.5660.5330.566
MoteStrain0.8540.8510.8590.8510.871
NonInvFetECGTho10.9160.8910.9070.8940.923
NonInvFetECGTho20.9260.9050.9200.9070.929
OliveOil1.0000.9640.9640.9640.964
OSULeaf0.7170.6500.6960.6670.729
PhaOutCorrect0.7800.7830.7930.7700.801
Phoneme0.2490.2160.2750.2200.263
Plane1.0000.9900.9901.0001.000
ProPhaOutAgeGroup0.8480.8430.8140.8430.878
ProPhaOutCorrect0.8850.8820.8920.8650.863
ProximalPhalanxTW0.7890.8190.7840.7890.814
RefrigerationDevices0.5380.5560.5300.5590.524
ScreenType0.4600.4540.4570.4190.446
ShapeletSim0.5830.6280.6000.6390.694
ShapesAll0.6620.6470.6630.6470.667
SmaKitAppliances0.7550.7280.7960.7260.790
SonAIBORobSur10.9700.9420.9530.9600.967
SonAIBORobSur20.8530.8600.8430.8730.858
StarLightCurves0.9780.9630.9630.9550.970
Strawberry0.9620.9430.9650.9480.962
SwedishLeaf0.9290.9250.9180.9230.929
Symbols0.9290.9350.9450.9330.960
SyntheticControl0.8730.8500.8630.8730.900
ToeSeg10.9170.9170.9170.8860.917
ToeSeg20.8830.8830.8830.8830.891
Trace1.0001.0001.0001.0001.000
TwoLeadECG0.9960.9960.9950.9960.996
TwoPatterns1.0000.9971.0000.9961.000
UWavGesLibAll0.9540.9560.9510.9620.742
UWavGesLibX0.8040.7990.8110.8020.958
UWavGesLibY0.7260.6860.7170.6900.802
UWavGesLibZ0.7350.7190.7480.7340.712
Wafer0.9960.9910.9950.9910.996
Wine0.8270.7310.7500.7310.808
WordSynonyms0.6820.6900.7000.6620.701
Worms0.6710.6180.6580.5920.684
WormsTwoClass0.8030.7240.7630.7110.842
Yoga0.8070.8100.7960.7970.807
Table 5. Accuracy of BIMO and SOTA unsupervised methods (USRL, DTW) on UEA datasets. Bold text indicates the best accuracy.
Dataset | BIMO | USRL | DTW
ArticularyWordRecognition | 0.830 | 0.987 | 0.987
AtrialFibrillation | 0.417 | 0.133 | 0.200
BasicMotions | 1.000 | 1.000 | 0.975
Cricket | 0.861 | 0.986 | 1.000
DuckDuckGeese | 0.688 | 0.675 | 0.600
EigenWorms | 0.852 | 0.878 | 0.618
Epilepsy | 0.926 | 0.957 | 0.964
Ering | 0.922 | 0.133 | 0.133
EthanolConcentration | 0.354 | 0.236 | 0.323
FaceDetection | 0.550 | 0.528 | 0.529
FingerMovements | 0.550 | 0.540 | 0.530
HandMovementDirection | 0.444 | 0.270 | 0.231
Handwriting | 0.346 | 0.533 | 0.286
Heartbeat | 0.740 | 0.737 | 0.717
Libras | 0.650 | 0.867 | 0.870
LSST | 0.404 | 0.558 | 0.551
MotorImagery | 0.600 | 0.540 | 0.500
NATOPS | 0.872 | 0.944 | 0.883
PEMS-SF | 0.733 | 0.688 | 0.711
PenDigits | 0.975 | 0.983 | 0.977
Phoneme | 0.280 | 0.246 | 0.151
RacketSports | 0.737 | 0.862 | 0.803
SelfRegulationSCP1 | 0.853 | 0.846 | 0.775
SelfRegulationSCP2 | 0.550 | 0.556 | 0.539
StandWalkJump | 0.500 | 0.400 | 0.200
UWaveGestureLibrary | 0.819 | 0.884 | 0.903
Table 6. Comparison of accuracy and F1 scores of BIMO and existing models using a PPG signal in the WESAD dataset. Abbreviations: decision tree (DT), random forest (RF), Adaboost (AB), linear discriminant analysis (LDA), k-nearest neighbor (kNN), and feature engineering (FE).
Values are Accuracy (F1).
ML Algorithm | DT | RF | AB | LDA | kNN | BIMO (Ours)
weak FE | 0.78 (0.81) | 0.81 (0.84) | 0.81 (0.84) | 0.83 (0.86) | 0.79 (0.82) | 0.87 (0.85)
strong FE (OMDP) | 0.87 (0.81) | 0.91 (0.87) | 0.91 (0.87) | 0.97 (0.93) | 0.89 (0.89) | –