5.3. Decoder Selection and Evaluation
Our proposed split encoder FS-VAE model comprises two distinct encoders: a feedforward network and a convolutional network. The structure of the encoders is shown in Table 1. Both encoders transform the output of the functional input layer (see Equation (9)), the 192-dimensional numerical representation obtained by projecting the functional data onto the different basis functions described in Section 3.2, into separate latent spaces of dimensions 6 and 5, respectively. These outputs are combined into a joint latent space of 11 dimensions. A latent variable is sampled from this joint distribution using the reparameterization trick [7] and serves as the input to the decoder for data reconstruction.
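As a minimal sketch of this sampling step, assuming PyTorch: the 192/6/5/11 dimensions come from the text, while all hidden-layer widths and the class name `SplitEncoder` are illustrative placeholders (Table 1 specifies the actual architecture).

```python
import torch
import torch.nn as nn

class SplitEncoder(nn.Module):
    """Illustrative split encoder: a feedforward branch (6-d shared latent)
    and a convolutional branch (5-d private latent), joined into an 11-d
    latent distribution sampled via the reparameterization trick."""
    def __init__(self, in_dim=192, shared_dim=6, private_dim=5):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.ff_mu = nn.Linear(64, shared_dim)
        self.ff_logvar = nn.Linear(64, shared_dim)
        self.conv = nn.Sequential(nn.Conv1d(1, 8, kernel_size=5, padding=2),
                                  nn.ReLU(), nn.Flatten())
        self.cv_mu = nn.Linear(8 * in_dim, private_dim)
        self.cv_logvar = nn.Linear(8 * in_dim, private_dim)

    def forward(self, x):                 # x: (batch, 192) basis coefficients
        h_ff = self.ff(x)
        h_cv = self.conv(x.unsqueeze(1))  # treat the coefficients as a 1-D signal
        mu = torch.cat([self.ff_mu(h_ff), self.cv_mu(h_cv)], dim=-1)
        logvar = torch.cat([self.ff_logvar(h_ff), self.cv_logvar(h_cv)], dim=-1)
        eps = torch.randn_like(mu)        # reparameterization: z = mu + sigma * eps
        return mu + torch.exp(0.5 * logvar) * eps, mu, logvar

z, mu, logvar = SplitEncoder()(torch.randn(32, 192))  # z: (32, 11) joint latent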
The architecture of the encoders does not explicitly determine the structure of the decoder in the split encoder FS-VAE model. To identify an appropriate decoder architecture, we evaluate two candidates: a feedforward network and a convolutional network. The structures of the two decoder options are shown in
Table 2. We assess each candidate on training error decay and on reconstruction error, defined as the Euclidean norm of the difference between the matrices in Equations (9) and (13). According to the experimental results in Figure 4, the feedforward decoder demonstrates slower training error decay and higher reconstruction error than the convolutional decoder, which learns more efficiently and reconstructs more accurately. Based on these results, the convolutional network is identified as the more suitable decoder for our split encoder FS-VAE model.
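For concreteness, the reconstruction-error criterion reduces to the Frobenius norm of the residual between the two coefficient matrices; a one-function sketch in NumPy, with illustrative array names:

```python
import numpy as np

def reconstruction_error(coeffs_in: np.ndarray, coeffs_out: np.ndarray) -> float:
    """Euclidean (Frobenius) norm of the difference between the input
    coefficient matrix (Equation (9)) and its reconstruction (Equation (13))."""
    return float(np.linalg.norm(coeffs_in - coeffs_out))
```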
5.4. Classification Accuracy with Synthetic Dataset
This section investigates the effectiveness of the supervised learning component of FS-VAE, implemented by the softmax classifier. This multi-label classification problem is trained by minimizing the cross-entropy loss function in Equation (15). For this purpose, we consider a case with three segments (L = 3), as depicted in Figure 1. Each segment exhibits one of two behavior types: “impulsive” or “non-impulsive.” The softmax output consists of six nodes, representing the probabilities of the two temporal features within each of the three segments. The classifier receives its input from the 6-dimensional latent space generated by the feedforward encoder.
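A minimal sketch of this classifier head, assuming PyTorch (Table 3 gives the actual architecture; the class name and layer layout are illustrative): the 6-d shared latent is mapped to six logits, reshaped into L = 3 segment-wise pairs, and trained with the cross-entropy objective of Equation (15).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentClassifier(nn.Module):
    def __init__(self, latent_dim=6, n_segments=3, n_classes=2):
        super().__init__()
        self.n_segments, self.n_classes = n_segments, n_classes
        self.head = nn.Linear(latent_dim, n_segments * n_classes)

    def forward(self, z):
        # (batch, L, 2) logits; a softmax over the last dim yields the
        # per-segment probabilities of the two temporal features.
        return self.head(z).view(-1, self.n_segments, self.n_classes)

def classification_loss(logits, labels):
    # labels: (batch, L) integer indices (impulsive vs. non-impulsive);
    # cross_entropy applies the log-softmax internally.
    return F.cross_entropy(logits.transpose(1, 2), labels)

logits = SegmentClassifier()(torch.randn(32, 6))
loss = classification_loss(logits, torch.randint(0, 2, (32, 3)))
```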
The architecture of the classifier can be found in Table 3. The effectiveness of our model is evaluated using the F1 score. Our findings, shown in Figure 5, indicate that the F1 scores remain steady across different combinations of envelope functions and impulsive function profiles. The fluctuations in the F1 score are within . The cases with relatively low F1 scores arise from impulsive function profiles that, under Fourier analysis, have spectral content similar to that of non-impulsive functions. This suggests that coupling the softmax classifier with the feedforward encoder stably encodes the salient features and underlying structures of the data into a lower-dimensional shared latent subspace.
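The F1 evaluation itself is standard; a sketch with scikit-learn, using dummy arrays in place of the actual per-segment labels and predictions:

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(200, 3))   # (samples, L = 3 segments)
y_pred = rng.integers(0, 2, size=(200, 3))   # classifier outputs, argmax per segment

# One binary F1 score per segment, as reported in Figure 5.
per_segment_f1 = [f1_score(y_true[:, s], y_pred[:, s]) for s in range(3)]
```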
In addition, we performed an in-depth comparative analysis against standard time series classification methods with vectorial and functional data inputs. The results are shown in Table 4, and we observe a variety of performance patterns.
For the decision tree algorithm, we find similar performance across the functional and direct (discrete) approaches, with a slightly better outcome using functional data. This algorithm’s AUC score is for functional data and for vectorial data, suggesting that decision trees may benefit slightly from the dimensionality reduction offered by the functional approach, although they can also perform satisfactorily in higher-dimensional spaces.
The support vector machine (SVM) shows a small performance advantage with the vectorial data approach. Given that SVMs are inherently designed to handle high-dimensional data and capture complex decision boundaries, they may not benefit significantly from the functional approach’s dimensionality reduction. In fact, the vectorial approach outperforms the functional approach by about , with AUC scores of and , respectively.
In contrast, the k-nearest neighbors (kNN), naive Bayes (NB), and multi-layer perceptron (MLP) algorithms demonstrate enhanced performance with the functional approach. These algorithms typically perform better in lower-dimensional spaces, with AUC scores of for kNN, for NB, and for MLP with functional data, versus , , and , respectively, with discrete data.
The versatile random forest (RF) algorithm shows a slight improvement with the functional approach compared to the direct one. Despite its capability to handle high-dimensional data and complex relationships, it achieves an AUC of with functional data, compared to with vectorial data. This indicates that robust and adaptable algorithms such as random forest can also benefit from the more concise data representation offered by functional methods.
Our technique, FS-VAE, performs softmax classification on the split encoder variational autoencoder’s latent subspace and shows remarkable performance improvements. For the decision tree, FS-VAE increases the AUC to , a gain of over the functional approach and over the vectorial approach. Even for the SVM, where the vectorial data approach performs exceptionally well, FS-VAE surpasses it with an AUC of , a performance gain of about . For algorithms such as kNN and NB, which already improve significantly with the functional approach, FS-VAE drives the AUCs up to around , performance gains of 10% and 8%, respectively. Finally, for the random forest algorithm, which shows a slight preference for the functional approach, FS-VAE pushes the AUC to , a performance gain of .
On average, FS-VAE improves performance by about over the best method in the vectorial data category (random forests), and by about over the best method in the functional data category (random forests). Thus, the FS-VAE approach, which combines functional data analysis and variational autoencoders, provides a compelling strategy that could outperform both the functional and direct approaches in time series classification tasks.
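The baseline protocol can be sketched as follows, assuming scikit-learn; `X` stands for either the vectorial series or its functional (basis-coefficient) representation, and the dummy data, split, and hyperparameters are illustrative rather than the exact settings behind Table 4:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier

CLASSIFIERS = {
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "kNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "MLP": MLPClassifier(max_iter=500, random_state=0),
    "RF": RandomForestClassifier(random_state=0),
}

def auc_table(X, y):
    """Train each baseline and score it by AUC on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    return {name: roc_auc_score(y_te, clf.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
            for name, clf in CLASSIFIERS.items()}

rng = np.random.default_rng(0)
X_fun = rng.normal(size=(300, 192))   # stand-in functional coefficients
y = rng.integers(0, 2, size=300)      # stand-in binary labels
print(auc_table(X_fun, y))
```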
5.6. Latent Subspace Alignment
In our study, we implement a domain adaptation strategy using a variational autoencoder with a split encoder and a single decoder for both the source and target domain branches. The split encoder produces a shared subspace and a private subspace. The shared subspace is connected to a softmax classifier to enforce the factorization of the latent space. The shared subspaces of the two branches are connected by the latent space domain adaptation method, facilitating the transfer of knowledge from the source to the target domain.
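The alignment term is specified by the latent space domain adaptation method referenced above. Purely as an illustration of how such a penalty can couple the two shared subspaces, the sketch below uses a maximum mean discrepancy (MMD) loss; this choice is an assumption on our part, not necessarily the method used here.

```python
import torch

def rbf_mmd(z_src, z_tgt, sigma=1.0):
    """Biased estimate of the squared MMD between two batches of shared
    latent codes, with an RBF kernel of bandwidth sigma (illustrative)."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return k(z_src, z_src).mean() + k(z_tgt, z_tgt).mean() - 2 * k(z_src, z_tgt).mean()

# 6-d shared latents from the source and target branches (dummy batches).
alignment_loss = rbf_mmd(torch.randn(32, 6), torch.randn(32, 6))
```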
The source domain branch is initially trained with labeled synthetic data for 100 epochs. Upon convergence, with a final training loss of 0.05, its network weights are used as initial values for the network weights of the target domain branch. This transfer learning trick, which provides an educated guess of the initial network parameters of the target domain branch in Figure 2, facilitates domain adaptation training with a modestly sized real-world dataset. Both branches are then trained together for another 50 epochs with the domain adaptation method connecting the shared subspaces, achieving a final training loss of 0.03. During the domain adaptation training process, we encounter a situation known as “latent space collapse” [28], where the diversity of the latent features is suppressed. This issue is mitigated by incorporating batch normalization [29] into the training process, which helps to maintain the diversity of the latent space and improves the model’s performance on the target domain data.
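The warm start and the batch-normalization fix can be sketched as follows, assuming PyTorch; the stand-in branch models and the placement of the normalization layer are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the source and target encoder branches.
source_branch = nn.Sequential(nn.Linear(192, 64), nn.ReLU(), nn.Linear(64, 11))
target_branch = nn.Sequential(nn.Linear(192, 64), nn.ReLU(), nn.Linear(64, 11))

# Transfer learning trick: the converged source weights seed the target branch.
target_branch.load_state_dict(source_branch.state_dict())

# Batch normalization of the latent codes keeps their per-dimension variance
# away from zero, mitigating latent space collapse during joint training.
latent_bn = nn.BatchNorm1d(11)
z = latent_bn(target_branch(torch.randn(32, 192)))  # dummy batch of coefficients
```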
The results of domain adaptation training with and without batch normalization are studied on a labeled target domain dataset comprising 30 samples. The shared subspace of the trained target domain branch is connected to the softmax classifier, and the classification evaluation metrics are tabulated in Table 5.
In Table 5 we can see that the inclusion of batch normalization significantly improves all the metrics. The accuracy increases from to , precision from to , recall from to , and F1 score from to . The error metrics also decrease, with the mean absolute error reducing from 0.45 to 0.20 and the root mean squared error from 0.60 to 0.35. These improvements demonstrate the effectiveness of the domain adaptation method, achieving an accuracy of about .
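The metrics in Table 5 are standard and can be computed with scikit-learn; a sketch with placeholder labels standing in for the 30-sample target set:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, mean_absolute_error,
                             mean_squared_error, precision_score, recall_score)

def target_domain_metrics(y_true, y_pred):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "mae": mean_absolute_error(y_true, y_pred),
        "rmse": float(np.sqrt(mean_squared_error(y_true, y_pred))),
    }

print(target_domain_metrics([0, 1, 1, 0, 1, 0], [0, 1, 0, 0, 1, 1]))  # dummy labels
```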
5.7. Clustering in Private Latent Subspace and Prognostic Implications
This section delineates the operational principles of the FS-VAE private subspace model, specifically its ability to extract clustering correlations from the input functional data. As shown in Figure 1, the model projects the input data into separate subspaces. A portion of this projection is handled by the 1D convolutional neural network (1DCNN) encoder, which maps the data into the private subspace. This private subspace is then leveraged for clustering analysis, capturing and exploring the variations within the functional dataset. The successful factorization into subspaces is evidenced by the low reconstruction error of the overall variational autoencoder (VAE). This can be visually confirmed in Figure 4, which shows a comparative representation of the original and reconstructed data, illustrating the effectiveness of our approach.
The neural network responsible for generating the clusters is trained concurrently with the procedures for classification and domain adaptation as previously described. This simultaneous training approach ensures that the network’s learning is harmonized across these different tasks, thus leading to more cohesive and effective performance.
Our study aims to evaluate the performance of various dimensionality reduction techniques on time series data. We primarily focus on the convolutional variational autoencoder (CVAE) [30] and principal component analysis (PCA) [31], specifically for vectorial time series data, in addition to the FS-VAE private latent subspace model. These models are examined based on their capacity to extract meaningful clusters from the time series data. To quantify the quality of these clusters, we employ measures such as accuracy, mutual information, and purity metrics, following the standards defined in the well-established literature [32,33,34].
The initial stages involve the pre-processing and transformation of our time series data into a lower-dimensional representation. Here, we note a significant distinction between the techniques. CVAE and PCA, which are designed to handle discrete data, require larger network sizes of approximately 5394 and 5000 parameters, respectively. In contrast, the FS-VAE model, designed for functional data representation, leads to a considerably smaller network of around 2200 parameters.
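Such parameter counts can be obtained with the usual one-liner, assuming PyTorch models:

```python
import torch.nn as nn

def n_params(model: nn.Module) -> int:
    """Number of trainable parameters, used to compare model sizes."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```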
After transforming the data into a lower-dimensional representation with the respective techniques, we subject them to the K-means clustering algorithm as implemented in the scikit-learn library [
35]. Our comparative analysis of the three dimensionality reduction techniques, considering their computational efficiency and the quality of the resultant clusters, is shown in
Table 6. The clustering accuracy improves by between and . Notably, FS-VAE achieves a computational speedup of about compared to CVAE, and of around compared to PCA.
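A sketch of the clustering evaluation, assuming scikit-learn and SciPy; `Z` stands for the low-dimensional codes from any of the three techniques, and the dummy blobs only make the snippet runnable. Clustering accuracy is computed here via an optimal one-to-one matching of clusters to labels (the Hungarian algorithm), one common convention in the cited literature:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import normalized_mutual_info_score
from sklearn.metrics.cluster import contingency_matrix

def cluster_scores(Z, y, k):
    pred = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(Z)
    cm = contingency_matrix(y, pred)                 # rows: labels, cols: clusters
    rows, cols = linear_sum_assignment(-cm)          # best cluster-label matching
    return {
        "accuracy": cm[rows, cols].sum() / cm.sum(),
        "purity": cm.max(axis=0).sum() / cm.sum(),
        "mutual_info": normalized_mutual_info_score(y, pred),
    }

Z, y = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)  # dummy 5-d codes
print(cluster_scores(Z, y, k=3))
```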
Figure 7 shows t-SNE plots of the clustering results of FS-VAE and PCA. By inspection, the FS-VAE model’s clusters are more distinct and better separated, and the t-SNE plot thus provides additional visual evidence of the FS-VAE model’s improved clustering performance.
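The visualization itself is a standard t-SNE embedding of the latent codes; a sketch with scikit-learn and matplotlib, using dummy stand-ins for the private-subspace codes and cluster labels:

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 5))            # stand-in 5-d private latent codes
labels = rng.integers(0, 3, size=300)    # stand-in cluster assignments

emb = TSNE(n_components=2, random_state=0).fit_transform(Z)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=8, cmap="tab10")
plt.title("t-SNE of the private latent subspace")
plt.show()
```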
We can calculate the mean intra-cluster distance (MICD) [35], a measure of the average distance between data points within a cluster in the private latent subspace, as an indicator of changes in the underlying signal generation process. For a given cluster C, its MICD can be computed as follows:

$$\mathrm{MICD}(C) = \frac{2}{n_C (n_C - 1)} \sum_{\substack{x_i, x_j \in C \\ i < j}} d(x_i, x_j),$$

where $n_C$ is the number of points in cluster $C$ and $d(x_i, x_j)$ is the Euclidean distance between a distinct pair of points $x_i$ and $x_j$ in the latent space. By tracking changes in MICD, we can detect alterations in the underlying signal generation process, which manifest as variations in the data distribution within the latent space. An increase in MICD indicates that the average distance between points within a cluster is growing, which can be linked to the signal generating process becoming more random; such randomness is often associated with a degradation in reliability.
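Equivalently, MICD is the mean of all pairwise Euclidean distances within a cluster, which SciPy computes directly; a minimal sketch:

```python
import numpy as np
from scipy.spatial.distance import pdist

def micd(cluster_points: np.ndarray) -> float:
    """Mean intra-cluster distance: average of the n_C(n_C - 1)/2 pairwise
    Euclidean distances among the points (n_C x dim) of one cluster."""
    return float(pdist(cluster_points, metric="euclidean").mean())

pts = np.random.default_rng(0).normal(size=(50, 5))  # dummy 5-d private-subspace cluster
print(micd(pts))
```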
Figure 8 showcases MICD plots for a series of datasets spanning three distinct time periods, each one month apart. In addition, it includes a magnitude-shape (MS) plot [36] for comparison with functional data analysis. Both the MICD and MS plots exhibit a positive trend over the observed time period. This upward trend potentially indicates a deterioration in the robustness of the underlying signal generating process, hinting at possible degradation [37]. The consistency between the two plots demonstrates the utility of the private subspace in revealing trends in sensor data variations, providing valuable insight into the stability of the underlying signal generation process. This approach complements the temporal feature-based classification in the shared latent subspace for anomaly detection.