Article

ChronoVectors: Mapping Moments through Enhanced Temporal Representation

Niswonger Aviation Technology Building, Purdue University, 1401 Aviation Drive, West Lafayette, IN 47907, USA
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2651; https://doi.org/10.3390/math12172651
Submission received: 22 July 2024 / Revised: 19 August 2024 / Accepted: 22 August 2024 / Published: 26 August 2024
(This article belongs to the Special Issue Statistical Modeling and Data-Driven Methods in Aviation Systems)

Abstract

Time-series data are prevalent across various fields and present unique challenges for deep learning models due to irregular time intervals and missing records, which hinder the ability to capture temporal information effectively. This study proposes ChronoVectors, a novel temporal representation method that addresses these challenges by enabling a more specialized encoding of temporal relationships through the use of learnable parameters tailored to the dataset’s dynamics while maintaining consistent time intervals post-scaling. The theoretical demonstration shows that ChronoVectors allow the transformed encoding tensors to map moments in time to continuous spaces, accommodating potentially infinite extensions of the sequence and preserving temporal consistency. Experimental validation using the Parking Birmingham and Metro Interstate Traffic Volume datasets reveals that ChronoVectors enhanced the predictive capabilities of deep learning models by reducing prediction error for regression tasks compared to conventional time representations, such as vanilla timestamp encoding and Time2Vec. These findings underscore the potential of ChronoVectors in handling irregular time-series data and showcase its ability to improve deep learning model performance in understanding temporal dynamics.

1. Introduction

Time-series data are prevalent across numerous fields, ranging from finance and transportation to weather forecasting and healthcare [1]. They comprise sequences of data points, with either a single variable or multiple attributes, collected at periodic or variable time intervals. Irregularity in time intervals, along with issues such as outliers and missing values, presents unique impediments to accurate data analysis and modeling [2]. Temporal dependencies and patterns within the data may be obscured by these irregularities and biases, leading to suboptimal model performance.
Concurrently, various deep learning models have emerged for processing time-series or sequential data, such as vanilla Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU) networks. However, these RNN-based models predominantly assume that the input data are recorded at fixed intervals. Similarly, transformer models [3], which have revolutionized language processing tasks, also struggle with discontinuous sequences due to their inherent architecture designed for handling continuous text data. Discontinuous timestamps or missing records at certain timestamps can hinder the model’s ability to effectively capture temporal information, thereby significantly impacting model performance [4,5]. Irregular time intervals pose significant difficulties because they can disrupt the chronological order and temporal dependencies within the data, while outliers can introduce biases and reduce model accuracy. Techniques such as interpolation, imputation, or normalization have been explored to address these gaps, though they come with their own set of limitations and potential biases [6]. Therefore, developing robust methods to handle these irregularities is crucial for improving model accuracy.
Effectively handling time-series data in deep learning models presents several challenges. Simply using the timestamp as a feature may prove inadequate and could introduce issues. For instance, the arbitrary nature of the reference point for time zero may prevent the model from accurately capturing temporal information, particularly when the time-series data are shifted. Moreover, if a scaler such as MinMaxScaler is applied to the timestamp during training, the model may struggle to interpret new timestamps that fall outside the training data range. The scaled interval spacing of timestamps can vary across datasets, potentially resulting in data that lack a clear chronological order. Therefore, it is imperative to explore new methodologies for managing time-series data in deep learning models to enhance the capture of temporal information and facilitate the effective manipulation of data sequences.
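The scaling pitfall described above can be illustrated with a minimal sketch (plain NumPy standing in for a library scaler such as MinMaxScaler; the timestamps are hypothetical):

```python
import numpy as np

# Hypothetical training timestamps: Unix seconds sampled hourly over one day
train_ts = np.arange(0, 86400, 3600, dtype=float)

# Min-max scaling fitted on the training range only
lo, hi = train_ts.min(), train_ts.max()
scale = lambda t: (t - lo) / (hi - lo)

print(scale(train_ts[-1]))  # 1.0 -- upper edge of the training range
print(scale(2 * 86400.0))   # ~2.09 -- a later, unseen timestamp leaves [0, 1]
```

Any timestamp later than the training window maps outside [0, 1], so at inference time the model receives time inputs in a range it never saw during training.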

2. Related Work

Several studies, such as those by Cao et al. [7] and Tang et al. [8], address the issue of missing values by employing imputation methods, such as linear interpolation, or by treating it as a prediction task to fill in the missing values, thereby rendering the time-series data continuous. For instance, Liu et al. [9] attempted to fuse different data sources to mitigate the impact of missing data in traffic flow prediction tasks.
Other research endeavors have focused on modifying machine learning models or neural network architectures to enhance their performance on time-series data. Che et al. [5] introduced a novel GRU network with trainable decays, termed GRU-D, which leverages missing data patterns to achieve superior prediction performance. Chen et al. [10] parameterized the derivative of the hidden state by integrating ordinary differential equations (ODE) with neural network models. This approach, known as Neural ODE, enables models to handle data that arrive at irregular time intervals. Studies such as Zhang et al. [11] and Zhang and Mott [12] have demonstrated the potential of utilizing the Toeplitz inverse covariance-based clustering method [13] and the Adaboost algorithm [14] to classify flight trajectory data, which are non-continuous time-series data with multiple distinct features (multivariate), in both supervised and unsupervised manners.
In addition, preprocessing time information can enhance its compatibility with various models. Li and Marlin [15] addressed the challenges of classifying sparse and irregularly sampled time-series data by developing an uncertainty-aware approach based on mixtures of expected Gaussian kernels (MEG), which capture the local temporal structure and uncertainties in the data. Xu et al. [4] proposed several functional time representation methods based on Bochner’s and Mercer’s theorems to construct a translation-invariant kernel, effectively capturing useful time–event interactions. Tancik et al. [16] discussed how passing input points through a simple Fourier feature mapping enables multilayer perceptrons (MLPs) to learn high-frequency functions in low-dimensional problem domains, significantly improving the performance of MLPs on tasks relevant to computer vision and graphics.
In particular, the work of Kazemi et al. [17] received significant attention in the field of time-series data representation. Several studies, such as Geng et al. [18] and Diniz et al. [19], adopted the Time2Vec encoding method introduced in that work to represent time-series data. The Time2Vec encoding is a technique that uses a set of sinusoidal variables to encode timestamp information [17]. Specifically, this representation encodes a timestamp into a one-dimensional vector of length k + 1, where k is a hyperparameter specifying the number of sinusoidal components. Equation (1) illustrates the encoding process, where τ represents a scalar timestamp, i is the index of the elements in the vector, and F is a periodic function, with sin being the specific function used in their work.
t2v(τ)[i] = ω_i τ + φ_i,                            if i = 0
t2v(τ)[i] = F(ω_i τ + φ_i) = sin(ω_i τ + φ_i),      if 1 ≤ i ≤ k        (1)
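A minimal NumPy sketch of Equation (1); the random ω_i and φ_i here are illustrative stand-ins for the parameters that Time2Vec learns during training:

```python
import numpy as np

def time2vec(tau, omega, phi):
    """Time2Vec: element 0 is linear in tau; elements 1..k apply F = sin."""
    v = omega * tau + phi   # shape (k + 1,)
    v[1:] = np.sin(v[1:])   # periodic components
    return v

k = 3
rng = np.random.default_rng(0)
omega, phi = rng.normal(size=k + 1), rng.normal(size=k + 1)
enc = time2vec(5.0, omega, phi)   # vector of length k + 1
```

The linear element preserves absolute position while the sinusoidal elements capture periodic behavior at the learned frequencies.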
Therefore, the aforementioned studies have demonstrated the potential for improving the performance of deep learning models on time-series data by enhancing the temporal representation of the data. Building on the concepts introduced by the positional encoding of transformers [3,20], Zhang [21] has outlined a set of ideal objectives to facilitate the neural networks’ comprehension of time-series data.
First, a unique encoding for each timestamp is required to ensure that each moment is distinctly represented and uniquely identifiable. Second, this encoding must accommodate potentially infinite extensions, as timestamps could span vast ranges. On the one hand, large numbers might not differ essentially from small numbers in terms of the timestamp value, serving merely as indices or sequence numbers. On the other hand, neural networks require scaling to an appropriate range, typically between −1 and 1 or 0 and 1, to ensure compatibility with other features that have been normalized or processed by activation functions. The third requirement pertains to the consistency of intervals between timestamps. The encoding should preserve the temporal relationships within the data points, ensuring that regardless of the chosen reference point, the temporal intervals between different timestamps consistently convey the same measure of time. One may also express the requirements in a mathematical representation, as follows:
  • Distinct encoding for each timestamp: ∀ τ1, τ2 ∈ R, τ1 ≠ τ2 ⇒ f(τ1) ≠ f(τ2). Here, a function f: T → Rⁿ is defined, where T is the set of all possible timestamps and Rⁿ is an n-dimensional real vector space. The function f assigns a unique vector to each timestamp, ensuring distinct representation.
  • Unlimited extension of sequence: T = { t_i | t_i ∈ R and t_i < t_{i+1}, i ∈ N }. Here, T ⊆ R represents the subset of all possible timestamps, and the condition t_i < t_{i+1} ensures that greater timestamps can always be appended, accommodating the potentially infinite extension of the sequence.
  • Consistency of intervals: ∀ τ1, τ2, τ3, τ4 ∈ T, (τ2 − τ1) = (τ4 − τ3) ⇒ ‖f(τ2) − f(τ1)‖ = ‖f(τ4) − f(τ3)‖. Here, for any two pairs of timestamps with equal intervals, the Euclidean norm of their differences, defined for any vector f(τ) ∈ Rⁿ as ‖f(τ)‖ = √(Σ_{i=1}^{n} f(τ)_i²), must also reflect this equivalence. This ensures that the temporal relationships within the data points are preserved.
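These requirements can be spot-checked numerically for any candidate encoding f; a small sketch with an illustrative sinusoidal candidate and tolerances (the helper name is hypothetical):

```python
import numpy as np

def check_requirements(f, ts, tol=1e-9):
    """Spot-check distinct encoding and interval consistency of an
    encoding f on a sorted array of timestamps ts."""
    enc = [tuple(np.round(f(t), 12)) for t in ts]
    distinct = len(set(enc)) == len(ts)          # requirement 1
    # Requirement 3: equal gaps must give equal encoding distances,
    # regardless of the reference point
    d_first = np.linalg.norm(f(ts[0] + 1.0) - f(ts[0]))
    d_last = np.linalg.norm(f(ts[-1] + 1.0) - f(ts[-1]))
    consistent = abs(d_first - d_last) < tol
    return distinct, consistent

# A unit-circle candidate passes both checks on this sub-period range
f = lambda t: np.array([np.sin(0.1 * t), np.cos(0.1 * t)])
ts = np.linspace(0.0, 10.0, 50)
```

Such a check cannot prove the properties in general, but it quickly exposes encodings that violate them on a given dataset.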

3. Methodology

Following the previous discussions, this research explores the theoretical feasibility of addressing the challenges associated with time-series data representation. Subsequently, a representation method called ChronoVectors is proposed, which resembles the positional encodings used in transformers. Furthermore, the ChronoVectors representation is thoroughly analyzed to ensure it meets the requirements raised in Section 2, providing a robust and flexible solution for handling chronological data that are non-continuous or have missing records at certain timestamps.

3.1. Theoretical Feasibility

3.1.1. Distinct Encoding and Unlimited Extension

To satisfy the first and second requirements, it is necessary to consider whether an infinite set can be mapped to a limited range without replication. Specifically, an injective function f is required that maps an infinite set T, such as the real number set R, into a bounded range such as [−1, 1] in a one-to-one manner. Formally, ∀ τ1, τ2 ∈ domain(f), f(τ1) = f(τ2) ⇒ τ1 = τ2, with f: T → [a, b], where a and b represent the lower and upper bounds of the range, respectively. Since both T and [a, b] have the same cardinality c, it is theoretically possible to construct such a function. For example, a scaled arctangent function, which maps the real numbers into the interval (−1, 1), is shown in Equation (2).
f(τ) = (2/π) arctan(τ)        (2)
There are several other examples of such functions, such as the sigmoid activation function, which also maps the real numbers to a bounded range. However, while an injective function theoretically ensures no loss of unique representation, in practice, numerical precision limitations may cause some loss of uniqueness, particularly when the timestamp (assumed to be non-negative) is very large. Nevertheless, it is theoretically feasible to find a function that satisfies the first and second requirements.
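A short sketch of Equation (2), illustrating both properties discussed here: the bounded injective mapping, and the numerical loss of uniqueness for very large timestamps:

```python
import math

# Equation (2): a bounded injective map from R into (-1, 1)
f = lambda tau: (2.0 / math.pi) * math.atan(tau)

# Distinct inputs stay distinct, and outputs stay bounded
print(f(-1e6), f(0.0), f(1e6))   # approx -0.99999936, 0.0, 0.99999936

# Numerical precision limit: huge timestamps collapse to the same float,
# so uniqueness is lost in practice even though f is injective in theory
print(f(1e17) == f(1e17 + 1.0))  # True
```

The second print shows exactly the precision caveat mentioned above: at 1e17, consecutive float64 values are more than one second apart, so distinct timestamps become indistinguishable before f is even applied.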

3.1.2. Interval Consistency

To satisfy the requirement of interval consistency, it is essential to ensure that the norm of the difference between the representations of any two pairs of equally spaced timestamps is equal, i.e., ‖f(τ2) − f(τ1)‖ = ‖f(τ4) − f(τ3)‖. This can be rephrased by defining the change in input as Δt = τ2 − τ1 = τ4 − τ3. Thus, the requirement reduces to finding a function g(Δt) that satisfies the following equations:
‖f(τ2) − f(τ1)‖ = g(Δt) = g(τ2 − τ1)
‖f(τ4) − f(τ3)‖ = g(Δt) = g(τ4 − τ3)        (3)
Furthermore, this transforms the problem of maintaining interval consistency into finding a function g(Δt) = ‖f(τ + Δt) − f(τ)‖. This equation ensures that the function f exhibits consistent and predictable behavior under translation with respect to the time interval Δt, regardless of the reference point.
Interestingly, sinusoidal functions exhibit properties that align with the interval consistency requirement. By the law of cosines applied to the isosceles triangle formed by two radii of the unit circle, the length of the base depends solely on the cosine of the angle opposite the base. This angle can be taken to represent the time interval Δt. The distance between the vectors f(τ) and f(τ + Δt), which is the norm of their difference, can be calculated using Equation (4).
g(Δt) = ‖f(τ + Δt) − f(τ)‖ = √(2 − 2cos(Δt))        (4)
In this context, the Euclidean norm gives the distance between two points on the unit circle in a 2D plane, i.e., the square root of the sum of the squares of the vector elements. In Figure 1, the points f(τ) and f(τ + Δt) on the unit circle are vectors specified by the sine and cosine functions of time τ and τ + Δt, respectively. When plotted on the unit circle, these points form an isosceles triangle in which the angle between the vectors is Δt and the length of the base is √(2 − 2cos(Δt)). This result ensures the property of interval consistency, as it directly relates the time interval Δt to the distance between points on the unit circle.
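Equation (4) can be checked numerically: the chord length between two points on the unit circle depends only on Δt, not on the reference time τ (a quick sketch with arbitrary values):

```python
import numpy as np

# Point on the unit circle at "time" tau
f = lambda tau: np.array([np.sin(tau), np.cos(tau)])

def chord(tau, dt):
    """Norm of f(tau + dt) - f(tau): the base of the isosceles triangle."""
    return np.linalg.norm(f(tau + dt) - f(tau))

dt = 0.7
d_a = chord(0.0, dt)        # reference point at the origin of time
d_b = chord(123.456, dt)    # arbitrary shifted reference point
identity = np.sqrt(2.0 - 2.0 * np.cos(dt))
```

Both chord lengths agree with √(2 − 2cos(Δt)) up to floating-point error, confirming translation invariance of the interval representation.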

3.2. Proposed Representation—ChronoVectors

Building upon the previous discussion and drawing inspiration from the positional encoding mechanisms used in transformers [3] and the Time2Vec encoding [17], this research proposes a novel representation method called ChronoVectors. The basic form of the ChronoVectors consists of a two-dimensional vector, defined by the sine and cosine functions of time τ . Learnable parameters ω and φ are introduced to control the angular frequency and phase shift of these functions within a neural network setting. These parameters are optimized during backpropagation, allowing for a tailored temporal representation that aligns with the inherent temporal dynamics of the dataset.
The fundamental representation of ChronoVectors is expressed as c(τ) = [sin(ωτ + φ), cos(ωτ + φ)]. This formulation adheres to the proposed requirements under relaxed conditions. First, the sine and cosine transformations naturally scale any value to the range [−1, 1], regardless of magnitude, facilitating an infinite extension of time sequences without exceeding the desired normalized value range. Second, the Euclidean distance between the representations remains consistent, as demonstrated in Equation (5): the representation of a time interval depends only on its magnitude, irrespective of its position on the timeline.
g(Δt) = ‖f(τ + Δt) − f(τ)‖
      = √[ (sin(ω(τ + Δt) + φ) − sin(ωτ + φ))² + (cos(ω(τ + Δt) + φ) − cos(ωτ + φ))² ]
      = √(2 − 2cos(ωΔt))        (5)
The distinct encoding for each timestamp, however, is not strictly guaranteed due to the periodicity of the sine and cosine functions. The transformed points are unique in 2D space only under specific conditions. For instance, uniqueness is maintained if the original values fall within a range smaller than 2π/ω, thereby avoiding overlap on the unit circle; this situation is particularly manageable when the timestamp information lies within a restricted range, and choosing a sufficiently small ω makes such repetition correspondingly rare. Alternatively, even if the values span more than one 2π/ω period, they will still map to distinct points provided that no two of them are separated by an exact integer multiple of 2π/ω.
To mitigate this issue, additional dimensions may be added to the representation to enhance its uniqueness. Another approach introduces flexibility by differentiating the angular-frequency parameters, setting ω_s ≠ ω_c for the sine and cosine functions. This adjustment introduces a weak condition of uniqueness, allowing the ChronoVectors to satisfy the requirements under relaxed conditions. Consequently, the proposed ChronoVectors representation is theoretically feasible for handling chronological data that are either non-continuous or have missing records at certain timestamps.
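Putting the pieces together, a minimal NumPy sketch of the ChronoVectors form, including the differentiated-frequency variant ω_s ≠ ω_c; the fixed parameter values are illustrative stand-ins for the ω and φ that the paper's networks learn during backpropagation:

```python
import numpy as np

def chrono_vectors(tau, omega_s, omega_c, phi):
    """c(tau) = [sin(omega_s*tau + phi), cos(omega_c*tau + phi)].
    With omega_s == omega_c this reduces to the basic form."""
    tau = np.asarray(tau, dtype=float)
    return np.stack([np.sin(omega_s * tau + phi),
                     np.cos(omega_c * tau + phi)], axis=-1)

# Basic form: interval consistency per Equation (5)
w, phi, dt = 0.01, 0.3, 250.0
d = np.linalg.norm(chrono_vectors(1000.0 + dt, w, w, phi)
                   - chrono_vectors(1000.0, w, w, phi))
assert abs(d - np.sqrt(2.0 - 2.0 * np.cos(w * dt))) < 1e-9

# Bounded range: components stay in [-1, 1] for arbitrarily large timestamps
big = chrono_vectors(1e9, w, w, phi)
```

Choosing ω_s ≠ ω_c breaks the strict 2π/ω periodicity that causes collisions, which is the weak-uniqueness condition described above.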

4. Case Study

This section describes the experiments conducted to validate the proposed ChronoVectors methodology. The research is performed using time-series forecasting tasks to demonstrate the effectiveness of the representation.

4.1. Data

The research utilized two datasets from the UCI Machine Learning Repository. The first dataset, called the Parking Birmingham Dataset [22], records the occupancy status of several parking lots in Birmingham, UK, from October 2016 to December 2016. This dataset contains 35,717 instances and includes attributes such as timestamp, occupancy, capacity, and parking lot name. To create the target variable, the occupancy rate is calculated by dividing the occupancy by the capacity. Notably, the timestamps are not continuous and are formatted as “YYYY-MM-DD HH:MM:SS”. For this study, the time information is represented as the number of seconds since the start of the day, and supplementary weekday information is included as a categorical variable. The objective is to predict the occupancy rate based on the parking lot name, weekday, and time of the day. For convenience, the experiments on this dataset will be referred to as Park. Figure 2a illustrates the clustering of the occupancy rate by plotting the data for the four parking lots with the highest number of records. Figure 2b presents a heatmap of system-level average occupancy rates, aggregated at 30 min intervals across different times of the day and days of the week. The parking data, originally recorded at irregular timestamps, were resampled to a consistent 30 min frequency, with occupancy rates calculated by summing the total occupancy and dividing by the total capacity for each interval. The color intensity in each cell represents the mean occupancy rate, highlighting temporal patterns in parking demand.
Another dataset used in this research is the Metro Interstate Traffic Volume Dataset [23], which contains hourly traffic volume data for Interstate 94 Westbound and comprises 48,204 instances. This dataset encompasses a wider range of attributes, both categorical and numerical. Categorical attributes include weekday, holiday, and weather descriptions, while numerical attributes consist of temperature, rain, snow, and traffic volume. The timestamp information, labeled “date time”, follows the format “YYYY-MM-DD HH:MM:SS” as well. However, since the minute and second information are consistently zero, only the date and hour information is utilized. Additionally, a considerable number of hourly records are missing, leading to two possible approaches for handling the timestamp information. One approach calculates the hour of the year as a single feature to represent the time information and processes it with ChronoVectors. The other provides the month and hour of the day, along with the hour of the year, and processes all these time features with ChronoVectors. Both approaches are tested in the experiments. Notably, this analysis omits the year information: because the dataset spans only a few years, including it would not be statistically meaningful for revealing periodic trends within such a restricted period. Consequently, the target of this experiment is to predict the average traffic volume on a specific day using the available features, such as weather and time information. The experiments conducted on this dataset will be referred to as Metro. Figure 3a illustrates one year of traffic volume data, clustered by weather condition. Additionally, Figure 3b presents a heatmap of hourly traffic volume by day of the week and hour of the day. Each cell shows the average traffic volume for the corresponding slot across all days, providing insight into traffic volume trends.

4.2. Experiment Design

Both networks used in the experiments are designed concisely and equipped with ChronoVectors representation to enhance the integration of temporal features. The primary network structures for both Park and Metro datasets share several common elements, including the incorporation of ChronoVectors for time representation; embeddings for categorical variables; and multiple fully connected layers with batch normalization, activation functions, and dropout for regularization. The output layers in both networks are designed to produce a single output value, indicating their application in prediction tasks, specifically regression.
ParkNet is specifically designed to handle the Park dataset. It incorporates embeddings for park IDs and weekdays. The final input to ParkNet consists of concatenated embeddings of the park ID, weekday, ChronoVectors representation, and capacity features. This concatenated input passes through three fully connected layers, each followed by batch normalization, LeakyReLU activation, and dropout for regularization, with the final layer producing the desired output.
In contrast, MetroNet is optimized for handling more dynamic, sequential data, as it processes traffic volume data on an hourly basis for each day. It incorporates embeddings for multiple categorical features, including holiday ID, weather conditions (main and description), and weekdays. In particular, this experiment provides two types of models, distinguished by their use of time features. One model utilizes solely the hour of the year, while the other incorporates additional temporal features such as month and hour of the day. The input to MetroNet combines these embeddings with time features and additional continuous features, such as temperature and precipitation. The network employs a GRU layer to handle variable-length sequences, accommodating instances where some hours in a day are missing. Following the GRU layer, layer normalization and dropout are employed before applying the output for the final fully connected layer.
In summary, both ParkNet and MetroNet share foundational design elements, such as the employment of ChronoVectors representation to encode time information and the use of embedding layers for categorical variables. However, ParkNet is structured for static temporal data associated with park occupancy rates, leveraging fully connected layers for prediction at a specific timestamp. In contrast, MetroNet is designed to demonstrate the application of ChronoVectors representation in processing sequential data by employing GRU layers and a more diverse set of embeddings to accommodate the complexities of traffic and weather data. This dual approach highlights the versatility and adaptability of the ChronoVectors representation in handling different types of time-series data.

4.3. Baseline Setup

In general, the experiments are designed to compare the performance of the proposed ChronoVectors representation with the vanilla timestamp representation and the Time2Vec representation. The vanilla timestamp representation serves as an ablation experiment, using simple timestamps as input to illustrate the effect of the proposed ChronoVectors representation. In other words, the network structure remains unchanged, and only the time-representation tensors are swapped accordingly.
All experiments will utilize the mean squared error (MSE) as the loss function and the Adam optimizer, with consistent parameters for optimization. The evaluation metric for both datasets will be the average absolute deviation. Additionally, the 95% confidence interval of the deviation will be calculated to provide a more comprehensive understanding of the overall prediction quality.
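The evaluation metric can be sketched as follows; the paper does not specify its exact confidence-interval construction, so a normal-approximation interval over the absolute deviations is assumed here, and all values are made up for illustration:

```python
import numpy as np

def deviation_metrics(y_true, y_pred):
    """Average absolute deviation with an assumed normal-approximation 95% CI."""
    dev = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    avg = dev.mean()
    half = 1.96 * dev.std(ddof=1) / np.sqrt(dev.size)
    return avg, (avg - half, avg + half)

# Illustrative predictions
avg, (ci_lo, ci_hi) = deviation_metrics([10.0, 12.0, 9.0, 11.0],
                                        [9.5, 12.5, 8.0, 11.5])
```

Reporting the interval alongside the average deviation distinguishes a model that is consistently close from one whose average hides large, variable errors.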

4.4. Results

The experimental results, presented in Table 1 and Table 2, showcase the impact of different time representations on prediction accuracy for the Park and Metro datasets. Each model’s accuracy and stability are evaluated using the average deviation (Avg. Dev.) and standard deviation (Std. Dev.) metrics, respectively. The 95% confidence interval (95% CI) provides a range of values likely to contain the true average deviation, offering a more comprehensive insight into the prediction quality.
Table 1 displays the results for parking occupancy rate prediction. The ChronoVectors representation outperforms both the vanilla timestamp representation and the Time2Vec representation across the training, validation, and test sets. Specifically, the ChronoVectors model achieves the lowest average deviations of 6.944%, 8.612%, and 10.112% for the training, validation, and test sets, respectively. These figures translate to superior accuracy compared to the other models, which consistently exhibit higher deviations.
Table 2 showcases a similar pattern for Interstate traffic volume prediction: the ChronoVectors representation again outperforms the vanilla timestamp and the Time2Vec representations across all sets, achieving the lowest average deviations of 226.4, 239.2, and 261.0 for the training, validation, and test sets, respectively. Figure 4 provides a visual comparison of the performance curves across the different time representations, with the ChronoVectors model achieving lower loss values. Interestingly, the inclusion of more granular time features, denoted by the “+” sign in the model names, further enhances the models’ predictive capabilities, with improvements being more pronounced for the vanilla timestamp and Time2Vec representations than for ChronoVectors.
Furthermore, although the ChronoVectors representation shows superior accuracy, the standard deviation results reveal some nuances. The standard deviation for the ChronoVectors models is slightly higher than that for the vanilla timestamp models in both test sets, but it remains lower than that for the Time2Vec models. This increased variability in the predictions could be attributed to the additional parameters in the ChronoVectors model, which may require more careful tuning to achieve better stability. Nevertheless, the ChronoVectors model’s superior accuracy is evident, as indicated by the lower bounds of the 95% confidence intervals compared to the other models in all sets.
Overall, the experimental results clearly indicate that the proposed ChronoVectors representation significantly enhances the accuracy of temporal data predictions compared to the vanilla timestamp and the Time2Vec representations. This improvement is observed consistently across both datasets and across the training, validation, and test sets.

5. Conclusions and Future Work

The research has proposed a novel representation method called ChronoVectors, which effectively captures and represents the temporal dynamics inherent in time-series data. The ChronoVectors representation satisfies the proposed requirements in a relaxed manner, providing a robust and reliable method for time-series data processing. The experimental results demonstrate that the ChronoVectors representation’s ability to predict time-series data outperforms conventional time representations, such as vanilla timestamps and Time2Vec. The superior performance of ChronoVectors can be attributed to its ability to effectively capture and represent the temporal dynamics inherent in the data, specifically through the learnable parameters ω and φ that allow for tailored temporal representations aligning with the dataset’s temporal dynamics.
The primary focus of this research was to validate the conceptual viability of the ChronoVectors representation in enhancing the integration of temporal features. However, the exploration of parameter combinations or more intricate network structures during the training process was limited, potentially hindering the attainment of optimal model performance. For instance, the utilization of parameter tuning methods like grid search was not fully explored. Despite these limitations, the results successfully indicate that the ChronoVectors representation offers a promising approach for time-series data processing tasks.
Future work could involve more complex network structures and different types of sequential data to further validate the effectiveness of the ChronoVectors representation. For instance, applying the ChronoVectors method to tasks such as flight trajectory prediction or classification, as demonstrated in Zhang [21], could provide additional insights into its capabilities. Additionally, other functions beyond sine and cosine that satisfy the requirements of the timestamp representation could be considered to enhance the understanding of temporal dynamics.
Another avenue for future research is the potential application of the ChronoVectors representation in other machine learning models, such as decision trees, random forests, or gradient boosting, to examine their performance across various methodologies. Furthermore, the synergistic effects of combining the ChronoVectors representation with other neural network architectures should be explored. As demonstrated by the results, the inclusion of more granular time features improved model performance to varying degrees, suggesting that non-orthogonal factors, such as the number of layers, learning rate, and batch size, may influence model outcomes.
In conclusion, the ChronoVectors representation demonstrates a promising approach to enhancing the integration of temporal features for time-series data processing tasks by effectively capturing and representing the temporal dynamics inherent in the data. This research could encourage further exploration into more complex network structures and diverse applications.

Author Contributions

Conceptualization, Q.Z. and J.H.M.; methodology, Q.Z. and J.H.M.; software, Q.Z.; validation, Q.Z.; formal analysis, Q.Z.; investigation, Q.Z.; resources, Q.Z.; data curation, Q.Z.; writing—original draft preparation, Q.Z.; writing—review and editing, Q.Z. and J.H.M.; visualization, Q.Z.; supervision, J.H.M.; project administration, J.H.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this study are available on the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/index.php (accessed on 17 May 2024). The code for the experiments will be available once the paper is accepted.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ahmed, D.M.; Hassan, M.M.; Mstafa, R.J. A review on deep sequential models for forecasting time series data. Appl. Comput. Intell. Soft Comput. 2022, 2022, 6596397. [Google Scholar] [CrossRef]
  2. Kwak, S.K.; Kim, J.H. Statistical data preparation: Management of missing values and outliers. Korean J. Anesthesiol. 2017, 70, 407–411. [Google Scholar] [CrossRef] [PubMed]
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
  4. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Self-attention with Functional Time Representation Learning. In Advances in Neural Information Processing Systems; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32. [Google Scholar]
  5. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef] [PubMed]
  6. White, I.R.; Carlin, J.B. Bias and efficiency of multiple imputation compared with complete-case analysis for missing covariate values. Stat. Med. 2010, 29, 2920–2931. [Google Scholar] [CrossRef]
  7. Cao, W.; Wang, D.; Li, J.; Zhou, H.; Li, L.; Li, Y. Brits: Bidirectional recurrent imputation for time series. Adv. Neural Inf. Process. Syst. 2018, 31, 6776–6786. [Google Scholar]
  8. Tang, J.; Zhang, X.; Yin, W.; Zou, Y.; Wang, Y. Missing data imputation for traffic flow based on combination of fuzzy neural network and rough set theory. J. Intell. Transp. Syst. 2021, 25, 439–454. [Google Scholar] [CrossRef]
  9. Liu, Q.; Wang, B.; Zhu, Y. Short-term traffic speed forecasting based on attention convolutional neural network for arterials. Comput.-Aided Civ. Infrastruct. Eng. 2018, 33, 999–1016. [Google Scholar] [CrossRef]
  10. Chen, R.T.Q.; Rubanova, Y.; Bettencourt, J.; Duvenaud, D.K. Neural Ordinary Differential Equations. In Advances in Neural Information Processing Systems; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
  11. Zhang, Q.; Mott, J.H.; Johnson, M.E.; Springer, J.A. Development of a Reliable Method for General Aviation Flight Phase Identification. IEEE Trans. Intell. Transp. Syst. 2021, 23, 11729–11738. [Google Scholar] [CrossRef]
  12. Zhang, Q.; Mott, J.H. Improved Framework for Classification of Flight Phases of General Aviation Aircraft. Transp. Res. Rec. 2022, 2677, 1665–1675. [Google Scholar] [CrossRef]
  13. Hallac, D.; Vare, S.; Boyd, S.; Leskovec, J. Toeplitz inverse covariance-based clustering of multivariate time series data. arXiv 2017, arXiv:1706.03161. [Google Scholar]
  14. Hastie, T.; Rosset, S.; Zhu, J.; Zou, H. Multi-class adaboost. Stat. Its Interface 2009, 2, 349–360. [Google Scholar] [CrossRef]
  15. Li, S.C.X.; Marlin, B. Classification of sparse and irregularly sampled time series with mixtures of expected Gaussian kernels and random features. In Proceedings of the Thirty-First Conference on Uncertainty in Artificial Intelligence, Arlington, VA, USA, 12–16 July 2015; pp. 484–493. [Google Scholar]
  16. Tancik, M.; Srinivasan, P.; Mildenhall, B.; Fridovich-Keil, S.; Raghavan, N.; Singhal, U.; Ramamoorthi, R.; Barron, J.; Ng, R. Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains. In Advances in Neural Information Processing Systems; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2020; Volume 33, pp. 7537–7547. [Google Scholar]
  17. Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M. Time2vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321. [Google Scholar]
  18. Geng, D.; Wang, B.; Gao, Q. A hybrid photovoltaic/wind power prediction model based on Time2Vec, WDCNN and BiLSTM. Energy Convers. Manag. 2023, 291, 117342. [Google Scholar] [CrossRef]
  19. Diniz, P.; Junior, D.A.D.; Diniz, J.O.; de Paiva, A.C.; Silva, A.C.d.; Gattass, M.; Quevedo, R.; Michelon, D.; Siedschlag, C.; Ribeiro, R. Time2Vec transformer: A time series approach for gas detection in seismic data. In Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, Virtual Event, 25–29 April 2022; pp. 66–72. [Google Scholar]
  20. Kazemnejad, A. Transformer Architecture: The Positional Encoding. 2019. Available online: https://kazemnejad.com/blog/transformer_architecture_positional_encoding (accessed on 15 May 2024).
  21. Zhang, Q. General Aviation Aircraft Flight Status Identification Framework. Ph.D. Thesis, Purdue University, West Lafayette, IN, USA, 2024. [Google Scholar] [CrossRef]
  22. Stolfi, D. Parking Birmingham. UCI Machine Learning Repository. 2019. Available online: https://archive.ics.uci.edu/dataset/482/parking+birmingham (accessed on 17 May 2024).
  23. Hogue, J. Metro Interstate Traffic Volume. UCI Machine Learning Repository. 2019. Available online: https://archive.ics.uci.edu/dataset/492/metro+interstate+traffic+volume (accessed on 17 May 2024).
Figure 1. The unit circle illustrating two points, f(τ) and f(τ + Δt), which form an isosceles triangle. The angle between the vectors is Δt, and the base length is √(2 − 2 cos(Δt)).
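The base length of the isosceles triangle in Figure 1 follows from the law of cosines: two unit vectors separated by angle Δt have squared chord length 1 + 1 − 2 cos(Δt), independent of where the first point τ lies on the circle. A quick numerical check (illustrative code, not from the paper):

```python
import math

def chord_length(tau, dt):
    """Euclidean distance between unit-circle points at angles tau and tau + dt."""
    dx = math.cos(tau) - math.cos(tau + dt)
    dy = math.sin(tau) - math.sin(tau + dt)
    return math.hypot(dx, dy)

def closed_form(dt):
    """Law-of-cosines result: sqrt(2 - 2*cos(dt)), independent of tau."""
    return math.sqrt(2.0 - 2.0 * math.cos(dt))
```

The independence from τ is the relevant property here: equally spaced timestamps remain equally far apart in the encoded space regardless of where the sequence starts.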
Figure 2. Visual representation of occupancy rate for the Parking Birmingham Dataset: (a) Clustering of variations in occupancy rate for the top four parking lot entries (SystemCodeNumber) over multiple days. Each color corresponds to a specific parking lot entry, with individual lines representing daily data within each color group. (b) Heatmap of the overall occupancy rates segmented by 30 min intervals and days of the week. Each cell shows the calculated average occupancy rate for the corresponding time slot across all parking lots.
Figure 3. Visual representation of traffic volume for the Metro Interstate Traffic Volume Dataset: (a) Clustering of traffic volume changes by weather conditions for the year 2016. Each color corresponds to a specific weather condition (weather_main), with individual lines representing data for different days. (b) Heatmap of the overall hourly traffic volume by hour and day of the week. Each cell shows the calculated average traffic volume for the corresponding hour across all days.
Figure 4. Comparison of loss functions for different time representations in the Metro dataset: (a) training loss function comparison; (b) validation loss function comparison.
Table 1. Model results for parking occupancy rate (%) prediction.
Model      Metric     Train               Validation           Test
Vanilla    Avg. Dev.  8.010%              10.119%              10.533%
           Std. Dev.  7.053%              8.476%               9.164%
           95% CI     (7.928%, 8.092%)    (9.842%, 10.397%)    (10.233%, 10.834%)
Time2Vec   Avg. Dev.  7.716%              9.834%               11.168%
           Std. Dev.  7.087%              8.357%               10.403%
           95% CI     (7.633%, 7.798%)    (9.561%, 10.108%)    (10.827%, 11.509%)
ChronoVec  Avg. Dev.  6.944%              8.612%               10.112%
           Std. Dev.  6.924%              8.032%               9.641%
           95% CI     (6.863%, 7.024%)    (8.349%, 8.874%)     (9.796%, 10.428%)
Table 2. Model results for interstate traffic volume prediction.
Model        Metric     Train             Validation        Test
Vanilla      Avg. Dev.  260.0             272.8             292.0
             Std. Dev.  242.3             281.3             326.8
             95% CI     (247.7, 272.3)    (232.4, 313.3)    (245.1, 339.0)
Time2Vec     Avg. Dev.  252.1             276.4             286.3
             Std. Dev.  262.9             317.0             323.5
             95% CI     (238.7, 265.4)    (230.8, 321.9)    (239.8, 332.8)
ChronoVec    Avg. Dev.  226.4             239.2             261.0
             Std. Dev.  257.3             303.0             346.6
             95% CI     (213.3, 239.5)    (195.7, 282.8)    (211.2, 310.8)
Vanilla+     Avg. Dev.  234.7             248.7             273.1
             Std. Dev.  251.4             300.1             316.6
             95% CI     (221.9, 247.4)    (205.6, 291.9)    (227.6, 318.6)
Time2Vec+    Avg. Dev.  240.5             244.5             267.9
             Std. Dev.  256.2             321.0             365.4
             95% CI     (227.5, 253.6)    (198.4, 290.6)    (215.4, 320.4)
ChronoVec+   Avg. Dev.  213.8             230.4             255.6
             Std. Dev.  256.9             315.4             322.8
             95% CI     (200.7, 226.8)    (185.0, 275.7)    (209.2, 302.0)
Note: Models with a “+” are provided with additional time information, including the month and hour.
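The intervals in Tables 1 and 2 are consistent in form with a normal-approximation confidence interval for the mean deviation, mean ± z·s/√n. The tables do not restate the exact procedure or sample sizes, so the following is a hedged sketch; the sample size n and the use of z = 1.96 are assumptions for illustration.

```python
import math

def normal_ci(mean, std, n, z=1.96):
    """Normal-approximation 95% CI for a mean, given the sample standard
    deviation and sample size. Hypothetical sketch: n and z = 1.96 are
    assumptions, not values stated in the tables."""
    half_width = z * std / math.sqrt(n)
    return (mean - half_width, mean + half_width)

# Example with made-up inputs: a mean deviation of 8.0% with standard
# deviation 7.0% over a hypothetical n = 28,000 samples.
lo, hi = normal_ci(8.0, 7.0, 28000)
```

Narrow intervals around noticeably different means, as between the Vanilla and ChronoVec rows, are what support the comparisons drawn in the text.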
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
